Last week I came across a Python package called pandas-profiling. Until then I had no idea how powerful it is: it is very easy to use and gives you a first look at your data with almost no effort. Its purpose is to automate much of the descriptive analysis that many data scientists do when they first dive into a new dataset. Because it is so easy to use, I think it is a strong starting point for anyone faced with a dataset and an open-ended analysis task.
The pandas df.describe() function is great, but it is a little basic for serious exploratory data analysis.
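To make that concrete, here is a small sketch of what df.describe() covers and what you would still check by hand. The DataFrame and its column names are made up for illustration, and the extra checks are only examples of the kind of work a profiling report saves you, not a description of how pandas-profiling works internally.

```python
import pandas as pd

# Tiny made-up dataset, for illustration only.
df = pd.DataFrame({
    "age": [23, 35, None, 41, 35],
    "city": ["Pune", "Delhi", "Pune", None, "Delhi"],
})

# By default, describe() summarizes only the numeric columns,
# so the text column "city" is left out of the result.
summary = df.describe()
print(summary)

# Checks you would typically add yourself:
missing = df.isnull().sum()   # missing values per column
distinct = df.nunique()       # distinct non-missing values per column
print(missing)
print(distinct)
```

Even on two columns this is already three separate calls, which is the repetitive part a profiling report is meant to take off your hands.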
Let’s run through a quick demo to see how it works.
Installation:
You can install using the pip package manager by running
pip install pandas-profiling
Using conda:
You can install using the conda package manager by running
conda install pandas-profiling
Usage:
The profile report is written in HTML5 and CSS3, which means pandas-profiling requires a modern browser.
Jupyter Notebook:
Start by loading in your pandas DataFrame, e.g. by using
import pandas as pd
import pandas_profiling
df = pd.read_csv("hello.csv", parse_dates=True, encoding='UTF-8')
To display the report in a Jupyter notebook, run:
pandas_profiling.ProfileReport(df)
If you want to generate an HTML report file, save the ProfileReport to an object and use the to_file() function:
profile = pandas_profiling.ProfileReport(df)
profile.to_file(outputfile="/tmp/myoutputfile.html")
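The /tmp path above only exists on Unix-like systems. A minimal sketch of a more portable version could look like the following. The helper names report_path and save_profile are hypothetical, not part of pandas-profiling, and I pass the path to to_file() positionally on the assumption that the keyword name may differ between releases.

```python
import os
import tempfile


def report_path(filename="myoutputfile.html"):
    """Hypothetical helper: build a report path inside the system temp directory."""
    return os.path.join(tempfile.gettempdir(), filename)


def save_profile(df, path):
    """Hypothetical helper: profile a DataFrame and write the HTML report to path."""
    # Imported here so report_path() can be used without the package installed.
    import pandas_profiling

    profile = pandas_profiling.ProfileReport(df)
    # Path passed positionally; the keyword name is assumed to vary by version.
    profile.to_file(path)
    return path


out = report_path()
print(out)
```

With a DataFrame loaded as shown earlier, you would call save_profile(df, out) and then open the resulting file in your browser.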
Dependencies:
An internet connection. Pandas-profiling requires an internet connection to download the Bootstrap and JQuery libraries.
I might change this in the future, let me know if you want that sooner than later.
python (>= 2.7)
pandas (>=0.19)
matplotlib (>=1.4)
six (>=1.9)
That’s about all there is to it. A quick and easy way to do a lot of descriptive analysis in a very short amount of time.
I would call it a must-try, since it is very powerful in terms of visualization. That's my opinion of pandas-profiling.
Have fun and Happy learning 🙂