Hello everyone! Today I want to write about the pandas library: here are 30 things you can do with pandas to better understand your data!
First things first, let's import the pandas library:
import pandas as pd
df = pd.read_csv('test.csv')  # read a test file into a DataFrame
(1) Read in a CSV dataset
pd.read_csv("csv_file")
Note: the older pd.DataFrame.from_csv() was deprecated and then removed in pandas 1.0, so use read_csv.
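read_csv accepts any file-like object, not only a path, so a self-contained way to try it is to parse a small CSV string through io.StringIO. The three-row dataset below is made up for illustration.

```python
import io

import pandas as pd

# A small, made-up CSV held in memory so the example runs without a file on disk.
csv_text = """name,size,job
Ann,5,engineer
Bob,3,teacher
Cid,5,engineer
"""

# read_csv parses the header row into column names and infers a dtype per column.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (3, 3)
print(df.dtypes)
```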
(2) Read in an Excel dataset
pd.read_excel("excel_file")  # .xlsx files need the openpyxl package installed
(3) Write your DataFrame directly to a CSV file
df.to_csv("data.csv", sep=",", index=False)  # index=False leaves the row index out of the file
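To see what index=False buys you, here is a small round trip. When no path is given, to_csv returns the CSV text as a string, which keeps this sketch free of files; the two-row DataFrame is invented for the example.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "size": [5, 3]})

# With no path argument, to_csv returns the CSV content as a string.
csv_text = df.to_csv(sep=",", index=False)
print(csv_text)

# Because the index was not written, reading the text back yields the same
# columns, with no extra "Unnamed: 0" column holding the old index.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(df))  # True
```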
(4) Create a DataFrame from data with column names
pd.DataFrame(data, columns=["col_a", "col_b"])  # placeholder names: give one name per column in data
(5) Get the data type of every column
df.dtypes
(6) Basic dataset info (column names, non-null counts, data types, memory usage)
df.info()
(7) Basic dataset statistics
print(df.describe())
(8) List the column names
df.columns
(9) Drop missing data
df.dropna(axis=0, how="any")  # drop every row that has at least one missing value
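The how parameter decides how strict dropna is. This minimal sketch, on a made-up three-row DataFrame, contrasts how="any" with how="all".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [4.0, 5.0, np.nan]})

# how="any": drop a row if at least one of its values is missing.
any_dropped = df.dropna(axis=0, how="any")  # keeps only row 0
# how="all": drop a row only if every value in it is missing.
all_dropped = df.dropna(axis=0, how="all")  # keeps rows 0 and 1
print(any_dropped)
print(all_dropped)
```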
(10) Replace missing data
df.fillna(0)  # fill every missing value with 0
df.replace(to_replace=-1, value=0)  # replace a specific value (here -1) with another
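A short sketch of filling gaps, on invented data. It shows a constant fill, the equivalent replace call targeting NaN, and filling with the column mean, which is one common choice rather than a rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"size": [5.0, np.nan, 3.0]})

# fillna replaces every missing value with the given value.
filled = df.fillna(0)
# replace can swap any specific value, including NaN, for another one.
replaced = df.replace(to_replace=np.nan, value=0)
# Fill with the column mean instead: (5 + 3) / 2 = 4.
mean_filled = df["size"].fillna(df["size"].mean())
print(filled)
print(mean_filled)
```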
(11) Check for NaNs
pd.isnull(df)  # or df.isnull(); True wherever a value is missing
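The boolean mask from isnull is most useful once it is aggregated. Assuming a small made-up DataFrame, summing the mask counts missing values per column, and any(axis=1) flags incomplete rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Ann", None, "Cid"], "size": [5.0, np.nan, np.nan]})

mask = pd.isnull(df)          # same shape as df, True where a value is missing
per_column = mask.sum()       # True counts as 1: name -> 1, size -> 2
incomplete = mask.any(axis=1) # True for rows with at least one missing value
print(per_column)
print(incomplete)
```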
(12) Drop a feature (column)
df.drop("feature_variable_name", axis=1)  # returns a new DataFrame without that column
(13) Convert an object (text) column to a numeric type
df["feature_name"] = pd.to_numeric(df["feature_name"], errors="coerce")  # unparseable values become NaN
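Here is what errors="coerce" does with a value that is not a number. The "price" column and its contents are made up; the point is that the bad entry turns into NaN instead of raising an error, and the column ends up as floats.

```python
import pandas as pd

# Prices stored as text, one of which cannot be parsed.
df = pd.DataFrame({"price": ["1.5", "2", "n/a"]})

# Assign the result back; to_numeric does not modify the column in place.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
print(df["price"].dtype)  # float64, because NaN forces a float column
print(df)
```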
(14) Convert a DataFrame to a NumPy array
df.to_numpy()  # df.as_matrix() was removed in pandas 1.0
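to_numpy() is the current way to get a plain array out of a DataFrame. In this small, invented example, note that mixed column types are upcast to one common dtype.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# An int column and a float column are upcast to a single float64 array.
arr = df.to_numpy()
print(arr.shape)  # (2, 2)
print(arr.dtype)  # float64
```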
(15) Get the first n rows of a DataFrame
df.head(n)  # n defaults to 5
(16) Get the last n rows of a DataFrame
df.tail(n)  # n defaults to 5
(17) Get data by feature (column) name
df["feature_name"]  # or df.loc[:, "feature_name"]
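A common trap: a single label inside df.loc selects a row, not a column, because .loc takes rows first and columns second. This sketch, with invented row labels "r1" and "r2", shows both forms side by side.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "size": [5, 3]}, index=["r1", "r2"])

sizes = df["size"]             # a column, selected by name
sizes_too = df.loc[:, "size"]  # the same column through .loc: all rows, one column
row = df.loc["r1"]             # a single label in .loc picks the ROW labelled "r1"
print(sizes)
print(row)
```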
(18) Apply a function to a column
df["height"].apply(lambda height: 2 * height)
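apply returns a new Series, so store the result if you want to keep it. For simple arithmetic like doubling, the vectorized expression gives the same values and is usually faster; the heights below are made up.

```python
import pandas as pd

df = pd.DataFrame({"height": [1.5, 2.0]})

# apply calls the function once per element and returns a new Series.
df["double_height"] = df["height"].apply(lambda height: 2 * height)
# Vectorized equivalent: pandas multiplies the whole column at once.
df["double_height_vec"] = df["height"] * 2
print(df)
```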
(19) Renaming a column
df.rename(columns={df.columns[2]: "size"}, inplace=True)
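In the rename call, df.columns[2] looks up the third column's current name, so you can rename by position. This sketch uses an invented three-column DataFrame and also shows the non-inplace form, which returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann"], "job": ["engineer"], "height": [1.7]})

# Rename the column at position 2 ("height") to "size"; inplace=True changes df itself.
df.rename(columns={df.columns[2]: "size"}, inplace=True)
# Without inplace, rename leaves df alone and returns a renamed copy.
renamed = df.rename(columns={"job": "occupation"})
print(df.columns.tolist())
print(renamed.columns.tolist())
```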
(20) Count the occurrences of each category in a categorical variable
df["job"].value_counts()
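value_counts sorts its result by frequency, most common first, and normalize=True turns the counts into proportions. The job column here is made up.

```python
import pandas as pd

df = pd.DataFrame({"job": ["engineer", "teacher", "engineer"]})

counts = df["job"].value_counts()                # engineer 2, teacher 1
shares = df["job"].value_counts(normalize=True)  # proportions that sum to 1
print(counts)
print(shares)
```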
(21) Get the unique entries of a column
df["name"].unique()
(22) Select a subset of columns as a new DataFrame
new_df = df[["name", "size"]]
(23) Summary information about your data
# Sum of the values in each column
df.sum()
# Lowest value in each column
df.min()
# Highest value in each column
df.max()
# Index label of the lowest value in each column
df.idxmin()
# Index label of the highest value in each column
df.idxmax()
# Statistical summary of the DataFrame, with quartiles, median, etc.
df.describe()
# Average values (numeric_only=True skips text columns, which raise an error in pandas 2.x)
df.mean(numeric_only=True)
# Median values
df.median(numeric_only=True)
# Correlation between the numeric columns
df.corr(numeric_only=True)
# To get these values for only one column, select it first:
df["size"].median()
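A quick check of a few of these summaries on a made-up DataFrame that mixes a text column with numeric ones. numeric_only=True lets mean and corr skip the "name" column instead of failing on it.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "size": [5, 3, 4],
    "weight": [50.0, 70.0, 60.0],
})

means = df.mean(numeric_only=True)   # size 4.0, weight 60.0
largest = df["size"].idxmax()        # 0, the index label of the largest size
corr = df.corr(numeric_only=True)    # size and weight move in opposite directions here
print(means)
print(largest)
print(corr)
```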
(24) Sorting your data
df.sort_values(by="size", ascending=False)  # by= names the column(s) to sort on
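On a DataFrame, sort_values needs a by= argument naming the column or columns to sort on. Passing a list sorts by several columns, with ties in the first broken by the next; the data below is invented.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Cid", "Bob", "Ann"], "size": [5, 3, 5]})

# Largest size first.
by_size = df.sort_values(by="size", ascending=False)
# Size descending, then name ascending to break ties between equal sizes.
by_two = df.sort_values(by=["size", "name"], ascending=[False, True])
print(by_size)
print(by_two)
```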
(25) Boolean indexing
df[df["size"] == 5]
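Boolean indexing keeps the rows where the condition is True. Several conditions can be combined with & (and) or | (or), and each condition needs its own parentheses because of operator precedence. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "size": [5, 3, 5],
    "job": ["engineer", "teacher", "teacher"],
})

fives = df[df["size"] == 5]
# Parentheses are required around each comparison when combining with &.
five_teachers = df[(df["size"] == 5) & (df["job"] == "teacher")]
print(fives)
print(five_teachers)
```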
(26) Selecting values
df.loc[0, "size"]  # row label 0, column "size"; .loc uses square brackets, not parentheses
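.loc is indexed with square brackets, rows first and columns second. Scalars return a single value, lists keep the result a DataFrame, and .iloc does the same with integer positions instead of labels. The two-row DataFrame is invented.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "size": [5, 3]})

value = df.loc[0, "size"]    # scalar: row label 0, column "size"
sub = df.loc[[0], ["size"]]  # lists of labels return a 1x1 DataFrame
by_position = df.iloc[1, 0]  # second row, first column, by position
print(value, by_position)
print(sub)
```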
(27) Cross frequency tables between two variables
pd.crosstab(df["y"], df["z"])
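crosstab counts how often each pair of values occurs: the distinct values of the first argument become rows, those of the second become columns. With made-up y and z columns, margins=True also adds "All" totals.

```python
import pandas as pd

df = pd.DataFrame({"y": ["a", "a", "b", "b"], "z": ["x", "w", "x", "x"]})

table = pd.crosstab(df["y"], df["z"])
# margins=True appends an "All" row and column with the totals.
with_totals = pd.crosstab(df["y"], df["z"], margins=True)
print(table)
print(with_totals)
```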
(28) Plot a numeric column
df["size"].plot()  # requires matplotlib to be installed
(29) Get the shape (rows, columns) of the DataFrame
df.shape
(30) Get n randomly selected rows from the DataFrame
df.sample(n)
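sample draws different rows on each call unless you pass random_state, which makes the draw reproducible. The seed 42 and the ten-row DataFrame are arbitrary choices for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})

# The same random_state gives the same rows every time.
sample = df.sample(n=3, random_state=42)
again = df.sample(n=3, random_state=42)
print(sample)
```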
There are many more useful things in pandas. We’ll see more about them in upcoming posts.

“Happy Reading, Happy Learning”