30 things you can do with Pandas

Hello everyone! Today I want to write about the Pandas library. Here are 30 things you can do with Pandas to better understand your data.

First things first, let's import the pandas library:

import pandas as pd
df = pd.read_csv('test.csv')  # read a test file into a DataFrame

(1) Read in a CSV dataset

pd.read_csv("csv_file")
# Older tutorials also show pd.DataFrame.from_csv("csv_file"); it was removed in pandas 1.0, so use read_csv.

(2) Read in an Excel dataset

pd.read_excel("excel_file")

(3) Write your data frame directly to csv

df.to_csv("data.csv", sep=",", index=False)
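
To see writing and reading work together, here is a minimal round-trip sketch. The sample frame and its column names are made up for illustration, and an in-memory buffer stands in for a real file so nothing touches the disk.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"name": ["Ann", "Bob"], "size": [3, 5]})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, sep=",", index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

With index=False the row index is not written, so the restored frame has only the two data columns.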

(4) Create a dataframe from data with column names

pd.DataFrame(data, columns=column_names)  # column_names: a list with one name per column of data
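
As a small sketch of what that looks like in practice, assuming the data is a list of rows (the values and column names below are invented for the example):

```python
import pandas as pd

# Hypothetical rows: each inner list is one record.
rows = [["Ann", 3], ["Bob", 5]]

# One column name per position in each row.
people = pd.DataFrame(rows, columns=["name", "size"])
print(people)
```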

(5) Get Data type for all the columns

df.dtypes

(6) Basic dataset feature info

df.info()

(7) Basic dataset statistics

print(df.describe())

(8) List the column names

df.columns

(9) Drop missing data

df.dropna(axis=0, how='any')

(10) Replace missing data

df.fillna(value=0)  # fill every missing value with 0; pass a dict to use a different fill value per column

(11) Check for NaNs

pd.isnull(df)  # same as df.isnull(); returns True wherever a value is missing
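
The three missing-data steps above fit together naturally. Here is a hedged sketch on an invented frame; the column names and fill values are assumptions, and filling is done with fillna, which is the dedicated pandas method for replacing missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "height": [1.5, np.nan, 2.0],
    "job": ["nurse", "chef", None],
})

# Count the missing values in each column.
missing_per_column = df.isnull().sum()

# Keep only the rows that have no missing value at all.
complete_rows = df.dropna(axis=0, how="any")

# Fill instead of dropping: column mean for height, a placeholder label for job.
filled = df.fillna({"height": df["height"].mean(), "job": "unknown"})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Dropping is the simplest choice but throws rows away; filling keeps every row at the cost of inventing values, so which one fits depends on the dataset.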

(12) Drop a feature

df.drop('feature_variable_name', axis=1)

(13) Convert object type to float

pd.to_numeric(df["feature_name"], errors='coerce')
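
What errors='coerce' does is easiest to see on a tiny example. This is a sketch with made-up string values: anything that cannot be parsed as a number becomes NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical column read in as text, with one unparseable entry.
raw = pd.Series(["1.5", "2", "n/a"])

# Unparseable entries are turned into NaN rather than raising.
numeric = pd.to_numeric(raw, errors="coerce")
print(numeric)
print(numeric.dtype)
```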

(14) Convert data frame to numpy array

df.to_numpy()  # as_matrix() was removed in pandas 1.0

(15) Get first “n” rows of a data frame

df.head(n)

(16) Get last “n” rows of a data frame

df.tail(n)

(17) Get data by feature name

df[feature_name]  # selects a column; df.loc[label] would select a row by its index label
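
Column and row selection are easy to mix up, so here is a short sketch of the difference. The frame, its string index labels, and the column names are invented for the example.

```python
import pandas as pd

# Hypothetical frame with string row labels so rows and columns are easy to tell apart.
df = pd.DataFrame(
    {"size": [3, 5], "height": [1.5, 2.0]},
    index=["ann", "bob"],
)

# A column (feature) by name: plain brackets, or .loc with ":" for all rows.
size_column = df["size"]
same_column = df.loc[:, "size"]

# A row by its index label: .loc with a single label.
ann_row = df.loc["ann"]

print(size_column)
print(ann_row)
```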

(18) Apply a function to a data frame

df["height"].apply(lambda height: 2 * height)
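
Here is a small sketch of apply on an invented height column. For simple arithmetic like this, plain vectorized math gives the same result and is usually faster; apply is more useful when the function has no vectorized equivalent.

```python
import pandas as pd

# Hypothetical data; "height" matches the column name used above.
df = pd.DataFrame({"height": [1.5, 2.0]})

# apply calls the function once per element of the column.
doubled = df["height"].apply(lambda height: 2 * height)

# Vectorized alternative for the same arithmetic.
vectorized = df["height"] * 2

print(doubled)
```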

(19) Renaming a column

df.rename(columns={df.columns[2]: 'size'}, inplace=True)

(20) Count categories of categorical variable

df["job"].value_counts()

(21) Get the unique entries of a column

df["name"].unique()
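
To show how value_counts and unique differ, here is a sketch on a made-up list of jobs: the first returns how often each value occurs, the second only the distinct values.

```python
import pandas as pd

# Hypothetical categorical column.
jobs = pd.Series(["chef", "nurse", "chef", "pilot", "chef"])

# Frequency of each category, most common first.
counts = jobs.value_counts()

# Distinct values, in order of first appearance.
distinct = jobs.unique()

print(counts)
print(distinct)
```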

(22) Accessing sub-data frames

new_df = df[["name", "size"]]

(23) Summary information about your data

# Sum of values in a data frame
df.sum()
# Lowest value of a data frame
df.min()
# Highest value
df.max()
# Index of the lowest value
df.idxmin()
# Index of the highest value
df.idxmax()
# Statistical summary of the data frame, with quartiles, median, etc.
df.describe()
# Average values
df.mean()
# Median values
df.median()
# Correlation between columns
df.corr()
# To get these values for only one column, just select it like this
df["size"].median()
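
One thing to watch out for: as far as I know, since pandas 2.0 methods such as mean() and corr() raise an error when the frame contains text columns, unless you pass numeric_only=True. Here is a sketch on an invented frame with one text column and two numeric ones.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "size": [3, 5, 4],
    "height": [1.5, 2.0, 1.75],
})

# Restrict the summaries to numeric columns so the text column is skipped.
means = df.mean(numeric_only=True)
correlations = df.corr(numeric_only=True)

# Index label of the row with the largest height.
tallest_row = df["height"].idxmax()

print(means)
print(correlations)
print(tallest_row)
```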

(24) Sorting your data

df.sort_values(by="size", ascending=False)  # a DataFrame needs by=; a single column (Series) does not
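
A quick sketch of sorting a whole frame by one column, using invented data. Note that sort_values returns a new frame and keeps the original index labels attached to their rows.

```python
import pandas as pd

# Hypothetical data to sort.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "size": [3, 5, 4]})

# Sort rows by the "size" column, largest first.
by_size = df.sort_values(by="size", ascending=False)
print(by_size)
```

If you want a fresh 0..n-1 index after sorting, chain .reset_index(drop=True).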

(25) Boolean indexing

df[df["size"] == 5]
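
Boolean indexing works in two steps: the comparison builds a True/False Series, and that Series then picks the rows. Here is a sketch with invented data that also combines two conditions; each condition needs its own parentheses, and & is used instead of "and".

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "size": [5, 3, 5, 4],
    "job": ["chef", "chef", "nurse", "chef"],
})

# Step 1: the comparison yields one boolean per row.
size_is_five = df["size"] == 5

# Step 2: the boolean Series keeps only the rows marked True.
fives = df[size_is_five]

# Two conditions combined with & (element-wise "and").
chef_fives = df[(df["size"] == 5) & (df["job"] == "chef")]

print(fives)
print(chef_fives)
```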

(26) Selecting values

df.loc[[0], ['size']]  # .loc is indexed with square brackets, not called like a function
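
The shape of what .loc returns depends on whether you pass lists or single labels. A sketch on invented data, assuming the default integer index:

```python
import pandas as pd

# Hypothetical data with the default 0, 1, 2 index.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "size": [3, 5, 4]})

# Lists for both rows and columns: the result is a (1 x 1) DataFrame.
sub_frame = df.loc[[0], ["size"]]

# Single labels for both: the result is the value itself.
single_value = df.loc[0, "size"]

# Label slices with .loc include the end label, so 0:1 gives two rows.
first_two = df.loc[0:1, ["name", "size"]]

print(sub_frame)
print(single_value)
print(first_two)
```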

(27) Cross frequency tables between two variables

pd.crosstab(df["y"], df["z"])
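
Here is a sketch of what a cross frequency table contains, on made-up categorical values: each cell counts the rows that have that particular pair of "y" and "z" values.

```python
import pandas as pd

# Hypothetical categorical columns named "y" and "z" as in the snippet above.
df = pd.DataFrame({
    "y": ["a", "a", "b", "b", "b"],
    "z": ["x", "w", "x", "x", "w"],
})

# Rows are the values of y, columns are the values of z, cells are counts.
table = pd.crosstab(df["y"], df["z"])
print(table)
```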

(28) Plot function for numeric columns

df["size"].plot()

(29) Get shape (row,columns) of the DataFrame

df.shape

(30) Get Randomly selected n rows from DataFrame

df.sample(n)
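
If you need the same random rows every time you run your script, sample accepts a random_state seed. A sketch on an invented ten-row frame; the seed value 0 is arbitrary.

```python
import pandas as pd

# Hypothetical frame with ten rows.
df = pd.DataFrame({"size": range(10)})

# Same seed, same rows: useful for reproducible experiments.
picked = df.sample(n=3, random_state=0)
picked_again = df.sample(n=3, random_state=0)

print(picked)
```

By default rows are drawn without replacement, so no row appears twice.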

There are many more useful things in pandas. We’ll see more about them in upcoming posts.

“Happy Reading, Happy Learning”

Published by llamasearch