Exploratory Data Analysis (EDA) in Python
Core Techniques Every Data Professional Should Know
Pooja Pawar
EDA Setup
Create virtual env → isolate project
$ python -m venv eda_env
Activate env → start workspace
$ source eda_env/bin/activate
Upgrade pip → avoid install errors
$ python -m pip install --upgrade pip
Install core libraries → EDA essentials
$ pip install pandas numpy
$ pip install matplotlib seaborn
Install stats tools → deeper analysis
$ pip install scipy
Install profiling → automated EDA
$ pip install ydata-profiling
Launch Jupyter → interactive analysis
$ jupyter notebook
Load Data & First Look
Read CSV file → load dataset into a DataFrame
pd.read_csv("data.csv")
Read Excel file → import spreadsheet data into Python
pd.read_excel("data.xlsx")
Head → preview first few rows of the dataset
df.head()
Tail → inspect last rows to check data completeness
df.tail()
Shape → check number of rows and columns
df.shape
Columns → list all feature names in the dataset
df.columns
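The first-look calls above can be run end to end on a tiny inline dataset (the CSV text and column names here are hypothetical stand-ins for a real file):

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file on disk.
csv_text = """order_id,city,amount
1,Pune,120.5
2,Mumbai,89.0
3,Pune,240.0
"""

# pd.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # preview first rows
print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # feature names
```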
Data Types & Columns
Dtypes → check column data types
df.dtypes
Convert to datetime → fix dates
pd.to_datetime(df["date"])
Convert to numeric → clean numbers
pd.to_numeric(df["amount"], errors="coerce")
Astype category → optimize memory
df["city"].astype("category")
Rename columns → consistency
df.rename(columns={"Order Date": "order_date"})
Sort values → inspect extremes
df.sort_values("amount")
Unique values → detect IDs
df.nunique()
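Put together, a minimal type-cleaning pass might look like this (column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical raw extract with stringly-typed columns.
df = pd.DataFrame({
    "Order Date": ["2024-01-05", "2024-01-06"],
    "amount": ["100", "oops"],       # one unparseable value
    "city": ["Pune", "Mumbai"],
})

df = df.rename(columns={"Order Date": "order_date"})         # consistent names
df["order_date"] = pd.to_datetime(df["order_date"])          # strings -> timestamps
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # "oops" -> NaN
df["city"] = df["city"].astype("category")                   # lower memory

print(df.dtypes)
```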
Missing Values Handling
Is null → missing check
df.isnull().any()
Null count → column-wise
df.isnull().sum()
Null percentage → severity
df.isnull().mean()*100
Drop null rows → strict cleaning
df.dropna()
Fill with value → simple impute
df.fillna(0)
Fill with median → numeric fix
df["amount"].fillna(df["amount"].median())
Forward fill → time series
df.ffill()
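A quick sketch of the null checks and imputations above on a toy series (values invented):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"amount": [10.0, np.nan, 30.0, np.nan]})

null_pct = df["amount"].isnull().mean() * 100             # share of missing values
median_fill = df["amount"].fillna(df["amount"].median())  # numeric impute
forward_fill = df["amount"].ffill()                       # carry last seen value

print(null_pct)               # 50.0
print(median_fill.tolist())   # [10.0, 20.0, 30.0, 20.0]
print(forward_fill.tolist())  # [10.0, 10.0, 30.0, 30.0]
```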
Duplicates & Quality Checks
Find duplicates → detect repeats
df.duplicated()
Count duplicates → data quality
df.duplicated().sum()
Drop duplicates → clean data
df.drop_duplicates()
Subset duplicates → key-based
df.duplicated(subset=["id","date"])
Memory usage → dataset size
df.memory_usage(deep=True)
Sample rows → random check
df.sample(5)
Value counts → category spread
df["status"].value_counts()
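The duplicate checks above, sketched on an invented three-row frame with one exact repeat:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2],
    "date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "status": ["ok", "ok", "late"],
})

n_dupes = df.duplicated().sum()   # exact repeated rows (beyond the first copy)
clean = df.drop_duplicates()      # keep one copy of each row
counts = df["status"].value_counts()

print(n_dupes)       # 1
print(len(clean))    # 2
print(counts["ok"])  # 2
```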
Descriptive Statistics
Mean → average value
df["amount"].mean()
Median → central value
df["amount"].median()
Std dev → variation
df["amount"].std()
Quantiles → distribution cut
df["amount"].quantile([0.25,0.5,0.75])
Skew → distribution shape
df["amount"].skew()
Kurtosis → tail heaviness
df["amount"].kurt()
Mode → most frequent
df["status"].mode()
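On a small invented sample, the statistics above behave like this (note how one large value pulls the mean above the median):

```python
import pandas as pd

amounts = pd.Series([10, 20, 20, 30, 100], name="amount")

mean = amounts.mean()      # 36.0 -- dragged up by the outlier
median = amounts.median()  # 20.0 -- robust central value
quartiles = amounts.quantile([0.25, 0.5, 0.75])
skew = amounts.skew()      # positive: long right tail

print(mean, median)
print(skew > 0)
```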
GroupBy Analysis
Group mean → segment average
df.groupby("city")["amount"].mean()
Group sum → totals
df.groupby("city")["amount"].sum()
Multiple agg → deeper insight
df.groupby("city")["amount"].agg(["mean","median","count"])
Pivot table → summary view
pd.pivot_table(df, values="amount", index="city")
Crosstab → category vs category
pd.crosstab(df["city"], df["status"])
Rank within group → comparison
df.groupby("city")["amount"].rank()
Top N per group → leaders
df.sort_values("amount").groupby("city").tail(3)
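The groupby patterns above, run on a four-row invented frame (sorting then taking `tail(1)` per group yields the top row per city):

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Pune", "Mumbai", "Mumbai"],
    "amount": [100, 200, 50, 150],
})

city_mean = df.groupby("city")["amount"].mean()
summary = df.groupby("city")["amount"].agg(["mean", "sum", "count"])
top1 = df.sort_values("amount").groupby("city").tail(1)  # biggest sale per city

print(city_mean["Pune"])             # 150.0
print(summary.loc["Mumbai", "sum"])  # 200
print(sorted(top1["amount"]))        # [150, 200]
```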
Correlation & Relationships
Correlation matrix → relationships
df.select_dtypes("number").corr()
Target correlation → drivers
df.select_dtypes("number").corr()["amount"]
Covariance → joint variation
df.select_dtypes("number").cov()
Scatter plot → relation view
plt.scatter(df["x"], df["y"])
Pairplot → multi-feature view
sns.pairplot(df)
Heatmap → correlation visual
sns.heatmap(df.select_dtypes("number").corr())
Line fit → trend check
np.polyfit(x, y, 1)
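Correlation and a straight-line fit, sketched on fabricated data where the target is exactly linear in one feature:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "amount": [2, 4, 6, 8, 10],  # amount = 2 * x, so correlation is ~1.0
    "noise": [5, 1, 4, 2, 3],
})

corr = df.select_dtypes("number").corr()
target_corr = corr["amount"]  # correlation of every column with the target
slope, intercept = np.polyfit(df["x"], df["amount"], 1)

print(target_corr["x"])    # ~1.0
print(round(slope, 6))     # ~2.0
```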
Visual EDA
Histogram → distribution
df["amount"].hist()
Boxplot → outliers
df.boxplot(column="amount")
Bar plot → category counts
df["city"].value_counts().plot.bar()
Line plot → trends
df.plot(x="date", y="amount")
Countplot → frequency
sns.countplot(x="status", data=df)
Violin plot → density
sns.violinplot(x="status", y="amount", data=df)
Save plot → reuse
plt.savefig("plot.png")
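A headless sketch of the plotting flow, assuming the non-interactive Agg backend and an in-memory buffer instead of a file on disk:

```python
import io

import matplotlib
matplotlib.use("Agg")           # render without a display (e.g. on a server)
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"amount": [10, 20, 20, 30, 100]})

df["amount"].hist()             # distribution of a numeric column
buf = io.BytesIO()
plt.savefig(buf, format="png")  # plt.savefig("plot.png") writes to disk instead
plt.close()

print(buf.getbuffer().nbytes > 0)  # PNG bytes were produced
```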
Automated EDA & Export
Profile report → full EDA
from ydata_profiling import ProfileReport
ProfileReport(df).to_file("report.html")
Minimal profile → large data
ProfileReport(df, minimal=True)
Sweetviz → instant report
sv.analyze(df).show_html()
Missingno matrix → null pattern
msno.matrix(df)
Export CSV → cleaned data
df.to_csv("clean.csv", index=False)
Export Parquet → analytics-ready
df.to_parquet("clean.parquet")
Save stats → share insights
df.describe().to_csv("stats.csv")
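The export steps above, written to a temporary directory so the sketch is self-contained (filenames are placeholders):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"city": ["Pune", "Mumbai"], "amount": [100.0, 50.0]})

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "clean.csv")
stats_path = os.path.join(out_dir, "stats.csv")

df.to_csv(csv_path, index=False)    # cleaned data, no index column
df.describe().to_csv(stats_path)    # shareable summary statistics

round_trip = pd.read_csv(csv_path)  # verify the file reads back
print(round_trip.shape)             # (2, 2)
```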
Follow for more data analytics content
Pooja Pawar