0% found this document useful (0 votes)
14 views10 pages

Predicting Student Dropout Risk

Uploaded by

evoytquanta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views10 pages

Predicting Student Dropout Risk

Uploaded by

evoytquanta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Predicting Student Dropout

Risk
Student: D. Mithilesh

Submission Date: 28/09/2025

A comprehensive data science approach to identifying at-risk


students using machine learning algorithms

"Data reveals.
Action prevents…."
Project Introduction
Objective
Develop a predictive model to identify students at risk of dropping out using comprehensive
data analysis and machine learning techniques.

Dataset Overview
• Student records with attendance patterns
• Academic performance metrics
• Demographic and behavioral indicators

Tools & Techniques


→Platforms: Python, Excel, Google Sheets
• Algorithms: K-NN Classification
• Analysis: K-Means Clustering
Exploratory Data Analysis
Comprehensive analysis revealed critical patterns in student behavior and academic performance that correlate with dropout risk.

85% 2.8 42%


Attendance Rate Average GPA Risk Indicators
Average attendance among successful Mean GPA of continuing students Students showing multiple warning signs
students
K-NN Classification Methodology
K-Nearest Neighbors algorithm identifies dropout risk by analyzing similarity patterns between
students based on key performance indicators.

01

Data Normalization

Standardize features for fair comparison

02

Distance Calculation

Compute Euclidean distances between students

03

Neighbor Selection

Identify K closest similar students

04

Outcome Prediction

Classify based on majority neighbor outcomes


K-Means Clustering Methodology
Initialize Centroids
Place K random cluster centers in the data space

Assign Data Points


Group students to nearest centroid based on characteristics

Recompute Centers
Update centroid positions based on cluster members

Iterate Until Stable


Repeat process until clusters converge
K-NN Classification Results
The K-NN model achieved strong predictive accuracy, successfully identifying high-risk students with precision.

Student ID Distance Prediction

ST_001 0.23 High Risk

ST_002 0.45 Low Risk


87%
ST_003 0.31 High Risk

ST_004 0.67 Low Risk

ST_005 0.19 High Risk


Model Accuracy

Correct predictions on test data

82%

Precision Rate

True positive identification


K-Means Clustering Results
High Risk Cluster Moderate Risk Cluster
Students with low Average performers with
attendance (<60%) and inconsistent patterns.
declining grades. Requires Benefit from targeted
immediate intervention. support programs.

Low Risk Cluster


Strong academic performance with consistent engagement.
Minimal intervention needed.
Key Insights and Learnings

Attendance is the Early Detection


Strongest Predictor Enables Intervention

Students with <70% ML models identify at-


attendance show 3x risk students 2 semesters
higher dropout risk in advance

Multiple Factors Create Compound Risk

Combination of low grades and poor engagement amplifies


dropout probability
Challenges and Recommendations
Data Quality Challenges Future Recommendations
• Missing attendance records for 15% of students • Expand dataset to include 5+ years of historical
• Inconsistent grading scales across departments data

• Limited socioeconomic background data • Implement Random Forest and Neural Network
models
• Integrate real-time data collection systems
Conclusion and Impact
Project Summary
Successfully developed a predictive model achieving 87% accuracy in identifying student dropout
risk using K-NN classification and K-Means clustering techniques.

Broader Implications
AI-driven early warning systems can transform educational outcomes by enabling proactive
interventions, potentially saving thousands of academic careers annually.

References: Documentation, IIT-M DATA SCIENCE AND AI COURSE VIDEOS, Google AI.

TOOLS: ChatGPT, Chrome, MS Excel, MS PPT.

You might also like