1
STUDENT SCORE PREDICTION USING
SUPERVISED MACHINE LEARNING
Project report
TEAM MEMBERS:
MADHURA S (R21EJ112)
MALLESHWARI R (R21EJ113)
ANUSHA N (R21EJ096)
MANISHA S (R21EJ115)
SUBMITTED TO
MUTHIREDDY SIR
2
3
Problem statement
Abstract 3
Introduction 4
Objectives 5
Methodology 7
Tecnology Used 9
Implementation 10
Results 13
Conclusion 15
4
Problem Statement
Given Problem Statement
Scores of Student
1. Frame a simple linear regression model involving 2 variables and predict the
percentage of an student based on the hours of study.
2. What will be the predicted score, if a student study for 9.75 hrs/ day?
Abstract
Student Performance Analysis System is an emerging field and is very crucial to
schools and universities in helping their students and professors. Most of the pre-
existing methods are based only on past academic performance of students. This
system aims to develop models which can predict the student’s performance and
grades. It uses various machine learning and deep learning techniques to predict
the performance of students, and basic exploratory data analysis to derive various
correlations of student’s performance.
5
Student Performance Analysis
System is an emerging field and
is very crucial to
schools and universities in
helping their students and
professors. Most of the pre-
existing methods are based only
on past academic performance
of students. This
system aims to develop models
which can predict the student’s
performance and
grades
6
The project aims to predict a student’s performance by analysing behavioural
patterns and existing grades. It helps us to identify the success factors and success
blockers. It also helps us to figure out what changes can be made by the student in
his daily routine to improve his/her grades and additional practices that can be
implemented by the teacher to help him/her achieve their educational [Link]
project contains an Algorithm which analyses attributes like: existing
grades,absences, number of hours studied, TV time, siblings, etc. The Algorithm
then predicts an expected performance based on the attributes. The project is based
on Machine Learning and the algorithm used is Linear Regression. Linear
Regression Algorithm helps us to identify correlation betweenvariables or
attributes. The system thus intend to provide a most efficient combination of
attributes that help in the performance of a student.
Introduction
In With the advent of technology and sophistication of database management
resources, recently there has been interest in educational databases containing a
variety of valuable information which could help less sucessful students improve
their academic performance and help academic institutions optimize their resources
to improve overall wellbeing of their students. The objective of the task is to
7
predict post test scores of students using given set of features. The project aims to
predict a student’s performance based on input given by Teachers and Students.
The input given by Teachers include a student’s academicmarks, attendance,
failures, etc. The input given by Students include his or herday-to-day routine
activities like the amount of time he spends daily for studying,the amount of time
he spends going out with friends, the time taken to travel tohis/her academic
institution, etc. A Dataset is then formed by combining the two inputs of all the
students. Machine Learning Algorithm Linear Regression is then applied to this
Dataset and it then trains a model based on these values. With the model, we can
input the attributes of any particular student and predict his performance.
LITERATURE SURVEY
“Implementation of Student SGPA Prediction System (SSPS) Using Optimal
Selection of Classification Algorithm” [1] In today’s world, there is competition in
education institution every student plays a major role in the growth of the
institution. An algorithm such as Logistic Model Tree, Random tree, and REP tree
is used, the data set collected from the university may contain errors and noises
which make the model less effective so data cleaning is done and the data set will
reduce to 236 instances from 260 records. The REP tree algorithm has given more
8
accuracy with 61.70%. “Machine Learning Algorithm for Student’s Performance
Prediction” [2]. The performance can be improved by predicting their marks by
using the previous year’s marks and can groom the students to improve
themselves. By using machine learning techniques, we can improve the
performance of every student the dataset of 1170 data was collected from three
subjects. Algorithm such as KNearest Neighbors, SVC, Decision Tree Classifier,
and Linear Discriminant Analysis. The decision tree classifier model has given the
highest accuracy of 94.44%. [Link] © 2022 IJCRT | Volume 10, Issue 7
July 2022 | ISSN: 2320-2882 IJCRT2207282 International Journal of Creative
Research Thoughts (IJCRT) [Link] c141 “Prediction of Student’s
Performance by Modelling Small Dataset Size” [3] An educational institution’s
major objective is to give its students a high-quality education. Early performance
forecasting for students can help them earn better grades and get into prestigious
schools. The machine learning classification algorithm such as Naïve Bayes,
Support vector machines, K-nearest neighbor, and Linear discriminant analysis.
The Linear discriminant analysis has given accuracy of 79%. “Prediction of
Student Academic Performance Using Neural Network, Linear Regression, and
Support Vector Regression: A Case Study” [4] Institutions have a significant
9
impact on academic and pupil success. In the final year, pupils’ academic standing
has a big impact on their future jobs. The algorithm used is Neural Network (NN),
Support Vector Regression (SVR), and Linear Regression (LR). The dataset of 134
data was collected, the linear regression has shown more accuracy compared to
other algorithms.
Objectives
Machine learning techniques can be used to forecast the performance of the
students and identifying the at risk as early as possible so appropriate actions can
be taken to enhance their performance. The aim is to help the students to avoid
his/her predicted poor result using ML, were we are trying to find out student’s
current status and further predict his/her future results. This project focuses on
evaluating students’ capabilities in various subjects using a classification task. Data
classification has many approaches, and the decision tree method and probabilistic
classification method are utilized here. By performing this task, knowledge is
extracted that describes students’ performance in the end-semester examination.
This helps in identifying dropouts and students who require special attention,
enabling teachers to provide appropriate advising and counseling. In the context of
10
academic performance prediction, the goal is to perform regression analysis to
predict a student's percentage score based on the number of hours they have
studied. This involves using a dataset with two key columns: "Hours" representing
the study duration and "Scores" indicating the corresponding percentage score
obtained by the student in an examination.
Understand the Dataset & cleanup (if required).
Build Regression models to predict the student marks wrt multiple features.
Also evaluate the models & compare their respective scores
Methodology
Since universities are prestigious places of higher education, students’ retention in
these universities is a matter of high concern. It has been found that most of the
students’drop-out from the universities during their first year is due to lack of
proper support in undergraduate courses. Due to this reason, the first year of the
undergraduate student is referred as a “make or break” year. Without getting any
support on the course domain and its complexity, it may demotivate a student and
can be the cause to withdraw the course.
There is a great need to develop an appropriate solution to assist students retention
at higher education institutions. Early grade prediction is one of the solutions that
11
have a tendency to monitor students’ progress in the degree courses at the
University and will lead to improving the students’ learning process based on
predicted grades.
Using machine learning with Educational Data Mining can improve the learning
process of students. Different models can be developed to predict students’ grades
in the enrolled courses, which provide valuable information to facilitate students’
retention in those courses. This information can be used to early identify students
at-risk based on which a system can 1 suggest the instructors to provide special
attention to those students. This information can also help in predicting the
students’ grades in different courses to monitor their performance in a better way
that can enhance the students’ retention rate of the
universities.
Reading the dataset
Dependency of various features/attributes on the final grad
Removing the least Correlated attributes
Converting the data types
Splitting the data for training and testing
Prediction
Graphical Representation of the Result
Tools used
• Python
• Numpy ,Pandas,Seaborn
• Sklearn (Scikit-learn)
• Regression, including Linear and Logistic Regression
• Classification
• Clustering
12
• Model selection
• Preprocessing
Content diagram
Implementation
• Let’s consider a school and data of its students. We collect every single
minute information about the students and we put in an excel file. This gives
a shape to the data. Suppose that we have around thirty three different
attributes or features of data that we collected for every student. The first
step in our process is identifying the attributes. We should now consider
only those attributes that depend on the Grade of the student. To find out
13
these attribute we should find correlations. Correlations give the dependency
of a dependent variable on an independent variable. We then consider only
the attributes that are mostly related and discard the rest. We then change the
data types of the file as the system cannot compute multiple data types at
once. So we convert the data type of the file and send it for training and
testing. In training and testing, the data is partitioned randomly for training
and testing. The partitions are sent for training and testing respectively.
Then, it sent for a fit function using Linear Regression algorithm.
• For portraying accuracy it is shown using Boxplot
We aim to solve the problem statement by creating a plan of action,
Here are some of the necessary steps:
1. Data Exploration
2. Exploratory Data Analysis (EDA)
3. Data Pre-processing
4. Data Manipulation
5. Feature Selection/Extraction
6. Predictive Modelling
7. Project Outcomes & Conclusion
Exploratory Data Analysis
BAR PLOT
14
LINE PLOT
15
SCATTER PLOT
16
REGRESSION PLOT
17
LINEAR REGRESSION TESTED MODEL
18
ACTUAL VS PREDICTED DISTRIBUTION
PLOT
19
Conclusion
The accurate student academic performance prediction model is demand of every
educational institute nowadays. But to resolve the data quality issues in student
20
perfor-mance prediction model is often biggest challenge. This research
work, presented a student performance prediction model based on supervised
learning technique . In this project we did a analysis of what could be possible
factor on whether a student is likely to get a high score or a low score. The data
does not contain that much information but still we were able to predict a pretty
precise Linear Regression algorithm that predicts what score a student will get in
the foreseen feature by analysing the features. It is to my understanding that the
linear regression model is used to predict values with a given number of features.
In future the proposed model will be tested on large dataset with more number of
attributes.
The accurate student academic
performance prediction model is
demand of every
educational institute nowadays. But to
resolve the data quality issues in
student perfor-
21
mance prediction model is often
biggest challenge. This research
work, presented a
student performance prediction model
based on supervised learning technique
The accurate student academic
performance prediction model is
demand of every
educational institute nowadays. But to
resolve the data quality issues in
student perfor-
mance prediction model is often
biggest challenge. This research
work, presented a
student performance prediction model
based on supervised learning technique
22
The accurate student academic
performance prediction model is
demand of every
educational institute nowadays. But to
resolve the data quality issues in
student perfor-
mance prediction model is often
biggest challenge. This research
work, presented a
student performance prediction model
based on supervised learning technique
The accurate student academic
performance prediction model is
demand of every
educational institute nowadays. But to
resolve the data quality issues in
student perfor-
23
mance prediction model is often
biggest challenge. This research
work, presented a
student performance prediction model
based on supervised learning technique
24