
Toxic Comment Analysis Project Report

This document is a project report for analyzing toxic comments using data science. It includes an introduction describing the aims of analyzing toxicity in online comments. It then discusses the project background and system requirements. Several screenshots show the preprocessing of comment text, including removing punctuation and stopwords. Various machine learning models are tested for classifying comments by toxicity, including random forest models. The report concludes with discussing future scopes, such as expanding the system for more restaurants.

Uploaded by

Ali Asghar


PROJECT REPORT

Introduction To Data Science (CSL-487)

BS(CS)-6(A)
Project Title: TOXIC COMMENT ANALYSIS
Group Members
Name                 Enrollment
1. Huzaifa Muzaffar  02-134191-066
2. Ali Asghar        02-134191-121
3. Anas Shakeel      02-134191-047

Submitted to:
MISS SOOMAL FATIMA
BAHRIA UNIVERSITY KARACHI CAMPUS
Department of Computer Science

TABLE OF CONTENTS

1. ABSTRACT
2. INTRODUCTION
a. PROJECT AIMS & OBJECTIVES
b. BACKGROUND OF PROJECT
3. SYSTEM ANALYSIS
a. SOFTWARE REQUIREMENT SPECIFICATION
b. SOFTWARE & HARDWARE REQUIREMENTS
c. OPERATION ENVIRONMENT
4. SYSTEM IMPLEMENTATION
5. SCREENSHOTS
6. SYSTEM TESTING
7. CONCLUSION AND FUTURE SCOPE
8. REFERENCES

ABSTRACT

Online forums and social media platforms give individuals the means to put forward their thoughts
and freely express their opinions on various issues and incidents. In some cases, however, these
online comments contain explicit language that may hurt readers. Comments containing explicit
language can be classified into several categories, such as Toxic, Severe Toxic, Obscene, Threat,
Insult, and Identity Hate. The threat of abuse and harassment means that many people stop
expressing themselves and give up on seeking different opinions.

To protect users from exposure to offensive language on online forums and social media sites,
companies have started flagging comments and blocking users who are found to use abusive
language. Several machine learning models have been developed and deployed to filter out such
language and protect internet users from becoming victims of online harassment and cyberbullying.

INTRODUCTION
Our main area of focus is the study of negative online behaviors, such as toxic comments.

PROJECT AIMS AND OBJECTIVES

The aims and objectives are as follows:
• To learn how data preprocessing works.
• To analyze toxic comments.
• To determine the level of toxicity in comments.
• To find the best model for further analysis.

BACKGROUND OF PROJECT
According to Oxford Languages, a comment is a verbal or written remark expressing an opinion or
reaction. Whenever a comment is posted, it can harass someone, or it can be very motivating for
someone. The

SYSTEM IMPLEMENTATION

The Toxic Comment Classification Challenge is a competition organized by Jigsaw/Conversation AI
and hosted on Kaggle. The dataset for building the classification model was downloaded from the
competition site and includes both a training set and a test set. The workflow below describes
the entire process, from data pre-processing to model testing.

Data Exploration, Data Pre-processing, and Feature Engineering

Step 1: Checking for missing values.

After importing the training and test data into pandas DataFrames, I first checked the downloaded
data for missing values. Using the “isnull” function on both the training and the test data, I
found that there were no missing records, so I moved on to the next step of the project.
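A minimal pandas sketch of this check is shown here. It assumes the competition files are `train.csv` and `test.csv` (mentioned only in a comment) and, so that it runs on its own, substitutes small hypothetical frames for the real data; `count_missing` is an illustrative helper name, not part of the project code.

```python
import pandas as pd


def count_missing(df: pd.DataFrame) -> pd.Series:
    """Number of missing (null) values in each column."""
    return df.isnull().sum()


# In the project the frames would come from the competition files, e.g.
#   train = pd.read_csv("train.csv"); test = pd.read_csv("test.csv")
# Here tiny hypothetical frames are used so the sketch runs on its own.
train = pd.DataFrame({
    "id": ["a1", "a2"],
    "comment_text": ["Thanks for the edit.", "You are an idiot!"],
    "toxic": [0, 1],
})
test = pd.DataFrame({"id": ["b1"], "comment_text": ["Nice article"]})

for name, df in [("train", train), ("test", test)]:
    missing = count_missing(df)
    print(name, "missing values:", int(missing.sum()))
    if missing.any():
        # Show only the columns that actually contain nulls.
        print(missing[missing > 0])
```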

Step 2: Text Normalization.

Now that I was certain there were no missing records in the data, I started on data pre-processing.
First, I normalized the text, since comments from online forums usually contain inconsistent
language, special characters in place of letters (e.g. @rgument), and numbers representing letters
(e.g. n0t). To tackle such inconsistencies, I used regular expressions (regex). The text
normalization steps I performed are listed below:

• Removing characters in between text.
• Removing repeated characters.
• Converting data to lower-case.
• Removing punctuation.
• Removing unnecessary white spaces in between words.
• Removing “\n”.
• Removing non-English characters.
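The normalization steps above can be approximated with Python's `re` module, as in the sketch below. This is not the project's actual code: the order of the steps, the rule that collapses a character repeated three or more times, and the `LOOKALIKES` map used to undo obfuscations such as `n0t` and `@rgument` are all assumptions, and `normalize_comment` is a hypothetical name.

```python
import re

# Hypothetical look-alike map for obfuscated spellings such as "n0t" or "@rgument".
LOOKALIKES = {"@": "a", "$": "s", "0": "o", "1": "i", "3": "e"}


def _fix_lookalikes(token: str) -> str:
    """Replace look-alike symbols/digits, but only inside tokens that contain letters."""
    if re.search(r"[a-z]", token):
        return "".join(LOOKALIKES.get(ch, ch) for ch in token)
    return token  # leave pure numbers such as "2010" untouched


def normalize_comment(text: str) -> str:
    # Remove "\n" and convert to lower-case.
    text = text.replace("\n", " ").lower()
    # Remove non-English (non-ASCII) characters.
    text = re.sub(r"[^\x00-\x7f]", " ", text)
    # Undo look-alike substitutions token by token (split() also drops extra spaces).
    text = " ".join(_fix_lookalikes(tok) for tok in text.split())
    # Collapse any character repeated three or more times to one ("sooooo" -> "so").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Remove punctuation, leftover digits and other symbols: keep letters and spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Remove unnecessary white space between words.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_comment("Wow!!! This is n0t an @rgument\nSooooo BAD"))
```

Mapping look-alikes only inside tokens that already contain letters keeps plain numbers such as `2010` from being rewritten, though it can still mangle tokens like `1st`.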

Step 4: Stopwords Removal.

Stopword removal is one of the most important steps in text pre-processing for use cases that
involve text classification. Removing stopwords ensures that more focus is placed on the words
that define the meaning of the text.

i. To remove stopwords from my data, I used the “spaCy” library. spaCy provides a list of common
stopwords, “STOP_WORDS”, that can be used to remove stopwords from any textual data.

ii. Although spaCy’s list is quite extensive, I also searched for additional stopwords that might
be unique to my dataset.

iii. First, I added single-letter and two-letter words to the list of stopwords. While reading
through random comments in my dataset, I came across instances where single-letter or two-letter
words appeared without any context (e.g. “Wow such a lovely pillow w!!” or “He is such a happy guy
bb.”). To make sure that such words do not affect the performance of my model, I added them to
the list of stopwords. However, I made sure that words such as “me”, “am”, and “as”, or letters
such as “I” and “a”, were not added to the list of stopwords.
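The stopword logic described in steps i to iii might look like the sketch below. spaCy's `STOP_WORDS` set is imported from `spacy.lang.en.stop_words`; if spaCy is not installed, the sketch falls back to a small hypothetical subset so it still runs. `build_stopwords`, `remove_stopwords`, and `KEEP_SHORT` are illustrative names, and the comments are assumed to be normalized already (lower-case, no punctuation).

```python
try:
    # spaCy's built-in English stop-word set (a Python set of lower-case strings).
    from spacy.lang.en.stop_words import STOP_WORDS
except ImportError:
    # Hypothetical stand-in so the sketch still runs where spaCy is not installed.
    STOP_WORDS = {"a", "an", "the", "is", "he", "she", "such", "i", "me", "am", "as"}

# Short words the report explicitly keeps out of the additional stop-word list.
KEEP_SHORT = {"me", "am", "as", "i", "a"}


def build_stopwords(comments, base=STOP_WORDS):
    """Return the base stop words plus every 1- or 2-letter token found in the corpus.

    Assumes the comments are already normalized (lower-case, no punctuation).
    """
    short_tokens = {tok for text in comments for tok in text.split() if len(tok) <= 2}
    return set(base) | (short_tokens - KEEP_SHORT)


def remove_stopwords(text, stopwords):
    """Drop every token that appears in the stop-word set."""
    return " ".join(tok for tok in text.split() if tok not in stopwords)


corpus = ["wow such a lovely pillow w", "he is such a happy guy bb"]
stopwords = build_stopwords(corpus)
print([remove_stopwords(c, stopwords) for c in corpus])
```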

PROJECT DESIGN

Libraries are an essential part of Python:

The figure above shows summary information about the data. The output shows that there are no null
(missing) values in the data.

The figure above shows sample comments. We can see that the comments are not clean, so the text
needs to be cleaned.

Next, we check how many columns the data has and what percentage of the comments are toxic.
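A sketch of the column count and toxicity percentages follows, assuming the six label columns are named `toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, and `identity_hate` and hold 0/1 values. The sample rows and the helper name `toxicity_summary` are hypothetical, not taken from the dataset or the project.

```python
import pandas as pd

# The six label columns of the training data (assumed column names).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def toxicity_summary(train: pd.DataFrame) -> pd.Series:
    """Percentage of comments carrying each label, plus the share with any label."""
    pct = train[LABELS].mean() * 100  # labels are 0/1, so the mean is a rate
    pct["any_label"] = (train[LABELS].sum(axis=1) > 0).mean() * 100
    return pct.round(2)


# Tiny hypothetical sample standing in for train.csv.
train = pd.DataFrame({
    "id": ["c1", "c2", "c3", "c4"],
    "comment_text": ["thanks for the edit", "nice work",
                     "you stupid idiot", "get lost you fool"],
    "toxic":         [0, 0, 1, 1],
    "severe_toxic":  [0, 0, 1, 0],
    "obscene":       [0, 0, 0, 0],
    "threat":        [0, 0, 0, 0],
    "insult":        [0, 0, 1, 1],
    "identity_hate": [0, 0, 0, 0],
})

print("number of columns:", train.shape[1])
print(toxicity_summary(train))
```

Because each label is binary, the column mean is the fraction of comments with that label; the extra `any_label` entry counts comments that carry at least one label.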

Graphs are visual representations of data. The figure below shows a visual representation of the
data:

The following figure shows the early stages of data preprocessing, in which numbers, letters, and
punctuation have been removed.

We separate our dataset into six subsets, one per category: each subset contains the comment text plus one category label.
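Splitting the data this way turns the multi-label task into six independent binary classification problems, an approach often called binary relevance. A minimal pandas sketch, assuming the Kaggle column names (`comment_text` plus six 0/1 label columns); the helper `split_by_label` and the sample rows are hypothetical.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def split_by_label(train: pd.DataFrame) -> dict:
    """Return one DataFrame per label, each holding the comment text plus that label."""
    return {label: train[["comment_text", label]].copy() for label in LABELS}


train = pd.DataFrame({  # hypothetical sample rows
    "comment_text": ["thanks for the edit", "you stupid idiot"],
    "toxic": [0, 1], "severe_toxic": [0, 0], "obscene": [0, 0],
    "threat": [0, 0], "insult": [0, 1], "identity_hate": [0, 0],
})

subsets = split_by_label(train)
print(subsets["insult"])
```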

Importing the relevant packages for modeling
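The report names random forests as its model but does not say how the comment text was turned into features. The sketch below assumes TF-IDF features from scikit-learn's `TfidfVectorizer`, chained to a `RandomForestClassifier` with `make_pipeline`, and fits one pipeline per label. The hyperparameters, the helper `train_per_label`, and the four training rows are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def train_per_label(train: pd.DataFrame, n_estimators: int = 100) -> dict:
    """Fit one TF-IDF + random-forest pipeline per label (binary relevance)."""
    models = {}
    for label in LABELS:
        model = make_pipeline(
            TfidfVectorizer(),  # assumption: the report does not name its features
            RandomForestClassifier(n_estimators=n_estimators, random_state=42),
        )
        model.fit(train["comment_text"], train[label])
        models[label] = model
    return models


# Hypothetical, already-cleaned sample; every label has both a 0 and a 1.
train = pd.DataFrame({
    "comment_text": ["you are a wonderful person", "thanks for the helpful edit",
                     "you stupid idiot i will hurt you", "shut up you dirty hateful bigot"],
    "toxic":         [0, 0, 1, 1],
    "severe_toxic":  [0, 0, 1, 0],
    "obscene":       [0, 0, 0, 1],
    "threat":        [0, 0, 1, 0],
    "insult":        [0, 0, 1, 1],
    "identity_hate": [0, 0, 0, 1],
})

models = train_per_label(train, n_estimators=50)
for label, model in models.items():
    print(label, model.predict(["you stupid idiot"])[0])
```

Wrapping the vectorizer and classifier in one pipeline keeps the fitted vocabulary together with each model, which also makes the per-label models easy to pickle as single objects.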

Pickling the trained RandomForest models for all categories
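Pickling the trained models lets them be reused later without retraining. A minimal sketch with Python's standard `pickle` module follows; the file-naming scheme `<label>_rf.pkl`, the helpers `save_models` and `load_models`, and the tiny stand-in model are assumptions for illustration, not the project's code.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier


def save_models(models: dict, directory: str) -> None:
    """Pickle each trained model to <directory>/<label>_rf.pkl (hypothetical naming)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for label, model in models.items():
        with open(out / f"{label}_rf.pkl", "wb") as fh:
            pickle.dump(model, fh)


def load_models(labels, directory: str) -> dict:
    """Load the pickled models back into a {label: model} dictionary."""
    models = {}
    for label in labels:
        with open(Path(directory) / f"{label}_rf.pkl", "rb") as fh:
            models[label] = pickle.load(fh)
    return models


# Demo with a tiny stand-in model trained on hypothetical numeric data.
X = [[0.0], [1.0], [0.2], [0.9]]
y = [0, 1, 0, 1]
original = {"toxic": RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)}

with tempfile.TemporaryDirectory() as tmp:
    save_models(original, tmp)
    restored = load_models(["toxic"], tmp)

same_predictions = list(restored["toxic"].predict(X)) == list(original["toxic"].predict(X))
print("restored model agrees with original:", same_predictions)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.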

SYSTEM TESTING
The objective was to keep the program free from all sorts of bugs and errors. Judging by the web
page, the flow of the program was smooth and user-friendly. The program was tested with random
user inputs and selections, which showed that it behaves consistently. It was also handed to people
inside and outside the group to check how robust it was in unfamiliar hands, and their criticism
was welcomed. The program ran smoothly and without errors.
Implementation was carried out step by step, with attention paid to each part of the program. The
flow and presentation of the program were kept in mind throughout. Errors were removed, and the
program flowed as expected. Users who find the GUI hard to understand may still have difficulty
using the program.

CONCLUSION & FUTURE SCOPE

While working on the “Restaurant Web Page” project, we made progress by solving several problems.
Finding a solution to each problem was our biggest challenge and the most important lesson of the
experience. We hope that these problems and their solutions will help us in the future.
We learned how different types of properties can be implemented and programmed in a web page. We
also learned how a good program is written, how to convert real-life problems into solutions, and
how to understand and fully write a program architecture.
This project has a wide scope, as the number of restaurants is increasing rather than decreasing,
and every food center needs a management system to rely on. We hope that our project will be
updated over time as we move on to new heights in our lives.

