CERTIFICATE
It is certified that the work contained in the project report titled “Object
detection in autonomous vehicles using deep learning” by “Abhay
Singh, Aryan Kaushik, Kashish Bhardwaj, Lakshay Choudhary”
has been carried out under my/our supervision and that this work has
not been submitted elsewhere for a degree.
Signature of Supervisor(s)
Mr. KAPIL KUMAR
Assistant Professor
CSE Department
COER University
April, 2024
DECLARATION
We declare that this written submission represents our ideas in our own words and,
where others' ideas or words have been included, we have adequately cited and
referenced the original sources. We also declare that we have adhered to all principles
of academic honesty and integrity and have not misrepresented, fabricated or
falsified any idea/data/fact/source in our submission. We understand that any violation
of the above will be cause for disciplinary action by the Institute and can also evoke
penal action from the sources which have thus not been properly cited or from whom
proper permission has not been taken when needed.
Abhay Singh (223025001)
Aryan Kaushik (223025016)
Kashish Bhardwaj (223025036)
Lakshay Choudhary (223025040)
TABLE OF CONTENTS
CHAPTER
1. INTRODUCTION
1.1 APPLICATIONS OF OBJECT DETECTION IN AUTONOMOUS VEHICLES
1.1.1. Pedestrian Detection
1.1.2. Vehicle Detection
1.1.3. Obstacle Detection
1.1.4. Traffic Sign Recognition
1.2 TRADITIONAL MODELS
1.2.1 HISTOGRAM OF ORIENTED GRADIENTS (HOG)
1.2.2 Region-based Convolutional Neural Networks (R-CNN)
1.2.3 Fast R-CNN
1.2.4 Faster R-CNN
1.2.5 YOLO
2. REVIEW OF LITERATURE
2.1 HISTOGRAM OF ORIENTED GRADIENTS (HOG)
2.2 Region-based Convolutional Neural Networks (R-CNN)
2.3 Fast R-CNN
2.4 Faster R-CNN
2.5 YOLO
3. REPORT ON THE PRESENT INVESTIGATION
3.1 Methodology
3.1.1. Data Collection
3.1.2. Labeling Images
3.1.3. Downloading Required Model
3.1.4. Training the Model
3.1.5. Making Predictions
4. RESULT AND DISCUSSION
4.1 RESULT
4.2 Technologies Used
4.2.1 Python
4.2.2 YOLOv8 Model
4.2.3 Deep Learning
5. SUMMARY AND CONCLUSIONS
6. APPENDIX
7. REFERENCES
CHAPTER 1 INTRODUCTION
In this project, “Object detection in autonomous vehicles using deep learning”, we detect
objects in real time using the YOLOv8 algorithm. Object detection identifies the objects
within a frame and locates them. The terms object classification and object detection are
often confused. Object classification identifies an object and assigns it to a category, but
it does not locate the object, i.e. it does not tell us where the object is in the image.
Object detection, on the other hand, both identifies and locates the objects in a given
image. With the development of deep learning, object detection has become much more
accurate. Today, car manufacturers such as Tesla use object detection technology to build
self-driving cars: when the driver activates the Autopilot mode, the car can steer,
accelerate and brake by itself under the driver's supervision. We use the YOLOv8
algorithm because of its speed and accuracy.
Over the years, many models have been introduced for object detection, such as R-CNN
(Region-Based Convolutional Neural Network), which builds on CNNs (Convolutional
Neural Networks). However, these models cannot detect objects in real time, because each
detection takes a long time. YOLO was introduced to remove these issues: it detects
objects in real time, with very little computation time per image and high accuracy.
YOLO was introduced in 2015 by Joseph Redmon, Santosh Divvala, Ross Girshick, and
Ali Farhadi. It is a single-shot algorithm, which means that it detects objects in a single
pass through the network. The YOLO algorithm divides the input image into a grid of
cells; each cell predicts the bounding-box coordinates of objects whose centre falls in that
cell, together with a confidence score for the presence of an object and the class
probabilities.
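As a concrete illustration of the grid assignment described above, the sketch below finds the grid cell responsible for a ground-truth box and computes the regression targets for it. It follows the original YOLO (v1) formulation and assumes a 448 × 448 input and a 7 × 7 grid as in that paper; the function name `yolo_target` is hypothetical, and YOLOv8 itself assigns training targets differently.

```python
def yolo_target(box, img_w, img_h, S=7):
    """Return the grid cell (row, col) responsible for a ground-truth box and the
    YOLOv1-style regression targets for it.

    box is (x1, y1, x2, y2) in pixels. The targets are the box centre relative to
    the responsible cell (0..1) and the box width/height relative to the image.
    """
    cx = (box[0] + box[2]) / 2          # box centre in pixels
    cy = (box[1] + box[3]) / 2
    gx = cx * S / img_w                 # centre expressed in grid-cell units
    gy = cy * S / img_h
    col = min(int(gx), S - 1)           # clamp centres lying exactly on the far edge
    row = min(int(gy), S - 1)
    w = (box[2] - box[0]) / img_w
    h = (box[3] - box[1]) / img_h
    return (row, col), (gx - col, gy - row, w, h)


# A 100 x 200 px box on a 448 x 448 image with a 7 x 7 grid.
cell, target = yolo_target((200, 100, 300, 300), 448, 448)
print(cell, target)
```

Only the cell that contains the box centre is responsible for predicting that object, which is why the centre is encoded relative to the cell while the size is encoded relative to the whole image.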
The YOLO algorithm works in the following steps:
1. First, the input image is passed through a CNN, which extracts the features of the
image.
2. The extracted features are passed through further layers (fully connected layers in the
original YOLO) that predict the object probabilities and bounding-box coordinates.
3. These predictions are organised as a grid over the image: each grid cell is responsible
for predicting a set of bounding boxes and object probabilities.
4. Finally, a post-processing algorithm, non-maximum suppression (NMS), removes
overlapping boxes and keeps the box with the highest probability for each object.
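Step 4 can be sketched as greedy non-maximum suppression, using Intersection over Union (IoU) as the overlap measure. This is a plain-Python illustration rather than YOLOv8's own implementation; the helper names and the 0.5 IoU threshold are assumptions, and real detectors typically run NMS separately for each class.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Visits boxes from highest to lowest score and keeps a box only if it does not
    overlap an already-kept (higher-scoring) box by more than iou_threshold.
    Returns the indices of the kept boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first too much
```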
1.1 APPLICATIONS OF OBJECT DETECTION IN AUTONOMOUS VEHICLES
1. Pedestrian Detection:
- Object detection techniques are employed to identify pedestrians on roads,
sidewalks, or crosswalks.
- Detecting pedestrians allows autonomous vehicles to anticipate their movements,
maintain safe distances, and avoid collisions, thus enhancing road safety.
2. Vehicle Detection:
- Autonomous vehicles utilize object detection to detect and track other vehicles
sharing the road.
- Identifying nearby vehicles enables autonomous systems to maintain safe distances,
anticipate lane changes, and navigate through traffic flows smoothly.
3. Obstacle Detection:
- Object detection algorithms are employed to identify various obstacles such as
debris, construction zones, or stationary vehicles obstructing the roadway.
- Detecting obstacles allows autonomous vehicles to plan alternative routes, adjust
their speed, and avoid potential collisions, ensuring safe navigation.
4. Traffic Sign Recognition:
- Object detection is utilized to recognize and interpret traffic signs, including stop
signs, speed limit signs, and traffic signals.
- Identifying traffic signs enables autonomous vehicles to adhere to traffic
regulations, adjust their speed accordingly, and navigate intersections safely.
1.2 TRADITIONAL MODELS
1. Histogram of Oriented Gradients (HOG)
2. Region-based Convolutional Neural Networks (R-CNN)
3. Fast R-CNN
4. Faster R-CNN
5. YOLO
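1.2.1 HISTOGRAM OF ORIENTED GRADIENTS (HOG)
The Histogram of Oriented Gradients was first described in 1986 and is one of the oldest
feature-based methods for object detection. It was not popular at that time; it became
popular in 2005, when Navneet Dalal and Bill Triggs used it for human detection, and it
has since been used for many computer vision tasks. HOG extracts features that describe
the distribution of gradient directions in an image, and these features are then used to
detect objects.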
The working of HOG can be summarized in the following steps:
1. Compute the horizontal and vertical gradients of the image and divide the gradient
image into small cells of 8x8 pixels.
2. Each cell contains 64 gradient vectors. Their orientations are split into 9 angular bins
(covering 0 to 180 degrees), and each vector adds its magnitude to the histogram of the
cell. This reduces the 64 vectors of a cell to 9 values.
3. Once every cell has its 9-value histogram, neighboring cells are grouped into
overlapping blocks (typically 2x2 cells).
4. Finally, the feature vector of each block is normalized, and all block vectors are
concatenated to obtain the complete HOG feature vector.
LIMITATIONS –
• It is time-consuming.
• Its computational complexity is very high.
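The cell-histogram and block-normalization steps of HOG can be sketched in plain Python. This is an illustrative sketch under stated assumptions: a grayscale image given as nested lists, the centred [-1, 0, 1] gradient filter, 9 unsigned orientation bins with linear interpolation between neighboring bins, and L2 block normalization. The function names are hypothetical and not taken from any library.

```python
import math


def gradients(img):
    """Horizontal and vertical gradients with the centred [-1, 0, 1] filter.

    img is a grayscale image given as a list of rows. Pixels on the image border
    (where the filter does not fit) get a zero gradient in that direction.
    """
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(1, w - 1):
            gx[y][x] = float(img[y][x + 1] - img[y][x - 1])
    for y in range(1, h - 1):
        for x in range(w):
            gy[y][x] = float(img[y + 1][x] - img[y - 1][x])
    return gx, gy


def cell_histogram(gx, gy, y0, x0, cell=8, bins=9):
    """Reduce the 64 gradient vectors of one 8x8 cell to a 9-bin histogram.

    Orientations are unsigned (0-180 degrees); each pixel votes with its gradient
    magnitude, split linearly between the two nearest bin centres.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins                      # 20 degrees per bin
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            mag = math.hypot(gx[y][x], gy[y][x])
            ang = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180.0
            pos = ang / bin_width - 0.5           # bin centres at 10, 30, ..., 170
            lo = math.floor(pos)
            frac = pos - lo
            hist[lo % bins] += mag * (1.0 - frac)  # wraps around at 0/180 degrees
            hist[(lo + 1) % bins] += mag * frac
    return hist


def normalize_block(cell_hists, eps=1e-6):
    """Concatenate the histograms of one block (e.g. 2x2 cells) and L2-normalize."""
    v = [value for hist in cell_hists for value in hist]
    norm = math.sqrt(sum(value * value for value in v) + eps * eps)
    return [value / norm for value in v]


# 16x16 image with a vertical edge between columns 7 and 8.
img = [[0 if x < 8 else 100 for x in range(16)] for y in range(16)]
gx, gy = gradients(img)
cells = [cell_histogram(gx, gy, y0, x0) for y0 in (0, 8) for x0 in (0, 8)]
block = normalize_block(cells)
print(len(block))  # 36 = 4 cells x 9 bins
```

A vertical edge produces horizontal gradients (0 degrees), so their magnitude lands in the two bins adjacent to 0/180 degrees. In practice a library routine such as scikit-image's `skimage.feature.hog` would normally be used instead of hand-written loops.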
1.2.2 Region-based Convolutional Neural Networks (R-CNN)
It was introduced in 2014 and removed many of the issues present in HOG-based
detection. Instead of examining the whole image exhaustively, R-CNN uses the selective
search algorithm to generate about 2000 region proposals, i.e. candidate regions that are
likely to contain an object.
The working of R-CNN can be summarized in the following steps:
1. The selective search algorithm segments the image into many sub-regions and
merges them to produce the region proposals.
2. Once the region proposals are available, the next step is to extract their features: each
proposal is resized and passed through a pre-trained convolutional neural network,
which produces a feature vector for that region.
3. The final step is to make predictions: a classification model (class-specific linear
SVMs in the original paper) assigns an object class to each proposal, and a regression
model refines the bounding box of the proposed region.
LIMITATIONS –
• High memory consumption.
• R-CNN is slow, both during training and at detection time, because it passes each
region proposal through the CNN independently.
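The bounding-box regression in step 3 is usually expressed as four offsets between a proposal box P and its ground-truth box G. The sketch below uses the parametrization described in the R-CNN paper (centre shifts scaled by the proposal size, log-space width and height ratios); it is a plain-Python illustration and the function names `encode` and `decode` are hypothetical.

```python
import math


def encode(proposal, gt):
    """Regression targets (tx, ty, tw, th) that move a proposal box onto its
    ground-truth box. Both boxes are (cx, cy, w, h): centre, width, height."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw,          # centre shift, in units of proposal width
            (gy - py) / ph,          # centre shift, in units of proposal height
            math.log(gw / pw),       # log-space width scaling
            math.log(gh / ph))       # log-space height scaling


def decode(proposal, deltas):
    """Apply predicted deltas to a proposal box (the inverse of encode)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 45.0, 30.0, 40.0)
deltas = encode(proposal, ground_truth)   # what the regressor is trained to predict
print(deltas)
print(decode(proposal, deltas))           # recovers the ground-truth box (up to rounding)
```

Scaling the offsets by the proposal size makes the targets independent of how large the proposal is, which lets a single regressor handle boxes of very different sizes.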
1.2.3 Fast R-CNN
This model was introduced in 2015. In R-CNN, each of the roughly 2000 region
proposals generated by the selective search algorithm is passed through the CNN one by
one, which makes training and detection with R-CNN very complex and expensive. Fast
R-CNN was introduced to remove this problem: it passes the whole image through the
CNN only once and then extracts a fixed-size feature vector for every region proposal
from the shared feature map, using a Region of Interest (RoI) pooling layer.
LIMITATIONS –
• It struggles to detect small objects in the image.
• Training can be time-consuming when working with large datasets.
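The RoI pooling idea can be sketched as follows: every proposal, whatever its size, is max-pooled from the shared feature map into a grid of fixed size. This is a toy plain-Python sketch; the 2 × 2 output size and the 4 × 4 feature map are assumptions chosen for readability (the Fast R-CNN paper uses a 7 × 7 output for VGG-16), the function name is hypothetical, and the projection of proposals from image coordinates onto the feature map is omitted.

```python
import math


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of interest of a 2-D feature map to a fixed size.

    roi = (x0, y0, x1, y1) in feature-map cells, inclusive. The region is split
    into an out_h x out_w grid of (roughly equal) sub-windows and the maximum of
    each sub-window is kept, so every RoI yields an output of the same shape.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        ys = y0 + math.floor(i * roi_h / out_h)
        ye = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * roi_w / out_w)
            xe = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# One shared 4x4 feature map (computed once per image); two RoIs of different sizes.
fmap = [[4 * y + x for x in range(4)] for y in range(4)]
print(roi_max_pool(fmap, (0, 0, 3, 3)))
print(roi_max_pool(fmap, (1, 0, 3, 2)))
```

Both RoIs produce a 2 × 2 output even though they cover 4 × 4 and 3 × 3 cells, which is what allows the fully connected layers after RoI pooling to accept proposals of any size.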
1.2.4 Faster R-CNN
Faster R-CNN was introduced in 2015. Fast R-CNN was proposed to remove the issues of
R-CNN, but it still had a bottleneck of its own, and Faster R-CNN was introduced to
remove it. Fast R-CNN still uses the selective search algorithm to compute the region
proposals; Faster R-CNN replaces this step with a Region Proposal Network (RPN),
which reduces the marginal cost of computing proposals to about 10 ms per image. The
RPN is a small convolutional network that slides over the feature map produced by the
backbone CNN. At each sliding-window position it places multiple anchors: reference
boxes centred on that position with different scales and aspect ratios (9 per position in
the original paper). These anchors are passed to a classification layer, which predicts
whether each anchor contains an object, and a regression layer, which refines the anchor
into a bounding box.
LIMITATION –
• It may not be fast enough for real-time applications because of its two-stage process.
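Anchor generation can be sketched as below, assuming the settings reported in the Faster R-CNN paper for a VGG-16 backbone: a feature-map stride of 16 pixels, three scales (128, 256 and 512 pixels) and three aspect ratios (1:2, 1:1, 2:1). Placing each anchor at the centre of its stride-sized cell is one common convention; the function name `make_anchors` is hypothetical.

```python
import math


def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate RPN-style anchors as (x1, y1, x2, y2) boxes in image pixels.

    For every position of a feat_h x feat_w feature map, one anchor is created
    per (scale, aspect ratio) pair, centred on that position. A scale s gives an
    anchor of area s*s, and a ratio r is height / width.
    """
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            cx, cy = (fx + 0.5) * stride, (fy + 0.5) * stride   # position in the image
            for s in scales:
                for r in ratios:
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(2, 3)
print(len(anchors))   # 2 * 3 positions x 9 anchors = 54
```

The RPN's classification and regression layers then output one objectness score and one set of box offsets per anchor.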
1.2.5 YOLO
YOLO (You Only Look Once) was introduced by Joseph Redmon, Santosh Divvala,
Ross Girshick, and Ali Farhadi in 2015. It is a real-time object detection algorithm that
uses deep learning to detect objects in images or videos. YOLO processes one image or
video frame at a time and predicts the location and class of the objects in it in a single
pass. It uses a convolutional neural network (CNN) to extract features from the input
image and then regresses the bounding boxes and class probabilities of the objects
directly from those features. YOLO is known for its speed and accuracy, making it a
popular choice for real-time object detection applications.
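For the original YOLO model, this prediction can be written down concretely. Each of the S × S grid cells predicts B boxes, each with four coordinates and a confidence score, plus C conditional class probabilities, so the network outputs an S × S × (5B + C) tensor. At test time the class-specific confidence of a box is computed as below; this is the formulation of the original YOLO paper, and later versions such as YOLOv8 use a different head design.

```latex
\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}
  = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}
```

With S = 7, B = 2 and C = 20 (the PASCAL VOC setting used in the paper), the output tensor is 7 × 7 × 30.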
CHAPTER 2 REVIEW OF LITERATURE
2.1 Histogram of Oriented Gradients (HOG)
Popularized by Navneet Dalal and Bill Triggs in 2005, the Histogram of Oriented
Gradients (HOG) method revolutionized object detection by efficiently capturing local
gradient information in images. Although the idea was first described in 1986, it gained
widespread recognition only after their work, owing to its effectiveness in various
computer vision tasks.
2.2 Region-based Convolutional Neural Networks (R-CNN)
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik introduced R-CNN in
2014. This model marked a significant shift in object detection by incorporating deep
learning techniques, particularly Convolutional Neural Networks (CNNs). By leveraging
selective search algorithms and CNNs, R-CNN significantly improved detection
accuracy.
2.3 Fast R-CNN
Building upon R-CNN, Ross Girshick introduced Fast R-CNN in 2015. This model
addressed the computational inefficiencies of R-CNN by proposing a unified
framework for object detection. By processing entire images in one pass through a
CNN, Fast R-CNN achieved faster inference speeds and improved training efficiency.
2.4 Faster R-CNN
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun introduced Faster R-CNN in
2015 as an evolution of Fast R-CNN. This model introduced a Region Proposal Network
(RPN), eliminating the need for selective search algorithms. By integrating RPN into the
detection pipeline, Faster R-CNN further improved speed and accuracy by generating
region proposals dynamically.
2.5 YOLO (You Only Look Once)
Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi introduced YOLO in
2015 (published at CVPR 2016), revolutionizing real-time object detection. By
processing entire images in one pass through a CNN and directly predicting bounding
boxes and class probabilities, YOLO achieved remarkable speed and accuracy, making it
ideal for real-time applications like autonomous vehicles and surveillance systems.
CHAPTER 3 REPORT ON THE PRESENT INVESTIGATION
We use the YOLOv8 model for object detection because of its speed and accuracy.
The steps we followed to complete the project are described below.
3.1 Methodology
3.1.1 Data Collection:
This step involves gathering images of vehicles from online sources like Pexels and
Pixabay. These platforms offer a wide variety of high-quality images that can be used for
training an object detection model. It is important to collect a diverse set of images that
cover different types of vehicles, backgrounds, lighting conditions, and angles to ensure
that the model generalizes well to real-world scenarios.
3.1.2. Labeling Images:
Labeling is the process of marking images with bounding boxes that indicate the
location of objects (in this case, vehicles) within the image. The labelImg tool is
commonly used for this purpose. It allows users to open images in a graphical interface
and draw bounding boxes around objects. For each labeled image, an XML file is saved
that contains the coordinates of the bounding boxes and the corresponding object classes.
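labelImg writes Pascal VOC XML by default (it can also save YOLO text labels directly), whereas the Ultralytics YOLOv8 trainer reads one text file per image, with one line per object in the form `class_id x_center y_center width height`, normalized to 0..1. A plain-Python conversion sketch, assuming the standard VOC tags (`size`, `object`, `name`, `bndbox`) and a hypothetical class list; `voc_to_yolo` is not part of either tool.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text, class_names):
    """Convert one Pascal VOC annotation (the XML written by labelImg) into
    YOLO label lines: 'class_id x_center y_center width height', with all
    coordinates normalized by the image size to the range 0..1."""
    root = ET.fromstring(xml_text)
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        class_id = class_names.index(obj.find("name").text)   # ValueError for an unknown class
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(bb.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


sample = (
    "<annotation><size><width>640</width><height>480</height><depth>3</depth></size>"
    "<object><name>car</name><bndbox><xmin>100</xmin><ymin>120</ymin>"
    "<xmax>300</xmax><ymax>360</ymax></bndbox></object></annotation>"
)
print(voc_to_yolo(sample, ["car", "truck", "bike"]))
```

The returned lines would be written to a `.txt` file with the same base name as the image.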
The labeled images are then divided into three subsets: training, validation, and testing.
The training set is used to train the model, the validation set is used to evaluate the
model during training and to tune its hyperparameters, and the test set is held out to
assess the final model's generalization ability.
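A reproducible split can be sketched as follows. The 70/20/10 ratios, the seed and the file names are hypothetical choices, not values from this project; in a real dataset each image's label file must be moved together with the image.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items reproducibly and split them into train/val/test lists.
    Whatever is left after the train and val shares becomes the test set."""
    items = sorted(items)                  # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)     # local RNG, independent of global state
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


images = [f"img_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed means the same images end up in the test set every time, so results from different training runs remain comparable.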
3.1.3. Downloading Required Model:
Before training the object detection model, it's necessary to have the appropriate
software and libraries installed. In this case, the ultralytics package needs to be installed
using pip.
This package provides implementations of various deep learning models, including
YOLOv8, which is a popular architecture for object detection tasks.
3.1.4. Training the Model:
Training the object detection model involves feeding the labeled images into the
YOLOv8 architecture and adjusting the model's parameters to minimize the difference
between the predicted bounding boxes and the ground truth bounding boxes.
The [Link] file contains the code for configuring the training process, including
specifying hyperparameters such as learning rate, batch size, and number of epochs.
During training, the model learns to recognize vehicles in the images and predict their
bounding boxes.
The output of this step is a trained model file named [Link], which contains the
learned weights and parameters of the model.
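A minimal sketch of this step with the Ultralytics Python API, assuming the dataset is laid out as `images/{train,val,test}` with matching `labels/` folders. The class names, dataset root and hyperparameter values are hypothetical, the helper names `data_yaml` and `train_yolov8` are illustrative, and the project's own training file may differ.

```python
from pathlib import Path


def data_yaml(dataset_root, class_names, out_path=None):
    """Build (and optionally write) a minimal Ultralytics dataset file:
    the dataset root, the image folder of each split, and the class names."""
    lines = [f"path: {dataset_root}", "train: images/train", "val: images/val",
             "test: images/test", "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    text = "\n".join(lines) + "\n"
    if out_path is not None:
        Path(out_path).write_text(text)
    return text


def train_yolov8(data_file, epochs=50, batch=16, imgsz=640, lr0=0.01):
    """Fine-tune a pretrained YOLOv8 checkpoint; needs `pip install ultralytics`."""
    from ultralytics import YOLO          # imported here so data_yaml() works without it
    model = YOLO("yolov8n.pt")            # smallest pretrained YOLOv8 detector, downloaded on first use
    return model.train(data=data_file, epochs=epochs, batch=batch, imgsz=imgsz, lr0=lr0)


print(data_yaml("datasets/vehicles", ["car", "truck", "bike"]))
```

Usage: `data_yaml("datasets/vehicles", ["car", "truck", "bike"], "data.yaml")` followed by `train_yolov8("data.yaml")`. Here `epochs`, `batch` and `lr0` correspond to the number of epochs, batch size and learning rate mentioned above.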
3.1.5. Making Predictions:
Once the model is trained, it can be used to detect vehicles in new images or videos. In
this case, the model is applied to a video file (test2.mp4) to identify vehicles.
The yolo command is used to perform object detection using the trained YOLOv8 model.
Parameters such as the model file ([Link]), confidence threshold (conf), and source
file (test2.mp4) are specified to customize the detection process.
The output of this step is typically a new video file or images with bounding boxes
drawn around the detected vehicles, providing visual confirmation of the model's
performance.
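The effect of the confidence threshold, and the shape of the prediction command, can be sketched as follows. The detections, the 0.5 threshold and the weights path are hypothetical placeholders. The command follows the Ultralytics `yolo TASK MODE key=value` CLI syntax, and the equivalent Python call would be `YOLO(weights).predict(source=..., conf=..., save=True)`.

```python
def filter_by_confidence(detections, conf=0.5):
    """Keep detections whose confidence score exceeds the threshold.
    Each detection is (class_name, score, (x1, y1, x2, y2))."""
    return [d for d in detections if d[1] > conf]


def predict_command(weights, source, conf=0.5):
    """Argument list for an Ultralytics CLI prediction run, e.g. for subprocess.run."""
    return ["yolo", "detect", "predict", f"model={weights}", f"source={source}", f"conf={conf}"]


dets = [("car", 0.91, (10, 20, 110, 90)), ("truck", 0.42, (200, 40, 380, 160)),
        ("car", 0.67, (400, 50, 470, 100))]
print(filter_by_confidence(dets))                      # the 0.42 truck is dropped
print(" ".join(predict_command("path/to/weights.pt", "test2.mp4")))
```

Raising `conf` removes more uncertain detections (fewer false positives) at the risk of missing real vehicles; lowering it does the opposite.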
Fig. 3.1: Flow chart of how the model is built
CHAPTER 4 RESULT AND DISCUSSION
4.1 RESULT
YOLO is a popular deep learning model for object detection in autonomous vehicles.
In the original YOLO paper, which compared YOLO with Faster R-CNN on the PASCAL
VOC 2007 dataset, YOLO ran at 45 frames per second, much faster than Faster R-CNN,
with somewhat lower but competitive accuracy.
Fig. 4.1: Detection of cars and a truck, with the confidence score of each detection.
Fig. 4.2: Detection of cars, with the confidence score of each detection.
4.2 Technologies Used:
4.2.1 Python
Python plays a crucial role in the application of YOLO (You Only Look Once), a
real-time object detection system, due to its versatility, extensive library support, and
ease of use. Python is often used to integrate, customize, and extend the YOLO codebase,
allowing developers to tailor the model to specific use cases. It is also employed for data
preprocessing and augmentation, using libraries such as NumPy, OpenCV, and PIL for
tasks like resizing, cropping, and applying transformations to images. Furthermore, deep
learning libraries such as TensorFlow and PyTorch are commonly used from Python for
training, fine-tuning, and inference with YOLO models (the Ultralytics YOLOv8
implementation is built on PyTorch), while visualization libraries like matplotlib and
seaborn aid in model analysis and performance visualization. The rich Python
community and availability of resources further make it an attractive choice for working
with YOLO.
4.2.2 YOLOv8 Model
The YOLOv8 model, an advanced iteration of the YOLO (You Only Look Once) series,
is widely employed for real-time object detection across diverse applications. Its
real-time detection capability makes it well-suited for applications such as autonomous
vehicles, surveillance systems, and robotics, where rapid and accurate object detection is
essential. The model's versatility extends across various domains including industrial
automation, retail analytics, security systems, medical imaging, and sports analytics.
YOLOv8 also localizes objects within images accurately, an important feature for
applications such as medical imaging and quality control in manufacturing. Its
scalability (it is available in several model sizes, from the small YOLOv8n to the large
YOLOv8x) and efficiency make it suitable for deployment in both high-powered server
environments and resource-constrained edge devices, contributing to its wide
applicability.
4.2.3 Deep Learning
Deep learning plays a critical role in object detection by leveraging complex neural
network architectures to automatically extract features from images, train object
detection models, and significantly enhance detection accuracy compared to traditional
computer vision methods. Through techniques like convolutional neural networks, deep
learning models can efficiently learn hierarchical representations of data, enabling
precise localization of objects and the ability to detect a wide range of object classes
across diverse domains such as healthcare, autonomous driving, and surveillance.
Object detection is a critical component of autonomous vehicles, as it enables the vehicle
to perceive and react to its surroundings. Deep learning has shown great potential for
accurate and efficient object detection in autonomous vehicles, as demonstrated by
various studies that have applied deep learning models such as Faster R-CNN and YOLO.
One of the key benefits of deep learning-based object detection is its capacity to handle
complex and diverse objects, such as pedestrians, automobiles, and traffic signs, in a
variety of environmental situations, such as different lighting, weather, and road
conditions. Deep learning models can learn to detect and classify these objects from their
features and patterns in large datasets, enabling the autonomous vehicle to make
informed decisions and respond appropriately.
Another advantage of deep learning-based object detection is its adaptability and
scalability. Deep learning models can be trained on diverse datasets and can be fine-tuned
or transferred to different domains or tasks, enabling the autonomous vehicle to detect
new objects or respond to new situations. Additionally, deep learning models can be
optimized for different hardware platforms, such as GPUs and embedded systems, to
achieve real-time performance and low power consumption. However, deep
learning-based object detection in autonomous vehicles also has some challenges and
limitations. One challenge is the need for large and diverse datasets for training and
evaluation, which may require significant resources and time. Another challenge is the
limited interpretability and transparency of deep learning models, which may affect their
trustworthiness and accountability.
CHAPTER 5 SUMMARY AND CONCLUSIONS
This project, “Object detection in autonomous vehicles using deep learning”, focuses on
detecting objects such as cars, bikes and trucks in real time with the help of the YOLOv8
model. Autonomous vehicles can drive without a human driver and aim to offer
passengers better safety and comfort. The safety of their propulsion and their ability to
avoid causing traffic accidents are the two most crucial factors for autonomous cars; both
depend on the functional safety of the vehicle's systems and devices. Object detection is a
critical component in enabling autonomous vehicles to perceive and interact with their
environment. In recent years, deep learning-based approaches have shown significant
improvements in object detection accuracy and speed. We propose a method for object
detection in autonomous vehicles using the YOLOv8 model. Our approach achieves high
accuracy and speed, making it suitable for real-time applications in autonomous
vehicles. In this project, we first collected images of vehicles from online sources such as
Pexels and Pixabay. We then labeled the images by marking bounding boxes that indicate
the location of the objects (in this case, vehicles) within each image. We chose the
YOLOv8 model for object detection because it can detect objects in real time with high
speed and accuracy. We then trained the model; once trained, it can detect vehicles in
new images or videos.
The purpose of our project is to detect objects for autonomous vehicles using deep
learning with the YOLOv8 model. Using the YOLOv8 algorithm, we are able to detect
vehicles in images and predict their bounding boxes and class probabilities
simultaneously, although the model does not achieve 100% accuracy.
CHAPTER 6 APPENDIX
YOLO is a real-time object detection system that treats object detection as a regression
problem, which makes it faster than object detection methods that use a two-stage
approach (e.g., R-CNN, Fast R-CNN, and Faster R-CNN). YOLOv8 was released by
Ultralytics in 2023. Many of the techniques that recent YOLO versions build on were
brought together in YOLOv4 (2020), including:
1. Bag of Freebies (BoF): A collection of techniques that improve the model's
performance without increasing its inference cost. These techniques include data
augmentation (e.g., Mosaic and CutMix), DropBlock regularization, and class label
smoothing.
2. Bag of Specials (BoS): A set of modules that add a small amount of inference cost but
significantly improve the model's accuracy. Examples include the Mish activation
function, the SPP block, and the PANet path-aggregation block.
3. Mish Activation Function: A self-regularized non-monotonic activation function that
improves the model's generalization ability and performance.
4. SPP-Block: The Spatial Pyramid Pooling block helps the model handle objects of
different scales and aspect ratios.
5. Path Aggregation Network (PANet): A module that aggregates features from different
scales and improves the model's ability to detect small objects.
6. CSPNet: Cross-Stage Partial Network, which is used to build the backbone of the
YOLOv4 model. CSPNet helps to reduce the computational complexity and improve
the model's performance.
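Item 3 can be made concrete. Mish is defined as mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The plain-Python sketch below uses illustrative function names; deep learning frameworks provide their own implementations (for example `torch.nn.Mish` in PyTorch).

```python
import math


def softplus(x):
    """ln(1 + e^x), rearranged so that large positive x does not overflow exp()."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"mish({x:+.1f}) = {mish(x):+.4f}")
```

The printed values show the non-monotonic shape: mish(-5) is closer to zero than mish(-1), even though -5 is the smaller input, while for large positive x the function approaches x, as ReLU does.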
REFERENCES
1. YouTube: [Link]
2. Deep learning module: [Link]
3. [Link]
4. [Link]
5. [Link]
6. Muhammad Azriyahya, “Deep Learning for Object Identification in LiDAR for
Autonomous Vehicles,” 2020 IEEE 10th International Conference on System
Engineering and Technology (ICSET), Shah Alam, Malaysia, 9 November 2020.
7. Ruturaj Kulkarni, “Traffic Light Detection and Recognition for Self-Driving
Vehicles using Deep Learning,” 2018 IEEE Fourth International Conference on
Computing, Communication, Control, and Automation (ICCU).