
Pothole Detection with YOLOv8 Model

Uploaded by

Kids Network
Copyright
© © All Rights Reserved

Title: Pothole Detection Using YOLO for Road Safety and Security

Abstract: Road accidents are one of the common problems faced by the people of our
country. According to the Ministry of Road Transport and Highways (MoRTH), India
reported around 4,61,312 road accidents in 2022, resulting in around 1.7 lakh fatalities
and 4.5 lakh injuries. The number of road accidents in 2022 increased by around 12%
compared to 2021.
One of the several reasons for this is the presence of potholes on the roads. Detecting
potholes manually is very time-consuming, and manual inspection cannot find all of them.
The proposed model aims to provide a better system for detecting potholes on highways
as well as on muddy roads. Images are collected from the internet and then annotated
with labels to create a dataset. A YOLOv8 model is trained on this dataset to identify
whether the roads have potholes or not.
1. Introduction :
Roads are one of the essential means of transportation in a
country. The GDP (Gross Domestic Product) of a country is closely linked with the health of
its roads. Poor road infrastructure can be disastrous and even fatal for the safety
and security of the passengers as well as vehicles. Road surfaces, usually asphalt, can develop
depressions or cavities called potholes as a result of pavement deterioration. They present
serious risks to both cars and pedestrians, and they are a prevalent problem on roadways all
over the world, including India.
Pothole creation is a multi-step process. Roads first acquire minor cracks as a result of age,
heat expansion and contraction, and frequent traffic loads. Water then penetrates through these
fissures and reaches the pavement's underlying layers. In colder areas, the water can freeze and
expand, causing the pavement to bulge and fracture even more. When the ice melts, it creates
cavities beneath the surface that collapse under vehicles, resulting in potholes. Even in warmer
climates like India, water infiltration weakens the foundation layers, and vehicle pressure can
cause the pavement to break apart, resulting in potholes.
Potholes arise as a result of a variety of causes. Poor drainage systems cause water to collect
on or beneath the road surface, hastening degradation. The use of poor construction materials
can result in early road failure. Potholes are common in India, especially during the monsoon
season. Heavy rainfall causes water collection on roads, and poor drainage exacerbates the
problem.
Road maintenance organizations currently spend many working hours evaluating road
conditions. The cost-effectiveness of patching is influenced by material, labour, and equipment
costs. Assessing damage from the acquired data is vital to future maintenance
decisions. An automated system for forecasting future deterioration rates and budgets is
needed. The need for speedy, precise evaluation of road distress is growing.

Figures: Highway pothole road; muddy pothole road

2. Proposed Methodology :
The custom dataset is created from images taken from different sources on the
internet. The dataset is then divided into train, test and validation sets. A YOLOv8 model
is trained on this dataset, which produces an object detection model for pothole
detection along with performance metrics such as accuracy, precision, recall, F1 score
and mAP. We also monitor the training and validation losses. The resulting custom
weights are then loaded in an IDE (integrated development environment) such as Google
Colaboratory, PyCharm or Jupyter Notebook, where a Python script runs the model on an
input video and generates an output video in which the potholes detected in each frame
are marked with bounding boxes. This makes pothole detection simpler, more efficient
and less time-consuming than doing it manually or using less efficient and less accurate
models.
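The workflow above can be sketched with the Ultralytics Python package. This is a minimal sketch under stated assumptions, not the authors' script: the folder layout (`train/images`, `valid/images`, `test/images`), the file names, the starting weights `yolov8n.pt`, and the values for `epochs`, `imgsz` and `conf` are all illustrative choices that do not appear in the original work.

```python
from pathlib import Path


def write_data_yaml(root: str, out_path: str = "pothole.yaml") -> str:
    """Write a minimal YOLO dataset config with a single 'pothole' class.

    The sub-folder names are assumptions; adjust them to the actual export.
    """
    text = (
        f"path: {root}\n"
        "train: train/images\n"
        "val: valid/images\n"
        "test: test/images\n"
        "names:\n"
        "  0: pothole\n"
    )
    Path(out_path).write_text(text, encoding="utf-8")
    return text


def train_and_detect(data_yaml: str, video: str, epochs: int = 50):
    """Fine-tune a YOLOv8 model and run it on a video.

    Requires `pip install ultralytics`; imported lazily so that
    write_data_yaml() can be used without the package installed.
    """
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # pretrained nano weights (assumed starting point)
    model.train(data=data_yaml, epochs=epochs, imgsz=640)
    metrics = model.val()  # precision, recall and mAP on the validation split
    # save=True writes an annotated copy of the video with bounding boxes drawn.
    model.predict(source=video, save=True, conf=0.25)
    return metrics
```

A typical call would be `write_data_yaml("datasets/potholes")` followed by `train_and_detect("pothole.yaml", "input.mp4")`, where both paths are hypothetical.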
 Data Acquisition :
The training dataset affects the model's dependability and performance. Realistic pothole photos
must be included in the dataset. The dataset that we have used consists of 693 images divided into
a training set of 493 images, a validation set of 133 images and a test set of 67 images. The images
include lighting changes, shadows, and moving cars, reflecting real-world situations near potholes.
Because they were gathered from internet sources, the photos are low-quality and noisy.

 Data Preprocessing and Splitting :

The custom dataset is created by first collecting images from across the internet and then
annotating them with Roboflow, which provides the labels for the potholes in each of the 693
images. Annotation is an essential step in creating a custom dataset, because it marks and labels
the potholes in each image and so supplies the ground truth on which the model is trained.
The 693 images are split into train (493), valid (133) and test (67) image sets.
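A reproducible way to obtain the 493/133/67 split is sketched below. It is only an illustration: the file names are hypothetical, and the fixed random seed is an assumption, since the original work does not say how images were assigned to each set.

```python
import random


def split_dataset(items, n_train=493, n_val=133, n_test=67, seed=0):
    """Shuffle the items once and cut them into train/val/test lists."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 693 annotated images.
names = [f"img_{i:03d}.jpg" for i in range(693)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))  # 493 133 67
```

The three sets are disjoint by construction, which keeps test images from leaking into training.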

 YOLO :
YOLO (You Only Look Once) was first introduced in 2016 by Redmon et al. In order to detect an
object and predict its bounding box coordinates, it splits the input image into S×S grid cells.
Along with the class label, each predicted bounding box carries its centre coordinates (x, y),
height (h), width (w), and a confidence score. The confidence score reflects both how likely
the box is to contain an object and how well the predicted box overlaps the actual labelled
box, measured by intersection over union (IoU). Unlike algorithms that need to repeatedly
scan an input image, YOLO can detect, classify, and localize many objects in a single pass.
Now known as YOLOv1, this algorithm was among the first deep learning detectors to run in
real time. YOLOv1 shows certain limitations when detecting small and crowded objects. These
drawbacks were addressed in 2016 in YOLO9000, also called YOLOv2. Increased speed and
precision were among the notable aspects of the upgraded version, which incorporated
techniques such as anchor boxes and batch normalization. YOLOv3, described in
"YOLOv3: An Incremental Improvement" and more accurate than earlier iterations, was
proposed in 2018. It is reported to match detectors such as SSD and RetinaNet in accuracy
while running considerably faster.

Released on April 23, 2020, by Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark
Liao, YOLOv4 reports a 12% increase in frames per second and a 10% improvement in average
precision compared to YOLOv3. Tiny YOLOv4 is a compressed version of the original YOLOv4
that serves as a simpler and quicker real-time object detector. After the YOLOv4 release,
Glenn Jocher of Ultralytics published YOLOv5 in 2020 as an open-source project that builds
on the popularity of earlier iterations. It differs from previous models in that it is
implemented in PyTorch. YOLOv6, released in 2022, concentrated on inference efficiency and
uses a re-parameterizable backbone (EfficientRep). YOLOv7 was also released in 2022; one of
its main enhancements is the Extended Efficient Layer Aggregation Network (E-ELAN).
Ultralytics then created YOLOv8, a state-of-the-art (SOTA) model that builds on earlier
iterations of YOLO and adds new features and enhancements to increase performance and
adaptability even further. YOLOv8 is a good option for a variety of object detection, image
segmentation, and image classification tasks because of its quick, accurate, and
user-friendly design.
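Since the confidence score and the later evaluation metrics both rest on how well a predicted box overlaps a labelled box, a small intersection-over-union (IoU) computation may help. The sketch below assumes boxes in the (x centre, y centre, width, height) form described above; the function name is illustrative.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as (x_center, y_center, width, height)."""

    def corners(box):
        x, y, w, h = box
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


# Two unit squares offset by half a width overlap in one third of their union.
print(iou_xywh((0.5, 0.5, 1.0, 1.0), (1.0, 0.5, 1.0, 1.0)))
```

A prediction is usually counted as a true positive when its IoU with a labelled box exceeds a chosen threshold such as 0.5.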

 Architecture :

The Backbone, Neck, and Head are the three main parts of the modular design that forms the
basis of YOLOv8. The backbone extracts rich hierarchical features from the input image.
In YOLOv8 this is accomplished with the C2f module, a refinement of the C3 module present
in YOLOv5. The C2f structure (a faster CSP bottleneck with two convolutions) lets the model
capture intricate spatial hierarchies in the image while maintaining a small footprint. It
splits the input, passes one part through a number of lightweight bottleneck layers, and
finally concatenates the intermediate outputs, which allows for feature reuse and effective
gradient flow.
The Neck of YOLOv8 uses a mix of Path Aggregation Network (PAN) and Feature Pyramid
Network (FPN) structures to aggregate multi-scale features after the backbone. By combining
information from layers of different resolution, these elements allow the model to identify
objects of varied sizes.
A decoupled detection head, in which classification and bounding box regression are handled
by separate branches, is one of the most important architectural enhancements in YOLOv8.
YOLOv8 is also anchor-free: it predicts the object centre and bounding box dimensions
directly rather than depending on preset anchor boxes, which simplifies the model and
removes the need for anchor-related hyperparameter tuning. Faster inference and better
adaptability across datasets with different object sizes and aspect ratios are two benefits
of this anchor-free approach.
YOLOv8 has several pre-defined model sizes for scalability: YOLOv8n (nano), YOLOv8s
(small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra-large). Each version is
tuned with different depth and width multipliers to balance the trade-off between accuracy
and speed, which makes deployment possible across a variety of platforms, from
resource-constrained edge devices to powerful GPU servers.
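To make the depth and width multipliers concrete, the sketch below scales one hypothetical layer (256 channels, 6 repeated blocks) for each model size. The multiplier and channel-cap values are those of the Ultralytics YOLOv8 model configuration as best recalled and should be treated as assumptions, as should the rounding rules.

```python
import math

# (depth multiplier, width multiplier, channel cap) per model size.
# Assumed values; check them against the configuration actually used.
SCALES = {
    "n": (0.33, 0.25, 1024),
    "s": (0.33, 0.50, 1024),
    "m": (0.67, 0.75, 768),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.25, 512),
}


def make_divisible(value, divisor=8):
    """Round a channel count up to the nearest multiple of `divisor`."""
    return math.ceil(value / divisor) * divisor


def scale_layer(base_channels, base_repeats, size):
    """Return (channels, repeats) of one layer after applying the multipliers."""
    depth, width, max_channels = SCALES[size]
    channels = make_divisible(min(base_channels, max_channels) * width)
    repeats = max(round(base_repeats * depth), 1)
    return channels, repeats


for size in SCALES:
    print(size, scale_layer(256, 6, size))
```

Under these assumptions the nano model keeps a quarter of the channels and a third of the repeated blocks of the base definition, which is where most of its speed advantage comes from.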
Loss Function in YOLO is given by :

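Some of the other popular performance metrics used in deep learning are :

 $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ ……(1)
 $\text{Precision} = \dfrac{TP}{TP + FP}$ ……(2)
 $\text{Recall} = \dfrac{TP}{TP + FN}$ ……(3)
 $\text{F1 score} = 2 \cdot \dfrac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ ……(4)
 $AP\ (\text{average precision}) = \sum_{n} (R_n - R_{n-1}) \cdot P_n$ ……(5)
 $mAP\ (\text{mean average precision}) = \dfrac{1}{N} \sum_{i=1}^{N} AP_i$ ……(6)

Where TP = True Positives, TN = True Negatives, FP = False Positives and FN = False
Negatives. In equation (5), $P_n$ and $R_n$ are the precision and recall at the n-th
threshold, and in equation (6), $N$ is the number of classes.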
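Equations (1) to (6) translate directly into code. The sketch below is illustrative only: the counts and the precision-recall points are made-up numbers, and equation (5) is evaluated with the assumption that recall starts at zero and is sorted in ascending order.

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (1)-(4): accuracy, precision, recall and F1 from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def average_precision(recalls, precisions):
    """Equation (5): sum of (R_n - R_{n-1}) * P_n, taking R_0 = 0."""
    ap, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """Equation (6): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical counts and curve points, for illustration only.
print(classification_metrics(tp=80, tn=10, fp=5, fn=5))
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
print(mean_average_precision([0.75]))             # 0.75
```

With a single class, as in a pothole-only dataset, mAP equals the AP of that class.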
Comparison Table :

| S. No. | Reference | Datasets used | Model / Architecture | Efficiency (%) | Precision (%) | F-score (%) | Accuracy (%) | mAP (%) | Other measurements |
|---|---|---|---|---|---|---|---|---|---|
| 1 | [1] | Smartphone captures, internet sources | SSD-TensorFlow, YOLOv3-Darknet53, YOLOv4-Darknet53 | 85 | - | - | - | 84 | Processing speed = 20 frames per second |
| 2 | [2] | Smartphone captures mounted on vehicle windshields | YOLOv8 Small, YOLOv8 Nano, YOLOv7 Tiny, and Faster R-CNN | - | 80 | - | - | 82 | CPU usage = 35 %, inference time = 0.72 s per image |
| 3 | [3] | Images with potholes in diverse road conditions and illumination variations | YOLOv4, Tiny-YOLOv4, YOLOv5 | - | - | - | - | 80 | Frames per second (FPS) = 31.76 |
| 4 | [4] | Images depicting potholes and cracks under various road types and lighting conditions | YOLOv5, YOLOv4, Tiny-YOLOv4, ResNet50 (specifically for crack detection) | - | - | - | - | 79 | - |
| 5 | [5] | PothRGBD, comprising 1,000 RGB-D images captured using an Intel RealSense D415 depth camera | YOLOv8n-seg | - | 90 | - | 80 | 89 | Recall = 84 |
| 6 | [6] | 70 grayscale images of asphalt pavements | Image segmentation, shape extraction, texture analysis | - | 80 | - | 81 | - | Recall = 86 |
| 7 | [7] | Images taken from internet | SSD (Single Shot MultiBox Detector) | - | - | - | - | 77.5 | Frames per second = 23.91 |
| 8 | [8] | PASCAL VOC and MS COCO datasets | YOLO-LITE model | - | - | - | - | 33.81 | - |
| 9 | [9] | VisDrone and TinyPerson datasets | YOLO (You Only Look Once) | - | - | - | - | 27.2 | - |
| 10 | [10] | Pothole images collected via smartphones | MobileNet, EfficientNet and other deep learning models | - | 82 | 83 | - | - | Recall = 84 |

No values are listed for year, sensitivity, specificity, Youden's index, MAE or MSE in any
row, so those columns are not shown.

Result and discussion:


As bounding box regression loss continuously declines, box predictions get better. The model is
clearly learning class labels as evidenced by the considerable drop in classification loss. A
decrease in Distribution Focal Loss improves localization accuracy. With increased precision
over time, there are fewer false positives. Recall rises with time, suggesting that more true
positives are found.
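To illustrate what the Distribution Focal Loss is shaping, the sketch below decodes one box side from a discrete distribution: the head outputs one logit per distance bin, and the predicted distance is the expected bin index under the softmax of those logits. The use of 16 bins is an assumption for illustration, and the function name is hypothetical.

```python
import math


def dfl_expected_distance(logits):
    """Softmax over the distance bins, then the expected bin index."""
    peak = max(logits)  # subtracted for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return sum(index * weight / total for index, weight in enumerate(weights))


# A flat distribution over 16 bins gives the midpoint of the range.
print(dfl_expected_distance([0.0] * 16))  # 7.5

# A sharply peaked distribution gives (almost exactly) the peaked bin.
peaked = [-100.0] * 16
peaked[3] = 100.0
print(round(dfl_expected_distance(peaked), 6))  # 3.0
```

As the loss falls, the predicted distributions concentrate around the true distances, which is why a declining DFL curve accompanies better localization.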

[1] Results file showing the box, cls and dfl losses and mAP at 0.5 and 0.5:0.95
The label plots show that the annotations are concentrated around the image centre and that
the dataset contains only potholes. Most bounding boxes are small, and there is a slight
relationship between the height and width of the bounding boxes. These plots are
helpful for confirming the quality of labels and identifying bias or imbalance in the dataset
before training an object detection model.

[2] labels for potholes [3]labels_correlogram

[4] images detecting the potholes [5] pothole detection using YOLO
Together, these images depict the outputs and data distribution of the machine learning
model trained to detect potholes.
Graph [6] illustrates how variations in the confidence threshold affect the F1 score. Graph [7]
illustrates how the confidence threshold affects precision. Precision rises as the threshold is
raised, since only the most certain predictions are taken into account. A higher threshold,
on the other hand, yields fewer detections and can therefore lower recall. The overall
precision for all classes is displayed by the bold blue line. The legend entry
"all classes 1.00 at 0.884" indicates that, at a confidence threshold of 0.884, the model
achieves 100% precision.
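The shape of these curves can be reproduced by sweeping the confidence threshold over a list of detections. The sketch below uses five made-up detections and four made-up ground-truth potholes, and it adopts the convention (an assumption) that precision is 1.0 when no detection survives the threshold.

```python
def precision_recall_f1_at(detections, n_ground_truth, threshold):
    """Metrics at one confidence threshold.

    detections: list of (confidence, is_true_positive) pairs.
    """
    kept = [is_tp for confidence, is_tp in detections if confidence >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    precision = tp / (tp + fp) if kept else 1.0  # convention when nothing is kept
    recall = tp / n_ground_truth if n_ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical detections: (confidence, matched a labelled pothole?)
detections = [(0.95, True), (0.90, True), (0.70, False), (0.60, True), (0.30, False)]

for threshold in (0.25, 0.50, 0.884):
    print(threshold, precision_recall_f1_at(detections, 4, threshold))
```

In this toy example precision climbs from 0.6 to 1.0 as the threshold rises, while recall drops from 0.75 to 0.5, mirroring the trade-off visible in the curves.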

[6] F1 score-confidence curve [7] Precision-confidence curve

[8] Confusion matrix [9] Normalized confusion matrix


A confusion matrix is a performance evaluation tool for classification problems that shows
how well a machine learning model is doing by comparing the labels predicted by the model
with the actual (real) labels.
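A normalized confusion matrix is obtained by dividing each count by the total for its true class. The sketch below assumes a layout with predicted classes in rows and true classes in columns, and the counts are invented for illustration; neither comes from the reported results.

```python
def normalize_columns(matrix):
    """Divide every column by its sum so each true-class column sums to 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    column_sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / column_sums[c] if column_sums[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Rows: predicted (pothole, background). Columns: true (pothole, background).
# Hypothetical counts.
counts = [
    [45, 5],
    [5, 0],
]
normalized = normalize_columns(counts)
print(normalized)
```

With this layout the top-left entry of the normalized matrix is the recall for the pothole class (0.9 in the toy example).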
 Performance criteria :

The performance metrics used to evaluate the model, namely precision, recall, F1 score,
accuracy, AP (average precision) and mAP (mean average precision), are defined in
equations 1 to 6 above. Using these equations and the normalized confusion matrix, we
can calculate these performance criteria for the proposed model.
The performance of the model relies heavily on the training and validation data and on
the fine-tuning of different parameters.

Performance metric    Value (%)

Accuracy              84
Precision             92
Recall                89
F1 score              87
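Because equation (4) ties the F1 score to precision and recall, the reported values can be cross-checked with a few lines of arithmetic. The sketch below only recomputes F1 from the reported precision and recall; it makes no claim about how the reported figures were obtained.

```python
def f1_score(precision, recall):
    """Equation (4): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported = {"precision": 0.92, "recall": 0.89, "f1": 0.87}
implied = f1_score(reported["precision"], reported["recall"])

print(f"F1 implied by precision and recall: {implied:.3f}")
print(f"Reported F1: {reported['f1']:.2f}")
```

The implied value is about 0.905, somewhat above the reported 0.87. One possible explanation, which is an assumption rather than something stated in the results, is that the individual metrics were read at different confidence thresholds.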

Conclusion and future scope:

Pothole detection for the safety and security of people and vehicles using the YOLOv8
object detection model achieved an accuracy of 84%, a precision of 92%, a recall of 89%
and an F1 score of 87%, which can be considered high enough for real-time
applications. Our pothole detector using YOLOv8 can therefore be considered a robust,
effective, real-time system for dealing with real-life scenarios.

In future we will try to work with a larger dataset of at least 3000 images captured under
different circumstances, such as varying weather conditions, lighting conditions and pothole
severities. Several researchers have emphasized that working with such large datasets
yields a better object detector across different conditions.

References :

1. Al Shaghouri, A., Alkhatib, R., & Berjaoui, S. (2021). Real-time pothole
detection using deep learning. arXiv preprint arXiv:2107.06356.
2. Amri, A. U., & Kusuma, G. P. (2024). Comparative study of pothole detection
using deep learning on smartphone. Indonesian Journal of Electrical
Engineering and Computer Science, 37(2), 995–1004.
3. Asad, M., et al. (2022). Pothole detection using deep learning: A real-time and
AI-on-the-edge perspective. Advances in Civil Engineering, 2022, Article ID
9221211.
4. Babu, G. R., et al. (2024). Pothole and crack detection using deep learning:
Advancements in road surface anomaly recognition. Machine Intelligence
Research, 18(1), 92–100.
5. Yurdakul, M., & Tasdemir, Ş. (2025). An enhanced YOLOv8 model for real-time
and accurate pothole detection and measurement. arXiv preprint
arXiv:2505.04207.
6. Koch, C., & Brilakis, I. (2011). Pothole detection in asphalt pavement images.
Advanced Engineering Informatics, 25(3), 507–515.
7. Lim, J.-S., Astrid, M., Yoon, H.-J., & Lee, S.-I. (2019). Small object detection
using context and attention. arXiv preprint arXiv:1912.06319.
8. Pedoeem, J., & Huang, R. (2018). YOLO-LITE: A real-time object detection
algorithm optimized for non-GPU computers. arXiv preprint arXiv:1811.05588.
9. Tang, Y., Zhang, J., & Li, X. (2024). YOLO-RSFM: An efficient road small
object detection method. IET Image Processing, 18(5), 445–456.
10. Amri, A. U., & Kusuma, G. P. (2024). Comparative study of pothole detection
using deep learning on smartphone. Indonesian Journal of Electrical Engineering
and Computer Science, 37(2), 995–1004.

Abstract of different references :

1. The publication "Real-time pothole detection using deep learning" by Al Shaghouri, Alkhatib,
and Berjaoui (2021) presents a deep learning strategy for real-time pothole identification using
computer vision techniques. The authors use convolutional neural networks (CNNs) to scan
road photos and accurately detect potholes. The model is meant to be efficient enough to be
deployed on embedded devices used in automobiles.
2. Pothole identification using deep learning models designed for smartphones is compared in the
work by Amri and Kusuma (2024). Using transfer learning and Bayesian hyperparameter
tweaking, the authors created and assessed lightweight models including YOLOv8 small,
YOLOv8-nano, YOLOv7 tiny, and Faster R-CNN MobileNetV3. Of them, YOLOv8-nano was
the most effective for mobile deployment due to its high detection accuracy (82.5% AP), lowest
file size, and quickest inference time (0.72 seconds per picture). The potential of smartphone-
based solutions for real-time pothole monitoring is highlighted by this study, which might
improve road upkeep and urban infrastructure management.
3. The use of deep learning models for real-time pothole detection on edge devices is examined in
the study by Asad et al. (2022). An AI kit (OAK-D) coupled with a Raspberry Pi is used in the
study to test models such as SSD-MobileNetV2 and YOLOv1–v5 under various traffic conditions.
Tiny-YOLOv4 provided a balance between accuracy (90%) and speed (31.76 FPS), making it
appropriate for real-time applications, whereas YOLOv5 had the greatest mean average precision
(95%).
4. This study uses an edge computing setup consisting of a Raspberry Pi and an OAK-D AI module
to assess the performance of three deep learning models: YOLOv4, YOLOv5, and Tiny-YOLOv4
for real-time pothole and crack identification.
5. For precise pothole identification and measurement, the authors provide an improved YOLOv8
model that combines Gaussian Error Linear Unit (GELU), Simple Attention Module (SimAM),
and Dynamic Snake Convolution (DSConv).
6. This work describes an automated technique that uses histogram shape-based thresholding to
identify potholes in photos of asphalt pavement. By dividing photos into areas with and without
defects, the method lessens the need for expensive equipment and physical examinations,
providing an affordable pavement condition assessment option.
7. The authors provide a technique that improves detection accuracy by combining multi-scale
feature concatenation with an attention mechanism in order to address the problem of small
object detection. The method outperformed traditional SSD models in recognizing small objects,
achieving a mean average precision (mAP) of 78.1% for 300×300 input images when tested on
the PASCAL VOC2007 dataset.
8. A real-time object identification technique designed for devices without GPUs is presented:
YOLO-LITE. Despite having poorer mAP scores of 33.81% on PASCAL VOC and 12.26% on
COCO datasets, it is 3.8 times faster than SSD Mobilenetv1 with just 7 layers and 482 million
FLOPS, achieving 21 FPS on non-GPU desktops and 10 FPS on web implementations.
9. The authors provide an effective technique for identifying tiny road items called YOLO-RSFM.
The model's integration of sophisticated feature extraction and attention processes improves
detection speed and accuracy, making it appropriate for real-time applications in road situations.
10. This comparison study evaluates lightweight deep learning models for pothole identification
on smartphones: YOLOv8 small, YOLOv8-nano, YOLOv7 tiny, and Faster R-CNN
MobileNetV3. With the fastest inference time of 0.72 seconds per image, the
lowest model size, and an average precision of 82.5%, YOLOv8-nano proved to be the most
effective, demonstrating its applicability for mobile deployment.
