Proceedings of the International Conference on Machine Learning and Autonomous Systems (ICMLAS-2025)
IEEE Xplore Part Number: CFP258A9-ART; ISBN: 979-8-3315-0574-5
Machine Learning Approaches for Predicting Lung
Cancer Risk and Early Diagnosis
Jegan K, Infant Nikesh, [Link]
Dept of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai-600119, Tamilnadu, India
Jegannagi2003@[Link], Infantnikesh901@[Link], Pravin_ane@[Link]
2025 International Conference on Machine Learning and Autonomous Systems (ICMLAS) | 979-8-3315-0574-5/25/$31.00 ©2025 IEEE | DOI: 10.1109/ICMLAS64557.2025.10968104
Abstract— Using a healthcare dataset, this study applies a variety of ML techniques to analyze and forecast the likelihood of lung cancer. The dataset is first prepared by cleaning, encoding categorical variables, and removing duplicates to ensure quality and efficiency. Correlation analysis is carried out to find important characteristics that raise the risk of lung cancer. Feature engineering, including interaction terms, enhances the predictive value of the dataset. The machine learning models evaluated are Logistic Regression, SVC, Random Forests, GBM, and XGB. Over-sampling with ADASYN addresses class imbalance to improve model training. The logistic regression model attained 97% accuracy, supporting consistent and reliable predictions. Comprehensive visualizations, such as heatmaps and bar plots, explore feature correlations and class distributions. The study demonstrates the effectiveness of ensemble methods like Random Forests and XGBoost, which achieve superior predictive performance compared to baseline models. The system predicts whether a patient might have lung cancer, helping doctors with diagnosis and early intervention.

Keywords— Lung cancer, Machine learning, Dataset preprocessing, Feature engineering, Models, ADASYN, Accuracy, Cross-validation, Visualizations.

I. INTRODUCTION

Lung cancer ranks among the leading causes of death worldwide. Treatment outcomes improve dramatically with early diagnosis, although conventional diagnostic methods face obstacles in precision, availability, and speed. Structured datasets can meet the needs of predictive modeling, which allows machine learning to serve as an effective alternative. Existing studies often omit comprehensive performance comparisons across multiple models and do not treat model explanations as a main research aspect.

This research builds a machine learning framework that preprocesses the data and adds engineered features before evaluating various ML models. Its novel contribution is a systematic evaluation of different classifiers that identifies their capabilities and limitations in predicting lung cancer. The research employs SHAP (SHapley Additive exPlanations) analysis to explain how specific features affect the decision-making process of the model.

Illustrative Example: A patient presents with a persistent cough and breathlessness and has a history of smoking. The proposed model enables doctors to estimate the probability of lung cancer in such a patient, which supports their diagnostic process.

II. LITERATURE SURVEY

One study investigates prognostic variables, such as age, smoking, and genetic risk factors, with classification algorithms like Support Vector Machines (SVM), Decision Trees, and Logistic Regression. Among them, SVM was identified as the best classifier for lung cancer diagnosis, with the highest accuracy in the comparative evaluation [1]. Another study investigates the use of machine learning methods for early prediction and classification of lung cancer, the leading cause of cancer deaths worldwide. It covers data extraction, feature selection, model training, testing, and determination of the best methodology. The machine learning algorithms are assessed on accuracy, sensitivity, specificity, precision, and the area under the receiver operating characteristic curve (AUC-ROC) [2]. A further study explores the predictive ability of machine learning models for the detection of lung cancer. Its results show that SVM and the C4.5 Decision Tree algorithm are very effective in terms of accuracy and F-score. However, the Incremental Weighted Tree algorithm outperformed the others with more than 90% accuracy in ensemble model testing, surpassing the Random Forest classifier [3]. The results also point out possible limitations, including limited generalizability owing to the use of a single dataset and dependence on a single classification method. Nevertheless, the research shows that ensemble learning methods, such as majority voting and gradient-boosted trees, have high predictive ability for lung cancer diagnosis [3].

Another study employs machine learning models to overcome the shortcomings of expensive and time-consuming clinical practices, such as X-rays and CT scans. Classification algorithms, such as SVM, KNN, and ANN, are used to identify lung cancer. Clinical datasets are used to measure model performance, with the aim of efficient and affordable diagnostic options. The strategy is most advantageous for smokers and passive smokers, who are most prone to developing lung cancer. By facilitating early detection of the disease, the strategy can improve patient outcomes and reduce mortality rates [4]. The research compares the predictive
accuracy of Convolutional Neural Networks (CNN) with four conventional machine learning models: Logistic Regression, Decision Trees, SVM, and Gaussian Naïve Bayes. Confusion matrices and ROC curve analysis are used to make the comparison robust. The findings illustrate how machine learning models can help healthcare professionals make data-driven decisions and correctly diagnose lung cancer [5]. The main aim of another study is to identify lung cancer, one of the major causes of death in India, through a combination of CT scans and blood sample tests. Machine learning and image processing methods are used to classify images from CT scans, identify anomalies, and segment tumor areas. Early diagnosis is essential for successful treatment, and SVM is used in that study to classify the extracted image features with high accuracy [6]. In response to the increasing lung cancer death rate, another system combines machine learning and digital image processing (DIP) to predict cancer at an early stage. CT scan images are preprocessed and segmented, followed by feature extraction and classification with algorithms like SVM, Random Forest, and ANN. The system classifies a tumor as benign or malignant, optimizing performance measures such as accuracy, recall, and precision [7].

Another work proposes a novel histological correlation-based method for identifying small cell lung cancer (SCLC) from CT scans, referred to as the Early Detection Model (EDM). Early detection is essential since lung cancer kills more than 70,000 people each year and costs the healthcare system more than $10 billion. Six high-resolution chest CT scans from patients with SCLC and six from healthy subjects, obtained from the National Cancer Institute, were examined. The proposed EDM algorithm achieved an accuracy of 77.8%, demonstrating its potential for early diagnosis and improved clinical outcomes [9]. Computer-aided diagnosis (CAD) systems have made significant advancements in detecting diseases, including early-stage cancer. One such study employs CT scan images to identify, classify, and assess the malignancy of lung nodules. A 3D multi-path network similar to VGG is proposed and applied to datasets like the Kaggle Data Science Bowl 2017, using the U-Net architecture for segmentation. The outputs of the 3D network and U-Net are combined, with the latter showing a 95.6% accuracy and a log loss of 0.387732, demonstrating the effectiveness of this technique in classifying lung nodules and identifying malignancies [10]. Another study uses data-driven machine learning models to forecast lung cancer progression. Classification models like SVM, Decision Trees, and Logistic Regression are employed to examine key factors such as age, smoking status, and genetic susceptibility. Of the models evaluated, Random Forest attained the maximum accuracy of 88.5% in lung cancer detection and can be considered a good diagnostic tool for supporting radiologists [11]. Machine learning predictive models are intended to assist clinicians in the management of indeterminate pulmonary nodules found incidentally or on screening. Such systems should decrease variability in nodule classification, improve decision-making, and reduce unnecessary follow-up or intervention for benign nodules. Several classification algorithms, such as SVM and Decision Trees, have specific strengths and weaknesses. Although these approaches are promising in enhancing diagnostic accuracy, there are limitations in model development, validation, and clinical use that need to be overcome before they can be fully incorporated into healthcare environments [12].

Metabolomic analysis of 61 biomarkers in 110 lung cancer patients and 43 healthy subjects identified a panel of six biomarkers with high diagnostic accuracy (AUC = 0.989, Sensitivity = 98.1%, Specificity = 100.0%). The Fast Correlation-Based Filter (FCBF) algorithm detected the most useful biomarkers for screening, while Naïve Bayes was effective for early detection. This multidisciplinary approach is relevant to the establishment of blood-based screening methods and has implications beyond cancer diagnosis [13]. The International Agency for Research on Cancer (IARC) estimates that 27 million new cancer cases will occur by the year 2030. A review paper gives an overview of lung cancer, benchmark datasets, and current developments in deep learning-based medical image analysis. It discusses methods of image acquisition, feature extraction, segmentation, and classification and compares their efficacy, strengths, and weaknesses. The review identifies shortcomings and suggests future research directions for enhancing lung cancer detection and classification accuracy [14]. Another study combines deep learning with Surface-Enhanced Raman Spectroscopy (SERS) to detect exosomes for early lung cancer diagnosis. A deep learning model achieved 95% classification accuracy, detecting plasma exosomes with 90.7% similarity to cancer cell exosomes in 43 patients. The model produced an AUC of 0.912 for the overall cohort and 0.910 for Stage I patients, which indicates its potential for early lung cancer detection by liquid biopsy methods [15].

III. PROPOSED SYSTEM

Based on clinical and demographic data, the proposed methodology uses machine learning (ML) algorithms to predict the presence of lung cancer. By developing and testing predictive models with methods such as oversampling, feature engineering, and robust artificial intelligence (AI) practices, this model seeks to improve early lung cancer detection.

A. Data Processing

The dataset for this study is taken from Kaggle, a well-known platform for datasets used in machine learning research. The dataset, titled "Survey lung cancer", was preprocessed to ensure reliability and consistency. The first step is to remove duplicate entries and handle missing values, followed by encoding categorical variables for model compatibility. In addition, class imbalance in the target variable was addressed with the help of SMOTE to ensure balanced data for model training.
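The preprocessing steps named in Section III-A (duplicate removal, missing-value handling, and categorical encoding) can be illustrated with a minimal standard-library sketch. The toy records, the column names, the median imputation of AGE, and the integer codes are all assumptions made for illustration; the paper does not state how missing values were filled or which codes were used.

```python
import statistics

# Hypothetical toy records standing in for rows of the survey CSV.
# Column names and values are illustrative assumptions, not the real data.
RAW_ROWS = [
    {"GENDER": "M", "AGE": "69", "SMOKING": "2", "LUNG_CANCER": "YES"},
    {"GENDER": "F", "AGE": "", "SMOKING": "1", "LUNG_CANCER": "NO"},
    {"GENDER": "M", "AGE": "69", "SMOKING": "2", "LUNG_CANCER": "YES"},  # exact duplicate
    {"GENDER": "F", "AGE": "59", "SMOKING": "1", "LUNG_CANCER": "NO"},
]


def drop_duplicates(rows):
    """Keep only the first occurrence of each identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing_age(rows):
    """Replace an empty AGE with the median observed age (assumed strategy)."""
    ages = [int(r["AGE"]) for r in rows if r["AGE"] != ""]
    median_age = int(statistics.median(ages))
    return [{**r, "AGE": r["AGE"] if r["AGE"] != "" else str(median_age)} for r in rows]


def encode(rows):
    """Map categorical strings to integers so that models can consume them."""
    gender_codes = {"M": 1, "F": 0}
    target_codes = {"YES": 1, "NO": 0}
    return [
        {
            "GENDER": gender_codes[r["GENDER"]],
            "AGE": int(r["AGE"]),
            "SMOKING": int(r["SMOKING"]),
            "LUNG_CANCER": target_codes[r["LUNG_CANCER"]],
        }
        for r in rows
    ]


clean_rows = encode(fill_missing_age(drop_duplicates(RAW_ROWS)))
for row in clean_rows:
    print(row)
```

Running the three steps in this order means the imputed median is computed on de-duplicated data, so repeated records do not bias it.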
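Section III-A names SMOTE for class balancing. The core idea, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as follows. This is a simplified standard-library illustration, not the implementation used in the study; the function name smote_like, the toy points, and the parameter values are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy two-feature data: six majority points and three minority points.
majority = [(5.0, 5.0), (6.0, 5.5), (5.5, 6.0), (6.5, 6.5), (7.0, 5.0), (6.0, 7.0)]
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0)]

new_points = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class. Oversampling of this kind is normally applied to the training split only, so that the test set keeps its original class distribution.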
Feature engineering involves the creation of interaction features and the elimination of less relevant variables based on correlation analysis. The processed dataset was partitioned into training and testing sets using an 80-20 split for building and evaluating the predictive models.

Fig.1 System Architecture

The architecture diagram in Fig.1 illustrates the step-by-step process for building and evaluating a machine learning pipeline aimed at predicting lung cancer outcomes from input data. It begins with raw data input from a CSV file. The data undergoes a preprocessing phase in which duplicates are removed, null values are handled, and variables are encoded. The next stage is feature engineering, which involves creating interaction features, analyzing correlations, and removing irrelevant features to prepare the data for model training. Following this, class balancing is performed using SMOTE to address any data imbalance. The pipeline then branches out into model training with different algorithms: LR, SVC, GBM, XGBoost, and RF. Each model is evaluated using several tools, including SHAP (SHapley Additive exPlanations) analysis for interpretability, a confusion matrix for performance assessment, and an ROC curve for model comparison. The final outputs of the process are predictions and feature importance, which help explain the role of the features in the model's decision-making process.

B. Comparative Analysis

The proposed approach establishes its novelty through a comparison with existing methods. Table I summarizes previous research and the improvements offered by this study.

TABLE I. COMPARATIVE ANALYSIS
Study | Dataset Used | Models Evaluated | Accuracy | Novel Contributions
[1] Radhika et al. | Clinical dataset | SVM, DT, LR | 91.50% | Used basic ML models
[2] Faisal et al. | Medical images | CNN, RF | 93.20% | Image-based classification
[3] Thallam et al. | Healthcare dataset | RF, GBM | 94.10% | Feature selection techniques
[4] Choudhary et al. | Medical records | ANN, SVM | 92.80% | Hybrid deep learning model
[5] Pati et al. | Genetic markers dataset | SVM, DT | 89.50% | Genomic data analysis
[6] Banerjee et al. | Survey data | KNN, LR | 90.30% | Behavioral factors included
[7] Bharathy et al. | CT Scan images | CNN, RF | 95.70% | Image-based prediction
[8] Kadir et al. | Mixed clinical dataset | XGBoost, SVM | 96.50% | Deep feature engineering
[9] Xie et al. | Blood biomarker dataset | NB, DT | 88.90% | Biochemical feature analysis
This study | Structured health data | RF, XGB, GBM, SVC, LR | 97.00% | Introduces interaction features, SHAP analysis, and comparative evaluation

C. Summarized ML Algorithms:

The machine learning classifiers were selected based on their applicability and demonstrated improvements in medical diagnosis, particularly in lung cancer prediction. A detailed description of the models used, namely RF, XGBoost, GBM, SVC, and LR, follows.

• Random Forest (RF): This ML model is a powerful tree-based learning algorithm. It works by building several decision trees during the training stage. Each tree is trained on a random subset of the data set, and the model considers a random
subset of features at each split. This randomness introduces variability among the individual trees, reducing the risk of overfitting and improving overall prediction performance.

• XGBoost: XGBoost has gained popularity and widespread use as a machine learning algorithm because of its capacity to handle sizable datasets and its state-of-the-art performance in machine learning tasks.

• GB: Gradient Boosting (GB) is an ML technique that builds models in a stage-wise manner. It optimizes predictions by minimizing errors in a gradient descent-like process and works effectively for structured datasets.

• SVC: The Support Vector Classifier is an effective algorithm widely used in ML for both linear and non-linear classification. The fundamental objective of the SVM is to find the optimal hyperplane in a multidimensional space that separates data points into different classes.

• Logistic Regression (LR): LR is a supervised machine learning algorithm used to predict the probability that an instance belongs to a given class. It is a statistical method used for binary classification.

These classifiers were trained and evaluated on the preprocessed dataset using metrics such as accuracy, AUC-ROC, precision, and recall to determine their effectiveness in predicting lung cancer.

IV. RESULTS AND DISCUSSIONS

TABLE II PERFORMANCE METRICS
Model | AUC | Precision (0) | Recall (0) | F1-Score (0) | Precision (1) | Recall (1) | F1-Score (1) | Accuracy
Random Forest | 0.9961 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
XGBoost | 0.9948 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
Gradient Boosting | 0.9915 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
SVC | 0.9929 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97
Logistic Regression | 0.9937 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97 | 0.97

The performance evaluation in Table II shows that all models, including RF, XGBoost, GBM, SVC, and LR, exhibit excellent results on this classification task. The models achieve high AUC values between 0.9963 and 0.9986, indicating strong discriminatory power. Precision for both the negative (0) and positive (1) classes stands at 0.94, whereas recall is 0.98 for the negative class and 0.93 for the positive class, illustrating high accuracy in recognizing both classes. The F1-score, which balances precision and recall, is consistently 0.96 for all models, reflecting a good balance between the two metrics. Moreover, the overall accuracy of 0.96 indicates that the models correctly classify 96% of cases.

Both the Random Forest and XGBoost ensemble models demonstrate exceptional performance, with AUC-ROC scores exceeding 0.99 that indicate strong discrimination. The F1-scores and recall values confirm that the models identify lung cancer patients with high precision. The confusion matrix analysis supports the reliability of the predictions, showing low rates of false negatives and false positives. SHAP analysis shows that the prediction outcome depends heavily on the presence of chronic disease and allergy as well as alcohol consumption. Model performance could be further enhanced in three main ways: incorporating more clinical information, improving feature construction, and adopting deep learning architectures.

Fig.2 ROC Curve for Model Comparison

The ROC (Receiver Operating Characteristic) curve plot in Fig.2 compares the performance of different ML models in predicting the probability of lung cancer. It shows the trade-off between the false positive rate and the true positive rate for each model. The diagonal dashed line represents a random classifier with an AUC of 0.5, indicating no predictive ability. In this plot, all models (Random Forest, XGBoost, Gradient Boosting, SVC, and Logistic Regression) perform exceptionally well, as shown by their AUC scores of 1.00, which implies near-perfect classification. The curves nearly coincide with the top-left corner, exhibiting high true positive rates and low false positive rates, which indicates strong predictive accuracy. This suggests that these models are highly effective in distinguishing between positive and negative cases of lung cancer.
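The AUC values and ROC curves discussed for Fig.2 can be made concrete with a small standard-library sketch. It computes the AUC in two equivalent ways: as the fraction of positive-negative pairs that the scores rank correctly, and as the trapezoidal area under the (FPR, TPR) points obtained by sweeping the decision threshold. The labels and scores are invented toy values, not results from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold over the distinct scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example: 1 = lung cancer, 0 = no lung cancer (illustrative values only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 4), round(trapezoid_area(roc_points(labels, scores)), 4))
```

A classifier that assigns the same score to every case gets an AUC of 0.5 under this definition, which corresponds to the diagonal dashed line in an ROC plot.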
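The per-class precision, recall, and F1-score and the overall accuracy of the kind reported in Table II all follow from the four confusion-matrix counts TP, FP, TN, and FN. The sketch below shows the arithmetic on invented toy predictions; the helper names and the data are hypothetical and do not reproduce the study's figures.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) with respect to the chosen positive class."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def class_report(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class, plus overall accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels: 1 = lung cancer, 0 = no lung cancer (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

report_pos = class_report(y_true, y_pred, positive=1)
report_neg = class_report(y_true, y_pred, positive=0)
print(report_pos)
print(report_neg)
```

Computing the report once per class, as done here with positive=1 and positive=0, gives the class (1) and class (0) columns separately, while accuracy is the same in both because it does not depend on which class is treated as positive.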
Fig.3 Confusion Matrix for Machine Learning Classifiers

The confusion matrix in Fig.3 summarizes the predicted results of the ML classifiers, such as RF, GBM, SVC, and LR. Each row represents an actual class and each column represents a predicted class, so the cells of the matrix give the counts of TP, TN, FP, and FN.

Fig.4 Feature Importance Score

The bar plot in Fig.4 shows the feature importance scores derived from an RF model for predicting lung cancer. Features are ranked by their contribution to the model's predictions, with "ALLERGY" and "ALCOHOL CONSUMING" being the most influential, followed by features like "WHEEZING", "SWALLOWING DIFFICULTY", and "COUGHING". The least important features include "YELLOW_FINGERS" and "ANXIETY". Additionally, the newly created interaction feature "ANXYELFIN" has a moderate impact. This analysis highlights the key factors contributing to lung cancer prediction in the dataset, helping to identify critical variables for further investigation or clinical focus.

Fig.5 SHAP values

The SHAP summary plot in Fig.5 illustrates the impact of individual features on the lung cancer prediction model, with each dot representing a data point and its SHAP value indicating the feature's influence on the model output. Features like CHRONIC DISEASE, ALLERGY, and ALCOHOL CONSUMING show significant positive impacts on the prediction when their values are high, as indicated by the red-colored points pushing the SHAP values to the right. Features such as COUGHING, SWALLOWING DIFFICULTY, and the interaction feature ANXYELFIN also contribute positively, but with more moderate influence. Lower values of features like YELLOW_FINGERS and CHEST PAIN tend to have minimal impact. This analysis highlights CHRONIC DISEASE and ALLERGY as critical predictors, providing a clear understanding of feature importance and their directional contributions to lung cancer predictions.
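SHAP values are Shapley values from cooperative game theory: each feature's attribution is its marginal contribution to the model output, averaged over all orders in which features can be added. The sketch below computes them exactly for a three-feature toy model. The model, its weights, the all-zero baseline, and the instance are hypothetical and are not fitted to the study's data; only the feature names are borrowed from the discussion of Fig.5. SHAP tooling uses faster algorithms than this brute-force enumeration, and the paper does not state which explainer was used.

```python
from itertools import permutations

# Feature names borrowed from the SHAP discussion; everything else is a toy assumption.
FEATURES = ["CHRONIC_DISEASE", "ALLERGY", "ALCOHOL_CONSUMING"]
INSTANCE = (1.0, 1.0, 1.0)   # the patient being explained
BASELINE = (0.0, 0.0, 0.0)   # value used for features that are "absent"


def toy_model(x):
    """Hypothetical risk score with one interaction term (not the paper's model)."""
    return 0.5 * x[0] + 0.3 * x[1] + 0.2 * x[0] * x[2]


def coalition_value(present):
    """Model output when only the features in `present` take their instance values."""
    x = tuple(
        INSTANCE[i] if i in present else BASELINE[i] for i in range(len(INSTANCE))
    )
    return toy_model(x)


def shapley_values(value_fn, n_features):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    phi = [0.0] * n_features
    orders = list(permutations(range(n_features)))
    for order in orders:
        present = set()
        previous = value_fn(frozenset(present))
        for i in order:
            present.add(i)
            current = value_fn(frozenset(present))
            phi[i] += current - previous
            previous = current
    return [total / len(orders) for total in phi]


phi = shapley_values(coalition_value, len(FEATURES))
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the model output for the instance and for the baseline, which is the additivity property that lets a SHAP summary plot be read as a decomposition of each prediction. The interaction term is shared equally between the two features involved.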
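The correlation analysis used during feature engineering, and visualized in the Fig.6 heat map, rests on pairwise Pearson coefficients. The sketch below computes one such coefficient and builds an interaction feature in the spirit of ANXYELFIN. Defining the interaction as the product of ANXIETY and YELLOW_FINGERS is an assumption suggested by the feature's name; the paper does not give its formula, and the binary toy columns are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented binary toy columns (1 = present, 0 = absent).
anxiety = [1, 1, 0, 0, 1, 0, 1, 0]
yellow_fingers = [1, 1, 0, 1, 1, 0, 0, 0]

# Assumed definition of the interaction feature: element-wise product.
anx_yel_fin = [a * y for a, y in zip(anxiety, yellow_fingers)]

r = pearson(anxiety, yellow_fingers)
print(round(r, 3), anx_yel_fin)
```

A heat map such as Fig.6 is the matrix of these coefficients over every pair of columns, so its diagonal is always 1 and it is symmetric about that diagonal.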
Fig.6 Heat map correlations between various features in the lung cancer dataset

The heatmap in Fig.6 illustrates the correlations between various features in the lung cancer dataset. The strongest correlation is between YELLOW_FINGERS and ANXIETY (0.56), suggesting a connection between these factors. The interaction feature ANXYELFIN shows a high correlation of 0.85 with itself, indicating its potential significance. Moderate correlations are seen between PEER_PRESSURE and ANXIETY (0.31) and between YELLOW_FINGERS and FATIGUE (0.33). Features like ALCOHOL_CONSUMING, COUGHING, and CHEST_PAIN show weak or negligible correlations with LUNG_CANCER, suggesting they may be less relevant for prediction. This analysis highlights important predictors and guides future modeling efforts.

V. CONCLUSION

The proposed machine learning framework achieves strong lung cancer detection performance on structured healthcare data. This study successfully combines preprocessing techniques, feature engineering, and class balancing to achieve better model results. The Random Forest and XGBoost ensemble models prove superior to the other methods in the comparative analysis, and SHAP-based methods improve interpretability. Future research should concentrate on expanding demographic records with genetic information and on developing deep learning approaches that integrate electronic health records for practical clinical use. Further development of these systems will enhance predictive capacity, supporting early lung cancer detection to benefit patient healthcare decisions and treatment results.

REFERENCES

[1] Radhika, P. R., Nair, R. A., & Veena, G. (2019, February). A comparative study of lung cancer detection using machine learning algorithms. In 2019 IEEE international conference on electrical, computer and communication technologies (ICECCT) (pp. 1-4). IEEE.
[2] Jenipher, V. N., & Radhika, S. (2020, December). A study on early prediction of lung cancer using machine learning techniques. In 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS) (pp. 911-916). IEEE.
[3] Faisal, M. I., Bashir, S., Khan, Z. S., & Khan, F. H. (2018, December). An evaluation of machine learning classifiers and ensembles for early stage prediction of lung cancer. In 2018 3rd international conference on emerging trends in engineering, sciences and technology (ICEEST) (pp. 1-4). IEEE.
[4] Thallam, C., Peruboyina, A., Raju, S. S. T., & Sampath, N. (2020, November). Early stage lung cancer prediction using various machine learning techniques. In 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA) (pp. 1285-1292). IEEE.
[5] Choudhary, D. M., Roshan, V., Sri, A. D., Ajith, J., Srinivas, P. V. V. S., & Amarendra, K. (2024, August). Implementation of Predictive Modeling for Lung Cancer Diagnosis Using Machine Learning and Deep Learning Algorithms on Lung Dataset. In 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI) (pp. 960-966). IEEE.
[6] Rahane, W., Dalvi, H., Magar, Y., Kalane, A., & Jondhale, S. (2018, March). Lung cancer detection using image processing and machine learning healthcare. In 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT) (pp. 1-5). IEEE.
[7] Banerjee, N., & Das, S. (2020, March). Prediction lung cancer–in machine learning perspective. In 2020 International conference on computer science, engineering and applications (ICCSEA) (pp. 1-5). IEEE.
[8] Pati, J. (2018). Gene expression analysis for early lung cancer
prediction using machine learning techniques: An eco-genomics
approach. IEEE Access, 7, 4232-4238.
[9] Tekade, R., & Rajeswari, K. (2018, August). Lung cancer detection and
classification using deep learning. In 2018 fourth international
conference on computing communication control and automation
(ICCUBEA) (pp. 1-5). IEEE.
[10] Wu, Q., & Zhao, W. (2017, October). Small-cell lung cancer detection
using a supervised machine learning algorithm. In 2017 international
symposium on computer science and intelligent controls (ISCSIC) (pp.
88-91). IEEE.
[11] Bharathy, S., & Pavithra, R. (2022, May). Lung cancer detection using
machine learning. In 2022 International Conference on Applied
Artificial Intelligence and Computing (ICAAIC) (pp. 539-543). IEEE.
[12] Kadir, T., & Gleeson, F. (2018). Lung cancer prediction using machine
learning and advanced imaging techniques. Translational lung cancer
research, 7(3), 304.
[13] Xie, Y., Meng, W. Y., Li, R. Z., Wang, Y. W., Qian, X., Chan, C., ... &
Leung, E. L. H. (2021). Early lung cancer diagnostic biomarker
discovery by machine learning methods. Translational oncology, 14(1),
100907.
[14] Dodia, S., Annappa, B., & Mahesh, P. A. (2022). Recent advancements
in deep learning based lung cancer detection: A systematic review.
Engineering Applications of Artificial Intelligence, 116, 105490.
[15] Shin, H., Oh, S., Hong, S., Kang, M., Kang, D., Ji, Y. G., ... & Choi, Y.
(2020). Early-stage lung cancer diagnosis by deep learning-based
spectroscopic analysis of circulating exosomes. ACS nano, 14(5), 5435-
5444.