
Journal of Pathology Informatics 14 (2023) 100307


XML-GBM lung: An explainable machine learning-based application for the diagnosis of lung cancer

Sarreha Tasmin Rikta a, Khandaker Mohammad Mohi Uddin a,⁎, Nitish Biswas a, Rafid Mostafiz b, Fateha Sharmin c, Samrat Kumar Dey d

a Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh
b Institute of Information Technology, Noakhali Science and Technology University, Noakhali, Bangladesh
c Department of Chemistry, University of Chittagong, Chittagong, Bangladesh
d School of Science and Technology, Bangladesh Open University, Gazipur 1705, Bangladesh

ARTICLE INFO

Keywords: Lung cancer; Explainable machine learning; ROS; SHAP; GBM; Mobile app

ABSTRACT

Lung cancer has been the leading cause of cancer-related deaths worldwide. Early detection and diagnosis of lung cancer can greatly improve the chances of survival for patients. Machine learning has been increasingly used in the medical sector for the detection of lung cancer, but the lack of interpretability of these models remains a significant challenge. Explainable machine learning (XML) is a new approach that aims to provide transparency and interpretability for machine learning models. The entire experiment has been performed on the lung cancer dataset obtained from Kaggle. The outcome of the predictive model with the ROS (Random Oversampling) class balancing technique is used to comprehend the most relevant clinical features that contributed to the prediction of lung cancer, using an explainable machine learning technique termed SHAP (SHapley Additive exPlanation). The results show the robustness of GBM's capacity to detect lung cancer, with 98.76% accuracy, 98.79% precision, 98.76% recall, 98.76% F-Measure, and a 0.16% error rate. Finally, a mobile app incorporating the best model is developed to show the efficacy of our approach.

Introduction

The most lethal kind of cancer, globally, is lung cancer. It is one of the main causes of cancer fatalities in both women and men.1,2 Lung cancer is a type of cancer that begins in the lungs; cancer develops when the body's cells start to proliferate out of control. Lung cancer frequently develops over a long period of time and primarily affects persons between the ages of 55 and 65.3 Small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) are the 2 main kinds of lung cancer. NSCLC accounts for about 80%–85% of lung cancer cases. Most often, smokers or former smokers develop this type of lung cancer. More than 85% of lung cancer cases are caused by current or previous cigarette smokers.4 Compared to other forms of lung cancer, it is more prevalent in women than men and is more likely to affect younger people. Contrarily, SCLC, also known as oat cell carcinoma, accounts for 10%–15% of all cases of lung cancer. The growth rate of SCLC and the formation of big tumors that have the potential to spread far throughout the body are virtually directly correlated with cigarette smoking. SCLC tumors frequently begin in the bronchi in the center of the chest. The overall number of cigarettes smoked has an impact on the death rate from lung cancer.5 According to the World Health Organization (WHO), lung cancer was the leading cause of cancer-related death in 2020, taking 1.80 million lives.

Both the mortality rate and the number of people affected by this disease are predicted to rise along with the increase in the world's population. The 5-year survival rate for lung cancer is only 18%, which highlights the importance of early detection and diagnosis. Medical imaging, such as computed tomography (CT) scans, is commonly used in the diagnosis of lung cancer. However, the interpretation of medical images can be challenging and time-consuming for radiologists. This fatality rate can be decreased with early detection and treatment. Machine learning algorithms can be quite beneficial in that circumstance to correctly forecast the malignancy.6–8 However, due to the complexity and black-box nature of many machine learning algorithms, it is difficult to understand how a model is making predictions or decisions and to identify potential errors or biases. This can be a major concern in fields such as healthcare, finance, and criminal justice, where the consequences of model errors can be severe. Another limitation of black-box models is that they are often sensitive to the choice of hyperparameters, which can make them difficult to optimize and generalize to new data. Additionally, black-box models can be prone to overfitting, which can lead to poor performance on unseen data.

⁎ Corresponding author at: Department of Computer Science and Engineering, Dhaka International University, Dhaka 1205, Bangladesh.
E-mail addresses: jilanicsejnu@[Link] (K.M.M. Uddin), [Link]@[Link] (S.K. Dey).

[Link]
Received 30 January 2023; Accepted 20 March 2023
Available online 24 March 2023
2153-3539/© 2023 The Author(s). Published by Elsevier Inc. on behalf of Association for Pathology Informatics. This is an open access article under the CC BY-NC-ND license
([Link])

Overall, black-box models have been very successful in various applications, but the lack of interpretability and transparency is a major concern. Explainable machine learning (XML) is a new approach that aims to provide transparency and interpretability for black-box models, which can help overcome these limitations.

Due to this, post hoc techniques have recently become increasingly popular as a solution to the problem of presenting black-box models in a way that is understandable by humans. Such explanations are frequently used to assist domain experts in finding discriminatory biases in black-box models.9,10

Local, model-agnostic methods that concentrate on explaining specific predictions of a given black-box classifier, such as LIME11 and SHAP,12 are among the most well-known of these techniques. These techniques produce perturbations of a particular instance in the data and track the impact of these perturbations on the black-box classifier's output in order to quantify the contribution of individual features to a given prediction. Due to their generality, these approaches have been employed in a variety of fields, including law, medicine, finance, and science,13–15 to explain a variety of classifiers, including neural networks and sophisticated ensemble models.

The 4 explainable principles16 are used in this experiment to forecast lung cancer. Transparency is the first major principle: models should be explained in a clear and understandable way, for example, by highlighting the most important features that led to a certain diagnosis. This can be achieved by using techniques such as feature importance, feature selection, and model interpretability methods such as LIME and SHAP. According to the second principle, Fairness, the model should not discriminate against certain groups of patients based on factors such as age, race, or gender. This can be achieved by using fairness-aware algorithms, such as those that explicitly optimize for group fairness metrics, or by using bias-correction methods, such as re-sampling or adversarial training. The third principle is Robustness: the model should be robust to small changes in the input data and produce consistent predictions even when presented with new or unseen data. This can be achieved by using techniques such as cross-validation, regularization, and ensembling to improve model generalization. In accordance with the final tenet, Accountability, the model should be able to provide an explanation for its predictions in case of errors or mistakes, and it should be possible to understand the causes of these errors and take corrective measures. This can be achieved by using techniques such as model monitoring, model auditing, and model governance to ensure that the model is behaving as expected and that any issues or biases are identified and addressed in a timely manner. These explainable machine learning principles help to build a trustable and understandable model, which can improve the accuracy of predictions and ensure that the model is fair, robust, and accountable.

We use SHAP (SHapley Additive exPlanation) values in this experiment to increase trust, support debugging, and serve many other tasks. Game theory concepts17 and local explanations form the foundation of SHAP. Explainable strategies have been highlighted in renowned journals and are also gaining popularity in medical applications. Cosgriff and Celi18 show how to analyze deep neural network models using explanatory methodologies on high-frequency electronic patient records. Explanatory models were described in Lundberg et al.19 as a way to supplement ML models in forecasting mortality for patients with kidney failure. Nowadays, image data analysis with X-rays, CT scans, ultrasounds, and other imaging techniques uses explicable models. Lundberg et al.20 explain how models that forecast hypoxemia during surgery work. The list of techniques used in medicine is more thoroughly discussed in Singh et al.21 In addition, research into lung conditions is ongoing, and numerous articles have been written about using artificial intelligence (AI) to treat lung conditions. Xi et al.22 employed exhaled aerosols to detect lung structural illness using machine learning algorithms like Random Forests (RF) and Support Vector Machines. RF models were also employed in an extensive investigation23 in which scientists suggest a handmade e-nose device to find lung cancer. Even though there has been a lot of research on this subject, we are still driven to work with lung cancer. This article is focused on the explainability of models used in lung cancer so that people may comprehend how a model works internally and have faith in the experiment's prediction. The proposal's summary is shown in Fig. 1.

Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting (LightGBM) are the 3 ensemble-based classifiers that have been used in this study. Among all the classifiers, GBM has the best accuracy, coming in at 98.76%. Finally, a smartphone app is developed to integrate the best model. The method for applying the proposal is shown in Fig. 2.

The following are the contributions of the proposed research efforts for lung cancer:

1. GBM achieves a high accuracy of 98.76% in this investigation.
2. For the best outcomes, 3 classifiers (XGBoost, LightGBM, and GBM) are used in this scenario.
3. Data balancing techniques, feature scaling, PCA (Principal Component Analysis), and hyperparameter tuning have been used to attain the highest level of accuracy.
4. SHAP is utilized here to make the gradient-boosting output understandable, meaningful, and trustworthy to humans.
5. Finally, a user-friendly smartphone application that can calculate the result based on real-time inputs has been developed.

There are 4 sections in the remaining portions of this research project. Section 2 demonstrates the related work and Section 3 provides the materials and methods. The 4 subsections in Section 3 are dataset description and data pre-processing, PCA and hyperparameter tuning, machine learning models, and SHAP. The analysis and discussion of the results are covered in Section 4, which contains 5 subsections: environmental setup, classification accuracy, model evaluation, SHAP result analysis, and the creation of the mobile app. The work concludes in Section 5.

Related work

There has been a growing interest in the use of eXplainable Machine Learning (XML) techniques for lung cancer prediction in recent years. Some notable studies in this field include:

In one study, Masrur Sobhan and Ananda Mohan Mondal24 proposed a pathway to identify significant lung cancer class- and patient-specific genes that could support the development of effective medicines for lung cancer patients. They used the 2 SHAP variants known as "tree explainer" and "gradient explainer," for which the classification algorithms XGBoost (a tree-based classifier) and a convolutional neural network (a deep learning-based classifier), respectively, were applied. The class-specific top 100 genes and the differentially expressed genes, both of which are population-based biomarkers, were identified. Few genes were found to be shared by the patients, indicating that each individual with lung cancer is represented by a different set of patient-specific genes. This test demonstrates that XGBoost achieves 96.3% accuracy.

In another study,25 machine learning (ML) models are used to predict the length of stay (LOS) for patients with lung cancer. The methodology is put forth to address imbalanced datasets for classification-based methods employing electronic medical records (EHR). The authors forecasted the average length of stay for ICU patients with lung cancer using the MIMIC-III dataset and supervised ML algorithms. The Random Forest (RF) model performed better than other models during the 3 stages of the framework and delivered the expected results. They described the predictive model's (RF) outcome with the SMOTE class balancing technique and used SHAP to comprehend the most significant clinical factors that contributed to predicting lung cancer LOS.

Another study by Jamie et al.26 used 3 XAI techniques (SHAP, LIME, and Scoped Rules) to show how usable it is to add an explainable tertiary appendix to ML models and to give data interpretability for large-scale EHR datasets. The Simulacrum, a synthetic dataset produced by Health Data


Fig. 1. Overview of the proposed work.

Fig. 2. Working flow of the mobile application.
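One step of the workflow summarized in Figs. 1 and 2, the ROS class balancing applied before training, can be sketched in plain Python. The `random_oversample` helper and the toy rows below are illustrative assumptions, not the paper's code:

```python
import random

def random_oversample(rows, labels, seed=42):
    """Balance a binary dataset by duplicating minority-class rows,
    a plain-Python sketch of the random oversampling (ROS) idea."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    # Identify the majority and minority classes by size.
    majority = max(by_class, key=lambda c: len(by_class[c]))
    minority = min(by_class, key=lambda c: len(by_class[c]))
    deficit = len(by_class[majority]) - len(by_class[minority])
    # Re-draw minority rows at random (with replacement) until balanced.
    extra = rng.choices(by_class[minority], k=deficit)
    return list(rows) + extra, list(labels) + [minority] * deficit

X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 0, 1]
Xb, yb = random_oversample(X, y)
print(yb.count(0), yb.count(1))  # both classes now have 4 examples
```

Libraries such as imbalanced-learn expose the same behavior through a `RandomOverSampler` class; the sketch just makes the duplicate-the-minority mechanism explicit.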

Insight CiC using anonymized cancer data supplied by Public Health England's National Cancer Registration and Analysis Service (NCRAS), served as the source of the data for this study. They contrasted EHR features based on the weighted prediction relevance calculated by XAI models. In this study, 3 classifiers (Logistic Regression, XGBoost, and EBM) were utilized, with XGBoost displaying the greatest performance in terms of classification accuracy.

In another work, using the example of models that assess lung cancer risk in screening by low-dose computed tomography, Katarzyna et al.27 proposed selected approaches from the XAI field. This method aids in a better understanding of the comparison of the 3 lung cancer risk prediction models used in lung cancer screening, namely the BACH model, the PLCOm2012 model, and the LCRAT model. The study's model performance and accuracy are not discussed by the authors; they only concentrate on comprehending how the models act for various patients. For this investigation, they employed a domestic lung cancer database.

Elias Dritsas and Maria Trigka28 employed a variety of machine learning classifiers, including NB, BayesNet, SGD, SVM, LR, ANN, KNN, J48, LMT, RF, RT, RepTree, RotF, and AdaBoostM1, to identify people who are at a high risk of developing lung cancer. In order to identify the model with


the highest predictive performance, these classifiers were assessed in terms of accuracy, precision, recall, F-Measure, and AUC. Their dataset was obtained from Kaggle. The RotF model from this experiment performs at the highest level.

Another study on lung cancer research was carried out by Muntasir et al.,29 in which they examined a number of earlier studies concerned with lung cancer prediction models and contrasted the results with their own models. They created the XGBoost, LightGBM, AdaBoost, and bagging ensemble learning approaches, among others, to forecast lung cancer. Model validation was carried out using K-fold 10 cross-validation. The best accuracy in this experiment is 94.42%, achieved with XGBoost.

Patra30 examined various machine learning classifiers, including Radial Basis Function Network (RBF), K-Nearest Neighbors (KNN), J48, Support Vector Machine (SVM), Logistic Regression, Artificial Neural Network (ANN), Naïve Bayes, and Random Forest, for predicting lung cancer. The dataset, which includes 32 occurrences and 57 attributes, was gathered from the UCI repository. RBF achieved an accuracy of 81.25%, which the authors deemed higher than all the other algorithms.

Another study by Sim et al.31 suggested a study of health-related quality of life (HRQOL) in 5-year lung cancer survival prediction using a variety of machine learning models, including Decision Tree, Logistic Regression, Bagging, Random Forest, and AdaBoost. To assess model performance, 2 different feature sets were utilized with K-fold 5 cross-validation. Data from 809 lung cancer surgery survivors were compared to the model performances. This experiment's findings indicated that AdaBoost had the highest accuracy of 94.8%.

Overall, these studies mainly focus on demonstrating the potential of using explainable machine learning for lung cancer research, as it can improve the performance and interpretability of machine learning models. They provide interpretable insights into the underlying mechanisms of the disease and the factors that contribute to its development.

Materials and methods

This section explains our recommended approach and the techniques utilized to detect lung cancer. Fig. 3 depicts the proposed approach for predicting lung cancer. The suggested approach uses label encoding; feature scaling with a standard scaler, in which each feature's values have zero mean and unit variance; Principal Component Analysis (PCA), which compresses the data; hyperparameter tuning with grid search, which is used to determine the best model; and performance

Fig. 3. Working procedure of proposed methodology.


metrics, which are used to improve the efficacy of a classification model. This experiment's results showed that using all machine learning classifiers to predict lung cancer is a promising direction. Below is a depiction of the algorithm used in this study, for simplicity of understanding.

Algorithm: Working Procedure of XML Lung Cancer Prediction
Input: Kaggle Lung Cancer Dataset
Output: Predicted value of XML Lung Cancer (Yes or No)

1. Begin
2. data ← load dataset;
3. shap ← load shap;
4. procedure DO_EXPLAINABLE(model, x)
5.   explainer ← shap.Explainer(model, x);
6.   shap_values ← explainer(x);
7.   if plot is bar
8.     shap.plots.bar(shap_values);
9.   else if plot is beeswarm
10.    shap.plots.beeswarm(shap_values);
11.  else
12.    shap.plots.waterfall(shap_values);
13. end procedure
14. pre-processing:
15. if data.dtype is object or string
16.   encode the data;
17. x ← data excluding the class attribute;
18. y ← data[lung_cancer];
19. balancing_data:
20. x_os, y_os ← RandomOverSampling(x, y);
21. feature_scaling:
22. scaled_x ← scale_the_features(x_os);
23. feature_optimizing:
24. pca_x ← PCA(n_components = 9).fit(scaled_x);
25. x1, x2, y1, y2 ← split_data of pca_x and y_os;
26. for i in range(len(models)):
27.   check for hyperparameter tuning;
28.   model ← train_model using x1 and y1;
29.   predict ← test_model using x2 and y2;
30.   compute performance evaluation metrics;
31.   DO_EXPLAINABLE(model, x);
32. End

Dataset description and data pre-processing

A dataset called "Lung Cancer" was obtained from Kaggle32 and contains 309 occurrences and 16 attributes, of which 15 attributes are predictive and 1 is the class attribute. Lung cancer is the class attribute, and the predictive attributes are, in order: gender, age, smoking, yellow fingers, anxiety, peer pressure, chronic disease, fatigue, allergy, wheezing, alcohol, coughing, shortness of breath, swallowing difficulty, and chest pain. Table 1 describes each feature of the dataset.

Table 1
Descriptions of each characteristic in the dataset.

Gender: Indicates whether a person is male or female.
Age: The person's age is recorded using this feature.
Smoking: Indicates whether or not the participant smokes.
Yellow fingers: Shows if the person has yellow fingers or not.
Anxiety: The presence or absence of anxiety is indicated by this feature.
Peer pressure: Lets the user know whether they are susceptible to peer pressure or not.
Chronic disease: Indicates if the participant has a chronic disease or not.
Fatigue: The participant's level of fatigue is indicated by this attribute.
Allergy: Shows whether or not the person has allergies.
Wheezing: Reveals if the individual has wheezing or not.
Alcohol: Shows whether or not the person drinks.
Coughing: Shows whether the participant has a cough or not.
Shortness of breath: Shows whether the participant suffers from shortness of breath or not.
Swallowing difficulty: Indicates whether the person has swallowing problems or not.
Chest pain: Shows whether or not the individual is experiencing chest pain.
Lung cancer: Specifies whether or not the individual has been diagnosed with lung cancer.

Gender and Lung Cancer both contain categorical values, which have been transformed into numerical values (0, 1) during the data pre-processing stage via label encoding. The dataset's noise, missing values, and unbalanced data33 may reduce the accuracy of the result. That is why these undesired items were eliminated from the dataset prior to running the machine-learning models. The best output for the dataset is achieved by data pre-processing. This dataset does not contain any missing values, but it was completely imbalanced. Random oversampling (ROS)34 is used here to address the issue of imbalanced datasets. It involves duplicating examples from the minority class in order to balance the class distribution. This is done by randomly selecting examples from the minority class and adding them to the dataset until the class distribution is balanced. To conduct this experiment, the minority class was increased by 70%. After oversampling, a total of 216 rows in the dataset have a value of 1, indicating that malignancies were discovered, while 270 rows in the dataset have a value of 0, indicating that no cancers were discovered. Fig. 4 shows the data before and after balancing.

Fig. 4. Data balanced: (a) before data balancing and (b) after data balancing.

PCA and hyperparameter tuning

The dataset of this experiment consists of 16 features, which is highly dimensional. Due to overfitting, these numerous features make it difficult to achieve the optimal outcome. In order to improve performance, Principal Component Analysis (PCA) is applied to the dataset, which reduces the 16 characteristics to 9. PCA is a dimensionality-reduction technique that is frequently used to reduce the dimensionality of big datasets.35 It works by condensing a large collection of variables into a smaller one while retaining the majority of the information in the larger set.36

Machine learning employs hyperparameter tuning,37 where the value of the parameter is chosen before the algorithm is trained. A particular set of hyperparameters maximizes the performance of the model and produces better results with fewer errors by minimizing a preset loss function. GridSearchCV and hyperparameter tuning are consequently merged in this study. GridSearchCV is a method in the scikit-learn library for Python that is used to perform an exhaustive search over a specified


parameter space for an estimator. This technique for hyperparameter tuning does a comprehensive cross-validation search to find the ideal values for the desired hyperparameters.38 The model evaluates and verifies each unique set of dictionary values.39 In order to get promising accuracy from GridSearchCV, we specified the parameters for GBM, XGBoost, and LightGBM. The best GBM parameters, n_estimators=100, learning_rate=1, and max_depth=1, were chosen to yield an accuracy of 98.76%.

GridSearchCV was then used to build XGBoost, with the best parameters being objective="binary:logistic", random_state=45, eval_metric="auc", and n_estimators=100, producing an accuracy of 98.27%. Following the development of the LightGBM model with default parameters, predictions were made on an unseen test set, but the accuracy was poor. Consequently, the model was built using GridSearchCV, which achieved an accuracy of 98.89% when the ideal values num_leaves=31, learning_rate=1, and n_estimators=100 were used. The 10-fold cross-validation was employed for conducting this experiment. Finally, the best model with the highest accuracy is selected for each set of hyperparameters.

Machine learning classifiers

Data modeling is carried out here using 3 ensemble-based machine learning algorithms: GBM, LightGBM, and XGBoost. GBM is the learning process that incrementally fits new models to provide a more precise estimate of the response variable. To create the final forecasts, it aggregates the predictions from many decision trees. Every decision tree's nodes use a distinct subset of information to decide which split is the best. This indicates that no 2 trees are exactly alike, and as a result, various signals can be extracted from the data by each tree. Each subsequent tree also takes into account any blunders or errors produced by the preceding trees. So, each decision tree that comes after is constructed using the flaws of the prior trees. A gradient-boosting machine algorithm builds the trees in a sequential manner in this way.

Another well-known boosting method is XGBoost, which is essentially a modified version of the GBM algorithm. The goal of this classifier is to accurately classify data by calculating weak classifiers iteratively.40 It applies the accuracy and logistic-loss criterion to select the best model in the hypothesis space and make the best prediction on the test data under the evaluation criterion, using sample data that are independent of one another. In addition, it comprises a number of regularization methods that lessen overfitting and enhance performance in general.

Large volumes of data can be handled with ease by LightGBM. The optimal split is chosen by LightGBM using a histogram-based strategy to expedite the training process. Any continuous variable is separated into bins or buckets rather than using individual values. This shortens the training period and uses less memory. These 3 machine-learning algorithms were examined in this study for their capacity to forecast the emergence of lung cancer, with GBM showing the highest level of accuracy.

SHAP (SHapley Additive exPlanation)

SHAP is a comprehensive method for analyzing the results of any machine learning model, developed by Lundberg et al.41 SHAP provides a way to calculate the contribution of each feature and is based on game theory and local explanations. The model generates a prediction value for each prediction sample, and the SHAP value is the score given to each feature in the dataset.42 In order to support interpretable ML, SHAP was created and made available as a set of Python tools. For each feature, SHAP provides a list of Shapley values for a particular datum. This is based on the notion that predictions can be described by supposing that each feature is a "player" in a game where the prediction is the payout.43 The Shapley value, a strategy from coalitional game theory, explains how to equally distribute the "payout" across the characteristics. Numerous distinctive factors are present in our existing dataset, and each distinctive variable can be viewed as a player in the game-theoretic sense. The benefits of several participants working together to complete a project may be seen in the prediction results generated by utilizing this dataset to train the model. The Shapley value distributes the advantages of cooperation evenly by taking into account each player's contributions. The standard measure of feature relevance merely indicates which features are significant; its effects on prediction outcomes are unknown. The key benefit of the SHAP value is that it can demonstrate both the positive and negative effects of the attributes in each sample.

Result and discussion

First, the data is split into training (65%) and testing (35%) sets. The optimal model with the highest accuracy is examined using a variety of machine learning methods, including feature scaling, PCA, ROS, and hyperparameter tuning. The best model was selected using all these machine learning techniques.

Environmental setup

This experiment involves some resources. Table 2 presents the materials used for this study's model development.

Table 2
Environment setup of the proposed system.

CPU: Intel® Core™ i3-1005G1 CPU @ 1.20GHz
RAM: 12GB
GPU: Intel® UHD Graphics
Software: Anaconda
Language: Python

Classification accuracy

The efficiency of the classification systems is assessed using a number of well-known metrics, such as accuracy, recall (also known as sensitivity), precision, and F1-score.44 Table 3 displays how well GBM, XGBoost, and LightGBM performed on these metrics.

Table 3
Evaluation of explainable machine learning methods.

Methods | Precision | Recall | F_Measure | Accuracy | Error
GBM     | 98.79%    | 98.76% | 98.76%    | 98.76%   | 0.012%
XGB     | 96.41%    | 96.27% | 96.28%    | 96.27%   | 0.037%
LGBM    | 96.97%    | 96.89% | 96.89%    | 96.89%   | 0.031%

Fig. 5. Accuracy for the GBM, XGB, and LGBM.
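The Shapley attribution described in the SHAP section can be computed exactly for a tiny cooperative game. The two-feature "payout" function below uses made-up numbers purely for illustration:

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Average each player's marginal contribution to the coalition
    'payout' over every order in which players can join."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Hypothetical payout: SMOKING alone is worth 3, COUGHING alone 1,
# and the pair earns a synergy bonus of 2.
def payout(coalition):
    total = 0.0
    if "SMOKING" in coalition:
        total += 3
    if "COUGHING" in coalition:
        total += 1
    if {"SMOKING", "COUGHING"} <= coalition:
        total += 2
    return total

phi = shapley_values(["SMOKING", "COUGHING"], payout)
print(phi)  # {'SMOKING': 4.0, 'COUGHING': 2.0}
```

The attributions sum to the grand-coalition payout (6.0), the efficiency property that makes SHAP values add up to the model's prediction minus the baseline.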


Fig. 6. Confusion matrix for: (a) GBM and (b) XGB, and (c) LGBM.
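The exhaustive GridSearchCV-style tuning described in the PCA and hyperparameter tuning section can be mimicked with a plain loop over the parameter grid. The `fake_scorer` below is a stand-in for "fit a model and return its cross-validated accuracy"; it is an assumption for illustration, not the paper's scoring code:

```python
from itertools import product

def grid_search(train_and_score, param_grid):
    """Score every parameter combination exhaustively and keep the best,
    mimicking what GridSearchCV does (minus the cross-validation folds)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical scorer that peaks at n_estimators=100, learning_rate=1,
# max_depth=1, the best GBM setting reported in the paper.
def fake_scorer(p):
    target = {"n_estimators": 100, "learning_rate": 1, "max_depth": 1}
    return -sum(abs(p[k] - target[k]) for k in target)

grid = {"n_estimators": [50, 100], "learning_rate": [0.1, 1], "max_depth": [1, 3]}
best, _ = grid_search(fake_scorer, grid)
print(best)  # {'learning_rate': 1, 'max_depth': 1, 'n_estimators': 100}
```

scikit-learn's real `GridSearchCV` additionally averages the score over cross-validation folds for each combination, which is the "comprehensive cross-validation search" the text describes.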

We have included the Accuracy, Precision, Recall, and F1-Score results for the purpose of observing the models' performance. The comparison showed that the GBM classifier outperformed all others with an accuracy of 98.76%. All metrics showed the lowest performance for the XGB classifier, and among these classifiers, LGBM ranked second highest. Since accuracy is a reliable indicator for balanced data, it is considered the experiment's key performance metric. Gradient Boosting Machine achieved the best balanced accuracy in this investigation, as shown in Fig. 5.

Model evaluation

A key component of creating a powerful machine learning model is model evaluation. In this experiment, various evaluation metrics, such as the confusion matrix (accuracy, precision, recall, F-measure, error) and the AUC-ROC curve, are utilized to judge the performance or caliber of the model. The number of true-positive, true-negative, false-positive, and false-negative predictions made by the algorithm is determined by the confusion matrix. True positives are the number of instances where the algorithm successfully identified the positive class, whereas true negatives are the number of instances where the method correctly anticipated the negative class. False positives are the number of occasions when the algorithm predicted a positive class when the actual class was negative, and false negatives are the number of occasions when the system predicted a negative class when the actual class was positive. A variety of performance indicators, including accuracy, precision, recall, and F1 score, can be calculated from the matrix.45 Fig. 6 shows the confusion matrix of each classifier.

These performance metrics allow us to assess how well our model processed the given data. They are defined in Eqs. (1)-(5):

Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) × 100    (1)

Precision (%) = TP / (TP + FP) × 100    (2)

Recall (%) = TP / (TP + FN) × 100    (3)

F-Measure (%) = (2 × Recall × Precision) / (Recall + Precision) × 100    (4)

Error (%) = (FP + FN) / (TP + TN + FP + FN) × 100    (5)

where TP, FP, TN, and FN stand for True Positives, False Positives, True Negatives, and False Negatives, respectively. The AUC-ROC curve is shown in Fig. 7. Here, the AUC-ROC curve is used to show graphically how well the classification model performs. It is a favored and important statistic for evaluating how well the classification model is working.

Binary classification problems can be evaluated using the ROC curve as a statistic. This probability curve, which essentially distinguishes the "signal" from the "noise," displays the TPR (True Positive Rate) versus the FPR (False Positive Rate) at different threshold values. The ROC curve is summarized by the Area Under the Curve (AUC), which measures a classifier's capacity to differentiate between classes. The higher the AUC, the better the model separates the positive and negative classes.

SHAP result analysis

Finally, the values of their explanatory factors are utilized to determine the Shapley value explanations of lung cancer in the test set. In the discipline of machine learning, the more explainable a model is, the simpler it is to understand the predictions that have been made. SHAP is used here to explain the model outputs and to determine the extent to which a certain characteristic contributes to the outcome of a particular event. It is a powerful tool for feature importance that allows us to understand which features are most important in driving a specific prediction,


Fig. 7. ROC curve: (a) GBM, (b) XGB, and (c) LGBM.
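The AUC summarized by these ROC curves has a direct probabilistic reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A stdlib-only sketch, with made-up labels and predicted probabilities:

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney statistic: the fraction of positive/negative
    pairs in which the positive sample is ranked higher (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for 8 patients (1 = lung cancer).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.60, 0.40, 0.55, 0.30, 0.20, 0.10]
print(auc(labels, scores))  # 15 of 16 pairs ranked correctly -> 0.9375
```

Because this formulation only depends on the ranking of scores, not on any single threshold, it captures the same threshold-sweeping behavior the ROC curve depicts.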

Fig. 8. Features importance for lung cancer prediction.


Fig. 9. SHAP Bar plot for: (a) GBM, (b) XGBoost, and (c) LGBM.


Fig. 10. SHAP waterfall plot for: (a) GBM, (b) XGB, and (c) LGBM.


Fig. 10 (continued).

and how different feature interactions contribute to the overall prediction of the model. A feature importance strategy is developed as a result of this analysis. Fig. 8 illustrates the importance of the variables.46

The graph makes it evident that AGE, which accounts for 190.0 of the total, is the main contributor to lung cancer prediction. FATIGUE is the next most important feature, followed by CHRONIC DISEASE, YELLOW FINGERS, and so on, organized according to their significance. The least-contributing factor, at 9.0, is CHEST PAIN.

More SHAP plots are available to aid human comprehension of the expected outcome. The bar plots for GBM, XGBoost, and LGBM are shown in Fig. 9.

Each model's associated importance is different. The most important component for GBM, as shown in Fig. 9(a), is FATIGUE, which implies absolute SHAP values significantly higher than those of any other characteristic. CHRONIC DISEASE contributed the second-highest amount (+1.48). FATIGUE is the secondary contributor for XGBoost, according to Fig. 9(b), whereas ALLERGY is the main contributor, accounting for +1.51. Similar to GBM, FATIGUE plays a prominent role in Fig. 9(c), followed by ALLERGY, CHRONIC DISEASE, and other conditions.

Another visualization technique is the Waterfall Plot, which visualizes the contribution of each feature to the prediction. In a SHAP Waterfall Plot, the features are listed along the x-axis and the SHAP values are represented by bars that extend from the baseline (usually zero) to the final prediction. Positive SHAP values indicate that the feature had a positive impact on the prediction, while negative SHAP values indicate that the feature had a negative impact. The height of each bar represents the magnitude of the feature's contribution to the prediction. It provides a clear and intuitive way to understand how each feature contributes to the prediction and how the features interact with each other. It can help identify which features are the most important, which have the strongest positive or negative effects, and how they relate to the final prediction. Fig. 10 represents the waterfall plots for GBM, XGBoost, and LGBM.

Fig. 10(a) demonstrates that FATIGUE has a SHAP value of -2.47, which has a negative influence on the prediction, while ALLERGY has a positive impact with a SHAP value of +1.88, and so on. All SHAP values added together equal f(x) − E[f(x)]. In Fig. 10(b), ALLERGY has a +2.99 positive effect on the prediction: to predict lung cancer with XGBoost, FATIGUE has a negative impact of -0.88 and YELLOW_FINGERS has a favorable impact of +0.65. Comparatively, AGE has the most positive contribution to the prediction in Fig. 10(c), whereas the combined contributions of the other 6 variables are the least.

Another SHAP plotting technique is the Beeswarm, shown in Fig. 11. It is a type of scatter plot used to display the distribution of a large number of individual observations in a way that minimizes the overlap between points. In this plot, the data points are represented by small dots placed along the x-axis, with the y-axis showing the density of the points. The dots are arranged so that they sit as close as possible to their x-value without overlapping.

FATIGUE is typically the most significant component for GBM, as shown in Fig. 11(a). CHRONIC_DISEASE and ALLERGY are the second and third most crucial factors for prognosis, respectively. On the other hand, ALLERGY contributes most to the prediction in Fig. 11(b), and the likelihood of a positive forecast increases as ALLERGY levels rise. For XGBoost, FATIGUE and YELLOW_FINGERS are the second and third-highest risk factors for lung cancer, respectively. Fig. 11(c) shows that FATIGUE has the biggest influence on prediction, with ALLERGY and CHRONIC_DISEASE the next 2 most crucial characteristics.
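The bar and waterfall logic above reduces to simple arithmetic on per-feature SHAP values. The base value and attributions below are invented for illustration (only the FATIGUE value echoes a figure in the text; none of these are the study's actual outputs):

```python
# One sample's hypothetical SHAP attributions and the model's base value.
base_value = -1.20          # E[f(x)], the average raw output over the data
shap_values = {
    "FATIGUE": -2.47, "ALLERGY": 1.88, "CHRONIC_DISEASE": 0.90,
    "YELLOW_FINGERS": 0.65, "AGE": 0.40,
}

# Waterfall view: attributions plus the base value recover the raw output f(x).
f_x = base_value + sum(shap_values.values())

# Bar view: features ranked by |SHAP|, the magnitude-of-contribution analogue.
ranked = sorted(shap_values, key=lambda k: abs(shap_values[k]), reverse=True)
print(round(f_x, 2), ranked)
```

The additivity shown here is exactly the property the waterfall plot draws bar by bar, and sorting by absolute value is what the bar plot's ordering encodes.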


Fig. 11. SHAP Beeswarm plot for: (a) GBM, (b) XGB, and (c) LGBM.

As we can see from this section, GBM obtains the maximum accuracy of 98.76% with a precision of 98.79%, recall of 98.76%, and F-measure of 98.76%, with a 0.012% error rate. These signs collectively indicate that lung cancer detection can be modeled using GBM and are highly significant overall. Finally, the 35% testing set is used to test GBM once it has been retrained on the entire 65% training set. Lung cancer data have been shown to benefit from using SHAP values for model explainability. It is interesting to note that this work demonstrates how the importance given to features by the absolute SHAP values may be stretched to serve as a feature selection method. Explainable properties, which are used in this approach, could be beneficial for feature selection, a common pre-processing step in machine learning. We anticipate that feature selection based on SHAP values will become a popular strategy among machine learning practitioners.

A great deal of research has been done on this subject by numerous researchers using diverse methods and producing a range of findings. Accurate cancer forecasting is crucial since lung cancer affects people all around the world. The significance of lung cancer inspired us to pursue this topic. This study focuses on the use of XML to forecast lung cancer and demonstrates a useful implementation (a mobile app) that can predict cancer based on given inputs. A comparison of this work with earlier research is provided in Table 4 in order to situate it within existing knowledge and identify the gaps in the literature that our study addresses.
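The figures reported above follow directly from Eqs. (1)-(5) applied to confusion-matrix counts. A minimal sketch, using illustrative counts rather than the experiment's actual confusion matrix:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F-measure, and error rate (Eqs. 1-5), in %."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    return {
        "accuracy": (tp + tn) / total * 100,
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * recall * precision / (recall + precision),
        "error": (fp + fn) / total * 100,
    }

# Hypothetical counts for a near-perfect classifier on 100 test samples.
m = metrics(tp=48, tn=49, fp=1, fn=2)
print({name: round(value, 2) for name, value in m.items()})
```

Note that accuracy and error are complementary by construction, which is why the table reports both only as a convenience.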


Fig. 11 (continued).

Table 4
Comparison of our work with the most related works.

Lung cancer prediction (XML)
Author | Dataset | Proposed models | Performance
Sobhan et al.1 (2022) | UCSC Xena database, 1415 instances | XGBoost | Accuracy: 96.3%
Alsinglawi et al.2 (2022) | MIMIC-III data, 53,423 instances | RF | AUC: 98% (95.3%-100%), Recall: 98% (95.3%-100%)
Katarzyna et al.3 (2022) | Domestic Lung Cancer dataset, 34,393 individuals | BACH, PLCOm2012, and LCART | Performance and accuracy not reported; focuses only on comprehending how the models act for various patients
Jamie et al.4 (2021) | Simulacrum dataset, 1,322,100 instances | XGBoost | Precision: 78%, Recall: 78%, Accuracy: 78%

Lung cancer prediction (ML)
Author | Dataset | Proposed models | Performance
Elias Dritsas and Maria Trigka5 (2022) | Kaggle dataset, 309 instances | Rotation Forest (RotF) | AUC: 99.3%; F-measure, precision, recall, and accuracy: 97.1%
Muntasir et al.6 (2022) | Kaggle dataset, 309 instances | XGBoost | AUC: 98.14%, Precision: 95.66%, Accuracy: 94.42%, Recall: 94.46%
Patra7 (2020) | UCI repository, 32 instances | Radial Basis Function Network | Accuracy: 81.25%, F-score: 81.3%, AUC: 74.9%, Precision: 81.3%, Recall: 81.3%
Sim et al.8 (2020) | HRQOL data, 809 individuals | AdaBoost | AUC: 94.9%, Accuracy: 94.8%
Proposed | Kaggle, 309 instances and 16 features in total | GBM (XML) | Accuracy: 98.76%, F-score: 98.76%, AUC: train 1.0, test 0.991, Precision: 98.79%, Recall: 98.76%

Implementation of mobile app

The practical component of this study is shown in this section. The experiment's application, which was created using the best model, is depicted in Fig. 12. React Native was used to build this application. The program has a user form with input fields that forecasts lung cancer and collects user comments. The model is initially serialized as a .pkl file from the Jupyter notebook, after which a Flask application is used to create an API. This API is used to connect the machine learning model to an Android app, and the results are then shown on the screen. Fig. 12(a) and (b) depict the mobile app where anyone can submit inputs for forecasting the outcome. Fig. 12(c) and (d) show the output after entering the inputs.

Conclusion and future work

In this study, we have made an effort to close the gap regarding the interpretability of these risk models. Explainable machine learning algorithms have gained significant traction in various sectors, and lung cancer problems have already been addressed using XML techniques. Here we have also shown how to use explainable machine learning techniques to predict lung cancer. The XML approach offers research insights into the characteristics that are most crucial for cancer prognosis. For this, GBM, XGBoost, and LGBM are explained using SHAP. The ability to satisfy multiple desirable qualities, such as consistency, locality, and missingness, makes SHAP a popular option for model interpretability. It is crucial to be able to demonstrate the inner workings of a medical system.


Fig. 12. Android application (a,b) input field (c,d) result field.
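The deployment pipeline behind this app (train, serialize to .pkl, serve over HTTP, consume from the mobile client) can be sketched with the standard library. The paper's actual backend used Flask; the toy model, feature names, weights, and endpoint below are all placeholders invented for illustration:

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the trained GBM: a toy linear scorer (weights and bias invented).
model = {"weights": {"AGE": 0.04, "FATIGUE": 1.1, "ALLERGY": 0.9}, "bias": -2.5}

blob = pickle.dumps(model)   # what the notebook would write to the .pkl file
loaded = pickle.loads(blob)  # what the serving process loads back at startup

def predict(features):
    """Score a dict of inputs and return a JSON-friendly verdict."""
    score = loaded["bias"] + sum(
        loaded["weights"].get(name, 0.0) * value
        for name, value in features.items())
    return {"lung_cancer": score > 0, "score": round(score, 3)}

class PredictHandler(BaseHTTPRequestHandler):
    """Minimal /predict endpoint; the mobile app would POST a feature dict here."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        reply = json.dumps(predict(json.loads(body))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

print(predict({"AGE": 62, "FATIGUE": 1, "ALLERGY": 1}))
# To expose it: HTTPServer(("", 8000), PredictHandler).serve_forever()
```

Keeping the model behind an HTTP API, as the authors did, means the React Native client never needs Python on the device; it only serializes inputs to JSON and renders the returned verdict.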

In SHAP, the relevance of features is determined by their contribution to the model's output, regardless of the model being used. These contributions are employed in this case as a feature selection approach and to rank features in terms of relevance. In this experiment, SHAP proved to be superior to other popular feature selection methods. This finding suggests that using SHAP as a feature selection mechanism can be a good strategy for machine learning solutions that need to be interpretable. We intend to further examine SHAP using deep learning in future works. Additionally, an analysis of images with explainability will be conducted.

Author Contribution

STR, NB, and KMMU were responsible for the conceptualization and design of the study. They also had full access to all the study's data and accepted responsibility for the accuracy of the model generation and the study's data. All of the contributors worked together to write the article. The report was critically revised with input from STR, SKD, and others. All of the results and data presentation techniques were produced by NB and KMMU. The final version has been reviewed and approved by all authors, who also contributed to the data collection and analysis.

Funding

None.

Ethical Approval

Not required.

Consent to participate

Not required.

Data availability

On reasonable request, the corresponding author will provide the data that support the study's findings.

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Cassidy A, Duffy SW, Myles JP, Liloglou T, Field JK. Lung cancer risk prediction: a tool for early detection. Int J Cancer Jan. 2007;120(1):1–6. [Link]
2. Diagnosis of lung cancer prediction system using data mining classification techniques. [Online]. Available: [Link]
3. Qiang Y, Guo Y, Li X, Wang Q, Chen H, Cuic D. The diagnostic rules of peripheral lung cancer preliminary study based on data mining technique. [Online]. Available: [Link]/locate/jnmu.
4. Shopland DR, Eyre HJ, Pechacek TF. Smoking-attributable cancer mortality in 1991: is lung cancer now the leading cause of death among smokers in the United States? [Online]. Available: [Link]
5. Karabatak M, Ince MC. An expert system for detection of breast cancer based on association rules and neural network. Expert Syst Appl 2009;36(2):3465–3469. [Link]
6. Stokowy T, Wojtaś B, Krajewska J, Stobiecka E, Dralle H, Musholt T, et al. A two miRNA classifier differentiates follicular thyroid carcinomas from follicular thyroid adenomas. Mol Cell Endocrinol Jan. 2015;399:43–49. [Link]
7. Zhang R, bin Huang G, Sundararajan N, Saratchandran P. Multicategory classification using an extreme learning machine for microarray gene expression cancer diagnosis. IEEE/ACM Trans Comput Biol Bioinform Jul. 2007;4(3):485–494. [Link]
8. Wang Y, et al. Gene selection from microarray data for cancer classification - a machine learning approach. Comput Biol Chem Feb. 2005;29(1):37–46. [Link]
9. Kim B, et al. Interpretability beyond feature attribution: quantitative testing with Concept Activation Vectors (TCAV). 2018.
10. Tan S, Caruana R, Hooker G, Lou Y. Distill-and-compare: auditing black-box models using transparent model distillation. AIES 2018 - Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society; Dec. 2018. p. 303–310. [Link]
11. Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Aug. 2016. p. 1135–1144. [Link]
12. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. [Online]. Available: [Link]
13. Elshawi R, Al-Mallah MH, Sakr S. On the interpretability of machine learning-based model for predicting hypertension. BMC Med Inform Decis Mak Jul. 2019;19(1). [Link]
14. Ibrahim M, Louie M, Modarres C, Paisley J. Global explanations of neural networks: mapping the landscape of predictions. 2019. [Link]
15. Whitmore LS, George A, Hudson CM. Mapping chemical performance on molecular structures using locally interpretable explanations. Nov. 2016. [Online]. Available: [Link]
16. Phillips PJ, et al. Four principles of explainable artificial intelligence. Gaithersburg, MD; Sep. 2021. [Link]
17. Štrumbelj E, Kononenko I. Explaining prediction models and individual predictions with feature contributions. Knowl Inf Syst Nov. 2014;41(3):647–665. [Link]
18. Cosgriff CV, Celi LA. Exploiting temporal relationships in the prediction of mortality. Lancet Digital Health Apr. 2020;2(4):e152–e153. [Link]
19. Lundberg SM, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell Jan. 2020;2(1):56–67. [Link]
20. Lundberg SM, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng Oct. 2018;2(10):749–760. [Link]
21. Singh A, Sengupta S, Lakshminarayanan V. Explainable deep learning models in medical image analysis. J Imaging Jun. 2020;6(6). [Link]
22. Xi J, Zhao W, Yuan JE, Cao B, Zhao L. Multi-resolution classification of exhaled aerosol images to detect obstructive lung diseases in small airways. Comput Biol Med Aug. 2017;87:57–69. [Link]
23. Li W, Jia Z, Xie D, Chen K, Cui J, Liu H. Recognizing lung cancer using a homemade e-nose: a comprehensive study. Comput Biol Med May 2020;120. [Link]
24. Sobhan M, Mondal AM. Explainable machine learning to identify patient-specific biomarkers for lung cancer. [Link]
25. Alsinglawi B, et al. An explainable machine learning framework for lung cancer hospital length of stay prediction. Sci Rep Dec. 2022;12(1). [Link]
26. Kobylińska K, Orłowski T, Adamek M, Biecek P. Explainable machine learning for lung cancer screening models. Appl Sci (Switzerland) Feb. 2022;12(4). [Link]
27. Duell J, Fan X, Burnett B, Aarts G, Zhou S-M. A comparison of explanations given by explainable artificial intelligence methods on analysing electronic health records. [Online]. Available: [Link]
28. Dritsas E, Trigka M. Lung cancer risk prediction with machine learning models. Big Data Cogn Comput Nov. 2022;6(4):139. [Link]
29. Mamun M, Farjana A, al Mamun M, Ahammed MS. Lung cancer prediction model using ensemble learning techniques and a systematic review analysis. 2022 IEEE World AI IoT Congress (AIIoT); 2022. p. 187–193. [Link]
30. Patra R. Prediction of lung cancer using machine learning classifier. Communications in Computer and Information Science (CCIS); 2020. p. 132–142. [Link]
31. Sim J-a, et al. The major effects of health-related quality of life on 5-year survival prediction among lung cancer survivors: applications of machine learning. Sci Rep Dec. 2020;10(1). [Link]
32. Lung Cancer Prediction Dataset, Kaggle: ysarahmadbhat/lung-cancer. Available online: [Link]
33. Ahmed N, et al. Machine learning based diabetes prediction and development of smart web application. Int J Cognit Comput Eng Jun. 2021;2:229–241. [Link]
34. Mohammed R, Rawashdeh J, Abdullah M. Machine learning with oversampling and undersampling techniques: overview study and experimental results. 2020 11th International Conference on Information and Communication Systems (ICICS); Apr. 2020. p. 243–248. [Link]
35. Tharwat A. Principal component analysis - a tutorial. Int J Appl Pattern Recognit 2016;3(3):197. [Link]
36. Kumar S. Effective hedging strategy for US Treasury bond portfolio using principal component analysis. [Online]. Available: [Link]
37. Biswas N, Uddin KMM, Rikta ST, Dey SK. A comparative analysis of machine learning classifiers for stroke prediction: a predictive analytics approach. Healthcare Anal Nov. 2022;2:100116. [Link]
38. Mir Ishrak A, Dhruba M, Haider N, et al. Application of machine learning in credit risk assessment: a prelude to smart banking. 2018.
39. Saleh H, et al. Stroke prediction using distributed machine learning based on Apache Spark. Int J Adv Sci Technol 2019;28(15):89–97. [Link]
40. Wang D, Zhang Y, Zhao Y. LightGBM: an effective miRNA classification method in breast cancer patients. ACM International Conference Proceeding Series; Oct. 2017. p. 7–11. [Link]
41. Lundberg SM, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng 2018;2(10):749–760. [Link]
42. Li R, et al. Machine learning-based interpretation and visualization of nonlinear interactions in prostate cancer survival. JCO Clin Cancer Inform 2020;4:637–646. [Link]
43. Du Y, Rafferty AR, McAuliffe FM, Wei L, Mooney C. An explainable machine learning-based clinical decision support system for prediction of gestational diabetes mellitus. Sci Rep Dec. 2022;12(1). [Link]
44. Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manag Jul. 2009;45(4):427–437. [Link]
45. Luque A, Carrasco A, Martín A, de las Heras A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit Jul. 2019;91:216–231. [Link]
46. Fisher A, Rudin C, Dominici F. All models are wrong, but many are useful: learning a variable's importance by studying an entire class of prediction models simultaneously. 2019.