TY - JOUR AU - Liu, Huasheng AU - Shang, Guangqian AU - Shan, Qianqian PY - 2025/10/3 TI - Deep Learning Algorithms in the Diagnosis of Basal Cell Carcinoma Using Dermatoscopy: Systematic Review and Meta-Analysis JO - J Med Internet Res SP - e73541 VL - 27 KW - deep learning algorithms KW - dermatoscopy KW - basal cell carcinoma KW - meta-analysis KW - artificial intelligence KW - AI N2 - Background: In recent years, deep learning algorithms based on dermatoscopy have shown great potential in diagnosing basal cell carcinoma (BCC). However, the diagnostic performance of deep learning algorithms remains controversial. Objective: This meta-analysis evaluates the diagnostic performance of deep learning algorithms based on dermatoscopy in detecting BCC. Methods: An extensive search in PubMed, Embase, and Web of Science databases was conducted to locate pertinent studies published until November 4, 2024. This meta-analysis included articles that reported the diagnostic performance of deep learning algorithms based on dermatoscopy for detecting BCC. The quality and risk of bias in the included studies were assessed using the modified Quality Assessment of Diagnostic Accuracy Studies 2 tool. A bivariate random-effects model was used to calculate the pooled sensitivity and specificity, both with 95% CIs. Results: Of the 1941 studies identified, 15 (0.77%) were included (internal validation sets of 32,069 patients or images; external validation sets of 200 patients or images). For dermatoscopy-based deep learning algorithms, the pooled sensitivity, specificity, and area under the curve (AUC) were 0.96 (95% CI 0.93-0.98), 0.98 (95% CI 0.96-0.99), and 0.99 (95% CI 0.98-1.00). For dermatologists' diagnoses, the sensitivity, specificity, and AUC were 0.75 (95% CI 0.66-0.82), 0.97 (95% CI 0.95-0.98), and 0.96 (95% CI 0.94-0.98). The results showed that dermatoscopy-based deep learning algorithms had a higher AUC than dermatologists' 
performance when using internal validation datasets (z=2.63; P=.008). Conclusions: This meta-analysis suggests that deep learning algorithms based on dermatoscopy exhibit strong diagnostic performance for detecting BCC. However, the retrospective design of many included studies and variations in reference standards may restrict the generalizability of these findings. The models evaluated in the included studies generally showed improved performance over that of dermatologists in classifying dermatoscopic images of BCC using internal validation datasets, highlighting their potential to support future diagnoses. However, performance on internal validation datasets does not necessarily translate well to external validation datasets. Additional external validation of these results is necessary to enhance the application of deep learning in dermatological diagnostics. Trial Registration: PROSPERO International Prospective Register of Systematic Reviews CRD42025633947; https://www.crd.york.ac.uk/PROSPERO/view/CRD42025633947 UR - https://www.jmir.org/2025/1/e73541 UR - http://dx.doi.org/10.2196/73541 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/73541 ER - TY - JOUR AU - McRae, Charlotte AU - Zhang, Dan Ting AU - Seeley, Donoghue Leslie AU - Anderson, Michael AU - Turner, Laci AU - Graham, V. Lauren PY - 2025/9/16 TI - Patient Perceptions of Artificial Intelligence and Telemedicine in Dermatology: Narrative Review JO - JMIR Dermatol SP - e75454 VL - 8 KW - digital health KW - technology KW - patient-centered care KW - health care innovation KW - trust KW - convergence KW - artificial intelligence KW - teledermatology N2 - Background: Artificial intelligence (AI) and telemedicine have significant potential to transform dermatology care delivery, but patient perspectives on these technologies have not been systematically compared. 
Objective: This study aimed to examine patient perspectives on AI and telemedicine in dermatology to inform implementation strategies as these technologies increasingly converge in clinical practice. Methods: A comprehensive literature search was conducted using PubMed, Scopus, and Embase databases between August 2024 and October 2024. We identified 48 papers addressing patient perspectives on AI and telemedicine in dermatology, with none directly comparing patients' views of both technologies. Results: Several distinct themes emerged regarding patient perspectives on these technologies: willingness to use, perceived benefits and risks, barriers to implementation, and conditions necessary for successful integration. Findings revealed that patients express hesitancy toward AI-based diagnoses that lack dermatologist involvement, while preferences for teledermatology varied by reason for appointment, age, and previous technology exposure. Patients' motivations for implementing AI are connected to its potential for quicker diagnoses and improved triage efficiency. At the same time, telemedicine addresses logistical challenges such as reduced travel time and improved appointment availability. Both technologies were perceived to improve accessibility and diagnostic efficiency, though patients expressed concerns about AI's limited communication abilities and teledermatology's inability to perform physical examinations. Primary adoption barriers for these modalities included technological limitations and trust concerns, with patients emphasizing the need for dermatologist oversight, transparency, and adequate educational resources for successful integration. Conclusions: The complementary strengths of AI and teledermatology suggest they could mitigate each other's limitations when integrated: AI potentially enhancing teledermatology's diagnostic accuracy, while teledermatology addresses AI's lack of human connection. 
By thoroughly examining these perspectives, this review may serve as a guide for the patient-centered integration of technology in the future landscape of accessible dermatologic care. UR - https://derma.jmir.org/2025/1/e75454 UR - http://dx.doi.org/10.2196/75454 ID - info:doi/10.2196/75454 ER - TY - JOUR AU - Ghazanfar, Noshela Misbah AU - Al-Mousawi, Ali AU - Riemer, Christian AU - Björnsson, Þór Benóný AU - Boissard, Charlotte AU - Lee, Ivy AU - Ali, Zarqa AU - Thomsen, Francis Simon PY - 2025/7/16 TI - Effectiveness of a Machine Learning-Enabled Skincare Recommendation for Mild-to-Moderate Acne Vulgaris: 8-Week Evaluator-Blinded Randomized Controlled Trial JO - JMIR Dermatol SP - e60883 VL - 8 KW - machine learning KW - personalised skincare KW - acne vulgaris KW - dermatology KW - skincare N2 - Background: Acne vulgaris (AV) is one of the most common skin disorders, with a peak incidence in adolescence and early adulthood. Topical treatments are usually used for mild to moderate AV; however, a lack of adherence to topical treatment is seen in patients due to various reasons. Therefore, personalized skincare recommendations may be beneficial for treating mild-to-moderate AV. Objective: This study aimed to evaluate the effectiveness of a novel machine learning approach in predicting the optimal treatment for mild-to-moderate AV based on self-assessment and objective measures. Methods: A randomized, evaluator-blinded, parallel-group study was conducted on 100 patients recruited from an internet-based database and randomized in a 1:1 ratio (groups A and B) based on their consent form submission. Groups A and B received customized product recommendations using a Bayesian machine learning model and self-selected treatments, respectively. The patients submitted self-assessed disease scores and photographs after the 8-week treatment. 
The primary and secondary outcomes were photograph evaluation by two board-certified dermatologists using the Investigator Global Assessment (IGA) scores and quality of life (QoL) measured using the Dermatology Life Quality Index (DLQI), respectively. Results: Overall, 99 patients were screened, and 68 patients (mean age: 27 years, SD 4.56 years) were randomized into groups A (customized) and B (self-selected). IGA scores significantly improved after treatment in group A but not in group B (mean difference in IGA score; group A=0.32, P=.04 vs group B=0.09, P=.54). The DLQI significantly improved in group A from 7.75 at baseline to 3.5 (P<.001) after treatment but reduced in group B from 7.53 to 5.3 (P>.05). IGA scores and the DLQI were significantly correlated in group A, but not in group B. A total of 3 patients reported adverse reactions in group B, but none in group A. Conclusions: Using a machine learning model for personalized skincare recommendations significantly reduced symptoms and improved severity and overall QoL of patients with mild-to-moderate AV, supporting the potential of machine learning-based personalized treatment options in dermatology. 
UR - https://derma.jmir.org/2025/1/e60883 UR - http://dx.doi.org/10.2196/60883 ID - info:doi/10.2196/60883 ER - TY - JOUR AU - Brehmer, Alexander AU - Seibold, Constantin AU - Egger, Jan AU - Majjouti, Khalid AU - Tapp-Herrenbrück, Michaela AU - Pinnekamp, Hannah AU - Priester, Vanessa AU - Aleithe, Michael AU - Fischer, Uli AU - Hosters, Bernadette AU - Kleesiek, Jens PY - 2025/5/1 TI - Fine-Grained Classification of Pressure Ulcers and Incontinence-Associated Dermatitis Using Multimodal Deep Learning: Algorithm Development and Validation Study JO - JMIR AI SP - e67356 VL - 4 KW - computer vision KW - image classification KW - wound classification KW - deep learning KW - pressure ulcer KW - incontinence-associated dermatitis KW - multimodal data KW - synthetic image generation N2 - Background: Pressure ulcers (PUs) and incontinence-associated dermatitis (IAD) are prevalent conditions in clinical settings, posing significant challenges due to their similar presentations but differing treatment needs. Accurate differentiation between PUs and IAD is essential for appropriate patient care, yet it remains a burden for nursing staff and wound care experts. Objective: This study aims to develop and introduce a robust multimodal deep learning framework for the classification of PUs and IAD, along with the fine-grained categorization of their respective wound severities, to enhance diagnostic accuracy and support clinical decision-making. Methods: We collected and annotated a dataset of 1555 wound images, achieving consensus among 4 wound experts. Our framework integrates wound images with categorical patient data to improve classification performance. We evaluated 4 models (2 convolutional neural networks and 2 transformer-based architectures), each with approximately 25 million parameters. 
Various data preprocessing strategies, augmentation techniques, training methods (including multimodal data integration, synthetic data generation, and sampling), and postprocessing approaches (including ensembling and test-time augmentation) were systematically tested to optimize model performance. Results: The transformer-based TinyViT model achieved the highest performance in binary classification of PU and IAD, with an F1-score (harmonic mean of precision and recall) of 93.23%, outperforming wound care experts and nursing staff on the test dataset. In fine-grained classification of wound categories, the TinyViT model also performed best for PU categories with an F1-score of 75.43%, while ConvNeXtV2 showed superior performance in IAD category classification with an F1-score of 53.20%. Incorporating multimodal data improved performance in binary classification but had less impact on fine-grained categorization. Augmentation strategies and training techniques significantly influenced model performance, with ensembling enhancing accuracy across all tasks. Conclusions: Our multimodal deep learning framework effectively differentiates between PUs and IAD, achieving high accuracy and outperforming human wound care experts. By integrating wound images with categorical patient data, the model enhances diagnostic precision, offering a valuable decision-support tool for health care professionals. This advancement has the potential to reduce diagnostic uncertainty, optimize treatment pathways, and alleviate the burden on medical staff, leading to faster interventions and improved patient outcomes. The framework's strong performance suggests practical applications in clinical settings, such as integration into hospital electronic health record systems or mobile applications for bedside diagnostics. 
Future work should focus on validating real-world implementation, expanding dataset diversity, and refining fine-grained classification capabilities to further enhance clinical utility. UR - https://ai.jmir.org/2025/1/e67356 UR - http://dx.doi.org/10.2196/67356 ID - info:doi/10.2196/67356 ER - TY - JOUR AU - Jones, Tudor Owain AU - Calanzani, Natalia AU - Scott, E. Suzanne AU - Matin, N. Rubeta AU - Emery, Jon AU - Walter, M. Fiona PY - 2025/1/28 TI - User and Developer Views on Using AI Technologies to Facilitate the Early Detection of Skin Cancers in Primary Care Settings: Qualitative Semistructured Interview Study JO - JMIR Cancer SP - e60653 VL - 11 KW - artificial intelligence KW - AI KW - machine learning KW - ML KW - primary care KW - skin cancer KW - melanoma KW - qualitative research KW - mobile phone N2 - Background: Skin cancers, including melanoma and keratinocyte cancers, are among the most common cancers worldwide, and their incidence is rising in most populations. Earlier detection of skin cancer leads to better outcomes for patients. Artificial intelligence (AI) technologies have been applied to skin cancer diagnosis, but many technologies lack clinical evidence and/or the appropriate regulatory approvals. There are few qualitative studies examining the views of relevant stakeholders or evidence about the implementation and positioning of AI technologies in the skin cancer diagnostic pathway. Objective: This study aimed to understand the views of several stakeholder groups on the use of AI technologies to facilitate the early diagnosis of skin cancer, including patients, members of the public, general practitioners, primary care nurse practitioners, dermatologists, and AI researchers. Methods: This was a qualitative, semistructured interview study with 29 stakeholders. Participants were purposively sampled based on age, sex, and geographical location. We conducted the interviews via Zoom between September 2022 and May 2023. 
Transcribed recordings were analyzed using thematic framework analysis. The Nonadoption, Abandonment, and Challenges to Scale-Up, Spread, and Sustainability framework was used to guide the analysis to help understand the complexity of implementing diagnostic technologies in clinical settings. Results: Major themes were "the position of AI in the skin cancer diagnostic pathway" and "the aim of the AI technology"; cross-cutting themes included trust, usability and acceptability, generalizability, evaluation and regulation, implementation, and long-term use. There was no clear consensus on where AI should be placed along the skin cancer diagnostic pathway, but most participants saw the technology in the hands of either patients or primary care practitioners. Participants were concerned about the quality of the data used to develop and test AI technologies and the impact this could have on their accuracy in clinical use with patients from a range of demographics and the risk of missing skin cancers. Ease of use and not increasing the workload of already strained health care services were important considerations for participants. Health care professionals and AI researchers reported a lack of established methods of evaluating and regulating AI technologies. Conclusions: This study is one of the first to examine the views of a wide range of stakeholders on the use of AI technologies to facilitate early diagnosis of skin cancer. The optimal approach and position in the diagnostic pathway for these technologies have not yet been determined. AI technologies need to be developed and implemented carefully and thoughtfully, with attention paid to the quality and representativeness of the data used for development, to achieve their potential. 
UR - https://cancer.jmir.org/2025/1/e60653 UR - http://dx.doi.org/10.2196/60653 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60653 ER - TY - JOUR AU - Willem, Theresa AU - Wollek, Alessandro AU - Cheslerean-Boghiu, Theodor AU - Kenney, Martha AU - Buyx, Alena PY - 2025/1/28 TI - The Social Construction of Categorical Data: Mixed Methods Approach to Assessing Data Features in Publicly Available Datasets JO - JMIR Med Inform SP - e59452 VL - 13 KW - machine learning KW - categorical data KW - social context dependency KW - mixed methods KW - dermatology KW - dataset analysis N2 - Background: In data-sparse areas such as health care, computer scientists aim to leverage as much available information as possible to increase the accuracy of their machine learning models' outputs. As a standard, categorical data, such as patients' gender, socioeconomic status, or skin color, are used to train models in fusion with other data types, such as medical images and text-based medical information. However, the effects of including categorical data features for model training in such data-scarce areas are underexamined, particularly regarding models intended to serve individuals equitably in a diverse population. Objective: This study aimed to explore categorical data's effects on machine learning model outputs, rooted the effects in the data collection and dataset publication processes, and proposed a mixed methods approach to examining datasets' data categories before using them for machine learning training. Methods: Against the theoretical background of the social construction of categories, we suggest a mixed methods approach to assess categorical data's utility for machine learning model training. As an example, we applied our approach to a Brazilian dermatological dataset (Dermatological and Surgical Assistance Program at the Federal University of Espírito Santo [PAD-UFES] 20). 
We first present an exploratory, quantitative study that assesses the effects when including or excluding each of the unique categorical data features of the PAD-UFES 20 dataset for training a transformer-based model using a data fusion algorithm. We then pair our quantitative analysis with a qualitative examination of the data categories based on interviews with the dataset authors. Results: Our quantitative study suggests scattered effects of including categorical data for machine learning model training across predictive classes. Our qualitative analysis gives insights into how the categorical data were collected and why they were published, explaining some of the quantitative effects that we observed. Our findings highlight the social constructedness of categorical data in publicly available datasets, meaning that the data in a category heavily depend on both how these categories are defined by the dataset creators and the sociomedical context in which the data are collected. This reveals relevant limitations of using publicly available datasets in contexts different from those of the collection of their data. Conclusions: We caution against using data features of publicly available datasets without reflection on the social construction and context dependency of their categorical data features, particularly in data-sparse areas. We conclude that social scientific, context-dependent analysis of available data features using both quantitative and qualitative methods is helpful in judging the utility of categorical data for the population for which a model is intended. 
UR - https://medinform.jmir.org/2025/1/e59452 UR - http://dx.doi.org/10.2196/59452 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59452 ER - TY - JOUR AU - Wang, Wei AU - Chen, Xiang AU - Xu, Licong AU - Huang, Kai AU - Zhao, Shuang AU - Wang, Yong PY - 2024/12/27 TI - Artificial Intelligence-Aided Diagnosis System for the Detection and Classification of Private-Part Skin Diseases: Decision Analytical Modeling Study JO - J Med Internet Res SP - e52914 VL - 26 KW - artificial intelligence-aided diagnosis KW - private parts KW - skin disease KW - knowledge graph KW - dermatology KW - classification KW - artificial intelligence KW - AI KW - diagnosis N2 - Background: Private-part skin diseases (PPSDs) can cause a patient's stigma, which may hinder the early diagnosis of these diseases. Artificial intelligence (AI) is an effective tool to improve the early diagnosis of PPSDs, especially in preventing the deterioration of skin tumors in private parts such as Paget disease. However, to our knowledge, there is currently no research on using AI to identify PPSDs due to the complex backgrounds of the lesion areas and the challenges in data collection. Objective: This study aimed to develop and evaluate an AI-aided diagnosis system for the detection and classification of PPSDs: aiding patients in self-screening and supporting dermatologists' diagnostic enhancement. Methods: In this decision analytical modeling study, a 2-stage AI-aided diagnosis system was developed to classify PPSDs. In the first stage, a multitask detection network was trained to automatically detect and classify skin lesions (type, color, and shape). In the second stage, we proposed a knowledge graph based on dermatology expertise and constructed a decision network to classify seven PPSDs (condyloma acuminatum, Paget disease, eczema, pearly penile papules, genital herpes, syphilis, and Bowen disease). A reader study with 13 dermatologists of different experience levels was conducted. 
Dermatologists were asked to classify the testing cohort under reading room conditions, first without and then with system support. This AI-aided diagnostic study used the data of 635 patients from two institutes between July 2019 and April 2022. The data of Institute 1 contained 2701 skin lesion samples from 520 patients, which were used for the training of the multitask detection network in the first stage. In addition, the data of Institute 2 consisted of 115 clinical images and the corresponding medical records, which were used for the test of the whole 2-stage AI-aided diagnosis system. Results: On the test data of Institute 2, the proposed system achieved the average precision, recall, and F1-score of 0.81, 0.86, and 0.83, respectively, better than existing advanced algorithms. For the reader performance test, our system improved the average F1-score of the junior, intermediate, and senior dermatologists by 16%, 7%, and 4%, respectively. Conclusions: In this study, we constructed the first skin-lesion-based dataset and developed the first AI-aided diagnosis system for PPSDs. This system provides the final diagnosis result by simulating the diagnostic process of dermatologists. Compared with existing advanced algorithms, this system is more accurate in identifying PPSDs. Overall, our system can not only help patients achieve self-screening and alleviate their stigma but also assist dermatologists in diagnosing PPSDs. 
UR - https://www.jmir.org/2024/1/e52914 UR - http://dx.doi.org/10.2196/52914 UR - http://www.ncbi.nlm.nih.gov/pubmed/39729353 ID - info:doi/10.2196/52914 ER - TY - JOUR AU - Parekh, Pranav AU - Oyeleke, Richard AU - Vishwanath, Tejas PY - 2024/12/18 TI - The Depth Estimation and Visualization of Dermatological Lesions: Development and Usability Study JO - JMIR Dermatol SP - e59839 VL - 7 KW - machine learning KW - ML KW - computer vision KW - neural networks KW - explainable AI KW - XAI KW - computer graphics KW - red spot analysis KW - mixed reality KW - MR KW - artificial intelligence KW - visualization N2 - Background: Thus far, considerable research has been focused on classifying a lesion as benign or malignant. However, there is a requirement for quick depth estimation of a lesion for the accurate clinical staging of the lesion. The lesion could be malignant and quickly grow beneath the skin. While biopsy slides provide clear information on lesion depth, it is an emerging domain to find quick and noninvasive methods to estimate depth, particularly based on 2D images. Objective: This study proposes a novel methodology for the depth estimation and visualization of skin lesions. Current diagnostic methods are approximate in determining how much a lesion may have proliferated within the skin. Using color gradients and depth maps, this method will give us a definite estimate and visualization procedure for lesions and other skin issues. We aim to generate 3D holograms of the lesion depth such that dermatologists can better diagnose melanoma. Methods: We started by performing classification using a convolutional neural network (CNN), followed by using explainable artificial intelligence to localize the image features responsible for the CNN output. We used the gradient class activation map approach to perform localization of the lesion from the rest of the image. We applied computer graphics for depth estimation and developing the 3D structure of the lesion. 
We used the depth from defocus method for depth estimation from single images and Gabor filters for volumetric representation of the depth map. Our novel method, called red spot analysis, measures the degree of infection based on how a conical hologram is constructed. We collaborated with a dermatologist to analyze the 3D hologram output and received feedback on how this method can be introduced to clinical implementation. Results: The neural model plus the explainable artificial intelligence algorithm achieved an accuracy of 86% in classifying the lesions correctly as benign or malignant. For the entire pipeline, we mapped the benign and malignant cases to their conical representations. We received exceedingly positive feedback while pitching this idea at the King Edward Memorial Institute in India. Dermatologists considered this a potentially useful tool in the depth estimation of lesions. We received a number of ideas for evaluating the technique before it can be introduced to the clinical scene. Conclusions: When we map the CNN outputs (benign or malignant) to the corresponding hologram, we observe that a malignant lesion has a higher concentration of red spots (infection) in the upper and deeper portions of the skin, and that the malignant cases have deeper conical sections when compared with the benign cases. This proves that the qualitative results map with the initial classification performed by the neural model. The positive feedback provided by the dermatologist suggests that the qualitative conclusion of the method is sufficient. 
UR - https://derma.jmir.org/2024/1/e59839 UR - http://dx.doi.org/10.2196/59839 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59839 ER - TY - JOUR AU - Liu, Xu AU - Duan, Chaoli AU - Kim, Min-kyu AU - Zhang, Lu AU - Jee, Eunjin AU - Maharjan, Beenu AU - Huang, Yuwei AU - Du, Dan AU - Jiang, Xian PY - 2024/8/6 TI - Claude 3 Opus and ChatGPT With GPT-4 in Dermoscopic Image Analysis for Melanoma Diagnosis: Comparative Performance Analysis JO - JMIR Med Inform SP - e59273 VL - 12 KW - artificial intelligence KW - AI KW - large language model KW - LLM KW - Claude KW - ChatGPT KW - dermatologist N2 - Background: Recent advancements in artificial intelligence (AI) and large language models (LLMs) have shown potential in medical fields, including dermatology. With the introduction of image analysis capabilities in LLMs, their application in dermatological diagnostics has garnered significant interest. These capabilities are enabled by the integration of computer vision techniques into the underlying architecture of LLMs. Objective: This study aimed to compare the diagnostic performance of Claude 3 Opus and ChatGPT with GPT-4 in analyzing dermoscopic images for melanoma detection, providing insights into their strengths and limitations. Methods: We randomly selected 100 histopathology-confirmed dermoscopic images (50 malignant, 50 benign) from the International Skin Imaging Collaboration (ISIC) archive using a computer-generated randomization process. The ISIC archive was chosen due to its comprehensive and well-annotated collection of dermoscopic images, ensuring a diverse and representative sample. Images were included if they were dermoscopic images of melanocytic lesions with histopathologically confirmed diagnoses. Each model was given the same prompt, instructing it to provide the top 3 differential diagnoses for each image, ranked by likelihood. 
Primary diagnosis accuracy, accuracy of the top 3 differential diagnoses, and malignancy discrimination ability were assessed. The McNemar test was chosen to compare the diagnostic performance of the 2 models, as it is suitable for analyzing paired nominal data. Results: In the primary diagnosis, Claude 3 Opus achieved 54.9% sensitivity (95% CI 44.08%-65.37%), 57.14% specificity (95% CI 46.31%-67.46%), and 56% accuracy (95% CI 46.22%-65.42%), while ChatGPT demonstrated 56.86% sensitivity (95% CI 45.99%-67.21%), 38.78% specificity (95% CI 28.77%-49.59%), and 48% accuracy (95% CI 38.37%-57.75%). The McNemar test showed no significant difference between the 2 models (P=.17). For the top 3 differential diagnoses, Claude 3 Opus and ChatGPT included the correct diagnosis in 76% (95% CI 66.33%-83.77%) and 78% (95% CI 68.46%-85.45%) of cases, respectively. The McNemar test showed no significant difference (P=.56). In malignancy discrimination, Claude 3 Opus outperformed ChatGPT with 47.06% sensitivity, 81.63% specificity, and 64% accuracy, compared to 45.1%, 42.86%, and 44%, respectively. The McNemar test showed a significant difference (P<.001). Claude 3 Opus had an odds ratio of 3.951 (95% CI 1.685-9.263) in discriminating malignancy, while ChatGPT-4 had an odds ratio of 0.616 (95% CI 0.297-1.278). Conclusions: Our study highlights the potential of LLMs in assisting dermatologists but also reveals their limitations. Both models made errors in diagnosing melanoma and benign lesions. These findings underscore the need for developing robust, transparent, and clinically validated AI models through collaborative efforts between AI researchers, dermatologists, and other health care professionals. While AI can provide valuable insights, it cannot yet replace the expertise of trained clinicians. 
UR - https://medinform.jmir.org/2024/1/e59273 UR - http://dx.doi.org/10.2196/59273 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59273 ER - TY - JOUR AU - Gassner, Mathias AU - Barranco Garcia, Javier AU - Tanadini-Lang, Stephanie AU - Bertoldo, Fabio AU - Fröhlich, Fabienne AU - Guckenberger, Matthias AU - Haueis, Silvia AU - Pelzer, Christin AU - Reyes, Mauricio AU - Schmithausen, Patrick AU - Simic, Dario AU - Staeger, Ramon AU - Verardi, Fabio AU - Andratschke, Nicolaus AU - Adelmann, Andreas AU - Braun, P. Ralph PY - 2023/8/24 TI - Saliency-Enhanced Content-Based Image Retrieval for Diagnosis Support in Dermatology Consultation: Reader Study JO - JMIR Dermatol SP - e42129 VL - 6 KW - dermatology KW - deep learning KW - melanoma KW - saliency maps KW - image retrieval KW - dermoscopy KW - skin cancer KW - diagnosis KW - algorithms KW - convolutional neural network KW - dermoscopic images N2 - Background: Previous research studies have demonstrated that medical content image retrieval can play an important role by assisting dermatologists in skin lesion diagnosis. However, current state-of-the-art approaches have not been adopted in routine consultation, partly due to the lack of interpretability limiting trust by clinical users. Objective: This study developed a new image retrieval architecture for polarized or dermoscopic imaging guided by interpretable saliency maps. This approach provides better feature extraction, leading to better quantitative retrieval performance as well as providing interpretability for an eventual real-world implementation. Methods: Content-based image retrieval (CBIR) algorithms rely on the comparison of image features embedded by a convolutional neural network (CNN) against a labeled data set. Saliency maps are computer vision-interpretable methods that highlight the most relevant regions for the prediction made by a neural network. 
By introducing a fine-tuning stage that includes saliency maps to guide feature extraction, the accuracy of image retrieval is optimized. We refer to this approach as saliency-enhanced CBIR (SE-CBIR). A reader study was designed at the University Hospital Zurich Dermatology Clinic to evaluate SE-CBIR's retrieval accuracy as well as the impact of the participant's confidence on the diagnosis. Results: SE-CBIR improved the retrieval accuracy by 7% (77% vs 84%) when doing single-lesion retrieval against traditional CBIR. The reader study showed an overall increase in classification accuracy of 22% (62% vs 84%) when the participant is provided with SE-CBIR retrieved images. In addition, the overall confidence in the lesion's diagnosis increased by 24%. Finally, the use of SE-CBIR as a support tool helped the participants reduce the number of nonmelanoma lesions previously diagnosed as melanoma (overdiagnosis) by 53%. Conclusions: SE-CBIR presents better retrieval accuracy compared to traditional CBIR CNN-based approaches. Furthermore, we have shown how these support tools can help dermatologists and residents improve diagnosis accuracy and confidence. Additionally, by introducing interpretable methods, we should expect increased acceptance and use of these tools in routine consultation.
UR - https://derma.jmir.org/2023/1/e42129 UR - http://dx.doi.org/10.2196/42129 UR - http://www.ncbi.nlm.nih.gov/pubmed/37616039 ID - info:doi/10.2196/42129 ER - TY - JOUR AU - Zhang, Xinyuan AU - Xie, Ziqian AU - Xiang, Yang AU - Baig, Imran AU - Kozman, Mena AU - Stender, Carly AU - Giancardo, Luca AU - Tao, Cui PY - 2022/12/12 TI - Issues in Melanoma Detection: Semisupervised Deep Learning Algorithm Development via a Combination of Human and Artificial Intelligence JO - JMIR Dermatol SP - e39113 VL - 5 IS - 4 KW - deep learning KW - dermoscopic images KW - semisupervised learning KW - 3-point checklist KW - skin lesion KW - dermatology KW - algorithm KW - melanoma classification KW - melanoma KW - automatic diagnosis KW - skin disease N2 - Background: Automatic skin lesion recognition has been shown to be effective in increasing access to reliable dermatology evaluation; however, most existing algorithms rely solely on images. Many diagnostic rules, including the 3-point checklist, encode human knowledge and reflect the diagnostic process of human experts, yet they are not considered by artificial intelligence algorithms. Objective: In this paper, we aimed to develop a semisupervised model that can not only integrate the dermoscopic features and scoring rule from the 3-point checklist but also automate the feature-annotation process. Methods: We first trained the semisupervised model on a small, annotated data set with disease and dermoscopic feature labels and tried to improve the classification accuracy by integrating the 3-point checklist using a ranking loss function. We then used a large, unlabeled data set with only disease labels to learn from the trained algorithm to automatically classify skin lesions and features. Results: After adding the 3-point checklist to our model, its performance for melanoma classification improved from a mean of 0.8867 (SD 0.0191) to 0.8943 (SD 0.0115) under 5-fold cross-validation.
The trained semisupervised model can automatically detect 3 dermoscopic features from the 3-point checklist, with best performances of 0.80 (area under the curve [AUC] 0.8380), 0.89 (AUC 0.9036), and 0.76 (AUC 0.8444), in some cases outperforming human annotators. Conclusions: Our proposed semisupervised learning framework can help with the automatic diagnosis of skin disease based on its ability to detect dermoscopic features and automate the label-annotation process. The framework can also help combine semantic knowledge with a computer algorithm to arrive at a more accurate and more interpretable diagnostic result, which can be applied to broader use cases. UR - https://derma.jmir.org/2022/4/e39113 UR - http://dx.doi.org/10.2196/39113 UR - http://www.ncbi.nlm.nih.gov/pubmed/37632881 ID - info:doi/10.2196/39113 ER - TY - JOUR AU - Rezk, Eman AU - Eltorki, Mohamed AU - El-Dakhakhni, Wael PY - 2022/3/8 TI - Leveraging Artificial Intelligence to Improve the Diversity of Dermatological Skin Color Pathology: Protocol for an Algorithm Development and Validation Study JO - JMIR Res Protoc SP - e34896 VL - 11 IS - 3 KW - artificial intelligence KW - skin cancer KW - skin tone diversity KW - people of color KW - image blending KW - deep learning KW - classification KW - early diagnosis N2 - Background: The paucity of dark skin images in dermatological textbooks and atlases is a reflection of racial injustice in medicine. The underrepresentation of dark skin images makes diagnosing skin pathology in people of color challenging. For conditions such as skin cancer, in which early diagnosis makes a difference between life and death, people of color have worse prognoses and lower survival rates than people with lighter skin tones as a result of delayed or incorrect diagnoses. 
Recent advances in artificial intelligence, such as deep learning, offer a potential solution that can be achieved by diversifying the mostly light-skin image repositories through generating images for darker skin tones, thus facilitating the development of inclusive cancer early diagnosis systems that are trained and tested on diverse images that truly represent human skin tones. Objective: We aim to develop and evaluate an artificial intelligence–based skin cancer early detection system for all skin tones using clinical images. Methods: This study consists of four phases: (1) publicly available skin image repositories will be analyzed to quantify the underrepresentation of darker skin tones, (2) images will be generated for the underrepresented skin tones, (3) generated images will be extensively evaluated for realism and disease presentation with quantitative image quality assessment as well as qualitative human expert and nonexpert ratings, and (4) the images will be utilized with available light-skin images to develop a robust skin cancer early detection model. Results: This study started in September 2020. The first phase of quantifying the underrepresentation of darker skin tones was completed in March 2021. The second phase of generating the images is in progress and will be completed by March 2022. The third phase is expected to be completed by May 2022, and the final phase is expected to be completed by September 2022. Conclusions: This work is the first step toward expanding skin tone diversity in existing image databases to address the current gap in the underrepresentation of darker skin tones. Once validated, the image bank will be a valuable resource that can potentially be utilized in physician education and in research applications. Furthermore, generated images are expected to improve the generalizability of skin cancer detection.
When completed, the model will assist family physicians and general practitioners in evaluating skin lesion severity and in efficient triaging for referral to expert dermatologists. In addition, the model can assist dermatologists in diagnosing skin lesions. International Registered Report Identifier (IRRID): DERR1-10.2196/34896 UR - https://www.researchprotocols.org/2022/3/e34896 UR - http://dx.doi.org/10.2196/34896 UR - http://www.ncbi.nlm.nih.gov/pubmed/34983017 ID - info:doi/10.2196/34896 ER - TY - JOUR AU - Chang, Wei Che AU - Lai, Feipei AU - Christian, Mesakh AU - Chen, Chun Yu AU - Hsu, Ching AU - Chen, Shen Yo AU - Chang, Hao Dun AU - Roan, Luen Tyng AU - Yu, Che Yen PY - 2021/12/2 TI - Deep Learning–Assisted Burn Wound Diagnosis: Diagnostic Model Development Study JO - JMIR Med Inform SP - e22798 VL - 9 IS - 12 KW - deep learning KW - semantic segmentation KW - instance segmentation KW - burn wounds KW - percentage total body surface area N2 - Background: Accurate assessment of the percentage total body surface area (%TBSA) of burn wounds is crucial in the management of burn patients. The resuscitation fluid and nutritional needs of burn patients, their need for intensive care, and probability of mortality are all directly related to %TBSA. It is difficult to estimate a burn area of irregular shape by inspection. Many articles have reported discrepancies in estimating %TBSA by different doctors. Objective: We propose a method, based on deep learning, for burn wound detection, segmentation, and calculation of %TBSA on a pixel-to-pixel basis. Methods: A 2-step procedure was used to convert burn wound diagnosis into %TBSA. In the first step, images of burn wounds were collected from medical records and labeled by burn surgeons, and the data set was then input into 2 deep learning architectures, U-Net and Mask R-CNN, each configured with 2 different backbones, to segment the burn wounds.
In the second step, we collected and labeled images of hands to create another data set, which was also input into U-Net and Mask R-CNN to segment the hands. The %TBSA of burn wounds was then calculated by comparing the pixels of mask areas on images of the burn wound and hand of the same patient according to the rule of hand, which states that one's hand accounts for 0.8% of TBSA. Results: A total of 2591 images of burn wounds were collected and labeled to form the burn wound data set. The data set was randomly split into training, validation, and testing sets in a ratio of 8:1:1. Four hundred images of volar hands were collected and labeled to form the hand data set, which was also split into 3 sets using the same method. For the images of burn wounds, Mask R-CNN with ResNet101 had the best segmentation result with a Dice coefficient (DC) of 0.9496, while U-Net with ResNet101 had a DC of 0.8545. For the hand images, U-Net and Mask R-CNN had similar performance with DC values of 0.9920 and 0.9910, respectively. Lastly, we conducted a test diagnosis in a burn patient. Mask R-CNN with ResNet101 had on average less deviation (0.115% TBSA) from the ground truth than burn surgeons. Conclusions: This is one of the first studies to diagnose all depths of burn wounds and convert the segmentation results into %TBSA using different deep learning models. We aimed to assist medical staff in estimating burn size more accurately, thereby helping to provide precise care to burn victims.
UR - https://medinform.jmir.org/2021/12/e22798 UR - http://dx.doi.org/10.2196/22798 UR - http://www.ncbi.nlm.nih.gov/pubmed/34860674 ID - info:doi/10.2196/22798 ER - TY - JOUR AU - Takiddin, Abdulrahman AU - Schneider, Jens AU - Yang, Yin AU - Abd-Alrazaq, Alaa AU - Househ, Mowafa PY - 2021/11/24 TI - Artificial Intelligence for Skin Cancer Detection: Scoping Review JO - J Med Internet Res SP - e22934 VL - 23 IS - 11 KW - artificial intelligence KW - skin cancer KW - skin lesion KW - machine learning KW - deep neural networks N2 - Background: Skin cancer is the most common cancer type affecting humans. Traditional skin cancer diagnosis methods are costly, require a professional physician, and take time. Hence, to aid in diagnosing skin cancer, artificial intelligence (AI) tools are being used, including shallow and deep machine learning–based methodologies that are trained to detect and classify skin cancer using computer algorithms and deep neural networks. Objective: The aim of this study was to identify and group the different types of AI-based technologies used to detect and classify skin cancer. The study also examined the reliability of the selected papers by studying the correlation between the data set size and the number of diagnostic classes with the performance metrics used to evaluate the models. Methods: We conducted a systematic search for papers using Institute of Electrical and Electronics Engineers (IEEE) Xplore, Association for Computing Machinery Digital Library (ACM DL), and Ovid MEDLINE databases following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines. The studies included in this scoping review had to fulfill several selection criteria: being specifically about skin cancer, detecting or classifying skin cancer, and using AI technologies. Study selection and data extraction were independently conducted by two reviewers.
Extracted data were narratively synthesized, where studies were grouped based on the diagnostic AI techniques and their evaluation metrics. Results: We retrieved 906 papers from the 3 databases, of which 53 were eligible for this review. Shallow AI-based techniques were used in 14 studies, and deep AI-based techniques were used in 39 studies. The studies used up to 11 evaluation metrics to assess the proposed models, where 39 studies used accuracy as the primary evaluation metric. Overall, studies that used smaller data sets reported higher accuracy. Conclusions: This paper examined multiple AI-based skin cancer detection models. However, a direct comparison between methods was hindered by the varied use of different evaluation metrics and image types. Performance scores were affected by factors such as data set size, number of diagnostic classes, and techniques. Hence, the reliability of shallow and deep models with higher accuracy scores was questionable since they were trained and tested on relatively small data sets of a few diagnostic classes. UR - https://www.jmir.org/2021/11/e22934 UR - http://dx.doi.org/10.2196/22934 UR - http://www.ncbi.nlm.nih.gov/pubmed/34821566 ID - info:doi/10.2196/22934 ER - TY - JOUR AU - Aggarwal, Pushkar PY - 2021/10/12 TI - Performance of Artificial Intelligence Imaging Models in Detecting Dermatological Manifestations in Higher Fitzpatrick Skin Color Classifications JO - JMIR Dermatol SP - e31697 VL - 4 IS - 2 KW - deep learning KW - melanoma KW - basal cell carcinoma KW - skin of color KW - image recognition KW - dermatology KW - disease KW - convolutional neural network KW - specificity KW - prediction KW - artificial intelligence KW - skin color KW - skin tone N2 - Background: The performance of deep-learning image recognition models is below par when applied to images with Fitzpatrick classification skin types 4 and 5. 
Objective: The objective of this research was to assess whether image recognition models perform differently when differentiating between dermatological diseases in individuals with darker skin color (Fitzpatrick skin types 4 and 5) than when differentiating between the same dermatological diseases in Caucasians (Fitzpatrick skin types 1, 2, and 3) when both models are trained on the same number of images. Methods: Two image recognition models were trained, validated, and tested. The goal of each model was to differentiate between melanoma and basal cell carcinoma. Open-source images of melanoma and basal cell carcinoma were acquired from the Hellenic Dermatological Atlas, the Dermatology Atlas, the Interactive Dermatology Atlas, and DermNet NZ. Results: The image recognition models trained and validated on images with light skin color had higher sensitivity, specificity, positive predictive value, negative predictive value, and F1 score than the image recognition models trained and validated on images of skin of color for differentiation between melanoma and basal cell carcinoma. Conclusions: For artificial intelligence models to perform equally well across skin tones, a higher number of images of dermatological diseases in individuals with darker skin color would need to be gathered than images of the same diseases in individuals with light skin color.
UR - https://derma.jmir.org/2021/2/e31697 UR - http://dx.doi.org/10.2196/31697 UR - http://www.ncbi.nlm.nih.gov/pubmed/37632853 ID - info:doi/10.2196/31697 ER - TY - JOUR AU - Huang, Kai AU - Jiang, Zixi AU - Li, Yixin AU - Wu, Zhe AU - Wu, Xian AU - Zhu, Wu AU - Chen, Mingliang AU - Zhang, Yu AU - Zuo, Ke AU - Li, Yi AU - Yu, Nianzhou AU - Liu, Siliang AU - Huang, Xing AU - Su, Juan AU - Yin, Mingzhu AU - Qian, Buyue AU - Wang, Xianggui AU - Chen, Xiang AU - Zhao, Shuang PY - 2021/9/21 TI - The Classification of Six Common Skin Diseases Based on Xiangya-Derm: Development of a Chinese Database for Artificial Intelligence JO - J Med Internet Res SP - e26025 VL - 23 IS - 9 KW - artificial intelligence KW - skin disease KW - convolutional neural network KW - medical image processing KW - automatic auxiliary diagnoses KW - dermatology KW - skin KW - classification KW - China N2 - Background: Skin and subcutaneous disease is the fourth-leading cause of the nonfatal disease burden worldwide and constitutes one of the most common burdens in primary care. However, there is a severe lack of dermatologists, particularly in rural Chinese areas. Furthermore, although artificial intelligence (AI) tools can assist in diagnosing skin disorders from images, the database for the Chinese population is limited. Objective: This study aims to establish a database for AI based on the Chinese population and presents an initial study on six common skin diseases. Methods: Each image was captured with either a digital camera or a smartphone, verified by at least three experienced dermatologists and corresponding pathology information, and finally added to the Xiangya-Derm database. Based on this database, we conducted AI-assisted classification research on six common skin diseases and then proposed a network called Xy-SkinNet. Xy-SkinNet applies a two-step strategy to identify skin diseases. First, given an input image, we segmented the regions of the skin lesion. 
Second, we introduced an information fusion block to combine the output of all segmented regions. We compared the performance with 31 dermatologists of varied experience. Results: Xiangya-Derm, a new database that consists of over 150,000 clinical images of 571 different skin diseases in the Chinese population, is the largest and most diverse dermatological data set of the Chinese population. The AI-based six-category classification achieved a top 3 accuracy of 84.77%, which exceeded the average accuracy of dermatologists (78.15%). Conclusions: Xiangya-Derm, the largest database for the Chinese population, was created. The classification of six common skin conditions was conducted based on Xiangya-Derm to lay a foundation for product research. UR - https://www.jmir.org/2021/9/e26025 UR - http://dx.doi.org/10.2196/26025 UR - http://www.ncbi.nlm.nih.gov/pubmed/34546174 ID - info:doi/10.2196/26025 ER - TY - JOUR AU - Eapen, Raj Bell AU - Archer, Norm AU - Sartipi, Kamran PY - 2020/4/20 TI - LesionMap: A Method and Tool for the Semantic Annotation of Dermatological Lesions for Documentation and Machine Learning JO - JMIR Dermatol SP - e18149 VL - 3 IS - 1 KW - LesionMap KW - LesionMapper KW - digital imaging KW - machine learning KW - dermatology UR - http://derma.jmir.org/2020/1/e18149/ UR - http://dx.doi.org/10.2196/18149 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/18149 ER -