TY - JOUR AU - Cui, Junqi AU - Li, Weijia AU - Lim, Ngai Enoch Chi AU - Wu, Xiaoqin AU - Lim, Danforn Chi Eung PY - 2026/2/25 TI - Stratified Causal Inference for Intensive Care Unit Risk Prediction: Informatics-Based Modeling of Anesthetic Drug Combinations JO - JMIR Form Res SP - e80294 VL - 10 KW - fentanyl KW - propofol KW - intensive care KW - dose-response analysis KW - counterfactual modeling N2 - Background: Postoperative intensive care unit (ICU) admission affects 15% to 20% of surgical patients and represents a major source of morbidity and health care costs. Current anesthetic dosing relies on empirical guidelines rather than individualized risk assessment. We developed a counterfactual dose-response model to identify optimal fentanyl-propofol combinations. Objective: This study aimed to develop and evaluate a stratified, causal machine learning framework using electronic health record data to identify optimal fentanyl-propofol dose combinations and predict postoperative ICU admission risk, enabling precision anesthesia and individualized clinical decision support. Methods: We analyzed perioperative electronic health records of 67,134 surgical procedures from UC Irvine Medical Center (2017-2022). A hierarchical learning framework was used to estimate causal effects while controlling for confounding variables. A total of 6 dose-sensitive subgroups were identified through stratified analysis. The primary end point was postoperative ICU admission. Results: High-risk combinations (fentanyl >5 mcg/kg with propofol <1 mg/kg) increased the absolute risk of ICU admission by 36% (absolute risk increase; 95% CI 0.351-0.509; P<.001). A total of 6 patient subgroups demonstrated distinct dose-response patterns, with populations considered vulnerable (high glucose, elevated creatinine) showing elevated risk even at standard doses. The optimal dose range for decision-making was determined to be 1.25 to 4.25 mg/kg for propofol and 3.5 to 4.0 mcg/kg for fentanyl.
Conclusions: Fentanyl-propofol combinations exhibit complex, nonlinear dose-response relationships with ICU admission risk. High-dose combinations markedly increase risk through synergistic effects, while specific patient subgroups require enhanced monitoring even at standard doses. These findings support the development of individualized dosing algorithms and risk assessment tools that could inform future decision support tools aimed at reducing postoperative ICU use, although their predictive performance and clinical impact would require external validation. UR - https://formative.jmir.org/2026/1/e80294 UR - http://dx.doi.org/10.2196/80294 ID - info:doi/10.2196/80294 ER - TY - JOUR AU - Causio, Andrea Francesco AU - De Vita, Vittorio AU - Nappi, Andrea AU - Sawaya, Melissa AU - Rocco, Bernardo AU - Foschi, Nazario AU - Maioriello, Giuseppe AU - Russo, Pierluigi PY - 2026/2/19 TI - Survival Prediction in Patients With Bladder Cancer Undergoing Radical Cystectomy Using a Machine Learning Algorithm: Retrospective Single-Center Study JO - JMIR Perioper Med SP - e86666 VL - 9 KW - cystectomy KW - disease-free survival KW - artificial intelligence KW - neoplasm staging KW - retrospective studies KW - urinary bladder neoplasms KW - clinical decision-making KW - machine learning KW - statistical models N2 - Background: Traditional statistical models often fail to capture the complex dynamics influencing survival outcomes in patients with bladder cancer after radical cystectomy, a procedure where approximately 50% of patients develop metastases within 2 years. The integration of artificial intelligence (AI) offers a promising avenue for enhancing prognostic accuracy and personalizing treatment strategies. 
Objective: This study aimed to develop and evaluate a machine learning algorithm for predicting disease-free survival (DFS), overall survival (OS), and the cause of death in patients with bladder cancer undergoing cystectomy, using a comprehensive dataset of clinical and pathological variables. Methods: Retrospective data of 370 patients with bladder cancer who underwent radical cystectomy at Fondazione Policlinico Universitario Agostino Gemelli IRCCS, Rome, Italy, were collected. The dataset comprised 20 input variables, encompassing demographics, tumor characteristics, treatment variables, and inflammatory markers. For specific analyses and models, we used patient subcohorts. The CatBoost algorithm was used for regression tasks (DFS in 346 patients, OS in 347 patients) and a binary classification task (tumor-related death in 312 patients). Model performance was assessed using mean absolute error (MAE) for regression and F1-score for classification, prioritizing a minimum recall of 75% for tumor-related deaths. Five-fold cross-validation and Shapley additive explanations (SHAP) values were used to ensure robustness and interpretability. Results: For DFS prediction, the CatBoost model achieved an MAE of 18.68 months, with clinical tumor stage and pathological tumor classification identified as the most influential predictors. OS prediction yielded an MAE of 17.2 months, which improved to 14.6 months after feature filtering, where tumor classification and the systemic immune-inflammation index (SII) were most impactful. For tumor-related death classification, the model achieved a recall of 78.6% and an F1-score of 0.44 for the positive class (tumor-related deaths), correctly identifying 11 of 14 cases. Bladder tumor position was the most influential feature for cause-of-death prediction. Conclusions: The developed machine learning algorithm demonstrates promising accuracy in predicting survival and the cause of death in patients with bladder cancer after cystectomy. 
The key predictors include clinical and pathological tumor staging, systemic inflammation (SII), and bladder tumor position. These findings highlight the potential of AI in providing clinicians with an objective, data-driven tool to improve personalized prognostic assessment and guide clinical decision-making. UR - https://periop.jmir.org/2026/1/e86666 UR - http://dx.doi.org/10.2196/86666 ID - info:doi/10.2196/86666 ER - TY - JOUR AU - Mevik, Kjersti AU - Woldaregay, Zebene Ashenafi AU - Jonsson, Lindell Eva AU - Tejedor, Miguel AU - Temple-Oberle, Claire PY - 2026/2/17 TI - Application of AI Models for Preventing Surgical Complications: Scoping Review of Clinical Readiness and Barriers to Implementation JO - JMIR AI SP - e75064 VL - 5 KW - surgical complications prediction models KW - machine learning KW - artificial intelligence KW - AI KW - surgical complications KW - predictive modeling KW - risk prediction KW - surgery outcomes KW - perioperative care KW - clinical decision support N2 - Background: The impact of surgical complications is substantial and multifaceted, affecting patients and their families, surgeons, and health care systems. Despite the remarkable progress in artificial intelligence (AI), there remains a notable gap in the prospective implementation of AI models in surgery that use real-time data to support decision-making and enable proactive intervention to reduce the risk of surgical complications. Objective: This scoping review aims to assess and analyze the adoption and use of AI models for preventing surgical complications. Furthermore, this review aims to identify barriers and facilitators for implementation at the bedside. Methods: Following PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, we conducted a literature search using IEEE Xplore, Scopus, Web of Science, MEDLINE, ProQuest, PubMed, ABI, Embase, Epistemonikos, CINAHL, and Cochrane registries. 
Inclusion criteria were empirical, peer-reviewed studies published in English between January 2013 and January 2025, involving AI models for preventing surgical complications (surgical site infections, heart and lung complications, or stroke) in real-world settings. Exclusions included retrospective algorithm-only validations, nonempirical research (eg, editorials or protocols), and non-English studies. Study characteristics and AI model development details were extracted, along with performance statistics (eg, sensitivity and area under the receiver operating characteristic curve). We then used thematic analysis to synthesize findings related to AI models, prediction outputs, and validation methods. Studies were grouped into three main themes: (1) duration of hypotension, (2) risk for complications, and (3) decision support tool. Results: Of the 275 identified records, 19 were included. The included models frequently demonstrated strong technical accuracy with high sensitivity and area under the receiver operating characteristic curve, particularly among studies evaluating decision support tools. However, only a few models were adopted routinely in clinical practice. Two studies evaluated clinicians' perceptions regarding the use of AI models, reporting predominantly positive assessments of their usefulness. Conclusions: Overall, AI models hold potential to predict and prevent surgical complications, as the validation studies demonstrated high accuracy. However, implementation in routine practice remains limited by usability barriers, workflow misalignment, trust concerns, and financial and ethical constraints. The evidence included in this scoping review was limited by the heterogeneity in study design and the predominance of small-scale feasibility studies, particularly for hypotension prediction. Future research should prioritize prospectively validated models that use other physiologic features and address clinicians'
concerns regarding generalizability and adoption. UR - https://ai.jmir.org/2026/1/e75064 UR - http://dx.doi.org/10.2196/75064 ID - info:doi/10.2196/75064 ER - TY - JOUR AU - Ma, Junwei AU - Tang, Huifeng AU - Zhang, Yunshan AU - Yi, Xuemei AU - Zhong, Tangsheng AU - Li, Xinyun AU - Wang, Gang PY - 2026/2/12 TI - Machine Learning for Predicting Venous Thromboembolism After Joint Arthroplasty: Systematic Review of Clinical Applicability and Model Performance JO - JMIR Med Inform SP - e79886 VL - 14 KW - joint arthroplasty KW - venous thromboembolism KW - machine learning KW - meta-analysis KW - systematic review N2 - Background: There is increasing research on machine learning in predicting venous thromboembolism after joint arthroplasty, but the quality and clinical applicability of these models remain uncertain. Objective: This systematic review aims to evaluate the predictive performance and methodological quality of machine learning models for venous thromboembolism risk after joint replacement surgery. Methods: Web of Science, Embase, Scopus, CNKI, Wanfang, VIP, and PubMed were searched until December 15, 2024. The risk of bias and applicability were evaluated using the PROBAST (Prediction Model Risk of Bias Assessment Tool) checklist. A qualitative comprehensive analysis was conducted to extract and describe the data related to the models' characteristics and performance. Results: This review encompassed 34 prediction models from 9 studies. The most frequently used machine learning models were extreme gradient boosting and logistic regression. The results showed that all studies had significant heterogeneity and high risk of bias. Although some models reported near-perfect area under the curve values (>0.9), they lacked external validation and may have overfitted. The models tested on large external datasets demonstrated more conservative performance. Conclusions: The predictive performance of machine learning models varied greatly.
Although the reported area under the curve values indicated that some models have good discriminative ability, this performance varied greatly and was inconsistent among the included studies. These models have a high risk of bias, and it is necessary to take this into account when they are used in clinical practice. Future studies should adopt a prospective study design, ensure appropriate data handling, and use external validation to improve model robustness and applicability. Trial Registration: PROSPERO CRD42024625842; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024625842 UR - https://medinform.jmir.org/2026/1/e79886 UR - http://dx.doi.org/10.2196/79886 ID - info:doi/10.2196/79886 ER - TY - JOUR AU - Dosis, Alexios AU - Syversen, Berger Aron AU - Kowal, R. Mikolaj AU - Grant, Daniel AU - Tiernan, Jim AU - Wong, David AU - Jayne, G. David PY - 2026/1/27 TI - Exploiting Unsupervised Free-Living Data for Cardiorespiratory Fitness Estimation: Systematic Review and Meta-Analysis JO - JMIR Mhealth Uhealth SP - e69996 VL - 14 KW - wearables KW - cardiorespiratory fitness KW - free-living data KW - machine learning KW - perioperative medicine N2 - Background: Current methods of cardiorespiratory fitness (CRF) assessment may discriminate against frail individuals who are challenged to perform a maximal cardiopulmonary exercise test. CRF estimations from free-living wearable data, captured over extended time periods, may offer a more representative assessment and increase usability in clinical settings. Objective: This study aimed to review current evidence behind this novel concept and evaluate the performance and quality of models developed to estimate CRF from free-living, unsupervised data. 
Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we systematically searched 4 databases (MEDLINE, Embase, Scopus, and arXiv) for studies reporting the development of models to estimate CRF from continuous free-living wearable data. Studies conducted entirely under controlled laboratory conditions were excluded. Performance metrics were combined in a meta-correlation analysis using a random-effects model and Fisher Z transformation. Results: Of 1848 papers screened, 18 met the eligibility criteria, with a total of 31,072 participants. The weighted mean age was 46.9 (SD 1.46) years. Multiple computational techniques were used, with 8 studies employing more advanced machine learning models. The meta-correlation analysis revealed a pooled overall estimate of 0.83 (95% CI 0.77-0.88). The I² statistic indicated high heterogeneity at 97%. Risk of bias assessment found most concerns in the data analysis domain, with studies often lacking clarity around the data handling process. Conclusions: A promising preliminary agreement between CRF predictions and measured values was noted. However, no definite conclusions can be drawn for clinical implementation due to high heterogeneity among the included studies and lack of external validation. Nonetheless, continuous data streams appear to be a valuable resource that could lead to a step change in how we measure and monitor CRF.
Trial Registration: PROSPERO CRD42024593878; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024593878 UR - https://mhealth.jmir.org/2026/1/e69996 UR - http://dx.doi.org/10.2196/69996 ID - info:doi/10.2196/69996 ER - TY - JOUR AU - Gaessler, Jan AU - Remschmidt, Bernhard AU - Jopp, Ann-Kathrin AU - Arefnia, Behrouz AU - Franke, Adrian AU - Rieder, Marcus PY - 2026/1/5 TI - Quality of Conventional versus Artificial Intelligence Oral Surgery Consent Forms: Comparative Analysis JO - J Med Internet Res SP - e59851 VL - 28 KW - oral surgical procedures KW - informed consent KW - quality control KW - artificial intelligence KW - oral surgery KW - consent form KW - AI KW - dental health KW - oral surgeon KW - patient care KW - patient autonomy KW - dentistry UR - https://www.jmir.org/2026/1/e59851 UR - http://dx.doi.org/10.2196/59851 ID - info:doi/10.2196/59851 ER - TY - JOUR AU - Escobar-Castillejos, David AU - Barrera-Animas, Y. Ari AU - Noguez, Julieta AU - Magana, J. Alejandra AU - Benes, Bedrich PY - 2025/11/18 TI - Transforming Surgical Training With AI Techniques for Training, Assessment, and Evaluation: Scoping Review JO - J Med Internet Res SP - e58966 VL - 27 KW - artificial intelligence KW - technology-enhanced learning KW - simulation-based training KW - performance assessment KW - medical training KW - surgery KW - higher education KW - educational innovation N2 - Background: Artificial intelligence (AI) has introduced novel opportunities for assessment and evaluation in surgical training, offering potential improvements that could surpass traditional educational methods. Objective: This scoping review examines the integration of AI in surgical training, assessment, and evaluation, aiming to determine how AI technologies can enhance trainees' learning paths and performance by incorporating data-driven insights and predictive analytics.
In addition, this review examines the current state and applications of AI algorithms in this field, identifying potential areas for future research. Methods: Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, PubMed, Scopus, and Web of Science were searched for studies published between January 2020 and March 18, 2024. Eligibility criteria included English-language full-text articles that investigated the application of AI in surgical training, assessment, or evaluation; non-English texts, reviews, preprints, and studies not addressing AI in surgical education were excluded. After duplicate removal and screening, 56 studies were included in the analysis. Data were structured by categorizing studies according to surgical procedure, AI technique, and training setup. Results were synthesized narratively and summarized in frequency tables. Results: From 1400 initial records, 56 studies met the inclusion criteria. Most were journal articles (84%, 47/56), with the remainder being conference papers (16%, 9/56). AI was most frequently applied in minimally invasive surgery (27%, 15/56), neurosurgery (20%, 11/56), and laparoscopy (16%, 9/56). Common techniques included machine learning (20%, 11/56), clustering (14%, 8/56), deep learning (11%, 6/56), convolutional neural networks (11%, 6/56), and support vector machines (11%, 6/56). Training setups were dominated by simulation platforms (33%, 19/56) and box trainers (24%, 13/56), followed by surgical video analysis (16%, 9/56) and robotic systems such as the da Vinci platform (13%, 7/56). Across studies, AI-enhanced training environments provided automated skill assessment, personalized feedback, and adaptive learning trajectories, with several reporting improvements in trainees' learning curves and technical proficiency.
However, heterogeneity in study design and outcome measures limited comparability, and algorithmic transparency was often lacking. Conclusions: The application of AI in surgical training demonstrates the potential to enhance skill acquisition and support more efficient, personalized, and adaptive learning pathways. Despite encouraging findings, several limitations exist, including small sample sizes, the lack of standardized evaluation metrics, and insufficient external validation of AI models. Future studies should aim to clarify AI methodologies, improve reproducibility, and develop scalable, simulation-based solutions aligned with global education goals. UR - https://www.jmir.org/2025/1/e58966 UR - http://dx.doi.org/10.2196/58966 UR - http://www.ncbi.nlm.nih.gov/pubmed/41252719 ID - info:doi/10.2196/58966 ER - TY - JOUR AU - Han, Changho AU - Soh, Sarah AU - Park, Je-Wook AU - Pak, Hui-Nam AU - Yoon, Dukyong PY - 2025/11/10 TI - Artificial Intelligence-Based Electrocardiogram Model as a Predictor of Postoperative Atrial Fibrillation Following Cardiac Surgery: Retrospective Cohort Study JO - J Med Internet Res SP - e77164 VL - 27 KW - postoperative atrial fibrillation KW - cardiac surgery KW - electrocardiogram KW - artificial intelligence KW - deep learning N2 - Background: Postoperative atrial fibrillation (AF) after cardiac surgery is common and is associated with substantial clinical and economic repercussions. However, existing strategies for preventing postoperative AF remain suboptimal, limiting proactive management. Advances in artificial intelligence (AI) may improve the prediction of postoperative AF. Studies have shown that deep learning applied to electrocardiograms (ECGs) can detect subtle patterns in non-AF ECGs associated with a history of (or impending) AF (referred to as the AI-ECG-AF model). As a noninvasive test routinely performed throughout the perioperative period, the ECG presents a unique opportunity for additional risk stratification.
Objective: We aimed to determine whether the AI-ECG-AF model can serve as an independent risk factor for postoperative AF after cardiac surgery, compare its predictive performance with existing postoperative AF prediction tools, and assess its additive value. Methods: This single-center retrospective cohort study included 2266 patients (5402 standard 12-lead ECGs) who underwent cardiac surgery at a tertiary hospital in South Korea between December 2018 and December 2023. The AI-ECG-AF model was trained on 4.05 million non-AF standard 12-lead ECGs (1.13 million patients) using a 1D EfficientNet-B0 architecture and achieved an area under the receiver operating characteristic curve (AUROC) of 0.901 (95% CI 0.900-0.902) in its held-out test set. Postoperative AF was defined as AF documented by ECG within 30 days after surgery. Using multivariable logistic regression, we assessed the association between the AI-ECG-AF model score and postoperative AF, adjusting for conventional clinical variables. We also investigated the additive or synergistic predictive value of the AI-ECG-AF model score when combined with an existing postoperative AF tool (the postoperative atrial fibrillation score) or other risk factors, based on the AUROC. Results: After adjusting for other clinical variables, a 10% absolute increase in the AI-ECG-AF model score was associated with a 1.197- to 1.209-fold increase in the odds of developing postoperative AF. The AI-ECG-AF model score significantly enhanced postoperative AF prediction: the AUROC of the existing postoperative atrial fibrillation score was 0.643; adding the AI-ECG-AF model score increased it to 0.680 (P<.001), and combining the AI-ECG-AF model score with other risk factors raised it to 0.710 (P<.001).
Conclusions: The AI-ECG-AF model serves as a novel, robust, and independent risk factor for postoperative AF following cardiac surgery and provides additive or synergistic predictive value when integrated with existing postoperative AF prediction tools or other risk factors. By capturing atrial electrophysiological vulnerability not reflected in conventional clinical scores, the AI-ECG-AF model may function as a noninvasive biomarker for preoperative risk stratification for postoperative AF prediction in cardiac surgery patients, potentially enabling targeted prophylaxis and closer monitoring during the perioperative period. UR - https://www.jmir.org/2025/1/e77164 UR - http://dx.doi.org/10.2196/77164 ID - info:doi/10.2196/77164 ER - TY - JOUR AU - Wang, Runchen AU - Zheng, Jianqi AU - Guo, Wenwei AU - Huang, Haiqi AU - Wang, Qixia AU - Li, Yihong AU - Lin, Manwan AU - Huang, Linchong AU - Zhang, Qing AU - Chen, Kaishen AU - Ye, Zhiming AU - Deng, Hongsheng AU - Jiang, Yu AU - Lin, Yuechun AU - Feng, Yi AU - Huang, Ying AU - Chen, Ying AU - He, Jianxing AU - Liang, Hengrui PY - 2025/9/16 TI - Integrating a Multimodal Digital Device for Continuous Perioperative Monitoring in Patients With Lung Cancer Undergoing Thoracic Surgery: Development and Usability Study JO - JMIR Mhealth Uhealth SP - e69512 VL - 13 KW - lung cancer KW - digital device KW - wearable device KW - patient-reported outcomes KW - multi-modal KW - artificial intelligence KW - AI N2 - Background: Minimally invasive thoracic surgery has improved lung cancer outcomes but requires enhanced postoperative care. Traditionally, the episodic care model has limited timely and multidimensional monitoring of patients. Recent technological advances in multimodal digital devices, including wearable devices and electronic patient-reported outcomes (ePROs), offer a promising solution to these challenges. However, current studies focus on only a few parameters and have seen limited application in thoracic surgery.
Objective: This study aims to propose a self-controlled study to evaluate the feasibility and reliability of multimodal digital devices, including wearables and ePROs, for continuous perioperative monitoring to enhance recovery after thoracic surgery. Methods: We included 288 patients with non-small cell lung cancer from the Guangzhou Medical University cohort, which includes 2757 participants with various lung diseases. Digital data were collected during hospitalization using a commercial smartwatch combined with an ePROs questionnaire, while clinical data were obtained from electronic health records (EHRs). Agreement between the digital device and EHR was evaluated via Bland-Altman analysis. Time-series data were normalized for continuous outlier monitoring, and threshold analysis of ePROs scores was used to explore associations across different modules. Results: Throughout hospitalization, digital devices provided a subjective overview of the patients' recovery trajectories. Results of Bland-Altman analysis demonstrated a high level of agreement between the digital device and the EHR. For body temperature, the analysis revealed a minimal bias of 0.02 °C (95% CI -0.01 °C to 0.05 °C), the agreement for heart rate showed a bias of 0.26 beats per minute (bpm; 95% CI -0.49 bpm to 1.01 bpm), and the bias for oxygen saturation was -0.06% (95% CI -0.27% to 0.15%), indicating close alignment between the 2 measurement methods. Meanwhile, wearable devices demonstrated significant potential in outlier detection compared to the episodic care model, offering accurate and sensitive monitoring of outliers between traditional measurement intervals. Using a thresholding method, we found that wearable metrics were correlated with the severity of ePROs.
Conclusions: These findings highlight the reliability and clinical potential of digital device-based multimodal systems within the enhanced recovery after surgery framework, offering a novel approach for continuous perioperative monitoring. UR - https://mhealth.jmir.org/2025/1/e69512 UR - http://dx.doi.org/10.2196/69512 ID - info:doi/10.2196/69512 ER - TY - JOUR AU - Chandrasekar, Subramaniam Rajagopal AU - Kane, Michael AU - Krishnamurti, Lakshmanan PY - 2025/9/15 TI - Machine-Learning Predictive Tool for the Individualized Prediction of Outcomes of Hematopoietic Cell Transplantation for Sickle Cell Disease: Registry-Based Study JO - JMIR AI SP - e64519 VL - 4 KW - sickle cell disease KW - SCD KW - prediction algorithms KW - hematopoietic stem cell transplantation KW - machine learning KW - ML KW - predictive tool KW - prediction KW - hematopoietic cell transplantation KW - HCT KW - hematopoietic cell KW - registry-based study KW - clinical decision-making KW - prediction model KW - clinical outcomes KW - gene therapy KW - shared decision-making N2 - Background: Disease-modifying therapies ameliorate disease severity of sickle cell disease (SCD), but hematopoietic cell transplantation (HCT) and, more recently, autologous gene therapy are the only treatments that have curative potential for SCD. While registry-based studies provide population-level estimates, they do not address the uncertainty regarding individual outcomes of HCT. Computational machine learning (ML) has the potential to identify generalizable predictive patterns and quantify uncertainty in estimates, thereby improving clinical decision-making. There is no existing ML model for SCD, and ML models for HCT for other diseases focus on single outcomes rather than all relevant outcomes.
Objective: This study aims to address the existing knowledge gap by developing and validating an individualized ML prediction model, SPRIGHT (Sickle Cell Predicting Outcomes of Hematopoietic Cell Transplantation), incorporating multiple relevant pre-HCT features to make predictions of key post-HCT clinical outcomes. Methods: We applied a supervised random forest ML model to clinical parameters in a deidentified Center for International Blood and Marrow Transplant Research (CIBMTR) dataset of 1641 patients who underwent HCT between 1991 and 2021 and were followed for a median of 42.5 (IQR 52.5; range 0.3-312.9) months. We applied forward and reverse feature selection methods to optimize a set of predictive variables. To counter the imbalance bias toward predicting positive outcomes due to the small number of negative outcomes, we constructed a training dataset, taking each outcome as the variable of interest, and performed 2-times repeated 10-fold cross-validation. SPRIGHT is a web-based individualized prediction tool accessible by smartphone, tablet, or personal computer. It incorporates predictive variables of age, age group, Karnofsky or Lansky score, comorbidity index, recipient cytomegalovirus seropositivity, history of acute chest syndrome, need for exchange transfusion, occurrence and frequency of vaso-occlusive crisis (VOC) before HCT, and either a published or custom chemotherapy or radiation conditioning, serotherapy, and graft-versus-host disease prophylaxis. SPRIGHT makes individualized predictions of overall survival (OS), event-free survival, graft failure, acute graft-versus-host disease (AGVHD), chronic graft-versus-host disease (CGVHD), and occurrence of VOC or stroke post-HCT. Results: The model's ability to distinguish between positive and negative classes, that is, discrimination, was evaluated using the area under the curve, accuracy, and balanced accuracy.
Discrimination met or exceeded published predictive benchmarks, with areas under the curve for OS (0.7925), event-free survival (0.7900), graft failure (0.8024), acute graft-versus-host disease (0.6793), chronic graft-versus-host disease (0.7320), and VOC post-HCT (0.8779). SPRIGHT revealed good calibration with a slope of 0.87-0.96, with small intercepts (-0.01 to 0.03), for 4 out of the 5 outcomes. However, OS exhibited nonideal calibration, which may be reflective of the overall high OS in all subgroups. Conclusions: A web-based ML prediction tool incorporating multiple clinically relevant variables predicts key clinical outcomes with a high level of discrimination and calibration and has potential in shared decision-making. UR - https://ai.jmir.org/2025/1/e64519 UR - http://dx.doi.org/10.2196/64519 ID - info:doi/10.2196/64519 ER - TY - JOUR AU - Lex, R. Johnathan AU - Abbas, Aazad AU - Mosseri, Jacob AU - Singh Toor, Jay AU - Simone, Michael AU - Ravi, Bheeshma AU - Whyne, Cari AU - Khalil, B. Elias PY - 2025/9/10 TI - Using Machine Learning to Predict-Then-Optimize Elective Orthopedic Surgery Scheduling to Improve Operating Room Utilization: Retrospective Study JO - JMIR Med Inform SP - e70857 VL - 13 KW - machine learning KW - orthopedic surgery KW - optimization KW - elective surgery KW - scheduling KW - hip and knee arthroplasty N2 - Background: Total knee and hip arthroplasty (TKA and THA) are among the most performed elective procedures. Rising demand and the resource-intensive nature of these procedures have contributed to longer wait times despite significant health care investment. Current scheduling methods often rely on average surgical durations, overlooking patient-specific variability. Objective: To determine the potential for improving elective surgery scheduling for TKA and THA by using a 2-stage approach that incorporates machine learning (ML) prediction of the duration of surgery (DOS) with scheduling optimization.
Methods: In total, 2 ML models (one each for TKA and THA) were trained to predict DOS using patient factors based on 302,490 and 196,942 patients, respectively, from a large international database. In total, 3 optimization formulations based on varying surgeon flexibility were compared: Any (surgeons could operate in any operating room at any time), Split (limitation of 2 surgeons per operating room per day), and multiple subset sum problem (MSSP; limit of 1 surgeon per operating room per day). Two years of daily scheduling simulations were performed for each optimization problem using ML prediction or mean DOS over a range of schedule parameters. Constraints and resources were based on a high-volume arthroplasty hospital in Canada. Results: The TKA and THA prediction models achieved test accuracy (with a 30 min buffer) of 78.1% (mean squared error 0.898) and 75.4% (mean squared error 0.916), respectively. The Any scheduling formulation performed significantly worse than the Split and MSSP formulations with respect to overtime and underutilization (P<.001). The latter 2 formulations performed similarly (P>.05) over most schedule parameters. The ML prediction schedules outperformed those generated using a mean DOS for most scheduling parameters, with overtime reduced on average by 300-500 minutes per week (12-20 min per operating room per day; P<.001). However, the ML prediction schedules produced more operating room underutilization, ranging from 70-192 minutes more (P<.001). Using a 15-minute schedule granularity with a waitlist pool of at least 1 month generated the ML schedule that outperformed the mean schedule 97.1% of the time. Conclusions: Assuming a full waiting list, optimizing an individual surgeon's elective operating room time using an ML-assisted predict-then-optimize scheduling system improves overall operating room efficiency, significantly decreasing overtime.
This has potentially significant implications for health care systems struggling with rising costs and growing operative waitlists. UR - https://medinform.jmir.org/2025/1/e70857 UR - http://dx.doi.org/10.2196/70857 ID - info:doi/10.2196/70857 ER - TY - JOUR AU - Huang, Kecheng AU - Wu, Chujun AU - Pi, Rongpeng AU - Fang, Jieyu PY - 2025/8/22 TI - AI-Driven Integration of Deep Learning With Lung Imaging, Functional Analysis, and Blood Gas Metrics for Perioperative Hypoxemia Prediction JO - JMIR Med Inform SP - e73995 VL - 13 KW - pneumonia KW - perioperative KW - artificial intelligence KW - hypoxemia KW - deep learning UR - https://medinform.jmir.org/2025/1/e73995 UR - http://dx.doi.org/10.2196/73995 UR - http://www.ncbi.nlm.nih.gov/pubmed/40759599 ID - info:doi/10.2196/73995 ER - TY - JOUR AU - Obst, Miriam AU - Arensmeyer, Jan AU - Bonsmann, Henrik AU - Kolbinger, Andreas AU - Kigenyi, Joel AU - Oneka, Francis AU - Owere, Benard AU - Schmidt, Joachim AU - Feodorovici, Philipp AU - Wynands, Jan PY - 2025/8/18 TI - AI-Enhanced 3D Models in Global Virtual Reality Case Conferences for Surgical Care in a Low-Income Country: Exploratory Study JO - JMIR Form Res SP - e69300 VL - 9 KW - 3D scanning KW - artificial intelligence KW - virtual reality KW - extended reality KW - metaverse KW - spatial computing KW - global surgery KW - reconstructive surgery N2 - Background: Approximately 5 billion people worldwide lack adequate access to surgical care, primarily in the Global South. Especially in crisis regions and war zones, telemedical applications may enhance health services. This study explores the feasibility of using artificial intelligence (AI)-enhanced 3D imaging and extended reality (XR) technologies for intercontinental surgical case conferences in a low-resource scenario in Uganda. Our pilot study aims to assess the value of these technologies in addressing the shortage of surgical resources and fostering multilateral knowledge exchange.
Objective: This study intends to determine the feasibility of using new AI-enhanced image modeling technology within an immersive spatial XR scenario to collaboratively and remotely assess reconstructive patient cases in the resource-limited country of Uganda. Methods: Within a surgical camp at Lamu Medical Centre, Uganda, 3D models of patients' conditions were created using a smartphone app. Digital models were generated from photographs taken on-site and processed into 3D formats to be visualized in virtual case conferences. Here, surgeons from Uganda and Germany used virtual reality (VR) headsets to collaboratively discuss case strategies while marking surgical approaches on each digital patient model. Results: The study included 15 patients requiring reconstructive surgery, with a diverse range of conditions. The use of XR technology facilitated detailed visualization and discussion of surgical strategies. The process was time-efficient, requiring under 8 minutes per case for data acquisition and model creation, and resource-efficient, with surgeons reporting sufficient quality of smartphone-derived models. Users reported a valuable experience and precise interaction during VR case processing, underlining the approach's potential to improve surgical planning and patient care in resource-limited settings. Conclusions: The findings indicate that AI-enhanced 3D imaging and immersive virtual communication platforms are valuable tools for integrative surgical case assessments. The cost-effectiveness of the consumer solutions used should be especially beneficial for low-resource environments. While the study demonstrates the feasibility of this approach, further research is needed to explore a broader application and impact of these technologies in global health. The study highlights the potential of XR to enhance training and surgical precision, contributing to better health care outcomes in underserved regions.
UR - https://formative.jmir.org/2025/1/e69300 UR - http://dx.doi.org/10.2196/69300 ID - info:doi/10.2196/69300 ER - TY - JOUR AU - Maruyama, Hiroki AU - Toyama, Yoshitaka AU - Takanami, Kentaro AU - Takase, Kei AU - Kamei, Takashi PY - 2025/7/30 TI - Role of Artificial Intelligence in Surgical Training by Assessing GPT-4 and GPT-4o on the Japan Surgical Board Examination With Text-Only and Image-Accompanied Questions: Performance Evaluation Study JO - JMIR Med Educ SP - e69313 VL - 11 KW - LLM KW - ChatGPT KW - Japan Surgical Board Examination KW - surgical education KW - large language models KW - artificial intelligence KW - Medical Licensing Examination KW - diagnostic imaging N2 - Background: Artificial intelligence and large language models (LLMs), particularly GPT-4 and GPT-4o, have demonstrated high correct-answer rates in medical examinations. GPT-4o has enhanced diagnostic capabilities, advanced image processing, and updated knowledge. Japanese surgeons face critical challenges, including a declining workforce, regional health care disparities, and work-hour-related challenges. Nonetheless, although LLMs could be beneficial in surgical education, no studies have yet assessed GPT-4o's surgical knowledge or its performance in the field of surgery. Objective: This study aims to evaluate the potential of GPT-4 and GPT-4o in surgical education by using them to take the Japan Surgical Board Examination (JSBE), which includes both textual questions and medical images, such as surgical and computed tomography scans, to comprehensively assess their surgical knowledge. Methods: We used 297 multiple-choice questions from the 2021-2023 JSBEs. The questions were in Japanese, and 104 of them included images. First, the GPT-4 and GPT-4o responses to only the textual questions were collected via OpenAI's application programming interface to evaluate their correct-answer rate.
Subsequently, the correct-answer rate of their responses to questions that included images was assessed by inputting both text and images. Results: The overall correct-answer rates of GPT-4o and GPT-4 for the text-only questions were 78% (231/297) and 55% (163/297), respectively, with GPT-4o outperforming GPT-4 by 23 percentage points (P<.01). By contrast, there was no significant improvement in the correct-answer rate for questions that included images compared with the results for the text-only questions. Conclusions: GPT-4o outperformed GPT-4 on the JSBE. However, the results of the LLMs were lower than those of the examinees. Despite the capabilities of LLMs, image recognition remains a challenge for them, and their clinical application requires caution owing to the potential inaccuracy of their results. UR - https://mededu.jmir.org/2025/1/e69313 UR - http://dx.doi.org/10.2196/69313 ID - info:doi/10.2196/69313 ER - TY - JOUR AU - Li, Xin AU - Yang, Wen-yu AU - Zhang, Fan AU - Shan, Rui AU - Mei, Fang AU - Song, Shi-Bing AU - Sun, Bang-Kai AU - Chen, Jing AU - Hu, Run-ze AU - Yang, Yang AU - Yang, Yi-hang AU - Liu, Jing-yao AU - Yuan, Chun-Hui AU - Liu, Zheng PY - 2025/7/11 TI - Size-Specific Predictors for Malignancy Risk in Follicular Thyroid Neoplasms: Machine Learning Analysis JO - JMIR Cancer SP - e73069 VL - 11 KW - follicular thyroid neoplasm KW - tumor size KW - machine learning KW - malignancy KW - follicular thyroid cancer KW - follicular thyroid adenoma KW - random forest KW - XGBoost N2 - Background: Surgeons often face challenges in distinguishing between benign and malignant follicular thyroid neoplasms (FTNs), particularly small tumors, until diagnostic surgery is performed. Objective: This study aimed to identify the size-specific predictors for the malignancy risk of FTNs preoperatively. Methods: A retrospective cohort study was conducted at Peking University Third Hospital in Beijing, China, from 2012 to 2023.
Patients with a postoperative pathological diagnosis of follicular thyroid adenoma (FTA) or follicular thyroid carcinoma (FTC) were included. FTNs were classified into small- and large-sized categories based on the cutoff value of the tumor diameter derived from spline regression, which indicated the turning point of malignancy risk. We identified the 5 most important predictors from 22 variables including demography, sonography, and hormones, using machine learning methods. We also calculated the odds ratios (ORs) with 95% CIs for these predictors in both small- and large-sized FTNs. Results: Altogether, we included 1494 FTNs, comprising 1266 FTAs and 228 FTCs. FTNs with a maximum diameter less than 3.0 cm were grouped as small-sized tumors (n=715), while those with larger diameters were categorized as large-sized tumors (n=779). In the small-sized group, tumors with macrocalcification (OR 2.90, 95% CI 1.50-5.60), those with peripheral calcification (OR 4.50, 95% CI 1.50-13.00), and those in younger patients (OR 1.33, 95% CI 1.05-1.69) showed a higher malignancy risk. In the large-sized group, tumors presenting with a nodule-in-nodule appearance (OR 3.30, 95% CI 1.30-7.90) exhibited a higher malignancy risk. In both groups, lower thyroid-stimulating hormone levels (OR 1.49, 95% CI 1.20-1.85 for small-sized FTNs; OR 1.61, 95% CI 1.37-1.96 for large-sized FTNs) and a larger mean diameter (OR 1.40, 95% CI 1.10-1.70 for small-sized FTNs; OR 1.50, 95% CI 1.20-1.70 for large-sized FTNs) were associated with the malignancy risk of FTNs. Conclusion: This study identified size-specific predictors for malignancy risk in FTNs, highlighting the importance of stratified prediction based on tumor size.
UR - https://cancer.jmir.org/2025/1/e73069 UR - http://dx.doi.org/10.2196/73069 ID - info:doi/10.2196/73069 ER - TY - JOUR AU - Parduzi, Qendresa AU - Wermelinger, Jonathan AU - Koller, Domingo Simon AU - Sariyar, Murat AU - Schneider, Ulf AU - Raabe, Andreas AU - Seidel, Kathleen PY - 2025/3/24 TI - Explainable AI for Intraoperative Motor-Evoked Potential Muscle Classification in Neurosurgery: Bicentric Retrospective Study JO - J Med Internet Res SP - e63937 VL - 27 KW - intraoperative neuromonitoring KW - motor evoked potential KW - artificial intelligence KW - machine learning KW - deep learning KW - random forest KW - convolutional neural network KW - explainability KW - medical informatics KW - personalized medicine KW - neurophysiological KW - monitoring KW - orthopedic KW - motor KW - neurosurgery N2 - Background: Intraoperative neurophysiological monitoring (IONM) guides the surgeon in ensuring motor pathway integrity during high-risk neurosurgical and orthopedic procedures. Although motor-evoked potentials (MEPs) are valuable for predicting motor outcomes, the key features of predictive signals are not well understood, and standardized warning criteria are lacking. Developing a muscle identification prediction model could increase patient safety while allowing the exploration of relevant features for the task. Objective: The aim of this study is to expand the development of machine learning (ML) methods for muscle classification and evaluate them in a bicentric setup. Further, we aim to identify key features of MEP signals that contribute to accurate muscle classification using explainable artificial intelligence (XAI) techniques. 
Methods: This study used ML and deep learning models, specifically random forest (RF) classifiers and convolutional neural networks (CNNs), to classify MEP signals from routine supratentorial neurosurgical procedures at two medical centers according to the identity of four muscles (extensor digitorum, abductor pollicis brevis, tibialis anterior, and abductor hallucis). The algorithms were trained and validated on a total of 36,992 MEPs from 151 surgeries in one center, and they were tested on 24,298 MEPs from 58 surgeries from the other center. Depending on the algorithm, time-series, feature-engineered, and time-frequency representations of the MEP data were used. XAI techniques, specifically Shapley Additive Explanation (SHAP) values and gradient class activation maps (Grad-CAM), were implemented to identify important signal features. Results: High classification accuracy was achieved with the RF classifier, reaching 87.9% accuracy on the validation set and 80% accuracy on the test set. The 1D- and 2D-CNNs demonstrated comparably strong performance. Our XAI findings indicate that frequency components and peak latencies are crucial for accurate MEP classification, providing insights that could inform intraoperative warning criteria. Conclusions: This study demonstrates the effectiveness of ML techniques and the importance of XAI in enhancing trust in and reliability of artificial intelligence-driven IONM applications. Further, it may help to identify new intrinsic features of MEP signals so far overlooked in conventional warning criteria. By reducing the risk of muscle mislabeling and by providing the basis for possible new warning criteria, this study may help to increase patient safety during surgical procedures.
UR - https://www.jmir.org/2025/1/e63937 UR - http://dx.doi.org/10.2196/63937 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/63937 ER - TY - JOUR AU - Oh, Mi-Young AU - Kim, Hee-Soo AU - Jung, Mi Young AU - Lee, Hyung-Chul AU - Lee, Seung-Bo AU - Lee, Mi Seung PY - 2025/3/19 TI - Machine Learning-Based Explainable Automated Nonlinear Computation Scoring System for Health Score and an Application for Prediction of Perioperative Stroke: Retrospective Study JO - J Med Internet Res SP - e58021 VL - 27 KW - machine learning KW - explainability KW - score KW - computation scoring system KW - nonlinear computation KW - application KW - perioperative stroke KW - perioperative KW - stroke KW - efficiency KW - ML-based models KW - patient KW - noncardiac surgery KW - noncardiac KW - surgery KW - effectiveness KW - risk tool KW - risk KW - tool KW - real-world data N2 - Background: Machine learning (ML) has the potential to enhance performance by capturing nonlinear interactions. However, ML-based models have some limitations in terms of interpretability. Objective: This study aimed to develop and validate a more comprehensible and efficient ML-based scoring system using SHapley Additive exPlanations (SHAP) values. Methods: We developed and validated the Explainable Automated nonlinear Computation scoring system for Health (EACH) framework score. We developed a CatBoost-based prediction model, identified key features, and automatically detected the top 5 steepest slope change points based on SHAP plots. Subsequently, we developed a scoring system (EACH) and normalized the score. Finally, the EACH score was used to predict perioperative stroke. We developed the EACH score using data from the Seoul National University Hospital cohort and validated it using data from the Boramae Medical Center, which was geographically and temporally different from the development set.
Results: When applied for perioperative stroke prediction among 38,737 patients undergoing noncardiac surgery, the EACH score achieved an area under the curve (AUC) of 0.829 (95% CI 0.753-0.892). In the external validation, the EACH score demonstrated superior predictive performance with an AUC of 0.784 (95% CI 0.694-0.871) compared with a traditional score (AUC=0.528, 95% CI 0.457-0.619) and another ML-based scoring generator (AUC=0.564, 95% CI 0.516-0.612). Conclusions: The EACH score is a more precise, explainable ML-based risk tool, proven effective in real-world data. The EACH score outperformed the traditional scoring system and other prediction models based on different ML techniques in predicting perioperative stroke. UR - https://www.jmir.org/2025/1/e58021 UR - http://dx.doi.org/10.2196/58021 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58021 ER - TY - JOUR AU - Dong, Jiale AU - Jin, Zhechuan AU - Li, Chengxiang AU - Yang, Jian AU - Jiang, Yi AU - Li, Zeqian AU - Chen, Cheng AU - Zhang, Bo AU - Ye, Zhaofei AU - Hu, Yang AU - Ma, Jianguo AU - Li, Ping AU - Li, Yulin AU - Wang, Dongjin AU - Ji, Zhili PY - 2025/3/6 TI - Machine Learning Models With Prognostic Implications for Predicting Gastrointestinal Bleeding After Coronary Artery Bypass Grafting and Guiding Personalized Medicine: Multicenter Cohort Study JO - J Med Internet Res SP - e68509 VL - 27 KW - machine learning KW - personalized medicine KW - coronary artery bypass grafting KW - adverse outcome KW - gastrointestinal bleeding N2 - Background: Gastrointestinal bleeding is a serious adverse event of coronary artery bypass grafting and lacks tailored risk assessment tools for personalized prevention. Objective: This study aims to develop and validate predictive models to assess the risk of gastrointestinal bleeding after coronary artery bypass grafting (GIBCG) and to guide personalized prevention.
Methods: Participants were recruited from 4 medical centers, including a prospective cohort and the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. From an initial cohort of 18,938 patients, 16,440 were included in the final analysis after applying the exclusion criteria. Thirty combinations of machine learning algorithms were compared, and the optimal model was selected based on integrated performance metrics, including the area under the receiver operating characteristic curve (AUROC) and the Brier score. This model was then developed into a web-based risk prediction calculator. The Shapley Additive Explanations method was used to provide both global and local explanations for the predictions. Results: The model was developed using data from 3 centers and a prospective cohort (n=13,399) and validated on the Drum Tower cohort (n=2745) and the MIMIC cohort (n=296). The optimal model, based on 15 easily accessible admission features, demonstrated an AUROC of 0.8482 (95% CI 0.8328-0.8618) in the derivation cohort. In external validation, the AUROC was 0.8513 (95% CI 0.8221-0.8782) for the Drum Tower cohort and 0.7811 (95% CI 0.7275-0.8343) for the MIMIC cohort. The analysis indicated that high-risk patients identified by the model had a significantly increased mortality risk (odds ratio 2.98, 95% CI 1.784-4.978; P<.001). For these high-risk populations, preoperative use of proton pump inhibitors was an independent protective factor against the occurrence of GIBCG. By contrast, dual antiplatelet therapy and oral anticoagulants were identified as independent risk factors. However, in low-risk populations, the use of proton pump inhibitors (χ²₁=0.13, P=.72), dual antiplatelet therapy (χ²₁=0.38, P=.54), and oral anticoagulants (χ²₁=0.15, P=.69) was not significantly associated with the occurrence of GIBCG. Conclusions: Our machine learning model accurately identified patients at high risk of GIBCG, who had a poor prognosis.
This approach can aid in early risk stratification and personalized prevention. Trial Registration: Chinese Clinical Registry Center ChiCTR2400086050; http://www.chictr.org.cn/showproj.html?proj=226129 UR - https://www.jmir.org/2025/1/e68509 UR - http://dx.doi.org/10.2196/68509 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053791 ID - info:doi/10.2196/68509 ER - TY - JOUR AU - Huang, Pinjie AU - Yang, Jirong AU - Zhao, Dizhou AU - Ran, Taojia AU - Luo, Yuheng AU - Yang, Dong AU - Zheng, Xueqin AU - Zhou, Shaoli AU - Chen, Chaojin PY - 2025/3/3 TI - Machine Learning-Based Prediction of Early Complications Following Surgery for Intestinal Obstruction: Multicenter Retrospective Study JO - J Med Internet Res SP - e68354 VL - 27 KW - postoperative complications KW - intestinal obstruction KW - machine learning KW - early intervention KW - risk calculator KW - prediction model KW - Shapley additive explanations N2 - Background: Early complications increase in-hospital stay and mortality after intestinal obstruction surgery. It is important to identify the risk of early postoperative complications in patients with intestinal obstruction at a sufficiently early stage, which would allow preemptive individualized enhanced therapy to improve their prognosis. A risk predictive model based on machine learning is helpful for early diagnosis and timely intervention. Objective: This study aimed to construct an online risk calculator for early postoperative complications in patients after intestinal obstruction surgery based on machine learning algorithms. Methods: A total of 396 patients undergoing intestinal obstruction surgery from April 2013 to April 2021 at an independent medical center were enrolled as the training cohort.
Overall, 7 machine learning methods were used to establish prediction models, with their performance appraised via the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, and F1-score. The best model was validated through 2 independent medical centers, a publicly available perioperative dataset (the Informative Surgical Patient dataset for Innovative Research Environment [INSPIRE]), and a mixed cohort consisting of the above 3 datasets, involving 50, 66, 48, and 164 cases, respectively. Shapley Additive Explanations values were computed to identify risk factors. Results: The incidence of postoperative complications in the training cohort was 47.44% (176/371), while the incidences in the 4 external validation cohorts were 34% (17/50), 56.06% (37/66), 52.08% (25/48), and 48.17% (79/164), respectively. Postoperative complications were associated with 8 features: the Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity (POSSUM) physiological score, the amount of colloid infusion, shock index before anesthesia induction, ASA (American Society of Anesthesiologists) classification, the percentage of neutrophils, shock index at the end of surgery, age, and total protein. The random forest model showed the best overall performance, with an AUROC of 0.788 (95% CI 0.709-0.869), accuracy of 0.756, sensitivity of 0.695, specificity of 0.810, and F1-score of 0.727 in the training cohort. The random forest model also achieved a comparable AUROC of 0.755 (95% CI 0.652-0.839) in validation cohort 1, a greater AUROC of 0.817 (95% CI 0.695-0.913) in validation cohort 2, a similar AUROC of 0.786 (95% CI 0.628-0.902) in validation cohort 3, and a comparable AUROC of 0.720 (95% CI 0.671-0.768) in validation cohort 4. We visualized the random forest model and created a web-based online risk calculator.
Conclusions: We have developed and validated a generalizable random forest model to predict postoperative early complications in patients undergoing intestinal obstruction surgery, enabling clinicians to screen high-risk patients and implement early individualized interventions. An online risk calculator for early postoperative complications was developed to make the random forest model accessible to clinicians around the world. UR - https://www.jmir.org/2025/1/e68354 UR - http://dx.doi.org/10.2196/68354 UR - http://www.ncbi.nlm.nih.gov/pubmed/40053794 ID - info:doi/10.2196/68354 ER - TY - JOUR AU - Xiong, Xiaojuan AU - Fu, Hong AU - Xu, Bo AU - Wei, Wang AU - Zhou, Mi AU - Hu, Peng AU - Ren, Yunqin AU - Mao, Qingxiang PY - 2025/1/22 TI - Ten Machine Learning Models for Predicting Preoperative and Postoperative Coagulopathy in Patients With Trauma: Multicenter Cohort Study JO - J Med Internet Res SP - e66612 VL - 27 KW - traumatic coagulopathy KW - preoperative KW - postoperative KW - machine learning models KW - random forest KW - Medical Information Mart for Intensive Care N2 - Background: Recent research has revealed the potential value of machine learning (ML) models in improving prognostic prediction for patients with trauma. ML can enhance predictions and identify which factors contribute the most to posttraumatic mortality. However, no studies have explored the risk factors, complications, and risk prediction of preoperative and postoperative traumatic coagulopathy (PPTIC) in patients with trauma. Objective: This study aims to help clinicians implement timely and appropriate interventions to reduce the incidence of PPTIC and related complications, thereby lowering in-hospital mortality and disability rates for patients with trauma. Methods: We analyzed data from 13,235 patients with trauma from 4 medical centers, including medical histories, laboratory results, and hospitalization complications. 
We developed 10 ML models in Python (Python Software Foundation) to predict PPTIC based on preoperative indicators. Data from 10,023 Medical Information Mart for Intensive Care patients were divided into training (70%) and test (30%) sets, with 3212 patients from 3 other centers used for external validation. Model performance was assessed with 5-fold cross-validation, bootstrapping, the Brier score, and Shapley additive explanation values. Results: Univariate logistic regression identified PPTIC risk factors as (1) prolonged activated partial thromboplastin time, prothrombin time, and international normalized ratio; (2) decreased levels of hemoglobin, hematocrit, red blood cells, calcium, and sodium; (3) lower admission diastolic blood pressure; (4) elevated alanine aminotransferase and aspartate aminotransferase levels; (5) admission heart rate; and (6) emergency surgery and perioperative transfusion. Multivariate logistic regression revealed that patients with PPTIC faced significantly higher risks of sepsis (1.75-fold), heart failure (1.5-fold), delirium (3.08-fold), abnormal coagulation (3.57-fold), tracheostomy (2.76-fold), mortality (2.19-fold), and urinary tract infection (1.95-fold), along with longer hospital and intensive care unit stays. Random forest was the most effective ML model for predicting PPTIC, achieving an area under the receiver operating characteristic curve of 0.91, an area under the precision-recall curve of 0.89, accuracy of 0.84, sensitivity of 0.80, specificity of 0.88, precision of 0.88, F1-score of 0.84, and a Brier score of 0.13 in external validation.
Conclusions: Key PPTIC risk factors include (1) prolonged activated partial thromboplastin time, prothrombin time, and international normalized ratio; (2) low levels of hemoglobin, hematocrit, red blood cells, calcium, and sodium; (3) low diastolic blood pressure; (4) elevated alanine aminotransferase and aspartate aminotransferase levels; (5) admission heart rate; and (6) the need for emergency surgery and transfusion. PPTIC is associated with severe complications and extended hospital stays. Among the ML models, the random forest model was the most effective predictor. Trial Registration: Chinese Clinical Trial Registry ChiCTR2300078097; https://www.chictr.org.cn/showproj.html?proj=211051 UR - https://www.jmir.org/2025/1/e66612 UR - http://dx.doi.org/10.2196/66612 UR - http://www.ncbi.nlm.nih.gov/pubmed/39841523 ID - info:doi/10.2196/66612 ER - TY - JOUR AU - Ding, Zhendong AU - Zhang, Linan AU - Zhang, Yihan AU - Yang, Jing AU - Luo, Yuheng AU - Ge, Mian AU - Yao, Weifeng AU - Hei, Ziqing AU - Chen, Chaojin PY - 2025/1/15 TI - A Supervised Explainable Machine Learning Model for Perioperative Neurocognitive Disorder in Liver-Transplantation Patients and External Validation on the Medical Information Mart for Intensive Care IV Database: Retrospective Study JO - J Med Internet Res SP - e55046 VL - 27 KW - machine learning KW - risk factors KW - liver transplantation KW - perioperative neurocognitive disorders KW - MIMIC-IV database KW - external validation N2 - Background: Patients undergoing liver transplantation (LT) are at risk of perioperative neurocognitive dysfunction (PND), which significantly affects the patients' prognosis. Objective: This study used machine learning (ML) algorithms with the aim of extracting critical predictors and developing an ML model to predict PND among LT recipients.
Methods: In this retrospective study, data from 958 patients who underwent LT between January 2015 and January 2020 were extracted from the Third Affiliated Hospital of Sun Yat-sen University. Six ML algorithms were used to predict post-LT PND, and model performance was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1-scores. The best-performing model was additionally validated using a temporal external dataset including 309 LT cases from February 2020 to August 2022, and an independent external dataset extracted from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database including 325 patients. Results: In the development cohort, 201 out of 751 (33.5%) patients were diagnosed with PND. The logistic regression model achieved the highest AUC (0.799) in the internal validation set, with comparable AUCs in the temporal external (0.826) and MIMIC-IV (0.72) validation sets. The top 3 features contributing to post-LT PND diagnosis were preoperative overt hepatic encephalopathy, platelet level, and the postoperative Sequential Organ Failure Assessment score, as revealed by the Shapley additive explanations method. Conclusions: A real-time logistic regression model-based online predictor of post-LT PND was developed, providing a highly interoperable tool for use across medical institutions to support early risk stratification and decision making for LT recipients.
UR - https://www.jmir.org/2025/1/e55046 UR - http://dx.doi.org/10.2196/55046 UR - http://www.ncbi.nlm.nih.gov/pubmed/39813086 ID - info:doi/10.2196/55046 ER - TY - JOUR AU - Holler, Emma AU - Ludema, Christina AU - Ben Miled, Zina AU - Rosenberg, Molly AU - Kalbaugh, Corey AU - Boustani, Malaz AU - Mohanty, Sanjay PY - 2025/1/9 TI - Development and Validation of a Routine Electronic Health Record-Based Delirium Prediction Model for Surgical Patients Without Dementia: Retrospective Case-Control Study JO - JMIR Perioper Med SP - e59422 VL - 8 KW - delirium KW - machine learning KW - prediction KW - postoperative KW - algorithm KW - electronic health records KW - surgery KW - risk prediction N2 - Background: Postoperative delirium (POD) is a common complication after major surgery and is associated with poor outcomes in older adults. Early identification of patients at high risk of POD can enable targeted prevention efforts. However, existing POD prediction models require inpatient data collected during the hospital stay, which delays predictions and limits scalability. Objective: This study aimed to develop and externally validate a machine learning-based prediction model for POD using routine electronic health record (EHR) data. Methods: We identified all surgical encounters from 2014 to 2021 for patients aged 50 years and older who underwent an operation requiring general anesthesia, with a length of stay of at least 1 day at 3 Indiana hospitals. Patients with preexisting dementia or mild cognitive impairment were excluded. POD was identified using Confusion Assessment Method records and delirium International Classification of Diseases (ICD) codes. Controls without delirium or nurse-documented confusion were matched to cases by age, sex, race, and year of admission. 
We trained logistic regression, random forest, extreme gradient boosting (XGB), and neural network models to predict POD using 143 features derived from routine EHR data available at the time of hospital admission. Separate models were developed for each hospital using surveillance periods of 3 months, 6 months, and 1 year before admission. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Each model was internally validated using holdout data and externally validated using data from the other 2 hospitals. Calibration was assessed using calibration curves. Results: The study cohort included 7167 delirium cases and 7167 matched controls. XGB outperformed all other classifiers. AUROCs were highest for XGB models trained on 12 months of preadmission data. The best-performing XGB model achieved a mean AUROC of 0.79 (SD 0.01) on the holdout set, which decreased to 0.69-0.74 (SD 0.02) when externally validated on data from other hospitals. Conclusions: Our routine EHR-based POD prediction models demonstrated good predictive ability using a limited set of preadmission and surgical variables, though their generalizability was limited. The proposed models could be used as a scalable, automated screening tool to identify patients at high risk of POD at the time of hospital admission. 
UR - https://periop.jmir.org/2025/1/e59422 UR - http://dx.doi.org/10.2196/59422 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59422 ER - TY - JOUR AU - Tang, Ran AU - Qi, Shi-qin PY - 2024/11/18 TI - The Vast Potential of ChatGPT in Pediatric Surgery JO - J Med Internet Res SP - e66453 VL - 26 KW - ChatGPT KW - pediatric KW - surgery KW - artificial intelligence KW - AI KW - diagnosis KW - surgeon UR - https://www.jmir.org/2024/1/e66453 UR - http://dx.doi.org/10.2196/66453 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/66453 ER - TY - JOUR AU - Liu, Jiayu AU - Liang, Xiuting AU - Fang, Dandong AU - Zheng, Jiqi AU - Yin, Chengliang AU - Xie, Hui AU - Li, Yanteng AU - Sun, Xiaochun AU - Tong, Yue AU - Che, Hebin AU - Hu, Ping AU - Yang, Fan AU - Wang, Bingxian AU - Chen, Yuanyuan AU - Cheng, Gang AU - Zhang, Jianning PY - 2024/9/10 TI - The Diagnostic Ability of GPT-3.5 and GPT-4.0 in Surgery: Comparative Analysis JO - J Med Internet Res SP - e54985 VL - 26 KW - ChatGPT KW - accuracy rates KW - artificial intelligence KW - diagnosis KW - surgeon N2 - Background: ChatGPT (OpenAI) has shown great potential in clinical diagnosis and could become an excellent auxiliary tool in clinical practice. This study investigates and evaluates ChatGPT in diagnostic capabilities by comparing the performance of GPT-3.5 and GPT-4.0 across model iterations. Objective: This study aims to evaluate the precise diagnostic ability of GPT-3.5 and GPT-4.0 for colon cancer and its potential as an auxiliary diagnostic tool for surgeons and compare the diagnostic accuracy rates between GPT-3.5 and GPT-4.0. We precisely assess the accuracy of primary and secondary diagnoses and analyze the causes of misdiagnoses in GPT-3.5 and GPT-4.0 according to 7 categories: patient histories, symptoms, physical signs, laboratory examinations, imaging examinations, pathological examinations, and intraoperative findings.
Methods: We retrieved 316 case reports for intestinal cancer from the Chinese Medical Association Publishing House database, of which 286 cases were deemed valid after data cleansing. The cases were translated from Mandarin to English and then input into GPT-3.5 and GPT-4.0 using a simple, direct prompt to elicit primary and secondary diagnoses. We conducted a comparative study to evaluate the diagnostic accuracy of GPT-4.0 and GPT-3.5. Three senior surgeons from the General Surgery Department, specializing in Colorectal Surgery, assessed the diagnostic information at the Chinese PLA (People's Liberation Army) General Hospital. The accuracy of primary and secondary diagnoses was scored based on predefined criteria. Additionally, we analyzed and compared the causes of misdiagnoses in both models according to 7 categories: patient histories, symptoms, physical signs, laboratory examinations, imaging examinations, pathological examinations, and intraoperative findings. Results: Out of 286 cases, GPT-4.0 and GPT-3.5 both demonstrated high diagnostic accuracy for primary diagnoses, but the accuracy rates of GPT-4.0 were significantly higher than GPT-3.5 (mean 0.972, SD 0.137 vs mean 0.855, SD 0.335; t285=5.753; P<.001). For secondary diagnoses, the accuracy rates of GPT-4.0 were also significantly higher than GPT-3.5 (mean 0.908, SD 0.159 vs mean 0.617, SD 0.349; t285=-7.727; P<.001). GPT-3.5 showed limitations in processing patient history, symptom presentation, laboratory tests, and imaging data. While GPT-4.0 improved upon GPT-3.5, it still has limitations in identifying symptoms and laboratory test data. For both primary and secondary diagnoses, there was no significant difference in accuracy related to age, gender, or system group between GPT-4.0 and GPT-3.5. Conclusions: This study demonstrates that ChatGPT, particularly GPT-4.0, possesses significant diagnostic potential, with GPT-4.0 exhibiting higher accuracy than GPT-3.5.
However, GPT-4.0 still has limitations, particularly in recognizing patient symptoms and laboratory data, indicating a need for more research in real-world clinical settings to enhance its diagnostic capabilities. UR - https://www.jmir.org/2024/1/e54985 UR - http://dx.doi.org/10.2196/54985 UR - http://www.ncbi.nlm.nih.gov/pubmed/39255016 ID - info:doi/10.2196/54985 ER - TY - JOUR AU - Tsai, Feng-Fang AU - Chang, Yung-Chun AU - Chiu, Yu-Wen AU - Sheu, Bor-Ching AU - Hsu, Min-Huei AU - Yeh, Huei-Ming PY - 2024/8/21 TI - Machine Learning Model for Anesthetic Risk Stratification for Gynecologic and Obstetric Patients: Cross-Sectional Study Outlining a Novel Approach for Early Detection JO - JMIR Form Res SP - e54097 VL - 8 KW - gradient boosting machine KW - comorbidity KW - gynecological and obstetric procedure KW - ASA classification KW - American Society of Anesthesiologists KW - preoperative evaluation KW - machine learning KW - machine learning model KW - gynecology KW - obstetrics KW - early detection KW - artificial intelligence KW - physiological KW - gestational KW - anesthetic risk KW - clinical laboratory data KW - laboratory data KW - risk KW - risk classification N2 - Background: Preoperative evaluation is important, and this study explored the application of machine learning methods for anesthetic risk classification and the evaluation of the contributions of various factors. To minimize the effects of confounding variables during model training, we used a homogenous group with similar physiological states and ages undergoing similar pelvic organ-related procedures not involving malignancies. Objective: Data on women of reproductive age (age 20-50 years) who underwent gestational or gynecological surgery between January 1, 2017, and December 31, 2021, were obtained from the National Taiwan University Hospital Integrated Medical Database. Methods: We first performed an exploratory analysis and selected key features.
We then performed data preprocessing to acquire relevant features related to preoperative examination. To further enhance predictive performance, we used the log-likelihood ratio algorithm to generate comorbidity patterns. Finally, we input the processed features into the light gradient boosting machine (LightGBM) model for training and subsequent prediction. Results: A total of 10,892 patients were included. Within this data set, 9893 patients were classified as having low anesthetic risk (American Society of Anesthesiologists physical status score of 1-2), and 999 patients were classified as having high anesthetic risk (American Society of Anesthesiologists physical status score of >2). The area under the receiver operating characteristic curve of the proposed model was 0.6831. Conclusions: By combining comorbidity information and clinical laboratory data, our methodology based on the LightGBM model provides more accurate predictions for anesthetic risk classification. Trial Registration: Research Ethics Committee of the National Taiwan University Hospital 202204010RINB; https://www.ntuh.gov.tw/RECO/Index.action UR - https://formative.jmir.org/2024/1/e54097 UR - http://dx.doi.org/10.2196/54097 UR - http://www.ncbi.nlm.nih.gov/pubmed/38991090 ID - info:doi/10.2196/54097 ER - TY - JOUR AU - Wong, Chia-En AU - Chen, Pei-Wen AU - Hsu, Heng-Jui AU - Cheng, Shao-Yang AU - Fan, Chen-Che AU - Chen, Yen-Chang AU - Chiu, Yi-Pei AU - Lee, Jung-Shun AU - Liang, Sheng-Fu PY - 2024/7/4 TI - Collaborative Human-Computer Vision Operative Video Analysis Algorithm for Analyzing Surgical Fluency and Surgical Interruptions in Endonasal Endoscopic Pituitary Surgery: Cohort Study JO - J Med Internet Res SP - e56127 VL - 26 KW - algorithm KW - computer vision KW - endonasal endoscopic approach KW - pituitary KW - transsphenoidal surgery N2 - Background: The endonasal endoscopic approach (EEA) is effective for pituitary adenoma resection.
However, manual review of operative videos is time-consuming. The application of a computer vision (CV) algorithm could potentially reduce the time required for operative video review and facilitate the training of surgeons to overcome the learning curve of EEA. Objective: This study aimed to evaluate the performance of a CV-based video analysis system, based on OpenCV algorithm, to detect surgical interruptions and analyze surgical fluency in EEA. The accuracy of the CV-based video analysis was investigated, and the time required for operative video review using CV-based analysis was compared to that of manual review. Methods: The dominant color of each frame in the EEA video was determined using OpenCV. We developed an algorithm to identify events of surgical interruption if the alterations in the dominant color pixels reached certain thresholds. The thresholds were determined by training the current algorithm using EEA videos. The accuracy of the CV analysis was determined by manual review, and the time spent was reported. Results: A total of 46 EEA operative videos were analyzed, with 93.6%, 95.1%, and 93.3% accuracies in the training, test 1, and test 2 data sets, respectively. Compared with manual review, CV-based analysis reduced the time required for operative video review by 86% (manual review: 166.8 and CV analysis: 22.6 minutes; P<.001). The application of a human-computer collaborative strategy increased the overall accuracy to 98.5%, with a 74% reduction in the review time (manual review: 166.8 and human-CV collaboration: 43.4 minutes; P<.001). Analysis of the different surgical phases showed that the sellar phase had the lowest frequency (nasal phase: 14.9, sphenoidal phase: 15.9, and sellar phase: 4.9 interruptions/10 minutes; P<.001) and duration (nasal phase: 67.4, sphenoidal phase: 77.9, and sellar phase: 31.1 seconds/10 minutes; P<.001) of surgical interruptions. 
A comparison of the early and late EEA videos showed that increased surgical experience was associated with a decreased number (early: 4.9 and late: 2.9 interruptions/10 minutes; P=.03) and duration (early: 41.1 and late: 19.8 seconds/10 minutes; P=.02) of surgical interruptions during the sellar phase. Conclusions: CV-based analysis had a 93% to 98% accuracy in detecting the number, frequency, and duration of surgical interruptions occurring during EEA. Moreover, CV-based analysis reduced the time required to analyze the surgical fluency in EEA videos compared to manual review. The application of CV can facilitate the training of surgeons to overcome the learning curve of endoscopic skull base surgery. Trial Registration: ClinicalTrials.gov NCT06156020; https://clinicaltrials.gov/study/NCT06156020 UR - https://www.jmir.org/2024/1/e56127 UR - http://dx.doi.org/10.2196/56127 UR - http://www.ncbi.nlm.nih.gov/pubmed/38963694 ID - info:doi/10.2196/56127 ER - TY - JOUR AU - El-Gabalawy, Renée AU - Sommer, L. Jordana AU - Hebbard, Pamela AU - Reynolds, Kristin AU - Logan, S. Gabrielle AU - Smith, D. Michael S. AU - Mutter, C. Thomas AU - Mutch, Alan W. AU - Mota, Natalie AU - Proulx, Catherine AU - Gagnon Shaigetz, Vincent AU - Maples-Keller, L. Jessica AU - Arora, C. Rakesh AU - Perrin, David AU - Benedictson, Jada AU - Jacobsohn, Eric PY - 2024/5/14 TI - An Immersive Virtual Reality Intervention for Preoperative Anxiety and Distress Among Adults Undergoing Oncological Surgery: Protocol for a 3-Phase Development and Feasibility Trial JO - JMIR Res Protoc SP - e55692 VL - 13 KW - virtual reality KW - preoperative anxiety and distress KW - perioperative mental health KW - breast cancer KW - oncological surgery N2 - Background: Preoperative state anxiety (PSA) is distress and anxiety directly associated with perioperative events. 
PSA is associated with negative postoperative outcomes such as longer hospital length of stay, increased pain and opioid use, and higher rates of rehospitalization. Psychological prehabilitation, such as education, exposure to hospital environments, and relaxation strategies, has been shown to mitigate PSA; however, there are limited skilled personnel to deliver such interventions in clinical practice. Immersive virtual reality (VR) has the potential for greater accessibility and enhanced integration into an immersive and interactive experience. VR is rarely used in the preoperative setting, but similar forms of stress inoculation training involving exposure to stressful events have improved psychological preparation in contexts such as military deployment. Objective: This study seeks to develop and investigate a targeted PSA intervention in patients undergoing oncological surgery using a single preoperative VR exposure. The primary objectives are to (1) develop a novel VR program for patients undergoing oncological surgery with general anesthesia; (2) assess the feasibility, including acceptability, of a single exposure to this intervention; (3) assess the feasibility, including acceptability, of outcome measures of PSA; and (4) use these results to refine the VR content and outcome measures for a larger trial. A secondary objective is to preliminarily assess the clinical utility of the intervention for PSA. Methods: This study comprises 3 phases. Phase 1 (completed) involved the development of a VR prototype targeting PSA, using multidisciplinary iterative input. Phase 2 (data collection completed) involves examining the feasibility aspects of the VR intervention. This randomized feasibility trial involves assessing the novel VR preoperative intervention compared to a VR control (ie, nature trek) condition and a treatment-as-usual group among patients undergoing breast cancer surgery. 
Phase 3 will involve refining the prototype based on feasibility findings and input from people with lived experience for a future clinical trial, using focus groups with participants from phase 2. Results: This study was funded in March 2019. Phase 1 was completed in April 2020. Phase 2 data collection was completed in January 2024 and data analysis is ongoing. Focus groups were completed in February 2024. Both the feasibility study and focus groups will contribute to further refinement of the initial VR prototype (phase 3), with the final simulation to be completed by mid-2024. Conclusions: The findings from this work will contribute to the limited body of research examining feasible and broadly accessible interventions for PSA. Knowledge gained from this research will contribute to the final development of a novel VR intervention to be tested in a large population of patients with cancer before surgery in a randomized clinical trial. Trial Registration: ClinicalTrials.gov NCT04544618; https://www.clinicaltrials.gov/study/NCT04544618 International Registered Report Identifier (IRRID): DERR1-10.2196/55692 UR - https://www.researchprotocols.org/2024/1/e55692 UR - http://dx.doi.org/10.2196/55692 UR - http://www.ncbi.nlm.nih.gov/pubmed/38743939 ID - info:doi/10.2196/55692 ER - TY - JOUR AU - Mittal, Ajay AU - Wakim, Jonathan AU - Huq, Suhaiba AU - Wynn, Tung PY - 2024/5/9 TI - Effectiveness of Virtual Reality in Reducing Perceived Pain and Anxiety Among Patients Within a Hospital System: Protocol for a Mixed Methods Study JO - JMIR Res Protoc SP - e52649 VL - 13 KW - virtual reality KW - digital health KW - feasibility KW - acceptability KW - pain KW - anxiety KW - hospital KW - hospitalization KW - in-patient KW - observational study KW - pharmacologic pain management KW - pain management KW - topical anesthetic creams KW - topical cream N2 - Background: Within hospital systems, diverse subsets of patients are subject to minimally invasive procedures that provide 
therapeutic relief and necessary health data that are often perceived as anxiogenic or painful. These feelings are particularly relevant to patients experiencing procedures where they are conscious and not sedated or placed under general anesthesia that renders them incapacitated. Pharmacologic pain management and topical anesthetic creams are used to manage these feelings; however, distraction-based methods can provide nonpharmacologic means to modify the painful experience and discomfort often associated with these procedures. Recent studies support distraction as a useful method for reducing anxiety and pain and as a result, improving patient experience. Virtual reality (VR) is an emerging technology that provides an immersive user experience and can operate through a distraction-based method to reduce the negative or painful experience often related to procedures where the patient is conscious. Given the possible short-term and long-term outcomes of poorly managed pain and enduring among patients, health care professionals are challenged to improve patient well-being during medically essential procedures. Objective: The purpose of this pilot project is to assess the efficacy of using VR as a distraction-based intervention for anxiety or pain management compared to other nonpharmacologic interventions in a variety of hospital settings, specifically in patients undergoing lumbar puncture procedures and bone marrow biopsies at the oncology ward, patients receiving nerve block for a broken bone at an anesthesia or surgical center, patients undergoing a cleaning at a dental clinic, patients conscious during an ablation procedure at a cardiology clinic, and patients awake during a kidney biopsy at a nephrology clinic. This will provide the framework for additional studies in other health care settings. 
Methods: In a single visit, patients eligible for the study will complete brief preprocedural and postprocedural questionnaires about their perceived fear, anxiety, and pain levels. During the procedure, research assistants will place a VR headset on the patient and the patient will undergo a VR experience to distract from any pain felt from the procedure. Participants' vitals, including blood pressure, heart rate, and rate of respiration, will also be recorded before, during, and after the procedure. Results: The study is already underway, and results support a decrease in perceived pain by 1.00 and a decrease in perceived anxiety by 0.3 compared to the control group (on a 10-point Likert scale). Among the VR intervention group, the average rating for comfort was 4.35 out of 5. Conclusions: This study will provide greater insight into how patients' perception of anxiety and pain could potentially be altered. Furthermore, metrics related to the operational efficiency of providing a VR intervention compared to a control will provide insight into the feasibility and integration of such technologies in routine practice.
International Registered Report Identifier (IRRID): DERR1-10.2196/52649 UR - https://www.researchprotocols.org/2024/1/e52649 UR - http://dx.doi.org/10.2196/52649 UR - http://www.ncbi.nlm.nih.gov/pubmed/38722681 ID - info:doi/10.2196/52649 ER - TY - JOUR AU - Osmanodja, Bilgin AU - Sassi, Zeineb AU - Eickmann, Sascha AU - Hansen, Maria Carla AU - Roller, Roland AU - Burchardt, Aljoscha AU - Samhammer, David AU - Dabrock, Peter AU - Möller, Sebastian AU - Budde, Klemens AU - Herrmann, Anne PY - 2024/4/1 TI - Investigating the Impact of AI on Shared Decision-Making in Post-Kidney Transplant Care (PRIMA-AI): Protocol for a Randomized Controlled Trial JO - JMIR Res Protoc SP - e54857 VL - 13 KW - shared decision-making KW - SDM KW - kidney transplantation KW - artificial intelligence KW - AI KW - decision-support system KW - DSS KW - qualitative research N2 - Background: Patients after kidney transplantation eventually face the risk of graft loss with the concomitant need for dialysis or retransplantation. Choosing the right kidney replacement therapy after graft loss is an important preference-sensitive decision for kidney transplant recipients. However, the rate of conversations about treatment options after kidney graft loss has been shown to be as low as 13% in previous studies. It is unknown whether the implementation of artificial intelligence (AI)-based risk prediction models can increase the number of conversations about treatment options after graft loss and how this might influence the associated shared decision-making (SDM). Objective: This study aims to explore the impact of AI-based risk prediction for the risk of graft loss on the frequency of conversations about the treatment options after graft loss, as well as the associated SDM process. Methods: This is a 2-year, prospective, randomized, 2-armed, parallel-group, single-center trial in a German kidney transplant center.
All patients will receive the same routine post–kidney transplant care that usually includes follow-up visits every 3 months at the kidney transplant center. For patients in the intervention arm, physicians will be assisted by a validated and previously published AI-based risk prediction system that estimates the risk for graft loss in the next year, starting from 3 months after randomization until 24 months after randomization. The study population will consist of 122 kidney transplant recipients >12 months after transplantation, who are at least 18 years of age, are able to communicate in German, and have an estimated glomerular filtration rate <30 mL/min/1.73 m2. Patients with multi-organ transplantation, or who are not able to communicate in German, as well as underage patients, cannot participate. For the primary end point, the proportion of patients who have had a conversation about their treatment options after graft loss is compared at 12 months after randomization. Additionally, 2 different assessment tools for SDM, the CollaboRATE mean score and the Control Preference Scale, are compared between the 2 groups at 12 months and 24 months after randomization. Furthermore, recordings of patient-physician conversations, as well as semistructured interviews with patients, support persons, and physicians, are performed to support the quantitative results. Results: The enrollment for the study is ongoing. The first results are expected to be submitted for publication in 2025. Conclusions: This is the first study to examine the influence of AI-based risk prediction on physician-patient interaction in the context of kidney transplantation.
We use a mixed methods approach by combining a randomized design with a simple quantitative end point (frequency of conversations), different quantitative measurements for SDM, and several qualitative research methods (eg, records of physician-patient conversations and semistructured interviews) to examine the implementation of AI-based risk prediction in the clinic. Trial Registration: ClinicalTrials.gov NCT06056518; https://clinicaltrials.gov/study/NCT06056518 International Registered Report Identifier (IRRID): PRR1-10.2196/54857 UR - https://www.researchprotocols.org/2024/1/e54857 UR - http://dx.doi.org/10.2196/54857 UR - http://www.ncbi.nlm.nih.gov/pubmed/38557315 ID - info:doi/10.2196/54857 ER - TY - JOUR AU - Rohatgi, Nidhi PY - 2023/11/21 TI - JMIR Perioperative Medicine: A Global Journal for Publishing Interdisciplinary Innovations, Research, and Perspectives JO - JMIR Perioper Med SP - e54344 VL - 6 KW - JMIR Perioperative Medicine KW - innovation KW - technology KW - digital health KW - research KW - interdisciplinary KW - perioperative medicine UR - https://periop.jmir.org/2023/1/e54344 UR - http://dx.doi.org/10.2196/54344 UR - http://www.ncbi.nlm.nih.gov/pubmed/37988142 ID - info:doi/10.2196/54344 ER - TY - JOUR AU - Nakanishi, Kozo AU - Goto, Hidenori PY - 2023/11/14 TI - A New Index for the Quantitative Evaluation of Surgical Invasiveness Based on Perioperative Patients' Behavior Patterns: Machine Learning Approach Using Triaxial Acceleration JO - JMIR Perioper Med SP - e50188 VL - 6 KW - surgery KW - invasiveness KW - triaxial acceleration KW - machine learning KW - human activity recognition KW - patient-oriented outcome KW - video-assisted thoracoscopic surgery KW - VATS KW - postoperative recovery KW - perioperative management KW - artificial intelligence KW - AI KW - mobile phone N2 - Background: The minimally invasive nature of thoracoscopic surgery is well recognized; however, the absence of a reliable evaluation method remains challenging.
We hypothesized that the postoperative recovery speed is closely linked to surgical invasiveness, where recovery signifies the patient's behavior transition back to their preoperative state during the perioperative period. Objective: This study aims to determine whether machine learning using triaxial acceleration data can effectively capture perioperative behavior changes and establish a quantitative index for quantifying variations in surgical invasiveness. Methods: We trained 7 distinct machine learning models using a publicly available human acceleration data set as supervised data. The 3 top-performing models were selected to predict patient actions, as determined by the Matthews correlation coefficient scores. Two patients who underwent different levels of invasive thoracoscopic surgery were selected as participants. Acceleration data were collected via chest sensors for 8 hours during the preoperative and postoperative hospitalization days. These data were categorized into 4 actions (walking, standing, sitting, and lying down) using the selected models. The actions predicted by the model with intermediate results were adopted as the actions of the participants. The daily appearance probability was calculated for each action. The 2 differences between 2 appearance probabilities (sitting vs standing and lying down vs walking) were calculated using 2 coordinates on the x- and y-axes. A 2D vector composed of coordinate values was defined as the index of behavior pattern (iBP) for the day. All daily iBPs were graphed, and the enclosed area and distance between points were calculated and compared between participants to assess the relationship between changes in the indices and invasiveness. Results: Patients 1 and 2 underwent lung lobectomy and incisional tumor biopsy, respectively. The selected predictive model was a light-gradient boosting model (mean Matthews correlation coefficient 0.98, SD 0.0027; accuracy: 0.98).
The acceleration data yielded 548,466 points for patient 1 and 466,407 points for patient 2. The iBPs of patient 1 were [(0.32, 0.19), (-0.098, 0.46), (-0.15, 0.13), (-0.049, 0.22)] and those of patient 2 were [(0.55, 0.30), (0.77, 0.21), (0.60, 0.25), (0.61, 0.31)]. The enclosed areas were 0.077 and 0.0036 for patients 1 and 2, respectively. Notably, the distances for patient 1 were greater than those for patient 2 ({0.44, 0.46, 0.37, 0.26} vs {0.23, 0.0065, 0.059}; P=.03 [Mann-Whitney U test]). Conclusions: The selected machine learning model effectively predicted the actions of the surgical patients with high accuracy. The temporal distribution of action times revealed changes in behavior patterns during the perioperative phase. The proposed index may facilitate the recognition and visualization of perioperative changes in patients and differences in surgical invasiveness. UR - https://periop.jmir.org/2023/1/e50188 UR - http://dx.doi.org/10.2196/50188 UR - http://www.ncbi.nlm.nih.gov/pubmed/37962919 ID - info:doi/10.2196/50188 ER - TY - JOUR AU - Matsumoto, Koutarou AU - Nohara, Yasunobu AU - Sakaguchi, Mikako AU - Takayama, Yohei AU - Fukushige, Syota AU - Soejima, Hidehisa AU - Nakashima, Naoki AU - Kamouchi, Masahiro PY - 2023/10/26 TI - Temporal Generalizability of Machine Learning Models for Predicting Postoperative Delirium Using Electronic Health Record Data: Model Development and Validation Study JO - JMIR Perioper Med SP - e50895 VL - 6 KW - postoperative delirium KW - prediction model KW - machine learning KW - temporal generalizability KW - electronic health record data N2 - Background: Although machine learning models demonstrate significant potential in predicting postoperative delirium, the advantages of their implementation in real-world settings remain unclear and require a comparison with conventional models in practical applications.
Objective: The objective of this study was to validate the temporal generalizability of decision tree ensemble and sparse linear regression models for predicting delirium after surgery compared with that of the traditional logistic regression model. Methods: The health record data of patients hospitalized at an advanced emergency and critical care medical center in Kumamoto, Japan, were collected electronically. We developed a decision tree ensemble model using extreme gradient boosting (XGBoost) and a sparse linear regression model using least absolute shrinkage and selection operator (LASSO) regression. To evaluate the predictive performance of the model, we used the area under the receiver operating characteristic curve (AUROC) and the Matthews correlation coefficient (MCC) to measure discrimination and the slope and intercept of the regression between predicted and observed probabilities to measure calibration. The Brier score was evaluated as an overall performance metric. We included 11,863 consecutive patients who underwent surgery with general anesthesia between December 2017 and February 2022. The patients were divided into a derivation cohort before the COVID-19 pandemic and a validation cohort during the COVID-19 pandemic. Postoperative delirium was diagnosed according to the confusion assessment method. Results: A total of 6497 patients (mean age 68.5, SD 14.4 years, women n=2627, 40.4%) were included in the derivation cohort, and 5366 patients (mean age 67.8, SD 14.6 years, women n=2105, 39.2%) were included in the validation cohort. Regarding discrimination, the XGBoost model (AUROC 0.87-0.90 and MCC 0.34-0.44) did not significantly outperform the LASSO model (AUROC 0.86-0.89 and MCC 0.34-0.41).
The logistic regression model (AUROC 0.84-0.88, MCC 0.33-0.40, slope 1.01-1.19, intercept -0.16 to 0.06, and Brier score 0.06-0.07), with 8 predictors (age, intensive care unit, neurosurgery, emergency admission, anesthesia time, BMI, blood loss during surgery, and use of an ambulance) achieved good predictive performance. Conclusions: The XGBoost model did not significantly outperform the LASSO model in predicting postoperative delirium. Furthermore, a parsimonious logistic model with a few important predictors achieved comparable performance to machine learning models in predicting postoperative delirium. UR - https://periop.jmir.org/2023/1/e50895 UR - http://dx.doi.org/10.2196/50895 UR - http://www.ncbi.nlm.nih.gov/pubmed/37883164 ID - info:doi/10.2196/50895 ER - TY - JOUR AU - Bottani, Eleonora AU - Bellini, Valentina AU - Mordonini, Monica AU - Pellegrino, Mattia AU - Lombardo, Gianfranco AU - Franchi, Beatrice AU - Craca, Michelangelo AU - Bignami, Elena PY - 2023/7/5 TI - Internet of Things and New Technologies for Tracking Perioperative Patients With an Innovative Model for Operating Room Scheduling: Protocol for a Development and Feasibility Study JO - JMIR Res Protoc SP - e45477 VL - 12 KW - internet of things KW - artificial intelligence KW - machine learning KW - perioperative organization KW - operating rooms N2 - Background: Management of operating rooms is a critical point in health care organizations because surgical departments represent a significant cost in hospital budgets. Therefore, it is increasingly important that there is effective planning of elective, emergency, and day surgery and optimization of both the human and physical resources available, always maintaining a high level of care and health treatment. This would lead to a reduction in patient waiting lists and better performance not only of surgical departments but also of the entire hospital.
Objective: This study aims to automatically collect data from a real surgical scenario to develop an integrated technological-organizational model that optimizes operating block resources. Methods: Each patient is tracked and located in real time by wearing a bracelet sensor with a unique identifier. Exploiting the indoor location, the software architecture is able to collect the time spent for every step inside the surgical block. This method does not in any way affect the level of assistance that the patient receives and always protects their privacy; in fact, after expressing informed consent, each patient will be associated with an anonymous identification number. Results: The preliminary results are promising, making the study feasible and functional. Times automatically recorded are much more precise than those collected by humans and reported in the organization's information system. In addition, machine learning can exploit the historical data collection to predict the surgery time required for each patient according to the patient's specific profile. Simulation can also be applied to reproduce the system's functioning, evaluate current performance, and identify strategies to improve the efficiency of the operating block. Conclusions: This functional approach improves short- and long-term surgical planning, facilitating interaction between the various professionals involved in the operating block, optimizing the management of available resources, and guaranteeing a high level of patient care in an increasingly efficient health care system. Trial Registration: ClinicalTrials.gov NCT05106621; https://clinicaltrials.gov/ct2/show/NCT05106621 International Registered Report Identifier (IRRID): DERR1-10.2196/45477 UR - https://www.researchprotocols.org/2023/1/e45477 UR - http://dx.doi.org/10.2196/45477 UR - http://www.ncbi.nlm.nih.gov/pubmed/37405821 ID - info:doi/10.2196/45477 ER - TY - JOUR AU - Zoodsma, S.
Ruben AU - Bosch, Rian AU - Alderliesten, Thomas AU - Bollen, W. Casper AU - Kappen, H. Teus AU - Koomen, Erik AU - Siebes, Arno AU - Nijman, Joppe PY - 2023/5/16 TI - Continuous Data-Driven Monitoring in Critical Congenital Heart Disease: Clinical Deterioration Model Development JO - JMIR Cardio SP - e45190 VL - 7 KW - artificial intelligence KW - aberration detection KW - clinical deterioration KW - classification model KW - paediatric intensive care KW - pediatric intensive care KW - congenital heart disease KW - cardiac monitoring KW - machine learning KW - peri-operative KW - perioperative KW - surgery N2 - Background: Critical congenital heart disease (cCHD), which requires cardiac intervention in the first year of life for survival, occurs globally in 2-3 of every 1000 live births. In the critical perioperative period, intensive multimodal monitoring at a pediatric intensive care unit (PICU) is warranted, as their organs, especially the brain, may be severely injured due to hemodynamic and respiratory events. These 24/7 clinical data streams yield large quantities of high-frequency data, which are challenging in terms of interpretation due to the varying and dynamic physiology innate to cCHD. Through advanced data science algorithms, these dynamic data can be condensed into comprehensible information, reducing the cognitive load on the medical team and providing data-driven monitoring support through automated detection of clinical deterioration, which may facilitate timely intervention. Objective: This study aimed to develop a clinical deterioration detection algorithm for PICU patients with cCHD. Methods: Retrospectively, synchronous per-second data of cerebral regional oxygen saturation (rSO2) and 4 vital parameters (respiratory rate, heart rate, oxygen saturation, and invasive mean blood pressure) in neonates with cCHD admitted to the University Medical Center Utrecht, the Netherlands, between 2002 and 2018 were extracted. 
Patients were stratified based on mean oxygen saturation during admission to account for physiological differences between acyanotic and cyanotic cCHD. Each subset was used to train our algorithm in classifying data as either stable, unstable, or sensor dysfunction. The algorithm was designed to detect combinations of parameters abnormal to the stratified subpopulation and significant deviations from the patient's unique baseline, which were further analyzed to distinguish clinical improvement from deterioration. Novel data were used for testing, visualized in detail, and internally validated by pediatric intensivists. Results: A retrospective query yielded 4600 hours and 209 hours of per-second data in 78 and 10 neonates for, respectively, training and testing purposes. During testing, stable episodes occurred 153 times, of which 134 (88%) were correctly detected. Unstable episodes were correctly noted in 46 of 57 (81%) observed episodes. Twelve expert-confirmed unstable episodes were missed in testing. Time-percentual accuracy was 93% and 77% for, respectively, stable and unstable episodes. A total of 138 sensorial dysfunctions were detected, of which 130 (94%) were correct. Conclusions: In this proof-of-concept study, a clinical deterioration detection algorithm was developed and retrospectively evaluated to classify clinical stability and instability, achieving reasonable performance considering the heterogeneous population of neonates with cCHD. Combined analysis of baseline (ie, patient-specific) deviations and simultaneous parameter-shifting (ie, population-specific) proofs would be promising with respect to enhancing applicability to heterogeneous critically ill pediatric populations. After prospective validation, the current (and comparable) models may, in the future, be used in the automated detection of clinical deterioration and eventually provide data-driven monitoring support to the medical team, allowing for timely intervention. 
UR - https://cardio.jmir.org/2023/1/e45190 UR - http://dx.doi.org/10.2196/45190 UR - http://www.ncbi.nlm.nih.gov/pubmed/37191988 ID - info:doi/10.2196/45190 ER - TY - JOUR AU - Sun, Peng AU - Zhao, Yao AU - Men, Jie AU - Ma, Zhe-Ru AU - Jiang, Hao-Zhuo AU - Liu, Cheng-Yan AU - Feng, Wei PY - 2023/3/10 TI - Application of Virtual and Augmented Reality Technology in Hip Surgery: Systematic Review JO - J Med Internet Res SP - e37599 VL - 25 KW - virtual reality KW - augmented reality KW - hip KW - pelvis KW - arthroplasty KW - mobile phone N2 - Background: Virtual and augmented reality (VAR) represents a combination of current state-of-the-art computer and imaging technologies and has the potential to be a revolutionary technology in many surgical fields. An increasing number of investigators have developed and applied VAR in hip-related surgery with the aim of using this technology to reduce hip surgery-related complications, improve surgical success rates, and reduce surgical risks. These technologies are beginning to be widely used in hip-related preoperative operation simulation and training, intraoperative navigation tools in the operating room, and postoperative rehabilitation. Objective: With the aim of reviewing the current status of virtual reality (VR) and augmented reality (AR) in hip-related surgery and summarizing its benefits, we discussed and briefly described the applicability, advantages, limitations, and future perspectives of VR and AR techniques in hip-related surgery, such as preoperative operation simulation and training; explored the possible future applications of AR in the operating room; and discussed the bright prospects of VR and AR technologies in postoperative rehabilitation after hip surgery. Methods: We searched the PubMed and Web of Science databases using the following key search terms: ("virtual reality" OR "augmented reality") AND ("pelvis" OR "hip"). 
The literature on basic and clinical research related to the aforementioned key search terms, that is, studies evaluating the key factors, challenges, or problems of using VAR technology in hip-related surgery, was collected. Results: A total of 40 studies and reports were included and classified into the following categories: total hip arthroplasty, hip resurfacing, femoral neck fracture, pelvic fracture, acetabular fracture, tumor, arthroscopy, and postoperative rehabilitation. Quality assessment could be performed in 30 studies. Among the clinical studies, there were 16 case series with an average score of 89 out of 100 points (89%) and 1 case report that scored 81 (SD 10.11) out of 100 points (81%) according to the Joanna Briggs Institute Critical Appraisal Checklist. Two cadaveric studies scored 85 of 100 points (85%) and 92 of 100 points (92%) according to the Quality Appraisal for Cadaveric Studies scale. Conclusions: VR and AR technologies hold great promise for hip-related surgeries, especially for preoperative operation simulation and training, feasibility applications in the operating room, and postoperative rehabilitation, and have the potential to assist orthopedic surgeons in operating more accurately and safely. More comparative studies are necessary, including studies focusing on clinical outcomes and cost-effectiveness. 
UR - https://www.jmir.org/2023/1/e37599 UR - http://dx.doi.org/10.2196/37599 UR - http://www.ncbi.nlm.nih.gov/pubmed/36651587 ID - info:doi/10.2196/37599 ER - TY - JOUR AU - Gabriel, Allanigue Rodney AU - Simpson, Sierra AU - Zhong, William AU - Burton, Nicole Brittany AU - Mehdipour, Soraya AU - Said, Tadros Engy PY - 2023/2/8 TI - A Neural Network Model Using Pain Score Patterns to Predict the Need for Outpatient Opioid Refills Following Ambulatory Surgery: Algorithm Development and Validation JO - JMIR Perioper Med SP - e40455 VL - 6 KW - opioids KW - ambulatory surgery KW - machine learning KW - surgery KW - outpatient KW - pain medication KW - pain KW - pain management KW - patient needs KW - predict KW - algorithms KW - clinical decision support KW - pain care N2 - Background: Expansion of clinical guidance tools is crucial to identify patients at risk of requiring an opioid refill after outpatient surgery. Objective: The objective of this study was to develop machine learning algorithms incorporating pain and opioid features to predict the need for outpatient opioid refills following ambulatory surgery. Methods: Neural networks, regression, random forest, and a support vector machine were used to evaluate the data set. For each model, oversampling and undersampling techniques were implemented to balance the data set. Hyperparameter tuning based on k-fold cross-validation was performed, and feature importance was ranked based on a Shapley Additive Explanations (SHAP) explainer model. To assess performance, we calculated the average area under the receiver operating characteristics curve (AUC), F1-score, sensitivity, and specificity for each model. Results: There were 1333 patients, of whom 144 (10.8%) refilled their opioid prescription within 2 weeks after outpatient surgery. The average AUC calculated from k-fold cross-validation was 0.71 for the neural network model. When the model was validated on the test set, the AUC was 0.75. 
The features with the highest impact on model output were performance of a regional nerve block, postanesthesia care unit maximum pain score, postanesthesia care unit median pain score, active smoking history, and total perioperative opioid consumption. Conclusions: Applying machine learning algorithms allows providers to better predict outcomes that require specialized health care resources such as transitional pain clinics. This model can serve as clinical decision support for early identification of at-risk patients who may benefit from transitional pain clinic care perioperatively in ambulatory surgery. UR - https://periop.jmir.org/2023/1/e40455 UR - http://dx.doi.org/10.2196/40455 UR - http://www.ncbi.nlm.nih.gov/pubmed/36753316 ID - info:doi/10.2196/40455 ER - TY - JOUR AU - Mlodzinski, Eric AU - Wardi, Gabriel AU - Viglione, Clare AU - Nemati, Shamim AU - Crotty Alexander, Laura AU - Malhotra, Atul PY - 2023/1/27 TI - Assessing Barriers to Implementation of Machine Learning and Artificial Intelligence-Based Tools in Critical Care: Web-Based Survey Study JO - JMIR Perioper Med SP - e41056 VL - 6 KW - surveys and questionnaires KW - machine learning KW - artificial intelligence KW - critical care KW - respiratory insufficiency KW - survey KW - Qualtrics KW - questionnaire KW - perception KW - trust KW - perspective KW - attitude KW - intubation KW - predict KW - barrier KW - adoption KW - implementation N2 - Background: Although there is considerable interest in machine learning (ML) and artificial intelligence (AI) in critical care, the implementation of effective algorithms into practice has been limited. Objective: We sought to understand physician perspectives of a novel intubation prediction tool. Further, we sought to understand health care provider and nonprovider perspectives on the use of ML in health care. 
We aim to use the data gathered to elucidate implementation barriers and determinants of this intubation prediction tool, as well as ML/AI-based algorithms in critical care and health care in general. Methods: We developed 2 anonymous surveys in Qualtrics, 1 single-center survey distributed to 99 critical care physicians via email, and 1 social media survey distributed via Facebook and Twitter with branching logic to tailor questions for providers and nonproviders. The surveys included a mixture of categorical, Likert scale, and free-text items. Likert scale means with SD were reported from 1 to 5. We used Student t tests to examine the differences between groups. In addition, Likert scale responses were converted into 3 categories, and percentage values were reported in order to demonstrate the distribution of responses. Qualitative free-text responses were reviewed by a member of the study team to determine validity, and content analysis was performed to determine common themes in responses. Results: Out of 99 critical care physicians, 47 (48%) completed the single-center survey. Perceived knowledge of ML was low with a mean Likert score of 2.4 out of 5 (SD 0.96), with 7.5% of respondents rating their knowledge as a 4 or 5. The willingness to use the ML-based algorithm was 3.32 out of 5 (SD 0.95), with 75% of respondents answering 3 out of 5. The social media survey had 770 total responses with 605 (79%) providers and 165 (21%) nonproviders. We found no difference in providers' perceived knowledge based on level of experience in either survey. We found that nonproviders had significantly less perceived knowledge of ML (mean 3.04 out of 5, SD 1.53 vs mean 3.43, SD 0.941; P<.001) and comfort with ML (mean 3.28 out of 5, SD 1.02 vs mean 3.53, SD 0.935; P=.004) than providers. Free-text responses revealed multiple shared concerns, including accuracy/reliability, data bias, patient safety, and privacy/security risks. 
Conclusions: These data suggest that providers and nonproviders have positive perceptions of ML-based tools, and that a tool to predict the need for intubation would be of interest to critical care providers. There were many shared concerns about ML/AI in health care elucidated by the surveys. These results provide a baseline evaluation of implementation barriers and determinants of ML/AI-based tools that will be important in their optimal implementation and adoption in the critical care setting and health care in general. UR - https://periop.jmir.org/2023/1/e41056 UR - http://dx.doi.org/10.2196/41056 UR - http://www.ncbi.nlm.nih.gov/pubmed/36705960 ID - info:doi/10.2196/41056 ER - TY - JOUR AU - Gabriel, Allanigue Rodney AU - Harjai, Bhavya AU - Simpson, Sierra AU - Du, Liu Austin AU - Tully, Logan Jeffrey AU - George, Olivier AU - Waterman, Ruth PY - 2023/1/26 TI - An Ensemble Learning Approach to Improving Prediction of Case Duration for Spine Surgery: Algorithm Development and Validation JO - JMIR Perioper Med SP - e39650 VL - 6 KW - ensemble learning KW - machine learning KW - spine surgery KW - case duration KW - prediction accuracy KW - operating room efficiency KW - learning KW - surgery KW - spine KW - operating room KW - case KW - model KW - patient KW - surgeon KW - linear regression KW - accuracy KW - estimation KW - time N2 - Background: Estimating surgical case duration accurately is an important operating room efficiency metric. Current predictive techniques in spine surgery include less sophisticated approaches such as classical multivariable statistical models. Machine learning approaches have been used to predict outcomes such as length of stay and time returning to normal work, but have not been focused on case duration. Objective: The primary objective of this 4-year, single-academic-center, retrospective study was to use an ensemble learning approach that may improve the accuracy of scheduled case duration for spine surgery. 
The primary outcome measure was case duration. Methods: We compared machine learning models using surgical and patient features to our institutional method, which used historic averages and surgeon adjustments as needed. We implemented multivariable linear regression, random forest, bagging, and XGBoost (Extreme Gradient Boosting) and calculated the average R2, root-mean-square error (RMSE), explained variance, and mean absolute error (MAE) using k-fold cross-validation. We then used the SHAP (Shapley Additive Explanations) explainer model to determine feature importance. Results: A total of 3189 patients who underwent spine surgery were included. The institution's current method of predicting case times had a very poor coefficient of determination with actual times (R2=0.213). On k-fold cross-validation, the linear regression model had an explained variance score of 0.345, an R2 of 0.34, an RMSE of 162.84 minutes, and an MAE of 127.22 minutes. Among all models, the XGBoost regressor performed the best with an explained variance score of 0.778, an R2 of 0.770, an RMSE of 92.95 minutes, and an MAE of 44.31 minutes. Based on SHAP analysis of the XGBoost regression, body mass index, spinal fusions, surgical procedure, and number of spine levels involved were the features with the most impact on the model. Conclusions: Using ensemble learning-based predictive models, specifically XGBoost regression, can improve the accuracy of the estimation of spine surgery times. 
UR - https://periop.jmir.org/2023/1/e39650 UR - http://dx.doi.org/10.2196/39650 UR - http://www.ncbi.nlm.nih.gov/pubmed/36701181 ID - info:doi/10.2196/39650 ER - TY - JOUR AU - Jozsa, Felix AU - Baker, Rose AU - Kelly, Peter AU - Ahmed, Muneer AU - Douek, Michael PY - 2022/11/15 TI - The Use of Machine Learning to Reduce Overtreatment of the Axilla in Breast Cancer: Retrospective Cohort Study JO - JMIR Perioper Med SP - e34600 VL - 5 IS - 1 KW - breast cancer KW - preoperative screening KW - machine learning KW - artificial intelligence KW - artificial neural network KW - breast KW - cancer KW - axillary node KW - metastasis KW - metastatic KW - preoperative KW - axillary clearance KW - metastases KW - oncology N2 - Background: Patients with early breast cancer undergoing primary surgery, who have low axillary nodal burden, can safely forego axillary node clearance (ANC). However, routine use of axillary ultrasound (AUS) leads to 43% of patients in this group having ANC unnecessarily, following a positive AUS. The intersection of machine learning with medicine can provide innovative ways to understand specific risks within large patient data sets, but this has not yet been trialed in the arena of axillary node management in breast cancer. Objective: The objective of this study was to assess if machine learning techniques could be used to improve preoperative identification of patients with low and high axillary metastatic burden. Methods: A single-center retrospective analysis was performed on patients with breast cancer who had a preoperative AUS, and the specificity and sensitivity of AUS were calculated. Standard statistical methods and machine learning methods, including artificial neural network, naive Bayes, support vector machine, and random forest, were applied to the data to see if they could improve the accuracy of preoperative AUS to better discern high and low axillary burden. 
Results: The study included 459 patients; 142 (31%) had a positive AUS; among this group, 88 (62%) had 2 or fewer macrometastatic nodes at ANC. Logistic regression outperformed AUS (specificity 0.950 vs 0.809). Of all the methods, the artificial neural network had the highest accuracy (0.919). Interestingly, AUS had the highest sensitivity of all methods (0.777), underlining its utility in this setting. Conclusions: We demonstrated that machine learning improves identification of the important subgroup of patients with no palpable axillary disease, positive ultrasound, and more than 2 metastatically involved nodes. A negative ultrasound in patients with no palpable lymphadenopathy is highly indicative of low axillary burden, and it is unclear whether sentinel node biopsy adds value in this situation. Further studies with larger patient numbers focusing on specific breast cancer subgroups are required to refine these techniques in this setting. UR - https://periop.jmir.org/2022/1/e34600 UR - http://dx.doi.org/10.2196/34600 UR - http://www.ncbi.nlm.nih.gov/pubmed/36378516 ID - info:doi/10.2196/34600 ER - TY - JOUR AU - Bardia, Amit AU - Deshpande, Ranjit AU - Michel, George AU - Yanez, David AU - Dai, Feng AU - Pace, L. Nathan AU - Schuster, Kevin AU - Mathis, R. Michael AU - Kheterpal, Sachin AU - Schonberger, B. 
Robert PY - 2022/10/5 TI - Demonstration and Performance Evaluation of Two Novel Algorithms for Removing Artifacts From Automated Intraoperative Temperature Data Sets: Multicenter, Observational, Retrospective Study JO - JMIR Perioper Med SP - e37174 VL - 5 IS - 1 KW - temperature KW - intraoperative KW - artifacts KW - algorithms KW - perioperative KW - surgery KW - temperature probe KW - artifact reduction KW - data acquisition KW - accuracy N2 - Background: The automated acquisition of intraoperative patient temperature data via temperature probes leads to the possibility of producing a number of artifacts related to probe positioning that may impact these probes' utility for observational research. Objective: We sought to compare the performance of two de novo algorithms for filtering such artifacts. Methods: In this observational retrospective study, the intraoperative temperature data of adults who received general anesthesia for noncardiac surgery were extracted from the Multicenter Perioperative Outcomes Group registry. Two algorithms were developed and then compared to the reference standard: anesthesiologists' manual artifact detection process. Algorithm 1 (a slope-based algorithm) was based on the linear curve fit of 3 adjacent temperature data points. Algorithm 2 (an interval-based algorithm) assessed for time gaps between contiguous temperature recordings. Sensitivity and specificity values for artifact detection were calculated for each algorithm, as were mean temperatures and areas under the curve for hypothermia (temperatures below 36 °C) for each patient, after artifact removal via each methodology. Results: A total of 27,683 temperature readings from 200 anesthetic records were analyzed. The overall agreement among the anesthesiologists was 92.1%. 
Both algorithms had high specificity but moderate sensitivity (specificity: 99.02% for algorithm 1 vs 99.54% for algorithm 2; sensitivity: 49.13% for algorithm 1 vs 37.72% for algorithm 2; F-score: 0.65 for algorithm 1 vs 0.55 for algorithm 2). The areas under the curve for time × hypothermic temperature and the mean temperatures recorded for each case after artifact removal were similar between the algorithms and the anesthesiologists. Conclusions: The tested algorithms provide an automated way to filter intraoperative temperature artifacts that closely approximates manual sorting by anesthesiologists. Our study provides evidence demonstrating the efficacy of highly generalizable artifact reduction algorithms that can be readily used by observational studies that rely on automated intraoperative data acquisition. UR - https://periop.jmir.org/2022/1/e37174 UR - http://dx.doi.org/10.2196/37174 UR - http://www.ncbi.nlm.nih.gov/pubmed/36197702 ID - info:doi/10.2196/37174 ER - TY - JOUR AU - McLeod, Graeme AU - Kennedy, Iain AU - Simpson, Eilidh AU - Joss, Judith AU - Goldmann, Katriona PY - 2022/3/30 TI - Pilot Project for a Web-Based Dynamic Nomogram to Predict Survival 1 Year After Hip Fracture Surgery: Retrospective Observational Study JO - Interact J Med Res SP - e34096 VL - 11 IS - 1 KW - hip fracture KW - survival KW - prediction KW - nomogram KW - web KW - surgery KW - postoperative KW - machine learning KW - model KW - mortality KW - hip KW - fracture N2 - Background: Hip fracture is associated with high mortality. Identification of individual risk informs anesthetic and surgical decision-making and can reduce the risk of death. However, interpreting mathematical models and applying them in clinical practice can be difficult. There is a need to simplify risk indices for clinicians and laypeople alike. Objective: Our primary objective was to develop a web-based nomogram for prediction of survival up to 365 days after hip fracture surgery. 
Methods: We collected data from 329 patients. Our variables included sex; age; BMI; white cell count; levels of lactate, creatinine, hemoglobin, and C-reactive protein; physical status according to the American Society of Anesthesiologists Physical Status Classification System; socioeconomic status; duration of surgery; total time in the operating room; side of surgery; and procedure urgency. Thereafter, we internally calibrated and validated a Cox proportional hazards model of survival 365 days after hip fracture surgery; logistic regression models of survival 30, 120, and 365 days after surgery; and a binomial model. To present the models on a laptop, tablet, or mobile phone in a user-friendly way, we built an app using Shiny (RStudio). The app showed a drop-down box for model selection and horizontal sliders for data entry, model summaries, and prediction and survival plots. A slider represented patient follow-up over 365 days. Results: Of the 329 patients, 24 (7.3%) died within 30 days of surgery, 65 (19.8%) within 120 days, and 94 (28.6%) within 365 days. In all models, the independent predictors of mortality were age, BMI, creatinine level, and lactate level. The logistic model also incorporated white cell count as a predictor. The Cox proportional hazards model showed that mortality differed as follows: age 80 vs 60 years had a hazard ratio (HR) of 0.6 (95% CI 0.3-1.1), a plasma lactate level of 2 vs 1 mmol/L had an HR of 2.4 (95% CI 1.5-3.9), and a plasma creatinine level of 60 vs 90 μmol/L had an HR of 2.3 (95% CI 1.3-3.9). Conclusions: In conclusion, we provide an easy-to-read web-based nomogram that predicts survival up to 365 days after hip fracture. The Cox proportional hazards model and logistic models showed good discrimination, with concordance index values of 0.732 and 0.781, respectively. 
UR - https://www.i-jmr.org/2022/1/e34096 UR - http://dx.doi.org/10.2196/34096 UR - http://www.ncbi.nlm.nih.gov/pubmed/35238320 ID - info:doi/10.2196/34096 ER - TY - JOUR AU - Shin, Jeong Seo AU - Park, Jungchan AU - Lee, Seung-Hwa AU - Yang, Kwangmo AU - Park, Woong Rae PY - 2021/10/14 TI - Predictability of Mortality in Patients With Myocardial Injury After Noncardiac Surgery Based on Perioperative Factors via Machine Learning: Retrospective Study JO - JMIR Med Inform SP - e32771 VL - 9 IS - 10 KW - myocardial injury after noncardiac surgery KW - high-sensitivity cardiac troponin KW - machine learning KW - extreme gradient boosting N2 - Background: Myocardial injury after noncardiac surgery (MINS) is associated with increased postoperative mortality, but the relevant perioperative factors that contribute to the mortality of patients with MINS have not been fully evaluated. Objective: To establish a comprehensive body of knowledge relating to patients with MINS, we researched the best performing predictive model based on machine learning algorithms. Methods: Using clinical data from 7629 patients with MINS from the clinical data warehouse, we evaluated 8 machine learning algorithms for accuracy, precision, recall, F1 score, area under the receiver operating characteristic (AUROC) curve, and area under the precision-recall curve to investigate the best model for predicting mortality. Feature importance and Shapley Additive Explanations values were analyzed to explain the role of each clinical factor in patients with MINS. Results: Extreme gradient boosting outperformed the other models. The model showed an AUROC of 0.923 (95% CI 0.916-0.930). The AUROC of the model did not decrease in the test data set (0.894, 95% CI 0.86-0.922; P=.06). Antiplatelet drugs prescription, elevated C-reactive protein level, and beta blocker prescription were associated with reduced 30-day mortality. 
Conclusions: Predicting the mortality of patients with MINS was shown to be feasible using machine learning. By analyzing the impact of predictors, markers that should be cautiously monitored by clinicians may be identified. UR - https://medinform.jmir.org/2021/10/e32771 UR - http://dx.doi.org/10.2196/32771 UR - http://www.ncbi.nlm.nih.gov/pubmed/34647900 ID - info:doi/10.2196/32771 ER - TY - JOUR AU - Conway, Aaron AU - Jungquist, R. Carla AU - Chang, Kristina AU - Kamboj, Navpreet AU - Sutherland, Joanna AU - Mafeld, Sebastian AU - Parotto, Matteo PY - 2021/10/5 TI - Predicting Prolonged Apnea During Nurse-Administered Procedural Sedation: Machine Learning Study JO - JMIR Perioper Med SP - e29200 VL - 4 IS - 2 KW - procedural sedation and analgesia KW - conscious sedation KW - nursing KW - informatics KW - patient safety KW - machine learning KW - capnography KW - anesthesia KW - anaesthesia KW - medical informatics KW - sleep apnea KW - apnea KW - apnoea KW - sedation N2 - Background: Capnography is commonly used for nurse-administered procedural sedation. Distinguishing between capnography waveform abnormalities that signal the need for clinical intervention for an event and those that do not indicate the need for intervention is essential for the successful implementation of this technology into practice. It is possible that capnography alarm management may be improved by using machine learning to create a "smart alarm" that can alert clinicians to apneic events that are predicted to be prolonged. Objective: To determine the accuracy of machine learning models for predicting at the 15-second time point if apnea will be prolonged (ie, apnea that persists for >30 seconds). Methods: A secondary analysis of an observational study was conducted. 
We selected several candidate models to evaluate, including a random forest model, generalized linear model (logistic regression), least absolute shrinkage and selection operator regression, ridge regression, and the XGBoost model. Out-of-sample accuracy of the models was calculated using 10-fold cross-validation. The net benefit decision analytic measure was used to assist with deciding whether using the models in practice would lead to better outcomes on average than using the current default capnography alarm management strategies. The default strategies are the aggressive approach, in which an alarm is triggered after brief periods of apnea (typically 15 seconds), and the conservative approach, in which an alarm is triggered for only prolonged periods of apnea (typically >30 seconds). Results: A total of 384 apneic events longer than 15 seconds were observed in 61 of the 102 patients (59.8%) who participated in the observational study. Nearly half of the apneic events (180/384, 46.9%) were prolonged. The random forest model performed the best in terms of discrimination (area under the receiver operating characteristic curve 0.66) and calibration. The net benefit associated with the random forest model exceeded that associated with the aggressive strategy but was lower than that associated with the conservative strategy. Conclusions: Decision curve analysis indicated that using a random forest model would lead to a better outcome for capnography alarm management than using an aggressive strategy in which alarms are triggered after 15 seconds of apnea. The model would not be superior to the conservative strategy in which alarms are only triggered after 30 seconds. 
UR - https://periop.jmir.org/2021/2/e29200 UR - http://dx.doi.org/10.2196/29200 UR - http://www.ncbi.nlm.nih.gov/pubmed/34609322 ID - info:doi/10.2196/29200 ER - TY - JOUR AU - Choe, Sooho AU - Park, Eunjeong AU - Shin, Wooseok AU - Koo, Bonah AU - Shin, Dongjin AU - Jung, Chulwoo AU - Lee, Hyungchul AU - Kim, Jeongmin PY - 2021/9/30 TI - Short-Term Event Prediction in the Operating Room (STEP-OP) of Five-Minute Intraoperative Hypotension Using Hybrid Deep Learning: Retrospective Observational Study and Model Development JO - JMIR Med Inform SP - e31311 VL - 9 IS - 9 KW - arterial pressure KW - artificial intelligence KW - biosignals KW - deep learning KW - hypotension KW - machine learning N2 - Background: Intraoperative hypotension has an adverse impact on postoperative outcomes. However, it is difficult to predict and treat intraoperative hypotension in advance according to individual clinical parameters. Objective: The aim of this study was to develop a prediction model to forecast 5-minute intraoperative hypotension based on the weighted average ensemble of individual neural networks, utilizing the biosignals recorded during noncardiac surgery. Methods: In this retrospective observational study, arterial waveforms were recorded during noncardiac operations performed between August 2016 and December 2019, at Seoul National University Hospital, Seoul, South Korea. We analyzed the arterial waveforms from the big data in the VitalDB repository of electronic health records. We defined 2s hypotension as the moving average of arterial pressure under 65 mmHg for 2 seconds, and intraoperative hypotensive events were defined when the 2s hypotension lasted for at least 60 seconds. We developed an artificial intelligence-enabled process, named short-term event prediction in the operating room (STEP-OP), for predicting short-term intraoperative hypotension. Results: The study was performed on 18,813 subjects undergoing noncardiac surgeries. 
Deep-learning algorithms (convolutional neural network [CNN] and recurrent neural network [RNN]) using raw waveforms as input achieved greater area under the precision-recall curve (AUPRC) scores (0.698, 95% CI 0.690-0.705 and 0.706, 95% CI 0.698-0.715, respectively) than the logistic regression algorithm (0.673, 95% CI 0.665-0.682). STEP-OP performed best, with a greater AUPRC (0.716, 95% CI 0.708-0.723) than either the RNN or CNN algorithm. Conclusions: We developed STEP-OP as a weighted average of deep-learning models. STEP-OP predicts intraoperative hypotension more accurately than the CNN, RNN, and logistic regression models. Trial Registration: ClinicalTrials.gov NCT02914444; https://clinicaltrials.gov/ct2/show/NCT02914444. UR - https://medinform.jmir.org/2021/9/e31311 UR - http://dx.doi.org/10.2196/31311 UR - http://www.ncbi.nlm.nih.gov/pubmed/34591024 ID - info:doi/10.2196/31311 ER - TY - JOUR AU - Naqvi, Ali Syed Asil AU - Tennankore, Karthik AU - Vinson, Amanda AU - Roy, C. Patrice AU - Abidi, Raza Syed Sibte PY - 2021/8/27 TI - Predicting Kidney Graft Survival Using Machine Learning Methods: Prediction Model Development and Feature Significance Analysis Study JO - J Med Internet Res SP - e26843 VL - 23 IS - 8 KW - kidney transplantation KW - machine learning KW - predictive modeling KW - survival prediction KW - dimensionality reduction KW - feature sensitivity analysis N2 - Background: Kidney transplantation is the optimal treatment for patients with end-stage renal disease. Short- and long-term kidney graft survival is influenced by a number of donor and recipient factors. Predicting the success of kidney transplantation is important for optimizing kidney allocation. Objective: The aim of this study was to predict the risk of kidney graft failure across three temporal cohorts (within 1 year, within 5 years, and after 5 years following a transplant) based on donor and recipient characteristics.
We analyzed a large data set comprising over 50,000 kidney transplants covering an approximate 20-year period. Methods: We applied machine learning-based classification algorithms to develop prediction models for the risk of graft failure for three different temporal cohorts. Deep learning-based autoencoders were applied for data dimensionality reduction, which improved the prediction performance. The influence of features on graft survival for each cohort was studied using a new nonoverlapping patient stratification approach. Results: Our models predicted graft survival with area under the curve scores of 82% within 1 year, 69% within 5 years, and 81% within 17 years. The feature importance analysis elucidated the varying influence of clinical features on graft survival across the three different temporal cohorts. Conclusions: In this study, we applied machine learning to develop risk prediction models for graft failure that demonstrated a high level of prediction performance. Given that these models performed better than existing risk prediction tools reported in the literature, future studies will focus on how best to incorporate these prediction models into clinical care algorithms to optimize the long-term health of kidney recipients.
UR - https://www.jmir.org/2021/8/e26843 UR - http://dx.doi.org/10.2196/26843 UR - http://www.ncbi.nlm.nih.gov/pubmed/34448704 ID - info:doi/10.2196/26843 ER - TY - JOUR AU - Cao, Yang AU - Näslund, Ingmar AU - Näslund, Erik AU - Ottosson, Johan AU - Montgomery, Scott AU - Stenberg, Erik PY - 2021/8/19 TI - Using a Convolutional Neural Network to Predict Remission of Diabetes After Gastric Bypass Surgery: Machine Learning Study From the Scandinavian Obesity Surgery Register JO - JMIR Med Inform SP - e25612 VL - 9 IS - 8 KW - forecasting KW - clinical decision rules KW - remission induction KW - type 2 diabetes mellitus KW - gastric bypass KW - morbid obesity N2 - Background: Prediction of diabetes remission is an important topic in the evaluation of patients with type 2 diabetes (T2D) before bariatric surgery. Several high-quality predictive indices are available, but artificial intelligence algorithms offer the potential for higher predictive capability. Objective: This study aimed to construct and validate an artificial intelligence prediction model for diabetes remission after Roux-en-Y gastric bypass surgery. Methods: Patients who underwent surgery from 2007 to 2017 were included in the study, with collection of individual data from the Scandinavian Obesity Surgery Registry (SOReg), the Swedish National Patients Register, the Swedish Prescribed Drugs Register, and Statistics Sweden. A 7-layer convolutional neural network (CNN) model was developed using 80% (6446/8057) of patients randomly selected from SOReg and the remaining 20% (1611/8057) for external testing. The predictive capabilities of the CNN model and currently used scores (DiaRem, Ad-DiaRem, DiaBetter, and individualized metabolic surgery) were compared. Results: In total, 8057 patients with T2D were included in the study. At 2 years after surgery, 77.09% achieved pharmacological remission (n=6211), while 63.07% (4004/6348) achieved complete remission.
The CNN model showed high accuracy in predicting cessation of antidiabetic drugs and complete remission of T2D after gastric bypass surgery. The area under the receiver operating characteristic curve (AUC) for the CNN model for pharmacological remission was 0.85 (95% CI 0.83-0.86) during validation and 0.83 for the final test, which was 9%-12% better than the traditional predictive indices. The AUC for complete remission was 0.83 (95% CI 0.81-0.85) during validation and 0.82 for the final test, which was 9%-11% better than the traditional predictive indices. Conclusions: The CNN method had better predictive capability compared to traditional indices for diabetes remission. However, further validation is needed in other countries to evaluate its external generalizability. UR - https://medinform.jmir.org/2021/8/e25612 UR - http://dx.doi.org/10.2196/25612 UR - http://www.ncbi.nlm.nih.gov/pubmed/34420921 ID - info:doi/10.2196/25612 ER - TY - JOUR AU - de Pennington, Nick AU - Mole, Guy AU - Lim, Ernest AU - Milne-Ives, Madison AU - Normando, Eduardo AU - Xue, Kanmin AU - Meinert, Edward PY - 2021/7/28 TI - Safety and Acceptability of a Natural Language Artificial Intelligence Assistant to Deliver Clinical Follow-up to Cataract Surgery Patients: Proposal JO - JMIR Res Protoc SP - e27227 VL - 10 IS - 7 KW - artificial intelligence KW - natural language processing KW - telemedicine KW - cataract KW - aftercare KW - speech recognition software KW - medical informatics KW - health services KW - health communication KW - delivery of health care KW - patient acceptance of health care KW - mental health KW - cell phone KW - internet KW - conversational agent KW - chatbot KW - expert systems KW - dialogue system KW - relational agent N2 - Background: Due to an aging population, the demand for many services is exceeding the capacity of the clinical workforce.
As a result, staff are facing a crisis of burnout from being pressured to deliver high-volume workloads, driving increasing costs for providers. Artificial intelligence (AI), in the form of conversational agents, presents a possible opportunity to enable efficiency in the delivery of care. Objective: This study aims to evaluate the effectiveness, usability, and acceptability of the Dora agent, Ufonia's autonomous voice conversational agent: an AI-enabled autonomous telemedicine call for the detection of postoperative cataract surgery patients who require further assessment. The objectives of this study are to establish Dora's efficacy in comparison with an expert clinician, determine baseline sensitivity and specificity for the detection of true complications, evaluate patient acceptability, collect evidence for cost-effectiveness, and capture data to support further development and evaluation. Methods: Using an implementation science construct, the interdisciplinary study will be a mixed methods phase 1 pilot establishing interobserver reliability of the system, usability, and acceptability. This will be done using the following scales and frameworks: the system usability scale; the assessment of Health Information Technology Interventions in Evidence-Based Medicine Evaluation Framework; the telehealth usability questionnaire; and the Non-Adoption, Abandonment, and Challenges to the Scale-up, Spread and Suitability framework. Results: The evaluation is expected to show that conversational technology can be used to conduct an accurate assessment and that it is acceptable to different populations with different backgrounds. In addition, the results will demonstrate how successfully the system can be delivered in organizations with different clinical pathways and how it can be integrated with their existing platforms.
Conclusions: The project's key contributions will be evidence of the effectiveness of AI voice conversational agents and their associated usability and acceptability. International Registered Report Identifier (IRRID): PRR1-10.2196/27227 UR - https://www.researchprotocols.org/2021/7/e27227 UR - http://dx.doi.org/10.2196/27227 UR - http://www.ncbi.nlm.nih.gov/pubmed/34319248 ID - info:doi/10.2196/27227 ER - TY - JOUR AU - Joo, Hyeon AU - Burns, Michael AU - Kalidaikurichi Lakshmanan, Saradha Sai AU - Hu, Yaokun AU - Vydiswaran, Vinod V. G. PY - 2021/5/26 TI - Neural Machine Translation-Based Automated Current Procedural Terminology Classification System Using Procedure Text: Development and Validation Study JO - JMIR Form Res SP - e22461 VL - 5 IS - 5 KW - CPT classification KW - natural language processing KW - machine learning KW - neural machine translation N2 - Background: Administrative costs for billing and insurance-related activities in the United States are substantial. One critical cause of the high overhead of administrative costs is medical billing errors. With advanced deep learning techniques, developing advanced models to predict hospital and professional billing codes has become feasible. These models can be used for administrative cost reduction and billing process improvements. Objective: In this study, we aim to develop an automated anesthesiology current procedural terminology (CPT) prediction system that translates manually entered surgical procedure text into standard forms using neural machine translation (NMT) techniques. The standard forms are calculated using similarity scores to predict the most appropriate CPT codes. This system aims to enhance medical billing coding accuracy and thereby reduce administrative costs; we also compare its performance with that of previously developed machine learning algorithms.
Methods: We collected and analyzed all operative procedures performed at Michigan Medicine between January 2017 and June 2019 (2.5 years). The first 2 years of data were used to train and validate the existing models and to compare their results with those of the NMT-based model. Data from 2019 (6-month follow-up period) were then used to measure the accuracy of the CPT code prediction. Three experimental settings were designed with different data types to evaluate the models. Experiment 1 used the surgical procedure text entered manually in the electronic health record. Experiment 2 used preprocessed procedure text. Experiment 3 used the preprocessed combination of procedure text and preoperative diagnoses. The NMT-based model was compared with the support vector machine (SVM) and long short-term memory (LSTM) models. Results: The NMT model yielded the highest top-1 accuracy in experiments 1 and 2, at 81.64% and 81.71%, respectively, compared with the SVM model (81.19% and 81.27%, respectively) and the LSTM model (80.96% and 81.07%, respectively). The SVM model yielded the highest top-1 accuracy of 84.30% in experiment 3, followed by the LSTM model (83.70%) and the NMT model (82.80%). In experiment 3, the addition of preoperative diagnoses increased top-1 accuracy by 3.7%, 3.2%, and 1.3% for the SVM, LSTM, and NMT models, respectively, over experiment 2. For top-3 accuracy, the SVM, LSTM, and NMT models achieved 95.64%, 95.72%, and 95.60%, respectively, in experiment 1; 95.75%, 95.67%, and 95.69% in experiment 2; and 95.88%, 95.93%, and 95.06% in experiment 3. Conclusions: This study demonstrates the feasibility of creating an automated anesthesiology CPT classification system based on NMT techniques using surgical procedure text and preoperative diagnosis. Our results show that the performance of the NMT-based CPT prediction system is equivalent to that of the SVM and LSTM prediction models.
Importantly, we found that including preoperative diagnoses improved accuracy compared with using the procedure text alone. UR - https://formative.jmir.org/2021/5/e22461 UR - http://dx.doi.org/10.2196/22461 UR - http://www.ncbi.nlm.nih.gov/pubmed/34037526 ID - info:doi/10.2196/22461 ER - TY - JOUR AU - Chen, Zhipeng AU - Zeng, D. Daniel AU - Seltzer, N. Ryan G. AU - Hamilton, D. Blake PY - 2021/5/11 TI - Automated Generation of Personalized Shock Wave Lithotripsy Protocols: Treatment Planning Using Deep Learning JO - JMIR Med Inform SP - e24721 VL - 9 IS - 5 KW - nephrolithiasis KW - extracorporeal shock wave therapy KW - lithotripsy KW - treatment planning KW - deep learning KW - artificial intelligence N2 - Background: Though shock wave lithotripsy (SWL) has developed into one of the most common treatment approaches for nephrolithiasis in recent decades, its treatment planning is often a trial-and-error process based on physicians' subjective judgement. Physicians' inexperience with this modality can lead to low-quality treatment and unnecessary risks to patients. Objective: To improve the quality and consistency of shock wave lithotripsy treatment, we aimed to develop a deep learning model for generating the next treatment step from previous steps and preoperative patient characteristics and to produce personalized SWL treatment plans in a step-by-step protocol based on the deep learning model. Methods: We developed a deep learning model to generate the optimal power level, shock rate, and number of shocks in the next step, given previous treatment steps encoded by long short-term memory neural networks and preoperative patient characteristics. We constructed a next-step data set (N=8583) from top practices of renal SWL treatments recorded in the International Stone Registry.
Then, we trained the deep learning model and baseline models (linear regression, logistic regression, random forest, and support vector machine) with 90% of the samples and validated them with the remaining samples. Results: The deep learning models for generating the next treatment steps outperformed the baseline models (accuracy = 98.8%, F1 = 98.0% for power levels; accuracy = 98.1%, F1 = 96.0% for shock rates; root mean squared error = 207, mean absolute error = 121 for numbers of shocks). Hypothesis testing showed no significant difference between steps generated by our model and the top practices (P=.480 for power levels; P=.782 for shock rates; P=.727 for numbers of shocks). Conclusions: The high performance of our deep learning approach demonstrates treatment planning capability on par with that of top physicians. To the best of our knowledge, our framework is the first effort to implement automated planning of SWL treatment via deep learning. It is a promising technique for assisting treatment planning and physician training at low cost. UR - https://medinform.jmir.org/2021/5/e24721 UR - http://dx.doi.org/10.2196/24721 UR - http://www.ncbi.nlm.nih.gov/pubmed/33973862 ID - info:doi/10.2196/24721 ER -