Development and internal validation of an AI-based emergency triage model for predicting critical outcomes in emergency department

This study demonstrates the potential of machine learning models to predict ICU admission from the ED more accurately than the conventional CTAS triage system. Our top-performing model, XGBoost, showed superior discrimination, a finding consistent with other studies that highlight the ability of tree-based models to capture complex, non-linear interactions among predictors20,23. The improvement in AUROC (0.917 vs. 0.882) is noteworthy, but the improvement in AUPRC (0.629 vs. 0.333) is more compelling. AUPRC is more informative than AUROC under class imbalance, as in ICU prediction, because it summarizes precision across recall levels and is not inflated by the large number of true negatives. The gain therefore suggests that patients flagged as high-risk by our model are considerably more likely to truly require ICU admission, that is, the triage assessment achieves a higher positive predictive value at a given sensitivity. The step-like pattern observed in the CTAS ROC curve (Fig. 2) reflects its nature as a categorical scale with five discrete levels, which inherently limits its ability to provide nuanced, continuous risk stratification compared to the machine learning models.
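The two metrics and the step-like ROC curve can be illustrated with a minimal pure-Python sketch. The ten toy patients, their scores, and the function names below are hypothetical and are not the study's data or code; average precision is used here as one common estimator of AUPRC. A five-level scale yields at most five thresholds, so its ROC curve has at most six points, whereas a continuous score yields one threshold per distinct value. Note also that a random classifier scores about 0.5 on AUROC but only about the outcome prevalence on AUPRC.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points: the origin plus one point per distinct threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auroc(y_true: List[int], scores: List[float]) -> float:
    """Trapezoidal area under the ROC points."""
    pts = roc_points(y_true, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve."""
    pos = sum(y_true)
    ap, prev_recall = 0.0, 0.0
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        recall = tp / pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


# Hypothetical toy cohort: 1 = ICU admission (2 of 10 patients).
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical continuous model risk scores.
risk = [0.9, 0.6, 0.7, 0.3, 0.2, 0.1, 0.15, 0.05, 0.25, 0.12]
# Hypothetical five-level triage, recoded so that higher means more urgent.
acuity = [5, 4, 4, 3, 3, 2, 2, 1, 3, 2]

n_points_continuous = len(roc_points(y, risk))    # one per distinct score, plus origin
n_points_five_level = len(roc_points(y, acuity))  # at most 5 thresholds, plus origin
print(n_points_continuous, n_points_five_level)
print(auroc(y, risk), average_precision(y, risk))
```

With so few thresholds, the five-level curve is necessarily made of a handful of straight segments, which is the step-like shape described above.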

Previous studies have shown that machine learning can improve triage outcomes using both structured and unstructured data, primarily in developed countries20,21,23,29,34. This study builds on those findings and demonstrates that such improvements are also achievable in resource-limited settings. Our approach differs from earlier work in two respects. First, we excluded pain scores because they are subjective and unreliable for determining patient acuity35. Second, we used NLP to analyze free-text chief complaints, which enabled the model to interpret textual data directly. Conventional triage systems, including ESI and CTAS, rely on chief complaints for categorization. The NLP approach can capture subtle variations in clinical presentations, allowing for a broader categorization of chief complaints. Additionally, the use of multilingual embeddings accommodates the linguistic diversity of clinical documentation in the local context, allowing the model to interpret text written in Thai with occasional English medical terms. However, reliance on free-text chief complaints introduces variability that could affect the reliability of model predictions.

Our findings also align with other advanced triage systems. For instance, the TriAge-Go system, a sophisticated software as a medical device (SaMD), also showed improved prediction over standard triage in a recent prospective evaluation36. While TriAge-Go represents a highly advanced implementation, our model demonstrates that significant improvements can also be achieved in resource-limited settings using readily available data.

These results suggest that data-driven tools could support more effective decision-making in the ED. By providing a real-time risk score, the model could flag high-risk patients for triage nurses, helping to mitigate under-triage and focus attention where it is most needed. It could also aid resource management by providing more accurate forecasts of ICU bed demand.

Limitations

Several limitations should be acknowledged. First, a significant limitation is our definition of the ground truth. The primary outcome of direct ICU admission from the ED does not account for disposition errors, such as unplanned ICU transfers (UIT) from a general ward within 24 h of admission. These cases often represent patients who were under-triaged, and their exclusion may mean our model was trained on more clearly identifiable cases of critical illness, potentially overestimating its performance. Future prospective studies should incorporate UIT to create a more robust and clinically accurate composite outcome.
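As a minimal sketch of how such a composite outcome might be labeled, the snippet below marks a visit as positive if the patient was admitted directly to the ICU or had an unplanned ICU transfer within 24 h of ward admission. The `Visit` record, its field names, and `composite_outcome` are hypothetical and not taken from the study's data schema; it also assumes that unplanned transfers have already been distinguished from planned ones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Window for unplanned ICU transfer (UIT), as described in the text: 24 h.
UIT_WINDOW = timedelta(hours=24)


@dataclass
class Visit:
    """Hypothetical ED visit record; field names are illustrative only."""
    direct_icu_admission: bool
    ward_admission_time: Optional[datetime] = None
    # Time of an UNPLANNED transfer to the ICU, if any (assumed pre-identified).
    icu_transfer_time: Optional[datetime] = None


def composite_outcome(visit: Visit) -> bool:
    """True for direct ICU admission or UIT within 24 h of ward admission."""
    if visit.direct_icu_admission:
        return True
    if visit.ward_admission_time is None or visit.icu_transfer_time is None:
        return False
    delay = visit.icu_transfer_time - visit.ward_admission_time
    return timedelta(0) <= delay <= UIT_WINDOW


t0 = datetime(2024, 1, 1, 8, 0)
early_transfer = Visit(False, t0, t0 + timedelta(hours=6))
late_transfer = Visit(False, t0, t0 + timedelta(hours=30))
print(composite_outcome(early_transfer), composite_outcome(late_transfer))
```

Under this labeling, an under-triaged patient who deteriorates on the ward within the window counts as a positive case rather than a negative one.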

Second, this study is a retrospective, single-center analysis conducted on a predominantly Asian population, which inherently limits its generalizability to other settings and demographic groups. A critical challenge for all predictive models is performance degradation upon real-world, prospective implementation. Studies of models such as the Rothman Index and the Epic Sepsis Model have shown that even models validated on large retrospective datasets can experience a significant drop in performance when deployed, often due to data drift or overfitting37,38. Therefore, our model must be considered an early-stage development, and its clinical utility can only be confirmed through rigorous external and prospective validation.

Third, the reliance on free-text chief complaints, while powerful, introduces variability. The quality and detail of documentation can differ between nurses, which could affect model reliability. Future work should prioritize improving the quality of free-text inputs and exploring standardization using systems like SNOMED CT to enhance data consistency39.

Fourth, the outcome was limited to ICU admissions. Certain conditions, such as anaphylaxis or reactive airway disease, require immediate attention but may not result in ICU admission. Conversely, conditions associated with high mortality, such as unconsciousness, may lead to death in the ED rather than admission to the ICU. Outcomes such as emergency procedures, early mortality, or ED resource utilization could provide a more comprehensive evaluation of patient acuity.

Finally, the absence of detailed patient history as a predictor may have constrained the model’s performance. Incorporating prior medical information could significantly enhance prediction accuracy and help address potential biases.
