Alternatives and detailed information of sepsisPrediction

Google Summer of Code - CBMI@UTHSC

Early Sepsis Prediction using Machine Learning

Introduction
Modules
Code Description
GSoC Experience
Conclusion
Team
License

Introduction

This project aims to provide improved solution to the medical world, where millions of people die due to Sepsis, a fatal disease where the patient has dyregulated response to infection. Since sepsis is time-sensitive, it quickly escalates to multiorgan failures, that greatly increases the risk of death. Here we try to accurately predict the occurence of sepsis, hours before it actaully occurs. This will provide doctors to take contingency actions early, and will decrease mortality rates significantly.

This project is based off the eICU database, managed by physionet. Critically ill patients are admitted to the ICU where they receive complex and time-sensitive care from a wide array of clinical staff. Electronic measuring devices are attached to them that produce data at regular intervals. This data, from multiple hospitals was assimilated into the eICU database.
The vitals from the patients were measured every 5 minutes. Such a frequency is ideal because reduced frequency does not allow us to get a deep insight into the patient's condition, and consequentially, the models are not accurate enough.

In this project, we apply multiple machine learning methods to generate descriptive features that are clinically meaningful and predict the onset of sepsis.

NOTE: For the database features, please go through the documentation of the eICU database here: https://eicu-crd.mit.edu/about/eicu/

Modules

Extracting relevant data

lab.csv was used to extract the lab values.
nurseCharting.csv was used to extract the GCS scores as well as the MAP and ventilator details.
infusionDrug.csv was used to extract all relevant vasopressors like Norepinephrine, Dopamine etc.
vitalPeriodic.csv was were all the vitals for the patients were recorded in a frequency of 5 minutes.
The IV antibiotics data has been collected from the medication.csv table for each registered patient, while the fluid samples data was taken from the microlab.csv
Apart from the essential parameters needed for SOFA score calculation, we have also included a number of different variables to the final training data to check how they influence the model as will be shown in the feature importance curve. Some of them are:
- calcium
- glucose
- lactate
- magnesium
- Phosphate
- potassium

SOFA Calculations

Here is a small code snippet of one of the parts of SOFA calculation:

    labs_withO2.loc[(labs_withO2['total_bilirubin'] <1.2), 'SOFA_Liver'] = 0
    labs_withO2.loc[(labs_withO2['total_bilirubin'] >=1.2) & (labs_withO2['total_bilirubin'] <=1.9), 'SOFA_Liver'] = 1
    labs_withO2.loc[(labs_withO2['total_bilirubin'] >=2) & (labs_withO2['total_bilirubin'] <=5.9), 'SOFA_Liver'] = 2
    labs_withO2.loc[(labs_withO2['total_bilirubin'] >=6) & (labs_withO2['total_bilirubin'] <=11.9), 'SOFA_Liver'] = 3
    labs_withO2.loc[(labs_withO2['total_bilirubin'] >12), 'SOFA_Liver'] = 4

    labs_withO2.loc[(labs_withO2['paO2_FiO2'] >=400), 'SOFA_Respiration'] = 0
    labs_withO2.loc[(labs_withO2['paO2_FiO2'] <400), 'SOFA_Respiration'] = 1
    labs_withO2.loc[(labs_withO2['paO2_FiO2'] <300), 'SOFA_Respiration'] = 2
    labs_withO2.loc[((labs_withO2['paO2_FiO2'] <200) & (labs_withO2['nursingchartvalue'] =='ventilator')), 'SOFA_Respiration'] = 3
    labs_withO2.loc[((labs_withO2['paO2_FiO2'] <100) & (labs_withO2['nursingchartvalue'] =='ventilator')), 'SOFA_Respiration'] = 4

    labs_withO2.loc[((labs_withO2['creatinine'] >=0) & (labs_withO2['creatinine'] <=1.1)), 'SOFA_Renal'] = 0
    labs_withO2.loc[((labs_withO2['creatinine'] >=1.2) & (labs_withO2['creatinine'] <=1.9)), 'SOFA_Renal'] = 1
    labs_withO2.loc[((labs_withO2['creatinine'] >=2) & (labs_withO2['creatinine'] <=3.4)), 'SOFA_Renal'] = 2
    labs_withO2.loc[((labs_withO2['creatinine'] >=3.5) & (labs_withO2['creatinine'] <=4.9)) | (labs_withO2['urinary_creatinine'] <200), 'SOFA_Renal'] = 3
    labs_withO2.loc[(labs_withO2['creatinine'] >5) | (labs_withO2['urinary_creatinine'] <200), 'SOFA_Renal'] = 4

Feature Extraction

Standard Deviation
Kurtosis
Skewness
Mean
Minimum
Maximum
RMS_Difference

Model Development (XGBoost and others)

Five-fold cross-validation model was developed using XGBClassifier. The area under the ROC curve (AUROC) is a function of prediction window. The AUROC for the training set was higher than the testing set. The average testing AUROC at 2 hours prior to the sepsis onset was 0.86. However, the AUROC decreases as we move away from the time of sepsis onset.

The average testing cross-validated recall and precision for predicting sepsis class are 73%, and 84%, respectively, 2 hours before the sepsis onset. Whereas, the overall F1-Score was 79.5%. The following provides the precision, recall and F1 score for each of the time intervals before the sepsis onset.

Here we compare the XGBoost F1-Score with the other machine learning methods (RF: Random Forest; LR: Logistic Regression; GNB: Gaussian Naïve Bayes).

NOTE: All of the model statistics are exclusive to the eICU database. A new database might produce different results, better or worse. Hyper-parameterization will be required.

Code Description

NOTE: Before using any of the functions listed in this project, make sure the data is formatted according the eICU schema. Only then, will it work as intended.

antibiotics.py

get_antibiotics()

gcs_extract.py

extract_GCS_withSOFA()

extract_GCS()

extract_MAP()

extract_VENT()

labs_extract.py

extract_lab_format()

calc_lab_sofa()

vasopressor_extract.py

extract_drugrates()

incorporate_weights()

add_separate_cols()

calc_SOFA()

sepsis_calc.py

calc_tsepsis()

merge_final_table.py

merge_final()

sepsisprediction.py

feature_fun()

process()

case_preprocess()

control_preprocess()

get_controls()

run_xgboost()

GSoC Experience

Conclusion

Team

Ronet Swaminathan
[email protected]

Author

Aditya Singh
[email protected]

Author

Dr. Akram Mohammed
[email protected]

Mentor, Maintainer

Dr. Rishikesan Kamaleswaran
[email protected]

Mentor

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

cbmi-uthsc / sepsisPrediction

Programming Languages

Google Summer of Code - CBMI@UTHSC

Early Sepsis Prediction using Machine Learning

Table of Contents

Introduction

Modules

Extracting relevant data

SOFA Calculations

Feature Extraction

Model Development (XGBoost and others)

Code Description

GSoC Experience

Conclusion

Team

License