
Predicting UTI using Machine Learning

Urinary tract infection (UTI) is a common disease with high diagnostic error rates. Because a
urine culture result is usually not available until 24–48 hours after an emergency department
(ED) visit, diagnosis and treatment decisions are based on symptoms, physical findings, and
other laboratory results, potentially leading to overutilization, antibiotic resistance, and
delayed treatment. Our aim was to train and test machine-learning-based predictive models on a
large dataset of UTI patients.

Dataset:
The data was taken from
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0194085.

This dataset was also used in the scientific paper ‘Predicting urinary tract infections in the
emergency department with machine learning’ written by R. Andrew Taylor, Christopher L.
Moore, Kei-Hoi Cheung, Cynthia Brandt.

Data Preprocessing:

The main CSV file contained information on 80,000 patients. We randomly selected 150 records
and removed unnecessary data from them.
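
The random selection step could look roughly like the sketch below. It assumes the data is loaded into a pandas DataFrame; the toy table, its column names, and the seed are hypothetical stand-ins, since the report does not say how the sample was drawn.

```python
import pandas as pd


def sample_patients(df: pd.DataFrame, n: int = 150, seed: int = 0) -> pd.DataFrame:
    """Return n rows drawn at random, without replacement."""
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# Toy stand-in for the real CSV (hypothetical columns, not the actual schema).
toy = pd.DataFrame({
    "patient_id": range(1000),
    "age": [20 + i % 60 for i in range(1000)],
})

subset = sample_patients(toy, n=150, seed=42)
```

Fixing the seed makes the 150-record subset reproducible between runs.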

Data cleaning:

Some values were missing and were recorded as ‘Not Reported’. If more than 25% of the values
in a column were missing, we dropped that column; if less than 5% were missing, we dropped
the affected rows instead. We then mapped the categorical values to numerical values.
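
A minimal pandas sketch of these cleaning rules is shown below. The column names and toy values are hypothetical, and the integer-code mapping is only one possible way to turn categories into numbers; the report does not state which mapping was used or how columns with between 5% and 25% missing values were handled, so the sketch leaves those untouched.

```python
import numpy as np
import pandas as pd

MISSING_TOKEN = "Not Reported"


def drop_missing(df, col_thresh=0.25, row_thresh=0.05):
    """Apply the two thresholds described in the text."""
    df = df.replace(MISSING_TOKEN, np.nan)
    # Drop columns where more than 25% of the values are missing.
    missing = df.isna().mean()
    df = df.drop(columns=missing[missing > col_thresh].index)
    # For columns with under 5% missing, drop the rows that contain the gaps.
    missing = df.isna().mean()
    sparse = missing[(missing > 0) & (missing < row_thresh)].index
    return df.dropna(subset=list(sparse)).reset_index(drop=True)


def encode_categoricals(df):
    """Map every non-numeric column to integer codes (assumed mapping)."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            # Codes follow sorted category order; any remaining NaN becomes -1.
            out[col] = out[col].astype("category").cat.codes
    return out


# Toy data with hypothetical column names.
toy = pd.DataFrame({
    "ua_nitrite": ["positive"] * 70 + [MISSING_TOKEN] * 30,        # 30% missing
    "ua_wbc": ["high"] * 49 + ["low"] * 49 + [MISSING_TOKEN] * 2,  # 2% missing
    "gender": ["F", "M"] * 50,
})

cleaned = encode_categoricals(drop_missing(toy))
```

In the toy data the first column is removed entirely, while the second column only costs the two rows that were not reported.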

Overfitting and Feature Selection:

We plotted the accuracy score against the number of training examples and against the
parameters. The plots showed that the model was overfitting, so we selected the 27 features
that gave the best accuracy.
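
One way to reproduce this kind of diagnostic is sketched below with scikit-learn. The data is synthetic, and `SelectKBest` with an ANOVA F-score is a stand-in chosen for illustration: the report does not say which selection method produced the 27 features. A large gap between training and validation accuracy is the usual sign of overfitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 150 sampled records (not the real data).
X, y = make_classification(
    n_samples=150, n_features=40, n_informative=8, random_state=0
)


def mean_curve(model, X, y):
    """Mean training and validation accuracy at increasing training sizes."""
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=5, scoring="accuracy",
        train_sizes=np.linspace(0.2, 1.0, 5),
    )
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


all_features = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000)
)
top_27 = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=27),  # assumed selector, keeps 27 features
    LogisticRegression(max_iter=1000),
)

sizes, train_acc, val_acc = mean_curve(all_features, X, y)
_, train_acc_27, val_acc_27 = mean_curve(top_27, X, y)

# Gap between training and validation accuracy at each training size.
gap_all = train_acc - val_acc
gap_27 = train_acc_27 - val_acc_27

top_27.fit(X, y)
n_kept = int(top_27.named_steps["selectkbest"].get_support().sum())
```

Placing the selector inside the pipeline means the features are re-selected within each cross-validation fold, so the validation scores are not inflated by information from held-out data.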

Model training and testing:

We split the data into a training set (80%) and a test set (20%). We used logistic regression,
k-nearest neighbors (KNN), and support vector machine (SVM) classifiers. We obtained an
accuracy of about 74%, which is below expectation, so we need more data to train our model.
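
The split and the three classifiers can be sketched with scikit-learn as follows. The data here is synthetic, and the hyperparameters, the stratified split, and the feature scaling are assumptions, because the report does not give these details. The scores this sketch produces say nothing about the 74% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 150 records with 27 features (not the real data).
X, y = make_classification(
    n_samples=150, n_features=27, n_informative=8, random_state=0
)

# 80% training, 20% test; stratifying on the label is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Default-ish hyperparameters, chosen only for illustration.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
```

With 150 records the test set holds only 30 examples, so each single prediction moves the accuracy by about 3.3 percentage points, which makes the estimate noisy.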
