Class 14 - Basic Coding in Python - 5


Basic coding in Python: part 4.0
Making loops in Python

• while loops: a while loop is used to execute a set of statements as long as a condition is true

• Loops allow a piece of code to be repeated

• x += value is shorthand for x = x + value
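As a minimal sketch of the idea above, a while loop that counts from 1 to 5 using `+=`:

```python
# Repeat the body as long as the condition (i <= 5) is true.
i = 1
while i <= 5:
    print(i)    # prints 1, 2, 3, 4, 5 on separate lines
    i += 1      # shorthand for i = i + 1
```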


Loops and breaks
The break statement can stop the loop even if the while condition is still true
Loops and continue statements
The continue statement stops the current iteration and continues with the next one

Note that number 3 is missing from the series
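The missing 3 comes from a loop like this sketch, where `continue` skips the print for that one iteration:

```python
# continue jumps back to the while condition, skipping the rest of the body.
printed = []
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue        # skip printing 3
    printed.append(i)
    print(i)            # prints 1, 2, 4, 5, 6 - number 3 is missing
```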


Loops and else statements
An else clause runs a block of code once the while condition is no longer true (it is skipped if the loop exits via break)
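A short sketch of a while/else pair: the else block runs once the condition becomes false.

```python
# The else block runs after the loop ends normally (without break).
i = 1
while i < 4:
    print(i)            # prints 1, 2, 3
    i += 1
else:
    print("condition is no longer true")
```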
Basic coding in Python: part 5.0
Step 1. Click on the symbol with 3 dots

Step 2. Click on Upload File > Upload file “data.csv” from your local computer
- the file is also uploaded on the Course page -
Step 3. Import the data file into Replit by executing the following 3 lines of code

import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
Step 4. Import libraries with the following 4 lines of code

import numpy
from scipy import stats
import math
import matplotlib.pyplot as plt
Step 5: Statistics for the Age column - Compute the mean, median and standard deviation (std)
by executing the following 6 lines of code

median = numpy.median(df['age'])
print(median)
mean = numpy.mean(df['age'])
print(mean)
std = numpy.std(df['age'])
print(std)
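For readers who do not have data.csv at hand, the same three statistics can be checked on a small made-up age list (the values below are illustrative, not the course dataset):

```python
import numpy

ages = [40, 50, 54, 58, 60, 62]   # made-up ages, not data.csv
median = numpy.median(ages)       # middle value: (54 + 58) / 2 = 56.0
mean = numpy.mean(ages)           # arithmetic average: 324 / 6 = 54.0
std = numpy.std(ages)             # population std (numpy's default, ddof=0)
print(median, mean, std)
```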
Step 5: Statistics for the Age column - results

Mean = 55 years
Median = 54.3 years
Standard deviation = 9 years
Step 6: Let’s organize the columns into categorical and continuous variables

Input the following code:

categorical_val = []
continous_val = []
for column in df.columns:
    print('==============================')
    print(f"{column} : {df[column].unique()}")
    if len(df[column].unique()) <= 10:
        categorical_val.append(column)
    else:
        continous_val.append(column)
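A quick way to see what the loop above does, on a tiny made-up DataFrame (the column names and values here are illustrative, not the course data): a column with 10 or fewer unique values is treated as categorical, otherwise as continuous.

```python
import pandas as pd

df = pd.DataFrame({
    'sex': [0, 1] * 6,             # 2 unique values  -> categorical
    'age': list(range(40, 52)),    # 12 unique values -> continuous
})

categorical_val = []
continous_val = []
for column in df.columns:
    if len(df[column].unique()) <= 10:
        categorical_val.append(column)
    else:
        continous_val.append(column)
print(categorical_val, continous_val)   # ['sex'] ['age']
```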
Step 6: What the columns mean

Cp - chest pain

Trestbps - resting blood pressure (above 135 is of concern)

Cholesterol - greater than 200 is of concern

Resting EKG (1 = an abnormal heart rhythm, which can range from mild symptoms to severe problems)

Thalach - maximum heart rate achieved (over 140 is more likely to have heart disease)

Exercise-induced angina

Ca - number of major vessels (more blood movement = better, so people with CA = 0 are more likely to have heart disease)

Thal - thallium stress result (flow of blood through the coronary arteries)

Step 7: Before applying a machine learning algorithm, let’s prepare a function to evaluate the classifier

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    if train:
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")
    else:
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("Test Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")
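The two metrics used inside print_score can be sanity-checked on hand-made labels (toy values, not real model output):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1]   # toy predictions: one mistake
acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions (5 of 6)
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
print(acc)
print(cm)
```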
Step 8: Let’s split the data into training and test sets

Usually, the data is split into 70% for the training set and 30% for the test set

Class #4, slide #5

Step 8: Let’s split the data into training and test sets

The data is split 70% for the training set and 30% for the test set

from sklearn.model_selection import train_test_split

X = df.drop('target', axis=1)
y = df.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
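The 70/30 split can be verified on a toy DataFrame (the column names below are illustrative, not the course data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({'feature': range(10), 'target': [0, 1] * 5})  # 10 toy rows
X = df.drop('target', axis=1)
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))   # 7 and 3: a 70/30 split
```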


Step 9: Let’s train the machine learning model with a logistic regression model:

Class #2, slide #27

This model only accounts for 2 parameters: tumor size and abnormality score

Step 9: Let’s train the machine learning model with a logistic regression model:

from sklearn.linear_model import LogisticRegression

lr_clf = LogisticRegression(solver='liblinear')
lr_clf.fit(X_train, y_train)

print_score(lr_clf, X_train, y_train, X_test, y_test, train=True)


print_score(lr_clf, X_train, y_train, X_test, y_test, train=False)
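A self-contained version of the same fit-and-score workflow, on synthetic data rather than the heart dataset, so it runs without data.csv:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))               # 100 synthetic samples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # a linearly separable toy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lr_clf = LogisticRegression(solver='liblinear')
lr_clf.fit(X_train, y_train)

train_acc = lr_clf.score(X_train, y_train)  # accuracy on the training set
test_acc = lr_clf.score(X_test, y_test)     # accuracy on the held-out test set
print(train_acc, test_acc)                  # both should be close to 1.0 here
```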
Step 9: Let’s train the machine learning model with a logistic regression model:

The model performs well: the test set has almost the same accuracy as the training set

Class #5, slide#9


Step 9: Let’s train the machine learning model with a logistic regression model:

Class #5, slide #7: Fewer examples (30% of the whole dataset)
