Class 14 - Basic Coding in Python - 5


Basic coding in Python: part 4.0
Making loops in Python

• while loops: a while loop is used to execute a set of statements as long as a condition is true

• Loops allow a piece of code to be repeated

• x += value is shorthand for x = x + value
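As a minimal sketch of the idea above, a while loop that counts from 1 to 5 using `+=`:

```python
# Repeat the body as long as the condition (i <= 5) is true.
i = 1
while i <= 5:
    print(i)    # prints 1, 2, 3, 4, 5 on separate lines
    i += 1      # shorthand for i = i + 1
```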


Loops and breaks
The break statement can stop the loop even if the while condition is still true
Loops and continue statements
The continue statement stops the current iteration and continues with the next one

Note that number 3 is missing from the series
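The missing 3 comes from a loop like this sketch, where `continue` skips the print for that one iteration:

```python
# continue jumps back to the while condition, skipping the rest of the body.
printed = []
i = 0
while i < 6:
    i += 1
    if i == 3:
        continue        # skip printing 3
    printed.append(i)
    print(i)            # prints 1, 2, 4, 5, 6 - number 3 is missing
```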


Loops and else statements
An else clause runs a block of code once the while condition is no longer true (it is skipped if the loop exits via break)
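A short sketch of a while/else pair: the else block runs once the condition becomes false.

```python
# The else block runs after the loop ends normally (without break).
i = 1
while i < 4:
    print(i)            # prints 1, 2, 3
    i += 1
else:
    print("condition is no longer true")
```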
Basic coding in Python: part 5.0
Step 1. Click on the symbol with 3 dots

Step 2. Click on Upload File > Upload file “data.csv” from your local computer
- the file is also uploaded on the Course page -
Step 3. Import the data file into Replit by executing the following 3 lines of code

import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
Step 4. Import libraries with the following 4 lines of code

import numpy
from scipy import stats
import math
import matplotlib.pyplot as plt
Step 5: Statistics for the Age column - Compute the mean, median and standard deviation (std)
by executing the following 6 lines of code

median = numpy.median(df['age'])
print(median)
mean = numpy.mean(df['age'])
print(mean)
std = numpy.std(df['age'])
print(std)
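For readers who do not have data.csv at hand, the same three statistics can be checked on a small made-up age list (the values below are illustrative, not the course dataset):

```python
import numpy

ages = [40, 50, 54, 58, 60, 62]   # made-up ages, not data.csv
median = numpy.median(ages)       # middle value: (54 + 58) / 2 = 56.0
mean = numpy.mean(ages)           # arithmetic average: 324 / 6 = 54.0
std = numpy.std(ages)             # population std (numpy's default, ddof=0)
print(median, mean, std)
```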
Step 5: Statistics for the Age column - results

Mean = 55 years
Median = 54.3 years
Standard deviation = 9 years
Step 6: Let’s organize the columns into categorical and continuous variables

Input the following code:

categorical_val = []
continous_val = []
for column in df.columns:
    print('==============================')
    print(f"{column} : {df[column].unique()}")
    if len(df[column].unique()) <= 10:
        categorical_val.append(column)
    else:
        continous_val.append(column)
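A quick way to see what the loop above does, on a tiny made-up DataFrame (the column names and values here are illustrative, not the course data): a column with 10 or fewer unique values is treated as categorical, otherwise as continuous.

```python
import pandas as pd

df = pd.DataFrame({
    'sex': [0, 1] * 6,             # 2 unique values  -> categorical
    'age': list(range(40, 52)),    # 12 unique values -> continuous
})

categorical_val = []
continous_val = []
for column in df.columns:
    if len(df[column].unique()) <= 10:
        categorical_val.append(column)
    else:
        continous_val.append(column)
print(categorical_val, continous_val)   # ['sex'] ['age']
```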
Step 6: What the columns mean

Cp - chest pain

Trestbps - resting blood pressure (above 135 is of concern)

Cholesterol - greater than 200 is of concern

Resting EKG (1 = an abnormal heart rhythm, which can range from mild symptoms to severe problems)

Thalach - maximum heart rate achieved (over 140 is more likely to have heart disease)

Exercise-induced angina

Ca - number of major vessels (more blood movement = better, so people with CA = 0 are more likely to have heart disease)

Thal - thallium stress result (flow of blood through the coronary arteries)

Step 7: Before applying a machine learning algorithm, let’s prepare a function to evaluate the classifier

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

def print_score(clf, X_train, y_train, X_test, y_test, train=True):
    if train:
        pred = clf.predict(X_train)
        clf_report = pd.DataFrame(classification_report(y_train, pred, output_dict=True))
        print("Train Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_train, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_train, pred)}\n")
    else:
        pred = clf.predict(X_test)
        clf_report = pd.DataFrame(classification_report(y_test, pred, output_dict=True))
        print("Test Result:\n================================================")
        print(f"Accuracy Score: {accuracy_score(y_test, pred) * 100:.2f}%")
        print("_______________________________________________")
        print(f"CLASSIFICATION REPORT:\n{clf_report}")
        print("_______________________________________________")
        print(f"Confusion Matrix: \n {confusion_matrix(y_test, pred)}\n")
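The two metrics used inside print_score can be sanity-checked on hand-made labels (toy values, not real model output):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1]   # toy predictions: one mistake
acc = accuracy_score(y_true, y_pred)    # fraction of correct predictions (5 of 6)
cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted class
print(acc)
print(cm)
```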
Step 8: Let’s split the data into training and test sets

Usually, the data is split into 70% for the training set and 30% for the test set

Class #4, slide #5

Step 8: Let’s split the data into training and test sets

The data is split 70% for the training set and 30% for the test set

from sklearn.model_selection import train_test_split

X = df.drop('target', axis=1)
y = df.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
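The 70/30 split can be verified on a toy DataFrame (the column names below are illustrative, not the course data):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({'feature': range(10), 'target': [0, 1] * 5})  # 10 toy rows
X = df.drop('target', axis=1)
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))   # 7 and 3: a 70/30 split
```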


Step 9: Let’s train the machine learning model with a logistic regression model:

Class #2, slide #27

This model only accounts for 2 parameters: tumor size and abnormality score

Step 9: Let’s train the machine learning model with a logistic regression model:

from sklearn.linear_model import LogisticRegression

lr_clf = LogisticRegression(solver='liblinear')
lr_clf.fit(X_train, y_train)

print_score(lr_clf, X_train, y_train, X_test, y_test, train=True)


print_score(lr_clf, X_train, y_train, X_test, y_test, train=False)
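A self-contained version of the same fit-and-score workflow, on synthetic data rather than the heart dataset, so it runs without data.csv:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))               # 100 synthetic samples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # a linearly separable toy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lr_clf = LogisticRegression(solver='liblinear')
lr_clf.fit(X_train, y_train)

train_acc = lr_clf.score(X_train, y_train)  # accuracy on the training set
test_acc = lr_clf.score(X_test, y_test)     # accuracy on the held-out test set
print(train_acc, test_acc)                  # both should be close to 1.0 here
```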
Step 9: Let’s train the machine learning model with a logistic regression model:

The model performs well: the test set has almost the same accuracy as the training set

Class #5, slide#9


Step 9: Let’s train the machine learning model with a logistic regression model:

Class #5, slide #7: Fewer examples (30% of the whole dataset)
