B24 ML Exp-1

PART A
(PART A: TO BE REFERRED BY STUDENTS)
Experiment No. 1
A.1 Aim:
To implement Logistic Regression.

A.2 Prerequisite:
Python Basic Concepts
A.3 Outcome:
Students will be able to implement Logistic Regression.
A.4 Theory:
Machine Learning, a subset of Artificial Intelligence (AI), plays a dominant
role in our daily lives. Data science engineers and developers working in
various domains widely use machine learning algorithms to make their tasks
simpler.
What are the two types of supervised learning?
Supervised learning is used either to classify something or to predict a value,
so there are two types of supervised learning algorithms: classification models
and regression models.
1. Classification model - Predicts which of a fixed set of categories an input
belongs to. Example: Predicting if a transaction is fraudulent or not.
2. Regression model - Predicts a numerical value. Example: Predicting the sale
price of a house.
What is logistic regression?
Logistic regression is an example of supervised learning. It is used to calculate or
predict the probability of a binary (yes/no) event occurring. An example of logistic
regression could be applying machine learning to determine if a person is likely to
be infected with COVID-19 or not. Since we have two possible outcomes to this
question - yes they are infected, or no they are not infected - this is called binary
classification.
Logistic regression uses the sigmoid function, which produces an S-shaped curve.
The function converts any real number into a number between 0 and 1, so it is
used to translate the model's raw predictions into probabilities.
Mathematically, the sigmoid function is
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]
where z is the raw (linear) output of the model.
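The sigmoid described above can be sketched in a few lines of Python. This is a minimal illustration using NumPy; the function name sigmoid and the sample inputs are chosen here for demonstration and are not part of the experiment's code.

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array of numbers) into the interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))


# Large negative inputs approach 0, zero maps to 0.5, large positive inputs approach 1.
values = sigmoid([-5.0, 0.0, 5.0])
print(values)
```

Because the output is always between 0 and 1, it can be read as the probability of the positive class and compared against a threshold such as 0.5.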
Where to use logistic regression
• In health care, logistic regression can be used to predict if a tumor is likely
to be benign or malignant.
• In the financial industry, logistic regression can be used to predict if a
transaction is fraudulent or not.
• In marketing, logistic regression can be used to predict if a targeted audience
will respond or not.
Advantages of the Logistic Regression Algorithm
• Logistic regression performs well when the classes are linearly separable
• It requires few computational resources and is highly interpretable
• It does not strictly require the input features to be scaled, and it needs little tuning
• It is easy to implement and train a model using logistic regression
• It gives a measure of how relevant a predictor is (coefficient size), and its
direction of association (positive or negative)
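The last advantage, reading relevance and direction from the coefficients, can be illustrated with a small sketch. The data below is synthetic and the feature names x_pos, x_neg and x_noise are hypothetical; the only assumption is that scikit-learn's LogisticRegression exposes the fitted weights through coef_.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
x_pos = rng.normal(size=n)    # hypothetical feature that raises the odds of class 1
x_neg = rng.normal(size=n)    # hypothetical feature that lowers the odds of class 1
x_noise = rng.normal(size=n)  # hypothetical feature unrelated to the target

# Generate a binary target from a known linear score passed through a sigmoid.
logits = 2.0 * x_pos - 1.0 * x_neg
prob = 1.0 / (1.0 + np.exp(-logits))
y = (rng.random(n) < prob).astype(int)

# Standardize so that coefficient sizes are comparable across features.
X = StandardScaler().fit_transform(np.column_stack([x_pos, x_neg, x_noise]))
model = LogisticRegression().fit(X, y)

coef = model.coef_[0]
for name, c in zip(["x_pos", "x_neg", "x_noise"], coef):
    print(f"{name}: {c:+.3f}")
```

A positive coefficient means the feature pushes the prediction towards class 1, a negative one pushes it towards class 0, and a coefficient near zero suggests the feature contributes little.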
PART B
(PART B : TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the practical. The
soft copy must be uploaded on the Blackboard or emailed to the concerned lab in-charge faculty at
the end of the practical in case there is no Blackboard access available)
Roll. No. BE-B Name: Sakshi.B.Tupsundar

Class: BE-Comps Batch: B2
Date of Experiment:10-10-2023 Date of Submission:11-10-2023
Grade:
B.1 Software Code written by student:

import numpy as np
import pandas as pd
import seaborn as sns
# Load the survey dataset with pandas (path as used in Google Colab)
df = pd.read_csv('/content/survey lung cancer.csv')
print(df)
# Inspect the size, the missing values per column, and the first rows
df.shape
df.isnull().sum()
df.head()
from sklearn import preprocessing
# LabelEncoder maps each distinct text label to an integer code.
label_encoder = preprocessing.LabelEncoder()
# Encode the text labels in columns 'GENDER' and 'LUNG_CANCER'.
df['GENDER'] = label_encoder.fit_transform(df['GENDER'])
df['GENDER'].unique()
df['LUNG_CANCER'] = label_encoder.fit_transform(df['LUNG_CANCER'])
df['LUNG_CANCER'].unique()
df.head()
import matplotlib.pyplot as plt
# Box plot of every column to inspect outliers
plt.figure(figsize=(14, 8))
plt.suptitle("Lung Disease Prediction")
ax = plt.gca()
df.boxplot(ax=ax)
plt.show()
Applying Logistic model before outlier removal

# Applying Logistic Regression with outliers
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
# Separate the features from the target column
X = df.drop('LUNG_CANCER', axis=1)
y = df['LUNG_CANCER']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the scaler on the training split only, then apply it to the test split
st_x = StandardScaler()
X_train = st_x.fit_transform(X_train)
X_test = st_x.transform(X_test)
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import cross_val_score
log_reg.fit(X_train, y_train)
y_pre = log_reg.predict(X_train)  # predictions on the training split
y_pred = log_reg.predict(X_test)  # predictions on the test split
y_pred1 = log_reg.predict_proba(X_test)  # class probabilities, used for ROC AUC
print("Training Accuracy_score: {}".format(accuracy_score(y_train, y_pre)))
print("Testing Accuracy_score: {}".format(accuracy_score(y_test, y_pred)))
print("roc_auc_score: {}".format(roc_auc_score(y_test, y_pred1[:, 1])))
# Note: cross-validation here runs on the full, unscaled X and y
print("CV_score: {}".format(cross_val_score(log_reg, X, y, cv=10, scoring='accuracy').mean()))
print(classification_report(y_test, y_pred))
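The cross-validation call above passes the unscaled X to the model, while the train/test evaluation uses scaled features. One possible way to keep the two consistent is to wrap the scaler and the model in a scikit-learn Pipeline, so that scaling is refit inside every fold. The sketch below uses synthetic data from make_classification as a stand-in for the survey data; X_demo, y_demo and pipe are illustrative names, not part of the experiment.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (an assumption made for illustration).
X_demo, y_demo = make_classification(n_samples=300, n_features=15, random_state=42)

# The pipeline fits the scaler on each training fold and applies it to the held-out fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X_demo, y_demo, cv=10, scoring="accuracy")
print("CV accuracy: {:.3f}".format(scores.mean()))
```

With this arrangement no statistics from a held-out fold leak into the scaler, which keeps the cross-validated score comparable to the test score.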
Applying Logistic model after outlier removal
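The listing in this section reads from a DataFrame named data_cleaned, but the step that creates it does not appear in the code. A minimal sketch of one common approach, the interquartile range (IQR) rule, is shown here. It is an assumption and not necessarily the method used in this experiment; the helper remove_outliers_iqr, the factor k=1.5 and the small demo frame are all hypothetical.

```python
import pandas as pd


def remove_outliers_iqr(frame, columns, k=1.5):
    """Drop rows where any listed column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    mask = pd.Series(True, index=frame.index)
    for col in columns:
        q1 = frame[col].quantile(0.25)
        q3 = frame[col].quantile(0.75)
        iqr = q3 - q1
        mask &= frame[col].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]


# Hypothetical demo data: one AGE value (21) lies far below the rest.
demo = pd.DataFrame({
    "AGE": [55, 60, 62, 58, 61, 59, 21, 63],
    "LUNG_CANCER": [1, 1, 1, 0, 1, 1, 0, 1],
})
cleaned_demo = remove_outliers_iqr(demo, ["AGE"])
print(len(demo), "->", len(cleaned_demo))
```

Applied to the real data, a call of the form data_cleaned = remove_outliers_iqr(df, ['AGE']) would produce a frame with the same column order as df, so the target column stays last as the code in this section expects.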
# data_cleaned is the DataFrame left after outlier removal;
# the step that builds it is not part of this listing.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
# The last column is the target; all other columns are features
X = data_cleaned.iloc[:, :-1]
y = data_cleaned.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
st_x = StandardScaler()
X_train = st_x.fit_transform(X_train)
X_test = st_x.transform(X_test)
logreg_model = LogisticRegression()
logreg_model.fit(X_train, y_train)
y_pre = logreg_model.predict(X_train)
y_pred = logreg_model.predict(X_test)
y_pred1 = logreg_model.predict_proba(X_test)
print("Training Accuracy_score: {}".format(accuracy_score(y_train, y_pre)))
print("Testing Accuracy_score: {}".format(accuracy_score(y_test, y_pred)))
print("roc_auc_score: {}".format(roc_auc_score(y_test, y_pred1[:, 1])))
print(classification_report(y_test, y_pred))
Hyperparameter tuning
# Hyperparameter tuning with an exhaustive grid search
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.metrics import confusion_matrix
# Specify parameters
# Note: 'multi_class': ['ovr'] from the original grid is left out. It is deprecated in
# recent scikit-learn releases and matches the default behaviour for a binary target.
c_values = list(np.arange(1, 10))
param_grid = [
    {'C': c_values, 'penalty': ['l1'], 'solver': ['liblinear']},
    {'C': c_values, 'penalty': ['l2'], 'solver': ['liblinear', 'newton-cg', 'lbfgs']}
]
grid = GridSearchCV(logreg_model, param_grid, cv=10, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.best_score_)
print(grid.best_estimator_)
y_pred_tuned = grid.best_estimator_.predict(X_test)
y_pred2 = grid.best_estimator_.predict_proba(X_test)
print(confusion_matrix(y_test, y_pred_tuned))
# print("{0}".format(metrics.confusion_matrix(y_test, y_pred, labels=[1, 0])))
print("Classification Report")
# labels=[1, 0] lists class 1 (True) first and class 0 (False) second
print("{0}".format(metrics.classification_report(y_test, y_pred_tuned, labels=[1, 0])))
acc_hyper = accuracy_score(y_test, y_pred_tuned)
print(acc_hyper)
print("roc_auc_score after hypertuning: {}".format(roc_auc_score(y_test, y_pred2[:, 1])))
print("roc_auc_score original: {}".format(roc_auc_score(y_test, y_pred1[:, 1])))
B.2 Input and Output:
Logistic Model             Before Outlier Removal    After Outlier Removal
Training Accuracy Score    0.9271255060728745        0.9428571428571428
Testing Accuracy Score     0.967741935483871         0.9193548387096774
ROC_AUC Score              0.9166666666666667        0.9263157894736842
CV Score                   0.9256989247311826        0.9221505376344086

Hyper Parameter Tuning for After Outlier Removal
Logistic Model
Accuracy                   0.9306666666666666
ROC_AUC score              0.9228070175438596
CV score                   0.9221505376344086
B.3 Observations and learning:

We were able to learn Logistic Regression.
B.4 Conclusion:
We were able to implement Logistic Regression.
B.5 Question of Curiosity
1. What is Logistic regression?

Ans:- Logistic regression is a supervised machine learning algorithm mainly used for classification tasks
where the goal is to predict the probability that an instance belongs to a given class. It is a
statistical algorithm that analyzes the relationship between a set of independent variables and
a binary dependent variable, and it is a powerful tool for decision-making. For example, it can decide
whether an email is spam or not. Although it is used for classification, it is called regression because
it takes the output of the linear regression function as input and uses a sigmoid function to estimate
the probability for the given class. The difference between linear regression and logistic regression is
that linear regression outputs a continuous value that can be anything, while logistic regression predicts
the probability that an instance belongs to a given class.
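The relationship described in this answer, a linear function whose output is passed through a sigmoid, can be checked against scikit-learn directly. The sketch below is only an illustration on synthetic data; X_demo, y_demo and manual_proba are names introduced here. It recomputes the class-1 probability by hand from the fitted coef_ and intercept_ and compares it with predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (an assumption made for illustration).
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X_demo, y_demo)

# Step 1: the linear part, exactly as in linear regression.
z = X_demo @ model.coef_[0] + model.intercept_[0]

# Step 2: the sigmoid turns the linear output into a probability of class 1.
manual_proba = 1.0 / (1.0 + np.exp(-z))
sklearn_proba = model.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))

# Step 3: thresholding the probability at 0.5 gives the predicted class.
labels = (manual_proba >= 0.5).astype(int)
```

The linear output z is unbounded, which is what linear regression would report, while the sigmoid output stays between 0 and 1 and can be thresholded into a class label.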
