
PART A

(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 1
A.1 Aim:

To implement Logistic Regression.


A.2 Prerequisite:
Python Basic Concepts

A.3 Outcome:
Students will be able to implement Logistic Regression.

A.4 Theory:

Machine learning, a subset of Artificial Intelligence (AI), plays a growing role
in our daily lives. Data science engineers and developers working in various
domains widely use machine learning algorithms to make their tasks simpler.

What are the two types of supervised learning?

Supervised learning is used either to classify something or to predict a value,
so there are two types of supervised learning algorithms: classification models
and regression models.

1. Classification model - Predicts which of a fixed set of classes an input
belongs to. Example: Predicting whether a transaction is fraudulent or not.
2. Regression model - Predicts a numerical value. Example: Predicting the sale
price of a house.

What is logistic regression?

Logistic regression is an example of supervised learning. It is used to calculate or
predict the probability of a binary (yes/no) event occurring. An example of logistic
regression could be applying machine learning to determine if a person is likely to
be infected with COVID-19 or not. Since we have two possible outcomes to this
question - yes they are infected, or no they are not infected - this is called binary
classification.

Logistic regression uses the sigmoid function, which produces an S-shaped curve.
The function converts any real number into a number between 0 and 1, so in
machine learning it is used to translate a model's raw predictions into
probabilities.

Mathematically, the sigmoid function is

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where z is any real number.
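
The sigmoid function, sigma(z) = 1 / (1 + e^(-z)), can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `sigmoid` and the sample inputs are chosen here for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real number z to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 gives exactly 0.5.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

A probability of 0.5 is the usual cut-off: inputs with sigmoid(z) at or above 0.5 are assigned to the positive class.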

Where to use logistic regression

• In health care, logistic regression can be used to predict if a tumor is likely
to be benign or malignant.
• In the financial industry, logistic regression can be used to predict if a
transaction is fraudulent or not.
• In marketing, logistic regression can be used to predict if a targeted audience
will respond or not.

Advantages of the Logistic Regression Algorithm

• Logistic regression performs well when the classes are linearly separable
• It needs few computational resources and is highly interpretable
• It can be trained without scaling the input features and needs little tuning
• It is easy to implement and train a model using logistic regression
• It gives a measure of how relevant a predictor is (coefficient size) and its
direction of association (positive or negative)
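
The point about coefficient size and direction can be made concrete. In a logistic model the log-odds are linear in the features, so raising a feature by one unit multiplies the odds by exp(coefficient). The sketch below assumes three hypothetical coefficients (`x1`, `x2`, `x3`); they are illustrative values, not results from this experiment.

```python
import math

# Hypothetical coefficients for illustration only (not fitted on any dataset).
coefficients = {"x1": 0.8, "x2": 0.05, "x3": -0.6}

def odds_ratio(coef: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in the feature."""
    return math.exp(coef)

def direction(coef: float) -> str:
    """Describe the sign of the association between the feature and the outcome."""
    if coef > 0:
        return "raises the odds"
    if coef < 0:
        return "lowers the odds"
    return "leaves the odds unchanged"

for name, coef in coefficients.items():
    print(f"{name}: coef={coef:+.2f}, odds ratio={odds_ratio(coef):.2f}, {direction(coef)}")
```

Coefficient sizes are only comparable across features when the features are on the same scale, which is one reason to standardize the inputs before reading them this way.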
PART B
(PART B : TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical. The
soft copy must be uploaded on Blackboard or emailed to the concerned lab in-charge faculty at
the end of the practical in case there is no Blackboard access available)

Roll. No. BE-B Name: Sakshi.B.Tupsundar


Class: BE-Comps Batch: B2
Date of Experiment:10-10-2023 Date of Submission:11-10-2023
Grade:

B.1 Software Code written by student:


import numpy as np
import pandas as pd
import seaborn as sns

# Load the survey dataset with pandas (path as used in Google Colab)
df = pd.read_csv('/content/survey lung cancer.csv')
print(df)

df.shape           # (rows, columns)
df.isnull().sum()  # missing values per column
df.head()          # first five rows

from sklearn import preprocessing

# LabelEncoder maps each distinct string label to an integer code
label_encoder = preprocessing.LabelEncoder()

# Encode labels in column 'GENDER'
df['GENDER'] = label_encoder.fit_transform(df['GENDER'])
df['GENDER'].unique()

# Encode labels in the target column 'LUNG_CANCER'
df['LUNG_CANCER'] = label_encoder.fit_transform(df['LUNG_CANCER'])
df['LUNG_CANCER'].unique()

df.head()

import matplotlib.pyplot as plt

# Box plot of every column to inspect outliers
plt.figure(figsize=(14, 8))
plt.suptitle("Lung Disease Prediction")
ax = plt.gca()
df.boxplot(ax=ax)

Applying Logistic model before outlier removal


# Applying Logistic Regression with outliers
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Separate the features from the target column
X = df.drop('LUNG_CANCER', axis=1)
y = df['LUNG_CANCER']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit the scaler on the training split only, then apply it to the test split
st_x = StandardScaler()
X_train = st_x.fit_transform(X_train)
X_test = st_x.transform(X_test)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

y_pred = log_reg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


from sklearn.metrics import accuracy_score, roc_auc_score, classification_report
from sklearn.model_selection import cross_val_score

log_reg.fit(X_train, y_train)
y_pre = log_reg.predict(X_train)         # predictions on the training split
y_pred = log_reg.predict(X_test)         # predictions on the test split
y_pred1 = log_reg.predict_proba(X_test)  # class probabilities on the test split

print("Training Accuracy_score: {}".format(accuracy_score(y_train, y_pre)))
print("Testing Accuracy_score: {}".format(accuracy_score(y_test, y_pred)))
print("roc_auc_score: {}".format(roc_auc_score(y_test, y_pred1[:, 1])))
# Cross-validation runs on the full, unscaled X and y
print("CV_score: {}".format(cross_val_score(log_reg, X, y, cv=10,
                                            scoring='accuracy').mean()))
print(classification_report(y_test, y_pred))
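
The next step works on a DataFrame named data_cleaned, but the listing does not show how the outliers were removed. One common approach is the interquartile-range (IQR) rule, sketched here with the standard library on plain records. The helper names `iqr_bounds` and `remove_outliers`, the column name `value`, and the toy numbers are all assumptions for illustration; this is not necessarily the method used in the experiment.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(rows, column, k=1.5):
    """Keep only the rows whose value in `column` lies inside the fences."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]

# Toy records; the numbers are made up for illustration.
rows = [{"value": v} for v in (55, 58, 60, 61, 62, 63, 65, 67, 21)]
cleaned = remove_outliers(rows, "value")
print(len(rows), "rows before,", len(cleaned), "rows after")
```

With a pandas DataFrame the same fences could be computed per column from `df[col].quantile(0.25)` and `df[col].quantile(0.75)`, keeping only the rows that fall inside them.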
Applying Logistic model after outlier removal
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report

# data_cleaned: the DataFrame left after outlier removal (that step is not
# part of this listing); its last column is the target
X = data_cleaned.iloc[:, :-1]
y = data_cleaned.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit the scaler on the training split only, then apply it to the test split
st_x = StandardScaler()
X_train = st_x.fit_transform(X_train)
X_test = st_x.transform(X_test)

logreg_model = LogisticRegression()
logreg_model.fit(X_train, y_train)

y_pre = logreg_model.predict(X_train)
y_pred = logreg_model.predict(X_test)
y_pred1 = logreg_model.predict_proba(X_test)

print("Training Accuracy_score: {}".format(accuracy_score(y_train, y_pre)))
print("Testing Accuracy_score: {}".format(accuracy_score(y_test, y_pred)))
print("roc_auc_score: {}".format(roc_auc_score(y_test, y_pred1[:, 1])))
# The CV score reuses log_reg (max_iter=1000) from the previous step on the
# unscaled X and y
print("CV_score: {}".format(cross_val_score(log_reg, X, y, cv=10,
                                            scoring='accuracy').mean()))
print(classification_report(y_test, y_pred))
Hyperparameter tuning
# For hyperparameter tuning
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.metrics import confusion_matrix

# Specify parameters: C takes the integer values 1 to 9
c_values = list(np.arange(1, 10))

# Note: 'multi_class' is deprecated in recent scikit-learn releases
param_grid = [
    {'C': c_values, 'penalty': ['l1'], 'solver': ['liblinear'],
     'multi_class': ['ovr']},
    {'C': c_values, 'penalty': ['l2'],
     'solver': ['liblinear', 'newton-cg', 'lbfgs'], 'multi_class': ['ovr']}
]
grid = GridSearchCV(logreg_model, param_grid, cv=10, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.best_score_)
print(grid.best_estimator_)

# Predictions from the best estimator found by the grid search
predictionforest = grid.best_estimator_.predict(X_test)
y_pred2 = grid.best_estimator_.predict_proba(X_test)

print(confusion_matrix(y_test, predictionforest))
# print("{0}".format(metrics.confusion_matrix(y_test, y_pred, labels=[1, 0])))
print("Classification Report")

# labels=[1, 0] lists class 1 (True) first and class 0 (False) second
print("{0}".format(metrics.classification_report(y_test, predictionforest,
                                                 labels=[1, 0])))
acc_hyper = accuracy_score(y_test, predictionforest)
print(acc_hyper)
print("roc_auc_score after hypertuning: {}".format(roc_auc_score(y_test, y_pred2[:, 1])))
print("roc_auc_score original: {}".format(roc_auc_score(y_test, y_pred1[:, 1])))
# The CV score again uses log_reg on the unscaled X and y, not the tuned model
print("CV_score: {}".format(cross_val_score(log_reg, X, y, cv=10,
                                            scoring='accuracy').mean()))

B.2 Input and Output:

Logistic Model             Before Outlier Removal    After Outlier Removal
Training Accuracy Score    0.9271255060728745        0.9428571428571428
Testing Accuracy Score     0.967741935483871         0.9193548387096774
ROC_AUC Score              0.9166666666666667        0.9263157894736842
CV Score                   0.9256989247311826        0.9221505376344086

Hyperparameter Tuning After Outlier Removal

Logistic Model
Accuracy         0.9306666666666666
ROC_AUC Score    0.9228070175438596
CV Score         0.9221505376344086
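
To show what the accuracy and ROC_AUC figures measure, here is a small sketch that computes both by hand. The labels and probabilities are made up for illustration and are unrelated to the lung cancer data. The function names are chosen here, and the pairwise AUC formula is the rank-based definition written out directly, not how scikit-learn computes it internally.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This O(n^2) pairwise form is for illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities for the positive class.
y_true = [1, 0, 1, 1, 0, 1]
proba = [0.9, 0.4, 0.6, 0.3, 0.2, 0.8]
y_pred = [1 if p >= 0.5 else 0 for p in proba]  # threshold at 0.5

print("accuracy:", accuracy(y_true, y_pred))
print("roc_auc:", roc_auc(y_true, proba))
```

Accuracy depends on the 0.5 threshold, while ROC_AUC depends only on how the probabilities rank the examples, which is why the two scores can move in different directions between runs.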

B.3 Observations and learning:


We were able to learn Logistic Regression.
B.4 Conclusion:
We were able to implement Logistic Regression.

B.5 Question of Curiosity

1. What is logistic regression?

Ans: Logistic regression is a supervised machine learning algorithm used mainly for classification tasks,
where the goal is to predict the probability that an instance belongs to a given class. It is a statistical
method that analyzes the relationship between a set of independent variables and a binary dependent
variable, which makes it a useful tool for decision-making, for example deciding whether an email is
spam or not. Although it is used for classification, it is called regression because it takes the output of
a linear regression function as input and applies a sigmoid function to estimate the probability of the
given class. The difference between the two is that linear regression outputs a continuous value that can
be any real number, while logistic regression outputs the probability that an instance belongs to a given
class.
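
The idea in this answer, a linear function whose output is passed through a sigmoid, can be sketched from scratch. The example below fits a one-feature model by batch gradient descent on the log loss. The toy data, learning rate, and epoch count are assumptions chosen for illustration; scikit-learn's LogisticRegression uses different solvers and applies L2 regularization by default, so its fitted values would differ.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p = sigmoid(w*x + b) by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean log loss: mean of (p - y) * x and mean of (p - y).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Made-up data: larger x tends to mean class 1, with one overlapping pair.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
for x in (1.0, 4.0):
    p = sigmoid(w * x + b)
    print(f"x={x}: P(class 1)={p:.3f}, predicted class={int(p >= 0.5)}")
```

The decision boundary is the x where w*x + b = 0, that is, where the predicted probability equals 0.5.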
