End To End Implementation of Data Science Pipeline in The Linear Regression Model
End To End Pipeline of Linear Regression [‘Image Created By Dheeraj Kumar K’]
DataSet.
1. Loading Dataset: This step is the data connection layer
# Import Libraries
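import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # used by the histogram helper
import seaborn as sns            # used by the pair plot helper
import warnings
warnings.filterwarnings("ignore")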
# Load Dataset
df
because most problems can be solved with the help of good EDA
def show_hist(x):
    plt.rcParams["figure.figsize"] = 15,18
    x.hist()
show_hist(df)
In this Histogram we can see that most of the variables are not
def Show_PairPlot(x):
    sns.pairplot(x)
Show_PairPlot(df)
Pair Plot will give you insights about both the relationship and
End Data Science Project, 60% of the work will involve Data
Cleaning.
# Missing Values
df.isna().sum()
The output of checking Missing Values [‘Image Created By Dheeraj
Kumar K’]
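As a minimal sketch of what this check reports, the snippet below builds a small hypothetical frame (the values are invented for illustration and are not taken from the bike-sharing data), counts missing values per column, and fills the gaps with the column median. Median imputation is only one possible choice; the article does not say how missing values were handled.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the article's dataset
toy = pd.DataFrame({
    "temp": [0.24, np.nan, 0.22, 0.26],
    "hum": [0.81, 0.80, np.nan, np.nan],
    "cnt": [16, 40, 32, 13],
})

# Count missing values per column, as df.isna().sum() does
missing_per_column = toy.isna().sum()
print(missing_per_column)

# One possible treatment (an assumption): fill gaps with the column median
filled = toy.fillna(toy.median(numeric_only=True))
```

A column that reports zero here needs no treatment; for the others, the right strategy depends on why the values are missing.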
Some More,
df['Year'] = df['dteday'].str.split('-').str[0]
df['Year'] = df['Year'].astype(int)
df['Month'] = df['dteday'].str.split('-').str[1]
df['Month'] = df['Month'].astype(int)
df['Date'] = df['dteday'].str.split('-').str[2]
df['Date'] = df['Date'].astype(int)
df = df.drop(['dteday'],axis=1)
df = df.drop(['yr'],axis=1)
df = df.drop(['mnth'],axis=1)
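The same year, month, and day columns can also be derived by parsing the string once with pandas' datetime support. The sketch below assumes `dteday` holds `YYYY-MM-DD` strings, which is what the `split('-')` calls imply; the two sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows in the assumed YYYY-MM-DD format
toy = pd.DataFrame({"dteday": ["2011-01-01", "2012-12-31"]})

# Parse once, then read the parts through the .dt accessor
parsed = pd.to_datetime(toy["dteday"], format="%Y-%m-%d")
toy["Year"] = parsed.dt.year
toy["Month"] = parsed.dt.month
toy["Date"] = parsed.dt.day
toy = toy.drop(columns=["dteday"])
```

Parsing with an explicit `format` also raises an error on malformed dates instead of silently producing wrong integers.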
Outliers
specifically.
Outliers Detection
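def outlier(x):
    # Flag values outside the 1.5 * IQR fences
    q1 = x.quantile(.25)
    q3 = x.quantile(.75)
    iqr = q3-q1
    low = q1-1.5*iqr
    high = q3+1.5*iqr
    # Return the outlying values themselves so they can be counted
    return x[(x < low) | (x > high)]
outlier(df['cnt']).count()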
The output of Outliers in Response Variable [‘Image Created By Dheeraj Kumar K’]
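To make the fences concrete, here is a small worked example of the 1.5 * IQR rule on a hypothetical series (the numbers are invented, not taken from `cnt`).

```python
import pandas as pd

# Hypothetical values with one obvious outlier
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, q3 = toy.quantile(0.25), toy.quantile(0.75)   # 3.25 and 7.75
iqr = q3 - q1                                      # 4.5
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr         # -3.5 and 14.5

# Anything outside [low, high] is flagged
flagged = toy[(toy < low) | (toy > high)]
print(flagged.count())  # 1 (only the value 100)
```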
Removal of Outliers
q1 =df['cnt'].quantile(.25)
q3 = df['cnt'].quantile(.75)
iqr = q3-q1
print('==============================================================================')
print('================================================================================')
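The filtering step itself is not shown here. One common way to drop the flagged rows, sketched on hypothetical data, is to keep only the rows whose `cnt` lies between the two fences. Whether the original code used exactly this mask is an assumption.

```python
import pandas as pd

# Hypothetical frame; only the column name cnt comes from the article
toy = pd.DataFrame({"hr": range(10), "cnt": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})

q1 = toy["cnt"].quantile(0.25)
q3 = toy["cnt"].quantile(0.75)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print("Rows before:", len(toy))
# between() is inclusive on both ends by default
cleaned = toy[toy["cnt"].between(low, high)].reset_index(drop=True)
print("Rows after:", len(cleaned))
```

Printing the row count before and after makes it easy to see how much data the rule discards.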
import pylab
from sklearn.preprocessing import PowerTransformer
pt = PowerTransformer(method='yeo-johnson', standardize=True)
df = pt.fit_transform(df)
# fit_transform returns a NumPy array, so the column names must be restored
df = pd.DataFrame(df)
df = df.rename(columns={0: 'Intent', 1: 'Season', 2: 'hr', 3: 'holiday',
                        4: 'Weekday', 5: 'Workingday', 6: 'Weathersit',
                        7: 'hum', 8: 'Windspeed', 9: 'registered', 10: 'cnt',
                        11: 'year', 12: 'Month', 13: 'Date', 14: 'atemp',
                        15: 'Temp', 16: 'Cacual'})
df
def show_hist(x):
    plt.rcParams["figure.figsize"] = 15,18
    x.hist()
show_hist(df)
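Because `fit_transform` returns a NumPy array, the column names are lost and have to be restored by hand. A sketch of an alternative, assuming an all-numeric frame, is to pass the original column names and index straight back into the `DataFrame` constructor. The toy columns below are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Hypothetical data: one right-skewed column and one uniform column
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "cnt": rng.exponential(scale=100.0, size=200),
    "hum": rng.uniform(0.0, 1.0, size=200),
})

pt = PowerTransformer(method="yeo-johnson", standardize=True)
# Reuse the original labels instead of renaming integer columns afterwards
transformed = pd.DataFrame(
    pt.fit_transform(toy), columns=toy.columns, index=toy.index
)
print(transformed.describe().loc[["mean", "std"]])
```

With `standardize=True`, each transformed column ends up with roughly zero mean and unit variance, which is what the second round of histograms should reflect.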
have to test the model on some test dataset. For this, you’ll need a
dataset which is different from the training set you used earlier.
In such cases, the obvious solution is to split the dataset you have
into two sets, one for training and the other for testing; and you
y = df['cnt']
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=40)
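To make the effect of `test_size=0.30` and `random_state=40` concrete, here is a small sketch on hypothetical data. The feature matrix `x` is built by dropping the target column, which is an assumption, since the original definition of `x` is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with 100 rows
toy = pd.DataFrame({
    "hr": np.arange(100) % 24,
    "temp": np.linspace(0.1, 0.9, 100),
    "cnt": np.arange(100),
})
x = toy.drop(columns=["cnt"])   # assumed: every column except the target
y = toy["cnt"]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=40
)
print(x_train.shape, x_test.shape)
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the test set on every run.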
Building Statsmodel,
import statsmodels.api as sm
model2 =sm.OLS(y_train,x_train).fit()
model2.summary()
StatsModel Output [‘Image Created By Dheeraj Kumar K’]
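from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
lr = LinearRegression()
LR = lr.fit(x_train, y_train)
LR_Pred = lr.predict(x_test)
print('R SQUARE:', r2_score(y_test, LR_Pred))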
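For reference, the score printed here is the coefficient of determination, R^2 = 1 - SS_res / SS_tot. The sketch below computes it by hand with NumPy on hypothetical numbers, which can help when sanity-checking a reported value.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
score = r_squared(y_true, y_pred)
print(score)
```

A score of 1 means the predictions match the targets exactly, while a score near 0 means the model does no better than always predicting the mean.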
import pickle
# Use a context manager so the file is closed after writing
with open('LinearRegression.pkl', 'wb') as f:
    pickle.dump(lr, f)
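Before wiring the file into a web app, it is worth confirming that the pickle loads back and predicts the same values. The sketch below does this round trip on a small model fitted to hypothetical data, and it writes to a temporary directory so no real file is overwritten.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data following y = 2x + 1
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
lr = LinearRegression().fit(x, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "LinearRegression.pkl")
    with open(path, "wb") as f:
        pickle.dump(lr, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions
same = np.allclose(restored.predict(x), lr.predict(x))
print(same)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for saving.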
FLASK Deployment
What Is Flask?
Flask is a popular Python web framework, meaning it is a
Why Flask?
● Easy to use.
● Extensively documented.
Project Structure
This project has four parts :
Learning Model.
<html >
<!--From https://codepen.io/frytyler/pen/EGdtg-->
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
<title>Home</title>
</head>
<style>
body {
width: 100%;
height:10%;
background: #6b66b8;
color: #fff;
font-size: 28px;
text-align:center;
letter-spacing:1.2px;
}
</style>
<style>
#ip4 {
padding: 20px;
width: 300px;
height: 15px;
}
</style>
<body>
<div>
<!--navbar portion-->
<nav>
<ul>
<li class="nav-item">
</li>
<li class="nav-item">
</li>
</ul>
</nav>
<br>
</div>
<br>
<div class="login">
<h1>Bike Sharing Demand Prediction</h1>
<br>
<form action="{{ url_for('predict') }}" method="post">
<br>
</form>
<br>
<br>
{{ prediction_text }}
</div>
</body>
</html>
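import numpy as np
import pickle
from flask import Flask, request, jsonify, render_template

app = Flask(__name__)
# Load the model saved earlier as LinearRegression.pkl
with open('LinearRegression.pkl', 'rb') as f:
    lr = pickle.load(f)

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    '''
    Render the prediction on the HTML page
    '''
    int_features = [int(x) for x in request.form.values()]
    final_features = [np.array(int_features)]
    prediction = lr.predict(final_features)
    output = round(prediction[0], 2)
    # index.html displays this value through {{ prediction_text }}
    return render_template('index.html', prediction_text=str(output))

@app.route('/predict_api', methods=['POST'])
def predict_api():
    '''
    For direct API calls through request
    '''
    data = request.get_json(force=True)
    prediction = lr.predict([np.array(list(data.values()))])
    # Convert the NumPy scalar to a plain float for JSON serialization
    output = float(prediction[0])
    return jsonify(output)

if __name__ == "__main__":
    app.run(debug=True)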
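The JSON endpoint can be exercised without a browser by using Flask's built-in test client. The sketch below is self-contained: `StubModel` is a hypothetical stand-in for the pickled regressor, and the three feature names in the payload are used only for illustration. With the real app, the payload would need one value per model feature, in the order used during training.

```python
import numpy as np
from flask import Flask, jsonify, request

class StubModel:
    """Hypothetical stand-in for the fitted model: predicts the feature sum."""
    def predict(self, rows):
        return np.array([float(np.sum(r)) for r in rows])

app = Flask(__name__)
lr = StubModel()

@app.route("/predict_api", methods=["POST"])
def predict_api():
    data = request.get_json(force=True)
    prediction = lr.predict([np.array(list(data.values()))])
    return jsonify(float(prediction[0]))

# The test client sends requests to the app without starting a server
client = app.test_client()
response = client.post("/predict_api", json={"hr": 8, "temp": 1, "hum": 2})
result = response.get_json()
print(response.status_code, result)
```

Because the endpoint reads `data.values()` in insertion order, the key order of the JSON payload matters as much as the values themselves.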
Output
Conclusion
In this tutorial, we have achieved the implementation of linear