Linear Regression

2/28/22, 3:10 PM Linear regression

NAME:- PRIADARSHANA
ROLL NO:- 2019332

SIMPLE LINEAR REGRESSION

localhost:8888/notebooks/Machine learning/Linear regression.ipynb 1/7



In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import statsmodels.api as sm

data = pd.read_csv(r"Advertising.csv")
data

Out[1]:
Unnamed: 0 TV Radio Newspaper Sales

0 1 230.1 37.8 69.2 22.1

1 2 44.5 39.3 45.1 10.4

2 3 17.2 45.9 69.3 9.3

3 4 151.5 41.3 58.5 18.5

4 5 180.8 10.8 58.4 12.9

5 6 8.7 48.9 75.0 7.2

6 7 57.5 32.8 23.5 11.8

7 8 120.2 19.6 11.6 13.2

8 9 8.6 2.1 1.0 4.8

9 10 199.8 2.6 21.2 10.6

10 11 66.1 5.8 24.2 8.6

11 12 214.7 24.0 4.0 17.4

12 13 23.8 35.1 65.9 9.2

13 14 97.5 7.6 7.2 9.7

14 15 204.1 32.9 46.0 19.0

15 16 195.4 47.7 52.9 22.4

16 17 67.8 36.6 114.0 12.5

17 18 281.4 39.6 55.8 24.4

18 19 69.2 20.5 18.3 11.3

19 20 147.3 23.9 19.1 14.6

20 21 218.4 27.7 53.4 18.0

21 22 237.4 5.1 23.5 12.5

22 23 13.2 15.9 49.6 5.6

23 24 228.3 16.9 26.2 15.5

24 25 62.3 12.6 18.3 9.7

25 26 262.9 3.5 19.5 12.0

26 27 142.9 29.3 12.6 15.0

27 28 240.1 16.7 22.9 15.9

28 29 248.8 27.1 22.9 18.9

29 30 70.6 16.0 40.8 10.5

... ... ... ... ... ...

170 171 50.0 11.6 18.4 8.4

171 172 164.5 20.9 47.4 14.5

172 173 19.6 20.1 17.0 7.6

173 174 168.4 7.1 12.8 11.7

174 175 222.4 3.4 13.1 11.5

175 176 276.9 48.9 41.8 27.0

176 177 248.4 30.2 20.3 20.2

177 178 170.2 7.8 35.2 11.7

178 179 276.7 2.3 23.7 11.8

179 180 165.6 10.0 17.6 12.6

180 181 156.6 2.6 8.3 10.5

181 182 218.5 5.4 27.4 12.2

182 183 56.2 5.7 29.7 8.7

183 184 287.6 43.0 71.8 26.2

184 185 253.8 21.3 30.0 17.6

185 186 205.0 45.1 19.6 22.6

186 187 139.5 2.1 26.6 10.3

187 188 191.1 28.7 18.2 17.3

188 189 286.0 13.9 3.7 15.9

189 190 18.7 12.1 23.4 6.7

190 191 39.5 41.1 5.8 10.8

191 192 75.5 10.8 6.0 9.9

192 193 17.2 4.1 31.6 5.9

193 194 166.8 42.0 3.6 19.6

194 195 149.7 35.6 6.0 17.3

195 196 38.2 3.7 13.8 7.6

196 197 94.2 4.9 8.1 9.7

197 198 177.0 9.3 6.4 12.8

198 199 283.6 42.0 66.2 25.5

199 200 232.1 8.6 8.7 13.4

200 rows × 5 columns

In [2]: data.columns

Out[2]: Index(['Unnamed: 0', 'TV', 'Radio', 'Newspaper', 'Sales'], dtype='object')


In [4]: plt.figure(figsize=(16, 8))
plt.scatter(
    data['TV'],
    data['Sales']
)
plt.xlabel("TV ")
plt.ylabel("Sales ")
plt.show()

In [6]: X = data['TV'].values.reshape(-1,1)
y = data['Sales'].values.reshape(-1,1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_stat

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
reg = LinearRegression()
reg.fit(X_train, y_train)

(140, 1)

(60, 1)

(140, 1)

(60, 1)

Out[6]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
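
The fit above is an ordinary least-squares fit. For a single predictor the estimates have a closed form, which the following minimal sketch works through with NumPy. It assumes only a handful of (TV, Sales) rows copied from the table earlier in the notebook, not the 140-row training split, so the numbers it prints will differ from the notebook's coefficients.

```python
import numpy as np

# A few (TV, Sales) rows copied from the data table; illustration only,
# not the training split used in the notebook.
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2])
sales = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2])

# Closed-form least-squares estimates for one predictor:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   intercept = mean(y) - slope * mean(x)
x_centered = tv - tv.mean()
y_centered = sales - sales.mean()
slope = (x_centered * y_centered).sum() / (x_centered ** 2).sum()
intercept = sales.mean() - slope * tv.mean()

# Cross-check against NumPy's degree-1 polynomial fit,
# which returns the coefficients highest power first.
poly_slope, poly_intercept = np.polyfit(tv, sales, 1)

print("slope    :", slope)
print("intercept:", intercept)
```

On the full training data, the same two formulas should reproduce the values that `reg.coef_` and `reg.intercept_` report.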


In [7]: print(reg.coef_[0][0])
print(reg.intercept_[0])

print("The linear model is: Y = {:.5} + {:.5}X".format(reg.intercept_[0], reg.coe

0.04581434217189623

7.310810165411681

The linear model is: Y = 7.3108 + 0.045814X
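
The printed equation can be used directly to estimate Sales for a given TV value. The sketch below hard-codes the intercept and slope shown in the output above; `predict_sales` is a hypothetical helper introduced for illustration and is not part of the notebook.

```python
# Coefficients copied from the output of the cell above.
INTERCEPT = 7.310810165411681
SLOPE = 0.04581434217189623


def predict_sales(tv_value):
    """Hypothetical helper: evaluate Y = INTERCEPT + SLOPE * X for one TV value."""
    return INTERCEPT + SLOPE * tv_value


# Evaluate the fitted line at a few TV values (230.1 is the first row of the table).
for tv_value in (0.0, 100.0, 230.1):
    print(tv_value, round(predict_sales(tv_value), 3))
```

With the fitted estimator in scope, `reg.predict([[100.0]])` should give the same estimate as `predict_sales(100.0)`.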

In [9]: predictions = reg.predict(X_test)

plt.figure(figsize=(16, 8))
plt.scatter(
    data['TV'],
    data['Sales']
)
plt.plot(
    X_test,
    predictions,
    linewidth=2,
    color='red'
)
plt.xlabel("TV ")
plt.ylabel("Sales ")
plt.show()


In [10]: X = X_train
y = y_train
X2 = sm.add_constant(X)  # add the intercept column
est = sm.OLS(y, X2)      # same simple model, fitted with statsmodels
est2 = est.fit()
print(est2.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.555
Model:                            OLS   Adj. R-squared:                  0.552
Method:                 Least Squares   F-statistic:                     172.3
Date:                Mon, 28 Feb 2022   Prob (F-statistic):           4.76e-26
Time:                        14:28:59   Log-Likelihood:                -371.64
No. Observations:                 140   AIC:                             747.3
Df Residuals:                     138   BIC:                             753.2
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          7.3108      0.611     11.957      0.000       6.102       8.520
x1             0.0458      0.003     13.125      0.000       0.039       0.053
==============================================================================
Omnibus:                        1.727   Durbin-Watson:                   1.908
Prob(Omnibus):                  0.422   Jarque-Bera (JB):                1.452
Skew:                          -0.086   Prob(JB):                        0.484
Kurtosis:                       2.532   Cond. No.                         366.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

In [11]: print('Train Score :', reg.score(X_train,y_train))
print('Test Score:', reg.score(X_test,y_test))

Train Score : 0.5552336104251212

Test Score: 0.725606346597073
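
For `LinearRegression`, `score` returns the coefficient of determination R². As a minimal sketch of what that number measures, the block below computes R² by hand with NumPy. `r_squared` is a hypothetical helper, and the five rows are copied from the data table for illustration, so the printed value is not expected to match either score above.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Five (TV, Sales) rows from the table, predicted with the printed line.
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
y_true = np.array([22.1, 10.4, 9.3, 18.5, 12.9])
y_pred = 7.3108 + 0.045814 * tv

print("R^2 on the sample rows:", r_squared(y_true, y_pred))
```

A perfect prediction gives 1.0, and always predicting the mean of `y_true` gives 0.0.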

In [12]: from sklearn import metrics

print('MSE :', metrics.mean_squared_error(y_test,predictions))
print('RMSE :', np.sqrt(metrics.mean_squared_error(y_test,predictions)))

MSE : 7.497479593464674

RMSE : 2.7381525876883988
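
MSE is the mean of the squared prediction errors, and RMSE is its square root, which puts the error back in the units of Sales. The sketch below computes both by hand with NumPy. It assumes five rows copied from the data table rather than the notebook's 60-row test split, so its values differ from the output above.

```python
import numpy as np

# Five (TV, Sales) rows from the table; illustration only.
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
y_true = np.array([22.1, 10.4, 9.3, 18.5, 12.9])

# Predictions from the fitted line printed earlier in the notebook.
y_pred = 7.310810165411681 + 0.04581434217189623 * tv

errors = y_true - y_pred
mse = np.mean(errors ** 2)  # mean squared error
rmse = np.sqrt(mse)         # root mean squared error

print("MSE :", mse)
print("RMSE:", rmse)

# The notebook's own two numbers are related the same way.
print(np.isclose(np.sqrt(7.497479593464674), 2.7381525876883988))
```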

MULTIPLE LINEAR REGRESSION


In [14]: Xs = data.drop(['Sales', 'Unnamed: 0'], axis=1)
y = data['Sales'].values.reshape(-1,1)
reg = LinearRegression()
reg.fit(Xs, y)
print("The linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper

The linear model is: Y = 2.9389 + 0.045765*TV + 0.18853*radio + -0.0010375*newspaper

In [16]: X = np.column_stack((data['TV'], data['Radio'], data['Newspaper']))
y = data['Sales']
X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Mon, 28 Feb 2022   Prob (F-statistic):           1.58e-96
Time:                        15:01:01   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.9389      0.312      9.422      0.000       2.324       3.554
x1             0.0458      0.001     32.809      0.000       0.043       0.049
x2             0.1885      0.009     21.893      0.000       0.172       0.206
x3            -0.0010      0.006     -0.177      0.860      -0.013       0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
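
The `const`, `x1`, `x2` and `x3` rows above are the least-squares coefficients for a design matrix whose first column is all ones, followed by TV, Radio and Newspaper. The sketch below builds such a matrix by hand and solves it with `np.linalg.lstsq`. It assumes only the first eight rows of the data table, so the coefficients it prints will not match the summary, which uses all 200 rows.

```python
import numpy as np

# First eight rows of the table: TV, Radio, Newspaper (illustration only).
features = np.array([
    [230.1, 37.8, 69.2],
    [44.5, 39.3, 45.1],
    [17.2, 45.9, 69.3],
    [151.5, 41.3, 58.5],
    [180.8, 10.8, 58.4],
    [8.7, 48.9, 75.0],
    [57.5, 32.8, 23.5],
    [120.2, 19.6, 11.6],
])
sales = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2])

# Prepend a column of ones for the intercept, as sm.add_constant does by default.
design = np.column_stack((np.ones(len(features)), features))

# Least-squares solution; beta is ordered [const, TV, Radio, Newspaper].
beta, *_ = np.linalg.lstsq(design, sales, rcond=None)
residuals = sales - design @ beta

print("coefficients:", beta)
```

At the least-squares solution the residuals are orthogonal to every column of the design matrix, which is a quick way to sanity-check a fit.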
