Linear Regression

ROLL NO:- 2019332


In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import statsmodels.api as sm

# Load the advertising data and display it
data = pd.read_csv(r"Advertising.csv")
data

Unnamed: 0 TV Radio Newspaper Sales

0 1 230.1 37.8 69.2 22.1

1 2 44.5 39.3 45.1 10.4

2 3 17.2 45.9 69.3 9.3

3 4 151.5 41.3 58.5 18.5

4 5 180.8 10.8 58.4 12.9

5 6 8.7 48.9 75.0 7.2

6 7 57.5 32.8 23.5 11.8

7 8 120.2 19.6 11.6 13.2

8 9 8.6 2.1 1.0 4.8

9 10 199.8 2.6 21.2 10.6

10 11 66.1 5.8 24.2 8.6

11 12 214.7 24.0 4.0 17.4

12 13 23.8 35.1 65.9 9.2

13 14 97.5 7.6 7.2 9.7

14 15 204.1 32.9 46.0 19.0

15 16 195.4 47.7 52.9 22.4

16 17 67.8 36.6 114.0 12.5

17 18 281.4 39.6 55.8 24.4

18 19 69.2 20.5 18.3 11.3

19 20 147.3 23.9 19.1 14.6

20 21 218.4 27.7 53.4 18.0

21 22 237.4 5.1 23.5 12.5

22 23 13.2 15.9 49.6 5.6

23 24 228.3 16.9 26.2 15.5

24 25 62.3 12.6 18.3 9.7

25 26 262.9 3.5 19.5 12.0

26 27 142.9 29.3 12.6 15.0

27 28 240.1 16.7 22.9 15.9

28 29 248.8 27.1 22.9 18.9

29 30 70.6 16.0 40.8 10.5

... ... ... ... ... ...

170 171 50.0 11.6 18.4 8.4

171 172 164.5 20.9 47.4 14.5

172 173 19.6 20.1 17.0 7.6

173 174 168.4 7.1 12.8 11.7

174 175 222.4 3.4 13.1 11.5

175 176 276.9 48.9 41.8 27.0

176 177 248.4 30.2 20.3 20.2

177 178 170.2 7.8 35.2 11.7

178 179 276.7 2.3 23.7 11.8

179 180 165.6 10.0 17.6 12.6

180 181 156.6 2.6 8.3 10.5

181 182 218.5 5.4 27.4 12.2

182 183 56.2 5.7 29.7 8.7

183 184 287.6 43.0 71.8 26.2

184 185 253.8 21.3 30.0 17.6

185 186 205.0 45.1 19.6 22.6

186 187 139.5 2.1 26.6 10.3

187 188 191.1 28.7 18.2 17.3

188 189 286.0 13.9 3.7 15.9

189 190 18.7 12.1 23.4 6.7

190 191 39.5 41.1 5.8 10.8

191 192 75.5 10.8 6.0 9.9

192 193 17.2 4.1 31.6 5.9

193 194 166.8 42.0 3.6 19.6

194 195 149.7 35.6 6.0 17.3

195 196 38.2 3.7 13.8 7.6

196 197 94.2 4.9 8.1 9.7

197 198 177.0 9.3 6.4 12.8

198 199 283.6 42.0 66.2 25.5

199 200 232.1 8.6 8.7 13.4

200 rows × 5 columns

In [2]: data.columns

Out[2]: Index(['Unnamed: 0', 'TV', 'Radio', 'Newspaper', 'Sales'], dtype='object')

In [4]: plt.figure(figsize=(16, 8))
# Scatter plot of Sales against TV spend
plt.scatter(data['TV'], data['Sales'])
plt.xlabel("TV ")
plt.ylabel("Sales ")
plt.show()

In [6]: X = data['TV'].values.reshape(-1,1)
y = data['Sales'].values.reshape(-1,1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_stat

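# Check the shapes of the train/test split
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

# Fit a simple linear regression of Sales on TV using the training split
reg = LinearRegression()
reg.fit(X_train, y_train)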

(140, 1)

(60, 1)

(140, 1)

(60, 1)

Out[6]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)


In [7]: print(reg.coef_[0][0])

print("The linear model is: Y = {:.5} + {:.5}X".format(reg.intercept_[0], reg.coef_[0][0]))



The linear model is: Y = 7.3108 + 0.045814X
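The printed equation can be applied by hand to a single TV value. The sketch below is only an illustration: the helper name predict_sales_from_tv is hypothetical, and the coefficients are the rounded values from the line above, so results differ slightly from reg.predict.

```python
# Coefficients copied from the printed model above (rounded, training split).
INTERCEPT = 7.3108
SLOPE_TV = 0.045814


def predict_sales_from_tv(tv_budget):
    """Hypothetical helper: evaluate Y = INTERCEPT + SLOPE_TV * X for one TV value."""
    return INTERCEPT + SLOPE_TV * tv_budget


example_tv = 230.1  # TV value of row 0 in the data shown earlier
predicted = predict_sales_from_tv(example_tv)
print("TV = {} -> predicted Sales = {:.2f}".format(example_tv, predicted))
```

Row 0 has an actual Sales value of 22.1, so the gap between it and the prediction is that row's residual under the TV-only model.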

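In [9]: predictions = reg.predict(X_test)

plt.figure(figsize=(16, 8))
# Test points with the fitted regression line drawn over them
plt.scatter(X_test, y_test)
plt.plot(X_test, predictions)
plt.xlabel("TV ")
plt.ylabel("Sales ")
plt.show()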

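In [10]: X = X_train
X2 = sm.add_constant(X)    # add the intercept column
est = sm.OLS(y_train, X2)  # the target must match the 140 training rows
est2 = est.fit()
print(est2.summary())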

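                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.555
Model:                            OLS   Adj. R-squared:                  0.552
Method:                 Least Squares   F-statistic:                     172.3
Date:                Mon, 28 Feb 2022   Prob (F-statistic):           4.76e-26
Time:                        14:28:59   Log-Likelihood:                -371.64
No. Observations:                 140   AIC:                             747.3
Df Residuals:                     138   BIC:                             753.2
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          7.3108      0.611     11.957      0.000       6.102       8.520
x1             0.0458      0.003     13.125      0.000       0.039       0.053
==============================================================================
Omnibus:                        1.727   Durbin-Watson:                   1.908
Prob(Omnibus):                  0.422   Jarque-Bera (JB):                1.452
Skew:                          -0.086   Prob(JB):                        0.484
Kurtosis:                       2.532   Cond. No.                         366.
==============================================================================

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.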
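Several entries in the summary above are tied together by simple formulas, which gives a quick consistency check. The sketch below uses only the rounded values printed in the table; the helper name adjusted_r2 is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Hypothetical helper: adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values read from the summary table above (rounded as printed).
r2 = 0.555
n_obs = 140
n_predictors = 1
t_slope = 13.125

# Residual degrees of freedom: observations minus predictors minus the intercept.
df_resid = n_obs - n_predictors - 1

adj = adjusted_r2(r2, n_obs, n_predictors)

# With a single predictor, the F-statistic equals the squared t-statistic of the slope.
f_from_t = t_slope ** 2

print("Df Residuals:", df_resid)
print("Adj. R-squared: {:.3f}".format(adj))
print("F-statistic from t: {:.1f}".format(f_from_t))
```

All three results agree with the table (138, 0.552 and 172.3) to the printed precision.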

In [11]: print('Train Score :', reg.score(X_train,y_train))

print('Test Score:', reg.score(X_test,y_test))

Train Score : 0.5552336104251212

Test Score: 0.725606346597073
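For a regressor, reg.score returns the coefficient of determination R^2 = 1 - SS_res / SS_tot. The sketch below computes it by hand; the helper name r2_manual and the four data points are made up for illustration and are not taken from Advertising.csv.

```python
def r2_manual(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 6.5, 9.5]

r2_demo = r2_manual(y_true_demo, y_pred_demo)
print("R^2 =", r2_demo)
```

SS_tot is measured around the mean of whichever split is being scored, so the train and test scores have different baselines and the test score is not guaranteed to be the lower of the two.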

In [12]: from sklearn import metrics

print('MSE :', metrics.mean_squared_error(y_test,predictions))

print('RMSE :', np.sqrt(metrics.mean_squared_error(y_test,predictions)))

MSE : 7.497479593464674

RMSE : 2.7381525876883988
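MSE is the mean of the squared prediction errors, and RMSE is its square root, which puts the error back in the units of Sales. The sketch below checks the printed pair of numbers and then computes both metrics by hand; the helper name mse_manual and the small arrays are illustrative assumptions.

```python
import math


def mse_manual(y_true, y_pred):
    """Hypothetical helper: mean of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# The RMSE printed above is the square root of the printed MSE.
mse_reported = 7.497479593464674
rmse_from_mse = math.sqrt(mse_reported)
print("RMSE from reported MSE:", rmse_from_mse)

# Illustrative values only (not from the dataset).
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.5, 5.5, 6.5, 9.5]
mse_demo = mse_manual(y_true_demo, y_pred_demo)
rmse_demo = math.sqrt(mse_demo)
print("demo MSE:", mse_demo, "demo RMSE:", rmse_demo)
```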


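In [14]: Xs = data.drop(['Sales', 'Unnamed: 0'], axis=1)
y = data['Sales'].values.reshape(-1,1)

# Fit a multiple linear regression on TV, Radio and Newspaper (all 200 rows)
reg = LinearRegression()
reg.fit(Xs, y)
print("The linear model is: Y = {:.5} + {:.5}*TV + {:.5}*radio + {:.5}*newspaper".format(
    reg.intercept_[0], reg.coef_[0][0], reg.coef_[0][1], reg.coef_[0][2]))

The linear model is: Y = 2.9389 + 0.045765*TV + 0.18853*radio + -0.0010375*newspaper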
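The three-feature equation can be evaluated by hand in the same way as the TV-only one. This is a sketch under stated assumptions: the helper name predict_sales is hypothetical, and the coefficients are the rounded values printed above.

```python
# Rounded coefficients copied from the printed multiple-regression model.
INTERCEPT = 2.9389
COEF_TV = 0.045765
COEF_RADIO = 0.18853
COEF_NEWSPAPER = -0.0010375


def predict_sales(tv, radio, newspaper):
    """Hypothetical helper: evaluate the printed linear model for one row."""
    return INTERCEPT + COEF_TV * tv + COEF_RADIO * radio + COEF_NEWSPAPER * newspaper


# Row 0 of the data shown earlier: TV=230.1, Radio=37.8, Newspaper=69.2, Sales=22.1
pred_row0 = predict_sales(230.1, 37.8, 69.2)
print("Predicted Sales for row 0: {:.2f}".format(pred_row0))
```

For this row the result is closer to the actual Sales value of 22.1 than the TV-only equation gives.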


In [16]: X = np.column_stack((data['TV'], data['Radio'], data['Newspaper']))
y = data['Sales']

X2 = sm.add_constant(X)  # add the intercept column
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())

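                            OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Mon, 28 Feb 2022   Prob (F-statistic):           1.58e-96
Time:                        15:01:01   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.9389      0.312      9.422      0.000       2.324       3.554
x1             0.0458      0.001     32.809      0.000       0.043       0.049
x2             0.1885      0.009     21.893      0.000       0.172       0.206
x3            -0.0010      0.006     -0.177      0.860      -0.013       0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.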
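One common way to read the coefficient table is to check which 95% confidence intervals contain zero. The sketch below does that with the intervals printed above. The x1, x2, x3 labels are mapped to TV, Radio and Newspaper following the np.column_stack order in In [16]; the helper name is hypothetical.

```python
# 95% confidence intervals copied from the summary table above.
# x1..x3 follow the column order used in np.column_stack: TV, Radio, Newspaper.
intervals = {
    "const": (2.324, 3.554),
    "x1 (TV)": (0.043, 0.049),
    "x2 (Radio)": (0.172, 0.206),
    "x3 (Newspaper)": (-0.013, 0.011),
}


def interval_contains_zero(low, high):
    """Hypothetical helper: True when zero lies inside [low, high]."""
    return low <= 0.0 <= high


# Terms whose interval spans zero cannot be distinguished from "no effect" at the 5% level.
spans_zero = [name for name, (low, high) in intervals.items()
              if interval_contains_zero(low, high)]
print(spans_zero)
```

Only the Newspaper term is flagged, which matches its p-value of 0.860 in the table.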

