Ex - No 2

Ex. No: 2
Date:

EXPLORATORY DATA ANALYSIS ON 2 VARIABLE LINEAR REGRESSION MODEL

AIM:
To write and execute a Python program that performs exploratory data analysis on a
2 variable linear regression model.

ALGORITHM:
Step 1: Start
Step 2: Use NumPy to generate synthetic data with an independent variable (X) and a
dependent variable (y).
Step 3: Create a scatter plot to visualize the relationship between the independent and
dependent variables.
Step 4: Calculate the correlation coefficient between X and y using NumPy.
Step 5: Fit a linear regression model and plot the regression line.
Step 6: Plot the residuals against the independent variable (X) and add a horizontal line at y=0.
Step 7: Display a histogram of the residuals using Seaborn.
Step 8: Fit the linear regression model using the statsmodels library and print a summary.
Step 9: Stop
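
Steps 4 and 5 can also be worked out by hand, which may help when interpreting the library output. The block below is a minimal sketch, not part of the original record: it assumes the same synthetic-data recipe (seed 42, true intercept 4, true slope 3) but uses 1-D arrays, and the names r_manual, slope, and intercept are illustrative. It computes the Pearson correlation and the least-squares line from their textbook formulas and compares them with np.corrcoef and np.polyfit.

```python
import numpy as np

# Assumed setup: same recipe as the record's program, with 1-D arrays for simplicity.
np.random.seed(42)
x = 2 * np.random.rand(100)             # independent variable
y = 4 + 3 * x + np.random.randn(100)    # true intercept 4, true slope 3, plus noise

# Deviations from the means, reused by both formulas below
dx = x - x.mean()
dy = y - y.mean()

# Step 4 by hand: Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Step 5 by hand: least-squares fit of y = intercept + slope * x
slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()

# Cross-check against NumPy's built-in routines
r_numpy = np.corrcoef(x, y)[0, 1]
slope_np, intercept_np = np.polyfit(x, y, deg=1)

print(f"r: manual = {r_manual:.4f}, np.corrcoef = {r_numpy:.4f}")
print(f"slope: manual = {slope:.4f}, np.polyfit = {slope_np:.4f}")
print(f"intercept: manual = {intercept:.4f}, np.polyfit = {intercept_np:.4f}")
```

The fitted slope and intercept should land near, but not exactly on, the true values 3 and 4, because the noise term shifts the estimates.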

PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm

# Generate synthetic data (fixed seed so every run produces the same numbers)
np.random.seed(42)
X = 2 * np.random.rand(100, 1) # Independent variable
y = 4 + 3 * X + np.random.randn(100, 1) # Dependent variable with some noise

# Scatter plot
plt.scatter(X, y)
plt.title('Scatter Plot of X vs. y')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

# Correlation
correlation = np.corrcoef(X.flatten(), y.flatten())[0, 1]
print(f'Correlation between X and y: {correlation}')

# Regression Line
regression_model = LinearRegression()
regression_model.fit(X, y)
plt.scatter(X, y)
plt.plot(X, regression_model.predict(X), color='red', linewidth=3)
plt.title('Regression Line of X vs. y')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

# Residual Plot
residuals = y - regression_model.predict(X)
plt.scatter(X, residuals)
plt.axhline(y=0, color='r', linestyle='--')
plt.title('Residual Plot')
plt.xlabel('X')
plt.ylabel('Residuals')
plt.show()

# Distribution of Residuals
sns.histplot(residuals.flatten(), kde=True)
plt.title('Distribution of Residuals')
plt.xlabel('Residuals')
plt.show()

# Statistical Summary
X_with_constant = sm.add_constant(X)
model = sm.OLS(y, X_with_constant).fit()
print(model.summary())
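
The residual plot and the R-squared value in the summary can be sanity-checked numerically. The following is a hedged sketch rather than part of the original program: it assumes the same data recipe, fits the line with np.polyfit instead of scikit-learn or statsmodels so that it needs only NumPy, and uses illustrative names (resid_mean, resid_x_cov, r_squared). For a least-squares fit that includes an intercept, the residuals should average to zero, be uncorrelated with X, and give an R-squared equal to the squared correlation coefficient.

```python
import numpy as np

# Assumed setup: same recipe as the program above, with 1-D arrays.
np.random.seed(42)
x = 2 * np.random.rand(100)
y = 4 + 3 * x + np.random.randn(100)

# Least-squares line (np.polyfit returns the highest-degree coefficient first)
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Check 1: with an intercept in the model, residuals average to zero
resid_mean = residuals.mean()

# Check 2: residuals have zero covariance with the independent variable
resid_x_cov = np.mean((x - x.mean()) * residuals)

# Check 3: R^2 = 1 - SS_res / SS_tot equals r^2 in two-variable regression
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
r = np.corrcoef(x, y)[0, 1]

print(f"mean residual       : {resid_mean:.2e}")
print(f"cov(x, residuals)   : {resid_x_cov:.2e}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

If the first two values are not close to zero, or the residual plot shows a curve or funnel shape, that suggests a straight line with constant noise is not a good description of the data.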

OUTPUT:

RESULT:
Thus, the program on exploratory data analysis on a 2 variable linear regression model
was successfully executed and verified.
