
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

TOPIC: SUPERVISED MACHINE LEARNING AND REGRESSION

SUBMITTED BY:

SENTHURAN L K (21ECB32)
VISHVA D (21ECB60)
VARSHA D S (21ECB57)
ROHITH G (21ECB18)

Find your own data set. As a suggested first step, spend some time finding a data set that
you are really passionate about. This can be a data set similar to the data you have available
at work or data you have always wanted to analyze. For some people this will be sports data
sets, while some other folks prefer to focus on data from a datathon or data for good.

PROJECT REPORT

Analyzing Housing Data

Main Objective of the Analysis:

• The main objective of this analysis is to predict housing prices based on various attributes
using linear regression models.

Brief Description of the Data Set:

• The data set used for this analysis contains information about housing prices, including features
such as square footage, number of bedrooms, number of bathrooms, location, etc.

Data Exploration and Cleaning:

• During data exploration, we examined the distribution of each feature, checked for missing values,
and handled outliers where necessary. We also performed feature engineering to create additional
features that might be useful for prediction.
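The cleaning steps described above can be sketched as follows. This is an illustrative example rather than the project's actual code: the 1.5×IQR outlier rule and the price_per_sqft feature are assumptions chosen for demonstration, while the column names follow the dataset shown later in the report.

```python
import pandas as pd

# Small sample mirroring the columns of the housing dataset
# (sqft, bedrooms, bathrooms, price)
df = pd.DataFrame({
    'sqft': [2104, 1600, 2400, 1416, 3000],
    'bedrooms': [3, 3, 3, 2, 4],
    'bathrooms': [2.0, 2.0, 3.0, 1.0, 4.0],
    'price': [399900, 329900, 369000, 232000, 539900],
})

# Check for missing values per column
print(df.isna().sum())

# Flag outliers with the 1.5*IQR rule on sqft (one common heuristic)
q1, q3 = df['sqft'].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df['sqft'] < q1 - 1.5 * iqr) | (df['sqft'] > q3 + 1.5 * iqr)]

# Simple feature engineering: price per square foot (hypothetical extra feature)
df['price_per_sqft'] = df['price'] / df['sqft']
```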

Summary of Linear Regression Models:

1. Simple Linear Regression: We started with a simple linear regression model using only one
feature, such as square footage, as a predictor.
2. Polynomial Regression: We extended the simple linear regression by including polynomial
features to capture nonlinear relationships between predictors and the target variable.
3. Regularized Regression: We applied ridge or lasso regression to handle multicollinearity and
prevent overfitting by adding a penalty term to the loss function.
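To make step 2 concrete, here is what scikit-learn's PolynomialFeatures produces for a single predictor: with degree=2, each row [x] is expanded to [1, x, x²], where the leading 1 is the bias column that sklearn includes by default.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# degree=2 expansion of a single feature: [x] -> [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X = np.array([[2.0], [3.0]])
X_poly = poly.fit_transform(X)
print(X_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

The expanded matrix is then fed to an ordinary LinearRegression, which is what lets a linear model capture the nonlinear relationship.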

Key Findings and Insights:


• Square footage, number of bedrooms, and location are significant predictors of housing prices.
• There might be interactions between certain features that could affect housing prices.
• The model provides insights into how different features contribute to variations in housing
prices.
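One way to probe the possible interactions noted above is to add an explicit product term to the design matrix. The sqft × bedrooms interaction below is a hypothetical illustration on a few sample rows, not a finding from the analysis.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows with the dataset's columns
df = pd.DataFrame({
    'sqft': [2104, 1600, 2400, 1416, 3000],
    'bedrooms': [3, 3, 3, 2, 4],
    'price': [399900, 329900, 369000, 232000, 539900],
})

# Hypothetical interaction feature: sqft * bedrooms
df['sqft_x_bedrooms'] = df['sqft'] * df['bedrooms']

model = LinearRegression()
model.fit(df[['sqft', 'bedrooms', 'sqft_x_bedrooms']], df['price'])
print(model.coef_)  # one coefficient per feature, including the interaction term
```

A noticeably nonzero coefficient on the interaction term (relative to its scale) would suggest the two features jointly influence price beyond their separate effects.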

Code Implementation:

# Data loading, exploration, and cleaning
import pandas as pd

# Load the dataset
housing_data = pd.read_csv('housing_data.csv')

# Data exploration
print(housing_data.head())
print(housing_data.describe())
print(housing_data.info())

# Data cleaning (handling missing values, outliers, etc.)

# Feature engineering (creating new features, handling categorical variables, etc.)

# Linear regression modeling
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Split data into training and testing sets
X = housing_data.drop('price', axis=1)
y = housing_data['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Simple linear regression
simple_lr = LinearRegression()
simple_lr.fit(X_train[['sqft']], y_train)
simple_lr_pred = simple_lr.predict(X_test[['sqft']])
simple_lr_mse = mean_squared_error(y_test, simple_lr_pred)

# Polynomial regression
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X[['sqft']])
X_train_poly, X_test_poly, y_train_poly, y_test_poly = train_test_split(
    X_poly, y, test_size=0.2, random_state=42)
poly_lr = LinearRegression()
poly_lr.fit(X_train_poly, y_train_poly)
poly_lr_pred = poly_lr.predict(X_test_poly)
poly_lr_mse = mean_squared_error(y_test_poly, poly_lr_pred)

# Regularized regression (Ridge and Lasso)
ridge = Ridge(alpha=0.5)
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)

lasso = Lasso(alpha=0.5)
lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_pred)

# Select the best model based on MSE
mse_values = {'Simple Linear Regression': simple_lr_mse,
              'Polynomial Regression': poly_lr_mse,
              'Ridge Regression': ridge_mse,
              'Lasso Regression': lasso_mse}
best_model = min(mse_values, key=mse_values.get)

# Print the best model and its MSE
print(f"Best Model: {best_model}, MSE: {mse_values[best_model]}")

Output:

   sqft  bedrooms  bathrooms   price
0  2104         3        2.0  399900
1  1600         3        2.0  329900
2  2400         3        3.0  369000
3  1416         2        1.0  232000
4  3000         4        4.0  539900

              sqft   bedrooms  bathrooms          price
count    47.000000  47.000000  47.000000      47.000000
mean   2000.680851   3.170213   2.148936  340412.659574
std     794.702354   0.760982   0.780873  125039.899586
min     852.000000   1.000000   1.000000  169900.000000
25%    1432.000000   3.000000   2.000000  249900.000000
50%    1888.000000   3.000000   2.000000  299900.000000
75%    2269.000000   4.000000   3.000000  384450.000000
max    4478.000000   5.000000   4.000000  699900.000000

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
 0   sqft       47 non-null     int64
 1   bedrooms   47 non-null     int64
 2   bathrooms  47 non-null     float64
 3   price      47 non-null     int64
dtypes: float64(1), int64(3)
memory usage: 1.6 KB
