
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

TOPIC: SUPERVISED MACHINE LEARNING AND REGRESSION

SUBMITTED BY:

SENTHURAN L K (21ECB32)
VISHVA D (21ECB60)
VARSHA D S (21ECB57)
ROHITH G (21ECB18)

Find your own data set. As a suggested first step, spend some time finding a data set that
you are really passionate about. This can be a data set similar to the data you have available
at work or data you have always wanted to analyze. For some people this will be sports data
sets, while some other folks prefer to focus on data from a datathon or data for good.

PROJECT REPORT

Analyzing Housing Data

Main Objective of the Analysis:

• The main objective of this analysis is to predict housing prices based on various attributes
using linear regression models.

Brief Description of the Data Set:

• The data set used for this analysis contains information about housing prices, including features
such as square footage, number of bedrooms, number of bathrooms, location, etc.

Data Exploration and Cleaning:

• During data exploration, we examined the distribution of each feature, checked for missing values,
and handled outliers where necessary. We also performed feature engineering to create additional
features that might be useful for prediction.
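The cleaning steps described above can be sketched as follows. This is an illustrative example rather than the project's actual code: the 1.5×IQR outlier rule and the price_per_sqft feature are assumptions chosen for demonstration, while the column names follow the dataset shown later in the report.

```python
import pandas as pd

# Small sample mirroring the columns of the housing dataset
# (sqft, bedrooms, bathrooms, price)
df = pd.DataFrame({
    'sqft': [2104, 1600, 2400, 1416, 3000],
    'bedrooms': [3, 3, 3, 2, 4],
    'bathrooms': [2.0, 2.0, 3.0, 1.0, 4.0],
    'price': [399900, 329900, 369000, 232000, 539900],
})

# Check for missing values per column
print(df.isna().sum())

# Flag outliers with the 1.5*IQR rule on sqft (one common heuristic)
q1, q3 = df['sqft'].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df['sqft'] < q1 - 1.5 * iqr) | (df['sqft'] > q3 + 1.5 * iqr)]

# Simple feature engineering: price per square foot (hypothetical extra feature)
df['price_per_sqft'] = df['price'] / df['sqft']
```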

Summary of Linear Regression Models:

1. Simple Linear Regression: We started with a simple linear regression model using only one
feature, such as square footage, as a predictor.
2. Polynomial Regression: We extended the simple linear regression by including polynomial
features to capture nonlinear relationships between predictors and the target variable.
3. Regularized Regression: We applied ridge or lasso regression to handle multicollinearity and
prevent overfitting by adding a penalty term to the loss function.
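To make step 2 concrete, here is what scikit-learn's PolynomialFeatures produces for a single predictor: with degree=2, each row [x] is expanded to [1, x, x²], where the leading 1 is the bias column that sklearn includes by default.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# degree=2 expansion of a single feature: [x] -> [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X = np.array([[2.0], [3.0]])
X_poly = poly.fit_transform(X)
print(X_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

The expanded matrix is then fed to an ordinary LinearRegression, which is what lets a linear model capture the nonlinear relationship.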

Key Findings and Insights:


• Square footage, number of bedrooms, and location are significant predictors of housing prices.
• There might be interactions between certain features that could affect housing prices.
• The model provides insights into how different features contribute to variations in housing
prices.
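One way to probe the possible interactions noted above is to add an explicit product term to the design matrix. The sqft × bedrooms interaction below is a hypothetical illustration on a few sample rows, not a finding from the analysis.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows with the dataset's columns
df = pd.DataFrame({
    'sqft': [2104, 1600, 2400, 1416, 3000],
    'bedrooms': [3, 3, 3, 2, 4],
    'price': [399900, 329900, 369000, 232000, 539900],
})

# Hypothetical interaction feature: sqft * bedrooms
df['sqft_x_bedrooms'] = df['sqft'] * df['bedrooms']

model = LinearRegression()
model.fit(df[['sqft', 'bedrooms', 'sqft_x_bedrooms']], df['price'])
print(model.coef_)  # one coefficient per feature, including the interaction term
```

A noticeably nonzero coefficient on the interaction term (relative to its scale) would suggest the two features jointly influence price beyond their separate effects.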

Code Implementation:

# Data loading, exploration, and cleaning
import pandas as pd

# Load the dataset
housing_data = pd.read_csv('housing_data.csv')

# Data exploration
print(housing_data.head())
print(housing_data.describe())
print(housing_data.info())

# Data cleaning (handling missing values, outliers, etc.)

# Feature engineering (creating new features, handling categorical variables, etc.)

# Linear regression modeling
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

# Split data into training and testing sets
X = housing_data.drop('price', axis=1)
y = housing_data['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Simple linear regression
simple_lr = LinearRegression()
simple_lr.fit(X_train[['sqft']], y_train)
simple_lr_pred = simple_lr.predict(X_test[['sqft']])
simple_lr_mse = mean_squared_error(y_test, simple_lr_pred)

# Polynomial regression
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X[['sqft']])
X_train_poly, X_test_poly, y_train_poly, y_test_poly = train_test_split(
    X_poly, y, test_size=0.2, random_state=42)
poly_lr = LinearRegression()
poly_lr.fit(X_train_poly, y_train_poly)
poly_lr_pred = poly_lr.predict(X_test_poly)
poly_lr_mse = mean_squared_error(y_test_poly, poly_lr_pred)

# Regularized regression (Ridge and Lasso)
ridge = Ridge(alpha=0.5)
ridge.fit(X_train, y_train)
ridge_pred = ridge.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_pred)

lasso = Lasso(alpha=0.5)
lasso.fit(X_train, y_train)
lasso_pred = lasso.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_pred)

# Select the best model based on MSE
mse_values = {'Simple Linear Regression': simple_lr_mse,
              'Polynomial Regression': poly_lr_mse,
              'Ridge Regression': ridge_mse,
              'Lasso Regression': lasso_mse}
best_model = min(mse_values, key=mse_values.get)

# Print the best model and its MSE
print(f"Best Model: {best_model}, MSE: {mse_values[best_model]}")

Output:

   sqft  bedrooms  bathrooms   price
0  2104         3        2.0  399900
1  1600         3        2.0  329900
2  2400         3        3.0  369000
3  1416         2        1.0  232000
4  3000         4        4.0  539900

              sqft   bedrooms  bathrooms          price
count    47.000000  47.000000  47.000000      47.000000
mean   2000.680851   3.170213   2.148936  340412.659574
std     794.702354   0.760982   0.780873  125039.899586
min     852.000000   1.000000   1.000000  169900.000000
25%    1432.000000   3.000000   2.000000  249900.000000
50%    1888.000000   3.000000   2.000000  299900.000000
75%    2269.000000   4.000000   3.000000  384450.000000
max    4478.000000   5.000000   4.000000  699900.000000

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47 entries, 0 to 46
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype
 0   sqft       47 non-null     int64
 1   bedrooms   47 non-null     int64
 2   bathrooms  47 non-null     float64
 3   price      47 non-null     int64
dtypes: float64(1), int64(3)
memory usage: 1.6 KB
