Retail Sales Prediction Model
Table of Contents
0. Introduction
1. Import libraries
2. Import data
3. Data exploration
3.1 Sales Data
3.2 Stores Data
4. Data cleaning
4.1 Sales Data Cleaning
4.2 Stores Data Cleaning
5. Feature engineering
6. Data preparation
6.1 Data Merge
6.2 Encoding Categorical Data
6.3 Data Splitting
6.4 Feature Selection
7. Model training
7.1 Linear Regression
7.2 DecisionTreeRegressor
7.3 RandomForestRegressor
7.4 GradientBoostingRegressor
7.5 AdaBoostRegressor
7.6 XGBoostRegressor
8. Result Evaluation & Conclusion
0. Introduction
Problem Statement
XYZ operates over 3,000 drug stores in 7 countries. XYZ store managers are currently
tasked with predicting their daily sales. Store sales are influenced by many factors,
including promotions, competition, school and state holidays, seasonality, and locality.
With thousands of individual managers predicting sales based on their unique
circumstances, the accuracy of results can be quite varied.
You are provided with historical sales data for 1,115 XYZ stores. Note that some stores in
the dataset were temporarily closed for refurbishment.
localhost:8888/doc/tree/OneDrive/Desktop/ML_projs/ML_models/Retail_Sales_Predictions/Retail_Sales_Prediction_model.ipynb 1/51
08/07/2024, 10:38 Retail_Sales_Prediction_model
This is a regression problem: we will attempt to predict the sales figures from
historical data.
Data Description
Salesdata.csv - Historical Sales Data
Sales dataset
Store dataset
1. Import libraries
2. Import data
Out[6]:
   Store  DayOfWeek        Date  Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday
0      1          5  31-07-2015   5263        555     1      1             0              …
1      2          5  31-07-2015   6064        625     1      1             0              …
2      3          5  31-07-2015   8314        821     1      1             0              …
3      4          5  31-07-2015  13995       1498     1      1             0              …
4      5          5  31-07-2015   4822        559     1      1             0              …
   Store StoreType Assortment  CompetitionDistance  CompetitionOpenSinceMonth
0      1         c          a               1270.0                        9.0
1      2         a          a                570.0                       11.0
2      3         a          a              14130.0                       12.0
3      4         c          c                620.0                        9.0
4      5         a          a              29910.0                        4.0
3. Data exploration
In [8]: sales_df.head()
Out[8]:
   Store  DayOfWeek        Date  Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday
0      1          5  31-07-2015   5263        555     1      1             0              …
1      2          5  31-07-2015   6064        625     1      1             0              …
2      3          5  31-07-2015   8314        821     1      1             0              …
3      4          5  31-07-2015  13995       1498     1      1             0              …
4      5          5  31-07-2015   4822        559     1      1             0              …
In [9]: sales_df.columns
In [10]: sales_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1017209 entries, 0 to 1017208
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Store 1017209 non-null int64
1 DayOfWeek 1017209 non-null int64
2 Date 1017209 non-null object
3 Sales 1017209 non-null int64
4 Customers 1017209 non-null int64
5 Open 1017209 non-null int64
6 Promo 1017209 non-null int64
7 StateHoliday 1017209 non-null object
8 SchoolHoliday 1017209 non-null int64
dtypes: int64(7), object(2)
memory usage: 69.8+ MB
In [11]: print(sales_df.shape)
(1017209, 9)
Out[12]: 0
Sales
In [14]: sales_df['Sales'].describe()
(172871, 9)
In [18]: (zero_sales.shape[0]/sales_df.shape[0])*100
Out[18]: 16.994639253093514
Sales_Zeros = 172871
Open
In [19]: sales_df['Open'].value_counts()
Out[19]: Open
1 844392
0 172817
Name: count, dtype: int64
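In [20]: plt.figure(figsize=(8,4))
sns.countplot(data=sales_df, x='Open')
plt.title('Open', fontsize=16)
plt.xlabel('Open')   # the x-axis shows the Open categories (0/1)
plt.ylabel('count')  # the y-axis shows the number of rows per category
plt.show()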
In [21]: sales_df[sales_df['Sales']==0]['Open'].value_counts()
Out[21]: Open
0 172817
1 54
Name: count, dtype: int64
Customers
plt.show()
Out[24]: 172869
Out[25]: Open
0 172817
1 52
Name: count, dtype: int64
Out[26]: 7
In [27]: sales_df['DayOfWeek'].value_counts()
Out[27]: DayOfWeek
5 145845
4 145845
3 145665
2 145664
1 144730
7 144730
6 144730
Name: count, dtype: int64
Out[28]: DayOfWeek
7 141137
4 11218
5 7212
1 7173
3 3743
2 1708
6 678
Name: count, dtype: int64
Out[29]: DayOfWeek
7 141137
4 11219
5 7212
1 7173
3 3743
2 1709
6 678
Name: count, dtype: int64
In [30]: sales_df[(sales_df['Open']==0)]['DayOfWeek'].value_counts()
Out[30]: DayOfWeek
7 141137
4 11201
5 7205
1 7170
3 3729
2 1703
6 672
Name: count, dtype: int64
plt.tight_layout()
plt.show()
Observation
Most stores are closed on Sundays, so sales drop on day 7 (Sunday).
plt.figure(figsize=(8, 4))
plt.plot(df_yearly_sales['Year'], df_yearly_sales['Sales'], marker='o', linestyle='
plt.title('Sales Over Years')
plt.xlabel('Year')
plt.ylabel('Total Sales')
plt.grid(True)
plt.xticks(df_yearly_sales['Year'])
plt.show()
Promo
In [35]: sales_df['Promo'].value_counts()
Out[35]: Promo
0 629129
1 388080
Name: count, dtype: int64
School Holiday
In [37]: sales_df['SchoolHoliday'].value_counts()
Out[37]: SchoolHoliday
0 835488
1 181721
Name: count, dtype: int64
StateHoliday
In [39]: sales_df['StateHoliday'].value_counts()
Out[39]: StateHoliday
0 855087
0 131072
a 20260
b 6690
c 4100
Name: count, dtype: int64
In [40]: sales_df['StateHoliday'].unique()
The StateHoliday column needs correction: zero is recorded both as the integer 0 and as the string '0'.
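A minimal sketch of one way to make the column consistent, assuming it is enough to cast every value to a string. The toy frame below is illustrative only and is not the notebook's data.

```python
import pandas as pd

# Toy column mixing the integer 0 and the string '0' (illustrative data)
toy = pd.DataFrame({"StateHoliday": [0, "0", "a", "b", "c", 0]})

# Casting to str merges int 0 and str '0' into the single category '0'
toy["StateHoliday"] = toy["StateHoliday"].astype(str)

categories = sorted(toy["StateHoliday"].unique())
zero_count = int(toy["StateHoliday"].value_counts()["0"])
print(categories, zero_count)
```

After the cast, `unique()` and `value_counts()` report a single zero category, which matches the merged count seen in the corrected output.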
In [42]: sales_df['StateHoliday'].unique()
In [43]: sales_df['StateHoliday'].value_counts()
Out[43]: StateHoliday
0 986159
a 20260
b 6690
c 4100
Name: count, dtype: int64
Observations :
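In [44]: plt.figure(figsize=(8,4))
sns.countplot(data=sales_df, x='StateHoliday')
plt.title('StateHoliday', fontsize=16)
plt.xlabel('StateHoliday')  # the x-axis shows the holiday categories
plt.ylabel('count')         # the y-axis shows the number of rows per category
plt.show()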
Outliers analysis
Sales
Customers
In [46]: plt.figure(figsize=(10, 3))
sns.boxplot(x=sales_df.Customers, showfliers=True, showbox=True, whis=1.5, color='r
plt.title('Customers Distribution')
plt.show()
Skewness analysis
In [47]: pd.DataFrame.from_dict(dict(
{
'Sales':sales_df.Sales.skew(),
'Customers':sales_df.Customers.skew()
}), orient='index', columns=['Skewness'])
Out[47]: Skewness
Sales 0.64146
Customers 1.59865
In [48]: df_store.head()
   Store StoreType Assortment  CompetitionDistance  CompetitionOpenSinceMonth
0      1         c          a               1270.0                        9.0
1      2         a          a                570.0                       11.0
2      3         a          a              14130.0                       12.0
3      4         c          c                620.0                        9.0
4      5         a          a              29910.0                        4.0
In [49]: df_store.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1115 entries, 0 to 1114
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Store 1115 non-null int64
1 StoreType 1115 non-null object
2 Assortment 1115 non-null object
3 CompetitionDistance 1112 non-null float64
4 CompetitionOpenSinceMonth 761 non-null float64
5 CompetitionOpenSinceYear 761 non-null float64
6 Promo2 1115 non-null int64
7 Promo2SinceWeek 571 non-null float64
8 Promo2SinceYear 571 non-null float64
9 PromoInterval 571 non-null object
dtypes: float64(5), int64(2), object(3)
memory usage: 87.2+ KB
Out[50]: Store 0
StoreType 0
Assortment 0
CompetitionDistance 3
CompetitionOpenSinceMonth 354
CompetitionOpenSinceYear 354
Promo2 0
Promo2SinceWeek 544
Promo2SinceYear 544
PromoInterval 544
dtype: int64
In [52]: df_store[df_store['Promo2']==0].shape
In [53]: df_store[df_store['Promo2']==0].head()
   Store StoreType Assortment  CompetitionDistance  CompetitionOpenSinceMonth
0      1         c          a               1270.0                        9.0
3      4         c          c                620.0                        9.0
4      5         a          a              29910.0                        4.0
5      6         a          a                310.0                       12.0
6      7         a          c              24000.0                        4.0
In [54]: df_store['PromoInterval'].unique()
Observations
StoreType
In [55]: df_store['StoreType'].value_counts()
Out[55]: StoreType
a 602
d 348
c 148
b 17
Name: count, dtype: int64
Assortment
In [57]: df_store['Assortment'].value_counts()
Out[57]: Assortment
a 593
c 513
b 9
Name: count, dtype: int64
4. Data cleaning
Out[60]: 0
(9731, 10)
(10501, 10)
In [68]: pd.DataFrame.from_dict(dict(
{
'Sales':df_sales_clean.Sales.skew(),
'Customers':df_sales_clean.Customers.skew()
}), orient='index', columns=['Skewness'])
Out[68]: Skewness
Sales 0.10212
Customers 0.22518
In [72]: pd.DataFrame.from_dict(dict(
{
'Sales':df_sales_clean.Sales.skew(),
'Customers':df_sales_clean.Customers.skew()
}), orient='index', columns=['Skewness'])
Out[72]: Skewness
Sales -0.059760
Customers -0.065484
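The cells that produced this drop in skewness are not shown. One common approach, sketched here as an assumption rather than as the notebook's actual method, is to drop rows that fall outside the 1.5 × IQR whiskers used in the box plots. The helper name `drop_iqr_outliers` and the synthetic right-skewed data are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_iqr_outliers(df, column, whis=1.5):
    """Keep only rows whose `column` value lies within the IQR whiskers."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return df[df[column].between(lower, upper)]

# Synthetic right-skewed "sales" values (illustrative, not the real data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({"Sales": rng.lognormal(mean=8.5, sigma=0.5, size=10_000)})

trimmed = drop_iqr_outliers(toy, "Sales")
skew_before = toy["Sales"].skew()
skew_after = trimmed["Sales"].skew()
print(f"skew before: {skew_before:.3f}, after: {skew_after:.3f}")
```

Trimming the long upper tail pulls the skewness toward zero; a further transform (for example a square root or log) would be a separate, optional step.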
Out[75]: Store 0
StoreType 0
Assortment 0
CompetitionDistance 3
CompetitionOpenSinceMonth 354
CompetitionOpenSinceYear 354
Promo2 0
Promo2SinceWeek 544
Promo2SinceYear 544
PromoInterval 544
dtype: int64
In [78]: df_store_clean.isna().sum().sum()
Out[78]: 0
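A sketch of how the missing store values could be filled so that the null count reaches zero. The `'NoPromo'` label matches the value mapped later in the encoding step; the median fill for `CompetitionDistance` and the zero fill for the other numeric columns are assumptions, and the toy frame is illustrative.

```python
import numpy as np
import pandas as pd

# Toy store table with the same kinds of gaps as df_store (illustrative data)
toy_store = pd.DataFrame({
    "Store": [1, 2, 3],
    "CompetitionDistance": [1270.0, np.nan, 14130.0],
    "CompetitionOpenSinceMonth": [9.0, np.nan, 12.0],
    "Promo2": [0, 1, 0],
    "Promo2SinceWeek": [np.nan, 13.0, np.nan],
    "PromoInterval": [np.nan, "Jan,Apr,Jul,Oct", np.nan],
})

clean = toy_store.copy()
# Assumption: a missing distance is replaced by the median distance
clean["CompetitionDistance"] = clean["CompetitionDistance"].fillna(
    clean["CompetitionDistance"].median()
)
# Assumption: missing "since" fields mean "not applicable", encoded as 0
for col in ["CompetitionOpenSinceMonth", "Promo2SinceWeek"]:
    clean[col] = clean[col].fillna(0)
# Stores without Promo2 have no interval
clean["PromoInterval"] = clean["PromoInterval"].fillna("NoPromo")

remaining_nulls = int(clean.isna().sum().sum())
print(remaining_nulls)
```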
In [80]: df_store_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1115 entries, 0 to 1114
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Store 1115 non-null int64
1 StoreType 1115 non-null object
2 Assortment 1115 non-null object
3 CompetitionDistance 1115 non-null float64
4 CompetitionOpenSinceMonth 1115 non-null float64
5 CompetitionOpenSinceYear 1115 non-null float64
6 Promo2 1115 non-null int64
7 Promo2SinceWeek 1115 non-null float64
8 Promo2SinceYear 1115 non-null float64
9 PromoInterval 1115 non-null object
dtypes: float64(5), int64(2), object(3)
memory usage: 87.2+ KB
5. Feature engineering
In [83]: df_sales_new_feat.head()
Out[83]:
   Store  DayOfWeek        Date  Sales  Customers  Open  Promo  StateHoliday  SchoolHoliday
0      1          5  31-07-2015   5263        555     1      1             0              …
1      2          5  31-07-2015   6064        625     1      1             0              …
2      3          5  31-07-2015   8314        821     1      1             0              …
3      4          5  31-07-2015  13995       1498     1      1             0              …
4      5          5  31-07-2015   4822        559     1      1             0              …
In [85]: df_sales_new_feat['Date'].info()
<class 'pandas.core.series.Series'>
Index: 975812 entries, 0 to 1017208
Series name: Date
Non-Null Count Dtype
-------------- -----
975812 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 14.9 MB
df_sales_new_feat['date_dow_name'] = df_sales_new_feat['Date'].dt.day_name()
df_sales_new_feat['date_is_weekend'] = np.where(df_sales_new_feat['date_dow_name'].
In [87]: df_sales_new_feat.info()
<class 'pandas.core.frame.DataFrame'>
Index: 975812 entries, 0 to 1017208
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Store 975812 non-null int64
1 DayOfWeek 975812 non-null int64
2 Date 975812 non-null datetime64[ns]
3 Sales 975812 non-null int64
4 Customers 975812 non-null int64
5 Open 975812 non-null int64
6 Promo 975812 non-null int64
7 StateHoliday 975812 non-null object
8 SchoolHoliday 975812 non-null int64
9 date_year 975812 non-null int32
10 date_month 975812 non-null int32
11 date_dow_name 975812 non-null object
12 date_is_weekend 975812 non-null int32
dtypes: datetime64[ns](1), int32(3), int64(7), object(2)
memory usage: 93.1+ MB
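The four date-derived columns listed above can be built along these lines. This is a sketch on a toy frame: the day-first date format is inferred from the `31-07-2015` values in the sample output, and treating Saturday and Sunday as the weekend is an assumption, since that line is cut off in the source.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Date": ["31-07-2015", "01-08-2015", "02-08-2015"]})

# Parse day-first strings into datetime64 values
toy["Date"] = pd.to_datetime(toy["Date"], format="%d-%m-%Y")

toy["date_year"] = toy["Date"].dt.year
toy["date_month"] = toy["Date"].dt.month
toy["date_dow_name"] = toy["Date"].dt.day_name()
# Assumption: weekend means Saturday or Sunday
toy["date_is_weekend"] = np.where(
    toy["date_dow_name"].isin(["Saturday", "Sunday"]), 1, 0
)
print(toy)
```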
In [89]: df_sales_new_feat.columns
In [91]: df_sales_new_feat.sample(4)
In [93]: df_store_new_feat.CompetitionDistance.describe()
df_store_new_feat['CompetitionDistanceRange'] = pd.cut(df_store_new_feat['Competiti
bins=ranges,
labels=label,
include_lowest=True,
right=False
)
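The definitions of `ranges` and `label` are not visible, so the sketch below uses hypothetical bin edges and labels (only "Very Close" appears in the sample output) to show how `pd.cut` with `right=False` assigns distances to half-open intervals.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges (metres) and labels; not the notebook's actual values
ranges = [0, 1000, 5000, np.inf]
label = ["Very Close", "Close", "Far"]

distance = pd.Series([570.0, 1000.0, 29910.0])

# right=False makes each bin closed on the left: [0, 1000), [1000, 5000), [5000, inf)
binned = pd.cut(distance, bins=ranges, labels=label,
                include_lowest=True, right=False)
binned_labels = list(binned.astype(str))
print(binned_labels)
```

With `right=False`, a store exactly 1000 m from a competitor lands in the second bin rather than the first.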
In [95]: df_store_new_feat.CompetitionOpenSinceYear.describe()
df_store_new_feat['CompetitionOpenSinceYear1'] = df_store_new_feat['CompetitionOpen
In [97]: df_store_new_feat.CompetitionOpenSinceYear1.value_counts()
Out[97]: CompetitionOpenSinceYear1
2006-2010 612
2011-2015 327
2000-2005 156
Before 2000 20
Name: count, dtype: int64
In [99]: df_store_new_feat.columns
In [101… df_store_new_feat.nunique()
In [102… df_store_new_feat.sample(4)
51 52 d c Very Close
6. Data preparation
In [105… df_merge.sample(4)
420477 493 7 0 0 0 0 0 0
In [106… df_merge.columns
In [107… df_merge.isna().sum().sum()
Out[107… 0
In [108… df_merge.shape
In [109… df_merge.duplicated().sum()
Out[109… 107282
In [111… df_merge.duplicated().sum()
Out[111… 0
In [112… df_merge.shape
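The merge and de-duplication cells are not shown in full. A minimal sketch of the pattern, assuming a left join on `Store` followed by `drop_duplicates()`, with toy frames standing in for the sales and store tables:

```python
import pandas as pd

# Toy stand-ins for the cleaned sales and store tables (illustrative data)
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "DayOfWeek": [5, 5, 5],
    "Sales": [5263, 5263, 6064],
})
store = pd.DataFrame({"Store": [1, 2], "StoreType": ["c", "a"]})

# Attach each store's attributes to its sales rows
merged = sales.merge(store, on="Store", how="left")

n_dupes = int(merged.duplicated().sum())  # fully identical rows
deduped = merged.drop_duplicates()
print(n_dupes, deduped.shape)
```

Rows only count as duplicates when every column matches, so the number found depends on which identifying columns are still present at that point.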
Dropping Store
df_merge.PromoInterval = df_merge.PromoInterval.map({'NoPromo':0,
'Jan,Apr,Jul,Oct':1,
'Feb,May,Aug,Nov':2,
'Mar,Jun,Sept,Dec':3})
In [115… df_merge.info()
<class 'pandas.core.frame.DataFrame'>
Index: 868530 entries, 0 to 975811
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 DayOfWeek 868530 non-null int64
1 Sales 868530 non-null int64
2 Customers 868530 non-null int64
3 Open 868530 non-null int64
4 Promo 868530 non-null int64
5 StateHoliday 868530 non-null int64
6 SchoolHoliday 868530 non-null int64
7 date_year 868530 non-null int32
8 date_month 868530 non-null int32
9 date_is_weekend 868530 non-null int32
10 StoreType 868530 non-null int64
11 Assortment 868530 non-null int64
12 CompetitionDistanceRange 868530 non-null category
13 CompetitionOpenSinceMonth 868530 non-null int32
14 CompetitionOpenSinceYear1 868530 non-null int64
15 Promo2 868530 non-null int64
16 Promo2SinceWeek 868530 non-null int32
17 Promo2SinceYear 868530 non-null int32
18 PromoInterval 868530 non-null int64
dtypes: category(1), int32(6), int64(12)
memory usage: 106.9 MB
In [117… df_merge.columns
In [118… df_merge.sample(4)
429701 7 0 0 0 0 0 0 2
143871 7 0 0 0 0 0 0 2
In [120… data_for_model.shape
In [122… X = data_for_model[input_features]
Y = data_for_model[target_feature]
In [123… # Split the data into 50% training and 50% remaining
X_train, X_temp, Y_train, Y_temp = train_test_split(X, Y, test_size=0.5, random_sta
# Split the remaining 50% into 30% validation and 20% testing
X_val, X_test, Y_val, Y_test = train_test_split(X_temp, Y_temp, test_size=0.4, rand
# Reset indices
X_train = X_train.reset_index(drop=True)
X_val = X_val.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
Y_train = Y_train.reset_index(drop=True)
Y_val = Y_val.reset_index(drop=True)
Y_test = Y_test.reset_index(drop=True)
corr_matrix = X_train.corr()
# Create a heatmap
fig = go.Figure(data=go.Heatmap(
x=corr_matrix.columns,
y=corr_matrix.columns,
z=corr_matrix.values,
colorscale='Viridis',
zmin=-1, zmax=1))
In [128… X_train.shape
In [129… X_val.shape
In [130… X_test.shape
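The two-stage split from the data splitting step (50% training, then 40% of the remainder held out for testing) should yield a 50/30/20 partition. The sketch below checks that on a toy frame; `random_state=42` is an assumption, because the argument is cut off in the source.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data with 1,000 rows (illustrative)
X = pd.DataFrame({"feature": np.arange(1000)})
Y = pd.Series(np.arange(1000), name="Sales")

# Stage 1: 50% training, 50% held back
X_train, X_temp, Y_train, Y_temp = train_test_split(
    X, Y, test_size=0.5, random_state=42)
# Stage 2: 40% of the held-back half (20% overall) becomes the test set
X_val, X_test, Y_val, Y_test = train_test_split(
    X_temp, Y_temp, test_size=0.4, random_state=42)

fractions = {
    "train": len(X_train) / len(X),
    "val": len(X_val) / len(X),
    "test": len(X_test) / len(X),
}
print(fractions)
```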
7. Model training
7.1 Linear Regression
Out[132… ▾ LinearRegression
LinearRegression()
Training data
In [133… model_at_hand = Linear_regression
y_train_pred = model_at_hand.predict(X_train)
mean_absolute_error 848.3219116977904
mean_squared_error 1270421.1571851827
root_mean_squared_error 1127.129609754434
r2 : 0.8497316572399064
Adjusted-r2 : 0.8497264666116231
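The cell that prints these metrics is not shown. A sketch of how the first four could be computed with scikit-learn follows; the helper name `regression_report` and the toy values are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mean_absolute_error": mean_absolute_error(y_true, y_pred),
        "mean_squared_error": mse,
        "root_mean_squared_error": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "r2": r2_score(y_true, y_pred),
    }

# Toy example: three exact predictions and one that is off by 1
scores = regression_report([1, 2, 3, 4], [1, 2, 3, 5])
for name, value in scores.items():
    print(name, value)
```

The same helper can be called on the training, validation and testing predictions in turn so that the three sets are scored identically.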
Validation data
mean_absolute_error 846.2712525269167
mean_squared_error 1262750.9246222086
root_mean_squared_error 1123.7219071559514
r2 : 0.8496512178879971
Adjusted-r2 : 0.8496425619972932
Testing data
In [135… model_at_hand = Linear_regression
y_test_pred = model_at_hand.predict(X_test)
mean_absolute_error 847.2350220558316
mean_squared_error 1266742.920519768
root_mean_squared_error 1125.4967438956755
r2 : 0.8492486487631956
Adjusted-r2 : 0.8492356297622827
plt.figure(figsize=(20, 10))
sns.regplot(x='true', y='pred', data=results, scatter_kws={"s": 10}, line_kws={"col
plt.title("Linear Regression Prediction")
plt.xlabel("True Values")
plt.ylabel("Predicted Values")
plt.show()
7.2 DecisionTreeRegressor
Out[137… ▾ DecisionTreeRegressor
DecisionTreeRegressor(max_depth=10, random_state=42)
Training data
In [138… model_at_hand = decision_tree
y_train_pred_dt = model_at_hand.predict(X_train)
mean_absolute_error 684.7630342705415
mean_squared_error 897338.8738482862
root_mean_squared_error 947.2797231273803
r2 : 0.8938606896581025
Adjusted-r2 : 0.8938570233522385
Validation data
In [139… model_at_hand = decision_tree
y_val_pred_dt = model_at_hand.predict(X_val)
mean_absolute_error 689.7687944737653
mean_squared_error 912138.8824511515
root_mean_squared_error 955.0596224588031
r2 : 0.8913966583437162
Adjusted-r2 : 0.8913904058244589
Testing data
In [140… model_at_hand = decision_tree
y_test_pred_dt = model_at_hand.predict(X_test)
mean_absolute_error 691.2255119789542
mean_squared_error 914646.3904988627
root_mean_squared_error 956.3714709770794
r2 : 0.8911506217733642
Adjusted-r2 : 0.8911412214585885
plt.figure(figsize=(20, 10))
sns.regplot(x='true', y='pred', data=results, scatter_kws={"s": 10}, line_kws={"col
plt.title("DecisionTreeRegressor Prediction")
plt.xlabel("True Values")
plt.ylabel("Predicted Values")
plt.show()
7.3 RandomForestRegressor
Out[142… ▾ RandomForestRegressor
Training data
In [143… model_at_hand = randomForest_regressor
y_train_pred_rf = model_at_hand.predict(X_train)
mean_absolute_error 662.6306262234529
mean_squared_error 838737.0438540096
root_mean_squared_error 915.8258807513629
r2 : 0.9007922491855428
Adjusted-r2 : 0.9007888223123383
Validation data
In [144… model_at_hand = randomForest_regressor
y_val_pred_rf = model_at_hand.predict(X_val)
mean_absolute_error 666.7656869376409
mean_squared_error 850902.2815541207
root_mean_squared_error 922.4436468175824
r2 : 0.8986877623817527
Adjusted-r2 : 0.8986819296264522
Testing data
In [145… model_at_hand = randomForest_regressor
y_test_pred_rf = model_at_hand.predict(X_test)
mean_absolute_error 667.1891192171452
mean_squared_error 851980.600835278
root_mean_squared_error 923.0279523585828
r2 : 0.8986082931880428
Adjusted-r2 : 0.8985995369234209
plt.figure(figsize=(20, 10))
7.4 GradientBoostingRegressor
Out[147… ▾ GradientBoostingRegressor
Training Data
In [148… model_at_hand = gradient_booster
y_train_pred_gb = model_at_hand.predict(X_train)
mean_absolute_error 509.37470620440644
mean_squared_error 497629.43375333503
root_mean_squared_error 705.4285461712866
r2 : 0.9411392435525556
Adjusted-r2 : 0.9411372103611223
Validation Data
In [149… model_at_hand = gradient_booster
y_val_pred_gb = model_at_hand.predict(X_val)
mean_absolute_error 514.1751850000209
mean_squared_error 507014.6621183516
root_mean_squared_error 712.0496205450513
r2 : 0.9396325629417146
Adjusted-r2 : 0.9396290874633642
Testing Data
In [150… model_at_hand = gradient_booster
y_test_pred_gb = model_at_hand.predict(X_test)
mean_absolute_error 513.3922086262515
mean_squared_error 506655.92820248945
root_mean_squared_error 711.7976736422293
r2 : 0.9397043673570933
Adjusted-r2 : 0.939699160180574
plt.figure(figsize=(20, 10))
sns.regplot(x='true', y='pred', data=results, scatter_kws={"s": 10}, line_kws={"col
plt.title("GradientBoostingRegressor Prediction")
plt.xlabel("True Values")
plt.ylabel("Predicted Values")
plt.show()
7.5 AdaBoostRegressor
Out[152… ▸ AdaBoostRegressor
▸ base_estimator: DecisionTreeRegressor
▸ DecisionTreeRegressor
Training data
In [153… model_at_hand = adaboost_regressor
y_train_pred_ada = model_at_hand.predict(X_train)
mean_absolute_error 661.4912482323593
mean_squared_error 765255.044243848
root_mean_squared_error 874.7885711666837
r2 : 0.9094838694735599
Adjusted-r2 : 0.9094807428297267
Validation data
In [154… model_at_hand = adaboost_regressor
y_val_pred_ada = model_at_hand.predict(X_val)
mean_absolute_error 669.0196654460577
mean_squared_error 794985.349872525
root_mean_squared_error 891.6195095849603
r2 : 0.9053454827712929
Adjusted-r2 : 0.9053400333147409
Testing data
In [155… model_at_hand = adaboost_regressor
y_test_pred_ada = model_at_hand.predict(X_test)
mean_absolute_error 668.0571340165561
mean_squared_error 793784.2481950956
root_mean_squared_error 890.9457044035263
r2 : 0.9055340700409826
Adjusted-r2 : 0.9055259118916972
plt.figure(figsize=(20, 10))
sns.regplot(x='true', y='pred', data=results, scatter_kws={"s": 10}, line_kws={"col
plt.title("AdaBoostRegressor Prediction")
plt.xlabel("True Values")
plt.ylabel("Predicted Values")
plt.show()
7.6 XGBoostRegressor
Out[169… ▾ XGBRegressor
Training data
In [170… model_at_hand = xgboost
y_train_pred_xgb = model_at_hand.predict(X_train)
mean_absolute_error 409.2964975125055
mean_squared_error 339637.88554671773
root_mean_squared_error 582.7845961817434
r2 : 0.9598268480409066
Adjusted-r2 : 0.9598254603640682
Validation data
In [171… model_at_hand = xgboost
y_val_pred_xgb = model_at_hand.predict(X_val)
mean_absolute_error 427.2426332609709
mean_squared_error 369210.7245661449
root_mean_squared_error 607.6271262593079
r2 : 0.956040117097663
Adjusted-r2 : 0.9560375862361794
Testing data
In [172… model_at_hand = xgboost
y_test_pred_xgb = model_at_hand.predict(X_test)
n = len(Y_test)
k = X_test.shape[1]
print("Adjusted-r2 :", adjusted_r2(R2_Score, n, k))
mean_absolute_error 426.68219768788504
mean_squared_error 370602.0162675934
root_mean_squared_error 608.7709062263024
r2 : 0.9558957434705853
Adjusted-r2 : 0.9558919345935748
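The body of `adjusted_r2` is not shown. The sketch below assumes it implements the standard formula, 1 - (1 - R²)(n - 1)/(n - k - 1), with the same argument order as the call above. The toy numbers illustrate why the adjustment becomes negligible when the number of rows n is large relative to the number of predictors k.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n samples and k predictors (standard formula, assumed)."""
    if n - k - 1 <= 0:
        raise ValueError("n must be greater than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R² and predictor count, very different sample sizes (toy numbers)
small_n = adjusted_r2(0.9, n=101, k=10)
large_n = adjusted_r2(0.9, n=100_001, k=10)
print(small_n, large_n)
```

With only 101 rows the penalty is visible in the second decimal place; with 100,001 rows the adjusted value is almost indistinguishable from R².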
plt.figure(figsize=(20, 10))
sns.regplot(x='true', y='pred', data=results, scatter_kws={"s": 10}, line_kws={"col
plt.title("XGBRegressor Prediction")
plt.xlabel("True Values")
plt.ylabel("Predicted Values")
plt.show()
8. Result Evaluation
In [185… Model_evaluation_scores = {
'Model': [
'LinearRegression',
'DecisionTreeRegressor',
'RandomForestRegressor',
'GradientBoostingRegressor',
'AdaBoostRegressor',
'XGBoostRegressor'
],
'R² Score': [
0.849248,
0.891150,
0.898608,
0.939704,
0.905534,
0.955895
],
'Adjusted R² Score': [
0.849235,
0.891141,
0.898599,
0.939699,
0.905525,
0.955891
]
}
result = pd.DataFrame(Model_evaluation_scores)
result.index = result.index + 1
result
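To read the comparison more easily, the table can be sorted by test R². This sketch rebuilds the frame from the scores listed above; the sorting step and the `ranked` and `best_model` names are additions for illustration.

```python
import pandas as pd

scores = {
    "Model": [
        "LinearRegression", "DecisionTreeRegressor", "RandomForestRegressor",
        "GradientBoostingRegressor", "AdaBoostRegressor", "XGBoostRegressor",
    ],
    "R² Score": [0.849248, 0.891150, 0.898608, 0.939704, 0.905534, 0.955895],
    "Adjusted R² Score": [0.849235, 0.891141, 0.898599, 0.939699, 0.905525, 0.955891],
}
result = pd.DataFrame(scores)

# Rank models from best to worst test R²
ranked = result.sort_values("R² Score", ascending=False).reset_index(drop=True)
ranked.index = ranked.index + 1  # 1-based rank, as in the original table
best_model = ranked.loc[1, "Model"]
print(ranked)
```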
Conclusion
XGBoostRegressor, GradientBoostingRegressor and AdaBoostRegressor gave the best
results, with test R² scores of about 0.956, 0.940 and 0.906 respectively.
R² and adjusted R² are nearly identical for every model. With this many rows and
comparatively few predictors the adjusted R² penalty is very small, so the agreement
mainly shows that the scores are not inflated by the number of variables. The training,
validation and testing metrics are also close for each model, which suggests that the
models are not overfitting and that the included predictors contribute meaningfully to
the prediction of the target variable.
Ajith Devadiga