E-commerce Project

Overview

Problem Statement

A leading e-commerce company specializes in online clothing sales and also offers in-store style and clothing advice sessions. Customers meet with a personal stylist at the physical store and can then place orders through either the mobile app or the website. The company faces a strategic question: should it optimize the mobile app or the website to enhance customer engagement and drive sales? Our mission is to help it make an informed decision.

Dataset Details

The dataset features numerical columns:
• Avg. Session Length: average duration of in-store style advice sessions (in minutes).
• Time on App: average time spent on the mobile app (in minutes).
• Time on Website: average time spent on the website (in minutes).
• Length of Membership: the customer's membership duration (in years).
• Yearly Amount Spent: total annual spending by the customer (in dollars).

Business Objective

Our goal is to identify the key variables that influence yearly customer spending. By constructing a robust prediction model, we aim to provide insights that guide the company in directing its focus effectively, whether towards refining the mobile app experience or optimizing the website. This analysis aligns with the overarching objective of maximizing customer spending and satisfaction.
Data Preparation Process

Variable Check

Thoroughly checking and validating variables ensured the consistency and reliability of the data. This step was crucial to maintain data integrity throughout the project.

Null Value Handling

Meticulously handling null values was a priority to ensure accurate analysis and precise results. Implementing effective strategies helped maintain data quality and reliability.
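The variable check and null-value handling described above can be sketched with pandas. This is only an illustration: the tiny frame holds made-up values under the column names from the dataset description, and the treatment shown (median fill for features, dropping rows with a missing target) is an assumption, since the slides do not say which strategy was used.

```python
import numpy as np
import pandas as pd

# Made-up placeholder rows; only the column names come from the slides.
df = pd.DataFrame({
    "Avg. Session Length": [34.5, 31.9, np.nan, 33.0],
    "Time on App": [12.7, 11.1, 11.3, np.nan],
    "Time on Website": [39.6, 37.3, 37.1, 36.7],
    "Length of Membership": [4.1, 2.7, 4.1, 3.1],
    "Yearly Amount Spent": [587.9, 392.2, 487.5, 581.8],
})

# Variable check: every column is expected to be numeric.
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# Null check: number of missing values per column.
null_counts = df.isna().sum()

# Assumed treatment: drop rows with a missing target, then fill
# remaining feature gaps with the column median.
features = df.columns.drop("Yearly Amount Spent")
cleaned = df.dropna(subset=["Yearly Amount Spent"]).copy()
cleaned[features] = cleaned[features].fillna(cleaned[features].median())

print(null_counts)
print(cleaned)
```

Median filling is chosen here only because it is insensitive to extreme values; dropping incomplete rows would be an equally reasonable choice on a dataset with few gaps.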
Exploratory Data Analysis (EDA) and Auto EDA Using Sweetviz

EDA with Visualizations

EDA was conducted with a wide array of visualizations, including correlation plots, relationship plots, distribution plots, histograms, and Q-Q plots. These visualizations provided valuable insights into the data distribution and underlying patterns.

Automated EDA with Sweetviz

By leveraging Sweetviz, our automated EDA process became even more efficient and insightful. Sweetviz provided comprehensive visualizations that accelerated exploratory data analysis and revealed meaningful patterns.
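The numbers behind a correlation plot and a Q-Q plot can be computed directly, which helps when reading the charts. The sketch below uses synthetic data: the feature means, the weights, and the noise level are arbitrary assumptions chosen for illustration and do not reproduce the project's dataset or findings.

```python
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(0)
n = 500
names = ["Avg. Session Length", "Time on App",
         "Time on Website", "Length of Membership"]

# Synthetic predictors and target (all values are made up).
X = rng.normal(loc=[33.0, 12.0, 37.0, 3.5], scale=1.0, size=(n, 4))
weights = np.array([25.0, 38.0, 0.5, 60.0])  # arbitrary illustration weights
y = X @ weights + rng.normal(0.0, 10.0, size=n)

# Correlation matrix over the four predictors plus the target;
# the last column holds each predictor's correlation with the target.
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
target_corr = dict(zip(names, corr[:-1, -1]))

# Q-Q points: sorted standardized target against normal quantiles.
z = np.sort((y - y.mean()) / y.std(ddof=1))
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])

# A correlation near 1 between the two axes means the Q-Q points
# lie close to a straight line, i.e. the target looks roughly normal.
qq_corr = np.corrcoef(theoretical, z)[0, 1]

for name, value in target_corr.items():
    print(f"{name}: {value:.3f}")
print(f"Q-Q straightness: {qq_corr:.4f}")
```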
Outliers

1 Boxplot Visualization

Employed boxplots to identify and analyze outliers, aiding in the determination of anomalous data points within the dataset. This visualization technique provided a clear representation of data distribution and identification of potential outliers.

2 IQR and Cook's Distance Treatment

Robust handling of outliers was carried out through the implementation of Interquartile Range (IQR) treatment and evaluation of Cook's distance. These mechanisms facilitated the identification and treatment of influential data points in the dataset.
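As a rough sketch of the two techniques named above, the code below computes Tukey's IQR fences (the same rule a boxplot uses for its whiskers) and Cook's distance for an ordinary least squares fit. The helper names `iqr_bounds` and `cooks_distance`, the toy data, the 1.5 multiplier, and the 4/n cutoff are illustrative assumptions; the slides do not state which thresholds the project used.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a 1-D array."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cooks_distance(X, y):
    """Cook's distance of each observation in an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(X)), X])  # design matrix
    n, p = A.shape                             # p counts the intercept
    H = A @ np.linalg.pinv(A)                  # hat (projection) matrix
    leverage = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)               # residual variance estimate
    return (resid ** 2 / (p * s2)) * leverage / (1.0 - leverage) ** 2

# IQR rule on a toy column with one extreme value.
values = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 100.0])
lower, upper = iqr_bounds(values)
iqr_outliers = values[(values < lower) | (values > upper)]

# Cook's distance on a toy regression where the last point is shifted.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[9] += 20.0
distances = cooks_distance(x.reshape(-1, 1), y)
influential = np.flatnonzero(distances > 4.0 / len(x))  # common rule of thumb

print(lower, upper, iqr_outliers)
print(influential)
```

The IQR rule looks at one column at a time, while Cook's distance measures how much a single row changes the fitted regression, so the two checks can flag different points.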
Automated Machine Learning (Auto ML)
1 PyCaret
Utilized PyCaret to automate the machine learning process, enabling streamlined model
building, evaluation, and comparison across multiple algorithms.

2 H2O
Integrated H2O for AutoML, allowing for automated model selection, hyperparameter
tuning, and model performance assessment, enhancing the efficiency of the machine
learning workflow.

3 auto-sklearn
auto-sklearn was employed to automate hyperparameter optimization, algorithm
selection, and model assessment, optimizing the machine learning pipeline.
Manual Machine Learning Techniques

Linear Regression

Applied linear regression to model the relationship between predictor variables and the target variable, providing valuable insights into the data patterns.

Ridge Regressor

Utilized the ridge regressor to mitigate multicollinearity and overfitting, enhancing the stability and accuracy of the regression model.

GB Regressor

Implemented the Gradient Boosting (GB) regressor, an ensemble method that iteratively improves predictive performance and model generalization.
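To illustrate why ridge regression helps with multicollinearity, here is a minimal closed-form sketch in NumPy. The function name `fit_ridge`, the penalty value, and the synthetic, nearly collinear features are assumptions made for this example; it is not the project's code, which presumably relied on a library implementation.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge fit; alpha=0 reduces to ordinary least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centre so the intercept is not penalised
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data with two almost identical predictors.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 2.0 * x2 + rng.normal(size=200)

ols_coef, ols_intercept = fit_ridge(X, y, alpha=0.0)
ridge_coef, ridge_intercept = fit_ridge(X, y, alpha=1.0)

print("OLS coefficients:  ", ols_coef)
print("Ridge coefficients:", ridge_coef)
```

With collinear inputs the least squares coefficients are poorly determined individually, even though their sum is well determined. The penalty term shrinks the coefficient vector, which is what stabilises the fit.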
Manual Machine Learning Techniques (continued)

Lasso Regressor

The Lasso regression technique was utilized to perform feature selection and regularization, enhancing the interpretability and robustness of the model.

XGBoost

Employed XGBoost, an efficient gradient boosting algorithm, to improve model accuracy and computational speed and to handle complex dataset structures effectively.

SVR & K-Neighbors

Utilized Support Vector Regression (SVR) and K-Nearest Neighbors (K-Neighbors) algorithms to model nonlinear relationships and complex data patterns, enhancing the model's predictive capability.
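A side-by-side comparison of the manual techniques might look like the following scikit-learn sketch. Everything about the data is an assumption: the features and target are synthetic, the hyperparameters are defaults or arbitrary picks, and XGBoost is left out because it lives in a separate package. The scores it prints say nothing about the project's actual results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the four predictors and the spending target.
rng = np.random.default_rng(42)
X = rng.normal(loc=[33.0, 12.0, 37.0, 3.5], scale=1.0, size=(500, 4))
y = X @ np.array([25.0, 38.0, 0.5, 60.0]) + rng.normal(0.0, 10.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# SVR and K-Neighbors are distance based, so they get a scaling step.
models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
    "GradientBoosting": GradientBoostingRegressor(random_state=42),
    "SVR": make_pipeline(StandardScaler(), SVR(C=100.0)),
    "KNeighbors": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

# Rank models from lowest to highest test MSE.
ranking = sorted(results, key=lambda name: results[name]["mse"])
for name in ranking:
    print(f"{name:18s} MSE={results[name]['mse']:9.2f} R2={results[name]['r2']:.3f}")
```

Holding out a test split keeps the comparison honest: every model is scored on rows it never saw during fitting.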
Model Evaluation & Performance

Mean Squared Error (MSE)

Linear regression and ridge regression exhibited the lowest MSE, indicating the best predictive accuracy among the models.

R2 Values

The R2 values for linear regression and ridge regression were notably high, indicating a strong fit: the predictors capture a large proportion of the variance in the target variable.
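For reference, both metrics can be computed from their definitions in a few lines. The function names `mse` and `r2` and the four sample values below are invented for illustration only; they are not results from the project.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Share of target variance explained: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Invented example values (dollars per year).
actual = [500.0, 600.0, 400.0, 550.0]
predicted = [510.0, 590.0, 420.0, 540.0]

print(mse(actual, predicted))  # 175.0
print(r2(actual, predicted))
```

MSE is expressed in squared dollars, so it is useful for ranking models on the same data, while R2 is unitless and easier to read in isolation.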
Deployment: User Interactive Web App

1 Pickle for Model Persistence

Utilized pickle for model serialization, ensuring the trained model's persistent storage and retrieval for seamless deployment.

2 Flask Web Application

Leveraged Flask to develop a user interactive web application, enabling users to predict the Yearly Amount Spent based on the deployed machine learning model.
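The two deployment steps can be sketched together as follows. Several things here are assumptions rather than the project's actual code: a plain dictionary of made-up coefficients stands in for the trained model, the file name `model.pkl` and the `/predict` JSON route are hypothetical, and the real app may well have used an HTML form instead.

```python
import os
import pickle
import tempfile

from flask import Flask, jsonify, request

FEATURES = ["Avg. Session Length", "Time on App",
            "Time on Website", "Length of Membership"]

# Stand-in for the trained model: made-up linear coefficients.
model = {"intercept": -1000.0, "coef": [25.0, 38.0, 0.5, 60.0]}

# Step 1: persist the model with pickle, then load it back.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as fh:
    pickle.dump(model, fh)
with open(model_path, "rb") as fh:
    loaded = pickle.load(fh)

# Step 2: expose the loaded model through a Flask route.
app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify(error=f"missing fields: {missing}"), 400
    values = [float(payload[name]) for name in FEATURES]
    amount = loaded["intercept"] + sum(
        c * v for c, v in zip(loaded["coef"], values)
    )
    return jsonify(yearly_amount_spent=amount)
```

During development the app could be started with `flask run` and queried by POSTing a JSON object that contains the four feature names. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.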
Comprehensive Statistical Analysis Overview

Data Preparation & EDA

The initial phases of the project involved meticulous data preparation, validation, and detailed exploratory data analysis, laying the groundwork for the subsequent modeling and deployment stages.

Modeling & Evaluation

Implemented a wide array of machine learning techniques, both automated and manual, followed by rigorous evaluation to select the best-performing models for deployment.

Interactive Web App Deployment

The final phase included the deployment of an interactive web application using Flask, providing users with a seamless experience for predicting the target variable.
Final Insights and Conclusions

In summary, our analysis revealed crucial predictors affecting yearly customer spending, emphasizing the significance of
session length, time on app, time on website, and membership duration. The developed multiple linear regression models
demonstrated satisfactory accuracy, as measured by MSE and R-squared, offering valuable predictive insights for
decision-making.

The findings underscore the importance of enhancing user engagement, both on the app and website, as a strategic
approach to positively influence customer spending. Notably, a positive correlation was identified between membership
duration and spending, suggesting the potential benefits of encouraging longer memberships to foster loyalty.

In conclusion, this project contributes to a deeper understanding of customer behavior for the e-commerce company,
providing actionable insights. Future recommendations include refining models, incorporating additional features, and
exploring advanced techniques to further enhance predictive capabilities.

The practical application of deploying predictive models through Flask offers real-time insights, creating a seamless
integration of data science findings into operational decision-making processes. This holistic approach emphasizes the
bridge between data-driven insights and practical business optimization.
