E Commerce Project
Overview
Problem Statement
A leading Ecommerce company specializes in online clothing sales and offers in-store style and clothing advice sessions. Customers engage in personalized sessions with a stylist at the physical store, followed by the option to place orders through either the mobile app or the website. The company faces a strategic dilemma, pondering whether to optimize the mobile app or the website to enhance customer engagement and drive sales. Our mission is to assist them in making an informed decision.
Dataset Details
The dataset features numerical columns:
• Avg. Session Length: Represents the average duration of in-store style advice sessions (in minutes).
• Time on App: Signifies the average time spent on the mobile app (in minutes).
• Time on Website: Indicates the average time spent on the website (in minutes).
• Length of Membership: Reflects the customer's membership duration in years.
• Yearly Amount Spent: Denotes the total annual spending by the customer in dollars.
Business Objective
Our goal is to decipher the key variables influencing annual income predictions. By constructing a robust prediction model, we aim to provide valuable insights that guide the company in directing their focus effectively, whether towards refining the mobile app experience or optimizing the website. This analysis aligns with the overarching objective of maximizing customer spending and satisfaction.
Data Preparation Process
Thoroughly checking and validating variables ensured the consistency and reliability of the data. This step was crucial to maintain data integrity throughout the project.
Meticulously handling null values was a priority to ensure accurate analysis and precise results. Implementing effective strategies helped maintain data quality and reliability.
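The validation and null-handling steps described above could look roughly like the sketch below. The column names come from the dataset description; the sample values, the `validate_and_clean` helper, and the choice of median imputation are assumptions made for illustration, since the deck does not name the strategy that was used.

```python
import numpy as np
import pandas as pd

# Column names follow the dataset description; all values below are made up.
NUMERIC_COLUMNS = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
    "Yearly Amount Spent",
]


def validate_and_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Check that the expected columns exist, coerce to numeric, fill nulls."""
    missing = [c for c in NUMERIC_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Non-numeric entries become NaN so they are handled like other nulls.
    cleaned = df[NUMERIC_COLUMNS].apply(pd.to_numeric, errors="coerce")
    # Median imputation is an assumption, not the deck's stated strategy.
    return cleaned.fillna(cleaned.median())


# Tiny illustrative frame with two nulls and one non-numeric entry.
sample = pd.DataFrame({
    "Avg. Session Length": [33.0, 34.5, np.nan, 32.1],
    "Time on App": [12.0, 11.1, 13.2, 12.5],
    "Time on Website": [37.0, "n/a", 36.5, 38.0],
    "Length of Membership": [3.5, 2.0, 4.1, np.nan],
    "Yearly Amount Spent": [500.0, 420.0, 610.0, 480.0],
})

print(sample.isnull().sum())    # null count per column before cleaning
cleaned = validate_and_clean(sample)
print(cleaned.isnull().sum())   # null count per column after imputation
```

Dropping incomplete rows instead of imputing would be an equally reasonable choice when only a few records are affected.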
Exploratory Data Analysis (EDA) and Auto EDA Using Sweetviz
EDA was conducted with a wide array of visualizations, including correlation plots, relationship plots, distribution plots, histograms, and Q-Q plots. These visualizations provided valuable insights into the data distribution and underlying patterns.
By leveraging Sweetviz, our automated EDA process became even more efficient and insightful. Sweetviz provided comprehensive visualizations that accelerated exploratory data analysis and revealed meaningful patterns.
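As a numeric companion to the correlation and distribution plots mentioned above, the sketch below computes each feature's correlation with the target and the skewness of every column using pandas alone. The data is synthetic and the relationships built into it are invented for illustration; it is not the project's dataset and says nothing about the project's actual findings.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in data; the relationships below are invented.
df = pd.DataFrame({
    "Avg. Session Length": rng.normal(33, 1, n),
    "Time on App": rng.normal(12, 1, n),
    "Time on Website": rng.normal(37, 1, n),
    "Length of Membership": rng.normal(3.5, 1, n),
})
df["Yearly Amount Spent"] = (
    25 * df["Avg. Session Length"]
    + 40 * df["Time on App"]
    + 60 * df["Length of Membership"]
    + rng.normal(0, 10, n)
)

# Pearson correlation of every feature with the target, strongest first.
target_corr = (
    df.corr()["Yearly Amount Spent"]
    .drop("Yearly Amount Spent")
    .sort_values(ascending=False)
)
print(target_corr)

# Skewness near zero suggests a roughly symmetric distribution,
# which is what a histogram or Q-Q plot would show visually.
print(df.skew())
```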
Outliers
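The deck does not say how outliers were identified. One common choice is the interquartile-range (IQR) rule, sketched below purely as an assumption; the `iqr_outlier_mask` helper and the spending values are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up spending values with one obviously extreme entry.
spend = pd.Series(
    [480.0, 495.0, 500.0, 505.0, 510.0, 520.0, 1200.0],
    name="Yearly Amount Spent",
)
mask = iqr_outlier_mask(spend)
print(spend[mask])  # rows flagged as outliers
```

Whether flagged rows are removed, capped, or kept depends on whether they are data errors or genuine high-spending customers.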
2 H2O
Integrated H2O for AutoML, allowing for automated model selection, hyperparameter
tuning, and model performance assessment, enhancing the efficiency of the machine
learning workflow.
3 Auto Sklearn
Auto Sklearn was employed to automate hyperparameter optimization, algorithm
selection, and model assessment, optimizing the machine learning pipeline.
Manual Machine Learning Techniques
Applied linear regression to model the relationship between predictor variables and the target variable, providing valuable insights into the data patterns.
Utilized the ridge regressor to mitigate multicollinearity and overfitting, enhancing the stability and accuracy of the regression model.
Implemented the Gradient Boosting (GB) regressor, an ensemble learning method, to iteratively improve predictive performance and model generalization.
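A minimal sketch of fitting the three regressors named above with scikit-learn follows. The training data is synthetic, and the coefficients, split ratio, and hyperparameters (`alpha=1.0`, default boosting settings) are assumptions rather than the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500

# Synthetic features in the order: session length, time on app,
# time on website, length of membership. Coefficients are invented.
X = rng.normal(loc=[33, 12, 37, 3.5], scale=1.0, size=(n, 4))
y = X @ np.array([25.0, 40.0, 0.5, 60.0]) + rng.normal(0, 10, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For regressors, .score() returns R^2 on the held-out data.
    results[name] = model.score(X_test, y_test)
    print(f"{name}: test R2 = {results[name]:.3f}")
```

Because the synthetic target is linear by construction, the linear models are expected to do well here; this says nothing about how the models rank on the real data.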
Mean Squared Error (MSE)
Linear regression and ridge regression exhibited the lowest MSE, signifying the most accurate predictions among the models compared.
R2 Values
The R2 values for linear regression and ridge regression were notably high, indicating a strong model fit: the predictors capture a large proportion of the variance in the target variable.
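For reference, the two metrics can be written out directly. The sketch below implements the standard definitions with NumPy; the helper names `mse` and `r2` and the three actual/predicted values are made up for illustration.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean of squared residuals; lower is better (units: dollars squared)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred) -> float:
    """1 - SS_res / SS_tot: share of target variance explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up actual and predicted yearly spend, in dollars.
actual = [400.0, 500.0, 600.0]
predicted = [410.0, 490.0, 600.0]
print(mse(actual, predicted))
print(r2(actual, predicted))
```

MSE depends on the scale of the target, so it is only comparable across models fitted to the same data, while R2 is unitless.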
Deployment: User Interactive Web App
In summary, our analysis revealed crucial predictors affecting yearly customer spending, emphasizing the significance of
session length, time on app, time on website, and membership duration. The developed multiple linear regression models
demonstrated satisfactory accuracy, as measured by MSE and R-squared, offering valuable predictive insights for decision-
making.
The findings underscore the importance of enhancing user engagement, both on the app and website, as a strategic
approach to positively influence customer spending. Notably, a positive correlation was identified between membership
duration and spending, suggesting the potential benefits of encouraging longer memberships to foster loyalty.
In conclusion, this project contributes to a deeper understanding of customer behavior for the Ecommerce company,
providing actionable insights. Future recommendations include refining models, incorporating additional features, and
exploring advanced techniques to further enhance predictive capabilities.
The practical application of deploying predictive models through Flask offers real-time insights, creating a seamless
integration of data science findings into operational decision-making processes. This holistic approach emphasizes the
bridge between data-driven insights and practical business optimization.
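A Flask deployment of this kind can be sketched as a single prediction endpoint. Everything specific below is an assumption: the `/predict` route, the JSON field names (taken from the dataset columns), the response key, and the stand-in model trained on synthetic data so that the example runs on its own. The project's actual app may be structured differently.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.linear_model import LinearRegression

FEATURES = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
]

# Stand-in model trained on synthetic data so the example is self-contained.
# A real deployment would load the model fitted on the company's data.
rng = np.random.default_rng(0)
X = rng.normal(loc=[33, 12, 37, 3.5], scale=1.0, size=(200, 4))
y = X @ np.array([25.0, 40.0, 0.5, 60.0]) + rng.normal(0, 10, 200)
model = LinearRegression().fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical endpoint name
def predict():
    payload = request.get_json(silent=True) or {}
    missing = [f for f in FEATURES if f not in payload]
    if missing:
        return jsonify({"error": f"missing fields: {missing}"}), 400
    try:
        row = [[float(payload[f]) for f in FEATURES]]
    except (TypeError, ValueError):
        return jsonify({"error": "all fields must be numeric"}), 400
    prediction = float(model.predict(row)[0])
    return jsonify({"predicted_yearly_amount_spent": prediction})


# Exercise the endpoint in-process with Flask's built-in test client.
client = app.test_client()
response = client.post("/predict", json={
    "Avg. Session Length": 33,
    "Time on App": 12,
    "Time on Website": 37,
    "Length of Membership": 3.5,
})
print(response.status_code, response.get_json())
```

Validating the payload before calling the model keeps malformed requests from surfacing as server errors.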