
PRINTERS AND SPARES

PG DIPLOMA DATA SCIENCE-2019-2020 CAPSTONE PROJECT

DEMAND FORECASTING TO ASSIST AN UPCOMING PRINTER MANUFACTURER

TEAM MEMBERS
ABHYUDAYA MARYA
JEEVA RAMPRASAD
ROMA DADHIRAO
SUDARSHAN RAKHMAJI KADGE
G TAROON SUBRAMANIAM

AMITY UNIVERSITY ONLINE


Final Report-Amity Capstone Demand Forecasting-Printer Spare Parts

Table of Contents

1. Introduction
2. Project Description & Tools Used
3. Role of Machine Learning
4. Data Exploration
5. Data Manipulation
6. Feature Engineering
7. Building Training-Test Sample
8. Model Selection and Evaluation
9. Key Contributions
10. Analysis of Results
11. Tableau Visualizations


Introduction

This project is carried out for the benefit of Inqer Printers and Spares*, an organization that
sells printer spare parts. We integrate its stock/inventory and demand data, collated over the
past 15 months, to track its supply chain effectively, improve decision making and expedite
grievance redressal using machine learning algorithms. Using statistical techniques, we
forecast demand in the short, medium and long term. In addition, we aim to optimize the
inventory to maximize profits while remaining cognizant of the need for a buffer stock. This
should support better sales and operations plans across departments and improve resourcing
efficiency through supply plans based on prioritized demands, allocations and supply chain
constraints.

Dataset Description:
1. The dataset consists of the sales of a company dealing in a large number of components.
2. It includes the inventory of the components in three warehouses: AME, APJ and EMEA.
3. It also includes parameters used to prioritize inventory planning: Local Area Stock Code,
PSMS*, D-Chain Status**, SPT***

D-Chain    PSMS
25         C2, C4
55         C5, S6, C8, S8
60, 61     S9
69         C5, S6, C8, S8

*The data is real data from an organization (with certain realistic changes); a fictitious name is used to protect its privacy.


D-Chain    Description
25         Part is not currently available for sale but will be at a future date. Allows pricing
           and costs but does not allow orders.
55         The part is actively available.
60         The part is no longer available. Orders and pricing are blocked.
           *Orders can be manually created on GCSS.
61         The part is no longer available but does have a replacement. All D-Chain 61 parts
           must have a corresponding Material Determination record.
69         The part is actively available but under allocation. It allows orders but prevents
           shipment until the order is released.

Planned Delivery Time

Time it takes to receive a part after a purchase order (PO) is placed with a supplier

Yield Rate (For Repair Parts)

Special Procurement Type (SPT)

1. Part is repairable
2. Part is set to be returned to the OEM for warranty coverage
3. If a part is non-returnable, it is assumed it cannot be repaired


PSMS    Description
C2      In development - the default initial value for corporate parts
C4      Pre-launch - the default initial value for SC-specific parts
C5      NPI - part is released for support
S6      Sustaining - supported part which is > 180 past the SAP FCS date
C8      Supplier (or ‘Supply’) EOL, POs still possible with potential limitations
S8      LTB analysis done, LTB PO raised where required
S9      EOSL - part is no longer supported
C9      Obsolete - blocks all inventory and financial transactions

Project Objectives and Tools Used


The following tools were used in this project to forecast demand:
a. Excel->Converting the bulky dataset into a CSV file and collating the demand data with the
spare parts data
b. Tableau->An initial descriptive report to build business acumen before assigning values
and weights to the dataset for manipulation in Python
c. Jupyter Notebook (Python)->A live, web-based environment, run through Anaconda, for
analysing large datasets and gathering insights effectively
The following Python libraries were used most commonly:
a. scikit-learn->Supports the entire data analytics life cycle, from preprocessing to
learning
i. LabelEncoder, OneHotEncoder (encoding data)
ii. StandardScaler, Normalizer, MinMaxScaler (scaling/normalizing data)
iii. PCA, shuffle, ColumnTransformer (feature engineering)
iv. train_test_split (hold-out method for supervised learning)


v. Supervised learning methods such as Naïve Bayes (Gaussian, Bernoulli, Multinomial),
SVM, Decision Tree, Random Forest, Regression (Logistic, Linear)
vi. Evaluation metrics such as accuracy score, precision score, recall score, f1 score

b. Pandas->Used to work on dataframes, from their creation through to complex manipulations

c. NumPy->Used predominantly, but not exclusively, for linear algebra operations on the
dataframes (and other forms of datasets)

d. Visualization
i. Matplotlib
ii. Seaborn

e. Other
i. statistics
ii. itertools
Objective
1. The target of this project is to predict the demand for a product based on its prior sales.
2. We also aim to create a system to manage the inventory of the different warehouses,
depending on the sales of a product.
3. This will give an idea of how much of a product should be ordered.

*PSMS->Plant Specific Material Status; it indicates the part's current position in its life cycle
**D-Chain->Determines if part is for sale and available for immediate delivery
***SPT->Special Procurement Type->Returnable or non-returnable

Role of Machine Learning


Machine Learning Model
For demand planning, since we have the past sales data, we could perform supervised learning,
taking into account the price of the item (business acumen suggests that lower-priced/
support-system items would usually need a bulk order). As is evident in the preliminary
results, on experimenting with various supervised learning techniques, the best results were
obtained using the tree-based methods, i.e. Random Forest (an ensemble of decision trees) and
Decision Tree. We have also taken cognizance that it is important to compartmentalize demand
into blocks such as low, medium, high, very high and booming, given the large variance
expected in a business of such a wide ambit. The manipulation is done using Python and the
scikit-learn library.

Given the categorical nature of the target variable, we employed the following supervised
classification techniques:
a. GaussianNB
b. Random Forest Classifier
c. Decision Tree Classifier
d. MultinomialNB
e. SVM
f. BernoulliNB
The accuracies obtained (post feature engineering) are given below.

As is evident, Random Forest and the Decision Tree classifier (tree-based methods) give us the
best results, hence we base our results on them. Across several trials, including shuffling
the data in the hold-out split and using random samples from the dataset, Random Forest is
consistently the best method.
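
The comparison described above can be sketched roughly as follows. This is a minimal illustration and not the project's notebook: the data is synthetic (generated with make_classification as a stand-in for the six demand classes), all models use default hyperparameters, and the scaling step is an assumption added because MultinomialNB requires non-negative inputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the spare-parts data: 6 classes, like the 6 demand bins.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scale to [0, 1] using the training data only; clip the test data so that
# MultinomialNB never sees a negative value.
scaler = MinMaxScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = np.clip(scaler.transform(X_test), 0.0, 1.0)

models = {
    "GaussianNB": GaussianNB(),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "MultinomialNB": MultinomialNB(),
    "SVM": SVC(),
    "BernoulliNB": BernoulliNB(),
}

# Fit every model on the same split and record its hold-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:25s} {score:.3f}")
```

The ranking on synthetic data says nothing about the ranking on the real dataset; the sketch only shows the mechanics of fitting all six classifiers on one hold-out split.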

Data Exploration
This is the first step of the demand forecasting process, and of most data analytics life
cycles in general. After the required libraries (NumPy, Pandas, scikit-learn, Matplotlib etc.)
are imported, the steps are:
a. Load the CSV file and check the datatypes
b. Read the first few rows (head() function)

c. Copy the target variable to a separate dataframe (it must be dropped from the features later)

Here, business acumen is of utmost importance.


i. Local Stock Advice Code gives the priority of the product as 1, 2, 3, 4; the weights
assigned follow the priority
ii. There is a usual trade-off between quantity and price in the bulk of orders, so to
keep the results objective we multiply LOFNETQTY by Price (e.g. a cartridge will
always have far more individual parts in orders than a xerox machine, but the
standard by which we deem each to be in high demand is different)
d. Fill in NULL values in the dataframe and handle other details such as separating month
and year and correcting spellings
e. Also note that PSMS values have been given weights according to priority (a manual
label encoding). A PSMS of S6 is extremely high priority because the part is sustaining,
while the others are not. (Based on discussion with the lead, S6 is given 4 times the
priority of 1.)

f. The month is one hot encoded since our initial assumption/null hypothesis is that the
month per se has no impact. If it does, the machine learning model will account for it,
given the equal weights assigned through OHE. The region is one hot encoded similarly.

g. We bin the target variable as very low, low, medium, high, very high and booming based
on the needs of the lead. By trial and error based on the 25th, 50th, 75th, 85th, 95th and
99th percentiles of the target variable values, the bins were chosen as:
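
Steps c.ii, e, f and g can be sketched in Pandas as below. This is only an illustration under stated assumptions: the rows are made up, every column name other than LOFNETQTY is hypothetical, the PSMS weights read the text as "S6 gets weight 4, everything else weight 1", and the bin edges are an arbitrary choice of five of the listed percentiles, since the bins actually chosen are not reproduced in this text.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the CSV (which would be loaded with pd.read_csv).
df = pd.DataFrame({
    "LOFNETQTY": [500, 2, 40, 1, 120, 15, 300, 8],
    "Price": [3.0, 900.0, 25.0, 1500.0, 10.0, 60.0, 4.5, 200.0],
    "PSMS": ["S6", "C5", "S6", "S8", "C8", "S9", "S6", "C5"],
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar", "Jan", "Feb"],
    "Region": ["AME", "APJ", "EMEA", "AME", "APJ", "EMEA", "APJ", "AME"],
})

# Steps a and b: inspect datatypes and the first few rows.
print(df.dtypes)
print(df.head())

# Step c.ii: express demand as order value so that cheap high-volume parts and
# expensive low-volume parts are judged on the same scale.
df["DemandValue"] = df["LOFNETQTY"] * df["Price"]

# Step e: manual PSMS weighting (assumed reading: S6 -> 4, all others -> 1).
psms_weights = {"S6": 4}
df["PSMS_Weight"] = df["PSMS"].map(psms_weights).fillna(1).astype(int)

# Step f: one hot encode month and region.
df = pd.get_dummies(df, columns=["Month", "Region"])

# Step g: bin the target on percentile edges (illustrative edges only).
labels = ["very low", "low", "medium", "high", "very high", "booming"]
edges = np.percentile(df["DemandValue"], [25, 50, 75, 95, 99])
bins = [-np.inf, *edges, np.inf]
df["DemandBin"] = pd.cut(df["DemandValue"], bins=bins, labels=labels)

print(df[["DemandValue", "PSMS_Weight", "DemandBin"]])
```

With only a handful of rows some categories stay empty; on a real dataset the percentile edges would be tuned, as the text says, by trial and error with the business lead.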

Data Manipulation
a. Drop redundant/non-numerical columns. Drop month and region, since they have already been
one hot encoded.

b. Normalize all columns except Local Stock Advice Code (it is essentially a weight)


c. Price, DChain and Inventory values had a few NaN/invalid values, which were replaced
with the median
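
A minimal sketch of the three manipulation steps follows. The rows are made up, the column names PartName and LocalStockAdviceCode are hypothetical, and MinMaxScaler is an assumed choice of normalization (the report lists several scalers without saying which was applied here).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "PartName": ["fuser", "roller", "tray", "toner"],  # non-numerical column
    "LocalStockAdviceCode": [1, 3, 2, 4],              # priority weight, left unscaled
    "Price": [120.0, np.nan, 35.0, 60.0],
    "DChain": [55, 69, np.nan, 55],
    "APJ_INVentory_Value": [1000.0, 250.0, np.nan, 400.0],
})

# Step c: replace NaN values with the column median.
for col in ["Price", "DChain", "APJ_INVentory_Value"]:
    df[col] = df[col].fillna(df[col].median())

# Step a: keep only numerical columns.
df = df.select_dtypes(include="number")

# Step b: normalize every column except the advice code, which acts as a weight.
to_scale = [c for c in df.columns if c != "LocalStockAdviceCode"]
df[to_scale] = MinMaxScaler().fit_transform(df[to_scale])

print(df)
```

Imputing before scaling matters: the scaler would otherwise propagate NaN values into the normalized columns.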

Feature Engineering
Before delving into the feature engineering process, we present a first analysis of the various
supervised learning methods (after the hold-out method was applied).
(Only three basic code snippets are shown as examples; more would fill the whole page.)

The accuracy results (at first, just the basic metric):


Hence, it was imperative to filter out the important features. Random Forest was used for
this, since it was the most accurate method here.

Plot of features in order of importance

We can choose the top 10 features for our needs (APJ_INVentory_Value,
AME_INVentory_Value,....., AMS) and discard the others.
Accuracy improves significantly, by 4-5% in some cases. The best methods are still the
tree-based methods, Random Forest and Decision Tree.
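
Ranking features with a random forest and keeping the top 10 could look roughly like this. It is a sketch on synthetic data with placeholder feature names, not the project's own snippet; the number of trees is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 20 candidate features, 6 demand classes.
X_arr, y = make_classification(
    n_samples=500, n_features=20, n_informative=6,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(20)])

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)

# Keep the 10 most important features and discard the rest.
top_features = importances.head(10).index.tolist()
X_selected = X[top_features]

print(importances.head(10))
```

In practice the forest would be fitted on the training portion only, so that the test set plays no part in choosing features.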

Building Training-Test Sample


We use the hold-out train/test validation method, wherein one part of the dataset is used to
train and build a model, and the trained model is then checked against the smaller test set
to see whether it has learned correctly, thereby evaluating the model.
In this case,
Training->70% of the dataset
Test->30% of the dataset


Validation Process
The training and test datasets were first formed with the shuffle option set to “True” so
that the sets are formed randomly. After running the entire workflow 5 times, it was set to
“False” so that the evaluated dataset can feasibly be labelled, allowing insights to be
provided with the utmost accuracy.

The process was performed twice->first on the dataset as it is, and then again after the
feature engineering process.
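
The two variants of the 70/30 hold-out split can be sketched with train_test_split as below. The arrays are toy data and the random_state is an arbitrary choice; only the 70/30 ratio and the shuffle setting come from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 rows in their original order.
X = np.arange(40).reshape(20, 2)
y = np.arange(20) % 4

# Random 70/30 split (shuffle=True is the default).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42
)

# Ordered 70/30 split: the first rows train the model and the last rows test
# it, so predictions line up with the original row order of the dataset.
X_tr_o, X_te_o, y_tr_o, y_te_o = train_test_split(
    X, y, test_size=0.3, shuffle=False
)

print("shuffled test rows:", X_te[:, 0])
print("ordered test rows: ", X_te_o[:, 0])
```

An ordered split is only a fair evaluation when the row order carries no pattern, which is presumably why the shuffled runs were done first.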

Model Selection and Evaluation


Since we had a target variable, we limited the modelling to supervised learning techniques.
Regression would not be useful in this case due to the large number of encodings and weight
assignments, which could skew the results through either underfitting or overfitting; hence
we limited ourselves to the following methods:
a. Naïve Bayes->GaussianNB, MultinomialNB, BernoulliNB
b. Tree-based methods->Random Forest (ensemble), Decision Tree
c. Vector methods->Support Vector Machines
d. Neighbors->K-Nearest Neighbors Classifier

Initially, only the accuracy was tested to check for potential for feature engineering:

While the accuracy is decent for a few methods (Random Forest, DT, KNeighbors, SVM), there is
potential to improve both accuracy and computation time; hence, it is imperative to resort to
feature selection.

Feature Selection Snippet

Random Forest was used for this purpose, since it had the highest accuracy earlier.


Accuracy Post Feature Selection

Feature selection did not bode well for the Naïve Bayes and SVM based methods; however, for
the tree-based (Random Forest and DT) and neighbors-based (KNN) methods, the improvement is
highly significant. While any of the three could be chosen, we stuck to the Random Forest
Classifier.

Other Evaluation Metrics


Note->It is not recommended to use Area Under Curve for comparison, since it first requires
binarization of the data, which would reduce interpretability for a categorical target
variable.
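
To make the note concrete, the sketch below shows what binarization involves for a six-class target: each class becomes its own 0/1 column and gets its own one-vs-rest AUC. The data is synthetic and the setup is an assumption for illustration, not part of the project's workflow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
# Stratify so that every class appears in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
proba = forest.predict_proba(X_test)  # one probability column per class

# Binarization: one 0/1 indicator column per class.
classes = forest.classes_
y_test_bin = label_binarize(y_test, classes=classes)

# One-vs-rest AUC per class: six separate numbers instead of a single score.
per_class_auc = {
    int(c): roc_auc_score(y_test_bin[:, i], proba[:, i])
    for i, c in enumerate(classes)
}
print(per_class_auc)
```

Each value answers "this class versus all the others", which is harder to read against ordered demand categories than a single accuracy figure.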

Precision and Recall Score for Random Forest

Confusion Matrix


Due to a high value of over 75% in all metrics, this method could get the go-ahead for the
purpose.
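
The metrics referred to above can be computed with scikit-learn as sketched here. The data is synthetic, and the "weighted" averaging across classes is an assumption; the report does not state which averaging was used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score,
)
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Multi-class precision/recall/f1 need an averaging rule; "weighted" weights
# each class by its number of true samples.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
    "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
    "f1": f1_score(y_test, y_pred, average="weighted", zero_division=0),
}

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

for name, value in metrics.items():
    print(f"{name:10s} {value:.3f}")
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its off-diagonal cells show which demand categories are confused with one another.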

To further improve accuracy, we recommend:

a. Improving business acumen for better binning
b. Improving business acumen for better weight assignment
c. More data; a year may not be enough to account for biases
Key Contributions
On realizing the contributions that advances in subsets of Data Science, viz. Machine
Learning, Artificial Intelligence and Deep Learning, make to our day-to-day activities, and
that these contributions grow every day, perhaps the present millennial generation and the
generations to come would find a life without them unfathomable. While it may be far-sighted
at this stage, the zenith of this is expounded by Murray Shanahan as the “Technological
Singularity”, which in very rudimentary terms would involve our entire life processes being
driven by technology to the point of it being the sole decision maker. Without digressing
into the philosophical, Machine Learning is used in a swathe of industries, ranging from
effective assembly lines in core industries, to demand and inventory planning in the service
and e-commerce sectors, to a range of processes in the primary sector of agriculture. Here,
using real data, we exemplified its significance for a small to medium scale printers and
spares organization, showing how it could aid effective decision making and inventory
planning. Today, employing machine learning in a business may be a luxury, but in the coming
years it will be a sine qua non. The following aspects were covered, as they usually are in
any analytics process:
a. Business acumen=>Often domain-specific knowledge
b. Comprehensive knowledge of statistics as well as the mathematics of Machine Learning
c. Data exploration processes, both technical and logical/face-value based
d. Data preparation, both to obtain valuable insights and to improve computation time
e. Data modelling and evaluation: choosing the right techniques and evaluating which is the
most appropriate
f. Deployment->Exporting the learnings into a CSV file for further analysis
g. Descriptive and predictive visualizations in Tableau/Excel/Power BI

Analysis of Results
With an accuracy of at least 75%, we can give the proprietor of Inqer Printers and Spares a
reasonably reliable prediction of demand. Note, however, that the data runs up to Feb-2020,
right before the economic slump due to the pandemic, so realistically the analysts might have
had a shock at how far actual demand diverged from these results. This highlights the
importance of dynamically accounting for external factors (economists use the phrases “animal
spirits” and “black swan events”), which is a learning in itself from the project.
Nevertheless, the graph shows how expected demand has been classified into the various demand
categories.

As the trend suggested, a lot of products are expected to be in “Low” demand, as they followed
those metrics; however, very few are in “Very Low”, and an encouraging number are in “Very
High” and “High”. It is alarming, however, that many more products are in low demand than in
medium demand, which makes it imperative to clear out the stock that has not moved off the
shelves, through:
a. Sale at throwaway prices
b. Sale through e-commerce platforms
c. On-the-shelf marketing in local stores rather than solely in the company's own store
d. Chain marketing
e. Grassroots-level lead generation for potential bulk orders (although executing this may
be expensive and might not offer a great trade-off)

For the few within “Very Low”, it undoubtedly has to be through sale at throwaway prices.

The goal is to bring as many products into the booming category as possible; this is, however,
not realistic and in any case would only involve raising the yardsticks further.


A large number of products within the “medium” category indicates that Inqer is not threatened
by competition, or at most that it has upped the ante; however, it must always remain on the
lookout.

Tableau Visualizations

Descriptive

Predictions
