

Institute of Technology & Advanced Learning


SCM 4525 - 0LA



Prepared by:

Aksh Rathod - N01405747

Ha Phuong Anh Nguyen - N01391968

Sandra Bewaji - N01107633

Minh-Tuan Nguyen - N01365776

Eric French - N00194962

Hatem Shashaa

April 18th, 2023

I. Project Description

Company: DataCo Global

Project Description: The DataCo Global project examines DataCo's supply chain operations to
identify areas where improvements could raise the company's productivity and reduce its costs.
We analyzed a dataset covering areas such as customer sales and inventory. The overall goal is
to build a better understanding of DataCo's supply chain operations and to pinpoint where they
can be improved.

Steps for the project:

● The first step is to explore the dataset in order to understand the information it contains
and the areas of operations it covers. At this stage we clean and vet the data before
processing it, to safeguard its consistency and reliability. The dataset contains information
on sales, inventory, and logistics, so it is necessary to understand how these components
relate to one another and how they influence the overall functioning of the supply chain.
Duplicates and missing values are removed, and the data is arranged in a layout suitable for
analysis, so that data quality concerns are addressed before the analysis begins.

● After the first stage, we conduct an exploratory data analysis (EDA) to identify patterns and
trends in the dataset. Visualizing the data helps build an understanding of how the supply
chain operates and how the company's logistics can be improved. The techniques include bar
charts, scatterplots, pie charts and more. We used these to visualize the statistical analysis
and recognize patterns in the dataset. EDA also helped us spot possible irregularities in the
dataset so we could investigate them further.

● Once that analysis was complete, we developed a descriptive model to gain a clear picture of
the company's financial situation. We then built a predictive model to forecast inventory and
sales by analyzing the historical data and tracking its trends. This allows us to improve the
company's logistics processes and decrease its operating costs.

● The last stage is the conclusion, with suggestions for refining logistics operations based on
the insights gained from the predictive model and the analysis of the dataset. Changes could be
made to logistics, inventory management and sales policies, and DataCo's data collection and
analysis could be developed further.

The main objective of the report is to identify the most profitable categories and to develop
improved forecasting techniques so that the company can maintain its competitiveness. We used
analytics from the dataset to identify possible improvements in DataCo's logistics operations
and to develop a plan to improve productivity and lower costs. The tools used include data
cleaning and processing, exploratory data analysis, a descriptive model, a predictive model and
application.

II. Project Objectives

By analyzing the dataset, the project aims to identify the most profitable categories. This
will help the company concentrate its resources on those categories and allocate inventory
appropriately. Using the DataCo Global dataset, we aim to improve forecasting so that demand
for each category can be predicted more accurately. Doing so will allow DataCo to improve its
inventory management and replenish product stock on time. We also hope this will improve DataCo
Global's profitability by strengthening its portfolio and concentrating on the most profitable
categories, allowing it to grow its productivity and improve its financial performance. This
will enable the company to invest in new products and services and to continue expanding its
operations. Finally, we aim to increase DataCo's competitiveness by concentrating on the most
lucrative categories and applying forecasting techniques. The end goal is to meet customer
demand better and to strengthen the company's position in the market.

The steps we took to reach these objectives were:

● Identify any outliers and propose corrections to decrease costs.

● Analyze the patterns and advise on which products should be made more available in
inventory.

● Build a forecasting model to support production planning and scheduling.

All of these steps aim at better efficiency, cost savings and customer satisfaction.
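
The report does not specify how outliers were identified. As a minimal sketch, assuming the
common 1.5 x IQR rule and a hypothetical list of order quantities (neither is taken from the
DataCo analysis), the first step could look like this:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical order quantities, for illustration only
quantities = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(quantities)
print(outliers)
```

Flagged values would still need a manual check, since an unusually large order can be genuine
rather than a data error.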

III. Importance of the Research to the Industry

❖ Cost Reduction: Supply chain research helps e-commerce companies to optimize their

supply chain processes, which leads to a reduction in costs. With efficient supply chain

management, e-commerce companies can reduce the cost of transportation, inventory,

warehousing, and other supply chain-related expenses.

❖ Customer Satisfaction: Efficient supply chain management ensures that products are

delivered to customers on time and in good condition. This enhances customer

satisfaction, which is crucial for the success of any e-commerce business.

❖ Competitive Advantage: E-commerce companies that have efficient supply chain

management processes have a competitive advantage over their rivals. Such companies

can offer better delivery times, lower costs, and higher quality products, which attract and

retain customers.

❖ Increased Profitability: Efficient supply chain management results in increased

profitability for e-commerce companies. This is because it reduces the cost of doing

business, increases customer satisfaction, and enhances the company's reputation.

❖ Improved Sustainability: Supply chain research can help e-commerce companies to

identify and implement sustainable practices that reduce the environmental impact of

their operations. This not only helps to protect the environment but also enhances the

company's reputation among consumers who are increasingly concerned about the environment.


In conclusion, exploring inventory models will help us resolve many operational issues within
our supply chain. Research into inventory models is beneficial because its results apply across
the entire company and extend to the wider industry.

IV. Data Used (Primary and/or Secondary)

● The data we are using for the research is secondary data.

○ Secondary data can be obtained from a variety of sources, including

government agencies, industry associations, academic institutions, and

commercial data providers. Some common methods of obtaining

secondary data include:

○ Online searches: Researchers can conduct online searches for relevant

information using search engines, databases, and other online resources.

○ Data providers: Researchers can purchase secondary data from

commercial data providers, such as market research firms or industry


○ Government agencies: Researchers can obtain secondary data from

government agencies, such as the U.S. Census Bureau or the Bureau of

Labor Statistics.

○ Academic sources: Researchers can obtain secondary data from academic

sources, such as journals and published research papers.

● The main characteristics of secondary data include:

○ Non-original: Secondary data is information that has been previously

collected and recorded by others, rather than collected specifically for the

purpose of the research.

○ Low cost: Secondary data is generally less expensive to obtain than

primary data, as it is often publicly available or can be purchased from

third-party sources.

○ Time-saving: Secondary data can be obtained quickly and easily, as it is

readily available from a variety of sources.

○ Large sample size: Secondary data often has a large sample size, which

can provide a more comprehensive view of supply chain operations.

○ Variable quality: The quality of secondary data can vary widely

depending on the source and the accuracy of the information.

○ Lack of control: Researchers have little or no control over the collection

and recording of secondary data, which can limit its usefulness.

V. Data Used, Implemented Data Cleaning & Processing

Figure 1. Raw Data

Figure 2. Cleaned Data

The main objective of the report is to identify the most profitable categories and to develop
optimized forecasting methods so that the company can maintain its competitiveness. The raw
dataset contains more than 180,000 records, so it was reduced to 1,500 records. In addition,
many variables were irrelevant to the purpose of this report. These were removed, and only the
variables illustrated in Figure 2 were kept.
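
The report does not state how the 1,500 records were selected. One reproducible option, shown
here only as an assumption, is a seeded random sample. The function name, the seed and the toy
rows below are hypothetical:

```python
import random

def sample_rows(rows, k=1500, seed=42):
    """Return a reproducible random sample of at most k rows."""
    if len(rows) <= k:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the same subset is drawn every time
    return rng.sample(rows, k)

# Toy stand-in for the raw dataset of more than 180,000 records
raw = [{"order_id": i} for i in range(180_000)]
subset = sample_rows(raw)
print(len(subset))
```

A random sample keeps the full date range represented, whereas taking the first 1,500 rows of a
date-sorted file would cut the later months off.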

Because we focus on the profit of the company's products, all information about the customers
(such as last name, email, etc.), payment methods (debit, credit, transfer), shipping method
and delivery status was removed. Customers' personal information is not relevant to
profitability. Shipping Mode, which has only four values, was simple enough to keep.

Each category name has its own category ID number, so we reduced the number of ID columns to
make the dataset leaner and more practical. Several pairs of columns carry the same
information: Customer ID and Order Customer ID; Product Card ID and Order Item CarProd ID; and
Product Price and Order Item Product Price. One column of each pair was removed. Because we are
tracking orders, the column with “Order” in its name was kept.

Product information such as product images, product descriptions and product status was removed
because it adds nothing to the analysis. Product Category ID was also removed because category
names are already in use. For geography, we consider only “Order Region”. Everything else, such
as Latitude, Longitude, States, Zip Codes, etc., was removed.
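
The cleaning rules described above (drop the redundant columns, discard rows with missing
values, remove duplicates) can be sketched as follows. This is an illustration rather than the
procedure actually used: the exact header spellings, the helper name and the toy rows are
assumptions.

```python
# Columns the text says were dropped; the exact header spellings are assumed
DROP_COLUMNS = frozenset({
    "Customer ID", "Product Card ID", "Product Price",
    "Product Category ID", "Latitude", "Longitude",
})

def clean_rows(rows, drop_columns=DROP_COLUMNS):
    """Drop unwanted columns, skip rows with missing values, remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        if any(v is None or v == "" for v in kept.values()):
            continue  # a missing value in a retained column: discard the row
        key = tuple(sorted(kept.items()))
        if key in seen:
            continue  # exact duplicate of a row already kept
        seen.add(key)
        cleaned.append(kept)
    return cleaned

# Hypothetical rows: one duplicate pair and one row with a missing category
rows = [
    {"Order Customer ID": 1, "Customer ID": 1, "Category Name": "Cleats",
     "Order Region": "Western Europe", "Latitude": 18.2},
    {"Order Customer ID": 1, "Customer ID": 1, "Category Name": "Cleats",
     "Order Region": "Western Europe", "Latitude": 18.2},
    {"Order Customer ID": 2, "Customer ID": 2, "Category Name": "",
     "Order Region": "Central America", "Latitude": 10.0},
]
cleaned = clean_rows(rows)
print(cleaned)
```
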
After shrinking the data, the records run only until September 2017. Although the raw data
extends into 2018, the full dataset is too large to work with. Therefore, the forecast starts
from October 2017.

In addition, after cleaning the data, only 33 categories remain (as illustrated below). The
Sales in Categories sheet includes all 33 categories, along with Order Item Quantity, Order
Item Product Price, Order Item Discount Rate, Sales, Order Item Profit Ratio and Order Item
Profit Per Order, which are used to calculate the revenue corresponding to each category.
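
The report does not give the revenue formula. As a hedged sketch, assuming revenue per order
line is quantity x price x (1 - discount rate) and that profit per order is simply summed, the
per-category totals could be computed as below. The dictionary keys and sample values are
hypothetical.

```python
from collections import defaultdict

def summarize_by_category(rows):
    """Sum assumed revenue and recorded profit for each category."""
    totals = defaultdict(lambda: {"revenue": 0.0, "profit": 0.0})
    for r in rows:
        # Assumed formula: quantity * unit price, net of the discount rate
        revenue = r["quantity"] * r["price"] * (1 - r["discount_rate"])
        totals[r["category"]]["revenue"] += revenue
        totals[r["category"]]["profit"] += r["profit"]
    return dict(totals)

# Hypothetical order lines, for illustration only
order_lines = [
    {"category": "Cleats", "quantity": 2, "price": 50.0, "discount_rate": 0.1, "profit": 20.0},
    {"category": "Cleats", "quantity": 1, "price": 50.0, "discount_rate": 0.0, "profit": 10.0},
    {"category": "Women's Apparel", "quantity": 3, "price": 20.0, "discount_rate": 0.0, "profit": 15.0},
]
summary = summarize_by_category(order_lines)
print(summary)
```
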
The Customers in Regions sheet shows the total order quantity in each region. A bar chart of
these totals illustrates the potential markets for the company.
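
A minimal sketch of this regional summary follows, using a plain-text bar chart in place of the
spreadsheet chart. The row keys and quantities are hypothetical and only illustrate the
calculation.

```python
from collections import Counter

def order_quantity_by_region(rows):
    """Total order quantity per region, largest first."""
    totals = Counter()
    for r in rows:
        totals[r["region"]] += r["quantity"]
    return totals.most_common()

def text_bar_chart(ranked, width=40):
    """Render (region, quantity) pairs as text bars scaled to the largest value."""
    if not ranked:
        return []
    largest = ranked[0][1]
    lines = []
    for region, qty in ranked:
        bar = "#" * max(1, round(qty / largest * width))
        lines.append(f"{region:<16} {bar} {qty}")
    return lines

# Hypothetical orders, for illustration only
orders = [
    {"region": "Western Europe", "quantity": 5},
    {"region": "Central America", "quantity": 4},
    {"region": "Western Europe", "quantity": 3},
]
ranked = order_quantity_by_region(orders)
print("\n".join(text_bar_chart(ranked)))
```
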
For the Yearly Trend, only the top three categories in terms of profit are considered:
Sporting Goods, Cleats and Women’s Apparel. The sheets include the Order Date, Category and
Order Quantity.
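
To turn these three columns into a demand series for forecasting, the order quantities can be
totalled per month. The sketch below assumes order dates are stored as text in a
month/day/year hour:minute layout. That layout, the row keys and the sample rows are
assumptions, not details confirmed by the report.

```python
from collections import defaultdict
from datetime import datetime

def monthly_demand(rows, category, date_format="%m/%d/%Y %H:%M"):
    """Total order quantity per (year, month) for one category, in date order."""
    totals = defaultdict(int)
    for r in rows:
        if r["category"] != category:
            continue
        order_date = datetime.strptime(r["order_date"], date_format)
        totals[(order_date.year, order_date.month)] += r["quantity"]
    return dict(sorted(totals.items()))

# Hypothetical rows, for illustration only
sample = [
    {"category": "Cleats", "order_date": "1/31/2017 22:56", "quantity": 2},
    {"category": "Cleats", "order_date": "1/5/2017 09:10", "quantity": 1},
    {"category": "Cleats", "order_date": "12/14/2016 13:00", "quantity": 4},
    {"category": "Sporting Goods", "order_date": "1/9/2017 08:00", "quantity": 7},
]
cleats_monthly = monthly_demand(sample, "Cleats")
print(cleats_monthly)
```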

VI. Visualized Data

Figure 1. Sales Data in Categories: Figure 1 shows the details of each category and its
profit per order. Figure 2 summarizes all 33 categories, showing the profit of each as well as
the total.

Figure 2. Profit Percentage by Category ID: because the category IDs were assigned in order of
profitability, the chart shows that profitability rises as the category ID increases. This
makes it easy to identify the best items to sell: the most profitable items have the highest
category IDs, and the lower the category ID, the lower the profitability.

Figure 3. Top 10 Categories in Profit: with 33 categories in total, a single chart would be
divided into many small slices, and the low-profit categories might each account for less than
1%. Therefore, only the top 10 most profitable categories are visualized. The remaining 23 are
grouped as “Others”.
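
The grouping used for this chart can be sketched as follows: rank the categories by profit,
keep the top n, and fold the rest into a single “Others” share. The function name and the
sample profits are hypothetical, and the sketch assumes every category has a positive profit.

```python
def top_n_with_others(profit_by_category, n=10):
    """Return (name, percentage share) for the top n categories plus 'Others'."""
    total = sum(profit_by_category.values())
    if total <= 0:
        raise ValueError("total profit must be positive to compute shares")
    ranked = sorted(profit_by_category.items(), key=lambda kv: kv[1], reverse=True)
    shares = [(name, value * 100 / total) for name, value in ranked[:n]]
    rest = ranked[n:]
    if rest:
        shares.append(("Others", sum(v for _, v in rest) * 100 / total))
    return shares

# Hypothetical profits, for illustration only
profits = {"A": 50, "B": 30, "C": 15, "D": 5}
shares = top_n_with_others(profits, n=2)
print(shares)
```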

Figure 4. Customers in Regions: The total order quantity in each region is calculated,
regardless of product category. The bar chart shows that Western Europe is the market with the
most potential, followed by Central America and South America.

Figure 5. Yearly Trend - Sporting Goods: For Sporting Goods, the remaining data covers only
2017 - 2018. Hence, the trend is shown over a two-year period.

Figure 6. Yearly Trend - Cleats: The trend is shown over three years, from 2015 - 2018. Demand
for this category fluctuated.

Figure 7. Yearly Trend - Women’s Apparel: The trend for this category also covers 2015 - 2018.
Demand for Women’s Apparel increased gradually each year.

VII. Data Models

01. Simple 3-month Moving Average for Cleats and Women’s Apparel
We use a simple 3-month moving average to forecast the demand for Cleats and Women’s Apparel
in 2017.
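
In a simple 3-month moving average, the forecast for a month is the mean of the actual demand
in the three preceding months. The sketch below illustrates the calculation on a hypothetical
demand series. The function name and the numbers are not taken from the report.

```python
def moving_average_forecast(demand, window=3):
    """forecasts[t] is the forecast for period t (None until `window` actuals exist).

    The final element is the forecast for the period after the last observation.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    forecasts = [None] * window
    for t in range(window, len(demand) + 1):
        forecasts.append(sum(demand[t - window:t]) / window)
    return forecasts

# Hypothetical monthly demand, for illustration only
demand = [120, 130, 110, 140, 150]
forecasts = moving_average_forecast(demand)
print(forecasts)
```

The first three entries are empty because three months of history are needed before the first
forecast can be made.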

02. Exponential Smoothing Forecasting for Cleats and Women’s Apparel

For Exponential Smoothing, we select α = 0.3.
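
Exponential smoothing updates each forecast as F(t+1) = α × A(t) + (1 - α) × F(t), where A(t)
is the actual demand and F(t) the forecast for period t. The sketch below uses α = 0.3 as
stated above. Starting the first forecast at the first actual value is a common convention
assumed here, and the demand figures are hypothetical.

```python
def exponential_smoothing_forecast(demand, alpha=0.3):
    """Return forecasts F(1)..F(n+1) for actual demand A(1)..A(n).

    F(1) is initialised to A(1), an assumed convention.
    """
    if not demand:
        raise ValueError("demand must contain at least one observation")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecasts = [demand[0]]
    for actual in demand:
        # New forecast = alpha * latest actual + (1 - alpha) * previous forecast
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts

# Hypothetical monthly demand, for illustration only
demand = [100, 120, 110]
forecasts = exponential_smoothing_forecast(demand)
print(forecasts)
```

A lower α such as 0.3 weights the forecast history more heavily than the latest month, so the
forecast reacts slowly to sudden swings in demand.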

VIII. Results & Conclusions

Overall, after conducting several analyses using charts and graphs for visualization, and
applying two models, the Moving Average Model and the Exponential Smoothing Model, to see where
trends are heading for different products, we can identify the top-selling products to focus on
for either further analysis or production planning. Additionally, we have a view of which
products will be most in demand in the future. That allows us to:

1. Prepare for the products expected to generate the most profit.

2. Manage our supply chain in a manner that will give us a competitive advantage in the years
to come.


1. Constante, F., Silva, F., & Pereira, A. V. (2019). DataCo SMART SUPPLY CHAIN FOR

BIG DATA ANALYSIS. A. Retrieved from:
