
VIETNAM NATIONAL UNIVERSITY, HANOI

INTERNATIONAL SCHOOL

***

DATA WAREHOUSING AND BUSINESS ANALYTICS

REPORT OF FINAL EXAMINATION

Business Analysis based on Northwind Database

Lecturer: PhD. Tran Thi Ngan


Class: INS307301
Group 11

Hoàng Quỳnh Anh 20070666


Nguyễn Thị Thanh Nhàn 21070695
Nguyễn Anh Thơ 20070984
Nguyễn Phương Uyên 21070734

Hanoi – 2024
PARTICIPANTS

Full Name                 Student ID    Contribution Percentage
Hoàng Quỳnh Anh           20070666      25%
Nguyễn Thị Thanh Nhàn     21070695      25%
Nguyễn Anh Thơ            20070984      25%
Nguyễn Phương Uyên        21070734      25%

TABLE OF CONTENTS

PARTICIPANTS
TABLE OF CONTENTS
I. Introduction
    1. Problem Overview
    2. Project Objective
    3. Business Question
II. Database Preparation
    1. Database Design
    2. ETL Process
III. Building data warehouse
    Business Process
    Dimensional Modeling
IV. Business analysis
V. Conclusion
VI. References

I. Introduction
1. Problem Overview
The challenge lies in predicting future sales accurately, which is crucial for businesses to
formulate effective strategies and ensure sustainable growth. With rapidly changing consumer
behavior and intense competition, understanding the factors influencing purchasing decisions
is essential. However, this task is complex due to the multitude of variables involved,
including consumer preferences, market trends, economic conditions, and competitor actions.
Traditional methods may not suffice in capturing and analyzing such vast and dynamic data
sets efficiently. Hence, there is a need for a comprehensive solution that can gather, store,
explore, analyze, and present data to forecast future consumption trends effectively.
To tackle this complexity, constructing a Data Warehouse is indispensable. A Data
Warehouse aids organizations in collecting, storing, analyzing, and presenting sales data. This
function is especially critical in an environment characterized by swift changes in ordering
trends and intense competition.
We collect data from diverse sources, tracking and comparing it with averages, targets, and
historical data to discern critical patterns and trends. Evaluating these results is a key step in
the analysis process. Ultimately, the derived insights provide essential information to support
organizational decision-making.

2. Project Objective
The objective of the project is to build a robust data warehouse system using Google
BigQuery that can serve as a powerful tool for analyzing and predicting consumption trends.
By collecting, storing, and processing data related to sales from various sources, the data
warehouse aims to provide accurate and timely insights into future consumption patterns and
the factors influencing them.
Through analytics and predictive modeling, the project seeks to identify the key drivers of
consumption decisions and to forecast revenue and order quantities accurately. The ultimate
goal is to enable organizations to make well-informed strategic plans based on actionable
insights derived from the data warehouse.

3. Business Question
“Which products should we increase advertising for to optimize sales and increase profits?”
Answering this question requires a comprehensive analysis of product performance metrics,
market dynamics, customer preferences, and the competitive landscape, all of which bear on
marketing strategy and consumer behavior.

II. Database Preparation
1. Database Design

Figure 1: ERD diagram

Table "Employees":
Employee_ID: int [primary key]
Last_name: nvarchar (20)
First_name: nvarchar (20)
Title: nvarchar (30)
Title_of_courtesy: nvarchar (25)
Birth_date: datetime
Hire_date: datetime
Address: nvarchar (60)
City: nvarchar (20)
Region: nvarchar (20)
Postal_code: nvarchar (10)
Country: nvarchar (20)
Home_phone: nvarchar (24)
Extension: nvarchar (4)
Photo: image
Notes: ntext
Reports_to: int (foreign key referencing Employee_ID in the same table)
PhotoPath: nvarchar (255)

Table "Categories":
Category_ID: int [primary key]
Category_name: nvarchar (15)
Description: ntext
Picture: image

Table "Customers":
Customer_ID: int [primary key]
Company_name: nvarchar (40)
Contact_name: nvarchar (30)
Contact_title: nvarchar (30)
Address: nvarchar (60)
City: nvarchar (15)
Region: nvarchar (15)
Postal_code: nvarchar (10)
Country: nvarchar (15)
Phone: nvarchar (24)
Fax: nvarchar (24)

Table "Shippers":
Shipper_ID: int [primary key]
Company_name: nvarchar (40)
Phone: nvarchar (24)

Table "Suppliers":
Supplier_ID: int [primary key]
Company_name: nvarchar (40)
Contact_name: nvarchar (30)
Contact_title: nvarchar (30)
Address: nvarchar (60)
City: nvarchar (15)
Region: nvarchar (15)
Postal_code: nvarchar (10)
Country: nvarchar (15)
Phone: nvarchar (24)
Fax: nvarchar (24)
Home_page: ntext

Table "Orders":
Order_ID: int [primary key]
Customer_ID: int (foreign key referencing Customer_ID in the Customers table)
Employee_ID: int (foreign key referencing Employee_ID in the Employees table)
Order_date: datetime
Required_date: datetime
Shipped_date: datetime
Ship_via: int (foreign key referencing Shipper_ID in the Shippers table)
Freight: money
Ship_name: nvarchar (40)
Ship_address: nvarchar (60)
Ship_city: nvarchar (15)
Ship_region: nvarchar (15)
Ship_postal_code: nvarchar (10)
Ship_country: nvarchar (15)

Table "Products":
Product_ID: int [primary key]
Product_name: nvarchar (40)
Supplier_ID: int (foreign key referencing Supplier_ID in the Suppliers table)
Category_ID: int (foreign key referencing Category_ID in the Categories table)
Quantity_per_unit: nvarchar (20)
Unit_price: money
Units_in_stock: smallint
Units_on_order: smallint
Reorder_level: smallint
Discontinued: bit

Table "Order Details":
Order_ID: int (foreign key referencing Order_ID in the Orders table)
Product_ID: int (foreign key referencing Product_ID in the Products table)
Unit_price: money
Quantity: smallint
Discount: real
Primary Key: (Order_ID, Product_ID)

Table "Territories":
Territory_ID: int [primary key]
Territory_description: nchar (50)
Region_ID: int (foreign key referencing Region_ID in the Region table)

Table "Region":
Region_ID: int [primary key]
Region_description: nchar (50)

Table "Customer_demo":
Customer_ID: int (foreign key referencing Customer_ID in the Customers table)
Customer_type_ID: int (foreign key referencing Customer_type_ID in the
Customer_demographics table)
Primary Key: (Customer_ID, Customer_type_ID)

Table "Customer_demographics":
Customer_type_ID: int [primary key]

Table "Employee_territories":
Employee_ID: int (foreign key referencing Employee_ID in the Employees table)
Territory_ID: int (foreign key referencing Territory_ID in the Territories table)
Primary Key: (Employee_ID, Territory_ID)

2. ETL Process
ETL Data Pipelines
We created ETL data pipelines with SQL Server Integration Services (SSIS) to automate data
extraction, transformation, and loading from the source database (OLTP) to the data
warehouse (OLAP).

The most significant step was loading the data into the fact table. A Merge Join
transformation combined the IDs from the source database's tables.

A Lookup transformation then obtained the surrogate key for each dimension, and any other
necessary changes were applied.
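
As a rough illustration of the fact-load step described above, the following Python sketch mimics what the Merge Join and Lookup transformations do, using plain dictionaries instead of SSIS components. The sample rows, the surrogate-key values, the function name `build_fact_rows`, and the revenue formula (unit price times quantity, less discount) are assumptions made for illustration; they are not taken from the project's SSIS packages.

```python
# Hypothetical sample rows standing in for the OLTP source tables.
orders = [
    {"Order_ID": 10248, "Customer_ID": 1, "Employee_ID": 5},
    {"Order_ID": 10249, "Customer_ID": 2, "Employee_ID": 6},
]
order_details = [
    {"Order_ID": 10248, "Product_ID": 11, "Unit_price": 14.0, "Quantity": 12, "Discount": 0.0},
    {"Order_ID": 10248, "Product_ID": 42, "Unit_price": 9.8, "Quantity": 10, "Discount": 0.0},
    {"Order_ID": 10249, "Product_ID": 14, "Unit_price": 18.6, "Quantity": 9, "Discount": 0.25},
]

# Hypothetical dimension lookups: business key -> surrogate key.
product_keys = {11: 1, 42: 2, 14: 3}
employee_keys = {5: 1, 6: 2}


def build_fact_rows(orders, order_details, product_keys, employee_keys):
    """Join order lines to their order header, then swap business keys for surrogate keys."""
    orders_by_id = {o["Order_ID"]: o for o in orders}  # plays the role of the Merge Join
    facts = []
    for line in order_details:
        header = orders_by_id[line["Order_ID"]]
        # Assumed revenue definition: price * quantity, reduced by the discount rate.
        revenue = line["Unit_price"] * line["Quantity"] * (1 - line["Discount"])
        facts.append({
            "Order_ID": line["Order_ID"],
            "Product_key": product_keys[line["Product_ID"]],       # Lookup step
            "Employee_key": employee_keys[header["Employee_ID"]],  # Lookup step
            "Quantity": line["Quantity"],
            "Revenue": round(revenue, 2),
        })
    return facts


fact_rows = build_fact_rows(orders, order_details, product_keys, employee_keys)
```

Each resulting row carries only surrogate keys and measures, which is the shape a fact table expects.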

III. Building data warehouse
Business Process
Our Data Warehouse project starts with the Revenue of Product Analysis process. This choice
is driven by the business question guiding our revenue strategy: "Which products should we
increase advertising for to optimize sales and increase profits?" The process sits at the
intersection of pricing policies and promotional strategies, which makes it a suitable
starting point. By concentrating on it, we aim to derive practical insights that contribute
directly to our overall goal of increasing sales and profits and achieving sustainable growth.

Analytics Approach:
● Identify Available Data:
Collect and identify relevant data sources, including sales figures, current advertising
strategies, customer data, and competitive pricing information.
● Business Analysis:
Apply statistical analysis methods to evaluate the performance of current advertising and
pricing policies. Analyze their impact on revenue and profit.
● Customer Segmentation and Profiling:
Segment customer groups into similar cohorts and create detailed profiles to understand the
specific needs and desires of each group.
● Forecasting and Modeling:
Utilize forecasting models to predict the outcomes of new strategies, as well as to assess
their impact on revenue and profit.
● A/B Testing and Optimization:
Conduct A/B testing to assess the performance of varied strategies. Optimize advertising
and pricing strategies based on test results and market feedback.
● Real-time Dashboards and Reporting:
Build real-time dashboards and reports to monitor the performance of strategies and update
business decisions.
● Machine Learning and Artificial Intelligence:
Employ machine learning and artificial intelligence to discover complex patterns in data and
suggest optimization strategies based on new information.
● Continuous Evaluation and Updating:
Implement continuous evaluation to ensure that strategies still reflect the evolving business
environment and update them over time.

Dimensional Modeling
We used denormalization techniques to improve query efficiency and reduce the need for
complex joins.

We developed and deployed a star schema for the data warehouse.
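
To make the star schema idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table names (`dim_product`, `dim_employee`, `fact_sales`), the column choices, and all sample rows are hypothetical and are not the project's actual warehouse design; the point is only to show one fact table surrounded by denormalized dimensions, queried with a single join per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,   -- surrogate key
    product_id    INTEGER,               -- business key from the source
    product_name  TEXT,
    category_name TEXT                   -- denormalized from Categories
);
CREATE TABLE dim_employee (
    employee_key  INTEGER PRIMARY KEY,   -- surrogate key
    employee_id   INTEGER,
    full_name     TEXT
);
CREATE TABLE fact_sales (
    order_id      INTEGER,
    product_key   INTEGER REFERENCES dim_product(product_key),
    employee_key  INTEGER REFERENCES dim_employee(employee_key),
    quantity      INTEGER,
    revenue       REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
    (1, 11, "Product A", "Category X"),
    (2, 42, "Product B", "Category Y"),
])
conn.executemany("INSERT INTO dim_employee VALUES (?, ?, ?)", [
    (1, 5, "Salesperson A"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (10248, 1, 1, 12, 168.0),
    (10248, 2, 1, 10, 98.0),
    (10250, 1, 1, 5, 70.0),
])

# Revenue per product: one join from the fact table to one dimension.
revenue_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY total_revenue DESC
""").fetchall()
```

Because the category name is stored directly on the product dimension, a per-category report would need no additional join, which is the trade-off denormalization makes.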

IV. Business analysis
1. Is There A Method To Identify The Top Performing Salespeople And Compensate Them
Accordingly?
- Null hypothesis: All salespeople generate the same revenue.

- Alternative hypothesis: Salespeople differ in the revenue they generate.

ALPHA = 0.05, and this is a TWO-SIDED t-test.

Next, we examine the Northwind sales team. We start by listing the salespeople's names,
numbers, and regions. The aim is to find the best performers and their characteristics.
These qualities can then be used to motivate all salespeople, ensuring that their objectives
and the company's objectives are aligned.

Below are 3 graphs that show the Total Revenue by Salesperson, along with their Number of Orders
and Average Revenue per Order.
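
The three quantities shown in these graphs can be derived from order-level data along the following lines. This is a sketch only: the row layout, the salesperson labels, the revenue figures, and the function name `summarize_by_salesperson` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical order-level rows: (salesperson, order_id, order_revenue).
order_rows = [
    ("Rep A", 1, 400.0),
    ("Rep A", 2, 600.0),
    ("Rep B", 3, 300.0),
    ("Rep B", 4, 900.0),
    ("Rep B", 5, 1500.0),
]


def summarize_by_salesperson(rows):
    """Total revenue, number of distinct orders, and average revenue per order."""
    totals = defaultdict(float)
    order_ids = defaultdict(set)
    for rep, order_id, revenue in rows:
        totals[rep] += revenue
        order_ids[rep].add(order_id)  # a set avoids double-counting an order
    return {
        rep: {
            "total_revenue": totals[rep],
            "num_orders": len(order_ids[rep]),
            "avg_revenue_per_order": totals[rep] / len(order_ids[rep]),
        }
        for rep in totals
    }


summary = summarize_by_salesperson(order_rows)
```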

The “Revenue by Salesperson” and “Number of Orders by Salesperson” histograms appear to
provide evidence to REJECT the null hypothesis. There appears to be a difference in each
salesperson’s revenue.

First, we calculate the total orders for each salesperson in the last 12 months. Then we run
an Ordinary Least Squares (OLS) model on salespeople with two predictors, Average Revenue per
Order and Number of Orders by Salesperson, to see which predictor has the most influence on
Revenue. Finally, we dig deeper into that predictor to see whether there is a difference in
sales rep performance.
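
For readers who want to see what fitting such a model involves, the sketch below estimates OLS coefficients by solving the normal equations in pure Python. It is an assumption-laden illustration: the function name `fit_ols` and the per-salesperson figures (in thousands) are invented, and the sketch does not compute the p-values that a statistics package would report.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented system (X'X | X'y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - tail) / A[i][i]
    return beta


# Hypothetical data: (average revenue per order in thousands, number of orders).
predictors = [(1.2, 40), (1.5, 70), (1.1, 95), (1.8, 60), (1.4, 120), (1.6, 85)]
revenue = [avg * n for avg, n in predictors]  # total revenue in thousands

beta = fit_ols(predictors, revenue)
fitted = [beta[0] + beta[1] * a + beta[2] * n for a, n in predictors]
residuals = [obs - fit for obs, fit in zip(revenue, fitted)]
```

Comparing the raw coefficients directly can be misleading because the predictors are on different scales, so any judgment about which predictor matters most should rest on standardized coefficients or on the test statistics from a full regression output.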

It appears that Avg_Revenue_per_Order (p-value < .05) and TOT_Num_ORDERS_by_Rep (p-value <
.05) are both influential in PREDICTING THE REVENUE of each salesperson, with orders by rep
being the most influential. However, this DOES NOT prove or disprove our null hypothesis,
which concerns whether salespeople differ in revenue performance.

We run an ANOVA test on Avg_Revenue_per_Order to see whether there is a statistically
significant difference in salesperson performance. This answers the question of whether the
average revenue per order varies between salespeople by assessing the degree of variation
between multiple samples, where each sample is a different salesperson.
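
The core of a one-way ANOVA is the ratio of between-group to within-group variance. The following pure-Python sketch computes that F statistic; the function name `one_way_anova_f` and the order revenues are hypothetical. Turning F into a p-value requires the F distribution, which the standard library does not provide, so that step is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical revenue per order, one list per salesperson.
revenue_per_order_by_rep = [
    [420.0, 515.0, 610.0, 480.0],
    [390.0, 560.0, 505.0],
    [450.0, 530.0, 495.0, 605.0, 470.0],
]
f_stat, df_between, df_within = one_way_anova_f(revenue_per_order_by_rep)
```

A small F (relative to the critical value for the given degrees of freedom) means the differences between salespeople are no larger than the ordinary order-to-order variation.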

*CONCLUSION: The p-value is greater than alpha (.05), so we FAIL TO REJECT the null
hypothesis. We therefore find no statistically significant difference between salespeople
in terms of Average Revenue per Order.

2. Do we grow Northwind’s business by focusing efforts on underperforming regions?
Or do we expand into other regions?

- Null hypothesis: All geographical areas (Region and/or Country) are equal.

- Alternative hypothesis: Areas are different, and there is an opportunity to grow in
existing territories and/or expand into others.

- TWO-TAILED TEST & ALPHA = .05

First, we map our current customers and orders, of which there are 88 and 540,
respectively. We also revisit the Geographic Overview that was developed in the EDA.

In our EDA, we discovered that the best method to differentiate geographic areas was
“Average Revenue per Order by Country.” Here is the Average Revenue per Order by Country
in the Last 12 Months (5/7/13 to 5/6/14).

The p-value is < .05 for Austria, so we REJECT the NULL HYPOTHESIS that the Average
Customer Revenues are the same by Country. Here is the code for the Tukey test and its
output. We used it because its output shows which specific countries are different.
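
For reference, the pairwise statistic behind a Tukey comparison with unequal group sizes (the Tukey-Kramer form) is commonly written as below. This is the textbook formula, stated here as background rather than as output from the analysis; the symbols are assumptions of this note: $\bar{x}_i$ and $n_i$ are the mean revenue per order and the number of orders for country $i$, $k$ is the number of countries, $N$ is the total number of orders, and $MS_{\text{within}}$ is the within-group mean square from the ANOVA.

```latex
q_{ij} = \frac{\lvert \bar{x}_i - \bar{x}_j \rvert}
              {\sqrt{\dfrac{MS_{\text{within}}}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}},
\qquad
\text{countries } i \text{ and } j \text{ differ at level } \alpha
\text{ if } q_{ij} > q_{\alpha;\,k,\,N-k}
```

Here $q_{\alpha;\,k,\,N-k}$ is the critical value of the studentized range distribution, so every pair of countries is judged against the same threshold, which keeps the overall error rate at alpha across all comparisons.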

- The Tukey pairwise test shows, given alpha = .05, which pairs of countries have a
statistically significant difference in “Average Revenue per Order”. The countries listed
below are statistically different from 2 or more other countries.

● Austria is different from: Argentina, Brazil, Finland, France, Italy, Mexico, Spain,
UK, Venezuela
● Ireland is different from: France, Italy

- CONCLUSION: We reject the null hypothesis that all countries are the same in terms of
“revenue per order”.

● Austria and Ireland are statistically different from 2 or more other countries, to the
right (greater) of the mean.
● France and Italy are statistically different from 2 or more other countries, to the left
(less) of the mean.

V. Conclusion
In summary, establishing a robust Data Warehouse equips organizations to predict future
sales by analyzing diverse data sources. Data-driven predictions work together with human
intuition, with insights guiding strategic decisions. The result is not merely a
consolidation of data but a strategic roadmap that provides actionable intelligence for
sustainable development in a dynamic business environment.

VI. References
https://www.researchgate.net/publication/288829109_Using_a_Data_Warehouse_to_improve_analyzing_Tourism_Data

https://ojs.unud.ac.id/index.php/ijeet/article/download/53621/31807

https://nathanwyand.com/2019/03/27/digging-for-business-insights-in-the-northwind-database/

https://docs.yugabyte.com/preview/sample-data/northwind/

https://www.geeksengine.com/database/sample/what-is-northwind-database.php
