
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/349098830

Designing a Data Warehouse System for Sales and Distribution Company

Article  in  Big Data Mining and Analytics · February 2021


All content following this page was uploaded by Ragulan Balasingham on 07 February 2021.



Designing a Data Warehouse System for Sales and Distribution Company

Balasingham Ragulan
Sri Lanka Institute of Information Technology
Colombo, Sri Lanka
ragulans08@gmail.com

Romal Subash
Sri Lanka Institute of Information Technology
Colombo, Sri Lanka
romal.subash@gmail.com

Abstract—Organizations need to move permanently from manual to digital data operations in order to survive in the industry, achieve business goals, and reuse historical data in the future. The sustainability of a data warehouse system correlates strongly with user satisfaction. To support DBMS, OLAP, and OLTP designs, all tables in the data warehouse system should be related to one another in an effective way. This study presents a data warehouse model for a sales and distribution company. The design covers transforming an operational database into an informational data warehouse that supports decision making through data analysis, prediction, and forecasting. The proposed model covers data migration from existing sources, the ETL process, and data indexing and loading. The data warehouse design in this study was implemented with software tools.

Keywords—OLAP, OLTP, DBMS, DATA WAREHOUSE, ETL

I. INTRODUCTION
In a rapidly changing world, everything relies on digital products and services. People adapt to rapidly changing technologies and convert their manual lifestyles into digital ones to enhance the productivity of their business and their interpersonal skills. Large enterprises in the industrial world now rely on database systems to manage, monitor, make decisions, and achieve organizational goals within a short period. The database management system supports both the daily business transactions and the analytical purposes of an organization. However, to survive tight market competition in the industrial world, it is necessary to derive effective business strategies and implement them by analyzing data [1].

Data Warehouse Systems (DWS) are highly required by top-level authorities in an organization to analyze the current financial position and the path towards organizational goals, based on a large volume of data integrated from heterogeneous data sources. The multi-dimensional structure of the information makes exploration and analysis easier for end-users. The data warehouse system is a combination of the following layers:
• Heterogeneous Data Sources (DS),
• ETL (Extraction/Transformation/Loading), the process that extracts data from the available data sources, transforms it, and loads it into the data warehouse,
• Data Warehouse repository,
• Restoration tools that enable the analysis of the data in OLAP.

The reasons to design a data warehouse system are:
• To evaluate complex queries without causing severe impact on the sources;
• To increase data availability and decrease response time for OLAP queries;
• To analyze state-oriented data and provide the right historical trend;
• To protect and secure crucial business information;
• To back up changes and ensure evolution and maintenance.

The above-mentioned reasons include the assurance of non-functional requirements such as reliability, security, confidentiality, maintenance, integrity, and availability, which are attributes of dependability [1] [2] [3].

This study is organized as follows. Section II presents a brief description of a few key terms used in this study. Section III states the objectives of the study. Section IV presents related work on the development of data warehouse systems and the role of the data warehouse system in an organization. Section V discusses collecting and filtering the user requirements, which involves the designer and the final users. Section VI explains the design methodology of the proposed data warehouse design based on user requirements. Section VII discusses the architecture of the data warehouse design and data cleansing and transforming. Section VIII presents the implementation, covering indexing and loading of the proposed data warehouse design, and a brief description of the software tools.

II. DEFINITIONS

A. Business Intelligence
Business Intelligence (BI) can be described as a set of information systems that enable decision making, built on data storage, analysis, and data mining technologies. The main objective of a BI tool is to establish a framework that transforms data recorded inside the organization into the information required for the decision-making process [1] [3] [5].

B. Data Warehouse System
The data warehouse system incorporates information that can change over time from different distributed and autonomous data sources. A data warehouse must also adjust to any changes that may occur in the underlying data sources [1] [3] [5] [9] [10]. In general, a data warehouse system is a collection of data with the following features:
• Subject-oriented: All data items are related to the same business objects.
• Time-variant: Business records are tracked and recorded over time so that reports can be run for a given period.
• Nonvolatile: Data is read-only and is not updated or deleted.
• Integrated: Data from multiple source applications are gathered and made consistent.
C. Data Mart
The data warehouse is often organized into smaller units, usually called "data marts", to standardize data analysis and simplify application access. In typical warehouses, data marts can extract their content directly from the operational databases. In complex situations, the configuration can be multi-level, and data mart content can be loaded from intermediate repositories [1] [3] [8].

D. Fact Table & Dimension Table
A fact table consists of the foreign keys to all the dimension tables in the schema together with numerical business measures (facts). It is a transaction-based table. The star schema is a basic dimensional modeling technique. It has one large central table, called the fact table, and a number of smaller tables, called dimension tables, which are related to the fact table by surrogate keys.

In a star schema design, the fact table is located in the middle and the dimension tables are located at the points of the star. A star schema may contain several dimension tables. The fact table holds all the measurements of an OLAP design, such as the price of products sold and the number of products sold, which can be aggregated in several ways along the dimension tables. Every dimension table in an OLAP design has a one-to-many relationship with the fact table.

The primary key of the fact table is the combination of the primary keys of all dimension tables in the design. The dimension tables provide the basis for aggregating the measurements in the fact table. The following figure shows the star schema.

Figure 01: Star schema

In this study, the order fact table will be analyzed by customer, region, date, and product. Dimension tables contain hierarchies and are heavily denormalized. A star schema can be implemented in OLAP tools or in traditional relational database management systems. Although a star schema has many advantages, it also has a few limitations [5] [13]:
1. The results of a user requirement analysis are unpredictable in practice and may change from time to time, so a star schema can give an unstable design.
2. If the database management system developer does not have a clear picture of the requirements, the result can be an incorrect data warehouse design.
3. It can lead to loss of information through pre-aggregation, which limits how the data can be analyzed.

E. ETL
Extraction-Transformation-Loading (ETL) tools are a class of specialized tools that deal with data warehouse homogeneity, cleaning, and loading issues. The design of an ETL process aims to produce one key deliverable: a mapping from the attributes of the data sources to the attributes of the data warehouse tables [1] [3] [4] [5].

F. OLAP
On-Line Analytical Processing (OLAP) enables users to analyze large volumes of data quickly and efficiently. All data are stored in multi-dimensional tables that closely resemble real business data. It gives users easy and quick access to abstracted data [1] [4] [7] [9].

G. OLTP
Online transaction processing (OLTP) is the backbone of current business, banking, and Internet applications. For some OLTP applications, especially web-based business applications, customers require quick access times. Unfortunately, serving requests that involve database activity for dynamic query processing and data generation can be very slow, orders of magnitude slower than delivering static content [1] [3] [12].

III. THE OBJECTIVE OF THIS STUDY
This study focuses mainly on designing an effective data warehouse system for a sales and distribution company. To achieve this objective, the study includes the following three tasks:
1. Cleanse the existing sales data available in CSV files.
2. Design the sales schema and load the existing data into it.
3. Provide a sales dashboard to strategic-level management for decision making.

IV. LITERATURE ANALYSIS
The data warehouse design of an organization helps to keep the organization on the right track in the industrial world by merging vast historical data from multiple forms of existing databases and amalgamating them under a single schema to guide decision-makers. OLAP is the storage of multidimensional, usually hierarchical, information that provides answers to queries in near-constant time. OLAP methods may be used to measure achievements and to conduct descriptive analysis. The techniques and algorithms of a Decision Support System (DSS) can be used to forecast the potential areas [12].

According to the study in [17], it is very important to find the right grain for different levels of analysis, which is one of the key factors in designing a multi-dimensional database.

A powerful data warehouse system aims to integrate various data sources that contain enormous amounts of data while enhancing collaboration and information sharing among individuals in an organization. The framework helps the organization to facilitate the exchange of knowledge between information sources and various agencies [18].

V. REQUIREMENT SPECIFICATION
This phase focuses on gathering and filtering the user requirements. It includes both the designers' and the users' points of view. It generates specifications related to the selection of facts on the one hand and preliminary indications of the workload on the other.

Facts are the initial concepts that impact the decision-making process. Typically, they correspond to events that occur dynamically in the enterprise world. A few functional requirements of the proposed data warehouse design are the following:
• Make fast and successful decisions under competitive pressure.
• Observe the competitive environment.
• Evaluate feasibility and collect data from the points of view of different sides of the system.
• Collect and analyze data from existing datasheets using different techniques, data mining, and visualization.

The design methodology of the proposed data warehouse design is based on the existing sources and the user requirements. It is summarized in figure 02. A set of OLAP queries was used to analyze the gathered user requirements and to design the dimensional model.

Figure 02: Conceptual Scheme of Operational Database

A pseudo-natural language is used to express the preliminary workload. It enables the designers to identify dimensions and measures while developing the conceptual design. It is necessary to specify the most relevant measures and aggregations for each fact.

VI. OPERATIONAL DATABASE DESIGN
The operational database from which the data warehouse is later extracted contains 04 different relations (tables) linked together by relationships. It is a relational database implemented under MS SQL Server, and it reflects the business of a traditional sales and distribution company. It tracks the company's results, making it easier to follow sales opportunities, closed deals, various other sales KPIs, and sales management at an operational or strategic level. All sales activities can be navigated when it is most important to raise revenue and income and to forecast and compare results quickly and accurately. It also allows a full 360° overview of revenue statistics, a real-time overview of priorities, progress visualization, and information sharing with customers or stakeholders [15] [16].

Dim_Customer: sk_Customer, Region, Country, CustomerID, Customer Name
Dim_Date: sk_OrderDate, Date, Month, Year, Financial Year, Financial Month
Dim_Product: Sk_Product, ProductID, Product Type, Product Name, Unit Price, Unit Cost
Fact_Order: sk_Order, sk_Product, sk_OrderDate, sk_Customer, Sk_shippedDate, OrderID, Sales Channel, Unit Sold, Discount, Total Revenue, Total Cost, Net value

Figure 03: Conceptual Scheme of Operational Database

Figure 03 above depicts the conceptual schema of the operational database of the sales and distribution company. The main reason for having a separate time dimension table is to support the fact table. The second reason is to show measure values on a yearly, monthly, and daily basis in visuals. This study discusses three hierarchies for the proposed data warehouse design: location-wise sales, product-wise sales, and time-wise sales. Location-wise sales drill down from region to country. Product-wise sales drill down from item type to item. Time-wise sales drill down from yearly sales to monthly sales and then to the daily level. Four dimension tables and a fact table are represented in the above figure. All dimension tables are connected to the fact table by the relevant surrogate keys.
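The star schema and its location hierarchy can be illustrated with a small sketch. The example below assumes an in-memory SQLite database in place of MS SQL Server, uses only a reduced set of the columns shown in Figure 03, and fills the tables with made-up rows; the column types, the sample values, and the helper name `net_value_by` are assumptions for illustration and are not taken from the authors' implementation.

```python
import sqlite3

# In-memory SQLite stands in for MS SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Customer (
    sk_Customer INTEGER PRIMARY KEY,   -- surrogate key
    Region TEXT, Country TEXT, CustomerID TEXT, CustomerName TEXT
);
CREATE TABLE Dim_Product (
    sk_Product INTEGER PRIMARY KEY,    -- surrogate key
    ProductID TEXT, ProductType TEXT, ProductName TEXT
);
CREATE TABLE Dim_Date (
    sk_OrderDate INTEGER PRIMARY KEY,  -- assumed yyyymmdd key
    Date TEXT, Month INTEGER, Year INTEGER
);
CREATE TABLE Fact_Order (
    sk_Order INTEGER PRIMARY KEY,
    sk_Customer INTEGER REFERENCES Dim_Customer(sk_Customer),
    sk_Product INTEGER REFERENCES Dim_Product(sk_Product),
    sk_OrderDate INTEGER REFERENCES Dim_Date(sk_OrderDate),
    UnitSold INTEGER, NetValue REAL
);
""")

# Hypothetical sample rows, one dimension at a time, then the facts.
conn.executemany("INSERT INTO Dim_Customer VALUES (?,?,?,?,?)", [
    (1, "Asia", "Sri Lanka", "C001", "Customer A"),
    (2, "Asia", "India", "C002", "Customer B"),
    (3, "Europe", "France", "C003", "Customer C"),
])
conn.executemany("INSERT INTO Dim_Product VALUES (?,?,?,?)", [
    (1, "P001", "Vegetables", "Carrot"),
    (2, "P002", "Fruits", "Mango"),
])
conn.executemany("INSERT INTO Dim_Date VALUES (?,?,?,?)", [
    (20200115, "2020-01-15", 1, 2020),
    (20200220, "2020-02-20", 2, 2020),
])
conn.executemany("INSERT INTO Fact_Order VALUES (?,?,?,?,?,?)", [
    (1, 1, 1, 20200115, 10, 100.0),
    (2, 2, 2, 20200115, 5, 250.0),
    (3, 3, 1, 20200220, 8, 80.0),
    (4, 1, 2, 20200220, 2, 100.0),
])


def net_value_by(level):
    """Sum NetValue at one level of the location hierarchy."""
    if level not in ("Region", "Country"):
        raise ValueError("unknown hierarchy level")
    sql = (
        f"SELECT c.{level}, SUM(f.NetValue) FROM Fact_Order f "
        f"JOIN Dim_Customer c ON c.sk_Customer = f.sk_Customer "
        f"GROUP BY c.{level} ORDER BY c.{level}"
    )
    return conn.execute(sql).fetchall()


by_region = net_value_by("Region")    # top of the hierarchy
by_country = net_value_by("Country")  # one level of drill-down
```

The same join-and-group pattern applies to the product hierarchy (item type, then item) and the time hierarchy (year, month, day) by joining the corresponding dimension table instead of `Dim_Customer`.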
VII. DATA WAREHOUSE DESIGN - BUILDING THE DATABASE

A. Architecture
The data warehouse's need for sales information originating from the OLTP system, which processes orders, product data, customer data, and sales data, is met as follows. The data is exported as an Excel-compatible *.csv file. For a long time, the company used Excel sheets to run its everyday operations [1] [6].

The Excel file is then loaded from the OLTP system into the staging table by the ETL process. To make it easier to transfer the data from staging to the data warehouse database, the staging database structure was built to match the structure of the dimension tables in the data warehouse database. The data in the staging database is moved into the dimension tables of the data warehouse database. The dimension tables' data are then copied into the fact tables of the data warehouse database.

B. Data Cleansing and Transforming
Four significant steps are required to create an efficient data warehouse system: capture and extract, scrub and cleanse, transform, and load and index. Figure 04 shows the stages required to convert the operational database into a data warehouse [6] [8] [11] [13].

[Diagram elements: source system; data staging area, with processing steps extract, clean, reconcile, match, standardize, transform, and export to DW; data and metadata storage area holding the data warehouse; end-user presentation tools, namely ad hoc query tools, report writers, end-user applications, modeling and mining tools, and visualization tools. The flows between the areas are labelled Load and Feed.]

Figure 04: Architecture for Building the Data Warehouse

With the existing Excel sheets as the source of data, the data is first extracted and then temporarily stored in a buffer area. The data is pre-processed once it has been captured. This involves data cleaning and reconciling, correcting errors in data entry, and converting the information into a more normalized standard. Once cleaned, the transformed data is loaded and indexed into the data warehouse. In this process, tables are dropped, new tables are created, columns are discarded, and new columns are added. The data warehouse development phase is shown in Figure 05.

[Diagram elements: CSV file; capture/extract, loaded one-to-one; staging environment; cleanse and transfer of data; load/index; data warehouse environment.]

Figure 05: Process of Building the Data Warehouse

1) Truncate Tables
TRUNCATE TABLE dbo.Staging_Sales;

2) Initial Loading Dimension Tables
Data is populated into the dimension tables such as customer, product data, product, and region.

3) Lookup
Source data is compared with the dimension data and separated into matched and unmatched rows. Unmatched rows are inserted as new data.

4) Derived Column
New columns are created, or the formulas of existing columns are changed, in the time dimension table.

5) Load Dimension and Fact Tables
Only incremental data is uploaded on each scheduled run of the ETL process.

VIII. IMPLEMENTATION: LOADING & INDEXING
The proposed data warehouse design was developed using various database management software tools. The logical diagram of the proposed data warehouse system, exported from Microsoft SQL Server, is shown in figure no. 06. All data from the existing data sources was successfully cleaned, transformed, and exported into the buffer zone, which is the storage location for the proposed data warehouse. Finally, data mining techniques can be applied to reveal patterns and business trends, which the top-level authority of the company needs in order to carry out the decision-making process with ease [6] [7] [12] [14].

Dim_Date: sk_OrderDate, Date, Day, DayOfYear, DaySuffix, DOWinMonth, isHoliday, isWeekend, MMYYYY, Month, MonthName, MonthName_FirstLetter, MonthName_Short, MonthYear, Quarter, QuarterName, Weekday, WeekDayName, WeekDayName_FirstLetter, WeekDayName_Short, WeekOfMonth, WeekOfYear, Year
Dim_Product: sk_Product, ProductGroup, UnitCost, ProductID, ProductName, ProductType, UnitPrice
Dim_Customer: sk_Customer, Region, Country, CustomerID, CustomerName
Fact_Order: sk_Customer, sk_OrderDate, sk_Order, sk_Product, sk_ShippedDate, Discount, NetValue, OrderID, SalesChannel, TotalCost, TotalRevenue, UnitSold

Figure 06: Data Warehouse Logical Diagram
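Most of the Dim_Date attributes in the logical diagram can be derived from the calendar date alone, which is what the derived-column step does for the time table. The following is a minimal sketch of such a derivation for a subset of the attributes. The function name `build_date_row`, the yyyymmdd form of the surrogate key, and the English day and month names are assumptions; the paper does not give the authors' script. The isHoliday attribute is left out because it needs a holiday calendar that the paper does not specify.

```python
from datetime import date

# Fixed English names avoid any dependence on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def build_date_row(d):
    """Derive a subset of the Dim_Date attributes for one calendar date."""
    day = d.day
    # English ordinal suffix; 11th to 13th are the irregular cases.
    if 11 <= day <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    weekday_name = WEEKDAYS[d.weekday()]   # weekday(): Monday is 0
    month_name = MONTHS[d.month - 1]
    return {
        "sk_OrderDate": d.year * 10000 + d.month * 100 + day,  # assumed yyyymmdd key
        "Date": d.isoformat(),
        "Day": day,
        "DaySuffix": f"{day}{suffix}",
        "DayOfYear": d.timetuple().tm_yday,
        "WeekDayName": weekday_name,
        "WeekDayName_Short": weekday_name[:3],
        "isWeekend": d.weekday() >= 5,      # Saturday or Sunday
        "Month": d.month,
        "MonthName": month_name,
        "MonthName_Short": month_name[:3],
        "MMYYYY": f"{d.month:02d}{d.year}",
        "Quarter": (d.month - 1) // 3 + 1,
        "Year": d.year,
    }


row = build_date_row(date(2021, 2, 7))
```

Calling the function for every date in the required range and inserting the resulting rows would pre-populate the dimension once, so later loads only need to look up the surrogate key.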


A. Software Tools

1) Power BI
Power BI is used to present reports and dashboards to the end-users. It is a cloud-based portal, and access can be granted remotely with the relevant credentials.

2) MS Integration Services (SSIS)
It is used to design all tasks of the data loading process.

3) MS SQL Server
This is the database server that hosts all tables required for the ETL process. It allows us to reuse these tables for query optimization when creating visuals.

IX. DASHBOARD DIAGRAMS
A few diagrams exported from Power BI present the sales data from several aspects.

Figure no 07: Overall picture of sales

Figure no. 07 above enables the top-level authority of the company to see the business activities for a particular period based on regional sales. For further detail, the figure shows the net value of sales, the quantity of units sold, the total revenue for a particular period and year, and the discounts.

Figure no 08: Yearly sales drill down diagram

Figure no. 08 above is an annual sales drill-down stacked line diagram. It shows the sales data (in millions) on the Y-axis and the year on the X-axis. If necessary, the user can see a summary of units sold by selecting a particular point on the stacked line diagram.

Figure no 09: Product type wise sales

The product type-wise sales (in millions) are presented as colored pie charts. The user can see further sales details for a particular product by selecting a product type on the pie chart. For example, all vegetables sold are listed under the vegetable category.

Figure no 10: Country wise sales

The top-level authority of the company can see a broader view of worldwide sales. Based on the above diagram, it can focus on the high-demand regions of the world.

X. SALES AND DISTRIBUTION DATA LOAD PROCESS

Figure no 11: Data load process

The first stage of data warehousing is to truncate the existing data in the staging table. The staging tables reside in one database instance (DB_Staging) and the production tables in a different one (DA). The next step is to load the sales data CSV file into the staging table. This is a one-to-one load that keeps all fields as string data types. The data type conversions are done in the production data load. First, we load the product dimension table. All distinct product types, product groups, and product names are taken from the staging table and looked up against the current Product dimension table. If no matching data is found, the row is inserted as new product data. If matching data is found, the existing data is updated in the product dimension table.
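The lookup for the product dimension (insert when there is no match, update when there is one) can be sketched in a few lines. This is a minimal in-memory illustration and not the SSIS package itself: the function name `load_product_dimension`, the choice of the product name as the lookup key, and the sample rows are all assumptions made for the example.

```python
def load_product_dimension(staging_rows, dim_rows):
    """Look up staged products in the dimension; insert new, update matched.

    dim_rows maps an assumed business key (the product name) to a dict with
    the surrogate key and descriptive attributes. Returns (inserted, updated).
    """
    inserted = updated = 0
    next_sk = max((r["sk_Product"] for r in dim_rows.values()), default=0) + 1

    # DISTINCT over the three staged columns; staging keeps every field as a string.
    distinct = {
        (r["ProductType"].strip(), r["ProductGroup"].strip(), r["ProductName"].strip())
        for r in staging_rows
    }

    for product_type, group, name in sorted(distinct):
        match = dim_rows.get(name)
        if match is None:
            # No match found: insert as new product data with a new surrogate key.
            dim_rows[name] = {
                "sk_Product": next_sk,
                "ProductType": product_type,
                "ProductGroup": group,
                "ProductName": name,
            }
            next_sk += 1
            inserted += 1
        else:
            # Match found: update the existing row, keeping its surrogate key.
            match.update(ProductType=product_type, ProductGroup=group)
            updated += 1
    return inserted, updated


# Hypothetical staged rows, including one duplicate.
staging = [
    {"ProductType": "Vegetables", "ProductGroup": "Fresh", "ProductName": "Carrot"},
    {"ProductType": "Vegetables", "ProductGroup": "Fresh", "ProductName": "Carrot"},
    {"ProductType": "Fruits", "ProductGroup": "Fresh", "ProductName": "Mango"},
]
# Hypothetical current state of the dimension.
dimension = {
    "Carrot": {"sk_Product": 1, "ProductType": "Vegetable",
               "ProductGroup": "Fresh", "ProductName": "Carrot"},
}
result = load_product_dimension(staging, dimension)
```

Keeping the surrogate key unchanged on update matters because rows already loaded into the fact table refer to it.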

Then we load the Customer dimension table. All distinct customer regions, customer countries, customer IDs, and customer names are taken from the staging table and looked up against the current Customer dimension table. If no matching data is found, the row is inserted as new customer data. If matching data is found, the existing data is updated in the Customer dimension table. The Date dimension table is already populated by a script for a longer period. The last stage of the data load is loading the Sales fact table. Error handling is done by redirecting error records into the 'Error Record' table. This table is populated at each lookup stage: if a dimension record is not available in the relevant dimension table, the row is redirected to the error record table instead of failing the entire package.

XI. CONCLUSION
This paper introduced a model for creating a data warehouse for the information system of a traditional sales and distribution organization. The warehouse is an informational database that extracts its data in *.csv form from an Excel spreadsheet. Data warehousing technology helps to collect large volumes of historical data from several kinds of databases and unify them under a single schema for use by decision-makers through Online Analytical Processing (OLAP). The Extract, Transform, and Load (ETL) system is the core of the data warehouse. Each type of database needs a different construction of the ETL system according to its data types. OLAP is the storage of multidimensional, generally hierarchical, data providing near constant-time answers to queries. OLAP techniques can be used to measure organizational achievements and to perform descriptive analysis. The goal of the proposed design is to assist decision-makers and to carry out data mining and data analysis on the data stored in the warehouse, which ultimately enables them to discover important trends and patterns.

REFERENCES
[1] Y. Bassil, "A Data Warehouse Design for A Typical University Information System," vol. 1, p. 6, 2012.
[2] M. Golfarelli and S. Rizzi, "A Methodological Framework for Data Warehouse Design," Proceedings ACM First International Workshop on Data Warehousing and OLAP, 1998.
[3] C. Gallo, M. D. B. e and M. Perilli, "Data Warehouse Design and Management - Theory and Practice," 2010.
[4] S. L. a, C. Hana, S. Wanga and Q. Luoa, "Data Warehouse Design For Earth Observation Satellites," no. 1877-7058, 2012.
[5] E. M. Leonard, "Design and Implementation of an Enterprise Data Warehouse," e-Publications at Marquette, Marquette University, 2011.
[6] T. A. Oketunji and R. O. Omodara, "Design of Data Warehouse and Business Intelligence System," 2011.
[7] J. D. Chelico, A. B. Wilcox, D. K. Vawdrey and G. J. Kuperman, "Designing a Clinical Data Warehouse Architecture to Support Quality Improvement Initiatives," 2017.
[8] D. L. Moody and M. A. Kortink, "From Enterprise Models to Dimensional Models - A Methodology for Data Warehouse and Data Mart Design," vol. 5, 2000.
[9] P. Giorgini, S. Rizzi and M. Garzetti, "Goal-Oriented Requirement Analysis for Data Warehouse Design," 2005.
[10] D. Mankad and P. Dholakia, "The Study on Data Warehouse Design and Usage," vol. 3, no. 2250-3153, 2013.
[11] J. George, B. V. Kumar and V. S. Kumar, "Data Warehouse Design Considerations for a Healthcare Business Intelligence System," vol. 1, 2015.
[12] Z. A. S. Abdullah and T. A. S. Obaid, "Design and Implementation of Educational Data Warehouse using OLAP," vol. 5, no. 5, 2016.
[13] D. Nugroho, S. Siswanti, T. I. and Kustanto, "Design of Data Warehouse System to Support the Quality Management of IT based School," vol. 10, no. 6, 2013.
[14] I. Hilal, N. Afifi, M. Ouzzif and R. F. Hilali, "Toward a New Approach for Modeling Dependability of a Data Warehouse System," vol. 11, no. 6, 2013.
[15] I.-Y. Song and K. LeVan-Shultz, "Data Warehouse Design for E-Commerce Environments," vol. 1727, 1999.
[16] Z. A. Abdulla and T. A. S. Obaid, "Design and Implementation of Educational Data Warehouse Using OLAP," vol. 5, 2016.
[17] B. Parmanto, M. Scotch and S. Ahmad, "A Framework for Designing a Healthcare Outcome Data Warehouse," 2005.
[18] J. C. Rivera-Vázquez, L. V. Ortiz-Fournier and M. Ramaswamy, "Designing data warehouses to support criminal investigation," vol. 12, pp. 445-454, 2011.
