Rapport PFE Balghouthi Hazem
Acknowledgement
I dedicate all this success to the people who have been there for me since day one.
Even the most elegant of language and expressions cannot adequately convey my
unfathomable love for you or my sincere gratitude for all of your efforts. You’ve given
me a feeling of accountability, optimism, and self-assurance in the face of challenges in life.
Your advice has always guided my steps towards success. Your endless patience, your
understanding and your encouragement are the essential support that you have always
known how to give me. I owe you what I am today and what I will be tomorrow. I
promise you that I will always do my best to remain your pride and never disappoint
you. May God, the Almighty, preserve you, grant you health and happiness, and protect
you from all harm.
Thank you for always being by my side; your presence, your devoted love, your
tenderness, your precious advice and your constant support always lead me to success
and happiness.
To my Friends
Thank you for your advice and for all the good times we had together. I hope our
friendship will last forever.
Thanks
At the end of this work, all the people who helped build this project, whether directly
or indirectly, deserve my sincere gratitude.
To start with, I cannot express how much I thank my parents, who have been with me
since day one and who deserve much more.
I would also like to thank my academic supervisor, Mrs. Wided MATHLOUTHI, for
walking me through the steps and evaluations of this project.
I would like to express my gratitude to the team of SBS, to the people who left the
team and to those who are still with us. Thank you for your unconditional support,
your relevant comments and your advice; thank you for bringing motivation to the work
environment and contributing to the smooth running of my internship.
Thank you also to the professors of the Private School of Engineering and Technology
(ESPRIT), who provided me with the tools necessary for the success of my university
studies.
I will not miss the opportunity to warmly thank the president and the members of the
jury for having granted me the honor of judging my work.
Contents
Acknowledgement iii
Thanks iv
General Introduction 1
1 Project Presentation 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Host Company . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Smart Business Solution . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Areas of activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 Client Presentation COGEPHA . . . . . . . . . . . . . . . . . . . 4
1.3 Project Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Problematic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Proposed Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 State of the art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4.2 Business Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.3 Data Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.4 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 Functional Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5.1 Identification of actors . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5.2 Technical Environment . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 Multidimensional Modeling 24
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Identifying dimensions and fact tables . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 Indicators and measures . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.2 Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.3 Fact Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Multidimensional Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Data warehouse models . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.2 Choice of model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3.3 Data Marts Conception . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5 Predictive analysis 48
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2 Data Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3 Key Influencers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.4 Decomposition Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.5 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
General Conclusion 53
Bibliography and webography 55
List of Figures
4.21 Shipments Dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.22 Web Version over the Power Bi Service . . . . . . . . . . . . . . . . . . . . 47
4.23 Mobile Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
List of Tables
3.1 Dimensions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Fact Tables for the Stock . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Fact Tables for the Purchases . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Fact Tables for the Shipment . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Comparison of star model and snowflake model. . . . . . . . . . . . . . . . 29
Table of abbreviations and acronyms
IT Information Technology
BI Business Intelligence
DWH/DW Data WareHouse
ERP Enterprise Resource Planning
DM Data Mart
AI Artificial Intelligence
STG Staging Area
ETL Extract, Transform, Load
SSIS SQL Server Integration Services
OLAP Online Analytical Processing
SSAS SQL Server Analysis Services
DB Database
MSBI Microsoft Business Intelligence
SSMS SQL Server Management Studio
IDE Integrated Development Environment
MDX Multidimensional Expressions
DAX Data Analysis Expressions
CRM Customer relationship management
SME Small and medium-sized enterprises
KPI Key Performance Indicator
General Introduction
Competition between companies that provide services and products is intensifying and
they are looking for business opportunities to ensure their ability to grow and engage with
their customers to capture market share. Over time, the world has come across a huge
amount of data that keeps growing every day and conveys a lot of relevant information
that can influence business strategy and economics. In the wake of this phenomenon,
business intelligence has emerged to extract, clean, restitute and analyze data so that
managers can fully comprehend their real situation, make the most of their assets, and
stay informed in order to make the best judgments possible.
BI has changed considerably since its introduction, both in the methods used and in
the solutions adopted. It is important to note that BI encompasses the entire decision
support stack: data management infrastructure (data warehouse, ETL), reporting tools,
business analytics, data visualization, and so on. The purpose of this set of tools is to
monitor the company's activity, stimulate innovation, anticipate and adapt to future
market developments, and increase efficiency in all areas of activity.
This project is explained in further detail through the five chapters below:
- The first chapter “Project framework”: This is an introductory chapter that presents
the host organization, the problem statement, the proposed solution and the state of the
art. Finally, we end the chapter with a presentation and an explanation of the work
methodology applied.
- The second chapter "Project planning and definition of needs": As its name suggests,
this chapter aims to plan the project's various stages, then determine the end users'
demands before defining the technological environment.
- The third chapter “Multidimensional modeling”: This is devoted to the data model-
ing phase, the selection of indicators and the identification of tables of facts and dimen-
sions.
- The fourth chapter “Implementation and visualization of data”: This chapter focuses
on the design and development of the data preparation domain as well as the specification
and development of user applications and dashboards.
- The fifth chapter “Predictive analysis”: This chapter focuses on the predictive analysis
of the collected data and presents the future vision of the module.
In our final section, we’ll give a final assessment of the work done, discuss the outcomes,
and outline the project’s potential future directions.
Chapter 1
Project Presentation
1.1 Introduction
This chapter is dedicated to introducing the host company, the project's context, the
problematic and the proposed solution. The first section presents the general context of
the project: we will introduce the host company and its different areas of activity. In
the second section, we will study the existing system, including the difficulties and the
problems to fix, in order to formulate the solution that needs to be built. In the last
section, we will elaborate on the process of the project with respect to the state of the
art of Business Intelligence.
1.2.2 Areas of activity
The main activities of SBS are the integration of ERP business management solutions
and outsourcing.
• Business Intelligence reports and dashboards that allow clients to monitor their
different activities (sales, purchases, stock...) and better understand their data coming
from the ERP
Dashboard” within SBS: this will allow the client company COGEPHA to have a
comprehensive and in-depth understanding of the data relating to its operations, which
will be a significant benefit when making decisions.
1.3.1 Problematic
Given the specificity of pharmaceutical products, particularly the validity limit of these
products, the company wishes to maintain an optimized stock value while guaranteeing its
availability for sale. This balance is dependent on several key factors including ’the rate
of satisfaction of supplier orders' as well as 'average delivery times'. In order to meet
the company's needs, we built an interactive dashboard that focuses on monitoring the
stock value and the shipments activity. Through the dynamic features of Power BI, we
were able to load all the data and navigate through time filters to track the stock value
over time. This was needed because the company's old reporting process, done through
Excel, was static and needed to be updated and archived monthly.
1.4.2 Business Intelligence
Business intelligence (BI) is the collective name for a group of technologies that aid
in decision-making and give a broad picture of a company’s numerous operations. By
providing essential data at the right time and through the right channel, the resulting BI
Information System acts as the primary tool for making decisions and enables goals to
be monitored and the means of accomplishing them to be altered. BI can be the focus
of extremely various techniques from one firm to another because it is a field that is still
in full development. Nonetheless, it all has the same objective—to help decision-makers
evaluate the performance of their business. The four steps of a BI project are collection,
modeling, restitution, and analysis. Figure 1.3 depicts these stages, which are further
described below[2].
The Collecting and Integration phase: The process of collection involves find-
ing, choosing, and extracting transactional data from our source. An ETL tool will then
load the data into a data warehouse after making the necessary modifications to structure
it in a unified, standardized, and usable format. It is necessary to focus
at this stage on the end user’s specific needs in order to meet all their expectations.
The ETL procedure is described in more depth in the following graphic (figure 1.4):
These actions are performed periodically. In our project it will be launched on a daily
basis at midnight when there is no data traffic.
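As a sketch, the nightly collect-and-integrate run described above can be pictured in three stages. In the real project this is done with SSIS against the ERP database; the Python below is only an illustration, and the field names and values are invented assumptions, not COGEPHA's actual schema:

```python
from datetime import date

# --- Extract: pull raw transactional rows from the source (simulated here) ---
def extract():
    # A real run would query the ERP source; we simulate a raw extract
    # with the kind of inconsistent formatting an ETL must clean up.
    return [
        {"item": " Aspirin ", "qty": "10", "entry_date": "2023-01-05"},
        {"item": "Paracetamol", "qty": "4", "entry_date": "2023-01-05"},
    ]

# --- Transform: clean and standardize into a unified, usable format ---
def transform(rows):
    cleaned = []
    for row in rows:
        cleaned.append({
            "item": row["item"].strip(),                      # trim stray whitespace
            "qty": int(row["qty"]),                           # enforce numeric type
            "entry_date": date.fromisoformat(row["entry_date"]),  # real date type
        })
    return cleaned

# --- Load: append the cleaned rows into the warehouse staging area ---
def load(rows, warehouse):
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
print(loaded)  # number of rows loaded in this nightly run -> 2
```

Scheduling such a job daily at midnight corresponds to the SQL Server Agent schedule used for the SSIS package in the project.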
The Modeling and Analysis phase: This phase is all about storing data in an
appropriate format for the needed analysis. It calls for the design of a data warehouse
that enables the simplification of the data model and the organization of the information
delivery.
Figure 1.4: ETL Process
At this level, the data is fed into the data warehouse from the production databases
using the ETL process.
Throughout the decision analysis, we present the data along a number of analytical
axes so that we can readily extract the measures and compare them in various ways.
This is accomplished using multidimensional databases (cubes) built with OLAP
technology.
The OLAP system makes it possible to perform aggregations on the measures and a
rapid analysis on the data, in our case this system is developed in analysis services (SSAS).
Data Marts
The data mart is a collection of focused, sorted, aggregated, and organized data
structured to satisfy certain business goals. SQL queries against one or more relational
databases are used to generate the data store, which is then saved in memory under the
control of a database management system.
Data warehouse
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile
collection of data needed to enhance decision-making. The data warehouse is only to be
used for this purpose. ETL (Extract, Transform, Load) tools in particular are responsible
for supplying it with data from the production bases.
— Integrated: The data is gathered from various sources that each use a different
format. Before making them available for use, they are integrated.
— Non-volatile: Data does not vanish or alter over time or with processing (Read-
Only).
1.4.4 Measures
Fact tables are tables that contain records, each of which relates to a well-defined
business operation.
— Navigation keys: they allow access to one or more axes of analysis. Technically,
these are foreign keys referencing the related dimensions.
— Facts or measures: these are the calculable numerical values on which we can
perform aggregations and from which we can calculate performance indicators. We can
cite three types of measures:
An additive measure is the most flexible and the most used, since its values can be
aggregated along all the dimensions.
A semi-additive measure can be aggregated along some dimensions but not others; a
stock level, for example, can be summed across products but not across time.
A non-additive measure does not allow aggregation to be applied along any dimension.
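The additivity distinction can be made concrete with a small Python sketch over invented fact rows; the semi-additive case uses a stock level, which can be summed across products on a given day but not across time:

```python
# Illustrative fact rows: daily sales and stock levels per product (invented data).
facts = [
    {"product": "A", "day": 1, "qty_sold": 5, "stock_level": 100},
    {"product": "A", "day": 2, "qty_sold": 3, "stock_level": 97},
    {"product": "B", "day": 1, "qty_sold": 2, "stock_level": 50},
    {"product": "B", "day": 2, "qty_sold": 4, "stock_level": 46},
]

# Additive: quantity sold can be summed along every dimension (product AND time).
total_sold = sum(f["qty_sold"] for f in facts)  # 5 + 3 + 2 + 4 = 14

# Semi-additive: stock level can be summed across products for one day,
# but NOT across time (adding day-1 and day-2 stock would be meaningless).
stock_day2 = sum(f["stock_level"] for f in facts if f["day"] == 2)  # 97 + 46 = 143

# Non-additive: a ratio such as a rotation rate cannot be summed at all;
# it must be recomputed from its additive components at each aggregation level.
rotation = total_sold / stock_day2
print(total_sold, stock_day2, round(rotation, 3))
```

This is why OLAP engines such as SSAS let the designer declare the aggregation behavior of each measure rather than summing everything blindly.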
1.5 Conclusion
In this first chapter we began by presenting the host company, the context of the
project, the problems encountered and the solutions proposed, and we ended by stating
the concept of business intelligence and the phases of its different segments.
Chapter 2
2.1 Introduction
In the second chapter, we will first outline the work methodologies relevant to the
solution planning and decision support proposals made in the first chapter. Then, we
will list the functional and non-functional needs, and after that we will conduct a
functional study. In the end, we shall outline the solution's technical environment.
changing business needs. But to do that, you have to put in place the ideal framework for
implementing and managing BI. There are different versions, but the underlying method-
ology is the same. Below are the main steps for successfully adopting an agile BI approach.
• The concept: develop a loose vision of BI.
• The initiation: define the different priorities.
Identify the main needs and requirements of the business. This includes:
- understanding the questions to be answered through the BI system;
- discovering the available data sources;
- understanding the channels for disseminating information: reports, dashboards;
- prioritizing key business requirements and needs, taking into account time and budget
constraints;
- choosing the Business Intelligence software to use.
• Construction iteration: provide a functional system that meets the evolving
needs of stakeholders.
• The transition: put the previous construction iteration into production.
Release the pilot project to a small subgroup of users.
• The production: support all elements from construction iterations and transition
to production
Operate and support the system, dashboards and reports.
Identification
To situate oneself correctly in the context of the company, the GIMSI method pro-
poses a first phase which makes it possible to describe the company’s internal and external
environments:
This is the step that identifies the company strategically and competitively in terms of:
- Market
- Company resources and policies
- Business Strategy
• Identification of the company:
Structural analysis of the company to identify the processes, activities and concerned
persons.
Design
This is the second stage of the GIMSI approach for cataloguing every component in-
volved in dashboards.
In order to be able to make decisions based on the research done in the earlier phases,
this step helps you to outline the company’s objectives.
Each objective needs to meet the requirements listed below:
- Real time: It needs to be updated often in order to allow for decision-making at any
moment.
- Measurable: it must be possible to measure the objective with one or more quantifiable indicators.
- Ergonomic design for the dashboard
• Collection of information:
This is the phase in which the data needed to construct the indicators can be collected
according to the objectives established in the previous phase. Each piece of data must
meet certain criteria:
This is the stage that allows us to manage the objectives that we set ourselves in the
previous step. Each of them must contain the following elements:
- A timeframe or date
- Measures
- Visuals
- Responses to the concerns and aims
• The Dashboard system:
Design of the dashboard system, control of the global consistency.
This is the step that explores the interactions between the dashboards and ensures
the overall consistency of the decision-making systems.
Implementation
This is the third phase of the GIMSI method to implement the solution.
• BI Tools:
This is the step that allows you to study the technological needs and analyze the market
offer, in order to choose the most appropriate tool for the company’s objectives and the
most adapted to its current problems.
• Integration and Deployment:
This is the step that allows the implementation of the technologies chosen during the
previous step in the company as well as the integration of the solution with the existing
one.
Continuous Improvement
The audit constitutes the fourth phase of the GIMSI method, which watches over
the system. This is the step that allows the permanent monitoring of the solution. It
guarantees not only the sustainability of our system, but also its performance according
to the needs and objectives set.
2.2.1.3 Choice of Methodology
After having studied the two different methodologies that can be used for the
implementation of our solution, we chose the GIMSI method: it offers several decision
points and is unique in designing a dashboard-based management system that
encourages communication and knowledge sharing amongst decision-makers.
2.2.2 BI Approach
Designing a DataWareHouse (DWH or DW) is an intricate and crucial step in the
creation of a decision support system. Bill Inmon and Ralph Kimball, both fathers of the
DataWareHousing process (design and development process of DWH), offer two different
approaches to modeling a data warehouse. The two methods must be compared in order
to determine which one best meets your needs.
For Bill Inmon, the DWH and the DMs are physically separate. The DW has its own
physical existence, oriented toward storage, traceability and scalability, and each DM
also has its own physical existence: the DMs draw on the DW and deliver performance
suited to the needs expressed by their users. Figure 2.1 illustrates Bill Inmon's approach
to designing a data warehouse.
Ralph Kimball views the DW quite differently. According to him, a DW can be seen
as a coherent collection of DMs linked by conformed, shared dimensions. The strength
of Ralph Kimball's approach is its reliance on quick delivery cycles that meet the needs
expressed. Figure 2.2 shows the DW design following the approach of Ralph Kimball.
Figure 2.2: Kimball Approach
Based on the presentation of the last two approaches, we will compare their main
characteristics such as process, schematization, basic model and the nature of the head-
to-head result. Table 2.1 presents a comparative study of the DWH design approaches. [7]
                            Ralph Kimball                 Bill Inmon
Process                     Bottom-up                     Top-down
Principle                   Model the data marts, then    Create a centralized data
                            integrate them to form a      warehouse in which the data
                            data warehouse                will be consolidated
Schematization              Star                          Snowflake
Data structure              Business-process oriented:    Data oriented
                            KPIs, dashboards
End-user accessibility      Strong                        Weak
Persistence of source data  Stable                        Changing
Data warehouse delivery     Fast                          Slow
For the current project, and in line with the customer's needs (reducing development
costs while delivering a decision support system, its tactical actions and its artefacts as
quickly as possible), we adopt the bottom-up approach to meet all needs.
2.2.3 Life cycle of Ralph Kimball's approach
To better adapt the project to the changing and sometimes volatile needs of our
customers, Ralph Kimball proposed a Business Intelligence project lifecycle that
accommodates the frequent changes that may occur from one project to another. The
diagram below shows Ralph Kimball's lifecycle as well as the vertical and horizontal
dependencies among its various nodes.
Figure 2.3: Life cycle of a business intelligence project according to Ralph Kimball
Project planning
- It has an interacting relationship with the notion of needs, as the arrow suggests.
In fact, it emphasizes these requirements from the perspective of available resources and
level of expertise, which is directly related to work assignment, duration, and sequencing.
Definition of needs
- The user must be heavily involved and his wants must be understood in order to
properly build the data warehouse.
- Determining the important elements that will enable the business to accurately spec-
ify its requirements, which will be utilized to model the data warehouse.
- This activity serves as the foundation for the other concurrent activities including
technology, data, and user applications.
- A description of the technical architecture and of how the technologies are integrated
to solve the problem.
- The needs, the current technical environment, and the anticipated strategic technical
directions are taken into account.
- Choosing the right tools in accordance with the technical architecture research (Ex:
data preparation and access tools)
Dimensional modeling
- Once the requirements have been established, modeling and the creation of a dimen-
sional model (including fact tables and dimensions) can proceed.
- Identification of the physical structures required for the implementation of the logical
data in the database. (Ex: identification of keys, constraints, types . . . )
- Security
- The development of the ETL process: data extraction, cleaning, consolidation, and
loading
- The integrity of a shared understanding between the developers and the end user is
ensured by the establishment of dashboards based on the preceding stage.
Deployment
- Forecasting end-user training
- User monitoring
- A guarantee that the data warehouse will run continuously and effectively (perfor-
mance optimization)
- Data storage
Project management
- Control adjustments
Figure 2.4: Project Plan
2.4.1 Objectives
The logistical goal of the solution converts the overall aim into an intention for action.
To establish the company's demands, the various logistics objectives frequently need to
be clarified, since corporate requirements dictate how these objectives will be met. The
solution should enable analysis and monitoring of the logistics activity.
Now that the business needs have been defined, we can proceed to the definition of
the functional and non-functional needs of the solution.
- Data integration: The process of creating the solution starts with data integra-
tion, cleaning, transforming and loading the data into a database for business intelligence.
Hence, in order to feed the dimensional model, we shall rely on the ETL procedure.
This stage then produces a set of dashboards that offer global visibility over the project,
the staff, and the business operations to aid decision-making.
- Track Purchases
- Track Stock
- Track Shipments
- Minimise delays
- Satisfy customers' needs
- Explain the shipments that exceeded their time limit
2.5 Functional Study
2.5.1 Identification of actors
To identify the actors, we began by conducting a thorough analysis of the database
and extracting the data that we could use to increase visibility and accelerate the
decision-making process. Once the data had been surveyed, we studied COGEPHA's
business processes in order to improve our comprehension of the data source and tailor
the solution to the needs of the targeted end users. We can distinguish two categories of
users who play various roles:
The system administrator: a collaborator whose responsibilities include managing the
dashboards, updating the data, ensuring the proper management and operation of the
ETL (Extraction, Transformation, and Loading) component, and modifying the indicators.
End users: the direct users of the application, namely the decision-makers, who will
all have access to the dashboards.
Technical architecture
In order to accomplish our goals, we will establish a technical architecture specifying
the steps that data must go through to change state, beginning with data gathering,
passing through modeling, up to the visualization stage and finally the deployment. We
based our work on the technical architecture below in order to create unified graphic
charters:
— The first part is the ETL zone, which stands for extract, transform, and load.
Developers must have exclusive access to this zone, where data processing is done. Under
no circumstances should an end user be able to access it, because the data there is not
yet validated.
Figure 2.5: Solution’s technical architecture
— The data storage space is the second part: historically we had a data warehouse
here based on OLAP technology. Now we can imagine other types of data storage such
as in-memory storage or distributed data systems (HDFS, ...). However, regardless of
the format or storage medium chosen, dimensional modeling is an excellent approach to
logically structure data and provide access to it.[8]
— The data restitution section, which includes all the tools that produce reports or
dashboards, is the final part. This area can also provide data at an atomic level for ma-
chine learning tasks.
Technical environment
Now and following the definition of the architecture, we present the tools that will be
used to carry out this project.
Microsoft has also built backward compatibility for older versions of SQL Server into
SSMS, enabling older SQL Server instances to be connected to from a newer version of
SSMS.
SSIS [9] is a tool for extracting, transforming and loading data, in short what is called
an ETL. We extract data from a source, apply transformations if necessary, and then
load this data into MS SQL Server or other destinations.
To work with SSIS you need Microsoft Visual Studio: the design environment for an
SSIS package is Visual Studio with, if possible, access to your data server. This lets you
at least check that the import was successful (in addition to the progress logs available
with your SSIS package).
SQL Server Analysis Services
Power BI
Microsoft offers a data analysis tool called Power BI [11]. With a user interface that
is simple enough for end users, it enables the production of interactive data visualizations
(dashboards).
2.6 Conclusion
This chapter was dedicated both to defining the overall vision of the solution and
to project planning, including the specification of functional and non-functional demands
to better identify the expected objectives of the solution. We also identified the
technologies and technical requirements of the solution, which allows us to proceed to
the data preparation work of the following chapter.
Chapter 3
Multidimensional Modeling
3.1 Introduction
Ralph Kimball's method is continued in this chapter by detailing the design of our
proposed solution. During this stage of the project, after determining the functional and
non-functional requirements and identifying the application's final actors, we continue
by modeling and designing our dimensions and our fact tables.
They can also be KPIs (Key Performance Indicators), i.e. quantitative measures that
allow you to follow the progress of your company or organization in relation to your key
business objectives.
Below we list all of the indicators, KPIs and measures used in our project:
- Quantity Running Total: calculates the quantities of products in stock and shows
their evolution over time.
- Stock Value Running Total: calculates the value of the stock and shows its evolution
over time.
- Avg Time To Prepare: average time to prepare a shipment for delivery.
- Exceeded Time: a boolean measure that indicates whether a shipment exceeded its
time limit.
- Active Vendors: count of active vendors that delivered purchases during the current
month.
- Not Paid: the amount of purchases that are still not paid.
- Cost of Delay: cost of the unpaid amounts in terms of the declined orders.
- As Planned: a boolean measure that indicates whether a shipment respected the
planned delivery date predicted by the audit manager.
- Planned Percentage: percentage of the shipments that respected the planned
shipment date.
- Coverage Rate: the ratio of the stock value covering the needs of the customers.
- Overbought: percentage of the extra quantity purchased relative to the sold quantity.
- Rotation Rate: describes the time required for the company to pay, be paid and
renew its inventory.
- Stock Value: indicates the value of the products in stock over time.
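In the project these measures are implemented as DAX expressions in Power BI; the Python sketch below, with invented sample values and a hypothetical 48-hour limit, merely illustrates the logic behind three of them (running total, exceeded-time flag, planned percentage):

```python
from itertools import accumulate

# Hypothetical monthly net stock movements (entries minus exits), in units.
movements = [120, -30, 45, -60, 10]

# "Quantity Running Total": cumulative sum of movements over time,
# i.e. the stock quantity at the end of each period.
running_total = list(accumulate(movements))
print(running_total)  # [120, 90, 135, 75, 85]

# "Exceeded Time": boolean flag per shipment, true when preparation took
# longer than the allowed limit (the 48 h threshold here is an assumption).
LIMIT_HOURS = 48
prep_hours = [12, 50, 47, 72]
exceeded = [h > LIMIT_HOURS for h in prep_hours]
print(exceeded)  # [False, True, False, True]

# "Planned Percentage": share of shipments whose "As Planned" flag is true.
as_planned = [True, True, False, True]
planned_pct = 100 * sum(as_planned) / len(as_planned)
print(planned_pct)  # 75.0
```

The DAX equivalents would filter the date dimension up to the current row's date for the running totals, which is what makes the time-filter navigation in the dashboard possible.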
3.2.2 Dimensions
A dimension is a table that contains the axes of analysis (the Dimensions) according
to which we want to study observable data (the facts) which, subjected to a multidimen-
sional analysis, give users the information necessary for decision-making.
• Slowly changing dimension: The dimension that can undergo member descrip-
tion changes over time.
• Time dimension: Central because most of the facts correspond to business events
of the company.
Let’s now present the dimensions for our project after having briefly discussed the
various types of dimensions. The table below (tab. 3.1) lists the type, various features,
and description for each dimension.
Item Entries       ItemEntriesId, ItemNo,        Contains details of all the      Not
                   Type, Source, Location...     item entries to the stock of     shared
                                                 COGEPHA, including sales
                                                 and purchases
Shipments Details  ShipmentDetailId, Customer    Contains details of all the      Not
                   No, City, Creation Time,      shipments of COGEPHA             shared
                   Shipment Date...
Table 3.1: Dimensions.
Fact tables contain two different sorts of columns: foreign keys to dimension tables, and numeric values that represent measures.
The logistics chain, and in particular our project, encompasses three business processes: the stock, the purchases, and the shipments, which will now be presented.
• Topic 1: track the stock's evolution in terms of value and quantity, to ensure that it satisfies the needs while keeping its value optimized.
The following table (tab. 3.2) details the measures needed for this fact table:
• Topic 2: track purchases in terms of amounts and quantities, and track vendor performance.
For this process we will need the two fact tables listed below:
Fact Purchase: dimensions Item, PurchaseDetails, Vendor, Date, Location; indicators and measures Active Vendors, VendorDisAmount, Not Paid, Received Quantity, Overbought, Purchases Amount.
Fact PurchaseOrders: dimensions Item, PurchaseOrdersDetails, Vendor, Date, Location; indicators and measures Availability, Ordered Quantity.
• Topic 3: track shipments in terms of delays and quantities delivered, to satisfy our customers.
For this process we will need the two fact tables listed below:
• Snowflake model: In a snowflake schema, initiated by Inmon, the fact table is also
at the heart of the model. The dimensions gravitate around the central table but the
difference lies in a greater hierarchy of these dimensions. These are linked by a succes-
sion of relationships down to the finest granularity, which is directly related to the fact
table. Technically, this scheme avoids information redundancy but requires joins when
aggregating these dimensions via a succession of foreign keys.
• Constellation model: A series of star and/or snowflake diagrams make up this model
in which the fact tables share certain dimension tables.
3.3.2 Choice of model
The final design will incline toward a constellation of facts. Moreover, based on the chosen granularity of the dimensions, we will find star or snowflake models; this is why we contrast these two models in the table below (tab. 3.5) to determine which is best for us.
We first concentrate on the business process while determining the level of granularity in the fact table, in order to select the model that works best for us. Second, the model's simplicity, adaptability, and performance in terms of response time all played a role in our decision. In light of this, we selected the star model displayed below.
The first fact table, "FactStock", essentially focuses on the variation of the stock in terms of value, quantity, and availability. This table allows us to track missing items, overbought items, and overall inventory status.
Our "FactPurchases" fact table will track the costs of all our purchases, the most active vendors, and the overall expenses of the purchase cycle.
In the third fact table, "FactPurchasesOrders", we look at the different vendors and orders while focusing on declined orders to assess supplier performance.
Lastly, "FactShipments" will track deliveries to our clients, focusing mainly on times and delays while assessing the costs of shipments.
All of these fact tables, which are the centers of our data marts, build the data warehouse. Each data mart follows the star model, and together they form the constellation model.
Figure 3.1: Fact Tables Model
3.4 Conclusion
Throughout this chapter, we have identified the different sectors of activity, the measures, and the axes of analysis. Indeed, we presented the dimensions and the fact tables, then discussed multidimensional data mart models before putting forward the physical data warehouse model. The following chapters will cover the development of the staging area, the specification and development of the user application, and finally the deployment model of the solution.
Figure 3.2: Purchases Data Mart
Figure 3.4: Shipments Data Mart
Chapter 4
4.1 Introduction
In this chapter, following the modeling, we move on to the next technical stage of our project. We concentrate on building the components of the data preparation layer and on using these data for decision-making. Finally, we present the data in the form of dashboards that give the decision-maker a better view of operations, following the Kimball approach.
4.2.1 Extraction
We currently pull the various required data from the transactional databases and store
them in the Staging Area (STG). To create a new foundation from which to perform our
modifications, we will build a duplicate of the tables from the source database specifically
dedicated to our company.
We used the visual studio IDE creating a solution named after our project. This so-
lution encompasses projects of all phases named STG,DWH and SSAS.
The first project is the STG. We create a package for each extracted table, in this
package we have, the source component and the destination component.In between,the
Slowly Changing Dimension (SCD) is a dimension that stores and manages both current
and historical data over time in a data warehouse.
It is considered and implemented as one of the most critical ETL tasks in tracking the
history of dimension records.Its use here is to update old rows and add the new ones into
the table without having to truncate the table.(fig.4.1)
Figure 4.1: Staging Area Task
We then create the package that fills the tables, as shown in the figure below (fig. 4.2); the packages inside the sequence container are all executed together.
• Blocking components: the greediest components in terms of resources; they must read all the rows arriving on their input before sending anything to their output flow. Examples: Aggregate, Sort, Row Sampling.
• Semi-blocking components: these transformations block the data flow temporarily. This is the case, for example, of the Merge Join, which sends data to its output stream once all the rows with the same join key have arrived at its input. Examples: Merge, Union All.
• Non-blocking components: this type of component retains data neither partially (semi-blocking) nor totally (blocking). They are therefore more efficient in terms of execution time. Examples: Derived Column, Lookup, OLEDB Command.
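The difference between blocking and non-blocking components can be illustrated with Python generators: a sort must consume every input row before emitting anything, while a derived column can emit each row as soon as it arrives (a conceptual sketch, not SSIS code; the row fields are illustrative):

```python
def derived_column(rows):
    """Non-blocking: transforms and yields each row immediately."""
    for row in rows:
        yield {**row, "total": row["qty"] * row["price"]}

def sort_component(rows, key):
    """Blocking: must buffer the whole input before the first output row."""
    buffered = list(rows)  # consumes everything up front
    yield from sorted(buffered, key=key)

rows = [{"qty": 2, "price": 5.0}, {"qty": 1, "price": 3.0}]
streamed = derived_column(iter(rows))
print(next(streamed)["total"])  # 10.0  (first row available right away)
```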
Having defined the appropriate dimensions in the modeling phase, we now approach the feeding step to concretize the theoretical components.
For some dimensions, we applied historization to keep track of changes, as in the case of the Item dimension shown in the figure below (fig. 4.3).
For a business need, we chose to keep the change history of the items' last direct cost; for other attributes such as name and description, a change simply modifies the record in place. So we first start from our two sources, STG and DWH, in order to compare the new data with the data that already exists in our destination (DWH).
We used the Union All component to combine all outcomes. Three cases are then possible:
• Insertion: the item does not yet exist in the DWH, so it is simply added as a new row.
• Update: a change in an attribute whose history we do not need to keep (name, description...) modifies the existing record in place.
• Historization: the item is in the DWH, but one of the attributes we need to track, "Last Direct Cost", has changed. We therefore add a new row for the same item, keeping the same business key but an incremented surrogate key; this becomes the current item with the new Last Direct Cost, while we keep the old record, setting its end date and retaining the old Last Direct Cost. An example is shown in the figure below (fig. 4.4).
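The historization case can be sketched in plain Python (a simplified sketch of SCD type 2; the actual work is done by the SSIS components, and the column names are illustrative):

```python
from datetime import date

# Current dimension rows: surrogate key, business key, tracked attribute, validity
dim_item = [
    {"sk": 1, "item_no": "IT-001", "last_direct_cost": 10.0,
     "start_date": date(2022, 1, 1), "end_date": None},
]

def historize(dim, item_no, new_cost, change_date):
    """SCD type 2: close the current row and add a new one with an
    incremented surrogate key when the tracked attribute changes."""
    current = next(r for r in dim if r["item_no"] == item_no and r["end_date"] is None)
    if current["last_direct_cost"] == new_cost:
        return  # nothing changed, nothing to historize
    current["end_date"] = change_date  # keep the old row with its old cost
    dim.append({"sk": max(r["sk"] for r in dim) + 1, "item_no": item_no,
                "last_direct_cost": new_cost, "start_date": change_date,
                "end_date": None})

historize(dim_item, "IT-001", 12.5, date(2023, 3, 1))
# The dimension now holds two rows for IT-001: the closed historical one,
# and the current one carrying the new Last Direct Cost.
print(len(dim_item))  # 2
```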
For all the dimensions, we used the 'Script' component to generate the auto-incremented surrogate key (technical key). We also used 'Derived Column' to replace null and empty values with "UNKNOWN" so we can filter the data better.
The fact tables are loaded at the final transformation stage, as identified during the fact table identification phase. To fully understand this step, we will go through each fact table.
To prepare the data, we created several tasks using different data transformation components to build our operational data marts; we now discuss some of them.
Most of the data for the Purchases Data Mart comes from the Purchase Invoice Line table. In our case, we are primarily interested in the amounts and costs, so we pull the data from this table to compute the purchase amount aggregation, add it as a measure, and insert it into the Fact Purchase table (fig. 4.6).
Most of the purchase order data comes from the Purchase Line table. Here we are primarily interested in the orders not received, so we pull the data to compute the measures needed on declined orders, including their amounts and quantities, and insert them into the Fact PurchaseOrders table (fig. 4.7).
Shipments data mainly come from the Sales Shipments Line table, so we integrate the data into the Fact Shipments table in order to calculate and focus on the quantities shipped and their costs (fig. 4.9).
Stock data mainly come from the Value Entry table, so we integrate the data into the Fact Stock table in order to calculate the value of the stock and the needed measures (fig. 4.8).
We performed lookups against the dimensions at the source component level instead of joins, as this performs better in terms of execution time. As a result, we were able to resolve every match into our fact tables. We then added the measures that do not require complex computation, such as sums and multiplications, through the 'Derived Column' component; these measures are calculated from the data extracted by the source. Finally, we reach the last stage, data loading, which writes the data into the correct table in the database. The data warehouse's "Master" package is depicted in the figure below: execution of the fact tables comes after the sequential execution of the dimensions (fig. 4.5).
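The dimension lookups used while loading the fact tables amount to resolving each business key to its surrogate key, which can be sketched as dictionary lookups (a sketch; in the solution this is the SSIS Lookup component, and all names and data are illustrative):

```python
# Dimension rows indexed by business key -> surrogate key (illustrative data)
dim_vendor = {"V-01": 101, "V-02": 102}
dim_date = {"2023-05-10": 20230510}

source_rows = [
    {"vendor_no": "V-01", "posting_date": "2023-05-10", "amount": 250.0},
    {"vendor_no": "V-02", "posting_date": "2023-05-10", "amount": 400.0},
]

fact_purchase = []
for row in source_rows:
    # Resolve each business key to its surrogate key; unknown keys fall back
    # to a default member (-1), mirroring the "UNKNOWN" cleanup above.
    fact_purchase.append({
        "vendor_sk": dim_vendor.get(row["vendor_no"], -1),
        "date_sk": dim_date.get(row["posting_date"], -1),
        "purchase_amount": row["amount"],
    })

print(fact_purchase[0]["vendor_sk"])  # 101
```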
Figure 4.6: Fact Purchases Implementation
We also used a 'Conditional Split' to split the [Source No] column of the Value Entry table from STG, which contains both vendor and customer IDs, into two new columns, VendorID and CustomerID, in FactStock.
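The Conditional Split can be sketched as follows, assuming an entry type column tells purchases (vendor IDs) apart from sales (customer IDs); the discriminating column is an assumption for illustration only:

```python
# Value Entry rows: Source No holds a vendor ID for purchases and a
# customer ID for sales (the "type" column is an illustrative assumption).
value_entries = [
    {"source_no": "V-01", "type": "Purchase", "quantity": 50},
    {"source_no": "C-07", "type": "Sale", "quantity": -20},
]

for row in value_entries:
    # Route the single Source No column into two new columns, as the
    # Conditional Split does before loading FactStock.
    row["VendorID"] = row["source_no"] if row["type"] == "Purchase" else None
    row["CustomerID"] = row["source_no"] if row["type"] == "Sale" else None

print(value_entries[0]["VendorID"], value_entries[1]["CustomerID"])  # V-01 C-07
```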
Figure 4.9: Fact Shipments Implementation
This engine, SQL Server Analysis Services (SSAS), is an industry-leading OLAP server that integrates effectively with a variety of BI products. It enables end users to analyze data across several dimensions, giving them the knowledge they need for better decision-making.
OLAP cubes are queried with MDX, a powerful yet complex query language that produces multidimensional reports made up of one or more two-dimensional arrays.
A multidimensional model consists of cubes and dimensions that can be extended and
annotated to allow complicated query techniques, speed up response times, and provide a
single data source for reporting. Another advantage offered by Analysis Services multidi-
mensional databases is integration with commonly used BI reporting tools such as Excel,
PowerBI, as well as custom applications and third-party solutions.
The figure below (fig.4.10) provides the dimensions and cubes that will be used in our
filters. Only the data required for the analysis has been loaded.
Figure 4.10: Cubes and Dimensions
We created the aggregation metrics that need to be displayed in our reports. We were thus able to visualize, with a simple drag and drop, the results of our measures along several axes. For example, the StockValue measure presents the value of the current stock by multiplying "Unit Cost" from the Item dimension by "Quantity" from the ItemEntries dimension.
Creating these dynamic measures facilitates the building of dashboards and gives access to the data without the need for additional calculations.
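The StockValue measure described above amounts to multiplying unit cost by quantity and summing over the filtered entries, which can be sketched as (illustrative figures):

```python
# Join of the Item dimension (Unit Cost) with ItemEntries (Quantity)
entries = [
    {"item": "IT-001", "unit_cost": 10.0, "quantity": 5},
    {"item": "IT-002", "unit_cost": 4.0, "quantity": 20},
]

# StockValue: sum of Unit Cost * Quantity over the entries in the filter context
stock_value = sum(e["unit_cost"] * e["quantity"] for e in entries)
print(stock_value)  # 130.0
```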
For this step we must choose the destination (our SQL Server) and have an SSIS catalog in which to store the project. The deployment result is shown in the figure below.
After package verification, we schedule these packages so that our ETL runs at a specific time (typically overnight). We therefore create a new SQL Server Agent job that takes care of executing our packages; in the example below we created the data warehouse job. One or more schedules can be defined to run the package at predefined times, with daily, weekly, or monthly planning. The schedule is set according to the type of business and the frequency of data change. In our case, we scheduled daily execution at 02:00 for STG and 05:00 for DW.
In the figure below we have launched our Job which has been executed successfully.
Figure 4.13: History of Execution
We use Power BI to create the various dashboards; all of their indicators and graphic components are automatically linked to the filters to guarantee interactivity.
Home Screen
To ensure increased use of our application, we have designed a home interface (fig.4.14)
which provides the user with visibility of all the reports covered by our solution. Indeed,
this page contains the name of the project and each theme is represented by a button
that will redirect us to the selected page.
Figure 4.14: Home
Stock Overview
The stock overview below (fig. 4.15) provides a general view of the stock through the indicators and percentages needed to monitor its status since 2018 (the monitoring start date is set by the customer). We can then view:
- the stock value by vendor class, the share of each class in terms of value, the coverage both in units and in days, and the rotation rate;
- coverage by location;
- the distribution of the quantity purchased from each vendor over a period of time, broken down by month.
The stock evolution page shows how the stock's value and quantity evolve over time, with filters on vendor class and item description.
Purchases Overview
The purchases overview below (fig. 4.17) provides a general view of the purchases cycle. We can see:
- ordered quantity vs. received quantity, and the availability for each vendor class;
- the amounts for each class, the unpaid amount, and the respective cost of delay for each class.
The daily purchases page (fig. 4.19) shows the daily amounts for any month chosen by the user.
For some measures, we had to use DAX [15], relying mainly on time-intelligence functions so that we can filter through time and compute more complicated measures.
For example, figure 4.18 shows the NotPaid measure, which calculates the amount of purchases not yet paid. It uses CALCULATE, which evaluates an expression under modified filters; in the first filter we compare today's date to the due date by computing the difference in days with the DATEDIFF function.
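The logic behind the NotPaid measure can be sketched in Python (a sketch mirroring the CALCULATE expression with its date filter; column names and sample dates are illustrative):

```python
from datetime import date

# Purchase invoices with due dates and an open/paid flag (illustrative data)
purchases = [
    {"amount": 300.0, "due_date": date(2023, 1, 15), "paid": False},
    {"amount": 150.0, "due_date": date(2023, 9, 1), "paid": False},
    {"amount": 500.0, "due_date": date(2023, 2, 1), "paid": True},
]

def not_paid(rows, today):
    """Sum of unpaid purchase amounts whose due date has passed,
    i.e. the day difference between today and the due date is positive."""
    return sum(r["amount"] for r in rows
               if not r["paid"] and (today - r["due_date"]).days > 0)

print(not_paid(purchases, date(2023, 6, 1)))  # 300.0
```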
Figure 4.17: Purchases Dashboard
Shipments Overview
The shipments overview below (fig. 4.21) provides a general view of the shipments, including quantities and costs by location. We can see:
- the average time to prepare shipments.
Another example of a DAX measure is AsPlanned (fig. 4.20): it applies an IF condition to an expression comparing the planned shipment date with the actual shipment date, returning YES or NO accordingly.
These reports can be distributed and consumed on the web and on mobile devices in
order to meet various business needs, as shown in the figures below.
Figure 4.22: Web Version over the Power Bi Service
4.6 Conclusion
In this chapter, we have set up dynamic graphical user interfaces that give managers a broad yet granular perspective on the logistics cycle. These dashboards will greatly facilitate strategic decision-making.
Chapter 5
Predictive analysis
5.1 Introduction
The numerous dashboards allowed us to extract reliable data and gave us clear visibility into the evolution of the logistics cycle; but since the volume of data keeps increasing, we need to further enrich our client's knowledge and optimize the company's strategic and operational decisions.
Today, the main concern of a business manager is to transform potential opportunities into projects, hence the need for a broader view of each opportunity and its success rate; it is a question of predicting its future status.
In our project, we will predict purchase amounts in order to identify a trend. Predictive analysis can improve business processes by identifying certain risks or alerting you to potential problems in your business structure.
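A minimal sketch of a purchase-amount trend prediction, fitting an ordinary least-squares line over monthly totals (the figures are illustrative; the report's forecast is produced by Power BI's built-in forecasting):

```python
# Monthly purchase amounts (illustrative), indexed 0..n-1
amounts = [100.0, 120.0, 140.0, 160.0]
n = len(amounts)
xs = list(range(n))

# Least-squares slope and intercept for the trend line y = slope*x + intercept
x_mean = sum(xs) / n
y_mean = sum(amounts) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, amounts)) \
        / sum((x - x_mean) ** 2 for x in xs)
intercept = y_mean - slope * x_mean

# Predict the next month's amount by extrapolating the trend line
next_month = slope * n + intercept
print(next_month)  # 180.0
```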
Figure 5.1: Purchases Prediction
Figure 5.3: Purchases Key Influencers
Since this is also an AI (artificial intelligence) visualization, you can ask the program
to find the next dimension to explore based on certain criteria. This tool is valuable for
exploration and conducting root cause analysis.
In our case, shipments are analyzed by city, customer, month, year, and item, tracking the costs; purchases are analyzed by vendor class, vendor name, month, year, and item, tracking the amounts.
This provides an overview of how we spend the money and who receives the most.
Figure 5.4: Shipments Decomposition Tree
5.5 Clustering
Clustering is an unsupervised machine learning technique that looks for patterns in data by dividing it into clusters. These clusters are created such that points are homogeneous within a cluster and heterogeneous across clusters. Clustering is commonly used in market segmentation and several areas of marketing analytics.
In the figure below (fig. 5.6), we divide our vendors into three clusters by the amounts and quantities we purchase from them.
In the top right, a smart AI description summarizes the clusters.
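A minimal k-means sketch of the vendor clustering by purchase amount and quantity (pure Python; the report's clusters are produced by Power BI's built-in clustering, and the data and initial centroids here are illustrative):

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to the nearest centroid, then
    move each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters

# Vendors as (purchase amount, quantity) pairs: two small, two large (illustrative)
vendors = [(100, 10), (120, 12), (900, 80), (950, 85)]
low, high = kmeans(vendors, centroids=[(0, 0), (1000, 100)])
print(len(low), len(high))  # 2 2
```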
5.6 Conclusion
This chapter has allowed us to outline the various stages of the project's execution, from the creation and preparation of the models to the development of the dashboards and, finally, to the implementation of the predictive model and various analytical techniques.
General Conclusion
This report, which summarizes an enlightening internship, ends with a reminder that
the goal of our project is to create and deploy a decision-making solution dedicated to the
Logistics department.
To accomplish this, a working technique customized for the business was selected.
We first began with a strategic analysis of the project. In this investigation we identified the needs, the objectives, and the boundaries of our project, in order to have a clear understanding of what lies ahead and to work with realistic, definable short-, medium-, and long-term goals. We then created the technical framework for the project and, as a result, chose the ideal tools for the job.
We started implementing our method, which is broken down into several parts, after
preparing our work area.
The first phase was essential: understanding the data in order to move forward and have a clearer idea of the indicators to be measured. The most important stage follows, allowing the extraction, transformation, and loading of data into the DW using Microsoft technologies (SSIS, SSAS). Finally, we reach the restitution phase, where we chose Power BI to generate interactive, clear, and detailed dashboards ensuring access to pertinent, well-organized data. To advance our understanding further, we developed a web application that predicts the success rate of an opportunity; it helps our decision-makers unveil the future of a commercial proposal and thereby influence its status.
This project gave us the chance to put our academic knowledge into practice, to experience working life, and to learn about the business world. We had the opportunity to apply the entire BI process, to wear the hat of a BI developer, and to deal with the inherent challenges, such as task distribution and the management of time and effort. We now know how to effectively defend our work, persuade others, and communicate the concepts of "intelligence" and interactivity to our clients.
Given the complexity of the links between the tables in the database, the biggest chal-
lenges encountered during the months of work were at the level of interpreting the data.
In addition, our approach is universally applicable and flexible.
By way of perspectives and avenues for development, we plan, on the one hand, to enhance the dashboard module so that it can keep up with the business field's rapid evolution, and on the other hand, to continue analyzing the data and to implement other methods of understanding it.
Bibliography
[3] "ETL", https://www.geeksforgeeks.org/etl-process-in-data-warehouse/
[9] "SSIS", https://learn.microsoft.com/en-us/sql/integration-services/sql-server-integration-services?view=sql-server-ver16
[10] "SSAS", https://learn.microsoft.com/en-us/analysis-services/ssas-overview?view=asallproducts-allversions