BI and Data Warehousing Lect 1 & 2 PDF
Most companies track and record thousands of transactions daily. These include not just customer
purchases (which might record the customer, the products/items sold, the store from which the
purchase was made, and the date and time of the purchase) but also transactions such as
warehouse activity, inventory purchases, employee hours and time off, and daily operating costs.
In fact, most companies are virtually swimming in data.
As the amount of data stored by companies grows exponentially, companies need to translate
that data into information that helps them plan future business strategies and gain unique
perspectives on their business, enabling them to become leaner and more competitive.
Company data can indicate the viability of the company’s products and help in planning
future growth. Hence, data can support important, up-to-date business decisions aimed at
maximizing revenue and reducing costs. The solution that helps businesses make such
decisions is called Business Intelligence (BI).
There are several definitions of BI. Here are two working definitions in the context of this
course:
“BI is a set of tools and techniques that enable a company to transform its
business data into timely and accurate information for the decisional
process, to be made available to the right persons in the most suitable
form.”
OR
and predictive views of business operations for better decision-making. BI often
aims to support better business decision making.
Note that: Business Intelligence is not Artificial Intelligence (AI). AI systems make
decisions for the users, while BI systems help the users make the right decisions based on
the available data.
2. Why is BI important?
With business intelligence, companies have greater insight into their organization,
yielding new opportunities, corrections to existing procedures or processes, competitive
advantages, and more, including the ability to:
3. Characteristics of BI
There are four main characteristics of BI solutions: accurate answers, valuable insights,
on-time information, and actionable conclusions.
i. Accurate Answers
For BI to be of any value in the decision-making process, it must correctly reflect the
objective reality of the organization and adhere to rigid standards of correctness. As
such, the first hallmark of insights produced by BI processes is their accuracy.
BI must represent the truth as closely as possible. The aim is not only to produce
results, but also to protect BI’s reputation among skeptics. Without accuracy, insights
produced by BI are worse than worthless: they can harm the company. And once that
happens, nobody will trust BI again.
ii. Valuable Insights
BI’s goal should be not only to produce correct information, but to produce information
that has a material impact on the organization, such as significantly reduced costs,
improved operations, enhanced sales, or some other positive outcome.
Further, BI should produce high-value insights that cannot easily be deduced by other means.
To illustrate, consider this scenario: imagine that Kilimanjaro bus company invested millions
of shillings in a BI solution to analyze its traveler-history data, and after such a vast
investment the solution found that a high number of travelers are likely to travel from
Dar es Salaam to Moshi in December! The finding may be accurate, but the company could
have known it without any BI investment, so it adds little value.
iv. Actionable Conclusions
Accurate is one thing; actionable is another. Imagine if the conclusions reached at the
end of the BI cycle were that the company would be better off if a competitor went out of
business, or if one of its factories were 10 years old instead of 30 years old.
Those ideas might be accurate, and it is no stretch to believe that if either scenario came
to pass, it would be valuable to the company. But what, exactly, are the bosses supposed
to do about them? You can’t wish a competing company out of business. You can’t snap
your fingers and de-age a factory. These are exaggerated examples, but one of the
biggest weaknesses of decision support tools is that they produce conclusions that are not
actionable. To be actionable, a conclusion must point to a feasible course of action that
takes advantage of the situation: it has to be possible to move from conclusion to action.
Ideally, BI should help produce reports that guide future actions, for instance, reports
that help executives conclude that a price should be lowered, or that two items should be
sold as a package.
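A first step toward a conclusion such as "sell two items as a package" is finding items that customers often buy together. The sketch below computes pair support (the fraction of baskets containing both items), the basic quantity used in association-rule mining. The basket data is hypothetical, and a full algorithm such as Apriori would also prune rare itemsets and compute rule confidence; this only illustrates the idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the items bought in one checkout
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "sugar"},
    {"bread", "milk", "sugar"},
    {"eggs", "sugar"},
]


def pair_support(transactions):
    """Return, for every item pair, the fraction of baskets containing both items."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, so ("bread", "milk")
        # and ("milk", "bread") are counted as the same pair
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


support = pair_support(baskets)
best_pair, best_support = max(support.items(), key=lambda kv: kv[1])
print(f"Bundle candidate: {best_pair} appears in {best_support:.0%} of baskets")
```

A high support value only flags a candidate; whether bundling is actually worthwhile (margins, stock, pricing) is still a business judgment.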
4. Main Components of BI
The main components of BI include: data sources, the data warehouse, and business
analytics (e.g. Data Mining, Online Analytical Processing (OLAP) techniques, etc.).
(Figure: main components of BI, including Data Mining, Data Visualization, and Online
Analytical Processing (OLAP).)
Data for BI solutions may come from different relevant sources, for example computerized
systems for Enterprise Resource Planning (ERP), Customer Relationship Management (CRM),
and Human Resource Management (HRM). Other sources include Google Analytics, sensors,
flat files, and spreadsheets. Transaction data, such as supermarket checkouts and bank
withdrawals, is a typical example of data that companies gather.
After data has been stored in the data warehouse, it needs to be analyzed interactively
and effectively and arranged into meaningful patterns using different tools. The analysis
can be performed with OLAP tools, data mining, or other analytical tools such as Executive
Information Systems (EIS).
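OLAP tools typically treat warehouse data as a cube: a measure (e.g. sales) indexed by dimensions (e.g. region, product, month). The sketch below, assuming a tiny hypothetical in-memory list of fact rows, shows two common OLAP operations: roll-up (aggregating away some dimensions) and slice (fixing one dimension to a single value). Real OLAP engines work over warehouse tables and often precompute aggregates; this is an illustration of the idea, not how any particular tool implements it.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, sales_amount)
facts = [
    ("North", "rice", "Jan", 120),
    ("North", "sugar", "Jan", 80),
    ("North", "rice", "Feb", 100),
    ("South", "rice", "Jan", 90),
    ("South", "sugar", "Feb", 60),
]

DIMENSIONS = ("region", "product", "month")


def roll_up(rows, keep):
    """Sum the sales measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # last column is the measure
    return dict(totals)


def slice_cube(rows, dimension, value):
    """Keep only the rows where one dimension is fixed to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]


by_region = roll_up(facts, ["region"])
jan_by_product = roll_up(slice_cube(facts, "month", "Jan"), ["product"])
print(by_region)
print(jan_by_product)
```

Combining the two operations, as in `jan_by_product`, is how an analyst drills from a broad question ("sales per region?") to a narrower one ("which products sold in January?").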
5. Data Warehouse
In order to overcome these problems, it is considered necessary to have an environment
that can bring together the essential data from the underlying heterogeneous databases.
In addition, the environment should provide facilities for users to query all the data
without worrying about where it actually resides. Such an environment is a data
warehouse. All queries are issued to the data warehouse as if it were a single database,
and the warehouse management system handles the evaluation of the queries.
Using the above analogy, we can say that a data warehouse is a centralized place to
store data (i.e. the finished products) generated by different operational systems
(i.e. the plants). Big corporations normally have a number of departments or divisions,
each of which may have its own operational system (e.g. a database). These operational
systems generate data day in and day out, and their output can be transferred to the
data warehouse for further use.
Note that: A data warehouse is a centralized repository of data from multiple sources,
while data warehousing is the process of constructing and using a data warehouse.
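The idea of issuing queries "as if it were a single database" can be sketched with Python's built-in `sqlite3` module. Here an in-memory SQLite database stands in for the warehouse, and the two departmental sources, their record layouts, and the `sales` table are all hypothetical. A real data warehouse runs on a dedicated platform; the point is only that once data from different operational systems shares one schema, a single query covers all of it.

```python
import sqlite3

# Hypothetical outputs of two departmental operational systems,
# each with its own record layout
store_system = [("2024-01-05", "Arusha", 15400), ("2024-01-06", "Mwanza", 16800)]
online_system = [{"day": "2024-01-05", "amount": 5200}]

# An in-memory SQLite database stands in for the central warehouse
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (sale_date TEXT, channel TEXT, location TEXT, amount INTEGER)"
)

# Bring both sources into one table with a shared schema
warehouse.executemany("INSERT INTO sales VALUES (?, 'store', ?, ?)", store_system)
warehouse.executemany(
    "INSERT INTO sales VALUES (:day, 'online', NULL, :amount)", online_system
)

# A single query now spans both sources; the user need not know where rows came from
daily_totals = warehouse.execute(
    "SELECT sale_date, SUM(amount) FROM sales GROUP BY sale_date ORDER BY sale_date"
).fetchall()
print(daily_totals)
```

Building the shared table is the data warehousing process; the final `SELECT` is using the warehouse.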
As briefly explained in Section 4 above, ETL (Extract, Transform and Load) is a data
integration method in which data from one or more source systems is first read (i.e.
extracted), then changed (i.e. transformed), and finally written (i.e. loaded) to a data
warehouse. The figure below depicts a simplified structure of the ETL process.
Extract:
During extraction, data is collected from disparate data sources, or a specific subset of
data is extracted from a particular source database. Data is extracted from multiple
sources with the ultimate goal of deriving meaningful business insights. This data may be
heterogeneous: it can include data from Online Transaction Processing (OLTP) systems,
social media data, log files, and sensor data, which may be structured, unstructured, or
semi-structured.
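A minimal sketch of extraction from two differently formatted sources, using only Python's standard `csv` and `json` modules. The CSV export and the JSON-lines log below are hypothetical stand-ins (held in strings so the example is self-contained); each extracted record is tagged with its source so later steps can trace where it came from.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two disparate sources
CSV_EXPORT = """order_id,customer,amount
1001,Amina,25000
1002,Baraka,18000
"""
JSON_LOG = """{"order_id": 1003, "customer": "Neema", "amount": 32000}
{"order_id": 1004, "customer": "Juma", "amount": 9000}
"""


def extract_csv(text):
    """Read rows from a CSV flat file; every value arrives as a string."""
    return [dict(row, source="csv") for row in csv.DictReader(io.StringIO(text))]


def extract_json_lines(text):
    """Read one JSON object per non-empty line, e.g. from an application log."""
    return [dict(json.loads(line), source="log")
            for line in text.splitlines() if line.strip()]


extracted = extract_csv(CSV_EXPORT) + extract_json_lines(JSON_LOG)
for record in extracted:
    print(record)
```

Notice that the CSV values arrive as strings (e.g. `"25000"`) while the JSON values keep their numeric types. Reconciling such differences is the job of the transform step.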
Transform:
The second function is transformation of the extracted data. The extracted data is first
validated: data that already conforms to the desired schema is passed on for further
processing, while data that fails validation is processed differently to make it conform
to the schema, so that it is ready for the rest of the process, which includes loading it
into the data warehouse. In other words, during the transformation phase of the ETL
process, data is processed to conform to a uniform schema accepted by the data warehouse.
Transforming data into the desired state involves operations such as formatting data,
splitting data, joining data, creating rows and columns, using lookup tables, and creating
combinations within the data.
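The validate-then-repair flow described above can be sketched as follows. The target schema, the store-code lookup table, and the raw record layout are all hypothetical. The example shows three of the listed operations: formatting (string-to-integer conversion), splitting (a full name into first and last name), and a lookup table (store code to region). It assumes every failing record uses the one known raw layout; in practice, records that cannot be repaired would be rejected or logged.

```python
# Hypothetical target schema: field name -> required Python type
SCHEMA = {"order_id": int, "first_name": str, "last_name": str,
          "region": str, "amount": int}

# Hypothetical lookup table mapping store codes to region names
REGION_LOOKUP = {"ARU": "Northern", "DSM": "Coastal"}


def conforms(record):
    """Validation: exactly the schema's fields, each with the required type."""
    return set(record) == set(SCHEMA) and all(
        isinstance(record[field], kind) for field, kind in SCHEMA.items()
    )


def repair(raw):
    """Reshape a failing record (assumes the hypothetical customer/store layout)."""
    first, _, last = raw["customer"].strip().partition(" ")  # split full name
    return {
        "order_id": int(raw["order_id"]),                      # format: str -> int
        "first_name": first,
        "last_name": last,
        "region": REGION_LOOKUP.get(raw["store"], "Unknown"),  # lookup table
        "amount": int(raw["amount"]),
    }


def transform(records):
    """Pass conforming records through unchanged; repair the rest."""
    return [r if conforms(r) else repair(r) for r in records]


raw_records = [
    {"order_id": "1001", "customer": "Amina Said", "store": "ARU", "amount": "25000"},
    {"order_id": 1002, "first_name": "Baraka", "last_name": "Mollel",
     "region": "Coastal", "amount": 18000},
]
clean = transform(raw_records)
print(clean)
```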
Load:
The final step of the ETL process is loading the transformed data into the target data
warehouse. After transformation, the data conforms to the schema required by the data
warehouse. Unlike the unstructured or semi-structured data available before the ETL
process, the data is now structured, integrated, subject-oriented, time-variant, and
non-volatile. Once this data is loaded into the data warehouse, data scientists can
analyze it, gain insights, and formulate promising business policies.
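A minimal sketch of the load step, assuming an in-memory SQLite database as a stand-in for the target warehouse and a hypothetical `fact_sales` table. Each batch is appended with a load date, reflecting the time-variant and non-volatile properties mentioned above: existing rows are never updated or deleted. Production warehouses normally use their own platform's bulk-loading facilities instead.

```python
import sqlite3


def load(connection, records, load_date):
    """Append transformed records to the fact table in a single transaction.

    Rows are only ever inserted, never updated (non-volatile); the load_date
    column records when each batch arrived (time-variant).
    """
    with connection:  # commits on success, rolls back if any insert fails
        connection.executemany(
            "INSERT INTO fact_sales (order_id, region, amount, load_date) "
            "VALUES (:order_id, :region, :amount, :load_date)",
            [dict(r, load_date=load_date) for r in records],
        )


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (order_id INTEGER, region TEXT, "
    "amount INTEGER, load_date TEXT)"
)

# Hypothetical batch, already in the schema produced by the transform step
batch = [
    {"order_id": 1001, "region": "Northern", "amount": 25000},
    {"order_id": 1002, "region": "Coastal", "amount": 18000},
]
load(warehouse, batch, "2024-01-31")

totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```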
Note that: There are several open-source and commercial ETL tools available, including
Informatica PowerCenter, Hevo, Skyvia, Xplenty, and IBM DataStage.