BI and Data Warehousing Lect 1 & 2 PDF

CIT 306: Business Intelligence and Data Warehousing

1. The Meaning of Business Intelligence

Most companies track and record thousands of transactions daily. Not just customer
purchases – which might include information such as the customer, the products/items
sold, the store from which the purchase was made, and the date and time of the purchase
– but also transactions such as warehouse activity, inventory purchases, employee hours
and time off, and daily operating costs. In fact, most companies are virtually swimming in
data.

As the amount of data stored by companies grows exponentially, companies need to
translate that data into information so they can plan future business strategies and gain
unique perspectives on their business, enabling them to become leaner and more
competitive. Company data can indicate the viability of the company’s products and help
in planning future growth. Data can therefore support important, up-to-date business
decisions aimed at maximizing revenue and reducing costs. The solution that helps
businesses make such decisions is called Business Intelligence (BI).

There are several definitions of BI. Here are two working definitions in the context of this
course:

“BI is a set of tools and techniques that enable a company to transform its
business data into timely and accurate information for the decisional
process, to be made available to the right persons in the most suitable
form.”

OR

“BI refers to the tools, technologies, applications and practices used to
collect, integrate, analyze, and present an organization’s raw data in order
to create insightful and actionable business information.”

As can be deduced from these two definitions, BI employs technologies to
undertake in-depth analysis of company data so as to provide historical, current,
and predictive views of business operations. Its overall aim is to support better
business decision-making.

There are three things that companies can do with BI:

i. Track customers’ activity patterns
ii. Track their own operations
iii. Track industry trends

Note that: Business Intelligence is not Artificial Intelligence (AI). AI systems make
decisions for their users, while BI systems help users make the right decisions based on
available data.

2. Why Is BI Important?

With business intelligence, companies have greater insight into their organization,
yielding new opportunities, corrections to existing procedures or processes, competitive
advantages, and more, including the ability to:

• Identify top-selling products by salesperson, store, or region
• Identify trends, both good and bad, early on
• Generate ad-hoc financial reports
• Track competitors in their area
• Compare information about customers, products, prices, and costs over time
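As an illustration of the first capability above, the minimal Python sketch below totals sales quantities per (group, product) pair and then picks the best-selling product in each group. The transaction records, their field layout, and the helper name `top_product_by` are hypothetical; in practice a BI tool would run this kind of aggregation against a data warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical transaction records: (salesperson, store, region, product, quantity)
SALES = [
    ("Asha", "Store A", "Dar es Salaam", "Rice", 40),
    ("Asha", "Store A", "Dar es Salaam", "Sugar", 25),
    ("Juma", "Store B", "Arusha", "Rice", 10),
    ("Juma", "Store B", "Arusha", "Sugar", 30),
    ("Neema", "Store C", "Dar es Salaam", "Rice", 15),
]

# Column positions of the grouping fields in each record.
GROUP_COLUMNS = {"salesperson": 0, "store": 1, "region": 2}

def top_product_by(sales, group_by):
    """Return {group: (product, total_quantity)} for the best-selling product in each group."""
    col = GROUP_COLUMNS[group_by]
    totals = defaultdict(int)
    for row in sales:
        totals[(row[col], row[3])] += row[4]  # sum quantity per (group, product)
    best = {}
    for (group, product), qty in totals.items():
        # Keep the product with the highest total; ties keep the first one seen.
        if group not in best or qty > best[group][1]:
            best[group] = (product, qty)
    return best

print(top_product_by(SALES, "region"))
# {'Dar es Salaam': ('Rice', 55), 'Arusha': ('Sugar', 30)}
```

Changing the `group_by` argument to `"salesperson"` or `"store"` answers the same question along a different dimension.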

The potential benefits of BI solutions include:

1) Accelerating and improving decision making
2) Optimizing internal business processes
3) Increasing operational efficiency
4) Driving new revenues
5) Gaining competitive advantages over business rivals
6) Identifying market trends
7) Spotting business problems that need to be addressed

3. Characteristics of BI

There are four main characteristics of BI solutions: accurate answers, valuable insights,
on-time information, and actionable conclusions.

i. Accurate Answers

For BI to be of any value in the decision-making process, it must correctly reflect the
objective reality of the organization and adhere to rigid standards of correctness. As
such, the first hallmark of insights produced by BI processes is their accuracy.

BI must represent the closest possible approximation of the truth. The aim is not only to
produce results, but also to protect BI’s reputation among skeptics. Without accuracy,
insights produced by BI are worse than worthless: they can harm the company. And once
that happens, nobody will ever trust BI again.

ii. Valuable Insights

BI’s goal should be not only to produce correct information, but to produce information
that has a material impact on the organization. The impact could be significantly reduced
costs, improved operations, enhanced sales, or some other positive outcome.

Further, BI should produce high-value insights that are not easily deduced otherwise. To
see why, consider this scenario: imagine that Kilimanjaro Bus invested millions of
shillings in a BI solution to analyze its traveler-history data, and after such a vast
investment the solution found that a high number of travelers are likely to travel from
Dar es Salaam to Moshi in December! The finding may be accurate, but it is hardly
valuable, since the company already knew it without any BI investment.

iii. On-Time Information

Another important characteristic of a BI solution is on-time information: BI should
eliminate information delays. For example, if a trader’s applications are slow in producing
information (analyzed data), the trader will miss opportunities to execute the most
profitable trades.

iv. Actionable Conclusions

Accurate is one thing; actionable is another. Imagine if the conclusions reached at the
end of the BI cycle were that the company would be better off if a competitor went out of
business, or if one of its factories were 10 years old instead of 30 years old.

Those ideas might be accurate, and it’s no stretch to believe that if either scenario came
to pass, it would be valuable to the company. But what, exactly, are the bosses supposed
to do about them? You can’t wish a competing company out of business. You can’t snap
your fingers and de-age a factory. These are exaggerated examples, but one of the
biggest weaknesses of decision support tools is that they reach conclusions that are not
actionable. To be actionable, a conclusion must point to a feasible course of action that
takes advantage of the situation. It has to be possible to move from conclusion to action.

Ideally, BI should produce reports that guide future actions: for instance, reports that
help executives conclude that a price should be lowered, or that two items should be sold
as a package.
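One simple way a "sell these two items as a package" conclusion might be supported is by counting how often pairs of items appear in the same purchase. The Python sketch below is a plain co-occurrence count, not a full association-rule algorithm such as Apriori; the basket data and the function name `pair_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout baskets; each set holds the items bought in one purchase.
BASKETS = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "sugar"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) are counted together.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(BASKETS)
for (a, b), n in counts.most_common(2):
    print(f"{a} + {b}: bought together in {n} of {len(BASKETS)} baskets")
# bread + milk: bought together in 3 of 4 baskets
# eggs + milk: bought together in 2 of 4 baskets
```

A frequently co-purchased pair is only a candidate for bundling; whether the package is actually worthwhile remains a business judgment.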

4. Main Components of BI

The main components of BI include data sources, a data warehouse, and business
analytics (e.g. Data Mining, Online Analytical Processing (OLAP) techniques, etc.).

[Figure: Main components of BI. Data Sources (companies gather data from relevant
sources) → Data Warehouse (collected data is filtered and stored) → Analytical Tools:
Data Mining, Online Analytical Processing (OLAP), Data Visualization (stored data is
analyzed and arranged into meaningful patterns using different tools) → Analysis Results
(knowledge is gained after analysis).]

Data for BI solutions may come from different relevant sources. Examples of such
sources are computerized systems for Enterprise Resource Planning (ERP), Customer
Relationship Management (CRM), and Human Resource Management (HRM). Others
are Google Analytics, sensors, flat files, spreadsheets, etc. Transaction data, such as
supermarket checkouts and bank withdrawals, is a typical example of data that can be
gathered by companies.

A data warehouse is a collection of corporate information and data derived from
operational systems and external data sources. Data is populated into the data
warehouse through the processes of extraction, transformation and loading (ETL), which
can be carried out with ETL tools. As the name implies, an ETL tool Extracts data from a
source, Transforms the data while in transit, and then Loads the data into a data
warehouse. Data warehousing and the ETL process are discussed further in subsequent
sections.

After data has been stored in the data warehouse, it needs to be analyzed interactively
and effectively, and arranged into meaningful patterns using different tools. The analysis
can be performed with OLAP tools, data mining, or other analytical tools (such as
Executive Information Systems, EIS).
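OLAP tools typically let analysts "roll up" a measure to a coarser level (for example, from quarters to years) and "slice" the data by fixing one dimension. The minimal Python sketch below mimics those two operations over a tiny in-memory fact table. The fact rows, the dimension names, and the helpers `roll_up` and `slice_` are hypothetical; real OLAP engines work on multidimensional cubes or SQL `GROUP BY` queries rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, revenue)
FACTS = [
    (2022, "Q1", "Dar es Salaam", 120),
    (2022, "Q2", "Dar es Salaam", 150),
    (2022, "Q1", "Arusha", 80),
    (2023, "Q1", "Dar es Salaam", 170),
    (2023, "Q1", "Arusha", 95),
]

DIMENSIONS = ("year", "quarter", "region")
REVENUE = 3  # column index of the measure

def roll_up(facts, keep):
    """Sum revenue over the dimensions named in `keep`, aggregating away the others."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[REVENUE]
    return dict(totals)

def slice_(facts, dimension, value):
    """Keep only the fact rows where `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]

print(roll_up(FACTS, ["year"]))                              # {(2022,): 350, (2023,): 265}
print(roll_up(slice_(FACTS, "region", "Arusha"), ["year"]))  # {(2022,): 80, (2023,): 95}
```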

5. Data Warehouse

A data warehouse is simply a centralized data repository that is maintained separately
from an organization’s operational databases. The motivation for building a data
warehouse is that corporate data is often scattered across different databases, possibly
in different formats. To obtain a complete piece of information, it is necessary to access
these heterogeneous databases, obtain bits and pieces of partial information from each of
them, and then put the pieces together to produce an overall picture. Without a data
warehouse, this approach is cumbersome, inefficient, ineffective, error-prone, and
usually requires considerable effort from system analysts. All these difficulties hinder the
effective use of complex corporate data, which usually represents a valuable resource of
an organization.

In order to overcome these problems, it is considered necessary to have an environment
that can bring together the essential data from the underlying heterogeneous databases.
In addition, the environment should also provide facilities for users to carry out queries on
all the data without worrying where it actually resides. Such an environment is a data
warehouse. All queries are issued to the data warehouse as if it were a single database,
and the warehouse management system handles the evaluation of the queries.
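To make the idea concrete, the Python sketch below copies sales records from two hypothetical operational systems, which store amounts and dates in different formats, into one table with a uniform schema in an in-memory SQLite database (standing in for the warehouse). Afterwards a user can query the single table without knowing which system each row came from. The source data, table name, and helper `build_warehouse` are assumptions for illustration only.

```python
import sqlite3
from datetime import datetime

# Two hypothetical operational systems that record sales in different formats.
SHOP_SYSTEM = [{"cust": "Amina", "amount_tzs": 5000, "date": "2023-12-01"}]
ONLINE_SYSTEM = [("Baraka", "7500", "01/12/2023")]  # amount as text, date as DD/MM/YYYY

def build_warehouse():
    """Copy both sources into one integrated table with a uniform schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT, amount INTEGER, sale_date TEXT, channel TEXT)"
    )
    for r in SHOP_SYSTEM:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)",
                     (r["cust"], r["amount_tzs"], r["date"], "shop"))
    for name, amount, day in ONLINE_SYSTEM:
        # Convert DD/MM/YYYY to ISO format so both sources share one date format.
        iso_day = datetime.strptime(day, "%d/%m/%Y").date().isoformat()
        conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)",
                     (name, int(amount), iso_day, "online"))
    conn.commit()
    return conn

conn = build_warehouse()
# One query over one table, regardless of where each row originated.
total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE sale_date = '2023-12-01'"
).fetchone()[0]
print(total)  # 12500
```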

A data warehouse is conceptually similar to a traditional centralized warehouse of
products within the manufacturing industry. For example, a manufacturing company may
have a number of plants and a centralized warehouse. Different plants use different raw
materials and manufacturing processes to manufacture goods. The finished products
from the plants will then be transferred to and stored in the warehouse. Any queries and
deliveries will only be made to and from the warehouse rather than the individual plants.

Using the above analogy, we can say that a data warehouse is a centralized place to
store data (i.e. the finished products) which are generated from different operational
systems (i.e. plants). For big corporations there are normally a number of different
departments/divisions, each of which may have its own operational system (e.g.
database). These operational systems generate data day in and day out, and the output
from these individual systems can be transferred to the data warehouse for further use.

Note that: A data warehouse is a centralized repository of data from multiple sources,
while data warehousing is the process of constructing and using a data warehouse.

6. The ETL Process

As briefly explained in Section 4 above, ETL (Extract, Transform and Load) is a data
integration method in which data from one or more source systems is first read (i.e.
extracted), then made to go through some changes (i.e. transformed), and finally written
(i.e. loaded) to a data warehouse. The figure below depicts a simplified structure of the
ETL process.

Extract:

During the extraction process, data is collected from disparate data sources, or a specific
subset of data is extracted from a particular source database. Extraction is done from
multiple sources with the ultimate goal of deriving meaningful business insights. This
data may be heterogeneous, including Online Transaction Processing (OLTP) data, social
media data, log files, and sensor data, which could be structured, unstructured or
semi-structured.
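A minimal sketch of the extract step, assuming two hypothetical sources: a structured CSV export from a point-of-sale system and a semi-structured JSON-lines web log. It uses only Python's standard `csv` and `json` modules; the sample text and the helper names `extract_csv` and `extract_jsonl` are illustrative assumptions. Note that the extracted records are not yet uniform, which is exactly what the transform step must deal with.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two different source systems.
POS_CSV = "store,product,qty\nStore A,Rice,3\nStore B,Sugar,5\n"
WEB_LOG_JSONL = '{"product": "Rice", "qty": 2, "channel": "web"}\n{"product": "Tea"}\n'

def extract_csv(text):
    """Read a structured CSV export into a list of dicts (all values arrive as strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_jsonl(text):
    """Read semi-structured JSON lines; records may have missing or extra fields."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

raw = extract_csv(POS_CSV) + extract_jsonl(WEB_LOG_JSONL)
print(len(raw))  # 4
```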

Transform:

The second function is the transformation of the extracted data. The extracted data is
first validated: data that already has the desired schema is processed further, while data
that fails the validation test is processed differently in order to make it conform to the
schema, and hence ready for the rest of the process, including loading into the data
warehouse. In other words, during the transformation phase of the ETL process, data is
processed to conform to a uniform schema accepted by the data warehouse. Transforming
data into the desired state includes functions such as data formatting, splitting data,
joining data, creating rows and columns, using lookup tables, or creating combinations
within the data.
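The validate-then-repair flow described above can be sketched in Python as follows. The target schema (`product`, `qty`, `channel`), the sample records, the default channel of `"store"`, and the helpers `conforms`, `repair`, and `transform` are all hypothetical; a real ETL tool would express these rules in its own configuration or mapping language.

```python
# Hypothetical target schema for each record: product (str), qty (positive int), channel (str).
TARGET_FIELDS = {"product", "qty", "channel"}

RAW_RECORDS = [
    {"store": "Store A", "product": "Rice", "qty": "3"},  # extra field, qty stored as text
    {"product": "sugar ", "qty": "2", "channel": "web"},  # untidy name, qty stored as text
    {"product": "Tea", "qty": 1, "channel": "web"},       # already conforms
    {"product": "Salt"},                                  # qty missing: cannot be repaired
]

def conforms(rec):
    """Validation test: does the record already match the target schema?"""
    return (
        set(rec) == TARGET_FIELDS
        and isinstance(rec["product"], str) and rec["product"] != ""
        and isinstance(rec["qty"], int) and rec["qty"] > 0
        and isinstance(rec["channel"], str)
    )

def repair(rec):
    """Try to reshape a failing record into the target schema; return None if impossible."""
    product = str(rec.get("product", "")).strip().title()  # formatting
    try:
        qty = int(rec["qty"])  # type conversion
    except (KeyError, TypeError, ValueError):
        return None
    channel = rec.get("channel", "store")  # assumption: records without a channel came from a store
    if not product or qty <= 0:
        return None
    return {"product": product, "qty": qty, "channel": channel}

def transform(records):
    """Split records into schema-conforming rows and rejected originals."""
    clean, rejected = [], []
    for rec in records:
        fixed = rec if conforms(rec) else repair(rec)
        if fixed is None:
            rejected.append(rec)  # a real pipeline might log or quarantine these
        else:
            clean.append(fixed)
    return clean, rejected

clean, rejected = transform(RAW_RECORDS)
print(rejected)  # [{'product': 'Salt'}]
```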

Load:

The final step of the ETL process is loading the transformed data into the target data
warehouse. After transformation, the data conforms to the schema required by the data
warehouse. Unlike the unstructured or semi-structured data available before the ETL
process, the data is now structured, integrated, subject-oriented, time-variant, and
non-volatile. Once it is loaded into the data warehouse, data scientists can analyze it,
gain insights, and create promising business policies.
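A minimal sketch of the load step, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for the target warehouse. The table name `fact_sales`, its columns, the load-date stamp, and the helper `load` are assumptions; the rows are taken to have already passed the transform step.

```python
import sqlite3

# Hypothetical rows that have already been transformed to the target schema.
CLEAN_ROWS = [
    {"product": "Rice", "qty": 3, "channel": "store"},
    {"product": "Sugar", "qty": 2, "channel": "web"},
]

def load(conn, rows, load_date):
    """Append transformed rows to the warehouse fact table, stamped with a load date."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fact_sales (
               product   TEXT    NOT NULL,
               qty       INTEGER NOT NULL CHECK (qty > 0),
               channel   TEXT    NOT NULL,
               load_date TEXT    NOT NULL)"""
    )
    # Named placeholders are filled from each dict; load_date is added to every row.
    conn.executemany(
        "INSERT INTO fact_sales VALUES (:product, :qty, :channel, :load_date)",
        [dict(row, load_date=load_date) for row in rows],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, CLEAN_ROWS, "2023-12-31")
for product, total in conn.execute(
        "SELECT product, SUM(qty) FROM fact_sales GROUP BY product ORDER BY product"):
    print(product, total)
# Rice 3
# Sugar 2
```

The `CHECK` and `NOT NULL` constraints act as a last line of defence: a row that somehow skipped validation would be rejected by the database rather than silently stored.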

Note that: There are several open-source and commercial ETL tools available. Some
of them are Informatica PowerCenter, Hevo, Skyvia, Xplenty and IBM DataStage.

7. Characteristics of Data Warehousing

The key features of a data warehouse are discussed below:

• Subject Oriented − A data warehouse is subject oriented because it provides
information around a subject rather than the organization's ongoing operations.
These subjects can be product, customers, suppliers, sales, revenue, etc. A data
warehouse does not focus on the ongoing operations, rather it focuses on
modelling and analysis of data for decision making.
• Integrated − A data warehouse is constructed by integrating data from
heterogeneous sources such as relational databases, flat files, etc. This integration
enhances the effective analysis of data.
• Time Variant − The data collected in a data warehouse is identified with a
particular time period. The data in a data warehouse provides information from the
historical point of view.
• Non-volatile − Non-volatile means that previous data is not erased when new data
is added. A data warehouse is kept separate from the operational database, and
therefore frequent changes in the operational database are not reflected in the data
warehouse.
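The time-variant and non-volatile properties can be illustrated with an append-only snapshot table: each load adds rows stamped with an "as of" date, and earlier rows are never updated or deleted, so history stays queryable even after the operational figures change. This is a hypothetical `sqlite3` sketch, not a prescribed warehouse design; the table, columns, and helper `load_snapshot` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row records a stock level *as of* a particular date (time variant).
conn.execute("CREATE TABLE stock_snapshot (product TEXT, on_hand INTEGER, as_of TEXT)")

def load_snapshot(conn, as_of, levels):
    """Append a new snapshot; earlier snapshots are never updated or deleted (non-volatile)."""
    conn.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(product, qty, as_of) for product, qty in levels.items()],
    )

load_snapshot(conn, "2023-11-30", {"Rice": 120, "Sugar": 60})
load_snapshot(conn, "2023-12-31", {"Rice": 45, "Sugar": 70})  # operational stock has changed

history = conn.execute(
    "SELECT as_of, on_hand FROM stock_snapshot WHERE product = 'Rice' ORDER BY as_of"
).fetchall()
print(history)  # [('2023-11-30', 120), ('2023-12-31', 45)]
```

An operational database would typically overwrite Rice's stock level in place; here both values survive, which is what allows the warehouse to answer historical questions.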
