
Business Intelligence and Data Warehousing Introduction

Muchake Brian
Email: bmuchake@gmail.com
Tel: 0701178573

Do not Keep Company With Worthless People


Psalms 26:11
What is Business Intelligence?
 Business Intelligence (BI) is a set of processes, tools, and technologies to transform
business data into timely and accurate information to support decisional processes.
 BI systems are used by decision makers to get comprehensive knowledge of the
business and to define and support their business strategies.
 The goal is to enable data-based decisions aimed at gaining competitive advantage,
improving operative performance, responding more quickly to changes, increasing
profitability, and, in general, creating added value for the company.
 BI is the “opposite” of Artificial Intelligence (AI).
 AI systems make decisions for the users, while BI systems help users make the right
decisions, based on the available data. Many BI techniques have roots in AI, though.
What is Business Intelligence? [Cont’d]
 BI is a broad category of applications and technologies for gathering, storing,
analysing and providing access to data to help enterprise users make better business
decisions.
 BI applications include the activities of decision support systems, query and reporting,
online analytical processing (OLAP), statistical analysis, forecasting, and data mining.
 BI applications can be:
-Mission-critical and integral to an enterprise’s operations or occasional to meet a
special requirement;
-Enterprise – wide or local to one division, department, or project;
-Centrally initiated or driven by user demand.
What is Business Intelligence? [Cont’d]
 The main goal of BI is to provide sufficient information for making business decisions.
 Depending on the aim of the business decision, BI methods can provide information
about the company's customers, market trends, the effectiveness of marketing campaigns,
and the company's competitors, or even predict future activities.
 In a business intelligence paradigm, three terms are distinguished:
1. Data: Raw streams of facts. Items that are the most elementary descriptions of things,
events, activities, and transactions. These may be internal or external.
2. Information: Organized data that has meaning and value.
3. Knowledge: Processed data or information that conveys understanding or learning
applicable to a problem or activity.
What is Business Intelligence? [Cont’d]

[Figure: BI Pyramid]
What is Business Intelligence? [Cont’d]
Example BI Queries
Q1: On October 11, 2000, find the 5 top-selling products for each product subcategory that
contributes more than 20% of the sales within its product category.
Q2: As of March 15, 1995, determine shipping priority and potential gross revenue of the orders
that have the 10 largest gross revenues among the orders that had not yet been shipped.
Consider orders from the book market segment only.
 Regular database models and systems are not suitable for these types of queries.
 BI approaches can range from a simple Excel spreadsheet to a major competitive intelligence
undertaking.
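The logic behind a query like Q1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SALES` rows, the `top_products` function, and its parameters are hypothetical and stand in for one day of sales facts. A real BI system would run this against the warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical sales facts for a single day:
# (category, subcategory, product, sales amount).
SALES = [
    ("Electronics", "Phones", "P1", 500.0),
    ("Electronics", "Phones", "P2", 300.0),
    ("Electronics", "Cables", "C1", 50.0),
    ("Electronics", "Laptops", "L1", 900.0),
    ("Books", "Fiction", "F1", 120.0),
    ("Books", "Fiction", "F2", 80.0),
    ("Books", "Maps", "M1", 10.0),
]


def top_products(sales, min_share=0.20, top_n=5):
    """Top-N products for each subcategory whose sales exceed
    min_share of the sales of its category."""
    category_total = defaultdict(float)
    subcategory_total = defaultdict(float)
    product_total = defaultdict(lambda: defaultdict(float))
    for category, subcategory, product, amount in sales:
        category_total[category] += amount
        subcategory_total[(category, subcategory)] += amount
        product_total[(category, subcategory)][product] += amount

    result = {}
    for (category, subcategory), total in subcategory_total.items():
        # Keep only subcategories contributing more than min_share.
        if total / category_total[category] > min_share:
            ranked = sorted(
                product_total[(category, subcategory)].items(),
                key=lambda item: item[1],
                reverse=True,
            )
            result[subcategory] = [name for name, _ in ranked[:top_n]]
    return result


RESULT = top_products(SALES)
print(RESULT)
```

Even this small query needs three levels of aggregation (product, subcategory, category) over the same facts, which is the kind of workload that operational database designs are not optimized for.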
Why BI for Businesses
 The Web made BI more necessary. Customers do not appear “physically” in the store,
so they can switch to other stores more easily.
 As a result, you have to know your customers using data and BI. Web logs make it
possible to analyze customer behavior in more detail than before (for example, what was
not bought?). Combine web data with traditional customer data.
 Wireless Internet adds further to this. Customers are always “online” and the customer’s
position is known. Combining position with knowledge about the customer is very valuable.
 Organizations will expect IT leaders in charge of BI and performance management
initiatives to help transform and significantly improve their business.
Why BI for Businesses
 Due to lack of information, processes, and tools, through the 21st century, more than
35% of the top 100 companies in the world are bound to fail to make insightful
decisions about significant changes in their business and markets.
 Today, it is difficult to find a successful enterprise that has not leveraged BI technology
for their business.
What are the BI Challenges
1) Complex and unusable models. Many DB models are difficult to understand and do
not focus on a single clear business purpose.
2) The same data is found in many different systems. Example: customer data is held in
many different systems, and the same concept is defined differently in each.
3) Data is suited for operational systems (accounting, billing, etc.), which do not support
analysis across business functions.
4) Data quality is bad: missing data, imprecise data, different use of systems.
5) Data is “volatile”. Data is deleted from operational systems (e.g., after 6 months) and
changes over time, so no historical information is kept.
What are the BI Challenges
 The solution to these challenges is a new analysis environment with a data warehouse
at the core, where data is:
1. Integrated (logically and physically)
2. Subject oriented (versus function oriented)
3. Supporting management decisions (different organization)
4. Stable (data is not deleted, several versions)
5. Time variant (data can always be related to time)
What is a Data Warehouse?
 The term "Data Warehouse" was first coined by Bill Inmon in 1990.
 A DWH is a subject-oriented, integrated, time-variant, historical and non-volatile collection of
data in support of management’s decision-making process (Inmon).
 A DWH is a single, complete, and consistent store of data obtained from a variety of sources
and made available to end users in a way they can understand and use in a business
context (Barry Devlin).
 A DWH is a decision support database that is maintained separately from the organization’s
operational database.
 A DWH is an integrated database that supports information processing by providing a solid
platform of consolidated, historical data for analysis.
 DWHs support business decisions by collecting, consolidating, and organizing data for
reporting and analysis with tools such as online analytical processing (OLAP) and data
mining.
What is a Data Warehouse?
 Data warehousing is the process of constructing and using a data warehouse. A data
warehouse is constructed by integrating data from multiple heterogeneous sources that
support analytical reporting, structured and/or ad hoc queries, and decision making.
 Data warehousing involves data cleaning, data integration, and data consolidations.
 Data Warehousing is a process of transforming data into information and making it available
to users in a timely enough manner to make a difference.
 Data warehousing is a process. It involves techniques for assembling and managing
data from various sources for the purpose of answering business questions, thus enabling
decisions that were not previously possible.
 Although data warehouses are built on relational database technology, the design of a data
warehouse database differs substantially from the design of an online transaction processing
system (OLTP) database.
Data Warehouse Applications
 As discussed before, a data warehouse helps business executives to organize, analyze, and
use their data for decision making.
 A data warehouse serves as an integral part of a plan-execute-assess "closed-loop" feedback
system for enterprise management.
 Data warehouses are widely used in the following fields:
1) Financial services
2) Banking services
3) Consumer goods
4) Retail sectors
5) Controlled manufacturing
DWH Features of a Data Warehouse
1. DWH—Subject-Oriented
A data warehouse is subject oriented because it provides information around a
subject rather than the organization's ongoing operations.
These subjects can be product, customers, suppliers, sales, revenue, etc.
A data warehouse does not focus on the ongoing (daily) operations, rather it focuses
on modelling and analysis of data for decision making.
It provides a simple and concise view around particular subject issues by excluding data
that are not useful in the decision support process.
DWH Features of a Data Warehouse [Cont’d]
2. DWH—Integrated
A data warehouse is constructed by integrating data from heterogeneous sources
such as relational databases, flat files, on-line transaction records etc. This integration
enhances the effective analysis of data.
Data cleaning and data integration techniques are applied when the sources are
combined.
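As an illustration of integration, the sketch below combines two hypothetical source extracts that name and encode the same customer attributes differently. The source names (`CRM_ROWS`, `BILLING_ROWS`), the field names, and the code mapping are all assumptions made for this example.

```python
# Hypothetical extracts from two source systems that describe customers
# with different field names and different gender encodings.
CRM_ROWS = [
    {"customer_id": 1, "name": "Ann", "gender": "F"},
    {"customer_id": 2, "name": "Bob", "gender": "M"},
]
BILLING_ROWS = [
    {"cust_no": "0003", "full_name": "Cara", "sex": "female"},
    {"cust_no": "0004", "full_name": "Dan", "sex": "male"},
]

# One agreed warehouse code per source spelling (assumed mapping).
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def clean_gender(raw):
    """Map any known source spelling to a single warehouse code."""
    return GENDER_CODES.get(raw.strip().lower(), "UNKNOWN")


def from_crm(row):
    """Convert a CRM row to the common warehouse layout."""
    return {
        "customer_id": int(row["customer_id"]),
        "name": row["name"],
        "gender": clean_gender(row["gender"]),
    }


def from_billing(row):
    """Convert a billing row to the common warehouse layout."""
    return {
        "customer_id": int(row["cust_no"]),
        "name": row["full_name"],
        "gender": clean_gender(row["sex"]),
    }


def integrate(crm_rows, billing_rows):
    """Combine both sources into one consistently named and coded list."""
    return [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]


CUSTOMERS = integrate(CRM_ROWS, BILLING_ROWS)
for customer in CUSTOMERS:
    print(customer)
```

After integration every customer has the same field names and the same coding, whichever system the row came from.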
DWH Features of a Data Warehouse [Cont’d]
3. Data Warehouse—Time Variant
The data collected in a data warehouse is identified with a particular time period. The
data in a data warehouse provides information from the historical point of view.
The time horizon for the data warehouse is significantly longer than that of operational
systems. An operational database holds current-value data, while a data warehouse
provides information from a historical perspective (e.g., past 5-10 years).
Every key structure in the data warehouse contains an element of time, explicitly or
implicitly.
DWH Features of a Data Warehouse [Cont’d]
4. Data Warehouse—Non-Volatile
Non-volatile means the previous data is not erased when new data is added to it. A
data warehouse is kept separate from the operational database and therefore frequent
changes in the operational database are not reflected in the data warehouse.
Remember: A data warehouse is a physically separate store of data transformed from the
operational environment. Operational update of data does not occur in the data warehouse
environment.
 It therefore does not require transaction processing, recovery, and concurrency control
mechanisms.
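The time-variant and non-volatile features can be sketched together as an append-only table in which every row carries a snapshot date. The `WarehouseTable` class and its methods are hypothetical names used only for this illustration, not part of any product.

```python
from datetime import date


class WarehouseTable:
    """Append-only store: each load adds dated rows and never
    overwrites or deletes earlier ones."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, records):
        """Append one dated snapshot; existing rows are left untouched."""
        for key, value in records.items():
            self._rows.append(
                {"snapshot_date": snapshot_date, "key": key, "value": value}
            )

    def as_of(self, key, on_date):
        """Most recent value recorded for key on or before on_date."""
        candidates = [
            r for r in self._rows
            if r["key"] == key and r["snapshot_date"] <= on_date
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r["snapshot_date"])["value"]

    def history(self, key):
        """All recorded (date, value) pairs for key, oldest first."""
        rows = [r for r in self._rows if r["key"] == key]
        rows.sort(key=lambda r: r["snapshot_date"])
        return [(r["snapshot_date"], r["value"]) for r in rows]


balances = WarehouseTable()
balances.load(date(2024, 1, 31), {"A1": 100})
balances.load(date(2024, 2, 29), {"A1": 250})  # new data, old row is kept
print(balances.history("A1"))
print(balances.as_of("A1", date(2024, 2, 15)))
```

Because rows are only ever appended, a question about any past date can still be answered, which an operational table holding only the current balance could not do.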
Why Separate a DWH From Operational Databases
 An operational database is constructed for well-known tasks and workloads such as
searching particular records, indexing, etc. In contrast, data warehouse queries are
often complex and they present a general form of data.
 Operational databases support concurrent processing of multiple transactions.
Concurrency control and recovery mechanisms are required for operational
databases to ensure robustness and consistency of the database.
 An operational database query performs both read and modify operations, while an OLAP
query needs only read-only access to stored data.
 An operational database maintains current data. On the other hand, a data
warehouse maintains historical data.
Why Implement DWHs
 An exponential increase in operational data has made computers the only tools suitable
for providing data for decision-making performed by business managers.
Why Implement DWHs [Cont’d]
 Over the years, storage and management of data from various operational systems
has become a great challenge.
 Long-term strategic planning has become increasingly important in the modern global
market. For this reason, companies have worked towards access to information at all
levels, and towards surviving and prospering in a competitive world.
 The focus of technology shifted from data input and capture through the operational
systems to information access and availability.
Benefits of a Data Warehouse
Successfully implemented data warehouses can bring the following benefits to an organization:
1. Improve Organisational Performance: Operational Systems and Data Warehouse.
2. Simplify Data Access: Make Complex Data from Many Systems Available in One.
3. Accuracy: Standardize and Cleanse data and its operations.
4. Business Value: Provide the Foundation for the Business to have access to Information to
Make Timely and Informed Decisions.
5. Direct Use: Non-IT personnel can make reports in a data warehouse.
6. Return on Investment (ROI): A DWH gives new insights into customer habits, which may
help the organisation develop new products that suit customer behaviour and thereby
sell more products.
Benefits of a Data Warehouse [Cont’d]
7. A DWH leads to cost savings and hence revenue increases.
8. A DWH facilitates cross-selling of products. Cross-selling is the sale of a different
product or service to an existing customer.
9. A DWH helps identify and target most profitable customers.
DWH Complex Queries
 Typical data warehouse queries (Case study: Banking industry)
1. Which corporate customers are above the average account usage per month and how
does this correlate to their business?
2. Who were the first hundred customers in Jan 2006 and how does this list compare with
the list for the previous three years?
3. What is the revenue by destination, by month, by business unit, by region?
Factors Affecting DWH- Implementation
1. High investment: The initial cost of building a data warehouse is very high and the ROI
cannot easily be explained.
2. Large storage: A data warehouse stores the historical data of an enterprise, which requires
large storage capacity.
3. Maintenance of source systems: If data source systems are not cleaned, dirty data
automatically flows into the data warehouse. Decision makers using such data are likely to be
misled and their decisions may lead to loss of company revenue.
4. Qualified staff: Data warehouse building and maintenance requires skilled personnel.

Types of Errors in Constructing a Data Warehouse
1. Incomplete errors
 Missing Fields
 Records or Fields That, by Design, are not Being Recorded
2. Incorrect errors
 Wrong Calculations, Aggregations
 Duplicate Records
 Wrong Information Entered into Source System
3. Inconsistency errors
 Inconsistent Use of Different Codes
 Conflicting codes
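The three error types above can be detected with simple integrity checks before data is loaded. The following is a minimal sketch: the record layout, the `VALID_COUNTRY_CODES` set, and the `find_errors` function are assumptions chosen for illustration, and only duplicates are checked under "incorrect".

```python
from collections import Counter

# Hypothetical source records; the field names are assumptions.
RECORDS = [
    {"id": 1, "country": "UG", "amount": 100.0},
    {"id": 2, "country": "Uganda", "amount": 50.0},  # inconsistent code
    {"id": 3, "country": "KE", "amount": None},      # missing field
    {"id": 1, "country": "UG", "amount": 100.0},     # duplicate record
]
VALID_COUNTRY_CODES = {"UG", "KE", "TZ"}


def find_errors(records):
    """Group record ids by the error type they exhibit."""
    errors = {"incomplete": [], "incorrect": [], "inconsistent": []}
    id_counts = Counter(r["id"] for r in records)
    reported_duplicates = set()
    for r in records:
        # Incomplete: at least one field has no value.
        if any(value is None for value in r.values()):
            errors["incomplete"].append(r["id"])
        # Incorrect: the same id appears more than once (reported once).
        if id_counts[r["id"]] > 1 and r["id"] not in reported_duplicates:
            errors["incorrect"].append(r["id"])
            reported_duplicates.add(r["id"])
        # Inconsistent: the code is not in the agreed code list.
        if r["country"] not in VALID_COUNTRY_CODES:
            errors["inconsistent"].append(r["id"])
    return errors


ERRORS = find_errors(RECORDS)
print(ERRORS)
```

Checks like these are usually run on every load so that dirty source data is caught before it reaches decision makers.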
Best Practices in DWH Construction
 Data Warehousing is a process and not a project.
 Complete requirements and design.
 Prototyping is key to business understanding.
 Utilizing proper aggregations and detailed data.
 A full iterative approach is essential.
 Training is an on-going process.
 Build data integrity checks into your system.
 Note: A data warehouse does not require transaction processing, recovery, and concurrency controls,
because it is physically stored separately from the operational database.
Basic elements of a DWH Environment
DWH-End User Community
 Data warehouses and their ultimate use are shaped profoundly by their users. The end-
user community can be characterized by looking at the type of individual who
constitutes the community. There are four types of users:
1. Farmers
 The farmer is the most predominant type of user found in a DWH environment.
 The farmer is a predictable person since he or she does the same activity on a routine
basis.
 The queries the farmer submits vary only by the data that the query is submitted for.
The same type of query is run repeatedly by the farmer.
 The queries submitted by the farmer tend to be short since he or she knows what he
or she wants and goes directly to where the data is.
DWH-End User Community [Cont’d]
 The queries that the farmer submits have a similar pattern of access. Thus if the
farmer submits lots of queries on Monday one week, then it is a good bet that the
farmer will submit lots of queries on Monday of the next week. The farmer has a high hit
rate on finding whatever he or she wants.
2. Explorer
 The explorer is the person that does not know what he or she wants. The explorer
operates in a mode of complete unpredictability. The explorer has a habit of looking at
large volumes of data.
 The explorer does not know what he or she wants until the exploration process is
begun. The explorer looks at patterns of data that may or may not exist. The explorer
operates in a heuristic mode. In a heuristic mode the explorer does not know what the
next step of analysis is going to be until the results of the current step are complete.
DWH-End User Community [Cont’d]
3. Miner
The miner is the person who digs into piles of data and determines whether the data is
saying something or not. A miner is one who is presented an assertion and is asked to
determine the validity and the strength of the assertion. The miner usually uses statistical
tools.
The miner creates queries that are enormous in size. The miner also operates in a
heuristic manner. The miner often operates closely in tandem with the explorer. The
explorer creates assertions and hypotheses, while the miner determines the strength of
those assertions and hypotheses.
The miner has special skills (usually mathematical) that set the miner apart from other
technicians.
DWH-End User Community [Cont’d]
4. Tourists
The tourist is an individual who knows where to find things. The tourist has a breadth of
knowledge, as opposed to depth of knowledge. The tourist is familiar with both formal
and informal systems. The tourist knows how to use the internet. He or she is familiar with
metadata in its many forms. The tourist knows where to find indexes and how to use them.
The tourist is as familiar with structured data as he or she is with unstructured data. The
tourist is familiar with the source code and how to read and interpret it.
Notes on the end-user community are taken from Building the Data Warehouse by W.H.
Inmon.
Data Warehouses Database Vs OLTP Database
Data warehouse database | OLTP database
Designed for analysis of business measures by categories and attributes | Designed for real-time business operations
Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table | Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table
Loaded with consistent, valid data; requires no real time validation | Optimized for validation of incoming data during transactions; uses validation data tables
Supports few concurrent users relative to OLTP | Supports thousands of concurrent users
Data Warehouse (OLAP) vs. Operational DBMS (OLTP)
 OLTP (on-line transaction processing)
 Major task of traditional relational DBMS
 Day-to-day operations such as purchasing, inventory, banking, manufacturing, payroll,
registration, accounting, etc.
 OLAP (on-line analytical processing)
 Major task of data warehouse system
 Data analysis and decision making
 Distinct features (OLTP vs. OLAP):
 User and system orientation: customer vs. market
 Data contents: current, detailed vs. historical, consolidated
Data Warehouse (OLAP) vs. Operational DBMS (OLTP) [Cont’d]
 Database design: ER + application vs. star + subject
 View: current, local vs. evolutionary, integrated
 Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP
Feature | OLTP | OLAP
users | clerk, IT professional | knowledge worker
function | day to day operations | decision support
DB design | application-oriented | subject-oriented
data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated
usage | repetitive | ad-hoc
access | read/write; index/hash on prim. key | lots of scans
unit of work | short, simple transaction | complex query
# records accessed | tens | millions
# users | thousands | hundreds
DB size | 100MB-GB | 100GB-TB
metric | transaction throughput | query throughput, response
A Data Warehouse Supports OLTP
 A DWH supports an OLTP system by providing a place for the OLTP database to offload data
as it accumulates, and by providing services that would complicate and degrade OLTP
operations if they were performed in the OLTP database.
 Without a DWH to hold historical information, data is archived to static media such as
magnetic tape, or allowed to accumulate in the OLTP database.
 If data is simply archived for preservation, it is not available or organized for use by analysts
and decision makers.
 If data is allowed to accumulate in the OLTP so it can be used for analysis, the OLTP
database continues to grow in size and requires more indexes to service analytical and report
queries.
 These queries access and process large portions of the continually growing historical data
and add a substantial load to the database. The large indexes needed to support these
queries also tax the OLTP transactions with additional index maintenance.
A Data Warehouse Supports OLTP [Cont’d]
 These queries can also be complicated to develop due to the typically complex OLTP
database schema.
 A data warehouse offloads the historical data from the OLTP, allowing the OLTP to
operate at peak transaction efficiency. High volume analytical and reporting queries are
handled by the data warehouse and do not load the OLTP, which does not need
additional indexes for their support. As data is moved to the data warehouse, it is also
reorganized and consolidated so that analytical queries are simpler and more efficient.
OLAP is a Data Warehouse Tool
 Online analytical processing (OLAP) is a technology designed to provide superior
performance for ad hoc business intelligence queries. OLAP is designed to operate
efficiently with data organized in accordance with the common dimensional model used in
data warehouses.
 A data warehouse provides a multidimensional view of data in an intuitive model designed to
match the types of queries posed by analysts and decision makers.
 OLAP organizes data warehouse data into multidimensional cubes based on this
dimensional model, and then preprocesses these cubes to provide maximum performance
for queries that summarize data in various ways.
 For example, a query that requests the total sales income and quantity sold for a range of
products in a specific geographical region for a specific time period can typically be
answered in a few seconds or less regardless of how many hundreds of millions of rows of
data are stored in the data warehouse database.
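The idea of preprocessing a cube can be sketched as follows: every combination of dimension values, including an "all" member for each dimension, is aggregated once in advance, so a summary query becomes a single lookup. This is a simplified illustration, not how any particular OLAP server is implemented. The `FACTS` rows and the `build_cube` and `total` functions are hypothetical names.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (product, region, month, sales income, quantity).
FACTS = [
    ("Soap", "Central", "2024-01", 200.0, 20),
    ("Soap", "Western", "2024-01", 100.0, 10),
    ("Salt", "Central", "2024-01", 50.0, 25),
    ("Soap", "Central", "2024-02", 300.0, 30),
]
ALL = "*"  # marker meaning "all members of this dimension"


def build_cube(facts):
    """Pre-aggregate income and quantity for every combination of
    (product or ALL, region or ALL, month or ALL)."""
    cube = defaultdict(lambda: [0.0, 0])
    for prod, region, month, income, qty in facts:
        # Each fact contributes to 2 x 2 x 2 = 8 cells of the cube.
        for key in product((prod, ALL), (region, ALL), (month, ALL)):
            cell = cube[key]
            cell[0] += income
            cell[1] += qty
    return dict(cube)


CUBE = build_cube(FACTS)


def total(product_name=ALL, region=ALL, month=ALL):
    """Answer a summary query with one lookup, no scan of the facts."""
    income, qty = CUBE.get((product_name, region, month), (0.0, 0))
    return income, qty


print(total("Soap", "Central"))   # Soap in Central, all months
print(total(month="2024-01"))     # all products and regions in one month
```

The query cost depends on the number of cube cells rather than the number of fact rows, which is the source of the fast response times described above. The price is extra storage and the time spent preprocessing.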
OLAP is a Data Warehouse Tool [Cont’d]
 OLAP is not designed to store large volumes of text or binary data, nor is it designed to
support high volume update transactions. The inherent stability and consistency of
historical data in a data warehouse enables OLAP to provide its remarkable
performance in rapidly summarizing information for analytical queries.
 In SQL Server, Analysis Services provides tools for developing OLAP applications and a
server specifically designed to service OLAP queries.
Data Mining is a Data Warehouse Tool
 Data mining is a technology that applies sophisticated and complex algorithms to
analyze data and expose interesting information for analysis by decision makers.
 Whereas OLAP organizes data in a model suited for exploration by analysts, data mining
performs analysis on data and provides the results to decision makers. Thus, OLAP
supports model-driven analysis and data mining supports data-driven analysis.
 Data mining has traditionally operated only on raw data in the data warehouse database
or, more commonly, text files of data extracted from the data warehouse database.
 In SQL Server, Analysis Services provides data mining technology that can analyze data
in OLAP cubes, as well as data in the relational data warehouse database. In addition,
data mining results can be incorporated into OLAP cubes to further enhance model-
driven analysis by providing an additional dimensional viewpoint into the OLAP model.
Data Mining is a Data Warehouse Tool [Cont’d]
 For example, data mining can be used to analyze sales data against customer attributes
and create a new cube dimension to assist the analyst in the discovery of the information
embedded in the cube data.
