Business Intelligence
29/02/2024
Class 1
Chapters:
1. Business intelligence
2. What is a data warehouse?
3. Data warehouse: a multidimensional data model
4. Data warehouse architecture
5. Power BI
6. Project
1. BI general overview
Business intelligence (BI) combines strategies, technologies, and applications to turn an organization's raw data into actionable insights:
● Strategies: This refers to the overall plan for how data will be collected, analyzed,
and used within the organization.
● Technologies: BI relies on various software tools and systems for data warehousing,
data analysis, reporting, and visualization.
● Applications: BI isn't just about collecting data; it's about using that data to solve
real-world business problems. BI applications can be used for tasks like:
○ Identifying sales trends
○ Understanding customer behavior
○ Optimizing marketing campaigns
○ Improving operational efficiency
Benefits of business intelligence
Why BI?
1. Businesses are drowning in data: Every interaction, transaction, and process within a
company generates data. Without BI, this data becomes a vast, overwhelming ocean. BI acts
as a life raft, helping businesses navigate this data ocean and find the valuable insights
hidden beneath the surface.
2. Intuition has its limits: Traditionally, businesses relied on gut feelings and experience to
make decisions. While these can be valuable, BI offers a more objective approach. By
analyzing past data and identifying trends, BI provides concrete evidence to support
decision-making, leading to more reliable and successful outcomes.
Features of BI
● Data collection and integration: BI can gather data from various sources like sales
figures, customer interactions, social media, website traffic, and financial records. It
integrates this data into a central location for easy access and analysis.
● Data analysis and reporting: BI tools provide features for data analysis using
techniques like filtering, sorting, grouping, and calculations. You can create reports
that present data in a clear and concise format, highlighting trends and patterns.
Key stages of BI
1. Data sourcing
2. Data analysis
3. Situation awareness
4. Risk analysis
5. Decision support
● Data Analysis Tools: These applications allow users to explore and analyze data in
more depth. They offer features for filtering, sorting, grouping, and calculations,
enabling users to uncover hidden patterns and relationships within the data.
● Key Performance Indicator (KPI): A measurable value that reflects how well a
specific objective or goal is being met. It's like a yardstick used to track progress
towards achieving something important.
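A KPI can be as simple as one tracked ratio. A minimal sketch, assuming a hypothetical conversion-rate goal:

```python
# Illustrative counts; the 2.5% target is an assumption, not a benchmark.
visitors = 5000
purchases = 150
target = 0.025

# The KPI itself: share of visitors who purchased.
conversion_rate = purchases / visitors
on_track = conversion_rate >= target
print(f"Conversion rate: {conversion_rate:.1%}, on track: {on_track}")
```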
Technologies:
● Data Warehousing: Data warehouses act as central repositories that store large
volumes of data from various sources in a structured and organized way. This ensures
data consistency and simplifies analysis for BI applications.
● Data Mining: Data mining techniques uncover hidden patterns and trends within
large datasets. BI applications can leverage data mining to identify new customer
segments, predict future sales trends, or detect fraudulent activity.
By using these BI applications and technologies, companies can transform raw data into
actionable insights that empower better decision-making.
2. Data warehouse
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection
of data in support of management's decision-making process.
● Subject-oriented: Data is organized around the major business subjects, such as
customers, products, or sales, rather than around day-to-day operations and
transactions.
● Integrated: Data from multiple heterogeneous sources is consolidated, with
consistent naming conventions, formats, and encodings.
● Time-variant: This means the data warehouse stores historical data alongside current
data. It captures information over time, allowing businesses to analyze trends and
patterns.
● Non-volatile: Data in a data warehouse is primarily intended for reading and analysis,
not frequent updates or deletion. Once data is loaded into the data warehouse, it
typically remains there for historical reference. This allows for a consistent picture of
the business over a period of time. Think of it like a library archive - information is
preserved for future reference.
● Simplified Analysis: Data analysts can access all relevant data from one place,
eliminating the need to navigate through multiple disparate systems. This simplifies
the process of data analysis and reporting.
● Faster Query Performance: Data warehouses are optimized for querying large
datasets. This allows for faster and more efficient analysis, enabling users to get
insights quickly.
N.B: Data warehouses are not operational databases: While they store data, they are not
designed for real-time transactions. Their primary purpose is to support data analysis and
reporting for BI purposes.
Data warehouse vs. operational database
Data warehouse:
● Purpose: Focused on business intelligence (BI) and data analysis. Stores historical
data from various sources to identify trends, analyze patterns, and support
decision-making.
● Data Type: Integrated data, meaning data is transformed into a consistent format
from various sources (relational databases, flat files, etc.) Organized around business
subjects like sales or finance.
● Schema: Typically uses a star schema or snowflake schema for optimized querying
and analysis. These schemas prioritize ease of analysis over strict data normalization
practices found in relational databases.
● Updates: Data is updated periodically (batch loads) to ensure data quality and
consistency.
● Access: Designed for data analysts, business users, and decision-makers for
querying and exploration. User base is typically smaller than an operational database.
● Performance: Optimized for complex queries on large datasets for historical
analysis.
Heterogeneous DBMS:
● Purpose: Manages and interacts with a collection of disparate databases that use
different schemas and data models. Focuses on providing a unified view and access
point to these heterogeneous databases.
● Data Type: Manages data in its native format, meaning data from each source
database retains its original schema and structure.
● Schema: Doesn't enforce a specific schema. The heterogeneous DBMS acts as a layer
on top of existing databases, allowing them to function independently.
● Updates: Updates are handled by the underlying individual databases. The
heterogeneous DBMS itself might not perform updates directly.
● Access: Designed for a wider range of users who need to access data from various
sources, including application developers and data analysts. User base can be larger
and more diverse.
● Performance: Query performance can be slower compared to data warehouses due to
the need to translate queries across different database schemas
The Knowledge Discovery in Databases (KDD) process outlines the steps involved in
extracting valuable knowledge from large datasets:
1. Data Selection: This involves identifying and selecting the relevant data for your
analysis task. Data can come from various sources like sales figures, customer
records, social media feeds, sensor readings, and more. The key is to choose data that
aligns with your specific business goals and avoids irrelevant information.
2. Data Cleaning: Real-world data is often messy and inconsistent. This stage involves
cleaning the data to identify and rectify errors, missing values, and inconsistencies.
Common data cleaning tasks include removing duplicates, correcting typos, and
handling outliers.
3. Data Integration: Data for analysis may be scattered across different databases or
systems. This stage integrates data from various sources into a unified format. The
data is transformed to ensure consistency and facilitate seamless analysis.
4. Data warehouse: The data warehouse is designed for efficient storage and retrieval
of large datasets. It allows for historical data to be kept alongside current data,
providing a long-term view for analysis.
5. Data Mining: This is the core stage where hidden patterns and trends are extracted
from the data. Data mining techniques like classification, clustering, regression, and
association rule learning are used to uncover insights that might not be evident
through simple queries.
6. Pattern Evaluation: Not all discovered patterns are equally valuable. This stage
involves evaluating the patterns based on criteria like interestingness, usefulness, and
actionability. You identify the patterns that hold the most significance for your
business goals.
7. Knowledge Representation: The valuable patterns are then presented in a way that's
understandable to human users. This could involve using tables, charts, graphs, or
even visualizations.
8. Knowledge Discovery: This is the final stage where the knowledge gained from the
analysis is integrated into the overall business knowledge. The insights are used to
inform decision-making, improve processes, or develop new products or services.
Overall, the KDD process is a systematic approach to transforming raw data into
actionable knowledge. By following these steps, businesses can unlock the hidden
potential within their data and make data-driven decisions for better outcomes.
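The cleaning and integration stages above can be sketched in plain Python; the sources, field names, and typo map below are all illustrative:

```python
# Two hypothetical sources with different schemas.
crm = [
    {"cust_name": "Alice", "city": "Berln"},  # typo to correct
    {"cust_name": "Alice", "city": "Berln"},  # duplicate to drop
]
web = [{"name": "Bob", "city": "Munich"}]

corrections = {"Berln": "Berlin"}  # illustrative typo map

# Integration: map both sources into one consistent schema.
rows = (
    [{"name": r["cust_name"], "city": r["city"]} for r in crm]
    + [{"name": r["name"], "city": r["city"]} for r in web]
)

# Cleaning: fix known typos and remove duplicates.
seen, clean = set(), []
for r in rows:
    r["city"] = corrections.get(r["city"], r["city"])
    key = (r["name"], r["city"])
    if key not in seen:
        seen.add(key)
        clean.append(r)

print(clean)  # [{'name': 'Alice', 'city': 'Berlin'}, {'name': 'Bob', 'city': 'Munich'}]
```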
QUESTION : why create and use data warehouses for OLAP instead of using the already
existing databases
14/03/2024
Class 2
Data cube: A data cube allows data to be modeled and viewed in multiple dimensions. It is
defined by dimensions and facts.
While the term "cube" suggests a 3D shape, a data cube can have any number of dimensions.
For simplicity, a 3D data cube is taken as an example. Imagine a box with:
● Base: Represents two of your chosen dimensions (e.g., product category and
customer segment).
● Height: Represents the third dimension (e.g., time: year, quarter, month).
● Each cell within the cube: Represents a specific aggregation (calculation) of
your data at the intersection of those three dimensions. Common
aggregations include sum (total sales), average (average price), or count
(number of transactions).
Example:
Imagine a retail business with a data cube that analyzes sales data. The base of the cube
might have:
● Product Category: (e.g., Clothing, Electronics, Home Goods)
● Customer Segment: (e.g., Loyal Customers, New Customers, Discount Seekers)
Each cell within this data cube would then hold a specific value based on the intersection of
these dimensions. For instance, a particular cell might represent the "Total Sales of Clothing
to Loyal Customers in the last Quarter."
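The cube cells from this example can be sketched as a Python dictionary keyed by the three dimensions (the transaction values are made up):

```python
from collections import defaultdict

# Hypothetical transactions with three dimensions: category, segment, quarter.
transactions = [
    ("Clothing", "Loyal", "Q1", 200),
    ("Clothing", "Loyal", "Q1", 150),
    ("Clothing", "New",   "Q1", 80),
    ("Electronics", "Loyal", "Q1", 500),
]

# Each cube cell = one (category, segment, quarter) key; the measure is total sales.
cube = defaultdict(int)
for category, segment, quarter, amount in transactions:
    cube[(category, segment, quarter)] += amount

print(cube[("Clothing", "Loyal", "Q1")])  # 350
```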
Star schema: The star schema is a prominent design pattern used in conceptual modeling for
data warehouses. It's a specific way of organizing the data subjects, dimensions, and
measures identified during the conceptual modeling phase.
Structure:
● A star schema resembles a star symbol, with the fact table at its center. The fact table
stores the core data (measures) of interest for analysis, often including foreign keys
that link to the surrounding dimension tables.
● Dimension tables: These tables radiate outwards from the fact table, each
representing a specific dimension (descriptive attribute) relevant to the analysis.
Dimension tables typically have a primary key and may also have other relevant
attributes related to the dimension.
● Efficient Query Performance: The structure of the star schema optimizes query
performance for analytical workloads. Queries often focus on retrieving data from the
fact table based on filters applied to the dimensions.
● Scalability: New dimension tables can be easily added to the star schema to
accommodate additional data subjects or analysis needs without affecting the existing
structure.
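A star schema like the one described can be sketched with Python's built-in sqlite3 module; all table and column names here are illustrative, not a standard:

```python
import sqlite3

# One central fact table plus two dimension tables, linked by foreign keys.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, time_id INTEGER, amount REAL);

    INSERT INTO dim_product VALUES (1, 'Clothing'), (2, 'Electronics');
    INSERT INTO dim_time    VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO fact_sales  VALUES (1, 1, 200), (1, 1, 150), (2, 1, 500), (1, 2, 90);
""")

# A typical analytical query: filter on dimensions, aggregate the fact measure.
row = con.execute("""
    SELECT p.category, t.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_time    t ON f.time_id = t.time_id
    WHERE p.category = 'Clothing' AND t.quarter = 'Q1'
    GROUP BY p.category, t.quarter
""").fetchone()
print(row)  # ('Clothing', 'Q1', 350.0)
```

Note how each dimension joins to the fact table in exactly one hop, which is what keeps star-schema queries fast.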
A snowflake schema extends the star schema concept by further normalizing the dimension
tables. In a star schema, dimension tables may contain redundant data to optimize query
performance. A snowflake schema breaks down these dimension tables into multiple, more
granular tables with smaller sizes and reduced redundancy.
Query: A query is a specific question or request posed to the system to retrieve data.
Query performance: This refers to how fast and efficiently a database system can process and
return the results of a query.
Redundant data: Data that is duplicated or stored in multiple places within a database.
N.B: The snowflake structure can reduce the effectiveness of browsing, since more joins will
be needed.
Imagine a star schema where the "Customer" dimension table stores both customer
demographics (name, address) and purchase history details (items bought, dates). In a
snowflake schema, you might create separate tables for "Customer Demographics" and
"Customer Transactions," linked by a common customer ID.
● Improved Data Maintainability: Normalized tables are often easier to maintain and
update as the business evolves. Adding new customer attributes becomes simpler as
you can create a new table without affecting existing tables.
● Increased Query Complexity: While snowflake schemas can improve data integrity,
they can also lead to more complex queries compared to star schemas. Queries may
need to join multiple tables to retrieve the necessary data.
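Continuing the customer example, a minimal sketch of the extra join a snowflake schema introduces (names and data are illustrative):

```python
# Snowflake sketch: the Customer dimension is split into two normalized
# tables linked by customer_id.
customer_demographics = {101: {"name": "Alice", "city": "Berlin"}}
customer_transactions = [
    {"customer_id": 101, "item": "Coat", "date": "2024-03-01"},
]

# Answering "what did Alice buy?" now requires an extra join step.
report = [
    {**customer_demographics[t["customer_id"]], "item": t["item"]}
    for t in customer_transactions
]
print(report)  # [{'name': 'Alice', 'city': 'Berlin', 'item': 'Coat'}]
```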
The decision between a star schema and a snowflake schema depends on several factors, such
as query performance requirements, tolerance for redundant data, and maintenance complexity.
Fact constellations:
● In a fact constellation approach, you would create separate fact tables for "Sales,"
"Website Traffic," and "Customer Service Interactions," but they would all share some
common dimension tables (e.g., product, customer, time). This allows for
flexibility in analyzing data from different perspectives.
● Increased Complexity: Managing and querying data warehouses with multiple fact
tables can be more complex compared to simpler star schema designs.
● Potential Performance Impact: Joins across multiple fact tables and dimension
tables can affect query performance compared to a single fact table in a star schema.
Fact constellations are a good choice for data warehouses that model multiple related
business processes sharing common dimensions.
Concept hierarchies
Concept hierarchies may also be defined by grouping values for a given dimension or
attribute, resulting in a set-grouping hierarchy.
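A set-grouping hierarchy can be sketched as a simple mapping from raw values to labeled ranges (the price bands below are made up):

```python
# Set-grouping hierarchy: raw price values grouped into labeled bands,
# so analysis can happen at the band level instead of per-price.
def price_band(price):
    if price < 50:
        return "budget"
    if price < 200:
        return "mid-range"
    return "premium"

prices = [20, 75, 450]
print([price_band(p) for p in prices])  # ['budget', 'mid-range', 'premium']
```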
OLAP operations
1. Drill Down:
● This operation allows users to navigate deeper into a specific dimension of the data.
Imagine zooming in on a map.
● For instance, if you're analyzing sales data, you could start by looking at total sales
for all products. Then, you could drill down into the "Product" dimension to see sales
figures for specific product categories or even individual products.
2. Roll Up:
● The opposite of drill down: data is aggregated by climbing up a concept hierarchy,
e.g., moving from monthly sales figures to quarterly or yearly totals.
3. Slice:
● Selects a single value for one dimension, producing a sub-cube, e.g., sales for only
the year 2024 across all products and regions.
4. Dice:
● Selects specific values on two or more dimensions at once, producing a smaller
sub-cube, e.g., sales of Clothing and Electronics in the North region during Q1 and Q2.
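The operations above can be sketched over a tiny fact list in plain Python (data and dimension names are made up):

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, amount).
facts = [
    (2023, "Q1", "North", 100), (2023, "Q2", "North", 120),
    (2023, "Q1", "South", 90),  (2024, "Q1", "North", 130),
]

# Roll up: aggregate quarters up to the year level.
by_year = defaultdict(int)
for year, quarter, region, amount in facts:
    by_year[year] += amount

# Slice: fix one dimension (region = "North") to get a sub-cube.
north = [f for f in facts if f[2] == "North"]

# Dice: fix values on several dimensions at once.
diced = [f for f in facts if f[0] == 2023 and f[2] in ("North", "South")]

print(dict(by_year))  # {2023: 310, 2024: 130}
```

Drill down is the reverse of the roll up shown: grouping by (year, quarter) instead of year alone.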
The starnet query model
The starnet query model is a way to visualize how users can query and analyze data
within a multidimensional data warehouse.
It serves as a visual guide for understanding how users can navigate and analyze data
across different levels of detail within each dimension of the data warehouse. It also helps
users understand how dimensions and their hierarchies are structured, enabling them to
filter and drill down into specific details during analysis.
Data warehouse: a multi-tiered architecture
A multi-tiered architecture organizes the data warehouse system into distinct layers, each
with specific functionalities.
● Data sources: This is the starting point. It encompasses the various operational systems within the
organization that generate data, such as sales transaction systems, customer
relationship management (CRM) systems, financial systems, sensor data, and social
media feeds.
● Staging area: Data from various sources is first extracted and landed here. This
temporary storage area serves as a buffer zone for cleansing, transforming, and
validating the data before it's loaded into the data warehouse.
● Data integration layer: This layer is responsible for extracting data from the source systems,
transforming it into a consistent format, and handling any inconsistencies or errors.
This layer may also involve techniques like data cleansing, deduplication, and schema
mapping.
● Data warehouse layer: This is the core layer of the architecture, where the cleansed and transformed data
resides. The data warehouse is typically organized using schemas like star schema or
snowflake schema to optimize storage and retrieval for analysis.
● Front-end (presentation) layer: This layer provides users with tools and interfaces to access, analyze, and visualize
data stored within the data warehouse. These tools can include OLAP (Online
Analytical Processing) tools for multidimensional analysis, data mining tools for
uncovering hidden patterns, and reporting tools for creating dashboards and
visualizations.
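The flow from source systems through staging into the warehouse layer can be sketched as a toy ETL pipeline; every function and field name here is illustrative:

```python
# Extract: pull raw rows from a "source system" (hard-coded here).
def extract():
    return [{"id": 1, "amt": "120"}, {"id": 2, "amt": "  90 "}]

# Transform (staging): cleanse and cast into a consistent format.
def transform(rows):
    return [{"id": r["id"], "amount": float(r["amt"].strip())} for r in rows]

# Load: append the cleansed rows into the warehouse layer.
def load(rows, warehouse):
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'id': 1, 'amount': 120.0}, {'id': 2, 'amount': 90.0}]
```

Keeping the three steps as separate functions mirrors the modularity benefit described below: each tier can be tested and replaced independently.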
1. Scalability: Various components can be added, deleted, or updated in accordance with
the data warehouse’s shifting needs and specifications.
2. Better Performance: The separate layers enable parallel and efficient processing,
which improves performance and response times.
3. Modularity: The architecture supports modular design, which facilitates the creation,
testing, and deployment of separate components.
4. Security: The data warehouse’s overall security can be improved by applying various
security measures to various layers.
5. Improved Resource Management: Different tiers can be tuned to use the proper
hardware resources, cutting expenses overall and increasing effectiveness.
Data mart: A data mart is a focused subset of a data warehouse or data source that caters to
the specific needs of a particular department or business unit. For example, a marketing data
mart may limit its topics to customers, goods, and sales.
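A data mart can be sketched as a filtered subset of warehouse records; the subjects below mirror the marketing example and are otherwise made up:

```python
# A tiny "warehouse" of records tagged by business subject.
warehouse = [
    {"subject": "sales",     "fact": "order 1001"},
    {"subject": "customers", "fact": "segment update"},
    {"subject": "hr",        "fact": "payroll run"},
]

# The marketing data mart keeps only its relevant subjects.
marketing_topics = {"sales", "customers", "goods"}
marketing_mart = [r for r in warehouse if r["subject"] in marketing_topics]
print(len(marketing_mart))  # 2
```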