
Business intelligence

29/02/2024
Class 1

Chapters:
1. Business intelligence
2. What is data warehouse
3. Data warehouse: a multidimensional data model
4. Data warehouse architecture
5. Power BI
6. Project

1. BI general overview

What is business intelligence?

➢ Business intelligence combines business analytics, data mining, data visualization, data tools and infrastructure, and best practices to help organizations make more data-driven decisions.

➢ Business intelligence (BI) is a comprehensive approach that utilizes strategies, technologies, and applications to analyze and manage data for business purposes.

● Strategies: This refers to the overall plan for how data will be collected, analyzed,
and used within the organization.
● Technologies: BI relies on various software tools and systems for data warehousing,
data analysis, reporting, and visualization.
● Applications: BI isn't just about collecting data; it's about using that data to solve
real-world business problems. BI applications can be used for tasks like:
○ Identifying sales trends
○ Understanding customer behavior
○ Optimizing marketing campaigns
○ Improving operational efficiency
Benefits of business intelligence

Improved decision-making: BI empowers businesses to move beyond intuition and guesswork by providing data-driven insights.
Enhanced performance: By analyzing data on various aspects like sales, marketing,
customer behavior, and operations, businesses can identify areas for improvement and
optimize their strategies.
Identification of trends and opportunities: BI helps uncover hidden trends in the market
and anticipate customer needs.
Increased customer satisfaction: Through data analysis, businesses can gain a deeper
understanding of their customer base.
Reduced costs: BI can help identify areas of waste and inefficiency in operations. By
optimizing processes based on data insights, businesses can lower costs and improve their
bottom line.

Why BI?
1. Businesses are drowning in data: Every interaction, transaction, and process within a
company generates data. Without BI, this data becomes a vast, overwhelming ocean. BI acts
as a life raft, helping businesses navigate this data ocean and find the valuable insights
hidden beneath the surface.

2. Intuition has its limits: Traditionally, businesses relied on gut feelings and experience to
make decisions. While these can be valuable, BI offers a more objective approach. By
analyzing past data and identifying trends, BI provides concrete evidence to support
decision-making, leading to more reliable and successful outcomes.

Features of BI

● Data collection and integration: BI can gather data from various sources like sales
figures, customer interactions, social media, website traffic, and financial records. It
integrates this data into a central location for easy access and analysis.

● Data warehousing and management: BI utilizes data warehouses, which are centralized repositories that store large volumes of data in a structured and organized way. This ensures data consistency and simplifies analysis.

● Data analysis and reporting: BI tools provide features for data analysis using techniques like filtering, sorting, grouping, and calculations. You can create reports that present data in a clear and concise format, highlighting trends and patterns.

● Data visualization: BI tools excel at transforming complex data sets into easy-to-understand charts, graphs, and maps. These visualizations make it easier to identify trends, communicate insights effectively, and spot areas for improvement.

● Predictive analytics: Advanced BI solutions offer features for predictive analytics. This leverages machine learning and statistical techniques to forecast future trends and anticipate customer behavior. This allows businesses to make proactive decisions and optimize strategies.
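To make the idea of forecasting concrete, here is a minimal sketch of one of the simplest statistical techniques: fitting a straight-line trend to past values with ordinary least squares and extending it one period ahead. The monthly sales figures and the function names `fit_linear_trend` and `forecast` are hypothetical; real BI tools typically use richer models (seasonality, machine learning).

```python
def fit_linear_trend(values):
    """Fit y = a + b*t by ordinary least squares to values observed at t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    # Slope = covariance(t, y) / variance(t); intercept passes through the means.
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept, slope

def forecast(values, steps_ahead=1):
    """Extrapolate the fitted trend line steps_ahead periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    t_next = len(values) - 1 + steps_ahead
    return intercept + slope * t_next

monthly_sales = [100, 110, 120, 130]  # hypothetical data
print(forecast(monthly_sales))  # 140.0
```

A straight line is only appropriate when the trend is roughly linear; it ignores seasonality and sudden changes.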

Key stages of BI
1. Data sourcing
2. Data analysis
3. Situation awareness
4. Risk analysis
5. Decision support

BI applications
BI applications and technologies can help companies make decisions.
Applications:

● Dashboards and Reporting Tools: These applications provide real-time or up-to-date visual summaries of key metrics and trends. Decision-makers can quickly see how different parts of the business are performing and identify areas requiring attention.

● Data Analysis Tools: These applications allow users to explore and analyze data in
more depth. They offer features for filtering, sorting, grouping, and calculations,
enabling users to uncover hidden patterns and relationships within the data.

● Online Analytical Processing (OLAP) Tools: These applications facilitate multi-dimensional data analysis. Users can drill down into specific details or pivot data to view it from different angles, allowing for a more comprehensive understanding of complex business issues.

● Key Performance Indicator (KPI): A KPI is a measurable value that reflects how well a specific objective or goal is being met. It's like a yardstick you use to track progress towards achieving something important.
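A KPI is usually read against a target. As a minimal sketch, assuming the KPI is expressed as the percentage of a target achieved so far (the function name `kpi_attainment` and the sales figures are hypothetical):

```python
def kpi_attainment(actual, target):
    """Return the percentage of the target that has been achieved."""
    if target == 0:
        raise ValueError("target must be non-zero")
    # Multiply before dividing so whole-number inputs give an exact result.
    return actual * 100 / target

# Hypothetical quarterly sales objective: 500,000 target, 425,000 achieved so far.
print(f"{kpi_attainment(425_000, 500_000):.1f}% of target")  # 85.0% of target
```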

Technologies:

● Data Warehousing: Data warehouses act as central repositories that store large
volumes of data from various sources in a structured and organized way. This ensures
data consistency and simplifies analysis for BI applications.

● Data Mining: Data mining techniques uncover hidden patterns and trends within
large datasets. BI applications can leverage data mining to identify new customer
segments, predict future sales trends, or detect fraudulent activity.

By using these BI applications and technologies, companies can transform raw data into
actionable insights that empower them to:

● Identify and prioritize business opportunities
● Make data-driven decisions with greater confidence
● Optimize marketing campaigns and resource allocation
● Improve operational efficiency and reduce costs
● Gain a competitive edge in the marketplace

2. Data warehouse
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

● Subject-oriented: This means the data warehouse is organized around specific business subjects (topics or areas of interest) like sales, customers, products, finance, or marketing. Data relevant to each subject is stored together, making it easier for analysts to focus on the information they need.

● Integrated: Data is collected from various sources within an organization, such as sales databases, customer relationship management (CRM) systems, website analytics, and financial records. The data warehouse integrates this data from different sources into a consistent format. This eliminates inconsistencies and allows for easier analysis across different departments.

● Time-variant: This means the data warehouse stores historical data alongside current
data. It captures information over time, allowing businesses to analyze trends and
patterns.

● Non-volatile: Data in a data warehouse is primarily intended for reading and analysis,
not frequent updates or deletion. Once data is loaded into the data warehouse, it
typically remains there for historical reference. This allows for a consistent picture of
the business over a period of time. Think of it like a library archive - information is
preserved for future reference.
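The integrated, time-variant, and non-volatile properties can be illustrated with a small Python sketch. The two source systems, their field names, and the country mapping are hypothetical; the point is that records are converted into one consistent format, stamped with a load date, and only ever appended, never updated or deleted.

```python
from datetime import date, datetime

# Hypothetical extracts from two source systems that describe the same customer
# with different field names, date formats and country spellings.
crm_rows = [{"cust": "C1", "country": "USA", "signup": "29/02/2024"}]
sales_rows = [{"customer_id": "C1", "country": "United States",
               "order_date": "2024-03-14", "amount": 250.0}]

# Assumed mapping from source spellings to one standard country code.
COUNTRY_CODES = {"USA": "US", "United States": "US"}

def integrate(crm_rows, sales_rows):
    """Integrated: convert records from both sources into one consistent format."""
    unified = []
    for r in crm_rows:
        unified.append({
            "customer_id": r["cust"],
            "country": COUNTRY_CODES[r["country"]],
            "event": "signup",
            "event_date": datetime.strptime(r["signup"], "%d/%m/%Y").date(),
            "amount": 0.0,
        })
    for r in sales_rows:
        unified.append({
            "customer_id": r["customer_id"],
            "country": COUNTRY_CODES[r["country"]],
            "event": "order",
            "event_date": date.fromisoformat(r["order_date"]),
            "amount": r["amount"],
        })
    return unified

# Time-variant and non-volatile: rows are only appended, each stamped with
# its load date, and earlier rows are never modified or deleted.
warehouse = []

def load(rows, load_date):
    """Batch-load rows into the warehouse without touching existing rows."""
    for row in rows:
        warehouse.append({**row, "load_date": load_date})

load(integrate(crm_rows, sales_rows), load_date=date(2024, 3, 15))
```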

Benefits of Data Warehouses for BI:

● Improved Data Quality: By centralizing data and enforcing consistency, data warehouses ensure everyone in the organization is working with the same accurate information.

● Simplified Analysis: Data analysts can access all relevant data from one place,
eliminating the need to navigate through multiple disparate systems. This simplifies
the process of data analysis and reporting.

● Faster Query Performance: Data warehouses are optimized for querying large
datasets. This allows for faster and more efficient analysis, enabling users to get
insights quickly.

● Supports Historical Analysis: Data warehouses store historical data alongside current data. This allows businesses to analyze trends and patterns over time, providing valuable insights into customer behavior, market changes, and overall performance.

● Subject-Oriented View: Data warehouses can be structured around specific business subjects, like sales, marketing, or finance. This allows analysts to focus on relevant data for their area of expertise.

N.B: Data warehouses are not operational databases: While they store data, they are not
designed for real-time transactions. Their primary purpose is to support data analysis and
reporting for BI purposes.

Data Warehouse vs. Operational Database:

| Feature | Data Warehouse | Operational Database |
|---|---|---|
| Purpose | Analyze historical data for trends and insights | Support day-to-day business operations and transactions |
| Data Type | Integrated data from various sources, subject-oriented | Current, detailed data specific to its system |
| Update Frequency | Relatively infrequent updates (batch loads) | Frequent updates (real-time or near real-time) |
| Focus | Trends, patterns, historical analysis | Day-to-day operations, efficiency, accuracy |
| Access | Designed for data analysis by a limited number of users | Designed for frequent transactions by a large number of users |
| Optimization | Optimized for querying large datasets | Optimized for fast insertions and updates |
| Example Systems | BigQuery, Redshift, Snowflake | Oracle, SQL Server, MySQL |

Data Warehouse vs. Heterogeneous DBMS (Heterogeneous Database Management System)
Data Warehouse:

● Purpose: Focused on business intelligence (BI) and data analysis. Stores historical
data from various sources to identify trends, analyze patterns, and support
decision-making.
● Data Type: Integrated data, meaning data from various sources (relational databases, flat files, etc.) is transformed into a consistent format. It is organized around business subjects like sales or finance.
● Schema: Typically uses a star schema or snowflake schema for optimized querying
and analysis. These schemas prioritize ease of analysis over strict data normalization
practices found in relational databases.
● Updates: Data is updated periodically (batch loads) to ensure data quality and
consistency.

● Access: Designed for data analysts, business users, and decision-makers for
querying and exploration. User base is typically smaller than an operational database.
● Performance: Optimized for complex queries on large datasets for historical
analysis.

Heterogeneous DBMS:

● Purpose: Manages and interacts with a collection of disparate databases that use
different schemas and data models. Focuses on providing a unified view and access
point to these heterogeneous databases.
● Data Type: Manages data in its native format, meaning data from each source
database retains its original schema and structure.
● Schema: Doesn't enforce a specific schema. The heterogeneous DBMS acts as a layer
on top of existing databases, allowing them to function independently.
● Updates: Updates are handled by the underlying individual databases. The
heterogeneous DBMS itself might not perform updates directly.
● Access: Designed for a wider range of users who need to access data from various
sources, including application developers and data analysts. User base can be larger
and more diverse.
● Performance: Query performance can be slower compared to data warehouses due to the need to translate queries across different database schemas.

The Knowledge Discovery in Databases (KDD) process outlines the steps involved in extracting valuable knowledge from large datasets:

1. Data Selection: This involves identifying and selecting the relevant data for your
analysis task. Data can come from various sources like sales figures, customer
records, social media feeds, sensor readings, and more. The key is to choose data that
aligns with your specific business goals and avoids irrelevant information.

2. Data Cleaning: Real-world data is often messy and inconsistent. This stage involves
cleaning the data to identify and rectify errors, missing values, and inconsistencies.
Common data cleaning tasks include removing duplicates, correcting typos, and
handling outliers.

3. Data Integration: Data for analysis may be scattered across different databases or
systems. This stage integrates data from various sources into a unified format. The
data is transformed to ensure consistency and facilitate seamless analysis.

4. Data warehouse: The data warehouse is designed for efficient storage and retrieval
of large datasets. It allows for historical data to be kept alongside current data,
providing a long-term view for analysis.
5. Data Mining: This is the core stage where hidden patterns and trends are extracted
from the data. Data mining techniques like classification, clustering, regression, and
association rule learning are used to uncover insights that might not be evident
through simple queries.

6. Pattern Evaluation: Not all discovered patterns are equally valuable. This stage
involves evaluating the patterns based on criteria like interestingness, usefulness, and
actionability. You identify the patterns that hold the most significance for your
business goals.

7. Knowledge Representation: The valuable patterns are then presented in a way that's understandable to human users. This could involve using tables, charts, graphs, or other visualizations.

8. Knowledge Discovery: This is the final stage where the knowledge gained from the
analysis is integrated into the overall business knowledge. The insights are used to
inform decision-making, improve processes, or develop new products or services.

Overall, the KDD process is a systematic approach to transforming raw data into
actionable knowledge. By following these steps, businesses can unlock the hidden
potential within their data and make data-driven decisions for better outcomes.
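The first two KDD steps, data selection and data cleaning, can be sketched in a few lines of Python. The records and field names are hypothetical; this sketch keeps only the relevant fields, normalizes inconsistent spellings, drops records with a missing measure, and removes duplicates. Real pipelines would also handle outliers and use more careful rules for missing values.

```python
# Hypothetical raw records pulled from an operational system.
raw = [
    {"order_id": 1, "region": "North ", "amount": 120.0, "clerk_note": "ok"},
    {"order_id": 1, "region": "North ", "amount": 120.0, "clerk_note": "ok"},  # duplicate
    {"order_id": 2, "region": "south", "amount": None, "clerk_note": ""},       # missing amount
    {"order_id": 3, "region": "South", "amount": 80.0, "clerk_note": "late"},
]

def select(rows, fields):
    """Data selection: keep only the fields relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in rows]

def clean(rows):
    """Data cleaning: drop rows missing the measure, normalize text, remove duplicates."""
    seen = set()
    result = []
    for r in rows:
        if r["amount"] is None:
            continue  # missing value: dropped in this sketch
        r = {**r, "region": r["region"].strip().title()}  # "North " / "south" -> "North" / "South"
        if r["order_id"] in seen:
            continue  # duplicate record
        seen.add(r["order_id"])
        result.append(r)
    return result

cleaned = clean(select(raw, ["order_id", "region", "amount"]))
print(cleaned)
```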

QUESTION: Why create and use data warehouses for OLAP instead of using the already existing databases?

14/03/2024
Class 2

Data warehouse: a multidimensional data model

A traditional relational database model organizes data in tables with rows and columns. A
multidimensional data model, however, views data from various perspectives or dimensions,
like time, product, customer, location, etc.

Data cube: A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts.

While the term "cube" suggests a 3D shape, a data cube can have any number of dimensions.
For simplicity, a 3D data cube is taken as an example. Imagine a box with:

● Base: Represents two of your chosen dimensions (e.g., product category and customer segment).
● Height: Represents the third dimension (e.g., time, at the level of year, quarter, or month).
● Each cell within the cube: Represents an aggregated value (a calculation over your data) at the intersection of those three dimensions. Common aggregations include sum (total sales), average (average price), or count (number of transactions).

Example:

Imagine a retail business with a data cube that analyzes sales data. The base of the cube
might have:

● Product Category: (e.g., Clothing, Electronics, Home Goods)
● Customer Segment: (e.g., Loyal Customers, New Customers, Discount Seekers)

The height of the cube could represent the time dimension:

● Time: (e.g., Year, Quarter, Month)

Each cell within this data cube would then hold a specific value based on the intersection of
these dimensions. For instance, a particular cell might represent the "Total Sales of Clothing
to Loyal Customers in the last Quarter."
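A data cube of this kind can be sketched as a mapping from (category, segment, quarter) to an aggregated value. The transactions and labels below are hypothetical, and the sketch computes only the sum aggregation; cells with no data are simply absent, as in a sparse cube.

```python
from collections import defaultdict

# Hypothetical transactions: (product category, customer segment, quarter, sales amount)
transactions = [
    ("Clothing", "Loyal", "2024-Q1", 100.0),
    ("Clothing", "Loyal", "2024-Q1", 50.0),
    ("Electronics", "New", "2024-Q1", 300.0),
    ("Clothing", "Loyal", "2024-Q2", 70.0),
]

def build_cube(rows):
    """Sum the sales measure for every (category, segment, quarter) cell."""
    cube = defaultdict(float)
    for category, segment, quarter, amount in rows:
        cube[(category, segment, quarter)] += amount
    return dict(cube)

cube = build_cube(transactions)
print(cube[("Clothing", "Loyal", "2024-Q1")])  # 150.0
```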

Conceptual Modeling of data warehouses

Conceptual modeling lays the foundation for designing and building an effective data warehouse that caters to your specific business needs.

Star schema: The star schema is a prominent design pattern used in conceptual modeling for
data warehouses. It's a specific way of organizing the data subjects, dimensions, and
measures identified during the conceptual modeling phase.

Structure:

● A star schema resembles a star symbol, with the fact table at its center. The fact table stores the core data (measures) of interest for analysis, together with foreign keys that link each row to the surrounding dimension tables.
● Dimension tables: These tables radiate outwards from the fact table, each
representing a specific dimension (descriptive attribute) relevant to the analysis.
Dimension tables typically have a primary key and may also have other relevant
attributes related to the dimension.
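The following sketch uses Python's built-in `sqlite3` module to create a tiny star schema (one fact table, three dimension tables) and run a typical analytical query that filters on a dimension attribute and aggregates a fact measure. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- Fact table: measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    units_sold   INTEGER,
    sales_amount REAL
);
INSERT INTO dim_product  VALUES (1, 'T-shirt', 'Clothing'), (2, 'Laptop', 'Electronics');
INSERT INTO dim_customer VALUES (1, 'Ana', 'Loyal'), (2, 'Ben', 'New');
INSERT INTO dim_time     VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales   VALUES (1, 1, 1, 3, 60.0), (2, 2, 1, 1, 900.0), (1, 1, 2, 2, 40.0);
""")

# Star join: filter on a dimension attribute, group by another, sum a fact measure.
query = """
SELECT p.category, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_product  p ON f.product_key  = p.product_key
JOIN dim_customer c ON f.customer_key = c.customer_key
WHERE c.segment = 'Loyal'
GROUP BY p.category
"""
print(conn.execute(query).fetchall())  # [('Clothing', 100.0)]
```

Each dimension is one join away from the fact table, which is what keeps star-schema queries simple.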

Benefits of Star Schema:

● Simplicity: The star schema is a straightforward and easy-to-understand design, making it ideal for data warehouses that support a wide range of users with varying levels of technical expertise.

● Efficient Query Performance: The structure of the star schema optimizes query
performance for analytical workloads. Queries often focus on retrieving data from the
fact table based on filters applied to the dimensions.

● Scalability: New dimension tables can be easily added to the star schema to
accommodate additional data subjects or analysis needs without affecting the existing
structure.

Snowflake schema: A refinement of the star schema in which some dimension hierarchies are further split (normalized) into a set of smaller dimension tables, forming a shape similar to a snowflake.¹

A snowflake schema extends the star schema concept by further normalizing the dimension
tables. In a star schema, dimension tables may contain redundant data to optimize query
performance. A snowflake schema breaks down these dimension tables into multiple, more
granular tables with smaller sizes and reduced redundancy.

¹ Query: A query is a specific question or request posed to the system to retrieve data.
Query performance: How fast and efficiently a database system can process and return the results of a query.
Redundant data: Data that is duplicated or stored in multiple places within a database.

N.B: The snowflake structure can reduce the effectiveness of browsing, since more joins will be needed to execute a query.
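The extra join can be seen in a small `sqlite3` sketch of a snowflaked customer dimension, where hypothetical location attributes (city, country) have been moved out of the customer table into their own table. Getting sales by country now requires joining the fact table to the customer table and then to the location table. All names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: location attributes normalized out of dim_customer.
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT,
                           location_key INTEGER REFERENCES dim_location(location_key));
CREATE TABLE fact_sales   (customer_key INTEGER REFERENCES dim_customer(customer_key),
                           sales_amount REAL);
INSERT INTO dim_location VALUES (1, 'Lyon', 'France'), (2, 'Osaka', 'Japan');
INSERT INTO dim_customer VALUES (1, 'Ana', 1), (2, 'Ben', 1), (3, 'Chika', 2);
INSERT INTO fact_sales   VALUES (1, 100.0), (2, 50.0), (3, 75.0);
""")

# Two joins (fact -> customer -> location) instead of one: 'France' is stored once,
# but every country-level query pays for the extra join.
rows = conn.execute("""
SELECT l.country, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_location l ON c.location_key = l.location_key
GROUP BY l.country
ORDER BY l.country
""").fetchall()
print(rows)  # [('France', 150.0), ('Japan', 75.0)]
```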

Imagine a star schema where the "Customer" dimension table stores each customer's name and street address together with their city, province, and country, so the same city, province, and country values are repeated for every customer who lives there. In a snowflake schema, you might keep the name and street address in the "Customer" table and move city, province, and country into a separate "Location" table, linked by a common location ID.

Benefits of Snowflake Schema:

● Reduced Data Redundancy: Normalization in snowflake schemas minimizes data duplication, which can help save storage space and improve data integrity. If the details of a city change (for example, its province name is corrected), you only need to update one row in the "Location" table instead of every customer row.

● Improved Data Maintainability: Normalized tables are often easier to maintain and
update as the business evolves. Adding new customer attributes becomes simpler as
you can create a new table without affecting existing tables.

Drawbacks of Snowflake Schema:

● Increased Query Complexity: While snowflake schemas can improve data integrity, they can also lead to more complex queries compared to star schemas. Queries may need to join multiple tables to retrieve the necessary data.

● Potentially Slower Query Performance: The additional joins required in snowflake schemas can impact query performance compared to the simpler structure of star schemas.

Choosing Between Star Schema and Snowflake Schema:

The decision between a star schema and a snowflake schema depends on several factors:

● Data Integrity Requirements: If maintaining highly consistent and non-redundant data is critical, a snowflake schema might be preferable.
● Query Performance Needs: If query performance is a top priority and the data
model is relatively stable, a star schema might be more suitable.
● User Base Expertise: For data warehouses with users with varying technical skills, a
star schema's simplicity can be advantageous.

Fact Constellations:

● A fact constellation is an extension of the star schema concept used in data warehouses. While a star schema has a single fact table connected to multiple dimension tables, a fact constellation involves multiple fact tables sharing dimensions.
● Imagine you have a data warehouse for an online retail business. You might have a
star schema for "Sales" data with dimensions like product, customer, time, and
location. However, you might also want to analyze "Website Traffic" data or
"Customer Service Interactions" data.

● In a fact constellation approach, you would create separate fact tables for "Sales,"
"Website Traffic," and "Customer Service Interactions," but they would all share some
common dimension tables (e.g., product, customer, time). This allows for flexibility in analyzing data from different perspectives.
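A fact constellation can be sketched in `sqlite3` with two hypothetical fact tables, `fact_sales` and `fact_web_traffic`, that share one dimension table, `dim_product` (a real design would also share time, customer, and so on). One common way to compare them is to aggregate each fact table to the shared dimension first and then join the results, which avoids multiplying the rows of one fact table by the rows of the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product      (product_key INTEGER PRIMARY KEY, name TEXT);
-- Two fact tables that reference the same (shared) dimension table.
CREATE TABLE fact_sales       (product_key INTEGER REFERENCES dim_product(product_key),
                               sales_amount REAL);
CREATE TABLE fact_web_traffic (product_key INTEGER REFERENCES dim_product(product_key),
                               page_views INTEGER);
INSERT INTO dim_product      VALUES (1, 'T-shirt'), (2, 'Laptop');
INSERT INTO fact_sales       VALUES (1, 60.0), (1, 40.0), (2, 900.0);
INSERT INTO fact_web_traffic VALUES (1, 500), (2, 1200), (2, 300);
""")

# Aggregate each fact table to the shared grain (product), then combine
# the two summaries through the shared dimension.
rows = conn.execute("""
WITH s AS (SELECT product_key, SUM(sales_amount) AS sales
           FROM fact_sales GROUP BY product_key),
     w AS (SELECT product_key, SUM(page_views) AS views
           FROM fact_web_traffic GROUP BY product_key)
SELECT p.name, s.sales, w.views
FROM dim_product p
JOIN s ON s.product_key = p.product_key
JOIN w ON w.product_key = p.product_key
ORDER BY p.name
""").fetchall()
print(rows)  # [('Laptop', 900.0, 1500), ('T-shirt', 100.0, 500)]
```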

Benefits of Fact Constellations:

● Flexibility for Complex Warehouses: Fact constellations cater to data warehouses that integrate data from multiple sources and involve various analytical needs. Each fact table can focus on a specific data domain (sales, traffic, etc.).

● Improved Data Reusability: Shared dimension tables in a fact constellation promote data reuse and consistency. Dimensions only need to be maintained in one place, reducing redundancy.

Drawbacks of Fact Constellations:

● Increased Complexity: Managing and querying data warehouses with multiple fact
tables can be more complex compared to simpler star schema designs.

● Potential Performance Impact: Joins across multiple fact tables and dimension
tables can affect query performance compared to a single fact table in a star schema.

When to Use Fact Constellations:

Fact constellations are a good choice for data warehouses that meet these criteria:

● Multiple data sources with diverse analytical needs
● Complex data models with various fact types
● Importance of data reusability and consistency across dimensions

Concept hierarchies

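A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (for example, city < province < country for a location dimension). Concept hierarchies may also be defined by grouping values for a given dimension or attribute, resulting in a set-grouping hierarchy (for example, grouping individual prices into price ranges).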
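Both kinds of hierarchy can be sketched with plain Python mappings: a schema hierarchy (city < province < country) for a location dimension and a set-grouping hierarchy that groups prices into ranges. The cities, price bands, labels, and function names are hypothetical examples.

```python
# Schema hierarchy for a location dimension: city < province < country.
CITY_TO_PROVINCE = {"Vancouver": "British Columbia", "Victoria": "British Columbia",
                    "Toronto": "Ontario"}
PROVINCE_TO_COUNTRY = {"British Columbia": "Canada", "Ontario": "Canada"}

def generalize_city(city, level):
    """Map a city up the hierarchy to the 'city', 'province' or 'country' level."""
    if level == "city":
        return city
    province = CITY_TO_PROVINCE[city]
    if level == "province":
        return province
    if level == "country":
        return PROVINCE_TO_COUNTRY[province]
    raise ValueError(f"unknown level: {level}")

# Set-grouping hierarchy: individual prices grouped into (low, high] ranges.
PRICE_BANDS = [(0, 100, "cheap"), (100, 500, "moderate"), (500, float("inf"), "expensive")]

def price_band(price):
    """Return the label of the (low, high] range that contains price."""
    for low, high, label in PRICE_BANDS:
        if low < price <= high:
            return label
    raise ValueError(f"price out of range: {price}")

print(generalize_city("Victoria", "country"), price_band(250))  # Canada moderate
```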

OLAP operations

1. Drill Down:

● This operation allows users to navigate deeper into a specific dimension of the data.
Imagine zooming in on a map.
● For instance, if you're analyzing sales data, you could start by looking at total sales
for all products. Then, you could drill down into the "Product" dimension to see sales
figures for specific product categories or even individual products.

2. Roll Up:

● This is the opposite of drill down. It involves moving up a level in a dimension hierarchy to see more aggregated data.
● Continuing the sales data example, you might start by looking at sales for individual
products. Then, you could roll up to see sales by product category, providing a
broader overview.

3. Slice:

● This operation involves selecting a specific subset of data based on a particular dimension value.
● In the sales data example, you could slice the data to focus on sales for a specific
customer segment (e.g., gold members) or a particular geographic region.

4. Dice:

● Similar to slicing, dicing involves creating a sub-dataset by applying filters to multiple dimensions.
● You could dice the sales data to look at sales of electronic products within a specific
price range for customers in a particular region during a certain time period.
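The four operations can be sketched over a handful of hypothetical fact rows in plain Python. Here roll-up and drill-down are shown as aggregating the same rows at a coarser (category) or finer (category and product) level of the product hierarchy, slice fixes a single dimension value, and dice filters on several dimensions at once. The helper names are illustrative, not taken from any particular OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (category, product, segment, region, month, sales)
FIELDS = ("category", "product", "segment", "region", "month", "sales")
SALES = [
    ("Electronics", "Laptop", "Gold", "East", "2024-01", 900.0),
    ("Electronics", "Phone", "Regular", "East", "2024-02", 400.0),
    ("Clothing", "T-shirt", "Gold", "West", "2024-01", 60.0),
    ("Clothing", "Jacket", "Gold", "East", "2024-02", 150.0),
]
rows = [dict(zip(FIELDS, r)) for r in SALES]

def aggregate(rows, by):
    """Sum sales for each combination of the given dimension attributes."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in by)] += r["sales"]
    return dict(totals)

def slice_(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]

def dice(rows, criteria):
    """Dice: keep rows that satisfy a condition on each of several dimensions."""
    return [r for r in rows if all(predicate(r[d]) for d, predicate in criteria.items())]

# Roll-up / drill-down along the product hierarchy (product < category).
by_category = aggregate(rows, ["category"])            # coarser, rolled-up view
by_product = aggregate(rows, ["category", "product"])  # drilled down one level

gold_members = slice_(rows, "segment", "Gold")

east_electronics_q1 = dice(rows, {
    "category": lambda v: v == "Electronics",
    "region": lambda v: v == "East",
    "month": lambda v: v in {"2024-01", "2024-02", "2024-03"},
})

print(by_category)  # {('Electronics',): 1300.0, ('Clothing',): 210.0}
```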

A starnet query model

The starnet query model is a way to visualize how users can query and analyze data within a multidimensional data warehouse. It consists of radial lines emanating from a central point: each line represents the concept hierarchy of one dimension, and each abstraction level on a line is called a footprint.

The starnet query model serves as a visual guide for users to understand how they can navigate and analyze data across different levels of detail within each dimension of the data warehouse. It also helps users understand how dimensions and their hierarchies are structured, enabling them to filter and drill down into specific details during analysis.

Benefits:

● Improved User Comprehension: The starnet model provides a clear visual representation of how dimensions and their hierarchies are structured in the data warehouse. This helps users understand the available options for filtering and drilling down into the data during analysis.
● Effective Exploration: By visualizing the dimensional hierarchies, users can explore
data from various perspectives. They can choose to analyze data at a high level (e.g.,
total sales for a year) or drill down to a more granular level (e.g., sales by product
category for each month).

Data warehouse: a multi-tiered architecture
A multi-tiered architecture refers to a way of organizing the data warehouse system into
distinct layers, each with specific functionalities.

Common tiers in a data warehouse architecture:

1. Data Source Layer:

● This is the starting point. It encompasses the various operational systems within the
organization that generate data, such as sales transaction systems, customer
relationship management (CRM) systems, financial systems, sensor data, and social
media feeds.

2. Data Staging Area:

● Data from various sources is first extracted and landed in the staging area. This
temporary storage area serves as a buffer zone for cleansing, transforming, and
validating the data before it's loaded into the data warehouse.

3. Data Integration Layer:

● The data integration layer is responsible for extracting data from the source systems,
transforming it into a consistent format, and handling any inconsistencies or errors.
This layer may also involve techniques like data cleansing, deduplication, and schema
mapping.

4. Data Warehouse Layer:

● This is the core layer of the architecture, where the cleansed and transformed data
resides. The data warehouse is typically organized using schemas like star schema or
snowflake schema to optimize storage and retrieval for analysis.

5. Data Presentation Layer:

● This layer provides users with tools and interfaces to access, analyze, and visualize
data stored within the data warehouse. These tools can include OLAP (Online
Analytical Processing) tools for multidimensional analysis, data mining tools for
uncovering hidden patterns, and reporting tools for creating dashboards and
visualizations.
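The flow through the tiers can be sketched as a small Python pipeline. The source rows, the staging format, and the function names are hypothetical, and an in-memory `sqlite3` database stands in for the data warehouse; a real system would use dedicated ETL tooling and a separate warehouse platform.

```python
import sqlite3

# Data source layer: hypothetical rows from two operational systems.
POS_ROWS = [("2024-03-14", "t-shirt ", "20.00"), ("2024-03-14", "LAPTOP", "900")]
WEB_ROWS = [("14/03/2024", "T-shirt", 40.0)]  # this system uses DD/MM/YYYY

def extract_to_staging():
    """Staging area: land raw rows unchanged, tagged with their source system."""
    return [("pos",) + r for r in POS_ROWS] + [("web",) + r for r in WEB_ROWS]

def transform(staged):
    """Integration layer: convert every staged row to one consistent format."""
    clean = []
    for source, day, product, amount in staged:
        if source == "web":
            d, m, y = day.split("/")
            day = f"{y}-{m}-{d}"  # normalize to ISO dates
        clean.append((day, product.strip().lower(), float(amount)))
    return clean

def load(conn, rows):
    """Data warehouse layer: append the cleaned rows to a fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales (day TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

def report(conn):
    """Presentation layer: a simple summary a dashboard might display."""
    return conn.execute(
        "SELECT product, SUM(amount) FROM fact_sales GROUP BY product ORDER BY product"
    ).fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract_to_staging()))
print(report(conn))  # [('laptop', 900.0), ('t-shirt', 60.0)]
```

Keeping each tier in its own function mirrors the modularity advantage: any one step can be replaced or tested without changing the others.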

Advantages of a Multi-Tier Data Warehouse Architecture

1. Scalability: Various components can be added, deleted, or updated in accordance with
the data warehouse’s shifting needs and specifications.
2. Better Performance: The separate layers enable parallel and efficient processing, which enhances performance and response times.
3. Modularity: The architecture supports modular design, which facilitates the creation,
testing, and deployment of separate components.
4. Security: The data warehouse’s overall security can be improved by applying various
security measures to various layers.
5. Improved Resource Management: Different tiers can be tuned to use the proper
hardware resources, cutting expenses overall and increasing effectiveness.

Data mart: A data mart is a focused subset of a data warehouse or data source that caters to the specific needs of a particular department or business unit. For example, a marketing data mart may limit its subjects to customers, products, and sales.
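One lightweight way to sketch a data mart is a database view that exposes only the subjects a department needs. In this hypothetical `sqlite3` example the warehouse holds both sales and payroll data, while the marketing data mart view exposes only customer, product, and sales columns. In practice data marts are often separate physical databases; the view here is only an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise warehouse tables covering several subjects.
CREATE TABLE fact_sales   (customer TEXT, product TEXT, amount REAL);
CREATE TABLE fact_payroll (employee TEXT, salary REAL);
INSERT INTO fact_sales   VALUES ('Ana', 'T-shirt', 60.0), ('Ben', 'Laptop', 900.0);
INSERT INTO fact_payroll VALUES ('Eve', 4000.0);

-- Marketing data mart: only the customer, product and sales subjects are exposed.
CREATE VIEW mart_marketing_sales AS
SELECT customer, product, amount FROM fact_sales;
""")

print(conn.execute("SELECT COUNT(*) FROM mart_marketing_sales").fetchone())  # (2,)
```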

Data warehouse development: A recommended approach

This recommended approach gives a high-level overview of data warehouse development. It emphasizes the importance of defining a clear data model upfront and following a multi-tiered architecture for efficient data storage, management, and analysis.
