Cat Data Mining

• Differentiate a relational database from a Data Warehouse:
• A relational database is designed for real-time transaction processing and data
management. It is optimized for read/write operations, allowing users to perform
transactions like inserting, updating, and deleting data.
• A data warehouse is designed for analytical processing and reporting. It is optimized for
read-heavy operations, aggregating large volumes of data from multiple sources to
support decision-making processes.
• What is a data cube:
• A data cube is a multi-dimensional array of values, typically used to describe data in
terms of multiple dimensions for OLAP (Online Analytical Processing). It allows users to
view and analyze data from different perspectives, such as time, geography, and product
dimensions.
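A minimal sketch of the idea in Python, assuming a handful of invented sales records with three dimensions (time, geography, product). The records and the `build_cube` helper are hypothetical; each cell of the cube is addressed by one value per dimension and holds an aggregated measure.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, product, units_sold)
SALES = [
    ("Q1", "North", "Phone", 10),
    ("Q1", "North", "Laptop", 4),
    ("Q1", "South", "Phone", 7),
    ("Q2", "North", "Phone", 12),
    ("Q2", "South", "Laptop", 3),
    ("Q2", "South", "Phone", 5),
]

def build_cube(records):
    """Map each (time, geography, product) coordinate to total units sold."""
    cube = defaultdict(int)
    for quarter, region, product, units in records:
        cube[(quarter, region, product)] += units
    return dict(cube)

cube = build_cube(SALES)
print(cube[("Q1", "North", "Phone")])
```

Looking up a cell by its three coordinates is what lets the same data be viewed by time, by geography, or by product.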
• Enumerate three OLAP operations:
• Roll-up: Aggregates data by climbing up a concept hierarchy for a dimension, reducing
the level of detail.
• Drill-down: The reverse of roll-up, it increases the level of detail by navigating down a
concept hierarchy.
• Slice and Dice: Slicing fixes a single value on one dimension, producing a sub-cube
with one fewer dimension, whereas dicing creates a sub-cube by selecting specific values
on two or more dimensions.
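The operations above can be sketched over a small list of fact rows. This is an illustrative sketch only: the rows, the `FIELDS` tuple, and the `aggregate` and `select` helpers are assumptions made for the example, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, city, product, units_sold)
ROWS = [
    (2023, "Q1", "North", "Phone", 10),
    (2023, "Q1", "South", "Phone", 7),
    (2023, "Q2", "North", "Laptop", 4),
    (2024, "Q1", "North", "Phone", 12),
    (2024, "Q1", "South", "Laptop", 3),
]
FIELDS = ("year", "quarter", "city", "product", "units")

def aggregate(rows, dims):
    """Sum units grouped by the named dimensions."""
    idx = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def select(rows, **criteria):
    """Keep rows whose dimension values fall in the allowed sets."""
    return [
        row for row in rows
        if all(row[FIELDS.index(d)] in allowed for d, allowed in criteria.items())
    ]

# Drill-down view: the detailed (year, quarter) level of the time hierarchy.
by_quarter = aggregate(ROWS, ["year", "quarter"])
# Roll-up: climb the time hierarchy from quarter to year.
by_year = aggregate(ROWS, ["year"])
# Slice: fix one dimension to a single value.
slice_2023 = select(ROWS, year={2023})
# Dice: restrict two or more dimensions to chosen values.
dice = select(ROWS, city={"North"}, product={"Phone"})
```

Roll-up and drill-down differ only in which level of the hierarchy is grouped on, while slice and dice differ only in how many dimensions are restricted.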
• Define data lake & data mart:
• Data Lake: A centralized repository that allows you to store all your structured and
unstructured data at any scale. It can hold raw data in its native format until needed.
• Data Mart: A subset of a data warehouse, focused on a specific business line or team. It
is optimized for specific queries and reporting needs.
• 10 benefits of data warehousing:
• Improved data quality and consistency
• Enhanced business intelligence and decision-making
• Consolidation of data from multiple sources
• Historical data analysis
• Faster query performance
• Better data security and privacy
• Enhanced data governance and compliance
• Scalability to handle large volumes of data
• Reduced load on transactional systems
• Support for advanced analytics and machine learning
• Differentiate OLAP from OLTP in terms of: Users and system orientation, Data
contents, and Database design:
• Users and system orientation: OLTP is customer-oriented and is used for transaction
and query processing by clerks, clients, and IT staff, while OLAP is market-oriented and
is used for data analysis by knowledge workers such as managers and analysts.
• Data contents: OLAP contains historical data from various sources, while OLTP
contains current, operational data.
• Database design: OLAP databases are designed in star or snowflake schemas for fast
query performance, whereas OLTP databases are normalized to minimize redundancy
and optimize for transactional speed.
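A rough sketch of the two workload styles, using Python's built-in `sqlite3` module as a stand-in for both systems. The `orders` table and its values are hypothetical, and a real deployment would keep the two workloads in separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: short transactions that touch a few rows each.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 120.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("South", 80.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (100.0, 1))

# OLAP-style work: a read-only query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```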
• Explain in brief 4 layers of a data warehouse:
• Data Source Layer: This layer includes all the data sources from which data is extracted,
such as databases, flat files, and external data sources.
• Data Staging Layer: This layer is where data is cleaned, transformed, and loaded into
the data warehouse. It acts as an intermediate storage area.
• Data Storage Layer: This is the actual data warehouse where data is stored in a
structured format, typically in star or snowflake schemas.
• Presentation Layer: This layer provides tools for querying, reporting, and data analysis,
allowing users to interact with the data warehouse.
• Top-Down & Bottom-Up design approaches of a data warehouse:
• Top-Down Approach: Starts with the overall design and planning of the data warehouse
and then moves to the creation of individual data marts. It emphasizes a comprehensive,
enterprise-wide solution.
• Bottom-Up Approach: Begins with the creation of data marts that address specific
business needs and gradually integrates them into a comprehensive data warehouse. It
focuses on delivering quick results and addressing immediate business requirements.
• Differentiate ELT from ETL in terms of speed and security:
• Speed: ELT (Extract, Load, Transform) can be faster for large datasets because the
transformation is performed within the target database, leveraging its processing power.
ETL (Extract, Transform, Load) may be slower as the transformation happens before
loading.
• Security: ETL can be more secure as data is transformed before loading into the target
system, reducing exposure to raw data. ELT may expose raw data to the target system
before transformation, potentially increasing security risks.
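The difference in ordering can be sketched with Python's built-in `sqlite3` module standing in for the target system. The `RAW` rows and the table names (`sales_etl`, `sales_raw`, `sales_elt`) are assumptions made for illustration.

```python
import sqlite3

# Hypothetical source rows: (customer, amount) as untidy strings
RAW = [("  alice ", "10.5"), ("BOB", "3"), ("carol", "7.25")]

def etl(conn):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in RAW]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)

def elt(conn):
    """ELT: load raw rows first, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", RAW)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT lower(trim(customer)) AS customer, CAST(amount AS REAL) AS amount "
        "FROM sales_raw"
    )

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT * FROM sales_etl ORDER BY customer").fetchall()
elt_rows = conn.execute("SELECT * FROM sales_elt ORDER BY customer").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM sales_raw").fetchone()[0]
```

Both routes end with the same cleaned rows, but only the ELT route leaves the untransformed `sales_raw` table inside the target, which is the exposure the security point refers to.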
• List any 2 ETL tools/software:
• Apache NiFi
• Talend
• Star Schema in data modeling:
• Advantages:
o Simplifies queries and improves performance.
o Easy to understand and navigate.
o Efficient for large volumes of data.
• Disadvantages:
o Can lead to data redundancy.
o Less flexible for complex queries.
o Can become inefficient with very large dimension tables.
• What are fact tables and dimensions in multidimensional data processing:
• Fact Tables: Central tables in a star schema of a data warehouse that store quantitative
data for analysis. They contain measurable, numeric data such as sales revenue, order
quantities, etc.
• Dimensions: Tables that store descriptive attributes related to the facts. Dimensions
provide context to the data stored in fact tables, such as time, geography, product details,
etc.
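A small star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes (hypothetical columns)
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
-- Fact table: foreign keys to each dimension plus numeric measures
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    quantity   INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'Phone', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date    VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO fact_sales  VALUES (1, 1, 2, 600.0), (1, 2, 1, 300.0), (2, 1, 1, 150.0);
""")

# A typical star join: measures from the fact table, context from a dimension.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The fact table holds only keys and numbers, and every descriptive label in the result comes from a dimension table.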
• Describe three tiers of data warehouse architecture:
• Bottom Tier: Data Warehouse Database Server, where the data is stored, usually in a
relational database.
• Middle Tier: OLAP Server, which provides an abstracted view of the database to the
end-users for querying and analysis.
• Top Tier: Front-end Tools, which are the user interfaces for data analysis, reporting, and
mining.
• In Data Extraction, we have 2 types, list and explain:
• Full Extraction: Extracts all the data from the source system. It is typically used when a
data warehouse is being built for the first time or when there is a significant change in the
source system.
• Incremental Extraction: Extracts only the data that has changed since the last
extraction. It is used for regular updates to keep the data warehouse in sync with the
source systems.
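The two extraction types can be sketched as follows, assuming the source keeps a last-modified timestamp on every row. The `updated_at` field and the sample rows are hypothetical, and other change-detection methods, such as reading a change log, are equally possible.

```python
from datetime import datetime

# Hypothetical source rows with a last-modified timestamp.
SOURCE = [
    {"id": 1, "value": "a", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "value": "b", "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "value": "c", "updated_at": datetime(2024, 1, 9)},
]

def full_extract(source):
    """Return every row, regardless of when it changed."""
    return list(source)

def incremental_extract(source, last_run):
    """Return only rows modified after the previous extraction time."""
    return [row for row in source if row["updated_at"] > last_run]

everything = full_extract(SOURCE)
changed = incremental_extract(SOURCE, last_run=datetime(2024, 1, 4))
```

After each incremental run, the pipeline would store the new extraction time so the next run starts from there.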
• 10 common types of data transformation:
• Data Cleansing
• Data Deduplication
• Data Aggregation
• Data Filtering
• Data Enrichment
• Data Normalization
• Data Summarization
• Data Integration
• Data Sorting
• Data Formatting
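A few of these transformations can be sketched in plain Python. The rows and the business rule (amounts must be non-negative) are invented for illustration.

```python
from collections import defaultdict

# Hypothetical extracted rows: (customer, country, amount)
rows = [
    (" Alice ", "us", "100"),
    ("alice", "US", "100"),     # duplicate once cleaned
    ("Bob", "CA", "-5"),        # invalid amount, filtered out
    ("Carol", "us", "40"),
]

# Cleansing and formatting: trim whitespace, unify case, cast types.
cleaned = [(c.strip().title(), k.upper(), float(a)) for c, k, a in rows]
# Deduplication: keep the first occurrence of each identical row.
deduped = list(dict.fromkeys(cleaned))
# Filtering: drop rows that break a business rule.
valid = [r for r in deduped if r[2] >= 0]
# Aggregation: total amount per country.
totals = defaultdict(float)
for _, country, amount in valid:
    totals[country] += amount
# Sorting: order the summary by country code.
summary = sorted(totals.items())
```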
• In transferring data, 4 steps are followed, list and briefly discuss them:
• Extract: Retrieving data from various source systems.
• Transform: Applying rules, filters, and functions to the extracted data to convert it into
the desired format.
• Load: Inserting the transformed data into the target database or data warehouse.
• Validate: Ensuring the accuracy and consistency of the loaded data.
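The four steps can be strung together as a minimal sketch. All function names and data here are hypothetical, with a Python list standing in for the target table.

```python
def extract():
    """Stand-in for reading from a source system (hypothetical data)."""
    return [{"id": 1, "price": "9.99"}, {"id": 2, "price": "20"}]

def transform(rows):
    """Convert the price strings to floats."""
    return [{"id": r["id"], "price": float(r["price"])} for r in rows]

def load(rows, target):
    """Append transformed rows to the target (a list standing in for a table)."""
    target.extend(rows)
    return len(rows)

def validate(source_rows, target):
    """Check row counts match and every loaded price is a non-negative float."""
    same_count = len(source_rows) == len(target)
    prices_ok = all(isinstance(r["price"], float) and r["price"] >= 0 for r in target)
    return same_count and prices_ok

warehouse = []
source_rows = extract()
loaded = load(transform(source_rows), warehouse)
ok = validate(source_rows, warehouse)
```

Comparing row counts between source and target is only one possible validation check; real pipelines often add checksums or range checks as well.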
• In your view, which OLAP operation is shown in each of the following 5
diagrams:
• (a) Roll-up: Aggregating data to a higher level of abstraction (less detail).
• (b) Drill-down: Breaking down data to a more detailed level.
• (c) Slice: Fixing a single value on one dimension to filter the data.
• (d) Dice: Creating a sub-cube by selecting specific values for multiple dimensions.
• (e) Pivot: Rotating the data axes to provide a different perspective.
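Pivot is the operation least obvious from a one-line definition, so here is a minimal sketch of it on a two-dimensional view. The nested-dict layout and the `pivot` helper are assumptions made for the example.

```python
# Hypothetical 2-D view: rows are regions, columns are quarters.
view = {
    "North": {"Q1": 10, "Q2": 12},
    "South": {"Q1": 7, "Q2": 5},
}

def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_key, cells in table.items():
        for col_key, value in cells.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_quarter = pivot(view)
```

No values change under a pivot; only the axis each dimension is laid out on changes, so pivoting twice returns the original view.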
• Discuss the 7 OLAP servers:
• Relational OLAP (ROLAP): Uses relational databases to store and manage data. It
provides flexibility in querying large datasets.
• Multidimensional OLAP (MOLAP): Uses multidimensional data storage, typically in
data cubes, for fast retrieval and analysis.
• Hybrid OLAP (HOLAP): Combines features of both ROLAP and MOLAP, providing a
balance between storage efficiency and query performance.
• Desktop OLAP (DOLAP): Runs on individual desktops, providing quick and easy
access to OLAP functionalities for individual users.
• Web OLAP (WOLAP): Delivers OLAP functionalities over the web, allowing remote
access and analysis.
• Mobile OLAP: Provides OLAP capabilities on mobile devices, enabling on-the-go
data analysis. (The acronym MOLAP is reserved for Multidimensional OLAP above.)
• In-memory OLAP (IMOLAP): Uses in-memory data storage to achieve high-speed data
processing and analysis.
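The contrast between the first two server types can be sketched as follows, using Python's built-in `sqlite3` module for the relational side and a plain dictionary for the multidimensional side. This is a simplified illustration under invented data, not how any particular OLAP server is implemented.

```python
import sqlite3
from itertools import product

# Hypothetical facts: (quarter, region, units)
FACTS = [("Q1", "North", 10), ("Q1", "South", 7), ("Q2", "North", 12)]

# ROLAP-style: facts stay in relational tables; aggregates are computed by SQL on request.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, region TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)

def rolap_total(quarter):
    """Aggregate at query time with SQL."""
    row = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales WHERE quarter = ?", (quarter,)
    ).fetchone()
    return row[0]

# MOLAP-style: aggregates are precomputed into a cube, with "*" meaning "all values".
cube = {}
for quarter, region, units in FACTS:
    for q, r in product((quarter, "*"), (region, "*")):
        cube[(q, r)] = cube.get((q, r), 0) + units

def molap_total(quarter):
    """Read a precomputed aggregate straight from the cube."""
    return cube.get((quarter, "*"), 0)
```

The relational route does its work at query time, while the cube route pays the cost up front in storage and build time, which is the trade-off HOLAP tries to balance.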
