Information-Management-Reviewer
Need for Data Warehousing: The need arises from two factors: a company-wide view of high-quality
information, and the separation of informational from operational systems to improve
performance in managing company data.
• An operational system is a system that is used to run a business in real time, based on current
data.
• Informational systems are designed to support decision making based on historical, point-in-time,
and prediction data.
Data Warehouse Architectures: Common architectures include independent data marts, dependent data
marts with operational data stores, and real-time data warehouses, each with its own structure and
purpose.
A star schema is a simple database design (particularly suited to ad hoc queries) in which dimensional
data (describing how data are commonly aggregated for reporting) are separated from fact or event
data (describing business activity). A star schema is one version of a dimensional model.
• Components: It consists of a central fact table and one or more dimension tables.
• Fact Table: Holds quantitative data like units sold or orders booked, which are numerical and
additive.
• Dimension Tables: Contain descriptive data that provide context for facts, often used in reports
and queries.
• Data Mart Usage: A data mart may have multiple star schemas, sharing dimension tables but
with distinct fact tables.
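The star schema described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names (dim_product, dim_store, fact_sales), the columns, and the sample rows are all assumptions made up for the example, not part of the reviewer.

```python
import sqlite3

# In-memory database; every table and column name here is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive data that gives context to the facts.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
    -- Fact table: numeric, additive measures plus a key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units_sold INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Snacks'), (2, 'Drinks');
    INSERT INTO dim_store   VALUES (10, 'North'), (20, 'South');
    INSERT INTO fact_sales  VALUES (1, 10, 5), (1, 20, 3), (2, 10, 7), (2, 20, 2);
""")

def units_by(dimension_table, key, label):
    """Sum the additive fact over one dimension attribute (an ad hoc query)."""
    sql = (f"SELECT d.{label}, SUM(f.units_sold) FROM fact_sales f "
           f"JOIN {dimension_table} d ON f.{key} = d.{key} "
           f"GROUP BY d.{label} ORDER BY d.{label}")
    return conn.execute(sql).fetchall()

by_category = units_by("dim_product", "product_id", "category")
by_region = units_by("dim_store", "store_id", "region")
print(by_category)  # [('Drinks', 9), ('Snacks', 8)]
print(by_region)    # [('North', 12), ('South', 5)]
```

The same fact table answers both questions because units_sold is additive: it can be summed along any dimension it is linked to.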
2. Data Mart: A subset of a data warehouse, customized for the decision-making needs
of a specific user group.
3. Star Schema: A database design that separates dimensional data (descriptive) from fact
data (quantitative), making it suitable for ad hoc queries.
4. Fact Tables: Tables in a star schema that contain quantitative data about a business,
such as sales or transactions, linked to dimension tables with descriptive
data.
Data Governance
• Purpose – ensures data within an organization is managed with a focus on availability, integrity
and regulatory compliance.
• System – it is also a system that defines authority and usage of data assets involving people,
processes, and technologies.
• Protection – aims to manage and safeguard data assets effectively.
• Goal – provide transparency within the organization and to outside regulators, and increase the
value of the data maintained by the organization.
Most commonly involved stakeholders in Data Governance:
• Data Owners
o Decision-makers responsible for data at an entity or attribute level, ensuring data is
managed as an asset.
• Data Stewards
o Subject matter experts who ensure daily adherence to data policies and standards,
responsible for the care of data assets.
• Data Custodians
o Handle the technical and business processes for maintaining and updating data assets
throughout their lifecycle.
• Data Governance Committee
o A group that approves data policies and standards and addresses escalated data
governance issues.
In a typical enterprise, here are some folks who might make up a Data Governance Team:
• Manager, Master Data Governance
o Leads the design, implementation and continued maintenance of Master Data Control
and governance across the corporation.
• Solution and Data Governance Architect
o Provides oversight for solution designs and implementations.
• Data Analyst
o Uses analytics to determine trends and review information.
• Data Strategist
o Develops and executes trend-pattern analytics plans.
• Compliance Specialist
o Ensures adherence to required standards (legal, defense, medical, privacy).
Data governance involves a set of processes and procedures focused on managing data with objectives
like availability, integrity, and compliance.
Data management typically refers to the technical and operational aspects of handling data, such as
data storage, retrieval, and maintenance.
I give up, this is too long… let's go straight to the exercise, haha
Exercise:
Define the following:
1. Data transformation: Modifying data values or formats to meet the requirements of a system or
application. It’s a crucial step in building a data mart, ensuring that data
from various sources is standardized and consolidated for analysis.
2. Data owners: Typically, senior management who have authority over specific data assets
within an organization. They are responsible for the availability, integrity,
and compliance of the data.
3. Data stewards: Individuals or groups responsible for managing data according to the
policies and guidelines set by the data governance program. They ensure
the quality and proper usage of data across different business units.
4. ETL (Extract, Transform, Load): A process used in databases and data warehousing
to extract data from various sources, transform it to fit operational needs,
and load it into a target database or data mart. It’s essential for integrating
and refining data, which is a key part of data governance.
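The three ETL steps from item 4, including the data transformation from item 1, might look like the following sketch. The source data, the column names, the orders table, and the assumption that slash-separated dates are DD/MM/YYYY are all invented for illustration. A real pipeline would read from actual source systems and handle far more cases.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with inconsistent casing and date formats.
SOURCE_CSV = """customer,order_date,amount
alice ,2024-01-05,10.50
BOB,05/01/2024,4
"""

def extract(text):
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: standardize names, dates (to ISO), and amounts (to cents)."""
    cleaned = []
    for row in rows:
        date = row["order_date"]
        if "/" in date:  # assumed to be DD/MM/YYYY in this made-up source
            day, month, year = date.split("/")
            date = f"{year}-{month}-{day}"
        cleaned.append((row["customer"].strip().title(), date,
                        round(float(row["amount"]) * 100)))
    return cleaned

def load(rows, conn):
    """Load: write the standardized rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(customer TEXT, order_date TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT * FROM orders ORDER BY customer").fetchall()
print(loaded)  # [('Alice', '2024-01-05', 1050), ('Bob', '2024-01-05', 400)]
```

Keeping the three steps as separate functions mirrors the definition: each step can be checked on its own before the data reaches the target database or data mart.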
• Traditional Data Administration - is a high-level function that is responsible for the overall
management of data resources in an organization, including maintaining corporate-wide data
definitions and standards.
o Roles of traditional data administration:
▪ Data policies, procedures, and standards
▪ Planning (a key administration function)
▪ Data conflict resolution
▪ Managing the information repository
▪ Internal marketing
• Traditional Database Administration - is a technical function responsible for logical and
physical database design and for dealing with technical issues, such as security enforcement,
database performance, backup and recovery, and database availability.
o Roles assumed by database administration:
▪ Analyzing and designing the database
▪ Selecting DBMS and related software tools
▪ Installing and upgrading the DBMS
▪ Tuning database performance
▪ Improving database query processing performance
▪ Managing data security, privacy, and integrity
▪ Performing data backup and recovery
• Data Warehouse Administration - A data warehouse administrator (DWA) plays many of the same roles as DAs and DBAs do for
the data warehouse and data mart databases for the purpose of supporting decision-making
applications. The role of a DWA emphasizes integration and coordination of metadata and data.
o DWA performs the following functions:
▪ Build and administer an environment supportive of decision support
applications.
▪ Build a stable architecture for the data warehouse.
▪ Develop service-level agreements with suppliers and consumers of data for the
data warehouse.
o IT Change Management
▪ refers to the process by which changes to operational systems and databases are
authorized. Typically, any change to a production system or database must be
approved by a change control board that is made up of representatives from the
business and IT organizations.
Module 7 is tiring to read and very long, so I'm going straight to the exercise with answers:
Module 8: Overview: Distributed Database, Object Oriented Data
Distributed DBMS
• To have a distributed database, there must be a database management system that coordinates
access to data at the various nodes. A distributed DBMS provides the following functions:
1. Data Location Tracking: Maintains a distributed data dictionary to track data locations.
2. Data Retrieval and Processing: Determines where to retrieve and process parts of a query.
3. Request Translation: Translates requests between nodes with different DBMS and data models.
4. Data Management: Manages security, concurrency, optimization, and recovery functions.
5. Data Consistency: Ensures consistency among data copies across remote sites.
6. Logical Database Presentation: Presents a single logical database that is physically distributed.
7. Scalability: Allows the database to dynamically adapt to changing business needs.
8. Procedure Replication: Distributes stored procedures across nodes, like data.
9. Performance Improvement: Utilizes residual computing power to enhance database processing.
10. DBMS Diversity: Supports different DBMSs at various nodes through middleware.
11. Application Code Versions: Allows different software versions across the distributed database
nodes.
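A few of the functions listed above (data location tracking, a single logical database, and consistency among copies) can be illustrated with a toy coordinator. This is a sketch under strong assumptions: the class DistributedDatabase, its methods, and the site names are hypothetical, nodes are plain dictionaries in one process, and there is no failure handling. A real distributed DBMS would need a commit protocol and recovery logic to keep replicas consistent.

```python
class DistributedDatabase:
    """Toy coordinator: one logical database spread over several nodes."""

    def __init__(self):
        self.nodes = {}       # node name -> {table name -> list of rows}
        self.dictionary = {}  # distributed data dictionary: table -> node names

    def store(self, table, rows, node_names):
        """Place a copy of `table` on each node and record its locations."""
        for name in node_names:
            self.nodes.setdefault(name, {})[table] = list(rows)
        self.dictionary[table] = list(node_names)

    def read(self, table):
        """Location transparency: the caller never names a node."""
        first_node = self.dictionary[table][0]
        return list(self.nodes[first_node][table])

    def insert(self, table, row):
        """Keep copies consistent by applying the write at every replica."""
        for name in self.dictionary[table]:
            self.nodes[name][table].append(row)

db = DistributedDatabase()
db.store("customer", [("C1", "Ana")], ["site_a", "site_b"])  # replicated table
db.store("orders", [("O1", "C1")], ["site_b"])               # stored at one site
db.insert("customer", ("C2", "Ben"))
print(db.read("customer"))  # [('C1', 'Ana'), ('C2', 'Ben')]
```

The caller only ever names a table. The dictionary lookup inside read and insert is what makes the physically distributed data look like a single logical database.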
Query Optimization
• With distributed databases, the response to a query may require a DBMS to assemble data from
several different sites (although with location transparency, the user is unaware of this need).
• A major decision for the DBMS is how to process a query, which is affected by both the way a
user formulates a query and the intelligence of the distributed DBMS to develop a sensible plan
for processing.
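To make the idea of "a sensible plan" concrete, the sketch below compares two ways to join tables stored at different sites by counting the bytes each plan would ship over the network. All table names, sizes, and the result size are hypothetical numbers chosen for the example, and network transfer is assumed to be the only cost. Real optimizers also weigh factors such as CPU and disk I/O.

```python
# Hypothetical sizes: rows and bytes per row for two tables at different sites.
SUPPLIER = {"site": "A", "rows": 10_000, "row_bytes": 100}
SHIPMENT = {"site": "B", "rows": 1_000_000, "row_bytes": 50}
RESULT_ROWS, RESULT_ROW_BYTES = 500, 120  # assumed size of the join result

def table_bytes(table):
    """Total size of a table in bytes."""
    return table["rows"] * table["row_bytes"]

def plan_costs(result_site):
    """Bytes shipped over the network for each candidate plan."""
    result_bytes = RESULT_ROWS * RESULT_ROW_BYTES
    plans = {}
    # Joining at site A means moving SHIPMENT there; joining at B moves SUPPLIER.
    for join_site, moved in (("A", SHIPMENT), ("B", SUPPLIER)):
        cost = table_bytes(moved)
        if join_site != result_site:
            cost += result_bytes  # the answer must also travel to the user
        plans[f"join at {join_site}"] = cost
    return plans

costs = plan_costs(result_site="A")
best = min(costs, key=costs.get)
print(costs)  # {'join at A': 50000000, 'join at B': 1060000}
print(best)   # join at B
```

Even though the user is at site A, moving the small table to site B and shipping back only the result transfers far less data under these assumed sizes, which is the kind of trade-off a distributed DBMS has to evaluate.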