MIS - Chap 6

EM4218E - MANAGEMENT INFORMATION SYSTEMS

Chapter 6
Foundations of Business Intelligence:
Database and Information Management
Learning Objectives

1. What are the problems of managing data resources in a traditional file environment?
2. What are the major capabilities of database management systems (DBMS), and why is a
relational DBMS so powerful?
3. What are the principal tools and technologies for accessing information from databases
to improve business performance and decision making?
4. Why are data governance and data quality assurance essential for managing the firm’s
data resources?

CONTENTS

6.1. Challenges in Managing Data Resources

6.2. DBMS: Concepts and Roles

6.3. Tools and Technologies to Access Information

6.4. Data Governance and Data Quality Assurance

6.1. Challenges in Managing Data Resources

• File Organization Concepts and Services


• Data Hierarchy: Video Lecture Figure 6.1
• Bit: the smallest unit of data a computer can handle
• Byte: a group of bits (usually eight) that represents a single character, which can be a letter, a number, or another symbol
• Field: a group of characters forming a word, a group of words, or a complete number (such as a person’s name or age)
• Record: a group of related fields (such as the student’s name, the course taken, the date, and the grade)
• File: a group of records of the same type

• Entity
• An entity is a person, place, thing, or event on which we store and
maintain information
• Attribute
• Each characteristic or quality describing a particular entity is
called an attribute
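The Python sketch below walks through the data hierarchy. It assumes ASCII encoding for the byte and uses a hypothetical course-enrollment record whose fields mirror the Record example above. The class name `StudentCourseRecord` and all sample values are illustrative only.

```python
from dataclasses import dataclass

# Bit and byte: the character "A" is stored as one byte (ASCII), i.e. eight bits.
byte = "A".encode("ascii")
bits = format(byte[0], "08b")

@dataclass
class StudentCourseRecord:
    """Record: a group of related fields (hypothetical example)."""
    student_name: str  # field
    course: str        # field
    date: str          # field
    grade: str         # field

# File: a group of records of the same type.
course_file = [
    StudentCourseRecord("Nguyen Van A", "IS 101", "2024-09-01", "B+"),
    StudentCourseRecord("Tran Thi B", "IS 101", "2024-09-01", "A"),
]

print(bits)              # 01000001
print(len(course_file))  # 2
```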

• Problems with the Traditional File Environment
• Video Lecture Figure 6.2

• Data redundancy: duplicate data in multiple data files, so that the same data are stored in more than one place or location
• Program-data dependence: the coupling of data stored in files and the specific programs required to update and maintain those files, such that changes in programs require changes to the data
• Lack of flexibility: a traditional file system cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion
• Poor security: management might have no way of knowing who is accessing or even making changes to the organization’s data
• Lack of data sharing and availability: disconnected information sources in an organization lead to unreliable data and hinder information sharing
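The toy Python sketch below makes the redundancy problem concrete. It is not taken from the course material. Two departments' files each store a copy of a customer's address, and only one program updates its copy. The file names, the customer and the helper `inconsistent_fields` are hypothetical. The sketch shows how redundant copies drift apart, which is how redundancy turns into data inconsistency.

```python
# Two departments keep their own (hypothetical) files, each with a copy of the address.
sales_file = {"C001": {"name": "Le Minh", "address": "12 Hang Bai, Hanoi"}}
billing_file = {"C001": {"name": "Le Minh", "address": "12 Hang Bai, Hanoi"}}

# The customer moves; only the sales department's program updates its file.
sales_file["C001"]["address"] = "45 Le Loi, Da Nang"

def inconsistent_fields(key):
    """List the fields whose values differ between the two files for one customer."""
    a, b = sales_file[key], billing_file[key]
    return sorted(f for f in a if a[f] != b.get(f))

print(inconsistent_fields("C001"))  # ['address']
```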

6.2. DBMS: Concepts and Roles

• Database
• A collection of data organized to service many applications at the same time by storing and managing
data so that they appear to be in one location

• Key functions: storage, management, updates, retrieval

• Database Management Systems (DBMS)
• Software that enables an organization to centralize data, manage them efficiently, and provide access to
the stored data by application programs
• An interface between application programs and physical data files
• A DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same data
are repeated

• Relational DBMS
• Video Lecture Figure 6.4
• Represents data as two-dimensional tables: rows – records, columns – fields
• Primary key
• A field (or combination of fields) that uniquely identifies each row of the table
• Primary key values cannot be duplicated
• Foreign key
• A field in one table that refers to the primary key of another table, so that one table can look up values in the other
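The sketch below uses Python's built-in `sqlite3` module to show a DBMS enforcing primary and foreign keys. The SUPPLIER/PART tables, the column names and the values are hypothetical. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`, and other DBMSs differ in such details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("""CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,       -- primary key
    supplier_name   TEXT NOT NULL)""")
conn.execute("""CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    part_name       TEXT NOT NULL,
    supplier_number INTEGER REFERENCES supplier(supplier_number))""")  # foreign key

conn.execute("INSERT INTO supplier VALUES (8259, 'CBM Inc.')")
conn.execute("INSERT INTO part VALUES (137, 'Door latch', 8259)")

def try_insert(sql):
    """Run an INSERT; return None on success or the error message if a key rule is violated."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

dup_pk = try_insert("INSERT INTO supplier VALUES (8259, 'Other Co.')")  # duplicate primary key
bad_fk = try_insert("INSERT INTO part VALUES (150, 'Door seal', 9999)")  # no such supplier
print(dup_pk)
print(bad_fk)
```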

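• Three basic operations of a relational DBMS: select, join, and project
• Select: creates a subset of all records (rows) that meet stated criteria
• Join: combines relational tables to provide more information than is available in the individual tables
• Project: creates a subset consisting of selected columns of a table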
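The sketch below shows the three operations in SQL, again using Python's `sqlite3` with hypothetical tables and data. In SQL, the relational *select* operation is the `WHERE` clause. *Project* is the column list after the `SELECT` keyword, and *join* is `JOIN ... ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE part (part_number INTEGER PRIMARY KEY, part_name TEXT,
                   unit_price REAL, supplier_number INTEGER);
INSERT INTO supplier VALUES (8259, 'CBM Inc.'), (8261, 'B. R. Molds');
INSERT INTO part VALUES (137, 'Door latch', 22.00, 8259),
                        (145, 'Side mirror', 12.00, 8261),
                        (152, 'Door lock', 31.00, 8259),
                        (178, 'Door handle', 10.00, 8261);
""")

rows = conn.execute("""
    SELECT part.part_number, part.part_name, supplier.supplier_name   -- project
    FROM part
    JOIN supplier ON part.supplier_number = supplier.supplier_number   -- join
    WHERE part.part_number IN (137, 152, 178)                           -- select
    ORDER BY part.part_number
""").fetchall()

for row in rows:
    print(row)
```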

• DBMS capabilities
• Data definition language
• Specifies the structure of the content of the database
• Data dictionary
• An automated or manual file that stores definitions of data elements and their characteristics
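SQL's `CREATE TABLE` statement is a data definition language command. SQLite keeps an automated catalog of these definitions, which can be queried like a data dictionary. `sqlite_master` and `PRAGMA table_info` are SQLite-specific; many other DBMSs expose `INFORMATION_SCHEMA` views instead. The EMPLOYEE table and its columns below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a (hypothetical) EMPLOYEE table.
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL,
    hire_date   TEXT,
    salary      REAL CHECK (salary >= 0)
)""")

# SQLite's catalog: sqlite_master lists schema objects,
# and PRAGMA table_info describes each column of a table.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = conn.execute("PRAGMA table_info(employee)").fetchall()

print(tables)  # ['employee']
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "", "PK" if pk else "")
```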

• DBMS capabilities
• Data manipulation language
• Add, change, delete, and retrieve the data in the database
• Structured Query Language (SQL): the most widely used data manipulation language
• Query
• A request for data from a database
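The sketch below issues the four data manipulation operations (add, change, delete, retrieve) as SQL through Python's `sqlite3`. The CUSTOMER table and its rows are hypothetical. The `?` placeholders pass values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Add
conn.executemany("INSERT INTO customer (customer_id, name, city) VALUES (?, ?, ?)",
                 [(1, "An", "Hanoi"), (2, "Binh", "Hue"), (3, "Chi", "Hanoi")])
# Change
conn.execute("UPDATE customer SET city = ? WHERE customer_id = ?", ("Da Nang", 2))
# Delete
conn.execute("DELETE FROM customer WHERE customer_id = ?", (3,))
# Retrieve: a query is a request for data
result = conn.execute("SELECT name, city FROM customer ORDER BY customer_id").fetchall()
print(result)  # [('An', 'Hanoi'), ('Binh', 'Da Nang')]
conn.commit()
```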

• A glance at SQL

Designing databases
• Entity Relationship Diagram (ERD)
• One-to-one
• One-to-many
• Many-to-many
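The sketch below shows one common way to turn the three ERD relationship types into relational tables, using hypothetical tables in SQLite:

- A one-to-one relationship becomes a foreign key declared `UNIQUE`.
- A one-to-many relationship becomes a plain foreign key on the "many" side.
- A many-to-many relationship becomes a separate junction (intersection) table whose primary key combines both foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT,
    dept_id INTEGER REFERENCES department(dept_id));                -- one-to-many
CREATE TABLE parking_spot (spot_id INTEGER PRIMARY KEY,
    employee_id INTEGER UNIQUE REFERENCES employee(employee_id));   -- one-to-one
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE assignment (                                           -- many-to-many
    employee_id INTEGER REFERENCES employee(employee_id),
    project_id  INTEGER REFERENCES project(project_id),
    PRIMARY KEY (employee_id, project_id));
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (10, 'Hoa', 1), (11, 'Khanh', 1);
INSERT INTO project VALUES (100, 'CRM rollout'), (101, 'Data warehouse');
INSERT INTO assignment VALUES (10, 100), (10, 101), (11, 100);
""")

# Many-to-many in use: each project with the number of employees assigned to it.
staffing = conn.execute("""
    SELECT p.title, COUNT(a.employee_id)
    FROM project p JOIN assignment a ON a.project_id = p.project_id
    GROUP BY p.project_id ORDER BY p.project_id
""").fetchall()
print(staffing)  # [('CRM rollout', 2), ('Data warehouse', 1)]
```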

• Data Normalization
• The process of creating small stable data structures from complex groups of data when designing a relational
database
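The simplified Python sketch below illustrates the idea behind normalization, using hypothetical flat order rows:

- Facts that depend only on the customer are stored once per customer.
- Facts that depend only on the order are stored once per order.
- The quantity depends on the order/product pair, so it goes into an order-line table.

This shows the goal of normalization, not a complete normal-form procedure.

```python
# Hypothetical unnormalized rows: customer details repeat on every row.
flat_orders = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Lan", "customer_city": "Hanoi", "product_id": "P1", "quantity": 2},
    {"order_id": 1, "customer_id": "C1", "customer_name": "Lan", "customer_city": "Hanoi", "product_id": "P2", "quantity": 1},
    {"order_id": 2, "customer_id": "C2", "customer_name": "Tuan", "customer_city": "Hue", "product_id": "P1", "quantity": 5},
]

def normalize(rows):
    """Split flat rows into smaller tables, each keyed by its own identifier."""
    customers, orders, order_lines = {}, {}, []
    for r in rows:
        # Customer facts depend only on customer_id, so they are stored once per customer.
        customers[r["customer_id"]] = {"name": r["customer_name"], "city": r["customer_city"]}
        # Order facts depend only on order_id; customer_id becomes a foreign key.
        orders[r["order_id"]] = {"customer_id": r["customer_id"]}
        # Quantity depends on the (order_id, product_id) pair.
        order_lines.append((r["order_id"], r["product_id"], r["quantity"]))
    return customers, orders, order_lines

customers, orders, order_lines = normalize(flat_orders)
print(customers)
print(orders)
print(order_lines)  # [(1, 'P1', 2), (1, 'P2', 1), (2, 'P1', 5)]
```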

• Another practice: normalize a data set with the following fields:
Order_ID, Customer_ID, Customer_FirstName, Customer_LastName, Order_street, Order_city, Order_state, Order_zip, Order_ShipDate, Product_ID, Quantity, Product_type, Customer_phone, Customer_DefaultStreet, Customer_DefaultCity, Customer_DefaultZip
• What are the tables?
• What are the primary keys and foreign keys in the tables?
• What are the relationships between the tables?
• Write a query to display all the products purchased by customers who live in Hanoi, together with these customers’ information.
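The SQLite sketch below gives one possible answer to the practice. It rests on three assumptions:

- An order can contain several products, so `Quantity` belongs to an order line.
- `Product_type` depends only on `Product_ID`.
- "Customers who live in Hanoi" means `Customer_DefaultCity = 'Hanoi'`.

Under those assumptions the design has four tables:

- Customer: primary key `Customer_ID`.
- Product: primary key `Product_ID`.
- Orders: primary key `Order_ID`, foreign key `Customer_ID`.
- Order_Line: composite primary key `Order_ID` + `Product_ID`, each of which is also a foreign key.

Customer to Orders is one-to-many. Orders to Product is many-to-many, resolved by Order_Line. The table is named `Orders` because ORDER is an SQL keyword, and the sample data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    Customer_ID TEXT PRIMARY KEY,
    Customer_FirstName TEXT, Customer_LastName TEXT, Customer_phone TEXT,
    Customer_DefaultStreet TEXT, Customer_DefaultCity TEXT, Customer_DefaultZip TEXT);
CREATE TABLE Product (
    Product_ID TEXT PRIMARY KEY,
    Product_type TEXT);
CREATE TABLE Orders (                 -- "Order" is an SQL keyword
    Order_ID TEXT PRIMARY KEY,
    Customer_ID TEXT REFERENCES Customer(Customer_ID),
    Order_street TEXT, Order_city TEXT, Order_state TEXT, Order_zip TEXT,
    Order_ShipDate TEXT);
CREATE TABLE Order_Line (             -- resolves the many-to-many Orders/Product link
    Order_ID TEXT REFERENCES Orders(Order_ID),
    Product_ID TEXT REFERENCES Product(Product_ID),
    Quantity INTEGER,
    PRIMARY KEY (Order_ID, Product_ID));

-- Hypothetical sample data
INSERT INTO Customer VALUES
    ('C1', 'Lan', 'Nguyen', '0901', '1 Trang Tien', 'Hanoi', '100000'),
    ('C2', 'Minh', 'Tran', '0902', '2 Nguyen Hue', 'Ho Chi Minh City', '700000');
INSERT INTO Product VALUES ('P1', 'Laptop'), ('P2', 'Mouse'), ('P3', 'Monitor');
INSERT INTO Orders VALUES
    ('O1', 'C1', '1 Trang Tien', 'Hanoi', 'HN', '100000', '2024-05-02'),
    ('O2', 'C2', '2 Nguyen Hue', 'Ho Chi Minh City', 'HCM', '700000', '2024-05-03');
INSERT INTO Order_Line VALUES ('O1', 'P1', 1), ('O1', 'P2', 2), ('O2', 'P3', 1);
""")

hanoi_purchases = conn.execute("""
    SELECT DISTINCT c.Customer_ID, c.Customer_FirstName, c.Customer_LastName,
           c.Customer_phone, p.Product_ID, p.Product_type
    FROM Customer c
    JOIN Orders o      ON o.Customer_ID = c.Customer_ID
    JOIN Order_Line ol ON ol.Order_ID   = o.Order_ID
    JOIN Product p     ON p.Product_ID  = ol.Product_ID
    WHERE c.Customer_DefaultCity = 'Hanoi'
    ORDER BY c.Customer_ID, p.Product_ID
""").fetchall()

for row in hanoi_purchases:
    print(row)
```

If "live in Hanoi" should instead mean the shipping address of an order, the `WHERE` clause would test `o.Order_city` rather than the customer's default city.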

Nonrelational Databases
• Nonrelational database management systems use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down.
• Useful for accelerating simple queries against large volumes of structured and unstructured data (web,
social media, graphics, and other forms of data)
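The toy Python sketch below illustrates the flexible data model. It is not a real NoSQL product's API. Documents in one hypothetical collection carry different fields, and a simple query matches on whatever fields a document has. Real nonrelational systems also spread such collections across many machines, which this sketch does not attempt.

```python
# Toy "document store": documents in the same collection need not share fields.
posts = [
    {"_id": 1, "user": "lan", "text": "New phone!", "tags": ["tech"]},
    {"_id": 2, "user": "minh", "image_url": "http://example.com/cat.jpg"},
    {"_id": 3, "user": "lan", "text": "Hanoi today", "location": "Hanoi"},
]

def find(collection, **criteria):
    """Return documents whose fields equal every given criterion (simple key lookups only)."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

print([d["_id"] for d in find(posts, user="lan")])  # [1, 3]
```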

Cloud Databases and Distributed Databases
• Cloud-based data management services have special appeal for businesses seeking database capabilities at a lower cost than in-house database products
• Examples: MySQL, Microsoft Azure SQL Database, Oracle Database, PostgreSQL, Amazon Aurora, MariaDB
• A distributed database is one that is stored in multiple physical locations
• Example: Google
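The sketch below shows hash partitioning, one technique a distributed database can use to decide which physical location stores a record. Callers still see one logical database. The site names and helper functions are hypothetical. Real systems also replicate data and coordinate updates across sites, which this sketch omits.

```python
import hashlib

# Hypothetical sites that each hold part of one logical database.
SITES = ["hanoi", "danang", "hcmc"]
storage = {site: {} for site in SITES}

def site_for(key: str) -> str:
    """Pick a storage site by hashing the record key (hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SITES[int.from_bytes(digest[:8], "big") % len(SITES)]

def put(key, value):
    storage[site_for(key)][key] = value

def get(key):
    # The caller asks one "database"; the lookup is routed to the right site.
    return storage[site_for(key)].get(key)

for i in range(6):
    put(f"customer-{i}", {"id": i})

print(get("customer-4"))                        # {'id': 4}
print({s: len(t) for s, t in storage.items()})  # spread depends on the hash values
```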

• Blockchain
• A distributed database technology that enables firms and organizations to create and verify transactions
on a network nearly instantaneously without a central authority.
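The Python sketch below shows the hash-chaining idea behind blockchain records. Each block stores the SHA-256 hash of the previous block, so altering an earlier transaction is detectable. It deliberately omits the network, the consensus mechanism and the digital signatures that a real blockchain relies on to operate without a central authority. The ledger contents are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 of the block's contents (keys sorted so the hash is deterministic)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def add_block(chain, transactions):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})

def is_valid(chain):
    """Each block must store the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

ledger = []
add_block(ledger, [{"from": "A", "to": "B", "amount": 10}])
add_block(ledger, [{"from": "B", "to": "C", "amount": 4}])
add_block(ledger, [{"from": "C", "to": "A", "amount": 1}])
print(is_valid(ledger))  # True

ledger[1]["transactions"][0]["amount"] = 400  # tamper with a recorded transaction
print(is_valid(ledger))  # False
```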

6.3. Tools and Technologies to Access Information

• Big Data
• Big data refers to data sets so large and complex that traditional data processing software cannot collect, manage, and process them in a reasonable amount of time. These data sets can include structured, unstructured, and semi-structured data.
• Big data is typically characterized by 3Vs:
• Volume: Extremely large data volume
• Variety: Various types of data
• Velocity: The speed at which data needs to be processed and analyzed

• Business Intelligence Infrastructure
• The technology infrastructure used to build and deploy business intelligence (BI) systems. It provides tools and services to collect, manage, analyze, and display business information from different data sources, enabling users to make better decisions based on information that has been processed and analyzed
• Data Warehouse
• A database that stores current and historical data of potential interest to decision makers throughout the company
• Data Mart
• A subset of a data warehouse in which a summarized or highly focused portion of the organization’s data is
placed in a separate database for a specific population of users
• Hadoop
• An open-source software framework managed by the Apache Software Foundation that enables distributed
parallel processing of huge amounts of data across inexpensive computers
• Hadoop consists of several key services, including the Hadoop Distributed File System (HDFS) for data storage
and MapReduce for high-performance parallel data processing
• In-Memory Computing
• Another way of facilitating big data analysis is to use in-memory computing, which relies primarily on a computer’s main memory (RAM) for data storage
• Analytic Platforms
• Commercial database vendors have developed specialized high-speed analytic platforms using both relational and nonrelational technology that are optimized for analyzing large data sets
• Analytic platforms feature preconfigured hardware-software systems that are specifically designed for query processing and analytics
• A data lake is a repository for raw unstructured data, or structured data that for the most part has not yet been analyzed; the data can be accessed in many ways
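The single-machine Python sketch below illustrates the MapReduce model that Hadoop uses, with the classic word count:

1. A map step emits key/value pairs.
2. A shuffle step groups the pairs by key.
3. A reduce step combines each group.

On a Hadoop cluster these steps run in parallel across many computers over data stored in HDFS. The input chunks here are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input split into chunks, as if each were stored on a different node.
chunks = ["big data needs big storage", "data lakes store raw data"]

def map_phase(chunk):
    """Map: emit (word, 1) for each word in one chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key."""
    return {key: sum(values) for key, values in groups.items()}

mapped = chain.from_iterable(map_phase(c) for c in chunks)  # map tasks would run in parallel on a cluster
counts = reduce_phase(shuffle(mapped))
print(counts["data"], counts["big"])  # 3 2
```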

Analytical Tools: Relationships, Patterns, Trends


OLAP
• OLAP (online analytical processing) is a multidimensional data analysis method in which data are organized into different dimensions, such as time, geography, and product, so that users can analyze data from many angles, aggregate it, and build analytical reports that uncover relationships, trends, and fluctuations in the data
• Supports multidimensional data analysis
• Enables users to view the same data in different ways using multiple dimensions (product, pricing, cost, region, or time period)
• Enables users to obtain online answers to ad hoc questions quickly, even over large amounts of data
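The Python sketch below shows OLAP-style multidimensional analysis. It assumes a hypothetical sales data set with three dimensions (product, region, quarter) and one measure (units). `roll_up` aggregates over the chosen dimensions, and `slice_` fixes one dimension to a single value. Real OLAP servers precompute and index such cubes for speed.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units).
sales = [
    ("Nuts", "East", "Q1", 50), ("Nuts", "West", "Q1", 30),
    ("Bolts", "East", "Q1", 20), ("Bolts", "West", "Q2", 40),
    ("Nuts", "East", "Q2", 60), ("Washers", "West", "Q2", 10),
]
DIMS = {"product": 0, "region": 1, "quarter": 2}

def roll_up(facts, *dims):
    """Total the units measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[DIMS[d]] for d in dims)] += row[3]
    return dict(totals)

def slice_(facts, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    return [row for row in facts if row[DIMS[dim]] == value]

print(roll_up(sales, "product"))  # {('Nuts',): 140, ('Bolts',): 60, ('Washers',): 10}
print(roll_up(sales, "region", "quarter"))
print(roll_up(slice_(sales, "region", "East"), "product"))  # {('Nuts',): 110, ('Bolts',): 20}
```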
Data mining
• Provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and
relationships in large databases and inferring rules from them to predict future behavior
• Types of information obtainable from data mining include associations, sequences, classifications, clusters, and
forecasts
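The Python sketch below illustrates associations, one of the data mining outputs listed above. It computes support and confidence for the rule "customers who buy corn chips also buy cola" over hypothetical market baskets. Support is the share of all baskets that contain both items. Confidence is the share of baskets with the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
baskets = [
    {"corn chips", "cola", "salsa"},
    {"corn chips", "cola"},
    {"corn chips", "salsa"},
    {"cola", "bread"},
]

item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

def rule_stats(lhs, rhs):
    """Support and confidence of the association rule lhs -> rhs."""
    both = pair_counts[tuple(sorted((lhs, rhs)))]
    support = both / len(baskets)
    confidence = both / item_counts[lhs]
    return support, confidence

print(rule_stats("corn chips", "cola"))  # (0.5, 0.6666666666666666)
```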
Text mining and web mining
• Text mining is the process of using data mining techniques to analyze and extract information from text documents. These techniques may include parsing, word frequency analysis, semantic analysis, text similarity analysis, and opinion (sentiment) analysis. They are applied to documents such as emails, newspapers, magazines, advertisements, news, books, and websites.
• Web mining is the process of applying data mining to the web, which includes collecting and analyzing information from websites and other data sources on the internet. Web mining techniques include web content mining, web structure mining, and web usage mining (analyzing how users interact with a website).
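The Python sketch below shows a minimal form of text mining: word frequency analysis over hypothetical customer e-mails. The small hand-picked stopword list is an assumption. Real tools use larger lists and richer techniques such as semantic or sentiment analysis.

```python
import re
from collections import Counter

# Hypothetical customer e-mails.
emails = [
    "The delivery was late and the package was damaged.",
    "Great product, but delivery took two weeks.",
    "Late delivery again! Please improve shipping.",
]
STOPWORDS = {"the", "and", "was", "but", "took", "two", "again", "please"}

def word_frequencies(docs):
    """Tokenize, lowercase, drop stopwords, and count term frequencies."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

print(word_frequencies(emails).most_common(2))  # [('delivery', 3), ('late', 2)]
```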

6.4. Data Governance and Data Quality Assurance

Data governance – key components of data management


• Data governance encompasses organizational policies and procedures for the maintenance, distribution,
and use of information in the organization
• Establishes the organization’s rules for sharing, disseminating, acquiring, standardizing, classifying, and
inventorying information.
• Components
• Data quality, data security, and data privacy
• Purpose
• Ensure that data is accurate, consistent, and trustworthy and that it is used in a way that complies with legal and
regulatory requirements.

Data quality audit


• Definition: a process of evaluating the accuracy, completeness, consistency, and reliability of an
organization's data to identify issues or problems with the data, and to develop a plan to address them.
• The audit typically involves a review of the data sources, data collection processes, data storage and
management systems, and data usage and reporting practices
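The Python sketch below computes a few measures a data quality audit might report on hypothetical customer records:

- Completeness: the share of non-empty e-mails.
- Validity: the share of e-mails matching a deliberately simple pattern.
- Uniqueness: any repeated IDs.
- Consistency: different spellings of the same city.

The field names and rules are assumptions chosen for illustration.

```python
import re

# Hypothetical customer records pulled from a source system.
records = [
    {"id": 1, "email": "lan@example.com", "city": "Hanoi"},
    {"id": 2, "email": "", "city": "Hue"},
    {"id": 3, "email": "minh@example", "city": "hanoi"},
    {"id": 1, "email": "lan@example.com", "city": "Hanoi"},
]
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple validity rule

def audit(rows):
    """Report simple completeness, validity, uniqueness, and consistency measures."""
    filled = [r for r in rows if r["email"]]
    ids = [r["id"] for r in rows]
    return {
        "email_completeness": len(filled) / len(rows),
        "email_validity": sum(bool(EMAIL_RE.match(r["email"])) for r in filled) / len(filled),
        "duplicate_ids": sorted({i for i in ids if ids.count(i) > 1}),
        "city_spellings": sorted({r["city"] for r in rows}),  # variants hint at inconsistency
    }

print(audit(records))
```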
Data cleansing
• Data cleansing is the process of preparing data for analysis by removing irrelevant, incorrect, incomplete, or misleading information that could skew results and lead to erroneous or unrealistic decisions.
• Data cleansing involves not only removing unnecessary data but also correcting incorrect information in the data set and removing duplicates
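The Python sketch below applies four cleansing steps to hypothetical records:

- Whitespace in names is standardized.
- City names are corrected against an assumed reference list.
- Records missing a name are dropped.
- Duplicate IDs are removed, keeping the first occurrence.

Whether to drop, flag, or repair incomplete records is a policy decision; this sketch assumes dropping.

```python
# Hypothetical raw records with formatting errors, a missing value, and a duplicate.
raw = [
    {"id": 1, "name": "  Nguyen Lan ", "city": "hanoi"},
    {"id": 2, "name": "Tran Minh", "city": "HCMC"},
    {"id": 1, "name": "Nguyen Lan", "city": "Hanoi"},
    {"id": 3, "name": "", "city": "Hue"},
]
CITY_NAMES = {"hanoi": "Hanoi", "hcmc": "Ho Chi Minh City", "hue": "Hue"}  # assumed reference list

def cleanse(rows):
    """Standardize values, drop incomplete records, and remove duplicate ids."""
    seen, clean = set(), []
    for r in rows:
        name = " ".join(r["name"].split())                           # trim and collapse whitespace
        city = CITY_NAMES.get(r["city"].strip().lower(), r["city"])  # correct to the standard name
        if not name:             # incomplete record: no name
            continue
        if r["id"] in seen:      # duplicate record: keep the first occurrence
            continue
        seen.add(r["id"])
        clean.append({"id": r["id"], "name": name, "city": city})
    return clean

print(cleanse(raw))
# [{'id': 1, 'name': 'Nguyen Lan', 'city': 'Hanoi'}, {'id': 2, 'name': 'Tran Minh', 'city': 'Ho Chi Minh City'}]
```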
