
8/05/2023

EM4218E - MANAGEMENT INFORMATION SYSTEMS


Chapter 6
Foundations of Business Intelligence:
Database and Information Management

EM4218E @ Assoc. Prof. Dr. Pham Thi Thanh Hong



LEARNING OBJECTIVES

After completing this chapter, students will be able to:


• Describe the problems of managing data resources in a traditional file environment
• Identify the major capabilities and potential benefits of a DBMS
• Understand and be able to use tools and technologies for accessing information from
databases to improve business performance and decision making
• Explain why data governance and data quality assurance are essential for managing the
firm’s data resources


CONTENTS

6.1. Challenges in Managing Data Resources


6.2. DBMS: Concepts and Roles
6.3. Tools and Technologies to Access Information
6.4. Data Governance and Data Quality Assurance


6.1. Challenges in Managing Data Resources

• File Organization Concepts and Services


• Data Hierarchy
• Bit: represents the smallest unit of data a computer
can handle
• Byte: group of bits, which represents a single
character, which can be a letter, a number, or
another symbol.
• Field: a grouping of characters into a word, a group of
words, or a complete number (such as a person’s
name or age)
• Record: group of related fields (such as the
student’s name, the course taken, the date, and the
grade)
• File: group of records of the same type
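The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the 8-bit byte and ASCII encoding are standard, but the `EnrollmentRecord` class and the sample rows are made up for this example.

```python
from dataclasses import dataclass

# Bit -> byte: eight bits form one byte, which encodes one character.
bits = "01000001"
byte_value = int(bits, 2)      # the byte as a number
character = chr(byte_value)    # the character that byte represents in ASCII

# Field -> record: a record is a group of related fields.
@dataclass
class EnrollmentRecord:        # hypothetical record layout
    student_name: str
    course: str
    date: str
    grade: str

# Record -> file: a file is a group of records of the same type.
enrollment_file = [
    EnrollmentRecord("Student A", "EM4218E", "2023-05-08", "A"),  # made-up rows
    EnrollmentRecord("Student B", "EM4218E", "2023-05-08", "B"),
]

print(character, len(enrollment_file))
```

Each level only groups the level below it, which is why a change in record layout affects every program that reads the file directly.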


6.1. Challenges in Managing Data Resources

• Data redundancy and inconsistency: the presence of duplicate data in multiple data files, so that the same data are stored in more than one place or location.
• Program-data dependency: the coupling of data stored in files and the specific programs required to update and maintain those files, such that changes in programs require changes to the data.
• Lack of flexibility: a traditional file system cannot deliver ad hoc reports or respond to unanticipated information requirements in a timely fashion.
• Poor security: management might have no way of knowing who is accessing or even making changes to the organization’s data.
• Lack of data sharing and availability: disconnected information sources in an organization lead to unreliable data and hinder information sharing.


6.1. Challenges in Managing Data Resources

• File Organization Concepts and Services


• Data Hierarchy
• Traditional File Processing
• Data Redundancy and Inconsistency
• Program-Data Dependency
• Lack of Flexibility
• Poor Security
• Lack of Data Sharing and
Availability


6.2. DBMS: Concepts and Roles

• A DBMS is software that enables an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs.
• A DBMS acts as an interface between application programs and the physical data files.
• A relational DBMS represents data as two-dimensional tables: rows are records, columns are fields.
• Primary key: a field that uniquely identifies each row of the table. Primary key values cannot be duplicated.
• Foreign key: a field in one table that refers to the primary key of another table, so that one table can look up values in the other.
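A minimal sketch of primary and foreign keys, using Python's built-in `sqlite3` module. The `supplier` and `part` tables and their rows are hypothetical and are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# supplier_id is the primary key: it uniquely identifies each supplier row.
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# part.supplier_id is a foreign key: it points at a row in supplier.
conn.execute("""
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
    )
""")
conn.execute("INSERT INTO supplier VALUES (1, 'Supplier A')")
conn.execute("INSERT INTO part VALUES (100, 'Door latch', 1)")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO supplier VALUES (1, 'Supplier B')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key lets the part table look up the supplier's name.
row = conn.execute("""
    SELECT part.description, supplier.name
    FROM part JOIN supplier ON part.supplier_id = supplier.supplier_id
""").fetchone()

print(duplicate_rejected, row)
```

Storing the supplier name once and referencing it by key is what removes the duplicated data found in a traditional file environment.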


6.2. DBMS: Concepts and Roles

• A DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same data are repeated
• A DBMS includes capabilities and tools for organizing, managing, and
accessing the data in the database
• Data definition language
• Data dictionary
• Data manipulation language


6.2. DBMS: Concepts and Roles

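• Data definition language: specifies the structure of the content of the database, such as its tables and the characteristics of their fields
• Data dictionary: an automated or manual file that stores definitions of data elements and their characteristics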
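As a rough illustration of these two capabilities, the sketch below defines a table with a data definition statement and then reads back the definitions SQLite keeps about it, which plays the role of a small data dictionary. The `student` table is hypothetical; `PRAGMA table_info` is SQLite-specific, and other DBMS products expose this metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the structure of the table and its fields.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        full_name  TEXT NOT NULL,
        age        INTEGER
    )
""")

# Data dictionary: read the stored definitions of each data element.
# Each row is (column position, name, type, not-null flag, default, primary-key flag).
columns = conn.execute("PRAGMA table_info(student)").fetchall()
dictionary = {
    name: {"type": col_type, "required": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in columns
}

for field, definition in dictionary.items():
    print(field, definition)
```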


6.2. DBMS: Concepts and Roles

• Data manipulation language (e.g., SQL, Structured Query Language) is used to add, change, delete, and retrieve the data in the database.
• A query is a request for data from a database
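The four operations named above map onto the SQL statements INSERT, UPDATE, DELETE, and SELECT. A minimal sketch with Python's `sqlite3` module follows; the `student` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, full_name TEXT, grade TEXT)")

# Add: insert new records.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Student A", "B"), (2, "Student B", "C"), (3, "Student C", "A")],
)

# Change: update a field in an existing record.
conn.execute("UPDATE student SET grade = ? WHERE student_id = ?", ("A", 1))

# Delete: remove a record.
conn.execute("DELETE FROM student WHERE student_id = ?", (2,))

# Retrieve: a query is a request for data that meets a condition.
rows = conn.execute(
    "SELECT full_name, grade FROM student WHERE grade = 'A' ORDER BY student_id"
).fetchall()

print(rows)
```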


6.2. DBMS: Concepts and Roles

Designing a Database
• Entity Relationship Diagram (ERD)
• One-to-one
• One-to-many
• Many-to-many
• Data Normalization
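One common design step is to resolve a many-to-many relationship into two one-to-many relationships through a linking table, which also keeps the design normalized. The sketch below assumes a hypothetical student/course example; apart from the course code EM4218E, all names and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A student takes many courses and a course has many students (many-to-many).
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);

    -- The linking table turns it into two one-to-many relationships.
    -- Student and course details are stored once, not repeated per enrollment.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  TEXT    NOT NULL REFERENCES course(course_id),
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Student A'), (2, 'Student B');
    INSERT INTO course  VALUES ('EM4218E', 'Management Information Systems'),
                               ('X100', 'Hypothetical Course');
    INSERT INTO enrollment VALUES (1, 'EM4218E', 'A'), (1, 'X100', 'B'), (2, 'EM4218E', 'B');
""")

courses_per_student = conn.execute("""
    SELECT s.full_name, COUNT(*)
    FROM enrollment e JOIN student s ON s.student_id = e.student_id
    GROUP BY s.student_id
    ORDER BY s.student_id
""").fetchall()

print(courses_per_student)
```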


6.2. DBMS: Concepts and Roles

Nonrelational Databases
• Nonrelational database management systems use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down.
• They are useful for accelerating simple queries against large volumes of structured and unstructured data (web, social media, graphics, and other forms of data).
Cloud Databases and Distributed Databases
• Cloud-based data management services have special appeal for businesses seeking database capabilities at a lower cost than in-house database products.
• Examples: MySQL, Microsoft Azure SQL Database, Oracle Database, PostgreSQL, Amazon Aurora, MariaDB
• A distributed database is one that is stored in multiple physical locations.
• Example: Google


6.2. DBMS: Concepts and Roles

Blockchain is a distributed database technology that enables firms and organizations to create and verify transactions on a network nearly instantaneously without a central authority.
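The sketch below shows only one ingredient of a blockchain, namely how each block stores the hash of the previous block so that tampering can be detected. It is a simplified single-machine illustration: real blockchains also need a network of nodes, a consensus mechanism, and digital signatures, none of which appear here. All function names and transactions are invented.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, transactions, previous_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def add_block(chain, transactions):
    """Append a block that is linked to the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transactions, previous_hash),
    })

def is_valid(chain):
    """Recompute every hash and check every link between blocks."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else GENESIS_HASH
        if block["previous_hash"] != expected_previous:
            return False
        if block["hash"] != block_hash(block["index"], block["transactions"], block["previous_hash"]):
            return False
    return True

chain = []
add_block(chain, ["A pays B 5"])
add_block(chain, ["B pays C 2"])
valid_before = is_valid(chain)

chain[0]["transactions"] = ["A pays B 500"]  # tamper with an earlier block
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

Because any participant can rerun `is_valid` on their own copy, verification does not depend on a central record keeper.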


6.3. Tools and Technologies to Access Information

• Big Data
• Big data are data sets so large and complex that traditional data processing software cannot
collect, manage, and process them in a reasonable amount of time. These data sets can include
structured, unstructured, and semi-structured data.
• Big data is typically characterized by 3Vs:
• Volume: Extremely large data volume
• Variety: Various types of data
• Velocity: The speed at which data needs to be processed and analyzed
• Business Intelligence Infrastructure
• Analytical Tools


6.3. Tools and Technologies to Access Information

• Big Data
• Business Intelligence Infrastructure
• The technology infrastructure used to build and deploy business intelligence (BI) systems.
It provides tools and services to collect, manage, analyze, and display business information
from different data sources, enabling users to make better decisions based on information
that has been processed and analyzed.
• A data warehouse is a database that stores current and historical data of potential interest to
decision makers throughout the company.
• A data mart is a subset of a data warehouse in which a summarized or highly focused portion of
the organization’s data is placed in a separate database for a specific population of users.
• Hadoop
• Hadoop is an open-source software framework managed by the Apache Software Foundation that enables
distributed parallel processing of huge amounts of data across inexpensive computers
• Hadoop consists of several key services, including the Hadoop Distributed File System (HDFS) for data
storage and MapReduce for high-performance parallel data processing
• In-Memory Computing
• Analytic Platform
• Analytical Tools
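The MapReduce idea mentioned under Hadoop above can be sketched in plain Python. This is not the Hadoop API: it runs in a single process and only illustrates the map, shuffle, and reduce steps that Hadoop distributes across many machines. The function names and sample documents are invented.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data big insight", "data warehouse data mart"]

# In Hadoop each document (or block of a file) would be mapped on a different node.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(mapped))

print(counts)
```

The map step never needs to see more than one document at a time, which is what makes the work easy to spread over inexpensive computers.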


6.3. Tools and Technologies to Access Information

• Big Data
• Business Intelligence Infrastructure
• Data Warehouses and Data Marts
• Hadoop
• In-Memory Computing
• Another way of facilitating big data analysis is to use in-memory computing, which relies
primarily on a computer’s main memory (RAM) for data storage
• Analytic Platforms
• Commercial database vendors have developed specialized high-speed analytic platforms using
both relational and nonrelational technology that are optimized for analyzing large data sets.
• Analytic platforms feature preconfigured hardware-software systems that are specifically
designed for query processing and analytics
• A data lake is a repository for raw unstructured or structured data that for the most part
have not yet been analyzed; the data can be accessed in many ways
• Analytical Tools


6.3. Tools and Technologies to Access Information

Analytical Tools: Relationships, Patterns, Trends


OLAP
• OLAP (Online Analytical Processing) is a multidimensional data analysis method in which data are
organized into dimensions such as time, geography, and product, allowing users to analyze the data
from many different angles, build analytical reports and aggregated summaries, and uncover
relationships, trends, and fluctuations in the data
• Supports multidimensional data analysis
• Enables users to view the same data in different ways using multiple dimensions (product, pricing,
cost, region, or time period)
• Enables users to obtain online answers to ad hoc questions in a short amount of time
Data mining
• Provides insights into corporate data that cannot be obtained with OLAP by finding hidden
patterns and relationships in large databases and inferring rules from them to predict future
behavior
• Types of information obtainable from data mining include associations, sequences, classifications,
clusters, and forecasts
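The multidimensional view described under OLAP above can be sketched as a roll-up of the same facts along different dimensions. The sales figures, dimension names, and the `roll_up` helper are invented for illustration; OLAP products precompute and store such aggregates at much larger scale.

```python
from collections import defaultdict

# Each fact has three dimensions (product, region, quarter) and one measure (units).
sales = [
    {"product": "Bolt", "region": "East", "quarter": "Q1", "units": 120},
    {"product": "Bolt", "region": "West", "quarter": "Q1", "units": 80},
    {"product": "Nut",  "region": "East", "quarter": "Q1", "units": 50},
    {"product": "Nut",  "region": "East", "quarter": "Q2", "units": 70},
    {"product": "Bolt", "region": "East", "quarter": "Q2", "units": 100},
]

def roll_up(facts, dimensions, measure="units"):
    """Total the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)

# The same data viewed from two different angles.
by_product = roll_up(sales, ["product"])
by_region_quarter = roll_up(sales, ["region", "quarter"])

print(by_product)
print(by_region_quarter)
```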
Text mining and web mining


6.3. Tools and Technologies to Access Information

Analytical Tools: Relationships, Patterns, Trends


OLAP
Data mining
Text mining and web mining
• Text mining is the process of using data mining techniques to analyze and find information in
text documents. These techniques may include parsing, word frequency analysis, semantic
analysis, text similarity analysis, opinion (sentiment) analysis, and information extraction from
documents such as emails, newspapers, magazines, advertisements, news, books, and websites.
• Web mining is the process of data mining on the web, which includes the collection and
analysis of information from websites and other data sources on the internet. Web mining
techniques include web content mining, web structure mining, and web usage mining
(analysis of user interaction with the website).
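Word frequency analysis, one of the text mining techniques listed above, can be sketched with the Python standard library. The stop-word list and the sample customer comments are invented for illustration; production text mining uses far richer language processing.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list of common words to ignore.
STOP_WORDS = {"the", "a", "is", "and", "was", "but", "to", "of"}

def word_frequencies(documents):
    """Parse each document into lowercase words and count the meaningful ones."""
    counter = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        counter.update(token for token in tokens if token not in STOP_WORDS)
    return counter

reviews = [
    "The delivery was late and the support was slow.",
    "Great bike, but delivery was late.",
    "Support is great.",
]

freq = word_frequencies(reviews)
print(freq.most_common(4))
```

Even this simple count surfaces recurring themes (here, late delivery) that would be hard to spot by reading thousands of comments one by one.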


6.4. Data Governance and Data Quality Assurance

Data Governance – Key component of Data Management


• Data governance encompasses organizational policies and procedures for the
maintenance, distribution, and use of information in the organization
• Establishes the organization’s rules for sharing, disseminating, acquiring,
standardizing, classifying, and inventorying information.
• Components: data quality, data security, and data privacy
• Purpose: ensure that data is accurate, consistent, and trustworthy and that it is
used in a way that complies with legal and regulatory requirements.


6.4. Data Governance and Data Quality Assurance

• Incomplete: some data points are missing or not recorded. This can happen for a variety of reasons, such as data entry errors, data corruption, or missing information.
• Inaccurate: data is incorrect or contains errors, due to human error or technical glitches.
• Inconsistent: data is recorded differently across different sources or does not match up with other data in a dataset.
• Duplicated: the same data is recorded multiple times, taking up unnecessary storage space and increasing processing time.
• Outdated: data is no longer relevant or up to date, which can lead to incorrect analysis and decision-making.


6.4. Data Governance and Data Quality Assurance

Data Quality Audit


• Definition: a process of evaluating the accuracy, completeness, consistency, and
reliability of an organization's data to identify issues or problems with the data,
and to develop a plan to address them.
• The audit typically involves a review of the data sources, data collection
processes, data storage and management systems, and data usage and reporting
practices
Data Cleansing
• Data cleansing is the process of preparing data for analysis by removing
irrelevant, incorrect, incomplete, or misleading information that could skew
results and lead to erroneous or unrealistic decisions.
• Data cleansing refers not only to the removal of unnecessary pieces of data but
also to correcting incorrect information in the data set and
reducing duplicates
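A small sketch of an audit followed by cleansing, in plain Python. The customer records, field names, and rules (required fields, duplicate detection by `customer_id`, lowercase emails, title-case cities) are assumptions chosen for illustration, not a prescribed procedure.

```python
records = [
    {"customer_id": 1, "email": "a@example.com",  "city": "Hanoi"},
    {"customer_id": 2, "email": None,             "city": "Hanoi"},   # incomplete
    {"customer_id": 3, "email": "C@Example.com ", "city": "hanoi"},   # inconsistent format
    {"customer_id": 1, "email": "a@example.com",  "city": "Hanoi"},   # duplicate
]
REQUIRED = ["email", "city"]

def is_incomplete(row, required):
    return any(row.get(field) in (None, "") for field in required)

def audit(rows, required):
    """Data quality audit: count the problems without changing the data."""
    seen, duplicates = set(), 0
    for row in rows:
        if row["customer_id"] in seen:
            duplicates += 1
        seen.add(row["customer_id"])
    incomplete = sum(1 for row in rows if is_incomplete(row, required))
    return {"rows": len(rows), "incomplete": incomplete, "duplicates": duplicates}

def cleanse(rows, required):
    """Data cleansing: drop incomplete and duplicate rows, standardize the rest."""
    cleaned, seen = [], set()
    for row in rows:
        if is_incomplete(row, required) or row["customer_id"] in seen:
            continue
        seen.add(row["customer_id"])
        cleaned.append({
            "customer_id": row["customer_id"],
            "email": row["email"].strip().lower(),
            "city": row["city"].strip().title(),
        })
    return cleaned

report = audit(records, REQUIRED)
cleaned = cleanse(records, REQUIRED)
print(report)
print(cleaned)
```

Running the audit first gives a measure of how bad the data are before anything is removed or corrected.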


Individual Assignment

Sylvester’s Bike Shop, located in San Francisco, California, sells road, mountain,
hybrid, leisure, and children’s bicycles. Currently, Sylvester’s purchases bikes from
three suppliers but plans to add new suppliers in the near future.
You are assigned to:
1. Build a simple relational database to manage information about Sylvester’s
suppliers and products.
2. Once you have built the database, perform the following activities.
• Prepare a report that identifies the five most expensive bicycles. The report should list the
bicycles in descending order from most expensive to least expensive, the quantity on hand for
each, and the markup percentage for each.
• Prepare a report that lists each supplier, its products, the quantities on hand, and associated
reorder levels. The report should be sorted alphabetically by supplier. For each supplier, the
products should be sorted alphabetically.
• Prepare a report listing only the bicycles that are low in stock and need to be reordered. The
report should provide supplier information for the items identified.
• Write a brief description of how the database could be enhanced to further improve
management of the business. What tables or fields should be added? What additional reports
would be useful?
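One possible starting point for the assignment is sketched below with Python's `sqlite3` module. Everything in it is an assumption to be adapted: the table and field names, the sample suppliers and bicycles, the markup formula (selling price minus purchase cost, as a percentage of purchase cost), and the rule that an item is low in stock when its quantity on hand is at or below its reorder level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        phone       TEXT
    );
    CREATE TABLE product (
        product_id    INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        category      TEXT NOT NULL,
        purchase_cost REAL NOT NULL,
        selling_price REAL NOT NULL,
        qty_on_hand   INTEGER NOT NULL,
        reorder_level INTEGER NOT NULL,
        supplier_id   INTEGER NOT NULL REFERENCES supplier(supplier_id)
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO supplier VALUES (1, 'Supplier A', '555-0101'), (2, 'Supplier B', '555-0102');
    INSERT INTO product VALUES
        (1, 'Road 100',  'road',     600, 900,  4,  5, 1),
        (2, 'Trail 200', 'mountain', 800, 1000, 10, 3, 1),
        (3, 'Kid 12',    'children', 100, 150,  2,  4, 2),
        (4, 'City 50',   'hybrid',   400, 500,  6,  2, 2);
""")

# Report 1: the five most expensive bicycles, with quantity on hand and markup %.
most_expensive = conn.execute("""
    SELECT name, selling_price, qty_on_hand,
           ROUND((selling_price - purchase_cost) * 100.0 / purchase_cost, 1) AS markup_pct
    FROM product
    ORDER BY selling_price DESC
    LIMIT 5
""").fetchall()

# Report 2: each supplier and its products, sorted by supplier and then by product.
by_supplier = conn.execute("""
    SELECT s.name, p.name, p.qty_on_hand, p.reorder_level
    FROM supplier s JOIN product p ON p.supplier_id = s.supplier_id
    ORDER BY s.name, p.name
""").fetchall()

# Report 3: bicycles that need to be reordered, with supplier information.
to_reorder = conn.execute("""
    SELECT p.name, p.qty_on_hand, p.reorder_level, s.name, s.phone
    FROM product p JOIN supplier s ON s.supplier_id = p.supplier_id
    WHERE p.qty_on_hand <= p.reorder_level
    ORDER BY p.name
""").fetchall()

for title, rows in [("Most expensive", most_expensive),
                    ("By supplier", by_supplier),
                    ("To reorder", to_reorder)]:
    print(title)
    for row in rows:
        print("  ", row)
```

Keeping suppliers in their own table means that adding a new supplier later requires no change to the product table's structure.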


@ Assoc Prof Pham Thi Thanh Hong


Email: hong.phamthithanh@hust.edu.vn
