Chapter 2 - Data and Knowledge Management - NOTES
MANAGEMENT
INFORMATION SYSTEM
Chapter 2 -Data and Knowledge Management
The chapter emphasizes the need for and importance of data and knowledge
management, along with business intelligence, in the domain of management
information systems.
ii. Rows. A row is like a record: it holds the data for multiple columns (like
the rows numbered 1, 2, 3, etc. in a spreadsheet). A row can be made up of as
many or as few columns as you want, which makes reading data much more
efficient because you fetch only what you want.
iii. Tables. A table is a logical group of columns. For example, you may have a
table that stores details of customers' names and addresses. Another table
would be used to store details of parts and yet another would be used for
suppliers' names and addresses.
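The relationship between tables, rows and columns can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample customers are assumptions made up for illustration; they do not come from the notes.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table groups columns; here a hypothetical customers table.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)

# Each inserted tuple becomes one row (record) of the table.
conn.executemany(
    "INSERT INTO customers (name, address) VALUES (?, ?)",
    [("Asha Rao", "12 Lake Road"), ("Ben Cole", "7 Hill Street")],
)

# Fetch only the column that is needed instead of the whole record.
names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]
print(names)
```

A parts table or a suppliers table would be created in the same way, each with its own set of columns.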
ii. Shared: Data in a database are shared among different users and
applications.
ix. Easily Accessible: It should be available when and where it is needed i.e. it
should be easily accessible.
vi. The goal of every organization and expert is the same: to get the maximum out
of the data. However, the route and the starting point are different for each
organization and expert.
vii. As organizations evaluate and architect big data solutions, they are
also learning about the approaches and opportunities related to Big Data.
viii. There is no single solution to Big Data, and no single vendor
can claim to know all about Big Data.
ix. Big Data is too big a concept, and there are many players: different
architectures, different vendors and different technologies.
Volume
i. Data storage is growing exponentially because data is now much more than
text data.
ii. Data can be found in the form of videos, music and large images on
our social media channels.
iii. It is very common for enterprises to have storage systems of terabytes and
petabytes.
iv. As the database grows, the applications and architecture built to support the
data need to be re-evaluated quite often.
v. Sometimes the same data is re-evaluated from multiple angles, and even
though the original data is the same, the newfound intelligence creates an
explosion of data.
vi. This big volume indeed represents Big Data.
Velocity
i. The data growth and social media explosion have changed how we look at
data. There was a time when we believed that yesterday's data was recent.
ii. As a matter of fact, newspapers still follow that logic.
iii. However, news channels and radio have changed how fast we receive the
news.
iv. Today, people rely on social media to update them on the latest happenings.
On social media, a message that is even a few seconds old (a tweet, a status
update, etc.) may no longer interest users.
v. They often discard old messages and pay attention to recent updates.
vi. Data movement is now almost real time, and the update window has
shrunk to fractions of a second.
vii. This high-velocity data represents Big Data.
Variety
i. Data can be stored in multiple formats, for example in a database, Excel, CSV
or Access, or, for that matter, in a simple text file.
ii. Sometimes the data is not even in a traditional format; it may
be in the form of video, SMS, PDF or something we have not thought
of. The organization needs to arrange it and make it meaningful.
iii. This would be easy if all the data were in the same format, but that is not
the case most of the time.
iv. The real world has data in many different formats, and that is the challenge we
need to overcome with Big Data.
v. This variety of data represents Big Data.
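The variety problem described above can be sketched in a few lines of Python: the same kind of information arrives as CSV and as JSON and is arranged into one common record format. The field names and sample values are assumptions chosen for illustration only.

```python
import csv
import io
import json

# The same kind of information arriving in two different formats.
csv_text = "name,amount\nAsha,120\nBen,80\n"
json_text = '[{"customer": "Chen", "total": 45.5}]'


def from_csv(text):
    """Read CSV rows into the common {name, amount} record format."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": r["name"], "amount": float(r["amount"])} for r in reader]


def from_json(text):
    """Read JSON objects, which use different field names, into the same format."""
    return [{"name": r["customer"], "amount": float(r["total"])} for r in json.loads(text)]


# Once every source is converted, the data can be processed uniformly.
records = from_csv(csv_text) + from_json(json_text)
print(records)
```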
Data warehouses have no standard definition, and the people who work on the
subject have defined them in many ways, as follows:
[1] “The basic data warehouse architecture interposes between end-user desktops and
production data sources a warehouse that we usually think of as a single, large
system maintaining an approximation of an enterprise data model.”
[2] “A data warehouse is a copy of transaction data specifically structured for querying
and reporting.”
[3] “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making process.”
These data are obtained from different operational sources and kept in a separate physical
store. A data warehouse is not only a relational database that contains historical data
derived from transactional data; it is also an environment that includes all the
operations and applications needed to manage the process of gathering data and delivering it
to business users, such as an extraction, transportation, transformation, and loading (ETL)
solution, an online analytical processing (OLAP) engine and client analysis tools.
contains a historical record of sales over specific time intervals. If designed well,
subject-oriented data provides a stable image of business processes, independent
of legacy systems. In other words, it captures the basic nature of the business
environment.
ii. Integrated: A data warehouse consists of different kinds of data, which are collected
from separate legacy systems, and this can create conflicts and inconsistencies
among units of measure.
iii. Because of this, the data have to be put into a consistent format, and in this way they
become integrated.
iv. Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should
not change. This is logical because the purpose of a warehouse is to enable a user
to analyze what has occurred. New data is always appended to the database, rather
than replacing existing data. The database continually absorbs new data, integrating it with the
previous data.
v. Time variant: Operational data and informational data differ in terms of time
variance. Operational data is valid only at the moment of access, capturing a
moment in time. When performance requirements demand it, historical data is
needed. Because data warehouse data represents data over a long time horizon,
historical analysis can be performed easily.
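The integrated, nonvolatile and time-variant properties can be illustrated together with a minimal Python sketch. The record layout, the unit names and the `load` helper are hypothetical; a real warehouse would use a database rather than a list.

```python
from datetime import date

LB_TO_KG = 0.45359237  # one pound expressed in kilograms

warehouse = []  # rows are only ever appended, never updated (nonvolatile)


def integrate(record):
    """Put a legacy record into one consistent unit of measure (kg)."""
    weight = record["weight"]
    if record["unit"] == "lb":
        weight = weight * LB_TO_KG
    elif record["unit"] != "kg":
        raise ValueError(f"unknown unit: {record['unit']}")
    return {"item": record["item"], "weight_kg": round(weight, 3)}


def load(record, as_of):
    """Append the integrated row with its date so history is kept (time variant)."""
    row = integrate(record)
    row["as_of"] = as_of
    warehouse.append(row)


# Two legacy systems report the same item in different units at different times.
load({"item": "bolt", "weight": 2.0, "unit": "kg"}, date(2023, 1, 1))
load({"item": "bolt", "weight": 5.0, "unit": "lb"}, date(2023, 2, 1))

history = [r for r in warehouse if r["item"] == "bolt"]
print(history)
```

Because the second load does not overwrite the first, both values remain available for historical analysis.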
2.3.2 The Goals of a Data Warehouse
i. Source System
A source system, also called a legacy system, captures business data and
transactions. A source system has to maintain high uptime and availability, and it offers a chance
to share basic dimensions such as product and customer with other legacy systems in the
organization. It is the largest source of data for analysis systems; therefore, creating
queries and management reports directly from these systems is a burden.
ii. Data Staging Area
A data staging area is an initial storage area where a set of processes (clean,
transform, combine, de-duplicate, household, archive) is performed on the data
to prepare it for use in the data warehouse. The data staging area acts as a bridge
between the source system and the presentation server. A data staging area can be
spread over a number of machines and does not need to be based on relational
technology. Unlike the presentation server, which is described below, the main
restriction of the data staging area is that it never provides query and presentation
services.
iii. Presentation Server
A presentation server is a physical machine that stores the processed data for the
end user’s querying and reporting requirements. It is fed from the data staging area. If
the queryable presentation resource for an enterprise’s data is organized around an
entity-relation model, understandability and performance will be lost. If the
presentation server presents and stores data in a dimensional framework, the
tables will be organized as a star schema.
iv. Dimensional Model
A dimensional model, which is designed to provide higher query performance and
resilience to change and to be more understandable, is an alternative to the
entity-relation model. The dimensional model consists of a fact table and dimension
tables.
A fact table contains measurements of the business that are preferably numeric
and additive. It has a set of two or more foreign keys that join
the dimension tables to the fact table.
A dimension table is complementary to the fact table. Most dimension tables have many
textual attributes. Each also has a primary key that relates it to the fact
table.
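A fact table joined to its dimension tables forms a star schema. The following is a minimal sketch using Python's built-in sqlite3 module; the table names (`fact_sales`, `dim_product`, `dim_customer`) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: a primary key plus textual attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: two foreign keys plus a numeric, additive measurement.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount REAL
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Book');
INSERT INTO dim_customer VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 50.0), (1, 2, 20.0);
""")

# Join the fact table to a dimension and add up the additive measure.
sales_by_product = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(sales_by_product)
```

The same fact table could be grouped by city instead by joining to `dim_customer`, which is what makes the measure useful along every dimension.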
v. Data Mart
A data mart is a logical subset of the complete data warehouse, prepared for a
single business process in an organization. When data marts come together, an
integrated enterprise data warehouse is formed. Data marts must be built from
shared dimensions and facts. In this way they can be combined and used together.
vi. OLAP (On-Line Analytic Processing)
OLAP enables end users to query and present text and numeric data from data
warehouses. OLAP technology is based on a multidimensional cube of
data, and OLAP databases have a multidimensional structure.
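The idea of a multidimensional cube can be sketched in plain Python by aggregating a measure along any chosen subset of dimensions. This is only an illustration of the concept, not how an OLAP engine is implemented; the dimension names and the sample facts are assumptions.

```python
from collections import defaultdict

# Each fact is (year, city, product, units sold); the values are made up.
facts = [
    ("2023", "Pune", "Pen", 10),
    ("2023", "Delhi", "Pen", 20),
    ("2024", "Pune", "Book", 50),
    ("2024", "Pune", "Pen", 5),
]
DIMENSIONS = ("year", "city", "product")


def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


by_year = roll_up(facts, ["year"])
by_city_product = roll_up(facts, ["city", "product"])
print(by_year)
print(by_city_product)
```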
vii. End User Application
These applications help end users to prepare queries, perform analysis and carry out
other activities that support business needs. Examples are the end user data
access tool and the ad hoc query tool.
An end user data access tool works with a SQL session and provides the user with a report,
a screen of data or another form of analysis.
An ad hoc query tool facilitates preparing queries by giving the user the opportunity
to use pre-built query templates.
viii. Modeling Application
Modeling applications transform or summarize the data in the
warehouse by means of forecasting models, behavior scoring models, allocation models and
data mining tools.
ix. Metadata
Metadata contains information and definitions about the data that is stored.
The major task of online operational database systems is to perform online transaction
and query processing. These systems are called online transaction processing (OLTP)
systems. They cover most of the day-to-day operations of an organization.
Data warehouse systems, on the other hand, serve users or knowledge workers in the role
of data analysis and decision making. Such systems can organize and present data in
various formats in order to accommodate the diverse needs of different users. These
systems are known as online analytical processing (OLAP) systems.
(1) Users and System Orientation: An OLTP system is used for transaction and query
processing by clerks, clients and information technology professionals. An OLAP
system is used for data analysis by knowledge workers, analysts, managers and
executives.
(2) Data Contents: An OLTP system manages current data that typically are too
detailed to be easily used for decision making. An OLAP system manages large
amounts of historic data, provides facilities for summarization and aggregation and
stores and manages information at different levels of granularity. These features
make the data easier to use for informed decision making.
(3) Database Design: An OLTP system uses the entity-relationship (ER) data model and
an application-oriented database design. An OLAP system uses a star or snowflake
model and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or
department, without referring to historic data or data in different organizations. In
contrast, an OLAP system often spans multiple versions of a database schema, due
to the evolutionary process of an organization. OLAP systems also deal with
information that originates from different organizations, integrating information
from many data stores. Because of their huge volume, OLAP data are stored on
multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short,
atomic transactions. Such a system requires concurrency control and recovery
mechanisms. However, accesses to OLAP systems are mostly read-only operations,
although many could be complex queries.
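The difference in access patterns can be illustrated with a small sketch using Python's built-in sqlite3 module: a short atomic transaction stands in for OLTP work and a read-only aggregate query stands in for OLAP work. The `accounts` table and the `transfer` function are hypothetical, and in practice the two workloads run on separate systems rather than on one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style access: a short transaction that fully succeeds or fully fails."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


transfer(conn, 1, 2, 30.0)
try:
    transfer(conn, 1, 2, 500.0)  # fails, so the partial update is rolled back
except ValueError:
    pass

# OLAP-style access: a read-only query that summarizes many rows.
total, average = conn.execute(
    "SELECT SUM(balance), AVG(balance) FROM accounts"
).fetchone()
balances = dict(conn.execute("SELECT id, balance FROM accounts").fetchall())
print(balances, total, average)
```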
2.5 Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the specifics of an
organization's situation. Three common architectures are:
i. Data Warehouse Architecture (Basic)
In the simple data warehouse architecture shown in Figure 3.6.1, end users
directly access data derived from several source systems through the data
warehouse.
An additional type of data, summary data, is very valuable in data warehouses because it
pre-computes long operations in advance. For example, a query about last year's sales
can be answered from totals obtained in advance by adding up the sales data.
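A minimal Python sketch of summary data follows. The yearly totals are computed once, and later queries read the stored total instead of adding up the detail rows again. The sample sales and the `sales_for_year` helper are assumptions for illustration.

```python
from collections import defaultdict

# Detailed sales rows as (ISO date, amount); the values are made up.
sales = [("2023-03-14", 120.0), ("2023-11-02", 80.0), ("2024-01-20", 200.0)]

# Pre-compute the summary once, for example while loading the warehouse.
yearly_summary = defaultdict(float)
for sale_date, amount in sales:
    yearly_summary[sale_date[:4]] += amount


def sales_for_year(year):
    """Answer the query from the stored summary without rescanning the details."""
    return yearly_summary.get(year, 0.0)


print(sales_for_year("2023"))
```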
Most data warehouses use a staging area to clean and process the operational
data before putting it into the warehouse. A staging area simplifies building summaries and
general warehouse management. This quite common architecture is shown in Figure 3.6.2.
A warehouse’s architecture can be customized for different groups within the organization
by adding data marts, which are systems designed for specific parts of business.
Figure 3.6.3 shows an example. In this example, there are three data marts,
designed separately for purchasing, sales, and inventories. This architecture makes it
possible to analyze historical data for purchases and sales.
Figure 17: Architecture of a Data Warehouse with a Staging Area and Data Marts
Data warehouse systems use back-end tools and utilities to populate and refresh their data. These
tools and utilities include the following functions.
• Data extraction, which typically gathers data from multiple, heterogeneous and external
sources.
• Data cleaning, which detects errors in the data and rectifies them when possible.
• Data transformation, which converts data from legacy or host format to warehouse format.
• Load, which sorts, summarizes, consolidates, computes views, checks integrity and builds
indexes and partitions.
• Refresh, which propagates the updates from the data sources to the data warehouse.
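The five functions listed above can be sketched as a chain of small Python functions. Everything here is a simplified assumption for illustration: the sources are plain lists of dictionaries, the warehouse is a dictionary of totals per customer, and the cleaning and transformation rules are invented.

```python
def extract(sources):
    """Data extraction: gather rows from multiple heterogeneous sources."""
    return [row for source in sources for row in source]


def clean(rows):
    """Data cleaning: drop rows with a missing amount and trim stray spaces."""
    cleaned = []
    for row in rows:
        if row.get("amount") in (None, ""):
            continue  # this error cannot be rectified, so the row is skipped
        cleaned.append({**row, "customer": row["customer"].strip()})
    return cleaned


def transform(rows):
    """Data transformation: convert each row to the warehouse format."""
    return [
        {"customer": r["customer"].title(), "amount": float(r["amount"])}
        for r in rows
    ]


def load(warehouse, rows):
    """Load: sort the rows and consolidate them into a total per customer."""
    for r in sorted(rows, key=lambda r: r["customer"]):
        warehouse[r["customer"]] = warehouse.get(r["customer"], 0.0) + r["amount"]
    return warehouse


def refresh(warehouse, new_sources):
    """Refresh: propagate new source data into the existing warehouse."""
    return load(warehouse, transform(clean(extract(new_sources))))


crm = [{"customer": " asha ", "amount": "100"}, {"customer": "ben", "amount": ""}]
billing = [{"customer": "ASHA", "amount": 25.5}]

warehouse = refresh({}, [crm, billing])
warehouse = refresh(warehouse, [[{"customer": "ben", "amount": "40"}]])
print(warehouse)
```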
Metadata is structured data that describes the characteristics of a resource. It is
stored in the system itself and can be queried using tools that are available on the
system.
Examples:
(1) The table of contents and index in a book may be considered metadata for the book.
(2) A library catalogue may be considered metadata. The catalogue metadata consists of
several predefined elements representing specific attributes of a resource, and each
element can have one or more values. These elements could be the name of the author,
the name of the document, the publisher’s name, the publication date and the category
to which it belongs. They could even include an abstract of the data.
(3) Suppose we say that a data element about a person is 80. This must be described by
noting that it is the person’s weight and that the unit is kilograms. Therefore (weight,
kilogram) is the metadata about the data value 80.
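Example (3) can be written down as a small Python sketch in which the value 80 is stored together with its metadata. The class names `Metadata` and `DataElement` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    attribute: str  # what the value measures
    unit: str       # the unit in which it is measured


@dataclass(frozen=True)
class DataElement:
    value: float
    metadata: Metadata

    def describe(self):
        """Combine the bare value with its metadata so that it can be interpreted."""
        return f"{self.metadata.attribute} = {self.value} {self.metadata.unit}"


element = DataElement(80, Metadata("weight", "kilogram"))
print(element.describe())
```

Without the metadata, the number 80 alone could equally be an age or a height.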
A metadata repository is a database of data about data (metadata). The purpose of the
metadata repository is to provide consistent and reliable access to data. The metadata
repository itself may be stored in a physical location in which metadata is drawn from separate
sources. Metadata may include information about how to access specific data or more details
about the data.
Metadata plays a very important role in a data warehouse, and that role is different
from the role of the warehouse data itself. It is important for many reasons.
Example: Metadata are used as a directory to help the decision support system analyst locate
the contents of the data warehouse, and as a guide to the data mapping when data are
transformed from the operational environment to the data warehouse environment. Metadata
also serve as a guide to the algorithms used for summarization between the current detailed data
and the highly summarized data, and between the lightly summarized data and the highly
summarized data. Metadata should be stored and managed persistently.
The two most common approaches to building a metadata repository architecture are:
(1) Centralized
(2) Decentralized
Generally, for small to medium-sized organizations, a single metadata repository (the centralized
approach) is enough to handle all of the metadata required by the various groups in the
corporation. This architecture offers a single and centralized approach to administering and
sharing metadata.
On the other hand, most large enterprises that have multiple and disparate divisions will require
several metadata repositories to handle all of the corporation’s various types of metadata
content and applications.
This approach is the most common one that corporations have implemented.
The concept of a centralized metadata architecture is a uniform and consistent metamodel that
mandates the schema for defining and organizing the various metadata to be stored in a global
metadata repository.
The strength of this approach is that it integrates all of the metadata and stores it in the
metamodel schema, where it can be easily accessed.
A decentralized metadata architecture creates a uniform and consistent metamodel that mandates
the schema for defining and organizing the various metadata to be stored in a global metadata
repository and in the shared metadata elements that appear in the local metadata repositories.
All the metadata that is shared and reused among the various repositories must first go
through the central global repository, but sharing and access to the local metadata are independent
of the central repository.
3.10 Mapping
A basic part of the data warehouse environment is that of mapping from the operational
environment into the data warehouse.
• Conversions
However, even if the reports have been made properly from the data warehouse, the manager may
end up having to go back to the operational sources. At this point, if the mapping data has been
carefully stored, then the manager can quickly and easily go to the operational source. However,
if the mapping has not been stored properly, then the manager has a difficult time defending the
conclusion to the vice president.
The metadata store for the data warehouse is therefore a natural place for storing the mapping
information.
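Stored mapping information can be as simple as a lookup from each warehouse field to its operational source and the conversion applied. The following Python sketch is an assumption about how such a record might look; the field names, the source names and the `trace` helper are invented for illustration.

```python
# Hypothetical mapping metadata: warehouse field -> operational source + conversion.
MAPPINGS = {
    "sales.amount_usd": {
        "source": "orders_db.order_lines.amt_cents",
        "conversion": "cents to dollars",
        "convert": lambda cents: cents / 100,
    },
    "sales.customer_name": {
        "source": "crm.customers.full_name",
        "conversion": "trim whitespace",
        "convert": lambda name: name.strip(),
    },
}


def trace(field):
    """Return the operational source and the conversion behind a warehouse field."""
    entry = MAPPINGS[field]
    return entry["source"], entry["conversion"]


# A manager questioning a figure can go straight to where it came from.
source, conversion = trace("sales.amount_usd")
print(source, "|", conversion)
print(MAPPINGS["sales.amount_usd"]["convert"](1250))
```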
Data marts can have a dependent or an independent structure. If the characteristics of the data
marts’ dimensions are defined at the beginning so that they are compliant with each other,
then these data marts are dependent.
In some situations, it is better to have independent data marts. In this case the characteristics
of the other data marts are not taken into consideration during the preparation of the data
mart. However, this can prevent future integration and add development cost if there is later
an interest in sharing information across departments.
i. To give users more flexible access to the data they need to analyze most often.
ii. To provide data in a form that matches the collective view of a group of users.
iii. To improve end-user response time.
iv. Potential users of a data mart are clearly defined and can be targeted for support
to retrieve the data.
v. To provide appropriately structured data as dictated by the requirements of the
end-user access tools.
vi. Building a data mart is simpler compared with establishing a corporate data
warehouse.
vii. The cost of implementing data marts is far less than that required to establish a
data warehouse.
viii. A data mart is the access layer of the data warehouse environment. That means
we create a data mart to deliver data to the users faster.
ix. A data mart is a subset of the warehouse, which means all the data available in the
data mart is also available in the database. A data mart is created for a
specific business purpose.
x. It is easy to access frequently needed data from the database when required by
the client.
xi. We can give a group of users access to view the data mart when it is required.
Of course, performance will be good.
xii. It is easy to create and maintain a data mart, because it relates to a specific
business.
xiii. It costs less to create a data mart than to create a data warehouse with a
huge amount of space.
There are three main approaches for building data marts: the top-down approach, the
bottom-up approach and the federated approach.
When the data mart is compared with the data warehouse, two fundamental distinctions
can easily be noticed. One of them is that a data mart is a subset of the data warehouse
and is requirement oriented. In contrast, the data warehouse holds the enterprise data
without regard to any specific requirements. Of course, during the design of a
data mart the structure of the whole warehouse has to be considered; otherwise it will be very
hard to integrate the data marts later.
The implementation of a data mart is much faster and cheaper, since a data mart
contains only a specific part of the data warehouse, whose implementation is more time
consuming and costs much more.
There are some data mart solutions developed by the many decision support
system (DSS) vendors. However, using them to design a data mart for specific
requirements takes much more effort to customize them, because these solutions
are produced for general purposes.
The other main difference between the data mart and the data warehouse is that the data in
the data mart can be more aggregated (less granular) than the data in the data warehouse.
Since the requirements of the data mart are more clearly defined than those of the data
warehouse, preaggregation can be applied to the data according to those requirements. So the
extraction of the data can be done faster and more efficiently.
Parameter: Data handling
Data warehouse: Data warehousing covers a large area of the corporation, which is
why it takes a long time to process it.
Data mart: Data marts are easy to use, design and implement, as they handle only
small amounts of data.

Parameter: Data type
Data warehouse: The data stored inside the data warehouse is always detailed when
compared with a data mart.
Data mart: Data marts are built for particular user groups. Therefore, the data is
short and limited.

Parameter: Subject area
Data warehouse: The main objective of a data warehouse is to provide an integrated
environment and a coherent picture of the business at a point in time.
Data mart: Mostly holds only one subject area, for example sales figures.

Parameter: Data type
Data warehouse: Time variance and non-volatile design are strictly enforced.
Data mart: Mostly includes consolidated data structures to meet the subject area's
query and reporting needs.

Parameter: Source
Data warehouse: Data comes from many sources.
Data mart: Data comes from very few sources.

Parameter: Size
Data warehouse: The size may range from 100 GB to 1 TB+.
Data mart: The size is less than 100 GB.
These systems are developed to capture, create, refine, tag and circulate information
used to improve the business productivity of the organization. There are three
broad ways of managing the knowledge system.
Structure
i. Expert Systems
ii. Groupware
In the current global scenario, team members are spread across regions.
However, it is important for them to collaborate on various projects. Groupware
is a knowledge management system which helps in sharing calendar, project
activities and instant messaging.
iii. SharePoint
A decision support system helps floor managers, sales managers, the CEO, etc. take
decisions to finalize business or operational strategy. Decision support system
All the systems we are discussing here come under the knowledge management
category. A knowledge management system is not radically different from all
these information systems; it just extends the already existing systems by
assimilating more information.
What is Knowledge?
• Personalized information
• State of knowing and understanding
• An object to be stored and manipulated
• Intranet
• Data warehouses and knowledge repositories
• Decision support tools
• Groupware for supporting collaboration
• Networks of knowledge workers
• Internal expertise
Purpose of KMS
• Improved performance
• Competitive advantage
• Innovation
• Sharing of knowledge
• Integration
• Continuous improvement by −
o Driving strategy
o Starting new lines of business
o Solving problems faster
o Developing professional skills
o Recruiting and retaining talent
Activities in Knowledge Management
• Start with the business problem and the business value to be delivered first.
• Identify what kind of strategy to pursue to deliver this value and address the KM
problem.
• Think about the system required from a people and process point of view.
• Finally, think about what kind of technical infrastructure is required to support
the people and processes.
Decision making is the mental process of selecting a course of action from a set of
alternatives.
Every decision-making process produces an outcome that might be an action, a
recommendation, or an opinion. Since doing nothing or remaining neutral is usually
among the set of options one chooses from, selecting that course is also deciding.
While they are related, problem analysis and decision making are distinct activities.
Decisions are commonly focused on a problem or challenge. Decision makers must gather
and consider data before making a choice. Problem analysis involves framing the issue by
defining its boundaries, establishing criteria with which to select from alternatives, and
developing conclusions based on available information. Analyzing a problem may not
result in a decision, although the results are an important ingredient in all decision
making.
Decision making comprises a series of sequential activities that together structure the
process and facilitate its conclusion. These steps are:
• Establishing objectives
• Classifying and prioritizing objectives
• Developing selection criteria
• Identifying alternatives
• Evaluating alternatives against the selection criteria
• Choosing the alternative that best satisfies the selection criteria
• Implementing the decision
A major part of decision making involves the analysis of a defined set of alternatives
against selection criteria. These criteria usually include costs and benefits, advantages and
disadvantages, and alignment with preferences. For example, when choosing a place to
establish a new business, the criteria might include rental costs, availability of skilled labor,
access to transportation and means of distribution, and proximity to customers. Based on
the relative importance of these factors, a business owner decides which location best
meets the criteria.
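Evaluating alternatives against weighted criteria can be sketched as a simple weighted-score calculation. The weights, the two candidate locations and their scores (on a 1 to 5 scale, higher is better) are assumptions invented for illustration; weighted scoring is one common technique, not the only way to compare alternatives.

```python
# Relative importance of each criterion; the weights are assumed to sum to 1.
weights = {"rental_cost": 0.4, "skilled_labor": 0.3, "transport": 0.1, "customers": 0.2}

# Hypothetical scores for each alternative on each criterion (1 = poor, 5 = good).
scores = {
    "Downtown": {"rental_cost": 2, "skilled_labor": 5, "transport": 4, "customers": 5},
    "Suburb": {"rental_cost": 4, "skilled_labor": 3, "transport": 3, "customers": 3},
}


def weighted_score(alternative_scores):
    """Multiply each criterion score by its weight and add the results."""
    return sum(weights[c] * alternative_scores[c] for c in weights)


# Choose the alternative that best satisfies the selection criteria.
best = max(scores, key=lambda name: weighted_score(scores[name]))
for name in scores:
    print(name, round(weighted_score(scores[name]), 2))
print("Best:", best)
```

Changing the weights changes the outcome, which is why the relative importance of the factors has to be settled before the alternatives are scored.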
The decision maker may face a problem when trying to evaluate alternatives in terms of
their strengths and weaknesses. This can be especially challenging when there are many
factors to consider. Time limits and personal emotions also play a role in the process of
choosing between alternatives. Greater deliberation and information gathering often
take additional time, and decision makers often must choose before they feel fully
prepared. In addition, the more that is at stake the more emotions are likely to come into
play, and this can distort one’s judgment.
Types of Decisions
Three approaches to decision making are avoiding, problem solving and problem
seeking.
Every decision-making process reaches a conclusion, which can be a choice to act or not
to act, a decision on what course of action to take and how, or even an opinion or
recommendation. Sometimes decision-making leads to redefining the issue or
challenge. Accordingly, three decision-making processes are known as avoiding,
problem solving, and problem seeking.
One decision-making option is to make no choice at all. There are several reasons why
the decision maker might do this:
4. The person considering the alternatives does not have the authority to decide.
Initially, BI tools were primarily used by data analysts and other IT professionals
who ran analyses and produced reports with query results for business users.
Increasingly, however, business executives and workers are using BI platforms
themselves, thanks partly to the development of self-service BI and data discovery
tools and dashboards.
Types of BI tools
Business intelligence combines a broad set of data analysis applications, including
ad hoc analytics and querying, enterprise reporting, online analytical processing
(OLAP), mobile BI, real-time BI, operational BI, software-as-a-service BI, open
source BI, collaborative BI and location intelligence.
BI technology also includes data visualization software for designing charts and
other infographics, as well as tools for building BI dashboards and performance
scorecards that display visualized data on business metrics and key performance
indicators in an easy-to-grasp way. Data visualization tools have become the
standard of modern BI in recent years. A couple of leading vendors defined the
technology early on, but more traditional BI vendors have followed in their path.
Now, virtually every major BI tool incorporates features of visual data discovery.
Questions
2 Marks Questions
1. Define Database Approach.
2. Define Big Data with example.
3. Define Data Warehouse and Data Mart.
4. Define Knowledge management with neat diagram.
5. What are the 3V’s of big data analytics?
6. What are the roles of BI?
7. Differentiate between traditional Computing and Stream Computing.
8. Define data management.
5 Marks Questions
1. Describe the importance of Business Intelligence and DSS in developing MIS.
2. Explain the MIS pyramid.
3. What is an Information system? What are the functions of an information system and its
impact on society in the domain of health care?
4. Explain the ethical issues and threats of information security.
5. Differentiate between Data Warehouse and Data Mart.
6. Differentiate between OLAP and OLTP.
7. Explain with neat diagram the Value Chain of Big data.
8. Explain the Knowledge Management framework with KM Ladder.
10 Marks Question