Chapter 2 - Data and Knowledge Management - NOTES

University of Mumbai -ILO7013
MANAGEMENT INFORMATION SYSTEM
Chapter 2 -Data and Knowledge Management
The chapter emphasizes the need and importance of Data and Knowledge
Management along with Business intelligence in the domain of
Management Information System
Data and Knowledge Management

• Database Approach
• Big Data
• Data warehouse and Data Marts
• Knowledge Management
2.1 Database Approach
❖ The database approach is an improvement on the shared file solution as the
use of a database management system (DBMS) provides facilities for querying,
data security and integrity, and allows simultaneous access to data by several
different users.
Figure 1: Data Management- YouTube Video
❖ Database: A database is a collection of related data.

❖ The Database is a shared collection of logically related data, designed to meet
the information needs of an organization.
❖ A database is a computer-based record-keeping system whose overall
purpose is to record and maintain information.
❖ The database is a single, large repository of data, which can be used
simultaneously by many departments and users. Instead of disconnected files
with redundant data, all data items are integrated with a minimum amount of
duplication.
Figure 2: Database Approach
Figure 3: Database Approach- YouTube Video
2.1.1 Building blocks of a Database

The following three components form the building blocks of a database. They store
the data that we want to save in our database.
i. Columns. Columns are like fields, that is, individual items of data that we
wish to store. A student's roll number, name and address are all examples
of columns. They are also like the columns found in spreadsheets (the A, B,
C etc. along the top).
ii. Rows. Rows are like records, as each row holds the values of several columns
for one item (like the 1, 2, 3 etc. in a spreadsheet). A query can return as many
or as few columns of a row as you want. This makes reading data more
efficient: you fetch only what you need.
iii. Tables. A table is a logical group of columns. For example, you may have a
table that stores customers' names and addresses. Another table
would store details of parts, and yet another would store
suppliers' names and addresses.
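The three building blocks can be seen in a minimal sketch using Python's built-in sqlite3 module. The table name `student`, its column names and the sample rows are assumptions made up for this illustration.

```python
import sqlite3

# In-memory database; table, column names and sample data are illustrative.
conn = sqlite3.connect(":memory:")

# A table is a named group of columns.
conn.execute(
    "CREATE TABLE student (roll_number INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)

# Each row is one record that holds a value for every column.
conn.executemany(
    "INSERT INTO student (roll_number, name, address) VALUES (?, ?, ?)",
    [(1, "Asha", "Mumbai"), (2, "Ravi", "Pune")],
)

# A query can fetch only the columns it needs.
names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY roll_number")]
print(names)
```

The final query reads a single column, which is the "fetch only what you need" idea described for rows.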
2.1.2 Characteristics of database

The data in a database should have the following features:
i. Organized/Related: It should be well organized and related.
ii. Shared: Data in a database are shared among different users and
applications.
iii. Permanent or Persistent: Data in a database exist permanently in the
sense that the data can live beyond the scope of the process that created it.
iv. Validity/Integrity/Correctness: Data should be correct with respect to the
real-world entity that they represent.
v. Security: Data should be protected from unauthorized access.
vi. Consistency: Whenever more than one data element in a database
represents related real-world values, the values should be consistent with
respect to the relationship.
vii. Non-redundancy: No two data items in a database should represent the
same real-world entity.
viii. Independence: Data at different levels should be independent of each
other so that changes in one level do not affect the other levels.
ix. Easily Accessible: It should be available when and where it is needed i.e. it
should be easily accessible.
x. Recoverable: It should be recoverable in case of damage.
xi. Flexible to change: It should be flexible to change.
Figure 4: Database Approach

2.1.3 Traditional File Processing System and Its Characteristics

i. Traditional File Processing System: a computer-based system
in which all the information is stored in separate computer files.
ii. A traditional file system stores data in such a way that each department of
an organization has its own set of files, which creates data redundancy.
For example:
Consider a college where a student's examination record is stored in one
file and the library record in another. This duplicates
values such as roll number, name and father's name.
A typical traditional file processing system is shown in the diagram, in which
each application program depends on its own data files.
[Diagram: the Library, Examination and Registration departments each have their own applications, data and files.]
Figure 5: Transaction Processing System
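The redundancy problem of the file approach can be sketched with plain Python dictionaries standing in for departmental files. All names and values below are hypothetical; the point is only that a duplicated value can drift apart, while a value stored once cannot.

```python
# File approach: each department keeps its own copy of the student's name.
library_file = {101: {"name": "Asha Rao", "books_issued": 2}}
exam_file = {101: {"name": "Asha Rao", "marks": 78}}

# A correction applied only by the library leaves the two copies inconsistent.
library_file[101]["name"] = "Asha R. Rao"
inconsistent = library_file[101]["name"] != exam_file[101]["name"]

# Database approach: the name is stored once and referenced by roll number.
students = {101: {"name": "Asha Rao"}}
library = {101: {"books_issued": 2}}
exams = {101: {"marks": 78}}
students[101]["name"] = "Asha R. Rao"  # one update, seen by every department


def library_view(roll):
    """Combine the shared student record with the library's own data."""
    return {"name": students[roll]["name"], **library[roll]}


def exam_view(roll):
    """Combine the shared student record with the examination data."""
    return {"name": students[roll]["name"], **exams[roll]}


print(inconsistent, library_view(101), exam_view(101))
```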
2.2 Big Data

i. Big Data is becoming one of the most talked about technology trends
nowadays.
ii. The real challenge for a big organization is to get the maximum out of the data
already available and to predict what kind of data to collect in the future.
iii. How to take the existing data and make it meaningful, so that it gives
accurate insight into the past, is one of the key discussion points in many
executive meetings in organizations.
iv. With the explosion of data the challenge has gone to the next level, and
Big Data is now becoming a reality in many organizations.
v. The goal of every organization and expert is the same, to get the maximum out of
the data, but the route and the starting point differ for each organization and
expert.
vi. As organizations evaluate and architect big data solutions, they are
also learning the ways and opportunities related to Big Data.
vii. There is no single solution to big data, and no single vendor
can claim to know all about Big Data.
viii. Big Data is too big a concept, and there are many players: different
architectures, different vendors and different technologies.
Figure 6:Big Data Sphere

Figure 7:Big Data – Transactions, Interactions, Observations
Big data Characteristics
The three Vs of Big data are Velocity, Volume and Variety
Figure 8:Characteristics of Big Data
Volume
i. Data storage is growing exponentially because data is now more than text
data.
ii. Data can be found in the form of videos, music and large images on
our social media channels.
iii. It is very common for enterprises to have storage systems of terabytes and
petabytes.
iv. As the database grows, the applications and architecture built to support the
data need to be re-evaluated quite often.
v. Sometimes the same data is re-evaluated from multiple angles, and even
though the original data is the same, the newly found intelligence creates
an explosion of data.
vi. This large volume represents Big Data.
Velocity
i. Data growth and the social media explosion have changed how we look at
data. There was a time when we believed that yesterday's data was recent.
ii. As a matter of fact, newspapers still follow that logic.
iii. However, news channels and radio have changed how fast we receive the
news.
iv. Today, people rely on social media to keep them updated with the latest happenings.
On social media, a message (a tweet, a status update etc.) that is only a few
seconds old may no longer interest users.
v. They often discard old messages and pay attention to recent updates.
vi. Data movement is now almost real time, and the update window has
shrunk to fractions of a second.
vii. This high-velocity data represents Big Data.
Variety
i. Data can be stored in multiple formats, for example in a database, Excel, CSV or
Access, or for that matter in a simple text file.
ii. Sometimes the data is not even in a traditional format: it may
be in the form of video, SMS, PDF or something we have not thought
about. The organization needs to arrange it and make it meaningful.
iii. This would be easy if all the data were in the same format, but that is not
the case most of the time.
iv. The real world has data in many different formats, and that is the challenge we
need to overcome with Big Data.
v. This variety of data represents Big Data.
Figure 9:Volume, Velocity, Variety
Figure 10: Traditional vs Stream Computing - YouTube video
Figure 11: Big Data - YouTube Video
2.3 Data warehouse and Data Marts

2.3.1 Introduction to Data Warehousing
A data warehouse is a store of convenient, consistent, complete and consolidated data,
collected so that the end users of Decision Support Systems (DSS) can carry out
quick analysis.
Data warehouses have no standard definition and the people who work on data
warehouse subject have defined it in many ways as follows:
[1] “The basic data warehouse architecture interposes between end-user desktops and
production data sources a warehouse that we usually think of as a single, large
system maintaining an approximation of an enterprise data model.”
[2] “A data warehouse is a copy of transaction data specifically structured for querying
and reporting.”
[3] A data warehouse is a “subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making process”.
These data are obtained from different operational sources and kept in a separate physical
store. A data warehouse is not only a relational database that contains historical data
derived from transactional data; it is also an environment that includes all the
operations and applications needed to gather data and deliver it
to business users, such as an extraction, transportation, transformation, and loading (ETL)
solution, an online analytical processing (OLAP) engine and client analysis tools.
Figure 12:Data Warehouse System Model
i. Subject-Oriented: Data warehouses are designed to aid in decision making for a
specific subject. For example, sales data for applications contains specific sales of
specific products to specific customers. In contrast, sales data for decision support
contains a historical record of sales over specific time intervals. If designed well,
subject-oriented data provides a stable image of business processes, independent
of legacy systems. In other words, it captures the basic nature of the business
environment.
ii. Integrated: A data warehouse consists of different kinds of data collected
from separate legacy systems, and this can create conflicts and inconsistencies,
for example among units of measure. The data therefore have to be put into a
consistent format, and in this way they become integrated.
iii. Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should
not change. This is logical because the purpose of a warehouse is to enable a user
to analyze what has occurred. New data is always appended to the database, rather
than replacing existing data. The database continually absorbs new data, integrating it with the
previous data.
iv. Time-variant: Operational data and informational data differ
in their time variance. Operational data is valid only at the moment of access,
capturing a moment in time. When performance has to be assessed over time,
historical data is needed. Because the data in a data warehouse represents a
long time horizon, historical analysis can be performed easily.
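The nonvolatile and time-variant properties can be illustrated with a small sketch. It assumes a hypothetical operational table `product_price` that holds only the current value and a warehouse table `price_history` that receives one appended snapshot per load date; none of these names come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table: holds only the current value of each price.
conn.execute("CREATE TABLE product_price (product TEXT PRIMARY KEY, price REAL)")
# Warehouse table: one row per product per load date; rows are never updated.
conn.execute("CREATE TABLE price_history (product TEXT, price REAL, load_date TEXT)")


def load_snapshot(load_date):
    """Append the current operational state to the warehouse history."""
    conn.execute(
        "INSERT INTO price_history SELECT product, price, ? FROM product_price",
        (load_date,),
    )


conn.execute("INSERT INTO product_price VALUES ('pen', 10.0)")
load_snapshot("2024-01-01")
conn.execute("UPDATE product_price SET price = 12.0 WHERE product = 'pen'")
load_snapshot("2024-02-01")

# The operational system only knows the present ...
current = conn.execute("SELECT price FROM product_price").fetchall()
# ... while the warehouse keeps the whole time series.
history = conn.execute(
    "SELECT load_date, price FROM price_history ORDER BY load_date"
).fetchall()
print(current, history)
```

The operational update overwrites the old price, whereas the warehouse keeps both values, which is what makes historical analysis possible.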
2.3.2 The Goals of a Data Warehouse
The fundamental goals of the data warehouse are:
1- “Makes an organization’s information accessible.” The contents of the data warehouse
are correctly labeled and obvious. Data is easy to reach because it is one
click away and there is no waiting. In the same order, these properties are called
understandable, navigable and fast.
2- “Makes the organization’s information consistent.” Consistent information is of key
importance for a data warehouse, since it receives data from different parts of an
organization. The data must be matched properly. If two measures of the organization
have the same name, then they must mean the same thing. Conversely, if two
measures do not mean the same thing, they are labeled differently.
3- “To be an adaptive and resilient source of information.” New data can be added
and new questions asked without any change to existing data and technologies, because
the warehouse is designed for continuous change.
4- “To be a secure bastion that protects owner’s information asset.” The data warehouse
not only controls access to the data effectively, but also gives its owners great visibility
into the uses and abuses of that data, even after it has left the data warehouse.
5- “To be the foundation for decision-making.” The data warehouse provides the right
data for the decision makers. Decisions are the output of the data warehouse.
2.3.3 Basic Elements of the Data Warehouse
i. Source System
A source system, also called a legacy system, captures business data and
transactions. Its main priorities are uptime and availability, and it gives a chance
to share basic dimensions such as product and customer with other legacy systems in the
organization. It is the largest source of data for analysis systems, but creating
queries and management reports directly from these systems is a burden.
ii. Data Staging Area
A data staging area is an initial storage area where a set of processes (clean,
transform, combine, de-duplicate, household, archive) is performed on the data
to prepare it for use in the data warehouse. The data staging area acts as a bridge
between the source system and the presentation server. It can be
spread over a number of machines and does not need to be based on relational
technology. Unlike the presentation server, which is described below, the main
restriction of the data staging area is that it never provides query and presentation
services.
iii. Presentation Server
A presentation server is a physical machine that stores the processed data for the
end user’s querying and reporting requirements. It is fed from the data staging area. If
the queryable presentation resource for an enterprise’s data is organized around an
entity-relationship model, understandability and performance will be lost. If the
presentation server presents and stores data in a dimensional framework, the
tables are organized as a star schema.
iv. Dimensional Model
The dimensional model, which is designed to provide higher query performance and
resilience to change and to be more understandable, is an alternative to the
entity-relationship model. The dimensional model consists of a fact table and dimension
tables.
A fact table contains measurements of the business, preferably numeric
and additive. It has a set of two or more foreign keys that join the
dimension tables to the fact table.
A dimension table is complementary to the fact table. Most dimension tables have many
textual attributes. Each also has a primary key that relates it to the fact
table.
v. Data Mart
A data mart is a logical subset of the complete data warehouse, prepared for a
single business process in an organization. When the data marts come together, an
integrated enterprise data warehouse is formed. Data marts must be built from
shared dimensions and facts. In this way they can be combined and used together.
vi. OLAP (On-Line Analytic Processing)
OLAP enables end users to query and present text and numeric data from data
warehouses. OLAP technology is based on a multidimensional cube of
data, and OLAP databases have a multidimensional structure.
vii. End User Application
These applications help end users to prepare queries, carry out analysis and perform
other activities that support business needs. Examples are the end user data
access tool and the ad hoc query tool.
An end user data access tool works with a SQL session and gives the user a report,
a screen of data or another form of analysis.
An ad hoc query tool makes it easier to prepare queries by letting the user
use pre-built query templates.
viii. Modeling Application
Modeling applications transform or summarize the data in the data
warehouse by means of forecasting models, behavior scoring models, allocation models and
data mining tools.
ix. Metadata
Metadata contains information and definitions about the data that is stored.
The basic elements of the data warehouse are shown in Figure 13.

[Diagram labels: Source (legacy) systems; End User Data Access; user group driven; may be frequently refreshed; conformed dimensions; models; no user query services.]
Figure 13:The Basic Elements of the Data Warehouse
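The dimensional model described above can be sketched as a tiny star schema in SQLite. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns and the sample figures are assumptions chosen for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: textual attributes plus a primary key.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: numeric, additive measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units_sold INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Electrical');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240101, 10, 100.0),
    (1, 20240201, 5, 50.0),
    (2, 20240201, 1, 400.0);
""")

# A typical star join: add up a fact measure, grouped by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()
print(rows)
```

Because `amount` is additive, it can be summed along any combination of dimension attributes without changing the schema.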
2.4 Differences between Operational Database Systems and Data Warehouses
The major task of online operational database systems is to perform online transaction
and query processing. These systems are called online transaction processing (OLTP)
systems. They cover most of the day-to-day operations of an organization.
Data warehouse systems, on the other hand, serve users or knowledge workers in the role
of data analysis and decision making. Such systems can organize and present data in
various formats in order to accommodate the diverse needs of different users. These
systems are known as online analytical processing (OLAP) systems.
Figure 14: Data Warehouse Architecture
Feature                     OLTP                                 OLAP
Characteristic              Operational processing               Informational processing
Orientation                 Transaction                          Analysis
User                        Clerk, DBA, database professional    Knowledge worker (manager, analyst, executive)
Function                    Day-to-day operations                Long-term informational requirements, decision support
DB design                   ER-based, application-oriented       Star/snowflake, subject-oriented
Data                        Current, guaranteed up to date       Historic, accuracy maintained over time
Summarization               Primitive, highly detailed           Summarized, consolidated
View                        Detailed, flat relational            Summarized, multidimensional
Unit of work                Short, simple transaction            Complex query
Access                      Read/write                           Mostly read
Focus                       Data in                              Information out
Operations                  Index/hash on primary key            Lots of scans
Number of records accessed  Tens                                 Millions
Number of users             Thousands                            Hundreds
DB size                     GB to high-order GB                  >= TB
Priority                    High performance, high availability  High flexibility, end-user autonomy
(1) Users and System Orientation: An OLTP system is used for transaction and query
processing by clerks, clients and information technology professionals. An OLAP
system is used for data analysis by knowledge workers, analysts, managers and
executives.
(2) Data Contents: An OLTP system manages current data that typically are too
detailed to be easily used for decision making. An OLAP system manages large
amounts of historic data, provides facilities for summarization and aggregation, and
stores and manages information at different levels of granularity. These features
make the data easier to use for informed decision making.
(3) Database Design: An OLTP system uses the entity-relationship (ER) data model and
an application-oriented database design. An OLAP system uses a star or snowflake
model and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or
department, without referring to historic data or data in different organizations. In
contrast, an OLAP system often spans multiple versions of a database schema, due
to the evolutionary process of an organization. OLAP systems also deal with
information that originates from different organizations, integrating information
from many data stores. Because of their huge volume, OLAP data are stored on
multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short,
atomic transactions. Such a system requires concurrency control and recovery
mechanisms. Accesses to OLAP systems, however, are mostly read-only operations,
although many could be complex queries.
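The two access patterns can be contrasted in a short sketch. A single SQLite table stands in for both kinds of system here, which is a simplification; the table `orders` and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "West", 100.0), (2, "West", 250.0), (3, "East", 75.0)],
)

# OLTP-style access: a short, atomic read/write transaction that touches
# one row, located through the primary key.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE orders SET amount = amount + 50 WHERE order_id = 3")

# OLAP-style access: a read-only query that scans many rows and summarizes them.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
print(totals)
```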
2.5 Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the specifics of an
organization's situation. Three common architectures are:
i. Data Warehouse Architecture (Basic)
ii. Data Warehouse Architecture (with a Staging Area)

iii. Data Warehouse Architecture (with a Staging Area and Data Marts)
2.5.1 Data Warehouse Architecture (Basic)
In the simple data warehouse architecture shown in Figure 15, end users
directly access data derived from several source systems through the data
warehouse.
[Diagram labels: Data Sources (including flat files); Warehouse; Users]
Figure 15:Architecture of a Data Warehouse (Basic)
An additional type of data, summary data, is very valuable in data warehouses because it
holds the results of long operations computed in advance. For example, a query about last
year's sales can be answered from a total that was pre-computed by adding up the sales data.
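A minimal sketch of summary data follows, assuming a hypothetical detail table `sales` and a pre-computed summary table `sales_by_year`; the dates and amounts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2023-03-01", 100.0), ("2023-11-15", 200.0), ("2024-01-10", 50.0)],
)

# Pre-compute the yearly totals once, at load time.
conn.execute(
    "CREATE TABLE sales_by_year AS "
    "SELECT substr(sale_date, 1, 4) AS year, SUM(amount) AS total "
    "FROM sales GROUP BY substr(sale_date, 1, 4)"
)

# A later query reads one summary row instead of re-adding every sale.
last_year_total = conn.execute(
    "SELECT total FROM sales_by_year WHERE year = '2023'"
).fetchone()[0]
print(last_year_total)
```

The trade-off is that the summary table has to be rebuilt or refreshed whenever new detail data is loaded.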
Data Warehouse Architecture (with a Staging Area)
Most data warehouses use a staging area to clean and process the operational
data before putting it into the warehouse. A staging area simplifies building summaries and
general warehouse management. This common architecture is shown in Figure 16.
Figure 16:Architecture of a Data Warehouse with a Staging Area
Data Warehouse Architecture (with a Staging Area and Data Marts)
A warehouse’s architecture can be customized for different groups within the organization
by adding data marts, which are systems designed for specific parts of business.
Figure 17 shows an example. In this example, there are three data marts,
designed separately for purchasing, sales, and inventories. This architecture makes it
possible to analyze historical data for purchases and sales.
Figure 17: Architecture of a Data Warehouse with a Staging Area and Data Marts
3.8 Define Extraction, Transformation and Loading
Data warehouse systems use back-end tools and utilities to populate and refresh their data. These
tools and utilities include the following functions.
• Data Extraction, which typically gathers data from multiple, heterogeneous and external
sources.
• Data Cleaning, which detects errors in the data and rectifies them when possible.
• Data Transformation, which converts data from legacy or host format to warehouse format.
• Load, which sorts, summarizes, consolidates, computes views, checks integrity and builds
indexes and partitions.
• Refresh, which propagates the updates from the data sources to the data warehouse.
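The functions listed above can be chained into a small pipeline. This is only a sketch under stated assumptions: the two sources, the field names, the fixed `USD_TO_INR` rate and the target table `fact_orders` are all hypothetical, and a real ETL tool does far more.

```python
import csv
import io
import sqlite3

# Two hypothetical, heterogeneous sources: a CSV export and a list of records.
csv_source = "id,amount,currency\n1,100,INR\n2,,INR\n3,2,USD\n"
dict_source = [{"order": 4, "value_inr": 250.0}]

USD_TO_INR = 80.0  # assumed fixed rate, for illustration only


def extract():
    """Extraction: gather records from both sources into one common shape."""
    for row in csv.DictReader(io.StringIO(csv_source)):
        yield {"id": row["id"], "amount": row["amount"], "currency": row["currency"]}
    for rec in dict_source:
        yield {"id": rec["order"], "amount": rec["value_inr"], "currency": "INR"}


def clean(records):
    """Cleaning: drop records whose amount is missing."""
    for rec in records:
        if rec["amount"] not in ("", None):
            yield rec


def transform(records):
    """Transformation: convert types and express every amount in INR."""
    for rec in records:
        amount = float(rec["amount"])
        if rec["currency"] == "USD":
            amount *= USD_TO_INR
        yield (int(rec["id"]), amount)


def load(conn, rows):
    """Load: write to the warehouse table and build an index.
    INSERT OR REPLACE lets a repeated run act as a simple refresh."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER PRIMARY KEY, amount_inr REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_amount ON fact_orders(amount_inr)")


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
loaded = conn.execute(
    "SELECT order_id, amount_inr FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Record 2 is rejected by the cleaning step because its amount is missing, and record 3 is converted from USD during transformation.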
3.9 Data warehouse Metadata

Given the complexity of information in an ODS and data warehouse, it is essential that
there be a mechanism for users to easily find out what data is there and how it can be used to
meet their needs. Providing metadata about the ODS or the data warehouse achieves this.
Metadata is data about data or documentation about the data that is needed by the users. It is
not the actual data warehouse, but answers “who, what, where, when, why and how” questions
about the data warehouse.
In addition, metadata is structured data that describes the
characteristics of a resource. Metadata is stored in the system itself and can be queried using tools
that are available on the system.
Examples:
(1) The table of contents and index in a book may be considered metadata for the book.
(2) A library catalogue may be considered metadata. The catalogue metadata consists of
several predefined elements representing specific attributes of a resource, and each
element can have one or more values. These elements could be the name of the author,
the name of the document, the publisher’s name, the publication date and the category
to which it belongs. They could even include an abstract of the data.
(3) Suppose we say that a data element about a person is 80. This must be described by
noting that it is the person’s weight and that the unit is kilograms. Therefore (weight,
kilogram) is the metadata about the data value 80.
A metadata repository is a database of data about data (metadata). The purpose of the
metadata repository is to provide consistent and reliable access to data. The metadata
repository itself may be stored in a physical location in which metadata is drawn from separate
sources. Metadata may include information about how to access specific data or more details
about the data.
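Example (3) can be expressed as a minimal sketch of a metadata repository. The class `ElementMetadata`, the key `person.weight` and the function `describe` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMetadata:
    """Data about one data element: what it is and how it is measured."""
    name: str
    unit: str
    description: str


# A tiny in-memory metadata repository, keyed by data element.
metadata_repository = {
    "person.weight": ElementMetadata("weight", "kilogram", "Body weight of a person"),
}


def describe(element_key, value):
    """Use the metadata to give a bare value its meaning."""
    meta = metadata_repository[element_key]
    return f"{meta.name} = {value} {meta.unit}"


label = describe("person.weight", 80)
print(label)
```

The value 80 carries no meaning on its own; the repository entry supplies the (weight, kilogram) description.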
3.9.1 Role of Metadata
Metadata has a very important role in a data warehouse, and this role is
different from that of the warehouse data itself.
The various roles of metadata are explained below.
i. Metadata acts as a directory.

ii. This directory helps the decision support system to locate the contents of the data
warehouse.
iii. Metadata helps in decision support system for mapping of data when data is
transformed from operational environment to data warehouse environment.
iv. Metadata helps in summarization between current detailed data and highly
summarized data.
v. Metadata is used for query tools.
vi. Metadata is used in extraction and cleansing tools.
vii. Metadata is used in reporting tools.
viii. Metadata is used in transformation tools.
ix. Metadata plays an important role in loading functions.
Metadata plays a very different role from the data warehouse data and is important for many reasons.
Example: Metadata are used as a directory to help the decision support system analyst locate
the contents of the data warehouse, and as a guide to the data mapping when data are
transformed from the operational environment to the data warehouse environment. Metadata
also serve as a guide to the algorithms used for summarization between the current detailed data
and the lightly summarized data, and between the lightly summarized data and the highly
summarized data. Metadata should be stored and managed persistently.
The following diagram shows the role of metadata.

Figure 18:Role of Metadata Chart
3.9.2 Metadata Repository:
Metadata repository is an integral part of a data warehouse system.
A Metadata repository should contain the following:
(1) Definition of data warehouse:
It includes the description of the structure of the data warehouse. The description is defined by
schema, views, hierarchies, derived data definitions, and data mart locations and contents.
(2) Business Metadata:
It includes the business terms and definitions, data ownership information and changing
policies.
(3) Operational Metadata:
It includes currency of data and data lineage. Currency of data means whether the data is
active, archived or purged. Lineage of data means the history of the data migrated and the
transformations applied to it.
(4) Data for mapping from operational environment to data warehouse:
It includes source databases and their contents, data partitions, data extraction, cleaning and
transformation rules, data refresh and purging rules, and security (user authorization and
access control).
(5) The algorithms used for summarization:
It includes measure and dimension definition algorithms, data on granularity, partitions,
subject areas, aggregation, summarization, and predefined queries and reports.
(6) Data related to system performance:
It includes indices and profiles that improve data access and retrieval performance, in
addition to rules for the timing and scheduling of refresh, update and replication cycles.
3.9.3 Types of Metadata in Data Warehouse Architecture:
The two most common approaches to building a metadata repository architecture are:
(1) Centralized
(2) Decentralized
For small to medium-sized organizations, a single metadata repository (the centralized
approach) is generally enough to handle all of the metadata required by the various groups in the
corporation. This architecture offers a single and centralized approach to administering and
sharing metadata.
On the other hand, most large enterprises that have multiple and disparate divisions will require
several metadata repositories to handle all of the corporation’s various types of metadata
content and applications.
3.9.3.1 Centralized Metadata Repository Architecture:
This approach is the most common one that corporations have implemented.
The concept of a centralized metadata architecture is a uniform and consistent meta model that mandates
that the schema for defining and organizing the various metadata be stored in a global metadata
repository.
The strength of this approach is that it integrates all of the metadata and stores it in a meta
model schema that can be easily accessed.
Figure 3.9.3.1 Centralized Metadata Repository Architecture
3.9.3.2 Decentralized Metadata Repository Architecture:
A decentralized metadata architecture creates a uniform and consistent meta model that mandates
the schema for defining and organizing the various metadata to be stored in a global metadata
repository and in the shared metadata elements that appear in the local metadata repositories.
All the metadata that is shared and reused among the various repositories must first go
through the central global repository, but sharing of and access to the local metadata is independent
of the central repository.
Figure 19:Decentralized Metadata Repository Architecture

3.10 Mapping
A basic part of the data warehouse environment is that of mapping from the operational
environment into the data warehouse.
The mapping includes a wide variety of features, some of which are listed here.
• Mapping from one attribute to another
• Conversions
• Changes in mapping conventions
• Changes in physical characteristics of data
• Filtering of data, etc.
Example:

Consider the vice president of marketing, who has just asked for a new report on product sales
and purchases. The manager turns to the data warehouse for the data for the report. Upon
inspection, the vice president proclaims the report to be fiction. The manager must then prove
that the data in the report are valid. The manager first checks how the report was produced from
the data in the warehouse. If the warehouse data has not been reported properly, the report is
adjusted.
However, if the report has been made properly from the data warehouse, the manager has
to go back to the operational sources. At this point, if the mapping data has been carefully stored,
the manager can quickly and easily go to the operational source. If the mapping
has not been stored properly, the manager has a difficult time defending the conclusions to the vice
president.
The metadata store for the data warehouse is therefore the natural place for storing mapping
information.
Figure 20:Functionality chart of Mapping
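Stored mapping information can serve both to load the warehouse and to trace a warehouse value back to its operational source, as the manager in the example needs to do. In the sketch below, the `MAPPING` table, the source systems `crm` and `orders`, and every field name are hypothetical.

```python
# Hypothetical mapping metadata:
# warehouse attribute -> (source system, source field, conversion function)
MAPPING = {
    "customer_name": ("crm", "cust_nm", str.strip),
    "weight_kg": ("orders", "weight_lb", lambda lb: round(lb * 0.45359237, 2)),
}


def to_warehouse(source_rows):
    """Apply the attribute-to-attribute mapping and its conversions."""
    out = {}
    for target, (system, field, convert) in MAPPING.items():
        out[target] = convert(source_rows[system][field])
    return out


def trace(target):
    """Go back from a warehouse attribute to its operational source."""
    system, field, _ = MAPPING[target]
    return f"{system}.{field}"


row = to_warehouse({"crm": {"cust_nm": " Asha "}, "orders": {"weight_lb": 10}})
print(row, trace("weight_kg"))
```

Because the same mapping record drives both directions, a questioned figure in a report can be followed to the exact source field it came from.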

3.11. Data Mart

A data mart is a model that has the same data structure as the data
warehouse. It is prepared for the specific requirements of the whole organization or a part of it.
A data mart contains less data, which gives users some advantages. First, queries run
faster. Second, it is portable: because it needs less hard disk space, the
user can carry the data mart on a laptop. When designing a data mart, there are
two ways to collect the data. One option is to collect
granular data from the enterprise data warehouse and then process it according to the needs
for which the data mart was prepared. The second option is to collect shaped data directly into
the data mart. The data, shaped to the requirements of the data mart, is then kept in
the central repository of all enterprise data. Figure 21 shows both options.
Figure 21:Data Mart
Data marts can have a dependent or an independent structure. If the characteristics of the data
marts’ dimensions are defined at the beginning so that they are compatible with each other,
the data marts are dependent.
In some situations it is better to have independent data marts. In that case the characteristics
of the other data marts are not taken into consideration while a data mart is
prepared. However, this can prevent future integration and add development cost if there is
later an interest in sharing information across departments.
3.11.1 Reason for creating a Data Mart
i. To give users more flexible access to the data they need to analyze most often.
ii. To provide data in a form that matches the collective view of a group of users.
iii. To improve end-user response time.
iv. Potential users of a data mart are clearly defined and can be targeted for support
to retrieve the data.
v. To provide appropriately structured data as dictated by the requirements of the
end-user access tools.
vi. Building a data mart is simpler than establishing a corporate data
warehouse.
vii. The cost of implementing data marts is far less than that required to establish a
data warehouse.
viii. The data mart is the access layer of the data warehouse environment. That means
we create a data mart so that users can retrieve data faster.
ix. The data mart is a subset of the warehouse, which means all the data available in the
data mart is also available in the warehouse database. A data mart is created for a
specific business purpose.
x. It is easy to access frequently needed data from the database when required by
the client.
xi. A group of users can be given access to view the data mart when required,
with good performance.
xii. A data mart is easy to create and maintain, and it relates to a specific
business.
xiii. Creating a data mart costs less than creating a data warehouse with a
huge amount of storage.
Figure 22: Functionality chart of Data Mart (labels shown: Resource, Finance)
3.11.2 Data Marts Development Approaches
There are three main approaches to building data marts: the top-down approach, the
bottom-up approach and the federated approach.
3.11.2.1. Top-Down Approach

As shown in Figure 23 below, data first comes from the operational sources to the data
staging area, where some processing is applied to it. It is then transferred to the data
warehouse, which in turn feeds the dependent data marts.
Figure 23: Top-Down Approach to Data Mart Development (labels shown: ODS, Enterprise Data Warehouse (EDW), Data Marts)
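The top-down flow can be sketched in a few lines of Python. This is only an illustrative sketch: the record fields (`dept`, `amount`), the cleaning rule and all function names are hypothetical and do not come from the notes.

```python
# Hypothetical operational records arriving from source systems.
operational_rows = [
    {"dept": "sales", "amount": "120.50"},
    {"dept": "finance", "amount": "80"},
    {"dept": "sales", "amount": None},  # incomplete record
]

def stage(rows):
    """Staging area: drop incomplete rows and convert amounts to numbers."""
    return [
        {"dept": r["dept"], "amount": float(r["amount"])}
        for r in rows
        if r["amount"] is not None
    ]

def load_warehouse(staged_rows):
    """Enterprise data warehouse: holds the cleaned rows of every department."""
    return list(staged_rows)

def build_dependent_mart(warehouse, dept):
    """Dependent data mart: a department-specific subset fed by the warehouse."""
    return [r for r in warehouse if r["dept"] == dept]

# Sources -> staging area -> warehouse -> dependent data mart.
warehouse = load_warehouse(stage(operational_rows))
sales_mart = build_dependent_mart(warehouse, "sales")
print(sales_mart)
```

Because the mart is derived from the warehouse, every row in `sales_mart` also exists in `warehouse`, which is what makes the mart "dependent".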
3.11.2.2. Bottom-Up Approach

In this approach, data that comes from legacy systems to the staging area flows
directly into independent data marts, and these data marts then feed the enterprise data
warehouse, as illustrated in Figure 24.
Figure 24: Bottom-Up Approach to Data Mart Development (labels shown: ODS, Enterprise Data Warehouse (EDW))
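A rough Python sketch of the bottom-up direction is given here, assuming two hypothetical independent marts that happen to share a `customer_id` key. The field names and the `feed_warehouse` helper are invented for illustration.

```python
# Hypothetical independent data marts, each loaded straight from the staging area.
sales_mart = [
    {"customer_id": 1, "revenue": 500.0},
    {"customer_id": 2, "revenue": 300.0},
]
finance_mart = [
    {"customer_id": 1, "outstanding": 50.0},
]

def feed_warehouse(marts, key):
    """Merge rows from several marts into one warehouse table keyed on `key`."""
    warehouse = {}
    for mart in marts:
        for row in mart:
            # Start a warehouse record for a new key, then add this mart's columns.
            merged = warehouse.setdefault(row[key], {key: row[key]})
            merged.update(row)
    return warehouse

edw = feed_warehouse([sales_mart, finance_mart], key="customer_id")
print(edw[1])
```

The merge only works because both marts use the same key. If independent marts are built without such shared dimensions, this integration step becomes much harder.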
3.11.3 The Differences between Data Mart and Data Warehouse
When the data mart is compared with the data warehouse, two fundamental distinctions
can easily be noticed. The first is that a data mart is a subset of the data warehouse
and is requirement oriented, whereas the data warehouse holds the enterprise data
without regard to any specific requirement. Even so, the structure of the whole
warehouse has to be considered when a data mart is designed; otherwise it will be very
hard to integrate the data marts later.
Figure 25: Data Warehouse and Data Mart
Implementing a data mart is much faster and cheaper, since a data mart
contains only a specific part of the data warehouse, whose implementation is more time
consuming and costs much more.
Many decision support system (DSS) vendors offer data mart solutions. However, because
these solutions are produced for general purposes, adapting them to the specific
requirements of a data mart takes considerable customization effort.
The other main difference between the data mart and the data warehouse is that the data
in the data mart can be less granular (more summarized) than the data in the data
warehouse. Because the requirements of the data mart are more clearly defined than those
of the data warehouse, the data can be pre-aggregated along those requirements, so it
can be extracted faster and more efficiently.
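Pre-aggregation can be illustrated with a short Python sketch. The sales rows, the field names and the `preaggregate` function are hypothetical; the point is only that totals are computed once when the mart is loaded, so later queries do not need to scan every detailed row.

```python
from collections import defaultdict

# Hypothetical granular warehouse rows: one row per individual sale.
warehouse_sales = [
    {"region": "North", "month": "2024-01", "amount": 100.0},
    {"region": "North", "month": "2024-01", "amount": 250.0},
    {"region": "South", "month": "2024-01", "amount": 75.0},
    {"region": "North", "month": "2024-02", "amount": 40.0},
]

def preaggregate(rows):
    """Sum sale amounts per (region, month) for loading into the data mart."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["month"])] += row["amount"]
    return dict(totals)

# The mart stores three summary rows instead of four detailed rows.
sales_mart = preaggregate(warehouse_sales)
print(sales_mart[("North", "2024-01")])
```

A report on monthly sales per region can now read one entry from `sales_mart` instead of adding up individual sales each time.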
Parameter | Data Warehouse | Data Mart
Definition | A data warehouse is a large repository of data collected from different organizations or departments within a corporation. | A data mart is a subset of a data warehouse. It is designed to meet the needs of a certain user group.
Usage | It helps in taking strategic decisions. | It helps in taking tactical decisions for the business.
Objective | The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time. | A data mart is mostly used in a business division at the department level.
Designing | The design process of a data warehouse is quite difficult. | The design process of a data mart is easy.
Model | May or may not use a dimensional model. However, it can feed dimensional models. | It is built around a dimensional model using a star schema.
Data Handling | Data warehousing covers a large area of the corporation, which is why it takes a long time to process. | Data marts are easy to use, design and implement because they handle only small amounts of data.
Focus | Data warehousing is broadly focused on all the departments. It can even represent the entire company. | A data mart is subject-oriented and is used at a department level.
Data type | The data stored in the data warehouse is always detailed when compared with a data mart. | Data marts are built for particular user groups. Therefore, the data is short and limited.
Subject-area | The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time. | Mostly holds only one subject area, for example sales figures.
Data storing | Designed to store enterprise-wide decision data, not just marketing data. | Dimensional modeling and star schema design are employed to optimize the performance of the access layer.
Data type | Time variance and non-volatile design are strictly enforced. | Mostly includes consolidated data structures to meet the subject area's query and reporting needs.
Data value | Read-only from the end user's standpoint. | Transaction data, regardless of grain, fed directly from the data warehouse.
Scope | Data warehousing is more helpful as it can bring in information from any department. | A data mart contains the data of a specific department of a company. There may be separate data marts for sales, finance, marketing, etc. It has limited usage.
Source | In a data warehouse, data comes from many sources. | In a data mart, data comes from very few sources.
Size | The size of a data warehouse may range from 100 GB to 1 TB+. | The size of a data mart is less than 100 GB.
Implementation time | The implementation process of a data warehouse can extend from months to years. | The implementation process of a data mart is restricted to a few months.
2.6 Knowledge Management
Knowledge is very important for the survival of an organization. Historically,
employees have gathered knowledge by trial and error or by
working as an apprentice under a tenured, knowledgeable employee.
Management guru Peter Drucker put forward the concept that knowledge is as
valuable as a company's other assets, such as plant and machinery. A knowledge
management system comprises a range of practices used in an organization
to identify, create, represent, distribute, and enable the adoption of insights and
experiences. Such insights and experiences comprise knowledge, either
embodied in individuals or embedded in organizational processes and
practices.
Importance of Knowledge Management
Knowledge provides a competitive advantage to an employee as well as to the
organization. The data and information that come with knowledge help
an organization make informed decisions. For example, knowledge about a
competitor's pricing model or business strategy can help an organization work
towards bettering the competitor. Historical data, e.g. sales data and pricing data,
can help an organization improve an existing or proposed business initiative.
Knowledge management is a highly iterative process which consists of six
major tasks: create, capture, refine, store, tag and circulate. The first step is
to create or capture data and store it at an appropriate location. The second step
is to refine the data into meaningful information. The third step is to transmit
the information to relevant stakeholders.
There are two types of knowledge which need to be captured as part of
knowledge management. The first type is hard data in terms of numbers and
figures. The second type of knowledge is the interpretation of captured data
based on experience. The real need of the knowledge management system is
to provide access to the knowledge base whenever required.
Knowledge Management System
A knowledge management system is a system developed to capture, create, refine, tag
and circulate information used to improve the business productivity of the
organization. There are three broad ways of managing the knowledge system.
❖ Utilization of information technology and systems to improve business
efficiency.
❖ Utilization of organizational methods to improve business efficiency.
❖ Creating a healthy workplace to facilitate improvement of business
efficiency.
Structure
The structure of the knowledge management system is dependent on the
business strategy of the organization. The final structure needs to have
alignment of technology, organizational structure and work culture.
Figure 26:Knowledge Management
Types of Knowledge Management Systems
Based on the structure and requirements of the organization, there are several types of
knowledge management systems. Some of them are as follows:
i. Expert Systems
These are knowledge management systems developed to assist a Subject
Matter Expert. Such a system provides knowledge of different subjects.
ii. Groupware
In the current global scenario, team members are spread across regions.
However, it is important for them to collaborate on various projects. Groupware
is a knowledge management system which helps team members share calendars and
project activities and exchange instant messages.
iii. SharePoint
It is important for a team to store various documents at a single location.
SharePoint enables a user to store multiple versions of the same document, helps
a user search through folders for a document, etc.
iv. Decision Support System
A decision support system helps floor managers, sales managers, the CEO, etc. take
decisions to finalize business or operational strategy. A decision support system
comprises primary data as well as secondary data. It enables editing of data and
converts it into information in the desired format.
v. Database Management System
Knowledge management systems which support active storage and retrieval of
data are known as database management systems.
All the systems we are discussing here come under knowledge management
category. A knowledge management system is not radically different from all
these information systems, but it just extends the already existing systems by
assimilating more information.
As we have seen, data is raw facts, information is processed and/or interpreted
data, and knowledge is personalized information.
What is Knowledge?
• Personalized information
• State of knowing and understanding
• An object to be stored and manipulated
• A process of applying expertise
• A condition of access to information
• Potential to influence action
Sources of Knowledge of an Organization
• Intranet
• Data warehouses and knowledge repositories
• Decision support tools
• Groupware for supporting collaboration
• Networks of knowledge workers
• Internal expertise
Purpose of KMS
• Improved performance
• Competitive advantage
• Innovation
• Sharing of knowledge
• Integration
• Continuous improvement by −
o Driving strategy
o Starting new lines of business
o Solving problems faster
o Developing professional skills
o Recruiting and retaining talent
Activities in Knowledge Management
• Start with the business problem and the business value to be delivered first.
• Identify what kind of strategy to pursue to deliver this value and address the KM
problem.
• Think about the system required from a people and process point of view.
• Finally, think about what kind of technical infrastructure is required to support
the people and processes.
• Implement the system and processes with appropriate change management and
iterative staged release.
Level of Knowledge Management
Figure 27:Level of Knowledge Management

Business Intelligence (BI)

• Managers and Decision Making
Decision Making in Management
Decision making is the mental process of selecting a course of action from a set of
alternatives. Every decision-making process produces an outcome that might be an action, a
recommendation, or an opinion. Since doing nothing or remaining neutral is usually
among the set of options one chooses from, selecting that course is also a decision.
Difference Between Problem Analysis and Decision Making
While they are related, problem analysis and decision making are distinct activities.
Decisions are commonly focused on a problem or challenge. Decision makers must gather
and consider data before making a choice. Problem analysis involves framing the issue by
defining its boundaries, establishing criteria with which to select from alternatives, and
developing conclusions based on available information. Analyzing a problem may not
result in a decision, although the results are an important ingredient in all decision
making.
Steps in Decision Making
Decision making comprises a series of sequential activities that together structure the
process and facilitate its conclusion. These steps are:
• Establishing objectives
• Classifying and prioritizing objectives
• Developing selection criteria
• Identifying alternatives
• Evaluating alternatives against the selection criteria
• Choosing the alternative that best satisfies the selection criteria
• Implementing the decision
A major part of decision making involves the analysis of a defined set of alternatives
against selection criteria. These criteria usually include costs and benefits, advantages and
disadvantages, and alignment with preferences. For example, when choosing a place to
establish a new business, the criteria might include rental costs, availability of skilled labor,
access to transportation and means of distribution, and proximity to customers. Based on
the relative importance of these factors, a business owner decides which location best
meets the criteria.
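One common way to evaluate alternatives against weighted criteria is a simple weighted score. The Python sketch below applies it to the location example. The weights, the scores and the site names are hypothetical values chosen for illustration, and weighted scoring is only one possible technique, not one prescribed by these notes.

```python
# Hypothetical relative importance of each criterion (weights add up to 100).
weights = {"rental_cost": 40, "skilled_labor": 30, "transport": 20, "customers": 10}

# Hypothetical scores from 1 (poor) to 10 (excellent) for each candidate location.
locations = {
    "Site A": {"rental_cost": 8, "skilled_labor": 5, "transport": 6, "customers": 7},
    "Site B": {"rental_cost": 4, "skilled_labor": 9, "transport": 8, "customers": 9},
}

def weighted_score(scores, weights):
    """Multiply each criterion score by its weight and add the results."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

# Rank the alternatives from highest to lowest weighted score.
ranked = sorted(
    locations,
    key=lambda name: weighted_score(locations[name], weights),
    reverse=True,
)
best = ranked[0]
print(best, weighted_score(locations[best], weights))
```

With these assumed numbers, Site A scores 660 and Site B scores 680, so Site B is chosen even though Site A has the better rental cost. Changing the weights can change the outcome, which is why the relative importance of the factors matters.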
The decision maker may face a problem when trying to evaluate alternatives in terms of
their strengths and weaknesses. This can be especially challenging when there are many
factors to consider. Time limits and personal emotions also play a role in the process of
choosing between alternatives. Greater deliberation and information gathering often
takes additional time, and decision makers often must choose before they feel fully
prepared. In addition, the more that is at stake the more emotions are likely to come into
play, and this can distort one’s judgment.
Types of Decisions
Three approaches to decision making are avoiding, problem solving and problem
seeking.
• Problem seeking: The process of clarifying, understanding, and restating the
problem.
• Problem solving: Problem solving involves using generic or ad hoc methods, in
an orderly manner, for finding solutions to specific problems.
Every decision-making process reaches a conclusion, which can be a choice to act or not
to act, a decision on what course of action to take and how, or even an opinion or
recommendation. Sometimes decision-making leads to redefining the issue or
challenge. Accordingly, three decision-making processes are known as avoiding,
problem solving, and problem seeking.
One decision-making option is to make no choice at all. There are several reasons why
the decision maker might do this:
1. There is insufficient information to make a reasoned choice between alternatives.
2. The potential negative consequences of selecting any alternative outweigh the
benefits of selecting one.
3. No pressing need for a choice exists and the status quo can continue without harm.
4. The person considering the alternatives does not have the authority to decide.
BI for Data analysis and Presenting Results
Business Intelligence (BI) is a technology-driven process for analyzing data and
presenting actionable information to help executives, managers and other
corporate end users make informed business decisions. BI encompasses a wide
variety of tools, applications and methodologies that enable organizations to
collect data from internal systems and external sources, prepare it for analysis,
develop and run queries against that data, and create reports, dashboards and
data visualizations to make the analytical results available to corporate
decision-makers, as well as operational workers.
Business intelligence is sometimes used interchangeably with business analytics.
In other cases, business analytics is used either more narrowly to refer to advanced
data analytics or more broadly to include both BI and advanced analytics.
Importance of Business Intelligence

The potential benefits of business intelligence tools include accelerating and
improving decision-making, optimizing internal business processes, increasing
operational efficiency, driving new revenues and gaining competitive advantage
over business rivals. BI systems can also help companies identify market trends
and spot business problems that need to be addressed.
BI data can include historical information stored in a data warehouse, as well as
new data gathered from source systems as it is generated, enabling BI tools to
support both strategic and tactical decision-making processes.
Initially, BI tools were primarily used by data analysts and other IT professionals
who ran analyses and produced reports with query results for business users.
Increasingly, however, business executives and workers are using BI platforms
themselves, thanks partly to the development of self-service BI and data discovery
tools and dashboards.
Types of BI tools
Business intelligence combines a broad set of data analysis applications, including
ad hoc analytics and querying, enterprise reporting, online analytical processing
(OLAP), mobile BI, real-time BI, operational BI, software-as-a-service BI, open
source BI, collaborative BI and location intelligence.
BI technology also includes data visualization software for designing charts and
other infographics, as well as tools for building BI dashboards and performance
scorecards that display visualized data on business metrics and key performance
indicators in an easy-to-grasp way. Data visualization tools have become the
standard of modern BI in recent years. A couple of leading vendors defined the
technology early on, and more traditional BI vendors have followed in their path.
Now, virtually every major BI tool incorporates features of visual data discovery.
BI programs may also incorporate forms of advanced analytics, such as data
mining, predictive analytics, text mining, statistical analysis and big data analytics.
In many cases, though, advanced analytics projects are conducted and managed
by separate teams of data scientists, statisticians, predictive modelers and other
skilled analytics professionals, while BI teams oversee more straightforward
querying and analysis of business data.
Business intelligence data is typically stored in a data warehouse or in smaller data
marts that hold subsets of a company's information. In addition, Hadoop systems
are increasingly being used within BI architectures as repositories or landing pads
for BI and analytics data, especially for unstructured data, log files, sensor data
and other types of big data. Before it is used in BI applications, raw data from
different source systems must be integrated, consolidated and cleansed using
data integration and data quality tools to ensure that users are analyzing accurate
and consistent information.
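The integration, consolidation and cleansing step can be sketched in Python as follows. The two source systems, the field names and the cleansing rules are hypothetical; real data integration and data quality tools do far more than this.

```python
# Hypothetical customer records from two source systems with inconsistent formatting.
crm_rows = [
    {"email": " Asha@Example.com ", "city": "Mumbai"},
]
billing_rows = [
    {"email": "asha@example.com", "city": "mumbai"},
    {"email": "ravi@example.com", "city": "Pune"},
]

def cleanse(row):
    """Normalize whitespace and letter case so equal values compare as equal."""
    return {
        "email": row["email"].strip().lower(),
        "city": row["city"].strip().title(),
    }

def integrate(*sources):
    """Combine several sources into one list with one record per email address."""
    consolidated = {}
    for source in sources:
        for row in source:
            clean = cleanse(row)
            # Keep the first record seen for each email and skip later duplicates.
            consolidated.setdefault(clean["email"], clean)
    return list(consolidated.values())

customers = integrate(crm_rows, billing_rows)
print(customers)
```

Without the cleansing step, the two differently formatted records for the same customer would be counted as two customers, and any BI report built on the data would be inconsistent.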
Questions
2 Marks Questions
1. Define Database Approach.
2. Define Big Data with an example.
3. Define Data Warehouse and Data Mart.
4. Define Knowledge Management with a neat diagram.
5. What are the 3 Vs of big data analytics?
6. What are the roles of BI?
7. Differentiate between traditional computing and stream computing.
8. Define data management.
5 Marks Questions
1. Describe the importance of Business Intelligence and DSS in developing MIS.
2. Explain the MIS pyramid.
3. What is an information system? What are the functions of an information system and
its impact on society in the domain of health care?
4. Explain the ethical issues and threats of information security.
5. Differentiate between Data Warehouse and Data Mart.
6. Differentiate between OLAP and OLTP.
7. Explain with a neat diagram the Value Chain of Big Data.
8. Explain the Knowledge Management framework with the KM Ladder.
10 Marks Questions
1. What is the role of knowledge management and knowledge management
programs in business?
2. What are the business benefits of using intelligent techniques for knowledge
management?
3. How do different decision-making constituencies in an organization use
business intelligence?
4. Explain the importance of the transition from traditional database systems to big
data analytics systems with respect to the online processing system.
