Lesson 2. Characteristics and Value of Information
Introduction
Data are raw facts, comparable to raw material. On their own, data are
not interrelated and do not help in decision making. Data is defined as
groups of non-random symbols in the form of text, images or voice,
representing quantities, actions and objects.
Information : Information is the product of data processing.
Information is interrelated data. Information is equivalent to finished
goods produced after processing the raw material.
The information has a value in decision making. Information brings clarity and creates an intelligent human response in the mind.
According to Davis and Olson: “Information is data that has been processed into a form that is meaningful to the recipient and is of
real or perceived value in the current or the prospective action or decision of the recipient.”
Information is one of the most critical resources of the organization. Managing information means managing the future. Information is knowledge that one
derives from facts placed in the right context with the purpose of reducing uncertainty.
Learning Objectives:
LESSON PROPER
A. Characteristics of Information
The parameters of good quality are difficult to define for information. Quality of information refers to its fitness for use, or its
reliability. The essential characteristics are as follows:
i) Timeliness : Timeliness means that information must reach the recipients within the prescribed timeframes. For effective
decision-making, information must reach the decision-maker at the right time, i.e. recipients must get information when they
need it. Delays destroy the value of information. To be effective, timely information should also be up-to-date,
i.e. current.
ii) Accuracy : Information should be accurate. It means that information should be clear and free from mistakes and errors.
Accuracy also means that the information is free from bias. Wrong information given to management would result in wrong
decisions. As managers' decisions are based on the information supplied in MIS reports, all managers need accurate
information.
iii) Relevance : Information is said to be relevant if it answers, specifically for the recipient, the questions what, why, where, when, who and
how. In other words, the MIS should give managers reports that are useful and that help them to make
decisions.
iv) Adequacy : Adequacy means information must be sufficient in quantity, i.e. the MIS must provide reports containing
the information required in the decision-making process. A report should give neither inadequate nor
more than adequate information, since either may create a difficult situation for the decision-maker. Whereas inadequacy of
information leads to crises, information overload results in chaos.
v) Completeness : The information which is given to a manager must be complete and should meet all his needs. Incomplete
information may result in wrong decisions and thus may prove costly to the organization.
vi) Explicitness : A report is said to be of good quality if it does not require further analysis by the recipients for decision
making.
vii) Impartiality : Impartial information contains no bias and has been collected without any distorted view of the situation.
Assessment
____________________________________________________________________________________________________
____________________________________________________________________________________________________
____________________________________________________________________________________________________
____________________________________________________________________________________________________
___________________________________________________________________________.
Introduction
The IT architecture and IT infrastructure provide the basis for all information systems in
the organization. An information system (IS) collects, processes, stores, analyzes, and
disseminates information for a specific purpose. A computer-based information system
(CBIS) is an information system that uses computer technology to perform some or all
of its intended tasks. Although not all information systems are computerized, most are.
For this reason, the term “information system” is typically used synonymously with
“computer based information system.”
Learning Objectives:
LESSON PROPER
A. Major Capabilities of Computer-Based Information Systems
• Perform high-speed, high-volume, numerical computations.
• Provide fast, accurate communication and collaboration within and among organizations.
• Store huge amounts of information in an easy-to-access, yet small, space.
• Allow quick and inexpensive access to vast amounts of information, worldwide.
• Facilitate the interpretation of vast amounts of data.
• Increase the effectiveness and efficiency of people working in groups in one place or in several locations, anywhere.
• Automate both semiautomatic business processes and manual tasks.
B. Application Programs
Once the business architecture is finished, the system developer can start a five-step process of building the IT architecture, as shown
in Figure 35. Notice that translating the business objectives into IT architecture can be a very complex undertaking. Let us look now at
various basic elements of IT architecture.
C. Managing Information Resources
Information resources is a general term that includes all the hardware, software (information systems and applications), data, and
networks in an organization. In addition to the computing resources, numerous applications exist, and new ones are continuously being
developed. Applications have enormous strategic value. Firms rely on them so heavily that, in some cases, when they are not working
(even for a short time), an organization cannot function. In addition, these information systems are very expensive to acquire, operate,
and maintain. Therefore, it is essential to manage them properly.
However, it is becoming increasingly difficult to manage an organization’s information resources effectively. The reason for this difficulty
comes from the evolution of the MIS function in the organization. When businesses first began to use computers in the early 1950s, the
information systems department (ISD) owned the only computing resource in the organization, the mainframe. At that time, end users
did not interact directly with the mainframe.
Today, computers are located throughout the organization, and almost all employees use computers in their work. This system is
known as end user computing. As a result of this change, the ISD no longer owns the organization’s information resources. Instead, a
partnership has developed between the ISD and the end users. The ISD now acts as more of a consultant to end users, viewing them
as customers. In fact, the main function of the ISD is to use IT to solve end users’ business problems.
As we just saw, the responsibility for managing information resources is now divided between the ISD and the end users. This
arrangement raises several important questions:
Which resources are managed by whom?
What is the role of the ISD, its structure, and its place within the organization?
In this section we provide brief answers to these questions. There are many types of information systems resources. In addition, their
components may come from multiple vendors and be of different brands. The major categories of information resources are hardware,
software, databases, networks, procedures, security facilities, and physical buildings. These resources are scattered throughout the
organization, and some of them change frequently. Therefore, they can be difficult to manage.
To make things more complicated, there is no standard menu for how to divide responsibility for developing and maintaining information
resources between the ISD and end users.
Instead, that division depends on many things: the size and nature of the organization, the amount and type of IT resources, the
organization’s attitudes toward computing, the attitudes of top management toward computing, the maturity level of the technology, the
amount and nature of outsourced IT work, and even the country in which the company operates.
Generally speaking, the ISD is responsible for corporate-level and shared resources and the end users are responsible for
departmental resources.
It is important that the ISD and the end users work closely together and cooperate regardless of who is doing what. Let us begin by
looking at the role of the ISD within the organization.
E. The Role of the IS Department
Inside the organization, the ISD and the end-user units must be close
partners. The ISD has the responsibility for setting standards for
hardware and software purchases, as well as for information security.
The ISD also monitors user hardware and software purchases, and it
serves as a gatekeeper concerning software licensing and illegal
downloads (e.g., music files).
Assessment
1. Explain the main role of the director of the ISD.
____________________________________________________________________________________________________
____________________________________________________________________________________________________
____________________________________________________________________________________________________
____________________________________________________________________________________________________
_______________________________________________________________________________________________.
Lesson 3. Purpose of Information Management
Introduction
To gain the maximum benefits from your company's information system, you have to exploit all its capacities. Information systems
gain their importance by processing the data from company inputs to generate information that is useful for managing your
operations. To increase the information system's effectiveness, you can either add more data to make the information more accurate
or use the information in new ways.
Learning Objectives:
LESSON PROPER
A. Business Communication Systems
Part of management is gathering and distributing information, and information systems can make this process more efficient by
allowing managers to communicate rapidly. Email is quick and effective, but managers can use information systems even more
efficiently by storing documents in folders that they share with the employees who need the information. This type of communication
lets employees collaborate in a systematic way.
Each employee can communicate additional information by making changes that the system tracks. The manager collects the inputs
and sends the newly revised document to his target audience.
How you manage your company's operations depends on the information you have. Information systems can offer more complete
and more recent information, allowing you to operate your company more efficiently. You can use information systems to gain a cost
advantage over competitors or to differentiate yourself by offering better customer service. Sales data give you insights about what
customers are buying and let you stock or produce items that are selling well. With guidance from the information system, you can
streamline your operations.
C. Company Decision-Making
The company information system can help you make better decisions by delivering all the information you need and by modeling the
results of your decisions. A decision involves choosing a course of action from several alternatives and carrying out the
corresponding tasks. When you have accurate, up-to-date information, you can make the choice with confidence.
If more than one choice looks appealing, you can use the information system to run different scenarios. For each possibility, the
system can calculate key indicators such as sales, costs and profits to help you determine which alternative gives the most
beneficial result.
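The scenario comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a feature of any particular information system: the `Scenario` class, the scenario names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One alternative course of action (all figures are hypothetical)."""
    name: str
    units_sold: int
    unit_price: float
    unit_cost: float
    fixed_cost: float

    @property
    def sales(self) -> float:
        return self.units_sold * self.unit_price

    @property
    def costs(self) -> float:
        return self.units_sold * self.unit_cost + self.fixed_cost

    @property
    def profit(self) -> float:
        return self.sales - self.costs


scenarios = [
    Scenario("Keep current price", units_sold=1000, unit_price=20.0,
             unit_cost=12.0, fixed_cost=5000.0),
    Scenario("Lower price", units_sold=1500, unit_price=17.0,
             unit_cost=12.0, fixed_cost=5000.0),
]

# Calculate the key indicators for each alternative.
for s in scenarios:
    print(f"{s.name}: sales={s.sales:.2f} costs={s.costs:.2f} profit={s.profit:.2f}")

# Pick the alternative with the highest profit.
best = max(scenarios, key=lambda s: s.profit)
print("Most beneficial alternative:", best.name)
```

Here profit is the only criterion; a real decision would usually weigh several indicators together.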
D. Company Record-Keeping
Your company needs records of its activities for financial and regulatory purposes as well as for finding the causes of problems and
taking corrective action. The information system stores documents and revision histories, communication records and operational
data. The trick to exploiting this recording capability is organizing the data and using the system to process and present it as useful
historical information. You can use such information to prepare cost estimates and forecasts and to analyze how your actions
affected the key company indicators.
Assessment
Introduction
Data consists of raw facts, such as employee numbers and sales figures. For data to be transformed into useful information, it must first be
organized in a meaningful way.
Learning Objectives:
LESSON PROPER
A. The Hierarchy of Data
The data in a book is organised into characters, words, phrases, sentences, paragraphs and
chapters. Similarly, data in a database can be organised into fields, records and files that
form a hierarchy. The data hierarchy begins with the smallest piece of data used by computers (a
bit) and progresses through the hierarchy to a database.
Bits can be organized into units called bytes. A byte is typically 8 bits. Each byte
represents a character. A character is the basic building block of data, consisting of
letters (A, B, C, …, Z, a, b, …, z), numeric digits (0, 1, 2, …, 9) or special symbols (., +, - ,
@, …).
Characters are put together to form a field. A field is typically a name (employee name),
number (salary) or combination of characters (national ID number) that describes an
aspect of a business object (e.g. an employee, a location, a vehicle) or activity (e.g. a
sale).
A collection of related fields is a record (e.g. an employee record). A collection of related records is a file (e.g. all employee records – typically known as
the employee file).
A database is a collection of interrelated files (e.g. Employee file, Department file, Payroll file). The files are linked through a key (e.g. the Employee, Department and
Payroll files can be linked by the Employee ID key).
An entity is a generalised class of people (e.g. Employee, Student, Customer), places (e.g. City, Outlet, Warehouse) or things (e.g. Part,
Item, Inventory) for which data is collected, stored and maintained. A record is an instance of an entity.
An attribute is a characteristic of an entity. Employee name and Designation are attributes of an employee. The main purpose of an attribute is to
capture the relevant characteristics of entities such as employees or customers. The specific value of an attribute, called a data item,
can be found in the fields of the record describing an entity (e.g. De Silva is a data item of the name attribute of an employee entity).
A key is a field or set of fields in a record that is used to identify the record. A primary key is a key that uniquely identifies the record
(e.g. national ID number or employee number may be used to identify an employee uniquely).
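As a rough illustration of the hierarchy above, the following Python sketch models fields, a record, a file and a primary key. The class name `EmployeeRecord`, the identifiers and the sample values (apart from the name De Silva used in the text) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """A record: a collection of related fields describing one employee."""
    employee_id: str   # primary key field
    name: str          # field holding the "name" attribute
    salary: float      # field holding the "salary" attribute


# A file: a collection of related records.
employee_file = [
    EmployeeRecord("E001", "De Silva", 55000.0),
    EmployeeRecord("E002", "Perera", 48000.0),
]

# Index the file by primary key so that each record can be identified uniquely.
by_id = {rec.employee_id: rec for rec in employee_file}
if len(by_id) != len(employee_file):
    raise ValueError("primary key values must be unique")

# At the bottom of the hierarchy, each ASCII character occupies one byte (8 bits).
name_bytes = by_id["E001"].name.encode("ascii")
print(len(name_bytes), "bytes =", len(name_bytes) * 8, "bits")
```

Looking a record up through `by_id` mirrors how a primary key identifies exactly one record in a file.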
From the earliest use of computers for business functions, companies used the traditional approach to process
their data. In the traditional approach, separate files were used for each application. Today this has changed to the database
approach, which uses a unified and integrated database for most of the transactions of the company.
Traditional Approach
The manual method of managing data is to record them on paper
(e.g. filling in an employee application form) and place them in files
(e.g. the employee file), which are stored in the filing cabinets of the
personnel division.
Each division created and managed files required for their applications.
Thus data which were common for several applications appeared in many files (e.g. employee name, address). This became one of the
flaws of the traditional approach to data management (e.g. employee name and address appeared in employee file, payroll file,
employee performance management file etc.). Duplication of data in separate files is known as data redundancy.
This caused problems when data had to be updated, because the updates had to be coordinated to ensure that each file was properly updated. As this is difficult
to achieve in practice, many inconsistencies could occur among data stored in separate files.
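The following Python sketch illustrates how data redundancy leads to inconsistency. It is a toy model under stated assumptions: the two dictionaries stand in for separate application files, and the addresses and the helper `find_inconsistencies` are hypothetical.

```python
# Traditional approach: each application keeps its own copy of the address.
employee_file = {"E001": {"name": "De Silva", "address": "12 Lake Road"}}
payroll_file = {"E001": {"name": "De Silva", "address": "12 Lake Road",
                         "salary": 55000.0}}

# The personnel division records a change of address in its own file only.
employee_file["E001"]["address"] = "7 Hill Street"


def find_inconsistencies(file_a, file_b, field):
    """Return the keys whose value for `field` differs between the two files."""
    return [key for key in file_a
            if key in file_b and file_a[key][field] != file_b[key][field]]


print(find_inconsistencies(employee_file, payroll_file, "address"))
```

With a single shared database the address would be stored once, so there would be no second copy to fall out of date.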
• Lack of Data Integration Having data in independent files made it difficult to provide end users with information for ad hoc
requests that required accessing data stored in several different files. Special computer programs had to be written to
retrieve data from each independent file. This was so difficult, time consuming and costly for some organizations that it was
impossible to provide end users or management with such information. If necessary, end users had to manually extract the
required information from the various reports produced by each separate application and prepare customized reports for
management.
• Data Dependence In file processing systems, major components of a system – the organization of files, their physical
locations on storage hardware, and the application software used to access those files – depended on one another in
significant ways. For example, application programs typically contained references to the specific format of the data stored
in the files they used. Thus, changes in the format and structure of data and records in a file required that changes be made
to all of the programs that used that file. This program maintenance effort was a major burden of file processing systems. It
proved difficult to do properly, and it resulted in a lot of inconsistency in the data files.
*DBA - a person responsible for the installing, configuring, upgrading, administrating, monitoring and maintaining of databases
in an organization.
F. Types of DBMS
A data model is a tool that database designers use to show the logical relationships among data. When data modeling is done at the level
of the entire organization, it is known as enterprise data modeling.
Different types of DBMS exist, based on the data model used: hierarchical, network, relational and object-oriented.
DBMS types are also distinguished by the number of users: single-user (e.g. MS Access) and multi-user (e.g. Oracle) DBMS.
Let’s consider the hierarchical, network, relational and Object-Oriented database models.
Hierarchical Databases
The hierarchical model is one of the oldest methods of organizing and storing data, and is now used by few
organizations. Related fields or records are grouped together so that there are
higher-level records and lower-level records, just as the parents in a family tree
sit above their children. Furthermore, each child can also be a
parent with children underneath it.
The parent record at the top of the pyramid is called the root record. A child
record always has only one parent record to which it is linked, just like in a
normal family tree. In contrast, a parent record may have more than one child
record linked to it. Hierarchical databases work by moving from the top down. A
record search is conducted by starting at the top and working down through the
tree from parent to child until the appropriate child record is found.
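The top-down search described above can be sketched as a simple tree traversal in Python. The `Node` class, the `find` function and the sample hierarchy are hypothetical and only illustrate the parent-child structure; they do not represent how any particular hierarchical DBMS is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A record in the hierarchy; each child belongs to exactly one parent."""
    name: str
    children: List["Node"] = field(default_factory=list)


def find(node: Node, target: str, path: tuple = ()) -> Optional[tuple]:
    """Search top-down from `node`; return the path of names leading to `target`."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for child in node.children:
        found = find(child, target, path)
        if found is not None:
            return found
    return None


# The root record sits at the top; every other record hangs below one parent.
root = Node("Company", [
    Node("Sales", [Node("Retail"), Node("Wholesale")]),
    Node("Finance", [Node("Payroll")]),
])

print(find(root, "Payroll"))
```

The search has to pass through the parent records on the way down, which is why access paths in this model always start at the root.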
Relational Databases
The relational model describes data using a standard tabular format. In a database structured according to the relational model, all data
elements are placed in two-dimensional tables, called relations, which are the logical equivalent of files. The tables in relational
databases organize data in rows and columns, simplifying data access and manipulation. It is easier for managers to understand the
relational model than other database models.
In the relational model, each row of a table represents a data entity,
with the columns of the table representing attributes. Each attribute can take on only certain values. The allowable values for these
attributes are called the domain. The domain for a particular attribute indicates what values can be placed in each of the columns of the
relational table. The relational database model is widely used.
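To make the idea of a domain concrete, here is a minimal Python sketch in which each column of a table accepts only values from its domain. The column names, the allowed designations and the helper functions are assumptions made for illustration.

```python
# Each attribute (column) has a domain: a rule describing the values it may take.
domains = {
    "employee_id": lambda v: isinstance(v, str) and v.startswith("E"),
    "designation": lambda v: v in {"Clerk", "Manager", "Engineer"},
    "salary": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def violations(row: dict) -> list:
    """Return the attributes of `row` whose values fall outside their domain."""
    return [attr for attr, allowed in domains.items() if not allowed(row.get(attr))]


employee_table = []  # the relation: a list of rows


def insert(row: dict) -> bool:
    """Add `row` to the table only if every value lies inside its domain."""
    if violations(row):
        return False
    employee_table.append(row)
    return True


print(insert({"employee_id": "E001", "designation": "Manager", "salary": 55000}))
print(insert({"employee_id": "E002", "designation": "Pilot", "salary": 48000}))
```

The second row is rejected because "Pilot" is not in the assumed domain of the designation column.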
Manipulating Data
Once data has been placed into a relational database, users can make inquiries and analyze data. To manipulate relational databases, a
set of relational operators has been defined. Basic data manipulations using relational operators include selecting, projecting and joining.
Selecting involves eliminating rows according to certain criteria. Projecting involves eliminating columns in a table. Joining involves
combining two or more tables.
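The three operators can be sketched in plain Python, treating a table as a list of rows. The table contents and the function names `select`, `project` and `join` are assumptions for illustration; a real relational DBMS would express the same operations in a query language such as SQL.

```python
employees = [
    {"emp_id": "E001", "name": "De Silva", "dept_id": "D1"},
    {"emp_id": "E002", "name": "Perera", "dept_id": "D2"},
]
departments = [
    {"dept_id": "D1", "dept_name": "Sales"},
    {"dept_id": "D2", "dept_name": "Finance"},
]


def select(table, predicate):
    """Selecting: keep only the rows that satisfy `predicate`."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Projecting: keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in table]


def join(left, right, key):
    """Joining: combine rows of two tables that share the same value of `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


sales_staff = select(join(employees, departments, "dept_id"),
                     lambda row: row["dept_name"] == "Sales")
print(project(sales_staff, ["name", "dept_name"]))
```

A full relational projection would also remove duplicate rows; this sketch leaves that step out to stay short.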
Hierarchical, network and relational databases are all designed to handle structured data; that is, data that fits neatly into fields, rows, and
columns. They are useful for handling small snippets of information such as names, addresses, zip codes, product numbers, and any
kind of statistic or number you can think of. On the other hand, an object-oriented database can be used to store data from a variety of
media sources, such as photographs and text, and produce work, as output, in a multimedia format.
Object-oriented databases use small, reusable chunks of software called objects. The objects themselves are stored in the object-oriented
database. Each object consists of two elements: 1) a piece of data (e.g., sound, video, text, or graphics), and 2) the
instructions, or software programs called methods, for what to do with the data. Part two of this definition requires a little more
explanation. The instructions contained within the object are used to do something with the data in the object. For example, test scores
would be within the object as would the instructions for calculating average test score.
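The test-score example can be written as a small Python class, where the data and the method that works on it live in the same object. The class name `ScoreSet` and the sample scores are hypothetical.

```python
class ScoreSet:
    """An object bundling data (the scores) with a method that uses that data."""

    def __init__(self, scores):
        self.scores = list(scores)  # the data held by the object

    def average(self) -> float:
        """The method: calculate the average of the scores held in this object."""
        if not self.scores:
            raise ValueError("no scores recorded")
        return sum(self.scores) / len(self.scores)


class_scores = ScoreSet([70, 85, 91])
print(class_scores.average())
```

Any program that retrieves the object can call `average()` without knowing how the scores are stored.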
Object-oriented databases have two disadvantages. First, they are more costly to develop. Second, most organizations are reluctant to
abandon or convert from those databases that they have already invested money in developing and implementing. However, the
benefits of object-oriented databases are compelling. The ability to mix and match reusable objects provides incredible multimedia
capability. Healthcare organizations, for example, can store, track, and recall CAT scans, X-rays, electrocardiograms and many other
forms of crucial data.
Assessment:
Introduction
A Database Management System (DBMS) is a collection of programs that manages the database structure and controls access to the
data stored in the database. It is software that facilitates the process of defining, storing, manipulating and sharing a database among
various users and applications.
As stated below, database applications enable organisations to generate information useful for decision making. Furthermore, with the
help of databases, organisations try to improve their efficiency as well as achieve competitive advantage. For example, databases support organisations
in carrying out data mining and business intelligence, which help to identify customer preferences effectively.
Learning Objectives:
LESSON PROPER
Typical users of the DBMS are database administrators, database designers, end users, systems analysts and application
programmers. The DBMS performs several important functions, which are discussed below.
Like other software products, database systems are available as commercial products (e.g. SQL Server, DB2, Oracle, Informix) and as
open-source products (e.g. PostgreSQL and MySQL).
MySQL : This is the most popular open-source database management system.
Today, customers, suppliers and company employees must be able to access corporate databases through the internet, intranet and
extranet to meet various business needs.
Example:
1. When a customer buys a book through the internet, he is accessing a database to find the book information, author,
price, etc.
2. With the help of databases, suppliers can check the raw materials and the current production schedule to determine
when and how much of their products must be delivered to support just-in-time inventory management.
3. Employees of a company working from abroad may want to access the internal databases through the Internet or the
intranet to make important decisions.
Developing a seamless integration of traditional databases with the internet is often called a semantic web. The semantic web is about
taking the relational database and webbing it. It allows accessing and manipulating a number of traditional databases at the same time
through the internet.
Instead of the internet, organizations are gaining access to databases through networks to get good prices and reliable services.
However, linking company databases to external networks such as the Internet can be potentially dangerous due to issues related to
security. For example, a competitor or any other hacker may gain access to these databases.
Data warehouses and data mining allow data to be used to support
decision making. A data warehouse stores data that have been
extracted from the various operational, external and other databases
of an organisation. It is a central source of data that have been
cleaned, transformed and catalogued so that they can be used by
managers and other business professionals for data mining, online
analytical processing and other forms of business analysis, market
research and decision support. A data warehouse can also be
viewed as a database for historical data from different functions
within a company.
Data Mining
Data mining is a major use of data warehouse databases. It is an information-analysis tool that involves the automated discovery of
hidden patterns and trends in historical business activity. Data mining’s objective is to extract patterns, trends and rules from data
warehouses to evaluate (i.e. predict or score) proposed business strategies, which in turn will improve competitiveness, improve profits
and transform business processes. With the help of data mining it is possible to improve customer retention, campaign management
and customer segmentation analysis.
D. Business Intelligence
Databases can be used for the purpose of business intelligence closely linked to data mining. Business Intelligence is the process of
gathering enough of the right information in a timely manner and usable form and analyzing it so that it can have a positive impact on
business strategy, tactics or operations. Business Intelligence turns data into valuable information and distributes it throughout an
enterprise. This information is used by the companies to improve strategic discussions about which markets to enter, how to select and
manage key customer relationships, how to improve sales promotions etc. Business Intelligence tools (applications) can be found in
different categories such as Business planning, Customer Relationship Management (CRM), Management Information Systems (MIS)
etc.
Online analytical processing (OLAP) allows users to explore data from a number of different perspectives. OLAP involves analysing complex
relationships among thousands or even millions of data items stored in data marts, data warehouses, and other multidimensional
databases to discover patterns, trends and exceptional conditions.
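Exploring the same data from different perspectives can be illustrated with a small aggregation in Python. The sales facts, the dimension names and the `roll_up` helper are hypothetical; real OLAP tools work on far larger multidimensional stores.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions: region, product and quarter.
sales = [
    {"region": "North", "product": "Tea", "quarter": "Q1", "amount": 120},
    {"region": "North", "product": "Coffee", "quarter": "Q1", "amount": 80},
    {"region": "South", "product": "Tea", "quarter": "Q1", "amount": 100},
    {"region": "South", "product": "Tea", "quarter": "Q2", "amount": 150},
]


def roll_up(facts, dimension):
    """Total the `amount` measure along one chosen dimension."""
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[dimension]] += fact["amount"]
    return dict(totals)


# The same facts viewed from three different perspectives.
print(roll_up(sales, "region"))
print(roll_up(sales, "product"))
print(roll_up(sales, "quarter"))
```

Each call summarises the same facts along a different dimension, which is the basic idea behind viewing data "from a number of different perspectives".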
The database administrator often selects the best database management system for an organization. The process begins by analyzing
database needs and characteristics. The information needs of the organization affect the type of data that is collected and the type of
database management system that is used.
The important features that have to be considered when selecting a Database Management System are as follows.
Database size
Database size depends on the number of records or files in the database. The size determines the overall storage requirement
for the database.
To maintain good performance and to reduce costs companies are trimming the size of their databases.
Number of concurrent users
The number of simultaneous users that can access the contents of the database is also an important factor. A database that is
used by a large workgroup must be able to support a number of concurrent users. If it cannot, user
requests will be handled less efficiently. To keep the database flexible, companies prefer a highly scalable DBMS. Scalability
describes how well a database performs as the size of the database or the number of concurrent users increases.
Performance
How fast the database is able to update records can be the most important performance criterion for some organizations.
Example: Credit and airline companies must have database systems that can immediately update customer records and check
credit or make a plane reservation in seconds not minutes. However payroll applications can be processed once a week or
less frequently and do not require immediate processing. When an application demands immediacy, it also demands rapid
recovery facilities in the event that the computer system shuts down temporarily. Other performance considerations include the
number of concurrent users that can be supported and the amount of memory that is required to execute the database
management program.
Integration
A key aspect of any database is its ability to be integrated with other applications and databases. A key determinant here is
what operating systems it can run under – such as Linux, UNIX or Windows. Some companies use several databases for
different applications at different locations.
Features
The features of the database management system can also make a big difference. Most database programs come with
security procedures, privacy protection and a variety of tools.
The vendor
The size, reputation and financial stability of the vendor are also important. Some organisations rely on vendor
support to handle operational aspects of the system.
Cost
The cost of a database system varies from a few thousand to millions of rupees, depending on the number of users and the functionality.
In addition to the initial cost of the database package, annual or monthly maintenance or operating costs should be
considered.
Assessment:
Direction: Write TT if the statement is true and FF if it is false.
_____1. Today customers, suppliers and company employees must not be able to access corporate database through the
internet, intranet and extranet to meet various business needs.
_____2. Data Warehouse is a major use of data warehouse databases.
_____3. Databases can be used for the purpose of business intelligence closely linked to data mining.
_____4. The features of the database management system can also make a big difference.
_____5. The size, reputation and financial stability of the vendor is not an important aspect.
Lesson 1. Overview
Introduction
The term "Data Warehouse" was first coined by Bill Inmon in 1990. According to Inmon, a data warehouse is a subject oriented,
integrated, time-variant, and non-volatile collection of data. This data helps analysts to take informed decisions in an organization.
An operational database undergoes frequent changes on a daily basis on account of the transactions that take place. Suppose a
business executive wants to analyze previous feedback on any data such as a product, a supplier, or any consumer data, then the
executive will have no data available to analyze because the previous data has been updated due to transactions.
A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this generalized and consolidated
view of data, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools help in the interactive and
effective analysis of data in a multidimensional space. This analysis results in data generalization and data mining.
Data mining functions such as association, clustering, classification and prediction can be integrated with OLAP operations to enhance the
interactive mining of knowledge at multiple levels of abstraction. That is why the data warehouse has become an important platform for
data analysis and online analytical processing.
Learning Objectives:
LESSON PROPER
An operational database is constructed for well-known tasks and workloads such as searching particular records, indexing,
etc. In contrast, data warehouse queries are often complex and they present a general form of data.
Operational databases support concurrent processing of multiple transactions. Concurrency control and recovery
mechanisms are required for operational databases to ensure robustness and consistency of the database.
An operational database query allows both read and modify operations, while an OLAP query needs only read-only access to
stored data.
An operational database maintains current data. On the other hand, a data warehouse maintains historical data.
Subject Oriented − A data warehouse is subject oriented because it provides information around a subject rather than the
organization's ongoing operations. These subjects can be product, customers, suppliers, sales, revenue, etc. A data
warehouse does not focus on the ongoing operations, rather it focuses on modelling and analysis of data for decision
making.
Integrated − A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases,
flat files, etc. This integration enhances the effective analysis of data.
Time Variant − The data collected in a data warehouse is identified with a particular time period. The data in a data
warehouse provides information from the historical point of view.
Non-volatile − Non-volatile means the previous data is not erased when new data is added to it. A data warehouse is kept
separate from the operational database, and therefore frequent changes in the operational database are not reflected in the data
warehouse.
Note − A data warehouse does not require transaction processing, recovery, and concurrency controls, because it is physically stored
separately from the operational database.
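The time-variant and non-volatile properties can be illustrated with a minimal sketch. It assumes a hypothetical helper, record_price, that applies the same change to an operational store (which overwrites) and to a warehouse (which only appends); all names and values are illustrative and not taken from any real system.

```python
from datetime import date

# Operational store: one current value per key; updates overwrite history.
operational = {}

# Warehouse: append-only list of (product, price, as_of) rows; nothing is erased.
warehouse = []

def record_price(product, price, as_of):
    """Apply a price change to both stores (hypothetical helper)."""
    operational[product] = price                 # the previous value is lost here
    warehouse.append((product, price, as_of))    # earlier rows are kept

def price_history(product):
    """Return every (as_of, price) pair the warehouse holds for a product."""
    return [(as_of, price) for p, price, as_of in warehouse if p == product]

record_price("widget", 10.0, date(2024, 1, 1))
record_price("widget", 12.5, date(2024, 6, 1))
```

After the two calls, operational holds only the latest price, while price_history("widget") still returns both dated entries, which is what makes historical analysis possible.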
D. Data Warehouse Applications
As discussed before, a data warehouse helps business executives to organize, analyze, and use their data for decision making. A
data warehouse serves as a sole part of a plan-execute-assess "closed-loop" feedback system for the enterprise management. Data
warehouses are widely used in the following fields −
Financial services
Banking services
Consumer goods
Retail sectors
Controlled manufacturing
Information Processing − A data warehouse allows the data stored in it to be processed. The data can be processed by means of
querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
Analytical Processing − A data warehouse supports analytical processing of the information stored in it. The data can be
analyzed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.
Data Mining − Data mining supports knowledge discovery by finding hidden patterns and associations, constructing
analytical models, and performing classification and prediction. These mining results can be presented using visualization tools.
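The OLAP operations named under Analytical Processing can be sketched on a tiny in-memory data set. This is only an illustration under assumed data: the sales rows, the choice of dimensions, and the function names are all hypothetical, and a real OLAP tool would work on a multidimensional store rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales amount).
sales = [
    (2023, "Q1", "North", "Laptop", 100),
    (2023, "Q1", "South", "Laptop", 80),
    (2023, "Q2", "North", "Phone", 60),
    (2024, "Q1", "North", "Laptop", 120),
    (2024, "Q1", "South", "Phone", 90),
]

def slice_by_year(rows, year):
    """Slice: fix one dimension (year) to a single value."""
    return [r for r in rows if r[0] == year]

def dice(rows, years, regions):
    """Dice: keep a sub-cube by restricting two dimensions."""
    return [r for r in rows if r[0] in years and r[2] in regions]

def roll_up_to_year(rows):
    """Drill up: aggregate quarters away, leaving totals per year."""
    totals = defaultdict(int)
    for year, _quarter, _region, _product, amount in rows:
        totals[year] += amount
    return dict(totals)

def drill_down_to_quarter(rows, year):
    """Drill down: break one year's total into its quarters."""
    totals = defaultdict(int)
    for y, quarter, _region, _product, amount in rows:
        if y == year:
            totals[quarter] += amount
    return dict(totals)

def pivot_region_by_product(rows):
    """Pivot: show regions as rows and products as columns."""
    table = defaultdict(lambda: defaultdict(int))
    for _year, _quarter, region, product, amount in rows:
        table[region][product] += amount
    return {region: dict(cols) for region, cols in table.items()}
```

For example, roll_up_to_year(sales) gives one total per year, and drill_down_to_quarter(sales, 2023) breaks the 2023 total back into its quarters.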
Assessment
Lesson 2. Concepts
Introduction
Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from
multiple heterogeneous sources that support analytical reporting, structured and/or ad hoc queries, and decision making. Data
warehousing involves data cleaning, data integration, and data consolidation.
Learning Objectives:
LESSON PROPER
Tuning Production Strategies − The product strategies can be well tuned by repositioning the products and managing the
product portfolios by comparing the sales quarterly or yearly.
Customer Analysis − Customer analysis is done by analyzing the customer's buying preferences, buying time, budget
cycles, etc.
Operations Analysis − Data warehousing also helps in customer relationship management, and making environmental
corrections. The information also allows us to analyze business operations.
B. Integrating Heterogeneous Databases
To integrate heterogeneous databases, we have two approaches −
Query-driven Approach
Update-driven Approach
C. Query-Driven Approach
This is the traditional approach to integrating heterogeneous databases. This approach was used to build wrappers and integrators on
top of multiple heterogeneous databases. These integrators are also known as mediators.
Disadvantages
Query-driven approach needs complex integration and filtering processes.
This approach is very inefficient.
It is very expensive for frequent queries.
This approach is also very expensive for queries that require aggregations.
D. Update-Driven Approach
This is an alternative to the traditional approach. Today's data warehouse systems follow the update-driven approach rather than the
traditional approach discussed earlier. In the update-driven approach, the information from multiple heterogeneous sources is integrated
in advance and stored in a warehouse. This information is available for direct querying and analysis.
Advantages
This approach has the following advantages −
This approach provides high performance.
The data is copied, processed, integrated, annotated, summarized and restructured in a semantic data store in advance.
Query processing does not require an interface to process data at local sources.
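The difference between the two approaches can be sketched as follows. The two source systems, their field names, and the helper functions are hypothetical; the point is only that the query-driven path re-integrates the sources on every query, while the update-driven path integrates once and answers from the stored copy.

```python
# Two hypothetical source systems with different field names.
crm_source = [{"cust": "Ana", "spent": 120}, {"cust": "Ben", "spent": 80}]
shop_source = [{"customer_name": "Ana", "amount": 30}]

def normalize(source_rows, name_key, amount_key):
    """Wrapper: translate one source's rows into a common (name, amount) form."""
    return [(row[name_key], row[amount_key]) for row in source_rows]

def integrate():
    """Integrator: merge all wrapped sources into totals per customer."""
    totals = {}
    rows = normalize(crm_source, "cust", "spent") + normalize(
        shop_source, "customer_name", "amount"
    )
    for name, amount in rows:
        totals[name] = totals.get(name, 0) + amount
    return totals

def query_driven_total(name):
    """Query-driven: contact and integrate the sources on every query."""
    return integrate().get(name, 0)

# Update-driven: integrate once in advance and store the result.
warehouse = integrate()

def update_driven_total(name):
    """Update-driven: answer from the pre-integrated warehouse copy."""
    return warehouse.get(name, 0)
```

Both functions return the same total for a customer, but only the query-driven one touches the sources at query time, which is why it becomes expensive for frequent or aggregate queries. The stored copy, in turn, has to be refreshed when the sources change.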
Introduction
A data warehouse is never static; it evolves as the business expands. As the business evolves, its requirements keep changing and
therefore a data warehouse must be designed to ride with these changes. Hence a data warehouse system needs to be flexible.
Ideally there should be a delivery process to deliver a data warehouse. However data warehouse projects normally suffer from various
issues that make it difficult to complete tasks and deliverables in the strict and ordered fashion demanded by the waterfall method.
Most of the time, the requirements are not understood completely. The architectures, designs, and build components can be
completed only after gathering and studying all the requirements.
Learning Objectives:
LESSON PROPER
A. Delivery Method
The delivery method is a variant of the joint application development approach adopted for the delivery of a data warehouse. We have
staged the data warehouse delivery process to minimize risks. The approach that we will discuss here does not reduce the overall
delivery time-scales but ensures the business benefits are delivered incrementally through the development process.
Note − The delivery process is broken into phases to reduce the project and delivery risk.
The following diagram explains the stages in the delivery
process −
B. IT Strategy
Data warehouses are strategic investments that require a business process to generate benefits. IT Strategy is required to procure and
retain funding for the project.
C. Business Case
The objective of the business case is to estimate the business benefits that should be derived from using a data warehouse. These benefits
may not be quantifiable but the projected benefits need to be clearly stated. If a data warehouse does not have a clear business case,
then the business tends to suffer from credibility problems at some stage during the delivery process. Therefore in data warehouse
projects, we need to understand the business case for investment.
D. Education and Prototyping
Organizations experiment with the concept of data analysis and educate themselves on the value of having a data warehouse before
settling for a solution. This is addressed by prototyping. It helps in understanding the feasibility and benefits of a data warehouse. The
prototyping activity on a small scale can promote educational process as long as −
E. Business Requirements
To provide quality deliverables, we should make sure the overall requirements are understood. If we understand the business
requirements for both short-term and medium-term, then we can design a solution to fulfil short-term requirements. The short-term
solution can then be grown to a full solution.
The following aspects are determined in this stage −
F. Technical Blueprint
This phase needs to deliver an overall architecture satisfying the long-term requirements. This phase also delivers the components that
must be implemented in the short term to derive any business benefit. The blueprint needs to identify the following.
K. Extending Scope
In this phase, the data warehouse is extended to address a new set of business requirements. The scope can be extended in two
ways −
Introduction
We have a fixed number of operations to be applied on the operational databases and we have well-defined techniques such as using
normalized data and keeping tables small. These techniques are suitable for delivering a solution. But in the case of decision-support
systems, we do not know what queries and operations will need to be executed in the future. Therefore, techniques applied on operational
databases are not suitable for data warehouses.
Learning Objectives:
LESSON PROPER
Data extraction takes data from the source systems. Data load takes the extracted data and loads it into the data warehouse.
Note − Before loading the data into the data warehouse, the information extracted from the external sources must be reconstructed.
When to Initiate Extract
Data needs to be in a consistent state when it is extracted, i.e., the data warehouse should represent a single, consistent version of
the information to the user.
For example, in a customer profiling data warehouse in telecommunication sector, it is illogical to merge the list of customers at 8 pm
on Wednesday from a customer database with the customer subscription events up to 8 pm on Tuesday. This would mean that we are
finding the customers for whom there are no associated subscriptions.
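One way to guard against this situation is to compare the cutoff times of all extracts before merging them. The sketch below assumes each source reports a single extraction cutoff; the function name and the source names are hypothetical.

```python
from datetime import datetime

def consistent_cutoff(extracts):
    """Return the shared cutoff if every source was extracted as of the
    same moment, otherwise None. `extracts` maps source name -> cutoff."""
    cutoffs = set(extracts.values())
    return cutoffs.pop() if len(cutoffs) == 1 else None

# 3 January 2024 was a Wednesday and 2 January 2024 a Tuesday.
mismatched = {
    "customers": datetime(2024, 1, 3, 20, 0),       # Wednesday 8 pm
    "subscriptions": datetime(2024, 1, 2, 20, 0),   # Tuesday 8 pm
}
aligned = {
    "customers": datetime(2024, 1, 3, 20, 0),
    "subscriptions": datetime(2024, 1, 3, 20, 0),
}
```

Here consistent_cutoff(mismatched) returns None, signalling that the two extracts should not be merged, while consistent_cutoff(aligned) returns the common Wednesday 8 pm cutoff.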
Note − Consistency checks are executed only when all the data sources have been loaded into the temporary data store.
within itself.
with other data within the same data source.
with the data in other source systems.
with the existing data present in the warehouse.
Transforming involves converting the source data into a structure. Structuring the data increases the query performance and
decreases the operational cost. The data contained in a data warehouse must be transformed to support performance requirements
and control the ongoing operational costs.
Aggregation
Aggregation is required to speed up common queries. Aggregation relies on the fact that most common queries will analyze a subset
or an aggregation of the detailed data.
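As a minimal sketch of this idea, the summary below is computed once from the detail rows, and the common query is then answered from the summary alone. The detail rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (month, store, sales amount).
detail = [
    ("2024-01", "S1", 40),
    ("2024-01", "S2", 60),
    ("2024-02", "S1", 50),
]

def build_monthly_summary(rows):
    """Precompute total sales per month, as a load step might."""
    summary = defaultdict(int)
    for month, _store, amount in rows:
        summary[month] += amount
    return dict(summary)

monthly_summary = build_monthly_summary(detail)

def monthly_sales(month):
    """Answer the common query from the summary, not the detail rows."""
    return monthly_summary.get(month, 0)
```

The trade-off is that the summary must be rebuilt or updated whenever new detail data is loaded.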
Backup and Archive the Data
In order to recover the data in the event of data loss, software failure, or hardware failure, it is necessary to keep regular backups.
Archiving involves removing the old data from the system in a format that allows it to be quickly restored whenever required.
For example, in a retail sales analysis data warehouse, it may be required to keep data for 3 years with the latest 6 months data being
kept online. In such a scenario, there is often a requirement to be able to do month-on-month comparisons for this year and last year.
In this case, we require some data to be restored from the archive.
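The retention rule in this example can be sketched as a small classification function. The 6-month and 3-year figures come from the example above; measuring age in whole calendar months and the function names are assumptions made for illustration.

```python
from datetime import date

ONLINE_MONTHS = 6       # latest 6 months stay online (from the example)
RETENTION_MONTHS = 36   # 3 years are kept in total (from the example)

def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def placement(record_date, today):
    """Classify a record as 'online', 'archive', or 'expired'."""
    age = months_between(record_date, today)
    if age < ONLINE_MONTHS:
        return "online"
    if age < RETENTION_MONTHS:
        return "archive"
    return "expired"
```

With today set to 15 June 2024, a record from June 2023 is classified as 'archive', so comparing this June with last June requires restoring that month from the archive.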
D. Query Management Process
This process performs the following functions −
Lesson 4. Architecture
Learning Objectives:
LESSON PROPER
Since a data warehouse can gather information quickly and efficiently, it can enhance business productivity.
A data warehouse provides us a consistent view of customers and items; hence, it helps us manage customer relationships.
A data warehouse also helps in bringing down costs by tracking trends and patterns over a long period in a consistent and
reliable manner.
To design an effective and efficient data warehouse, we need to understand and analyze the business needs and construct
a business analysis framework. Each person has different views regarding the design of a data warehouse. These views are as
follows −
The top-down view − This view allows the selection of relevant information needed for a data warehouse.
The data source view − This view presents the information being captured, stored, and managed by the operational system.
The data warehouse view − This view includes the fact tables and dimension tables. It represents the information stored
inside the data warehouse.
The business query view − It is the view of the data from the viewpoint of the end-user.
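The fact tables and dimension tables mentioned in the data warehouse view can be illustrated with a very small sketch. The table contents and names are hypothetical: facts hold measures plus keys, and dimensions hold the descriptive attributes those keys point to.

```python
# Dimension tables: descriptive attributes keyed by an id (hypothetical data).
dim_product = {1: {"name": "Laptop"}, 2: {"name": "Phone"}}
dim_store = {10: {"city": "Manila"}, 11: {"city": "Cebu"}}

# Fact table: measures (units) plus foreign keys into the dimensions.
fact_sales = [
    {"product_id": 1, "store_id": 10, "units": 2},
    {"product_id": 2, "store_id": 10, "units": 5},
    {"product_id": 1, "store_id": 11, "units": 1},
]

def units_by_city():
    """Join each fact row to the store dimension and total units per city."""
    totals = {}
    for row in fact_sales:
        city = dim_store[row["store_id"]]["city"]
        totals[city] = totals.get(city, 0) + row["units"]
    return totals

def units_by_product():
    """Join each fact row to the product dimension and total units per product."""
    totals = {}
    for row in fact_sales:
        name = dim_product[row["product_id"]]["name"]
        totals[name] = totals.get(name, 0) + row["units"]
    return totals
```

The same fact rows answer both questions; only the dimension used for grouping changes.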
B. Three-Tier Data Warehouse Architecture
Generally, a data warehouse adopts a three-tier architecture. Following
are the three tiers of the data warehouse architecture.
Virtual Warehouse
Data mart
Enterprise Warehouse
Virtual Warehouse
The view over an operational data warehouse is known as a virtual warehouse. It is easy to build a virtual warehouse. Building a
virtual warehouse requires excess capacity on operational database servers.
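A virtual warehouse can be sketched with a database view. The example below uses Python's built-in sqlite3 module with an in-memory database; the orders table, the sales_by_region view, and the data are hypothetical stand-ins for an operational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('North', 100.0), ('North', 50.0), ('South', 70.0);

    -- The "virtual warehouse": a view, so no data is copied.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total
        FROM orders
        GROUP BY region;
    """
)

def totals():
    """Query the view; the operational server does the work each time."""
    rows = conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    )
    return rows.fetchall()
```

Because the view is computed from the operational table on every query, it is easy to set up but consumes capacity on the operational server, which matches the point made above.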
Data Mart
Data mart contains a subset of organization-wide data. This subset of data is valuable to specific groups of an organization.
In other words, we can claim that data marts contain data specific to a particular group. For example, the marketing data mart may
contain data related to items, customers, and sales. Data marts are confined to subjects.
Points to remember about data marts −
Windows-based or Unix/Linux-based servers are used to implement data marts. They are implemented on low-cost servers.
The implementation cycle of a data mart is measured in short periods of time, i.e., in weeks rather than months or years.
The life cycle of a data mart may be complex in long run, if its planning and design are not organization-wide.
Data marts are small in size.
Data marts are customized by department.
The source of a data mart is a departmentally structured data warehouse.
Data marts are flexible.
Enterprise Warehouse
An enterprise warehouse collects all the information and the subjects spanning an entire organization.
It provides us enterprise-wide data integration.
The data is integrated from operational systems and external information providers.
This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or beyond.
D. Load Manager
This component performs the operations required for the extract and load process.
The size and complexity of the load manager vary between specific solutions, from
one data warehouse to another.
Fast Load
In order to minimize the total load window, the data needs to be loaded into the warehouse in the fastest possible time.
Transformations affect the speed of data processing.
It is more effective to load the data into a relational database prior to applying transformations and checks.
Gateway technology is not suitable, since it tends not to be performant when large data volumes are involved.
Simple Transformations
While loading, it may be required to perform simple transformations. After this has been completed, we are in a position to do the complex
checks. Suppose we are loading EPOS sales transactions; we then need to perform the following checks:
Strip out all the columns that are not required within the warehouse.
Convert all the values to required data types.
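The two simple transformations listed above can be sketched in a few lines. The column names and types are hypothetical, since the source does not list the fields of an EPOS sales transaction.

```python
# Columns the warehouse keeps, with the type each is converted to.
# These names are hypothetical; the source does not list EPOS fields.
REQUIRED = {"store_id": int, "sku": str, "quantity": int, "price": float}

def transform(raw_row):
    """Drop unneeded columns and convert the rest to the required types."""
    return {name: cast(raw_row[name]) for name, cast in REQUIRED.items()}

raw = {
    "store_id": "17",
    "sku": "A-100",
    "quantity": "3",
    "price": "9.50",
    "cashier_note": "loyalty card scanned",  # not required in the warehouse
}
clean = transform(raw)
```

A row with a missing required column or a value that cannot be converted raises an exception here; a real load manager would more likely route such rows to a reject file for later inspection.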
E. Warehouse Manager
A warehouse manager is responsible for the warehouse management process. It consists of third-party system software, C programs,
and shell scripts.
The size and complexity of warehouse managers vary between specific solutions.
Note − A warehouse manager also analyzes query profiles to determine whether indexes and aggregations are appropriate.
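As a rough sketch of what analyzing query profiles might look like, the example below counts how often each grouping of columns appears in a query log and suggests the frequent ones as candidates for aggregation. The log format, the threshold, and the function name are assumptions; the source does not describe how a warehouse manager actually performs this analysis.

```python
from collections import Counter

# Hypothetical query profile: the grouping columns each logged query used.
query_log = [
    ("month", "region"),
    ("month", "region"),
    ("month",),
    ("month", "region"),
    ("product",),
]

def suggest_aggregations(log, min_count=2):
    """Return groupings requested at least `min_count` times, most frequent first."""
    counts = Counter(log)
    return [cols for cols, n in counts.most_common() if n >= min_count]
```

With this log, only the (month, region) grouping is requested often enough to be suggested.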
F. Query Manager
Assessment
1. What does a data warehouse offer? What advantages does it provide?
____________________________________________________________________________________________________
____________________________________________________________________________________________________
____________________________________________________________________________________________________
____________________________________________________________________________________________________
__________________________________________________________________________________.
VALUES INTEGRATION:
1. How will you manifest perseverance in understanding IM as a new subject in your course?
2. In creating group projects for Information Management how will you show cooperation with the team?
Prepared by:
JELINDA T. MACASUSI
Full-time Lecturer