Cape Notes Unit 2 Module 1 Content 1 3 2

Syllabus Focus: Unit 2 Module 1 Content 1 - 3
Specific Objective 1: Differentiate among the terms used in Information Management

Content: For example: field, records, tables, files, database, and database management system
What is Information Management?
Information Management (IM) is the collection and management of information from one or more sources
and the distribution of that information to one or more audiences, sometimes including those who have
a stake in, or a right to, that information. Management means the organization of, and control over, the
structure, processing and delivery of information. In short, Information Management entails organizing,
retrieving, acquiring and maintaining information.
http://en.wikipedia.org/wiki/Information_management
Information Management helps you deliver data that is integrated, accurate, and timely across the enterprise.
With Information Management, you can provide trusted data for initiatives like business transaction
processing, business intelligence, data warehousing, data migration, or master data management.
http://www.businessobjects.com/product/im/
KEY TERMS USED IN INFORMATION MANAGEMENT
• Fields: A field is a named unit of information. Each entry in a database can have multiple fields of multiple
types, e.g. a text field called 'favourite colour' in which you type your favourite shade, or a menu
called 'Parish' that lets you choose one parish from a list of the parishes that make up the country. By combining
several fields with appropriate names and types, you should be able to capture all the relevant information
about the items in your database.
• Records: In the context of a relational database, a row (also called a record or tuple) represents a
single, implicitly structured data item in a table. In simple terms, a database table can be thought of as
consisting of rows and columns (fields). Each row in a table represents a set of related data, and every
row in the table has the same structure.
• Tables: A table is a collection of records that are related to each other. Table information is normally
arranged in a logical manner. For example, a Person table in a database might contain the fields Person ID,
Surname, Christian Name, Street Address, Town, Date of Birth and Registration Fee. Together these related
fields describe every record in the table, so the table can be viewed as one picture of a set of records.
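To make the three terms concrete, here is a minimal sketch using Python's standard sqlite3 module and an in-memory database. The table and column names and the two sample people are hypothetical, adapted from the Person table described above: each declared column is a field, each inserted row is a record, and the collection of rows is the table.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Each column declared here is a FIELD (hypothetical names based on the text).
conn.execute("""
    CREATE TABLE Person (
        PersonID        INTEGER PRIMARY KEY,
        Surname         TEXT,
        ChristianName   TEXT,
        Town            TEXT,
        DateOfBirth     TEXT,
        RegistrationFee REAL
    )
""")

# Each tuple inserted is one RECORD (row); the values are invented examples.
people = [
    (1, "Brown", "Alicia", "Kingston", "2006-03-14", 1500.00),
    (2, "Campbell", "Dwayne", "Mandeville", "2005-11-02", 1500.00),
]
conn.executemany("INSERT INTO Person VALUES (?, ?, ?, ?, ?, ?)", people)

# The TABLE is the collection of all records; every row has the same fields.
cursor = conn.execute("SELECT * FROM Person")
field_names = [col[0] for col in cursor.description]
records = cursor.fetchall()

print(field_names)
for record in records:
    print(record)
```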
A database is a collection of data that you want to manage, rearrange and add to later. Database software is well
suited to managing lists that are not entirely numbers, such as addresses and phone numbers, inventories and
membership rosters. With a database you can sort the data by name, city, postal code or any other
individual item of information recorded. You can create forms to enter, update or just display the data,
and reports that show just the data you are interested in, such as members who owe dues.
CAPE NOTES Unit 2 Module 1 Content 1 - 3 1

Both spreadsheets and databases can be used to handle much the same information, but each is optimized
to handle a different kind of data most efficiently. The larger the number of records, the more important the
differences become.
Examples of databases: MS Access, dBase, FoxPro, Paradox, Approach, Oracle, Open Office Base.
Purpose: Managing data
Major Advantage: You can change the way data is sorted and displayed.
Features/Terms:
A flat database contains files, which contain records, which in turn contain fields.
A relational database contains tables which are linked together. Each table contains records which
contain fields. A query can filter your records to show just the ones that meet certain criteria or to
arrange them in a particular order.
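A query that filters and sorts can be sketched with the same sqlite3 module. The Member table, its fields and its rows are hypothetical; the WHERE clause keeps only the records that meet a criterion (members who owe dues), and ORDER BY arranges the result in a particular order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical membership roster.
conn.execute("CREATE TABLE Member (Name TEXT, City TEXT, DuesOwed REAL)")
conn.executemany(
    "INSERT INTO Member VALUES (?, ?, ?)",
    [
        ("Grant", "Kingston", 0.0),
        ("Ali", "Montego Bay", 250.0),
        ("Baptiste", "Kingston", 100.0),
    ],
)

# Filter: keep only members who owe dues. Sort: order the result by name.
owing = conn.execute(
    "SELECT Name, DuesOwed FROM Member WHERE DuesOwed > 0 ORDER BY Name"
).fetchall()

for name, amount in owing:
    print(name, amount)
```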
Database: A computer database is a structured collection of records or data that is stored in a computer
system. The structure is achieved by organizing the data according to a database model. The model in most
common use today is the relational model. Other models such as the hierarchical model and the network
model use a more explicit representation of relationships (see below for explanation of the various database
models).
A computer database relies upon software to organize the storage of data. This software is known as a
database management system (DBMS). Database management systems are categorized according to the
database model that they support. The model tends to determine the query languages that are available to
access the database. A great deal of the internal engineering of a DBMS, however, is independent of the
data model, and is concerned with managing factors such as performance, concurrency, integrity, and
recovery from hardware failures. In these areas there are large differences between products.
http://en.wikipedia.org/wiki/Database
Hierarchical model
In a hierarchical model, data is organized into an inverted tree-like structure. Each node can have multiple
downward links to describe the nesting, and a sort field keeps the records in each same-level list in a
particular order. This structure arranges the various data elements in a hierarchy and helps to establish logical
relationships among data elements of multiple files. Each unit in the model is a record, which is also known
as a node. In such a model, each record on one level can be related to multiple records on the next lower
level. A record that has subsidiary records is called a parent and the subsidiary records are called children.
Data elements in this model are well suited to one-to-many relationships with other data elements in the
database.
This model is advantageous when the data elements are inherently hierarchical. The disadvantage is that in
order to prepare the database it becomes necessary to identify the requisite groups of files that are to be
logically integrated. Hence, a hierarchical data model may not always be flexible enough to accommodate
the dynamic needs of an organization.
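A hierarchical database can be pictured as a tree in which every record stores a link to exactly one parent. The sketch below is a minimal in-memory illustration of that idea in Python, not how a real hierarchical DBMS stores data on disk; the record names are hypothetical, and the sort in `children()` stands in for the sort field mentioned above.

```python
# Hypothetical hierarchy: each record (node) stores exactly one parent.
# The root record has parent None.
parent_of = {
    "School": None,
    "Form 6": "School",
    "Form 5": "School",
    "Student A": "Form 6",
    "Student B": "Form 6",
    "Student C": "Form 5",
}

def children(node):
    """Return the child records of a node (one-to-many downward links), sorted."""
    return sorted(n for n, p in parent_of.items() if p == node)

def path_to_root(node):
    """Follow the single parent link upwards; the path is always unique."""
    path = [node]
    while parent_of[node] is not None:
        node = parent_of[node]
        path.append(node)
    return path

print(children("Form 6"))        # one parent, many children
print(path_to_root("Student C"))
```

Because each record has only one parent, a record such as a student who also belongs to a club would have to be stored twice, which is the inflexibility the paragraph above describes.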
Network model
In the network model, records can participate in any number of named relationships. Each relationship
associates a record of one type (called the owner) with multiple records of another type (called the
member). These relationships (somewhat confusingly) are called sets. For example, a student might be a
member of one set whose owner is the course they are studying, and a member of another set whose owner
is the college they belong to. At the same time the student might be the owner of a set of email addresses,
and owner of another set containing phone numbers. The main difference between the network model and
hierarchical model is that in a network model, a child can have a number of parents whereas in a hierarchical
model, a child can have only one parent. The hierarchical model is therefore a subset of the network model.
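The student example can be modelled as named sets, each linking an owner record to its member records. In this minimal Python sketch (the set and record names are hypothetical), each student appears in at most one occurrence of each set, yet Kerry ends up with two owners, a course and a college, which a strict hierarchy would not allow.

```python
# Each named set maps an owner record to its member records (hypothetical data).
sets = {
    "studies": {"Computer Science": ["Kerry", "Jordan"], "Mathematics": ["Sam"]},
    "belongs_to": {"North College": ["Kerry"], "South College": ["Jordan", "Sam"]},
}

def owners_of(member):
    """Return (set name, owner) pairs for every set the member belongs to."""
    result = []
    for set_name, occurrences in sets.items():
        for owner, members in occurrences.items():
            if member in members:
                result.append((set_name, owner))
    return result

# Kerry is a member of two sets, so the record has two "parents".
print(owners_of("Kerry"))
```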

Programmatic access to network databases is traditionally by means of a navigational data manipulation
language, in which programmers navigate from a current record to other related records using verbs such as
find owner, find next, and find prior. The most common example of such an interface is the COBOL-based
Data Manipulation Language defined by CODASYL.
Network databases are traditionally implemented by using chains of pointers between related records. These
pointers can be node numbers or disk addresses.
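Navigational access over pointer chains can be sketched in a few lines of Python. This is a hypothetical illustration only: the functions `find_first`, `find_next` and `find_owner` loosely mirror the idea of the CODASYL verbs but are not the actual CODASYL DML syntax, and the dictionary keys (record numbers) stand in for disk addresses.

```python
# Hypothetical record store: record numbers stand in for disk addresses.
# The owner points to its first member; each member points to the next
# member in the chain and back to its owner.
records = {
    100: {"type": "course", "name": "Computer Science", "first_member": 200},
    200: {"type": "student", "name": "Kerry", "next": 201, "owner": 100},
    201: {"type": "student", "name": "Jordan", "next": None, "owner": 100},
}

def find_first(owner_no):
    return records[owner_no]["first_member"]

def find_next(record_no):
    return records[record_no]["next"]

def find_owner(record_no):
    return records[record_no]["owner"]

# Navigate: visit every member of the course by following the pointer chain.
current = find_first(100)
visited = []
while current is not None:
    visited.append(records[current]["name"])
    current = find_next(current)

print(visited)
print(records[find_owner(201)]["name"])
```

Each call is a direct pointer lookup, which is why the model performed well; but a program written this way depends on the exact chain layout, which is the fragility described in the next paragraphs.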
The network model became popular because it provided considerable flexibility in modelling complex data
relationships, and also offered high performance by virtue of the fact that the access verbs used by
programmers mapped directly to pointer-following in the implementation.
However, the model had several disadvantages. Navigational programming proved error-prone as data
models became more complex, and small changes to the data structure could require changes to many
programs. Also, because of the use of physical pointers, operations such as database loading and
restructuring could be very time-consuming.
Relational model
The basic data structure of the relational model is a table, where information about a particular entity (say,
employees) is represented in columns and rows. The columns enumerate the various attributes of an entity
(e.g. employee_name, address, phone_number). Rows (also called records) represent instances of an entity
(e.g. specific employees).
The "relation" in "relational database" comes from the mathematical notion of relations from the field of set
theory. A relation is a set of tuples, so rows are sometimes called tuples. All tables in a relational database
adhere to three basic rules.
1. The ordering of columns is immaterial.
2. Identical rows are not allowed in a table.
3. Each row has a single (separate) value for each of its columns (each tuple has an atomic value).
If the same value occurs in two different records (from the same table or different tables), it can imply a
relationship between those records. Relationships between records are often categorized by their cardinality:
one-to-one (1:1), one-to-many (1:M) or many-to-many (M:M).
Tables can have a designated column or set of columns that act as a "key" to select rows from that table with
the same or similar key values. A "primary key" is a key that has a unique value for each row in the table.
Keys are commonly used to join or combine data from two or more tables. For example, an employee table
may contain a column named address which contains a value that matches the key of an address table. Keys
are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. It is not
necessary to define all the keys in advance; a column can be used as a key even if it was not originally
intended to be one.
http://www.smartcomputing.com/editorial/article.asp?article=articles%2F2000%2Fs1110%2F32s10%2F32s10.asp
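The employee/address example can be sketched with sqlite3. This is a minimal illustration under stated assumptions: the table and column names are hypothetical, and the employee's address column is called `AddressID` here so that its role as a matching key is obvious. The primary keys are unique per row, the join combines rows whose key values match, and the index on the matching column is the kind of structure that speeds up retrieval on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (
        AddressID INTEGER PRIMARY KEY,   -- primary key: unique for each row
        Street    TEXT,
        Town      TEXT
    );
    CREATE TABLE Employee (
        EmployeeID   INTEGER PRIMARY KEY,
        EmployeeName TEXT,
        AddressID    INTEGER REFERENCES Address(AddressID)  -- matches the Address key
    );
    -- An index on the matching column speeds up lookups and joins on large tables.
    CREATE INDEX idx_employee_address ON Employee(AddressID);

    INSERT INTO Address VALUES (1, '12 Hope Road', 'Kingston'), (2, '4 Main Street', 'Mandeville');
    INSERT INTO Employee VALUES (10, 'Alicia Brown', 2), (11, 'Dwayne Campbell', 1);
""")

# Join: combine rows from both tables where the key values match.
rows = conn.execute("""
    SELECT e.EmployeeName, a.Town
    FROM Employee AS e
    JOIN Address AS a ON e.AddressID = a.AddressID
    ORDER BY e.EmployeeName
""").fetchall()
print(rows)
```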
Database Management System

A DBMS is a complex set of software programs that controls the organization, storage, management, and
retrieval of data in a database. DBMSs are categorized according to their data structures or types. A DBMS is a set
of prewritten programs that are used to store, update and retrieve a database. A DBMS includes:
1. A modeling language to define the schema of each database hosted in the DBMS, according to the
DBMS data model.
o The four most common types of organization are the hierarchical, network, relational and object
models. Inverted lists and other methods are also used. A given database management system
may provide one or more of the four models. The optimal structure depends on the natural
organization of the application's data, and on the application's requirements (which include
transaction rate (speed), reliability, maintainability, scalability, and cost).
o The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of
purists who believe this model is a corruption of the relational model, since it violates several of
its fundamental principles for the sake of practicality and performance. Many DBMSs also
support the Open Database Connectivity API that supports a standard way for programmers to
access the DBMS.
2. Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored
on a permanent data storage device (which implies relatively slow access compared to volatile main
memory).
3. A database query language and report writer to allow users to interactively interrogate the database, analyze
its data and update it according to the user's privileges on data.
It also controls the security of the database.
Data security prevents unauthorized users from viewing or updating the database. Using passwords, users
are allowed access to the entire database or subsets of it called subschemas. For example, an employee
database can contain all the data about an individual employee, but one group of users may be authorized
to view only payroll data, while others are allowed access to only work history and medical data.
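A subschema can be approximated with an SQL view that exposes only some fields of a table. The sketch below uses sqlite3 with hypothetical table, view and column names. SQLite has no user accounts or passwords, so this only illustrates the "subset of the database" part; in a multi-user DBMS an administrator would then grant the payroll group access to the view rather than to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeID   INTEGER PRIMARY KEY,
        Name         TEXT,
        Salary       REAL,
        WorkHistory  TEXT,
        MedicalNotes TEXT
    );
    INSERT INTO Employee VALUES (1, 'Alicia Brown', 95000, 'Clerk 2019-2022', 'Asthma');

    -- A view exposing only payroll fields: one group's subset of the database.
    CREATE VIEW PayrollView AS SELECT EmployeeID, Name, Salary FROM Employee;
""")

cursor = conn.execute("SELECT * FROM PayrollView")
payroll_fields = [col[0] for col in cursor.description]
payroll_rows = cursor.fetchall()
print(payroll_fields)
print(payroll_rows)
```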
If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this
capability allows for managing personal databases. However, it may not leave an audit trail of actions or
provide the kinds of controls necessary in a multi-user organization. These controls are only available when
a set of application programs is customized for each data entry and updating function.
4. A transaction mechanism that ideally guarantees the ACID properties, in order to ensure data
integrity despite concurrent user accesses (concurrency control) and faults (fault tolerance).
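Atomicity, the "A" in ACID, can be demonstrated with sqlite3: either every statement in a transaction takes effect or none does. In this minimal sketch the Account table and the transfer function are hypothetical; a CHECK constraint makes an overdraft fail, and using the connection as a context manager rolls back the half-finished transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (ID TEXT PRIMARY KEY, Balance REAL CHECK (Balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()

def transfer(amount, src, dst):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE ID = ?", (amount, dst))
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE ID = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed, so the credit to dst was rolled back too

ok = transfer(500.0, "A", "B")  # account A only holds 100, so this must fail
balances = dict(conn.execute("SELECT ID, Balance FROM Account"))
print(ok, balances)
```

Without the transaction, account B would have been credited even though A could not be debited.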
It also maintains the integrity of the data in the database.
The DBMS can maintain the integrity of the database by not allowing more than one user to update the same
record at the same time. The DBMS can help prevent duplicate records via unique index constraints; for
example, no two customers with the same customer numbers (key fields) can be entered into the database.
See ACID properties and redundancy avoidance for more information.
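The unique index constraint described above can be sketched with sqlite3. The Customer table and the customer numbers are hypothetical; the second insert reuses an existing key value and is rejected with an IntegrityError, so only one record remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerNo TEXT, Name TEXT)")
# A unique index on the key field forbids two customers with the same number.
conn.execute("CREATE UNIQUE INDEX idx_customer_no ON Customer(CustomerNo)")

conn.execute("INSERT INTO Customer VALUES ('C001', 'Alicia Brown')")
try:
    conn.execute("INSERT INTO Customer VALUES ('C001', 'Dwayne Campbell')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)

count = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(count)
```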

The DBMS accepts requests for data from the application program and instructs the operating system to
transfer the appropriate data.
When a DBMS is used, information systems can be changed much more easily as the organization's
information requirements change. New categories of data can be added to the database without disruption
to the existing system.
Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto
another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems
design decisions are performed by data administrators and systems analysts. Detailed database design is
performed by database administrators.
Database servers are specially designed computers that hold the actual databases and run only the DBMS
and related software. Database servers are usually multiprocessor computers, with RAID disk arrays used
for stable storage. Connected to one or more servers via a high-speed channel, hardware database
accelerators are also used in large volume transaction processing environments.
DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private
multitasking kernel with built-in networking support, although nowadays these functions are left to the
operating system.
Useful links
http://en.wikipedia.org/wiki/Database
PART 2
Specific Objective 2: Explain how files and databases are used in organizations;
Content: Uses: including store, organize, search, retrieve; eliminate redundancies; data mining, data marts
and data warehouses.
Databases are used to hold information that is useful to an organization, and they can organize or arrange
data in ways that make the organization's responses to requests for data more efficient.
In organizing data, a database allows data to be stored in tables and manipulated with queries. Many
databases are used across organizations and throughout society.
Databases allow organizations to locate information quickly by specifying criteria with the database's querying
facilities, and to supply the requested information in a timely and organized manner.
With a database, users can retrieve records by specifying key terms as criteria, and these criteria are
powerful enough to extract exactly the data required.

Databases help to remove repeated data. Redundancy is an undesirable characteristic of a database: it
increases the storage required, and redundant copies of data can distort the data and affect the results the
database produces. A well-designed database removes these redundancy patterns.
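Removing redundancy usually means storing each fact once and referring to it by a key. This minimal Python sketch uses hypothetical order data: in the flat list a customer's name and address repeat on every order, while after the split each customer's details are stored once and orders refer to them by customer number, so an address change needs only one update.

```python
# Flat list of orders: the customer's name and address repeat on every order.
flat_orders = [
    {"order": 1, "cust_no": "C001", "name": "Alicia Brown", "address": "4 Main Street"},
    {"order": 2, "cust_no": "C001", "name": "Alicia Brown", "address": "4 Main Street"},
    {"order": 3, "cust_no": "C002", "name": "Dwayne Campbell", "address": "12 Hope Road"},
]

# Split into two tables: customer details stored once, orders refer to them by key.
customers = {}
orders = []
for row in flat_orders:
    customers[row["cust_no"]] = {"name": row["name"], "address": row["address"]}
    orders.append({"order": row["order"], "cust_no": row["cust_no"]})

print(len(customers), "customer records;", len(orders), "orders")
```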
Data Mining
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data
from different perspectives and summarizing it into useful information - information that can be used to
increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for
analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and
summarize the relationships identified. Technically, data mining is the process of finding correlations or
patterns among dozens of fields in large relational databases.
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local
buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also
tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping
on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they
purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly
discovered information in various ways to increase revenue. For example, they could move the beer display
closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays.
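The beer-and-diapers pattern is an association rule, and such rules are commonly scored with two measures: support (how often the items appear together) and confidence (how often the second item appears given the first). The notes do not say how the grocery chain's software measured the pattern, so this is a sketch of those two standard measures over a few made-up baskets, not the retailer's data.

```python
# Made-up shopping baskets for illustration only (not the grocery chain's data).
baskets = [
    {"diapers", "beer", "bread"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"bread", "milk"},
]

def support(items):
    """Fraction of all baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"diapers", "beer"}))
print(confidence({"diapers"}, {"beer"}))
```

Here two of the five baskets contain both items, and two of the three diaper baskets also contain beer, so the rule "diapers → beer" holds in about two thirds of the relevant baskets.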
Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are
accumulating vast and growing amounts of data in different formats and different databases. This includes:
• operational or transactional data, such as sales, cost, inventory, payroll and accounting
• nonoperational data, such as industry sales, forecast data and macroeconomic data
• metadata: data about the data itself, such as logical database design or data dictionary definitions
Information
The patterns, associations, or relationships among all this data can provide information. For example,
analysis of retail point of sale transaction data can yield information on which products are selling and when.
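Turning raw data into information can be as simple as summarizing it. This minimal sketch uses hypothetical point-of-sale records and Python's `collections.Counter` to total units sold per product per day, showing which products are selling and when.

```python
from collections import Counter

# Hypothetical point-of-sale records: (product, day of week, quantity).
sales = [
    ("bread", "Mon", 3),
    ("beer", "Sat", 6),
    ("bread", "Sat", 2),
    ("beer", "Sat", 4),
    ("milk", "Mon", 1),
]

# Summarize raw data into information: units sold per product per day.
units = Counter()
for product, day, qty in sales:
    units[(product, day)] += qty

for (product, day), total in units.most_common():
    print(product, day, total)
```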
Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example,
summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide
knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are
most susceptible to promotional efforts.

Data Warehouses
Dramatic advances in data capture, processing power, data transmission, and storage capabilities are
enabling organizations to integrate their various databases into data warehouses. Data warehousing is
defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a
relatively new term although the concept itself has been around for years. Data warehousing represents an
ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed
to maximize user access and analysis. Dramatic technological advances are making this vision a reality for
many companies. And, equally dramatic advances in data analysis software are allowing users to access this
data freely. The data analysis software is what supports data mining.
What can data mining do?
Data mining is primarily used today by companies with a strong consumer focus - retail, financial,
communication, and marketing organizations. It enables these companies to determine relationships among
"internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic
indicators, competition, and customer demographics. And, it enables them to determine the impact on sales,
customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary
information to view detail transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted
promotions based on an individual's purchase history. By mining demographic data from comment or
warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to
individual customers. American Express can suggest products to its cardholders based on analysis of their
monthly expenditures.
WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures
point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its
massive 7.5-terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their
products and perform data analyses. These suppliers use this data to identify customer buying patterns at
the store display level. They use this information to manage local store inventory and identify new
merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining application that can be used in
conjunction with image recordings of basketball games. The Advanced Scout software analyzes the
movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-
by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6,
1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and
made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it
differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the
jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video
footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense
and then finds Williams for an open jump shot.

How does data mining work?
While large-scale information technology has been evolving separate transaction and analytical systems,
data mining provides the link between the two. Data mining software analyzes relationships and patterns in
stored transaction data based on open-ended user queries. Several types of analytical software are available:
statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:
• Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain
could mine customer purchase data to determine when customers visit and what they typically order.
This information could be used to increase traffic by having daily specials.
• Clusters: Data items are grouped according to logical relationships or consumer preferences. For
example, data can be mined to identify market segments or consumer affinities.
• Associations: Data can be mined to identify associations. The beer-diaper example is an example of
associative mining.
• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an
outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a
consumer's purchase of sleeping bags and hiking shoes.
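Clustering can be illustrated with k-means, one common clustering technique (the notes do not name a specific algorithm). This minimal one-dimensional sketch groups hypothetical customers by average monthly spend into two segments: each value is assigned to its nearest centre, then each centre moves to the mean of its group, and the two steps repeat.

```python
# Hypothetical average monthly spend per customer.
spend = [20, 25, 30, 200, 210, 220, 35]

def kmeans_1d(values, centres, rounds=10):
    """Tiny k-means for one-dimensional data: assign each value to its nearest
    centre, then move each centre to the mean of its assigned values."""
    for _ in range(rounds):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Keep a centre unchanged if no value was assigned to it.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

centres, groups = kmeans_1d(spend, centres=[0.0, 100.0])
print(groups)
print(centres)
```

The two groups that emerge (low spenders and high spenders) are the kind of market segments the Clusters item refers to; real tools work on many fields at once rather than one.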
Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.
http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
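The extract, transform and load step, followed by a simple analysis and presentation, can be sketched end to end in Python. This is a simplified illustration under stated assumptions: the source rows and their DD/MM/YYYY date format are hypothetical, and the "warehouse" here is an ordinary in-memory SQLite table rather than the multidimensional database system the list mentions.

```python
import sqlite3

# Extract: raw transactions from a hypothetical source system (dates as DD/MM/YYYY).
source_rows = [
    ("01/03/2024", "Bread ", "3"),
    ("01/03/2024", "beer", "6"),
    ("02/03/2024", "BREAD", "2"),
]

# Transform: clean product names, convert quantities to integers and dates to ISO format.
def transform(row):
    date, product, qty = row
    day, month, year = date.split("/")
    return (f"{year}-{month}-{day}", product.strip().lower(), int(qty))

clean_rows = [transform(r) for r in source_rows]

# Load: store the cleaned rows in the (here, in-memory) warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE Sales (SaleDate TEXT, Product TEXT, Quantity INTEGER)")
warehouse.executemany("INSERT INTO Sales VALUES (?, ?, ?)", clean_rows)

# Analyze and present: total units per product as a simple text table.
report = warehouse.execute(
    "SELECT Product, SUM(Quantity) FROM Sales GROUP BY Product ORDER BY Product"
).fetchall()
for product, total in report:
    print(f"{product:<10}{total:>5}")
```

Without the transform step, "Bread " and "BREAD" would be counted as different products, which is why cleaning happens before loading.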
Issues raised by Data Mining
One of the key issues raised by data mining technology is not a business or technological one, but a social
one. It is the issue of individual privacy. Data mining makes it possible to analyze routine business
transactions and glean a significant amount of information about individuals' buying habits and preferences.
Another issue is that of data integrity. Clearly, data analysis can only be as good as the data that is being
analyzed. A key implementation challenge is integrating conflicting or redundant data from different
sources. For example, a bank may maintain credit card accounts on several different databases. The
addresses (or even the names) of a single cardholder may be different in each. Software must translate data
from one system to another and select the address most recently entered.
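The rule "select the address most recently entered" can be sketched directly. The copies below are hypothetical records of one cardholder held in different bank databases; note that the names and address formats also differ, which real reconciliation software would additionally have to match and translate.

```python
from datetime import date

# Hypothetical copies of one cardholder's record held in different bank databases.
copies = [
    {"source": "cards_db", "name": "A. Brown", "address": "4 Main Street", "entered": date(2021, 5, 1)},
    {"source": "loans_db", "name": "Alicia Brown", "address": "9 Hill Road", "entered": date(2023, 8, 15)},
    {"source": "branch_db", "name": "Alicia Brown", "address": "4 Main St.", "entered": date(2019, 1, 20)},
]

# Select the address most recently entered, as described in the text.
latest = max(copies, key=lambda c: c["entered"])
print(latest["address"], "from", latest["source"])
```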
A hotly debated technical issue is whether it is better to set up a relational database structure or a
multidimensional one. In a relational structure, data is stored in tables, permitting ad hoc queries. In a
multidimensional structure, on the other hand, sets of cubes are arranged in arrays, with subsets created
according to category. While multidimensional structures facilitate multidimensional data mining, relational
structures thus far have performed better in client/server environments. And, with the explosion of the
Internet, the world is becoming one big client/server environment.
Finally, there is the issue of cost. While system hardware costs have dropped dramatically within the past
five years, data mining and data warehousing tend to be self-reinforcing. The more powerful the data mining
queries, the greater the utility of the information being gleaned from the data, and the greater the pressure
to increase the amount of data being collected and maintained, which increases the pressure for faster, more
powerful data mining queries. This increases pressure for larger, faster systems, which are more expensive.
HISTORY OF DATA STORAGE
Specific Objective 3: Explain how data storage and retrieval have changed over time;
Content: Concept of the terms; history of storage devices; formats of data (from text-based to multimedia);
volumes to be stored; compression utilities; access method and speed.
Read up on History of Data Storage http://gadgets.fosfor.se/history-of-data-storage/
Formats of data: from text-based to audio-based, video-based and signal-based (multimedia) formats.
ACCESS METHODS
An access method defines the way processes read and write files. We study some of these below.
Sequential Access: This occurs when a number of records need to be read one after another.
For example, if a file's records are not sorted, then to find a particular record
the file must be searched from the beginning, reading each record in turn
until the wanted record is found.
Direct Access: This allows a user to position the read/write mark before reading or writing. This
feature is useful for applications, such as editors, that need to access the contents of a file randomly.
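The difference between the two access methods can be sketched with a file of fixed-length records, assuming (as many record files do, though the notes do not say so) that every record occupies the same number of bytes. Sequential access reads record after record until the key matches; direct access computes the record's byte offset and seeks straight to it. The record layout and keys are hypothetical, and `io.BytesIO` stands in for a real file on disk.

```python
import io

RECORD_SIZE = 16  # assumption: every record padded to the same fixed length

def make_record(key, name):
    """Pack a record as fixed-width text: 4-character key + 12-character name."""
    return f"{key:<4}{name:<12}".encode("ascii")

# A small in-memory "file" of fixed-length records (keys are not sorted).
data = b"".join(make_record(k, n) for k, n in [("K07", "Brown"), ("K02", "Campbell"), ("K05", "Grant")])
f = io.BytesIO(data)

def sequential_find(f, key):
    """Read each record in turn from the start until the key is found."""
    f.seek(0)
    reads = 0
    while True:
        rec = f.read(RECORD_SIZE)
        if not rec:
            return None, reads  # reached the end without finding the key
        reads += 1
        if rec[:4].decode("ascii").strip() == key:
            return rec[4:].decode("ascii").strip(), reads

def direct_read(f, record_number):
    """Position the read mark at the record's offset, then read just that record."""
    f.seek(record_number * RECORD_SIZE)
    rec = f.read(RECORD_SIZE)
    return rec[4:].decode("ascii").strip()

print(sequential_find(f, "K05"))  # had to read every record up to the last one
print(direct_read(f, 1))          # one seek, one read
```

The cost of a sequential search grows with the number of records, while a direct read costs one seek regardless of file size, which bears on the speed questions below.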
What are the advantages and disadvantages associated with each method of file storage?
What are the issues relating to the speed at which we access information?
