Information Management (Files and Databases)
SUMMARY NOTES
UNIT 2- MODULE 1
❖ Database management systems (DBMS) are collections of tools used to manage databases.
Four basic functions performed by all DBMS are:
➢ Create, modify, and delete data structures, e.g. tables
➢ Add, modify, and delete data – Data Manipulation
➢ Retrieve data selectively
➢ Generate reports based on data
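The four functions above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not a prescribed method: the customer table, its columns and the sample values are all made up for the example.

```python
import sqlite3

# In-memory database; table, columns and data are hypothetical.
conn = sqlite3.connect(":memory:")

# 1. Create a data structure (a table).
conn.execute(
    "CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, city TEXT, balance REAL)"
)

# 2. Add, modify, and delete data (data manipulation).
conn.executemany(
    "INSERT INTO customer (customer_no, city, balance) VALUES (?, ?, ?)",
    [(1, "Kingston", 120.0), (2, "Bridgetown", 0.0), (3, "Kingston", 75.5)],
)
conn.execute("UPDATE customer SET balance = 50.0 WHERE customer_no = 2")
conn.execute("DELETE FROM customer WHERE customer_no = 3")

# 3. Retrieve data selectively (only customers in one city).
kingston = conn.execute(
    "SELECT customer_no, balance FROM customer WHERE city = ?", ("Kingston",)
).fetchall()

# 4. Generate a simple report based on the data (one line per city).
report = conn.execute(
    "SELECT city, COUNT(*), SUM(balance) FROM customer GROUP BY city ORDER BY city"
).fetchall()
for city, count, total in report:
    print(f"{city}: {count} customer(s), total balance {total:.2f}")
```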
❖ Field – this is a single piece of information. It is an area within a record reserved for a specific
piece of data. Examples: customer number, street address, city, current balance etc.
❖ Record – A collection of values for all the fields pertaining to one entity, i.e. anything on which
data will be collected, e.g. person, product, company, transaction etc.
❖ Table/File – A collection of related records. E.g. employee table, product table, customer
table, student table, flight table etc. In a table, records are represented by rows and fields are
represented as columns. In a relational database a table may be referred to as a relation and a
row may be referred to as a tuple.
❖ Database – A collection of related tables. It can also include other objects, such as queries,
forms and reports. The structure of a database is the relationships between its tables.
❖ Entity – a person, place or thing on which data will be collected e.g. student, lecturer,
product, store.
❖ Attribute – a characteristic or property of an entity e.g. First Name, ID No, Product Code,
branch name.
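One way to picture how these terms fit together is the short Python sketch below. The Student class, its attribute names and the sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:          # entity: a student on whom data is collected
    id_no: int          # each attribute of the entity becomes a field
    first_name: str
    branch_name: str


# A record holds the field values for one entity.
record = Student(id_no=101, first_name="Ann", branch_name="Main")

# A table is a collection of related records: one row per record,
# one column per field.
student_table = [record, Student(102, "Ben", "North")]

# The column (field) names of the table.
column_names = [f.name for f in fields(Student)]
```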
❖ Organize: Databases hold information that is useful to an organization, and they can be used
to arrange data in a way that improves how efficiently the organization responds to requests
for data.
❖ Store: A computerized database is used to store data in tables.
❖ Search & Retrieve: Databases allow organizations to locate and retrieve information quickly
using given criteria, such as specific key terms.
❖ Eliminate Redundancies: Databases help to eliminate redundancies, thus removing
repetition of data.
Data Mining:
Generally, data mining is the process of analyzing data from different perspectives and
summarizing it into useful information: information that can be used to increase revenue, cut
costs, or both.
Data Mart: This is a small-scale version of a data warehouse. It is better suited to supporting
small organizations with very few departments. A data mart is often built and controlled by a
single department within an organization, e.g. technology, sales, finance, marketing etc.
Objective 3: Explain how Data Storage & Retrieval have changed over time.
EFFICIENCY
• Traditional file approach: Not a very efficient approach when speed, data quality, data handling and processing are looked at
• Database approach: Much more efficient approach than the traditional file approach
COST
• Traditional file approach: Cost of developing and maintaining higher; initial cost lower
• Database approach: Cost of developing and maintaining lower; initial cost higher
DATA QUALITY
• Traditional file approach: General quality encompasses completeness, validity, consistency, timeliness, accuracy; if these are good, quality is good; general quality is lower than the database approach
• Database approach: Quality is better than the traditional file approach
COMPLETENESS
• Traditional file approach: No way to ensure data is complete
• Database approach: Validations can be written to ensure that data is complete; fields that are primary keys must be present
VALIDITY
• Traditional file approach: No validation checks
• Database approach: Has validation checks; validations can be written
Master Files
This is a long-lived file that holds the data to be processed: descriptive data and the updated data
after a transaction is completed. It is a permanent file.
Transaction Files
As its name suggests, this file holds the transactions that are to be carried out on the master file. Basically, the data
on this file makes changes to the master file. It is not permanent but rather temporary.
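The relationship between the two files can be sketched as follows. This is a minimal illustration only: the account numbers, balances and the apply_transactions helper are hypothetical, and a real system would read and write files rather than in-memory structures.

```python
# Master file: long-lived records, here account number -> current balance.
master = {1001: 250.0, 1002: 40.0, 1003: 0.0}

# Transaction file: temporary list of changes to carry out on the master file.
transactions = [
    (1001, -50.0),  # withdrawal
    (1003, 20.0),   # deposit
    (1001, 10.0),   # deposit
]


def apply_transactions(master, transactions):
    """Return an updated copy of the master file after applying every transaction."""
    updated = dict(master)
    for key, amount in transactions:
        if key not in updated:
            raise KeyError(f"no master record for key {key}")
        updated[key] += amount
    return updated


new_master = apply_transactions(master, transactions)
transactions.clear()  # the transaction file is temporary and is discarded after processing
```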
File Organization
Serial
• collection of records
• no particular sequence
• cannot be used as master file
• used as temporary transaction file
• records stored in order received
Advantages:
• simple file design
• can be stored on inexpensive devices
Disadvantages:
• entire file must be processed even if single record must be accessed
• overall processing slow
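A rough sketch of serial organization, assuming records are simple dictionaries with a "key" entry (the names and data are made up for illustration). Records are appended in the order received, so finding one record means reading from the start of the file.

```python
serial_file = []  # records are kept in the order they were received


def append_record(file, record):
    """Add a record to the end of the serial file."""
    file.append(record)


def serial_search(file, key):
    """Scan from the start; return (record or None, number of records read)."""
    reads = 0
    for record in file:
        reads += 1
        if record["key"] == key:
            return record, reads
    return None, reads


for k, name in [(42, "Ann"), (7, "Ben"), (19, "Cy")]:
    append_record(serial_file, {"key": k, "name": name})

found, reads_for_hit = serial_search(serial_file, 19)    # last record: whole file read
missing, reads_for_miss = serial_search(serial_file, 99)  # absent key: whole file read
```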
Sequential
• A collection of records
• stored in key sequence
• adding/deleting record requires making new file
• used as master file
Advantages:
• simple file design
• very efficient when most of the records must be processed (like in payroll)
• very efficient when the data has a natural order
• can be stored on inexpensive devices like magnetic tape
Disadvantages:
• entire file must be processed even if single record is to be searched
• transactions have to be sorted before processing
• overall processing slow
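The points about sorting transactions first and making a new file for every addition or deletion can be sketched as a single merge pass. The function name, the (key, action, data) transaction layout and the sample records are assumptions for illustration, and the sketch assumes an "add" never uses a key that already exists.

```python
def sequential_update(old_master, transactions):
    """Merge add/delete transactions into a key-ordered master file.

    old_master: list of (key, data) tuples in ascending key order.
    transactions: list of (key, action, data), action is "add" or "delete".
    Returns a new master file; the old one is left unchanged.
    """
    # Transactions must be sorted into key order before processing.
    txns = sorted(transactions, key=lambda t: t[0])
    new_master = []
    i = 0
    for key, action, data in txns:
        # Copy across every old record that comes before this key.
        while i < len(old_master) and old_master[i][0] < key:
            new_master.append(old_master[i])
            i += 1
        if action == "add":
            new_master.append((key, data))
        elif action == "delete":
            if i < len(old_master) and old_master[i][0] == key:
                i += 1  # skip the record so it is not copied to the new file
    new_master.extend(old_master[i:])  # copy the remaining old records
    return new_master


old_master = [(10, "A"), (20, "B"), (30, "C")]
new_master = sequential_update(old_master, [(25, "add", "X"), (10, "delete", None)])
```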
Direct/Random
• Records are read directly from or written directly to the file
• the records are stored at known address
• address is calculated by applying a mathematical function to key field
• stored on a direct access backing storage medium (example: magnetic disk, CD, DVD)
• used in any information retrieval system (example: train timetable system)
Advantages:
• Any record can be directly accessed
• speed of record processing is very fast
• up-to-date file because of on-line updating
• concurrent processing is possible
Disadvantages:
• more complex than sequential
• does not make full use of memory locations
• more security and backup problems
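A minimal sketch of how an address might be calculated from the key field. The choice of function (key modulo the number of slots), the slot count and the sample records are assumptions for illustration; real systems also need a strategy for keys that map to the same address, which this sketch simply reports as an error.

```python
NUM_SLOTS = 7  # assumed number of record slots in the file


def address_of(key):
    """Mathematical function applied to the key field to get a slot address."""
    return key % NUM_SLOTS


slots = [None] * NUM_SLOTS  # the file: one slot per possible address


def write_record(key, data):
    """Write directly to the calculated address; return that address."""
    addr = address_of(key)
    if slots[addr] is not None and slots[addr][0] != key:
        raise ValueError(f"collision at address {addr}")
    slots[addr] = (key, data)
    return addr


def read_record(key):
    """Read directly from the calculated address without scanning the file."""
    rec = slots[address_of(key)]
    return rec[1] if rec is not None and rec[0] == key else None


write_record(1001, "Ann")
write_record(1005, "Ben")
write_record(23, "Cy")
```

Only three of the seven slots end up occupied here, which illustrates the point that direct organization does not make full use of its locations.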
Indexed Sequential
• each record of a file has a key field which uniquely identifies that record
• has an index which consists of keys and addresses
• an index sequential file is a sequential file which has an index
• a full index to a file is one in which there is an entry for every record
• used in applications where data needs to be accessed in either of two ways: randomly (using
the index) or sequentially
• file can be stored on a random access device (example: magnetic disks, CD, DVD)
Advantages:
• provides flexibility for users who need both types of access with the same file
• faster than sequential
Disadvantages:
• extra storage space required for index
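The two kinds of access can be sketched as below, assuming a full index held as a Python dictionary from key to record position. The record layout, sample data and function names are made up for illustration.

```python
import bisect

# Data records stored in key sequence (the sequential part of the file).
records = [(105, "Ann"), (210, "Ben"), (330, "Cy"), (415, "Dee")]

# Full index: one entry (key -> address) for every record.
# Here the "address" is just the record's position in the list.
index = {key: addr for addr, (key, _) in enumerate(records)}


def read_random(key):
    """Random access: look the key up in the index, then go straight to the record."""
    addr = index.get(key)
    return None if addr is None else records[addr]


def read_sequential(start_key):
    """Sequential access: yield records in key order from start_key onward."""
    keys = [k for k, _ in records]
    start = bisect.bisect_left(keys, start_key)
    yield from records[start:]


one = read_random(330)
rest = list(read_sequential(200))
```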
Database Types
Personal Database
• generally used by one person at a time
• can store information for an entire family
• things it can store: pictures, music, games etc.
Work Group
• shares information across a network
• has security and data integrity checks
• used by several people (2-25)
• greater capacity than personal
• provides backup
• must be maintained on regular basis
Departmental Database
• Work group on a bigger scale
• used by 25 - 100 people
• even greater capacity than work group
• provides backup
• maintenance important
Enterprise Database
• manages scope of whole organization
• backup and maintenance important
• used by over 100 people
• provides greatest storage capacity
• must have fast retrieval speed
Database Organization
Hierarchical
Advantages:
• more efficient than the flat file model because of reduced redundancy
Disadvantages:
• must be an expert to operate
• only caters for one-to-many relationships (between parent and child)
• changes in structure affect data access
• not very flexible
Network
• allows for each parent to have multiple children and each child to have multiple parents; hence
facilitating many-to-many relationships
• an improvement on hierarchical
Advantages:
• facilitates many-to-many relationships
• more flexible than hierarchical
Disadvantage:
• changes in structure affect data access
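The difference between the hierarchical and network models can be pictured with two small Python structures. The names used (Company, Sales, Project X and so on) are hypothetical, and this only models the parent-child links, not how either kind of DBMS stores data.

```python
# Hierarchical model: every child has exactly one parent (one-to-many).
hier_parent = {"Sales": "Company", "Finance": "Company", "Ann": "Sales"}

# Network model: a child may have several parents, and a parent several
# children, which gives many-to-many relationships.
net_parents = {
    "Ann": {"Sales", "Project X"},
    "Ben": {"Finance", "Project X"},
}


def children_of(parent):
    """Return every child linked to the given parent in the network model."""
    return sorted(child for child, parents in net_parents.items() if parent in parents)


project_members = children_of("Project X")
```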
Relational
• This model is defined as a model in which two or more linked tables are used to track
information. Data is stored in tables (more than one table).
• data represented in terms of tuples grouped into relations
• applies relations between tables
• facilitates all cardinalities (many-to-many etc.)
• has primary key that facilitates relationships
• table has a name that is distinct from all other tables in the database
• no duplicate rows; all rows are distinct
• entries in fields (columns) are atomic (no repeating groups or multi-valued attributes)
• each field has distinct name
• examples: MS SQL Server, Oracle, MySQL, MS Access
Advantages:
• avoid redundancies of information
• conceptual simplicity
• Structural Independence: Changes in structure do not affect the data access
• Design Implementation: Achieves both data independence and structural independence
• flexible: data can be manipulated by operators
• relations between tables ensure no ambiguity
• more efficient than the previous two
• can point to specific piece of data directly without going through another piece of data
• consistency is achieved by declaring constraints in database design
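Several of the points above (linked tables, primary keys, constraints declared in the design, avoiding redundancy) can be sketched with Python's built-in sqlite3 module. The branch and student tables, their columns and the sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# Two linked tables; the branch name is stored once, not repeated per student.
conn.execute(
    "CREATE TABLE branch (branch_code TEXT PRIMARY KEY, branch_name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE student (
        id_no       INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        branch_code TEXT NOT NULL REFERENCES branch(branch_code)
    )
    """
)
conn.execute("INSERT INTO branch VALUES ('M', 'Main')")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)", [(101, "Ann", "M"), (102, "Ben", "M")]
)

# The primary key constraint rejects a second row with the same id_no.
try:
    conn.execute("INSERT INTO student VALUES (101, 'Cy', 'M')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key constraint rejects a student whose branch does not exist.
try:
    conn.execute("INSERT INTO student VALUES (103, 'Dee', 'Z')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The relationship between the tables is followed with a join.
rows = conn.execute(
    """
    SELECT s.first_name, b.branch_name
    FROM student AS s JOIN branch AS b ON b.branch_code = s.branch_code
    ORDER BY s.id_no
    """
).fetchall()
```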
Object Oriented
Advantages:
Disadvantages:
Key