Introduction To Database Management System Work Plan
WORK PLAN
Introduction to Database Management System
Data: Facts that can be recorded and that have implicit meaning. Data takes
different forms: it may be string, numeric or graphic. Data are raw facts that have
not been processed or that can be used for further processing.
Example:
Name:
Level:
Photograph/Sign:
Age:
File Organisation
Sequential Files: Records are stored in a fixed sequence and can only be read in
that sequence, starting from the first record. In this type of file organisation,
records can only be added at the end of the file. Sequential file arrangement is not
efficient as file searching tends to be difficult and time consuming.
Indexed Files: This method uses an index to access records in a random fashion.
Here records are arranged in a predefined order. Records can be sorted according
to an attribute or preference (e.g. alphabetically, ascending, descending, etc.).
Indexed files are efficient and faster to access.
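The contrast between the two file organisations above can be sketched in code. This is a minimal illustration, not any particular DBMS; the account numbers and balances are taken from the sample relation later in these notes.

```python
# Sequential vs indexed access, sketched with an in-memory list of records.
records = [
    {"acc_no": 101, "balance": 750},
    {"acc_no": 203, "balance": 890},
    {"acc_no": 104, "balance": 560},
    {"acc_no": 110, "balance": 720},
]

def sequential_find(acc_no):
    # Sequential file: read records in stored order until a match is found.
    for rec in records:
        if rec["acc_no"] == acc_no:
            return rec
    return None

# Indexed file: a separate index maps each key straight to its record.
index = {rec["acc_no"]: rec for rec in records}

def indexed_find(acc_no):
    # One index lookup instead of a scan from the first record.
    return index.get(acc_no)

print(sequential_find(104))  # found after scanning three records
print(indexed_find(104))     # found with a single index lookup
```

The sequential search must, in the worst case, read every record, which is why searching sequential files is slow; the index trades extra storage for direct access.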
4. No Data Sharing: There should be concurrency for a faster response time; this
will also ensure accurate updates. This is not the case here. E.g. step in Admin
AL/2, step in Accounts AL/1
5. Data Integrity Problems: There is need to ensure that data is correct, consistent
and current.
6. Security Problems: Every user of the system should be able to access only the
data they are permitted to see. For example, payroll staff handle only employee
records and cannot see customer accounts.
Database Characteristics
Speed: How quickly data can be retrieved and other processing actions performed on the database.
Advantages of DBMS
Data Security: Security tools, such as password access and encryption, ensure that
data is not deliberately or accidentally changed or damaged. The improved security
ensures that data is not accessed without proper authorisation.
Reduced Data Redundancy: With the use of DBMS, there can only be a single
version of the truth. Reduced data redundancy ensures efficient data storage and
efficient time management of Hardware (CPU), programme(s), analyst(s) and
user(s). Relational DBMS use Normalisation to reduce data redundancy.
Data Sharing: DBMS allows different user groups to use the same data. This
helps to manage space and time efficiently.
Data Standardization: In general, data items have common names and storage
formats.
Application of DBMS
There are many people involved in the design, use and maintenance of any
database.
These people can be classified into two (2) distinct types: actors on the scene and
workers behind the scene.
1. Actors on the scene: The people whose jobs involve the day-to-day use of a
database are called "actors on the scene".
Database Designers: Database designers are responsible for identifying the data to
be stored in the database and for choosing appropriate structures to represent and
store this data.
End users: End users are the people whose jobs require access to the database for
querying, updating and generating reports.
System Analysts: These people determine the requirements of end users and
develop specifications for transactions.
Database Designers and Implementers: These are the people who design and
implement the DBMS modules and interfaces as a software package.
Tool Developers: These include persons who design and implement tools, consisting of
packages for design, performance monitoring, prototyping and test data
generation.
Assignment 1
DATA MODELS
In the study of databases, our interest is in data modelling. The role of data
modelling is to provide techniques that allow us to represent, by graphical and other
formal methods, the nature of data in real-world computer applications. The
principal justification for the use of data modelling is to provide a clearer
understanding of the underlying nature of information and the processing of
information in this area.
It describes data at the conceptual and view levels. It provides fairly flexible
structuring capabilities and allows one to specify data constraints explicitly. There
are over 30 such models, including the Entity-Relationship (E-R) model described below.
Entity: An entity is a distinguishable object that exists. Each entity has associated
with it a set of attributes describing it, e.g. number and balance for an account entity.
Another essential element of the E-R diagram is the mapping cardinalities, which
express the number of entities to which another entity can be associated via a
relationship set. The overall logical structure of a database can be expressed
graphically by an E-R diagram.
It describes data at the conceptual and view levels. These models are used to
specify the overall logical structure of the database and provide a higher-level
description of the implementation. They are called record-based because the
database is structured in fixed-format records of several types. Each record type
defines a fixed number of fields or attributes.
Each field is usually of a fixed length. Record-based models do not include
a mechanism for direct representation of code in the database. Separate languages
associated with the model are used to express database queries and updates. The
three most widely accepted models are:
Relational Model
Network Model
Hierarchical Model
Let us talk briefly about these three different models; the Relational Model will be
examined in more detail later.
Number   Balance
101      750
203      890
104      560
110      720
A simple relational database
The relational model does not use pointers or links, but relates records by the
values they contain. This allows a formal mathematical foundation to be defined.
In the network model, data are represented by collections of records, and relationships
among data are represented by links. The organisation of a network model is that of an
arbitrary graph.
[Diagram: a sample network database, with customer records linked to account records]
Abiodun (Iwaro Eriijiyan) -> accounts 101 (balance 750) and 203 (balance 890)
Kolawole (Monatan Ibadan) -> account 104 (balance 560)
Christiana (Basiri Ado-Ekiti) -> account 110 (balance 720)
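The difference between link-based and value-based relationships can be sketched briefly. This is an illustrative Python structure, not DBMS code; the customer and account figures echo the samples in these notes.

```python
# Network model: relationships are explicit links,
# modelled here as object references.
account_101 = {"number": 101, "balance": 750}
account_203 = {"number": 203, "balance": 890}
customer = {"name": "Abiodun", "accounts": [account_101, account_203]}  # links

# Relational model: no links; records are related
# only by the values they contain.
accounts = [
    {"owner": "Abiodun",  "number": 101, "balance": 750},
    {"owner": "Abiodun",  "number": 203, "balance": 890},
    {"owner": "Kolawole", "number": 104, "balance": 560},
]

# Relating by value: match on the owner column rather than follow a pointer.
abiodun_accounts = [a["number"] for a in accounts if a["owner"] == "Abiodun"]
print(abiodun_accounts)  # [101, 203]
```

Because the relational model relates records purely by value, it admits the formal mathematical foundation mentioned above; the network model instead navigates stored links.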
The Hierarchical Model
[Diagram: hierarchical model - Customer records linked to Account records in a tree]
Physical data models describe data at the lowest level; examples include the
unifying model and frame memory.
Data Abstraction
FUNCTIONAL DEPENDENCY
Note that for any value of engine size, say 1400, the mileage charge is always the
same (in this case 10p). Hence, the mileage charge is functionally dependent on the
engine size. We would write this as:
engine size -> mileage charge
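The dependency can be tested mechanically: an attribute is functionally dependent on another if each value of the determinant maps to exactly one value of the dependent attribute. The sample rows below are invented around the 1400 -> 10p example in the text.

```python
# Sample hire-charge rows; the charges other than 10p are illustrative.
rows = [
    {"engine_size": 1000, "mileage_charge": 8},
    {"engine_size": 1400, "mileage_charge": 10},
    {"engine_size": 1400, "mileage_charge": 10},  # repeat with same charge: OK
    {"engine_size": 2000, "mileage_charge": 14},
]

def functionally_dependent(rows, determinant, dependent):
    # True if every determinant value maps to a single dependent value.
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        if seen.setdefault(key, value) != value:
            return False  # same determinant, two different dependent values
    return True

print(functionally_dependent(rows, "engine_size", "mileage_charge"))  # True
```

If some row gave engine size 1400 a different charge, the function would return False, because the mileage charge would no longer be determined by the engine size alone.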
Entity Integrity
"Entity integrity is the principle that no part of a primary key can be null and
that no primary key value can be duplicated."
Referential Integrity
Referential integrity is concerned with the linkages between tables defined by the
foreign and primary key fields. For a set of database tables, all foreign key values
in all tables must be matched by a row in another table. A database for
which this is true is said to conform to referential integrity; otherwise, it does not.
Courses                    Students
Code   Title               Name    Course Code
B654   Economics           …       B654
C299   Computing           Smith   D333
In the first row of the Students table, the course code value B654 is valid
because it refers to an existing row in the Courses table (the Economics course).
However, the course code in the second row (D333) breaks referential integrity
because there is no corresponding row in the Courses table. In application terms, it
implies that the student Smith is on a non-existent course. If student Smith in the
above table were not enrolled on any course, this could be specified by means of a
null value in the course code column. This does not conflict with referential
integrity, since no reference is being made to a non-existent course.
Note that the C299 course code in the Courses table is not matched by any
value in the Students table; this is fine, however, since the course code is not a foreign
key in the Courses table. It simply implies that the Computing course currently has no
enrolled students.
The example shows the referential link between the two tables using the course
code. The effect of establishing this link is that the database software will not
permit any modification to the database that breaches this referential integrity
requirement. For instance, we cannot delete a row of the Courses table if some row
or rows of the Students table refer to it. This corresponds to the application-domain
restriction that you cannot cancel a course while there are still students on it.
However, most systems (including Microsoft Access) allow deletion of rows from
the primary table by use of a technique called cascaded delete. This has the effect
of deleting all dependent rows when a primary table record is deleted. For
instance, if the Courses table record referring to, say, course B654 is deleted,
all the rows in the Students table that refer to this course will also be deleted,
thereby preserving referential integrity. The admissibility of this technique will of
course depend on the design of the particular application. It is likely that the
correct approach in most applications would be to delete the dependent rows
individually before deleting the parent.
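These rules can be demonstrated with SQLite (standing in for the Microsoft Access example in the text). The table and column names are invented for illustration; the course codes and the student name Smith echo the example above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.execute("CREATE TABLE courses (code TEXT PRIMARY KEY, title TEXT)")
con.execute("""CREATE TABLE students (
    name TEXT,
    course_code TEXT REFERENCES courses(code) ON DELETE CASCADE)""")

con.execute("INSERT INTO courses VALUES ('B654', 'Economics')")
con.execute("INSERT INTO courses VALUES ('C299', 'Computing')")
con.execute("INSERT INTO students VALUES ('Smith', 'B654')")

# D333 has no row in courses, so this insert breaks referential integrity.
try:
    con.execute("INSERT INTO students VALUES ('Smith', 'D333')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("D333 rejected:", fk_rejected)

# A null course code makes no reference to any course, so it is accepted.
con.execute("INSERT INTO students VALUES ('Smith', NULL)")

# Cascaded delete: dropping B654 also deletes the students enrolled on it.
con.execute("DELETE FROM courses WHERE code = 'B654'")
names = [r[0] for r in con.execute("SELECT name FROM students")]
print(names)  # only the unenrolled (null course) row survives
```

The DBMS, not the application, rejects the dangling D333 reference and performs the cascade, which is exactly the "database software will not permit" behaviour described above.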
Normalisation - Assignment
Review Questions
Distinguish between the terms primary key, candidate key and foreign key
What does it mean to say that one attribute of a table is ‘functionally
dependent’ on another?
Explain the purpose of nulls in database tables and indicate why it is not
quite correct to talk of a null value.
Explain the concept of referential integrity and its importance in relational
database practice.
What is the essential nature of relational algebra?
What are the principal disadvantages of hierarchical and network databases?
What is meant by the term ‘Schema’?
NORMALIZATION
One of the principal objectives of relational databases is to ensure that each item of
data is only held once within the database. For instance, if we hold customers’
addresses then the address of any customer is only represented once throughout all
the tables of the application. In many cases, it is relatively easy to arrange the
tables to meet this objective. There is, however, a more formal procedure called
Normalization that can be followed to organise data into a standard format which
avoids many processing difficulties.
Purpose of Normalization
Put data into a form that conforms to relational principles e.g. single-valued
columns; each relation represents one ‘real world’ entity.
Avoid redundancy by storing each ‘fact’ within the database only once
Put the data into a form that is more able to accommodate change
Normalization involves checking that the tables conform to specific rules, and if
not, re-organising the data. This will mean creating new tables containing data
drawn from the original tables.
Normalization is a multi-stage process, the result of each stage being called
a NORMAL FORM; successive stages produce a greater and greater degree of
normalization. There are a total of seven normal forms, listed in increasing degree
and grouped for convenience of description:
First, second and third normal forms (abbreviated to 1NF, 2NF and 3NF)
Boyce-Codd normal form (BCNF) and fourth normal form (4NF)
Fifth normal form (5NF) and Domain/Key normal form (DK/NF)
The normal forms 1NF, 2NF and 3NF are the most important, and all practical
database applications would be expected to conform to these. The likelihood of a
set of tables requiring modification to comply with these is quite high.
The Boyce-Codd normal form is a more stringent form of 3NF and again
should be applied to a practical system. There is less chance of this normal form
affecting the structure of the tables.
The fourth and fifth normal forms are unlikely to be significant in a practical
system that has been designed, say, using the ER approach.
The highest normal form, the Domain/Key normal form, was devised by Fagin in
1981. Fagin proved that this normal form was the "last"; no higher form is possible
or necessary, since a relation in DK/NF can have no modification anomalies.
However, this is mostly of theoretical interest, since there is no known procedure
for converting to this form.
The first three normal forms are the most significant and are usually
sufficient for most applications.
The normalisation process assumes that you start with some informal description
of all the data attributes that the database application appears to require; this is
often called un-normalised data. This set of attributes is then tested using criteria
defined by each of the normalisation stages. If the data fails the criteria, there is a
prescribed procedure for correcting the structure of the data. This inevitably
involves the creation of additional tables.
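The effect of this decomposition can be sketched in code. The rows below are invented (they reuse course codes from the referential-integrity example), and the split shown is the general idea of normalisation rather than any one specific normal form.

```python
# Un-normalised rows: the course title is repeated for every enrolled student.
unnormalised = [
    {"student": "Abiodun",    "course_code": "B654", "course_title": "Economics"},
    {"student": "Kolawole",   "course_code": "B654", "course_title": "Economics"},
    {"student": "Christiana", "course_code": "C299", "course_title": "Computing"},
]

# Decompose into two tables so each fact is stored once: titles live only
# in `courses`; `enrolments` relates students to courses by code alone.
courses = {row["course_code"]: row["course_title"] for row in unnormalised}
enrolments = [(row["student"], row["course_code"]) for row in unnormalised]

print(courses)     # {'B654': 'Economics', 'C299': 'Computing'}
print(enrolments)  # (student, course_code) pairs

# Rejoining the new tables reconstructs the original information exactly.
rejoined = [
    {"student": s, "course_code": c, "course_title": courses[c]}
    for s, c in enrolments
]
assert rejoined == unnormalised
```

The final assertion is the key property: normalisation removes redundancy without losing information, because the original rows can always be recovered by joining the decomposed tables.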
The overall process of normalisation for the first three stages is summarised below.
The normalisation process and Normal Forms
Review questions
CHOICE OF DATABASE
In many cases we have no choice in the selection of a database product but where a
‘green-field’ situation exists there are a number of factors to be considered. In
assessing the wide range of database products that are currently available, it is
helpful to identify some major distinguishing features to simplify the process of
selection. A number of such features are given below.
1. Scale: This term refers to the maximum number of users and the maximum
disk space supported by the database. The system overall must be able to
cope with the expected loading. This can be limited by the host computer.
2. Performance: The number of transactions per hour that can be handled.
This of course is also heavily dependent on the computer system used, but it
is important that the database and the computer are matched in power.
3. Support for data types: For example, you may require support for storing
multimedia data. For specialised applications, you may need to hold high-precision
floating point values or 'long' integer values.
4. Connectivity: This term refers to support for accessing other databases or file
systems. There may be a need, for instance, to access some legacy system
that uses an older database package. Also, your database may need to
communicate with another system in another office. This is an area of
considerable activity in the database market at present fuelled by the current
interest in promoting commerce via national and international networks.
The client or ‘front end’ interfaces with the user; it is where application
programs execute and it provides facilities such as interactive querying,
report generation, transaction forms etc.
The server or ‘back-end’ provides the database engine that manages the
physical data and responds to queries sent from the client.
In modern systems, SQL plays a major role; SQL is used as the
communication language that enables the client to define the queries that are
sent to the server for execution. It is important to distinguish between an
SQL-based client-server system and a file server system.
In a file server, the user’s computer has its own DBMS but accesses
database files held on the server. Application programs request data via the
local DBMS, which transmits the requests to the server. The server resolves
the query and returns data blocks to the application program.
In an SQL server environment, the client application sends SQL queries to
the server, which responds with the query “answer”, in the form of rows of
data.
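The client/server SQL exchange can be sketched with SQLite standing in for a remote server (a simplification: SQLite is an embedded engine, not a networked one, but the query/answer pattern is the same). The account data echoes the sample relation earlier in these notes.

```python
import sqlite3

# "Server" side: the database engine managing the physical data.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE account (number INTEGER, balance INTEGER)")
server.executemany("INSERT INTO account VALUES (?, ?)",
                   [(101, 750), (203, 890), (104, 560), (110, 720)])

# "Client" side: submit an SQL query; the server resolves it and returns
# only the answer rows, not whole data blocks as a file server would.
rows = server.execute(
    "SELECT number, balance FROM account WHERE balance > 700 ORDER BY number"
).fetchall()
print(rows)  # [(101, 750), (110, 720), (203, 890)]
```

Note that only three matching rows cross the client/server boundary; in a file server arrangement, the blocks holding the whole table would be transferred and filtered locally.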
Three-tier Architecture
The three-tier (or three-layer) architecture is a client-server architecture in which
the user interface, the processing logic, and the data storage and access each reside on a
separate computing resource. This is shown below.
User Interface
Tier
Application or Logic
Tier
Database Tier
User interface tier- Enables user control and inputs and provides presentation of
results.
2. The computers used in the system can be optimised for the role they play. The
server machines can be configured with large memory and a fast disk system to
cope with their role as data providers.
3. It divides the processing burden between client and server; much of the
processing is performed locally; i.e. on the client system, near to where the data is
produced and required. This consequently minimises network activity needed to
service the application.
In a typical client-server system, a server may itself act as a client to use the
services of another server. If this principle is adopted with full generality, the
distinction between client and server computers disappears: any machine
could act as a server and any as a client. In effect, 'client' and 'server' would be roles
adopted by the computers as required by application systems. Such an arrangement
is called a peer-to-peer system and forms the basis of an ultimate distributed
processing model. In this model, all networked computers can offer services to,
and use the services of, the rest of the network.
Distributed Databases
Advantages:
(1) Improves system efficiency due to reduced network traffic and faster access to
local data. This is probably the main factor favouring distributed databases.
(3) Improves availability: If one site fails, the rest of the system can continue
operating (at least to the extent that the services of the failed site are required).
(4) Improves system capacity: By subdividing the network, the system capacity is
improved.
Disadvantages:
(1) Query processing: Data may have to be obtained from several sites, requiring
additional work in resolving and optimising the query.
Threats to Databases
The following are the major threats to the integrity and security of a database:
User errors: Database users can accidentally (or otherwise) enter erroneous
values, amend data incorrectly or delete valid data. Errors may also be generated
by mistakes occurring in the data collection process prior to data entry into the
computer.
Database Integrity: Entity, referential and data integrity. This is concerned with
techniques employed to preserve the ‘correctness’ of the database contents as it
undergoes updates. Once defined by the database designer, the DBMS is able to
manage subsequent preservation of those integrities.
Data Validation: Techniques used to ensure that data entering the database is
correct. This includes type checking, validation techniques, assertions and triggers.
Backups and Recovery: Describes methods used to ensure that the database
system can survive failures and other threats. This includes backups, transaction
logs and check-points.
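The data validation techniques mentioned above (type checking, assertions, triggers) can be demonstrated with SQLite. The table name, column names and the age range used are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A CHECK constraint: the DBMS rejects out-of-range values at entry time.
con.execute("""CREATE TABLE staff (
    name TEXT NOT NULL,
    age  INTEGER CHECK (age BETWEEN 16 AND 70))""")

# A trigger can enforce rules a simple CHECK cannot conveniently express.
con.execute("""CREATE TRIGGER no_blank_names
    BEFORE INSERT ON staff
    WHEN trim(NEW.name) = ''
    BEGIN SELECT RAISE(ABORT, 'blank name'); END""")

con.execute("INSERT INTO staff VALUES ('Kemisola', 25)")  # passes both checks

rejected = []
for row in [("Abiodun", 200), ("  ", 30)]:  # bad age, then blank name
    try:
        con.execute("INSERT INTO staff VALUES (?, ?)", row)
    except sqlite3.DatabaseError as e:  # constraint and trigger violations
        rejected.append(str(e))
print(rejected)  # two rejections; only the valid row was stored
```

Because the rules are declared once in the schema, the DBMS preserves them on every subsequent insert or update, regardless of which application program performs the entry.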
Use of Antivirus
Database 2
It describes part of the database for a particular group of users. There can be many
different views of a database; e.g. the bursary section of Afe Babalola University
gets only the name, level, account number and bank of each member of staff, and not
the staff profile or personal information. It describes how users see the data.
Note:
A view is also referred to as an external schema, the physical level as the physical
schema, and the conceptual level as the conceptual schema.
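The bursary example above maps directly onto a database view. The sketch below uses SQLite; the staff row and column values are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The full (conceptual-level) staff table, including private information.
con.execute("""CREATE TABLE staff (
    name TEXT, level TEXT, account_number TEXT, bank TEXT, profile TEXT)""")
con.execute("INSERT INTO staff VALUES "
            "('Abiodun', 'AL/2', '0012345', 'XBank', 'private notes')")

# The bursary's external schema: only the columns it is permitted to see.
con.execute("""CREATE VIEW bursary_view AS
    SELECT name, level, account_number, bank FROM staff""")

row = con.execute("SELECT * FROM bursary_view").fetchone()
print(row)  # the profile column is simply not visible through the view
```

Queries against `bursary_view` behave like queries against a table, but the underlying profile data never reaches that user group, which is the security benefit of views described earlier.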
1. Field
2. Records
3. Queries
4. Reports
ATTRIBUTES
The domain of an attribute is the range of possible values that the attribute can
have.
Some domains are very large (e.g. the set of all students’ names) while others can
be more precisely delineated (e.g. students of Kano origin in Abuja).
Data type refers to the classification of the data used for the attribute; common
types are text, number and date.
Types of Attributes
1. Simple attributes: The attributes that are not divisible are called simple or
atomic attributes. Example: Acc No, Sex, Age
2. Composite Attributes: These are attributes that can be divided into
smaller sub-parts, which represent more basic attributes with independent
meaning. They are useful for modelling situations in which a user sometimes
refers to the composite attribute as a unit and at other times refers specifically
to its components.
Example: Street Address can be divided into three simple attributes: Number,
Street and Apartment No.
3. Single-Valued Attributes: Attributes having a single value for a particular
entity are called 'single-valued attributes'.
Example: Age
4. Multi-Valued Attributes: Attributes with a set of values for the same entity are
called 'multi-valued attributes'.
Example:
Schools Attended: One person may attend only one school, while others
may attend two, three or more. A multi-valued attribute may have lower and upper
bounds on the number of values allowed for each individual entity.
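The three kinds of attribute can be pictured as one Python structure. The attribute names follow the examples in the text; the concrete values (street name, school names) are invented.

```python
# One entity with simple, composite and multi-valued attributes.
person = {
    "age": 21,                        # simple (atomic) attribute
    "street_address": {               # composite attribute: divisible sub-parts
        "number": "15",
        "street": "Iworoko Road",     # invented street for illustration
        "apartment_no": "3B",
    },
    "schools_attended": [             # multi-valued attribute: a set of values
        "Christ School Ado-Ekiti",    # invented school names
        "Afe Babalola University",
    ],
}

# The composite attribute can be used as a unit or by its components.
print(person["street_address"])             # as a unit
print(person["street_address"]["street"])   # by component
print(len(person["schools_attended"]))      # 2 values for one entity
```

A relational DBMS would not store the structure this way, since columns must be single-valued; normalisation, discussed earlier, is precisely how such composite and multi-valued attributes are flattened into separate columns and tables.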
i. Procedural: The user specifies what data is needed and how to get it.
ii. Non procedural: The user only specifies what data is needed.
In order to enable the setting up, maintaining and accessing of a database, a DBMS
would provide a range of features serving the needs of the application system
developers and the end-users. A practical database package would typically
provide facilities for:
The description of the data held in the database: the position, size and type of each
component field of each database record type.
[Diagram: the Account relation]
The invention of the relational database is largely attributed to Edgar Codd, who
presented a paper on the subject in 1970. The relational database has an
underlying mathematical basis which has been important in its development and
acceptance. The simplest view of relational databases is that they represent the
application data as 'two-dimensional' tables. Using file processing terminology,
the columns of the table represent component fields of the record, and each row
details an instance of a record. An example is given below.
Note that, as conventionally drawn, each column of the table has a heading, such as
Name or Level, that specifies the content of the column. The heading is part of the
schema, the stored description of the data, and not part of the data itself. This
conceptually simple tabular arrangement can be readily understood by most people,
and this simplicity is an important factor in the popularity of relational databases.
In particular, its ease of understanding for simple tasks has made the relational
database the most preferred for use by 'end users' in addition to
professional system developers.
The major advantages of the relational model over the older data models are:
1. It is simple and elegant.
2. It is well designed and sophisticated.
3. It is neat.
4. The relationships between stored data can be viewed at a glance, which allows
easy comparison.
Basic set principles: The relational model is based on the mathematical theory of
sets.
A set can be viewed as a collection of zero or more items of similar type. For
instance, a set of numbers could be {30, 3, 97, 1064}; another set of numbers
could be {1}. A set of people could be {Abiodun, Kolawole, Christianah,
Kemisola}. An empty set is valid and shown as { }. The theory is not
concerned with the nature of the contained items, only with the common
properties of such sets and the ways in which sets can be processed
mathematically. For instance, we can define operators on sets, such as
'union', which combines two sets into one: the union of the sets {30, 3, 97, 1064}
and {1} is {30, 3, 97, 1064, 1}.
The three most important characteristics of sets for our purposes are:
These simple set principles can be used to develop the concept of a relation, as
described below.
Relations
Given two sets X and Y, we can take any element x from X and any element y from Y
to form an ordered pair (x, y). The set of all ordered pairs is called the product set
and is denoted by X.Y.
e.g. X = {1, 2}, Y = {A, B, C}
then X.Y = {1A, 1B, 1C, 2A, 2B, 2C}
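The product set in the example can be computed directly with `itertools.product`, writing each pair as a tuple rather than the shorthand 1A:

```python
from itertools import product

# The sets X and Y from the example above.
X = {1, 2}
Y = {"A", "B", "C"}

# X.Y: every ordered pair (x, y) with x from X and y from Y.
product_set = {(x, y) for x, y in product(X, Y)}
print(sorted(product_set))
# [(1, 'A'), (1, 'B'), (1, 'C'), (2, 'A'), (2, 'B'), (2, 'C')]

# Any subset of X.Y is a relation R(X, Y); here is one possible mapping.
R = {(1, "A"), (2, "C")}
assert R <= product_set  # R is a relation because it is a subset of X.Y
```

The subset test at the end is exactly the definition used in the text: a relation is nothing more than some chosen subset of the product set.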
A subset of X.Y is called a relation and can be denoted R(X, Y). A relation can be
considered as a mapping from one set to another and given a functional name.
The set X could be mapped into a number of other sets in addition to Y; for
instance, Y1 = {Clerk, Accountant, Manager, Salesman}.
A relation can now be viewed as a set of tuples, shown in the table below.
A set of Tuples
Properties of a Relation
1. Columns in the relation are all single-valued; i.e. arrays of values or other
compound structures are not permitted.