Lavy

Traditional System
In the traditional approach, information is stored in flat files which are maintained by the file system of OS. Application programs go through the file system to access these flat files.
Problems: Traditional Approach

Data Security Data Redundancy Data Isolation Program / Data Dependence Lack of Flexibility Concurrent Access Anomalies
DATA BASE MANAGEMENT SYSTEM

DBMS consists of collection of interrelated files and set of programs to access and modify files Primary goal is to provide a convenient and efficient way to store, retrieve and modify information Layers of abstraction between the application programs and the file system
Services provided by a DBMS Data management Data definition Transaction support Concurrency control Recovery Security and integrity Utilities- facilities like data import & export, user management, backup, performance analysis, logging & audit, physical storage control
Where does the DBMS Fit In

Now, the DBMS acts as a layer of abstraction on top of the File system
For interacting with the DBMS we would be using a Query language called SQL.
Difference Between the File and DBMS Operations
Three Layer Architecture
Detailed Architecture
The External view is how the Customer views it.
The Conceptual view is how the DBA views it.
The Internal view is how the data is actually stored.
Example
Users of DBMS
Database Administrator(DBA) Managing information contents Liaison with users Enforcing security and integrity rules Strategizing backup and recovery Monitoring performance Database Designers Application Programmers End Users
Data models
A conceptual tool is used to describe: Data Data Independence Data semantics Consistency constraints
Object Based logical model

Entity Relationship model
Record Based logical model

Hierarchical data model Network data model Relational data model
Relational Model Basics
Keys in relational model
Database Design Techniques
ER modeling
Properties
Attributes Attribute types(Simple, composite, derived) Degree of a Relationship (unary, binary, ternary) Cardinality (1:1,1:M, M:M, M:1) Relationship Participation (Partial or Total)
What is Normalization
Normalization
Normalization:
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
Normal form:
Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect The database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF)
If a relation schema has more than one key, each is called a candidate key.
One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributethat is, it is not a member of any candidate key.
First Normal Form

Disallows composite attributes multivalued attributes nested relations; attributes whose values for an individual tuple are non-atomic
Second Normal Form

Uses the concepts of FDs, primary key
Prime attribute: An attribute that is member of the primary key K

Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more Examples: {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key R can be decomposed into 2NF relations via the process of 2NF normalization
Third Normal Form

Definition:
Transitive functional dependency: a FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
Examples: SSN -> DMGRSSN is a transitive FD Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold SSN -> ENAME is non-transitive Since there is no set of attributes X where SSN -> X and X -> ENAME
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency . E.g., Consider EMP (SSN, Emp#, Salary ). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Normal Forms Defined Informally

1st
normal form
normal form normal form
All attributes depend on the key
2nd
All attributes depend on the whole key

All attributes depend on nothing but the key
3rd
General Normal Form Definitions (For Multiple Keys)

The
above definitions consider the primary key only The following more general definitions take into account relations with multiple candidate keys A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R
Superkey of relation schema R - a set of attributes S of R that contains a key of R A relation schema R is in third normal form (3NF) if whenever a FD X -> A holds in R, then either: (a) X is a superkey of R, or (b) A is a prime attribute of R
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF There exist relations that are in 3NF but not in BCNF The goal is to have each relation in BCNF (or 3NF)
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH: fd1: { student, course} -> instructor fd2: instructor -> course {student, course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 10.12 (b). So this relation is in 3NF but not in BCNF A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Three possible decompositions for relation TEACH {student, instructor} and {student, course} {course, instructor } and {course, student} {instructor, course } and {instructor, student}
All three decompositions will lose fd1. We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition.
File Organization & Indexing
How
DBMS physically organizes data Different file organizations or access methods What is Indexing? Different indexing methods How to create indexes using SQL
DBMS has to store data somewhere Choices: Main memory Expensive compared to secondary and tertiary storage Fast in memory operations are fast Volatile not possible to save data from one run to its next Used for storing current data Secondary storage (hard disk) Less expensive compared to main memory Slower compared to main memory, faster compared to tapes Persistent data from one run can be saved to the disk to be used in the next run Used for storing the database Tertiary storage (tapes) Cheapest Slowest sequential data access Used for data archives
DBMS stores data on hard disks
This means that data needs to be

read from the hard disk into memory (RAM) Written from the memory onto the hard disk
Because I/O disk operations are slow query performance depends upon how data is stored on hard disks The lowest component of the DBMS performs storage management activities Other DBMS components need not know how these low level activities are performed
Basics of Data storage on hard disk

A
disk is organized into a number of blocks or pages A page is the unit of exchange between the disk and the main memory A collection of pages is known as a file DBMS stores data in one or more files on the hard disk
Database Tables on Hard Disk
Database tables are made up of one or more tuples (rows) Each tuple has one or more attributes One or more tuples from a table are written into a page on the hard disk

Larger tuples may need more than one page! Tuples on the disk are known as records Records are separated by record delimiter Attributes on the hard disk are known as fields Fields are separated by field delimiter
File Organization
The physical arrangement of data in a file into records and pages on the disk File organization determines the set of access methods for Storing and retrieving records from a file Therefore, file organization synonymous with access method We study three types of file organization Unordered or Heap files Ordered or sequential files Hash files We examine each of them in terms of the operations we perform on the database Insert a new record Search for a record (or update a record) Delete a record
Unordered Or Heap File

Records are stored in the same order in which they are created Insert operation
Fast because the incoming record is written at the end of the last page of the file Slow because linear search is performed on pages Slow because the record to be deleted is first searched for Deleting the record creates a hole in the page Periodic file compacting work required to reclaim the wasted space
Search (or update) operation
Delete Operation

Ordered or Sequential File
Records are sorted on the values of one or more fields Ordering field the field on which the records are sorted Ordering key the key of the file when it is used for record sorting Search (or update) Operation Fast because binary search is performed on sorted records Update the ordering field? Delete Operation Fast because searching the record is fast Periodic file compacting work is, of course, required Insert Operation Poor because if we insert the new record in the correct position we need to shift all the subsequent records in the file Alternatively an overflow file is created which contains all the new records as a heap Periodically overflow file is merged with the main file If overflow file is created search and delete operations for records in the overflow file have to be linear!
Hash File
Is an array of buckets Given a record, r a hash function, h(r) computes the index of the bucket in which record r belongs h uses one or more fields in the record called hash fields Hash key - the key of the file when it is used by the hash function Example hash function Assume that the staff last name is used as the hash field Assume also that the hash file size is 26 buckets - each bucket corresponding to each of the letters from the alphabet Then a hash function can be defined which computes the bucket address (index) based on the first letter in the last name.
Hash File (2)
Insert Operation
Fast because the hash function computes the index of the bucket to which the record belongs
If that bucket is full you go to the next free one
Search Operation
Fast because the hash function computes the index of the bucket
Performance may degrade if the record is not found in the bucket suggested by hash function
Delete Operation
Fast once again for the same reason of hashing function being able to locate the record quick
Dept. of Computing Science, University of Aberdeen
49
Indexing
Can we do anything else to improve query performance other than selecting a good file organization? Yes, the answer lies in indexing Index - a data structure that allows the DBMS to locate particular records in a file more quickly Very similar to the index at the end of a book to locate various topics covered in the book Types of Index Primary index one primary index per file Clustering index one clustering index per file data file is ordered on a non-key field and the index file is built on that non-key field Secondary index many secondary indexes per file Sparse index has only some of the search key values in the file Dense index has an index corresponding to every search key value in the file
Primary Indexes
The data file is sequentially ordered on the key field Index file stores all (dense) or some (sparse) values of the key field and the page number of the data file in which the corresponding record is stored
B002 B003 B004 B005 1 1 2 2
Branch B002 record Branch B003 record Branch B004 record Branch B005 record Branch B007 record
Branch
BranchNo B002 B003 Street 56 Clover Dr 163 Main St 32 Manse Rd 22 Deer Rd 16 Argyll St City London Glasgow Bristol London Aberdeen Postcode NW10 6EU G11 9QX BS99 1NZ SW1 4EH AB2 3SU
2
3 4
B004 B005 B007
B007
Indexed Sequential Access Method

Indexed sequential access method is based on primary index Default access method or table type in MySQL, MyISAM is an extension of ISAM Insert and delete operations disturb the sorting
ISAM
You need an overflow file which periodically needs to be merged with the main file
Secondary Indexes
An index file that uses a non primary field as an index e.g. City field in the branch table They improve the performance of queries that use attributes other than the primary key You can use a separate index for every attribute you wish to use in the WHERE clause of your select query But there is the overhead of maintaining a large number of these indexes
Creating indexes in SQL

You
can create an index for every table you create in SQL For example
CREATE INDEX branchNoIndex on branch(branchNo); CREATE INDEX numberCityIndex on branch(branchNo,city); DROP INDEX branchNoIndex;
File
organization or access method determines the performance of search, insert and delete operations.
Access methods are the primary means to achieve improved performance
Index
structures help to improve the performance further

Lavy

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lavy

Uploaded by

Copyright:

Available Formats

Traditional System

Problems: Traditional Approach

DATA BASE MANAGEMENT SYSTEM

Where does the DBMS Fit In

Difference Between the File and DBMS Operations

Three Layer Architecture

The Conceptual view is how the DBA views it.

The Internal view is how the data is actually stored.

Object Based logical model

Record Based logical model

Relational Model Basics

Keys in relational model

Database Design Techniques

First Normal Form

Second Normal Form

Prime attribute: An attribute that is member of the primary key K

Third Normal Form

Normal Forms Defined Informally

All attributes depend on the key

All attributes depend on the whole key

General Normal Form Definitions (For Multiple Keys)

BCNF (Boyce-Codd Normal Form)

Achieving the BCNF by Decomposition

File Organization & Indexing

DBMS stores data on hard disks

This means that data needs to be

Basics of Data storage on hard disk

Database Tables on Hard Disk

Unordered Or Heap File

Search (or update) operation

Ordered or Sequential File

Hash File (2)

Dept. of Computing Science, University of Aberdeen

B004 B005 B007

Indexed Sequential Access Method

Creating indexes in SQL

Access methods are the primary means to achieve improved performance

structures help to improve the performance further

You might also like