
Database System Concepts and Architecture
Data Models
A data model is a collection of concepts that can be used to describe the structure of a database (data types, relationships, and constraints).
Most data models also include basic operations (retrieval and updates).
Some also let the designer specify the dynamic aspect, or behavior, of a database application through user-defined operations. Example: COMPUTE_GPA, which can be applied to a STUDENT object.
Jan 29, 2002
Categories of Data Models

High-level or conceptual data models: aimed at common users.
Low-level or physical data models: describe the details of how data is stored.
Representational (or implementation) data models: sit in between and can serve both categories above.
Conceptual Data Model
Uses concepts such as:
Entities: a real-world object or concept (e.g., DEPT, COURSE).
Attributes: a property of interest that further describes an entity (e.g., dept no, name, telephone).
Relationships: an interaction among the entities (e.g., DEPT provides COURSE).
Physical Data Model

Describes how data is stored in the computer. It represents information such as:
record formats
record orderings
access paths: structures that make search more efficient
Representational Data Model

Used in traditional commercial DBMSs. They include:
Relational data model
Network model
Hierarchical model
Schemas
A schema is the description of the database (not the database itself).
It is specified during database design and is not expected to change frequently.
A displayed schema is called a schema diagram (Fig 2.1).
Each object in the schema, such as STUDENT or COURSE, is a schema construct.
A schema diagram represents only some aspects of a schema (names of record types and data elements, and some types of constraints).
Instances and Database State
The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database.
When we define a new database, its database state is the empty state (only the schema is specified to the DBMS).
The initial state is reached when the database is first populated.
From then on, at any point in time, the database has a current state.
Schema evolution: the changes made when we need to change the schema.
The Three-Schema Architecture
Important characteristics of the DB approach:
insulation of programs and data
support of multiple user views
use of a catalog to store the database description (schema)
The aim of the three-schema architecture is to separate the user applications from the physical database. Schemas can be defined at three levels:
The internal level has an internal schema, which describes the physical storage structure of the database. It uses a physical data model.
The Three-Schema Architecture

The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. It hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. A high-level data model or an implementation data model can be used at this level.
The external or view level includes a number of external schemas or user views. Each describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. A high-level data model or an implementation data model can be used at this level.
Data Independence
Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
Logical data independence: the capacity to change the conceptual schema without having to change external schemas or application programs.
Physical data independence: the capacity to change the internal schema without having to change the conceptual (or external) schemas.
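A minimal sketch of physical data independence, using Python's built-in sqlite3 module as a stand-in DBMS. The table contents and the index name idx_students_age are illustrative assumptions, not part of the slides. Adding an index changes only the internal (physical) level; the query text and its answer stay the same.

```python
import sqlite3

def run_query(conn):
    # The application-level query never mentions files or indexes.
    return conn.execute(
        "SELECT name FROM Students WHERE age = 18 ORDER BY name"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("53666", "Jones", 18), ("53688", "Smith", 18), ("53650", "Smith", 19)],
)

before = run_query(conn)
# Change only the physical level: add an access path on age (hypothetical name).
conn.execute("CREATE INDEX idx_students_age ON Students(age)")
after = run_query(conn)

print(before == after)
```

The same idea applies to any change of storage structure: if the conceptual schema is untouched, programs written against it should not need to change.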
DBMS Languages
Data definition language (DDL): specifies the conceptual and internal schemas for the database and any mappings between the two.
Storage definition language (SDL): specifies the internal schema when there is a clear distinction between the conceptual and internal schemas.
View definition language (VDL): specifies user views and their mappings to the conceptual schema.
Data manipulation language (DML): used for retrieval, insertion, deletion, and modification of the data.
DBMS Languages ..
SQL, the relational database language, represents a combination of DDL, VDL, and DML, as well as statements for constraint specification and schema evolution.
There are two main types of DMLs:
A high-level or nonprocedural DML: specifies complex DB operations concisely. Example: SQL (set-at-a-time).
A low-level or procedural DML: retrieves individual records or objects from the DB and processes each separately (record-at-a-time).
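The difference between the two DML styles can be sketched with Python's built-in sqlite3 module. The table and its rows are illustrative assumptions. The loop only imitates a record-at-a-time style on top of a cursor; a genuinely procedural DML would use the navigation calls of its own DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("53666", "Jones", 3.4), ("53688", "Smith", 3.2), ("53650", "Smith", 3.8)],
)

# Nonprocedural, set-at-a-time: one statement describes the whole result.
set_result = conn.execute(
    "SELECT COUNT(*) FROM Students WHERE gpa > 3.3"
).fetchone()[0]

# Procedural, record-at-a-time: the program fetches each record
# and processes it separately.
record_result = 0
for sid, name, gpa in conn.execute("SELECT sid, name, gpa FROM Students"):
    if gpa > 3.3:
        record_result += 1

print(set_result, record_result)
```

Both styles give the same count; the set-at-a-time form leaves the choice of how to scan the data to the DBMS.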
DBMS Interfaces

Menu-Based Interfaces for Browsing
Menus lead the user through the formulation of a request.
Forms-Based Interfaces
Display a form to each user (e.g., for insert or select); designed for naive users.
Graphical User Interfaces (GUI)
Display the schema as a diagram; utilize both menus and forms.
DBMS Interfaces
Natural Language Interfaces
Accept requests in a natural language and attempt to understand them. They refer to words in the schema, and to a set of standard words, to interpret the request.
Interfaces for Parametric Users (e.g., tellers)
The goal is to minimize the number of keystrokes required (e.g., by using function keys).
Interfaces for the DBA
Commands for creating accounts, setting system privileges, changing the schema, etc.
The Database System Environment
DBMS Component Modules (fig 2.3)

The DB and the DBMS catalog are stored on disk; access to the disk is controlled by the OS.
Stored data manager (SDM): controls access to DBMS information stored on disk and puts data in buffers in main memory.
DDL compiler: processes schema definitions and stores them in the catalog (metadata).
Run-time database processor: handles DB accesses at run time; it receives retrieval or update operations and carries them out on the DB.
Query compiler: handles high-level queries; it parses, analyzes, and interprets each query and generates DB access code.
Precompiler: extracts DML commands from an application program.
Database System Utilities

Loading: loads existing files into the DB.
Backup: creates a backup copy of the DB.
File reorganization: reorganizes files for better performance.
Performance monitoring: monitors DB usage and provides statistics to the DBA.
Tools, Application Environments & Communications Facilities

CASE tools: used in the design phase.
Data (information) repository: stores catalog information, design decisions, usage, application program descriptions, and user information.
Application development environments (e.g., PowerBuilder): help in developing DB designs, GUIs, queries, updates, etc.
Communications software: allows remote users to access the DB.
Classification of Database Management Systems
Data model: relational, object, object-relational, hierarchical, network, and other.
Number of users supported by the system: single-user systems and multiuser systems.
Number of sites over which the database is distributed: centralized DBMS; distributed DBMS (DDBMS); homogeneous DDBMSs; federated DBMS (software is developed to access several autonomous, preexisting databases stored under heterogeneous DBMSs).
Classification of Database Management Systems ..
Cost of the DBMS: 10K-100K; single-user: 100-3K.
General-purpose vs. special-purpose: a special-purpose DBMS is used when performance is a primary consideration. Example: on-line transaction processing (OLTP) systems, which must support a large number of concurrent transactions without imposing excessive delays.
What is a DBMS?
Need for information management.
A database is a very large, integrated collection of data. It models a real-world enterprise:
Entities (e.g., students, courses)
Relationships (e.g., John is taking CS662)
A Database Management System (DBMS) is a software package designed to store and manage databases.
Why Use a DBMS?
Data independence and efficient access.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
Replication control.
Reduced application development time.
Why Study Databases?
Shift from computation to information
at the low end: access to physical world
at the high end: scientific applications
Datasets increasing in diversity and volume.
Digital libraries, interactive video, Human Genome project, e-commerce, sensor networks ...
... need for DBMS/data services exploding
DBMS encompasses several areas of CS
OS, languages, theory, AI, multimedia, logic
Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
Main concept: relation, basically a table with rows and columns.
Every relation has a schema, which describes the columns, or fields.
Levels of Abstraction
Many views, a single conceptual (logical) schema, and a single physical schema.
Views describe how users see the data.
Conceptual schema defines the logical structure.
Physical schema describes the files and indexes used.
[Figure: View 1, View 2, and View 3 sit above the Conceptual Schema, which sits above the Physical Schema.]
* Schemas are defined using DDL; data is modified/queried using DML.
Example: University Database
Conceptual schema:
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Courses(cid: string, cname: string, credits: integer)
Enrolled(sid: string, cid: string, grade: string)
Physical schema:
Relations stored as unordered files.
Index on first column of Students.
External Schema (View):
Course_info(cid: string, enrollment: integer)
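The Course_info view of the university example can be sketched with Python's built-in sqlite3 module. The slides do not give the view definition, so the query below is an assumption: enrollment is taken to be the number of Enrolled rows per course. The sample rows are also illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("53666", "Carnatic101", "C"),
        ("53666", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
        ("53650", "History105", "A"),  # extra row, added for illustration
    ],
)

# External schema: users of the view see (cid, enrollment) only.
# Assumed definition: one count per course over the Enrolled relation.
conn.execute(
    "CREATE VIEW Course_info AS "
    "SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid"
)

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

Users of the view never see sid or grade, which is the sense in which an external schema hides the rest of the database.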
Data Independence
Applications insulated from how data is structured and stored.
Logical data independence: Protection from changes in logical structure of data.
Physical data independence: Protection from changes in physical structure of data.
* One of the most important benefits of using a DBMS!
Concurrency Control
Concurrent execution of user programs is essential for good DBMS performance.
Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
DBMS ensures such problems don't arise: users can pretend they are using a single-user system.
Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
Transaction: An Execution Unit of a DB

Key concept is the transaction, which is an atomic sequence of database actions (reads/writes).
Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed). Why not?
Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!
Scheduling Concurrent Transactions
DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1' ... Tn' (the same transactions, run one at a time in some order).
Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.)
Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions.
What if Tj already has a lock on Y and Ti later requests a lock on Y? What is it called? What will happen?
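One way to explore the closing question is a waits-for graph: an edge from one transaction to another means the first is waiting for a lock the second holds. The sketch below is a toy model with exclusive locks only; the function names and the lock table are hypothetical, and it is not a description of any real lock manager. A cycle in the graph means no transaction in the cycle can proceed, which is a deadlock.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle in the waits-for graph, or None."""
    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]
        for blocker in waits_for.get(txn, ()):
            cycle = visit(blocker, path + [txn])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

# Starting point: Ti holds the lock on X, Tj holds the lock on Y.
holders = {"X": "Ti", "Y": "Tj"}
waits_for = {}

def request(txn, obj):
    """Grant the lock if free; otherwise record that txn waits for the holder."""
    holder = holders.get(obj)
    if holder is not None and holder != txn:
        waits_for.setdefault(txn, set()).add(holder)
        return False
    holders[obj] = txn
    return True

request("Tj", "X")                      # Tj must wait for Ti
no_cycle_yet = find_deadlock(waits_for)
request("Ti", "Y")                      # Ti must wait for Tj: a cycle
cycle = find_deadlock(waits_for)
print(no_cycle_yet, cycle)
```

A DBMS that detects such a cycle typically resolves it by aborting one of the transactions involved, undoing its changes so the other can continue.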
Ensuring Atomicity
DBMS ensures atomicity (all-or-nothing property) even if the system crashes in the middle of a Xact.
Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts:
Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol.)
After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if the log entry wasn't saved before the crash, the corresponding change was not applied to the database!)
The Log
The following actions are recorded in the log:
Ti writes an object: the old value and the new value.

Log record must go to disk before the changed page!
Ti commits/aborts: a log record indicating this action.
Log records are chained together by Xact id, so it's easy to undo a specific Xact (e.g., to resolve a deadlock).
Log is often duplexed and archived on stable storage.
All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.
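The role of the old value in each write record can be sketched as follows. This is a toy, in-memory model: the record layout and the names write and recover are assumptions made for illustration, and nothing is actually forced to disk. It only shows the order of operations (log first, then change) and how uncommitted writes are undone by scanning the log backwards.

```python
def recover(db, log):
    """Undo the writes of transactions that have no commit record in the log."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Scan the log backwards, restoring old values written by uncommitted Xacts.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, obj, old, new = rec
            db[obj] = old
    return db

db = {"A": 100, "B": 50}
log = []

def write(txn, obj, new):
    # WAL order: append the log record (old and new value) before changing the data.
    log.append(("write", txn, obj, db[obj], new))
    db[obj] = new

write("T1", "A", 80)
log.append(("commit", "T1"))
write("T2", "B", 70)
write("T2", "A", 60)
# Simulated crash here: T2 never committed.

recovered = recover(dict(db), log)
print(recovered)
```

T1's committed write survives, while both of T2's writes are rolled back to the values recorded in the log.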
Databases make these folks happy ...

End users and DBMS vendors
DB application programmers
e.g. webmasters
Database administrator (DBA)
Designs logical/physical schemas
Handles security and authorization
Data availability, crash recovery
Database tuning as needs evolve
Must understand how a DBMS works!
Structure of a DBMS

A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
This is one of several possible architectures; each system has its own variations.
Layers, top to bottom (these layers must consider concurrency control and recovery):
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Summary
DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs and are well-paid!
DBMS R&D is one of the broadest, most mature areas in CS.
Data Models
A database models some portion of the real world.
A data model is the link between the user's view of the world and the bits stored in the computer.
Many models have been proposed.
We will concentrate on the Relational Model.
[Figure: the schema Student(sid: string, name: string, login: string, age: integer, gpa: real) on the user's side, and raw bits (10101 11101) on the computer's side.]
Describing Data: Data Models
A data model is a collection of concepts for describing data.
A database schema is a description of a particular collection of data, using a given data model. The relational model of data is the most widely used model today.
Main concept: relation, basically a table with rows and columns. Every relation has a schema, which describes the columns, or fields.
Levels of Abstraction
Views describe how users see the data.
Conceptual schema defines the logical structure.
Physical schema describes the files and indexes used.
(This layering is sometimes called the ANSI/SPARC model.)
[Figure: Users at the top, then View 1, View 2, View 3, then the Conceptual Schema, then the Physical Schema, with the DB at the bottom.]
Data Independence: The Big Breakthrough of the Relational Model
A Simple Idea: Applications should be insulated from how data is structured and stored.
Logical data independence: Protection from changes in logical structure of data.
Physical data independence: Protection from changes in physical structure of data.
[Figure: View 1, View 2, View 3 above the Conceptual Schema, the Physical Schema, and the DB.]
Q: Why are these particularly important for DBMS?
Why Study the Relational Model?
Most widely used model currently.
DB2, MySQL, Oracle, PostgreSQL, SQL Server, ...
Note: some legacy systems use older models
e.g., IBM's IMS
Object-oriented concepts have recently merged in: the object-relational model
Informix, IBM DB2, Oracle 8i
Early work done in the POSTGRES research project at Berkeley
XML (semi-structured) models emerging?
Relational Database: Definitions

Relational database: a set of relations.
Relation: made up of 2 parts:
Schema: specifies name of relation, plus name and type of each column. E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
Instance: a table, with rows and columns. #rows = cardinality; #fields = degree / arity.
Can think of a relation as a set of rows or tuples, i.e., all rows are distinct.
Example: University Database
Conceptual schema:
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Courses(cid: string, cname: string, credits: integer)
Enrolled(sid: string, cid: string, grade: string)
External Schema (View):
Course_info(cid: string, enrollment: integer)
One possible Physical schema:
Relations stored as unordered files.
Index on first column of Students.
[Figure: View 1, View 2, View 3 above the Conceptual Schema, the Physical Schema, and the DB.]
Ex: An Instance of Students Relation

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
Cardinality = 3, Arity = 5. All rows must be unique (set semantics).
Q: Do all values in each column of a relation instance have to be unique?
Q: Is Cardinality a schema property?
Q: Is Arity a schema property?
SQL - A language for Relational DBs
SQL (a.k.a. "Sequel"), the "Intergalactic Standard for Data"
Stands for Structured Query Language
Two sub-languages:
Data Definition Language (DDL): create, modify, delete relations; specify constraints; administer users, security, etc.
Data Manipulation Language (DML): specify queries to find tuples that satisfy criteria; add, modify, remove tuples
SQL Overview

CREATE TABLE <name> ( <field> <domain>, ... )
INSERT INTO <name> (<field names>) VALUES (<field values>)
DELETE FROM <name> WHERE <condition>
UPDATE <name> SET <field name> = <value> WHERE <condition>
SELECT <fields> FROM <name> WHERE <condition>
Creating Relations in SQL
Creates the Students relation.
Note: the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified.
CREATE TABLE Students
  (sid CHAR(20),
   name CHAR(20),
   login CHAR(10),
   age INTEGER,
   gpa FLOAT)
Table Creation (continued)
Another example: the Enrolled table holds information about courses students take.
CREATE TABLE Enrolled
  (sid CHAR(20),
   cid CHAR(20),
   grade CHAR(2))
Adding and Deleting Tuples
Can insert a single tuple using:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)
Can delete all tuples satisfying some condition (e.g., name = 'Smith'):
DELETE FROM Students S
WHERE S.name = 'Smith'

Powerful variants of these commands are available; more later!
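The table creation, insertion, and deletion statements can be tried out with Python's built-in sqlite3 module, as a rough sketch. The second inserted row is an illustrative assumption. Note that SQLite applies type affinity rather than strict domain enforcement, so this only shows the statements running, not domain checking; the table alias S is also left out of the DELETE to stay within syntax that SQLite is known to accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students "
    "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)"
)

# String values are quoted; numeric values are not.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)"
)
count_before = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# Delete every tuple whose name is Smith.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")
remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
print(count_before, remaining)
```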
Keys
Keys are a way to associate tuples in different relations.
Keys are one form of integrity constraint (IC).
Enrolled (sid is a FOREIGN key referring to Students)
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B
Students (sid is the PRIMARY key)
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
Primary Keys

A set of fields is a superkey if:
No two distinct tuples can have the same values in all key fields.
A set of fields is a candidate key for a relation if:
It is a superkey.
No proper subset of the fields is a superkey.
What if there is more than one candidate key for a relation?
One of the candidate keys is chosen (by the DBA) to be the primary key.
E.g. sid is a key for Students. What about name? The set {sid, gpa} is a superkey.
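The two definitions can be turned into a small check over one relation instance. The helper names is_superkey and is_candidate_key are hypothetical. An important caveat: a key is a statement about every legal instance, so a check like this can only show that a set of fields is violated by the given rows; it cannot prove that the set really is a key.

```python
from itertools import combinations

def is_superkey(rows, fields):
    """True if no two rows in this instance agree on all the given fields."""
    seen = set()
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, fields):
    """True if fields is a superkey here and no non-empty proper subset is."""
    if not is_superkey(rows, fields):
        return False
    for size in range(1, len(fields)):
        for subset in combinations(fields, size):
            if is_superkey(rows, subset):
                return False
    return True

students = [
    {"sid": "53666", "name": "Jones", "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "gpa": 3.8},
]

print(is_superkey(students, ["sid"]))              # sid is unique in these rows
print(is_superkey(students, ["name"]))             # two rows share the name Smith
print(is_candidate_key(students, ["sid", "gpa"]))  # superkey, but sid alone suffices
```

In these three rows gpa also happens to be unique, so is_superkey(students, ["gpa"]) returns True even though gpa is clearly not a key of Students. That is exactly why keys must be declared from the meaning of the data rather than read off an instance.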
Primary and Candidate Keys in SQL
Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key.
For a given student and course, there is a single grade:
CREATE TABLE Enrolled
  (sid CHAR(20),
   cid CHAR(20),
   grade CHAR(2),
   PRIMARY KEY (sid, cid))
vs.
CREATE TABLE Enrolled
  (sid CHAR(20),
   cid CHAR(20),
   grade CHAR(2),
   PRIMARY KEY (sid),
   UNIQUE (cid, grade))
The second version says: students can take only one course, and no two students in a course receive the same grade.
Keys must be used carefully!
Foreign Keys, Referential Integrity
Foreign key: Set of fields in one relation that is used to "refer" to a tuple in another relation.
Must correspond to the primary key of the other relation.
Like a "logical pointer".
If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references).
Foreign Keys in SQL
E.g. Only students listed in the Students relation should be allowed to enroll for courses.
sid is a foreign key referring to Students:
CREATE TABLE Enrolled
  (sid CHAR(20), cid CHAR(20), grade CHAR(2),
   PRIMARY KEY (sid, cid),
   FOREIGN KEY (sid) REFERENCES Students)
Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B
Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
A tuple such as (11111, English102, A) cannot be added to Enrolled, because 11111 does not appear as a sid in Students.
Enforcing Referential Integrity

Consider Students and Enrolled; sid in Enrolled is a foreign key that references Students.
What should be done if an Enrolled tuple with a nonexistent student id is inserted? (Reject it!)
What should be done if a Students tuple is deleted?
Also delete all Enrolled tuples that refer to it?
Disallow deletion of a Students tuple that is referred to?
Set sid in Enrolled tuples that refer to it to a default sid?
(In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting "unknown" or "inapplicable".)
Similar issues arise if the primary key of a Students tuple is updated.
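Two of these choices can be tried with Python's built-in sqlite3 module: rejecting an insert with a nonexistent sid, and deleting the referring Enrolled tuples along with the Students tuple (ON DELETE CASCADE). This is a sketch under assumptions: the reduced Students table and the choice of CASCADE are for illustration only, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute(
    "CREATE TABLE Enrolled ("
    " sid CHAR(20), cid CHAR(20), grade CHAR(2),"
    " PRIMARY KEY (sid, cid),"
    " FOREIGN KEY (sid) REFERENCES Students ON DELETE CASCADE)"
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'History105', 'B')")

# Inserting an Enrolled tuple with a nonexistent student id is rejected.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('11111', 'English102', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the student also deletes the Enrolled tuples that refer to it.
conn.execute("DELETE FROM Students WHERE sid = '53666'")
left = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(rejected, left)
```

The other options listed above correspond to different actions on the foreign key clause; which one is right depends on the application.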
Integrity Constraints (ICs)

IC: condition that must be true for any instance of the database; e.g., domain constraints.
ICs are specified when the schema is defined.
ICs are checked when relations are modified.
A legal instance of a relation is one that satisfies all specified ICs.
DBMS should not allow illegal instances.
If the DBMS checks ICs, stored data is more faithful to real-world meaning.
Avoids data entry errors, too!
Where do ICs Come From?

ICs are based upon the semantics of the real world that is being described in the database relations.
We can check a database instance to see if an IC is violated, but we can NEVER infer that an IC is true by looking at an instance.
An IC is a statement about all possible instances!
From the example, we know name is not a key, but the assertion that sid is a key is given to us.
Key and foreign key ICs are the most common; more general ICs are supported too.
Relational Query Languages

A major strength of the relational model: supports simple, powerful querying of data. Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.
The key: precise semantics for relational queries. Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change.
The SQL Query Language
The most widely used relational query language.

Current std is SQL-2003; SQL92 is a basic subset that we focus on in this class.
To find all 18 year old students, we can write:
SELECT *
FROM Students S
WHERE S.age=18
Result:
sid    name   login     age  gpa
53666  Jones  jones@cs  18   3.4
53688  Smith  smith@ee  18   3.2
To find just names and logins, replace the first line:
SELECT S.name, S.login
Querying Multiple Relations
What does the following query compute?

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid=E.sid AND E.grade='A'
Given the following instance of Enrolled
sid    cid          grade
53831  Carnatic101  C
53831  Reggae203    B
53650  Topology112  A
53666  History105   B
we get:
S.name  E.cid
Smith   Topology112
Semantics of a Query
A conceptual evaluation method for the previous query:

1. do FROM clause: compute cross-product of Students and Enrolled
2. do WHERE clause: Check conditions, discard tuples that fail
3. do SELECT clause: Delete unwanted fields
Remember, this is conceptual. Actual evaluation will be much more efficient, but must produce the same answers.
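The three conceptual steps can be written out directly for the Students and Enrolled instances used in these slides. This is a sketch of the conceptual method only, using plain Python tuples; a real DBMS would evaluate the join far more efficiently.

```python
from itertools import product

# Students(sid, name, login, age, gpa)
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@ee", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]
# Enrolled(sid, cid, grade)
enrolled = [
    ("53831", "Carnatic101", "C"),
    ("53831", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
]

# 1. FROM: cross-product of Students and Enrolled.
cross = list(product(students, enrolled))
# 2. WHERE: keep pairs with S.sid = E.sid AND E.grade = 'A'.
kept = [(s, e) for s, e in cross if s[0] == e[0] and e[2] == "A"]
# 3. SELECT: keep only S.name and E.cid.
answer = [(s[1], e[1]) for s, e in kept]

print(len(cross), answer)
```

The cross-product has 3 x 4 = 12 pairs, of which only one survives the WHERE conditions.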
Cross-product of Students and Enrolled Instances

S.sid  S.name  S.login     S.age  S.gpa  E.sid  E.cid        E.grade
53666  Jones   jones@cs    18     3.4    53831  Carnatic101  C
53666  Jones   jones@cs    18     3.4    53831  Reggae203    B
53666  Jones   jones@cs    18     3.4    53650  Topology112  A
53666  Jones   jones@cs    18     3.4    53666  History105   B
53688  Smith   smith@ee    18     3.2    53831  Carnatic101  C
53688  Smith   smith@ee    18     3.2    53831  Reggae203    B
53688  Smith   smith@ee    18     3.2    53650  Topology112  A
53688  Smith   smith@ee    18     3.2    53666  History105   B
53650  Smith   smith@math  19     3.8    53831  Carnatic101  C
53650  Smith   smith@math  19     3.8    53831  Reggae203    B
53650  Smith   smith@math  19     3.8    53650  Topology112  A
53650  Smith   smith@math  19     3.8    53666  History105   B
Queries, Query Plans, and Operators

Three example queries over Emp (Employees), Proj (Projects), and Asgn (Assignments):
SELECT eid, ename, title
FROM Emp E
WHERE E.sal > $50K

SELECT E.loc, AVG(E.sal)
FROM Emp E
GROUP BY E.loc
HAVING Count(*) > 5

SELECT COUNT DISTINCT (E.eid)
FROM Emp E, Proj P, Asgn A
WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc
[Figure: query plans for the three queries, built from the operators Select, Group(agg), Having, Join, and Count distinct over Emp, Proj, and Asgn.]
System handles query plan generation & optimization; ensures correct execution.
Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, ...
Structure of a DBMS

A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
Each system has its own variations.
The book shows a somewhat more detailed version.
You will see the "real deal" in PostgreSQL. It's a pretty full-featured example.
Next class: we will start on this stack, bottom up.
Layers, top to bottom (these layers must consider concurrency control and recovery):
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Relational Model: Summary

A tabular representation of data. Simple and intuitive, currently the most widely used
Object-relational variant gaining ground
Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations.
Two important ICs: primary and foreign keys.
In addition, we always have domain constraints.
Powerful query languages exist.
SQL is the standard commercial one
DDL - Data Definition Language
DML - Data Manipulation Language
Storage
There are two general types of storage media used with computers:
Primary Storage - This includes all storage media that can be operated on directly by the CPU (RAM, L1 and L2 cache memory).
Secondary Storage - This includes hard drives, CDs, and tape.
Chapter 5
Memory Hierarchies & Storage Devices
The Memory Hierarchy is based upon speed of access. However, this speed comes with a price tag attached, which varies inversely with the access time of the memory. As with cars, the faster the memory access, the more it costs.
Primary Storage Level of Memory
The Primary Storage Level of Memory is generally made up of 3 levels:
L1 Cache, which is located on the CPU
L2 Cache, which is located near the CPU
Main Memory, which is the RAM figure that is often referred to in computer advertisements
Secondary Storage Level of Memory
The Secondary Storage Level of Memory may be made up of 4 levels:
Flash Memory or EEPROM
Hard Drives
CD-ROMs
Tape
Terms Used in the Hardware Description of Hard Drives

Capacity - The number of bytes it can store.
Single-sided vs. Double-sided - States whether the disk/platter is written on one or both sides.
Disk Pack - A collection of disks/platters that are assembled together into a pack.
Track - A circle of small width on a disk. A disk surface will have many tracks.
Sector - A segment or arc of a track.
Block - The division of a track into equal-sized portions by the operating system.
Interblock Gaps - Fixed-size segments that separate the blocks.
Read/Write Head - Actually reads/writes the information to the disk.
Cylinder - The set of tracks with the same diameter on all the disk surfaces of a disk pack.
Terms Used in Measuring Disk Operations

Seek Time (s) - The time it takes to position the read/write head on the desired track. It will be given in all problems that need it.
Rotational Delay (rd) - The average amount of time it takes the desired block to rotate into position under the read/write head. rd = (1/2)*(1/p) min, where p is the rotational speed of the disk in rpm.
Transfer Rate (tr) - The rate at which information can be transferred to or from the disk. tr = (track size)/(1/p min), i.e. the track size divided by the time for one revolution.
Block Transfer Time (btt) - The time it takes to transfer the data once the read/write head has been positioned. btt = B/tr msec, where B is the block size in bytes and tr is expressed in bytes/msec.
Bulk Transfer Rate (btr) - The rate at which multiple contiguous blocks can be read or written. btr = (B/(B+G)) * tr bytes/msec, where G is the interblock gap size in bytes.
Rewrite Time (Trw) - The time it takes, after a block is read, to write that same block back to the disk; this is the time for one revolution.
Computing Times
Given :
Seek Time (s) = 10 msec
Rotational speed = 3600 rpm
Track size = 50 KB
Block size (B) = 512 bytes
Interblock Gap = 128 bytes
Problems for Disk Operations

Compute the average time it takes to transfer 1 block on this system.
Compute the average time it takes to transfer 20 non-contiguous blocks that are located on the same track.
Compute the average time it takes to transfer 20 contiguous blocks.
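The three problems can be worked through with the formulas for rd, tr, btt, and btr. The sketch below is one possible reading, not an official solution, and rests on stated assumptions: 1 KB is taken as 1024 bytes; the non-contiguous case pays one seek and then a rotational delay for every block; the contiguous case pays one seek and one rotational delay and then transfers at the bulk transfer rate.

```python
import math

s = 10.0                 # seek time, msec (given)
p = 3600                 # rotational speed, rpm (given)
track_size = 50 * 1024   # bytes; assumes 1 KB = 1024 bytes
B = 512                  # block size, bytes (given)
G = 128                  # interblock gap, bytes (given)

rev_time = 60_000 / p          # msec per revolution
rd = rev_time / 2              # average rotational delay, msec
tr = track_size / rev_time     # transfer rate, bytes/msec
btt = B / tr                   # block transfer time, msec
btr = (B / (B + G)) * tr       # bulk transfer rate, bytes/msec

# Problem 1: one block = seek + rotational delay + block transfer.
one_block = s + rd + btt
# Problem 2 (assumption): one seek, then rd + btt for each of the 20 blocks.
twenty_scattered = s + 20 * (rd + btt)
# Problem 3 (assumption): one seek, one rd, then 20 blocks at the bulk rate.
twenty_contiguous = s + rd + (20 * B) / btr

print(round(one_block, 2), round(twenty_scattered, 2), round(twenty_contiguous, 2))
```

Under these assumptions the rotational delay dominates: reading 20 scattered blocks costs several times more than reading 20 contiguous ones.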
Parallelizing Disk Access Using RAID

RAID - Stands for Redundant Arrays of Inexpensive Disks or Redundant Arrays of Independent Disks. RAIDs are used to provide increased reliability, increased performance or both.
RAID Levels
Level 0 - has no redundancy and the best write performance, but its read performance is not as good as level 1.
Level 1 - uses mirrored disks, which provide redundancy and improved read performance.
Level 2 - provides redundancy using Hamming codes.
Level 3 - uses a single parity disk.
Level 4 and 5 - use block-level data striping, with level 5 distributing the data and parity information across all the disks.
Level 6 - uses the P + Q redundancy scheme, making use of Reed-Solomon codes to protect against the failure of 2 disks.
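How a single parity disk protects against one disk failure can be sketched with XOR parity. The slides do not say how the parity is computed; XOR is the usual scheme and is assumed here, and the block contents are made up for illustration.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks byte by byte to get the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks, each holding one 4-byte block (illustrative contents).
data_disks = [b"DATA", b"BASE", b"RAID"]
parity_disk = parity(data_disks)

# Suppose the second disk fails: XOR the survivors with the parity block.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = parity(survivors)

print(rebuilt)
```

Because XOR is its own inverse, the same function both computes the parity block and rebuilds any one missing block.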
Records
A record is a collection of related values or items. Each value or item is stored in a field of a specific data type. Records may be of either fixed or variable length.
Variable Length Records in Files
There are several reasons why records of the same record type may be of variable length:
Variable-length fields
Repeating fields
For efficiency reasons, different record types may be clustered in a file.
Spanned Vs Unspanned Records
When the records in a file are stored on a disk, they may be placed in blocks of a fixed size. The block size will rarely be an exact multiple of the record size. So when the record size is smaller than the block size and the block size is not a multiple of the record size, a decision must be made: store each record entirely in one block and leave some space unused (unspanned), or let a record be split across two different blocks (spanned).
File Operations
A file may be stored either in contiguous blocks or by linking the blocks together. There are advantages and disadvantages to both methods.
Operations on files can be grouped into two types: retrieval and update. Retrieval involves only a read, while an update involves a read, a modification, and a write.
File Structure
Heap (Pile) Files
Hash (Direct) Files
Ordered (Sorted) Files
B-Trees
Once the data has been brought into memory, it can be accessed by an instruction in 0.00000004 seconds on a machine running at 25 MIPS. The disparity between the time for a memory access and a disk access is enormous: we can perform 625,000 instructions in the time it takes to read or write one disk page (0.025 seconds / 0.00000004 seconds = 625,000).
To put this in human terms: suppose you are typing a letter for your boss and find a word you cannot make out, so you leave him a voice mail message. Since you were told to do nothing else, you patiently wait for his reply, doing nothing. Unfortunately, he has just gone on vacation and does not get your message for 3 weeks. This is similar to the computer waiting 0.025 seconds for a disk read to bring the needed data into memory.
