
Week 1 – Part 1: An Introduction to Databases and DBMSs

Databases and DBMSs
Data Models and Data Independence
Concurrency Control and Database Transactions
Structure of a DBMS
DBMS Languages

Database Systems
• Database: A very large, integrated collection of data.
  • Examples: databases of customers, products,...
  • There are huge databases out there, for satellite and other scientific data, digitized movies,...; up to exabytes of data (i.e., 10^18 bytes)
• A database usually models (some part of) a real-world enterprise.
  • Entities (e.g., students, courses)
  • Relationships (e.g., Paolo is taking CS564)
• A Database Management System (DBMS) is a software package designed to store and manage databases.

©2005 John Mylopoulos CSC343 Introduction to Databases — University of Toronto Introduction — 1 ©2005 John Mylopoulos CSC343 Introduction to Databases — University of Toronto Introduction — 2

Why Use a DBMS?
• Data independence and efficient access: You don't need to know the implementation of the database to access data; queries are optimized.
• Reduced application development time: Queries can be expressed declaratively; the programmer doesn't have to specify how they are evaluated.
• Data integrity and security: (Certain) constraints on the data are enforced automatically.
• Uniform data administration.
• Concurrent access, recovery from crashes: Many users can access/update the database at the same time without any interference.

Why Study Databases?
• Shift from computation to information: Computers were initially conceived as neat devices for doing scientific calculations; more and more they are used as data managers.
• Datasets increasing in diversity and volume: Digital libraries, interactive video, Human Genome project, EOS project ... need for DBMS technology is exploding!
• DBMS technology encompasses much of Computer Science: OS, languages, theory, AI, multimedia, logic,...

Data Models
• A data model is a collection of concepts for describing data.
• A database schema is a description of the data that are contained in a particular database.
• The relational model of data is the most widely used data model today.
  • Main concept: relation, basically a table with rows and columns.
  • A relation schema describes the columns, or attributes, or fields of a relation.

Levels of Abstraction
Many views, single logical schema and physical schema.
• Views (also called external schemas) describe how users see the data.
• Logical schema* defines logical structure.
• Physical schema describes the files and indexes used.
* Called conceptual schema back in the old days.
[Figure: View 1, View 2, View 3 on top of the Logical Schema, which sits on top of the Physical Schema]


Example: University Database
• Logical schema:
    Students(Sid:String, Name:String, Login:String, Age:Integer, Gpa:Real)
    Courses(Cid:String, Cname:String, Credits:Integer)
    Enrolled(Sid:String, Cid:String, Grade:String)
• Physical schema:
  • Relations stored as unordered files.
  • Index on first column of Students.
• (One) External Schema (View):
    CourseInfo(Cid:String, Enrollment:Integer)

Tables Represent Relations

Students
Sid    Name   Login  Age  Gpa
00243  Paolo  pg     21   4.0
01786  Maria  mf     20   3.6
02699  Klaus  klaus  19   3.4
02439  Eric   eric   19   3.1

Courses
Cid     Cname              Credits
csc340  Rqmts Engineering  4
csc343  Databases          6
ece268  Operating Systems  3
csc324  Programming Langs  4
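
The university schema can be tried out with a small sketch in Python, using the standard-library sqlite3 module as a stand-in DBMS. The primary keys, the SQL column types and the two Enrolled rows are assumptions made for illustration; the slides give only the relation signatures. The CourseInfo view is one plausible definition (a count of enrolments per course), not necessarily the one intended.

```python
import sqlite3

def build_university_db():
    """Create the example schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Logical schema: three relations (keys are an assumption).
        CREATE TABLE Students (Sid TEXT PRIMARY KEY, Name TEXT, Login TEXT,
                               Age INTEGER, Gpa REAL);
        CREATE TABLE Courses  (Cid TEXT PRIMARY KEY, Cname TEXT, Credits INTEGER);
        CREATE TABLE Enrolled (Sid TEXT, Cid TEXT, Grade TEXT);
        -- External schema: a view exposing only a per-course count.
        CREATE VIEW CourseInfo AS
            SELECT c.Cid AS Cid, COUNT(e.Sid) AS Enrollment
            FROM Courses c LEFT JOIN Enrolled e ON e.Cid = c.Cid
            GROUP BY c.Cid;
    """)
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", [
        ("00243", "Paolo", "pg", 21, 4.0),
        ("01786", "Maria", "mf", 20, 3.6),
    ])
    conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
        ("csc340", "Rqmts Engineering", 4),
        ("csc343", "Databases", 6),
    ])
    # Hypothetical enrolment rows; the slides do not list any.
    conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
        ("00243", "csc343", "A"),
        ("01786", "csc343", "B"),
    ])
    return conn

conn = build_university_db()
course_info = conn.execute(
    "SELECT Cid, Enrollment FROM CourseInfo ORDER BY Cid").fetchall()
print(course_info)  # [('csc340', 0), ('csc343', 2)]
```

A program that reads only CourseInfo keeps working if Enrolled is later reorganized, as long as the view is redefined to match.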

Data Independence
Applications insulated from how data is structured and stored: (See also 3-layer schema structure.)
• Logical data independence: Protection from changes in the logical structure of data.
• Physical data independence: Protection from changes in the physical structure of data.
One of the most important benefits of database technology!

Concurrency Control
• Concurrent execution of user programs is essential for good DBMS performance.
  • Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
• Interleaving actions of different user programs can lead to inconsistency: e.g., cheque is cleared while account balance is being computed.
• DBMS ensures that such problems don't arise: users can pretend they are using a single-user system.
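
The cheque example can be made concrete with a toy simulation in Python. The account names, the amounts and the idea that the cheque moves money between two accounts of the same customer are assumptions chosen for illustration. One program sums the balances while another clears the cheque; the interleaving is written out by hand so the outcome is deterministic.

```python
def total_balance_interleaved(accounts, cheque_amount):
    """Sum two accounts while a cheque clears between the two reads."""
    total = accounts["chequing"]            # reader: first read
    accounts["chequing"] -= cheque_amount   # writer: money leaves one account
    accounts["savings"] += cheque_amount    # writer: and lands in the other
    total += accounts["savings"]            # reader: second read sees new value
    return total

def total_balance_serial(accounts, cheque_amount):
    """Same two programs, but the cheque clears entirely before the sum."""
    accounts["chequing"] -= cheque_amount
    accounts["savings"] += cheque_amount
    return accounts["chequing"] + accounts["savings"]

interleaved = total_balance_interleaved({"chequing": 100, "savings": 50}, 30)
serial = total_balance_serial({"chequing": 100, "savings": 50}, 30)
print(interleaved, serial)  # 180 150
```

The interleaved run reports 180 although the customer never held more than 150, which is the kind of inconsistency concurrency control is meant to rule out.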

Database Transactions
• Key concept is transaction, which is an atomic sequence of database actions (reads/writes).
• Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
• Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
• Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
• Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!

Scheduling Concurrent Transactions
DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution of T1, ... , Tn.
• Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2-phase locking protocol.)
• Idea: If an action of Ti (say, writing X) affects Tk (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tk is forced to wait until Ti completes; this effectively orders the transactions.
• What if Tk already has a lock on Y and Ti later requests a lock on Y? (Deadlock!) Ti or Tk is aborted and restarted!
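
The locking idea and the deadlock case can be sketched as a toy lock table in Python. This is an illustration under simplifying assumptions, not how any particular DBMS does it: every lock is exclusive, a blocked transaction waits for exactly one other transaction, and deadlock is detected by following a wait-for chain. The class and method names are hypothetical.

```python
class LockManager:
    """Toy exclusive-lock table with wait-for-chain deadlock detection."""

    def __init__(self):
        self.owner = {}      # object -> transaction holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it waits for

    def request(self, txn, obj):
        """Return 'granted', 'wait', or 'deadlock' for txn's lock request."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            return "granted"
        # Follow the wait-for chain from the holder; reaching txn is a cycle.
        seen, current = set(), holder
        while current is not None and current not in seen:
            if current == txn:
                return "deadlock"
            seen.add(current)
            current = self.waits_for.get(current)
        self.waits_for[txn] = holder
        return "wait"

    def release_all(self, txn):
        """Strict 2PL: all of txn's locks are dropped together when it ends."""
        self.owner = {o: t for o, t in self.owner.items() if t != txn}
        self.waits_for = {w: h for w, h in self.waits_for.items()
                          if w != txn and h != txn}

lm = LockManager()
events = [
    lm.request("Ti", "X"),  # Ti locks X
    lm.request("Tk", "Y"),  # Tk locks Y
    lm.request("Tk", "X"),  # Tk must wait for Ti
    lm.request("Ti", "Y"),  # Ti would wait for Tk, which already waits for Ti
]
print(events)  # ['granted', 'granted', 'wait', 'deadlock']

lm.release_all("Ti")                      # abort Ti, releasing its locks
after_abort = lm.request("Tk", "X")
print(after_abort)  # granted
```

Once Ti is aborted and its locks released, Tk's request for X succeeds, so Tk can finish and Ti can be restarted afterwards.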


Ensuring Atomicity
• DBMSs ensure atomicity (all-or-nothing property), even if system crashes in the middle of a transaction.
• Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of transactions:
  • Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol; OS support for this is often inadequate.)
  • After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if log entry wasn't saved before the crash, corresponding change was not applied to database!)

The Log
• The following actions are recorded in the log:
  • Ti writes an object: the old value and the new value; log record must go to disk before the changed page!
  • Ti commits/aborts: a log record indicating this action.
• Log records chained together by transaction id, so it's easy to undo a specific transaction (e.g., to resolve a deadlock).
• Log is often duplexed and archived on "stable" storage.
• All log related activities (and in fact, all CC-related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.
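
The log-then-write rule and undo-based recovery can be sketched in a few lines of Python. This is a minimal model under loud assumptions: a dict stands in for the database pages, a list stands in for the log on stable storage, stored values are never None, and the function names are hypothetical. Real recovery managers are considerably more involved.

```python
def run_with_log(db, log, txn, writes, crash_before_commit=False):
    """Apply writes for txn, logging (old, new) values before each change."""
    for key, new in writes:
        log.append((txn, "write", key, db.get(key), new))  # log record first
        db[key] = new                                      # then the data
    if not crash_before_commit:
        log.append((txn, "commit", None, None, None))

def recover(db, log):
    """Undo, newest first, every write of a transaction with no commit record."""
    committed = {rec[0] for rec in log if rec[1] == "commit"}
    for txn, kind, key, old, _new in reversed(log):
        if kind == "write" and txn not in committed:
            if old is None:
                db.pop(key, None)   # the key did not exist before the write
            else:
                db[key] = old
    return db

db, log = {"A": 100, "B": 50}, []
run_with_log(db, log, "T1", [("A", 70), ("B", 80)])
run_with_log(db, log, "T2", [("A", 0), ("C", 5)], crash_before_commit=True)
recovered = recover(db, log)
print(recovered)  # {'A': 70, 'B': 80}
```

T1 committed, so its writes survive; T2 never logged a commit, so both of its writes are rolled back from the old values stored in the log.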

Databases Make Folks Happy...
• End users and DBMS vendors
• Database application programmers, e.g. smart webmasters
• Database administrators (DBAs)
  • Design logical/physical schemas
  • Handle security and authorization
  • Data availability, crash recovery
  • Database tuning as needs evolve
Must understand how a DBMS works!

Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variation.
[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: these layers must consider concurrency control and recovery.]


Database Languages
A DBMS supports several languages and several modes of use:
• Interactive textual languages, such as SQL;
• Interactive commands embedded in a host programming language (Pascal, C, Cobol, Java, etc.)
• Interactive commands embedded in ad-hoc development languages (known as 4GL), usually with additional features (e.g., for the production of forms, menus, reports, ...)
• Form-oriented, non-textual user-friendly languages such as QBE.

SQL, an Interactive Language
    SELECT Course, Room, Building
    FROM Rooms, Courses
    WHERE Code = Room
      AND Rooms.Floor = 'Ground'

ROOMS
Code  Building  Floor
DS1   Ex-OMI    Ground
N3    Ex-OMI    Ground
G     Science   Third

COURSES
Course    Room  Floor
Networks  N3    Ground
Systems   N3    Ground
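
The interactive query can be run against the two example tables with Python's standard-library sqlite3 module, used here only as a convenient stand-in DBMS. Two details are assumptions of this sketch: the column types, and writing the floor condition as Rooms.Floor, since both tables have a Floor column and an unqualified reference would be ambiguous. An ORDER BY is added so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rooms   (Code TEXT, Building TEXT, Floor TEXT);
    CREATE TABLE Courses (Course TEXT, Room TEXT, Floor TEXT);
    INSERT INTO Rooms VALUES ('DS1', 'Ex-OMI',  'Ground'),
                             ('N3',  'Ex-OMI',  'Ground'),
                             ('G',   'Science', 'Third');
    INSERT INTO Courses VALUES ('Networks', 'N3', 'Ground'),
                               ('Systems',  'N3', 'Ground');
""")

# The query says what is wanted (a join plus a filter), not how to compute it.
rows = conn.execute("""
    SELECT Course, Room, Building
    FROM Rooms, Courses
    WHERE Code = Room
      AND Rooms.Floor = 'Ground'
    ORDER BY Course
""").fetchall()
print(rows)  # [('Networks', 'N3', 'Ex-OMI'), ('Systems', 'N3', 'Ex-OMI')]
```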


SQL Embedded in Pascal

    write('city name?'); readln(city);
    { declare a cursor over the employees of the chosen city }
    EXEC SQL DECLARE E CURSOR FOR
        SELECT NAME, SALARY
        FROM EMPLOYEES
        WHERE CITY = :city ;
    EXEC SQL OPEN E ;
    EXEC SQL FETCH E INTO :name, :salary ;
    { SQLCODE = 0 means the last FETCH returned a row }
    while SQLCODE = 0 do begin
        write('employee:', name, 'raise?');
        readln(raise);
        { update the row the cursor is currently positioned on }
        EXEC SQL UPDATE EMPLOYEES SET SALARY=SALARY+:raise
            WHERE CURRENT OF E ;
        EXEC SQL FETCH E INTO :name, :salary ;
    end;
    EXEC SQL CLOSE E ;

SQL Embedded in ad-hoc Language (Oracle PL/SQL)

    declare Salary number;
    begin
        -- read and lock the employee's current salary
        select Sal into Salary from Emp where Code='5788'
            for update of Sal;
        if Salary>30000000 then   -- written 30M on the slide
            update Emp set Sal=Salary*1.1 where Code='5788';
        else
            update Emp set Sal=Salary*1.2 where Code='5788';
        end if;
        commit;
    exception
        when no_data_found then
            insert into Errors
                values('No employee has given code',sysdate);
    end;
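
For comparison, the same cursor-style loop can be sketched with SQL embedded in Python through the standard-library sqlite3 module. This is an analogue under stated assumptions, not a translation of the embedded-SQL standard: sqlite3 has no positioned update, so SQLite's rowid stands in for WHERE CURRENT OF E; the interactive prompt is replaced by a hypothetical raise_for callback; and the sample rows are invented.

```python
import sqlite3

def give_raises(conn, city, raise_for):
    """Visit each employee in `city` and add raise_for(name) to the salary."""
    cursor = conn.execute(
        "SELECT rowid, NAME FROM EMPLOYEES WHERE CITY = ?", (city,))
    for rowid, name in cursor.fetchall():      # fetch loop, like FETCH E INTO
        conn.execute(
            "UPDATE EMPLOYEES SET SALARY = SALARY + ? WHERE rowid = ?",
            (raise_for(name), rowid))          # stands in for WHERE CURRENT OF E
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (NAME TEXT, SALARY INTEGER, CITY TEXT)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?, ?, ?)", [
    ("Ann", 100, "Toronto"), ("Bob", 200, "Toronto"), ("Cy", 300, "Ottawa")])

give_raises(conn, "Toronto", lambda name: 10)
salaries = conn.execute(
    "SELECT NAME, SALARY FROM EMPLOYEES ORDER BY NAME").fetchall()
print(salaries)  # [('Ann', 110), ('Bob', 210), ('Cy', 300)]
```

The host language supplies control flow and input handling, while every access to the data still goes through SQL statements with bound parameters.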

Form-Based Interface (in Access)
[Figure: form-based interface in Access]

DBMS Languages
DML: data manipulation language
DDL: data definition language (allows definition of database schema)
4GL: fourth generation language, useful for declarative query processing, report generation
[Figure labels: Host Programming Language, 4GL, DDL, DML, DBMS, Database]


DBMS Technology: Pros and Cons
Pros
• Data are handled as a common resource.
• Centralized management and economy of scale.
• Availability of integrated services, reduction of redundancies and inconsistencies
• Data independence (useful for the development and maintenance of applications)
Cons
• Costs of DBMS products (and associated tools), also of data migration.
• Difficulty in separating features and services (with potential lack of efficiency.)

Conventional Files vs Databases
Files
Advantages: many already exist; good for simple applications; very efficient
Disadvantages: data duplication; hard to evolve; hard to build for complex applications
Databases
Advantages: good for data integration; allow for more flexible formats (not just records)
Disadvantages: high cost; drawbacks in a centralized facility
The future is with databases!


Types of DBMSs
• Conventional: relational, network, hierarchical; these consist of records of many different record types (database looks like a collection of files)
• Object-Oriented: database consists of objects (and possibly associated programs); database schema consists of classes (which can be objects too).
• Multimedia: database can store formatted data (i.e., records) but also text, pictures,...
• Active databases: database includes event-condition-action rules
• Deductive databases*: like large Prolog programs, not available commercially

The Hierarchical Data Model
Database consists of hierarchical record structures; a field may have as value a list of records; every record has at most one parent.
[Figure: Book record (B365, War & Peace, $8.99), labelled parent; Borrower record (38 Elm, Toronto), labelled children; Borrowing record (Jan 28, 1994; Feb 24, 1994)]
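
The hierarchical structure can be sketched with nested Python dataclasses. The nesting Book, then Borrower, then Borrowing is one reading of the figure, and all field names (code, price, date_from, date_to and so on) are hypothetical; the slide shows only the values. The point of the sketch is that a field holds a list of child records and that each record is reachable only through its single parent.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Borrowing:
    date_from: str   # assumed meaning of the first date in the figure
    date_to: str     # assumed meaning of the second date in the figure

@dataclass
class Borrower:
    address: str
    city: str
    borrowings: List[Borrowing] = field(default_factory=list)  # child records

@dataclass
class Book:
    code: str
    title: str
    price: float
    borrowers: List[Borrower] = field(default_factory=list)    # child records

book = Book("B365", "War & Peace", 8.99, [
    Borrower("38 Elm", "Toronto", [Borrowing("Jan 28, 1994", "Feb 24, 1994")]),
])

def borrowing_dates(root):
    """Reach the leaf records only by walking down from the root record."""
    return [(b.city, x.date_from) for b in root.borrowers for x in b.borrowings]

dates = borrowing_dates(book)
print(dates)  # [('Toronto', 'Jan 28, 1994')]
```

A question such as "which books has this borrower taken out?" has no direct path in this structure; it requires scanning every Book root, which is one reason later models moved away from strict hierarchies.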

The Network Data Model
A database now consists of records with pointers (links) to other records. Offers a navigational view of a database.
[Figure: record types Customer, Order, Ordered Part, Part, Sales History and Region connected by 1::n links; cycles of links are allowed]

Comparing Data Models
• The oldest DBMSs were hierarchical, dating back to the mid-60s. IMS (IBM product) is the most popular among them. Many old databases are hierarchical.
• The network data model came next (early '70s). Views database programmer as "navigator", chasing links (pointers, actually) around a database.
• The network model was found to be too implementation-oriented, not sufficiently insulating the programmer from implementation features of network DBMSs.
• The relational model is the most recent arrival. Relational databases are cleaner because they don't allow links/pointers (necessarily implementation-dependent).
• Even though the relational model was proposed in 1970, it didn't take over the database market till the 80s.
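
The "programmer as navigator" idea can be sketched in Python with records that hold direct references to other records. The Record class, the link names and the sample data are hypothetical; only the record type names Customer, Order and Part come from the figure.

```python
class Record:
    """A record whose named links point directly at other records."""

    def __init__(self, kind, key):
        self.kind, self.key = kind, key
        self.links = {}  # link name -> list of target records (1::n)

    def link(self, name, target):
        self.links.setdefault(name, []).append(target)

def parts_ordered_by(customer):
    """Navigate Customer -> Order -> Part by chasing links."""
    return [part.key
            for order in customer.links.get("orders", [])
            for part in order.links.get("parts", [])]

alice = Record("Customer", "alice")
o1, o2 = Record("Order", 1), Record("Order", 2)
bolt, nut = Record("Part", "bolt"), Record("Part", "nut")
alice.link("orders", o1)
alice.link("orders", o2)
o1.link("parts", bolt)
o2.link("parts", bolt)
o2.link("parts", nut)
o1.link("customer", alice)   # a link back to the owner: cycles are allowed

parts = parts_ordered_by(alice)
print(parts)  # ['bolt', 'bolt', 'nut']
```

The access path is hard-wired into parts_ordered_by: if the links were reorganized, the function would have to be rewritten, whereas a relational query names only tables and columns and leaves the access path to the DBMS.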

Summary
• DBMSs are used to maintain and query large datasets.
• Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs and are well-paid!
• DBMS R&D is one of the broadest, most exciting areas in CS.

