Professional Documents
Culture Documents
Database Concepts © Leo Mark
Database Concepts © Leo Mark
DATABASE CONCEPTS
Leo Mark College of Computing Georgia Tech
(January 1999)
Database Concepts Leo Mark
Course Contents
q q q q q q q q q q
Introduction Database Terminology Data Model Overview Database Architecture Database Management System Architecture Database Capabilities People That Work With Databases The Database Market Emerging Database Technologies What You Will Be Able To Learn More About
INTRODUCTION
q q q q q q q q q q
What a Database Is and Is Not Models of Reality Why use Models? A Map Is a Model of Reality A Message to Map Makers When to Use a DBMS? Data Modeling Process Modeling Database Design Abstraction
your personal address book in a Word document a collection of Word documents a collection of Excel Spreadsheets a very large flat file on which you run some statistical analysis functions data collected, maintained, and used in airline reservation data used to support the launch of a space shuttle
Models of Reality
DML REALITY structures processes DDL DATABASE SYSTEM DATABASE
A database is a model of structures of reality q The use of a database reflect processes of reality q A database system is a software system which supports the definition and use of a database q DDL: Data Definition Language q DML: Data Manipulation Language Database Concepts
q
Leo Mark
Models can be useful when we want to examine or manage part of the real world The costs of using a model are often considerably lower than the costs of using or experimenting with the real world itself Examples:
airplane simulator nuclear power plant simulator flood warning system model of US economy model of a heat reservoir map
q q q q
A model is a means of communication Users of a model must have a certain amount of knowledge in common A model on emphasized selected aspects A model is described in some language A model can be erroneous A message to map makers: Highways are not painted red, rivers dont have county lines running down the middle, and you cant see contour lines on a mountain [Kent 78]
q q q q q
persistent storage of data centralized control of data control of redundancy control of consistency and integrity multiple user support sharing of data data documentation data independence control of access and security backup and recovery
the initial investment in hardware, software, and training is too high the generality a DBMS provides is not needed the overhead for security, concurrency control, and recovery is too high data and applications are simple and stable real-time requirements cannot be met by it multiple user access is not needed
Data Modeling
REALITY structures processes DATABASE SYSTEM MODEL data modeling
The model represents a perception of structures of reality The data modeling process is to fix a perception of structures of reality and represent this perception In the data modeling process we select aspects and we abstract
Process Modeling
REALITY structures processes process modeling DATABASE SYSTEM MODEL
q q
The use of the model reflects processes of reality Processes may be represented by programs with embedded database queries and updates Processes may be represented by ad-hoc database queries and updates at run-time
DML PROG DML
Database Design
The purpose of database design is to create a database which
q q
is a model of structures of reality supports queries and updates modeling processes of reality runs efficiently
Abstraction
It is very important that the language used for data representation supports abstraction We will discuss three kinds of abstraction:
q q q
Classification
In a classification we form a concept in a way which allows us to decide whether or not a given phenomena is a member of the extension of the concept. CUSTOMER
Aggregation
In an aggregation we form a concept from existing concepts. The phenomena that are members of the new concepts extension are composed of phenomena from the extensions of the existing concepts AIRPLANE WING ENGINE
Database Concepts Leo Mark
COCKPIT
Generalization
In a generalization we form a new concept by emphasizing common aspects of existing concepts, leaving out special aspects CUSTOMER 1 CLASS
S T
BUSINESS CLASS
ECONOMY CLASS
Generalization (cont.)
Subclasses may overlap CUSTOMER BUSINESS 1 CLASS CLASS
S T
TRUCKS
Database Concepts Leo Mark
HELICOPTERS
GLIDERS
T
classification
generalization
T intension
DATABASE TERMINOLOGY
q q q q q q q
Data Models Keys and Identifiers Integrity and Consistency Triggers and Stored Procedures Null Values Normalization Surrogates - Things and Names
Data Model
A d tamd l a oe
q q q
c n is o n ta n fo e p s in : o s ts f o tio s r x re s g
F IG T C E U E L H -S H D L F IG T L H# 11 0 55 4 92 1 22 4
Database Concepts Leo Mark
Dynamic constraints apply to change of database state E.g., All FLIGHT-SCHEDULE entities must have precisely one DEPT-AIRPORT relationship
F IG T C E U E L H -S H D L F IG T L H# 11 0 55 4 92 1 22 4 AL E IR IN d elta am erican scan in ian d av u sair WE D Y EKA m o w e fr m o P IC RE 16 5 10 1 40 5 21 3 D P -A P R E T IR O T F IG T L H# 11 0 92 1 55 4 22 4 A P R -C D IR O T O E atl cp h lax bs o
D P -A P R E T IR O T AL E IR IN d elta am erican scan in ian d av u sair d elta WE D Y EKA m o w e fr m o tu P IC RE 16 5 10 1 40 5 21 3 28 5 F IG T L H# 11 0 92 1 55 4 22 4 9 7 A P R -C D IR O T O E atl cp h lax bs o atl
F IG T C E U E L H -S H D L F IG T L H# 11 0 55 4 92 1 22 4
Database Concepts Leo Mark
Integrity: does the model reflect reality well? Consistency: is the model without internal conflicts? a FLIGHT# in FLIGHT-SCHEDULE cannot be null because it models the existence of an entity in the real world a FLIGHT# in DEPT-AIRPORT must exist in FLIGHT-SCHEDULE because it doesnt make sense for a non-existing FLIGHTSCHEDULE entity to have a DEPT-AIRPORT
F IG T C E U E L H -S H D L F IG T L H# 11 0 55 4 92 1 AL E IR IN d elta am erican scan in ian d av u sair WE D Y EKA m o w e fr m o P IC RE 16 5 10 1 40 5 21 3 D P -A P R E T IR O T F IG T L H# 11 0 92 1 55 4 22 4 A P R -C D IR O T O E atl cp h lax bs o
q q
F IG T C E U E L H -S H D L F IG T L H# 11 0 55 4 92 1 22 4
Database Concepts Leo Mark
Null Values
CS O E UT MR
CS O E# UT MR
Lisa Smith Lisa Jones George Foreman inapplicable unknown Mary Blake
NM AE
MID NN M A E AE
D A TS A U RF TTS
Null-value unknown reflects that the attribute does apply, but the value is currently unknown. Thats ok! q Null-value inapplicable indicates that the attribute does not apply. Thats bad! q Null-value inapplicable results from the direct use of catch all forms in database design. q Catch all forms are ok in reality, but detrimental in database design. Database Concepts
Leo Mark
Normalization
F IG T C E U E L H -S H D L F IG T L H# F IG T E K A L H -WE D Y AL E IR IN d elta WE D Y EKAS m,fr o m,w o e,fr fr P IC RE 16 5 10 1 40 5 F IG T L H# 11 0 55 4 92 1 11 0 F IG T C E U E L H -S H D L F IG T L H# 11 0 55 4 92 1 11 0 55 4 55 4
Database Concepts Leo Mark
u no nr mli 11 0 a ze d
55 4 92 1
WE D Y EKA m o m o fr fr w e fr
55 4 AL E IR IN d elta am erican scan in ian d av d elta am erican am erican WE D Y EKA m o m o fr fr w e fr P IC RE 16 5 10 1 F IG T C E U E L H -S H D L F IG T L H# 11 0 55 4 92 1 AL E IR IN d elta am erican scan in ian d av 55 4
just righ t!
P IC RE 16 5 10 1 40 5
redu nda nt
40 5 16 5 10 1 10 1
customer
name-based representation
reality
name custom# addr customer
customer custom# name addr
customer
surrogate-based representation
name-based: a thing is what we know about it q surrogate-based: Das ding an sich [Kant] q surrogates are system-generated, unique, internal identifiers Database Concepts
q
Leo Mark
ER-Model Hierarchical Model Network Model Inverted Model - ADABAS Relational Model Object-Oriented Model(s)
ER-Model
q q q
q q
The ER-Model is extremely successful as a database design model Translation algorithms to many data models Commercial database design tools, e.g., ERwin No generally accepted query language No database system is based on the model
attrib te u
su set b relatio sh n ip
ty e p
d ed eriv
Database Concepts Leo Mark
attrib te u
k a u ey ttrib te
E
E 1 d x p
E 3
E 2
ER Model - Example
dt ep tim e airp rt o n e am airp rt o ad r d airp rt o co e d 1 airp rt o 1 street city zip arriv tim e cu m sto er# cu m sto er n e am reserv atio n d ate n cu m sto er n n flig t h in ce stan arriv airp rt o n 1 in ce stan o f dt ep airp rt o n p flig t h sch u ed le dm o estic flig t h
w ds eek ay
flig t# h
seat#
Database Concepts Leo Mark
ER-Model - Operations
q
Several navigational query languages have been proposed A closed query language as powerful as relational languages has not been developed None of the proposed query languages has been generally accepted
Hierarchical Model
q q q
Commercial systems include IBMs IMS, MRIs System-2000 (now sold by SAS), and CDCs MARS IV
record types: flight-schedule, flight-instance, etc. q field types: flight#, date, customer#, etc. q parent-child relationship types (1:n only!!): (flight-sched,flight-inst), (flight-inst,customer) q one record type is the root, all other record types is a child of one parent record type only q substantial duplication of customer instances q asymmetrical model of n:m relationship types
q
Database Concepts Leo Mark
q q
duplication of customer instances avoided still asymmetrical model of n:m relationship types
flig t-sch h ed flig t# h flig t-in h st d ate cu m sto er cu m sto er# d t-airp ep airp rt-co e o d cu m n e sto er am
[fo ea flig t-sch r ch h ed fo ea flig t-in w d te= 0 2 8 r ch h st ith a 1 2 9 fo ea cu m w n m J sen g th firsto e] r ch sto er ith a e= en , et e n
[fo ea flig t-sch r ch h ed fo ea flig t-in w d te= 0 2 8 g th first r ch h st ith a 1 2 9 , et e g th n tflig t-in wa erth d te] et e ex h st, h tev e a
Network Model
q q q
q q
Based on the CODASYL-DBTG 1971 report Commercial systems include, CA-IDMS and DMS-1100
q owner record types: flight-schedule, customer q member record type: reservations q DBTG-set types: FR, CR q n-m relationships cannot be modeled directly q recursive relationships cannot be modeled directly Database Concepts
Leo Mark
d ate
cu m sto er#
p rice
flig t-sch u h ed le flig t# h F R reserv n atio flig t# h cu m sto er cu m sto er# F an C are R dR d ate C R cu m n e sto er am cu m sto er#
The operations in the Network Model are generic, navigational, and procedural
q ery u :
(1 fin flig t-sch u wereflig t# F ) d h ed le h h = 2 (2 fin firstreserv no F ) d atio f R (3 fin n treserv no F ) d ex atio f R F 1 (4 fin o n o C ) d wer f R R 1 R 2 F R R 3 R 4 R 5 R 6
C 1
C R
C 4
navigation is cumbersome; tuple-at-a-time many different currency indicators multiple copies of currency indicators may be needed if the same path is traveled twice external schemata are only sub-schemata
Relational Model
q q q
Commercial systems include: ORACLE, DB2, SYBASE, INFORMIX, INGRES, SQL Server Dominates the database market on all platforms
airlin e: ch 0 ar(2 )
w d: eek ay ch ) ar(2
p rice: d ,2 ec(6 )
flig t# h
d ate
cu m sto er#
Powerful set-oriented query languages Relational Algebra: procedural; describes how to compute a query; operators like JOIN,
SELECT, PROJECT
Relational Calculus: declarative; describes the desired result, e.g. SQL, QBE insert, delete, and update capabilities
d ate .P
cu m sto er#
cu m am sto er-n e
LO E
_ c
_ c
Object-Oriented Model(s)
q q q
based on the object-oriented paradigm, e.g., Simula, Smalltalk, C++, Java area is in a state of flux object-oriented model has object-oriented repository model; adds persistence and database capabilities; (see ODMG-93, ODL, OQL) object-oriented commercial systems include GemStone, Ontos, Orion-2, Statice, Versant, O2
object-relational model has relational repository model; adds object-oriented features; (see SQL3) q object-relational commercial systems include Starburst, POSTGRES Database Concepts
q
Leo Mark
Object-Oriented Paradigm
q q q q q q q q q q q q q
object class object attributes, primitive types, values object interface, methods; body, implementations messages; invoke methods; give method name and parameters; return a value encapsulation visible and hidden attributes and methods object instance; object constructor & destructor object identifier, immutable complex objects; multimedia objects; extensible type system subclasses; inheritance; multiple inheritance operator overloading references represent relationships transient & persistent objects
O2-like syntax
class flight-instance { type tuple (flight-date: tuple ( year: integer, month: integer, day: integer); instance-of: flight-schedule, passengers: set (customer) inv customer::reservations) method add-passenger(new-passenger:customer):boolean, /*adds to passengers; invokes customer.make-reservation */ remove-passenger(passenger: customer):boolean} /*removes from passengers; invokes customer.cancel-reservation*/ class customer { type tuple (customer#: integer, customer-name: tuple ( fname: string, lname: string) reservations: set (flight-instance) inv flight-instance::passengers) method make-reservation(new-reservation: flight-instance): boolean, cancel-reservation(reservation: flight-instance): boolean}
O2-like syntax
customer-name: tuple ( fname: string, lname: string) reservations: set (flight-instance) inv flight-instance::passengers) main () { transaction::begin(); all-customers: set( customer); /*makes persistent root to hold all customers */ customer c= new customer; /*creates new customer object */ c= tuple (customer#: 111223333, customer-name: tuple( fname: Leo, lname: Mark)); all-customers += set( c); transaction::commit();}
Database Concepts Leo Mark
DATABASE ARCHITECTURE
q q q
ANSI/SPARC 3-Level DB Architecture Metadata - What is it? Why is it important? ISO Information Resource Dictionary System (ISO-IRDS)
q q q q
a database is divided into schema and data the schema describes the intension (types) the data describes the extension (data) Why? Effective! Efficient!
data data
data data
Database Concepts Leo Mark
database database
Database Concepts Leo Mark
Conceptual Schema
q
Describes all conceptually relevant, general, timeinvariant structural aspects of the universe of discourse Excludes aspects of data representation and physical organization, and access
CUSTOMER NAME ADDR SEX AGE
External Schema
q
Describes parts of the information in the conceptual schema in a form convenient to a particular user groups view Is derived from the conceptual schema
MALE-TEEN-CUSTOMER NAME ADDR TEEN-CUSTOMER(X, Y) = CUSTOMER(X, Y, S, A) CUSTOMER WHERE SEX=M AND 12<A<20; ADDR SEX AGE
NAME
Internal Schema
q
Describes how the information described in the conceptual schema is physically represented to provide the overall best performance
CUSTOMER NAME ADDR SEX AGE
B+-tree on AGE
index on NAME
NAME
PTR
database database
Database Concepts Leo Mark
Physical data independence is a measure of how much the internal schema can change without affecting the application programs
database database
Database Concepts Leo Mark
Logical data independence is a measure of how much the conceptual schema can change without affecting the application programs
Schema Compiler
The schema compiler compiles schemata and stores them in the metadatabase
metadata
compiler
schemata
Query Transformer
Uses metadata to transform a query at the external schema level to a query at the storage level
metadata
DML
data
query transformer
schema compiler
34 21 data
System metadata:
Where data came from How data were changed How data are stored How data are mapped Who owns data Who can access data Data usage history Data usage statistics
Business metadata:
What data are available Where data are located What the data mean How to access the data Predefined reports Predefined queries How current the data are
System metadata are critical in a DBMS Business metadata are critical in a data warehouse
ISO-IRDS - Why?
q q q
q q
Are metadata different from data? Are metadata and data stored separately? Are metadata and data described by different models? Is there a schema for metadata? A metaschema? Are metadata and data changed through different interfaces? Can a schema be changed on-line? How does a schema change affect data?
ISO-IRDS Architecture
DL
metaschema
metaschema; describes all schemata that can be defined in the data model data dictionary schema; contains copy of metaschema; schema for format definitions; schema for data about application data data dictionary data; schema for application data; data about application data raw formatted application data
ISO-IRDS - example
metaschema
relations rel-name att-name dom-name
operation
att-name dom-name
(u1, supplier, insert) (u2, supplier, delete) supplier s# sname (s1, smith, london) (s2, jones, boston)
location
data
Teleprocessing Database File-Sharing Database Client-Server Database - Basic Client-Server Database - w/Caching Distributed Database Federated Database Multi-Database Parallel Databases
Teleprocessing Database
dumb terminal dumb terminal dumb terminal
communication lines
OSTP AP1 AP2 DBMS OSDB AP3
mainframe
database
DB
q q
Dumb terminals APs, DBMS, and DB reside on central computer Communication lines are typically phone lines Screen formatting transmitted via communication lines User interface character oriented and primitive Dumb terminals are gradually being replaced by micros
File-Sharing Database
AP1 DBMS OSNET AP2 AP3 DBMS OSNET
micros
LAN
OSNET OSDB
DB
APs and DBMS on client micros File-Server on server micro Clients and file-server communicate via LAN Substantial traffic on LAN because large files (and indices) must be sent to DBMS on clients for processing Substantial lock contention for extended periods of time for the same reason Good for extensive query processing on downloaded snapshot data Bad for high-volume transaction processing
micros LAN
micro(s) or mainframe
database
DB
APs on client micros Database-server on micro or mainframe Multiple servers possible; no data replication Clients and database-server communicate via LAN Considerably less traffic on LAN than with file-server Considerably less lock contention than with file-server
micros
LAN
DB OSNET DBMS OSDB
DB
micro(s) or mainframe
database
DB
DBMS on server and clients Database-server is primary update site Downloaded queries are cached on clients Change logs are downloaded on demand Cached queries are updated incrementally Less traffic on LAN than with basic clientserver database because only initial query result is downloaded followed by change logs Less lock contention than with basic clientserver database for same reason
Distributed Database
AP1 AP2 DDBMS OSNET&DB AP3 DDBMS OSNET&DB
external
external
DB
Database Concepts Leo Mark
DB
DB
APs and DDBMS on multiple micros or mainframes One distributed database Communication via LAN or WAN Horizontal and/or vertical data fragmentation Replicated or non-replicated fragment allocation Fragmentation and replication transparency Data replication improves query processing Data replication increases lock contention and slows down update transactions
partitioned non-replicated
A B
C D
A B
C D
non-partitioned replicated
A B
C D
partitioned replicated
Federated Database
AP1 AP2 DDBMS OSNET&DB AP3 DDBMS OSNET&DB
federation schema export schema1 conceptual1 internal1 export schema2 conceptual2 internal2 export schema3 conceptual3 internal3
DB
Database Concepts Leo Mark
DB
DB
Each federate has a set of APs, a DDBMS, and a DB Part of a federates database is exported, i.e., accessible to the federation The union of the exported databases constitutes the federated database Federates will respond to query and update requests from other federates Federates have more autonomy than with a traditional distributed database
Multi-Database
AP1 AP2 AP3 MULTI-DBMS OSNET&DB
MULTI-DBMS OSNET&DB
conceptual1 internal1
conceptual2 internal2
conceptual3 internal3
DB
Database Concepts Leo Mark
DB
DB
Multi-Database - characteristics
q
A multi-database is a distributed database without a shared schema A multi-DBMS provides a language for accessing multiple databases from its APs A multi-DBMS accesses other databases via a network, like the www Participants in a multi-database may respond to query and update requests from other participants Participants in a multi-database have the highest possible level of autonomy
Parallel Databases
q
A database in which a single query may be executed by multiple processors working together in parallel There are three types of systems:
Shared memory Shared disk Shared nothing
q
P
processors share memory via bus extremely efficient processor communication via memory writes bus becomes the bottleneck not scalable beyond 32 or 64 processors
P M
q q
q
M P
q
Database Concepts Leo Mark
processors share disk via interconnection network memory bus not a bottleneck fault tolerance wrt. processor or memory failure scales better than shared memory interconnection network to disk subsystem is a bottleneck used in ORACLE Rdb
q
M P
scales better than shared memory and shared disk main drawbacks:
higher processor communication cost higher cost of non-local disk access
disk striping improves performance via parallelism (assume 4 disks worth of data is stored)
q q
disk mirroring improves reliability via redundancy (assume 4 disks worth of data is stored) mirroring: via copy of data (c); via bit parity (p)
p
Database Concepts Leo Mark
DATABASE CAPABILITIES
q q q q q q q
Data Storage
q q q q q
Queries
SQL queries are composed from the following:
q
Selection
Point Range Conjunction Disjunction
Set operations
Cartesian Product Union Intersection Set Difference
Join
Natural join Equi join Theta join Outer join
Other
Duplicate elimination Sorting Built-in functions: count, sum, avg, min, max
Projection
Query Optimization
select flight#, date from reserv R, cust C where R.cust#=C.cust# and cust-name=LEO;
reserv flig t# h cu m sto er cu st# d ate cu st#
cu am st-n e
flig t# d h , ate
cost: 10,000x30
cu e= eo st-nam L
cost: 10,000x3,000
cu st#
cu am L st-n e= eo
reserv
cust
reserv
cust
cost: 3,000
Query Optimization
q q q q q
Database statistics Query statistics Index information Algebraic manipulation Join strategies
Nested loops Sort-merge Index-based Hash-based
Indexing
Why Bother? q Disk access time: 0.01-0.03 sec q Memory access time: 0.000001-0.000003 sec q Databases are I/O bound q Rate of improvement of (memory access time)/(disk access time) >>1 q Things wont get better anytime soon! Indexing helps reduce I/O !
Database Concepts Leo Mark
Indexing (cont.)
q q q
Clustering vs. non-clustering Primary and secondary indices I/O cost for lookup:
Heap: Sorted file: Single-level index: Multi-level index; B+-tree: Hashing: N/2 log2(N) log2(n)+1 logfanout(n)+1 2-3
Concurrency Control
flig t-in h st flig t# h d ate # ail-seats av reserv flig t# h d ate cu m sto er#
write(reserv(flight#,date,customer1)) write(flight-inst(flight#,date,seats))}
overbooking!
Database Concepts Leo Mark
Consistency
A transaction maps a correct database state to another correct state This requires that the transaction is correct, which is the responsibility of the application programmer
Isolation
Although multiple transactions execute concurrently, i.e. interleaved, not parallel, they appear to execute sequentially This is the responsibility of the concurrency control subsystem
Durability
The effect of a completed transaction is permanent This is the responsibility of the recovery manager
deadlock and livelock possible deadlock prevention: wait-die, wound-wait deadlock detection: rollback a transaction
Optimistic protocol: proceed optimistically; back up and repair if needed Pessimistic protocol: do not proceed until knowing that no back up is needed
Recovery
reserv flig t# h d ate cu m sto er# flig t-in h st flig t# h d ate # ail-seats av
102398 50 50 50 50 50 50 49 49
Recovery (cont.)
Storage types:
q q
Errors:
q q q
Logical error: transaction fails; e.g. bad input, overflow System error: transaction fails; e.g. deadlock System crash: power failure; main memory lost, disk survives Disk failure: head crash, sabotage, fire; disk lost
What to do?
Database Concepts Leo Mark
Recovery (cont.)
q
dont change database until ready to commit write-ahead to log to disk change the database q Immediate update (UNDO/NO-REDO): write-ahead to log on disk update database anytime commit not allowed until database is completely updated q Immediate update (UNDO/REDO): write-ahead to log on disk update database anytime commit allowed before database is completely updated q Shadow paging (NO-UNDO/NO-REDO): write-ahead to log in disk keep shadow page; update copy only; swap at commit Database Concepts
Leo Mark
Security
DAC: Discretionary Access Control q is used to grant/revoke privileges to users, including access to files, records, fields (read, write, update mode) MAC: Mandatory Access Control q is used to enforce multilevel security by classifying data and users into security levels and allowing users access to data at their own or lower levels only
System Analysts Database Designers Application Developers Database Administrators End Users
System Analysts
q
communicate with each prospective database user group in order to understand its
information needs processing needs
develop a specification of each user groups information and processing needs develop a specification integrating the information and processing needs of the user groups document the specification
Database Designers
q
choose appropriate structures to represent the information specified by the system analysts choose appropriate structures to store the information in a normalized manner in order to guarantee integrity and consistency of data choose appropriate structures to guarantee an efficient system document the database design
Application Developers
q q
implement the database design implement the application programs to meet the program specifications test and debug the database implementation and the application programs document the database implementation and the application programs
Database Administrators
q
data names, formats, relationships cross-references between data and application programs (see metadata slide) Database Concepts
Leo Mark
End Users
q
Parametric end users constantly query and update the database. They use canned transactions to support standard queries and updates. Casual end users occasional access the database, but may need different information each time. They use sophisticated query languages and browsers. Sophisticated end users have complex requirement and need different information each time. They are thoroughly familiar with the capabilities of the DBMS.
Prerelational vs. Relational Database Vendors Relational Database Products Relational Databases for PCs Object-Oriented Database Capabilities
1 4 1 2 1 0 8 6 4 2 0 1 9 9 4 1 9 9 5 1 9 9 6 1 9 9 7 1 9 9 8 1 9 9 9 p e ea i n l r r l to a r l t nl ea io a
q Prerelationalm etrevenueshrinkingabout9% ear. Crrently1.8billion/year ark /y u q R nalm etrevenuegrow gabout30% ear. Crrently11.5billion/year elatio ark in /y u
Database Concepts Oject-O ted ark en e o t 5 illio /y b Leo Mark rien m et rev u ab u 1 0m n ear
Database Vendors
S b ($6 4 ) y ase 6 M
O racle
In rm (+ stra)($ 9 M fo ix Illu 4 2 )
C -ID S (+ g ($ 4 M A M In ress) 4 7 )
N C($2 1 ) E 1M
F jitsu($ 8 ) u 1 6M
COMPARISON CRITERIA Relational Model Domains Referential Integ. violation options Taylor referential messages Referential WHERE clause Updatable views w/check option Database Objects User-defined data types BLOBs Additional data types Table structure Index structure Tuning facilities
yes yes image,video,text, messaging,spatial data types heap,clustered B-tree,bitmap, hash table and index allocation
yes yes binary,image,text, money,bit, varbinary heap,clustered B-tree index pre-fetch, I/O buffer cache, block size, table partitioning
no yes byte, text up to 2GB no choice B+-tree,clustered extents, table fragmentation by expression or round robin
COMPARISON CRITERIA Relational Model Domains Ref. integrity w/check option Taylor referential messages Referential WHERE clause Updatable views w/check option Database objects User-defined data types BLOBs Additional data types Table structure Index structure Tuning facilities
no restrict,cascade, set null no no yes, including union vews yes yes large objects
yes yes
yes yes byte,longbyte,long varchar,spatial, varbyte, money B-tree,hash,heap, ISAM B-tree,hash,ISAM table&index alloc. fill factors, pre-allocation
C OM P ARISON ORAC LE7 SYB ASE SQ L IN FORM IX C RITERIA VERSION 7.3 SERVER11 ON LIN E7.2 Triggers Level row &set-based set-based row &set-based Tim ing before,after after before,after,each N esting yes yes yes Stored procedures Language PL/SQL Transact-SQL SPL N esting yes yes yes Cursors yes yes yes External calls RPC RPC system calls Events yes tim e-based no Queries Lock ing lev el table, row table, page db,table,page,row AN SI SQL com ply entry level SQL92 entry level SQL92 entry level SQL92 Cursors forw ard forw ard forw ard,back w ard Outer join yes yes yes AN SI syntax no no no APIs ODBC DBLIB,CTLIB,ODBC ESQL,TP/XA,CLI, ODBC
C O M P A R ISO N M IC R O SO F T SQ L M D B 2 2 .1 .1 C A IB C R IT E R IA D E R V E R 6.5 O P E N IN G R E S1 .2 T riggers Lev e l se t-ba s ed s et& ro w -ba se d ro w -ba sed T im ing after befo re,a fter after N esting yes y es yes Store d procedures Language Transact-SQ L SQ L, 3G L SQ L-lik e N esting yes y es yes C urso rs yes y es no Ex te rnal calls sy stem ca ll y es no (db e v e nts) Ev e nts no use r-def functions ev ent alerte rs db Q ueries Lock ing lev el db,table, pa ge ,ro w db,table, page,ro w db,table,page A N SI SQ L com ply entry le v el SQ L92entry lev e l SQ L92entry le v e l SQ L92 C urso rs forw ard,back w a rd rw ard fo fo rw ard ,rela tiv e ,absolute O ute r join yes y es yes A N SI s y ntax no no yes A P Is ESQ L,D B LIB,O D B C, L,,O D B C ESQ ESQ L,T P/X A ,O D B C D ist m gt obje cts
COMPARISON CRITERIA Database Admin Tools SNMP support Security Partial backup & recovery Internet Internet support Connectivity, Distribution Gateways to other DBMSs
ORACLE7
SYBASE SQL SERVER11 Sybase SQL Mgr SQL Monitor yes C2 configurable
OracleWebServer
web.sql
MVS source through EDA/SQL (Adabas,IDMS,SQL /DS,VSAM), any APPC source, AS/ 400,DRDA,DB2,Tur boimage,Sybase,R db,RMS,Informix,C A-Ingres,SQL Server,Teradata part of base prod yes gateways yes yes
Adabas,AS/400, DB2,IDMS,IMS, Informix,Ingres, ISAM,SQL Server, Oracle,Rdb,RMS, seq.flies,SQL/DS, SybaseSQL Server, Teradata,VSAM
Oracle,Sybase, IMS,DB2
MICROSOFT SQL IBM DB@ 2.1.1 SERVER6.5 Enterprise Mgr, Perf Monitor yes NT integrated per table DB Director,Perf Monitor, Visual Explain yes three levels yes
SNMP support Security Partial backup & recovery Internet Internet support
CA-OpenIngres/ ICE
no n/a no no yes
DB2, Datacom, IMS, IDMS, VSAM, Oracle, Rdb, Albase, Informix, Oracle, Sybase CA-OpenIngres* yes,automatic through gateways yes no
C OM P ARISON C RITERIA Replication Recording H ot standby Peer-to-peer To other DBMSs Cascading Additional restrictions N am e length Colum ns Colum n size Tables Table size Table w idth Platform s (OS)
replic. log/trigger log buffer yes yes yes yes through gatew aysDirectConnect yes yes
30 254 2GB n/a n/a by colum n m ost U N IX, OS/2, VAX/VMS, MAC, W indow sN T, W indow s95
30 18 250 2767 1962 32,767 2 billion 477 m illion storage dependent 4 terabytes 6 storage dependent 2,767 3 m ost U N IX, OS/2, m ost U N IX, VAX/VMS, MAC W indow sN T, W indow sN T, W indow s95 W indow s95,
C O M P A R ISO N C R IT E R IA Replication Recording H ot standby Peer-to-peer To other D B M Ss Cascading A dditional restrictions N am e length Colum ns Colum n size Tables Table size Table w idth Platform s (OS)
M IC R O SO F T SQ L B D B 2 2.1 .1 C A IM SER V E R 6.5 O P E N IN G R E S1 .2 log y es no through OD B C no log y es y es D ataJoiner no rules(triggers) y es y es through gatew ay s y es
18 32 255 300 4005, ex cept LOB 2008 (B LOB s 2G B ) storage dependent n/a 64G B n/a storage dependent 2008 (B LOB s 2G B ) m ost U N IX , OS/2, m ost U N IX ,V A X / V A X /V M S, M A C V M S, W indow sN T, W indow sN T, W indow s95 (CA W indow s95, OpenIngres/ D esk top
Microsoft FoxPro for Windows Microsoft FoxPro for DOS Borlands Paradox for Windows Borlands dBASE IV Paradox for DOS R:BASE Microsoft Access
Primary Use Version M gt. Recovery Transac. M gt. Composite Objects M ultiple Inherit. Concur/ Locking Distribute Support Dynamic Evolution M ultimedia Language Interface Platforms
GemStone Coop environ. yes shadowp yes no no planned 3 locks optim pesim yes yes limited yes C,C++,OPAL Smalltalk SUN3&4, Apollo,PCs, VAX/VM S change notific.
ORION-2 CAD/CAM OIS, M M yes logs & shadowp yes yes yes 5 locks
VERSANT Colab. engineer yes yes yes yes 4 locks, 2PL yes yes limited no C, C++ SUN3&4
yes yes all feature yes LISP, C Symbolics, SUN3, HP, DECstation, Apollo change notific. pri/sha db
Special Feature
Primary Use Version Mgt. Recovery Transac. Mgt. Composite Objects Multiple Inherit. Concur/ Locking Distributed Support Dynamic Evolution Multimedia Language Interface Platforms Special Feature
O2 CAD/CAM, GIS, OIS limited yes yes yes yes yes optimistic yes yes limited yes C SUN OS4.0 or higher Vis. Interf. Powerful QL
Starburst CAD/CAM, KBS no rollback yes complex objects yes rules & rollback yes yes C, C++ IBM PC, RISC 6000 -
EMERGING DB TECHNOLOGIES
q q q q q q q q
WEB databases Multimedia Databases Mobile Databases Data Warehousing and Mining Geographic Information Systems Genome Data Management Temporal Databases Spatial Databases