
Information Systems

Application - Databases

Course Introduction

Shawn A. Butler, Ph.D.


Senior Lecturer, Executive Education Program
Institute for Software Research
Carnegie Mellon University
Objectives
ƒ Understand the general course content and
student expectations
ƒ Motivate the course and relevance of Data
Management Systems
ƒ Provide an overview of important concepts that
will appear later in the course

© 2009, CMU-ISR 2
Course Structure
ƒ Foundations
• Introduction to DBMS
• Entity Relationship Conceptual Design
• Relational Model and SQL Data Definition Language
• Relational Algebra and Calculus
• SQL Data Manipulation Language

ƒ Applications I
• Database Application Development

ƒ Systems
• Overview of Storage and Indexing
• Overview of Query Evaluation
• Overview of Transaction Management

Course Structure cont’d
ƒ Applications II
• Schema Refinement, FDs, Normalization
• Physical DB Design and Tuning
• Security and Authorization

ƒ Advanced Topics
• Parallel and Distributed DBs
• Data Warehousing and Decision Support
• Data Mining
• Information Retrieval and XML Data
Management

What is a Database Management
System?
ƒ First, what is a database?
• A collection of data
• Models the activities of one or more related
organizations
• Composed of:
ƒ Entities
ƒ Relationships between entities

ƒ A DBMS is the software designed to assist
in maintaining and utilizing large
collections of data

Files vs. DBMS
ƒ Files
• Data read sequentially
• Large amounts of data have to move in and out
of main memory
• Finding data would be difficult
• Data consistency would be a problem
• Security difficult for users needing access to
only a subset of the data
• Recovery very difficult

Files vs. DBMS cont’d
ƒ DBMS provides for:
• Storage for large amounts of data that can be
quickly accessed
• Efficient management of data in and out of
main memory
• Crash recovery
• Security and Access control
• Inconsistency Management
• Special language to access and update data

Why Use a DBMS?
ƒ Data Independence → Hides the details of data
representation and storage from applications and reduces
development time
ƒ Efficient Data Access → The DBMS is optimized for data storage
and retrieval
ƒ Data Integrity and Security → The DBMS can enforce data
integrity constraints and provide sophisticated access
control mechanisms
ƒ Data Administration → Trained DB administrators can
optimize data retrieval performance
ƒ Concurrent Access and Crash Recovery → Multiple users
can access the data simultaneously, and the DBMS protects
against system failures

Data Models
ƒ A Data Model is a collection of concepts for
describing data
ƒ A Schema is a description of a particular
collection of data, using a given data
model
ƒ The Relational Model of Data, which treats a
relation much like a set of records, is the most
widely used model in modern DBMSs
ƒ The Semantic Model is a more abstract,
high-level data model that makes it easier
for users to think about the data
Models In Use
ƒ Relational Model: DB2 (IBM), Informix,
Oracle, Sybase, MS Access, Paradox,
Tandem, etc.
ƒ Hierarchical Model: IMS (IBM)
ƒ Network Model: IDS and IDMS
ƒ Object-oriented Model: Objectstore and
Versant
ƒ Object-relational Model: DBMS Products
from IBM, Oracle, Versant, et al.

Levels of Abstraction
ƒ Data Definition Language (DDL) is used to
define the external and conceptual
schemas
• Most widely used DDL is SQL

ƒ Three Levels of Abstraction
• Conceptual Schema → Describes the data in
terms of the data model of the DBMS
• Physical Schema → Describes how the relations in the
conceptual schema are actually stored on
secondary storage devices
• External Schema → Also described in terms of
the data model; allows customized user access
Levels of Abstraction

[Figure: three layers, from top to bottom: the external schemas (View 1, View 2, View 3), the Conceptual Schema, and the Physical Schema]

More on Abstraction
ƒ External Schemas or Views
• A collection of one or more views and relations
from the conceptual schema
• Similar to a relation, but records in views are
not stored in the database
• Computed using the definition for the view

ƒ Physical Schema
• How the DBMS actually stores relations → as
unordered files
• Auxiliary data structures, called indexes, which
speed up data retrieval
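The two ideas on this slide, views that are computed rather than stored and indexes as auxiliary structures, can be tried out with a small sketch. It assumes Python's built-in `sqlite3` module and an in-memory database; the `HonorRoll` view, the `idx_students_gpa` index, and the sample rows are hypothetical names and data chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("s1", "Ann", 3.9), ("s2", "Bob", 2.8)],  # illustrative rows only
)

# A view stores only its definition; its records are computed on demand.
conn.execute(
    "CREATE VIEW HonorRoll AS SELECT sid, name FROM Students WHERE gpa >= 3.5"
)

def honor_roll(conn):
    """Query the view exactly as if it were a stored relation."""
    return conn.execute("SELECT name FROM HonorRoll ORDER BY name").fetchall()

before = honor_roll(conn)
conn.execute("INSERT INTO Students VALUES ('s3', 'Cy', 3.7)")
after = honor_roll(conn)  # the view reflects the new base-table row

# An index changes only the physical schema; query answers stay the same.
conn.execute("CREATE INDEX idx_students_gpa ON Students(gpa)")
after_index = honor_roll(conn)

print(before, after, after_index)
```

Because the view is recomputed from its definition, the row added to `Students` shows up in `HonorRoll` without any extra work, and adding the index leaves the result unchanged.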

Conceptual Schema
ƒ Also called the logical schema
ƒ Consists of two parts:
• Entities
• Relationships

ƒ Conceptual Schema Example
• Entities: Students, Courses, Textbooks, Professors,
Grades, etc.
• Relationships: Enrolled, Requires, Has, Meets-in
• Students(sid:string, name:string, login:string,
age:integer, gpa:real)
• Enrolled(sid:string, cid:string, grade:string)
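As a rough illustration, the Students and Enrolled relations above could be declared in SQL DDL as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the SQLite type names, the primary-key and foreign-key choices, the helper names `build_schema` and `grades_for`, and the sample rows are all assumptions that the slide does not specify.

```python
import sqlite3

def build_schema(conn):
    """Declare the two relations from the slide using SQL DDL."""
    conn.executescript(
        """
        CREATE TABLE Students (
            sid   TEXT PRIMARY KEY,  -- assumption: sid identifies a student
            name  TEXT,
            login TEXT,
            age   INTEGER,
            gpa   REAL
        );
        CREATE TABLE Enrolled (
            sid   TEXT REFERENCES Students(sid),
            cid   TEXT,
            grade TEXT,
            PRIMARY KEY (sid, cid)   -- assumption: one grade per student per course
        );
        """
    )

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when enabled
build_schema(conn)

# Hypothetical sample data, for illustration only.
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'DB101', 'A')")

def grades_for(conn, sid):
    """Follow the Enrolled relationship from a student to course grades."""
    return conn.execute(
        "SELECT s.name, e.cid, e.grade "
        "FROM Students s JOIN Enrolled e ON s.sid = e.sid "
        "WHERE s.sid = ?",
        (sid,),
    ).fetchall()

print(grades_for(conn, "53666"))
```

Here the relationship Enrolled becomes its own table whose `sid` column refers back to Students, which is one common way to map a relationship onto the relational model.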
Data Independence
ƒ Applications insulated from how data is
structured and stored
ƒ Logical data independence: Protection
from changes in logical structure of data
ƒ Physical data independence: Protection
from changes in physical structure of data
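Logical data independence can be illustrated with a view that shields an application from a change in the conceptual schema. The sketch below assumes Python's built-in `sqlite3` module; the view `StudentInfo`, the split tables `StudentNames` and `StudentGpas`, and the helper `app_lookup` are hypothetical names introduced only for this example.

```python
import sqlite3

# The "application" only ever issues this query against the view.
APP_QUERY = "SELECT name, gpa FROM StudentInfo WHERE sid = ?"

def app_lookup(conn, sid):
    return conn.execute(APP_QUERY, (sid,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL);
    INSERT INTO Students VALUES ('s1', 'Ann', 3.9);
    CREATE VIEW StudentInfo AS SELECT sid, name, gpa FROM Students;
    """
)
before = app_lookup(conn, "s1")

# Change the logical structure: split Students into two relations and
# redefine the view over the new tables. APP_QUERY is left untouched.
conn.executescript(
    """
    DROP VIEW StudentInfo;
    CREATE TABLE StudentNames (sid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE StudentGpas  (sid TEXT PRIMARY KEY, gpa REAL);
    INSERT INTO StudentNames SELECT sid, name FROM Students;
    INSERT INTO StudentGpas  SELECT sid, gpa  FROM Students;
    DROP TABLE Students;
    CREATE VIEW StudentInfo AS
        SELECT n.sid, n.name, g.gpa
        FROM StudentNames n JOIN StudentGpas g ON n.sid = g.sid;
    """
)
after = app_lookup(conn, "s1")

print(before, after)
```

The application code is identical before and after the restructuring; only the view definition had to change.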

Concurrency Control
ƒ Concurrent access is one of the greatest benefits
of a DBMS
ƒ Concurrent execution can actually improve
DBMS performance
• Disk accesses are frequent and relatively slow,
so the CPU can work on other users' requests
while one request waits on the disk
• Interleaving actions from different users can
lead to inconsistency if left uncontrolled
• The DBMS controls the interleaving so that users
feel that they are using a single-user system

DBMS Transactions
ƒ A Transaction is an atomic sequence of
database actions (reads/writes)
ƒ The database must always be in a consistent
state after a transaction completes
ƒ Users can specify integrity constraints on
data → the DBMS will enforce these
constraints
ƒ Since the DBMS doesn’t ‘understand’ the
meaning of the data, ultimate database consistency is
the user’s responsibility
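A small sketch can make atomicity and constraint enforcement concrete. It assumes Python's built-in `sqlite3` module; the `Accounts` table, its `CHECK (balance >= 0)` constraint, and the `transfer` and `balances` helpers are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)  -- user-specified integrity constraint
    );
    INSERT INTO Accounts VALUES ('alice', 100), ('bob', 50);
    """
)

def transfer(conn, src, dst, amount):
    """Move money as one atomic transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the constraint was violated, so both updates were undone

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Accounts"))

ok_small = transfer(conn, "alice", "bob", 30)   # within alice's balance
ok_large = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
print(ok_small, ok_large, final)
```

In the failed transfer the credit to `bob` had already been applied when the debit violated the constraint, yet the rollback removes it as well. The DBMS can only check the declared constraint, though: a transfer that credited one account and never debited the other would pass the check, which is why ultimate consistency remains the user's responsibility.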

Concurrent Transactions
ƒ DBMS ensures that concurrent execution of {T1…Tn}
is equivalent to some serial execution of
T1’…Tn’
• Before reading/writing an object, a
transaction requests a lock on the object
• All locks are released at the end of the
transaction
• If Ti holds the lock on an object that Tj also
needs, Tj must wait until Ti completes its
transaction
• Deadlock can occur when interleaved
transactions are each waiting for the other
to finish
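The locking rules above can be sketched with a toy lock table. This is not how any particular DBMS implements locking: it is a single-threaded simulation, assuming exclusive locks only, and the `LockManager` class with its `request` and `finish` methods is a hypothetical design used to show waiting and deadlock detection through a wait-for chain.

```python
class LockManager:
    """Toy exclusive-lock table with a wait-for graph (no real threads)."""

    def __init__(self):
        self.owner = {}      # object -> transaction holding its lock
        self.waits_for = {}  # blocked transaction -> transaction it waits on

    def request(self, txn, obj):
        """Return 'granted', 'wait', or 'deadlock' for txn's request on obj."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            return "granted"
        # Follow the wait-for chain from the holder; reaching txn means a cycle.
        current = holder
        while current is not None:
            if current == txn:
                return "deadlock"
            current = self.waits_for.get(current)
        self.waits_for[txn] = holder
        return "wait"

    def finish(self, txn):
        """Release all of txn's locks at the end of the transaction."""
        self.owner = {o: t for o, t in self.owner.items() if t != txn}
        self.waits_for.pop(txn, None)
        # Transactions that were waiting on txn may now retry their requests.
        self.waits_for = {t: h for t, h in self.waits_for.items() if h != txn}

lm = LockManager()
steps = [
    lm.request("T1", "A"),  # T1 locks A
    lm.request("T2", "B"),  # T2 locks B
    lm.request("T1", "B"),  # T1 must wait for T2
    lm.request("T2", "A"),  # T2 would wait for T1: a cycle
]
lm.finish("T2")                 # resolve the deadlock by ending T2
retry = lm.request("T1", "B")   # T1 can now proceed
print(steps, retry)
```

Ending one of the transactions breaks the cycle, after which the other transaction's retried request succeeds.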
Incomplete Transactions
ƒ Atomicity properties ensure that either the
transaction completes in its entirety or
does not complete any part of the
transaction
ƒ The DBMS keeps a log of all actions it carries out
• Before a change is made to the database, the
log entry is forced to a safe area (Write-Ahead
Logging, or WAL, protocol)
• After a crash, the effects of a partially executed
transaction are undone using the log
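The undo step described above can be sketched in a few lines. This is a deliberately simplified model, not a real recovery algorithm: a Python list stands in for the log on stable storage, a dict stands in for the database, a missing key is recorded as an old value of `None`, and the function names `write`, `commit`, and `recover` are hypothetical.

```python
def write(db, log, txn, key, new_value):
    """Append the log record first, then apply the change (write-ahead)."""
    log.append(("write", txn, key, db.get(key), new_value))
    db[key] = new_value

def commit(log, txn):
    """Record that txn completed successfully."""
    log.append(("commit", txn))

def recover(db, log):
    """Undo the writes of transactions with no commit record, newest first."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, key, old_value, _ = rec
            if old_value is None:
                db.pop(key, None)    # the key did not exist before the write
            else:
                db[key] = old_value  # restore the logged old value
    return db

db, log = {"x": 1, "y": 2}, []
write(db, log, "T1", "x", 10)
commit(log, "T1")
write(db, log, "T2", "y", 20)
write(db, log, "T2", "x", 99)
# Simulated crash here: T2 never committed.
recovered = recover(db, log)
print(recovered)
```

T1's committed update to `x` survives, while both of T2's writes are rolled back in reverse order using the old values stored in the log.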

The Log
ƒ The following actions are recorded in the log:
• Ti writes an object: the old value and the new value
• Ti commits/aborts: a log record indicating this action
• Log records must be forced to disk

ƒ Log records are chained together by transaction id
ƒ Logs are duplexed and archived on “stable
storage”
ƒ All log-related activities are handled transparently
by the DBMS
ƒ Periodic checkpointing can reduce the time
needed to recover from a crash, but slows down
normal processing
People and DBMSs
ƒ End users and DBMS vendors
ƒ DB application programmers develop
packages that facilitate data access for end
users
ƒ Database Administrators
• Design logical/physical schemas
• Handle security and authorization
• Ensure data availability and crash recovery
• Tune the database as needs evolve
• Must understand how a database works

Database Architecture

[Figure: layered DBMS architecture. From top to bottom: SQL Commands, Query Evaluation Engine, Files and Access Methods, Buffer Manager, Disk Space Manager, DATABASE. Alongside the layers: Transaction Manager, Lock Manager, Recovery Manager]

Summary
ƒ A DBMS is used to maintain and query large
datasets
ƒ Benefits include recovery from system
crashes, concurrent access, quick
application development, data integrity
and security
ƒ Levels of abstraction give data
independence
ƒ DBMSs typically have layered architectures
ƒ It all starts with the data!