
DATABASE SYSTEMS

LECTURE -1-
Lecture(1): Introduction + Data Models
Dr. Maher Salama
Course Goals
■ The world is drowning in data!
■ Need computer scientists to help manage this data
– Help domain scientists achieve new discoveries
– Help companies provide better services (e.g., Facebook)
– Help governments (and universities!) become more efficient
■ This course includes:
– Existing tools PLUS data management principles
– This is not just a class on SQL!

Al-Ba'th University - Informatic Engineering College - 3rd year 2


Turing Awards in Data Management
■ Charles Bachman, 1973: IDS and CODASYL
■ Ted Codd, 1981: Relational model
■ Jim Gray, 1998: Transaction processing
■ Michael Stonebraker, 2014: INGRES and Postgres
You could be next!!



Course Format
■ Lectures:
– Location: here!
http://www.mediafire.com/folder/5xgrgsdb19la8/Database+Systems
– Please attend
■ Class and section participation:
– Post and answer questions (in class or by email: dr.maher.salama@gmail.com)



Textbook
■ Database Systems: The Complete Book
– Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom
– Second edition

REQUIRED READING!



Other Texts

■ Database Management Systems, Ramakrishnan

■ Fundamentals of Database Systems, Elmasri, Navathe

■ Foundations of Databases, Abiteboul, Hull, Vianu

■ Data on the Web, Abiteboul, Buneman, Suciu



Course Overview
■ Intro
■ Relational Data Models
■ Query Languages
– Data models, SQL, Relational Algebra, Datalog
■ Non-relational data
■ RDBMS internals and query optimization
■ Parallel query processing
■ DBMS usability, conceptual design
■ Transactions
Now onto the real stuff…



Outline of Today’s Lecture

■Overview of database management systems

■Data Models





Database
■ What is a database ?
– A collection of files storing related data

■ Give examples of databases


– Accounts database;
– payroll database;
– University's student database;
– Amazon’s products database;
– airline reservation database
The Evolution of Database Systems
■ In essence a database is nothing more than a collection of information that
exists over a long period of time, often many years. In common parlance, the
term database refers to a collection of data that is managed by a DBMS. The
DBMS is expected to:
1) Allow users to create new databases and specify their schemas (logical
structure of the data), using a specialized data-definition language.
2) Give users the ability to query the data (a “query” is database lingo for a
question about the data) and modify the data, using an appropriate
language, often called a query language or data-manipulation language.
3) Support the storage of very large amounts of data — many terabytes or more
— over a long period of time, allowing efficient access to the data for queries
and database modifications.
4) Enable durability, the recovery of the database in the face of failures, errors
of many kinds, or intentional misuse.
5) Control access to data from many users at once, without allowing
unexpected interactions among users (called isolation) and without allowing
actions on the data to be performed partially but not completely (called atomicity).
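As a rough illustration of items (1), (2) and (5), here is a minimal sketch using Python's built-in `sqlite3` module. The `Book` table, its rows and its prices are invented for the example, and the failure is simulated with an exception.

```python
import sqlite3

# In-memory database for the demo; a real one would live in a file so it persists.
conn = sqlite3.connect(":memory:")

# (1) Data-definition language: declare the schema.
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")

# (2) Data-manipulation / query language: modify and query the data.
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [("111", "Database Systems", 90.0), ("222", "Data on the Web", 60.0)],
)
conn.commit()
cheap = conn.execute(
    "SELECT title FROM Book WHERE price < 80 ORDER BY title"
).fetchall()

# (5) Atomicity: a transaction that fails halfway leaves no partial effect.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE Book SET price = price - 10 WHERE isbn = '111'")
        raise RuntimeError("simulated failure in the middle of the transaction")
except RuntimeError:
    pass

price_after = conn.execute(
    "SELECT price FROM Book WHERE isbn = '111'"
).fetchone()[0]
```

After the simulated failure the price of book `111` is unchanged, because the half-finished update was rolled back.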
Early Database Management Systems
■ The first commercial database management systems appeared in the late 1960’s.
■ These systems evolved from file systems, which provide some of item (3): file systems
store data over a long period of time, and they allow the storage of large amounts of
data. However, file systems do not generally guarantee that data cannot be lost if it is
not backed up, and they do not support efficient access to data items whose location in
a particular file is not known.
■ Further, file systems do not directly support item (2), a query language for the data in
files.
■ Their support for (1) — a schema for the data — is limited to the creation of directory
structures for files.
■ Item (4) is not always supported by file systems; you can lose data that has not been
backed up.
■ Finally, file systems do not satisfy (5). While they allow concurrent access to files by
several users or processes, a file system generally will not prevent situations such as
two users modifying the same file at about the same time, so the changes made by one
user fail to appear in the file.
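The lost-update situation in the last bullet can be reproduced with plain file operations. The sketch below is an assumption-laden toy: the two "users" are just two variables in one program, and the file contents are invented.

```python
import os
import tempfile

# A shared file standing in for data that two users both edit.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

def read_file(p):
    with open(p) as f:
        return f.read()

def write_file(p, text):
    with open(p, "w") as f:
        f.write(text)

write_file(path, "balance=200\n")

# Both users read the same starting contents ...
copy_a = read_file(path)
copy_b = read_file(path)

# ... each changes a private copy ...
copy_a += "note from user A\n"
copy_b += "note from user B\n"

# ... and each writes the whole file back. The file system accepts both writes.
write_file(path, copy_a)
write_file(path, copy_b)

final = read_file(path)   # only user B's change survives
os.remove(path)
```

Nothing went wrong from the file system's point of view, yet user A's change is gone. Preventing this is what item (5) asks of a DBMS.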



Early Database Management Systems…
■ The first important applications of DBMS’s were ones where data
was composed of many small items, and many queries or
modifications were made.
■ Examples of these applications are:
– Banking systems: maintaining accounts and making sure that
system failures do not cause money to disappear.
– Airline reservation systems: these, like banking systems,
require assurance that data will not be lost, and they must
accept very large volumes of small actions by customers.
– Corporate record keeping: employment and tax records,
inventories, sales records, and a great variety of other types of
information, much of it critical.
Early Database Management Systems…
■ The early DBMS’s required the programmer to visualize data much as it was
stored.
■ These database systems used several different data models for describing
the structure of the information in a database, chief among them the
“hierarchical” or tree-based model and the graph-based “network”
model. The latter was standardized in the late 1960’s through a report
of CODASYL (Conference on Data Systems Languages).
■ A problem with these early models and systems was that they did not
support high-level query languages.
■ For example, the CODASYL query language had statements that allowed
the user to jump from data element to data element, through a graph of
pointers among these elements.
■ There was considerable effort needed to write such programs, even for
very simple queries.
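To give a feel for the difference, the sketch below contrasts pointer-following code with a declarative query. It is not CODASYL syntax: the `Record` class, the company and the employees are all hypothetical, and the declarative side uses SQL through Python's `sqlite3` module.

```python
import sqlite3

# --- Navigational style: the program follows pointers itself. ---
class Record:
    def __init__(self, **fields):
        self.fields = fields
        self.children = []   # pointers to the records this one owns

acme = Record(name="Acme")
acme.children = [Record(name="Ann", salary=50), Record(name="Raj", salary=70)]

def navigate_high_earners(owner, threshold):
    result = []
    for member in owner.children:          # hop from element to element
        if member.fields["salary"] > threshold:
            result.append(member.fields["name"])
    return result

navigational = navigate_high_earners(acme, 60)

# --- Declarative style: state what is wanted, not how to reach it. ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, company TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("Ann", "Acme", 50), ("Raj", "Acme", 70)],
)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM Employee WHERE company = 'Acme' AND salary > 60"
    )
]
```

Both versions find the same employee, but the first must be rewritten whenever the pointer structure changes, while the second leaves the access path to the system.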





Database Management System
■ What is a DBMS ?
– A big program written by someone else that allows us to
manage a large database efficiently and allows it to
persist over long periods of time
■ Give examples of DBMSs
– Oracle, IBM DB2, Microsoft SQL Server, Vertica, Teradata
– Open source: MySQL (Sun/Oracle), PostgreSQL, CouchDB
– Open source library: SQLite
■ We will focus on relational DBMSs





An Example: Online Bookseller
■ What data do we need?
– Data about books, customers, pending orders, order histories, trends,
preferences, etc.
– Data about sessions (clicks, pages, searches)
– Note: data must be persistent! It must outlive the application
– Also note that data is large… it won’t all fit in memory
■ What capabilities on the data do we need?
– Insert/remove books, find books by author/title/etc., analyze past order history,
recommend books, …
– Data must be accessed efficiently, by many users
– Data must be safe from failures and malicious users





Challenges for a DBMS
■ Alice and Bob receive a $200 gift certificate as a wedding gift

■ Questions:
1. What is the ending credit?
2. What if the second book costs $130?
3. What if the system crashes?
■ Lesson: a DBMS needs to handle various scenarios
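One possible way to handle question 2 is sketched below with Python's `sqlite3` module. The `Certificate` table, the `buy` helper and the $80 price of the first book are assumptions made for the example; only the $200 credit and the $130 second book come from the scenario above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint states the rule "the credit can never go negative".
conn.execute(
    "CREATE TABLE Certificate ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Certificate VALUES (1, 200)")
conn.commit()

def buy(conn, price):
    """Deduct price in one transaction; return True if the purchase went through."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "UPDATE Certificate SET balance = balance - ? WHERE id = 1",
                (price,),
            )
        return True
    except sqlite3.IntegrityError:
        return False   # the constraint was violated, nothing was changed

alice_ok = buy(conn, 80)    # assumed price of the first book
bob_ok = buy(conn, 130)     # the $130 book from question 2
balance = conn.execute("SELECT balance FROM Certificate WHERE id = 1").fetchone()[0]
```

With these numbers the first purchase succeeds, the second is rejected as a whole, and the remaining credit is $120. Question 3 (a crash) additionally needs the changes to be logged durably, which this in-memory sketch does not show.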
What a DBMS Does
■ Describe real-world entities in terms of stored data
■ Persistently store large datasets
■ Efficiently query & update
– Must handle complex questions about data
– Must handle sophisticated updates
– Performance matters
■ Change structure (e.g., add attributes)
■ Concurrency control: enable simultaneous updates
■ Crash recovery
■ Security and integrity



Overview of a DBMS



Storage and Buffer Management
■ The buffer manager is responsible for partitioning the available main memory into
buffers, which are page-sized regions into which disk blocks can be transferred.
Thus, all DBMS components that need information from the disk will interact with
the buffers and the buffer manager, either directly or through the execution engine.
The kinds of information that various components may need include:
1) Data: the contents of the database itself.
2) Metadata: the database schema that describes the structure of, and
constraints on, the database.
3) Log Records: information about recent changes to the database; these support
durability of the database.
4) Statistics: information gathered and stored by the DBMS about data properties
such as the sizes of, and values in, various relations or other components of the
database.
5) Indexes: data structures that support efficient access to the data.
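A toy version of the buffer manager's job might look like the sketch below. The `BufferPool` class and the least-recently-used (LRU) eviction policy are assumptions for illustration; real systems use more elaborate policies and must also write modified buffers back to disk, which is omitted here.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: a fixed number of page-sized buffers with LRU eviction."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict: block id -> bytes, standing in for the disk
        self.capacity = capacity      # number of buffers main memory is split into
        self.buffers = OrderedDict()  # block id -> contents, least recently used first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)   # mark as most recently used
            return self.buffers[block_id]
        if len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)     # evict the least recently used block
        self.disk_reads += 1                     # transfer the block from disk
        self.buffers[block_id] = self.disk[block_id]
        return self.buffers[block_id]

disk = {i: f"block-{i}".encode() for i in range(4)}
pool = BufferPool(disk, capacity=2)
for block_id in [0, 1, 0, 2, 0, 1]:
    pool.get(block_id)
```

Six requests cost four disk reads here, because block 0 is requested often enough to stay in memory.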
The players
■ DB application developer:
– writes programs that query and modify data
■ DB designer: establishes schema
■ DB administrator: loads data, tunes system, keeps whole thing running
■ Data analyst:
– data mining, data integration
■ DBMS implementer: builds the DBMS
■ Research on new systems



Data Management Concepts
■ Data model
■ Declarative query language
■ Data independence
■ Query optimization
■ Physical design
■ Transactions



Data Management Concepts
■ Data models
– Relational: SQL, RA, and Datalog
– NoSQL: SQL++
■ RDBMS internals
– Relational algebra
– Query optimization and physical design
■ Parallel query processing
– Spark and Hadoop
■ Conceptual design
– E/R diagrams
– Schema normalization
■ Transactions
– Locking and schedules
– Writing DB applications
Topic groups: Data models, Query Processing, Using DBMS
Data Models…..



Data Models
■ Recall our example: want to design a database of books:
– author, title, publisher, pub date, price, etc
– How should we describe this data?
■ Data model = mathematical formalism (or conceptual
way) for describing the data



Data Models
■ Relational
– Data represented as relations
■ Semi-structured (JSON)
– Data represented as trees
■ Key-value pairs
– Used by NoSQL systems
■ Graph
■ Object-oriented



Example: storing FB friends
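One way to picture this example is to store the same small friendship data under three of the models. This is only a sketch: the names and the particular encodings are invented.

```python
import json

# Relational: a Friend relation, i.e. a set of (person, friend) tuples.
friend_relation = {("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat")}

# Key-value: look up a person, get back one value (here a list of names).
friend_kv = {}
for person, friend in sorted(friend_relation):
    friend_kv.setdefault(person, []).append(friend)

# Semi-structured: a tree, written as JSON.
friend_json = json.dumps(
    [{"name": p, "friends": fs} for p, fs in sorted(friend_kv.items())]
)

# The same question asked of each representation: who are Ann's friends?
from_relation = sorted(f for p, f in friend_relation if p == "Ann")
from_kv = friend_kv["Ann"]
from_json = next(
    d["friends"] for d in json.loads(friend_json) if d["name"] == "Ann"
)
```

All three give the same answer. What differs is which questions are easy to ask: "who lists Cat as a friend?" is a simple filter on the relation but needs a full scan of the key-value store.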



Elements of Data Models
■ Instance
– The actual data
■ Schema
– Describe what data is being stored
■ Query language
– How to retrieve and manipulate data
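The three elements can be seen side by side in a small `sqlite3` session. This is a sketch only: the `Book` table and its price are invented, and `PRAGMA table_info` is SQLite's own way of exposing the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, author TEXT, price REAL)")
conn.execute(
    "INSERT INTO Book VALUES ('Foundations of Databases', 'Abiteboul', 75.0)"
)

# Schema: what is being stored (attribute names and declared types).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Book)")]

# Instance: the actual data at this moment.
instance = conn.execute("SELECT * FROM Book").fetchall()

# Query language: retrieve and manipulate the data.
conn.execute("UPDATE Book SET price = price * 2")
new_price = conn.execute("SELECT price FROM Book").fetchone()[0]
```

The update changes the instance but leaves the schema exactly as it was.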





Relational Model
■ Data is a collection of relations / tables:

■ Mathematically, a relation is a set of tuples


– each tuple appears 0 or 1 times in the table
– order of the rows is unspecified
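Python's built-in `set` of tuples behaves the same way, which makes for a quick check of both properties. The book tuples below are invented sample data.

```python
# A relation as a set of tuples. Attribute order: (title, author, price).
book = {
    ("Database Systems", "Ullman", 90),
    ("Data on the Web", "Suciu", 60),
}

size_before = len(book)
book.add(("Data on the Web", "Suciu", 60))   # inserting an existing tuple changes nothing
size_after = len(book)

# Listing the same tuples in a different order gives the same relation.
same_relation = book == {
    ("Data on the Web", "Suciu", 60),
    ("Database Systems", "Ullman", 90),
}
```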
The Relational Data Model
■ Degree (arity) of a relation = #attributes
■ Each attribute has a type.
– Example types:
■ Strings: CHAR(20), VARCHAR(50), TEXT
■ Numbers: INT, SMALLINT, FLOAT
■ MONEY, DATETIME, …
■ A few more that are vendor-specific
– Statically and strictly enforced
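As a sketch of what degree and typed attributes mean, the snippet below checks tuples against a schema in plain Python. The attribute names follow the `Company` relation from the Foreign Key slide; the Python types chosen for them, the `conforms` helper and the sample values are assumptions.

```python
# Hypothetical schema: attribute name -> expected Python type.
company_schema = {
    "cname": str,
    "country": str,
    "no_employees": int,
    "for_profit": bool,
}

degree = len(company_schema)   # degree (arity) = number of attributes

def conforms(row, schema):
    """True if the row has the right arity and every value has its attribute's type."""
    return len(row) == len(schema) and all(
        type(value) is expected for value, expected in zip(row, schema.values())
    )

good = conforms(("Canon", "Japan", 50000, True), company_schema)
bad = conforms(("Canon", "Japan", "many", True), company_schema)   # wrong type
```

A relational DBMS performs this kind of check itself, against the types declared in the schema, whenever a tuple is inserted or updated.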



Keys
■ Key = one (or multiple) attributes that uniquely identify a record
Note: future updates to the database may create duplicate no_employees



Multi-attribute Key

Key = fName,lName
(what does this mean?)
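One way to read it: no two records may agree on both `fName` and `lName`, although either attribute alone may repeat. The sketch below tests this on a made-up instance; the `is_key` helper and the people are hypothetical.

```python
def is_key(rows, attrs):
    """True if no two rows of this instance agree on all of attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

people = [
    {"fName": "Joe", "lName": "Smith", "income": 20000},
    {"fName": "Alice", "lName": "Smith", "income": 50000},
    {"fName": "Joe", "lName": "Brown", "income": 30000},
]

fname_alone = is_key(people, ["fName"])        # "Joe" appears twice
lname_alone = is_key(people, ["lName"])        # "Smith" appears twice
pair = is_key(people, ["fName", "lName"])      # every combination is distinct
```

A check like this can only rule a key out. A key is a promise about every future instance, so it has to be declared in the schema rather than inferred from the current data.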



Multiple Keys
Figure labels: key; another key

We can choose one key and designate it as the primary key, e.g.: primary key = SSN



Foreign Key
Company(cname, country, no_employees, for_profit)
Country(name, population)
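A sketch of how these two relations might be declared in SQLite follows, assuming that `Company.country` is meant to reference `Country.name`. The column types and the sample rows (including the population figure) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce foreign keys by default

conn.execute("CREATE TABLE Country (name TEXT PRIMARY KEY, population INTEGER)")
conn.execute("""
    CREATE TABLE Company (
        cname        TEXT PRIMARY KEY,
        country      TEXT REFERENCES Country(name),  -- the foreign key
        no_employees INTEGER,
        for_profit   INTEGER
    )
""")

conn.execute("INSERT INTO Country VALUES ('Japan', 125000000)")
conn.execute("INSERT INTO Company VALUES ('Canon', 'Japan', 50000, 1)")

# A company in a country that is not in the Country relation is rejected.
try:
    conn.execute("INSERT INTO Company VALUES ('Ghost', 'Atlantis', 1, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

companies = conn.execute("SELECT cname FROM Company").fetchall()
```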



Keys: Summary
■ Key = columns that uniquely identify a tuple
– Usually we underline
– A relation can have many keys, but only one can be
chosen as primary key
■ Foreign key:
– Attribute(s) whose value is a key of a record in some
other relation
– Foreign keys are sometimes called semantic pointers



Query Language
■ SQL
– Structured Query Language
– Developed by IBM in the 70s
– Most widely used language to query relational data
■ Other relational query languages
– Datalog, relational algebra
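To show how SQL and relational algebra relate, the sketch below writes one query both ways. The `select` and `project` functions are minimal stand-ins for the algebra's selection and projection operators, and the company rows are invented.

```python
import sqlite3

rows = [("Canon", "Japan", 50000), ("Hitachi", "Japan", 30000), ("Acme", "USA", 10)]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, positions):
    """Projection: keep only the attributes at the given positions."""
    return {tuple(t[i] for i in positions) for t in relation}

# Relational algebra: project cname from (select country = 'Japan' from Company).
ra_result = project(select(set(rows), lambda t: t[1] == "Japan"), [0])

# The same query in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (cname TEXT, country TEXT, no_employees INTEGER)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", rows)
sql_result = set(conn.execute("SELECT cname FROM Company WHERE country = 'Japan'"))
```

The `WHERE` clause plays the role of selection and the `SELECT` list plays the role of projection.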



Our First DBMS
■ SQLite
■ We will switch to SQL Server later in the practical labs

Demo-1-



Discussion
■ Tables are NOT ordered
– they are sets or multisets (bags)
■ Tables are FLAT
– No nested attributes
■ Tables DO NOT prescribe how they are implemented /
stored on disk
– This is called physical data independence
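Physical data independence can be observed directly: changing how a table is stored, for instance by adding an index, does not change what a query returns. The sketch below assumes SQLite through Python's `sqlite3`; the table, rows and index name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, author TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [("A", "Ullman", 90.0), ("B", "Widom", 60.0), ("C", "Ullman", 40.0)],
)

query = "SELECT title FROM Book WHERE author = 'Ullman'"

# Results are compared as sets because row order is unspecified.
before = set(conn.execute(query))

# Change the physical design only: the schema and the query stay the same.
conn.execute("CREATE INDEX idx_book_author ON Book(author)")
after = set(conn.execute(query))

# SQLite's description of how it now plans to run the query (wording varies by version).
plan = " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query))
```

The application code that issues `query` needs no change, even though the system may now answer it through the index instead of scanning the table.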



Thank You

Any Questions?
