
Kingdom of Saudi Arabia

Ministry of Education
Imam Muhammad Ibn Saud Islamic University
College of Computer and Information Sciences

Topic 1
Database System Architecture

Dr. Abdul Khader Jilani Saudagar


Information Systems Department
Outline

• Centralized Architectures
• Client-Server Architectures
• Parallel Systems
• Distributed Systems
• Network Types

Centralized Systems
• Centralized systems: run on a single computer system and do not interact
with other computer systems
• E.g. a single-user database system on a personal computer
• A general-purpose computer system consists of one to a few CPUs and a
number of device controllers connected through a common bus that
provides access to shared memory.
• Two ways of using computers:
• Single-user system (e.g., a personal computer or workstation): a desktop unit with a
single user, usually only one CPU and one or two hard disks; the OS may support only
one user at a time. Examples of single-user operating systems include DOS, Windows 1
to 3, and Windows 95/98.
• Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS.
Serves a large number of users who are connected to the system via terminals; often
called a server system. Examples include Unix, thin-client, and mainframe operating
systems. On a Unix server, multiple remote users have access to the Unix shell at the
same time. In the thin-client case, multiple X Window sessions are spread across
various terminals, all powered by a single machine.

A Centralized Computer System

[Figure: a centralized computer system]
Centralized Systems
• Database systems designed for single users usually provide fewer
facilities than multi-user databases:
• They do not support concurrency control (not required for a
single user)
• Crash recovery is either absent or primitive
• E.g. merely making a backup of the database before any update
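The backup-before-update scheme above can be sketched as follows (a minimal illustration; the file contents and function names are invented for the example):

```python
import os
import shutil
import tempfile

def update_with_backup(db_path, apply_update):
    """Primitive crash recovery for a single-user database:
    copy the whole database file before applying an update, and
    restore the copy if the update fails partway through."""
    backup_path = db_path + ".bak"
    shutil.copy2(db_path, backup_path)     # full backup before any update
    try:
        apply_update(db_path)              # may fail mid-update
    except Exception:
        shutil.copy2(backup_path, db_path) # restore pre-update state
        raise
    finally:
        os.remove(backup_path)

# Example: a failing update leaves the database unchanged.
db = os.path.join(tempfile.mkdtemp(), "tiny.db")
with open(db, "w") as f:
    f.write("balance=100")

def bad_update(path):
    with open(path, "w") as f:
        f.write("balance=")                # partial write ...
    raise RuntimeError("crash mid-update")

try:
    update_with_backup(db, bad_update)
except RuntimeError:
    pass
with open(db) as f:
    print(f.read())                        # the original contents survive
```

Note that the whole database is copied on every update, which is exactly why the slide calls this approach primitive.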

Client-Server Systems
• Client server is network architecture which separates a client (often an application that
uses a graphical user interface) from a server. Each instance of the client software can
send requests to a server. While their purposes vary somewhat, the basic architecture
remains the same.

• Server systems satisfy requests generated at m client systems, whose general structure is
shown below

• Centralized Systems act as Server Systems that satisfy requests generated by client
systems

Client-Server Systems (Cont.)
• The database functionality can be divided into:
• Back-end: manages access structures, query evaluation and optimization,
concurrency control and recovery.
• Front-end: consists of tools such as forms, report-writers, and graphical user
interface facilities.
• The interface between the front-end and the back-end is through SQL or
through an application program interface (JDBC, ODBC).
• Thanks to standardization, the front-end and the back-end can be provided by
different vendors.

Client-Server Systems (Cont.)
• Advantages of replacing mainframes with networks of
workstations or personal computers connected to back-end
server machines:
• better functionality for the cost
• flexibility in locating resources and expanding facilities
• better user interfaces
• easier maintenance
• Server systems can be broadly categorized into two kinds:
• transaction servers, which are widely used in relational database
systems, and data servers, which are used in object-oriented database
systems

Transaction Servers
• Transaction server: also called a query server or SQL server
system; it provides an interface through which clients can make
requests
• Clients send requests to the server system, where the
transactions are executed, and results are shipped back to the
client.
• Requests are specified in SQL, and sent to the server
through a remote procedure call (RPC) mechanism.
• Open Database Connectivity (ODBC) is a C-language
application program interface standard from Microsoft for
connecting to a server, sending SQL requests, and
receiving results.
• JDBC is a similar standard to ODBC, for Java
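The request/result cycle that ODBC and JDBC standardize can be illustrated with Python's DB-API and the standard-library SQLite driver (a stand-in here: a real client-server setup would connect to a remote server rather than an in-memory database):

```python
import sqlite3

# Connect (here to an in-memory database; with ODBC/JDBC this would
# be a network connection to a database server).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The client sends SQL text to the server ...
cur.execute("CREATE TABLE account(id INTEGER PRIMARY KEY, balance REAL)")
cur.execute("INSERT INTO account(balance) VALUES (?)", (100.0,))
conn.commit()

# ... and the server ships the results back.
cur.execute("SELECT id, balance FROM account")
rows = cur.fetchall()
print(rows)   # [(1, 100.0)]
conn.close()
```

The same connect / execute / fetch pattern appears in ODBC's C API and in JDBC's `Connection`, `Statement`, and `ResultSet` interfaces.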

Transaction Server Process Structure

• A typical transaction server consists of multiple processes
accessing data in a shared memory.
• The processes that form part of the database system
include:
• Server processes
• These processes receive user queries (transactions), execute them, and
send the results back.
• Processes may be multi-threaded, allowing a single process to execute
several user queries concurrently.
• Definition: a thread is like a process, but multiple threads execute as part of the
same process.
• Lock manager process
• This process implements lock manager functionality, which includes lock
grant, lock release, and deadlock detection.
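A multi-threaded server process of the kind described above might be sketched like this (a toy model: query execution is faked with a string, and the queue stands in for the client connections):

```python
import queue
import threading

requests = queue.Queue()      # incoming user queries
results = {}                  # query id -> result shipped back to the client
results_lock = threading.Lock()

def worker():
    """One server thread: take a query, execute it, send the result back."""
    while True:
        qid, sql = requests.get()
        if sql is None:                  # shutdown marker
            break
        result = f"executed: {sql}"      # stand-in for real query evaluation
        with results_lock:               # results dict is shared state
            results[qid] = result

# One process, several threads, each executing user queries concurrently.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for i, q in enumerate(["SELECT 1", "SELECT 2", "SELECT 3"]):
    requests.put((i, q))
for _ in threads:
    requests.put((None, None))           # one shutdown marker per thread
for t in threads:
    t.join()
print(results)
```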

Transaction Server Processes (Cont.)

• Database writer process
• Outputs modified buffer blocks to disk continually
• Log writer process
• This process outputs log records from the log record buffer to a stable
storage (hard disk)
• Log: history of previous transactions.
• Checkpoint process
• Performs periodic checkpoints (control procedures)
• “Process monitor” process
• It monitors other processes, and takes recovery actions if any of the
other processes fail
• E.g. aborting any transactions being executed by the failed process and
restarting it.

Transaction System Processes (Cont.)

[Figure: transaction server process structure]
Transaction System Processes (Cont.)
• Shared memory contains all shared data, such as:
• Buffer pool: A buffer pool is memory used to cache table and index data
pages as they are being read from disk, or being modified.
• Lock table
• Log buffer: containing log records waiting to be output to the log on stable
storage
• Cached query plans (reused if same query submitted again)
• All database processes can access shared memory
• To ensure that no two processes access the same data
structure at the same time, database systems implement mutual
exclusion using operating-system semaphores or atomic instructions
• A semaphore is a protected variable and constitutes the classic method for
restricting access to equivalent shared resources (e.g. storage) in a
multiprogramming environment. (Definition from Wikipedia)
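A minimal sketch of semaphore-based mutual exclusion (the shared counter stands in for a shared database structure; without the semaphore, concurrent increments could be lost):

```python
import threading

mutex = threading.Semaphore(1)   # binary semaphore: at most one holder
shared = {"pins": 0}             # stand-in for a shared buffer-pool structure

def pin_pages(n):
    for _ in range(n):
        mutex.acquire()          # enter the critical section
        shared["pins"] += 1      # safe: no other thread is in here
        mutex.release()          # leave the critical section

workers = [threading.Thread(target=pin_pages, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(shared["pins"])   # 40000: every increment was applied exactly once
```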

Transaction System Processes (Cont.)
• To avoid overhead of inter-process communication (message passing) for lock
request/grant, each database process operates directly on the lock table data
structure instead of sending requests to lock manager process
• Mutual exclusion ensured on the lock table using semaphores, or more commonly,
atomic instructions
• If a lock can be obtained, the lock table is updated directly in shared memory
• If a lock cannot be obtained immediately, the request is noted in the lock table and
the process (or thread) waits for the lock to be granted
• When a lock is released, the releasing process updates the lock table to record the
release, as well as the grant of the lock to waiting requests (if any)
• Process/thread waiting for lock may either:
• Continually scan lock table to check for lock grant, or
• Use operating system semaphore mechanism to wait on a semaphore.
• Semaphore identifier is recorded in the lock table
• When a lock is granted, the releasing process signals the
semaphore to tell the waiting process/thread to proceed

• Lock manager process still used for deadlock detection
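The grant/wait protocol described above can be sketched as follows (a toy exclusive-lock table; each waiter blocks on its own semaphore, which the releasing thread signals, as on the slide):

```python
import threading

table_mutex = threading.Lock()   # mutual exclusion on the lock table
lock_table = {}   # item -> {"holder": name or None, "waiters": [(name, sem)]}

def lock(item, name):
    with table_mutex:
        entry = lock_table.setdefault(item, {"holder": None, "waiters": []})
        if entry["holder"] is None:
            entry["holder"] = name             # grant immediately
            return
        sem = threading.Semaphore(0)
        entry["waiters"].append((name, sem))   # note the request in the table
    sem.acquire()                              # wait until the releaser signals us

def unlock(item):
    with table_mutex:
        entry = lock_table[item]
        if entry["waiters"]:
            name, sem = entry["waiters"].pop(0)
            entry["holder"] = name             # grant to the first waiter ...
            sem.release()                      # ... and wake it up
        else:
            entry["holder"] = None

order = []
lock("page7", "T1")
def t2():
    lock("page7", "T2")                        # blocks until T1 unlocks
    order.append("T2 got lock")
    unlock("page7")
th = threading.Thread(target=t2)
th.start()
order.append("T1 releasing")
unlock("page7")
th.join()
print(order)   # ['T1 releasing', 'T2 got lock']
```

Deadlock detection is deliberately absent here; as the slide notes, that task still falls to the lock manager process.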

Data Servers
• Used in LANs, where there is a very high speed connection between
the clients and the server, the client machines are comparable in
processing power to the server machine, and the tasks to be executed
are computation intensive.
• Data servers ship (send) data to client machines, where processing is
performed, and results are then shipped back to the server machine.
• This architecture requires full back-end functionality at the clients.
• Data-server architectures have been used particularly in many object-
oriented database systems
• Data servers raise some interesting issues:
• Page-Shipping versus Item-Shipping
• Locking
• Data Caching
• Lock Caching

Data Servers (Cont.)
• Page-Shipping versus Item-Shipping
• The unit of communication for data can be of coarse granularity, such as a page, or
of fine granularity, such as an item (object, tuple).
• A smaller unit of shipping means more messages.
• Page shipping is a form of pre-fetching: items are sent even before they are requested.
• It is worth pre-fetching related items along with a requested item
• Locking
• Locks are usually granted by the server for the data items that it ships to the client
machines
• Drawback of page shipping: client machines may be granted locks of too coarse a
granularity: a lock on a page implicitly locks all the items contained in the page
• Locks on a pre-fetched item can be called back by the server, and returned by the
client transaction if the pre-fetched item has not been used.
• Lock de-escalation is a technique for reducing the impact of coarse-granularity locks
when lock conflicts occur.
• Locks on unused items can be returned to the server, which can then allocate them to other
clients.

Data Servers (Cont.)
• Data Caching
• Data can be cached at the client even in between transactions
• But the client must check that the data is up to date before using it
(cache coherency)
• The check can be done when requesting a lock on the data item
• Lock Caching
• Locks can be retained by client system even in between
transactions
• Transactions can acquire cached locks locally, without contacting
server
• Server calls back locks from clients when it receives conflicting
lock requests. Client returns lock once no local transaction is
using it.
• Similar to deescalation, but across transactions.
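Lock caching with server call-backs might be sketched like this (a single-process toy model; the `Server` and `Client` classes and their methods are invented for illustration, and a real client would wait for its local transaction to finish before honoring a call-back):

```python
class Server:
    def __init__(self):
        self.lock_holders = {}   # item -> client holding the (cached) lock

    def request_lock(self, item, client):
        holder = self.lock_holders.get(item)
        if holder is not None and holder is not client:
            holder.call_back(item)     # conflicting request: recall the lock
        self.lock_holders[item] = client
        client.cached.add(item)

class Client:
    def __init__(self, name):
        self.name = name
        self.cached = set()   # locks retained between transactions
        self.in_use = set()   # locks used by the current transaction

    def acquire(self, server, item):
        if item not in self.cached:    # cache miss: contact the server
            server.request_lock(item, self)
        self.in_use.add(item)          # cache hit: no server round trip

    def call_back(self, item):
        # Simplification: a real client would first wait for its local
        # transaction to stop using the lock.
        self.cached.discard(item)
        self.in_use.discard(item)

server = Server()
a, b = Client("A"), Client("B")
a.acquire(server, "x")   # A gets the lock from the server and caches it
a.in_use.clear()         # A's transaction ends; the lock stays cached
a.acquire(server, "x")   # reacquired locally, no server contact needed
b.acquire(server, "x")   # conflict: the server calls the lock back from A
print(sorted(a.cached), sorted(b.cached))   # [] ['x']
```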

Parallel Systems
• Definition in WikiPedia
• A parallel database system seeks to improve performance through
parallelization of various operations, such as loading data, building
indexes and evaluating queries. Although data may be stored in a
distributed fashion, the distribution is governed solely by
performance considerations. Parallel database improves processing
and input/output speeds by using multiple CPUs and disks in
parallel.
• Centralized and client-server database systems are not powerful
enough to handle applications that must query extremely large
databases or process an extremely large number of transactions per
second. In parallel processing, many operations are performed
simultaneously, as opposed to serial processing, in which the
computational steps are performed sequentially.

Parallel Systems
• Parallel database systems consist of multiple processors
and multiple disks connected by a fast interconnection
network.
• A coarse-grain parallel machine consists of a small
number of powerful processors
• A massively parallel or fine-grain parallel machine
utilizes thousands of smaller processors.
• Two main performance measures:
• throughput --- the number of tasks that can be completed in a given
time interval
• response time --- the amount of time it takes to complete a single
task from the time it is submitted

Parallel Systems: What For?

• A system that processes a large number of small transactions can
improve throughput by processing many transactions in parallel.
• A system that processes large transactions can improve response
time as well as throughput by performing subtasks of each
transaction in parallel.

• Running a given task in less time by increasing the degree of
parallelism is called speed-up.
• Handling larger tasks by increasing the degree of parallelism is
called scale-up.
Speed-Up and Scale-Up
• Speed-up: how much faster a parallel system (a large system with
more processors) is than a corresponding sequential system (a small
system).
• It measures how much we can increase the processing speed of a single
task by increasing parallelism.
• Measured by:

  speedup = (execution time of task Q on the sequential system) /
            (execution time of task Q on the parallel system)

• Speed-up is said to be linear if this ratio equals N, where the parallel
system has N times the resources.
• Scale-up: the ability to process larger tasks in the same amount of
time by providing more resources.
• It measures how well we can handle an increased number of transactions by
increasing parallelism.
• Measured by (task QN is N times bigger than Q):

  scaleup = (execution time of task Q on a sequential machine MS) /
            (execution time of task QN on a parallel machine N times bigger than MS)

• Scale-up is linear if this ratio equals 1.
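The two ratios can be computed directly (the timings below are invented for illustration):

```python
def speedup(t_sequential, t_parallel):
    """How much faster the parallel system runs the SAME task Q."""
    return t_sequential / t_parallel

def scaleup(t_small_task_small_machine, t_big_task_big_machine):
    """Ratio of the time for task Q on machine MS to the time for the
    N-times-bigger task QN on the N-times-bigger machine."""
    return t_small_task_small_machine / t_big_task_big_machine

N = 8
print(speedup(80.0, 10.0))   # 8.0: linear speed-up with N = 8 times the resources
print(speedup(80.0, 16.0))   # 5.0: sub-linear speed-up
print(scaleup(10.0, 10.0))   # 1.0: linear scale-up
print(scaleup(10.0, 12.5))   # 0.8: sub-linear scale-up
```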

Speed-up
• Linear speed-up: the parallel system exhibits linear speed-up if the speed-up is N
when the larger system has N times the resources (CPU, disk, and so on).
• Sub-linear speed-up: the parallel system exhibits sub-linear speed-up if the
speed-up is less than N when the larger system has N times the resources.

Scaleup
• Linear scale-up: the parallel system exhibits linear scale-up if the scale-up
equals 1 when the larger system has N times the resources.
• Sub-linear scale-up: the parallel system exhibits sub-linear scale-up if the
scale-up is less than 1 when the larger system has N times the resources.

Batch and Transaction Scaleup
There are two kinds of scale-up that are relevant in parallel database systems,
depending on how the size of the task is measured

• Batch scale-up:
• It means that the size of the database increases, and the tasks, whose runtime
depends on the size of the database, become large jobs.
• E.g. scan of a relation has a size proportional to the size of the database
• In this case, the size of the database is a measure of the size of the problem
• Transaction scale-up
• It means that the rate at which transactions are submitted to the database
increases and the size of the database increases proportionally to the
transaction rate.
• N-times as many users submitting requests (hence, N-times as many requests)
to an N-times larger database, on an N-times larger computer.
• Well-suited to parallel executions, since transactions can run concurrently and
independently on separate processors.

Factors Limiting Speedup and Scaleup

• For parallelism to have its full effect, we should have linear
scale-up and linear speed-up.
• However, speed-up and scale-up are often sub-linear.
• This means that some factors limit the efficiency of using
parallelism in databases.

Factors Limiting Speedup and Scaleup
• A number of factors work against efficient parallel operation
and can diminish both speed-up and scale-up
• Startup costs : There is a startup cost associated with initiating
a single process. Cost of starting up multiple processes may
dominate computation time, if the degree of parallelism is high
(many processors).
• Interference : Processes accessing shared resources
(e.g.,system bus, disks, or locks) compete with each other, thus
spending time waiting on other processes, rather than
performing useful work.
• Skew: increasing the degree of parallelism increases the
variance in the service times of tasks executing in parallel. The
overall execution time is determined by the slowest of the tasks
executing in parallel (refer to the example in the textbook).
Interconnection Network Architectures

● Parallel systems consist of a set of components (processors,
memory, and disks) that can communicate with each other
via an interconnection network.
● There are three commonly used types of interconnection.

Interconnection Network Architectures
Bus. System components send data on and receive data from a single
communication bus (Ethernet, for example).
  Does not scale well with increasing parallelism.
Mesh. Components are arranged as nodes in a grid, and each component is
connected to all adjacent components.
  The number of communication links grows with the number of components, so a mesh
  scales better.
  Each of the n components can reach any other via at most 2(√n - 1) links.
Hypercube. Components are numbered in binary; two components are connected
if their binary representations differ in exactly one bit.
  Each of the n components is connected to log2(n) other components, and any component
  can reach any other via at most log2(n) links.
  This reduces communication delays (lower than in a mesh).
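The hypercube wiring rule (connect components whose binary numbers differ in exactly one bit) and its log2(n) worst-case distance can be checked directly:

```python
from math import log2

def hypercube_neighbors(i, n):
    """Components connected to component i in an n-node hypercube
    (n must be a power of two): flip each of the log2(n) bits in turn."""
    bits = int(log2(n))
    return [i ^ (1 << b) for b in range(bits)]

def distance(i, j):
    """Links needed to go from i to j: the number of differing bits."""
    return bin(i ^ j).count("1")

n = 8                                 # a 3-dimensional hypercube
print(hypercube_neighbors(0, n))      # [1, 2, 4]: log2(8) = 3 neighbors
print(distance(0, 7))                 # 3 = log2(8), the worst case
print(max(distance(i, j) for i in range(n) for j in range(n)))   # 3
```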

Parallel Database Architectures
• There are several architectural models for
parallel machines. The most important
architectures are:
• Shared memory -- processors share a common memory
• Shared disk -- processors share a common set of disks
(clusters)
• Shared nothing -- processors share neither a common
memory nor common disk
• Hierarchical -- hybrid of the above architectures

Parallel Database Architectures

[Figure: the four parallel database architectures]
Shared Memory
• Processors and disks have access to a common memory,
typically via a bus or through an interconnection network.
• Extremely efficient communication between processors:
data in shared memory can be accessed by any processor
without software intervention to move it.
• A processor can send messages to other processors much faster by
using memory writes (less than 1 ms) than by sending a message
through a communication mechanism
• Drawback: the architecture is not scalable beyond 32 or 64
processors, since the bus or the interconnection network
becomes a bottleneck
• Widely used for lower degrees of parallelism (4 to 8).

Shared Disk
• All processors can directly access all disks via an
interconnection network, but the processors have
private memories.
• The memory bus is not a bottleneck
• Architecture provides a degree of fault-tolerance — if a
processor fails, the other processors can take over its tasks
since the database is resident on disks that are accessible from
all processors.
• Drawback: bottleneck now occurs at interconnection to
the disk subsystem.
• Shared-disk systems can scale to a somewhat larger
number of processors, but communication between
processors is slower.
Shared Nothing
• A node consists of a processor, memory, and one or more disks.
Processors at one node communicate with processors at other
nodes using an interconnection network.
• A node operates as a server for the data on its own disk(s).
• Data accessed from local disks (and local memory accesses) do not
pass through interconnection network, thereby minimizing the
interference of resource sharing.
• Shared-nothing multiprocessors can be scaled up to thousands of
processors without interference.
• Main drawback: cost of communication and non-local disk access;
sending data involves software interaction at both ends.

Hierarchical
• It combines the characteristics of shared-memory, shared-disk, and
shared-nothing architectures.
• Top level is a shared-nothing architecture – nodes connected by an
interconnection network, and do not share disks or memory with each
other.
• Each node of the system could be a shared-memory system with a few
processors.
• Alternatively, each node could be a shared-disk system, and each of
the systems sharing a set of disks could be a shared-memory system.
• The complexity of programming such systems can be reduced by
distributed virtual-memory architectures

Attempts to reduce the complexity of programming such systems
have yielded distributed virtual-memory architectures, where
logically there is a single shared memory, but physically there are
multiple disjoint memory systems; the virtual-memory mapping
hardware, coupled with system software, allows each processor to
view the disjoint memories as a single virtual memory. Since
access speeds differ depending on whether the page is available
locally or not, such an architecture is also referred to as a
nonuniform memory architecture (NUMA).

Distributed Systems
Definition in Wikipedia
A distributed database is a database that is under the control of a central database management
system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in
multiple computers located in the same physical location, or may be dispersed over a network of
interconnected computers.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A
distributed database is divided into separate partitions/fragments. Each partition/fragment of a
distributed database may be replicated.
Besides distributed database replication and fragmentation, there are many other distributed database
design technologies. For example, local autonomy, synchronous and asynchronous distributed
database technologies. These technologies' implementation can and does definitely depend on the
needs of the business and the sensitivity/confidentiality of the data to be stored in the database. And
hence the price the business is willing to spend on ensuring data security, consistency and integrity.

Link: http://en.wikipedia.org/wiki/Distributed_database

Distributed Systems
• Data is spread over multiple machines (also referred to as sites or nodes).
• E.g. a bank with different branches, where each branch has its own database
• A network interconnects the machines.
• In a distributed database system, the database is stored on different
computers.
• They do not share main memory or disks
• Data is shared by users on multiple machines.
• The computers in a distributed system communicate with each other
through various communication media, such as high-speed networks
or telephone lines.
• The computers in a distributed system may vary in size and function,
ranging from workstations up to mainframe systems.
Distributed Databases
• Differences from shared-nothing parallel databases:
• distributed databases are geographically distributed
• they are separately administered
• the interconnection is slower
• Homogeneous distributed databases
• Same software/schema on all sites, data may be partitioned among sites
• Goal: provide a view of a single database, hiding details of distribution
• Heterogeneous distributed databases
• Different software/schema on different sites
• Goal: integrate existing databases to provide useful functionality
• Differentiate between local and global transactions
• A local transaction accesses data in the single site at which the transaction
was initiated.
• A global transaction either accesses data in a site different from the one at
which the transaction was initiated or accesses data in several different sites.

Trade-offs in Distributed Systems
There are several reasons to build distributed database systems, including:

• Sharing data – users at one site are able to access data residing
at other sites (e.g. different branches of a bank).
• Autonomy – each site is able to retain a degree of control over data
stored locally.
• A global database administrator is responsible for the entire system, but also
delegates part of the responsibility to local database administrators
• Higher system availability through redundancy — data can be
replicated at remote sites, and system can function even if a site fails.
• Disadvantage: the added complexity required to ensure proper
coordination among sites, which brings:
• Software development cost
• Greater potential for bugs
• Increased processing overhead
• See an example of distributed database systems in the textbook.


End of Topic 1
IS 371 : Advanced Database Management Systems
