CH 2

Chapter 2: Review of Database Systems

2.1 Evolution of DDBS



Multiprocessor systems are computer system architectures
consisting of multiple interconnected processors.

They are classified into two basic categories:

Tightly coupled system

Loosely coupled system
Loosely Coupled Multiprocessor System

Every processor has its own memory module.

It is efficient when there is less interaction between tasks
running on different processors.

There are no memory conflicts in general.

It is considered as a Message transfer system (MTS).

It is less expensive. It has a low data rate.

It has distributed memory.

They are usually seen in distributed computing systems.
Tightly Coupled Multiprocessor System

In this system, the processors share memory modules.

It is efficient when used with real-time processing.

It provides high speed. It has memory conflicts.

They are connected through common networks.

It has a high data rate.

It is expensive.

It is usually seen in parallel processing systems.
Tightly coupled vs. loosely coupled multiprocessors

A loosely coupled multiprocessor is often called a
“message-passing” or “distributed-memory”
multiprocessor.

In massively parallel (or "loosely coupled") processing,
up to 200 or more processors can work on the same
application.

Each processor has its own operating system and
memory, but an "interconnect" arrangement of data
paths allows messages to be sent between processors.

An MPP system is also known as a "shared nothing"
system.

With the advent of parallel processing, multiprocessing
is divided into

Symmetric multiprocessing (SMP), or tightly coupled

Massively parallel processing (MPP), or loosely coupled

A multiprocessor that has no shared memory is called
loosely coupled.

Communication is by means of inter-processor
messages.
2.2 Overview of PP systems

In a parallel processing (PP) system, multiple CPUs and
disks are used in parallel to enhance processing
performance.


Operations such as data loading and query processing are
performed in parallel.


Centralized and client-server database systems are not
powerful enough to handle applications that need fast
processing.


Parallel database systems have great advantages for
online transaction processing and decision
support applications.

All processors in the system can perform their tasks
concurrently.

Tasks may need to be synchronised.

Nodes or processors usually share resources such as
data, disks and other devices.

e.g., in a bank, a number of employees serve several
customers simultaneously.
2.2 Challenges of PP systems

Structuring tasks so that several tasks can be
executed at the same time, in parallel

Preserving task sequencing where tasks must be
executed serially


The parallel processing technique increases the
system performance in terms of two important
properties: Speedup and Scaleup
Speedup

More hardware can perform the same task in
less time than the original system

With good speedup, additional processors
reduce system response time.

Speedup = time_original / time_parallel
Scaleup

Scaleup is the factor that represents how
much more work can be done in the same
time period by a larger system.

With the added hardware, scaleup holds the
time constant and measures the increased size
of the job that can be done within that period.

Scaleup = volume_parallel / volume_original
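
The two ratios above can be sketched in a few lines of Python. This is
only an illustration: the function names and the measurements passed in
are hypothetical and do not come from the slides.

```python
def speedup(time_original: float, time_parallel: float) -> float:
    """Speedup = time_original / time_parallel (same task, more hardware)."""
    if time_parallel <= 0:
        raise ValueError("time_parallel must be positive")
    return time_original / time_parallel


def scaleup(volume_parallel: float, volume_original: float) -> float:
    """Scaleup = volume_parallel / volume_original (same time, more hardware)."""
    if volume_original <= 0:
        raise ValueError("volume_original must be positive")
    return volume_parallel / volume_original


# Hypothetical measurements, chosen only for illustration.
print(speedup(time_original=100.0, time_parallel=25.0))       # 4.0
print(scaleup(volume_parallel=300.0, volume_original=100.0))  # 3.0
```

A speedup of 4.0 here means the larger system finished the same task in a
quarter of the time; a scaleup of 3.0 means it handled three times the
volume in the same time.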
Parallel database

To improve system performance, a parallel database
allows multiple users to access a single physical
database from multiple machines.

To balance the workload among processors, parallel
databases provide concurrent access to data and
preserve data integrity.
2.2.2 Benefits of Parallel Databases

Better Performance

The improvement in performance depends on
the degree of inter-node locking and
synchronization activities.

The volume of lock operations and the
throughput determine the scalability of the
system.

Higher Availability

Processors are isolated from each other, so the
failure of one node does not imply the
failure of the entire system.

One of the surviving nodes recovers the failed
node while the other nodes in the system
continue to provide data access to users.

Greater Flexibility

One can allocate or deallocate instances as
necessary.

For example, one can temporarily allocate more
instances as demand on the database increases.

When they are no longer required, these
instances can be deallocated and used for
other purposes.

Serves More Users

It is possible to overcome memory limits; thus,
a single system can serve thousands of users.
2.3 Parallel Database Architectures


Shared memory

Shared disk

Shared nothing

Hierarchical
1. Shared Memory Architecture

Tightly coupled architecture

Processors are attached to a global shared memory.

Each processor has a large amount of cache memory.

If a processor performs a write operation to a memory
location, cached copies of that data at other
processors should be updated or removed.

e.g.: A = A + 10, A = B + 10, Commit

Advantages of Shared memory system:

Data is easily accessible to any processor.

A processor can send messages to other processors
efficiently.

Disadvantages of Shared memory system:

High cost, limited extensibility and low
availability.

Waiting time of processors increases as
more processors are added.
2. Shared Disk System

Loosely coupled architecture

Every processor has local memory.

Multiple processors share a common set of
disks.

Also called clusters

Advantages:

Fault tolerance: If a processor or its memory fails, the other
processors can complete the task.

Disadvantage:

Limited scalability

If more processors are added, the existing processors
are slowed down.

Applications of Shared Disk System:

Digital Equipment Corporation (DEC): DEC clusters running
relational databases used the shared disk system; the
database product (Rdb) is now owned by Oracle.
3. Shared nothing system

Each processor has its own local memory and
local disk.

Any processor can act as a server to serve
the data which is stored on local disk.

Advantages:

Processors and disks can be added
as required.

It makes the system more scalable.

Disadvantages:

Data partitioning is required in a shared
nothing system.

The cost of communication for accessing a
non-local disk is much higher.

Applications of shared nothing system:

Teradata database machine.

The Grace and Gamma research prototypes.
4. Hierarchical System

Also known as NUMA (Non-Uniform Memory
Architecture)

A hybrid of shared memory system, shared disk
system and shared nothing system.

Each group of processors has a local memory,
but processors from other groups can access the
memory associated with another group in a
coherent manner.

NUMA uses local and remote memory (memory
from another group); hence accessing remote
memory takes longer than accessing local memory.

Advantages:

Improves the scalability of the system.

The memory bottleneck (shortage of memory)
problem is minimized in this architecture.

Disadvantages:

The cost of the architecture is higher compared
to other architectures.
2.3 Parallel Database Design

A parallel database system supports parallelism
between and within queries,

i.e., inter-query and intra-query parallelism.

The crucial issues in a parallel database:

Data partitioning

Parallel query processing

Query optimisation

Parallel transaction management
2.3.1 Data Partitioning

For load balancing:

The workload is distributed across the
resources of a parallel system, such as CPU,
disk, main memory and network.

This is supported by the data partitioning method.

By partitioning the data equally across many
processors, we achieve:

better performance

better parallelism of the whole system

2 types:

Horizontal

Vertical
Horizontal Data Partitioning

Partitions a table using conditions
specified through the WHERE clause.

Distributes groups of tuples (records).

STUDENT (Regno, SName, Address, Branch, Phone)

SELECT * FROM STUDENT WHERE Branch = '<branch name>';

e.g.: branches like 'BTech CIVIL', 'BTech MECH',
'BTech CSE'
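
As a rough sketch of the idea above, the following Python groups whole
STUDENT tuples by their Branch value, so each partition holds the rows a
WHERE Branch = ... query would select. The sample rows and the function
name are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical STUDENT rows: (Regno, SName, Address, Branch, Phone)
STUDENTS = [
    (1, "Asha", "Kolkata", "BTech CSE", "111"),
    (2, "Ravi", "Delhi", "BTech MECH", "222"),
    (3, "Mina", "Pune", "BTech CSE", "333"),
    (4, "John", "Chennai", "BTech CIVIL", "444"),
]


def horizontal_partition(rows, branch_index=3):
    """Group complete tuples by Branch: one partition per branch value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[branch_index]].append(row)
    return dict(partitions)


parts = horizontal_partition(STUDENTS)
for branch, rows in parts.items():
    # Every partition keeps the full schema; only the set of rows differs.
    print(branch, [r[0] for r in rows])
```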
Vertical Data Partitioning

Partitioning tables using the decomposition
rules

Distribute the tables into multiple partitions
vertically (different schemas)

e.g.: STUDENT is split into tables such as

STUDENT(Regno, SName, Address,
Branch)

STU_PHONE(Regno, Phone)
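
A minimal sketch of this decomposition, assuming Regno is the key that
both partitions keep so the original rows can be rebuilt. The sample rows
and function names are hypothetical.

```python
# Hypothetical STUDENT rows: (Regno, SName, Address, Branch, Phone)
ROWS = [
    (1, "Asha", "Kolkata", "BTech CSE", "111"),
    (2, "Ravi", "Delhi", "BTech MECH", "222"),
]


def vertical_partition(rows):
    """Split each row into STUDENT(Regno, SName, Address, Branch)
    and STU_PHONE(Regno, Phone)."""
    student = [(r[0], r[1], r[2], r[3]) for r in rows]
    stu_phone = [(r[0], r[4]) for r in rows]
    return student, stu_phone


def rejoin(student, stu_phone):
    """Rebuild the original rows by joining the two partitions on Regno."""
    phone_by_regno = dict(stu_phone)
    return [row + (phone_by_regno[row[0]],) for row in student]


student, stu_phone = vertical_partition(ROWS)
print(student)
print(stu_phone)
```

Keeping Regno in both tables is what makes the split lossless: rejoin()
recovers the original rows.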
Partitioning Strategies:

To distribute the data evenly across multiple
processors, strategies such as the following are used:

Round-robin

Hash partitioning

Range partitioning

e.g: we partition our data as:

n processors P0, P1, P2, …, Pn-1

n disks D0, D1, D2, …, Dn-1

The value of n is chosen according to requirements
Round-Robin Partitioning

The simplest form where data are distributed into
various disks

First record into first disk, second record into second disk,
and so on.

With 10 disks, the i-th tuple goes to disk (i mod 10): tuple
numbers 1, 11, 21, 31, 41, ..., 991 of the Employee relation
are stored on disk D1,

the second record goes to D2 (2 mod 10 = 2), and so on.

This scheme distributes data evenly across all the disks.

However, tuples are placed without regard to their attribute
values, so a point query (e.g., on City) must search every
disk and is very difficult to process.

Similarly, a range query is also very difficult to process.

Excellent for applications that wish to read the
entire relation sequentially for each query.

Very difficult to process point queries and range
queries.

A point query: retrieval of the tuples of a relation that
have a particular attribute value.

e.g: Student relation with the City = “Kolkata”

A range query: retrieval of tuples from a relation
within a given range.

e.g: Emp relation with the salary range (1000,2000)
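
The round-robin placement described above can be sketched as follows.
The sketch assumes 10 disks D0 to D9 and 1000 tuples numbered from 1
(the slides give only the sequence ending at 991, so the total is an
assumption); the function name is hypothetical.

```python
def round_robin_partition(tuples, n_disks):
    """Place the i-th tuple (counting from 1) on disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples, start=1):
        disks[i % n_disks].append(t)
    return disks


# Assumed: 1000 tuples, identified here only by their tuple number.
tuple_numbers = list(range(1, 1001))
disks = round_robin_partition(tuple_numbers, 10)

print(disks[1][:5])            # [1, 11, 21, 31, 41]
print(disks[1][-1])            # 991
print([len(d) for d in disks])  # every disk holds 100 tuples
```

Placement depends only on arrival order, so a point or range query has
no way to rule out any disk and must scan all of them.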
Hash Partitioning

Identifies one or more attributes as partitioning
attributes

It takes the identified partitioning attributes as input
to a hash function.

Hash partitioning is well suited to sequential scans of
the entire relation and to point queries on the
partitioning attribute.

Tuples are stored by applying a hashing function to an
attribute.

The hash function specifies the placement of the tuple
on a particular disk.

For example, consider the following table:

EMPLOYEE(ENo, EName, DeptNo, Salary, Age)

Suppose we choose DeptNo as the partitioning
attribute.

If we have 10 disks to distribute the data, then the
following would be a hash function:

h(DeptNo) = DeptNo mod 10

If we have 10 departments, then according to the hash
function, all the employees of department 1 will go to
disk 1, those of department 2 to disk 2, and so on.
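
The hash function above can be tried out with a short sketch. The
EMPLOYEE rows are hypothetical sample data, and the helper names are
made up for illustration; only h(DeptNo) = DeptNo mod 10 comes from the
slides.

```python
N_DISKS = 10


def h(dept_no, n_disks=N_DISKS):
    """Hash function from the example: h(DeptNo) = DeptNo mod 10."""
    return dept_no % n_disks


def hash_partition(employees, n_disks=N_DISKS):
    """Place each employee on the disk chosen by hashing DeptNo."""
    disks = [[] for _ in range(n_disks)]
    for emp in employees:
        disks[h(emp["DeptNo"], n_disks)].append(emp)
    return disks


# Hypothetical EMPLOYEE(ENo, EName, DeptNo, Salary, Age) rows.
EMPLOYEES = [
    {"ENo": 1, "EName": "Asha", "DeptNo": 1, "Salary": 4000, "Age": 30},
    {"ENo": 2, "EName": "Ravi", "DeptNo": 2, "Salary": 12000, "Age": 41},
    {"ENo": 3, "EName": "Mina", "DeptNo": 1, "Salary": 20000, "Age": 35},
    {"ENo": 4, "EName": "John", "DeptNo": 10, "Salary": 35000, "Age": 50},
]

disks = hash_partition(EMPLOYEES)

# A point query on DeptNo only needs to read one disk.
dept = 1
print([e["ENo"] for e in disks[h(dept)] if e["DeptNo"] == dept])  # [1, 3]
```

Note that with mod 10, department 10 hashes to disk 0 rather than to a
disk numbered 10.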
Range Partitioning

In Range Partitioning we identify one or more
attributes as partitioning attributes.

Then we choose a range partition vector to
partition the table across n disks.

The vector consists of values of the partitioning
attribute.

For example, take Salary as the partitioning attribute of
EMPLOYEE, with the range partition vector

[5000, 15000, 30000], where each value marks the upper
bound of a salary range:

5000 represents the first range (0 – 5000),

15000 represents the second range (5001 – 15000),

30000 represents the third range (15001 – 30000),

and a final range covers the rest (30001 and above).

Hence, the vector with 3 values represents 4
disks/partitions.
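
One way to sketch the lookup for this vector is a binary search over its
values. Using Python's bisect module is a choice made here for
illustration, not something the slides prescribe, and the function name
is hypothetical.

```python
from bisect import bisect_left

# Range partition vector from the example above.
RANGE_VECTOR = [5000, 15000, 30000]


def range_partition(salary, vector=RANGE_VECTOR):
    """Return the partition index (0..len(vector)) for a salary.

    bisect_left finds the first vector value that is >= salary, so each
    vector value is the inclusive upper bound of its range.
    """
    return bisect_left(vector, salary)


for salary in (0, 5000, 5001, 15000, 15001, 30000, 30001):
    print(salary, "-> partition", range_partition(salary))
```

Three vector values give partition indexes 0 to 3, matching the four
disks/partitions described above.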
