CH 2
With the advent of parallel processing, multiprocessing
is divided into:
Symmetric multiprocessing (SMP), or tightly coupled
Massively parallel processing (MPP), or loosely coupled
A multiprocessor that has no shared memory is called
loosely coupled.
Communication is by means of inter-processor
messages.
2.2 Overview of PP systems
CPUs and disks are used in parallel to enhance
processing performance.
Operations like data loading and query processing are
performed in parallel.
Centralized and client-server database systems are not
powerful enough to handle applications that need fast
processing.
Parallel database systems have great advantages for
online transaction processing and decision
support applications.
All processors in the system can perform their task
concurrently
Tasks may need to be synchronised
Nodes or processors usually share resources such as
data, disk and other devices.
e.g.: in a banking organisation, a number of employees
provide service to several customers simultaneously
2.2 Challenges of PP systems
Structuring tasks so that several tasks can be
executed at the same time in parallel
Preserving task sequencing so that dependent tasks
are executed serially
The parallel processing technique increases the
system performance in terms of two important
properties: Speedup and Scaleup
Speedup
More hardware can perform the same task in
less time than the original system
With good speedup, additional processors
reduce system response time.
Speedup = time_original / time_parallel
Scaleup
Scaleup is the factor that represents how
much more work can be done in the same
time period by a larger system.
With the added hardware,
scaleup holds the time as a constant and
measures the increased size of job that can be
done within that constant period of time.
Scaleup = volume_parallel / volume_original
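The two formulas above can be illustrated with a small sketch; the timings and data volumes below are made-up numbers chosen only to show the arithmetic, not measurements from any real system:

```python
# Illustrative sketch of the speedup and scaleup formulas.
# All numbers are hypothetical.

def speedup(time_original, time_parallel):
    """Speedup = time_original / time_parallel."""
    return time_original / time_parallel

def scaleup(volume_parallel, volume_original):
    """Scaleup = volume_parallel / volume_original."""
    return volume_parallel / volume_original

# A job that took 100 s on the original system finishes in 25 s
# after adding processors: linear speedup of 4.
print(speedup(100, 25))   # 4.0

# A 4x larger system processes 400 GB in the time the original
# system needed for 100 GB: linear scaleup of 4.
print(scaleup(400, 100))  # 4.0
```

With good speedup the ratio grows toward the number of added processors; with good scaleup it stays near the factor by which the system was enlarged.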
Parallel database
To improve system performance,
a parallel database allows multiple users to access a
single physical database from multiple machines.
To balance the workload among processors,
parallel databases provide concurrent access to data and
preserve data integrity.
2.2.2 Benefits of Parallel Databases
Better Performance
The improvement in performance depends on
the degree of inter-node locking and
synchronization activities.
The volume of lock operations and the
throughput determine the scalability of the
system.
Higher Availability
processors are isolated from each other, so a
failure of one node does not imply the
failure of the entire system.
One of the surviving nodes recovers the failed
node while the other nodes in the system
continue to provide data access to users.
Greater Flexibility
One can allocate or deallocate instances as
necessary.
For example, one can temporarily allocate more
instances as demand on the database increases.
When they are no longer required, these
instances can be deallocated and used for
other purposes.
Serves more users:
it is possible to overcome memory limits;
thus, a single system can serve thousands of
users.
2.3 Parallel Database Architectures
Shared memory
Shared disk
Shared nothing
Hierarchical
1. Shared Memory Architecture
Tightly coupled architecture
Processors are attached to a global shared memory
There is a large amount of cache memory at each processor.
If a processor performs a write operation to a memory
location,
the cached copies at the other processors must be
updated or removed
e.g.: A = A + 10, A = B + 10, Commit
Advantages of a shared memory system:
Data is easily accessible to any processor.
One processor can send messages to the others
efficiently.
Disadvantages of a shared memory system:
These systems are costly, with limited extensibility
and low availability.
Processor waiting time increases as more
processors are added.
2. Shared Disk System
Loosely coupled architecture
Every processor has local memory.
Multiple processors share a common set of
disks.
Also called clusters
Advantages:
Fault tolerance: If a processor or its memory fails, the other
processors can complete the task.
Disadvantage:
Limited scalability
If more processors are added the existing processors
are slowed down.
Applications of Shared Disk System:
Digital Equipment Corporation (DEC): DEC's clusters running
relational databases used the shared disk system; DEC's Rdb
database is now owned by Oracle.
3. Shared nothing system
Each processor has its own local memory and
local disk.
Any processor can act as a server to serve
the data stored on its own local disk.
Advantages:
Any number of processors and disks can be
connected as per the requirement.
This makes the system more scalable.
Disadvantages:
Data partitioning is required in a shared
nothing system.
The cost of communication for accessing a
non-local (remote) disk is much higher.
Applications of the shared nothing system:
The Teradata database machine.
The Grace and Gamma research prototypes.
4. Hierarchical System
Also known as NUMA (Non-Uniform Memory
Access)
A hybrid of shared memory system, shared disk
system and shared nothing system.
Each group of processors has a local memory.
Processors from other groups can access the
memory associated with another group in a
cache-coherent manner.
NUMA uses both local and remote memory (memory
from another group),
hence accessing remote memory takes longer than
accessing local memory.
Advantages:
Improves the scalability of the system.
Memory bottleneck(shortage of memory)
problem is minimized in this architecture.
Disadvantages:
The cost of the architecture is higher compared
to other architectures.
2.3 Parallel Database Design
A parallel database system supports parallelism
between and within queries,
i.e., inter-query and intra-query parallelism
The crucial issues in parallel database:
Data partitioning
Parallel query processing
Query optimisation
Parallel transaction management
2.3.1 Data Partitioning
For load balancing:
To distribute the workload across the
resources of a parallel system,
such as CPU, disk, main memory and network.
This is supported by the data partitioning method.
By partitioning the data equally across the
workloads of many different processors, we
achieve better performance and
better parallelism of the whole system.
There are 2 types:
Horizontal
Vertical
Horizontal Data Partitioning
Partitioning a table using conditions
specified through the WHERE clause,
distributing bunches of tuples (records)
STUDENT (Regno, SName, Address, Branch, Phone)
SELECT * FROM student WHERE Branch = 'branch
name';
e.g.: branches like 'BTech CIVIL', 'BTech MECH',
'BTech CSE'
Vertical Data Partitioning
Partitioning a table using decomposition
rules,
distributing the table into multiple partitions
vertically (different schemas)
e.g.: STUDENT into different tables like
STUDENT(Regno, SName, Address,
Branch)
STU_PHONE(Regno, Phone)
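The decomposition above can be sketched as follows; the sample rows are hypothetical, and the key Regno is kept in both partitions so the original tuples can be rebuilt by a join:

```python
# Vertical partitioning sketch: split each STUDENT tuple into two
# narrower schemas that share the key Regno:
#   STUDENT(Regno, SName, Address, Branch) and STU_PHONE(Regno, Phone)
students = [
    ("R1", "Asha", "Pune",  "BTech CSE",  "111"),
    ("R2", "Ravi", "Delhi", "BTech MECH", "222"),
]

# Project each row onto the two vertical partitions.
student_part = [(r, n, a, b) for (r, n, a, b, _) in students]
stu_phone    = [(r, p) for (r, _, _, _, p) in students]

print(student_part[0])  # ('R1', 'Asha', 'Pune', 'BTech CSE')
print(stu_phone[0])     # ('R1', '111')
```

Joining the two partitions on Regno reconstructs the original STUDENT relation without loss.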
Partitioning Strategies:
To distribute the data evenly across multiple
processors, strategies such as the following are used:
Round-robin
Hash partitioning
Range partitioning
e.g: we partition our data as:
n processors P0, P1, P2, …, Pn-1
n disks D0, D1, D2, …, Dn-1
The value of n is chosen according to requirements
Round-Robin Partitioning
The simplest form, in which records are distributed to the
disks in turn:
the first record goes to the first disk, the second record to the
second disk, and so on (record i goes to disk i mod n).
With n = 10 disks, tuple numbers 1, 11, 21, 31, 41, ..., 991 of the
Employee relation will be stored on disk 1,
and the second record goes to D2 (2 mod 10 = 2), and so on.
This scheme distributes data evenly across all the disks.
However, because tuple placement ignores attribute values,
processing a point query such as one on "city" is very difficult.
Similarly, a range query is also very difficult to process.
Excellent for applications that wish to read the
entire relation sequentially for each query.
Very difficult to process point queries and range
queries.
A point query: retrieval of the tuples from a relation
that satisfy a particular attribute value:
e.g.: the Student relation with City = "Kolkata"
A range query: retrieval of tuples from a relation
within a given range.
e.g: Emp relation with the salary range (1000,2000)
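The round-robin placement rule (record i goes to disk i mod n) can be sketched as below; the 100 Employee tuple numbers are hypothetical:

```python
# Round-robin partitioning sketch: record i goes to disk i mod n.
# n = 10 disks, numbered D0..D9, as in the running example.
n_disks = 10
disks = {d: [] for d in range(n_disks)}

for tuple_no in range(1, 101):  # hypothetical Employee tuples 1..100
    disks[tuple_no % n_disks].append(tuple_no)

print(disks[1][:3])    # [1, 11, 21] -> tuples 1, 11, 21, ... share disk 1
print(len(disks[0]))   # 10 -> every disk holds the same number of tuples
# A point query such as City = 'Kolkata' must still search all 10 disks,
# because placement never looked at any attribute value.
```

This shows why round-robin gives perfect load balance for sequential scans but no help for point or range queries.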
Hash Partitioning
Identifies one or more attributes as partitioning
attributes.
The identified partitioning attributes are taken as input
to a hash function.
Hash partitioning is ideally suited for point queries on
the partitioning attribute; because it also spreads tuples
evenly, sequential scans of the entire relation remain efficient.
Tuples are stored by applying the hash function to the
partitioning attribute.
The hash function specifies the placement of each tuple
on a particular disk.
For example, consider the following table;
EMPLOYEE(ENo, EName, DeptNo, Salary, Age)
If we choose DeptNo attribute as the partitioning
attribute
If we have 10 disks to distribute the data, then the
following would be a hash function:
h(DeptNo) = DeptNo mod 10
If we have 10 departments, then according to the hash
function, all the employees of department 1 go to
disk 1, department 2 to disk 2, and so on
(department 10 maps to disk 0).
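The slide's hash function h(DeptNo) = DeptNo mod 10 can be sketched directly; the EMPLOYEE rows are hypothetical sample data:

```python
# Hash partitioning sketch using h(DeptNo) = DeptNo mod 10.
# Schema assumed: EMPLOYEE(ENo, EName, DeptNo, Salary, Age).
employees = [
    (1, "Anil",  1, 40000, 30),
    (2, "Bina", 12, 55000, 41),   # DeptNo 12 -> disk 2 (12 mod 10)
    (3, "Chad",  1, 35000, 25),
]

def h(dept_no, n_disks=10):
    # The partitioning attribute is DeptNo; the hash value is the disk number.
    return dept_no % n_disks

disks = {d: [] for d in range(10)}
for emp in employees:
    disks[h(emp[2])].append(emp)

print(len(disks[1]))  # 2 -> both department-1 employees land on disk 1
print(len(disks[2]))  # 1 -> department 12 hashes to disk 2
```

A point query such as DeptNo = 1 now needs to read only disk h(1) = 1, which is exactly what makes hash partitioning good for point queries on the partitioning attribute.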
Range Partitioning
In Range Partitioning we identify one or more
attributes as partitioning attributes.
Then we choose a range partition vector to
partition the table across n disks.
The vector contains boundary values of the
partitioning attribute.
For example, for the Salary attribute of EMPLOYEE, the vector
[5000, 15000, 30000] defines the individual salary ranges:
5000 represents the first range (0 – 5000),
15000 represents the second range (5001 – 15000),
30000 represents the third range (15001 – 30000),
and the final range is (30001 – rest).
Hence, a vector with 3 values represents 4
disks/partitions.
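The lookup from a salary to its partition can be sketched with a binary search over the slide's vector; `range_partition` is a hypothetical helper name:

```python
import bisect

# Range partitioning sketch using the vector [5000, 15000, 30000]:
# 3 boundary values define 4 partitions:
#   0: 0-5000, 1: 5001-15000, 2: 15001-30000, 3: 30001 and above.
vector = [5000, 15000, 30000]

def range_partition(salary, boundaries=vector):
    # bisect_left returns how many boundaries are strictly below salary,
    # i.e. the partition number 0..len(boundaries).
    return bisect.bisect_left(boundaries, salary)

print(range_partition(3000))   # 0 -> first range (0 - 5000)
print(range_partition(5000))   # 0 -> a boundary value stays in its range
print(range_partition(20000))  # 2 -> third range (15001 - 30000)
print(range_partition(50000))  # 3 -> final range (30001 and above)
```

Because the partition number is found by comparing against the boundaries, a range query such as Salary BETWEEN 6000 AND 20000 needs to touch only the contiguous partitions 1 and 2.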