CH 2
With the advent of parallel processing, multiprocessing
is divided into:
Symmetric multiprocessing (SMP), or tightly coupled
Massively parallel processing (MPP), or loosely coupled
A multiprocessor that has no shared memory is called
loosely coupled.
Communication is by means of inter-processor
messages.
2.2 Overview of PP systems
CPUs and disks are used in parallel to enhance
processing performance.
Operations like data loading and query processing are
performed in parallel.
Centralized and client-server database systems are not
powerful enough to handle applications that need fast
processing.
Parallel database systems have great advantages for
online transaction processing and decision
support applications.
All processors in the system can perform their task
concurrently
Tasks may need to be synchronised
Nodes or processors usually share resources such as
data, disk and other devices.
e.g.: in a banking organisation, a number of employees
provide service to several customers simultaneously
2.2 Challenges of PP systems
Structuring tasks so that several tasks can be
executed at the same time in parallel
Preserving task sequencing so that dependent tasks
are executed serially
The parallel processing technique increases the
system performance in terms of two important
properties: Speedup and Scaleup
Speedup
More hardware can perform the same task in
less time than the original system
With good speedup, additional processors
reduce system response time.
Speedup = time_original / time_parallel
Scaleup
Scaleup is the factor that represents how
much more work can be done in the same
time period by a larger system.
With the added hardware,
scaleup holds the time as a constant and
measures the increased size of job that can be
done within that constant period of time.
Scaleup = volume_parallel / volume_original
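The two formulas above can be illustrated with a small sketch; the timings and data volumes below are made-up numbers chosen only to show the arithmetic, not measurements from any real system:

```python
# Illustrative sketch of the speedup and scaleup formulas.
# All numbers are hypothetical.

def speedup(time_original, time_parallel):
    """Speedup = time_original / time_parallel."""
    return time_original / time_parallel

def scaleup(volume_parallel, volume_original):
    """Scaleup = volume_parallel / volume_original."""
    return volume_parallel / volume_original

# A job that took 100 s on the original system finishes in 25 s
# after adding processors: linear speedup of 4.
print(speedup(100, 25))   # 4.0

# A 4x larger system processes 400 GB in the time the original
# system needed for 100 GB: linear scaleup of 4.
print(scaleup(400, 100))  # 4.0
```

With good speedup the ratio grows toward the number of added processors; with good scaleup it stays near the factor by which the system was enlarged.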
Parallel database
To improve system performance,
a parallel database allows multiple users to access a
single physical database from multiple machines.
To balance the workload among processors,
parallel databases provide concurrent access to data and
preserve data integrity.
2.2.2 Benefits of Parallel Databases
Better Performance
The improvement in performance depends on
the degree of inter-node locking and
synchronization activities.
The volume of lock operations and the
throughput determine the scalability of the
system.
Higher Availability
processors are isolated from each other, so a
failure of one node does not imply the
failure of the entire system.
One of the surviving nodes recovers the failed
node while the other nodes in the system
continue to provide data access to users.
Greater Flexibility
One can allocate or deallocate instances as
necessary.
For example, one can temporarily allocate more
instances as demand on the database increases.
When they are no longer required, these
instances can be deallocated and used for
other purposes.
Serves more users:
it is possible to overcome memory limits;
thus, a single system can serve thousands of
users.
2.3 Parallel Database Architectures
Shared memory
Shared disk
Shared nothing
Hierarchical
1. Shared Memory Architecture
Tightly coupled architecture
Processors are attached to a global shared memory
There is a large amount of cache memory at each processor.
If a processor performs a write operation to a memory
location,
the cached copies at the other processors must be
updated or removed
e.g.: A = A + 10, A = B + 10, Commit
Advantages of a shared memory system:
Data is easily accessible to any processor.
One processor can send messages to the others
efficiently.
Disadvantages of a shared memory system:
These systems are costly, with limited extensibility
and low availability.
Processor waiting time increases as more
processors are added.
2. Shared Disk System
Loosely coupled architecture
Every processor has local memory.
Multiple processors share a common set of
disks.
Also called clusters
Advantages:
Fault tolerance: If a processor or its memory fails, the other
processors can complete the task.
Disadvantage:
Limited scalability
If more processors are added the existing processors
are slowed down.
Applications of Shared Disk System:
Digital Equipment Corporation (DEC): DEC's clusters running
relational databases used the shared disk system; DEC's Rdb
database is now owned by Oracle.
3. Shared nothing system
Each processor has its own local memory and
local disk.
Any processor can act as a server to serve
the data stored on its own local disk.
Advantages:
Any number of processors and disks can be
connected as per the requirement.
This makes the system more scalable.
Disadvantages:
Data partitioning is required in a shared
nothing system.
The cost of communication for accessing a
non-local (remote) disk is much higher.
Applications of the shared nothing system:
The Teradata database machine.
The Grace and Gamma research prototypes.
4. Hierarchical System
Also known as NUMA (Non-Uniform Memory
Access)
A hybrid of shared memory system, shared disk
system and shared nothing system.
Each group of processors has a local memory.
Processors from other groups can access the
memory associated with another group in a
cache-coherent manner.
NUMA uses both local and remote memory (memory
from another group),
hence accessing remote memory takes longer than
accessing local memory.
Advantages:
Improves the scalability of the system.
Memory bottleneck(shortage of memory)
problem is minimized in this architecture.
Disadvantages:
The cost of the architecture is higher compared
to other architectures.
2.3 Parallel Database Design
A parallel database system supports parallelism
between and within queries,
i.e., inter-query and intra-query parallelism
The crucial issues in parallel database:
Data partitioning
Parallel query processing
Query optimisation
Parallel transaction management
2.3.1 Data Partitioning
For load balancing:
To distribute the workload across the
resources of a parallel system,
such as CPU, disk, main memory and network.
This is supported by the data partitioning method.
By partitioning the data equally across the
workloads of many different processors, we
achieve better performance and
better parallelism of the whole system.
There are 2 types:
Horizontal
Vertical
Horizontal Data Partitioning
Partitioning a table using conditions
specified through the WHERE clause,
distributing bunches of tuples (records)
STUDENT (Regno, SName, Address, Branch, Phone)
SELECT * FROM student WHERE Branch = 'branch
name';
e.g.: branches like 'BTech CIVIL', 'BTech MECH',
'BTech CSE'
Vertical Data Partitioning
Partitioning a table using decomposition
rules,
distributing the table into multiple partitions
vertically (different schemas)
e.g.: STUDENT into different tables like
STUDENT(Regno, SName, Address,
Branch)
STU_PHONE(Regno, Phone)
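The decomposition above can be sketched as follows; the sample rows are hypothetical, and the key Regno is kept in both partitions so the original tuples can be rebuilt by a join:

```python
# Vertical partitioning sketch: split each STUDENT tuple into two
# narrower schemas that share the key Regno:
#   STUDENT(Regno, SName, Address, Branch) and STU_PHONE(Regno, Phone)
students = [
    ("R1", "Asha", "Pune",  "BTech CSE",  "111"),
    ("R2", "Ravi", "Delhi", "BTech MECH", "222"),
]

# Project each row onto the two vertical partitions.
student_part = [(r, n, a, b) for (r, n, a, b, _) in students]
stu_phone    = [(r, p) for (r, _, _, _, p) in students]

print(student_part[0])  # ('R1', 'Asha', 'Pune', 'BTech CSE')
print(stu_phone[0])     # ('R1', '111')
```

Joining the two partitions on Regno reconstructs the original STUDENT relation without loss.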
Partitioning Strategies:
To distribute the data evenly across multiple
processors, strategies such as the following are used:
Round-robin
Hash partitioning
Range partitioning
e.g: we partition our data as:
n processors P0, P1, P2, …, Pn-1
n disks D0, D1, D2, …, Dn-1
The value of n is chosen according to requirements
Round-Robin Partitioning
The simplest form, in which records are distributed to the
disks in turn:
the first record goes to the first disk, the second record to the
second disk, and so on (record i goes to disk i mod n).
With n = 10 disks, tuple numbers 1, 11, 21, 31, 41, ..., 991 of the
Employee relation will be stored on disk 1,
and the second record goes to D2 (2 mod 10 = 2), and so on.
This scheme distributes data evenly across all the disks.
However, because tuple placement ignores attribute values,
processing a point query such as one on "city" is very difficult.
Similarly, a range query is also very difficult to process.
Excellent for applications that wish to read the
entire relation sequentially for each query.
Very difficult to process point queries and range
queries.
A point query: retrieval of the tuples from a relation
that satisfy a particular attribute value:
e.g.: the Student relation with City = "Kolkata"
A range query: retrieval of tuples from a relation
within a given range.
e.g: Emp relation with the salary range (1000,2000)
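The round-robin placement rule (record i goes to disk i mod n) can be sketched as below; the 100 Employee tuple numbers are hypothetical:

```python
# Round-robin partitioning sketch: record i goes to disk i mod n.
# n = 10 disks, numbered D0..D9, as in the running example.
n_disks = 10
disks = {d: [] for d in range(n_disks)}

for tuple_no in range(1, 101):  # hypothetical Employee tuples 1..100
    disks[tuple_no % n_disks].append(tuple_no)

print(disks[1][:3])    # [1, 11, 21] -> tuples 1, 11, 21, ... share disk 1
print(len(disks[0]))   # 10 -> every disk holds the same number of tuples
# A point query such as City = 'Kolkata' must still search all 10 disks,
# because placement never looked at any attribute value.
```

This shows why round-robin gives perfect load balance for sequential scans but no help for point or range queries.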
Hash Partitioning
Identifies one or more attributes as partitioning
attributes.
The identified partitioning attributes are taken as input
to a hash function.
Hash partitioning is ideally suited for point queries on
the partitioning attribute; because it also spreads tuples
evenly, sequential scans of the entire relation remain efficient.
Tuples are stored by applying the hash function to the
partitioning attribute.
The hash function specifies the placement of each tuple
on a particular disk.
For example, consider the following table;
EMPLOYEE(ENo, EName, DeptNo, Salary, Age)
If we choose DeptNo attribute as the partitioning
attribute
If we have 10 disks to distribute the data, then the
following would be a hash function:
h(DeptNo) = DeptNo mod 10
If we have 10 departments, then according to the hash
function, all the employees of department 1 go to
disk 1, department 2 to disk 2, and so on
(department 10 maps to disk 0).
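The slide's hash function h(DeptNo) = DeptNo mod 10 can be sketched directly; the EMPLOYEE rows are hypothetical sample data:

```python
# Hash partitioning sketch using h(DeptNo) = DeptNo mod 10.
# Schema assumed: EMPLOYEE(ENo, EName, DeptNo, Salary, Age).
employees = [
    (1, "Anil",  1, 40000, 30),
    (2, "Bina", 12, 55000, 41),   # DeptNo 12 -> disk 2 (12 mod 10)
    (3, "Chad",  1, 35000, 25),
]

def h(dept_no, n_disks=10):
    # The partitioning attribute is DeptNo; the hash value is the disk number.
    return dept_no % n_disks

disks = {d: [] for d in range(10)}
for emp in employees:
    disks[h(emp[2])].append(emp)

print(len(disks[1]))  # 2 -> both department-1 employees land on disk 1
print(len(disks[2]))  # 1 -> department 12 hashes to disk 2
```

A point query such as DeptNo = 1 now needs to read only disk h(1) = 1, which is exactly what makes hash partitioning good for point queries on the partitioning attribute.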
Range Partitioning
In Range Partitioning we identify one or more
attributes as partitioning attributes.
Then we choose a range partition vector to
partition the table across n disks.
The vector contains boundary values of the
partitioning attribute.
For example, for the Salary attribute of EMPLOYEE, the vector
[5000, 15000, 30000] defines the individual salary ranges:
5000 represents the first range (0 – 5000),
15000 represents the second range (5001 – 15000),
30000 represents the third range (15001 – 30000),
and the final range is (30001 – rest).
Hence, a vector with 3 values represents 4
disks/partitions.
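The lookup from a salary to its partition can be sketched with a binary search over the slide's vector; `range_partition` is a hypothetical helper name:

```python
import bisect

# Range partitioning sketch using the vector [5000, 15000, 30000]:
# 3 boundary values define 4 partitions:
#   0: 0-5000, 1: 5001-15000, 2: 15001-30000, 3: 30001 and above.
vector = [5000, 15000, 30000]

def range_partition(salary, boundaries=vector):
    # bisect_left returns how many boundaries are strictly below salary,
    # i.e. the partition number 0..len(boundaries).
    return bisect.bisect_left(boundaries, salary)

print(range_partition(3000))   # 0 -> first range (0 - 5000)
print(range_partition(5000))   # 0 -> a boundary value stays in its range
print(range_partition(20000))  # 2 -> third range (15001 - 30000)
print(range_partition(50000))  # 3 -> final range (30001 and above)
```

Because the partition number is found by comparing against the boundaries, a range query such as Salary BETWEEN 6000 AND 20000 needs to touch only the contiguous partitions 1 and 2.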