An Overview of Parallel Computing
Cairo University
Dr. Demerdash
Plan
1 Hardware
2 Types of Parallelism
Hardware
Multicore processors
[Figure: a multicore processor — each core has private L1 instruction and data caches; the cores share L2 caches, which connect to main memory.]
Distributed Memory
The largest and fastest computers in the world today employ both
shared and distributed memory architectures.
Current trends seem to indicate that this type of memory architecture
will continue to prevail.
While this model allows applications to scale, it increases the
complexity of writing programs.
Types of Parallelism
Pipelining
Instruction pipeline
Data parallelism
program:
...
if CPU = "a" then
    do task "A"
else if CPU = "b" then
    do task "B"
end if
...
end program
program:
...
do task "A"
...
end program
program:
...
do task "B"
...
end program
Stencil computations
[Figure: Pascal's triangle laid out as a grid — after a boundary row of zeros and a column of ones, each entry is the sum of the entry above it and the entry to its left.]
Concurrency Platforms: Three Examples
Concurrency Platforms: Three Examples Julia
[moreno@compute-0-3 ~]$ julia -p 5
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.0-prerelease+3622
 _/ |\__'_|_|_|\__'_|  |  Commit c9bb96c 2013-09-04 15:34:41 UTC
|__/                   |  x86_64-redhat-linux

julia> procs(da)
4-element Array{Int64,1}:
 2
 3
 4
 5

julia> da.chunks
4-element Array{RemoteRef,1}:
 RemoteRef(2,1,1)
 RemoteRef(3,1,2)
 RemoteRef(4,1,3)
 RemoteRef(5,1,4)

julia> da.indexes
4-element Array{(Range1{Int64},),1}:
 (1:3,)
 (4:5,)
 (6:8,)
 (9:10,)

julia> da[3]
6

julia> da[3:5]
3-element SubArray{Int64,1,DArray{Int64,1,Array{Int64,1}},(Range1{Int64},)}:
  6
  8
 10
julia> sum(da)
110
Concurrency Platforms: Three Examples Cilk
int fib(int n)
{
    if (n < 2) return n;
    int x, y;
    x = cilk_spawn fib(n-1);
    y = fib(n-2);
    cilk_sync;
    return x+y;
}
Scheduling

[Figure: the Cilk runtime schedules work across processors, each with its own cache, connected by a network to memory and I/O.]

[Figure: the Cilk++ tool chain — Cilk++ source (e.g. the fib program above) is turned by the compiler and linker into a binary; the binary feeds the Cilkview scalability analyzer, the Cilkscreen race detector, and the runtime system; the serialization (the same source with the Cilk keywords elided) supports conventional regression tests alongside the parallel regression tests.]
Using Cilkview
Concurrency Platforms: Three Examples CUDA
Load the subset from global memory to shared memory, using multiple
threads to exploit memory-level parallelism.
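A sketch of this pattern in CUDA (the kernel, array names, and tile size are illustrative, not from the slides): each thread of a block copies one element from global to shared memory, so the block's loads are issued concurrently, and a barrier ensures the tile is complete before any thread computes on it.

```cuda
__global__ void stage_tile(const float *g, float *out, int n) {
    __shared__ float tile[256];            /* one tile per thread block */
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        /* each thread issues one load; the block's loads overlap,
           exploiting memory-level parallelism */
        tile[threadIdx.x] = g[i];
    }
    __syncthreads();                       /* wait until the tile is loaded */
    if (i < n)
        out[i] = 2.0f * tile[threadIdx.x]; /* compute on the shared copy */
}
```

Subsequent reuse of the tile then hits fast shared memory instead of going back to global memory.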
Concurrency Platforms: Three Examples MPI
Example
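The example slide itself did not survive extraction; below is a generic MPI sketch of the customary first example, not necessarily the one shown in lecture. Every process runs the same program (the SPMD style shown earlier) and distinguishes itself by its rank.

```c
#include <mpi.h>
#include <stdio.h>

/* A minimal MPI program: each process learns its rank and the
   total number of processes, then prints a greeting. */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start the MPI runtime  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id      */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes    */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();                         /* shut the runtime down  */
    return 0;
}
```

Compiled with mpicc and launched with, e.g., mpirun -np 4, each of the four processes prints its own line; the rank is what a real program would use to decide which piece of the work (or which branch, as in the earlier if CPU = "a" sketch) belongs to it.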