Introduction CUDA
Introduction
Frauke Sprengel
Outline
1 Lecture Organization
2 Contents
3 Parallel computing
6 Literature
Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 2
Visual Computing
Contents
Contents of lecture
Contents
Contents of exercises
Outline
1 Lecture Organization
2 Contents
3 Parallel computing
Why parallel computing?
Identification of parallelizable problem parts
Flynn’s Classical Taxonomy
6 Literature
Why parallel computing?
Serial and Parallel Problems
Figure 1.1: Serial (left) and parallel (right) problems in principle (Barney 2016).
Matrix Addition
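A minimal sketch of why matrix addition is a fully parallel problem. The function name matrix_add and the flat, row-major std::vector<float> storage are assumptions made for illustration, not part of the lecture material. Every output element depends only on the two input elements at the same index, so no loop iteration needs the result of another.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise sum C = A + B of two matrices that are
// stored row-major in flat vectors of equal length.
std::vector<float> matrix_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    // No iteration reads a value written by another iteration, so the
    // iterations could run in any order, or all at the same time.
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}
```

On a parallel machine, each loop iteration could in principle be handed to its own processing unit.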
Matrix Multiplication
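A minimal sketch of where the parallelism sits in matrix multiplication. The name matrix_mul, the dimension parameters n, m, p and the row-major storage are assumptions for illustration. Each output element C(i, j) can be computed independently of all others, while the sum over k inside one element is an accumulation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C = A * B for an (n x m) matrix A and an (m x p)
// matrix B, both stored row-major in flat vectors.
std::vector<float> matrix_mul(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t n, std::size_t m, std::size_t p) {
    assert(a.size() == n * m);
    assert(b.size() == m * p);
    std::vector<float> c(n * p, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {          // rows: independent
        for (std::size_t j = 0; j < p; ++j) {      // columns: independent
            float sum = 0.0f;
            for (std::size_t k = 0; k < m; ++k) {  // accumulation over k
                sum += a[i * m + k] * b[k * p + j];
            }
            c[i * p + j] = sum;
        }
    }
    return c;
}
```

The two outer loops are candidates for parallel execution; the inner loop is a sum, which needs more care to parallelize.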
Inner Product
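A minimal sketch of the inner product as a two-step problem. The name inner_product_tree and the pairwise summation scheme are assumptions chosen for illustration; other reduction schemes exist. The element-wise products are independent of each other, whereas the final sum combines all of them. Summing in pairs takes roughly log2(n) passes, and the additions within one pass are independent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inner product of x and y using a pairwise (tree)
// reduction instead of one long serial sum.
float inner_product_tree(const std::vector<float>& x,
                         const std::vector<float>& y) {
    assert(x.size() == y.size());
    // Step 1: element-wise products, all independent of each other.
    std::vector<float> partial(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        partial[i] = x[i] * y[i];
    }
    // Step 2: pairwise summation. Within one pass (fixed stride) the
    // additions touch disjoint elements; the passes themselves must run
    // one after another.
    for (std::size_t stride = 1; stride < partial.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < partial.size(); i += 2 * stride) {
            partial[i] += partial[i + stride];
        }
    }
    return partial.empty() ? 0.0f : partial[0];
}
```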
Repetition
Question (Flynn)
How is computing hardware classified (Flynn)?
Flynn’s Classical Taxonomy
According to Flynn (1966), multi-processor computer architectures are
distinguished according to their
• instruction stream
• data stream
Single Instruction - Single Data
• single instruction: one instruction stream is being acted on by the CPU
• single data: one data stream is being used as input
• deterministic execution
• serial computer: older generation mainframes, single processor/core PCs.
Figure 1.3: Single Instruction, Single Data (Flow, Example) (Barney 2016).
Multiple Instruction - Single Data
• multiple instruction: each processing unit operates on the data
independently via separate instruction streams.
• single data: a single data stream is fed into multiple processing units.
• of little practical relevance (few, if any, actual computers of this type are known)
Figure 1.4: Multiple Instruction, Single Data (Flow, Example) (Barney 2016).
Single Instruction - Multiple Data
• single instruction: all processing units execute the same instruction at
any given clock cycle
• multiple data: each processing unit can operate on a different data
element
• best suited for specialized problems characterized by a high degree of
regularity, such as graphics/image processing.
Figure 1.5: Single Instruction, Multiple Data (Flow, Example) (Barney 2016).
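The SIMD idea above can be sketched with a simple image-processing operation. The function name brighten and the 8-bit grayscale pixel format are assumptions for illustration. The same instruction sequence (add, then clamp) is applied to every pixel; only the data element differs. The loop below runs serially, but on SIMD hardware many of these identical operations would be executed together.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: add `delta` to every 8-bit pixel and clamp the
// result to the valid range [0, 255].
std::vector<std::uint8_t> brighten(const std::vector<std::uint8_t>& pixels,
                                   int delta) {
    std::vector<std::uint8_t> out(pixels.size());
    // Same operation for every element, different data per element.
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        int v = static_cast<int>(pixels[i]) + delta;
        v = std::min(std::max(v, 0), 255);
        out[i] = static_cast<std::uint8_t>(v);
    }
    return out;
}
```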
Multiple Instruction - Multiple Data
• multiple instruction: every processor may be executing a different
instruction stream
• multiple data: every processor may be working with a different data
stream
• execution can be synchronous or asynchronous, deterministic or
non-deterministic
Figure 1.8: Multiple Instruction, Multiple Data (Flow, Example) (Barney 2016).
• currently the most common type of parallel computer: most modern
supercomputers, networked parallel computer clusters and "grids", and
multi-core PCs fall into this category
• Note: many MIMD architectures also include SIMD execution
sub-components
• further distinguished by memory architecture: shared memory (e.g.
multi-core PCs), distributed memory (e.g. clusters), hybrid
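A minimal sketch of MIMD execution on a shared-memory machine such as a multi-core PC, using C++ standard threads. All names (sum_task, count_even_task, MimdResult, run_mimd_demo) are hypothetical and chosen for illustration. Two threads run different instruction streams on different data at the same time and write their results to separate memory locations.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical task 1: sum all entries of a vector.
void sum_task(const std::vector<int>& data, long& result) {
    result = std::accumulate(data.begin(), data.end(), 0L);
}

// Hypothetical task 2: count the even entries of a vector.
void count_even_task(const std::vector<int>& data, std::size_t& result) {
    result = 0;
    for (int v : data) {
        if (v % 2 == 0) {
            ++result;
        }
    }
}

struct MimdResult {
    long sum;
    std::size_t evens;
};

// Runs both tasks concurrently: different instructions, different data.
MimdResult run_mimd_demo(const std::vector<int>& a,
                         const std::vector<int>& b) {
    MimdResult r{0, 0};
    std::thread t1(sum_task, std::cref(a), std::ref(r.sum));
    std::thread t2(count_even_task, std::cref(b), std::ref(r.evens));
    // Wait for both threads before reading their results.
    t1.join();
    t2.join();
    return r;
}
```

Depending on the toolchain, building this may require a threading flag such as -pthread.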
Speculation
Question
Where does parallelism come into the game using GPUs?
Introduction
Figure 1.10: Maximal values of processing speed (in GFLOP/s) for NVIDIA
GPUs and Intel CPUs (NVIDIA 2015a). Title in the original figure:
"Floating-Point Operations per Second for the CPU and GPU".
Introduction
Memory Bandwidth
[Figure: CPU chip with latency-oriented cores (unit: Core) next to GPU chip
with throughput-oriented cores (unit: Compute Unit). Block labels in the
figure: Local Cache, Cache/Local Mem, Registers, Control, Threading, SIMD Unit.]
CPU vs. GPU Architecture
CPUs: Latency Oriented Design
• Powerful ALU
– Reduced operation latency
• Large caches
– Convert long latency memory accesses to short latency cache accesses
• Sophisticated control
– Branch prediction for reduced branch latency
– Data forwarding for reduced data latency
[Figure: CPU block diagram (labels: Control, ALU, Cache, DRAM).]
CPU vs. GPU Architecture
GPUs: Throughput Oriented Design
• Small caches
– To boost memory throughput
• Simple control
– No branch prediction
– No data forwarding
• Energy efficient ALUs
– Many, long latency but heavily pipelined for high throughput
• Require massive number of threads to tolerate latencies
– Threading logic
– Thread state
[Figure: GPU block diagram (labels: GPU, DRAM).]
A Brief History of GPU Computing
GPGPU and GPU computing
CUDA C Examples I
Figure 1.15: Fluid simulation (left) and smoke particles from CUDA SDK.
CUDA C Examples II
Figure 1.16: Ocean (left) and Mandelbrot fractal from CUDA SDK.
Real-world Applications of GPU Computing
Outline
1 Lecture Organization
2 Contents
3 Parallel computing
6 Literature
Books
Websites and other Sources
Literature: Books and Reports I
Literature: Books and Reports II
Tsuchiyama, Ryoji, Takashi Nakamura, Takuro Iizuka, Akihiro Asahara,
and Satoshi Miki (2012).
The OpenCL Programming Book – Parallel Programming for MultiCore
CPU and GPU.
2010 version available as html from
https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/contents/.
Japan: Fixstars Corporation.
Munshi, Aaftab, Benedict Gaster, Timothy G. Mattson, James Fung,
and Dan Ginsburg (2011).
OpenCL Programming Guide.
Addison-Wesley Professional,
648 pp.
ISBN: 0321749642.
Literature: Books and Reports III
Literature: Websites and Articles I
Literature: Websites and Articles II
Segal, Mark and Kurt Akeley (2015).
The OpenGL Graphics System: A Specification (Version 4.5).
Tech. rep.
The definitive OpenGL reference,
https://www.opengl.org/documentation/.
Khronos Group.
Wikipedia (2016).
OpenCL.
en.wikipedia.org/wiki/OpenCL.
AMD (2010a).
Introduction to OpenCL Programming.
developer.amd.com/zones/OpenCLZone/courses/Documents/
Introduction_to_OpenCL_Programming(201005).pdf.
Literature: Websites and Articles III
AMD (2010b).
AMD OpenCL Tutorial at SAAHPC2010 (Benedict R. Gaster and Lee
Howes).
developer.amd.com/zones/OpenCLZone/courses/Documents/AMD_
OpenCL_Tutorial_SAAHPC2010.pdf.
NVIDIA (2015b).
NVIDIA CUDA Runtime API (Reference Manual). Version 7.5.
Tech. rep.
Moodle course,
http://docs.nvidia.com/cuda/cuda-runtime-api/.
NVIDIA Corp.
Literature: Websites and Articles IV
NVIDIA (2015c).
NVIDIA CUDA C Best Practices Guide. Version 7.5.
Tech. rep.
Moodle course,
http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/.
NVIDIA Corp.
— (2015d).
The CUDA Compiler Driver NVCC.
Tech. rep.
Useful details and options of NVCC,
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc.
NVIDIA Corp.
Literature: Websites and Articles V
Kessenich, John (2016).
The OpenGL Shading Language (Language Version 4.4).
Tech. rep.
The definitive GLSL reference,
https://www.opengl.org/documentation/glsl/.
Khronos Group.
GPGPU.org (2015).
General-Purpose Computation on Graphics Hardware.
gpgpu.org.
Khronos Group (2016).
OpenCL Specification.
www.khronos.org/opencl.
Literature: Websites and Articles VI
NVIDIA (2016b).
NVIDIA’s OpenCL samples.
https://developer.nvidia.com/opencl.
AMD (2016).
AMD’s OpenCL tutorials, drivers, SDK and more.
developer.amd.com/zones/OpenCLZone.
jogamp.org (2016).
OpenGL and OpenCL bindings for Java (high level).
jogamp.org.
jocl.org (2016).
OpenCL bindings for Java (rather close to C).
jocl.org.
Literature: Websites and Articles VII
AMD (2010c).
ATI Stream SDK OpenCL Programming Guide.
developer.amd.com/gpu_assets/ATI_Stream_SDK_OpenCL_
Programming_Guide.pdf.