
Visual Computing – GPU Computing

Introduction

Frauke Sprengel
Outline
1 Lecture Organization

2 Contents

3 Parallel computing

4 Why GPU Computing?

5 Examples and Applications

6 Literature

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 2
Outline
1 Lecture Organization

2 Contents

3 Parallel computing

4 Why GPU Computing?

5 Examples and Applications

6 Literature

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 3
Visual Computing

Prof. Dr. Frauke Sprengel


Phone 0511/9296-1812
Room 1H2.49
frauke.sprengel@hs-hannover.de
This year’s topic: GPU Computing
Assignment: oral examination in January, two larger exercises during the
semester
Lecture notes and exercises: Moodle course

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 4
Outline
1 Lecture Organization

2 Contents

3 Parallel computing

4 Why GPU Computing?

5 Examples and Applications

6 Literature

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 5
Contents
Contents of lecture

• Why GPU Computing?


• Hardware architecture of GPUs
• CUDA C programming language
• OpenCL kernel language and API
• From CUDA to OpenCL
• Memory layout
• Concurrency and synchronization
• Graphics interoperability
• Application examples
• Computational thinking

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 6
Contents

Contents of exercises

• The focus of this course is on practical exercises!


• Programming with C++, CUDA, and OpenGL under Linux
• Programming with C/C++, OpenCL, OpenGL, Java under Linux
• Larger projects at the end of the two parts
• No graded project work, but a prerequisite for the examination

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 7
Outline
1 Lecture Organization

2 Contents

3 Parallel computing
Why parallel computing?
Identification of parallelizable problem parts
Flynn’s Classical Taxonomy

4 Why GPU Computing?

5 Examples and Applications

6 Literature
Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 8
Why parallel computing?

How can we run a program faster?


• run it on a faster processor
  • faster clock
    • shorter time for each computation
    • increases power consumption (limited)
  • more work per step (clock cycle)
    • also limited
• run it on more (simple) processors

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 9
Serial and Parallel Problems

Figure 1.1: Serial (left) and parallel (right) problems in principle (Barney 2016).

Problem: Identify independent problem parts which can be solved in parallel.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 10
Matrix Addition

Example: Matrix Addition


for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        c[i][j] = a[i][j] + b[i][j];
    }
}
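All M·N additions are independent of each other, so each element can be computed by its own thread. A minimal CUDA sketch of this idea (not part of the original slides; the kernel name and the row-major memory layout are assumptions):

// Hypothetical CUDA kernel: one thread per matrix element, matrices stored row-major.
__global__ void matAdd(const float *a, const float *b, float *c, int M, int N)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    if (i < M && j < N)                              // guard against surplus threads
        c[i * N + j] = a[i * N + j] + b[i * N + j];  // each element is computed independently
}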

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 11
Matrix Multiplication

Example: Matrix Multiplication


for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
        c[i][j] = 0;
        for (k = 0; k < P; k++)
            c[i][j] += a[i][k] * b[k][j];
    }
}
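The two outer loops over i and j are again independent; only the inner loop over k, the accumulation of one output element, stays serial. A naive CUDA sketch under the same assumptions as above (row-major storage, a of size M×P, b of size P×N):

// Hypothetical CUDA kernel: one thread computes one element of c (no shared-memory tiling).
__global__ void matMul(const float *a, const float *b, float *c, int M, int N, int P)
{
    int i = blockIdx.y * blockDim.y + threadIdx.y;   // row of c
    int j = blockIdx.x * blockDim.x + threadIdx.x;   // column of c
    if (i < M && j < N) {
        float sum = 0.0f;
        for (int k = 0; k < P; ++k)                  // this accumulation stays serial per thread
            sum += a[i * P + k] * b[k * N + j];
        c[i * N + j] = sum;
    }
}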

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 12
Inner Product

Example: Inner Product


dot = 0;
for (i = 0; i < N; i++) {
    dot += a[i] * b[i];
}
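Unlike the two previous examples, every iteration updates the same variable dot, so the iterations are not independent: the products a[i] * b[i] can be computed in parallel, but summing them requires a reduction. A minimal CUDA sketch of a block-wise tree reduction (not from the slides; the host or a second kernel still has to add up the per-block results):

// Hypothetical CUDA kernel: each block reduces its chunk of products into partial[blockIdx.x].
// Assumes blockDim.x is a power of two and blockDim.x * sizeof(float) shared memory at launch.
__global__ void dotPartial(const float *a, const float *b, float *partial, int N)
{
    extern __shared__ float cache[];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    cache[threadIdx.x] = (tid < N) ? a[tid] * b[tid] : 0.0f;   // products in parallel
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {             // tree reduction in shared memory
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];                        // one partial sum per block
}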

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 13
Repetition

Question
How is computing hardware classified according to Flynn?

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 14
Flynn’s Classical Taxonomy
According to Flynn (1966), multi-processor computer architectures are
distinguished according to
• instruction stream
• data stream

Figure 1.2: Flynn’s Taxonomy (Barney 2016).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 15
Single Instruction - Single Data
• single instruction: one instruction stream is being acted on by the CPU
• single data: one data stream is being used as input
• deterministic execution
• serial computer: older generation mainframes, single processor/core PCs.

Figure 1.3: Single Instruction, Single Data (Flow, Example) (Barney 2016).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 16
Multiple Instruction - Single Data
• multiple instruction: each processing unit operates on the data
independently via separate instruction streams.
• single data: a single data stream is fed into multiple processing units.
• more or less irrelevant (no actual computers known)

Figure 1.4: Multiple Instruction, Single Data (Flow, Example) (Barney 2016).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 17
Single Instruction - Multiple Data
• single instruction: all processing units execute the same instruction at
any given clock cycle
• multiple data: each processing unit can operate on a different data
element
• best suited for specialized problems characterized by a high degree of
regularity, such as graphics/image processing.

Figure 1.5: Single Instruction, Multiple Data (Flow, Example) (Barney 2016).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 18
Single Instruction - Multiple Data

Figure 1.6: Single Instruction, Multiple Data Example (Barney 2016).

• Synchronous (lockstep) and deterministic execution
• two varieties: processor arrays, vector pipelines (e.g. Cray X-MP)
• most modern computers, particularly those with graphics processor units
  (GPUs), employ SIMD instructions and execution units

Figure 1.7: Cray X-MP, GPU (Barney 2016).
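As a small illustration of the SIMD idea (example not from the slides): with the SSE intrinsics available on common x86 CPUs, a single add instruction operates on four floats at once.

// Hypothetical SIMD illustration using SSE intrinsics: one instruction, four data elements.
#include <xmmintrin.h>

void add4(const float *a, const float *b, float *c)
{
    __m128 va = _mm_loadu_ps(a);       // load four floats from a
    __m128 vb = _mm_loadu_ps(b);       // load four floats from b
    __m128 vc = _mm_add_ps(va, vb);    // single instruction adds all four lanes
    _mm_storeu_ps(c, vc);              // store four results
}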

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 19
Multiple Instruction - Multiple Data
• multiple instruction: every processor may be executing a different
instruction stream
• multiple data: every processor may be working with a different data
stream
• execution can be synchronous or asynchronous, deterministic or
non-deterministic

Figure 1.8: Multiple Instruction, Multiple Data (Flow, Example) (Barney 2016).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 20
Multiple Instruction - Multiple Data
• currently the most common type of parallel computer: most modern
supercomputers, networked parallel computer clusters and "grids", and
multi-core PCs fall into this category.
• Note: many MIMD architectures also include SIMD execution
sub-components
• distinguish further: shared memory (e.g. multi-core PCs), distributed
memory (e.g. clusters), hybrid

Figure 1.9: Hybrid distributed-shared memory architectures (Barney 2016).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 21
Outline
1 Lecture Organization

2 Contents

3 Parallel computing

4 Why GPU Computing?


CPU vs. GPU Performance
GPU Architecture
A Brief History of GPU Computing

5 Examples and Applications

6 Literature
Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 22
Speculation

Question
Where does parallelism come into play when using GPUs?

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 23
Introduction

Floating Point Operations per Second

Figure 1.10: Maximal values of processing speed (in GFLOP/s) for NVIDIA GPUs and Intel CPUs (NVIDIA 2015a).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 24
Introduction

Memory Bandwidth

Figure 1.11: Memory bandwidth (GB/s) for NVIDIA GPUs and Intel CPUs (NVIDIA 2015a).

The reason behind the discrepancy in floating-point capability between the CPU and the GPU is that the GPU is specialized for compute-intensive, highly parallel computation - exactly what graphics rendering is about - and therefore designed such that more transistors are devoted to data processing rather than data caching and flow control, as schematically illustrated by Figure 1.12.
Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 25
CPU vs. GPU Architecture

CPU and GPU are designed very differently:
• CPU: Latency Oriented Cores
• GPU: Throughput Oriented Cores

Figure 1.12: Architecture of a CPU vs. a GPU (Nvidia 2016a).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 26
CPU vs. GPU Architecture
CPUs: Latency Oriented Design

– Powerful ALU
  – reduced operation latency
– Large caches
  – convert long latency memory accesses to short latency cache accesses
– Sophisticated control
  – branch prediction for reduced branch latency
  – data forwarding for reduced data latency

Figure 1.13: Architecture of a quad-core CPU (Nvidia 2016a).
Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 27
CPU vs. GPU Architecture
GPUs: Throughput Oriented Design

– Small caches
  – to boost memory throughput
– Simple control
  – no branch prediction
  – no data forwarding
– Energy efficient ALUs
  – many, long latency but heavily pipelined for high throughput
– Require massive number of threads to tolerate latencies
  – threading logic
  – thread state

Figure 1.14: Architecture of a GPU with 8 processors, each of which consists of 16 cores (Nvidia 2016a).

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 28
A Brief History of GPU Computing

2001 NVIDIA GeForce 3 allows shader programming in Assembler.


2002 High-level shading languages GLSL (OpenGL),
HLSL (DirectX 9), Cg (NVIDIA)
2004 GPGPU (general purpose computation on GPUs) course at
SIGGRAPH 2004 using shading languages
(http://gpgpu.org/s2004);
BrookGPU project at Stanford University

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 29
A Brief History of GPU Computing

2006 Unified shader architecture;


GPU computing languages CUDA C (NVIDIA),
Stream SDK/Brook+ (ATI/AMD)
2008 Folding@Home project at Stanford University, using GPU
computing techniques on internet-connected PCs for distributed
simulation of protein folding
2009 GPU computing languages DirectCompute (DirectX 11),
OpenCL (Khronos, cross-platform computing language)

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 30
GPGPU and GPU computing

GPGPU General purpose computation on GPUs using shading languages


(e. g., GLSL) or the graphics API itself (e. g., OpenGL functions)
GPU computing General purpose computation on GPUs using computing
languages (e. g., CUDA)

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 31
Outline
1 Lecture Organization

2 Contents

3 Parallel computing

4 Why GPU Computing?

5 Examples and Applications

6 Literature

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 32
CUDA C Examples I

Figure 1.15: Fluid simulation (left) and smoke particles from CUDA SDK.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 33
CUDA C Examples II

Figure 1.16: Ocean (left) and Mandelbrot fractal from CUDA SDK.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 34
Real-world Applications of GPU Computing

• Engineering: computational fluid dynamics, FEM simulations


• Finance: option pricing, market analysis
• Geology: oil and gas exploration
• Medical imaging: volume rendering, CT reconstruction
• Life sciences: gene sequence alignment, protein folding
• “Consumer applications”: Photoshop, 3ds Max, Final Cut Pro, Matlab,
Mathematica

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 35
Outline
1 Lecture Organization

2 Contents

3 Parallel computing

4 Why GPU Computing?

5 Examples and Applications

6 Literature
Books
Websites and other Sources
Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 36
Literature: Books and Reports I

Kirk, David B. and Wen-mei W. Hwu (2010).


Programming Massively Parallel Processors.
Hands-on textbook on CUDA, supported by NVIDIA.
Burlington, MA: Morgan Kaufmann.
Nguyen, Hubert, ed. (2008).
GPU Gems 3.
Chapters 29 to 41 on GPU computing (mostly) with CUDA, available online at
http://http.developer.nvidia.com/GPUGems3/gpugems3_part01.html.
Upper Saddle River, NJ: Addison-Wesley.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 37
Literature: Books and Reports II
Tsuchiyama, Ryoji, Takashi Nakamura, Takuro Iizuka, Akihiro Asahara,
and Satoshi Miki (2012).
The OpenCL Programming Book – Parallel Programming for Multi-Core
CPU and GPU.
2010 version available as html from
https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/contents/.
Japan: Fixstars Corporation.
Munshi, Aaftab, Benedict Gaster, Timothy G. Mattson, James Fung,
and Dan Ginsburg (2011).
OpenCL Programming Guide.
Addison-Wesley Professional,
P. 648.
isbn: 0321749642.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 38
Literature: Books and Reports III

Banger, Ravishekhar and Koushik Bhattacharyya (2013).


OpenCL Programming by Example.
Packt Publishing.
Pharr, Matt, ed. (2005).
GPU Gems 2.
Chapters 29 to 36 on GPGPU before CUDA, available online at
http://http.developer.nvidia.com/GPUGems2/gpugems2_part01.html.
Upper Saddle River, NJ: Addison-Wesley.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 39
Literature: Websites and Articles I

Barney, Blaise (2016).


Introduction to Parallel Computing.
https://computing.llnl.gov/tutorials/parallel_comp.
NVIDIA (2015a).
NVIDIA CUDA C Programming Guide. Version 7.5.
Tech. rep.
Moodle course,
http://docs.nvidia.com/cuda/cuda-c-programming-guide/.
NVIDIA Corp.
Nvidia (2016a).
GPU Teaching Kit.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 40
Literature: Websites and Articles II
Segal, Mark and Kurt Akeley (2015).
The OpenGL Graphics System: A Specification (Version 4.5).
Tech. rep.
The definitive OpenGL reference,
https://www.opengl.org/documentation/.
Khronos Group.
Wikipedia (2016).
OpenCL.
en.wikipedia.org/wiki/OpenCL.
AMD (2010a).
Introduction to OpenCL Programming.
developer.amd.com/zones/OpenCLZone/courses/Documents/
Introduction_to_OpenCL_Programming(201005).pdf.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 41
Literature: Websites and Articles III

AMD (2010b).
AMD OpenCL Tutorial at SAAHPC2010 (Benedict R. Gaster and Lee
Howes).
developer.amd.com/zones/OpenCLZone/courses/Documents/AMD_
OpenCL_Tutorial_SAAHPC2010.pdf.
NVIDIA (2015b).
NVIDIA CUDA Runtime API (Reference Manual). Version 7.5.
Tech. rep.
Moodle course,
http://docs.nvidia.com/cuda/cuda-runtime-api/.
NVIDIA Corp.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 42
Literature: Websites and Articles IV
NVIDIA (2015c).
NVIDIA CUDA C Best Practices Guide. Version 7.5.
Tech. rep.
Moodle course,
http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/.
NVIDIA Corp.
— (2015d).
The CUDA Compiler Driver NVCC.
Tech. rep.
Useful details and options of NVCC,
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc.
NVIDIA Corp.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 43
Literature: Websites and Articles V
Kessenich, John (2016).
The OpenGL Shading Language (Language Version 4.4).
Tech. rep.
The definitive GLSL reference,
https://www.opengl.org/documentation/glsl/.
Khronos Group.
GPGPU.org (2015).
General-Purpose Computation on Graphics Hardware.
gpgpu.org.
Khronos Group (2016).
OpenCL Specification.
www.khronos.org/opencl.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 44
Literature: Websites and Articles VI
Nvidia (2016b).
Nvidia’s OpenCL samples.
https://developer.nvidia.com/opencl.
AMD (2016).
AMD’s OpenCL tutorials, drivers, SDK, and more.
developer.amd.com/zones/OpenCLZone.
jogamp.org (2016).
OpenGL and OpenCL bindings for Java (high level).
jogamp.org.
jocl.org (2016).
OpenCL bindings for Java (rather close to C).
jocl.org.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 45
Literature: Websites and Articles VII

AMD (2010c).
ATI Stream SDK OpenCL Programming Guide.
developer.amd.com/gpu_assets/ATI_Stream_SDK_OpenCL_
Programming_Guide.pdf.

Hochschule Hannover, Fak. IV, F. Sprengel, Visual Computing – GPU Computing, Introduction, 46
