Professional Documents
Culture Documents
Introduction To GP-GPU and CUDA: High Performance Computing Center Hanoi University of Science & Technology
Introduction To GP-GPU and CUDA: High Performance Computing Center Hanoi University of Science & Technology
2012
Outline
Overview
What is GPGPU?
Execution Model
Thread Hierarchy
Memory Model
Summary
High Performance Computing Center 2
Overview
NVIDIA
GeForce (gaming/movie playback)
Quadro (professional graphics)
Tesla (HPC)
AMD/ATI
Radeon (gaming/movie playback)
FireStream (HPC)
Costs/performance ratio
Costs for power supply
Costs for maintain, operation
G80 Architecture
06/2008
Geforce GT 200 series
30 SMs (240 cores)
DRAM 1GB
Tesla
30 SMs (240 cores)
DRAM 4GB
• Registers
o on chip
o fast access
o per thread
o limited amount
• Registers
• Local Memory
o in DRAM
o slow
o non-cached
o per thread
o relative large
• Registers
• Local Memory
• Shared Memory
o on chip
o fast access
o per block
o 16 KByte
o synchronize between
threads
• Registers
• Local Memory
• Shared Memory
• Global Memory
o in DRAM
o slow
o non-cached
o per grid
o communicate between
grids
High Performance Computing Center 22
Memory Model
There are 6 Memory Types :
• Registers
• Local Memory
• Shared Memory
• Global Memory
• Constant Memory
o in DRAM
o cached
o per grid
o read-only
• Registers
• Local Memory
• Shared Memory
• Global Memory
• Constant Memory
• Texture Memory
o in DRAM
o cached
o per grid
o read-only
• Local Memory
• Global Memory
• Constant Memory
• Texture Memory
o in Device Memory
http://www.nvidia.com/object/tesla_computing_solutions.html 28
Bioinfomatics
32
Cryptanalysis
MD5 code breaking using GPU
MD5 is one-way hash function
Inverse problem
Input: MD5 hash
Ouput : the origin password
http://3.14.by/en/read/md5_benchmark
Seismic Exploration
―the cost of exploration and drilling deep wells can
reach hundreds of millions of dollars, and there’s
often only one chance to do it successfully‖
SeismicCity
use the most advanced depth imaging technologies
Using Tesla 1U System
Speed up 20x compared to CPU previous configuration
http://www.nvidia.com/object/seismiccity.html
http://www.seismiccity.com/
36
Other Applications
4000
3500
thời gian (s)
3000
2532 GPU
2500
CPU
2000 1737
1500 1195
1000
500 214
55 79 116
0
0.8 0.85 0.9 0.95
alpha
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html
40
Supercomputers
http://www.top500.org/
The first supercomputer using GPU
2009, Tsubame, Japan:
41
Summary