
TECHNOSOFT REPORT

CUDA
A technology that can make supercomputers
personal…

‘The soul of a supercomputer is in the body of a GPU’

SUBMITTED BY
KUNAL GARG

CSE(A)-VI Sem.

2507276

UIET KU

Kurukshetra, India
INDEX
Abstract

Supercomputer

GPU

GPU Computing

History of GPU Computing

GPGPU

CUDA

Advantages of CUDA

CUDA Programming Model

CUDA Architecture

Tesla 10-Series

Tesla 10-Series Architecture

Thread Hierarchy

Execution Model

Warps and Half-Warps

GPU Memory Allocation/Release

Next Generation CUDA Architecture

Applications

Why should I use a GPU as a processor?

Bibliography
ABSTRACT
Today’s supercomputers are the ordinary computers of tomorrow, and GPUs are the bridge
between the two today. A graphics processing unit, or GPU, is a specialized processor that
offloads 3D or 2D graphics rendering from the microprocessor. GPU computing is the use of a
GPU to do general-purpose scientific and engineering computing.

The model for GPU computing is to use a CPU and GPU together in a heterogeneous computing
model. The sequential part of the application runs on the CPU and the computationally intensive
part runs on the GPU. From the user’s perspective, the application simply runs faster because it
harnesses the high performance of the GPU. Computing is evolving from
"central processing" on the CPU to "co-processing" on the CPU and GPU. To enable this new
computing paradigm, NVIDIA invented the CUDA (Compute Unified Device Architecture)
parallel computing architecture.

The NVIDIA® Tesla™ 20-series is designed from the ground up for high-performance
computing and is based on the next-generation CUDA GPU architecture. When compared to the
latest quad-core CPU, Tesla 20-series GPU computing processors deliver equivalent performance
at 1/20th the power consumption and 1/10th the cost; in other words, they put the power of a
supercomputer in a PC-based workstation.

“By 2012, three of the top five supercomputers in the world will have graphics processors using
parallel computing applications for computing,” Nvidia chief scientist David Kirk predicted.
SUPERCOMPUTER
A supercomputer is a computer that is at the frontline of current processing capacity, particularly
speed of calculation. Today, supercomputers are typically one-of-a-kind custom designs
produced by "traditional" companies such as Cray, IBM and Hewlett-Packard, who had
purchased many of the 1980s companies to gain their experience. As of November 2009, the
Cray Jaguar is the fastest supercomputer in the world.

The term supercomputer itself is rather fluid, and today's supercomputer tends to become
tomorrow's ordinary computer.

Supercomputers are used for highly calculation-intensive tasks such as problems involving
quantum physics, weather forecasting, climate research, molecular modeling (computing the
structures and properties of chemical compounds, biological macromolecules, polymers, and
crystals), and physical simulations (such as simulating airplanes in wind tunnels, simulating the
detonation of nuclear weapons, and researching nuclear fusion).

In November 2009, the AMD Opteron-based Cray XT5 Jaguar at the Oak Ridge National
Laboratory was announced as the fastest operational supercomputer, with a sustained processing
rate of 1.759 PFLOPS.

GPU
A graphics processing unit or GPU (also occasionally called visual processing unit or VPU) is a
specialized processor that offloads 3D or 2D graphics rendering from the microprocessor. It is
used in embedded systems, mobile phones, personal computers, workstations, and game
consoles. Modern GPUs are very efficient at manipulating computer graphics, and their highly
parallel structure makes them more effective than general-purpose CPUs for a range of complex
algorithms. In a personal computer, a GPU can be present on a video card, or it can be on the
motherboard. More than 90% of new desktop and notebook computers have integrated GPUs,
which are usually far less powerful than those on a dedicated video card.

GPUs can be of the following types:

1. Dedicated video cards: these have their own dedicated memory.
2. Integrated graphics processors: these share a portion of the system’s RAM.
3. Hybrid: these share system RAM while having their own small cache memory.

GPU Computing
GPU computing is the use of a GPU (graphics processing unit) to do general purpose scientific
and engineering computing.
The model for GPU computing is to use a CPU and GPU together in a heterogeneous computing
model. The sequential part of the application runs on the CPU and the computationally intensive
part runs on the GPU. From the user’s perspective, the application simply runs faster because it
harnesses the high performance of the GPU.

The application developer has to modify the application to take the compute-intensive kernels
and map them to the GPU; the rest of the application remains on the CPU. Mapping a function
to the GPU involves rewriting it to expose its parallelism and adding C-language keywords to
move data to and from the GPU, as the sketch below illustrates.
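
As a minimal sketch of this mapping (the function name and sizes are illustrative, not taken from any particular application), consider scaling an array: the CPU version is a sequential loop, while the GPU version drops the loop and assigns one element to each thread.

    // Plain C version: runs sequentially on the CPU.
    void scale_cpu(float *data, float factor, int n)
    {
        for (int i = 0; i < n; ++i)
            data[i] *= factor;
    }

    // The same function mapped to the GPU. The __global__ keyword marks it
    // as a kernel; the loop disappears and each thread handles one element.
    __global__ void scale_gpu(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
        if (i < n)                                      // the grid may overshoot n
            data[i] *= factor;
    }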

GPU computing is enabled by the massively parallel architecture of NVIDIA’s GPUs, called the
CUDA architecture. The CUDA architecture consists of hundreds of processor cores that operate
together to crunch through the application’s data set.

The Tesla 10-series GPU is the second-generation CUDA architecture, with features optimized
for scientific applications such as hardware support for IEEE-standard double-precision floating
point, local data caches in the form of shared memory dispersed throughout the GPU, coalesced
memory accesses, and so on.

"GPUs have evolved to the point where many real-world applications are easily implemented on
them and run significantly faster than on multi-core systems. Future computing architectures will
be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs," Prof.
Jack Dongarra predicted.

HISTORY OF GPU COMPUTING

Graphics chips started as fixed-function graphics pipelines. Over the years, these chips became
increasingly programmable, which led NVIDIA to introduce the first GPU, or Graphics
Processing Unit. In the 1999-2000 timeframe, computer scientists, along with researchers in
fields such as medical imaging and electromagnetics, started using GPUs to run general-purpose
computational applications. They found that the excellent floating-point performance of GPUs
led to a huge performance boost for a range of scientific applications. This was the advent of the
movement called GPGPU, or General-Purpose computing on GPUs.

The problem was that GPGPU required using graphics programming languages like OpenGL and
Cg to program the GPU. Developers had to make their scientific applications look like graphics
applications and map them onto problems that drew triangles and polygons. This limited the
accessibility of the GPU’s tremendous performance for science.

NVIDIA realized the potential of bringing this performance to the larger scientific community
and decided to invest in making the GPU fully programmable for scientific applications, adding
support for high-level languages like C and C++. This led to the CUDA architecture for the
GPU.

GPGPU

General-purpose computing on graphics processing units (GPGPU) is the technique of using a
GPU, which typically handles computation only for computer graphics, to perform computation
in applications traditionally handled by the CPU.

Because of their nature, GPUs are only effective at tackling problems that can be solved using
stream processing, and the hardware can only be used in certain ways.

CUDA
CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing
architecture developed by NVIDIA. CUDA is the computing engine in NVIDIA graphics
processing units (GPUs) that is accessible to software developers through industry-standard
programming languages. Programmers use ‘C for CUDA’ (C with NVIDIA extensions),
compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU.
The CUDA architecture shares a range of computational interfaces with two competitors: the
Khronos Group’s Open Computing Language (OpenCL) and Microsoft’s DirectCompute.
Third-party wrappers are also available for Python, Fortran, Java and MATLAB.

The latest drivers all contain the necessary CUDA components. CUDA works with all NVIDIA
GPUs from the G8X series onwards, including the GeForce, Quadro and Tesla lines. NVIDIA
states that programs developed for the GeForce 8 series will also work without modification on
all future NVIDIA video cards, due to binary compatibility. CUDA gives developers access to
the native instruction set and memory of the parallel computational elements in CUDA GPUs.
Using CUDA, the latest NVIDIA GPUs effectively become open architectures like CPUs. Unlike
CPUs, however, GPUs have a parallel "many-core" architecture that can run thousands of threads
simultaneously; if an application is suited to this kind of architecture, the GPU can offer large
performance benefits.

Advantages of CUDA
CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU)
using graphics APIs:

 Scattered reads – code can read from arbitrary addresses in memory.

 Shared memory – CUDA exposes a fast shared memory region (16 KB per multiprocessor)
that can be shared amongst threads. This can be used as a user-managed cache, enabling
higher bandwidth than is possible using texture lookups.

 Faster downloads and readbacks to and from the GPU.

 Full support for integer and bitwise operations, including integer texture lookups.

CUDA Programming Model

 Parallel code (a kernel) is launched and executed on the device by many threads.

 Threads are grouped into thread blocks.

 Parallel code is written for a single thread.

 Each thread is free to execute a unique code path.

 Built-in thread and block ID variables identify each thread, as in the sketch below.
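
A minimal sketch tying these points together (the kernel name and block size are illustrative): the kernel is written from the point of view of a single thread, and the built-in blockIdx, blockDim and threadIdx variables give every thread a unique element to work on.

    // Kernel: the same code runs in every thread of the launch.
    __global__ void vec_add(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    // Host-side launch: n threads grouped into thread blocks of 256.
    // (a, b and c must already point to device memory.)
    void launch_vec_add(const float *a, const float *b, float *c, int n)
    {
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
        vec_add<<<blocks, threadsPerBlock>>>(a, b, c, n);
    }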

CUDA Architecture
The CUDA architecture consists of several components:

 Parallel compute engines inside NVIDIA GPUs

 OS kernel-level support for hardware initialization, configuration, etc.

 User-mode driver, which provides a device-level API for developers

 PTX instruction set architecture (ISA) for parallel computing kernels and functions

Tesla 10-Series
CUDA computing with the Tesla T10:

 240 single-precision (SP) processors at 1.45 GHz: roughly 1 TFLOPS peak
(240 × 1.45 GHz × 3 floating-point operations per clock ≈ 1,044 GFLOPS)

 30 double-precision (DP) processors at 1.44 GHz: roughly 86 GFLOPS peak
(30 × 1.44 GHz × 2 operations per clock ≈ 86.4 GFLOPS)

 128 threads per processor: 30,720 threads in total (240 × 128)

Tesla 10-Series Architecture

 240 thread processors execute kernel threads.

 30 multiprocessors, each of which contains:

   8 thread processors

   one double-precision unit

   shared memory that enables thread cooperation

Thread Hierarchy
 Threads launched for a parallel section are partitioned into thread blocks.

 A grid is the set of all blocks for a given launch.

 A thread block is a group of threads that can:

   synchronize their execution

   communicate via shared memory, as in the sketch below
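
The sketch below shows both abilities at once (names are illustrative, and the block size is assumed to be 256 threads): each block cooperatively sums its slice of the input through shared memory, synchronizing with __syncthreads() between steps.

    // Each block sums 256 consecutive elements of 'in' and writes one
    // partial sum per block to 'out'. Launch with 256 threads per block.
    __global__ void block_sum(const float *in, float *out)
    {
        __shared__ float buf[256];          // visible to all threads in this block
        int tid = threadIdx.x;
        buf[tid] = in[blockIdx.x * blockDim.x + tid];
        __syncthreads();                    // wait until every thread has stored

        // Tree reduction within the block.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                buf[tid] += buf[tid + stride];
            __syncthreads();                // finish each step before the next
        }

        if (tid == 0)
            out[blockIdx.x] = buf[0];       // thread 0 writes the block's result
    }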


Execution Model
Thread blocks from a kernel launch are distributed across the GPU’s multiprocessors; each
multiprocessor executes the threads of its resident blocks directly in hardware, so thread creation
and scheduling cost essentially nothing.

Warps and Half-Warps
A multiprocessor executes threads in groups of 32 called warps, which run in lockstep on its
thread processors. A half-warp (16 threads) is the granularity at which the Tesla 10-series
hardware coalesces memory accesses, which is why contiguous, aligned access patterns matter
for performance.

GPU Memory Allocation / Release


Host (CPU) code manages device (GPU) memory through the CUDA runtime calls below; a
usage sketch follows the list:

 cudaMalloc(void **pointer, size_t nbytes)

 cudaMemset(void *pointer, int value, size_t count)

 cudaFree(void *pointer)
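
A minimal usage sketch of these calls (the size is illustrative); note that only the pointer variable lives on the CPU, while the memory it points to lives on the GPU.

    #include <cuda_runtime.h>

    int main(void)
    {
        int n = 1024;                         // illustrative element count
        size_t nbytes = n * sizeof(float);
        float *d_data = 0;                    // will receive a device pointer

        cudaMalloc((void **)&d_data, nbytes); // allocate GPU memory
        cudaMemset(d_data, 0, nbytes);        // initialize it to zero

        /* ... kernel launches that read and write d_data ... */

        cudaFree(d_data);                     // release GPU memory
        return 0;
    }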

Next Generation CUDA Architecture


The next-generation CUDA architecture, codenamed "Fermi", is the most advanced GPU
architecture NVIDIA has built to date. Its features include:

 3.2 billion transistors

 512 CUDA cores, with optimized performance and accuracy and up to 8x faster double
precision

 NVIDIA Parallel DataCache technology – the first GPU architecture to support a true cache
hierarchy in combination with on-chip shared memory

 NVIDIA GigaThread engine – increased efficiency through concurrent kernel execution

 ECC support – detects and corrects errors before they affect the system

Applications
 Accelerated rendering of 3D graphics
 Video Forensics
 Molecular Dynamics
 Computational Chemistry
 Life Sciences
 Bioinformatics
 Electrodynamics
 Medical Imaging
 Oil and Gas
 Weather and Ocean Modeling
 Electronic Design Automation
 Video Imaging
 Video Acceleration

Why should I use a GPU as a processor?

 When compared to the latest quad-core CPU, Tesla 20-series GPU computing processors
deliver equivalent performance at 1/20th the power consumption and 1/10th the cost.

 A representative computational fluid dynamics problem takes:

   9 minutes on a Tesla S870 (4 GPUs)

   12 hours on one 2.5 GHz CPU core

 Raw throughput tells the same story: the Intel Core i7-980XE reaches 107.6 GFLOPS in
double precision, while NVIDIA’s Tesla C1060 GPU computing card performs around
933 GFLOPS in single precision (78 GFLOPS in double precision) and AMD’s Hemlock XT
5970 reaches 4,640 GFLOPS in single precision (928 GFLOPS in double precision).

 Peak throughput of some earlier processors:

   GeForce 8800 GTX – 346 GFLOPS

   Radeon HD 2900 XT – 475 GFLOPS

   PS3 Cell – 154 GFLOPS

   Core 2 Duo E6600 – 38 GFLOPS

   Athlon 64 X2 4600+ – 19 GFLOPS

 After all, it’s a supercomputer.

Bibliography
The data has been collected from:

 Wikipedia

 Nvidia.com

 Google

 Intel.com
