Unit 5 - CUDA Architecture
College of Engineering
Subject: High Performance Computing
Unit 5
CUDA Architecture
By- Prof. Gunjan Deshmukh
SNJB’s KBJ CoE | Civil | Computer | E&TC | Mechanical | AIDS | MBA | Visit Us @: www.snjb.org
SNJB’s Late Sau. K. B. J. College of Engineering
Syllabus
1. Introduction to GPU
2. Introduction to GPU Architecture overview
3. Introduction to CUDA C - CUDA programming model
4. Write and launch a CUDA kernel
5. Handling Errors
6. CUDA memory model
7. Manage communication and synchronization
8. Parallel programming in CUDA C.
Introduction to GPU
● GPU stands for Graphics Processing Unit, which is a specialized type of processor that is
designed to handle complex graphical computations.
● Originally, GPUs were used primarily for gaming and video rendering, but in recent years
they have become increasingly important for scientific computing, machine learning, and
other types of data processing tasks.
● A GPU consists of hundreds or thousands of small processing cores, which together can
execute many calculations simultaneously.
● In addition to their processing power, GPUs also have specialized memory and
communication systems that allow them to quickly transfer data between the processor
cores and other components of a computer system.
Introduction to GPU Architecture - CUDA
● CUDA (Compute Unified Device Architecture) is a parallel computing platform and
application programming interface (API) that allows software to use certain types of
graphics processing units (GPUs) for general-purpose processing.
● CUDA is a software layer that gives direct access to the GPU's virtual instruction set and
parallel computational elements, for the execution of compute kernels.
● CUDA is designed to work with programming languages such as C, C++, and Fortran.
● CUDA-powered GPUs also support programming frameworks such as OpenMP, OpenACC
and OpenCL by compiling such code to CUDA.
● CUDA was created by Nvidia.
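As a minimal sketch (not taken from the slides) of what the C-language extensions look like, the hypothetical example below defines a kernel with the `__global__` qualifier and launches it with CUDA's `<<<...>>>` syntax; the file name and thread counts are illustrative choices.

```cuda
#include <cstdio>

// __global__ marks a function as a kernel: called from the host, run on the GPU.
__global__ void hello_kernel(void) {
    // threadIdx.x is this thread's index within its block.
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main(void) {
    // Launch 1 block of 4 threads; <<<blocks, threads>>> is CUDA's launch syntax.
    hello_kernel<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel to finish before exiting
    return 0;
}
```

On a machine with an NVIDIA GPU, this would be compiled with `nvcc hello.cu -o hello`.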
CUDA Programming Model for HPC Architecture
The CUDA programming model consists of several key components:
● Host code: The host code is the main program that runs on the CPU. It is responsible for
managing the overall operation of the system, including launching kernels on the GPU,
managing memory, and communicating with other system components.
● Device code: The device code is the program that runs on the GPU. It is written in CUDA C
or CUDA C++, extensions of the C and C++ languages with additional features for
programming GPUs.
● Threads: Threads are individual processing units that run on the GPU. Each thread executes
a single instance of a kernel, and multiple threads can run simultaneously on the GPU.
● Kernels: Kernels are small programs that run on the GPU. They are launched from the host
code and are responsible for performing the actual computations.
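The four components above can be seen together in one short vector-addition sketch (a hypothetical example, assuming a machine with an NVIDIA GPU; all names are illustrative): the host code manages memory and launches the kernel, the device code is the kernel itself, and each thread handles one array element.

```cuda
#include <cstdio>
#include <cstdlib>

// Device code: the kernel each GPU thread executes on one element.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Host code: allocate and initialise input arrays on the CPU.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Memory management: allocate device buffers and copy inputs to the GPU.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Kernel launch: enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back to the host and inspect one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[10] = %.1f\n", h_c[10]);  // h_a[10] + h_b[10] = 10 + 20 = 30

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```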
Write and launch a CUDA kernel