HPC-FINAL

FACULTY OF ENGINEERING & TECHNOLOGY
Subject Name: High Performance Computing Lab

Subject Code: 203105430
B. Tech – 3rd Year 6th Semester
Experiment: - 1
Objective : - Study the facilities provided by Google Colab.
What is Google Colab:

Google Colab, short for Google Colaboratory, is a free cloud-based platform
provided by Google that allows you to write and execute Python code in a
collaborative environment. It is particularly popular among researchers, data
scientists, and machine learning practitioners. Google Colab provides a Jupyter
Notebook-like interface where you can create and share documents that contain
live code, equations, visualizations, and narrative text.
What is the use of Google Colab:

Google Colab has several uses, making it a versatile tool for various purposes.
Here are some of its most common applications:
➢ Machine Learning: -
• Training models: Colab's free access to powerful GPUs and TPUs makes it ideal for
training computationally intensive machine learning models. You can experiment with
different algorithms and hyperparameters easily without needing a powerful local machine.
• Developing and testing ML applications: Colab allows you to build and test your machine
learning applications right within the browser. This makes it a quick and convenient way to
iterate and debug your code.
➢ Data Science: -
• Data analysis and visualization: Colab offers various libraries for data analysis and
visualization, including NumPy, Pandas, and Matplotlib. This allows you to explore and
analyze your data sets without installing any software.
• Data cleaning and processing: Colab provides tools for cleaning and preprocessing your
data before analysis, making it ready for model training or further exploration.
➢ Education: -
• Interactive tutorials and notebooks: Colab enables educators to create interactive tutorials
and notebooks that combine code, text, and visualizations. This makes learning more
engaging and interactive for students.
• Teaching programming: Colab can be used to teach programming languages like Python,
especially for beginners, as it requires no setup and provides a user-friendly interface.
• Sharing educational resources: Educators can share their Colab notebooks with other
instructors and students, facilitating the dissemination of educational resources.
2203031249007
➢ General use: -
• Running scripts and code snippets: Colab can be used to run simple scripts and code
snippets without needing a local development environment.
• Exploring libraries and frameworks: You can use Colab to explore new libraries and
frameworks without installing them on your local machine.
• Developing web applications: Colab supports various web development libraries and
frameworks, allowing you to build and experiment with web applications directly in the
browser.
List of Features of Google Colab:

• Free access to powerful GPUs and TPUs:
1. Colab provides free access to Google's cloud infrastructure, including GPUs and TPUs.
2. This makes it ideal for tasks like training machine learning models and running
computationally intensive simulations.
• Interactive environment:
1. Colab allows you to combine code, text, images, and more in a single document.
2. This makes it easy to experiment and explore ideas.
3. You can also easily share your notebooks with others, making Colab a great tool for
collaboration.
• Easy to share:
1. You can easily share your Colab notebooks with others.
2. This makes it a great tool for collaboration.
3. You can also share your notebooks publicly, so that others can see your work and learn
from it.
• Wide range of libraries:

1. Colab comes pre-installed with a wide range of Python libraries, including TensorFlow,
PyTorch, and NumPy.
2. This means that you can get started with machine learning and data science right away.
3. You can also install additional libraries as needed.
• Supports multiple languages:
1. Python is Colab's primary language, but shell commands can be run from a cell with the !
prefix.
2. This means that programs in other languages, such as C or CUDA, can be compiled and run
from a notebook, as the later experiments in this file do.
• Preemptible VMs:
1. Colab runtimes behave like preemptible VMs, that is, virtual machines that can be stopped
at any time to make room for other users.
2. They are therefore best suited to short-lived jobs, and long-running work should be saved
regularly.
• Customizable runtime:
1. You can customize the runtime environment for your Colab notebook.
2. This means that you can install specific libraries and dependencies that you need for your
project.
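• Offline support:
1. Colab itself needs an internet connection, but you can download a notebook (.ipynb file)
to your local machine.
2. The downloaded notebook can be opened in a local Jupyter installation, so you can keep
working on your project even if you don't have an internet connection.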
• Version control:
1. Colab integrates with Google Drive and keeps a revision history for your notebooks.
2. This means that you can easily track changes to your notebooks and revert to previous
versions if necessary.
• Automatic backups:
1. Colab automatically backs up your notebooks to Google Drive.
2. This means that you don't have to worry about losing your work if your computer crashes
or you accidentally delete your notebook.
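Python program
Code: -
g = int(input("ENTER A NUMBER TO CHECK : "))
k = 0  # number of divisors found so far
# Try every candidate divisor from 1 up to g itself
for i in range(1, g + 1):
    if g % i == 0:
        print(i)
        k = k + 1
# A prime number has exactly two divisors: 1 and itself
if k == 2:
    print("IT IS A PRIME NUMBER ")
else:
    print("IT IS NOT A PRIME NUMBER")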
OUTPUT:
C – PROGRAM
CODE:
#include <stdio.h>
int main()
{
    float a, b, average;
    printf("Enter the first number: ");
    scanf("%f", &a);
    printf("Enter the second number: ");
    scanf("%f", &b);
    /* The average of two numbers is half of their sum */
    average = (a + b) / 2;
    printf("The average of %.2f and %.2f is %.2f\n", a, b, average);
    return 0;
}
OUTPUT:
Experiment -2
Objective: - Demonstrate basic Linux Commands.
Linux Commands: -
1. PWD:
• pwd - Print Working Directory.
• The pwd command is used to display the location of the current working directory.
• pwd shows the full (absolute) path.
Syntax: pwd
Output:
2. LS:
• ls - List
• The ls command is used to display a list of content of a directory.
• Lists files and directories in the specified directory.
• Common options include -l for detailed listing and -a for showing hidden files.
Syntax: ls [options] [directory] or ls
Output:
3. CD:
• cd - Change Directory.
• The cd command is used to change the current directory.
• Use cd with no argument to go to the home directory.
• Use cd .. to move up one level.
Syntax: cd [directory]
Output:
4. MKDIR:
• mkdir - Make Directory
• Creates a new directory with the specified name.
• The mkdir command is used to create a new directory under any directory.
Syntax: mkdir <directory>
Output:
5. RMDIR:
• rmdir - Remove Directory
• The rmdir command is used to delete an empty directory.
• In other words, you can use this command to delete a particular directory, provided it
contains no files.
Syntax: rmdir <directory>

Output:
6. TOUCH:
• The touch command is to files what the mkdir command is to directories.
• The touch command is used to create empty files.
• We can create multiple empty files by passing several file names in one invocation.
Syntax: touch <filename>

Output:
7. CAT:
• The cat command is a multi-purpose utility tool in the Linux system.
• It can be used to create a file, display content of the file, copy the content of one
file to another file, and more.
• This is much quicker than opening the file in an editor.
Syntax: cat <filename>

Output:
8. RM:
• rm - remove
• The rm command is used to remove a file.
• It deletes items permanently and, by default, without asking for confirmation.
• Keep a backup of all critical documents at all times.
Syntax: rm [options] <filename/directory>
Output:
9. CP:
• cp - copy
• The cp command is used to copy a file or directory.
• The first argument specifies the source file or directory to be copied.
• The second argument specifies the destination where the source will be copied.
Syntax: cp <existing file name> <new file name>

Output:
10.MV:
• mv - move
• The mv command is used to move a file or a directory from one location to
another location.
• The base file name is retained when you relocate a file or directory to another directory.
• The mv command reorganizes folders by transferring a file or directory from one directory
to another.
Syntax: mv <source> <destination>

Output:
11.HEAD:
• The head command is used to display the beginning of a file.
• By default, it displays the first 10 lines of a file.
• With the -n option, head prints the first N lines of the provided input.
• If more than one file name is given, data from each file is preceded by its file name.
Syntax: head [options] <filename>

Output:
12.TAIL:
• The tail command is similar to the head command.
• The difference between the two commands is that tail displays the last ten lines of the
file content.
• It is useful for reading the most recent error messages, for example at the end of a log file.
Syntax: tail [options] <filename>
Output:
13.TAC:
• The tac command is the reverse of the cat command, as its name suggests.
• It displays the file content in reverse order (from the last line).
• -b, --before: attach the separator before each line instead of after it.
• -r, --regex: treat the separator string as a regular expression.
Syntax: tac [options] <filename>
Output:
14.DATE:
• The date command shows the day of the week, month, day, and time.
• The date command is versatile and can be used in scripts and on the command line to
generate timestamps, automate tasks based on dates, and perform various other
date-related operations.
Syntax: date
Output:
15.CAL:
• The cal command is used to display a simple calendar. It shows a calendar for the
current month or a specified month and year.
• If you run the cal command without any options, it will display the calendar for
the current month.
• The cal command provides a quick and simple way to view calendars in the terminal.
• For more advanced calendar operations, you might consider using other tools or
graphical calendar applications
Syntax: cal
Output:
16.ECHO:
• The echo command is used to display a line of text. It prints its arguments to the
standard output.
• It is commonly used in shell scripts to print messages and the values of variables.
Syntax: echo [options] [string]
Output:
17.CHMOD:
• The chmod command lets you change the mode of a file (permissions) quickly. It
has a lot of options available with it.
• The basic permissions a file can have are:
1. r (read)
2. w (write)
3. x (execute)
• One of the most common use cases for chmod is to make a file executable by the
user. To do this, type chmod and the flag +x, followed by the file you want to
modify permissions on:
Syntax: chmod <permissions> <file>

Output:
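To make the permission bits concrete, the following is a minimal Python sketch of what chmod does to a file's mode. It is an illustration only, not part of the lab procedure: the temporary file stands in for a hypothetical script, and the sketch adds the execute bit for the owner only (the equivalent of chmod u+x), which is narrower than a plain +x.

```python
import os
import stat
import tempfile

# Create a throwaway file (a hypothetical script) to change permissions on.
fd, script_path = tempfile.mkstemp(suffix=".sh")
os.close(fd)

os.chmod(script_path, 0o644)  # start from rw-r--r--
before = stat.filemode(os.stat(script_path).st_mode)

# Equivalent of `chmod u+x`: add the owner's execute bit to the current mode.
mode = os.stat(script_path).st_mode
os.chmod(script_path, mode | stat.S_IXUSR)
after = stat.filemode(os.stat(script_path).st_mode)

print(before, "->", after)
os.remove(script_path)
```

The r, w, and x letters in the printed strings correspond to the read, write, and execute permissions listed above, repeated for the owner, the group, and others.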
18.UNAME:
• The uname command is used to check OS information about the system; uname -a prints all of it.
• Check out the command and the output below.
• The command ‘uname‘ displays the information about the system
Syntax: uname
Output:
19.WC:
• The wc command can accept zero or more input FILE names. If no FILE is
specified, or when FILE is -, wc will read the standard input. A word is a string of
characters delimited by a space, tab, or newline.
• In its simplest form, when used without any options, the wc command will print
four columns: the number of lines, words, and bytes, and the name of the file, for
each file passed as an argument. When using the standard input, the fourth column
(filename) is not displayed.
Syntax: wc [OPTION]... [FILE]...
Output:
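As a rough illustration of the three counts described above, here is a minimal Python sketch that mimics the default columns of wc. The helper name wc and the sample file content are assumptions made for this example; the real command handles many more cases (multiple files, options, character encodings).

```python
import os
import tempfile

def wc(path):
    """Return (lines, words, bytes) for a file, mirroring wc's default columns."""
    with open(path, "rb") as f:
        data = f.read()
    lines = data.count(b"\n")   # wc counts newline characters
    words = len(data.split())   # split() with no argument splits on runs of whitespace
    return lines, words, len(data)

# Usage: write a small temporary file (hypothetical content) and count it.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"hello world\nhigh performance computing\n")
    sample_path = tmp.name

counts = wc(sample_path)
print(*counts, sample_path)
os.remove(sample_path)
```

For the sample content the sketch reports 2 lines, 5 words, and 39 bytes, followed by the file name, in the same column order that wc uses.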
20.WHOAMI:
• You can use the whoami command in shell scripts to check the name of the user running the
script.
• For example, a script can use an if statement to compare the name of the user running the
script with a given string.
• whoami does not accept arguments. If you pass an argument, the command prints an error
message.
Syntax: whoami
Output:
21.SLEEP:
• The sleep command pauses execution for a specified amount of time.
• It is commonly used in shell scripts to add a delay between commands.
• To use it, follow the syntax below; the example pauses for 5 seconds.
Syntax: sleep 5
Output:
22. VI:
• The default editor that comes with the UNIX operating system is called vi
(visual editor).
• Using vi editor, we can edit an existing file or create a new file from scratch.
• We can also use this editor to just read a text file. The advanced version of the vi
editor is the vim editor.
Syntax: vi [file_name]
Output:
23. ID:
• The id command is used to display the user ID (UID) and group ID (GID).
• The id command in Linux is used to display user and group information for
the current user or a specified username or user ID.
• -u, --user: Display only the user ID.
• -g, --group: Display only the effective group ID
Syntax: id
Output:
24.EXIT:
• The exit command terminates a script, just as in a C program. It can also return a
value, which is available to the script's parent process.
• Issuing the exit command at the shell prompt will cause the shell to exit.
• In some shells, common aliases for exit include "bye", "logout", and "lo". Every command
returns an exit status.
Syntax: exit
Output:
25.CLEAR:
• clear is a standard Unix computer operating system command that is used to clear the
terminal screen.
• This command will ignore any command-line parameters that may be present. Also,
the clear command doesn’t take any argument and it is almost similar to cls command
on a number of other Operating Systems
• This command first looks for a terminal type in the environment and after that, it
figures out the terminfo database for how to clear the screen
Syntax: clear
Output:
26.TIME:
• The time command is used to display the time to execute a command.
• The time command in Linux is a useful tool for measuring the execution time of
a command or script.
• It provides detailed information about the real, user, and system time consumed by the
executed command
Syntax: time <command>
Output:
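The real, user, and system figures can be illustrated with a minimal Python sketch that times a child command in roughly the way the shell's time does. This is a sketch under stated assumptions: the helper name time_command is hypothetical, the child command is an arbitrary Python one-liner, and the children_user and children_system fields of os.times() are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time

def time_command(argv):
    """Run argv and return (real, user, sys) seconds, similar to the shell's time."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    subprocess.run(argv, check=True)
    wall_after = time.perf_counter()
    cpu_after = os.times()
    real = wall_after - wall_before                                     # wall-clock time
    user = cpu_after.children_user - cpu_before.children_user           # CPU time in user mode
    system = cpu_after.children_system - cpu_before.children_system     # CPU time in the kernel
    return real, user, system

real, user, system = time_command([sys.executable, "-c", "sum(range(10**6))"])
print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

Real time is what a stopwatch would show, while user and system time count only the CPU time the command itself consumed, so real is usually the largest of the three for a single-threaded command.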
27. DF:
• The df command is used to display the disk space used in the file system.
• It displays the output as in the number of used blocks, available blocks, and
the mounted directory.
• The displayed disk space usage may not be entirely accurate in some cases, especially
if file system metadata is not updated promptly
Syntax: df
Output:
28.PING:
• The PING (Packet Internet Groper) command is used to check the network
connectivity between a host and a server or another host.
• The command takes an IP address or host name as input and sends ICMP (Internet Control
Message Protocol) echo request packets to that address. If the host is available, it
answers with an ICMP echo reply. The time between request and reply is recorded and is
called latency.
• A low ping time (low latency) means a more responsive connection. Ping time is generally
measured in milliseconds, and every modern operating system has ping pre-installed.
Syntax: ping <host>
Output:
29.HISTORY:
• The history command in Linux is a built-in shell tool that maintains a record
of commands executed in the current terminal session.
• It allows users to view, search, and reuse previously executed commands, saving
time and effort.
• Lists previously run commands
Syntax: history
Output:
30.PS:
• The ps command lists the processes running in the current shell. The output includes
information about the shell (bash) and the process running in this shell (ps, the command
that you typed).
• The four columns are labeled PID, TTY, TIME, and CMD.
• PID - The process ID. Usually, when running the ps command, the most important
information the user is looking for is the process PID. Knowing the PID allows you to
kill a malfunctioning process.
• TTY - The name of the controlling terminal for the process.
• TIME - The cumulative CPU time of the process, shown in hours, minutes, and seconds.
• CMD - The name of the command that was used to start the process.
Syntax: ps
Output:
PRACTICAL -4
AIM: Write a program on an unloaded cluster for several different numbers of nodes and
record the time taken in each case. Draw a graph of execution time against the number of nodes.
➢ WHAT IS HPC CLUSTER?

High-Performance Computing (HPC) clusters are powerful computing systems that consist of
multiple interconnected computers, or nodes, working together to solve complex computational
problems. These clusters are designed to provide significantly higher computational power and speed
compared to individual computers.
➢ HOW TO BUILD AN HPC CLUSTER?

Building an HPC (High-Performance Computing) cluster involves several steps, and the
process can be complex. Here's a general overview of the steps you might take:
Define Requirements:
Identify the specific computational requirements of your applications or simulations.
Determine the desired level of performance, scalability, and parallel processing capability.
Select Hardware:
Choose high-performance hardware components, including processors, memory, accelerators
(such as GPUs or FPGAs), interconnects, and storage. Consider factors like power
consumption, cooling, and physical space constraints.
Networking:
Select a high-speed interconnect technology (e.g., InfiniBand, Ethernet) for efficient
communication between nodes. Plan the network topology to minimize latency and maximize
bandwidth.
Cluster Architecture:
Decide on the cluster architecture, such as symmetric or asymmetric, and the arrangement of
nodes. Consider the use of a master node for managing the cluster and coordinating tasks.
Operating System and Software:
Choose a suitable operating system for the cluster (common choices include Linux
distributions tailored for HPC, such as CentOS, Ubuntu, or specialized HPC-centric OS).
Install necessary software tools, libraries, and middleware for parallel processing and job
scheduling (e.g., MPI for message passing, Slurm or Torque for job scheduling).
File System:
Implement a high-performance parallel file system that can handle the I/O demands of your
applications. Consider distributed or parallel file systems like Lustre or GPFS.
Cluster Management:
Install cluster management software for monitoring, resource allocation, and job scheduling.
Popular choices include OpenHPC, Bright Cluster Manager, or custom configurations using
tools like Puppet or Ansible.
Power and Cooling:
Ensure adequate power and cooling infrastructure to support the cluster, especially
considering the high power consumption of HPC systems.
Testing:
Perform thorough testing to validate the functionality and performance of the HPC
cluster. Test parallel processing capabilities, network communication, and overall system
stability.
Optimization:
Fine-tune the cluster configuration based on performance benchmarks and application
requirements. Optimize the software stack, parallelization strategies, and resource allocation.
Documentation:
Create comprehensive documentation for the cluster setup, configuration, and maintenance
procedures.
Training:
Provide training for administrators and users on how to effectively use and manage the HPC
cluster.
Scaling:
Plan for future scalability by designing the cluster with expansion capabilities, enabling the
addition of more nodes as needed.
“Building an HPC cluster requires expertise in hardware, networking, system administration,
and parallel programming. Depending on your specific needs and resources, you may also
consider seeking assistance from HPC specialists, consultants, or vendors with experience in
cluster deployment.”
➢ KEY COMPONENT OF HPC CLUSTER
Computer Hardware:
- Includes servers, storage, and a dedicated network.
- Provision at least three servers for primary, worker, and client nodes.
- High-end servers with ample processors and storage are essential.
- Networking infrastructure requires high-bandwidth TCP/IP equipment like Gigabit Ethernet
Software:
- Comprises tools for monitoring, provisioning, and managing the cluster.
- Software stacks include libraries, compilers, debuggers, and file systems.
- HPC frameworks like Hadoop offer fault-tolerance and automatic system redirection
Facilities:
- Physical space to hold racks of servers.
- Power capacity up to 43 kW to operate and cool the servers.
- Networking gear such as NICs and switches are crucial for the cluster
CODE: Draw a graph of execution time against the number of nodes
import time
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
X = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 0], [4, 4], [4, 5], [0, 2], [5, 5]]
#Nilesh=6B5
# Note: n is the number of K-Means clusters, used here as a stand-in for the node count
nodes = [1, 2, 3, 4, 5]
time_taken = []
for n in nodes:
    start_time = time.time()
    kmeans = KMeans(n_clusters=n, n_init=10)
    kmeans.fit(X)
    end_time = time.time()
    # Record the elapsed time of this fit
    time_taken.append(end_time - start_time)
plt.plot(nodes, time_taken)
plt.xlabel('Number of nodes')
plt.ylabel('Time Taken')
plt.title('Time taken vs Number of Nodes')
plt.show()
OUTPUT
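The listing above varies the number of K-Means clusters rather than the number of compute nodes. As a complementary illustration of the aim, the following is a minimal sketch that times the same CPU-bound job with different numbers of worker processes on one machine. It rests on stated assumptions: worker processes stand in for cluster nodes, the helper names (chunk_bounds, partial_sum, timed_run) and the problem size are hypothetical, and the timings will depend on how many CPU cores the runtime actually provides.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def chunk_bounds(total, parts):
    """Split the range [0, total) into `parts` nearly equal (start, stop) pairs."""
    step, extra = divmod(total, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def partial_sum(bounds):
    """CPU-bound work for one worker: sum of squares over [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

def timed_run(total, workers):
    """Return (result, seconds) for the full sum computed by `workers` processes."""
    t0 = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        result = sum(pool.map(partial_sum, chunk_bounds(total, workers)))
    return result, time.perf_counter() - t0

if __name__ == "__main__":
    TOTAL = 2_000_000  # hypothetical problem size
    for workers in [1, 2, 3, 4]:
        _, seconds = timed_run(TOTAL, workers)
        print(f"{workers} worker(s): {seconds:.3f} s")
```

The printed times can be collected into a list and plotted against the worker counts with the same plt.plot call used in the listing above. On a machine with only one or two cores, adding workers may not reduce the time, because process start-up cost can outweigh the gain.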
EXPERIMENT -5
OBJECTIVE: Write a program to check task distribution using Gprof.
What are profilers: Profiling is the process of measuring the execution time of
a program and identifying where the program spends most of its time. The gprof
tool is typically associated with the GNU Compiler Collection (GCC) and is
available on many Unix-like operating systems.
What is Gprof: gprof is a profiling tool that is commonly used in the context of
software development, particularly with programs written in languages like C
and C++. The tool is distributed with GNU Binutils, is normally used together
with the GNU Compiler Collection (GCC), and is available on many Unix-like
operating systems.
The primary purpose of gprof is to provide insights into the execution time of a
program, helping developers identify performance bottlenecks and optimize
their code
Features: gprof is a profiling tool that provides developers with insights into
the runtime behavior and performance characteristics of their programs. Here
are some of the key features and information that gprof can provide:
Flat Profile:
Summary of the time spent in each function.
Percentage of the total program execution time spent in each function.
Number of calls and self-time (exclusive time spent in the function) for each
function.
Call Graph:
A graphical representation of the function call relationships.
Displays which functions call other functions and the percentage of time spent
in each.
Execution Count:
Information about how many times each function was called during the
program's execution.
Time Distribution:
Shows the percentage of total program execution time spent in various
functions.
Helps identify functions that consume a significant portion of the runtime.
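Line-by-Line Profiling:
gprof's timings come from periodic sampling of the running program, so they are
statistical rather than cycle-accurate. With the -l option, and a program
compiled with debugging information (-g), gprof can report time at the level of
source lines instead of whole functions.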
Source Code Annotation:

gprof can annotate the source code with profiling information, helping
developers quickly identify which parts of their code are consuming the most
time.
Profile Data File (gmon.out):

The profiling information is stored in a file called gmon.out during program
execution. This file can be used for post-execution analysis.
Command-Line Options:
gprof provides various command-line options to customize the output and
analysis. For example, specifying the executable to analyze, setting the sorting
criteria for the output, and more.
Integration with GCC:
gprof is used in conjunction with GCC for profiling C and C++ programs: compiling
with the GCC option -pg adds the instrumentation that gprof relies on.
‘It's important to note that while gprof is a helpful tool for basic profiling, it
may have some limitations, and for more sophisticated profiling needs,
developers might explore other profiling tools or techniques. Additionally,
some modern development environments and IDEs provide integrated
profiling tools with graphical interfaces for a more user-friendly experience’
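The idea of a flat profile (number of calls and self time per function) can be tried out without a C toolchain. The following minimal sketch uses Python's built-in cProfile and pstats modules as an analogue; it is not gprof, the function names inner and outer are hypothetical, and get_stats_profile() assumes Python 3.9 or later.

```python
import cProfile
import pstats

def inner(n):
    """Small CPU-bound helper."""
    return sum(i * i for i in range(n))

def outer():
    """Calls inner() five times, so the profile shows a caller and a callee."""
    return [inner(20_000) for _ in range(5)]

profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(5)  # flat profile, top 5 entries by self time

# Per-function numbers, comparable to gprof's "calls" and "self seconds" columns.
profile = stats.get_stats_profile()
inner_calls = int(profile.func_profiles["inner"].ncalls)
print("inner() calls:", inner_calls)
```

In the printed table, tottime plays the role of gprof's self time (time spent in the function itself) and cumtime includes the time spent in the functions it calls.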
Simple code:
C-Program
%%writefile add.c
// C program to demonstrate decimal to hexadecimal
// conversion using the format specifier
#include <stdio.h>
int main()
{
    int decimalNumber = 45;
    // printing hexadecimal number
    // using format specifier %X
    printf("Hexadecimal number is: %X", decimalNumber);
    return 0;
}
Code to compile (the -pg flag adds profiling instrumentation)

!gcc -Wall -pg add.c -o add
Code to run the program (running it writes the gmon.out file)

!./add
Code to check that the gmon.out file was generated

!ls
Code to run gprof on the output

!gprof add
OUTPUT for Simple code:
Complex code:
C-PROGRAM
%%writefile add.c
#include <stdio.h>
int main() {
    long long sum = 0; // Using long long to handle large sums
    int limit = 100000000; // 10 crore (100,000,000)
    for (int i = 1; i <= limit; ++i) {
        sum += i;
    }
    //Nilesh-2203031247022
    printf("Sum of the first 10 crore numbers: %lld\n", sum);
    return 0;
}
Code to compile (the -pg flag adds profiling instrumentation):

!gcc -Wall -pg add.c -o add
Code to run the program (running it writes the gmon.out file):

!./add
Code to check that the gmon.out file was generated

!ls
Code to run gprof on gmon.out

!gprof add
OUTPUT FOR COMPLEX CODE:
Experiment No: 6
AIM :- Use Intel V-Tune Performance Analyzer for Profiling.
Intel V-Tune Profiler:
Intel offers a performance profiling tool called Intel VTune Profiler. VTune Profiler is part of the Intel
oneAPI toolkit, and it is used for performance analysis and tuning of applications.
Use Intel VTune Profiler to analyze local and remote target systems from Windows*, macOS*, and
Linux* hosts. Improve application and system performance through these operations:
• Analyze algorithm choices.
• Find serial and parallel code bottlenecks.
• Understand where and how your application can benefit from available hardware resources.
• Speed up the execution of your application.
V-Tune Profiler is used to locate:
• The most time-consuming (hot) functions in your application and/or on the whole system
• Sections of code that do not effectively utilize available processor time
• The best sections of code to optimize for sequential performance and for threaded performance
• Synchronization objects that affect the application performance
• Whether, where, and why your application spends time on input/output operations
• Whether your application is CPU or GPU bound and how effectively it offloads code to the GPU
• The performance impact of different synchronization methods, different numbers of threads, or
different algorithms
• Thread activity and transitions
• Hardware-related issues in your code such as data sharing, cache misses, branch
misprediction, and others
Download Intel VTune Profiler to your system through one of these ways:
• Download the Standalone version.
• Get Intel VTune Profiler as part of the Intel® oneAPI Base Toolkit.
Understand the Workflow

Use Intel VTune Profiler to profile an application and analyze results for performance improvements. The
general workflow contains these steps
Select Your Host System to Get Started

Learn more about system-specific workflows for Windows* systems.
Get Started with Intel® VTune™ Profiler for Windows* OS

Before You Begin
1. Install Intel® VTune™ Profiler on your Windows* system.
2. Build your application with symbol information and in Release mode with all optimizations
enabled. For detailed information on compiler settings, see the VTune Profiler online user guide.
You can also use the matrix sample application available in \VTune\Samples\matrix. You can see
corresponding sample results in \VTune\Projects\sample (matrix).
3. Set up the environment variables: Run the \setvars.bat script.
4. By default, the installation directory for oneAPI components is Program Files (x86)\Intel\oneAPI.
Step 1: Start Intel® VTune™ Profiler

Start Intel VTune Profiler through one of these ways and set up a project. A project is a container for the
application you want to analyze, the type of analysis, and data collection results.
Step 2: Configure and Run Analysis

After creating a new project, the Configure Analysis window opens with these default values:
The Performance Snapshot analysis type is a good starting point for your profiling experience with Intel®
VTune™ Profiler. This analysis gives you a general overview of your application performance with
recommendations for areas to focus.
1. In the Launch Application section, browse to the location of your application executable file.
2. Click Start to run Performance Snapshot on your application. This analysis presents a general
Step 3: View and Analyze Performance Data

When data collection completes, VTune Profiler displays analysis results in the Summary window.
Here, you see a performance overview of your application. The overview typically includes several
metrics along with their descriptions.
Experiment No: 7
AIM :- Analyze the code using Nvidia-Profilers.
Nvidia Profilers:
NVIDIA profiling tools enable us to understand and optimize the performance of our CUDA, OpenACC or OpenMP
applications. The Visual Profiler is a graphical profiling tool that displays a timeline of your application’s CPU and
GPU activity, and that includes an automated analysis engine to identify optimization opportunities. The nvprof
profiling tool enables you to collect and view profiling data from the command-line.
Types of Nvidia Profilers:
1. Visual:
The NVIDIA Visual Profiler allows you to visualize and optimize the performance of your application. The
Visual Profiler displays a timeline of your application’s activity on both the CPU and GPU so that you can identify
opportunities for performance improvement. In addition, the Visual Profiler will analyze your application to detect
potential performance bottlenecks and direct you on how to take action to eliminate or reduce those bottlenecks.
2. Nvprof:
The nvprof profiling tool enables you to collect and view profiling data from the command-line. nvprof
enables the collection of a timeline of CUDA-related activities on both CPU and GPU, including kernel execution,
memory transfers, memory set and CUDA API calls and events or metrics for CUDA kernels. Profiling options are
provided to nvprof through command-line options. Profiling results are displayed in the console after the profiling
data is collected, and may also be saved for later viewing by either nvprof or the Visual Profiler.
3. Remote Profiling:
Remote profiling is the process of collecting profile data from a remote system that is different than the
host system at which that profile data will be viewed and analyzed. There are two ways to perform remote
profiling. You can profile your remote application directly from Nsight or the Visual Profiler. Or you can use
nvprof to collect the profile data on the remote system and then use nvvp on the host system to view and analyze
the data.
4. MPI Profiling:
nvprof has a built-in option that supports two MPI implementations - OpenMPI and MPICH. If you have
either of these installed on your system, you can use the --annotate-mpi option and specify your installed MPI
implementation.
5. MPS Profiling:
You can collect profiling data for a CUDA application using Multi-Process Service (MPS) with nvprof and
then view the timeline by importing the data in the Visual Profiler.
Code:
%%writefile addi.cu
#include <stdio.h>

// Kernel: runs on the GPU and stores a + b at the address c
__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main() {
    int a, b, c;
    int *dev_c;  // device (GPU) pointer for the result
    a = 3; b = 4;
    cudaMalloc((void**)&dev_c, sizeof(int));
    add<<<1, 1>>>(a, b, dev_c);  // launch 1 block of 1 thread
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d is %d\n", a, b, c);
    cudaFree(dev_c);
    return 0;
}
Creating the executable using

Command : !nvcc addi.cu -o addi
Output :
Command : !nvprof ./addi

nvprof is a command-line tool that allows you to profile CUDA applications and collect various performance
metrics. It can help you identify performance bottlenecks and optimization opportunities in your code.
./addi is the name of an executable file
PRACTICAL – 8
AIM: Load distribution on the GPU using CUDA
1. Create a new notebook
2. Set GPU as runtime type
3. Install Required Libraries using pip install

a) PyCUDA
b) NumPy
4. Import required Libraries
5. CUDA kernel code:
a) Write the CUDA kernel code, that is, the code that runs on the GPU.
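One common way to spread work over the GPU is a grid-stride loop, in which each thread starts at its global index and then steps forward by the total number of launched threads. The standalone sketch below is an illustration under assumptions, not the practical's own PyCUDA listing: the kernel name distribute, the array names, and the sizes (16 elements, 2 blocks of 4 threads) are all hypothetical.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: records which block and thread handled each element.
// Each thread processes elements start, start + stride, start + 2*stride, ...
__global__ void distribute(int *owner_block, int *owner_thread, int n)
{
    int start  = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    int stride = gridDim.x * blockDim.x;                 // total threads launched
    for (int i = start; i < n; i += stride) {
        owner_block[i]  = blockIdx.x;
        owner_thread[i] = threadIdx.x;
    }
}

int main(void)
{
    const int n = 16;                    // number of work items (assumed)
    const int blocks = 2, threads = 4;   // 8 threads share the 16 items
    int h_block[n], h_thread[n];
    int *d_block, *d_thread;

    cudaMalloc((void **)&d_block,  n * sizeof(int));
    cudaMalloc((void **)&d_thread, n * sizeof(int));

    distribute<<<blocks, threads>>>(d_block, d_thread, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(h_block,  d_block,  n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(h_thread, d_thread, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < n; i++)
        printf("element %2d -> block %d, thread %d\n", i, h_block[i], h_thread[i]);

    cudaFree(d_block);
    cudaFree(d_thread);
    return 0;
}
```

With these sizes every thread handles exactly two elements, so the load is evenly distributed. In the PyCUDA workflow of this practical, a kernel like this would be supplied as a source string and the arrays as NumPy buffers, rather than being compiled with nvcc.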
Experiment -9
Objective: - Write a simple CUDA program to print "Hello World!"
➢ What is CUDA:
• CUDA stands for Compute Unified Device Architecture.
• It is a parallel computing platform and an API (Application Programming Interface)
model developed by NVIDIA.
• It is not a separate programming language: it extends C/C++ with keywords and library
calls for running code on the Graphics Processing Unit (GPU).
• This allows computations to be performed in parallel across many GPU threads, which can
speed up data-parallel workloads considerably.
• Using CUDA, one can harness the power of the NVIDIA GPU to perform common computing
tasks, such as processing matrices and other linear algebra operations, rather than only
performing graphics calculations.
➢ Features of CUDA:
CUDA (Compute Unified Device Architecture) from NVIDIA offers several features that make it a
powerful tool for parallel computing using GPUs. Here are some key features of CUDA:
1. Parallel Processing Model:

• CUDA enables the use of parallel processing on NVIDIA GPUs.
• It allows developers to harness the computational power of GPU cores, which are designed for
parallel execution of tasks.
2. CUDA Kernels:
• Developers can write CUDA kernels, which are functions designed to be executed on the GPU.
• Kernels can be called from the CPU code and run in parallel across many GPU threads.
3. GPU Memory Management:
• Explicit control over GPU memory allows developers to allocate and deallocate memory on the
GPU.
• Efficient memory transfers between the CPU and GPU are crucial for performance optimization.
4. Thread Hierarchy:
• CUDA organizes parallel execution using a hierarchy of threads, blocks, and grids.
• Threads are grouped into blocks, and blocks are organized into a grid. This hierarchy allows for
flexible and scalable parallelism.
5. CUDA C/C++ Language Extensions:

• CUDA extends standard programming languages (C/C++) with GPU-specific features.
• Developers can write both CPU and GPU code within the same source file, using CUDA-specific
keywords and constructs.
6. CUDA Libraries:
• NVIDIA provides optimized libraries for various numerical and scientific tasks.
• Examples include cuBLAS for linear algebra, cuFFT for Fast Fourier Transforms, and cuDNN
for deep neural networks.
7. Unified Virtual Addressing (UVA):
• UVA allows developers to use a single address space for both CPU and GPU memory,
simplifying memory management.
8. Dynamic Parallelism:
• CUDA supports dynamic parallelism, allowing a GPU kernel to launch new GPU kernels.
• This feature is useful for algorithms with varying levels of parallelism.
9. Occupancy and Optimization:
• Tools and techniques are available for optimizing GPU code, including occupancy calculations
and performance analysis tools.
10. Compatibility:
• CUDA is compatible with a wide range of NVIDIA GPUs, ensuring that developers can target
various hardware architectures.
11. Ecosystem and Support:
• A rich ecosystem of tools, documentation, and community support is available for CUDA
developers.
• The CUDA Toolkit includes a compiler, debugger, profiler, and other development tools.
12. Multi-GPU Support:
• CUDA supports multi-GPU configurations, allowing developers to scale their applications across
multiple GPUs for increased performance.
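The thread hierarchy described above (threads grouped into blocks, blocks grouped into a grid) can be illustrated with a short sketch. It assumes a vector of 1000 integers and 256 threads per block; these sizes and the variable names are illustrative choices, not values prescribed by the lab. The grid size is rounded up so that every element gets a thread, and the bounds check in the kernel keeps the surplus threads idle.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Each thread computes one element; its global index combines block and thread IDs.
__global__ void add(const int *a, const int *b, int *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // threads past the end of the array do nothing
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1000;                         // assumed problem size
    const int threadsPerBlock = 256;            // assumed block size
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    size_t size = n * sizeof(int);

    int *a = (int *)malloc(size);
    int *b = (int *)malloc(size);
    int *c = (int *)malloc(size);
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    add<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

    // Check the GPU result against the same sum computed on the CPU.
    int mismatches = 0;
    for (int i = 0; i < n; i++)
        if (c[i] != a[i] + b[i]) mismatches++;
    printf("blocks = %d, threads per block = %d, mismatches = %d\n",
           blocks, threadsPerBlock, mismatches);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}
```

Rounding the block count up, rather than launching a single block of n threads, lets the same code scale past the per-block thread limit of the GPU.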
Step 1: - Open Google Colab
Step 2: - Select the runtime in Google Colab (change the runtime type to T4 GPU)
Step 3: - Check the version of NVCC (for example, by running !nvcc --version in a cell)
Step 4: - Install the NVCC plugin
Step 5: - Load the NVCC plugin
➢ Hello World in CUDA programming:
• Compile the CUDA program
➢ OUTPUT: -
• Run the CUDA program
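For reference, a "Hello World!" program of the kind this experiment asks for could look like the minimal sketch below. The kernel name hello_kernel and the file name hello.cu used in the note afterwards are illustrative assumptions rather than the names used in the lab.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU; every launched thread prints one line.
__global__ void hello_kernel(void)
{
    printf("Hello World!\n");
}

int main(void)
{
    // Launch a grid of one block containing one thread.
    hello_kernel<<<1, 1>>>();

    // Wait for the kernel to finish so that its output is flushed before exit.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Saved as hello.cu (for instance with %%writefile hello.cu, as in the other experiments), it can be compiled with !nvcc hello.cu -o hello and run with !./hello. Without the cudaDeviceSynchronize call the program may exit before the GPU output appears.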
➢ Addition of two numbers in CUDA programming: -
Step 1: Declare the device variables.
Step 2: Allocate memory for the device variables.
Step 3: Copy host memory to device memory.
Step 4: Launch the kernel with an appropriate number of blocks and threads per block.
Step 5: Copy the device memory contents back to host memory.
Step 6: Display the result.
Step 7: Free the memory allocated for the device variables as well as the host variables.
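The seven steps above can be sketched as follows. This is one possible arrangement, assuming both operands are copied to the device explicitly; the device pointer names d_a, d_b, and d_c are hypothetical.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: reads both operands from device memory and writes their sum.
__global__ void add(const int *a, const int *b, int *c)
{
    *c = *a + *b;
}

int main(void)
{
    int a = 3, b = 4, c = 0;                 // host variables

    // Step 1: declare the device variables.
    int *d_a, *d_b, *d_c;

    // Step 2: allocate memory for the device variables.
    cudaMalloc((void **)&d_a, sizeof(int));
    cudaMalloc((void **)&d_b, sizeof(int));
    cudaMalloc((void **)&d_c, sizeof(int));

    // Step 3: copy host memory to device memory.
    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    // Step 4: launch the kernel (one block with one thread is enough here).
    add<<<1, 1>>>(d_a, d_b, d_c);

    // Step 5: copy the device memory contents back to host memory.
    cudaError_t err = cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Step 6: display the result.
    printf("%d + %d is %d\n", a, b, c);

    // Step 7: free the device memory. The host variables here live on the
    // stack, so there is nothing to free on the host side.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

Passing small scalars by value to the kernel, as the listing in Experiment 10 does, avoids the two host-to-device copies; the explicit copies are kept here only so that every step is visible.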
• Compilation of the CUDA program
➢ OUTPUT: -
• Run the CUDA program
Experiment No: 10
AIM :- Write a CUDA program to add two arrays.
CODE : ADD TWO NUMBERS AND PRINT THE RESULT
%%writefile addi.cu
#include <stdio.h>

// Kernel: adds two integers on the GPU and stores the result in device memory.
__global__ void add(int a, int b, int *c)
{
    *c = a + b;
}

int main()
{
    int a, b, c;
    int *dev_c;                                   // device pointer for the result
    a = 3;
    b = 4;
    cudaMalloc((void**)&dev_c, sizeof(int));
    add<<<1,1>>>(a, b, dev_c);                    // one block with one thread
    cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d is %d\n", a, b, c);
    cudaFree(dev_c);
    return 0;
}
OUTPUT:
CODE : ADD TWO ARRAYS AND PRINT THE FINAL ARRAY
%%writefile addition.cu
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void add(int *a, int *b, int *c, int n)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;   // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    int n = 10;
    int *a, *b, *c;          // host arrays
    int *d_a, *d_b, *d_c;    // device arrays
    int size = n * sizeof(int);

    // Allocate and initialise the host arrays.
    a = (int *)malloc(size);
    b = (int *)malloc(size);
    c = (int *)malloc(size);
    for (int i = 0; i < n; i++) {
        a[i] = i;
        b[i] = i * 2;
    }

    // Allocate the device arrays and copy the inputs to the GPU.
    cudaMalloc((void **)&d_a, size);
    cudaMalloc((void **)&d_b, size);
    cudaMalloc((void **)&d_c, size);
    cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

    // Launch one block of n threads, then copy the result back.
    add<<<1, n>>>(d_a, d_b, d_c, n);
    cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; i++)
        printf("%d + %d = %d\n", a[i], b[i], c[i]);

    // Release device and host memory.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    free(a);
    free(b);
    free(c);
    return 0;
}
OUTPUT :
