CLUSTER COMPUTING
PROJECT REPORT
Submitted By:
KANDALA VENKATA PHANI SAI RATNADEEP
2210315725
CSE-7
Submitted To:
Date: 15/06/2018
CERTIFICATE
This is to certify that Kandala Venkata Phani Sai Ratnadeep (2210315725), a student of
the Computer Science and Engineering Department, GITAM DEEMED TO BE
UNIVERSITY, HYDERABAD, has successfully completed his project on
"CLUSTER COMPUTING" as part of a summer internship from 1st May 2018 to 15th
June 2018 under the guidance of Shri V. Malolan, Scientist 'E', Division Head of
QDMS&NDE, Directorate of Reliability & Quality Assurance at Advanced Systems
Laboratory, Defence Research and Development Organisation.
(V.MALOLAN)
Scientist 'E' (R&QA)
Division Head of QDMS&NDE
Advanced Systems Laboratory
DRDO
Internship Report Page 2
DRDO (ADVANCED SYSTEMS LABORATORY)
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
INDEX
1. INTRODUCTION ....................................................................................................... 8
7.ARCHITECTURE ........................................................................................................ 21
9.4 Ethernet, Fast Ethernet, Gigabit Ethernet and 10-Gigabit Ethernet .................... 32
1. INTRODUCTION
1.1 General Introduction:
Parallel computing has seen many changes since the days of the highly expensive and
proprietary super computers. Changes and improvements in performance have also been seen
in the area of mainframe computing for many environments. But these computing
environments may not be the most cost-effective and flexible solution for a given problem.
Over the past decade, cluster technologies have been developed that allow multiple low-cost
computers to work in a coordinated fashion to process applications. The economics,
performance, and flexibility of compute clusters make cluster computing an attractive
alternative to centralized computing models and the cost, inflexibility, and scalability
issues inherent in those models.
Many enterprises are now looking at clusters of high-performance, low cost computers to
provide increased application performance, high availability, and ease of scaling within the
data centre. Interest in and deployment of computer clusters has largely been driven by the
increase in the performance of off-the-shelf commodity computers, high-speed, low-latency
network switches and the maturity of the software components. Application performance
continues to be of significant concern for various entities including governments, military,
education, scientific and now enterprise organizations.
When considering a cluster implementation, some basic questions can help determine the
cluster attributes so that technology options can be evaluated. Ideally, interconnect and
storage bandwidth should not be limiting factors, although this is not always the case.
2. HISTORY OF CLUSTERS
The formal engineering basis of cluster computing as a means of doing parallel work of any
sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come
to be regarded as the seminal paper on parallel processing: Amdahl's law.
The history of early computer clusters is more or less directly tied into the history of early
networks, as one of the primary motivations for the development of a network was to link
computing resources, creating a de facto computer cluster. The first commodity clustering
product was ARCnet, developed by Datapoint in 1977. ARCnet wasn't a commercial
success, and clustering didn't really take off until DEC released its VAXcluster product in
the 1980s for the VAX/VMS operating system. The ARCnet and VAXcluster products not
only supported parallel computing, but also shared file systems and peripheral devices. They
were supposed to give you the advantage of parallel processing while maintaining data
reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS
systems from HP running on Alpha and Itanium systems.
The communication between the parts of the main job occurs within the framework of the so-
called message-passing paradigm, which was standardized in the message-passing interface
(MPI). The message-passing paradigm is flexible enough to support a variety of applications
and is also well adapted to the MPP architecture. In recent years, tremendous
improvements in the performance of standard workstation processors have led to their use in
MPP supercomputers, resulting in significantly lower price/performance ratios.
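The message-passing paradigm described here can be sketched without MPI itself. The following illustrative Python example uses the standard library's multiprocessing module (not an MPI implementation, so the structure is only an analogy): two processes cooperate purely through explicit send and receive calls, the same discipline MPI ranks follow.

```python
# Sketch of the message-passing paradigm: processes share nothing and
# exchange data only through explicit send/recv calls (like MPI_Send/MPI_Recv).
from multiprocessing import Process, Pipe

def worker(conn):
    """Receive a chunk of work, process it, and send the result back."""
    chunk = conn.recv()            # blocking receive
    conn.send(sum(chunk))          # explicit send of the partial result
    conn.close()

def main():
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])      # distribute work to the "rank"
    result = parent.recv()         # gather the result
    p.join()
    return result

if __name__ == "__main__":
    print(main())                  # prints 10
```

In a real MPI program the Pipe would be replaced by the interconnect, and the same send/receive structure would run across physically separate nodes.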
3. WHY CLUSTERS?
The question may arise why clusters are designed and built when perfectly good commercial
supercomputers are available on the market. The answer is that supercomputers are
expensive, and also difficult to upgrade and maintain. Clusters, on the other hand, are a
cheap and easy way to take off-the-shelf components and combine them into a single
supercomputer. In some areas of research, clusters are actually faster than commercial
supercomputers. Clusters also have the distinct advantage that they are simple to build using
components available from hundreds of sources. We don't even have to use new equipment
to build a cluster.
Price/Performance:
The most obvious benefit of clusters, and the most compelling reason for the growth in their
use, is that they have significantly reduced the cost of processing power:
1. Processors - commodity processors have steadily fallen in price while rising in
performance.
2. Memory - the memory used by these processors has dropped in cost along with the
processors.
3. Networks - high-speed networks can now be assembled from commodity products for a
fraction of the cost necessary only a few years ago.
4. Motherboards, busses, and other sub-systems - all of these have become commodity
products, allowing the assembly of affordable computers from off-the-shelf components.
3.1 Cluster Benefits:
The main benefits of clusters are scalability, availability, and performance. For scalability, a
cluster uses the combined processing power of compute nodes to run cluster-enabled
applications such as a parallel database server at a higher performance than a single machine
can provide. Scaling the cluster's processing power is achieved by simply adding additional
nodes to the cluster. Availability within the cluster is assured as nodes within the cluster
provide backup to each other in the event of a failure. In high-availability clusters, if a node
is taken out of service or fails, the load is transferred to another node (or nodes) within the
cluster. To the user, this operation is transparent as the applications and data running are also
available on the failover nodes. An additional benefit comes with the existence of a single
system image and the ease of manageability of the cluster.
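As a hedged sketch of the failover behaviour described above (the class and node names are hypothetical, not part of any real cluster middleware), the following Python fragment dispatches requests round-robin across nodes and transparently skips a node once it is marked failed, so callers never see the failure:

```python
# Minimal illustration of high-availability dispatch: round-robin over
# nodes, transparently skipping any node that has been taken out of service.
class MiniCluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)       # all configured nodes
        self.healthy = set(nodes)      # nodes currently in service
        self._next = 0

    def fail(self, node):
        """Mark a node as failed; its load shifts to the remaining nodes."""
        self.healthy.discard(node)

    def dispatch(self):
        """Return the node that should serve the next request."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node

cluster = MiniCluster(["node1", "node2", "node3"])
print(cluster.dispatch())      # node1
cluster.fail("node2")
print(cluster.dispatch())      # node3 (node2 is skipped from now on)
```

Real high-availability middleware adds health probes, state replication, and failback, but the user-visible contract is the same: requests keep being served.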
4. ATTRIBUTES OF CLUSTERS
There are several types of clusters, each with specific design goals and functionality. These
clusters range from distributed or parallel clusters for computation-intensive or data-intensive
applications, used for protein, seismic, or nuclear modelling, down to simple load-balanced
clusters.
High-availability and load-balanced clusters present the single virtual machine necessary to
allow the network to effortlessly grow to meet increased business demands.
Both the high availability and load-balancing cluster technologies can be combined to
increase the reliability, availability, and scalability of application and data resources that are
widely deployed for web, mail, news, or FTP services.
seismic analysis, etc.), and financial data analysis. One of the more common cluster
architectures is the Beowulf class of clusters. A Beowulf cluster can be defined as a
number of systems whose collective processing capabilities are simultaneously applied to a
specific technical, scientific, or business application. Each individual computer is referred to
as a “node” and each node communicates with other nodes within a cluster across standard
Ethernet technologies (10/100 Mbps, GbE, or 10GbE). Other high-speed interconnects such
as Myrinet, Infiniband, or Quadrics may also be used.
Collaboration
Scientists can collaborate in real-time across dispersed locations- bridging isolated islands of
scientific research and discovery- when HPC clusters are based on open source and building
block technology.
Scalability
HPC clusters can grow in overall capacity because processors and nodes can be added as
demand increases.
Availability
Because single points of failure can be eliminated, if any one system component goes down,
the system as a whole, or the solution spanning multiple systems, stays highly available.
Processors, memory, disk, or operating system (OS) technology can be easily updated, and
new processors and nodes can be added or upgraded as needed.
Compared to proprietary systems, the total cost of ownership can be much lower. This
includes service, support and training.
Vendor lock-in
The age-old problem of proprietary vs. open systems that use industry-accepted standards is
eliminated.
System manageability
Reusability of components
Commercial components can be reused, preserving the investment. For example, older nodes
can be deployed as file/print servers, web servers or other infrastructure servers.
Disaster recovery
Large SMPs are monolithic entities located in one facility. HPC systems can be collocated or
geographically dispersed to make them less susceptible to disaster.
The master node acts as a server for Network File System (NFS) and as a gateway to the
outside world. As an NFS server, the master node provides user file space and other common
system software to the compute nodes via NFS. As a gateway, the master node allows users
to gain access through it to the compute nodes. Usually, the master node is the only machine
that is also connected to the outside world using a second network interface card (NIC). The
sole task of the compute nodes is to execute parallel jobs. In most cases, therefore, the
compute nodes do not have keyboards, mice, video cards, or monitors. All access to the
client nodes is provided via remote connections from the master node. Because compute
nodes do not need to access machines outside the cluster, nor do machines outside the cluster
need to access compute nodes directly, compute nodes commonly use private IP addresses,
such as the 10.0.0.0/8 or 192.168.0.0/16 address ranges. From a user’s perspective, a
Beowulf cluster appears as a Massively Parallel Processor (MPP) system. The most common
methods of using the system are to access the master node either directly or through Telnet or
remote login from personal workstations. Once on the master node, users can prepare and
compile their parallel applications, and also spawn jobs on a desired number of compute
nodes in the cluster. Applications must be written in parallel style and use the message-
passing programming model. Jobs of a parallel application are spawned on compute nodes,
which work collaboratively until the application finishes. During execution, compute
nodes use standard message-passing middleware, such as the Message Passing Interface
(MPI) and Parallel Virtual Machine (PVM), to exchange information.
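The private address ranges mentioned above can be checked with Python's standard ipaddress module; this small sketch simply confirms that typical compute-node addresses are not routable on the public Internet, which is why all outside access goes through the master node:

```python
import ipaddress

# The two ranges named in the text; compute nodes drawn from them are
# unreachable from the public Internet except via the master node gateway.
CLUSTER_RANGES = [ipaddress.ip_network("10.0.0.0/8"),
                  ipaddress.ip_network("192.168.0.0/16")]

def is_cluster_private(addr: str) -> bool:
    """Return True if addr falls in one of the cluster's private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLUSTER_RANGES)

print(is_cluster_private("10.0.0.5"))       # True: plausible compute node
print(is_cluster_private("192.168.1.10"))   # True: plausible compute node
print(is_cluster_private("8.8.8.8"))        # False: public address
```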
7. ARCHITECTURE
Three principal features usually provided by cluster computing are availability, scalability,
and simplification. Availability is provided by the cluster of computers operating as a single
system, continuing to provide services even when one of the individual computers is lost
due to a hardware failure or other reason. Scalability is provided by the inherent ability of the
overall system to allow new components, such as computers, to be added as the overall
system's load increases. The simplification comes from the ability of the cluster to allow
administrators to manage the entire group as a single system. This greatly simplifies the
management of groups of systems and their applications. The goal of cluster computing is to
facilitate sharing a computing load over several systems without either the users of the
system or the administrators needing to know that more than one system is involved. The
Windows NT Server Edition of the Windows operating system is an example of a base
operating system that has been modified to include architecture facilitating the
establishment of a cluster computing environment.
Cluster computing has been employed for over fifteen years but it is
the recent demand for higher availability in small businesses that has caused an explosion in
this field. Electronic databases and electronic malls have become essential to the daily
operation of small businesses. Access to this critical information by these entities has created
a large demand for the principal features of cluster computing.
There are some key concepts that must be understood when forming a cluster computing
resource. Nodes or systems are the individual members of a cluster. They can be computers,
servers, and other such hardware although each node generally has memory and processing
capabilities. If one node becomes unavailable the other nodes can carry the demand load so
that applications or services are always available. There must be at least two nodes to
compose a cluster structure otherwise they are just called servers. The collection of software
on each node that manages all cluster specific activity is called the cluster service. The
cluster service manages all of the resources, the canonical items in the system, and sees them
as identical opaque objects. Resources can be such things as physical hardware devices, like
disk drives and network cards, or logical items, like logical disk volumes, TCP/IP addresses,
applications, and databases.
Groups contain all of the resources necessary to run a specific application, and if need be, to
connect to the service provided by the application in the case of client systems. These groups
allow administrators to combine resources into larger logical units so that they can be
managed as a unit. This, of course, means that all operations performed on a group affect all
resources contained within that group.
Normally the development of a cluster computing system occurs in phases. The first phase
involves establishing the underpinnings into the base operating system and building the
foundation of the cluster components. These things should focus on providing enhanced
availability to key applications using storage that is accessible to two nodes. The following
stages occur as the demand increases and should allow for much larger clusters to be formed.
These larger clusters should have a true distribution of applications, higher performance
interconnects, widely distributed storage for easy accessibility and load balancing. Cluster
computing will become even more prevalent in the future because of the growing needs and
demands of businesses, as well as the spread of the Internet.
Clusters are in fact quite simple. They are a bunch of computers tied together with a network
working on a large problem that has been broken down into smaller pieces. There are a
number of different strategies we can use to tie them together. There are also a number of
different software packages that can be used to make the software side of things work.
Parallelism
The name of the game in high performance computing is parallelism. It is the quality that
allows something to be done in parts that work independently rather than a task that has so
many interlocking dependencies that it cannot be further broken down. Parallelism operates
at two levels: hardware parallelism and software parallelism.
Hardware Parallelism
On one level hardware parallelism deals with the CPU of an individual system and how we
can squeeze performance out of sub-components of the CPU that can speed up our code. At
another level there is the parallelism that is gained by having multiple systems working on a
computational problem in a distributed fashion. Parallelism is termed 'fine-grained' when it
occurs inside the CPU or among multiple CPUs in the same system, and 'coarse-grained'
when a collection of separate systems acts in concert.
A computer’s CPU is commonly pictured as a device that operates on one instruction after
another in a straight line, always completing one-step or instruction before a new one is
started. But new CPU architectures have an inherent ability to do more than one thing at
once. The logic of the CPU chip divides the CPU into multiple execution units. Systems that
have multiple execution units allow the CPU to attempt to process more than one instruction
at a time. Two hardware features of modern CPUs support multiple execution units: the
cache, a small, fast memory inside the CPU, and the pipeline, a small area of memory inside
the CPU where the instructions that are next in line to be executed are stored. Both cache
and pipeline allow impressive increases in CPU performance.
At the node level, multiple network interfaces allow a node to exchange data in parallel
with other nodes in the cluster. Finally, if we use multiple disk drive controllers in each node,
we create parallel data paths that can be used to increase the performance of the I/O
subsystem.
Software Parallelism
Software parallelism is the ability to find well defined areas in a problem we want to solve
that can be broken down into self-contained parts. These parts are the program elements that
can be distributed and give us the speedup that we want to get out of a high performance
computing system. Before we can run a program on a parallel cluster, we have to ensure that
the problems we are trying to solve are amenable to being done in a parallel fashion. Almost
any problem that is composed of smaller sub problems that can be quantified can be broken
down into smaller problems and run on a node on a cluster.
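The decomposition just described can be made concrete with a small illustrative sketch (using Python's standard multiprocessing Pool as a stand-in for cluster nodes): a large problem, here summing a list, is split into self-contained parts that independent workers solve, and the partial results are combined at the end.

```python
# Software parallelism in miniature: split the data, let independent
# workers process their own chunks, then combine the partial results.
from multiprocessing import Pool

def partial_sum(chunk):
    """Each worker solves its self-contained sub-problem."""
    return sum(chunk)

def parallel_sum(data, workers=4):
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))   # combine partial results

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))          # prints 499500
```

On a real cluster the workers would be processes on separate compute nodes exchanging results through message passing, but the decomposition step is identical.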
System-Level Middleware
System-level middleware offers Single System Image (SSI) and high availability
infrastructure for processes, memory, storage, I/O, and networking. The single system image
illusion can be implemented using the hardware or software infrastructure. This unit focuses
on SSI at the operating system or subsystems level.
Although new high-performance protocols are available for cluster computing, some
instructors may want to provide students with a brief introduction to message-passing
programs using the BSD Sockets interface to the Transmission Control Protocol/Internet
Protocol (TCP/IP) before introducing more complicated parallel programming with
distributed-memory programming tools. If students have already had a course in data
communications or computer networks then this unit should be skipped. Students should
have access to a networked computer lab with the Sockets libraries enabled. Sockets usually
come installed on Linux workstations.
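A minimal version of such a Sockets exercise might look like the following sketch (the function names are illustrative): a server thread echoes back whatever a TCP client sends, which is the basic round trip that every message-passing program builds on.

```python
# Minimal TCP message passing over the BSD Sockets API: a server thread
# echoes each message back to the client.
import socket
import threading

def echo_server(sock):
    """Accept one connection, receive a message, and echo it back."""
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)         # receive the message
        conn.sendall(data)             # send it straight back

def round_trip(message: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    t = threading.Thread(target=echo_server, args=(server,))
    t.start()
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(message)
        reply = client.recv(1024)
    t.join()
    server.close()
    return reply

if __name__ == "__main__":
    print(round_trip(b"hello cluster"))   # b'hello cluster'
```

MPI and PVM hide exactly this plumbing behind send/receive calls, which is why a Sockets exercise is a useful stepping stone.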
Application-Level Middleware
Application-level middleware is the layer of software between the operating system and
applications. Middleware provides various services required by an application to function
correctly. A course in cluster programming can include some coverage of middleware tools
such as CORBA, Remote Procedure Call, Java Remote Method Invocation (RMI), or Jini.
Sun Microsystems has produced a number of Java-based technologies that can become units
in a cluster programming course, including the Java Development Kit (JDK) product family
that consists of the essential tools and APIs for all developers writing in the Java
programming language, through to APIs for telephony (JTAPI), database connectivity
(JDBC), 2D and 3D graphics, security, and electronic commerce. These technologies
enable Java to interoperate with many other devices, technologies, and software standards.
A single system image should:
- Provide a simple, straightforward view of all system resources and activities from any
node of the cluster
- Free the end user from having to know where an application will run
- Free the operator from having to know where a resource is located
- Let the user work with familiar interfaces and commands, and allow administrators to
manage the entire cluster as a single entity
- Reduce the risk of operator errors, with the result that end users see improved
reliability and higher availability of the system
The network is the most critical part of a cluster. Its capabilities and performance directly
influence the applicability of the whole system for HPC. Interconnects range from
Local/Wide Area Network (LAN/WAN) technologies like Fast Ethernet and ATM to
System Area Networks (SAN) like Myrinet and Memory Channel.
Application:
This layer includes the various applications run by a particular group. These applications
run in parallel and include various queries running on different nodes of the cluster. This
can be seen as the input part of the cluster components.
Middleware:
These are software packages that sit between the user's applications and the operating
system in cluster computing. In other words, they are the layers of software between
applications and the operating system. Middleware provides various services required by
an application to function correctly. Software packages used as middleware include:
Features:
Scyld :
Features:
Commercial distribution.
Single system image design.
Processors: x86 and Opteron.
Interconnects: Ethernet and Infiniband. MPI and PVM.
Disk full and diskless support.
Rocks:
Features:
Operating System:
Clusters can be supported by various operating systems, including Windows, Linux, etc.
Interconnect:
Interconnection between the various nodes of the cluster system can be done using 10GbE,
Myrinet, etc. Small cluster systems can be connected with the help of simple switches.
Switch:
A switch is a multi-port bridge with a buffer and a design that boosts its efficiency (a large
number of ports implies less traffic per port) and performance. A switch can perform error
checking before forwarding data, which makes it very efficient: it does not forward packets
that have errors, and it forwards good packets selectively to the correct port only.
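The "check before forward" behaviour can be sketched as follows. This is only an illustration: CRC32 stands in for the real Ethernet frame check sequence, and the frame format here is invented for the example.

```python
# Store-and-forward sketch: verify a frame's checksum and forward only
# frames that pass; corrupted frames are silently dropped.
import zlib

def make_frame(payload: bytes) -> bytes:
    # Append a 4-byte CRC32, loosely mimicking an Ethernet FCS.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def forward_if_valid(frame: bytes):
    payload, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != fcs:
        return None                    # corrupted frame: drop, don't forward
    return payload                     # good frame: forward to the right port

good = make_frame(b"data")
bad = good[:-1] + bytes([good[-1] ^ 0xFF])   # corrupt the checksum
print(forward_if_valid(good))   # b'data'
print(forward_if_valid(bad))    # None
```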
10 GbE:
10 Gigabit Ethernet is the fastest and most recent of the Ethernet standards. IEEE
802.3ae defines a version of Ethernet with a nominal rate of 10 Gbit/s, making it 10 times
faster than Gigabit Ethernet. Unlike other Ethernet systems, 10 Gigabit Ethernet is based
entirely on the use of optical fiber connections.
Myrinet :
Infiniband:
Nodes:
The nodes of the cluster system are the individual computers that are connected. These can
use Intel or AMD 64-bit processors.
9.CLUSTER OPERATION
9.1 Cluster Nodes:
Node technology has migrated from the conventional tower cases to single rack-unit
multiprocessor systems and blade servers that provide a much higher processor density
within a decreased area. Processor speeds and server architectures have increased in
performance, as well as solutions that provide options for either 32-bit or 64-bit processor
systems. Additionally, memory performance as well as hard-disk access speeds and storage
capacities have also increased. It is interesting to note that even though performance is
growing exponentially in some cases, the cost of these technologies has dropped
considerably. As shown below, node participation in the cluster falls into one of two
responsibilities: the master (or head) node and the compute (or slave) nodes. The master
node is the unique server in cluster systems. It is responsible for running the file system
and also serves as the key system for clustering middleware to route processes and duties
and to monitor the health and status of each slave node. A compute (or slave) node within
a cluster provides the cluster
a computing and data storage capability. These nodes are derived from fully operational,
standalone computers that are typically marketed as desktop or server systems that, as such,
are off-the-shelf commodity systems.
Latency is measured in microseconds (µSec) or milliseconds (mSec) and is the time it takes
to move a single packet of information in one port and out of another. For parallel clusters,
latency is measured as the time it takes for a message to be passed from one processor to
another that includes the latency of the interconnecting switch or switches. The actual
latencies observed will vary widely even on a single switch depending on characteristics
such as packet size, switch architecture (centralized versus distributed), queuing, buffer
depths and allocations, and protocol processing at the nodes.
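A rough, illustrative way to see latency at this granularity is to ping-pong a tiny message and halve the round-trip time. The sketch below measures a local socket pair rather than a real cluster interconnect, so the numbers are only indicative of the method, not of switch latency:

```python
# Measure approximate one-way latency by timing many 1-byte round trips
# over a local socket pair (a stand-in for two cluster nodes).
import socket
import time

def one_way_latency_us(trials: int = 1000) -> float:
    left, right = socket.socketpair()
    start = time.perf_counter()
    for _ in range(trials):
        left.sendall(b"x")              # message out...
        right.recv(1)
        right.sendall(b"x")             # ...and back
        left.recv(1)
    elapsed = time.perf_counter() - start
    left.close()
    right.close()
    # Half the average round-trip time approximates the one-way latency.
    return elapsed / trials / 2 * 1e6   # microseconds

print(f"approx. one-way latency: {one_way_latency_us():.1f} us")
```

Real interconnect benchmarks vary the message size as well, since latency and bandwidth dominate at different sizes.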
Ethernet:
Ethernet is the most popular physical layer LAN technology in use today. It defines the
number of conductors that are required for a connection, the performance thresholds that can
be expected, and provides the framework for data transmission. A standard Ethernet network
can transmit data at a rate of up to 10 megabits per second (10 Mbps). The Institute of
Electrical and Electronics Engineers developed an Ethernet standard known as IEEE
Standard 802.3.
Fast Ethernet:
The Fast Ethernet standard (IEEE 802.3u) has been established for Ethernet networks that
need higher transmission speeds. This standard raises the Ethernet speed limit from 10 Mbps
to 100 Mbps. One variant, 100BASE-T4, utilizes an extra two wires for use with Category 3
UTP cable.
Gigabit Ethernet:
Gigabit Ethernet was developed to meet the need for faster communication networks. It is
also known as "gigabit-Ethernet-over-copper" or 1000Base-T. It is a version of Ethernet that
runs at speeds 10 times faster than 100Base-T and is defined in the IEEE 802.3 standard.
Existing Ethernet LANs with 10 and 100 Mbps cards can feed into a Gigabit Ethernet
backbone to interconnect high-performance switches, routers, and servers.
10 Gigabit Ethernet: 10 Gigabit Ethernet is the fastest and most recent of the Ethernet
standards. IEEE 802.3ae defines a version of Ethernet with a nominal rate of 10 Gbit/s,
making it 10 times faster than Gigabit Ethernet. Unlike other Ethernet systems, 10 Gigabit
Ethernet is based entirely on the use of optical fiber connections.
In the days of mainframes, data was stored physically separate from the actual
processing unit, but was still only accessible through the processing units. As PC based
servers became more commonplace, storage devices went 'inside the box' or in external
boxes that were connected directly to the system. Each of these approaches was valid in its
time, but as our need to store increasing volumes of data and our need to make it more
accessible grew, other alternatives were needed. Enter network storage. Network storage is
a generic term used to describe network based data storage, but there are many
technologies within it which all go to make the magic happen. Here is a rundown of some
of the basic terminology that you might happen across when reading about network
storage.
Direct attached storage is the term used to describe a storage device that is directly
attached to a host system. The simplest example of DAS is the internal hard drive of a
server computer, though storage devices housed in an external box come under this banner
as well. DAS is still, by far, the most common method of storing data for computer
systems. Over the years, though, new technologies have emerged which work, if you'll
excuse the pun, out of the box.
Network Attached Storage, or NAS, is a data storage mechanism that uses special
devices connected directly to the network media. These devices are assigned an IP address
and can then be accessed by clients via a server that acts as a gateway to the data, or in
some cases allows the device to be accessed directly by the clients without an intermediary.
The beauty of the NAS structure is that it means that in an environment with many
servers running different operating systems, storage of data can be centralized, as can the
security, management, and backup of the data. An increasing number of companies already
make use of NAS technology, if only with devices such as CD-ROM towers (stand-alone
boxes that contain multiple CD-ROM drives) that are connected directly to the network.
Some of the big advantages of NAS include expandability: if more storage space is needed,
another NAS device can be added to expand the available storage. NAS also brings an extra
level of fault tolerance to the network. In a DAS environment, a server going down means
that the data that server holds is no longer available. With NAS, the data is still available
on the network and accessible by clients. Fault-tolerant measures such as RAID can be
used to make sure that the NAS device does not become a point of failure.
A SAN is a network of storage devices that are connected to each other and to a
server, or cluster of servers, which act as an access point to the SAN. In some
configurations a SAN is also connected to the network. SANs use special switches as a
mechanism to connect the devices. These switches, which look a lot like normal Ethernet
networking switches, act as the connectivity point for SANs. Making it possible for devices
to communicate with each other on a separate network brings with it many advantages.
Consider, for instance, the ability to back up every piece of data on your network without
having to 'pollute' the standard network infrastructure with gigabytes of data. This is just
one of the advantages of a SAN which is making it a popular choice with companies today,
and is a reason why it is forecast to become the data storage technology of choice in the
coming years.
Network-attached storage (NAS) is hard disk storage that is set up with its own network
address rather than being attached to the department computer that is serving applications
to a network's workstation users. By removing storage access and its management from the
department server, both application programming and files can be served faster because
they are not competing for the same processor resources. The network-attached storage
device is attached to a local area network (typically, an Ethernet network) and assigned an
IP address. File requests are mapped by the main server to the NAS file server.
A network-attached storage (NAS) device is a server that is dedicated to nothing more than
file sharing. NAS does not provide any of the activities that a server in a server-centric
system typically provides, such as e-mail, authentication or file management. NAS allows
more hard disk storage space to be added to a network that already utilizes servers without
shutting them down for maintenance and upgrades. With a NAS device, storage is not an
integral part of the server. Instead, in this storage-centric design, the server still handles all
of the processing of data but a NAS device delivers the data to the user. A NAS device
does not need to be located within the server but can exist anywhere in a LAN and can be
made up of multiple networked NAS devices.
Network Attached Storage separates the application server from the storage. This increases
overall system performance by allowing the servers to perform application requests and the
NAS to serve files or run applications.
Each NAS resides on the LAN as an independent network node and has its own IP address
An important benefit of NAS is its ability to provide multiple clients on the network with
access to the same files. Today, when more storage capacity is required, NAS appliances
can simply be outfitted with larger disks or clustered together to provide both vertical and
horizontal scalability.
RAID-0 (Striping)
Evaluation:
Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
Capacity: N*B
The entire space is used to store data. Since there is no duplication, N disks, each having
B blocks, are fully utilized.
RAID-4 (Block-Level Striping with Dedicated Parity)
Assume that in the above figure, C3 is lost due to a disk failure. Then we can recompute
the data bit stored in C3 by looking at the values of all the other columns and the parity
bit. This allows us to recover lost data.
Evaluation:
Reliability: 1
RAID-4 allows recovery from at most one disk failure (because of the way parity works).
If more than one disk fails, there is no way to recover the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are made
available for data storage, each disk having B blocks.
RAID-5 (Striping with Distributed Parity)
This is a slight modification of the RAID-4 system; the only difference is that the
parity rotates among the drives.
Evaluation:
Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way parity works). If
more than one disk fails, there is no way to recover the data. This is identical to RAID-
4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity. Hence, (N-1) disks
are made available for data storage, each disk having B blocks.
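The rotation itself is a simple modular pattern. As a sketch, one common convention (the "left-symmetric" layout; other controllers rotate differently) places the parity for stripe s as follows:

```python
# Hypothetical left-symmetric RAID-5 layout: the parity block moves to a
# different disk on each successive stripe, so parity writes are spread
# evenly instead of bottlenecking on one dedicated parity disk.
def raid5_parity_disk(stripe, n_disks):
    return (n_disks - 1) - (stripe % n_disks)

# With 4 disks, the parity cycles over disks 3, 2, 1, 0, 3, 2, ...
print([raid5_parity_disk(s, 4) for s in range(6)])
# -> [3, 2, 1, 0, 3, 2]
```

Spreading the parity is what distinguishes RAID-5 from RAID-4 in practice: every small write must update parity, and rotating it removes the single-disk write hotspot.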
As the cluster node processes the application, the CPU is dedicated to the application and
protocol processing does not occur. For this to change, the protocol process must interrupt a
uniprocessor machine or request a spin lock on a multiprocessor machine. As the request is
granted, CPU cycles are then applied to the protocol process. As more cycles are applied to
protocol processing, application processing is suspended. In many environments, the value of
the cluster is based on the run-time of the application. The shorter the time to run, the more
floating-point operations and/or millions of instructions per-second occur, and, therefore, the
lower the cost of running a specific application or job.
The left-hand example in the figure below shows that when there is virtually no network or
protocol processing going on, CPUs 0 and 1 of each node are 100% devoted to application
processing. The right-hand side shows what happens when network traffic levels have
significantly increased: the CPU spends cycles processing the MPI and TCP protocol
stacks, including moving data to and from the wire, which reduces or suspends
application processing. With the increase in protocol processing, note that the utilization
percentages of CPUs 0 and 1 drop dramatically, in some cases to 0.
GPUs do not have virtual memory, interrupts, or means of addressing devices such as the
keyboard and the mouse. They are terribly inefficient when the workload is not SPMD (Single
Program, Multiple Data). Such programs are best handled by CPUs, which is perhaps why
CPUs are still around. For example, a CPU can calculate a hash for a single string much,
much faster than a GPU can, but when it comes to computing several thousand hashes, the GPU
wins. As of 2009, the ratio between GPUs and multi-core CPUs in peak FLOP
throughput was about 10:1. Such a large performance gap pushes developers to offload
their data-intensive applications to the GPU.
GPUs are designed for data-intensive applications. This is underscored by the fact that
GPU DRAM bandwidth has increased tremendously with each passing year, while CPU memory
bandwidth has not. Why do GPUs adopt such a design while CPUs do not? Because
GPUs were originally designed for 3D rendering, which requires holding large amounts of
texture and polygon data. Caches cannot hold such large amounts of data, and thus the only
design that would have increased rendering performance was to widen the memory bus and raise
the memory clock. For example, the Intel i7, which currently supports the largest CPU memory
bandwidth, has a 192-bit memory bus and a memory clock of up to 800 MHz. The GTX
285 had a 512-bit bus and a memory clock of 1242 MHz.
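Those figures translate directly into peak bandwidth. A back-of-envelope sketch, assuming double-data-rate signalling (two transfers per clock; actual sustained bandwidth is lower):

```python
# Peak memory bandwidth = (bytes per transfer) * (transfers per second).
def peak_bandwidth_gb_s(bus_width_bits, mem_clock_mhz, transfers_per_clock=2):
    return bus_width_bits / 8 * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9

print(peak_bandwidth_gb_s(192, 800))    # Intel i7: 38.4 GB/s
print(peak_bandwidth_gb_s(512, 1242))   # GTX 285: ~159 GB/s
```

The roughly 4x gap in the result comes from the GPU's wider bus and higher memory clock together, which is exactly the trade-off described above.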
CPUs also would not benefit greatly from an increased memory bandwidth. Sequential
programs typically have a small ‘working set’ of data, most of which can be
stored in the L1, L2, or L3 caches, which are faster than any DRAM. Moreover, CPU programs
generally have more random memory access patterns than massively parallel programs, and
so would not derive much benefit from a wide memory bus.
12.2 GPU Design
There are 16 streaming multiprocessors (SMs) in the above diagram. Each SM has 8
streaming processors (SPs). That is, we get a total of 128 SPs. Now, each SP has a MAD
unit (Multiply and Addition Unit) and an additional MU (Multiply Unit). The GT200 has
240 SPs, and exceeds 1 TFLOP of processing power.
Each SM is massively threaded and can run thousands of threads per application. The G80
card supports 768 threads per SM (note: per SM, not per SP). Since each SM has 8 SPs, each SP
supports a maximum of 96 threads. The total number of threads that can run is 128 * 96 = 12,288.
This is why these processors are called ‘massively parallel’.
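The arithmetic above can be checked in a few lines; the numbers are the G80 figures quoted in the text:

```python
SMS = 16               # streaming multiprocessors on the chip
SPS_PER_SM = 8         # streaming processors per SM
THREADS_PER_SM = 768   # hardware thread slots per SM on the G80

total_sps = SMS * SPS_PER_SM                    # 128 SPs in total
threads_per_sp = THREADS_PER_SM // SPS_PER_SM   # 96 threads per SP
total_threads = total_sps * threads_per_sp      # 12,288 resident threads
print(total_sps, threads_per_sp, total_threads)  # -> 128 96 12288
```

Note that 12,288 threads is the hardware's resident capacity, not a per-program limit: an application can launch far more threads, which the hardware schedules onto these slots in batches.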
If you are mixing hardware with different networking technologies, there will be large
differences in the speed with which data is accessed and in how individual nodes
communicate. If your budget allows, make sure that all of the machines you want to include
in your cluster have similar networking capabilities and, if at all possible, network
adapters from the same manufacturer.
Cluster Software
You will have to build versions of clustering software for each kind of system you include in
your cluster.
Programming
Our code will have to be written to support the lowest common denominator for data types
supported by the least powerful node in our cluster.
Timing
This is the most problematic aspect of heterogeneous clusters. Since these machines have
different performance profiles, our code will execute at different rates on the different kinds of
nodes. This can cause serious bottlenecks if a process on one node is waiting for the results of a
calculation on a slower node. The second kind of heterogeneous cluster is made from
different machines in the same architectural family: e.g., a collection of Intel boxes where the
machines are from different generations, or machines of the same generation from different
manufacturers.
Network Selection
There are a number of different kinds of network topologies, including buses, cubes of
various degrees, and grids/meshes. These network topologies will be implemented by use of
one or more network interface cards, or NICs, installed into the head-node and compute
nodes of our cluster.
Speed Selection
No matter what topology you choose for your cluster, you will want the fastest network
that your budget allows. Fortunately, the availability of high-speed computers has also driven
the development of high-speed networking systems. Examples are 10 Mbit Ethernet, 100 Mbit
Ethernet, Gigabit Ethernet, channel bonding, etc.
Google uses cluster computing as its solution to the high demand of system resources since
clusters have better price-performance ratios than alternative high-performance computing
platforms, and also use less electrical power. Google focuses on 2 important design factors:
reliability and request throughput. Google is able to achieve reliability at the software level
so that a reliable computing infrastructure can be constructed on clusters of 15,000
commodity PCs distributed worldwide. The services for Google are also replicated across
multiple machines in the clusters to provide the necessary availability. Google maximizes
overall request throughput by performing parallel execution of individual search requests.
This means that more search requests can be completed within a specific time interval.
The earthquake simulation is conducted using a terascale HP AlphaServer cluster with
750 quadruple-processor nodes at the Pittsburgh Supercomputing Center (PSC). It
simulates the 1994 Northridge earthquake in the Greater LA Basin at 1 Hz maximum
frequency resolution and 100 m/s minimum shear wave velocity. The resulting unstructured
mesh contains over 100 million grid points and 80 million hexahedral finite elements,
ranking it as one of the largest unstructured mesh simulations ever conducted. It is also the
most highly resolved simulation of the Northridge earthquake ever done, sustaining nearly a
teraflop/s over 12 hours while solving the 300-million-unknown wave propagation problem.
15. CONCLUSION
Due to the economics of cluster computing and the flexibility and high performance offered,
cluster computing has made its way into the mainstream enterprise data centers using clusters
of various sizes. As clusters become more popular and more pervasive, careful consideration
of the application requirements and what that translates to in terms of network characteristics
becomes critical to the design and delivery of an optimal and reliable performing solution.
Knowledge of how the application uses the cluster nodes and how the characteristics of the
application impact and are impacted by the underlying network is critically important. As
critical as the selection of the cluster nodes and operating system, so too are the selection of
the node interconnects and underlying cluster network switching technologies. A scalable and
modular networking solution is critical, not only to provide incremental connectivity but also
to provide incremental bandwidth options as the cluster grows. The ability to use advanced
technologies within the same networking platform, such as 10 Gigabit Ethernet, provides new
connectivity options and increases bandwidth while protecting the existing investment.
The technologies associated with cluster computing, including host protocol stack-processing
and interconnect technologies, are rapidly evolving to meet the demands of current, new, and
emerging applications. Much progress has been made in the development of low-latency
switches, protocols, and standards that efficiently and effectively use network hardware
components.