CLUSTER COMPUTING
PROJECT REPORT
Submitted By:
KANDALA VENKATA PHANI SAI RATNADEEP
2210315725
CSE-7
Submitted To:
Date: 15/06/2018
CERTIFICATE
This is to certify that Kandala Venkata Phani Sai Ratnadeep (2210315725), a student of
the Computer Science and Engineering Department, GITAM DEEMED TO BE
UNIVERSITY, HYDERABAD, has successfully completed his project on
"CLUSTER COMPUTING" as part of a summer internship from 1st May 2018 to 15th
June 2018 under the guidance of Shri V. Malolan, Scientist 'E', Division Head of
QDMS&NDE, Directorate of Reliability & Quality Assurance at Advanced Systems
Laboratory, Defence Research and Development Organisation.
(V.MALOLAN)
Scientist 'E' (R&QA)
Division Head of QDMS&NDE
Advanced Systems Laboratory
DRDO
Internship Report Page 2
DRDO (ADVANCED SYSTEMS LABORATORY)
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
INDEX
1. INTRODUCTION ....................................................................................................... 8
7.ARCHITECTURE ........................................................................................................ 21
9.4 Ethernet, Fast Ethernet, Gigabit Ethernet and 10-Gigabit Ethernet .................... 32
1. INTRODUCTION
1.1 General Introduction:
Parallel computing has seen many changes since the days of the highly expensive and
proprietary super computers. Changes and improvements in performance have also been seen
in the area of mainframe computing for many environments. But these computing
environments may not be the most cost-effective and flexible solution for a given problem.
Over the past decade, cluster technologies have been developed that allow multiple low-cost
computers to work in a coordinated fashion to process applications. The economics,
performance, and flexibility of compute clusters make cluster computing an attractive
alternative to centralized computing models and the cost, inflexibility, and scalability
issues inherent in those models.
Many enterprises are now looking at clusters of high-performance, low cost computers to
provide increased application performance, high availability, and ease of scaling within the
data centre. Interest in and deployment of computer clusters has largely been driven by the
increase in the performance of off-the-shelf commodity computers, high-speed, low-latency
network switches and the maturity of the software components. Application performance
continues to be of significant concern for various entities including governments, military,
education, scientific and now enterprise organizations.
When considering a cluster implementation, some basic questions can help determine the
cluster attributes so that technology options can be evaluated. Ideally, interconnect and
storage bandwidth should not be limiting factors, although this is not always the case.
2. HISTORY OF CLUSTERS
The formal engineering basis of cluster computing as a means of doing parallel work of any
sort was arguably invented by Gene Amdahl of IBM, who in 1967 published what has come
to be regarded as the seminal paper on parallel processing: Amdahl's law.
The history of early computer clusters is more or less directly tied into the history of early
networks, as one of the primary motivations for the development of a network was to link
computing resources, creating a de facto computer cluster. The first commodity clustering
product was ARCnet, developed by Datapoint in 1977. ARCnet wasn't a commercial
success, and clustering didn't really take off until DEC released its VAXcluster product in
the 1980s for the VAX/VMS operating system. The ARCnet and VAXcluster products not
only supported parallel computing, but also shared file systems and peripheral devices. They
were supposed to give you the advantage of parallel processing while maintaining data
reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS
systems from HP running on Alpha and Itanium systems.
The communication between the parts of the main job occurs within the framework of the so-
called message-passing paradigm, which was standardized in the message-passing interface
(MPI). The message-passing paradigm is flexible enough to support a variety of applications
and is also well adapted to the MPP architecture. In recent years, tremendous
improvements in the performance of standard workstation processors have led to their use in
MPP supercomputers, resulting in significantly lower price/performance ratios.
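The message-passing paradigm described here can be sketched without MPI itself. The following illustrative Python example uses the standard library's multiprocessing module (not an MPI implementation, so the structure is only an analogy): two processes cooperate purely through explicit send and receive calls, the same discipline MPI ranks follow.

```python
# Sketch of the message-passing paradigm: processes share nothing and
# exchange data only through explicit send/recv calls (like MPI_Send/MPI_Recv).
from multiprocessing import Process, Pipe

def worker(conn):
    """Receive a chunk of work, process it, and send the result back."""
    chunk = conn.recv()            # blocking receive
    conn.send(sum(chunk))          # explicit send of the partial result
    conn.close()

def main():
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])      # distribute work to the "rank"
    result = parent.recv()         # gather the result
    p.join()
    return result

if __name__ == "__main__":
    print(main())                  # prints 10
```

In a real MPI program the Pipe would be replaced by the interconnect, and the same send/receive structure would run across physically separate nodes.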
3. WHY CLUSTERS?
The question may arise why clusters are designed and built when perfectly good commercial
supercomputers are available on the market. The answer is that supercomputers are
expensive, and also difficult to upgrade and maintain. Clusters, on the other hand, are a
cheap and easy way to take off-the-shelf components and combine them into a single
supercomputer. In some areas of research, clusters are actually faster than commercial
supercomputers. Clusters also have the distinct advantage that they are simple to build using
components available from hundreds of sources. We don't even have to use new equipment
to build a cluster.
Price/Performance:
The most obvious benefit of clusters, and the most compelling reason for the growth in their
use, is that they have significantly reduced the cost of processing power:
1. Processors - commodity processors have steadily fallen in price while rising in
performance.
2. Memory - the memory used by these processors has dropped in cost along with the
processors.
3. Networks - high-speed networks can now be assembled from commodity products for a
fraction of the cost necessary only a few years ago.
4. Motherboards, busses, and other sub-systems - all of these have become commodity
products, allowing the assembly of affordable computers from off-the-shelf components.
3.1 Cluster Benefits:
The main benefits of clusters are scalability, availability, and performance. For scalability, a
cluster uses the combined processing power of compute nodes to run cluster-enabled
applications such as a parallel database server at a higher performance than a single machine
can provide. Scaling the cluster's processing power is achieved by simply adding additional
nodes to the cluster. Availability within the cluster is assured as nodes within the cluster
provide backup to each other in the event of a failure. In high-availability clusters, if a node
is taken out of service or fails, the load is transferred to another node (or nodes) within the
cluster. To the user, this operation is transparent as the applications and data running are also
available on the failover nodes. An additional benefit comes with the existence of a single
system image and the ease of manageability of the cluster.
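As a hedged sketch of the failover behaviour described above (the class and node names are hypothetical, not part of any real cluster middleware), the following Python fragment dispatches requests round-robin across nodes and transparently skips a node once it is marked failed, so callers never see the failure:

```python
# Minimal illustration of high-availability dispatch: round-robin over
# nodes, transparently skipping any node that has been taken out of service.
class MiniCluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)       # all configured nodes
        self.healthy = set(nodes)      # nodes currently in service
        self._next = 0

    def fail(self, node):
        """Mark a node as failed; its load shifts to the remaining nodes."""
        self.healthy.discard(node)

    def dispatch(self):
        """Return the node that should serve the next request."""
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            if node in self.healthy:
                return node

cluster = MiniCluster(["node1", "node2", "node3"])
print(cluster.dispatch())      # node1
cluster.fail("node2")
print(cluster.dispatch())      # node3 (node2 is skipped from now on)
```

Real high-availability middleware adds health probes, state replication, and failback, but the user-visible contract is the same: requests keep being served.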
4. ATTRIBUTES OF CLUSTERS
There are several types of clusters, each with specific design goals and functionality. These
clusters range from distributed or parallel clusters for computation-intensive or data-intensive
applications, used for protein, seismic, or nuclear modelling, down to simple load-balanced
clusters.
High-availability and load-balanced clusters present the single virtual machine necessary to
allow the network to effortlessly grow to meet increased business demands.
Both the high availability and load-balancing cluster technologies can be combined to
increase the reliability, availability, and scalability of application and data resources that are
widely deployed for web, mail, news, or FTP services.
seismic analysis, etc.), and financial data analysis. One of the more common cluster
architectures is the Beowulf class of clusters. A Beowulf cluster can be defined as a
number of systems whose collective processing capabilities are simultaneously applied to a
specific technical, scientific, or business application. Each individual computer is referred to
as a “node” and each node communicates with other nodes within a cluster across standard
Ethernet technologies (10/100 Mbps, GbE, or 10GbE). Other high-speed interconnects such
as Myrinet, Infiniband, or Quadrics may also be used.
Collaboration
Scientists can collaborate in real-time across dispersed locations- bridging isolated islands of
scientific research and discovery- when HPC clusters are based on open source and building
block technology.
Scalability
HPC clusters can grow in overall capacity because processors and nodes can be added as
demand increases.
Availability
Because single points of failure can be eliminated, if any one system component goes down,
the system as a whole, or the solution spanning multiple systems, stays highly available.
Processors, memory, disk, or operating system (OS) technology can be easily updated, and
new processors and nodes can be added or upgraded as needed.
Compared to proprietary systems, the total cost of ownership can be much lower. This
includes service, support and training.
Vendor lock-in
The age-old problem of proprietary vs. open systems that use industry-accepted standards is
eliminated.
System manageability
Reusability of components
Commercial components can be reused, preserving the investment. For example, older nodes
can be deployed as file/print servers, web servers or other infrastructure servers.
Disaster recovery
Large SMPs are monolithic entities located in one facility. HPC systems can be collocated or
geographically dispersed to make them less susceptible to disaster.
The master node acts as a server for Network File System (NFS) and as a gateway to the
outside world. As an NFS server, the master node provides user file space and other common
system software to the compute nodes via NFS. As a gateway, the master node allows users
to gain access through it to the compute nodes. Usually, the master node is the only machine
that is also connected to the outside world using a second network interface card (NIC). The
sole task of the compute nodes is to execute parallel jobs. In most cases, therefore, the
compute nodes do not have keyboards, mice, video cards, or monitors. All access to the
client nodes is provided via remote connections from the master node. Because compute
nodes do not need to access machines outside the cluster, nor do machines outside the cluster
need to access compute nodes directly, compute nodes commonly use private IP addresses,
such as the 10.0.0.0/8 or 192.168.0.0/16 address ranges. From a user’s perspective, a
Beowulf cluster appears as a Massively Parallel Processor (MPP) system. The most common
methods of using the system are to access the master node either directly or through Telnet or
remote login from personal workstations. Once on the master node, users can prepare and
compile their parallel applications, and also spawn jobs on a desired number of compute
nodes in the cluster. Applications must be written in parallel style and use the message-
passing programming model. Jobs of a parallel application are spawned on compute nodes,
which work collaboratively until the application finishes. During execution, compute
nodes use standard message-passing middleware, such as the Message Passing Interface
(MPI) and Parallel Virtual Machine (PVM), to exchange information.
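The private address ranges mentioned above can be checked with Python's standard ipaddress module; this small sketch simply confirms that typical compute-node addresses are not routable on the public Internet, which is why all outside access goes through the master node:

```python
import ipaddress

# The two ranges named in the text; compute nodes drawn from them are
# unreachable from the public Internet except via the master node gateway.
CLUSTER_RANGES = [ipaddress.ip_network("10.0.0.0/8"),
                  ipaddress.ip_network("192.168.0.0/16")]

def is_cluster_private(addr: str) -> bool:
    """Return True if addr falls in one of the cluster's private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLUSTER_RANGES)

print(is_cluster_private("10.0.0.5"))       # True: plausible compute node
print(is_cluster_private("192.168.1.10"))   # True: plausible compute node
print(is_cluster_private("8.8.8.8"))        # False: public address
```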
7. ARCHITECTURE
Three principal features usually provided by cluster computing are availability, scalability,
and simplification. Availability is provided by the cluster of computers operating as a single
system, continuing to provide services even when one of the individual computers is lost
due to a hardware failure or other reason. Scalability is provided by the inherent ability of the
overall system to allow new components, such as computers, to be added as the overall
system's load increases. The simplification comes from the ability of the cluster to allow
administrators to manage the entire group as a single system. This greatly simplifies the
management of groups of systems and their applications. The goal of cluster computing is to
facilitate sharing a computing load over several systems without either the users of the
system or the administrators needing to know that more than one system is involved. The
Windows NT Server Edition of the Windows operating system is an example of a base
operating system that has been modified to include architecture facilitating the
establishment of a cluster computing environment.
Cluster computing has been employed for over fifteen years but it is
the recent demand for higher availability in small businesses that has caused an explosion in
this field. Electronic databases and electronic malls have become essential to the daily
operation of small businesses. Access to this critical information by these entities has created
a large demand for the principal features of cluster computing.
There are some key concepts that must be understood when forming a cluster computing
resource. Nodes or systems are the individual members of a cluster. They can be computers,
servers, and other such hardware although each node generally has memory and processing
capabilities. If one node becomes unavailable the other nodes can carry the demand load so
that applications or services are always available. There must be at least two nodes to
compose a cluster structure otherwise they are just called servers. The collection of software
on each node that manages all cluster specific activity is called the cluster service. The
cluster service manages all of the resources, the canonical items in the system, and sees them
as identical opaque objects. Resources can be such things as physical hardware devices, like
disk drives and network cards, or logical items, like logical disk volumes, TCP/IP addresses,
applications, and databases.
Groups contain all of the resources necessary to run a specific application, and if need be, to
connect to the service provided by the application in the case of client systems. These groups
allow administrators to combine resources into larger logical units so that they can be
managed as a unit. This, of course, means that all operations performed on a group affect all
resources contained within that group.
Normally the development of a cluster computing system occurs in phases. The first phase
involves establishing the underpinnings into the base operating system and building the
foundation of the cluster components. These things should focus on providing enhanced
availability to key applications using storage that is accessible to two nodes. The following
stages occur as the demand increases and should allow for much larger clusters to be formed.
These larger clusters should have a true distribution of applications, higher performance
interconnects, widely distributed storage for easy accessibility and load balancing. Cluster
computing will become even more prevalent in the future because of the growing needs and
demands of businesses, as well as the spread of the Internet.
Clusters are in fact quite simple. They are a bunch of computers tied together with a network
working on a large problem that has been broken down into smaller pieces. There are a
number of different strategies we can use to tie them together. There are also a number of
different software packages that can be used to make the software side of things work.
Parallelism
The name of the game in high performance computing is parallelism. It is the quality that
allows something to be done in parts that work independently rather than a task that has so
many interlocking dependencies that it cannot be further broken down. Parallelism operates
at two levels: hardware parallelism and software parallelism.
Hardware Parallelism
On one level hardware parallelism deals with the CPU of an individual system and how we
can squeeze performance out of sub-components of the CPU that can speed up our code. At
another level there is the parallelism that is gained by having multiple systems working on a
computational problem in a distributed fashion. Parallelism is termed 'fine-grained' when it
occurs inside the CPU or among multiple CPUs in the same system, and 'coarse-grained'
when a collection of separate systems acts in concert.
A computer’s CPU is commonly pictured as a device that operates on one instruction after
another in a straight line, always completing one-step or instruction before a new one is
started. But new CPU architectures have an inherent ability to do more than one thing at
once. The logic of the CPU chip divides the CPU into multiple execution units. Systems that
have multiple execution units allow the CPU to attempt to process more than one instruction
at a time. Two hardware features of modern CPUs support multiple execution units: the
cache, a small, fast memory inside the CPU, and the pipeline, a small area of memory inside
the CPU where the instructions that are next in line to be executed are stored. Both cache
and pipeline allow impressive increases in CPU performance.
At the node level, multiple network interfaces allow a node to exchange data in parallel
with other nodes in the cluster. Finally, if we use multiple disk drive controllers in each node,
we create parallel data paths that can be used to increase the performance of the I/O
subsystem.
Software Parallelism
Software parallelism is the ability to find well defined areas in a problem we want to solve
that can be broken down into self-contained parts. These parts are the program elements that
can be distributed and give us the speedup that we want to get out of a high performance
computing system. Before we can run a program on a parallel cluster, we have to ensure that
the problems we are trying to solve are amenable to being done in a parallel fashion. Almost
any problem that is composed of smaller sub problems that can be quantified can be broken
down into smaller problems and run on a node on a cluster.
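The decomposition just described can be made concrete with a small illustrative sketch (using Python's standard multiprocessing Pool as a stand-in for cluster nodes): a large problem, here summing a list, is split into self-contained parts that independent workers solve, and the partial results are combined at the end.

```python
# Software parallelism in miniature: split the data, let independent
# workers process their own chunks, then combine the partial results.
from multiprocessing import Pool

def partial_sum(chunk):
    """Each worker solves its self-contained sub-problem."""
    return sum(chunk)

def parallel_sum(data, workers=4):
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))   # combine partial results

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))          # prints 499500
```

On a real cluster the workers would be processes on separate compute nodes exchanging results through message passing, but the decomposition step is identical.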
System-Level Middleware
System-level middleware offers Single System Image (SSI) and high availability
infrastructure for processes, memory, storage, I/O, and networking. The single system image
illusion can be implemented using the hardware or software infrastructure. This unit focuses
on SSI at the operating system or subsystems level.
Although new high-performance protocols are available for cluster computing, some
instructors may want to provide students with a brief introduction to message-passing
programs using the BSD Sockets interface to the Transmission Control Protocol/Internet
Protocol (TCP/IP) before introducing more complicated parallel programming with
distributed-memory programming tools. If students have already had a course in data
communications or computer networks then this unit should be skipped. Students should
have access to a networked computer lab with the Sockets libraries enabled. Sockets usually
come installed on Linux workstations.
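A minimal version of such a Sockets exercise might look like the following sketch (the function names are illustrative): a server thread echoes back whatever a TCP client sends, which is the basic round trip that every message-passing program builds on.

```python
# Minimal TCP message passing over the BSD Sockets API: a server thread
# echoes each message back to the client.
import socket
import threading

def echo_server(sock):
    """Accept one connection, receive a message, and echo it back."""
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)         # receive the message
        conn.sendall(data)             # send it straight back

def round_trip(message: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    t = threading.Thread(target=echo_server, args=(server,))
    t.start()
    with socket.create_connection(server.getsockname()) as client:
        client.sendall(message)
        reply = client.recv(1024)
    t.join()
    server.close()
    return reply

if __name__ == "__main__":
    print(round_trip(b"hello cluster"))   # b'hello cluster'
```

MPI and PVM hide exactly this plumbing behind send/receive calls, which is why a Sockets exercise is a useful stepping stone.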
Application-Level Middleware
Application-level middleware is the layer of software between the operating system and
applications. Middleware provides various services required by an application to function
correctly. A course in cluster programming can include some coverage of middleware tools
such as CORBA, Remote Procedure Call, Java Remote Method Invocation (RMI), or Jini.
Sun Microsystems has produced a number of Java-based technologies that can become units
in a cluster programming course, including the Java Development Kit (JDK) product family
that consists of the essential tools and APIs for all developers writing in the Java
programming language, through to APIs for telephony (JTAPI), database connectivity
(JDBC), 2D and 3D graphics, security, and electronic commerce. These technologies
enable Java to interoperate with many other devices, technologies, and software standards.
A single system image should:
- Provide a simple, straightforward view of all system resources and activities from any
node of the cluster
- Free the end user from having to know where an application will run
- Free the operator from having to know where a resource is located
- Let the user work with familiar interfaces and commands, and allow administrators to
manage the entire cluster as a single entity
- Reduce the risk of operator errors, with the result that end users see improved
reliability and higher availability of the system
The network is the most critical part of a cluster. Its capabilities and performance directly
influence the applicability of the whole system for HPC. Interconnects range from
Local/Wide Area Network (LAN/WAN) technologies like Fast Ethernet and ATM to
System Area Networks (SAN) like Myrinet and Memory Channel.
Application:
This layer includes the various applications run by a particular group. These applications
run in parallel and include various queries running on different nodes of the cluster. This
can be seen as the input part of the cluster components.
Middleware:
These are software packages that sit between the user's applications and the operating
system in cluster computing. In other words, they are the layers of software between
applications and the operating system. Middleware provides various services required by
an application to function correctly. Software packages used as middleware include:
Features:
Scyld :
Features:
Commercial distribution.
Single system image design.
Processors: x86 and Opteron.
Interconnects: Ethernet and Infiniband. MPI and PVM.
Disk full and diskless support.
Rocks:
Features:
Operating System:
Clusters can be supported by various operating systems, including Windows, Linux, etc.
Interconnect:
Interconnection between the various nodes of the cluster system can be done using 10GbE,
Myrinet, etc. Small cluster systems can be connected with the help of simple switches.
Switch:
A switch is a multi-port bridge with a buffer and a design that boosts its efficiency (a large
number of ports implies less traffic per port) and performance. A switch can perform error
checking before forwarding data, which makes it very efficient: it does not forward packets
that have errors, and it forwards good packets selectively to the correct port only.
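The "check before forward" behaviour can be sketched as follows. This is only an illustration: CRC32 stands in for the real Ethernet frame check sequence, and the frame format here is invented for the example.

```python
# Store-and-forward sketch: verify a frame's checksum and forward only
# frames that pass; corrupted frames are silently dropped.
import zlib

def make_frame(payload: bytes) -> bytes:
    # Append a 4-byte CRC32, loosely mimicking an Ethernet FCS.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def forward_if_valid(frame: bytes):
    payload, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != fcs:
        return None                    # corrupted frame: drop, don't forward
    return payload                     # good frame: forward to the right port

good = make_frame(b"data")
bad = good[:-1] + bytes([good[-1] ^ 0xFF])   # corrupt the checksum
print(forward_if_valid(good))   # b'data'
print(forward_if_valid(bad))    # None
```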
10 GbE:
10 Gigabit Ethernet is the fastest and most recent of the Ethernet standards. IEEE
802.3ae defines a version of Ethernet with a nominal rate of 10 Gbit/s, making it 10 times
faster than Gigabit Ethernet. Unlike other Ethernet systems, 10 Gigabit Ethernet is based
entirely on the use of optical fiber connections.
Myrinet :
Infiniband:
Nodes:
The nodes of the cluster system are the individual computers that are connected. These can
use Intel or AMD 64-bit processors.
9.CLUSTER OPERATION
9.1 Cluster Nodes:
Node technology has migrated from the conventional tower cases to single rack-unit
multiprocessor systems and blade servers that provide a much higher processor density
within a decreased area. Processor speeds and server architectures have increased in
performance, as well as solutions that provide options for either 32-bit or 64-bit processor
systems. Additionally, memory performance as well as hard-disk access speeds and storage
capacities have also increased. It is interesting to note that even though performance is
growing exponentially in some cases, the cost of these technologies has dropped
considerably. As shown below, node participation in the cluster falls into one of two
responsibilities: the master (or head) node and the compute (or slave) nodes. The master
node is the unique server in cluster systems. It is responsible for running the file system
and also serves as the key system for clustering middleware to route processes and duties
and to monitor the health and status of each slave node. A compute (or slave) node within
a cluster provides the cluster
a computing and data storage capability. These nodes are derived from fully operational,
standalone computers that are typically marketed as desktop or server systems that, as such,
are off-the-shelf commodity systems.
Latency is measured in microseconds (µSec) or milliseconds (mSec) and is the time it takes
to move a single packet of information in one port and out of another. For parallel clusters,
latency is measured as the time it takes for a message to be passed from one processor to
another that includes the latency of the interconnecting switch or switches. The actual
latencies observed will vary widely even on a single switch depending on characteristics
such as packet size, switch architecture (centralized versus distributed), queuing, buffer
depths and allocations, and protocol processing at the nodes.
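A rough, illustrative way to see latency at this granularity is to ping-pong a tiny message and halve the round-trip time. The sketch below measures a local socket pair rather than a real cluster interconnect, so the numbers are only indicative of the method, not of switch latency:

```python
# Measure approximate one-way latency by timing many 1-byte round trips
# over a local socket pair (a stand-in for two cluster nodes).
import socket
import time

def one_way_latency_us(trials: int = 1000) -> float:
    left, right = socket.socketpair()
    start = time.perf_counter()
    for _ in range(trials):
        left.sendall(b"x")              # message out...
        right.recv(1)
        right.sendall(b"x")             # ...and back
        left.recv(1)
    elapsed = time.perf_counter() - start
    left.close()
    right.close()
    # Half the average round-trip time approximates the one-way latency.
    return elapsed / trials / 2 * 1e6   # microseconds

print(f"approx. one-way latency: {one_way_latency_us():.1f} us")
```

Real interconnect benchmarks vary the message size as well, since latency and bandwidth dominate at different sizes.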
Ethernet:
Ethernet is the most popular physical layer LAN technology in use today. It defines the
number of conductors that are required for a connection, the performance thresholds that can
be expected, and provides the framework for data transmission. A standard Ethernet network
can transmit data at a rate of up to 10 megabits per second (10 Mbps). The Institute of
Electrical and Electronics Engineers developed an Ethernet standard known as IEEE
Standard 802.3.
Fast Ethernet:
The Fast Ethernet standard (IEEE 802.3u) has been established for Ethernet networks that
need higher transmission speeds. This standard raises the Ethernet speed limit from 10 Mbps
to 100 Mbps. One variant, 100BASE-T4, utilizes an extra two wires for use with Category 3
UTP cable.
Gigabit Ethernet:
Gigabit Ethernet was developed to meet the need for faster communication networks. It is
also known as "gigabit-Ethernet-over-copper" or 1000Base-T. It is a version of Ethernet that
runs at speeds 10 times faster than 100Base-T and is defined in the IEEE 802.3 standard.
Existing Ethernet LANs with 10 and 100 Mbps cards can feed into a Gigabit Ethernet
backbone to interconnect high-performance switches, routers, and servers.
10 Gigabit Ethernet: 10 Gigabit Ethernet is the fastest and most recent of the Ethernet
standards. IEEE 802.3ae defines a version of Ethernet with a nominal rate of 10 Gbit/s,
making it 10 times faster than Gigabit Ethernet. Unlike other Ethernet systems, 10 Gigabit
Ethernet is based entirely on the use of optical fiber connections.
In the days of mainframes, data was stored physically separate from the actual
processing unit, but was still only accessible through the processing units. As PC based
servers became more commonplace, storage devices went 'inside the box' or in external
boxes that were connected directly to the system. Each of these approaches was valid in its
time, but as our need to store increasing volumes of data and our need to make it more
accessible grew, other alternatives were needed. Enter network storage. Network storage is
a generic term used to describe network based data storage, but there are many
technologies within it which all go to make the magic happen. Here is a rundown of some
of the basic terminology that you might happen across when reading about network
storage.
Direct attached storage is the term used to describe a storage device that is directly
attached to a host system. The simplest example of DAS is the internal hard drive of a
server computer, though storage devices housed in an external box come under this banner
as well. DAS is still, by far, the most common method of storing data for computer
systems. Over the years, though, new technologies have emerged which work, if you'll
excuse the pun, out of the box.
Network Attached Storage, or NAS, is a data storage mechanism that uses special
devices connected directly to the network media. These devices are assigned an IP address
and can then be accessed by clients via a server that acts as a gateway to the data, or in
some cases allows the device to be accessed directly by the clients without an intermediary.
The beauty of the NAS structure is that it means that in an environment with many
servers running different operating systems, storage of data can be centralized, as can the
security, management, and backup of the data. An increasing number of companies already
make use of NAS technology, if only with devices such as CD-ROM towers (stand-alone
boxes that contain multiple CD-ROM drives) that are connected directly to the network.
Some of the big advantages of NAS include expandability: if more storage space is needed,
another NAS device can be added to expand the available storage. NAS also brings an extra
level of fault tolerance to the network. In a DAS environment, a server going down means
that the data that server holds is no longer available. With NAS, the data is still available
on the network and accessible by clients. Fault-tolerant measures such as RAID can be
used to make sure that the NAS device does not become a point of failure.
A SAN is a network of storage devices that are connected to each other and to a
server, or cluster of servers, which act as an access point to the SAN. In some
configurations a SAN is also connected to the network. SANs use special switches as a
mechanism to connect the devices. These switches, which look a lot like normal Ethernet
networking switches, act as the connectivity point for SANs. Making it possible for devices
to communicate with each other on a separate network brings with it many advantages.
Consider, for instance, the ability to back up every piece of data on your network without
having to 'pollute' the standard network infrastructure with gigabytes of data. This is just
one of the advantages of a SAN which is making it a popular choice with companies today,
and is a reason why it is forecast to become the data storage technology of choice in the
coming years.
Network-attached storage (NAS) is hard disk storage that is set up with its own network
address rather than being attached to the department computer that is serving applications
to a network's workstation users. By removing storage access and its management from the
department server, both application programming and files can be served faster because
they are not competing for the same processor resources. The network-attached storage
device is attached to a local area network (typically, an Ethernet network) and assigned an
IP address. File requests are mapped by the main server to the NAS file server.
A network-attached storage (NAS) device is a server that is dedicated to nothing more than
file sharing. NAS does not provide any of the activities that a server in a server-centric
system typically provides, such as e-mail, authentication or file management. NAS allows
more hard disk storage space to be added to a network that already utilizes servers without
shutting them down for maintenance and upgrades. With a NAS device, storage is not an
integral part of the server. Instead, in this storage-centric design, the server still handles all
of the processing of data but a NAS device delivers the data to the user. A NAS device
does not need to be located within the server but can exist anywhere in a LAN and can be
made up of multiple networked NAS devices.
Network Attached Storage separates the application server from the storage. This increases
overall system performance by allowing the servers to perform application requests and the
NAS to serve files or run applications.
Each NAS resides on the LAN as an independent network node and has its own IP address
An important benefit of NAS is its ability to provide multiple clients on the network with
access to the same files. Today, when more storage capacity is required, NAS appliances
can simply be outfitted with larger disks or clustered together to provide both vertical and
horizontal scalability.
RAID-0 (Striping)
Evaluation:
Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
Capacity: N*B
The entire space is used to store data. Since there is no duplication, N disks, each having
B blocks, are fully utilized.
RAID-4 (Block-Level Striping with Dedicated Parity)
Assume that in the above figure, C3 is lost due to a disk failure. Then we can recompute
the data bit stored in C3 by looking at the values of all the other columns and the parity
bit. This allows us to recover lost data.
Evaluation:
Reliability: 1
RAID-4 allows recovery from at most one disk failure (because of the way parity works).
If more than one disk fails, there is no way to recover the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are made
available for data storage, each disk having B blocks.
RAID-5 (Striping with Distributed Parity)
This is a slight modification of the RAID-4 system; the only difference is that the
parity rotates among the drives.
Evaluation:
Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way parity works). If
more than one disk fails, there is no way to recover the data. This is identical to RAID-
4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity. Hence, (N-1) disks
are made available for data storage, each disk having B blocks.
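The rotation itself is a simple modular pattern. As a sketch, one common convention (the "left-symmetric" layout; other controllers rotate differently) places the parity for stripe s as follows:

```python
# Hypothetical left-symmetric RAID-5 layout: the parity block moves to a
# different disk on each successive stripe, so parity writes are spread
# evenly instead of bottlenecking on one dedicated parity disk.
def raid5_parity_disk(stripe, n_disks):
    return (n_disks - 1) - (stripe % n_disks)

# With 4 disks, the parity cycles over disks 3, 2, 1, 0, 3, 2, ...
print([raid5_parity_disk(s, 4) for s in range(6)])
# -> [3, 2, 1, 0, 3, 2]
```

Spreading the parity is what distinguishes RAID-5 from RAID-4 in practice: every small write must update parity, and rotating it removes the single-disk write hotspot.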
As the cluster node processes the application, the CPU is dedicated to the application and
protocol processing does not occur. For this to change, the protocol process must interrupt a
uniprocessor machine or request a spin lock on a multiprocessor machine. As the request is
granted, CPU cycles are then applied to the protocol process. As more cycles are applied to
protocol processing, application processing is suspended. In many environments, the value of
the cluster is based on the run-time of the application. The shorter the time to run, the more
floating-point operations and/or millions of instructions per-second occur, and, therefore, the
lower the cost of running a specific application or job.
The left-hand example in the figure below shows that when there is virtually no network or
protocol processing going on, CPUs 0 and 1 of each node are 100% devoted to application
processing. The right-hand side shows what happens when network traffic levels have
significantly increased: the CPU spends cycles processing the MPI and TCP protocol
stacks, including moving data to and from the wire, which reduces or suspends
application processing. With the increase in protocol processing, note that the utilization
percentages of CPUs 0 and 1 drop dramatically, in some cases to 0.
GPUs do not have virtual memory, interrupts, or means of addressing devices such as the
keyboard and the mouse. They are terribly inefficient when the workload is not SPMD (Single
Program, Multiple Data). Such programs are best handled by CPUs, which is perhaps why
CPUs are still around. For example, a CPU can calculate a hash for a single string much,
much faster than a GPU can, but when it comes to computing several thousand hashes, the GPU
wins. As of 2009, the ratio between GPUs and multi-core CPUs in peak FLOP
throughput was about 10:1. Such a large performance gap pushes developers to offload
their data-intensive applications to the GPU.
GPUs are designed for data-intensive applications. This is underscored by the fact that
GPU DRAM bandwidth has increased tremendously with each passing year, while CPU memory
bandwidth has not. Why do GPUs adopt such a design while CPUs do not? Because
GPUs were originally designed for 3D rendering, which requires holding large amounts of
texture and polygon data. Caches cannot hold such large amounts of data, and thus the only
design that would have increased rendering performance was to widen the memory bus and raise
the memory clock. For example, the Intel i7, which currently supports the largest CPU memory
bandwidth, has a 192-bit memory bus and a memory clock of up to 800 MHz. The GTX
285 had a 512-bit bus and a memory clock of 1242 MHz.
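Those figures translate directly into peak bandwidth. A back-of-envelope sketch, assuming double-data-rate signalling (two transfers per clock; actual sustained bandwidth is lower):

```python
# Peak memory bandwidth = (bytes per transfer) * (transfers per second).
def peak_bandwidth_gb_s(bus_width_bits, mem_clock_mhz, transfers_per_clock=2):
    return bus_width_bits / 8 * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9

print(peak_bandwidth_gb_s(192, 800))    # Intel i7: 38.4 GB/s
print(peak_bandwidth_gb_s(512, 1242))   # GTX 285: ~159 GB/s
```

The roughly 4x gap in the result comes from the GPU's wider bus and higher memory clock together, which is exactly the trade-off described above.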
CPUs also would not benefit greatly from an increased memory bandwidth. Sequential
programs typically have a small ‘working set’ of data, most of which can be
stored in the L1, L2, or L3 caches, which are faster than any DRAM. Moreover, CPU programs
generally have more random memory access patterns than massively parallel programs, and
so would not derive much benefit from a wide memory bus.
12.2 GPU Design
There are 16 streaming multiprocessors (SMs) in the above diagram. Each SM has 8
streaming processors (SPs). That is, we get a total of 128 SPs. Now, each SP has a MAD
unit (Multiply and Addition Unit) and an additional MU (Multiply Unit). The GT200 has
240 SPs, and exceeds 1 TFLOP of processing power.
Each SM is massively threaded and can run thousands of threads per application. The G80
card supports 768 threads per SM (note: per SM, not per SP). Since each SM has 8 SPs, each SP
supports a maximum of 96 threads. The total number of threads that can run is 128 * 96 = 12,288.
This is why these processors are called ‘massively parallel’.
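The arithmetic above can be checked in a few lines; the numbers are the G80 figures quoted in the text:

```python
SMS = 16               # streaming multiprocessors on the chip
SPS_PER_SM = 8         # streaming processors per SM
THREADS_PER_SM = 768   # hardware thread slots per SM on the G80

total_sps = SMS * SPS_PER_SM                    # 128 SPs in total
threads_per_sp = THREADS_PER_SM // SPS_PER_SM   # 96 threads per SP
total_threads = total_sps * threads_per_sp      # 12,288 resident threads
print(total_sps, threads_per_sp, total_threads)  # -> 128 96 12288
```

Note that 12,288 threads is the hardware's resident capacity, not a per-program limit: an application can launch far more threads, which the hardware schedules onto these slots in batches.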
If you are mixing hardware with different networking technologies, there will be large
differences in the speed with which data is accessed and in how individual nodes
communicate. If your budget allows, make sure that all of the machines you want to include
in your cluster have similar networking capabilities and, if at all possible, network
adapters from the same manufacturer.
Cluster Software
You will have to build versions of clustering software for each kind of system you include in
your cluster.
Programming
Our code will have to be written to support the lowest common denominator for data types
supported by the least powerful node in our cluster.
Timing
This is the most problematic aspect of heterogeneous clusters. Since these machines have
different performance profiles, our code will execute at different rates on the different kinds of
nodes. This can cause serious bottlenecks if a process on one node is waiting for the results of a
calculation on a slower node. The second kind of heterogeneous cluster is made from
different machines in the same architectural family: e.g., a collection of Intel boxes where the
machines are from different generations, or machines of the same generation from different
manufacturers.
Network Selection
There are a number of different kinds of network topologies, including buses, cubes of
various degrees, and grids/meshes. These network topologies will be implemented by use of
one or more network interface cards, or NICs, installed into the head-node and compute
nodes of our cluster.
Speed Selection
No matter what topology you choose for your cluster, you will want the fastest network
that your budget allows. Fortunately, the availability of high-speed computers has also driven
the development of high-speed networking systems. Examples are 10 Mbit Ethernet, 100 Mbit
Ethernet, Gigabit Ethernet, channel bonding, etc.
Google uses cluster computing as its solution to the high demand of system resources since
clusters have better price-performance ratios than alternative high-performance computing
platforms, and also use less electrical power. Google focuses on 2 important design factors:
reliability and request throughput. Google is able to achieve reliability at the software level
so that a reliable computing infrastructure can be constructed on clusters of 15,000
commodity PCs distributed worldwide. The services for Google are also replicated across
multiple machines in the clusters to provide the necessary availability. Google maximizes
overall request throughput by performing parallel execution of individual search requests.
This means that more search requests can be completed within a specific time interval.
The earthquake simulation is conducted using a terascale HP AlphaServer cluster with
750 quadruple-processor nodes at the Pittsburgh Supercomputing Center (PSC). It
simulates the 1994 Northridge earthquake in the Greater LA Basin at 1 Hz maximum
frequency resolution and 100 m/s minimum shear wave velocity. The resulting unstructured
mesh contains over 100 million grid points and 80 million hexahedral finite elements,
ranking it as one of the largest unstructured mesh simulations ever conducted. It is also the
most highly resolved simulation of the Northridge earthquake ever done, sustaining nearly a
teraflop/s over 12 hours while solving the 300-million-unknown wave propagation problem.
15. CONCLUSION
Due to the economics of cluster computing and the flexibility and high performance offered,
cluster computing has made its way into the mainstream enterprise data centers using clusters
of various sizes. As clusters become more popular and more pervasive, careful consideration
of the application requirements and what that translates to in terms of network characteristics
becomes critical to the design and delivery of an optimal and reliable performing solution.
Knowledge of how the application uses the cluster nodes and how the characteristics of the
application impact and are impacted by the underlying network is critically important. As
critical as the selection of the cluster nodes and operating system, so too are the selection of
the node interconnects and underlying cluster network switching technologies. A scalable and
modular networking solution is critical, not only to provide incremental connectivity but also
to provide incremental bandwidth options as the cluster grows. The ability to use advanced
technologies within the same networking platform, such as 10 Gigabit Ethernet, provides new
connectivity options and increases bandwidth while protecting the existing investment.
The technologies associated with cluster computing, including host protocol stack-processing
and interconnect technologies, are rapidly evolving to meet the demands of current, new, and
emerging applications. Much progress has been made in the development of low-latency
switches, protocols, and standards that efficiently and effectively use network hardware
components.