Cluster
Ashish Ranjan
07-12-2023
Application Areas
How to Run Applications Faster?
• There are 3 ways to improve performance:
• Work Harder
• Work Smarter
• Get Help
• Computer analogy:
• Work harder → use faster hardware
• Work smarter → use optimized algorithms and techniques to solve computational tasks
• Get help → use multiple computers to solve a particular task
Latest Processor
Characterizing parallel computing systems
• Clustering
• How the system is connected
• Namespace
• How far names and resources are visible across the nodes
• Parallelism
• Reflects the forms of action concurrency that the hardware
architecture can exploit and support
• Latency and locality management
• Defines the mechanisms and methods incorporated to tolerate latency
effects. Caches, pipelining, prefetching, multi-threading, and message-driven
computing mechanisms are among the possible mechanisms that avoid or
hide access latencies
Scalable (Parallel) Computer Architectures
• Massively Parallel Processors (MPP)
• Symmetric Multiprocessors (SMP)
• Cache-Coherent Non-Uniform Memory Access (CC-NUMA)
• Clusters
• Distributed Systems – Grids/P2P
Scalable Parallel Computer Architectures
• MPP
• A large parallel processing system with a shared-nothing
architecture
• Consists of several hundred nodes connected by a high-speed
interconnection network/switch
• Each node has its own main memory and one or more
processors
• Each node runs a separate copy of the OS
• SMP
• 2-64 processors
• Shared-everything architecture
• All processors share all the global resources available
• Single copy of the OS runs on these systems
Scalable Parallel Computer Architectures
• CC-NUMA
• a scalable multiprocessor system having a cache-coherent
nonuniform memory access architecture
• every processor has a global view of all of the memory
• Clusters
• a collection of workstations / PCs that are interconnected by a high-
speed network
• work as an integrated collection of resources
• have a single system image spanning all its nodes
• Distributed systems
• considered conventional networks of independent computers
• have multiple system images as each node runs its own OS
• the individual machines could be combinations of MPPs, SMPs,
clusters, & individual computers
Rise and Fall of Computer Architectures
• Vector Computers (VC) - proprietary system:
• provided the breakthrough needed for the emergence of
computational science, but they were only a partial answer.
• Massively Parallel Processors (MPP) -proprietary systems:
• high cost and a low performance/price ratio.
• Symmetric Multiprocessors (SMP):
• suffers from limited scalability
• Distributed Systems:
• difficult to use and hard to extract parallel performance.
• Clusters - gaining popularity:
• High Performance Computing - Commodity Supercomputing
• High Availability Computing - Mission Critical Applications
Sourcebook of Parallel Computing (Dongarra
et al.):
We note that the term cluster can be applied both broadly (any system
built with a significant number of commodity components) or narrowly
(only commodity components and open-source software). In fact,
there is no precise definition of a cluster. Some of the issues that are
used to argue that a system is a massively parallel processor (MPP)
instead of a cluster include proprietary interconnects (...), particularly
ones designed for a specific parallel computer, and special software
that treats the entire system as a single machine, particularly for the
system administrators. Clusters may be built from personal computers
or workstations (either single processors or symmetric multiprocessors
(SMPs)) and may run either open-source or proprietary operating
systems.
Top500 Computers Architecture
Server OEM Share
Computer Food Chain: Causing the
demise of specialized systems
[Figure: the "computer food chain" cartoon — commodity systems, down to PDAs, displacing specialized machines.]
Clusters
[Figure: cluster architecture — sequential and parallel applications run over a parallel programming environment and cluster middleware (single system image and availability infrastructure) on multiple interconnected nodes.]
Common Cluster Modes
• High Performance (dedicated).
• High Throughput (idle cycle harvesting).
• High Availability (fail-over).
• Multiple High-Performance Computers
• PCs
• Workstations
• SMPs (CLUMPS)
• Distributed HPC Systems leading to Grid
Computing
System CPUs
• Processors
• x86-class Processors
• Intel Xeon SPR
• AMD
• ARM
• Accelerators
• Nvidia H100
• AMD MI300
• HABANA GAUDI
• Intel PVC
System Disk
• Disk and I/O
• Overall improvement in disk access time has
been less than 10% per year
• Amdahl’s law
• Speed-up obtained from faster processors is
limited by the slowest system component
• Parallel I/O
• Carry out I/O operations in parallel, supported by
parallel file system based on hardware or software
RAID
Commodity Components for Clusters (II):
Operating Systems
• Operating Systems
• 2 fundamental services for users
• make the computer hardware easier to use
• create a virtual machine that differs markedly from the real machine
• share hardware resources among users
• Processor - multitasking
• The new concept in OS services
• support multiple threads of control in a process itself
• parallelism within a process
• multithreading
• POSIX thread interface is a standard programming environment
• Trend
• Modularity – MS Windows, IBM OS/2
• Microkernel – provides only essential OS services
• high-level abstraction aiding OS portability
Prominent Components of Cluster Computers
•State of the art Operating Systems
• Linux (MOSIX, Beowulf, and many more)
• Windows HPC (HPC2N – Umea University)
• SUN Solaris (Berkeley NOW, C-DAC PARAM)
• IBM AIX (IBM SP2)
• HP UX (Illinois - PANDA)
• Mach (Microkernel based OS) (CMU)
• Cluster Operating Systems (Solaris MC, SCO
Unixware, MOSIX (academic project))
• OS gluing layers (Berkeley Glunix)
Operating Systems used in Top500
Prominent Components of Cluster Computers (III)
• High Performance Networks/Switches
• RDMA-enabled Ethernet
• InfiniBand
Interconnect Share
Prominent Components of Cluster Computers (IV)
• Cluster Middleware
• Single System Image (SSI)
• System Availability (SA) Infrastructure
• Hardware
• DEC Memory Channel, DSM (Alewife, DASH), SMP Techniques
• Operating System Kernel/Gluing Layers
• Solaris MC, Unixware, GLUnix, MOSIX
• Applications and Subsystems
• Applications (system management and electronic forms)
• Runtime systems (software DSM, PFS etc.)
• Resource management and scheduling (RMS) software
• Oracle Grid Engine, Platform LSF (Load Sharing Facility), PBS (Portable Batch
System), Microsoft Compute Cluster Server (CCS)
Advanced Network Services/
Communication SW
• Communication infrastructure support protocol for
• Bulk-data transport
• Streaming data
• Group communications
• Communication service provides cluster with important QoS parameters
• Latency
• Bandwidth
• Reliability
• Fault-tolerance
• Network services are designed as a hierarchical stack of protocols with a
relatively low-level communication API, providing the means to implement a
wide range of communication methodologies
• RPC
• DSM
• Stream-based and message passing interface (e.g., MPI, PVM)
Prominent Components of
Cluster Computers (VI)
• Parallel Programming Environments and Tools
• Threads (PCs, SMPs, NOW..)
• POSIX Threads
• Java Threads
• MPI (Message Passing Interface)
• Linux, Windows, on many Supercomputers
• Parametric Programming
• Software DSMs (Shmem)
• Compilers
• C/C++/Java
• Parallel programming with C++ (MIT Press book)
• RAD (rapid application development) tools
• GUI based tools for PP modeling
• Debuggers
• Performance Analysis Tools
• Visualization Tools
Prominent Components of
Cluster Computers (VII)
•Applications
• Sequential
• Parallel / Distributed (Cluster-aware app.)
• Grand Challenge applications
• Weather Forecasting
• Quantum Chemistry
• Molecular Biology Modeling
• Engineering Analysis (CAD/CAM)
• ……………….
https://hpsfoundation.github.io/