
Architectures and Cluster Software

Ashish Ranjan
07-12-2023
Application Areas
How to Run Applications Faster?
• There are three ways to improve performance:
• Work harder
• Work smarter
• Get help
• Computer analogy:
• Work harder: use faster hardware
• Work smarter: use optimized algorithms and techniques to solve computational tasks
• Get help: use multiple computers to solve a particular task
Latest Processor
Characterizing Parallel Computing Systems
• Clustering
• How the system is connected
• Namespace
• How far things are visible across the nodes
• Parallelism
• Reflects the forms of action concurrency that the hardware architecture can exploit and support
• Latency and locality management
• Defines the mechanisms and methods incorporated to tolerate latency effects. Caches, pipelining, prefetching, multithreading, and message-driven computing are among the possible mechanisms that avoid or hide access latencies
Scalable (Parallel) Computer Architectures
• Massively Parallel Processors (MPP)
• Symmetric Multiprocessors (SMP)
• Cache-Coherent Non-Uniform Memory Access (CC-NUMA)
• Clusters
• Distributed Systems – Grids/P2P
Scalable Parallel Computer Architectures
• MPP
• A large parallel processing system with a shared-nothing architecture
• Consists of several hundred nodes connected by a high-speed interconnection network/switch
• Each node has its own main memory and one or more processors
• Each node runs a separate copy of the OS

• SMP
• 2-64 processors
• Shared-everything architecture
• All processors share all the global resources available
• Single copy of the OS runs on these systems
Scalable Parallel Computer Architectures
• CC-NUMA
• A scalable multiprocessor system having a cache-coherent non-uniform memory access architecture
• Every processor has a global view of all of the memory
• Clusters
• A collection of workstations/PCs that are interconnected by a high-speed network
• Work as an integrated collection of resources
• Have a single system image spanning all nodes
• Distributed systems
• Conventional networks of independent computers
• Have multiple system images, as each node runs its own OS
• The individual machines could be combinations of MPPs, SMPs, clusters, and individual computers
Rise and Fall of Computer Architectures
• Vector Computers (VC) - proprietary systems:
• provided the breakthrough needed for the emergence of computational science, but they were only a partial answer
• Massively Parallel Processors (MPP) - proprietary systems:
• high cost and a low performance/price ratio
• Symmetric Multiprocessors (SMP):
• suffer from scalability limitations
• Distributed Systems:
• difficult to use and hard to extract parallel performance from
• Clusters - gaining popularity:
• High Performance Computing - Commodity Supercomputing
• High Availability Computing - Mission Critical Applications
Sourcebook of Parallel Computing (Dongarra et al.):

We note that the term cluster can be applied both broadly (any system
built with a significant number of commodity components) or narrowly
(only commodity components and open-source software). In fact,
there is no precise definition of a cluster. Some of the issues that are
used to argue that a system is a massively parallel processor (MPP)
instead of a cluster include proprietary interconnects (...), particularly
ones designed for a specific parallel computer, and special software
that treats the entire system as a single machine, particularly for the
system administrators. Clusters may be built from personal computers
or workstations (either single processors or symmetric multiprocessors
(SMPs)) and may run either open-source or proprietary operating
systems.
Top500 Computers Architecture
Server OEM Share
Computer Food Chain: Causing the Demise of Specialized Systems

• Demise of mainframes, supercomputers, & MPPs


Towards Clusters

The promise of supercomputing to the average user

Technology Trends...
• Performance of PC/workstation components has almost reached that of components used in supercomputers
• Microprocessors (50% to 100% per year)
• Networks (Gigabit SANs)
• Operating systems (Linux, ...)
• Programming environments (MPI, ...)
• Applications (.edu, .com, .org, .net, .shop, .bank)
• The rate of performance improvement of commodity systems is much more rapid than that of specialized systems
Towards Commodity Cluster Computing
• Since the early 1990s, there has been an increasing trend to move away from expensive, specialized proprietary parallel supercomputers towards clusters of computers (PCs, workstations)
• From specialized traditional supercomputing platforms to cheaper, general-purpose systems consisting of loosely coupled components built from single- or multiprocessor PCs or workstations
• Linking together two or more computers to jointly solve computational problems
History: Clustering of Computers for Collective Computing

[Timeline figure: clustering of computers for collective computing, evolving from 1960 through the 1980s, 1990, 1995+, and 2000+ toward PDAs and clusters]

What is a Cluster?
• A cluster is a type of parallel and distributed processing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.
• A node:
• a single or multiprocessor system with memory, I/O facilities, and an OS
• A cluster:
• generally two or more computers (nodes) connected together, either in a single cabinet or physically separated and connected via a LAN
• appears as a single system to users and applications
• provides a cost-effective way to gain features and benefits
Cluster Architecture

[Architecture diagram: sequential applications and parallel applications (via a parallel programming environment) run on top of the cluster middleware, which provides a single system image and availability infrastructure. The middleware spans multiple PCs/workstations, each with its own communications software and network interface hardware, all connected by a cluster interconnection network/switch.]


So What’s So Different about Clusters?
• Commodity Parts?
• Communications Packaging?
• Incremental Scalability?
• Independent Failure?
• Intelligent Network Interfaces?
• Complete System on every node
• virtual memory
• scheduler
• files
•…
• Nodes can be used individually or jointly...
Windows of Opportunities
• Parallel Processing
• Use multiple processors to build MPP/DSM-like systems for parallel
computing
• Network RAM
• Use memory associated with each workstation as aggregate DRAM
cache
• Software RAID
• Redundant Array of Inexpensive/Independent Disks
• Use the arrays of workstation disks to provide cheap, highly available
and scalable file storage
• Possible to provide parallel I/O support to applications
• Multipath Communication
• Use multiple networks for parallel data transfer between nodes
Cluster Design Issues
• Enhanced Performance (performance @ low cost)
• Enhanced Availability (failure management)
• Single System Image (look-and-feel of one system)
• Size Scalability (physical & application)
• Fast Communication (networks & protocols)
• Load Balancing (CPU, Net, Memory, Disk)
• Security and Encryption (clusters of clusters)
• Distributed Environment (Social issues)
• Manageability (admin. and control)
• Programmability (simple API if required)
• Applicability (cluster-aware and non-aware app.)
Scalability vs. Single System Image

Common Cluster Modes
• High Performance (dedicated).
• High Throughput (idle cycle harvesting).
• High Availability (fail-over).

• A Unified System – HP and HA within the same cluster


High Performance Cluster (dedicated mode)
High Throughput Cluster (idle resource harvesting)

[Figure: a shared pool of computing resources (processors, memory, disks) joined by an interconnect. High throughput mode guarantees at least one workstation to many individuals (when active); high performance mode delivers a large percentage of the collective resources to a few individuals at any one time.]
High Availability Clusters
HA and HP in the Same Cluster

• Best of both worlds (the world is heading towards this configuration)
Cluster Components
Prominent Components of Cluster Computers (I)

•Multiple High-Performance Computers
• PCs
• Workstations
• SMPs (CLUMPS)
• Distributed HPC Systems leading to Grid Computing
System CPUs
• Processors
• x86-class Processors
• Intel Xeon SPR
• AMD
• ARM
• Accelerators
• Nvidia H100
• AMD MI300
• HABANA GAUDI
• Intel PVC
System Disk
• Disk and I/O
• Overall improvement in disk access time has been less than 10% per year
• Amdahl’s law
• Speed-up obtained from faster processors is limited by the slowest system component (see the formula after this list)
• Parallel I/O
• Carry out I/O operations in parallel, supported by a parallel file system based on hardware or software RAID
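For reference, the statement of Amdahl's law invoked above: if a fraction $p$ of a program's execution time is sped up by a factor $s$ (say, by faster processors) while the remaining fraction $1 - p$ (say, disk I/O) is unchanged, the overall speed-up is

$$S = \frac{1}{(1 - p) + p/s}$$

Even with infinitely fast processors ($s \to \infty$), a job that spends 10% of its time on disk I/O can never run more than 10 times faster, which is why slow storage caps cluster performance.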
Commodity Components for Clusters (II): Operating Systems
• Operating Systems
• Two fundamental services for users:
• make the computer hardware easier to use
• create a virtual machine that differs markedly from the real machine
• share hardware resources among users
• Processor: multitasking
• A newer concept in OS services:
• support multiple threads of control within a process itself
• parallelism within a process
• multithreading (a minimal Pthreads sketch follows this list)
• the POSIX thread (Pthreads) interface is a standard programming environment
• Trend:
• Modularity – MS Windows, IBM OS/2
• Microkernel – provides only essential OS services
• high-level abstraction for OS portability
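To make the multithreading point concrete, here is a minimal sketch using the POSIX thread interface mentioned above (standard Pthreads calls only; the four-thread count is an arbitrary choice for illustration):

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread of control runs this function within the same process. */
static void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];

    /* Spawn four threads: parallelism within a single process. */
    for (long i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);

    /* Wait for all threads to complete before exiting. */
    for (long i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```

Compile with `gcc -pthread`; on an SMP node the threads can run on different processors concurrently.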
Prominent Components of Cluster Computers
•State of the art Operating Systems
• Linux (MOSIX, Beowulf, and many more)
• Windows HPC (HPC2N – Umea University)
• SUN Solaris (Berkeley NOW, C-DAC PARAM)
• IBM AIX (IBM SP2)
• HP UX (Illinois - PANDA)
• Mach (Microkernel based OS) (CMU)
• Cluster Operating Systems (Solaris MC, SCO Unixware, MOSIX (academic project))
• OS gluing layers (Berkeley Glunix)
Operating Systems Used in Top500 Computers

[Chart: operating-system share among Top500 systems; labels include AIX]
Prominent Components of Cluster Computers (III)
• High Performance Networks/Switches
• RDMA-enabled Ethernet
• InfiniBand
Interconnect Share
Prominent Components of Cluster Computers (IV)
• Cluster Middleware
• Single System Image (SSI)
• System Availability (SA) Infrastructure
• Hardware
• DEC Memory Channel, DSM (Alewife, DASH), SMP Techniques
• Operating System Kernel/Gluing Layers
• Solaris MC, Unixware, GLUnix, MOSIX
• Applications and Subsystems
• Applications (system management and electronic forms)
• Runtime systems (software DSM, PFS etc.)
• Resource management and scheduling (RMS) software
• Oracle Grid Engine, Platform LSF (Load Sharing Facility), PBS (Portable Batch System), Microsoft Compute Cluster Server (CCS)
Advanced Network Services / Communication SW
• Communication infrastructure supports protocols for:
• bulk-data transport
• streaming data
• group communications
• The communication service provides the cluster with important QoS parameters:
• latency
• bandwidth
• reliability
• fault-tolerance
• Network services are designed as a hierarchical stack of protocols with a relatively low-level communication API, providing the means to implement a wide range of communication methodologies:
• RPC
• DSM
• stream-based and message-passing interfaces (e.g., MPI, PVM); a minimal message-passing sketch follows this list
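As an illustration of the message-passing style listed above, here is a minimal MPI point-to-point sketch (standard MPI calls only; the integer payload is an arbitrary example value):

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal point-to-point message passing: rank 0 sends an
 * integer to rank 1 across the cluster interconnect. */
int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;  /* arbitrary test value */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```

Build with `mpicc` and launch with `mpirun -np 2`; the MPI runtime hides the underlying transport (RDMA-enabled Ethernet, InfiniBand, etc.) from the application.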
Prominent Components of Cluster Computers (VI)
• Parallel Programming Environments and Tools
• Threads (PCs, SMPs, NOW..)
• POSIX Threads
• Java Threads
• MPI (Message Passing Interface); a minimal MPI program follows this list
• Linux, Windows, and many supercomputers
• Parametric Programming
• Software DSMs (Shmem)
• Compilers
• C/C++/Java
• Parallel programming with C++ (MIT Press book)
• RAD (rapid application development) tools
• GUI based tools for PP modeling
• Debuggers
• Performance Analysis Tools
• Visualization Tools
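A minimal sketch of the canonical first MPI program, for readers who have not seen the interface (standard MPI calls only):

```c
#include <mpi.h>
#include <stdio.h>

/* Every process in the job reports its rank and the job size. */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```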
Prominent Components of Cluster Computers (VII)
•Applications
• Sequential
• Parallel / Distributed (Cluster-aware app.)
• Grand Challenge applications
• Weather Forecasting
• Quantum Chemistry
• Molecular Biology Modeling
• Engineering Analysis (CAD/CAM)
• ……………….

• PDBs, web servers, data-mining


Key Operational Benefits of Clustering
•High Performance
•Expandability and Scalability
•High Throughput
•High Availability
Clusters Classification (I)
•Application Target
•High Performance (HP) Clusters
• Grand Challenge Applications
•High Availability (HA) Clusters
• Mission Critical applications
Clusters Classification (II)
•Node Ownership
•Dedicated Clusters
•Non-dedicated clusters
• Adaptive parallel computing
• Communal multiprocessing
Clusters Classification (III)
•Node Hardware
•Clusters of PCs (CoPs)
• Piles of PCs (PoPs)
•Clusters of Workstations (COWs)
•Clusters of SMPs (CLUMPs)
Clusters Classification (IV)
•Node Operating System
• Linux Clusters (e.g., Beowulf)
• Solaris Clusters (e.g., Berkeley NOW)
• AIX Clusters (e.g., IBM SP2)
• SCO/Compaq Clusters (Unixware)
• Digital VMS Clusters
• HP-UX clusters
• Windows HPC clusters
Clusters Classification (V)
•Node Configuration
•Homogeneous Clusters
• All nodes have similar architectures and run the same OS
•Heterogeneous Clusters
• Nodes have different architectures and run different OSs
Clusters Classification (VI)
• Levels of Clustering
• Group Clusters (#nodes: 2-99)
• Nodes are connected by a SAN such as Myrinet
• Departmental Clusters (#nodes: 10s to 100s)
• Organizational Clusters (#nodes: many 100s)
• National Metacomputers (WAN/Internet-based)
• International Metacomputers (Internet-based, #nodes: 1000s to many millions)
• Grid Computing
• Web-based Computing
• Peer-to-Peer Computing
Cluster Applications
Cluster Applications
• Numerous Scientific & engineering Apps.
• Business Applications:
• E-commerce Applications (Amazon, eBay);
• Database Applications (Oracle on clusters).
• Internet Applications:
• ASPs (Application Service Providers);
• Computing Portals;
• E-commerce and E-business.
• Mission Critical Applications:
• command-and-control systems, banks, nuclear reactor control, Star Wars, and handling life-threatening situations.
Cluster of SMPs (CLUMPS)
• Clusters of multiprocessors (CLUMPS)
• Expected to be the supercomputers of the future
• Multiple SMPs with several network interfaces can be connected using high-performance networks
• Two advantages:
• benefit from high-performance, easy-to-use and easy-to-program SMP systems with a small number of CPUs
• clusters can be set up with moderate effort, resulting in easier administration and better support for data locality inside a node
Many types of Clusters
• High Performance Clusters
• Linux Cluster; 1000 nodes; parallel programs; MPI
• Load-leveling Clusters
• Move processes around to borrow cycles (e.g., MOSIX)
• Web-Service Clusters
• load-level TCP connections; replicate data
• Storage Clusters
• GFS; parallel filesystems; same view of data from each node
• Database Clusters
• Oracle Parallel Server;
• High Availability Clusters
• ServiceGuard, Lifekeeper, Failsafe, heartbeat, failover clusters
Summary: Cluster Advantage
• The price/performance ratio of clusters is low compared with that of dedicated parallel supercomputers.
• Incremental growth that often matches demand patterns.
• The provision of a multipurpose system:
• scientific, commercial, and Internet applications
• Clusters have become mainstream enterprise computing systems:
• in the Top500 list, over 50% of systems (in 2003) and over 80% (since 2008) are based on clusters, and many of them are deployed in industry
• in the recent list, most of them are clusters!
Cluster Software Projects

https://hpsfoundation.github.io/
