● Course Convenor (Dr. Giuseppe Barca) and 2nd Lecturer (Dr. Alberto F. Martin)
● Course e-mail address: comp4300@anu.edu.au (though Piazza is preferred)
■ please make your subject line meaningful!
(e.g. ass1: trouble meeting deadline)
● Communication with teaching staff mostly handled via Piazza
Welcome to COMP4300/8300 Parallel Systems

● Course webpage at https://comp.anu.edu.au/courses/comp4300
(we will use Wattle only for lecture recordings and marking!)
● course schedule
■ Labs start in week 2! Register before the end of week 1 via MyTimeTable
■ Account at NCI/Gadi needed! (apply for user accounts instructions)
● course assessment
■ 2x assignments 25% each, MSE 10% (redeemable), Final Exam 40%
● Assignments and exams submission via GitLab
■ We assume familiarity with Git/GitLab
■ See the following FAQ for details
COMP4300/8300 L1: Introduction to Parallel Systems, 2023
Today’s lecture main take-away message

To extract all the computational power from today’s (even commodity) microprocessors,
there is no alternative but to EXPLICITLY manage parallelism in our programs!

Lecture Overview
The idea: Split program up and run parts simultaneously on different processors

● split your computation into tasks that can be executed simultaneously
● on p computers, the time to solution should (ideally!) be reduced by a factor of p
● parallel programming: the art of writing the parallel code
● parallel computer: the hardware on which we run our parallel code

This course will discuss both!

Motivation:

● Speed, Speed, Speed · · · at a cost-effective price
■ if we didn’t want it to go faster we wouldn’t bother with the hassles of parallel programming!
■ no point in taking 1 week to predict tomorrow’s weather!
■ simulations that take months are NOT useful in a design environment
● reduce the time to solution to acceptable levels (e.g., under real-time constraints)
● tackle larger-scale (i.e., with higher computational demands) problems
● keep power consumption and heat dissipation under control

Beyond raw compute power, other motivations may include:

● enabling more accurate simulations in the same time (finer grids)
● providing access to huge aggregate memories (e.g., an atmosphere model requiring ≥ 8 Intel nodes to run on)
● providing more and/or better input/output capacity
● hiding latency
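The task-splitting idea above can be sketched in a few lines of Python. This is a minimal illustration, not course material: `multiprocessing.Pool` and the summation workload are illustrative choices.

```python
# A minimal sketch of the idea: split a computation into p independent
# tasks and run them simultaneously on p worker processes.
from multiprocessing import Pool

def partial_sum(bounds):
    """One independent task: sum the integers in [lo, hi)."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, p):
    """Split [0, n) into p chunks and sum them with p worker processes."""
    step = n // p
    chunks = [(i * step, (i + 1) * step if i < p - 1 else n)
              for i in range(p)]
    with Pool(processes=p) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    assert parallel_sum(1_000_000, 4) == sum(range(1_000_000))
```

With p workers the wall-clock time for the map step should, ideally, drop by a factor of p; in practice process start-up and the final reduction limit the gain for small problems.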
Scales of Parallelism

Sample Application Areas: Grand Challenge Problems
■ cells are 1 × 1 × 1 mile to a height of 10 miles ⇒ 5 × 10^8 cells
■ 200 floating point operations (flops) to update each cell ⇒ 10^11 flops per time step
● on a 1 Tflop/s computer, a multi-day forecast (on the order of 10^4 time steps) would take on the order of 10 minutes

If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens? — Seymour Cray, 1986

● 90’s–00’s: shared and distributed memory parallel computers used for servers (small-scale) and supercomputers (large-scale)
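The flop-count arithmetic in the weather example can be reproduced directly. The cell and per-cell figures come from the slide; the time-step count is an assumption (roughly a week-long forecast at minute-scale steps), not a figure from the slide.

```python
# Flop-count arithmetic for the global weather forecasting example.
cells = 5e8               # 1x1x1-mile cells up to a height of 10 miles
flops_per_cell = 200      # floating point operations per cell update
flops_per_step = cells * flops_per_cell

assert flops_per_step == 1e11           # 10^11 flops per time step

steps = 1e4                             # assumed: ~7-day forecast, 1-min steps
total_flops = flops_per_step * steps    # ~1e15 flops for the full forecast
minutes_at_1_tflops = total_flops / 1e12 / 60

assert 10 <= minutes_at_1_tflops <= 20  # minutes, not days, at 1 Tflop/s
```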
Moore’s Law & Dennard Scaling

Two “laws” underpin the exponential performance increase of microprocessors from the 70’s.

Moore’s Law and Dennard Scaling Undermine Parallel Computing

● parallel computing looked promising in the 90’s, but many companies failed due to
the ‘free lunch’ from the combination of Moore’s Law and Dennard Scaling
■ Why parallelize my codes? Just wait 2 years, and processors will be 2x faster!

On several recent occasions, I have been asked whether parallel computing will soon be
relegated to the trash heap . . . Ken Kennedy, CRPC Director, 1994

[Figure: attitudes towards parallelism, from ‘Luddites’ to ‘Fanatics’, plotted against degree of parallelism (P)]
Chip Size and Clock Speed

The ‘free lunch’ continued into the early 2000’s, allowing faster (higher clock rates) and
more complex (serial) processors: systems so fast that they could not communicate
across the chip in a single clock cycle! (full paper available here)

End of Dennard Scaling and Other Free Lunch Ingredients
● even with Dennard Scaling, we saw an exponential increase in the power density of
chips 1985–2000 (superlinear scaling of clock rate)
● then Dennard Scaling ceased around 2005! (power becomes a huge concern)
● instruction-level parallelism (ILP) also reached its limits
The Multicore Revolution

● vendors began putting multiple CPUs (cores) on a chip, and stopped increasing
clock frequency
■ 2004: Sun releases dual-core Sparc IV, heralding the start of the multicore era
● the (dynamic) power of a chip is given by P = Q C V^2 f, where V is the voltage, Q is the
number of transistors, C is a transistor’s capacitance and f is the clock frequency
■ but, on a given chip, f ∝ V, so P ∝ Q f^3!
● (ideal) parallel performance for p cores is given by R = p f, but Q ∝ p
■ double p ⇒ double R, but also double P
■ double p, halve f ⇒ maintain R, but quarter P!
● doubling the number of cores is better than doubling clock speed!
● Moore’s Law (increase Q at constant QC) is expected to continue (till ???), so we
can gain in R at constant P
● cores can be much simpler (e.g., exploit less ILP), but there can be many
● chip design and testing costs are significantly reduced
● BUT ... parallelism now must be exposed by software: The Free Lunch Is Over!

Where are we now?

We find ourselves in a chip development era characterized by:

● Transistors galore (this cannot be sustained forever, though, due to physical limits)
● Severe power limitations (maximizing the utility of all transistors is no longer possible)
● Customization versus generalization

We are now seeing:

● (customized) accelerators, generally manycore with low clock frequency
■ e.g. Graphics Processing Units (GPUs), customized for fast numerical calculations
● ‘dark silicon’: need to turn off parts of chip to reduce power
● hardware-software codesign: speed via specialization
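The power/performance arithmetic above can be checked directly: with P ∝ Q f^3 (since f ∝ V and Q ∝ p) and R = p f, doubling the core count while halving the clock maintains performance but quarters power. A sketch, with all quantities as ratios to a baseline single-core chip:

```python
# Relative dynamic power and ideal performance, from P ∝ p * f**3
# (using f ∝ V and Q ∝ p) and R = p * f. All values are ratios to a
# baseline chip with p = 1 core at clock f = 1.

def rel_power(p, f):
    """Dynamic power relative to the baseline: P ∝ p * f**3."""
    return p * f**3

def rel_perf(p, f):
    """Ideal parallel performance relative to the baseline: R = p * f."""
    return p * f

# double p: performance doubles, but so does power
assert rel_perf(2, 1.0) == 2 * rel_perf(1, 1.0)
assert rel_power(2, 1.0) == 2 * rel_power(1, 1.0)

# double p, halve f: performance maintained, power quartered
assert rel_perf(2, 0.5) == rel_perf(1, 1.0)
assert rel_power(2, 0.5) == rel_power(1, 1.0) / 4
```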
The Top 500 Most Powerful Computers: November 2022

The Top 500 provides an interesting view of these trends (click here for full list)

Top500: Performance Trends
The Top500: Multicore Emergence

Exascale and Beyond: Challenges and Opportunities
● Amdahl’s Law: Let f be the fraction of a computation that cannot be split into parallel
tasks. Then the maximum speedup achievable for arbitrarily large p (processors) is 1/f !!!
■ Let t_s and t_p denote the serial and parallel execution times
■ t_p = f t_s + (1 − f) t_s / p
■ S_p = t_s / t_p = p / (f p + (1 − f)) (Speedup)
■ lim_{p→∞} S_p = 1/f
■ e.g., if f = 0.05, then 1/f = 20
● counterargument (Gustafson’s Law): 1 − f is not fixed, but increases with the
data/problem size N

● Time Moore: Exploiting Moore’s Law From The Perspective of Time. IEEE
Solid-State Circuits Magazine, 11 (1), 2019. Publisher: IEEE.

Only for intrepid readers:

● Computing Performance: Game Over or Next Level?, full report from the US National
Research Council, National Academies Press, 2011. Bonus: reprints of Moore’s
and Dennard’s seminal papers.
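Amdahl's bound can be checked numerically; a minimal sketch of the speedup formula S_p = p / (f p + (1 − f)) stated above:

```python
# Amdahl's Law: maximum speedup on p processors when a fraction f of
# the computation cannot be parallelized.

def amdahl_speedup(f, p):
    """S_p = p / (f*p + (1 - f)); f is the serial fraction."""
    return p / (f * p + (1 - f))

# fully parallel code (f = 0) gives ideal speedup p
assert amdahl_speedup(0.0, 8) == 8

# as p grows, the speedup approaches 1/f; for f = 0.05 the limit is 20
assert abs(amdahl_speedup(0.05, 10**9) - 20) < 1e-3
```

Even a seemingly small serial fraction caps the achievable speedup, which is exactly why the slide stresses exposing as much parallelism as possible; Gustafson's counterargument is that f itself shrinks as the problem size grows.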
Recommended Texts + extra materials