Professional Documents
Culture Documents
Marc Snir NGDM07
Marc Snir NGDM07
Marc Snir
www.informatics.uiuc.edu
Outline
How a Supercomputer looks like in > 2010
www.informatics.uiuc.edu
2
Large NSF Funded Supercomputers
beyond 2010
One Petascale platform -- Blue Waters at NCSA, U Illinois
– Sustained performance: petaflop range
– Memory: petabyte range
– Disk: 10’s petabytes
– Archival storage: exabyte range
www.informatics.uiuc.edu
3
The Uniprocessor Crisis
Manufacturers cannot increase clock rate anymore (power
problem)
Computer architects have run out of productive ideas on
how to use more transistors to increase single thread
performance
– Diminishing return on caches
– Diminishing return on instruction-level parallelism
Increased processor performance will come only from the
increase on number of cores per chip
www.informatics.uiuc.edu
4
Average # Processors Top 500 System
3000
2500
2000
1500
1000
500
0
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
www.informatics.uiuc.edu
5
Mileage is Less than Advertised
4
3.5
nominal
IPC 2.5
1.5
0.5
0
LCM Eclat FP-Growth
PC Balance
(word operands from memory per flop)
0.2
0.15
0.1
0.05
0
0 5000 10000 15000 20000 25000 30000
www.informatics.uiuc.edu
8
Solutions to the Memory Wall
Caching and locality
– Need algorithms with good locality
Split communication
– Memory prefetch (local memory)
– Put/get (remote memory)
– Need programmed communication (not necessarily
message-passing)
www.informatics.uiuc.edu
9
Load Balancing
Problem: Amount of computation in DM kernels heavily
data dependent -- work partitioning results in load
imbalances
Hard solution: develop good work predictors and do
explicit, static load balancing
Easy solution: use system with task virtualization and
dynamic task migration
– E.g., AMPI (Kale, http://charm.cs.uiuc.edu/) -- scalable,
negligible (often negative) overheads
– Overhead of task migration is few seconds, at worse
Parallel file system is shared all
– Task virtualization essential for modularity and
ease of programming
www.informatics.uiuc.edu
10
Code Tuning
Is essential when using a Petascale system
– 1 hour = $5K - $10K
Is data dependent (more so with Datamining than with
many applications)
Is platform dependent
www.informatics.uiuc.edu
11
Relative Performance of Frequent Item Mining
Codes is Input Dependent
0
normalizedchess
execution time
pumsb bms-wv-1 accidents webdocs
(s=0.18) (s=0.45) (s=0.0005) (s=0.16) (s=0.1)
3. Automatic (compiler,
tuning runtime)
www.informatics.uiuc.edu
13
Algorithm Selection is a Classification
Problem
•Can be solved using
supervised ML
Input set
•Smart part: choice of
Generated generator Input feature vector that can
datasets
be computed fast and
“works”
Empirical Alg.
evaluation pool Feature
Labeled
SVM Platform specific predict Run
SVM model Alg.
features learning Alg.
www.informatics.uiuc.edu
14
Results: Average execution time
30
25
20.63
20 17.4
15
10.51
9.34
10
5
0
Average execution time
www.informatics.uiuc.edu
16
Implementation Selection
www.informatics.uiuc.edu
17
Speedups of LCM
Intel AMD
2.2 2.2
all
all
all
2 2
1.2 1.2
1 1
0.8 0.8
DS1 (77) DS2 (169) DS3 (90) DS4 (36)
DS1 (73) DS2 (159) DS3 (93) DS4 (35)
www.informatics.uiuc.edu
18
Prediction results – LCM
119.6
70
87.7
65
60
55
optimal
50 predicted
original
45
Average execution time in seconds
tile
Number of times that each 40 all
code version is the fastest Average execution time
Prediction close to “optimal” (oracle)
Prediction overhead is negligible
www.informatics.uiuc.edu
20
Questions?
www.informatics.uiuc.edu
21
Similarity definition
“Similarity”: how similar transactions are to each other
T1 1 1 0 1 0 1
T2 0 1 0 0 1 1
Difference = 3,
www.informatics.uiuc.edu
22
Similarity feature definition
Normalized hamming distance defines pair-wise distance, but we
need a global measure of similarity among all transactions.
www.informatics.uiuc.edu
23
Prediction results on real-world datasets
www.informatics.uiuc.edu
24
Prediction results on real-world datasets
www.informatics.uiuc.edu
25
Using synthetic data for training
0. 38%
Generator
– Widely used in
data mining
research 99. 53%
Synthetic datasets
Problem:
– The generated
11.11%
dataset is not
representative of 50%
Real-world datasets
www.informatics.uiuc.edu
26
Item frequency curve
www.informatics.uiuc.edu
27
Using modified IBM generator to produce
algorithm variability
20.03% 11.11%
50%
56.58%
23.39% 38.89%
www.informatics.uiuc.edu
28
Lexicographic ordering of transactions
Preprocess original database by reordering transactions
in lexicographic order
– Alphabet: items in descending frequency order
Improves locality of accesses (LCM & FP_Growth);
reduces computation (Eclat)
Overhead of lexicographic ordering
www.informatics.uiuc.edu
29
Lexicographic ordering in LCM
c f a b d e
1 c f a
Spatial locality of traversal 1 1 1 2 4 4
is improved (fewer jumps) 2 c f b
2 2 3 5 5 5
– Locality improved for 3 c f a
most frequent items 3 3 5
4 d e
– Order mostly preserved 4 5
for projected databases
5 c f a b d e
5
– ordering overhead
amortized over multiple
traversals c f a b d e
1 c f a
1 1 1 3 3 3
2 c f a
2 2 2 4 5 5
3 c f a b d e
3 3 3
4 c f b
4 4
5 d e
www.informatics.uiuc.edu
30
Lexicographic ordering in Eclat
2 1 1 1 0 0 0 1 1 1 0 0 0
3 0 0 0 0 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1 0 1 0 0
0 0 0 0 1 1
Mark last 1
c
Already
f
in the
a
cache
b
(2)
d
e
(1)
Pseudo node
www.informatics.uiuc.edu
32
Lexicographic ordering
– project() in FP-Growth
Access pattern: From an intermediate node to root
More (parent, child) pairs are contiguous in the memory – better
spatial locality
c
d c
f
f d Assuming nodes allocated
a e
b consecutively are
a e contiguous in memory.
b
(2)
(1) b
(1)
b (2) (1) Pseudo node
d
d (parent, child) are
e
contiguous in memory
e (1) (1)
www.informatics.uiuc.edu
33
Wave-front prefetch
Array of short linked lists
2 iterations between
Prefetch pointers from prefetch and use
different linked-lists in one I3:
iteration
I4:
Hides memory latency
I5:
Increases register pressure
www.informatics.uiuc.edu
34
Tiling (LCM)
www.informatics.uiuc.edu 35
35
Programming patterns applied
www.informatics.uiuc.edu
36