
Machine Learning based Compiler Optimization (IN43C-1528)

Arash Ashari, Ponnuswamy Sadayappan, Davide Del Vento


Department of Computer Science and Engineering, The Ohio State University (ashari,saday@cse.osu.edu)
National Center for Atmospheric Research (ddvento@ucar.edu)

Introduction

Scientists running high performance geophysical models want to achieve the fastest runtime possible for their software on any machine. To this end, they usually select the compiler's default aggressive optimization flag; however, this is often a suboptimal choice. In fact, the best matching set of optimization flags depends both on the underlying hardware and on the characteristics of the program. The fast pace of change and improvement in microprocessor technology, together with the diversity of program profiles, makes finding such a best set of optimization flags a challenging task. The problem is NP-complete, and thus it is not possible in general to find an exact solution; an approximate solution, however, may be found.

Recently, computer science researchers have applied machine-learning algorithms to both static and dynamic profiling features of computer programs, including hardware performance counters, to achieve semi-optimal transformations on different benchmarks. cTuning is one such tool: an open source framework that profiles static features of computer programs and uses machine learning algorithms to find a semi-optimal set of compiler optimization flags, often better than the default aggressive optimization such as -O3 for GCC and -fast for the PGI compiler.
Motivation

Execution time of the LU kernel compiled with:
- gcc -O2 -fsched-stalled-insns=10 -finline-limit=1197 -falign-labels=1 -fno-branch-count-reg -fbranch-target-load-optimize -fcrossjumping -fno-cx-limited-range -fno-defer-pop -fdelete-null-pointer-checks -fno-early-inlining -fforce-addr -fno-function-cse -fno-gcse-sm -fif-conversion -fno-if-conversion2 -fno-modulo-sched -fno-move-loop-invariants -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-peephole2 -freorder-blocks -fno-rerun-cse-after-loop -freschedule-modulo-scheduled-loops -fno-sched-spec-load-dangerous -fno-sched-spec-load -fno-sched-spec -fno-sched2-use-superblocks -fschedule-insns2 -fsignaling-nans -fno-single-precision-constant -fno-strength-reduce -ftree-ccp -fno-tree-ch -ftree-dce -fno-tree-fre -ftree-loop-im -ftree-loop-optimize -ftree-lrs -fno-tree-sra -ftree-vect-loop-version -fno-tree-vectorize -funroll-loops -fno-unswitch-loops → 3.9
- gcc -O3 → 4.9
- 20% improvement!

Copied from: http://www.ctuning.org/wiki/index.php/CTools:MilepostGCC
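Such a comparison can be reproduced with a small timing harness. What follows is a minimal sketch, not the measurement script behind the numbers above: it assumes a hypothetical kernel source lu.c, gcc on the PATH, and uses only an illustrative excerpt of the tuned flag set.

import subprocess
import time

# Hypothetical kernel source file; any compute-bound benchmark will do.
SRC = "lu.c"

# Illustrative excerpt of the tuned flag set above vs. the default level.
FLAG_SETS = {
    "tuned":   "-O2 -fsched-stalled-insns=10 -funroll-loops -fno-tree-vectorize",
    "default": "-O3",
}

def compile_and_time(flags, exe="./lu_bench", runs=3):
    """Compile SRC with the given flags, then return the best wall-clock time."""
    subprocess.run(["gcc", *flags.split(), SRC, "-o", exe], check=True)
    best = float("inf")
    for _ in range(runs):  # repeat runs to damp timing noise
        start = time.perf_counter()
        subprocess.run([exe], check=True)
        best = min(best, time.perf_counter() - start)
    return best

times = {name: compile_and_time(f) for name, f in FLAG_SETS.items()}
gain = 100 * (times["default"] - times["tuned"]) / times["default"]
print(times, f"improvement: {gain:.1f} %")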
Contribution

In this project, we extended cTuning by adding new kernels, more relevant to atmospheric models, to the training database. We tested the framework on a few simple models in use at NCAR: Shallow Water, EULAG and HD3D. We performed our experiments on the Janus high performance computing system at the University of Colorado Boulder. We show our results comparing the execution times obtained with iterative compilation, with the default aggressive optimization flag(s), and with flags chosen by machine learning, for both the GCC and the PGI Fortran compilers.

Method

Objective: self-tuning, i.e. automatically generating a combination of compiler flags that will minimize run time.

Three steps (a simplified sketch of the pipeline follows this list):
- Iterative Compilation: repeatedly compile a training kernel with a randomly picked set of flags, benchmark the corresponding runtime, and select the set of flags that provides the fastest run time.
- Training: learn a prediction model based on the training data set.
  - 55 static program features are used for profiling (cTuning).
  - K-means is used for the similarity calculation.
  - Training kernels: the cTuning kernels and Polybench (http://www.cse.ohio-state.edu/~pouchet/software/polybench).
- Testing: apply the learned model to an unseen program and suggest the optimal set of flags. In this step, based on the similarity between the feature sets of the training kernels and of the test program, we choose a set of flags that we expect to be optimal.
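The sketch below ties the three steps together. It is illustrative only: the flag pool, the random search, the stub benchmark(), and the plain nearest-neighbor lookup are simplified stand-ins for what cTuning/Milepost GCC actually do, and the 55-dimensional static feature vectors are assumed to come from an external extractor.

import random
import math

# Illustrative flag pool; the real search space is far larger.
FLAG_POOL = ["-funroll-loops", "-ftree-vectorize", "-fno-inline",
             "-fomit-frame-pointer", "-fschedule-insns2"]

def benchmark(kernel, flags):
    """Stand-in for a real harness that compiles `kernel` with `flags`
    and returns measured wall-clock seconds (cf. the Motivation sketch)."""
    return random.uniform(1.0, 2.0)

def iterative_compilation(kernel, trials=100):
    """Step 1: random search over flag subsets; keep the fastest combination."""
    best_flags = ["-O3"]
    best_time = benchmark(kernel, best_flags)
    for _ in range(trials):
        flags = [f for f in FLAG_POOL if random.random() < 0.5]
        t = benchmark(kernel, flags)
        if t < best_time:
            best_flags, best_time = flags, t
    return best_flags

def euclidean(u, v):
    """Distance between two static-feature vectors (55-dimensional in cTuning)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def predict_flags(test_features, training_db):
    """Step 3: suggest the best flags of the most similar training kernel.
    training_db maps kernel name -> (feature vector, best flag set)."""
    nearest = min(training_db,
                  key=lambda name: euclidean(test_features, training_db[name][0]))
    return training_db[nearest][1]

Training (step 2) here amounts to recording, for each training kernel, its feature vector together with the flags found by iterative compilation; the results below also report 10-KNN and other variants of the lookup not shown in this sketch.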
Results

HD3D is a pseudospectral three-dimensional periodic hydrodynamic/magnetohydrodynamic/Hall-MHD turbulence model.
- fftp, fftp_mod, fftp3D, pseudospec3D_mod, pseudospec3D_hd, hd3D.para

Improvement over the default run time (IC = iterative compilation):

Compiler           Default run time   IC      KNN     10-KNN   All     All-10
Mpif90-gfortran    43.5               0.1 %   1.1 %   1.3 %    1.5 %   1.8 %
Mpif90-pgfortran   42.6               0.1 %   0.2 %   0.5 %    0.5 %   5 %

[Figure: Leave-one-out cross validation on training kernels, GCC]
[Figure: Test on atmospheric models with GCC]
[Figure: Test on atmospheric models with PGI]
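The leave-one-out cross validation reported in the first figure can be sketched as follows; this is a schematic of the protocol, reusing the illustrative predict_flags and benchmark helpers from the Method sketch, not the evaluation code itself.

def leave_one_out(training_db, kernels):
    """Hold each kernel out in turn, predict its flags from the remaining
    kernels, and record the measured improvement over the default level."""
    improvements = {}
    for name in training_db:
        held_out_db = {k: v for k, v in training_db.items() if k != name}
        features, _ = training_db[name]
        flags = predict_flags(features, held_out_db)
        base = benchmark(kernels[name], ["-O3"])
        tuned = benchmark(kernels[name], flags)
        improvements[name] = 100 * (base - tuned) / base
    return improvements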
Conclusions

- Self-tuning and adaptive methods are of practical use for improving performance through compiler flag selection.
- In some cases, a performance improvement can be obtained over the maximum optimization level, i.e. -O3 (gcc) or -fast (pgi).
- Machine learning methods and the Milepost GCC framework are promising: trained over an appropriate set of features and a diverse range of kernels, Milepost GCC can improve the execution time of unseen programs by choosing an optimal set of compiler flags.
- Future opportunities:
  - Exploring the feature space in order to prune it, and to extract better features or combinations of features.
  - Using different machine learning algorithms and similarity measures.

Bibliography

1. Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, et al. Milepost GCC: Machine Learning Enabled Self-tuning Compiler. International Journal of Parallel Programming, 39:296-327, 2011.
2. Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer.
