
Automatic Static Feature Generation for Compiler Optimization Problems

Abid M. Malik
Department of Computer Science, Rice University, Houston, TX
Abid.M.Malik@rice.edu

Abstract. Modern compilers have many optimization passes which help to generate better binary code for a given program. These optimizations are NP-hard, so people use different heuristics to get a near-optimal solution. These heuristics are designed by a compiler expert after examining sample programs, which is a challenging task. Recently, people have used machine learning techniques instead of heuristics for compiler optimizations. Machine learning techniques have not only eliminated the human effort but have also out-performed human-made heuristics. However, the human effort has now moved from creating heuristics to selecting good features. Selecting the right set of features is important for machine learning techniques, since no machine learning tool will work well with poorly chosen features. This paper introduces a novel approach to generate features for machine learning for compiler optimization problems without any human involvement.

1 Introduction

Modern compilers provide a large number of optimization passes to get better binary code for a target machine. Almost all of these optimizations are NP-hard [14]; there are no deterministic algorithms that obtain an optimal solution for them. People use heuristics to find a near-optimal solution for these optimization problems. These heuristics are created by a compiler expert by observing various program applications. It is a challenging task and requires many man-hours. Even if one is able to fine-tune a heuristic for a given architecture, when a new processor comes to the market the compiler expert has to repeat the whole process again to fine-tune the heuristic for the new architecture. A company's greatest interest is to shorten this process in order to market the new product as soon as possible. Recently, people have used machine learning techniques to reduce this time cycle. The ultimate goal of using machine learning techniques is to learn a heuristic for a new environment at the press of a button. Also, in practice, machine learning techniques have given better performance than their human-created counterparts [16]. In a typical machine learning environment for a compiler optimization problem, a number of programs are transformed into input vectors. For each program, we determine a desired output vector. The input vectors along with the output vectors form a training set.


Fig. 1. Typical compilation path in a compiler for modern processors

A machine learning tool then tries to construct a model which maps input vectors to output vectors. An input vector is a set of features which captures the characteristics of a program. An important question now is: what are the best features for a given machine learning approach? A compiler expert has to decide on a set of features keeping in view the target compiler optimization and architecture. This is a difficult task. Every machine learning tool is bounded by the quality of its input features. Hence, selecting good features is an important research area in the field of applying machine learning techniques to various compiler optimization problems. A typical compilation path in a modern compiler is shown in Figure 1. A compiler takes a source program written in a high-level language as input. It performs lexical analysis (scanning), syntax analysis (parsing) and semantic analysis (type checking) in the front-end. It converts a given program into an intermediate representation (IR). The IR is a machine- and language-independent version of the original source code. The IR is used to create various data structures like the abstract syntax tree (AST), control flow graph (CFG), data dependency graph (DDG), etc. These data structures are taken by the back-end to apply various optimizations. They contain a wealth of information about programs, and people use them to create features for machine learning techniques. Although machine learning relieves a compiler expert of the task of building a heuristic, it places on him another challenging task: reducing the wealth of information to a small set of features. This paper introduces an approach for automatically creating a feature set without the involvement of a compiler expert, by considering a compiler optimization problem as a classification problem. The rest of the paper is organized as follows. Section 2 gives the related work in this area. Section 3 discusses our approach. Section 4 describes the experimental setup for the work. Section 5 presents the results. Section 6 concludes our contributions and discusses our future line of action in this area.

2 Related Work

Among the first researchers to incorporate machine learning into compilers for optimization problems were McGovern et al. [13], who used reinforcement learning for the scheduling of straight-line code. Cavazos et al. [5] extended this idea by learning whether or not to apply instruction scheduling. Stephenson et al. [17]


looked at tuning the unroll factor using supervised classification techniques such as K-nearest neighbor (KNN) and support vector machines (SVM). Subsequent researchers have considered predictive models using machine learning techniques to automatically tune a compiler for an existing microarchitecture. These models use program features to focus the search of the optimization space on promising areas. Agakov et al. [2] use static code features to characterize a given program, while Cavazos et al. [4] investigate the use of hardware performance counters. Leather et al. [16] give a grammar from which to select the features that represent a program. Dubach et al. [8] use hardware features for selecting the best compiler options for a given architecture. The work by Ganapathi et al. [10] applies machine learning to compiler optimization problems for multi-core architectures. Malik [11] tries to capture the spatial information of DDGs for machine learning techniques. Yuki et al. [19] give static features for machine learning for the tile size selection problem. Recently, the MILEPOST-GCC framework has been developed by IBM Haifa to drive the compiler optimization process based on machine learning. The framework gives features which are very comprehensive in terms of capturing all important characteristics of a given program. Interested readers can consult the work by Fursin et al. [9] for a complete list of features. In this paper, we compare the performance of our approach against the MILEPOST-GCC framework.

3 Our Approach

Previous work in this area needs a lot of compiler expert involvement in crafting or selecting the best features for a machine learning technique for a given compiler problem. The main contribution of this work is an approach that does not require this involvement at any stage. Figure 2(a) is the IR for the C code in Figure 2(b). Due to space constraints, we are not reproducing the whole IR for the code¹; we show only the part of the IR that is sufficient to establish our point of view. Figure 2(a) has two types of branch instructions. The branch instruction br label register takes one argument. The branch instruction br i1 register label register label register takes three arguments. The branch instruction with one argument is an unconditional branch instruction; the branch instruction with three arguments is a conditional branch instruction. The two branch instructions change the control flow of a program in different ways. The unconditional branch changes the control of a program without any testing. The conditional branch first performs the test in its first argument and then changes the control of the program to the second or third argument depending upon the outcome of the test. These instructions are categorized as one class in the previous work. However, the two branch instructions capture different semantics of a given program. An unconditional branch instruction may represent an endless loop, while a conditional branch instruction shows a loop with limited iterations in a program. If one develops a feature set by considering such instructions as one class, it will be hard for a machine learning tool to differentiate between the semantics of different programs.
¹ Interested readers can consult the GCC documentation [1] to reproduce it.


Fig. 2. (a) IR code for the C code, produced by the GCC compiler. Most parts of the IR have been replaced by dotted lines due to space constraints. (b) C code program.

With this approach, there is a possibility that two programs have different semantic meanings but similar feature vectors. For the approach in this paper, we borrow an idea from the feature selection approaches used in the text classification problem. A text document consists of words which capture its semantics. In the text classification problem, a text document is represented by the set of words considered most helpful in classifying the document with respect to a given class. Many statistical approaches are used to select the best features [20]. In our approach, we treat the IR of a program as a text document. Instead of collapsing certain types of instructions into one instruction class in order to reduce the dimensionality of the search space, we treat each instruction type at the IR level as one feature. A compiler has a limited number of instruction types at the IR level. In this work, we use the GCC compiler [1], which has 200 instruction types. One could use all 200 instruction types as the feature set for a machine learning tool for any optimization problem. However, this may lead to the curse of dimensionality, which decreases the performance of a machine learning tool [12].
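To make this concrete, the following is a minimal Python sketch of the per-instruction-type counting just described. The opcode list and the line-oriented IR dump format are illustrative assumptions, not GCC's actual IR syntax:

    from collections import Counter

    # Hypothetical subset of the roughly 200 GCC IR instruction types.
    OPCODES = ["br", "br_cond", "add", "mul", "load", "store", "call", "phi"]

    def feature_vector(ir_lines):
        """Map an IR dump (one instruction per line) to a count vector
        with one dimension per instruction type."""
        counts = Counter()
        for line in ir_lines:
            tokens = line.split()
            if not tokens:
                continue
            op = tokens[0]
            # Keep conditional and unconditional branches distinct, as argued
            # above, instead of collapsing them into a single "branch" class.
            if op == "br":
                op = "br_cond" if len(tokens) > 2 else "br"
            if op in OPCODES:
                counts[op] += 1
        return [counts[op] for op in OPCODES]

    ir = ["br label_7", "br i1 r3 label_2 label_5", "add r1 r2 r3"]
    print(feature_vector(ir))  # [1, 1, 1, 0, 0, 0, 0, 0]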


To select the best features with respect to a compiler optimization problem, we used the feature selection methodology adopted for the text classification problem. For this, we first defined a compiler optimization problem as a classification problem. In this work, we applied the approach to the best optimization options selection problem. Modern compilers have more than 100 options, and selecting the best options for a given program is NP-hard. People use heuristics to find the best options for a given problem. If each optimization option is considered as a class, then the best compiler options selection problem can be defined as a classification problem as follows: given a program P and a compiler optimization class C, determine whether P belongs to C or not. For simplicity, we assume that each optimization is independent of the others; a sketch of the resulting per-option labeling is given below. We use the following criteria from the work [20] to select the best 30 features² for our feature vector representation.
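As a hedged illustration of this framing, each program's desired output is one binary label per option class. The option names below are ordinary GCC flag names used purely as placeholders, and best_flags stands in for the result of the search described in Section 4:

    # Each compiler option is treated as an independent binary class.
    OPTIONS = ["unroll-loops", "inline-functions", "tree-vectorize"]

    def labels_for(best_flags):
        """Turn the set of flags found best for one program into one
        ON/OFF label per option class."""
        return {opt: (opt in best_flags) for opt in OPTIONS}

    print(labels_for({"unroll-loops", "tree-vectorize"}))
    # {'unroll-loops': True, 'inline-functions': False, 'tree-vectorize': True}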

3.1 Frequency Thresholding

Frequency thresholding counts the number of times an instruction type occurs in the training set. The basic assumption is that rare instructions are not influential for compiler optimization options; removing rare instructions reduces the dimensionality of the feature space. This is the simplest technique, but not a very good criterion for picking the best features.
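A minimal sketch, assuming the training set is a list of per-program count vectors as built above (the threshold value is illustrative):

    def frequent_features(count_vectors, min_total=5):
        """Keep only the instruction types whose total count across the
        training set reaches min_total; rare types are dropped."""
        totals = [sum(col) for col in zip(*count_vectors)]
        return [i for i, total in enumerate(totals) if total >= min_total]

    vecs = [[3, 0, 1], [4, 0, 2], [5, 1, 0]]
    print(frequent_features(vecs))  # [0]: only the first type is frequent enough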

3.2 Information Gain (IG)

Information gain is frequently employed as a term-goodness criterion in the field of machine learning. It measures the number of bits of information obtained for category prediction by knowing the presence or absence of a term. Let \{C_i\}_{i=1}^{m} be the set of compiler options available in a compiler. The information gain of an instruction type t is defined by Equation 1:
G(t) = -\sum_{i=1}^{m} \Pr(C_i) \log \Pr(C_i) + \Pr(t) \sum_{i=1}^{m} \Pr(C_i \mid t) \log \Pr(C_i \mid t) + \Pr(\bar{t}) \sum_{i=1}^{m} \Pr(C_i \mid \bar{t}) \log \Pr(C_i \mid \bar{t})    (1)
Pr(C_i) gives the probability of compiler option C_i being turned ON in the training set. Pr(t) gives the probability that instruction type t occurs in the training set. Pr(C_i | t) gives the probability of C_i being turned ON given that instruction type t is present, and Pr(C_i | \bar{t}) gives the probability of C_i being turned ON given that instruction type t is absent.
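As a sketch of how Equation 1 might be evaluated, assuming a boolean encoding of instruction-type presence and of the ON/OFF status of each option per training program (this encoding is an assumption, not the paper's exact data format):

    import math

    def information_gain(t_present, options_on):
        """Information gain of one instruction type t (Equation 1).
        t_present[j]: True if instruction type t occurs in program j.
        options_on[i][j]: True if option C_i is ON for program j."""
        n = len(t_present)
        p_t = sum(t_present) / n

        def plogp(p):
            return p * math.log(p) if p > 0 else 0.0

        gain = 0.0
        for on in options_on:
            p_c = sum(on) / n
            with_t = [c for c, t in zip(on, t_present) if t]
            without_t = [c for c, t in zip(on, t_present) if not t]
            p_c_t = sum(with_t) / len(with_t) if with_t else 0.0
            p_c_nt = sum(without_t) / len(without_t) if without_t else 0.0
            # One summand per option: -P(C)logP(C) + P(t)P(C|t)logP(C|t)
            # + P(not t)P(C|not t)logP(C|not t), as in Equation 1.
            gain += -plogp(p_c) + p_t * plogp(p_c_t) + (1 - p_t) * plogp(p_c_nt)
        return gain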
² We want to see how robust the approach is compared to the MILEPOST-GCC framework; MILEPOST-GCC gives 55 hand-made features.


3.3 χ² Statistics (CHI)

CHI measures the lack of independence between two terms. Equation 2 gives CHI for an instruction type t with respect to a compiler option C_i:

\chi^2(t, C_i) = \frac{N (AD - CB)^2}{(A + C)(B + D)(A + B)(C + D)}    (2)

Where A is the number of times instruction type t and compiler option C_i being turned ON co-occur for a program; B is the number of times instruction type t occurs with compiler option C_i turned OFF; C is the number of times option C_i is turned ON without instruction type t; D is the number of times neither instruction type t is present nor option C_i is turned ON; and N is the total number of programs in the training set. We calculate the average value of CHI for each instruction type using Equation 3:
\chi^2_{avg}(t) = \sum_{i=1}^{m} \Pr(C_i)\, \chi^2(t, C_i)    (3)

Where m is the total number of compiler options and Pr(C_i) gives the probability of compiler option C_i being turned ON in the training set.
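The corresponding computation is direct; the following sketch assumes the four contingency counts A, B, C, D have already been tallied from the training set:

    def chi_square(A, B, C, D):
        """Chi-square statistic of Equation 2 for one (instruction type,
        compiler option) pair, from the 2x2 contingency counts:
        A: t present, option ON;   B: t present, option OFF;
        C: t absent, option ON;    D: t absent, option OFF."""
        n = A + B + C + D
        denom = (A + C) * (B + D) * (A + B) * (C + D)
        return n * (A * D - C * B) ** 2 / denom if denom else 0.0

    def chi_square_avg(pairs, priors):
        """Equation 3: CHI averaged over options, weighted by P(C_i ON).
        pairs[i] is the (A, B, C, D) tuple for option i."""
        return sum(p * chi_square(*abcd) for p, abcd in zip(priors, pairs))

    print(chi_square(8, 2, 1, 9))  # approx. 9.9: t and C_i strongly dependent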

4 Experimental Setup

The tools, benchmarks, architecture and environment used for the work are briefly described in this section.

4.1 Compiler

GCC was selected as it is a mature and popular open-source optimizing compiler that supports many languages, has a large community, is competitive with the best commercial compilers, and offers a large number of program transformation techniques. GCC is the only open-source compiler that supports more than 30 processor families. For our work, we selected the latest GCC 4.4.x version.

4.2 Flags

In the latest version of GCC, there are about 100 flags. It is impossible to validate all combinations of optimization flags. Most of these flags are covered by the global GCC optimization levels, i.e., -O1, -O2 and -O3. For our work, we started from the GCC optimization level -O3 and then toggled a particular optimization ON or OFF through the corresponding -f<optimization-name> and -fno-<optimization-name> flags, respectively. Certain combinations of flags cause the compiler to break or produce incorrect program execution and hence incorrect results. We reduced the probability of such cases by comparing the outputs of each program with the reference outputs.
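One evaluation step of this setup might look like the following sketch; the source file, flag name and reference output are placeholders, and the output comparison is the validity check just described:

    import subprocess

    def run_with_flag(source, flag_name, enable, ref_output):
        """Compile at -O3 with one optimization toggled ON or OFF, run the
        binary, and check its output against the reference output."""
        flag = f"-f{flag_name}" if enable else f"-fno-{flag_name}"
        subprocess.run(["gcc", "-O3", flag, source, "-o", "a.out"], check=True)
        result = subprocess.run(["./a.out"], capture_output=True, text=True)
        # Discard configurations that miscompile: output must match the reference.
        return result.stdout == ref_output

    ok = run_with_flag("bitcount.c", "unroll-loops", enable=True,
                       ref_output="expected output\n")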


4.3 Platforms

We used an Intel dual-core machine running at 2.0 GHz with 4.0 GB of main memory and 1 MB of L2 cache, running Ubuntu Linux.

4.4 Benchmarks

We used the MiBench and SPEC2006 benchmark suites. MiBench consists of six categories of C programs. These categories offer different program characteristics that enable researchers in architecture and compilers to examine their designs more effectively. The SPEC2006 benchmark suite consists of 39 applications in FORTRAN and C. It is a standard benchmark suite which is used to verify various compiler optimization techniques. We only considered the C applications from SPEC2006.

4.5 Experiments

First, we identified the hot functions in the benchmark applications. We define a hot function as the one that accounts for most of the execution time of a given program application. We used the gprof tool to determine the hot function for a given application. The features extracted from a hot function were used to build a feature vector for machine learning; the feature vector of a hot function was used to represent a program instance. Each feature in a feature vector was weighted using a novel approach given by Malik [11]. We built a training set using the SPEC2006 C applications. We used the genetic algorithm (GA) from [6] to get the best compiler options for each application, with 1000 evolutions for the GA approach. Each run was repeated five times so that the speedups were not caused by cache priming and similar effects. This was the most time-consuming part of our work; in some cases, it took more than a day to find the best compiler options for an application. For our work, we selected two machine learning techniques: decision trees (DT) and support vector machines (SVM). We used the MILEPOST-GCC work [9] as a reference to compare the quality of our work, as it gives the most detailed man-made features for machine learning techniques. We used the C4.5 algorithm [7] for the DT implementation, with the default values for the various parameters given by the developer. We implemented both linear and non-linear SVMs using the SVMlight tool, which is freely available on the web [18]. Again, we used the default values for the various parameters set by the SVMlight tool; however, for the non-linear SVMs, we used a radial basis function kernel with γ = 1 and the SVM upper bound parameter C equal to 10. DT learning took about a minute to build a model, while SVM learning took 2 to 5 minutes to build a model for the best compiler flag selection problem.
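The experiments used SVMlight; as a rough, hedged equivalent of the reported non-linear configuration (RBF kernel, γ = 1, C = 10), one per-option classifier could be set up as follows. The synthetic data is a stand-in for the 30-feature vectors and per-option ON/OFF labels:

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic stand-in data: 20 programs x 30 selected features, plus a
    # binary label per program saying whether one option should be ON.
    rng = np.random.default_rng(0)
    X = rng.random((20, 30))
    y = rng.integers(0, 2, 20)

    clf = SVC(kernel="rbf", gamma=1.0, C=10.0)  # the reported SVM settings
    clf.fit(X, y)                # one such binary model per compiler option
    print(clf.predict(X[:3]))    # predicted ON/OFF for three programs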

5 Experimental Results

We compare the performance of the iterative approach using the GA with the performance of the two machine learning techniques using the MILEPOST-GCC framework


Fig. 3. Performance of non-linear SVM learning using χ² statistics (CHI) as the selection criterion for our framework

and our approach. Our approach did better with both the DT and SVM techniques; however, due to space constraints, we discuss only the non-linear SVM results. Figure 3 compares the performance of the non-linear SVM using MILEPOST-GCC (ML) and our approach (SP) with the CHI criterion against the iterative model (ITR) using the GA. In Figure 3, the horizontal axis gives the name of each benchmark application used for testing together with its performance under the ITR, ML and SP models. The vertical axis gives the speedup over the running time when the same application is compiled with the -O3 GCC optimization level³. For the bitcount application, using the ITR model we are able to find a binary which is 1.6 times faster than the binary obtained by compiling with the -O3 GCC optimization level. Figure 3 shows that with the CHI selection criterion, our automatic approach outclasses the MILEPOST-GCC framework; on average, our approach outperforms the MILEPOST-GCC framework by 6%. The reason for the good performance of the CHI criterion is its ability to capture, to some extent, the inter-dependency between different optimization options. Note that the CHI criterion computes the weight of each instruction type from its presence and absence under each compiler option; this information is missing in the MILEPOST-GCC framework. Figure 4 shows the performance of our framework using the different selection criteria with non-linear SVM learning. The CHI criterion gives the best performance, while the frequency-based selection gives the worst. The information gain performance is reasonable but not as good as CHI.

³ -O3 is the highest optimization level in the GCC compiler.


Fig. 4. Performance of non-linear SVM learning using χ² statistics (CHI), information gain (IG) and frequency (FREQ) as selection criteria for our framework

6 Conclusion and Future Work

Applying machine learning techniques to compilers requires experts to generate features, and at no point can the expert be sure that they have the best set of features to assist the learner. We presented in this paper a novel technique to generate features automatically without the assistance of a compiler expert. We tested the approach extensively using two standard benchmark suites and compared it against the human-created features of the IBM MILEPOST-GCC framework. The results showed that our framework clearly out-performed the IBM MILEPOST-GCC framework on almost all benchmark applications using SVM and DT learning on the best compiler option selection problem. In the future, we plan to use the approach to investigate the automatic selection of a better ordering of optimization passes and the fine-grained tuning of transformation parameters for important optimizations, e.g., the unroll factor of the loop unrolling optimization.

References

1. http://gcc.gnu.org/
2. Agakov, F., Bonilla, E., Cavazos, J., Franke, B., Fursin, G., O'Boyle, M., Thomson, J., Toussaint, M., Williams, C.: Using machine learning to focus iterative optimization. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO 2006 (2006)
3. Bodin, F., Kisuki, T., Knijnenburg, P.M.W., O'Boyle, M., Rohou, E.: Iterative compilation in a non-linear optimization space. In: Workshop on Profile Directed Feedback-Compilation, PACT 1998 (1998)
4. Cavazos, J., O'Boyle, M.: Method-specific dynamic compilation using logistic regression. In: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA 2006 (2006)
5. Cavazos, J., Moss, J.: Inducing heuristics to decide whether to schedule. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2004 (2004)
6. Cooper, K.D., Schielke, P.J., Subramanian, D.: Optimizing for reduced code space using genetic algorithms. In: Workshop on Languages, Compilers, and Tools for Embedded Systems, LCTES 1999 (1999)
7. http://cis.temple.edu/~ingargio/cis587/readings/id3-c45.html
8. Dubach, C., Jones, T.M., Bonilla, E.V., Fursin, G., O'Boyle, M.F.: Portable compiler optimization across embedded programs and micro-architectures using machine learning. In: Proceedings of the 42nd IEEE/ACM International Symposium on Micro-architecture (2009)
9. Fursin, G., Miranda, C., Temam, O., Namolaru, M., Yom-Tov, E., Zaks, A., Mendelson, B., Barnard, P., Ashton, E., Courtois, E., Bodin, F., Bonilla, E., Thomson, J., Leather, H., Williams, C., O'Boyle, M.: MILEPOST GCC: machine learning based research compiler. In: Proceedings of the GCC Developers' Summit, GCC 2008 (2008)
10. Ganapathi, A., Datta, K., Fox, A., Patterson, D.: A case for machine learning to optimize multicore performance. In: Proceedings of the First USENIX Conference on Hot Topics in Parallelism, HotPar 2009 (2009)
11. Malik, A.M.: Spatial based feature generation for machine learning based optimization compilation. In: Proceedings of the 9th IEEE International Conference on Machine Learning and Applications, ICMLA 2010 (2010)
12. Mitchell, T.: Machine Learning. McGraw-Hill (1997)
13. McGovern, A., Moss, E.: Scheduling straight-line code using reinforcement learning and rollouts. In: Proceedings of the Neural Information Processing Symposium, NIPS 1998 (1998)
14. Muchnick, S.: Compiler Optimization for Modern Compilers. Morgan Kaufmann (1997)
15. Ipek, E., Mckee, S.A.: Efficiently exploring architectural design spaces via predictive modeling. In: Proceedings of Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006 (2006)
16. Leather, H., Bonilla, E., O'Boyle, M.: Automatic feature generation for machine learning based optimizing compilation. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO 2009 (2009)
17. Stephenson, M., Amarasinghe, S., Martin, M., O'Reilly, U.M.: Meta optimization: Improving compiler heuristics with machine learning. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2003 (2003)
18. http://svmlight.joachims.org/
19. Yuki, T., Renganarayanan, L., Rajopadhye, S., Anderson, C., Eichenberger, A., O'Brien, K.: Automatic creation of tile size selection models. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO 2010 (2010)
20. Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning, ICML 1997 (1997)
