Research Into The Conditional Task Scheduling Problem
Lin Huang and Michael J. Oudshoorn
Department of Computer Science, The University of Adelaide, Adelaide, SA 5005, Australia
{huang,michael}@cs.adelaide.edu.au
Abstract: One of the critical issues affecting parallel system performance is the scheduling of tasks onto the available target processors. A great number of algorithms and software tools have been presented to deal with the task scheduling problem in parallel processing. In this paper, we study a general (less-restricted) scheduling case, named "conditional task scheduling", in which the task model of each program execution is not identical to previous models due to conditional branches associated with task runtime operations. We present a new environment, called ATME, to practically tackle this complicated conditional scheduling problem.
1 Introduction
Parallel programming involves more issues than its sequential counterpart. Such issues include task partitioning, task communication and synchronization, task scheduling, and tuning. The performance of a parallel program is affected by all of these issues. This paper focuses on task scheduling on loosely-coupled distributed processors, aiming to achieve good parallel system performance via an efficient task scheduling policy. Simply put, task scheduling is the distribution of the partitioned tasks of a parallel program onto the underlying available physical processors. The scheduling problem has been studied for some time, and has been proved to be NP-complete [12] (i.e., there is no optimal solution to this problem within polynomial
time). Various heuristics [5, 8] and software tools [13, 14] have been proposed to pursue a suboptimal solution within acceptable computational complexity bounds. A detailed survey can be found in [1, 4]. Most heuristics to date have assumed that the task model remains constant across program executions. However, when tasks (especially task runtime operations such as task spawning and message passing) are associated with conditional branches or loops, the parallel program is best described by a "conditional task model". Such a task model may vary across different program executions. Consequently, the task models provided to the scheduling algorithm in different executions may not be identical. In addition, the scheduling policy employed in each program execution is most likely different as well. Little research has been conducted on conditional task scheduling [3]. One obstacle in conditional task scheduling is that, prior to program execution, it is in general impossible to accurately predict the task model for a future execution, yet such a prediction is a fundamental requirement of the task scheduling algorithm in order to produce a scheduling policy that distributes tasks onto multiple processors. Another obstacle is that, due to variations in the task model between executions, the scheduling policy should vary across different program executions, in accordance with the change of the conditional task model, aiming to achieve good
system performance in the majority of executions. However, there is a lack of scheduling algorithms to tackle this form of task scheduling. In this paper, we introduce a practical approach to tackling the conditional task scheduling problem. We describe the framework and components of ATME, the environment developed to automate conditional task scheduling on loosely-coupled distributed processors and to support parallel program design and implementation. This paper is organized as follows. Section 2 describes the major components and the process of the conditional task scheduling problem. Task model construction is described in Section 3. Section 4 discusses the framework of the ATME environment, which undertakes the scheduling of conditional parallel tasks and provides programming support. Conclusions are presented in Section 5.
2 The Conditional Task Scheduling Problem

The task scheduling problem can be decomposed into three major components: the task model, which portrays the constituent tasks and the interconnection relationships among the tasks of a parallel program; the processor model, which abstracts the architecture of the underlying
available parallel system on which parallel programs can be executed; and the scheduling algorithm, which produces a scheduling policy by which the tasks of a parallel program are distributed onto the available processors and possibly ordered by execution commencement on each processor. The scheduling algorithm aims to optimize a desired performance measure of either the parallel system or the parallel program. The two most commonly used performance measures are the parallel execution time of the parallel program (or schedule length) and the total cost of communication delay and load balance, adopted by a number of commonly-used scheduling algorithms such as [2, 7, 11] and [5, 8, 10] respectively. The scheduling algorithm employed in a parallel system also determines the attributes of tasks and processors to be considered in the task and processor models. Generally speaking, with the objective of minimizing the total cost of communication delay and load balance, the task model is normally portrayed as a weighted undirected graph in which tasks are represented by nodes weighted by computation cost, and intertask communication is represented by edges weighted by communication delay. On the other hand, with the objective of minimizing parallel execution time, the task model is described by a weighted directed acyclic graph (DAG) with directed edges used to represent precedence relationships between tasks. In this paper, the objective of task scheduling is to optimize the parallel execution time of the parallel program. An example of the whole process of conditional task scheduling is illustrated in Figure 1. The scheduling algorithm takes as input the conditional task model (formally defined in Section 2.2) and the processor model, and generates a scheduling policy to distribute the user tasks.
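As a concrete (and entirely hypothetical) illustration of the weighted DAG task model just described, the following sketch stores computation costs on nodes and communication delays on edges, and computes the critical path, a lower bound on the schedule length; the task names and weights are invented, not taken from the paper:

```python
# Sketch of a weighted DAG task model: nodes carry computation cost,
# edges carry communication delay. Names and weights are illustrative.

tasks = {  # task -> computation cost
    "T0": 4, "T1": 3, "T2": 5, "T3": 2,
}
edges = {  # (parent, child) -> communication delay
    ("T0", "T1"): 2,
    ("T0", "T2"): 1,
    ("T1", "T3"): 3,
    ("T2", "T3"): 2,
}

def critical_path_length(task, memo=None):
    """Longest computation+communication path starting at `task` --
    a lower bound on the achievable schedule length."""
    if memo is None:
        memo = {}
    if task in memo:
        return memo[task]
    children = [(c, d) for (p, c), d in edges.items() if p == task]
    tail = max((d + critical_path_length(c, memo) for c, d in children),
               default=0)
    memo[task] = tasks[task] + tail
    return memo[task]

print(critical_path_length("T0"))  # 4 + 2 + 3 + 3 + 2 = 14
```

A conditional task model would additionally attach an execution probability to each edge, as defined in Section 2.2.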
The scheduling policy is usually represented by a Gantt chart, which lists all processors available in the parallel system, with the user tasks distributed among them and arranged by their execution commencement order on each processor.

Figure 1: The procedure of solving a conditional task scheduling problem.

In task scheduling research, for the sake of simplicity, the processor model of the target parallel system is assumed to be composed of identical processors fully connected via identical communication networks. It is intrinsically complicated and difficult to statically determine a scheduling policy by depending merely on the scheduling algorithm, since the task model of the forthcoming program execution cannot be predicted accurately prior to runtime. Addressing the other parts of the conditional scheduling problem is necessary. In our work, the conditional task scheduling problem is tackled in two steps: first, the task model of the forthcoming program execution is incrementally predicted, based on past execution profiles (detailed in Section 3); then a new scheduling algorithm, called CET, is used to produce a scheduling policy. CET accepts, as input, an estimated task model and a processor model, and takes more task attributes into consideration (i.e., computation time, communication delay and execution probability between interconnected tasks). A detailed description of the CET algorithm is found in [?].
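The Gantt chart representation of a scheduling policy can be sketched as below; the processor names, tasks, and time slots are invented for illustration and are not ATME's actual policy format:

```python
# A scheduling policy as a simple Gantt chart: each processor holds an
# ordered list of (task, start, finish) triples. Values are illustrative.

schedule = {
    "P0": [("T0", 0, 4), ("T1", 6, 9)],
    "P1": [("T2", 5, 10), ("T3", 12, 14)],
    "P2": [],
}

def schedule_length(schedule):
    """Parallel execution time = latest finish time over all processors."""
    return max((finish for slots in schedule.values()
                for _, _, finish in slots), default=0)

for proc, slots in sorted(schedule.items()):
    row = " ".join(f"[{t}:{s}-{f}]" for t, s, f in slots) or "(idle)"
    print(f"{proc}: {row}")
print("schedule length =", schedule_length(schedule))  # 14
```

Gaps between a task's start time and its predecessor's finish time correspond to processor idle time caused by communication delays.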
Each task interconnection (between a parent task and a child task) in E is labeled with a pair (communication data size, execution probability), which is further explained below:
The communication data size represents the volume of data transferred between a parent and its child task, if such communication occurs. For simplicity, we assume that one unit of data is transferred per unit time; hence, the volume of data transferred is directly proportional to the time taken for communication. The execution probability indicates the likelihood that the parent task spawns, or attempts to spawn, the child task and communicates with it.
succeeding tasks. Therefore, task execution is non-preemptive. An accurate task model, predicted prior to program execution and offered to the scheduling algorithm, is critical to the production of a good scheduling policy by which parallel tasks can run efficiently, as evaluated through some performance measure such as the parallel execution time. It is generally impossible to achieve an accurate task model in the case of conditional parallel programming, where conditional branches and loops are associated with task runtime operations. Only when the usage patterns of the parallel program are stable, i.e., the input parameters to all tasks of the program do not change radically between different program executions, is accurate prediction of the task model feasible.
As the task interconnection structure can be determined from static program analysis, we focus on task attribute estimation here. The prediction of task attributes is based on what is captured during previous executions. This means that the parallel program must be instrumented so as to generate runtime task information, which is done in the ATME environment discussed later in this paper. The number of past executions retained is determined by the user. Two techniques are utilized in constructing the task model: a linear regression model to estimate the computation time and communication data size of tasks, and a finite state machine to predict the execution probability.
Task computation and communication attributes in the task model are determined by analysis of the corresponding data values collected in the previous n (user-defined) executions. Let y_i stand for some data value in the i-th execution. We use linear statistical techniques to estimate the value in the (n+1)-th execution via the linear regression model

y_i = \alpha + \beta x_i + \varepsilon_i

where \alpha and \beta are regression coefficients and \varepsilon_i is the error term (shock or disturbance), which can be neglected; x_i is the regressor in the linear regression model. As mentioned earlier, each task has "input parameters" which control the attribute values of the task. The values of the "input parameters", which include the parameters of the task and the global information used by the task, influence the value, y_i, we are estimating. Let incp_i be the effect of the input parameters in the i-th execution of the task. Therefore, the input parameter of a task in the (n+1)-th execution (y_{n+1} = incp_{n+1}) is first estimated with the execution number as the regressor (i.e., x_i = i) and the corresponding values captured in the past n executions; then, taking incp as the regressor (i.e., x_i = incp_i) and past values of the corresponding task attribute (y_i is either the computation time of a task or the communication data magnitude between a pair of interconnected tasks), the task attribute in the (n+1)-th execution, y_{n+1}, can be calculated. In the case when the input parameter of each task in the application is not obviously, or easily, obtained, ATME can skip the input parameter estimation and use i as the regressor to predict the values of the task attributes. Since the estimation of incp and of the task attributes adopts the same techniques, we combine them. We find the parameters \alpha and \beta by the method of "least squares" [9]. Let

f(\alpha, \beta) = \sum_{i=1}^{n} [y_i - (\alpha + \beta x_i)]^2

The values of \alpha and \beta must be determined to minimize the function f(\alpha, \beta). Thus, solving the equations

\frac{\partial f}{\partial \alpha} = -2 \sum_{i=1}^{n} [y_i - (\alpha + \beta x_i)] = 0, \qquad \frac{\partial f}{\partial \beta} = -2 \sum_{i=1}^{n} x_i [y_i - (\alpha + \beta x_i)] = 0

yields

\beta = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}, \qquad \alpha = \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{\beta}{n} \sum_{i=1}^{n} x_i

Therefore we can estimate y_{n+1} (the data value in the (n+1)-th execution) by y_{n+1} = \alpha + \beta x_{n+1}. Thus, based on a series of values collected at runtime, the computation time and communication data size of a task in the next execution can be estimated.
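The least-squares estimation above can be sketched as follows; the execution history is invented for illustration, and the code is a minimal sketch rather than ATME's actual implementation:

```python
# Fit y_i = alpha + beta * x_i to the values observed in the past n
# executions, then extrapolate to execution n+1 (standard least squares).

def fit_least_squares(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    beta = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    alpha = (sy - beta * sx) / n
    return alpha, beta

# Hypothetical computation times of one task over executions 1..5:
history = [10.0, 12.1, 13.9, 16.2, 18.0]
xs = list(range(1, len(history) + 1))        # regressor x_i = i
alpha, beta = fit_least_squares(xs, history)
prediction = alpha + beta * (len(history) + 1)  # y_{n+1}
print(round(prediction, 2))  # 20.07
```

Using `incp_i` in place of `i` as the regressor, as described above, requires only changing `xs`.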
The actual task models collected in the execution profiles are used to predict the task model in subsequent executions.

Figure 2: A 4-state finite state machine to predict execution probability (transitions are driven by the observed outcomes: 1 if execution between two tasks occurs at runtime, 0 if it does not).

Each interconnection in an actual task model is labeled with either 0 or 1 for the execution probability, indicating whether task spawn and communication took place along that route (1 for occurrence). The execution probability of an interconnection in the task model is predicted by applying the corresponding values captured in previous executions to an m-state finite state machine (FSM). Figure 2 provides an example of such an FSM, where m = 4. Starting at the start state (state 1 in this case), the FSM is navigated using the execution probability values for that interconnection from the n previous actual task models (in order). There is a threshold state in the FSM which is used to predict whether the execution path along the interconnection will, or will not, be taken. If the number of the final state is equal to or greater than that of the threshold state, then the execution path between this pair of tasks is predicted to be "taken"; otherwise it is assumed to be "not-taken". The prediction accuracy of the execution probability depends on the number of states and on the initial and threshold states of the FSM, as well as on the usage patterns of the application and the number of values included in the "execution history".
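The m-state FSM can be sketched as a saturating counter, much like a branch predictor; the start state, threshold, and outcome histories below are illustrative choices, not ATME's configuration:

```python
# Saturating m-state FSM for execution probability prediction. Each observed
# outcome (1 = interconnection taken in that execution, 0 = not taken) moves
# the state up or down; the final state against a threshold gives the
# taken / not-taken prediction. Parameters here are illustrative.

def predict_taken(history, m=4, start=1, threshold=3):
    state = start
    for outcome in history:            # replay the n past executions in order
        if outcome == 1:
            state = min(m, state + 1)  # saturate at state m
        else:
            state = max(1, state - 1)  # saturate at state 1
    return state >= threshold          # predict "taken" at/above threshold

print(predict_taken([1, 1, 0, 1, 1]))  # True  (final state 4)
print(predict_taken([0, 1, 0, 0]))     # False (final state 1)
```

With m = 4, start = 1 and threshold = 3, a "taken" prediction requires a recent run of taken outcomes, so isolated occurrences do not flip the prediction.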
4 The ATME Environment

Figure 3: Framework of ATME.

User tasks are instrumented and analyzed by the program preprocessing and analysis component in order to make them run on the PVM platform [6] and to produce information to be captured at runtime, mainly regarding attributes of the task model (refer to [?] for details). ATME provides explicit support for conditional parallel programming, which allows the programmer to concentrate on issues relating to the application itself. Based on what has been captured in previous executions, the task model construction component predicts the actual task model prior to the forthcoming program execution, as stated in Section 3. With the task model obtained from task model construction and the processor model from the target machine description, the task scheduling component statically generates a policy by which the user tasks are distributed onto the underlying processors. The CET algorithm developed in ATME is described in detail in [?]. At runtime, the runtime data collection component collects the traces produced by the instrumented tasks; after the execution completes, these are stored in the program database to be taken as input by the task model construction component when predicting the task model for the next run. The post-execution analysis and report generation components provide various reports and tuning suggestions back to the user and to ATME for program improvement. It may be noticed that a "cycle" exists in the ATME environment: starting from task model construction, through task scheduling and runtime data collection, and back to task model construction. This procedure makes ATME an adaptive environment, in that the task model offered to the scheduling algorithm is incrementally established based on the past usage patterns of the application. Accurate estimation of task attributes can be obtained for relatively stable usage patterns, which thus admits improvement in execution efficiency.
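The adaptive cycle described above can be sketched as a driver loop; every function here is a simplified stand-in for the corresponding ATME component (prediction, scheduling, instrumented execution), not a real ATME API:

```python
# Sketch of ATME's adaptive cycle: predict the task model from past traces,
# schedule, execute, and feed the collected traces back into the database.
# All components below are toy stand-ins for illustration only.

def construct_task_model(db):
    # Predict the next task model from past traces (here: average the
    # recorded computation times over the retained executions).
    if not db:
        return {"T0": 1.0}              # initial guess before any run
    return {t: sum(run[t] for run in db) / len(db) for t in db[-1]}

def schedule(task_model, processors):
    # Greedy placement: each task goes to the least-loaded processor.
    load = {p: 0.0 for p in processors}
    policy = {p: [] for p in processors}
    for task, cost in sorted(task_model.items()):
        p = min(load, key=load.get)
        policy[p].append(task)
        load[p] += cost
    return policy

def execute(policy):
    # Stand-in for an instrumented run: return measured times per task.
    return {t: 2.0 for tasks in policy.values() for t in tasks}

db, processors = [], ["P0", "P1"]
for _ in range(3):                      # three executions of the cycle
    model = construct_task_model(db)    # task model construction
    policy = schedule(model, processors)  # task scheduling
    db.append(execute(policy))          # runtime data collection
print(len(db))  # 3
```

Each pass through the loop refines the predicted task model from the traces of previous runs, which is what makes the environment adaptive.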
5 Conclusions
This paper has studied the conditional task scheduling problem and proposed the framework of an environment, ATME, to practically tackle the automation of the conditional scheduling procedure. ATME realizes its support for conditional task scheduling via two main steps: first, the task model of the forthcoming program execution is estimated prior to runtime, based on the task runtime information profiled in previous executions; then an ATME scheduling algorithm is employed to generate a scheduling policy by which to distribute the parallel tasks onto the target architecture. ATME is an integrated environment which automates the whole procedure of conditional task scheduling and provides support for the design and implementation of parallel programs (via a runtime library [?]). With ATME, programmers are relieved of the need to consider the tedious and complicated issue of task scheduling while developing parallel programs. In addition, they are equipped with an efficient ATME runtime library which significantly reduces the burden of program design and implementation. The development of ATME is complete. Simulation has been employed to conduct experiments on the performance of ATME, on the non-negligible execution probability attribute when scheduling tasks, and on the influence of accurate task attributes on the efficiency of the scheduling policy and, thus, on system performance. A performance comparison of ATME against other scheduling algorithms and strategies has also been done. Experimental results, partially presented in [?, ?], show that ATME achieves good system performance in most program executions. ATME is intended to be extended so that it can deal with preemptive conditional task scheduling of parallel programs. Preliminary experiments indicate that considering (or allowing) preemption in user tasks can significantly
References
[1] Thomas L. Casavant and Jon G. Kuhl. A taxonomy of scheduling in general-purpose distributed computing systems. IEEE Transactions on Software Engineering, Volume 14, Number 2, pages 141-154, February 1988.

[2] Wesley W. Chu, Leslie J. Holloway, Min-Tsung Lan and Kemal Efe. Task allocation in distributed data processing. Computer, Volume 13, Number 11, pages 57-69, November 1980.

[3] Hesham El-Rewini and Hesham H. Ali. Static scheduling of conditional branches in parallel programs. Journal of Parallel and Distributed Computing, Volume 24, Number 1, pages 41-54, January 1995.

[4] Hesham El-Rewini, Hesham H. Ali and Ted Lewis. Task scheduling in multiprocessing systems. Computer, Volume 28, Number 12, pages 27-37, December 1995.

[5] Hesham El-Rewini and Ted G. Lewis. Scheduling parallel program tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing, Volume 9, Number 2, pages 138-153, June 1990.

[6] Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek and Vaidy Sunderam. PVM: Parallel Virtual Machine. A User's Guide and Tutorial for Networked Parallel Computing. The MIT Press, Cambridge, Massachusetts, 1994.

[7] F. Harary. Graph Theory. Addison-Wesley, New York, N.Y., 1969.

[8] Chung Yee Lee, Jing Jang Hwang, Yuan Chieh Chow and Frank D. Anger. Multiprocessor scheduling with interprocessor communication delays. Operations Research Letters, Volume 7, Number 3, pages 141-147, June 1988.

[9] Enders Anthony Robinson. Least Squares Regression Analysis in Terms of Linear Algebra. Goose Pond Press, Houston, Texas, 1981.

[10] Vivek Sarkar. Determining average program execution times and their variance. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, SIGPLAN Notices, Volume 24, Number 7, pages 298-312, July 1989.

[11] Harold S. Stone. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Transactions on Software Engineering, Volume SE-3, Number 1, pages 85-93, January 1977.

[12] J. Ullman. NP-complete scheduling problems. Journal of Computer and System Sciences, Volume 10, pages 384-393, 1975.

[13] Min You Wu and Daniel Gajski. Hypertool: a programming aid for message-passing systems. IEEE Transactions on Parallel and Distributed Systems, Volume 1, Number 3, pages 330-343, July 1990.

[14] Tao Yang. Scheduling and Code Generation for Parallel Architectures. Ph.D. thesis, Department of Computer Science, Rutgers, The State University of New Jersey, 1993.
Multiprocessor scheduling with interprocessor communication delays. Operations Research Letters, Volume 7, Number 3, pages 141{147, June 1988. Enders Anthony Robinson. Least squares regression analysis in terms of linear algebra. Houston, Texas, Goose Pond Press, 1981. Vivek Sarkar. Determining average program execution times and their variance. Proceedings of 1989 SIGPLAN Notice, Volume 24, Number 7, pages 298{312, July 1989. Conference on Programming Language Design and Implementation. Harold S. Stone. Multiprocessor scheduling with the aid of network ow algorithms. IEEE Transactions on Software Engineering, Volume SE-3, Number 1, pages 85{93, January 1977. J. Ullman. NP-Complete scheduling problems. Journal of Computing System Science, Volume 10, pages 384{393, 1975. Min You Wu and Daniel Gajski. Hypertool: a programming aid for messagepassing systems. IEEE Transactions on Parallel and Distributed Systems, Volume 1, Number 3, pages 330{343, July 1990. Tao Yang. Scheduling and Code Generation for Parallel Architectures. Ph.D. thesis, Department of Computer Science, Rutgers, The State University of New Jersey, 1993.