
Computers & Operations Research 35 (2008) 2561–2578

www.elsevier.com/locate/cor

Queueing analysis of a server node in transaction processing
middleware systems

Wei Xiong a,*, Tayfur Altiok b
a Department of Public Health, Weill Medical College, Cornell University, New York, NY 10021, USA
b Department of Industrial and Systems Engineering, Rutgers University, P.O. Box 909 Piscataway, NJ 08854-0909, USA

Available online 5 February 2007

Abstract
Quantitative performance modeling of complex information systems is of immense importance for designing enterprise e-business
infrastructures and applications. In this paper, we present a traffic model of a server node in a typical transaction processing middleware
system as well as a quantitative framework to model and analyze its performance. A multi-class open queueing network model is
presented in which multi-class jobs are admitted to a number of server processes sharing hardware resources including the CPU and
the disk. We have developed a viable approximation method, which decomposes the dependent components into their independent
counterparts while preserving their relevant characteristics. We have conducted queueing-theoretic delay analyses and verified the
approach using simulation. Results demonstrate the strength of our approach in predicting delays, elapsed times and other system
performance measures.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Performance analysis; Queuing theory; Transaction processing; TP monitor; Server processes

1. Introduction

Over the past decade, the Web and the Internet have served together as the primary enabling technology underpinning
spectacular developments in e-commerce applications. This reality motivates the continuing development of new
software and allied technologies for distributed systems, including middlewares and component-oriented application
servers, especially in client/server (C/S) environments. Enterprise applications often require concurrent access to
distributed data shared among multiple components in order to perform operations on data. In such cases, it may be
necessary that a group of operations on (distributed) resources be treated as one unit of work called a transaction, which
often possesses the ACID properties [1]. A transaction processing (TP) middleware system is in fact a multi-layer,
multi-platform and multi-user system processing transaction requests which are initiated by client nodes, processed
by server nodes and eventually returned to the initiating client nodes as replies. Performance evaluation of such
complex systems and decision making such as capacity planning thereafter are essential for successful implementation.
Performance of a TP system is usually monitored using a set of performance metrics that gauge system behavior

Corresponding author.
E-mail addresses: wmz2001@med.cornell.edu (W. Xiong), altiok@rci.rutgers.edu (T. Altiok).

0305-0548/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cor.2006.12.012
over time. Three major classes of metrics are:

• Latency or response time metrics, measuring the delay between the initiation of a request and the receipt of its reply.
• Throughput or capacity metrics, measuring the amount of work done per unit time.
• Utilization metrics, measuring the fraction of time that system resources such as the CPU, disk or network are busy.

Our work is motivated by the extensively used TP monitor TUXEDO [23], whose architecture is general enough
to represent typical TP middlewares. In this paper, we present an analytical approach based on queueing-theoretic
arguments to approximate the above set of performance metrics for a TP system. A multi-node TP network is too
complex to model exactly, and there are no known exact analytical methods to study the system
behavior. Since most of the delay in handling transactions in a typical TP network is anticipated in server nodes, we
mainly focus on delay analysis at a server node.
We propose a method, similar to the layered queueing network (LQN) approach, to model a server node as a queueing
network (QN) with two layers, the queues for the server processes (SPs) at the upper layer, and the queues for the
CPU and the disk subsystems at the lower layer. In view of this, we have developed a one-step decomposition method
to analyze the QN model of the traffic flow at a typical server node in a TP system. We directly solve the two-layer
network model by passing proper information from one layer to another. One unique feature of the proposed approach
is its capability of handling the round-robin service discipline at the CPU subsystem along with multiple classes of
jobs served by the SPs.
This paper is organized as follows: Section 2 presents an overview of the earlier work related to performance modeling
of TP systems. Section 3 introduces the structure and functionality of TP monitors and presents a typical traffic model
of a server node in TP middleware systems. Section 4 presents a QN model of the server node and finally Section 5
provides numerical examples.

2. Early work

Modeling and analysis of distributed systems have received considerable attention during the last two decades. Three
common approaches for performance analysis of computer systems have been measurement/benchmarking, simulation,
and analytical modeling. In this section, we briefly review the related queueing-based literature. Historically, computer
system analysts used queueing models to deal with performance problems of computer hardware design. Queueing
models for the memory, disks and processors have been built to represent specific hardware designs [2]. More recent
research concentrated on performance models reflecting both hardware and software considerations. Highleyman [3]
gave a review of technologies involved in TP and provided a variety of performance models for major building blocks
of a TP system. A number of product-form QN models for delay analysis were presented in Menasce and Almeida
[4] for application servers, disk subsystems and networks. However, even though these models offer computationally
efficient solutions via mean value analysis, they rely on highly simplistic assumptions and fail to model
realistic scenarios such as simultaneous possession of resources and non-exponential service times with round-robin-type
service disciplines.
LQNs have been used to provide approximations for delays and response times in systems with software servers
and hardware resources. In this approach, C/S systems were modeled by a layered representation owing to their multi-tier
architecture. LQNs extended the QN model to reect interactions between client and SPs, as well as the contention
of both software processes and hardware components. Two successive layers in LQNs formed a submodel and it was
studied using MVA techniques. Performance measures were estimated by iterating among the submodels to find a fixed
point where each sub-group in the model has the same throughput. Performance behaviors of LQNs are estimated
either by the method of layers (MOL) or the stochastic rendezvous network model (SRVN). The MOL [5] was an
iterative approach that decomposed LQNs into two groups of submodels, one for software processes and the other for
hardware devices. The software models estimated the software contention delays between successive levels, whereas
the device contention model was used to determine queueing delays at the device level. The results of the two sets
of models were combined to provide performance estimates. The output of MOL included response time, throughput,
utilization and queue lengths for both software processes and hardware resources. The SRVN [6,7,26] extended QNs to
model the system with rendezvous (send-and-wait) delays, which allowed for nested services and offered two phases
of service. The overall model was solved by rst decomposing into a set of interrelated submodels each consisting of
one server and a set of directly connected clients. The submodels were solved iteratively using MVA approach. There
have been a number of studies that used LQNs approaches in performance modeling of various applications such as
telecommunication systems [8], middleware systems [9] and software components [10].
Another approach employed to study C/S systems was the simultaneous resource possession MVA (SRP-MVA)
approach described in [11]. SRP-MVA modified the basic MVA approach to reflect the effect of SRP. It iterated on the
average queue lengths until some convergence criterion was met. Impacts of synchronous and asynchronous messages
on system performance were introduced in [12,13] where the authors employed a rigorous treatment for phase-type
service time distributions in closed networks of queues that do not meet product-form requirements. A QN model of
the system was decomposed into a set of subsystems and analyzed approximately using an iterative algorithm. Also, it
is worth mentioning that a layered method based on experience with UML was presented in [14].
In this paper, we study a multi-class open QN model in which jobs are admitted to a number of SPs sharing hardware
resources. Our contributions are summarized below:

• The proposed method is a direct, one-step decomposition approach, whereas most of the previous work used LQN-type approaches that were iterative in nature. As a result, the computational effort is minimal, with negligible CPU times.
• The proposed approach models the multi-class scenario with a round-robin scheme at the CPU subsystem at the device level, rather than the processor-sharing discipline that most of the early work employed.
• The hardware layer in our approach is modeled using an open QN with deterministic CPU times and composite hyper-exponential disk service times.

3. TP monitor and the server node traffic

A TP monitor is the middleware that provides message queueing, transaction scheduling and other services such as
authentication and authorization [1]. It receives transaction requests, queues them and then takes responsibility for
managing them until completion. The server node is the most complex component when it comes to performance
analysis of a multi-tier TP system. It consists of software and hardware components. Software components provide a
collection of business services to clients. They are encapsulated in a set of procedures offered by the SPs. Hardware
components include CPU, memory, disk among others, all shared by the software components. All these components
have a direct impact on the overall performance of the server node. In order to understand its behavior, let us briefly
discuss the traffic flow in a typical server node.
As shown in Fig. 1, a typical server node consists of service providers such as the SPs, hardware resources such
as the CPU and the disk, and middleware components such as listeners, handlers, inter-process communication (IPC)
modules and the bridge process (BP) for communication with other server nodes. SPs are running on the server to
handle client requests and offer a number of services. One or more listeners are configured to listen for connection
requests from clients. Each listener uses one or more handlers to route incoming requests to the proper SPs. Note that a
service may be provided by more than one SP and an SP may provide more than one service, in which case routing can
be done based on the load information on SPs. Thus, each arriving request joins the queue of the selected SP. Requests
in SP queues wait for their turns to be picked up by the SP and join the runnable queue to be executed. During execution,
the SP is engaged with the current request and it cannot pick up another request from its queue until processing of the

Fig. 1. Traffic flow in a typical server node.


current request is completed. Processing of a request by an SP involves CPU execution and possibly a number of disk
I/O operations. Clearly, all the SPs compete for CPU and disk resources. Upon process completion, the SP sends a
reply message back to the client via a handler. From this point on, the SP can pick up another job (if any) for execution.
This traffic flow involves queues and delays in contention for resources; that is, delays will occur in the queues for the SPs,
the CPU and the disk. In the next section, we present a QN model to predict these delays and evaluate performance of
the server node.

4. Delay analysis of a server node

We want to predict the overall residence time of a client request in a server node, which consists of delays in SPs, the
CPU and the disk subsystems. Our objective is to provide a generally applicable model to evaluate performance of a
TP middleware. It is detailed enough to produce accurate performance metrics regarding resource sharing of software
and hardware components in a given server node. The proposed approach is based on queueing-theoretic arguments
and it approximates mean waiting times in each of the aforementioned subsystems.

4.1. A queueing network model of a server node

Let us consider a server node running a TP monitor. Requests arrive at SP queues from a pool of clients on a random
basis. It has been observed empirically in [15] that at a given web server, request arrivals closely follow a Poisson
process, which is an assumption we adopt for the SP queues. The service discipline in SP queues is assumed to
be FIFO and the queue capacity is assumed to be infinite. In practice, SP queues have finite capacities due to memory
limitations; however, we ignore this restriction given the ever decreasing cost of memory. Service time in
SP queues is a random variable representing the entire time that a service request spends in hardware resources. This
is known as the elapsed time and will be analyzed in detail in Section 4.2.
The CPU processes jobs (which are in fact SPs) waiting in the runnable queue according to the round-robin discipline.
In every visit to the CPU, a job waits for all the jobs ahead of it and receives a quantum of service, after which it joins the end
of the CPU runnable queue. During a quantum, the job may require a number of disk I/Os. Upon a disk I/O request, the
active SP being executed relinquishes the CPU instantly and joins the disk queue. After the I/O is completed, the job
joins the runnable queue for the remaining service. We assume that disk I/Os are uniformly distributed over the total
CPU service time. The disk subsystem is viewed as a single server queue with multiple classes of I/O requests served
based on FIFO discipline.
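This execution pattern can be illustrated with a small sketch (all numbers hypothetical): a request's fixed CPU demand is walked through quantum boundaries and uniformly spread disk I/O epochs, yielding the sequence of CPU visit lengths whose average is the characteristic quantum time used in Section 4.2.

```python
# Sketch: round-robin CPU visits of one request with uniformly spread disk I/Os.
# Hypothetical numbers; a visit ends at a quantum boundary, at an I/O epoch, or
# when the total CPU demand is exhausted.

x = 4.0    # fixed CPU demand of the service (ms)
q = 1.0    # CPU quantum (ms)
Nd = 2     # number of disk I/Os issued by the request

# Disk I/O epochs, uniformly spaced over the CPU busy time (Section 4.1 assumption).
io_epochs = [(k + 1) * x / (Nd + 1) for k in range(Nd)]

visits, t, pending = [], 0.0, list(io_epochs)
while t < x - 1e-12:
    # Drop I/O epochs that coincide with the current position.
    while pending and pending[0] <= t + 1e-12:
        pending.pop(0)
    end = min(t + q, x)                  # quantum boundary or end of demand
    if pending and pending[0] < end - 1e-12:
        end = pending[0]                 # job relinquishes the CPU for a disk I/O
    visits.append(end - t)
    t = end

# The mean visit length is the "characteristic quantum" of this request.
print(len(visits), round(sum(visits) / len(visits), 4))
```

With these numbers the request needs six CPU visits, so its mean visit length is well below the nominal quantum q.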
Although at any point in time the active processes in the CPU runnable queue may include additional processes
such as handlers, native clients (NCs) and operating system (OS) processes, we have observed [16] that the CPU time
of these processes is negligible. Therefore, in this paper, we will focus on the SPs, the CPU and the disk subsystem, with
the resulting QN model of a server node shown in Fig. 2.
The proposed approach decomposes the QN shown in Fig. 2 in such a way that the disk subsystem is analyzed first
in isolation; then the CPU subsystem is analyzed using the information coming from the disk subsystem. These two
subsystems allow us to characterize the elapsed time, which is the service time in SP queues. In the nal stage, the SP
queues are analyzed and the server node residence times are obtained for each class of client requests.
In what follows, we first introduce the elapsed time and develop an expression for the waiting time in the CPU
runnable queue in Section 4.2. This analysis requires the disk subsystem time information that is produced in Section 4.3.
Finally, Section 4.4 presents the analysis of the SP queues.

4.2. Analysis of the elapsed time

Elapsed time is an important part of the time a request spends in a server node. It starts from the point a request is
picked up by an SP and continues until its service is completed. It includes delays in the CPU runnable queue, the CPU
busy (execution) time and the time spent in other visited resources such as the disk. For simplicity, let us assume that
a server node is a single-CPU computer with sufficiently large memory. Requests arriving from clients are routed to
an appropriate SP that provides the requested service. As shown in Fig. 2, one can view an SP as a single server queue
with FIFO service discipline and multiple classes of Poisson arrivals (a class corresponds to a pool of requests for the
same service). Newly arrived jobs wait in an SP queue until all the jobs in front of them complete their services. When a
Fig. 2. A queueing network model of a server node.

job leaves the SP, its memory partition is freed for reassignment to the next job in the SP queue. Poisson assumption is
justified due to the large number of request streams from a number of client nodes. Note that the service time in an SP is
nothing but the elapsed time. To be able to analyze the SP queues, a detailed analysis of the elapsed time needs to be
carried out, taking into account all the SPs that compete for the CPU and the disk.
For our purposes, we view an SP and the request that it picks up as a grouped entity that moves through the system
together until the requested service is completed. In the following analysis, for ease of presentation, this grouped
entity is still referred to as the request. Let us define the following parameters:

S: total number of SPs running on the server node,
N_S: total number of services provided by the server node,
N_S(i): number of services offered by SP i,
λ_ij: request arrival rate for service j at SP i,
λ_i: total request arrival rate at SP i, that is, λ_i = Σ_{j=1}^{N_S(i)} λ_ij,
T_ij: elapsed time of service j at SP i (also written T(i, j)),
ρ_ij: percentage of time SP i is busy processing service j, given by ρ_ij = λ_ij T_ij,
W_C(i, j): waiting time of a request for service j at SP i in the CPU runnable queue,
T_C(i, j): total time that a request for service j at SP i spends in the CPU subsystem,
x_j: fixed CPU busy time for service j,
μ_j: disk I/O service rate for service j,
T_D(j): total disk subsystem time for service j,
c(i, j): probability that a request for service j from SP i is in the CPU subsystem (as opposed to being in the disk subsystem),
θ_j: characteristic quantum time for service j, with mean θ̄_j,
σ: context switching time, which is the set-up time the CPU spends when a new process is taken into service.

We need to point out that the CPU time x_j is not necessarily required to be fixed in our approach. The model would
still be valid with general CPU times. However, the actual CPU time measured for a given service does not vary
significantly enough to justify a randomness assumption (used in most MVA approaches). Our benchmarking measurements
provided full support for this assumption. It is also necessary to introduce the concept of the characteristic quantum
time before we get into our detailed queueing analysis. Normally, requests are executed on the CPU for a quantum
at a time in a round-robin manner. However, during an assigned quantum, a job may leave the CPU before the
quantum is over, either due to service completion or interrupts. Thus, the characteristic quantum time, θ_j, is the amount
of time a service j request spends each time it visits the CPU, with expected value θ̄_j. Analysis of the characteristic
quantum time is presented in Appendix A.
To analyze elapsed times, let us consider the point in time a request for service j at SP i is about to join the runnable
queue, either in its rst visit or in its subsequent visits back from the CPU or disk. The expected total waiting time of a
request in the CPU queue can be thought of as the product of the expected number of CPU visits and the expected CPU
waiting time per visit. The expected number of visits to the CPU, however, is simply x_j/θ̄_j. The mean waiting time of
a request per visit depends on the number of requests it sees ahead of itself waiting in the runnable queue or being
processed on the CPU. Note that requests that are currently being processed are either residing in the CPU or the disk
subsystem. Therefore, an arriving request may see only some of the busy SPs in the CPU subsystem due to the fact that
the rest may remain in the disk subsystem. Arriving request j needs to wait in the runnable queue until all the requests
in front of it complete their characteristic quantum times. Since the expected number of requests for service k from SP l
in either the CPU or the disk subsystem is ρ_lk, the expected number of requests for service k from SP l in the CPU
subsystem alone is given by ρ_lk c(l, k). A context switching time σ and the expected characteristic quantum time θ̄_k
are spent every time a process seizes the CPU. Eventually, considering all the SPs and services provided by the system,
we can write the following mean waiting time in the CPU queue for service j from SP i [22]:

$$W_C(i,j) \approx \frac{x_j}{\bar\theta_j}\Biggl[\sum_{l=1,\,l\neq i}^{S}\sum_{k=1}^{N_S(l)} \rho_{lk}\,c(l,k)\,(\bar\theta_k+\sigma) + \sigma\Biggr], \qquad (1)$$

where ρ_lk = λ_lk T(l, k); the first σ (inside the sum) is the context switching time for each job that request j sees ahead of itself, and the second σ is the context switching time for request j itself. Consequently,

T_C(i, j) = W_C(i, j) + x_j.   (2)

Eq. (1) is a typical expression used to derive the mean waiting time in the M/G/1 queue and its variations. The argument
is based on the fact that the waiting time of an arriving request depends on the workload it sees ahead of itself. The
approximation arises because ρ_lk and c(l, k) in (1) are arbitrary-time probabilities, whereas they should be
based on observations at arrival epochs at the CPU queue.
The probability of request j being in the runnable queue or on the CPU is

c(i, j) = T_C(i, j) / T(i, j)   (3)

and

T(i, j) = T_C(i, j) + T_D(j).   (4)

Combining (3) and (4), we have

c(i, j) = [T(i, j) − T_D(j)] / T(i, j).   (5)
Finally, the approximate expression for the mean waiting time of request j from SP i is given by

$$W_C(i,j) \approx \frac{x_j}{\bar\theta_j}\Biggl[\sum_{l=1,\,l\neq i}^{S}\sum_{k=1}^{N_S(l)} \lambda_{lk}\,\bigl(T(l,k)-T_D(k)\bigr)(\bar\theta_k+\sigma) + \sigma\Biggr]. \qquad (6)$$

On the other hand, the mean elapsed time of service request j at SP i is given by

T(i, j) = W_C(i, j) + x_j + T_D(j).   (7)


Since SP i provides N_S(i) services, combining (6) and (7) we obtain a set of linear equations, consisting of $2\sum_{i=1}^{S} N_S(i)$ equations and $2\sum_{i=1}^{S} N_S(i)$ unknowns, given by

$$B\,\bar T = \bar b, \qquad (8)$$

where B is the coefficient matrix (with N_T denoting the total number of services)

$$B=\begin{pmatrix}
1 & 0 & \cdots & 0 & -\frac{x_1}{\bar\theta_1}\lambda_{lk}(\bar\theta_k+\sigma) & \cdots & -\frac{x_1}{\bar\theta_1}\lambda_{S,N_T}(\bar\theta_{N_T}+\sigma)\\
0 & 1 & \cdots & 0 & -\frac{x_2}{\bar\theta_2}\lambda_{lk}(\bar\theta_k+\sigma) & \cdots & -\frac{x_2}{\bar\theta_2}\lambda_{S,N_T}(\bar\theta_{N_T}+\sigma)\\
\vdots & & & & & & \vdots\\
-\frac{x_{N_T}}{\bar\theta_{N_T}}\lambda_{11}(\bar\theta_1+\sigma) & \cdots & -\frac{x_{N_T}}{\bar\theta_{N_T}}\lambda_{lk}(\bar\theta_k+\sigma) & \cdots & 0 & 0 & 1
\end{pmatrix}.$$
$\bar T$ is the vector of elapsed times,

$$\bar T = \bigl(T(1,1),\; T(1,2),\; \ldots,\; T(i,j),\; \ldots\bigr)^{\mathrm T},$$

and $\bar b$ is given by

$$\bar b = \begin{pmatrix}
x_1 + T_D(1) - \dfrac{x_1}{\bar\theta_1}\displaystyle\sum_{l=2}^{S}\sum_{k=1}^{N_S(l)}\lambda_{lk}(\bar\theta_k+\sigma)T_D(k) + \dfrac{x_1}{\bar\theta_1}\sigma\\
\vdots\\
x_j + T_D(j) - \dfrac{x_j}{\bar\theta_j}\displaystyle\sum_{l=1,\,l\neq i}^{S}\sum_{k=1}^{N_S(l)}\lambda_{lk}(\bar\theta_k+\sigma)T_D(k) + \dfrac{x_j}{\bar\theta_j}\sigma\\
\vdots
\end{pmatrix}.$$

Notice that for a stable server node, all the diagonal elements of matrix B are 1 and the absolute values of all the other
elements are less than 1, which, under stability, makes B strictly diagonally dominant and hence non-singular, guaranteeing a unique solution for
Eq. (8). Also note that we want to solve (8) for T(i, j), and yet T_D(j) is also unknown at this point. In the next section,
a queueing model is introduced to analyze the disk subsystem performance. After obtaining T_D(j), we will return to
Eq. (8) and use a standard method to solve it for the elapsed times.
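As an illustration, the sketch below assembles and solves the system (8) for a small hypothetical configuration (two SPs, one service each; all rates and times are invented for illustration), assuming the disk times T_D(j) have already been obtained as in Section 4.3:

```python
# Sketch: build and solve the linear system (8) implied by Eqs. (6)-(7).
# Hypothetical inputs: lam[(i, j)] = arrival rate of service j at SP i,
# x[j] = CPU demand, theta[j] = mean characteristic quantum,
# sigma = context switch time, TD[j] = disk subsystem time.

S = 2
services = [[0], [1]]                  # SP 0 offers service 0, SP 1 offers service 1
lam = {(0, 0): 0.02, (1, 1): 0.03}     # requests per ms
x = {0: 4.0, 1: 5.0}                   # ms
theta = {0: 0.8, 1: 0.9}               # ms
sigma = 0.05                           # ms
TD = {0: 2.0, 1: 3.0}                  # ms, from the disk model of Section 4.3

pairs = [(i, j) for i in range(S) for j in services[i]]
n = len(pairs)

# Row for (i, j): T(i,j) - (x_j/theta_j) sum_{l != i} lam_lk (theta_k+sigma) T(l,k)
#               = x_j + TD_j + (x_j/theta_j) (sigma - sum_{l != i} lam_lk (theta_k+sigma) TD_k)
B = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
b = [0.0] * n
for r, (i, j) in enumerate(pairs):
    coef = x[j] / theta[j]
    b[r] = x[j] + TD[j] + coef * sigma
    for c, (l, k) in enumerate(pairs):
        if l != i:
            B[r][c] = -coef * lam[(l, k)] * (theta[k] + sigma)
            b[r] -= coef * lam[(l, k)] * (theta[k] + sigma) * TD[k]

# Solve B T = b by Gaussian elimination (small, diagonally dominant system).
for p in range(n):
    for r in range(p + 1, n):
        f = B[r][p] / B[p][p]
        for c in range(p, n):
            B[r][c] -= f * B[p][c]
        b[r] -= f * b[p]
T = [0.0] * n
for r in range(n - 1, -1, -1):
    T[r] = (b[r] - sum(B[r][c] * T[c] for c in range(r + 1, n))) / B[r][r]

for (i, j), t in zip(pairs, T):
    print(f"T({i},{j}) = {t:.3f} ms")
```

The solved vector can be checked by substituting it back into (6) and (7); any off-the-shelf linear solver would serve equally well here.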

4.3. Analysis of the disk subsystem

In this section, we present a model to study the time spent in the disk subsystem. While being processed at the CPU,
a service may initiate a number of disk I/O requests each visiting the disk to execute a read or write operation. There
are a number of queueing models reported in the literature for disk behavior. In [17,18], M/G/1-type disk
performance models were presented. Although these models address single request streams, no attention has been paid to
the limited number of jobs that can be present in the disk subsystem. Recall that each SP is able to pick up a new request only after
the current request is served. This implies that there can be at most as many jobs in the disk subsystem as there are SPs in
the server node. For this reason, a more elaborate queueing model is necessary to study the disk subsystem, simply due to
the fixed number of sources generating I/O requests.
Note that I/O requests are issued by SP i with rate

$$\gamma_i = \sum_{j=1}^{N_S(i)} N_d(j)\,\lambda_{ij}, \qquad i=1,2,\ldots,S, \qquad (9)$$

where γ_i is the rate of the combined stream of I/O requests from all the services provided by SP i, and N_d(j) is the
number of disk I/Os issued by a request for service j. Let us assume that disk I/O requests of a given service all have the same disk service
time distribution. If we define the arrival stream associated with a given service as a class, this constitutes a multi-class
queueing system with a fixed population of arrival sources. The exact analysis of such a model requires tracking each class of I/O
requests, even with their order of arrivals. For tractability purposes, we will assume identical arrival sources (SPs) with
an average I/O request rate of γ = (γ_1 + γ_2 + ⋯ + γ_S)/S, as suggested by Kelly [19]. Now, we can merge all the arrival
processes into a single arrival class with the overall state-dependent arrival rate

$$\alpha_j = \begin{cases} \gamma\,(S-j), & 0 \le j < S,\\ 0, & j \ge S,\end{cases} \qquad (10)$$

indicating that I/O requests arrive at the disk queue with rate α_j when j requests are already waiting or being
processed in the disk subsystem. Furthermore, in practice, the large number of I/O requests arriving at the disk suggests
a Poisson assumption for the overall arrival process to the disk subsystem. We have also verified this assumption via
simulation and found reasonable support, as discussed in Section 5. Thus, the arrival process at the disk subsystem
will be a Poisson process with state-dependent arrival rate α_j.
Clearly, when the arrival classes are merged, their service times have to be merged as well. Accordingly, we suggest modeling
the combined service time using a hyper-exponential distribution with the CDF

$$F_Y(t) = 1 - \sum_{j=1}^{N_S} u_j e^{-\mu_j t}, \qquad (11)$$

where $u_j = \sum_i N_d(j)\lambda_{ij} \big/ \sum_i\sum_j N_d(j)\lambda_{ij}$, with $0 < u_j < 1$ and $\sum_{j=1}^{N_S} u_j = 1$. The implicit assumption in (11) is that the
disk service times are exponentially distributed. In fact, the original disk service times can have more general distributions
than the exponential, in which case (11) will be more involved, yet still manageable. We have modeled the disk subsystem
as a single-class M/G/1 system with state-dependent arrival rates. We will study this queue using an analysis similar
to the one presented in [20].
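As a small illustration of Eqs. (9)-(11), the following sketch computes the per-SP I/O rates, their average (the identical-source rate γ), and the hyper-exponential mixing weights u_j; all numbers are hypothetical:

```python
# Sketch: merged disk arrival rate and hyper-exponential mixing weights,
# following Eqs. (9)-(11). Illustrative, hypothetical inputs.

S = 2
lam = {(0, 0): 0.02, (0, 1): 0.01, (1, 1): 0.03}   # arrival rate of service j at SP i (per ms)
Nd = {0: 3, 1: 2}                                   # disk I/Os per request of service j
mu = {0: 0.5, 1: 0.4}                               # disk I/O service rates (per ms)

# Eq. (9): combined I/O request rate issued by each SP.
gamma_i = [sum(Nd[j] * r for (i, j), r in lam.items() if i == sp) for sp in range(S)]

# Identical-source approximation: averaged rate, used in Eq. (10) as alpha_j = gamma*(S - j).
gamma = sum(gamma_i) / S

# Eq. (11): the mixing weight of class j is its share of the total I/O request rate.
total = sum(Nd[j] * r for (_, j), r in lam.items())
u = {j: sum(Nd[k] * r for (_, k), r in lam.items() if k == j) / total for j in mu}

print(gamma_i, gamma, u)
```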
Let P_n be the steady-state probability that n I/O requests are in the disk subsystem at any point in time. Using the M/G/1
analysis from [21], we obtain

$$P_n = \alpha_0 P_0\,E(A_{1n}) + \sum_{l=1}^{n} \alpha_l P_l\,E(A_{ln}), \qquad n=1,2,\ldots,S, \qquad (12)$$

where A_{lk} is the amount of time that k jobs are in the disk subsystem during a service time that started with l jobs in
the system, l ≤ k, and E(A_{lk}) is given by

$$E(A_{lk}) = \int_0^\infty \binom{S-l}{k-l}\,(1-e^{-\gamma t})^{k-l}\,(e^{-\gamma t})^{S-k}\,[1-F_Y(t)]\,\mathrm dt, \qquad 1 \le l \le k \le S. \qquad (13)$$

In general, the computation of E(A_{lk}) is somewhat involved. One may use a numerical method to obtain E(A_{lk}) for
an M/G/1 queue with general service times. However, in the case of hyper-exponential service times with the CDF
given in (11), we have obtained the closed-form expression

$$E(A_{lk}) = \sum_{i=1}^{N_S} \frac{u_i\,C_{lk}\,\Gamma(\nu_{ik})\,\Gamma(\eta_{lk})}{\gamma\,\Gamma(\nu_{ik}+\eta_{lk})}, \qquad (14)$$

where

$$\nu_{ik} = \frac{\mu_i}{\gamma} + S - k, \qquad \eta_{lk} = k - l + 1, \qquad C_{lk} = \binom{S-l}{k-l},$$

and Γ(·) is the Gamma function.


Finally, P_0 and P_n, n = 1, 2, …, S, are obtained through normalization. Note that the disk utilization is given by
ρ_D = 1 − P_0.
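A sketch of this computation, assuming the state-dependent rates of Eq. (10), the recursion (12) and the closed form (14) (all parameter values hypothetical):

```python
import math

# Sketch: queue-length distribution of the disk subsystem from the recursion (12),
# using the state-dependent rates of Eq. (10) and the closed-form E(A_lk) of
# Eq. (14). All parameter values are hypothetical.

S = 4                 # number of SPs (arrival sources)
gamma = 0.07          # averaged per-source I/O request rate (per ms)
u = [3 / 7, 4 / 7]    # hyper-exponential mixing weights
mu = [0.5, 0.4]       # branch rates (per ms)

def EA(l, k):
    """Closed-form E(A_lk) of Eq. (14), for 1 <= l <= k <= S."""
    C = math.comb(S - l, k - l)
    eta = k - l + 1
    total = 0.0
    for ui, mi in zip(u, mu):
        nu = mi / gamma + S - k
        total += ui * C * math.exp(math.lgamma(nu) + math.lgamma(eta)
                                   - math.lgamma(nu + eta)) / gamma
    return total

alpha = [gamma * (S - j) if j < S else 0.0 for j in range(S + 1)]  # Eq. (10)

# Recursion (12): the l = n term contains P_n itself, so move it to the left side,
# compute the P_n up to a constant, and normalize at the end.
P = [1.0] + [0.0] * S
for n in range(1, S + 1):
    rhs = alpha[0] * P[0] * EA(1, n) + sum(alpha[l] * P[l] * EA(l, n)
                                           for l in range(1, n))
    P[n] = rhs / (1.0 - alpha[n] * EA(n, n))
norm = sum(P)
P = [p / norm for p in P]

rho_D = 1.0 - P[0]    # disk utilization
print([round(p, 4) for p in P], round(rho_D, 4))
```

The closed form (14) can be cross-checked against a direct numerical evaluation of the integral (13).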
Recall that we know both the arrival and service rates for all the services at the disk subsystem, and therefore we
know the actual disk utilization in the original disk subsystem. However, when the approximate rate γ is used to obtain
the disk utilization with the above approach, a discrepancy naturally occurs between the actual disk utilization and its
approximate value. To circumvent this situation, we have modified the arrival rate γ so as to match the resulting
disk utilization to the one in the original system, the so-called target utilization ρ_T. We have employed the Hooke
and Jeeves search procedure [21] to obtain the arrival rate resulting in the target utilization. Thus, by using the above
approach, we have enforced the anticipated utilization onto the disk subsystem. This remedies some of the inaccuracies
in the approximate decomposition of the complex middleware network. Our numerical experience shows that the use
of ρ_T significantly improves the accuracy of the approximation.
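The search itself can be any derivative-free one-dimensional procedure; since the disk utilization is monotone in the arrival rate, even a plain bisection suffices. The sketch below calibrates γ against a target utilization using a toy single-source utilization function (Hooke and Jeeves would proceed analogously with exploratory steps); all numbers are hypothetical:

```python
# Sketch of the utilization-matching step: adjust the averaged I/O rate gamma so
# that the model's disk utilization hits the target rho_T obtained from the
# original (exact) rates. The paper uses a Hooke-and-Jeeves pattern search; the
# utilization is monotone in gamma, so a simple bisection illustrates the idea.
# rho_of below is a toy single-source (machine-repairman) utilization function.

EY = 2.286            # mean disk service time (ms), hypothetical
rho_T = 0.35          # target disk utilization

def rho_of(gamma):
    # Single source: utilization gamma*EY / (1 + gamma*EY), monotone in gamma.
    return gamma * EY / (1.0 + gamma * EY)

lo, hi = 0.0, 10.0    # bracket; rho_of(10) is close to 1, above rho_T
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if rho_of(mid) < rho_T:
        lo = mid
    else:
        hi = mid
gamma_star = 0.5 * (lo + hi)
print(gamma_star, rho_of(gamma_star))
```

In the actual procedure, rho_of would be replaced by the finite-source disk model of this section evaluated at the candidate rate.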

4.4. Analysis of server process queues

In this section, we characterize the components of an SP queue and propose a model for its stationary analysis to
obtain the associated mean waiting time. Service requests arrive at an SP queue from a number of sources including
clients and possibly other server nodes. It is not unrealistic to assume Markovian arrivals per class of service requests
at each SP queue. Furthermore, queue capacities are assumed to be infinite. Service times, on the other hand, are more
involved. Since an SP cannot pick up the next request (if any) in the queue before processing of the current request is
over, the service time at an SP queue is simply the elapsed time. Thus, we will model SP queues as M/G/1 queues with
multiple classes of customers. No customer class has priority over the others in an SP queue.
The mean waiting time in SP queue i is the same for all classes (no priorities) and is given by

$$W_i = \sum_{j=1}^{N_S(i)} \rho_{ij}\,\bigl(Cv_T^2(i,j)+1\bigr)\,\frac{T(i,j)}{2} \Bigg/ \Biggl(1 - \sum_{j=1}^{N_S(i)} \rho_{ij}\Biggr), \qquad i=1,2,\ldots,S, \qquad (15)$$

where $Cv_T^2(i,j)$ is the squared coefficient of variation of the elapsed time of service j offered by SP i, given by

$$Cv_T^2(i,j) = \frac{\mathrm{Var}[T(i,j)]}{T(i,j)^2}. \qquad (16)$$
The analysis in Section 4.2 provides us with the mean elapsed times. Note that to be able to evaluate Eq. (15), one
also needs the variance of the elapsed times. Therefore, in the remainder of this section, we focus on approximating these variances.
The variance of the elapsed time is fundamentally more involved. In the absence of exact expressions, an approximation
is proposed in this section, instead of assuming a more simplistic exponential elapsed time. Using Eq. (4),
we can write

Var[T(i, j)] = Var[T_C(i, j) + T_D(j)],

or

Var[T(i, j)] = Var[W_C(i, j) + x_j + W_D(i, j) + S_D(j)].   (17)

Assuming independence between the behaviors of the CPU and disk subsystems, one can write

Var[T(i, j)] ≈ Var[W_C(i, j)] + Var[W_D(i, j)] + Var[S_D(j)].   (18)

The variance of the disk service time for service j, Var[S_D(j)], is known from the analysis in Section 4.3. Note that
Var[S_D(j)] can either be the variance of the implicitly assumed exponential disk service time or the variance of the
combined distribution given by (11). We have used the former.
Let us next focus on the variance of the time in the disk subsystem. When a request arrives at the disk queue, its
anticipated waiting time is given by

$$W_D(i,j) = \sum_{m=1}^{M_D-1} S_D + S_{Dr} \qquad \text{for all } i,j, \qquad (19)$$
where
M_D is the number of requests seen by an arrival in the disk subsystem,
S_D is the disk service time introduced in (11), and
S_Dr is the remaining disk service time.
The variance of the disk waiting time is given by

$$\mathrm{Var}(W_D(i,j)) = \mathrm{Var}\Biggl(\sum_{m=1}^{M_D} S_D\Biggr) = E(M_D)\,\mathrm{Var}(S_D) + \mathrm{Var}(M_D)\,E^2(S_D), \qquad (20)$$

where E(MD ) and Var(MD ) are obtained using the probability distribution of the number of requests in the disk.
Variance of the CPU waiting time can be written in a similar manner shown in Appendix B. Using Eqs. (18), (20) and
the analysis in the Appendix, we can evaluate (15) to obtain mean waiting times in SP queues.
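The compound-sum variance identity behind Eq. (20) holds whenever the summands are i.i.d. and independent of the random number of terms. A quick Monte Carlo sanity check of that identity, with arbitrary stand-in distributions for $M_D$ and the disk service time (assumed here, not the paper's workload), is sketched below.

```python
# Monte Carlo sanity check of the compound-sum variance identity in Eq. (20):
# Var(sum_{i=1}^{M_D} S_D) = E(M_D) Var(S_D) + Var(M_D) E^2(S_D),
# valid for i.i.d. service times independent of M_D. The distributions of
# M_D and S_D below are arbitrary stand-ins, not the paper's workload.
import random
import statistics

random.seed(7)
MEAN_S = 12.0                              # assumed mean disk service (ms)

def one_sample():
    m = random.randint(0, 6)               # M_D: requests found at the disk
    return sum(random.expovariate(1.0 / MEAN_S) for _ in range(m)), m

waits, counts = zip(*(one_sample() for _ in range(200_000)))

empirical = statistics.pvariance(waits)
predicted = (statistics.mean(counts) * MEAN_S ** 2     # exponential: Var = mean^2
             + statistics.pvariance(counts) * MEAN_S ** 2)
print(f"empirical {empirical:.0f}  predicted {predicted:.0f}")
```

With $E(M_D)=3$ and $\operatorname{Var}(M_D)=4$ here, the predicted variance is close to $3\cdot144+4\cdot144=1008$, and the empirical estimate agrees to within sampling error.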
By means of the above approach, we are able to obtain the mean delays and the mean residence time at the server
node, as well as the utilization of CPU, disk, and all the SPs, which are the bottleneck indicators of a server node. In
the next section, we provide numerical examples.

5. Numerical results

In this section, the accuracy of the approximations proposed in this paper is examined. A series of numerical
experiments are presented in which approximation results are compared against the results obtained from a simulation
model of the server node. A C/S system simulation tool, named CP_Tool [16], was designed and has been used at Rutgers
University to test performance of various TP systems, TUXEDO in particular. The CP_Tool is based on ARENA1
general purpose simulator and it has the capability of modeling multi-tier applications involving clients, networks and
server nodes. The depth of server node modeling is at the level of CPU, disk I/O and service features. The CP_Tool was
validated using the measurements and benchmarking data from BEA Systems Inc. In our numerical experiments, the CP_Tool was run for 10^6 transactions, producing point estimates with 95% confidence intervals for various transaction delays and resource utilizations, which serve as bottleneck indicators. In all these examples, the computation times have been negligible since the proposed approach is a one-step decomposition that solves the subsystems directly rather than iteratively.
In the first set of examples, we consider a server node with three types of services provided by four SPs. We assume
that the CPU quantum time is 1 ms and the context switch time is negligible. The CPU busy times and disk service
times are shown in Table 1.
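Given the attributes of Table 1, the offered loads follow directly as $\rho_{CPU}=\sum_j \lambda_j x_j$ and $\rho_{disk}=\sum_j \lambda_j N_d(j)\bar S_D(j)$. The snippet below computes both for an assumed arrival-rate vector (the paper sweeps these rates rather than fixing them).

```python
# Offered-load calculation for the Table 1 workload: CPU utilization is
# sum_j lambda_j * x_j and disk utilization is sum_j lambda_j * N_d(j) * S_D(j).
# The arrival rates are assumed for illustration; the paper varies them.
services = {          # service: (CPU busy time ms, no. of I/Os, I/O time ms)
    1: (20.0, 2, 10.0),
    2: (30.0, 3, 15.0),
    3: (10.0, 1, 20.0),
}
arrival = {1: 0.004, 2: 0.003, 3: 0.005}   # requests per ms (assumed)

cpu_util = sum(arrival[s] * cpu for s, (cpu, _, _) in services.items())
disk_util = sum(arrival[s] * n * io for s, (_, n, io) in services.items())
print(f"CPU {cpu_util:.3f}  disk {disk_util:.3f}")   # -> CPU 0.220  disk 0.315
```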
We first conducted a series of experiments to test the independence and exponentiality of the inter-arrival times at the disk subsystem, which was the main approximation in the proposed decomposition approach. We have used various arrival rates to generate five cases with CPU utilizations ranging from 0.1 to 0.6 and disk utilizations ranging from 0.1 to 0.8, which are typical resource utilizations for a TP environment. Table 2 provides autocorrelations for lags 1-5 of various inter-arrival scenarios with varying CPU utilization, disk utilization, and inter-arrival time variability. Notice that cases 3-5 are a lot more realistic as far as resource utilizations are concerned. The results suggest that as the arrival rate increases, the I/O arrival process gets closer to a Poisson process, with inter-arrival times having a Cv² closer to 1.0, supporting the exponentiality assumption. Also, in all cases, autocorrelations at lags 1-5 showed no strong evidence of dependency.
Table 1
Service attributes

Service                            1    2    3
CPU busy time (ms)                20   30   10
No. of disk I/Os per service       2    3    1
Mean disk I/O service time (ms)   10   15   20
1 ARENA is a registered trademark of Rockwell Software Inc.



Table 2
Inter-arrival time variability and autocorrelations at the disk subsystem

Case 1 Case 2 Case 3 Case 4 Case 5

CPU utilization (%) 11 25 39 49 57


Disk utilization (%) 13 32 52 66 79
Cv² of inter-arrival times 2.6747 1.9441 1.4575 1.1689 0.9897
Lag 1 autocorrelation 0.06248 0.03454 0.0076 0.02326 0.03308
Lag 2 autocorrelation 0.00366 0.030129 0.037553 0.013733 0.01582
Lag 3 autocorrelation 0.03803 0.024659 0.032313 0.020589 0.00463
Lag 4 autocorrelation 0.01896 0.008935 0.025355 0.015307 0.00682
Lag 5 autocorrelation 0.00340 0.014139 0.024261 0.022182 0.00276

Table 3
Simulation vs. analytical results for the mean elapsed times with various CPU and disk utilizations

Elapsed T (1, 1) T (1, 2) T (1, 3) T (2, 1) T (2, 2) T (2, 3) T (3, 1) T (3, 2) T (3, 3) T (4, 1) T (4, 2) T (4, 3) Average
time error (%)

Case 1 Analytical 53.55 94.98 36.87 47.57 92.28 36.87 53.48 92.89 37.14 51.25 92.84 36.42
Simulation 47.33 86.12 34.51 44.94 82.81 32.72 46.87 85.03 34.14 45.27 85.11 33.43
Error (%) 13.15 10.29 6.84 5.86 11.43 12.69 14.11 9.24 8.78 13.20 9.09 8.96 10.30

Case 2 Analytical 59.24 103.37 41.55 56.62 101.36 40.96 59.38 103.91 41.91 58.50 102.56 37.31
Simulation 55.13 98.30 37.78 52.81 94.17 36.38 54.79 98.65 37.80 53.22 95.55 36.41
Error (%) 7.46 5.16 9.98 7.21 7.63 12.59 8.38 5.33 10.88 9.92 7.33 2.48 7.86

Case 3 Analytical 64.25 111.12 42.38 63.26 109.66 41.86 64.25 111.12 42.38 63.26 109.66 41.86
Simulation 67.30 116.35 43.93 62.88 110.23 42.33 66.80 117.27 44.10 63.20 112.01 42.40
Error (%) 4.53 4.50 3.53 0.60 0.52 1.11 3.82 5.24 3.90 0.09 2.10 1.27 2.60

Case 4 Analytical 73.03 124.19 46.86 71.76 122.33 46.19 73.03 124.19 46.86 71.76 122.33 46.19
Simulation 77.46 133.05 49.26 72.50 124.47 46.66 78.09 133.60 48.56 72.70 125.00 45.70
Error (%) 5.72 6.66 4.87 1.02 1.72 1.01 6.48 7.04 3.50 1.29 2.14 1.07 3.54

Case 5 Analytical 83.68 139.51 50.32 80.43 135.03 49.84 81.54 137.07 51.23 80.71 135.30 50.62
Simulation 92.16 154.70 55.78 85.55 144.29 52.52 92.24 153.43 56.18 85.07 142.80 52.78
Error (%) 9.20 9.82 9.79 5.98 6.42 5.11 11.60 10.67 8.81 5.12 5.26 4.09 7.66

Numerical results for the elapsed times in all five cases are shown in Table 3. The relative error is under 10% most of the time, with a range from 0.09% to 12.59% and an average of 6.39%. The average relative error of the elapsed times drops from 10.30% to 7.86% when the CPU utilization increases from 11% to 25%. With 39% CPU utilization and 52% disk utilization, the average relative error of the elapsed time is 2.60%. As the system workload increased to loads of 49% and 57% CPU utilization, the average error in the elapsed times increased to 3.54% and 7.66%, respectively.
When the server load is low, the proposed approach tends to over-estimate the elapsed times. On the other hand, the elapsed times are under-estimated when the server traffic increases. In all cases, these errors are within acceptable ranges for performance evaluation of TP systems.
To explore the source of errors in the proposed approach, we next experimented with the set of examples studied
above but without any disk I/O activity. We compared our results for the elapsed times against simulation under different
CPU utilization levels, as shown in Table 4.
When there is no disk I/O, the proposed approach over-estimates the elapsed time, which is mostly the CPU time,
in all cases. The average relative error increases from 0.78% to 6.12% when the CPU utilization increases from 11%
to 57%. The proposed approach performs remarkably well when services do not require any disk I/O. This is a clear
indication that decomposing the disk subsystem from the CPU contributes to inaccuracy in the approximation. Another observation from Table 4 is that the overall elapsed time approximations degrade when the server node traffic increases, as the average error column indicates.
To further investigate the impact of disk utilization on our approximation, we have included additional examples
shown in Table 5 where we maintained a xed CPU utilization while varying disk utilization by increasing disk I/O

Table 4
Simulation vs. analytical results for the mean elapsed times with no disk I/O

CPU Elapsed T (1, 1) T (1, 2) T (1, 3) T (2, 1) T (2, 2) T (2, 3) T (3, 1) T (3, 2) T (3, 3) T (4, 1) T (4, 2) T (4, 3) Average
utilization times error

11% Analytical 22.05 33.08 11.03 21.81 32.72 10.91 22.05 33.08 11.03 21.81 32.72 10.91
Simulation 21.81 33.15 11.01 21.67 32.31 11.17 21.79 32.95 11.06 21.79 32.42 10.98
Error (%) 1.10 0.22 0.14 0.65 1.26 2.37 1.20 0.38 0.31 0.10 0.91 0.68 0.78

25% Analytical 24.53 36.79 12.26 24.01 36.02 12.01 24.53 36.79 12.26 24.01 36.02 12.01
Simulation 24.45 36.68 12.15 23.91 35.57 11.91 24.54 36.43 12.18 23.95 35.29 11.67
Error (%) 0.31 0.30 0.93 0.42 1.25 0.80 0.06 0.99 0.68 0.25 2.06 2.87 0.91

39% Analytical 28.76 43.14 14.38 27.80 41.70 13.90 28.76 43.14 14.38 27.80 41.70 13.90
Simulation 28.43 42.99 14.08 27.46 40.76 13.43 28.37 43.07 14.17 27.00 41.19 13.48
Error (%) 1.17 0.36 2.14 1.23 2.30 3.49 1.39 0.17 1.49 2.95 1.23 3.11 1.75

49% Analytical 31.44 47.17 15.72 30.20 45.30 15.10 31.44 47.17 15.72 30.20 45.30 15.10
Simulation 30.18 46.53 15.32 28.77 43.48 14.51 30.41 46.85 15.29 29.09 44.24 14.31
Error (%) 4.19 1.37 2.62 4.98 4.20 4.08 3.40 0.67 2.83 3.83 2.41 5.53 3.34

57% Analytical 36.5 54.74 18.25 34.75 52.12 17.37 36.5 54.74 18.25 34.75 52.12 17.37
Simulation 34.62 52.43 17.23 32.91 48.71 16.33 34.67 51.02 17.18 32.48 49.09 16.27
Error (%) 5.43 4.41 5.92 5.59 7.00 6.37 5.28 7.29 6.23 6.99 6.17 6.76 6.12

Table 5
Simulation vs. analytical results for the mean elapsed times with varying disk utilizations

Disk Elapsed T (1, 1) T (1, 2) T (1, 3) T (2, 1) T (2, 2) T (2, 3) T (3, 1) T (3, 2) T (3, 3) T (4, 1) T (4, 2) T (4, 3) Average
utilization times error

0% Analytical 38.54 57.82 19.27 36.60 54.89 18.30 38.54 57.82 19.27 36.60 54.89 18.30
Simulation 37.81 57.48 18.93 35.73 52.90 17.43 37.56 57.29 18.79 35.14 53.15 17.88
Error (%) 1.93 0.59 1.82 2.41 3.76 4.98 2.61 0.91 2.55 4.14 3.27 2.35 2.61

14% Analytical 38.54 77.36 19.27 36.60 74.43 18.30 38.54 77.36 19.27 36.60 74.43 18.30
Simulation 37.59 73.58 18.86 35.21 70.73 17.44 37.27 73.32 18.70 35.57 70.41 17.58
Error (%) 2.55 5.13 2.21 3.92 5.24 4.91 3.41 5.51 3.07 2.89 5.72 4.11 4.06

27% Analytical 38.54 95.04 19.27 36.60 92.11 18.30 38.54 95.04 19.27 36.60 92.11 18.30
Simulation 36.06 94.64 17.89 34.07 86.88 16.98 36.17 95.11 18.13 34.10 88.06 16.74
Error (%) 6.89 0.42 7.73 7.41 6.02 7.76 6.57 0.08 6.30 7.32 4.60 9.31 5.87

42% Analytical 38.54 111.88 19.27 36.60 108.95 18.30 38.54 111.88 19.27 36.60 108.95 18.30
Simulation 36.50 122.49 18.19 34.15 120.45 17.00 36.73 121.98 18.01 34.44 117.98 17.03
Error (%) 5.60 8.66 5.95 7.16 9.54 7.64 4.94 8.28 7.01 6.26 7.65 7.45 7.18

55% Analytical 38.54 148.38 19.27 36.60 145.45 18.30 38.54 148.38 19.27 36.60 145.45 18.30
Simulation 36.76 161.18 18.47 35.24 154.89 17.60 37.17 162.05 18.57 34.80 154.88 17.03
Error (%) 4.86 7.94 4.34 3.85 6.09 3.97 3.70 8.44 3.78 5.16 6.09 7.45 5.47

requirements of a given service provided by the server. In this new set, service 2 is the only service with disk activity.
From Table 5, we observe that the overall average relative error increases as the server load increases. We have also observed that the error in the elapsed time of service 2, which requires disk activity, contributes the most to the average error in all cases. When the disk utilization is low, the proposed approach over-estimates the disk waiting times of service 2. On the other hand, the disk waiting times are under-estimated when the server load increases.
One can deduce from these examples that decomposing the disk subsystem from the CPU is the major contributor to the inaccuracy in the approximation, and that the error grows as the disk utilization increases. The relative errors are higher for the elapsed times of SPs with higher disk activity. Furthermore, Tables 3 and 5 reveal that in those cases

Table 6
Service allocation and the arrival rates at server processes (10⁻³ ms⁻¹)

SP1 SP2 SP3 SP4 SP5 SP6 SP7 SP8 SP9 SP10

Service 1 0.4 0.1 0.15 0.1


Service 2 0.2 0.2 0.3 0.08
Service 3 0.3 0.4 0.2 0.05
Service 4 0.4 0.3 0.1 0.04
Service 5 0.3 0.4 0.4 0.02
Service 6 0.2 0.5 0.5 0.05
Service 7 0.6 0.4 0.3 0.02
Service 8 0.4 0.3 0.3 0.1
Service 9 0.3 0.4 0.1 0.07
Service 10 0.1 0.5 0.4 0.04
Service 11 0.1 0.4 0.5 0.05
Service 12 0.1 0.4 0.3 0.1
Service 13 0.3 0.1 0.5 0.1
Service 14 0.3 0.3 0.3 0.15
Service 15 0.2 0.1 0.5 0.03

Table 7
Service attributes

CPU time (ms) Disk time (ms) # of disk I/Os

Service 1 20 0 0
Service 2 15 15 1
Service 3 30 0 0
Service 4 25 25 1
Service 5 18 0 0
Service 6 15 8 2
Service 7 12 30 4
Service 8 16 5 4
Service 9 15 10 3
Service 10 25 7 4
Service 11 25 14 5
Service 12 40 28 3
Service 13 50 5 2
Service 14 24 0 0
Service 15 34 6 3

with lower CPU utilization, since the method over-estimates both the CPU and disk waiting times, it results in higher
relative error in the approximation. When the system load is high, the proposed approach under-estimates the average
disk waiting times while still over-estimating the CPU waiting times. The approximate results are highly acceptable for
cases with medium and high CPU utilization with low and medium disk utilization. Fortunately, in reality, high disk
utilization is not desirable due to poor system performance; therefore, we expect our approach to be quite suitable for
performance analysis of typical TP systems.
The complexity of the system increases with the increased number of services and SPs running on a server node.
To test the scalability of our approach, we carried out an experiment on a server node with 10 SPs, which offer 15
different types of services. The allocation of services provided by the SPs and the request arrival rates at each SP are
given in Table 6. For example, SP1 offers services 1, 2, 3, 4, 5; SP 2 provides services 6, 7, 8, 9, 10, etc. The disk
service times and the CPU busy times for each service are shown in Table 7. In this example, we have considered a
mixture of services, some of which are CPU bound such as services 1, 3, 5, 14, disk bound such as services 7, 11, and
other services which require a mixture of CPU and I/O processing.
The corresponding CPU and disk utilizations are 35% and 46%, respectively. Our disk analysis provides an average approximate
disk queue waiting time of 12.38 ms, whereas the simulation result is 11.42 ms, with a relative error of 8.39%. The
results of the SP queue analysis are shown in Tables 8 and 9.

Table 8
Mean elapsed times

Server process Service Analytical Simulation Error (%)

1 1 27.75 29.64 6.36


2 48.58 52.38 7.25
3 41.63 43.36 3.99
4 72.46 76.35 5.09
5 24.98 26.22 4.73

2 6 62.48 59.04 5.82


7 187.42 187.93 0.27
8 93.05 77.10 20.69
9 89.06 79.90 11.47
10 113.72 99.74 14.01

3 11 167.78 170.69 1.71


12 177.46 189.71 6.46
13 104.77 111.99 6.45
14 33.32 35.10 5.06
15 103.13 106.32 3.00

4 3 41.53 45.13 7.97


4 72.38 74.67 3.07
5 24.92 26.99 7.67
6 62.10 60.69 2.33
7 187.10 192.02 2.56

5 8 92.74 79.73 16.31


9 88.77 83.64 6.13
10 113.25 102.09 10.93
11 167.82 154.40 8.69
12 177.52 177.94 0.24

6 1 28.16 29.65 5.04


2 48.90 51.58 5.19
13 105.76 117.89 10.29
14 33.79 35.05 3.60
15 103.82 106.94 2.92

7 5 25.28 26.97 6.27


6 62.43 61.44 1.62
7 187.38 187.84 0.24
8 93.00 85.09 9.30
9 89.01 85.39 4.24

8 10 112.17 103.06 8.84


11 166.72 159.99 4.21
12 175.87 180.68 2.66
13 102.82 104.08 1.21
14 32.42 33.63 3.60

9 1 27.84 29.05 4.17


2 48.65 52.68 7.65
3 41.76 43.56 4.14
4 72.57 75.51 3.89
15 103.25 108.59 4.92

10 1 28.25 30.18 6.39


2 48.98 48.25 1.52
3 42.38 45.17 6.18
4 73.11 72.56 0.76
5 25.43 27.54 7.67
6 62.57 68.80 9.06
7 187.50 186.19 0.71

Table 8 (continued)

8 93.15 91.67 1.62


9 89.15 91.89 2.98
10 113.87 117.03 2.70
11 168.45 164.62 2.33
12 178.47 196.60 9.22
13 106.01 112.19 5.51
14 33.90 35.57 4.69
15 103.99 100.79 3.18

Table 9
Mean waiting times in server process queues
SP Analytical Simulation (95% CI) Relative error (%)

1 2.72 2.86(0.12) 4.60


2 25.11 21.13 (0.16) 18.88
3 8.14 8.54 (0.16) 4.73
4 16.61 15.68 (0.23) 5.97
5 32.89 31.64 (0.55) 3.89
6 2.10 2.52 (0.11) 16.66
7 14.01 12.41 (0.14) 12.94
8 30.19 29.28 (0.17) 3.08
9 5.51 5.93 (0.12) 7.34
10 7.20 8.51 (0.13) 15.19

Table 10
Mean relative error of response times across services for each SP
SP Relative error of response times (%)

1 5.48
2 10.45
3 4.53
4 4.72
5 8.46
6 5.41
7 4.33
8 4.10
9 4.95
10 4.30

In Table 8, we show the elapsed times for all the services offered by each SP. The relative errors over all the elapsed
times range from 0.24% to 20.69% with an average of 5.45%. In Table 9, we present the mean waiting time at each SP
queue. The relative errors range from 3.08% to 18.88% with an average of 9.33%. We have observed in this experiment
that jobs with larger CPU times experience larger relative errors. For those jobs with extensive I/O activity, relative
errors in most cases are below 5%. For services with a mixture of CPU execution and I/O operations, our observation is that SPs offering services requiring a large number of I/O operations, such as SP2 and SP5, experience larger relative errors in response times, as shown in Table 10, which averages the errors in response times over all services for each SP. Since both CPU-bound and I/O-bound services exhibit smaller relative errors than services requiring a mixed use of resources, we believe that the main source of error is the simplified modeling of the internal arrival process at the disk subsystem.
The error levels we have experienced in this study are highly comparable to those reported in the literature for the performance evaluation of distributed systems. However, it is difficult to directly compare our approach with the earlier LQN-type methods due to differences in assumptions. An appropriate comparison would be to implement the two approaches in a real scenario and compare the results. However, this is outside the scope of the present paper. Besides the errors of the proposed approximation, we need to point out that we did not generalize our approach to multiple-CPU and multiple-disk cases, which are a common feature of many TP systems. However, by means of a similar treatment, we can easily extend this approach to multi-server queues.
The unique advantage of the proposed approach is its capability of analyzing more general systems with non-
exponential service times, multiple job classes, and the round-robin service discipline at the CPU. The accuracy of the
results encourages us to continue our effort in embellishing these approximations and extending them to include other
middleware scenarios such as application servers.

Acknowledgement

This research has been supported by grants from the National Science Foundation, Grant # DMI 0085659. We are
thankful to Mr. Mesut Gunduc of Microsoft Corporation for extensive discussion on middleware performance. We are
also thankful to the reviewers for their constructive reviews that improved the presentation of the contents of the paper.

Appendix A. Characteristic quantum time

During the processing of a service, a request will spend a full quantum in each visit to the CPU, except in those visits during which either an I/O occurs or the service is completed. Thus, the time a service request spends on the CPU will be either a quantum or some time less than a quantum. Let us define the time service j spends on the CPU per visit as the characteristic quantum time for service j.
Let $\tau_j$ be the characteristic quantum time of service j, $x_j$ the fixed CPU busy time of service j, $\Delta$ the CPU quantum time, $Z$ the time until an I/O or the service completion occurs in a visit to the CPU, and $N_d(j)$ the number of disk I/Os per service j.
Note that while defining Z, we do not differentiate between an I/O and the service completion, which is certainly an approximation. Let us take a look at a sample path of CPU execution of a particular request, as shown in Fig. A.1. The request leaves the CPU for an I/O in the third quantum; the characteristic quantum time in this visit is $Z_1<\Delta$. After a couple of full quanta, there is another interrupt, where the characteristic quantum time in this visit is $Z_2<\Delta$. Finally, the request completes its CPU busy time and leaves the CPU, and the characteristic quantum time in this last visit is $Z_e<\Delta$.
Note that service j has $N_d(j)$ I/Os, resulting in the service request visiting the CPU $N_d(j)+1$ times, each taking less than a quantum of CPU busy time. This includes $N_d(j)$ I/O departures to the disk and the last visit to the CPU, where the service is completed. On the other hand, on average there are $(x_j-(N_d(j)+1)\bar Z)/\Delta$ CPU visits of a request for service j with characteristic quantum time $\tau_j=\Delta$. Out of the $(x_j-(N_d(j)+1)\bar Z)/\Delta+N_d(j)+1$ CPU visits of a request for service j, $N_d(j)+1$ of them have a characteristic quantum time $\tau_j=Z$. Thus, $P(\tau_j<\Delta)$ can be approximated by

$$P(\tau_j<\Delta)=\frac{N_d(j)+1}{\dfrac{x_j-(N_d(j)+1)\bar Z}{\Delta}+N_d(j)+1}.\tag{A.1}$$

We can now express the characteristic quantum time $\tau_j$ approximately in the following manner:

$$\tau_j \approx \begin{cases}\Delta & \text{w.p. } 1-\dfrac{N_d(j)+1}{\frac{x_j-(N_d(j)+1)\bar Z}{\Delta}+(N_d(j)+1)},\\[3mm] Z & \text{w.p. } \dfrac{N_d(j)+1}{\frac{x_j-(N_d(j)+1)\bar Z}{\Delta}+(N_d(j)+1)}.\end{cases}\tag{A.2}$$

Fig. A.1. A sample path of CPU execution.



As mentioned earlier, the approximation is due to the assumption that the time until an I/O occurs in any visit to the CPU and the time until the service is completed in the last visit have the same distribution. Due to lack of information, we further assume that Z is uniformly distributed over a quantum, with $\bar Z=\Delta/2$. Thus, the expected characteristic quantum time is given by

$$\bar\tau_j=\frac{x_j}{\dfrac{x_j}{\Delta}+\dfrac{N_d(j)+1}{2}}.\tag{A.3}$$
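Eqs. (A.1)-(A.3) are easy to exercise numerically. The sketch below uses the attributes of service 1 from Table 1 ($x_j=20$ ms, two disk I/Os) with the 1 ms quantum assumed in Section 5, under $\bar Z=\Delta/2$; the helper function name is ours.

```python
# Numeric exercise of Eqs. (A.1) and (A.3), assuming Z is uniform on a
# quantum (Z_bar = quantum/2). Parameters mirror service 1 of Table 1
# (x = 20 ms, 2 disk I/Os) with the 1 ms quantum of Section 5.
def char_quantum(x, nd, quantum):
    z_bar = quantum / 2.0
    visits = (x - (nd + 1) * z_bar) / quantum + (nd + 1)   # expected CPU visits
    p_short = (nd + 1) / visits               # Eq. (A.1): P(tau < quantum)
    tau_bar = x / (x / quantum + (nd + 1) / 2.0)           # Eq. (A.3)
    return p_short, tau_bar

p, tau = char_quantum(x=20.0, nd=2, quantum=1.0)
print(round(p, 4), round(tau, 4))   # -> 0.1395 0.9302
```

Note the consistency check: the expected value of the two-point mixture in Eq. (A.2), $(1-p)\Delta + p\bar Z$, reproduces the same $\bar\tau_j\approx 0.93$ ms.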

Appendix B. Number of requests in the CPU runnable queue

The purpose of this analysis is to obtain the steady-state probabilities of the number of requests in a queue that would approximate the behavior of the CPU runnable queue when the disk activity is removed. Note that the ultimate goal is to approximate the variance of the waiting time in the runnable queue to be used in Section 4.4. To achieve this, we propose to model the CPU runnable queue as an M/G/1 queue with state-dependent arrivals. Since each service has a fixed CPU busy time $x_j$, the CPU service time will be given by

$$S_C=x_j \quad \text{w.p. } u_j,\tag{B.1}$$

where $u_j=\sum_i \lambda_{ij}\big/\sum_i\sum_j \lambda_{ij}$ and $\sum_{j=1}^{N_S} u_j=1$.
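The mixture probabilities $u_j$ in Eq. (B.1) are simply the arrival-rate shares of each service aggregated over all SPs. A minimal sketch, with a made-up rate matrix $\lambda_{ij}$:

```python
# Sketch of the service-time mixture of Eq. (B.1): the CPU service time is
# x_j with probability u_j = (sum_i lambda_ij) / (sum_i sum_j lambda_ij).
# The rate matrix (rows = SPs i, columns = services j) is a made-up example.
lam = [[0.40, 0.20, 0.30],       # lambda_ij, requests per ms (assumed)
       [0.10, 0.20, 0.40],
       [0.15, 0.30, 0.20]]
x = [20.0, 30.0, 10.0]           # fixed CPU busy times x_j (ms)

total = sum(sum(row) for row in lam)
u = [sum(row[j] for row in lam) / total for j in range(len(x))]
assert abs(sum(u) - 1.0) < 1e-12          # mixture probabilities sum to one
mean_sc = sum(uj * xj for uj, xj in zip(u, x))   # E[S_C]
print([round(uj, 4) for uj in u], round(mean_sc, 2))   # -> [0.2889, 0.3111, 0.4] 19.11
```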
Let $\{N_t,\,t>0\}$ be the stochastic process representing the number of jobs in the CPU subsystem at time t. Also, let $n$ $(0\le n\le S)$ be a particular state of this stochastic process, indicating a job being executed and $n-1$ jobs waiting in the runnable queue. The CPU is idle when $n=0$. Let $Q_n$ be the probability of being in state n. Recall that in a server node, there is a possibility that some requests are in the disk subsystem at any point in time. If there are $S-j$ jobs in the disk subsystem, then there are at most $j$ jobs in the CPU subsystem. If all requests from SPs reside in the disk subsystem, there will be no jobs at the CPU runnable queue. Also, recall that we have obtained the steady-state probability $P_i$ of having $i$ I/O requests in the disk subsystem in Section 4.5. Using $P_i$ and the approximate arrival rate $\bar\lambda$, the arrival rate at the CPU runnable queue can be approximately represented by

$$\lambda_j \approx \begin{cases}\bar\lambda \displaystyle\sum_{i=0}^{S-j}(S-i)P_i, & 0\le j<S,\\[2mm] 0, & j\ge S,\end{cases}\tag{B.2}$$

where $\lambda_i=\sum_j \lambda_{ij}$ and $\bar\lambda=\sum_i \lambda_i/S$.
Employing the arguments used in Section 4.5, the steady-state probabilities of the number of requests in the CPU subsystem are given by

$$Q_n=\lambda_0 Q_0 E(A_{1n})+\sum_{j=1}^{n}\lambda_j Q_j E(A_{jn}),\quad n=1,2,\ldots,S,\tag{B.3}$$

where $E(A_{jk})$ is given by

$$E(A_{jk})=\sum_{i=1}^{N_S} w_i\left[(1-e^{-\lambda_j x'_i})^{k-j+1}-(1-e^{-\lambda_j x'_{i-1}})^{k-j+1}\right]\Big/(k-j+1),\tag{B.4}$$

where $(x'_1,x'_2,\ldots,x'_{N_S})$ is obtained by arranging $(x_1,x_2,\ldots,x_{N_S})$ in increasing order, and $w_i=1-\sum_{i=1}^{N_S-1}u_i$.
Finally, $Q_n$, $n=0,1,2,\ldots,S$, can be obtained through normalization. The variance of the number of jobs waiting can be easily obtained from $Q_n$.
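Once the normalized $Q_n$ are in hand, the moments needed for the variance approximation of Section 4.4 reduce to elementary sums. A sketch with an arbitrary illustrative distribution (not output of the model):

```python
# Moments of the number of jobs in the CPU subsystem from the steady-state
# probabilities Q_n of Eq. (B.3). Q below is an arbitrary illustrative
# distribution over n = 0..4 (i.e., S = 4), not output of the model.
Q = [0.35, 0.30, 0.20, 0.10, 0.05]
assert abs(sum(Q) - 1.0) < 1e-12

mean_n = sum(n * q for n, q in enumerate(Q))
var_n = sum(n * n * q for n, q in enumerate(Q)) - mean_n ** 2
print(round(mean_n, 2), round(var_n, 2))   # -> 1.2 1.36
```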

References

[1] Gray J, Reuter A. Transaction processing: concepts and techniques. San Mateo, CA: Morgan Kaufmann; 1993.
[2] Lazowska ED, Zahorjan J, Graham GS, Sevcik KC. Computer system analysis using queueing network models. Englewood Cliffs, NJ: Prentice-Hall; 1984.
[3] Highleyman WH. Performance analysis of transaction processing systems. Englewood Cliffs, NJ: Prentice-Hall; 1989.
[4] Menascé DA, Almeida VAF. Web performance: metrics, models and methods. Englewood Cliffs, NJ: Prentice-Hall; 1998.
[5] Rolia JA, Sevcik KC. The method of layers. IEEE Transactions on Software Engineering 1995;21(8):689-700.
[6] Woodside CM, Neilson JE, Petriu DC, Majumdar S. The stochastic rendezvous network model for performance of synchronous client-server-like distributed software. IEEE Transactions on Computers 1995;44(8).
[7] Woodside CM, Neron E, Ho EDS, Mondoux B. An active-server model for the performance of parallel programs written using rendezvous. Journal of Systems and Software 1986.
[8] Petriu D, Shousha C, Jalnapurkar A. Architecture-based performance analysis applied to a telecommunication system. IEEE Transactions on Software Engineering 2000;26(11):1049-65.
[9] Petriu D, Amer H, Majumdar S, Abdull-Fatah I. Using analytic models for predicting middleware performance. In: WOSP 2000, Ottawa, Canada, 2000.
[10] Wu X, Woodside M. Performance modeling from software components. In: WOSP 2004, Redwood Shores, CA, 2004.
[11] Menascé D. Two-level iterative queueing model of software contention. In: Modeling, analysis and simulation of computer and telecommunication systems (MASCOTS 2002), Fort Worth, TX, 2002. p. 267-80.
[12] Ramesh S, Perros HG. A multi-layered queueing network model of a client-server system with synchronous and asynchronous messages. IEEE Transactions on Software Engineering 2000;26:1086-100.
[13] Ramesh S, Perros HG. A multi-layer client-server queueing network model with synchronous and asynchronous non-hierarchical messages. Performance Evaluation Journal 2001;45(4):223-56.
[14] Kahkipuro P. UML-based performance modeling framework for component-based systems. In: Dumke R, Rautenstrauch A, Schmietendorf A, Scholz A, editors. Performance engineering. Berlin: Springer; 2001.
[15] Ciemiewicz D, et al. Webmasters survival conference notes. Silicon Graphics, Inc., 1996.
[16] Altiok T, Xiong W, Gunduc M. A capacity planning tool for the Tuxedo middleware used in transaction processing systems. In: Proceedings of the winter simulation conference, 2001.
[17] Shriver E. Performance modeling for realistic storage devices. PhD dissertation, Department of Computer Science, New York University, May 1997.
[18] Wilhelm NC. A general model for the performance of disk systems. Journal of the ACM 1977;24(1):14-31.
[19] Kelly FP. Reversibility and stochastic networks. New York: Wiley; 1979.
[20] Tijms HC. Stochastic modeling and analysis: a computational approach. New York: Wiley; 1986.
[21] Reklaitis GV, Ravindran A, Ragsdell KM. Engineering optimization: methods and applications. New York: Wiley; 1983.
[22] Altiok T. Performance evaluation of manufacturing systems. Berlin: Springer; 1997.
[23] Andrade JM, Carges MT, Dwyer TJ, Felts SD. The TUXEDO system: software for constructing and managing distributed business applications. Reading, MA: Addison-Wesley; 1996.
[26] Woodside CM. Throughput calculation for basic stochastic rendezvous networks. Performance Evaluation 1989;9:143-60.
