Dynamic Thread Count Adaptation (2008)
Takeshi Ogasawara
IBM Tokyo Research Lab
Shimo-tsuruma 1623-14, Kanagawa, Japan
E-mail: takeshi@jp.ibm.com
Abstract

We propose a dynamic mechanism, thread count adaptation, that adjusts the thread counts that are allocated to services for adapting to CPU requirement variations in SMP environments. Our goal is to increase the maximum throughput while still meeting the QoS criteria for these services in dynamic workloads. Our challenge is to significantly improve response times for dynamic content on a busy well-tuned thread-pool-based system without prioritizing any specific services. Our experiments demonstrate that a prototype using our approach on J2EE middleware quickly (around every 20 ms) adjusted the thread counts for the services and that it improved the average 90th-percentile response times by up to 27% (and 22% on average) for the SPECjAppServer2004 benchmark.

[Figure 1. Multiple services and thread pools: clients send service requests over the network; requests wait in per-service queues for pool threads.]

1. Introduction

Multiple application services including Web services can be supported on a single shared-memory multiprocessor (SMP) server. The capacity of such a server is increased by recent processors that run many hardware threads on a single chip. Recently, even an entry-level server with tens of hardware threads (for example, 64 threads on a Sun Niagara2) can handle tens of thousands of clients. Many modern application services provide dynamic content, which is created at runtime and which consumes variable amounts of the system's resources. A typical example of dynamic content is webpages that are generated by application code that follows the Java Enterprise Edition standard. The code is executed by threads on application servers [41, 42]. Application servers receive requests from clients, dispatch them to threads to run the code that corresponds to the request types, and send the calculated data back to the clients.

To optimize the performance, many application servers [4, 28, 20, 35, 36] use thread pools for services [25]. A set of threads is created and reused for each service. With help from asynchronous I/O [39, 9], the arriving requests are delivered to a limited number of threads in the thread pools. Requests that are dispatched to a service but not immediately handled by a thread wait for a thread in a service queue. An example application server is shown in Figure 1.

Our goal is to improve the maximum total throughput of multiple dynamic content services of a dynamic workload that can be handled by a given multiprocessor server, where thread pool sizes are optimal on average while still meeting the time service factor (TSF) targets¹ (or QoS criteria) of these services. In other words, for a busy server that currently can handle M clients at maximum, we want to support an additional N clients while still meeting the TSF targets by improving the QoS. We seek to achieve this without prioritizing any requests of specific types (or changing the transaction mix).

Our challenge is to significantly improve response times for dynamic content on a busy well-tuned thread-pool-based system without prioritizing any specific services. According to queuing theory, additional clients on a busy system will exponentially degrade the response times. Therefore, to continue meeting the TSF targets with additional clients, the response times must be significantly improved. For application code that creates dynamic content, threads can be blocked when they access the remote database, and such blocked threads affect the CPU resources actually allocated to services. The response times of the database accesses are unpredictable. Also, it is difficult to predict the execution times of the application code at runtime. Therefore, along with dynamic changes in the request arrival rate and transaction mix, it is harder to adapt CPU shares for dynamic content than for static content [2]. Also, we need to frequently adapt the distribution of the shared CPU resource to the changes in CPU demand by controlling the threads, since changes in CPU demand can be observed frequently, such as every 20 ms.

¹ A typical example of TSF targets is the 90th-percentile response time, which specifies a time limit for 90% of the total responses.
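The queuing-theory claim above can be made concrete with a textbook illustration (ours, not from the paper): in an M/M/1 queue the mean response time is T = 1/(µ − λ), which grows without bound as the arrival rate λ approaches the service rate µ.

```java
// Illustration (not from the paper): mean response time in an M/M/1 queue,
// T = 1 / (mu - lambda), grows sharply as utilization rho = lambda/mu nears 1.
public class MM1Demo {
    // Mean response time for service rate mu and arrival rate lambda (lambda < mu).
    static double responseTime(double mu, double lambda) {
        return 1.0 / (mu - lambda);
    }

    public static void main(String[] args) {
        double mu = 100.0; // a server that completes 100 requests per second
        for (double rho : new double[] {0.5, 0.9, 0.99}) {
            double t = responseTime(mu, rho * mu);
            System.out.printf("utilization %.2f -> mean response time %.1f ms%n",
                              rho, t * 1000.0);
        }
    }
}
```

At 50% utilization this hypothetical server responds in 20 ms on average, at 90% in 100 ms, and at 99% in a full second — which is why supporting additional clients on an already busy server requires offsetting improvements in response time.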
2. Related Work

[...] was reduced. Their approach can find an optimal pool size, starting from any initial pool size. However, the benefit is small if the initial pool size is already near optimal.

Welsh et al. proposed thread pool controllers [39] on a staged event-driven architecture (SEDA). They increase the thread pool size for each stage up to the maximum number in order to allocate more CPU time whenever the TSF target of the stage is not met. Threads are removed if they become idle. Li et al. adjusted the thread pool sizes on the SEDA so that the average response time met the TSF targets by supposing that the throughput was proportional to the thread pool size and that the average response time could be controlled by the throughput [15]. These approaches control the thread pools for stages that depend on each other. Since a single task is split into stages on the SEDA, the output rate of each stage is matched with the input rate for the next stage [14]. In contrast, we are controlling thread pools for independent services. Also, the throughput is saturated on busy servers.

Lu et al. proposed a control-theoretic approach based on linear system models for controlling a pool of Web server processes [17]. They reported that they could not control fluctuations in the waiting times of requests because of nonlinear properties. Also, they showed that a system model based on low-load conditions will break down [38]. Our approach can respond to nonlinear properties such as thread blocking on a busy server.

Pyarali et al. proposed thread borrowing, which moves threads from pools with low priorities to pools with high priorities in a real-time system, RT-CORBA, on a single server [23]. This solves a priority inversion problem in which services with high priorities have no pool threads, causing them to stop functioning even though services with low priorities are still running. Since we dynamically determine which service should be prioritized based on the TSF, the prioritized services can change in our approach, while the prioritized services are fixed in RT-CORBA.

Kourai et al. proposed a framework for application-level thread scheduling for J2EE applications [13]. They can insert scheduler code into the existing code without rewriting programs. They demonstrated that a scheduler can prioritize threads in a high-priority group over those in a low-priority group. In our prototype, we modified the middleware to obtain middleware-specific data such as the average wait times for the service queues. Our mechanism that decreases the number of active threads is similar to theirs.

Much work has been done in the area of proportional-share scheduling to assign the CPU resources to services in the proportions specified by applications, at the kernel level and at the application level. Fixed proportions of CPU resource allocations among the services work efficiently if the workload of each service is static and the CPU resources assigned to the services do not change. For our target applications, the workload is dynamic and the CPU resource allocations among the services can change. We can achieve better performance than these approaches by dynamically adjusting the CPU resource allocations among the services.

3. Overview of Thread-Pool Based Middleware

Before discussing our problem in CPU sharing on thread pools, we first explain how threads handle requests, how the thread pool size is set, and how the actual CPU shares for services constantly change.

3.1. Thread Pool Bound to Service

Services have service queues and their own thread pools. A thread pool is a set of threads that are created in advance and reused by the middleware to reduce the overhead of thread creation, which is generally large for an operating system.

Threads in each thread pool obtain requests from the service queue, execute the application code that provides the service, and send the results back to the clients. Threads are not bound to connections, in contrast to the process-per-connection mode of the widely used Apache. After threads complete the requests, they obtain the next requests and continue processing. On a busy server, where the CPUs are fully used, the service queues tend to have requests that are waiting for threads (or CPUs), so threads can usually obtain requests immediately.

A server can have multiple thread pools. In practice, many vendors [20, 35, 28, 36] use multiple thread pools for multiple services on a single server when submitting their SPECjAppServer2004 results [30].

3.2. Thread Pool Size

The thread pool size can be larger than the number of processors if threads can be blocked waiting for remote data and other threads. For example, the application code can communicate with a database to create dynamic content and acquire the lock of a hash table for mutual exclusion. If the thread pool size is the same as the number of CPUs, then the CPUs with blocked threads become idle. Therefore, we need additional threads to fully utilize the CPUs.

The thread pool size tends to reach the maximum value and stay there on a busy server even if the thread pool size is dynamic. The thread pool size can start at a minimum value and increase to a maximum value. Threads are removed if they become idle. On a busy server, however, threads rarely become idle. Therefore, application servers may fix the thread pool size for a busy server to reduce the overhead of managing the number of threads.

The thread pool size should be minimized as long as all of the CPUs are fully utilized. Excess threads can degrade the throughput and response time [39, 40]. The thread pool size is initially estimated based on experience (e.g., two times the number of processors [16]) and tuned via experiments. An optimal thread pool size depends on many factors [...]

[Figure: an Avg Wait Time Profiler observes the Enq/Deq operations (Enq/DeqA, Enq/DeqB) on Service Queue A and Service Queue B.]
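The per-service worker loop described in Section 3.1 can be sketched as follows. This is a minimal illustration of ours, not WebSphere code; the class and method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch (ours, not middleware code): each service owns a queue and a
// pool of pre-created threads that repeatedly take a request, run the
// application code, and send the result back to the client.
class ServicePool {
    private final BlockingQueue<Runnable> serviceQueue = new LinkedBlockingQueue<>();

    ServicePool(int poolSize) {
        for (int i = 0; i < poolSize; i++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    try {
                        // On a busy server the queue is rarely empty, so a
                        // worker usually obtains its next request immediately.
                        Runnable request = serviceQueue.take();
                        request.run(); // run the application code for the request
                    } catch (InterruptedException e) {
                        return; // pool shutdown
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }

    // Dispatch a request to this service; it waits in the service queue
    // until a pool thread is free.
    void dispatch(Runnable request) {
        serviceQueue.add(request);
    }
}
```

Because workers block on take() only when the queue is empty, a busy server keeps every pool thread supplied with requests, matching the behavior described in Section 3.1.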
Table 1. Definitions of parameters

Nsvc — the number of service queues
Nthd — the total number of threads
B — the number of threads that are used for thread redistribution out of the total Nthd threads
∆Wi — change in the average wait time for service i
wi — weight value for normalizing the absolute values of ∆Wi
Ẇi — weighted wait time change, wi ∗ ∆Wi
M — the median point of the Ẇi
bi — the number of threads to be added or removed
P+/P− — the sets of services that can obtain or lose threads
N+/N− — the number of services in P+ and P−, respectively

[...] the following calculations for the redistributions. First, TCA calculates the median point of the weighted wait time changes, M, as M = (1/Nsvc) Σ_{i=1}^{Nsvc} Ẇi. The B redistributable threads can be moved from services whose Ẇi values are less than M to services whose Ẇi values are larger than M. For thread pools that can obtain threads (P+), the number of additional threads is proportional to the distance of Ẇi from M and is calculated as bi = B ∗ |Ẇi − M| / Σ_{j=1}^{N+} |Ẇj − M|. Similarly, the number of removed threads for P− can be calculated using N−. The calculated value of bi is a real number, and therefore we round it to the nearest integer. Next, using bi, TCA adjusts the thread pool sizes. To add threads to thread pools, TCA activates idle threads by sending instructions to them. To remove threads from thread pools, TCA sends requests to stop threads in those thread pools. Each thread checks for such a request after processing its task and, if found, that thread accepts the request and becomes idle.

5.4. Discussion of Overhead

The overhead of our approach is very small. Our approach has overhead for (1) calculating the average wait time, (2) calculating the wait time changes and the requirements for thread count adaptation, and (3) the actual control of the thread pool size. The first cost is the cost of maintaining Tarrival and N for each queue at every queue operation. Though this is the most frequently incurred of the three costs, it is negligible, since only a few machine instructions are performed and they are trivial compared to the number of instructions for calculating the dynamic content. The second cost is proportional to the number of thread pools. When we measured the cost on our machine (with POWER5 1.65 GHz CPUs), the cost was under 1 µs for up to 128 thread pools, which is sufficient for the SMP environments that are currently available. The fraction of time spent on the second cost is C2/I/P, where C2 is the second cost, I is the time interval between invocations of OM, and P is the number of hardware threads. Supposing that C2 is at most 1 µs for our experimental environment (I = 0.02 s and P = 4), the second cost consumes at most only 0.00125% of the total execution time and is negligible. The majority of the third cost is the context-switch overhead. The fraction of time spent on the third cost is C3 ∗ B/I/P, where C3 is the context-switch overhead between threads. Compared with the context-switch overhead between processes (less than 1 µs [31]), C3 is smaller. Supposing that C3 is at most 1 µs in our experimental environment (B = 1, I = 0.02 s, and P = 4), the third cost also consumes at most only 0.00125% of the total execution time and is negligible².

6. Evaluation

We prototyped our approach on our J2EE middleware, the IBM WebSphere Application Server (WAS), version 6.0 [36]. We used the workload of an industry-standard J2EE benchmark, SPECjAppServer2004 (SjAS) [30], for our tests. We ran the tests on the original and modified versions of the J2EE middleware. We first evaluated improvements in the response time by comparing the average response times between the original and modified versions. For the tests, we used the injection rate³ that yielded the maximum throughput on the original version. Then we evaluated how the improved response time contributed to improvement of the overall throughput by increasing the injection rate. The following sections explain the experimental environment and discuss the results in detail.

6.1. Experimental Environment

In this section, we describe the SjAS benchmark, the machine configuration, the J2EE middleware, and the prototype code used for our tests.

6.1.1. SPECjAppServer2004. The SjAS benchmark is a J2EE application that emulates an automobile manufacturing company and its associated dealerships. SjAS calls for a 3-tier machine configuration: driver machines, J2EE server machines, and database machines. The application has multiple services that manage information about customers, manufacturers, and suppliers and that display the results via a Web interface. The workload is dynamic. The request arrival rates are dynamic because of the varying think times. The various request types also affect the execution times for request processing. The numbers of database accesses also vary randomly. The J2EE code that calculates the dynamic content consumes most of the computational power of the J2EE server machines. The driver requests two types of transactions: one via ORB and the other via the Web. The 90th-percentile response time (RT) is specified for each type. The response time for each transaction is calculated based on the total of the response times for multiple requests. We ran the unmodified code of SjAS five times and calculated the average 90th-percentile RT. When we evaluated how the improved response times contributed to improvements of the overall throughput, we compared the maximum JOPS (jAppServer Operations Per Second) whose average 90th-percentile RT was satisfied between the original and the modified versions of WAS.

² The third cost will still be low for large B since P is also large for such B.
³ The number of clients is proportional to the injection rate.
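Using the notation of Table 1, the redistribution arithmetic above can be sketched as follows. This is our illustrative reimplementation, not the WAS prototype code, and the class and method names are hypothetical.

```java
// Sketch (ours) of the thread-redistribution arithmetic: given the weighted
// wait-time changes Wdot[i] for each service and B movable threads, compute
// how many threads each service gains (positive) or loses (negative).
public class ThreadRedistribution {
    static int[] redistribute(double[] wdot, int B) {
        int n = wdot.length;
        // The "median point" M is defined in the text as the mean of the Wdot values.
        double m = 0;
        for (double w : wdot) m += w;
        m /= n;

        // Normalizing sums of |Wdot_i - M| over gainers (P+, Wdot > M)
        // and losers (P-, Wdot < M).
        double sumPlus = 0, sumMinus = 0;
        for (double w : wdot) {
            if (w > m) sumPlus += Math.abs(w - m);
            else if (w < m) sumMinus += Math.abs(w - m);
        }

        int[] delta = new int[n];
        for (int i = 0; i < n; i++) {
            double d = Math.abs(wdot[i] - m);
            if (wdot[i] > m && sumPlus > 0) {
                // b_i = B * |Wdot_i - M| / sum over P+ of |Wdot_j - M|,
                // rounded to the nearest integer.
                delta[i] = (int) Math.round(B * d / sumPlus);
            } else if (wdot[i] < m && sumMinus > 0) {
                delta[i] = -(int) Math.round(B * d / sumMinus);
            }
        }
        return delta;
    }
}
```

For example, with Ẇ = {10, 2} and B = 1, the median point M is 6, so the service whose weighted wait time rose gains the one movable thread and the other service loses it.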
6.1.2. Three-Tier Machine Configuration. We set up a [...] 570 (1.65 GHz POWER5 with 2-way SMT). The AIX 5.3 [...] For SjAS, the two services, ORB and Web, are the major [...] After experimenting with many combinations of thread pool sizes, we chose sizes (5 for ORB and 16 for Web) sufficiently large to fully utilize the CPUs and to show the maximum JOPS satisfying the TSF targets. The J2EE applications access databases as they process each request. The J2EE applications running on the worker threads send requests to the database servers and then wait for responses from the database servers. To fully utilize the CPUs of the J2EE servers, more threads than the number of CPUs are used, since threads can be blocked while accessing the database.

6.1.4. Prototype of Our Approach. We implemented thread count adaptation by modifying the code of WAS. The OM is invoked at approximately 20-ms intervals. Our approach monitors the average wait time of requests even though the TSF targets in SjAS are not specified for requests, but for the transactions, which consist of multiple requests. We cannot monitor the average wait times of transactions by correlating the requests to transactions, since the correlation is only known by the clients. However, addressing the overloads that are monitored at the request level reduced the wait times at the transaction level, as demonstrated in the next section. We controlled the thread pools for the ORB and Web container services, because these services consume most of the CPU resources. Based on experiments, we determined that the sensitivity thresholds are 100 ms for ORB and 50 ms for the Web container and that one thread is sufficient for thread redistribution. The thread pool size has a minimum value of two to avoid starving requests for the service. Since our server with 4 hardware threads is not a large system, the initial thread pool size for ORB of five threads is rather small. Therefore, only up to three threads can be removed from ORB. However, we sometimes observed situations calling for larger changes in the thread pool sizes by moving threads from ORB even though its pool size had already reached the minimum. To address such situations, we created extra threads in addition to the original pool threads. If the situation calls for removing threads from a pool which has reached its minimum, then we can add up to four additional threads to the other service. If we are later moving threads away from a service with such additional threads, then we remove those extra threads but do not add them to any other service, since they were only created to compensate for the lower limits on the number of threads that could be moved. Once all of these extra threads have been removed, the total number of pool threads returns to the initial value.

[Figure 3. Numbers of non-blocked threads: (a) ORB service, (b) Web container service]

6.2. Experimental Results

We first show how the numbers of threads that can process requests constantly change. As explained in Section 4, one reason that services cannot obtain CPU shares in proportion to their workloads is the blocking of threads. To investigate how threads are blocked during the tests, we tracked the number of threads that are not blocked due to I/O or resource contention (the runnable threads). Figures 3(a) and 3(b) show how the numbers of threads that are not blocked change for the ORB service and the Web container service, respectively. The x-axis shows the execution time in milliseconds and the y-axis shows the number of runnable threads. Though the pool sizes are a constant 5 for ORB in Figure 3(a) and 16 for the Web container in Figure 3(b), the numbers of runnable threads are constantly changing. As a result, the ratio of runnable threads between ORB and the Web container fluctuates a great deal, though the initial ratio is 5/16 ≈ 0.3, as shown in Figure 4. The runnable threads are scheduled to hardware threads, thus determining the CPU resources allocated to the services.

Next we show that our approach improved the average wait times of the queued requests. We first show snapshots of the variations in the average wait times of the queued requests measured for ORB and the Web container on the original WAS in Figures 5(a) and 5(b), respectively. The x-axis shows the execution time in milliseconds and the y-axis shows the average wait time in milliseconds. There are clear variations, especially for the Web container, which communicates with many clients and consumes more CPU [...]
[Figure 4. Ratio of runnable threads between services]

[Figure 5. Wait time without adaptation: (a) ORB service, (b) Web container service]

Service    Average (ms)    Std. deviation (ms)
ORB        52.2            41.4
[...]

Table 2. Statistics for Figures 5 and 6

[Figure 6. Wait time with adaptation: (a) ORB service, (b) Web container service]

[Figure 7. Adapting the thread pool sizes]

Table 2 summarizes the fundamental statistics of the wait times corresponding to Figures 5 and 6. Note that SjAS uses these wait times for its TSF targets, as explained in Section 6.1.4.

For the TSF targets, our approach reduced the average 90th-percentile response time by up to 27%, with an average of 22%. By improving the 90th-percentile response time, WAS with our approach can still satisfy the TSF targets as we increase the workload or the number of clients. Summarizing our results, we were able to support a 3% higher throughput, though such an additional workload on a busy server should exponentially degrade the response time. How we controlled the thread pool sizes is shown in Figure 7. The x-axis shows the execution time and the y-axis shows the thread pool sizes that our prototype used during each interval. Note that more threads than the number specified by our prototype are running. As explained in Section 5.3, threads cannot stop immediately when the pool size is decreased. For example, there is a period where we used extra threads and the thread pool size exceeded the total number of the initial and movable threads near 520,000 ms of execution time.

We still observe spikes exceeding 300 ms in the average wait times of the queued requests in Figures 6(a) and 6(b). [...] Since GC stops applications, it increases the wait time of the requests that are queued when GC occurs. For the remaining spikes, the threads were waiting for responses from the database or contending for locks on shared resources.

7. Conclusions
[...] actually responds to the frequently occurring overloads of services and can adjust the thread pool sizes. We confirmed that the quick responses significantly improved the average 90th-percentile response time by up to 27% (22% on average) using an industry-standard J2EE benchmark, SPECjAppServer2004. With the extra room created underneath the TSF targets, we successfully supported more clients on a server that was optimally tuned to show the maximum throughput, and gained 3% more throughput.

References

[1] T. F. Abdelzaher and N. T. Bhatti. Web content adaptation to improve server overload behavior. Computer Networks, 31(11-16):1563–1577, 1999.
[2] T. F. Abdelzaher, K. G. Shin, and N. T. Bhatti. Performance guarantees for Web server end-systems: A control-theoretical approach. IEEE Trans. Parallel Distrib. Syst., 13(1):80–96, 2002.
[3] J. Alonso, J. Guitart, and J. Torres. Differentiated quality of service for e-commerce applications through connection scheduling based on system-level thread priorities. In PDP, pages 72–76. IEEE, 2007.
[4] Apache Tomcat 6.0. The Executor (thread pool), 2006.
[5] J. M. Blanquer, A. Batchelli, K. E. Schauser, and R. Wolski. Quorum: Flexible quality of service for internet services. In NSDI. USENIX, 2005.
[6] A. Chandra, W. Gong, and P. J. Shenoy. Dynamic resource allocation for shared data centers using online measurements. In IWQoS, volume 2707 of LNCS, pages 381–400. Springer, 2003.
[7] M. Crovella, R. Frangioso, and M. Harchol-Balter. Connection scheduling in Web servers. In USITS, 1999.
[8] S. Elnikety, E. M. Nahum, J. M. Tracey, and W. Zwaenepoel. A method for transparent admission control and request scheduling in e-commerce Web sites. In WWW, pages 276–286. ACM, 2004.
[9] S. M. Fontes, C. J. Nordstrom, and K. W. Sutter. WebSphere connector architecture evolution. IBM Syst. J., 43(2):316–326, 2004.
[10] P. Furtado and R. Antunes. Deadline and throughput-aware control for request processing systems. In ISPA, volume 4742 of LNCS, pages 383–394. Springer, 2007.
[11] M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal. Size-based scheduling to improve Web performance. ACM Trans. Comput. Syst., 21(2):207–233, 2003.
[12] A. Kamra, V. Misra, and E. M. Nahum. Yaksha: a self-tuning controller for managing the performance of 3-tiered Web sites. In IWQoS, pages 47–56. IEEE, 2004.
[13] K. Kourai, H. Hibino, and S. Chiba. Aspect-oriented application-level scheduling for J2EE servers. In AOSD, pages 1–13. ACM, 2007.
[14] Z. Li, D. Levy, S. Chen, and J. Zic. Auto-tune design and evaluation on staged event-driven architecture. In MODDM, pages 1–6. ACM, 2006.
[15] Z. Li, D. Levy, S. Chen, and J. Zic. Explicitly controlling the fair service for busy Web servers. In ASWEC, pages 159–168. IEEE, 2007.
[16] Y. Ling, T. Mullen, and X. Lin. Analysis of optimal thread pool size. ACM SIGOPS Operating Systems Review, 34(2):42–55, 2000.
[17] C. Lu, Y. Lu, T. F. Abdelzaher, J. A. Stankovic, and S. H. Son. Feedback control architecture and design methodology for service delay guarantees in web servers. IEEE Trans. Parallel Distrib. Syst., 17(9):1014–1027, 2006.
[18] Microsoft® Windows Server® 2003 TechCenter. Web and application server infrastructure - performance and scalability, Apr. 2003.
[19] H. Naccache, G. C. Gannod, and K. A. Gary. A self-healing Web server using differentiated services. In ICSOC, volume 4294 of LNCS, pages 203–214. Springer, 2006.
[20] Oracle® Containers for J2EE Configuration and Administration Guide 10g (10.1.3.1.0). Configuring OC4J thread pools, Oct. 2006.
[21] L. F. Orleans and P. Furtado. Fair load-balancing on parallel systems for QoS. In ICPP, page 22. IEEE, 2007.
[22] P. Pradhan, R. Tewari, S. Sahu, A. Chandra, and P. Shenoy. An observation-based approach towards self-managing Web servers. In IWQoS, pages 13–22. IEEE, 2002.
[23] I. Pyarali, M. Spivak, R. Cytron, and D. C. Schmidt. Evaluating and optimizing thread pool strategies for real-time CORBA. In LCTES, pages 214–222. ACM, 2001.
[24] Rock Web Server User Guide. Worker thread configuration, 2007.
[25] D. C. Schmidt and S. Vinoski. Object interconnection: comparing alternative programming techniques for multi-threaded CORBA servers: thread pool (column 6). SIGS C++ Report Magazine, 8:1–12, 1996.
[26] B. Schroeder and M. Harchol-Balter. Web servers under overload: How scheduling can help. ACM Trans. Internet Techn., 6(1):20–52, 2006.
[27] K. Shen, H. Tang, T. Yang, and L. Chu. Integrated resource management for cluster-based Internet services. In OSDI, 2002.
[28] Sun Java System Application Server 9.1 Administration Guide. Thread pools, July 2007.
[29] Sun Java System Web Server 7.0 Performance Tuning, Sizing, and Scaling Guide. Understanding threads, processes, and connections, 2007.
[30] The Standard Performance Evaluation Corporation (SPEC®). SPECjAppServer®2004, 2004.
[31] D. Tsafrir. The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops). In Experimental Computer Science. ACM, 2007.
[32] B. Urgaonkar and P. J. Shenoy. Cataclysm: policing extreme overloads in internet applications. In A. Ellis and T. Hagino, editors, WWW, pages 740–749. ACM, 2005.
[33] T. Voigt, R. Tewari, D. Freimuth, and A. Mehra. Kernel mechanisms for service differentiation in overloaded Web servers. In Y. Park, editor, USENIX Annual Technical Conference, General Track, pages 189–202. USENIX, 2001.
[34] W. Wang, W. Zhang, L. Zhang, and T. Huang. WMQ: Towards a fine-grained QoS control for e-business servers. In ICEBE, pages 139–146. IEEE, 2007.
[35] WebLogic Server® Performance and Tuning. Tune pool sizes, Nov. 2006.
[36] WebSphere® Application Server Network Deployment, Version 6.1. Thread pool settings, June 2007.
[37] J. Wei and C.-Z. Xu. eQoS: Provisioning of client-perceived end-to-end QoS guarantees in Web servers. IEEE Trans. Computers, 55(12):1543–1556, 2006.
[38] M. Welsh and D. E. Culler. Adaptive overload control for busy Internet servers. In USITS, 2003.
[39] M. Welsh, D. E. Culler, and E. A. Brewer. SEDA: An architecture for well-conditioned, scalable Internet services. In SOSP, pages 230–243, 2001.
[40] D. Xu and B. Bode. Performance study and dynamic optimization design for thread pool systems. In CCCT, 2004.
[41] J. Zhou and T. Yang. Selective early request termination for busy Internet services. In WWW, pages 605–614. ACM, 2006.
[42] J. Zhou, C. Zhang, T. Yang, and L. Chu. Request-aware scheduling for busy internet services. In INFOCOM. IEEE, 2006.
[43] H. Zhu, H. Tang, and T. Yang. Demand-driven service differentiation in cluster-based network servers. In INFOCOM, pages 679–688, 2001.