
MULTICORE RESOURCE MANAGEMENT

TABLE OF CONTENTS
Abstract
Introduction
Virtual private machines
Resource Abstraction
Mechanisms
  VPM scheduler
  Partitioning mechanisms
  Feedback mechanisms
Resource Manager
Hardware Execution Throttling
Evaluation and Results Analysis
Conclusions
References

Abstract
Modern processors provide mechanisms (such as duty cycle modulation and cache prefetcher adjustment) to control the execution speed or resource usage efficiency of an application. Although these mechanisms were originally designed for other purposes, we argue in this paper that they can be an effective tool to support fair use of shared on-chip resources on multi-cores. Compared to existing approaches to achieving fairness (such as page coloring and CPU scheduling quantum adjustment), execution throttling mechanisms have the advantage of providing fine-grained control with little software system change or undesirable side effect. Additionally, although execution throttling slows down some of the running applications, it does not yield any loss of overall system efficiency as long as the bottleneck resources are fully utilized. We conducted experiments with several sequential and server benchmarks. The results indicate that a hybrid execution throttling mechanism achieves high fairness with almost no efficiency degradation.

1 Introduction
Modern multi-core processors may suffer from poor fairness with respect to utilizing shared on-chip resources (including the last-level on-chip cache space and the memory bandwidth). In particular, recent research has shown that uncontrolled on-chip resource sharing can lead to large performance variations among co-running applications. Such poor performance isolation makes an application's performance hard to predict and consequently hurts the system's ability to provide quality-of-service support. Even worse, malicious applications can exploit this obliviousness to on-chip resource sharing to launch denial-of-service attacks and starve other applications. In this article, our vision for resource management in future multicore systems involves enriched interaction between system software and hardware. Our goal is for the application and system software to manage all the shared hardware resources in a multicore system.

2 Virtual private machines


In a traditional multiprogrammed system, the operating system assigns each application (program) a portion of the physical resources, for example, physical memory and processor time slices. From the application's perspective, it has its own private machine with a corresponding amount of physical memory and processing capabilities. With multicore chips containing shared microarchitecture-level resources, however, an application's machine is no longer private, so resource usage by other, independent applications can affect its resources. Therefore, we introduce the virtual private machine (VPM) framework as a means for resource management in systems based on multicore chips [3]. VPMs are similar in principle to classical virtual machines. However, classical virtual machines virtualize a system's functionality (ignoring implementation features), while VPMs virtualize a system's performance and power characteristics, which are implementation specific. A VPM consists of a complete set of virtual hardware resources, both spatial (physical) resources and temporal resources (time slices). These include the shared microarchitecture-level resources. By definition, a VPM has the same performance and power characteristics as a real machine with an equivalent set of hardware resources.
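The spatial/temporal split above can be made concrete with a small sketch. This is illustrative only: the field names (cache_kb, mem_bw_mbps, time_share) and the feasibility check are our own simplification, not an interface from the VPM papers.

```python
from dataclasses import dataclass

@dataclass
class VPM:
    cache_kb: int      # spatial: private share of the last-level cache
    mem_bw_mbps: int   # spatial: guaranteed memory bandwidth
    time_share: float  # temporal: fraction of a hardware thread's time slices

def fits(vpms, total_cache_kb, total_bw_mbps, hw_threads):
    """A set of VPM assignments is feasible only if no physical resource is oversubscribed."""
    return (sum(v.cache_kb for v in vpms) <= total_cache_kb
            and sum(v.mem_bw_mbps for v in vpms) <= total_bw_mbps
            and sum(v.time_share for v in vpms) <= hw_threads)

a = VPM(cache_kb=2048, mem_bw_mbps=4000, time_share=0.5)
b = VPM(cache_kb=2048, mem_bw_mbps=4000, time_share=0.5)
print(fits([a, b], total_cache_kb=4096, total_bw_mbps=8000, hw_threads=2))  # True
```

By definition, an application running inside a feasible VPM should see the performance of a real machine with exactly those resources, regardless of what its co-runners do.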

3 Resource Abstraction
A key design requirement for simplifying portability between platforms is application development independent of the physical platform. This is crucial for multi-core systems, where adding a processor does not necessarily improve performance. In fact, embedded software developed to be highly efficient on a given multicore platform can be very inefficient on a new platform with a different number of cores. Roughly speaking, two key features have to be extracted to properly virtualize a multi-core platform: the overall computing power of the entire multiprocessor platform and the number m of virtual processors of the platform. In ACTORS, a virtual multi-core platform is represented as a set of m sequential virtual processors (VPs). We briefly recall the basic concepts that allow the abstraction of a single processor.
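A common way to abstract a single virtual processor, assuming a periodic reservation with budget Q and period P (CBS-style, as used in reservation-based schedulers), is the bounded-delay pair: a bandwidth alpha = Q/P and a worst-case service delay delta = 2(P - Q). This sketch is our illustration of that standard characterization, not code from ACTORS.

```python
def vp_params(Q, P):
    """Bounded-delay abstraction of one virtual processor served by a
    periodic reservation (budget Q, period P), with Q <= P."""
    assert 0 < Q <= P
    alpha = Q / P        # fraction of the physical processor delivered
    delta = 2 * (P - Q)  # worst-case initial service delay
    return alpha, delta

# A reservation of 3 ms every 10 ms behaves like a 0.3-speed processor
# that may withhold service for at most 14 ms at a time.
print(vp_params(3, 10))  # (0.3, 14)
```

A virtual multi-core platform is then just m such (alpha, delta) pairs, one per sequential VP, independent of how many physical cores the target chip has.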

4 Mechanisms
There are three basic types of mechanisms needed to support the VPM framework: a VPM scheduler, partitioning mechanisms, and feedback mechanisms. The first two types securely multiplex, arbitrate, or distribute hardware resources to satisfy VPM assignments. The third provides feedback to application and system policies. Because mechanisms are universal, system builders can implement them in both hardware and software. Generally, the VPM scheduler is implemented in software (in a microkernel or a virtual machine monitor), while the partitioning and feedback mechanisms are primarily implemented in hardware. Although a basic set of VPM mechanisms is available [2, 3], many research opportunities remain to develop more efficient and robust VPM mechanisms.

4.1 VPM scheduler


The VPM scheduler satisfies applications' temporal VPM assignments by time-slicing hardware threads [3]. The VPM scheduler is a proportional-fair (p-fair) scheduler, but it must also ensure that coscheduled applications' spatial resource assignments don't conflict, that is, that the set of coscheduled threads' spatial resource assignments match the physical resources available and don't oversubscribe any microarchitecture resources. VPM scheduling in its full generality, satisfying proportional fairness without spatial conflicts, is an open research problem.
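One scheduling decision under these constraints might look like the following greedy sketch: pick threads in order of proportional-fairness lag, rejecting any whose spatial cache assignment would oversubscribe the chip. The thread representation and the lag-ordering heuristic are our own assumptions for illustration; the paper notes the general problem remains open.

```python
def pick_coschedule(threads, hw_threads, total_cache_ways):
    """threads: list of dicts {"name", "lag", "cache_ways"}.
    Greedily coschedule the most-lagging threads whose combined spatial
    cache assignments fit within the shared cache."""
    chosen, ways_used = [], 0
    for t in sorted(threads, key=lambda t: t["lag"], reverse=True):
        if len(chosen) < hw_threads and ways_used + t["cache_ways"] <= total_cache_ways:
            chosen.append(t["name"])
            ways_used += t["cache_ways"]
    return chosen

threads = [
    {"name": "a", "lag": 3, "cache_ways": 8},
    {"name": "b", "lag": 2, "cache_ways": 12},  # would oversubscribe with "a"
    {"name": "c", "lag": 1, "cache_ways": 8},
]
print(pick_coschedule(threads, hw_threads=2, total_cache_ways=16))  # ['a', 'c']
```

Note the fairness/feasibility tension the text describes: thread b has more lag than c, but coscheduling it with a would oversubscribe the cache, so the greedy pass skips it this quantum.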

4.2 Partitioning mechanisms


To satisfy the spatial component of VPM assignments, each shared microarchitecture resource must be under the control of a partitioning mechanism that can enforce minimum and maximum resource assignments. As described earlier, the resource assignments are stored in privileged control registers that the VPM scheduler configures. In general, each shared microarchitecture resource is one of three basic types: a storage, buffer, or bandwidth resource, and each type has a corresponding basic type of partitioning mechanism.
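For a storage resource such as a set-associative cache, a natural partitioning mechanism is way-based: each VPM gets a bitmask of cache ways it may allocate into (similar in spirit to Intel's later Cache Allocation Technology, though the encoding below is invented for illustration).

```python
def way_masks(allocations, ways=16):
    """Turn per-VPM way counts into disjoint, contiguous way bitmasks for a
    `ways`-way set-associative cache. Raises if the spatial assignments
    oversubscribe the cache."""
    assert sum(allocations) <= ways, "spatial assignments oversubscribe the cache"
    masks, start = [], 0
    for n in allocations:
        masks.append(((1 << n) - 1) << start)  # n contiguous ways starting at `start`
        start += n
    return masks

# Three VPMs sharing a 16-way cache: 8 + 4 + 4 ways.
masks = way_masks([8, 4, 4])
print([hex(m) for m in masks])  # ['0xff', '0xf00', '0xf000']
```

The VPM scheduler would write each mask into a privileged control register on a context switch; the cache replacement logic then only victimizes lines in ways permitted by the running thread's mask.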

4.3 Feedback mechanisms


Mechanisms also provide application and system policies with feedback regarding physical resource capacity and usage. Feedback mechanisms communicate to system policies the capacity of the system's resources and the available VPM partitioning mechanisms. They also provide application policies with information regarding individual applications' resource usage. Application resource usage information should be independent of the system architecture and the application's VPM assignments. For example, a mechanism that measures a stack distance histogram can predict cache storage and memory bandwidth usage for many different cache sizes.
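The stack distance example works because, for a fully associative LRU cache of S lines, a reference misses exactly when its reuse (stack) distance is at least S, so one measured histogram predicts misses for any hypothetical size. A minimal sketch under those assumptions (cold misses omitted for brevity):

```python
def predicted_misses(hist, size_lines):
    """hist maps stack distance -> number of references at that distance.
    Under fully associative LRU, every reference with distance >= size_lines
    misses in a cache of size_lines lines."""
    return sum(count for d, count in hist.items() if d >= size_lines)

# One histogram answers "what if the cache were bigger/smaller?" without re-running.
hist = {0: 500, 1: 300, 4: 120, 64: 60, 1024: 20}
print(predicted_misses(hist, 16))   # 80 misses in a 16-line cache
print(predicted_misses(hist, 256))  # 20 misses in a 256-line cache
```

This is what makes the feedback VPM-independent: the histogram characterizes the application itself, and the policy layer evaluates it against any candidate cache assignment.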

5 Resource Manager
In ACTORS, the Resource Manager (which is implemented as a user-level application) is responsible for allocating resources to applications. Frequent reactions to fluctuations of computational demands and resource availability would be too inefficient. Rather, resource management in ACTORS is inspired by the Matrix resource management framework, where application demands are abstracted as a small set of service levels, each characterized by a QoS value and resource requirements. In this way, only significant changes trigger a system reconfiguration.
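The service-level idea can be sketched as follows: each application declares a few discrete (QoS, bandwidth-demand) levels, best first, and the manager degrades applications round-robin until total demand fits capacity. The data layout and the round-robin degradation policy are our own simplification, not the actual ACTORS algorithm.

```python
def assign_levels(apps, capacity):
    """apps: {name: [(qos, demand), ...]} with levels sorted best-first.
    Returns the chosen (qos, demand) per app with total demand <= capacity,
    degrading apps round-robin when the best levels don't all fit."""
    choice = {name: 0 for name in apps}  # start everyone at the best level
    total = lambda: sum(apps[n][choice[n]][1] for n in apps)
    changed = True
    while total() > capacity and changed:
        changed = False
        for n in apps:
            if total() <= capacity:
                break
            if choice[n] + 1 < len(apps[n]):  # a lower service level exists
                choice[n] += 1
                changed = True
    return {n: apps[n][choice[n]] for n in apps}

apps = {"video": [(10, 0.6), (5, 0.3)], "audio": [(8, 0.6), (4, 0.2)]}
print(assign_levels(apps, capacity=1.0))
```

Because levels are discrete, small load fluctuations leave the chosen levels unchanged; only a change large enough to push an application across a level boundary triggers reconfiguration, which matches the design goal stated above.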

6 Hardware Execution Throttling

One mechanism to throttle a CPU's execution speed available in today's multicore platforms is dynamic voltage and frequency scaling. However, on some multicore platforms, sibling cores must operate at the same frequency. Intel provides another mechanism to throttle per-core execution speed, namely, duty-cycle modulation. Specifically, the operating system can specify a portion (in multiples of 1/8) of regular CPU cycles as duty cycles by writing to the logical processor's IA32_CLOCK_MODULATION register. The processor is effectively halted during non-duty cycles. Duty-cycle modulation was originally designed for thermal management.
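Per Intel's Software Developer's Manual, the IA32_CLOCK_MODULATION MSR (address 0x19A) encodes on-demand modulation with bit 4 as the enable flag and bits 3:1 selecting a duty cycle of k/8. The helper below computes that encoding (actually writing the MSR requires privileged code, e.g. via /dev/cpu/*/msr on Linux):

```python
IA32_CLOCK_MODULATION = 0x19A  # MSR address, per the Intel SDM

def clock_mod_value(k):
    """MSR value that keeps the core running for k/8 of regular cycles
    (1 <= k <= 7): bit 4 enables throttling, bits 3:1 hold the duty level."""
    assert 1 <= k <= 7
    return (1 << 4) | (k << 1)

print(hex(clock_mod_value(3)))  # 0x16: core halted during 5 of every 8 windows
```

Because the register is per logical processor, the OS can slow one core to equalize pressure on the shared cache and memory bus without touching its siblings, which is exactly the property frequency scaling lacks on platforms where sibling cores share a voltage/frequency domain.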

7 Evaluation and Results Analysis


We enabled the duty-cycle modulation and cache prefetcher adjustment mechanisms by modifying the Linux 2.6.18 kernel. Our experiments were conducted on an Intel Xeon 5160 3.0GHz Woodcrest dual-core platform. The two cores share a single 4MB L2 cache (16-way set-associative, 64-byte cache line, 14-cycle latency, write-back).

8 Conclusions
This paper presented the approach developed in the ACTORS research project for managing computational resources on multi-core platforms. The adopted bandwidth reservation mechanism allows the system designer to better control the allocation of the available resources than the classical threads-and-priorities approach. Together with appropriate measurements, this technique also facilitates automatic run-time resource management. The approach proved to be very effective for handling time-sensitive applications with highly variable load. The reservation mechanism is provided by a new Linux scheduling class, SCHED_DEADLINE, fully integrated in the 2.6.33 kernel release.

References
[1] L. Abeni and G. Buttazzo. Integrating multimedia applications in hard real-time systems. In Proceedings of the 19th IEEE Real-Time Systems Symposium.
[2] F.J. Cazorla et al. Predictable performance in SMT processors: Synergy between the OS and SMTs. IEEE Trans. Computers.
[3] K.J. Nesbit, J. Laudon, and J.E. Smith. Virtual private machines: A resource abstraction for multicore computer systems. Tech. report, Computer Engineering Dept., University of Wisconsin-Madison, Dec. 2007.
[4] S. Cho and L. Jin. Managing distributed, shared L2 caches through OS-level page allocation. In 39th Int'l Symp. on Microarchitecture.
[5] A. Fedorova, M. Seltzer, and M. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler.
[6] L.R. Hsu, S.K. Reinhardt, R. Iyer, and S. Makineni. Communist, utilitarian, and capitalist cache policies on CMPs.
[7] IA-32 Intel architecture software developer's manual, 2008.
