Dual Core Architecture Seminar Paper


Lappeenranta University of Technology Information Technology CT30A7001 Concurrent and Parallel Computing

DUAL CORE ARCHITECTURE Seminar Paper


Oct 30, 2008

Group 04 Manish Thapa [c0346938 manish.thapa@lut.fi] Madan Kadariya [c0346967 madan.kadariya@lut.fi]

ABSTRACT

Lappeenranta University of Technology Department of Information Technology

Manish Thapa Madan Kadariya

Dual Core Architecture Seminar Paper

2008

15 pages

Examiners: Professor, D.Sc. (Tech.) Jari Porras

Keywords: dual core, multithreading, Intel Itanium architecture

As the clock speed of a single-core processor increases, so does the heat it dissipates. Uncontrolled heat causes errors and can even damage the processor, so raising the clock speed to gain performance reached a limit. Designers therefore sought a way to keep growing processor capability at lower clock speeds. The resulting idea was to place two processors, together with their caches and cache controllers, in a single package. This is how the development of dual-core processors began.

This seminar paper briefly explains the dual-core concept: why it is needed, how it is utilized, and how it compares with single-core processors and dual-processor systems. It then illustrates one dual-core architecture, the Intel Dual-Core Itanium 2 processor, and discusses a few topics specific to this processor that give some insight into parallel computing.

TABLE OF CONTENTS

1. Introduction
2. Need of dual core
3. Stand of a dual core
4. Utilization
5. How dual core works
6. Intel Dual-Core Technology
7. Intel Itanium 2 Processor
7.1 Introduction
7.2 History
7.3 Architecture
7.3.1 The Intel Itanium architecture
7.3.2 Instruction Execution
7.3.3 Memory architecture
7.4 Features
7.5 Software support
7.6 Competition
8. Conclusion
References

1. INTRODUCTION

A dual-core CPU has two complete execution cores in a single integrated circuit. It combines two processors and their caches and cache controllers onto a single integrated circuit (silicon chip), referred to as a die. The two cores work side by side in processing and executing instructions. A dual-core processor of this kind is monolithic, meaning that all of its cores sit on a single die. Each core independently implements optimizations such as superscalar execution, pipelining, and multithreading. A system with n cores is used effectively only when it is presented with n or more threads. [1]

Though they may sound similar, a dual-core CPU and a dual-processor system are different. In a dual-core CPU, both cores reside within the same package. Dual-processor is the term for using two processors, which are not necessarily on the same chip and not necessarily even on the same motherboard.

Fig 1: Diagram of a generic dual-core processor, with CPU-local level 1 caches and a shared, on-die level 2 cache. Source: http://en.wikipedia.org/wiki/Image:Dual_Core_Generic.svg

2. NEED OF DUAL CORE

As manufacturing technology continues to advance and the size of individual gates shrinks, the physical limits of semiconductor-based microelectronics have become a major design concern. These limits cause heat dissipation and data synchronization problems. Building ever more capable microprocessors therefore required a new idea that could work around these physical limits. Instruction-level parallelism (ILP) methods such as superscalar pipelining were considered, but they proved inefficient because much code is difficult to predict. Thread-level parallelism (TLP) is an improved approach, and using multiple independent CPUs is one common way to increase a system's overall TLP. The combination of extra die space made available by refined manufacturing processes and the demand for more TLP, to solve bigger real-life problems and to support gaming, is the logic behind the creation of dual-core CPUs. [2]

A processor is a device that executes a series of instructions that tell it what to do, and the faster it can do so, the better. Speed is directly related to clock speed: a processor's speed is largely determined by how fast a clock tells it to perform instructions. The computer market has long enjoyed steady growth in processor speeds, and both AMD and Intel scaled up the clock speeds of their processors in a very short time, but they have recently slowed the curve. Power requirements constrain the rate at which processor clock speeds can be increased. This trend is shown clearly in Figure 2 below, where the average clock speed and heat dissipation for Intel and AMD processors are plotted over time. [3]

Fig 2: Average clock speed and heat dissipation for Intel and AMD processors. Source: www.ccur.com/pdf/Preparing_forthe_multicore_revolution.pdf
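The connection between clock speed and heat can be made concrete with the common first-order model of CMOS switching power, P ~ C * V^2 * f (capacitance times supply voltage squared times clock frequency). The sketch below adds a simplifying assumption that does not come from the cited sources: supply voltage is taken to scale in proportion to frequency, so power grows roughly with the cube of the clock rate. Under that assumption it compares one core at full clock with two cores at 80 percent of that clock; the 80 percent figure is purely hypothetical.

```python
# Rough comparison of one fast core versus two slower cores.
# Assumptions (illustrative only): dynamic power P ~ C * V^2 * f, and the
# supply voltage V scales linearly with the clock frequency f, so P ~ f^3.
# Throughput is assumed to scale with f per core on perfectly parallel work.


def relative_power(freq_scale: float, cores: int = 1) -> float:
    """Power relative to one core at freq_scale = 1.0 under the cube-law assumption."""
    voltage_scale = freq_scale  # assumed: V proportional to f
    return cores * voltage_scale ** 2 * freq_scale


def relative_throughput(freq_scale: float, cores: int = 1) -> float:
    """Ideal throughput relative to one core at full clock (no parallel overhead)."""
    return cores * freq_scale


if __name__ == "__main__":
    cases = [("1 core @ 100% clock", 1.0, 1), ("2 cores @ 80% clock", 0.8, 2)]
    for label, f, n in cases:
        print(f"{label}: power {relative_power(f, n):.3f}, "
              f"ideal throughput {relative_throughput(f, n):.2f}")
```

With these assumptions, the two slower cores draw about the same power as the single fast core (1.024 versus 1.000) while offering up to 1.6 times its throughput on work that divides perfectly between them. Real workloads rarely divide perfectly, so this is an upper bound.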

The power consumption shown rising in the graph above requires additional cooling and electrical service to keep the processor operating, and higher power consumption means more heat. The solution was to scale out the number of processor cores instead of scaling up the CPU frequency. Some argue that the flattening of the clock speed curve is the reason for the shift to dual core; the drop-off in clock speed on the graph coincides with the delivery of the first dual-core processors from AMD and Intel. The electricity running around the die is also prone to noise, that is, interference. The pathways on a processor are microscopically close together, and the more power that runs through them to support higher clock speeds, the more electrical radiation leaks from one pathway to the next. That leakage could corrupt the data in a neighboring pathway. [4] Dual-core processors are therefore designed to run at a slower clock rate than single-core designs because of heat issues. These dual-core chips can, in theory, deliver twice the performance of a single-core chip and thus help continue the march of processor performance. [3] In practice, this is not achievable because of the overheads involved in using multiple cores.

3. STAND OF A DUAL CORE

A dual-core CPU cannot be twice as fast as a single-core model running at the same clock speed, because several issues degrade performance. Dual-core CPUs only show their advantage when there is more than one discrete set of tasks, known as a "thread", to work on. A single-threaded application running on a dual-core CPU simply will not benefit from the second core. Sharing work between cores also involves overhead, including load balancing, communication between cores, and synchronization. Depending on the nature of the task, adding a second core has been observed to boost performance by up to 70 percent over a single-core CPU.
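One common way to reason about why a second core does not double performance is Amdahl's law, a standard model of parallel speedup: if a fraction p of a program's work can run in parallel on n cores, the best-case speedup is 1 / ((1 - p) + p / n). The sketch below evaluates it and, as an illustrative back-calculation, finds what parallel fraction would correspond to the 70 percent boost mentioned above. It ignores load balancing, communication, and synchronization costs, so real speedups would be lower still.

```python
# Amdahl's law: best-case speedup when a fraction p of the work is parallelizable
# across n cores and the remaining (1 - p) must run serially.
# Parallel overheads (load balancing, communication, synchronization) are ignored.


def amdahl_speedup(p: float, n: int) -> float:
    """Speedup S = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("parallel fraction p must be in [0, 1]")
    if n < 1:
        raise ValueError("core count n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction_for(speedup: float, n: int) -> float:
    """Invert Amdahl's law: the p needed to reach `speedup` on n cores (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two cores to infer a parallel fraction")
    # From 1/S = 1 - p * (1 - 1/n)  =>  p = (1 - 1/S) / (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    for p in (0.5, 0.9, 1.0):
        print(f"p = {p:.1f}: dual-core speedup {amdahl_speedup(p, 2):.2f}x")
    # A 70 percent boost corresponds to a speedup of 1.7 on two cores.
    print(f"p needed for 1.7x on 2 cores: {parallel_fraction_for(1.7, 2):.3f}")
```

In this model, about 82 percent of the work must run in parallel just to reach a 1.7x speedup on two cores, and a purely single-threaded program (p = 0) gains nothing, which matches the observation that a single-threaded application does not benefit from the second core.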
But again, because dual-core CPUs run at lower clock rates, their advantage over competing single-core processors is slim. Even so, dual-core CPUs have real benefits. Business users, for instance, typically have several programs open at once, and a dual-core CPU can speed things up when we are doing many things at the same time, such as working on a document while loading a page in a web browser and listening to music on a media player. Most important, more and more software is being tuned with dual-core processors in mind. Many game vendors and graphics-card companies have aggressively adopted multithreaded architectures to tap dual-core systems. "Even if the game is single-threaded, all the graphics and 3D [drivers] are multithreaded," says Brookwood. [5]

Multithreaded code is used by many media-creation applications, such as Adobe Photoshop and Premiere. Multithreading can therefore be expected to become more pervasive as software vendors seek to cater to the large installed base of dual-core CPUs. Dual-core processors work best when software can run in parallel on them. So-called multithreaded applications benefit from an additional CPU core because their threads can be allocated to different cores. Administering the threads carries an overhead, though, which means that a dual-core processor is never exactly twice as fast as a single-core processor. [5] Thus, a dual-core processor is a cross between a single-core processor and a dual-processor system. A dual-core processor has two cores that share hardware such as the memory controller and bus, whereas a dual-processor system has completely separate hardware and shares nothing with the other processor. A dual-core processor will not be twice as fast as a single-core processor, nor will it be as fast as a dual-processor system. In terms of performance it sits between the two, but it has a lot more to offer.

4. UTILIZATION

To utilize a dual-core processor to the fullest, the operating system must support multithreading, and the software that runs on it must have been developed with simultaneous multithreading (SMT) in mind. SMT enables parallel multithreading, in which the cores are served multithreaded instructions in parallel. Without SMT, the software will only recognize one core. SMT techniques are considered in server development, and Adobe Photoshop, for instance, also supports SMT. [6] Thus, complete optimization for a dual-core processor requires both the operating system and the applications running on the computer to support thread-level parallelism (TLP).
Thread-level parallelism is the part of the OS or application that runs multiple threads simultaneously, where a thread is a part of a program that can execute independently of other parts. Even without a multithreaded application, we can still see benefits from a dual-core processor if we run an OS that supports TLP. For example, under Microsoft Windows XP, which supports multithreading, we can open a browser, run a virus scanner, and stream audio or video at the same time, and the dual-core processor will handle the threads of these simultaneously running programs with increased performance and efficiency.
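How a multithreaded program exposes thread-level parallelism can be sketched with Python's standard concurrent.futures module: the work is split into one chunk per core, each chunk runs as an independent task, and the partial results are combined at the end, which is where the communication and synchronization overhead described earlier appears. This is an illustrative sketch with a hypothetical workload and hypothetical function names. In CPython, the global interpreter lock prevents threads from executing Python bytecode in parallel, so for CPU-bound work ProcessPoolExecutor, which offers the same map interface, is the usual substitute.

```python
# Splitting a CPU-bound job into one chunk per core (thread-level parallelism).
# Hypothetical workload: summing i*i over [0, n).
import os
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(start: int, stop: int) -> int:
    """One independent unit of work; it shares no state with the other chunks."""
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n: int, workers: int = 2) -> int:
    """Split [0, n) into at most `workers` contiguous chunks and combine the partial sums."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if n <= 0:
        return 0
    step = -(-n // workers)  # ceiling division, so the chunks cover the whole range
    starts = range(0, n, step)
    stops = [min(lo + step, n) for lo in starts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk becomes a separate task. The OS may schedule the worker threads
        # on different cores, although in CPython the GIL still serializes bytecode.
        partials = pool.map(sum_of_squares, starts, stops)
        # Gathering and adding the partial results is the communication and
        # synchronization step that keeps real speedups below n-fold.
        return sum(partials)


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    n = 1_000_000
    assert parallel_sum_of_squares(n, workers=cores) == sum_of_squares(0, n)
    print(f"{cores} worker(s) reproduce the sequential result")
```

Running the script checks that the chunked result matches the sequential one; any speedup depends on the executor chosen and on how the operating system schedules the workers across cores.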

Today the latest operating systems and hundreds of applications already support multithreading, especially applications used for editing and creating music files, videos, and graphics, because these types of programs need to perform operations in parallel. As dual-core technology becomes more common in homes and workplaces, awareness of the need to build systems that support thread-level parallelism is also increasing.

5. HOW DUAL CORE WORKS

In a single-core, or traditional, processor, the CPU is fed strings of instructions that it must order and execute, and it selectively stores data in its cache for quick retrieval. When data outside the cache is required, it is retrieved through the system bus from random access memory (RAM) or from other storage devices. Accessing these limits performance to the maximum speed that the bus, RAM, or storage device allows, which is far slower than the speed of the CPU. The situation is compounded when multitasking, because the processor must switch back and forth between two or more sets of data streams and programs; CPU resources are depleted and performance suffers. In a dual-core processor, each core handles incoming data strings simultaneously to improve efficiency: while one core is executing, the other can be accessing the system bus or executing its own code. With this advent, both AMD and Intel, the leading producers of processors, have brought dual-core processors into production. [6]

6. INTEL DUAL-CORE TECHNOLOGY

Designed from the ground up for energy-efficient performance, Intel dual-core processors provide productivity-enhancing features and rich multimedia experiences. With their different approach to processor architecture design, Intel dual-core processors have become the standard for desktop, mobile, and server platforms, letting users do many things at once without slowing down. The key features are:

Boost multitasking power with improved performance for highly multithreaded and compute-intensive applications

Reduce costs and use less power with energy-efficient Intel dual-core processors built on the Intel Core microarchitecture.

Enjoy flexibility and the performance to handle robust content creation or intense gaming with multimedia-enabling technologies built in.[7]

7. INTEL ITANIUM 2 PROCESSOR

7.1 Introduction
The Itanium is a 64-bit Intel microprocessor that implements the Intel Itanium architecture. There are basically two processor families in the Intel Itanium architecture: the Itanium and Itanium 2 families. These processors are mostly used in high-performance computing systems and enterprise server solutions. The architecture was originally developed at HP, and HP and Intel later collaborated to build the Itanium series of processors. The first Itanium microprocessor was released in 2001, and more powerful Itanium processors have been released frequently over the past few years. HP produces most Itanium-based systems, but several other manufacturers have also developed systems based on Itanium. As of 2007, Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems. The architecture differs from the earlier x86 architecture and can execute six instructions per cycle.

7.2 History

Development


Explicitly Parallel Instruction Computing (EPIC), which implements a form of Very Long Instruction Word (VLIW) architecture, allows the processor to execute more than one instruction per clock cycle. EPIC came into existence as a replacement for reduced instruction set computing (RISC) processors that could execute only one instruction per cycle. With EPIC, the compiler determines in advance which instructions can be executed at the same time, so the microprocessor simply executes the instructions and does not need elaborate mechanisms to determine on its own which instructions to execute in parallel.
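The idea that an EPIC compiler, rather than the hardware, decides which instructions can run together can be illustrated with a toy scheduler. The sketch below is a simplified, hypothetical model, not Intel's compiler and not the full IA-64 bundling rules: each instruction names the register it writes and the registers it reads, and instructions are packed in program order into groups of at most three (mirroring the three instructions per 128-bit word described in Section 7.3.2), starting a new group whenever an instruction depends on a result produced earlier in the same group.

```python
# Toy model of EPIC-style static scheduling: the "compiler" groups independent
# instructions ahead of time, so the hardware can simply issue each group at once.
# Hypothetical, simplified model; the instruction text is pseudo-assembly.
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    dest: str
    srcs: tuple[str, ...]


def schedule(program: list[Instr], width: int = 3) -> list[list[Instr]]:
    """Pack instructions, in program order, into groups of at most `width`.

    A new group starts when the current group is full, when an instruction reads a
    register written earlier in the group (read-after-write), or when it writes a
    register already written in the group (write-after-write).
    """
    groups: list[list[Instr]] = []
    current: list[Instr] = []
    written: set[str] = set()
    for ins in program:
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (len(current) == width or conflict):
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    prog = [
        Instr("add r1 = r2, r3", "r1", ("r2", "r3")),
        Instr("add r4 = r5, r6", "r4", ("r5", "r6")),
        Instr("sub r7 = r8, r9", "r7", ("r8", "r9")),
        Instr("add r10 = r1, r4", "r10", ("r1", "r4")),  # needs the first two results
        Instr("add r11 = r10, r7", "r11", ("r10", "r7")),  # needs the previous result
    ]
    for cycle, group in enumerate(schedule(prog)):
        print(cycle, [i.text for i in group])
```

Here the five instructions fall into three groups: the three independent operations share one group, and the dependence chain forces the last two into groups of their own. In this model the hardware performs no dependence checking, which is the division of labor EPIC aims for.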
In 1994, HP and Intel jointly developed the IA-64 architecture, which was derived from EPIC. Intel undertook a large development effort on IA-64, expecting that it could sell the architecture to the majority of enterprise systems manufacturers. HP and Intel initiated a large joint development effort with the goal of delivering the first product, codenamed Merced, in 1998. The joint project faced structural problems because Intel and HP used different methodologies and had slightly different priorities. Later, on October 4, 1999, Intel announced the official name of the processor, Itanium.
By the time Itanium was released in June 2001, it was no longer superior to competing RISC and CISC processors. Sales fell short of expectations because of poor yields, relatively poor performance, high cost, and limited software availability, and this lack of software was a serious obstacle to moving forward.

To stimulate development, Intel made thousands of these early systems available to independent software vendors (ISVs). HP and Intel brought the next-generation Itanium 2 processor to market a year later. [8]

Itanium 2 processors: 2002-present


The Itanium 2 was released in 2002, aimed at enterprise servers rather than high-end computing. The initial Itanium 2, codenamed McKinley, used a 180 nm process and relieved many of the performance problems of the original Itanium. In 2003, AMD released the Opteron, which implemented its 64-bit x86-64 architecture. Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Intel responded by implementing x86-64 in its Xeon microprocessors in 2004. Intel released a new Itanium 2 family member, codenamed Madison, in 2003. Madison used a 130 nm process and was the basis of all new Itanium processors until Montecito was released in June 2006. Itanium is not a high-volume product for Intel. Intel does not release production numbers, but one industry analyst estimated the production rate at 200,000 processors per year in 2007. The total number of Itanium servers sold by all vendors in 2007 was about 55,000, compared with 417,000 RISC servers and 8.4 million x86 servers. According to an IDC report, a total of 184,000 Itanium-based systems were sold from 2001 through 2007. Itanium-based system revenue reached 26% in the second quarter of 2008. [8]

7.3 Architecture

7.3.1 The Intel Itanium architecture

Widely referred to as IA-64, the Intel Itanium architecture is a 64-bit, register-rich, explicitly parallel architecture. The base data word is 64 bits, and memory is byte-addressable. The logical address space is 2^64 bytes. The architecture implements predication, speculation, and branch prediction under the control of the compiler; each instruction word includes extra bits for this. It uses a hardware register renaming mechanism, rather than simple register windowing, for parameter passing, and the same mechanism also permits parallel execution of loops. The architecture implements 128 integer registers, 128 floating-point registers, 64 one-bit predicates, and eight branch registers. The floating-point registers are 82 bits long to preserve precision for intermediate results. [8]
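Predication, listed above among the features under compiler control, lets the compiler remove a branch by executing both sides of an if/else under a pair of complementary one-bit predicates and discarding the results whose predicate is false. The Python sketch below simulates that idea for computing the maximum of two values. The register and predicate names imitate IA-64 style, but the code is only a model of the semantics, not real Itanium code.

```python
# Simulated if-conversion with predication (a model of the idea, not real IA-64 code).
#   Branching version:   if r1 > r2: r3 = r1  else: r3 = r2
#   Predicated version:  cmp.gt p1, p2 = r1, r2   ; p1 = (r1 > r2), p2 = not p1
#                        (p1) mov r3 = r1
#                        (p2) mov r3 = r2
# Both moves are issued; only the one whose predicate is true commits its result.


def predicated_max(a: int, b: int) -> int:
    regs = {"r1": a, "r2": b, "r3": 0}
    preds: dict[str, bool] = {}

    # The compare writes a pair of predicates: the result and its complement.
    preds["p1"] = regs["r1"] > regs["r2"]
    preds["p2"] = not preds["p1"]

    # Straight-line code with no jumps: (qualifying predicate, destination, source).
    group = [("p1", "r3", "r1"), ("p2", "r3", "r2")]
    for qp, dest, src in group:
        # The model checks the predicate in Python; in hardware this decides whether
        # the result is committed or squashed, so no branch is taken.
        if preds[qp]:
            regs[dest] = regs[src]
    return regs["r3"]


if __name__ == "__main__":
    print(predicated_max(7, 3), predicated_max(2, 9), predicated_max(5, 5))  # 7 9 5
```

Because both paths become straight-line code, there is nothing for the branch predictor to mispredict, at the cost of issuing an instruction whose result is thrown away.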


Fig 3: Itanium Architecture. Source: http://upload.wikimedia.org/wikipedia/commons/7/7c/Itanium_arch.png

The dual-core Intel Itanium 2 processor has two complete 64-bit processing cores on a single processor, with up to 24 MB of low-latency L3 cache that provides high bandwidth for both cores. It incorporates Hyper-Threading (HT) Technology, which doubles the number of threads each core presents to the operating system, so the operating system sees four times as many threads as on a single-core processor without HT. The large cache together with Hyper-Threading doubles performance compared with earlier Itanium 2 processors. EPIC provides advanced implementations of parallelism, predication, and speculation for a high degree of instruction-level parallelism (ILP), which helps address the requirements of high-end business enterprise and simulation workloads. The dual-core Intel Itanium 2 processor also includes hardware-assisted virtualization that improves virtualization efficiency and broadens operating system compatibility. Combined with dual-core performance improvements and scalability advantages, Intel Virtualization Technology makes dual-core Itanium 2-based systems an excellent platform for data-intensive virtualization. [9] The dual-core Intel Itanium 2 uses 20% less power than earlier Itanium 2 processors while delivering 2.5 times higher performance per watt, which lowers energy requirements while bringing major performance improvements.

The Itanium contains 128 general-purpose and 128 floating-point registers that support rotation, and a register stack engine improves the management of processor resources. Another feature of the Itanium 2 is its support for predication and speculation, which helps improve processing performance. A high-bandwidth system bus provides scalability: the processor supports up to 8.53 GB/s of bandwidth over a 128-bit data bus (64 bits dedicated to each core). It also provides 50 bits of physical memory addressing and 64 bits of virtual addressing. The buses, running at 400 to 533 MHz, are expandable to systems with multiple system buses.
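The bus and addressing figures above follow from simple arithmetic. A minimal sketch, assuming that the quoted MHz figure is read as millions of data transfers per second on the 128-bit (16-byte) bus and that 1 GB means 10^9 bytes; these readings are assumptions, not statements from the cited sources.

```python
# Back-of-the-envelope figures for the Itanium 2 system bus and address space,
# using the numbers quoted in the text. Assumptions: "MHz" is read as millions
# of data transfers per second on the 128-bit bus, and 1 GB = 10**9 bytes.

BUS_WIDTH_BITS = 128


def bus_bandwidth_gb_s(transfers_per_second: float, width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak bandwidth in GB/s = bytes per transfer * transfers per second / 1e9."""
    return (width_bits / 8) * transfers_per_second / 1e9


def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with `address_bits` bits."""
    return 2 ** address_bits


if __name__ == "__main__":
    for mhz in (400, 533):
        print(f"{mhz} MHz x 128-bit bus -> {bus_bandwidth_gb_s(mhz * 1e6):.2f} GB/s")
    print(f"50-bit physical: {address_space_bytes(50) / 2**50:.0f} PiB")
    print(f"64-bit virtual:  {address_space_bytes(64) / 2**60:.0f} EiB")
```

The 533 MHz case reproduces the 8.53 GB/s quoted above, and at 400 MHz the same bus peaks at 6.40 GB/s. These are peak figures; sustained bandwidth is lower.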

7.3.2 Instruction Execution


A 128-bit instruction word (called a bundle) contains three instructions, and two instruction words per clock can be fetched from the cache into the pipeline. When both are fully used, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units. The execution unit groups consist of:

Six general-purpose ALUs, two integer units, one shift unit

Four data cache units

Six multimedia units, two parallel shift units, one parallel multiply unit, one population count unit

Two floating-point multiply-accumulate units, two "miscellaneous" floating-point units

Three branch units [8]
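The three-instructions-per-word layout can be made concrete. In IA-64, each 128-bit bundle holds a 5-bit template field in its lowest bits, which tells the hardware how the slots map to execution unit types and where instruction-group stops fall, followed by three 41-bit instruction slots (5 + 3 x 41 = 128). The sketch below splits a bundle, given as a Python integer, into those fields; the sample bundle value is arbitrary and does not encode real instructions. The last function restates the issue-rate arithmetic from the text.

```python
# Splitting a 128-bit IA-64 bundle into its template and three instruction slots.
# Layout (low bits first): bits 0-4 template, bits 5-45 slot 0,
# bits 46-86 slot 1, bits 87-127 slot 2.  5 + 3 * 41 = 128.

TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3


def decode_bundle(bundle: int) -> tuple[int, list[int]]:
    """Return (template, [slot0, slot1, slot2]) for a 128-bit bundle value."""
    if not 0 <= bundle < 1 << 128:
        raise ValueError("bundle must be a 128-bit unsigned value")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(SLOTS_PER_BUNDLE)
    ]
    return template, slots


def peak_instructions_per_cycle(bundles_per_cycle: int = 2) -> int:
    """Two bundles fetched per clock, three instructions each."""
    return bundles_per_cycle * SLOTS_PER_BUNDLE


if __name__ == "__main__":
    # Arbitrary example value, not a real instruction encoding.
    sample = (0x155 << 87) | (0x0AA << 46) | (0x033 << 5) | 0x10
    template, slots = decode_bundle(sample)
    print(f"template={template:#04x}, slots={[hex(s) for s in slots]}")
    print("peak instructions per cycle:", peak_instructions_per_cycle())
```

Running it prints template 0x10, slots 0x33, 0xaa, and 0x155, and a peak of six instructions per cycle, matching the two-bundles-per-clock figure above.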

7.3.3 Memory architecture


The Itanium 2 processors have three levels of cache. The Level 1 cache is 16 KB for instructions and another 16 KB for data. The Level 2 cache is 256 KB, shared by instructions and data. The Level 3 cache varies from 1.5 MB to 24 MB. The Level 2 cache contains the logic to handle semaphore operations. Main memory is accessed through a bus to an off-chip chipset; the bus transfers 2 x 128 bits per clock cycle. [8]


7.4 Features

Some of the key features of the Itanium include:

EPIC Architecture: With the explicitly parallel instruction computing offered in the Intel Itanium architecture, high-end enterprise and business workloads can be addressed.

Dual-Core Processing: With two complete 64-bit cores in a single processor, the clock speed limitation of single-core processors is addressed.

Intel Hyper-Threading Technology: Compared with a single core, four times as many application threads can run at a time.

Intel Virtualization Technology: Together with scalability and dual-core support, it provides an excellent platform for data-intensive virtualization.

Cache Safe Technology: Minimizes cache errors and allows the processor to keep operating even in the event of errors.

Energy Efficiency: Consumes less power than earlier series, increasing performance per watt.

Security: Faster data encryption, robust memory, and hardware authentication of firmware enable data security.
Features to support flexible platform environments: An IA-32 execution layer is available in the Itanium 2 to run IA-32 application binaries, and the processor contains an abstraction layer that hides processor dependencies. [9]

7.5 Software Support

To improve software compatibility for Itanium, Intel supported the development of effective compilers for the platform, including GCC, Open64, and Microsoft Visual Studio. Itanium is supported by Windows Server 2003, Windows Server 2008, and multiple Linux distributions. Itanium also supports the GCOS mainframe environment from Groupe Bull and several IA-32 operating systems via instruction set simulators. According to the Itanium Solutions Alliance, as of early 2008 over 13,000 applications were available for Itanium-based systems. [8] The Itanium Solutions Alliance also supports Gelato, an Itanium HPC user group and developer community that supports open-source software for Itanium. [8]


7.6 Competition

The Itanium 2 competes in the enterprise server and high-performance computing (HPC) markets. Itanium's major competitors include Sun Microsystems' UltraSPARC IV+, Fujitsu's SPARC64, IBM's POWER6, AMD's Opteron, and Intel's own Xeon server processors. Itanium has had the best floating-point performance relative to fixed-point performance of any general-purpose microprocessor. [8]

8. CONCLUSION

This seminar helped us explore parallel technology in the area of dual-core processors and their architecture. We gained an understanding of dual-core, quad-core, and multi-core development and studied the key features of the Intel Itanium 2 architecture. Overall, dual-core processors have not yet made the expected impact on the market, because thread-level parallelism is not yet used extensively in application development. Sooner or later, however, multi-core processors will be used to their full capability as operating systems and applications become multithreaded.


REFERENCES

[1] http://en.wikipedia.org/wiki/Dual_Core
[2] http://mediacoder.sourceforge.net/wiki/index.php/Multi-Core
[3] www.ccur.com/pdf/Preparing_forthe_multicore_revolution.pdf
[4] http://icrontic.com/articles/dual_core
[5] http://www.widowpc.com/2006/01/dual_core_compu.php
[6] http://www.wisegeek.com/what-is-a-dual-core-processor.htm
[7] http://www.intel.com/technology/computing/dual-core/
[8] http://en.wikipedia.org/wiki/Itanium
[9] http://download.intel.com/products/processor/itanium2/dc_prod_brief.pdf
