Multicore Processor
•In multi-core chips:
-L1 caches are private
-L2 caches are private in some architectures and shared in others
•Memory is always shared
[Figure: CORE0 and CORE1, each running hyper-threads and each with its own L1 cache, above a shared L2 cache and memory]
Private vs shared caches
Advantages of private:
They are closer to core, so faster access
Reduces contention
Advantages of shared:
Threads on different cores can share the same
cache data
More cache space available if a single (or a few)
high-performance thread runs on the system
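The capacity argument for shared caches can be illustrated with a toy model. This is only a sketch: it assumes a fully associative cache with least-recently-used replacement, and the class name `LRUCache`, the helper `hit_rate`, the cache sizes and the access pattern are all made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[address] = True
        return False


def hit_rate(cache, trace):
    """Fraction of accesses in `trace` that hit in `cache`."""
    hits = sum(cache.access(a) for a in trace)
    return hits / len(trace)


# Hypothetical workload: one busy thread loops over 6 lines, the other core is idle.
trace = [i % 6 for i in range(600)]

private_rate = hit_rate(LRUCache(4), trace)  # the core's own 4-line private cache
shared_rate = hit_rate(LRUCache(8), trace)   # two 4-line caches pooled into one shared cache
```

In this model the 6-line working set does not fit in the 4-line private cache, so every access misses, while the pooled 8-line shared cache misses only on the first pass. The model ignores the latency and contention advantages of private caches listed above.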
Features
Wide Dynamic Execution
Intelligent Power Capability
Advanced Smart Cache
Smart Memory Access
Dynamic execution is a set of techniques (data flow analysis, out-of-order
execution, superscalar execution)
Wide Dynamic Execution is the set of these techniques implemented in the
Core microarchitecture; it increases parallelism in instruction
execution and reduces energy consumption.
It lets each core execute up to four instructions per clock
Deeper instruction buffers for greater execution flexibility
Additional features to reduce execution time
◦ Macrofusion – two instructions can be fused into one micro-op
◦ Microfusion – two micro-ops can be fused into a single micro-op
to save time
◦ The ALUs were enhanced to make macrofusion more effective
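Macrofusion can be pictured as a decoder pass that merges adjacent instructions. The sketch below is a simplified illustration, not Intel's decoder: it assumes that the fusible pair is a compare (`cmp` or `test`) directly followed by a conditional jump, treats every other instruction as exactly one micro-op, and uses a hypothetical `decode` function and instruction tuples.

```python
FUSIBLE_FIRST = {"cmp", "test"}                # assumed fusible compare-type instructions
CONDITIONAL_JUMPS = {"je", "jne", "jl", "jg"}  # small illustrative subset


def decode(instructions):
    """Turn a list of (mnemonic, operands) pairs into a list of micro-ops.

    A compare immediately followed by a conditional jump becomes one fused
    micro-op; in this simplified model every other instruction becomes one
    micro-op on its own.
    """
    micro_ops = []
    i = 0
    while i < len(instructions):
        mnemonic, operands = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if mnemonic in FUSIBLE_FIRST and nxt and nxt[0] in CONDITIONAL_JUMPS:
            micro_ops.append(f"{mnemonic}+{nxt[0]} {operands} -> {nxt[1]}")
            i += 2  # both instructions were consumed by the fused micro-op
        else:
            micro_ops.append(f"{mnemonic} {operands}")
            i += 1
    return micro_ops


program = [
    ("mov", "eax, [mem]"),
    ("cmp", "eax, ebx"),
    ("jne", "loop_start"),
    ("add", "ecx, 1"),
]
fused = decode(program)
```

Here four instructions decode into three micro-ops, so the compare-and-branch pair occupies one slot in the later pipeline stages instead of two.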
The Advanced Smart Cache is a multi-core
optimized cache that significantly reduces latency
to frequently used data, thus improving
performance and efficiency by increasing the
probability that each execution core of a multi-
core processor can access data from a high-
performance, more efficient cache subsystem.
The L2 cache is shared by the cores, so if one core uses
less of the L2, the other can use a larger part of it.
In addition, data common to both cores is stored
in only one copy.
Smart Memory Access
Smart Memory Access includes:
Memory disambiguation
Advanced prefetching
Memory disambiguation - increases the efficiency of out-of-order
processing by providing the execution cores with the built-in intelligence
to speculatively load data for instructions that are about to execute
before all previous store instructions are executed.
Advanced prefetching – the prefetcher brings data into a higher-level unit
ahead of time using highly speculative algorithms. It is designed to provide
data that is very likely to be requested soon, which can reduce latency and
increase efficiency.
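The slides do not say which prefetching algorithms are used. As a generic illustration, the sketch below models one simple scheme, a stride prefetcher: once two consecutive accesses show the same stride, the next line along that stride is fetched speculatively. The function `simulate`, the unbounded cache and the access pattern are assumptions made for the example.

```python
def simulate(addresses, prefetch=True):
    """Count demand misses for a sequence of cache-line addresses.

    The cache is modeled as an unbounded set. With prefetch enabled, once two
    consecutive accesses show the same non-zero stride, the next line along
    that stride is fetched ahead of time.
    """
    cache = set()
    misses = 0
    last_addr = None
    last_stride = None
    for addr in addresses:
        if addr not in cache:
            misses += 1
            cache.add(addr)
        if last_addr is not None:
            stride = addr - last_addr
            if prefetch and stride != 0 and stride == last_stride:
                cache.add(addr + stride)  # speculative fetch of the predicted line
            last_stride = stride
        last_addr = addr
    return misses


stream = list(range(0, 64, 4))  # hypothetical access pattern with a constant stride of 4
misses_without = simulate(stream, prefetch=False)
misses_with = simulate(stream, prefetch=True)
```

With this regular pattern only the first three accesses miss when prefetching is on, because the stride has to be observed twice before it can be predicted. An irregular pattern would gain little, and a wrong prediction would waste bandwidth, which is why real prefetchers are tuned conservatively.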
Intelligent Power Capability
Several measures are adopted that start at
manufacturing level:
The 65-nm process provides a good basis for
efficient ICs
Clock gating and sleep transistors make sure that
all units, as well as individual transistors, that are not
needed remain shut down
Enhanced SpeedStep still reduces the clock speed
when the system is idle or under a low load, but it
is also capable of controlling each core separately
Voltage can also be different in different blocks of
the processor
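Why lowering clock speed and voltage together saves so much power can be sketched with the usual first-order approximation for CMOS switching power, P = C · V² · f. This formula and all numbers below are assumptions for illustration; they are not taken from the slides or from any datasheet.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Approximate switching power of CMOS logic: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


# Hypothetical operating points, not taken from any datasheet.
C = 1.0e-9                                # effective switched capacitance in farads
full_load = dynamic_power(C, 1.2, 2.4e9)  # 1.2 V at 2.4 GHz
low_load = dynamic_power(C, 0.9, 1.2e9)   # 0.9 V at 1.2 GHz

ratio = low_load / full_load              # fraction of full-load power used at low load
```

Halving the frequency alone would halve the power in this model; also dropping the voltage from 1.2 V to 0.9 V brings it down to roughly 28% of the full-load value, because voltage enters squared.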
Why multi-core?
•Difficult to make single-core clock
frequencies even higher.
•Deeply pipelined circuits:
-heat problems
-speed of light problems
-difficult design and verification
-large design teams necessary
-need for air-conditioning (cooling)
•Many new applications are
multithreaded (Adobe Photoshop is an
example of multithreaded software)
Intel processor branding
The important points are as follows:
1) The new brand is Intel Core. There will be
three derivatives: Core i7, Core i5 and Core i3
2) The Core 2 Duo and Core 2 Quad branding will
eventually disappear
3) Pentium, Celeron and Atom will remain.
4) Centrino will also go away and Intel's WiFi and
WiMAX products will inherit the name starting
in 2010
Intel Core 2 Duo
In 2006 Intel introduced the dual-core Core 2 Duo
processors; they were regarded as the most
powerful CPUs of their time, which was particularly
important to the company
They are based on the Core microarchitecture and
produced using 65 nm technology
At launch, the top desktop model
was the Intel Core 2 Extreme X6800,
designed for high-end gaming computers
Frequency 2.93 GHz, FSB 1066 MHz, 4 MB L2
cache, price $999
Intel Core i7
Intel Core i7 is a family of several Intel
desktop and laptop 64-bit processors, the first
processors released using the Intel Nehalem
microarchitecture
The Core i7 design is based on the Core 2
processors of the time but has been extensively revised
Core i7:
integrates four cores into a single chip
brings the memory controller onboard, and
introduces a low-latency point-to-point interconnect
called QuickPath to replace the front-side bus
Intel Core i5
On September 8, 2009, Intel released the
first Core i5 processor: the Core i5 750, which
is a 2.66 GHz quad-core Lynnfield processor
with Hyper-Threading disabled
Lynnfield Core i5 processors have an 8 MB L3
cache, a DMI bus running at 2.5 GT/s and
support for dual-channel DDR3-800/1066/1333
memory
Intel Core i5 - 2009
Max. CPU clock - 2.66 GHz
DMI speed - 2.5 GT/s
Min. feature size - 45 nm to 32 nm
Instruction set - x86, x86-64, MMX, SSE, SSE2,
SSE3, SSSE3, SSE4.1, SSE4.2
Microarchitecture - Intel Nehalem, Intel Westmere
Cores - 2-4
Sockets - LGA 1156
Core names - Arrandale, Clarkdale, Lynnfield
What separates a Core i7 from a Core i5 and
Core i3?
Desktop processor   Cores    Threads   Turbo
Intel Core i7       4        8         Yes
Intel Core i5       2 or 4   4         Yes
Intel Core i3       2 or 4   4         No

Mobile processor    Cores    Threads   Turbo
Intel Core i7       2 or 4   4 or 8    Yes
Intel Core i5       2 or 4   4         Yes
Intel Core i3       2 or 4   4         No
What applications benefit
from multi-core?
Database servers
Web servers (Web commerce)
Compilers
Multimedia applications
Scientific applications,
CAD/CAM
(Each can run on its own core)
In general, applications with
thread-level parallelism (as
opposed to instruction-level
parallelism)
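Thread-level parallelism means the work splits into independent pieces that can each run on its own core. A minimal sketch, assuming a made-up workload (counting primes) and hypothetical helper names `count_primes`, `split_range` and `parallel_count`; it uses worker processes rather than threads because standard CPython threads do not run CPU-bound code on several cores at once.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous chunks of near-equal size."""
    step = max(1, -(-limit // parts))  # ceiling division, at least 1
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count(limit, workers=None):
    """Count primes below `limit`, handing one independent chunk to each worker."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))
```

A script would call `parallel_count(100_000)` from under an `if __name__ == "__main__":` guard, which the process pool needs on platforms that start workers by re-importing the main module. The chunks share no state, which is what lets each one occupy a separate core; a workload whose steps depend on each other would not speed up this way.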
Conclusion
• Multi-core chips are an
important new trend in
computer architecture