
Intel Itanium 2 Processor

Intel's Server Solution


Raymond Ball
April 2, 2004

Presentation Overview
Why Intel Itanium 2 in a DSP class?
General specifications and features
Instruction set
DSP in Itanium 2
Itanium 2 vs. TigerSHARC (?)

Why Itanium 2
Itanium 2 is designed for heavily loaded, number-crunching servers, a workload with some similarities to DSP
It's always a good idea to see what other solutions are available
Designs tend to borrow ideas from other fields over time, which may give insight
To see whether the power of the processor is really worth the cost
Because I was interested

Specifications (April 2004)

Clock: 0.9-1.5 GHz
L3 cache: up to 6 MB
64-bit
128-bit bus (400 MHz)
Price: $3k-$5k each
IA-32 compatible
Considered RISC
Pipeline: 8 stages deep
6 instructions per cycle, in 2 bundles of 3
Power consumption: 110 W (130 W max)
Registers: 128 general + 128 floating-point + 64 predicate + 8 branch
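
The bus and issue figures above imply some peak rates that are easy to check by hand. The following is a minimal sketch, assuming that 400 MHz is the effective transfer rate of the bus and that every cycle issues two full bundles (a best case that real code rarely sustains). The variable names are illustrative only.

```python
# Back-of-the-envelope peak rates from the specification list.
# Assumption: 400 MHz is the effective transfer rate of the 128-bit bus.
BUS_WIDTH_BITS = 128
BUS_RATE_HZ = 400e6
CLOCK_HZ = 1.5e9       # top of the 0.9-1.5 GHz range
OPS_PER_CYCLE = 6      # 2 bundles of 3 operations

bus_bytes_per_s = BUS_WIDTH_BITS / 8 * BUS_RATE_HZ
peak_ops_per_s = OPS_PER_CYCLE * CLOCK_HZ

print(f"Peak bus bandwidth: {bus_bytes_per_s / 1e9:.1f} GB/s")
print(f"Peak issue rate:    {peak_ops_per_s / 1e9:.1f} G operations/s")
```

The bandwidth works out to 6.4 GB/s, the same figure quoted for the bus on the Anti-DSP slide.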

Register Stack Engine (RSE)


First 32 registers are global (static)
GR0 is hardwired to 0
Seen this in SHARC, because an immediate will kill the pipeline
GR32 to GR63: local procedure registers
The remaining 96 registers are used to store stacked register frames
If more room is needed, the oldest registers are spilled to memory
Transparently maintains the illusion of an infinite number of registers
Only for the GRs (other registers are all global)
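
The spill-and-refill idea can be illustrated with a toy model. This is a sketch under loud assumptions: it is not the real RSE algorithm, it only counts registers rather than storing values, and the class and method names (`ToyRegisterStack`, `call`, `ret`) are hypothetical. The only number taken from the slide is the pool of 96 stacked registers.

```python
# Toy model of a register stack: NOT the real RSE algorithm, only the idea
# that frames live in a fixed pool and the oldest registers spill to memory.
PHYSICAL_STACKED = 96  # stacked general registers, per the slide


class ToyRegisterStack:
    def __init__(self, capacity=PHYSICAL_STACKED):
        self.capacity = capacity
        self.frames = []        # frame sizes, oldest first
        self.in_registers = 0   # stacked registers currently held on chip
        self.spilled = 0        # registers currently saved in memory

    def call(self, frame_size):
        """Allocate a frame for a new procedure, spilling if needed."""
        if frame_size > self.capacity:
            raise ValueError("frame larger than the physical register pool")
        self.frames.append(frame_size)
        self.in_registers += frame_size
        overflow = self.in_registers - self.capacity
        if overflow > 0:
            # The oldest registers are written out to memory to make room.
            self.in_registers -= overflow
            self.spilled += overflow

    def ret(self):
        """Return from the current procedure, refilling the caller's frame."""
        self.in_registers -= self.frames.pop()
        if self.frames:
            # Bring back whatever part of the caller's frame was spilled.
            missing = max(0, self.frames[-1] - self.in_registers)
            refill = min(missing, self.spilled)
            self.in_registers += refill
            self.spilled -= refill


rs = ToyRegisterStack()
for _ in range(3):
    rs.call(40)            # three nested calls, 40 registers each
print(rs.in_registers, rs.spilled)
```

Three nested 40-register frames need 120 registers, so the model keeps 96 on chip and reports 24 spilled. The procedures themselves never see the difference, which is the "illusion of an infinite number of registers" described above.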

Instruction set
Instructions come in bundles of 3 operations, and 2 bundles are fetched per cycle
Uses the Explicitly Parallel Instruction Computing (EPIC) format
The format moves the responsibility for resource management onto the compiler
The template value dictates which type of execution unit each operation is sent to

Bundle format (128 bits):
Slot 2: bits 127-87
Slot 1: bits 86-46
Slot 0: bits 45-5
Template: bits 4-0
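
The bundle layout can be made concrete by splitting a 128-bit value into its fields. This is a minimal sketch of the bit arithmetic only: the function names are hypothetical, and the slot values in the usage lines are arbitrary placeholders, not real instruction encodings.

```python
# Split a 128-bit bundle into a 5-bit template and three 41-bit slots.
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1


def decode_bundle(bundle):
    """Return (template, slot0, slot1, slot2) for a 128-bit integer."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)   # bits 4..0
    slot0 = (bundle >> 5) & SLOT_MASK                # bits 45..5
    slot1 = (bundle >> 46) & SLOT_MASK               # bits 86..46
    slot2 = (bundle >> 87) & SLOT_MASK               # bits 127..87
    return template, slot0, slot1, slot2


def encode_bundle(template, slot0, slot1, slot2):
    """Inverse of decode_bundle, used here only to build a test value."""
    return template | (slot0 << 5) | (slot1 << 46) | (slot2 << 87)


# Placeholder field values, chosen only to exercise every bit position.
fields = (0x10, 0x123456789, 0x1FFFFFFFFFF, 0x0AAAAAAAAAA)
bundle = encode_bundle(*fields)
print(decode_bundle(bundle) == fields)
```

The field widths add up as 5 + 3 x 41 = 128, which is why a bundle fills exactly one 128-bit word.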

Bundled Code Example


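{ .mii
add r1 = r2, r3
sub r4 = r5, r6 ;;      // stop: ends the cycle 0 group
shr r7 = r8, r9
}
{ .mfi
ld4 r14=[r56]
fadd f10=f12,f13
add r16=r18,r19 ;;      // stop: r16 must be written before the store uses it
}
{ .mmi
st4 [r16]=r67 ;;        // stop: the store issues alone in cycle 2
add r24=r56,r57
add r28=r58,r59
}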

Cycle 0: the first two operations of the Memory-Integer-Integer bundle
Cycle 1: the rest of that bundle plus the whole Memory-Float-Integer bundle
Cycle 2: a single operation (the store)
Cycle 3: the last two operations in the snippet
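
The cycle breakdown follows from where the stops (`;;`) fall: operations between two stops form one group that can issue together, provided no operation in the group uses a register written earlier in the same group. The sketch below reproduces that grouping. It is a deliberate simplification: the real architecture has more rules and exceptions, a group does not always map to exactly one cycle, and the helper names are hypothetical. The operation list includes a stop after `add r16`, which the cycle listing implies.

```python
import re

# Simplified model: split operations into groups at stops (";;") and flag
# register dependencies inside a group. Real hardware rules are richer.


def find_regs(text):
    """Return the set of register names (r<N> or f<N>) found in text."""
    return set(re.findall(r"\b[rf]\d+\b", text))


def instruction_groups(lines):
    """Group operation strings; a trailing ';;' ends the current group."""
    groups, current = [], []
    for line in lines:
        stop = line.rstrip().endswith(";;")
        current.append(line.replace(";;", "").strip())
        if stop:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def has_dependency(group):
    """True if an operation touches a register written earlier in the group."""
    written = set()
    for op in group:
        left, _, right = op.partition("=")
        reads, writes = find_regs(right), find_regs(left)
        if op.startswith("st"):      # a store only reads its registers
            reads, writes = reads | writes, set()
        if (reads | writes) & written:
            return True
        written |= writes
    return False


ops = [
    "add r1 = r2, r3",
    "sub r4 = r5, r6 ;;",
    "shr r7 = r8, r9",
    "ld4 r14=[r56]",
    "fadd f10=f12,f13",
    "add r16=r18,r19 ;;",
    "st4 [r16]=r67 ;;",
    "add r24=r56,r57",
    "add r28=r58,r59",
]
groups = instruction_groups(ops)
for cycle, group in enumerate(groups):
    print(cycle, len(group), has_dependency(group))
```

The group sizes come out as 2, 4, 1 and 2, matching cycles 0 to 3 above. Dropping the stop after `add r16` would merge the store into the second group and make `has_dependency` report a conflict on `r16`.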

Save me, compiler!
The instruction set and pipeline are so difficult to handle that you won't do much better than the compiler
With the EPIC architecture, more resource management is put on the compiler, which means extra work for human compilers
The most efficient DSP algorithms tend to come from human compilers

Difficult to utilize all of the system resources the way a hand-made DSP algorithm does

What's wrong with r1 = r2 + r3?

DSP Relation
How does the instruction set compare to a DSP processor?

RISC-type instruction set
For example, no mem-to-mem move

Itanium 2 could easily be used to run a DSP algorithm efficiently
The Itanium 2 includes basically every trick in the book thus far, including ideas borrowed from DSP

Pro-DSP
Many single cycle instructions
Instructions are designed for a heavily pipelined
environment
Processor has ways of accessing the data in a SIMD fashion (8x8-bit, 4x16-bit, 2x32-bit, 1x64-bit)
High-precision registers (82-bit floating-point accumulator)

People wonder whether 64-bit processing is necessary; well, THIS is where it's necessary


High number of registers for fast access
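
The SIMD point can be illustrated in software: one 64-bit word is treated as four independent 16-bit lanes and all four are added in a single pass. This is a sketch of the general technique, comparable in spirit to a hardware parallel add but not a model of any specific Itanium 2 instruction. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
LOW_MASK = 0x7FFF7FFF7FFF7FFF    # every bit except each lane's top bit
HIGH_MASK = 0x8000800080008000   # the top bit of each 16-bit lane


def pack16(lanes):
    """Pack four 16-bit values (lane 0 in the low bits) into one word."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFFFF) << (16 * i)
    return word


def unpack16(word):
    """Split a 64-bit word back into four 16-bit lane values."""
    return [(word >> (16 * i)) & 0xFFFF for i in range(4)]


def parallel_add16(a, b):
    """Add four 16-bit lanes at once; each lane wraps around on its own."""
    # Adding only the low 15 bits of each lane cannot carry into a neighbor.
    partial = (a & LOW_MASK) + (b & LOW_MASK)
    # The top bit of each lane is then fixed up without generating a carry.
    return (partial ^ ((a ^ b) & HIGH_MASK)) & MASK64


a = pack16([1, 0xFFFF, 0x8000, 1000])
b = pack16([2, 1, 0x8000, 24])
print(unpack16(parallel_add16(a, b)))
```

The lanes sum to 3, 0, 0 and 1024: the second and third lanes overflow and wrap to zero without disturbing their neighbors, which is the property that makes one wide register behave like four small ones.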

Anti-DSP
No hardware loops
No hardware circular buffers
Only a single bus (although a fast one: 6.4 GB/s)
High power usage
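
To see what the first two items on this list cost, consider an FIR filter whose delay line is a circular buffer managed in software. This is a minimal sketch, taking the list above at its word that there is no hardware help for loops or circular addressing. The class name `CircularFIR` and the coefficients are illustrative, not taken from the presentation.

```python
class CircularFIR:
    """FIR filter whose delay line is a software circular buffer."""

    def __init__(self, coefficients):
        self.coeffs = list(coefficients)
        self.delay = [0.0] * len(self.coeffs)
        self.head = 0   # index of the most recent sample

    def step(self, sample):
        n = len(self.coeffs)
        # Explicit wrap-around: hardware circular addressing does this for free.
        self.head = (self.head - 1) % n
        self.delay[self.head] = sample
        acc = 0.0
        # Explicit counter and branch: a hardware loop removes this overhead.
        for k in range(n):
            acc += self.coeffs[k] * self.delay[(self.head + k) % n]
        return acc


fir = CircularFIR([0.5, 0.25, 0.125])
impulse_response = [fir.step(x) for x in (1.0, 0.0, 0.0, 0.0)]
print(impulse_response)
```

Feeding in a unit impulse returns the coefficients in order followed by zero, which confirms the indexing. Every `% n` and every trip through the `for` loop is work that a DSP with hardware circular buffers and zero-overhead loops would not spend instructions on.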

TigerSHARC vs. Itanium 2


COST! ($0.3k vs. $3k)
Both heavily pipelined
Both very hard to code by hand
There really is no comparison

The processors were made for two different purposes
The framework that is typically built around the chips makes it even harder to compare

Conclusion
You get what you pay for, or maybe a little less

The Itanium 2 is considered a high-end server processor
Anything high-end tends to be very overpriced (rack-mount equipment)

Sure, it's a DSP processor, but for that price it should make you toast in the morning too

References
Intel Itanium 2 Processor Hardware Developer's Manual
Intel Itanium 2 Processor Reference Manual
S. Rusu, J. Stinson, S. Tam, J. Leung, H. Muljono, and B. Cherkauer, "A 1.5-GHz 130-nm Itanium 2 Processor With 6MB On-die L3 Cache," IEEE Journal of Solid-State Circuits, vol. 38, no. 11, November 2003.
