
A Characterization of Processor Performance in the VAX-11/780
From the ISCA Proceedings 1984
Emer & Clark
Overview
• The paper used a micro-PC histogram, which
was, at the time, a novel approach to
evaluating processor performance.
• This approach attempted to quantify
actual performance at the
microinstruction level.
• How did they do this and what were the
results?
What is the motivation?
• The VAX 8200 was being designed as the
first VAX microprocessor and the design
included a CPU which was spread across 3
chips with the microcode on another 5 chips.
• Chip crossings were expensive.
• Would a two-level hierarchical microcode
store work better? (Some microcode on the
processor chip and the rest on the microcode
chip.)
Motivation Continued
• The question they were trying to answer
was “With different latencies for different
microinstructions, what would the
performance be?”
• As we will see, this approach did not
answer the question.
Why is the VAX a good candidate
for this type of study?
• Think about studying a CISC. Would this
help in designing a RISC?
• Yes – if you can get quantitative numbers on
the frequency of instructions and on whether
their complex components are used. Then you
can remove instructions that are rarely used
and possibly map these onto a combination of
the instructions that remain. However,
this comes at the cost of not having as many
instructions from which to choose.
Block Diagram
Definitions
What terms does this paper use
and what do they mean?
What Is a Microinstruction?
An actual instruction could be 'add the
contents of registers X and Y'. A
microinstruction would be more like 'write out
register X to bus Z' or 'read data bus into
register Q'. These are very basic actions that
can be woven together to implement the
actual instruction set of the machine. Taken
together, this set of microinstructions is
called the microprogram, or microcode.
Thanks to Neal Harman at
www-compsci.swan.ac.uk/~csneal/HPM/into.html
What is a Read Stall?
Occurs when there is a cache miss on a
D-stream read: the requesting
microinstruction waits while the data is
being retrieved. This takes a minimum
of 6 cycles on the VAX.
What is a Write Stall?
Occurs any time a write is
attempted less than 6 cycles after a
previous write. Can be minimized by a
microprogram that writes no more often
than every 6 cycles.
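The 6-cycle rule can be illustrated with a small sketch. This is a simplified model, not the VAX hardware: the function name and the assumption that each stall pushes back every later write by the same amount are ours.

```python
WRITE_GAP = 6  # minimum cycles between writes, per the definition above


def write_stall_cycles(write_offsets):
    """Total stall cycles for writes attempted at the given cycle offsets.

    Offsets are the cycles at which the microprogram would attempt each
    write if it never stalled, in ascending order.
    Simplified model (an assumption): a write attempted fewer than
    WRITE_GAP cycles after the previous write waits until the gap has
    elapsed, and the wait delays every later attempt.
    """
    total_stall = 0
    last_write = None
    for offset in write_offsets:
        attempt = offset + total_stall  # earlier stalls push this write back
        if last_write is not None and attempt - last_write < WRITE_GAP:
            total_stall += WRITE_GAP - (attempt - last_write)
            attempt = last_write + WRITE_GAP
        last_write = attempt
    return total_stall


back_to_back = write_stall_cycles([0, 1, 2])  # writes on consecutive cycles
spaced = write_stall_cycles([0, 6, 12])       # one write every 6 cycles
print(back_to_back, spaced)
```

Under this model the back-to-back writes stall while the spaced writes do not, which is why a microprogram that spaces its writes avoids the stall.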
What is an IB Stall?
Occurs when there are not enough
bytes in the instruction buffer (IB) to
satisfy the microcode’s request.
Occurs during I-stream processing.
What is an Architectural
Event?
An event that would occur in any
implementation of the VAX architecture.
What is an Implementation
Event?
An event that is dependent on the
particular implementation of the VAX
architecture.
What are Operand Specifiers?
The operand specifiers follow the
opcode in the I-stream and indicate the
type of operand. For example, whether
a read operand is located in memory
addressed by a register or in a register.
Micro-PC Histogram
Techniques
• They developed their own special-purpose
hardware, able to gather data in up to
16,000 addressable count locations, which
incremented a selected location based on the
microcode execution.
• The counter capacity was sufficient to collect
data for 1 to 2 hours of heavy processing
on the CPU.
• The great strength of this approach is its ability to
classify every processor cycle and thus to
establish duration of events.
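The counting idea can be sketched in software. This is only an illustration, not the authors' hardware: the micro-PC addresses, the activity labels, and both function names are hypothetical.

```python
NUM_LOCATIONS = 16_000  # number of count locations cited above


def collect_histogram(micro_pc_trace):
    """Increment one count location per cycle, indexed by the micro-PC."""
    counts = [0] * NUM_LOCATIONS
    for micro_pc in micro_pc_trace:
        counts[micro_pc] += 1
    return counts


def cycles_by_activity(counts, activity_of):
    """Classify every counted cycle using a map from address to activity."""
    totals = {}
    for address, cycles in enumerate(counts):
        if cycles:
            label = activity_of.get(address, "unclassified")
            totals[label] = totals.get(label, 0) + cycles
    return totals


# Hypothetical trace: the micro-PC value observed on each processor cycle.
trace = [0x10, 0x11, 0x11, 0x11, 0x20, 0x10, 0x11, 0x20]
# Hypothetical map from microcode address to the activity it represents.
activity_of = {0x10: "decode", 0x11: "read stall", 0x20: "execute"}

counts = collect_histogram(trace)
totals = cycles_by_activity(counts, activity_of)
print(totals)
```

Because one location is incremented on every cycle, the totals add up to the number of cycles observed, which is what lets the technique attribute every cycle to some activity.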
Micro-PC Histogram
Techniques Continued
• This technique captures data that is
under the direct control of the
microcode.
• Both live and synthetic environments
were used for data collection.
• The VMS Null process, which runs
when the system is idle, was excluded.
Disadvantages of the Micro-PC
Histogram Technique
• Data such as instruction stream memory
references are not part of the data collected
because these events are controlled by the
hardware.
• To save space, some microcode shares
microinstructions. In these cases it is not
possible to differentiate between the sharers of
the code.
• Only average behavior is captured because there
are no mechanisms to capture the variations of
the statistics during the measurement.
Architectural Events
Opcodes
• The technique cannot distinguish all opcodes.
• Opcodes can be grouped together.
• Those classified as Simple (moves,
branches and other simple instructions)
occur much more frequently than other
opcodes. This is no real surprise.
Opcode Frequency
How did they determine
Opcode frequency?
• They counted the number of times each
microinstruction was seen.
• They knew which microinstructions
were included in each Opcode.
• Then, by setting up a system of linear
equations, they were able to “back into”
the Opcode frequencies.
How did they determine
Opcode frequency? – Cont.
• Since some Opcodes shared
microinstructions or even contained
exactly the same microinstructions
(such as add and subtract), the best
data they were able to glean from
solving the linear equations
were frequencies for groups of Opcodes.
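The approach described above can be sketched as a small linear system. This is an illustrative sketch, not the authors' actual equations: the matrix, the counts, and the `solve` helper are hypothetical. Each row is one microinstruction address, each column one opcode group, and each entry is how many times that microinstruction executes per instruction of that group.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular: some opcodes cannot be told apart")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical data: executions of each microinstruction per instruction.
uses_per_instruction = [
    [1, 1, 0],  # microinstruction A
    [0, 2, 1],  # microinstruction B
    [0, 0, 3],  # microinstruction C
]
observed_counts = [60, 25, 15]  # histogram counts for A, B, C

frequencies = solve(uses_per_instruction, observed_counts)
print([int(f) for f in frequencies])

# Two opcodes that use exactly the same microinstructions give identical
# columns, so the system is singular and only their combined count is
# recoverable. That is why such opcodes have to be reported as one group.
try:
    solve([[1, 1], [2, 2]], [30, 60])
except ValueError as error:
    print(error)
```

Exact fractions are used here only to keep the toy example free of rounding noise.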
PC-Changing Instructions

• The most interesting for the purposes of
this paper are the conditional branches
that actually branch.
• PC-changing instructions as a whole
account for 38.5% of all instructions.
PC-Changing Instructions
Memory Operations
• The ratio of reads to writes is about 2 to
1 on the VAX.
• To directly measure the contribution of
each instruction group to overall
performance, the results are given in terms
of events per average instruction.
D-stream Reads and Writes
Average Instruction Size
• This is the only true architectural feature of
the I-stream.
• To calculate this we use the average number
of operand specifiers per instruction, which
they were able to determine was 1.48, and the
average specifier size of 1.68 bytes, which
relies on displacement figures from a previous
paper by the authors.
• Branch displacements are given as 0.31 per
instruction.
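A rough version of the size estimate can be worked through as follows. It reads 1.48 as the average number of specifiers per instruction and 1.68 bytes as the average specifier size, which is our interpretation of the figures above. The one-byte opcode and one-byte branch displacement are assumptions that the slide does not state.

```python
# Figures quoted above (as interpreted in the lead-in)
SPECIFIERS_PER_INSTRUCTION = 1.48
BYTES_PER_SPECIFIER = 1.68
BRANCH_DISPLACEMENTS_PER_INSTRUCTION = 0.31

# Assumptions, not taken from the slide
OPCODE_BYTES = 1.0
BYTES_PER_BRANCH_DISPLACEMENT = 1.0


def estimated_instruction_size():
    """Average instruction size in bytes: opcode + specifiers + displacements."""
    specifier_bytes = SPECIFIERS_PER_INSTRUCTION * BYTES_PER_SPECIFIER
    displacement_bytes = (BRANCH_DISPLACEMENTS_PER_INSTRUCTION
                          * BYTES_PER_BRANCH_DISPLACEMENT)
    return OPCODE_BYTES + specifier_bytes + displacement_bytes


size = estimated_instruction_size()
print(f"estimated average instruction size: {size:.1f} bytes")
```

Under these assumptions the specifiers contribute most of the bytes, so the estimate is most sensitive to the two specifier figures.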
Estimated Size of Instruction
Implementation Events
I-stream References
• The VAX uses an 8-byte Instruction Buffer
(IB).
• This buffer makes a cache reference any time
one or more bytes are empty.
• The IB is controlled by hardware so the
micro-PC histogram does not include counts
of IB references.
• A previous paper found that there were 2.2
cache references per instruction on average.
Cache Misses
• Cache is controlled by hardware, so we
have to rely on previous work, which
concluded that there are 0.28 cache read
misses per instruction, of which 0.18 were
due to the I-stream and 0.10 to the D-stream.
• These misses are referred to as
microcode stalls.
Translation Buffer (TB) Misses
• The TB is controlled by microcode, so it can be
measured.
• The TB miss triggers a trap and the cycles are
counted by a micro-routine for the duration of the
trap.
• The results were 0.029 misses per instruction (0.02
for D-stream and 0.009 for I-stream).
• The average number of cycles used to service a
miss was 21.6, of which 3.5 were read stalls
because the page table was not in cache.
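The figures above combine into a per-instruction cost by multiplying the miss rate by the service time. The sketch below does only that arithmetic. The variable names are ours, and the result is derived from the slide's numbers rather than quoted from the paper.

```python
# Figures quoted above
TB_MISSES_PER_INSTRUCTION = {"D-stream": 0.020, "I-stream": 0.009}
CYCLES_PER_MISS = 21.6             # average cycles to service one TB miss
READ_STALL_CYCLES_PER_MISS = 3.5   # portion of those cycles spent stalled

misses = sum(TB_MISSES_PER_INSTRUCTION.values())
tb_cycles_per_instruction = misses * CYCLES_PER_MISS
stall_cycles_per_instruction = misses * READ_STALL_CYCLES_PER_MISS

print(f"TB misses per instruction:       {misses:.3f}")
print(f"TB miss cycles per instruction:  {tb_cycles_per_instruction:.2f}")
print(f"  of which read-stall cycles:    {stall_cycles_per_instruction:.2f}")
```

On these numbers, TB miss handling costs roughly 0.63 cycles per average instruction.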
Stalls
• The occurrence and duration of read,
write and IB stalls are all specific to the
implementation.
• The duration is measurable by the
micro-PC technique but the frequency is
not.
Cycles per Instruction
Average Cycles per
Instruction
Cycles per Instruction by
Group
Summary – What is Important
and Applicable Today?
• This paper was one of the first to begin
to quantify performance.
• Today quantifying performance is
expected and there are many
benchmarks that have been developed
by which systems can be compared.
Summary – What is Important
and Applicable Today?
• Based on this and other work,
architectural design has moved from
an art form to a science.
• The fact that 83.6% of all instructions
are simple supports the idea of RISC.
Summary – What is not as
Relevant Today?
• Floating point was a very small percentage of
instructions (3.62%). Today the number of
floating point instructions would probably be
much higher because of all the graphics used
in modern computers.
• Call/Ret, which was 3.22%, would probably be
much higher today with the advent of
object-oriented languages, which use numerous
function calls, overloaded and virtual
functions, etc.
