
IBM Power Processors

POWER Series and POWER Architecture


Vikhyat Babel (13BEC122)
Department of Electronics and Communication
Nirma University, Ahmedabad

Abstract: IBM POWER processors were among the fastest and most
efficient processors of their respective generations. The
transition from the original POWER instruction set to the
Power ISA improved their performance and efficiency with each
generation. From POWER1 to POWER8, each member of the series
made its own contribution.
Keywords: POWER ISA, PowerPC, POWER Architecture

I. INTRODUCTION
IBM developed a highly efficient microprocessor series called
POWER, followed by a number of derivatives: POWER1, POWER2,
POWER3, and so on up to POWER8. IBM's RS/6000, AS/400,
pSeries, iSeries, System p, System i and Power Systems lines
of servers and supercomputers use these processors. They are
also used in data storage devices from manufacturers such as
IBM, Bull and Hitachi. IBM began work on the first POWER
processor in the 1980s, and the research behind it led to the
development of the later members of the series. Initially,
these processors used the POWER Instruction Set Architecture
(ISA). The POWER ISA later led to the development of PowerPC
and the Power Architecture. The naming scheme of the
processors has remained the same to the present day, although
current-generation processors have moved from the POWER ISA
to the Power Architecture. Developed by IBM, POWER is a
RISC-based instruction set architecture. The name stands for
Performance Optimization With Enhanced RISC. The processors
that implement it are POWER1 and POWER2. When IBM introduced
POWER3, which was based on the successor ISA, PowerPC, the
original POWER ISA was deprecated. POWER3 was mainly a
32/64-bit PowerPC processor. After that, no further
development took place on the POWER ISA.
II. HISTORY
In the second half of the 1980s, IBM began the work that
resulted in the POWER architecture. It was first used in the
RS/6000 computers, which were introduced in 1990. Their
RIOS-1 processor was later called POWER1. The AIM alliance of
Apple, IBM and Motorola later developed the PowerPC
architecture. IBM went on to enhance its POWER architecture,
which led to the eight-chip POWER2 processor, introduced in
1993. Its single-chip edition, the POWER2 Super Chip, was
introduced in 1996. In the early 1990s, IBM decided to
replace the CISC instruction set of its AS/400 minicomputers
with a RISC architecture. The new architecture was developed
under the code name Amazon and was later known as PowerPC-AS
(PowerPC Advanced Series). In 1998, POWER3 was introduced as
a unification of the PowerPC and POWER2 architectures. In
2001, IBM introduced POWER4, which superseded both PowerPC-AS
and POWER3. IBM then introduced POWER5, an evolved version of
POWER4. Version 2.06 of the Power ISA specification was
released in February 2009 and revised in July 2010. In
February 2010, IBM launched the POWER7 processor. In August
2013, IBM announced the OpenPOWER Foundation, an initiative
to foster new ideas and collaboration in the server and data
centre space. It also opened up the licensing of
POWER8-related technologies. The POWER8 processor was
revealed at the same time. It is manufactured on a 22 nm
process, with twelve eight-way multithreaded cores running at
4 GHz.

FIG. 1 EVOLUTION OF POWER PROCESSORS [5]

III. THE POWER SERIES

Transaction-processing workloads on the POWER2+, an enhanced
version of the POWER2, benefited from the inclusion of an L2
cache with a capacity of 512 KB, 1 MB or 2 MB. A 64- or
128-bit bus connected this cache to the POWER2+. The cache
was direct-mapped and write-through, with a 128-byte line
size. The storage control chip contained the cache tags. The
memory bus of the POWER2+ is slightly narrower than that of
the POWER2, and its data cache is smaller. The data cache
chips are correspondingly smaller, since less cache is
present.
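
As a rough illustration of what a direct-mapped cache with 128-byte lines implies, the sketch below splits an address into offset and index fields for the three L2 capacities mentioned above. The function name and the assumption of byte addressing with power-of-two sizes are illustrative choices, not details taken from IBM documentation.

```python
def direct_mapped_fields(cache_bytes, line_bytes):
    """Return (offset_bits, index_bits, num_lines) for a direct-mapped cache.

    Assumes byte addressing and power-of-two sizes (an illustrative
    simplification, not a description of the actual POWER2+ hardware).
    """
    num_lines = cache_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = num_lines.bit_length() - 1     # log2(number of lines)
    return offset_bits, index_bits, num_lines


KB = 1024
for size_kb in (512, 1024, 2048):
    offset_bits, index_bits, num_lines = direct_mapped_fields(size_kb * KB, 128)
    print(f"{size_kb:>4} KB L2: {num_lines} lines, "
          f"{offset_bits} offset bits, {index_bits} index bits")
```

Because the cache is direct-mapped, each address maps to exactly one line, so the index alone selects the line and only a single tag comparison is needed.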

A. IBM POWER 1
The POWER1 is a multi-chip processor developed by IBM that
implements the POWER ISA. It was originally called the RISC
System/6000 CPU, or RS/6000 CPU, until the arrival of its
successors required a new name to distinguish it from later
designs. It is a 32-bit, two-way superscalar processor with
three execution units: a fixed-point unit, a branch unit and
a floating-point unit. Although it has a 32-bit physical
address, its virtual address is 52 bits long. The larger
virtual address space benefited application performance by
allowing each application a large 4 GB address range. The
POWER1 uses a Harvard-style cache hierarchy with separate
instruction and data caches. The instruction cache, located
in the instruction cache unit (ICU), is 8 KB in size and
two-way set associative with a 64-byte line size. The data
cache is 32 or 64 KB in size and four-way set associative
with a 128-byte line size. The POWER1 was a high-end design,
but it did not support multiprocessing. IBM used clustering
to work around this limitation, allowing POWER1 systems to
operate as if they were a multiprocessing system.
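
The address widths and cache parameters quoted above can be turned into concrete sizes with a few lines of arithmetic. This is only a worked example; the helper name `num_sets` is an illustrative choice, and the set counts assume a conventional set-associative organisation.

```python
def num_sets(cache_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache (conventional layout assumed)."""
    return cache_bytes // (ways * line_bytes)


KB = 1024
GIB = 2**30
PIB = 2**50

physical_bytes = 2**32   # 32-bit physical address
virtual_bytes = 2**52    # 52-bit virtual address

print(physical_bytes // GIB, "GiB of physical address space")
print(virtual_bytes // PIB, "PiB of virtual address space")

# Instruction cache: 8 KB, two-way, 64-byte lines
print("I-cache sets:", num_sets(8 * KB, ways=2, line_bytes=64))
# Data cache: 32 KB or 64 KB, four-way, 128-byte lines
print("D-cache sets (32 KB):", num_sets(32 * KB, ways=4, line_bytes=128))
print("D-cache sets (64 KB):", num_sets(64 * KB, ways=4, line_bytes=128))
```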
B. IBM POWER 2
The POWER2 is a processor designed and developed by IBM that
implements the POWER instruction set architecture. It
succeeded the POWER1 in the POWER series and was the fastest
microprocessor at the time of its introduction. Improvements
over the POWER1 included enhancements to the instruction set
architecture (new user and system instructions), a higher
clock rate (55 to 71.5 MHz), an additional fixed-point unit
and an additional floating-point unit, a larger instruction
cache, and a larger data cache. It was a multi-chip design
comprising 6 or 8 integrated circuits, depending on the
amount of data cache. The partitioning of the design was
similar to that of the POWER1: an instruction cache chip, a
fixed-point unit chip, a floating-point unit chip, a storage
control chip, and two or four data cache chips. The
eight-chip configuration contained approximately 23 million
transistors. An enhanced version of the POWER2, optimized
for transaction and enterprise processing, was introduced as
the POWER2+.
C. IBM POWER 3
POWER3 is a microprocessor that implements the 64-bit PowerPC
instruction set architecture, including the optional
instructions of the ISA. It was introduced in the RS/6000 43P
Model 260, a high-end graphics workstation. It succeeded the
POWER2 Super Chip version of the POWER2 and completed IBM's
transition from POWER to PowerPC. POWER3 was used in IBM
RS/6000 workstations running at 200 MHz. It was based on an
earlier 64-bit PowerPC implementation that was not
commercially successful. The POWER3 has three fixed-point
units, two floating-point multiply-add units, and an
additional load/store unit to improve floating-point
performance. The POWER3 executes instructions out of order.
It has a seven-stage integer pipeline, a minimal eight-stage
load/store pipeline and a ten-stage floating-point pipeline.
Its front end consists of fetch and decode. In the first
stage, eight instructions are fetched from the instruction
cache and placed in an instruction buffer. In the second
stage, four instructions are taken from the buffer, decoded
and issued to the instruction queues. The main restriction on
instruction issue is that one of the two integer queues can
accept only one instruction, while the other can accept up to
four. The short front-end pipeline results in a three-cycle
branch misprediction penalty. In stage three, the operands of
instructions that are ready for execution are read from their
respective register files. The general-purpose register file
contains 48 registers: 32 are general-purpose registers and
16 are used for register renaming. To reduce the number of
ports, the register file is duplicated. The first copy
supports the three integer execution units, and the second
supports the two load/store units. The floating-point
register file contains 56 registers. Execution starts in the
fourth stage. The queues dispatch up to eight instructions to
the execution units. The three integer execution units
execute integer instructions, and the two floating-point
units execute floating-point instructions. Once execution is
over, the instructions are held in buffers. Execution
completes in stage five for integer instructions and in stage
eight for floating-point instructions.
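
The split of the 48-entry general-purpose register file into 32 architected and 16 rename registers can be illustrated with a toy rename table. This is a minimal sketch of a generic free-list renaming scheme; the class and method names are hypothetical, and the actual POWER3 renaming mechanism is not described in the text and may work differently.

```python
class RenameTable:
    """Toy register renamer: 32 architected + 16 rename registers (48 total).

    Generic free-list scheme for illustration only; the real POWER3
    mechanism may differ.
    """

    def __init__(self, architected=32, rename=16):
        # Initially, architected register i lives in physical register i.
        self.mapping = list(range(architected))
        # The remaining physical registers are free for renaming.
        self.free = list(range(architected, architected + rename))

    def rename_dest(self, arch_reg):
        """Allocate a new physical register for a write to arch_reg.

        Returns (new_phys, old_phys), or None when no rename register is
        free, in which case a real pipeline would have to stall.
        """
        if not self.free:
            return None
        new_phys = self.free.pop(0)
        old_phys = self.mapping[arch_reg]
        self.mapping[arch_reg] = new_phys
        return new_phys, old_phys

    def release(self, phys_reg):
        """Return a physical register to the free list once it is dead."""
        self.free.append(phys_reg)


table = RenameTable()
# Two back-to-back writes to r3 receive distinct physical registers,
# so the second write does not have to wait for readers of the first.
first = table.rename_dest(3)
second = table.rename_dest(3)
print(first, second)
```

With only 16 rename registers, at most 16 register-writing instructions can be in flight before one of them has to complete and release its old mapping.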
D. IBM POWER 4

The POWER4 is a microprocessor that implements the 64-bit
PowerPC instruction set architecture. The POWER4, which was
used in RS/6000 and AS/400 computers, succeeded the POWER3 in
the series. It is a multi-core microprocessor with two cores
on one die. The original POWER4 had a clock rate of 1.1 to
1.3 GHz, while the POWER4+ reached a clock rate of 1.9 GHz.
In the POWER4, the unified L2 cache is divided into three
equal parts. Each part has its own independent L2 controller,
which can feed 32 bytes of data per cycle. The Core Interface
Unit connects each L2 controller to the data or instruction
cache. The Non-Cacheable Unit handles instruction serializing
functions. There is an L3 cache controller on the chip,
although the actual L3 memory is slower and off-chip. The GX
controller manages communication with input/output devices.
There are two four-byte-wide GX buses, one incoming and one
outgoing. The Fabric Controller is the master controller for
this bus network; it controls communications for the L1 and
L2 controllers and between POWER4 chips. Trace and debug
facilities are also provided. The POWER4 implements a
superscalar microarchitecture with speculative out-of-order
execution. It has eight independent execution units: two
floating-point units, two load/store units, two fixed-point
units, one branch unit and one condition register unit. These
execution units can complete up to eight operations per
clock. Each of the two floating-point units completes one
fused multiply-add per clock, each load/store unit completes
one instruction per clock cycle, and each fixed-point unit
completes one instruction per cycle. The pipeline stages are
Branch Prediction, Instruction Fetch, Decode, Crack and Group
Formation, Group Dispatch and Instruction Issue, Load/Store
Unit Operation, Load Hit Store, Store Hit Load, Load Hit
Load, and the Instruction Execution Pipeline.
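
The floating-point throughput described above translates into a peak rate by simple arithmetic. The sketch below assumes the usual convention of counting one fused multiply-add as two floating-point operations; the function name is an illustrative choice, and the result is a theoretical peak rather than a measured figure.

```python
def peak_gflops(clock_ghz, fp_units=2, flops_per_fma=2, cores=1):
    """Theoretical peak GFLOPS, counting one fused multiply-add as two operations."""
    return clock_ghz * fp_units * flops_per_fma * cores


# Clock rates quoted in the text: 1.3 GHz (POWER4) and 1.9 GHz (POWER4+)
print("POWER4  per core:", peak_gflops(1.3))
print("POWER4  per chip:", peak_gflops(1.3, cores=2))
print("POWER4+ per core:", peak_gflops(1.9))
```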
E. IBM POWER 5
The POWER5 is a microprocessor introduced by IBM as an
enhanced version of the POWER4. The principal improvements
are support for simultaneous multithreading (SMT) and an
on-die memory controller. It is a dual-core microprocessor.
Each core supports one physical thread and two logical
threads, for a total of two physical threads and four logical
threads. The addition of two-way multithreading required the
duplication of the return stack, program counter and
instruction buffer. The group completion unit and store queue
were also duplicated. Most resources, including the register
files and execution units, are shared, although each thread
sees its own independent set of registers. The POWER5
implements SMT, in which two threads execute simultaneously.
The POWER5 can disable SMT to optimize for the current
workload. Because many resources are shared by two threads,
their capacity was increased to compensate for the loss of
performance. The integer and floating-point registers were
increased in number to 120 each. The floating-point
instruction cache was also enlarged, from 20 to 24 entries.
The capacity of the L2 cache was increased to 1.875 MB and
its set associativity to ten-way. The L3 cache was brought
on-package and its capacity increased to 36 MB. The two cores
share this cache. It is accessed via two unidirectional
128-bit buses that operate at half the core frequency. The
on-die memory controller supports up to 64 GB of DDR and DDR2
memory. High-frequency serial buses are used to communicate
with external buffers. Virtual Vector Architecture (ViVA)
technology can be used to couple several POWER5 processors
together so that they act as a single vector processor.
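
The L3 bus description above gives enough information for a back-of-the-envelope bandwidth estimate. The sketch below assumes one transfer per bus clock and a core clock of 1.9 GHz; both are assumptions made for illustration (the text does not state either), and the function name is hypothetical.

```python
def l3_bus_bandwidth_gbs(core_clock_ghz, bus_bits=128, clock_ratio=0.5):
    """Estimated bandwidth of one unidirectional L3 bus in GB/s.

    Assumes one transfer per bus clock, with the bus running at
    clock_ratio times the core clock.
    """
    bytes_per_transfer = bus_bits // 8
    return bytes_per_transfer * core_clock_ghz * clock_ratio


ASSUMED_CORE_CLOCK_GHZ = 1.9  # assumption, not a figure from the text

per_direction = l3_bus_bandwidth_gbs(ASSUMED_CORE_CLOCK_GHZ)
print(f"per direction: {per_direction:.1f} GB/s")
print(f"both buses:    {2 * per_direction:.1f} GB/s")
```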
F. IBM POWER 6
The POWER6 is also a dual-core processor. Each core is
capable of two-way simultaneous multithreading. Unlike its
predecessor, the POWER6 executes instructions in order
instead of out of order. Software may need to be recompiled
for optimal performance, but significant performance
improvements over the POWER5+ have been observed even with
unmodified software. The POWER6 also takes advantage of
ViVA-2 (Virtual Vector Architecture), which allows several
nodes to be combined so that they act as a single vector
processor. Each core has two integer units, two binary
floating-point units, an AltiVec unit and a decimal
floating-point unit. The binary floating-point unit uses a
number of techniques to achieve a six-cycle, 13-FO4 pipeline.
Each core has a 64 KB, four-way set-associative instruction
cache and a 64 KB data cache. The data cache is eight-way set
associative with a two-stage pipeline, and supports two
independent 32-bit reads or one 64-bit write per cycle. Each
core has a 4 MiB unified L2 cache, where each cache is
assigned to a specific core. The two cores share a 32 MiB L3
cache over an 80 GB/s bus. The POWER6 can be connected to up
to 31 other processors using two inter-node links, and it
supports up to 10 logical partitions per core. IBM also uses
a 5 GHz duty-cycle correction clock distribution network,
with separate power supplies for logic and SRAM. The thermal
characteristics are similar to those of the POWER5.
G. IBM POWER 7
The POWER7 is a superscalar, symmetric multi-core
microprocessor available with four, six or eight cores per
chip, scaling to systems with up to 1024 SMT threads. It has
a slightly different microarchitecture and interfaces for
supporting sub-specifications of the Power ISA. A special
TurboCore mode can turn off four of the cores of an
eight-core processor. The four cores that remain active have
access to all the memory controllers and the L3 cache, and
run at a higher clock rate. This increases the performance of
each core, which benefits workloads that need the fastest
sequential performance, at the cost of reduced parallel
performance. TurboCore mode can also reduce software costs
for applications licensed per core, while increasing
performance per core. Each core is capable of four-way SMT.
One notable difference between the POWER6 and its successor
is that the POWER7 executes instructions out of order instead
of in order. Each core delivers higher performance despite a
decrease in maximum frequency. The POWER7 has a clock rate of
3.0 to 4.25 GHz, a maximum of four chips per quad-chip
module, four, six or eight cores per chip, four SMT threads
per core, and twelve execution units per core: two
fixed-point units, two load/store units, four
double-precision floating-point units, a vector unit, a
decimal floating-point unit, a branch unit and a condition
register unit. Each core has 32 + 32 KB of L1 instruction and
data cache and a 256 KB L2 cache. There is 4 MB of L3 cache
per core, up to a maximum of 32 MB. In very broad terms, the
floating-point performance of the POWER7 is similar to that
of a Haswell Core i7.
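
The thread counts and the TurboCore trade-off follow from a few multiplications. The sketch below uses the figures from the text (eight cores, four SMT threads per core, 32 MB of L3) and assumes that in TurboCore mode the four active cores share the whole L3 cache; the constant and function names are illustrative.

```python
CORES_PER_CHIP = 8
SMT_THREADS_PER_CORE = 4
L3_PER_CHIP_MB = 32


def threads_per_chip(active_cores, smt=SMT_THREADS_PER_CORE):
    """Hardware threads available with the given number of active cores."""
    return active_cores * smt


def l3_per_active_core_mb(active_cores, l3_mb=L3_PER_CHIP_MB):
    """L3 capacity per active core, assuming the whole L3 stays usable."""
    return l3_mb / active_cores


for label, cores in (("normal", CORES_PER_CHIP), ("TurboCore", CORES_PER_CHIP // 2)):
    print(f"{label:>9}: {threads_per_chip(cores)} threads, "
          f"{l3_per_active_core_mb(cores):.0f} MB L3 per core")

# Number of eight-core chips needed to reach 1024 SMT threads
chips_for_1024 = 1024 // threads_per_chip(CORES_PER_CHIP)
print("chips for 1024 threads:", chips_for_1024)
```

TurboCore therefore halves the thread count while doubling the L3 capacity available to each active core, which matches the sequential-versus-parallel trade-off described above.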
H. IBM POWER 8
POWER8 is designed to be a massively multithreaded chip. Each
of its cores can handle eight hardware threads at the same
time, for a total of 96 threads executing simultaneously on a
single twelve-core chip. It uses large amounts of on-chip and
off-chip eDRAM cache, together with on-chip memory
controllers. The GX bus is removed from the POWER8 design and
replaced by a CAPI port (Coherent Accelerator Processor
Interface), which is used to connect auxiliary processors
such as GPUs and ASICs. Units attached to the CAPI bus use
the same memory address space as the CPU. POWER8 also
contains an on-chip controller (OCC), a microcontroller that
manages power and thermal parameters. It consists of two
general-purpose offload engines (GPEs) and has 512 KB of
embedded static RAM. The OCC can be programmed to overclock
the processor, or to lower its power consumption by
decreasing its operating frequency. POWER8 comes in four-,
six-, eight-, ten- and twelve-core variants. Each core has a
64 KB L1 data cache and a 32 KB L1 instruction cache. Each
core can issue ten instructions and dispatch eight per cycle
to sixteen execution units: two Fixed Point Units, two Load
and Store Units, two Load Units, four Floating Point Units,
two VMX units, one Cryptographic Unit, one Decimal Floating
Unit, one Condition Register Unit and one Branch Register
Unit. It has an issue queue with 4 x 16 entries and improved
branch predictors, and it can handle twice as many cache
misses. Each core is eight-way hardware multithreaded.
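
For reference, the hardware thread count of each POWER8 core variant and the total size of the issue queue follow directly from the figures above. The names below are illustrative.

```python
SMT_WAYS = 8                        # hardware threads per core
CORE_VARIANTS = (4, 6, 8, 10, 12)   # core counts listed in the text

# Hardware threads per chip for each variant
threads = {cores: cores * SMT_WAYS for cores in CORE_VARIANTS}
for cores, count in threads.items():
    print(f"{cores:>2} cores: {count} hardware threads")

# Issue queue organised as 4 x 16 entries
issue_queue_entries = 4 * 16
print("issue queue entries:", issue_queue_entries)
```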

The POWER2+ and the POWER2 Super Chip used the POWER
architecture. POWER3 uses the PowerPC instruction set. POWER4
and its derivatives use PowerPC-AS. POWER5, POWER6 and their
derivatives use Power ISA version 2.03. POWER7 and its
derivatives use Power ISA version 2.06. POWER8 uses Power ISA
version 2.07. Power is a RISC load/store architecture. It has
several sets of registers: 32 general-purpose registers of 32
or 64 bits; 64 vector-scalar registers of 128 bits; 32
floating-point registers of 64 bits; 32 vector registers of
128 bits; eight 4-bit condition register fields; and special
registers, including the counter and link registers, the time
base and alternate time base, the accumulator and status
registers. Instructions are 32 bits long. Most instructions
have two source operands and one destination. Memory is
accessed strictly through load and store operations, which
allows for out-of-order execution. The architecture provides
distinct operating modes, including supervisor and hypervisor
modes. Categories include:

Base
Server
Embedded
Misc
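
To make the fixed 32-bit instruction format with two source operands and one destination concrete, the sketch below packs and unpacks a register-to-register `add` instruction. The field layout follows the XO-form as commonly documented for the Power ISA (6-bit primary opcode, three 5-bit register fields, OE bit, 9-bit extended opcode, Rc bit); the function names are illustrative, and the specification should be consulted before relying on the details.

```python
def encode_xo_form(opcode, rt, ra, rb, xo, oe=0, rc=0):
    """Pack a 32-bit XO-form instruction word.

    rt is the destination register; ra and rb are the two sources.
    """
    for reg in (rt, ra, rb):
        if not 0 <= reg < 32:
            raise ValueError("register number must be in 0..31")
    return ((opcode << 26) | (rt << 21) | (ra << 16) | (rb << 11)
            | (oe << 10) | (xo << 1) | rc)


def decode_xo_form(word):
    """Unpack the fields of a 32-bit XO-form instruction word."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rt": (word >> 21) & 0x1F,
        "ra": (word >> 16) & 0x1F,
        "rb": (word >> 11) & 0x1F,
        "oe": (word >> 10) & 0x1,
        "xo": (word >> 1) & 0x1FF,
        "rc": word & 0x1,
    }


# add r3, r4, r5: primary opcode 31, extended opcode 266
word = encode_xo_form(31, rt=3, ra=4, rb=5, xo=266)
print(f"{word:#010x}")
print(decode_xo_form(word))
```

Since every instruction is exactly one 32-bit word, the decoder can extract all fields with fixed shifts and masks, which is one of the properties that keeps RISC decode logic simple.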

The Power Architecture specification is divided into five
parts, called "books":

Book 1, User Instruction Set Architecture, includes the base
instruction set along with memory reference, flow control,
integer, floating-point, numeric acceleration and
application-level programming facilities.
Book 2, Virtual Environment Architecture, defines the
available storage model, including timing, synchronization,
cache management, storage features and byte ordering.
Book 3, Operating Environment Architecture, covers
exceptions, interrupts, memory management, debug facilities
and special control functions. It is divided into two parts:
    Book 3-S
    Book 3-E
Book VLE, Variable Length Encoded Instruction Architecture,
defines alternative instructions and definitions from Book 1
to Book 3.

IV. ARCHITECTURE
The instruction set used in POWER1 and its derivatives, and
in POWER2, was the POWER architecture.

FIG. 2 POWER ARCHITECTURE PROCESSOR [7]

ACKNOWLEDGEMENT
I WOULD LIKE TO THANK PROFESSOR DHAVAL SHAH FOR GIVING ME AN
OPPORTUNITY TO WORK ON THIS PROJECT. THIS PROJECT HAS BEEN
VERY INFORMATIVE AND HAS CLEARED MY DOUBTS ON VARIOUS
SUBJECTS. NOT ONLY DID I LEARN ABOUT IBM PROCESSORS, BUT I
ALSO LEARNED ABOUT THEIR VARIOUS POWER ARCHITECTURES AND
STRUCTURES. THE PROJECT HELPED ME VISUALIZE AND APPLY THE
CONCEPTS THAT I HAVE LEARNT IN MY B.TECH. SUBJECTS.
MR. DHAVAL SHAH'S CONSTANT SUPPORT AND GUIDANCE HAVE HELPED
ME GAIN CONFIDENCE TO WORK ON SUCH A PROFESSIONAL TOPIC. THE
PROJECT HAS HELPED ME VISUALIZE THE USE, IMPORTANCE AND
OPERATION OF PROCESSORS IN THE WORKING FIELD. I WOULD ALSO
LIKE TO THANK MY BATCHMATES AND MY PARENTS FOR THEIR
CONSTANT SUPPORT.

REFERENCES
[1]. Cocke, J. and Markstein, V. (January 1990). "The evolution of
RISC technology at IBM". IBM Journal of Research and
Development 34 (1): 4-11.
[2]. Montoye, R. K.; Hokenek, E.; Runyon, S. L. (January 1990).
"Design of the IBM RISC System/6000 floating-point execution
unit". IBM Journal of Research and Development 34 (1): 59-70.
[3]. Bakoglu, H. B.; Grohoski, G. F.; Montoye, R. K. (January 1990).
"The IBM RISC System/6000 processor: Hardware overview". IBM
Journal of Research and Development 34 (1): 12-22.
[4]. Soltis, Frank G. (1997). Inside the AS/400: Featuring the AS/400e
Series, 2nd Edition. 29th.
[5]. Henriok, Power Architecture, unpublished
[6]. O'Connell, F. P.; White, S. W. (6 November 2000). "POWER3: The
next generation of PowerPC processors". IBM Journal of Research
and Development, Volume 44, Number 6.
[7]. Henriok, Power Architecture, unpublished
[8]. J. D. Warnock, J. M. Keaty, J. Petrovick, J. G. Clabes, C. J. Kircher,
B. L. Krauter, P. J. Restle, B. A. Zoric, and C. J. Anderson
(2002). "The circuit and physical design of the POWER4
microprocessor". IBM Journal of Research and Development
46 (1): 27-52.
[9]. Glaskowsky, Peter N. (14 October 2003). "IBM Raises Curtain on
Power5".
[10]. "New POWER7 workload optimizing systems". YouTube. IBM.
2010-02-05. Retrieved 2010-02-22.
