Survey and Comparison of Pipeline of Some RISC and CISC System Architectures
Abstract—An instruction set is the set of instructions a CPU uses to compute and to control the computer system; it is the interface between hardware and software. There are two common kinds of instruction set: CISC and RISC. Pipelining is widely used in processor design to improve the efficiency of instruction execution. This paper describes the differences between CISC and RISC in pipeline implementation, introduces basic pipelining and two advanced techniques, superscalar and superpipelining, in detail, and surveys the pipelines of several CISC and RISC processors, including ARM, RISC-V, LoongArch, and x86.

Keywords—Pipeline, superscalar, superpipelining, CISC, RISC, processor

I. INTRODUCTION

If a computer is to obey a command, the command must be expressed in the computer's language. The basic words of that language are called instructions, and all the instructions of a computer together are called its instruction set [1]. In general, an instruction set architecture defines the supported instructions, data types, registers, hardware support for managing main memory, basic features (such as memory consistency, addressing modes, and virtual memory), and the input/output model.

Instruction set architectures are mainly divided into Complex Instruction Set Computer (CISC) and Reduced Instruction Set Computer (RISC). In the 1960s, designers began to fold tasks that had required several instructions into a single instruction, and computers that execute such complex instructions are called CISC. As computer technology developed, the number of instructions kept growing, making complex instruction sets still more complex. It was found that only about 20% of the instructions defined by a CISC instruction set were used frequently, while the other 80% were rarely used. RISC therefore became popular in the 1980s. The biggest difference between RISC and CISC lies in the simplicity of the instructions: RISC uses multiple simple instructions to complete a task that a complex instruction set completes with one instruction [2].

Compared with reduced instructions, complex instructions cause problems for pipelining, which is widely used in modern processors. Instruction execution in a microprocessor is generally divided into stages such as "prefetch", "fetching operands", "operation", and "storage". First, complex instructions have different execution times. Some complete in 4 or 5 clock cycles, while others require dozens. Even for simple instructions, different addressing modes result in different execution times. Worse, instruction lengths also differ, and the length of the same instruction varies with the addressing mode. How deep should the pipeline be for such instructions? If the pipeline is designed for the shortest instruction, it is disrupted whenever it encounters a complex instruction; if it is designed for the longest instruction, some stages are skipped when shorter instructions execute, so the pipeline cannot be kept full. RISC instructions have a fixed length and few addressing modes. Most of them are simple and can complete in one clock cycle.

Second, CISC has complex instruction formats because of its variable instruction length. In contrast, RISC has fewer instruction formats, and the source register fields are in the same position in every instruction. This symmetry means that the decode stage can start reading the register file while it is still determining the instruction type. If the instruction format is asymmetric, the decode stage must be split into two parts, which deepens the pipeline.

Third, CISC has no dedicated memory-access instructions; many instructions can operate on memory. Executing one instruction often requires several consecutive memory operations, which makes memory accesses frequent and irregular. RISC accesses memory only through dedicated load and store instructions; no other instruction can access memory, so memory operands appear only in loads and stores. All operands must be aligned in memory, so there is no need to worry that a single data transfer instruction will require two memory accesses: the requested data can be transferred between processor and memory in a single pipeline stage [3].

Although RISC has more advantages for pipelining, this does not mean that CISC cannot be pipelined. To make pipeline design easier, CPUs of various CISC architectures have introduced the concept of micro-operations. In the prefetch and predecode stages of the pipeline, hardware decoders translate each instruction into the corresponding sequence of simple internal instructions (microcode), which is then sent to the
Authorized licensed use limited to: International Institute of Information Technology Bangalore. Downloaded on October 14,2023 at 04:10:15 UTC from IEEE Xplore. Restrictions apply.
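As a rough illustration of the Introduction's point about instruction length, the following Python sketch contrasts how instruction boundaries are located in a fixed-length and in a variable-length instruction stream. The 4-byte width and the toy encoding, in which the low two bits of the first byte give the number of trailing bytes, are assumptions made for this example and do not correspond to any real ISA.

```python
FIXED_WIDTH = 4  # bytes per instruction in the toy fixed-length ISA (assumption)


def fixed_boundaries(code: bytes) -> list[int]:
    """Start offsets in a fixed-length stream: known without decoding anything."""
    return list(range(0, len(code), FIXED_WIDTH))


def variable_boundaries(code: bytes) -> list[int]:
    """Start offsets in a toy variable-length stream.

    Hypothetical encoding: the low two bits of an instruction's first byte
    give the number of extra bytes (0 to 3) that follow it.
    """
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        # The next boundary is only known after this instruction is inspected.
        pc += 1 + (code[pc] & 0b11)
    return offsets


if __name__ == "__main__":
    print(fixed_boundaries(bytes(12)))
    print(variable_boundaries(bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC])))
```

With fixed-width instructions every start offset is known up front, so fetch and decode never wait on a length calculation. In the variable-length case each offset depends on inspecting the previous instruction, which is the serial dependency that an extra length-decode or predecode step has to absorb.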
the machine cycle can be shortened by increasing the number of pipeline stages. In the same amount of time, a superpipelined processor executes more machine instructions [9]. Another advantage of superpipelining is a higher clock frequency. The more pipeline stages there are, the more finely the pipeline is divided and the less hardware logic each stage contains. The less logic there is between two pipeline registers, the higher the achievable frequency.

Of course, more pipeline stages consume more registers and more area. Another problem with deep pipelines is that the outcome of a conditional branch cannot be known at the instruction fetch stage, so it can only be predicted. Only near the end of the pipeline, once the branch has actually been evaluated, is it known whether the branch is taken. If the prediction is wrong, the pipeline must be flushed: all the wrongly prefetched instructions are discarded and the correct instruction stream is fetched again. The deeper the pipeline, the more wrong-path instructions are prefetched, discarded, and refetched, which wastes power and costs performance [10].

Because pipeline depth brings both advantages and disadvantages, and because application scenarios differ, the pipeline depth of today's processors is moving toward two extremes. On the one hand, pipelines are getting deeper: in 2004, the Pentium 4 (Prescott) reached a remarkable 31 stages. On the other hand, they are getting shallower; the shallowest have only two stages.

III. PIPELINING OF SEVERAL RISC AND CISC PROCESSORS

A. ARM

The ARM core uses a RISC architecture. Most ARM processors support two instruction sets: the 32-bit ARM instruction set and the 16-bit Thumb instruction set.

The classic ARM processor series include ARM7, ARM9, and ARM11. ARM7 has a von Neumann architecture. As shown in Fig. 2, it uses a typical three-stage pipeline divided into fetch, decode, and execute. The execute stage does a lot of work, including reading and writing the registers and memory that hold the operands, ALU operations, and data transfer between the related units. It can therefore take multiple clock cycles.

Fig. 2. ARM7 pipeline.

As shown in Fig. 3, ARM9 has a Harvard architecture and uses a five-stage pipeline. ARM9 adds two stages, memory access and write-back, after fetch, decode, and execute. The memory access stage loads and stores the data specified in the instruction, including extracting and sign-extending data loaded by byte or halfword load instructions. However, memory access and write-back are only meaningful for load (LDR) and store (STR) instructions; other instructions do not need these two stages. At the same time, ARM9 moves register reads into the decode stage, which balances the work of the pipeline stages more evenly [11].

Fig. 3. ARM9 pipeline.

As shown in Fig. 4, ARM11 uses a scalar pipeline. The back end of the pipeline has three parallel units: ALU (arithmetic logic unit), MAC (multiply-accumulate), and LS (load/store). The LS pipeline is dedicated to memory access instructions, separating data accesses from arithmetic so that instructions execute more efficiently. Because different instructions take different amounts of time, when instructions of the three types are issued to the pipeline one after another they can execute simultaneously and complete out of order [12].

Fig. 4. ARM11 pipeline.

With the Cortex-A series, ARM introduced superscalar pipelines, enabling the processor to process more than one instruction in parallel per cycle. Taking the Cortex-A8 as an example, it has a 13-stage integer pipeline (see Table 2) and a 10-stage NEON multimedia pipeline (see Table 3). For integer instructions, the Cortex-A8 is a dual-issue, in-order design. Unlike earlier ARM cores, which could process only one integer instruction at a time, the Cortex-A8 uses superscalar issue to send two integer instructions together, and the two are processed in parallel in one cycle by two integer ALU pipelines [14].

From the classic ARM series to the current Cortex series, ARM processors have become more complex, but the relationship between the CPU's instruction fetch and the PC has not changed. That is, no matter how many pipeline stages there are, the current PC value can be worked out from the behavior of the original three-stage pipeline. The PC always points to the instruction being fetched, not to the instruction being executed or decoded. By convention, the instruction being executed is taken as the reference point and called the current (first) instruction, so the PC always points to the third instruction. During execution, when the processor is in ARM state the PC holds the address of the executing instruction + 8 bytes; when the processor is in Thumb state the PC holds the address of the executing instruction + 4 bytes. When a branch instruction is executed or the PC is modified directly to branch, the ARM core flushes its pipeline. An instruction in the execute stage completes its execution even if an interrupt has been raised.
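The PC rule above reduces to a small calculation. The sketch below assumes the classic three-stage model with fixed 4-byte ARM instructions and 2-byte (16-bit) Thumb instructions; the function and constant names are illustrative and are not taken from any ARM tool.

```python
# Instruction width in bytes for each processor state (16-bit Thumb only).
INSTRUCTION_WIDTH = {"arm": 4, "thumb": 2}

# In the three-stage model, fetch runs two instructions ahead of execute.
STAGES_AHEAD = 2


def visible_pc(executing_address: int, state: str) -> int:
    """PC value seen by the instruction currently in the execute stage."""
    return executing_address + STAGES_AHEAD * INSTRUCTION_WIDTH[state]


if __name__ == "__main__":
    print(hex(visible_pc(0x8000, "arm")))
    print(hex(visible_pc(0x8000, "thumb")))
```

Both offsets come from the same "two instructions ahead" rule; only the instruction width changes between the two states.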
Intel's first pipeline was introduced in the i486 chip. Its five-stage pipeline consists of Fetch, D1, D2, EX, and WB. Fetch: the instruction is fetched from the cache. Because an entire cache line is fetched at once, most instructions do not need this stage; on average, about five instructions are obtained per fetch. D1: main instruction decode. Up to three instruction bytes can be decoded. This stage determines the actions to be performed in the D2 stage, determines the length of the instruction, and directs the instruction aligner/prefetch queue to the next instruction. D2: secondary instruction decode and memory address computation. Each clock, it decodes a memory displacement field of 1 to 4 bytes or an immediate constant of 1 to 4 bytes. If an instruction contains both a memory displacement and an immediate constant, decoding requires two D2 cycles [18].

[Figure: stages PF (fetch and align instruction), D1 (decode instruction, generate control word), D2 (decode control word, generate memory address), E, and WB (write result); D2 through WB are shown separately for the U pipe and the V pipe.]
Fig. 7. Pentium superscalar pipeline.

The classic Pentium pipeline is similar to that of the i486: each integer pipeline is divided into five stages. Compared with the i486 integer pipeline, the Pentium integrates additional hardware to speed up instruction execution. For example, the i486 CPU needs two clocks to decode several instruction formats where the Pentium CPU needs only one, and the Pentium executes shift and multiply instructions faster. More importantly, the Pentium processor adds a second, independent pipeline, making it superscalar. The two pipelines can run in parallel, and each can have multiple instructions in different stages at the same time. Fig. 7 shows that the resources for address generation and ALU functions are duplicated in two independent integer pipelines, called U and V. In the PF and D1 stages, the CPU can fetch and decode two simple instructions in parallel and issue them to the U and V pipelines [19]. If the two can be paired, the first instruction executes in the U pipeline and the second in the V pipeline. If not, the first instruction executes in the U pipeline and no instruction is issued to the V pipeline. Instructions executing in the two pipes produce exactly the same result as sequential execution.

Unlike earlier microprocessors, the Pentium 4 has a pipeline of at least 20 stages (see Table 4). In some cases, micro-operations need multiple execute stages, which lengthens the pipeline. Overall, instruction processing in the Pentium 4 can be divided into four steps: 1) fetch instructions in program order; 2) translate each instruction into one or more fixed-length RISC-like micro-operations; 3) execute the micro-operations on a superscalar pipeline organization, possibly out of order; 4) commit the result of each micro-operation to the register set in the order of the original program flow [20].

TABLE IV. PENTIUM 4 PIPELINE

Stage(s)   Name
1-2        TC Next IP
3-4        TC Fetch
5          Drive
6          Alloc
7-8        Rename
9          Que
10-12      Sch
13-14      Disp
15-16      RF
17         Ex
18         Flgs
19         Br Ck
20         Drive

TC Next IP = trace cache next instruction pointer; TC Fetch = trace cache fetch; Alloc = allocate; Rename = register renaming; Que = micro-operation queue; Sch = micro-operation scheduling; Disp = dispatch; RF = register file; Ex = execute; Flgs = flags; Br Ck = branch check.

IV. CONCLUSION

In the past, most newly conceived ISAs were RISC. RISC technology has developed two styles: the superpipelined style, which deepens the traditional pipeline, and the superscalar style, which lets multiple instructions enter the pipeline per clock. Of course, many CPUs now combine superscalar and superpipelining techniques. However, x86, a CISC ISA, is still very popular. x86 processors convert x86 macro-instructions into micro-operations (Intel's uops and AMD's ROPs). Using uops (or ROPs) allows a RISC-style execution core to implement superpipelined and superscalar execution. Today the boundary between RISC and CISC is no longer sharp. RISC and CISC are gradually converging, so pipeline research on variable-length instruction sets and on fixed-length instruction sets is equally important.

REFERENCES

[1] Patterson D A, Hennessy J L. Computer organization and design: the hardware/software interface 5th ed[J]. 2014.
[2] Blem E, Menon J, Sankaralingam K. Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures[C]//2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2013: 1-12.
[3] William S. Computer Organization and Architecture
[4] Designing for Performance Eighth Edition[J]. 2010.
[5] Sutherland I E. Micropipelines[J]. Communications of the ACM, 1989, 32(6): 720-738.
[6] Kane G, Heinrich J. MIPS RISC architectures[M]. Prentice-Hall, Inc., 1992.
[7] Hennessy J L, Patterson D A. Computer architecture: a quantitative approach[M]. 2017.
[8] Shen J P, Lipasti M H. Modern processor design: fundamentals of superscalar processors[M]. Waveland Press, 2013.
[9] Omondi A R. The microarchitecture of pipelined and superscalar computers[M]. Springer Science & Business Media, 2013.
[10] Jouppi N P, Wall D W. Available instruction-level parallelism for superscalar and superpipelined machines[J]. ACM SIGARCH Computer Architecture News, 1989, 17(2): 272-282.
[11] Hartstein A, Puzak T R. The optimum pipeline depth for a microprocessor[J]. ACM SIGARCH Computer Architecture News, 2002, 30(2): 7-13.
[12] Sloss A, Symes D, Wright C. ARM system developer's guide: designing and optimizing system software[M]. Elsevier, 2004.
[13] Cormie D. The ARM11 microarchitecture[J]. Retrieved July, 2002, 21: 2004.
[14] Williamson D. Arm cortex-a8: A high-performance processor for low-power applications[M]//Unique Chips and Systems. CRC Press, 2018: 95-122.
[15] Asanovic K, Avizienis R, Bachrach J, et al. The rocket chip generator[J]. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2016-17, 2016, 4.
[16] Asanovic K, Patterson D A, Celio C. The berkeley out-of-order machine (boom): An industry-competitive, synthesizable, parameterized risc-v processor[R]. University of California at Berkeley Berkeley United States, 2015.
[17] Loongson Technology Corp. Ltd. Loong3A5000 processor[OL]. [2022-6-28]. https://www.loongson.cn/productShow/32
[18] Crawford J. The execution pipeline of the intel i486 cpu[C]//1990 Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage. IEEE Computer Society, 1990: 254-258.
[19] Alpert D, Avnon D. Architecture of the Pentium microprocessor[J]. IEEE micro, 1993, 13(3): 11-21.
[20] Hinton G, Sager D, Upton M, et al. The microarchitecture of the Pentium® 4 processor[C]//Intel technology journal. 2001.