Sherwin Chan - RISC Architecture
RISC vs CISC
Sherwin Chan
Playstation, Intel x86, INMOS Transputer, ZISC36, many GPUs, IA-64 Itanium, C6000 (Texas Instruments)
In the past, it was believed that hardware design was easier than compiler design
Most programs were written in assembly language
Limited and slower memory
Few registers
The Solution
Have instructions do more work, thereby minimizing the number of instructions called in a program
Allow for variations of each instruction
CISC
Ex. A single instruction can load from memory, perform an arithmetic operation, and store the result in memory
Compilers became more prevalent
The majority of CISC instructions were rarely used
Some complex instructions were slower than a group of simple instructions performing an equivalent task
Smaller instructions allowed for constants to be stored in the unused bits of the instruction
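As a rough illustration of a constant stored in the bits of an instruction, the sketch below packs a 16-bit constant into a 32-bit instruction word. The field layout (6-bit opcode, two 5-bit register numbers, 16-bit immediate) follows the MIPS I-type format and is an assumption; the slides do not name a specific encoding. The function names are hypothetical.

```python
def encode_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack fields into a 32-bit word: opcode(6) | rs(5) | rt(5) | imm(16)."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode must fit in 6 bits")
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register numbers must fit in 5 bits")
    if not -32768 <= imm <= 32767:
        raise ValueError("immediate must fit in 16 signed bits")
    # The constant occupies the low 16 bits of the instruction word.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_immediate(word: int) -> int:
    """Recover the signed 16-bit constant from the low bits of the word."""
    imm = word & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


ADDI = 0x08  # MIPS opcode for add-immediate (assumed encoding)
word = encode_i_type(ADDI, rs=2, rt=3, imm=-5)  # addi $3, $2, -5
print(f"{word:#010x}", decode_immediate(word))
```

Because the constant travels inside the instruction, no separate memory access is needed to fetch it.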
RISC Architecture
Small, highly optimized set of instructions
Uses a load-store architecture
Short execution time
Pipelining
Many registers
Load/Store Architecture
Separate instructions load/store data and perform operations
All operations are performed on operands in registers
Main memory is accessed only through load/store instructions
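A minimal sketch of the load-store idea, using a toy machine invented for illustration (the instruction names, tuple format, and `run` function are hypothetical). A CISC-style "add two memory values and store the result" becomes four simple instructions, and only `load` and `store` touch memory.

```python
def run(program, memory, num_regs=8):
    """Execute a list of instruction tuples on a toy load-store machine."""
    regs = [0] * num_regs
    for op, *args in program:
        if op == "load":        # load rd, addr: the only way to read memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "store":     # store rs, addr: the only way to write memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "add":       # add rd, rs, rt: operates on registers only
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


memory = [7, 35, 0]
program = [
    ("load", 1, 0),    # r1 <- memory[0]
    ("load", 2, 1),    # r2 <- memory[1]
    ("add", 3, 1, 2),  # r3 <- r1 + r2
    ("store", 3, 2),   # memory[2] <- r3
]
run(program, memory)
print(memory[2])
```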
RISC vs CISC
Fewer transistors needed in RISC
RISC processors have shorter design cycles
RISC instructions typically take fewer clock cycles than CISC instructions
111 instructions
25 branch/jump instructions
21 arithmetic instructions
15 load instructions
12 comparison instructions
10 store instructions
8 logic instructions
8 bit manipulation instructions
8 move instructions
4 miscellaneous instructions
Pipelining 101
Break instructions into steps
Work on instructions like in an assembly line
Allows for more instructions to be executed in less time
An n-stage pipeline is n times faster than a non-pipelined processor (in theory)
1. Fetch instruction
2. Decode instruction
3. Execute instruction
4. Access operand
5. Write result
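The "n times faster in theory" claim can be checked with a small cycle-count sketch. It assumes every stage takes exactly one clock cycle and that nothing ever stalls; the function names are hypothetical.

```python
def cycles_unpipelined(num_instructions: int, num_stages: int) -> int:
    """Each instruction runs all of its stages before the next one starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions: int, num_stages: int) -> int:
    """The first instruction fills the pipeline; each later one adds a cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return (cycles_unpipelined(num_instructions, num_stages)
            / cycles_pipelined(num_instructions, num_stages))


for count in (2, 5, 1000):
    print(count, cycles_unpipelined(count, 5), cycles_pipelined(count, 5),
          round(speedup(count, 5), 2))
```

With five stages the speedup approaches 5 only as the instruction count grows, because the cycles spent filling the pipeline are paid once.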
Without Pipelining
Normally, you would perform the fetch, decode, execute, operand-access, and write steps of an instruction and then move on to the next instruction
[Figure: timing diagram without pipelining, clock cycles 1 to 10, showing Instr 1 and Instr 2]
With Pipelining
The processor is able to perform each stage simultaneously. If the processor is decoding an instruction, it may also fetch another instruction at the same time.
[Figure: timing diagram with pipelining, clock cycles 1 to 9]
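The overlap of stages can be drawn as a text diagram. This is a sketch assuming the five stages named earlier (abbreviated F, D, E, A, W here), one clock cycle per stage, and no stalls; the helper name is hypothetical.

```python
STAGES = ["F", "D", "E", "A", "W"]  # fetch, decode, execute, access, write


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one text row per instruction; columns are clock cycles."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total_cycles
        # Instruction i enters the pipeline one cycle after instruction i-1.
        for s, name in enumerate(stages):
            cells[i + s] = name
        rows.append(f"Instr {i + 1}: " + " ".join(cells))
    return rows


for row in pipeline_diagram(5):
    print(row)
```

Reading down a column shows which stage each instruction occupies in that clock cycle; five instructions finish in nine cycles.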
Pipeline (cont.)
The pace of the pipeline is set by its longest (slowest) step
Thus in RISC, all instructions were made to be the same length
Each stage takes 1 clock cycle
In theory, one instruction finishes every clock cycle
Pipeline Problem 1
Problem: An instruction may need to wait for the result of another instruction
Ex:
Solution: Compiler may recognize which instructions are dependent or independent of the current instruction, and rearrange them to run the independent one first
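A minimal sketch of the reordering idea, not how any particular compiler does it. The instruction tuples, register names, and `fill_stalls` helper are hypothetical, and the sketch ignores memory dependencies and real instruction latencies. When an instruction reads the result of the one just before it, an independent later instruction is moved in between.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


def conflicts(a: Instr, b: Instr) -> bool:
    """True if b cannot be moved ahead of a (a precedes b in program order)."""
    return (a.dest in b.srcs       # b reads what a writes
            or b.dest in a.srcs    # b overwrites what a reads
            or a.dest == b.dest)   # both write the same register


def fill_stalls(program):
    """Move an independent later instruction between each dependent pair."""
    prog = list(program)
    i = 0
    while i + 1 < len(prog):
        if prog[i].dest in prog[i + 1].srcs:   # next instruction must wait
            for k in range(i + 2, len(prog)):
                hops_ok = not any(conflicts(prog[j], prog[k])
                                  for j in range(i + 1, k))
                if hops_ok and prog[i].dest not in prog[k].srcs:
                    prog.insert(i + 1, prog.pop(k))
                    break
        i += 1
    return prog


program = [
    Instr("add", "r3", ("r1", "r2")),
    Instr("sub", "r4", ("r3", "r5")),   # waits for r3 from the add
    Instr("or", "r6", ("r7", "r8")),    # independent of both
]
for instr in fill_stalls(program):
    print(instr.op, instr.dest, *instr.srcs)
```

Here the `or` is moved between the `add` and the `sub`, so the pipeline does useful work while the `add` result is still being produced.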
Pipelining Problems 2
Problem: A branch instruction evaluates the result of another instruction that has not finished yet
Ex:
Loop: add $r3, $r2, $r1
      sub $r6, $r5, $r4
      beq $r3, $r6, Loop
Solution 1: Guess. Begin on the predicted instruction first. If the guess is wrong, clear the pipeline and begin on the correct instruction. Ex: For a loop statement, assume it will loop back, because the majority of the time it will. Some processors remember the outcomes of old branches and use them to predict new ones
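One simple way to "remember old branches" is to predict that a branch will do whatever it did last time. The sketch below is an illustrative last-outcome predictor, not the scheme of any specific processor; the class name, the default guess of "taken", and the branch address are assumptions.

```python
class LastOutcomePredictor:
    """Predict that each branch repeats its most recent outcome."""

    def __init__(self):
        self.history = {}  # branch address -> last outcome (True = taken)

    def predict(self, branch_addr: int) -> bool:
        # With no history, guess "taken", as for a loop that jumps back.
        return self.history.get(branch_addr, True)

    def update(self, branch_addr: int, taken: bool) -> None:
        self.history[branch_addr] = taken


def count_mispredictions(predictor, branch_addr, outcomes):
    """Count wrong guesses; each one would force a pipeline flush."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            misses += 1
        predictor.update(branch_addr, taken)
    return misses


# A loop branch that jumps back 9 times and then falls through once.
loop_outcomes = [True] * 9 + [False]
misses = count_mispredictions(LastOutcomePredictor(), 0x40, loop_outcomes)
print(misses)
```

For this loop the only wrong guess is the final exit, which matches the slide's point that assuming a loop will loop back is right most of the time.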
Solution 2: Begin decoding instructions from both sides of the branch. After the branch is evaluated, send the correct instructions to the pipeline.
Superpipelining
Superduperpipelining
Superscalar pipelining
Dynamic pipeline: Uses buffers to hold instruction bits in case a dependent instruction stalls
Most Intel and AMD chips are CISC x86
Most PC applications are written for x86
Intel spent more money improving the performance of their chips
Modern Intel and AMD chips incorporate elements of pipelining