Chapter 4 (Processors and Memory Hierarchy)
Chapter Four
Processors and Memory Hierarchy
Advanced Processor Technology
• Instruction Pipelines
A typical instruction passes through four phases: fetch, decode, execute, and write-back. These phases are often
executed by an instruction pipeline, as depicted in Fig 4.1. A pipeline cycle is the time required for each phase to
complete its operation, assuming equal delay in all phases. Some terms associated with pipelining are:
Instruction pipeline cycle: The clock period of the instruction pipeline.
Instruction issue latency: The time (in cycles) required between issuing two adjacent instructions.
Instruction issue rate: The number of instructions issued per cycle, also called the degree of a superscalar processor.
Simple operation latency: The latency, measured in cycles, of simple operations such as add, load, store,
branch, and move.
Resource conflicts: The situation in which two or more instructions demand the same functional unit
at the same time.
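Under the equal-delay assumption above, the fill-and-drain timing of a linear instruction pipeline can be sketched as follows (a minimal model assuming one instruction issued per cycle and no stalls or resource conflicts; the concrete counts are illustrative):

```python
def pipeline_cycles(n_instructions: int, k_stages: int) -> int:
    """Cycles to finish n instructions on a k-stage pipeline:
    k cycles fill the pipe, then one instruction completes per cycle."""
    return k_stages + (n_instructions - 1)

def pipeline_speedup(n_instructions: int, k_stages: int) -> float:
    """Speedup over a non-pipelined unit taking k cycles per instruction."""
    return (n_instructions * k_stages) / pipeline_cycles(n_instructions, k_stages)
```

For the four-phase pipeline of Fig 4.1, 100 instructions would take 103 cycles, and the speedup approaches k as the instruction count grows.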
Fig 4.4: Microprogram-controlled CISC with unified cache
Fig 4.5: Hardwire-controlled RISC with split cache
• Characteristics of Typical CISC and RISC Architectures:
• A CISC processor architecture (VAX 8600 processor)
The VAX 8600 is built around two separate
functional units (for integer and floating-point
instructions) that execute concurrently.
The unified cache memory is used for
holding both the instructions and data.
The instruction unit fetches and decodes
instructions, handles branches and feeds
operands to the two functional units.
The TLB (Translation Lookaside Buffer)
is used to translate a virtual address
into a physical one.
Both the integer and floating-point units are
pipelined; performance relies on a high
cache hit ratio and on minimizing the
disruption branches cause to the flow.
• A RISC processor architecture
(Intel i860 processor)
The Intel i860 processor has nine functional
units interconnected by multiple data paths.
All address buses are 32-bit wide and all
the data buses are 64-bits wide.
Instruction cache: A 4-Kbyte cache
organized as a two-way set-associative
memory, transferring 64 bits per clock cycle.
Memory management unit: Translates
instruction and data addresses for further
processing.
Data cache: Also a two-way set-associative
memory, 8 Kbytes in size, transferring 128 bits
per clock cycle.
Bus control: Coordinates 64-bit data
transfers between the chip and the external
environment.
Integer unit (IU): Executes load, store, integer,
and control instructions, and fetches instructions
for the FPU as well.
Floating-point unit (FPU): Coordinates
two other basic units, the multiplier unit
and the adder unit.
• Multiplier and Adder units: Under the supervision of the FPU, these two units operate simultaneously.
Special dual floating-point instructions such as add-and-multiply and subtract-and-multiply use both
the adder and multiplier units.
• Graphics unit: Can work with 8-, 16-, or 32-bit pixel data types and also supports three-dimensional
drawing with color intensity and shading.
Superscalar and Vector processors
• Pipelining in Superscalar processor:
As mentioned earlier, a base scalar processor, whether RISC or CISC, has degree m = 1. To
exploit a higher degree of instruction-level parallelism, a superscalar processor of degree m must
issue m instructions per cycle.
The simple operation latency should require only one cycle, as in the base scalar processor.
A typical superscalar processor issues two to five instructions per cycle.
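One common idealized timing model for this comparison can be sketched as follows (an assumption-laden model: a k-stage pipeline, no hazards, and an instruction count N divisible by the degree m; the sample numbers are illustrative):

```python
def scalar_time(n: int, k: int) -> int:
    """Base scalar (m = 1): k cycles to fill the pipe,
    then one instruction completes per cycle."""
    return k + n - 1

def superscalar_time(n: int, k: int, m: int) -> float:
    """Degree-m superscalar issuing m instructions per cycle (n divisible
    by m, no resource conflicts): the first m instructions finish after
    k cycles, then m more complete every cycle."""
    return k + (n - m) / m
```

For N = 120 instructions on a 4-stage pipeline, this model gives 123 cycles for the base scalar and 43 for a degree-3 superscalar, with the ratio approaching m for large N.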
Fig 4.10: Typical VLIW processor with degree m=3
Fig 4.11: VLIW processor pipelining with degree m=3
• Hierarchical Memory Technology
• Storage devices are organized as hierarchy as shown in
figure on the right.
• The memory technology and storage organization at each
level are characterized by five parameters:
Access time (𝑡𝑖 ): Refers to the round-trip time from the CPU to
the ith-level memory.
Memory size (𝑠𝑖 ): Refers to the memory size in bytes
or words in ith level memory.
Cost per byte (𝑐𝑖 ): Refers to the cost per byte of the ith-level
memory; the total cost of that level is estimated by the product 𝑐𝑖 𝑠𝑖 .
Transfer bandwidth (𝑏𝑖 ): Refers to the rate at which
data are transferred between adjacent levels.
Unit of transfer (𝑥𝑖 ): Refers to the grain size for data
transfer between two adjacent levels.
Where 𝑖 = 1,2,3 … 𝑛
• Inclusion property
• The inclusion property states that 𝑀1 ⊂ 𝑀2 ⊂ 𝑀3 ⊂ ⋯ ⊂ 𝑀𝑛 . This implies that all information
is stored in the outermost level 𝑀𝑛 , so that subsets of 𝑀𝑛 are copied into 𝑀𝑛−1 , subsets of
𝑀𝑛−1 are copied into 𝑀𝑛−2 , and so on.
• Information transfer between CPU and cache is in terms of words.
• The cache is divided into cache blocks (typically 32 bytes or 8 words).
• Information transfer between cache and main memory is in terms of blocks.
• Main memory is divided into pages (typically 4 Kbytes), each consisting of a series of
blocks.
• Pages are the units of data transfer between main memory and disks or other external
drives.
• Pages are organized as segments in disk memory.
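The word/block/page granularities above can be made concrete by decomposing a byte address (a sketch assuming 4-byte words, the 32-byte blocks and 4-Kbyte pages mentioned in the text, and an illustrative helper name `decompose`):

```python
WORD_BYTES = 4       # assumed word size (not specified in the text)
BLOCK_BYTES = 32     # cache block: 32 bytes = 8 words, as in the text
PAGE_BYTES = 4096    # main-memory page: 4 Kbytes, as in the text

def decompose(addr: int):
    """Split a byte address into (page number, block number, word-in-block):
    pages move between disk and main memory, blocks between main memory
    and cache, and words between cache and CPU."""
    return (addr // PAGE_BYTES,                   # which page
            addr // BLOCK_BYTES,                  # which block
            (addr % BLOCK_BYTES) // WORD_BYTES)   # which word in the block
```

For example, byte address 4200 lies in page 1, block 131, word 2 of that block.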
Fig 4.13: Inclusion property and data transfer between memory hierarchy
• Coherence property
Coherence property requires that copies of the same information in successive memory levels be consistent.
If a word is modified in the cache, copies of that word must be updated, immediately or eventually, at all higher levels.
There are two ways to maintain the coherence in memory hierarchy:
o Write-through (WT): Immediate updating in level 𝑀𝑖+1 if there is any modification in level 𝑀𝑖 .
o Write-back (WB): Delaying the updating in level 𝑀𝑖+1 until the modified data is replaced or
removed in level 𝑀𝑖 .
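The two policies can be sketched with a toy one-level cache over a dictionary standing in for 𝑀𝑖+1 (the class and method names are illustrative, not from the text):

```python
class ToyCache:
    """Minimal cache over a backing 'memory' dict, showing how
    write-through (WT) and write-back (WB) propagate modifications."""

    def __init__(self, memory: dict, policy: str):
        self.memory, self.policy = memory, policy  # policy: "WT" or "WB"
        self.data, self.dirty = {}, set()

    def write(self, addr, value):
        self.data[addr] = value
        if self.policy == "WT":
            self.memory[addr] = value   # write-through: update M_{i+1} at once
        else:
            self.dirty.add(addr)        # write-back: mark dirty, defer update

    def evict(self, addr):
        if addr in self.dirty:          # write-back flushes on replacement
            self.memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        self.data.pop(addr, None)
```

Under WT the backing memory is always current; under WB it lags until the modified block is replaced.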
• Locality of references
Also known as the principle of locality, this depicts the tendency of a processor to access the same set of
memory locations repetitively over a short period of time. There are three properties
of locality:
o Temporal: Recently referenced items tend to be accessed again in the near future e.g.
instructions in a loop, local variables, subroutines.
o Spatial: Access tends to be clustered e.g. array or program segments that are situated in
neighboring addresses.
o Sequential: Instructions tend to be executed sequentially unless an out-of-order instruction (e.g.
branch) arrives.
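Spatial locality can be quantified by counting how many distinct cache blocks an address stream touches (a sketch assuming 32-byte blocks and illustrative access patterns: the sequential word scan clusters into few blocks, while the large-stride scan touches a new block on every access):

```python
def blocks_touched(addresses, block_bytes=32):
    """Number of distinct cache blocks referenced by an address stream."""
    return len({addr // block_bytes for addr in addresses})

# Sequential scan of 64 four-byte words: neighboring addresses share blocks.
seq = [4 * i for i in range(64)]
# Stride of 256 bytes: every access falls in a different block.
strided = [256 * i for i in range(64)]
```

With 32-byte blocks, the sequential stream touches only 8 blocks for its 64 accesses, the strided stream all 64, so the sequential stream gets far more hits from each block fetched.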
• Memory Capacity Planning
Hit ratio: The hit ratio is defined between any two adjacent memory levels. When information is found in 𝑀𝑖
(where 𝑖 = 1, 2, 3, …, n) we call it a hit, otherwise a miss.
The hit ratio (ℎ𝑖 ) at 𝑀𝑖 is the probability that the information is found at 𝑀𝑖 . The miss ratio at 𝑀𝑖 is
defined as (1 − ℎ𝑖 ).
Since the hit ratio is a probability, and in an n-level memory hierarchy the
outermost memory 𝑀𝑛 always hits, we have ℎ𝑛 = 1.
Access frequency: The access frequency at 𝑀𝑖 is defined as:
𝑓𝑖 = (1 − ℎ1 )(1 − ℎ2 ) ⋯ (1 − ℎ𝑖−1 )ℎ𝑖
This means that a hit occurs at 𝑀𝑖 only after misses at all 𝑖 − 1 lower memory
levels. The access frequencies therefore satisfy:
𝑓1 + 𝑓2 + ⋯ + 𝑓𝑛 = 1
and 𝑓1 = ℎ1 .
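The access-frequency formula can be checked with a short sketch (the hit ratios used in the example are illustrative; the last must be 1, since the outermost level always hits):

```python
def access_frequencies(hit_ratios):
    """f_i = (1 - h_1)...(1 - h_{i-1}) * h_i for each level i.
    With h_n = 1 the frequencies sum to 1."""
    freqs, miss_so_far = [], 1.0
    for h in hit_ratios:
        freqs.append(miss_so_far * h)   # hit at this level after prior misses
        miss_so_far *= (1 - h)          # probability of missing here too
    return freqs
```

For hit ratios (0.95, 0.9, 1.0) this gives frequencies near (0.95, 0.045, 0.005), summing to 1.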
Effective access time (𝑇𝑒𝑓𝑓 ): The hit ratio should be as high as possible at every level 𝑀𝑖 , since there is a time penalty for
every miss at every level. Taking this into account, the effective access time is defined as follows:
𝑇𝑒𝑓𝑓 = 𝑓1 𝑡1 + 𝑓2 𝑡2 + ⋯ + 𝑓𝑛 𝑡𝑛
= ℎ1 𝑡1 + (1 − ℎ1 )ℎ2 𝑡2 + (1 − ℎ1 )(1 − ℎ2 )ℎ3 𝑡3 + ⋯ + (1 − ℎ1 )(1 − ℎ2 ) ⋯ (1 − ℎ𝑛−1 )ℎ𝑛 𝑡𝑛
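A sketch of the effective-access-time formula, computing the access frequencies on the fly (the sample hit ratios and access times in the example are illustrative):

```python
def effective_access_time(hit_ratios, access_times):
    """T_eff = sum over levels of f_i * t_i, where
    f_i = (1 - h_1)...(1 - h_{i-1}) * h_i."""
    t_eff, miss_so_far = 0.0, 1.0
    for h, t in zip(hit_ratios, access_times):
        t_eff += miss_so_far * h * t    # contribution of a hit at this level
        miss_so_far *= (1 - h)          # probability the search continues
    return t_eff
```

For a two-level hierarchy with h = (0.95, 1.0) and t = (1, 100) cycles, this gives 0.95·1 + 0.05·100 = 5.95 cycles, showing how even a 5% miss ratio dominates the average.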
Hierarchy optimization: The total cost (𝐶𝑡𝑜𝑡𝑎𝑙 ) of the memory hierarchy is evaluated as:
𝐶𝑡𝑜𝑡𝑎𝑙 = 𝑐1 𝑠1 + 𝑐2 𝑠2 + ⋯ + 𝑐𝑛 𝑠𝑛
The cost is thus distributed over the n memory levels. Since 𝑐1 > 𝑐2 > 𝑐3 > ⋯ > 𝑐𝑛 , we have
to choose 𝑠1 < 𝑠2 < 𝑠3 < ⋯ < 𝑠𝑛 .
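A minimal sketch of the total-cost evaluation and of the constraint that costs fall while sizes grow across levels (the concrete costs and sizes are illustrative):

```python
def total_cost(costs_per_byte, sizes_bytes):
    """C_total = sum over levels of c_i * s_i."""
    return sum(c * s for c, s in zip(costs_per_byte, sizes_bytes))

def well_formed_hierarchy(costs_per_byte, sizes_bytes):
    """Check c_1 > c_2 > ... > c_n while s_1 < s_2 < ... < s_n."""
    return (all(a > b for a, b in zip(costs_per_byte, costs_per_byte[1:])) and
            all(a < b for a, b in zip(sizes_bytes, sizes_bytes[1:])))
```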