Aca 4
CSD-411
Advanced Computer Architecture
Vector Processor
• Vector architectures grab sets of data elements scattered in memory, place
them into large, sequential register files, operate on data in those register
files, and then disperse the results back into memory.
• Exploit data-level parallelism by applying a single instruction to a
collection of data in parallel.
• Components of VMIPS are:
• Vector registers
• Vector functional units
• Vector load/store unit
• A set of scalar registers
Example
• Y = a × X + Y
X and Y are vectors, initially resident in memory, and a is a scalar.
• VMIPS code for DAXPY
MIPS Code
Vector-Length
• The number of elements in each vector register.
• This length, which is 64 for VMIPS, is unlikely to match the real vector
length in a program.
• Example:
• VLR controls the length of any vector operation, including a vector load or
store.
• The value in the VLR cannot be greater than the length of the vector
registers.
• This solves the problem as long as the real length is less than or equal to
the maximum vector length (MVL).
• When the real length exceeds the MVL, use strip mining: create one loop to
handle any number of iterations that is a multiple of the MVL and another loop
to handle the remaining iterations.
• Strip-mined version of the DAXPY loop:
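As a rough illustration of the idea, here is a minimal Python sketch of strip mining applied to DAXPY. It is not VMIPS code: the function name `daxpy_strip_mined` and the list-based vectors are assumptions made for this sketch, and the inner loop merely stands in for one vector operation whose length would be set through the VLR. The remainder piece (length n mod MVL) is handled first, and every later piece uses the full MVL.

```python
MVL = 64  # maximum vector length; 64 for VMIPS according to the slides


def daxpy_strip_mined(a, x, y, mvl=MVL):
    """Compute y = a*x + y in place, in pieces no longer than mvl."""
    n = len(x)
    low = 0
    vl = n % mvl  # first piece covers the odd-size remainder (may be 0)
    for _ in range(n // mvl + 1):
        # This inner loop stands in for one vector operation of length vl.
        for i in range(low, low + vl):
            y[i] = a * x[i] + y[i]
        low += vl
        vl = mvl  # all later pieces use the full vector length
    return y


x = [float(i) for i in range(130)]
y = [1.0] * 130
daxpy_strip_mined(2.0, x, y)  # pieces of length 2, 64, 64
```

Handling the short piece first keeps the loop body identical for every piece; only the length changes between iterations.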
• The first convoy starts with the first LV instruction. The MULVS.D is dependent
on the first LV, but chaining allows it to be in the same convoy.
• The second LV instruction is in a separate convoy because there is a structural
hazard on the load/store unit with the prior LV instruction.
• The SV is in the third convoy because it has a structural hazard with the LV in
the second convoy.
Chaining
• Even though a pair of operations depend on one another, chaining allows
the operations to proceed in parallel on separate elements of the vector.
• A 500 MHz VMIPS would run this loop at 333 MFLOPS assuming no
strip-mining or start-up overhead.
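The 333 MFLOPS figure can be reproduced with a short calculation. This sketch assumes the three convoys described above (so three chimes, roughly three clock cycles per vector element) and two floating-point operations per element, the multiply and the add of DAXPY. The helper name `vector_mflops` is hypothetical.

```python
def vector_mflops(clock_mhz, chimes, flops_per_element):
    """Estimate sustained MFLOPS, ignoring strip-mining and start-up overhead."""
    cycles_per_flop = chimes / flops_per_element  # 3 / 2 = 1.5 for this loop
    return clock_mhz / cycles_per_flop


print(round(vector_mflops(500, 3, 2)))  # 333
```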
• The loop stores into an array element indexed by a × j + b and later fetches
from the same array element when it is indexed by c × k + d, where j and k are
two iteration indices within the loop bounds.
• In general, we cannot determine whether a dependence exists at compile
time. For example, the values of a, b, c, and d may not be known (they
could be values in other arrays), making it impossible to tell if a
dependence exists.
• Many programs contain simple indices where a, b, c, and d are all
constants. For these cases, it is possible to devise reasonable compile time
tests for dependence.
• A simple and sufficient test for the absence of a dependence is the greatest
common divisor (GCD) test.
• If a loop-carried dependence exists, then GCD(c,a) must divide (d – b).
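The GCD test is easy to express in code. The sketch below assumes the store uses index a × i + b and the load uses index c × i + d, with all four values known integer constants. The function name `gcd_test_may_depend` and the sample values are illustrative only. A result of `False` proves that no dependence exists; `True` only means the test cannot rule one out, since the test ignores the loop bounds.

```python
from math import gcd


def gcd_test_may_depend(a, b, c, d):
    """Return False if the GCD test proves no loop-carried dependence.

    Assumes a store to index a*i + b and a load from index c*i + d.
    """
    g = gcd(a, c)
    if g == 0:
        # Both indices are constant: they touch the same element only if b == d.
        return b == d
    # A dependence requires that GCD(c, a) divide (d - b).
    return (d - b) % g == 0


# Hypothetical indices: store to X[2*i + 3], load from X[2*i].
print(gcd_test_may_depend(2, 3, 2, 0))  # False: 2 does not divide -3
```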
• Example: Use the GCD test to determine whether dependences exist in the
following loop:
• Solution:
• When loops are unrolled, this sort of optimization is important to reduce the
impact of dependences arising from recurrences.
• Loop example:
• Selected instructions from different iterations are then put together in the
loop with the loop control instructions:
Predicated instruction
• Concept: An instruction refers to a condition, which is evaluated as part of
the instruction execution. If the condition is true, the instruction is
executed normally; if the condition is false, the execution continues as if
the instruction were a no-op.
• These instructions can be used to eliminate branches, converting a control
dependence into a data dependence and potentially improving
performance.
• Example: conditional move instruction – move a value from one register to
another if the condition is true.
• It can be used to completely eliminate a branch in simple sequences.
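To make the control-to-data conversion concrete, here is a small Python model of a conditional move. It is only a sketch: the name `cmovz` and the "move if the condition value is zero" convention are assumptions, and the values are plain integers standing in for registers. The branching version decides whether to execute the assignment; the predicated version always executes and selects the result with a mask, so the outcome depends on data alone.

```python
def with_branch(cond, dest, src):
    """Branching form: the assignment runs only if cond == 0."""
    if cond == 0:
        dest = src
    return dest


def cmovz(cond, dest, src):
    """Model of a conditional move: return src if cond == 0, else dest.

    The selection is done with a bit mask instead of a branch.
    """
    mask = -int(cond == 0)  # -1 (all ones) when cond == 0, otherwise 0
    return (src & mask) | (dest & ~mask)


print(cmovz(0, 10, 42))  # 42: condition true, value is moved
print(cmovz(7, 10, 42))  # 10: condition false, acts as a no-op
```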
• Example:
• Assuming that registers R1, R2, and R3 hold the values of A, S, and T,
respectively.
• Code using a branch:
• DRAM latency is about 100,000 times lower than disk latency, but this
performance advantage comes at a price: DRAM costs 30 to 150 times more per
gigabyte than disk.
Disk Power
• Power is an increasing concern for disks as well as for processors.
• A typical ATA disk in 2011 might use 9 watts when idle, 11 watts when
reading or writing, and 13 watts when seeking.
• Smaller platters, slower rotation, and fewer platters all help in reducing the
disk motor power.
RAID
• It stands for either Redundant Array of Independent Disks or Redundant
Array of Inexpensive Disks.
• It is a technology that is used to increase the performance and/or reliability
of data storage.
RAID 0 – Striping
• In a RAID 0 system, data are split up into blocks that are written across all
the drives in the array.
• RAID 0 offers great performance, both in read and write operations.
• RAID 0 does not provide redundancy or fault tolerance.
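A minimal sketch of how striping can place blocks, assuming a simple round-robin layout with one block per stripe unit. Real controllers may use other stripe sizes, and the function name `raid0_location` is hypothetical.

```python
def raid0_location(block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return block % num_disks, block // num_disks


# With 3 disks, consecutive logical blocks land on different disks,
# which is why large transfers can proceed on all drives in parallel.
for block in range(6):
    print(block, raid0_location(block, 3))
```

Because each block exists on exactly one disk, losing any single disk loses part of every large file, which matches the lack of fault tolerance noted above.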
RAID 1 – Mirroring
• Data are stored twice by writing them to both the data drive (or set of data
drives) and a mirror drive (or set of drives).
• If a drive fails, the controller uses either the data drive or the mirror drive
for data recovery and continuous operation.
• You need at least 2 drives for a RAID 1 array.
• RAID 1 is ideal for mission-critical storage, for instance for accounting
systems.
RAID 2
• It is an original RAID level but rarely used today.
• It is a striping technology that stripes at the bit level instead of the block
level, and uses a complex type of error correcting code that takes the place
of parity.
RAID 3
• It uses byte-level striping and parity, and stores parity calculations on
a dedicated disk.
RAID 4
• It stripes data at the block level and dedicates one disk to parity.
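The following sketch shows how a dedicated parity disk allows a lost block to be rebuilt, assuming the usual XOR parity over equal-sized blocks. The function names `parity_block` and `rebuild_missing` and the sample byte strings are illustrative, not taken from any particular RAID implementation.

```python
from functools import reduce


def parity_block(blocks):
    """XOR equal-length blocks byte by byte to form the parity block."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and parity."""
    return parity_block(surviving_blocks + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one block per data disk
parity = parity_block(data)                       # stored on the parity disk
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])  # True
```

XOR works here because every byte XORed with itself cancels out, so combining the parity with the surviving blocks leaves only the missing one.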