
Computer Organization and Architecture: Themes and Variations, 1st Edition, Alan Clements, Solutions Manual
Chapter 7: Processor Control
1. For the microprogrammed architecture of Figure P7.1, give the sequence of actions required to implement the
instruction ADD D0, D1 which is defined in RTL as [D1] ← [D1] + [D0].

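[Figure P7.1: block diagram of the datapath between bus A and bus B. Blocks shown: main store (Address, Data in, Data out, Read, Write), MAR, MBR, IR, PC, D0, D1, and an ALU with input latches Latch 1 (P) and Latch 2 (Q), output F = f(P,Q), and function-select lines F2 F1 F0. Control signals shown: GMSR, EMSR, GMSW, EMSW (main store); CMAR; CMBR, GMBR, EMBR; CIR, GIR, EIR; CPC, GPC, EPC; CD0, GD0, ED0; CD1, GD1, ED1; CL1, CL2. The memory performs a read when Read = 1 and a write when Write = 1.]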

Figure P7.1 Architecture of a hypothetical computer

You should describe the actions that occur in plain English (e.g., “Put data from this register on that bus”) and as a
sequence of events (e.g., Read = 1, EMSR). The table below defines the effect of the ALU’s function code. Note that
all data has to pass through the ALU (the copy function) to get from bus B or bus C to bus A.

F2  F1  F0   Operation                Result
0   0   0    Copy P to bus A          A = P
0   0   1    Copy Q to bus A          A = Q
0   1   0    Copy P + 1 to bus A      A = P + 1
0   1   1    Copy Q + 1 to bus A      A = Q + 1
1   0   0    Copy P - 1 to bus A      A = P - 1
1   0   1    Copy Q - 1 to bus A      A = Q - 1
1   1   0    Copy P + Q to bus A      A = P + Q
1   1   1    Copy P - Q to bus A      A = P - Q

© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied, or duplicated, or posted to a publicly available website, in whole or in part.
SOLUTION

To perform the addition, D0 must be latched into one ALU latch, D1 latched into the other, the ALU set to add,
and the result latched into D1. That is,

ED0 = 1, CL1 ;we can do D0 or D1 in any order and we can use latch L1 or latch L2
ED1 = 1, CL2 ;copy D1 via bus B into latch 2
ALU(f2,f1,f0) = 1,1,0, CD1 ;perform addition and latch result in D1.
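
The three microsteps above can be checked with a small software model. The following is a minimal sketch, assuming that latch 1 feeds ALU input P and latch 2 feeds input Q, as Figure P7.1 suggests; the class name `Datapath` and its methods `enable` and `alu` are illustrative inventions, not part of the text.

```python
# Toy model of the Figure P7.1 datapath. Signal names follow the figure;
# the class and method names are hypothetical, chosen for illustration.
ALU_FUNCTIONS = {
    (0, 0, 0): lambda p, q: p,        # A = P
    (0, 0, 1): lambda p, q: q,        # A = Q
    (0, 1, 0): lambda p, q: p + 1,    # A = P + 1
    (0, 1, 1): lambda p, q: q + 1,    # A = Q + 1
    (1, 0, 0): lambda p, q: p - 1,    # A = P - 1
    (1, 0, 1): lambda p, q: q - 1,    # A = Q - 1
    (1, 1, 0): lambda p, q: p + q,    # A = P + Q
    (1, 1, 1): lambda p, q: p - q,    # A = P - Q
}


class Datapath:
    def __init__(self, **registers):
        self.reg = {"PC": 0, "IR": 0, "MAR": 0, "MBR": 0, "D0": 0, "D1": 0}
        self.reg.update(registers)
        self.latch = {1: 0, 2: 0}

    def enable(self, source, *latches):
        """E<source> = 1 with CL1 and/or CL2: put a register on bus B
        and clock the bus into the named ALU latches."""
        for n in latches:
            self.latch[n] = self.reg[source]

    def alu(self, f, dest):
        """Apply function code f to (P, Q) = (latch 1, latch 2) and
        clock the result on bus A into register dest (C<dest>)."""
        self.reg[dest] = ALU_FUNCTIONS[f](self.latch[1], self.latch[2])


dp = Datapath(D0=5, D1=7)
dp.enable("D0", 1)           # ED0 = 1, CL1
dp.enable("D1", 2)           # ED1 = 1, CL2
dp.alu((1, 1, 0), "D1")      # ALU(f2,f1,f0) = 1,1,0, CD1
print(dp.reg["D1"])          # 12
```

The model ignores timing and bus contention; it only confirms that the sequence of enables and clocks moves the right values.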

2. For the architecture of Figure P7.1 write the sequence of signals and control actions necessary to implement the
fetch cycle.

SOLUTION

The fetch cycle involves reading the data at the address in the PC, moving the instruction read from memory to
the IR, and updating the PC.
EPC = 1, CL1 ;move PC via B bus to latch 1
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass PC through ALU and clock into MAR
;the PC is in L1 so we can increment it
ALU(f2,f1,f0) = 0,1,0, CPC ;use the ALU to increment L1 and move to PC
Read = 1, EMSR = 1, CL1 ;move instruction from memory to latch 1 via B bus
ALU(f2,f1,f0) = 0,0,0, CIR ;pass instruction through ALU and clock into IR

3. Why is the structure of Figure P7.1 so inefficient?

SOLUTION

Because there is only one bus to the ALU input and no direct connection between the B and A bus. This means
that all data has to go through the ALU, which becomes a bottleneck.

4. Why is the ALU instruction set of Figure P7.1 so inefficient?

SOLUTION

Because three of the operations are repeated. Since there is only one B bus input to the ALU via latch L1 or L2, it
does not matter whether data is passed from bus B to bus A via L1 or L2.

5. For the architecture of Figure P7.1, write the sequence of signals and control actions necessary to execute the
instruction ADD M,D0 that adds the contents of memory location M to data register D0 and deposits the results
in D0. Assume that the address M is in the instruction register IR.

SOLUTION

This instruction requires a memory read followed by an addition.

EIR = 1, CL1 ;move IR (i.e., address) via B bus to latch 1
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass IR through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

6. This question asks you to implement register indirect addressing. For the architecture of Figure P7.1, write the
sequence of signals and control actions necessary to execute the instruction ADD (D1),D0 that adds the
contents of the memory location pointed at by the contents of register D1 to register D0, and deposits the results
in D0. This instruction is defined in RTL form as [D0] ← [[D1]] + [D0].

SOLUTION

Here, we have to read the contents of a register, use it as an address, and read from memory.

ED1 = 1, CL1 ;move D1 (i.e., address) via B bus to latch 1
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass D1 (the pointer) through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is the actual data)
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

7. This question asks you to implement memory indirect addressing. For the architecture of Figure P7.1, write the
sequence of signals and control actions necessary to execute the instruction ADD [M],D0 that adds the
contents of the memory location pointed at by the contents of memory location M to register D0, and deposits the
results in D0. This instruction is defined in RTL form as [D0] ← [[M]] + [D0].

SOLUTION

We have to read the contents of a memory location, use it as an address, and read from memory. We can begin
with the same code we used for ADD M,D0.

EIR = 1, CL1 ;move IR (i.e., address) via B bus to latch 1
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass IR through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is a pointer)
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass the pointer through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is the data)
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

8. This question asks you to implement memory indirect addressing with index. For the architecture of Figure P7.1,
write the sequence of signals and control actions necessary to execute the instruction ADD [M,D1],D0, that
adds the contents of the memory location pointed at by the contents of memory location M plus the contents of
register D1 to register D0, and deposits the results in D0. This instruction is defined in RTL form as [D0] ←
[[M]+[D1]] + [D0].

SOLUTION

We have to read the contents of a memory location, generate an address by adding this to a data register, and
then use the sum to get the actual data. We can begin with the same code we used for ADD [M],D0.

EIR = 1, CL1 ;move IR (i.e., address) via B bus to latch 1
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass IR through ALU and clock into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is a pointer)
ED1 = 1, CL2 ;move D1 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CMAR ;perform addition to get the indexed address and clock result into MAR
Read = 1, EMSR = 1, CL1 ;move data from memory to latch 1 via B bus (this is the data)
ED0 = 1, CL2 ;move D0 via B bus to latch 2
ALU(f2,f1,f0) = 1,1,0, CD0 ;perform addition and clock result into D0

Note how microprogramming can implement any arbitrarily complex addressing mode.
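
The addressing modes of questions 5 to 8 differ only in how the operand address is formed. The sketch below restates their RTL definitions at the semantic level, ignoring the individual microsteps; the function name `operand`, the mode labels, and the sample memory contents are assumptions made for illustration.

```python
def operand(mode, mem, regs, m=None):
    """Return the source operand for ADD <source>,D0 under each addressing mode.

    mem is a dict mapping addresses to contents, regs holds D0 and D1,
    and m is the address field M held in the instruction register.
    """
    if mode == "direct":                    # ADD M,D0      -> [M]
        return mem[m]
    if mode == "register_indirect":         # ADD (D1),D0   -> [[D1]]
        return mem[regs["D1"]]
    if mode == "memory_indirect":           # ADD [M],D0    -> [[M]]
        return mem[mem[m]]
    if mode == "memory_indirect_indexed":   # ADD [M,D1],D0 -> [[M]+[D1]]
        return mem[mem[m] + regs["D1"]]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical memory image: location 100 holds the pointer 200.
mem = {3: 11, 100: 200, 200: 7, 203: 9}
regs = {"D0": 1, "D1": 3}

for mode in ("direct", "register_indirect",
             "memory_indirect", "memory_indirect_indexed"):
    print(mode, regs["D0"] + operand(mode, mem, regs, m=100))
```

Each extra level of indirection in the function corresponds to one more "clock into MAR, then read memory" pair in the microcode listings.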

9. For the microprogrammed architecture of Figure P7.1, define the sequence of actions (i.e., micro‐operations)
necessary to implement the instruction TXP1 (D0)+,D1 that is defined as:

[D1] ← 2*[M([D0])] + 1
[D0] ← [D0] + 1

Explain the actions in plain English and as a sequence of enables, ALU controls, memory controls and clocks. This
is quite a complex instruction because it requires a register‐indirect access to memory to get the operand and it
requires multiplication by two (there is no ALU multiplication instruction). You will probably have to use a
temporary register to solve this problem and you will find that it requires several cycles to implement this
instruction. A cycle is a sequence of operations that terminates in clocking data into a register.

SOLUTION

Now we have to perform quite a complex operation; that is, read from memory using a register indirect address,
double the value read, and add 1. The operand is the data in the location pointed at by D0. We have no multiplier
or shifter, so we must add the number to itself. D1 serves as a temporary register for the intermediate result.

ED0 = 1, CL1 ;move D0 via B bus to latch 1
ALU(f2,f1,f0) = 0,0,0, CMAR ;pass D0 through ALU and clock into MAR
Read = 1, EMSR = 1, CL1, CL2 ;move data from memory to latch 1 and latch 2 via B bus
;note that we have a copy of [M[D0]] in L1 and L2
ALU(f2,f1,f0) = 1,1,0, CD1 ;perform addition to get 2 × [M[D0]] in D1, which we use as a temp register
ED1 = 1, CL1 ;move D1 via B bus to latch 1
ALU(f2,f1,f0) = 0,1,0, CD1 ;perform P + 1 in the ALU and clock 2 × [M[D0]] + 1 into D1
;now increment D0
ED0 = 1, CL1 ;move D0 via B bus to latch 1
ALU(f2,f1,f0) = 0,1,0, CD0 ;perform [D0] + 1 in the ALU and latch into D0

10. Why was microprogramming such a popular means of implementing control units in the 1980s?

SOLUTION

In the 1980s memory was horrendously expensive by comparison with the cost of memory today. Every byte was
precious. Consequently, complex instructions were created to do a lot of work per instruction. These instructions
were interpreted in microcode in the CPU. Today, memory is cheap and simple regular instructions are the order
of the day (i.e., RISC). However, some processors like the IA32 have legacy code (complex instructions), that is still
interpreted by means of microcode.

11. Why is microprogramming so unpopular today?

SOLUTION

Microcode is not generally used today in new processors because executing microcode involves too many data
paths in series. In particular, there are several ROM look‐up paths in series. First, it is necessary to look up the
instruction to decode it. Then you have to look up each microinstruction in the microinstruction memory. Today,
RISC‐like processors with 32‐bit instructions are encoded so that the instruction word itself is able to directly
generate the signals necessary to interpret the instruction in a single cycle. In other words, the machine itself has
become the new microcode.

12. Figure P7.12 from the text demonstrates the execution of a conditional branch instruction in a flow‐through
computer. The grayed out sections of the computer are not required by a conditional branch instruction. Can you
think of any way in which these unused elements of the computer could be used during the execution of a
conditional branch?
[Figure P7.12: datapath of a flow-through computer executing BRA Target, where the target address is [PC]+4+4*L. Blocks shown: PC, PC_adder (+4), PC_MPLX, instruction memory (OpCode and Literal L fields), register file (S1address, S2address, S1data, S2data), ALU_MPLX, ALU, data memory (Maddress, Mdata_in, Mdata_out), Memory_MPLX, sign extension, left shift × 2, and Branch_adder. The literal is sign-extended to a 32-bit word offset and shifted left to give a 32-bit byte offset, from which the branch adder forms the 32-bit branch target address. The Z-bit from the CCR controls the PC multiplexer; it selects between the next address and the branch address.]

Figure P7.12 Architecture of a hypothetical computer

SOLUTION

In this example, the register file, ALU, and data memory are not in use. This raises an interesting question: could a
branch be combined with another operation that could be performed in parallel, rather like the VLIW (very long
instruction word) computers that we look at in Chapter 8? For example, you could imagine an instruction BEQ
target: r0++ which performs a conditional branch to target and also increments register r0. Of course, the
price of such an extension would be to reduce the number of bits available for the target address.

13. What modifications would have to be made to the architecture of the computer in Figure P7.12 to implement
predicated execution like the ARM?

SOLUTION

The ARM predicates instructions; for example, ADDEQ r0,r1,r2. A predicated instruction is executed if the
stated condition is true. In this case ADDEQ r0,r1,r2 is executed if the Z‐bit of the status is true. One way of
implementing predicated execution would be to jam a NOP (no operation) instruction into the
instruction register if the predicated condition is false. Another solution would be to put AND gates in all paths
that generate signals that clock or update registers and status values. If the predicated condition is false, all
signals that perform an update are negated and the state of the processor does not change.

14. What modifications would have to be added to the computer of Figure P7.12 to add a conditional move
instruction with the format MOVZ r1,r2,r3 that performs [r1] ← [r2] if [r3] == 0?

SOLUTION

The basic data movement can be implemented in the normal way using existing data paths from the register file,
through the ALU, the memory multiplexer, and back to the register file. To implement the conditional action, register r3
must be routed to the ALU and compared with zero. The result of the comparison is used to determine whether a
writeback (i.e., writing r2 into r1) would take place in the next pipeline stage.

15. What modifications would have to be made to the architecture of the computer in Figure P7.12 to implement
operand shifting (as part of a normal instruction) like the ARM?

SOLUTION

As in the case of the ARM processor family, it would require a barrel shifter in one of the inputs to the ALU so that
the operand is shifted before use. The number of shifts to be performed could be taken from the op‐code (for
example, from the literal field). However, the existing structure could not implement an ARM‐like dynamic shift
ADD r0,r1,r2, lsl r3 , because the register file does not have three address inputs. In order to provide
dynamic shifts, it would be necessary to add an extra address input and an extra read data port to the register file.

16. Derive an expression for the speedup ratio (i.e., the ratio of the execution time without pipelining to the
execution time with pipelining) of a pipelined processor in terms of the number of stages in the pipeline m and
the number of instructions to be executed N.

SOLUTION

Suppose that the number of instructions to be executed is N. A pipelined processor would take N + m ‐ 1 clocks to
execute them. The term (m ‐ 1) is due to the time for the last instruction to pass through the pipeline. The speedup
relative to an unpipelined system that would require N⋅m cycles (N instructions each executed in m stages) is N⋅m/(N + m ‐ 1).
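
A quick numerical check shows how this ratio behaves. The sketch below simply evaluates the expression derived above; the function name `pipeline_speedup` is an illustrative choice.

```python
def pipeline_speedup(n_instructions, m_stages):
    """Ideal speedup N*m / (N + m - 1) of an m-stage pipeline over N instructions."""
    unpipelined_cycles = n_instructions * m_stages
    pipelined_cycles = n_instructions + m_stages - 1
    return unpipelined_cycles / pipelined_cycles


for n in (1, 10, 100, 1_000_000):
    print(n, round(pipeline_speedup(n, 5), 4))
```

For a single instruction the speedup is 1, and as N grows the ratio approaches, but never reaches, the number of stages m.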

17. In what ways is the formula for the speedup of the pipeline derived in the previous question flawed?

SOLUTION

There are two flaws. The first is that the pipeline can be exploited fully only if the pipeline is continually supplied
with instructions. However, interactions between data elements, competition for resources, and branch
operations reduce the efficiency of a pipeline. These factors can introduce stall cycles (wait states for resources)
or force the pipeline to be flushed.

However, there is another factor to consider. In order to pipeline a process, it is necessary to place a register
between stages. The register has a setup and hold time which must be taken into account; that is, the pipeline
register increases the effective length of each stage.

18. A processor executes an instruction in the following six stages. The time required by each stage in picoseconds
(1,000 ps = 1 ns) is given for each stage.

IF   Instruction fetch            300 ps
ID   Instruction decode           150 ps
OF   Operand fetch                250 ps
OE   Execute                      350 ps
M    Memory access                700 ps
OS   Operand store (writeback)    200 ps

a. What is the time to execute an instruction if the processor is not pipelined?


b. What is the time taken to fully execute an instruction assuming that this structure is pipelined in six stages
and that there is an additional 20 ps per stage due to the pipeline latches?
c. Once the pipeline is full, what is the average instruction rate?
d. Suppose that 25% of instructions are branch instructions that are taken and cause a 3‐cycle penalty, what is
the effective instruction execute time?

SOLUTION

a. Add up the individual times: 300 + 150 + 250 + 350 + 700 + 200 = 1950 ps = 1.95 ns.

b. The longest stage is 700 ps, which determines the clock period. With 20 ps for the latches, the clock period is
720 ps, so the time is 720 × 6 = 4320 ps = 4.32 ns.

c. One instruction per clock; that is, one every 720 ps (about 1.39 × 10^9 instructions per second).

d. 75% of instructions are not taken branches, and these contribute on average 0.75 × 720 ps = 540 ps. 25% are taken
branches, which contribute 0.25 × 3 × 720 ps = 540 ps. The effective time per instruction is 540 + 540 = 1080 ps.
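
The arithmetic in parts a to d can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative, and `TAKEN_BRANCH_CYCLES = 3` follows the solution's reading that a taken branch occupies three cycles in total. If the 3-cycle penalty were instead counted in addition to the branch's own cycle, the constant would be 4 and part d would give 1260 ps.

```python
stage_ps = {"IF": 300, "ID": 150, "OF": 250, "OE": 350, "M": 700, "OS": 200}
LATCH_PS = 20               # pipeline latch overhead per stage
TAKEN_FRACTION = 0.25       # fraction of instructions that are taken branches
TAKEN_BRANCH_CYCLES = 3     # assumption: total cycles for a taken branch

unpipelined_ps = sum(stage_ps.values())            # part a
clock_ps = max(stage_ps.values()) + LATCH_PS       # slowest stage plus latch
latency_ps = clock_ps * len(stage_ps)              # part b
rate_per_second = 1e12 / clock_ps                  # part c
effective_ps = ((1 - TAKEN_FRACTION) * clock_ps
                + TAKEN_FRACTION * TAKEN_BRANCH_CYCLES * clock_ps)  # part d

print(unpipelined_ps, clock_ps, latency_ps, effective_ps)
```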

19. Both RISC and CISC processors have registers. Answer the following questions about registers.

a. Is it true that a larger number of registers in any architecture is always better than a smaller number?
b. What limits the number of registers that can be implemented by any ISA?
c. What are the relative advantages and disadvantages of dedicated registers like the IA32 architecture
compared to general purpose registers like ARM and MIPS?
d. If you have an m‐bit register select field in an instruction, you can’t have more than 2^m registers. There are, in
fact, ways round this restriction. Suggest ways of increasing the number of registers beyond 2^m while keeping
an m‐bit register select field.

SOLUTION

a. In principle yes, as long as you don’t have to pay a price for them. More registers means fewer memory
accesses and that is good. However, if you have to perform a context switch when you run a new task, having
to save a lot of registers may be too time‐consuming. Having more registers requires more bits in an
instruction to specify them. If you allocate too many bits to register specification then you have a more
limited instruction set.

b. Today, it’s the number of bits required to specify a register. A processor like the Itanium IA64 with a much
longer instruction word can specify more registers.

c. Having fixed special purpose registers permits more compressed code. For example, if you have a counter
register, any instruction using the counter doesn’t need to specify the register – because that is fixed. The
weakness is that you can’t have two counter registers. Computers that originated in the CISC era like the
IA32 architecture use special‐purpose registers, because they were designed when saving bits (reducing
instruction size) was important. Remember that early 8‐bit microprocessors had an 8‐bit instruction set.
More recent architectures are RISC based and have general‐purpose registers. ARM processors are
unusual in the sense that they have a small general‐purpose register set that includes two special‐purpose
registers, a link register for return addresses and the program counter itself.

d. Of course, you can’t address more than 2^m registers with an m‐bit address field. But you can use a set of more
than 2^m registers of which only 2^m are currently visible. Such a so‐called windowing technique has been used
in, for example, the Berkeley RISC and the SPARC processor. Essentially, every time you call a
subroutine/function you get a new set of register windows (these are still numbered r0 to r31). However,
each function has its own private registers that cannot be accessed from other functions. There are also
global registers common to all functions and parameter passing registers that are shared with parent and
child functions. Such mechanisms have not proved popular. The problem is that if you deeply nest
subroutines, you end up having to dump registers to memory.

20. Someone once said, “RISC is to hardware what UNIX is to software”. What do you think this statement means and
is it true?

SOLUTION

This is one of those pretentious statements that people make for effect. UNIX is the operating system loved by
many computer scientists and is often contrasted with operating systems from large commercial organizations
such as Microsoft. By analogy, RISC processors were once seen as an opportunity for small companies and
academics to develop hardware at a time when existing processors were being developed by large corporations at
considerable expense. Relatively small teams were required to design MIPS or the ARM processor compared to an
Intel IA32 processor. In that sense RISC/UNIX were seen as returning hardware/software to the masses. Over the
years, the distinction between RISC and CISC processors has become very blurred, even though the computing world
is still, to some extent, divided into UNIX and Windows spheres.

21. What are the characteristics of a RISC processor that distinguish it from a CISC processor? Does it matter whether
this question is asked in 2015 or 1990?

SOLUTION

The classic distinction between RISC processors and CISC processors is that RISC processors are pipelined and
have small, simple, and highly regular instruction sets. RISC processors are also called load/store processors, with
the only memory access operations being load and store. All data processing operations are register‐to‐register.
CISC processors tend to have irregular instruction sets, special purpose registers, complex instruction
interpretation hardware and memory to memory operations. However, the difference between modern RISC and
CISC processors is blurred and the distinction is no longer as significant as it was. RISC techniques have been
applied to CISC processors and even traditional complex instruction set processors are highly pipelined. Equally,
some RISC processors have quite complex instruction sets. One difference is that today’s RISC processors have not
returned to memory‐to‐memory or memory‐to‐register instruction formats.

22. What, in the context of pipelined processors, is a bubble and why is it detrimental to the performance of a
pipelined processor?

SOLUTION

As an instruction flows through a pipeline, various operations are applied to it. For example, in the first stage it is
fetched from memory and it may be decoded. In the second stage any operands it requires are read from the
register file, and so on. Sometimes, it is not possible to perform an operation on an instruction. For example, if an
operand is required and that operand is not ready, the stage processing the operand cannot continue. This results
in a bubble or a stall when ‘nothing happens’. Equally, bubbles appear when a branch is taken and instructions
following the branch are no longer going to be executed. So, a bubble is any condition that leads to a stage in the
pipeline not performing its normal operation because it cannot proceed. A bubble is detrimental to performance
because it means that an operation that could be executed is not executed and its time slot is wasted.

23. To say that the RISC philosophy was all about reducing the size of instruction sets would be wrong and entirely
miss the point. What enduring trends or insights did the so‐called RISC revolution bring to computer architecture
including both RISC and CISC design?

SOLUTION

Designers learned to look at the whole picture rather than just optimizing one or two isolated aspects of the
processor. In particular there was a movement toward the use of benchmarks to improve performance. That is,
engineers applied more rigorous design techniques to the construction of new processors.

24. There are RAW, WAR, and WAW data hazards. What about RAR (read‐after‐read)? Can a RAR operation cause
problems in a pipelined machine?

SOLUTION

No. A read‐after‐read situation would be:


ADD r1,r2,r3
ADD r4,r2,r7

In the above code, register r2 is read by both instructions. Since the value of r2 is altered by neither operation and
it does not matter (semantically) which instruction is executed first, there can be no problem.
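
The same reasoning can be expressed as a check on register names. The sketch below classifies the register hazards between two consecutive three-address instructions; the tuple layout `(dest, src1, src2)` and the function name `hazards` are assumptions made for illustration.

```python
def hazards(first, second):
    """Return the set of register hazards when `second` follows `first`.

    Each instruction is a tuple (dest, src1, src2) of register names.
    """
    dest1, *sources1 = first
    dest2, *sources2 = second
    found = set()
    if dest1 in sources2:
        found.add("RAW")    # second reads what first writes
    if dest2 in sources1:
        found.add("WAR")    # second writes what first reads
    if dest1 == dest2:
        found.add("WAW")    # both write the same register
    return found


# ADD r1,r2,r3 followed by ADD r4,r2,r7: r2 is only read, so no hazard.
print(hazards(("r1", "r2", "r3"), ("r4", "r2", "r7")))    # set()
```

No rule in the function mentions a shared source register, which is the point: two reads of the same register place no ordering constraint on the instructions.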

25. Consider the instruction sequence in a five‐stage pipeline IF, OF, E, M, OS:

1. ADD r0,r1,r2
2. ADD r3,r0,r5
3. STR r6,[r7]
4. LDR r8,[r7]

Instructions 1 and 2 will create a RAW hazard. What about instructions 3 and 4? Will they also create a RAW
hazard?

SOLUTION

Yes ‐ possibly. Register r6 may not have been stored before it is read (in memory) by the next instruction. Of
course, part of the problem is the bad code. You are storing a value in memory and then reading it back. You
should replace the LDR r8,[r7] by MOV r8,r6.

26. A RISC processor has a three‐address instruction format and typical arithmetic instructions (i.e., ADD, SUB, MUL,
DIV etc.). Write a suitable sequence of instructions to evaluate the following expression in the minimum time:

X = ((A+B)(A+B+C)E + H) / (G + A + B + D + F(A+B-C))

Assume that all variables are in registers and that the RISC does not include a hardware mechanism for the
elimination of data dependency. Each instance of data dependency causes one bubble in the pipeline and wastes
one clock cycle.

SOLUTION

It is necessary to write the code with the minimum number of RAWs. For example,

ADD T1,A,B ;A+B
ADD T2,G,D ;G+D
ADD T3,T1,C ;A+B+C
ADD T2,T2,T1 ;G+A+B+D
MUL T4,T1,E ;(A+B)E
SUB T5,T1,C ;A+B-C
MUL T4,T4,T3 ;(A+B)(A+B+C)E
MUL T5,T5,F ;F(A+B-C)
ADD T4,T4,H ;(A+B)(A+B+C)E+H
ADD T5,T5,T2 ;G+A+B+D+F(A+B-C)
DIV T4,T4,T5 ;((A+B)(A+B+C)E+H)/(G+A+B+D+F(A+B-C)) (one stall)
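
The schedule can be checked mechanically. The following sketch interprets the sequence, with the fifth instruction multiplying by E as its comment indicates, and counts a stall whenever an instruction reads the register written by the instruction immediately before it. The sample values and the helper name `run` are assumptions chosen for illustration.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub,
       "MUL": operator.mul, "DIV": operator.truediv}

program = [
    ("ADD", "T1", "A", "B"),
    ("ADD", "T2", "G", "D"),
    ("ADD", "T3", "T1", "C"),
    ("ADD", "T2", "T2", "T1"),
    ("MUL", "T4", "T1", "E"),
    ("SUB", "T5", "T1", "C"),
    ("MUL", "T4", "T4", "T3"),
    ("MUL", "T5", "T5", "F"),
    ("ADD", "T4", "T4", "H"),
    ("ADD", "T5", "T5", "T2"),
    ("DIV", "T4", "T4", "T5"),
]


def run(program, values):
    """Execute the program; return (registers, number of back-to-back RAW stalls)."""
    regs = dict(values)
    stalls = 0
    previous_dest = None
    for op, dest, src1, src2 in program:
        if previous_dest in (src1, src2):
            stalls += 1     # operand produced by the preceding instruction
        regs[dest] = OPS[op](regs[src1], regs[src2])
        previous_dest = dest
    return regs, stalls


values = dict(A=1, B=2, C=3, D=4, E=5, F=6, G=7, H=8)
regs, stalls = run(program, values)
print(regs["T4"], stalls)    # 7.0 1
```

With these values the numerator is 3 × 6 × 5 + 8 = 98 and the denominator is 14 + 6 × 0 = 14, so X = 7, and the only stall is the final DIV waiting for T5.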

27. Figure P7.27 gives a partial skeleton diagram of a pipelined processor. What is the purpose of the flip‐flops
(registers) in the information paths?

Figure P7.27 Structure of a Pipelined Processor

SOLUTION

The problem with an architecture like that of Figure P7.27 is that when an instruction is processed (e.g., an operation
and its operands), all the information must be in place at the same time. For example, if you perform a = b + c
followed by p = q ‐ r, it would be unfortunate if q and r arrived at the ALU at the same time as the + operator. This
would lead to the erroneous operation p = q + r.

Once an instruction goes from PC to instruction memory to instruction register, it is divided into fields (operands,
constants, instructions) and each of these fields provides data that flows along different paths. For example, the
op‐code goes to the ALU immediately, whereas the operands (during a register‐to‐register operation) go via the
register file where operand addresses are translated into operand values. The flip‐flops equalize the time at which
data and operations arrive at the ALU. It is also necessary to put a delay in the destination address path because
the destination address has to wait an extra cycle – the time required for the ALU to perform an operation.

28. Explain why branch operations reduce the efficiency of a pipelined architecture. Describe how branch prediction
improves the performance of a RISC processor and minimizes the effect of branches?

SOLUTION

[Figure: timing diagram of a four-stage pipeline (IF = instruction fetch, OF = operand fetch, E = execute, S = result store) for instructions i-1, i (the branch instruction BRA N), i+1, i+2, N, and N+1. It is not until after the execute phase of the branch that a fetch from the target address can begin, so instructions i+1 and i+2 are not executed and appear as bubbles. N is the first instruction at the branch target address.]

The figure demonstrates the effect of a bubble in a pipelined architecture due to a branch. The pipeline inputs a
string of instructions and executes them in stages; in this example, it’s four. Once the pipe is full, four instructions
are in varying stages of completion. If a branch is read into the pipeline and that branch is taken, the instructions
following the branch are not going to be executed. Instructions ahead of the branch will be executed. A bubble is
the term used to describe a pipeline state where the current instruction must be rejected. In this figure it takes
two clocks before normal operation can be resumed.

29. Assume that a RISC processor uses branch prediction to improve its performance. The table below gives the
number of cycles taken for predicted and actual branch outcomes. These figures include both the cycles taken by
the branch itself and the branch penalty associated with branch instructions.

                       Actual
Prediction      Not taken    Taken
Not taken           1          4
Taken               2          1

If pb is the probability that a particular instruction is a branch, pt is the probability that a branch is taken, and pw is
the probability of a wrong prediction, derive an expression for the average number of cycles per instruction, TAVE.
All non‐branch instructions take one cycle to execute.

SOLUTION

The total number of possible outcomes of an instruction are:

Non‐branch cycles + branches not taken and predicted not taken + branches not taken and predicted taken +
branches taken and predicted taken + branches taken and predicted not taken

In each case, we multiply the probability of the event by the cost of the event; that is:

TAVE = (1 ‐ pb)⋅1 + pb ⋅((1 ‐ pt)⋅(1 ‐ pw)⋅1 + (1 ‐ pt)⋅pw⋅2 + pt⋅(1 ‐ pw)⋅1 + pt⋅pw⋅4 )

Remember that if pt is the probability of a branch being taken, 1 ‐ pt is the probability of a branch not being taken.
If pw is the probability of a wrong prediction, 1 ‐ pw is the probability of a correct prediction.

Therefore, the average number of cycles is
TAVE = 1 ‐ pb(1 ‐ 1 + pt + pw ‐ pt⋅pw ‐ 2⋅pw + 2⋅pt⋅pw ‐ pt + pt⋅pw ‐ 4⋅pt⋅pw)
= 1 ‐ pb⋅(‐pw ‐ 2⋅pt⋅pw)
= 1 + pb⋅pw⋅(1 + 2⋅pt)
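The closed form can be cross-checked numerically. The following is a minimal sketch, not part of the original solution: it enumerates the four branch outcomes from the cost table and compares the sum with 1 + pb⋅pw⋅(1 + 2⋅pt). The function names and the sample probabilities are illustrative assumptions.

```python
from itertools import product

# Cycle costs from the table in question 29, keyed by (predicted_taken, actually_taken).
COST = {(False, False): 1, (False, True): 4, (True, False): 2, (True, True): 1}


def t_ave_enumerated(pb, pt, pw):
    """Sum probability x cost over every possible outcome of one instruction."""
    total = (1 - pb) * 1  # non-branch instructions take one cycle
    for taken, wrong in product((False, True), repeat=2):
        p_taken = pt if taken else 1 - pt
        p_wrong = pw if wrong else 1 - pw
        # A wrong prediction is the opposite of the actual outcome.
        predicted_taken = taken != wrong
        total += pb * p_taken * p_wrong * COST[(predicted_taken, taken)]
    return total


def t_ave_closed_form(pb, pt, pw):
    """Simplified expression derived in the solution."""
    return 1 + pb * pw * (1 + 2 * pt)


if __name__ == "__main__":
    # Sample probabilities are arbitrary and chosen only for illustration.
    for pb, pt, pw in [(0.2, 0.8, 0.1), (0.15, 0.5, 0.25), (1.0, 1.0, 1.0)]:
        print(pb, pt, pw, t_ave_enumerated(pb, pt, pw), t_ave_closed_form(pb, pt, pw))
```

The two functions should agree for any probabilities between 0 and 1, which is a quick way to catch a sign slip in the algebra.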

30. IDT application note AN33 [IDT89] gives an expression for the average number of cycles per instruction in a RISC
system as:

Cave = Pb(1 + b) + Pm(1 + m) + (1 ‐ Pb ‐ Pm) where:

Pb = probability that an instruction is a branch
b = branch penalty
Pm = probability that an instruction is a memory reference
m = memory reference penalty

Explain the validity of this expression. How do you think that it might be improved?

SOLUTION

The first term, Pb(1 + b), is the probability of a branch multiplied by the total cost of a branch (i.e., 1 plus the
branch penalty). The second term, Pm(1 + m), deals with memory accesses and is the probability of a memory
access multiplied by the total memory access cost. The final term, (1 ‐ Pb ‐ Pm), is what’s left over: the probability
that an instruction is neither a branch nor a memory access, multiplied by its cost of one cycle.
This formula is limited in the sense that it does not distinguish between branches that are taken and those that are
not taken, or between memory accesses that hit the cache and those that miss. However, its message is clear: reduce both
branches and memory accesses.

31. RISC processors rely (to some extent) on on‐chip registers for their performance increase. A cache memory can
provide a similar level of performance increase without restricting the programmer to a fixed set of registers.
Discuss the validity of this statement.

SOLUTION

Memory accesses can take orders of magnitude longer than register accesses. Because RISC style processors have
far more registers than CISC processors, it is possible to operate on a subset of data stored within the chip and to
reduce memory accesses.

However, cache memory, which is a copy of some frequently‐used memory, can reduce the memory access
penalty by keeping data in the on‐chip cache.

One argument in favor of cache is that it is handled automatically by the hardware. Registers have to be allocated
by the programmer or the compiler. If the number of registers is limited, it is possible that the on‐chip registers
may be used/allocated non‐optimally.

Cache memory also has the advantage that it supports dynamic data structures like the stack. Most computers do
not allow dynamic data structures based on registers (that is, you can’t access register ri, where i is an index). The
Itanium IA64 that we discuss in Chapter 8 does indeed have dynamic registers.

32. RISC processors best illustrate the difference between architecture and implementation. To what extent is this
statement true (or not true)?

SOLUTION

We have already stated that architecture and organization are orthogonal; that is, they are independent. In
principle, this statement is true. You can create an instruction set on paper and then implement it any way you
want: via direct logic (called random logic) or via a structure such as microprogramming. However, some design
or organization techniques may be suited or unsuited to a particular architecture. CISC processors are
characterized both by complicated instructions (i.e., multiple‐part instructions or instructions with complex
addressing modes) and by irregular instruction encodings; for example, BFFFO (locate the first bit set to 1) can
be regarded as a complex instruction. Consequently, CISC instruction sets are well‐suited to
implementation/interpretation via microcode. The instruction lookup table simply translates a machine code
value into the location of the appropriate microcode. It doesn’t matter how odd the instruction encoding is.

RISC processors with simple instructions are well suited to implementation by pipelining because of the regularity
of a pipeline; that is, all instructions are executed in approximately the same way.

33. A RISC processor executes the following code. There are no data dependencies.

ADD r0,r1,r2
ADD r3,r4,r5
ADD r6,r7,r8
ADD r9,r10,r11
ADD r12,r13,r14
ADD r15,r16,r17

a. Assuming a 4‐stage pipeline (fetch, operand fetch, execute, write), what registers are being read during the 6th
clock cycle and what register is being written?

b. Assuming a 5‐stage pipeline (fetch, operand fetch, execute, write, store), what registers are being read during
the 6th clock cycle and what register is being written?

SOLUTION

a. Four‐stage pipeline
Cycle              1   2   3   4   5   6   7   8
ADD r0,r1,r2       IF  OF  E   W
ADD r3,r4,r5           IF  OF  E   W
ADD r6,r7,r8               IF  OF  E   W
ADD r9,r10,r11                 IF  OF  E   W
ADD r12,r13,r14                    IF  OF  E   W
ADD r15,r16,r17                        IF  OF  E

During the 6th clock cycle, operands r13 and r14 are being read and operand r6 is being written.

b. Five‐stage pipeline

Cycle              1   2   3   4   5   6   7   8
ADD r0,r1,r2       IF  OF  E   M   W
ADD r3,r4,r5           IF  OF  E   M   W
ADD r6,r7,r8               IF  OF  E   M   W
ADD r9,r10,r11                 IF  OF  E   M   W
ADD r12,r13,r14                    IF  OF  E   M
ADD r15,r16,r17                        IF  OF  E

During the 6th clock cycle operands r13 and r14 are being read and operand r3 is being written.

34. A RISC processor executes the following code. There are data dependencies but no internal forwarding. A source
operand cannot be used until it has been written.

ADD r0,r1,r2
ADD r3,r0,r4
ADD r5,r3,r6
ADD r7,r0,r8
ADD r9,r0,r3
ADD r0,r1,r3

a. Assuming a 4‐stage pipeline: fetch, operand fetch, execute, result write, what registers are being read during
the 10th clock cycle and what register is being written?
b. How long will it take to execute the entire sequence?

SOLUTION

Cycle           1   2   3   4   5   6   7   8   9   10  11  12  13
ADD r0,r1,r2    IF  OF  E   W
ADD r3,r0,r4        IF  --  --  OF  E   W
ADD r5,r3,r6            IF  --  --  --  --  OF  E   W
ADD r7,r0,r8                        IF  --  --  OF  E   W
ADD r9,r0,r3                                    IF  OF  E   W
ADD r0,r1,r3                                        IF  OF  E   W

(-- = stall cycle)

a. In the 10th cycle registers r0 and r3 are being read and register r5 is being written.

b. It takes 13 cycles to complete the sequence.

35. A RISC processor has an eight stage pipeline: F D O E1 E2 MR MW WB (fetch, decode, register read operands,
execute 1, execute 2, memory read, memory write, result writeback to register). Simple logical and arithmetic
operations are complete by the end of E1. Multiplication is complete by the end of E2. How many cycles are
required to execute the following code assuming that internal forwarding is not used?

MUL r0,r1,r2
ADD r3,r1,r4
ADD r5,r1,r6
ADD r6,r5,r7
LDR r1,[r2]

SOLUTION

Cycle             1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17
1 MUL r0,r1,r2    F   D   O   E1  E2  MR  MW  WB
2 ADD r3,r1,r4        F   D   O   E1  E2  MR  MW  WB
3 ADD r5,r1,r6            F   D   O   E1  E2  MR  MW  WB
4 ADD r6,r5,r7                F   D   --  --  --  --  --  O   E1  E2  MR  MW  WB
5 LDR r1,[r2]                     F   D   --  --  --  --  --  O   E1  E2  MR  MW  WB

(-- = stall cycle)

There’s only one RAW dependency in instruction 4 involving r5. The total number of cycles is 17.
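The no‐forwarding timings in questions 34 and 35 can be cross-checked with a small model. The sketch below is an illustration under stated assumptions, not the textbook's method: the pipeline issues in order, each instruction reads its source registers in one designated stage, a result becomes readable only in the cycle after it is written in the final stage, and no other hazards exist. The function name `schedule` and the tuple encoding of instructions are hypothetical.

```python
def schedule(program, n_stages, read_stage):
    """Return a list of (read_cycle, write_cycle) pairs, one per instruction.

    program    -- list of (destination, [sources]) register names
    n_stages   -- number of pipeline stages; the last stage writes the result
    read_stage -- zero-based index of the stage that reads source registers
    Cycles are numbered from 1. No forwarding: a value written in cycle c
    can first be read in cycle c + 1.
    """
    last_write = {}   # register -> cycle in which its newest value is written
    prev_read = 0     # read cycle of the previous instruction (in-order issue)
    timing = []
    for i, (dest, sources) in enumerate(program):
        # Earliest possible read cycle with no stalls, and never before the
        # cycle after the previous instruction used the read stage.
        read = max(i + 1 + read_stage, prev_read + 1)
        for reg in sources:
            if reg in last_write:
                read = max(read, last_write[reg] + 1)
        write = read + (n_stages - 1 - read_stage)
        last_write[dest] = write
        prev_read = read
        timing.append((read, write))
    return timing


# Question 34: four stages (IF, OF, E, W), operands read in OF (stage index 1).
Q34 = [("r0", ["r1", "r2"]), ("r3", ["r0", "r4"]), ("r5", ["r3", "r6"]),
       ("r7", ["r0", "r8"]), ("r9", ["r0", "r3"]), ("r0", ["r1", "r3"])]

# Question 35: eight stages (F, D, O, E1, E2, MR, MW, WB), operands read in O (index 2).
Q35 = [("r0", ["r1", "r2"]), ("r3", ["r1", "r4"]), ("r5", ["r1", "r6"]),
       ("r6", ["r5", "r7"]), ("r1", ["r2"])]

if __name__ == "__main__":
    print(schedule(Q34, n_stages=4, read_stage=1))
    print(schedule(Q35, n_stages=8, read_stage=2))
```

Under these assumptions the last write cycle of each program is the total cycle count, which matches the 13 and 17 cycles quoted in the two solutions.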

36. Repeat the previous problem assuming that internal forwarding is implemented.

SOLUTION

Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
1 MUL r0,r1,r2 F D O E1 E2 MR MW WB
2 ADD r3,r1,r4 F D O E1 E2 MR MW WB
3 ADD r5,r1,r6 F D O E1 E2 MR MW WB
4 ADD r6,r5,r7 F D O E1 E2 MR MW WB
5 LDR r1,[r2] F D O E1 E2 MR MW WB

37. Consider the same structure as question 35 but with the following code fragment. Assume that internal
forwarding is possible and an operand can be used as soon as it is generated. Show the execution of this code.

LDR r0,[r2]
ADD r3,r0,r1
MUL r3,r3,r4
ADD r6,r5,r7
STR r3,[r2]
ADD r6,r5,r7

SOLUTION
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 LDR r0,[r2] F D O E1 E2 MR MW WB
2 ADD r3,r0,r1 F D O E1 E2 MR MW WB
3 MUL r3,r3,r4 F D O E1 E2 MR MW WB
4 ADD r6,r3,r7 F D O E1 E2 MR MW WB
5 LDR r1,[r2] F D O E1 E2 MR MW WB

38. The following table gives a sequence of instructions that are performed on a 4‐stage pipelined computer. Detect
all hazards. For example if instruction m uses operand r2 generated by instruction m‐1, then write m‐1,r2 in the
RAW column in line m.

Number  Instruction     RAW     WAR     WAW
1       Add r1,r2,r3
2       Add r4,r1,r3
3       Add r5,r1,r2
4       Add r1,r2,r3
5       Add r5,r2,r3
6       Add r1,r6,r6
7       Add r8,r1,r5

SOLUTION

Number  Instruction     RAW     WAR     WAW
1       Add r1,r2,r3
2       Add r4,r1,r3    1,r1
3       Add r5,r1,r2    1,r1
4       Add r1,r2,r3            3,r1    1,r1
5       Add r5,r2,r3                    3,r5
6       Add r1,r6,r6                    4,r1
                                        1,r1
7       Add r8,r1,r5    6,r1
                        5,r5

Note that some of the hazards are technical hazards and not real hazards. For example, instruction 3 does not
suffer a RAW hazard on r1 because any delay will have been swallowed by the previous instruction.

39. Consider the following code:

LDR r1,[r6]      ;Load r1 from memory. r6 is a pointer
ADD r1,r1,#1     ;Increment r1 by 1
LDR r2,[r6,#4]   ;Load r2 from memory
ADD r2,r2,#1     ;Increment r2 by 1
ADD r3,r1,r2     ;Add r1 and r2 with total in r3
ADD r8,r8,#4     ;Increment r8 by 4
STR r2,[r6,#8]   ;Store r2 in memory
SUB r2,r2,#64    ;Subtract 64 from r2

The processor has a five‐stage pipeline F O E M S; that is, instruction fetch, operand fetch, operand execute,
memory, operand writeback to register file.

a. How many cycles does this code take to execute assuming internal forwarding is not used?
b. How many cycles does this code take to execute assuming internal forwarding is used?
c. How many cycles does the code take to execute assuming that it is reordered (no internal forwarding)?
d. How many cycles does the code take to execute assuming reordering and internal forwarding?

SOLUTION

a. No forwarding
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 ADD r1,r1,#1 F O E M S
3 LDR r2,[r6,#4] F O E M S
4 ADD r2,r2,#1 F O E M S
5 ADD r3,r1,r2 F O E M S
6 ADD r8,r8,#4 F O E M S
7 STR r2,[r6,#8] F O E M S
8 SUB r4,r4,#64 F O E M S

b. Forwarding
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 ADD r1,r1,#1 F O E M S
3 LDR r2,[r6,#4] F O E M S
4 ADD r2,r2,#1 F O E M S
5 ADD r3,r1,r2 F O E M S
6 ADD r8,r8,#4 F O E M S
7 STR r2,[r6,#8] F O E M S
8 SUB r4,r4,#64 F O E M S

c. Reordering
Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 LDR r2,[r6,#4] F O E M S
3 ADD r8,r8,#4 F O E M S
4 ADD r1,r1,#1 F O E M S
5 ADD r2,r2,#1 F O E M S
6 SUB r4,r4,#64 F O E M S
7 ADD r3,r1,r2 F O E M S
8 STR r2,[r6,#8] F O E M S

d. Reordering and forwarding


Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
1 LDR r1,[r6] F O E M S
2 LDR r2,[r6,#4] F O E M S
3 ADD r8,r8,#4 F O E M S
4 ADD r1,r1,#1 F O E M S
5 ADD r2,r2,#1 F O E M S
6 SUB r4,r4,#64 F O E M S
7 ADD r3,r1,r2 F O E M S
8 STR r2,[r6,#8] F O E M S

40. Why do conditional branches have a greater effect on a pipelined processor than unconditional branches?

SOLUTION

The outcome of an unconditional branch is known the moment it is first detected. Consequently, instructions at
the target address can be fetched immediately. The outcome of a conditional branch is not known until the
condition has been tested, which may be at a later stage in the pipeline.

41. Describe the various types of change of flow‐of‐control operations that modify the normal sequence in which a
processor executes instructions. How frequently do these operations occur in typical programs?

SOLUTION

Operations that affect the flow of control are:


Branch/jump (programmer initiated)
Subroutine call
Subroutine return
Trap (operating system call)
Software exception
Hardware exception (interrupt).

All these events cause a change in the flow of control (non‐sequential instruction execution). Interrupts and
exceptions are relatively rare (expressed as a percentage of total instructions executed). The frequency of
branches and jumps may be expressed statically or dynamically. The static frequency is the fraction of
instructions in the program code that are branches. The dynamic frequency is more meaningful and is the fraction of
instructions executed that are branches when the code is run. Branch instructions make up about 20% of a typical
program. Subroutine calls and returns are less frequent (of the order of 2%).

42. Consider the following code:

      MOV  r0,#Vector   ;point to Vector
      MOV  r2,#10       ;loop count
Loop  LDR  r1,[r0]      ;Repeat: get element
      SUBS r2,r2,#1     ;decrement loop count and set Z flag
      MUL  r1,r1,#5
      STR  r1,[r0]      ;save result
      ADD  r0,r0,#4     ;point to next
      BNE  Loop         ;until all done (branch on Z flag).

Suppose this ARM‐like code is executed on a 4‐stage pipeline with internal forwarding. The load instruction has
one cycle penalty and the multiply instruction introduces two stall cycles into the execute phase. Assume the
taken branch has no penalty.

a. How many instructions are executed by this code?
b. Draw a timing diagram for the first iteration showing stalls. Assume internal forwarding.
c. How many cycles does it take to execute this code?

SOLUTION

a. There are two pre‐loop instructions and a 6‐instruction loop repeated 10 times. Total = 2 + 10 × 6 = 62.

b. The following shows the code of one pass round the loop

Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1. LDR r1,[r0] F O E S
2. SUBS r2,r2,#1 F O E S
3. MUL r1,r1,#5 F O E S
4. STR r1,[r0] F O E S
5. ADD r0,r0,#4 F O E S
6. BNE Loop F O E S
1. LDR r1,[r0] F O E S (repeat)

c. It takes 11 cycles to make one pass round the loop. However, it takes 14 cycles to execute all the instructions
in a loop fully. The total number of cycles is 2 (preloop) + 10 × 11 + 3 (post loop) = 115.

43. Branch instructions may be taken or not taken. What is the relative frequency of taken to not taken, and why is
this so?

SOLUTION

At first sight it might appear that the probability of branches being taken or not taken is 50:50 because there are
two alternatives. However, this logic is entirely misleading because of the way in which branches are used. A
paper (albeit old) by Y. Wu and J.R. Larus (Static branch frequency and program profile analysis, MICRO‐27 Nov
1994) suggests that loop branches have a probability of 88% of being taken.
44. What is branchless computing?

SOLUTION

If branches are considered harmful because a misprediction can lead to bubbles in the pipeline, it is a good idea to
reduce the frequency of branches. Doing this is called branchless computing. In particular, it refers to predicated
computing, where an instruction is conditionally executed; for example, the ARM’s ADDEQ r0,r0,#1
increments the value of register r0 if the result of the last operation that set the condition code was zero. The
IA32 MMX instruction set extension also permits branchless computing by turning a condition into a value; that is,
if a test yields true, the value 1111…1 is generated and if the condition is false the value 0000…0 is generated.
These two constants can then be used as masks in Boolean operations.
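The mask technique can be illustrated in a few lines. This is a minimal sketch that assumes unsigned 32‐bit values; the helper names are hypothetical, and Python's own comparison stands in for a hardware compare instruction that would deliver the all‐ones or all‐zeros result directly.

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit unsigned operands


def compare_gt_mask(a, b):
    """Return 0xFFFFFFFF if a > b, otherwise 0x00000000."""
    # Negating 1 gives ...1111 in two's complement; trim it to 32 bits.
    return -int(a > b) & MASK32


def select(mask, x, y):
    """Return x where the mask is all ones and y where it is all zeros."""
    return (x & mask) | (y & ~mask & MASK32)


def branchless_max(a, b):
    """Maximum of two values computed with a mask instead of a branch."""
    return select(compare_gt_mask(a, b), a, b)


if __name__ == "__main__":
    print(hex(compare_gt_mask(9, 2)), branchless_max(9, 2), branchless_max(3, 7))
```

The selection itself uses only AND, OR, and NOT, so on hardware that produces the mask without branching there is no branch to mispredict.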

45. What is a delayed branch and how does it contribute to minimizing the effect of pipeline bubbles? Why are
delayed branch mechanisms less popular than they were?

SOLUTION

The term delayed in delayed branch is not a very good description. In a pipelined computer a taken branch means
that the pipeline must be (partially) flushed. If the instruction sequence is P,B,Q where P, B, and Q are three
instructions and B is a branch, the instruction Q is executed if the branch is not taken and not executed if the
branch is taken. A delayed branch mechanism always executes the instruction after the branch. Thus, the
sequence P,Q,B (where P and Q are executed before the branch) can be rewritten as P,B,Q, where Q occupies the
slot after the branch and is still executed whether or not the branch is taken, so the slot does useful work instead
of holding a bubble. Of course, if a suitable instruction Q cannot be found, the so‐called branch delay slot must be
filled with a NOP (no operation).

46. How does branch prediction reduce the branch penalty?

SOLUTION

In a pipelined processor, an instruction flows through the pipeline and is executed in stages. If an instruction is a
branch and the branch is taken, all instructions behind it in the pipeline have to be flushed. The earlier a branch is
detected and the outcome resolved the better. Branch prediction makes a guess about the direction (outcome) of
the branch; taken or not taken. If the branch is predicted not taken, nothing happens and execution continues. If
the branch is predicted as taken, instructions can be obtained from the branch target address and loaded into the
instruction stream immediately. If the prediction is incorrect, the pipeline has to be flushed in the normal way.

47. A pipelined computer has a four‐stage pipeline: fetch/decode, operand fetch, execute, writeback. No operations
other than loads and taken branches introduce stalls. A load introduces one stall cycle. A non‐taken branch introduces
no stalls and a taken branch introduces two stall cycles. Consider the following loop.

for (j=1023; j > 0; j--) {x[j]=x[j]+2;}

a. Express this code in an ARM‐like assembly language (assume that you cannot use autoindexed addressing and
that the only addressing mode is register indirect of the form [r0]).
b. Show a single trip round the loop and indicate how many clock cycles are required.
c. How many cycles will it take to execute this code in total?
d. How can you modify the code to reduce the number of cycles?

SOLUTION

a. The code
mov r2,#1023
Loop ldr r0,[r1]
add r0,r0,#2
str r0,[r1]
add r1,r1,#4
subs r2,r2,#1
BNE Loop

b. A trip round the loop has 6 instructions. The load has a one‐cycle stall and the taken branch back to Loop has two
stall cycles. The total is 6 + 1 + 2 = 9 cycles.

c. The loop body executes 1,023 times, because r2 counts down from 1,023 to 0. The total number of cycles is
1 + 1,023 × 9 ‐ 2 = 9,206 cycles (the 1 is the initial mov and the minus 2 is there because the branch is not taken
on the last pass).

d. You can speed up the code by unrolling the loop and performing multiple iterations per trip and avoiding the
two cycle branch delay. You could save a cycle of latency by inserting the increment r1 by 4 after the load to
hide the load stall.

48. Suppose that you design an architecture with the following characteristics

Cost of a non‐branch instruction              1 cycle
Fraction of instructions that are branches    20%
Fraction of branches that are taken           85%
Fraction of delay slots that can be filled    50%
Cost of an unfilled delay slot                1 cycle

For this architecture

a. calculate the average number of cycles per instruction
b. calculate the improvement (as a percentage) if the fraction of delay slots that are filled can be increased to
95%.

SOLUTION

a. Average cycles = non‐branch cycles + non‐taken branches + taken branches slot filled + taken branches slot
unfilled.
= 80% × 1 + 20%(15% × 1 + 85% × (50% × 1 + 50% × 2))
= 0.80 + 0.20 × (0.15 + 0.85 × (0.50 + 1.00)) = 0.80 + 0.20 × (0.15 + 1.275) = 1.085

b. The only thing different is the fraction of unfilled slots. We can write
Average cycles = 80% × 1 + 20%(15% × 1 + 85% × (95% × 1 + 5% × 2))
= 0.80 + 0.20(0.15 + 0.8925) = 1.0085.
The improvement is (1.085 ‐ 1.0085)/1.085 = 0.0765/1.085, or about 7%.
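The same calculation can be written as a small function so that the effect of the delay‐slot fill rate is easy to explore. This is a sketch only; the function and parameter names are assumptions introduced for illustration.

```python
def average_cycles(p_branch, p_taken, p_filled, unfilled_cost=1):
    """Average cycles per instruction for the delayed-branch model of question 48."""
    # A taken branch costs 1 cycle if its delay slot is filled usefully,
    # and 1 + unfilled_cost cycles otherwise.
    taken_cost = p_filled * 1 + (1 - p_filled) * (1 + unfilled_cost)
    branch_cost = (1 - p_taken) * 1 + p_taken * taken_cost
    return (1 - p_branch) * 1 + p_branch * branch_cost


base = average_cycles(p_branch=0.20, p_taken=0.85, p_filled=0.50)
better = average_cycles(p_branch=0.20, p_taken=0.85, p_filled=0.95)
improvement_percent = (base - better) / base * 100

if __name__ == "__main__":
    print(round(base, 4), round(better, 4), round(improvement_percent, 2))
```

Because branches are only 20% of the instruction mix, even a large change in the fill rate moves the average by a few percent.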

49. A pipelined processor has the following characteristics:


• Loads 18%
• Load stall (load penalty) 1 cycle
• Branches 22%
• Probability a branch is taken 80%
• Branch penalty on taken 3 cycles
• RAW dependencies 20% of all instructions except branches
• RAW penalty 1 cycle

Estimate the average cycles per instruction for this processor.

SOLUTION

We have to add the load, data, and branch stalls to the base cost of one cycle per instruction.

Load stalls: 18% × 1 = 0.18.
Data stalls: 78% × 20% × 1 = 0.156.
Branch stalls: 22% × 80% × 3 = 0.66 × 0.8 = 0.528.
Total = 1 + 0.18 + 0.156 + 0.528 = 1.864 cycles per instruction.

50. What is the difference between static and dynamic branch prediction?

SOLUTION

Static prediction takes place before any code is executed; that is, it does not use feedback from the actual running
of the code to make a prediction. Dynamic prediction uses information from the past behavior of the program to
predict the future behavior. Dynamic prediction is generally more accurate than static prediction.

Static prediction relies on factors such as the static behavior of individual branches (e.g., this branch type is
usually taken, this one is not). Such an approach is relatively crude. The compiler can analyze code and make a
guess about the outcome of branches and then set a hint bit in the code. The processor uses this hint bit to decide
whether the branch will be taken. Note that not all computers have a branch hint bit.

Dynamic branch prediction observes the history of branches (either individually or collectively) and the position of
branches in the program to predict whether a branch will be taken. Dynamic prediction can be very accurate
in many circumstances.

51. A processor has a branch‐target buffer. If a branch is in the buffer and it is correctly predicted, there is no branch
penalty. The prediction rate is 85% correct. If it is incorrectly predicted, the penalty is 4 cycles. If the branch is not
in the buffer, and not taken, the penalty is 2 cycles. Seventy percent of branches are taken. If the branch is not in
the buffer and is taken the penalty is 3 cycles. The probability that a branch is in the buffer is 90%. What is the
average branch penalty?

SOLUTION

Branch penalty = mispredict penalty (in buffer) + taken penalty (not in buffer) + not taken penalty (not in buffer) =
90% × 15% × 4 + 10% × 70% × 3 + 10% × 30% × 2
= 0.54 + 0.21 + 0.06 = 0.81 cycles per branch.
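The calculation is an expected value over mutually exclusive outcomes, which the following sketch makes explicit. The helper name `expected_penalty` and the variable names are assumptions for illustration; the probabilities and penalties are those given in the question.

```python
def expected_penalty(outcomes):
    """Expected penalty for a list of (probability, penalty_cycles) pairs."""
    outcomes = list(outcomes)
    # The outcomes must cover every case exactly once.
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * cycles for p, cycles in outcomes)


p_in_buffer, p_correct, p_taken = 0.90, 0.85, 0.70

outcomes = [
    (p_in_buffer * p_correct, 0),              # in buffer, predicted correctly
    (p_in_buffer * (1 - p_correct), 4),        # in buffer, mispredicted
    ((1 - p_in_buffer) * p_taken, 3),          # not in buffer, taken
    ((1 - p_in_buffer) * (1 - p_taken), 2),    # not in buffer, not taken
]
penalty = expected_penalty(outcomes)

if __name__ == "__main__":
    print(round(penalty, 4))
```

Listing the zero‐penalty case as well lets the function confirm that the probabilities sum to one, which guards against a forgotten outcome.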

52. How can the compiler improve the efficiency of some processors with branch prediction mechanisms?

SOLUTION

Some processors allow the compiler to set/clear bits in the op‐code that tell the processor whether to treat this
branch as taken or not taken; for example, if you have a loop in a high level language, the terminating conditional
branch will be taken back to the start of the loop n‐1 times for n iterations. The compiler would set the take
branch bit in the opcode and the processor would automatically assume ‘branch taken’.

53. Consider the following two streams of branch outcomes (T = taken and N = not taken). In each case what is the
simplest form of branch prediction mechanism that would be effective in reducing the branch penalty?

a. T, T, T, T, T, N, T, T, T, T, T, T, T, N, T, T, T, T, T, N, T, T, T, T, T, T, T, N, T, T, T, T, T
b. T, T, T, T, T, N, N, N, N, N, N, N, N, N, T, T, T, T, T, T, T, T, T, T, T, N, N, N, N, N, N, N, N

SOLUTION

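a. Static prediction (always predict taken). The not‐taken outcomes are isolated, so a fixed taken prediction is wrong
only on those branches.

b. A 1‐bit predictor that changes direction on the first error. The outcomes come in long runs, so the predictor is
wrong only once at each change of direction.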

54. A processor uses a 2‐bit saturation‐counter dynamic branch predictor with the states strongly taken, weakly
taken, weakly not taken, and strongly not taken. The symbol T indicates a branch that is taken and an N indicates
a branch that is not taken. Suppose that the following predicted sequence of branches is recorded: T T T N T X

What is the value of X?

SOLUTION

In order to make the N prediction, the previous two states would have to be not taken states. If the next
prediction is T then the previous branch must have been T to move from the weakly not taken predicted state to
the weakly predicted taken state. Therefore the next prediction X will be T.

55. The following sequence of branch outcomes is applied to a saturating counter branch predictor
TTTNTTNNNTNNNTTTTTNTTTNNTTTTNT. If the branch penalty is two cycles for a miss‐predicted branch, how
many additional cycles does the system incur for the above sequence of 30 branches? Assume that the predictor
is initially in the strongly predicted taken state.

SOLUTION
Branch sequence
T  T  T  N  T  T  N  N  N  T  N  N  N  T  T  T  T  T  N  T  T  T  N  N  T  T  T  T  N  T
Next predictor state (ST, WT, WN, SN)
ST ST ST WT ST ST WT WN SN WN SN SN SN WN WT ST ST ST WT ST ST ST WT WN WT ST ST ST WT ST
Prediction for the next branch
T  T  T  T  T  T  T  N  N  N  N  N  N  N  T  T  T  T  T  T  T  T  T  N  T  T  T  T  T  T
Wrong decision (W = the prediction made for this branch was wrong, . = correct)
.  .  .  W  .  .  W  W  .  W  .  .  .  W  W  .  .  .  W  .  .  .  W  W  W  .  .  .  W  .

The number of wrong decisions is 11 costing 11 × 2 = 22 cycles. This is no better than guessing taken.
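The count of 11 wrong decisions can be reproduced by simulating the predictor. The sketch below is an illustrative model rather than the textbook's own code: it treats the predictor as an n‐bit counter whose upper half of states predicts taken, and the function name `simulate` and its parameters are assumptions.

```python
def simulate(outcomes, bits=2, start=None):
    """Run an n-bit saturating-counter predictor over a string of 'T'/'N' outcomes.

    Returns (predictions, wrong) where predictions is a string holding the
    prediction made before each branch and wrong is the misprediction count.
    The counter starts saturated in the taken direction unless start is given.
    """
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)          # states >= threshold predict taken
    state = top if start is None else start
    predictions = []
    wrong = 0
    for actual in outcomes:
        guess = "T" if state >= threshold else "N"
        predictions.append(guess)
        if guess != actual:
            wrong += 1
        # Count up on a taken branch and down on a not-taken branch, saturating.
        state = min(top, state + 1) if actual == "T" else max(0, state - 1)
    return "".join(predictions), wrong


SEQUENCE = "TTTNTTNNNTNNNTTTTTNTTTNNTTTTNT"

if __name__ == "__main__":
    predictions, wrong = simulate(SEQUENCE, bits=2)
    print(predictions)
    print("mispredictions:", wrong, "penalty cycles:", wrong * 2)
    print("always-taken mispredictions:", SEQUENCE.count("N"))
```

Changing `bits` lets the same function model wider counters, such as the 3‐bit and 4‐bit predictors discussed in later questions.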

56. The state diagram below represents one of the many possible 2‐bit state machines that can be used to perform
prediction. Explain, in plain English, what it does.

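[Figure: state diagram of a 2‐bit branch predictor with four states, S0 (not taken), S1 (not taken), S2 (taken), and
S3 (taken), joined by transitions labeled T (branch taken) and NT (branch not taken).]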

SOLUTION

We can regard S0 as a strongly not taken state and all not taken branches lead towards this state. States S0, S1,
S2, S3 behave exactly like the corresponding states in a saturating counter with respect to not taken branches.
The differences between this and a saturating counter are:

1. If you are in state S1 (not taken) and the next branch is taken, you go straight to state S3, the strongly taken
state.
2. If you are in state S3, a taken branch takes you to state S2 (rather than back to state S3). State S3 is not a
saturating state. If there is a sequence of taken branches, the system oscillates between S2 and S3. From
state S3 the next state is always state S2 (since a taken and a not taken have the same destination).
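The behavior described above can be captured as a transition table. In this sketch the S1 and S3 taken transitions and all not‐taken transitions follow the solution text; the taken transitions from S0 and S2 are assumed to match an ordinary saturating counter, because the text lists no difference for them. The names `NEXT`, `PREDICT_TAKEN`, and `run` are hypothetical.

```python
# state -> (next state on a not-taken branch, next state on a taken branch)
NEXT = {
    0: (0, 1),   # S0: taken transition assumed to behave like a saturating counter
    1: (0, 3),   # S1: a taken branch jumps straight to S3
    2: (1, 3),   # S2: taken transition assumed to behave like a saturating counter
    3: (2, 2),   # S3: both outcomes lead to S2, so S3 does not saturate
}
PREDICT_TAKEN = {0: False, 1: False, 2: True, 3: True}


def run(outcomes, state=0):
    """Return (trace, final_state); trace holds (state, prediction, actual) tuples."""
    trace = []
    for actual in outcomes:
        prediction = "T" if PREDICT_TAKEN[state] else "N"
        trace.append((state, prediction, actual))
        state = NEXT[state][1 if actual == "T" else 0]
    return trace, state


if __name__ == "__main__":
    for step in run("TTTTTNNN")[0]:
        print(step)
```

Running a long string of taken branches shows the oscillation between S2 and S3 that the solution describes.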

57. What is a branch target buffer and how does it contribute to a reduction of the branch penalty?

SOLUTION

The fundamental problem with a branch is that if it is taken, instructions already in the pipeline have to be
flushed. Consequently, you want to detect a branch instruction as soon as possible. Then you can begin execution
at the target address.

Branch target prediction operates by detecting the branch, guessing its outcome and fetching instructions from
the next or target address as soon as possible.

The branch target buffer, BTB, is a form of memory cache that caches the addresses of branch instructions. The
program counter searches the BTB. If the current instruction address corresponds to a branch, the cache can be
accessed and the predicted outcome of the branch read (This is true only of BTBs that have a prediction bit. In
general, it is assumed that every cached branch will be taken). The BTB contains the address of the target of the
branch. This means that instructions can be loaded from that address immediately (without having to read the
branch instruction and compute the target address). If you also cache the instruction at the target address you
can get the instruction almost immediately. The BTB lets you resolve the branch much earlier in the pipeline and
therefore reduce the branch penalty.

58. Consider a 4‐bit saturating counter used as a branch predictor with 16 states from 1111 to 0000. Describe in words
the circumstances where such a counter might be effective.

SOLUTION

If the branch predictor works in the same way as a 2‐bit saturating counter, it has 16 states, 8 of which predict
taken and 8 of which predict not taken. If you are in a run of taken or not taken branches (at least 15) then you are in
the strongest taken (or not taken) state. It will take a run of eight wrongly predicted branches in sequence to
reverse the decision. Therefore, you might use such a system in circumstances where very long runs of a branch
are in one direction, and you do not wish to reverse the direction unless there is a change of direction spanning 8
branches.

59. Draw the state diagram of a branch predictor using a three‐bit saturating counter. Under what circumstances do
you think such a predictor might prove effective?

SOLUTION

The predictor will not change direction when fully saturated until four consecutive wrong decisions have been
made.

60. Given the branch sequence TTTTNTTNNTTTTTNNNNNNNNNTNTTTTTTTTTTTT and assuming that the 3‐bit
saturating predictor starts in its saturated T state, what will the predicted sequence be?

SOLUTION

Input
T  T  T  T  N  T  T  N  N  T  T  T  T  T  N  N  N  N  N  N  N  N  N  T  N  T  T  T  T  T  T  T  T  T  T  T  T
State (after the branch)
S7 S7 S7 S7 S6 S7 S7 S6 S5 S6 S7 S7 S7 S7 S6 S5 S4 S3 S2 S1 S0 S0 S0 S1 S0 S1 S2 S3 S4 S5 S6 S7 S7 S7 S7 S7 S7
Predict (made before the branch)
T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  N  N  N  N  N  N  N  N  N  N  N  T  T  T  T  T  T  T  T
Outcome (C = correct, W = wrong)
C  C  C  C  W  C  C  W  W  C  C  C  C  C  W  W  W  W  C  C  C  C  C  W  C  W  W  W  W  C  C  C  C  C  C  C  C

61. The following code is executed by an ARM processor:

MOV r0,#4
B1 MOV r2,#5
SUB r2,r2,r0
B2 SUBS r2,r2,#1
BNE B2 ;Branch 1
SUBS r0,r0,#1
BNE B1 ;Branch 2

Assume that a 1‐bit branch predictor is used for both branch 1 and branch 2 and that both predictors are initially
set to N. Complete the following table by running through this code.

Branch 1                                        Branch 2
Cycle   Branch prediction   Branch outcome      Cycle   Branch prediction   Branch outcome
1       N                   N                   1       N                   T
2                                               2
3                                               3
4                                               4
5
6
7
8
9
10

Repeat the same exercise with the same initial conditions but assume a 2‐bit saturating counter branch predictor.

SOLUTION

Branch 1                                        Branch 2
Cycle   Branch prediction   Branch outcome      Cycle   Branch prediction   Branch outcome
1       N                   N                   1       N                   T
2       N                   T                   2       T                   T
3       T                   N                   3       T                   T
4       N                   T                   4       T                   N
5       T                   T
6       T                   N
7       N                   T
8       T                   T
9       T                   T
10      T                   N
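The table can be regenerated by executing the loop nest and feeding each branch's outcomes to a 1‐bit predictor. This is a minimal sketch in which Python variables stand in for the ARM registers; the function names are assumptions.

```python
def branch_outcomes():
    """Execute the two nested loops and record each branch outcome ('T' or 'N')."""
    branch1, branch2 = [], []
    r0 = 4                                   # MOV r0,#4
    while True:
        r2 = 5 - r0                          # MOV r2,#5 ; SUB r2,r2,r0
        while True:
            r2 -= 1                          # SUBS r2,r2,#1
            taken = r2 != 0                  # BNE B2 (branch 1)
            branch1.append("T" if taken else "N")
            if not taken:
                break
        r0 -= 1                              # SUBS r0,r0,#1
        taken = r0 != 0                      # BNE B1 (branch 2)
        branch2.append("T" if taken else "N")
        if not taken:
            break
    return "".join(branch1), "".join(branch2)


def one_bit_predictions(outcomes, initial="N"):
    """A 1-bit predictor predicts whatever the branch did the last time."""
    predictions, last = [], initial
    for actual in outcomes:
        predictions.append(last)
        last = actual
    return "".join(predictions)


if __name__ == "__main__":
    for name, outcomes in zip(("Branch 1", "Branch 2"), branch_outcomes()):
        print(name, "predicted:", one_bit_predictions(outcomes), "actual:", outcomes)
```

Replacing `one_bit_predictions` with a 2‐bit saturating counter is one way to approach the second part of the question.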
62. A processor executes all non‐branch instructions in one cycle. This processor implements branch prediction,
which incurs an additional penalty of 2 cycles if the prediction is correct and 4 cycles if the prediction is incorrect.

a. If conditional branch instructions occupy 15% of the instruction stream, and the probability of an incorrect
branch prediction is 20%, what is the average number of cycles per instruction?
b. If the same processor is to run no more than 28% slower than a machine with a zero branch penalty when up
to 20% of the instructions are conditional branches, what level of accuracy must the branch prediction
achieve on average?

SOLUTION

a. CPI = non‐branch cycles + branch cycles (correct prediction) + branch cycles (incorrect prediction)
= 0.85 × 1 + 0.15(0.80 × 2 + 0.20 × 4) = 0.85 + 0.15(2.4) = 0.85 + 0.36 = 1.21 CPI

b. A machine with a zero branch penalty runs at 1.0 CPI, so a machine that is 28% slower runs at 1.28 CPI. If Pc is
the probability of a correct prediction and 20% of the instructions are branches,

CPI = 0.80 + 0.20(Pc × 2 + (1 ‐ Pc) × 4) = 0.80 + 0.20(4 ‐ 2Pc) = 0.80 + 0.80 ‐ 0.4Pc = 1.60 ‐ 0.4Pc.

Setting 1.60 ‐ 0.4Pc = 1.28 gives Pc = (1.60 ‐ 1.28)/0.40 = 0.80; that is, the branch prediction must be at least
80% accurate.

63. A computer has a branch target buffer, BTB. Derive an expression for the average branch penalty if:

• a branch not in the BTB that is not taken incurs a penalty of 0 cycles
• a branch not in the BTB that is taken incurs a penalty of 6 cycles
• a branch in the BTB that is not taken incurs a penalty of 4 cycles
• a branch in the BTB that is taken incurs a penalty of 0 cycles
• the probability that a branch instruction is cached in the BTB is 80%
• the probability that an instruction not in the BTB is taken is 20%
• the probability that an instruction in the BTB is taken is 90%

SOLUTION

We add up the penalty contributed by each outcome, weighting each by its probability:

Outcome                  P(BTB) × P(outcome) × penalty     Cycles
Not in BTB, not taken    20% × 80% × 0                   = 0.00
Not in BTB, taken        20% × 20% × 6                   = 0.24
In BTB, not taken        80% × 10% × 4                   = 0.32
In BTB, taken            80% × 90% × 0                   = 0.00
Total                                                    = 0.56 cycles
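
The same expected-value calculation can be written as a small Python sketch. The dictionary layout and the name `average_branch_penalty` are illustrative assumptions; the probabilities and penalties are those listed in the problem.

```python
P_IN_BTB = 0.80
# Probability that a branch is taken, keyed by whether it is in the BTB.
P_TAKEN = {True: 0.90, False: 0.20}
# Penalty in cycles, keyed by (in_btb, taken).
PENALTY = {(True, True): 0, (True, False): 4,
           (False, True): 6, (False, False): 0}


def average_branch_penalty():
    """Sum probability x penalty over the four (in BTB, taken) outcomes."""
    total = 0.0
    for in_btb in (True, False):
        p_btb = P_IN_BTB if in_btb else 1 - P_IN_BTB
        for taken in (True, False):
            p_outcome = P_TAKEN[in_btb] if taken else 1 - P_TAKEN[in_btb]
            total += p_btb * p_outcome * PENALTY[(in_btb, taken)]
    return total


print(f"Average branch penalty = {average_branch_penalty():.2f} cycles")
```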

64. A RISC processor implements a subroutine call using a link register (i.e., the return address is saved in the link
register). The cost of a call is 2 cycles and the return costs 1 cycle. If a subroutine is called from another subroutine
(i.e., the subroutine is nested), the contents of the link register must be saved and later restored. The cost of
saving the link register is 6 cycles and the cost of restoring the link register is 8 cycles. Assume that a certain
instruction mix contains 20% subroutine calls and returns (i.e., 10% calls, 10% returns). The probability of a single
subroutine call and return without nesting is 60%. The probability that a subroutine call will be followed by a
single nested call is 40%. Assume that the probability of further nesting is vanishingly small. What is the overall
cost of subroutine calls? The average cost of all other instructions is 1.5 cycles. What is the average number of
cycles per instruction?

SOLUTION

There are five possibilities: an instruction is not a subroutine call or return, it is a single call, it is a nested call, it is
a single return, or it is a nested return. Note that when a subroutine is nested, it incurs the unnested call and return
costs plus the extra save/restore time. The probabilities and costs are:

Instruction type                  Probability × cost              Cycles
Not subroutine                    80% × 1.5 cycles                1.20
Subroutine call (not nested)      10% × 60% × 2 cycles            0.12
Subroutine call (nested)          10% × 40% × (2 + 6 cycles)      0.32
Subroutine return (not nested)    10% × 60% × 1 cycle             0.06
Subroutine return (nested)        10% × 40% × (1 + 8 cycles)      0.36
Average cycles                                                    2.06 cycles
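
The table can be reproduced with the Python sketch below, which also separates out the part of the average that is due to calls and returns alone. The list layout and the names `MIX`, `average_cycles` and `subroutine_cycles` are assumptions made for illustration; the probabilities and cycle counts come from the problem.

```python
# (description, probability, cycles) for each class of instruction.
MIX = [
    ("other instruction",        0.80,        1.5),
    ("call, not nested",         0.10 * 0.60, 2),
    ("call, nested (+ save)",    0.10 * 0.40, 2 + 6),
    ("return, not nested",       0.10 * 0.60, 1),
    ("return, nested (+ restore)", 0.10 * 0.40, 1 + 8),
]


def average_cycles(mix):
    """Probability-weighted average cycles per instruction."""
    return sum(prob * cycles for _, prob, cycles in mix)


def subroutine_cycles(mix):
    """Contribution of calls and returns only (everything except 'other')."""
    return sum(prob * cycles for name, prob, cycles in mix
               if name != "other instruction")


print(f"Calls and returns contribute {subroutine_cycles(MIX):.2f} cycles per instruction")
print(f"Average = {average_cycles(MIX):.2f} cycles per instruction")
```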

65. Why is the literal in the op‐code sign‐extended before use (in most computer architectures)?

SOLUTION

Literals in instructions are invariably shorter than the register size of the computer; for example, a 32‐bit
processor might have a 16‐bit literal and 32‐bit registers. When the literal is loaded into the low‐order bits of a
register, the upper‐order bits must either be cleared, left unchanged, or used to extend the loaded value to the
full length of the register (i.e., sign extension). Since many computer instructions operate with signed values or
with address offsets, it makes sense to sign‐extend an operand when it is loaded. Some processors, like the 68K,
have separate address (pointer) and general‐purpose data registers. Word values loaded into address registers are
always sign‐extended, whereas those loaded into data registers are not.
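
Sign extension itself is a simple bit manipulation. The following Python sketch shows one common way to do it; the function name `sign_extend` and the default widths (a 16‐bit literal in a 32‐bit register, as in the example above) are assumptions chosen for illustration.

```python
def sign_extend(value, bits=16, width=32):
    """Extend a `bits`-wide two's complement literal to a `width`-bit pattern."""
    value &= (1 << bits) - 1            # keep only the literal field
    sign_bit = 1 << (bits - 1)
    signed = (value ^ sign_bit) - sign_bit   # interpret the field as signed
    return signed & ((1 << width) - 1)       # re-encode in the wider register


print(hex(sign_extend(0x0004)))  # positive literal: upper bits become 0
print(hex(sign_extend(0xFFFE)))  # negative literal (-2): upper bits become 1
```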

66. Why is the address offset shifted two places left in branch/jump operations in 32‐bit RISC‐like processors?

SOLUTION

Typical RISC processors have 32‐bit (four‐byte) instructions, yet the memory is byte addressed. That is, words have the
hexadecimal addresses 0, 4, 8, C, 10, 14, … However, the address bus can access addresses at any location; for example,
you can access address 0xABC3 (which is not word‐aligned). Because the two lowest bits of an address are always
zero for an aligned address, there is no point in storing them when an address is stored in an instruction as an
offset; for example, if the address offset is xxxxxxxx00, it is stored as xxxxxxxx. Consequently, when loaded it must
be shifted left by two places to regenerate xxxxxxxx00. Doing this extends the effective size of a literal by two bits.
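
The shift can be illustrated with a short Python sketch that turns a stored word offset into a byte address. The name `branch_target`, the 16‐bit field width, and the choice to add the offset to the program counter value passed in are all assumptions; real architectures differ in the field width and in which PC value (for example, the address of the following instruction) the offset is relative to.

```python
def branch_target(pc, offset_field, field_bits=16):
    """Compute a 32-bit branch target from a word offset stored in an instruction."""
    # Sign-extend the stored field so backward branches work.
    sign_bit = 1 << (field_bits - 1)
    offset_words = ((offset_field & ((1 << field_bits) - 1)) ^ sign_bit) - sign_bit
    # Shift left by two to restore the two zero bits dropped by the encoding.
    offset_bytes = offset_words << 2
    return (pc + offset_bytes) & 0xFFFFFFFF


print(hex(branch_target(0x1000, 0x0003)))  # three instructions forward
print(hex(branch_target(0x1000, 0xFFFF)))  # one instruction back
```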

67. Assume a 5‐stage pipeline (instruction fetch, operand fetch, execute, memory, write‐back). For the following code
show any stalls and indicate where operand forwarding would be needed.

ADD R9,R9,R8
MUL R1,R2,R3
LDR R5,(4,R1)
SUB R5,R5,R1
ADD R7,R8,R9
MUL R7,R1,R5

SOLUTION

With no internal forwarding (operand fetch in bold where operand fetching is needed)

Timing chart spanning cycles 1 to 17; the activity in each stage is:

Instruction      F      OF       E         M       WB
ADD R9,R9,R8     ADD    R9,R8    R9+R8     None    R9
MUL R1,R2,R3     MUL    R2,R3    R2.R3     None    R1
LDR R5,(4,R1)    LDR    4,R1     [4,R1]    read    R5
SUB R5,R5,R1     SUB    R5,R1    R5‐R1     None    R5
ADD R7,R8,R9     ADD    R8,R9    R8+R9     None    R7
MUL R7,R1,R5     MUL    R1,R5    R1.R5     None    R7

With internal forwarding (operand used in next cycle after its creation)

Timing chart spanning cycles 1 to 17; the activity in each stage is:

Instruction      F      OF       E         M       WB
ADD R9,R9,R8     ADD    R9,R8    R9+R8     None    R9
MUL R1,R2,R3     MUL    R2,R3    R2.R3     None    R1
LDR R5,(4,R1)    LDR    4,R1     [4,R1]    read    R5
SUB R5,R5,R1     SUB    R5,R1    R5‐R1     None    R5
ADD R7,R8,R9     ADD    R8,R9    R8+R9     None    R7
MUL R7,R1,R5     MUL    R1,R5    R1.R5     None    R7
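
The places where a stall or forwarding path may be needed are the read‐after‐write (RAW) dependencies between nearby instructions. The Python sketch below lists them for the code in this question. It is only a dependency finder, not a cycle‐accurate pipeline model: the tuple encoding of the program, the name `raw_hazards`, and the window of 3 instructions are assumptions. The window follows from assuming that a result written in WB cannot be read in OF during the same cycle, so a consumer issued 1, 2 or 3 instructions after its producer would read a stale register without a stall or forwarding.

```python
# (mnemonic, destination register, source registers), in program order.
PROGRAM = [
    ("ADD", "R9", ["R9", "R8"]),
    ("MUL", "R1", ["R2", "R3"]),
    ("LDR", "R5", ["R1"]),        # loads from address 4 + R1
    ("SUB", "R5", ["R5", "R1"]),
    ("ADD", "R7", ["R8", "R9"]),
    ("MUL", "R7", ["R1", "R5"]),
]


def raw_hazards(program, window=3):
    """Return (consumer, producer, register, distance) for close RAW dependencies.

    Instruction numbers are 1-based; distance is measured in instructions.
    """
    last_writer = {}   # register -> index of the most recent instruction writing it
    hazards = []
    for index, (_, dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                distance = index - last_writer[reg]
                if distance <= window:
                    hazards.append((index + 1, last_writer[reg] + 1, reg, distance))
        last_writer[dest] = index   # record the write after checking the reads
    return hazards


for consumer, producer, reg, distance in raw_hazards(PROGRAM):
    print(f"instruction {consumer} needs {reg} from instruction {producer} "
          f"({distance} instruction(s) earlier)")
```

Under these assumptions the fifth instruction has no close dependency, because R9 was written four instructions earlier.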

The Project Gutenberg eBook of Transient
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.

Title: Transient

Author: Ward Moore

Illustrator: Virgil Finlay

Release date: January 13, 2024 [eBook #72706]

Language: English

Original publication: New York, NY: Ziff-Davis Publishing Company, 1960

Credits: Greg Weeks, Mary Meehan and the Online Distributed Proofreading Team at http://www.pgdp.net

*** START OF THE PROJECT GUTENBERG EBOOK TRANSIENT ***
TRANSIENT

By WARD MOORE

ILLUSTRATED by FINLAY

COMPLETE BOOK-LENGTH NOVEL

[Transcriber's Note: This etext was produced from
Amazing Stories February 1960.
Extensive research did not uncover any evidence that
the U.S. copyright on this publication was renewed.]
CHAPTER 1

The Governor, a widower in his earliest fifties, turned off the ignition,
noting with satisfaction the absence of street signs limiting parking
time. Governor Lampley, serving out the unexpired term of his
predecessor and not entirely hopeless of nomination and election in
his own right, pictured the stupid or fanatical cop who under any
circumstances would write a ticket for the car with the license GOV-
001. He and Marvin had made a big joke out of those zeros, Marvin
showing his hostility under the kidding, the Governor hiding his
dislike for his secretary under his self-deprecation.
Before getting out he dusted the knees of his trousers and looked up
and down the shabby street. The Odd Fellows Hall was built of
concrete blocks; Almon Lampley was reasonably sure it hadn't been
there thirty years before. The other buildings seemed to be as he
remembered them, if anything so fragile as the reconstruction in his
mind could be called a memory. He'd forgotten the name of the
place, its very location. Only the highway marker, the one so close it
rooted the town briefly from obscurity to pinpoint it fleetingly: so
many miles from the capital behind him, so many miles to the
destination before him, hit the chord. Why, it was here. This was the
place. How very long, long ago. Goodness (he curbed the natural
profanity of even his thoughts lest he offend some straitlaced voter),
goodness—years and years. A generation. Before he met Mattie,
before he switched from selling agricultural implements to vote-
getting.

And the sign just outside, Pop. 1,983. Pathetic lack of 17 more pop.
With 2,000 they could have boasted: We're on our way, on our third
thousand, the biggest little town between here and there. Watch us
grow. If you lived here, you'd be home now. Get in on the ground
floor and expand with us, Tomorrow's metropolis. Under two
thousand was stagnation, decay, surrender. 1,983: possibly a
thousand registered voters; more likely eight hundred—two
precincts. How many Republicans, how many Democrats? Maybe
three screwballs: one voting Prohibition, one writing in his own
name, one casting a ballot for Pogo. A sad town, a dead town.
Surely it hadn't been so thirty years ago?
But there had been the railroad then, and young Almon Lampley
swinging down from the daycoach before the wheels stopped
turning, bursting with enthusiasm, eager, cocky, invincible. The
railroad gone, its tracks melted into scrap, its ties piled up and
burned, its place taken by trucking lines, buses, cars. You had to
have progress. So what if the town got lost in the process, fell
behind? There were other towns, equally deserving, equally
promising, equally anxious to get ahead. The state was full of them:
chicory capital of the world, hub of mink breeding, where the juiciest
pickles are made, home-owners' heaven, the friendliest city, Santa
Claus' summer residence, host to the annual girly festival, gateway
to the alkali flats. Thousands of them. And he was governor of the
whole state. It would be non-feasance if not mis-feasance in him to
regret this one bypassed settlement.
Evidently progress, before it withered, had brought the Odd Fellows
Hall. No more. The false fronted stores were as he remembered—as
he thought he ought to have remembered—and the dwelling set
back from the street, forgotten or held in irascible obstinacy, petunias
and geraniums growing too lush in the overgrown front yard. The
Hay, Grain & Feed where he had called—where he must have called
—the garage, the Chevrolet agency, the hotel.

The Governor gave a final brush to his trousers, pocketed the keys,
and picked up his overnight case from the seat beside him. The hotel
was unquestionably the most prominent building on either side of the
street yet he had unconsciously left noticing it to the last. It was a
square three stories high, probably older than anything else in town,
of no identifiable style, with a sign saying glumly ROOMS, MEALS,
in paint so ancient its surface had peeled away, leaving only fossil
pigment to take the weather and continue the message. The brown
clapboards had grayed, they were parted—driven asunder—by a
vertical column of match-fencing, mincingly precise in its senility,
pierced by multipaned windows with random blue, brown, green and
yellow glass. The verandah, empty of chairs but suggestive of a
place for drummers to sit with their heels on the collapsing railing,
sagged in a twisted list. The two balconies above it had been
mended with scrap lumber, unpainted, and the repairs themselves
mended again.
Governor Lampley could easily have driven another thirty, forty, fifty
miles—it was only mid-afternoon and he was not tired—to find
modern accommodations. He could have driven all the way to his
destination. He chose to stop here. As a sentimental gesture? As an
uncomfortable (fleas, lumpy beds, creaky floors) amusement? As a
whim? Call it a whim. The Governor was on an unofficial, very
limited, vacation.
He admitted feeling slightly foolish as he took the three steps to the
verandah and walked over the uneasy boards to the plate-glass
doors and into the darkened, dusty lobby. In this position one didn't
give way to sudden impulse. Any yielding to sentiment was
calculated, studied, designed, to be milked for good publicity. He
could see the bored, competent photographers, the casual—well-
planned—chat with the reporters. Marvin would have arranged it all;
the Governor would have only to move gracefully through his part.
Responsibly he ought to phone Marvin, let him know he was staying
here, give his attention to whatever business Marvin would say
couldn't wait till tomorrow. In imagination he could hear the
querulous, nagging tones beneath the surface respectfulness, the
barely suppressed astonishment (what do you suppose he's up to
now? a woman? a meeting with one of the doughboys? a drunk?),
the assurance Marvin would call if anything came up. He ought to
phone Marvin immediately.
The thought of Marvin made him turn and glance back through the
doorway, to reassure himself he was not part of a scheduled
program after all. But there was no car on the street save his own,
no busy technicians, no curious onlookers, no one. Only the
afternoon sunlight, the swirling motes, the faint smell of oil and dust.
As soon as he accustomed his eyes to the dimness he saw there
was no one in the lobby. An artificial palm, its raffia swathings loose
as a two year old's diaper, stood in a wooden tub. Eight chairs were
placed in neatly opposing rows, four covered in once-black leather,
cracked and split, the wrinkles worn brown, four wooden, humbly
straight. There was an air of peacefulness independent of the dark,
the quiet, the emptiness, an assertion that there was no need to
hurry here, that there was never a need to hurry here.

He lifted his arm to look at his watch. The sweep-second hand was
not revolving. He put the watch to his ear; there was no tick. He
wound it, shook it; it didn't start. He slipped it off his wrist into his
pocket, and loosened his necktie.
He stood in front of the brown counter whose top was shiny with the
patina of leaning elbows. There was a bell with inverted triple chins
and outpopping pimple, an open register turned indifferently toward
him, a bank of empty pigeonholes. He picked up the chewed pen
with the splayed nib, bronze-shiny where the ink had dried on it. He
had to tip the black scumcrusty inkwell far over to moisten it. It
scratch-scratched across the top line on the page, engraving his
name but only staining the depressions here and there, mostly on
the downstrokes. Oddly, instead of the capital, he wrote down the
city he had once lived in, the city where he got his first job.
After registering he hesitated over the bell. Instead of ringing it he
picked up his bag and walked to the shadowed stairway. Up ahead
he saw a low-watt bulb staring bleakly on the wall. The area around
the globe was a grimy green, outside the magic circle the pervading
dark brown was undisturbed. The carpet under his feet was
threadbare and gritty; through his shoe-soles he felt the lumps of
resistant knots in the wood, and the nailheads raised by the wearing-
down around them. He gazed ahead.
There was a landing halfway up, opening on a narrow hall. Light
came through fogged windows set close together along one wall.
The other was papered with circus posters, the brightly lithographed
elephants and hippopotami faded almost to indiscernibility, the
creases burst open like scored chestnuts. The Governor hesitated,
went on up.
At the second floor he turned left, noting how spacious this hall was
in contrast to the one below, how comparatively bright and clean.
Most of the doors were slightly ajar, not inviting perhaps, merely
indicating they were receptive to a tenant.
From the outside there was nothing to choose between them yet he
felt the choice was important. Further, it seemed to him that opening
a door would commit him; he must choose without inspection.
Thoughtfully he passed several. The one he finally entered opened
on a large room with two tall windows. Thin, brittle curtains drooped
palely from the rods. Two dressers, the high one bellying forward,
the low one supporting a tilted metal-dull mirror, were thick with
cheap varnish that wept long blob-ended tears. The double bed was
made, the coverlet turned down, the lumpy pillows smooth and gray.
On a whatnot in one corner a glass bell enclosed two wax figures, a
bride and groom in wedding finery. The wax bride was wringing her
waxen hands.

The Governor put his bag on the foot of the bed, took off his jacket,
rolled up his sleeves, went to the sink. The faucets were black-
spotted, green-flecked, with remnants of nickelplating and long dark
scratches. The basin was orange-brown and gray-white. He turned
on the HOT. There was a quick hiss and a slurp of thick, liquid rust.
He tried the COLD. The slurp was the same but there was no hiss.
He looked around the room again, saw the washstand. The knobbly-
spouted pitcher stood in the center of the knobbly-rimmed bowl. The
water appeared good despite the dust floating on top. He poured
some in the basin and rinsed his face and hands.
As a small child he had been sure water was life. Once he sprinkled
some on a dead bird, stiff and ruffled. He found a towel, hard and
grainy, dried his hands, shrinking slightly from the contact. He took
his comb from his jacket and ran it through his still thick hair, only
lightly graying. It was a minor pride that his campaign pictures were
always the latest, never one taken when he was much younger.
He became aware he was being watched and turned inquiringly
toward the door. The man standing there wore heavy work-shoes,
blue denim pants, a denim jacket buttoned to his neck. His face was
dark, his straight black hair long. His eyes slanted ever so little
above his high cheekbones. He smiled at Lampley. "Everything OK?"
"Everything OK," said Lampley. "Except the plumbing."
The man nodded thoughtfully.
"Oh, the plumbing. It went out." He gestured vaguely with his hands,
indicating leaks, stoppages, broken pipes, hopeless fittings, worn-out
heaters. "So we put in washstands."
"I see. Maybe it would have been better to have it fixed."
The other shook a doubtful head. "This was change. Advance.
Improvement. Maybe next we'll put a well in every room, with a rope
and bucket reaching straight down. Plop! And then rrrrr, up she
comes full and slopping over. Or artesians with the water bubbling up
like a billiard ball on the end of a cue. That would be hard to beat,
ay? Or perhaps wooden pipes from the rain gutters."
"I see," said Lampley. The plans didn't seem unreasonable. "You're
the clerk?" he asked politely.
"Clerk is good as any. Everyone has lots to do."
"That's right. Well, thanks."
"Don't mention it."
Lampley rolled down his sleeves, refastened his cufflinks, put his
jacket back on. "Can I get something to eat here?"
"Why not? Come on."

The Governor followed him into the hall, closing the door. He thought
briefly of asking about telephoning since there was no phone in his
room. Still it wasn't really necessary; Marvin could take care of
everything. The clerk led him, not to the stairs he had come up, but
in the opposite direction. Some of the partially open doors were
painted in vivid colors and marked with symbols strange to Lampley.
The backstairs were narrower, steeper, darker; the Governor had a
constant fear of overestimating the width of the treads and placing a
searching foot upon insubstantial air. They came to the halfway
landing but instead of the windowed hall with the circus posters, they
entered a low room, low as a ship's cabin compressed between
decks. Exposed beams held up the ceiling. A long plank table ran
between two benches, a high ladderback chair at the head and foot.
One of the benches was built into the battened wall.
On it a man with an infantile face and bulging forehead under coarse
black hair crouched over the table guarding his food with tiny
kangaroo arms. A stained and spotted napkin was tied around his
neck like a bib. He slobbered and gurgled over a bowl of thick
porridge, smearing it around his mouth, spilling on the napkin as he
scooped the mess from the bowl.
At the head of the table an old man, white-haired, hook-nosed,
chewed silently. On the outside bench was a middle-aged woman
with sagging, placid features, and a girl in her teens. All looked
Indian or Mexican except the idiot, none paid attention to their
arrival.
The clerk sat down at the foot of the table. Lampley saw there was
no place for him except on the bench next to the defective. He edged
his way in, staying far as possible from him. The room was suddenly
oppressive; he had the notion they must be near a furnace, a boiler,
a dynamo. He took out his carefully folded handkerchief and wiped
his forehead. The old man glanced at him sympathetically.
The young girl reached under the table and came up with a bright
green crepe paper party favor. She extended it diffidently toward the
Governor. Smiling, he took hold of the stiff cardboard strip inside the
ruffle with his thumb and forefinger. She giggled, holding the other
end; they pulled. The cracker popped, a red tissue paper phrygian
cap fell out. She clapped her hands and motioned him to put it on.
Slightly embarrassed, he complied.
She searched through the torn favor for the motto, unfolded it. She
shook her head and handed it to him. He read, AN UNOPENED
BOOK HAS NO PRINTING. She put a bowl of beans, cut-up
chicken, and rice before him. "Thank you," he said.
"For nothing," she responded, shyly polite. Her young breasts
pushed out against her white shirt. Her dark eyes looked into his
before her long lashes fell. Her mouth was wide and supple.
Lampley realized she was beautiful. He thought with pain of walking
with her through knee-high grass and lying beside her under
spreading trees.
He spooned up some of the food; it was overcooked and tasteless. It
didn't matter. Between spoonfuls he looked furtively at the woman—
he dared not let his eyes return to the girl—and thought he saw a
resemblance to.... To whom? The face was pleasant, ordinary,
memorable neither for charm nor repulsiveness. It was a matter of
professional pride, an occupational necessity for him to remember
faces; he could not recall this one. It nagged at the back of his mind.

The old man rose, wiping his mouth with the back of his hand,
bowing clumsily toward the Governor. He pulled a wrinkled pack of
cigarettes from his shirt pocket and extended it. "Thanks,"
acknowledged Lampley, "I don't use them." The old man shook his
head, tipped the pack to his mouth, replaced it, lit the cigarette with a
match struck on the seat of his trousers. His fingers were thick and
twisted; they still appeared capable of delicate manipulation.
The clerk pushed back his chair. "We might put self-service in here,"
he remarked to no one in particular. "Individual stoves, maybe
mechanized farms or hydroponic tanks." He belched, holding his
hand diffidently before his mouth.
Lampley emptied his bowl. The girl looked questioningly at him. "No
more," he said. "Thank you."
She smiled at him, followed the clerk and the old man from the room;
he was alone with the woman and the idiot. He wanted to get up and
go too; something held him. "A long time," said the woman gently.
He knew what she meant; he refused to accept understanding. "I'm
sorry."
"Since you were here. You forgot?"
There was coldness in his stomach. "No ... not exactly. I'm sorry."
She shrugged. Her arms and shoulders were rounded and graceful
but their grace did not obscure the fact that she was old as he, or
nearly. Why was it so reprehensible to long for freshness and beauty
in women but the stamp of taste to want these qualities in anything
else? "I'm sorry," he said for the third time, aware of the phrase's
futility.
She smiled, showing a gold tooth. The others were white but
uneven. "For nothing," she echoed the girl. "What is there to be sorry
for?"
His eyes went from the creature on the bench to her and back again.
"Yours," she said calmly.
He had known, but knowing and knowing were different things.
"Impossible!"
She showed the gold tooth again. "Why impossible? You make love,
you have babies."
"But—like that?"
"Does everything have to be perfect for you?"
He regarded her with greater horror than he had his—his son. A
beast, an animal, giving birth to beasts and animals. "Not perfect.
But not ... this."
She laughed and moved around the table to the unfortunate. She
untied the napkin and tenderly wiped his vacant face and the
undeveloped hands. She kissed him passionately on the forehead.
"You think it is possible to love only perfection? You couldn't love one
like this, or an old woman, or a corpse?"

Lampley ran from the room, past a curtained entrance, and stumbled
through a hall lit with yellow, grease-filmed light. The hall smelled of
food, acrid, sickening. There was a swinging door at the end,
padded, outlined with brass nails. Many were missing, their absence
commemorated by the dark outline of where they had been. He
pushed through it.
The kitchen was oddly constructed. Its ceiling seemed to be two
stories high; just under it were niches for sooty plaster figures, all
horribly distorted, figures with one arm twice the length of the other,
phalluses long as legs, monstrous heads, steatopygian buttocks,
goiters resting on sagittarian knees. Through a rose window yellow-
pink beams streamed to the flagged floor. Scrubbed and sanded
butchers' blocks splattered with gobs of fat, drying entrails, scabs of
hard blood stood against the wall. Gleaming knives and cleavers
were racked in the sides of the blocks. Tomb-like ranges were
hooded in a row; opposite them empty spits turned before cold,
blackened fireplaces.
The old man was seated on a stool before a slanting table,
methodically chopping onions in a wooden bowl. He turned his head.
"She gave you a hard time, hay?"
Blackmail, extortion, exposure, disgrace. "I don't know."
The old man wrinkled his forehead. The peculiar light made the
creases unnaturally deep, like well-healed scars. "Who does know?"
He laid down his cleaver. "Come."
Will-less, the Governor followed. The old man had a limp, formerly
unnoticed. They went past the ranges to a massive steel door with a
red lamp beside it. The old man lifted the tight latch. Lampley noted
a safety device preventing the door from fully closing except from the
outside.
They were in a large refrigerator. Sides of beef, barred red and pale,
hung from hooks. Whole sheep and pigs, encased in stiff,
unarmoring fat, thrust dead forefeet toward the unattainable
sawdust-covered floor. Barrels of pickling brine, boxes of fish and
seafood (the lobsters waving feelers uncertainly) were arranged
neatly. It was cold; the Governor shivered. Plucked fowl dangled in
rows. Beyond them wild ducks and geese, still sadly feathered, were
suspended in bunches, three ducks, two geese to a bunch. The
Governor touched a mallard's breast as he passed; the down was
strangely warm in this chilling place.
They crossed an empty space. The whole carcass of an animal hung
from gray hooks curving through the tendons of its legs. It had been
skinned and gutted but the head was intact and untouched. The
shaggy hair drooped forward, the horns pointed at nothing. The
glazed eyes absorbed the light, the tongue, clenched between dead
teeth, protruded.
"Buffalo," cried the Governor. "Surely it's against the law to kill
them?"

The old man ran his dark, heavily veined hand gloatingly over the
bison's hump and down the shoulder. "Tasty. Very tasty."
"You can't do things like this," insisted the Governor.
"Ah," sighed the old man. "Boom boom."
Lampley came closer. There were no signs of the buffalo having
been shot. Its throat was cut and dark blood had congealed around
the jagged wound. The old man picked out several clots of dried
blood and put them in his mouth, sucking appreciatively. He rubbed
his cheek against the head of the animal. "Soft," he said. "See
yourself."
The Governor drew back.
The old man stared contemptuously at him. "No wonder."
"How do I get out of here?" asked Lampley.
The old man gestured indifferently. "Try that way." He waved his arm.
Lampley turned. Either the refrigerator had gotten colder or he was
newly vulnerable to the chill. He shivered; hoarfrost crunched under
his feet, the wall glistened with ice crystals. He realized he was not
retracing his steps when he passed braces of partridges—or had he
merely not noticed them before?—grouse, pheasants. He looked
back; the old man was still nuzzling the bison head.
He came to a mound of snow and was puzzled, less at its presence
than at its use and origin. Who would manufacture snow, and for
what purpose? And if not manufactured it must have been brought a
long way at great expense for it did not snow in this part of the state
more than once in a dozen years.
But it was not a simple mound after all but an igloo, crudely
constructed, as though by a child. Impulsively he got down on his
knees and crawled through the entrance-tunnel until his head was
inside. It was warm under the dome, warm and soothing and safe.
He backed out quickly, frightened at the thought of becoming too
content there, of not being able to leave its comfort.
The cold of the refrigerator was accentuated by the contrast; his
breath came in steamy puffs. He hurried to a door opening from the
inside. He leaned against the wall of a dark corridor and breathed
deeply. The picture of the old man fondling the buffalo head was still
before him. He felt his way slowly along the wall, and then he was in
the lobby again. There was something wrong here: the room where
they had eaten had been a half level higher.
There was no point in wondering over the layout of the hotel. He
would retrieve his bag, get in the car and go on to his destination. He
stumbled through the gloom, missing the stairs, and saw he was in
front of an antique elevator, the doors open, the ancient basketwork
cage an inch or two below the floor.
"Get in," invited the clerk, "I'll take you."
Lampley entered, panting a little, smoothing his tie with his palm.
"Thank you."
The clerk pulled the grill shut; there was no door on the elevator
itself. "Ninety-three million miles to the sun," he said. "We'd fry
before we got there."
The Governor considered the idea. "Explode from lack of pressure,
asphyxiate from lack of air first."
The clerk looked at him curiously. "We could shut our eyes and hold
our breath, you know."
Lampley did not answer.
"All right." The clerk grasped the control lever. The cage fell with
sickening velocity.

CHAPTER 2

Lampley knew something had broken but his fear was not absolute.
He bent his knees (go limp: drunks and babies were less liable to
injury than the stiffly erect); the drop would not be fatal, he might not
even break a leg. How far to the basement? Twenty feet at most. If
he just relaxed—or jumped to the top of the cage and clung to the
fretwork?—he would not be hurt. He must not be hurt; the
implications of the headlines would destroy him.
The elevator dived in darkness, far further than any conceivable
excavation beneath the hotel. It fell through night, blacker and more
terrifying than any moonless, starless reality. It plunged into total,
unrelieved absence of light, a devastation to the senses, a mockery
to the eyes.
Then, subtly, there was a difference. The blackness was still black
but now it could be seen and valued. It was blackness, not
blindness. Then increasingly there was a faint diminution of the
darkness itself, and the shaft changed from sable to the deepest
gray.
After they had fallen still further Lampley saw the shaft was lined with
porcelain tiles, yellowed with neglect. Could it be they were no longer
falling at all, just descending normally? To where?
They flashed past doors, cavernous rectangles in the shiny wall.
Shiny? Yes, the tiles were brighter, cleaner, whiter. The cage slowed,
came to a bouncing halt. The repressed fear surged through
Lampley, making his feet and ankles weak and helpless. The clerk
slid the door open smartly, with a sharp click.
"What's here?" asked the Governor, conscious of the inadequacy of
the question.
"Odds," replied the clerk. "Odds. No ends to speak of but plenty of
odds."
He half led, half pushed Lampley out of the elevator. They were in a
great chamber, so far stretching that though it was adequately lit the
defining walls were lost, far, far off. So close to each other that they
almost touched, grand pianos with their lids thrown back and strings
exposed, stood, rank after rank. From the ceiling long stalactites
dripped on the pianos: plink plink, plink plink, plink plunk! A thousand
pianotuners might have been at work simultaneously.
"Nobody here," said the clerk. "All right." He turned swiftly back into
the elevator, slamming the door.
"Wait!" cried Lampley in panic. "Wait for me." He heard the softly
whirring mechanism as the elevator started, leaving him alone.
