Solutions Manual Computer Organization and Design 4th PDF

Solution 1.1

1.1.1 Computer used to run large problems and usually accessed via a network: 5 supercomputers
1.1.2 10^15 or 2^50 bytes: 7 petabyte
1.1.3 Computer composed of hundreds to thousands of processors and terabytes of memory: 3 servers
1.1.4 Today's science fiction application that probably will be available in near future: 1 virtual worlds
1.1.5 A kind of memory called random access memory: 12 RAM
1.1.6 Part of a computer called central processor unit: 13 CPU
1.1.7 Thousands of processors forming a large cluster: 8 datacenters
1.1.8 A microprocessor containing several processors in the same chip: 10 multicore processors
1.1.9 Desktop computer without screen or keyboard usually accessed via a network: 4 low-end servers
1.1.10 Currently the largest class of computer that runs one application or one set of related applications: 9 embedded computers
1.1.11 Special language used to describe hardware components: 11 VHDL
1.1.12 Personal computer delivering good performance to single users at low cost: 2 desktop computers
1.1.13 Program that translates statements in high-level language to assembly language: 15 compiler
1.1.14 Program that translates symbolic instructions to binary instructions: 21 assembler
1.1.15 High-level language for business data processing: 25 COBOL
1.1.16 Binary language that the processor can understand: 19 machine language
1.1.17 Commands that the processors understand: 17 instruction
1.1.18 High-level language for scientific computation: 26 FORTRAN
1.1.19 Symbolic representation of machine instructions: 18 assembly language
1.1.20 Interface between user's program and hardware providing a variety of services and supervision functions: 14 operating system
1.1.21 Software/programs developed by the users: 24 application software
1.1.22 Binary digit (value 0 or 1): 16 bit
1.1.23 Software layer between the application software and the hardware that includes the operating system and the compilers: 23 system software
1.1.24 High-level language used to write application and system software: 20 C
1.1.25 Portable language composed of words and algebraic expressions that must be translated into assembly language before run in a computer: 22 high-level language
1.1.26 10^12 or 2^40 bytes: 6 terabyte

Solution 1.2

1.2.1 8 bits × 3 colors = 24 bits/pixel, taken here as 4 bytes/pixel. 1280 × 800 pixels = 1,024,000 pixels. 1,024,000 pixels × 4 bytes/pixel = 4,096,000 bytes (approx. 4 Mbytes).
1.2.2 2 GB = 2000 Mbytes. No. frames = 2000 Mbytes/4 Mbytes = 500 frames.
1.2.3 Network speed: 1 gigabit network => 1 gigabit per second = 125 Mbytes/second. File size: 256 Kbytes = 0.256 Mbytes. Time for 0.256 Mbytes = 0.256/125 s = 2.048 ms.
1.2.4 2 microseconds from cache => 20 microseconds from DRAM. 20 microseconds from DRAM => 2 seconds from magnetic disk. 20 microseconds from DRAM => 2 ms from flash memory.

Solution 1.3

1.3.1 P2 has the highest performance, comparing the performance of P1, P2, and P3 in instructions/sec.
1.3.2 No. cycles = time × clock rate
cycles(P1) = 10 × 2 × 10^9 = 20 × 10^9
cycles(P2) = 10 × 1.5 × 10^9 = 15 × 10^9
cycles(P3) = 10 × 3 × 10^9 = 30 × 10^9
time = (No. instr. × CPI)/clock rate, then No. instructions = No. cycles/CPI
instructions(P1) = 20 × 10^9/1.5 = 13.33 × 10^9
1.3.3 time_new = time_old × 0.7 and CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 3.
f = No. instr. × CPI/time, then
f(P1) = 13.33 × 10^9 × 1.8/7
f(P2) = 15 × 10^9 × 1.2/7
f(P3) = 12 × 10^9 × 3/7
1.3.4 IPC = 1/CPI = No. instr./(time × clock rate)
IPC(P1) = 1.42
IPC(P2) = 2
IPC(P3) = 3.33
1.3.5 Time_new/Time_old = 7/10 = 0.7. So f_new = f_old/0.7 = 1.5 GHz/0.7 = 2.14 GHz.
1.3.6 Time_new/Time_old = 9/10 = 0.9. So Instructions_new = Instructions_old × 0.9 = 30 × 10^9 × 0.9 = 27 × 10^9.

Solution 1.4

1.4.1 P2
Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P2 = 11 × 10^-4
1.4.2 CPI = time × clock rate/No. instr.
CPI(P1) = 18.65 × 10^-4 × 1.5 × 10^9/10^6 = 2.79
CPI(P2) = 11 × 10^-4 × 2 × 10^9/10^6 = 2.2
1.4.3 clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 4 = 28 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 3 = 22 × 10^5
1.4.4 (500 × 1 + 50 × 5 + 100 × 5 + 50 × 2) × 0.5 × 10^-9 = 675 ns
1.4.5 CPI = time × clock rate/No. instr.
CPI = 675 × 10^-9 × 2 × 10^9/700 = 1.93
1.4.6 Time = (500 × 1 + 50 × 5 + 50 × 5 + 50 × 2) × 0.5 × 10^-9 = 550 ns
Speedup = 675 ns/550 ns = 1.22
CPI = 550 × 10^-9 × 2 × 10^9/700 = 1.57

Solution 1.5

1.5.1 1G, 0.756 S/S b, | 1G,1.56 mss
1.5.2 ‘a. | F216 1.3 tres taste than PL bi. | PLis 1.08 times faster then P2
1.5.3 ‘| 2b LAL tres taste than PL B, | PLS 1,00 tines faster then PD
1.5.4 a | 20505 b. | 2935
1.5.5 a | O7tis | 0.8655
1.5.6 ‘@ | 130 tines faster | La0 tines fier

Solution 1.6

1.6.1 Pete conpaer hor : m a : is on
1.6.2 =m
1.6.3 a ; a om
1.6.4 : Ea Sia : een Sia
1.6.5 Speed-up, P1 versus P2: ‘& | OSSTI0ERES B | O7SOESIES
1.6.6 ‘@ | S2ON0SIESS wb, | R2IeDIeDIS

Solution 1.7

1.7.1 Geometric mean clock rate ratio = (1.28 × 1.56 × 2.64 × 3.03 × 10.00 × 1.80 × 0.74)^(1/7) = 2.15
Geometric mean power ratio = (1.24 × 1.20 × 2.06 × 2.88 × 2.59 × 1.37 × 0.92)^(1/7)
= 162 v4 x 3,03 x 10.00 x 1,80 1.7.2 Largest clock rate ratio = 2000 MHz/200 MHz = 10 (Pentium Pro to Pentium 4 Willamette) Largest power ratio= 2: 2.1 W/10.1 W = 2.88 (Pentium to Pentium Pro) 17.3 Clock rate: 2.667 x 10°/12.5 x 10°= 212.8 Power: 95 W/3.3 W = 28,78 1.7.4 C=PN?x dockrate 80286: .0105 x 10° 80386: C = 0.01025 x 10-6 80486: .00784 x 10° .00612 x 10° 0133 10-6 Pentium 4 Willamette: C= 0.0122 x 10° Pentium 4 Prescott: C = 0.00183 x 10-¢ Core 2: C=0.0294 x 10° 17.5 33/1) 1.78 (Pentium Pro to Pentium 4 Willamette) 1.7.6 Pentium to Pentium Pro: 3.3/5 = 0.66 Pentium Pro to Pentium 4 Willamette: 1. Pentium 4 Willamette to Pentium 4 Presco Pentium 4 Prescott to Core 2: 1.1/125 =088 Geometric mean = 0.68 0.53 125/1.75=071 Solution 1.8 4.8.1 Power, =V? xdlock rate C, Power= 0.9 Power, Cy,= 08 xP KOS XAT BP XAX IP = 1.08 1.8.2 Power,/Power, = V7 x dock rate,/V 7 dock rate; Power Pover, = 087 = Reduction of 13% 18.3 Ponerg= Vix 1x 10? x08 G,=06%Poner, Power, = 5? x 0.5 x 10°C Vx 1x AWxOBXC, = 06 xPXO5% 10" C1 Vg=(OGXS? x05 x 10°YA.x 10? x08) /?= 3080 Chapter. Sokitions 1x Gua Vaal 11.8.6 Voltage = 1.1 x 1/27! = 0.92 V. Glock rate = 2.667 x 2"? = 3.771 GHz Solution 1.9 191 V4)2 5 dock rate x 2!? = Powerg. Thus, ‘a | 1749%100= 2% wb. | 46/120 x100 = 57.5% 1.9.2 & [a= 33208 be | bau = 45/11 = 409 19.3 ‘a | Ponera/Poveran = 1/49= 0.02 ba | Ponerg/Povetyn = 45/57 = 08 LBA Powers/Power,,=0.6 => Powery = 0.6 XPoweln, ‘am | Ponerg= 06 x40W=24W be }exaW=18W 19.5 a 24/0B= SDA 19.6 2 ee ee ee ee Solution 1.10 1,101 pe ree [teers | eet | : ; = = 2 = = : = = [| reeomere | namctons perpen | Talim | : = = ; = = 2 = = : = = 1.10.2 oT : 5 = 2 as Z = 1 “A006 2 3205 4 Bae4 3 35m Chapter. 
Solutions 1.10.3 —2h————— Eee 5376 258 1a ara — LEE eee ‘S376 3a78 3564 3a82 1104 4.00 Bar 125 Ors Emre 400 200 1.00 (050 a3 | 110.5 ey ‘core | Power (W) per core | Power(W) | Power (W) Ei Et Eri 0625 15 0625 025 025 025 ‘core | Power (W) per core Eis 0825 0m 0m 0m 125 25 a3 Het sph fale et) Ea 0825 125 25 1.10.6 Sa : = : = : is ; 225 i g i ala|a|a|—=|s|a]a]e EH Ba) a) a Chapter. Sokitions Solution 1.11 1.41.4 Wafer area =x (d/2)?? be 1113 ‘a | Dies per vater= 1.1 x90= 99 Defects per area = 1.15 x 0,018 = 0.021 detects/an? fer aed Dies per wafer = 176,7/29 = 1.78 cx? LALA Yield =1/(1 + (defect per area x die area)/2) ‘Then defect per area = (2/die area)(y“Y/— 1) Replacing values for T] and T2 we get ‘TI: defects per area = 0.00085 defects/mm? = 0.085 defects/am? ‘T2: defects per area = 0.00060 defects/mm? = 0.060 defects/am? ‘T3: defects per area = 0.00043 defects/mm? = 0.043 defects/am? ‘T4: defects per area = 0.00026 defects/mm? = 0.026 defects/am* 1,45 nosolution provided Solution 1.12 1.12.1 CPI= dock rate x CPU time/instr. count dlock rate = L/eyde time = 3 GHz. ‘a | Pibeand = 3x 10? x 500/218 x 10 =0.7 (Pint) = 3 10? x 1200/336 x 10" = 10.7 ref. time/execution time, (AxTE= 1219 1.424 CPU time =No. instr.x CPV dock rate If CPL and dock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is, 10%. 1.12.5 CPU time(before) = No. instr. CPI/clock rate CPU time(after) = 1.1 x No. instt.x 1,05 x CPW clock rate CPU times(after)/CPU time(before) = 1.1 x 1.05 = 1.155. Thus, CPU time is, increased by 155% (1.12.6 SPECratio= reference time/CPU time SPECratio(after)/SPECratio(before) = CPU time(before CPU time(afier) = 1/1.1555 = 0.86. That, the SPECratio is decreased by 14%. Solution 1.13 4.43.1 CPI=(CPU timex clock rate)/No. instr, ‘a | 1= 450% 4 x 107/085 X2118 x 107) = 0.98 bb, | 1 1150 4x 10°/(0.85 x396x 107) = 16.10 Chapter. 
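The CPU-time relations used in Solutions 1.12 and 1.13 can be checked with a few lines of Python. This is a minimal sketch, not part of the manual: the function names and the baseline values (`base_instr`, `base_cpi`, `clock`) are hypothetical and chosen only for illustration. Only the 1.1 and 1.05 scaling factors come from 1.12.5.

```python
def cpu_time(instr_count, cpi_value, clock_rate_hz):
    """CPU time = No. instr. x CPI / clock rate."""
    return instr_count * cpi_value / clock_rate_hz


def cpi(cpu_time_s, clock_rate_hz, instr_count):
    """CPI = (CPU time x clock rate) / No. instr."""
    return cpu_time_s * clock_rate_hz / instr_count


# Hypothetical baseline, for illustration only.
base_instr = 1.0e9   # instruction count
base_cpi = 2.0       # cycles per instruction
clock = 3.0e9        # 3 GHz clock rate

before = cpu_time(base_instr, base_cpi, clock)
# As in 1.12.5: 10% more instructions and a 5% higher CPI.
after = cpu_time(1.1 * base_instr, 1.05 * base_cpi, clock)

time_ratio = after / before        # 1.1 x 1.05
spec_ratio_change = before / after  # SPECratio scales inversely with CPU time

print(f"CPU time ratio: {time_ratio:.3f}")
print(f"SPECratio ratio: {spec_ratio_change:.3f}")
```

Because the baseline cancels in the ratio, any positive baseline values give the same two ratios; only the scaling factors matter.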
Sokitions 1.43.2 Clock mate ratio= 4 GHa/3 GHz= 1.33, @ | Pi@acre= 099,091 @3 Gre= 0.7 ratio= 11 b, | PlesGr= 161,01 @3GH= 10,7 ratio= 150 ‘They are different because although the number of instructions hasbeen reduced by 15%, the CPU time has been reduced bya lower percentage. 1.13.3 ‘@ | 490/500 = 0.90. CPU time reduction: 10%. bb, | 1150/1200 = 0.958, CPU tine reduction: 42, 1.43.4 No. instr. =CPU time x dock rate/CPI. ‘| No. inst = 8200.9 4% 108/0.96 = S075 x10 wa | No. inst, = 58009 x4 x 107/2.94= T1010 1613.5 Clock rate =No. instr. x CPYCPU time. (Clock rate jy =No. instr. x CPI/0.9 x CPU time = 1/0.9 dlock rate =3.33 GHz. 1.43.6 Clock rate=No. instr. x CPYCPU time. Clock rate = No. instr. x 0.85 x CPI/0.80 CPU time = 0.85/0.80 dock rategyy = 3.18 GHz. Solution 1.14 1.14.1 No. instr. = 10° TaulPD= AP XL 25/4 AP = ORISA PE A? neMUCIONS Tea (P= OSI XAT S P2:Tep/P2) = NXO.75/3 x 10? then N= 1.26 x 108 1.143 MIPS= Clock rate x 10°/CPL MPSP1)=4 x10? x 107/125 =200 MPSF2)=3% 10? x 10/075 = 4000 MPSP1) 10°/.01 x 10° x 10°) =2.97 x 108 “The Second progam has te higher Peomance and the higher MFLOPS fare, GUL the frst rrogram: has the higher MPS figure, Solution 1.15 145.1 T= 35 XO.8= 285, T= 28 +85 + 50+ 30 = 195 5, Rechelion 35K 50 XOB= 405, T= 40 + 80 + 50 + 30= 2005, Redo 4.7% Chapter. Sokitions 1.15.2 & | Wa =200 x08: Tip Ty} Taann = 215 & Tyg = 45 & Reaction tine NT 7% By | T= 210 XOB= 16S Tip Tye Then = 1508 Tyg SSS Reduction tre NISRA 1.15.3 & | Ta =200X08= 1605 Ty Tia * ra 4708 NO Be | T= 2i0x08: ST Tia * ys 2608. NO 1.154 Glock eyles = CPIjp X No. FP instt, + CPline X No. INT instr, + CPI. No. US instr. + CPlyyanch X No. branch instr. Tau = dock cytles/clock rate = clock cycles/2 x 10? bb | 8 processors: Goo QUES = 10245 Ty =O5I2 5 ‘To half the number of clock cycles by improving the CPI of FP instructions: (Phingroved fp * No- EP instr. + CPling X No. INT instr. + CPhy, x No. L/S instr. + (Planch % No. 
branch instr, = clock eycles/2 (Plingroved p= (lock cycles/2 = (CPI, X No. INT instr. + CPI, No. U/S instr. + (Plana, X No. branch instr,)No, EP instr. ‘a [1 OCeSSOr CPlrraasy = (4096 — 762/560 <0 not POSS Ba | 8 rOCeSEOTS: OM grams y= G12 ~944)/E0 MOL pOSSEIE 1.15.5. Using the clock cycle data from 1.15.4: ‘To half the number of dock cycles improving the CPI of L/S instructions: (Ply x No. FP instr. ++ CPlizy % No. INT instr: + CPlinprened is * No. L/S instr. + (CPlppanch X Nos branch instr. = clock eycles/2 (CPlinprosed is = (dock eycles/2 — (CP ly x No, FP instr, + CPlig X NowINT inst + (CPhiyaneh * No. branch instr.))/No. L/S instr. si7 1 process Proud a= 096 = 9072/1280 = 08 1 PROCESSIONS: Praed 1,156 Clock cyles= CPlj, No, FP instt,+ CPlizy No. INT instr, CPly,%No, L/S instr: + (CPleyarcs Noe branch instr Tey = lock cycles/clock rate = dock cycles/2 x 10° CPI) =0.6% 1 = 06; CPI,,= 07 x4=. ' PFOOSSGONS Ty (BETO MOK) = OSI 6; Ty (GNF MIPIOM) = 0.542 5 Solution 1.16 1.16.1. Without reduction in any routine: {tal time 2 proc = 185 ns ‘otal tine 16 proc = 34 ns Reducing time in routines A, C and E: 2 proc: WA)= 17 ns, TO) = 85s, TE) = 4 ns, tal time = 78.Gns —> veda = 29% 16 proc: TA)= 3.48, TO = 1.16.2 2 pr00: NE) = 72s, Ota ine = 177 ns —> eon =A 16 pros: TS) = 12.6 ns total ire = 325s => redoion =A 116.3 2 PROC: NO) = GSS, (al tine = 17ENS => ATINN=S-7H 516 roo! ND) = 10S rs, Wal tne =S2.8re — reac = 3S Chapter. Solutions 1.164 Cad ‘# Processors — | Computing time oO Routing time ratio 2 176 a 95 055 1s, 49 Ost aL 16 2 Ost 12 2 4 oar 205) oe 65 06 Ls 1.46.5 Geometric mean of computing time ratios = 052, Multiply this by the ‘computing time for a 64-processor system gives a computing time for a 128 processor system of 3.4 ms. Geometricmean of routing time ratios = 1.19, Multiply this by the routing time for a 64-processor system gives a routing time for a 128-processor system of 30.9 ms. 1.46.6 Computing time = 176/0.52 = 338 ms. 
Routing time = 0, since no com= munication is required. F Solution 2.2 Chapter 2 Sokitions 2.2.2 F z a [a7 234 23.5 Solution 2.4 244 24.2 243 B 244 Chapter 2 Sokitions 245 & | nochange tt | nochange 24.6 a | 5.25 writen, Sminenaaty 1b | 2.25 writen, 2mineneaty Solution 2.5 2.5.1 a ‘Aokress Data temp = ArrayE3; 2 1 arrayl3] = Arrayl2 I; 8 6 arrayl2] i 4 4 arrayll] 3 ° 2 arraylO] be ‘aouress Data emp = Arrayl41s 16 1 arrayl4] 2 2 array[O] = temp: 8 3 tem = Array[31: 4 4 arrayl3] = ArrayLL: ° 5 ArrayEi] = temps 252 a ‘Access Data temp = Array[3s Tw $10, 12¢856) 2 1 Array[3] = Arrayl2]; | Iw $ti, 8CSs6) 8 6 array[2] = Arrayl1]; | sw $tl, 12¢6s6) 4 4 ArrayLl] = Array[O]; | Iw $ti, 4Css6) ° 2 Array] = temp: sn St, 8Cs6) Tw $tl, OCSs6) Stl, 4¢$s6) $10, C856) be accress Data tem = ArrayL4]; $00, 16(556) 16 1 arrayl4] = ArrayLOl: Stl, c$s6) 2 2 arrayLO] = temps sw $tl, 16(556) 8 3 sw $10, O¢$s6) 4 4 temp = ArrayL3]: 0 5 arrayl3] = ArrayLil; | Iw $00, 12¢$56) arrayl1] = temps Tw $ti, 4¢$s6) sw $tl, 12¢856) sh $10, 4($56) 25.3 a] Ades Daa | tam - Arraylal: Tw $0, ‘Sips nstrucns, +4 2 1 Arrayl3] = Arrayl21: Tw stl, a mips ist for every nary 8 6 Arrayl2] = Arrayl11i sw $tl, 1 ato offset h/sW Bair 4 4 ArrayLi] = ArraylO1: Tw stl, 4 ‘Cimips inst.) ° 2 Arrayl0] = temps sw $tl, ai Tw stl, ai sw $tl, 4 si $t0, 0} BAe oats | cen = Arrayl; Tw $0, i ‘Bmips nstrucons, +4 16 1 Arrayl4] = Arrayl01s Iw $tl, a mips fst for every nary 2 2 array] = temp: sw $tly I ato offset h/sW Bair 8 3 sw $0, 0 ‘Cimips inst.) 
4 a enp = ArrayL3]5 ° 5 Arrayl3] = Arrayl11s Iw $0, 12¢ arraylil = temp: Tw $tly 4 si $tl, 1 SH $0, 4 254 @ | 205418806 ws | suse0ran2a 25.5 a pattess Daa reess: Dat 2 2 2 B 8 a 8 ES 4 56 4 a ° 78 o 2 B notte Daa reess: Data 2 be 2 og 8 a 8 0 4 © 4 a ° oa o be Solution 2.6 264 alin sub add wb | acd lw add lw Chapter 2 Sokitions 2.6.2 2.6.3 2.6.6 adi $50, $50, $51 a a ripe [0 | 36 | a9 | 16 » P| te | ee acai $56, 56, <2] we | 8 | 2 | 2 20 adi_i56, #6, tsi [Rape | oO | 2m | 7 | 2 ww $50, 0650 | roe | 5 | 2 | 16 a 273 wb | FFFFS: a | FFF b [a 27.6 b | FFF Solution 2.8 284 a | TFFFFFFF, 1 b 00, averfiow Chapter 2 Sokitions 2.8.2 a 0 Overton & [ 0, no overtion 2.8.3 ‘a | EFFFFFFF, overfion 284 ‘| overtiow we | ne cverion 28.5 ‘@ | no orton we | ne cverion 28.6 ‘@ | overiow & [no wvemon Solution 2.9 294 ‘| overtiow we | ne cverion 2.9.2 ‘@ | overtiow we | ne cverion 2.9.3 ‘@ | mvertiow | ovation 294 ‘@ | moveriow b. [ moveriow 2.10.3 ‘@ | At0so004 ws, | ancaoo40 Chapter 2 Sokitions = REOB, 7 OO, PSSOKI3, OS, inm=OxA ‘2920022012 2366177216 2.41.3 sw $13, -4(850) be Tw $00, ~64Cst0) 2414 Rare Fee a [add $vi, Sat, # b [sw Sal, 4(850) 241.6 | oxanssao04 Solution 2.12 2424 ee a[me[ os ]sf]3 ifs 6 total its = 26 a total its = 32. welts = 28 8/5 wavs = 26 ‘@ | ssregsiers > ss its par NStUCtIN — could recuoe ove Sa ess regsters — more regter spills mare instructions ia | smater constants — more ki nstrucions — could naease Oxde See | ‘Selle Constants ~ smaller noades —> smal Oxde Swe 2124 ‘| 17367056 | 23e6n77208 Chapter 2 Sokitions 212.6 a | Rape 02030, 0-8 Be | type, 09-003, =O Solution 2.13 243.4 a | OS77S577B wb. 
| OxFEFFFEDE 2.43.2 ‘a | cxSS555560 wa | OxEADFEEDO 2.13.3 a | OXGFEFFFFF Solution 2.14 2441 a |] ad std, $0 srl stl, 5 andi Stl, OxOOltf tt b | add st, $0 si] stl, 10 andi StL, Oxf fF #8000 2.14.2 ‘@ | add sti, st, $0 andi $tl, $t1, Oxooodoor [add Stl, $00, $0 srl $tl, Stl, 14 andi $t1, $t1, 0x0003c000 2.14.3 a] add sti, sto, $0 sr] $tl, $t1, 28 [add Stl, $00, $0 srl Stl, $tl, 14 andi $t1, $t1, x0001c000 2144 a] add $t2, $00, $0 srl $t2, $2, and $t2, $t2, Ox0ggo003t and Stl, Stl, Oxfffff cd ori Stl, $t1, $02 Badd $12, sto, $0 sl] $t2, $t2, 3 and $t2, $t2, Oxoofcood and $t1, $t1, oxtffosrtt ori Sti, $ti, $t2 2445 ‘a [acd $12, $00, 60 and $2, $t2) dxoooooar and $tl, §tl, Oxfrrrrred ori Stl, stl, st2 [add $12, $10, $0 sll $t2, $t2, 14 and $12, $2, Qx0007000 and $tl, $l, Oxrrrasrrr ori Stl, Stl, st2 am [add $12, $00, 0 srl $12, $12, 29 and $12, $12, Oxooggo003 and $11, St, OxfFrrfrrc ori_ Stl, stl, $12 be [add $12, $00, $0 srl $t2, $12) 15 and $2, $2) dxooddc000 and $tl, §tl, Oxfrrrarrr ori Stl, stl, st2 Solution 2.15 245.4 ‘@ | oxoondase) | @xoorrsa66 2.15.2 ‘a [nor Stl, $12, $12 and__$tl, $tl, $3 wm pxor Stl, $i, Sta nor __$tl, $ti, $tl 2.15.3 and tl, stl, $t3_| Goaa00 a1001 01 @ [nor Stl, $12, $t2 ] 000000 01010 1010 Gl00] 00000 100 1901 00000 100100 be [xor Sti, $12, $03 | Go0000 O10I0 ox nor _$ti, sti, st1_| coaaa0 i001 01 2154 ‘am | x00000220 we | oxco003234 2.15.5 Assuming Stl =A, $2=B, $s1 =base of Array C a Tw $51) and 2, 8 B beq Stl, $0, ELSE add $tl, $02, $0 beq $0,°80, END ELSE: Iw $02, O¢$s1) EN: 2.15.6 a Tw $03, Yooo!1 10001 lari cooqaeqcq0qa0000 and $tly (009000 o1G10 aiai1 a1o01 aaada 100100 B beq Stl, ‘a0a700 1001 co000 coooaaqGca0q0I0 add tly ‘090000 1610 cocda a1001 aaa00 1G0000 beq $0, ‘00100 Go000 coGe0 acocacaacaa0a001 ELSE: Iw §t2, 100011 10001 1010 cooeaoacacaaca0a EN: ‘@ | all, 16000 10 OTF ba, | 06000 DORE 216.3 @ [1um—9, teen B. 
[ jum—n9,tee—n0 Chapter 2 Sokitions 2.164 = [so=2 | m [so22 J = [se=0 | wm [sons J [ioe beav0 | ty [homes bares J Solution 2.17 247A. The answer is really the same for all All of these instructions are either supported by an existing instruction, or sequence of existing instructions. Looking for an answer along the lines of, “these instructions are not common, and we are only making the common case fast’ 2.47.2 ‘| could be either ype of ype sub $12,$zer0,813 f= - 3 ble §t3,$zero,dane if if 13 €0, result is t2 add $12,8t3,8zero if 13> G, result is 13 DONE: b [ste Stl, 903, 512 2474 a [2 AB >| inca Dal=0 +2 sta, $0, 0 $0,” $0, "TEST $50, $50, $51 sta, sta, 1 siz, sta, 10 st2, $0," LoaP $12, $50, 10 St2, $0, DONE 513, $51, $50 S12, $50, 2 SUZ, $52, $12 513, 12) $50, $50, 1 J Loo? oon 2.18.3 ‘| Sinstuctons to mplementand 44 structions executed 1b | Sinstuctons to mplementand 2 nstrclons excuted 2.18.4 @ [ot b [SOL 2.18.5 am | forci=i00; DO; i==71 result += MenArraylsOl: so wb. | fori: TAM, 2d" result == MenirraytsO + 11; result == MemirraylsO = i + ) 2.18.6 a addi $t1, $50, 400 Lop: Iw $51, 0¢$s0) add $52, $52, §s1 addi $50, $50, 4 Ine $50, $1, Loop | ready reduced to minimum insructons Solution 2.19 ssp, $50, sro, O¢SSp) Ow sub $y0, $20, $a in ra we | fetter ‘061 $5p. Sp, —1 sw §ra, 12Césp) sw $50, a¢Ssp) sw $51, 4¢6sp) Sw $52, O¢SSp) add $50, $20, $0 dd. $51) Sal; $0 dd $52) $22) $0 add $10, $51, $0, ne $52) $0,"exit add $20, $50, $51 add Sal $50, $0 add $02) $52) fal . - Iw O¢$sp) Ww 4C$sp) Ww acssp) Ve audi in Chapter 2 Sokitions ssp, acssp) $20, $a 80,1 80, exit 80, $10 $0, exit 80, $0 sti, $0 ocssp) ssp. 
4 Sra, ssp, sra not possible for the MTF err “4 contents of register $ra after cating function sub: old §sp=> @rrrrrre “4 of register $ra ssp 8 of register $ra_ firetum to ’ | after calling function fi_iter: old §sp> Gxsfrfrrre “4 of register $ra of register $50, of register $51 ssp > of register $52 $5p,55p,-8 $ra,4($5p) $50,0¢5sp) $50,522 tune $20, 5v0 $21,530 fune $ra,4(Ssp) $50,0(5sp) $5p,55p48 $ra 219.5 “@ | We can use the till optimization for the secend call © func, but ten wemustresior Sa nt Sep befare that call, We seve only one instruction (Sra). wb | We can NOT use the tal call optimization here, because the vale reumed from f 1S not equal {0 te vale retried by the last call © FUNC. 2.19.6 Register $ra is equal to the return address in the caller function, registers ‘$sp and $s3 have the same values they had when function f was called, and register ‘$t5 can have an arbitrary value, For register $t5, note that although our function does not modify it; function func is allowed to modify it so we cannot assume. anything about the of $t5 after function f unc has been called. Solution 2.20 $sp+ §5p, Sra, 4C$5p) $20, a¢$sp) $50, $0, $20 $0, $20, 510, 60, $v0, $0, 1 Ssp, §5p, 8 $ra $20, $a, -2 Fact $v0, $50, $¥0 $20, 0¢$sp) Sra, 4¢$5p) Ssp, $5p, 8 sra Chapter 2 Sokitions Ssp, ~8 ACS5p) ‘acssp) 50, $20 $20, 2 80, "LI $0, 1 $sp, 8 $20, -1 $50, $0 acssp) ACSsp) ssp, 8 ‘a | 25.MPS nstrucions to cove nonrecurSNeWS. 45 nsYUCtONS to excowte(coTeCted versIN| oprecursion Nenrecursve version: addi §sp, $sp, ~4 sw $ra, 4(S5p) add acd sTti $10, ine $10, mul $52, addi $50, J Loo" add $0, Ww Sra, addi §sp, or $ra 1. | 25MPS insiuctions to excovte nonreoursive vs. 45 instructions io excoute(coected version ‘preousion Nenrecursve version: addi $sp, Ssp. 
-4 sw $ra, 4C55p) add $50, $0, $20 add $52, $0, $1 Log: siti sua, $50, 2 ine $00, $0,’ DONE mul $52, §s0, $52 addi $50, $50; -1 J Loop’ DONE: add $v0, $0, $52 Ww Sra, acésp) addi §sp, Ssp, 4 or __$ral 2.20.3 ‘@ | Recursive version FacT: addi $sp, $sp, sw $ray 4($5p) sw $a, O¢Ssp) add $50, $0, $20 HERE: siti $t0, $a0, 2 beq $0, $0,°L1 addi §¥0, $0, 1 addi §sp, $sp. 8 ir $ral Liz addi $30, $20, jal FACT mul $40, $50, $0 Tw $20, 0¢S5p) Tw $ray 4($sp) addi §5p, $sp. 8 ir $ral old $59 >" Oxnnnonnnn ~ ssp > ay old $59 >" Oxnnnonnnn ~ ay 72 ssp > ne ‘at label HERE, after calling function old §sp >” Oxnmnannnn ssp > old $sp > ssp > ‘Oxonnnnnna ~ ay 72 ne 20 ‘at label HERE, after calling function FACT with input of 42 r $ra ter $20, s of regis! ts of regis ‘at label HERE, after calling function FACT with input of 3: of regist of regis of regis of regis of regis of regis ‘at label HERE, after calling function FACT with input of 1 of regist of regis of regis of regis of regis of regis of regis of regis Chapter 2 Solutions | Recursive version FACT: addi $sp, si Sra, su Sa, add HERE: siti beg addi addi in li: addi dal fact Mul $¥0, $50, $¥0 Tw $20, OCs) Tw Sra, 4(Ssp) addi $5p. $sp. 
8 ir $ral old §sp =>" Oxnnnannnn ssp > 3 old §sp =>" Oxnnnannnn 3 2 ssp > a6 Sta $59 => Oxnnmnnnn 3 2 6 20 ssp > 24 old §sp =>" Oxnnnannnn 3 2 6 20 24 5 $sp > = fat label HERE, after calling function FACT with input of contents of register $ra contents of register $20 fat label HERE, after calling function FACT with input of Tabel HERE, after calling function FACT with input of s of regist 5 of regi 5 of regi 5 of regi 5 of regi 5 of regi fat label HERE, after calling function FACT with input of 2.204 @ fis: addi Sp, = acs5p) 4CSsp) acssp) $20, 3 80," 80, 1 uy $20, -1 $0, $0 $30, -1 $40, $51 exit: ocssp) 4(Ssp) 5p) $sp, 12 $sp, ~12 C559) 4C5sp) acssp) $20, 3 80," 80, 1 uy $20, -1 $0, $0 $20, -1 $00, $51 EXIT: Iw $20, 0¢$sp) Tw $51, 4(5sp) Tw $ray 8(5sp) addi §5p, $sp, 1 Chapter 2 Sokitions 2.20.5 er FIB: addi SH addi addi siti ine add $sp, Sra, ssi, $52, sta, 510, $53, ssi, $52, $20, Loo add Ww addi ar $0 Sra, Ssp, Sra ‘a | 23MPS istuntions to ecaute nonrecurshe vs. 73 nsiUctons lo excoWte (coTeCtEd VES oprecursion Nonrecursve version Ssp. ~ sp) $0, 2 $0) 1 $20, 3 50,’ ExT ssi, $0 sl, $52 $53, $0 $20, -1 sl, $0 sp) ssp, 4 er Fis: acd addi acai siti ine add acd acd acd ssp, Sra, ssi, $52, $20, L008 add Ww acdi ar 80, Sra, ssp, sra ’b | 23MPS instutions © @caile nenrecurShevS. 73 netuctons excoute carected vas recursion Nenrecursve version: 2.20.6 calling function FI8 with inpy calling function FI8 with inpy FIB: addi $5p, $sp, =: sw $ra, (5p) sw $51, 4(Ssp) sw $a, O(Ssp) siti $0, $20, 3 beq $0, $0,°L1 addi §¥0, $0, 1 3 ext addi $20, $a0, “1 jal FIB| addi $51, $¥0, $0 addi $20; $40) =1 jal FI ’dd $40, $40, $52 Iw $20, 0¢$sp) Tw $51, 4(Ssp) Tw $ray 8¢Ssp) addi §sp, $5p, 12 ir $ral ‘at Tabel HERE, aft old $5p ‘Oxonnnnnna “1 = ssp > w | recursive version FIB: addi $5p, $5p. 
~ sw $ray (S50) sw $51, 4(Ssp) sw $a, OCSsp) HERE: siti $00, $20, 3 beq $0, §0,°L1 addi §¥0, $0, 1 3 ExT Lis addi $0, $20, -1 jal FIB addi $51, $40, $0 addi $20, $30, -1 jal FI add $40, $¥0, $51 EXIT: Iw $20, 0¢$5p) Tw $51, 4(Ssp) Tw $ray (sp) addi §5p, $5p, 12 ir $ral ‘at Tabel HERE, aft old $5 (Oxnnnnnnnn “4 4 Chapter 2 Sokitions Solution 2.21 221 @ | after entering function main: ald $sp Gurrffrre 222 after entering function leaf_function old $5 wrrrrffc =A $sp 8 Sp => “4 contents of regist of regist of regist er er er sra sra Sra (rewum to main) By | after entering function main: eld $5 axerrrrte ssp “4 after entering function my f eld $sp averrrttc “4 “5 ssp = global_pointers ‘@x1o008000 00 my_ global of regist of regist of regis’ sra sra Sra return to main) 2.21.2 a | Hal ‘addi $5p, sw $ra, addi $20, Jal LEAF Ww Sra, (sp) addi $sp, Ssp, 4 dr fra addi $5p, $sp, sw $ra, AC$5p) sw $50, OC$5p) addi $50, $20, 1 siti $2, 5, $20 ine $12, $0, DONE add $20, $50, $0 Jal LEAF add $v, $50, $0 Ww $50, Oc$p) Ww $ra, A¢$sp) addi Ssp, Ssp, 8 ar fra sa7 Wal $5p. (Ss) ($50) assume $50 has global variable base $v0 $0 (ssp) $5p, 4 FUN sub $¥0, $a0, Sal $5p. (Ss) 80,1 (ssp) $5p, 4 $sp, 4(359) acssp) $20, 1 5, $20 $d, DONE ss0, $0 $50, $0 Css) 4C5sp) $5p, 8 has global variable base Chapter 2 Sokitions 224 Register $30 fs used to od a temporany result Without saving $S0 frst TO comect this Breblam, S10 (or $40) shculd be used in place Of $50 the rst two sructions, Note tat a ‘Subgptinal SOMtCN WOuld Le 1 ConUTWe USI $S0, DU ac Code to seve/TeSIEE “The MOadd NSTTUCIONS move the slad< Ponte the wong dection. Note at the MPS: caling corvention requres te S120« to ggOW down. Even if We Siac gro UP, is caKe WOU be incorect because Sra and $80 are saved acooring 0 te Stac-grO"s down COTMETTION. 
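The stack-grows-down convention discussed above can be illustrated with a small Python model. This is a sketch under stated assumptions: the `Stack` class, the starting address 0x10000, and the saved values are hypothetical, and memory is modelled as a dictionary of word addresses rather than real MIPS memory.

```python
class Stack:
    """Toy model of a downward-growing MIPS-style stack (4-byte words)."""

    def __init__(self, top=0x10000):  # hypothetical initial $sp
        self.sp = top
        self.mem = {}

    def push(self, *values):
        # Prologue: addi $sp, $sp, -4*n, then sw to 0($sp), 4($sp), ...
        self.sp -= 4 * len(values)
        for i, value in enumerate(values):
            self.mem[self.sp + 4 * i] = value

    def pop(self, count):
        # Epilogue: lw from 0($sp), 4($sp), ..., then addi $sp, $sp, 4*n
        values = [self.mem[self.sp + 4 * i] for i in range(count)]
        self.sp += 4 * count
        return values


frame = Stack()
# Save $s0 at 0($sp) and $ra at 4($sp); both values are made up.
frame.push(7, 0x00400010)
print(hex(frame.sp))   # $sp moved down by 8 bytes
restored = frame.pop(2)
print(restored, hex(frame.sp))
```

Moving `sp` upward in `push` (the bug described above) would make the stores land outside the space the procedure reserved, which is why the direction of the adjustment and the offsets used by the saves have to agree.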
2.21.6 ‘& | The function retums 842 (which is 2 (1 =30) + 1000 = 100) 1b | The function retums 1500 (ga, b) is 500, s0 it etums 500 + 1000) Solution 2.22 2.224 @ | 65 20 98 121 116 101 mp SD igi 114 2.22.2 ‘@ | u0041, u+0020, Us0062, U0079, U0074, U+0065 h | U0063, U+06r, U+006d, U+0070, U+0075, U+0074, U+0065, U+0072 2.22.3 @ | add tb | shit Loge: DONE: FIRST: sp, ~4 (sp) 80, 030 50, 0x39 # 19" 50, $0 sa, $0 cso) Sel, $16 80, DONE St7, $1 80, DONE sel, $6 80, FIRST $50, 10 $50, §t1 st0, 1 $50, $0 (sb) $5p, 4 Loo HEX: ec: Dan FIRST: $5p, (sp) 80, Qx41 i tat 50, O45 i 1FY 50, 030 # 10" 50, 039 #19" sly 80,’ FIRST $50, 10 $50, $12 st0, 1 add $v0, $50, $0 Tw $ray ($59) addi S5p, Sp, 4 ir $ra Chapter 2 Sokitions Solution 2.24 2.24.1 ‘@ | cxoo000012 wb [ onzrrrere 2.24.2 ‘a | 00000080 wm | 20000000 2.243 b [ oaTsss655 Solution 2.25 2.25.4. Generallyall solutionsare similar: ui sta, ori stl, $1 2.25.2 Jump can go up to OXOFFFFFFC, a [© ® [r0 2.25.3 Range is 0x604 + Ox FFFC = 0x0002 0600 to 0x604 — 0120000 = OxEFFE 0604. by fies 2.25.4 Range is 0x0042.0600 to 0003E 0600, » [ro 2.25.5 Generally, all solutions are similar: add Stl, zero, $zero clear $t1 addi $12, $zero, tomAbits set top 8b sl] $12, $02, 24 isnift left 24 spots or Stl, Stl, $12 place top 8b into $t1 addi $12, $zero, mxtl@pits iset next 8b sll $12, $12, 16 isnift left 16 spots or Stl, Stl, $12 place next & into $t1 addi $12, $zero, mxt2 pits iset next 8b sl] $12, $12, 24 isnift left 8 spots or Stl, Stl, $12 place next & into $t1 ori $tl, $tl, bo@bits — ior in bottom 8b 2.25.6 a | ox12345678 by | oxiz340000 2.25.7 @ | 20 = (Oxi234 << 16) |] Oxs678: be | cd = (00 || Oxs678); 0 owiell ie Solution 2.26 2.26.1. Branch range is 0x00020000 to OxFFFED004. @ | oebench by | thee branches 2.26.2 by | can't be done 2.26.3 Branch range is 0x00000200 to OxFFFFFEO4. 
eit branches B ‘S12 branes | Chapter 2 Sokitions 2.264 trench range is 16% loner be branch range is 16xsmaler 2.26.5 a no cree be “ump to aciresses 010 2” instead of 0 to 2 assuming the PO=0" condition for te mer loop is AeC-ed. ‘This conan is Crected atoll oF 5 tines (whenever snap Is called, psa total of 10 res {0 ext the Emer logp once n each eration oF re cuter Keon), so We execute 55 MSHUGIONS fever. Solution 2.33 2.331 ‘@ | ind: move $v0,$zer0 eq $v0,5a1 done sl $t0,5v0,2 add $t0, 500,520, lw $t0,a¢st0 be $t0,522,skip ir $ra addi $¥0,$¥0,2 b 1009 2 1i $y) irra we | count: move $¥0,$zer0 ‘addi 00, $0, b Toop ir_sra 2.33.2 ‘a | int findGint *a, int ny int 01 for(p=azp!maenip+4) fp) B | ine count(int *a, int ny at int ress; 2.33.3 & [inc nove $00,500 etl stls$ai.2 StI/stl;sa0 $10, Steere si2iacsioy $12,802, skip $10" $10,820 $10"80,2 & sia, szero sia;sa0 sulisen.2 StI/stl;sa0 $10, Steere si2iacsioy $12,802, skip $10,80001 sto;siosa oop sre 2.33.4 EE Eee = 7 5 be 8 6 2.33.5 EE eee = ri 3 be 2 3 2.33.6 Nothing would change. The aode would change to save all tregisters we use to the stack; but this change is outside the loop body. The loop body itself ‘would stay exactly the same. Solution 2.34 2341 a addi $50, $0, 10 Loge: add. $50; $50, $51 $50, $50, -1 $50, $0, Lo0e b SI] $51, $52, 28 srl $52) $52) 4 or $51) $51) $52 2.342 ‘@ | ADD, SUBS, MO/—oll ARM registeregister instruction forme ENELan ARM branch instrucon format a | RoR an AR regisiowegisir suction format 2.34.3 a curd, rl BHI FARAWAY B ADDO, ry Fz 2.344 ‘@ | Oven ARM resisieregster instruction fomat EMILan ARM branch nstuctin format bb [ ADD—en Ar registerregister instruction format Solution 2.35 2.351 ‘@ | egsier operand wa | resister + offset and update register Chapter 2 Sokitions 2.35.3 a addi $50, addi $51, xor $52, ADDLP: Tw $54, addi $52, addi $50, addi $51, $51, «1 ine $51, $0, ADDLP & ST $51, $52, 28 srl $52, $52, 4 or $51, $51, $52 2.35.4 ‘a | BARNS BMPS reTUCIONS by [ LaRWS. 
3 MPS structions 2.35.5 ‘| ARMOR? tines fact as MPS wa | ARW 2 tines fast as MPS Solution 2.36 2.36.4 = si $51, $51, 3 add_$53, $52, $51 b si] $54, $51, 2 srl $51, $51; 3 or $51, $51; $54 add_$53, $52, $51 2.36.2 ‘| adi $53, $52, wb | addi $53, $52, 2.36.3 a sl $l, $1, 3 add $53, $52) $51 & Su $54, $51, srl $51, $51, 3 or $51, $51, $54 add $53, $52) $51 2.364 [add 13, 02, b [add 13, 72, ox@000 Solution 2.37 237.1 a mov ed, Lesind¥ebx] echemnenory(ests4*ebx) wb | staRT: mov ax, 001011005 nar ax = 0010110005 mov cx, 000000115 nar bx = 121200005, mov bx, 11110000) nar ex = QOQ0OOLIb: and ax, bx ax = ax Bk bx; or ax, ox ax = ax | os 2.37.2 a SI $52, $52, 2 add $34, $54, $52 Iw $53, (sd) wb | START: addi $50, $0, Ox2 addi $52, $0, 0x03, addi $51, $0, Oxfo and $50, $50, $51 or $50, $50, $52 2.37.3 @ | nov edx, [esisttebx] SLES b | add eax, 0x12345678 4,4,1,32 2374 ‘@ | addi $00, $0, 2 sl} $20, a0, $t0 add $20; $20; $al Tw $v0, 0320) wb | tui $20, Oaaae ori $20, Ox5678 Solution 2.38 2.38.1 ‘Tis insructin copies ECX bytes rom an are pointed to by ESI to an aray pointer by EDL. ‘example C litany function that can easily be implemented using this instruction fs memcpy. “This insruction copies EOX elanents, where each leet is 4 bytes nSize, roman aray parted toby ESI to an aay porter by EDI Chapter 2 Solutions 2.38.2 am | loon wm | Toop: Tw $t0,0¢baz) sw $t0,0¢5ai) ‘addi $20, $30,-1 addi $21,414 addi $22,824 bnez $20, 1000 i SS ———— EE ee a] f: add $00,520,521 MPS:2%4=8 bes Sr__$ra 286: Li pfs: [fw $t0,0Csa0) MPS: 6x4 =24 bes Ww $t1,0¢$a1) 286: 19 bes add $10, $11 Sa 'ocsa0) sw $00,0¢$a1) sr__$ral 2.38.5 In MIPS, we fetch the next two consecutive instructions by reading the next 8 bytes from the instruction memory. In x86, we only know where the second instruction begins after wehaveread and decoded the first one, so itis more dificult +0 design a processor that executes multiple instructions in parallel. 
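The difference described in 2.38.5 can be sketched in Python. The encoding below is a toy assumption made up for this example (the first byte of each instruction holds its length); it is not the real x86 encoding. With fixed 4-byte instructions every start address is known up front, while with variable-length instructions each start depends on decoding the previous instruction.

```python
def fixed_length_starts(pc, count, width=4):
    """Start addresses of `count` fixed-width instructions: no decoding needed."""
    return [pc + i * width for i in range(count)]


def variable_length_starts(code, length_of, pc=0, count=2):
    """Start addresses when each instruction's length is only known after decoding it."""
    starts = []
    for _ in range(count):
        starts.append(pc)
        pc += length_of(code[pc])  # sequential dependency on the previous decode
    return starts


# Toy encoding (hypothetical): the first byte of an instruction is its length in bytes.
toy_code = bytes([3, 0, 0, 1, 2, 0])

print(fixed_length_starts(0x1000, 2))
print(variable_length_starts(toy_code, lambda first_byte: first_byte, count=3))
```

The list comprehension in `fixed_length_starts` has no dependency between iterations, whereas the loop in `variable_length_starts` cannot compute the next start until the current instruction has been decoded.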
2.38.6 Under these assumptions, using x86 leads to a significant slowdown (the speed-up is well below 1): a = 2 Fry 0.8 ] & é 19 02 | Solution 2.39 2.39.1 ‘@ | oss seconds ba. | ax7eseomas 2.39.2. Answer is no in all cases. Slows down the computer. CCT = dock cycle time ICa= instruction count (arithmetic) ICls= instruction count (load/store) 10>= instruction count (branch) new CPU time= 0.75 x old ICa x CPla x 1.1 x oldOCT + oldICs x CPils x 1.1 x oldCCT + oldICb x CP Ib x 1.1 x oldCCT “The extra clock cycle time adds sufficiently to the new CPU time such thatiit is not quicker than the old execution time in all cases. 2.39.3 @ | 13.16% LAs | s0886% 10.64% 2.394 als ba | oxrieccscar Chapter 2 Sokitions Solution 2.40 2.40.1 ‘a | Ine frst Reation S10 is Oand the Wy felches a{0} After that $10 1, the hvuses a nore aligned access tegers a bus ercr, 1b | In he fst eatin $10 and $e point to [0] EO} so the Iv and sw structions acoess AO), YO, and then a{0] as intended. Inthe secand iteration $10 and $e point tothe res byte a{0] end bf] respective instead of pointing toa] and bE]. Thus the frst hy uses a nore aligned address ard causes a bus mar. Note that fe computation for $12 (ackess fala) oes not cause a us eer becaLke thet address isnt actualy used to access menor. 2.40.2 ‘a | Yes, assuming that is asigrederded bye valve between 128 and 127. xis simply be ‘ale betveen 0 and 255, the function proved only veri if nether xnor array a contain values outside the rarge of 0.127. | f? 
    move $v0,$zero
    move $t0,$zero
L:  sll  $t1,$t0,2     ; We must multiply the index by 4 before we
    add  $t1,$t1,$a0   ; add it to a[] to form the address for lw
    lw   $t1,0($t1)
    bne  $t1,$a2,S
    addi $v0,$v0,1
S:  addi $t0,$t0,1
    bne  $t0,$a1,L
    jr   $ra

b | f:
    move $t0,$a0
    move $t1,$a1
    sll  $t2,$a2,2     ; We must multiply n by 4 to get the address
    add  $t2,$t2,$a0   ; of the end of array a
L:  lw   $t3,0($t0)
    lw   $t4,0($t1)
    add  $t3,$t3,$t4
    sw   $t3,0($t0)
    addi $t0,$t0,4     ; Move to next element in a
    addi $t1,$t1,4     ; Move to next element in b
    bne  $t0,$t2,L
    jr   $ra

2.40.4 At the exit from my_alloc, the $sp register is moved to "free" the memory that is returned to main. Then my_init() writes to this memory to initialize it. Note that neither my_init nor main accesses the stack memory in any other way until sort() is called, so the values at the point where sort() is called are still the same as those written by my_init: ala G2 0 0 0 als & 3 2 7

2.40.5 In main, register $s0 becomes 5, then my_alloc is called. The address of the array v "allocated" by my_alloc is 0xffe8, because in my_alloc $sp was saved at 0xfffc, and then 20 bytes (4 x 5) were reserved for array arr ($sp was decremented by 20 to yield 0xffe8). The elements of array v returned to main are thus a[0] at 0xffe8, a[1] at 0xffec, a[2] at 0xfff0, a[3] at 0xfff4, and a[4] at 0xfff8. After my_alloc returns, $sp is back to 0x10000. The value returned from my_alloc is 0xffe8 and this address is placed into the $s1 register. The my_init function does not modify $sp, $s0, $s1, $s2, or $s3. When sort() begins to execute, $sp is 0x10000, $s0 is 5, $s1 is 0xffe8, and $s2 and $s3 keep their original values of -10 and 1, respectively.
The sort() procedure then changes $sp to 0xffec (0x10000 minus 20), and writes $s0 to memory at address 0xffec (this is where a[1] is, so a[1] becomes 5), writes $s1 to memory at address 0xfff0 (this is where a[2] is, so a[2] becomes 0xffe8), writes $s2 to memory at address 0xfff4 (this is where a[3] is, so a[3] becomes -10), writes $s3 to memory at address 0xfff8 (this is where a[4] is, so a[4] becomes 1), and writes the return address to 0xfffc, which does not affect values in array v. Now the values of array v are: al 0 5 ote 7 1 Gtfea 7 1 pl so

2.40.6 When the sort() procedure enters its main loop, the elements of array v are sorted without any interference from other stack accesses. The resulting sorted

Unfortunately, this is not the end of the chaos caused by the original bug in my_alloc. When the sort() function begins restoring registers, $ra is read from the (luckily) unmodified location where it was saved. Then $s0 is read from memory at address 0xffec (this is where a[1] is), $s1 is read from address 0xfff0 (this is where a[2] is), $s2 is read from address 0xfff4 (this is where a[3] is), and $s3 is read from address 0xfff8 (this is where a[4] is). When sort() returns to main(), registers $s0 and $s1 are supposed to keep n and the address of array v. As a result, after sort() returns to main(), n and v are: a [mis S0Visa Lelementaray of ntessrs hat begns at address 5 B | 55 Sovisa Selement array of tesers that begns at address 5

If we were to actually attempt to access (e.g.
print out) elements of array v in the main() function after this point, the first hy would result in a bus error due to non-aligned address, If MIPS were to tolerate non-aligned accesses, we would print ‘out whatever values were at the address v points to (note that this is not the same address to which my_init wrote its values), a | 7620 3606 31.2 a | 72 wb, | 3625 3.1.3 a [277 we b, | 205 108 314 = | 730 , | 1560 315 = | 730 wb, | 1560 316 ‘a | 101110010 b, | oxonto100110 | ‘The attraction is that each octal digit contains one of 8 different characters (0-7). Since with 3 binary bits you can represent 8 different patterns, in octal each digit requires exactly 3 binary bits. You can write down the conversion directly. $70 Chapter 3 Sokitions Solution 3.2 wb. | 4765 A877 ‘The attraction is that each hex digit contains one of 16 different characters (0-9, A-E). Since with 4 binary bits you can represent 16 different patterns, in hex each digit requires exactly 4 binary bits. And bytes are by definition 8 bits long, so two hex digits are all that are required to represent the contents of 1 byte. S71. Solution 3.3 331 ‘@ | Urderiow 21) b, | Neier 8) 1258, which does not ft into an Bit SM format) :146, which does not ft into an SM format) a | Neher 2) b [Nein £ R a [2108-47 ‘a | 200+ 105= 255 (305) we, | 247 237 = 255 (BA) Chapter Solution 3.4 341 50x23 soe |__| serie | meres |e tials 20000101000 | 000000000000, + [Prad= od-riand | cian | c00 00101 000 | 00000101 c00| (sifted qxooun | c00.001010000 | 00000 401 c00 Fea Wer oxoo1 | c20.01 010000 | 00000 101 co0 2 [ea rross wean | 06x00 | 000001010 00 | ooo 0088 000 (shitvard eoioo1 | c00010100000 | ovo 444. c00 Fea Wer 0x00 | c00 10100000 | ooo 144. o00 3 [iw= 0.000 aaox00 | —a00 010100000 | 00004000, tshitMand 0300 [000 101000000 | coon 144. c00 Fe wer ‘ooo | 000 101 000000 | ooo 444. o00 4 [iw=O.n000 xno [0003101 000000 | 00001 444. o00 tshitvand caooio | co1 010000000 | ooo 144 c00 Rewer ‘ooo | 001 010000000 | ono 4. 
000 3 [ed Fred's and | 00001 | ~oo1 010000 o00 | oxox 344.000 cswtvend xno | 010100000000 | ooo 414. c00 Tener aa0000 | 010100000000 | oon 1,000. © [i@= 0.0000 x00 cot ox0000000 | ono 444 c00 cswtvend ‘0000 | 101 000000000 | oonona x44. c00 Fe Mer x0000 | 101 000000000 | oon 44. 000 Bb. 66x 04 OES ee [anv coox00 | ccooooi01i0 | 00000000000, + [= On0% «0300 | c09000110 110 | 00000000000, Lshitvand «aa0x00 | c00 01 101 100 | 00000000000, Ft er ‘ooo | c00.01 101 100 | 00000000000 2 [i@=0.n000 xooio [009001101 100 | 00000000000, sitar ‘ooo coo 11 011 000 | 00000000000, Few wer ‘ooo | c00.11 011 000 | 00000000000 Pod=Prod+Meand | 000001 | opox012.000 | 000.011.011.000 ati Mand oer | coosio1i0000 | coooiLoii 000 sift Mer ‘c00000 | coosi0s10000 | _cooo11 O11 000) 4 [Ko=0,n0@ ‘oo0000 | coosios10000 | cooo1L O11 000, Latin Moana e000 | coLiorisoc00 | cod011 011 000, SK Mor ‘e00000 | consonso0000 | 00011 011 000 ‘o00000 | corsoriooc00 | coooiL 011000 e000 | cucics0c00 | 00011 011 000, ‘e000 | cxo1c00000 | 00011011 000 6 ‘e000 | cucicsoc00 | coooiLo11 000 e00000 | io1i0c00000 | 00011 011 000, ‘e000 | 10110000000 | cooo1L 011 000 34.2 a.50X 23 a 0 [ital vais 301,000 ‘een e10 014 a | Prod=Prd+ Mead 301,000 101 00001004, Reni Preaek 101,000 ‘010 100601 GOL 2 [Prod=Prod + Moana 301,000 it up ooL aL Suit Mier 101,000 (G14 440000300 3 [=0.n0@ 301,000 ‘@14 210600 100 eit Mier 301,000 ‘eb 414 c00010 a 301,000 ‘obi 221 600010 301,000 ‘000 411 100001 3 301,000 01141 100001 301000 (010 411 210000 6 301,000 ‘010 414 110600 sit Mer 301,000 ‘epi oii 411.600 S74 Chapter 3 Sokitions b.66x 04 fj sre | |__| rc ta Vas 140310 00000 600300 1 [s=ame 110310 90.000 c00 300 Fete 14030 00000 c00 010 2 [is=0.r00 140310 00000 c00 010 FS MIE 110310 90.000 c00 008 3 [ed Pras Nand 14030 1103100000 co Reni Prakct 140310 031 014 00000 a [m=0m@ 0200 21 011 000000 Fete 140310 001 10410000 = [m=0me 110310 oo1 104 100000 Fst Mer 14030 000110110000 © [m=0me@ 110310 90.110 110000 Fete 14030 00 014 011 000 3.4.3 No 
solution provided 344 a 4x67 = 424 a Tata aves 120000 10 100 | 000.000 000060 tiers OR Nien ° thera ke ste ciara | ceoa00 001100 | eo0000 000000 | O 7 [pross rod-rimana | oxoaax_[ 000000001100 | 000000001 300 | 0 (sited exo | o00.000011000 | 000000001100 | 0 Fs wer ‘axca | 000.000.013.000 | 000000001 100 | 0 2 [Pred Frods and | eau cis | 00000011 000” | o00.000.100100 | 0 tshitverd orca | 000.000.110000_| 000000100100 | 0 Ft weer ‘0x01 | 000000 110000_| 000000 100100 | 0 3 [red = Frese and | oo0301 | 000000130000 | on0 004 010100 | 0 (sited ‘oon | 000.01 100000_| 000001 010100 | 0 Ft wer ‘0010 | 000003 100000 | 000001 010300 [0 ° ° eit Mier ‘00001 | copartca000 | coogi a10100 | Oo 5 [Rod=Pma+wena | 00002 | coor000000 | coDs00wa00 | Oo ati Mand ‘e00001 | c0110000000 | coos00a10100 | 0 eit Mier ‘o00000 | co0210000000 | coos00G10400 | Oo ‘000000 | c00210000000 | co 100010100 | 0 ‘e00000 | cor 100600000 | coos00G10100 | 0 ‘a00000 | cor 100600000 | coosc0G10100 | 0 ‘000000 | 01100000000 | co 100010100 | 0 | Motor | _Mutpticand | Product __| sien} © [rmsanenes coos | oo0G00013 000 | co9o00 000000 | 0 Wakes OR ateteasen ° OxRO Make postive cous | coocono1z ceo | conooocco000 | 0 1 [Proa=mrea+weane | ccoas1 | conoo0cr1000 | oooccooi1000 | Laat Wend eno. 
| coo c00.a10c00 | coo doors 000 | 0 ai eer oor | con co0.a10c00 | condo crs 000 | 0 2 [moa=Peas wend | caoo1s | ooocc0110000 | ecooo1 001000 | Ls Wend ‘noes | 00001 190 c00 | c00004 003.000 | 0 ae ‘0c0x | 00.001 190 00 | co0 004 002.000 | 0 3 [Red=Pec-snend | coooat | 00001 100000 | ecoox0 101 000 | 0 Lawton ‘coocar | 00031 000 c00 | con 10302 000 |_o ‘0000 | 900031 900 c00 | co0 1002.00 | 0 a ‘0 c00 | 00031 000 c00 | con o10403.000 | 0 ‘0 c00-| 900320000000 | co0o10303 000 | 0 air ‘00 c00 | 000330000 000 | aoo.o10402.000 | 0 = [io=0,r00@ ‘0 c00 | 900330000 00 | co0.010103 000 |_o Lea Wend ‘00 c00 | 01300000 c00 | co0.o10403.000 | 0 anther ‘0 .c00 | 01 300000 000 | cooo10,02.000 | 0 S76 Chapter 3 Sohitions po |__| tee ict] 5] S=omm carimenen [onatinan [o a Se eas Savon [nitenownoese [ omonnanton 0 345 54x67 =(-24x-11 = 264) Peep | sen | murpscans | roaeyanrtser > [wae = Senseo [ean inti eer mae ‘oui aon Gi a [pstshe ies ‘ni ‘aot o ram a tapes it ee En ae an ini Poor a iscame a ona nine ii iaioDen 5 [rosea inti pete rea ‘oui iinet = [peichosiead ‘ni Senet ce rai a Sensei 105 Muatpcand [ava a0 (9000 000000 411 1 [rad = Ros ane 110 ats 000000 411 Fe wer 011000 (001 100000 01 2 [ed = Pos Weed i100 (9300 100000 011 Fe oat au10 0010 010000 oor 3 [a= Po We aiL00 (0301 010000 oor Fi er a1L000 (040 101000 000 4 [m= 0000 x10 (0410 101000 000 Fit er i100 9001 010300 000 a = 0.n0@ ‘01 000) (0.001.010 100 000) sit Mier (014.000) (01000 101 010 000) 6 [ko=0,n0@ (1 000) (0000 101 010 000) Reif Mier ‘01 000) (01000 010 101 000) 34,6 No solution provided Solution 3.5 3,5: For hardware, it takes 1 cyde to do the add, 1 cycle to do the shift, and 1 gvce to decide if we are done. So the loop takes (3 A) cycles, with each cycle being B time unitslong. Tora software implementation, it takes 1 qycle to do the add, 1 eyde to do each, shift, and 1 cycle to decide if we are done. So the loop takes (4% A) cydes, with each, oycle being B time units long. 
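The cycle counts in 3.5.1 reduce to a one-line formula each. The sketch below is only a restatement of that reasoning, with hypothetical function names; it assumes A is the operand width in bits (one loop iteration per bit) and B is the length of one cycle in time units, as in the paragraph above.

```python
def hardware_multiply_time(bits, cycle_time):
    """Hardware loop: add, shift, and done-check take 1 cycle each per bit."""
    return 3 * bits * cycle_time


def software_multiply_time(bits, cycle_time):
    """Software loop: add, two separate shifts, and done-check per bit."""
    return 4 * bits * cycle_time


# Example: a 4-bit word with a cycle of 3 time units.
hw_time = hardware_multiply_time(4, 3)
sw_time = software_multiply_time(4, 3)
```

With these inputs the software version is slower by exactly the ratio 4/3, independent of the word width, because both loops scale linearly in A.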
‘@ | Gx 4)xSU=36 une unis forever (4x4) 3tu=48 tne uns for sorware B 3.5.2 It takes B time units to get through an adder, and there will be A = adders, ‘a | Word is 4 bis wise, requing'S adders, 3 Stu=9 tine units. | Word is 32 bits wie, requinng 31 adcers, 31 Twu= 217 time LAs. 3.5.3 It takes B time units to get through an adder, and the adders are arranged in a tree structure, It will require log2(A) levels. ‘a | 4 its wide word requires 3 adders in leeks. 2x Su= 6 tine units b, [ 32 bits word requires 31 adders in 5 levels. 5x Tus = 35 te units, Solution 3.6 3.64 ‘@ | 004 x C9 = O04, 0:24 = 96, ad 36=S2 + 4, so ve can Shift OCD KAS places, then dc 10 tek vee (0x4 320) 309 shied Ket 2 places (Ox324) = OX1C44. Tal 2 Shits, 4 a we | cra x 0aB=O0618 Ovad = 64+ 1, OS = 16 + 2, Best way Would be Shit X15 KE places, and then add x18. 1 shi, 1 adc Chapter 3 Sokitions 3.6.2 ‘a | 028x069 = 004 x OO = -ONIA = Bad 012A = 36, and 96 = 32+ 4, We SA (0x49 let 5 places (01820), then aid to that velve Ox shifted eft 2 places (124) = OoMa. We need 1 heen track of the Si... one OF the 01 negate, SO the resi will be regave. ‘Bal 2 shits, Ladd by [041x008 = O6IBOAL= 64 + 1,048 = 16+ 2, Best way woulkbe to Sn OC8 Kt 6 planes, and then add OLR. 1 shit, 4 ax, 3.6.3 No solution provided 3464 Quoting the wikipedia entry directly: Booth’s algorithm involves repeatedly adding one of two predetermined values A. and S toa product B then performinga rightward arithmetic shift on P, Let x and y be the multiplicand and multiplier, respectively; and let x and y represent the number of bits in xand y. 1. Determine the values of A and , and the initial value of P. All of these numbers should havea length equal to («-+y+ 1). a. A: Fill the most significant (leftmost) bits with the value of x. Fill the remaining (y+ 1) bits with zeros. bb. S:Fill the most significantbits with the value of (~s) in two's complement notation. Fil the remaining (y-+1) bitswith zeros. 
_P-Fill the most significant x bits with zeros. To the right of this, append the value of y. Fill the least significant (rightmost) bit with azero, 2. Determine the two least significant (rightmost) bits of P. a. If theyare01, find the value of P +A. Ignore any overflow. b. Iftheyare 10, find thevalue of P +S. Ignore any overflow. © If theyare 00 or 11, donothing, Use P directly in the next step. 3. Arithmetically shift the value obtained in the previous step by a single place to the right. Let P now equal this new value. 4, Repeat steps 2 and 3 until they have been doneyy times. 5. Drop the least significant (tightmost) bit fiom P. This is the product of xandy. 3.6.5 0x42. 0x36=0x0DEC a Se Titi Vals ‘0100 0010 (0000 0000 0041 01100 (00, nop st (0100 000 ‘9000 0000 0041 01100 (0100 0010 (0000 0000 0901 1011.0 10, subtract sit ‘0100 0010 3011 1230 0001 1012.0 ‘0100 0010 4401 1111. 0000 1101 1 “Tren sit ‘0100 000 101 1111. 0000 1101.1 0100 0010 1110 1111 1000 0110 1 Gi, add Sait ‘0100 0010 ‘0011 0001 1000 01101 (0100 0010 ‘0001 1000 1100 0011.0 10, bred Sih ‘0100 000 1101 0110 11000011 0| ‘0100 0010 110 10110110 0001.2 {LE nop Shit ‘0100 0010 4110 1011 0110 0001 1 0100 0010 1111 0101 1011.0000 1 Ch, acd sit ‘0100 0010 (0011 O11 1011 00001 ‘0100 0010 (0001 1011 1101 10000 ‘00, nop Si ‘0100 000 ‘0001 1011 1101 10000 0100 0010 (0000 1101 1110 11000 1b. Ox9F x Ox8E=—Ox61 xOx72 Titi Vals (0000 0000 1000 11100 (00, nop Si ‘9000 0000 1000 11100 (0000 0000 0100 0111.0 110, subtract Sit ‘a110 0001 0100 0111.0 (0011. 
0000 1010 0011 1 {LL nop shift ‘0011 0000 1010 0011 1 (0001 1000 0101 0001 1 {1 nop sift ‘0001 10000101 0001 1 (0000 1100 0910 1000 1 Ch, acd sit 31010 1011, 0010 1000.1 4401 0101 1001 01000 (00, nop Si 101 0101 1001 01000 41110 1010 1100 10100 Chapter 3 Sokitions a (00, nop Shit 001 at 41110 1010 1100 10100 soot 111 1111 0101 01100101 0 10, subtract shit 001 St ‘0101 01100110 0101 0 soot 1111 (0040 1011 0011 0010 1 3.6.6 Nosolution provided Solution 3.7 3.7L @ 50/23 =2 remainder 2 tna vais ‘19.011 000000 | co0<00 101.600 1 [Rem=nem- ow (000000) ‘010011.000000 | 401 301 101.000 Fem<0,R+D.Qs< [000000 ‘1o@11 600000 | Go0G00 301 600 Fein Ow (000000) ‘001.001 100000 | ~Go0«00 301 600 2 | Rem=nem- Div (000000, ‘o01001 100000 | 210211 G01.600 Fem0,RO=1 ong 014 ‘000001 0000001 6 [Re ory (000010 000010 Fem Rem= Di ory 401 111,000 010 Fen 0,RO=2 1100 100 {@i0 101,600 000 3.7.3 Nosolution provided 374 a. 55/24=O remainder 15: Dividend negative Sign of Quotient = (Sign bit of Divisor) XOR (Sign bit of Dividend) = negative Sign of Remainder= Sign of Dividend = negative see] fetes | orto pre _ nore] nnnervas ‘10100000000 | 090000 001102 1 [Rem=Fem-Dv ‘coc | o10a00 000000 | son 10000101 Ren 0,Q 0,062 ‘00011 | 00000001 001 | 000 ca0000 oH Reh ‘00011 | 600000000300 | coo caD000 oH 3 [sees wooo | 000000000100 | 00 000000 01 B75 ‘@.55/24=0 remainder 15: Dividend negative Sign of Quotient = (Sign bit of Divisor) XOR (Sign bit of Dividend) Sign of Remainder = Sign of Dividend = negative a negative Tia ae The cone Suntan cre cons Teves D con Sanita oe So SnD ae cone Tewari an co Sania oe con Suman —— So ta a Soo Saison fre con Seaaan ec con Tousen an So Sago oe cone Seusioen crea co tousaon ee con Sensinen oe So Ssisn ae Soo Tastoon ee con Sauna liege Sons Teuton (Q=-0, Rem==15) ‘b, 36/51 = 3 remainder 3: Dividend positive Sign of Quotient = (Sign bit of Divisor) XOR (Sign bit of Dividend) = negative Sign of Remainder =Sign of Dividend = positive Pome | ren |__| Remain | © [mens int 
crauibou sn oe Sa Sener crea Sa tout eee Sa ee Chapter 3 Sokitions a ee (001 001, (000 601 411.600 Rem=Ran=Dv 01 001, 111000111 000 Ren 0,RO=2 01 001, (000 210000 601 6 [Re 01 001, ‘001 100000 010 Rem=Ran=Dv 01 001, (000.014.000.010 Rem > 0,RO=4. (001, 001, ‘o00 011 000 014 7 [Adust signs (001 001 ‘000 614 100014 (Q=-3,Fem=3) 3.7.6 No solution provided Solution 3.8 3.84 In these solutions a 1 will be shifted into the quotient and a compensating right shift of the remainder will be performed. This isthe alternate approach men- tioned in Solution 3.7.2, a 75/12 =6 remainder 1 a Ital Vals (000000 111 104 Ree eoL oH ‘oop 0o1 1a Gio Ren =Rem—Div eo1 010 110 411 111 010 1 [Ren =0,9=<0, Acthe 001 010) 01 414 11000 Ren=Rem-+Dwv wo1 oH 001 110300 2 | REn<0,Q<<0,Acerent eoL oH iio oi 101.000 Ren =Rem-+ Div eo1 010 {111 101 101.000 fee] eens Rem < 0,9 << 0, Aichext (001.010 4111 011 010 000 Rem= Rem Div 01 010) (000 101 010 000) 4° [Rem>0,0<< 1, Sued (001 010) (001,010 100 001, Rem=Rem= DV (001.010) ‘0000000 100003, 5 [Rem=0,0<<1,Siinea (001.010 (000001 000 014 Rem=Ram= Dv 01 010) 140 111 000 041 6 [Rem <0,0<« 0,Aacnet (001 010) 401 110 000 110 Rem= Rem Div (001010) {111,000 000 110 7 [Rem=0,Rem= Rem DW 001 010) ‘000010 000 110 Shit Rem >> 1 01 010, "000001 000 110 (Q=6,Rem=1) bb. 
52/41 = Ly remainder 11 SE 0 [iar vais ac 001, (000.000 101 010 Ree 00002, (000000010 100 Ran=Ren-Dv 0001, 1100 000 040 100 a | Rem=<0,9<<0,Adchext 00008, (0000000 101 000) Rami= nem +i 0001, 1300 001 401 000 2 [Ram <0,Q<<0,Aacnet 30000, (0000011 010 000) Ran=Rem-+DW 00, 1100 100 040 000 3 [Ran <0,0<<0,Adcnet 00008, ‘001,000 100 000) Rani= Rem +Di 0001, {101 001 100.000 4 [Ran <0,0<<0,Aacnot 00002, ‘019 02000 000 Ran=Rem+DN 0001, {140 100 000 000 Ram <0,0 << 0,Adchext 00008, {101,000 000 000 Rami= Rem +Div 00001, (001,001,000 000) 6 [Ren >0,Q<<3,Sienee 30000, (0100010 000001 Ran=Rem=Di 00, {140 001,000 001 7 | Ran=0,Rem=Ren+ DW 0001, ‘010 010 000 001 SniftRem => 4 00001, ‘001 001 000 001 Q=1.Rem=11) 3.8.2 Nosolution provided 3.8.3 No solution provided Chapter 3 Sokitions total Vals (000000 001142 1 [Temp=Ren-Dw | 000000 [110100000111 [co%100000000 [co0000 co1144 emp <0,9<<0 | 00000 | 110100 090111 | coLI00.G0G000 | 090000 ODL Ran ‘e00000 [110100 000111 [ooo1s0 000000 | o00000 COLL 2 [Tenp=Ren=Dw | 000000 |at1010 01141 [ oo0%10c00000 | coD00 COLA Temp <0,0=<0 | 00000 [ati010 co1111 [ co0410 c00000 [ co0000 COLA ReunN ‘e00000 [11019 002111 | oo0012. 
oo0000 | o00000 oon 3 [Tenp=Ran=Dw | 000000 [111101 oo1111 [oo00z4 coo0G0 | coD000 COLA Temp-<0,Q<<0 | 000000 [111101 oo1111 [co0nz4 coo000 [ coD000 ooLAAL Rann ‘e00000 [11101 001111 | coo00n 150000 | oo0000 OLLI 4 [Temp=Ren—pw | 00000 | 111110 101111 [ oo0001 100000 | 090000 oot Temp <0,Q<<0 | 00000 [111110 01111 [ co0008 100000 [ co0000 cOLAAL RaW ‘e00000 [11110 101111 | e00000 140000 | oo0000 OLA Temp =Ren=w | 000000 [141111 o10111 [ c00000 410000 [ oo0000 oOLAAL emp <0,9<<0 | oo0000 [111111 010112 | op0000 216000 [ 090000 oot RaW ‘e000 [11111 10111 [oco000o11600 | o00000 cont 6 [Temp=Ran=Dw | 00000 [ati1i1 110111 [oo0c00Gx%000 | coD000 oOLAL Temp-<0,0%< | ooo000 [asiti1 10111 [0000 ax%000 | co0000 ooLAL RetDN ‘o00000 [aaza12 210111 | Oo000 00200 [00000 oo 7 [Temp=Ren=Dw | 000000 [000000 c00011 [coco00c01100 [oo0000 COLA T>0,9-<1,R=1| oo0001 [000000 00011 [ co0000 G0H100 [090000 O00 Renn ‘200001 [00000 000011 [Go0000. 00080 | o00000 CooL bb. 70/23 = 2 remainder 22 OE 0 [ ta vas ‘200000 [ co0000.e00000 | o10011 000000 | 090000 111000 a [Temp=Ran=Dw | 000000 | so1i01 s11000 [ 10011 000000 | 000000 111000, Temp<0,Q=<0 | o00000 | s01101 413000 | o10011 c00000 | 000000 111000 RSID ‘200000 | s01101 213000 | 91001 190000 | 600000 111000 2 [temp=nen—ow | 0000 | a0 o11000 | o01002 100000 | 000000 111000 Temp <0,Q<<0 | 00000 | s40414 cxi000 | oo1001 100000 | 000000 111000 RaW ‘e00000 | 140112 614600 | 090100 110000 | 000000 111000 Ea ee Tero=Fem=Dv | oo0000 [ 413300 001000 Tene =0,9<<0 | oo0000 | 113100 o01000 | aoouo0 40000 | a00000 43000 rv ‘cn0000 | 411300 005000 | caaouocxsca0 | cxn000 s45000 @ [rem=rem=aw | ccan00 | 11410002000 | oooei0 012000 | ooocnD 4000 Teno =0,9<<0 | oo0000 | 114130 001000 | coup cxs000 | a00000 43000 FD cooo00 | 411330 005000 | canou oossa0 | cano00 a45000 5 [ene=rem=bw | oooocp | 131110 110300 | aooooxcoss00 | ao0000 43000 Tene <0,0<<0 | oo0000 | 113330 410100 | caooouco2300 | ao0000 43000 rib eoo000 | 411330 a10100 | cano0 soox0 | cxno00 
s45000 © [erw=Rem=ow | cccn00 | oo00e0ox0010 | ooocxD 00440 | ooocnD 14000 T>0,0<4,R=T| ooooes | co0000 o100%0 | cao0000sc0230 | aooo00 cx00%0 rR cnoonr | ecco00 o19e10 | canooooucpes | cano00CHc0%0 7 [fenn= Feen=bw | Ooooms [131331 s11533 | aoooo0cioo1a | aooo00 ci0010 Teno =0,9<<0 | oooou0 | s13131 411141 | coooo0aicoua | aooo00 ci00%9 TD cooouo | s11s31 114113 [ ccuoo0coscns | canoo0qua0%0 3.8.5 No solution provided 3.8.6 No solution provided Solution 3.9 3.9.1, Nosolution provided 3.9.2 Nosolution provided 3.9.3 No solution provided Solution 3.10 3.10.1 a | 610858726 GABE TSE wb, | -1346037120 2348530176 3.10.2 @ | addy $6,554 b [sw 31, © (2) Chapter 3 Sokitions x46 + 13x 162= A375 + 05078125 ‘am | 15095 x 1 =110 0100 1001.10 ‘ormaiize, move inery point 10 0 te left 140 0100 1001.10 x 2°= s.0010010011 x2" Sign = negaltve, = 198 + 10= 138 Finalbit patter: 11000101010010010011000000000000 we | 9388125 x 10? =a140101010.1101 xP ‘onmaize, move binery point 9 to te let a.a101010101101 x ‘Sign = negathe, ex = 128+9'= 137 Fal bt patter 11000100111010101011010000000000 3.10.5 ‘| 16085 x40= 1100100 1001.10x2 ‘normalize, moe binary point 10 to the tft {1100100 1001.10 2? = 1.10010010011 x 2° be | conei2s x10'= 1110101010101 xP ‘malize, mow binary point Sto the let ‘Lafoioiono1101 x2" ‘sign =negstive,ep = 1024 + 9= 1033 Fal bit pattem: -10000001001140101010140c0000000D;.oDoDAo000000000000 ‘eoc00000 3.10.6 ‘@ | 1605%10? = o11001001001.10 x P=64.8x18° ‘move hex point 3 hex cigs tothe let. (9110 0100 1001.10 x2" = 110010010011 x 16° ‘Sig =negatve, © = 64 +3 = 67 Fral bit pater: '100c0110441Co101001400000000000 wb | -2sRai25 x0? = 1110101010101 x P= 3.8 x 18? normals, noe hex pont 3 0 He It ‘0011. 
1010 1010 1101 x 16 ‘Sign = negative, = Ga + 3= 67 Fal bit pater: 14000011001110101010110100000000 Solution 3.11 3444 ‘a | 5100736125 x 10° = SOOTSEIDS x 10 = OT AAOO.DX 10 11111010010000000000,0010 x 2° ‘ove the bhary poit 19 to te feft= «11.11010010000000000001 x 220 ‘@oxrent = +19, mantissa = +111101001000000000000100000 “answer: Coc0000100110111 }oiSo10cD00000c00010 wa | -2.601650990825 x 10°? = —paea1650300625 x 10° =— o0000L10L 11001 xP ‘| Soorasi25% -0,0000000000011 «22 GR 10.00 1 Guard = Q, Round = 1, Sticky = 1) Round = 1, Round up. ' | 2a108675% 10" eso1e1s62 10> 23108075 x 10"= 23.109375 = 1.oLt4000111 x 2 @s016015K2 x 10! = (5391601562 = 1,0100011101 x 2+ ‘Sift binary point 5 tote et and align exponents, GR 10121000111 00 zo0q0101000 11 101 Guard Found = 1, Sticky = 1) 1.0) 1 In this case Guard and Round are both 1,50 we round up. ‘Lotit119000 x 2'= 10111.110000 x 2 = 23.75 = 2.375% 10% 3.11.5 Nosolution provided 3.1.6 No solution provided Solution 3.12 3.42.1 ‘a | 5166015625 x 850575 ‘5166015625 = 1,0110101001 x 2 8558375 = 1,0001001100 x 2° Bp2+3=5,5416=21 (10101) Sere: bot postihe, result posine Mertissar 0110101001 }aoaTo01300 ‘90900000000 ‘nacoacqce0 30110101001 30110101001 ‘neaoacacea ‘oacoacacea 30110101001 ogoa000000 ‘gaaoacacea ‘gacoacacea ao1ioi01001 1300002010010001 411000010100 10 00101100 Guard = 1, Round = 0, Siy= 1: Round up 44900010101 x 2° = OL1O101N001O101 (110000.10101 = 4865625) ‘S\G6015625 x 8.59975 = A8.6419677 734375 Same information was fst because the result de not i into the available LO fk, Answer off by 0142822265625 Chapter 3 Sokitions ‘| 6:18 x 10" x 5.796875 10" 4,0011010100 x 2° 100111111 xP 14,16 +14=3001110) Signs: bath postive, result postive Mantis 30011010100 30011010100 30011010100 10100 ‘nacoacace0 ‘naaoacaae0 30011010100, 30011010100 30011010100 3000101 10000102100 Must: Nomal ize, add one to exponent ‘Some information was last because the resulted not fit into the availabe 10. fk. 
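The rounding result in 3.12.1 a can be cross-checked in software. This is a minimal sketch that assumes the 16-bit format behaves like IEEE 754 binary16 with round-to-nearest-even, which is what Python's `struct` format code `"e"` implements; the exponent bias used in the hand calculation does not change the rounded value. The helper name `to_half` is introduced here for illustration only.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    # Packing with format "e" performs the rounding; unpacking widens it back.
    return struct.unpack("<e", struct.pack("<e", x))[0]


a = to_half(5.66015625)   # 1.0110101001 x 2^2, exactly representable
b = to_half(8.59375)      # 1.0001001100 x 2^3, exactly representable

# The product of two 11-bit significands needs at most 22 bits, so the
# double-precision product is exact and serves as the reference value.
exact = a * b
rounded = to_half(exact)
error = rounded - exact
```

The difference between `rounded` and `exact` is the information lost because the product does not fit in the 10-bit fraction field.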
Ansver off by 15.3125 3.42.2 Nosolution provided 3.42.3 No solution provided 3.124 a | 32541077685 x10? 900001101300011, 1,0100000000 10 0101 Guard = 1, Found = 0, tty = 1: Round wo :,.0100000001 x 22-= o1xoo100100000001 = 101,00000001 =5,00380625 5264/6525 = 5.002208850575 Sa information was fst because the result de not FL into the avaiable LO Fld ANSEF offby 001607330425 Chapter 3 Sokitions ve | -22TTSASTSX 1 /1.154975 x 1 227T3ASTSX 10? = ~2.27731375 = ~1,0010001110 x 24 jd100110111 x2 Boenent = 1-5 =-5,-6 + 16= 11 01011) Signs: one negative, are positive, result negetive 1010000110011101 ~Loxeooo1101 x2 = 1910100100001101 = .000010100001101 = OIST2S61A257E125 22 772AST5/115.4375 = 0197284499001598308997743 ‘Some information was last because the resulted not it nto the aailable 1Obit fe, Answer off ty 0000164357657 3.42.5 No solution provided 3.42.6 No solution provided Solution 3.13 3.431 a | a.6360% 10" 1.6360 x 108) + 10010? 146960 x 19! = 4111111010 x7"9=—11111111010000, 1.6360 10"= 211111110102 = 11111711010000,, Lo x10": ,agagcaaaaa +1,00g0000000 ‘agaaaacaoa = a10qaaqcaaaaaq0a = 2 wa | (2.868625 x 108 + 4.140625 x 10% + 1.2140605 x40" -2.865625 x 10= 1.1100101010% 2" ‘suftuinary pokTtof smaller left 6 so expanents match w 100101010 @ Taoa011010 10 0000 Guard=1, Rounde, Sticky=0 (98) No round (Us) 00 Guarded, Rounded, Sticky=d Normalize, add 1 to exponent Chapter 3 Sokitions 1. | 2cc6s x 108+ A065 x 10+ 1.2140625%104) ~2acE625 x19! = 1100101010 x 2*, 4.200625 x10"! = 1:1010100000 x 2 12140625 %10" = 1.1000010010 x 23 ‘Sit binary pont oF sma Io. 6 So EXPONENTS MAH ia) ogom0010 @ (0000110102 00 000 GuarcHO, Rounded, Sticky=0 «cay w. «cay AS(BHC) 10,1001001101 10 Normalize, add 1 to exponent {010Q100120 11 0 Guarcei, Rouncel, Sticky, Round up As(BHC) = 1,G209100121 x 28 = GLOLOLoLOgIONIII = 41,21875 3.13.3 ‘a | Nose ence equ: (A+B) +OH1, Ax © +0) = 0 GepS SoM Oe), Exact “16360~ 16950 + 1= 1 ’. | No, ey enc eR (A+B) +C=A41575,A+(B+ C= ALZIBTS SERS TOWN). 
act arsner f A1.2109575 3.134 (@BEDEIDS x 10x LTS 10) x 2.50105 x 1 (§) 4.828125 x 10- = 1,0000000000 x 2 (®)L.768x 40" = 1.2011101000 x2" (Q)280125 x 107= 1ar11010001 x 27 ogoaca0q00 oaoaca9000 sogoacq9000 reaog0a0000 veaocaaqa0 ogoacac0a0 1.101101 0900000000000, |AxB_1.4011101000 09 00000000 Guar AxB 11011101000 x2? Bp 1+7=8 ‘Sens: both postive, result postive Martissa (xe) oO no111a1000 101000 10100 Normalize, add 1 to exponent (Ax) 11010111111 OL 101101000 Guard =O, Round = 1, Stity =: No Round (AxB)xC Liowo1a4111 x = 431.75 Chapter 3 Sokitions bb | (721875 x 10° 2.808875 x05 x 3.575% 10" (4.721875 x 10% = 10111100111 x 2 (@) 2808375 x 10*=1.1100000110% 2" (©) 3575 x 10" = 4,0001111000x 2° 20,10010111010001101010 Normalize, add 1 to exponent 10100101110 10 01201011 Guard-1, Round-0, Stic! Round up AxB — Louooio1141 x2 Bp 10+5=15 Ses: tot positive, result postive Marts (Ax 8) © Round-0, Sticky=1: Round up @xB)xC_ Lortt0o1011 x29 3135 sa | 4:8828125 x 1074 768 0° 2.50125 x 17) (8) AEDS x 40~ = 1,0000000009 x 2 (@) 1.768 x 40° = 1,1011101000 x2 (©) 280125 x 10"= 1.1111010001 x2" bp 1047=37 Seq: both postive, resut postive Merv: (3) 101000 na1iia1000 101000 Normalize, add 1 to exponent 101101000 Guard=0, Round=1, Stic Bx C LioIoNII111 x2! OVERFLOW: Carrot be rerresentod No Round Chapter 3 Sokitions wy | acraters x 10! (2608875 x 10" «3.575 x 105 (@) 4.721875 x 19% = Loti90111 x 2 (©) 2.809375 x 10*= 1.1100000110 x2" (©) 3575 x 19= 119001111000 x2 Bera Sigs: ban poste, result positve Mantissa @ oO 710000 Guarce1, RouncO, Sticky. ound up Bxc sanisoni001 x2? Bp:5+9=14 Signs: bath posit, result postive 1 Normalize, add 1 to exponent. Guarda, Round-a, Sticky No Round Ax(BXxO) LoLtco1OI0 x29 3.136 a | ane (xB)xC= 14010111111 x 29 = 431.75 BX C= 11010111111 x 280VERRLOW: Cannotbe represented Band C ave both lg, so their product does not it nto the 16. Moetng pont format being used. 
Nn: (xB)xC= LoLt001011 x 2 = 47456 Ax(BXC)= LOLL1001010 x 395 = 47404 "Exact" 47 21875 x 2087S X 35.75 = 47424205341 7967S Solution 3.14 3.441 ‘a: | L5D34375 x 107 x Q.070R125 x10" + 9.96875 x 104) (®)2.0708125 x 40% = 1.1010100000 x2 (9.96875 x 10! = 14000111011 xP Shiftbinary point 9 tothe let, match exoarents (c)—1-2000121011 (©) Togogoaae1i a2 100000 Guarceo, Round=1, Stickyel (ec) 11000111110 x 26 Bp -3+6=3 ‘Sens: both postive, result postive Mentissa w 10011100000 (a) x 11090111110 10011100000 190011100000 190011100000 190011100000 190011100000 19011100000 10011100000, Ax (GC) ——-L,ITIQOLIO11 10 c1ag0q00 Guarde1, Roundea, Sticky=1 Round up AX(B+0) Lt140011100% 2° Chapter 3 Sokitions =2.TBS0605 x 10 x (ROSE 10" + 1,0216%10) Sticky = 0: no round (BC) 0,0100001010 Normalize, subtract 2 fram exponent (6 +) Loooo10N000%2"* Bq 4411=15 Stars: cne negtive,one positive ~ sin negate Mertisar w 1001 BH 1.1100111110 10 Ax@+0)-1-1100111111 Ax (B+ 0) 11100111111 x 29 01000 Guar Round-0, Sticky-l: Round up 3.14.2 fa | 15234375 x 107 x @.0708125 x 10 + 996875 105) (9 4.8234375 x 10% = 1,0011100000x 23 (@) 2:0703125 x 19*= 10101000002 (€)9.96875 x 108= 14000111011 x Bgr-3-3=-6 ‘Sens: bof posite, result positive Marts w @ be. 30,0000010012 09 a9000000 Normal ize 4 1,00000a1001 10 0 0, ‘Ax B 1,0000001010 x 2 B-3+6=3 ‘Sens: bom posite, result positive Marts w © add 1 to exponent 0 Guarcel, Rounchd, Sticky=0: Round to even ac. 111100101 Axe 11110011000 x ‘Shift binary point 8 to the eft, match exponents fc #1.1110021000 oS 10000100 09 001010 Guard=0, Round=0, Sticky=1: No Round 10100000 Guard=1, Roun 11110011100 (AxB) + AxO)= 11110001100 x2 Chapter 3 Sokitions 1 | -27ES0525 x 10" (8.088 x 10° + LODI6 x 104) (@) -2.7890625 x 10 =-1.1011111008 x2" (@) 8088 x 10° ==. 1114100110 x2 (©) Lozi x 10*= LOoLta1101 x23 = 10016 Bip: 441216 OVERFLOW: Carnet Represent Sens: bath negmtive, resuk posite ae 100010 G1 010120110 Guarda, Round=! 
AxXB —LAgLAIO9OIO x2"” OVERFLOW: Cannot Represent Bip: 4413=17 OMRFLOW: Cannot Represent Sigs! one negatne, one poste, result eEEtN Mentissa a oO Normal ize, add 1 to exponent bot {0001011001 00 Q00G10101 Guard0, Round=0, Sticky=1: No Round AXC =LOOOIOLIO0L x23" OVERFLOW: Cannot Represent bc -1.10) 1 x 28 QvERFLOW: Cannot Represent be 10001 x 21 QVERFLOW: Cannot Represent 4: (0 x 218 QVERFLOM: Cannot Represent 101 100 x 217 OVERFLOW: Cannot Represent AXBHAXC =1.1011011100% 2" O/ERALOW: Carnot Represent 3.14.3 a anes: Ax (6 +0)= L1A10011100 x29 = 15.21875, and (Ax B) + (Ax 0)=1.-1140011100 x 2° 1521575 Breet: 1534375 x (20708125 + 99.6875) = 15.2183074951171875 Ne Whit is possible to cakouete Ax (BC), iS not possible to caulate AX B ar AX, S H— Ie >—rsrictino — Q| ~ se { S pa e ‘This shows the lowermost bit of each word, This schematic is repeated 7 more times for the remaining seven bits, Note that there are no connec- tions for D and C flip-flop inputs because datapath figures do not specify how instruction memory is writen, Reg! Fego_o WDatao HJ Pegi_o Clock) RegWrite | He ‘WReg Rage ‘This is the schematic for the lowermost bit, it needs to be repeated 7 more times for the remaining bits. RReg] is the Read Register 1 input, RReg2 is the Read Register 2 input, WReg is the Write Register ‘input, WData is the Write Data input. RDatal and RData2 are Read Data 1 and Read Data 2 outputs. Data outputs and input have“_0” to denote that this is only bit O of the 8-bit signal. Chapter 4 Sokitions 43.3 al a © +p se +p se +p se o {+ oe dl 'b. [No change, here are no gates with more then 2 inputs nthe schematic. 4.34 ‘The latency of apath is the latency from an input (ora D-clement output) to an output (or D-clement input), The latency of the circuit is the latency of the path with the longest latency. 
Note that there are many correct ways to design the Gireuit in 43.2, and for each solution to 4.3.2 there isa different solution for this problem, 4.3.5 ‘The cost of the implementation is simply the total cost of all its compo- nents, Note that there are many correct ways to design the circuit in 4.3.2, and for each solution to 4.3.2 there isa different solution for this problem, 43.6 ‘Because multhinput AND and OR gates have the sere latency as Bip ones, we can use ‘menyinput gates to reduce the nuTber of estes on the path ffom inPulS tO outouts. Te Shemale Shawn for 4.3.2 tums aut to aleady be eptETe, ‘A Brooinput or afournput gatehas a lowe ltency than a cascade of tio Darou BES, ‘This means that shorter overall latency is achieved by using 3 and dénput gates rater then cascades of anput wes. I our Schematic own FO" 4.3.2, we Srouki replace the Wee 2anput AND getes Used for Cock, Regie, ane Wheg sigs wit two Sanput AND gates Pat rectly etemine the value of he C input foreach Dearest. Solution 4.4 4A.1 We show the implementation and also determine the latency (in gates) needed for 4.4.2, 44.2 See answer for 44.1 above, Chapter 4 Sokitions 444 ‘@ | Tere are four OR gates on te ortcal pat, fora total of 1360S, | The cal path consists of OR, XOR, OR, and OR, fora total of 5100S 445 ‘&_ | Te cost is 2 AND gates and 4 OR ames, fora total cost oF 16. bb. | Te costs 1 AND gate, 4 OR gates, ancl 1 9OR te, fora total cost oF 12. 446 We already computed the cost of the combined circuit. Now we determine the cost of the separate circuits and the savings. 
      Combined    Separate              Saved
a.    16          22 (+2 OR gates)      (22 - 16)/22 = 27%
b.    12          14 (+1 AND gate)      (14 - 12)/14 = 14%

Solution 4.5

4.5.1 [Figure: circuit schematic; surviving labels: Start, Clk, Out]

4.5.3
      Cycle time, Circuit 1     Cycle time, Circuit 2        Speedup
a.    90ps (OR, AND, D)         120ps (OR, AND, AND, D)      (32 x 90ps)/(16 x 120ps) = 1.50
b.    170ps (NOT, AND, D)       90ps (NOT, AND)              (32 x 170ps)/(16 x 90ps) = 3.78

      Cost, Circuit 1                    Cost, Circuit 2
a.    14 (1 AND, 1 OR, 1 XOR, 1 D)       20 (2 AND, 1 OR, 2 XOR, 1 D)
b.    29 (1 NOT, 2 AND, 2 D)             29 (1 NOT, 2 AND, 2 D)

4.5.6
      Circuit 1                   Circuit 2                  Comparison
a.    14 x 32 x 90 = 40320        20 x 16 x 120 = 38400      Cost/performance of Circuit 2 is better by about 4.7%
b.    29 x 32 x 170 = 157760      29 x 16 x 90 = 41760       Cost/performance of Circuit 2 is better by about 73.5%

Solution 4.6

4.6.1 I-Mem takes longer than the Add unit, so the clock cycle time is equal to the latency of the I-Mem:
a. 400ps
b. 500ps

4.6.2 The critical path for this instruction is through the instruction memory, Sign-extend and Shift-left-2 to get the offset, Add unit to compute the new PC, and Mux to select that value instead of PC + 4. Note that the path through the other Add unit is shorter, because the latency of I-Mem is longer than the latency of the Add unit. We have:
a. 400ps + 20ps + 2ps + 100ps + 30ps = 552ps
b. 500ps + 90ps + 20ps + 150ps + 100ps = 860ps

4.6.3 Conditional branches have the same long-latency path that computes the branch address as unconditional branches do. Additionally, they have a long-latency path that goes through Registers, Mux, and ALU to compute the PCSrc condition. The critical path is the longer of the two, and the path through PCSrc is longer for these latencies:
a. 400ps + 200ps + 30ps + 120ps + 30ps = 780ps
b. 500ps + 220ps + 100ps + 180ps + 100ps = 1100ps

4.6.4
a. All instructions except jumps that are not PC-relative (such as jal)
b. Loads and stores

4.6.5
a. None. I-Mem is slower, and all instructions (even NOP) need to read the instruction.
b. Loads and stores

4.6.6 Of the two instructions (bne and add), bne has a longer critical path, so it determines the clock cycle time.
Note that every path for add is shorter than or equal to the corresponding path for bne, so changes in unit latency will not affect this. As a result, we focus on how the unit's latency affects the critical path of bne:
a. This unit is not on the critical path, so changes to its latency do not affect the clock cycle time unless the latency of the unit becomes so large as to create a new critical path through this unit, the branch Add, and the PC Mux. The latency of this path is 230ps and it needs to be above 780ps, so the latency of the Add unit needs to be more than 650ps for it to be on the critical path.
b. This unit is not used by BNE nor by ADD, so it cannot affect the critical path for either instruction.

Solution 4.7

4.7.1 The longest-latency path for ALU operations is through I-Mem, Regs, Mux (to select ALU operand), ALU, and Mux (to select value for register write). Note that the only other path of interest is the PC-increment path through Add (PC + 4) and Mux, which is much shorter. So for the I-Mem, Regs, Mux, ALU, Mux path we have:
a. 400ps + 200ps + 30ps + 120ps + 30ps = 780ps
b. 500ps + 220ps + 100ps + 180ps + 100ps = 1100ps

4.7.2 The longest-latency path for lw is through I-Mem, Regs, Mux (to select ALU input), ALU, D-Mem, and Mux (to select what is written to register). The only other interesting paths are the PC-increment path (which is much shorter) and the path through the Sign-extend unit in address computation instead of through Registers. However, Regs has a longer latency than Sign-extend, so for the I-Mem, Regs, Mux, ALU, D-Mem, and Mux path we have:
a. 400ps + 200ps + 30ps + 120ps + 350ps + 30ps = 1130ps
b. 500ps + 220ps + 100ps + 180ps + 1000ps + 100ps = 2100ps

4.7.3 The answer is the same as in 4.7.2 because the lw instruction has the longest critical path. The longest path for sw is shorter by one Mux latency (no write to register), and the longest path for add or bne is shorter by one D-Mem latency.
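The path sums in 4.7.1 and 4.7.2 can be checked mechanically. The following is a minimal sketch, assuming the per-unit latencies are the ones that appear as terms in the sums above; the dictionary layout and the helper name path_latency are illustrative and not part of the original solution.

```python
# Unit latencies in picoseconds for configurations a and b,
# taken from the terms of the sums in 4.7.1 and 4.7.2.
LATENCY_PS = {
    "a": {"I-Mem": 400, "Regs": 200, "Mux": 30, "ALU": 120, "D-Mem": 350},
    "b": {"I-Mem": 500, "Regs": 220, "Mux": 100, "ALU": 180, "D-Mem": 1000},
}

# Units traversed, in order, on the longest path of each instruction class.
ALU_PATH = ["I-Mem", "Regs", "Mux", "ALU", "Mux"]
LW_PATH = ["I-Mem", "Regs", "Mux", "ALU", "D-Mem", "Mux"]


def path_latency(config, path):
    """Sum the latencies of the units along a path (hypothetical helper)."""
    return sum(LATENCY_PS[config][unit] for unit in path)


for config in ("a", "b"):
    print(config, path_latency(config, ALU_PATH), path_latency(config, LW_PATH))
```

Since lw has the longest path in both configurations, its sum is also the clock cycle time that 4.7.3 refers to.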
4.7.4 The data memory is used by lw and sw instructions, so the answer is:
a. 20% + 10% = 30%
b. 35% + 15% = 50%

4.7.5 The sign-extend circuit is actually computing a result in every cycle, but its output is ignored for add and not instructions. The input of the sign-extend circuit is needed for addi (to provide the immediate ALU operand), beq (to provide the PC-relative offset), and lw and sw (to provide the offset used in addressing memory), so the answer is:
a. 15% + 20% + 20% + 10% = 65%
b. 5% + 15% + 35% + 15% = 70%

4.7.6 The clock cycle time is determined by the critical path for the instruction that has the longest critical path. This is the lw instruction, and its critical path goes through I-Mem, Regs, Mux, ALU, D-Mem, and Mux, so we have:
a. I-Mem has the longest latency, so we reduce its latency from 400ps to 360ps, making the clock cycle 40ps shorter. The speedup achieved by reducing the clock cycle time is then 1130ps/1090ps = 1.037.
b. D-Mem has the longest latency, so we reduce its latency from 1000ps to 900ps, making the clock cycle 100ps shorter. The speedup achieved by reducing the clock cycle time is then 2100ps/2000ps = 1.050.

Solution 4.8

4.8.1 To test for a stuck-at-0 fault on a wire, we need an instruction that puts that wire to a value of 1 and has a different result if the value on the wire is stuck at zero:
a. Bit 7 of the instruction word is only used as part of an immediate/offset part of the instruction, so one way to test would be to execute ADDI $1, zero, 128, which is supposed to place a value of 128 into $1. If instruction bit 7 is stuck at zero, $1 will be zero because value 128 has all bits at zero except bit 7.
b. The only instructions that set this signal to 1 are loads. We can test by filling the data memory with zeros and executing a load instruction from a non-zero address, e.g., LW $1, 1024(zero). After this instruction, the value in $1 is supposed to be zero. If the MemtoReg signal is stuck at 0, the value in the register will be 1024 (the Mux selects the ALU output (1024) instead of the value from memory).
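The stuck-at-0 test in 4.8.1(a) can be illustrated with a small model of the immediate field. This is only a sketch: apply_stuck_at and addi_from_zero are hypothetical helpers that force one bit of a non-negative immediate to a fixed level, and they model nothing else about the datapath.

```python
def apply_stuck_at(value, bit, stuck_value):
    """Return `value` with bit number `bit` forced to `stuck_value` (0 or 1).

    Hypothetical helper: models a single wire stuck at a fixed level.
    """
    if stuck_value:
        return value | (1 << bit)
    return value & ~(1 << bit)


def addi_from_zero(immediate, fault=None):
    """Result of ADDI $1, zero, immediate.

    `fault` is an optional (bit, level) pair applied to the immediate
    field before the addition. Only non-negative immediates are modeled.
    """
    if fault is not None:
        bit, level = fault
        immediate = apply_stuck_at(immediate, bit, level)
    return 0 + immediate


good = addi_from_zero(128)                   # fault-free processor
faulty = addi_from_zero(128, fault=(7, 0))   # instruction bit 7 stuck at 0
print(good, faulty)
```

The test instruction is useful only because the two results differ: the fault-free processor leaves 128 in $1, while the faulty one leaves 0.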
4.8.2 The test for stuck-at-0 requires an instruction that sets the signal to 1, and the test for stuck-at-1 requires an instruction that sets the signal to 0. Because the signal cannot be both 0 and 1 in the same cycle, we cannot test the same signal simultaneously for stuck-at-0 and stuck-at-1 using only one instruction. The test for stuck-at-1 is analogous to the stuck-at-0 test:
a. We can use ADDI $1, zero, 0, which is supposed to put a value of 0 in $1. If bit 7 of the instruction word is stuck at 1, the immediate operand becomes 128 and $1 becomes 128 instead of 0.
b. We cannot reliably test for this fault, because all instructions that set the MemtoReg signal to zero also set the ReadMem signal to zero. If one of these instructions is used as a test for MemtoReg stuck-at-1, the value written to the destination register is "random" (whatever noise is there at the data output of Data Memory). This value could be the same as the value already in the register, so if the fault exists the test may not detect it.

4.8.3
a. It is possible to work around this fault, but it is very difficult. We must find all instructions that have zero in this bit of the offset or immediate operand and replace them with a sequence of "safe" instructions.
For example, a load with such an offset must be replaced with an instruction that subtracts 128 from the address register, then the load (with the offset larger by 128 to set bit 7 of the offset to 1), then an instruction that adds 128 back to the address register.
b. We cannot work around this problem, because it prevents all instructions from storing their result in registers, except for load instructions. Load instructions only move data from memory to registers, so they cannot be used to emulate ALU operations "broken" by the fault.

4.8.4
a. If MemRead is stuck at 1, data memory is read for every instruction. However, for non-load instructions the value from memory is discarded by the Mux that selects the value to be written to the Register unit. As a result, we cannot design this kind of test for this fault, because the processor still operates correctly (although inefficiently).
b. To test for this fault, we need an instruction whose opcode is zero and MemRead is 1. However, instructions with a zero opcode are ALU operations (not loads), so MemRead is 0. As a result, we cannot design this kind of test for this fault, because the processor operates correctly.

4.8.5
a. If Jump is stuck at 1, every instruction updates the PC as if it were a jump instruction. To test for this fault we can execute an ADDI with a non-zero immediate operand. If the Jump signal is stuck at 1, the PC after the ADDI executes will not be pointing to the instruction that follows the ADDI.
b. To test for this fault we need an instruction whose opcode is zero and Jump is 1. However, the opcode for the jump instruction is non-zero. As a result, we cannot design this kind of test for this fault, because the processor operates correctly.

4.8.6 Each single-instruction test "covers" all faults that, if present, result in different behavior for the test instruction. To test for as many of these faults as possible in a single instruction, we need an instruction that sets as many of these signals as possible to a value that would be changed by a fault.
Some signals cannot be tested using this single-instruction method, because the fault on a signal could still result in completely correct execution of all instruction that trigger the fault. Solution 4.9 91 100011 co130 00001 oooocca000000000 ecco 000401 Oooo oooto saan aa 4.9.2 jms | hemes | etree | _ scene = (ona) 100001) | _Wes outro sec) BD 260009 ves 2 e0010) ves 49.3 PP Reniecnie | eget tert | = Tay = Eh ates aT = because RegDst is X) ee MeniRead= 1 be Regitie= 0 MeiRead=0 49.5 We use 131 through 26 to denote individual bits of Instruction[31:26], which is the input to the Control unit: a | Reeds =NOTISL | Regiite = (NOT 128 AND NOTIZ7) OR (Bt AND NOT 29) 4.9.6 If possible, we try to reuse some or all of the logic needed for one signal to help us compute the other signal ata lower cost: ’b. | Memmead = 11 AND NOTI29 Fegitite = (NOT 128 AND NOT 127) OR MemFead Solution 4.10 ‘To solve problems in this exercise, it helps to first determine the latencies of dif ferent paths inside the processor, Assuming zero latency for the Control unit, the critical path is the path to get the data for a load instruction, so we have I-Mem, ‘Mux, Regs, Mux, ALU, D-Mem and Mux on this path. 4401 The Control unit can begin generating MemWite only after I-Mem is read. It must finish generating this signal before the end of the clock qe. Note that Mem\rite is actually a write-enable signal for D-Mem flip-flops, and the actual write is triggered by the edge of the clock signal, so MemWrite need not Chapter 4 Solutions arrive before that time. So the Control unit must generate the Mem\Write in one lock cycle, minus the I-Mem access time: a oe ‘a | 400p5-+30ps + 200s + Shs + [1460pS~ 400ps = 760pS {120s + 350ns + Ops = 160A wb. 
| 500ps-+100ps + 200ps + 1009s + "2200ps = 500ps = 1700ps {180s + 1000p + 100ps = 220008 4.40.2 All control signals start to be generated after I-Mem read is complete, The ‘most slack a signal can have is until the end of the oycle,and MemWrite and Reg ‘Write are both needed only at the end of the cycle, so they have the most slack, ‘The time to generate both signals without increasing the critical path is the one ‘computed in 4.10.1. 4.40.3 MemWrite and RegWrite are only needed by the end of the cyde, RegDst, jump, and MemtoReg are needed one Mux 30ps) 30ps + 200ps + 30ps - SOps = 210ps_ TIS (05> Spm 100s + oss as | For the next three problems, it helps to compute for each signal how much time ‘we have to generate it before it starts affecting the critical path. We already did this for RegDst and RegWrite in 4.10.1, and in 4,10.3 we described how to do it for the remaining control signals, We have: aE See [ae [res | rooms | ams | rams [a Teor | 75008 | Toons asco [| scoms [1500s | coms | ico | sens | a7cops S05, {1700ps ‘The difference between the allowed time and the actual time to generate the signal is called “slack” For this problem, the allowed time will be the maximum time the signal can take without affecting clock cydle time. If slack is positive, the signal artives before itis actually needed and it does not affect lock cyde time. If the slack is positive, the signal islate and the lockcyce time must be adjusted. 
We now ‘compute the cack for each signal: aE Soe aps [os | 100m [ aoe | ss | 10m | sms | oom | ~ps es ons 310005 00ps 200s | Sens: 20008 Sons ns plus 4.10.4 With this in mind, the clock cde time is what we computed in 4.10.1, ‘the absolute value of the most negative slack, We have: ‘Actual clock cycle time Control signal with the | Clock cycle timo with kioal | with these skgnal ete Ree ee) od Regiite C40ps) 1160s 1200ps| AUS (80p5) 2200ps 2280ps 41055 It only makes sense to pay to speed-up signals with negative slack, because improvements to signals with positive slack cost us without improving perfor manee. Furthermore, for each signal with negative slack, we need to speed it up only until we eliminate all its negative slack, so we have: Per-processor cost to Eee aed Cee as = ‘MeiReadl(~20p5) (6Ops at S/Sps = $12 Fegitite 40ps) & ‘ALUOp(-355) “11ps at $1/Sps = $23 LUST C50ps) Chapter 4 Solutions 4.40.6 The signal with the most negative slack determines the new clock cycle time, The new clock cycle time increases the slack of all signals until there are is no remaining negative slack, To minimize cost, we can then slow down signals that ‘end up having some (positive) slack, Overall, the cost is minimized by slowing signals down by: SSS eo a 50ps 40ps 140ps 20s Ops SOps 20s Tops Ops b.| 800s ‘80ps 1809s 1809s 2500s 450 2500s Ons, eps Solution 4.11 4.111 ‘Goo1000011000000000001000000 ‘Gooo100011000000000000110000 co ‘10000 ‘001100 PO w Ad (FO + 4)t0 branch MAK tO jUP mete a {ER Ranga ee RnSre he | Seem amare Pie il ee Semen 4.114 PT wracmer [ num [ non/nune | severe | amon | See b [soormssn| 3 x FO+a Posa 4415 2andi6 Poa FO aand 6x4 be —sai-3 Poa Foramdaxa 441.6 Pel teat Regstr i [Renitegeer2] witeeer [WneDan| Regt a 2 3 be 1 3 x@ro x 0 Solution 4.12 4424 Pits | ors = ‘5005 365005 be 2005 ‘0088 412.2 a = = 250005 1660s be 100005 ‘00s 412.3 Hl rd eee = eM 40055 & F 19005 4124 2% aa Chapter 4 Solutions = [em | [oo | 4.42.6 We already computed dock cycle times for 
pipelined and single cde ‘organizations in 4.12.1, and the multi> 4 bts Index Bhary address mod 16 HIL/MISS: MyM My Hy Ny MyM My MAM ’, | inary ares: 00000110,, 11010110,, 10104114,, 11010140,, 00000120, 01010100, (01000001, 10101110,010000002,011010012, 010101015, 110101115 “Re: Binary actress >> 4 tats Index Binary accress moauhss 16 Hil/Miss: M,M, MH, MMM Ml MAM, MI Md 5.32 ‘a. | Binary adress: 1p, 10000110,, 11010100, 1, 10000114, 110101012, 10100010;, 10100001, 102, 101100, 101001, 11011101 “ee: Binary acess >> 3 6s Index inary axes >> 1 bf) mod 8 HIYMISS: MyM, Ml Hy H, HM MMMM, ML 3. | Binary dcress: 00000110,, 11010110, 101011141, 110101102, 000001103, 01010100, (01000001, 10110000,,01000000,,01101001,,010101015,11010111 “ag Binan/ackiess shit rigtt 3bits Index hoary adress shit ght bib modus 8 HIY/MISS: MMM, Hy My MyM Hy Hy My Hy 5.3.3 GE hit, C2: Shits, OA: hits OL: Sal ne = 25 x 4 Dx 12 = 209, Same = 25x 94 3x12 = 261,C3 Siall me = 25% 10 + 4% 12= 298 ’. | CL dhit stall tine = 25 x 11 + 2x 12= 290 gees 2:4hits, stall time = 25 x8 + 3x 12 = 236 cycles CE 4nits, sail me = 25 x8 +5 x 12 = 260 jokes 534 Using equation on page 351, n= 14 bits,m=0 @. ward per bck) Dx x32 + (G2= 14-02) +1) = S00 Hots Catoulztng for 16 vor’ blogs, m= 4, i n'= 10 then the cache is 544. Kos, ane i: ‘te cache is 1 Mbit. Thus the cache has 128 KB of data. “Te lager cache may have a lager access tine, leading to lover performance, 1 | Using equation total cache ste = 2°(2" x2 + @2=n=m=2)+1).n= 13s, m=1.@ wards per bho 29x Q!x 32 + G2~ 13-1 -2) 4 1)= 2! x (6a + 17) = 663 Hews total cache se Form-=4 (16 wor blocks), n= 10 then the cache is 4. Mots and ifr'= 11 then cache is 1 bis. Thus the cacne has 64 KB of cata, “The [ager cache may have lange access tine, lec to lover performance, si7 Chapter 5 Sokitions 5.3.5 Fora larger direct-mapped cache to have a lower or equal miss rate than a smaller 2-way set associative cache, it would need to have at least double the cache block size. 
The advantage of such a solution is less misses for near by addresses (spatial locality), but with the disadvantage of suffering longer access times. 5.3.6 Yes, it s possible to use this function to index the cache. However, informa tion about the sixitsis lost because the bits are XOR'4, so you must include more tag its to identify the address in the cache. Solution 5.4 BAB a [2+ ees=282 by | 1+ 20/8/32)= 4078 BAA 3 Tey © [4 | a5 | 132 | 252 | aco | 1008 | 50 | 100 | S100 | 180 | 2100 To fofolz{s|{m|iwfo fitfe|i|ulea Cee (e[w[u[u[u[mu fatal [ww cl (e[s[w [nu [n[y [x[s [vy [s[v BAS 0.25 BAG : <000001, 0001.5 mem 1024]> <000001> 0011; mem 16]> <001011;,0000;, mem] 176]> <001000;, 0010; mem[2176]> <001110,, 0000; mem[224]> <001010,, 0000; mem 160]> 55.2 ‘Allocate cache bloc forthe missing dat, sekct a eplanerent ci {victim ay, put Ito he wrteack buffer, which wil be furter forwarded nto 2 wt ater {sue te miss request othe L2 cacte; Init in L2, source data irto Li cache; if miss, send ite request to memory, Data aries and is stall in LL cache; Processor resumes @EcUION and hts LA cache, Set te ay bt TLL miss, aocate cage blod« forthe missing data, eect a replacement wot, {evict cy PLE Litto De wrtsnadk ures, which val be Fue foverced into L2 we borer; 's5e ite miss request othe L2 cacte; Innit in L2, soure data into 4. came, goto miss, Send we request tormemory, Data amves and is install in L2 caches Data artves and i install in Lt. caches : PIODESSOFTeSUTES @SCUION a His in LA. 
cache, sete diy 55.3 Shrilr to 552, event tat @) Iii dean, putt into a victim bufer between the Li and LZ caches; ifvici dirty, put tnt the writeback buffer, whi wile further forwarded into L2 wit ber (4) hit in L2, source data into L1 cache, imate te 2 comys Siri to 552, ecent hat if Lvctin dean, put itinto a vim buffer between the L1 and L2 caches; if LL victin diny, putt int the waiteback buf, wich wil be further fonirded into L2 write buffers if ht 12, source data into L1 cache, alate copy in L2; 554 (0-466 reads and 0-160 writes per instruction (05 cycles). Minimal read/write bandwith are 0664 and O60 Venerol, (0.452 reads and 0.120 writes per instruction (05 cycles). Minimal read/write bandwicths are C608 and CABO byenEFOCIE. Chapter 5 Sokitions 555 ‘a | 0.082 reads and 0.0216 writes par nstructon (0.5 GS), Minimal rexd/wnte banaviatis Te (0.368 an 0.0884 tyteperoice. be | 0.084 reads and 0.0162 writes per nsttuctien (0.5 cles). Minimal read/write bandwidths ae 0.336 and 0.0648 beperoce. 55.6 “a | wrioback, writsalboate cache Sars banahicth, Mrinal rx /vait barcnkitis are O07 and 0.1152 byteperoce. i | Witoback witoclkcate cache saves bandnidih, Miinal read/write bardaiihs are O75: an 0.0863 byteperoce Solution 5.6 561 {2-SH TSS TAC. The MSS Fale GOESAT Change wih Cache Size OF working et. These ae cold MISSES. 5.62 25, 2h and 3.125% miss rates for 1Skyte, Grove and 12yte WoO!s. Spatial cay. 5.6.3 With next-line prefetching, mis rate will be near 0%. 5.64 @ [ie5Ke. b. [eate. 5.65 = [Sore b. [eate. 56.6 = | care, b. | Gane. Solution 5.7 5.74 . [Px_ [asa cre P2__| 452.GHe Pa__| 4.08 Gie P2__ | s26Mre 5.7.2 @ [PL [860m | 1387 oces P2 [626s | SABoces [PL [sa7ms | aadoces 2 |[saens | 3200005 5.7.3 @ [Pl [563 ro 2 | 405 w [pa |2as P2 2 [1.79 5.74 a. | 88ins 1421 es ers b. | 3655 30 gees Batter 5.7.5 a [576 ». [200 5.7.6 PL with L2 cache: OPI = 5.76, P2:CPI= 4.05, P2issill faster than Pl. 
even witian LL.cache PL wit 2 cadre: OPI = 202, P2:GPI= 1.79, P2igsill faster than Pi even win an Li.cacre Chapter 5 Sokitions Solution 5.8 5.81 ‘a | Binary axkress: 4p, 10000140,, 441010100,, 1, 10000L 1, LIOIOLO>, 1000010, 410100001 10> 1011005 101001,, 11011101, Tag: Binary access >> 3 bits Inner Binary adkress => Li) mod 4 HY Miss: M, Ml MH, H, H, MM, Ml, MM ML Final contents (block adresses). $e. 00:.0, 101000002, 101000, Sa 01: 10100010,, 10, Se 10: 110101005, 101100, Se 11: 10000110; be | Binary adress: fits 7S tag, 2-1 index Oblodk offs) (00000 11.05 Miss 1101011 03 Miss 3010111 1 Miss 4101011 05, Hit (00000 1103 Hit 1010 1005 Miss (010000015 Miss 101011103 Hit (010000005 Miss (011010015 Miss 01010 1015 Hit 101011 13 Ht Tag: Binary aches >> 3 bits Indexer see Binary acess >> 1 bit) mad 4 Final cache contents (block adresses, in base 2): ‘et: blocs (3 slots for 20rd blocks per set) {00 : 010000003, 01000000, 011010003 OL 10 : 01010100, 11 000001105, 110101105 10101110 582 ‘a | Binary axkress: 4p, 10000140,, 441010100,, 1, 10000L 1, LIOIOLO>, 1000010, 410100001 10> 1011005 101001,, 11011101, Tag: Binary access Index None (only one set) Hil Miss: MM, Ml H.M, MM MAM, M, MM Final contents (block adresses) 100001113, 110101015, 10100010, 10100001, 103, 101100>, 101001 110111019 Bary addeS5: ots 7-0 ta, ro Indexor bloc oft) (000001102, Miss 110101103, Miss AOlOLIL, Miss 0101105, Ht (000001103, Hit o1ot0100,, Miss (10000012, Miss 101011103, Miss ‘010000002, Miss 11010012, Miss (OLOLOIOLs, MSS (LRU discard block 10101111.) LIOIOLLI, MiSs, (LAU discard B10 010101005) “Be Binany acess Fal cacte conten (look adsresses):(B cache Slots, 15H01 per cache Sto) (00000110. 
340101105 ougnoio13 so10111 1000013 10101110 (010000005 11010013, onnoioi Ao10111 58.3 Binay acess: 12, 10000110,, 11010100,, 4p, 100001113, 110101013, 101000103, 101000019, 102, 101100, 1010019, 11011101 HHiy/Miss, LAU: NM ML He Hy Hy M,N ML Mi ML Hy/Miss, MRU: M, MM, H, HM, M, MIM, M, ML Given 2 verd blocks, the best miss rate is 9/12, Binary aides: ts T= tg, Lob ofe) (Bache slots, 2ords per cache slot) (0000011 05,” Miss {01011 03, Miss 1010111 13, Miss 401011 0;, Hie ‘000011 03, Ht 010101005, Miss (0100000 15, Miss 4010111 0;, Hie (0100000 03, Ht 0110100 15, Miss 101010 4, Ht 01011 43, Hit No need for LRU or MRU replacement polis, henoe best miss rate Is 6/12. Chapter 5 Sokitions 584 a | Baer 20 Memory miss ces: 125 oles (1/3) ns/clock= 375 clock qyctes 1, Tal ©: 20 + 375 x 5% = 20.75/39.5/11.375 (roma/coubk/ tal) 2. Total PI 20 + 15% 5% + 375% 3h = 14/25.25/8.375 S. Total PI: 200 + 25 5h + 375 x LS = 10/16,7S/625 b. | Bac CPL 20 Memory miss gles: 100 deck oes 4. Toll OP = base CPI + memony miss cycles x 1st level cache miss rate 2. Total CP = base OP + memory miss odes * Bal miss rale w/2nd vel dectmapnad ‘cache + 2nd fevel crectmanped speed x 1st vel cache mis rate {3 Total CPI = base CP + memory miss odes x global miss rae w/2nd level Baay sel assoc cache + 2nd level Sey set assoc speed x 1st vel cache miss rete 41, Total P| (using st evel cathe): 2.0 + 100 0.04 =6.0 1 Total | (using 1st evel cathe): 2.0+ 200x 0.04 = 10.0 1 Total | (using Ist evel cache): 210+ 50% 004=4.0 2 Total CP (using 2rd level drectenanped cache: 2.0:+ 100% 0.04 + 100.08 2 Total PI (using 2nd level rected cache: 2.0 + 200% 0.04 + 10x0.04= 10.4 2 Total PI (sng 2nd level drectenanped cache 2.0+ 50% 0004+ 10x0.04= 4.4 {3 Total CP (using 2nd level Bana set assoc cache: 2.0+ 100x 0016 +20x004= 4.4 [3 Total CP (using 2nd level Buta sek assoc cache. 2.0+ 200 Q016 + 20x004=60 ‘3 Total CP (using 2nd level aay sek assoc cache. 
2.0+ 50X0.016+20X0.04=3.6 585 am | Ber 20 Memary miss ces: 125 oycles/(1/3) ns/clock = 375 clock eyes Total PI: 20 + 15x 5 + 50% 3h + 37x LK = 8125, ‘This woul provide beter perfomance, but may complicate te design of the processor. This could kked to:mrbre complaccache coherendy notwcseci cle tine, lng arcmoe expensive chips. wb. | Base CPE ZO Memory miss qs: 100 dock oes 1. Total OP = base CP -+ memory miss Q/CeS x aba MISS rate w/2re level drecemapped ace + 2nd tvelcirectenepped speed x 1st hve cache miss rate 2 Total CP = base OP + memory miss O1GES x Blobal MISS rate W/SKd evel dkockenaRDed cacti + 2nd level cirectenapped speed > 1st Evel cache miss rate + Se evel rec mapped speed x 2nd level cacne miss rate 4. Total I (using Dn evel dectmapped cache) 2,0 + 100x004 + 10x0.01= 64 2: Total CP (usr Ses vel drecemapped Cache}: 200 +100 x0,013 + 10% 0104 +50 x ona = 5.7 ‘Tis would prove beter performeroe, bu may compa the deskof te processor This cous kkod fo: mare complex care caer, Wereased ce UTE, EE and MAS ERPENSHE CS. 
5.8.6
a. Base CPI: 2.0
Memory miss cycles: 125 cycles/(1/3) ns/clock = 375 clock cycles
Total CPI: 2.0 + 50 x 5% + 375 x (4% - 0.7% x n) = 14/10
=> 2 MB L2 cache to match DM
=> 2.5 MB L2 cache to match 8-way
b. Base CPI: 2.0
Memory miss cycles: 100 clock cycles
1) Total CPI (using 2nd level direct-mapped cache): 2.0 + 100 x 0.04 + 10 x 0.04 = 6.4
2) Total CPI (using 2nd level 8-way set assoc cache): 2.0 + 100 x 0.016 + 20 x 0.04 = 4.4
3) Total CPI = base CPI + cache access time x 1st level cache miss rate + memory miss cycles x (global miss rate - 0.7% x n), where n = further unit blocks of 512 KB cache size beyond base 512 KB cache
4) Total CPI: 2.0 + 50 x [0.04] + [100] x (0.04 - 0.007 x n)
   for n = 0, 1, 2, 3, 4, 5: CPI = 8, 7.3, 6.6, 5.9, 5.2, 4.5
Hence, to match 2nd level direct-mapped cache CPI, n = 2 or 1.5 MB L2 cache, and to match 2nd level 8-way set assoc cache CPI, n = 5 or 3 MB L2 cache.

Solution 5.9

Instructors can change the disk latency, transfer rate, and optimal page size for more variants. Refer to Jim Gray's paper on the five-minute rule ten years later.

5.9.1 32 KB.

5.9.2 Still 32 KB.

5.9.3 64 KB. Because the disk bandwidth grows much faster than seek latency, future paging cost will be closer to constant, thus favoring larger pages.

5.9.4 1987/1997/2007: 205/267/308 seconds (or roughly five minutes).

5.9.5 1987/1997/2007: 51/533/4935 seconds (or 10 times longer for every 10 years).

5.9.6 (1) DRAM cost/MB scaling trend dramatically slows down; or (2) disk $/access/sec dramatically increases. (2) is more likely to happen due to the emerging flash technology.
Chapter 5 Sokitions Solution 5.10 5.104 ‘a | Virtual page runtbar: Aokkess>> 12 bis H: Hit in TL, M: Miss in TLB itn page table, PF: Page Fault 0,7,3,3,1,4,2(M,H.M.H.PR HPA 1B Valid Tag Physbal Page Number 1 3 6 1 7 4 1 1 1B 1 2 “ Page table Valid Frysical nage or inci 5 iB 4 6 a u Disk a Disk Disk 2 | inary adores: @l headecmad, (its 15-12 vital page, 41-0 page offset) ‘2B, Page Foul, isk => prystcal nage Dy > TLB Siot 3) 7 8F4, Hit TLE 4400, Miss iNTLB,(=>TLB sit 0) BAG, Miss 11 TB, (=> TB Sit2) ‘SDE, Page Fault, disk => Fiysical page E,(-> TLB sket 3) 4100, HUNT B60 HINTLE 18 Vaid Tag sical Page 1 9 1 1 1 Prxe tole BRporonnerong a 5.10.2 ‘a | virual page number: Aacress >> 14 Dns HEHIUIN TLB, ME Missin TLE itn page tbe, PF: Page Foutt 0,3,2,1,0,0,2.04,H,5 4, HHH) 18 Vals eg ysical Page Number 1 B g Larger page ses allow for mere adkresses tobe stored a singe page, potentially ‘campered to sale pages, {decreasing the amount of pages thet must be brought n from disk and WerEasig Ne oer Of te TLE, However, fa program uses aakuresses fa Sparse fashion (er exemple rarwomly ‘excessirg a large main en Were wil be an exra penalty from wansferng larger ages Chapter 5 Sokitions bb | Binary acoress: ll heqaceomap, (ats 15-14 virtual page, 43-0 pase ofset) AED, Migs 11 TLS, > TLE SOL) 35F2, Page Fat, dis => prysical page D, > TB sot 1) 400, Hitin ns S586, Page Fault cis => prysical page E, (=> TLB sit 2) Line, HtinTls (tod, Hin TL, S060, Hit TS ge Frysical Page peers gooeere g a disk disk PROGR ORDER ER gg g o 5 Larger page sizes allow for mare axcresses tobe stored ina single nage, potentially Uecezsing the amaunt of pages that must be brought in ftom disk and nereasng the coverage of the TLE, However, if program uses addresses in a sharse fashion (or exerble rentomly, accessing large matrix, ten there willbe an extra penaly ftom Yensfering lense pages compared to smaller rages. 
5.10.3 Virtual page mune: Adcress => 12s 0,7,3,3,1,1,2=> 0, 111,011, 011,001, 001,040 2 way set associative: Te N= Lit 16 Vaid Teg PN ais ag PN 1 o 5 1 a 1 ° Bo a 6 Diecenapped Tee VPN S> 2 bits 16 Valid Tag hsical Page Number 1 o 5 1 o iB 1 o 4 1 ° 6 ‘The TLB is important o avoiting paying high aocess tines to memory in oer to tense Vital aciresses to piysical adcresses. memory accesses are frequent, then the TLE wll become even mbre important. Without TLB, the page table woul! have 10 be referenoed upon Geary access usinga vitualackkesses, causing a significant Sendown Chapter 5 Sokitions Be | Binary adress: (al hecrdecima, (its 15-12 vitual page, 110 page offset) 2480, Page Fault, disk => physical page D, (= TLB set 0, kt 1) 7 8F4, Miss in TLS, (=> TLB set 1, sit 1) ACD, Miss in TLB, (= TLB set, slot 0) BAG, Miss in LB, (= TB set1, sk) 94DE, Page Fault, disk = physical page E,(=> TLB set 4 slot 1) 4100, Hit TB, 5060 HtinTlB Day set associative TLB (bits 15412 virtual page = bits 15413 tag, bits 12 set} (pete: tne stamps axcerdng tat sir physical page nuribers) Valid TagyTime) Physical Page 9 1 aa 1 22 D 1 BS c 1 a4 E Binary adress: (all hexadecimal), its 15-12 virtual pege, 11-0 page offset} 2460, Page Fault, disk => physical page D, (=> LB sit 2) Teri, Hin TB 4400, Miss in TLB, (=> TLB st) BAG, Miss in TLB, (= TB sbt3) ‘Q4DE, Page Fault, disk pysioal page F (=> TLB slet 1) 4100, Hit TB, 5060 HtinTlB ectmappad TLB (bits 15412 vitual page = bits 13-12 TLB sla} Valid Tag Pscal Page g 1 4 1 9g F 1 2 D 1 5 c ‘The TLB is important to acing paying high aocess tines to memory in ower to tenslate virtual actresses to piysical adresses. memory accesses are frequent, then the TLE ill become even more important. Without 2 TLB, te page table woul have to be referenoed upon every access using a viral adresses, causing a siicant slowdown. 
5.104 ‘2K page = 12 offset bits, 20 page number bis 2)= 1M page table entries {LM entries » 4 bytes/ertry= ~4 NIB @” bytes) page table per application 2 byes x5 apps = 20.97 MB tal ‘vital 2ag703S size OF 64 BS 16 KB (24) page size, § (2") bytes per nage table entry 64 -14= a0bisor 2” page table ents with & byes per erty yels total of 2° bytes for a0 pase table "Dial for 5 applications = 5x 2" tes 5.10.5 “2K page = 12 offset bits, 20 page number bis 256 ertries (bits) for frst eval => 12 bits => 4096 envies per second vel Minimum: 128 fst evel erties used per amp 1128 entries x 4086 entries per seoond kevel = “524K (29) entries 524K 4 bytes/erty = ~2 MB (2%) second level page ible par app {128 envies x 6byees/entry = 768 bytes frst vel page table per arp ‘10 MB total forall 5 apps Maximum: 256 frst level envies used per app 256 entries x 4086 entries per second level = ~1M @) entries M4 bytes/ertyy = ~4 NB (2) second level nage table per pp 256 entries x 6 bytes/ertty = 1536 bytes frst vel page table per ann 20:58 MB total forall apps ‘tual cress sie of 64 bls 64 = 14= 20 bits or 2" page table entries 256 (2%) entries in main table al 8 (2°) bytes per page table entry (6 rounded up to nearest power 2) ‘otaof 2! 
bytes or 2 KB for main page table
40 - 8 = 32 bits or 2^32 page table entries for 2nd level table
with 8 bytes per entry yields total of 2^35 bytes for each page table
Total for 5 applications = 5 x (2 KB + 2^35 bytes) [maximum, with minimum as half this figure]

5.10.6
a. 16 KB direct-mapped cache
2 words per block => 8 bytes/block => 3 bits block offset
16 KB/8 bytes/block => 2K sets => 11 bits for indexing
With a 4 KB page, the lower 12 bits are available for use for indexing prior to translation from VA to PA. However, a 16 KB direct-mapped cache needs the lower 14 bits to remain the same between VA to PA translation. Thus it is not possible to build this cache.
If the cache's data size is to be increased, a higher associativity must be used. If the cache has 2 words per block, only 9 bits are available for index (thus 512 sets). To make a 16 KB cache, 4-way associativity must be used.
b. virtual address size of 64 bits
16 KB (2^14) page size, 8 (2^3) bytes per page table entry
16 KB direct-mapped cache
2 words or 8 (2^3) bytes per block, means 3 bits for cache block offset
16 KB/8 bytes per block = 2K sets or 11 bits for indexing
With a 16 KB page, the lower 14 bits are available for use for indexing prior to translation from virtual to physical. Considering that a 16 KB direct-mapped cache requires the lower 14 bits to remain the same between translation, it is possible to build this cache.

Solution 5.11

5.11.1
a. virtual address 32 bits, physical memory 4 GB
page size 8 KB or 13 bits, page table entry 4 bytes or 2 bits
#PTE = 32 - 13 = 19 bits or 512K entries
PT physical memory = 512K x 4 bytes = 2 MB
b. virtual address 64 bits, physical memory 16 GB
page size 4 KB or 12 bits, page table entry 8 bytes or 3 bits
#PTE = 64 - 12 = 52 bits or 2^52 entries
PT physical memory = 2^52 x 2^3 = 2^55 bytes

5.11.2
a. virtual address 32 bits, physical memory 4 GB
page size 8 KB or 13 bits, page table entry 4 bytes or 2 bits
#PTE = 32 - 13 = 19 bits or 512K entries
8 KB page/4 byte PTE = 2^11 pages indexed per page
Hence with 2^19 PTEs will need 2-level page table setup.
Each address translation will require at least 2 physical memory accesses.
b. virtual address 64 bits, physical memory 16 GB
page size 4 KB or 12 bits, page table entry 8 bytes or 3 bits
#PTE = 64 - 12 = 52 bits or 2^52 entries
4 KB page/8 byte PTE = 2^9 pages indexed per page
Hence with 2^52 PTEs will need 6-level page table setup.
Each address translation will require at least 6 physical memory accesses.
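The calculations in 5.11.1 and 5.11.2 follow a fixed pattern, which the sketch below reproduces. It assumes that a flat table has one entry per virtual page and that each table in a multi-level scheme fills exactly one page; the function names are illustrative and do not come from the exercise.

```python
def log2_exact(n):
    """Base-2 logarithm of a power of two, computed with integers only."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def flat_page_table(va_bits, page_bytes, pte_bytes):
    """Return (number of PTEs, table size in bytes) for a single-level table."""
    num_ptes = 2 ** (va_bits - log2_exact(page_bytes))
    return num_ptes, num_ptes * pte_bytes


def levels_needed(va_bits, page_bytes, pte_bytes):
    """Levels required when every table at every level occupies one page."""
    page_number_bits = va_bits - log2_exact(page_bytes)
    index_bits_per_level = log2_exact(page_bytes // pte_bytes)
    # Integer ceiling division.
    return -(-page_number_bits // index_bits_per_level)


# Case a: 32-bit virtual address, 8 KB pages, 4-byte PTEs.
ptes_a, size_a = flat_page_table(32, 8 * 1024, 4)
levels_a = levels_needed(32, 8 * 1024, 4)

# Case b: 64-bit virtual address, 4 KB pages, 8-byte PTEs.
ptes_b, size_b = flat_page_table(64, 4 * 1024, 8)
levels_b = levels_needed(64, 4 * 1024, 8)

print(ptes_a, size_a, levels_a)
print(ptes_b, size_b, levels_b)
```

The number of levels is also the minimum number of physical memory accesses per translation, since each level requires one lookup.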
