"Von Neumann Architecture is named after the mathematician to whom the idea is attributed. As mentioned on page 7, there is some dispute as to whether von Newnann really wes the sole author of ‘the poper “First Draft of « Report on EDVAC” in which these ideas rst appeared in the English speaking world, and whether that poper was the first time such on architecture was Proposed, given that Zuse designed computers using the same architecture sometime before that paper ‘wae written * Modern processors may goin additional performance by having more than one ALU. In conjunction with a pipeline (page 53) this ‘Bfectvely allows more than ‘one instruction to be executed at a time. The Central Processor Unit Suggested Reading J Gloan Brookshear, © “An Invitation to Computer Science,” G Michael Schneider and Judith L Gersting. + “Computer Organisation and Architecture,” William Stallings. Prentice Hall, cba ‘Von Neumann Architecture ‘The basic intemal organisation of modern computers follows what is known as Von Neumann Architecture ‘One key idea is to use general systems wherever possible, For example, we need some means of storing both programs and data, 90 we just use a general read-write memory for both tasks. As a second example, we need ta perform logical and arithmetic operations on data that might be stored in different places, but we can use just one Arithmetic and Logie Unit (ALU) to perfor all such operations. In summary, the von Nournann architecture is as follows: '* Data and instructions are stored in a single read-write memory. (In real computers, we also have secondary storage.) Bach cell in the memory can identified by its address Execution of programs usually ocours sequentially. Each instruction is interpreted by a central processor unit (CPU). ‘The required logical or arithmetic operations specified by the instruction are performed on the appropriate data by a general purpose Arithmetic and Logic Unit (ALU).? 
The basic architecture of a modern von Neumann computer is given schematically in Figure 4.1.

[Figure 4.1: Basic Architecture of a Modern Computer]

Registers

To keep track of memory addresses and pieces of data that need to be operated on, the Central Processor Unit contains several registers. A register is a small piece of memory, usually with the same number of bits as the size of a word for the processor concerned (e.g. 4 bytes for a 32 bit processor).

Some registers have a very specific purpose in the operation of the CPU. Other registers may have limitations placed on them (they may only be used in certain ways by certain machine code instructions). For example, some registers, called address registers, might only be able to contain data that is to be interpreted as a memory address. Some registers may be general purpose registers, which can be used by any instruction and can represent many different things (e.g. both data and memory addresses).

Special Purpose Internal Registers

Some registers that are used in the internal operation of the processor are:

* The Memory Address Register (MAR) holds the address of the next memory location which is going to be read or written to.
* The Memory Buffer Register (MBR) contains the instruction that has just been read from memory, or the data that has been read or is about to be written to memory. (Note: There may also be an Input/Output Address [...] physical devices (or input/output modules) [...].)
* The Program Counter (PC) is used to keep track of our current position in a (machine code) program.
* The Accumulator (AC) in which results of calculations may be stored.

In addition, there is an instruction register which stores an instruction fetched from memory whilst it is being decoded, and a control register whose individual bits record aspects of the state of the CPU, such as whether the result of the last computation was zero. All these registers are sketched in Figure 4.2, together with the ALU.

Note: The control register also indicates if the CPU is in the process of handling an interrupt (page 77).

[Figure 4.2: The Basic Registers of a CPU, showing the Program Counter, Arithmetic & Logic Unit, Instruction Register, Control Register, Memory Address Register and Memory Buffer Register, with the Address Bus and Data Bus]

Connections Between CPU and Memory

Here, we will concentrate on the connection between the CPU and Memory. The connection consists of several busses, namely: the Data Bus, the Address Bus and the Control Bus. Figure 4.3 shows the CPU-Memory bus.

[Figure 4.3: Detail of CPU-Memory Bus]

Each bus consists of several 'wires', 'lines' or connections that can be set to 1 or 0 (or left 'floating'). On the data bus, one line is required for each bit that can be transferred from memory simultaneously. The bit pattern on the address bus will correspond to an address in main memory. The control bus consists of several lines that control various aspects of the computer. We will restrict our attention to just one of the control signals carried by the control bus: the one that governs reading and writing to memory. Figure 4.4 shows the Read-Write control lines.

Reading from Memory

When the central processor reads from memory the following sequence of actions is taken:

1. The source address of the required data/instruction is put in the Memory Address Register (MAR) of the Central Processing Unit.
2. The MAR is connected to the address bus, so that the states of the lines of the address bus reflect that register's contents.
3. The read/write line is set to indicate a read operation.
4. Next, the memory sets the lines of the data bus so that they reflect the contents of the memory at the indicated address.
5. The Memory Buffer Register is then connected to the data bus so that its contents are set to the pattern of bits on the data bus (and hence to the pattern of bits in the desired memory location).

[Figure 4.4: Read-Write Control Lines, showing the Central Processing Unit, the Address Bus, the Data Bus and the Read/Write Control Line]

Writing to Memory

When the central processor writes to memory the following sequence of actions is taken:

1. The required destination address is put in the Memory Address Register (MAR) of the Central Processing Unit.
2. The data to be written is put in the Memory Buffer Register.
3. The MAR is connected to the address bus, so that the states of the lines of the address bus reflect the MAR's contents.
4. The read/write line is set to indicate a write operation.
5. The Memory Buffer Register (MBR) is then connected to the data bus so that the pattern of bits on the data bus reflects its contents.
6. The memory sets the contents of the memory at the address on the address bus to mirror the pattern of bits on the data bus (and hence the pattern of bits in the MBR).

The Fetch, Decode and Execute Cycle

When running a program, the CPU performs the following sequence of actions:

1. fetches an instruction from main memory;
2. decodes the instruction;
3. executes it.

This is repeated until a halt state is reached.

Note: Sometimes, this cycle is referred to more simply as the fetch-execute cycle.

The term machine cycle is sometimes used to refer to a single fetch, decode and execute cycle. Complex instructions may take more than one machine cycle to execute. This can happen when sophisticated addressing modes are used, for example (page 50).

In the CPU, the special purpose Program Counter (PC) register keeps track of the address of the next instruction.
After fetching an instruction, the number stored in the Program Counter is incremented by 1, so that it is then pointing to the subsequent instruction. The instruction that is fetched is loaded into the Instruction Register (IR) where it is decoded. The instruction is a binary bit pattern which instructs the CPU what to do in the execution phase. The execution phase may take many steps.

Kinds of Instructions

A typical central processor unit understands many instructions, between a few dozen and several hundred. Broadly speaking, these instructions can be classified into the following kinds:

1. Transfer of data between CPU and Memory.
   * Read from memory;
   * write to memory.
2. Transfer of data between CPU and some Input/Output device.
   * Read from a device;
   * write to a device.
3. An arithmetic or logical operation on data.
   * Add two numbers;
   * invert all the bits;
   * rotate all the bits one place to the right.
4. Control instructions (Branch and Conditional Branch instructions).
   * Start executing code from a particular address;
   * if the result of the last subtraction was zero, jump to some other part of the program.

In many cases, a single instruction may be of more than one kind.

Note: It can be hard to count the actual number of instructions; in general, given two instructions, it is not always clear whether they are genuinely distinct instructions or just variations of each other.

Note: Broadly speaking, a central processor unit is said to have a complex instruction set if its individual instructions combine aspects of more than one kind.

Format of Instructions

The bits in a word representing an instruction are usually split into:

1. An op-code;
2. One or more operands (or references to operands).

The op-code indicates what operation is to be performed. The operand(s) are the data used in the execution of the instruction.
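The memory read and write sequences, the fetch-decode-execute cycle and the split of an instruction word into op-code and operand, all described above, can be drawn together in a small simulator. This is a minimal sketch under stated assumptions, not a model of any real processor. It assumes a hypothetical 16 bit word with a 4 bit op-code and a 12 bit address operand, an accumulator machine with four illustrative instructions (LOAD, STORE, ADD, BRZ), and an extra HALT op-code of 0000 that is invented here so that the loop has a halt state. The class name `TinyCPU`, the helper `encode` and the method names are likewise illustrative.

```python
# Illustrative op-codes (assumptions for this sketch); HALT is invented so the loop can stop.
HALT, LOAD, STORE, ADD, BRZ = 0b0000, 0b0001, 0b0010, 0b0011, 0b0100


def encode(opcode: int, address: int = 0) -> int:
    """Pack a 4-bit op-code and a 12-bit address into one 16-bit word."""
    return ((opcode & 0xF) << 12) | (address & 0xFFF)


class TinyCPU:
    def __init__(self, memory: list[int]) -> None:
        self.memory = memory  # one memory for both instructions and data
        self.pc = 0   # Program Counter
        self.ac = 0   # Accumulator
        self.ir = 0   # Instruction Register
        self.mar = 0  # Memory Address Register
        self.mbr = 0  # Memory Buffer Register

    def read(self, address: int) -> int:
        """Read: the address goes into the MAR, the memory contents arrive in the MBR."""
        self.mar = address
        self.mbr = self.memory[self.mar]
        return self.mbr

    def write(self, address: int, value: int) -> None:
        """Write: the address goes into the MAR, the data into the MBR, then into memory."""
        self.mar = address
        self.mbr = value & 0xFFFF
        self.memory[self.mar] = self.mbr

    def step(self) -> bool:
        """Run one fetch-decode-execute cycle; return False once HALT is executed."""
        # Fetch: read the word addressed by the PC into the IR, then increment the PC.
        self.ir = self.read(self.pc)
        self.pc += 1
        # Decode: split the word into its op-code and its operand address.
        opcode, address = self.ir >> 12, self.ir & 0xFFF
        # Execute.
        if opcode == HALT:
            return False
        if opcode == LOAD:
            self.ac = self.read(address)
        elif opcode == STORE:
            self.write(address, self.ac)
        elif opcode == ADD:
            self.ac = (self.ac + self.read(address)) & 0xFFFF
        elif opcode == BRZ:
            if self.ac == 0:
                self.pc = address
        else:
            raise ValueError(f"unknown op-code {opcode:04b}")
        return True

    def run(self) -> None:
        """Repeat the cycle until a halt state is reached."""
        while self.step():
            pass


# Program: AC = memory[10] + memory[11]; memory[12] = AC; halt.
memory = [0] * 16
memory[0] = encode(LOAD, 10)
memory[1] = encode(ADD, 11)
memory[2] = encode(STORE, 12)
memory[3] = encode(HALT)
memory[10], memory[11] = 7, 5

cpu = TinyCPU(memory)
cpu.run()
```

After `cpu.run()`, `memory[12]` holds 12 (the sum of 7 and 5) and the Program Counter holds 4, one past the HALT word. Note that the program never names the MAR or MBR; they are only touched inside `read` and `write`.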
A Note on Operands

Some texts use the term operand to refer to the data used in the execution of the instruction, wherever it is located. The instruction then contains an op-code and one or more address fields that are used to find the relevant data. Others use the term operand to refer to the bits in the instruction, regardless of whether they contain the relevant data or some pointer to the location of the actual data.

Example Instruction Set

A very simple 16 bit computer might use the first 4 bits of each instruction word for the op-code, with the remaining 12 bits giving the memory address of the operand. In a CPU with a Program Counter (PC) and an Accumulator (AC) we could have the following instructions:

Op-code   Mnemonic   Description
0001      LOAD       Load AC from Memory
0010      STORE      Store AC to Memory
0011      ADD        Add to AC from Memory
0100      BRZ        Branch to Address if AC is zero

Note: The Memory Address Register and Memory Buffer Register cannot be manipulated directly by program instructions.

The mnemonics can be used to present a machine code program in a human readable form. They form the basic expressions used in Assembly Language programming. There is usually a very simple relationship between the mnemonics and the underlying bit patterns of the corresponding machine code. The expression [...] is used to stand for the 12 bit operand that follows the op-code, and which is to be interpreted as an address in these instructions.

Register Based CPUs

Real CPUs usually have more accessible registers than just a Program Counter and an Accumulator. There may be a combination of Address Registers, for storing memory addresses, and Data Registers, for storing information to be used in arithmetic and logical operations. Alternatively, there may be General Purpose registers whose contents can be taken to be either addresses or data.

Addressing The Operands

The fields in an instruction that address the operands have a limited number of bits, but we may need to produce addresses that require more bits. To solve this problem, we can adopt different methods for addressing the operands of an instruction. Bits within an instruction can indicate which addressing method is to be used. Each of the methods involves some trade-off, for example, between the address range (the size of the memory that can be addressed) and the time taken to calculate the address. Some CPUs also allow instructions to be several words long to accommodate word length operands, or large addresses to operands.

Note: Using clocks to ensure that all data transfers are correctly synchronised simplifies the design of the digital electronics. However, there can be a disadvantage in that some components may operate below their theoretical maximum speed. It is possible to design computers, and components of computers, to work without clocks. This can result in higher performance, but leads to a more complex design.

Addressing Modes

Some typical addressing modes are as follows:
Implicit: The location of the operand is implicit in the instruction (as in the destination of a LOAD instruction above).
Immediate: The operand is contained in the instruction.
Direct: The instruction indicates the address of the operand.
Indirect: The instruction contains the address of the address of the operand.
Register: The operand resides in a register.
Register Indirect: The address of the operand is contained in a register.
Displacement: A combination of direct addressing and register indirect addressing. A value in a register and a value in the instruction are added to produce the address of the operand.
Stack: Some CPUs implement a stack in memory; the operand(s) is the top element(s) of the stack. Stack Based CPUs are a viable alternative to the more common Register Based CPU.

Clocks

The hardware of a computer needs to be kept synchronised so that all parts of the machine are operating together. For example, we need to know that the data on the data bus corresponds to the data at the address last given on the address bus, and not the last but one address. Signals called clocks are used to ensure the various components of the machine are correctly synchronised.

In a computer, a clock provides a signal that continuously switches between 1 and 0. The speed at which the signal switches from 1 to 0 and back again is called the clock frequency. It is measured in Hertz (Hz) or cycles per second (s⁻¹).

One clock controls the timing of the fetch, decode and execution steps within the processor unit. Usually, fetch and decode each take one clock cycle. The execution step can take more than one cycle. Often, a different clock controls the speed of signals on the bus.

Typical values are between 1MHz and 1GHz for the processor clock, and 1MHz and 200MHz for the processor-memory bus. The slower speeds may be used in low-power embedded computing devices.
High performance workstations use frequencies towards the upper end of these ranges.

Note: The term asynchronous is applied to hardware, and data transfers, that do not require the use of clocks. Hardware, and data transfer mechanisms, that keep in step by using clock signals are called synchronous.

Cache

If the processor were linked directly to the main memory, it would be slowed down by the lower clock rate of the bus. To avoid this, the processor can be linked indirectly to the main memory via a cache (Figure 4.5). The cache contains a relatively small amount of memory (128KB-1MB) that can operate at (nearly) the same frequency as the CPU.

The cache contains data, together with a record of where that data is located in main memory, and details of when the data was last used by the CPU. A CPU spends much of its time repeatedly accessing the same memory addresses. Once this data has been cached, the information can be accessed at the speed of the cache rather than the lower speed of the main memory.

Note: Examples of CISC processors are the Intel x86 family of processors, the Intel Pentium and the Motorola 68000 family of processors.

Pipelining

We can generalise instruction pre-fetch to obtain a longer pipeline by decomposing the CPU's operation into more steps. For example, we can decompose the fetch-decode-execute cycle further, into the following stages:

1. Fetch Instruction;
2. Decode Instruction;
3. Calculate Operands;
4. Fetch Operands;
5. Execute Instruction;
6. Write Operand.

If each stage takes the same time, then the CPU can be manipulating up to 6 instructions at any one time. At best, 9 instructions then take 14 time units instead of 54.

Exercise: Try to work out how 9 instructions can take 14 time units with a 6 stage pipeline.

Problems with Pipelines

In practice pipelining is never quite this good, for the following reasons:

1. The simple pipeline assumes that the next instruction to be executed is the next in sequence. In reality, there may be a branch.
   This causes a pipeline stall; the instructions in the pipeline must be cleared out. It takes time to fill the pipeline again. There are various strategies to cope with pipeline stalls, including branch prediction, multiple pipelines and delayed branching (a compiler optimisation).
2. The operands of one instruction could be changed by another instruction that is still in the pipeline. Various compiler optimisations can reduce this problem.
3. The mechanics of operating the pipeline can slow the execution time of each individual instruction.

CISC: Complex Instruction Set Computers

Until quite recently, the instruction sets of CPUs have steadily become larger, with some very powerful complex instructions that mimic statements in high level languages. Computers with such instruction sets are called Complex Instruction Set Computers (CISC). Within the CPU, each instruction is translated into many 'micro-instructions'.

One argument used to justify CISC is that it can simplify the writing of compilers, as statements in the high level language are easily translated into the corresponding machine code instructions. Also, the code should be compact and efficient as each complex instruction does the work of several simpler instructions.

Problems with CISC

* CPUs with large instruction sets are more complex, which can make them slower.
* In practice, it can be very hard to find the appropriate complex instruction in a given context. As a consequence, most compiled code just uses the simpler instructions.
* Although fewer instructions may be needed, the more complex instructions are typically much longer than the simpler ones.
* It can be very hard to optimise CISC code to minimise space, and maximise speed (and the use of the pipeline).

RISC: Reduced Instruction Set Computers

A different approach is to eliminate the complex instructions entirely and concentrate on optimising the instructions that occur most frequently in compiled programs. This leads to Reduced Instruction Set Computers (RISC).
The effect of complex instructions is achieved by using several of these simpler instructions. Each instruction is like a micro-instruction on a CISC machine.

Attributes of RISC machines

Most RISC machines have the following attributes:

1. One instruction per machine cycle, where a machine cycle is the time taken to decode an instruction and perform an operation on the contents of some registers (page 49).
2. Register-to-Register operations: all operations are on registers; only load and save instructions access main memory.
3. Simple address modes: most instructions use simple register addressing.
4. Simple instruction formats: instruction length is fixed, and the locations of the op-code and operand fields are fixed.

For efficiency, RISC devices have many general purpose registers (sometimes several hundred). All of these features simplify the design of a high performance CPU, and give more opportunities for compilers to produce optimised code.

Note: Examples of RISC processors include: Sun Sparc, Compaq (formerly DEC) Alpha, Silicon Graphics MIPS, ARM and StrongARM, Motorola PowerPC.

Alternatives to RISC and CISC

Some modern processors mix attributes of RISC and CISC architecture. For example, the essentially RISC-based PowerPC has some CISC features, and the Pentium Pro, Pentium II and Pentium III have a RISC core, although externally they are CISC.

Other alternatives include the use of Ultra Long Word (ULW) or Explicitly Parallel Instructions (EPI): each 'instruction' contains several RISC-like instructions which can be executed in parallel.

Note: At the time of writing, Intel and Hewlett Packard are developing EPI processors, and Transmeta have designed a ULW processor that uses software 'morphing' to convert the conventional CISC or RISC instructions of an existing executable program into ULW instructions. In principle, this technique allows their processor to mimic any existing CISC or RISC processor.
Compilers for ULW and EPI processors have to guarantee there are no data-dependencies between different parts of the long instructions.

Alternatives to von Neumann architecture

Von Neumann architecture is sometimes called SISD: Single Instruction stream, Single Data stream. We may gain better performance with a Parallel Processing machine which has several processing units working simultaneously. There are several ways of doing this.

Note: There is a sense in which the distributed nature of modern networked computers can be said to embody some aspects of parallel architecture.

Some Parallel Architectures

* MIMD (Multiple Instruction streams and Multiple Data streams): there are several processors, each with their own access to main memory, working on separate parts of a problem in a coordinated fashion.
* SIMD (Single Instruction stream, Multiple Data stream): several processors, or several components of one processor, perform the same operations on different data.
* SMP (Symmetric Multiprocessors): a form of MIMD where different, independent processes can be run simultaneously on different processors. Common in high performance servers.
* Clustering: another form of MIMD where several conventional machines with processors, memory and disc, connected by a network, are used to work on a common task. Externally they can appear to act as one very large, very fast machine.

Note: Clustering is sometimes used to improve system availability: if a member of a cluster "crashes" the other members of the cluster can take over its work.
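The difference between a single data stream (SISD) and multiple data streams (SIMD) can be illustrated in ordinary Python. This is only a conceptual sketch: Python executes both functions sequentially, and the function names, the `lanes` parameter and the idea of counting "steps" are assumptions made for illustration rather than anything taken from the text. The first function applies one operation to one pair of data items per step. The second models one operation being applied to several pairs in the same step, so the step count falls by roughly a factor of the number of lanes.

```python
def sisd_add(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """One instruction stream, one data stream: one addition per step."""
    result, steps = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        steps += 1
    return result, steps


def simd_add(a: list[int], b: list[int], lanes: int) -> tuple[list[int], int]:
    """One instruction stream, `lanes` data streams: each step adds up to `lanes` pairs."""
    result, steps = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # Conceptually, every addition in this chunk happens in the same step.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return result, steps


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
sisd_result, sisd_steps = sisd_add(a, b)
simd_result, simd_steps = simd_add(a, b, lanes=4)
```

Both calls produce the same sums; the modelled step counts are 8 for the SISD version and 2 for the SIMD version with 4 lanes.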

