ECE222
Consider an n-bit binary number B = b_(n-1)b_(n-2)...b_1b_0. We usually interpret the positional number system from
MSB to LSB as:
V(B) = b_(n-1)×2^(n-1) + b_(n-2)×2^(n-2) + ... + b_1×2^1 + b_0×2^0
Internally, computers generally use a fixed number of bits (8, 16, 32, etc.).
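The weighted-sum interpretation above can be sketched in a few lines of Python (a minimal illustration; the function name is invented for this example):

```python
def unsigned_value(bits):
    """Evaluate V(B) for a bit string written MSB-first:
    V(B) = b_(n-1)*2^(n-1) + ... + b_1*2 + b_0."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)  # shift the accumulated value left, bring in the next bit
    return value
```

For example, `unsigned_value("101")` returns 5.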
• On the modulo number circle, addition moves clockwise while subtraction moves counter-clockwise.
• When we cross between 111 and 000, a "carry" is generated.
• When we cross between 000 and 111, a "borrow" is generated.
We also need to be able to represent both positive and negative numbers. Three systems are used for
representing such numbers:
1. Sign-and-magnitude
Dedicate the MSB to the sign.
2. 1's complement
Invert all the bits of the positive number to get the corresponding negative number. For 3 bits, this form gives
8 patterns but only 7 distinct numbers.
3. 2's complement
Invert all the bits after the first '1' from the right. Equivalently, flip all the bits (forming the 1's complement)
and add 1. For 3 bits, this form gives 8 patterns and 8 distinct numbers.
Sign-and-magnitude and 1's complement have two representations for 0. This is not a good use of bits.
There is a more formal approach to calculate these forms: in 2's complement, the MSB carries a negative weight, so
V(B) = -b_(n-1)×2^(n-1) + b_(n-2)×2^(n-2) + ... + b_1×2^1 + b_0×2^0.
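A small Python sketch (the helper name is made up for this example) comparing what one n-bit pattern means under each of the three systems:

```python
def interpret(bits):
    """Return (sign-and-magnitude, 1's-complement, 2's-complement) values
    of an n-bit pattern given as a string, MSB first."""
    n = len(bits)
    unsigned = int(bits, 2)
    # Sign-and-magnitude: MSB is the sign, remaining bits are the magnitude.
    sign_magnitude = (-1 if bits[0] == "1" else 1) * int(bits[1:] or "0", 2)
    # 1's complement: a negative number is the bitwise inverse of its magnitude.
    ones = unsigned if bits[0] == "0" else -((2**n - 1) - unsigned)
    # 2's complement: the MSB carries a weight of -2^(n-1).
    twos = unsigned - (2**n if bits[0] == "1" else 0)
    return sign_magnitude, ones, twos
```

For example, the 3-bit pattern 101 means -1, -2 and -3 respectively under the three systems, while 011 means +3 under all of them.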
In 2's-complement addition and subtraction, any carry-out or borrow must be ignored.
To add two numbers, add their n-bit representations, ignoring the carry-out bit from the MSB position. The sum will
be the algebraically correct value in 2's complement if the actual result is in the range -2^(n-1) through +2^(n-1) - 1.
To subtract two numbers X and Y, that is, to perform X - Y, form the 2's complement of Y, then add it to X using the
add rule. Again, the result will be the algebraically correct value in 2's-complement representation if the actual result
is in the range -2^(n-1) through +2^(n-1) - 1.
Now there is no problem crossing from 111 to 000 (-1 to 0) or vice versa. However, a problem occurs when crossing
from 011 to 100 (+3 to -4) or vice versa; this is an "overflow" or "underflow". It is important to note that if a carry or
borrow occurs, it has no meaning for signed numbers and is ignored.
In 1's-complement addition, if no carry occurs the answer is correct. If a carry occurs, the correct answer is obtained
by adding the carry back in (end-around carry).
When performing addition/subtraction, if the signs of the addends are the same, the sign of the answer must also be
the same; otherwise overflow has occurred. If the signs of the addends differ, ignore any carries and the answer will
always be correct. The same sign rules apply to 2's complement.
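The 2's-complement add rule and the overflow condition can be sketched as follows (a simplified model; the function name and return convention are invented for illustration):

```python
def add_2s_complement(x, y, n=8):
    """Add two n-bit 2's-complement values; report the discarded carry
    and whether signed overflow occurred."""
    mask = (1 << n) - 1
    raw = (x & mask) + (y & mask)
    carry_out = raw >> n            # carry out of the MSB: ignored for signed math
    result = raw & mask
    sx = (x >> (n - 1)) & 1
    sy = (y >> (n - 1)) & 1
    sr = (result >> (n - 1)) & 1
    # Overflow: the operands share a sign but the result's sign differs.
    overflow = (sx == sy) and (sr != sx)
    signed = result - (1 << n) if sr else result
    return signed, carry_out, overflow
```

With n = 3, adding 011 and 001 (3 + 1) crosses from 011 to 100 and overflows to -4, while adding 111 and 001 (-1 + 1) produces a carry that is simply ignored.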
Bits are grouped to ease writing: taken 1 at a time they form the binary system, groups of 3 bits form the octal
system, and groups of 4 bits form the hexadecimal system.
Electronic digital computers as we know them today have been developed since the 1940s. A long, slow evolution
of mechanical calculating devices preceded the development of electronic computers. Here, we briefly sketch the
history of computer development.
• A series of complex mechanical devices, constructed from gear wheels, levers and pulleys were used to
perform basic operations of addition, subtraction, multiplication and division. Holes of punched cards were
mechanically sensed and used to control the automatic sequencing of a list of calculations which essentially
provided a programming capability. These devices enabled the computation of complete mathematical tables
of logarithms and trigonometric functions as approximated by polynomials. Output results were punched on
cards or printed on paper.
• Electromechanical relay devices, such as those used in early telephone switching systems, provided the means
for performing logic functions in computers built in the late 1930s and early 1940s.
• The conceptual design of a programmable computing machine originated with Charles Babbage's mechanical Analytical Engine, with Ada Lovelace regarded as the "first" programmer.
• Vacuum tube circuits were used to perform logic operations and to store data. This technology initiated the
modern era of electronic digital computers.
• The key concept of a stored program was introduced at the same time as the development of the first
electronic digital computer. Programs and their data were located in the same memory, as they are today. This
facilitates changing existing programs and data or preparing and loading new programs and data.
• Assembly language was used to prepare programs and was translated into machine language for execution.
• Basic arithmetic operations were performed in a few milliseconds, using vacuum tube technology to
implement logic functions.
• I/O functions were performed by devices similar to typewriters.
• Magnetic core memories and magnetic tape storage devices were also developed.
• Colossus, an electronic digital computer, was built by British codebreakers during World War II from over
1700 vacuum tubes. It was used to break the codes of the German Lorenz SZ-40 cipher machine that was
used by the German High Command.
• This era also saw the first electronic general-purpose computer, the ENIAC, the EDVAC, an electronic
computer designed to be a stored-program computer as well as the IBM 701/704/709.
• The transistor was invented at AT&T Bell Laboratories in the late 1940s and quickly replaced the vacuum
tube in implementing logic functions. This fundamental technology shift marked the start of the second
generation.
• Magnetic core memories and magnetic drum storage devices were widely used.
• Magnetic disk storage devices were developed in this generation.
• The earliest high-level languages, such as Fortran, were developed, making the preparation of application
programs much easier.
• Compilers were developed to translate these high-level language programs into assembly language, which
was then translated into executable machine-language form.
• This era saw the development of the DEC PDP-1 (the first "cheap" computer), the IBM 7090 and 7094 ("fast"
computers) and the CDC 6600.
• Texas Instruments and Fairchild Semiconductor developed the ability to fabricate many transistors on a single
silicon chip, called integrated-circuit technology. This enabled faster and less costly processors and memory
elements to be built. This began to replace magnetic core memories.
• Other developments included the introduction of microprogramming, parallelism and pipelining.
• Operating system software allowed efficient sharing of a computer system by several user programs.
• Cache and virtual memories were developed. Cache memory makes the main memory appear faster than it
really is and virtual memory makes it appear larger.
• System 360 mainframe computers from IBM (IBM System 360) and the line of PDP minicomputers from
Digital Equipment Corporation (DEC PDP-11) were dominant commercial products.
• Integrated circuit fabrication techniques had evolved to the point where complete processors and large
sections of the main memory of small computers could be implemented on single chips.
• Tens of thousands of transistors could be placed on a single chip, and the name VLSI was coined to describe
this technology.
• A complete processor fabricated on a single chip became known as a microprocessor.
• Companies such as Intel, National Semiconductor, Motorola, Texas Instruments and Advanced Micro
Devices have been the driving forces of this technology.
• A particular form of VLSI technology, called Field Programmable Gate Arrays (FPGAs) has allowed system
developers to design and implement processor, memory and I/O circuits on a single chip to meet the
requirements of specific applications, especially in embedded computer systems.
• Embedded computer systems, portable notebook computers and versatile mobile telephone handsets are now
in widespread use. Personal desktop computers and workstations interconnected by wired or wireless local
area networks and the Internet, with access to database servers and search engines, provide a variety of
powerful computing platforms.
• Supercomputers and Grid computers, at the upper end of high performance computing, are used for weather
forecasting, scientific and engineering computation and simulations.
• This generation saw the development of the VAX 9000, IBM 3090, Cray X-MP (supercomputer) and the Intel
4004/8008/8080 (early microprocessors that powered personal computers).
• It also saw the development of the Cray MPP (a massively parallel supercomputer) and the Fujitsu VPP 500.
• The lowest memory address is 0, while the highest memory address is 2^n - 1, where n is the number of address
bits.
• The address points to a unique location in the memory.
• The contents stored at a memory address are independent of the memory address itself.
• Modern computers have word lengths that typically range from 16 to 64 bits. If the word length of a
computer is, for example 32 bits, a single word can store a 32-bit signed number or four ASCII-encoded
characters, each occupying 8 bits. A unit of 8 bits is called a byte.
• Machine instructions may require one or more words for their representation.
• In both the big-endian and little-endian schemes, byte addresses are taken as the addresses of the successive words
in the memory of a computer with 32-bit word length. These are the addresses used when accessing the memory to
store or retrieve a word.
• Which is better? Either, as long as it is used consistently.
• Which is preferred? Big-endian, because the bytes appear in the order in which they are written.
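Endianness only concerns the order of bytes within a word; Python's struct module can show both layouts for the same 32-bit value:

```python
import struct

value = 0x12345678
# Big-endian: the most significant byte sits at the lowest address.
big = struct.pack(">I", value)
# Little-endian: the least significant byte sits at the lowest address.
little = struct.pack("<I", value)
```

Here `big` is the byte sequence 12 34 56 78 while `little` is 78 56 34 12; the value is the same, only the byte order in memory differs.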
• Both program instructions and data operands are stored in the memory.
• How to execute an instruction? The processor control circuits must cause the word (or words) containing the
instruction to be transferred from the memory to the processor. Operands and results must also be moved
between the memory and the processor. Thus, two basic operations involving the memory are needed - load
and store.
• Definition (Instructions): Instructions are a series of binary bits stored in memory and interpreted by the
CPU. Instructions may consist of one or more words in memory. Instructions are also characterized by the
number of addresses (for operands which also exist in memory) they require. They specify an operation to be
performed and the operands involved.
• 3-address instructions are very flexible, but programs become very large (multiple-word instructions).
• The CPU registers are high-speed memory locations built into the microprocessor. The CPU uses these
locations to store data and instructions temporarily during processing.
• There are two parts to the execution of an instruction:
1. Fetch the instruction from memory (as pointed to by the PC) and place it in the Instruction Register
(IR).
2. Perform the specific instruction:
i. Fetch operands
ii. Arithmetic/logic operations
iii. Store results in memory
iv. Update PC and repeat
• Fetch the first instruction in the program from the main memory. The PC is the key register here: its contents,
the address M, are copied into the MAR, and a read signal is generated by the CPU.
• Because the MAR is clocked, the PC is unaltered. The memory is read into the MDR: the content of location M,
i.e., the ADD A, B instruction, is placed in the MDR and then transferred to the IR. The instruction is decoded
and the required control signals are activated to perform the operation.
• The instruction is executed, that is, the ADD operation is performed.
• The first operand address, A, is placed in the MAR and a read signal is generated. The operand value 5 is fetched
from memory, placed in the MDR, and sent to the ALU over the single bus. The second operand address, B, is
then placed in the MAR and a read signal is generated; the operand value 2 is fetched, placed in the MDR, and
sent to the ALU over the single bus. Once both operands are inside the ALU, the addition is performed under
control of the signals generated by the system. The result, 7, is temporarily placed in the general-purpose
register R0.
• The result is then sent to the MDR over the single bus. The address of the memory location where the result
should be stored is placed in the MAR. Once the result and its address are in the MDR and MAR respectively, a
write signal is generated, which transfers the result from the CPU to the proper memory location.
• The instruction ADD A, B is in the 2-address instruction format, in which the second operand source is also the
destination. Hence the MAR will hold B, and the result 7 is written to location B in memory.
• The PC is incremented to point to the next instruction.
Load Register R1 with the contents of the location with label A [contents of R1 <- contents of A]
Notation: ( ) denotes "the contents of".
Executing a given instruction is a two-phase procedure. In the first phase, called instruction fetch, the
instruction is fetched from the memory location whose address is in the PC. This instruction is placed in the IR
in the processor. At the start of the second phase, called instruction execute, the instruction in IR is examined to
determine which operation is to be performed. The specified operation is then performed by the processor. This
involves a small number of steps such as fetching operands from the memory or from processor registers,
performing an arithmetic or logic operation, and storing the result in the destination location. At some point
during this two-phase procedure, the contents of the PC are advanced to point to the next instruction. When the
execute phase of an instruction is completed, the PC contains the address of the next instruction, and a new
instruction fetch phase can begin.
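The two-phase fetch/execute cycle can be modelled with a toy one-address machine (the opcode names, memory layout and accumulator design below are invented for illustration, not the machine in these notes):

```python
def run(memory, pc=0):
    """Tiny fetch/execute loop. Each instruction 'word' is a (opcode, operand)
    tuple stored in the same memory as the data, as in a stored-program machine."""
    acc = 0
    while True:
        opcode, operand = memory[pc]   # fetch: instruction at the address in the PC -> IR
        pc += 1                        # advance the PC to the next instruction
        if opcode == "LOAD":           # execute: decode and perform the operation
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
```

Running the program LOAD 4, ADD 5, STORE 6, HALT with the data 5 and 2 at addresses 4 and 5 leaves 7 at address 6, mirroring the ADD A, B walkthrough above.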
• Sometimes the next sequential instruction is not the one to be executed (loops, conditional tests, etc.), so branch
instructions are required. This type of instruction loads a new address into the program counter. As a
result, the processor fetches and executes the instruction at this new address, called the branch target,
instead of the instruction at the location that follows the branch instruction in sequential address order.
• Consider adding a series of N numbers together: we could write a straight-line program to add them, but this
does not work in general, so a loop with a branch is used.
Notation:
• #: immediate operand
• [ ]: indirect
• [R2] (or ((R2))) takes the contents of the memory location pointed to by the contents of R2
Note: a step was left out of the loop. You need to change R2, incrementing it by 4 bytes, to get to the next number
to add; otherwise you would add the same number over and over.
• ADD (LOC), R3: (R3) <- ((LOC)) + (R3)
• Base with index, (Ri, Rj): EA = (Ri) + (Rj). Example: MOVE (R1, R2), R3 gives (R3) <- ((R1) + (R2))
• Base with index and offset, X(Ri, Rj): EA = (Ri) + (Rj) + X. Example: MOVE 8(R1, R2), R3 gives
(R3) <- ((R1) + (R2) + 8)
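The effective-address (EA) calculations for the two indexed modes can be sketched directly (a dictionary stands in for the register file; the function name is made up):

```python
def effective_address(regs, ri, rj, x=0):
    """EA for base-with-index is (Ri) + (Rj); adding a constant offset X
    gives base-with-index-and-offset, EA = (Ri) + (Rj) + X."""
    return regs[ri] + regs[rj] + x
```

With regs = {"R1": 100, "R2": 8}, MOVE (R1, R2), R3 reads from EA 108, and MOVE 8(R1, R2), R3 reads from EA 116.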
• There are 15 general-purpose registers (R0-R14) plus a 32-bit program counter (R15).
• There is also a 32-bit Current Program Status Register (CPSR) which holds, among other things, the
condition code flags (N, Z, C, V).
• ADD
• ADC (add with carry)
• SUB
• SBC (subtract with carry)
• RSB (reverse subtract)
• RSC (reverse subtract with carry)
• CMP (compare)
• TST (AND test)
• TEQ (XOR test)
• AND
• EOR (exclusive OR)
• ORR (Or operation)
• MVN (Move negative - moves the 1's complement of the operand)
• Definition (Stack): An ordered list of elements, usually words, with the accessing restriction that elements
can be added or removed at one end of the list only (LIFO - last in first out)
• We define two operations: push (put) and pop (take)
• A push operation places a new operand on the top-of-stack (TOS); the previous top element is moved down.
• A pop operation removes the top element and moves the next lower element to the top.
• The structure is sometimes referred to as a pushdown stack.
• ARM uses a Branch and Link (BL) instruction for subroutine calls.
• The return address (the next instruction in the calling routine) is stored in R14; if nesting occurs, it must be saved.
• In modern computers, a stack is implemented by using a portion of the main memory for this purpose. In the
ARM, the stack is realized in memory with the aid of a special register (R13) called the Stack Pointer (SP). It
is used to point to a particular stack structure called the processor stack.
• We use a stack that grows in the direction of decreasing memory addresses. The stack is usually placed in
higher memory and programs are placed in "lower" memory - in this way, there is less likelihood of a
collision.
• The SP is always pointing to the TOS.
To pop the top of the stack into R0:
LDR R0, [R13]
ADD R13, R13, #4
To push, the SP must first be moved to the word below the current TOS (the position 4 less than the SP), then the
value stored:
SUB R13, R13, #4
STR R0, [R13]
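The same push/pop behaviour on a stack that grows toward lower addresses can be modelled in Python (memory is a dictionary keyed by byte address; the names are illustrative):

```python
def push(memory, sp, value):
    """Push on a stack that grows toward lower addresses:
    decrement the SP by one word (4 bytes), then store."""
    sp -= 4
    memory[sp] = value
    return sp

def pop(memory, sp):
    """Pop: read the word the SP points at (the TOS), then increment the SP."""
    value = memory[sp]
    sp += 4
    return value, sp
```

Pushing 7 with the SP at 1000 leaves the value at address 996 with the SP pointing at it; popping returns 7 and restores the SP to 1000.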
• Definition (Subroutine): A subroutine is a block of instructions that is executed each time the task has to be
performed.
• Subroutines are used to produce more compact (albeit slower) code for several reasons:
○ Avoids duplication
○ Reuse of code
○ Library code
○ Enable modular approach to programming
• Any program that requires the use of the subroutine simply branches to its starting location. When a program
branches to a subroutine, we say that it is calling the subroutine. The instruction that performs this branch
operation is named a Call instruction.
• After a subroutine has been executed, the calling program must resume execution, continuing immediately
after the instruction that called the subroutine. The subroutine is said to return to the program that called it
and it does so by executing a Return instruction.
• Since the subroutine may be called from different places in a calling program, provision must be made for
returning to the appropriate location. The location where the calling program resumes execution is the
location pointed to by the updated program counter (PC) while the Call instruction is being executed. Hence,
the contents of the PC must be saved by the Call instruction to enable correct return to the calling program.
• The way in which a computer makes it possible to call and return from subroutines is referred to as its
subroutine linkage method.
• The simplest subroutine linkage method is to save the return address in a specific location which may be a
register dedicated to this function.
• Such a register is called the link register. When the subroutine completes its task, the Return instruction
returns to the calling program by branching indirectly through the link register. This allows multiple levels of
subprograms, but the caller must (safely) store the return address, that is, the previous return address must be
saved before the next call is made. Recursion is also not directly supported.
• We can also use a fixed memory location to store the return address. Although this is simple, its disadvantage
is that there is only one level of subroutine.
• The Call instruction is just a special branch instruction that performs the following operations:
○ Stores the contents of the PC in the link register
○ Branch to the target address specified by the Call instruction
• The Return instruction is a special branch instruction that performs the operation
○ Branch to the address contained in the link register
• Definition (Subroutine Nesting): A common programming practice, called subroutine nesting, is to have one
subroutine call another. In this case, the return address of the second call is also stored in the link register,
overwriting its previous contents. Hence, it is essential to save the contents of the link register in some other
location before calling another subroutine. Otherwise, the return address of the first subroutine will be lost.
• Subroutine nesting can be carried out to any depth. Eventually, the last subroutine called completes its
computations and returns to the subroutine that called it. The return address needed for this first return is the
last one generated in the nested call sequence.
• That is, return address are generated and used in the LIFO order. This suggests that the return addresses
associated with subroutine calls should be pushed onto the processor stack.
• Stack Linkage:
○ Call to subroutine pushes address of next sequential instruction (from the PC/link register) onto the
stack, accessed through the stack pointer SP, before it calls another subroutine
○ Return pops the saved return address from the stack and loads it into the PC/link register
○ Levels and recursion only limited by stack space
○ Example (ARM): ARM uses R14 to store the return address. However, it must be pushed on stack if
nesting occurs.
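Stack linkage for nested calls reduces to two tiny operations, with return addresses coming back in LIFO order (the addresses and names below are made up for the sketch):

```python
def call(stack, pc, target):
    """Call: push the return address (the instruction after the call),
    then branch by returning the new PC."""
    stack.append(pc + 1)   # the value BL would leave in the link register
    return target

def ret(stack):
    """Return: pop the most recently saved return address into the PC."""
    return stack.pop()
```

An outer call from address 10 and a nested call from address 120 leave [11, 121] on the stack; the returns then come back as 121 first and 11 second, exactly the LIFO order the text describes.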
• Definition (Parameter Passing): When calling a subroutine, a program must provide to the subroutine the
parameters, that is, the operands or their addresses, to be used in the computation. Later the subroutine returns
other parameters, which are the results of the computation. This exchange of information between a calling
program and a subroutine is referred to as parameter passing.
• There are several ways to pass parameters to/from subprograms:
○ Fixed locations (assuming you know them ahead of time)
○ Registers (straightforward and efficient if you have lots of them)
○ Via the stack (very powerful but can get confusing)
• Subroutine considerations:
○ Main or calling routine may be using some registers - they may depend upon the contents
○ Can be fatal if subprogram changes registers
• Procedures to remember when passing parameters:
○ Load parameters onto stack
○ Access parameters during subprogram
○ Return results to calling routine
○ Remove stack clutter! (# of pushes = # of pops)
○ Do not change the stack pointer to retrieve parameters during the subprogram; copy the SP to an address
register and use that instead
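The stack-based parameter-passing procedure above can be sketched end to end (a Python list stands in for the processor stack; all names are illustrative):

```python
def subroutine(stack):
    """Access the parameters through a copy of the stack pointer;
    the SP itself is never moved to reach them."""
    sp = len(stack)                 # copy of the SP, used only for access
    a, b = stack[sp - 2], stack[sp - 1]
    return a + b

def caller(stack, a, b):
    """Load parameters onto the stack, call, then remove the clutter
    (number of pushes = number of pops)."""
    stack.extend([a, b])            # push the two parameters
    result = subroutine(stack)      # result comes back in a 'register'
    del stack[-2:]                  # balance the stack after the call
    return result
```

After the call returns, the stack is exactly as it was before the parameters were pushed, which is the "remove stack clutter" rule in action.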
• Definition (Assembler): The assembler program is one of a collection of utility programs that are a part of
the system software of a computer. The assembler, like any other program, is stored as a sequence of machine
instructions in the memory of the computer.
• A user program is usually entered into the computer through a keyboard and stored either in the memory or
on a magnetic disk. At this point, the user program is simply a set of lines of alphanumeric characters. When
the assembler program is executed, it reads the user program, analyzes it, and then generates the desired
machine language program. The latter contains patterns of 0s and 1s specifying instructions that will be
executed by the computer. The user program in its original alphanumeric text format is called a source
program, and the assembled machine-language program is called an object program.
• We must also be able to control certain aspects of the assembly process - assembler directives.
• In addition to providing a mechanism for representing instructions in a program, assembly language allows
the programmer to specify other information needed to translate the source program into the object program.
We have already mentioned that we need to assign numerical values to any names used in a program.
Suppose that the name TWENTY is used to represent the value 20. This fact may be conveyed to the
assembler program through an equate statement such as
TWENTY EQU 20
This statement does not denote an instruction that will be executed when the object program is run; in fact, it
will not even appear in the object program. It simply informs the assembler that the name TWENTY should
be replaced by the value 20 wherever it appears in the program. Such statements, called assembler directives
(or commands), are used by the assembler while it translates a source program into an object program.
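How an assembler might consume an EQU directive can be sketched as a single substitution pass (a simplification; real assemblers make two passes and handle expressions, and the function name here is invented):

```python
def apply_equates(source_lines):
    """Collect EQU directives into a symbol table, drop them from the
    output (they produce no object code), and substitute their values
    wherever the names appear on the remaining lines."""
    symbols, out = {}, []
    for line in source_lines:
        parts = line.split()
        if len(parts) == 3 and parts[1] == "EQU":
            symbols[parts[0]] = parts[2]   # directive: recorded, not emitted
        else:
            out.append(" ".join(symbols.get(tok, tok) for tok in parts))
    return out
```

For example, the two-line source "TWENTY EQU 20" followed by "MOVE TWENTY R1" assembles to the single line "MOVE 20 R1": the directive itself never appears in the object program.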
In the data area, which follows the code area, the DCD
directives are used to label and initialize the data
operands. The word locations SUM and N are initialized
to 0 and 5, respectively, by the first two DCD directives. The address NUM1 is placed in the location POINTER by
the next DCD directive. The combination of the instruction LDR R2, POINTER and the data declaration POINTER
DCD NUM1 is one of the ways that the pseudoinstruction LDR R2, =NUM1 can be implemented.
The last DCD directive specifies that the five numbers to be added are placed in successive memory word locations,
starting at NUM1.
• One of the basic features of a computer is its ability to exchange data with other devices. This communication
capability enables a human operator, for example, to use a keyboard and a display screen to process text and
graphics.
• Types of Transfers
1. Parallel transfer
▪ Multiple bits are transferred simultaneously
▪ High speed but costly
▪ Problems with long distances (timing, etc.)
2. Serial transfer
▪ Uses a single wire and sends data one bit at a time
▪ Slower speed
▪ Less costly over distance
▪ Two types:
□ Synchronous Serial Transfer
Transmitter sends data bits along with clock signal so receiver knows when data is
valid
Higher speed but requires additional lines
Speed can also be varied
□ Asynchronous Serial Transfer
No clock signal is sent
Sender and receiver agree on baud rate (bps) and thus the duration of each bit
Special signaling is required to synchronize sender/receiver (start bits, stop bits,
parity bits)
Example: the I²C bus is a widely used synchronous serial bus.
• An I/O device is connected to the interconnection network by using a circuit, called a device interface, which
provides the means for data transfer and for the exchange of status and control information needed to
facilitate the data transfers and govern the operation of the device.
• The interface includes some registers that can be accessed by the processor.
• One register may serve as a buffer for data transfers, another may hold information about the current status of
the device, and yet another may store the information that controls the operational behavior of the device.
• These data, status, and control registers are accessed by program instructions as if they were memory
locations.
• Typical transfers of information are between I/O registers and the registers in the processor.
• I/O devices are memory-mapped; that is, they look like one or more memory locations.
• Simplest form of I/O is Program Controlled I/O - here, the program has complete control of the I/O operation.
For example: consider a task that reads characters typed on a keyboard, stores these data in the memory, and
displays the same characters on a display screen. A simple way of implementing this task is to write a
program that performs all functions needed to realize the desired action.
• In addition to transferring each character from the keyboard into the memory, and then to the display, it is
necessary to ensure that this happens at the right time.
• An input character must be read in response to a key being pressed. For output, a character must be sent to the
display only when the display device is able to accept it.
• The rate of data transfer from the keyboard to a computer is limited by the typing speed of the user, which is
unlikely to exceed a few characters per second. The rate of output transfers from the computer to the display
is much higher. It is determined by the rate at which characters can be transmitted to and displayed on the
display device, typically several thousand characters per second. However, this is still much slower than the
speed of a processor that can execute billions of instructions per second. The difference in speed between the
processor and I/O devices creates the need for mechanisms to synchronize the transfer of data between them.
• One solution to this problem involves a signaling protocol. On output, the processor sends the first character
and then waits for a signal from the display that the next character can be sent. It then sends the second
character, and so on.
• An input character is obtained from the keyboard in a similar way. The processor waits for a signal indicating
that a key has been pressed and that a binary code that represents the corresponding character is available in
an I/O register associated with the keyboard. Then the processor proceeds to read that code.
• General Procedure:
○ Processor checks device's status
○ Processor moves data (in or out)
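Program-controlled input reduces to a busy-wait loop on the status register. In this sketch the two memory-mapped registers are modelled as callables (the names are invented for illustration):

```python
def polled_read(status_ready, data_register):
    """Busy-wait until the device reports ready, then move the data in.
    status_ready() and data_register() stand in for reads of the
    memory-mapped status and data registers."""
    while not status_ready():   # processor repeatedly checks the device's status
        pass
    return data_register()      # processor moves the data
```

The test below simulates a device that becomes ready on the third status check; the processor wastes the first two checks spinning, which is exactly the cost of program-controlled I/O.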
DMA vs PC I/O
• PROCESS:
○ Complete current instruction
○ Save processing environment
○ Go to service routine
○ Service interrupt
○ Return - restore environment
• Interrupts can occur at any time. This is why the proper procedure with using the stack must be observed.
• The means by which these issues are handled vary from one computer to another, and the approach taken is
an important consideration in determining the computer's suitability for a given application.
• When an interrupt request is received it is necessary to identify the particular device that raised the request.
• Furthermore, if two devices raise interrupt requests at the same time, it must be possible to break the tie and
select one of the two requests for service. When the interrupt-service routine for the selected device has been
completed, the second request can be serviced.
• The information needed to determine whether a device is requesting an interrupt is available in its status
register. When the device raises an interrupt request, it sets to 1 a bit in its status register, which we will call
the IRQ bit.
• The simplest way to identify the interrupting device is to have the interrupt-service routine poll all I/O
devices in the system.
• The first device encountered with its IRQ bit set to 1 is the device that should be serviced. An appropriate
subroutine is then called to provide the requested service.
• The polling scheme is easy to implement. Its main disadvantage, however, is the time spent interrogating the
IRQ bits of devices that may not be requesting any service.
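The polling scheme amounts to scanning status registers in priority order until a set IRQ bit is found (the bit position and device list here are illustrative assumptions):

```python
def poll_devices(devices):
    """Software polling: scan status registers in priority order and return
    the first device whose IRQ bit is set, or None if no device needs
    service. We assume bit 0 of the status register is the IRQ bit."""
    for name, status_register in devices:
        if status_register & 0x01:
            return name
    return None
```

Note that every device ahead of the interrupter in the list is interrogated even though it requested nothing, which is the scheme's main cost.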
What if there are multiple devices? How do we sort them out (priority)?
• Single IRQ line
○ To reduce the time involved in the polling process, a device requesting an interrupt may identify itself
directly to the processor. Then, the processor can immediately start executing the corresponding
interrupt-service routine. The term vectored interrupts refers to interrupt-handling schemes based on
this approach.
○ A device requesting an interrupt can identify itself if it has its own interrupt-request signal, or if it can
send a special code to the processor through the interconnection network.
○ The processor's circuits determine the memory address of the required interrupt service routine. A
commonly used scheme is to allocate permanently an area in the memory (somewhere in the bottom of
memory) to hold the addresses of interrupt-service routines. These addresses are referred to as
interrupt vectors, and they are said to constitute the interrupt-vector table.
○ When an interrupt request arrives, the information provided by the requesting device is used as a
pointer into the interrupt-vector table, and the address in the corresponding interrupt vector is
automatically loaded into the program counter.
○ A vectored interrupt is "smart" as it points us to where the service routine is. It is faster than software
polling. However, more hardware is needed to be "smart".
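A vectored interrupt is essentially a table lookup that redirects the PC (the table contents and addresses below are made up for the sketch):

```python
def take_interrupt(pc, vector_table, device_code):
    """The requesting device's code indexes the interrupt-vector table;
    the vector becomes the new PC, and the old PC is saved so the
    interrupted program can resume."""
    return_address = pc
    pc = vector_table[device_code]   # address of the interrupt-service routine
    return pc, return_address
```

Compared with software polling, the device identifies itself, so no status registers need to be interrogated before the service routine starts.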
What if there are multiple devices? How do we sort them out (priority)?
• Hardware Polling (Daisy-Chain)
• Controlling Interrupts:
○ CPU
▪ Enabling of interrupts
▪ Priority structure (applies whether interrupts are vectored or identified by software polling)
▪ Masking of interrupts - CPU may only listen to certain priority
○ Device
▪ Enabling interrupt requests (generally a bit in the control register). Must be able to generate
interrupts
ARM Exceptions
• The ARM processor has two "normal" interrupt lines - I (normal) and F (fast interrupt request) lines which
can be disabled in the status register. These are interrupt-disable bits which determine whether the processor
is interrupted when an interrupt request is raised on the corresponding lines (IRQ and FIQ). The processor is
not interrupted if the disable bit is 1; it is interrupted if the disable bit is 0.
• Application programs run in User mode. However, user mode is not privileged and cannot manipulate these
bits (only supervisory mode can deal with them)
• System mode and the five exception modes are privileged modes. When the processor is in a privileged
mode, access to the status register is allowed so that the mode bits and the interrupt-disable bits can be
manipulated. This is done with instructions that are not available in User mode, which is an unprivileged
mode.
• FIQ is intended for one device, or a very small number of devices, that requires fast service. In FIQ mode,
registers R8-R12 (general purpose) and R13/R14 are replaced by banked copies; any changes to these registers
will not affect the user registers after the exception has been serviced, so they do not have to be saved or
restored (the main routine's registers are untouched).
• IRQ exceptions are for dealing with "normal interrupts"
• In IRQ mode, only R13/R14 are replaced (along with the processor status register) - any other registers used in the
exception service routine must be saved and restored (on the stack).
• Normally, the processor is running in User or System mode with the "normal" 16 registers available.
• When an exception occurs, switch is made to one of five exception modes where some of the 16 registers are
replaced by an equal number of banked registers.
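As a concrete illustration, in the classic ARM CPSR the I and F disable bits sit at bit positions 7 and 6, and the mode occupies bits 4:0. A C sketch of the "is the processor interrupted?" test:

```c
#include <assert.h>
#include <stdint.h>

/* Classic ARM CPSR layout: bit 7 = I (IRQ disable), bit 6 = F (FIQ
   disable), bits 4:0 = processor mode. */
#define CPSR_I    (1u << 7)
#define CPSR_F    (1u << 6)
#define MODE_MASK 0x1Fu
#define MODE_USER 0x10u
#define MODE_FIQ  0x11u
#define MODE_IRQ  0x12u

/* The processor takes an interrupt only when the disable bit is 0. */
static int irq_taken(uint32_t cpsr) { return (cpsr & CPSR_I) == 0; }
static int fiq_taken(uint32_t cpsr) { return (cpsr & CPSR_F) == 0; }
```

Only privileged modes may execute the instructions that change these bits, which is why User-mode code cannot unmask its own interrupts.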
• At any one time, there is only one device that controls the bus - Bus Master
• Definition (Bus Master): The device that initiates data transfer requests on the bus
• Bus Arbitration is required if there are multiple devices that can be Bus Master.
• There are two main methods:
○ Centralized Arbitration
▪ CPU (or a special device) supervises control of the bus
▪ Example: Multiple DMA controllers
□ DMA device requests control of bus by asserting BUS REQUEST (BR)
□ Processor activates BUS GRANT (BG1) which is connected in a daisy-chain
□ Bus use is indicated with BUS BUSY (BBSY) signal
○ Distributed Arbitration
▪ If devices are peers, then arbitration can be done without a central controller
▪ "Competition" starts when one or more devices activate the Start Arbitration signal
▪ Each device "bids" for control of the bus by placing the bits of its ID on an arbitration bus - all
competing devices look for their address on the bus
▪ The "highest" address wins control of bus
▪ Generally, that device will drop out of competition until all devices have had a chance to control
the bus
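The bidding process can be simulated in C. This is a sketch assuming up to 16 competing devices with unique 8-bit IDs and an open-collector (wired-OR) arbitration bus; a device drops out as soon as it sees a 1 on a bit position where its own ID has a 0, so the highest ID wins:

```c
#include <assert.h>

/* Simulate one arbitration round, one bit at a time, MSB first. */
static unsigned arbitrate(const unsigned ids[], int n) {
    int alive[16];
    for (int i = 0; i < n; i++) alive[i] = 1;
    for (int bit = 7; bit >= 0; bit--) {
        unsigned bus = 0;
        for (int i = 0; i < n; i++)      /* wired-OR of driven bits */
            if (alive[i]) bus |= ids[i] & (1u << bit);
        for (int i = 0; i < n; i++)      /* losers drop out */
            if (alive[i] && bus && !(ids[i] & (1u << bit)))
                alive[i] = 0;
    }
    for (int i = 0; i < n; i++)          /* exactly one device remains */
        if (alive[i]) return ids[i];
    return 0;
}
```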
• A bus requires a set of rules, often called a bus protocol, that governs how the bus is used by various devices.
The bus protocol determines when a device may place information on the bus, when it may load the data on
the bus into one of its registers, and so on. These rules are implemented by control signals that indicate what
actions are to be taken and when.
• There are three classes of lines which make up the "Bus"
○ Data
○ Address
○ Control
• One control line, usually labelled R/W̄ (Read/Write with the W complemented), specifies whether a Read or Write
operation is to be performed. It specifies Read when set to 1 and Write when set to 0. When several data sizes are
possible, such as byte, halfword or word, the required size is indicated by other control lines.
• The bus control lines also carry timing information. They specify the times at which the processor and the I/O
devices may place data on or receive data from the data lines.
• A variety of schemes have been devised for the timing of data transfers over a bus.
• These can be broadly classified as either synchronous or asynchronous schemes.
• In any data transfer operation, one device plays the role of a master (the device that initiates data transfers by
issuing Read/Write commands on the bus - normally, the processor) and slave (the device addressed by the
master).
Synchronous Bus
• All devices derive timing information from a common control line called the bus clock.
• The signal on this line has two phases: a high level followed by a low level. The two phases constitute a clock
cycle. The first half of the cycle between the low-to-high and high-to-low transitions is often referred to as a
clock pulse.
• Clock pulses are evenly spaced and must be long enough to accommodate the slowest device.
• Consider a read operation from a device:
• At time t0, the master places the device address on the address lines and sends a command on the control lines
indicating a Read operation. The command may also specify the length of the operand to be read.
• Information travels over the bus at a speed determined by its physical and electrical characteristics.
• The clock pulse width, t1-t0, must be longer than the maximum propagation delay over the bus. Also, it must
be long enough to allow all devices to decode the address and control signals, so that the addressed device
(the slave) can respond at time t1, by placing the requested input data on the data lines.
• At the end of the clock cycle, at time t2, the master loads the data on the data lines into one of its registers.
• To be loaded correctly into the register, data must be available for a period greater than the setup time of the
register. Hence, the period t2-t1 must be greater than the maximum propagation time on the bus plus the setup
time of the master's register.
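These two constraints can be captured in a small C check (a sketch; times are in nanoseconds, and the explicit decode-time term in the first constraint is an assumption):

```c
#include <assert.h>

/* Read-cycle timing constraints for a synchronous bus:
   - the clock pulse (t1 - t0) must exceed the worst-case bus
     propagation delay plus address/command decode time
   - (t2 - t1) must exceed the propagation delay plus the setup time
     of the master's register                                        */
static int timing_ok(double t0, double t1, double t2,
                     double t_prop, double t_decode, double t_setup) {
    return (t1 - t0) > (t_prop + t_decode)
        && (t2 - t1) > (t_prop + t_setup);
}
```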
• A similar procedure is followed for a Write operation. The master places the output data on the data lines
when it transmits the address and command information. At time t2, the addressed device loads the data into
its data register.
• The timing diagram is an idealized representation - in reality, propagation delays on bus wires and in the
circuits of the devices cause different parts of the circuit to see signals at different times.
• The diagram above shows reality - two views of each signal, except the clock.
• Because signals take time to travel from one device to another, a given signal transition is seen by different
devices at different times.
• The top view shows the signals as seen by the master and the bottom view as seen by the slave.
• We assume that the clock changes are seen at the same time by all devices connected to the bus.
• System designers spend considerable effort to ensure that the clock signal satisfies the requirement.
• Multiple Clock Cycle Transfers
○ In the previous scheme, each transfer was done in one clock cycle - as we noted, it is simple, but the
clock cycle must be long enough to accommodate the slowest device, that is, its slow transfer rate. This
forces all devices to operate at the speed of the slowest device.
○ Also, the processor has no way of determining whether the addressed device has actually responded.
At t2, it simply assumes that the input data are available on the data lines in a Read operation, or that
the output data have been received by the I/O device in a Write operation. If, because of a malfunction,
a device does not operate correctly, the error will not be detected.
○ SOLUTION: Add more signals to allow device to tell master when it is ready
○ To overcome these limitations, most buses incorporate control signals that represent a response from
the device. These signals inform the master that the slave has recognized its address and that it is
ready to participate in a data transfer operation.
○ They also make it possible to adjust the duration of the data transfer period to match the response
speeds of different devices. This is often accomplished by allowing a complete data transfer operation
to span several clock cycles. Then, the number of clock cycles involved can vary from one device to
another.
○ During clock cycle 1, the master sends address and command information on the bus, requesting a
Read operation. The slave receives this information and decodes it. It begins to access the requested
data on the active edge of the clock at the beginning of clock cycle 2. We have assumed that due to the
delay involved in getting the data, the slave cannot respond immediately. The data become ready and
are placed on the bus during clock cycle 3. The slave asserts a control signal called Slave-ready at the
same time. The master, which has been waiting for this signal, loads the data into the register at the end
of the clock cycle. The slave removes its data signals from the bus and returns its Slave-ready signal to
the low level at the end of cycle 3. The bus transfer operation is now complete, and the master may
send new address and command signals to start a new transfer in clock cycle 4.
○ The Slave-ready signal is an acknowledgement from the slave to the master, confirming that the
requested data have been placed on the bus. It also allows the duration of a bus transfer to change from
one device to another.
Asynchronous Bus
• An alternative scheme for controlling data transfers on a bus is based on the use of a handshake protocol
between the master and the slave to do transfers (no central clock).
• Definition (Handshake): A handshake is an exchange of command and response signals between the master
and the slave. It is a generalization of the way the Slave-ready signal is used in the previous figure.
• A control line called Master-ready is asserted by the master to indicate that it is ready to start a data transfer.
The slave responds by asserting Slave-ready.
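A software sketch of one input-transfer handshake, with the four events collapsed into sequential statements (a real bus interleaves them between two independent devices; this is only a model of the ordering):

```c
#include <assert.h>

/* The asynchronous input handshake:
   1. master asserts Master-ready (address already on the bus)
   2. slave places the data on the bus and asserts Slave-ready
   3. master loads the data and drops Master-ready
   4. slave removes the data and drops Slave-ready               */
struct bus { int master_ready, slave_ready; int data_valid; int data; };

static int handshake_read(struct bus *b, int slave_data) {
    int captured;
    b->master_ready = 1;                     /* step 1 */
    b->data = slave_data; b->data_valid = 1;
    b->slave_ready = 1;                      /* step 2 */
    captured = b->data;                      /* step 3: master latches */
    b->master_ready = 0;
    b->data_valid = 0; b->slave_ready = 0;   /* step 4 */
    return captured;
}
```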
• The timing for an output operation is essentially the same as for an input operation.
• In this case, the master places the output data on the data lines at the same time that it transmits the address
and command information.
• The selected slave loads the data into its data register when it receives the Master-ready signal and indicates
that it has done so by setting the Slave-ready signal to 1. The remainder of the cycle is similar to the input
operation.
• The I/O interface of a device consists of the circuitry needed to connect that device to the bus.
• On one side of the interface are the bus lines for address, data and control.
• On the other side are the connections needed to transfer data between the interface and the I/O device. This
side is called a port, and it can be either a parallel or a serial port.
• Inputs to registers can always "listen" to the bus; they are only clocked when they are addressed.
• Outputs from registers are only transferred to the bus when required (via tri-state drivers).
• Each device will require address decoding and control signal generation.
• A parallel port transfers multiple bits of data simultaneously to or from the device.
• A serial port sends and receives data one bit at a time.
• Communication with the processor is the same for both formats; the conversion from a parallel to a serial
format and vice versa takes place inside the interface circuit.
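The conversion is essentially a shift register. A C sketch (LSB-first order is an assumption here - the actual bit order is fixed by each serial standard):

```c
#include <assert.h>
#include <stdint.h>

/* A byte written in parallel is shifted out one bit at a time... */
static void serialize(uint8_t byte, int bits_out[8]) {
    for (int i = 0; i < 8; i++)
        bits_out[i] = (byte >> i) & 1;           /* LSB first */
}

/* ...and incoming bits are shifted back into a byte on reception. */
static uint8_t deserialize(const int bits_in[8]) {
    uint8_t byte = 0;
    for (int i = 0; i < 8; i++)
        byte |= (uint8_t)((bits_in[i] & 1) << i); /* LSB first */
    return byte;
}
```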
Examples of Standards:
• RS-232C
○ CCITT V.24
○ Standard for Serial Communications
○ Synchronous and asynchronous modes
○ Specifies electrical, physical, mechanical, procedural aspects of communications
• Nubus
○ 32 bit architecture
○ Up to 16 devices on one backplane
○ Designed for high speed/low cost
○ 10MHz clock
○ 32 bit shared address space
○ Single master - all other devices are slaves
• Multibus
○ INTEL initiative
○ 8, 16, 32 bit asynchronous transfers
○ IEEE 796 (16 bit)
○ IEEE 1296 (32 bit)
• IEEE 488
○ Standard for laboratory instrumentation
○ 8 bit parallel
▪ Up to 250Kbytes/sec
▪ Up to 20 metres
○ Modes
▪ Listener
▪ Talker
▪ Controller
Interconnection Standards
• Peripheral Component Interconnect (PCI) Bus
○ First introduced in 1992
○ One of the first standard interfaces that was independent of a particular processor - developed as a
low-cost bus
○ It is housed on the motherboard of a computer and used to connect I/O interfaces for a wide variety of
devices.
○ A device connected to the PCI bus appears to the processor as if it is connected directly to the
processor bus.
○ Its interface registers are assigned addresses in the address space of the processor.
○ First "Plug-and-play" - connect device to the bus, software takes care of the rest
• Small Computer System Interface (SCSI)
○ This refers to a standard bus defined by the American National Standards Institute - ANSI X3.131
○ The SCSI bus may be used to connect a variety of devices to a computer. It is particularly well-suited
for use with disk drives and is often found in installations such as institutional databases or email
systems where many disk drives are used.
○ In the original specification of the SCSI standard, devices are connected to a computer via a 50-wire
cable, which can be up to 25 metres in length and can transfer data at rates of up to 5 Megabytes/s
(increased to 620 Megabytes/s in later versions). The achievable speed depends on the number and
length of cables (fewer, shorter cables = higher rate)
○ Data are transferred either 8 bits or 16 bits in parallel, using clock speeds of up to 80 MHz.
○ Example: after arbitration, the initiator selects a device (e.g. "select device 5")
○ After arbitration/selection, a master/slave relationship is used to transfer data
○ BSY is released when the data transfer is finished
○ If the connection is suspended, it can be reselected - the target device then acquires the bus and
selects the initiator
• Universal Serial Bus (USB)
○ Collaborative standard developed by computer and telecommunications industry
○ A large variety of devices are available with a USB connector
○ The commercial success of the USB is due to its simplicity and low cost.
○ The original USB specification supports two speeds of operation, called low-speed (1.5 Megabits/s)
and full-speed (12 Megabits/s).
○ Later, USB 2, called High-Speed USB, was introduced. It enabled data transfers at speeds up to 480
Megabits/s.
○ As I/O devices continued to evolve with even higher speed requirements, USB 3 (called Superspeed)
was developed. It supports data transfer rates up to 10 Gigabits/s.
○ The USB-C connector, carrying USB4 and Thunderbolt, supports up to 40 Gigabits/s
○ Data are transmitted in serial form. Clock and data are encoded together to prevent any skew problems
○ USB Architecture
▪ The USB uses point-to-point connections and a serial transmission format.
▪ When multiple devices are connected, they are arranged in a tree structure.
▪ Each node of the tree has a device called a hub, which acts as an intermediate transfer point
between the host computer and the I/O devices.
▪ At the root of the tree, a root hub connects the entire tree to the host computer.
▪ The leaves (functions) of the tree are the I/O devices.
▪ Arbitration: USB works on device polling only. I/O devices are only allowed to respond when
polled.
▪ This allows for simple, inexpensive hubs (no real arbitration required).
▪ Each device on the USB, whether it is a hub or an I/O device, is assigned a 7-bit address. This
address is local to the USB tree and is not related in any way to the processor's address space.
▪ The root hub of the USB, which is attached to the processor, appears as a single device.
▪ The host software communicates with individual devices by sending information to the root
hub, which it forwards to the appropriate device in the USB tree.
▪ When a device is first connected to a hub, or when it is powered on, it has the address 0.
▪ Periodically, the host polls each hub to collect status information and learn about new devices
that may have been added or disconnected.
▪ When the host is informed that a new device has been connected, it reads the information in a
special memory in the device's USB interface to learn about the device's capabilities.
▪ It then assigns the device a unique USB address and writes that address in one of the device's
interface registers. It is this initial connection procedure that gives the USB its plug-and-play
capability.
▪ What happens if there are many USB devices? The processor will slow down, since the host has to
poll each hub periodically.
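The host's address-assignment step can be sketched as follows. This is a hypothetical helper, not the real protocol (the actual procedure also reads the device's descriptors over control transfers before writing the address into the device's interface register):

```c
#include <assert.h>
#include <string.h>

/* Addresses are 7 bits; 0 is reserved for newly attached devices, so
   the host hands out the lowest unused address in 1..127. */
static int assign_address(unsigned char used[128]) {
    for (int a = 1; a <= 127; a++)
        if (!used[a]) { used[a] = 1; return a; }
    return -1;   /* no free address: the tree is full */
}
```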
○ USB Data
▪ There are two types of packets exchanged:
□ Control packets - address, acknowledgement, errors, etc.
□ Data packets - actual data
▪ Packets have a PID (packet identifier) that identifies the type of packet. Four bits identify the
type; they are transmitted together with their bitwise complement so the receiver can detect errors.
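A C sketch of building and checking such a PID byte (low nibble = the 4-bit type, high nibble = its complement, which is how the USB specification encodes the check field):

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 4-bit PID with its complement into one byte. */
static uint8_t pid_byte(uint8_t pid4) {
    pid4 &= 0x0Fu;
    return (uint8_t)(pid4 | ((~pid4 & 0x0Fu) << 4));
}

/* A received PID byte is valid iff the two nibbles are complements. */
static int pid_valid(uint8_t byte) {
    return ((byte & 0x0Fu) ^ ((byte >> 4) & 0x0Fu)) == 0x0Fu;
}
```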
• The connection between the processor and its memory consists of address, data and control lines. The
processor uses the address lines to specify the memory location involved in a data transfer operation and uses
the data lines to transfer the data. At the same time, the control lines carry the command indicating a Read or
a Write operation and whether a byte or a word is to be transferred. The control lines also provide the
necessary timing information and are used by the memory to indicate when it has completed the requested
operation. When the processor-memory interface receives the memory's response, it asserts the MFC signal.
This is the processor's internal control signal that indicates that the requested memory operation has been
completed. When asserted, the processor proceeds to the next step in its execution sequence.
• Memory is built from a collection of memory cells (1 bit = 1 cell).
• Side note: What device can store 1 bit? A flip-flop. However, if we were to use flip-flops to build main
memory, it would be far too large and expensive.
• Cells are grouped into bytes/words
• A memory address selects a group of memory cells.
• Memory can be classified according to:
○ Primary or Secondary Memory
○ Access Memory - Random/Sequential
○ Memory Technology - Bipolar, CMOS, Magnetic, Optical
○ Memory Retention - Static/Dynamic
○ Memory Type - R/WM, ROM, EPROM, EEPROM, FLASH, SSD
Today:
• 16GB RAM costs $100CDN (approximately $0.007/MB)
• 2TB hard drive costs $100CDN (approximately $0.00005/MB)
• 100 4.7GB writeable DVDs cost $30CDN (approximately $0.000064/MB)
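The quoted figures follow from simple division (1 GB taken as 1024 MB here; the numbers above are rounded):

```c
#include <assert.h>

/* Cost per megabyte is just dollars divided by capacity in MB. */
static double cost_per_mb(double dollars, double megabytes) {
    return dollars / megabytes;
}
```

For example, $100 / (16 × 1024 MB) ≈ $0.006/MB, $100 / (2 × 1024 × 1024 MB) ≈ $0.00005/MB, and $30 / (100 × 4.7 × 1024 MB) ≈ $0.00006/MB.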
How it works - early memory technologies (with side notes):
• UNIVAC Mercury Tube
○ Timed serial acoustic delay line
○ Space: 10 91-bit "words" per tube; 100 tubes = 91,000 bits of memory
○ Why mercury? Because it is pretty dense. It must be very well-controlled at a specific temperature.
• Magnetic Drums
○ Long rotating cylinder with magnetic coating. Multiple read heads were placed along cylinder
"tracks". Instructions had fields for the locations of up to 3 operands and the next instruction
location. Creative programming could greatly improve performance.
○ Some drums spun at speeds up to 75,000 rpm.
• Magnetic Core
○ Ferro-magnetic disks were used as memory cells. Wires were wrapped around them in the x/y plane
plus a sense wire. The hysteresis properties of the disks would remember the direction of
polarization when current from the x/y lines was activated.
• Capacitors
○ A large bank of capacitors was used to create "dynamic memory" - capacitors were either charged
(a "1") or not. The charge would slowly leak out, making it dynamic.
○ Developed at Bletchley Park (British crypto unit), which built some of the first computers used
for analyzing codes.
• Semiconductor
○ Transistors are used to form memory cells (flip-flops). VLSI technology allowed many to be placed
on one chip. Individual cells can be placed in a 2-D array.
○ Originally introduced in the late 1960s.
2-D Memory
Latency
• Amount of time required to write/read a byte/word of data to/from memory
• The time required to transfer depends also on the rate at which successive words can be transferred and on
the size of the block. The time between successive words of a block is much shorter than the time needed to
transfer the first word.
• Single-word access is the worst case - burst mode "looks" much better: there is a large overhead to start the
transfer, then only a short time to transfer each successive word
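This can be captured as a simple model (the 60 ns overhead and 5 ns per-word figures used in the example are made-up numbers, not measurements):

```c
#include <assert.h>

/* Block-transfer time: a large fixed overhead to start the transfer,
   then a short per-word time for each successive word. */
static double block_time_ns(int n_words, double overhead_ns, double per_word_ns) {
    return overhead_ns + n_words * per_word_ns;
}

/* Effective cost per word: bursts amortize the start-up overhead. */
static double per_word_cost_ns(int n_words, double overhead_ns, double per_word_ns) {
    return block_time_ns(n_words, overhead_ns, per_word_ns) / n_words;
}
```

With a 60 ns overhead and 5 ns per word, a single word costs 65 ns, while each word of an 8-word burst costs only 12.5 ns.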
Bandwidth
• A measure of how much data can be transferred per unit time (1 second)
• It depends on the speed of access to data (latency) and number of bits that can be transferred in parallel
(number of wires and speed of the link).
Double-Data-Rate SDRAM
• Faster version of SDRAM. New organizational and operational features to make it possible to achieve high
data rates during block transfers.
• The key idea is to take advantage of the fact that a large number of bits are accessed at the same time inside
the chip when a row address is applied.
• Uses two interleaved memory banks for back and forth switching.
• Transfers data on both edges of the clock (rising and falling).
• It has the same latency as SDRAM, but (at best) double the bandwidth
• Most effective in large block transfers - no advantage in individual word transfers
• Current standard is DDR3 (64 bit transfers)
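The peak-bandwidth arithmetic for a double-data-rate interface is straightforward (the clock rate used in the check below is illustrative, not a specific part's rating):

```c
#include <assert.h>

/* bus_bits are transferred on every clock edge, and DDR uses both the
   rising and falling edges, so there are 2 transfers per clock cycle. */
static double ddr_peak_bytes_per_s(double clock_hz, int bus_bits) {
    return 2.0 * clock_hz * (bus_bits / 8.0);
}
```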
RAMBUS Memory
• Used in gaming systems for extreme speeds
• It achieves a high data transfer rate by providing a high-speed interface between the memory and the
processor.
• The way to increase the bandwidth of this connection is to use few wires running at a higher clock speed.
• The key feature is the use of a differential signaling technique to transfer data to and from the memory chips.
Signals are transmitted using small voltage swings (0.3 V) above and below a reference voltage.
• In previous cases, outputs swung from about 0 V to Vcc (logic 1 is approximately 5 V)
• Bus width is 8 or 16 (dual channel) bits
• Uses packets for transfer - no separate address lines
• ROM: memory that can only be read
○ Different technologies:
▪ Mask programmable ROM (done at the factory, expensive setup, cost to purchase is cheap)
▪ Fuse-programmable ROM (PROM) (programmed in a "burner"; cheap setup, higher cost per
chip). This has no contents when delivered, since the purchaser writes the contents.
ROM (Fuse-Programmable)
• If we interleave modules so that consecutive word accesses occur in different modules, we can start accesses
simultaneously.
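The address split for M interleaved modules can be sketched as follows (low-order interleaving; using the high-order bits instead would place consecutive addresses in the same module):

```c
#include <assert.h>

/* With M interleaved modules, consecutive word addresses fall in
   different modules: the module number comes from the low-order part
   of the address, the word-within-module from the rest. */
static void interleave(unsigned addr, unsigned m,
                       unsigned *module, unsigned *offset) {
    *module = addr % m;   /* consecutive addresses -> different modules */
    *offset = addr / m;
}
```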