
Asynchronous Buses

• An asynchronous bus has no master clock; instead, it uses a
handshake protocol between a master and a slave device.

• After the master asserts the ADDRESS, MREQ and RD
lines, it then asserts a special master synchronization line,
MSYN, and waits for a response from the slave on a slave
synchronization line, SSYN.

• When the slave device sees MSYN, it performs the
necessary operation and asserts SSYN when it is done.

• 3.38

B. Ross COSC 3p92 1


Asynchronous bus

• full handshake:
– 1. MSYN asserted
– 2. SSYN asserted in response
– 3. MSYN negated in response
– 4. SSYN negated in response

• Advantages:
– relatively independent of timing (other than skew
times)
– bus can take advantage of faster devices (unlike
synchronous buses)

• Disadvantage: more complex to build
– eg, memory chip design and CPU design are
interwoven

• Synchronous buses more common.
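
The four-step full handshake on this slide can be traced with a small simulation. This is only an illustrative sketch: the function name `full_handshake_read` and the dictionary standing in for the slave's memory are assumptions made for the example, not part of any real bus interface.

```python
def full_handshake_read(memory, address):
    """Trace one asynchronous read using the four-step full handshake."""
    trace = []
    msyn = ssyn = False

    # Master places ADDRESS, MREQ and RD on the bus, then asserts MSYN.
    msyn = True
    trace.append("1. MSYN asserted")

    # Slave sees MSYN, performs the read, then asserts SSYN.
    data = memory[address]
    ssyn = True
    trace.append("2. SSYN asserted")

    # Master sees SSYN, latches the data, then negates MSYN.
    latched = data
    msyn = False
    trace.append("3. MSYN negated")

    # Slave sees MSYN negated and negates SSYN; the bus is idle again.
    ssyn = False
    trace.append("4. SSYN negated")

    assert not msyn and not ssyn
    return latched, trace


value, steps = full_handshake_read({0x10: 0xAB}, 0x10)
print(hex(value))
print("\n".join(steps))
```

Each step is triggered by the previous one rather than by a clock edge, which is why the slave may take as long (or as little) as it needs before step 2.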



Synchronous vs asynchronous

• Both synch and asynch involve devices reacting to signals
– eg. MREQ, RD, ...
• Synchronous: signals are enabled during a clock cycle
– the clock is a “master conductor” who coordinates timing of
activities
– signals can occur during specified timing limits within cycle
• eg. a time duration after another signal changes
• eg. a time duration after clock signal rises or falls
– These time durations (table 3.38) are critical. If devices are
too fast, they can read wrong signal. If they’re too slow,
they may cause extra wait state(s)
• Asynchronous signals can occur at any moment after
handshaking signals seen.
– No clock to coordinate timing
– cause/effect relationship via SYN line
– timing not as critical
• Synchronous devices cheaper to make
– devices react to signals at appropriate times during clock
cycles
– manufacturer simply needs to ensure that time durations are
respected.
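
The remark that a device which is too slow causes extra wait states can be made concrete with a little arithmetic. The sketch below is hypothetical: the function `wait_states` and the timing numbers (a 25 ns clock period, 40 ns available to the device without waiting) are assumptions chosen for illustration, not values from the timing table.

```python
import math


def wait_states(clock_period_ns, time_available_ns, access_time_ns):
    """Extra clock cycles a slow device forces on a synchronous bus.

    time_available_ns is how long the device has to respond when no wait
    state is inserted; every wait state adds one full clock period.
    """
    if access_time_ns <= time_available_ns:
        return 0
    return math.ceil((access_time_ns - time_available_ns) / clock_period_ns)


# Assumed example timings (not taken from the slides).
fast_device = wait_states(25, 40, 40)
slower_device = wait_states(25, 40, 60)
slowest_device = wait_states(25, 40, 100)
```

Because the bus only advances in whole clock cycles, a device that misses its window by even a few nanoseconds still costs a full cycle.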



Bus Arbitration

• When more than one device wants to be the bus master, we
need some bus arbitration mechanism to prevent chaos.

• A centralized arbitration scheme requires a dedicated bus
arbiter, who determines which device is the bus master
next; hence, every device connects to the bus arbiter with
one (or more) bus request and one (or more) bus grant
lines.

• priority of device = position on chain: closer devices have
higher priority --> “daisy chain”

• can use multiple bus request and grant lines; each set
represents a priority, and devices hooked up according to
priority needs

– if multiple priority levels are being requested, arbiter
grants bus to higher priority line

– each priority line is daisy chained
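
The combination of priority levels and daisy chaining described above can be sketched as a selection function. The name `centralized_grant`, the data layout, and the convention that a larger level number means higher priority are all assumptions of this example.

```python
def centralized_grant(requests):
    """Pick the next bus master under centralized arbitration.

    requests maps a priority level (higher number = higher priority, an
    assumption of this sketch) to a list of booleans, one per device on
    that level's daisy chain, ordered from closest to the arbiter to
    farthest. Returns (level, position) of the winner, or None.
    """
    for level in sorted(requests, reverse=True):
        chain = requests[level]
        # The grant propagates down the chain; the first requester keeps it.
        for position, wants_bus in enumerate(chain):
            if wants_bus:
                return level, position
    return None


winner = centralized_grant({1: [False, True, True], 2: [False, False, True]})
```

Here the device on level 2 wins even though it sits farthest from the arbiter on its chain, because position only breaks ties within a single priority level.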



Bus

• 3.39



Bus

• A decentralized arbitration scheme has no arbiter; the
devices themselves would follow a specific protocol to
determine who goes next.

• Multibus: variation of daisy chain

– 3 lines: request, busy, arbitration
– to use bus, device checks that the bus is idle (busy negated) and
that its IN arbitration line is asserted --> if yes, it negates OUT
– all devices downstream are not permitted to use bus
until OUT asserted
– BUT if device upstream negates OUT, this preempts
this device --> daisy chain structure

• 3.40
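
The IN/OUT chain of the decentralized scheme can be sketched as a loop over the devices in chain order. This is an illustrative model only: the function name `decentralized_winner` is hypothetical, and the sketch assumes the IN line of the first device is always asserted.

```python
def decentralized_winner(wants_bus, busy=False):
    """Return the index of the device that becomes bus master, or None.

    wants_bus lists the devices in chain order; device 0 is assumed to
    see IN permanently asserted, so it has the highest priority.
    """
    if busy:
        return None          # bus in use: nobody may start a transfer
    winner = None
    in_line = True           # IN seen by the head of the chain
    for index, wants in enumerate(wants_bus):
        # A device negates OUT if its IN is negated or if it wants the bus.
        out_line = in_line and not wants
        if in_line and wants:
            winner = index   # the only device with IN asserted, OUT negated
        in_line = out_line   # OUT feeds the next device's IN
    return winner
```

For example, `decentralized_winner([False, True, True])` selects device 1: device 2 also wants the bus, but it sees IN negated and has to wait.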



Operations: Bus contention, interrupts

• bus contention: "lock" command can be used for
semaphore commands

– a special line is asserted which holds the bus for one
processor of a multiprocessor, so that it can access shared
memory data structures without interference

• interrupts:

– when I/O device done, it issues interrupt on bus

– multiple interrupts possible: an arbitration scheme
used like bus arbitration
– eg. assign device priorities



Operations: interrupts

• interrupt controller: between CPU and devices to arbitrate
interrupts

• eg. Intel 8259A

• 3.42

• when device asserts 1 of 8 interrupt lines, controller asserts
INT and places device # on D0-D7 lines

– CPU accesses interrupt vector and calls interrupt handler

– can cascade controllers: 2 stage = 64 devices
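
The controller's job of choosing one of several pending interrupts, and the 64-device figure for a two-stage cascade, can be sketched as follows. The function names are hypothetical, and treating line 0 as the highest priority is an assumption of this example.

```python
LINES_PER_CONTROLLER = 8


def highest_priority_request(irq_lines):
    """Return the device number to place on D0-D7, or None if idle.

    irq_lines holds one boolean per interrupt input; the lowest-numbered
    asserted line wins (an assumed priority order for this sketch).
    """
    for number, asserted in enumerate(irq_lines):
        if asserted:
            return number
    return None


def cascade_capacity(stages):
    """Devices supported when every input of each stage feeds another controller."""
    return LINES_PER_CONTROLLER ** stages


pending = [False, False, True, False, True, False, False, False]
selected = highest_priority_request(pending)
```

With two stages, each of the 8 inputs of the first controller is fed by a controller with 8 inputs of its own, giving 8 * 8 = 64 devices.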



Buses: IBM PC

• 62 lines (20 addr, 8 data, 34 control)
– data are only bidirectional lines

• synchronous bus: clock rate of 4.77 MHz (one third of a 14.318 MHz
clock that was chosen to suit the video frequency)

• latches required because of multiplexing of pin signals: hold
values until their part of cycle

• transceivers used for addr, data lines because MOS 8088 is too
weak to drive signals on the bus directly

• bus has 2 address spaces - I/O, or Memory (MEMR, MEMW,
IOR, IOW control)
– Intel’s explicit identification of I/O vs memory will be seen
in instruction set as well

• 8237A: DMA controller chip
– logic for bus protocol, DMA, block xfer
– 8088 sends it addr, device, counts, etc for DMA
transactions



Later PC buses

• 80286 expansion (IBM AT): --> ISA (Industry Standard Architecture) bus
– 1st connector half = original PC (8088) bus
– 2nd half has 36 new lines (more data, addr, interrupt, DMA
channels,...)

• 3.47

• PS/2 series - Microchannel bus totally redefined and patented
– IBM’s attempt to discourage clones; but PS/2 not too successful

• EISA - Extended ISA
– industry (non-IBM) extension of ISA to 32-bit data transfer
– still back-compatible



PCI Bus

• high bandwidth bus, suitable for multimedia
– ISA: 8.33 MHz, 2 bytes/cycle --> 16.7 MB/sec
– EISA: 4 bytes/cycle --> 33.3 MB/sec
– but full video requires:
• 2 * (1024x768 pixels/frame)*3 bytes/pixel*30 frames/sec =
135 MB/sec
(must xfer from HD to mem, then to video card, all on same
bus!)
• PCI 2.1 (1995):
– 66 MHz
– 64 bit transfers
– bandwidth: 528 MB/sec
• Typical PC systems: [3.50]
– up to 133MHz+; 250MHz+ in workstations(Suns)
– PC’s still have old ISA buses:
• access via ISA bridge(s)
• access to IDE disks, old slower peripherals
– dedicated fast access to memory
– PCI access to graphics, SCSI, USB, ...
• PCI cards come in 2 different voltages, and 32 and 64 bit versions (have
120 pins and 120+64 pins resp.)
• buses and cards can run at 33MHz or 66 MHz
• synchronous
• multiplex address and data pins
• AGP - accelerated graphics port
– created to handle higher video resolutions
– AGP 3.0: 2.1 GB/sec
– Recently replaced with PCI Express! (later)
• Pentium 4 architecture (fig 3.53)
– bridge connects all components
– PCI has low-speed peripherals
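
The bandwidth figures quoted on this slide for ISA, EISA, PCI 2.1 and full-screen video can be checked with a few lines of arithmetic. The helper name `bus_bandwidth` is an assumption of this sketch; the input numbers are the ones given above.

```python
MB = 2**20  # the 135 MB/sec video figure works out exactly with 2**20-byte MB


def bus_bandwidth(clock_mhz, bytes_per_cycle):
    """Peak bandwidth in millions of bytes per second."""
    return clock_mhz * bytes_per_cycle


isa = bus_bandwidth(8.33, 2)      # about 16.7
eisa = bus_bandwidth(8.33, 4)     # about 33.3
pci = bus_bandwidth(66, 8)        # 64-bit transfers = 8 bytes per cycle

# Full-screen video: each frame crosses the bus twice (disk to memory,
# then memory to the video card), hence the factor of 2.
video_bytes_per_sec = 2 * (1024 * 768) * 3 * 30
video_mb = video_bytes_per_sec / MB

print(round(isa, 1), round(eisa, 1), pci, video_mb)
```

The video requirement is roughly four times what EISA can deliver, which is the motivation for PCI given above.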



PCI Bus, Arbitration

• PCI is synchronous
– master/slave (“initiator/target”)
– address and data lines are multiplexed: keeps pin count
down
– hence 3 cycles required:
1. master puts address on bus
2. master removes address, bus given to slave
3. slave outputs data
• centralized bus arbiter [fig 3.51]
– REQ#: device requests bus
– GNT#: arbiter asserts to grant bus to device
– no arbitration algorithm specified (can be round robin,
priority, ...)
• Transactions:
– normally 1 transaction per req/grant, with intervening wait
– longer or back-to-back xfers possible
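
Since no arbitration algorithm is specified for PCI, the arbiter designer is free to choose one. The sketch below shows round robin, one of the options mentioned above; the class name `RoundRobinArbiter` and its interface are hypothetical, and real arbiters are hardware rather than software.

```python
class RoundRobinArbiter:
    """Grant the bus to requesting devices in rotating order."""

    def __init__(self, device_count):
        self.device_count = device_count
        self.last_granted = device_count - 1  # so device 0 is checked first

    def grant(self, req):
        """req[i] is True when device i asserts REQ#; returns the GNT# index."""
        for offset in range(1, self.device_count + 1):
            candidate = (self.last_granted + offset) % self.device_count
            if req[candidate]:
                self.last_granted = candidate
                return candidate
        return None  # no REQ# asserted, so no GNT# is issued


arbiter = RoundRobinArbiter(3)
grants = [arbiter.grant([True, False, True]) for _ in range(3)]
```

With devices 0 and 2 both requesting continuously, the grants alternate between them, so neither can starve the other as could happen under fixed priority.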



PCI Bus Signals

• [fig 3.52]
• Some signals:
– multiplexing: cycle 1: addr; cycle 3: data
– C/BE#: (i) cycle 1 = bus command (read 1 word, etc.)
• (ii) cycle 2 = bit map of 4 bits telling which bytes are
valid in 32-bit word
– FRAME#: master sends to start trans, indicate addr and cmd
lines are valid
– IRDY# = master ready to accept data
– IDSEL = select config space (device descr, “plug & play”)
– DEVSEL# = slave has read address
– TRDY# = data for read ready, or ready to accept data for
write
– 64-bit signals: expanded trans for 64 bits
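
The 4-bit byte map carried on C/BE# in the second cycle can be decoded as shown below. This is a sketch under two assumptions: the lines are treated as active low (as the # suffix suggests, so a 0 bit marks a valid byte), and bit 0 is taken to correspond to the least significant byte of the word. The function names are hypothetical.

```python
def enabled_bytes(cbe_n):
    """Decode the 4-bit C/BE# value driven during the data phase.

    Assumes active-low lines: a 0 bit means the corresponding byte lane
    carries valid data.
    """
    return [lane for lane in range(4) if not (cbe_n >> lane) & 1]


def extract(word, cbe_n):
    """Return {lane: byte} for the valid bytes of a 32-bit word."""
    return {lane: (word >> (8 * lane)) & 0xFF for lane in enabled_bytes(cbe_n)}


low_half = extract(0xAABBCCDD, 0b1100)
```

In this example only the two low byte lanes are enabled, so only 0xDD and 0xCC are treated as valid.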



PCI bus transactions

• [fig 3.53]
• very similar to earlier example of synch bus timing
• actions occur on falling edges of clock
• T1:
– master puts addr on AD, read command on C/BE#
– then FRAME# to start transaction
• T2:
– master ‘floats’ addr bus so slave can put data on it
– IRDY#: master ready to accept data
– C/BE# changed to indicate which bytes are to be enabled
• T3:
– slave asserts DEVSEL# (it got the address)
– puts data on AD lines, and asserts TRDY# when done
– (will wait until next cycle if it can’t do in time... wait state)
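
The T1 to T3 sequence above can be written out as a trace of a single read. This is a simplified, hypothetical model (the function `pci_read` and the dictionary used as slave memory are assumptions); it only records which signals change in which cycle and ignores electrical detail.

```python
def pci_read(slave_memory, address, slave_wait_cycles=0):
    """Return (data, log) for one single-word PCI read."""
    log = []
    # T1: master drives the address and the read command, then FRAME#.
    log.append(("T1", f"AD=address {address:#x}, C/BE#=read, FRAME# asserted"))
    # T2: turnaround; the master releases AD so the slave can drive it.
    log.append(("T2", "AD floated, IRDY# asserted, C/BE#=byte enables"))
    # T3: the slave acknowledges that it has the address.
    log.append(("T3", "DEVSEL# asserted"))
    # A slow slave delays the data by inserting wait states.
    data_cycle = 3 + slave_wait_cycles
    for cycle in range(3, data_cycle):
        log.append((f"T{cycle}", "wait state: TRDY# not yet asserted"))
    data = slave_memory[address]
    log.append((f"T{data_cycle}", f"AD=data {data:#x}, TRDY# asserted"))
    return data, log


data, log = pci_read({0x100: 0xDEADBEEF}, 0x100)
for cycle, event in log:
    print(cycle, event)
```

Passing `slave_wait_cycles=1` pushes the data and TRDY# out to T4, which corresponds to the wait state mentioned in the last bullet.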



PCI Express

• Newest bus standard in PCs (fig 3.57)
• Introduced to meet higher bandwidth demands
• Radically different design.
1. centralized switch between devices, CPU
2. fast serial ports between devices (vs. slower parallel)
note that parallel skew inhibits making buses too fast
3. network model of comm: send packets (vs. master/slave)
based on network design
4. error-detection in packets
5. longer connection lengths possible
6. expandable: multiple layers
7. smaller connectors
• (fig 3.58)
• protocol: communication rules
• protocol stack: hierarchy of rules at different levels
– 1. Physical (lowest): fast bit communication
– 2. link layer: packet transmission
• error correction
– 3. transaction layer: bus activity
• read or write transactions
• priorities
• etc
– 4. software layer: interface to OS
• can emulate PCI (but lose advantages!)
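
The idea of a link layer that wraps each packet with error-detection information can be sketched in a few lines. Everything about the frame layout here is an assumption made for illustration (a 2-byte sequence number, the payload, then a 4-byte CRC), and `zlib.crc32` simply stands in for whatever check the real link layer uses.

```python
import struct
import zlib


def link_layer_send(sequence, payload):
    """Frame a transaction-layer payload with a sequence number and a CRC."""
    header = struct.pack(">H", sequence)
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack(">I", crc)


def link_layer_receive(frame):
    """Return (sequence, payload), or None if the CRC check fails."""
    body = frame[:-4]
    received_crc = struct.unpack(">I", frame[-4:])[0]
    if zlib.crc32(body) != received_crc:
        return None  # corrupted frame is rejected
    return struct.unpack(">H", body[:2])[0], body[2:]


frame = link_layer_send(7, b"read 0x1000")
corrupted = bytes([frame[0] ^ 1]) + frame[1:]   # flip one bit in transit
```

The transaction layer above only ever sees `(7, b"read 0x1000")` or nothing at all; the corrupted frame never reaches it, which is the point of layering the stack.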



Pinouts: Intel Pentium 4

• popular CISC chip
• Introduced in Nov 2000
– 1.5 GHz, 42 million transistors
– line width (internal wire): 0.18 micron
– 3 years later: 3.2 GHz, 55 million trans, 0.09 micron
• direct descendant of 8088 in original PC (1981)
– instructions are back-compatible for all Intel chips back to it!
• 32-bit CPU from software perspective
– but can transfer 64 bits at once (h/w perspective)
• Radically different internal microarchitecture: NetBurst
– deeper pipeline
– 2 ALU’s at twice the clock freq
– hyperthreading
• superscalar: executes multiple instructions at once
• cache: save recently executed instructions, data
– level 1: on chip, 8 KB
• decoded into microinstructions
– level 2: 256 KB to 1 MB, pgm and data
– level 3 (p4 extreme)
– caches create complexity for multiprocessing
• 2 buses:
– memory bus to SDRAM
– PCI bus for I/O
• 478 pin package
• up to 82 watts heat



Pentium 4

• energy mgmt: putting CPU to sleep when not in use
– 5 states
• Pinout (fig 3-45)
– notation: name# means assert low (most)
• BR0#: request bus
• BPRI#: hi priority req
• LOCK#: lock out devices
• 33 address lines
• but 36-bit addresses: low 3 are assumed 0
• hence data aligned on 8-byte boundaries
• ADS#: address lines valid
• REQ#: type of request (write, read,...)
• parity: for address, request
• error: FP, hardware,...
• Response: slave to master communication
• Data transfer
• 8 bytes, parity, ready signals,...
• interrupts: setting priorities
• power mgmt: which voltage
• heat mgmt: control fan speed based on temperature
• etc
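
The relationship between the 33 address lines and the 36-bit addresses in the pinout can be shown with shifts. The function names below are hypothetical; the bit counts are the ones given on this slide.

```python
ADDRESS_BITS = 36
IMPLIED_LOW_BITS = 3   # always 0, so they are not driven on any pin


def to_address_lines(address):
    """Return the 33-bit value placed on the address pins."""
    if address >= 1 << ADDRESS_BITS:
        raise ValueError("address does not fit in 36 bits")
    if address & ((1 << IMPLIED_LOW_BITS) - 1):
        raise ValueError("address is not aligned on an 8-byte boundary")
    return address >> IMPLIED_LOW_BITS


def from_address_lines(lines):
    """Rebuild the 36-bit byte address from the 33 pin values."""
    return lines << IMPLIED_LOW_BITS


addressable_gb = 2**ADDRESS_BITS // 2**30
```

Dropping three bits costs nothing because every transfer moves an aligned 8-byte unit, and it still leaves 2^36 bytes (64 GB) addressable.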



UltraSPARC III

• 64-bit RISC by Sun (manufactured by TI, ...)
• back-compatible to earlier SPARC chips
• adds 3D, multimedia, Java instns
• used in workstations, servers
• first introduced in 2000: 600MHz, 29 million trans, 0.18 microns
– 2002: 1.2 GHz, 0.13 microns, 50 Watts
• RISC:
– can do 4 instructions in parallel
– 6 pipelines: 2 for integer instns, 2 fp, 1 load/store, 1
branches
• 1368-pin land grid array (fig 3.47)
• caches:
– 2 L1 caches: 32 KB instns, 64 KB data
– 2 KB prefetch
– 2 KB write (before L2)
– level 2 cache (like P4), but off-chip rather than packaged with the CPU
• can use 2nd party memory: more flexible, cheaper,
but slower
• Faster memory bandwidth than P4: 2.4 GB/sec vs 528 MB/sec
• Ultra Port Architecture (UPA): (fig 3.48)
– spec for connecting (multiple) CPUs to (multiple) RAM
– L2: SRAM chips.
• tags = info about what memory is in “data” cache



UltraSPARC III

• UPA:
– CPU communicates to it to access memory (as do other
CPU’s)
– address: in 2 cycles (row and column)
– can handle 2 memory transactions simultaneously.
Meanwhile, CPU(s) can carry on with other tasks
• UDB II chip:
– buffers requests and results between CPU and UPA



Intel 8051

• Intended for embedded systems: devices needing computer
control, but in which price is primary factor (vs performance)
• Basically a simple 8-bit microprocessor, similar to those of the
late 1970’s and early 1980’s (Apple II, Atari, ...)
• However, it is totally self-contained: CPU, I/O, communication...
• Pinout: (Fig 3-49, 3-50)
– 16 address lines (64 KB)
– 8 data lines
– low-end byte of address multiplexes with data
• changes function at different cycles
– 4 I/O ports, 8 lines each
• main unique feature of chip!
• lets external devices (LEDS, buttons, controls,...)
interface directly to chip w/o external control logic
• Software can thus control device directly. Minimizes
expensive hardware!
– either external or internal memory can be accessed.
• RD, WR - read or write enable
• ALE - address latch (so memory can save address,
while lines multiplexed with data values)
• EA: for ext or internal memory access
– interrupt lines
– TXD, RXD: serial I/O to terminal or modem
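
The way the low address byte shares pins with the data lines, with ALE telling an external latch when to capture the address, can be sketched as a two-phase read. This is an illustrative model only: the function `external_read` and the dictionary standing in for external memory are assumptions of the example.

```python
ADDRESS_SPACE = 2**16   # 16 address lines


def external_read(memory, address):
    """Read one byte from external memory over the multiplexed bus."""
    high = (address >> 8) & 0xFF      # stays on the upper address lines
    low = address & 0xFF              # shares its 8 pins with the data lines

    # Phase 1: the low address byte is on the shared pins; ALE tells the
    # external latch to capture it.
    shared_pins = low
    latch = shared_pins

    # Phase 2: RD is asserted and the shared pins now carry data, while
    # memory still sees the full address from the upper lines plus latch.
    full_address = (high << 8) | latch
    shared_pins = memory[full_address]
    return shared_pins


byte = external_read({0x1234: 0x5A}, 0x1234)
```

Without the latch, the low half of the address would disappear as soon as the pins switched over to carrying data.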
