PCIe - Express Pciee

PCIe-Compliance & Enumeration

Presented by: Pinki Shrivas


Contents
➔ What is PCIe Compliance
➔ PCI-Compatible Configuration Register Space
➔ Extended Configuration Space
➔ Host-to-PCI Bridge Configuration Registers
➔ Legacy PCI Mechanism
➔ Enhanced Configuration Access Mechanism
➔ Configuration Requests
◆ Type 0 Configuration Request
◆ Type 1 Configuration Request
➔ Example PCI-Compatible Configuration Access
➔ Example Enhanced Configuration Access
➔ Enumeration - Discovering the Topology
➔ Single Root Enumeration Example
➔ Multi-Root Enumeration Example
➔ Address Space & Transaction Routing
➔ Base Address Registers (BARs)
➔ Base and Limit Registers
➔ PCI Express Switch Enumeration Using VMM-Based DesignWare Verification IP Flow and Config Testcase

What is PCIe Compliance?
Definition
Compliance means that a product meets the standards set forth by the PCI-SIG in its PCI Express Test Specifications.
Compliance Program
➔ The PCI-SIG Compliance Workshops host interoperability and compliance tests.
◆ Interoperability tests enable members to test their products against other members' products.
◆ Compliance tests allow for product testing against the PCI-SIG test specifications.
◆ Both testing types issue a "pass" or "fail" result for each test area examined. To be formally labeled as compliant, a product must score a minimum of 80 percent on interoperability tests and pass all required compliance tests.

PCIe compliance testing includes:


• Configuration testing: examines the configuration space in PCIe devices
• Link protocol testing: examines a device's link-level protocol
• Electrical testing: examines platform and add-in card Transmitter and Receiver characteristics

Source :PCI-SIG
PCI-Compatible Configuration Register Space
➔ PCI-compatible configuration space is 256 bytes per Function.
➔ The first 16 dwords (64 bytes) of this space are the configuration header (Header Type 0 or Header Type 1).
➔ Type 0 headers are required for every Function except bridge Functions, which use a Type 1 header.
➔ The remaining 48 dwords are used for optional registers, including PCI capability structures.
➔ For PCIe Functions, some capability structures are required.
➔ PCIe Functions must implement the following Capability Structures:
◆ PCI Express Capability
◆ Power Management
◆ MSI and/or MSI-X

Fig: PCI-Compatible Configuration Register Space


Extended Configuration Space
➔ The 256-byte configuration region is not enough to contain all the new capability structures.
➔ The size of configuration space was therefore expanded from 256 bytes per Function to 4KB; the added region is called the Extended Configuration Space.
➔ The 960-dword Extended Configuration area is only accessible using the Enhanced Configuration mechanism.
➔ It is not visible to legacy PCI software.
➔ It contains additional optional Extended Capability registers for PCIe, such as:
◆ Advanced Error Reporting
◆ Virtual Channels
◆ Device Serial Number
◆ Power Budgeting

Fig: 4KB Configuration Space per PCI Express Function
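
The dword counts quoted on the last two slides follow from simple arithmetic. The sketch below only restates those figures; the constant names and the `region_of` helper are illustrative assumptions, not part of any PCIe software interface.

```python
# Sizes quoted on the slides; all names here are illustrative only.
CONFIG_SPACE_BYTES = 4096   # per Function, including the extended space
COMPAT_SPACE_BYTES = 256    # PCI-compatible region
HEADER_BYTES = 64           # Type 0 or Type 1 header
DWORD_BYTES = 4

total_dwords = CONFIG_SPACE_BYTES // DWORD_BYTES      # 1024
compat_dwords = COMPAT_SPACE_BYTES // DWORD_BYTES     # 64
header_dwords = HEADER_BYTES // DWORD_BYTES           # 16
capability_dwords = compat_dwords - header_dwords     # 48
extended_dwords = total_dwords - compat_dwords        # 960


def region_of(offset):
    """Classify a byte offset within one Function's 4KB configuration space."""
    if not 0 <= offset < CONFIG_SPACE_BYTES:
        raise ValueError("offset outside the 4KB configuration space")
    if offset < HEADER_BYTES:
        return "header"
    if offset < COMPAT_SPACE_BYTES:
        return "pci-compatible capabilities"
    return "extended"


print(header_dwords, capability_dwords, extended_dwords)
print(region_of(0x04), region_of(0x40), region_of(0x100))
```

Any offset at or above 100h falls in the extended region and is therefore reachable only through the enhanced (memory-mapped) mechanism.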


Host-to-PCI Bridge Configuration Registers

Only the Root sends Configuration Requests
➔ Only the Root Complex is permitted to originate Configuration Requests. It acts on the system processor's behalf to inject Requests into the fabric and pass Completions back.
➔ Peer-to-peer Configuration Requests are not allowed.
➔ The Requests are routed based on the target device's ID (BDF):
◆ Bus number in the topology,
◆ Device number on that bus,
◆ and Function number within that Device.

Generating Configuration Transactions
➔ Processors are generally unable to perform configuration read and write requests directly because they can only generate memory and IO requests.
➔ The Root Complex therefore needs to translate certain of those accesses into configuration requests.
➔ Configuration space can be accessed using either of two mechanisms:
◆ The legacy PCI configuration mechanism, using IO-indirect accesses.
◆ The enhanced configuration mechanism, using memory-mapped accesses.


Legacy PCI Mechanism
Scenario: an IO-indirect method for instructing the system (the Root Complex or its equivalent) to perform PCI configuration accesses.
The dominant PC processors (Intel x86) were only designed to address 64KB of IO address space, and by the time PCI was defined little of that IO space remained available.
Available address ranges: 0800h - 08FFh and 0C00h - 0CFFh
Problem: it was not feasible to map the configuration registers for all the possible Functions directly into IO space. Memory address space was also limited in size, so mapping all of configuration space into memory address space was not attractive either.
Solution: indirect address mapping
➔ One register holds the target address,
➔ while a second holds the data going to or coming from the target.
A write to the address register, followed by a read or write to the data register, causes a single read or write transaction to the correct internal address for the target Function.

Cons: two IO accesses are needed to create one configuration access.

The PCI-compatible mechanism uses two 32-bit IO ports in the Host bridge of the Root Complex:
➔ Configuration Address Port, at IO addresses 0CF8h - 0CFBh,
➔ Configuration Data Port, at IO addresses 0CFCh - 0CFFh.

Working: accessing a Function's PCI-compatible configuration registers is accomplished by:
➔ Writing the target Bus, Device, Function and dword numbers into the Configuration Address Port, setting its Enable bit in the process.
➔ Sending a one-, two-, or four-byte IO read or write to the Configuration Data Port.
➔ The host bridge in the Root Complex compares the specified target bus to the range of buses that exist downstream of the bridge.
➔ If the target bus is within that range, the bridge initiates a configuration read or write request (depending on whether the IO access to the Configuration Data Port was a read or a write).
Configuration Address Port
The Configuration Address Port only latches information when the processor performs a full 32-bit write to the port.
The information written to the Configuration Address Port must conform to the following template:

➔ Bits [1:0] are hard-wired, read-only, and must return zeros when read. The location is dword-aligned and no byte-specific offset is allowed.

➔ Bits [7:2] identify the target dword (also called the Register Number) in the target Function's PCI-compatible configuration space.
◆ This mechanism is limited to the compatible configuration space (i.e., the first 64 dwords of a Function's configuration space).

➔ Bits [10:8] identify the target Function number (0 - 7) within the target device.

➔ Bits [15:11] identify the target Device number (0 - 31).

➔ Bits [23:16] identify the target Bus number (0 - 255).

➔ Bits [30:24] are reserved and must be zero.

➔ Bit [31] must be set to 1b to enable translation of the subsequent IO access to the Configuration Data Port into a configuration access.
◆ If bit 31 is zero and an IO read or write is sent to the Configuration Data Port, the transaction is treated as an ordinary IO Request.
Fig: Configuration Address Port at 0CF8h
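
As a minimal sketch of the template above, the following function packs the fields into the 32-bit value that would be written to the Configuration Address Port. The function name and its range checks are assumptions made for illustration; it only computes the value and performs no IO.

```python
def config_address(bus, device, function, dword, enable=True):
    """Build the 32-bit value for the Configuration Address Port (0CF8h).

    Hypothetical helper: it packs the fields per the template above and
    does not touch any hardware.
    """
    if not 0 <= bus <= 255:
        raise ValueError("bus must be 0-255")
    if not 0 <= device <= 31:
        raise ValueError("device must be 0-31")
    if not 0 <= function <= 7:
        raise ValueError("function must be 0-7")
    if not 0 <= dword <= 63:
        raise ValueError("dword must be 0-63 (compatible space only)")

    value = (bus << 16) | (device << 11) | (function << 8) | (dword << 2)
    if enable:
        value |= 1 << 31      # bit 31: enable translation to a config access
    return value              # bits [1:0] and [30:24] stay zero


print(hex(config_address(bus=4, device=0, function=0, dword=0)))
```

For Bus 4, Device 0, Function 0, dword 0 this yields 80040000h, the value used in the assembly example later in the deck.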
Bus Compare and Data Port Usage for the Single Host System

➔ The information written to the Configuration Address Port is latched by the Host/PCI bridge within the Root Complex.

➔ If bit 31 is 1b and the target bus is within the bridge's downstream range of bus numbers, the bridge translates a subsequent processor access targeting its Configuration Data Port into a configuration request on bus 0.

➔ The processor then initiates an IO read or write transaction to the Configuration Data Port at 0CFCh.

➔ This causes the bridge to generate a Configuration Request:
◆ a configuration read if the IO access to the Configuration Data Port was a read,
◆ a configuration write if the IO access was a write.

➔ The Request is a Type 0 configuration transaction if the target bus is bus 0, a Type 1 if the target is another bus within the range, and it is not forwarded at all if the target bus is outside of the range.

Fig: Single Host System
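
The bus compare described above can be sketched as a small decision function. The function name, its parameters, and the returned strings are assumptions for illustration; a real host bridge makes this decision in hardware.

```python
def classify_config_access(target_bus, secondary_bus, subordinate_bus, enable=True):
    """Decide what a host bridge does with a Configuration Data Port access.

    Illustrative model only: secondary_bus and subordinate_bus stand for the
    bridge's Secondary and Subordinate Bus Number registers.
    """
    if not enable:
        return "ordinary IO request"      # bit 31 of the address port is 0
    if target_bus == secondary_bus:
        return "type 0"                   # target is the bus directly below
    if secondary_bus < target_bus <= subordinate_bus:
        return "type 1"                   # target is further downstream
    return "not forwarded"                # target bus is outside the range


# Single-host example: secondary bus 0, subordinate bus 10.
for bus in (0, 4, 11):
    print(bus, classify_config_access(bus, 0, 10))
```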


Bus Compare and Data Port Usage for the Multi-Host System

In order to prevent contention, only one of the bridges responds to the processor's accesses to the configuration ports.

1. The host bridges are configured so that, when the processor initiates the IO write to the Configuration Address Port, only one of them (the active bridge) responds.

2. During enumeration, software discovers and numbers all the buses under the active bridge. When that is done, it enables the inactive host bridge. Both host bridges then see the Requests and respond to those for their own bus numbers.

3. After that, accesses to the Configuration Address Port go to both host bridges, and a subsequent read or write access to the Configuration Data Port is only accepted by the host/PCI bridge that is the gateway to the target bus. This bridge responds to the processor's transaction and the other ignores it.
a. If the target bus is the Secondary Bus, the bridge converts the access to a Type 0 configuration access.
b. Otherwise, it converts it into a Type 1 configuration access.

Fig: Multi-Host System


Enhanced Configuration Access Mechanism
When deciding how PCI-X and, later, PCIe would access configuration space, there were two concerns:
➢ First, the 256-byte space per Function limited vendors who wanted to add more standardized capability structures.
○ Solution: the space was simply extended from 256 bytes to 4KB per Function.

➢ Secondly, when PCI was developed there were few multi-processor systems.
➢ When there is only one CPU and it is only running one thread, the fact that the old model takes two steps to generate one access is not a problem.
➢ Newer machines using multi-core, multi-threaded CPUs present a problem for the IO-indirect model, because multiple threads may try to access configuration space at the same time.

Cons:
➢ The two-step model will no longer work without some locking semantics.
➢ Thread A writes a value into the Configuration Address Port (CF8h).
➢ There is nothing to prevent thread B from overwriting that value before thread A can perform its corresponding access to the Configuration Data Port (CFCh).
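
The race described above can be illustrated with a deliberately simplified, single-threaded model in which the interleaving is written out by hand. The `LegacyConfigPorts` class and the dictionary standing in for configuration space are assumptions for illustration only.

```python
class LegacyConfigPorts:
    """Toy model of the shared CF8h/CFCh register pair (not real hardware)."""

    def __init__(self, config_space):
        self.config_space = config_space   # address value -> register contents
        self.address = 0                   # the single, shared address latch

    def write_address(self, value):
        self.address = value               # step 1: IO write to CF8h

    def read_data(self):
        return self.config_space[self.address]   # step 2: IO read from CFCh


space = {
    0x80040000: "Vendor ID of bus 4, device 0",
    0x80050000: "Vendor ID of bus 5, device 0",
}
ports = LegacyConfigPorts(space)

ports.write_address(0x80040000)   # thread A, step 1
ports.write_address(0x80050000)   # thread B preempts A and does its step 1
result_for_a = ports.read_data()  # thread A, step 2: reads B's target instead

print(result_for_a)
```

Because the address latch is shared, thread A silently receives data for the wrong Function unless software serializes the two steps with a lock.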

Solution:
➢ Instead of conserving address space, create a single-step, uninterruptible access by mapping all of configuration space into memory addresses.
➢ This allows a single command sequence, since one memory request in the specified address range generates one Configuration Request on the bus.

The trade-off is address size:
➢ Mapping 4KB per Function for all the possible implementations (256 buses x 32 devices x 8 Functions x 4KB) requires allocating 256MB of memory address space.
➢ Modern architectures typically support a physical memory address space between 36 and 48 bits.
➢ With these memory address space sizes, 256MB is insignificant.

To handle the mapping, each Function's 4KB configuration space starts at a 4KB-aligned address within the 256MB memory address space set aside for configuration access, and the address bits now carry the identifying information about which Function is targeted.

Table: Enhanced Configuration Mechanism Memory-Mapped Address Range

Memory Address Bit Field    Description

A[63:28]    Upper bits of the 256MB-aligned base address of the 256MB memory-mapped address range allocated for the Enhanced Configuration Mechanism. The manner in which the base address is allocated is implementation-specific. It is supplied to the OS by system firmware (typically through the ACPI tables).

A[27:20]    Target Bus Number (0-255)

A[19:15]    Target Device Number (0-31)

A[14:12]    Target Function Number (0-7)

A[11:2]     Target dword. This range can address one of 1024 dwords, whereas the legacy method is limited to addressing one of 64 dwords.

A[1:0]      Defines the access size and the Byte Enable setting
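
The bit fields in the table translate directly into an address calculation. The following is a minimal sketch; the function name `ecam_address` and the example base address E0000000h are assumptions for illustration (the real base is implementation-specific and reported by firmware).

```python
BUSES, DEVICES, FUNCTIONS, BYTES_PER_FUNCTION = 256, 32, 8, 4096
WINDOW_BYTES = BUSES * DEVICES * FUNCTIONS * BYTES_PER_FUNCTION   # 256MB


def ecam_address(base, bus, device, function, offset):
    """Memory address of a configuration register under the enhanced mechanism.

    Hypothetical helper: 'base' is the 256MB-aligned start of the window.
    """
    if base % WINDOW_BYTES:
        raise ValueError("base must be 256MB-aligned")
    if not (0 <= bus < BUSES and 0 <= device < DEVICES and 0 <= function < FUNCTIONS):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < BYTES_PER_FUNCTION:
        raise ValueError("offset must lie within the 4KB space")
    # A[27:20] = bus, A[19:15] = device, A[14:12] = function, A[11:0] = offset
    return base | (bus << 20) | (device << 15) | (function << 12) | offset


print(WINDOW_BYTES // (1024 * 1024), "MB window")
print(hex(ecam_address(0xE0000000, bus=4, device=0, function=0, offset=0)))
```

With a base of E0000000h, Bus 4, Device 0, Function 0, offset 0 maps to E0400000h.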

Rules
➢ A Root Complex is not required to support an access to enhanced configuration memory space if it crosses a dword address boundary (straddles two adjacent
memory dwords).
➢ Nor is a Root Complex required to support the bus locking protocol that some processor types use for an atomic, or uninterrupted, series of commands.
➢ Software should avoid both of these situations when accessing configuration space unless it is known that the Root Complex does support them.
Configuration Requests

Type 0 Configuration Request Type 1 Configuration Request


Example PCI-Compatible Configuration Access

Fig: Configuration Address Port at 0CF8h

To generate a Configuration Request using the legacy CF8h/CFCh mechanism, consider the following x86 assembly code sample, which causes the Root Complex to perform a 2-byte read from Bus 4, Device 0, Function 0, Register 0 (Vendor ID):

mov dx, 0CF8h        ; set dx = config address port address
mov eax, 80040000h   ; enable = 1, bus 4, dev 0, fun 0, DW 0
out dx, eax          ; IO write to set up address port
mov dx, 0CFCh        ; set dx = config data port address
in ax, dx            ; 2-byte read from config data port

Fig: Configuration Read Access


Example Enhanced Configuration Access
● Address bits 63:28 indicate the upper 36 bits of the 256MB-aligned base address of the overall Enhanced Configuration address range (in this case, 00000000_E0000000h).

● Address bits 27:20 select the target bus (in this case, 4).

● Address bits 19:15 select the target device (in this case, 0) on the bus.

● Address bits 14:12 select the target Function (in this case, 0) within the device.

● Address bits 11:2 select the target dword (in this case, 0) within the selected Function's configuration space.

● Address bits 1:0 define the start byte location within the selected dword (in this case, 0).

● The processor initiates a 2-byte memory read starting from memory location E0400000h, and this is latched by the Host Bridge in the Root Complex.

● The Host Bridge recognizes that the address matches the area designated for configuration and generates a Configuration Read Request for the first two bytes of dword 0, Function 0, device 0, bus 4. The remainder of the operation is the same as that described in the previous section.

The Host Bridge must already have been assigned a base address value.
This example assumes that the 256MB-aligned base address of the Enhanced Configuration memory-mapped range is E0000000h:
mov ax, [E0400000h]  ; memory-mapped config read
Enumeration - Discovering the Topology

The process of scanning the PCI Express fabric to discover its topology is referred to as the enumeration process.
Single Root Enumeration Example

Fig: Header Type Register

Fig: Single Root System


At startup time, the configuration software executing on the processor performs enumeration:

1. Software updates the Host/PCI bridge Secondary Bus Number to zero and the Subordinate Bus Number to 255.

2. Starting with Device 0 (bridge A), the enumeration software attempts to read the Vendor ID from Function 0 in each of the 32 possible devices on bus 0.

3. A valid Vendor ID is returned from bus 0, Device 0, Function 0, indicating that the Function exists.

4. The Header Type field contains the value one (01h), indicating that this is a PCI-to-PCI bridge.

5. Now that software has found a bridge, it performs a series of configuration writes to set the bridge's bus number registers as follows:
1. Primary Bus Number Register = 0
2. Secondary Bus Number Register = 1
3. Subordinate Bus Number Register = 255

6. Enumeration software must perform a depth-first search. Before proceeding to discover additional Devices/Functions on bus 0, it must proceed to search bus 1.

7. Software reads the Vendor ID of Bus 1, Device 0, Function 0, which targets bridge C in our example. A valid Vendor ID is returned, indicating that Device 0, Function 0
exists on Bus 1.

8. The Header Type field in the Header register contains the value one (0000001b) indicating another PCI-to-PCI bridge. As before, bit 7 is a 0, indicating that bridge C is a
single-function device.

9. Software now performs a series of configuration writes to set bridge C’s bus number registers as follows:
1. Primary Bus Number Register = 1
2. Secondary Bus Number Register = 2
3. Subordinate Bus Number Register = 255

10. Continuing the depth-first search, a read is performed from bus 2, device 0, Function 0's Vendor ID register. The example assumes that bridge D is Device 0, Function 0 on Bus 2. A valid Vendor ID is returned, indicating that bus 2, device 0, Function 0 exists.

11. The Header Type field in the Header register contains the value one (0000001b), indicating that this is a PCI-to-PCI bridge, and bit 7 is a 0, indicating that bridge D is a single-function device.

12. Software now performs a series of configuration writes to set bridge D's bus number registers as follows:
● Primary Bus Number Register = 2
● Secondary Bus Number Register = 3
● Subordinate Bus Number Register = 255

13. Continuing the depth-first search, a read is performed from bus 3, device 0,Function 0's Vendor ID register.

14. A valid Vendor ID is returned, indicating bus 3, device 0, Function 0 exists.

15. The Header Type field in the Header register contains the value zero (0000000b), indicating that this is an Endpoint Function. Since this is an Endpoint and not a bridge, it has a Type 0 header and there are no PCI-compatible buses beneath it. This time, bit 7 is a 1, indicating that this is a multifunction device.

16. Enumeration software performs accesses to the Vendor ID of all 8 possible Functions in bus 3, device 0 and determines that only Function 1 exists in addition to Function 0. Function 1 is also an Endpoint (Type 0 header), so there are no additional buses beneath this device.

17. Enumeration software continues scanning across bus 3 to look for valid Functions on devices 1 - 31 but does not find any additional Functions.

18. Having found every function there was to find downstream of bridge D, enumeration software updates bridge D, with the real Subordinate Bus Number of 3. Then it backs up
one level (to bus 2) and continues scanning across on that bus looking for valid functions. The example assumes that bridge E is device 1, Function 0 on bus 2.

19. A valid Vendor ID is returned, indicating that this Function exists.

20. The Header Type field in bridge E's Header register contains the value one (0000001b), indicating that this is a PCI-to-PCI bridge, and bit 7 is a 0, indicating a single-function device.

21. Software now performs a series of configuration writes to set bridge E’s bus number registers as follows:
● Primary Bus Number Register = 2
● Secondary Bus Number Register = 4
● Subordinate Bus Number Register = 255

22. Continuing the depth-first search, a read is performed from bus 4, device 0,Function 0's Vendor ID register.

23. A valid Vendor ID is returned, indicating that this Function exists.

24. The Header Type field in the Header register contains the value zero (0000000b), indicating that this is an Endpoint device, and bit 7 is a 0, indicating that this is a single-function device.
25. Enumeration software scans bus 4 to look for valid functions on devices 1-31 but does not find any additional functions.

26. Having reached the bottom of this tree branch, enumeration software updates the bridge above that bus, E in this case, with the real Subordinate Bus Number of 4. It
then backs up one level (to bus 2) and moves on to read the Vendor ID of the next device (device 2). The example assumes that
devices 2 - 31 are not implemented on bus 2, so no additional devices are discovered on bus 2.

27. Enumeration software updates the bridge above bus 2, C in this case, with the real Subordinate Bus Number of 4, backs up to the previous bus (bus 1), and attempts to read the Vendor ID of the next device (device 1). The example assumes that devices 1 - 31 are not implemented on bus 1, so no additional devices are discovered on bus 1.

28. Enumeration software updates the bridge above bus 1, A in this case, with the real Subordinate Bus Number of 4, backs up to the previous bus (bus 0), and moves on to read the Vendor ID of the next device (device 1). The example assumes that bridge B is device 1, Function 0 on bus 0.

29. In the same manner as previously described, the enumeration software discovers bridge B and performs a series of configuration writes to set bridge B's bus number registers as follows:
● Primary Bus Number Register =0
● Secondary Bus Number Register =5
● Subordinate Bus Number Register = 255

30. Bridge F is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register =5
● Secondary Bus Number Register = 6
● Subordinate Bus Number Register = 255

31. Bridge G is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 6
● Secondary Bus Number Register =7
● Subordinate Bus Number Register = 255

32. A single-function Endpoint device is discovered at bus 7, device 0, Function 0, so the Subordinate Bus Number of bridge G is updated to 7.

33. Bridge H is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 6
● Secondary Bus Number Register = 8
● Subordinate Bus Number Register = 255
34. Bridge J is discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 8
● Secondary Bus Number Register = 9
● Subordinate Bus Number Register = 255

35. All devices and their respective Functions on bus 9 are discovered and none of them are bridges, so the Subordinate Bus Numbers of bridges H and J are updated to 9.

36. Bridge I is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 6
● Secondary Bus Number Register = 10
● Subordinate Bus Number Register = 255

37. A single-function Endpoint device is discovered at bus 10, device 0, Function 0.

38. Since software has reached the bottom of this branch of the tree structure required for PCIe topologies, the Subordinate Bus Number registers for bridges B, F, and I are updated to 10, and so is the Host/PCI bridge's Subordinate Bus Number register.
Final Values encoded into each bridge’s Primary,Secondary, Subordinate Bus Number

Fig :Configuration Read Access
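
The depth-first numbering walked through in steps 1 - 38 can be sketched in a few lines. This is an illustrative model, not enumeration software: the nested-dictionary topology, the `bridge` helper, and `enumerate_bus` are assumptions, Vendor ID probing is replaced by iterating over a list, and the multifunction Endpoint under bridge D is modeled as a single entry.

```python
EP = None  # an Endpoint: Type 0 header, no buses beneath it


def bridge(name, *devices):
    """Hypothetical helper: a bridge plus the devices on its secondary bus."""
    return {"name": name, "devices": list(devices)}


def enumerate_bus(devices, bus, registers):
    """Depth-first scan of one bus; returns the highest bus number at or below it."""
    highest = bus
    for device in devices:
        if device is EP:
            continue                      # nothing to number beneath an Endpoint
        regs = {"primary": bus, "secondary": highest + 1, "subordinate": 255}
        registers[device["name"]] = regs  # provisional Subordinate Bus Number of 255
        highest = enumerate_bus(device["devices"], regs["secondary"], registers)
        regs["subordinate"] = highest     # replace 255 with the real value
    return highest


# Topology assumed from the single-root example (bridges A through J).
topology = [
    bridge("A", bridge("C", bridge("D", EP), bridge("E", EP))),
    bridge("B", bridge("F",
                       bridge("G", EP),
                       bridge("H", bridge("J", EP, EP)),
                       bridge("I", EP))),
]

registers = {}
host_subordinate = enumerate_bus(topology, 0, registers)

for name, regs in registers.items():
    print(name, regs)
print("Host/PCI bridge subordinate bus:", host_subordinate)
```

The key design point is the provisional value of 255: it keeps every bus below a bridge reachable while the search is still in progress, and it is tightened to the real Subordinate Bus Number only when the search backs out of that branch.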


Multi-Root Enumeration Example
Once the enumeration process for the primary Root Complex has been completed, the enumeration software takes the following steps to enumerate the secondary Root Complex:

1. The enumeration software changes the Secondary and Subordinate Bus Number values in the secondary Root Complex's Host/PCI bridge to bus 64 in this example. (The values of 64 and 128 are commonly used as the starting bus number in multi-root systems.)

2. Enumeration software then starts searching on bus 64 and discovers the bridge attached to the downstream Root Port.

3. A series of configuration writes are performed to set its bus number registers as follows:
a) Primary Bus Number Register = 64
b) Secondary Bus Number Register = 65
c) Subordinate Bus Number Register = 255

4. Device 0 is discovered on Bus 65 and implements only Function 0. Further searching reveals that no other Devices are present on Bus 65, so the search process moves back up one bus level.

5. Enumeration continues on bus 64 and no additional devices are discovered, so the Host/PCI bridge's Subordinate Bus Number is updated to 65.

Fig : Multi-Root System


Address Space & Transaction Routing

➔ PCIe supports the same three address spaces as PCI:
◆ Configuration
◆ Memory
◆ IO

Configuration Space: it allows software to control and check the status of devices.
Memory and IO Address Spaces
➔ In the early days of PCs, the internal registers/storage in IO devices were accessed via IO address space.

➔ There are several limitations and undesirable effects related to IO address space.

➔ Newer devices that do not depend on legacy software or have compatibility constraints typically just map internal registers/storage through memory address space (MMIO), with no IO address space being requested.

➔ The size of the memory map is a function of the range of addresses that the system can use.

➔ The size of the IO map in PCIe is limited to 32 bits (4GB).

➔ In many computers using Intel-compatible (x86) processors, only the lower 16 bits (64KB) are used.

➔ PCIe can support memory addresses up to 64 bits in size.

➔ Switches and Root Complexes can also have device-specific registers accessed via MMIO and IO addresses.

Fig : Generic Memory And IO Address Maps


Base Address Registers(BARs)

⮚ Each device in a system may have different requirements in terms of the amount and type of address space needed.
⮚ For example, one device may have 256 bytes' worth of internal registers/storage that should be accessible through IO address space, and another device may have 16KB of internal registers/storage that should be accessible through MMIO.

⮚ Once software knows what the device's requirements are in terms of address space, and assuming the request can be fulfilled, it simply allocates an available range of addresses of the appropriate type (IO, NP-MMIO or P-MMIO) to that device.

⮚ This is all accomplished through the Base Address Registers (BARs) in the header of configuration space.

Fig :BARs in Configuration Space


➔ System software must first determine the size and type of address space being requested by a device.

➔ The device designer knows the collective size of the internal registers/storage that should be accessible via IO or MMIO.

➔ The device designer also knows how the device will behave when those registers are accessed (i.e., whether or not reads have side effects).

➔ This determines whether prefetchable MMIO (reads have no side effects) or non-prefetchable MMIO (reads do have side effects) should be requested.

➔ Knowing this information, the device designer hard-codes the lower bits of the BARs to certain values indicating the type and size of the address space being requested.

➔ The upper bits of the BARs are writable by software.

➔ To determine the size and type of address space requested, system software checks the lower bits of the BARs.

➔ It then writes the base address of the address range being allocated to this device into the upper bits of the BAR.

➔ Not all BARs have to be implemented. If a device does not need all the BARs to map its internal registers, the extra BARs are hard-coded with all 0s, notifying software that these BARs are not implemented.

Fig: PCI Express Devices And Type 0 And Type 1 Header Use
BAR Example1: 32-bit Memory Address Space
➔ The device requests a 4KB block of non-prefetchable memory (NP-MMIO) through BAR0.

➔ The configuration process has three stages:
◆ Uninitialized state of the BAR. The lower bits are hard-coded to indicate the size and type, while the upper bits (which are read-write) are shown as Xs because their values are not yet known.
◆ After writing all 1s to the BARs, software reads back the value of each BAR, starting with BAR0, to determine the type and size of the address space being requested.
◆ Software allocates an address range to BAR0 by writing the start address to it.

➔ Start address: F900_0000h

➔ Once software enables memory address decoding in the Command register (offset 04h), configuration of BAR0 is complete.

➔ The device will then accept any memory request that falls within the range F900_0000h - F900_0FFFh (4KB in size).

BAR Bits    Meaning

0           Read as 0b, indicating a memory request
2:1         Read as 00b, indicating the target only supports decoding a 32-bit address
3           Read as 0b, indicating the request is for non-prefetchable memory (reads do have side effects); NP-MMIO
11:4        Read as all 0s, indicating the size of the request
31:12       Read as all 1s because software has not yet programmed the upper bits with a start address for the block. Since the least significant writable bit is bit 12, the memory size requested is 2^12 = 4KB
Fig: 32-Bit Non-Prefetchable Memory BAR Set Up
Table: Reading the BAR after Writing All 1s
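
The table above can be turned into a small decoder for the value read back after writing all 1s. This is a sketch for 32-bit memory BARs only; the function name and the returned dictionary layout are assumptions for illustration.

```python
def decode_memory_bar32(readback):
    """Decode a 32-bit memory BAR value read back after writing all 1s.

    Illustrative helper only; it expects bit 0 = 0 (memory request).
    """
    if readback & 0x1:
        raise ValueError("bit 0 is 1: this is an IO BAR, not a memory BAR")
    address_mask = readback & 0xFFFFFFF0          # drop the type bits [3:0]
    size = (~address_mask & 0xFFFFFFFF) + 1       # lowest writable bit gives the size
    return {
        "is_64bit": ((readback >> 1) & 0x3) == 0b10,
        "prefetchable": bool(readback & 0x8),
        "size": size,
    }


# Bits 31:12 read as 1s and bits 11:0 as 0s in the example above.
info = decode_memory_bar32(0xFFFFF000)
print(info)

# Range the device claims once software writes F900_0000h as the start address.
start = 0xF9000000
print(hex(start), "-", hex(start + info["size"] - 1))
```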
BAR Example2: 64-bit Memory Address Space
➔ BAR1 and BAR2 are used together to request a 64MB block of prefetchable memory address space.

➔ Two sequential BARs are used because the device supports a 64-bit address for this request.

➔ Software can allocate the requested address space above the 4GB address boundary if it wants.

➔ The configuration process has three stages:
◆ Uninitialized state of the BAR pair. The device designer has hard-coded the lower bits of the lower BAR (BAR1) to indicate the size and type, while the bits of the upper BAR (BAR2) are all read-write.
◆ After writing all 1s to every BAR, software's next step is to read the next BAR (BAR1) and evaluate it to see whether the device is requesting additional address space. It is, and the request is for prefetchable memory address space that can be allocated anywhere in the 64-bit address range.
◆ The final step is for system software to allocate an address range to the BAR pair.
◆ Software uses two configuration writes to program the 64-bit start address for the allocated range:
● In this example, bit 1 of the upper BAR (address bit 33 in the BAR pair) is set and bit 30 of the lower BAR (address bit 30 in the BAR pair) is set to indicate a start address of 2_4000_0000h.

➔ Once software enables memory address decoding in the Command register (offset 04h), configuration of the BAR pair (BAR1 and BAR2) is complete.

➔ The device will then accept any memory request that falls within the range 2_4000_0000h - 2_43FF_FFFFh (64MB in size).

Fig: 64-Bit Prefetchable Memory BAR Set Up


BAR      BAR Bits    Meaning

Lower    0           Read as 0b, indicating a memory request.
Lower    2:1         Read as 10b, indicating the target supports a 64-bit address decoder and that the next sequential BAR contains the upper 32 bits of the address information.
Lower    3           Read as 1b, indicating the request is for prefetchable memory (reads do not have side effects); P-MMIO
Lower    25:4        Read as all 0s, indicating the size of the request (these bits are hard-coded to 0)
Lower    31:26       Read as all 1s because software has not yet programmed the upper bits with a start address for the block. Since the least significant writable bit is bit 26, the memory address space request size is 2^26 = 64MB
Upper    31:0        Read as all 1s. These bits will be used as the upper 32 bits of the 64-bit start address programmed by system software

Table: Reading the BAR pair after Writing All 1s
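
For a 64-bit request the same sizing idea applies across the BAR pair. The sketch below decodes the read-back values from the table and then splits the example start address into the two dwords software would write; the function names are assumptions for illustration.

```python
def decode_memory_bar64(lower, upper):
    """Decode a 64-bit memory BAR pair read back after writing all 1s."""
    if lower & 0x1 or ((lower >> 1) & 0x3) != 0b10:
        raise ValueError("lower BAR does not describe a 64-bit memory request")
    address_mask = (upper << 32) | (lower & 0xFFFFFFF0)
    size = (~address_mask & 0xFFFFFFFFFFFFFFFF) + 1
    return {"prefetchable": bool(lower & 0x8), "size": size}


def split_start_address(start, lower_type_bits):
    """Return the (lower, upper) dword values for a 64-bit start address."""
    lower = (start & 0xFFFFFFF0) | (lower_type_bits & 0xF)
    upper = (start >> 32) & 0xFFFFFFFF
    return lower, upper


# Lower BAR: bits 31:26 = 1s, bits 25:4 = 0s, type bits = 1100b; upper BAR: all 1s.
info = decode_memory_bar64(lower=0xFC00000C, upper=0xFFFFFFFF)
print(info["size"] // (1024 * 1024), "MB, prefetchable:", info["prefetchable"])

lower, upper = split_start_address(0x2_4000_0000, lower_type_bits=0xC)
print(hex(lower), hex(upper))
print(hex(0x2_4000_0000 + info["size"] - 1))
```

The split shows bit 1 set in the upper BAR and bit 30 set in the lower BAR, matching the example start address of 2_4000_0000h.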


BAR Example3: IO Address Space Request
➔ The requesting BAR is BAR3.

➔ The configuration process has three stages:
◆ Uninitialized state of the BAR. System software has previously written all 1s to every BAR and has evaluated BAR0, then BAR1 and BAR2. Software now checks whether the device is requesting additional address space with BAR3.
◆ Software reads BAR3 to evaluate the size and type of the request, and finds a request for 256 bytes of IO address space.
◆ Software allocates an IO address range to BAR3 by writing the start address to it.

➔ Start address: 4000h (16KB)

➔ Once software enables IO address decoding in the Command register (offset 04h), configuration of BAR3 is complete.

➔ The device will then accept and respond to IO transactions within the range 4000h - 40FFh (256 bytes in size).

BAR Bits    Meaning

0           Read as 1b, indicating an IO request
1           Reserved. Hard-coded to 0b
7:2         Read as 0s, indicating the size of the request (these bits are hard-coded to 0)
31:8        Read as 1s because software has not yet programmed the upper bits with a start address for the block. Since the least significant writable bit is bit 8, the IO address space request size is 2^8 = 256 bytes

Fig: IO BAR Set Up
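
The IO case differs from the memory case only in which low bits carry type information. A minimal sketch, with the function name assumed for illustration:

```python
def decode_io_bar(readback):
    """Return the size in bytes requested by an IO BAR read back after all 1s."""
    if not readback & 0x1:
        raise ValueError("bit 0 is 0: this is a memory BAR, not an IO BAR")
    address_mask = readback & 0xFFFFFFFC       # bits [1:0] are type/reserved
    return (~address_mask & 0xFFFFFFFF) + 1


# Bits 31:8 read as 1s, bits 7:2 as 0s, bit 1 reserved (0), bit 0 = 1.
size = decode_io_bar(0xFFFFFF01)
start = 0x4000                                  # 16KB, as in the example
print(size, "bytes:", hex(start), "-", hex(start + size - 1))
```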


Base and Limit Registers

➢ The Base and Limit registers in Type 1 headers are programmed with the range of addresses that live beneath the bridge.

➢ There are three sets of Base and Limit registers in each Type 1 header.

➢ Three sets of registers are needed because there can be three separate address ranges living below a bridge:
○ Prefetchable Memory space (P-MMIO)
○ Non-Prefetchable Memory space (NP-MMIO)
○ IO space (IO)

Fig: Example Topology for Setting Up Base and Limit Values


Prefetchable Memory space (P-MMIO)
Prefetchable Memory Base/Limit Register Meanings

Fig: Prefetchable Memory Base/Limit Register Values


Non-Prefetchable Memory space (NP-MMIO)
Non-Prefetchable Memory Base/Limit Register Meanings

IO Range
IO Base/Limit Register Meanings

Fig: Example IO Base/Limit Register Values
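
Because the register-meaning figures did not survive extraction, here is a minimal sketch of how a 32-bit memory window maps onto the 16-bit Memory Base and Memory Limit fields. It assumes the standard Type 1 header encoding, in which bits 15:4 of each field hold address bits 31:20 and the limit's lower 20 address bits are taken as FFFFFh. The helper names are hypothetical.

```python
from typing import Tuple

MB = 1 << 20


def encode_mem_window(start: int, end_inclusive: int) -> Tuple[int, int]:
    """Return (base_field, limit_field) for a 1 MB aligned memory window."""
    if start % MB != 0 or (end_inclusive + 1) % MB != 0:
        raise ValueError("bridge memory windows are 1 MB aligned")
    base_field = (start >> 16) & 0xFFF0          # address bits 31:20
    limit_field = (end_inclusive >> 16) & 0xFFF0
    return base_field, limit_field


def decode_mem_window(base_field: int, limit_field: int) -> Tuple[int, int]:
    """Return the inclusive (start, end) range described by the two fields."""
    start = (base_field & 0xFFF0) << 16
    end_inclusive = ((limit_field & 0xFFF0) << 16) | 0xFFFFF
    return start, end_inclusive


def in_window(addr: int, base_field: int, limit_field: int) -> bool:
    start, end_inclusive = decode_mem_window(base_field, limit_field)
    return start <= addr <= end_inclusive


base, limit = encode_mem_window(2 * MB, 4 * MB - 1)   # a 2 MB window at 2M
print(hex(base), hex(limit))                          # 0x20 0x30
```

Note that the limit is inclusive: under this encoding a limit field of 0x0030 covers addresses up to 003FFFFFh.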


Routing Mechanisms
➔ ID Based Routing
◆ Endpoints: One Check
◆ Switches (Bridges): Two Checks Per Port
➔ Address Routing
◆ Endpoint Address Checking
➔ Switch Routing
➔ Implicit Routing

➔ ID Based Routing:
◆ It is compatible with the routing methods used in the PCI and PCI-X protocols for Configuration transactions.
◆ In PCIe, it is still used for routing configuration packets.
◆ It is also used to route completions and some messages.
◆ Endpoints: One Check: An endpoint simply checks the ID field in the packet header against its own BDF (Bus, Device, Function).

➔ Address Routing: Memory and IO transactions both use address routing.

➔ Implicit Routing: Used in some message packets.

ID Based Routing

Switches (Bridges): Two Checks Per Port

Fig: Switch Checks Routing Of An Inbound TLP using ID Routing
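
Since the figure is not reproduced here, the checks can be sketched as follows. This is a hedged illustration, assuming the two per-port checks are (1) does the target ID match the port's own BDF, and (2) does the target bus fall within the port's Secondary to Subordinate bus range. The `BDF` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BDF:
    bus: int
    device: int
    function: int


def endpoint_accepts(own, target):
    """Endpoint: one check, target ID against its own BDF."""
    return own == target


def switch_port_id_route(own, secondary_bus, subordinate_bus, target):
    """Switch port, TLP received on the primary interface: two checks."""
    if target == own:                                   # check 1: for me?
        return "consume"
    if secondary_bus <= target.bus <= subordinate_bus:  # check 2: below me?
        return "forward downstream"
    return "not claimed"


# Example values: a port at bus 2 whose downstream buses are 3 through 5.
port = BDF(2, 0, 0)
print(switch_port_id_route(port, 3, 5, BDF(4, 0, 0)))   # forward downstream
```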


Address Routing

Fig: Endpoint Checks Incoming TLP


Switch Routing
Below are the steps that a bridge (switch port) takes upon receiving an address-based TLP:

Downstream Traveling TLPs (Received on Primary Interface)

1. IF the target address in the TLP matches one of the BARs, the bridge (switch port) consumes the TLP.

2. IF the target address in the TLP falls in the range of one of its Base/Limit register sets, the packet will be forwarded to the secondary interface (downstream).

3. ELSE the TLP will be handled as an Unsupported Request on the primary interface.

Upstream Traveling TLPs (Received on Secondary Interface)

1. IF the target address in the TLP matches one of the BARs, the bridge (switch port) consumes the TLP.

2. IF the target address in the TLP falls in the range of one of its Base/Limit register sets, the TLP will be handled as an Unsupported Request on the secondary interface.

3. ELSE the TLP will be forwarded to the primary interface (upstream), given that the TLP address is not for this bridge and is not for any function beneath this bridge.

Fig: Switch Checks Routing Of An Inbound TLP Using Address
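
The two decision lists above can be expressed as one small function. This is a sketch under simplifying assumptions: BARs and Base/Limit windows are modeled as inclusive (low, high) tuples, and the function name and return strings are illustrative only.

```python
def route_address_tlp(addr, bars, windows, received_on):
    """Decide what a switch port does with an address-routed TLP.

    bars, windows: lists of inclusive (low, high) address ranges.
    received_on:   "primary" (traveling downstream) or
                   "secondary" (traveling upstream).
    """
    if any(lo <= addr <= hi for lo, hi in bars):
        return "consume"                       # step 1 in both directions
    in_window = any(lo <= addr <= hi for lo, hi in windows)
    if received_on == "primary":
        # Step 2: forward if beneath this bridge. Step 3: otherwise UR.
        return "forward downstream" if in_window else "unsupported request"
    if received_on == "secondary":
        # Step 2: a hit from below is an error. Step 3: otherwise go up.
        return "unsupported request" if in_window else "forward upstream"
    raise ValueError("received_on must be 'primary' or 'secondary'")


# Example: a port with no BARs and one window covering 2M up to 6M.
windows = [(0x00200000, 0x005FFFFF)]
print(route_address_tlp(0x00300000, [], windows, "primary"))
print(route_address_tlp(0x00800000, [], windows, "secondary"))
```

The asymmetry is the point of the sketch: a window hit means "forward" when the TLP arrives from above, but "Unsupported Request" when it arrives from below.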
PCI Express Switch Enumeration Using VMM-Based DesignWare Verification IP
During the enumeration process, the system software:
● Discovers all of the switch and endpoint devices that are connected to the system
● Determines the memory requirements
● Configures the PCIe devices

Step 1: The enumeration process selects the bus numbers for each device.
● The upstream port of the switch will be assigned bus #2.
● In the switch device, the internal bus between the upstream port and the two downstream ports will be assigned bus #3.
● The PCIe Endpoint (EP) devices that are downstream from the switch are on bus #4 and bus #5.
● The device number for both EPs will be 0.

Step 2: Determine the memory size of each device.

● The testbench environment sets the memory size of each device.
● To simplify, the switch will not have any internal memory, and each EP device will have 2 MB of memory.
● The EP device on bus 4 will start at address 2M.
● The EP device on bus 5 will start at address 4M.
● The address range that the Switch will support is 2M through 6M.
● Each EP device will only support 32-bit addresses.
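
The bus numbers and address map chosen in Steps 1 and 2 can be reproduced with a short planning sketch. Everything here (the function `plan_switch`, the dictionary keys) is hypothetical and only mirrors the arithmetic of this example; it is not how the testbench computes the values.

```python
MB = 1 << 20


def plan_switch(upstream_bus, num_downstream_ports, ep_size, first_addr):
    """Assign bus numbers and memory ranges for one switch and its EPs."""
    internal_bus = upstream_bus + 1        # bus between the switch ports
    endpoints = []
    next_bus = internal_bus + 1
    addr = first_addr
    for port_device in range(1, num_downstream_ports + 1):
        endpoints.append({
            "port_device": port_device,    # downstream port on internal bus
            "bus": next_bus,               # bus the EP sits on
            "device": 0,                   # EP device number
            "start": addr,
            "end": addr + ep_size - 1,     # inclusive
        })
        next_bus += 1
        addr += ep_size
    return {
        "upstream_bus": upstream_bus,
        "internal_bus": internal_bus,
        "subordinate_bus": next_bus - 1,   # farthest downstream bus
        "window": (first_addr, addr - 1),  # inclusive range behind switch
        "endpoints": endpoints,
    }


plan = plan_switch(upstream_bus=2, num_downstream_ports=2,
                   ep_size=2 * MB, first_addr=2 * MB)
for ep in plan["endpoints"]:
    print(ep["bus"], hex(ep["start"]), hex(ep["end"]))
```

With the values from this example the plan yields buses 2, 3, 4 and 5, endpoint ranges starting at 2M and 4M, and a switch window that ends just below 6M.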

Now the Switch can be configured. Each port of the Switch has its own configuration space, and the format of that configuration space is Type 1 (CONFIG-1).
Each of these CONFIG-1 spaces must be programmed. The upstream port needs to be programmed first.

Fig: Example Testbench With Bus Numbers Assigned

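Byte Offset   Register fields, listed from byte 3 down to byte 0

0x00          Device ID (bytes 3:2) | Vendor ID (bytes 1:0)
0x04          Status Register (bytes 3:2) | Command Register (bytes 1:0)
0x08          Class Code (bytes 3:1) | Revision ID (byte 0)
0x0C          BIST (0x00) (byte 3) | Header Type (byte 2) | Latency Timer (byte 1) | Cache Line Size (byte 0)
0x10          Base Address Register 0 (bytes 3:0)
0x14          Base Address Register 1 (bytes 3:0)
0x18          Secondary Latency Timer (byte 3) | Subordinate Bus Number (byte 2) | Secondary Bus Number (byte 1) | Primary Bus Number (byte 0)
0x1C          Secondary Status (bytes 3:2) | I/O Limit (byte 1) | I/O Base (byte 0)
0x20          Memory Limit (bytes 3:2) | Memory Base (bytes 1:0)
0x24          Prefetchable Memory Limit (bytes 3:2) | Prefetchable Memory Base (bytes 1:0)
0x28          Prefetchable Base Upper 32 Bits (bytes 3:0)
0x2C          Prefetchable Limit Upper 32 Bits (bytes 3:0)
0x30          I/O Limit Upper 16 Bits (bytes 3:2) | I/O Base Upper 16 Bits (bytes 1:0)
0x34          Reserved (bytes 3:1) | CapPtr (byte 0)
0x38          Expansion ROM Base Address (bytes 3:0)
0x3C          Bridge Control (bytes 3:2) | Interrupt Pin (byte 1) | Interrupt Line (byte 0)

Table 1. PCI Configuration Space Header - Type 1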


Configuration parameters and the values that need to be programmed:

Register                 Byte Address   Register Number   Value

Command[15:0]            0x04           01                16'h0004
Primary Bus[7:0]         0x18           06                8'b00000010
Secondary Bus[7:0]       0x18           06                8'b00000011
Subordinate Bus[7:0]     0x18           06                8'b00000101
Memory Base[15:0]        0x20           08                16'h0020
Memory Limit[15:0]       0x20           08                16'h0060

Table 2. Upstream Port Configuration Values
Note: A minimum of six registers must be programmed for the Switch to support configuration and memory accesses to an EP device.
They are:
1. Command: Enable memory access
2. Primary Bus: Bus number of the upstream bus
3. Secondary Bus: Bus number of the bus directly attached on the downstream side
4. Subordinate Bus: Bus number of the farthest downstream device
5. Memory Limit: Maximum memory address
6. Memory Base: Starting memory address
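
Several of these registers share one 32-bit dword, so each configuration write carries a packed payload. The sketch below shows the packing implied by the Type 1 header layout in Table 1; the helper names are hypothetical, and the Command bit names assume the standard PCI Command register layout (bit 0 I/O Space Enable, bit 1 Memory Space Enable, bit 2 Bus Master Enable).

```python
def reg_num(byte_offset):
    """Dword register number used in a configuration request."""
    return byte_offset >> 2


def bus_number_dword(primary, secondary, subordinate, sec_latency=0):
    """Pack the dword at byte offset 0x18 of a Type 1 header."""
    return (sec_latency << 24) | (subordinate << 16) | (secondary << 8) | primary


def memory_dword(base_field, limit_field):
    """Pack the dword at byte offset 0x20: limit in bytes 3:2, base in 1:0."""
    return (limit_field << 16) | base_field


COMMAND_BITS = {0: "I/O Space Enable", 1: "Memory Space Enable",
                2: "Bus Master Enable"}


def command_flags(value):
    """Names of the low Command register bits set in value."""
    return [name for bit, name in COMMAND_BITS.items() if value & (1 << bit)]


print(reg_num(0x04), reg_num(0x18), reg_num(0x20))      # 1 6 8
print(hex(bus_number_dword(2, 3, 5)))                   # 0x50302
print(hex(memory_dword(0x0020, 0x0060)))                # 0x600020
print(command_flags(0x0004))
```

Two details are worth checking against the target model when reusing the table values. Under the standard layout, 16'h0004 sets Bus Master Enable (bit 2) rather than Memory Space Enable (bit 1). And because the Memory Limit is inclusive with its lower 20 bits treated as FFFFFh, a limit field of 16'h0060 extends the window to 006FFFFFh.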
The upstream port configuration space is programmed using CONFIG-0 packets.
CONFIG-0 packets use the bus number, device number, function number, and register number to access the register to be programmed.
The following task will program the upstream Switch port with the previously listed values.

// Program Upstream Port
// 1) Write Command register
// 2) Write Subordinate, Secondary and Primary Bus Numbers
// 3) Write Memory Limit and Memory Base

task pcie_rvm_env::Upstream_switch_config();
begin
  bit test;

  dw_vip_pcie_tlp_transaction cfg0_wr =
      new ( , dw_vip_pcie_tlp_transaction::CFG_WR_0);
  dw_vip_pcie_tlp_transaction cfg_tlp;

  // Write COMMAND REGISTER (register number 1 = byte offset 0x04)
  // Enable memory cycles
  test = cfg0_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h02; m_bvDevNum == 5'h00;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h001;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00000004;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  // Send a copy of the request and wait until the transaction has ended
  $cast(cfg_tlp, cfg0_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Subordinate Bus, Secondary Bus and Primary Bus Numbers
  // (register number 6 = byte offset 0x18)
  test = cfg0_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h02; m_bvDevNum == 5'h00;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h006;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00050302;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg0_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Memory Limit and Memory Base (register number 8 = byte offset 0x20)
  test = cfg0_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h02; m_bvDevNum == 5'h00;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h008;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00600020;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg0_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);
end
endtask

The downstream port (Device 1) has the following values:

Register                 Byte Address   Register Number   Value

Command[15:0]            0x04           01                16'h0004
Primary Bus[7:0]         0x18           06                8'b00000011
Secondary Bus[7:0]       0x18           06                8'b00000100
Subordinate Bus[7:0]     0x18           06                8'b00000100
Memory Base[15:0]        0x20           08                16'h0020
Memory Limit[15:0]       0x20           08                16'h0040

Table 3. Downstream Port Configuration Values

The downstream ports' configuration space is programmed using CONFIG-1 packets.

CONFIG-1 packets also use the bus number, device number, function number, and register number to access the register to be programmed.
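
Why the upstream port is reached with CONFIG-0 packets while the downstream ports need CONFIG-1 packets can be sketched with the usual bridge rule for Type 1 configuration requests. This is a hedged sketch assuming standard behavior: a Type 1 request whose target bus equals the bridge's Secondary Bus is converted to Type 0, one that targets a bus further below is forwarded unchanged, and anything else is not forwarded. The function name and return strings are illustrative.

```python
def handle_type1_config(target_bus, secondary_bus, subordinate_bus):
    """What a bridge does with a Type 1 config request on its primary side."""
    if target_bus == secondary_bus:
        # Target is on the bus directly below this bridge.
        return "convert to Type 0"
    if secondary_bus < target_bus <= subordinate_bus:
        # Target is behind another bridge further downstream.
        return "forward as Type 1"
    return "not forwarded"


# Upstream port of the example switch: secondary bus 3, subordinate bus 5.
print(handle_type1_config(3, 3, 5))   # request for a downstream port on bus 3
print(handle_type1_config(4, 3, 5))   # request for the EP on bus 4
```

In this example the CONFIG-1 packets addressed to bus 3 reach the upstream port first, which is why its bus numbers have to be programmed before the downstream ports can be configured.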
The following task will program downstream port Device 1:

// Program Device 1 Downstream Port
// 1) Write Command register
// 2) Write Subordinate, Secondary and Primary Bus Numbers
// 3) Write Memory Limit and Memory Base

task pcie_rvm_env::Downstream_device_1_switch_config();
begin
  bit test;

  dw_vip_pcie_tlp_transaction cfg1_wr =
      new ( , dw_vip_pcie_tlp_transaction::CFG_WR_1);
  dw_vip_pcie_tlp_transaction cfg_tlp;

  // Write COMMAND REGISTER (register number 1 = byte offset 0x04)
  // Enable memory cycles
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h01;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h001;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00000004;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Subordinate Bus, Secondary Bus and Primary Bus Numbers
  // (register number 6 = byte offset 0x18)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h01;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h006;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00040403;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Memory Limit and Memory Base (register number 8 = byte offset 0x20)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h01;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h008;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00400020;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);
end
endtask

The downstream port (Device 2) has the following values:

Register                 Byte Address   Register Number   Value

Command[15:0]            0x04           01                16'h0004
Primary Bus[7:0]         0x18           06                8'b00000011
Secondary Bus[7:0]       0x18           06                8'b00000101
Subordinate Bus[7:0]     0x18           06                8'b00000101
Memory Base[15:0]        0x20           08                16'h0040
Memory Limit[15:0]       0x20           08                16'h0060

Table 4. Downstream Port Configuration Values

As with Device 1, this downstream port's configuration space is programmed using CONFIG-1 packets, which use the bus number, device number, function number, and register number to access the register to be programmed.
The following task will program downstream port Device 2:

// Program Device 2 Downstream Port
// 1) Write Command register
// 2) Write Subordinate, Secondary and Primary Bus Numbers
// 3) Write Memory Limit and Memory Base

task pcie_rvm_env::Downstream_device_2_switch_config();
begin
  bit test;

  dw_vip_pcie_tlp_transaction cfg1_wr =
      new ( , dw_vip_pcie_tlp_transaction::CFG_WR_1);
  dw_vip_pcie_tlp_transaction cfg_tlp;

  // Write COMMAND REGISTER (register number 1 = byte offset 0x04)
  // Enable memory cycles
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h02;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h001;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00000004;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Subordinate Bus, Secondary Bus and Primary Bus Numbers
  // (register number 6 = byte offset 0x18)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h02;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h006;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00050503;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Memory Limit and Memory Base (register number 8 = byte offset 0x20)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h02;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h008;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00600040;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);
end
endtask
Summary

• The Switch is now ready to accept configuration and memory packets that are destined for the EP devices connected to its downstream ports.

• PCIe designs must go through the process of Switch enumeration to discover the available switches and then configure them.

• The examples use VMM-based PCIe Verification IP and SystemVerilog to perform the configuration tasks during Switch enumeration.

• The examples use the rich set of class, protocol, and packet capabilities of the VMM models to perform the configuration task.
MindShare Arbor: Debug/Validation/Analysis and Learning Software Tool
MindShare Arbor Feature List

➢ Description of all config registers included in the PCIe 3.0 spec
➢ Scan config space for all PCI-visible functions in the system, with a description of every one of these registers displayed in an easily readable format
➢ Directly access any memory or IO address
➢ Write to any config space location, memory address or IO address
➢ View standard and non-standard structures in a decoded format
○ Decode info included for standard PCI, PCI-X and PCI Express structures
○ Decode info included for some x86-based structures and device-specific registers
➢ Create your own XML-based decode files to drive Arbor's display
○ Create decode files for structures in config space, memory address space and IO space
➢ Save system scans for viewing later or on other systems
○ Saved system scans are XML-based and open-format
➢ New features that are either already in or coming soon:
○ Difference checking between scans
○ Post-processing scans for illegal or non-optimal settings
○ Scripting support for automation
○ Decode for x86 structures (MSRs, paging, segmentation, interrupt tables, etc.)
○ Decode for ACPI structures
○ Decode for USB structures
○ Decode for NVM Express structures
Reference
MindShare_PCIe30_eBook_v1.02
