PCIe - Express Pciee
Source: PCI-SIG
PCI-Compatible Configuration Register Space
➔ PCI-compatible configuration space was 256 bytes.
➔ Peer-to-peer Configuration Requests are not allowed.
… configuration requests in support of this process
➔ The Requests are routed based on the target device’s ID (BDF):
◆ Bus Number in the topology,
◆ Device number on that bus,
◆ Function number within that device.
➔ Configuration space can be accessed using either of two mechanisms:
◆ The legacy PCI configuration mechanism, using IO-indirect accesses.
◆ The enhanced configuration mechanism, using memory-mapped accesses.
➔ Bits [7:2] identify the target dword (also called the Register Number) in
the target Function's PCI-compatible configuration space.
◆ This mechanism is limited to the compatible configuration space (i.e.,
the first 64 doublewords of a Function’s configuration space).
➔ Bits [10:8] identify the target Function number (0 - 7) within the target
device.
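As a rough illustration of the legacy IO-indirect mechanism, the sketch below builds the 32-bit value that software writes to the Configuration Address Port (CF8h). Bits [7:2] and [10:8] are as described above; the Device field in bits [15:11], the Bus field in bits [23:16], and the Enable bit 31 follow the standard PCI layout and are not spelled out in these notes. The helper name `legacy_config_address` is hypothetical.

```python
def legacy_config_address(bus: int, device: int, function: int, register_offset: int) -> int:
    """Build the dword written to the Configuration Address Port (CF8h)."""
    if not (0 <= bus <= 255 and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("bus/device/function out of range")
    if not 0 <= register_offset <= 0xFF:
        raise ValueError("the legacy mechanism reaches only the first 256 bytes")
    return (
        (1 << 31)                    # bit 31: Enable
        | (bus << 16)                # bits [23:16]: Bus Number
        | (device << 11)             # bits [15:11]: Device Number
        | (function << 8)            # bits [10:8]: Function Number
        | (register_offset & 0xFC)   # bits [7:2]: Register Number (target dword)
    )


print(hex(legacy_config_address(bus=4, device=0, function=0, register_offset=0)))
```

Because only six bits select the dword, this path cannot reach beyond the first 64 dwords of a Function's configuration space.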
● or a Type 1 for another bus within the range, or not forwarded at all if the
target bus is outside of the range.
➢ Secondly, when PCI was developed there were few multi-processor systems.
➢ When there's only one CPU and it’s only running one thread, the fact that the old model takes two steps to generate one access is not a problem.
➢ Newer machines using multi-core, multi-threaded CPUs present a problem for the IO-indirect model, as multiple threads may try to access Configuration space at the same time.
Cons :
➢ The two-step model will no longer work without some locking semantics.
➢ One thread A writes a value into the Configuration Address Port (CF8h)
➢ There is nothing to prevent thread B from overwriting that value before thread A can perform its corresponding access to the Configuration Data
Port (CFCh).
Solution :
➢ Conserve address space, create a single-step, uninterruptible process by mapping all of configuration space into memory addresses
➢ It allows a single command sequence, since one memory request in the specified address range will generate one Configuration Request on the
bus.
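The race described above can be replayed deterministically with a toy model. This is only a sketch: `LegacyConfigPorts`, `ecam_read`, and the register contents are hypothetical, and the interleaving of threads A and B is written out by hand rather than produced by real threads.

```python
class LegacyConfigPorts:
    """Toy model of the CF8h/CFCh port pair shared by every thread."""

    def __init__(self, config_space: dict):
        self.config_space = config_space   # address value -> register contents
        self.address_port = 0              # models the Configuration Address Port (CF8h)

    def write_address(self, address: int) -> None:
        self.address_port = address

    def read_data(self) -> int:
        """Models a read of the Configuration Data Port (CFCh)."""
        return self.config_space[self.address_port]


def ecam_read(config_space: dict, address: int) -> int:
    """Single-step access: the address travels with the read itself."""
    return config_space[address]


# Hypothetical register contents for two different targets.
space = {0x80000000: 0x1111, 0x80010000: 0x2222}
ports = LegacyConfigPorts(space)

ports.write_address(0x80000000)       # thread A, step 1
ports.write_address(0x80010000)       # thread B, step 1 (overwrites A's address)
value_seen_by_a = ports.read_data()   # thread A, step 2

print(hex(value_seen_by_a))                  # thread A receives thread B's register
print(hex(ecam_read(space, 0x80000000)))     # the single-step read cannot be split
```

With the two-step model a lock around both port accesses would be needed; the memory-mapped model avoids this because there is no shared address register to overwrite.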
A[63:28]  Upper bits of the 256MB-aligned base address of the 256MB memory-mapped address range allocated for the Enhanced Configuration Mechanism. The manner in which the base address is allocated is implementation-specific. It is supplied to the OS by system firmware (typically through the ACPI tables).
A[11:2]   This range can address one of 1024 dwords, whereas the legacy method is limited to addressing one of 64 dwords.
A[1:0]    Defines the access size and the Byte Enable setting.
Rules
➢ A Root Complex is not required to support an access to enhanced configuration memory space if it crosses a dword address boundary (straddles two adjacent
memory dwords).
➢ Nor is it required to support the bus locking protocol that some processor types use for an atomic, or uninterrupted, series of commands.
➢ Software should avoid both of these situations when accessing configuration space unless it is known that the Root Complex does support them.
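One way software might guard against the first situation is to check, before issuing an access, that it stays inside a single dword. The helper below is a hypothetical sketch of that check for 1-, 2-, and 4-byte accesses.

```python
def stays_within_dword(offset: int, size: int) -> bool:
    """Return True if a `size`-byte access at `offset` stays inside one dword."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    # offset & 0x3 is the starting byte within the dword (address bits 1:0).
    return (offset & 0x3) + size <= 4


for offset, size in [(0, 2), (3, 2), (0, 4), (2, 4)]:
    print(offset, size, stays_within_dword(offset, size))
```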
Configuration Requests
● Address bits 27:20 select the target bus (in this case, 4).
● Address bits 19:15 select the target device (in this case, 0) on the bus.
● Address bits 14:12 select the target Function (in this case, 0) within the device.
● Address bits 11:2 select the target dword (in this case, 0) within the selected Function’s configuration space.
● Address bits 1:0 define the start byte location within the selected dword (in this case, 0).
● The processor initiates a 2-byte memory read starting from memory location E0400000h, and this is latched by the Host Bridge in the Root Complex.
● The Host Bridge recognizes that the address matches the area designated for Configuration and generates a Configuration read Request for the first two bytes of dword 0, Function 0, device 0, bus 4.
● The remainder of the operation is the same as that described in the previous section.
The Host Bridge must have been assigned a base address value.
This example assumes that the 256MB-aligned base address of the Enhanced Configuration memory-mapped range is E0000000h:
mov ax, [E0400000h] ;memory-mapped config read
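The address arithmetic in this example can be sketched as follows. The function names `ecam_address` and `ecam_decode` are hypothetical, and the base value is the one implied by the read from E0400000h targeting bus 4.

```python
ECAM_BASE = 0xE0000000  # assumed 256MB-aligned base, as in the example


def ecam_address(base: int, bus: int, device: int, function: int, offset: int) -> int:
    """Compose the memory address that maps to one configuration register."""
    if base & 0x0FFFFFFF:
        raise ValueError("base must be 256MB-aligned")
    if not (0 <= bus <= 255 and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset <= 0xFFF:
        raise ValueError("offset must lie within the 4KB configuration space")
    return base | (bus << 20) | (device << 15) | (function << 12) | offset


def ecam_decode(base: int, address: int) -> tuple:
    """Recover (bus, device, function, offset) from a memory-mapped address."""
    rel = address - base
    return (rel >> 20) & 0xFF, (rel >> 15) & 0x1F, (rel >> 12) & 0x7, rel & 0xFFF


print(hex(ecam_address(ECAM_BASE, bus=4, device=0, function=0, offset=0)))
print(ecam_decode(ECAM_BASE, 0xE0400000))
```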
Enumeration - Discovering the Topology
1. Software updates the Host/PCI bridge Secondary Bus Number to zero and the Subordinate Bus Number to 255.
2. Starting with Device 0 (bridge A), the enumeration software attempts to read the Vendor ID from Function 0 in each of the 32 possible devices on bus 0.
3. A valid Vendor ID is returned, indicating that Device 0, Function 0 exists on bus 0.
4. The Header Type field contains the value one (01h), indicating this is a PCI-to-PCI bridge.
5. Now that software has found a bridge, it performs a series of configuration writes to set the bridge’s bus number registers as follows:
1. Primary Bus Number Register = 0
2. Secondary Bus Number Register = 1
3. Subordinate Bus Number Register = 255
6. Enumeration software must perform a depth-first search. Before proceeding to discover additional Devices/Functions on bus 0, it must proceed to search bus 1.
7. Software reads the Vendor ID of Bus 1, Device 0, Function 0, which targets bridge C in our example. A valid Vendor ID is returned, indicating that Device 0, Function 0
exists on Bus 1.
8. The Header Type field in the Header register contains the value one (0000001b) indicating another PCI-to-PCI bridge. As before, bit 7 is a 0, indicating that bridge C is a
single-function device.
9. Software now performs a series of configuration writes to set bridge C’s bus number registers as follows:
1. Primary Bus Number Register = 1
2. Secondary Bus Number Register = 2
3. Subordinate Bus Number Register = 255
10. Continuing the depth-first search, a read is performed from bus 2, device 0, Function 0's Vendor ID register. The example assumes that bridge D is Device 0, Function 0 on Bus 2.
11. The Header Type field in the Header register contains the value one (0000001b), indicating that this is a PCI-to-PCI bridge, and bit 7 is a 0, indicating that bridge D is a single-function device.
12. Software now performs a series of configuration writes to set bridge D’s bus number registers as follows:
● Primary Bus Number Register = 2
● Secondary Bus Number Register = 3
● Subordinate Bus Number Register = 255
13. Continuing the depth-first search, a read is performed from bus 3, device 0, Function 0's Vendor ID register.
15. The Header Type field in the Header register contains the value zero (0000000b), indicating that this is an Endpoint function. Since this is an Endpoint and not a bridge, it has a Type 0 header and there are no PCI-compatible buses beneath it. This time, bit 7 is a 1, indicating that this is a multifunction device.
16. Enumeration software performs accesses to the Vendor ID of all 8 possible functions in bus 3, device 0 and determines that only Function 1 exists in addition to Function 0. Function 1 is also an Endpoint (Type 0 header), so there are no additional buses beneath this device.
17. Enumeration software continues scanning across bus 3 to look for valid functions on devices 1 - 31 but does not find any additional functions.
18. Having found every function there was to find downstream of bridge D, enumeration software updates bridge D with the real Subordinate Bus Number of 3. Then it backs up one level (to bus 2) and continues scanning across that bus looking for valid functions. The example assumes that bridge E is device 1, Function 0 on bus 2.
20. The Header Type field in bridge E’s Header register contains the value one (0000001b), indicating that this is a PCI-to-PCI bridge, and bit 7 is a 0, indicating a single-function device.
21. Software now performs a series of configuration writes to set bridge E’s bus number registers as follows:
● Primary Bus Number Register = 2
● Secondary Bus Number Register = 4
● Subordinate Bus Number Register = 255
22. Continuing the depth-first search, a read is performed from bus 4, device 0, Function 0's Vendor ID register.
24. The Header Type field in the Header register contains the value zero (0000000b), indicating that this is an Endpoint device, and bit 7 is a 0, indicating that this is a single-function device.
25. Enumeration software scans bus 4 to look for valid functions on devices 1-31 but does not find any additional functions.
26. Having reached the bottom of this tree branch, enumeration software updates the bridge above that bus, E in this case, with the real Subordinate Bus Number of 4. It
then backs up one level (to bus 2) and moves on to read the Vendor ID of the next device (device 2). The example assumes that
devices 2 - 31 are not implemented on bus 2, so no additional devices are discovered on bus 2.
27. Enumeration software updates the bridge above bus 2, C in this case, with the real Subordinate Bus Number of 4, backs up to the previous bus (bus 1), and attempts to read the Vendor ID of the next device (device 1). The example assumes that devices 1 - 31 are not implemented on bus 1, so no additional devices are discovered on bus 1.
28. Enumeration software updates the bridge above bus 1, A in this case, with the real Subordinate Bus Number of 4, backs up to the previous bus (bus 0), and moves on to read the Vendor ID of the next device (device 1). The example assumes that bridge B is device 1, function 0 on bus 0.
29. In the same manner as previously described, the enumeration software discovers bridge B and performs a series of configuration writes to set bridge B's bus number registers as follows:
● Primary Bus Number Register = 0
● Secondary Bus Number Register = 5
● Subordinate Bus Number Register = 255
30. Bridge F is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 5
● Secondary Bus Number Register = 6
● Subordinate Bus Number Register = 255
31. Bridge G is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 6
● Secondary Bus Number Register = 7
● Subordinate Bus Number Register = 255
32. A single-function Endpoint device is discovered at bus 7, device 0, function 0, so the Subordinate Bus Number of bridge G is updated to 7.
33. Bridge H is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 6
● Secondary Bus Number Register = 8
● Subordinate Bus Number Register = 255
34. Bridge J is discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 8
● Secondary Bus Number Register = 9
● Subordinate Bus Number Register = 255
35. All devices and their respective Functions on bus 9 are discovered and none of them are bridges, so the Subordinate Bus Numbers of bridges H and J are updated to 9.
36. Bridge I is then discovered and a series of configuration writes are performed to set its bus number registers as follows:
● Primary Bus Number Register = 6
● Secondary Bus Number Register = 10
● Subordinate Bus Number Register = 255
38. Since software has reached the bottom of this branch of the tree structure required for PCIe topologies, the Subordinate Bus Number registers for bridges B, F, and I are updated to 10, and so is the Host/PCI bridge's Subordinate Bus Number register.
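The walkthrough above can be condensed into a small recursive sketch. It is a model rather than real enumeration software: the `Bridge` and `Endpoint` classes, the `enumerate_bus` function, and the endpoint names are hypothetical, the topology is passed in as a tree instead of being discovered through Vendor ID and Header Type reads, and the device beneath bridge I is assumed to be a single endpoint.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    name: str


@dataclass
class Bridge:
    name: str
    children: list = field(default_factory=list)  # devices on the secondary bus
    primary: int = 0
    secondary: int = 0
    subordinate: int = 0


def enumerate_bus(devices: list, bus: int) -> int:
    """Scan `devices` on `bus` depth-first; return the highest bus number used."""
    highest = bus
    for dev in devices:
        if isinstance(dev, Bridge):
            dev.primary = bus
            dev.secondary = highest + 1
            dev.subordinate = 255          # temporary value while searching below
            highest = enumerate_bus(dev.children, dev.secondary)
            dev.subordinate = highest      # the "real" Subordinate Bus Number
    return highest


# Topology from the walkthrough (endpoint names are placeholders).
D = Bridge("D", [Endpoint("bus3-dev0")])
E = Bridge("E", [Endpoint("bus4-dev0")])
C = Bridge("C", [D, E])
A = Bridge("A", [C])
G = Bridge("G", [Endpoint("bus7-dev0")])
J = Bridge("J", [Endpoint("bus9-dev0")])
H = Bridge("H", [J])
I = Bridge("I", [Endpoint("bus10-dev0")])
F = Bridge("F", [G, H, I])
B = Bridge("B", [F])

host_subordinate = enumerate_bus([A, B], bus=0)
for br in (A, B, C, D, E, F, G, H, I, J):
    print(br.name, br.primary, br.secondary, br.subordinate)
```

The temporary value of 255 matters in real hardware because it lets configuration requests for not-yet-numbered buses pass through the bridge during the search; in this model it is simply overwritten when the recursion returns.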
Final Values encoded into each bridge’s Primary, Secondary, Subordinate Bus Number
1. A series of configuration writes are performed to set its bus number registers
as follows:
a) Primary Bus Number Register = 64
b) Secondary Bus Number Register = 65
c) Subordinate Bus Number Register = 255
Configuration Space: It allows software to control and check the status of devices.
Memory and IO Address Spaces
➔ In the early days of PCs, the internal registers/storage in IO devices were accessed via IO address space.
➔ There are several limitations and undesirable effects related to IO address space.
➔ Newer devices that do not rely on legacy software or have heavy compatibility issues typically just map internal registers/storage through memory address space (MMIO), with no IO address space being requested.
➔ The size of the memory map is a function of the range of addresses that the system can use.
➔ Switches and Root Complexes also have device-specific registers accessed via MMIO and IO addresses.
➔ The device designer knows the collective size of the internal registers/storage that should be accessible via IO or MMIO.
➔ The device designer also knows how the device will behave when those registers are accessed (i.e., whether or not reads have side effects).
➔ Knowing this information, the device designer hard-codes the lower bits of the BARs to certain values indicating the type and size of the address space being requested.
➔ To determine the size and type of address space requested, system software checks the lower bits of the BARs.
➔ It then writes the base address of the address range being allocated to this device into the upper bits of the BAR.
➔ Not all BARs have to be implemented. If a device does not need all the BARs to map its internal registers, the extra BARs are hard-coded with all 0s, notifying software that these BARs are not implemented.
Fig: PCI Express Devices And Type 0 And Type 1 Header Use
BAR Example1: 32-bit Memory Address Space
➔ Requesting a 4KB block of non-prefetchable memory (NP-MMIO).
➔ The device will accept any memory request that falls within the range F900_0000h – F900_0FFFh (4KB in size).
2:1    Read as 00b, indicating the target only supports decoding a 32-bit address.
31:12  Read as all 1s because software has not yet programmed the upper bits with a start address for the block. Since the least significant writable bit is bit 12, the memory size requested is 2^12 = 4KB.
Fig: 32-Bit Non-Prefetchable Memory BAR Set Up
Table: Reading the BAR after Writing All 1s
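A minimal sketch of how software might interpret the value read back from a 32-bit memory BAR after writing all 1s. The function name `decode_mem32_bar` is hypothetical, and the read-back value FFFF_F000h is what Example 1 would return under the standard BAR layout (bit 0 = 0 for memory, bit 3 = prefetchable flag), which is assumed here rather than taken from the table.

```python
def decode_mem32_bar(readback: int) -> dict:
    """Interpret a 32-bit memory BAR read back after all 1s were written."""
    if readback & 0x1:
        raise ValueError("bit 0 set: this is an IO BAR")
    address_bits = readback & 0xFFFFFFF0          # drop the 4 attribute bits
    # An all-zero BAR is not implemented and requests nothing.
    size = ((~address_bits & 0xFFFFFFFF) + 1) if address_bits else 0
    return {
        "is_64bit": ((readback >> 1) & 0x3) == 0b10,   # bits 2:1
        "prefetchable": bool(readback & 0x8),          # bit 3
        "size": size,
    }


print(decode_mem32_bar(0xFFFFF000))
```

Inverting the address bits and adding one is equivalent to finding the least significant writable bit: the lowest bit that reads back as 1 gives the size as a power of two.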
BAR Example2: 64-bit Memory Address Space
➔ BAR1 and BAR2 are being used to request a 64MB block of prefetchable memory address space.
➔ Two sequential BARs are being used because the device supports a 64-bit address for this request.
➔ Software can allocate the requested address space above the 4GB address boundary if it wants.
➔ The device will accept any memory request that falls within the range 2_4000_0000h – 2_43FF_FFFFh (64MB in size).
Lower 2:1    Read as 10b, indicating the target supports a 64-bit address decoder, and that the next sequential BAR contains the upper 32 bits of the address information.
Lower 3      Read as 1b, indicating the request is for prefetchable memory (reads do not have side effects); P-MMIO.
Lower 25:4   Read as all 0s, indicating the size of the request (these bits are hard-coded to 0).
Lower 31:26  Read as all 1s because software has not yet programmed the upper bits with a start address for the block. Since the least significant writable bit is bit 26, the memory address space request size is 2^26 = 64MB.
Upper 31:0   Read as all 1s. These bits will be used as the upper 32 bits of the 64-bit start address programmed by system software.
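The 64-bit case can be sketched in the same spirit: the two BARs are combined before the size is computed, and the allocated base is split back into a lower and an upper write. The function names are hypothetical, and the read-back values FC00_000Ch and FFFF_FFFFh are derived from the bit descriptions above, assuming bit 0 reads as 0 for a memory BAR.

```python
def size_mem64_bar(lower: int, upper: int) -> int:
    """Size in bytes requested by a 64-bit memory BAR pair read back after all 1s."""
    if lower & 0x1 or ((lower >> 1) & 0x3) != 0b10:
        raise ValueError("not a 64-bit memory BAR")
    address_bits = (upper << 32) | (lower & 0xFFFFFFF0)   # drop the 4 attribute bits
    return (~address_bits & 0xFFFFFFFFFFFFFFFF) + 1


def program_mem64_bar(base: int, size: int) -> tuple:
    """Split a 64-bit base into the values written to the lower and upper BAR."""
    if base % size:
        raise ValueError("base must be naturally aligned to the requested size")
    return base & 0xFFFFFFFF, base >> 32


size = size_mem64_bar(lower=0xFC00000C, upper=0xFFFFFFFF)
lower_base, upper_base = program_mem64_bar(0x2_4000_0000, size)
print(size // (1024 * 1024), hex(lower_base), hex(upper_base))
```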
BAR Example3: IO Address Space
➔ The device will accept and respond to IO transactions within the range 4000h – 40FFh (256 bytes in size).
1      Reserved. Hard-coded to 0b.
7:2    Read as 0s, indicating the size of the request (these bits are hard-coded to 0).
31:8   Read as 1s because software has not yet programmed the upper bits with a start address for the block. Since the least significant writable bit is bit 8, the IO address space request size is 2^8 = 256 bytes.
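For the IO case, a short sketch of the size calculation and of the range check the device performs once a base has been programmed. The names `size_io_bar` and `io_bar_claims` are hypothetical, and the read-back value FFFF_FF01h assumes bit 0 reads as 1 to mark an IO BAR, which is the standard encoding but is not listed in the rows above.

```python
def size_io_bar(readback: int) -> int:
    """Size in bytes requested by an IO BAR read back after all 1s were written."""
    if not readback & 0x1:
        raise ValueError("bit 0 clear: this is a memory BAR")
    address_bits = readback & 0xFFFFFFFC      # bits 1:0 are not address bits
    return (~address_bits & 0xFFFFFFFF) + 1


def io_bar_claims(base: int, size: int, address: int) -> bool:
    """True if an IO transaction at `address` falls inside the programmed range."""
    return base <= address < base + size


size = size_io_bar(0xFFFFFF01)
print(size, io_bar_claims(0x4000, size, 0x40FF), io_bar_claims(0x4000, size, 0x4100))
```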
➢ The Base and Limit registers in the Type 1 headers are programmed with the range of addresses that live beneath this bridge.
➢ There are three sets of Base and Limit registers in each Type 1 header.
1. For a TLP received on the primary interface: if the target address in the TLP falls in the range of one of the bridge's Base/Limit register sets, the packet is forwarded to the secondary interface (downstream).
2. For a TLP received on the secondary interface: if the target address in the TLP falls in the range of one of the bridge's Base/Limit register sets, the TLP is handled as an Unsupported Request on the secondary interface.
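The two rules can be sketched as a small routing function, reading the first rule as applying to TLPs that arrive on the primary interface and the second to TLPs that arrive on the secondary interface. The function `route_tlp`, its return strings, and the example window are hypothetical; the out-of-range outcomes (forwarding upstream from the secondary side, not forwarding from the primary side) reflect general PCIe bridge behavior and are assumptions beyond what the bullets state.

```python
def route_tlp(address: int, windows: list, arrived_on: str) -> str:
    """Decide what a bridge does with an address-routed TLP.

    `windows` holds (base, limit) pairs, one per programmed Base/Limit set.
    """
    in_range = any(base <= address <= limit for base, limit in windows)
    if arrived_on == "primary":
        return "forward downstream" if in_range else "not forwarded"
    if arrived_on == "secondary":
        # An in-range address belongs below the bridge, so it should never
        # arrive from below.
        return "unsupported request" if in_range else "forward upstream"
    raise ValueError("arrived_on must be 'primary' or 'secondary'")


windows = [(0x00200000, 0x004FFFFF)]   # example memory window, chosen for illustration
print(route_tlp(0x00300000, windows, "primary"))
print(route_tlp(0x00300000, windows, "secondary"))
print(route_tlp(0x80000000, windows, "secondary"))
```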
Step1: The enumeration process selects the bus numbers for each device
● Upstream port of the switch will be assigned bus #2.
● In the switch device, the internal bus between the upstream port and the two
downstream ports will be assigned bus #3.
● The PCIe End Point (EP) devices that are downstream from the switch are on Bus #4 and Bus #5.
● The device numbers for both EPs will be 0.
Now the Switch can be configured. Each port of the Switch has its own configuration space, and the format of the configuration space is type CONFIG-1.
Each of these CONFIG-1 spaces must be programmed. The upstream port needs to be programmed first.
Fig: Example Testbench With Bus Numbers Assigned
Offset 0x18: Secondary Latency Timer [31:24] | Subordinate Bus Number [23:16] | Secondary Bus Number [15:8] | Primary Bus Number [7:0]
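As a small illustration of this register layout, the sketch below packs the three bus numbers into the dword written at offset 0x18. The helper name `pack_bus_numbers` is hypothetical; the two results correspond to the payload values 32'h00040403 and 32'h00050503 used by the configuration tasks in this section.

```python
def pack_bus_numbers(primary: int, secondary: int, subordinate: int,
                     secondary_latency_timer: int = 0) -> int:
    """Pack the Type 1 header dword at offset 0x18."""
    for value in (primary, secondary, subordinate, secondary_latency_timer):
        if not 0 <= value <= 0xFF:
            raise ValueError("each field is one byte wide")
    return (secondary_latency_timer << 24) | (subordinate << 16) | (secondary << 8) | primary


print(hex(pack_bus_numbers(primary=3, secondary=4, subordinate=4)))
print(hex(pack_bus_numbers(primary=3, secondary=5, subordinate=5)))
```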
The following task will program downstream device 1 of the downstream port:
// Program Device 1 Downstream Port
// 1) Write Command register
// 2) Write Subordinate, Secondary and Primary Bus Numbers
// 3) Write Memory Limit and Memory Base

task pcie_rvm_env::Downstream_device_1_switch_config();
begin
  bit test;

  dw_vip_pcie_tlp_transaction cfg1_wr =
    new ( , dw_vip_pcie_tlp_transaction::CFG_WR_1);
  dw_vip_pcie_tlp_transaction cfg_tlp;

  // Write COMMAND REGISTER (RegNum 10'h001 = dword 1 = offset 0x04)
  // Enable memory cycles
  // Payload 32'h00000004 sets Command register bit 2 (Bus Master Enable)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h01;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h001;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00000004;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Subordinate Bus, Secondary Bus and Primary Bus Numbers:
  // RegNum 10'h006 = dword 6 = offset 0x18
  // Payload = Subordinate 04, Secondary 04, Primary 03
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h01;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h006;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00040403;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Memory Limit and Memory Base (RegNum 10'h008 = dword 8 = offset 0x20)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h01;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h008;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00400020;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);
end
endtask
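To make the Memory Limit and Memory Base payload easier to read, the sketch below decodes such a dword into the address window the bridge will forward. It assumes the standard Type 1 header encoding for offset 0x20, where the lower 16 bits hold the Memory Base, the upper 16 bits hold the Memory Limit, bits 15:4 of each register carry address bits 31:20, and the low 20 bits of the limit are implied to be all 1s. The helper name `decode_memory_window` is hypothetical.

```python
def decode_memory_window(payload: int) -> tuple:
    """Return (first_address, last_address) encoded in the dword at offset 0x20."""
    base_reg = payload & 0xFFFF             # Memory Base register
    limit_reg = (payload >> 16) & 0xFFFF    # Memory Limit register
    base = (base_reg & 0xFFF0) << 16                  # address bits 31:20, 1MB-aligned
    limit = ((limit_reg & 0xFFF0) << 16) | 0xFFFFF    # top of the last 1MB block
    return base, limit


base, limit = decode_memory_window(0x00400020)
print(hex(base), hex(limit))
```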
The following task will program downstream device 2 of the downstream port:
// Program Device 2 Downstream Port
// 1) Write Command register
// 2) Write Subordinate, Secondary and Primary Bus Numbers
// 3) Write Memory Limit and Memory Base

task pcie_rvm_env::Downstream_device_2_switch_config();
begin
  bit test;

  dw_vip_pcie_tlp_transaction cfg1_wr =
    new ( , dw_vip_pcie_tlp_transaction::CFG_WR_1);
  dw_vip_pcie_tlp_transaction cfg_tlp;

  // Write COMMAND REGISTER (RegNum 10'h001 = dword 1 = offset 0x04)
  // Enable memory cycles
  // Payload 32'h00000004 sets Command register bit 2 (Bus Master Enable)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h02;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h001;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00000004;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Subordinate Bus, Secondary Bus and Primary Bus Numbers:
  // RegNum 10'h006 = dword 6 = offset 0x18
  // Payload = Subordinate 05, Secondary 05, Primary 03
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h02;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h006;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00050503;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);

  // Write Memory Limit and Memory Base (RegNum 10'h008 = dword 8 = offset 0x20)
  test = cfg1_wr.randomize() with {
    m_bvRequesterId == `TBD_REQ_ID;
    m_bvBusNum == 8'h03; m_bvDevNum == 5'h02;
    m_bvFuncNum == 3'h0; m_bvRegNum == 10'h008;
    m_bvFirstDWBE == 4'hF; m_bTD == 1'b0;
    m_bvvPayload[0] == 32'h00600040;
  };
  if (!test) begin
    `vmm_error(log, "TBD Configuration request packet failed to randomize.");
  end

  $cast(cfg_tlp, cfg1_wr.copy());
  tbd_gasket.m_oTlpTxInputChan.put(cfg_tlp);
  cfg_tlp.notify.wait_for(vmm_data::ENDED);
end
endtask
Summary
• The Switch is now ready to accept configuration and memory packets that are for the EP devices connected to the downstream ports of the
switch.
• PCIe designs must go through the process of Switch enumeration to discover available switches, and then to configure them.
• VMM-based PCIe Verification IP and SystemVerilog can be used to perform the task of configuration during the process of Switch enumeration.
• The examples use the rich set of class, protocol, and packet capabilities of the VMM models to perform the configuration task.