Dell EMC PowerEdge Concepts and Features - Downloadable Content
Contents

Processors
  Intel Xeon Scalable Processor Family
  AMD Processors
  PowerEdge 15G Processors
  Processor Settings
  Disassembly and Assembly of Processors
Memory
  Server Memory Module
Power
  Power Supply Unit
  PSU Configuration Modes
  PSU Redundancy Types
  Power Capping
  Mixing Power Supplies
  PSU LED Indicator Behavior
  PSU Firmware Updates
  PSU Blanks
  Removal and Installation of PSUs
Cooling
  Fans and Types of Fans
  PowerEdge 15G Fans
Storage
  Introduction to Server Storage
  NVMe
  Paddle Card (15G Only)
  Removal and Installation of the Paddle Card
  Internal Dual SD Module
  Removal and Installation of IDSDM
  Boot Optimized Storage Solution (BOSS)
  RAID
  RAID Level Comparison
  Hot Spare
  Possible RAID Level Migrations
  PERC Overview
  NVMe Support with PERC 11
  PERC Configuration Modes
  PERC Card Matrix
  Removal and Installation of PERC
Portfolio Overview
Movie:
Click the link below to watch an introduction to the Dell server portfolio.
https://edutube.emc.com/Player.aspx?autoplay=true&vno=ZICcxeiVyyvuY3T8JG72ng
Dell EMC PowerEdge servers with common design components are identified by
the server model name.
The server naming convention provides insight into the form factor, class of
system, generation, and the CPU socket count.
Important:
• The PowerEdge XE family of servers is purpose-built for complex, emerging workloads that require high performance and large storage. For example, the PowerEdge XE8545.
• The PowerEdge XR family comprises ruggedized, industrial-grade servers intended for extreme environments. For example, the PowerEdge XR11/XR12.
The Dell EMC service tag is a seven-character identifier that is unique to the
product.
• The service tag of a PowerEdge server is a pullout tab also known as an
Enterprise Service Tag (EST). ESTs are typically located on the front or rear of
the chassis.
• Information about the service tag can also be found on a sticker typically on the
side of the chassis, and in the server BIOS.
All Dell EMC PowerEdge servers have a Service Tag and can have an Asset Tag
added.
The Asset Tag is an empty field within BIOS where you can input your own
identifying information such as the system’s security number or location ID.
Movie:
Click the link below to watch a demo video on locating the service tag in a
PowerEdge server.
https://edutube.emc.com/Player.aspx?autoplay=true&vno=DnD1V|@$@|UUKQ2lRtWtNqDYMw
PowerEdge R630
1 A tiered storage solution enables organizations to move older, less important, and
infrequently accessed data to less expensive storage solutions, while maintaining
active and important data on high-performing storage media.
PowerEdge R730xd
PowerEdge R930
Embedded NIC: 1 GbE x4 (this port is for management only). The NDC is available in different versions: 4x 1 GbE, 2x 1 GbE + 2x 10 GbE, or 4x 10 GbE.
Storage Controllers: Listed below are the PERC9 cards that are supported in the PowerEdge 13G systems:
• Internal controllers: PERC S130 (software RAID), PERC H330, PERC H730, PERC H730P, HBA330 (non-RAID internal HBA)
• External HBAs (RAID): PERC H830
• External HBAs (non-RAID): 12 Gbps SAS HBA
• 8 GB vFlash media (optional): all systems ship with a vFlash card, but only systems with an Enterprise license can use it.
• 16 GB vFlash media (optional): systems ship with the 8 GB card by default, but customers can request the 16 GB vFlash card as an upgrade.
Expansion and Network: Up to 7 PCIe 3.0 slots; supports a dedicated RAID card slot and a dedicated NDC slot.
Embedded NIC: 1 GbE x4 (this is the dedicated port for the iDRAC). The NDC is available in different versions: 4x 1 GbE, 2x 1 GbE + 2x 10 GbE, or 4x 10 GbE.
The table below describes components on the front panel of the PowerEdge server.
1. Power Button
2. NMI Button
5. LCD Panel
6. VGA Connector
8. Service Tag
10. USB Connector
PowerEdge R640
PowerEdge R740xd
PowerEdge R940
All systems ship with a bezel, but the customer can choose to purchase the bezel with or without an LCD.
1: iDRAC9 supports the Group Manager feature, which enables users to have a multiple-console experience and offers simplified basic iDRAC management.
2: Intel C620 chipset: PowerEdge 14G systems include the Intel Lewisburg as the Platform Controller Hub (PCH) chip. The integrated Intel Ethernet with scalable iWARP RDMA in the Intel C620 series chipset provides up to four 10 Gbps high-speed Ethernet ports for high data throughput and low latency, making it ideal for storage, data-intensive, and connected IoT solutions.
3: Processor: The Intel Xeon Scalable processor family supports 2933 MT/s memory. As an example, the PowerEdge R740 and R740xd support two DIMMs per channel at 2933 MT/s with these processors.
The image below highlights the control panels that are located at the front of a
PowerEdge 14G system.
Control panel indicators (x2): status LEDs, and system health and system ID indicators.
PowerEdge R750
2 Root of Trust is a concept that starts a chain of trust, ensuring that systems boot with legitimate code at every step of the boot process. RoT is controlled by the iDRAC9.
PowerEdge XE8545
PowerEdge XR11
PowerEdge R6515
1: Direct Liquid Cooling (DLC): DLC is introduced in the PowerEdge 15G systems and features leak-sensing technology to identify and resolve issues faster. The DLC technology is supported only in the PowerEdge R650, PowerEdge R750, PowerEdge R750xa, and PowerEdge C6520 servers.
2: PowerEdge RAID Controllers (PERC): Support for PERC 10 and PERC 11 cards for enhanced RAID performance. To view the list of PERC types for Dell EMC systems, see the "List of PowerEdge RAID Controller (PERC) types for Dell EMC systems" document at https://www.dell.com/support/kbdoc/en-in/000131648/list-of-poweredge-raid-controller-perc-types-for-dell-emc-systems
4: Memory: The 15G PowerEdge servers support Intel Optane persistent memory and up to 16 DIMMs per CPU. Intel Optane persistent memory is also known as Barlow Pass. Click here for more information on Barlow Pass and the different configurations.
Server Components
Introduction
Hot-swap components enable zero system downtime for failures and serviceability. Examples of hot-swap components are fans, disks, and Power Supply Units (PSUs).
A field-replaceable unit (FRU) is a server component or assembly that requires the entire chassis to be powered off to service. FRUs are replaced by a user or technician without having to send the entire product or system to a repair facility.
FRUs are marked with the color blue. Blue indicates that the system must be shut down to replace the component.
Some component parts are designed for easy customer removal and replacement; such parts are designated as Customer Self-Replaceable (CSR) or Customer Replaceable Unit (CRU) parts.
When, during troubleshooting, a Dell technician determines that the repair can be accomplished with a CSR/CRU-designated part, Dell ships the designated part directly to the customer, allowing customers to replace parts at their own convenience.
Processors
The Intel processor uses a metal naming convention to designate the different
levels of available features. The levels are Platinum, Gold, Silver, and Bronze.
Click each image to learn more about the Intel Xeon scalable processor family.
1: Intel® Xeon® Platinum processors offer industry-best performance for mission-critical and hybrid cloud workloads, real-time analytics, machine learning, and artificial intelligence. The Platinum processors offer monumental leaps in I/O, memory, storage, and network technologies.
2: Intel® Xeon® Gold processors offer high performance, advanced reliability, and hardware-enhanced security. The Gold processors are optimized for demanding data center, hybrid-cloud compute, network, and storage workloads.
AMD Processors
The Dell EMC PowerEdge server portfolio is powered by the third-generation AMD EPYC™ processors. The AMD processor's system on a chip (SoC) is the next-generation data center processor, maintaining compatibility with the existing socket infrastructure. The AMD Milan processor is based on the new, enhanced Zen 3 CPU core with integrated I/O controllers.
The AMD Milan processor:
• Offers a significant performance improvement over the previous generation.
• Has 128 PCIe lanes, eight-channel memory, and dual-socket configurations.
• Lowers cost through an optimal balance of compute, memory, I/O, and security.
• Offers one I/O memory die, which removes an internal bottleneck for lower latency.
• Has up to 64 CPU cores per processor.
• Has an inter-chip global memory interconnect (xGMI2) with up to 64 lanes.
• Has Secure Encrypted Virtualization (SEV), which provides 509 unique hypervisor keys.
• Has two restrictions:
− The RTC/CMOS is built into the CPU, similar to previous PowerEdge AMD servers. RTC/CMOS settings are lost when CPU1 is removed or reinstalled.
− AMD does not support early boot. No error message is displayed when no memory is populated in the system.
The PowerEdge 15G servers also support third-generation Intel Xeon scalable
processors.
The Intel® Xeon® Processor has increased performance and incremental memory
options.
The Xeon scalable processor supports usages from entry designs based on Intel
Xeon Silver processors to advanced capabilities offered in the new Intel Xeon
Platinum processor.
The third-generation Intel Xeon Scalable processor supports:
• Up to 40 CPU cores.
• Increased memory capacity with up to 8 channels and up to 256 GB DDR4 DIMM support.
• Enhanced memory performance with support for up to 3200 MT/s DIMMs (2 DPC).
• Intel Optane Persistent Memory 200 Series modules (up to 512 GB per module), for up to 6 TB of total system memory per socket (DDR + PMem).
• Faster I/O with PCI Express 4.0 and up to 64 lanes (per socket) at 16 GT/s.
• Faster Intel Ultra Path Interconnect (UPI), with 3 Intel UPI links at 11.2 GT/s (supported on Gold and Platinum options).
The 15G server with Intel processors and heatsinks has an additional anti-tilt
feature to prevent tilting of the heatsink assembly. The plastic nuts secure the
heatsinks to the system board.
Image: the anti-tilt feature shown in the unlocked and locked positions.
Processor Settings
The Processor Settings option is used to view and configure various processor settings. Processor Settings can be accessed through the System Setup utility:
Go to System Setup Main Menu > System BIOS > Processor Settings.
Click the play button to view the process of removing and installing the processor
on the Dell EMC PowerEdge server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?autoplay=true&vno=GsFeXmu9Q3gwvx0r3gj+gQ
Memory
Dell EMC PowerEdge servers run on Error-Correcting Code (ECC) memory. The
ECC memory can test and correct any memory errors without the processor or the
user being aware of these operations. ECC corrects the errors without interrupting
other operations on the server.
Error-Correcting Code (ECC) memory flow:
a) Both the data (M bits) and the code (K bits) generated from the Data In traffic are stored.
b) During a fetch, new K code bits are generated from the M data bits and compared with the fetched K code bits.
c) If no errors are detected in the compare, the path to Data Out is taken; otherwise, an error signal is raised.
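To make this compare-and-correct flow concrete, here is a minimal sketch in Python (an illustration, not server firmware) using the small Hamming(7,4) code: four data bits (M) are stored with three generated code bits (K), and on fetch a nonzero "syndrome" locates and corrects a single flipped bit. Server ECC uses wider SECDED codes, but the principle is the same.

```python
# Minimal ECC sketch with Hamming(7,4): store data plus generated code bits,
# regenerate and compare on fetch, and transparently correct one flipped bit.
def hamming74_encode(d):               # d: four data bits
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # bit positions 1..7

def hamming74_fetch(w):                # w: stored 7-bit word, possibly corrupted
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]     # regenerate code bits and compare:
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]     # the syndrome is the 1-based position
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]     # of the bad bit (0 means no error)
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        w[syndrome - 1] ^= 1           # correct the error without interruption
    return [w[2], w[4], w[5], w[6]]    # recovered data bits

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                           # simulate a single-bit memory error
assert hamming74_fetch(word) == [1, 0, 1, 1]
```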
Memory Comparison
The table below highlights the differences in memory features across the three
generations of Dell EMC PowerEdge servers.
Feature            13G                                14G                                15G
RAM size           1 x 4 GB                           1 x 8 GB                           1 x 8 GB
Memory channels5   4 channels x 3 slots per channel   6 channels x 2 slots per channel   8 channels x 2 slots per channel
                   = 12 DIMMs per CPU                 = 12 DIMMs per CPU                 = 16 DIMMs per CPU

5 Memory channels are the physical layer on which the data travels between the CPU and the memory modules.
Memory Layout
Click each tab to view the memory layout supported for each generation of
servers.
8 If socket A1 is populated for processor 1, then populate socket B1 for processor 2 with an identical DIMM.
15G R750 (Intel) PowerEdge server motherboard; 15G R7525 (AMD) PowerEdge server motherboard.
Memory Settings
The Dell EMC server memory settings can be accessed through the Lifecycle
Controller (LCC) System Setup option.
10 If this field is set to Enabled, memory interleaving is supported if a symmetric memory configuration is installed. If the field is set to Disabled, the system supports NUMA (asymmetric) memory configurations. This option is set to Disabled by default.
Memory Modes
The Dell EMC PowerEdge chipset allows different operating modes for the memory to be set in the BIOS. The memory modes available on Dell EMC PowerEdge servers include:
• Optimizer Mode: The memory controllers run independently of each other.
• Mirror Mode: The system supports memory mirroring if identical memory modules are installed in two channels.
• Advanced ECC Mode: The two memory channels closest to the processor (channels 0 and 1) are combined to form a single 128-bit channel.
• Spare Mode: One or more ranks per channel are reserved as spares.
• Dell Fault Resilient Mode: An area of memory is established that is fault resilient.
Optimizer Mode:
In Optimizer mode, all three channels are populated with memory modules. This
mode permits a larger total memory capacity but does not support SDDC with x8-
based memory modules.
It is recommended to populate all three channels with identical memory, but each channel can have a different DIMM size. The larger DIMM must be installed in the first slot, and the configuration must be the same across all three channels. In a dual-processor configuration, the memory configuration for each processor must be identical. Optimizer mode is the only mode that supports mixed memory sizes. Any configuration that does not follow the above rules may generate error messages or not POST at all. For more detail, read the initial release notes: Installing and configuring DDR3 Memory.

11 A memory rank is a block or area of data that is created using some, or all, of the memory chips on a module.

It is recommended to populate all three DIMM slots on servers with three DIMM slots per channel to take advantage of memory interleaving for maximum performance. While a single UDIMM per channel gives slightly better performance than an RDIMM, RDIMMs give better performance when multiple DIMMs per channel are installed.
Optimizer is used if just one DIMM per processor is configured. A minimal single-channel configuration of 1 GB memory modules per processor is also supported in this mode. The minimum configuration to POST is one DIMM in the first slot with just CPU 1 installed.
Advanced ECC (Lockstep) Mode:
In Advanced ECC (Lockstep) mode, the two channels closest to the processor (channels 0 and 1) are combined to form one 128-bit channel. This mode supports Single Device Data Correction (SDDC) for both x4- and x8-based memory modules. Memory modules must be identical in size, speed, and technology in the slots on channels 0 and 1. Channel 2 must be empty, or the option will not be available in the System Setup program.
With the Intel 5500 and 5520 chipsets and Intel 55xx and 56xx processors, channels 0 and 1 are combined, which enables 8-bit error correction instead of the 4-bit correction in normal Advanced ECC (non-lockstep) mode. SDDC gives the ability to recover from more types of single-bit and multi-bit memory errors. The third channel and its corresponding memory slots cannot be used, but the full amount of installed physical memory is accessible to the operating system.
Note: 14G and 15G servers do not support advanced ECC mode.
Dell Fault Resilient Mode (FRM) is a memory operating mode available in the BIOS settings of high-end yx2x Dell PowerEdge servers and later. This mode establishes an area of memory that is fault resilient, protects the hypervisor against uncorrectable memory errors, and safeguards the system from becoming unresponsive. Systems with ESXi that support the FRM feature can load the operating system kernel into this area to maximize system availability and protect critical applications or services.
• Single-rank sparing mode: Allocates one rank per channel as a spare. This mode requires a population of two or more ranks per channel.
• Multi-rank sparing mode: Allocates two ranks per channel as a spare. This mode requires a population of three or more ranks per channel.
When single-rank memory sparing is enabled, the system memory available to the operating system is reduced by one rank per channel. For example, in a dual-processor configuration with twenty-four 16 GB dual-rank memory modules, the available system memory is reduced accordingly. This calculation changes based on single-rank or multi-rank sparing; in multi-rank sparing, the multiplier changes to 1/2 (ranks/channel).
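As an illustration of the sparing arithmetic above, the sketch below assumes two dual-rank DIMMs per channel (four ranks per channel); the exact multiplier depends on the actual rank population of a given system.

```python
# Sketch of the sparing arithmetic, assuming 2 dual-rank DIMMs per channel
# (4 ranks per channel). Adjust ranks_per_channel for other layouts.
dimms, dimm_gb, ranks_per_channel = 24, 16, 4

single_rank_multiplier = (ranks_per_channel - 1) / ranks_per_channel  # 3/4
multi_rank_multiplier = (ranks_per_channel - 2) / ranks_per_channel   # 1/2

print(dimms * dimm_gb * single_rank_multiplier)  # 288.0 GB available to the OS
print(dimms * dimm_gb * multi_rank_multiplier)   # 192.0 GB available to the OS
```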
The Intel Optane Persistent Memory (Barlow Pass) solution retains data during a
power loss, system shutdown, or system errors. Barlow Pass (BPS) uses persistent
memory as storage, rather than traditional memory.
Diagram: the CPU connects over the memory bus to both DRAM (volatile variables) and the NVDIMM (BPS).
• Creates a unique new memory tier to reduce latencies and optimize workloads.
• Provides disruptive storage-class memory cell technology (3D XPoint) that resides on the DDR memory interface.
• Provides large memory footprints of 128 GB, 256 GB, and 512 GB.
• Enables in-memory data to survive a soft reset or a hard reboot (power loss).
• Provides minimal latency and faster storage for large amounts of memory.
Diagram: physical layout of an NVDIMM (133.35 mm x 31.25 mm), showing the media controller, data buffers, persistent memory media, DRAM (AIT), data clock oscillator, SPI flash, SPD, bulk capacitors, and power management IC (PMIC) on the primary and secondary sides.
Diagram: the storage tier hierarchy, from NVDIMM and Intel Optane PCIe SSD at the top, through PCIe SSD, SATA SSD, and HDD, down to tape.
The Barlow Pass architecture consists of a two-tier memory and storage hierarchy to address the data performance and storage challenges. The advantages of the hierarchical approach are:
• Provides a unique combination of affordable large capacity and support for data
persistence.
• Optimizes the resources for efficient data access and storage.
• Provides higher performance (up to 3200 MT/s) with low latency DRAMs.
• Creates larger memory capacity (up to 4 TB per CPU) to store and protect data in DRAM.
• Enables in-memory computing for large datasets.
• Leverages the speed and proximity from the technologies nearer to the CPU.
Table fragment: DIMM type support matrix. 3DS LR-DIMMs are supported in some configurations; NVDIMM-N (R Type) is not supported in any.
Click each tab to review the Barlow Pass memory and processor configuration for
15G systems.
The memory modes can be used only if the DIMMs are RDIMMs with a capacity of 32 GB or less.

Config   DIMMs per CPU (RDIMM + BPS)   CPUs     Memory Mode   App Direct
BPS1     4 + 4                         1 or 2   Y             Y
BPS2     6 + 1                         1 or 2   -             Y
BPS3     8 + 1                         1 or 2   -             Y
BPS4     8 + 4                         1 or 2   Y             Y
BPS5     8 + 8                         1 or 2   Y             Y
BPS6     12 + 2                        1 or 2   -             Y
Each row on the below chart represents a different valid memory configuration for
mixing Barlow Pass (B) and RDIMMs (R).
Chart: valid per-CPU DIMM slot population patterns mixing Barlow Pass (B), RDIMMs (R), and NVDIMMs (NV).
Operational Modes
Users can configure the memory modes and update them in the BIOS.
12 The mode can be changed through the BIOS settings: F2 > System BIOS > Memory Settings > Persistent Memory > Intel Persistent Memory > Region Configuration.
Memory Mode
13 They access DIMMs as system memory and will not have control of, or direct access to, the DDR4 DIMMs that are used for caching.
AppDirect Mode
Diagram: in App Direct mode, an unmodified application accesses the persistent region through the OS or VMM, or through standard raw device access.
The App Direct mode is the default memory mode on the BIOS. AppDirect mode
uses the DIMM as storage.
Features of App Direct mode:
• Provides larger storage capacity, higher endurance, low latency and traditional
read/write.
• Works with existing file systems to access the files. Two major methods to
access the files are Block method14 and PMEM method15.
• Cache lines are accessed using load or store instructions.
• The application is responsible for flushing data out of the CPU cache into persistence-guaranteed memory buffers.
14 The block method is slower and is similar to traditional storage access. The block
size is configurable at the operating system level.
15 PMEM method uses the full technology potential, but requires the application to
be optimized.
In Memory mode, the BIOS and operating system list the capacity of the Optane
memory and not the RDIMMs. The RDIMMS are used as cache for the Optane
DIMMs when running in memory mode.
In AppDirect mode, 632 GB is the full amount of memory available but only 128 GB
of it is volatile. The system uses the rest of the memory as persistent storage.
The image shows the difference between the memory capacity that is available when running in
Memory mode and AppDirect mode.
Click the play button to view the process to remove and install DIMMs on a 15G
PowerEdge server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=u7HFeGUw5TxmRHnGo3VGFA==&autoplay=true
Power
In most cases16, the power supply unit (PSU) is a hot-swappable component that provides power redundancy support on PowerEdge servers. Most Dell EMC PowerEdge servers support a minimum of two PSUs17.
16 The PSUs shipped in the PowerEdge 200-500 server series are not hot-swappable.
17 Not all Dell EMC PowerEdge servers support a minimum of two PSUs; some low-end PowerEdge servers have a single PSU. For example, the PowerEdge R230 does not support multiple PSUs or redundancy.
Grid Redundant
In grid redundant mode, the hot spare18 feature is disabled and the power output is distributed equally across both power supplies. The Power Factor Correction (PFC) is disabled by default to reduce power consumption when the system is on standby. However, if a single PSU fails, the remaining PSU supplies the full system load.
18 When the hot spare feature is enabled, one of the redundant PSUs is switched to
the sleep state. The active PSU supports 100 percent of the system load, thus
operating at higher efficiency. The PSU in the sleep state monitors the output
voltage of the active PSU. If the output voltage of the active PSU drops, the PSU in
the sleep state returns to an active output state.
No Redundancy
The PSU redundancy mode depends on the server type and the number of PSUs
in the system.
Power Capping
The power capping option is used to limit the amount of power consumed by a server.
When the power cap policy is enabled, it enforces user-defined power limits on the system. If power capping is not enabled, the default hardware power-protection policy is used; this power-protection policy is independent of the user-defined policy. The system performance is dynamically adjusted to maintain power consumption close to the specified threshold.
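As an illustration, a power cap can also typically be set programmatically through the iDRAC's Redfish interface using the standard DMTF Power schema. The sketch below is a minimal, unverified example: the iDRAC address, credentials, chassis URI, and cap value are placeholders, and resource paths can vary by iDRAC version, so consult the iDRAC Redfish API guide before use.

```python
# Minimal sketch: set a power cap through the Redfish Power schema on an iDRAC.
# All values below are placeholders/assumptions, not a verified procedure.
import requests

IDRAC = "https://192.0.2.10"   # hypothetical iDRAC address
uri = f"{IDRAC}/redfish/v1/Chassis/System.Embedded.1/Power"
payload = {"PowerControl": [{"PowerLimit": {"LimitInWatts": 500}}]}

resp = requests.patch(uri, json=payload,
                      auth=("root", "password"),  # replace with real credentials
                      verify=False)               # lab only; verify TLS in production
resp.raise_for_status()
```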
Dell EMC PowerEdge servers support mixed PSU configurations; the criteria to implement mixed PSU configurations differ between 14G and 15G systems.
PSU LED    Condition
OFF        No power
The PSU firmware can be updated through the Lifecycle Controller (LCC). Click
here to review how to update a server Power Supply Unit firmware, including a
video walk-through of the procedure.
Example of PSU errors in iDRAC due to a mixed PSU configuration.
PSU Blanks
To maintain an efficient airflow for system cooling, all servers with an empty PSU
slot require PSU blank plates. PSU blanks avoid the loss of cooling airflow. If the
PSU blanks are missing, the system temperature might increase and result in
component failures.
In 15G servers, the PSUs are located at the rear of the system. The PSUs are on opposite sides of each other for better airflow within the chassis.
Click the play icon to view the process of removing and installing a PSU in a PowerEdge server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=piELqAPBoEvL8f7u5bGZVQ==&autoplay=true
Cooling
A server consists of multiple fans. When a fan fails, the remaining fans take up the
load.
The cooling fans dissipate the heat generated by the functioning of the server21.
These fans cool the processors, expansion cards, and memory modules.
21 Some servers may not have hot-swappable fans. (Hot swapping is the replacement or addition of components to a system without stopping, shutting down, or rebooting it.) If no hot-swappable fans are available and a fan fails, the iDRAC ramps up the existing fans, similar to systems with hot-swap fans. However, the failed fan cannot be replaced until the system has been powered off, because the fan cables must be disconnected from the system board.
Types of Fans
The Dell EMC PowerEdge servers use Standard fans, High-Performance fans, and
Very High-Performance fans based on the server configurations22.
Dell EMC PowerEdge servers come with different chassis dimensions, such as 1U and 2U. Based on the chassis dimensions and design, the fan dimensions may vary as well.
If a system has six fans and one of the fans fails, the iDRAC ramps up the
remaining fans. It keeps the temperature within the chassis at a set level. (It should
be noted that if the temperature is already well below the required level, the iDRAC
may not ramp up the remaining fans.)
Once the failed fan has been replaced, the iDRAC tests the new fan. It slowly
decreases the speed of the existing fans while increasing the speed of the new fan
until they are all operating at the correct speed.
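A toy model of this compensating behavior is sketched below (illustrative only; the real iDRAC control loop is driven by temperature sensors and per-platform thermal tables).

```python
# Toy model: hold a constant total airflow target by raising the speed of
# the remaining healthy fans when one fails. Illustrative only.
def per_fan_speed(target_airflow, healthy_fans, max_speed_pct=100):
    if healthy_fans == 0:
        raise RuntimeError("no healthy fans; system must shut down")
    return min(target_airflow / healthy_fans, max_speed_pct)

print(per_fan_speed(300, 6))  # 50.0% per fan with all six fans healthy
print(per_fan_speed(300, 5))  # 60.0% per fan after one fan fails
```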
Removal of the chassis cover often results in the fans ramping up, because the cover is used to deflect airflow throughout the system. If the cover is removed, a certain amount of airflow is lost, so the iDRAC, upon detecting that the cover has been removed, ramps up the fans in an effort to increase the airflow across the components and maintain the required temperatures.
22 See the server's thermal restrictions matrix or technical guide for more information.
Also, when the system is first powered on, the temperatures take a few seconds to be recorded. As a fail-safe procedure, the iDRAC ramps the fans up and then brings them back down as the temperature status is analyzed.
Should a fan fail, errors are posted and the remaining fans pick up the additional
workload. However, based on the temperature within the chassis, the remaining
fans may or may not increase their speed.
The HPR fans provide a higher airflow rate. HPR fans are required in 12x 3.5", rear-storage configurations and most GPU configurations.
VHP fans are required for front 16x NVMe drive configurations, or 8x NVMe plus 16x SAS drive configurations with GPUs.
Some of the 15th-generation Dell EMC Intel servers use standard, high-performance silver grade, or high-performance gold grade fans, depending on the configuration.
Dell EMC PowerEdge servers such as the R250, R350, T350, T150, XR11, and XR12 use non-hot-pluggable single-rotor fans.
Hot pluggable fans used in the Dell EMC PowerEdge R750 servers
Non-hot pluggable fans used in the Dell EMC PowerEdge T150 and R250 servers
Click the play icon to view the process of removing and installing a system fan in a Dell EMC PowerEdge R750 server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=o4jLJmM21AIp2vsPpWAs8Q==&autoplay=true
Heatsinks
1U standard (STD) heatsink (HSK) T-type HSK 2U high performance (HPR) HSK
The type of heatsink that is used is based on the CPU TDP23 and GPU
configurations.
Some PowerEdge servers have unique fan positioning in the chassis. For example,
the Dell EMC PowerEdge XR11 has two fans that are located towards the middle
of the chassis. It has an extended heatsink design for optimum cooling.
23 Thermal Design Power (TDP) is measured in watts and is the maximum amount
of heat that is generated by a GPU or CPU. There are multiple types of CPU
heatsinks available including standard (STD), T-type, and full height heatsinks.
In certain single-CPU configurations (non-GPU or non-rear-drive), only four fans are required to be installed in the fan bay.24
Two fans installed in the middle of the PowerEdge XR11 chassis; the unique fan position and extended heatsink used in the PowerEdge XR11.
To remove the heatsink: The heatsink and processor are too hot to touch for some time after the system has been powered off. Allow the heatsink and processor to cool down before handling them.
1. Ensure that all four Anti-Tilt wires are in the locked position (outward position), and then, using a Torx T30 screwdriver, loosen the nuts that secure the heatsink to the system board.
24 In such configurations, only four fans are required to cool the system. For the
other two fan sockets, two fan blanks are required to be installed in fan bays 1 and
2. The number of fans that are required depends on the server model and
configuration.
Memory DIMM blank used in the R740. GPU air shroud used in the XE8545.
DIMM blanks on empty DIMM slots help regulate air flow through the CPU and
DIMM area.
For some servers, the air shroud looks like piano keys25 that drop down into empty
DIMM slots to stop airflow from being wasted. However, systems with midrange
storage require DIMM blanks for empty DIMM slots since the trays do not contain
piano keys.
Some of the PowerEdge servers like the XE8545 have separate GPU air shrouds
for better airflow and heat dissipation.
25 This means that they do not require single DIMM blanks on empty DIMM slots.
GPU fans located in the front of the chassis. Heatsinks seated on top of the Nvidia A100 GPUs.
Dell EMC specialized servers also known as XE servers can have advanced GPU
configurations. These systems generate a lot of heat and require custom solutions.
The Dell EMC PowerEdge XE8545 supports up to 4x NVIDIA A100 GPUs and NVLink in an air-cooled chassis.
The GPUs are cooled with the help of hot-pluggable GPU fans and specialized
heatsinks for each of the GPUs.
Click the tabs to learn about the upgrades on design and thermal configurations.
Design Innovation
The advanced thermal design streamlines the airflow pathway in the chassis and
directs the appropriate volume of air to components that require a constant air
supply.
The design minimizes the fan and system power consumption while maintaining
the system temperature.
iDRAC Cooling Configuration settings page where a user can change the exhaust temperature
limit.
Dell EMC PowerEdge servers use advanced thermal control algorithms to maintain
system temperatures at reliable levels while minimizing fan speed26 and system
airflow.
Users can apply custom fan speeds when using interfaces such as: iDRAC UI,
BIOS setup (F2), and RACADM.
26 This minimization of system fan speeds and airflow can result in high exhaust temperatures, which may be of concern to some users.
As computing demands grow, so do data centers, and with this growth comes huge
amounts of heat27 that must be managed efficiently. Many data centers start out as
a few racks in a server room, adding more equipment over time. Without taking
cooling factors into account, data center HVAC management can become difficult.
27 When data centers are exposed to heat, servers start to slow down or
malfunction altogether. The same thing happens when the server rooms are too
cold. The ideal temperature for the data center depends on the size and amount of
heat that is emitted, but operating within this ideal temperature range is crucial for
overall performance.
Hot aisle containment (HAC) guides the hot air (red arrows) into a computer room air handler (CRAH), which then recirculates it as cool air (blue arrows).
Multiple PowerEdge servers with new Intel and AMD processors support the Dell Technologies
DLC.
Direct Liquid Cooling (DLC) solution manages the growing thermal challenges. Dell
DLC solutions28 cool the CPU with warm liquid, which has the capacity to transfer
heat up to 4X more than the capacity of air cooling.
Because DLC solutions are more efficient at extracting heat, they reduce the burden on server system fans and the data center's cooling infrastructure.
The PowerEdge servers below offer DLC cooling on the newest Intel and AMD
processors:
• C6520
• C6420
• C6525
• R6525
• R7525
• R650
• R750
• R750xa
28 DLC solutions are more efficient at extracting heat, reducing the burden on server system fans and the data center's cooling infrastructure.
DLC example of a cold plate and coolant loop (manifold, liquid flow path, and microchannels). The monolithic design is used in the 15G rack servers, and the modular design is used in the Dell EMC PowerEdge C6420 and C6520 servers.
DLC uses the exceptional thermal capacity of liquid to absorb and remove the heat
that is created by new high-power processors. Cold plates are attached directly to
the processors. The coolant captures and removes the heat from the system to a
heat exchanger in the rack or row.
This heat load is removed from the data center using a warm-water loop, potentially bypassing the expensive chiller system. By replacing (or supplementing) conventional air cooling with higher-efficiency liquid cooling, the overall operational efficiency of the data center is improved.
Liquid Cooling Module (Monolithic architecture) Liquid Cooling Module (Modular architecture)
Leak Sense technology provides customers with the knowledge that potential
issues are found and reported quickly.
If a coolant leak occurs, the system’s leak sensor logs an alert29 in the iDRAC
system.
29 Three errors can be reported in the iDRAC: small leak (warning), large leak (critical), and leak sensor error (warning, indicating an issue with the leak detection board). These error detections can be configured to take meaningful actions using tools like OpenManage Enterprise.
POD Solution
POD solution containing two outer racks with node-level DLC and one middle In-Row Cooler.
The Dell EMC rack-level POD solution30 concept is designed for total heat capture.
The POD solution contains front and back containment for racks of DLC servers,
plus an In-Row Cooler that is integrated between the IT racks to capture any
remaining heat.
Monolithic Architecture
The Dell EMC PowerEdge R650, R750, and R750xa follow the monolithic
architecture.
30 A pod or a cluster is a set of system units that are linked by high-speed networks
into a single unit.
In the monolithic architecture, the Liquid Leak Sensor board connects to the
Complex Programmable Logic Device (CPLD) using the Liquid Cooling Rear I/O
board.
Modular Architecture
In the modular architecture, the Liquid Leak Sensor board connects directly to the
Complex Programmable Logic Device (CPLD).
Discussion Notes:
• The Liquid Cooling Rear I/O (LC RIO) board is a component specific to the
monolithic architecture only (PowerEdge R650, R750, and R750xa).
• A high-level overview of the liquid cooling process:
− Depending on the platform, the Liquid Leak Sensor (LLS) board, connects to
the immediate upstream entity (LC RIO board for the monolithic architecture
or the CPLD for the modular architecture) using an alert cable.
− The message is then forwarded to the iDRAC using the SPI-X registers, and
the error is logged.
• If a leak develops in a particular cold plate and the detection cable is not
engaged, then the alarm signal will not be received. A disengaged alert cable is
reported as an error in the iDRAC logs.
• Here, the context of a 'modular architecture' does not include blade servers.
The Dell EMC PowerEdge MX750c system does not support a liquid cooling
configuration.
External support for liquid cooling is common for both the monolithic and modular
architectures.
− CDUs connect to the rack manifold to pump coolant to the racks and
exchange heat from the servers with facility water.
Images: facility water at the customer site; the CDU exchanges heat from the servers with facility water; manifolds connect servers to the CDU; sled-internal liquid cooling system.
DLC Ecosystem
The image below shows the high-level overview of the DLC ecosystem.
DLC ecosystem: rack manifold, liquid-cooled servers, Coolant Distribution Unit, and coolant pipe assembly.
The Dell EMC chiller-less Fresh Air solution brings air into the data center from the outside to support the cooling systems.
Dell Fresh Air 2.0 hardware includes specific configurations that can operate at higher temperature and humidity levels and use clean outside air for intake instead of tightly controlled air conditioning (AC) from a cold aisle.
The general configuration and device restrictions for deployment in a fresh air environment are listed below.
1:
• High-power PCIe cards (>75 W, using an AUX cable, such as GPUs). Lower-power cards could also be excluded based on system limitations.
• Third-party PCIe cards (any power level).
Networking
NDC used in the Dell EMC PowerEdge 12G, 13G, and 14G servers.
Older generation servers used a network interface card (NIC) built into the system
board. When upgrading or changing the NIC technology, users would install a PCIe
network interface controller in one of the PCIe slots in the server.
With the Dell EMC PowerEdge 12G, 13G and 14G servers, the NICs are based on
a daughter card.31 Users can easily change network requirements as they evolve.
A Dell EMC Network Daughter Card (NDC)32 enables the user to choose the right
network fabric without using a valuable PCI slot. It presents an easy upgrade path
from 1 GbE to 25 GbE LAN speeds.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=/g0hzowboonHcUgIVGTkcw==&autoplay=true
31 The Network Daughter Card (NDC) is a custom form factor mezzanine card that contains a complete NIC subsystem.
32 The NDC typically includes the features and behavior of a traditional LOM (LAN on Motherboard) solution.
OCP Card
The Open Compute Project (OCP)33 cards are network cards that connect to the
PCI bus. They are physically smaller than the Industry Standard Architecture (ISA)
expansion card and often connect to a dedicated connector on the system board.
The OCP card was introduced with the Dell EMC PowerEdge 15G servers.
33 The Open Compute Project (OCP) is an organization that shares designs of data center products and best practices among companies. The designs and projects include server designs, data storage, rack designs, open networking switches, and so on.
Important: The OCP and NDC cards are not hot-swappable components.
Click the play button to view the process of removing and installing an OCP card in a Dell EMC PowerEdge server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=gN92Ty6wDims2t286i4/BQ==&autoplay=true
SNAP I/O
The SNAP I/O adapters enable both CPUs within a dual-socket server to connect
directly to the network through its own dedicated PCIe interface.
SNAP I/O results in lower latency, lower CPU utilization, and higher network throughput.
The image below is of a SNAP I/O ConnectX-5 dual-port, 100 GbE-only adapter. It supports PCIe Gen3/Gen4 x16 and is supported by the iDRAC and the Lifecycle Controller.
The bottom-left image is of a primary SNAP I/O card, and the bottom right is an
auxiliary card. The left image is the SNAP I/O ConnectX-6 single port VPI HDR
adapter. It supports PCIe Gen4 x16 and PCIe Gen3 x32 (with auxiliary card). The
iDRAC and Lifecycle Controller do not support this card.
On the left, is a SNAP I/O ConnectX-6 single port VPI HDR adapter and on the right is an auxiliary
card.
The image below shows the SNAP I/O ConnectX-6 card along with the auxiliary
card which is installed in a Dell EMC PowerEdge server.
The above image shows a SNAPI-capable NIC directly connected to both CPUs, bypassing QPI and UPI34. This frees up bandwidth for applications and improves latency.
Both SNAP I/O and SNAPI35 (also known as socket direct cards) are similar in how they function; however, they connect to the CPU differently.
35 Socket direct cards connect directly with multiple upstream CPU sockets, bypassing inter-CPU socket link usage and associated overheads such as the NUMA latency penalty.
Click the play button to see a video demonstrating the removal of riser 1, which supports the SNAP I/O module, in the Dell EMC PowerEdge C6520 server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=mLFlKByDShaRqFXSpow4Pg==&autoplay=true
The rear I/O (RIO) and the LAN on Motherboard (LOM) cards were introduced with the Dell EMC PowerEdge 15G servers. The RIO board includes:
• iDRAC port36
• Video Graphics Array (VGA) port
• USB port
• ID button
• Chassis intrusion switch cable
• Optional serial connector
36 RIO has an iDRAC port, but the iDRAC chipset is on the system board.
LOM network refers to the Ethernet connectivity provided to the compute sleds by
the I/O modules installed at the back of the PowerEdge servers. LOMs eliminate
the need for a separate network interface card to access a local area network.
Dell EMC PowerEdge 15G servers support two NIC ports that are embedded on
the LOM card.
Different types of RIO cards used in a PowerEdge server which supports Direct
Liquid Cooling.
RIO card used in a non-liquid cooling configuration. Custom RIO card used in a Direct Liquid Cooling
configuration.
Click the play button to view the process of removing and installing a LOM card in a Dell EMC PowerEdge server.
Link to video:
https://edutube.emc.com/Player.aspx?vno=s/qq+d59xdKAk2mnlwv0Pg==&autoplay=true
Click the play button to view the process of removing and installing a rear I/O card in a Dell EMC PowerEdge server.
Link to video:
https://edutube.emc.com/Player.aspx?vno=N2oP6pLsbwLi8bfst8oSNg==&autoplay=true
Accelerator Cards
GPUs
37 A GPU typically has thousands of cores that are designed for efficient execution
of mathematical functions.
Diagram: application code runs its serial tasks on the CPU, while compute-intensive functions (about 5% of the code) are offloaded to the GPU.
CPUs consist of minimal cores optimized for serial processing, while GPUs consist of thousands of
smaller, more efficient cores designed for parallel performance.
FPGAs
In the image example, the Intel FPGA Accelerated Network Function moves load balancing, QoS, and classification tasks off the CPU.
Intel FPGA card programmed to augment the capabilities of Virtual Network Functions running on a carrier cloud.
ASICs
Application-Specific Integrated Circuits (ASICs) are cards with silicon devices built for a specific purpose, such as graph computing with massively parallel, low-precision floating-point computing.
A Graphcore card with embedded IPUs, specifically designed for artificial intelligence.
Tip: The Graphcore IPU is supported on the PowerEdge R6525 and DSS 8440.
Table: accelerator card models by manufacturer.
Use Cases
Click each puzzle piece for information about use cases for GPUs.
Choosing GPUs and other accelerated architectures and products is a key decision for IT teams. Once the decision is made for the appropriate workloads, infrastructure strategy and product choices are addressed.
CUDA divides work into small independent units that are solved independently across the CUDA blocks.
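As a conceptual illustration of that decomposition, the plain-Python sketch below mirrors CUDA's grid/block/thread indexing; an actual CUDA kernel would run the inner body on thousands of GPU threads in parallel rather than in a loop.

```python
# Conceptual sketch of CUDA's work decomposition: the grid is split into
# independent blocks, and each thread computes one element from its indices.
def saxpy_like(n, grid_dim, block_dim, a, x, y):
    for block_idx in range(grid_dim):          # blocks execute independently
        for thread_idx in range(block_dim):    # threads within a block
            i = block_idx * block_dim + thread_idx
            if i < n:                          # guard for the ragged last block
                y[i] = a * x[i] + y[i]
    return y

print(saxpy_like(5, grid_dim=2, block_dim=4, a=2.0,
                 x=[1, 2, 3, 4, 5], y=[0, 0, 0, 0, 0]))  # [2.0, 4.0, 6.0, 8.0, 10.0]
```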
CUDA requires a supported version of Linux with a GCC compiler and toolchain, or Microsoft Windows with Microsoft Visual Studio, depending on the operating system used.
The links that are provided below detail CUDA installation instructions by operating system.
iDRAC UI: System > Cooling > Temperatures.
The system board inlet temperature should be between the minimum and maximum warning thresholds38.
If a system board inlet temperature warning message is logged, the GPUs lower
the power consumption39 to avoid thermal damage.
Click the play button to view the process of removing the GPU riser module from a PowerEdge server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=UkyAL1BC2K50Wvk9kGQyKg==&autoplay=true
38 The range is optimal for GPU performance. The iDRAC sets the thermal warning
threshold when the GPU is installed.
39 Lowering the power consumption results in lower GPU performance.
The GPU full-length kit, half-length kit, and the GPU power cable kit are kits
available for customers. Depending on the kit ordered, the respective components
are available.
Use SolVe to generate the upgrade procedure for the GPU kit.
Expansion Card
PCIe Overview
An x1, x4, x8, or x16 card can be used in an x16 slot. A system board can have multiple slot types and support different PCIe versions.
PowerEdge R750 CPU and PCIe lanes. The PowerEdge R750 supports many riser and PCIe lane
configurations.
The system board processors control the PCIe slots. Also, the system board
chipset may support PCIe slots.
Dell EMC PowerEdge 14G servers support PCIe 3. PowerEdge 15G servers
support PCIe 3 and PCIe 4.
For example: the PowerEdge R750, with the use of expansion card risers, can
support up to 48 x 4.0 PCIe lanes per CPU.
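As a rough back-of-envelope sketch of what those lanes mean for throughput (assuming the standard 128b/130b encoding of PCIe Gen3/Gen4; real-world figures are further reduced by protocol overhead):

```python
# Approximate per-direction PCIe bandwidth from raw transfer rate and lanes.
GT_PER_S = {"3.0": 8.0, "4.0": 16.0}   # raw transfer rate per lane

def bandwidth_gb_s(gen, lanes):
    # 128b/130b encoding: 128 payload bits per 130 transferred bits
    return GT_PER_S[gen] * lanes * (128 / 130) / 8

print(round(bandwidth_gb_s("3.0", 16), 1))  # ~15.8 GB/s for a Gen3 x16 slot
print(round(bandwidth_gb_s("4.0", 16), 1))  # ~31.5 GB/s for a Gen4 x16 slot
```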
The PowerEdge server supports many different PCIe card form factors. The
graphic shows the standard full-length, half-length, and low profile dimensions.
Also, PowerEdge servers may support other form factors, such as half-length, half-height (HLHH).
Half-length: 167.65 mm (6.6").
Risers
Riser cards enable users to install additional expansion cards for the server.
Storage
NVMe
Paddle cards are used to connect an NVMe backplane to the system board using
cables. The paddle card interfaces with the system board chipset.
When paddle cards are used, the onboard S150 controller manages the NVMe disks. Paddle cards are similar to risers, but they do not have the riser cage. Paddle cards provide efficient data management on systems with many storage devices.
The paddle cards are only available with certain riser configurations; not all systems come with paddle cards, as it depends on the configuration. For example, in the PowerEdge R750, the configuration that supports 24x 2.5" hard drives with the backplane has paddle cards for efficient management.
The video shows the process of removing and installing the paddle card on a Dell
EMC PowerEdge server.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=s9rdvKggg4PdlphZBPRjNw==&autoplay=true
The Internal Dual SD Module (IDSDM) provides a redundant SD-card module for embedded hypervisors. Users can configure the IDSDM for storage or as the operating system boot partition.
The video shows the process of removing and installing the IDSDM and vFlash
cards on the Dell EMC PowerEdge server.
44 The data is written on both cards, but the data is read from the first card. If the
first card fails or is removed, then the second card automatically becomes active.
45 vFlash cards provide a shared storage space between the server system and its
iDRAC.
Movie:
Link to video:
https://edutube.emc.com/Player.aspx?vno=Sa4eNZrdwA1doMbXQGmChg==&autoplay=true
Dell EMC offers two types of Boot Optimized Storage Solution (BOSS) cards.
BOSS-S1
Features of the BOSS-S1 Adapter Card Features of the BOSS-S1 Modular Card
Click here to learn more about the Dell EMC BOSS-S1 card through the Dell EMC
Boot Optimized Server Storage-S1 User's Guide.
BOSS-S2
1. M.2 blank
2. M.2 carrier
4. BOSS-S2 module
5. M.2 card
6. BOSS-S2 card
7. Signal cable
8. Power cable
Click here to learn more about the Dell EMC BOSS-S2 through the Dell
Technologies Boot Optimized Storage Solution-S2 User's Guide.
RAID
Data is distributed across the drives in several ways known as RAID levels. Based
on the customer requirements, the RAID levels can be configured for optimal
performance. Click here to learn more about available RAID levels and
specifications.
46 Parity data is redundant data that is generated to provide fault tolerance within certain RAID levels.
RAID 0
Block 1 Block 2
Block 3 Block 4
Block 5 Block 6
Block 7 Block 8
Disk 1 Disk 2
An example of RAID 0.
RAID 0 uses the concept of striping that allows data to be written across multiple
hard drives instead of one physical disk. RAID 0 involves the partitioning of each
physical disk storage space into 64 KB stripes.
The minimum number of disks required to configure RAID 0 is two. RAID 0 can
have a maximum of 32 drives.
• Performance boost for read and write operations due to the striping of data
across multiple disks.
• Increases the total size of available space that is presented to the operating
system.
RAID 1
An example of RAID 1 mirroring across two disks:
Disk 1: Block 1, Block 2, Block 3, Block 4
Disk 2: Block 1, Block 2, Block 3, Block 4
RAID 1 uses the concept of data mirroring. Data is mirrored or cloned to other disks
so that if one of the disks fails, the other one can be used.
The minimum number of disks required to configure RAID 1 is two. RAID 1 can
have a maximum of 32 drives.
• Improves read performance since different blocks of data can be accessed from all the disks simultaneously.
• A multithreaded process can access block 1 from disk 1 and block 2 from disk 2 at once, thereby increasing the read speed.
• Ideal for mission-critical storage and hosting operating systems.
• Write performance is reduced since all the drives must be updated whenever new data is written.
• Disk space is spent duplicating the data, thereby increasing the cost-to-storage ratio.
RAID 5
An example of RAID 5: data blocks (A-D) striped across three disks with rotating parity (P) produced by the parity generator:
Disk 1: A0, B0, Pc, D0
Disk 2: A1, Pb, C0, D1
Disk 3: Pa, B1, C1, Pd
RAID 5 uses the concept of distributed parity46 with block-level disk striping. RAID 5 stripes data blocks across multiple disks like RAID 0 while also storing parity information. The disk capacity is calculated by n-1. If there are three disks, then the virtual disk capacity is the total size of two disks.
The minimum number of disks required to configure RAID 5 is three. RAID 5 can have a maximum of 32 drives.
46 Parity data is redundant data that is generated to provide fault tolerance within certain RAID levels.
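Parity in RAID 5 is commonly computed as the bitwise XOR of the data blocks in a stripe, which is what allows a missing block to be rebuilt from the survivors. A simplified Python sketch of that arithmetic (conceptual only, not the PERC implementation):

    # Simplified RAID 5 parity arithmetic: parity = XOR of the data blocks.
    # If any one block is lost, it is recovered by XOR-ing the surviving
    # blocks with the parity block.

    def xor_blocks(*blocks: bytes) -> bytes:
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    a0 = b"\x11" * 4              # data block on disk 1
    a1 = b"\x22" * 4              # data block on disk 2
    parity = xor_blocks(a0, a1)   # parity block on disk 3

    # Simulate losing disk 2: rebuild A1 from A0 and the parity.
    rebuilt = xor_blocks(a0, parity)
    assert rebuilt == a1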
RAID 6
An example of RAID 6: data blocks (A-D) striped across four disks with rotating dual parity (P and Q):
Disk 1: A0, B0, Pc, Qd
Disk 2: A1, Pb, Qc, D0
Disk 3: Pa, Qb, C0, D1
Disk 4: Qa, B1, C1, Pd
RAID 6 uses the concept of dual parity with block-level disk striping. RAID 6 allows
two disk failures without duplicating the contents of entire physical disks. The disk
capacity is calculated by n-2. If there are four disks, then the virtual disk capacity is
the total size of two disks.
The minimum number of disks required to configure RAID 6 is four. RAID 6 can
have a maximum of 32 drives.
RAID 10
An example of RAID 10: two RAID 1 mirrored pairs striped together with RAID 0.
RAID 10 combines RAID 0 and RAID 1 with a minimum of four disks. In RAID 10,
two disks are striped and mirrored onto two other disks, creating a single array of
disk drives.
The minimum number of disks required to configure RAID 10 is four. RAID 10 can have a maximum of 240 drives.
RAID 50
An example of RAID 50: two RAID 5 sets (disks 1-3 and disks 4-6) striped together with RAID 0:
Disk 1: A0, B0, Pc, D0
Disk 2: A1, Pb, C0, D1
Disk 3: Pa, B1, C1, Pd
Disk 4: A2, B2, Pc, D2
Disk 5: A3, Pb, C2, D3
Disk 6: Pa, B3, C3, Pd
RAID 50 (RAID 5+0), a type of nested RAID level, combines the block-level striping
of RAID 0 with the distributed parity of RAID 5.
RAID 60
An example of RAID 60: two RAID 6 sets striped together with RAID 0, each set using rotating dual parity (P and Q):
Disk 1: A, B, P, Q
Disk 2: A, P, Q, D
Disk 3: P, Q, C, D
Disk 4: Q, B, C, P
Disks 5-8 repeat the same layout for the second RAID 6 set.
RAID 60 (RAID 6+0), a type of nested RAID level, combines the block-level striping of RAID 0 with the dual distributed parity of RAID 6.
The table highlights the major differences between each RAID level (n = number of drives):
RAID Level   Technique                          Minimum Disks   Maximum Drives   Usable Capacity
RAID 0       Striping                           2               32               n
RAID 1       Mirroring                          2               32               n/2
RAID 5       Striping with distributed parity   3               32               n-1
RAID 6       Striping with dual parity          4               32               n-2
RAID 10      Striping of mirrored pairs         4               240              n/2
RAID 50 and RAID 60 stripe multiple RAID 5 or RAID 6 sets; their minimum and maximum drive counts depend on the number of sets and the controller.
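Assuming equally sized drives, the usable-capacity arithmetic in the table reduces to a few formulas. The following minimal Python sketch (illustrative names, not a controller API) applies them:

    # Usable capacity, in drives, for n equally sized drives per RAID level.
    # Formulas follow the descriptions above: striping keeps all n drives,
    # mirroring halves capacity, single parity costs one drive, dual parity
    # costs two.

    def usable_drives(level: int, n: int) -> int:
        if level == 0:
            return n
        if level in (1, 10):
            return n // 2
        if level == 5:
            return n - 1
        if level == 6:
            return n - 2
        raise ValueError(f"unsupported RAID level {level}")

    # Four 2 TB drives in RAID 6 -> usable capacity of two drives (4 TB).
    print(usable_drives(6, 4) * 2, "TB")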
Hot Spare
Hot spares47 are dedicated standby disks. When a hard drive that is used in a
virtual disk fails, the assigned hot spare48 is activated to replace the failed hard
drive without interrupting the system or requiring any intervention. When a hot
spare is activated, it rebuilds the data for all redundant virtual disks that were using
the failed hard drive.
47 A hot spare must be at least as large as the drive it is to replace, and a hot spare
must be the same drive type (SAS/SATA) as the drive it is to replace.
48 A 7,200 RPM disk cannot be assigned as a hot spare to replace 10K RPM drives.
49 The PERC 10 series can be configured so that the system backplane or storage
enclosure disk slots are dedicated as hot spare slots. This feature can be enabled
using the Dell OpenManage storage management application.
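The size and drive-type constraints in footnote 47 can be expressed as a simple eligibility check. The sketch below is illustrative (hypothetical names) and does not model every controller rule, such as the RPM restriction in footnote 48:

    # Hot spare eligibility per the constraints above: the spare must be at
    # least as large as the failed drive and the same drive type (SAS/SATA).

    from dataclasses import dataclass

    @dataclass
    class Drive:
        capacity_gb: int
        drive_type: str   # "SAS" or "SATA"

    def is_valid_hot_spare(spare: Drive, failed: Drive) -> bool:
        return (spare.capacity_gb >= failed.capacity_gb
                and spare.drive_type == failed.drive_type)

    print(is_valid_hot_spare(Drive(1200, "SAS"), Drive(900, "SAS")))   # True
    print(is_valid_hot_spare(Drive(900, "SATA"), Drive(900, "SAS")))   # False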
PERC Overview
The Dell EMC PowerEdge RAID Controller (PERC) is a series of RAID disk storage controllers that support SAS and SATA hard drives and Solid-State Drives (SSDs). NVMe hardware RAID support is available with the PERC 11 series (H755N front, H755MX, and H755 adapter).
H745 Adapter: 1. Heat sink, 2. Battery, 6. PCIe connector
H745 Front: 2. Battery, 3. Heat sink, 7. Power connector
H345 Adapter: 1. Heat sink, 4. PCIe connector
H345 Front: 2. Heat sink, 5. Power connector
H755 Adapter: 1. Heat sink, 2. PCIe connector, 3. Battery, 4. Backplane connector A, 5. Backplane connector B
H755 Front: 1. Battery, 4. Heat sink, 5. Backplane connector A, 6. Backplane connector B
H755 NVMe: 1. Battery, 4. Heat sink, 5. Backplane connector A, 6. Backplane connector B
H755 MX
The PERC 11 controller introduces new features that boost performance. PERC 11 supports the PCIe Gen4 host interface and upgraded DDR4 8 GB 2666 MT/s cache memory. However, the greatest addition to this generation of technology is the inclusion of NVMe hardware RAID support. NVMe hardware RAID support is available on the H755N front, H755MX, and H755 adapter form factors.
PERC H755 MX
The PERC H755MX does not support the MX5016s storage sled. Customers who want to use the MX5016s should use the HBA330 MMZ (manages internal disks only) or the jumbo PERC (manages both internal and storage sled disks).
NVMe RAID
The image shows the NVMe RAID data path: the operating system issues SCSI commands to a virtual disk (VD) presented by the PERC, which sits behind the root complex and manages the NVMe drives.
• A PERC translates the SCSI instructions and passes the instructions to the
NVMe drives.
• Windows Device Manager lists all the NVMe drives.
The image shows how Windows Device Manager lists the NVMe drives, grouped as virtual disks and non-RAID disks.
In the Dell EMC 15G servers, PERC has two options for enclosure configuration
mode: Unified Mode and Split Mode.
The enclosure mode can be reset through the PERC as described below.
The split mode is indicated as <X:Y>. By default, the split mode is a <12:12> split.
X slots are assigned to one controller and Y slots are assigned to a different
controller.
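The <X:Y> notation can be read mechanically, as in this small illustrative Python sketch (hypothetical function name):

    # Parse the enclosure split-mode notation "<X:Y>": X slots go to one
    # controller and Y slots to the other. "<12:12>" is the default split.

    def parse_split_mode(mode: str):
        x, y = mode.strip("<>").split(":")
        return int(x), int(y)

    controller_a_slots, controller_b_slots = parse_split_mode("<12:12>")
    print(controller_a_slots, controller_b_slots)  # 12 12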
Apply Changes
Once added to pending operations, click Apply Now to initiate the configuration
operation.
After the job is completed, a cold reboot is required to apply the changes.
The video shows the process of removing and installing the rear-loading PERC module on the Dell EMC PowerEdge R750xa.
Link to video: https://edutube.emc.com/Player.aspx?vno=Hgc9SuXuvDTi9nC5Z05xxA==&autoplay=true
Backplanes
The video shows the process of removing and installing the backplane on the Dell
EMC PowerEdge R750 server.
Important: Make sure to remove all the cables that connect to the
system backplane before removing the backplane from the system.
Link to video: https://edutube.emc.com/Player.aspx?vno=Nh7UK+Zx7sJKmki4oH/bzQ==&autoplay=true
Server security focuses on protecting the data and resources that are stored in
servers.
System Erase: Allows users to easily retire or repurpose the latest PowerEdge
servers by securely and quickly wiping data from storage drives and other
embedded nonvolatile memory.
The Trusted Platform Module (TPM) is a hardware security device that provides the
server with the ability to create cryptographic keys. The cryptographic keys are
used for encryption and decryption.
When TPM is enabled on a device, the resident operating system works together with the device to encrypt the hard drives. TPMs are passive devices53 and do not initiate communication on their own.
The TPM cannot be removed from one system board and installed on another
system board.
52 The chip includes a unique endorsement key that is baked into the module
during manufacturing, like a digital fingerprint to establish the trustworthiness of
data and applications. This cross-platform solution engages at the lowest level of
system operation, protecting against unauthorized firmware and software
modifications that can undermine system integrity.
53 TPMs only receive commands and return responses.
Hashing: Used to convert an input (a string of characters) of any length to a fixed-size value that represents the original string, using an algorithm.
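For example, a SHA-256 hash maps inputs of any length to a fixed 256-bit (64 hex character) value. The snippet below uses Python's standard hashlib module purely to illustrate the idea:

    import hashlib

    # Any input length maps to a fixed-size (32-byte) digest.
    for text in ("short", "a much longer input string of arbitrary length"):
        digest = hashlib.sha256(text.encode()).hexdigest()
        print(len(digest), digest[:16], "...")  # always 64 hex characters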
TPM 2.0 is not fully supported in legacy BIOS mode because there is no pointer
to TPM logs in legacy BIOS mode.
Each domain or hierarchy of the TPM has its own resources and controls.
Endorsement Hierarchy: The endorsement hierarchy is used when the user has privacy concerns. The endorsement administrator has access to some protected TPM commands and functionalities.
Different settings are used on Windows Server 2012 R2, Windows Server 2016, and later versions to match the operating system capabilities.
The BIOS settings need certain modifications to fully leverage Windows Server 2016. Modifying the BIOS enables the server for the TPM guarded host deployment that is required to run shielded virtual machines. Guarded hosts54 and shielded virtual machines55 are new to Windows Server 2016 and later versions.
The PowerEdge 15G servers use a silicon-based RoT to attest to the integrity of the running code. The servers ensure that no unauthorized BIOS or firmware code runs. If the code is replaced with malware, the server cannot execute the code.
The iDRAC is responsible for RoT and verifies the BIOS SPI code before allowing the host chipset and CPU to run any code.
RoT Purpose
The silicon-based RoT starts a chain of trust to ensure systems boot with legitimate BIOS code. Once the initial BIOS code is verified as legitimate, it is trusted to validate each subsequent block of code that executes.
RoT Operation
1. On a server, the silicon chip acts to validate that the BIOS is legitimate by
checking its encrypted signature.
2. This encrypted signature (a Dell EMC encryption key) is burned into silicon
during the manufacturing process and cannot be changed.
The only way to make the Root of Trust robust is to do it in hardware. The read-
only encryption keys are burned into PowerEdge servers at the factory. These keys
cannot be changed or erased. When the server powers on, the hardware chip
verifies that the BIOS code is legitimate from Dell EMC using the key that is burned
into silicon in the factory.
A failure to verify that the BIOS is legitimate results in a shutdown of the server and
the user is notified in the log. The BIOS recovery process can be initiated by the
user. If the RoT is validated successfully, the rest of the BIOS modules are
validated by using a chain of trust procedure until control is handed off to the
operating system or the hypervisor.
• The BIOS Live Scanning feature enables users to scan the system BIOS once
POST is completed. This task can be run once or can be set up on a schedule.
• The scan period could be once a week, once a month, or once a year (adjustable by the end user).
• The BIOS Live Scanning is a licensed feature and is available only with iDRAC
Datacenter license.
Image of the iDRAC UI with the BIOS Live Scanning option highlighted.
The 14G and the 15G PowerEdge servers support the Intel Boot Guard verified
boot feature. Boot Guard protects the server BIOS.
• Basic Input Output System (BIOS) is implicitly a critical element of any solution stack, and updating it carries risks56.
• The BIOS persists between power cycles, becoming a potentially attractive
target for malicious attacks.
• Attacks against the BIOS are typically hard to detect because they run before the operating system and other security software load. Undetected attacks leave a platform or organization exposed to further threat or performance issues.
56 Due to this, some users hesitate to perform scheduled updates during a server
life cycle.
Boot Guard is a processor feature that prevents the system from running the
firmware images that are not released by the manufacturer. It also allows the BIOS
or UEFI to verify that the BIOS is not compromised before booting.
In the Boot Guard verification method, the CPU compares the current BIOS or
UEFI firmware image with an official hash-generated version of the image that is
stored on PowerEdge servers.
Protection at the Chipset: The Dell EMC 14th generation of PowerEdge servers supports the Intel Boot Guard verified boot feature. Boot Guard extends the platform root of trust to the Platform Controller Hub (PCH). The PCH contains One-Time Programmable (OTP) fuses that are burned by the Dell EMC factory during the manufacturing process with the selected Boot Guard policy and the hash of the Master Public Key.

The Key Manifest on the BIOS SPI flash is signed by the Dell EMC Master OEM key and delegates authority to the Boot Policy Manifest key. The Boot Policy Manifest then authorizes the Initial Boot Block (IBB), which is the first BIOS code module to run at the reset vector. If the IBB fails authentication, Boot Guard shuts down the system and does not allow it to boot.

Each BIOS module contains a hash value of the next module in the chain and uses the hash value to validate the next module. The IBB validates (SEC+PEI) before handing off control to it. The (SEC+PEI) then validates (PEI+MRC), and (PEI+MRC) further validates the (DXE+BDS) modules. After that point, UEFI Secure Boot, if enabled, can extend the root of trust to the remaining BIOS, third-party UEFI drivers, and the operating system loader.
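The chain of trust described above is, in essence, a hash chain: each module stores the expected hash of the next module's code. The following Python sketch illustrates that verification walk in simplified, conceptual form (it is not the actual Boot Guard implementation):

    import hashlib

    # Conceptual chain-of-trust walk: each module stores the expected hash
    # of the next module's code. Verification stops the boot on a mismatch.

    def sha256(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    # Hypothetical BIOS modules, built so each holds the next one's hash.
    dxe_bds = {"code": b"DXE+BDS code", "next_hash": None}
    pei_mrc = {"code": b"PEI+MRC code", "next_hash": sha256(dxe_bds["code"])}
    sec_pei = {"code": b"SEC+PEI code", "next_hash": sha256(pei_mrc["code"])}

    def verify_chain(modules) -> bool:
        for current, nxt in zip(modules, modules[1:]):
            if current["next_hash"] != sha256(nxt["code"]):
                return False   # halt boot: module failed validation
        return True

    print(verify_chain([sec_pei, pei_mrc, dxe_bds]))  # True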
If the Boot Guard event detects any issue in the BIOS image before booting, it
immediately activates the BIOS or UEFI recovery feature and attempts to recover a
backup BIOS or UEFI.
The Boot Guard event and the subsequent events that perform the BIOS or UEFI
recovery are captured in the Lifecycle Controller log as highlighted in the image.
A BIOS/UEFI recovery can be initiated in two ways: either through Boot Guard or when the BIOS detects corruption57.
There are two BIOS ROMs58 in the system: a 32 MB ROM for the normal full-sized BIOS and a 16 MB recovery ROM.
57 BIOS corruption can either be due to a malicious attack, due to a power loss
during the update process, or due to any other unforeseen event.
58 Read Only Memory
Secure Boot
As software security breaches become more frequent and harder to detect, system administrators must deploy a wider variety of defenses, such as Secure Boot.
UEFI Secure Boot is a technology that secures the boot process by verifying that the drivers and operating system loaders are signed by a key authorized by the firmware.
Secure Boot is a system BIOS feature that guards against attacks by preventing
the execution of unauthorized code in the preboot environment. It provides an
improved way for the BIOS to authenticate each component in the system using
certificates or policies during the boot process.
The BIOS authenticates each module that is run during the boot process using certificates in the Secure Boot policy. Before the system BIOS loads a module into memory, Secure Boot checks whether the module is authorized to run on the system. The modules validated this way include device firmware, diagnostics, and operating system loaders.
The Secure Boot policy allows a user to specify the policy or digital signature that
BIOS uses to authenticate. The policy can be classified as:
• Standard: BIOS uses the default set of certificates to validate the drivers and
operating system loaders during the boot process. By default, the Secure Boot
Policy is set to Standard.
• Custom: BIOS uses the specific set of certificates that can be imported or
deleted from the standard certificates to validate the drivers and operating
system loaders during the boot process.
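Conceptually, the policy check reduces to asking whether a module is signed by a certificate in the active policy. The sketch below uses hypothetical certificate identifiers to illustrate the Standard and Custom behaviors; real Secure Boot uses X.509 certificates and UEFI signature databases:

    # Conceptual Secure Boot policy check: a module may load only if it is
    # signed by a certificate present in the active policy's allow set.

    STANDARD_CERTS = {"dell-default-cert", "os-loader-ca"}  # hypothetical IDs

    def authorize(module_signer: str, policy: str, custom_certs=None) -> bool:
        allowed = STANDARD_CERTS if policy == "Standard" else set(custom_certs or ())
        return module_signer in allowed

    print(authorize("dell-default-cert", "Standard"))                     # True
    print(authorize("third-party-cert", "Custom", {"third-party-cert"}))  # True
    print(authorize("unsigned", "Standard"))                              # False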
The Secure Boot policies in the latest technology of Dell EMC servers are
described in terms of various modes.
• In this mode, the customer can provide their private key and authenticate their environment.
• This works similar to setup mode, except for policy violations. If there is a policy violation, this does not stop the system from booting and will record the failure in the LC log.
• This mode is the most secure and requires someone to physically be at the box to modify any policies.
• This uses a Dell factory, randomly generated private key to authenticate all cards that are approved by Dell EMC.
Secure Erase is used to reset the security attributes when an SED is inaccessible due to a lost or forgotten passphrase.
Secure Erase completely erases the data on SEDs and resets them to the default state.
When OpenManage Essentials introduced the notion of configuration drift for both
hardware changes and firmware changes, customers began asking if there was a
way to prevent any changes from happening in the first place. This is how
Lockdown Mode started. The Lockdown option is selected in the iDRAC GUI and
used to prevent any changes to firmware or hardware settings during normal
operations.
When the System Lockdown Mode is enabled, only some configuration changes
are allowed.
Configuration Validation
Overview
Configuration Validation:
• Error Message - HWC8010: Occurs when there are one or two issues in the
configuration.
• Error Message - HWC8011: Occurs when there are multiple issues in the
configuration.
The product-specific user manuals provide additional details about these errors.
Error Messages
The following table highlights the HWC8010 and HWC8011 error messages along with the interpretation of the error.
Two fans installed in the middle of the PowerEdge XR11 chassis. Extended heatsink used in the PowerEdge XR11.
Fans may be running at high or full speed for various reasons. The workload running on the server can result in high CPU utilization and thus an increase in cooling requirements. If the system is idle and fans are still at full speed, then either a hardware option (such as a high-power card or a third-party PCIe adapter) present in the server requires full fan speed, or there is a failure of sensor communication, a fan failure, or the server is operating without the chassis cover and/or air-regulating shroud. Some systems require blanks for nonpopulated hard drive slots, DIMM slots, and/or CPU sockets. Cooling for certain components may be compromised if these blanks are missing, resulting in higher fan speeds.
Thermal algorithms define the minimum system fan speeds based on ambient
temperature, system configuration and system utilization. Allowing the user to
reduce fan speed could put system cooling at risk, potentially causing system
thermal-related failures. The only instance in which the user can reduce system fan
speed is when a third-party PCIe adapter card is part of the configuration for which
a thermal algorithm provides cooling based on limited information from the card.
This response may result in overcooling of the card. In this case, the user can turn off the fan response that is associated with this card, or define a custom airflow value for the card (through the iDRAC web interface or RACADM). Turning off the third-party card fan response may reduce the fan speed if other components within the system are not requesting a higher fan speed than the response requested by the third-party PCIe adapter card. Turning off this response is not recommended unless the user is aware of the cooling requirement of the adapter card.
I hear a fan spinning but my server is not powered ON. Is that expected?
Some server platforms are designed to allow one particular fan in the system to power ON when the system is in standby (AUX) state (AC plugged in, but power button not pressed). This fan may run under some system inlet ambient conditions to ensure cooling for onboard network devices that may be active in the system AUX state.
Fan speeds are expressed in Revolutions Per Minute (RPM), but the input signal that drives the fan to run at different speeds is expressed as PWM (Pulse Width Modulation). PWM can be any number between 0% and 100%. Note that a PWM of 0% generally does not mean that a fan is OFF; 0% is typically defined as the fan's lowest operational speed. Conversely, at 100% PWM, fans run at the maximum RPM. The relationship between fan PWM and RPM is linear.
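Because the relationship is linear, fan RPM at a given PWM can be interpolated between the fan's lowest operational speed (0% PWM) and its maximum speed (100% PWM). A small Python sketch with illustrative RPM values:

    # Linear PWM-to-RPM mapping: 0% PWM is the fan's lowest operational
    # speed (not off), 100% PWM is its maximum speed.

    MIN_RPM = 3_000   # illustrative lowest operational speed at 0% PWM
    MAX_RPM = 16_000  # illustrative maximum speed at 100% PWM

    def rpm_from_pwm(pwm_percent: float) -> float:
        if not 0 <= pwm_percent <= 100:
            raise ValueError("PWM must be between 0% and 100%")
        return MIN_RPM + (pwm_percent / 100) * (MAX_RPM - MIN_RPM)

    for pwm in (0, 50, 100):
        print(f"{pwm:3d}% PWM -> {rpm_from_pwm(pwm):.0f} RPM")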
Various custom thermal settings are available and accessible using iDRAC interfaces such as RACADM, the iDRAC UI, and the BIOS HII browser. These thermal options include Custom Thermal Profiles (Maximum Performance, Maximum Performance per Watt, Sound Cap), custom fan speed options (minimum fan speed, fan speed offsets), and reduced Exhaust Temperature settings. In addition, custom airflow settings can be applied to third-party PCIe adapter cards through the RACADM and iDRAC UI interfaces. The easiest way to access these options is to connect to the iDRAC Web UI of the server and go to Cooling -> Configure Fans -> Fan Configuration.
Sound Cap is a new feature of PowerEdge 14G servers. Sound Cap was developed in response to customer requests and is for specialized environments in which minimizing acoustical output is a higher priority than peak raw performance. Sound Cap limits, or "caps", CPU power consumption and thus fan speed, resulting in a lower acoustical ceiling. Its application is unique to acoustical deployments and may result in reduced system performance.
Why are PCIe adapter cards installed based on a slot priority requirement in
the server?
There are various reasons that slot restrictions exist for certain cards. Some
common ones are:
• Certain slots are limited by PCIe lane width (such as x4, x8, or x16).
• Mechanically, a card may fit only in certain locations. This can depend on factors such as whether the card is single-wide or double-wide, and whether it is standard-length or full-length.
• Cabling that is connected to the card may require the card to be in a certain location for optimum cable routing.
• Cooling limitations in certain slots, such as airflow limitations, may impose a cooling or thermal priority.
Where can I find more information about PCIe adapter card cooling on the
server?
The best place to look for this information is within the iDRAC Web UI. From the
iDRAC home screen, select Cooling -> Fan Overview -> Configure Fans. Then
scroll down to see the “PCIe Airflow Settings”. This section displays all the PCIe
adapter slots present in the system and the maximum airflow in LFM (Linear Feet
per Minute) available at each slot (when all fans are at full speed). This section also
indicates if a particular PCIe adapter card is considered a third-party Card, and if
so, what LFM is being provided. The user has the option to customize the airflow
based on the card specifications. This feature is new with PowerEdge 14G servers
and is an industry first.
Why is the top cover of my system hot, and is that an indication of a potential cooling problem? OR Why are CPU temperatures high? OR Why is the air coming out of the server so hot?
The system top cover may get hot in local regions above the CPU heatsinks or
near the back of the system. This occurs most commonly in dense systems and in
1U servers. The localized heating of the top cover is due to the close proximity of
the cover to the CPU heatsink or to the heated exhaust air at the rear of the
system. The surface and exhaust temperatures should not exceed the safety limit of 70°C. Components such as CPUs, GPUs, and general board components are designed to run at higher temperatures without impacting component or system reliability. Users wanting to review or adjust system temperatures or exhaust air
temperature can use Custom Thermal Settings through various iDRAC interfaces to
increase fan speed (and thus system cooling) by applying any one of the Fan
Speed Offset, Minimum Fan Speed, and/or Custom Exhaust Temperature options.
Many high-power compute GPU devices that are passively cooled require platform-specific configuration restrictions and are allowed only on a limited number of platforms. Lower-power (such as less than 75 W) PCIe adapters are supported on all platforms. See platform-specific limitations to ensure compliance.
Some platforms require different CPU heat sinks based on the installed CPU TDP
or other specific hardware options. For example, shorter heat sinks and a different
air shroud are required in the R740 and R740xd to allow for proper GPU cooling.
See the individual platform details for specific information.
RIO card used in a non-liquid cooling configuration. Custom RIO card used in a Direct Liquid Cooling configuration.
− In this configuration, each riser has one GPU. Empty slots must be populated with dummy GPU modules.
− For example, if GPUs are installed in slot 31 and slot 33, then dummy GPU modules must be installed in slot 32 and slot 34. Similarly, if two GPUs are installed in slot 33 and slot 34, then dummy GPU modules must be installed in the empty slots 31 and 32.
Left GPU module. Right GPU module.
RAID in OMSA
NVMe Support
Supported Gen 4 NVMe drives include the Intel P5500/P5600, Kioxia CD6/CM6, and Samsung 1733/1735.