White Paper: Fujitsu Server Primergy Performance Report PRIMERGY RX2540 M1

White Paper  Performance Report PRIMERGY RX2540 M1
White Paper
FUJITSU Server PRIMERGY
Performance Report PRIMERGY RX2540 M1
This document contains a summary of the benchmarks executed for the FUJITSU Server
PRIMERGY RX2540 M1.
The PRIMERGY RX2540 M1 performance data are compared with the data of other
PRIMERGY models and discussed. In addition to the benchmark results, an explanation
has been included for each benchmark and for the benchmark environment.
Version
1.4
2015-07-02
http://ts.fujitsu.com/primergy Page 1 (62)

White Paper  Performance Report PRIMERGY RX2540 M1 Version: 1.4  2015-07-02
Contents
Document history ................................................................................................................................................ 3

Technical data .................................................................................................................................................... 5
SPECcpu2006 .................................................................................................................................................... 8
SPECpower_ssj2008 ........................................................................................................................................ 14
Disk I/O: Performance of storage media .......................................................................................................... 19
Disk I/O: Performance of RAID controllers ....................................................................................................... 24
SAP SD ............................................................................................................................................................. 33
OLTP-2 ............................................................................................................................................................. 36
TPC-E ............................................................................................................................................................... 40
vServCon .......................................................................................................................................................... 44
VMmark V2 ....................................................................................................................................................... 51
STREAM ........................................................................................................................................................... 58
Literature ........................................................................................................................................................... 61
Contact ............................................................................................................................................................. 62
Page 2 (62) http://ts.fujitsu.com/primergy

Document history
Version 1.0
New:
 Technical data
 SPECcpu2006
® ®
Measurements with Intel Xeon Processor E5-2600 v3 Product Family
 Disk I/O: Performance of storage media
Results for 3.5" storage media
 SAP SD
Certification number 2014028
 OLTP-2
® ®
Results for Intel Xeon Processor E5-2600 v3 Product Family
 VMmark V2
Measurement with Xeon E5-2699 v3
Version 1.1
New:
 SPECpower_ssj2008
Measurement with Xeon Xeon E5-2699 v3
 Disk I/O: Performance of RAID controllers
Measurements with “LSI SW RAID on Intel C610 (Onboard SATA)” and “PRAID EP400i” controllers
 TPC-E
Measurement with Xeon E5-2699 v3
 vServCon
® ®
Results for Intel Xeon Processor E5-2600 v3 Product Family
Updated:
 SPECcpu2006
® ®
Additional Measurements with Intel Xeon Processor E5-2600 v3 Product Family
 OLTP-2
® ®
New results for Intel Xeon Processor E5-2600 v3 Product Family
 VMmark V2
“Performance with Server Power” measurement with Xeon E5-2699 v3
“Performance with Server and Storage Power” measurement with Xeon E5-2699 v3
Version 1.2
Updated:
 Technical data
Model version PY RX2540 M1 12x 3.5" added
Modular PSU 450W platinum hp added
Modular PSU 1200W platinum hp added
 SPECcpu2006
® ®
Additional Measurements with Intel Xeon Processor E5-2600 v3 Product Family

Version 1.3
Updated:
 Technical data
Model version PY RX2540 M1 8x 2.5" expandable added
Memory module 32GB (1x32GB) 2Rx4 DDR4-2133 R ECC added
Max. number of internal hard disks increased by option for 4x REAR 2.5" SAS/SATA HDD/SSD
 SPECcpu2006
Estimates for 1-socket configurations added
 SPECpower_ssj2008
Competitive comparison added
 OLTP-2
Added new PRIMERGY systems
Minor corrections
 vServCon
Added new PRIMERGY systems
Minor corrections
 VMmark V2
New names “Server PPKW Score” and “Server and Storage PPKW Score”
Minor corrections
Version 1.4
New:
 STREAM
® ®
Measurements with Intel Xeon Processor E5-2600 v3 Product Family
Updated:
 Technical data
Memory module 64GB (1x64GB) 4Rx4 DDR4-2133 LR ECC added
 Disk I/O: Performance of RAID controllers
Model version PY RX2540 M1 8x 2.5" expandable added
Second controller instance in “LSI SW RAID on Intel C610 (Onboard SATA)” added
“PRAID CP400i” and “PRAID EP420i” controllers added
Minor changes

Technical data
PRIMERGY RX2540 M1
Decimal prefixes according to the SI standard are used for measurement units in this white paper (e.g. 1 GB
9 30
= 10 bytes). In contrast, these prefixes should be interpreted as binary prefixes (e.g. 1 GB = 2 bytes) for
the capacities of caches and memory modules. Separate reference will be made to any further exceptions
where applicable.
Model PRIMERGY RX2540 M1

PY RX2540 M1 4x 3.5' expandable
PY RX2540 M1 12x 3.5"
Model versions
PY RX2540 M1 8x 2.5" expandable
PY RX2540 M1 24x 2.5"
Form factor Rack server
®
Chipset Intel C610
Number of sockets 2
Number of processors orderable 1 or 2
® ®
Processor type Intel Xeon Processor E5-2600 v3 Product Family
Number of memory slots 24 (12 per processor)
Maximum memory configuration 1536 GB
PY RX2540 M1 4x 3.5' expandable, PY RX2540 M1 12x 3.5":
Controller with RAID 0, RAID 1 or RAID 10 for up to 4 × 3.5" SATA HDDs
Onboard HDD controller
PY RX2540 M1 8x 2.5" expandable, PY RX2540 M1 24x 2.5":
Controller with RAID 0, RAID 1 or RAID 10 for up to 8 × 2.5" SATA HDDs
5 × PCI-Express 3.0 x8
PCI slots
2 × PCI-Express 3.0 x16
PY RX2540 M1 4x 3.5' expandable: 8 × 3.5" + 4 × 2.5"
PY RX2540 M1 12x 3.5": 12 × 3.5" + 4 × 2.5"
Max. number of internal hard disks
PY RX2540 M1 8x 2.5" expandable: 28 × 2.5"
PY RX2540 M1 24x 2.5": 28 × 2.5"

Processors (since system release)

Cache QPI Rated Max. Max. TDP
Threads
Cores
Speed Frequency Turbo Memory
Processor Frequency Frequency
[MB] [GT/s] [Ghz] [Ghz] [MHz] [Watt]
Xeon E5-2623 v3 4 8 10 8.00 3.00 3.50 1866 105

Xeon E5-2637 v3 4 8 15 9.60 3.50 3.70 2133 135
Xeon E5-2603 v3 6 6 15 6.40 1.60 n/a 1600 85

Xeon E5-2609 v3 6 6 15 6.40 1.90 n/a 1600 85
Xeon E5-2620 v3 6 12 15 8.00 2.40 3.20 1866 85
Xeon E5-2643 v3 6 12 20 9.60 3.40 3.70 2133 135
Xeon E5-2630L v3 8 16 20 8.00 1.80 2.90 1866 55

Xeon E5-2630 v3 8 16 20 8.00 2.40 3.20 1866 85
Xeon E5-2640 v3 8 16 20 8.00 2.60 3.40 1866 90
Xeon E5-2667 v3 8 16 20 9.60 3.20 3.60 2133 135
Xeon E5-2650 v3 10 20 25 9.60 2.30 3.00 2133 105

Xeon E5-2660 v3 10 20 25 9.60 2.60 3.3 2133 105
Xeon E5-2650L v3 12 24 30 9.60 1.80 2.50 2133 65

Xeon E5-2670 v3 12 24 30 9.60 2.30 3.10 2133 120
Xeon E5-2680 v3 12 24 30 9.60 2.50 3.30 2133 120
Xeon E5-2690 v3 12 24 30 9.60 2.60 3.50 2133 135
Xeon E5-2683 v3 14 28 35 9.60 2.00 3.00 2133 120

Xeon E5-2695 v3 14 28 35 9.60 2.30 3.30 2133 120
Xeon E5-2697 v3 14 28 35 9.60 2.60 3.60 2133 145
Xeon E5-2698 v3 16 32 40 9.60 2.30 3.60 2133 135
Xeon E5-2699 v3 18 36 45 9.60 2.30 3.60 2133 145
All the processors that can be ordered with the PRIMERGY RX2540 M1, apart from Xeon E5-2603 v3 and
®
Xeon E5-2609 v3, support Intel Turbo Boost Technology 2.0. This technology allows you to operate the
processor with higher frequencies than the nominal frequency. Listed in the processor table is "Max. Turbo
Frequency" for the theoretical frequency maximum with only one active core per processor. The maximum
frequency that can actually be achieved depends on the number of active cores, the current consumption,
electrical power consumption and the temperature of the processor.
As a matter of principle Intel does not guarantee that the maximum turbo frequency will be reached. This is
related to manufacturing tolerances, which result in a variance regarding the performance of various
examples of a processor model. The range of the variance covers the entire scope between the nominal
frequency and the maximum turbo frequency.
The turbo functionality can be set via BIOS option. Fujitsu generally recommends leaving the "Turbo Mode"
option set at the standard setting "Enabled", as performance is substantially increased by the higher
frequencies. However, since the higher frequencies depend on general conditions and are not always
guaranteed, it can be advantageous to disable the "Turbo Mode" option for application scenarios with
intensive use of AVX instructions and a high number of instructions per clock unit, as well as for those that
require constant performance or lower electrical power consumption.

Memory modules (since system release)
Frequency [MHz]
Bit width of the
memory chips
Capacity [GB]
Load reduced
Low voltage
Registered
Memory module
Ranks
ECC
8GB (1x8GB) 1Rx4 DDR4-2133 R ECC 8 1 4 2133  
8GB (1x8GB) 2Rx8 DDR4-2133 R ECC 8 2 8 2133  
16GB (1x16GB) 2Rx4 DDR4-2133 R ECC 16 2 4 2133  
32GB (1x32GB) 2Rx4 DDR4-2133 R ECC 32 2 4 2133  
32GB (1x32GB) 4Rx4 DDR4-2133 LR ECC 32 4 4 2133   
64GB (1x64GB) 4Rx4 DDR4-2133 LR ECC 64 4 4 2133   
Power supplies (since system release) Max. number

Modular PSU 450W platinum hp 2
Modular PSU 800W titanium hp 2
Some components may not be available in all countries or sales regions.

Detailed technical information is available in the data sheet PRIMERGY RX2540 M1.

SPECcpu2006
Benchmark description
SPECcpu2006 is a benchmark which measures the system efficiency with integer and floating-point
operations. It consists of an integer test suite (SPECint2006) containing 12 applications and a floating-point
test suite (SPECfp2006) containing 17 applications. Both test suites are extremely computing-intensive and
concentrate on the CPU and the memory. Other components, such as Disk I/O and network, are not
measured by this benchmark.
SPECcpu2006 is not tied to a special operating system. The benchmark is available as source code and is
compiled before the actual measurement. The used compiler version and their optimization settings also
affect the measurement result.
SPECcpu2006 contains two different performance measurement methods: the first method (SPECint2006 or
SPECfp2006) determines the time which is required to process single task. The second method
(SPECint_rate2006 or SPECfp_rate2006) determines the throughput, i.e. the number of tasks that can be
handled in parallel. Both methods are also divided into two measurement runs, “base” and “peak” which
differ in the use of compiler optimization. When publishing the results the base values are always used; the
peak values are optional.
Compiler Measurement
Benchmark Arithmetics Type Application
optimization result
SPECint2006 integer peak aggressive
Speed single-threaded
SPECint_base2006 integer base conservative
SPECint_rate2006 integer peak aggressive
Throughput multi-threaded
SPECint_rate_base2006 integer base conservative
SPECfp2006 floating point peak aggressive
Speed single-threaded
SPECfp_base2006 floating point base conservative
SPECfp_rate2006 floating point peak aggressive
Throughput multi-threaded
SPECfp_rate_base2006 floating point base conservative
The measurement results are the geometric average from normalized ratio values which have been
determined for individual benchmarks. The geometric average - in contrast to the arithmetic average - means
that there is a weighting in favour of the lower individual results. Normalized means that the measurement is
how fast is the test system compared to a reference system. Value “1” was defined for the
SPECint_base2006-, SPECint_rate_base2006, SPECfp_base2006 and SPECfp_rate_base2006 results of
the reference system. For example, a SPECint_base2006 value of 2 means that the measuring system has
handled this benchmark twice as fast as the reference system. A SPECfp_rate_base2006 value of 4 means
that the measuring system has handled this benchmark some 4/[# base copies] times faster than the
reference system. “# base copies” specify how many parallel instances of the benchmark have been
executed.
Not every SPECcpu2006 measurement is submitted by us for publication at SPEC. This is why the SPEC
web pages do not have every result. As we archive the log files for all measurements, we can prove the
correct implementation of the measurements at any time.

Benchmark environment
System Under Test (SUT)
Hardware
® ®
Processor 2 processors of Intel Xeon Processor E5-2600 v3 Product Family
Memory 16 × 16GB (1x16GB) 2Rx4 DDR4-2133 R ECC
Software
Operating system SPECint_base2006, SPECint2006:
Xeon E5-2603 v3, E5-2609 v3, E5-2620 v3, E5-2630 v3, E5-2630L v3, E5-2650 v3, E5-
2690 v3, E5-2695 v3, E5-2697 v3, E5-2698 v3:
Red Hat Enterprise Linux Server release 7.0
All others: Red Hat Enterprise Linux Server release 6.5
SPECint_rate_base2006, SPECint_rate2006:
Xeon E5-2620 v3, E5-2630 v3, E5-2698 v3:
Red Hat Enterprise Linux Server release 7.0
SPECfp_base2006, SPECfp2006, SPECfp_rate_base2006, SPECfp_rate2006:
Xeon E5-2699 v3: Red Hat Enterprise Linux Server release 6.5
Operating system
echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled
settings
SPECint_base2006, SPECint2006:
Xeon E5-2623 v3, E5-2637 v3, E5-2640 v3, E5-2643 v3, E5-2650L v3, E5-2660 v3, E5-
2667 v3, E5-2670 v3, E5-2680 v3, E5-2683 v3:
C/C++: Version 14.0.0.080 of Intel C++ Studio XE for Linux
All others: C/C++: Version 15.0.0.090 of Intel C++ Studio XE for Linux
SPECint_rate_base2006, SPECint_rate2006:
Compiler
Xeon E5-2620 v3, E5-2630 v3, E5-2698 v3:
All others: C/C++: Version 14.0.0.080 of Intel C++ Studio XE for Linux
SPECfp_base2006, SPECfp2006, SPECfp_rate_base2006, SPECfp_rate2006:
Fortran: Version 15.0.0.090 of Intel Fortran Studio XE for Linux

Benchmark results
In terms of processors the benchmark result depends primarily on the size of the processor cache, the
support for Hyper-Threading, the number of processor cores and on the processor frequency. In the case of
processors with Turbo mode the number of cores, which are loaded by the benchmark, determines the
maximum processor frequency that can be achieved. In the case of single-threaded benchmarks, which
largely load one core only, the maximum processor frequency that can be achieved is higher than with multi-
threaded benchmarks.
The results marked (est.) are estimates.
SPECint_rate_base2006
Number of processors
SPECint_base2006
SPECint_rate2006
SPECint_rate2006
SPECint2006
Processor
Xeon E5-2623 v3 2 56.4 59.3 1 209 (est.) 216 (est.) 2 410 424
Xeon E5-2637 v3 2 61.5 64.7 1 233 (est.) 240 (est.) 2 456 470
Xeon E5-2603 v3 2 28.9 30.1 1 135 (est.) 140 (est.) 2 265 274
Xeon E5-2609 v3 2 33.9 35.2 1 156 (est.) 162 (est.) 2 306 317
Xeon E5-2620 v3 2 54.1 56.2 1 258 (est.) 269 (est.) 2 506 528
Xeon E5-2643 v3 2 64.0 67.5 1 340 (est.) 351 (est.) 2 667 688
Xeon E5-2630L v3 2 51.3 53.6 1 285 (est.) 296 (est.) 2 559 580
Xeon E5-2630 v3 2 55.6 58.1 1 338 (est.) 352 (est.) 2 662 690
Xeon E5-2640 v3 2 58.4 62.2 1 359 (est.) 370 (est.) 2 703 726
Xeon E5-2667 v3 2 63.3 66.9 1 414 (est.) 428 (est.) 2 812 838
Xeon E5-2650 v3 2 53.8 56.2 1 421 (est.) 434 (est.) 2 825 850
Xeon E5-2660 v3 2 58.4 61.7 1 453 (est.) 468 (est.) 2 888 917
Xeon E5-2650L v3 2 46.4 48.7 1 414 (est.) 429 (est.) 2 812 841
Xeon E5-2670 v3 2 56.0 59.2 1 494 (est.) 510 (est.) 2 968 999
Xeon E5-2680 v3 2 59.7 62.9 1 531 (est.) 546 (est.) 2 1040 1070
Xeon E5-2690 v3 2 62.7 65.4 1 551 (est.) 566 (est.) 2 1080 1110
Xeon E5-2683 v3 2 53.7 56.5 1 541 (est.) 561 (est.) 2 1060 1100
Xeon E5-2695 v3 2 58.9 61.6 1 582 (est.) 602 (est.) 2 1140 1180
Xeon E5-2697 v3 2 63.6 66.1 1 617 (est.) 638 (est.) 2 1210 1250
Xeon E5-2698 v3 2 63.2 65.5 1 638 (est.) 658 (est.) 2 1250 1290
Xeon E5-2699 v3 2 63.9 66.0 1 699 (est.) 719 (est.) 2 1370 1410

SPECfp_rate_base2006
SPECfp_base2006
SPECfp_rate2006
SPECfp_rate2006
SPECfp2006
Processor
Xeon E5-2623 v3 2 94.4 97.3 1 191 (est.) 196 (est.) 2 380 389
Xeon E5-2637 v3 2 101 103 1 214 (est.) 222 (est.) 2 425 440
Xeon E5-2603 v3 2 54.8 56.6 1 142 (est.) 145 (est.) 2 282 288
Xeon E5-2609 v3 2 61.0 63.1 1 163 (est.) 167 (est.) 2 325 331
Xeon E5-2620 v3 2 94.6 100 1 235 (est.) 242 (est.) 2 468 479
Xeon E5-2643 v3 2 112 116 1 286 (est.) 295 (est.) 2 569 585
Xeon E5-2630L v3 2 88.5 93.6 1 251 (est.) 259 (est.) 2 499 512
Xeon E5-2630 v3 2 102 108 1 283 (est.) 291 (est.) 2 563 577
Xeon E5-2640 v3 2 105 109 1 293 (est.) 302 (est.) 2 583 598
Xeon E5-2667 v3 2 115 119 1 330 (est.) 340 (est.) 2 656 673
Xeon E5-2650 v3 2 102 106 1 343 (est.) 354 (est.) 2 682 700
Xeon E5-2660 v3 2 108 112 1 355 (est.) 367 (est.) 2 706 726
Xeon E5-2650L v3 2 87.5 91.4 1 336 (est.) 347 (est.) 2 668 687
Xeon E5-2670 v3 2 104 108 1 381 (est.) 393 (est.) 2 759 779
Xeon E5-2680 v3 2 110 114 1 392 (est.) 406 (est.) 2 781 804
Xeon E5-2690 v3 2 114 118 1 398 (est.) 412 (est.) 2 793 815
Xeon E5-2683 v3 2 98.3 102 1 404 (est.) 417 (est.) 2 804 826
Xeon E5-2695 v3 2 103 108 1 410 (est.) 424 (est.) 2 816 840
Xeon E5-2697 v3 2 110 113 1 427 (est.) 441 (est.) 2 849 874
Xeon E5-2698 v3 2 107 112 1 436 (est.) 451 (est.) 2 868 893
Xeon E5-2699 v3 2 109 115 1 461 (est.) 478 (est.) 2 917 946
th
On 19 September 2014 the PRIMERGY RX2540 M1 with two
 Xeon E5-2643 v3 processors was ranked first in the 2-socket systems category for the
benchmark SPECint_base2006.
 Xeon E5-2667 v3 processors was ranked first for the benchmark SPECfp_base2006.
 Xeon E5-2667 v3 processors was ranked first for the benchmark SPECfp2006.
benchmark SPECint_rate_base2006.
benchmark SPECint_rate2006.
benchmark SPECfp_rate_base2006.
benchmark SPECfp_rate2006.
The results can be found at
http://www.fujitsu.com/fts/products/computing/servers/primergy/benchmarks/rx2540m1/.

The following four diagrams illustrate the throughput of the PRIMERGY RX2540 M1 in comparison to its
predecessor PRIMERGY RX300 S8, in their respective most performant configuration.
SPECcpu2006: integer performance
PRIMERGY RX2540 M1 vs. PRIMERGY RX300 S8
67.4 67.5
70 64.0
62.6
60
50
40
30
20
SPECint2006
10
SPECint_base2006
0
PRIMERGY RX300 S8 PRIMERGY RX2540 M1
2 x Xeon E5-2667 v2 2 x Xeon E5-2643 v3
SPECcpu2006: integer performance

1410
1500
1370
1250 962
1000 931
750
500
SPECint_rate2006
250
0
2 x Xeon E5-2697 v2 2 x Xeon E5-2699 v3

SPECcpu2006: floating-point performance

119
112
115
120
108
100
80
60
40
SPECfp2006
20
SPECfp_base2006
0
2 x Xeon E5-2667 v2 2 x Xeon E5-2667 v3
SPECcpu2006: floating-point performance

946
1000 917
900 696
800
677
700
600
500
400
300
200 SPECfp_rate2006
100
0
2 x Xeon E5-2697 v2 2 x Xeon E5-2699 v3

SPECpower_ssj2008
SPECpower_ssj2008 is the first industry-standard SPEC benchmark that evaluates the power and
performance characteristics of a server. With SPECpower_ssj2008 SPEC has defined standards for server
power measurements in the same way they have done for performance.
The benchmark workload represents typical server-side Java business applications. The workload is
scalable, multi-threaded, portable across a wide range of platforms and easy to run. The benchmark tests
CPUs, caches, the memory hierarchy and scalability of symmetric multiprocessor systems (SMPs), as well
as the implementation of Java Virtual Machine (JVM), Just In Time (JIT) compilers, garbage collection,
threads and some aspects of the operating system.
SPECpower_ssj2008 reports power consumption for
servers at different performance levels — from 100% to
“active idle” in 10% segments — over a set period of
time. The graduated workload recognizes the fact that
processing loads and power consumption on servers
vary substantially over the course of days or weeks. To
compute a power-performance metric across all levels,
measured transaction throughputs for each segment are
added together and then divided by the sum of the
average power consumed for each segment. The result
is a figure of merit called “overall ssj_ops/watt”. This
ratio provides information about the energy efficiency of
the measured server. The defined measurement
standard enables customers to compare it with other
configurations and servers measured with
SPECpower_ssj2008. The diagram shows a typical
graph of a SPECpower_ssj2008 result.
The benchmark runs on a wide variety of

operating systems and hardware
architectures and does not require extensive
client or storage infrastructure. The minimum
equipment for SPEC-compliant testing is two
networked computers, plus a power analyzer
and a temperature sensor. One computer is
the System Under Test (SUT) which runs
one of the supported operating systems and
the JVM. The JVM provides the environment
required to run the SPECpower_ssj2008
workload which is implemented in Java. The
other computer is a “Control & Collection
System” (CCS) which controls the operation
of the benchmark and captures the power,
performance and temperature readings for
reporting. The diagram provides an overview
of the basic structure of the benchmark
configuration and the various components.

Hardware
Model version PY RX2540 M1 4x 3.5' expandable
Processor Xeon E5-2699 v3
Network-Interface 1 × PLAN AP 1x1Gbit Cu Intel I210-T1 LP
Disk-Subsystem Onboard HDD controller
1 × DOM SATA 6G 64GB Main N H-P
Power Supply Unit 1 × Modular PSU 800W titanium hp
Software
BIOS R1.6.0
BIOS settings Hardware Prefetcher = Disabled
Adjacent Cache Line Prefetch = Disabled
DCU Streamer Prefetcher = Disabled
Onboard USB Controllers = Disabled
Power Technology = Custom
QPI Link Frequency Select = 6.4 GT/s
Turbo Mode = Disabled
Intel Virtualization Technology = Disabled
ASPM Support = L1 Only
DMI Control = Gen1
COD Enable = Enabled
Early Snoop = Disabled
Firmware 7.60A
Operating system Microsoft Windows Server 2008 R2 Enterprise SP1
Operating system Using the local security settings console, “lock pages in memory” was enabled for the user
settings running the benchmark.
Power Management: Enabled (“Fujitsu Enhanced Power Settings” power plan)
Set “Turn off hard disk after = 1 Minute” in OS.
Benchmark was started via Windows Remote Desktop Connection.
Microsoft Hotfix KB2510206 has been installed due to known problems of the group
assignment algorithm which does not create a balanced group assignment. For more
information see: http://support.microsoft.com/kb/2510206
JVM IBM J9 VM (build 2.6, JRE 1.7.0 Windows Server 2008 R2 amd64-64 20120322_106209 (JIT
enabled, AOT enabled)
JVM settings start /NODE [0,1,2,3] /AFFINITY [0x3,0xC,0x30,0xC0,0x300,0xC00,0x3000,0xC000,0x30000]
-Xmn825m -Xms975m -Xmx975m -Xaggressive -Xcompressedrefs -Xgcpolicy:gencon
-XlockReservation -Xnoloa -XtlhPrefetch -Xlp -Xconcurrentlevel0 -Xthr:minimizeusercpu
-Xgcthreads2 (-Xgcthreads1 for JVM5 and JVM23)
Other software IBM WebSphere Application Server V8.5.0.0, Microsoft Hotfix for Windows (KB2510206)

Benchmark results
The PRIMERGY RX2540 M1 achieved the following result:
SPECpower_ssj2008 = 10,654 overall ssj_ops/watt
The adjoining diagram shows the
result of the configuration described
above. The red horizontal bars show
the performance to power ratio in
ssj_ops/watt (upper x-axis) for each
target load level tagged on the y-axis
of the diagram. The blue line shows
the run of the curve for the average
power consumption (bottom x-axis) at
each target load level marked with a
small rhomb. The black vertical line
shows the benchmark result of
10,654 overall ssj_ops/watt for the
PRIMERGY RX2540 M1. This is the
quotient of the sum of the transaction
throughputs for each load level and
the sum of the average power con-
sumed for each measurement inter-
val.
The following table shows the benchmark results for the throughput in ssj_ops, the power consumption in
watts and the resulting energy efficiency for each load level.
Performance Power Energy Efficiency

Target Load ssj_ops Average Power (W) ssj_ops/watt
100% 3,257,627 271 12,001
90% 2,929,344 244 11,985
80% 2,599,922 216 12,063
70% 2,283,039 186 12,264
60% 1,957,744 161 12,181
50% 1,633,106 143 11,417
40% 1,303,216 128 10,187
30% 978,558 113 8,652
20% 652,175 98.2 6,642
10% 326,571 82.6 3,954
Active Idle 0 39.0 0
∑ssj_ops / ∑power = 10,654
The PRIMERGY RX2540 M1 achieved a new world record with this result, thus surpassing the
best result of the competition by 4.1% (date: March 18, 2015). Thus, the PRIMERGY RX2540
M1 proves itself to be the most energy-efficient server in the world. For the latest
SPECpower_ssj2008 benchmark results, visit: http://www.spec.org/power_ssj2008/results.

The comparison with the competition makes

SPECpower_ssj2008: PRIMERGY RX2540 M1 vs. competition the advantage of the PRIMERGY RX2540
12,000 M1 in the field of energy efficiency evident.
With 4.1% more energy efficiency than the
10,000 10,654 best result of the competition, the Quanta
10,238 QuantaGrid D51B-2U server, the
overall ssj_ops/watt
8,000 PRIMERGY RX2540 M1 is setting new

standards.
6,000
4,000
2,000
0
Quanta Fujitsu Server
QuantaGrid D51B-2U PRIMERGY RX2540 M1
The following diagram shows for each load level the power consumption (on the right y-axis) and the
throughput (on the left y-axis) of the PRIMERGY RX2540 M1 compared to the predecessor PRIMERGY
RX300 S8.
SPECpower_ssj2008: PRIMERGY RX2540 M1 vs. PRIMERGY RX300 S8

Thanks to the new Haswell processors the

SPECpower_ssj2008 overall ssj_ops/watt:
PRIMERGY RX2540 M1 has in comparison
with the PRIMERGY RX300 S8 a substan- 12,000 1,800
tially higher throughput. Despite the higher
power consumption this results in an overall 10,000 1,500
10,654
increase in energy efficiency in the
overall ssj_ops/watt
total power [watt]

PRIMERGY RX2540 M1 of 32%. 8,000 1,200
8,097
6,000 900
4,000 600
2,000 300
0 0
Fujitsu Server Fujitsu Server

Disk I/O: Performance of storage media

Performance measurements of disk subsystems for PRIMERGY servers are used to assess their
performance and enable a comparison of the different storage connections for PRIMERGY servers. As
standard, these performance measurements are carried out with a defined measurement method, which
models the accesses of real application scenarios on the basis of specifications.
The essential specifications are:
 Share of random accesses / sequential accesses
 Share of read / write access types
 Block size (kB)
 Number of parallel accesses (# of outstanding I/Os)
A given value combination of these specifications is known as “load profile”. The following five standard load
profiles can be allocated to typical application scenarios:
Standard load Access Type of access Block size Application

profile read write [kB]
File copy random 50% 50% 64 Copying of files
File server random 67% 33% 64 File server
Database (data transfer)
Database random 67% 33% 8
Mail server
Database (log file),
Streaming sequential 100% 0% 64 Data backup;
Video streaming (partial)
Restore sequential 0% 100% 64 Restoring of files
In order to model applications that access in parallel with a different load intensity, the “# of Outstanding
I/Os” is increased, starting with 1, 3, 8 and going up to 512 (from 8 onwards in increments to the power of
two).
The measurements of this document are based on these standard load profiles.
The main results of a measurement are:

 Throughput [MB/s] Throughput in megabytes per second
 Transactions [IO/s] Transaction rate in I/O operations per second
 Latency [ms] Average response time in ms
The data throughput has established itself as the normal measurement variable for sequential load profiles,
whereas the measurement variable “transaction rate” is mostly used for random load profiles with their small
block sizes. Data throughput and transaction rate are directly proportional to each other and can be
transferred to each other according to the formula
Data throughput [MB/s] = Transaction rate [IO/s] × Block size [MB]

Transaction rate [IO/s] = Data throughput [MB/s] / Block size [MB]
12
This section specifies capacities of storage media on a basis of 10 (1 TB = 10 bytes) while all other
20
capacities, file sizes, block sizes and throughputs are specified on a basis of 2 (1 MB/s = 2 bytes/s).
All the details of the measurement method and the basics of disk I/O performance are described in the white
paper “Basics of Disk I/O Performance”.

All the measurement results discussed in this section apply for the hardware and software components listed
below:
Hardware
Controller 1 × RAID Ctrl SAS 6G 0/1 (D2607)
Storage media SSD HDD
Intel SSDSC2BA100G3C HGST HUC156030CSS204
Intel SSDSC2BA800G3C Seagate ST1000NM0023
Seagate ST1000NM0033
Western Digital WD1001FYYG
Western Digital WD1003FBYX
Western Digital WD2000FYYZ
Western Digital WD5003ABYX
Software
Operating system Microsoft Windows Server 2008 Enterprise x64 Edition SP2
Administration software ServerView RAID Manager 5.7.2
File system NTFS
Measuring tool Iometer 2006.07.27
Measurement data 32 GB measurement file
Some components may not be available in all countries / sales regions.
Benchmark results
The results shown here are intended to help you select the appropriate storage media for the PRIMERGY
RX2540 M1 under the aspect of disk-I/O performance. For this purpose, a single storage medium was
measured in the configuration specified in the subsection Benchmark environment. The RAID controller
RAID Ctrl SAS 6G 0/1 (D2607) was used for the measurements. The following table summarizes its most
important features.
Controller name Cache Supported interfaces RAID levels
SATA 3G/6G
RAID Ctrl SAS 6G 0/1 (D2607) - PCIe 2.0 x8 0, 1, 1E, 10
SAS 3G/6G

Storage media
When selecting the type and number of storage media you can move the weighting in the direction of
storage capacity, performance, security or price. The following types of storage media can be used in the
PRIMERGY RX2540 M1:
Storage medium type Interface Form factor

HDD SATA 6G 3.5"
)
HDD SAS 12G 2.5" *
HDD SAS 6G 3.5"
)
SSD SATA 6G 2.5" *
*) partly also available with a 3.5" tray
HDDs and SSDs are operated via host bus adapters, usually RAID controllers, with a SATA or SAS
interface. The interface of the RAID controller to the chipset of the systemboard is typically PCIe or, in the
case of the integrated onboard controllers, an internal bus interface of the systemboard.
Of all the storage medium types SSDs offer by far the highest transaction rates for random load profiles as
well as the shortest access times. In return, however, the price per gigabyte of storage capacity is
substantially higher.
Cache settings
In most cases, the cache of HDDs has a great influence on disk-I/O performance. It is frequently regarded as
a security problem in case of power failure and is thus switched off. On the other hand, it was integrated by
hard disk manufacturers for the good reason of increasing the write performance. For performance reasons it
is therefore advisable to enable the hard disk cache. To prevent data loss in case of power failure you are
recommended to equip the system with a UPS.
For the purpose of easy and reliable handling of the settings for RAID controllers and hard disks it is
advisable to use the RAID-Manager software “ServerView RAID” that is supplied for PRIMERGY servers. All
the cache settings for controllers and hard disks can usually be made en bloc – specifically for the
application – by using the pre-defined modi “Performance” or “Data Protection”. The “Performance” mode
ensures the best possible performance settings for the majority of the application scenarios.
Performance values
The performance values of the PRIMERGY RX2540 M1 are summarized in the following tables, in each case
specifically for a single storage medium and with various access types and block sizes. The established
measurement variables, as already mentioned in the subsection Benchmark description, are used here.
Thus, transaction rate is specified for random accesses and data throughput for sequential accesses. To
avoid any confusion among the measurement units the tables have been separated for the two access
types.
The table cells contain the maximum achievable values. This means that each value is the maximum
achievable value of the whole range of load intensities (# of Outstanding I/Os). In order to also visualize the
numerical values each table cell is highlighted with a horizontal bar, the length of which is proportional to the
numerical value in the table cell. All bars shown in the same scale of length have the same color. In other
words, a visual comparison only makes sense for table cells with the same colored bars. Since the horizontal
bars in the table cells depict the maximum achievable performance values, they are shown by the color
getting lighter as you move from left to right. The light shade of color at the right end of the bar tells you that
the value is a maximum value and can only be achieved under optimal prerequisites. The darker the shade
becomes as you move to the left, the more frequently it will be possible to achieve the corresponding value
in practice.

SSDs in comparison with the most powerful HDD

Random accesses (maximum performance values in IO/s):
PRIMERGY RX2540 M1
Capacity Transactions [IO/s]
Storage device Interface
[GB] Database Fileserver Filecopy
800 Intel SSDSC2BA800G3C SATA 6G 35120 5554 5313
600 HGST HUC156060CSS204 (HDD) SAS 12G 872 749 759
Sequential accesses (maximum performance values in MB/s):

PRIMERGY RX2540 M1
Capacity Throughput [MB/s]
[GB] Streaming Restore
800 Intel SSDSC2BA800G3C SATA 6G 382 342
600 HGST HUC156060CSS204 (HDD) SAS 12G 236 236

HDDs
Random accesses (maximum performance values in IO/s):
PRIMERGY RX2540 M1
Capacity Transactions [IO/s]
[GB] Database Fileserver Filecopy
4000 Seagate ST4000NM0033 SATA 6G 413 363 354
4000 Seagate ST4000NM0023 SAS 6G 410 354 353
4000 Western Digital WD4001FYYG SAS 6G 331 294 310
4000 Western Digital WD4000FYYZ SATA 6G 324 285 293
1000 Western Digital WD1003FBYX SATA 6G 243 220 231
600 HGST HUC156060CSS204 SAS 12G 872 749 759
500 Western Digital WD5003ABYX SATA 6G 242 219 228
450 HGST HUC156045CSS204 SAS 12G 864 766 789
300 HGST HUC156030CSS204 SAS 12G 783 663 678
Sequential accesses (maximum performance values in MB/s):

PRIMERGY RX2540 M1
Capacity Throughput [MB/s]
[GB] Streaming Restore
4000 Seagate ST4000NM0033 SATA 6G 174 173
4000 Seagate ST4000NM0023 SAS 6G 186 185
4000 Western Digital WD4001FYYG SAS 6G 173 173
4000 Western Digital WD4000FYYZ SATA 6G 165 164
1000 Western Digital WD1003FBYX SATA 6G 130 129
600 HGST HUC156060CSS204 SAS 12G 236 236
500 Western Digital WD5003ABYX SATA 6G 132 131
450 HGST HUC156045CSS204 SAS 12G 237 237
300 HGST HUC156030CSS204 SAS 12G 236 236

Disk I/O: Performance of RAID controllers

Performance measurements of disk subsystems for PRIMERGY servers are used to assess their
performance and enable a comparison of the different storage connections for PRIMERGY servers. As
standard, these performance measurements are carried out with a defined measurement method, which
models the accesses of real application scenarios on the basis of specifications.
The essential specifications are:
 Share of random accesses / sequential accesses
 Share of read / write access types
 Block size (kB)
 Number of parallel accesses (# of outstanding I/Os)
A given value combination of these specifications is known as “load profile”. The following five standard load
profiles can be allocated to typical application scenarios:
Standard load Access Type of access Block size Application

profile read write [kB]
File copy random 50% 50% 64 Copying of files
File server random 67% 33% 64 File server
Database (data transfer)
Database random 67% 33% 8
Mail server
Database (log file),
Streaming sequential 100% 0% 64 Data backup;
Video streaming (partial)
Restore sequential 0% 100% 64 Restoring of files
In order to model applications that access in parallel with a different load intensity, the “# of Outstanding
I/Os” is increased, starting with 1, 3, 8 and going up to 512 (from 8 onwards in increments to the power of
two).
The measurements of this document are based on these standard load profiles.
The main results of a measurement are:

 Throughput [MB/s] Throughput in megabytes per second
 Transactions [IO/s] Transaction rate in I/O operations per second
 Latency [ms] Average response time in ms
The data throughput has established itself as the normal measurement variable for sequential load profiles,
whereas the measurement variable “transaction rate” is mostly used for random load profiles with their small
block sizes. Data throughput and transaction rate are directly proportional to each other and can be
transferred to each other according to the formula
Data throughput [MB/s] = Transaction rate [IO/s] × Block size [MB]

Transaction rate [IO/s] = Data throughput [MB/s] / Block size [MB]
12
This section specifies capacities of storage media on a basis of 10 (1 TB = 10 bytes) while all other
20
capacities, file sizes, block sizes and throughputs are specified on a basis of 2 (1 MB/s = 2 bytes/s).
All the details of the measurement method and the basics of disk I/O performance are described in the white
paper “Basics of Disk I/O Performance”.

All the measurement results discussed in this chapter were determined using the hardware and software
components listed below:

Hardware
Controller 1 × “LSI SW RAID on Intel C610 (Onboard SATA)”
1 × “PRAID CP400i”
1 × “PRAID EP400i”
1 × “PRAID EP420i”
Drive 4 × 3.5" SATA HDD Seagate ST3000NM0033
4 × 2.5" SATA HDD Seagate ST91000640NS
4 × 2.5" SATA SSD Intel SSDSC2BA400G3C
24 × 2.5" SAS SSD Toshiba PX02SMF040
24 × 2.5" SAS HDD HGST HUC156045CSS204
Software
BIOS settings Intel Virtualization Technology = Disabled
VT-d = Disabled
Energy Performance = Performance
Utilization Profile = Unbalanced
CPU C6 Report = Disabled
Operating system Microsoft Windows Server 2012 Standard
Operating system Choose or customize a power plan: High performance
settings For the processes that create disk I/Os: set the AFFINITY to the CPU node to which the
PCIe slot of the RAID controller is connected
Administration ServerView RAID Manager 5.7.2 / 6.1.4
software
Initialization of RAID RAID arrays are initialized before the measurement with an elementary block size of 64 kB
arrays (“stripe size”)
File system NTFS
Measuring tool Iometer 2006.07.27
Measurement data Measurement files of 32 GB with 1 – 8 hard disks; 64 GB with 9 – 16 hard disks;
128 GB with 17 or more hard disks

Benchmark results
The results presented here are designed to help you choose the right solution from the various configuration
options of the PRIMERGY RX2540 M1 in the light of disk-I/O performance. Various combinations of RAID
controllers and storage media will be analyzed below. Information on the selection of storage media
themselves is to be found in the section “Disk I/O: Performance of storage media”.
Hard disks
The hard disks are the first essential component. If there is a reference below to “hard disks”, this is meant
as the generic term for HDDs (“hard disk drives”, in other words conventional hard disks) and SSDs (“solid
state drives”, i.e. non-volatile electronic storage media).
Mixed drive configurations of SAS and SATA hard disks in one system are permitted, unless they are
excluded in the configurator for special hard disk types.
More hard disks per system are possible as a result of using 2.5" hard disks instead of 3.5" hard disks.
Consequently, the load that each individual hard disk has to overcome decreases and the maximum overall
performance of the system increases.
More detailed performance statements about hard disk types are available in the section “Disk I/O:
Performance of storage media” in this performance report.
Model versions
The maximum number of hard disks in the system depends on the system configuration. The following table
lists the essential cases. Only the highest supported version is named for all the interfaces we have dealt
with in this section.
Form Connection Number of PCIe Maximum number
Interface
factor type controllers of hard disks
2.5", 3.5" SATA 6G direct 0 8
2.5", 3.5" SATA 6G, SAS 12G direct 1 8
3.5" SATA 6G, SAS 12G Expander 1 12
2.5" SATA 6G, SAS 12G Expander 1 24

RAID controller
In addition to the hard disks the RAID controller is the second performance-determining key component. In
the case of these controllers the “modular RAID” concept of the PRIMERGY servers offers a plethora of
options to meet the various requirements of a wide range of different application scenarios.
The following table summarizes the most important features of the available RAID controllers of the
PRIMERGY RX2540 M1. A short alias is specified here for each controller, which is used in the subsequent
list of the performance values.
Controller name Alias Cache Supported In the system BBU/

interfaces FBU
Max. # disks RAID levels
per controller
LSI SW RAID on Intel Onboard C610 - SATA 6G - 4 × 2.5" 0, 1, 10 -/-
C610 (Onboard SATA) 4 × 3.5"
PRAID CP400i PRAID CP400i - SATA 6G PCIe 3.0 8 × 2.5" 0, 1, 1E, 5, -/-
SAS 12G x8 8 × 3.5" 10, 50
PRAID EP400i PRAID EP400i 1 GB SATA 6G PCIe 3.0 24 × 2.5" 0, 1, 1E, 5, 6, -/
SAS 12G x8 12 × 3.5" 10, 50, 60
PRAID EP420i PRAID EP420i 2 GB SATA 6G PCIe 3.0 24 × 2.5" 0, 1, 1E, 5, 6, -/
SAS 12G x8 12 × 3.5" 10, 50, 60
Onboard RAID controllers are implemented in the chipset on the server system board and use the CPU of
the server for the RAID functionality. This simple solution does not require a PCIe slot. The chipset does not
communicate with the CPU via PCIe, but via the “Direct Media Interface”, in short DMI. The PRIMERGY
RX2540 M1 has the Intel C610 chipset, in which two onboard RAID controllers are integrated. Each of these
controllers can be used by the PRIMERGY RAID Management to form arrays consisting of up to four hard
disks. The alias of the onboard controller is used in this document to refer to one controller instance in the
chipset.
System-specific interfaces
The interfaces of a controller in CPU direction (DMI or PCIe) and in the direction of hard disks (SAS or
SATA) have in each case specific limits for data throughput. These limits are listed in the following table. The
minimum of these two values is a definite limit, which cannot be exceeded. This value is highlighted in bold
in the following table.
Controller Effective in the configuration Connection

alias via
# Disk-side Limit for # CPU-side Limit for expander
data channels throughput of data channels throughput of
disk interface CPU-side
interface
1 × Onboard C610 1 × 4 × SATA 6G 2060 MB/s 4 × DMI 2.0 1716 MB/s -
*)
2 × Onboard C610 2 × 4 × SATA 6G 4120 MB/s 4 × DMI 2.0 1716 MB/s -
PRAID CP400i 8 × SAS 12G 8240 MB/s 8 × PCIe 3.0 6761 MB/s -/
PRAID EP400i 8 × SAS 12G 8240 MB/s 8 × PCIe 3.0 6761 MB/s -/
PRAID EP420i 8 × SAS 12G 8240 MB/s 8 × PCIe 3.0 6761 MB/s -/
*) The second controller instance doesn‘t increase the throughput limit on the CPU side
More details about the RAID controllers of the PRIMERGY systems are available in the white paper “RAID
Controller Performance”.

Settings
In most cases, the cache of HDDs has a great influence on disk-I/O performance. It is frequently regarded as
a security problem in case of power failure and is thus switched off. On the other hand, it was integrated by
hard disk manufacturers for the good reason of increasing the write performance. For performance reasons it
is therefore advisable to enable the hard disk cache. To prevent data loss in case of power failure you are
recommended to equip the system with a UPS.
In the case of controllers with a cache there are several parameters that can be set. The optimal settings can
depend on the RAID level, the application scenario and the type of data medium. In the case of RAID levels
5 and 6 in particular (and the more complex RAID level combinations 50 and 60) it is obligatory to enable the
controller cache for application scenarios with write share. If the controller cache is enabled, the data
temporarily stored in the cache should be safeguarded against loss in case of power failure. Suitable
accessories are available for this purpose (e.g. a BBU or FBU).
For the purpose of easy and reliable handling of the settings for RAID controllers and hard disks it is
advisable to use the RAID-Manager software “ServerView RAID” that is supplied for PRIMERGY servers. All
the cache settings for controllers and hard disks can usually be made en bloc – specifically for the
application – by using the pre-defined modi “Performance” or “Data Protection”. The “Performance” mode
ensures the best possible performance settings for the majority of the application scenarios.
More information about the setting options of the controller cache is available in the white paper “RAID
Controller Performance”.
Performance values
In general, disk-I/O performance of a RAID array depends on the type and number of hard disks, on the
RAID level and on the RAID controller. If the limits of the system-specific interfaces are not exceeded, the
statements on disk-I/O performance are therefore valid for all PRIMERGY systems. This is why all the
performance statements of the document “RAID Controller Performance” also apply for the PRIMERGY
RX2540 M1 if the configurations measured there are also supported by this system.
The performance values of the PRIMERGY RX2540 M1 are listed in table form below, specifically for
different RAID levels, access types and block sizes. Substantially different configuration versions are dealt
with separately. The established measurement variables, as already mentioned in the subsection
Benchmark description, are used here. Thus, transaction rate is specified for random accesses and data
throughput for sequential accesses. To avoid any confusion among the measurement units the tables have
been separated for the two access types.
The table cells contain the maximum achievable values. This has three implications: On the one hand hard
disks with optimal performance were used (the components used are described in more detail in the
subsection Benchmark environment). Furthermore, cache settings of controllers and hard disks, which are
optimal for the respective access scenario and the RAID level, are used as a basis. And ultimately each
value is the maximum value for the entire load intensity range (# of outstanding I/Os).
In order to also visualize the numerical values each table cell is highlighted with a horizontal bar, the length
of which is proportional to the numerical value in the table cell. All bars shown in the same scale of length
have the same color. In other words, a visual comparison only makes sense for table cells with the same
colored bars.
Since the horizontal bars in the table cells depict the maximum achievable performance values, they are
shown by the color getting lighter as you move from left to right. The light shade of color at the right end of
the bar tells you that the value is a maximum value and can only be achieved under optimal prerequisites.
The darker the shade becomes as you move to the left, the more frequently it will be possible to achieve the
corresponding value in practice.

2.5" - Random accesses (maximum performance values in IO/s):
PRIMERGY RX2540 M1
Model version PY RX2540 M1 24x 2.5'
Configuration version
HDDs random
HDDs random
SSDs random
SSDs random
64 kB blocks
64 kB blocks
8 kB blocks
8 kB blocks
RAID level
67% read
67% read
67% read
67% read
[IO/s]
[IO/s]
[IO/s]
[IO/s]
Controller
Hard disk
#Disks
RAID
type
2 1 464 389 47337 7870

ST91000640NS SATA HDD
Onboard C610 4 0 915 500 78887 14951
SSDSC2BA400G3 SATA SSD
4 10 714 415 63426 12256
2 1 1290 1112 75925 12445
HUC156045CSS204 SAS HDD 8 10 3948 2524 100776 55843
PRAID CP400i
PX02SMF040 SAS SSD 8 0 5490 3466 137616 77081
8 5 2827 1920 29911 19148
2 1 1630 883 78733 12318
8 10 6239 2818 113462 58778
8 0 7828 3783 132049 81445
HUC156045CSS204 SAS HDD
PRAID EP400i 8 5 2903 1779 54614 23046
PX02SMF040 SAS SSD
16 10 10401 5104 110528 78017
24 0 17737 9051 132349 99988
24 5 9675 5375 54107 21825
2 1 1949 1085 80178 12460
8 10 6150 3034 105915 58569
8 0 8065 3887 123219 79893
PRAID EP420i 8 5 3194 1766 54214 22894
PX02SMF040 SAS SSD
16 10 13624 6827 110944 78403
24 0 17713 9263 132539 100570
24 5 9705 5330 54287 22398

2.5" - Sequential accesses (maximum performance values in MB/s):
PRIMERGY RX2540 M1
HDDs sequential
HDDs sequential
SSDs sequential
SSDs sequential
64 kB blocks
64 kB blocks
64 kB blocks
64 kB blocks
100% write
100% write
100% read
100% read
RAID level
[MB/s]
[MB/s]
[MB/s]
[MB/s]
Controller
Hard disk
#Disks
RAID
type
2 1 112 108 726 443

ST91000640NS SATA HDD
Onboard C610 4 0 429 425 1264 1190
4 10 221 213 1027 605
2 1 394 235 1601 421
HUC156045CSS204 SAS HDD 8 10 1006 913 5918 1652
PRAID CP400i
PX02SMF040 SAS SSD 8 0 1816 1820 5838 3295
8 5 1577 1583 5844 1868
2 1 411 235 1596 420
8 10 1001 926 5873 1653
8 0 1836 1808 5818 3295
PRAID EP400i 8 5 1600 1591 5790 2651
PX02SMF040 SAS SSD
16 10 1949 1838 5917 2917
24 0 5320 5168 5914 6518
24 5 4984 2764 5914 2907
2 1 441 274 1595 421
8 10 1027 958 5888 1650
8 0 1919 1851 5848 3281
PRAID EP420i 8 5 1636 1605 5847 2611
PX02SMF040 SAS SSD
16 10 1967 1805 5914 2898
24 0 5272 5308 5912 6522
24 5 5021 2682 5912 2857

3.5" - Random accesses (maximum performance values in IO/s):
PRIMERGY RX2540 M1
HDDs random
HDDs random
SSDs random
SSDs random
64 kB blocks
64 kB blocks
8 kB blocks
8 kB blocks
RAID level
67% read
67% read
67% read
67% read
[IO/s]
[IO/s]
[IO/s]
[IO/s]
Controller
Hard disk
#Disks
RAID
type
2 1 487 435 47337 7870

ST3000NM0033 SATA HDD
Onboard C610 4 0 1081 609 78887 14951
4 10 813 464 63426 12256
2 1 1290 1112 75925 12445
HUC156045CSS204 SAS HDD 8 10 3948 2524 100776 55843
PRAID CP400i
PX02SMF040 SAS-SSD 8 0 5490 3466 137616 77081
8 5 2827 1920 29911 19148
2 1 1630 883 78733 12318
8 10 6239 2818 113462 58778
8 0 7828 3783 132049 81445
PRAID EP400i 8 5 2903 1779 54614 23046
PX02SMF040 SAS-SSD
12 10 7795 3837 111727 74612
12 0 10185 5113 133129 96384
12 5 4319 2631 54586 22676
2 1 1949 1085 80178 12460
HUC156045CSS204 SAS HDD 12 10 8104 3932 104772 72461
PRAID EP420i
PX02SMF040 SAS-SSD 12 0 10608 5255 123978 90934
12 5 5284 2885 53800 22547

3.5" - Sequential accesses (maximum performance values in MB/s):
PRIMERGY RX2540 M1
HDDs sequential
HDDs sequential
SSDs sequential
SSDs sequential
64 kB blocks
64 kB blocks
64 kB blocks
64 kB blocks
100% write
100% write
100% read
100% read
RAID level
[MB/s]
[MB/s]
[MB/s]
[MB/s]
Controller
Hard disk
#Disks
RAID
type
2 1 178 173 726 443

ST3000NM0033 SATA HDD
Onboard C610 4 0 667 672 1264 1190
4 10 350 336 1027 605
2 1 394 235 1601 421
HUC156045CSS204 SAS HDD 8 10 1006 913 5918 1652
PRAID CP400i
PX02SMF040 SAS-SSD 8 0 1816 1820 5838 3295
8 5 1577 1583 5844 1868
2 1 411 235 1596 420
8 10 997 909 5873 1653
8 0 1818 1571 5818 3295
PRAID EP400i 8 5 1609 1545 5790 2651
PX02SMF040 SAS-SSD
12 10 1479 1342 5900 2438
12 0 2697 2694 5898 4824
12 5 2432 2483 5898 2811
2 1 441 274 1595 421
HUC156045CSS204 SAS HDD 12 10 1496 1359 5932 2410
PRAID EP420i
PX02SMF040 SAS-SSD 12 0 2715 2685 5919 4762
12 5 2532 2437 5918 2762
Conclusion
At full configuration with powerful hard disks the PRIMERGY RX2540 M1 achieves a throughput of up to
6522 MB/s for sequential load profiles and a transaction rate of up to 137616 IO/s for typical, random
application scenarios.
For best possible performance we recommend one of the plug-in PCIe controllers. For more than eight hard
disks a controller with cache is required.
To operate SSDs within the maximum performance range the PRAID CP400i is already suited for the simpler
RAID levels 0, 1 and 10, and a controller with cache is to be preferred for RAID 5.
In the event of HDDs the controller cache for random load profiles with a significant write share has
performance advantages for all RAID levels.

SAP SD
The SAP application software consists of modules used to manage all standard business processes. These
include modules for ERP (Enterprise Resource Planning), such as Assemble-to-Order (ATO), Financial
Accounting (FI), Human Resources (HR), Materials Management (MM), Production Planning (PP) plus Sales
and Distribution (SD), as well as modules for SCM (Supply Chain Management), Retail, Banking, Utilities, BI
(Business Intelligence), CRM (Customer Relation Management) or PLM (Product Lifecycle Management).
The application software is always based on a database so that a SAP configuration consists of the
hardware, the software components operating system, zhe database and the SAP software itself.
SAP AG has developed SAP Standard Application Benchmarks in order to verify the performance, stability
and scaling of a SAP application system. The benchmarks, of which SD Benchmark is the most commonly
used and most important, analyze the performance of the entire system and thus measure the quality of the
integrated individual components.
The benchmark differentiates between a 2-tier and a 3-tier configuration. The 2-tier configuration has the
SAP application and database installed on one server. With a 3-tier configuration the individual components
of the SAP application can be distributed via several servers and an additional server handles the database.
The entire specification of the benchmark developed by SAP AG, Walldorf, Germany can be found at:
http://www.sap.com/benchmark.
The measurement set-up is symbolically illustrated below:
2-tier environment
Server Disk subsystem
Network
Benchmark System Under Test (SUT)

driver


Hardware
Processor 2 × Xeon E5-2699 v3
Network interface 1Gbit/s LAN
Disk subsystem PRIMERGY RX2540 M1:
1 × SSD SATA 6G 400GB Main 3.5' H-P EP
1 × FC Ctrl 8Gb/s 2 Chan LPe12002 MMF LC
1 × FibreCAT CX4-480 Storage Unit
Power Supply Unit 1 × Modular PSU 800W platinum hp
Software
BIOS settings DDR Performance = Performance Optimized
Operating system Microsoft Windows Server 2012 Datacenter
Database Microsoft SQL Server 2012 (64-bit)
SAP Business Suite
SAP enhancement package 5 for SAP ERP 6.0
Software
Benchmark driver
Hardware
Model PRIMERGY RX300 S4
Processor 2 × Xeon X5460
Memory 32 GB
Network interface 1Gbit/s LAN
Software
Operating system SUSE Linux Enterprise Server 11 SP1
Benchmark results
Number of SAP SD benchmark users 16,000
Average dialog response time 0.97 seconds
Throughput
Fully processed order line items/hour 1,750,000
Dialog steps/hour 5,250,000
SAPS 87,500
Average database request time (dialog/update) 0.014 sec / 0.026 sec
CPU utilization of central server 98%
Operating system, central server Windows Server 2012 Datacenter Edition
RDBMS SQL Server 2012
SAP Business Suite software SAP enhancement package 5 for SAP ERP 6.0
Configuration Fujitsu PRIMERGY RX2540 M1
Central Server 2 processors / 36 cores / 72 threads
Intel Xeon E5-2699 v3, 2.3GHz, 64KB L1 cache and 256KB L2
cache per core, 45 MB L3 cache per processor
256 GB main memory

The PRIMERGY RX2540 M1 obtained the best 2 processor, two-tier SAP SD Standard
Application Benchmark result on Windows (as of September 10, 2014). The latest SAP SD 2-tier
results can be found at
http://www.sap.com/solutions/benchmark/sd2tier.epx.
Two-Tier SAP SD results on Windows for 2 processor servers:

PRIMERGY RX2540 M1 vs. next best 2-socket servers
Fujitsu PRIMERGY RX2540 M1
2 x Xeon E5-2699 v3
2 processors/36 cores/72 threads
16000
Windows Server 2012 / SQL Server 2012
Certification number: 2014028
IBM System x3650 M5

2 x Xeon E5-2699 v3 16000
Windows Server 2012 / DB2 10
Cisco UCS B260 M4 12280

2 x Xeon E7-4890 v2
Windows Server 2012 / SQL Server 2012
SAP enhancement package 5 for SAP ERP 6.0 0 5000 10000 15000
Number of Benchmark Users
The following diagram illustrates the throughput of the PRIMERGY RX2540 M1 in comparison to its
predecessor, the PRIMERGY RX300 S8, in the respective most performant configuration.
Two-Tier SAP SD results: PRIMERGY RX2540 M1 vs. predecessor
Fujitsu PRIMERGY RX2540 M1

2 x Xeon E5-2699 v3
256 GB main memory
Windows Server 2012 Datacenter Edition 16000
SQL Server 2012
Fujitsu PRIMERGY RX300 S8

2 x Xeon E5-2697 v2
256 GB main memory
10240
Windows Server 2012 Standard Edition
SQL Server 2012
0 5000 10000 15000
Number of Benchmark Users

OLTP-2
OLTP stands for Online Transaction Processing. The OLTP-2 benchmark is based on the typical application
scenario of a database solution. In OLTP-2 database access is simulated and the number of transactions
achieved per second (tps) determined as the unit of measurement for the system.
In contrast to benchmarks such as SPECint and TPC-E, which were standardized by independent bodies
and for which adherence to the respective rules and regulations are monitored, OLTP-2 is an internal
benchmark of Fujitsu. OLTP-2 is based on the well-known database benchmark TPC-E. OLTP-2 was
designed in such a way that a wide range of configurations can be measured to present the scaling of a
system with regard to the CPU and memory configuration.
Even if the two benchmarks OLTP-2 and TPC-E simulate similar application scenarios using the same load
profiles, the results cannot be compared or even treated as equal, as the two benchmarks use different
methods to simulate user load. OLTP-2 values are typically similar to TPC-E values. A direct comparison, or
even referring to the OLTP-2 result as TPC-E, is not permitted, especially because there is no price-
performance calculation.
Further information can be found in the document Benchmark Overview OLTP-2.
Driver Tier A Tier B
Network Network
Application Server Database Server Disk

subsystem
Clients System Under Test (SUT)
All results were determined by way of example on a PRIMERGY RX2540 M1.

Database Server (Tier B)

Hardware
® ®
Processor Intel Xeon Processor E5-2600 v3 Product Family
Memory 1 processor: 8 × 32GB (1x32GB) 4Rx4 DDR4-2133 LR ECC
2 processors: 16 × 32GB (1x32GB) 4Rx4 DDR4-2133 LR ECC
Network interface 2 × onboard LAN 10 Gb/s
Disk subsystem RX2540 M1: Onboard RAID controller PRAID EP400i
2 × 300 GB 15k rpm SAS Drive, RAID1 (OS),
4 × 450 GB 15k rpm SAS Drive, RAID10 (LOG)
5 × LSI MegaRAID SAS 9286CV-8e or 5 × PRAID EP420e
(same performance with OLTP-2)
5 × JX40: 13 × 400 GB SSD Drive each, RAID5 (data)
Software
BIOS Version R1.0.0
Operating system Microsoft Windows Server 2012 R2 Standard
Database Microsoft SQL Server 2014 Enterprise
Application Server (Tier A)

Hardware
Model 1 × PRIMERGY RX200 S8
Memory 64 GB, 1600 MHz registered ECC DDR3
1 × Dual Port LAN 10 Gb/s
Disk subsystem 2 × 250 GB 7.2k rpm SATA Drive
Software
Operating system Microsoft Windows Server 2012 Standard
Client
Hardware
Processor 2 × Xeon E5-2670
Memory 32 GB, 1600 MHz registered ECC DDR3
1 × Dual Port LAN 1Gb/s
Disk subsystem 1 × 250 GB 7.2k rpm SATA Drive
Software
Operating system Microsoft Windows Server 2008 R2 Standard
Benchmark OLTP-2 Software EGen version 1.13.0

Benchmark results
Database performance greatly depends on the configuration options with CPU, memory and on the
connectivity of an adequate disk subsystem for the database. In the following scaling considerations for the
processors we assume that both the memory and the disk subsystem has been adequately chosen and is
not a bottleneck.
A guideline in the database environment for selecting main memory is that sufficient quantity is more
important than the speed of the memory accesses. This why a configuration with a total memory of 512 GB
was considered for the measurements with two processors and a configuration with a total memory of
256 GB for the measurements with one processor. Both memory configurations have memory access of
2133 MHz. Further information about memory performance can be found in the White Paper Memory
performance of Xeon E5-2600 v3 (Haswell-EP)-based systems.
The following diagram shows the OLTP-2 transaction rates that can be achieved with one and two
® ®
processors of the Intel Xeon Processor E5-2600 v3 Product Family.
OLTP-2 tps
E5-2699 v3 - 18C, HT 3768.70
2136.52
E5-2698 v3 - 16C, HT 3435.90

1915.90
E5-2697 v3 - 14C, HT 3235.78

1842.92
E5-2695 v3 - 14C, HT 2985.40
1689.76
E5-2683 v3 - 14C, HT 2889.68
1606.03
E5-2690 v3 - 12C, HT 2838.84

1586.30
E5-2680 v3 - 12C, HT 2765.09
1545.09
E5-2670 v3 - 12C, HT 2601.38
1454.37
E5-2650L v3 - 12C, HT 2188.13
1219.78
E5-2660 v3 - 10C, HT 2357.78

1310.06
E5-2650 v3 - 10C, HT 2232.40
1240.39
E5-2667 v3 - 8C, HT 2244.71

1224.02
E5-2640 v3 - 8C, HT 1996.46
1068.08
E5-2630L v3 - 8C, HT 1623.49
874.06
E5-2630 v3 - 8C, HT 1906.56
1019.98
E5-2643 v3 - 6C, HT 1823.73

979.99
E5-2620 v3 - 6C, HT 1449.64
773.16
E5-2609 v3 - 6C 732.37
409.34
684.82 2CPUs 512GB RAM
E5-2603 v3 - 6C
372.70
1CPU 256GB RAM
E5-2637 v3 - 4C, HT 1247.97

667.62
E5-2623 v3 - 4C, HT 1128.35
608.60
0 500 1000 1500 2000 2500 3000 3500 4000
tps
bold: measured
HT: Hyper-Threading
cursive: calculated

It is evident that a wide performance range is covered by the variety of released processors. If you compare
the OLTP-2 value of the processor with the lowest performance (Xeon E5-2603 v3) with the value of the
processor with the highest performance (Xeon E5-2699 v3), the result is a 5.2-fold increase in performance.
The features of the processors are summarized in the section “Technical data”.
The relatively large performance differences between the processors can be explained by their features. The
values scale on the basis of the number of cores, the size of the L3 cache and the CPU clock frequency and
as a result of the features of Hyper-Threading and turbo mode, which are available in most processor types.
Furthermore, the data transfer rate between processors (“QPI Speed”) also determines performance.
A low performance can be seen in the Xeon E5-2603 v3 and E5-2609 v3 processors, as they have to
manage without Hyper-Threading (HT) and turbo mode (TM).
Within a group of processors with the same number of cores scaling can be seen via the CPU clock
frequency.
If you compare the maximum achievable OLTP-2 values of the current system generation with the values
that were achieved on the predecessor systems, the result is an increase of about 52%.
Maximum OLTP-2 tps

tps Comparison of system generations
4500
+ ~52%
4000
3500
3000
2500
2000
2 × E5-2697 v2 2 × E5-2699 v3
1500 512 GB 512 GB
1000 SQL 2012 SQL 2014
500
0
Predecessor System Current System
Current System TX2560 M1 RX2530 M1 RX2540 M1 RX2560 M1

Predecessor System TX300 S8 RX200 S8 RX300 S8 RX350 S8

TPC-E
The TPC-E benchmark measures the performance of online transaction processing systems (OLTP) and is
based on a complex database and a number of different transaction types that are carried out on it. TPC-E is
not only a hardware-independent but also a software-independent benchmark and can thus be run on every
test platform, i.e. proprietary or open. In addition to the results of the measurement, all the details of the
systems measured and the measuring method must also be explained in a measurement report (Full
Disclosure Report or FDR). Consequently, this ensures that the measurement meets all benchmark
requirements and is reproducible. TPC-E does not just measure an individual server, but a rather extensive
system configuration. Keys to performance in this respect are the database server, disk I/O and network
communication.
The performance metric is tpsE, where tps means transactions per second. tpsE is the average number of
Trade-Result-Transactions that are performed within a second. The TPC-E standard defines a result as the
tpsE rate, the price per performance value (e.g. $/tpsE) and the availability date of the measured
configuration.
Further information about TPC-E can be found in the overview document Benchmark Overview TPC-E.
Benchmark results
In October 2014 Fujitsu submitted a TPC-E benchmark result for the PRIMERGY RX2540 M1 with the
18-core processor Intel Xeon E5-2699 v3 and 512 GB memory.
The results show an enormous increase in performance compared with the PRIMERGY RX300 S8 with a
simultaneous reduction in costs.

TPC-E 1.13.0
FUJITSU Server TPC Pricing 1.7.0
PRIMERGY RX2540 M1
Report Date
October 6, 2014
TPC-E Throughput Price/Performance Availability Date Total System Cost

3,772.08 tpsE $ 130.44 USD per tpsE December 1, 2014 $ 492,023 USD
Database Server Configuration

Operating System Database Manager
Processors/Cores/Threads Memory
Microsoft Windows Server Microsoft SQL Server
2/36/72 512 GB
2012 R2 Standard Edition 2014 Enterprise Edition
SUT Tier A
PRIMERGY RX200 S8
2x Intel Xeon E5-2667 v2 3.30 GHz
64 GB Memory
2x 250 GB 7.2k rpm SATA Drive
2x onboard LAN 1 Gb/s
1x Dual Port LAN 10 Gb/s
Tier B
PRIMERGY RX2540 M1
2x Intel Xeon E5-2699 v3 2.30 GHz
512 GB Memory
2x 300 GB 15k rpm SAS Drives
4x 450 GB 15k rpm SAS Drives
2x onboard LAN 10 Gb/s
6x SAS RAID Controller
Storage
1x PRIMECENTER Rack
5x ETERNUS JX40
65x 400 GB SSD Drives
Storage
Initial Database Size Redundancy Level 1
65 x 400 GB SSD
15,604 GB RAID-5 data and RAID-10 log
4 x 450 GB 15k rpm HDD
More details about this TPC-E result, in particular the Full Disclosure Report, can be found via the TPC web
page http://www.tpc.org/tpce/results/tpce_result_detail.asp?id=114100601.

In October 2014, Fujitsu is represented with five results in the TPC-E list (without historical results).
Price / Watts/
System and Processors Throughput Availability Date
Performance tpsE
PRIMERGY RX300 S7 with 2 × Xeon E5-2690 1871.81 tpsE $175.57 per tpsE 0.69 August 17, 2012
PRIMERGY RX500 S7 with 4 × Xeon E5-4650 2651.27 tpsE $161.95 per tpsE 0.68 November 1, 2012
PRIMERGY RX300 S8 with 2 × Xeon E5-2697 v2 2472.58 tpsE $135.14 per tpsE - September 10, 2013
PRIMEQUEST 2800E with 2 × Xeon E7-8890 v2 8582.52 tpsE $205.43 per tpsE - Mai 1, 2014
PRIMERGY RX2540 M1 with 2 × Xeon E5-2699 v3 3772.08 tpsE $130.44 per tpsE - December 1, 2014
See the TPC web site for more information and all the TPC-E results (including historical results)
(http://www.tpc.org/tpce).
The following diagram for 2-socket PRIMERGY systems with different processor types shows the good
performance of the 2-socket system PRIMERGY RX2540 M1.
tpsE $/tpsE
4500
tpsE
4000 $ per tpsE
3,772.08 500
3500
400
3000
2,472.58
2500
300
1,871.81
2000
1500 200
$175.57
$135.14 $130.44
1000
better
100
better
500
0 0
PRIMERGY PRIMERGY PRIMERGY
RX300 S7 RX300 S8 RX2540 M1
2 × E5-2690 2 × E5-2697 v2 2 × E5-2699 v3
512 GB 512 GB 512 GB
Price / Watts/
System and Processors Throughput Availability Date
Performance tpsE
PRIMERGY RX300 S7 with 2 × Xeon E5-2690 1871.81 tpsE $175.57 per tpsE 0.69 August 17, 2012
PRIMERGY RX300 S8 with 2 × Xeon E5-2697 v2 2472.58 tpsE $135.14 per tpsE - September 10, 2013
PRIMERGY RX2540 M1 with 2 × Xeon E5-2699 v3 3772.08 tpsE $130.44 per tpsE - December 1, 2014
In comparison with the PRIMERGY RX300 S8 the increase in performance is +53% and in comparison with
the PRIMERGY RX300 S7 +102%. The price per performance is $130.44/tpsE. Compared with the
PRIMERGY RX300 S8 the costs are reduced to 97% and with the PRIMERGY RX300 S7 to 74%.

The following overview, sorted according to price/performance, shows the best TPC-E price per
th
performance ratios (as of October 6 , 2014, without historical results) and the corresponding
TPC-E throughputs. PRIMERGY RX2540 M1 with a price per performance ratio of $130.44/tpsE
achieved the best cost-effectiveness.
Processor type tpsE

$/tpsE availability
System processors/ (higher is
(low er is better) date
cores/threads better)
2 × Intel Xeon
Fujitsu PRIMERGY RX2540 M1 3,772.08 130.44 2014-12-01
E5-2699 v3
2 × Intel Xeon
Fujitsu PRIMERGY RX300 S8 2,472.58 135.14 2013-09-10
E5-2697 v2
2 × Intel Xeon
IBM System x3650 M4 2,590.93 150.00 2013-11-29
E5-2697 v2
4 × Intel Xeon
E5-4650
2 × Intel Xeon
HP ProLiant DL380p Gen8 1,881.76 173.00 2012-12-21
E5-2690
2 × Intel Xeon
E5-2690
2 × AMD Opteron
HP ProLiant DL385p Gen8 1,416.37 183.00 2013-05-15
6386SE
4 × Intel Xeon
IBM System x3850 X6 5,576.27 188.69 2014-04-15
E7-4890 v2
8 × Intel Xeon
Fujitsu PRIMEQUEST 2800E 8,582.52 205.43 2014-05-01
E7-8890 v2
2 × Intel Xeon
IBM System x3650 M4 1,863.23 207.85 2012-05-31
E5-2690
See the TPC web site for more information and all the TPC-E results (including historical results)
(http://www.tpc.org/tpce).

vServCon
vServCon is a benchmark used by Fujitsu to compare server configurations with hypervisor with regard to
their suitability for server consolidation. This allows both the comparison of systems, processors and I/O
technologies as well as the comparison of hypervisors, virtualization forms and additional drivers for virtual
machines.
vServCon is not a new benchmark in the true sense of the word. It is more a framework that combines
already established benchmarks (or in modified form) as workloads in order to reproduce the load of a
consolidated and virtualized server environment. Three proven benchmarks are used which cover the
application scenarios database, application server and web server.
Application scenario Benchmark No. of logical CPU cores Memory

Database Sysbench (adapted) 2 1.5 GB
Java application server SPECjbb (adapted, with 50% - 60% load) 2 2 GB
Web server WebBench 1 1.5 GB
Each of the three application scenarios is allocated to a dedicated virtual machine (VM). Add to these a
fourth machine, the so-called idle VM. These four VMs make up a “tile”. Depending on the performance
capability of the underlying server hardware, you may as part of a measurement also have to start several
identical tiles in parallel in order to achieve a maximum performance score.
System Under Test
Database Java Web Idle Tile n

VM VM VM VM
… …
Database Java Web Idle Tile 3

VM VM VM VM
VM VM VM VM
VM VM VM VM
Each of the three vServCon application scenarios provides a specific benchmark result in the form of
application-specific transaction rates for the respective VM. In order to derive a normalized score, the
individual benchmark results for one tile are put in relation to the respective results of a reference system.
The resulting relative performance values are then suitably weighted and finally added up for all VMs and
tiles. The outcome is a score for this tile number.
Starting as a rule with one tile, this procedure is performed for an increasing number of tiles until no further
significant increase in this vServCon score occurs. The final vServCon score is then the maximum of the
vServCon scores for all tile numbers. This score thus reflects the maximum total throughput that can be
achieved by running the mix defined in vServCon that consists of numerous VMs up to the possible full
utilization of CPU resources. This is why the measurement environment for vServCon measurements is
designed in such a way that only the CPU is the limiting factor and that no limitations occur as a result of
other resources.
The progression of the vServCon scores for the tile numbers provides useful information about the scaling
behavior of the “System under Test”.
A detailed description of vServCon is in the document: Benchmark Overview vServCon.

Framework
Server Disk subsystem
controller
Multiple
1Gb or 10Gb
networks

Load generators
All results were determined by way of example on a PRIMERGY RX2540 M1.

Hardware
® ®
Processor Intel Xeon Processor E5-2600 v3 Product Family
Memory 1 processor: 8 × 32GB (1x32GB) 4Rx4 DDR4-2133 LR ECC
2 processors: 16 × 32GB (1x32GB) 4Rx4 DDR4-2133 LR ECC
Network interface 1 × dual port 1GbE adapter
1 × dual port 10GbE server adapter
Disk subsystem 1 × dual-channel FC controller Emulex LPe12002
ETERNUS DX80 storage systems:
Each Tile: 50 GB LUN
Each LUN: RAID 0 with 2 × Seagate ST3300657SS-Disks (15 krpm)
Software
Operating system VMware ESXi 5.5.0 U2 Build 2068190
Load generator (incl. Framework controller)

Hardware (Shared)
Enclosure PRIMERGY BX900
Hardware
Model 18 × PRIMERGY BX920 S1 server blades
Memory 12 GB
Network interface 3 × 1 Gbit/s LAN
Software
Operating system Microsoft Windows Server 2003 R2 Enterprise with Hyper-V

Load generator VM (per tile 3 load generator VMs on various server blades)
Hardware
Processor 1 × logical CPU
Memory 512 MB
Software
Operating system Microsoft Windows Server 2003 R2 Enterprise Edition
Benchmark results
®
The PRIMERGY dual-socket rack and tower systems dealt with here are based on processors of the Intel
®
Xeon Processor E5-2600 v3 Product Family. The features of the processors are summarized in the section
“Technical data”.
The available processors of these systems with their results can be seen in the following table.
Processor Score #Tiles
4 Cores E5-2623 v3 7.71 4

Hyper-Threading, Turbo Mode E5-2637 v3 8.65 4
E5-2603 v3 5.13 5
6 Cores
E5-2609 v3 5.83 5
6 Cores E5-2620 v3 10.1 6

E5-2630L v3 11.4 8
8 Cores E5-2630 v3 13.6 8
Intel Xeon Processor
Hyper-Threading, Turbo Mode

E5 v3 Product Family
E5-2640 v3 14.1 8
E5-2667 v3 15.9 8
10 Cores E5-2650 v3 16.6 10

®

E5-2650L v3 16.2 11
®
12 Cores E5-2670 v3 20.0 12

E5-2690 v3 22.4 13
E5-2683 v3 21.6 14
14 Cores
E5-2695 v3 23.5 14
E5-2697 v3 25.5 15
16 Cores
E5-2698 v3 27.3 16
18 Cores
E5-2699 v3 30.3 18
These PRIMERGY dual-socket rack and tower systems are very suitable for application virtualization thanks
to the progress made in processor technology. Compared with a system based on the previous processor
generation an approximate 76% higher virtualization performance can be achieved (measured in vServCon
score in their maximum configuration).

The first diagram compares the virtualization performance values that can be achieved with the processors
reviewed here.
® ®
Intel Xeon Processor E5-2600 v3 Product Family #Tiles
4 4 5 5 6 6 8 8 8 8 10 10 11 12 12 13 14 14 15 16 18
30
25
Final vServCon Score
20
15
E5-2650L v3 - 12 Cores
E5-2630L v3 - 8 Cores
E5-2698 v3 - 16 Cores
E5-2650 v3 - 10 Cores
E5-2660 v3 - 10 Cores
E5-2670 v3 - 12 Cores
E5-2680 v3 - 12 Cores
E5-2690 v3 - 12 Cores
E5-2683 v3 - 14 Cores
E5-2695 v3 - 14 Cores
E5-2697 v3 - 14 Cores
E5-2699 v3 - 18 Cores
10
E5-2623 v3 - 4 Cores
E5-2637 v3 - 4 Cores
E5-2603 v3 - 6 Cores
E5-2609 v3 - 6 Cores
E5-2620 v3 - 6 Cores
E5-2643 v3 - 6 Cores
E5-2630 v3 - 8 Cores
E5-2640 v3 - 8 Cores
E5-2667 v3 - 8 Cores
The relatively large performance differences between the processors can be explained by their features. The
values scale on the basis of the number of cores, the size of the L3 cache and the CPU clock frequency and
as a result of the features of Hyper-Threading and turbo mode, which are available in most processor types.
Furthermore, the data transfer rate between processors (“QPI Speed”) also determines performance.
A low performance can be seen in the Xeon E5-2603 v3 and E5-2609 v3 processors, as they have to
manage without Hyper-Threading (HT) and turbo mode (TM). In principle, these weakest processors are only
to a limited extent suitable for the virtualization environment.
Within a group of processors with the same number of cores scaling can be seen via the CPU clock
frequency.
As a matter of principle, the memory access speed also influences performance. A guideline in the
virtualization environment for selecting main memory is that sufficient quantity is more important than the
speed of the memory accesses. The vServCon scaling measurements presented here were all performed
with a memory access speed – depending on the processor type – of at most 2133 MHz. More information
about the topic “Memory Performance” and QPI architecture can be found in the White Paper Memory
performance of Xeon E5-2600 v3 (Haswell-EP)-based systems.

Until now we have looked at the virtualization performance of a

35 fully configured system. However, with a server with two sockets
the question also arises as to how good performance scaling is
30
from one to two processors. The better the scaling, the lower the
Final vServCon Score
× 1.97
25 overhead usually caused by the shared use of resources within a
server. The scaling factor also depends on the application. If the
20 server is used as a virtualization platform for server consolidation,
15 the system scales with a factor of 1.97. When operated with two
30.3@18 tiles
processors, the system thus achieves a significantly better
15.4@9 tiles
10 performance than with one processor, as is illustrated in the

diagram opposite using the processor version Xeon E5-2699 v3 as
5
an example.
0
1 x E5-2699 v3 2 x E5-2699 v3
The next diagram illustrates the virtualization performance for increasing numbers of VMs based on the
Xeon E5-2640 v3 (8 core) and E5-2695 v2 (14 core) processors.
In addition to the increased
number of physical cores, E5-2640 v3 E5-2695 v3
Hyper-Threading, which is 25
supported by almost all
®
processors of the Intel
®
Xeon Processor E5-2600 v3
Product Family, is an 20
additional reason for the high
vServCon Score
number of VMs that can be

operated. As is known, a 15
physical processor core is
consequently divided into two
logical cores so that the
number of cores available for 10
the hypervisor is doubled.
This standard feature thus
generally increases the 5
virtualization performance of
a system.
5.70
21.0
2.89
8.36
10.3
11.6
12.8
13.8
14.1
2.65
5.47
8.20
10.7
13.3
14.8
17.2
18.9
19.8
22.0
22.7
23.4
23.5
0
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 9 10 11 12 13 14
#Tiles
The previous diagram examined the total performance of all application VMs of a host. However, studying
the performance from an individual application VM viewpoint is also interesting. This information is in the
previous diagram. For example, the total optimum is reached in the above Xeon E5-2640 v3 situation with 24
application VMs (eight tiles, not including the idle VMs); the low load case is represented by three application
VMs (one tile, not including the idle VM). Remember: the vServCon score for one tile is an average value
across the three application scenarios in vServCon. This average performance of one tile drops when
changing from the low load case to the total optimum of the vServCon score - from 2.89 to 14.1/8=1.76, i.e.
to 61%. The individual types of application VMs can react very differently in the high load situation. It is thus
clear that in a specific situation the performance requirements of an individual application must be balanced
against the overall requirements regarding the numbers of VMs on a virtualization host.

The virtualization-relevant progress in processor technology since 2008 has an effect on the one hand on an
individual VM and, on the other hand, on the possible maximum number of VMs up to CPU full utilization.
The following comparison shows the proportions for both types of improvements.
Six systems with similar housing
2008 2009 2011 2012 2013 2014/2015
construction are compared: a
RX200 S4 RX200 S5 RX200 S6 RX200 S7 RX200 S8 RX2530 M1
system from 2008, a system from
2009, a system from 2011, a RX300 S4 RX300 S5 RX300 S6 RX300 S7 RX300 S8 RX2540 M1
system from 2012, a system from - - TX300 S6 RX350 S7 RX350 S8 RX2560 M1
2013 and a current system with TX300 S4 TX300 S5 TX300 S6 TX300 S7 TX300 S8 TX2560 M1
the best processors each (see
table below) for few VMs and for highest maximum performance.
Best vServCon Best vServCon

Performance Score Maximum Score
Few VMs 1 Tile Performance max.
2008 X5460 1.91 X5460 2.94@2 tiles
2009 X5570 2.45 X5570 6.08@ 6 tiles
2011 X5690 2.63 X5690 9.61@ 9 tiles
2012 E5-2643 2.73 E5-2690 13.5@ 8 tiles
2013 E5-2667 v2 2.85 E5-2697 v2 17.1@11 tiles
2014 E5-2643 v3 3.22 E5-2699 v3 30.3@18tiles
The clearest performance improvements arose from 2008 to 2009 with the introduction of the Xeon 5500
1
processor generation (e. g. via the feature “Extended Page Tables” (EPT) ). One sees an increase of the
vServCon score by a factor of 1.28 with a few VMs (one tile).
Virtualization relevant improvements

10
Few VMs (1 Tile)
9
6
vServCon Score
5
× 1.04 × 1.13
4 × 1.04
× 1.28 × 1.07
3
2
2.73 2.85 3.22
2.45 2.63
1 1.91
0
2008 2009 2011 2012 2013 2014 Year
Y CPU
X5460 X5570 X5690 E5-2643 E5-2667 v2 E5-2643 v3
3.17 GHz 2.93 GHz 2.93 GHz 3.3 GHz 3.3 GHz 3.4 GHz Freq.
4C 4C 6C 4C 8C 6C #Cores
With full utilization of the systems with VMs there was an increase by a factor of 2.07. The one reason was
the performance increase that could be achieved for an individual VM (see score for a few VMs). The other
reason was that more VMs were possible with total optimum (via Hyper-Threading). However, it can be seen
that the optimum was “bought” with a triple number of VMs with a reduced performance of the individual VM.
1
EPT accelerates memory virtualization via hardware support for the mapping between host and guest memory
addresses.

Virtualization relevant improvements
35
Score at optimum Tile count × 1.77
30
25
× 1.27
vServCon Score
20
× 1.40
15 30.3
× 1.58
× 2.07
10
17.1
13.5
5 9.61
6.08
2.94
0
2008 2009 2011 2012 2013 2014 Year
Y CPU
X5460 X5570 X5690 E5-2690 E5-2697 v2 E5-2699 v3
3.17 GHz 2.93 GHz 2.93 GHz 2.9 GHz 2.7 GHz 2.3 GHz Freq.
4C 4C 6C 8C 12C 18C #Cores
Where exactly is the technology progress between 2009 and 2014?

The performance for an individual VM in low-load situations has only slightly increased for the processors
compared here with the highest clock frequency per core. We must explicitly point out that the increased
virtualization performance as seen in the score cannot be completely deemed as an improvement for one
individual VM.
The decisive progress is in the higher number of physical cores and – associated with it – in the increased
values of maximum performance (factor 1.58, 1.40, 1.27 and 1.77 in the diagram).
Up to and including 2011 the best processor type of a processor generation had both the highest clock
frequency and the highest number of cores. From 2012 there have been differently optimized processors on
offer: Versions with a high clock frequency per core for few cores and versions with a high number of cores,
but with a lower clock frequency per core. The features of the processors are summarized in the section
“Technical data”.
Performance increases in the virtualization environment since 2009 are mainly achieved by increased VM
numbers due to the increased number of available logical or physical cores. However, since 2012 it has
been possible - depending on the application scenario in the virtualization environment – to also select a
CPU with an optimized clock frequency if a few or individual VMs require maximum computing power.

VMmark V2
VMmark V2 is a benchmark developed by VMware to compare server configurations with hypervisor
solutions from VMware regarding their suitability for server consolidation. In addition to the software for load
generation, the benchmark consists of a defined load profile and binding regulations. The benchmark results
can be submitted to VMware and are published on their Internet site after a successful review process. After
the discontinuation of the proven benchmark “VMmark V1” in October 2010, it has been succeeded by
“VMmark V2”, which requires a cluster of at least two servers and covers data center functions, like Cloning
and Deployment of virtual machines (VMs), Load Balancing, as well as the moving of VMs with vMotion and
also Storage vMotion.
In addition to the “Performance Only” result, it is also possible from version 2.5 of VMmark to alternatively
measure the electrical power consumption and publish it as a “Performance with Server Power” result (power
consumption of server systems only) and/or “Performance with Server and Storage Power” result (power
consumption of server systems and all storage components).
VMmark V2 is not a new benchmark in the actual sense. Application scenario Load tool # VMs
It is in fact a framework that consolidates already
established benchmarks, as workloads in order to Mail server LoadGen 1
simulate the load of a virtualized consolidated server Web 2.0 Olio client 2
environment. Three proven benchmarks, which cover E-commerce DVD Store 2 client 4
the application scenarios mail server, Web 2.0, and
Standby server (IdleVMTest) 1
e-commerce were integrated in VMmark V2.
Each of the three application scenarios is assigned to a total of seven dedicated virtual machines. Then add
to these an eighth VM called the “standby server”. These eight VMs form a “tile”. Because of the
performance capability of the underlying server hardware, it is usually necessary to have started several
identical tiles in parallel as part of a measurement in order to achieve a maximum overall performance.
A new feature of VMmark V2 is an infrastructure component, which is present once for every two hosts. It
measures the efficiency levels of data center consolidation through VM Cloning and Deployment, vMotion
and Storage vMotion. The Load Balancing capacity of the data center is also used (DRS, Distributed
Resource Scheduler).
The result of VMmark V2 for test type „Performance Only“ is a number, known as a “score”, which provides
information about the performance of the measured virtualization solution. The score reflects the maximum
total consolidation benefit of all VMs for a server configuration with hypervisor and is used as a comparison
criterion of various hardware platforms.
This score is determined from the individual results of the VMs and an infrastructure result. Each of the five
VMmark V2 application or front-end VMs provides a specific benchmark result in the form of application-
specific transaction rates for each VM. In order to derive a normalized score the individual benchmark results
for one tile are put in relation to the respective results of a reference system. The resulting dimensionless
performance values are then averaged geometrically and finally added up for all VMs. This value is included
in the overall score with a weighting of 80%. The infrastructure workload is only present in the benchmark
once for every two hosts; it determines 20% of the result. The number of transactions per hour and the
average duration in seconds respectively are determined for the score of the infrastructure workload
components.
In addition to the actual score, the number of VMmark V2 tiles is always specified with each VMmark V2
score. The result is thus as follows: “Score@Number of Tiles”, for example “4.20@5 tiles”.
In the case of the two test types “Performance with Server Power” and “Performance with Server and
Storage Power” a so-called “Server PPKW Score” and “Server and Storage PPKW Score” is determined,
which is the performance score divided by the average power consumption in kilowatts (PPKW =
performance per kilowatt (KW)).
The results of the three test types should not be compared with each other.
A detailed description of VMmark V2 is available in the document Benchmark Overview VMmark V2.

Clients & Management Server(s) Storage System
Multiple
1Gb or 10Gb
networks
Load Generators vMotion

incl. Prime Client and network
Datacenter Management
Server
System under Test (SUT)
“Performance Only” measurement result:

Hardware
Number of servers 2
Memory 512 GB: 16 × 32GB (1x32GB) 4Rx4 DDR4-2133 LR ECC
Network interface 1 × Emulex OneConnect OCe14000 Dual Port Adapter with PLAN EM 2x1Gb T interface
card
1 × Fujitsu Eth Ctrl 2x10Gbit PCIe x8 D2755 SFP+
Disk subsystem 1 × Dual port PFC EP LPe16002
2 × PRIMERGY RX300 S8 configured as Fibre Channel target:
7/6 × SAS-SSD (400 GB)
®
2 × Fusion-io ioDrive 2 PCIe-SSD (1.2 TB)
RAID 0 with several LUNs
Total: 8440 GB
Software
BIOS Version V5.0.0.9 R1.1.0
BIOS settings See details
Operating system ESX settings: see details
settings
Details
See disclosure http://www.vmware.com/a/assets/vmmark/pdf/2014-09-08-Fujitsu-RX2540M1.pdf

“Performance with Server Power” and “Performance with Server and Storage Power” measurement
results:

Hardware
Number of servers 2
Memory 384 GB: 12 × 32GB (1x32GB) 4Rx4 DDR4-2133 LR ECC
Network interface 1 × Emulex OneConnect OCe14000 Dual Port Adapter mit PLAN EM 2x1Gb T interface
card
1 × Fujitsu Eth Ctrl 2x10Gbit PCIe x8 D2755 SFP+
Disk subsystem 1 × Dual port PFC EP LPe12002
1 × PRIMERGY RX300 S8 konfiguriert als Fibre Channel Target:
12 × SAS-SSD (400 GB)
®
2 × Fusion-io ioDrive 2 PCIe-SSD (1.2 TB)
RAID 0 mit mehreren LUNs
Gesamt: 6292 GB
Software
BIOS Version V5.0.0.9 R1.3.0
BIOS settings See details
Operating system ESX settings: see details
settings
Details
See disclosure http://www.vmware.com/a/assets/vmmark/pdf/2014-09-30-Fujitsu-RX2540M1-serverptd.pdf
http://www.vmware.com/a/assets/vmmark/pdf/2014-09-30-Fujitsu-RX2540M1-allptd.pdf
Common for all measurement results:
Datacenter Management Server (DMS)

Hardware (Shared)
Network Switch 1 × PRIMERGY BX600 GbE Switch Blade 30/12
Hardware
Model 1 × server blade PRIMERGY BX620 S5
Memory 24 GB
Software
Operating system VMware ESXi 5.1.0 Build 799733
Datacenter Management Server (DMS) VM
Hardware
Memory 10 GB
Software
Operating system Microsoft Windows Server 2008 R2 Enterprise x64 Edition

Prime Client
Hardware (Shared)
Network Switch 1 × PRIMERGY BX600 GbE Switch Blade 30/12
Hardware
Model 1 × server blade PRIMERGY BX620 S5
Memory 12 GB
Software
Load generator
Hardware
Processor 4 × Xeon E7-4870
Memory 512 GB
Software
Operating system VMware ESX 4.1.0 U2 Build 502767
Load generator VM (per tile 1 load generator VM)
Hardware
Memory 4 GB
Software

Benchmark results
th
“Performance Only” measurement result (September 8 2014)
On September 8, 2014 Fujitsu achieved with a PRIMERGY RX2540 M1 with Xeon E5-2699 v3
processors and VMware ESXi 5.5.0 U2 a VMmark V2 score of “26.48@22 tiles” in a system
configuration with a total of 2 × 36 processor cores and when using two identical servers in the
“System under Test” (SUT). With this result the PRIMERGY RX2540 M1 is in the official
VMmark V2 “Performance Only” ranking the most powerful 2-socket server in a “matched pair” configuration
consisting of two identical hosts (valid as of benchmark results publication date).
th
All comparisons for the competitor products reflect the status of 17 September 2014. The current
VMmark V2 “Performance Only” results as well as the detailed results and configuration data are available at
http://www.vmware.com/a/vmmark/.
The diagram shows the “Performance Only” result of the PRIMERGY RX2540 M1 in comparison with the
® ®
best competitor system with 2 × 2 processors of the Intel Xeon Processor E5-2600 Product Family
(v1/v2/v3).
PRIMERGY RX2540 M1 vs. HP ProLiant DL380p Gen8

30
+60%
25
VMmark V2 Score
20
15
26.48@22 tiles
16.54@14 tiles
10
0
2 × Fujitsu 2 × HP
PRIMERGY RX2540 M1 ProLiant DL380p Gen8
2 × 2 × Xeon 2 × 2 × Xeon
E5-2699 v3 E5-2697 v2
The processors used, which with a good hypervisor setting could make optimal use of their processor
features, were the essential prerequisites for achieving the PRIMERGY RX2540 M1 result. These features
include Hyper-Threading. All this has a particularly positive effect during virtualization.
All VMs, their application data, the host operating system as well as additionally required data were on a
powerful Fibre Channel disk subsystem. As far as possible, the configuration of the disk subsystem takes the
specific requirements of the benchmark into account. The use of flash technology in the form of SAS SSDs
and PCIe-SSDs in the powerful Fibre Channel disk subsystem resulted in further advantages in response
times of the storage medium used.
The network connection to the load generators was implemented via 10Gb LAN ports. The infrastructure-
workload connection between the hosts was by means of 1Gb LAN ports.
All the components used were optimally attuned to each other.

th
“Performance with Server Power” measurement result (September 30 2014)
processors and VMware ESXi 5.5.0 U2 a VMmark V2 “Server PPKW Score” of
“25.2305@22 tiles” in a system configuration with a total of 2 × 36 processor cores and when
using two identical servers in the “System under Test” (SUT). With this result the PRIMERGY
RX2540 M1 is in the official VMmark V2 “Performance with Server Power” ranking the most energy-efficient
virtualization server worldwide (valid as of benchmark results publication date).
th
VMmark V2 “Performance with Server Power” results as well as the detailed results and configuration data
are available at http://www.vmware.com/a/vmmark/2/.
The diagram shows the “Performance with Server Power” result of the PRIMERGY RX2540 M1 in
comparison with the best competitor system.
Performance with Server Power

30
+43%
VMmark V2 Server PPKW Score
25
20
15
25.2305@22 tiles
17.6899@20 tiles
10
0
PRIMERGY RX2540 M1 ProLiant DL380 Gen9
2 × 2 × Xeon 2 × 2 × Xeon
E5-2699 v3 E5-2699 v3

th
“Performance with Server and Storage Power” measurement result (September 30 2014)
processors and VMware ESXi 5.5.0 U2 a VMmark V2 “Server and Storage PPKW Score” of
“20.8067@22 tiles” in a system configuration with a total of 2 × 36 processor cores and when
using two identical servers in the “System under Test” (SUT). With this result the PRIMERGY
RX2540 M1 is in the official VMmark V2 “Performance with Server and Storage Power” ranking the most
energy-efficient virtualization platform worldwide (valid as of benchmark results publication date).
th
VMmark V2 “Performance with Server and Storage Power” results as well as the detailed results and
configuration data are available at http://www.vmware.com/a/vmmark/3/.
The diagram shows the “Performance with Server and Storage Power” result of the PRIMERGY RX2540 M1
in comparison with the best competitor system.
Performance with Server and Storage Power

25
VMmark V2 Server and Storage PPKW Score
+64%
20
15
20.8067@22 tiles
12.7058@20 tiles
10
0
PRIMERGY RX2540 M1 ProLiant DL380 Gen9
2 × 2 × Xeon 2 × 2 × Xeon
E5-2699 v3 E5-2699 v3

STREAM
STREAM is a synthetic benchmark that has been used for many years to determine memory throughput and
which was developed by John McCalpin during his professorship at the University of Delaware. Today
STREAM is supported at the University of Virginia, where the source code can be downloaded in either
Fortran or C. STREAM continues to play an important role in the HPC environment in particular. It is for
example an integral part of the HPC Challenge benchmark suite.
The benchmark is designed in such a way that it can be used both on PCs and on server systems. The unit
of measurement of the benchmark is GB/s, i.e. the number of gigabytes that can be read and written per
second.
STREAM measures the memory throughput for sequential accesses. These can generally be performed
more efficiently than accesses that are randomly distributed on the memory, because the processor caches
are used for sequential access.
Before execution the source code is adapted to the environment to be measured. Therefore, the size of the
data area must be at least 12 times larger than the total of all last-level processor caches so that these have
as little influence as possible on the result. The OpenMP program library is used to enable selected parts of
the program to be executed in parallel during the runtime of the benchmark, consequently achieving optimal
load distribution to the available processor cores.
During implementation the defined data area, consisting of 8-byte elements, is successively copied to four
types, and arithmetic calculations are also performed to some extent.
Type Execution Bytes per step Floating-point calculation per step

COPY a(i) = b(i) 16 0
SCALE a(i) = q × b(i) 16 1
SUM a(i) = b(i) + c(i) 24 1
TRIAD a(i) = b(i) + q × c(i) 24 2
The throughput is output in GB/s for each type of calculation. The differences between the various values are
usually only minor on modern systems. In general, only the determined TRIAD value is used as a
comparison.
The measured results primarily depend on the clock frequency of the memory modules; the processors
influence the arithmetic calculations.
9
This chapter specifies throughputs on a basis of 10 (1 GB/s = 10 Byte/s).

Hardware
® ®
Processor 2 processors of Intel Xeon Processor E5-2600 v3 Product Family
Software
EnergyPerformance = Performance
Xeon E5-2603 v3, E5-2609 v3, E5-2637 v3, E5-2698 v3, E5-2699 v3:
COD Enable = disabled, Early Snoop = disabled
BIOS settings Xeon E5-2630 v3, E5-2630L v3, E5-2640 v3:
COD Enable = disabled, Early Snoop = enabled
All others:
COD Enable = enabled, Early Snoop = disabled
Xeon E5-2699 v3: Red Hat Enterprise Linux Server release 6.5
Operating system
Operating system
Transparent Huge Pages inactivated
settings
Compiler Intel C++ Composer XE 2013 SP1 for Linux Update 1
Benchmark Stream.c Version 5.9

Benchmark results
Processor Memory Max. Memory Cores Processor Number of TRIAD
Frequency Bandwidth Frequency Processors
[MHz] [GB/s] [GHz] [GB/s]
Xeon E5-2603 v3 1600 51 6 1.60 2 47.9
Xeon E5-2609 v3 1600 51 6 1.90 2 58.1
Xeon E5-2623 v3 1866 59 4 3.00 2 73.3 (est.)

Xeon E5-2620 v3 1866 59 6 2.40 2 88.9 (est.)
Xeon E5-2630L v3 1866 59 8 1.80 2 86.2
Xeon E5-2630 v3 1866 59 8 2.40 2 89.9
Xeon E5-2640 v3 1866 59 8 2.60 2 89.6
Xeon E5-2637 v3 2133 68 4 3.50 2 89.8

Xeon E5-2643 v3 2133 68 6 3.40 2 90.3 (est.)
Xeon E5-2667 v3 2133 68 8 3.20 2 90.3 (est.)
Xeon E5-2650 v3 2133 68 10 2.30 2 116 (est.)
Xeon E5-2660 v3 2133 68 10 2.60 2 115 (est.)
Xeon E5-2650L v3 2133 68 12 1.80 2 115
Xeon E5-2670 v3 2133 68 12 2.30 2 118 (est.)
Xeon E5-2680 v3 2133 68 12 2.50 2 118 (est.)
Xeon E5-2690 v3 2133 68 12 2.60 2 117
Xeon E5-2683 v3 2133 68 14 2.00 2 117 (est.)
Xeon E5-2695 v3 2133 68 14 2.30 2 118 (est.)
Xeon E5-2697 v3 2133 68 14 2.60 2 117 (est.)
Xeon E5-2698 v3 2133 68 16 2.30 2 117
Xeon E5-2699 v3 2133 68 18 2.30 2 116
The results marked (est.) are estimates.

Literature
PRIMERGY Servers
http://primergy.com/
PRIMERGY RX2540 M1
This White Paper:
http://docs.ts.fujitsu.com/dl.aspx?id=0cf6083d-23ab-4634-86e3-4d86ddf6e4f8
http://docs.ts.fujitsu.com/dl.aspx?id=884d7bdc-49ad-4550-b1a9-292d5770df9c
http://docs.ts.fujitsu.com/dl.aspx?id=4c793f74-a016-4bc7-8066-7775c8de5a1f
Data sheet
http://docs.ts.fujitsu.com/dl.aspx?id=84433438-7da7-4a24-b388-564f4e844fc9
PRIMERGY Performance
http://www.fujitsu.com/fts/x86-server-benchmarks
Performance of Server Components
http://www.fujitsu.com/fts/products/computing/servers/mission-critical/benchmarks/x86-
components.html
BIOS optimizations for Xeon E5-2600 v3 based systems
http://docs.ts.fujitsu.com/dl.aspx?id=f154aca6-d799-487c-8411-e5b4e558c88b
Memory performance of Xeon E5-2600 v3 (Haswell-EP)-based systems
http://docs.ts.fujitsu.com/dl.aspx?id=74eb62e6-4487-4d93-be34-5c05c3b528a6
RAID Controller Performance
http://docs.ts.fujitsu.com/dl.aspx?id=e2489893-cab7-44f6-bff2-7aeea97c5aef
Disk I/O: Performance of storage media and RAID controllers
Basics of Disk I/O Performance
http://docs.ts.fujitsu.com/dl.aspx?id=65781a00-556f-4a98-90a7-7022feacc602
Information about Iometer
http://www.iometer.org
OLTP-2
Benchmark Overview OLTP-2
http://docs.ts.fujitsu.com/dl.aspx?id=e6f7a4c9-aff6-4598-b199-836053214d3f
SAP SD
http://www.sap.com/benchmark
Benchmark overview SAP SD
http://docs.ts.fujitsu.com/dl.aspx?id=0a1e69a6-e366-4fd1-a1a6-0dd93148ea10
SPECcpu2006
http://www.spec.org/osg/cpu2006
Benchmark overview SPECcpu2006
http://docs.ts.fujitsu.com/dl.aspx?id=1a427c16-12bf-41b0-9ca3-4cc360ef14ce
SPECpower_ssj2008
http://www.spec.org/power_ssj2008
Benchmark Overview SPECpower_ssj2008
http://docs.ts.fujitsu.com/dl.aspx?id=166f8497-4bf0-4190-91a1-884b90850ee0
STREAM
http://www.cs.virginia.edu/stream/

TPC-E
http://www.tpc.org/tpce
Benchmark Overview TPC-E
http://docs.ts.fujitsu.com/dl.aspx?id=da0ce7b7-3d80-48cd-9b3a-d12e0b40ed6d
VMmark V2
Benchmark Overview VMmark V2
http://docs.ts.fujitsu.com/dl.aspx?id=2b61a08f-52f4-4067-bbbf-dc0b58bee1bd
VMmark V2
http://www.vmmark.com
vServCon
Benchmark Overview vServCon
http://docs.ts.fujitsu.com/dl.aspx?id=b953d1f3-6f98-4b93-95f5-8c8ba3db4e59
Contact
FUJITSU
Website: http://www.fujitsu.com/
PRIMERGY Product Marketing
mailto:Primergy-PM@ts.fujitsu.com
PRIMERGY Performance and Benchmarks
mailto:primergy.benchmark@ts.fujitsu.com
© Copyright 2014-2015 Fujitsu Technology Solutions. Fujitsu and the Fujitsu logo are trademarks or registered trademarks of Fujitsu Limited in Japan and
other countries. Other company, product and service names may be trademarks or registered trademarks of their respective owners. Technical data subject
to modification and delivery subject to availability. Any liability that the data and illustrations are complete, actual or correct is excluded. Designations may
be trademarks and/or copyrights of the respective manufacturer, the use of which by third parties for their own purposes may infringe the rights of such
owner.
For further information see http://www.fujitsu.com/fts/resources/navigation/terms-of-use.html
2015-07-02 WW EN

White Paper: Fujitsu Server Primergy Performance Report PRIMERGY RX2540 M1

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

White Paper: Fujitsu Server Primergy Performance Report PRIMERGY RX2540 M1

Uploaded by

Copyright:

Available Formats

White Paper  Performance Report PRIMERGY RX2540 M1

http://ts.fujitsu.com/primergy Page 1 (62)

Document history ................................................................................................................................................ 3

Page 2 (62) http://ts.fujitsu.com/primergy

http://ts.fujitsu.com/primergy Page 3 (62)

Page 4 (62) http://ts.fujitsu.com/primergy

Model PRIMERGY RX2540 M1

http://ts.fujitsu.com/primergy Page 5 (62)

Processors (since system release)

Xeon E5-2623 v3 4 8 10 8.00 3.00 3.50 1866 105

Xeon E5-2603 v3 6 6 15 6.40 1.60 n/a 1600 85

Xeon E5-2630L v3 8 16 20 8.00 1.80 2.90 1866 55

Xeon E5-2650 v3 10 20 25 9.60 2.30 3.00 2133 105

Xeon E5-2650L v3 12 24 30 9.60 1.80 2.50 2133 65

Xeon E5-2683 v3 14 28 35 9.60 2.00 3.00 2133 120

Xeon E5-2698 v3 16 32 40 9.60 2.30 3.60 2133 135

Xeon E5-2699 v3 18 36 45 9.60 2.30 3.60 2133 145

Page 6 (62) http://ts.fujitsu.com/primergy

Memory modules (since system release)

Power supplies (since system release) Max. number

Some components may not be available in all countries or sales regions.

http://ts.fujitsu.com/primergy Page 7 (62)

Page 8 (62) http://ts.fujitsu.com/primergy

Some components may not be available in all countries or sales regions.

http://ts.fujitsu.com/primergy Page 9 (62)

Page 10 (62) http://ts.fujitsu.com/primergy

http://ts.fujitsu.com/primergy Page 11 (62)

SPECcpu2006: integer performance

Page 12 (62) http://ts.fujitsu.com/primergy

SPECcpu2006: floating-point performance

SPECcpu2006: floating-point performance

http://ts.fujitsu.com/primergy Page 13 (62)

The benchmark runs on a wide variety of

Page 14 (62) http://ts.fujitsu.com/primergy

Some components may not be available in all countries or sales regions.

http://ts.fujitsu.com/primergy Page 15 (62)

Performance Power Energy Efficiency

Page 16 (62) http://ts.fujitsu.com/primergy

The comparison with the competition makes

8,000 PRIMERGY RX2540 M1 is setting new

SPECpower_ssj2008: PRIMERGY RX2540 M1 vs. PRIMERGY RX300 S8

http://ts.fujitsu.com/primergy Page 17 (62)

Thanks to the new Haswell processors the

total power [watt]

Page 18 (62) http://ts.fujitsu.com/primergy

Disk I/O: Performance of storage media

Standard load Access Type of access Block size Application

The main results of a measurement are:

Data throughput [MB/s] = Transaction rate [IO/s] × Block size [MB]

http://ts.fujitsu.com/primergy Page 19 (62)

Some components may not be available in all countries / sales regions.

Controller name Cache Supported interfaces RAID levels

Page 20 (62) http://ts.fujitsu.com/primergy

Storage medium type Interface Form factor

*) partly also available with a 3.5" tray

http://ts.fujitsu.com/primergy Page 21 (62)

SSDs in comparison with the most powerful HDD

Sequential accesses (maximum performance values in MB/s):

Page 22 (62) http://ts.fujitsu.com/primergy

Sequential accesses (maximum performance values in MB/s):

http://ts.fujitsu.com/primergy Page 23 (62)

Disk I/O: Performance of RAID controllers

Standard load Access Type of access Block size Application