
Copyright (c) 2020, Oracle. All rights reserved. Oracle Confidential.

NFS Client Performance Tuning for High Speed (10Gbps or faster) Networks (Doc ID 2050370.1)

In this Document

Goal
Solution
Packet Loss
Connection Load Spreading
NFS kernel resources
I/O (read/write) sizes
Jumbo Frames
TCP window sizing
Direct I/O mount options
References

APPLIES TO:

Solaris Operating System - Version 10 3/05 and later


Information in this document applies to any platform.

GOAL

Solaris and its NFS client are, for the most part, tuned for good performance, although some additional attention is usually
required to achieve optimal throughput on 10Gb/sec or faster networks.

SOLUTION

Packet Loss

NFS does not perform well with even a small amount of packet loss and/or TCP retransmissions. A TCP retransmit rate (from either the
NFS client or server, depending on which way data are flowing) of even 1/4 of 1 percent (0.25%) usually has a noticeable impact on bulk
read or write throughput.

It is therefore critical that modern network equipment (and ideally modern clients and servers) be employed. Most current Solaris 10Gbit/sec (or faster)
network interfaces and platforms should perform with no packet loss unless placed under quite extreme pressure. Make sure the same is
true of the network fabric as well.
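
As a rough sketch of how to gauge this on a Solaris client or server, the kernel TCP counters can be sampled while the NFS workload is
running (the exact counter names, such as tcpOutDataSegs and tcpRetransSegs, may vary slightly by release):

# sample the TCP segment and retransmit counters twice, some seconds apart
netstat -s -P tcp | egrep 'OutDataSegs|RetransSegs'

Compare the deltas between the two samples; a retransmit-to-output ratio approaching the 0.25% mentioned above warrants investigation
of the NICs, drivers, and switch fabric.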

Connection Load Spreading

Modern high speed network interfaces possess multiple hardware rings. An individual TCP connection will utilize just a single ring, and
won't leverage the full resources of a fast interface.

The Solaris kernel RPC, which manages NFS TCP connections, can be tuned for multiple TCP connections per NFS server.
For example, in /etc/system:

* more connections per NFS server
* default 1
set rpcmod:clnt_max_conns=8

will result in 8 TCP connections per NFS server address. In general spreading the NFS workload across 8 connections is plenty to keep a
10Gb link busy. More connections may be counter-productive, as there is some overhead in managing the workload across connections.
Therefore a value of 8 is generally recommended.
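
As a quick, informal check that the extra connections are actually in use after a reboot (the /etc/system tuning takes effect at boot),
the established TCP connections to the NFS server's port 2049 can be counted. The server address 192.0.2.10 below is only an example:

netstat -an -f inet | grep 192.0.2.10.2049 | wc -l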

NFS kernel resources


Each client NFS mount has 8 async worker threads that perform reads (readaheads), writes, and readdir operations. This can be a
limitation with high speed networks and powerful NFS servers. For general use, 32 threads can help throughput, with:

* more worker threads per NFS mount
* default 8
set nfs:nfs3_max_threads=32
set nfs:nfs4_max_threads=32

This tuning is per NFS mount. Use discretion on NFS clients with dozens of active mounts. A single busy mount (such as might be used for
backups) may benefit from more "max threads" - although usually once 64 (sometimes 128) async worker threads per mount are
configured, returns will tend to diminish, even for streaming bulk data (constant read or write) workloads such as backups.

NOTE: this tuning only affects NFS when using buffered (async) I/O. Mounts using the forcedirectio mount option, mounts using noac (for
NFS writes), and applications which use direct I/O (such as Oracle databases with filesystemio_options=setall in init.ora) will not benefit
from the additional async worker threads this tuning provides. In those cases, application threads generate all I/O requests, and the NFS
kernel async workers remain idle.

NFS servers have a maximum number of nfsd server threads. The Solaris 10 default is 16, which is much too small for a real NFS server. In
general, Solaris 10 systems should be tuned with something between 64 and 1024 nfsd threads, depending on the workload and the size of
the server. Solaris 11 NFS servers default to 1024 NFS kernel threads, which should be sufficient for most workloads. See Doc
1004871.1, "Information on determining number of nfsd (Network File System Daemon) threads".
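
The server-side thread count is commonly raised as sketched below; consult Doc 1004871.1 for the full procedure, and verify the property
and file names against the release in use:

# Solaris 10 NFS server: edit /etc/default/nfs, then restart the NFS server service
NFSD_SERVERS=256

# Solaris 11 NFS server: the thread count is a sharectl property
sharectl set -p servers=1024 nfs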

I/O (read/write) sizes

In general, fewer, larger I/O operations moving the same amount of data will provide better performance than a greater number of
smaller reads or writes.

NFS clients running Solaris 11.2 or later default the NFS mount read/write sizes (the mount rsize/wsize values) to 1 MByte (1048576
bytes). Before 11.2, the NFSv3 default rsize/wsize value follows the nfs3_bsize value -- see Doc 1433313.1, "Why is my nfs mount not
showing rsize and wsize despite the 1MB mount options?". When mounting with NFSv4, the default rsize/wsize values are 1 MByte for both
Solaris 10 and Solaris 11.x.
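
To confirm which rsize/wsize values a client mount actually negotiated, the standard nfsstat(1M) mount report can be checked (output
format varies by release):

nfsstat -m
# look for rsize= and wsize= in the Flags line printed for each NFS mount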

However, the actual buffered (without Direct I/O) NFS I/O sizes are controlled by the nfs3_bsize or nfs4_bsize parameters. These have a
32KByte default value. Therefore, to increase the default NFS READ and WRITE size, use this /etc/system tuning:

* default buffered I/O sizes
set nfs:nfs3_bsize=1048576
set nfs:nfs4_bsize=1048576

Keep in mind that 1MB READ/WRITE sizes might not always be the optimal value. For example, when using an Oracle ZFS Storage
Appliance as an NFS server, a 128K (131072) byte read/write size may achieve the best response times if the ZFS record size in use is
128 KBytes. Matching the NFS client's I/O sizes to those of the NFS server's back-end filesystem can avoid unnecessary read/modify/write
cycles.

As usual with performance tuning, mileage varies; test to find the best I/O size for the environment and workload.
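
As an illustrative example of pinning the I/O sizes for a particular mount rather than relying on the defaults (the server name and paths
below are hypothetical):

mount -F nfs -o vers=3,rsize=131072,wsize=131072 nfsserver:/export/data /mnt/data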

Jumbo Frames

In many cases utilizing jumbo frames (an MTU of ~9000 bytes or greater) is more efficient and may improve throughput - provided
jumbo frames are used end-to-end, which usually means an NFS client and server on the same subnet.

Modern 10Gbit/s (or faster) Ethernet NICs, and modern systems, perform quite well with the MTU=1500 Ethernet standard, but for bulk
data transfers jumbo frames are certainly more efficient, as moving bulk data requires about 5X fewer packets. Given that the
processing cost of moving data is primarily per packet rather than per byte, jumbo frames can save CPU resources when handling large
NFS I/O operations.
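
On Solaris 11, the datalink MTU is typically inspected and raised with dladm, as sketched below; the link name net0 is only an example,
the link may need to be taken down to change the MTU, and every switch port plus the NFS server must be configured to match:

dladm show-linkprop -p mtu net0
dladm set-linkprop -p mtu=9000 net0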

Environments moving large amounts of data over NFS (workloads such as backups, or anything that moves large files regularly) will
frequently benefit from partitioning NFS traffic to a dedicated subnet (and therefore dedicated NIC hardware) if possible.

TCP window sizing

No TCP buffer tuning should be necessary for local data-center NFS. As of Solaris 10 kernel patch 147440-04 or later, Solaris NFS uses a
1 MByte default TCP buffer (and window). This is tunable separately from the global TCP buffer sizes; NFS will use the greater of its own
tuning and the system TCP buffer settings. See Doc 1510324.1, "NFS specific TCP socket buffer tuning with Solaris 10 update 10 (08/11) or later".
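
For reference, the global TCP buffer settings can be inspected as follows (a sketch; the property and parameter names assumed here
should be checked against the release in use):

# Solaris 11
ipadm show-prop -p send_buf,recv_buf,max_buf tcp

# Solaris 10
ndd -get /dev/tcp tcp_xmit_hiwat
ndd -get /dev/tcp tcp_recv_hiwat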

Direct I/O mount options

Direct I/O can provide the lowest latency, with the tradeoff of usually significantly reduced throughput. Direct I/O makes sense in some
situations, but most of the time, especially with modern equipment and operating systems, standard NFS async I/O, utilizing the NFS async
worker threads and the NFS filesystem cache, provides sufficiently low latency and the highest throughput.

These mount options will result in NFS using direct I/O:

forcedirectio - This option enforces direct I/O for both NFS writes and NFS reads. No NFS filesystem caching (buffering) will be done,
and no read-aheads. All NFS I/O is driven directly from application threads. NFS async worker threads are not utilized.

noac - This option not only eliminates caching of attributes (which might be important if multiple clients are accessing the same NFS objects
concurrently), but it also eliminates "write behind" caching, which results in NFS writes using direct I/O. That means that all NFS write I/O is
driven by the application thread. The multiple NFS async worker threads per mount (which may have been tuned with the above
nfs:nfs[3,4]_max_threads tuning to increase throughput) are not utilized for NFS writes when the noac mount option is present.

While both mount options have their uses, keep in mind that maximum throughput is usually seriously impacted by either option.
Normal NFS throughput tests using dd or file copies will usually show throughput an order of magnitude slower over high speed
networks with these mount options than without.

The most frequent users of direct I/O are databases. Databases already have a large data cache, and "double buffering" in the NFS
filesystem cache is not desirable. Databases are very latency sensitive, and direct I/O provides the lowest latency for database I/O
operations.

The Oracle database will in fact use direct I/O (via the directio(3C) call, which NFS honors) for the files that require it whenever the
filesystemio_options=setall init.ora configuration setting is present. The advantage of having the database initiate direct I/O itself is that the
database will use direct I/O only on files that require it - such as tablespace data files and redo logs. Other file types - such as archive logs,
which can benefit from buffered I/O - may still utilize normal NFS buffering and the NFS async threads it affords.

Therefore, whenever throughput is a concern, the use of the noac and/or forcedirectio mount options should be carefully considered. Tests
done with and without these options, to understand the consequences, are recommended.
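
A simple before/after comparison along those lines might look like the following sketch (server name, paths, and sizes are examples only;
dd is a crude but convenient indicator of bulk throughput):

mount -F nfs -o forcedirectio nfsserver:/export/test /mnt/dio
mount -F nfs nfsserver:/export/test /mnt/buffered

# write the same amount of data through each mount and compare the elapsed times
timex dd if=/dev/zero of=/mnt/dio/testfile bs=1024k count=4096
timex dd if=/dev/zero of=/mnt/buffered/testfile bs=1024k count=4096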

REFERENCES

NOTE:1003058.1 - NFS on Solaris PSD (Product Support Document)

NOTE:1006098.1 - Basic tuning for NFS over Long Fat Networks (LFN)

NOTE:1285485.1 - GUDS - A Script for Gathering Solaris Performance Data

NOTE:1510324.1 - NFS specific TCP socket buffer tuning with Solaris 10 update 10 (08/11) or later
