What throughput can I get with NFS?

This article concentrates on NFS over gigabit Ethernet, the most common form in data centres at the time of writing.

The first thing to set is expectations when it comes to NFS. Your NFS server might have an 8Gb connection to a high-speed SAN, but if your network is 1Gb then that only equates to 125MB/s. Even then, in ideal lab conditions, using Jumbo Frames and tuning for a particular application, only speeds of 112MB/s have been obtained. In fact it is worse than that: in real life, empirical evidence shows that a throughput of 30-60MB/s is more like what you should expect.
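To see where your own setup sits in that range, a quick and dirty dd test over the mount is a reasonable sanity check. This is a minimal sketch, assuming the export is already mounted at /mnt/nfs (a hypothetical path):

```
# Write a 1GB file; conv=fdatasync forces the data to the server before
# dd reports, so local caching doesn't inflate the figure.
dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 conv=fdatasync

# Drop the client page cache (needs root) so the read back really
# travels over the wire, then time the sequential read.
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/nfs/testfile of=/dev/null bs=1M
```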

To put those figures in perspective, a modern desktop PC drive peaks at around 80-90MB/s, and a software RAID5 of 4 cheap disks will hit 200MB/s. So how do you get your multimillion pound SAN system to deliver higher throughput to your servers than a few hundred pounds [GBP] of desktop PC hard disk? You have a few options:

- switch to 10Gb Ethernet; still very expensive, but with the potential to increase NFS throughput tenfold.
- add more Ethernet ports and aggregate them; this requires the correct network infrastructure and O/S support. 2 ports would double throughput. [not tested] A configuration sketch follows this list.
- add more servers.
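For the aggregation option, here is a hedged sketch of channel bonding on a RHEL-style Linux box; the interface names, address and bonding mode are assumptions, and 802.3ad mode in particular needs switch-side support:

```
# /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical values)
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=802.3ad miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```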

The final option above, 'add more servers', won't increase the maximum throughput per server but will allow more clients to be supported at nearer to maximum throughput. For example, if you have one client accessing an NFS server it might get the whole server to itself and see, say, 40MB/s; add a second client and they now both only get 50% of the maximum. But add a second server and potentially they are both able to get 100% each.

The first two options mentioned above both increase what that 100% equates to in raw throughput. All the options can coexist, so the ultimate NFS solution would have multiple NFS servers, each with multiple 10Gb network connections.

Before investing all that money you need to consider whether there are any better ways of sharing data between servers. The answer has to be: it depends on what you are doing.

Some sort of clustered file system would in most cases probably be better, and you will get nearer to true disk/array speeds. For example, if using Oracle databases you could use Oracle ASM, or on a proprietary UNIX you could license their clustered file system. If you are on Linux you could use Oracle's free solution OCFS2, or GFS on Red Hat.
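As an illustration of how little is involved once the cluster layer is in place, here is a hedged OCFS2 sketch; the device and mount point are hypothetical, and the cluster configuration is assumed to be done already:

```
# Format the shared LUN and mount it; every node in the cluster
# can then mount the same device read-write.
mkfs.ocfs2 -L shared_vol /dev/sdb1
mount -t ocfs2 /dev/sdb1 /u01/shared
```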

If there are financial, technical or support reasons why you cannot use a clustered file system, then some form of NFS system (or other network file system) is needed. You also have a number of choices here:

- proprietary NAS like NetApp
- NFS cluster, e.g. Red Hat NFS cluster
- pseudo NFS cluster, using a VIP and a clustered file system or rsync
- standalone server

The last option will obviously give you the least performance and the least resilience but will be the cheapest, with a proprietary system probably giving you the best resilience and performance at the highest price.

Performance can be tweaked depending on the load; for example, for a database you want the largest NFS read and write size possible (32k on NFSv3 and 64k on NFSv4) and you will want to turn off all the file cache options.
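As a sketch of what that might look like for an Oracle database over NFSv3 (server name and paths are hypothetical; check Oracle's current recommendations for your platform):

```
# /etc/fstab: 32k r/wsize and attribute caching off (actimeo=0),
# i.e. the file cache options turned off as described above.
nfsserver:/export/oradata  /u02/oradata  nfs  rw,bg,hard,tcp,vers=3,rsize=32768,wsize=32768,actimeo=0,timeo=600  0 0
```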

For large sequential reads, things like DB Replay, you will most likely still want rsize and wsize as large as possible, but may want the file buffer cache on (to be tested).

For smaller files or random access type applications you will probably want to reduce the rsize and wsize, as performance tools like bonnie++ show a performance loss at high NFS r/wsizes for these types of applications. But the only way to truly test whether block size makes a difference for your application is to run lifelike tests using a load tool, e.g. LoadRunner.
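A hedged example of driving bonnie++ at an NFS mount (the mount point is hypothetical):

```
# -d: directory to test in, -s: total file size in MB (use roughly
# 2x the client's RAM so the page cache can't satisfy the reads),
# -u: user to run as when invoked as root.
bonnie++ -d /mnt/nfs/bench -s 4096 -u nobody
```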

For programs needing large block sizes, with no Jumbo Frames you can expect to get about 50MB/s; for smaller block sizes you could see maximum sequential read/write drop to 30MB/s, but that doesn't mean setting the block size higher will make any overall difference to the application performance. As already mentioned, it can show a performance loss. In other tests for Oracle E-Business Suite, the r/wsize made little difference to the autoconfig or relinking stages. Relinking in fact would appear to be one of the operations most affected when moving from a normal filesystem to NFS, i.e. moving from local filesystems to a shared APPL_TOP. The increase in time taken is very significant. Luckily, relinking is also an operation carried out less frequently than others.

The general NFS configuration recommendations given by Oracle are for Oracle databases and don't really apply to a shared APPL_TOP; in fact you will get better performance with the defaults than with these database recommendations. This is because a database will want the largest read/write size possible and will also turn off the filesystem buffer, the reason being that the memory saved is better used for the SGA and user sessions. There are some tweaks that do seem to make a difference for applications (a sketch follows the list):

- no_wdelay
- r/wsize the same size as the filesystem block size, e.g. 4k
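A sketch of those two tweaks, with hypothetical server, client and path names:

```
# Server side, /etc/exports: export with no_wdelay.
/export/appltop  appsnode1(rw,sync,no_wdelay)

# Client side: mount with r/wsize matched to a 4k filesystem block size.
mount -t nfs -o rsize=4096,wsize=4096 nfsserver:/export/appltop /mnt/appltop
```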

The above won't give the maximum sequential read and write, but it should give you a good balance between sequential and random access.

The other thing that can give you more throughput is load balancing. Load balancing and failover with NFS can be problematic. For example, using NFSv3 over TCP, repointing the client to a different server can cause stale NFS handles. This makes that method difficult if using the automounter or a hardware load balancer.

Possible ways around this are switching to UDP or to NFSv4. Another possibility would be to build an NFS cluster similar to the one in this Red Hat link. None of these options have had failover tested by myself yet. But I did find that, with a similar setup on a good network, i.e. local within the data centre, there is little performance difference between NFSv3 using UDP or TCP and NFSv4 (which only works over TCP). So there should be no performance issue switching between them.
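For reference, the variants compared above would be mounted along these lines (server and paths hypothetical):

```
mount -t nfs -o vers=3,proto=udp nfsserver:/export/data /mnt/data  # NFSv3 over UDP
mount -t nfs -o vers=3,proto=tcp nfsserver:/export/data /mnt/data  # NFSv3 over TCP
mount -t nfs4 nfsserver:/export/data /mnt/data                     # NFSv4 (TCP only)
```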

For load balancing with no failover you could have multiple NFS servers and point different application servers at different NFS servers. This would give you more resilience for the application, although not for the application server itself, and better performance as the load will be spread.

For additional resilience you could have hardware or automounter failover to the other NFS servers, but you will probably have to remount the filesystem after killing off the application, then restart it. This would give you a scenario where one NFS server goes down but you only lose a percentage of your application, say 50%. You can then either wait for the failed server to come back online or switch to another NFS server. To achieve this, all you would do is follow the instruction above, i.e. remount and restart, and it will connect to an available NFS server.
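A hedged sketch of that manual failover, with hypothetical names throughout:

```
# After stopping the application that uses the mount:
umount -f /mnt/appltop                            # drop the dead mount
mount -t nfs nfs2:/export/appltop /mnt/appltop    # remount from a surviving server
# ...then restart the application.
```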
