Improving NFS Performance on HPC Clusters with Dell Fluid Cache for DAS

This Dell technical white paper explains how to improve Network File System I/O performance by using Dell Fluid Cache for Direct Attached Storage in a High Performance Computing Cluster.
Contents
Executive Summary
Introduction
1.1. Dell Fluid Cache for DAS (direct-attached storage)
Solution design and architecture
2.1. NFS storage solution (baseline)
2.2.
Tables
Table 1. NFS server and storage hardware configuration
Table 2. NFS server software and firmware configuration
Table 3. Hardware configuration for DFC
Table 4.
This technical white paper describes how to improve I/O performance in such an NFS storage solution by using Dell Fluid Cache for DAS (DFC) technology. It describes the solution and presents measured cluster-level results for several I/O patterns. These results quantify the performance improvements possible with DFC, especially for random I/O patterns.
NFS server with direct-attached external SAS storage. The configuration of this NFS server is augmented with PCIe SSDs and DFC software for the DFC comparison. A 64-server Dell PowerEdge cluster was used as I/O clients to provide I/O load to the storage solution. The following sections provide details on each of these components as well as information on tuning and monitoring the solution.

2.1. NFS storage solution (baseline)

The baseline in this study is an NFS configuration. One PowerEdge R720 is used as the NFS server.
Figure 2. NFS server

Table 1. NFS server and storage hardware configuration

Server configuration
NFS SERVER    PowerEdge R720
PROCESSORS    Dual Intel(R) Xeon(R) CPU E5-2680 @ 2.70 GHz
MEMORY        128 GB
Mellanox OFED 1.5.3-3.1.0

The baseline described in this section is very similar to the Dell NSS. One key difference is the use of a single RAID controller to connect to all four storage arrays. In a pure-NSS environment, two PERC RAID controllers are recommended for optimal performance.
Table 3. Hardware configuration for DFC

Server configuration
NFS SERVER        PowerEdge R720
CACHE POOL        Two 350GB Dell PowerEdge Express Flash PCIe SSD
SSD CONTROLLER    Internal (slot 4)
Rest of the configuration is the same as baseline, as described in Table 1...
Table 5. I/O cluster details

I/O cluster configuration
CLIENTS    64 PowerEdge M420 blade servers; 32 blades in each of two PowerEdge M1000e chassis
Two PowerEdge M1000e chassis, each with 32 blades...
The NFS server and the attached storage arrays are configured and tuned for optimal performance. These options were selected based on extensive studies done by the Dell HPC team. Results of these studies and the tradeoffs of the tuning options are available in [4].
XFS mount options are also used to optimize the file system. The options used for this solution are similar to those of the Dell NSS: noatime, allocsize=1g, nobarrier, inode64, logbsize=262144, attr2. Details of these mount options are provided in [4].
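As a minimal sketch of how the mount options listed above could be applied, the following shell fragment assembles an /etc/fstab entry and the equivalent mount command. The device name /dev/sdX and the mount point /mnt/nfs_export are placeholders assumed for illustration and are not values from this solution. The script only prints the two lines and changes nothing on the system.

```shell
# Hypothetical placeholders: substitute the real block device and mount point.
DEVICE="/dev/sdX"
MOUNT_POINT="/mnt/nfs_export"

# XFS mount options used for this solution (details are in [4]).
XFS_OPTS="noatime,allocsize=1g,nobarrier,inode64,logbsize=262144,attr2"

# Candidate /etc/fstab entry: device, mount point, type, options, dump, pass.
FSTAB_LINE="$DEVICE $MOUNT_POINT xfs $XFS_OPTS 0 0"

# Equivalent one-off mount command (printed here, not executed).
MOUNT_CMD="mount -t xfs -o $XFS_OPTS $DEVICE $MOUNT_POINT"

echo "$FSTAB_LINE"
echo "$MOUNT_CMD"
```

Keeping the option string in one variable makes it easy to keep the fstab entry and any manual mount command in agreement.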
From a system administrator’s point of view, there are several components in this storage solution that need to be monitored. This section lists a few simple Dell utilities that can be used to track the solution’s health and performance statistics.
For the recommended 350GB SSD drive, the standard warranty is 3 years, 25 PBW. The health of the device can be monitored using Dell OMSA utilities. OMSA reports the SSD “Device Life Remaining” and “Failure Predicted”. “Device Life Remaining” is an indication of the amount of data written to the device, and is calibrated for the PBW portion of the warranty.
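To put the warranty figure in perspective, the short calculation below converts 25 PBW over 3 years into an average daily write budget. It assumes that PBW means petabytes written in decimal units (1 PB = 1,000,000 GB) and that a year has 365 days. Both are assumptions made for illustration, not statements taken from the warranty terms.

```shell
PBW=25              # warranted petabytes written (from the text above)
YEARS=3             # warranty period in years (from the text above)
GB_PER_PB=1000000   # assumption: decimal units, 1 PB = 1,000,000 GB

DAYS=$(( YEARS * 365 ))                    # assumption: 365-day years
GB_PER_DAY=$(( PBW * GB_PER_PB / DAYS ))   # integer division, rounds down

echo "Average write budget: ${GB_PER_DAY} GB/day over ${DAYS} days"
```

Under these assumptions the budget is about 22.8 TB of writes per day, sustained for the full three years.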
2.5.3. Dell Fluid Cache for DAS health and monitoring

DFC provides a simple command-line utility, /opt/dell/fluidcache/bin/fldc, that can be used for configuration and management. Alternatively, DFC can be configured using the OMSA GUI.
• DFC in Write-Back mode (DFC-WB) – This configuration builds on the baseline by adding DFC as described in Section 2.2, and DFC is configured to operate in Write-Back (WB) mode. WB mode allows the caching of writes on the cache pool.
Figure 5. Large sequential write performance (sequential writes against the number of concurrent clients for baseline, DFC-WB, and DFC-WT)

Figure 6. Large sequential read performance
better than the baseline since the data is already in the DFC cache. As expected for read operations, the WB and WT tests have similar performance and reach a peak throughput of ~3,050 MiB/s.
Figure 8. Random read performance (random reads against the number of concurrent clients for baseline, DFC-WB, and DFC-WT)

3.3. Metadata tests

This section presents the results of metadata tests using the mdtest benchmark. In separate tests, one million files were created, stat-ed, and unlinked concurrently from multiple NFS clients on the NFS server.
Figure 9. Metadata file create performance (file creates against the number of concurrent clients for baseline, DFC-WB, and DFC-WT)

File create and file remove tests show similar results, with the baseline outperforming the DFC configuration.
Figure 11. Metadata file remove performance (file removes against the number of concurrent clients for baseline, DFC-WB, and DFC-WT)

3.4. Cold-cache tests

For all the test cases discussed in the previous sections, the file system was unmounted and remounted from the I/O clients and the NFS server between test iterations.
Figure 12 shows that on a cold-cache read for sequential tests, the throughput of the DFC configuration drops from a peak of ~3,050 MiB/s to ~1,050 MiB/s. The data must be pulled from backend storage, hence the drop in performance.
Figure 13 shows that on a cold-cache read for the random tests, the peak IOPS of the DFC configurations drops from ~123,000 IOPS to ~80,000 IOPS. Interestingly, this is still higher than the baseline of ~9,300 IOPS, for reasons explained below.
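As a rough check on the cold-cache figures quoted above, the following sketch computes the relative drop from warm to cold cache and the ratio of cold-cache DFC random reads to the baseline. The inputs are the approximate peak values from the text, so the results are indicative only. The variable names are chosen for this illustration.

```shell
# Approximate peak figures quoted in the text.
WARM_SEQ=3050;    COLD_SEQ=1050    # MiB/s, sequential reads with DFC
WARM_RAND=123000; COLD_RAND=80000  # IOPS, random reads with DFC
BASE_RAND=9300                     # IOPS, random reads, baseline

# Percentage drop from warm cache to cold cache, rounded to a whole percent.
SEQ_DROP_PCT=$(LC_ALL=C awk -v w="$WARM_SEQ" -v c="$COLD_SEQ" \
    'BEGIN { printf "%.0f", (w - c) * 100 / w }')
RAND_DROP_PCT=$(LC_ALL=C awk -v w="$WARM_RAND" -v c="$COLD_RAND" \
    'BEGIN { printf "%.0f", (w - c) * 100 / w }')

# How many times faster cold-cache DFC random reads are than the baseline.
COLD_VS_BASE=$(LC_ALL=C awk -v c="$COLD_RAND" -v b="$BASE_RAND" \
    'BEGIN { printf "%.1f", c / b }')

echo "Sequential read drop:  ${SEQ_DROP_PCT}%"
echo "Random read drop:      ${RAND_DROP_PCT}%"
echo "Cold DFC vs baseline:  ${COLD_VS_BASE}x"
```

With these inputs, sequential reads lose about two thirds of their peak throughput on a cold cache, while cold-cache random reads lose about a third and remain several times faster than the baseline.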
4. Conclusion

This Dell technical white paper describes a method to improve NFS performance using Dell Fluid Cache for DAS in an HPC environment. It presents measured cluster-level results of several different I/O patterns to quantify the performance of a tuned NFS solution and measure the performance boost provided by DFC.
http://en.community.dell.com/techcenter/systems-management/w/wiki/1760.openmanage-server-administrator-omsa.aspx
8. Dell PowerEdge Express Flash PCIe SSD
www.dell.com/poweredge/expressflash
http://support.dell.com/support/edocs/storage/Storlink/PCIe%20SSD/UG/en/index.htm
http://content.dell.com/us/en/home/d/solutions/limited-hardware-warranties.aspx...
Appendix A: Step-by-step configuration of Dell Fluid Cache for

This appendix provides detailed step-by-step instructions on the configuration of the storage solution described in this white paper. Readers familiar with Dell’s NSS line of solutions...
One RAID 0 virtual disk on two drives. This will be used for swap space.
2. Install Red Hat Enterprise Linux (RHEL 6.3) on the RAID 1 virtual disk of the PowerEdge R720. To use any GUI features of DFC or Dell OpenManage, select the “Desktop” group of packages.
Resolve any rpm dependencies as prompted by the OMSA setup scripts. For example, yum install libwsman1 openwsman-client.
7. Install Dell Fluid Cache for DAS v1.0. A DFC license will be needed to use the product. Copy the license file to the server. Execute the DFC setup script and resolve any Linux dependencies as required.
2. Change the OS I/O scheduler to “deadline”: add elevator=deadline to the end of the kernel line in /etc/grub.conf for the .14.1 errata kernel.
3. To work around a known error message with the PCIe SSDs, add the kernel parameter pci=nocrs to the end of the kernel line in /etc/grub.conf for the .14.1 errata kernel.
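After a reboot, the kernel parameters from the steps above can be checked against the running kernel's command line. The sketch below is one possible way to do this and is not part of the original procedure. The helper name has_param is introduced here for illustration, and the script only reads /proc/cmdline without modifying anything.

```shell
# Succeed if kernel parameter $2 appears as a whole word in command line $1.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Command line of the running kernel; empty if /proc/cmdline is unavailable.
CMDLINE=$(cat /proc/cmdline 2>/dev/null || true)

# Report whether each parameter added to /etc/grub.conf is active.
for param in elevator=deadline pci=nocrs; do
    if has_param "$CMDLINE" "$param"; then
        echo "present: $param"
    else
        echo "missing: $param"
    fi
done
```

The active scheduler for an individual block device can also be read from /sys/block/<device>/queue/scheduler, where <device> is the name of the disk in question.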
Partitions : Available
Hot Spare Policy violated : Not Applicable
Encrypted : No
Layout : RAID-0
Size : 557.75 GB (598879502336 bytes)
Associated Fluid Cache State : Not enabled...
Status : Ok
Name : PERC H810 Adapter
Slot ID : PCI Slot 7
State : Ready
Firmware Version : 21.1.0-0007
Minimum Required Firmware Version : Not Applicable
Driver Version : 00.00.06.14-rh1...
Stripe Element Size : 512 KB
Disk Cache Policy : Disabled

A.5. XFS and DFC configuration

In this final step of the configuration on the server, the XFS file system is created, DFC is configured, and the storage is exported to the I/O clients via NFS.
4. /opt/dell/fluidcache/bin/fldcstat is the utility to view and monitor DFC statistics. Check the fldcstat manual pages for options and descriptions of the statistics that are available.
5. Dell OpenManage Server Administrator provides a GUI to configure, administer, and monitor the server. Browse to https://localhost:1311 on the NFS server to see this GUI.
Mount NFS Share on clients.

In addition, for the cold-cache tests described in Section 3.4, the disk managed by Dell Fluid Cache for DAS was disabled and the SSDs that are part of the cache pool were disabled after a write operation.
IOzone Argument    Description
Number of threads
Location of clients to run IOzone on when in clustered mode
Does not unlink (delete) temporary file
Use O_DIRECT, bypass client cache
Give results in ops/sec.
B.2. mdtest

mdtest can be downloaded from http://sourceforge.net/projects/mdtest/. Version 1.8.3 was used in these tests. It was compiled and installed on an NFS share that was accessible by compute nodes.
Metadata file and directory creation test:
# mpirun -np 32 -rr --hostfile ./hosts /nfs/share/mdtest -d /nfs/share/filedir -i 6 -b 320 -z 1 -L -I 3000 -y -u -t -C
Metadata file and directory stat test:
# mpirun -np 32 -rr --hostfile ./hosts /nfs/share/mdtest -d...