Summary of Contents for IBM Elastic Storage System 3000
Page 1
IBM Elastic Storage System 3000 Version 6.0.1 Service Guide SC28-3158-00...
Page 2
IBM welcomes your comments; see the topic “How to submit your comments” on page xi. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
Page 3
Contents
Figures, page v
Tables, page vii
About this information, page ix
  Who should read this information, page ix
  IBM Elastic Storage System information units, page ix
  Related information, page ix
  Conventions used in this information, page x
  How to submit your comments, page xi
Chapter 1. Events, page 1
  Array events, page 1
  Canister events...
Page 5
Figures
1. Unlocking the drive and release latch, page 21
2. Removing the drive, page 21
3. Inserting the new drive, page 22
4. Completing the drive installation, page 22
5. Correct drive blank orientation, page 23
6. Details of Power Supply Units in the management GUI, page 24
7.
Page 7
Tables
1. Conventions, page x
2. Events for the Array component, page 1
3. Events for the Canister component, page 1
4. Events for the Enclosure component, page 4
5. Events for the physical disk component, page 8
6. Events for the Recovery group component, page 11
7.
Page 9
This information is intended for administrators of IBM Elastic Storage System (ESS) that includes IBM Spectrum Scale RAID.
IBM Elastic Storage System information units
IBM Elastic Storage System (ESS) 3000 documentation consists of the following information units.
Information unit | Type of information | Intended users...
Page 10
Enter.
In command examples, a backslash indicates that the command or coding example continues on the next line. For example:
    mkcondition -r IBM.FileSystem -e "PercentTotUsed > 90" \
    -E "PercentTotUsed < 85" -m p "FileSystem space used"
{item}: Braces enclose a list from which you must choose an item in format and syntax descriptions.
Page 11
How to submit your comments
To contact the IBM Spectrum Scale development organization, send your comments to the following email address: scale@us.ibm.com
Page 12
IBM Elastic Storage System 3000: Service Guide...
Page 13
The recorded events can also be displayed through the GUI. The following sections list the RAS events that are applicable to various components of the IBM Spectrum Scale system:
Array events
The following table lists the events that are created for the Array component.
Page 14
Message: ... {0} measured a low temperature value.
Description: ... fallen below the actual low critical threshold value for at least one sensor.
Cause: ... mmlsenclosure command reports the temperature sensor with a failure.
User Action: ... status by using the mmlsenclosure command.
Page 15
Event Type: STATE_CHANGE
Severity: ERROR
Message: The inspection of the CPU slots found a mismatch ...
Description: Number of populated CPU slots, number of enabled CPUs, ...
Cause: The /opt/ibm/gss/tools/bin/ess3kplt command ...
User Action: Check for specific events related to CPUs by using the mmhealth command.
Page 16
Message: The firmware level of adapter {0} is wrong.
Description: ... of the adapter is wrong.
User Action: Check the installed BIOS level using the mmlsfirmware command.
Event: current_failed
Event Type: STATE_CHANGE
Severity: ERROR
Message: currentSensor {0} failed.
Description: The currentSensor state is failed.
Page 17
If there is an issue with the SAS HBA or SAS cable, reboot the node to see if this resolves the issue. If not, contact your IBM representative.
Page 18
Message: ... reports high current.
Description: The DC power supply current is greater than the threshold.
Event: power_high_voltage
Event Type: STATE_CHANGE
Severity: WARNING
Message: Power supply {0} reports high voltage.
Description: The DC power supply voltage is greater than the threshold.
Page 19
Table 4. Events for the Enclosure component (continued)
Columns: Event | Event Type | Severity | Message | Description | Cause | User Action
Event: power_no_power
Event Type: STATE_CHANGE
Severity: WARNING
Message: Power supply {0} has no power.
Description: Power supply has no input AC power. The power supply may be turned off or disconnected from the AC supply.
Page 20
Event Type: STATE_CHANGE
Severity: ERROR
Message: The NVDIMM of the pdisk {0} is failed.
Description: The nvram drive of the disk is in error state.
Cause: The tslsnvramstatus command shows the failed state of the nvram drive of the disk.
Page 21
Message: ... missing.
Description: The pdisk state is missing.
Event: gnr_pdisk_needanalysis
Event Type: STATE_CHANGE
Severity: ERROR
Message: GNR pdisk {0} needs analysis.
Description: The GNR pdisk has a problem that has to be analyzed and solved ...
Cause: mmlspdis...
User Action: Contact IBM support if you are not sure how to solve this problem.
Page 22
Description: ... GNR will read-only from this disk.
Cause: ... command shows that pdisk state contains VWCE
User Action: ... using the sg_wr_modes command.
Event: ssd_endurance_ok
Event Type: STATE_CHANGE
Severity: INFO
Message: ssdEndurancePercentage of GNR pdisk {0} is ok.
Description: ssdEndurancePercentage value is ok.
Page 23
Message: ... {0} is not active.
Description: The recovery group is not active.
Event: gnr_rg_found
Event Type: INFO_ADD_ENTITY
Severity: INFO
Message: GNR recovery group {0} was found.
Description: A GNR recovery group listed in the IBM Spectrum Scale configuration was detected.
Event: gnr_rg_ok
Event Type: STATE_CHANGE
Severity: INFO
Message: GNR recoverygroup {0} is ok.
Description: The recovery group is ...
Page 24
Message: ... {0} is ok.
Description: ... hardware state using xCAT.
Event: server_power_supply_oc_line_12V_failed
Event Type: STATE_CHANGE
Severity: ERROR
Message: OC Line 12V of Power Supply {0} failed.
Description: The GUI checks the hardware state using xCAT.
Cause: The hardware part failed.
User Action: None.
Page 25
Table 7. Server events (continued)
Columns: Event | Event Type | Severity | Message | Description | Cause | User Action
Event: server_power_supply_ov_line_12V_ok
Event Type: STATE_CHANGE
Severity: INFO
Message: OV Line 12V of Power Supply {0} is ok.
Description: The GUI checks the hardware state using xCAT.
Cause: The hardware part is ok.
User Action: None.
Page 26
Message: ... Backplane {0} is ...
Description: ... checks the hardware state using xCAT.
Cause: ... part is ok.
Event: dasd_backplane_failed
Event Type: STATE_CHANGE
Severity: ERROR
Message: DASD Backplane {0} failed.
Description: The GUI checks the hardware state using xCAT.
Cause: The hardware part failed.
User Action: None.
Page 27
Table 7. Server events (continued)
Columns: Event | Event Type | Severity | Message | Description | Cause | User Action
Event: server_cpu_ok
Event Type: STATE_CHANGE
Severity: INFO
Message: All CPUs of server {0} are fully available.
Description: The GUI checks the hardware state using xCAT.
Cause: The hardware part is ok.
User Action: None.
Event: server_cpu_failed
Event Type: STATE_CHANGE
Severity: ERROR
Message: At least one CPU...
Page 28
Message: ... healthy.
Description: ... checks the hardware state using xCAT.
Cause: The hardware part is ok.
User Action: None.
Event: server_failed
Event Type: STATE_CHANGE
Severity: ERROR
Message: The server {0} failed.
Description: The GUI checks the hardware state using xCAT.
Cause: The hardware part failed.
User Action: None.
Page 29
Message: ... degraded.
Description: The vdisk state is degraded.
Event: gnr_vdisk_found
Event Type: INFO_ADD_ENTITY
Severity: INFO
Message: GNR vdisk {0} was found.
Description: A GNR vdisk listed in the IBM Spectrum Scale configuration was detected.
Event: gnr_vdisk_offline
Event Type: STATE_CHANGE
Severity: ERROR
Message: GNR vdisk {0} is offline.
Description: The vdisk state is offline.
Event: gnr_vdisk_ok...
Page 31
Chapter 2. Servicing
Service information is intended for IBM authorized service personnel only. Consult the terms of your warranty to determine the extent to which you can attempt to accomplish any IBM ESS 3000 system maintenance. IBM service support representatives and lab-based services personnel can access service information through the following link.
Page 32
The drive associated with the pdisk name in the previous command should now have a flashing amber fault LED to indicate that it is safe to remove this drive.
Removing the disk physically
1. Press the blue touchpoint to unlock the latching handle, as shown in this figure.
Page 33
Figure 1. Unlocking the drive and release latch
2. Lower the handle and slide the drive out of the enclosure, as shown in this figure.
Figure 2. Removing the drive
Replacing the drive
1. Ensure that the LED indicators are at the top of the drive.
2.
Page 34
    The following pdisks will be formatted on node ess01io1:
    mmvdisk:     /dev/sdrk
    mmvdisk:
    mmvdisk: Location SX32901810-11 is Enclosure 2 Drive 11.
    mmvdisk: Pdisk e2s11 of RG BB01L successfully replaced.
    mmvdisk: Carrier resumed.
Page 35
6. Repeat the steps listed in the Preparing disks for replacement, Removing the disk physically, and Replacing the drive sections for each pdisk that needs to be replaced, as marked in the output of the mmvdisk pdisk list --replace command.
Removing and replacing a drive blank
Use the following procedures to remove a faulty drive slot filler and replace it with a new one from stock.
Page 36
• This procedure requires access to either the management GUI or the CLI as a root user. IBM service personnel need to coordinate with the customer to work on this procedure.
• Do not insert a PSU if the PSU slot does not contain a power interposer.
Page 37
1. In the management GUI, you can identify the faulty PSU from the Monitoring > Hardware page. You can also run the mmhealth node show enclosure command on the canister of the affected enclosure. To identify the affected enclosure, run the mmhealth cluster show enclosure command. The faulty enclosure will be in a DEGRADED or FAILED state.
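The state check in this step can be sketched in Python. This is only an illustration: the guide does not show the output format of the mmhealth commands, so the sketch assumes the enclosure names and states have already been collected into a dictionary by hand or by some other means. The function name and the enclosure names used with it are hypothetical.

```python
# States that the step above treats as marking a faulty enclosure.
FAULTY_STATES = {"DEGRADED", "FAILED"}


def faulty_enclosures(states: dict[str, str]) -> list[str]:
    """Return the names of enclosures whose state is DEGRADED or FAILED.

    `states` maps an enclosure name to the state reported for it.
    How that mapping is obtained is outside this sketch.
    """
    return sorted(
        name for name, state in states.items()
        if state.strip().upper() in FAULTY_STATES
    )
```

For example, calling faulty_enclosures with one HEALTHY and one DEGRADED entry returns only the name of the DEGRADED enclosure.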
Page 38
5 minutes. Operating for longer than this period might cause the control enclosure to shut down due to overheating.
• No tools are required to complete this task. Do not remove or loosen any screws.
Page 39
1. Remove the power supply unit, as described in “Removing and replacing a power supply unit” on page
Removing the power interposer
2. Remove the power interposer by pulling on the blue handle that is located beneath the PSU slot. Figure 9 on page 27 shows an example.
Page 40
MES instructions.
ESS 3000 storage drives MES upgrade
An offline IBM Elastic Storage System 3000 (ESS 3000) MES upgrade is supported for customers who want to upgrade a 12-drive ESS 3000 to a 24-drive ESS 3000.
Page 41
• When the resizing is done and the upgraded ESS 3000 is back online, you can perform other ESS and GPFS operations.
Note: GPFS preferentially uses the new network shared disks (NSDs) to store data of a new file system. GPFS has four new NSDs that are the same as the four original NSDs, so the workload per server is the same as it was before.
Page 42
The goal of this procedure is primarily to add a third high-speed adapter into each ESS 3000 canister. The customer can add supported InfiniBand or Ethernet adapters into the third PCI slot.
• The PCI address is af:00.1
• The adapter type is ConnectX-5 [ConnectX-5 Ex]
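As a rough illustration of how the PCI address and adapter type listed above could be checked, the following Python sketch scans text in the usual lspci listing layout (slot address, a space, then the device description). This is an assumption: the guide does not say which tool reports these values, and the function name is hypothetical. The sketch only inspects text that is passed to it and does not run any command.

```python
def has_adapter(listing: str, address: str = "af:00.1",
                model: str = "ConnectX-5") -> bool:
    """Return True if a line of `listing` shows `model` at PCI `address`.

    Each line is expected to look like '<slot> <description>'.
    A slot with a leading PCI domain (for example '0000:af:00.1')
    is also accepted.
    """
    for line in listing.splitlines():
        slot, _, description = line.strip().partition(" ")
        if (slot == address or slot.endswith(":" + address)) and model in description:
            return True
    return False
```

Passing the address and model as parameters keeps the defaults tied to the values quoted in this guide while allowing the same check for other slots.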
Page 43
Figure 12. Ethernet ports on canister 1 (upper canister)
Figure 13. Ethernet ports on canister 2 (lower canister)
Note: These images show the PCIe ports for two adapters for each canister. For the MES upgrade, ESS 3000 has a third adapter with two more ports.
Offline adapter MES procedure
1.
Page 44
Confirm with the customer that all required steps are done, and then disconnect the power cables.
b. To shut down the storage enclosure, unplug the power cords on both sides of the ESS 3000 system.
Page 45
After the basic checks are complete, place everything back into the frame and reinsert the power cables. This step restarts the nodes. You can use the procedure in the Installing chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide to do the following steps:
1) Connect your laptop point-to-point to each container technician port.
Page 46
When the server is up again, do a basic ping test between the canisters over the high-speed interface.
c. If the ping is successful, start GPFS again by issuing the following command:
    # mmstartup -N <node class name>
Page 47
d. Ensure that node servers are active before you do the next step by issuing the following command:
    # mmgetstate -a
e. Turn on the GPFS automount by issuing the following command:
    # mmchfs <filesystem> -A yes
f. Turn on the GPFS autoload by issuing the following command:
    # mmchconfig autoload=yes
g.
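The commands in steps c through f can be collected into one ordered list, which is convenient for review before anything is run. The Python sketch below only builds and prints the command lines quoted in this guide and does not execute them. The function names and the example values ess3k_nc and fs1 are hypothetical placeholders for the node class name and file system. Step g is cut off in this excerpt and is not covered.

```python
import shlex


def post_upgrade_commands(node_class: str, filesystem: str) -> list[list[str]]:
    """Build the command lines for steps c-f, in order, as argument lists."""
    return [
        ["mmstartup", "-N", node_class],       # c. start GPFS again
        ["mmgetstate", "-a"],                  # d. confirm the nodes are active
        ["mmchfs", filesystem, "-A", "yes"],   # e. turn on GPFS automount
        ["mmchconfig", "autoload=yes"],        # f. turn on GPFS autoload
    ]


def render(commands: list[list[str]]) -> list[str]:
    """Render each argument list as a shell-quoted command string."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # Placeholder values; substitute the real node class and file system.
    for line in render(post_upgrade_commands("ess3k_nc", "fs1")):
        print("#", line)
```

Keeping the commands as argument lists avoids quoting mistakes if they are later passed to a process runner, and the order of the list mirrors the order of the steps, which matters because step d must confirm the active state before automount and autoload are changed.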
Page 48
The goal of this procedure is to add additional memory into each ESS 3000 canister. When the physical memory is installed, the customer can complete the operation by increasing the GPFS page pool. For more information, see the Planning for hardware chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide.
Page 49
After the basic checks are complete, place everything back into the frame and reinsert the power cables. This step restarts the nodes. You can use the procedure in the Installing chapter of the IBM Elastic Storage System 3000: Hardware Planning and Installation Guide to do the following steps:
1) Connect your laptop point-to-point to each container technician port.
Page 50
When the server is up again, do a basic ping test between the canisters over the high-speed interface.
c. If the ping is successful, start GPFS again by issuing the following command:
Page 51
    # mmhealth node show
ESS 3000 storage drives concurrent MES upgrade
An online IBM Elastic Storage System 3000 (ESS 3000) MES upgrade is supported for customers who want to upgrade a 12-drive ESS 3000 to a 24-drive ESS 3000. To upgrade the system, NVMe drives of the same size as the existing 12 drives must be used. This MES upgrade doubles the available storage capacity in the existing ESS 3000.
Page 52
• All new or existing building blocks must be at the ESS 5.3.5.2 or ESS 3000 6.0.0.2 level. If the setup has any protocol nodes, these nodes must also be upgraded to ESS 5.3.5.2 levels (underlying code IBM Spectrum Scale 5.0.4.3 must be verified by using the gssinstallcheck or essinstallcheck command).
Page 53
7. To check whether all 24 NVMe drives have the latest firmware level, issue the following command from one of the canisters:
    # mmlsfirmware --type drive
Example:
    enclosure                            firmware  available
    type       product id  serial number level     firmware  location
    ----       ----------  ------------- --------  --------...
Page 56
4. Check the state of GPFS on both canisters by issuing the following command:
    # mmgetstate -N this ESS 3000 node class
A sample output is as follows:
    Node number  Node name  GPFS state
    -------------------------------------------
Page 57
    ess3ka-ib  arbitrating
    ess3kb-ib  active
5. Issue the following command until GPFS is in the active state on both canisters:
    # mmgetstate -N this ESS 3000 node class
A sample output is as follows:
    Node number  Node name  GPFS state
    -------------------------------------------
    ess3ka-ib  active
    ess3kb-ib...
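The "issue the command until active" loop in this step can be sketched in Python. The parser below is a minimal sketch that assumes the column layout of the sample output above (an optional node number, then the node name, then the GPFS state), and it does not run mmgetstate itself. The caller supplies a function that returns the command output as text. All function names, the attempt count, and the delay are hypothetical.

```python
import time
from typing import Callable


def parse_gpfs_states(output: str) -> dict[str, str]:
    """Map node name -> GPFS state from mmgetstate-style text."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator, and the header row.
        if len(fields) < 2 or fields[0] == "Node":
            continue
        # The last two columns are the node name and the state; a leading
        # node-number column, when present, is ignored.
        states[fields[-2]] = fields[-1]
    return states


def all_active(output: str) -> bool:
    """True when at least one node is listed and every node is active."""
    states = parse_gpfs_states(output)
    return bool(states) and all(s == "active" for s in states.values())


def wait_until_active(fetch: Callable[[], str], attempts: int = 30,
                      delay: float = 10.0) -> bool:
    """Call `fetch` repeatedly until all nodes are active or attempts run out."""
    for _ in range(attempts):
        if all_active(fetch()):
            return True
        time.sleep(delay)
    return False
```

Passing the output in through a callable keeps the parsing logic separate from how the command is invoked, so the same check works on saved output as well as on live output.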
Page 60
Table 10. FRU Part Numbers (continued)
Description | Part Number
END5 power cord (9.2 ft), Drawer to IBM PDU - C13/C20 (250V/10A) for India | 0000001PP688
Trusted Platform Module (TPM) | 0000001YM315
Drive Blank | 0000001YM705
DIMM Filler | 0000001YM789
PCIe riser card with bracket assembly...
Page 61
... C14, 200-240V/10A for India
END0 Power Cord M (6.5 foot), Drawer to IBM PDU - C13/C14 (250V/10A) for India | 0000001KV680
END1 Power Cord M (9 foot), Drawer to IBM PDU - C13/C14 (250V/10A) for India | 0000001KV681
Page 62
Table 11. Cable Part Numbers (continued)
Description | Part Number
END2 Power Cord m (14 ft), Drawer to IBM PDU - C13/C14 (250V/10A) for India | 0000001KV682
5M, Blue Ethernet Cat 5E cable | 0000002CL468
5M, Green Ethernet Cat 5E cable | 0000002CL469
5M, Yellow Ethernet Cat 5E cable...
Page 65
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
Page 66
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Page 67
• See refers you from a non-preferred term to the preferred term or from an abbreviation to the spelled-out form.
• See also refers you to a related or contrasting term.
For other terms and definitions, see the IBM Terminology website: http://www.ibm.com/software/globalization/terminology
building block
    A pair of servers with shared disk enclosures attached.
Page 68
See also cluster. (3) The routing of all transactions to a second controller when the first controller fails. See also cluster.
failure group
    A collection of disks that share common access paths or adapter connection, and could all become unavailable through a single hardware failure.
Page 69
See file encryption key (FEK).
file encryption key (FEK)
    A key used to encrypt sectors of an individual file. See also encryption key.
file system
    The methods and data structures used to control how data is stored and retrieved.
file system descriptor
    A data structure containing key information about a file system.
Page 70
Provides a way to control the bundling of several physical ports together to form a single logical channel.
logical partition (LPAR)
    A subset of a server's hardware resources virtualized as a separate computer, each with its own operating system. See also node.
LPAR
    See logical partition (LPAR).
Page 71
management network
    A network that is primarily responsible for booting and installing the designated server and compute nodes from the management server.
management server (MS)
    An ESS node that hosts the ESS GUI and xCAT and is not connected to storage. It must be part of a GPFS cluster.
Page 72
Data that is associated with a recovery group.
RKM server
    See remote key management server (RKM server).
See Serial Attached SCSI (SAS).
secure shell (SSH)
    A cryptographic (encrypted) network protocol for initiating text-based shell sessions securely on remote computers.
Page 73
Serial Attached SCSI (SAS)
    A point-to-point serial protocol that moves data to and from such computer storage devices as hard drives and tape drives.
service network
    A private network that is dedicated to managing POWER8® servers. Provides Ethernet-based connectivity among the FSP, CPC, HMC, and management server.
See symmetric multiprocessing (SMP).
Page 75
Recovery group events 11
server events 12
virtual disk events 17
documentation ix
resources ix
IBM Elastic Storage System 3000 28
IBM Spectrum Scale events 1, 4, 8, 11, 12, 17
RAS events 1, 4, 8, 11, 12, 17
information overview ix...