HPE MSA 1040/2040 Storage Troubleshooting Guide

Abstract
This document provides information about troubleshooting HPE MSA 1040/2040 Storage. Use this document to troubleshoot and maintain Hewlett Packard Enterprise MSA Storage.

Part Number: 866967-001
Published: June 2016
Edition: 1...
Contents

1 Troubleshooting HPE MSA 1040/2040 Storage (page 4)
Events and LEDs (page 4)
Event Monitoring (page 4)
Events sent as Indications to SMI-S Clients (page 4)
Events requiring FRU replacements (page 5)
LED Indications and Recommended Actions (page 7)
Disk Drives (page 10)
Disk Drive LEDs (page 11)
Disk Error Conditions and Recommended Actions (page 12)
Disk Drive Bay Numbers (page 14)
Drive Status (page 14)
Disk Drive Leftover (page 15)
Vdisk or Disk Group Status (page 16)
...
1 Troubleshooting HPE MSA 1040/2040 Storage

The following sections describe the events, indicators, and procedures used to troubleshoot HPE MSA 1040/2040 Storage:
“Events and LEDs” (page 4)
“Disk Drives” (page 10)
“Management Controller Issue” (page 20)
“Power Supply Faults and Power Cycle” (page 23)
“Chassis Replacement”...
Events requiring FRU replacements

Table 1 Events requiring FRU replacements scenarios
Columns: Event ID, Symptom, Action Required

“Disk Error Conditions and Recommended Actions” (page 12)

Controller
Symptom: The sensors monitored a temperature or voltage in the warning range.
Action Required: Verify the following: The system has two power supplies and both ...
Table 1 Events requiring FRU replacements scenarios (continued)
Columns: Event ID, Symptom, Action Required

Controller
Symptom: The supercapacitor pack is nearing end-of-life.
Action Required: Replace on first occurrence.

Symptom: The indicated disk type is invalid and is not enabled in the current configuration.
Action Required: Replace on first occurrence.
Table 1 Events requiring FRU replacements scenarios (continued)
Columns: Event ID, Symptom, Action Required

... exceeded the capability of an FC SFP.

Symptom: A user inserted an unsupported cable or SFP into the indicated controller host port.
Action Required: Replace with a supported SFP.

“Power Supply Faults and Recommended Actions” (page 25)

LED Indications and Recommended Actions

Table 2 LED Indications and recommended actions
Indicator...
Table 2 LED Indications and recommended actions (continued)
Columns: Indicator, Status, Cause, Action Required

Indicator: The enclosure rear panel FRU OK LED
Status: On; blinking regularly

Cause: System is functioning properly.
Action Required: No action required.

Cause: System is booting.
Action Required: Wait for system to boot.

Cause: The controller module is not powered on.
Action Required: Verify the following: ...
Table 2 LED Indications and recommended actions (continued)
Columns: Indicator, Status, Cause, Action Required

Cause: The link is down.
Action Required:
Check cable connections and reseat if necessary.
Inspect cables for damage. Replace cable if necessary.
Swap cables to determine whether a defective cable causes the fault. Replace cable if necessary.
Verify that the switch, if...
Table 2 LED Indications and recommended actions (continued)
Columns: Indicator, Status, Cause, Action Required

Indicator: The power supply Input Power Source LED

Cause: System is functioning properly.
Action Required: No action required.

Cause: The power supply is not receiving adequate power.
Action Required: Verify that the power cable is properly connected and check the power source to which it connects...
“Two or Multiple Disk Drive Failure” (page 19)
“Vdisk or Disk Group Alerts” (page 19)
“Vdisk or disk group Expansion Frequently Asked Questions (FAQs)” (page 19)

Disk Drive LEDs

Figure 1 2.5” SFF disk drive
Figure 2 3.5” LFF disk drive

Table 3 LEDs - Disk drive
LEDs
Description
Fault/UID (amber/blue)
Table 4 LEDs - Disk drive combinations
Columns: Online/Activity (green), Fault/UID (amber/blue), Description, Recommended Action

Description: Normal operation. The drive is online, but it is not currently active.
Recommended Action: No action required.

Online/Activity (green): Blinking irregularly
Description: The drive is active and operating normally.
Recommended Action: No action required.
Table 5 Disk error conditions and recommended actions (continued)
Columns: Symptom, Recommended Action

... Failed, and data recovery is needed, see “Using the Trust Command” (page 26).

Symptom: Event 8 reports that RAID-6 logic intentionally failed the disk.
Recommended Action: Replace the disk.

Symptom: Event 55 reports a SMART error for the disk.
Recommended Action: If all the volumes and Vdisks or disk groups are online and available, ...
Disk Drive Bay Numbers

Figure 3 MSA 2040 SAN Array SFF enclosure
Figure 4 MSA 2040 SAN Array LFF or supported drive enclosure

Drive Status

Table 6 Drive Status
Columns: Status, Displayed in the CLI or SMU, Description

Status: Available
Displayed in the CLI or SMU: AVAIL
Description: The drive is available for use.
Table 6 Drive Status (continued)
Columns: Status, Displayed in the CLI or SMU, Description

Status: Vdisk or disk group
Displayed in the CLI or SMU: Vdisk or disk group
Description: The drive is used in a Vdisk or disk group.

Status: Vdisk or disk group Spare
Displayed in the CLI or SMU: Vdisk or disk group SP
Description: The drive is a spare assigned to a Vdisk or disk group.
Vdisk or Disk Group Status

Table 8 Vdisk or disk group status
Columns: Status, Displayed in the CLI or SMU, Description

Status: Critical
Displayed in the CLI or SMU: CRIT
Description: The Vdisk or disk group is online; however, some drives are down and the Vdisk or disk group is not fault tolerant.
Table 9 Quarantine during array boot (available in all FW versions) (continued)
Columns: Level, Symptom, Action Required

Level: Raid
Symptom: During boot, both the disk drives from the same sub-Vdisk or subgroup go missing and the Vdisk or ...
Action Required: ... OFFL, follow the troubleshooting steps “Using the Trust Command”...
Table 11 RAID-6 reconstruction scenarios
Columns: Symptom, Action Required

Action Required: To start reconstruction manually, replace each failed disk, and then do one of the following:
Add each new disk as either a dedicated spare or a global spare. Remember that a global spare might be taken by a different critical Vdisk or disk group than the one you intended.
Two or Multiple Disk Drive Failure

Columns: RAID Level, Symptom, Description/Notes, Action Required

RAID Level: RAID 1
Symptom: Two disk drive failures in RAID 1 make it QTOF.
Description/Notes: RAID 1 supports a maximum of two disk drives.
Action Required: If the Vdisk goes QTOF, the Vdisk is de-quarantined automatically after the drives are recognized.
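The relationship between failed drives and remaining redundancy can be sketched as a small lookup. This is only an illustration: the tolerance figures are general RAID properties rather than values taken from this guide, the function name and the returned labels are hypothetical, and the labels are not the status codes that the CLI or SMU displays.

```python
# Number of drive failures each level can absorb before data becomes unavailable.
# General RAID properties (assumption), not values quoted from this guide.
FAILURES_TOLERATED = {"RAID 1": 1, "RAID 5": 1, "RAID 6": 2}


def redundancy_state(raid_level: str, failed_drives: int) -> str:
    """Return an illustrative label for the remaining redundancy."""
    if failed_drives < 0:
        raise ValueError("failed_drives must not be negative")
    tolerated = FAILURES_TOLERATED[raid_level]
    if failed_drives == 0:
        return "fault tolerant"
    if failed_drives < tolerated:
        return "degraded, still fault tolerant"
    if failed_drives == tolerated:
        # Online, but one more failure makes the data unavailable.
        return "critical"
    return "redundancy exceeded"


if __name__ == "__main__":
    for level, failed in [("RAID 1", 1), ("RAID 1", 2), ("RAID 6", 2)]:
        print(level, failed, redundancy_state(level, failed))
```

Two failures in a two-drive RAID 1 set exceed what the level can absorb, which is consistent with the quarantine case described in the table above.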
Columns: Question, Answer

Question: ... group is quarantined. Will the expansion continue once the disk drives return and the Vdisk or disk group de-quarantines?
Answer: ...
NOTE: Avoid using the manual de-quarantine option as it takes the Vdisk or disk group OFFL, which causes data loss.

Question: During the expansion of a Vdisk or disk group (other than RAID 6), if a drive fails making the Vdisk or disk group...
Answer: No, the spare is not used and the reconstruction does not ...
Table 13 Management Controller scenarios
Columns: Symptom, Action Required

Symptom: SC component is functioning properly and all Host I/O are being handled correctly while the MC component is not accepting a user login.
Action Required: Restart MC from the other controller, if possible. If that does not resolve the issue, restart the array.

A storage controller might be...
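The escalation order for an MC that does not accept a login (restart the MC from the other controller first, restart the array only if that fails) can be sketched as follows. The function and parameter names are hypothetical, and the three callables stand in for whatever check and restart mechanism you actually use; no real CLI commands are implied.

```python
from typing import Callable, List


def recover_management_controller(
    login_works: Callable[[], bool],
    restart_mc_from_partner: Callable[[], None],
    restart_array: Callable[[], None],
    partner_available: bool = True,
) -> List[str]:
    """Apply the escalation order and return the actions that were taken."""
    actions: List[str] = []
    if login_works():
        return actions  # nothing to do
    if partner_available:
        # First attempt: restart the MC from the other controller, if possible.
        restart_mc_from_partner()
        actions.append("restart_mc_from_partner")
        if login_works():
            return actions
    # Last resort: restart the array.
    restart_array()
    actions.append("restart_array")
    return actions
```

Keeping the array restart as the final branch mirrors the table: host I/O is still being served by the SC, so the less disruptive action is tried first.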
Instances when the Management Controller becomes Unresponsive

Table 14 Instances when the Management Controller becomes Unresponsive
Columns: Symptom, Cause, Action Required

Symptom: The following messages regularly appear in the array logs: ...
Cause: Firmware Flash is blocked, upgrade retries are evident.
Action Required: For systems that are in a firmware ...
Table 14 Instances when the Management Controller becomes Unresponsive (continued)
Columns: Symptom, Cause, Action Required

Symptom: The following sequence of messages appears in the array logs: ...
Cause: Management Controller and Storage Controller are unable to communicate.
Action Required: For systems that have a communication error between the MC...
If a power cycle is needed, follow these steps:
1. Shut down, dismount, or unmap the hosts (servers) with access to the MSA volumes. This allows any pending writes in application cache or server memory to flush to the MSA controllers' write-back cache.
2. Shut down both array controllers in a dual-controller system, or the single controller in a single-controller system, by using the SMU or CLI interface.
...
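Because the order of these steps matters (hosts must stop writing before the controllers are shut down), the procedure can be tracked with a checklist that refuses out-of-order confirmation. This is a minimal sketch: the class and constant names are hypothetical, and only the two steps that appear above are listed; the later steps of the procedure are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only the steps that appear in the text above; the rest of the procedure is not listed.
POWER_CYCLE_STEPS = [
    "Shut down, dismount, or unmap the hosts with access to the MSA volumes",
    "Shut down the array controllers by using the SMU or CLI",
]


@dataclass
class OrderedChecklist:
    steps: List[str]
    completed: int = 0  # number of steps confirmed so far

    def next_step(self) -> Optional[str]:
        """Return the next unconfirmed step, or None when all are confirmed."""
        if self.completed < len(self.steps):
            return self.steps[self.completed]
        return None

    def confirm(self, step: str) -> None:
        """Confirm a step; raise if it is not the next one in sequence."""
        expected = self.next_step()
        if expected is None:
            raise RuntimeError("all steps are already confirmed")
        if step != expected:
            raise RuntimeError(f"out of order: expected {expected!r}")
        self.completed += 1
```

A caller would confirm each step only after it has actually been carried out on the hosts or the array.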
Power Supply Faults and Recommended Actions

Table 15 Power supply faults and recommended actions
Columns: Symptom, Recommended Action

Symptom: Power supply warning or failure. Related Event code 551.
Recommended Action: Verify the following:
All the power supply units are working.
No slots are left open for more than two minutes. If you need to replace a module, leave the old module in place until you have the replacement, or use a blank cover to close the slot.
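The two-minute limit on open slots can be expressed as a simple elapsed-time check, for example in a maintenance script that records when a module was pulled. The constant and function names below are hypothetical; only the two-minute figure comes from the table above.

```python
import time

MAX_OPEN_SLOT_SECONDS = 120  # "two minutes" from the table above


def slot_open_too_long(opened_at: float, now: float,
                       limit: float = MAX_OPEN_SLOT_SECONDS) -> bool:
    """Return True when a slot has been open longer than the limit (seconds)."""
    return (now - opened_at) > limit


if __name__ == "__main__":
    opened = time.monotonic()  # record when the module was removed
    # ... replacement work happens here ...
    if slot_open_too_long(opened, time.monotonic()):
        print("Slot open for more than two minutes: fit the module or a blank cover.")
```

Using a monotonic clock avoids false results if the system time is adjusted while the slot is open.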
NOTE: Avoid unnecessary chassis replacement. Other than a mechanical failure, it is rare to have a chassis or midplane issue requiring replacement. A FRU is any HPE orderable replacement part. For information on replacing the Chassis, see HPE Chassis Replacement Instructions Guide on the Hewlett Packard Enterprise Support Center website.
NOTE: The trust command is used as a last step in a disaster recovery situation, and must be performed only by someone who knows how to use the trust command and has experience with Vdisk or disk group configurations and reviewing array logs. If you are not sure how to take the corrective action, contact the Hewlett Packard Enterprise Support Center for further assistance.
to the logs. If you do not have the sub-Vdisk details, contact Hewlett Packard Enterprise Support Center for assistance. If you are uncertain about the order the drives Failed or went Leftover, or which drives were added into the Vdisk or disk group for reconstruction, contact Hewlett Packard Enterprise Support Center for further assistance.
Hewlett Packard Enterprise Support Center More Information on Access to Support Materials page:
www.hpe.com/support/AccessToSupportMaterials

IMPORTANT: Access to some updates might require product entitlement when accessed through the Hewlett Packard Enterprise Support Center. You must have an HP Passport set up with relevant entitlements.

Websites

Website: Hewlett Packard Enterprise Information Library
Link: www.hpe.com/info/enterprise/docs...