WIN-63-0100-03 HITACHI INDUSTRIAL COMPUTER HF-W6500 Model 60 RAS FEATURES MANUAL USER’S MANUAL...
Read and keep this manual.
• Read safety instructions carefully and understand them before starting your operation.
• Keep this manual at hand for reference.
SAFETY INSTRUCTIONS Carefully read and fully understand the safety precautions below before operating the equipment. ⚫ Operate the equipment by following the instructions and procedures described in this manual. ⚫ Pay attention especially to safety precautions displayed on the equipment or in this manual.
SAFETY INSTRUCTIONS (Continued)
1. SAFETY WARNINGS IN THIS MANUAL
1.1 Safety Warning Indicated as “NOTICE”
⚫ When failure of a drive is predicted, a drive hardware failure is likely to occur in the near future. We recommend that you back up the data and replace the drive.
SAFETY INSTRUCTIONS (Continued) ⚫ If this equipment continues to operate after a fan failure is detected, internal parts such as processors will not be cooled sufficiently. In such a case, equipment malfunctions might occur, causing erratic system operation or damage to components. If possible, enable the automatic shutdown feature. ⚫...
• Microsoft®, Windows®, Windows Server®, Windows NT®, and Visual Basic® are trademarks or registered trademarks of U.S. Microsoft Corporation in the United States and other countries. • All other product names (software and hardware) not from Hitachi described in this manual are registered trademarks, trademarks, or products of their respective owners.
<Note for storage capacity calculations>
● Memory capacities and requirements, file sizes and storage requirements, etc. must be calculated according to the formula 2^n. The following examples show the results of such calculations by 2^n (to the right of the equals signs).
1 KB (kilobyte) = 1,024 bytes
1 MB (megabyte) = 1,048,576 bytes
1 GB (gigabyte) = 1,073,741,824 bytes...
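The three figures above are 2^10, 2^20, and 2^30. As a minimal sketch for checking them, the following C function returns the number of bytes in one binary unit. The function name binary_unit_bytes is hypothetical and is not part of the RAS library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes in one binary unit.
   power = 1 for KB, 2 for MB, 3 for GB.
   Each step multiplies by 2^10, so the result is 2^(10 * power). */
uint64_t binary_unit_bytes(unsigned power)
{
    assert(power <= 6);                 /* 2^60 still fits in 64 bits */
    return (uint64_t)1 << (10u * power);
}
```

For example, binary_unit_bytes(2) gives the byte count of 1 MB as defined in this note.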
CONTENTS
SAFETY INSTRUCTIONS
CHAPTER 1 CAPABILITIES OF THE RAS FEATURE
CHAPTER 2 ITEMS MONITORED BY THE RAS FEATURE
2.1 Fan Monitoring
2.2 Monitoring Temperature inside the Chassis
2.3 Drive Failure Prediction Function (SMART Monitoring)
2.4 Drive Usage Monitoring
...
4.5.2 Hardware conditions that can be acquired by using remote notification
4.5.3 Enabling the remote notification
4.5.4 Objects in the extended MIB for HF-W
4.5.5 Extended MIB file for HF-W
4.6 Status Acquisition by Using the RAS Library
CHAPTER 5 CONTROLLING THE HARDWARE
...
8.1.1 Overview
8.1.2 Using the simulation function
8.1.3 Precautions when you use the Simulation Tool window
8.1.4 Event log entries
8.1.5 Remote notification
...
FIGURES
Figure 1-1 RAS Setup Window
Figure 1-2 Hardware Status Icon
Figure 2-1 State of the Status Lamp and the CPUSTOP Contact
Figure 2-2 Example of a Flow Chart of Monitoring the Operational State of a User Program
...
TABLES
Table 1-1 Overview of the RAS Feature
Table 2-1 State of the Equipment in Terms of Fan Monitoring and State of the MCALL Contact
Table 2-2 State of the Equipment in Terms of Monitoring Temperature inside the Chassis and State of the MCALL Contact
...
Table 6-11 Values Stored in Level in the HFW_ARRAY_STATUS Structure
Table 6-12 Values Stored in DiskNumber in the HFW_ARRAY_STATUS Structure
Table 6-13 Values Stored in Status in the HFW_ARRAY_STATUS Structure
Table 6-14 Possible Combinations that Can Be Stored in Status in the HFW_ARRAY_STATUS Structure
...
CHAPTER 1 CAPABILITIES OF THE RAS FEATURE
The HF-W series comes with the Reliability, Availability, Serviceability (RAS) feature that you would expect from a highly reliable industrial computer. The following is an overview of the RAS feature.
Table 1-1 Overview of the RAS Feature
Category Item...
1. CAPABILITIES OF THE RAS FEATURE <Monitoring> (1) Hardware status monitoring Monitors the hardware status of this equipment including the status of the fans and drives as well as the temperature inside the chassis. (2) OS hung-up monitoring Monitors the operational state of the OS using a dedicated timer implemented on this equipment.
1. CAPABILITIES OF THE RAS FEATURE <Status check> (5) Hardware status window Displays the hardware status of this equipment using a graphical interface. There is always an icon in the notification area of the taskbar to display the hardware status. Figure 1-2 Hardware Status Icon (6) Event notification Enables a user application to check the hardware status of this equipment by monitoring the...
(12) Shutdown using library functions
This function enables a user application to shut down this equipment by using the RAS library.
(13) Startup suppression when severe failure occurs
Suppresses startup of this equipment when a failure such as a fan failure is detected during OS startup in order to protect the hardware.
1. CAPABILITIES OF THE RAS FEATURE (20) Logging the trend of the temperature inside the chassis This function periodically measures the temperature inside the chassis of this equipment and records the data into a file. <Simulation> (21) Hardware status simulation Simulates the hardware status of this equipment.
2. ITEMS MONITORED BY THE RAS FEATURE CHAPTER 2 ITEMS MONITORED BY THE RAS FEATURE This chapter explains the items monitored by the RAS Feature. For information about the hardware specifications of the RAS external contacts interface (optional) described in this chapter and the usage of each contact, refer to “HF-W6500 Model 60 INSTRUCTION MANUAL (manual number WIN-62-0074)”.
2.2 Monitoring Temperature inside the Chassis
This function monitors the temperature inside the chassis using the temperature sensor in this equipment and notifies you by the following methods when the temperature inside the chassis becomes abnormally high.
2. ITEMS MONITORED BY THE RAS FEATURE 2.3 Drive Failure Prediction Function (SMART Monitoring) The hard drives in this equipment have the Self-Monitoring, Analysis and Reporting Technology (SMART) function, which constantly monitors the condition of the drives and anticipates a failure before the failure manifests itself. If the drive is an SSD, SMART can also monitor for the status in which the rate of writes to the SSD increases and exceeds a certain value.
2.4 Drive Usage Monitoring
The drive usage monitoring function adds up the power-on hours of a drive in this equipment. If the total power-on hours exceed the set value, this function notifies you by the methods below. By using this function, you can keep track of when a drive should be replaced and prevent drive failures caused by using the drive for too long.
2. ITEMS MONITORED BY THE RAS FEATURE 2.5 Memory Monitoring Memory with Error Checking and Correcting (ECC) is installed in this equipment. A single bit error in the memory can be automatically corrected without causing any problems to the operation of the equipment. On the other hand, if a single bit error occurs as shown below, there is a possibility that the memory has failed, and we recommend replacing the memory modules from the viewpoint of preventive maintenance.
2.6 OS Hung-up Monitoring
The OS hung-up monitoring function uses a timer implemented on this equipment for monitoring the operational state of the OS (OS monitoring timer). The timer detects a situation in which a process with the highest priority (real-time priority class) cannot run (hereinafter referred to as “OS hung-up”) because, for example, the kernel has run away or a driver is consuming all of the CPU time.
Figure 2-1 shows how the state of the status lamp and the state of the CPUSTOP contact change.
[Figure 2-1 State of the Status Lamp and the CPUSTOP Contact. Timeline labels: Power on; OS monitoring timer starts being retriggered; OS monitoring timer is timed out; OS shutdown; Turning off the power]...
2. ITEMS MONITORED BY THE RAS FEATURE 2.7 Watchdog Timer Monitoring This equipment has a watchdog timer. By retriggering a watchdog timer automatically, this function monitors whether processes are scheduled properly. In addition, this function can be used, for example, for monitoring the operational state of a user program by using dedicated library functions.
2. ITEMS MONITORED BY THE RAS FEATURE 2.7.2 Using a watchdog timer for monitoring a user program In order to use a watchdog timer for monitoring the operational state of a user program, you can use a configuration, for example, where the user program periodically retriggers the watchdog timer (that is, resets the timeout counter of the watchdog timer to the initial value) and another program checks whether timeout occurs for the watchdog timer.
In Figure 2-2, the monitored program is configured to periodically retrigger the watchdog timer. The monitoring program periodically checks the timeout counter of the watchdog timer, and if the value of the timeout counter (that is, the remaining time until the timeout) is 0, it determines that the timeout occurred.
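The retrigger-and-check pattern described above can be illustrated with a small simulation in portable C. This is only a sketch of the logic, not the RAS library: the type SimWdt and the functions sim_wdt_retrigger, sim_wdt_tick, and sim_wdt_timed_out are hypothetical names, and time is modeled as abstract ticks rather than the real timer hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the watchdog timer (not the RAS library). */
typedef struct {
    unsigned timeout;    /* initial value of the timeout counter, in ticks */
    unsigned remaining;  /* remaining time until the timeout, in ticks */
} SimWdt;

/* Monitored program: reset the timeout counter to its initial value. */
void sim_wdt_retrigger(SimWdt *w)
{
    assert(w != NULL);
    w->remaining = w->timeout;
}

/* Passage of time: count down and stop at 0. */
void sim_wdt_tick(SimWdt *w, unsigned ticks)
{
    assert(w != NULL);
    w->remaining = (ticks >= w->remaining) ? 0 : w->remaining - ticks;
}

/* Monitoring program: a remaining time of 0 means the timeout occurred. */
bool sim_wdt_timed_out(const SimWdt *w)
{
    assert(w != NULL);
    return w->remaining == 0;
}
```

In this model, a monitored program that calls sim_wdt_retrigger more often than every timeout ticks never lets the counter reach 0. If it stalls, the next check by the monitoring program sees 0 and can treat the program as abnormal.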
2.8 RAID Monitoring [B/T Model only]
The B and T Models have a RAID1 feature (hereinafter denoted simply as “RAID”), generally known as “mirror disk (mirroring)”. The RAID monitoring function monitors the status of the RAID on this equipment. If there is any change in the status of the RAID, this function notifies a user or an application using the following methods.
2. ITEMS MONITORED BY THE RAS FEATURE 2.8.1 State transition of the RAID Table 2-6 lists RAID statuses and their respective descriptions. Figure 2-3 shows the state transitions. Table 2-6 RAID Statuses and Their Descriptions RAID status Description Detailed information Normal Redundant data is intact, and the RAID is operating normally.
2. ITEMS MONITORED BY THE RAS FEATURE 2.8.2 Note about media error Media error is a status where there is a problem in data integrity while the RAID status is normal. If read errors occur at the copy source drive during a rebuild process, the rebuild process will complete but the sector data that could not be read is already lost, indicating a data integrity problem.
For details about (1) to (3) and (7), see “CHAPTER 4 CHECKING THE HARDWARE STATUS”. For details about (5), see “Figure 2-4 Example of a Message Box when a Media Error Occurs” below. For details about (8), see “6.1.9 Get function for the RAID status (hfwRaidStat) [B/T Model only]”.
2. ITEMS MONITORED BY THE RAS FEATURE 2.9 RAS MCU Failure Monitoring Function The RAS MCU failure monitoring function monitors the operation of each RAS MCU mounted in this equipment and reports any detected failures by one of the following methods: (1) Remote notification (2) Message box display (3) Alarm lamp...
3. SETTING UP THE RAS FEATURE CHAPTER 3 SETTING UP THE RAS FEATURE 3.1 RAS Setup Windows 3.1.1 Overview In the RAS Setup window, the following functions can be set up. Table 3-1 Setup Items in the RAS Setup Window Item Automatically shutdown if fan failure has been detected.
3.1.2 Starting the RAS Setup window
To start the RAS Setup window, follow the procedure below. Before you start this window, you need to log on to the computer with an administrator account.
1. Click Start.
2.
3. SETTING UP THE RAS FEATURE 3.1.3 Using the RAS Setup window (1) Shutdown setting You can select whether this equipment is automatically shut down for each of the following cases: a fan failure, an abnormally high temperature, and a remote shutdown request.
3. SETTING UP THE RAS FEATURE NOTICE If this equipment continues to operate while fan failure is detected, internal parts such as a processor are not cooled down sufficiently, and that may cause thermal runaway of the system due to malfunction of the equipment or result in damage to the parts.
3. SETTING UP THE RAS FEATURE (2) Watchdog timer setting You can set up the watchdog timer of this equipment. You can select one of the following ways of using the watchdog timer: • Not used • Retriggered by an application program •...
NOTE
When you change the setting to “Retriggered by application program,” the watchdog timer is stopped once. While the watchdog timer is stopped, it will not time out. When a user application retriggers the watchdog timer by using the WdtControl function of the RAS library, the watchdog timer resumes a countdown.
3. SETTING UP THE RAS FEATURE (3) Drive usage monitoring setting You can set up the drive usage monitoring setting. By clicking Advanced, you can set up the advanced settings of this function. Figure 3-4 Items in the Drive Usage Monitoring Setting ●...
3. SETTING UP THE RAS FEATURE ● Advanced button If the drive usage monitoring is enabled, click Advanced to display the following window. If the drive usage monitoring is disabled, the setting cannot be performed because the button becomes inactive. The following window appears when a drive is not mounted in the optional drive bay.
To clear the current cumulative power-on hours, click Reset. The following message is then displayed. If you click OK in the message, the value of the current cumulative power-on hours is cleared.
3. SETTING UP THE RAS FEATURE (4) Status display digital LEDs setting You can set up the display mode of the status display digital LEDs located on the front of the equipment. Figure 3-6 Items in the Status Display Digital LEDs Setting ●...
3. SETTING UP THE RAS FEATURE (5) Popup notification setting You can set up the popup notification setting. By clicking Advanced, you can set up the advanced settings of this function. Figure 3-7 Items in the Popup Notification Setting ● “Function is available” check box •...
3. SETTING UP THE RAS FEATURE ● Advanced button Click Advanced to display the following window. A/S Model B/T Model Figure 3-8 Advanced Settings for the Popup Notification Setting [Events] • Fan failure • Abnormally high temp. • Drive failure prediction (SMART) •...
3. SETTING UP THE RAS FEATURE [Message editing] You can edit popup notification messages. And you can check the messages after you edit them. For information about how to edit and check the messages, see “3.1.4 Editing popup notification messages”. If you changed the advanced setting and want to make the change effective, click OK.
3. SETTING UP THE RAS FEATURE 3.1.4 Editing popup notification messages (1) Editing popup notification messages If you want to edit messages used for popup notification, click Edit. Notepad is launched and the message definition file for popup notification is opened. Edit the messages in the format described below.
3. SETTING UP THE RAS FEATURE ■ Description in a Message Definition File 1. Section The table below shows a list of section names you can define for this function and an explanation of the message you define for each section. Table 3-2 Section Names and Defined Messages Section name Defined message...
3. SETTING UP THE RAS FEATURE 2. Keys For a key, specify the line number of a line displayed as a part of the popup message. In this function, you can use the keys “Line1” through “Line5” for each section. If you specify other than “Line1”...
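As a rough illustration of the key rule above, the following C function accepts only the key names “Line1” through “Line5”. It is a sketch under assumptions: the function name is_valid_line_key is hypothetical, and the sketch assumes that key names are compared case-sensitively, which this manual excerpt does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: true only for the keys "Line1" through "Line5".
   Assumption: key names are matched case-sensitively. */
bool is_valid_line_key(const char *key)
{
    assert(key != NULL);
    return strlen(key) == 5
        && strncmp(key, "Line", 4) == 0
        && key[4] >= '1' && key[4] <= '5';
}
```

A check like this could be run over a message definition file before the file is saved, so that mistyped keys such as “Line6” are caught early.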
3. SETTING UP THE RAS FEATURE (2) Checking popup notification messages You can check the change you made in the message for each of the following items: • Fan failure • Abnormally high temp. • Drive failure prediction • Drive usage excess (When the drive power-on (=used) hours exceeds the threshold) •...
3. SETTING UP THE RAS FEATURE 3. Click Display message. A popup notification message is displayed based on the change you made. After you confirm that the message is OK, click OK in the popup. If the message is not edited or there is something wrong in the message definition file, the following message is displayed.
4. CHECKING THE HARDWARE STATUS CHAPTER 4 CHECKING THE HARDWARE STATUS You can check the hardware status of this equipment by using the following methods. (1) Check by using GUIs You can check the hardware status of this equipment by using a graphical interface. For details, see “4.1 Hardware Status Window”.
4.1 Hardware Status Window
4.1.1 Overview
After you log on to this equipment, there will always be an icon in the notification area of the taskbar to display the hardware status. If you double-click this icon, or if you right-click the icon to display a popup menu and click Display Hardware status, detailed information about the hardware status of this equipment is displayed.
NOTE
• For information about how to replace a perishable component, refer to “HF-W6500 Model 60 INSTRUCTION MANUAL (manual number WIN-62-0074)”.
• This window only displays the internal drives that are recognized during OS startup.
4. CHECKING THE HARDWARE STATUS 4.1.2 Hardware status icon There will always be an icon in the notification area of the taskbar to display the hardware status. Note that, in the default factory settings, the icon is not shown. If you click the arrow at the side of the notification area, the icon will appear.
4. CHECKING THE HARDWARE STATUS NOTE In rare cases, the registration of the hardware status icon fails and the following message is displayed. If this happens, follow the procedure below to retry the registration of the hardware status icon. 1. Click OK in the message above. 2.
4. CHECKING THE HARDWARE STATUS (1) List of displayed icons and description of each icon Table 4-1 shows a list of displayed icons and a description of each icon. A description of the displayed icon is shown when you point the mouse cursor to the icon. Table 4-1 Hardware Status Icon Hardware Icon...
4. CHECKING THE HARDWARE STATUS Figures 4-2 and 4-3 show examples of displaying the description of an icon when the hardware status of this equipment is normal and when the hardware status has an error. Figure 4-2 Example of Displaying the Description of an Icon (When the Hardware Status Is Normal) Figure 4-3 Example of Displaying the Description of an Icon (When the Hardware Status Has an Error)
4. CHECKING THE HARDWARE STATUS 4.1.3 Hardware status window The Hardware status window shows the details of the hardware status of this equipment. Figure 4-5 shows how to start the Hardware status window. 2. The Hardware status window opens. Notification area of the taskbar 1.
4. CHECKING THE HARDWARE STATUS (1) Description of the window 1. Fan condition Shows the current status of the fans. Table 4-2 Fan Condition and Displayed Information Fan condition Icon Information Normal Fan is working normally. Excessively low rotation Fan failure is detected. speed For details, refer to the event log.
4. CHECKING THE HARDWARE STATUS 3. Drive condition Shows the current status of the drives. In the following areas, the drive conditions of drive bay1, 2 are displayed. If the drive is mounted in an optional drive bay, the drive condition of drive bay 3 is also displayed.
4. Type of drive
The type of each drive is displayed as shown in the following table.
Table 4-5 Type of the Drive and Displayed Information
Type of the drive: Displayed information
HDD: [ HDD ]
SSD: [ SSD ]
Unknown: [ --- ]
5.
4. CHECKING THE HARDWARE STATUS 7. OFFLINE button [B/T Model only] Click this button to disconnect the corresponding drive from the RAID. This button works only when the drive condition of a drive in the RAID is either “Normal”, “Failure anticipation by SMART”, or “Drive usage excess”. In order to disconnect a drive, you must have administrator privileges.
4. CHECKING THE HARDWARE STATUS 8. Refresh button If you click this button, the latest hardware status is acquired and the information in the window is refreshed. 9. OK button Click this button to close the Hardware status window. Figure 4-7 shows the Hardware status window (B/T Model) when there is a hardware status error.
4.2 RAS Event Notification
4.2.1 Overview
When an event that must be reported to a user, such as a hardware failure, occurs, this function notifies an application of the event by setting an event object to the signaled state. An application can detect an event such as a hardware failure by monitoring when the event object is set to the signaled state.
4. CHECKING THE HARDWARE STATUS Table 4-7 is a list of events to be reported to a user and their respective event objects. Table 4-7 Reported Events Event Event object name A PS fan failure occurred. W2KRAS_PSFAN_ERR_EVENT A Front fan1 failure occurred. W2KRAS_FTFAN1_ERR_EVENT W2KRAS_CPUFAN_ERR_EVENT A Front fan2 failure occurred.
4.3 Popup Notification
4.3.1 Overview
When an event that must be reported to a user, such as a hardware failure, occurs, this function notifies the user of the event by displaying a popup message on the desktop. Using this function, a user can learn that an event such as a hardware failure has occurred.
4. CHECKING THE HARDWARE STATUS 4.3.2 Messages to be displayed The following table shows a list of popup notification messages this function outputs. It should be noted that you can edit those messages. For information about how to edit messages, see “3.1.4 Editing popup notification messages”. Table 4-8 Messages Displayed Event Popup notification message...
4.4 Status Display Digital LEDs Function
4.4.1 Overview
When an event that must be reported to a user, such as a hardware failure, occurs, this function notifies the user of the event by displaying a code on the status display digital LEDs located on the front of this equipment.
4. CHECKING THE HARDWARE STATUS 4.4.2 Status codes to be displayed (1) Hardware status codes A hardware status code is displayed when an error has occurred in the hardware status of this equipment. If the hardware status is normal, a hardware status code is not displayed.
4. CHECKING THE HARDWARE STATUS (2) Application status codes An application status code is displayed by a user application by using library functions provided by this function. When an application status code is displayed, the center LED in the status indication LEDs is lit.
4. CHECKING THE HARDWARE STATUS 4.4.3 Status display modes This function has two display modes: “hardware status display mode” and “application status display mode”. Table 4-10 Status Display Modes Status display mode Description When the hardware status is normal, an application status Hardware status display mode code is displayed.
4. CHECKING THE HARDWARE STATUS 4.4.4 Priorities of displayed codes The following shows the priorities of the codes displayed by this feature. (1) In the hardware status display mode Table 4-11 Priorities of Hardware Status Display mode Code class Priority order STOP error code(“80”...
4.5 Remote Notification
4.5.1 Overview
This function lets you check, from a remote device through the network, hardware conditions that could otherwise be checked only beside this equipment. Even when hardware conditions cannot be checked beside this equipment because, for example, the system administrator is away from this equipment or this equipment is built into the facility, the hardware conditions can be checked from a remote device.
4. CHECKING THE HARDWARE STATUS 4.5.2 Hardware conditions that can be acquired by using remote notification The following hardware conditions and settings can be acquired from a remote device: • Fan condition • Temperature condition inside the chassis • Drive condition •...
4. CHECKING THE HARDWARE STATUS 4.5.3 Enabling the remote notification This function is disabled in the default factory-shipped settings. The remote notification uses the standard Windows® SNMP service. If you enable the SNMP service, the remote notification is enabled. When you use the remote notification, follow the procedure below to enable the SNMP service: (1) Starting the “SNMP Service Properties”...
4. CHECKING THE HARDWARE STATUS (2) SNMP security configuration 1. In the “SNMP Service Properties” window, select the Security tab. 2. If you want to send a trap message whenever authentication fails, select the “Send authentication trap” check box. 3. Under “Accepted community names,” click Add. The “SNMP Service Configuration”...
4. CHECKING THE HARDWARE STATUS 4. Specify whether to accept SNMP packets from hosts. If you want to accept SNMP packets from any manager on the network: • Select Accept SNMP packets from any host. If you want to restrict SNMP packets: •...
4. CHECKING THE HARDWARE STATUS (3) SNMP trap configuration 1. In the “SNMP Service Properties” window, select the Traps tab. 2. Under Community name, type the name of the community that trap messages are sent to, and click “Add to list.” 3.
(4) Starting the SNMP service
1. In the “SNMP Service Properties” window, select the General tab.
2. Click Start. The SNMP service starts and the remote notification for the hardware status is enabled.
3. In order to start the SNMP service automatically at the next OS startup, in the Startup type list, select Automatic.
4. The “Allowed apps” window appears. Click “Change settings” and then select the SNMP Service check box in “Allowed apps and features.”
5. Click OK.
“x” with the object in the following table or replacing the following “y” with the object number in the table.
Object ID:
.iso.org.dod.internet.private.enterprises.Hitachi.systemExMib.hfwExMib.hfwRasStatus.x (x is an object in the table below)
or
.1.3.6.1.4.1.116.5.45.1.y (y is an object number in the table below)
(2/3)
Object: hfwDrv.drvTable.drvEntry.drvStatus
Object number: 3.2.1.2
Description: Drive condition
Description of the values:
1: Healthy
2: Not mounted
3: Failure anticipated
5: Used hours exceeded the threshold
7: Offline
8: Rebuild
99: Unknown

Object: hfwDrv.drvTable.drvEntry.drvUseTime
Object number: 3.2.1.3
Description: Drive power-on (=used) hours...
4. CHECKING THE HARDWARE STATUS (3/3) Object Object Description Description of the values number Number of general purpose - 30 hfwGenDI.gendiNumber external contact inputs hfwGenDI.gendiTable.gendiEntry.gendiI Index number of - 6.2.1.1 ndex gendiEntry GENDI: GENDI contact hfwGenDI.gendiTable.gendiEntry.gendiN General purpose external GENDI0: GENDI0 contact 6.2.1.2 contact inputs name GENDI1: GENDI1 contact...
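The object ID rule and the drive condition values in this table can be sketched in C as follows. This is an illustrative sketch only: the names HFW_RAS_STATUS_OID, hfw_status_oid, and drv_status_text are hypothetical and are not part of the RAS library or the extended MIB file. The numeric prefix and the value descriptions are taken from this section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Numeric object ID of hfwRasStatus, as given in this section. */
#define HFW_RAS_STATUS_OID ".1.3.6.1.4.1.116.5.45.1"

/* Hypothetical helper: writes "<prefix>.<object number>" into buf.
   Returns 0 on success, or -1 if buf is too small. */
int hfw_status_oid(char *buf, size_t size, const char *object_number)
{
    assert(buf != NULL && object_number != NULL);
    int n = snprintf(buf, size, "%s.%s", HFW_RAS_STATUS_OID, object_number);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: maps a drvStatus value to its description. */
const char *drv_status_text(int value)
{
    switch (value) {
    case 1:  return "Healthy";
    case 2:  return "Not mounted";
    case 3:  return "Failure anticipated";
    case 5:  return "Used hours exceeded the threshold";
    case 7:  return "Offline";
    case 8:  return "Rebuild";
    case 99: return "Unknown";
    default: return "(value not listed in the table)";
    }
}
```

For example, passing the object number 3.2.1.2 (drvStatus) yields .1.3.6.1.4.1.116.5.45.1.3.2.1.2. When a table column is queried on a real SNMP agent, an instance index usually has to be appended as well. That detail is outside this sketch.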
“x” with the object in the following table or replacing the following “y” with the object number in the table.
Object ID:
.iso.org.dod.internet.private.enterprises.Hitachi.systemExMib.hfwExMib.hfwRasSetting.x (x is an object in the table below)
or
.1.3.6.1.4.1.116.5.45.2.y (y is an object number in the table below)
“x” with the object in the following table or replacing the following “y” with the object number in the table.
Object ID:
.iso.org.dod.internet.private.enterprises.Hitachi.system.hfw.hfwExMibInfo.x (x is an object in the table below)
or
.1.3.6.1.4.1.116.3.45.1.y (y is an object number in the table below)
The enterprise ID for the trap notification when an error occurs is as follows.
Enterprise ID:
iso.org.dod.internet.private.enterprises.Hitachi.systemAP.hfwMibTrap.hfwRasErrorTrap
.1.3.6.1.4.1.116.7.45.1
Table 4-17 Objects Related to the Trap Notification (When an Error Occurs) (1/2)...
The enterprise ID for the trap notification when the equipment has recovered from an error is as follows.
Enterprise ID:
iso.org.dod.internet.private.enterprises.Hitachi.systemAP.hfwMibTrap.hfwRasRecoverTrap
.1.3.6.1.4.1.116.7.45.2
Table 4-18 Objects Related to the Trap Notification (When the Equipment Has Recovered...
No.1: %1 denotes the name of the recovered fan.
No.3: %2 denotes the name of the memory slot that recovered from frequent error correction.
No.4: %3 denotes the number of the array that recovered from an error. For this equipment, the array number is always 1.
The enterprise ID for the trap notification related to operational modes is as follows.
Enterprise ID:
iso.org.dod.internet.private.enterprises.Hitachi.systemAP.hfwMibTrap.hfwRasInfoTrap
.1.3.6.1.4.1.116.7.45.3
Table 4-19 Objects Related to the Trap Notification (Operational Modes)...
4. CHECKING THE HARDWARE STATUS 4.6 Status Acquisition by Using the RAS Library By using the RAS library, the following hardware conditions can be acquired. For details about the RAS library, see “6.1 RAS Library”. ● To acquire the memory condition: Use the GetMemStatus function. ●...
5. CONTROLLING THE HARDWARE CHAPTER 5 CONTROLLING THE HARDWARE The RAS feature can have the following controls over this equipment. (1) Automatic shutdown of the equipment When a hardware error occurs or a remote shutdown request through the contact input is detected, the equipment can be automatically shut down.
5.1 Automatic Shutdown of the Equipment
This function automatically shuts down the equipment when a fan failure or an abnormally high temperature is detected and continuing to run the equipment would pose a danger. Doing so protects internal parts such as the processor from thermal degradation and prevents thermal runaway of the system due to malfunction of this equipment.
5. CONTROLLING THE HARDWARE 5.1.2 Automatic shutdown when detecting abnormally high temperature When the temperature sensor in this equipment detects the temperature is abnormally high inside the chassis, the equipment automatically shuts down. • This function can be enabled or disabled in the RAS Setup window. In the default factory setting, this function is disabled.
5.2 Controlling the Hardware by Using the RAS Library
You can use the RAS library functions to shut down the system and control the general-purpose external contacts and the status display digital LEDs. For details about the library functions, see “6.1 RAS Library”.
5. CONTROLLING THE HARDWARE 5.3 RAID Configuration Control Command (raidctrl) [B/T Model only] The raidctrl command displays the status of the RAID and its drives. The command can also forcibly disconnect a drive and change the setting for RAID. This command can be launched at the command prompt.
5. CONTROLLING THE HARDWARE (1) Displaying the status of the RAID and its drives (with no options) If you execute the raidctrl command without any options, the command will output the status of the RAID and the drives in the RAID on this equipment. Tables 5-1 and 5-2 show the status of the RAID and drives displayed in the output.
5. CONTROLLING THE HARDWARE (2) Disconnecting a drive (with /OFFLINE option) If you execute the raidctrl command with the /OFFLINE option, the specified drive is forcibly disconnected and set to offline. This option can be used only when the RAID status is “OPTIMAL”. Only 1 or 2 can be specified for DRVNO.
5. CONTROLLING THE HARDWARE (3) Switching whether a media error is notified when it occurs (with the /NOTIFY option) If you execute the raidctrl command with the /NOTIFY option, you can switch whether media errors are notified when they occur. If the /NOTIFY option is specified alone, the current notification setting is displayed.
5. CONTROLLING THE HARDWARE (4) Setting the load during rebuilding (with the /LOAD option) If you run the raidctrl command with the /LOAD option, you can set the level of the drive writing load to be put on the system during rebuilding. If the /LOAD option is specified alone, the current setting is displayed.
<Diagnosis>
When the raidctrl command finishes normally, it returns exit code 0. If the command terminates abnormally, one of the error messages listed in Table 5-3 is displayed, and the command returns an exit code other than 0.
Table 5-3 Error Messages of the raidctrl Command
Error message Description...
6. LIBRARY FUNCTIONS CHAPTER 6 LIBRARY FUNCTIONS A user application can get and control the hardware status of this equipment by using the RAS library. For information about the hardware specifications of the RAS external contacts interface described in this chapter and the usage of each contact, refer to “HF-W6500 Model 60 INSTRUCTION MANUAL (manual number WIN-62-0074)”.
6. LIBRARY FUNCTIONS NOTE ・Do not copy or move w2kras.dll, hfwras.dll, and ctrl7seg.dll to another directory. If you do, the RAS feature of this equipment cannot run properly. The functions offered by w2kras.dll and ctrl7seg.dll can be called from Visual Basic® (.NET is required for 64-bit operating systems).
6.1.2 Shutdown function (BSSysShut)
<Name>
BSSysShut - System shutdown
<Syntax>
#include <w2kras.h>
int BSSysShut(reboot)
int reboot; /*Reboot flag*/
<Description>
BSSysShut shuts down the system. The reboot argument specifies whether to reboot the system after the shutdown.
reboot = 0: The power to this equipment is turned off after the shutdown....
6. LIBRARY FUNCTIONS 6.1.3 Watchdog timer control function (WdtControl) (1) Function interface <Name> WdtControl - Watchdog timer control / status acquisition <Syntax> #include <w2kras.h> BOOL WdtControl(DWORD dwCmd, PDWORD pdwCount); <Description> This function performs the action specified by dwCmd on the watchdog timer. In order to use this function, in the RAS Setup window, select Retriggered by application program under Watchdog timer setting.
If dwCmd is WDT_STAT, the time remaining (in seconds) until the watchdog timer expires, as of the function call, is stored in the variable pointed to by pdwCount. If the value of the variable pointed to by pdwCount is 0 when the function returns, the watchdog timer has already timed out....
6. LIBRARY FUNCTIONS (2) Behavior of the WDTTO contact of the RAS external contacts interface This sub-subsection describes the behavior of the WDTTO contact of the RAS external contacts interface under each of the following conditions. ● When the equipment starts The WDTTO contact is closed.
[Timing chart of the WDTTO contact. Labels in the chart: Power is turned on; WdtControl function is called (Timeout: 10 seconds); WDT is retriggered; WDT times out; 10 seconds; Contact open (Normal); Contact closed (WDT has timed out.); Power is turned off. Horizontal axis: Time →]...
6. LIBRARY FUNCTIONS 6.1.4 Control functions for the general purpose external contact outputs (GendoControl and GendoControlEx) Use the GendoControl and GendoControlEx functions to control the outputs to the general purpose external contacts of the RAS external contacts interface. (1) Function interface (GendoControl) <Name>...
6. LIBRARY FUNCTIONS (2) Function interface (GendoControlEx) <Name> GendoControlEx - Output control for the general purpose external contacts (GENDO0 / GENDO1 / GENDO2) <Syntax> #include <w2kras.h> BOOL GendoControlEx(DWORD dwPort, DWORD dwCmd); <Description> This function performs the action specified by dwCmd on the general purpose external contact (GENDO0, GENDO1, or GENDO2) of the RAS external contacts interface specified by dwPort.
6. LIBRARY FUNCTIONS <Diagnosis> If this function completes successfully, the function returns TRUE. If this function terminates with an error, the function returns FALSE. When this function terminates with an error, call the GetLastError Windows API function to get the error code. Error codes returned by this function on its own are as follows. Error code (value) Description W2KRAS_INVALID_PARAMETER...
6. LIBRARY FUNCTIONS 6.1.5 Get functions for the general purpose external contact inputs (GetGendi and GetGendiEx) Use the GetGendi and GetGendiEx functions to get the inputs for the general purpose external contacts of the RAS external contacts interface. (1) Function interface (GetGendi) <Name>...
6. LIBRARY FUNCTIONS (2) Function interface (GetGendiEx) <Name> GetGendiEx - Status acquisition of the general purpose external contact inputs (GENDI, GENDI0, GENDI1, and GENDI2) <Syntax> #include <w2kras.h> DWORD GetGendiEx(DWORD dwPort); <Description> This function gets the status of the general purpose external contact inputs (GENDI, GENDI0, GENDI1, and GENDI2) of the RAS external contacts interface.
6. LIBRARY FUNCTIONS <Diagnosis> If this function terminates with an error, the function returns 0xffffffff. When this function terminates with an error, call the GetLastError Windows API function to get the error code. Error codes returned by this function on its own are as follows. Error code (value) Description W2KRAS_INVALID_PARAMETER...
6. LIBRARY FUNCTIONS 6.1.6 Log function (MConWriteMessage) <Name> MConWriteMessage - Logging <Syntax> #include <w2kras.h> VOID WINAPI MConWriteMessage(LPSTR lpBuffer); <Description> The MConWriteMessage function writes the specified message (characters) to a log file (file name: hfwrasa.log or hfwrasb.log). The message is written along with the time stamp. Two log files are available and the size of each file is 64 KB.
NOTE ・This function has the same name as the message console function provided by W2K-PLUS, software from Hitachi, but this function does not output to the message console. ・In order to reduce the amount of resources used, this function, for example, opens and closes a pipe every time the function is called.
6.1.7 Get function for the memory condition (GetMemStatus)
<Name>
GetMemStatus - Memory status acquisition
<Syntax>
#include <w2kras.h>
BOOL GetMemStatus(PMEM_DATA pMemData);
<Description>
The GetMemStatus function stores the condition of the memory in this equipment in the structure pointed to by pMemData. The parameters of this function are explained below.
pMemData: This parameter specifies a pointer to a MEM_DATA structure that stores the acquired memory condition....
For this model, the correspondence between the elements of Dimm_Status and the DIMM names is as follows.
Element          DIMM name
Dimm_Status[0]   DIMM A1
Dimm_Status[1]   DIMM A2
Dimm_Status[2]   DIMM B1
Dimm_Status[3]   DIMM B2
<Diagnosis>
If this function completes successfully, the function returns TRUE. If this function terminates with an error, the function returns FALSE....
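When an application reports the memory condition, the correspondence between Dimm_Status elements and DIMM names listed above can be kept in one lookup table. The following is a minimal sketch in C. The function name dimm_name and the constant DIMM_SLOT_COUNT are hypothetical and are not defined in w2kras.h. The sketch covers only the four elements listed for this model.

```c
#include <stddef.h>

#define DIMM_SLOT_COUNT 4   /* four elements are listed for this model */

/* Hypothetical helper: maps an index into Dimm_Status to the DIMM name.
 * Returns NULL when the index is outside the documented range. */
const char *dimm_name(size_t index)
{
    static const char *const names[DIMM_SLOT_COUNT] = {
        "DIMM A1",  /* Dimm_Status[0] */
        "DIMM A2",  /* Dimm_Status[1] */
        "DIMM B1",  /* Dimm_Status[2] */
        "DIMM B2"   /* Dimm_Status[3] */
    };

    return index < DIMM_SLOT_COUNT ? names[index] : NULL;
}
```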
6.1.8 Get function for the drive condition (hfwDiskStat)
<Name>
hfwDiskStat - Drive status acquisition
<Syntax>
#include <hfwras.h>
BOOL hfwDiskStat(PHFW_DISK_STATUS phfwDiskStatus);
<Description>
The hfwDiskStat function stores the drive conditions in the structure pointed to by phfwDiskStatus. The parameters of this function are explained below.
phfwDiskStatus: This parameter specifies a pointer to an HFW_DISK_STATUS structure that stores the drive conditions....
6. LIBRARY FUNCTIONS Table 6-9 shows a list of defined values on drive types and statuses. Table 6-9 List of Define directive Defined value Description The type of drive is HDD. DRIVETYPE_HDD (0x00010000) The type of drive is SSD. DRIVETYPE_SSD (0x00020000) DISKSTAT_HEALTHY The drive is working properly.
6.1.9 Get function for the RAID status (hfwRaidStat) [B/T Model only]
<Name>
hfwRaidStat - RAID status acquisition
<Syntax>
#include <hfwras.h>
BOOL hfwRaidStat(PHFW_RAID_STATUS phfwRaidStatus);
<Description of the function>
The hfwRaidStat function stores the RAID status in the structure pointed to by phfwRaidStatus. The parameters of this function are explained below....
6. LIBRARY FUNCTIONS Level stores the RAID level. The meaning of each value that can be stored in Level is as follows. Table 6-10 Values Stored in Level in the HFW_ARRAY_STATUS Structure Defined value Description HFW_RAID1 (0x00000001) RAID 1 is set up. DiskNumber stores the value that indicates the drive bays used by the RAID.
6. LIBRARY FUNCTIONS Possible combinations are as follows. Table 6-13 Possible Combinations that Can Be Stored in Status in the HFW_ARRAY_STATUS Structure RAID status Detailed information HFW_RAID_OPTIMAL None HFW_RAID_MEDIA_ERROR HFW_RAID_DEGRADE None HFW_RAID_MEDIA_ERROR HFW_RAID_REBUILD HFW_RAID_REBUILD | HFW_RAID_MEDIA_ERROR HFW_RAID_UNKNOWN None Progress stores the progress of the rebuild process. If the RAID status value (Status) does not contain HFW_RAID_REBUILD, 0 is stored.
6. LIBRARY FUNCTIONS 6.1.10 Status display digital LEDs control functions (SetStCode7seg, TurnOff7seg, SetMode7seg) (1) Application status code display function (SetStCode7seg) <Name> SetStCode7seg - Displaying an application status code <Syntax> #include <ctrl7seg.h> BOOL SetStCode7seg(DWORD dwStCode); <Description> This function outputs an application status code on the status display digital LEDs. On the status display digital LEDs, the value specified by this function is displayed in hexadecimal.
6. LIBRARY FUNCTIONS (2) Application status code clear function (TurnOff7seg) <Name> TurnOff7seg - Turning off an application status code <Syntax> #include <ctrl7seg.h> BOOL TurnOff7seg(VOID); <Description> This function clears the application status code currently displayed on the status display digital LEDs. When this function is called, the status display digital LEDs are turned off. <Diagnosis>...
6. LIBRARY FUNCTIONS (3) Status display mode setup function (SetMode7seg) <Name> SetMode7seg - Setting up the status display mode <Syntax> #include <ctrl7seg.h> BOOL SetMode7seg(DWORD dwMode); <Description> This function configures the status display mode of the status display digital LEDs. The parameters of this function are explained below. dwMode: This parameter specifies the “status display mode”...
6.2 Sample Programs
Sample program files in C that use the RAS library functions are stored in the %ProgramFiles%\HFWRAS\sample directory. Use those files for reference when you develop a program or check the operation of those functions. The following table shows a list of sample programs....
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS
CHAPTER 7 FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS
7.1 Notifying the Cause of the STOP Error Code
7.1.1 Overview
This equipment records the memory contents in memory dump files when the system is forcibly recovered from an OS lockup, an NMI is generated due to a hardware failure, or an uncorrectable memory error occurs....
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS 7.1.2 Supported causes of the STOP error code This function is activated when a blue screen appears due to a cause described in Table 7-1. Causes other than those in Table 7-1 are not supported. Table 7-1 Supported STOP Error Causes Contents recorded in the description Cause...
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS 7.1.3 Event log Table 7-2 shows a list of event log entries this function records. Those event log entries are recorded in the system log. Table 7-2 Event Log Entries Recorded by This Function Event ID Source Type...
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS 7.2 Log Information Collection Window 7.2.1 Overview In the log information collection window, you can take the following actions by using a graphical user interface. (1) Collecting log data This function saves the data used for preventive maintenance and post-failure analysis of the problem.
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS 7.2.3 Using the log information collection window 1. The log information collection window appears. By default, both the “Gather log data” and “Gather memory dump files” check boxes are selected. If you do not need one of those two, clear the check box for the one you do not need, and then click Continue.
4. The information selected at step 1 is collected. During the process, a window is displayed to show the progress. If the process finishes successfully, the following window appears. Do not do anything on the windows that appear during the process. Wait until "The maintenance operation completed"...
5. A directory is created under the directory specified as the save destination. The name of the directory is based on the date and time of the operation. The collected data is saved under the newly created directory. If the following folder structure is not created, the collection of log information might have failed....
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS 7.3 Logging the Trend of the Temperature inside the Chassis 7.3.1 Overview This function periodically measures the temperature inside the chassis of this equipment and records the data into a log file. You can adjust the logging interval for the temperature inside the chassis by using the logging interval setup command.
<Log information format>
The format of log information is as follows.
(1) temp.csv
YYYY/MM/DD hh:mm:ss, yxxx
YYYY: Year, MM: Month, DD: Day, hh: hour (24-hour clock), mm: minute, ss: second, y: Sign (+ or -), xxx: Temperature (°C)
If acquiring the temperature fails, xxx is replaced with “---”....
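A tool that reads temp.csv can parse each record against the layout shown above. The following is a minimal sketch in C, written under the assumption that every record matches the layout "YYYY/MM/DD hh:mm:ss, yxxx" exactly as given. The type temp_record and the function parse_temp_line are hypothetical names and are not part of the RAS software. Because the excerpt does not show what the sign position holds when acquisition fails, the sketch treats any field that contains "---" as a failed reading.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for one line of temp.csv. */
typedef struct {
    int year, month, day, hour, minute, second;
    int has_temp;   /* 0 when the temperature field held "---" */
    int temp;       /* signed value; valid only when has_temp is 1 */
} temp_record;

/* Parses one line. Returns 0 on success, -1 if the line does not match
 * the documented layout. */
int parse_temp_line(const char *line, temp_record *out)
{
    char field[16];
    char *end;
    long value;

    assert(line != NULL && out != NULL);

    if (sscanf(line, "%4d/%2d/%2d %2d:%2d:%2d , %15s",
               &out->year, &out->month, &out->day,
               &out->hour, &out->minute, &out->second, field) != 7)
        return -1;

    if (strstr(field, "---") != NULL) {   /* acquisition failed */
        out->has_temp = 0;
        out->temp = 0;
        return 0;
    }

    if (field[0] != '+' && field[0] != '-')
        return -1;                        /* sign character is required */

    value = strtol(field, &end, 10);
    if (end == field || *end != '\0')
        return -1;                        /* no digits or trailing junk */

    out->has_temp = 1;
    out->temp = (int)value;
    return 0;
}
```

For example, a made-up record such as "2024/01/15 13:05:00, +045" yields has_temp = 1 and temp = 45.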
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS 7.3.3 Logging interval setup command <Name> tmplogset - Setting up the logging interval <Syntax> tmplogset <Function> This command configures the interval for logging the trend of the temperature inside the chassis. The following shows how to use the command. 1.
7. FEATURES RELATED TO MAINTENANCE AND FAILURE ANALYSIS 4. Type the number corresponding to the new interval you want to select and press Enter. If the number you typed is out of range, the following message is displayed to prompt you to type a correct number.
8. SIMULATING THE HARDWARE STATUS
CHAPTER 8 SIMULATING THE HARDWARE STATUS
8.1 Hardware Status Simulation
8.1.1 Overview
This function simulates the hardware status and general purpose external contacts status of this equipment. By simulating the hardware status and general purpose external contacts status, you can test a user application and check the notification interface of the RAS software even when no hardware error has actually occurred and no external contacts are connected....
The monitoring function of the RAS software detects changes in the simulated hardware status and general purpose external contacts status and reports them through various interfaces. For information about the interfaces used for notification, see the following sections in this manual....
NOTE
・While the equipment is running in simulation mode, the following functions cannot be used:
1. OS Hung-up Monitoring
・In simulation mode, the memory monitoring function records an event (Event ID: 525) in the event log only when a memory error is detected for the first time....
8. SIMULATING THE HARDWARE STATUS 8.1.2 Using the simulation function Run the simulation mode start command at the command prompt to set the RAS software to the “simulation mode”. When the RAS software transitions to the “simulation mode”, the Simulation Tool window appears on the screen. You can use this window to simulate the condition of hardware devices.
8. SIMULATING THE HARDWARE STATUS (2) Starting the simulation mode To start the simulation mode, run the simulation mode start command (simrasstart command) at the command prompt. NOTE ・The simulation mode cannot be started from a remote desktop. Before you start the simulation mode, other logged-on users must log off.
5. The Simulation Tool window appears. From this point on, this equipment runs in simulation mode. Monitoring of hardware failures is now disabled.
NOTE
The simulation function performs the following actions while running in simulation mode.
・The status indicator is lit between green and red....
8. SIMULATING THE HARDWARE STATUS (3) Using the Simulation Tool window When the RAS software transitions to the simulation mode, the Simulation Tool window appears as shown in the following figure. You can use the Simulation Tool window to change the condition of hardware devices.
● Target
Shows the name of each simulated hardware device.
Category                                   Target
Fan condition                              PS fan, Front fan1, Front fan2
Temperature condition inside the chassis   Internal Temperature
Drive condition                            Drive bay1, Drive bay2, Drive bay3
Memory condition                           DIMM A1, DIMM A2, DIMM B1, DIMM B2...
● Apply button
If you click this button, all “Setting” values are applied to the “Status” of the hardware devices. The monitoring function of the RAS software detects changes in the “Status” of the hardware devices and reports them through various interfaces.
NOTE
A new hardware status is applied to the notification interface of the RAS software when the following time elapses after you click Apply in the...
8. SIMULATING THE HARDWARE STATUS ● Minimize button ([_] button) Click the Minimize button at the upper right corner of the Simulation Tool window to minimize the Simulation Tool window. ● Close button ([×] button) Click the Close button at the upper right corner of the Simulation Tool window to execute shutdown and exit the simulation mode.
8. SIMULATING THE HARDWARE STATUS The following procedure shows how to simulate a hardware status using the Simulation Tool window. 1. Right-click a hardware item you want to simulate. A popup menu is displayed. The menu lists the statuses you can select based on the current hardware status. 2.
8. SIMULATING THE HARDWARE STATUS 3. To apply the status displayed in “Setting” to the hardware status, click Apply. As a result, the “Status” in the Simulation Tool window is updated. NOTE When you click the Apply button, if no value is set (“---” is shown under “Setting”) or if the same status as the one shown under “Status”...
● Drive status <A/S Model>
Current status    Statuses in the popup menu                        Note
Healthy           Healthy, SMART Detected, Overrun, Not Connected   (*1)
SMART Detected    Healthy, SMART Detected, Not Connected            (*1) (*2)
Overrun           Healthy, Overrun, Not Connected                   (*1) (*2)
Not Connected     Healthy, Not Connected...
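A test plan for a user application can encode the drive status table above so that each planned transition is checked against what the popup menu offers. The following is a minimal sketch in C. All names (drive_status, drive_menu, drive_menu_offers) are hypothetical, because the Simulation Tool exposes these statuses only as menu text. The notes (*1) and (*2) are not modeled, and the Not Connected row is cut off in this excerpt, so its entry reflects only the visible part.

```c
#include <assert.h>

/* Hypothetical identifiers for the statuses shown in the popup menu. */
typedef enum {
    DRV_HEALTHY = 0,
    DRV_SMART_DETECTED,
    DRV_OVERRUN,
    DRV_NOT_CONNECTED,
    DRV_STATUS_COUNT
} drive_status;

#define STATUS_BIT(s) (1u << (s))

/* Index = current status, bits = statuses offered in the popup menu. */
static const unsigned drive_menu[DRV_STATUS_COUNT] = {
    [DRV_HEALTHY] =
        STATUS_BIT(DRV_HEALTHY) | STATUS_BIT(DRV_SMART_DETECTED) |
        STATUS_BIT(DRV_OVERRUN) | STATUS_BIT(DRV_NOT_CONNECTED),
    [DRV_SMART_DETECTED] =
        STATUS_BIT(DRV_HEALTHY) | STATUS_BIT(DRV_SMART_DETECTED) |
        STATUS_BIT(DRV_NOT_CONNECTED),
    [DRV_OVERRUN] =
        STATUS_BIT(DRV_HEALTHY) | STATUS_BIT(DRV_OVERRUN) |
        STATUS_BIT(DRV_NOT_CONNECTED),
    /* Row is truncated in the source; only the visible entries are listed. */
    [DRV_NOT_CONNECTED] =
        STATUS_BIT(DRV_HEALTHY) | STATUS_BIT(DRV_NOT_CONNECTED)
};

/* Returns 1 if the popup menu for 'current' offers 'candidate'. */
int drive_menu_offers(drive_status current, drive_status candidate)
{
    assert(current < DRV_STATUS_COUNT && candidate < DRV_STATUS_COUNT);
    return (drive_menu[current] & STATUS_BIT(candidate)) != 0;
}
```

According to the table, for example, a drive shown as SMART Detected cannot be switched directly to Overrun; it must first be set back to Healthy.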
8. SIMULATING THE HARDWARE STATUS ● Memory Current status Statuses in the popup menu Note Normal Normal, Error, Failure, (*1) Not Mounted Error (*1) (*2) Failure (*1) (*2) Not Mounted (*1) (*1) In the case of this equipment, the equipment never starts with the DIMM A2 not being mounted, and consequently, “Not Mounted”...
(4) Exiting the simulation mode
To exit the simulation mode, click the End button or the Close button. The equipment then shuts down automatically. The simulation mode also ends in the following cases.
・Shutdown is executed on the Start menu.
・A system shutdown API function such as the BSSysShut or ExitWindowsEx function is executed....
8. SIMULATING THE HARDWARE STATUS 8.1.3 Precautions when you use the Simulation Tool window (1) When the new status to be simulated is finalized From when you select a new status to be simulated from a popup menu on the Simulation Tool window until when you click Apply, the new status to be simulated is not finalized and you can change it.
8. SIMULATING THE HARDWARE STATUS 8.1.4 Event log entries In order to clearly show which log entries for hardware failure originate from the simulation function, this function records event log entries listed in the following table. Note that a log entry with Event ID 252 is recorded at the timing when you click Apply in the Simulation Tool window.
8. SIMULATING THE HARDWARE STATUS 8.1.5 Remote notification This function notifies of the transition of the RAS software to simulation mode using trap notification so that an SNMP manager that monitors this equipment in a remote location can know that the hardware statuses it acquires and the trap notifications for hardware failure (and hardware recovery) it receives are generated in simulation mode.