HP Hardware RAID status in smart?

    • OMV 1.0
    • Resolved

      Is there any way I can get the storage->s.m.a.r.t GUI to display the SMART status of my HP array?

      I've upgraded the smartmontools package to v6.4 from the wheezy-backports repository and I can query the smart status of the individual drives without issue.

      Source Code

      root@storage:/etc# smartctl /dev/sg1 -a -d cciss,0
      smartctl 6.4 2014-09-29 r3990 [x86_64-linux-3.2.0-4-amd64] (local build)
      Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
      /dev/sg1 [cciss_disk_00] [SAT]: Device open changed type from 'sat,auto+cciss' to 'sat'
      === START OF INFORMATION SECTION ===
      Model Family: Hitachi Deskstar 5K3000
      Device Model: Hitachi HDS5C3020ALA632
      Serial Number: ML0220F31DZZ5N
      LU WWN Device Id: 5 000cca 369d4003a
      Firmware Version: ML6OA580
      User Capacity: 2,000,398,934,016 bytes [2.00 TB]
      Sector Size: 512 bytes logical/physical
      Rotation Rate: 5940 rpm
      Form Factor: 3.5 inches
      Device is: In smartctl database [for details use: -P show]
      ATA Version is: ATA8-ACS T13/1699-D revision 4
      SATA Version is: SATA 2.6, 6.0 Gb/s (current: 3.0 Gb/s)
      Local Time is: Wed May 20 14:21:44 2015 AEST
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled
      === START OF READ SMART DATA SECTION ===
      SMART Status not supported: Incomplete response, ATA output registers missing
      SMART overall-health self-assessment test result: PASSED
      Warning: This result is based on an Attribute check.
      General SMART Values:
      Offline data collection status: (0x84) Offline data collection activity
      was suspended by an interrupting command from host.
      Auto Offline Data Collection: Enabled.
      Self-test execution status: ( 0) The previous self-test routine completed
      without error or no self-test has ever
      been run.
      Total time to complete Offline
      data collection: (22966) seconds.
      Offline data collection
      capabilities: (0x5b) SMART execute Offline immediate.
      Auto Offline data collection on/off support.
      Suspend Offline collection upon new
      command.
      Offline surface scan supported.
      Self-test supported.
      No Conveyance Self-test supported.
      Selective Self-test supported.
      SMART capabilities: (0x0003) Saves SMART data before entering
      power-saving mode.
      Supports SMART auto save timer.
      Error logging capability: (0x01) Error logging supported.
      General Purpose Logging supported.
      Short self-test routine
      recommended polling time: ( 1) minutes.
      Extended self-test routine
      recommended polling time: ( 383) minutes.
      SCT capabilities: (0x003d) SCT Status supported.
      SCT Error Recovery Control supported.
      SCT Feature Control supported.
      SCT Data Table supported.
      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
      2 Throughput_Performance 0x0005 134 134 054 Pre-fail Offline - 100
      3 Spin_Up_Time 0x0007 153 153 024 Pre-fail Always - 376 (Average 342)
      4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 115
      5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
      7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
      8 Seek_Time_Performance 0x0005 146 146 020 Pre-fail Offline - 29
      9 Power_On_Hours 0x0012 096 096 000 Old_age Always - 28883
      10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
      12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 102
      192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 116
      193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 116
      194 Temperature_Celsius 0x0002 181 181 000 Old_age Always - 33 (Min/Max 16/50)
      196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
      197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
      198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
      199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
      SMART Error Log Version: 1
      No Errors Logged
      SMART Self-test log structure revision number 1
      No self-tests have been logged. [To run self-tests, use: smartctl -t]
      SMART Selective self-test log data structure revision number 1
      SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
      1 0 0 Not_testing
      2 0 0 Not_testing
      3 0 0 Not_testing
      4 0 0 Not_testing
      5 0 0 Not_testing
      Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
      If Selective self-test is pending on power-up, resume after 0 minute delay.


      The problem I have is that no matter what changes I make to smartd.conf, the GUI always incorrectly shows /dev/sda (which is the logical disk).

      I've tried adding the following to /etc/default/openmediavault and rebooting, but it doesn't seem to make a difference

      Source Code

      OMV_SMARTMONTOOLS_DEFAULTDIRECTIVES=-a -d cciss -o on -S on -T permissive


      After adding the -d option, I get the following if I attempt to save any changes from the GUI:

      Source Code

      Failed to execute command 'export LANG=C; omv-mkconf smartmontools 2>&1': /usr/sbin/omv-mkconf: 51: /etc/default/openmediavault: -d: not found
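
The `-d: not found` message is consistent with how a POSIX shell parses an unquoted assignment followed by more words: `VAR=-a` is treated as a temporary environment assignment and `-d` is then run as a command. A minimal sketch, assuming `/etc/default/openmediavault` is sourced by a shell script (as the `omv-mkconf` error suggests), shows the quoted form using a throwaway file rather than the real one:

```shell
# Write a throwaway defaults file with the value quoted, then source it
# the way a POSIX shell script would. The temp file stands in for
# /etc/default/openmediavault; nothing on the real system is touched.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
OMV_SMARTMONTOOLS_DEFAULTDIRECTIVES="-a -d cciss -o on -S on -T permissive"
EOF

# Sourcing now sets one variable instead of trying to execute "-d".
. "$tmp"
printf '%s\n' "$OMV_SMARTMONTOOLS_DEFAULTDIRECTIVES"
rm -f "$tmp"
```

Quoting only addresses the parse error. Whether OMV then makes any use of `-d cciss` is not established here, and smartmontools normally expects a drive index with this type, as in `cciss,0`.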



      Can anyone point me in the right direction to be able to get this displayed in the GUI?
      Where does the GUI enumerate the information for the drives it displays and can this be modified?

      I've already added the HP RAID package from OMV-Extras, but it doesn't provide an email mechanism to let me know when a drive is faulty.

      I've also tried upgrading to stone burner, as I see other people with other RAID controllers had their issue resolved by upgrading, but it didn't make a difference.

      Thanks.

      The post was edited 2 times, last by dojrude ().

    • oxidizer wrote:

      Use the HP Raid Plugin


      As I stated, I've already added the HP RAID package from OMV-Extras, but it doesn't provide an email mechanism to let me know when a drive is faulty. I don't see the point of a hardware status monitor if I have to manually log in to get the status of it. A drive could have been failed for a month before I realise. The whole point of this is to get an email notification when a drive is classed as failing or failed.

      I could manually script it to run hpacucli via cron, but it would be nice to have it all integrated into openmediavault and managed from the GUI.
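
One way to get mail without the GUI is a small cron job built on the `smartctl` call that already works in the first post. The sketch below is an illustration under stated assumptions, not an OMV feature: the function names are made up, the number of drives is passed in by hand, and it relies on smartctl's documented exit status, where bit 3 means the disk reports failing and bit 4 means a pre-fail attribute is at or below its threshold.

```shell
# Return success if a smartctl exit status has bit 3 (disk failing)
# or bit 4 (pre-fail attribute at or below threshold) set: 8 + 16 = 24.
smart_status_failing() {
    [ $(( $1 & 24 )) -ne 0 ]
}

# Check drives 0..count-1 behind a cciss controller and print one line
# per drive whose health check looks bad.
# $1 = device node (e.g. /dev/sg1), $2 = number of physical drives.
check_cciss_drives() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        rc=0
        smartctl -H -d "cciss,$i" "$dev" >/dev/null 2>&1 || rc=$?
        if smart_status_failing "$rc"; then
            echo "cciss,$i on $dev: smartctl exit status $rc"
        fi
        i=$((i + 1))
    done
}

# Example call (commented out so the file can be sourced safely):
# check_cciss_drives /dev/sg1 2
```

Saved as a script (say, a hypothetical `/usr/local/sbin/check-cciss.sh`) and run from cron, any line it prints would be mailed to the address in the crontab's `MAILTO`, provided the system is able to send mail.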
    • OMV core does not support that because it only sees the logical volumes of a HW RAID, not the physical ones. Seeing the physical ones requires hardware-specific implementations and is not possible in a generic way without HW drivers, etc.
      Absolutely no support through PM!

      I must not fear.
      Fear is the mind-killer.
      Fear is the little-death that brings total obliteration.
      I will face my fear.
      I will permit it to pass over me and through me.
      And when it has gone past I will turn the inner eye to see its path.
      Where the fear has gone there will be nothing.
      Only I will remain.

      Litany against fear by Bene Gesserit
    • votdev wrote:

      OMV core does not support that because it only sees the logical volumes of a HW RAID, not the physical ones. Seeing the physical ones requires hardware-specific implementations and is not possible in a generic way without HW drivers, etc.


      Thanks for taking the time to reply, but I'm not sure I understand when you talk about hardware related implementations and HW drivers?

      This is an array that's already recognised and used within OMV and I've proved that I can query the smart status of individual drives within the array just by using smartctl. No additional hardware or drivers would be required.
      The problem appears to be in the way that the GUI enumerates the drives, preferring logical over physical.
      I was just wondering if there was a way that I could amend something like a config file so that it would use the physical rather than logical device names or even manually add hardcoded device paths.

      Perhaps you could explain exactly what the following environment variables do and whether they have any bearing on what the GUI displays:

      smartmontools: OMV_SMARTMONTOOLS_CONFIG=/etc/smartd.conf
      smartmontools: OMV_SMARTMONTOOLS_DEFAULT=/etc/default/smartmontools
      smartmontools: OMV_SMARTMONTOOLS_DEFAULTDIRECTIVES=-a -o on -S on -T permissive

      As I said, I've tried hardcoding the devices in smartd.conf which seems to work as far as smartctl is concerned, but I still can't see the info in the GUI.
      I would have thought that using the environment variable OMV_SMARTMONTOOLS_DEFAULT and hardcoding the paths for the physical devices in smartd.conf would have been enough based on what I've been able to ascertain from google searches, but this still doesn't seem to work as expected.

      Thanks.
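
For reference, hardcoded `smartd.conf` entries for drives behind a RAID controller generally take the form `DEVICE -d TYPE,N DIRECTIVES`. A small sketch that prints such lines follows. The function name and the `root` mail recipient are illustrative assumptions, and it writes to standard output only, since OMV appears to regenerate `/etc/smartd.conf` itself.

```shell
# Print one smartd.conf line per physical drive behind a RAID controller.
# $1 = device node, $2 = smartmontools device type (e.g. cciss, megaraid),
# $3 = first drive ID, $4 = last drive ID, $5 = mail recipient for -m.
smartd_lines() {
    dev=$1
    dtype=$2
    id=$3
    last=$4
    rcpt=$5
    while [ "$id" -le "$last" ]; do
        # -a enables the default set of checks, -m sets the warning address.
        printf '%s -d %s,%s -a -m %s\n' "$dev" "$dtype" "$id" "$rcpt"
        id=$((id + 1))
    done
}

# Two drives behind the HP controller from the first post.
smartd_lines /dev/sg1 cciss 0 1 root
```

The same function covers other controller types that smartmontools addresses by index, for example `smartd_lines /dev/sdb megaraid 4 7 root`.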
    • The environment variables are not related to the GUI.

      OMV only detects the logical volumes. To detect the physical volumes behind the HW RAID you have to use CLI commands like tw_cli, sas2ircu or arcconf to detect them. As far as I know there is no way to get the necessary information via sysfs or something like that. This means the manufacturer CLI commands must be used, so the information cannot be retrieved in a generic way and a special implementation in the OMV backend is required.

      At the moment only devices that are detected via the kernel are supported. I do not think that this will change in the near future.
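
To see what "detected via the kernel" looks like in practice, the block devices the kernel exposes can be listed from sysfs. This is only a sketch of the kernel's view, not OMV's actual enumeration code, which is not shown in this thread; the function name is made up, and the `device/vendor` and `device/model` files are assumed to be present, as they usually are for SCSI-type disks.

```shell
# List block devices the kernel exposes, with vendor and model when
# sysfs provides them. $1 = sysfs block directory (default /sys/block).
list_kernel_disks() {
    root=${1:-/sys/block}
    for d in "$root"/*; do
        [ -d "$d" ] || continue
        name=${d##*/}
        vendor=
        model=
        # read trims the padding that sysfs puts around these strings.
        if [ -r "$d/device/vendor" ]; then
            read -r vendor < "$d/device/vendor" || true
        fi
        if [ -r "$d/device/model" ]; then
            read -r model < "$d/device/model" || true
        fi
        printf '%s\t%s\t%s\n' "$name" "$vendor" "$model"
    done
}

list_kernel_disks
```

On a hardware RAID one would expect this to show the logical volume (such as /dev/sda in the first post) rather than the member disks, which matches the behaviour described above.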
    • votdev wrote:

      To detect the physical volumes behind the HW RAID you have to use CLI commands like tw_cli, sas2ircu or arcconf to detect them.


      Maybe to detect them, but not to access them as you can see here:

      dojrude wrote:

      smartctl /dev/sg1 -a -d cciss,0


      How about a "manual smart feature" that would let you choose the /dev/xxx and set the -d vendor,X setting? I know that is not generic... but it would be a solution after all.

      @ryecoaaron ?

      Greetings
      David
      "Well... lately this forum has become support for everything except omv" [...] "And is like someone is banning Google from their browsers"

      Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.


      Upload Logfile via WebGUI/CLI
      #openmediavault on freenode IRC | German & English | GMT+1
      Absolutely no Support via PM!

      I host parts of the omv-extras.org Repository, the OpenMediaVault Live Demo and the pre-built PXE Images. If you want you can take part and help covering the costs by having a look at my profile page.

      The post was edited 1 time, last by davidh2k ().

    • davidh2k wrote:

      votdev wrote:

      To detect the physical volumes behind the HW RAID you have to use CLI commands like tw_cli, sas2ircu or arcconf to detect them.


      Maybe to detect them, but not to access them as you can see here:

      dojrude wrote:

      smartctl /dev/sg1 -a -d cciss,0


      How about a "manual smart feature" that would let you choose the /dev/xxx and set the -d vendor,X setting? I know that is not generic... but it would be a solution after all.

      @ryecoaaron ?

      Greetings
      David


      That would be awesome if we could manually edit the smart settings. Maybe have an auto or manual setting though, as I'm sure most people will be happy with the auto setting and not want to mess around with it.
    • I would love to see this as well. I have a Dell-branded RAID card which is actually an LSI, and I can also see the drives using the megaraid driver, but if I add the recommended entries to smartd.conf they get removed.

      Source Code

      smartctl -a -d megaraid,4 /dev/sdb
      smartctl -a -d megaraid,5 /dev/sdb
      smartctl -a -d megaraid,6 /dev/sdb
      smartctl -a -d megaraid,7 /dev/sdb


      root@omv:~# smartctl -a -d megaraid,4 /dev/sdb
      smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
      Copyright (C) 2002-11 by Bruce Allen, smartmontools.sourceforge.net

      /dev/sdb [megaraid_disk_04] [SAT]: Device open changed type from 'megaraid' to 'sat'
      === START OF INFORMATION SECTION ===
      Model Family: Western Digital Red
      Device Model: WDC WD20EFRX-68EUZN0
      Serial Number: WD-WMC4M3224655
      LU WWN Device Id: 5 0014ee 0591bf0be
      Firmware Version: 80.00A80
      User Capacity: 2,000,398,934,016 bytes [2.00 TB]
      Sector Sizes: 512 bytes logical, 4096 bytes physical
      Device is: In smartctl database [for details use: -P show]
      ATA Version is: 9
      ATA Standard is: Exact ATA specification draft version not indicated
      Local Time is: Fri May 29 13:43:22 2015 BST
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled

      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED
      Warning: This result is based on an Attribute check.

      General SMART Values:
      Offline data collection status: (0x82) Offline data collection activity
      was completed without error.
      Auto Offline Data Collection: Enabled.
      Self-test execution status: ( 0) The previous self-test routine completed
      without error or no self-test has ever
      been run.
      Total time to complete Offline
      data collection: (26400) seconds.
      Offline data collection
      capabilities: (0x7b) SMART execute Offline immediate.
      Auto Offline data collection on/off support.
      Suspend Offline collection upon new
      command.
      Offline surface scan supported.
      Self-test supported.
      Conveyance Self-test supported.
      Selective Self-test supported.
      SMART capabilities: (0x0003) Saves SMART data before entering
      power-saving mode.
      Supports SMART auto save timer.
      Error logging capability: (0x01) Error logging supported.
      General Purpose Logging supported.
      Short self-test routine
      recommended polling time: ( 2) minutes.
      Extended self-test routine
      recommended polling time: ( 255) minutes.
      Conveyance self-test routine
      recommended polling time: ( 5) minutes.
      SCT capabilities: (0x703d) SCT Status supported.
      SCT Error Recovery Control supported.
      SCT Feature Control supported.
      SCT Data Table supported.

      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
      3 Spin_Up_Time 0x0027 176 172 021 Pre-fail Always - 4166
      4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 61
      5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
      7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
      9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7701
      10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
      11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
      12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 61
      192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 51
      193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 982
      194 Temperature_Celsius 0x0022 112 106 000 Old_age Always - 35
      196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
      197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
      198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
      199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
      200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

      SMART Error Log Version: 1
      No Errors Logged

      SMART Self-test log structure revision number 1
      Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
      # 1 Short offline Completed without error 00% 7574 -
      # 2 Extended offline Completed without error 00% 7484 -
      # 3 Short offline Completed without error 00% 7406 -
      # 4 Extended offline Completed without error 00% 7316 -
      # 5 Extended offline Interrupted (host reset) 10% 7281 -
      # 6 Extended offline Completed without error 00% 7256 -
      # 7 Extended offline Completed without error 00% 7243 -
      # 8 Extended offline Completed without error 00% 7150 -
      # 9 Short offline Completed without error 00% 7072 -
      #10 Extended offline Completed without error 00% 6982 -
      #11 Short offline Completed without error 00% 6904 -
      #12 Extended offline Completed without error 00% 6814 -
      #13 Short offline Completed without error 00% 6737 -
      #14 Extended offline Completed without error 00% 6646 -
      #15 Short offline Completed without error 00% 6569 -
      #16 Extended offline Completed without error 00% 6479 -
      #17 Extended offline Completed without error 00% 6438 -
      #18 Extended offline Completed without error 00% 6407 -
      #19 Extended offline Completed without error 00% 6301 -
      #20 Extended offline Completed without error 00% 6226 -
      #21 Extended offline Completed without error 00% 6217 -

      SMART Selective self-test log data structure revision number 1
      SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
      1 0 0 Not_testing
      2 0 0 Not_testing
      3 0 0 Not_testing
      4 0 0 Not_testing
      5 0 0 Not_testing
      Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
      If Selective self-test is pending on power-up, resume after 0 minute delay.

      HP GEN8 Micro Server 10GB Ram + 4xWD Red 2TB
    • davidh2k wrote:

      votdev wrote:

      To detect the physical volumes behind the HW RAID you have to use CLI commands like tw_cli, sas2ircu or arcconf to detect them.
      Maybe to detect them, but not to access them as you can see here:

      dojrude wrote:

      smartctl /dev/sg1 -a -d cciss,0
      How about a "manual smart feature" that would let you choose the /dev/xxx and set the -d vendor,X setting? I know that is not generic... but it would be a solution after all.

      @ryecoaaron ?

      Greetings
      David

      Any chance of this "manual smart feature" in OMV3 or OMV4? It would solve all the trouble with Areca, LSI, 3ware, etc., which are supported by smartmontools: smartmontools.org/wiki/Supported_RAID-Controllers

      smartmontools.org/browser/trun…montools/smartd.conf.5.in

      The post was edited 1 time, last by bug11 ().

    • @votdev Maybe Volker can work on something, but I bet he wouldn't like it much.

      Greetings
      David
    • davidh2k wrote:

      @votdev Maybe Volker can work on something, but I bet he wouldn't like it much.

      Greetings
      David
      Hehe, you are right. I don't like this idea because as I have really often said, OMV is no webmin replacement. Everything should be done automagically. Or do you see such selection boxes in Synology or other NAS solutions? If such a feature will be implemented it MUST be done magically without user interaction.
    • votdev wrote:

      davidh2k wrote:

      @votdev Maybe Volker can work on something, but I bet he wouldn't like it much.

      Greetings
      David
      Hehe, you are right. I don't like this idea because as I have really often said, OMV is no webmin replacement. Everything should be done automagically. Or do you see such selection boxes in Synology or other NAS solutions? If such a feature will be implemented it MUST be done magically without user interaction.

      Here is an example: forum.rockstor.com/t/s-m-a-r-t-for-usb-drives/834/9

      It has a short edit button on the SMART menu where one can add specific "-d" requirements for each drive.

      I see Rockstor as the closest competitor for OMV, unfortunately, it is a bit too buggy and BTRFS is way off being production ready, which is their required filesystem.