[HowTo] Log disk power states with collectd and Grafana


      I created a small script that tracks disk power states (spin-down/spin-up) using collectd, InfluxDB and Grafana.
      It is based on this script: github.com/collectd/collectd/b…ter/contrib/exec-smartctl
      Now I can see in a graph which drive was spun up or down at which point in time.



      Requirements:

      • collectd (to collect the data; is already part of OMV)
      • InfluxDB (to store the data)
      • Grafana (to show the data)
      I will not explain how to install InfluxDB and Grafana. There are plenty of tutorials available.

      First, configure collectd to send its data to your InfluxDB server by creating the following file.
      Don't forget to change the IP address.

      Shell-Script: /usr/share/openmediavault/mkconf/collectd.d/network

      #!/bin/sh
      # This script adds support for sending data to InfluxDB
      set -e
      . /etc/default/openmediavault
      . /usr/share/openmediavault/scripts/helper-functions
      OMV_COLLECTD_CONFIG=${OMV_COLLECTD_CONFIG:-"/etc/collectd/collectd.conf"}
      OMV_COLLECTD_RRDTOOL_MKGRAPH=${OMV_COLLECTD_RRDTOOL_MKGRAPH:-"/usr/sbin/omv-mkgraph"}
      HOST_NAME=$(hostname)
      # Replace the default hostname "localhost" with this machine's hostname
      sed -i -e "s/Hostname\ \"localhost\"/Hostname\ \"$HOST_NAME\"/g" "${OMV_COLLECTD_CONFIG}"
      # Append the network plugin section; replace the placeholder with your InfluxDB server's IP
      cat <<EOF >> "${OMV_COLLECTD_CONFIG}"
      LoadPlugin network
      <Plugin "network">
        Server "<influxdb-ip-address>" "25826"
      </Plugin>
      EOF
      EDIT: I forgot to mention that you need to copy github.com/collectd/collectd/blob/master/src/types.db to /usr/share/collectd/types.db.
      Also see this guide on preparing InfluxDB to receive data from collectd: anomaly.io/collectd-metrics-to-influxdb/



      Create a user named 'smart' that is allowed to run 'smartctl' via sudo without a password.

      Source Code: /etc/sudoers.d/smart

      Cmnd_Alias SMARTCTL = /usr/sbin/smartctl
      smart ALL = (root) NOPASSWD: SMARTCTL

      Create the following file which will read the power state of your drives.
      Change the list of drives ("[..] sda sdb sdc ...") according to your system.

      Shell-Script: /usr/share/collectd/exec-disk_state.sh

      #!/bin/bash
      HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
      INTERVAL="${COLLECTD_INTERVAL:-60}"
      # To list candidate disks: ls /dev | grep -E "^sd[a-z]$"
      while sleep "$INTERVAL"
      do
        for disk in sda sdb sdc sdd sde sdf
        do
          # -n standby makes smartctl exit without waking a disk that is spun down
          STATE=$(sudo smartctl -i -n standby "/dev/$disk" 2>/dev/null | grep -e "Device is in STANDBY mode" -e "Power mode is: ACTIVE or IDLE")
          if [ "$STATE" = "Device is in STANDBY mode, exit(2)" ]
          then
            # STANDBY
            VALUE="0"
          elif [ "$STATE" = "Power mode is: ACTIVE or IDLE" ]
          then
            # ACTIVE or IDLE
            VALUE="1"
          else
            # ERROR
            VALUE="U"
          fi
          echo "PUTVAL $HOSTNAME/exec-$disk/gauge-disk_state interval=$INTERVAL N:$VALUE"
        done
      done
      If you execute the script manually it will take 60 seconds before the first output is printed.
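      To check the state mapping without waiting for real drives to spin down, the matching step can be pulled into a small function and fed canned text. This is only a sketch: classify_state and putval_line are hypothetical helper names, the host name "nas" is made up, and the wildcard matching is an assumption meant to tolerate differences in spacing or trailing text between smartctl versions.

```shell
#!/bin/bash
# Hypothetical helper: map one line of smartctl output to the gauge value.
# 0 = standby, 1 = active or idle, U = undefined (anything else).
classify_state() {
    case "$1" in
        *"Device is in STANDBY mode"*)        echo 0 ;;
        *"Power mode is:"*"ACTIVE or IDLE"*)  echo 1 ;;
        *)                                    echo U ;;
    esac
}

# Hypothetical helper: build one PUTVAL line in the format used by the script.
# Arguments: host, disk, interval, value
putval_line() {
    echo "PUTVAL $1/exec-$2/gauge-disk_state interval=$3 N:$4"
}

# Example with canned input instead of a real smartctl call.
# prints: PUTVAL nas/exec-sda/gauge-disk_state interval=60 N:0
putval_line "nas" "sda" 60 "$(classify_state "Device is in STANDBY mode, exit(2)")"
```

      collectd accepts U in a PUTVAL line as an undefined reading, so an unexpected smartctl message ends up as a gap rather than as a wrong state.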

      As a bonus, here is the slightly modified version of the original exec-smartctl script.
      This will read the disk temperature only when the disks are not in standby.

      Shell-Script: /usr/share/collectd/exec-smartctl

      #!/bin/bash
      HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
      INTERVAL="${COLLECTD_INTERVAL:-10}"
      while sleep "$INTERVAL"
      do
        for disk in sda sdb sdc sdd sde sdf
        do
          # Column 10 of the Temperature_Celsius row is the raw value
          TEMP=$(sudo smartctl -n standby -d ata -A "/dev/$disk" 2>/dev/null | awk '/Temperature_Celsius/ { print $10; }')
          # An empty result means standby or a read error; report "U" (undefined)
          if [ -z "$TEMP" ]
          then
            TEMP="U"
          fi
          echo "PUTVAL $HOSTNAME/exec-$disk/temperature-disk_temp interval=$INTERVAL N:$TEMP"
        done
      done
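      To try the temperature extraction without touching a real disk, the parsing step can be isolated and fed canned text. The following is a sketch under assumptions: parse_temp is a hypothetical helper name, and the sample attribute row is invented for illustration, not taken from a real drive.

```shell
#!/bin/bash
# Hypothetical helper: read `smartctl -A` style output on stdin and print the
# raw value (column 10) of the Temperature_Celsius row, or "U" if no such row
# exists (for example because the disk was in standby and nothing was printed).
parse_temp() {
    awk '/Temperature_Celsius/ { print $10; found = 1; exit }
         END { if (!found) print "U" }'
}

# Invented sample row in the usual attribute-table layout.
sample='194 Temperature_Celsius 0x0022 112 099 000 Old_age Always - 35'

# prints: 35
printf '%s\n' "$sample" | parse_temp
```

      Checking the parsed text instead of an exit status means that a missing row always yields U rather than an empty value.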
      Don't forget to make the scripts executable.
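      A minimal sketch of that step, assuming the two paths used in this post. check_scripts is a hypothetical helper name; it only reports files that are missing or lack the executable bit.

```shell
#!/bin/bash
# Hypothetical helper: print every given path that is missing or not
# executable, and return non-zero if at least one such path was found.
check_scripts() {
    local rc=0 f
    for f in "$@"; do
        if [ ! -x "$f" ]; then
            echo "not executable: $f"
            rc=1
        fi
    done
    return "$rc"
}

# On the OMV host (paths taken from this post):
#   sudo chmod +x /usr/share/collectd/exec-disk_state.sh /usr/share/collectd/exec-smartctl
#   check_scripts /usr/share/collectd/exec-disk_state.sh /usr/share/collectd/exec-smartctl
```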
      Now we need to tell collectd to use these scripts.

      Shell-Script: /usr/share/openmediavault/mkconf/collectd.d/exec

      #!/bin/sh
      # This script adds support for hdd power state
      set -e
      . /etc/default/openmediavault
      . /usr/share/openmediavault/scripts/helper-functions
      OMV_COLLECTD_CONFIG=${OMV_COLLECTD_CONFIG:-"/etc/collectd/collectd.conf"}
      OMV_COLLECTD_RRDTOOL_MKGRAPH=${OMV_COLLECTD_RRDTOOL_MKGRAPH:-"/usr/sbin/omv-mkgraph"}
      cat <<EOF >> ${OMV_COLLECTD_CONFIG}
      LoadPlugin exec
      <Plugin exec>
        Exec "smart" "/usr/share/collectd/exec-disk_state.sh"
        Exec "smart" "/usr/share/collectd/exec-smartctl"
      </Plugin>
      EOF



      Finally execute the following commands to create and activate the new collectd config.

      Shell-Script

      # create config
      sudo omv-mkconf collectd
      # Restart collectd (not sure if necessary)
      sudo systemctl restart collectd.service



      At this point the data should be arriving in your InfluxDB.
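      If nothing shows up, one quick thing to rule out is that the generated config is missing one of the two plugin sections. The following sketch assumes the default config path /etc/collectd/collectd.conf used in the mkconf scripts; plugin_loaded is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given collectd config file contains a
# "LoadPlugin <name>" line (plugin name with or without quotes).
# Arguments: config file, plugin name
plugin_loaded() {
    grep -Eq "^[[:space:]]*LoadPlugin[[:space:]]+\"?$2\"?[[:space:]]*$" "$1"
}

# On the OMV host:
#   for p in network exec; do
#       plugin_loaded /etc/collectd/collectd.conf "$p" || echo "missing: $p"
#   done
```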
      The next step is to create a graph for this data.
      To display the graph with discrete values, install this plugin: grafana.com/plugins/natel-discrete-panel

      I used this dashboard template as a starting point: grafana.com/dashboards/554
      You can import my graphs using this JSON data.

      Source Code

      {
        "backgroundColor": "rgba(128, 128, 128, 0.1)",
        "colorMaps": [
          {
            "color": "rgb(148, 45, 45)",
            "text": "active"
          },
          {
            "color": "rgb(94, 159, 61)",
            "text": "standby"
          }
        ],
        "datasource": "$datasource",
        "display": "timeline",
        "extendLastValue": true,
        "height": "",
        "highlightOnMouseover": true,
        "id": 17,
        "legendSortBy": "-ms",
        "lineColor": "rgba(128, 128, 128, 1.0)",
        "links": [],
        "mappingTypes": [
          {
            "name": "value to text",
            "value": 1
          },
          {
            "name": "range to text",
            "value": 2
          }
        ],
        "metricNameColor": "#000000",
        "rangeMaps": [
          {
            "from": "null",
            "text": "N/A",
            "to": "null"
          }
        ],
        "rowHeight": 16,
        "showDistinctCount": false,
        "showLegend": true,
        "showLegendCounts": false,
        "showLegendNames": true,
        "showLegendPercent": false,
        "showLegendTime": true,
        "showLegendValues": true,
        "showTransitionCount": false,
        "span": 6,
        "targets": [
          {
            "alias": "$tag_instance",
            "dsType": "influxdb",
            "groupBy": [
              {
                "params": [
                  "instance"
                ],
                "type": "tag"
              }
            ],
            "hide": false,
            "measurement": "exec_value",
            "orderByTime": "ASC",
            "policy": "default",
            "refId": "A",
            "resultFormat": "time_series",
            "select": [
              [
                {
                  "params": [
                    "value"
                  ],
                  "type": "field"
                }
              ]
            ],
            "tags": [
              {
                "key": "host",
                "operator": "=~",
                "value": "/^$host$/"
              },
              {
                "condition": "AND",
                "key": "type_instance",
                "operator": "=",
                "value": "disk_state"
              }
            ]
          }
        ],
        "textSize": 12,
        "title": "Disk states",
        "type": "natel-discrete-panel",
        "valueMaps": [
          {
            "op": "=",
            "text": "N/A",
            "value": "null"
          },
          {
            "op": "=",
            "text": "standby",
            "value": "0"
          },
          {
            "op": "=",
            "text": "active",
            "value": "1"
          }
        ],
        "valueTextColor": "#000000",
        "writeAllValues": false,
        "writeLastValue": true,
        "writeMetricNames": true
      }



      Or create the graphs manually according to these screenshots.




      I hope this will be of use to someone.
      Feel free to give feedback of any kind. :D

      The post was edited 3 times, last by cwempe.