Category:HDD Monitoring with rrdtool
Introduction
This HowTo explains how to set up continuous monitoring of all your hard disks. The setup
- uses smartctl (from the smartmontools package) to read the S.M.A.R.T. attributes
- writes the current status to a round-robin database every 30 minutes using rrdtool
- generates three charts for each S.M.A.R.T. attribute, showing the status over the last week, the last month and the last year
Install Packages
- if not yet done, install Optware IPKG via the QNAP Web Administration site (under "App Center")
Alternative 1:
- launch Optware via the App Center (will open "The ipkg web frontend")
- to update the catalogue, select "Sync packages" -> yes, then press Submit
- filter to "smartmontools" and press Submit then click "install"
- filter to "rrdtool" and press Submit then click "install"
Alternative 2:
Log into your QNAP with SSH.
# ipkg install smartmontools
# ipkg install rrdtool
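Before continuing, it can help to confirm that both tools are reachable from the shell. The following is only a minimal sketch: the helper name missing_tools is made up for this example, and it assumes that the directory containing the Optware binaries (typically below /opt) is on your PATH.

```shell
# Hypothetical helper (not part of the original HowTo):
# prints the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools smartctl rrdtool)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "smartctl and rrdtool are available"
fi
```

If a tool is reported as missing although the package is installed, check whether the Optware binary directory is part of PATH in your SSH session.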
Prepare Directories
# mkdir /mnt/HDA_ROOT/smartrrd
# mkdir /mnt/HDA_ROOT/smartrrd/rrd
# mkdir /share/Web/smartrrd
Install and Adapt the Script
Copy the following script to /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh
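#!/bin/sh
# Note: this script relies on bash features (arrays, [[ =~ ]], BASH_REMATCH, BASH_SOURCE),
# so /bin/sh must be bash on your system; otherwise point the line above at your bash binary.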
script_dir=$(dirname "${BASH_SOURCE[0]}")
script_runtime=$(date '+%s')
http_path="/share/Web/smartrrd"
# 1   5                       29       38    44    50     57        67       76          88
# +4  +24                     +9       +6    +6    +7     +10       +9       +12
# ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
#   1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72984072
smart_regex="^(.{4})(.{24})(.{9})(.{6})(.{6})(.{7})(.{10})(.{9})(.{12})(.+)$"
. $script_dir/smartctl_all_drives.conf
declare -a ATTRIBUTES
IFS='
'
# Get data for all drives from smartmontools and store it in an array ATTRIBUTES
# Later on this will allow writing the values of all drives to the *.rrd file at once
for disk in /dev/sd[a-d]
do
    for oneline in $(smartctl -d ata -A $disk | grep 'Always\|Offline')
    do
        # skip lines that do not match the fixed-width attribute layout
        [[ $oneline =~ $smart_regex ]] || continue
        smart_DISK=${disk:(-3)}
        smart_ID=${BASH_REMATCH[1]// /}
        smart_ID3=$(printf "%03d" $smart_ID)
        smart_ATTRIBUTE_NAME=${BASH_REMATCH[2]// /}
        smart_FLAG=${BASH_REMATCH[3]// /}
        smart_VALUE=${BASH_REMATCH[4]// /}
        smart_WORST=${BASH_REMATCH[5]// /}
        smart_THRESH=${BASH_REMATCH[6]// /}
        smart_TYPE=${BASH_REMATCH[7]// /}
        smart_UPDATED=${BASH_REMATCH[8]// /}
        smart_WHEN_FAILED=${BASH_REMATCH[9]// /}
        smart_RAW_VALUE=${BASH_REMATCH[10]%(*} # remove a trailing "(...)" annotation
        smart_RAW_VALUE=${smart_RAW_VALUE// /}
        # populate attributes array: one "<disk>#<raw value>" entry per drive
        ATTRIBUTES[$smart_ID]+="$smart_DISK#$smart_RAW_VALUE "
    done
done
IFS=' '
# Scan array ATTRIBUTES for values and if existing, write all values to *.rrd
# If necessary (e.g. when run for the first time), create the database
for i in {1..256}
do
    if [[ ${ATTRIBUTES[$i]} ]]; then
        smart_ID3=$(printf "%03d" $i)
        rrd_ds=""
        rrd_value=""
        # split each "<disk>#<raw value>" entry into a DS name list and a value list
        for disk_rawvalue in ${ATTRIBUTES[$i]}
        do
            rrd_ds+=${disk_rawvalue%'#'*}:
            rrd_value+=${disk_rawvalue#*'#'}:
        done
        rrd_ds=${rrd_ds%:}
        rrd_value=${rrd_value%:}
        # create the RRD if it does not exist yet
        if [[ ! -f $script_dir/rrd/$smart_ID3.rrd ]]; then
            rrdtool create "$script_dir/rrd/$smart_ID3.rrd" \
                --step 1800 \
                DS:sda:GAUGE:3600:0:U \
                DS:sdb:GAUGE:3600:0:U \
                DS:sdc:GAUGE:3600:0:U \
                DS:sdd:GAUGE:3600:0:U \
                RRA:MAX:0.5:1:336 \
                RRA:MAX:0.5:2:744 \
                RRA:MAX:0.5:48:365
            # RRA:MAX:0.5:1:336 -> every 30min for 2x24x7 times (one week in 30min interval)
            # RRA:MAX:0.5:2:744 -> every second 30min for 24x31 times (one month in 1h interval)
            # RRA:MAX:0.5:48:365 -> every 48th 30min for 365 times (one year in 1day interval)
        fi
        rrdtool update "$script_dir/rrd/$smart_ID3.rrd" -t $rrd_ds $script_runtime:$rrd_value
    fi
done
# Create charts for all existing *.rrd files
for filename in $script_dir/rrd/*.rrd
do
    # derive the 3-digit ID (file name) and the numeric ID (index into smart_attributes)
    smart_ID3=${filename%'.'*}
    smart_ID3=${smart_ID3#*'/'rrd'/'}
    smart_ID=$(echo $smart_ID3 | sed 's/^0*//')
    rrdtool graph "$http_path/${smart_ID3}_week.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
        --vertical-label "RAW_VALUE" --start end-1w --end $script_runtime \
        DEF:a=$filename:sda:MAX \
        DEF:b=$filename:sdb:MAX \
        DEF:c=$filename:sdc:MAX \
        DEF:d=$filename:sdd:MAX \
        LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
        LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
        LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
        LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"
    rrdtool graph "$http_path/${smart_ID3}_month.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
        --vertical-label "RAW_VALUE" --start end-1m --end $script_runtime \
        DEF:a=$filename:sda:MAX \
        DEF:b=$filename:sdb:MAX \
        DEF:c=$filename:sdc:MAX \
        DEF:d=$filename:sdd:MAX \
        LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
        LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
        LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
        LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"
    rrdtool graph "$http_path/${smart_ID3}_year.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
        --vertical-label "RAW_VALUE" --start end-1y --end $script_runtime \
        DEF:a=$filename:sda:MAX \
        DEF:b=$filename:sdb:MAX \
        DEF:c=$filename:sdc:MAX \
        DEF:d=$filename:sdd:MAX \
        LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
        LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
        LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
        LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"
done
# Recreate index.html
echo "" > $http_path/index.html
for i in {1..256}
do
    if [[ ${ATTRIBUTES[$i]} ]]; then
        smart_ID3=$(printf "%03d" $i)
        echo "<img src=\"${smart_ID3}_week.png\"><img src=\"${smart_ID3}_month.png\"><img src=\"${smart_ID3}_year.png\"><br>" \
            >> $http_path/index.html
    fi
done
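The three RRA definitions in the script determine how much history each chart can show: the time covered by one archive is the step, times the number of steps consolidated per row, times the number of rows. As a quick cross-check of the comments in the script, this can be computed directly. The helper rra_span_days is only an illustration and not part of the script.

```shell
# Hypothetical helper: number of days covered by one RRA.
# $1 = step in seconds, $2 = steps consolidated per row, $3 = number of rows
rra_span_days() {
    echo $(( $1 * $2 * $3 / 86400 ))
}

rra_span_days 1800 1 336    # RRA:MAX:0.5:1:336  -> 7   (one week)
rra_span_days 1800 2 744    # RRA:MAX:0.5:2:744  -> 31  (one month)
rra_span_days 1800 48 365   # RRA:MAX:0.5:48:365 -> 365 (one year)
```

If you change the cron interval, the --step value and the row counts would have to be adjusted together so that the archives still cover a week, a month and a year.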
The script is designed for the 4 drives sda, sdb, sdc and sdd.
Several places in the script have to be adapted if you have more or fewer drives, or if your drive identifiers differ from sda to sdd.
I posted this script here in the hope that somebody would make it more flexible later :-)
- for disk in /dev/sd[a-d] -> change according to what "fdisk -l" reports about the installed drives
- DS:sda:GAUGE:3600:0:U -> add or remove one DS line per drive
- DEF:a=$filename:sda:MAX \ -> add or remove one DEF line per drive in all 3 charts (week/month/year)
- LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \ -> add or remove one line per drive in all 3 charts (week/month/year) and choose a color for each drive; note that in rrdtool the number after LINE is the line width, not a drive index
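As a possible starting point for making the script more flexible, the per-drive arguments listed above could be generated from a single drive list instead of being hard-coded. The following is an untested-on-QNAP sketch that requires bash; the names drives, colors, ds_args, graph_args and build_graph_args are invented for this example, and it assumes one color per drive in the same order.

```shell
# Requires bash (arrays). All names below are illustrative assumptions.
drives=(sda sdb sdc sdd)
colors=(FF0000 800000 00FF00 0000FF)   # one color per drive, same order

# One DS definition per drive, for "rrdtool create"
ds_args=()
for d in "${drives[@]}"; do
    ds_args+=("DS:$d:GAUGE:3600:0:U")
done

# Fills the global array graph_args with DEF/LINE1/GPRINT entries for one rrd file
build_graph_args() {
    local rrd_file=$1 i d
    graph_args=()
    for i in "${!drives[@]}"; do
        d=${drives[$i]}
        graph_args+=(
            "DEF:$d=$rrd_file:$d:MAX"
            "LINE1:$d#${colors[$i]}:/dev/$d"
            "GPRINT:$d:LAST:%6.lf %s"
        )
    done
}

# Show what would be passed to rrdtool for attribute 001
build_graph_args "/mnt/HDA_ROOT/smartrrd/rrd/001.rrd"
printf '%s\n' "${ds_args[@]}" "${graph_args[@]}"
```

In the script, "${ds_args[@]}" would then take the place of the four DS lines in the rrdtool create call, and "${graph_args[@]}" the place of the DEF and LINE lines at the end of each rrdtool graph call. Existing .rrd files keep the data sources they were created with, since the script only calls rrdtool create when the file is missing.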
Install Script Config File
Save the following file to /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.conf
The array is used to create meaningful chart titles.
smart_attributes[1]='001 Raw_Read_Error_Rate'
smart_attributes[2]='002 Throughput_Performance'
smart_attributes[3]='003 Spin_Up_Time'
smart_attributes[4]='004 Start_Stop_Count'
smart_attributes[5]='005 Reallocated_Sector_Ct'
smart_attributes[7]='007 Seek_Error_Rate'
smart_attributes[8]='008 Seek_Time_Performance'
smart_attributes[9]='009 Power_On_Hours'
smart_attributes[10]='010 Spin_Retry_Count'
smart_attributes[11]='011 Calibration_Retry_Count'
smart_attributes[12]='012 Power_Cycle_Count'
smart_attributes[181]='181 Program_Fail_Cnt_Total'
smart_attributes[183]='183 Runtime_Bad_Block'
smart_attributes[184]='184 End-to-End_Error'
smart_attributes[187]='187 Reported_Uncorrect'
smart_attributes[188]='188 Command_Timeout'
smart_attributes[189]='189 High_Fly_Writes'
smart_attributes[190]='190 Airflow_Temperature_Cel'
#smart_attributes[190]='190 ??'
smart_attributes[191]='191 G-Sense_Error_Rate'
smart_attributes[192]='192 Power-Off_Retract_Count'
smart_attributes[193]='193 Load_Cycle_Count'
smart_attributes[194]='194 Temperature_Celsius'
smart_attributes[195]='195 Hardware_ECC_Recovered'
smart_attributes[196]='196 Reallocated_Event_Count'
smart_attributes[197]='197 Current_Pending_Sector'
smart_attributes[198]='198 Offline_Uncorrectable'
smart_attributes[199]='199 UDMA_CRC_Error_Count'
smart_attributes[200]='200 Multi_Zone_Error_Rate'
#smart_attributes[200]='200 ???'
smart_attributes[223]='223 Load_Retry_Count'
smart_attributes[225]='225 Load_Cycle_Count'
smart_attributes[240]='240 Head_Flying_Hours'
#smart_attributes[240]='240 ???'
smart_attributes[241]='241 Total_LBAs_Written'
smart_attributes[242]='242 Total_LBAs_Read'
If attributes reported by your drives are missing here, please edit this wiki page and add them above. You can identify the attribute names using
smartctl -d ata -A /dev/sda
Unfortunately, some IDs have multiple meanings depending on the drive vendor, e.g. 190, 200, 230, 231, 232, 233 and 240 (see: http://en.wikipedia.org/wiki/S.M.A.R.T.)
If your drives use the strings that are commented out, adapt the .conf file accordingly.
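To see how the script's smart_regex splits one attribute line into its fixed-width columns, the match can be tried in isolation without touching a drive. The sketch below rebuilds the sample line from the script's comments with printf, using the same column widths (4, 24, 9, 6, 6, 7, 10, 9, 12). The function name parse_smart_line is made up for this example, and bash is required.

```shell
# Requires bash ([[ =~ ]], BASH_REMATCH, printf -v).
smart_regex="^(.{4})(.{24})(.{9})(.{6})(.{6})(.{7})(.{10})(.{9})(.{12})(.+)$"

# Hypothetical helper: prints "<ID> <ATTRIBUTE_NAME> <RAW_VALUE>" for one line
# and returns non-zero if the line does not match the fixed-width layout.
parse_smart_line() {
    local raw
    [[ $1 =~ $smart_regex ]] || return 1
    raw=${BASH_REMATCH[10]%(*}      # drop a trailing "(...)" annotation
    echo "${BASH_REMATCH[1]// /} ${BASH_REMATCH[2]// /} ${raw// /}"
}

# Sample line built with the same column widths as the smartctl attribute table
printf -v sample '%3d %-24s%-9s%-6s%-6s%-7s%-10s%-9s%-12s%s' \
    1 Raw_Read_Error_Rate 0x000f 114 099 006 Pre-fail Always '    -' 72984072

parse_smart_line "$sample"    # prints: 1 Raw_Read_Error_Rate 72984072
```

The second field is the attribute name that belongs into the .conf file next to its ID. If a line from your drive is not matched, its columns are probably laid out differently and the widths in smart_regex would need adjusting.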
Setup crontab
# vi /etc/config/crontab
add the following line:
*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh
# crontab /etc/config/crontab
# /etc/init.d/crond.sh restart
After 30 minutes there should be files in the directory /mnt/HDA_ROOT/smartrrd/rrd as well as in /share/Web/smartrrd
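If you prefer to script the crontab change, the entry can be appended only when it is not already present, so that repeating the setup does not create duplicate cron lines. This is a minimal sketch: the function name add_cron_entry is an assumption, and the crontab still has to be reloaded with the two commands shown above.

```shell
# Hypothetical helper: append a line to a crontab file unless it is already there.
# $1 = crontab file, $2 = complete cron line
add_cron_entry() {
    grep -qF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example call on the QNAP (commented out so that nothing is modified by accident):
# add_cron_entry /etc/config/crontab '*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh'
```

grep -F treats the cron line as a fixed string, so the asterisks in the schedule are not interpreted as a pattern.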
On my system, running smartctl_all_drives.sh at the command line initially failed with an error caused by the missing rrd directory; manually creating the rrd directory made things work.
Also: make the script executable with chmod +x smartctl_all_drives.sh (the .conf file is only sourced by the script, so it just needs to be readable).
[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
ERROR: creating './rrd/001.rrd': No such file or directory
ERROR: opening './rrd/001.rrd': No such file or directory
..
[/mnt/HDA_ROOT/smartrrd] # mkdir rrd
[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
497x207
497x207
...
Open Monitoring Website
Make sure the Web Server service is enabled (Control Panel, Applications, Web Server).
Now you can open the monitoring site, which should be available under one of the following addresses:
http://<QNAP>/smartrrd
https://<QNAP>/smartrrd
https://<QNAP>:8081/smartrrd
Enjoy