[smartmontools-support] Disk failure notification options

Tue Jan 5 03:17:16 CET 2021

Hi,

I have a Fedora 33 system with smartmontools-7.1 and believe I have a
failing disk:

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1946
  3 Spin_Up_Time            0x0027   178   178   021    Pre-fail  Always       -       6100
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       116
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   044   044   000    Old_age   Always       -       41073
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       113
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       55
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       9461832
194 Temperature_Celsius     0x0022   118   110   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       26

If I recall correctly, the raw value of Raw_Read_Error_Rate was under a
few hundred about a month ago. Are there any other indications here that
would lead you to believe this disk is failing? Do these disks typically
last more than 60,000 hours?
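
To stop relying on memory for the older values, I could keep snapshots of
the attribute table and compare them. Below is a minimal Python sketch of
that idea, not anything smartmontools ships. It assumes the usual
one-attribute-per-line `smartctl -A` layout with a purely numeric
RAW_VALUE; the names parse_attributes, below_threshold and raw_changes are
my own, and SAMPLE is just a few rows copied from the output above.

```python
import re

# A few rows copied from the smartctl output in this mail (one per line).
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1946
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       2
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       9461832
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

# Assumed row layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]{4}\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<raw>\d+)"
)


def parse_attributes(text):
    """Return {attribute_id: fields} for every attribute row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m["id"])] = {
                "name": m["name"],
                "value": int(m["value"]),
                "worst": int(m["worst"]),
                "thresh": int(m["thresh"]),
                "raw": int(m["raw"]),
            }
    return attrs


def below_threshold(attrs):
    """IDs whose normalized VALUE is less than or equal to THRESH."""
    return [i for i, a in attrs.items() if a["value"] <= a["thresh"]]


def raw_changes(old, new):
    """Map id -> (old_raw, new_raw) for attributes whose raw value changed."""
    return {
        i: (old[i]["raw"], new[i]["raw"])
        for i in new
        if i in old and old[i]["raw"] != new[i]["raw"]
    }


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print("at or below threshold:", below_threshold(attrs))
    for attr_id, a in attrs.items():
        print(attr_id, a["name"], "raw =", a["raw"])
```

For the rows in SAMPLE, no normalized VALUE is at or below its THRESH.
Passing two parsed snapshots to raw_changes would show how far a raw
counter such as Raw_Read_Error_Rate has moved between them.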

I have smartd running:
/usr/sbin/smartd -n -q never

I've also configured /etc/smartmontools/smartd.conf with the following
for this disk:
/dev/sdc -a -R 1 -W 4,45,55 -H -m admin@example.com -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q

I'm hoping this configuration line will do the following:
- monitor all drive aspects
- send an alert whenever the Raw_Read_Error_Rate changes
- send an alert whenever the temperature changes by >= 4 degrees Celsius
or reaches >= 45C, and log a critical alert when it reaches >= 55C
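
To make it easier to check each piece against smartd.conf(5), here is a
small Python sketch that splits the entry into its directives and names
the fields of the -W argument. It is only an illustration of how I read
the line, not how smartd parses it. The helper names split_directives and
decode_temperature are mine, and the field names DIFF, INFO and CRIT are
the ones I understand the man page to use for -W.

```python
# The smartd.conf entry from this mail, written as a single logical line.
LINE = (
    "/dev/sdc -a -R 1 -W 4,45,55 -H -m admin@example.com "
    "-M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q"
)


def split_directives(line):
    """Split an entry into its device and a list of (directive, args) pairs.

    Only the directives used on this one line are handled: -a and -H take
    no argument, "-M exec" takes the type plus a path, the rest take one.
    """
    device, *tokens = line.split()
    pairs = []
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag in ("-a", "-H"):
            n = 0
        elif flag == "-M" and tokens[i + 1] == "exec":
            n = 2
        else:
            n = 1
        pairs.append((flag, tokens[i + 1:i + 1 + n]))
        i += 1 + n
    return device, pairs


def decode_temperature(arg):
    """Name the three fields of a -W argument (assumes all three are given)."""
    diff, info, crit = (int(x) for x in arg.split(","))
    return {"DIFF": diff, "INFO": info, "CRIT": crit}


if __name__ == "__main__":
    device, pairs = split_directives(LINE)
    print("device:", device)
    for flag, args in pairs:
        print(flag, args)
        if flag == "-W":
            print("   ", decode_temperature(args[0]))
```

Seen this way, the questions I am really asking are whether -R 1 and the
INFO limit of -W result in a mail or only in a log entry.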

I just want to be sure I'm not doing something wrong that will
overlook an early warning alert for this drive failing.

Thanks,
Alex