[smartmontools-support] using smartd to monitor a rotating group of USB drives?
Nathan Stratton Treadway
nathanst at ontko.com
Wed Oct 31 06:19:25 CET 2018
We have a backup server which has four SATA drives inside the system,
plus a cycle of USB external drives which are plugged in for a few days
while waiting for a backup to be written to them, then once used are
unplugged (and taken off site), and the next drive in the cycle is
plugged in.
We list the four internal drives explicitly in smartd.conf, and smartd
monitors them with no problem.
We'd also like to have smartd check the status of whatever external
drive(s) happen to be plugged in at any point in time, and send an alert
email if there are SMART attributes (or error logs, or whatever) on those
drives reflecting a failure condition... but it doesn't really work as
we'd like.
(We've been trying this out using smartmontools 6.5+svn4324-1 as found
in the Bionic release of Ubuntu.)
Of course we looked at the "-d removable" option for lines in
smartd.conf... but that doesn't seem to apply to DEVICESCAN-detected
devices.
In any case, we're running into the following issues in our scenario:
A) we get "SMART error (FailedOpenDevice) detected" warning messages
each time a drive is unplugged, repeated daily until some new drive
is plugged in and takes over the old /dev/sdX device name.
B) when a new drive is plugged in, it appears that smartd doesn't check
to see if the drive now found as /dev/sdX is actually the same drive
as the one found at start-up time -- and thus over time multiple
different drives are all treated as the same drive, with data saved
to the same /var/lib/smartmontools/*<DEVICE_MODEL>_<SERIAL_NUMBER>*
files (named after whatever happened to be plugged in at startup
time) -- and, I believe, any warning emails sent include the original
drive's info in the body of the message, rather than the info for the
drive that's actually attached at that time.
C) smartd only checks for devices mapped to the /dev/sd* files that were
found at startup time; if more drives are plugged in simultaneously
after startup, the "extras" won't be detected.
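To make issue B) concrete, here is a minimal Python sketch of the kind of
identity check I would have expected. It is not anything smartd itself does.
It assumes the "Device Model:" and "Serial Number:" labels that "smartctl -i"
typically prints for ATA/SATA devices; other device types may label these
fields differently, and the function names are made up for illustration.

```python
import re

# Field labels as "smartctl -i" typically prints them for ATA/SATA devices.
# Assumption: other device types (e.g. SCSI) may use different labels.
_FIELDS = {
    "model": re.compile(r"^Device Model:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
    "serial": re.compile(r"^Serial Number:[ \t]*(.+?)[ \t]*$", re.MULTILINE),
}


def parse_identity(info_text):
    """Return (model, serial) from "smartctl -i" style text, or None if a field is missing."""
    found = {}
    for key, pattern in _FIELDS.items():
        match = pattern.search(info_text)
        if match is None:
            return None
        found[key] = match.group(1)
    return (found["model"], found["serial"])


def drive_was_swapped(info_at_startup, info_now):
    """True only when both identities are readable and they differ."""
    before = parse_identity(info_at_startup)
    after = parse_identity(info_now)
    return before is not None and after is not None and before != after
```

The two text arguments would be the output of "smartctl -i /dev/sdX" captured
at startup and again at check time. An unreadable identity is deliberately
not reported as a swap, since an unplugged drive is a different condition.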
I looked through the FAQ page on the Wiki and the Trac tickets, but the
only thing I found that appeared directly related to the above issues
was Trac ticket #60 "DEVICESCAN and hotplug", which covers issue C).
Is there some way I'm missing to suppress the FailedOpenDevice alert
message completely for particular devices?
It seems like issue B) would cause problems in many situations, even on
systems that don't purposefully go through a whole cycle of external
drives over time, but I haven't been able to find an existing Trac
ticket for it. Does it make sense for me to open one?
Has anyone else had any success using smartd to watch out for
warning/error conditions on a rotating group of external USB drives
(under Linux in particular)?
One thought I had for working around all three issues (at the expense
of giving up active monitoring of the drives for the full period they
are plugged in) was to leave the standard Ubuntu-installed smartd running
as usual, monitoring the internal drives, and then periodically run a
separate "smartd -q onecheck -c /etc/smartd_for_usb_drives.conf" command
to do a status check on whatever external drives are available at that
moment.
Will it cause any problems/conflicts to run "smartd -q onecheck" at the
same time as the standard long-term smartd process is running (assuming
that I make sure the internal drives are excluded from monitoring by the
smartd_for_usb_drives.conf file)?
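For what it's worth, here is a rough Python sketch of how I imagine
generating that second config file so that only USB drives end up in it. It
assumes the udev convention of "usb-" prefixed links under /dev/disk/by-id
with "-partN" suffixes for partitions. The function name and the "-a -m root"
directives are only illustrative choices, and some USB bridges may also need
an explicit "-d TYPE" that this does not attempt to guess.

```python
import re

# Suffix udev appends to by-id links that point at partitions, not whole disks.
_PARTITION_SUFFIX = re.compile(r"-part\d+$")


def usb_smartd_config(by_id_paths, mail_to="root"):
    """Build smartd.conf text with one line per whole-disk USB by-id link."""
    lines = []
    for path in sorted(by_id_paths):
        name = path.rsplit("/", 1)[-1]
        # Skip anything that is not a USB link (e.g. internal "ata-" drives)
        # and skip partition links such as "...-part1".
        if not name.startswith("usb-") or _PARTITION_SUFFIX.search(name):
            continue
        lines.append(f"{path} -d removable -a -m {mail_to}")
    return "\n".join(lines) + ("\n" if lines else "")
```

A cron job could pass in glob.glob("/dev/disk/by-id/usb-*"), write the result
to /etc/smartd_for_usb_drives.conf, and then run the onecheck command, probably
skipping the run entirely when the result is empty.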
Thanks.
Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - nathanst at ontko.com - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239