[smartmontools-support] using smartd to monitor a rotating group of USB drives?
Nathan Stratton Treadway
nathanst at ontko.com
Sun Nov 4 23:02:52 CET 2018
On Sun, Nov 04, 2018 at 19:51:32 +0100, Christian Franke wrote:
> >B) when a new drive is plugged in, it appears that smartd doesn't check
> > to see if the drive now found as /dev/sdX is actually the same drive
> > as the one found at start-up time -- and thus over time multiple
> > different drives are all treated as the same drive, with data saved
> > to the same /var/lib/smartmontools/*<DEVICE_MODEL>_<SERIAL_NUMBER>*
> > files (named after whatever happened to be plugged in at startup
> > time) -- and, I believe, any warning emails sent include the original
> > drive's info in the body of the message, rather than the info for the
> > drive that's actually attached at that time.
>
> A platform independent solution is possibly difficult. It may work
> to add another option (e.g. "-d replaceable") to enable a re-check
> of device identity before each check cycle.
Is part of the issue that a normal check cycle doesn't retrieve the
identity information from the drive?
Specifically for the "B)" issue, it seems like the platform shouldn't be
an issue -- or, at least, the platform-dependent factors should be no
greater than what already has been implemented: smartd must already get
and parse the model+serial info when it first detects the device (since
it names the state file, etc. using that info), so the question is
whether it should repeat that same process each time it opens the
device, to make sure the assumption that one physical device is
associated with a particular /dev/ file continues to hold true.
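To make the idea concrete, here is a minimal sketch of such a re-check. It is written in Python rather than smartd's C++ and is not taken from the smartd source; DeviceIdentity and same_device are hypothetical names, and how the model and serial strings are actually read from the drive is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceIdentity:
    """Model and serial number as reported by the drive (hypothetical type)."""
    model: str
    serial: str


def same_device(expected: DeviceIdentity, current: Optional[DeviceIdentity]) -> bool:
    """Return True only if the drive now behind the device file is the expected one.

    `current` is None when no identity could be read (e.g. nothing plugged in).
    Surrounding whitespace is ignored, since identity strings are often padded.
    """
    if current is None:
        return False
    return (expected.model.strip() == current.model.strip()
            and expected.serial.strip() == current.serial.strip())
```

The identity recorded at start-up would be passed as `expected`, and the identity read on each later open as `current`.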
If there is too much of a performance hit (or whatever) re-applying the
identity logic on (e.g.) internal-to-machine SATA drives for each check
cycle, a new option to turn on such checks for particular devices would
certainly be an improvement over the current situation.
However, given the confusion that results if a device-change happens
without smartd noticing, in general it seems like it would be ideal just
to always ask the question "Is the device currently accessed when I open
this device file the device I expect to be there?". (I haven't tried it
myself, but I believe some of our servers have hot-pluggable drive bays,
so even the /dev/sdb physically located inside the machine could be
unplugged and replaced while the system is running....)
In those cases where a new device is detected, the "easy" response is
simply to treat the situation like the FailedOpenDevice case: skip
the device for that round of checks, but try again in future check
cycles and see if the original device comes back.
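A rough sketch of that "easy" response follows, again in Python and purely illustrative. The function run_check_cycle, its callback parameters, and the result strings are all made up for this example and do not correspond to anything in smartd.

```python
from typing import Callable, Dict, Optional, Tuple

Identity = Tuple[str, str]  # (model, serial); hypothetical representation


def run_check_cycle(
    expected: Dict[str, Identity],
    read_identity: Callable[[str], Optional[Identity]],
    check_device: Callable[[str], None],
) -> Dict[str, str]:
    """Run one check cycle, skipping any device file whose drive has changed.

    `expected` maps device paths to the identity seen at start-up.
    `read_identity` returns the identity currently behind a path, or None
    if the device cannot be opened. Nothing is remembered between cycles,
    so a skipped device is simply tried again on the next call.
    """
    results: Dict[str, str] = {}
    for path, want in expected.items():
        found = read_identity(path)
        if found is None:
            results[path] = "open-failed"
        elif found != want:
            # A different drive is plugged in: do not touch the original
            # drive's saved state, just skip this round.
            results[path] = "skipped-different-device"
        else:
            check_device(path)
            results[path] = "checked"
    return results
```

Because the expected identity is never overwritten, monitoring resumes as soon as the original drive is plugged back in.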
Obviously the "fancy" response is to go ahead and start monitoring the
new device, making sure to do so using the new device's identity
information and switching to the appropriate .state file for that new
device, etc. It might well make sense to require this response to be
explicitly enabled (via a new option in smartd.conf). (And quite
possibly actually implementing the fancy response overlaps some
functionality with the resolution for Trac ticket #60.... )
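For comparison, the "fancy" response might look roughly like the Python sketch below. Everything here is an assumption for illustration: MonitoredSlot, follow_replacements, and state_file_for are hypothetical names, and the file name pattern is only loosely modelled on the <DEVICE_MODEL>_<SERIAL_NUMBER> naming described above, not smartd's exact scheme.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

Identity = Tuple[str, str]  # (model, serial); hypothetical representation

STATE_DIR = "/var/lib/smartmontools"  # directory mentioned earlier in the thread


def _safe(text: str) -> str:
    """Replace runs of non-alphanumeric characters so the text is file-name safe."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text.strip())


def state_file_for(identity: Identity) -> str:
    """Build an illustrative per-drive state file path (not smartd's real naming)."""
    model, serial = identity
    return f"{STATE_DIR}/smartd.{_safe(model)}_{_safe(serial)}.state"


@dataclass
class MonitoredSlot:
    """One device file plus the identity of the drive currently tracked there."""
    path: str
    identity: Identity
    follow_replacements: bool = False  # stands in for the proposed opt-in option

    @property
    def state_file(self) -> str:
        return state_file_for(self.identity)

    def on_open(self, found: Optional[Identity]) -> str:
        """Decide what to do for this cycle given the identity just read."""
        if found is None:
            return "skip"
        if found == self.identity:
            return "check"
        if not self.follow_replacements:
            return "skip"  # the "easy" response
        # The "fancy" response: adopt the new drive, and with it a new state file.
        self.identity = found
        return "switched"
```

With follow_replacements left off, this degrades to the skip-and-retry behaviour, so the default stays conservative.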
Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - nathanst at ontko.com - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239