[smartmontools-support] Automatically alert the user on S.M.A.R.T. health warnings

Claudio Kuenzler napsty at gmail.com
Tue Nov 23 13:02:41 CET 2021


> After some time, I decided to learn some powerful, popular monitoring
> system that could automatically alert me about these kinds of problems.
> There would be some learning curve due to time-series databases etc., but
> it would be worth it. My very simple and common PC health checks will
> surely be built-in, ready to use. So I chose Prometheus to start with.
>

You can go down this road of course, but then you will have to adjust
Prometheus to "trigger" alerts when certain thresholds are reached.

> The advice I am looking for here is not actually specific to Prometheus.
> The main problem is that I am not familiar with the S.M.A.R.T. values. What
> is the difference between "raw value" and "value"? Which one do I need to
> compare against "threshold"? Should that be <, <=, > or >= ?
>

This is where it gets tricky. Not all disks have the same SMART
attributes. Sometimes the names differ, the attribute ID might be different
and the thresholds might also be different.
Even if all your computers run the same drive models, some attributes
can be ignored, while others have to be taken more seriously.
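On the "raw value" vs "value" question: smartctl compares the normalized VALUE (and WORST) column with THRESH, and the smartctl man page says an attribute has failed when its normalized value is less than or equal to its threshold. RAW values are vendor-specific and are not compared with THRESH; a threshold of 0 is treated as "always passing". A minimal Python sketch of that rule, assuming the JSON output of `smartctl -A -j` (smartmontools 7.0 or later). The sample data and the state labels are made up for illustration, and the sketch ignores smartctl's per-drive attribute database:

```python
import json

# Sample shaped like the "ata_smart_attributes" section of `smartctl -A -j`.
# The numbers are invented for illustration only.
SAMPLE_JSON = """
{
  "ata_smart_attributes": {
    "table": [
      {"id": 5, "name": "Reallocated_Sector_Ct", "value": 100, "worst": 100,
       "thresh": 10, "raw": {"value": 0, "string": "0"}},
      {"id": 197, "name": "Current_Pending_Sector", "value": 100, "worst": 100,
       "thresh": 0, "raw": {"value": 8, "string": "8"}},
      {"id": 3, "name": "Spin_Up_Time", "value": 21, "worst": 21,
       "thresh": 21, "raw": {"value": 9000, "string": "9000"}}
    ]
  }
}
"""


def attribute_state(attr: dict) -> str:
    """Classify one attribute by comparing normalized VALUE/WORST with THRESH.

    RAW values are vendor-specific and are never compared with THRESH.
    The returned labels are hypothetical names chosen for this sketch.
    """
    thresh = attr["thresh"]
    if thresh == 0:
        # Threshold 0 is the "always passing" threshold: it can never fail.
        return "no_threshold"
    if attr["value"] <= thresh:
        return "failing_now"
    if attr["worst"] <= thresh:
        return "failed_in_past"
    return "ok"


def classify(smartctl_json: str) -> dict:
    """Map attribute name -> state for every row of the attribute table."""
    data = json.loads(smartctl_json)
    table = data.get("ata_smart_attributes", {}).get("table", [])
    return {a["name"]: attribute_state(a) for a in table}


if __name__ == "__main__":
    for name, state in classify(SAMPLE_JSON).items():
        print(f"{name}: {state}")
```

Note how Current_Pending_Sector has a non-zero raw counter but no threshold at all: this is exactly the kind of attribute where you have to decide yourself whether the raw value deserves an alert.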

While Prometheus is very helpful for performance measuring and graph
comparison, alerting is not its strong suit. I'd personally go with a
monitoring system which is built for alerting, such as Nagios and its forks,
Icinga, or Sensu.
You can use the existing check_smart.pl (Perl) monitoring plugin:
https://www.claudiokuenzler.com/monitoring-plugins/check_smart.php
The catch, if I understood correctly, is that the PCs you are monitoring
are Windows machines. This means you will have to install smartmontools
(if not already done) and also Perl on Windows to be able to run the
monitoring plugin. You will also have to install a monitoring agent
(Icinga or NSClient, to name two) to integrate these PCs into the
monitoring software and execute check_smart remotely.
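For the classical monitoring route, the plugin convention is simple: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is not check_smart.pl, just a hypothetical minimal Python check in the same style. It assumes smartctl 7.0+ (for `-j` JSON output) is on the PATH and, on Windows, that Python is installed instead of Perl. The attribute logic follows the "VALUE <= THRESH means failed" rule from the smartctl man page:

```python
import json
import subprocess
import sys

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(data: dict) -> tuple:
    """Turn parsed `smartctl -H -A -j` output into (exit code, message)."""
    passed = data.get("smart_status", {}).get("passed")
    if passed is None:
        return UNKNOWN, "no SMART health status in smartctl output"
    if not passed:
        return CRITICAL, "overall SMART health self-assessment FAILED"

    failing, past = [], []
    for attr in data.get("ata_smart_attributes", {}).get("table", []):
        thresh = attr.get("thresh", 0)
        if thresh == 0:  # 0 is the "always passing" threshold
            continue
        if attr["value"] <= thresh:
            failing.append(attr["name"])
        elif attr["worst"] <= thresh:
            past.append(attr["name"])

    if failing:
        return CRITICAL, "attributes failing now: " + ", ".join(failing)
    if past:
        return WARNING, "attributes failed in the past: " + ", ".join(past)
    return OK, "SMART health passed, no attribute at or below threshold"


def main(device: str) -> int:
    try:
        # smartctl's exit status is a bitmask, so a non-zero code is not
        # necessarily fatal here; the decision is based on the JSON instead.
        proc = subprocess.run(["smartctl", "-H", "-A", "-j", device],
                              capture_output=True, text=True)
        data = json.loads(proc.stdout)
    except (OSError, json.JSONDecodeError) as exc:
        print(f"SMART UNKNOWN - could not run or parse smartctl: {exc}")
        return UNKNOWN
    code, message = evaluate(data)
    print(f"SMART {LABELS[code]} - {message}")
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "/dev/sda"))
```

The monitoring agent only needs to run the script and forward the status line and exit code; the alerting and notification logic stays in the monitoring server.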

So both ways (Prometheus or classical monitoring) are possible - but both
require you to do some workarounds and adjustments.


>
> (Get-WmiObject -Namespace root\wmi -Class
> MSStorageDriver_FailurePredictStatus).PredictFailure
>

Using WMI is a good idea, but I've personally never used it for drive
monitoring. Maybe it works for you :-)


>
> Is there a way to generate such a simple indicator with smartmontools? I
> could then run a script at regular intervals, and feed the result to
> Prometheus.
>

Or you use the previously mentioned check_smart.pl plugin, run it at
regular intervals and feed the result to Prometheus. check_smart does use
smartmontools in the background; it just handles the thresholds and alerts
the user. I haven't seen that combination with Prometheus yet, but I guess
it should be possible, as the performance data from check_smart could be
parsed and forwarded to a Prometheus exporter. However, that's yet another
workaround.
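As an even simpler indicator, smartctl's own exit status is a bitmask (documented in the EXIT STATUS section of the smartctl man page): bit 3 means the health check returned "DISK FAILING", bit 4 means a prefail attribute is at or below its threshold, bits 6 and 7 mean the error or self-test log contains errors, and so on. A minimal Python sketch that decodes those bits and writes them in the Prometheus text exposition format, so that the textfile collector of node_exporter (or of windows_exporter) could pick them up. The metric names `smart_exit_status_bit` and `smart_disk_failing` are hypothetical:

```python
import os
import subprocess
import sys
import tempfile

# Meaning of each bit in smartctl's exit status (smartctl man page, EXIT STATUS).
EXIT_BITS = {
    0: "command_line_parse_error",
    1: "device_open_failed",
    2: "smart_command_failed",
    3: "disk_failing",
    4: "prefail_attribute_at_or_below_threshold",
    5: "attribute_below_threshold_in_past",
    6: "error_log_has_errors",
    7: "selftest_log_has_errors",
}


def decode_exit_status(status: int) -> dict:
    """Return {reason: 0 or 1} for every documented bit of the exit status."""
    return {name: (status >> bit) & 1 for bit, name in EXIT_BITS.items()}


def escape_label(value: str) -> str:
    # Prometheus label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(device: str, status: int) -> str:
    """Render the decoded bits in the Prometheus text exposition format."""
    dev = escape_label(device)
    bits = decode_exit_status(status)
    lines = [
        "# HELP smart_exit_status_bit smartctl exit status bit (1 = set).",
        "# TYPE smart_exit_status_bit gauge",
    ]
    for reason, value in bits.items():
        lines.append(f'smart_exit_status_bit{{device="{dev}",reason="{reason}"}} {value}')
    # The "simple indicator": the disk reports FAILING, or a prefail
    # attribute is at or below its threshold right now.
    failing = bits["disk_failing"] | bits["prefail_attribute_at_or_below_threshold"]
    lines += [
        "# HELP smart_disk_failing 1 if smartctl reports an imminent failure.",
        "# TYPE smart_disk_failing gauge",
        f'smart_disk_failing{{device="{dev}"}} {failing}',
    ]
    return "\n".join(lines) + "\n"


def write_textfile(path: str, content: str) -> None:
    """Write atomically so the collector never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the collector (which reads *.prom) away from it.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the exporter read it
    os.replace(tmp, path)


if __name__ == "__main__":
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/sda"
    out = sys.argv[2] if len(sys.argv) > 2 else "smart.prom"
    # `-a` reads health, attributes and logs, so all bits can be set.
    # Only the exit status is used; smartctl's text output is discarded.
    proc = subprocess.run(["smartctl", "-a", device], stdout=subprocess.DEVNULL)
    write_textfile(out, to_prometheus(device, proc.returncode))
```

Run it periodically (cron, or the Task Scheduler on Windows) with the output path inside the exporter's textfile directory, for example node_exporter's `--collector.textfile.directory`. The actual alerting would then still be a Prometheus alerting rule, e.g. on `smart_disk_failing == 1`.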

Maybe there are other (commercial) alternatives around, but I personally
focus on open source solutions. The other mailing list members might have
other/additional ideas or solutions.

cheers,
ck