[smartmontools-support] Reasonable config of smartd to determine when to discard a disk (SATA, NVMe)

Erik Starbäck erik.starback at uppmax.uu.se
Thu Nov 23 12:34:11 CET 2023


For all our nodes with HW-RAID, we think it is rather clear: when the RAID controller kicks out a disk, we discard it. We let the HW-RAID make the decision for us.

But most of our nodes don't have HW-RAID, so we need to decide in some other way.

Our current approach is to let smartd handle it. The systems use each distribution's default smartd configuration (CentOS 7, Rocky 8 and some Ubuntu).
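For reference, here is a sketch of a smartd.conf line (not any distribution's default; the schedule and the handler path are assumptions) that keeps the usual monitoring but hands each warning to a local program through the `-M exec` directive instead of having smartd mail directly. `/usr/local/sbin/smartd-triage` is a hypothetical script name. The `-s` expression is the example given in smartd.conf(5): a short self-test daily between 02:00 and 03:00 and a long self-test on Saturdays between 03:00 and 04:00. Support for some directives on NVMe devices differs between smartmontools releases, so check the man page for the installed version.

```
# /etc/smartd.conf (sketch, adjust per site)
# -a       default set of checks (health, attributes, error and self-test logs)
# -s       short self-test daily at 02, long self-test on Saturdays at 03
# -m root  address that smartd passes on to the handler
# -M exec  run the (hypothetical) handler instead of the default mailer
DEVICESCAN -a -s (S/../.././02|L/../../6/03) -m root -M exec /usr/local/sbin/smartd-triage
```

Temporarily adding `-M test` to the same line makes smartd send one test warning per device at startup, which is a cheap way to confirm that the handler is actually invoked.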

But when should we discard a disk? When should we reinstall the node and keep the same disk? And when should we just ignore the warning?

For example, I have received mails saying "Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors". Would it be unreasonable to discard that disk?
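One way to make such a mail actionable is to extract the sector count and compare it with a site limit. Below is a minimal POSIX sh sketch, assuming the message text has the form quoted above; the function names `pending_count` and `pending_decision` and the default `PENDING_LIMIT=10` are hypothetical choices, not a smartmontools recommendation.

```shell
#!/bin/sh
# Sketch: decide what to do about a smartd pending-sector warning.
# PENDING_LIMIT is a hypothetical site policy value, not a vendor threshold.
PENDING_LIMIT=${PENDING_LIMIT:-10}

# Print N from "..., N Currently unreadable (pending) sectors",
# or nothing if the message is about something else.
pending_count() {
    printf '%s\n' "$1" |
        sed -n 's/.*[^0-9]\([0-9][0-9]*\) Currently unreadable (pending) sectors.*/\1/p'
}

# Print "replace", "watch" or "not-pending" for one smartd message.
pending_decision() {
    _n=$(pending_count "$1")
    if [ -z "$_n" ]; then
        echo not-pending
    elif [ "$_n" -ge "$PENDING_LIMIT" ]; then
        echo replace
    else
        echo watch
    fi
}

# When called with a message as the first argument, print the decision.
if [ $# -gt 0 ]; then
    pending_decision "$1"
fi
```

Pending sectors can be cleared or reallocated when the affected sectors are next written, so a "watch" result may be worth re-checking later rather than acting on immediately.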

Running the command below, I found that smartd can send mail with 152 different subjects. How should I decide what action to take?

 strings /usr/sbin/smartd | grep "Device: %s"

We have about 1500 machines running smartd, so we need a simple way to decide. We cannot examine every report manually.
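Rather than matching on mail subjects, a handler started through smartd's `-M exec` directive can branch on the environment variable `SMARTD_FAILTYPE`, which smartd.conf(5) documents as taking one of a small, fixed set of values (among them Health, Usage, SelfTest, ErrorCount, CurrentPendingSector, OfflineUncorrectableSector, Temperature, EmailTest, and several Failed* values for commands smartd could not run). The sketch below is a minimal example under that assumption; the mapping to replace/watch/ignore, with everything else sent to review, is a hypothetical policy, and logging via `logger` stands in for whatever inventory or ticketing hook a site actually uses.

```shell
#!/bin/sh
# Hypothetical smartd "-M exec" handler: map SMARTD_FAILTYPE to a site action.
# The mapping is an example policy, not a smartmontools recommendation.

triage_action() {
    case "$1" in
        Health|SelfTest|OfflineUncorrectableSector)
            # Imminent failure reported, failed self-test,
            # or unreadable sectors found during offline/self tests.
            echo replace ;;
        CurrentPendingSector|ErrorCount|Usage|Temperature)
            # Worth tracking over time before pulling the disk.
            echo watch ;;
        EmailTest)
            # Generated by "-M test"; only confirms the handler runs.
            echo ignore ;;
        *)
            # FailedOpenDevice, FailedReadSmartData, ...: needs a human.
            echo review ;;
    esac
}

# Only act when invoked by smartd, which sets SMARTD_FAILTYPE.
if [ -n "${SMARTD_FAILTYPE:-}" ]; then
    action=$(triage_action "$SMARTD_FAILTYPE")
    # Placeholder hook: record the decision in syslog.
    logger -t smartd-triage "${SMARTD_DEVICE:-unknown}: $SMARTD_FAILTYPE -> $action"
fi
```

To try it by hand, something like `SMARTD_FAILTYPE=Health SMARTD_DEVICE=/dev/sda sh smartd-triage` should write one line to syslog.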

Thanks in advance for any input!

/Erik Starbäck, UPPMAX, Uppsala University

E-mailing Uppsala University means that we will process your personal data. For more information on how this is performed, please read here: http://www.uu.se/en/about-uu/data-protection-policy