[smartmontools-support] NVMe Critical Warning: What if multiple bits are set?

Noel Kuntze noel.kuntze at thermi.consulting
Sun Jun 9 03:55:33 CEST 2019


Hello Claudio,

Your analysis is not quite right.
The following is the correct reading of the function you linked to:

  * 0x01 = available spare has fallen below threshold
  * 0x02 = temperature is above or below threshold
  * 0x04 = NVM subsystem reliability has been degraded
  * 0x08 = media has been placed in read only mode
  * 0x10 = volatile memory backup device has failed
  * any other bits set = unknown critical warning(s)

I only changed the last line of your list.
w is an unsigned char, i.e. a 1 byte (8 bit) value. It holds the set warnings: each bit can be set or unset independently, depending on whether the corresponding warning exists.
E.g. if "available spare has fallen below threshold" and "temperature is above or below threshold" were both set, w would have the value 0x03 (just 0x01 and 0x02 ORed together). Both warnings are then reported; the value is not treated as unknown.
So the code understands the five least significant bits of the byte (0x01 to 0x10). If any of the remaining three bits (bits 6 to 8 if the bits are indexed starting with 1, i.e. 0x20, 0x40 and 0x80) are set, it additionally adds "unknown critical warning(s)" to the output using jout().
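
To make the bit handling concrete, here is a minimal sketch of the decoding described above. It is not the smartmontools source: decode_critical_warning() is a hypothetical helper that returns the messages as strings instead of printing them with jout(), and it assumes exactly the five known bits listed above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (NOT the smartmontools function): returns one
// message per warning found in the Critical Warning byte w.
std::vector<std::string> decode_critical_warning(unsigned char w)
{
  // Messages for the five known bits, lowest bit first.
  static const char * const known[] = {
    "available spare has fallen below threshold",   // 0x01
    "temperature is above or below threshold",      // 0x02
    "NVM subsystem reliability has been degraded",  // 0x04
    "media has been placed in read only mode",      // 0x08
    "volatile memory backup device has failed"      // 0x10
  };

  std::vector<std::string> msgs;

  // Test each known bit on its own, so several warnings can be
  // reported at the same time.
  for (int i = 0; i < 5; i++) {
    if (w & (1 << i))
      msgs.push_back(known[i]);
  }

  // Anything outside the mask 0x1f (i.e. 0x20, 0x40, 0x80) is reported
  // once as unknown, in addition to any known warnings.
  if (w & ~0x1f)
    msgs.push_back("unknown critical warning(s)");

  return msgs;
}
```

With this sketch, a value of 0x03 yields the first two messages, 0x20 yields only "unknown critical warning(s)", and 0x00 yields an empty list.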

Kind regards

Noel

Am 07.06.19 um 10:05 schrieb Claudio Kuenzler:
> Hello all,
>
> Thanks a lot for constantly working and improving smartmontools, really appreciate that!
>
> I'm trying to figure out the value of the "Critical Warning" attribute in NVMe devices.
> In the source code (https://github.com/smartmontools/smartmontools/blob/e3fdde7aff4cd069e629ee987bf33ac8ccd621ad/smartmontools/nvmeprint.cpp#L300) I can find the following relevant information:
>
>   * 0x01 = available spare has fallen below threshold
>   * 0x02 = temperature is above or below threshold
>   * 0x04 = NVM subsystem reliability has been degraded
>   * 0x08 = media has been placed in read only mode
>   * 0x10 = volatile memory backup device has failed
>   * 0x1f = unknown critical warning(s)
>
>
> So far so good, that's understandable and is translated by smartctl itself into the self-assessment health.
>
> But according to the NVMe specification, it is possible that multiple alerts are set at the same time:
>
> > This field indicates critical warnings for the state of the controller. Each bit corresponds to a critical warning type; multiple bits may be set
>
> Did anyone ever see such a case where two thresholds were reached? For example, it is possible that 0x01 and 0x02 happen at the same time. How will this affect the value and how will smartctl cope with it? Will it just appear as "unknown critical warning(s)"?
>
> Thanks

-- 
Noel Kuntze
IT security consultant

GPG Key ID: 0x0739AD6C
Fingerprint: 3524 93BE B5F7 8E63 1372 AF2D F54E E40B 0739 AD6C

