[smartmontools-support] Reading NVME logs
Christian Franke
Christian.Franke at t-online.de
Wed Dec 21 15:59:57 CET 2022
Thane K. Sherrington wrote:
> Hi all,
> Looking at NVMe logs, I find them hard to understand.
>
> Take the following logs (questions inline).
>
> smartctl pre-7.4 2022-07-17 r5397 [i686-w64-mingw32-w11-21H2(64)] (CircleCI)
> ...
> === START OF SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED - I assume
> this is good, but not a perfect indicator, since I've seen regular
> drives say they passed SMART but had read errors.
This message only exists for consistency with ATA and SCSI output.
smartctl prints PASSED if and only if the "Critical Warning" byte from
the SMART/Health Information log is zero.
Ever since the early days of ATA SMART, a drive with read errors has
not necessarily reported a SMART failure.
https://www.smartmontools.org/wiki/FAQ#ATAdriveisfailingself-testsbutSMARThealthstatusisPASSED.Whatsgoingon
> ...
> SMART/Health Information (NVMe Log 0x02)
For details about this log, see for example "Figure 207" from "NVM
Express Base Specification, revision 2.0b":
https://nvmexpress.org/developers/nvme-specification/
> Critical Warning: 0x00
Bit 0 of this byte would be set if "the available spare capacity has
fallen below the threshold." In that case, "FAILED!" would be printed
above. See the spec for the other bits.
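As a rough illustration of how this byte maps to the health line, here is a minimal Python sketch. The table and function names are hypothetical and this is not smartctl's code; it only names bits 0 to 4, since later spec revisions define additional bits.

```python
# Hypothetical sketch, not smartctl's implementation: decode the NVMe
# SMART/Health "Critical Warning" byte (byte 0 of log page 0x02).
# Only bits 0-4 are named here; newer spec revisions define more bits.
CRITICAL_WARNING_BITS = {
    0: "available spare capacity has fallen below the threshold",
    1: "a temperature has crossed an over/under temperature threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return one description per bit set in the Critical Warning byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Critical Warning is a single byte")
    return [
        CRITICAL_WARNING_BITS.get(bit, f"bit {bit} (see spec)")
        for bit in range(8)
        if value & (1 << bit)
    ]


def overall_health(value: int) -> str:
    """PASSED if and only if no Critical Warning bit is set."""
    return "PASSED" if value == 0 else "FAILED!"


print(overall_health(0x00))           # PASSED
print(decode_critical_warning(0x01))  # ['available spare capacity has fallen below the threshold']
```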
> Temperature: 79 Celsius - This seems high. Is that a problem?
Possibly. If this persists, I would suggest adding more cooling.
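For reference, the log stores this value as the Composite Temperature in Kelvin (bytes 1 and 2, little-endian). It also has two 4-byte counters (bytes 192 to 195 and 196 to 199) holding the minutes spent in the warning and critical composite temperature ranges, which are one way to check whether high temperatures persist. A minimal sketch, assuming the raw 512-byte page is already available as a Python bytes object; the helper name is hypothetical.

```python
import struct

SMART_LOG_SIZE = 512  # NVMe SMART/Health Information log page 0x02


def temperature_info(smart_log: bytes) -> dict[str, int]:
    """Extract temperature fields from a raw SMART/Health log page.

    Hypothetical helper; offsets follow the NVMe Base Specification layout.
    """
    if len(smart_log) < SMART_LOG_SIZE:
        raise ValueError("expected a 512-byte SMART/Health log page")
    # Bytes 1-2: Composite Temperature in Kelvin (little-endian).
    (kelvin,) = struct.unpack_from("<H", smart_log, 1)
    # Bytes 192-195 and 196-199: minutes in the warning and critical ranges.
    warning_minutes, critical_minutes = struct.unpack_from("<II", smart_log, 192)
    return {
        "celsius": kelvin - 273,  # integer approximation of Kelvin -> Celsius
        "warning_minutes": warning_minutes,
        "critical_minutes": critical_minutes,
    }


# Example: a page reporting 352 K (79 Celsius) and 30 minutes in the warning range.
page = bytearray(SMART_LOG_SIZE)
page[1:3] = (352).to_bytes(2, "little")
page[192:196] = (30).to_bytes(4, "little")
print(temperature_info(bytes(page)))
# {'celsius': 79, 'warning_minutes': 30, 'critical_minutes': 0}
```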
> Available Spare: 100% - This looks like 100% of the spare is free.
> I assume that's a good sign?
Yes.
> Available Spare Threshold: 10%
> Percentage Used: 16% - Does this mean that 16% of the drive is used,
> or 16% of the spare space is used?
"Contains a vendor specific estimate of the percentage of NVM subsystem
life used based on the actual usage and the manufacturer’s prediction of
NVM life. ...".
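See the spec for the full text. So it is neither: it is an estimate of
how much of the drive's predicted life has been used.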
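Bytes 3, 4 and 5 of the page hold Available Spare, Available Spare Threshold and Percentage Used. The spec allows Percentage Used to exceed 100 (100 means the estimated endurance has been consumed, which does not necessarily mean the drive has failed) and represents values above 254 as 255. A minimal sketch with hypothetical names, fed with the values from this report:

```python
def wear_info(smart_log: bytes) -> dict[str, object]:
    """Interpret the spare and wear bytes of a raw SMART/Health log page.

    Hypothetical helper; byte offsets follow the NVMe Base Specification.
    """
    if len(smart_log) < 6:
        raise ValueError("log page too short")
    available_spare = smart_log[3]  # remaining spare capacity, normalized 0-100%
    spare_threshold = smart_log[4]  # Available Spare Threshold, percent
    percentage_used = smart_log[5]  # vendor estimate of life used, may exceed 100
    return {
        "available_spare": available_spare,
        "spare_threshold": spare_threshold,
        "percentage_used": percentage_used,
        # Bit 0 of Critical Warning reports this condition.
        "spare_below_threshold": available_spare < spare_threshold,
        # 100 means the estimated endurance is consumed, not necessarily a failure.
        "estimated_endurance_consumed": percentage_used >= 100,
        # 255 stands for "255 or more".
        "percentage_used_saturated": percentage_used == 255,
    }


# First six bytes of a page matching the report: 352 K, 100%, 10%, 16%.
print(wear_info(bytes([0x00, 0x60, 0x01, 100, 10, 16])))
```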
> Unsafe Shutdowns: 703 - Do unsafe shutdowns matter?
This possibly indicates that the system was frequently not shut down
properly.
> Error Information Log Entries: 3,470 - Is this a bad thing?
Unknown. Experience shows that the drive firmware of some vendors
increments this counter frequently.
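Both counters are 16-byte (128-bit) little-endian values in the SMART/Health page: Unsafe Shutdowns at bytes 144 to 159 and Number of Error Information Log Entries at bytes 176 to 191. The sketch below also reads Power Cycles (bytes 112 to 127), on the assumption that comparing unsafe shutdowns with power cycles is a reasonable way to judge how "frequent" they are; that ratio is a heuristic of this sketch, not something smartctl reports. All names are hypothetical, and the power-cycle count in the example is invented because the report elides it.

```python
# Offsets of selected 128-bit little-endian counters in NVMe log page 0x02.
COUNTER_OFFSETS = {
    "power_cycles": 112,
    "unsafe_shutdowns": 144,
    "error_log_entries": 176,
}


def read_counters(smart_log: bytes) -> dict[str, int]:
    """Decode selected 16-byte counters from a raw SMART/Health log page."""
    if len(smart_log) < 192:
        raise ValueError("log page too short")
    return {
        name: int.from_bytes(smart_log[off:off + 16], "little")
        for name, off in COUNTER_OFFSETS.items()
    }


def unsafe_shutdown_ratio(counters: dict[str, int]) -> float:
    """Heuristic: fraction of power cycles without a proper shutdown."""
    cycles = counters["power_cycles"]
    return counters["unsafe_shutdowns"] / cycles if cycles else 0.0


# Example: 703 and 3470 are from the report; 1000 power cycles is made up.
page = bytearray(512)
page[112:128] = (1000).to_bytes(16, "little")
page[144:160] = (703).to_bytes(16, "little")
page[176:192] = (3470).to_bytes(16, "little")
counters = read_counters(bytes(page))
print(counters, unsafe_shutdown_ratio(counters))
```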
> Error Information (NVMe Log 0x01, 16 of 63 entries)
> No Errors Logged - If there are 3470 log entries, why are no errors
> logged?
>
"The controller should clear this log page by removing all entries on
power cycle and Controller Level Reset."
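This differs from ATA error logs, which are persistent. The "Error
Information Log Entries" value, however, counts entries over the life
of the controller, so it can be large while the log page is empty.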
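In the Error Information log (page 0x01), each entry is 64 bytes and begins with a 64-bit Error Count. The spec says this count is retained across power off and uses 0 to mark an invalid (unused) entry, so an empty page next to a large lifetime counter is consistent. A minimal sketch, assuming the raw page is available as bytes; the function name is hypothetical.

```python
import struct

ERROR_LOG_ENTRY_SIZE = 64  # bytes per Error Information log entry


def valid_error_counts(error_log: bytes) -> list[int]:
    """Return the Error Count of each valid entry in a raw log page 0x01."""
    if len(error_log) % ERROR_LOG_ENTRY_SIZE:
        raise ValueError("log length must be a multiple of 64 bytes")
    counts = []
    for offset in range(0, len(error_log), ERROR_LOG_ENTRY_SIZE):
        # Bytes 0-7 of each entry: Error Count (little-endian); 0 = invalid entry.
        (error_count,) = struct.unpack_from("<Q", error_log, offset)
        if error_count:
            counts.append(error_count)
    return counts


# 16 entries, all cleared (e.g. after a power cycle): no valid entries.
cleared = bytes(16 * ERROR_LOG_ENTRY_SIZE)
print(valid_error_counts(cleared))  # []
```

An empty result here corresponds to the "No Errors Logged" situation in the report.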
Regards,
Christian