[smartmontools-support] Reading NVME logs

Christian Franke Christian.Franke at t-online.de
Wed Dec 21 15:59:57 CET 2022


Thane K. Sherrington wrote:
> Hi all,
>     Looking at NVME logs, I find them hard to understand.
>
> Take the following logs (questions inline).
>
> smartctl pre-7.4 2022-07-17 r5397 [i686-w64-mingw32-w11-21H2(64)] 
> (CircleCI)
> ...
> === START OF SMART DATA SECTION ===
> **>>>>>>>>>>>>> *SMART overall-health self-assessment test result: 
> PASSED - I assume this is good, but not a perfect indicator, since I've 
> seen regular drives say they passed SMART but had read errors.*

This message only exists for consistency with ATA and SCSI output. It 
prints PASSED if and only if the "Critical Warning" byte from 
SMART/Health info is zero.
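
As a minimal sketch of that rule (Python; the function name is made up 
for illustration and this is not the smartctl source code):

```python
def overall_health(critical_warning: int) -> str:
    """Return the health verdict for one Critical Warning byte (0..255)."""
    if not 0 <= critical_warning <= 0xFF:
        raise ValueError("Critical Warning is a single byte")
    # PASSED if and only if no bit of the byte is set.
    return "PASSED" if critical_warning == 0 else "FAILED!"


print(overall_health(0x00))  # value from the log in this thread: PASSED
```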

Since the early days of ATA SMART, read errors have not necessarily 
caused a SMART failure to be reported. See the FAQ:
https://www.smartmontools.org/wiki/FAQ#ATAdriveisfailingself-testsbutSMARThealthstatusisPASSED.Whatsgoingon


> ...
> SMART/Health Information (NVMe Log 0x02)

For details about this log, see for example "Figure 207" from "NVM 
Express Base Specification, revision 2.0b":
https://nvmexpress.org/developers/nvme-specification/
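
For orientation, here is a rough sketch of how some of the fields 
discussed in this thread could be decoded from the raw log page. The byte 
offsets are as I recall them from the spec, so please check them against 
the figure before relying on them. The function name and the dictionary 
keys are made up for illustration, and the page is synthetic, filled with 
the values from the quoted output.

```python
import struct


def parse_smart_health(page: bytes) -> dict:
    """Decode a few fields of a raw SMART/Health Information log page."""
    if len(page) < 192:
        raise ValueError("need at least the first 192 bytes of the log page")

    # Byte 0: Critical Warning, bytes 1-2: Composite Temperature (Kelvin),
    # byte 3: Available Spare, byte 4: its threshold, byte 5: Percentage Used.
    warning, kelvin, spare, threshold, used = struct.unpack_from("<BHBBB", page, 0)

    def u128(offset: int) -> int:
        # The counters are 16-byte little-endian integers.
        return int.from_bytes(page[offset:offset + 16], "little")

    return {
        "critical_warning": warning,
        "temperature_celsius": kelvin - 273,
        "available_spare": spare,
        "available_spare_threshold": threshold,
        "percentage_used": used,
        "unsafe_shutdowns": u128(144),   # assumed offset, see lead-in
        "error_log_entries": u128(176),  # assumed offset, see lead-in
    }


# Synthetic page filled with the values from the output quoted in this thread.
page = bytearray(512)
struct.pack_into("<BHBBB", page, 0, 0x00, 79 + 273, 100, 10, 16)
page[144:160] = (703).to_bytes(16, "little")
page[176:192] = (3470).to_bytes(16, "little")

info = parse_smart_health(bytes(page))
print(info)
```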


> Critical Warning:                   0x00

Bit 0 of this byte would be set if "the available spare capacity has 
fallen below the threshold". In that case "FAILED!" would be printed 
above. See the spec for the other bits.
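
A small sketch of decoding the byte bit by bit. Only the text for bit 0 
is quoted from the spec here. The descriptions of bits 1 to 5 are my 
paraphrase from memory and should be verified against the spec. The names 
are hypothetical.

```python
# Bit number -> meaning. Bit 0 is quoted above; bits 1-5 are paraphrased
# from memory of the spec and are assumptions to be verified.
CRITICAL_WARNING_BITS = {
    0: "available spare capacity has fallen below the threshold",
    1: "temperature is outside a temperature threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
    5: "Persistent Memory Region has become read-only or unreliable",
}


def decode_critical_warning(value: int) -> list:
    """Return the descriptions of all known bits set in the byte."""
    return [text for bit, text in CRITICAL_WARNING_BITS.items()
            if value & (1 << bit)]


print(decode_critical_warning(0x00))  # value from this thread: []
print(decode_critical_warning(0x01))
```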


> *>>>>>>>>>>>>> Temperature:                        79 Celsius - This 
> seems high - is that a problem?*

Possibly. If this persists, I would suggest adding more cooling.


> **>>>>>>>>>>>>> *Available Spare:                    100% - This looks 
> like 100% of the spare is free.  I assume that's a good sign?*

Yes.


> Available Spare Threshold:          10%
> **>>>>>>>>>>>>> *Percentage Used:                    16% - Does this 
> mean that 16% of the drive is used, or 16% of the spare space is used?*

Neither. The spec describes this field as follows:
"Contains a vendor specific estimate of the percentage of NVM subsystem 
life used based on the actual usage and the manufacturer’s prediction of 
NVM life. ...".
See the spec for the full text.
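
One possible way to turn the value into a readable statement. This 
assumes, as far as I remember the spec text, that the value may exceed 
100 and that anything above 254 is reported as 255. The function name and 
the wording of the messages are my own.

```python
def describe_percentage_used(value: int) -> str:
    """Describe the Percentage Used byte as an estimate of life consumed."""
    if not 0 <= value <= 255:
        raise ValueError("Percentage Used is a single byte")
    if value == 255:
        # Assumption: the field saturates at 255 for anything above 254.
        return "more than 254% of estimated life used"
    if value >= 100:
        # Estimated endurance consumed; this alone is not a failure report.
        return f"{value}% of estimated life used, estimate exceeded"
    return f"{value}% of estimated life used, about {100 - value}% remaining"


print(describe_percentage_used(16))  # value from the log in this thread
```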


> *>>>>>>>>>>>>> **Unsafe Shutdowns:                   703 - Do unsafe 
> shutdowns matter? *

This possibly indicates that the system was frequently not shut down 
properly.


> *>>>>>>>>>>>>> **Error Information Log Entries:      3,470 -  Is this 
> a bad thing?*

Unknown. Experience shows that the drive firmware of some vendors 
increments this counter frequently.


> Error Information (NVMe Log 0x01, 16 of 63 entries)
> **>>>>>>>>>>>>> *No Errors Logged - If there are 3470 log entries, why 
> are no errors logged?*
>

The spec says about this log:
"The controller should clear this log page by removing all entries on 
power cycle and Controller Level Reset."
This differs from ATA error logs, which are persistent.
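
A toy model of why the counter and the log can disagree. It assumes that 
the counter in the SMART/Health log is kept across power cycles while the 
log entries are not. The class is purely illustrative and ignores details 
such as the maximum number of log entries.

```python
class ToyController:
    """Toy model: a persistent error counter plus a volatile error log."""

    def __init__(self):
        self.num_err_log_entries = 0  # assumed to survive power cycles
        self.error_log = []           # cleared on power cycle or reset

    def record_error(self, description: str) -> None:
        self.num_err_log_entries += 1
        self.error_log.append(description)

    def power_cycle(self) -> None:
        # Only the log entries are removed, the counter keeps its value.
        self.error_log.clear()


ctrl = ToyController()
for i in range(3):
    ctrl.record_error(f"error {i}")
ctrl.power_cycle()

print(ctrl.num_err_log_entries)  # 3
print(ctrl.error_log or "No Errors Logged")
```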

Regards,
Christian


