[smartmontools-support] How to read NVME results

Franc Zabkar fzabkar at internode.on.net
Thu Apr 28 20:47:24 CEST 2022


The Warning and Critical temperature thresholds are reported in the "START 
OF INFORMATION SECTION" (79 and 84 Celsius respectively for this drive). 
These thresholds would depend on the type of sensor in use. Most drives 
now appear to sense temperatures on the flash controller die and in the 
NAND flash chips, although I haven't seen a datasheet which confirms the 
latter. The controller die temperatures are usually quite high, but the 
"composite" temperature reported by the drive is some kind of weighted 
average of its sensors. Earlier SSDs sensed the air temperature via a 
discrete temperature sensor IC, so their readings are lower.
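To answer the "minutes" question: the NVMe specification defines Warning 
Composite Temperature Time as the number of minutes the controller has been 
operational with the composite temperature at or above the warning threshold 
and below the critical one, and Critical Composite Temperature Time as the 
minutes at or above the critical threshold. So 63 means roughly an hour in 
total spent between 79 and 84 Celsius. A minimal Python sketch of checking 
this from smartctl's text output follows; the regex and the helper names 
(parse_fields, classify) are my own assumptions, not part of smartctl.

```python
import re

# Excerpt of the smartctl output quoted later in this thread.
SMARTCTL_EXCERPT = """\
Warning  Comp. Temp. Threshold:     79 Celsius
Critical Comp. Temp. Threshold:     84 Celsius
Temperature:                        59 Celsius
Warning  Comp. Temperature Time:    63
Critical Comp. Temperature Time:    0
"""


def parse_fields(text: str) -> dict[str, int]:
    """Map each 'Label:   <number> ...' line to its integer value."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(.+?):\s+(\d+)", line)
        if match:
            # Collapse the double space smartctl prints in "Warning  Comp."
            label = " ".join(match.group(1).split())
            fields[label] = int(match.group(2))
    return fields


def classify(temp_c: int, warning_c: int, critical_c: int) -> str:
    """Place the composite temperature relative to the drive's own thresholds."""
    if temp_c >= critical_c:
        return "critical"
    if temp_c >= warning_c:
        return "warning"
    return "normal"


fields = parse_fields(SMARTCTL_EXCERPT)
status = classify(
    fields["Temperature"],
    fields["Warning Comp. Temp. Threshold"],
    fields["Critical Comp. Temp. Threshold"],
)
print(f"Composite temperature {fields['Temperature']} C: {status}")
print(f"Minutes in warning range: {fields['Warning Comp. Temperature Time']}")
```

The point of comparing against the drive's own thresholds is that 59 Celsius 
on a composite NVMe reading is not comparable to the 40/50 Celsius rules of 
thumb for mechanical drives.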

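Unsafe Shutdowns are shutdowns that are not preceded by a host command 
telling the SSD to flush its cache and prepare for power-off (STANDBY 
IMMEDIATE on ATA drives; on NVMe drives, the host's shutdown notification, 
CC.SHN). An unsafe shutdown forces the drive into an emergency state in 
which its power-loss data protection, if it has any, must save whatever is 
still held in its volatile buffers.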
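One rough way to put the count of 102 in context is to compare it with the 
Power Cycles counter from the same log (333). This is only a back-of-the-envelope 
sketch: I am assuming the two counters are directly comparable, which the 
drive does not guarantee.

```python
def unsafe_fraction(unsafe_shutdowns: int, power_cycles: int) -> float:
    """Rough fraction of power cycles that ended without a clean shutdown."""
    if power_cycles <= 0:
        raise ValueError("power_cycles must be positive")
    return unsafe_shutdowns / power_cycles


# Values from the SMART/Health Information log quoted further down.
fraction = unsafe_fraction(102, 333)
print(f"{fraction:.1%} of power cycles were unsafe shutdowns")
```

On these numbers, roughly three power cycles in ten ended without a clean 
shutdown, which points to the host (forced power-offs, hibernation or 
power-management quirks) rather than to the drive itself.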

https://www.intel.com/content/dam/support/us/en/documents/ssdc/hpssd/sb/Intel_SSD_320_Series_Product_specification.pdf

"An unsafe shutdown occurs whenever the device is powered off without 
STANDBY IMMEDIATE being the last command."
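Intel's definition can be restated as a simple counting rule over a trace of 
host commands. The sketch below is purely illustrative: the trace format and 
the "POWER_OFF" marker are hypothetical, and real firmware tracks this 
internally rather than from a list of strings.

```python
from collections.abc import Iterable


def count_unsafe_shutdowns(events: Iterable[str]) -> int:
    """Count power-offs whose last preceding command was not STANDBY IMMEDIATE.

    'events' is a hypothetical trace of ATA command names, with "POWER_OFF"
    marking the moment power is removed.
    """
    unsafe = 0
    last_command = None
    for event in events:
        if event == "POWER_OFF":
            if last_command != "STANDBY IMMEDIATE":
                unsafe += 1
            last_command = None  # the next power cycle starts with no history
        else:
            last_command = event
    return unsafe


trace = [
    # Clean: STANDBY IMMEDIATE is the last command before power-off.
    "READ DMA EXT", "WRITE DMA EXT", "FLUSH CACHE EXT", "STANDBY IMMEDIATE", "POWER_OFF",
    # Unsafe: power removed after an ordinary write.
    "READ DMA EXT", "WRITE DMA EXT", "POWER_OFF",
    # Unsafe: powered on and off with no commands at all.
    "POWER_OFF",
]
print(count_unsafe_shutdowns(trace))  # → 2
```

Note that even a cache flush is not enough under this definition; only 
STANDBY IMMEDIATE as the final command counts as clean.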

https://newsroom.intel.com/wp-content/uploads/sites/11/2016/01/Intel_SSD_320_Series_Enhance_Power_Loss_Technology_Brief.pdf

Importance of Power-Loss Data Protection

During a “clean” shutdown, most host systems initiate a command (the 
STANDBY IMMEDIATE command) to an SSD to give the SSD enough time to 
prepare for the shutdown. This allows the SSD to save data currently in 
transition (in temporary buffers) to the non-volatile NAND media.
However, during an unsafe power shutdown, the SSD abruptly loses power 
before the host system can initiate the STANDBY IMMEDIATE command. This 
prevents data in the temporary buffers from being saved in the 
non-volatile NAND.

In the Intel SSD 320 Series, user data and system data are stored in 
temporary buffers for a very short period of time compared to their 
residency in the NAND media. The Intel SSD 320 Series makes sure both 
types of data are protected during unexpected power loss events.
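The difference the Intel brief describes can be shown with a toy model of a 
drive that acknowledges writes once they reach a volatile buffer. This is a 
hypothetical sketch with no real firmware logic, and it deliberately models 
a drive without power-loss protection, to show what such protection exists 
to prevent.

```python
class ToySSD:
    """Toy drive with a volatile write buffer and non-volatile NAND."""

    def __init__(self) -> None:
        self.buffer: list[str] = []  # volatile: lost if power drops
        self.nand: list[str] = []    # non-volatile media

    def write(self, block: str) -> None:
        # The write is acknowledged once it reaches the buffer.
        self.buffer.append(block)

    def standby_immediate(self) -> None:
        # Clean shutdown: the host gives the drive time to commit buffered data.
        self.nand.extend(self.buffer)
        self.buffer.clear()

    def power_loss(self) -> list[str]:
        # Unsafe shutdown: anything still buffered is lost (no PLP in this model).
        lost = list(self.buffer)
        self.buffer.clear()
        return lost


clean = ToySSD()
clean.write("A")
clean.write("B")
clean.standby_immediate()
print(clean.power_loss(), clean.nand)  # → [] ['A', 'B']

unsafe = ToySSD()
unsafe.write("A")
unsafe.write("B")
print(unsafe.power_loss(), unsafe.nand)  # → ['A', 'B'] []
```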



On 29/04/2022 3:07 am, Thane K. Sherrington wrote:
> Hi all,
>      Just wondering how I read the following results:
> 
> Specifically these lines:
> Unsafe Shutdowns:                   102
> Media and Data Integrity Errors:    0
> Error Information Log Entries:      80
> Warning  Comp. Temperature Time:    63
> Critical Comp. Temperature Time:    0
> Temperature Sensor 1:               59 Celsius
> 
> I'm assuming "Unsafe Shutdowns" aren't a failure in and of themselves, 
> but cause stress to the drive (hence "unsafe"), is that right?
> Are "Media and Data Integrity" errors a bad failure (I don't have any on 
> this drive, but for future reference)?  They sound bad.
> Are 80 "Error Information Log Entries" bad - is there a way to see these 
> log entries?  Should I consider this the beginnings of problems?
> Is "Warning  Comp. Temperature Time" the number of minutes the drive has 
> reached a warning temperature?  What is the warning temperature?
> Is "Critical Comp. Temperature Time" the number of minutes the drive has 
> reached critical temperature?  What is critical temp?
> I see the drive is 59 Celsius - on mechanical drives, I always 
> considered 40C to be hot, and 50C to be overheating.  What are the 
> numbers for NVMe SSDs?
> 
> 
> The full log is below.
> 
> 
> smartctl 7.2 2020-12-30 r5155 [i686-w64-mingw32-w10-1607(64)] (sf-7.2-1)
> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Model Number:                       KBG40ZNV256G KIOXIA
> Serial Number:                      X1MPHG45QW82
> Firmware Version:                   HP00AE00
> PCI Vendor/Subsystem ID:            0x1e0f
> IEEE OUI Identifier:                0x8ce38e
> Total NVM Capacity:                 256,060,514,304 [256 GB]
> Unallocated NVM Capacity:           0
> Controller ID:                      0
> NVMe Version:                       1.3
> Number of Namespaces:               1
> Namespace 1 Size/Capacity:          256,060,514,304 [256 GB]
> Namespace 1 Formatted LBA Size:     512
> Namespace 1 IEEE EUI-64:            8ce38e 0402f79283
> Local Time is:                      Thu Apr 28 11:57:27 2022 ADT
> Firmware Updates (0x14):            2 Slots, no Reset required
> Optional Admin Commands (0x001f):   Security Format Frmw_DL NS_Mngmt Self_Test
> Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
> Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
> Maximum Data Transfer Size:         512 Pages
> Warning  Comp. Temp. Threshold:     79 Celsius
> Critical Comp. Temp. Threshold:     84 Celsius
> 
> Supported Power States
> St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
>   0 +     3.60W       -        -    0  0  0  0        1       1
>   1 +     2.60W       -        -    1  1  1  1        1       1
>   2 +     1.80W       -        -    2  2  2  2        1       1
>   3 -   0.0500W       -        -    4  4  4  4      800    1200
>   4 -   0.0050W       -        -    4  4  4  4     3000   32000
> 
> Supported LBA Sizes (NSID 0x1)
> Id Fmt  Data  Metadt  Rel_Perf
>   0 +     512       0         3
>   1 -    4096       0         1
> 
> === START OF SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> SMART/Health Information (NVMe Log 0x02)
> Critical Warning:                   0x00
> Temperature:                        59 Celsius
> Available Spare:                    100%
> Available Spare Threshold:          5%
> Percentage Used:                    0%
> Data Units Read:                    1,242,650 [636 GB]
> Data Units Written:                 2,947,714 [1.50 TB]
> Host Read Commands:                 19,046,587
> Host Write Commands:                30,101,722
> Controller Busy Time:               97
> Power Cycles:                       333
> Power On Hours:                     77
> Unsafe Shutdowns:                   102
> Media and Data Integrity Errors:    0
> Error Information Log Entries:      80
> Warning  Comp. Temperature Time:    63
> Critical Comp. Temperature Time:    0
> Temperature Sensor 1:               59 Celsius
> 
> Error Information (NVMe Log 0x01, 16 of 256 entries)
> No Errors Logged
> 
> 
> -- 
> Thane K. Sherrington

