[smartmontools-support] smartctl crashes reading NVMe error log on SPARC

Wed Mar 1 04:58:24 CET 2023

Hi,

On my SPARC system, smartctl crashes when reading the error log from an
NVMe device, like this:

  # ./smartctl -l error /dev/nvme0n1
  smartctl 7.3 2022-02-28 r5338 [sparc64-linux-6.1.14-00002-g7ccbbcd8239a] (local build)
  Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
  
  === START OF SMART DATA SECTION ===
  Error Information (NVMe Log 0x01, 16 of 256 entries)
  No Errors Logged
  
  double free or corruption (!prev)
  Aborted

The problem appears to be the byte-swapping code in nvme_read_error_log.
Stepping through it with gdb shows that nvme_read_log_page returns 1024,
which appears to be the size of the error_log array in bytes (16 entries
of 64 bytes each), but the loop treats it as the number of array
entries and so swaps well past the end of the array.  I imagine this
issue is present on any big-endian machine.
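
To illustrate the byte-count versus entry-count mix-up, here is a
minimal standalone sketch.  It is not the smartmontools code: the names
fake_error_entry, fake_read_log_page, swap64 and read_entries are made
up for this example, and the 64-byte entry size is only inferred from
1024 / 16 above.  It just shows converting the returned byte count to
an entry count before using it as a loop bound.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one error log entry (assumed to be 64 bytes).
struct fake_error_entry {
  uint64_t error_count;
  unsigned char rest[56];
};
static_assert(sizeof(fake_error_entry) == 64, "entry must be 64 bytes");

// Hypothetical stand-in for nvme_read_log_page():
// fills the buffer and returns the number of BYTES read.
unsigned fake_read_log_page(void * buf, unsigned size)
{
  std::memset(buf, 0, size);
  return size;
}

// Reverse the byte order of a 64-bit value.
uint64_t swap64(uint64_t v)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++) {
    r = (r << 8) | (v & 0xff);
    v >>= 8;
  }
  return r;
}

// Read num_entries entries and return the number of ENTRIES read.
unsigned read_entries(fake_error_entry * log, unsigned num_entries, bool bigendian)
{
  unsigned size = static_cast<unsigned>(num_entries * sizeof(*log));
  unsigned n = fake_read_log_page(log, size);

  // Convert bytes to entries before n is used as a loop bound.
  n /= sizeof(*log);

  if (bigendian) {
    for (unsigned i = 0; i < n; i++)
      log[i].error_count = swap64(log[i].error_count);
  }
  return n;
}
```

With a 16-entry buffer, read_entries returns 16 and the loop touches
only those 16 entries; without the division it would run 1024 times.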

The following patch appears to be sufficient to resolve the crash.

Let me know if you need any more information!

Thanks,
  Nick

--- nvmecmds.cpp.orig	2023-02-28 21:24:45.031641754 -0500
+++ nvmecmds.cpp	2023-02-28 22:42:58.427003068 -0500
@@ -228,6 +228,8 @@ unsigned nvme_read_error_log(nvme_device
   unsigned n = nvme_read_log_page(device, 0xffffffff, 0x01, error_log,
                                   num_entries * sizeof(*error_log), lpo_sup);

+  n /= sizeof(*error_log);
+
   if (isbigendian()) {
     for (unsigned i = 0; i < n; i++) {
       swapx(&error_log[i].error_count);
@@ -240,7 +242,7 @@ unsigned nvme_read_error_log(nvme_device
     }
   }

-  return n / sizeof(*error_log);
+  return n;
 }

 // Read NVMe SMART/Health Information log.