[smartmontools-support] "Unexpected sense" errors logged on Dell PERC H700 controllers

Terry Kennedy TERRY at glaver.org
Wed Jun 26 02:28:05 CEST 2019


Christian Franke wrote:
> The difference is that the HUS156030VLS600 advertises compliance with
> SPC-4 (SCSI Primary Commands - 4) which adds log subpages. Then
> smartctl >= r4679 queries "Supported Log Pages and Subpages log page"
> via LOG SENSE page/subpage 0x00/0xff. This may fail in this case
> because drive firmware does not implement it (which would possibly
> violate SPC-4) or controller pass-through is not SPC-4 compatible.
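
  As a rough illustration of the version gate being discussed, the sketch
below decides from the INQUIRY VERSION byte (byte 2 of standard INQUIRY
data; 05h = SPC-3, 06h = SPC-4, 07h = SPC-5) whether to ask for the
"Supported Log Pages and Subpages" list, with an optional bump of the
threshold to SPC-5 like the patch below. The function and flag names are
hypothetical, not smartmontools identifiers, and the upper bound check
against SCSI_VERSION_HIGHEST is left out.

```cpp
#include <cstdint>

// INQUIRY VERSION codes (byte 2 of standard INQUIRY data), per T10.
constexpr uint8_t kVersionSpc3 = 0x05;
constexpr uint8_t kVersionSpc4 = 0x06;
constexpr uint8_t kVersionSpc5 = 0x07;

// Hypothetical helper, not smartmontools code: should LOG SENSE
// page 0x00 / subpage 0xff ("Supported Log Pages and Subpages") be sent?
// Log subpages arrive with SPC-4, so that is the normal threshold. With
// the PERC H700 workaround enabled the threshold moves up to SPC-5,
// which is the effect of the FreeBSD patch's edited comparison.
constexpr bool wantSubpageList(uint8_t inquiry_version,
                               bool perc_h700_workaround) {
    return inquiry_version >=
           (perc_h700_workaround ? kVersionSpc5 : kVersionSpc4);
}
```

With the workaround enabled, an SPC-4 drive such as the HUS156030VLS600
(VERSION 06h) no longer triggers the subpage request, while SPC-5 devices
still get it.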

  Probably the latter, as the drives in these systems appear to be
warranty replacements for older SPC-3 drives. I created the following
quick-and-dirty patch for my local source tree so that SPC-4 drives are
treated like SPC-3 drives (the subpage check now triggers only for SPC-5
and later). It does not seem to affect the reporting in any way
(comparing before-and-after output):

*** scsiprint.cpp.orig  Thu Dec 27 12:07:44 2018
--- scsiprint.cpp       Tue Jun 25 19:45:44 2019
***************
*** 138,144 ****
          if (err)
              return;
          memcpy(sup_lpgs, gBuf, LOG_RESP_LEN);
!     } else if ((scsi_version >= SCSI_VERSION_SPC_4) &&
                 (scsi_version <= SCSI_VERSION_HIGHEST)) {
          /* unclear what code T10 will choose for SPC-6 */
          memcpy(sup_lpgs, gBuf, LOG_RESP_LEN);
--- 138,151 ----
          if (err)
              return;
          memcpy(sup_lpgs, gBuf, LOG_RESP_LEN);
! /*
!  * For FreeBSD we change this check to only trigger on SPC-5, as SPC-4
!  * drives would otherwise trigger a request that the Dell PERC H700
!  * controller doesn't support, logging errors like (wrapped for convenience):
!  * mfi0: 7204 (614818162s/0x0002/info) - Unexpected sense: PD 03(e0x20/s3) 
!  * Path 5001e8200289f13a, CDB: 4d 00 40 ff 00 00 00 3e fc 00, Sense: 5/24/00
!  */
!     } else if ((scsi_version >= SCSI_VERSION_SPC_5) &&
                 (scsi_version <= SCSI_VERSION_HIGHEST)) {
          /* unclear what code T10 will choose for SPC-6 */
          memcpy(sup_lpgs, gBuf, LOG_RESP_LEN);
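
  The CDB and sense bytes in the mfi0 message quoted in the patch comment
can be decoded by hand. The sketch below assumes the SPC-4 layout of the
10-byte LOG SENSE CDB (opcode 4Dh) and recognizes only the one sense
combination seen here; the type and function names are made up for
illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Fields of a 10-byte LOG SENSE CDB (opcode 4Dh), SPC-4 layout.
struct LogSenseCdb {
    unsigned pc;        // byte 2, bits 7-6: page control (1 = current cumulative)
    unsigned page;      // byte 2, bits 5-0: page code
    unsigned subpage;   // byte 3: subpage code
    unsigned param_ptr; // bytes 5-6: parameter pointer, big-endian
    unsigned alloc_len; // bytes 7-8: allocation length, big-endian
};

LogSenseCdb decodeLogSense(const uint8_t cdb[10]) {
    assert(cdb[0] == 0x4d && "not a LOG SENSE CDB");
    LogSenseCdb f;
    f.pc = (cdb[2] >> 6) & 0x3u;
    f.page = cdb[2] & 0x3fu;
    f.subpage = cdb[3];
    f.param_ptr = (unsigned(cdb[5]) << 8) | cdb[6];
    f.alloc_len = (unsigned(cdb[7]) << 8) | cdb[8];
    return f;
}

// Only the combination from the log line is known here; anything else is
// reported as unrecognized rather than guessed.
std::string describeSense(unsigned key, unsigned asc, unsigned ascq) {
    if (key == 0x5 && asc == 0x24 && ascq == 0x00)
        return "ILLEGAL REQUEST: INVALID FIELD IN CDB";
    return "unrecognized sense";
}
```

For the logged bytes 4d 00 40 ff 00 00 00 3e fc 00 this gives page control
1 (current cumulative values), page 0x00, subpage 0xff and an allocation
length of 0x3efc (16124) bytes, i.e. the "Supported Log Pages and Subpages"
request; sense 5/24/00 means a field in that CDB was rejected, which fits
the theory that the subpage request is what the PERC path does not handle.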

  In testing this, I discovered another issue. SanDisk SAS SSDs derived
from the Pliant product acquisition react badly to the "READ DEFECT DATA
(12)" command. Besides logging an error in the system error log, they
increment the "Non-medium error count" stored on the drive each time they
receive this command. The following patch (which should be used only on
systems with those drives) disables this function, returning "defect list
not found" to the caller instead:

*** scsicmds.cpp.orig   Sun Dec  2 11:07:26 2018
--- scsicmds.cpp        Tue Jun 25 20:21:13 2019
***************
*** 1129,1134 ****
--- 1129,1147 ----
   * command not supported, 3 if field in command not supported, 101 if
   * defect list not found (e.g. SSD may not have defect list) or returns
   * negated errno. SBC-3 section 5.18 (rev 35; vale Mark Evans) */
+ /*
+  * SanDisk LT[nn]00MO/WM/RO SSD units with (at least) Dell Firmware D416
+  * reject this command and return "Defect list not found" (0/1c/00). But
+  * requesting the defect list logs errors like (wrapped for convenience):
+  * mfi0: 7210 (614822013s/0x0002/info) - Unexpected sense: PD 03(e0x20/s3)
+  * Path 5001e8200289f13a, CDB: b7 0c 00 00 00 00 00 00 00 08 00 00, Sense:
+  * 1/1c/00
+  * Worse, this increments the "Non-medium error count" on the drive. So
+  * this patch should be applied ONLY to systems including the above drive
+  * models (or other drives exhibiting the same misbehavior, such as the
+  * Pliant / SanDisk LB[n]06M/S/R).
+  */
+ 
  int
  scsiReadDefect12(scsi_device * device, int req_plist, int req_glist,
                   int dl_format, int addrDescIndex, uint8_t *pBuf, int bufLen)
***************
*** 1138,1143 ****
--- 1151,1157 ----
      uint8_t cdb[12];
      uint8_t sense[32];
  
+     return 101; /* just bail out without doing anything */
      memset(&io_hdr, 0, sizeof(io_hdr));
      memset(cdb, 0, sizeof(cdb));
      io_hdr.dxfer_dir = DXFER_FROM_DEVICE;
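
  Likewise, the READ DEFECT DATA (12) CDB logged for the SanDisk drives can
be decoded against the SBC-3 layout (opcode B7h, flags and format in byte
1, 32-bit address descriptor index in bytes 2-5, 32-bit allocation length
in bytes 6-9). The helper names below are hypothetical, and the
sense-to-result mapping only mirrors the "101 if defect list not found"
convention documented above scsiReadDefect12(); it is not the actual
smartmontools sense handling.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a 12-byte READ DEFECT DATA (12) CDB (opcode B7h), SBC-3 layout.
struct ReadDefect12Cdb {
    bool req_plist;       // byte 1, bit 4: return the primary (factory) list
    bool req_glist;       // byte 1, bit 3: return the grown list
    unsigned format;      // byte 1, bits 2-0: list format (4 = bytes from index)
    uint32_t addr_index;  // bytes 2-5: address descriptor index
    uint32_t alloc_len;   // bytes 6-9: allocation length
};

// Read a big-endian 32-bit value.
static uint32_t be32(const uint8_t *p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

ReadDefect12Cdb decodeReadDefect12(const uint8_t cdb[12]) {
    assert(cdb[0] == 0xb7 && "not a READ DEFECT DATA (12) CDB");
    ReadDefect12Cdb f;
    f.req_plist = (cdb[1] >> 4) & 1;
    f.req_glist = (cdb[1] >> 3) & 1;
    f.format = cdb[1] & 0x7u;
    f.addr_index = be32(cdb + 2);
    f.alloc_len = be32(cdb + 6);
    return f;
}

// Hypothetical mapping: ASC 1Ch ("defect list not found" and its primary/
// grown variants) becomes 101, matching the return-value convention in the
// comment above scsiReadDefect12(). Other sense data is not handled here.
constexpr int kDefectListNotFound = 101;
constexpr int kUnhandledSense = -1;

int resultFromSense(unsigned asc) {
    return asc == 0x1c ? kDefectListNotFound : kUnhandledSense;
}
```

For b7 0c 00 00 00 00 00 00 00 08 00 00 this yields a grown-list-only
request in bytes-from-index format with an 8-byte allocation length, i.e.
just the list header. The patched scsiReadDefect12() returns the same 101
without sending the command at all, so the drive never receives it.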

  It doesn't seem as if there is a good way to deal with this in the
drivedb.h file: all of the -F bugfixes are marked "ATA only" in the
manpage. These problem combinations (older controllers, older drives) may
be rare enough these days that it isn't worth adding logic for their
quirks to the main smartmontools distribution. What do you think?
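
  If such logic were ever wanted for SCSI devices, one possible shape is a
small table keyed on the INQUIRY product ID and revision, in the spirit of
drivedb.h. Everything below is hypothetical and not smartmontools code:
the flag and function names are invented, and the product-ID patterns are
guesses derived from the model names in the patch comment (with the
trailing blanks of the INQUIRY fields assumed to be trimmed).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical quirk flags for SCSI devices (not smartmontools code).
enum ScsiQuirk : unsigned {
    kQuirkNone = 0,
    kQuirkSkipReadDefect12 = 1u << 0,  // never send READ DEFECT DATA (12)
};

struct QuirkEntry {
    const char *product_regex;   // matched against the trimmed INQUIRY product ID
    const char *revision_regex;  // matched against the trimmed product revision
    unsigned quirks;
};

// Patterns guessed from the model names in the patch comment.
const QuirkEntry kQuirkTable[] = {
    {"LT[0-9]{2}00(MO|WM|RO)", "D416", kQuirkSkipReadDefect12},
    {"LB[0-9]06(M|S|R)", ".*", kQuirkSkipReadDefect12},
};

// OR together the quirks of every matching table entry.
unsigned lookupQuirks(const std::string &product, const std::string &revision) {
    unsigned quirks = kQuirkNone;
    for (const QuirkEntry &e : kQuirkTable) {
        if (std::regex_match(product, std::regex(e.product_regex)) &&
            std::regex_match(revision, std::regex(e.revision_regex)))
            quirks |= e.quirks;
    }
    return quirks;
}
```

Keying on the drive suits the READ DEFECT DATA problem; the LOG SENSE
subpage failure looks controller-related, so a per-drive table would not
be the right place for that one.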

        Terry Kennedy     http://www.glaver.org      New York, NY USA
