[smartmontools-support] Calculating from smartctl output to recover bad sectors on FreeBSD ZFS

Budi Janto budijanto at studiokaraoke.co.id
Thu Mar 11 13:44:14 CET 2021


Hi,

I am running FreeBSD 12.2-RELEASE with the ZFS file system, and my `dmesg`
output looks like this:

Mar 11 18:45:36 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Mar 11 18:45:36 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30 1f 90 00 b8 01 00 20 00
Mar 11 18:45:36 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Error 5, Retries exhausted
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA Status Error
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30 1f 90 00 b8 01 00 20 00
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying command, 3 more tries remain
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA Status Error
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30 1f 90 00 b8 01 00 20 00
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying command, 2 more tries remain
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA Status Error
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30 1f 90 00 b8 01 00 20 00
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying command, 1 more tries remain
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA Status Error
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30 1f 90 00 b8 01 00 20 00
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying command, 0 more tries remain
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA Status Error
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30 1f 90 00 b8 01 00 20 00
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Error 5, Retries exhausted
Mar 11 18:45:55 BEC-STG-P1 ZFS[5827]: vdev I/O failure, zpool=$pool path=$/dev/diskid/DISK-ZGY3WLSZp1 offset=$3784407121920 size=$16384 error=$5
Mar 11 18:45:55 BEC-STG-P1 ZFS[5828]: pool I/O failure, zpool=$pool error=$5
Mar 11 18:45:57 BEC-STG-P1 ZFS[5829]: vdev state changed, pool_guid=$3628960460546579489 vdev_guid=$5117958107908570560
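
The RES and ACB lines in the log above appear to carry the addresses involved. Below is a minimal sketch of decoding them. It assumes the byte order that FreeBSD's CAM layer is understood to print: for RES, status, error, LBA low/mid/high, device, LBA low/mid/high "exp", then the sector count; for ACB, command, features, then the same six LBA bytes around the device byte. It also assumes 512-byte logical sectors. The function and variable names are illustrative only.

```shell
#!/bin/sh
# decode_lba LOW MID HIGH LOW_EXP MID_EXP HIGH_EXP
# Joins the six address bytes, most significant first, into one 48-bit LBA.
decode_lba() {
    echo $((0x$6$5$4$3$2$1))
}

# RES: 41 40 [30 1f 90] 00 [b8 01 00] 20 00  (address where the read failed)
res_lba=$(decode_lba 30 1f 90 b8 01 00)

# ACB: 60 20 [28 1f 90] 40 [b8 01 00] 00 00 00  (first LBA of the request)
acb_lba=$(decode_lba 28 1f 90 b8 01 00)

# ZFS reported offset=3784407121920 relative to the p1 partition.
zfs_lba=$((3784407121920 / 512))

echo "failing LBA (from RES):      $res_lba"
echo "byte offset on the disk:     $((res_lba * 512))"
echo "request start LBA (from ACB): $acb_lba"
echo "ACB start minus ZFS offset:  $((acb_lba - zfs_lba)) sectors"
```

If the assumed layout is right, the difference on the last line would be the starting sector of the p1 partition, which can be compared against `gpart show ada2` as a sanity check of the decoding.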


Here is the output of the `smartctl` utility:
root@BEC-STG-P1:~ # smartctl -i /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY3WLSZ
LU WWN Device Id: 5 000c50 0b4a12246
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Mar 11 19:17:43 2021 WIB
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@BEC-STG-P1:~ # smartctl -A /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   048   044    Pre-fail  Always       -       133286904
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       46
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   045    Pre-fail  Always       -       905479472
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14422 (51 39 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       47
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       2330
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   055   040    Old_age   Always       -       32 (Min/Max 24/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       6190
194 Temperature_Celsius     0x0022   032   045   000    Old_age   Always       -       32 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10538 (198 46 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       18307534011
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       496531125795
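
The counters most relevant to bad sectors can be pulled out of this table with a short filter. This is only a sketch: it uses four rows copied from the output above as sample input, and it assumes one attribute per line with the raw value in the tenth column. On a live system, the output of `smartctl -A /dev/ada2` could presumably be piped into the same awk program.

```shell
#!/bin/sh
# Keep only the attributes that describe reallocated, uncorrectable and
# pending sectors, printed as NAME=RAW_VALUE.
summary=$(awk '$1 == 5 || $1 == 187 || $1 == 197 || $1 == 198 { print $2 "=" $10 }' <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       2330
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
EOF
)
echo "$summary"

# Size of the pending area, assuming the count is in 512-byte logical sectors.
pending=$(echo "$summary" | awk -F= '$1 == "Current_Pending_Sector" { print $2 }')
echo "pending bytes: $((pending * 512))"
```

Eight pending 512-byte logical sectors add up to 4096 bytes, which would be consistent with a single damaged 4096-byte physical sector, although the drive does not state this directly.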

root@BEC-STG-P1:~ # smartctl -l error /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 2329 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2329 occurred at disk power-on lifetime: 14422 hours (600 days + 22 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   60 00 30 ff ff ff 4f 00   1d+04:32:43.920  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:43.920  READ FPDMA QUEUED
   2f 00 01 10 00 00 00 00   1d+04:32:43.882  READ LOG EXT
   60 00 30 ff ff ff 4f 00   1d+04:32:41.159  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:41.159  READ FPDMA QUEUED

Error 2328 occurred at disk power-on lifetime: 14422 hours (600 days + 22 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   60 00 30 ff ff ff 4f 00   1d+04:32:41.159  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:41.159  READ FPDMA QUEUED
   2f 00 01 10 00 00 00 00   1d+04:32:41.112  READ LOG EXT
   60 00 30 ff ff ff 4f 00   1d+04:32:38.381  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:38.381  READ FPDMA QUEUED

Error 2327 occurred at disk power-on lifetime: 14422 hours (600 days + 22 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   60 00 30 ff ff ff 4f 00   1d+04:32:38.381  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:38.381  READ FPDMA QUEUED
   2f 00 01 10 00 00 00 00   1d+04:32:38.343  READ LOG EXT
   60 00 30 ff ff ff 4f 00   1d+04:32:35.600  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:35.600  READ FPDMA QUEUED

Error 2326 occurred at disk power-on lifetime: 14422 hours (600 days + 22 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   60 00 30 ff ff ff 4f 00   1d+04:32:35.600  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:35.600  READ FPDMA QUEUED
   2f 00 01 10 00 00 00 00   1d+04:32:35.553  READ LOG EXT
   60 00 30 ff ff ff 4f 00   1d+04:32:32.843  READ FPDMA QUEUED
   60 00 30 ff ff ff 4f 00   1d+04:32:32.831  READ FPDMA QUEUED

Error 2325 occurred at disk power-on lifetime: 14422 hours (600 days + 22 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   60 00 30 ff ff ff 4f 00   1d+04:32:32.843  READ FPDMA QUEUED
   60 00 30 ff ff ff 4f 00   1d+04:32:32.831  READ FPDMA QUEUED
   60 00 60 ff ff ff 4f 00   1d+04:32:32.816  READ FPDMA QUEUED
   60 00 20 ff ff ff 4f 00   1d+04:32:32.811  READ FPDMA QUEUED
   60 00 28 ff ff ff 4f 00   1d+04:32:32.803  READ FPDMA QUEUED
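
Every entry above reports LBA 0x0fffffff. That is the largest value that fits in 28 bits, which suggests the summary error log cannot hold the 48-bit address used by READ FPDMA QUEUED, rather than that sector 268435455 itself is bad. A quick check of the arithmetic, assuming 512-byte logical sectors as reported by `smartctl -i`:

```shell
#!/bin/sh
# LBA printed in every error log entry.
reported=$((0x0fffffff))

# Largest address a 28-bit LBA field can express.
max28=$(( (1 << 28) - 1 ))

# Number of logical sectors on the disk: User Capacity / 512.
total_sectors=$((4000787030016 / 512))

echo "reported LBA:        $reported"
echo "largest 28-bit LBA:  $max28"
echo "sectors on the disk: $total_sectors"
```

The extended error log, `smartctl -l xerror /dev/ada2`, is designed to record full 48-bit addresses and may be a better source for the real failing LBA.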

My question is: how can I write to the disk with the dd utility so that
the bad sector addresses are no longer used by the system? I tried
something like this:

root@BEC-STG-P1:~ # sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16

root@BEC-STG-P1:~ # dd if=/dev/zero of=/dev/ada0 bs=128k count=1 seek=268435455
dd: /dev/ada0: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.000089 secs (0 bytes/sec)

The bs value comes from this output (that attempt failed, so I changed to bs=512):
root@BEC-STG-P1:~ # zfs get all | grep recordsize
pool  recordsize            128K                   default
                             ^^^^

And the seek address comes from this output:
40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
                                                        ^^^^^^^^^
Is this calculation correct?
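
One point worth checking in this calculation: dd counts seek in units of bs, not in sectors, so the same seek value lands at very different byte offsets depending on bs. The sketch below works through that arithmetic and only prints a candidate command instead of running it. The LBA in the last step is an assumption, taken from reading the RES bytes in the kernel log as a 48-bit address, and the device name is the one from the kernel messages.

```shell
#!/bin/sh
seek=268435455
disk_bytes=4000787030016   # User Capacity reported by smartctl -i

# Byte offset that dd would actually seek to for each block size.
off_128k=$((seek * 131072))
off_512=$((seek * 512))
echo "bs=128k seek=$seek -> byte offset $off_128k"
echo "bs=512  seek=$seek -> byte offset $off_512"

if [ "$off_128k" -gt "$disk_bytes" ]; then
    echo "the bs=128k offset lies past the end of a 4 TB disk"
fi

# With bs equal to the logical sector size, seek is simply the LBA.
lba=7391420208   # assumed: decoded from RES bytes 30 1f 90 / b8 01 00
cmd="dd if=/dev/zero of=/dev/ada2 bs=512 count=1 seek=$lba"
echo "candidate command (not executed): $cmd"
```

Choosing bs=512 keeps the seek value identical to the logical sector number, so no further conversion is needed once the real failing LBA is known.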



-- 
Regards,


Budi Janto
