[smartmontools-support] Calculating smartctl output to recover bad-sector on FreeBSD ZFS

Budi Janto budijanto at studiokaraoke.co.id
Sun Mar 14 01:04:09 CET 2021



On 3/14/21 12:42 AM, Christian Franke wrote:
> Did you try any of the suggested smartctl options (-l xerror -l defects) ?

# smartctl -l defects /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Pending Defects log (GP Log 0x0c) not supported

# smartctl -l xerror /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 30 (device log contains only the most recent 24 errors)
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 30 [5] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 38 00 01 c7 35 16 b0 40 08 39d+05:11:02.405  READ FPDMA QUEUED
   2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:11:02.404  READ LOG EXT
   60 01 00 00 28 00 01 c7 35 16 b0 40 08 39d+05:11:00.273  READ FPDMA QUEUED
   2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:11:00.273  READ LOG EXT
   60 01 00 00 18 00 01 c7 35 16 b0 40 08 39d+05:10:58.176  READ FPDMA QUEUED

Error 29 [4] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 28 00 01 c7 35 16 b0 40 08 39d+05:11:00.273  READ FPDMA QUEUED
   2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:11:00.273  READ LOG EXT
   60 01 00 00 18 00 01 c7 35 16 b0 40 08 39d+05:10:58.176  READ FPDMA QUEUED
   2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:58.175  READ LOG EXT
   60 01 00 00 08 00 01 c7 35 16 b0 40 08 39d+05:10:54.238  READ FPDMA QUEUED

Error 28 [3] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 18 00 01 c7 35 16 b0 40 08 39d+05:10:58.176  READ FPDMA QUEUED
   2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:58.175  READ LOG EXT
   60 01 00 00 08 00 01 c7 35 16 b0 40 08 39d+05:10:54.238  READ FPDMA QUEUED
   2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:54.238  READ LOG EXT
   60 01 00 00 f8 00 01 c7 35 16 b0 40 08 39d+05:10:52.137  READ FPDMA QUEUED

Error 27 [2] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 08 00 01 c7 35 16 b0 40 08 39d+05:10:54.238  READ FPDMA QUEUED
   2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:54.238  READ LOG EXT
   60 01 00 00 f8 00 01 c7 35 16 b0 40 08 39d+05:10:52.137  READ FPDMA QUEUED
   60 01 00 00 f0 00 01 c7 35 14 b0 40 08 39d+05:10:52.129  READ FPDMA QUEUED
   60 01 00 00 e8 00 01 c7 35 15 b0 40 08 39d+05:10:52.129  READ FPDMA QUEUED

Error 26 [1] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 f8 00 01 c7 35 16 b0 40 08 39d+05:10:52.137  READ FPDMA QUEUED
   60 01 00 00 f0 00 01 c7 35 14 b0 40 08 39d+05:10:52.129  READ FPDMA QUEUED
   60 01 00 00 e8 00 01 c7 35 15 b0 40 08 39d+05:10:52.129  READ FPDMA QUEUED
   61 00 10 00 e0 00 01 d1 bf fe 38 40 08 39d+05:10:52.129  WRITE FPDMA QUEUED
   61 00 10 00 d8 00 01 d1 bf fc 38 40 08 39d+05:10:52.043  WRITE FPDMA QUEUED

Error 25 [0] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 b8 00 01 c7 35 1f 28 40 08 39d+05:10:49.868  READ FPDMA QUEUED
   60 01 00 00 b0 00 01 c7 35 1e 28 40 08 39d+05:10:49.868  READ FPDMA QUEUED
   60 01 00 00 a8 00 01 c7 35 1d 28 40 08 39d+05:10:49.868  READ FPDMA QUEUED
   60 01 00 00 a0 00 01 c7 35 1c 28 40 08 39d+05:10:49.868  READ FPDMA QUEUED
   60 01 00 00 98 00 01 c7 35 1b 28 40 08 39d+05:10:49.868  READ FPDMA QUEUED

Error 24 [23] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 68 00 01 c7 35 1f 28 40 08 39d+05:10:47.745  READ FPDMA QUEUED
   60 01 00 00 60 00 01 c7 35 1e 28 40 08 39d+05:10:47.745  READ FPDMA QUEUED
   60 01 00 00 58 00 01 c7 35 1d 28 40 08 39d+05:10:47.745  READ FPDMA QUEUED
   60 01 00 00 50 00 01 c7 35 1c 28 40 08 39d+05:10:47.745  READ FPDMA QUEUED
   60 01 00 00 48 00 01 c7 35 1b 28 40 08 39d+05:10:47.745  READ FPDMA QUEUED

Error 23 [22] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER -- ST COUNT  LBA_48  LH LM LL DV DC
   -- -- -- == -- == == == -- -- -- -- --
   40 -- 51 00 00 00 01 c7 35 17 a0 40 00  Error: UNC at LBA = 0x1c73517a0 = 7637112736

   Commands leading to the command that caused the error were:
   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
   60 01 00 00 18 00 01 c7 35 1f 28 40 08 39d+05:10:45.648  READ FPDMA QUEUED
   60 01 00 00 10 00 01 c7 35 1e 28 40 08 39d+05:10:45.648  READ FPDMA QUEUED
   60 01 00 00 08 00 01 c7 35 1d 28 40 08 39d+05:10:45.648  READ FPDMA QUEUED
   60 01 00 00 00 00 01 c7 35 1c 28 40 08 39d+05:10:45.648  READ FPDMA QUEUED
   60 01 00 00 f8 00 01 c7 35 1b 28 40 08 39d+05:10:45.648  READ FPDMA QUEUED
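
Every logged error above reports the same failing address. As a quick cross-check, the LBA can be reassembled from the register bytes (LBA_48, LH, LM, LL, most significant byte first) and converted to decimal with plain shell arithmetic. This is only a sketch: the variable names are illustrative and not part of smartctl, and it assumes a shell with 64-bit arithmetic.

```sh
#!/bin/sh
# Register bytes as printed in the "registers were" line:
# LBA_48 = 00 01 c7, LH = 35, LM = 17, LL = a0 (most significant first).
bytes="00 01 c7 35 17 a0"

# Join the bytes into one hex constant, then let the shell convert it.
lba_hex="0x$(printf '%s' "$bytes" | tr -d ' ')"
lba_dec=$(($lba_hex))

echo "UNC at LBA = $lba_hex = $lba_dec"
```

The decimal value should agree with the number smartctl prints after "UNC at LBA".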

> I'm not sure whether 'conv=noerror,sync' has any effect in conjunction 
> with /dev/zero.
> 
> Caching should be suppressed with '*flag=direct'. Check first that the 
> physical sector is actually unreadable, for example:
> 
> # dd if=/dev/ada2 of=/dev/null bs=4096 count=1 skip=417763282 iflag=direct
> 
> If and only if this command reports a read error, try to overwrite the 
> physical sector:
> 
> # dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=417763282 oflag=direct
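
To see which part of the disk the two dd commands above touch, the block index can be converted back into a byte offset and a range of logical sectors. A minimal sketch, assuming 512-byte logical sectors (the sector size is an assumption, not something the quoted commands state); the variable names are illustrative.

```sh
#!/bin/sh
bs=4096          # block size passed to dd
blk=417763282    # value of skip= / seek= in the commands above
sector=512       # ASSUMPTION: logical sector size reported by the drive

# dd addresses the device in units of bs, so the byte offset is blk * bs.
byte_offset=$((blk * bs))

# One 4096-byte block spans bs / sector consecutive logical sectors.
first_lba=$((byte_offset / sector))
last_lba=$((first_lba + bs / sector - 1))

echo "byte offset: $byte_offset"
echo "LBAs: $first_lba..$last_lba"
```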

I read about this at https://datto.engineering/post/causing-zfs-corruption.
My goal is to mask the bad sector so that the system no longer uses it.
I am just curious to try this before I get a new hard drive. On a
freebsd-ufs partition I have already succeeded with the following steps:

# smartctl -l selftest /dev/ada0 | awk 'NR==7'
# 1  Extended offline    Completed: read failure       90%     36067         27292160
                                                                             ^^^^^^^^ (L)

# dumpfs /dev/ada0p2 | egrep '^bsize'
bsize   32768   shift   15      mask    0xffff8000
         ^^^^^ (B)

# fdisk -s /dev/ada0
/dev/ada0: 310101 cyl 16 hd 63 sec
Part        Start        Size Type Flags
    1:           1   312581807 0xee 0x00
                 ^ (S)

# gpart list ada0 | tail -n 5
1. Name: ada0
    Mediasize: 160041885696 (149G)
    Sectorsize: 512
                ^^^ (M)
    Mode: r2w2e3

Formula:
b = ((L - S) * M) / B
L = 27292160
S = 1
M = 512
B = 32768

b = ((27292160 - 1) * 512) / 32768
b = 426439.984375 ~ 426439 (int)
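
The arithmetic above can be reproduced with integer shell arithmetic, which truncates the division in the same way as the "(int)" step. The sketch below also prints which 512-byte sectors block b covers when the seek is applied to the whole-disk device, as in the dd command that follows. The names `first` and `last` are illustrative only.

```sh
#!/bin/sh
L=27292160   # LBA_of_first_error from the self-test log
S=1          # partition start sector taken from fdisk
M=512        # sector size from gpart
B=32768      # filesystem block size from dumpfs

# Shell arithmetic is integer-only, so the division truncates.
b=$(( (L - S) * M / B ))

# Sectors covered by block b when dd seeks on the whole-disk device.
first=$(( b * B / M ))
last=$(( first + B / M - 1 ))

echo "b = $b (sectors $first..$last)"
```

With these inputs the covered range ends one sector before L, so it may be worth double-checking whether S should be subtracted when dd targets /dev/ada0 rather than the partition.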

# sysctl kern.geom.debugflags=0x10
# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=426439
# smartctl -l selftest /dev/ada0 | awk 'NR==7'
# 1  Extended offline    Completed without error       00%     36073         -
                                                                             ^

Is it possible to use the same method on freebsd-zfs? Thanks.


-- 
Regards,
Budi Janto
