[smartmontools-support] Calculating smartctl output to recover a bad sector on FreeBSD ZFS
Budi Janto
budijanto at studiokaraoke.co.id
Sun Mar 14 01:04:09 CET 2021
On 3/14/21 12:42 AM, Christian Franke wrote:
> Did you try any of the suggested smartctl options (-l xerror -l defects) ?
# smartctl -l defects /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Pending Defects log (GP Log 0x0c) not supported
# smartctl -l xerror /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-STABLE amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 30 (device log contains only the most recent 24 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 30 [5] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 38 00 01 c7 35 16 b0 40 08 39d+05:11:02.405 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:11:02.404 READ LOG EXT
60 01 00 00 28 00 01 c7 35 16 b0 40 08 39d+05:11:00.273 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:11:00.273 READ LOG EXT
60 01 00 00 18 00 01 c7 35 16 b0 40 08 39d+05:10:58.176 READ FPDMA QUEUED
Error 29 [4] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 28 00 01 c7 35 16 b0 40 08 39d+05:11:00.273 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:11:00.273 READ LOG EXT
60 01 00 00 18 00 01 c7 35 16 b0 40 08 39d+05:10:58.176 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:58.175 READ LOG EXT
60 01 00 00 08 00 01 c7 35 16 b0 40 08 39d+05:10:54.238 READ FPDMA QUEUED
Error 28 [3] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 18 00 01 c7 35 16 b0 40 08 39d+05:10:58.176 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:58.175 READ LOG EXT
60 01 00 00 08 00 01 c7 35 16 b0 40 08 39d+05:10:54.238 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:54.238 READ LOG EXT
60 01 00 00 f8 00 01 c7 35 16 b0 40 08 39d+05:10:52.137 READ FPDMA QUEUED
Error 27 [2] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 08 00 01 c7 35 16 b0 40 08 39d+05:10:54.238 READ FPDMA QUEUED
2f 00 00 00 01 00 00 00 00 00 10 40 08 39d+05:10:54.238 READ LOG EXT
60 01 00 00 f8 00 01 c7 35 16 b0 40 08 39d+05:10:52.137 READ FPDMA QUEUED
60 01 00 00 f0 00 01 c7 35 14 b0 40 08 39d+05:10:52.129 READ FPDMA QUEUED
60 01 00 00 e8 00 01 c7 35 15 b0 40 08 39d+05:10:52.129 READ FPDMA QUEUED
Error 26 [1] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 f8 00 01 c7 35 16 b0 40 08 39d+05:10:52.137 READ FPDMA QUEUED
60 01 00 00 f0 00 01 c7 35 14 b0 40 08 39d+05:10:52.129 READ FPDMA QUEUED
60 01 00 00 e8 00 01 c7 35 15 b0 40 08 39d+05:10:52.129 READ FPDMA QUEUED
61 00 10 00 e0 00 01 d1 bf fe 38 40 08 39d+05:10:52.129 WRITE FPDMA QUEUED
61 00 10 00 d8 00 01 d1 bf fc 38 40 08 39d+05:10:52.043 WRITE FPDMA QUEUED
Error 25 [0] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 b8 00 01 c7 35 1f 28 40 08 39d+05:10:49.868 READ FPDMA QUEUED
60 01 00 00 b0 00 01 c7 35 1e 28 40 08 39d+05:10:49.868 READ FPDMA QUEUED
60 01 00 00 a8 00 01 c7 35 1d 28 40 08 39d+05:10:49.868 READ FPDMA QUEUED
60 01 00 00 a0 00 01 c7 35 1c 28 40 08 39d+05:10:49.868 READ FPDMA QUEUED
60 01 00 00 98 00 01 c7 35 1b 28 40 08 39d+05:10:49.868 READ FPDMA QUEUED
Error 24 [23] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 68 00 01 c7 35 1f 28 40 08 39d+05:10:47.745 READ FPDMA QUEUED
60 01 00 00 60 00 01 c7 35 1e 28 40 08 39d+05:10:47.745 READ FPDMA QUEUED
60 01 00 00 58 00 01 c7 35 1d 28 40 08 39d+05:10:47.745 READ FPDMA QUEUED
60 01 00 00 50 00 01 c7 35 1c 28 40 08 39d+05:10:47.745 READ FPDMA QUEUED
60 01 00 00 48 00 01 c7 35 1b 28 40 08 39d+05:10:47.745 READ FPDMA QUEUED
Error 23 [22] occurred at disk power-on lifetime: 28340 hours (1180 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 01 c7 35 17 a0 40 00 Error: UNC at LBA = 0x1c73517a0 = 7637112736
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 01 00 00 18 00 01 c7 35 1f 28 40 08 39d+05:10:45.648 READ FPDMA QUEUED
60 01 00 00 10 00 01 c7 35 1e 28 40 08 39d+05:10:45.648 READ FPDMA QUEUED
60 01 00 00 08 00 01 c7 35 1d 28 40 08 39d+05:10:45.648 READ FPDMA QUEUED
60 01 00 00 00 00 01 c7 35 1c 28 40 08 39d+05:10:45.648 READ FPDMA QUEUED
60 01 00 00 f8 00 01 c7 35 1b 28 40 08 39d+05:10:45.648 READ FPDMA QUEUED
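Every entry in the log above reports the same LBA, 0x1c73517a0 = 7637112736. As a rough aid, here is a minimal sketch of turning such an LBA into a block offset for dd. It assumes the LBA counts 512-byte logical sectors and that a 4096-byte dd block size is wanted; neither is confirmed by the output shown here, and the helper name lba_to_block is made up for illustration.

```sh
#!/bin/sh
# Convert an LBA reported by smartctl into a dd block offset.
# Assumption: the LBA counts logical sectors of 512 bytes unless
# a different size is passed as the second argument.
lba_to_block() {
    lba=$1            # LBA from the "Error: UNC at LBA = ..." line
    lsize=${2:-512}   # logical sector size in bytes (assumed default)
    bs=${3:-4096}     # dd block size in bytes (assumed default)
    # Integer division: the result is the block that contains the LBA.
    echo $(( lba * lsize / bs ))
}

# LBA 0x1c73517a0 from the log, given in decimal
lba_to_block 7637112736
```

If the sector-size assumption holds for this drive, the printed value could serve as the skip= or seek= argument when dd is run with bs=4096.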
> I'm not sure whether 'conv=noerror,sync' has any effect in conjunction
> with /dev/zero.
>
> Caching should be suppressed with '*flag=direct'. Check first that the
> physical sector is actually unreadable, for example:
>
> # dd if=/dev/ada2 of=/dev/null bs=4096 count=1 skip=417763282 iflag=direct
>
> If and only if this command reports a read error, try to overwrite the
> physical sector:
>
> # dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=417763282 oflag=direct
I read this article: https://datto.engineering/post/causing-zfs-corruption.
My goal is to cover the bad sector so that the system no longer uses
it. I am just curious to try this before I replace the hard drive. On a
freebsd-ufs partition I have already succeeded with the following steps:
# smartctl -l selftest /dev/ada0 | awk 'NR==7'
# 1 Extended offline Completed: read failure 90% 36067 27292160
                                                       ^^^^^^^^ (L)
# dumpfs /dev/ada0p2 | egrep '^bsize'
bsize 32768 shift 15 mask 0xffff8000
      ^^^^^ (B)
# fdisk -s /dev/ada0
/dev/ada0: 310101 cyl 16 hd 63 sec
Part Start Size Type Flags
1: 1 312581807 0xee 0x00
   ^ (S)
# gpart list ada0 | tail -n 5
1. Name: ada0
Mediasize: 160041885696 (149G)
Sectorsize: 512
            ^^^ (M)
Mode: r2w2e3
Formula:
b = ((L - S) * M) / B
L = 27292160
S = 1
M = 512
B = 32768
b = ((27292160 - 1) * 512) / 32768
b = 426439.984375 ~ 426439 (int)
# sysctl kern.geom.debugflags=0x10
# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=426439
# smartctl -l selftest /dev/ada0 | awk 'NR==7'
# 1 Extended offline Completed without error 00% 36073 -
                                                       ^
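The block calculation in the formula above can be sketched in sh as follows. The function names fs_block and block_first_lba are hypothetical, and integer division stands in for the (int) truncation. The second helper is an addition, not part of the original steps: it reports the first LBA that a given dd block covers when counted from the start of the whole-disk device, as a sanity check before writing.

```sh
#!/bin/sh
# b = ((L - S) * M) / B, truncated to an integer.
# $1 = L (failing LBA), $2 = S (start), $3 = M (sector size), $4 = B (block size)
fs_block() {
    echo $(( ($1 - $2) * $3 / $4 ))
}

# First LBA covered by dd block $1, for sector size $2 and block size $3,
# counted from the start of the device that dd writes to.
block_first_lba() {
    echo $(( $1 * $3 / $2 ))
}

b=$(fs_block 27292160 1 512 32768)
first=$(block_first_lba "$b" 512 32768)
last=$(( first + 32768 / 512 - 1 ))
echo "block $b covers LBA $first to $last"
```

With the values from this message the sketch prints block 426439, spanning LBA 27292096 to 27292159. The LBA reported by the self-test, 27292160, is the first sector of the following block, so comparing L against the printed range before writing seems worthwhile.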
Is it possible in freebsd-zfs to use the same method? Thanks.
--
Regards,
Budi Janto