[smartmontools-support] Calculating smartctl output to recover bad sectors on FreeBSD ZFS
Budi Janto
budijanto at studiokaraoke.co.id
Thu Mar 11 13:44:14 CET 2021
Hi,
I am running FreeBSD 12.2-RELEASE with a ZFS file system, and my `dmesg`
output looks like this:
Mar 11 18:45:36 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41
(DRDY ERR), error: 40 (UNC )
Mar 11 18:45:36 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30
1f 90 00 b8 01 00 20 00
Mar 11 18:45:36 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Error 5,
Retries exhausted
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0):
READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA
Status Error
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41
(DRDY ERR), error: 40 (UNC )
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30
1f 90 00 b8 01 00 20 00
Mar 11 18:45:42 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying
command, 3 more tries remain
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0):
READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA
Status Error
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41
(DRDY ERR), error: 40 (UNC )
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30
1f 90 00 b8 01 00 20 00
Mar 11 18:45:45 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying
command, 2 more tries remain
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0):
READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA
Status Error
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41
(DRDY ERR), error: 40 (UNC )
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30
1f 90 00 b8 01 00 20 00
Mar 11 18:45:48 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying
command, 1 more tries remain
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0):
READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA
Status Error
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41
(DRDY ERR), error: 40 (UNC )
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30
1f 90 00 b8 01 00 20 00
Mar 11 18:45:50 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Retrying
command, 0 more tries remain
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0):
READ_FPDMA_QUEUED. ACB: 60 20 28 1f 90 40 b8 01 00 00 00 00
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): CAM status: ATA
Status Error
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): ATA status: 41
(DRDY ERR), error: 40 (UNC )
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): RES: 41 40 30
1f 90 00 b8 01 00 20 00
Mar 11 18:45:53 BEC-STG-P1 kernel: (ada2:ahcich5:0:0:0): Error 5,
Retries exhausted
Mar 11 18:45:55 BEC-STG-P1 ZFS[5827]: vdev I/O failure, zpool=$pool
path=$/dev/diskid/DISK-ZGY3WLSZp1 offset=$3784407121920 size=$16384 error=$5
Mar 11 18:45:55 BEC-STG-P1 ZFS[5828]: pool I/O failure, zpool=$pool error=$5
Mar 11 18:45:57 BEC-STG-P1 ZFS[5829]: vdev state changed,
pool_guid=$3628960460546579489 vdev_guid=$5117958107908570560
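The sector address can probably be read straight out of the kernel messages above. Below is a minimal sketch in Python, assuming the ACB and RES dumps use the field order given in the comments (the log itself does not label the bytes, so that layout is an assumption). The helper name `lba48` is made up for illustration.

```python
# Assumed byte order of the FreeBSD CAM dumps (not labelled in the log itself):
#   ACB: command features lba_low lba_mid lba_high device
#        lba_low_exp lba_mid_exp lba_high_exp features_exp count count_exp
#   RES: status error lba_low lba_mid lba_high device
#        lba_low_exp lba_mid_exp lba_high_exp count count_exp

SECTOR = 512  # logical sector size reported by smartctl -i


def lba48(dump: str) -> int:
    """Rebuild a 48-bit LBA from a hex dump, under the layout assumed above."""
    b = [int(x, 16) for x in dump.split()]
    low = b[2] | (b[3] << 8) | (b[4] << 16)    # lba_low, lba_mid, lba_high
    high = b[6] | (b[7] << 8) | (b[8] << 16)   # the three *_exp bytes
    return low | (high << 24)


acb = "60 20 28 1f 90 40 b8 01 00 00 00 00"  # copied from the dmesg output
res = "41 40 30 1f 90 00 b8 01 00 20 00"     # copied from the dmesg output

start_lba = lba48(acb)              # first sector of the failed read
error_lba = lba48(res)              # sector the drive reported as unreadable
sectors = int(acb.split()[1], 16)   # queued reads carry the count in "features"

# Cross-check against the ZFS message: offset and size are in bytes and are
# relative to the start of the partition, not the start of the disk.
zfs_offset = 3784407121920
zfs_size = 16384
zfs_lba_in_partition = zfs_offset // SECTOR
partition_gap = start_lba - zfs_lba_in_partition

print("request start LBA :", start_lba)
print("reported error LBA:", error_lba)
print("request length    :", sectors * SECTOR, "bytes; ZFS size:", zfs_size)
print("LBA gap to ZFS    :", partition_gap, "sectors")
```

If the layout assumption holds, the request length matches the ZFS size, and the small constant gap would be consistent with where partition p1 starts on the disk (`gpart show` should confirm that).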
Here is the output of the `smartctl` utility:
root@BEC-STG-P1:~ # smartctl -i /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate IronWolf
Device Model: ST4000VN008-2DR166
Serial Number: ZGY3WLSZ
LU WWN Device Id: 5 000c50 0b4a12246
Firmware Version: SC60
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5980 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Mar 11 19:17:43 2021 WIB
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
root@BEC-STG-P1:~ # smartctl -A /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   048   044    Pre-fail  Always       -       133286904
  3 Spin_Up_Time            0x0003   094   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       46
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   090   060   045    Pre-fail  Always       -       905479472
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14422 (51 39 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       47
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       2330
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   055   040    Old_age   Always       -       32 (Min/Max 24/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   097   097   000    Old_age   Always       -       6190
194 Temperature_Celsius     0x0022   032   045   000    Old_age   Always       -       32 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10538 (198 46 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       18307534011
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       496531125795
root@BEC-STG-P1:~ # smartctl -l error /dev/ada2
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 2329 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2329 occurred at disk power-on lifetime: 14422 hours (600 days +
22 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 ff ff ff 4f 00 1d+04:32:43.920 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:43.920 READ FPDMA QUEUED
2f 00 01 10 00 00 00 00 1d+04:32:43.882 READ LOG EXT
60 00 30 ff ff ff 4f 00 1d+04:32:41.159 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:41.159 READ FPDMA QUEUED
Error 2328 occurred at disk power-on lifetime: 14422 hours (600 days +
22 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 ff ff ff 4f 00 1d+04:32:41.159 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:41.159 READ FPDMA QUEUED
2f 00 01 10 00 00 00 00 1d+04:32:41.112 READ LOG EXT
60 00 30 ff ff ff 4f 00 1d+04:32:38.381 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:38.381 READ FPDMA QUEUED
Error 2327 occurred at disk power-on lifetime: 14422 hours (600 days +
22 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 ff ff ff 4f 00 1d+04:32:38.381 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:38.381 READ FPDMA QUEUED
2f 00 01 10 00 00 00 00 1d+04:32:38.343 READ LOG EXT
60 00 30 ff ff ff 4f 00 1d+04:32:35.600 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:35.600 READ FPDMA QUEUED
Error 2326 occurred at disk power-on lifetime: 14422 hours (600 days +
22 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 ff ff ff 4f 00 1d+04:32:35.600 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:35.600 READ FPDMA QUEUED
2f 00 01 10 00 00 00 00 1d+04:32:35.553 READ LOG EXT
60 00 30 ff ff ff 4f 00 1d+04:32:32.843 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 1d+04:32:32.831 READ FPDMA QUEUED
Error 2325 occurred at disk power-on lifetime: 14422 hours (600 days +
22 hours)
When the command that caused the error occurred, the device was
active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 30 ff ff ff 4f 00 1d+04:32:32.843 READ FPDMA QUEUED
60 00 30 ff ff ff 4f 00 1d+04:32:32.831 READ FPDMA QUEUED
60 00 60 ff ff ff 4f 00 1d+04:32:32.816 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 1d+04:32:32.811 READ FPDMA QUEUED
60 00 28 ff ff ff 4f 00 1d+04:32:32.803 READ FPDMA QUEUED
My question is: how do I use the dd utility to write to the bad sectors,
so that the system no longer uses those sector addresses? I did
something like this:
root@BEC-STG-P1:~ # sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
root@BEC-STG-P1:~ # dd if=/dev/zero of=/dev/ada0 bs=128k count=1 seek=268435455
dd: /dev/ada0: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.000089 secs (0 bytes/sec)
The bs value comes from the following output (that attempt failed, so I
changed it to bs=512):
root@BEC-STG-P1:~ # zfs get all | grep recordsize
pool recordsize 128K default
^^^^
And the seek address comes from this output:
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
^^^^^^^^^
Is this calculation correct?
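One way to sanity-check the arithmetic is a rough sketch like the one below. It relies on the fact that dd counts `seek` in blocks of `bs` bytes, so the byte offset actually written is `seek * bs`. The capacity figure is copied from the `smartctl -i` output. The helper name `dd_byte_offset` is made up for illustration. Note also that 0x0fffffff is the largest value a 28-bit LBA field can hold, so it may be a placeholder rather than the real failing sector. That is an assumption here; `smartctl -l xerror` might show the full 48-bit address.

```python
CAPACITY = 4_000_787_030_016   # bytes, "User Capacity" from smartctl -i
SEEK = 0x0FFFFFFF              # LBA printed in the SMART error log (268435455)


def dd_byte_offset(seek_blocks: int, bs: int) -> int:
    """Byte offset at which dd starts writing for a given seek= and bs=."""
    return seek_blocks * bs


off_128k = dd_byte_offset(SEEK, 128 * 1024)  # bs=128k, as in the first attempt
off_512 = dd_byte_offset(SEEK, 512)          # bs=512, one logical sector per block

for label, off in (("bs=128k", off_128k), ("bs=512", off_512)):
    where = "past the end of" if off >= CAPACITY else "inside"
    print(f"{label}: byte offset {off} is {where} the disk")

# The largest value a 28-bit LBA register can express:
print("28-bit LBA maximum:", (1 << 28) - 1)
```

With bs=128k the computed offset lies beyond the capacity of the disk. With bs=512 the offset is on the disk, but only at the sector the 28-bit field happens to name.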
--
Regards,
Budi Janto