[smartmontools-support] Extended test doesn't catch CurrentPendingSector and OfflineUncorrectableSector errors
Dipanjan Das
mail.dipanjan.das at gmail.com
Mon Jul 12 17:43:56 CEST 2021
Hi Christian,
Thanks for your detailed response.
On Mon, 12 Jul 2021 at 04:14, Christian Franke <Christian.Franke at t-online.de> wrote:
>
> This log matches the ddrescue result. Read of (at least) LBA 3884004178
> failed occasionally and worked again in later tests. Unfortunately the
> disk firmware did not redirect this weak sector.
>
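For reference, here is a small Python sketch that turns the LBA Christian mentions into a byte offset on the device and checks where it falls on the 2 TB disk. The constants are copied from the smartctl report attached below. The helper name `lba_to_offset` is hypothetical.

```python
# Constants copied from the smartctl report attached to this message.
SECTOR_SIZE = 512                  # "Sector Size: 512 bytes logical/physical"
USER_CAPACITY = 2_000_398_934_016  # "User Capacity: 2,000,398,934,016 bytes"
SUSPECT_LBA = 3_884_004_178        # LBA_of_first_error in the self-test log


def lba_to_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of the first byte of sector `lba` (hypothetical helper)."""
    return lba * sector_size


total_sectors = USER_CAPACITY // SECTOR_SIZE
offset = lba_to_offset(SUSPECT_LBA)

assert 0 <= SUSPECT_LBA < total_sectors, "LBA lies outside the disk"
print(f"total sectors: {total_sectors}")
print(f"byte offset:   {offset}")
print(f"position:      {100 * SUSPECT_LBA / total_sectors:.2f}% into the disk")
```

With 512-byte sectors, the disk has 3,907,029,168 sectors. LBA 3884004178 starts at byte 1,988,610,139,136, which is about 99.4% of the way into the disk.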
(197) Current_Pending_Sector has been 1 for the last few weeks. Could that
be the sector whose read fails intermittently? I have run several extended
tests since this error appeared, hoping that the sector would get remapped
and (5) Reallocated_Sector_Ct would go up. Unfortunately, that didn't
happen. I am not sure whether that is due to the intermittent nature of
the failure.
In the SMART attributes section, (198) Offline_Uncorrectable has been 6
for the last few weeks as well. I am also curious what that refers to.
> This isn't a full '-x' output. "SMART Extended Comprehensive Error Log"
> and other items are missing.
>
> Please redirect smartctl output to a file and post it as a plaintext
> attachment. Add '-q noserial' if desired.
>
Attached. I intentionally did not pass '-q noserial', in case the serial
number helps you in any way.
--
Thanks & Regards,
Dipanjan
-------------- next part --------------
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-143-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4-GP
Device Model:     WDC WD2002FYPS-01U1B0
Serial Number:    WD-WCAVY0562214
LU WWN Device Id: 5 0014ee 2031a9933
Firmware Version: 04.ZZG04
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Jul 12 08:25:45 2021 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 128
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
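The two status bytes above can be decoded by hand. This is a minimal sketch that assumes the ATA8-ACS layout as I understand it. For the offline status, bit 7 means automatic offline collection is enabled and bits 0-6 give the activity state. For the self-test status, the upper nibble is the result code and the lower nibble is the remaining work in 10% steps. The table and function names are hypothetical.

```python
# Hypothetical decoder for the two status bytes in "General SMART Values".
# Code tables follow the ATA8-ACS definitions as I understand them.
OFFLINE_STATES = {
    0x00: "never started",
    0x02: "completed without error",
    0x03: "in progress",
    0x04: "suspended by an interrupting command from host",
    0x05: "aborted by an interrupting command from host",
    0x06: "aborted by the device with a fatal error",
}

SELF_TEST_RESULTS = {
    0: "completed without error",
    1: "aborted by host",
    2: "interrupted by host reset",
    3: "fatal or unknown error",
    4: "completed, unknown test element failed",
    5: "electrical element failed",
    6: "servo/seek element failed",
    7: "read element failed",
    8: "handling damage suspected",
    15: "in progress",
}


def decode_offline_status(value: int) -> tuple[bool, str]:
    """Bit 7: automatic offline collection enabled; bits 0-6: activity state."""
    auto_enabled = bool(value & 0x80)
    state = OFFLINE_STATES.get(value & 0x7F, "reserved or vendor specific")
    return auto_enabled, state


def decode_self_test_status(value: int) -> tuple[str, int]:
    """Upper nibble: result code; lower nibble: remaining work in 10% steps."""
    result = SELF_TEST_RESULTS.get(value >> 4, "reserved")
    return result, (value & 0x0F) * 10


print(decode_offline_status(0x84))
print(decode_self_test_status(121))  # smartctl prints this byte in decimal
```

For 0x84, the sketch reports that automatic collection is enabled and the scan is suspended. For 121 (0x79), it reports a read-element failure with 90% of the test remaining, which matches entry #1 of the self-test log below.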
Total time to complete Offline
data collection:                (42000) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
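The (0x7b) capability byte is a bit mask. This sketch assumes the bit positions implied by smartctl's own text for this value; the helper `decode_capabilities` is hypothetical. Bit 2 is clear, which means the drive suspends offline data collection, rather than aborting it, when a host command arrives. That is consistent with the "suspended by an interrupting command from host" status reported above.

```python
# Bit masks inferred from the lines smartctl prints for (0x7b); assumed ATA layout.
CAPABILITY_BITS = [
    (0x01, "SMART execute Offline immediate"),
    (0x02, "Auto Offline data collection on/off support"),
    (0x08, "Offline surface scan supported"),
    (0x10, "Self-test supported"),
    (0x20, "Conveyance Self-test supported"),
    (0x40, "Selective Self-test supported"),
]


def decode_capabilities(value: int) -> list[str]:
    """List the capabilities encoded in the offline data collection byte."""
    features = [name for mask, name in CAPABILITY_BITS if value & mask]
    # Bit 2 selects what a host command does to a running offline scan.
    if value & 0x04:
        features.append("Abort Offline collection upon new command")
    else:
        features.append("Suspend Offline collection upon new command")
    return features


for feature in decode_capabilities(0x7B):
    print(feature)
```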
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 478) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    4
  3 Spin_Up_Time            POS--K   152   152   021    -    9375
  4 Start_Stop_Count        -O--CK   100   100   000    -    49
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   026   026   000    -    54399
 10 Spin_Retry_Count        -O--CK   100   253   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    49
192 Power-Off_Retract_Count -O--CK   200   200   000    -    38
193 Load_Cycle_Count        -O--CK   200   200   000    -    10
194 Temperature_Celsius     -O---K   123   100   000    -    29
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    1    <=== Which sector/LBA does it refer to?
198 Offline_Uncorrectable   ----CK   200   200   000    -    6    <=== Which sectors/LBAs does it refer to?
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    7
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
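smartctl's FLAGS column is position-coded, as the legend above shows. A minimal decoder (the helper `decode_flags` is hypothetical) makes the difference between attributes 197 and 198 visible. Attribute 198 lacks the O flag, so it is not updated during normal online operation. smartctl's older table format labels such attributes "Offline", meaning their value changes only when offline data collection runs.

```python
# Legend printed by smartctl below the attribute table.
FLAG_LEGEND = {
    "P": "prefailure warning",
    "O": "updated online",
    "S": "speed/performance",
    "R": "error rate",
    "C": "event count",
    "K": "auto-keep",
}


def decode_flags(flags: str) -> list[str]:
    """Translate a FLAGS string such as 'POSR-K'; '-' marks an unset flag."""
    return [FLAG_LEGEND[ch] for ch in flags if ch != "-"]


for attr_id, flags in (("197", "-O--CK"), ("198", "----CK")):
    print(attr_id, decode_flags(flags))
```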
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb5  GPL,SL  VS       1  Device vendor specific log
0xb6       GPL     VS       1  Device vendor specific log
0xb7       GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      24  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 2
CR     = Command Register
FEATR  = Features Register
COUNT  = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
LH     = LBA High (was: Cylinder High) Register    ]   LBA
LM     = LBA Mid (was: Cylinder Low) Register      ] Register
LL     = LBA Low (was: Sector Number) Register     ]
DV     = Device (was: Device/Head) Register
DC     = Device Control Register
ER     = Error register
ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
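A 32-bit millisecond counter would wrap after 49.710 days (2^32 ms ≈ 49.710 days), which matches the figure above. That counter width is my assumption; the log does not state it. A small parser for the DDd+hh:mm:SS.sss format (the helper `parse_powered_up_time` is hypothetical) makes the timestamps below easy to compare:

```python
import re

WRAP_MS = 2**32  # assumption: Powered_Up_Time is a 32-bit millisecond counter


def parse_powered_up_time(text: str) -> int:
    """Convert smartctl's 'DDd+hh:mm:SS.sss' notation to milliseconds."""
    m = re.fullmatch(r"(\d+)d\+(\d{2}):(\d{2}):(\d{2})\.(\d{3})", text)
    if m is None:
        raise ValueError(f"unexpected timestamp: {text!r}")
    days, hours, minutes, seconds, millis = map(int, m.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


print(f"wrap period: {WRAP_MS / 86_400_000:.3f} days")
newest = parse_powered_up_time("9d+13:34:05.461")
oldest = parse_powered_up_time("9d+13:34:05.455")
print(f"Error 2 command history spans {newest - oldest} ms")
```

The five queued reads listed for Error 2 were issued within 6 ms of each other.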
Error 2 [1] occurred at disk power-on lifetime: 54389 hours (2266 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 80 00 e7 33 00 81 52 40 00  Error: UNC at LBA = 0xe733008152 = 992993116498
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC   Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ----------------  --------------------
  60 00 80 00 20 00 00 e7 81 33 00 40 08   9d+13:34:05.461  READ FPDMA QUEUED
  60 00 80 00 18 00 00 e7 81 32 80 40 08   9d+13:34:05.460  READ FPDMA QUEUED
  60 00 80 00 10 00 00 e7 81 32 00 40 08   9d+13:34:05.459  READ FPDMA QUEUED
  60 00 80 00 08 00 00 e7 81 31 80 40 08   9d+13:34:05.456  READ FPDMA QUEUED
  60 00 80 00 00 00 00 e7 81 31 00 40 08   9d+13:34:05.455  READ FPDMA QUEUED
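The command rows can be checked against the self-test log. The sketch below splits a row using the column layout in the header; the helper `parse_command_row` is hypothetical. It also assumes the ATA8-ACS convention for READ FPDMA QUEUED as I understand it: the transfer length sits in FEATURES and COUNT bits 7:3 hold the NCQ tag. Under that assumption, it computes the sector range of the first command listed for Error 2:

```python
def parse_command_row(row: str) -> dict[str, int]:
    """Split the 13 register bytes of a command row into named fields.

    Column layout from the header: CR, FEATR (2 bytes), COUNT (2 bytes),
    LBA_48 (bits 47:24), LH, LM, LL (bits 23:0), DV, DC.
    """
    b = [int(x, 16) for x in row.split()[:13]]
    return {
        "command": b[0],
        "features": (b[1] << 8) | b[2],
        "count": (b[3] << 8) | b[4],
        "lba": int.from_bytes(bytes(b[5:11]), "big"),
    }


FAILING_LBA = 3_884_004_178  # LBA_of_first_error from the self-test log

cmd = parse_command_row("60 00 80 00 20 00 00 e7 81 33 00 40 08")
# Assumption: for READ FPDMA QUEUED (0x60) FEATURES holds the sector count
# and COUNT bits 7:3 hold the NCQ tag.
n_sectors = cmd["features"]
ncq_tag = (cmd["count"] & 0xFF) >> 3
first, last = cmd["lba"], cmd["lba"] + n_sectors - 1

print(f"LBA {first}..{last} ({n_sectors} sectors, NCQ tag {ncq_tag})")
print("covers failing LBA:", first <= FAILING_LBA <= last)
```

That read covers LBAs 3884004096 to 3884004223, which includes 3884004178. The first command listed under Error 1 (LBA 0xE7813350, 8 sectors) covers it as well.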
Error 1 [0] occurred at disk power-on lifetime: 47851 hours (1993 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 08 00 e7 33 00 81 52 40 00  Error: UNC at LBA = 0xe733008152 = 992993116498
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC   Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ----------------  --------------------
  60 00 08 00 f0 00 00 e7 81 33 50 40 08  19d+15:28:08.735  READ FPDMA QUEUED
  60 00 08 00 e8 00 00 9c c1 2d 80 40 08  19d+15:28:08.723  READ FPDMA QUEUED
  60 00 08 00 e0 00 00 5f 01 65 08 40 08  19d+15:28:08.709  READ FPDMA QUEUED
  60 00 08 00 d8 00 00 e7 81 4c 60 40 08  19d+15:28:08.693  READ FPDMA QUEUED
  60 00 08 00 d0 00 00 5f 01 2a e0 40 08  19d+15:28:08.671  READ FPDMA QUEUED
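Both errors report UNC at LBA 0xe733008152 = 992993116498. That is far beyond the 3,907,029,168 sectors of this 2 TB disk, so the value cannot be taken literally. However, its bytes are the bytes of 3884004178 (0xE7813352) in a different order. The following sketch shows one explanation. It assumes the drive wrote the six LBA bytes in plain little-endian order, while smartctl decoded them with the interleaved ATA layout (7:0, 31:24, 15:8, 39:32, 23:16, 47:40). Both the layout and the variable names are my assumptions.

```python
# Six LBA bytes exactly as smartctl printed them for both errors:
# LBA_48 = 00 e7 33 (bits 47:24), LH LM LL = 00 81 52 (bits 23:0).
printed = {"47:40": 0x00, "39:32": 0xE7, "31:24": 0x33,
           "23:16": 0x00, "15:8": 0x81, "7:0": 0x52}
BIG_ENDIAN_ORDER = ("47:40", "39:32", "31:24", "23:16", "15:8", "7:0")
# Assumed ATA8-ACS order of the LBA bytes inside an extended error log entry.
INTERLEAVED_ORDER = ("7:0", "31:24", "15:8", "39:32", "23:16", "47:40")
TOTAL_SECTORS = 2_000_398_934_016 // 512

# What smartctl reported: the printed bytes read as one big-endian number.
reported_lba = int.from_bytes(bytes(printed[k] for k in BIG_ENDIAN_ORDER), "big")

# Rebuild the raw log bytes, then read them as plain little-endian instead.
raw = bytes(printed[k] for k in INTERLEAVED_ORDER)
reordered_lba = int.from_bytes(raw, "little")

print(f"reported:  {reported_lba:#x} = {reported_lba} (beyond {TOTAL_SECTORS} sectors)")
print(f"reordered: {reordered_lba:#x} = {reordered_lba}")
```

If this assumption holds, smartctl's `-F xerrorlba` firmware-bug option might print the corrected LBA. Its man page describes that option as fixing LBA byte ordering in the Ext. Comprehensive SMART error log.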
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Selective offline   Completed: read failure       90%     54367         3884004178
# 2  Selective offline   Completed without error       00%     54367         -
# 3  Selective offline   Completed: read failure       70%     54367         3884004178
# 4  Selective offline   Completed: read failure       90%     54367         3884004178
# 5  Extended offline    Aborted by host               70%     54367         -
# 6  Extended offline    Completed: read failure       90%     54268         3884004178
# 7  Extended offline    Completed without error       00%     54206         -
# 8  Extended offline    Completed without error       00%     54199         -
# 9  Extended offline    Completed: read failure       90%     54190         3884004178
#10  Short offline       Completed without error       00%     54189         -
#11  Extended offline    Completed without error       00%     53801         -
#12  Short captive       Completed without error       00%      1530         -
1 of 5 failed self-tests are outdated by newer successful extended offline self-test # 7
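A short parser can confirm that every read failure points at the same LBA and reproduce smartctl's "outdated" note. The parser is hypothetical and relies only on the column order of the log above. It assumes smartctl treats a failure as outdated when a newer extended test has completed without error.

```python
import re

# Rows copied from the "SMART Extended Self-test Log" above.
SELF_TEST_LOG = """\
# 1 Selective offline Completed: read failure 90% 54367 3884004178
# 2 Selective offline Completed without error 00% 54367 -
# 3 Selective offline Completed: read failure 70% 54367 3884004178
# 4 Selective offline Completed: read failure 90% 54367 3884004178
# 5 Extended offline Aborted by host 70% 54367 -
# 6 Extended offline Completed: read failure 90% 54268 3884004178
# 7 Extended offline Completed without error 00% 54206 -
# 8 Extended offline Completed without error 00% 54199 -
# 9 Extended offline Completed: read failure 90% 54190 3884004178
#10 Short offline Completed without error 00% 54189 -
#11 Extended offline Completed without error 00% 53801 -
#12 Short captive Completed without error 00% 1530 -
"""
# Num, description + status, remaining %, lifetime hours, LBA (or "-").
ROW = re.compile(r"#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")

rows = []
for line in SELF_TEST_LOG.splitlines():
    m = ROW.match(line)
    if m:
        num, text, _remaining, hours, lba = m.groups()
        rows.append({"num": int(num), "text": text, "hours": int(hours),
                     "lba": None if lba == "-" else int(lba)})

failed = [r for r in rows if "read failure" in r["text"]]
# Entry #1 is the most recent, so the smallest number is the newest test.
newest_ok_extended = min(
    (r for r in rows
     if r["text"].startswith("Extended") and "without error" in r["text"]),
    key=lambda r: r["num"],
)
outdated = [r["num"] for r in failed if r["num"] > newest_ok_extended["num"]]

print("failed tests:", [r["num"] for r in failed])
print("failing LBAs:", {r["lba"] for r in failed})
print(f"{len(outdated)} of {len(failed)} failures predate test #{newest_ok_extended['num']}")
```

The parser finds failures in tests 1, 3, 4, 6 and 9, all at LBA 3884004178. Only #9 predates the successful extended test #7.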
SMART Selective self-test log data structure revision number 1
 SPAN         MIN_LBA         MAX_LBA  CURRENT_TEST_STATUS
    1      3884004178      3884004178  Not_testing
    2               0               0  Not_testing
    3               0               0  Not_testing
    4               0               0  Not_testing
    5               0               0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version:                  2
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        SMART Off-line Data Collection executing in background (4)
Current Temperature:                 29 Celsius
Power Cycle Min/Max Temperature:     25/32 Celsius
Lifetime Min/Max Temperature:        29/52 Celsius
Under/Over Temperature Limit Count:  0/0
SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:     0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (174)
Index    Estimated Time   Temperature Celsius
 175    2021-07-12 00:28    29  **********
 ...    ..(476 skipped).    ..  **********
 174    2021-07-12 08:25    29  **********
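The temperature history is a ring buffer. The index in parentheses (174) is the newest slot, so the oldest sample sits in slot 175. This sketch recovers the estimated timestamps. It assumes a fixed 1-minute logging interval, as reported above, and that the slot after the newest one holds the oldest sample. The names are hypothetical.

```python
from datetime import datetime, timedelta

HISTORY_SIZE = 478                          # "Temperature History Size (Index): 478 (174)"
NEWEST_INDEX = 174                          # slot written most recently
INTERVAL = timedelta(minutes=1)             # "Temperature Logging Interval: 1 minute"
NEWEST_TIME = datetime(2021, 7, 12, 8, 25)  # estimated time printed for slot 174


def sample_time(index: int) -> datetime:
    """Estimated time of a slot in the circular temperature history."""
    age = (NEWEST_INDEX - index) % HISTORY_SIZE  # number of samples newer than this slot
    return NEWEST_TIME - age * INTERVAL


oldest_index = (NEWEST_INDEX + 1) % HISTORY_SIZE
print(oldest_index, sample_time(oldest_index))  # the first row of the table
print(NEWEST_INDEX, sample_time(NEWEST_INDEX))  # the last row of the table
```

Slot 175 comes out at 00:28, which matches the first row of the table. The 478 one-minute samples span 7 h 57 min from oldest to newest.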
SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)
Device Statistics (GP/SMART Log 0x04) not supported
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x000a  2           16  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x8000  4      5157178  Vendor specific