[smartmontools-support] Should I worry?
Nathan Stratton Treadway
nathanst at ontko.com
Sat Nov 23 20:36:59 CET 2019
On Sat, Nov 23, 2019 at 13:36:46 +0100, Jørn Dahl-Stamnes wrote:
> Hello,
>
> got a disk that have unreadable sectors. From logwatch I get this message:
>
> Currently unreadable (pending) sectors detected:
> /dev/sda [SAT] - 96 Time(s)
> 2 unreadable sectors detected
>
> Is this a sign that this disk are about to die?
Well, the precise meaning of this warning is that the drive found 2
sectors that it was unable to read successfully.
This is certainly a sign that something "not good" has happened on that
drive, and so some people/sites have a policy of simply replacing the
drive as soon as any such errors happen, figuring "better safe than
sorry".
However, it's also possible for the sectors to be unreadable due to
one-time problems, and for the drive to continue to work fine for a
long time once you resolve these particular errors.
So basically you have to weigh the hassle/cost of replacing the drive
now against the danger of it failing suddenly if you don't, based both
on how the drive is being used and what the data SMART is telling you.
Unfortunately you can't predict with any certainty just from the current
situation what will happen to the drive in the future, so given the
existence of the errors it's definitely wise to make sure your backups
are happening regularly, and probably a good idea to have a replacement
disk on hand in case this one starts to fail more drastically.
I've seen many disks with this sort of error work fine for years after
fixing the errors... but also a few which just had more and more bad
sectors over a period of a few weeks after the first errors showed up
(at which point we proceeded to replace them).
>
> $ smartctl -a /dev/sda:
>
[...]
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
[...]
> 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
> 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 2
> 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
[...]
In this case the Reallocated_Sector_Ct is still zero and the
Current_Pending_Sector count is only two, so it still seems plausible
that the errors are "self contained" rather than a sign of a broader
failure. If you see Reallocated_Sector_Ct climbing over time, or are
unable to clear the Current_Pending_Sector count back to zero, then I'd
start to be convinced there was a more general failure situation.
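To watch those two counters over time, one option is to pull the raw values out of the attribute table and log them. The following is only a minimal sketch: it assumes the ten-column layout shown in the quoted output (raw value in the last column), and the function name parse_attributes is made up for illustration. In practice the text would come from the smartctl output rather than from an embedded sample.

```python
# Attributes singled out above as the ones to keep an eye on.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}

def parse_attributes(text):
    """Return {attribute_name: raw_value} for rows of a SMART attribute table.

    Assumes the column layout of the quoted output:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    values = {}
    for line in text.splitlines():
        # Tolerate email quoting ("> ") and leading spaces.
        fields = line.lstrip("> ").split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values

# Sample rows copied from the quoted smartctl output.
sample = """\
> 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
> 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
> 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 2
> 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""

attrs = parse_attributes(sample)
watched = {name: attrs[name] for name in sorted(WATCHED) if name in attrs}
print(watched)
```

Recording these values with a timestamp (for example once a day) makes it easy to see whether Reallocated_Sector_Ct is climbing or whether Current_Pending_Sector has dropped back to zero.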
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed: read failure 90% 24838 1857598668
> # 2 Short offline Completed: read failure 90% 24836 1857598666
> # 3 Short offline Completed: read failure 90% 24812 1857598664
> # 4 Short offline Completed: read failure 90% 24789 1857598664
> # 5 Short offline Completed: read failure 90% 24765 1857598664
> # 6 Short offline Completed: read failure 90% 24741 1857598666
> # 7 Short offline Completed: read failure 90% 24717 1857598666
> # 8 Short offline Completed: read failure 90% 24693 1857598664
> # 9 Extended offline Completed: read failure 90% 24671 1857598665
> #10 Short offline Completed: read failure 90% 24669 1857598665
> #11 Short offline Completed: read failure 90% 24645 1857598664
> #12 Short offline Completed without error 00% 24621 -
The interesting thing here is that you are getting consistent read
failures within a range of just a few sectors. If you can identify what
data is stored on those sectors, it should be fairly straightforward to
rewrite those particular ones. Keep in mind that this drive has 4 KiB
physical sectors, so you have to rewrite 8 logical sectors at once in
order to rewrite one physical one. Rewriting them will either clear the
Current_Pending_Sector count or flush out further
errors.
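The "8 logical sectors at once" arithmetic can be sketched as follows. This assumes 512-byte logical sectors on 4 KiB physical sectors and that physical sectors are aligned to LBA 0 (no alignment offset on the drive). The helper name physical_span is made up for illustration, and the LBAs are the ones from the self-test log above.

```python
LOGICAL_PER_PHYSICAL = 8  # 4096-byte physical / 512-byte logical (assumed)

def physical_span(lba, per_phys=LOGICAL_PER_PHYSICAL):
    """Return (first, last) logical LBA of the physical sector containing lba."""
    first = lba - (lba % per_phys)
    return first, first + per_phys - 1

# Distinct LBA_of_first_error values from the self-test log.
failing = [1857598664, 1857598665, 1857598666, 1857598668]

# Collapse the failing LBAs into the set of physical sectors they touch.
spans = sorted({physical_span(lba) for lba in failing})
for first, last in spans:
    print(f"rewrite logical sectors {first}..{last}")
# prints: rewrite logical sectors 1857598664..1857598671
```

Under those assumptions all four reported LBAs fall inside a single 8-sector block, which is the range that would need to be rewritten as a unit.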
How difficult it will be to identify the data in question will depend
on how your disk is organized, but one place to start would be:
https://www.smartmontools.org/wiki/BadBlockHowto
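For orientation, the first step in tracking down the affected data is usually to translate the drive LBA into a block number within the filesystem. The sketch below shows only that arithmetic, assuming 512-byte logical sectors. The partition start sector and filesystem block size are made-up values, so substitute the real ones from your partition table and filesystem.

```python
def lba_to_fs_block(lba, part_start, fs_block_size, sector_size=512):
    """Map a drive LBA to a filesystem block number within its partition.

    part_start is the partition's first sector (in logical sectors);
    fs_block_size is the filesystem block size in bytes.
    """
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - part_start) * sector_size // fs_block_size

PART_START = 2048   # hypothetical: partition begins at sector 2048
FS_BLOCK = 4096     # hypothetical: 4 KiB filesystem blocks

block = lba_to_fs_block(1857598664, PART_START, FS_BLOCK)
print(block)
```

Which tool then maps that block number to a file depends on the filesystem in use.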
Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - nathanst at ontko.com - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239