[smartmontools-support] I can't complete the long test
Carlos E. R.
robin.listas at telefonica.net
Fri May 7 12:59:48 CEST 2021
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Friday, 2021-04-23 at 21:31 +0200, Carlos E. R. wrote:
> On Friday, 2021-04-23 at 12:51 +0200, Carlos E. R. wrote:
>> On 23/04/2021 12.14, Claudio Kuenzler wrote:
...
> At this moment, I'm guessing that smartd interprets the disk as sleeping and
> aborts the test, so I'm going to kill smartd.
...
> Starting the test (two identical disks)
...
> Hopefully, I'll report success tomorrow.
...
I made a mistake and had to change some things and restart.
So, I stopped the daemon and had a script write a text file every three
minutes to the two identical disks being tested, then force a sync of
both. With that method, the long test succeeded.
At this moment, smartd is testing the disks automatically. I have the
script keeping one of them busy, and will see what happens.
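The keep-alive method described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the file name `keepalive.txt`, the function names, and the use of `os.fsync` on the file (in place of whatever sync command the original script ran) are all assumptions.

```python
import os
import time
from datetime import datetime


def touch_keepalive(directory):
    """Write a timestamp to a small file in `directory` and force it to disk."""
    path = os.path.join(directory, "keepalive.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write(datetime.now().isoformat() + "\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to write the file out to the device
    return path


def keep_awake(directories, interval_s=180, rounds=1):
    """Touch every directory once per round, sleeping between rounds.

    interval_s=180 mirrors the three-minute period mentioned in the post.
    """
    for i in range(rounds):
        for d in directories:
            touch_keepalive(d)
        if i < rounds - 1:
            time.sleep(interval_s)
```

Calling `keep_awake(["/path/to/mount1", "/path/to/mount2"], rounds=480)` would cover roughly a day; the mount paths are placeholders for wherever the two disks are mounted.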
[... somewhat later ...]
<3.6> 2021-04-24T22:19:42.271930+02:00 Isengard smartd 12721 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], old test of type L not run at Sat Apr 24 03:00:00 2021 CEST, starting now.
<3.6> 2021-04-24T22:19:42.334812+02:00 Isengard smartd 12721 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], starting scheduled Long Self-Test.
<3.6> 2021-04-24T22:49:36.820881+02:00 Isengard smartd 12721 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-24T22:59:30.325255+02:00 Isengard smartd 12721 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-24T23:00:01.337059+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], opened
<3.6> 2021-04-24T23:00:01.337530+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], WDC WD80EZAZ-11TDBA0, S/N:2TKST2SD, WWN:5-000cca-26af51579, FW:83.H0A83, 8.00 TB
<3.6> 2021-04-24T23:00:01.351494+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], found in smartd database: Western Digital Ultrastar He10/12
<3.6> 2021-04-24T23:00:01.362339+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], is SMART capable. Adding to "monitor" list.
<3.6> 2021-04-24T23:00:01.363817+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state read from /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-24T23:00:01.837603+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-24T23:00:01.849333+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-24T23:30:02.329874+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], self-test in progress, 80% remaining
<3.6> 2021-04-25T00:30:05.990162+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], previous self-test was aborted by the host
Isengard:~ #
That's sdc, the disk that the script doesn't write to. It goes to sleep
while testing itself. sdb and sde don't seem to suffer from this problem.
I have to wait until all the tests complete; then I'll repeat with smartd stopped and see if there is a difference.
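To see how long a test ran before it was aborted, the smartd messages above can be reduced to their self-test events. The sketch below assumes the syslog line layout shown in this post; the regular expression and the function name are illustrative and would need adjusting for a different log format.

```python
import re
from datetime import datetime

# Matches lines shaped like the smartd entries quoted in this post (assumed layout).
LINE = re.compile(
    r"^<[\d.]+>\s+(?P<ts>\S+)\s+\S+\s+smartd\s+\d+\s+-\s+-\s+"
    r"Device:\s+(?P<dev>\S+)\s+\[[^\]]+\],\s+(?P<msg>.*)$"
)


def self_test_events(lines):
    """Return (timestamp, device, message) for every self-test related line."""
    events = []
    for line in lines:
        m = LINE.match(line.strip())
        if m and "self-test" in m["msg"].lower():
            events.append((datetime.fromisoformat(m["ts"]), m["dev"], m["msg"]))
    return events


DEV = "/dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0"
PREFIX = "Isengard smartd"
SAMPLE = [  # three lines taken from the log above
    f"<3.6> 2021-04-24T22:19:42.334812+02:00 {PREFIX} 12721 - - "
    f"Device: {DEV} [SAT], starting scheduled Long Self-Test.",
    f"<3.6> 2021-04-24T23:30:02.329874+02:00 {PREFIX} 14624 - - "
    f"Device: {DEV} [SAT], self-test in progress, 80% remaining",
    f"<3.6> 2021-04-25T00:30:05.990162+02:00 {PREFIX} 14624 - - "
    f"Device: {DEV} [SAT], previous self-test was aborted by the host",
]

events = self_test_events(SAMPLE)
elapsed = events[-1][0] - events[0][0]  # time from test start to reported abort
print(elapsed)
```

For these three lines, the gap between the start and the reported abort is a little over two hours, far short of the nearly full day a long test needs on these disks.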
[... next afternoon]
<3.6> 2021-04-25T15:00:11.804687+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test completed without error
That's sdd, the disk that my script keeps alive. Other disks, without any
keep-alive, complete the test:
<3.6> 2021-04-25T02:30:01.626022+02:00 Isengard smartd 14624 - - Device: /dev/sda [SAT], previous self-test completed without error
<3.6> 2021-04-25T02:00:07.557851+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], starting scheduled Short Self-Test.
<3.6> 2021-04-25T02:30:07.016454+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], previous self-test completed without error
<3.6> 2021-04-25T06:00:01.833849+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/wwn-0x5000c5009399305f [SAT], previous self-test completed without error
<3.6> 2021-04-25T06:00:01.974584+02:00 Isengard smartd 14624 - - Device: /dev/disk/by-id/wwn-0x5000c500c4beb480 [SAT], previous self-test completed without error
I will test both again, with the daemon stopped and the script running on
only one of the disks.
2021-04-25 22:38:49.036941561+02:00
Isengard:~ # smartctl --log=selftest /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10891         -
# 2  Extended offline    Completed without error       00%     10866         -
# 3  Extended offline    Completed without error       00%     10849         -
# 4  Extended offline    Aborted by host               90%     10832         -
# 5  Extended offline    Aborted by host               90%     10830         -
# 6  Extended offline    Aborted by host               80%     10825         -
# 7  Extended offline    Aborted by host               80%     10815         -
# 8  Short offline       Completed without error       00%     10812         -
# 9  Extended offline    Aborted by host               80%     10811         -
#10  Short offline       Completed without error       00%     10788         -
#11  Short offline       Completed without error       00%     10764         -
#12  Extended offline    Aborted by host               90%     10742         -
#13  Short offline       Completed without error       00%     10740         -
#14  Short offline       Completed without error       00%     10716         -
#15  Short offline       Completed without error       00%     10692         -
#16  Short offline       Completed without error       00%     10671         -
#17  Short offline       Completed without error       00%     10647         -
#18  Short offline       Completed without error       00%     10623         -
#19  Short offline       Completed without error       00%     10599         -
#20  Extended offline    Aborted by host               80%     10578         -
#21  Short offline       Completed without error       00%     10575         -
Isengard:~ #
So the disk that a script kept busy completed the test.
The disk that was not kept busy did not complete it (the test was aborted):
Isengard:~ # smartctl --log=selftest /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     16420         -
# 2  Short offline       Completed without error       00%     16399         -
# 3  Extended offline    Aborted by host               80%     16398         -
# 4  Extended offline    Completed without error       00%     16394         -
# 5  Extended offline    Aborted by host               90%     16377         -
# 6  Short offline       Completed without error       00%     16358         -
# 7  Short offline       Completed without error       00%     16333         -
# 8  Short offline       Completed without error       00%     16309         -
# 9  Short offline       Completed without error       00%     16285         -
#10  Short offline       Completed without error       00%     16262         -
#11  Short offline       Completed without error       00%     16238         -
#12  Extended offline    Aborted by host               90%     16216         -
#13  Short offline       Completed without error       00%     16214         -
#14  Short offline       Completed without error       00%     16190         -
#15  Short offline       Completed without error       00%     16166         -
#16  Short offline       Completed without error       00%     16145         -
#17  Short offline       Completed without error       00%     16121         -
#18  Short offline       Completed without error       00%     16097         -
#19  Short offline       Completed without error       00%     16073         -
#20  Extended offline    Aborted by host               80%     16051         -
#21  Short offline       Completed without error       00%     16049         -
Isengard:~ #
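The two self-test logs above can be compared more easily by counting outcomes per test type. The following is a small sketch that parses the rows of `smartctl --log=selftest` as they appear in this post; the regular expression and function names are assumptions for illustration and only cover the two test types seen here.

```python
import re
from collections import Counter

# One log row: "# N  <test type>  <status>  NN%  <hours>  <LBA or ->"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in the smartctl output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "num": int(m["num"]),
                "test": m["test"],
                "status": m["status"],
                "remaining": int(m["remaining"]),
                "hours": int(m["hours"]),
                "lba": m["lba"],
            })
    return rows


def summarize(rows):
    """Count rows per (test type, status) pair."""
    return Counter((r["test"], r["status"]) for r in rows)


SAMPLE = """\
# 1 Extended offline Aborted by host 90% 16420 -
# 2 Short offline Completed without error 00% 16399 -
# 3 Extended offline Aborted by host 80% 16398 -
# 4 Extended offline Completed without error 00% 16394 -
"""  # first four rows of the sdc log above

for key, count in summarize(parse_selftest_log(SAMPLE)).items():
    print(key, count)
```

Feeding it the full output for each disk gives a quick count of aborted versus completed extended tests.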
As Nathan said, the USB chipset is the culprit; it is what the log reports
as the "host". I hadn't realized this. The daemon doesn't influence things.
So perhaps I have to run a cron job alongside smartd's testing to force the
disks to stay awake while they are tested, or forget automated (long)
testing and do it manually when I want, since the testing takes almost a
day anyway.
- --
Cheers,
Carlos E. R.
(from openSUSE 15.2 x86_64 at Telcontar)
-----BEGIN PGP SIGNATURE-----
iHoEARECADoWIQQZEb51mJKK1KpcU/W1MxgcbY1H1QUCYJUdpRwccm9iaW4ubGlz
dGFzQHRlbGVmb25pY2EubmV0AAoJELUzGBxtjUfVbGkAnR1dfNybdHOsno5Zl7Ij
w0fgqg2KAJ9k1sjnZ1oL/xuhNvDghiGcoeBWgw==
=8SaR
-----END PGP SIGNATURE-----