[smartmontools-support] I can't complete the long test

Carlos E. R. robin.listas at telefonica.net
Fri May 7 12:59:48 CEST 2021





On Friday, 2021-04-23 at 21:31 +0200, Carlos E. R. wrote:
> On Friday, 2021-04-23 at 12:51 +0200, Carlos E. R. wrote:
>>  On 23/04/2021 12.14, Claudio Kuenzler wrote:

...

> At this moment, I'm guessing that smartd thinks the disk is sleeping and
> aborts the test, so I'm going to kill smartd.

...

> Starting the test (two identical disks)

...

> Hopefully, I'll report success tomorrow.

...

I made a mistake, had to change a few things, and restarted.

So I stopped the daemon and ran a script that wrote a text file every three
minutes to the two identical disks being tested, then forced a sync of
those two. With that method, the long test succeeded.
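A minimal sketch of that kind of keep-alive loop, assuming each argument is a directory on a filesystem of one of the disks under test. The `.keepalive` file name, the `INTERVAL` variable and the function name are hypothetical placeholders:

```shell
#!/bin/sh
# keep_awake.sh (hypothetical name): every INTERVAL seconds (default 180,
# i.e. three minutes) rewrite a small text file on each given directory
# and flush it, so the USB bridge sees disk activity.

INTERVAL=${INTERVAL:-180}

touch_disks() {
    for dir in "$@"; do
        date '+%F %T' > "$dir/.keepalive" || return 1
        # GNU coreutils "sync -f" flushes only the filesystem holding the
        # file; fall back to a global sync where -f is unsupported.
        sync -f "$dir/.keepalive" 2>/dev/null || sync
    done
}

# Loop only when directories are passed, so the function can be sourced.
if [ "$#" -gt 0 ]; then
    while touch_disks "$@"; do
        sleep "$INTERVAL"
    done
fi
```

For example, `sh keep_awake.sh /mnt/mybook1 /mnt/mybook2` (hypothetical mount points) would write and flush both disks every three minutes until interrupted.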

Right now, smartd is testing the disks automatically. I have the script
keeping one of them busy, and I'll see what happens.

[... somewhat later ...]

<3.6> 2021-04-24T22:19:42.271930+02:00 Isengard smartd 12721 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], old test of type L not run at Sat Apr 24 03:00:00 2021 CEST, starting now.
<3.6> 2021-04-24T22:19:42.334812+02:00 Isengard smartd 12721 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], starting scheduled Long Self-Test.
<3.6> 2021-04-24T22:49:36.820881+02:00 Isengard smartd 12721 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-24T22:59:30.325255+02:00 Isengard smartd 12721 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-24T23:00:01.337059+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], opened
<3.6> 2021-04-24T23:00:01.337530+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], WDC WD80EZAZ-11TDBA0, S/N:2TKST2SD, WWN:5-000cca-26af51579, FW:83.H0A83, 8.00 TB
<3.6> 2021-04-24T23:00:01.351494+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], found in smartd database: Western Digital Ultrastar He10/12
<3.6> 2021-04-24T23:00:01.362339+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], is SMART capable. Adding to "monitor" list.
<3.6> 2021-04-24T23:00:01.363817+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state read from /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-24T23:00:01.837603+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-24T23:00:01.849333+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-24T23:30:02.329874+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], self-test in progress, 80% remaining
<3.6> 2021-04-25T00:30:05.990162+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], previous self-test was aborted by the host
Isengard:~ #

That's sdc, the disk the script doesn't write to. It goes to sleep
during its own self-test. sdb and sde don't seem to suffer from this problem.

I have to wait until all the tests complete; then I'll repeat with smartd stopped and see whether there is a difference.

[... next afternoon ...]

<3.6> 2021-04-25T15:00:11.804687+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test completed without error

That's sdd, the disk that my script keeps alive. The other disks, without
any keep-alive, also complete their tests:


<3.6> 2021-04-25T02:30:01.626022+02:00 Isengard smartd 14624 - -  Device: /dev/sda [SAT], previous self-test completed without error

<3.6> 2021-04-25T02:00:07.557851+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], starting scheduled Short Self-Test.
<3.6> 2021-04-25T02:30:07.016454+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], previous self-test completed without error

<3.6> 2021-04-25T06:00:01.833849+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/wwn-0x5000c5009399305f [SAT], previous self-test completed without error

<3.6> 2021-04-25T06:00:01.974584+02:00 Isengard smartd 14624 - -  Device: /dev/disk/by-id/wwn-0x5000c500c4beb480 [SAT], previous self-test completed without error




I will test both again, with the daemon stopped and the script running on
only one of the disks.

2021-04-25 22:38:49.036941561+02:00

Isengard:~ # smartctl --log=selftest /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10891         -
# 2  Extended offline    Completed without error       00%     10866         -
# 3  Extended offline    Completed without error       00%     10849         -
# 4  Extended offline    Aborted by host               90%     10832         -
# 5  Extended offline    Aborted by host               90%     10830         -
# 6  Extended offline    Aborted by host               80%     10825         -
# 7  Extended offline    Aborted by host               80%     10815         -
# 8  Short offline       Completed without error       00%     10812         -
# 9  Extended offline    Aborted by host               80%     10811         -
#10  Short offline       Completed without error       00%     10788         -
#11  Short offline       Completed without error       00%     10764         -
#12  Extended offline    Aborted by host               90%     10742         -
#13  Short offline       Completed without error       00%     10740         -
#14  Short offline       Completed without error       00%     10716         -
#15  Short offline       Completed without error       00%     10692         -
#16  Short offline       Completed without error       00%     10671         -
#17  Short offline       Completed without error       00%     10647         -
#18  Short offline       Completed without error       00%     10623         -
#19  Short offline       Completed without error       00%     10599         -
#20  Extended offline    Aborted by host               80%     10578         -
#21  Short offline       Completed without error       00%     10575         -

Isengard:~ #

So the disk that the script kept busy completed the test.
The disk that was not kept busy did not; its test was aborted by the host:

Isengard:~ # smartctl --log=selftest /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     16420         -
# 2  Short offline       Completed without error       00%     16399         -
# 3  Extended offline    Aborted by host               80%     16398         -
# 4  Extended offline    Completed without error       00%     16394         -
# 5  Extended offline    Aborted by host               90%     16377         -
# 6  Short offline       Completed without error       00%     16358         -
# 7  Short offline       Completed without error       00%     16333         -
# 8  Short offline       Completed without error       00%     16309         -
# 9  Short offline       Completed without error       00%     16285         -
#10  Short offline       Completed without error       00%     16262         -
#11  Short offline       Completed without error       00%     16238         -
#12  Extended offline    Aborted by host               90%     16216         -
#13  Short offline       Completed without error       00%     16214         -
#14  Short offline       Completed without error       00%     16190         -
#15  Short offline       Completed without error       00%     16166         -
#16  Short offline       Completed without error       00%     16145         -
#17  Short offline       Completed without error       00%     16121         -
#18  Short offline       Completed without error       00%     16097         -
#19  Short offline       Completed without error       00%     16073         -
#20  Extended offline    Aborted by host               80%     16051         -
#21  Short offline       Completed without error       00%     16049         -

Isengard:~ #
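To compare the two logs at a glance, a small hypothetical helper (not part of smartmontools) can tally the extended tests by outcome. It assumes the column layout shown above, where each entry line starts with `#` and contains the literal strings "Extended offline", "Completed without error" or "Aborted by host":

```shell
#!/bin/sh
# summarize_selftests (hypothetical): read "smartctl --log=selftest" output
# on stdin and count extended tests that completed vs. were aborted by host.
summarize_selftests() {
    awk '
        /^#/ && /Extended offline/ {
            if (/Completed without error/) completed++
            else if (/Aborted by host/) aborted++
        }
        END {
            printf "extended: %d completed, %d aborted by host\n", completed, aborted
        }'
}
```

Used as `smartctl --log=selftest /dev/sdd | summarize_selftests`, the two logs above would give `extended: 3 completed, 7 aborted by host` for sdd and `extended: 1 completed, 5 aborted by host` for sdc.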


As Nathan said, the USB bridge chipset is the culprit: it is what the disk
sees as the "host". I didn't realize this. The daemon doesn't influence things.


So perhaps I have to run a cronjob simultaneously with smartd testing to
force the disks to stay awake long enough to be tested, or forget automated
(long) testing and do it manually when I want, since the test takes almost
a day anyway.
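One possible shape for such a cronjob, sketched under assumptions: it relies on `smartctl -c` printing "Self-test routine in progress" while a self-test runs, and the script name, crontab entry, device and mount point are all hypothetical. Instead of keeping the disk awake permanently, it only writes while a self-test is actually in progress:

```shell
#!/bin/sh
# selftest-keepalive.sh (hypothetical). Assumed /etc/cron.d-style entry:
#   */3 * * * * root /usr/local/sbin/selftest-keepalive.sh /dev/sdc /mnt/mybook
# $1 = device node, $2 = a directory on a filesystem of that disk.
# Note: cron's PATH may need to include the directory holding smartctl.

# Succeeds if smartctl -c output (read on stdin) reports a running self-test.
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Write a timestamp to the disk and flush it so the USB bridge sees activity.
keepalive_write() {
    date '+%F %T' > "$1/.keepalive" && sync
}

if [ "$#" -eq 2 ]; then
    if smartctl -c "$1" | selftest_in_progress; then
        keepalive_write "$2"
    fi
fi
```

Checking for a running test keeps the disk free to spin down normally outside the test window.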



-- 
Cheers,
        Carlos E. R.
        (from openSUSE 15.2 x86_64 at Telcontar)


