[smartmontools-support] I can't complete the long test

Carlos E. R. robin.listas at telefonica.net
Fri Apr 23 21:31:21 CEST 2021





On Friday, 2021-04-23 at 12:51 +0200, Carlos E. R. wrote:
> On 23/04/2021 12.14, Claudio Kuenzler wrote:

I started another test this afternoon:

Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd 
2021-04-23 13:29:26.395555499+02:00 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === 
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode". 
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful. 
Testing has begun. 
Please wait 1051 minutes for test to complete. 
Test will complete after Sat Apr 24 07:00:26 2021 CEST 
Use smartctl -X to abort test. 
Isengard:~ #
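smartctl's completion estimate is simply the start time plus the announced duration: 13:29:26 plus 1051 minutes (17 h 31 min) is 07:00:26 the next morning, matching the line above. A minimal bash sketch of the same arithmetic, handy for knowing when to check back; the function name `selftest_eta` is made up for this example, and it assumes GNU coreutils `date` (for `-d STRING` and `-d @EPOCH`).

```bash
#!/bin/bash
# Hypothetical helper: print when a SMART self-test should finish.
# $1: start time in any format GNU `date -d` accepts
# $2: duration in minutes, as reported by smartctl ("Please wait N minutes")
# Assumes GNU coreutils date (supports -d STRING and -d @EPOCH).
selftest_eta() {
    local start
    start=$(date -d "$1" +%s) || return 1
    date -d "@$(( start + $2 * 60 ))" '+%F %T %Z'
}
```

In a CEST session, `selftest_eta '2021-04-23 20:27:13' 1051` should reproduce the 13:58:13 estimate that smartctl printed for the evening run.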

An hour later it was still running:

Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd 
2021-04-23 14:40:32.288850715+02:00 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === 
Can't start self-test without aborting current test (80% remaining), 
add '-t force' option to override, or run 'smartctl -X' to abort test. 
Isengard:~ #
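Probing by starting another long test works while a test is running (smartctl refuses and reports the remaining percentage), but once the previous test has ended the probe simply starts a new one, which is what happened at 20:27 below. A read-only alternative is the "Self-test execution status" shown by `smartctl -c`. A minimal sketch, assuming the usual ATA output where an in-progress test is followed by a line such as `90% of test remaining.`; the function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read `smartctl -c` output on stdin and print the
# percentage of the running self-test that is still remaining.
# Prints nothing when no self-test is in progress.
# Assumes the ATA output format, where the status line is followed by
# an indented line like "90% of test remaining."
selftest_remaining() {
    sed -n 's/^[[:space:]]*\([0-9]\{1,3\}\)% of test remaining\..*$/\1/p'
}
```

Usage: `smartctl -c /dev/sdd | selftest_remaining`.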


But this evening it was not running:


Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd 
2021-04-23 20:27:13.608830196+02:00 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === 
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode". 
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun. 
Please wait 1051 minutes for test to complete.
Test will complete after Sat Apr 24 13:58:13 2021 CEST
Use smartctl -X to abort test.
Isengard:~ # date --rfc-3339=ns ; smartctl -X /dev/sdd 
2021-04-23 20:27:25.217864047+02:00
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org


The self-test log:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     10830         -
# 2  Extended offline    Aborted by host               80%     10825         -
# 3  Extended offline    Aborted by host               80%     10815         -
# 4  Short offline       Completed without error       00%     10812         -
# 5  Extended offline    Aborted by host               80%     10811         -
# 6  Short offline       Completed without error       00%     10788         -
# 7  Short offline       Completed without error       00%     10764         -
# 8  Extended offline    Aborted by host               90%     10742         -
# 9  Short offline       Completed without error       00%     10740         -
#10  Short offline       Completed without error       00%     10716         -
#11  Short offline       Completed without error       00%     10692         -
#12  Short offline       Completed without error       00%     10671         -
#13  Short offline       Completed without error       00%     10647         -
#14  Short offline       Completed without error       00%     10623         -
#15  Short offline       Completed without error       00%     10599         -
#16  Extended offline    Aborted by host               80%     10578         -
#17  Short offline       Completed without error       00%     10575         -
#18  Short offline       Completed without error       00%     10551         -
#19  Short offline       Completed without error       00%     10527         -
#20  Short offline       Completed without error       00%     10503         -
#21  Short offline       Completed without error       00%     10479         -
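Grouping the rows makes the pattern in this log obvious: every short test completed, while every extended test was aborted with 80% or 90% still to go. A small awk-based sketch that tallies the entries of `smartctl -l selftest` output by test type and status; it assumes the columns are separated by runs of two or more spaces, as in the table above, and the function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read `smartctl -l selftest` output on stdin and count
# log entries per "test type / status" pair, most frequent first.
# Assumes each entry starts with "#" plus its number and that columns are
# separated by two or more spaces.
summarize_selftests() {
    awk '/^#[ 0-9]+/ {
        line = $0
        sub(/^#[ 0-9]+ +/, "", line)      # drop the "# 1" or "#10" column
        split(line, f, "  +")             # f[1] = description, f[2] = status
        count[f[1] " / " f[2]]++
    }
    END { for (k in count) printf "%3d  %s\n", count[k], k }' | sort -rn
}
```

Usage: `smartctl -l selftest /dev/sdd | summarize_selftests`.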


While the test ran, I also had a script keeping the disk awake:

cer@Isengard:~> cat ~/bin/busybody
#!/bin/bash

COUNT=0
while true  ; do
     let "COUNT = $COUNT + 1"
     echo -n -e "$COUNT \r"
     touch /mnt/tmp/cer/tocado
     sleep 1
     rm /mnt/tmp/cer/tocado
     sleep 179
done
cer@Isengard:~>


It is still running:

cer@Isengard:~> busybody
153




The log:

Isengard:~ # grep smartd /var/log/messages | egrep -i -v "Temperature" | tail
<3.6> 2021-04-23T03:47:39.018184+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-23T04:47:46.053578+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is in SLEEP mode, suspending checks
<3.6> 2021-04-23T05:17:39.526508+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is back in ACTIVE or IDLE mode, resuming checks (1 check skipped)
<3.6> 2021-04-23T05:17:39.531501+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test completed without error
<3.6> 2021-04-23T13:47:39.028412+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-23T14:47:38.925176+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], self-test in progress, 80% remaining
<3.6> 2021-04-23T15:17:46.197105+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is in SLEEP mode, suspending checks
<3.6> 2021-04-23T15:47:39.463341+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is back in ACTIVE or IDLE mode, resuming checks (1 check skipped)
<3.6> 2021-04-23T15:47:39.468232+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test completed without error
<3.6> 2021-04-23T20:47:44.130163+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test was aborted by the host
Isengard:~ #
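Lining up the self-test entries with the power-mode entries makes the sequence easier to see: both in the early morning and in the afternoon, "self-test in progress" is followed by "is in SLEEP mode", and the first check after wake-up reports "previous self-test completed without error". A minimal sketch that reduces the smartd syslog lines to date, time and message; it assumes the timestamp is the second whitespace-separated field and that the message follows the `[SAT], ` device tag, as in the excerpt above. The function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read syslog lines on stdin and print a compact timeline
# of smartd self-test and power-mode messages as "YYYY-MM-DD HH:MM:SS  message".
# Assumes lines shaped like the /var/log/messages excerpt above:
#   <3.6> 2021-04-23T15:17:46.197105+02:00 Isengard smartd 1149 - -  Device: ... [SAT], message
selftest_timeline() {
    awk '/smartd/ && /self-test|SLEEP|ACTIVE or IDLE/ {
        ts = substr($2, 1, 10) " " substr($2, 12, 8)   # date and HH:MM:SS
        msg = $0
        sub(/^.*\[SAT\], /, "", msg)                   # keep text after the device tag
        print ts "  " msg
    }'
}
```

Usage: `grep 37484B53554B424A /var/log/messages | selftest_timeline`.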



At this point my guess is that smartd decides the disk is sleeping and aborts the test, so I'm going to stop smartd.

/etc/smartd.conf:

/dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 -d removable -d sat,16 -n standby  -T permissive -m root@telcontar.valinor -a -s (S/../.././02|L/../../6/03)
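For reference, the `-s` directive in this line schedules a short test every day at 02:00 and a long test every Saturday at 03:00: smartd builds the string `T/MM/DD/d/HH` (test type, month, day of month, day of week with 1 = Monday, hour) and starts a test when the whole string matches the regular expression. The sketch below only emulates that check in bash for a given date, to see when a schedule fires; it is not smartd's code, the function name is hypothetical, and it assumes GNU `date`.

```bash
#!/bin/bash
# Hypothetical helper emulating how smartd evaluates an "-s REGEXP" schedule:
# build "T/MM/DD/d/HH" for test type T at the given time (d: 1 = Monday ...
# 7 = Sunday) and require the regex to match the entire string.
# $1: schedule regex, $2: test type letter (S, L, ...), $3: time for `date -d`
schedule_matches() {
    local regex=$1 type=$2 when=$3 stamp re
    stamp="$type/$(date -d "$when" '+%m/%d/%u/%H')" || return 2
    re="^(${regex})\$"
    [[ $stamp =~ $re ]]
}
```

For example, `schedule_matches '(S/../.././02|L/../../6/03)' L '2021-04-24 03:00'` succeeds, since 24 April 2021 was a Saturday.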



Starting the test on both disks (they are identical):

Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd
2021-04-23 21:17:47.732833559+02:00 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION === 
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode". 
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful. 
Testing has begun. 
Please wait 1051 minutes for test to complete. 
Test will complete after Sat Apr 24 14:48:47 2021 CEST 
Use smartctl -X to abort test. 
Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdc 
2021-04-23 21:17:50.454715617+02:00 
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM) 
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1013 minutes for test to complete.
Test will complete after Sat Apr 24 14:10:50 2021 CEST
Use smartctl -X to abort test.
Isengard:~ # systemctl stop smartd 
Isengard:~ # systemctl status smartd 
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
    Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
    Active: inactive (dead) since Fri 2021-04-23 21:18:09 CEST; 5s ago


And "busybody" is running on one of the disks to keep it busy:

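#!/bin/bash
# Keep the disk from going to sleep: create and delete a small file on its
# filesystem (the /mnt/tmp/cer directory) every 3 minutes.

COUNT=0
while true; do
     let "COUNT = $COUNT + 1"
     # Each iteration takes 1 + 179 = 180 seconds.
     let "MINUTES = $COUNT * 180 / 60 "
     echo -n -e "$COUNT (counting to $MINUTES minutes)  \r"

     touch /mnt/tmp/cer/tocado
     sleep 1
     rm /mnt/tmp/cer/tocado
     sleep 179
done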




Checking the log:

<3.6> 2021-04-23T21:17:33.203254+02:00 Isengard smartd 1149 - -  Device: /dev/sda [SAT], SMART Usage Attribute: 189 Airflow_Temperature_Cel changed from 42 to 43
<3.6> 2021-04-23T21:17:33.204572+02:00 Isengard smartd 1149 - -  Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 42 to 43
<3.6> 2021-04-23T21:18:09.442637+02:00 Isengard smartd 1149 - -  smartd received signal 15: Terminated
<3.6> 2021-04-23T21:18:09.443313+02:00 Isengard smartd 1149 - -  Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.KINGSTON_SMS200S3120G-50026B726901494E.ata.state
<3.6> 2021-04-23T21:18:09.443856+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/wwn-0x5000c5009399305f [SAT], state written to /var/lib/smartmontools/smartd.ST4000DM000_2AE166-ZDH0JC13.ata.state
<3.6> 2021-04-23T21:18:09.444301+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/wwn-0x5000c500c4beb480 [SAT], state written to /var/lib/smartmontools/smartd.ST4000DM004_2CV104-ZFN31Z3P.ata.state
<3.6> 2021-04-23T21:18:09.444719+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-23T21:18:09.445118+02:00 Isengard smartd 1149 - -  Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-7HKSUKBJ.ata.state
<3.6> 2021-04-23T21:18:09.445478+02:00 Isengard smartd 1149 - -  smartd is exiting (exit status 0)



Hopefully I'll be able to report success tomorrow.




-- 
Cheers,
        Carlos E. R.
        (from openSUSE 15.2 x86_64 at Telcontar)


