[smartmontools-support] I can't complete the long test
Carlos E. R.
robin.listas at telefonica.net
Fri Apr 23 21:31:21 CEST 2021
On Friday, 2021-04-23 at 12:51 +0200, Carlos E. R. wrote:
> On 23/04/2021 12.14, Claudio Kuenzler wrote:
I started another test this afternoon:
Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd
2021-04-23 13:29:26.395555499+02:00
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1051 minutes for test to complete.
Test will complete after Sat Apr 24 07:00:26 2021 CEST
Use smartctl -X to abort test.
Isengard:~ #
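As a sanity check on the times smartctl prints: 1051 minutes is about 17.5 hours, so the disk has to stay awake and undisturbed until the next morning. The estimate is just start time plus duration. A minimal pure-bash sketch of that arithmetic (the function name add_minutes is made up for illustration, and daylight-saving changes are ignored):

```bash
#!/bin/bash
# Hypothetical helper: add a duration in minutes to a wall-clock time.
# add_minutes HH:MM:SS MINUTES  ->  prints "HH:MM:SS +Nd" (N = days later).
# Plain integer arithmetic; DST transitions are NOT taken into account.
add_minutes() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values like "08" are not read as octal
    local total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s + $2 * 60 ))
    printf '%02d:%02d:%02d +%dd\n' \
        $(( total % 86400 / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 )) $(( total / 86400 ))
}

add_minutes 13:29:26 1051   # first long test on /dev/sdd
add_minutes 21:17:50 1013   # long test on /dev/sdc
```

This prints 07:00:26 +1d and 14:10:50 +1d, the same completion times smartctl reported for those two runs.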
An hour later it was still running:
Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd
2021-04-23 14:40:32.288850715+02:00
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Can't start self-test without aborting current test (80% remaining),
add '-t force' option to override, or run 'smartctl -X' to abort test.
Isengard:~ #
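Running "smartctl --test=long" only to see whether a test is still going has a side effect: if none is running, it starts a new one. A read-only alternative is the capabilities output of "smartctl -c", whose "Self-test execution status" line carries the raw status byte in parentheses (per the ATA spec, the high nibble is the status code and the low nibble is the percentage remaining in tens). A sketch that decodes it; the function names are hypothetical, only a few status codes are spelled out, and the exact line format of "smartctl -c" is an assumption worth checking against your smartctl version:

```bash
#!/bin/bash
# Hypothetical helpers, not part of smartmontools.
# Assumes ATA "smartctl -c" output with a line such as
#   Self-test execution status:      ( 249)	Self-test routine in progress...

# Decode the raw self-test status byte (decimal).
decode_selftest_status() {
    local byte=$1
    local code=$(( byte >> 4 )) remaining=$(( (byte & 0x0f) * 10 ))
    case $code in
        0)  echo "completed without error (or never run)" ;;
        1)  echo "aborted by host" ;;
        2)  echo "interrupted by host reset" ;;
        15) echo "in progress, ${remaining}% remaining" ;;
        *)  echo "other status code $code" ;;
    esac
}

# Read "smartctl -c" output on stdin, extract the number in parentheses
# on the status line and decode it.
selftest_status() {
    local byte
    byte=$(sed -n 's/^Self-test execution status:[[:space:]]*([[:space:]]*\([0-9]*\)).*/\1/p')
    [ -n "$byte" ] || { echo "status line not found" >&2; return 1; }
    decode_selftest_status "$byte"
}

# Usage (needs root and the real device):
#   smartctl -c /dev/sdd | selftest_status
```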
But this evening it was not running:
Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd
2021-04-23 20:27:13.608830196+02:00
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1051 minutes for test to complete.
Test will complete after Sat Apr 24 13:58:13 2021 CEST
Use smartctl -X to abort test.
Isengard:~ # date --rfc-3339=ns ; smartctl -X /dev/sdd
2021-04-23 20:27:25.217864047+02:00
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Tests:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%     10830         -
# 2  Extended offline    Aborted by host               80%     10825         -
# 3  Extended offline    Aborted by host               80%     10815         -
# 4  Short offline       Completed without error       00%     10812         -
# 5  Extended offline    Aborted by host               80%     10811         -
# 6  Short offline       Completed without error       00%     10788         -
# 7  Short offline       Completed without error       00%     10764         -
# 8  Extended offline    Aborted by host               90%     10742         -
# 9  Short offline       Completed without error       00%     10740         -
#10  Short offline       Completed without error       00%     10716         -
#11  Short offline       Completed without error       00%     10692         -
#12  Short offline       Completed without error       00%     10671         -
#13  Short offline       Completed without error       00%     10647         -
#14  Short offline       Completed without error       00%     10623         -
#15  Short offline       Completed without error       00%     10599         -
#16  Extended offline    Aborted by host               80%     10578         -
#17  Short offline       Completed without error       00%     10575         -
#18  Short offline       Completed without error       00%     10551         -
#19  Short offline       Completed without error       00%     10527         -
#20  Short offline       Completed without error       00%     10503         -
#21  Short offline       Completed without error       00%     10479         -
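The pattern in that log is easier to see as a tally by test type and outcome. A small awk sketch, assuming the usual "smartctl -l selftest" table layout where each row starts with "# N" (the function name is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper: tally a SMART self-test log read on stdin,
# e.g.  smartctl -l selftest /dev/sdd | summarize_selftest_log | sort
# Rows are recognised by their leading "# N" / "#NN" entry number.
summarize_selftest_log() {
    awk '
        /^# *[0-9]+ / {
            if ($0 ~ /Extended offline/)   type = "Extended"
            else if ($0 ~ /Short offline/) type = "Short"
            else                           type = "Other"
            seen[type] = 1
            if ($0 ~ /Completed without error/) ok[type]++
            else if ($0 ~ /Aborted by host/)    aborted[type]++
            else                                other[type]++
        }
        END {
            # Unset counters print as 0 with %d
            for (t in seen)
                printf "%s: %d completed, %d aborted by host, %d other\n", t, ok[t], aborted[t], other[t]
        }'
}
```

On the 21 entries above this gives "Extended: 0 completed, 6 aborted by host, 0 other" and "Short: 15 completed, 0 aborted by host, 0 other": every long test was aborted early, every short test finished.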
At the same time as the test, I was running a script to keep the disk alive:
cer@Isengard:~> cat ~/bin/busybody
#!/bin/bash
COUNT=0
while true ; do
let "COUNT = $COUNT + 1"
echo -n -e "$COUNT \r"
touch /mnt/tmp/cer/tocado
sleep 1
rm /mnt/tmp/cer/tocado
sleep 179
done
cer@Isengard:~>
It is still running:
cer@Isengard:~> busybody
153
The log:
Isengard:~ # grep smartd /var/log/messages | egrep -i -v "Temperature" | tail
<3.6> 2021-04-23T03:47:39.018184+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-23T04:47:46.053578+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is in SLEEP mode, suspending checks
<3.6> 2021-04-23T05:17:39.526508+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is back in ACTIVE or IDLE mode, resuming checks (1 check skipped)
<3.6> 2021-04-23T05:17:39.531501+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test completed without error
<3.6> 2021-04-23T13:47:39.028412+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], self-test in progress, 90% remaining
<3.6> 2021-04-23T14:47:38.925176+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], self-test in progress, 80% remaining
<3.6> 2021-04-23T15:17:46.197105+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is in SLEEP mode, suspending checks
<3.6> 2021-04-23T15:47:39.463341+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], is back in ACTIVE or IDLE mode, resuming checks (1 check skipped)
<3.6> 2021-04-23T15:47:39.468232+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test completed without error
<3.6> 2021-04-23T20:47:44.130163+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], previous self-test was aborted by the host
Isengard:~ #
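Pulling only the self-test and power-state events for the My Book out of the log makes the sequence easier to follow: the test is reported in progress, then the drive is seen in SLEEP mode, and on the next check smartd reports the previous self-test as finished. A sketch of such a filter, assuming the syslog line layout shown above (timestamp in the second field, message after "[SAT], "); the function name is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: condense smartd syslog lines read on stdin into
# "HH:MM:SS  message" for one device, keeping only self-test and
# power-state events.
selftest_timeline() {
    local dev=$1
    grep -F "Device: $dev " |
        grep -E 'self-test|SLEEP mode|ACTIVE or IDLE' |
        awk '{
            ts = substr($2, 12, 8)   # time part of 2021-04-23T14:47:38.925176+02:00
            sub(/^.*], /, "")        # drop everything up to "[SAT], "
            print ts "  " $0
        }'
}

# Usage:
#   grep smartd /var/log/messages |
#       selftest_timeline /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0
```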
At this point, my guess is that smartd sees the disk as sleeping and aborts the test, so I'm going to stop smartd.
/etc/smartd.conf:
/dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 -d removable -d sat,16 -n standby -T permissive -m root@telcontar.valinor -a -s (S/../.././02|L/../../6/03)
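For reference, "-n standby" is what makes smartd skip checks ("suspending checks") while the disk is in SLEEP or STANDBY mode, and the "-s" expression schedules a short test every day at 02:00 and a long test every Saturday (day 6) at 03:00. Per the smartd.conf man page, smartd matches that expression against a string of the form T/MM/DD/d/HH, where T is the test type and d runs from 1 (Monday) to 7 (Sunday); "smartd -q showtests" prints the schedule smartd actually computes. A hypothetical helper to check such an expression by hand, assuming the whole string has to match and GNU date is available:

```bash
#!/bin/bash
# Hypothetical helper: does a smartd "-s" regex fire for test type TYPE
# at time WHEN?  Builds T/MM/DD/d/HH like smartd (d: 1=Monday..7=Sunday)
# and requires the regex to match the whole string. Needs GNU date -d.
schedule_fires() {
    local regex=$1 type=$2 when=$3 stamp re
    stamp=$(date -d "$when" +%m/%d/%u/%H) || return 2
    re="^(${regex})\$"
    [[ "$type/$stamp" =~ $re ]]
}

# Usage:
#   schedule_fires '(S/../.././02|L/../../6/03)' L '2021-04-24 03:00' && echo "long test"
```

2021-04-24 is a Saturday, so the long test fires at 03:00 that day; with a duration of about 17.5 hours it would not finish before the evening.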
Starting the test on both disks (they are identical):
Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdd
2021-04-23 21:17:47.732833559+02:00
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1051 minutes for test to complete.
Test will complete after Sat Apr 24 14:48:47 2021 CEST
Use smartctl -X to abort test.
Isengard:~ # date --rfc-3339=ns ; smartctl --test=long /dev/sdc
2021-04-23 21:17:50.454715617+02:00
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.3.18-lp152.72-default] (SUSE RPM)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1013 minutes for test to complete.
Test will complete after Sat Apr 24 14:10:50 2021 CEST
Use smartctl -X to abort test.
Isengard:~ # systemctl stop smartd
Isengard:~ # systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Fri 2021-04-23 21:18:09 CEST; 5s ago
And "busybody" is running on one of the disks to keep it busy:
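#!/bin/bash
# Create and delete a file on the disk every 3 minutes so it never
# stays idle long enough to go to sleep.
COUNT=0
while true ; do
    COUNT=$((COUNT + 1))
    # Each pass takes 1 + 179 = 180 seconds, i.e. 3 minutes.
    MINUTES=$((COUNT * 180 / 60))
    echo -n -e "$COUNT (counting to $MINUTES minutes) \r"
    touch /mnt/tmp/cer/tocado
    sleep 1
    rm /mnt/tmp/cer/tocado
    sleep 179
done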
Checking the log:
<3.6> 2021-04-23T21:17:33.203254+02:00 Isengard smartd 1149 - - Device: /dev/sda [SAT], SMART Usage Attribute: 189 Airflow_Temperature_Cel changed from 42 to 43
<3.6> 2021-04-23T21:17:33.204572+02:00 Isengard smartd 1149 - - Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 42 to 43
<3.6> 2021-04-23T21:18:09.442637+02:00 Isengard smartd 1149 - - smartd received signal 15: Terminated
<3.6> 2021-04-23T21:18:09.443313+02:00 Isengard smartd 1149 - - Device: /dev/sda [SAT], state written to /var/lib/smartmontools/smartd.KINGSTON_SMS200S3120G-50026B726901494E.ata.state
<3.6> 2021-04-23T21:18:09.443856+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/wwn-0x5000c5009399305f [SAT], state written to /var/lib/smartmontools/smartd.ST4000DM000_2AE166-ZDH0JC13.ata.state
<3.6> 2021-04-23T21:18:09.444301+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/wwn-0x5000c500c4beb480 [SAT], state written to /var/lib/smartmontools/smartd.ST4000DM004_2CV104-ZFN31Z3P.ata.state
<3.6> 2021-04-23T21:18:09.444719+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_32544B5354325344-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-2TKST2SD.ata.state
<3.6> 2021-04-23T21:18:09.445118+02:00 Isengard smartd 1149 - - Device: /dev/disk/by-id/usb-WD_My_Book_25EE_37484B53554B424A-0:0 [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD80EZAZ_11TDBA0-7HKSUKBJ.ata.state
<3.6> 2021-04-23T21:18:09.445478+02:00 Isengard smartd 1149 - - smartd is exiting (exit status 0)
Hopefully I'll be able to report success tomorrow.
--
Cheers,
Carlos E. R.
(from openSUSE 15.2 x86_64 at Telcontar)