[smartmontools-support] Smartd is not running selective tests
Anthony Desmarais
anthony at tunguydesmarais.com
Mon Mar 27 20:14:14 CEST 2023
Hi, I wonder if anyone can help me.
I have two drives in one of my Linux boxes, which runs Fedora 37. I also have
smartmontools version 7.3-3 installed.
The box has a Western Digital Purple 8TB drive as well as a Seagate
Skyhawk 6TB drive.
I am trying to get smartd to run a selective test every Monday at 1am,
testing about a quarter of each drive every week.
To start with, I ran the first selective test manually with the
following commands:
smartctl -t select,0-3907013292 /dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6GE804Z
smartctl -t select,0-2930261292 /dev/disk/by-id/ata-ST6000VX001-2BD186_ZR13347M
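The end LBAs in these commands are exactly one quarter of each drive's LBA count. Dividing the User Capacity in the attached reports by the 512-byte logical sector size gives 15,628,053,168 LBAs for the WD drive and 11,721,045,168 for the Seagate, and a quarter of those is 3,907,013,292 and 2,930,261,292. If the span bounds are inclusive, as the MIN_LBA/MAX_LBA columns suggest, an exact quarter would end one LBA earlier. The difference is negligible. Below is a minimal sketch of the arithmetic, assuming 512-byte logical sectors. `lba_count` and `quarter_spans` are hypothetical helpers, not smartmontools functions.

```python
from __future__ import annotations


def lba_count(user_capacity_bytes: int, logical_sector_size: int = 512) -> int:
    """Number of addressable LBAs; valid LBAs run from 0 to count - 1.

    Assumption: 512-byte logical sectors, as both attached reports state.
    """
    return user_capacity_bytes // logical_sector_size


def quarter_spans(total_lbas: int) -> list[tuple[int, int]]:
    """Split the disk into four contiguous, inclusive (start, end) LBA spans."""
    size = total_lbas // 4
    spans = []
    for i in range(4):
        start = i * size
        # The last span absorbs any remainder so the whole disk is covered.
        end = total_lbas - 1 if i == 3 else start + size - 1
        spans.append((start, end))
    return spans


# "User Capacity" values from the attached smartctl reports.
for model, capacity in [("WD80PURZ", 8_001_563_222_016),
                        ("ST6000VX001", 6_001_175_126_016)]:
    total = lba_count(capacity)
    print(model, total, quarter_spans(total))
```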
Both of these ran fine, and the SMART reports show that the tests
completed successfully (see the attached text file containing both
reports).
Then I added these two lines to smartd.conf:
/dev/disk/by-id/ata-SQF-S25M8-256G-SAC_2FA6078110F500505907 -I 194 -d ata -f -l error -l selftest -l selfteststs -m anthony at tunguydesmarais>
/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6GE804Z -I 194 -d ata -a -m anthony at tunguydesmarais.com -n standby,12,q -s (S/../../3/03|c/../../1>
/dev/disk/by-id/ata-ST6000VX001-2BD186_ZR13347M -I 194 -d ata -a -m anthony at tunguydesmarais.com -n standby,12,q -s (S/../../3/03|c/../../1/0>
So the last two disks should run a selective test every Monday at 1am.
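smartd.conf(5) describes the `-s` option as matching a regular expression against a string of the form T/MM/DD/d/HH. The fields are the test-type letter, the month, the day of the month, the day of the week (1 = Monday through 7 = Sunday) and the hour. The sketch below checks which hour a schedule pattern fires in. It assumes the pattern must match the whole string and uses Python's `re` in place of the POSIX extended regex that smartd uses. The `-s` arguments in the lines above are cut off, so the pattern `EXAMPLE` is hypothetical and is not the poster's actual value.

```python
import re
from datetime import datetime


def schedule_key(test_type: str, when: datetime) -> str:
    """Build the T/MM/DD/d/HH string that an -s REGEXP is matched against.

    d is the day of the week, 1 = Monday ... 7 = Sunday, which is the same
    convention as datetime.isoweekday().
    """
    return (f"{test_type}/{when.month:02d}/{when.day:02d}/"
            f"{when.isoweekday()}/{when.hour:02d}")


def is_scheduled(regexp: str, test_type: str, when: datetime) -> bool:
    # Assumption: the pattern must match the whole key, not only a substring.
    # Python's re stands in for smartd's POSIX extended regex; the two agree
    # for simple patterns like EXAMPLE below.
    return re.fullmatch(regexp, schedule_key(test_type, when)) is not None


# HYPOTHETICAL pattern for illustration only: the real -s values in the
# smartd.conf lines above are truncated.
EXAMPLE = r"(S/../../3/03|c/../../1/01)"

monday_0153 = datetime(2023, 3, 27, 1, 53)
print(schedule_key("c", monday_0153))           # c/03/27/1/01
print(is_scheduled(EXAMPLE, "c", monday_0153))  # True
print(is_scheduled(EXAMPLE, "S", monday_0153))  # False
```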
However, this test is not running. Looking in my syslog, I see the
following errors:
Mar 27 01:53:02 daemon.crit [26]: smartd - smartd[1076]: - Device: /dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6GE804Z, prepare Selective Self-Test failed
Mar 27 01:53:02 daemon.crit [26]: smartd - smartd[1076]: - Device: /dev/disk/by-id/ata-ST6000VX001-2BD186_ZR13347M, prepare Selective Self-Test failed
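In a long system log, a small filter can confirm which devices hit this failure. The sketch below assumes the message text stays exactly as in the excerpt above. `failed_selective_devices` is a hypothetical helper and is not part of smartmontools.

```python
import re

# Matches the smartd message seen in the syslog lines above. The device path
# runs up to the comma that precedes "prepare".
PREPARE_FAILED = re.compile(
    r"Device: (?P<device>[^,\s]+), prepare Selective Self-Test failed"
)


def failed_selective_devices(lines):
    """Return device paths with a failed selective-test preparation, in log order."""
    devices = []
    for line in lines:
        match = PREPARE_FAILED.search(line)
        if match:
            devices.append(match.group("device"))
    return devices


log = [
    "Mar 27 01:53:02 daemon.crit [26]: smartd - smartd[1076]: - Device: "
    "/dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6GE804Z, prepare Selective Self-Test failed",
    "Mar 27 01:53:02 daemon.crit [26]: smartd - smartd[1076]: - Device: "
    "/dev/disk/by-id/ata-ST6000VX001-2BD186_ZR13347M, prepare Selective Self-Test failed",
]
print(failed_selective_devices(log))
```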
Out of interest, I have another box running Debian with two identical
drives in it (my backup NAS). It has the same settings, and there the
selective tests run just fine.
-------------- next part --------------
smartctl -a /dev/disk/by-id/ata-WDC_WD80PURZ-85YNPY0_R6GE804Z
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.7-200.fc37.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Purple
Device Model: WDC WD80PURZ-85YNPY0
Serial Number: R6GE804Z
LU WWN Device Id: 5 000cca 263c606df
Firmware Version: 80.H0A80
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Mar 27 19:52:28 2023 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 101) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1225) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       112
  3 Spin_Up_Time            0x0007   185   185   024    Pre-fail  Always       -       325 (Average 388)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       585
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   094   094   000    Old_age   Always       -       46761
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       575
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   098   098   000    Old_age   Always       -       2468
193 Load_Cycle_Count        0x0012   098   098   000    Old_age   Always       -       2468
194 Temperature_Celsius     0x0002   196   196   000    Old_age   Always       -       33 (Min/Max 21/43)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       7
SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
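The 49.710-day wrap is consistent with a 32-bit millisecond counter, because 2^32 ms is about 49.71 days. The sketch below converts a Powered_Up_Time value in the DDd+hh:mm:SS.sss format described above into milliseconds, which makes it easy to measure gaps between the commands listed for an error. `powered_up_ms` is a hypothetical helper.

```python
import re

# Assumption: Powered_Up_Time comes from a 32-bit millisecond counter, which
# matches the 49.710-day wrap stated in the report.
WRAP_MS = 2 ** 32
MS_PER_DAY = 24 * 60 * 60 * 1000


def powered_up_ms(stamp: str) -> int:
    """Convert a 'DDd+hh:mm:SS.sss' Powered_Up_Time value to milliseconds."""
    m = re.fullmatch(r"(\d+)d\+(\d{2}):(\d{2}):(\d{2})\.(\d{3})", stamp)
    if m is None:
        raise ValueError(f"unexpected Powered_Up_Time format: {stamp!r}")
    days, hours, minutes, seconds, millis = map(int, m.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


print(round(WRAP_MS / MS_PER_DAY, 3))  # 49.71
# Gap between the two WRITE FPDMA QUEUED commands listed for error 7: 12 ms.
print(powered_up_ms("6d+15:04:19.306") - powered_up_ms("6d+15:04:19.294"))
```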
Error 7 occurred at disk power-on lifetime: 7780 hours (324 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 08 e8 30 f4 85 40 08 6d+15:04:19.306 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 6d+15:04:19.305 FLUSH CACHE EXT
ea 00 00 00 00 00 a0 08 6d+15:04:19.294 FLUSH CACHE EXT
61 10 d0 20 f4 85 40 08 6d+15:04:19.294 WRITE FPDMA QUEUED
ea 00 00 00 00 00 a0 08 6d+15:04:12.297 FLUSH CACHE EXT
Error 6 occurred at disk power-on lifetime: 7679 hours (319 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 80 00 08 ba 40 08 2d+10:11:16.115 WRITE FPDMA QUEUED
61 00 d8 00 50 ba 40 08 2d+10:11:16.115 WRITE FPDMA QUEUED
61 e0 d0 20 46 ba 40 08 2d+10:11:16.112 WRITE FPDMA QUEUED
61 20 c8 00 40 ba 40 08 2d+10:11:16.110 WRITE FPDMA QUEUED
61 00 c0 00 38 ba 40 08 2d+10:11:16.107 WRITE FPDMA QUEUED
Error 5 occurred at disk power-on lifetime: 7659 hours (319 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 28 18 d8 12 00 40 08 1d+14:10:59.413 WRITE FPDMA QUEUED
61 20 08 00 11 c0 40 08 1d+14:10:59.413 WRITE FPDMA QUEUED
61 20 00 00 11 40 40 08 1d+14:10:59.412 WRITE FPDMA QUEUED
61 10 f0 00 11 80 40 08 1d+14:10:59.407 WRITE FPDMA QUEUED
61 28 e8 00 11 c0 40 08 1d+14:10:59.404 WRITE FPDMA QUEUED
Error 4 occurred at disk power-on lifetime: 7656 hours (319 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 68 00 e0 ce 40 08 1d+11:11:20.644 WRITE FPDMA QUEUED
61 00 18 00 80 cf 40 08 1d+11:11:20.644 WRITE FPDMA QUEUED
61 00 10 00 78 cf 40 08 1d+11:11:20.639 WRITE FPDMA QUEUED
61 00 08 00 70 cf 40 08 1d+11:11:20.633 WRITE FPDMA QUEUED
61 00 00 00 68 cf 40 08 1d+11:11:20.627 WRITE FPDMA QUEUED
Error 3 occurred at disk power-on lifetime: 7656 hours (319 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 41 00 00 00 00 00 Error: ICRC, ABRT at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 c0 38 40 9d 9b 40 08 1d+11:11:02.596 WRITE FPDMA QUEUED
61 80 48 80 a6 9b 40 08 1d+11:11:02.595 WRITE FPDMA QUEUED
61 80 40 00 a0 9b 40 08 1d+11:11:02.591 WRITE FPDMA QUEUED
61 40 30 00 98 9b 40 08 1d+11:11:02.590 WRITE FPDMA QUEUED
61 c0 28 40 95 9b 40 08 1d+11:11:02.586 WRITE FPDMA QUEUED
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Selective offline   Completed without error       00%     46695         -
# 2  Short offline       Completed without error       00%     46625         -
# 3  Short offline       Completed without error       00%     46459         -
# 4  Short offline       Completed without error       00%     46301         -
# 5  Short offline       Completed without error       00%     46146         -
# 6  Short offline       Completed without error       00%     45995         -
# 7  Short offline       Completed without error       00%     45839         -
# 8  Short offline       Completed without error       00%     45770         -
# 9  Selective offline   Completed without error       00%     45737         -
#10  Selective offline   Completed without error       00%     45734         -
#11  Selective offline   Aborted by host               90%     45727         -
#12  Short offline       Completed without error       00%     45608         -
#13  Short offline       Completed without error       00%     45455         -
#14  Extended offline    Completed without error       00%     45439         -
#15  Extended offline    Aborted by host               30%     45341         -
#16  Short offline       Completed without error       00%     45297         -
#17  Extended offline    Aborted by host               10%     45161         -
#18  Short offline       Completed without error       00%     45110         -
#19  Extended offline    Aborted by host               50%     45073         -
#20  Short offline       Completed without error       00%      7621         -
#21  Extended offline    Aborted by host               80%      7620         -
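The LifeTime(hours) column records the drive's power-on hours at the time each test ran. Comparing it with Power_On_Hours (attribute 9) therefore shows how long ago a test last completed. For the WD drive, 46761 - 46695 = 66 power-on hours have passed since the last selective test. These are power-on hours, not wall-clock hours, so the two agree only if the box ran continuously. The sketch below assumes the row layout shown above. `parse_selftest_log` and `hours_since_last_ok` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional


class SelfTestEntry(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int
    first_error_lba: str


# Hypothetical row pattern; works whether columns are separated by one
# space or several, e.g. "# 1 Selective offline Completed without error 00% 46695 -".
ROW = re.compile(r"#\s*(\d+)\s+(\w+ offline)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)")


def parse_selftest_log(text: str) -> list:
    """Parse self-test log rows; header and other lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = ROW.fullmatch(line.strip())
        if m:
            num, desc, status, remaining, hours, lba = m.groups()
            entries.append(SelfTestEntry(int(num), desc, status,
                                         int(remaining), int(hours), lba))
    return entries


def hours_since_last_ok(entries, description: str,
                        power_on_hours: int) -> Optional[int]:
    """Power-on hours elapsed since the newest successful test of this type."""
    for e in entries:  # smartctl lists the newest entry first (# 1)
        if e.description == description and e.status == "Completed without error":
            return power_on_hours - e.lifetime_hours
    return None


sample = """\
# 1 Selective offline Completed without error 00% 46695 -
# 2 Short offline Completed without error 00% 46625 -
#11 Selective offline Aborted by host 90% 45727 -
"""
entries = parse_selftest_log(sample)
print(hours_since_last_ok(entries, "Selective offline", 46761))  # 66
```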
SMART Selective self-test log data structure revision number 1
SPAN     MIN_LBA     MAX_LBA  CURRENT_TEST_STATUS
   1           0  3907013292  Not_testing
   2           0           0  Not_testing
   3           0           0  Not_testing
   4           0           0  Not_testing
   5           0           0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
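The selective self-test log keeps the last span tested (here SPAN 1, LBAs 0 to 3907013292). A "next span" request builds on that span. smartctl documents `-t select,next` as testing the range that follows the previous span. The sketch below models this as: keep the same size, start at the LBA after the previous end, clip the end at the last LBA, and wrap to LBA 0 once the end of the disk has been passed. The clipping and wrap details are assumptions about the exact behaviour, and `next_span` is a hypothetical helper, not smartctl code.

```python
def next_span(prev_end: int, size: int, total_lbas: int) -> tuple:
    """Next inclusive (start, end) span after one ending at prev_end.

    Assumed rule: same size as before, starting right after the previous
    span, wrapping to LBA 0 past the end of the disk, and clipping the end
    at the last LBA (total_lbas - 1).
    """
    start = prev_end + 1
    if start >= total_lbas:
        start = 0
    end = min(start + size - 1, total_lbas - 1)
    return start, end


TOTAL_LBAS = 15_628_053_168  # WD80PURZ: 8,001,563,222,016 bytes / 512
span = (0, 3_907_013_292)    # SPAN 1 in the selective self-test log above
size = span[1] - span[0] + 1
for week in range(1, 5):
    span = next_span(span[1], size, TOTAL_LBAS)
    print(week, span)
```

Under these assumptions, three more weekly runs cover the rest of the disk and the fourth wraps back to LBA 0.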
smartctl -a /dev/disk/by-id/ata-ST6000VX001-2BD186_ZR13347M
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.2.7-200.fc37.x86_64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Skyhawk
Device Model: ST6000VX001-2BD186
Serial Number: ZR13347M
LU WWN Device Id: 5 000c50 0e3da56c5
Firmware Version: CV12
User Capacity: 6,001,175,126,016 bytes [6.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5319
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Mar 27 19:53:50 2023 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 694) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x70bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   065   006    Pre-fail  Always       -       102841324
  3 Spin_Up_Time            0x0003   093   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       171
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   069   060   045    Pre-fail  Always       -       8637676
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1697h+00m+00.000s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       169
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   057   040    Old_age   Always       -       36 (Min/Max 35/36)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       113
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       433
194 Temperature_Celsius     0x0022   036   043   000    Old_age   Always       -       36 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   080   065   000    Old_age   Always       -       102841324
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1615h+06m+31.297s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       91616742
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       11224582
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Selective offline   Completed without error       00%      1630         -
# 2  Short offline       Completed without error       00%      1561         -
# 3  Short offline       Completed without error       00%      1394         -
# 4  Short offline       Completed without error       00%      1236         -
# 5  Short offline       Completed without error       00%      1081         -
# 6  Short offline       Completed without error       00%       930         -
# 7  Short offline       Completed without error       00%       774         -
# 8  Short offline       Completed without error       00%       705         -
# 9  Selective offline   Completed without error       00%       667         -
#10  Selective offline   Aborted by host               90%       662         -
#11  Short offline       Completed without error       00%       543         -
#12  Short offline       Completed without error       00%       390         -
#13  Short offline       Completed without error       00%       232         -
#14  Extended offline    Completed without error       00%        89         -
#15  Short offline       Completed without error       00%        45         -
#16  Extended offline    Completed without error       00%        31         -
SMART Selective self-test log data structure revision number 1
SPAN     MIN_LBA     MAX_LBA  CURRENT_TEST_STATUS
   1           0  2930261292  Not_testing
   2           0           0  Not_testing
   3           0           0  Not_testing
   4           0           0  Not_testing
   5           0           0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.