[smartmontools-support] Problem while running long device test

David C. Partridge david.partridge at perdrix.co.uk
Sat Nov 17 11:23:34 CET 2018


I have an Adaptec 51245 RAID card. I installed a new device, defined a
simple volume on that device, and then started a long SMART test:

 smartctl -t long -C -d aacraid,0,0,7 /dev/sdc

This started fine, but a few hours later I got this in syslog:

Nov 16 21:25:06 Charon smartd[852]: Device: /dev/sdc [aacraid_disk_00_00_7]
[SCSI], failed to read SMART values
Nov 16 21:25:06 Charon smartd[852]: Sending warning via
/usr/share/smartmontools/smartd-runner to david.partridge at perdrix.co.uk ...
Nov 16 21:25:07 Charon smartd[852]: Warning via
/usr/share/smartmontools/smartd-runner to david.partridge at perdrix.co.uk:
successful
Nov 16 21:25:07 Charon smartd[852]: Device: /dev/sdc [aacraid_disk_00_00_7]
[SCSI], failed to read Temperature
Nov 16 21:25:07 Charon postfix/pickup[4010]: 0112F285835: uid=0
from=<root at perdrix.co.uk>
Nov 16 21:25:07 Charon postfix/cleanup[4762]: 0112F285835:
message-id=<20181116212507.0112F285835 at Charon.perdrix.co.uk>
Nov 16 21:25:07 Charon postfix/qmgr[1541]: 0112F285835:
from=<root at perdrix.co.uk>, size=935, nrcpt=1 (queue active)
Nov 16 21:25:08 Charon postfix/smtp[4766]: 0112F285835:
to=<david.partridge at perdrix.co.uk>, relay=relay.plus.net[212.159.8.107]:587,
delay=1.2, delays=0.03/0.02/1.1/0.05, dsn=2.0.0, status=sent (250
NlbvgAhPbx6b6Nlbwg3EYs mail accepted for delivery)
Nov 16 21:25:08 Charon postfix/qmgr[1541]: 0112F285835: removed
Nov 16 21:25:42 Charon kernel: [16757.128862] aacraid: Host adapter abort
request.
Nov 16 21:25:42 Charon kernel: [16757.128862] aacraid: Outstanding commands
on (1,0,1,0):
Nov 16 21:25:42 Charon kernel: [16757.148889] aacraid: Host adapter reset
request. SCSI hang ?
Nov 16 21:25:57 Charon kernel: [16772.509020] aacraid: Host adapter reset
request. SCSI hang ?
Nov 16 21:25:57 Charon kernel: [16772.509034] aacraid 0000:01:00.0:
outstanding cmd: midlevel-0
Nov 16 21:25:57 Charon kernel: [16772.509038] aacraid 0000:01:00.0:
outstanding cmd: lowlevel-0
Nov 16 21:25:57 Charon kernel: [16772.509041] aacraid 0000:01:00.0:
outstanding cmd: error handler-0
Nov 16 21:25:57 Charon kernel: [16772.509044] aacraid 0000:01:00.0:
outstanding cmd: firmware-1
Nov 16 21:25:57 Charon kernel: [16772.509047] aacraid 0000:01:00.0:
outstanding cmd: kernel-1
Nov 16 21:26:27 Charon kernel: [16802.517230] sd 1:0:1:0: Device offlined -
not ready after error recovery
Nov 16 21:26:27 Charon kernel: [16802.517245] sd 1:0:1:0: [sdb] tag#0 FAILED
Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Nov 16 21:26:27 Charon kernel: [16802.517252] sd 1:0:1:0: [sdb] tag#0 CDB:
Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Nov 16 21:26:27 Charon kernel: [16802.517255] print_req_error: I/O error,
dev sdb, sector 0
Nov 16 21:26:27 Charon kernel: [16802.517265] Buffer I/O error on dev sdb,
logical block 0, async page read
Nov 16 21:27:13 Charon kernel: [16848.025581] aacraid: Host adapter reset
request. SCSI hang ?
Nov 16 21:27:28 Charon kernel: [16863.389743] aacraid: Host adapter reset
request. SCSI hang ?
Nov 16 21:27:28 Charon kernel: [16863.389757] aacraid 0000:01:00.0: Adapter
health - 217
Nov 16 21:27:28 Charon kernel: [16863.389765] aacraid 0000:01:00.0:
outstanding cmd: midlevel-0
Nov 16 21:27:28 Charon kernel: [16863.389768] aacraid 0000:01:00.0:
outstanding cmd: lowlevel-0
Nov 16 21:27:28 Charon kernel: [16863.389771] aacraid 0000:01:00.0:
outstanding cmd: error handler-0
Nov 16 21:27:28 Charon kernel: [16863.389774] aacraid 0000:01:00.0:
outstanding cmd: firmware-2
Nov 16 21:27:28 Charon kernel: [16863.389777] aacraid 0000:01:00.0:
outstanding cmd: kernel-0

The system then hung. I'm wondering whether the problem occurred because the
test was still running in captive (foreground) mode, as requested by -C, so
that smartd's attempt to read SMART data was blocked.
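
One way to check that theory would be to rerun the long test without -C, so
that it runs in background mode, and then poll the self-test log. The
following is only a sketch under assumptions: the device and -d argument are
copied from the command above, DRY_RUN and the run() helper are illustrative
names that are not part of smartmontools, and I have not verified that the
self-test log can be read through this controller while a test is running.

```shell
#!/bin/sh
# Sketch only. DEVICE and DTYPE come from the command in this report;
# DRY_RUN and run() are hypothetical helpers, not part of smartmontools.
DEVICE=${DEVICE:-/dev/sdc}
DTYPE=${DTYPE:-aacraid,0,0,7}
DRY_RUN=${DRY_RUN:-1}   # default: print the commands instead of running them

run() {
    # Print the command line; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Start the long self-test in background mode (note: no -C).
start_long_test() {
    run smartctl -t long -d "$DTYPE" "$DEVICE"
}

# Print the self-test log to check progress or results.
show_selftest_log() {
    run smartctl -l selftest -d "$DTYPE" "$DEVICE"
}

# Abort a self-test that is still running.
abort_test() {
    run smartctl -X -d "$DTYPE" "$DEVICE"
}

start_long_test
show_selftest_log
```

With DRY_RUN left at its default, the script only prints the smartctl
commands it would run; setting DRY_RUN=0 executes them. If the hang does not
recur in background mode, that would point towards captive mode as the cause,
though it would not prove it.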

David




