[smartmontools-support] READ FPDMA QUEUED failure

Mon Dec 17 18:53:29 CET 2018

Hi

(sorry for not replying directly, I don't have the message id handy).

To me, it looks like sdd hit an unreadable area and md already forced
another write to it[1]. Thus a scrub will probably not do anything
special, but it should not hurt either.

However, I would be interested in the output of

smartctl -a /dev/sdd

as this could provide some more information, but my guess is it will
only show "healthy" and one or more reallocated sectors.
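
For what it is worth, here is a minimal Python sketch of how one could
pull the interesting raw counters out of the attribute table that
smartctl prints. It assumes the usual ten-column ATA attribute layout.
The sample text and its values are made up for illustration (they are
not from the disk in this thread), and the attribute names in WATCHED
are the common ones but vary between vendors.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from smartctl attribute table rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


# Made-up sample output, NOT taken from /dev/sdd in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Assumption: these names are common, but not every vendor uses them.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def suspicious(attrs):
    """Return only the watched counters whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)))
```

Feeding it the real output of the smartctl call above instead of SAMPLE
would show whether any of these counters is already non-zero.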

Therefore, for now the array should be healthy enough, but IMHO the
major things to consider are:

* using RAID5 without an orthogonal/internal checksum algorithm is
probably bad for the future - consider moving to RAID6 and/or using ZFS.
As long as a disk fails noisily on a sector it is fine, but imagine an
md scrub finds that within a stripe, data and parity don't agree
anymore and all disks claim the bytes they provided are correct.
Which one is to be believed?

  In RAID6 you have another set of parity to break the tie; in ZFS or
other file systems with internal checksums, those checksums can tell
the FS which part is corrupt and which part is good.

* I think /dev/sdd needs to be monitored closely as it may fail
prematurely in the future.
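
To illustrate the first point, here is a toy Python sketch of a single
stripe with XOR parity. It is only a model of the idea, not how md or
ZFS are implemented: the block contents are invented, and CRC32 merely
stands in for whatever checksum a checksumming file system would keep.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# A toy stripe: three data blocks plus one parity block (invented contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Per-block checksums, kept separately; CRC32 is only a stand-in here.
checksums = [zlib.crc32(block) for block in data]

# Silent corruption: block 1 changes, but the disk reports no read error.
data[1] = b"BxBB"

# Parity alone only says "something in this stripe is inconsistent".
# It cannot say whether a data block or the parity block is the wrong one.
parity_mismatch = xor_blocks(data) != parity

# Per-block checksums identify which block is bad ...
bad = [i for i, block in enumerate(data) if zlib.crc32(block) != checksums[i]]

# ... so it can be rebuilt from the remaining data blocks and the parity.
i = bad[0]
repaired = xor_blocks([b for j, b in enumerate(data) if j != i] + [parity])

if __name__ == "__main__":
    print(parity_mismatch, bad, repaired)
```

Without the checksums list, the only information left is the parity
mismatch itself, which is exactly the "which one is to be believed?"
situation.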

Apart from that, I think there is not much you can do right now.

Cheers
Carsten

[1] I am not even sure if this is a 4k or 512-byte native
HDD as the docs are pretty thin on this topic. Knowing this could help
in understanding whether a single native block (8 512-byte blocks) is
gone or if "only" 6 adjacent sectors were corrupted.

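The arithmetic behind [1] can be sketched in Python as follows. The
sector numbers are hypothetical, the mapping assumes the physical blocks
are aligned to logical sector 0 (no alignment offset), and the sysfs
paths in read_block_sizes() are the ones Linux normally exposes under
/sys/block/<dev>/queue/.

```python
def physical_blocks(bad_lbas, logical_size=512, physical_size=4096):
    """Map bad logical sector numbers to {physical_block_index: [lbas]}.

    Assumes physical blocks are aligned to logical sector 0.
    """
    per_block = physical_size // logical_size  # 8 for 512-byte on 4k
    groups = {}
    for lba in sorted(bad_lbas):
        groups.setdefault(lba // per_block, []).append(lba)
    return groups


def read_block_sizes(dev):
    """Return (logical, physical) block size in bytes for e.g. dev='sdd' (Linux only)."""
    base = "/sys/block/{}/queue/".format(dev)
    with open(base + "logical_block_size") as f:
        logical = int(f.read())
    with open(base + "physical_block_size") as f:
        physical = int(f.read())
    return logical, physical


if __name__ == "__main__":
    # Hypothetical sector numbers: six adjacent bad 512-byte sectors.
    print(physical_blocks(range(1000, 1006)))  # all inside one 4k block
    print(physical_blocks(range(1004, 1010)))  # straddles two 4k blocks
```

If all six sectors fall into one physical block, a single native block
going bad is the more likely story; if they straddle two, it looks more
like a run of adjacent damage.
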
-- 
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany
Phone: +49 511 762 17185