[smartmontools-support] pending sectors behind RAID controller, force correction???

Bruce Allen bruce.allen at aei.mpg.de
Sat Jul 28 07:40:30 CEST 2018


Hi David,

I forgot to add, these days in my own clusters we are using Linux
software RAID, which works well with smartmontools and lets you take
good advantage of the SMART feature set and problem reporting.  In our
testing, Linux software RAID (on reasonably modern CPUs and disks) had
BETTER performance than hardware RAID cards.  So these days we are not
buying hardware RAID systems.
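
As a rough illustration of the problem reporting mentioned above: with
software RAID the member disks are directly visible to smartctl, so the
attribute table can be checked by a script.  The following is only a
minimal sketch.  The SAMPLE text is made-up illustrative data in the
usual layout of "smartctl -A" output (not taken from any real disk), and
the function names and the WATCHED list are hypothetical choices.

```python
# Minimal sketch: pick raw values out of a "smartctl -A" style attribute table.
# SAMPLE is invented illustrative data, not output from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       4
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Attributes to flag when their raw value is non-zero (an assumed selection).
WATCHED = ("Reallocated_Event_Count", "Current_Pending_Sector")


def parse_raw_values(report):
    """Return {attribute_name: raw_value} for rows with a plain integer raw value."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() test.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                values[fields[1]] = int(raw)
    return values


def needs_attention(values):
    """Return the watched attributes whose raw value is greater than zero."""
    return {name: values[name] for name in WATCHED if values.get(name, 0) > 0}


if __name__ == "__main__":
    print(needs_attention(parse_raw_values(SAMPLE)))
```

In practice the text would come from running "smartctl -A" on each member
disk rather than from a fixed string.  Raw values that are not plain
integers are simply skipped here, which may or may not be what you want.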

Cheers,
	Bruce



On 28.07.18 07:36, Bruce Allen wrote:
> Hi David,
> 
> These are good questions.  Unfortunately I don't have the answers.  (You
> are sufficiently informed and experienced that any questions you have
> are probably ones for which there is no group knowledge!)
> 
> But the good news is that most hardware vendors have realized that
> supporting SMART features (at least for knowledgeable end users) is a
> worthwhile thing to do.  In addition, many of these vendors use
> smartmontools internally, so they share a common language and
> understanding with us.
> 
> So I suggest that you write a carefully worded letter to the company
> that made your RAID card, and ask them how you can force the pending
> (bad) sectors to be overwritten.
> 
> Cheers,
> 	Bruce
> 
> 
> 
> 
> 
> On 09.07.18 19:37, David Mathog wrote:
>> Greetings all.
>>
>> Routine smartctl runs on a machine turned up a physical disk with 4
>> pending sectors and 4 reallocated events counted.  The problem is that
>> this disk is in a RAID array, but the RAID controller will not talk to
>> the usual RAID control software.  That issue is documented here:
>>
>> https://serverfault.com/questions/919209/megacli-commands-return-exit-code-0x00-with-perc-h200
>>
>>
>> So, my question is, how does one kick the RAID in a situation like this
>> to make it write to those bad blocks so that they will be swapped out
>> and the sectors repaired from the redundant information in the RAID set?
>>
>> The normal "determine where the blocks are in the file system and
>> overwrite them" methods don't apply here since the file system is on a
>> virtual disk, so there is no way of knowing what is in the affected
>> blocks.  The blocks could even be part of the underlying RAID structure.
>> It has been more than 7 days, so normally a patrol read should have
>> been run, which should have caught and fixed this.  Apparently not.
>>
>> I am not very confident that the BIOS/console level tools are going to
>> work right.  Since the supposedly supported perccli software won't talk
>> to the controller, the console stuff may not be working right either.
>> The system was rebooted once, and luckily it came back up, but that
>> didn't let perccli talk to the controller.
>>
>> Thanks,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>> _______________________________________________
>> Smartmontools-support mailing list
>> Smartmontools-support at listi.jpberlin.de
>> https://listi.jpberlin.de/mailman/listinfo/smartmontools-support
> 

-- 
--------------------------------------------------------------------------
Prof. Dr. Bruce Allen, Director
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Callinstrasse 38
D-30167 Hannover,  Germany
Tel +49-511-762-17145
Fax +49-511-762-17182
Email: bruce.allen at aei.mpg.de


