
How is that any different from standard RAID? That's exactly the problem RAID was created to solve...


ZFS & btrfs detect and fix silent corruption - where no errors are emitted from the hardware.
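Roughly, the trick is a checksum stored alongside every block and verified on every read. A minimal sketch of the idea in Python (not ZFS's actual on-disk layout; the block size and in-memory "checksum table" are just for illustration):

    import hashlib, os

    BLOCK_SIZE = 4096            # illustrative block size
    blocks, checksums = {}, {}   # toy block store + checksum table

    def write_block(block_id, data):
        blocks[block_id] = data
        checksums[block_id] = hashlib.sha256(data).digest()

    def read_block(block_id):
        data = blocks[block_id]
        # The "drive" returned data without any error; verify it anyway.
        if hashlib.sha256(data).digest() != checksums[block_id]:
            raise IOError("silent corruption detected in block %d" % block_id)
        return data

    # Flip one bit behind the filesystem's back, then read:
    write_block(0, os.urandom(BLOCK_SIZE))
    blocks[0] = blocks[0][:100] + bytes([blocks[0][100] ^ 1]) + blocks[0][101:]
    read_block(0)   # raises IOError: the checksum exposes the corruption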

I think the pertinent question is: when the filesystem goes to read a 4K block, and one drive's copy of that block in the RAID-1 set differs from its counterpart 4K block on another disk, which one wins?


I didn't specify RAID-1. RAID-5 or RAID-6 can reconstruct the correct value after a silent failure.

Honest question: how often do drives silently fail? Drives contain per-sector checksums these days, explicitly to prevent this problem.


I don't know how often they fail, but I will say that the failures that I hope I never see again (or at least, I hope I never see again outside of a ZFS system) are not drive failures, but those involving intermittent disk controller or backplane faults. In comparison to the chaos I've seen this cause on NTFS systems, ZFS copes astonishingly well.


A normal RAID does not verify its redundancy (parity or mirror copies) on read. It only uses it after a device failure.

Also, it may have copies of the data (e.g. RAID-1), but it does not know which copy is correct if they differ.
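To make the difference concrete, here's a toy comparison in Python (hypothetical helpers, not mdadm or ZFS internals): a bare mirror can only see that two copies disagree, while a block checksum lets the filesystem pick the good copy and heal the bad one.

    import hashlib

    def read_plain_mirror(copy_a, copy_b):
        # Classic RAID-1: no independent checksum, so a disagreement is
        # detectable (if you even compare the copies) but not resolvable.
        if copy_a != copy_b:
            raise IOError("mirror copies disagree; no way to tell which is right")
        return copy_a

    def read_checksummed_mirror(copy_a, copy_b, expected_sha256):
        # ZFS-style: the checksum stored with the block pointer picks the
        # winner; the losing copy would then be rewritten from the good one.
        for good, bad in ((copy_a, copy_b), (copy_b, copy_a)):
            if hashlib.sha256(good).digest() == expected_sha256:
                return good, (bad != good)   # (data, needs_self_heal)
        raise IOError("both copies bad; fall back to parity or backups")

    good = b"A" * 4096
    bad  = b"A" * 4095 + b"B"
    print(read_checksummed_mirror(bad, good, hashlib.sha256(good).digest()))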


No, every default mdadm install performs a complete scrub on the first Sunday of the month. Every block of the array is read back and validated. For RAID modes with parity (e.g. RAID-5, RAID-6) it can detect the inconsistency and repair the offending blocks when a silent error occurs. You can trigger such a scrub whenever you want (I run mine once a week).
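If you want to poke at it outside the cron schedule, the kernel exposes the scrub through sysfs. A small sketch, assuming the array is /dev/md0 and the script runs as root:

    from pathlib import Path

    md = Path("/sys/block/md0/md")   # adjust for your array

    def start_check():
        # "check" reads every stripe and counts inconsistencies;
        # "repair" would also rewrite parity/mirrors to make them agree.
        (md / "sync_action").write_text("check")

    def status():
        return {
            "sync_action": (md / "sync_action").read_text().strip(),
            "mismatch_cnt": int((md / "mismatch_cnt").read_text()),
        }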


Scrubbing the entire raid volume is significantly different from scrubbing every piece of data as it gets written/read.

First, in between your monthly/weekly scrubs your disks/controllers will be silently corrupting data, possibly generating errors on multiple devices and resulting in data loss, depending on the RAID type. ZFS detects corruption much more quickly.

Second, traditional RAID recovery rewrites the entire device to fix a single block. Say you're using RAID-5 and rewriting parity, and you hit another block error: now you've lost everything. Since disks have an uncorrectable block error rate of 1 in 10^15 bits, you only need a moderately sized array to almost guarantee data loss. ZFS rewrites corrupt data on the fly.
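Back-of-the-envelope version of that claim, assuming independent errors at the quoted 1-in-10^15-bits rate (the drive counts and sizes below are just example figures):

    import math

    URE_RATE = 1e-15   # unrecoverable read error rate: 1 per 1e15 bits read

    def p_ure_during_rebuild(n_drives, drive_tb):
        # Probability of at least one URE while reading every surviving disk
        # to rebuild one failed member of a RAID-5 array.
        bits_read = (n_drives - 1) * drive_tb * 1e12 * 8   # TB -> bits
        return -math.expm1(bits_read * math.log1p(-URE_RATE))

    for n, tb in [(4, 4), (6, 8), (8, 12)]:
        print("%d x %d TB: %.0f%% chance of a URE during rebuild"
              % (n, tb, 100 * p_ure_during_rebuild(n, tb)))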


Every time you read from or write to a RAID volume, it does perform validation and write-back on error detection. I think your mental model of how Linux software RAID works needs updating.

I'm not trying to argue that mdadm is better than ZFS, just that in this case they behave pretty much the same.


If the _drive_ reports a read error, it will. If there is silent data corruption, it won't. You can test this by using dd to corrupt the underlying data on a drive.
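Rough shape of that test in Python (destructive: only on a throwaway array built from loop devices; the device path and offset are placeholders, and the offset has to land in the data area past the md superblock):

    import os

    MEMBER = "/dev/loop1"          # one member of a scratch test array
    OFFSET = 64 * 1024 * 1024      # somewhere in the data area (placeholder)

    def corrupt_member():
        # Same effect as dd'ing /dev/urandom over one block of the member.
        fd = os.open(MEMBER, os.O_WRONLY)
        try:
            os.pwrite(fd, os.urandom(4096), OFFSET)
        finally:
            os.close(fd)

    # Then read the affected file back through the array (or trigger a
    # "check" scrub and watch mismatch_cnt): plain md returns the garbage
    # without complaint, while a checksumming filesystem flags or repairs it.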


Hrm. I'm going to test that.



