Hacker News

This is awesome! I love seeing companies run their own infrastructure. I wonder if they are using ZFS or just traditional RAID?


This is mentioned in the video: they don't use any form of RAID, just paired/mirrored drives in different physical locations.

This is preferred for its simplicity and performance, and if I were in their position I would do the same thing.


Exactly correct. I suspect that in the not-too-distant future we will need to move to multi-disk filesystem-level clustering on the storage nodes for some of the reasons laid out in the talk but it's not at all unlikely that we retain the "disconnected mirror" abstraction for redundancy.

Additionally, as I alluded to in the video, we regularly end up going down into the details, and when that happens, being able to closely examine and understand a disk's contents as written directly at the LBA (using FIBMAP and similar tooling) is invaluable. I have personally been involved in the discovery of multiple possible-data-loss hard drive firmware bugs, and our catalog system is very paranoid about integrity checking. Modern hard drives are computers unto themselves, and layering additional complexity on top of that (particularly complexity which might obscure or silently correct errors in the underlying datastream) is something I approach very carefully.
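To make the "paranoid about integrity checking" idea concrete, here is a minimal sketch of manifest-based verification: hash each file on disk and compare against previously recorded checksums. The manifest format and function names are hypothetical, not the Internet Archive's actual catalog schema.

```python
# Sketch: verify files against a stored checksum manifest.
# Manifest shape (illustrative): {relative_path: sha256_hex_digest}
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archive items never need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def verify_against_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths whose current on-disk checksum disagrees with the manifest."""
    return [
        rel
        for rel, expected in manifest.items()
        if sha256_of(root / rel) != expected
    ]
```

In practice a system like this would run such sweeps continuously and flag mismatches for repair from the mirror copy.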


Hey Jonah,

Perhaps you already covered this in the video and I missed it, but I was wondering how the team goes about hardware upgrades/disposal? I maintain just a few servers in my own homelab, so upgrading is pretty trivial for me, but I can't imagine what it's like at that scale. Also, which hypervisor are you using to manage all of those VMs?

Thanks again for the talk, it was very insightful and fun to watch!


Thank you, so glad you enjoyed it!

We try to keep a regular upgrade cycle (to hold to our tight budget), typically with a tick-tock of adding new hardware (expanding within our existing footprint) and cycling out old hardware and drives as they reach the far end of the works/doesn't-work spectrum. We have a local partner who takes care of some disposal for us, but we also have no shortage of physical storage space, so we tend to accumulate, sometimes intentionally: nearly our entire "red box" deployment (see "previous version" at https://archive.org/web/petabox.php) is packed into a shipping container. We don't like to throw things away!

For a hypervisor, we use Ganeti (running over KVM). Because our fleet is so heterogeneous, we need to be able to control a lot of VM parameters in order to pack our computational resources efficiently, and Ganeti is in a sweet spot for us: it provides a lot more tooling than a pile of virsh scripts, while being much smaller than systems like OpenStack that are geared towards large, homogeneous deployments.
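For readers unfamiliar with Ganeti, instance creation looks roughly like the command below. Everything here (disk template, OS definition, sizes, hostname) is an illustrative sketch, not the IA's actual configuration; the `-B` backend parameters are the per-VM knobs that let you pack instances onto heterogeneous nodes.

```shell
# Sketch of creating a KVM instance under Ganeti (illustrative values).
# -t drbd : mirror the instance disk across two cluster nodes via DRBD
# -o ...  : Ganeti OS definition used to install the guest
# -s 20G  : disk size
# -B ...  : backend parameters (memory/vcpus), tunable per instance
gnt-instance add -t drbd -o debootstrap+default -s 20G \
  -B memory=4G,vcpus=2 vm1.example.com
```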


What's the difference between RAID and paired/mirrored drives? Is the latter better for NAS?


RAID is an abstraction layer on top of the physical disks.

RAID 1 (mirrored drives) is similar to what the IA have described, but it sounds like they are creating their mirrors using simple file system commands (or tools like rsync) rather than introducing the complexity and overhead of hardware/software RAID.
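A minimal sketch of that "mirroring with simple tools" approach, using rsync to keep a byte-for-byte copy on an independent machine with no RAID layer involved (host names and paths here are hypothetical):

```shell
# Keep a plain mirror of an item store on a second machine.
rsync -a --delete /srv/items/ mirror-host:/srv/items/

# Periodically re-verify by full checksum rather than trusting
# size/mtime; --dry-run --itemize-changes reports any divergence
# without modifying either side.
rsync -a --checksum --dry-run --itemize-changes \
  /srv/items/ mirror-host:/srv/items/
```

The appeal is that either copy is an ordinary filesystem you can mount and inspect directly, with no array metadata in the way.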

Other forms of RAID (e.g. RAID 5/6), where data is spread across an array of drives with parity, would give the IA more usable capacity per drive, but at the expense of significantly increased complexity and correlated failure risk within an array.
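To illustrate how parity RAID recovers data (again, not what the IA does, since they avoid RAID entirely): RAID 5 stores the XOR of the data stripes as a parity block, and a lost drive's stripe is reconstructed by XORing the surviving stripes with the parity. A toy sketch:

```python
# Toy RAID 5 parity demo: three "data drives" plus one parity block.
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


data = [b"AAAA", b"BBBB", b"CCCC"]  # stripes on three data drives
parity = xor_blocks(*data)          # stored on a fourth drive

# If drive 1 dies, XORing the survivors with parity reconstructs it:
recovered = xor_blocks(data[0], data[2], parity)
assert recovered == data[1]
```

Real implementations work on fixed-size stripes and rotate the parity block across drives (and RAID 6 adds a second, differently computed parity block to survive two failures), but the XOR relationship is the core idea.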



