All of the data is migrated over to alternate storage in a way which is easily retrievable, and in a form where "restoring from backup" is frequently tested.
The thing that's missing is retention of old data, but I can tell you that is fraught with its own complications. A week-old repository tarball is almost worse than useless in the context of the git repository; we'd sooner restore that data by having a developer re-run "git push" than to lose a week's worth of development.
And that's assuming that a daily or weekly tarball isn't itself corrupt, which would have been the case here unless we had run git-fsck before making the copy (which is what we thought was already being run, in some fashion, in the first place).
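Just to illustrate the idea (this is a made-up sketch, not our actual tooling, and the paths are invented): a snapshot job only has value if it refuses to overwrite the last known-good copy when git-fsck fails.

    #!/bin/sh
    # Hypothetical sketch: gate a tarball snapshot on git-fsck passing.
    # REPO and DEST are invented paths, not the real infrastructure.
    REPO=/srv/git/project.git
    DEST=/backup/snapshots
    if git --git-dir="$REPO" fsck --full; then
        tar -czf "$DEST/project-$(date +%Y%m%d).tar.gz" -C "${REPO%/*}" "${REPO##*/}"
    else
        echo "git-fsck failed; keeping the previous snapshot" >&2
        exit 1
    fi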
I do fully agree that there needs to be more intelligence on the anongit side of the servers if they're to be used as viable backups instead of just sync destinations, but everyone keeps mentioning solutions to problems we don't have or null solutions to problems we actually have.
Despite what everyone seems to think we have multiple other backups of the source data (including tarball-style), but they're all crap in comparison to being able to recover from anongit.
>> A week-old repository tarball is almost worse than useless in the context of the git repository; we'd sooner restore that data by having a developer re-run "git push" than to lose a week's worth of development.
The risk of using mirroring rather than versioned backups is that you lose all the data when a deletion is mirrored.
Yes, which is why the mirrors were affected and not the thousands of individual developers' clones, nor the existing tarball snapshots.
And even that was the result of a deliberate decision on the sysadmins' part, based on a misunderstanding of how git clone --mirror responds to a corrupt repo, not some simple oversight. Which is to say, countermeasures will be put in place for that as well.
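To sketch the sort of countermeasure I mean (a hypothetical script with an invented URL and paths, not what will actually be deployed): sync into a staging copy and only swap it into place once it passes git-fsck and is demonstrably non-empty.

    #!/bin/sh
    # Hypothetical sketch: never let a corrupt or empty upstream clobber
    # the existing anongit mirror. URL and paths are invented.
    UPSTREAM=git://git.example.org/project.git
    LIVE=/srv/anongit/project.git
    STAGE=/srv/anongit/project.git.staging

    rm -rf "$STAGE"
    git clone --mirror "$UPSTREAM" "$STAGE" || exit 1
    if git --git-dir="$STAGE" fsck --full && \
       [ "$(git --git-dir="$STAGE" rev-list --all --count)" -gt 0 ]; then
        rm -rf "$LIVE" && mv "$STAGE" "$LIVE"
    else
        echo "upstream looks corrupt or empty; keeping the existing mirror" >&2
        exit 1
    fi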
I do wish people understood why relying on even 2-week-old backups is unacceptable in the context of a large, active FOSS project's source code repository; it's not as if we could simply start over again from KDE 4.9.4.
>> I do wish people understood why relying on even 2-week-old backups is unacceptable
Yes, but I'm not convinced you see that this is EXACTLY what you are exposing yourself to.
What if next time it's a (nasty) bug in git? A push causes corruption perhaps?
Drop the idea of using git itself to host the backup strategy. Switch to plain old backups; if space or performance is an issue (I'd wager the KDE git repos must total a good few hundred GB, if not more, if there's artwork or other binaries in there too), there would be nothing wrong with incrementals for the */30 min backups.
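If you want something concrete, here is one way half-hourly incrementals could look (a rough sketch with invented paths and retention; rsync's --link-dest makes unchanged files cost no additional space):

    #!/bin/sh
    # Hypothetical sketch: half-hourly hard-linked snapshots with rsync.
    # SRC, DEST and the 14-day retention are invented for illustration.
    SRC=/srv/git/
    DEST=/backup/git-snapshots
    NOW=$(date +%Y%m%d-%H%M)

    mkdir -p "$DEST/$NOW"
    rsync -a --delete --link-dest="$DEST/latest" "$SRC" "$DEST/$NOW/"
    ln -sfn "$NOW" "$DEST/latest"

    # Expire snapshots older than 14 days so history stays bounded.
    find "$DEST" -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} +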