
Reading those stories makes me realize how well thought-out the process at my work is:

We have dev databases (one of which was recently found empty; nobody knows why, but that's another matter), then a staging environment, and finally production. The staging database runs on a weaker machine than the prod database, so before any schema change goes into production, we time it in staging to get a rough upper bound on how long it will take, how much disk space it uses, etc.
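A minimal sketch of that kind of timing run, in Python. It uses an in-memory SQLite database purely as a stand-in so the example runs anywhere; the function name `time_schema_change` and the sample statement are hypothetical, and against a real staging server you would pass a connection from whatever DB-API driver you use.

```python
import sqlite3
import time


def time_schema_change(conn, statements):
    """Run the given DDL statements and return the elapsed wall-clock seconds.

    `conn` is any DB-API connection; SQLite is only used here as a stand-in
    for the staging database.
    """
    start = time.perf_counter()
    cur = conn.cursor()
    for statement in statements:
        cur.execute(statement)
    conn.commit()
    return time.perf_counter() - start


# Stand-in for the staging database (assumption: yours is a real server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")

elapsed = time_schema_change(conn, ["ALTER TABLE customers ADD COLUMN name TEXT"])
print(f"schema change took {elapsed:.3f}s on staging")
```

Since staging is the weaker machine, the measured time can be read as a rough upper bound for prod, not a prediction.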

We also have a monthly sync from prod to staging, so the staging db isn't much smaller than the prod db.

And the small team of developers occasionally decides to do a restore of the prod db in the development environment.

The downside is that we can't easily keep sensitive production data from finding its way into the development environment.



When moving data from prod to other environments, consider a scrambler. E.g., replace all customer names with names generated from census data.

I try to keep the data in the same form (e.g., length, number of records, similar relationships, so that it still looks like production data), but random enough that if the data ever leaks, we don't have to apologize to everybody.
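One way to sketch such a scrambler in Python: derive the fake name from a keyed hash of the real one, so the same customer always maps to the same replacement and joins between tables still line up. The name pools, the secret, and the function name `scramble_name` are all made up for illustration; a real pool would be loaded from census data, and this sketch does not try to preserve field length.

```python
import hashlib
import hmac

# Hypothetical, tiny name pools; in practice load these from census data.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Erin", "Frank"]
LAST_NAMES = ["Smith", "Jones", "Garcia", "Miller", "Davis", "Lopez"]


def scramble_name(real_name: str, secret: bytes) -> str:
    """Map a real name to a fake one, deterministically for a given secret."""
    digest = hmac.new(secret, real_name.encode("utf-8"), hashlib.sha256).digest()
    # Use separate digest bytes to pick the first and last name.
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


# The same input always yields the same fake name within one run of the job.
secret = b"rotate-me-per-export"  # hypothetical key; keep it off the dev box
print(scramble_name("Jane Q. Customer", secret))
```

With pools this small, many customers collapse onto the same fake name, so use a much larger list if name uniqueness matters to your tests.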

Since your handle is perlgeek, you're already well equipped to do a streaming transformation of your SQL dump. :)


Yep. For x.com I wrote a simple cron job that sterilizes the automated database dump and sends it to the dev server. Roughly, it's like this:

- cp the dump to a new working copy

- sed out cache and tmp tables

- Replace all personal user data with placeholders. This part can be tricky, because you have to find everywhere this lives (are form submissions stored and do they have PII?)

- Some more sed to deal with actions/triggers that are linked to production's db user specifically.

- Finally, scp the sanitized dump to the dev server, where it awaits a Jenkins job to import the new dump.

The cron job happens on the production DB server itself overnight (keeping the PII exposure at the same level it is already), so we don't even have to think about it. We've got a working, sanitized database dump ready and waiting every morning, and a fresh prod-like environment built for us when we log on. It's a beautiful thing.
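For anyone who would rather not chain sed calls, the filtering steps above can be sketched as a single streaming pass in Python. This is not the actual job, just an illustration under loud assumptions: a MySQL-style dump with one statement per line, cache/tmp tables identifiable by a `cache` or `tmp` name prefix, email addresses as the only PII, and production-specific users appearing in `DEFINER` clauses. The function name `sanitize_lines` and the placeholder values are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumption: data for cache/tmp tables arrives as single-line INSERT statements.
SKIP_DATA_FOR = re.compile(r"^INSERT INTO `(cache|tmp)[^`]*`")
# Assumption: emails are the only PII; real dumps need more rules than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumption: triggers/views reference the production user via DEFINER.
DEFINER = re.compile(r"DEFINER=`[^`]+`@`[^`]+`")


def sanitize_lines(
    lines: Iterable[str], dev_user: str = "dev", dev_host: str = "localhost"
) -> Iterator[str]:
    """Yield sanitized dump lines one at a time, without loading the whole dump."""
    for line in lines:
        if SKIP_DATA_FOR.match(line):
            continue  # drop cache/tmp table contents entirely
        line = EMAIL.sub("user@example.invalid", line)
        line = DEFINER.sub(f"DEFINER=`{dev_user}`@`{dev_host}`", line)
        yield line


sample = [
    "INSERT INTO `cache_page` VALUES (1,'stale');\n",
    "INSERT INTO `users` VALUES (1,'bob@corp.com');\n",
    "CREATE DEFINER=`prod`@`%` TRIGGER t BEFORE INSERT ON users FOR EACH ROW SET NEW.id = 1;\n",
]
for out_line in sanitize_lines(sample):
    print(out_line, end="")
```

In a cron job you would feed it the dump file or `sys.stdin` instead of `sample` and write the result to the file that gets copied to the dev server. Because it is a generator, memory use stays flat no matter how large the dump is.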


This sounds like it'd make a good blog post.



