
The sub-linked legal document about Frank’s synthetic data generation was quite interesting, specifically how much difficulty their hired consultant (a Data Science Professor) had in creating it.

It can indeed be tricky to do in a manner that’s both fast and accurate, but it’s absolutely possible once you have the right datasets, which aren’t even that large. U.S. ZIP codes, telephone area codes (with enough out-of-place ones to mimic people who’ve moved and kept their cell phone number), common names, and a word list will get you rows that look plausible. Street addresses that actually match the ZIP code require a much larger dataset, but again, it’s not impossible.
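To make that concrete, here is a rough Python sketch of the idea, not what anyone in the case actually did. The lookup tables are tiny hand-picked samples standing in for the real datasets, and every name in it (ZIP_TABLE, make_row, generate_rows, moved_rate) is made up for illustration. The 15% "moved but kept the number" rate is a guess, not a measured figure.

```python
import random

# Tiny illustrative samples; a real run would load full public datasets.
# Each ZIP maps to (city, state, area codes that are plausible locally).
ZIP_TABLE = {
    "10001": ("New York", "NY", ["212", "646", "917"]),
    "60601": ("Chicago", "IL", ["312", "773"]),
    "94103": ("San Francisco", "CA", ["415", "628"]),
    "73301": ("Austin", "TX", ["512", "737"]),
}
FIRST_NAMES = ["James", "Maria", "Robert", "Linda", "David", "Patricia"]
LAST_NAMES = ["Smith", "Johnson", "Garcia", "Miller", "Davis", "Nguyen"]
WORDS = ["maple", "harbor", "cobalt", "meadow", "lantern"]


def make_row(rng, moved_rate=0.15):
    """Build one plausible-looking row from the lookup tables."""
    zips = sorted(ZIP_TABLE)
    zip_code = rng.choice(zips)
    city, state, local_codes = ZIP_TABLE[zip_code]

    if rng.random() < moved_rate:
        # Mimic someone who moved and kept their cell number:
        # borrow an area code from a different ZIP's region.
        other_zip = rng.choice([z for z in zips if z != zip_code])
        area = rng.choice(ZIP_TABLE[other_zip][2])
    else:
        area = rng.choice(local_codes)

    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # Roughly NANP-shaped: the exchange's first digit is 2-9.
    phone = f"{area}-{rng.randint(200, 999)}-{rng.randint(0, 9999):04d}"
    # ".example" is a reserved TLD, so these addresses cannot be real.
    email = (
        f"{first.lower()}.{last.lower()}{rng.randint(1, 99)}"
        f"@{rng.choice(WORDS)}.example"
    )
    return {
        "first_name": first,
        "last_name": last,
        "email": email,
        "phone": phone,
        "city": city,
        "state": state,
        "zip": zip_code,
    }


def generate_rows(n, seed=None, moved_rate=0.15):
    """Generate n rows; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [make_row(rng, moved_rate) for _ in range(n)]
```

Each row is a handful of list lookups, so this stays fast even for millions of rows. The part this sketch skips entirely is street addresses, which is where the much larger dataset comes in.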


