
Very true. If you've solved the labeling/extraction problem using a means other than ML, you can use that means to generate synthetic data. The situation at my company is exactly this.

Say you use regular expressions to extract sensitive data from form documents that are standardized but come in many variants. The pieces of information extracted fall into very common classes of data: first name, last name, dates, physical locations.

During the extraction process you can save the complement of the extraction (the "leftovers") and insert generated data at the extraction points. Also, because you've extracted the actual sensitive data, you can exclude that from the set of values used for generation, if it's practical.
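To make that concrete, here is a rough sketch of the idea in Python. The form layout, the `FORM_PATTERN` regex, the `VALUE_POOLS` lists, and the function names are all made up for illustration; a real setup would have one pattern per form variant and much larger value pools.

```python
import random
import re

# Hypothetical pattern for a single form layout (an assumption for this sketch).
FORM_PATTERN = re.compile(
    r"First name: (?P<first_name>\w+)\nLast name: (?P<last_name>\w+)"
)

# Hypothetical pools of replacement values, keyed by data class.
VALUE_POOLS = {
    "first_name": ["Alice", "Bob", "Carol", "Dave"],
    "last_name": ["Nguyen", "Okafor", "Smith", "Varga"],
}


def split_document(text):
    """Extract the sensitive values and keep the leftovers around them."""
    match = FORM_PATTERN.search(text)
    if match is None:
        raise ValueError("text does not match the form pattern")
    values = match.groupdict()
    # Order the labels by where they appear in the document.
    labels = sorted(values, key=match.start)
    leftovers = []
    cursor = 0
    for label in labels:
        start, end = match.span(label)
        leftovers.append(text[cursor:start])  # text before this extraction point
        cursor = end
    leftovers.append(text[cursor:])  # text after the last extraction point
    return labels, values, leftovers


def synthesize(labels, leftovers, real_values, rng):
    """Rebuild the document with generated values at the extraction points."""
    parts = [leftovers[0]]
    for label, tail in zip(labels, leftovers[1:]):
        # Exclude the real sensitive value from the candidates.
        candidates = [v for v in VALUE_POOLS[label] if v != real_values[label]]
        parts.append(rng.choice(candidates))
        parts.append(tail)
    return "".join(parts)


doc = "Form 12A\nFirst name: Alice\nLast name: Smith\nSigned.\n"
labels, values, leftovers = split_document(doc)
print(synthesize(labels, leftovers, values, random.Random(0)))
```

Since you know which label went into which slot, every synthetic document comes with its labels for free.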

Sometimes people get so caught up in the math and theory that they fail to see the practical solutions.


