treybrick's comments

treybrick · on July 1, 2024

Disclaimer: I work at Tonic AI

We have a product called Ephemeral (in some ways similar to Neon) that orchestrates a Kubernetes cluster to spin ephemeral databases up and down from "snapshots" for this exact use case.

We have a more established product called Structural that de-identifies and anonymizes production data, and we have a pretty clean integration between the two products that makes creating your fleet of ephemeral testing databases nice and easy.

In case you're interested -> https://www.tonic.ai/ephemeral

treybrick · on May 28, 2024

What types of files and databases does this integrate with? Most of my files are in S3, and many of them are messy PDFs made from scanning physical documents. Do I need to standardize them all to TXT or CSV files or something to get them to work right?

joewferrara · on May 28, 2024

Right now we support the txt, csv, tsv, docx, xlsx, pdf, png, tif, tiff, jpg, and jpeg file types. We support either local files or AWS S3 as the document store the files are read from, so we can work with your messy files in S3 as they are, without any standardizing!
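If you want to check ahead of time which objects in a bucket fall into that list, a quick pre-flight filter over object keys could look like the sketch below. It assumes support is decided purely by file extension (case-insensitive), which the reply doesn't actually state; `is_supported` and `partition_keys` are hypothetical helper names, not part of the product.

```python
from pathlib import PurePosixPath

# Extensions listed in the reply as supported.
# Assumption: support is determined by file extension alone, case-insensitive.
SUPPORTED_EXTENSIONS = {
    "txt", "csv", "tsv", "docx", "xlsx", "pdf",
    "png", "tif", "tiff", "jpg", "jpeg",
}


def is_supported(key: str) -> bool:
    """Hypothetical helper: True if the key's final extension is in the list."""
    suffix = PurePosixPath(key).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS


def partition_keys(keys):
    """Hypothetical helper: split S3 keys (or local paths) by supported status."""
    supported = [k for k in keys if is_supported(k)]
    unsupported = [k for k in keys if not is_supported(k)]
    return supported, unsupported


if __name__ == "__main__":
    keys = ["scans/contract-001.PDF", "notes/readme.md", "tables/q1.xlsx"]
    ok, skipped = partition_keys(keys)
    print("supported:", ok)
    print("unsupported:", skipped)
```

The keys could come from any S3 listing; uppercase extensions from scanners (e.g. `.PDF`, `.TIF`) still match because the suffix is lowercased before the lookup.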