alexwatson405's comments

Hi all, I’m a co-founder of Gretel; our team and tech are now part of NVIDIA.

NeMo Data Designer started as our core product at Gretel and is now the internal framework we use heavily for both pre- and post-training data in Nemotron, across a variety of use cases.

The OSS version is fully general-purpose: Python-first, modular, and designed so you can mix statistical samplers, LLM columns, and seed datasets in a single pipeline.
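
If it helps make "mix statistical samplers, LLM columns, and seed datasets" concrete, here's a rough sketch of the idea in plain Python. This is not the actual Data Designer API -- every name below is illustrative, so check the repo for the real interface:

    # Concept sketch only; not the Data Designer API. All names illustrative.
    import random
    import pandas as pd

    seed = pd.DataFrame({"name": ["Ada", "Grace"]})      # seed dataset

    def age_sampler(n):                                  # statistical sampler column
        return [max(18, int(random.gauss(40, 12))) for _ in range(n)]

    seed["age"] = age_sampler(len(seed))

    def llm_column(template, rows, call_llm):            # LLM-generated column
        return [call_llm(template.format(**row)) for row in rows]

    seed["support_ticket"] = llm_column(
        "Write a short support ticket from {name}, age {age}.",
        seed.to_dict("records"),
        call_llm=lambda prompt: "(model output here)",   # stub; plug in any LLM client
    )

The actual library obviously does much more; this is just the basic shape of a column-wise pipeline.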

Happy to answer questions or hear feedback on missing features.


Hey there! Co-founder of Gretel.ai here, and I think I can provide some insights on this topic.

Firstly, the concept you're hinting at is not purely traditional ML. In traditional machine learning, we often prioritize feature extraction and engineering specific to a given problem space before training.

What you're describing and what we've been working on at Gretel.ai, is leveraging the power of models like Large Language Models (LLMs) to understand and extrapolate from vast amounts of diverse data without the need for time-consuming feature engineering. Here's a link to our open-source library https://github.com/gretelai/gretel-synthetics for synthetic data generation (currently supporting GAN and RNN-based language models), and also our recent announcement around a Tabular LLM we're training to help people build with data https://gretel.ai/tabular-llm

A few areas where we’ve found tabular or Large Data Models to be really useful:

* Creating privacy-preserving versions of sensitive data
* Creating additional labeled examples for ML training (much less expensive than traditional data collection/ML techniques)
* Augmenting existing datasets with new fields, cleaning data, and filling in missing values (quick sketch of this one below)
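
As a rough analogue of that last use case (filling in missing values) with plain scikit-learn -- a tabular model conditions on all columns, including text, rather than just numerics, but the shape of the problem is the same:

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    X = np.array([[25.0, 52000.0],
                  [31.0, np.nan],       # missing salary
                  [np.nan, 61000.0]])   # missing age

    # Each missing cell is modeled as a function of the other columns.
    print(IterativeImputer(random_state=0).fit_transform(X))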

Lots of mentions of RLHF here in the threads. One area where I think RLHF will be super helpful is ensuring that LLM data models return diverse and ethically fair results (hopefully better than the data they were trained on). Cheers!


“Pics and it didn’t happen.” Love it


Here is the call for papers (CFP) section in the FAQ; it links to a Google Form: https://gretel.ai/synthesize2023#faqs


Yep, versioning is definitely important, and it's not what Gretel focuses on. You could connect a Gretel project stream up to a Dat backend for versioning/lineage.

So you could use Gretel to anonymize or build a synthetic version of a dataset for sharing, and then use Dat for versioning.
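
A minimal sketch of that two-step flow, with the Gretel step stubbed out and the Dat CLI call from memory of the old dat tool (verify both against their docs):

    import pathlib
    import subprocess

    share_dir = pathlib.Path("shared")
    share_dir.mkdir(exist_ok=True)

    # Step 1 (stub): write the Gretel-anonymized/synthetic dataset here.
    (share_dir / "records_synth.csv").write_text("id,age\n1,34\n2,27\n")

    # Step 2: let Dat handle versioning + p2p sharing of the directory.
    subprocess.run(["dat", "share"], cwd=share_dir, check=True)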


Good point in the article that barring apps from aggressively tracking users (which is a good thing, IMO) creates more power for companies like Facebook/Apple/Google/Amazon that already have access to the data.


+ Neat example of using synthetic data to balance limited ML datasets: https://towardsdatascience.com/improving-massively-imbalance...
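
The general idea, sketched with imbalanced-learn's SMOTE as a stand-in (the linked approach uses a learned generative model for the minority class instead, which handles mixed-type columns better than interpolation):

    import numpy as np
    from collections import Counter
    from imblearn.over_sampling import SMOTE

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 4))
    y = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]  # 95/5 split

    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
    print(Counter(y), "->", Counter(y_bal))  # minority class upsampled to parity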


Nice write-up! Getting servos and gears strong enough to jump will be a challenge, but I’m looking forward to seeing what you come up with! I had a project last year to create a SpotMini from a Mekamon; the biggest problem I ran into was that the servos could not support the weight of an iPhone: https://medium.com/@zredlined/making-my-own-spot-mini-2-2f12...
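
For anyone sizing servos for something like this, here's a back-of-the-envelope static torque check -- every number below is an assumption (guessed masses and geometry), so substitute your own:

    # Rough static holding torque per hip servo; all values are guesses.
    g = 9.81           # m/s^2
    body_kg = 1.0      # assumed robot mass
    payload_kg = 0.19  # roughly an iPhone
    legs = 4
    lever_m = 0.06     # assumed horizontal distance, hip joint to foot

    load_n = (body_kg + payload_kg) / legs * g
    torque_nm = load_n * lever_m
    print(f"{torque_nm / 0.0981:.1f} kg*cm per hip servo, static")

Jumping needs several multiples of the static figure, so budget a lot of margin on top of whatever this gives you.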


@ofalko Our code is open source; you can always check it out for yourself. =)

https://github.com/gretelai/gretel-synthetics


To quote the great Jean-Luc Picard, pick "one impossible thing at a time".

I also find myself going in a lot of directions, and I’ve found that picking an idea and sticking with it until it fails or works is an achievement in itself.

