This blows my mind. How is it even possible to validate a model with 20B parameters? How do you even test something this complex and non-deterministic?
I assume some kind of infallible automated tooling is used to write tests that validate this monster. I would LOVE to see what that tooling looks like.
It _is_ deterministic (same input gives same output).
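To illustrate the point, here's a minimal (purely hypothetical) sketch: with greedy decoding (argmax at each step) and no sampling or dropout, the same input always produces the same output. The toy "model" below is just a fixed weight matrix standing in for a real forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))  # fixed stand-in for model parameters

def greedy_decode(token_ids, steps=5):
    out = list(token_ids)
    for _ in range(steps):
        hidden = np.zeros(16)
        hidden[out[-1] % 16] = 1.0       # trivial "embedding" of the last token
        logits = W @ hidden              # deterministic forward pass
        out.append(int(np.argmax(logits)))  # greedy: argmax, no sampling
    return out

assert greedy_decode([3, 7]) == greedy_decode([3, 7])  # same input -> same output
```

Randomness only enters if you ask for it (temperature/top-k sampling at inference, dropout during training).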
You typically don't "test" pairs of inputs/outputs for a model. Instead, you measure its performance by defining metrics, e.g. "what's the ROUGE-2 score on summarization after fine-tuning AlexaTM 20B with N examples from dataset Y?"
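As a rough sketch of what that kind of metric-based evaluation looks like (not Amazon's actual harness, and real evaluations would use a proper library such as rouge_score), you score each model summary against a reference and report an aggregate number. The eval pairs below are made up.

```python
from collections import Counter

def bigrams(text):
    tokens = text.lower().split()
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

def rouge2_recall(reference, prediction):
    """Simplified ROUGE-2 recall: clipped bigram overlap / reference bigrams."""
    ref, pred = Counter(bigrams(reference)), Counter(bigrams(prediction))
    if not ref:
        return 0.0
    overlap = sum(min(count, pred[bg]) for bg, count in ref.items())
    return overlap / sum(ref.values())

# Hypothetical eval set: (reference summary, model summary) pairs.
eval_pairs = [
    ("the cat sat on the mat", "a cat sat on the mat"),
    ("stocks fell sharply on friday", "stocks dropped on friday"),
]
scores = [rouge2_recall(ref, pred) for ref, pred in eval_pairs]
print(f"mean ROUGE-2 recall: {sum(scores) / len(scores):.3f}")
```

You then compare that aggregate score against baselines or previous checkpoints rather than asserting any single input/output pair.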
You can test some aspects of ML models, like sync/parity testing (if you train on hardware A and run inference on hardware B, the results are not always bit-identical). But generally you test the code that embeds the model, not the model itself.
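A hedged sketch of that kind of parity test: run the same input through the model on two backends and check the outputs agree within a tolerance, since floating-point results can drift slightly across hardware. The run_model() helper and backend names here are hypothetical placeholders for a real forward pass.

```python
import numpy as np

def run_model(input_ids, backend):
    """Placeholder for a real forward pass on a given device/backend."""
    rng = np.random.default_rng(hash(tuple(input_ids)) % 2**32)
    logits = rng.normal(size=(len(input_ids), 32000))
    # Simulate tiny numeric drift on a different backend.
    return logits + (1e-6 if backend == "gpu" else 0.0)

input_ids = [101, 2023, 2003, 1037, 3231, 102]
cpu_logits = run_model(input_ids, backend="cpu")
gpu_logits = run_model(input_ids, backend="gpu")

# Exact equality would be too strict; assert closeness within a tolerance instead.
assert np.allclose(cpu_logits, gpu_logits, atol=1e-4), "backend outputs diverged"
```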
How do you define "validate"? These models aren't formally proven to work in all cases or anything. They're just tested on a load of data, and if it turns out they work pretty well, they get released.