Thank you for reading!

Cost is a big factor - I really want to make models that can run on average CPU-only machines, so most of the world can benefit without needing expensive GPUs or an internet connection plus subscriptions. Another big factor is privacy (you don't need to trust a third party with your inputs).

As for the hyperparameters, it's pure brute-force trial and error. It feels more like a dark art than a science. You roll the dice, then keep tweaking things until the loss is dropping nicely and consistently and the checkpoints start to output things resembling what we want. I sometimes run inference on checkpoints just to get a feel for whether the model is learning (regardless of loss).
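
To make "dropping nicely and consistently" a bit more concrete, here is a rough sketch of the kind of check I mean. It is not what I actually run, I mostly eyeball the curve; the function names, the window size and the tolerance are all made up for illustration.

```python
def smoothed(losses, window=5):
    """Trailing moving average: one value per full window of raw losses."""
    return [
        sum(losses[i - window:i]) / window
        for i in range(window, len(losses) + 1)
    ]


def dropping_consistently(losses, window=5, tolerance=0.0):
    """Hypothetical heuristic, not a standard metric.

    True if the smoothed loss never rises by more than `tolerance`
    between consecutive points and ends lower than it started.
    """
    avg = smoothed(losses, window)
    if len(avg) < 2:
        return False  # not enough logged steps to judge a trend
    no_big_rises = all(b - a <= tolerance for a, b in zip(avg, avg[1:]))
    return no_big_rises and avg[-1] < avg[0]


# Example: a noisy but healthy run vs. a run that is stuck.
healthy = [5.0, 4.6, 4.7, 4.2, 4.0, 3.9, 3.5, 3.6, 3.2, 3.0]
stuck = [3.0] * 10
print(dropping_consistently(healthy, window=3))
print(dropping_consistently(stuck, window=3))
```

Smoothing first matters because the raw loss bounces around from step to step; and a check like this only tells you the curve looks healthy, so sampling from checkpoints is still worth doing.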