Hacker News

Why not recompile every iteration? Weights are only updated at the end of a batch at the earliest (for distributed training, every n batches at best), and generally only at the end of an iteration. In either case the cost of recompiling would be negligible, no?


You'd pay the cost of the core computation O(n) times. Matrix products under the derivative fibration (jets, or whatever your algebra calls them) are just more matrix products, and a good-sized NN already has plenty of those. Also, the hard part is finding the ideal combination of forward vs. reverse transforms (it's NP-hard). This is similar in complexity to finding the ideal sub-block matrix-multiply schedule.
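A minimal sketch of the "more matrix products" point, in plain NumPy (the function name is mine, not from the thread): the forward-mode derivative (JVP) of C = A @ B is d(A @ B) = dA @ B + A @ dB, i.e. differentiating one matmul yields two more matmuls of the same shapes.

```python
import numpy as np

def matmul_jvp(A, B, dA, dB):
    """Hypothetical sketch: value and forward-mode tangent of A @ B.
    By the product rule, the tangent is just two more matmuls."""
    return A @ B, dA @ B + A @ dB

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))
dA, dB = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))

C, dC = matmul_jvp(A, B, dA, dB)

# Sanity check against a finite-difference directional derivative
eps = 1e-6
fd = ((A + eps * dA) @ (B + eps * dB) - A @ B) / eps
assert np.allclose(dC, fd, atol=1e-4)
```

So each differentiation pass multiplies the matmul count rather than changing the nature of the work, which is why recompiling per iteration compounds quickly.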

So the killer cost is at compile time, not runtime, and that's fundamental to the underlying autograd operation.

On the flip side, it's 2025, not 2006, so modern algorithms and heuristics can change this story quite a bit.

All of this is spelled out in Griewank's work (the book, Evaluating Derivatives).



Yep. You can find used copies at some online sellers; Powell's in Portland (online store) sometimes has it for $25 or $30.



