Hacker News

Why not recompile every iteration? Weights are only updated at the end of a batch at the earliest (for distributed training, every n batches at best), and generally only at the end of an iteration. In either case the cost of recompiling would be negligible, no?


You'd pay the cost of the core computation O(n) times. Matrix products under the derivative fibration (jets, or whatever your algebra calls them) are just more matrix products, and a good-sized NN already has plenty of those. Also, the hard part is finding the ideal combination of forward vs. reverse transforms (it's NP-hard). This is similar in complexity to finding the ideal sub-block matrix-multiply schedule.
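A minimal sketch of the "more matrix products" point, in plain NumPy (the function name is mine, not from the thread): the forward-mode derivative (JVP) of C = A @ B is d(A @ B) = dA @ B + A @ dB, i.e. differentiating one matmul yields two more matmuls of the same shapes.

```python
import numpy as np

def matmul_jvp(A, B, dA, dB):
    """Hypothetical sketch: value and forward-mode tangent of A @ B.
    By the product rule, the tangent is just two more matmuls."""
    return A @ B, dA @ B + A @ dB

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))
dA, dB = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))

C, dC = matmul_jvp(A, B, dA, dB)

# Sanity check against a finite-difference directional derivative
eps = 1e-6
fd = ((A + eps * dA) @ (B + eps * dB) - A @ B) / eps
assert np.allclose(dC, fd, atol=1e-4)
```

So each differentiation pass multiplies the matmul count rather than changing the nature of the work, which is why recompiling per iteration compounds quickly.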

So the killer cost is at compile time, not runtime, and that's fundamental to the underlying autograd operation.

On the flip side, it's 2025, not 2006, so modern algorithms and heuristics can change this story quite a bit.

All of this is spelled out in Griewank's work (the book, Evaluating Derivatives).



Yep. You can find used copies at some online sellers; Powell's in Portland (online store) sometimes has it for $25 or $30.



