dionhaefner's comments | Hacker News

This is a case study in making gradient-based optimization (a la gradient descent) work with tools that weren't designed for it. The goal is optimizing rocket grid fin stiffness (8 bars, 16 angular parameters) with a pipeline that includes Ansys SpaceClaim (CAD), PyMAPDL (FEM solver), and JAX. Each step needs derivatives, which means stitching together analytical adjoints from Ansys, finite differences for mesh operations, and JAX autodiff for everything else. What makes this practical is that each component runs in its own isolated environment (Tesseract), so the Ansys tools can live on Windows with their license requirements while the optimization logic runs on Linux, and the whole thing stays composable for end-to-end autodiff.
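
The composition pattern isn't spelled out above, but the general idea looks something like this (a minimal sketch, not the post's actual code): wrap the non-differentiable component in `jax.custom_vjp` and supply a finite-difference backward pass, so the rest of the pipeline can autodiff through it end to end.

```python
# Minimal sketch (assumed, not the post's code): make a black-box component
# differentiable for JAX by supplying a finite-difference VJP.
import jax
import jax.numpy as jnp

def external_solver(params):
    # Stand-in for a non-differentiable tool call (e.g. meshing + FEM solve).
    return jnp.sum(jnp.sin(params) ** 2)

@jax.custom_vjp
def black_box(params):
    return external_solver(params)

def black_box_fwd(params):
    return black_box(params), params           # keep inputs as residuals

def black_box_bwd(params, cotangent):
    eps = 1e-4
    grads = []
    for i in range(params.shape[0]):           # central differences: 2 solver calls per parameter
        e = jnp.zeros_like(params).at[i].set(eps)
        grads.append((external_solver(params + e) - external_solver(params - e)) / (2 * eps))
    return (cotangent * jnp.stack(grads),)

black_box.defvjp(black_box_fwd, black_box_bwd)

params = jnp.linspace(0.0, 1.0, 16)            # e.g. 16 angular parameters
print(jax.grad(black_box)(params))             # composes with any surrounding JAX code
```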

The payoff is watching emergent behavior: starting from random bar positions, the optimizer discovers that orthogonal grid patterns work well, diagonal lateral bars create efficient load paths, and clustering material near the attachment points maximizes stiffness. The final result is 75% stiffer than the random starting point and 24% better than a regular grid, even after adding back symmetry constraints for manufacturing.


I agree with your point regarding non-composability and "algorithm lock-in" (which may or may not be solvable with better abstractions), but explicit time stepping schemes are still the main workhorse of global ocean modelling, so I'm not sure whether "silly" is the right label here.
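
For readers unfamiliar with the term: an explicit scheme computes the next state directly from the current one, with no implicit solve. A toy forward-Euler step (my illustration, nothing ocean-specific) looks like this:

```python
import numpy as np

def explicit_step(state, dt, tendency):
    """One forward-Euler step: the new state depends only on the current state."""
    return state + dt * tendency(state)

tendency = lambda u: -0.1 * u          # toy decay term standing in for the model tendencies
u = np.ones(100)
for _ in range(1000):                  # fixed dt, constrained by a CFL-type stability limit
    u = explicit_step(u, dt=0.01, tendency=tendency)
```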


Why are explicit time stepping schemes the main tool used? Is it because the languages that these models are written in aren't flexible enough, or is there a math reason why dynamic time-stepping isn't better?


Climate models are vastly complex, and you need to bring together many experts from many disciplines to write and maintain one, and analyze the output. This seems to lead to the simplest methods coming out on top. Perhaps it could be solved with better abstractions (a lot of very smart people are trying).


That's precisely what composability solves. We're seeing in CLIMA that using more general, highly optimized solvers can decrease the number of `f` evaluations far more than focusing on really low-level optimizations. Especially in things like the land model, where you can have many stability issues (such as large complex eigenvalues, which work very poorly with multistep methods, even BDF), the ability to split the development of the time stepping across a huge community of hundreds of developers without losing performance means that more optimal methods for each domain get found. Yes, the standard is to use something simpler. No, it's not even close to optimal, and that is something that is being made very clear.
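
To make the eigenvalue point concrete, here is a toy experiment one could run (my own illustration, not CLIMA code): a linear system with eigenvalues -1 ± 1000i, i.e. close to the imaginary axis, where A-stable implicit Runge-Kutta methods (Radau) are on safer ground than higher-order BDF, whose stability regions exclude part of that axis.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Linear system with eigenvalues -1 +/- 1000i: stiff and highly oscillatory.
A = np.array([[-1.0, 1000.0],
              [-1000.0, -1.0]])
rhs = lambda t, y: A @ y
y0 = np.array([1.0, 0.0])

for method in ("BDF", "Radau"):
    sol = solve_ivp(rhs, (0.0, 10.0), y0, method=method, rtol=1e-6, atol=1e-9)
    print(method, "accepted steps:", sol.t.size)
```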


The thing with reduced precision is that things may look fine at first, but then you eventually notice unphysical features in your solution (like additional wave modes after very long simulation times, or energy conservation issues). So we really don't know as a community yet how far we can venture from float64, but it looks like float32 may be viable.
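
A toy way to see the kind of slow drift meant here (nothing ocean-specific): accumulate many small tendencies sequentially and compare precisions.

```python
import numpy as np

n_steps, increment = 1_000_000, 1e-3
total32, total64 = np.float32(0.0), np.float64(0.0)
for _ in range(n_steps):
    total32 += np.float32(increment)   # rounding error grows as the sum gets large
    total64 += np.float64(increment)

print("float32:", total32)             # visibly drifts from the target value
print("float64:", total64, "target:", n_steps * increment)
```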

Veros works OK on TPUs (about the same speed as a high-end GPU), but since you can't buy TPUs that's an immediate no for most academic users of climate models. Renting hardware doesn't really make sense when you keep it busy for months at a time and the HPC infrastructure is already in place.


Can't you fix a lot of the nonphysical issues by using better integration schemes? That might be hard in JAX, though. From what I know, its options for better numerical stability are pretty limited.


No, in fact, you want to go lower order with lower precision. The real answer is that if the solution is in the chaotic regime then maybe Float16 is fine, because you'll be dominated by other numerical errors anyway (provided you also have adequate conservation so the solution doesn't explode in some way). But if you're not in the chaotic regime, then even Float32 is pushing it in many cases (i.e. it had better be non-stiff, as stiffness pretty much guarantees operations that span beyond Float32's relative epsilon). So it's a case-dependent topic without an easy answer, though the case for Float16 is rather small.
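
For reference, the relative epsilons being compared against (a quick check, not tied to any particular model):

```python
import numpy as np

for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{dtype.__name__}: eps = {info.eps:.2e}, max = {info.max:.2e}")
```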

(We had some small tests generating TPU ODE solver code from Julia and showcased some rather bizarre stuff back when Keno was working on it, but never wrote a post summarizing all of it)


True, but unfortunately PyTorch is not quite there yet when it comes to more complex benchmarks:

https://github.com/dionhaefner/pyhpc-benchmarks#example-resu...

JAX really is the only library I've tried that consistently comes close to low-level code on CPU.


Interesting, I thought PyTorch was a bit more competitive on those other benchmarks (but admittedly it's been a while since I looked). Slicing shouldn't be a fundamental problem, but perhaps there are some important details that have been overlooked. Thanks for pointing it out!


I was mostly referring to the millions (billions?) of dollars getting poured into Python library development by tech companies, with the effect that Python stays relevant and has a thriving library ecosystem. Maybe I'm wrong and Julia is just that good that it doesn't matter - I guess time will tell.


A bit late to the party, but here are some reasons:

- When we started Veros (~4 years ago), Julia was very new on our radar and we didn't know whether it would stick. And to be frank, I'm still not convinced it will. Yes, it seems like a fantastic language, but we all know how long it took Python to gain traction.

- Climate scientists and students already do their post-processing in Python. Having the whole stack in the same language makes things a lot easier for domain experts whose first priority is physics, not coding.

- Python skills translate better to other jobs, which I think is important for young academics.

- The Python library ecosystem is so good. Need to use PETSc? `import petsc4py`. Want to simplify post-processing? Export your model state as an `xarray` dataset (see the sketch after this list). Julia is great for bleeding-edge things like autodiff through the entire model, but the bread-and-butter libraries are just so much more polished and battle-tested in Python.

- I don't know Julia :)
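
As promised above, a rough sketch of the `xarray` point (the variable names are made up, not Veros' actual state layout):

```python
import numpy as np
import xarray as xr

nx, ny, nz = 90, 40, 15
ds = xr.Dataset(
    {
        "temp": (("z", "y", "x"), np.random.rand(nz, ny, nx)),
        "salt": (("z", "y", "x"), np.random.rand(nz, ny, nx)),
    },
    coords={
        "x": np.linspace(0, 360, nx),
        "y": np.linspace(-80, 80, ny),
        "z": np.linspace(-5000, 0, nz),
    },
)
ds.to_netcdf("state.nc")                  # requires a netCDF backend; ready for the usual post-processing stack
print(ds.temp.mean(dim=("x", "y")))       # e.g. horizontally averaged profile in one line
```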


Those are very good reasons!


Yes that would probably work! GAMs are a bit of a black box to me so I find it hard to reason about them. I think my method is a bit simpler and it’s more straightforward to get uncertainties, but certainly less elegant than a smooth solution.


I basically determine p via Bayesian inference within every bin (via a conjugate beta prior for p which gives a beta posterior). If that’s not Bayesian then I don’t know what is :)

Yes the pruning can be done with a frequentist method too. Yes you can come up with smarter / more statistically sound ways to construct these binnings. Do they work on >1e9 data points?
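
For concreteness, the per-bin update amounts to this (my own sketch, not the author's code); note that each bin only needs its success/trial counts, which is why it scales to >1e9 points.

```python
import numpy as np
from scipy.stats import beta

a0, b0 = 1.0, 1.0                       # uniform Beta(1, 1) prior on p

def bin_posterior(successes, trials):
    a, b = a0 + successes, b0 + trials - successes   # conjugate update -> Beta posterior
    mean = a / (a + b)
    lo, hi = beta.ppf([0.025, 0.975], a, b)          # 95% credible interval
    return mean, (lo, hi)

print(bin_posterior(successes=42, trials=100))
```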


I’m not sure which standards you are referring to (I guess that means no). If you have a link or open an issue we could scope this out.


That is true to some degree, but with Terracotta we allow the frontend to do manipulations on the fly, like changing contrast, applying colormaps, or doing band math. So the number of possible tiles is infinite.
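
Roughly the kind of operation meant here (a simplified sketch, not Terracotta's actual rendering path): stretch the raw values to the requested contrast range and apply a colormap before encoding an 8-bit tile.

```python
import numpy as np
from matplotlib import cm

tile = np.random.randint(0, 2**16, size=(256, 256)).astype(np.float64)   # fake 16-bit tile

vmin, vmax = np.percentile(tile, [2, 98])             # contrast range requested by the client
stretched = np.clip((tile - vmin) / (vmax - vmin), 0.0, 1.0)
rgba = (cm.viridis(stretched) * 255).astype(np.uint8)  # 8-bit RGBA, ready to encode as PNG
```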

If you don’t need that, you can just pre-generate all PNGs - then you don’t even need a server anymore, just a hash function to identify the right file and a big disk.


> like changing contrast, applying colormaps, or band math

Surely these are examples of things that a client could do with raster data independent of the capabilities of a server?


Only if the client has access to the full bit depth. The rendered PNGs are uint8, but the underlying raw data is often stored in 16 or even 32 bit precision.


I imagine that with HTTP range requests, it shouldn't be hard to access the original TIFF data. JavaScript or WebAssembly can then work with arbitrary byte vectors.
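
Something along these lines (shown in Python for consistency with the rest of the thread; the URL is hypothetical):

```python
import requests

url = "https://example.com/data.tif"                           # hypothetical remote GeoTIFF
resp = requests.get(url, headers={"Range": "bytes=0-16383"})   # fetch only the first 16 KiB
print(resp.status_code, len(resp.content))                     # 206 Partial Content if ranges are supported
```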


Yes, it’s possible. There are some proofs of concept for that IIRC. If you prefer to write everything in Javascript you can do that and do all heavy lifting in the client (plus some extra work to provide indexing and metadata like Terracotta).

