It is insane to me that they ask that question in the title of the article and then go on to spoil the first big surprise.
“How do we talk to people about these games without spoiling them? Oh, I know: I’ll spoil them for everybody who reads this article!!!”
“It’s one of my favorites in the genre (maybe the best), and it requires a level of secrecy because it takes away the wow factor if you learn details about it early, but ANYWAYS here are some details about it!!!!!!!!”
Concur: C and C++ are a great example of languages used for rigorous work whose build/packaging story is a mess. The big advantage Cargo/Rust has is learning from past mistakes: adopting the good ideas that have come up and discarding the bad ones.
UV (and a similar tool I built earlier) does solve it, with one important note: this was made feasible by standardizing on pyproject.toml and wheel files, by being able to compile a different wheel for each OS/arch combo and have the correct one downloaded and installed automatically, and, in the case of Linux, by the manylinux target. The old Python libs that did arbitrary things in setup.py were, I think, a lost cause.
I'm confused about this: As the article outlines well, Rust's std (over core) buys you things a general-purpose OS provides. For example:
- file system
- network interfaces
- dates/times
- threads, e.g. for splitting work across CPU cores
The main relevant one I can think of that applies is an allocator.
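For contrast, here's a minimal sketch of a `#![no_std]` library crate: heap types like `Vec` only work once you wire up an allocator, while files, sockets, and threads simply don't exist. `NullAlloc` is a made-up placeholder, not anything a real GPU target would ship.

```rust
// Sketch: a #![no_std] library crate. core gives you language items only;
// alloc::Vec works once a #[global_allocator] exists, but std::fs,
// std::net, and std::thread have no counterpart here.
#![no_std]
extern crate alloc;

use alloc::vec::Vec;
use core::alloc::{GlobalAlloc, Layout};

// Placeholder allocator so the crate links; a real target would provide
// e.g. a bump allocator over device memory.
struct NullAlloc;

unsafe impl GlobalAlloc for NullAlloc {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        core::ptr::null_mut() // always reports "out of memory"
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}

#[global_allocator]
static ALLOC: NullAlloc = NullAlloc;

pub fn doubled(xs: &[f32]) -> Vec<f32> {
    xs.iter().map(|x| x * 2.0).collect()
}
```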
I do a lot of GPU work with Rust: graphics in WGPU, and CUDA kernels + cuFFT mediated by cudarc (a thin FFI lib). I guess running the std lib on a GPU isn't something I understand the need for. What would be cool is the dream that's been building for decades about parallel computing abstractions where you write what looks like normal single-threaded CPU code, but it automagically works on SIMD instructions or GPU. I think this project and CubeCL may be working towards that? (I'm using Burn on GPU as well, but that's abstracted over.)
Of note: Rayon sort of is that dream for CPU thread pools!
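To make that concrete, here's the canonical Rayon pattern (this is Rayon's real `par_iter` API; the only assumption is having the `rayon` crate as a dependency):

```rust
use rayon::prelude::*;

// Looks like normal single-threaded iterator code; swapping iter() for
// par_iter() is all it takes to fan the work out across a thread pool.
fn sum_of_squares(input: &[i64]) -> i64 {
    input.par_iter()      // was: input.iter()
         .map(|&x| x * x)
         .sum()
}
```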
The GPU shader just calls back to the CPU which executes the OS-specific function and relays the answer to the GPU side. It might not make much sense on its own to have such strong coupling, but it gives you a default behavior that makes coding easier.
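A toy model of that mediation, using two OS threads and channels as stand-ins for the device/host boundary. The `HostRequest`/`HostReply` names are hypothetical, not the project's actual ABI; a real implementation would presumably use GPU-visible queues in pinned memory rather than mpsc channels.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical request/reply types for one mediated operation.
enum HostRequest { ReadLine }
enum HostReply { Line(String) }

fn main() {
    let (req_tx, req_rx) = mpsc::channel::<HostRequest>();
    let (rep_tx, rep_rx) = mpsc::channel::<HostReply>();

    // "Device" side: a mediated std call issues a request, then blocks on the reply.
    let device = thread::spawn(move || {
        req_tx.send(HostRequest::ReadLine).unwrap();
        let HostReply::Line(s) = rep_rx.recv().unwrap();
        s
    });

    // "Host" side: executes the OS-specific operation and relays the answer.
    let HostRequest::ReadLine = req_rx.recv().unwrap();
    rep_tx.send(HostReply::Line("hello from the host".into())).unwrap();

    assert_eq!(device.join().unwrap(), "hello from the host");
}
```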
>What would be cool is the dream that's been building for decades about parallel computing abstractions where you write what looks like normal single-threaded CPU code, but it automagically works on SIMD instructions or GPU.
I've had that same dream at various points over the years, and prior to AI my conclusion was that it was untenable barring a very large, world-class engineering team with truckloads of money.
I'm guessing a much smaller (but obviously still world-class!) team now has a shot at it, and if that is indeed what they're going for, then I could understand them perhaps being a bit coy.
It's one heck of a crazy hard problem to tackle. It really depends on what levels of abstraction are targeted, in addition to how much one cares about existing languages and supporting infra.
It's really nice to see a Rust-only shop, though.
Edit: Turns out it helps to RTFA in its entirety:
>>Our approach differs in two key ways. First, we target Rust's std directly rather than introducing a new GPU-specific API surface. This preserves source compatibility with existing Rust code and libraries. Second, we treat host mediation as an implementation detail behind std, not as a visible programming model.
>>In that sense, this work is less about inventing a new GPU runtime and more about extending Rust's existing abstraction boundary to span heterogeneous systems.
That last sentence is interesting in combination with this:
>>Technologies such as NVIDIA's GPUDirect Storage, GPUDirect RDMA, and ConnectX make it possible for GPUs to interact with disks and networks more directly in the datacenter.
Perhaps their modified std could enable distributed compute just by virtue of running on the GPU, so long as the GPU hardware topology supports it.
Exciting times if some of the hardware and software infra largely intended for disaggregated inference ends up as a runtime for [compiled] code originally intended for the CPU.
There was a library for Rust called “faster” which worked similarly to Rayon, but for SIMD.
The simpleminded way to do what you’re saying would be to have the compiler create separate PTX and native versions of a Rayon structure, and then choose which to invoke at runtime.
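A hedged sketch of that dispatch, where `gpu_available` and `launch_ptx_map` are made-up stand-ins for what the compiler/runtime would have to generate, and the fallback is plain Rayon:

```rust
use rayon::prelude::*;

// Stand-in: a real check would query the CUDA driver for a usable device.
fn gpu_available() -> bool {
    false
}

// Stand-in for a compiler-generated PTX kernel launch.
fn launch_ptx_map(_input: &[f32], _out: &mut [f32]) {
    unimplemented!("hypothetical GPU path")
}

// One parallel structure, two compiled bodies; pick at runtime.
pub fn double_all(input: &[f32], out: &mut [f32]) {
    if gpu_available() {
        launch_ptx_map(input, out);
    } else {
        out.par_iter_mut()
            .zip(input.par_iter())
            .for_each(|(o, &i)| *o = i * 2.0);
    }
}
```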
I work with GPUs and I'm also trying to understand the motivations here.
Side note and a hot take: that sort of abstraction never really existed for GPUs, and it's going to be even harder now as Nvidia et al. race to put more and more specialized hardware bits inside GPUs.
Or Blender, pen and paper, a bag of LEGO, etc. Text, in the context of geometric objects, is more or less an abstract classification tool, barely a descriptive one.
Everyone knows what a `dice` is. But that's a taxonomical label, not a definition of one. Anyone reading this can probably draw a representative `dice` using only standard stationery supplies in under a minute. Now describe one in English with such rigor and precision that it readily translates to a .gcode file to be printed. That requires a good amount of useful neurodivergence to pull off at all.
The great thing about OpenSCAD is that one can model anything which one can describe using mathematics and cubes, cylinders, spheres, and transformations/relocations of same.
The awful thing about OpenSCAD is that what one can model is bounded by one's fluency with mathematics and one's ability to place and transform cubes, cylinders, and spheres.
I wouldn't call a FOSS project that you compare to software costing some 2,620 USD/year a dead end. It's good enough for simple modeling, especially when it comes to scripting, and has been for 10 years already.
If you're using one of the old stable builds, the newer nightly builds are markedly faster; hopefully there will be a new stable release soon.
The two games the article has pictures of are, IMO, games everyone who plays games should play; they are two of the best of all time.