Global illumination is the hard part. The math isn't that hard, but even the best render farms don't have enough computing power to support a straightforward implementation.
So what follows is an endless series of tricks. Path tracing is one of the purest approaches, and it's actually a simple algorithm to implement, but avoiding a noisy mess on anything beyond the simplest scenes is where the PhDs and rock star developers come in.
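The "simple but noisy" part comes from path tracing being Monte Carlo integration: you average random samples, and the standard error only shrinks as 1/sqrt(N), so halving the noise costs 4x the samples. A toy 1-D sketch (not a renderer, just the estimator):

```python
import math
import random

def mc_estimate(f, n, seed=0):
    """Monte Carlo estimate of the integral of f over [0, 1]:
    average n uniform samples of f. The standard error shrinks as
    1/sqrt(n), which is why a naive path tracer needs 4x the
    samples to halve the noise."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

# Integral of sin(pi * x) over [0, 1] is 2/pi ~= 0.6366.
coarse = mc_estimate(lambda x: math.sin(math.pi * x), 16)
fine = mc_estimate(lambda x: math.sin(math.pi * x), 65536)
```

With 16 samples the estimate is visibly off (that's the grain in a low-sample render); with 65k it converges.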
Would you mind linking some articles, or hinting at the techniques used to "coerce" the choice of ray sample directions so that noise is minimized even in very "specular" scenes? Sorry for the lack of proper terminology on my end, I've been out of the loop for a very long time, but I assume that's where the majority of the tricks are. I suppose the rest is mostly intersection-test acceleration (e.g. BVHs).
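The term you're looking for is importance sampling: pick sample directions with a probability density that matches the integrand, then divide by that density. The classic example is cosine-weighted hemisphere sampling for diffuse surfaces; here's a minimal sketch (Malley's method: sample a disc uniformly, project up):

```python
import math
import random

def cosine_sample_hemisphere(rng):
    """Sample a direction on the unit hemisphere (z-up) with pdf
    proportional to cos(theta). A diffuse surface's integrand already
    contains a cos(theta) factor, so this pdf cancels it exactly and
    the estimator's variance drops compared to uniform sampling."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)            # uniform point on the unit disc...
    phi = 2.0 * math.pi * u2
    x = r * math.cos(phi)
    y = r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - u1))  # ...projected up to the hemisphere
    return (x, y, z)
```

For glossy/specular lobes you sample the BRDF's lobe instead, and multiple importance sampling (Veach) combines light sampling with BRDF sampling. Good starting points: the "Physically Based Rendering" book (pbr-book.org) and the "Ray Tracing in One Weekend" series.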
The modern state of the art for realtime is ML denoisers: they take noisy pixel data from multiple frames, plus associated data (e.g. geometry velocity vectors, depth buffers), and use it to produce a clean denoised image.
Right now, I'm heavily into Cyberpunk 2077. I've got an RTX 5090, so I can turn all the details, including the ray tracing, up to their maximum settings. It's absolutely gorgeous (especially on my 4K HDR OLED monitor), but if you look really closely, you can still see the evidence of some shortcuts being taken.
Some reflections that are supposed to be a bit rough (like a thin puddle in the road) may appear a bit blurry as I'm walking, but will come into better focus when I stop. My guess is that as I'm moving, the angles of the rays being reflected change with every frame, making the data very noisy. Once I stop, they become consistent, so the reflection becomes clear.
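That guess matches how temporal accumulation works: each frame, the denoiser blends a little of the new noisy sample into a running history. A toy sketch of that exponential moving average for a single static pixel (simulated +/-0.2 noise around a true value of 0.5; not the game's actual algorithm):

```python
def temporal_accumulate(frames, alpha=0.1):
    """Exponential moving average over per-frame samples: blend alpha
    of the new noisy value with (1 - alpha) of the history. When the
    camera is still, the same signal gets averaged and the noise melts
    away; when it moves, the history is stale, so the result blurs or
    ghosts until you stop."""
    history = frames[0]
    for value in frames[1:]:
        history = alpha * value + (1.0 - alpha) * history
    return history

# Static pixel: true radiance 0.5, alternating +/-0.2 per-frame noise.
frames = [0.5 + (0.2 if i % 2 == 0 else -0.2) for i in range(200)]
accumulated = temporal_accumulate(frames)
```

After a couple hundred still frames the accumulated value sits within ~1% of the truth, while any single frame is off by 0.2, which is why the puddle sharpens once you stand still.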
> Graphics is trivial until you get to shadows and lighting
And reflections and refractions.
Raster-based (screen-space) reflections are a simple shortcut: just reuse the rendered image of what's being reflected and mirror it.
But that doesn't work when the reflected object is off screen. As a result, if you're over water that's reflecting a city skyline or something in the distance, then pitch the camera down, the reflection vanishes as the skyline goes off screen.
Alternatively, you can use an environment-mapped texture, but then the reflection shows not what's actually there, just a pre-baked approximation of it.
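The two shortcuts above can be sketched together: compute the mirror direction, and if the reflected hit projects inside the viewport, reuse the rendered frame; otherwise fall back to the environment map. That fallback boundary is exactly where off-screen reflections pop or vanish. Toy Python with hypothetical helper names (a real engine does this per-pixel on the GPU):

```python
def reflect(d, n):
    """Mirror direction d about unit normal n: r = d - 2 (d . n) n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))

def sample_reflection(hit_uv, screen, env_color):
    """Screen-space reflection with environment-map fallback (sketch).
    hit_uv: where the reflected ray's hit point projects on screen,
    in [0, 1] x [0, 1]. Inside the viewport we can reuse the rendered
    frame; outside it, all we have is the (approximate) env map."""
    u, v = hit_uv
    if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0:
        x = min(int(u * len(screen[0])), len(screen[0]) - 1)
        y = min(int(v * len(screen)), len(screen) - 1)
        return screen[y][x]
    return env_color
```

Pitch the camera down and the skyline's `hit_uv` leaves the unit square, so the lookup silently switches from the real image to the approximation, which is the popping you notice.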
I find it incredibly distracting in games. It's like bad kerning: Once you know what it looks like, you see it EVERYWHERE.