Side note for 2023: some modern MCUs are low cost but still have FPUs. A good example is the STM32G4. So, unlike on a Cortex-M0 or similar, you can freely use f32 if you don't want fixed point, and you can get these for ~$1-2 per MCU.
That said... The G4 also has a hardware CORDIC peripheral that implements this algo for fixed-point uses. Is this mainly to avoid floating point precision losses? You program it using registers, but aren't implementing CORDIC yourself on the CPU; dedicated hardware inside the IC does it.
The second Parallax Propeller chip (the Propeller 2) has a CORDIC engine in silicon. It's fast and handles 64-bit intermediate products, which gives the divide and trig operations more than adequate precision for most things. One can always increase precision in software too.
I was late to the game learning about CORDIC. I had used fixed point a lot in 8- and 16-bit assembly land for performance and determinism.
When I found out about it, I was stunned! It was fast, and only required basic math capability to be useful.
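To make "basic math capability" concrete, here is a minimal sketch of rotation-mode CORDIC for sine and cosine. The loop uses only shifts, adds, and a small lookup table. The 16.16 fixed-point scaling, the iteration count, and all names (`cordic_sin_cos`, `ATAN_TABLE`, `K_FIXED`) are my own assumptions for illustration, not anything from a particular chip or library. Floating point is used only to build the tables, which on a real target would be precomputed constants.

```python
import math

ITERATIONS = 16          # assumed iteration count for this sketch
FRAC_BITS = 16           # assumed 16.16 fixed-point scaling
ONE = 1 << FRAC_BITS

# Table of atan(2^-i) in fixed point (precomputed constants on a real target).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector; K is the combined correction factor.
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_FIXED = round(K * ONE)


def cordic_sin_cos(angle_fixed):
    """Return (sin, cos) scaled by ONE for a fixed-point angle in radians.

    Only valid for angles within roughly [-pi/2, pi/2]; wider inputs
    would need quadrant reduction first.
    """
    x, y, z = K_FIXED, 0, angle_fixed
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i).
            x, y = x - (y >> i), y + (x >> i)
            z -= ATAN_TABLE[i]
        else:
            # Rotate clockwise by atan(2^-i).
            x, y = x + (y >> i), y - (x >> i)
            z += ATAN_TABLE[i]
    return y, x


s, c = cordic_sin_cos(round(0.5 * ONE))
print(s / ONE, c / ONE)  # compare against math.sin(0.5), math.cos(0.5)
```

Note the `>> i` in the loop: the shift amount grows with each iteration, so a CPU that can only shift one bit at a time pays for that on every pass.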
I found that the lack of a barrel shifter, or at least a microcoded shift-by-n, hampers CORDIC on an 8-bit CPU, but I coded up a routine for fun on the 6502. I never found a use for it myself, as there always seems to be a faster way to do things, for games programming at least.
Yes, a barrel shifter would have been a nice addition to the 6502.
Yes, general purpose code is rarely the answer. I think all of us see that after a while.
Then, after that process, one starts coding differently: chaining stuff together, using common ops, pulling off intermediate values, etc. For most things a fast way exists, but it will often take a tweak or two to make it fit the context.