None of what you said means writing compilers is hard: just that the first ones, or the first techniques, were hard to do. Once that's done, each becomes another building block that can be chained together to solve similar problems. Using an ML or a Lisp with the right primitives makes that even easier. Designing the right language and implementing it modularly makes it easier still.
So, I'd say the industrial ones are as tough as they are because they support hard-to-compile languages with many dark corners while being implemented in those same languages themselves. Make the problem hard for yourself and it's hard. Make it easier and it's... still work, but only a tiny fraction of what you alluded to.
It's far from trivial, but it's presented as more complex than it should be. The parsing point is valid: once you're in AST land, the rewrite / transform / optimize part, arguably the most important one, is more enjoyable and composable than the parsing stage, which is often covered in great detail (courses making you write BNF state matrices?).
Compilers are not really complicated compared to plenty of other hard topics. They're large, but they always separate out into different passes that you just run one after the other. You don't have to manage the program as a whole.
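To make that pass structure concrete, here's a minimal sketch (all names and the tuple-based IR are invented for illustration, not taken from any real compiler): each pass is a plain function from IR to IR, and the "compiler" is nothing but their composition.

```python
# Toy IR: nested tuples like ("add", ("const", 1), ("var", "x")).
# Each pass maps IR to IR; the driver just runs them in sequence.
from functools import reduce

def constant_fold(node):
    # Fold ("add", const, const) into a single constant, bottom-up.
    if isinstance(node, tuple) and node[0] == "add":
        l, r = constant_fold(node[1]), constant_fold(node[2])
        if l[0] == "const" and r[0] == "const":
            return ("const", l[1] + r[1])
        return ("add", l, r)
    return node

def strength_reduce(node):
    # Drop additions of zero: x + 0 -> x.
    if isinstance(node, tuple) and node[0] == "add":
        l, r = strength_reduce(node[1]), strength_reduce(node[2])
        if l == ("const", 0):
            return r
        if r == ("const", 0):
            return l
        return ("add", l, r)
    return node

PASSES = [constant_fold, strength_reduce]

def compile_expr(ir):
    # No global coordination: just run each pass after the previous one.
    return reduce(lambda acc, p: p(acc), PASSES, ir)

print(compile_expr(("add", ("add", ("const", 1), ("const", -1)), ("var", "x"))))
# -> ('var', 'x')
```

Each pass is independently testable, and adding a new optimisation is just appending another function to the list, which is the property the comment above is pointing at.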
And optimizations don't even matter anyway. x86 runs your -O1 program about as fast as your -O3, and nobody has put a really accurate model of an x86 into a production compiler in the last decade.
That's a pretty good indication that there is a big problem. Deep at the heart of it is that processor designers don't quite understand the performance characteristics of their processors any more, so they cannot produce e.g. an abstract memory model that compiler writers could base their optimisations on.
Optimisations for parallel processing, e.g. auto-parallelisation, are not well developed either.
Oh, I think Intel knows it, but they consider it too much of a trade secret to even tell the compiler team about it.
There are actually numerous performance counters exposed by the CPU too, which are good for dynamic benchmarking, but hardly anyone cares to use them.
BTW, even though the ALU part of the CPU is extremely wide and dynamic, you still have to fit your instructions through the x86 decoder at the front, and it's worth writing a scheduler just to model that part.
I understand that they need to keep the low-level details of their CPU secret, but a memory model? All it says is when and how reads and writes commute. If a CPU manufacturer were to provide a nice, high-level memory model, that would make the CPU more sellable. Why? Because compiler writers could optimise more aggressively when compiling for it.
I suspect that the reason they don't give out memory models is that they don't know how to produce one. That's also the vibe I've been getting when talking to people who would know.
Tbh, it is really hard to have an "accurate" model of a very wide OoO core. It is far too dynamic; your instruction scheduler cannot predict the occupancy of the pipelines.
In order to determine the meaning of a programming language, you need a platform independent memory model. This is increasingly important as CPUs are more and more concurrent and programming languages are exposing some of that concurrency to the programmer.
Of course the language designer can always play it safe and choose a ridiculous memory model like sequential consistency, but that prevents many useful compiler optimisations. At the other extreme you could explicitly state that the memory model is a specific processor, but then
(a) the language becomes too platform dependent and program semantics will change with every processor upgrade.
(b) nobody really knows what exactly the memory model is in any meaningful sense because even the manufacturers don't know (other than saying: it is what the CPU does). That means the semantics of the language is unclear.
What you want is something between the Scylla of inefficiency and the Charybdis of platform dependence. This is a difficult problem, and unsolved even for as simple a language as C, see e.g. [1]. Indeed, it is unclear whether memory models possessing the required attributes even exist; the requirements may turn out to be contradictory.
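The gap between sequential consistency and what hardware actually guarantees can be made concrete with the classic store-buffering litmus test. This sketch (the encoding is invented for illustration) enumerates every sequentially consistent interleaving of two tiny threads and shows that the outcome where both loads return 0 never occurs under SC, even though a weaker hardware model such as x86-TSO does permit it, which is exactly the kind of distinction a published memory model has to pin down.

```python
# Store-buffering litmus test:
#   Thread A: x = 1; r1 = y      Thread B: y = 1; r2 = x
# Under sequential consistency, (r1, r2) == (0, 0) is impossible.
from itertools import permutations

THREAD_A = [("store", "x"), ("load", "y")]
THREAD_B = [("store", "y"), ("load", "x")]

def outcomes_under_sc():
    results = set()
    # All ways to merge the two 2-step threads, preserving program
    # order within each thread.
    for order in set(permutations("AABB")):
        mem = {"x": 0, "y": 0}
        regs = {}
        idx = {"A": 0, "B": 0}
        prog = {"A": THREAD_A, "B": THREAD_B}
        for t in order:
            op, var = prog[t][idx[t]]
            idx[t] += 1
            if op == "store":
                mem[var] = 1
            else:
                regs[t] = mem[var]
        results.add((regs["A"], regs["B"]))
    return results

print(sorted(outcomes_under_sc()))
# -> [(0, 1), (1, 0), (1, 1)]  -- (0, 0) is absent under SC, yet
# store buffers on real x86 hardware make (0, 0) observable.
```

A compiler targeting SC-for-the-source-language must insert fences to rule the extra outcome out; knowing exactly when it may omit them is where a precise hardware memory model earns its keep.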
Again, compilers are easy, easier than anything else.
If the entire thing is "complicated" (although it should not be), it can still be broken into a chain of laughably trivial passes. This is the most beautiful property of compilers: the ability to break complexity all the way down to trivial steps.
If you do not agree for some reason, just name a part of a compiler you perceive as inherently complex, and I will show you how to turn it into a laughably simple chain.
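To take that challenge up on one commonly feared part, parsing: for a toy grammar of integers, '+', and parentheses, it splits into a tokenize step and a tiny recursive-descent pass. (The grammar and every name here are invented for the sketch.)

```python
# Two-stage chain: source string -> token list -> AST.
import re

def tokenize(src):
    # Integers, plus signs, and parentheses; whitespace is ignored.
    return re.findall(r"\d+|[+()]", src)

def parse(tokens):
    # expr := term ('+' term)* ; term := NUM | '(' expr ')'
    def expr(i):
        node, i = term(i)
        while i < len(tokens) and tokens[i] == "+":
            rhs, i = term(i + 1)
            node = ("add", node, rhs)
        return node, i

    def term(i):
        if tokens[i] == "(":
            node, i = expr(i + 1)
            return node, i + 1   # skip the closing ')'
        return ("const", int(tokens[i])), i + 1

    node, _ = expr(0)
    return node

print(parse(tokenize("1+(2+3)")))
# -> ('add', ('const', 1), ('add', ('const', 2), ('const', 3)))
```

Each stage is a small, separately testable function, and the whole "front end" is just their composition, which is the shape of the chain being claimed above.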
I agree with you that idealised compilers have a beautiful pipeline structure. Finding this structure is one of the great achievements of the early computing pioneers.
> turn it into a laughably simple chain.
By this reasoning, any computable problem is trivially simple, because I can translate it into ARM/x86/MIPS/... machine code, and machine code is just a linear list of trivial commands, i.e. I can "turn it into a laughably simple chain".
My point is: if you chain enough simple things, the result stops being simple.
Look for example at modern JIT compilers, tracing or otherwise. They are not simple by any stretch of the imagination. To see that, just look at a program and try to predict how long its warm-up will take. Whoops... unknown [1].
Modern, industrial-strength compilers are some of the most complicated pieces of technology we have.
Indeed, almost every part of the compiler pipeline, in particular parsing, type inference, and optimisation, is an active area of research.