Why are people so hung up on the x86 thing? ARM keeps getting sold on because everyone has now understood that ARM itself doesn't really matter: it isn't driving the innovation, it was simply the springboard for the Apples, Qualcomms and Amazons to drive their own processor designs, and it isn't set up to profit from that. ARM's reference designs aren't competitive; the M1 is.
Instruction set architecture at this point is a bikeshed debate; it's certainly not what is holding Intel back.
A big part of the reason the M1 is so fast is its large reorder buffer, which is enabled by the fact that ARM instructions are all the same size, which makes parallel instruction decoding far easier. Because x86 instructions are variable-length, the processor has to do some amount of work just to find out where the next instruction starts, and I can see how it would be difficult to do that work in parallel, especially compared to an architecture with a fixed instruction size.
Well, if we can have speculative execution, why not speculative decode? You could decode the stream as if the next instruction started at $CURRENT_PC+1, $CURRENT_PC+2, etc. When you know how many bytes the instruction at $CURRENT_PC takes, you could keep the right decode and throw the rest away.
Sure, it would mean multiple duplicate decoders, which eats up transistors. On the other hand, we've got to find something useful for all those transistors to do, and this looks useful...
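In software terms the idea looks something like this (a toy sketch with a made-up "ISA" where the low two bits of the first byte encode the length; real x86 length decoding is far messier, which is the whole point):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Toy variable-length "ISA": the low two bits of an instruction's
    // first byte give its length, 1..4 bytes. (Hypothetical; x86 has to
    // look at prefixes, opcode, modrm, etc. before it knows the length.)
    size_t toy_length(uint8_t first_byte) { return (first_byte & 3) + 1; }

    // Speculatively "decode" at every byte offset of the fetch window
    // (in hardware these would all run in parallel), then keep only the
    // chain that starts at the known-good PC and discard the rest.
    std::vector<size_t> decode_window(const std::vector<uint8_t>& bytes,
                                      size_t pc, size_t window) {
        std::vector<size_t> len(window);
        for (size_t i = 0; i < window; ++i)        // the duplicate decoders
            len[i] = toy_length(bytes[pc + i]);    // most of this is wasted

        std::vector<size_t> starts;                // instruction boundaries
        for (size_t off = 0; off < window; off += len[off])
            starts.push_back(pc + off);
        return starts;
    }

The serial dependency hasn't gone away (you still have to walk the chain to find out which decodes were right), but the expensive per-offset work has already been done by the time you walk it.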
According to the article I linked, that's basically how they do it:
"The brute force way Intel and AMD deal with this is by simply attempting to decode instructions at every possible starting point. That means x86 chips have to deal with lots of wrong guesses and mistakes which has to be discarded. This creates such a convoluted and complicated decoder stage that it is really hard to add more decoders. But for Apple, it is trivial in comparison to keep adding more.
In fact, adding more causes so many other problems that four decoders according to AMD itself is basically an upper limit for them."
That doesn't make any sense. The ROB is after instructions have been cracked into uops; the internal format and length of uops is "whatever is easiest for the design", since it's not visible to the outside world.
This argument does apply to the L1 cache, which sits before decode. (It does not apply to uop caches/L0 caches, but is related to them anyway, as they are most useful for CISCy designs, with instructions that decode in complicated ways into many uops.)
Maybe it wasn't clear, but the article I linked is saying that compared to M1, x86 architectures are decode-limited, because parallel decoding with variable-length instructions is tricky. Intel and AMD (again according to the linked article) have at most 4 decoders, while M1 has 8.
So yes, the ROB is after decoding, but surely there's little point in having the ROB be larger than the decoders can keep relatively full.
Well-intentioned as that article may be, it makes plenty of mistakes. For a rather glaring one: no, uops are not linked to OoO.
Secondly, it ignores the existence of uop caches that x86 designs use in order to not need such wide decoders. Some ARM designs also use uop caches, FWIW, since it can be more power efficient.
That doesn't mean that fixed width decoding like on aarch64 isn't an advantage; it certainly is. Also, M1 is certainly a very impressive design, though of course it also helps that it's fabbed on the latest and greatest process.
I would argue that ISA does matter. Beyond the decode width issue, x86 has some material warts compared to ARM64:
The x86 atomic operations are fundamentally expensive. ARM’s new LSE extensions are more flexible and can be faster. I don’t know how much this matters in practice, but there are certainly workloads for which it’s a big deal. (There's a small sketch after this list.)
x86 cannot context-switch or handle interrupts efficiently. ARM64 can. This completely rules x86 out for some workloads.
ARM64 has TrustZone. x86 has SMM. One can debate the merits of TrustZone. SMM has no merits.
Finally, x86 is more than an ISA - it’s an ecosystem, and the x86 ecosystem is full of legacy baggage. If you want an Intel x86 solution, you basically have to also use Intel’s chipset, Intel’s firmware blobs, Intel’s SMM ecosystem, all of the platform garbage built around SMM, Intel’s legacy-on-top-of-legacy poorly secured SPI flash boot system, etc. This is tolerable if you are building a regular computer and can live with slow boot and with SMM. But for more embedded uses, it’s pretty bad. ARM64 has much less baggage. (Yes, Intel can fix this, but I don’t expect them to.)
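To make the atomics point concrete, here's a minimal C++ sketch; the per-architecture lowering in the comments is my understanding of current compilers, not something established in this thread:

    #include <atomic>

    std::atomic<int> counter{0};

    int bump() {
        // x86-64:          lock xadd - every lock-prefixed RMW is also
        //                  a full barrier, and there's no way to opt out.
        // AArch64 (v8.0):  ldaxr/stlxr retry loop.
        // AArch64 + LSE:   a single ldaddal; some implementations can
        //                  even execute it "far", near the cache line,
        //                  instead of bouncing the line between cores.
        return counter.fetch_add(1, std::memory_order_seq_cst);
    }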
> The x86 atomic operations are fundamentally expensive. ARM’s new LSE extensions are more flexible and can be faster. I don’t know how much this matters in practice, but there are certainly workloads for which it’s a big deal.
There's also the RCpc stuff in ARMv8.3 and v8.4 that could make acquire/release semantics cheaper.
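For anyone wondering what that buys: a plain C++ acquire load only has to order against the release store it reads from, but ARMv8.0's ldar is RCsc and also orders against earlier stlr stores. The v8.3 ldapr instruction drops that extra ordering, so (as I understand it) compilers targeting v8.3+ can lower memory_order_acquire to it directly:

    #include <atomic>

    std::atomic<int>  data{0};
    std::atomic<bool> ready{false};

    void publish(int v) {                                  // writer
        data.store(v, std::memory_order_relaxed);
        ready.store(true, std::memory_order_release);      // stlr
    }

    int consume() {                                        // reader
        while (!ready.load(std::memory_order_acquire)) {}  // v8.0: ldar (RCsc)
                                                           // v8.3: ldapr (RCpc)
        return data.load(std::memory_order_relaxed);
    }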
> the x86 ecosystem is full of legacy baggage.
Luckily for ARM servers we have SBSA that adds things like UEFI and ACPI to the ARM platform. :)
That baggage means I can at least boot an Intel system without first building my own device tree, then hacking the kernel to actually make it work. Meanwhile over in ARM M1 land, they apparently managed to break wfi.
Well put. People are treating x86 vs ARM like their usual team sport. Intel has execution problems in two departments - manufacturing and integration. ISA is not an issue: they can very well solve the integration issues, and investing in semiconductor manufacturing is the need of the hour for the US, so I can imagine them getting some traction there given enough money and will.
IOW, even if Intel switched ISA to ARM, it wouldn't magically fix any of these issues. Plenty of ARM vendors have been trying to do what Apple did for a long time now.
Intel and AMD held a duopoly on desktop and server for many years because of x86.
Loss of that duopoly - even with competitive manufacturing - has profound commercial implications for Intel. M1 and Graviton will be followed by others that will all erode Intel's business.
On the other hand, if x86 stays competitive, there's a lot of inertia in its favor. So it could go either way. Desktop especially has been a tough nut to crack for anyone other than Apple, and they are only 8% of the market.
Probably more than 8% by value, and with the hyperscalers looking at ARM, that's a decent part of their business at risk - and that's ignoring what Nvidia, Qualcomm, etc. might do in the future.
Agreed that inertia is in their favour, but it's not a great position to be in - it gives them breathing space, not a long-term competitive advantage.