Hacker News | brucehoult's comments

> Qualcomm acquired Nuvia in order to bypass the licence fees charged by ARM

Both Nuvia and Qualcomm had Arm Architecture licenses that allowed them to develop and sell their own Arm-compatible CPUs.

There was no bypassing of license fees.

If Qualcomm had hired the Nuvia engineers before they developed their core at Nuvia, and they developed exactly the same core while employed at Qualcomm, then there would be no question that everyone was obeying the terms of their licenses.

Arm's claim rests on it being ok for Nuvia to sell chips of their own design, but not to sell the design itself, and not to transfer the design as part of selling the company.


> Aarch64's ecosystem is huge

ARMv8 hardware (other than Apple) only shipped 3-6 years before RV64GC/RVA20, and ARMv9 is only about two years before the equivalent RVA23 -- at least in SBCs/Laptops. Obviously ARMv8 hardware went into mobile devices a lot earlier, though it was often running 32 bit code for the first few years.

It's nothing at all like the maturity lead x86 has over both.


What version fragmentation?

Pretty much everything coming out in 2026 -- including Ventana's Veyron V2 -- is RVA23.

One profile to rule them all.

Currently-shipping applications processors are either RVA20 (plus the B extension in practice) or RVA22 with V as a standard option.

That's not fragmentation, it's just a standard linear progression. Each thing can run all the software from the previous thing:

    RVA20 (what e.g. Ubuntu 25.04 and earlier require)
    -> RVA20 + B
    -> RVA22
    -> RVA22 + V
    -> RVA23 (what Ubuntu 25.10 and later require)

The exact same mistakes were made in ARMv6. RISC-V's biggest competitors are mature architecture ecosystems and variants of itself.

https://en.wikipedia.org/wiki/Second-system_effect

Even most ARM software compilers still cripple the advanced vendor-specific ASIC features, simply as a stability mitigation. ARMv8/9 was actually a much leaner design. Cheers =3

https://xkcd.com/927/


What mistakes?

No one is ever going to design an ISA that is complete and finished forever on Day #1. There are always going to be new data types and new algorithms to support e.g. the current rush to add "AI" support to all ISAs (NPUs, TPUs, whatever you want to call them).

Arm has ARMv9-A following on from ARMv8-A, and they are already up to Armv9.7-A, in addition to just as many ARMv8-A enhancements.

Intel/AMD have all kinds of add-ons to x86_64, and not even linear ones, e.g. the here-now, gone-now AVX512 -- finally here to stay (presumably) in x86-64-v4. And there is already APX and AVX10 to add to that.


If people had standardized around something like the SiFive X280, added some standard license-free hardware codecs, and quietly ejected every other distraction, then RISC-V might have dropped into mobile SoC markets the way amd64 did against x86's hard-to-use, failed successor IA-64. Note, the silicon business is about selling sustained volumes of identical product, not about a CEO's ego selling bespoke chips in sub-100k batches.

There were many great chips that never survived in consumer product spaces. When manufacturers tell chip houses there is a permutation-compatibility risk issue, and people take a petulant stance on the feedback... "Not my circus, not my monkeys," as they say.

1. Intel is kept alive by the promise of an integrated NVIDIA RTX SoC.

2. AMD understood something important about the software market, and that was easy backward-compatibility wins over _every_ other feature. Even Intel had to learn this the hard way.

3. 93% of the market is change-sensitive... anyone who assumes cross-compiling is in the queue for that sector is greatly mistaken. Note, it took ARM over a decade, driven by Google's dominance in mobile, to gain traction.

4. Most software libraries will only enable advanced chip features if the hardware is detected, and most compiled code simply uses the compatibility subset of features (sure, it's 3 times slower, but it works everywhere). No one is going to go through every permutation of an ISA with vendor-specific features. The nerfed subset of features in most Aarch64 and amd64 packages should be enough indication that software people won't give a bean about unstable vanity silicon features.

We shall see how RISC-Y plays out in the market. Old Yeller sure looks nervous. =3


The X280 is nothing special as a CPU core. It's basically the U74 with an added 512-bit vector unit (but only a 256-bit ALU), which makes it pretty much equivalent to SpacemiT's X60 core in their K1/M1 SoCs.

There is no X280 hardware available yet for general purchase. There is the HiFive Xara X280 announced in May, but that is believed to be available to SiFive licensees only. The SG2380 was going to have X280s as an NPU alongside P670 main cores, but that's been cancelled as a result of US sanctions on Sophgo. The PIC64-HPSC is a rad-hard chip using the X280 for NASA and other space customers, but it will not be cheap -- the RAD750 PowerPC chip it replaces reportedly costs $200,000 each.


Indeed, the U.S. Government's $8.9 billion investment in Intel common stock could be an indication that the entire force of the political structure may drop a boot on competitors.

Regulatory capture is something people need to take seriously. Some may shelve product IP for a few years, or set up parallel factories in other countries without the artificial trade/global-talent barriers.

A standard doesn't have to be perfect, but it must be consistent over significant periods of time to matter. Consider what happened to OpenSPARC, Cell, IA-64, Dojo tiles, and early RISC (the Windows NT prototype was ported off it by Microsoft).

The NVIDIA CUDA card kludge wasn't necessarily "better" than something like the M3/M4/M5 at every task. But it was economical hardware due to volume pricing, it has 92% of the ecosystem, and most software already worked given it isn't walled off.

Note people tend to avoid buying work, or porting to short-lived hardware. Best of luck, =3


Quote, because unlike on Reddit I couldn't figure out how to do multi-paragraph > quotes with code here.

------

Compressed pointers reduce the need for memory by storing pointers as 32-bit unsigned offsets relative to a base register. Decompressing the pointers just consists of adding the offset and register together. As simple as this sounds, it comes with a small complication on our RISC-V 64-bit port. By construction, 32-bit values are always loaded into the 64-bit registers as signed values. This means that we need to zero-extend the 32-bit offset first. Until recently this was done by bit-anding the register with 0xFFFF_FFFF:

    li   t3,1
    slli t3, t3, 32
    addi t3, t3, -1
    and  a0, a0, t3
Now, this code uses the `zext.w` instruction from the Zba extension:

    zext.w a0, a0
-----

This is so strange. Does no one at Google know RISC-V? This has *never* needed more than...

    slli a0, a0, 32
    srli a0, a0, 32
And if they're going to use `Zba`, and zero-extend it and then add it to another register, then why use a separate `zext.w` instruction and `add` instead of ...

    add.uw decompressed, compressed, base
... to zero extend and add in one instruction??

After all, `zext.w` is just an alias for `add.uw` with the `zero` register as the last argument...

They could also have simply stored the 32-bit offset as signed and pointed the base register 2 GB into the memory area, instead of using an x86/Arm-centric design.


Why "bad"? It seems to me it does exactly what it sets out to do.

Obviously, if you just want a fast laptop with a long battery life and you don't care what is inside it then you should get a Mac, or possibly something with the latest Qualcomm SoC, or an x86.

If so then this isn't for you anyway.

Jeff's facts are, obviously, correct but I really wish he'd drop all the snark. Just start off right at the start by saying "If you don't want this BECAUSE it's the RISC-V then it's not for you, wait for the 8-wide RVA23 machines in a year or so" and then stick to the facts from then on.

The people who are actually interested in something like this need a machine to work on for the next year, and this is by far the best option at the moment (unless you need RVV).

It's, so far, and for many purposes, the fastest RISC-V machine you can buy [1] and you can carry it around and even use it without power in a cafe or something for a while.

I don't even know when I last wanted to use my laptop away from AC for more than 2-3 hours. On my 24-core i9 the battery life is only slightly longer anyway -- about 5 hours of light editing and browsing in Linux, but if I start to actually do heavy compiling, using 200W, then it's dead really quickly.

[1] the Milk-V Pioneer with 64 slower cores is faster for some things, but there isn't all that much that can really use more than 8 cores, even most software builds. And it's been out of production for a year, and costs $2500+ anyway.


> I suspect a normal laptop with QEMU would run RISC-V code faster.

No, not on a laptop with anything like a comparable number of cores.

Any x86 or Apple Silicon laptop that can match the DC-ROMA II in QEMU will need around three times as many cores -- if the task even scales to that many cores -- and will cost a lot more.

I tried compiling GCC 13 on my i9-13900HX laptop with 24 cores, and on the Milk-V Megrez, which uses the same chip but only one of them (4 cores, not 8):

on Megrez:

    real    260m14.453s
    user    872m5.662s
    sys     32m13.826s

On docker/QEMU on i9:

    real    209m15.492s
    user    2848m3.082s
    sys     29m29.787s
Only just 25% faster on the x86 laptop. Compared to an 8 core RISC-V it would be slower.

And 3.2x more CPU time on the x86 with QEMU than on the RISC-V natively, so you'd need around that many more "performance" cores than this RISC-V laptop has RISC-V cores.

Or build Linux kernel 7503345ac5f5 (almost exactly a year old at this point) using RISC-V defconfig:

i9-13900HX docker/qemu

    real    19m12.787s
    user    583m44.139s
    sys     10m3.000s
Ryzen 5 4500U laptop docker/qemu (Zen2 6 cores, Win11)

    real    143m20.069s
    user    820m26.988s
    sys     24m33.945s
Mac Mini M1 docker/qemu (4P + 4E cores)

    real    69m16.520s
    user    531m47.874s
    sys     12m28.567s
VisionFive 2 (4x U74 in-order cores @1.5 GHz, similar to RPi 3)

    real    67m35.189s
    user    249m55.469s
    sys     13m35.877s
Milk-V Megrez (4x P550 cores @1.8 GHz)

    real    42m12.414s
    user    149m5.034s
    sys     11m33.624s
The cheap (~$50) VisionFive 2 is the same speed as an M1 Mac with qemu, or twice as fast as the 6-core Zen 2.

The 4 core Megrez takes around twice as long as the 24 core i9 with qemu. Eight of the same cores in the DC-Roma II will match the 24 core i9 and be more than three times faster than the 8 core M1 Mac.


He also runs a site with a bunch of different compilers and versions :p


That's just some weird side hobby of his.


Dude. You've become a verb.


That's 1 byte smaller than `LDA #0`, but not faster. And you don't have enough registers to waste them -- being able to do `STZ` and the `(zp)` addressing mode without having to keep 0 in Z or Y were small but soooo convenient things in the 65C02.


You might like the PC Engine, a game console based on the 65C02*.

*Actually a custom chip also containing some peripherals.


65C02s are $8 now. That didn't stop me buying one when I was stuck at home during COVID. And a 6809 too.

But forget AVR. Yeah, for a buck or so the ATTiny85 was my go-to small MCU five years ago, and the $5 328 for bigger tasks.

But for the last three years both can be replaced by a 48 MHz 32 bit RISC-V CH32V003 for $0.10 for the 8 pin package (like ATTiny85, and also no external components needed) and $0.20 for the 20 pin package with basically the same number of GPIOs as the 328. At 2k RAM and 16K flash it's the same RAM and a little less flash than the ATMega328 -- but not as much as you'd think as RISC-V handles 16 and 32 bit values and pointers sooo much better.

And now you have the CH32V002/4/5/6 with enhanced CPU and more RAM and/or flash -- up to 8K RAM and 62K flash on the 006 -- and still for around the same $0.10-$0.20 price:

https://www.lcsc.com/product-detail/C42431288.html


Hi Bruce! If you make it back to the States we'll have to drink a beer and wax poetic about the 6809. Do you know if anyone ever implemented the embedded RISC-V profile in hardware? Not everything I do on small systems needs a 48 MHz 32-bit part. But if I could get away with a low I/O count, why not use the $0.10 part? Also I'm pretty sure I saw 8051-based SoCs going for $2. I bet if you looked hard enough you could find something like a 6502 for about the same price.

There's probably no reason not to get some of the CH32VXXX's to play with. Every now and again I have an application that needs very low power and I'm happy to spring for an MSP430. But every time I buy an MSP430, TI EoLs the specific model I bought.


Heeey, how's the Cruz treating you? If it still is.

I don't know why you'd ever want to pay a cent more for a 6502 or 8051 or AVR than for a RISC-V or ARM (e.g. Puya PY32F002A). Especially when the CH32V002/4/6 run on anything from 2V to 5V (plus a margin) which is pretty rare, and they don't need any external components.

I don't know whether the M6809 designers were the first to ever analyse a body of real software to find instruction and addressing mode frequencies and the distribution of immediates in order to optimise the encoding of a new ISA -- in a way that the 8086 people clearly didn't [1], but I think they were the first to publish about it, and I was fascinated by their BYTE articles at the time.

MSP430 is also a fun ISA. I just wish they were cheaper, and the cheap ones had more than 512 bytes of RAM. FRAM is funky. I also loooove the instruction encoding e.g. `add.w r10,r11` is `0x5A0B` where `5` is `add`, `A` is the src register (r10), `0` means reg-to-reg word size, `B` is the dst register (r11). Just beautiful. Far nicer for emulating on a 6502 or Z80 than Arm or RISC-V too. The R2/R3 constant generation is a bit whack though.

[1] e.g. on one hand deciding it was worth squeezing a 5 bit offset from any of 4 registers into a 2-byte instruction, while also providing 8 and 16 bit offsets with 3 and 4 byte instructions. They were also confident enough to relegate the 6800's SEC/CLC/SEI/CLI/SEV/CLV to two-byte instructions (with a mask so you could do multiple at once). But not confident enough to do the same with DAA, or SEX. They kept the M6800 encoding for DAA (and for as much else as possible e.g. keeping the opcodes for indexed addressing, but expanding from one option to dozens), but SEX was new to them and they could have experimented with it.


Not really. It looks like that in the C code, but in the generated machine code it'll just be a single `MULH` instruction giving (only) the upper 64 bits of the result, no shift needed.


> No wireless? Less space than a Nomad? Lame.

That aged well. Six years later it turned into the iPhone.

