Reminds me of an interview I had a while ago. The interviewer in all seriousness asked me to code up a sorting algorithm on the whiteboard. He was more of a business person than a technical one, so he was probably thinking of insertion sort, selection sort and bubble sort.
I said sure, quicksort, mergesort or radixsort?
He just said "okay, let's skip to the next question". :)
I'd be tempted to ask under what circumstances they'd expect me to be coding a sorting routine... Seems a bit like asking an accountant to add two 10-digit numbers by hand.
BTW, feeding the AI boom that is driving up the cost of hardware... while at the same time asking people to buy new computers only to run the next version of their OS.
BTW. They did the same with Maps and the PDF reader: the two apps I used the most on my Windows tablet. An "upgrade" replaced the app with a nonce.
That just made me so incredibly angry.
> Efficiency: Uses a lazy-loading piece tree to avoid loading huge files into RAM
I once started writing a text editor on Linux, and first went down a similar route: a piece table over a mmap()'d file. But I abandoned using mmap, because Linux file systems typically don't have mandatory locking enabled, so you can't be sure that the file data won't be modified by another program.
(Then I got bogged down in Unicode handling... so 95% of the code became just about that, and I tired of it)
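For anyone who hasn't seen one, here is a rough sketch of the piece-table idea in Python. This is not the code from my editor or from the project being discussed: the `PieceTable` and `Piece` names are made up for illustration, a plain `bytes` object stands in for the mmap()'d file, and a flat list is used instead of a tree. The point it shows is that the original buffer is only ever read, which is exactly why another program modifying the file underneath you is fatal.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str  # "orig" (read-only file contents) or "add" (append-only buffer)
    start: int   # offset into that buffer
    length: int


class PieceTable:
    """Hypothetical minimal piece table; a real editor would use a tree."""

    def __init__(self, original: bytes):
        self.orig = original      # stands in for the mmap()'d file, never written
        self.add = bytearray()    # all inserted text is appended here
        self.pieces = [Piece("orig", 0, len(original))] if original else []

    def insert(self, pos: int, data: bytes) -> None:
        if not data:
            return
        new = Piece("add", len(self.add), len(data))
        self.add += data
        offset = 0
        for i, p in enumerate(self.pieces):
            if offset <= pos <= offset + p.length:
                left = pos - offset
                # Split the piece around the insertion point; drop empty halves.
                parts = [
                    Piece(p.source, p.start, left),
                    new,
                    Piece(p.source, p.start + left, p.length - left),
                ]
                self.pieces[i:i + 1] = [q for q in parts if q.length > 0]
                return
            offset += p.length
        if pos == 0 and not self.pieces:
            self.pieces.append(new)  # inserting into an empty document
            return
        raise IndexError("insert position out of range")

    def text(self) -> bytes:
        buffers = {"orig": self.orig, "add": self.add}
        return b"".join(
            bytes(buffers[p.source][p.start:p.start + p.length])
            for p in self.pieces
        )


table = PieceTable(b"hello world")
table.insert(5, b",")
table.insert(12, b"!")
print(table.text())
```

Every piece that points into `orig` is just an (offset, length) pair, so if the underlying file changes, the document silently changes with it.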
I'm wondering how the compiler optimised add_v3() and add_v4() though.
Was it through "idiom detection", i.e. by recognising those specific patterns, or did the compiler deduce the answers through some more involved analysis?
There is a `c.mv` instruction in the compressed set, which most RISC-V processors implement.
That, and `add rd, rs, x0`, could (like the zeroing idiom on x86) be handled entirely in the decoding and register-renaming stages of a processor.
RISC-V does actually have quite a few idioms. Some idioms are multi-instruction sequences ("macro ops") that could get folded into single micro-ops ("macro-op fusion"/"instruction fusion"): for example `lui` followed by `addi` for loading a 32-bit constant, and left shift followed by right shift for extracting a bitfield.
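To make those two idioms concrete, here is a small Python model of what the instruction pairs compute. It only emulates the arithmetic, not any real core's fusion logic, and the function names are my own. The one subtlety is that `addi` sign-extends its 12-bit immediate, so the upper part given to `lui` has to be bumped by one when bit 11 of the constant is set.

```python
MASK32 = 0xFFFFFFFF


def split_lui_addi(value):
    """Split a 32-bit constant into (upper20, lower12) for lui + addi."""
    value &= MASK32
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000  # addi sign-extends, so the low part becomes negative
    hi = ((value - lo) >> 12) & 0xFFFFF  # compensated upper 20 bits for lui
    return hi, lo


def lui_addi(hi, lo):
    """Emulate `lui rd, hi` followed by `addi rd, rd, lo` on RV32."""
    return ((hi << 12) + lo) & MASK32


def extract_bitfield(x, lsb, width, xlen=64):
    """Emulate `slli` then `srli`: an unsigned bitfield extract."""
    mask = (1 << xlen) - 1
    left = xlen - (lsb + width)
    shifted = (x << left) & mask      # slli pushes the bits above the field out
    return shifted >> (xlen - width)  # srli drops the bits below, zero-fills


hi, lo = split_lui_addi(0xDEADBEEF)
print(hex(hi), lo, hex(lui_addi(hi, lo)))
print(hex(extract_bitfield(0xABCD1234, lsb=8, width=8)))
```

In both cases the pair has one architectural result, which is what makes it a candidate for being folded into a single micro-op.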
x86 has no architectural zero register, but an x86 CPU could have a microarchitectural zero register.
And when the instruction decoder in such a CPU with register renaming sees `xor eax, eax`, it just makes `eax` point to the zero register for instructions after it. It does not have to put any instruction into the pipeline, and it takes effectively 0 cycles.
That is what makes the "zeroing idiom" so powerful.
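A toy model of that renaming step, in Python. Everything here is a simplification I made up for illustration (the `Renamer` class, a physical register 0 hardwired to zero, no free list, no flags), and real designs differ in the details. It just shows the zeroing idiom being absorbed at rename time while ordinary instructions get a fresh physical register and are issued.

```python
class Renamer:
    ZERO = 0  # hypothetical physical register hardwired to zero

    def __init__(self, arch_regs):
        # Each architectural register starts mapped to its own physical register.
        self.map = {r: i + 1 for i, r in enumerate(arch_regs)}
        self.next_free = len(arch_regs) + 1
        self.issued = []  # micro-ops actually sent down the pipeline

    def rename(self, op, dst, src1, src2):
        if op == "xor" and src1 == src2:
            # Zeroing idiom: repoint dst at the zero register, issue nothing.
            self.map[dst] = self.ZERO
            return
        p1, p2 = self.map[src1], self.map[src2]
        pd = self.next_free  # allocate a fresh physical destination
        self.next_free += 1
        self.map[dst] = pd
        self.issued.append((op, pd, p1, p2))


r = Renamer(["eax", "ebx"])
r.rename("xor", "eax", "eax", "eax")  # handled entirely in the renamer
r.rename("add", "ebx", "ebx", "eax")  # reads the zero register as a source
print(r.map, r.issued)
```

The `xor` never shows up in `issued`; the later `add` simply reads physical register 0.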