Hacker News | ludocode's comments

I'm the author of Onramp. Thanks for linking it!

One of the VMs I wrote for Onramp is in POSIX shell [1]. This was intended to make C bootstrappable on any POSIX system of any architecture with nothing besides the shell. Unfortunately it's about 100,000x too slow to be useful. It's also at least as complicated as a machine code VM. I've since mostly abandoned the POSIX shell idea.

Onramp does have a very simple C89 VM though, and its purpose is for bootstrapping modern C on systems that have only a basic C compiler [2]. So this c89cc.sh could in theory work. I tried it and unfortunately it doesn't quite compile yet (and doesn't give a comprehensible error message either). Even if it worked, c89cc.sh only compiles to x86_64 ELF, and it's way more complicated than the x86_64 ELF machine code Onramp VM [3].

This has been a bit of a recurring theme with Onramp: anything I've tried to get away from the initial machine code stages ends up being more complicated than handwritten machine code. Still, it's nice to have a lot of different ways to bootstrap. I love seeing projects like this and I'm glad to see more people taking bootstrapping seriously.

[1]: https://github.com/ludocode/onramp/blob/develop/platform/vm/...

[2]: https://github.com/ludocode/onramp/blob/develop/platform/vm/...

[3]: https://github.com/ludocode/onramp/blob/develop/platform/vm/...


Took some time to look at onramp, it looks awesome!

I just use ConnectBot to ssh to my house. It runs tmux and vim well, especially with a little pocket-size folding bluetooth keyboard to go with it.


These filesystems are not really alternatives because mdraid supports features those filesystems do not. For example, parity raid is still broken in btrfs (so it effectively does not support it), and last I checked zfs can't grow a parity raid array while mdraid can.

I run btrfs on top of mdraid in RAID6 so I can incrementally grow it while still having copy-on-write, checksums, snapshots, etc.

I hope that one day btrfs fixes its parity raid or bcachefs will become stable enough to fully replace mdraid. In the meantime I'll continue using mdraid with a copy-on-write filesystem on top.


> zfs can't grow a parity raid array while mdraid can.

indeed out of date - that was merged a long time ago and shipped in a stable version earlier this year.


Like everything else in engineering, it is a matter of trade-offs. The setup you chose to run really hampers the usefulness of having a checksumming file system, since it cannot simply get the correct data from another drive. As a peer pointed out: ZFS does support adding additional drives to expand a RaidZ (with some trade-offs). What you cannot do is change the raid topology on the fly.


soon :)


Here's a link to the case on the Framework marketplace:

https://frame.work/ca/en/products/cooler-master-mainboard-ca...

I put my original mainboard in one of these when I upgraded. It's fantastic. I had it VESA-mounted to the back of a monitor for a while which made a great desktop PC. Now I use it as an HTPC.


Indeed, a decent closed hash table (separate chaining) is maybe 30 lines. An open hash table (open addressing) with linear probing is even less, especially if you don't need to remove entries. It's almost identical to a linear search through an array; you just change where you start iterating.

In my first stage Onramp linker [1], converting linear search to an open hash table adds a grand total of 24 bytecode instructions, including the FNV-1a hash function. There's no reason to ever linear search a symbol table.

[1]: https://github.com/ludocode/onramp/blob/develop/core/ld/0-gl...
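
To make the "just change where you start iterating" point concrete, here is a minimal sketch in C of an open-addressing symbol table with linear probing and 32-bit FNV-1a. It is not the Onramp linker's code (that is written in bytecode); the names `symbol`, `symbol_slot`, `symbol_insert`, `symbol_find` and the fixed `TABLE_SIZE` are all hypothetical, and removal is left out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 256u /* hypothetical capacity; must be a power of two */

struct symbol {
    const char *name; /* NULL marks an empty slot */
    uint32_t address;
};

static struct symbol table[TABLE_SIZE];
static size_t table_count;

/* 32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime. */
uint32_t fnv1a(const char *s) {
    uint32_t hash = 2166136261u; /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        hash ^= (unsigned char)*s;
        hash *= 16777619u; /* FNV prime */
    }
    return hash;
}

/*
 * Returns the slot that holds `name`, or the empty slot where it would go.
 * This is a linear search that starts at the hashed index instead of at 0
 * and wraps around at the end of the array.
 */
struct symbol *symbol_slot(const char *name) {
    size_t i = fnv1a(name) & (TABLE_SIZE - 1);
    while (table[i].name != NULL && strcmp(table[i].name, name) != 0)
        i = (i + 1) & (TABLE_SIZE - 1);
    return &table[i];
}

/* Inserts a symbol, or updates its address if it already exists. */
void symbol_insert(const char *name, uint32_t address) {
    struct symbol *slot = symbol_slot(name);
    if (slot->name == NULL) {
        /* Always keep one slot empty so that probing terminates. */
        assert(table_count + 1 < TABLE_SIZE);
        slot->name = name;
        table_count++;
    }
    slot->address = address;
}

/* Returns the symbol with the given name, or NULL if it is not present. */
const struct symbol *symbol_find(const char *name) {
    const struct symbol *slot = symbol_slot(name);
    return slot->name != NULL ? slot : NULL;
}
```

Calling `symbol_insert("main", 0x1000)` and then `symbol_find("main")` returns the stored entry, while `symbol_find` on an unknown name returns NULL. The table stores the caller's name pointers without copying them, so the strings have to outlive the table.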


A linear search may be faster because it is cache and branch prediction friendly. Benchmarks on real-world data are needed to make a final call.


Indeed! An eventual goal of Onramp is to bootstrap in freestanding so we can boot directly into the VM without an OS. This eliminates all binaries except for the firmware of the machine. The stage0/live-bootstrap team has already accomplished this so we know it's possible. Eliminating firmware is platform-dependent and mostly outside the scope of Onramp but it's certainly something I'd like to do as a related bootstrap project.

A modern UEFI is probably a million lines of code so there's a huge firmware trust surface there. One way to eliminate this would be to bootstrap on much simpler hardware. A rosco_m68k [1] is an example, one that requires no third party firmware at all aside from the non-programmable microcode of the processor. (A Motorola 68010 is thousands of times slower than a modern processor so the bootstrap would take days, but that's fine, I can wait!)

Of course there's still the issue of trusting that the data isn't modified getting into the machine. For example you have to trust the tools you're using to flash EEPROM chips, or if you're using an SD card reader you have to trust its firmware. You also have to trust that your chips are legit, that the Motorola 68010 isn't a modern fake that emulates it while compromising it somehow. If you had the resources you'd probably want to x-ray the whole board at a minimum to make sure the chips are real. As for trusting ROM, I have some crazy ideas on how to get data into the machine in a trustable way, but I'm not quite ready to embarrass myself by saying them out loud yet :)

[1]: https://rosco-m68k.com/


Author here. I think my opinion would be about the same as the authors of the stage0 project [1]. They invested quite a bit of time trying to get Forth to work but ultimately abandoned it. Forth has been suggested often for bootstrapping a C compiler, and I hope someone does it someday, but so far no one has succeeded.

Programming for a stack machine is really hard, whereas programming for a register machine is comparatively easy. I designed the Onramp VM specifically to be easy to program in bytecode, while also being easy to implement in machine code. Onramp bootstraps through the same linker and assembly languages that are used in a traditional C compilation process so there are no detours into any other languages like Forth (or Scheme, which live-bootstrap does with mescc.)

tl;dr I'm not really convinced that Forth would simplify things, but I'd love to be proven wrong!

[1]: https://github.com/oriansj/stage0?tab=readme-ov-file#forth


You might get a kick out of DuskOS(baremetal forth system)'s C compiler.

https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/co...


To add a bit to this, although Dusk OS doesn't have the same goal as stage0, which is to mitigate the "trusting trust" attack, I think it effectively achieves it. Dusk OS kernels are less than 3000 bytes. The rest boots from source. One can easily audit those 3000 bytes manually to ensure that there's nothing inserted.

That being said, the goal of stage0 is to ultimately compile gcc and there's no way to do that with Dusk OS.

That being said (again), this README in stage0 could be updated because I indeed think that Dusk is a good counterpoint to this critique of Forth.


Oh, amazing! I've heard of DuskOS before but I didn't realize its C compiler was written in Forth.

Looks like it makes quite a few changes to C so it can't really run unmodified C code. I wonder how much work it would take to convert a full C compiler into something DuskCC can compile.

One of my goals with Onramp is to compile as much unmodified POSIX-style C code as possible without having to implement a full POSIX system. For example Onramp will never support a real fork() because the VM doesn't have virtual memory, but I do want to implement vfork() and exec().


It can't compile unmodified C code targeting POSIX. That's by design. Allowing this would import way too much complexity in the project.

But it does implement a fair chunk of C itself. The idea is to minimize the magnitude of the porting effort and make it mechanical.

For example, the driver for the DWC USB controller (the controller on the raspberry pi) comes from plan 9. There was a fair amount of porting to do, but it was mostly to remove the unnecessary hooks. The code itself, where the real logic happens, stays pretty much the same and can be compiled just fine by Dusk's C compiler.


That might be rather more difficult than you might expect. Advent of Code uses quite a lot of 64-bit numbers. A bit of googling tells me C64 BASIC only supports 16-bit integers and 32-bit floats. I imagine the other BASICs have similar limitations.

I did 2023 Advent of Code with my own compiler and this was the biggest challenge I ran into. I only had 32-bit integers at the time so I had to manually implement 64-bit math and number formatting within the language to be able to do the puzzles. You would probably have to do the same in BASIC.
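
For a rough idea of what "manually implement 64-bit math and number formatting" involves, here is a sketch in C that restricts itself to 32-bit unsigned arithmetic. It is not the code from that Advent of Code run; `struct u64` and every function name here are made up for illustration. Addition detects the carry through wraparound, multiplication splits each operand into 16-bit limbs, and formatting repeatedly divides by 10 using long division over 16-bit limbs.

```c
#include <stdint.h>

/* A 64-bit unsigned value stored as two 32-bit halves. */
struct u64 {
    uint32_t hi;
    uint32_t lo;
};

/* Add the low halves, then carry into the high halves. */
struct u64 u64_add(struct u64 a, struct u64 b) {
    struct u64 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* wraparound means a carry occurred */
    return r;
}

/* Multiply two 32-bit values into a full 64-bit product via 16-bit limbs. */
struct u64 u32_mul_wide(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;
    /* Sum of the three contributions to bits 16..31, at most 3 * 0xFFFF. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);
    struct u64 r;
    r.lo = (mid << 16) | (ll & 0xFFFFu);
    r.hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
    return r;
}

/* 64-bit by 64-bit multiply, keeping only the low 64 bits of the result. */
struct u64 u64_mul(struct u64 a, struct u64 b) {
    struct u64 r = u32_mul_wide(a.lo, b.lo);
    r.hi += a.lo * b.hi + a.hi * b.lo; /* cross terms only affect the high half */
    return r;
}

/* Divides *v by 10 in place and returns the remainder. */
uint32_t u64_divmod10(struct u64 *v) {
    uint32_t limbs[4];
    uint32_t rem = 0;
    int i;
    limbs[0] = v->hi >> 16;
    limbs[1] = v->hi & 0xFFFFu;
    limbs[2] = v->lo >> 16;
    limbs[3] = v->lo & 0xFFFFu;
    for (i = 0; i < 4; ++i) {
        uint32_t cur = (rem << 16) | limbs[i]; /* rem < 10, so this fits */
        limbs[i] = cur / 10;
        rem = cur % 10;
    }
    v->hi = (limbs[0] << 16) | limbs[1];
    v->lo = (limbs[2] << 16) | limbs[3];
    return rem;
}

/* Writes v in decimal into buf, which must hold at least 21 bytes. */
void u64_format(struct u64 v, char *buf) {
    char digits[20]; /* a 64-bit value has at most 20 decimal digits */
    int n = 0;
    int i;
    do {
        digits[n++] = (char)('0' + u64_divmod10(&v));
    } while (v.hi != 0 || v.lo != 0);
    for (i = 0; i < n; ++i)
        buf[i] = digits[n - 1 - i]; /* digits were produced in reverse order */
    buf[n] = '\0';
}
```

Signed values, general division and parsing are left out. In BASIC the same limb-splitting idea would apply, with the limb width picked so that intermediate products stay within whatever the interpreter can represent exactly.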


Commodore BASIC, derived from Microsoft's 6502 BASIC, actually has 40-bit floats, with a 32-bit mantissa and 8-bit exponent, not that it would help much, if any, with 64-bit maths.

There are Microsoft BASICs that have 64-bit floats, such as built into ROM on the TRS-80 Model I, III and 4 w/Level 2 BASIC, TRS-80 Model 100/102, TI-99/4(a), Apple III, and MSX systems, or on cartridge such as Microsoft BASIC for the Atari 8-bit computers.


The generated C code could contain a backdoor. Generated C is not really auditable so there would be no way to tell that the code is compromised.


It can be difficult to explain why bootstrapping is important. I put a "Why?" section in the README of my own bootstrapping compiler [0] for this reason.

Security is a big reason and it's one the bootstrappable team tend to focus on. In order to avoid the trusting trust problem and other attacks (like the recent xz backdoor), we need to be able to bootstrap everything from pure source code. They go as far as deleting all pre-generated files to ensure that they only rely on things that are hand-written and auditable. So bootstrapping Python for example is pretty complicated because the source contains code generated by Python scripts.

I'm much more interested in the cultural preservation aspect of it. We want to preserve contemporary media for future archaeologists, for example in the Arctic World Archive [1]. Unfortunately it's pointless if they have no way to decode it. So what do we do? We can preserve the specs, but we can't really expect them to implement x265 and everything else they would need from scratch. We can preserve binaries, but then they'd need to either get thousand-year-old hardware running or virtualize a thousand-year-old CPU. We can give them, say, a definition of a simple Lisp, and then give them code that runs on that, but then who's going to implement x265 in a basic Lisp? None of this is really practical.

That's why in my project I made a simple virtual machine, then bootstrapped C on top of it. It's trivially portable, not just to present-day architectures but to future and alien architectures as well. Any future archaeologist or alien civilization could implement the VM in a day, then run the C bootstrap on it, then compile ffmpeg or whatever and decode our media. There are no black boxes here: it's all debuggable, auditable, open, handwritten source code.

[0]: https://github.com/ludocode/onramp?tab=readme-ov-file#why-bo...

[1]: https://en.wikipedia.org/wiki/Arctic_World_Archive


Yep, I think this would have been good context in the OP


Say you start with nothing but "pure source code".

With what tool do you process that source code?


The minimum tool that bootstrapping projects tend to start with is a hex monitor. That is, a simple-as-possible tool that converts hexadecimal bytes of input into raw bytes in memory, and then jumps to it.

You need some way of getting this hex tool in memory of course. On traditional computers this could be done on front panel switches, but of course modern computers don't have those anymore. You could also imagine it hand-woven into core rope memory for example, which could then be connected directly to the CPU at its boot address. There are many options here; getting the hex tool running is very platform-specific.

Once you have a hex tool, you can then use that to input the next stage, which is written in commented hexadecimal source code. The next tool then adds a few features, and so does the tool after that, and so on, eventually working your way up to assembly and C.
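
As a rough illustration of how little the hex stage has to do, here is a sketch in C of the decoding step. Real hex tools at this stage are written directly in machine code; this version only shows the logic. The `#` comment syntax and the names `hex_digit` and `hex_load` are assumptions made for the example, and the final "jump to the loaded bytes" step is omitted because it is platform-specific.

```c
#include <stddef.h>

/* Returns the value of one hex digit, or -1 if c is not a hex digit. */
int hex_digit(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decodes commented hexadecimal text in `src` into raw bytes in `out`.
 * Returns the number of bytes written, stopping early if `out` is full.
 * Assumed syntax: '#' starts a comment that runs to the end of the line,
 * and any other non-hex character (whitespace, punctuation) is skipped.
 */
size_t hex_load(const char *src, unsigned char *out, size_t out_size) {
    size_t count = 0;
    int high = -1; /* pending high nibble, or -1 if there is none */

    for (; *src != '\0'; ++src) {
        int value;

        if (*src == '#') {
            /* Skip to the last character of the comment. */
            while (src[1] != '\0' && src[1] != '\n')
                ++src;
            continue;
        }

        value = hex_digit((unsigned char)*src);
        if (value < 0)
            continue;

        if (high < 0) {
            high = value; /* first digit of a pair */
        } else {
            if (count == out_size)
                break;
            out[count++] = (unsigned char)((high << 4) | value);
            high = -1;
        }
    }
    return count;
}
```

Given the input `7F 45 4C 46  # ELF magic`, this writes the four bytes 7F 45 4C 46 and ignores the comment. A later stage can then be fed through this loader as commented hex and add the next feature, such as labels.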


From the point of view of trust and security, bootstrapping has to be something that's easily repeatable by everyone, in a reasonable amount of time and steps, with the same results.

Not to mention using only the current versions of all the deliverables or at most one version back.

