Hacker News | hedgehog's comments

Obviously I'm biased but this looks really useful.

If you set your Apple device to beta updates for the previous release you can suppress the constant prompts to upgrade. Reduces the chance of accidentally upgrading.

Be warned: if you actually install beta software and take your device to the Apple Store, they will not replace parts because of the chance that the diagnostic tools aren't compatible. This bit me when I tried to get my iPhone battery replaced.

The hedgehog knows one great thing. This is it. Thank you.

How do you do that?

Settings -> General -> Software Update -> Beta Updates

It's the same on macOS and iOS, pick "macOS Sequoia Public Beta" or the corresponding release for your device. Apple still pushes security updates for those releases, and I haven't heard of any problems with the kind of minor updates that ship late in a major release's lifecycle, so I think the risk of running this way is low. This kicks the can a year or two down the road, at which point hopefully there are better workarounds.


Thanks!

High-end consumer SSDs can do closer to 15 GB/s, though only with PCIe gen 5. On a motherboard with two M.2 slots that's potentially around 30 GB/s from disk. Edit: how fast everything is depends on how much data needs to get loaded from disk, which is not always everything on MoE models.

Would RAID zero help here?

Yes, RAID 0 could combine the disks' bandwidth here (RAID 1 mirrors rather than combines, but it can still speed up reads). You would want to check the bus topology for the specific motherboard to make sure the slots aren't on the other side of a hub or something like that.
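As a rough sanity check on the numbers (a sketch only; the per-drive throughput and model sizes are assumed figures for illustration, not benchmarks):

```python
# Back-of-the-envelope load times for model weights striped across
# gen-5 NVMe drives. Per-drive throughput and model sizes are assumptions.

def load_time_s(model_gb: float, drives: int, per_drive_gbps: float = 14.0) -> float:
    """Seconds to read the weights, assuming striping scales linearly."""
    return model_gb / (drives * per_drive_gbps)

for model_gb in (70, 140, 400):
    one = load_time_s(model_gb, drives=1)
    two = load_time_s(model_gb, drives=2)
    print(f"{model_gb} GB: {one:.1f}s on one drive, {two:.1f}s on two")
```

Real RAID 0 rarely scales perfectly linearly, so treat the two-drive figure as an upper bound.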

A salty pinch of death

This looks pretty solid. I think you can make this process more efficient by decomposing the problem into layers that are more easily testable, e.g. testing topological relationships of DOM elements after parsing, then spatial relationships after layout, then eventually pixels on something like ACID2 or whatever the modern equivalent is. The models can often come up with tests more accurately than they get the code right the first time. There are often also invariants that can be used to identify bugs without ground truth, e.g. by rendering the page with slightly different widths you can make some assertions about how far elements will move.
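As a concrete example of that last invariant (a sketch; `render` and the per-element dicts are hypothetical stand-ins for whatever the layout engine under test exposes):

```python
# Ground-truth-free invariant: render the same page at two nearby viewport
# widths and assert that elements only move by a bounded amount.
# `render` is a hypothetical hook returning per-element layout boxes.

def check_width_invariant(render, html: str, w1: int = 800, w2: int = 810) -> None:
    before = {e["id"]: e for e in render(html, width=w1)}
    after = {e["id"]: e for e in render(html, width=w2)}
    delta = w2 - w1
    assert before.keys() == after.keys(), "elements appeared or vanished"
    for eid, box in before.items():
        dx = abs(after[eid]["x"] - box["x"])
        # A 10px-wider viewport shouldn't move anything more than 10px
        # horizontally: right-aligned content moves the full delta,
        # centered content half of it, left-aligned content not at all.
        assert dx <= delta, f"{eid} moved {dx}px for a {delta}px resize"
```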


> There are often also invariants that can be used to identify bugs without ground truth, e.g. by rendering the page with slightly different widths you can make some assertions about how far elements will move.

That's really interesting and sounds useful! I'm wondering if there are general guidelines/requirements (not specific to browsers) that could kind of "trigger" those things in the agent, without explicitly telling it. I think generally that's how I try to approach prompting.


I think if you explain that general idea, the models can figure it out well enough to write it into an implementation plan, at least some of the time. Interesting problem though.


> that general idea, the models can figure it out well enough to write it into an implementation plan

I'm not having much luck with it, they get lost in their own designs/architectures all the time, even the best models (as far as I've tested stuff). But as long as I drive the design, things don't end up in a ball of spaghetti immediately.

Still trying to figure out better ways of doing that, feels like we need to focus on tooling that lets us collaborate with LLMs better, rather than trying to replace things with LLMs.


Yeah, from what I can tell a lot of design ability is somewhere in the weights, but the models don't regurgitate it without some coaxing. It may be related to the pattern where, after generating some code, you can instruct a model to review it for correctness and it can find and fix many issues. Regarding tooling, there's a major philosophical divide between LLM maximalists who prefer the model to drive the "agentic" outer loop and what I'll call "traditionalists" who prefer that control be run by algorithms more related to classical AI research. My personal suspicion is that the second branch is greatly under-exploited, but time will tell.


The modern equivalent is the Web Platform Tests:

https://web-platform-tests.org/


Amazing. I think if I were taking on the build-a-browser project, I would pair that with the WHATWG HTML spec to come up with a task list (based on the spec, line by line) linked to the specific tests associated with each task. Then of course you'd need an overall architecture and behavioral spec for how the browser behaves beyond just rendering. A developer steering the process full time might be able to get to 80% parity with existing browsers in a month. It would be an interesting experiment.

> I would pair that with the WhatWG HTML spec

I placed some specifications + WPT into the repository the agent had access to! https://github.com/embedding-shapes/one-agent-one-browser/tr...

But judging by the session logs, it doesn't seem like the agent saw them; I never pointed it there, and it seems none of the searches returned anything from there.

I'm slightly curious about doing it from scratch again, but this time explicitly pointing it to the specifications, to see if it gets better or worse.


I ported a closed source web conferencing tool to Rust over about a week with a few hours of actual attention and keyboard time. From 2.8MB of minified JS hosted in a browser to a 35MB ARM executable that embeds its own audio, WebRTC, graphics, embedded browser, etc. Also an mdbook spec to explain the protocol, client UI, etc. Zero lines of code by me. The steering work did require understanding the overall work to be done, some high level design of threading and buffering strategy, what audio processing to do, how to do sprite graphics on GPU, some time in a profiler to understand actual CPU time and memory allocations, etc. There is no way I could have done this by hand in a comparable amount of time, and given the clearly IP-encumbered nature I wouldn't spend the time to do it, except that it was easy enough and allowed me to then fix two annoying usability bugs in the original.


Please give us a write up


I don't have time right now for a proper write-up but the basic points in the process were:

1. Write a document that describes the work. In this case I had the minified+bundled JS, no documentation, but I did know how I use the system and generally the important behavioral aspects of the web client. There are aspects of the system that I know from experience tend to be tricky, like compositing an embedded browser into other UI, or dealing with VOIP in general. Other aspects, like JS itself, I don't really know deeply. I knew I wanted a Mac .app out the end, as well as Flatpak for Linux. I knew I wanted an mdbook of the protocol and behavioral specs. Do the best you can. Think really hard about how to segment the work for hands-off testability so the assistant can grind the loop of add logs, test run, fix, etc.

2. In Claude Desktop (or whatever) paste in the text from 1 and instruct it to research and ask you batches of 10 clarifying questions until it has enough information to write a work plan for how to do the job, specific tools, necessary documentation, etc. Then read and critique until you feel like the thread has the elements of a good plan, and have Claude generate a .md of the plan.

3. Create a repo containing the JS file and the plan.

4. Add other tools like my preferred template for change implementation plans, Rust style guide, etc. (Have the chatbot write a language style guide for any language you use that covers the gap between common practice ~3 years ago and the specific version of the language you want to use, common errors, etc.) I have specific instructions for tracking current work, a work log, and key points to remember in files; everyone seems to do this differently.

5. Add Claude Code (or whatever) to the container or machine holding the repo.

Repeat until done:

6a. Instruct the assistant to do a time-boxed 60 minutes of work towards the goal, or until blocked on questions, then leave changes for your review along with any questions.

6b. Instruct the assistant to review changes from HEAD for correctness, completeness, and opportunities to simplify, leaving questions in chat.

6c. Review and give feedback / make changes as necessary. Repeat 6b until satisfied.

6d. Go back to 6a.

At various points you'll find that the job is mis-specified in some important way, or the assistant can't figure out what to do (e.g. if you have choppy audio due to a buffer bug, or a slow memory leak, it won't necessarily know about it). Sometimes you need to add guidance to the instructions like "update instructions to emphasize that we must never allocate in situation XYZ". Sometimes the repo will start to go off the rails and get messy; that improves with instructions like "consider how to best organize this repository for ease of onboarding the next engineer, describe in chat your recommendations", and then having it do what it recommended.

There's a fair amount of hand-holding but a lot of it is just making sure what it's doing doesn't look crazy and pressing OK.
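Steps 6a and 6b can even be scripted. A rough sketch, assuming an agent CLI that accepts a prompt as an argument (the `agent -p` command, the prompt wording, and the file names here are placeholders, not part of any specific tool):

```python
# Driver for the work/review loop described above. The `agent` CLI name
# and its `-p` flag are placeholder assumptions; substitute your tool.
import subprocess

WORK = ("Do a time-boxed 60 minutes of work toward the plan in PLAN.md, "
        "or stop when blocked; leave changes and questions for review.")
REVIEW = ("Review all changes since the last commit for correctness, "
          "completeness, and opportunities to simplify; note findings in chat.")

def run_agent(prompt: str) -> None:
    subprocess.run(["agent", "-p", prompt], check=True)

def iteration(run=run_agent) -> None:
    run(WORK)    # 6a: make progress
    run(REVIEW)  # 6b: self-review
    # 6c happens outside the script: a human reads the diff and questions
```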


Oh no, I didn't mean a write-up about the prompting, I meant about the actual client you wrote.

What was the final framework like, how did the protocols work, etc.


Oh, there's a centrally hosted web server that hosts the assets, some of the conference state, account info, that sort of thing. Clients join an SSE channel for notifications of events relating to other clients. Then a combination of POST to the web service plus ICE and STUN to establish all-to-all RTP over WebRTC for audio, with other client state updates sent as JSON over a WebRTC data channel. The UI is very specific to the app but built on winit, wgpu, and egui, with wry for the embedded browser.
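The SSE half of that is simple at the wire level. A minimal sketch of a `text/event-stream` parser (the field names follow the SSE spec; the event name and payload here are made up):

```python
# Minimal parser for SSE (text/event-stream): each event is a block of
# "field: value" lines terminated by a blank line.
import json

def parse_sse(lines):
    event, data = None, []
    for line in lines:
        if line == "":                       # blank line ends an event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())

# Hypothetical notification as such a server might send it:
stream = ["event: peer-joined", 'data: {"peer": "abc123"}', ""]
for name, payload in parse_sse(stream):
    print(name, payload["peer"])  # → peer-joined abc123
```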

I haven't tried them myself, but Kyutai has a couple of projects that could fit.

https://kyutai.org


This is effective and it's convenient to have all that stuff co-located with the code, but I've found it causes problems in team environments or really anywhere where you want to be able to work on multiple branches concurrently. I haven't come up with a good answer yet but I think my next experiment is to offload that stuff to a daemon with external storage, and then have a CLI client that the agent (or a human) can drive to talk to it.


git worktrees are the canonical solution


Worktrees are good but they solve a different problem. The question is: if you have a lot of agent config specific to your work on a project, where do you put it? I'm coming around to the idea that checking it in causes enough problems that it's worth the pain to put it somewhere else.


I have this in my AGENTS.md:

  ## Task Management
  - Use the projects directory for tracking state
  - For code review tasks, do not create a new project
  - Within the `open` subdirectory, make a new folder for your project
  - Record the status of your work and any remaining work items in a `STATUS.md` file
  - Record any important information to remember in `NOTES.md`
  - Include links to MRs in NOTES.md.
  - Make a `worktrees` subdirectory within your project. When modifying a repo, use a `git worktree` within your project's folder. Skip worktrees for read-only tasks
  - Once a project is completed, you may delete all worktrees along with the worktrees subdirectory, and move the project folder to `completed` under a quarter-based time hierarchy, e.g. `completed/YYYY-Qn/project-name`.
There's more, but that's the basics of folder management. I haven't hooked it up to our CI to deal with MRs etc., and I've never told it that a project is done, so I haven't ironed out whether that part of the workflow works well. But it does a good job of taking notes, using project-based state directories for planning, etc. Usually it obeys the worktree rule, but sometimes it forgets after compaction.

I'm dumb with this stuff, but what I've done is set up a folder structure:

  dev/
     dev/repoA
     dev/repoB
     ...
     dev/ai-workflows/
         dev/ai-workflows/projects
And then in dev/AGENTS.md, I say to look at ai-workflows/AGENTS.md, and that's our team-shareable instructions (e.g. everything I had above), skills, etc. Then I run it from `dev` so it has access to all repos at once and can make worktrees as needed without asking. In theory, we should all push our project notes so it can have a history of what changed when, etc. In practice, I also haven't been pushing my project directories because they have a lot of experimentation that might just end up as noise.


Worktrees are a bunch of extra effort. If your code's well segregated and you have the right config, you can run multiple agents in the same copy of the repo at the same time, so long as they're working on sufficiently different tasks.


How do you achieve coordination?

Or do you require the tasks be sufficiently unrelated?


I do this sometimes: let Claude Code implement three or four features or fixes at the same time in the same repository directory, no worktrees. Each session knows which files it created, so when you ask CC to commit the changes it made in this session, it can differentiate them. Sometimes it will think the other changes are temporary artifacts or the results of an experiment and try to clear them (especially when your CLAUDE.md contains an instruction to clean up after itself), so you need to watch out for that. If multiple features touch the same file and different hunks belong to different commits, that's where I step in and coordinate manually.


I'm insane and run sessions in parallel. CLAUDE.md has Claude committing to git just the changes that session made, which lets me pull each session's changes into its own separate branch for review without too much trouble.


Engraving data on a titanium record would be a way to store it for many years even under exceptionally poor environmental conditions (fire, flood, locusts, plagues, what have you).


M-DISC [0] will probably cover most of the scenarios. It's still expensive, though.

[0]: https://en.wikipedia.org/wiki/M-DISC


I'd heard of those but never looked them up, always thought they'd be super expensive. It's about $40 for a drive that can write them and $13 each for 100GB media. That's pretty reasonable for durable storage.
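At those prices, the cost per terabyte works out like this (quick arithmetic on the $13 per 100 GB disc and $40 drive figures above):

```python
# Cost per TB of M-DISC media at the quoted prices.
disc_price_usd, disc_gb = 13.0, 100
drive_price_usd = 40.0

discs_per_tb = 1000 / disc_gb                  # 10 discs per TB
media_cost_per_tb = discs_per_tb * disc_price_usd
print(f"${media_cost_per_tb:.0f}/TB in media, plus a one-time "
      f"${drive_price_usd:.0f} for the drive")
# prints "$130/TB in media, plus a one-time $40 for the drive"
```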


Yeah, normally it's not expensive, but since the market for these discs is so small here, the prices for M-DVDs are at exorbitant levels. M-BDRs used to be unavailable, but as I just checked, they're now available at reasonable prices.

I have drives which can burn M-DVDs, but I'd need an M-BDR drive; the ones I have don't support them.


Yes. And it doesn’t have to be titanium per se. Cerabyte is trying to use ceramic. Even rocks might be good enough.


NASA preferred gold (more specifically, copper plated with nickel and then plated with gold): https://en.wikipedia.org/wiki/Voyager_Golden_Record


To be fair, that's not simply an archival disc, but also something explicitly intended to be readable by intelligent life elsewhere in space. The encoding of data was optimized for simplicity above all else.


Basically a loan.

