asabla · 2026-03-08T21:41:41 1773006101

Oh woah!

I've been trying to get microsandbox to play nicely. But this is much closer to what I actually need.

I skimmed through the site and the script, but couldn't really see any obvious gotchas.

Any you've found so far that haven't been documented yet?

e1g · 2026-03-08T21:46:46 1773006406

Pure TUI is solid - I’ve been running all my pets inside that cage for several weeks with no issues. Auto-updates work, session renewals work, config updates work etc.

But lately I’ve been using agents to test via browsers, and starting headless browsers from the agent is flaky. I’m working on that, but it’s hard to find a secure default for running Chrome.

In the repo, I have policies for running the Claude desktop app and VSCode inside the same sandbox (so you can do yolo mode there too), so there is hope for sandboxing headless Chrome as well.

asabla · 2026-03-08T22:26:49 1773008809

Yee I gotcha.

Did a migration myself last week from Playwright MCP to playwright-cli, which has been playing much nicer so far. I guess you would run into the same issues you've already mentioned about running Chrome headless in one of these sandboxes.

I'll for sure keep an eye out for updates.

Kudos to the project!

e1g · 2026-03-09T00:41:04 1773016864

playwright-cli works out of the box, and I just merged support for agent-browser. If you end up testing out Safehouse, and have any issues, just create an issue on GitHub, and I'll check it out. Browser usage is definitely among my use cases.

asabla · 2026-03-05T21:43:40 1772747020

I really don't have any numbers to back this up, but it feels like the sweet spot is around 500k context size. Anything larger than that, and you usually have scoping issues, are trying to do too much at the same time, or have issues with the quality of what's in the context at all.

For me, I would say speed (not just time to first token, but a complete generation) is more important than going for a larger context size.

asabla · 2026-01-20T22:25:42 1768947942

Very nice!

I've been experimenting with a similar setup, and I'll probably implement some of the things you've been doing.

For the proxy part I've been running mitmproxy (https://www.mitmproxy.org/). It's not fully working for all workflows yet, but it's getting close.

asabla · 2025-12-21T23:29:45 1766359785

From my point of view, you're choosing between instruction following and more creative solutions.

Codex models tend to be extremely good at following instructions, to the point that they won't do any additional work unless you ask them to. GPT-5.1 and GPT-5.2, on the other hand, are a little bit more creative.

Models from Anthropic, on the other hand, are a lot more loosey-goosey with instructions, and you need to keep an eye on them much more often.

I'm using models interchangeably from both providers all the time, depending on the task at hand. No real preference as to whether one is better than the other; they're just specialized in different things.

asabla · 2025-11-25T05:04:01 1764047041

This is such a good video. I really like the way he presents it as well.

His rant about CS historians is also a fun subject

asabla · 2025-10-28T06:59:14 1761634754

> The last thing I'll mention is that Claude Code (Sonnet 4.5) is still very token-happy, in that it eagerly goes above and beyond when not always necessary. Codex (gpt-5-codex) on the other hand, does exactly what you ask, almost to a fault.

I very much share your experience. For the time being I prefer the experience with Codex over Claude, just because I find myself knowing much sooner when to step in and just do it manually.

With Claude I find myself in a typing exercise much more often. I could probably get better at knowing when to stop, ofc.

asabla · 2025-09-28T08:04:13 1759046653

I can't tell if this is satire or not. And some parts read like it was written by AI.

Either way, more fine-grained control over the GC is probably preferable to something like this.

shiomiru · 2025-09-28T08:25:54 1759047954

The whole post is obviously LLM spam. e.g.

> It is the conductor [emoji] that directs the symphony of independent Spaces.

this pattern is a meme by now.

axx83 · 2025-10-04T22:35:59 1759617359

Hi all, my apologies for the very late reply. I honestly didn't expect this post to get any attention and haven't checked back until now.

I want to clarify that the ideas in this manifesto are entirely my own. As I mentioned in the preface, English is not my native language, so I used AI tools to translate my original text. This is likely why some parts have an "AI-like" feel.

Thank you for the feedback; it's genuinely helpful. The substance of the work is what I truly wanted to share.

asabla · 2025-09-27T06:28:01 1758954481

I'm always so confused by those statements as well. Because just like you, I feel that the 20B version is really good at following instructions.

Some of the qwen models are too, but they seem to need a bit more handholding.

This is of course just anecdotal from my end, and I've been slacking on keeping up with evals while testing at home.

asabla · 2025-08-28T16:31:18 1756398678

And by GPT-5, do you mean through their API, directly through Azure OpenAI services, or ChatGPT set to use GPT-5?

All of these alternatives mean different things when you say it takes 20+ seconds for a full response.

ugh123 · 2025-08-28T16:47:05 1756399625

Sure, apologies. I mean ChatGPT UI

asabla · 2025-08-23T10:45:51 1755945951

I fundamentally agree with you.

But anti-cheat hasn't been about blocking every possible way of cheating for some time now. It's been about making it as inconvenient as possible, thus reducing the number of cheaters.

Is the current fad of using kernel-level anti-cheats what we want? Hell nah.

The responsibility of keeping a multiplayer session clean of cheaters was previously shared between the developers and server owners. Today this responsibility has fallen mostly on developers (or rather game studios), since they want to own the whole experience.