Thank you, I will take a look at that in detail. It does give me pause that Golang is hosted by Google, and people here have extremely negative things to say about Google's privacy stance in general. We want to avoid the same result or reputation.
You should hope to have that problem, because it only happens once you are already big and successful.
Your problems will be different early on.
And for clarity, the Golang usage information is logically separate from "Google"; it's of no value to them what the metrics of little parts of the Go compiler are doing. They only capture data from 10% of users, and only until they have statistically sufficient data.
You should not hold Google's ideas or practices in your head when trying to understand how Go does its analytics. It is fundamentally different from web analytics, and fairer than any other open source tool doing analytics.
>I've been one of the strongest supporters of local AI, dedicating thousands of hours towards building a framework to enable it.
Sounds like you're very serious about supporting local AI. I have a question for you (and anyone else who feels like donating): would you be willing to donate some memory/bandwidth resources, peer to peer, to hosting an offline model?
We have a local model we would like to distribute but don't have a good CDN.
As a user/supporter question: would you be willing to donate some spare memory/bandwidth via a simple dedicated browser tab you keep open on your desktop? The tab plays silent audio (so it isn't put in the background and unloaded), allocates 100 MB to 1 GB of RAM, and acts as a WebRTC peer serving checksummed models.[1] Our server then only has to check from time to time that you still have the file, by sending you some salt and asking you to hash it together with a part of the file; your tab proves it still has the data by doing so. This doesn't require any trust, and the receiving user will also hash what they receive and report if there's a mismatch.
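Concretely, one audit round in the donor's tab could look something like this. A minimal sketch, assuming a hypothetical wire format (the server picks a fresh salt and a byte range; the tab returns the hash of salt plus chunk), using only the standard Web Crypto API:

```typescript
// Minimal sketch of one audit round in the donor's browser tab.
// Hypothetical protocol: the server sends a random salt plus a byte
// range, and the tab proves it still holds the file by returning
// the SHA-256 hash of salt || chunk.
async function answerAudit(
  model: Uint8Array, // cached model bytes held in RAM
  salt: Uint8Array,  // fresh random salt from the server
  start: number,     // start of the byte range the server asked about
  end: number,       // end of that byte range (exclusive)
): Promise<string> {
  const chunk = model.subarray(start, end);
  const message = new Uint8Array(salt.length + chunk.length);
  message.set(salt, 0);
  message.set(chunk, salt.length);
  const digest = await crypto.subtle.digest("SHA-256", message);
  // Hex-encode the digest so it can be sent back to the server.
  return [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Because the salt is fresh each round, the tab can't precompute or cache the answer; it has to actually hold the bytes.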
Our server federates the p2p connections, so when someone downloads, they do so from a trusted peer (one who has contributed and passed the audits) like you. We considered building a binary for people to run, but we concluded that people couldn't trust our binaries, or that someone would target our build process; we are paranoid about trust, whereas a web page is inherently untrusted and therefore safer. Why do all this?
The purpose of all this would be to host an offline model: we successfully ported a 1 GB model from C++ and Python to WASM and WebGPU (you can see Claude doing so; we livestreamed some of it[2]), but the model weights, at 1 GB, are too much for us to host.
Please let us know whether this is something you would contribute a background tab on your desktop to. It wouldn't impact you much, and you could set how much memory to dedicate to it, but you would have the good feeling of knowing that you're helping people run a trusted offline model if they want - from their very own browser, no download required. The model we ported is fast enough for anyone to run on their own machine. Let me know if this is something you'd be willing to keep a tab open for.
It is very simple. Storage/bandwidth is not expensive. Residential bandwidth is. If you can convince people to install bandwidth-sharing software on their home machines, you can then charge other people $5 to $10 per GiB of bandwidth (useful mostly for botnets: getting around DDoS protections, reCAPTCHA, and similar tasks).
Thank you for your suggestion. Below are only our plans/intentions; we welcome feedback on them:
We are not going to do what you suggest. Instead, our approach is to use the RAM people aren't using at the moment for a fast edge cache close to their area.
We've tried this architecture and get very low latency and high bandwidth. People would not be contributing their resources to anything they don't know about.
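For a flavor of what the donor tab does, here is a minimal sketch (the chunk-index wire format and budget numbers are illustrative, not our actual implementation; signaling through our server is elided):

```typescript
// Sketch: the donor tab serves cached model chunks to other peers over
// a WebRTC data channel, within a user-chosen memory budget.
const MAX_CACHE_BYTES = 512 * 1024 * 1024; // user-configurable budget

const cache = new Map<number, Uint8Array>(); // chunk index -> bytes
let cachedBytes = 0;

// Accept a chunk into the in-RAM cache only if it fits the budget.
function addChunk(index: number, bytes: Uint8Array): boolean {
  if (cachedBytes + bytes.length > MAX_CACHE_BYTES) return false;
  cache.set(index, bytes);
  cachedBytes += bytes.length;
  return true;
}

// Answer peers that request a chunk by its index.
function serveOn(channel: RTCDataChannel): void {
  channel.onmessage = (event: MessageEvent) => {
    const index = Number(event.data);
    const chunk = cache.get(index);
    if (chunk) channel.send(chunk);
  };
}
```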
Torrents require users to download and install a torrent client! In addition, we would like to retain the ability to push live updates to the latest version of a sovereign fine-tuned file, and torrents don't auto-update. We want to keep improving what people get.
Finally, we would like the possibility of setting up market dynamics in the future: if you aren't currently using all your RAM, why not rent it out? This matches the p2p edge architecture we envision.
In addition, our work on WebGPU would allow you to rent out your GPU to a background tab whenever you're not using it. Why have all that silicon sit idle when you could rent it out?
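As a sketch of how a tab could politely wait for idle silicon (the scheduling policy here is hypothetical; the WebGPU calls are the standard ones, though your TypeScript setup may need the WebGPU type definitions):

```typescript
// Sketch: acquire a GPU device only when the tab is hidden, i.e. the
// user is probably not relying on the GPU for foreground work.
async function acquireGpuIfIdle(): Promise<GPUDevice | null> {
  if (document.visibilityState !== "hidden") return null; // user is active
  const adapter = await navigator.gpu?.requestAdapter();
  if (!adapter) return null; // no WebGPU on this machine
  return adapter.requestDevice();
}
```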
You could also donate it to help fine tune our own sovereign model.
All of this will let us bootstrap to the point where we could be trusted with a download.
> We have a local model we would like to distribute but don't have a good CDN.
That is not true. I am serving models off Cloudflare R2, at about 1 petabyte per month of egress, and I basically pay peanuts (~$200 everything included).
1 petabyte per month is 1 million downloads of a 1 GB file. We intend to scale to more than 1 million downloads per month, and we have a specific scaling architecture in mind. We're qualified to say this because we've ported a billion-parameter model to run in your browser - fast - on either WebGPU or WASM. (You can see us doing it live at the YouTube link in my comment above.) There is a lot of demand for that.
The bandwidth is free on Cloudflare R2; I paid for storage (~10 TiB of different models). If you only host a 1 GiB file there, you'd be paying about $0.01 per month, I believe.
Have you compared it with Claude Code at all? Is there a subscription model for Gemini similar to Claude's? Does it have an agent like Claude Code or ChatGPT Codex? What are you using it for? How does it do with large contexts? (Claude Code has a 1 million token context.)
I tried Claude Opus but at least for my tasks, Gemini provided better results. Both were way better than ChatGPT. Haven't done any agents yet, waiting on that until they mature a bit more.
Gemini 3.1 (and Gemini 3) are a lot smarter than Claude Opus 4.6
But...
The Gemini 3 series models are both mediocre at best at agentic coding.
Single-shot questions about a code problem vs. "build this feature autonomously".
Gemini's CLI harness is just not very good, and Gemini's approach to agentic coding leaves a lot to be desired. It doesn't do the double-checking that Codex does, it's slower than Claude, and it runs off and does things without asking or clearly explaining why.
(Claude Code now runs Claude Opus, so they're not so different.)
>it's [Gemini] nowhere near claude opus
Could you be a bit more specific, because your sibling reply says "pretty close to opus performance" so it would help if you gave additional information about how you use it and how you feel the two compare. Thanks.
For anyone getting a wrong answer from reasoning models, try adding "This might be a trick question, don't just go with your first instinct, really think it through" and see if it helps. Some time ago I found that this helped reasoning models get trick questions right. (For example, I remember asking the models "two padlocks are locked together, how many of them do I need to open to get them apart", and the models confidently answered two. When I added the phrase above, however, they thought it through more carefully and got the right answer.)
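If you want to apply it systematically, it's just a prefix on the prompt. A trivial sketch:

```typescript
// Wrap any question with the "trick question" nudge before sending it
// to whatever chat API you use.
const NUDGE =
  "This might be a trick question, don't just go with your first " +
  "instinct, really think it through.";

function withTrickQuestionNudge(question: string): string {
  return `${NUDGE}\n\n${question}`;
}

// The padlock example from above:
const prompt = withTrickQuestionNudge(
  "Two padlocks are locked together, how many of them do I need to open to get them apart?",
);
```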
I agree with this article completely, nice to see it presented quantitatively.
>re "only" the harness changed
In our experience, AIs are like amnesiacs who can barely remember what they did three minutes ago (their last autonomous actions might still be in their context if you're lucky), with no chance of remembering what they did three days ago. As such, the "harness" determines their entire memory and is the single most important determinant of their outcome.
The best harness is a single self-contained, well-commented, obvious, tiny code file, followed by: a plain explanation of what it does and what it's supposed to do; the change request; how you want it done (you have to say it with so much force and confidence that the AI is afraid of getting yelled at if it does anything else); and a large amount of text devoted to asking the AI not to break what is already working. Followed by a request to write a test that passes. Followed by asking for its judgment on whether or not it broke what was already working. All in one tiny, crisp prompt.
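For concreteness, here's the shape of that prompt as a template (the field names are illustrative, not our actual tooling):

```typescript
// Sketch of the single-prompt "harness" described above. The structure
// follows the comment: the whole file, a plain explanation, the change
// request, the exact approach, a don't-break-things plea, a test, and
// a self-review, all in one prompt.
interface HarnessInput {
  code: string;          // the single, small, well-commented file
  whatItDoes: string;    // plain explanation of current and intended behavior
  changeRequest: string; // the one change you want
  howToDoIt: string;     // the exact approach, stated with total confidence
}

function buildPrompt(h: HarnessInput): string {
  return [
    `Here is the entire file:\n${h.code}`,
    `What it does and what it's supposed to do: ${h.whatItDoes}`,
    `Change request: ${h.changeRequest}`,
    `Do it exactly this way and no other way: ${h.howToDoIt}`,
    "Do NOT break anything that already works. This is critical.",
    "Then write a test that passes.",
    "Finally, give your judgment: did you break anything that was working?",
  ].join("\n\n");
}
```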
With such a harness, it manages not to break the code one time in twenty. If you use reverse psychology and ask it to do the opposite of what you want, the odds rise to fifty-fifty that you'll get what you're trying to do.
Don't believe me? You can watch the livestream (see my previous comments).
Does this have an alternative market-based solution?
I see a big problem with job scams from "legitimate" companies advertising jobs they have no plans to fill or which do not exist at all. They seem to do this because they employ HR staff and recruiters, and it looks good to have job openings on their site. It's a real problem.
I believe there could be a solution: charge $500 to post an opening and give $50 to each of ten AI-vetted qualified candidates. That means each candidate has had a real interview and passed it without AI assistance. They get $50 to interview.
If the company doesn't hire any of the ten qualified candidates, then they can pay another $500, or stop pretending that the job they're advertising is real.
What do you think of this novel model?
On the candidate side, it allows them to interview once and then only be called into a job interview if they're being paid $50 to do so.
The system could also go the other way and prevent candidates from making interviewing their living, by only inviting them to a limited number of interviews with job offers on the table - say, three such interviews. After that, they don't get more job offers, since they're not really on the market.
The AI could also represent salary reality more transparently.
Basically, it should be the companies hiring that pay, not the candidates, who already have lots of their time wasted.
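To make the economics concrete, using only the numbers above (the escrow logic itself is hypothetical):

```typescript
// Sketch of one posting round: $500 to post, $50 to each of ten
// AI-vetted candidates; if nobody is hired, the company pays the
// fee again or stops posting.
const POSTING_FEE = 500;
const CANDIDATE_PAYOUT = 50;
const CANDIDATES_PER_ROUND = 10;

function roundPayouts(hiredSomeone: boolean) {
  const paidToCandidates = CANDIDATE_PAYOUT * CANDIDATES_PER_ROUND; // $500
  const feeForNextRound = hiredSomeone ? 0 : POSTING_FEE;
  return { paidToCandidates, feeForNextRound };
}
```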
What do you think of such a market-based approach?
The State of Utopia[1] is currently fine-tuning an older 1 GB model called Bitnet, so that we have something beginning to take the shape of a sovereign model that can run on the edge. We think model sovereignty is important for our citizens, and we are working on tools for them to easily fine-tune the model further, straight from their browser. We are currently running a 30-hour training run on some simple hardware, through WebGPU, so that no trust or installation is required.
We made it possible to run the model on WebGPU, and it is pretty fast even in that environment. You can see the porting process in my last few submissions, because we livestreamed Claude Code porting the base model from the original C++ and Python.
In a separate initiative, we produced a new hash function with AI. However, although it is novel, it might not be novel enough for publication, and it's unclear whether we can publish it. It has several innovations compared to other hash functions.
We are running some other developments and experiments, but don't want to promise more than we can deliver in a working state, so for more information you just have to keep checking stateofutopia.com (or stofut.com for short).
Our biggest challenge at the moment is managing Claude's use of context and versions, while working on live production installs.
Everything takes time and attention, and Claude Code is far from being able to autonomously build new production services on a server - it's not even close. We feel that we have to be in the loop for everything.
[1] eventual goal: technocratic utopia, will be available at stateofutopia.com