> it was born of some arrogance that they were speeding towards the inevitability of AGI
I think it was partly also PR. Google, OpenAI and Anthropic are fighting for mindshare, and DALL-E, Sora, Nano Banana, etc. generated a lot of media buzz for Google and OpenAI at various points in time.
He was on stage and had a mic. I don’t know that the students had a lot of options to make their voices heard in the situation. And since folks like Schmidt already have access to channels to spread their opinions, and this was the students’ graduation, I think they get a pass.
People who reach outlier-level success in a field tend to have strong opinions and an emotional connection to said field. It’s probably a non-trivial part of why they are so successful.
Maybe for folks who are deep into this, but it’s not exactly accessible. I tried reading up on it a couple of months ago, but between parsing through what hardware I needed, the model and how to configure it (model size vs. quantization), and how I’d get access to the hardware (for decent results in coding, new hardware ran $4k-$10k last I checked), it had a non-trivial barrier to entry. I was trying to do this over a long weekend and ran out of time. I’ll have to look into it again because having the local option would be great.
Edit: the replies to my comment are great examples of what I’m talking about when I say it’s hard to determine what hardware I’d need :).
For me the big hangup is the hardware. If I could find a simple guide to putting together a machine that I can run off an outlet in my home, I am sold. The problem is that I haven't found this yet (though I suppose I haven't looked very hard either).
$10K should be enough to pay for a 512GB RAM machine which in combination with partial SSD offload for the remaining memory requirements should be able to run SOTA models like DS4-Pro or Kimi 2.6 at workable speed. It depends whether MoE weights have enough locality over time that the SSD offload part is ultimately a minor factor.
(If you are willing to let the machine work mostly overnight/unattended, with only incidental and sporadic human intervention, you could even decrease that memory requirement a bit.)
As a typical example DeepSeek v4-pro has 59B active params at mostly FP4 size, so it needs to "find" around 30GB worth of params in RAM per inferred token. On a 512GB total RAM machine, most of those params will actually be cached in RAM (model size on disk is around 862GB), so assuming for the sake of argument that MoE expert selection is completely random and unpredictable, around 15GB in total have to be fetched from storage per token. If MoE selection is not completely random and there's enough locality, that figure actually improves quite a bit and inference becomes quite workable.
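A rough sketch of the arithmetic above, in Python. The numbers (59B active params, roughly 0.5 bytes per param at FP4, 862GB on disk, 512GB RAM) come from the comment; the function names, the uniformly-random expert selection, and the SSD bandwidth figure are assumptions for illustration only, not measurements.

```python
def active_gb_per_token(active_params_billions: float, bytes_per_param: float) -> float:
    """GB of weights touched per inferred token (billions of params * bytes each)."""
    return active_params_billions * bytes_per_param


def storage_gb_per_token(active_gb: float, model_gb: float, ram_for_weights_gb: float) -> float:
    """GB that must come from storage per token.

    Assumes expert selection is uniformly random, so the chance a needed
    weight is NOT already cached equals the fraction of the model that
    does not fit in RAM.
    """
    miss_fraction = max(0.0, 1.0 - ram_for_weights_gb / model_gb)
    return active_gb * miss_fraction


def tokens_per_second(storage_gb: float, ssd_gb_per_s: float) -> float:
    """Upper bound on throughput if storage reads are the only bottleneck."""
    if storage_gb == 0:
        return float("inf")
    return ssd_gb_per_s / storage_gb


# Figures taken from the comment above.
active = active_gb_per_token(59, 0.5)            # ~30 GB touched per token
best_case = storage_gb_per_token(active, 862, 512)   # all 512 GB holds weights
realistic = storage_gb_per_token(active, 862, 430)   # assumed: ~80 GB lost to OS, KV cache, etc.

# Hypothetical 7 GB/s sequential read; real random-read rates may be lower.
print(f"active per token:  {active:.1f} GB")
print(f"from storage:      {best_case:.1f} GB (best case), {realistic:.1f} GB (430 GB usable)")
print(f"throughput bound:  {tokens_per_second(best_case, 7.0):.2f} tok/s at 7 GB/s")
```

With every byte of the 512GB holding weights this comes out to roughly 12GB per token; the "around 15GB" figure corresponds to about 430GB of RAM actually being available for weights. Under the fully random assumption and a single drive at the assumed bandwidth, the bound is under one token per second, so any workable speed depends on the locality mentioned above.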
I've never seen reports of this kind of setup being able to deliver more than low single-digit tokens per second. That's certainly not usable interactively, and only of limited utility for "leave it to think overnight" tasks. Am I missing something?
Also, I don't know of a general solution to streaming models from disk. Is there an inference engine that has this built-in in a way that is generally applicable for any model? I know (I mean, I've seen people say it, I haven't tried it) you can use swap memory with CPU offloading in llama.cpp, and I can imagine that would probably work...but definitely slowly. I don't know if it automatically handles putting the most important routing layers on the GPU before offloading other stuff to system RAM/swap, though. I know system RAM would, over time, come to hold the hottest selection of layers most of the time as that's how swap works. Some people seem to be manually splitting up the layers and distributing them across GPU and system RAM.
Have you actually done this? On what hardware? With what inference engine?
Of the big three, Gemini gives me the worst responses for the type of tasks I give it. I haven’t really tried it for agentic coding, but the LLM itself often gives long, meandering answers and adds weird little bits of editorializing that are unnecessary at best and misleading at worst.
Same. The tone is really off. Here is a response I just got from Gemini 3.1: "Your simulation results are incredibly insightful, and they actually touch on one of the most notoriously difficult aspects of ..." It's pure bullshit, my simulation results are in fact broken, GPT spotted it immediately.
I get that credit cards are a barrier to entry, but I’m more willing to give providers a break now that AI agents make it much easier to abuse free tiers. It’s also harder for smaller companies to offer free tiers. If we want a more diverse set of service providers, we as customers need to be willing to accept some trade-offs.
The irony in your comment is that you accuse the OP of interpreting the world based on his own warped view of it rather than what’s actually in front of him, yet you’re doing precisely that. The OP did not call Altman racist and made a point to draw the distinction. He also claims his is not the only example of this and is effectively encouraging an investigative journalist and the rest of HN to look into it and verify for ourselves.
Some degree of skepticism is healthy here. An online comment is not definitive proof, and it’s all too easy to pile on accusations as part of a comment thread that’s already critical of someone. But the way you readily armchair psychoanalyze and dismiss the OP tells me you’re not engaging in an honest way.