The worst vice is superfluous faux-eloquence that meanders without meaning: employing linguistic devices for their own sake without ever actually making a point.
I was trying to figure out why my SD card wasn't mounting and asked ChatGPT. It said:
> Your kernel is actually being very polite here. It sees the USB reader, shakes its hand, reads its name tag… and then nothing further happens. That tells us something important. Let’s walk this like a methodical gremlin.
It's so sickly sweet. I hate it.
Some other quotes:
> Let’s sketch a plan that treats your precious network bandwidth like a fragile desert flower and leans on ZFS to become your staging area.
> But before that, a quick philosophical aside: ZFS is a magnificent beast, but it is also picky.
> Ending thought: the database itself is probably tiny compared to your ebooks, and yet the logging machinery went full dragon-hoard. Once you tame binlogs, Booklore should stop trying to cosplay as a backup solution.
> Nice, progress! Login working is half the battle; now we just have to convince the CSS goblins to show up.
> Hyprland on Manjaro is a bit like running a spaceship engine in a treehouse: entirely possible, but the defaults are not tailored for you, so you have to wire a few things yourself.
> The universe has gifted you one of those delightfully cryptic systemd messages: “Failed to enable… already exists.” Despite the ominous tone, this is usually systemd’s way of saying: “Friend, the thing you’re trying to enable is already enabled.”
Did you put something weird in your prompt? That's not the style of writing I get from my ChatGPT; I run without memory and with the default prompt.
Yours tries to make a metaphor in every single response.
The developers aren't pro players either; the cutting edge in anti-cheat still requires that non-cheaters play with cheaters for months. I would not be shocked if a simple vote-kick outperformed every anti-cheat on the market.
- If you were too good on some server, you'd get banned.
- If the admin didn't know cheating well, he could tolerate something that was obviously cheating.
- Cheaters could just change servers often.
Yes, it used to be easy to just ban people, and it was just as easy for cheaters to switch servers.
Plus, most competitive games today have custom lobbies, which do exactly what you want, and there's a reason only a minority of players use them.
Custom lobbies don't meet the same need. That's for playing with your friends, or at least people you vet yourself. Community servers are sub-communities in and of themselves: people tend to play on the same servers regularly, allowing you to build rapport and community norms, and to have substantially more direct moderation than company-run servers.
Yes, sometimes you run into power-tripping moderators. That comes with the territory of having moderators. But the upsides of being embedded in a usefully sized community with nearly constant human moderation, not to mention the whole "stop killing games" of it all, far outweigh the need to shop around a bit for a good server.
I think the ideal middle ground is something like Squad's server system: The developers offer a contract to server owners, establishing basic standards that must be met to be a recommended server. Rules forbidding the crazy bigotry that milsims tend to attract, minimum server specs to ensure smooth gameplay, an effective appeals process. If a server meets those requirements, and signs the agreement to keep meeting those standards, they get put on a "recommended" server list (which 90%+ of the playerbase exclusively use). Other servers go on the "custom" server list, which can be modded, or spun up for certain events, or whatever.
Two or three months ago I played a game that does exactly what you propose: V Rising. It has a server browser, and I played for a week with a friend on a busy server.
Then the server was gone for two weeks. When it came back, most of the bases were gone due to inactivity.
That's the kind of thing that was common too; maybe you forgot about it.
All the multiplayer games I play today are either community server based, or I exclusively interact with private lobbies.
My negative experiences with community servers represent a pretty short list. Sometimes servers die, but games die sometimes, too. That's obviously only an issue with persistent-state games, like Minecraft, but it's unfortunate when it happens. Can't say it was so frequent that it impacted my enjoyment of any games as a whole.
All true, but of course you're missing the player agency component that renders those issues moot. If any of the above happens, you can simply find another server.
Private games (now called "custom lobbies") were available back then too; they're not equivalent to a public server browser.
They are functionally equivalent for the player.
The problem with player-hosted servers is that it was very hard to get a fair and balanced competitive match, whereas now that's extremely common with matchmaking on servers hosted by the game company.
Back then at least you could do something about it. Now if there's an obvious cheater you just kinda sit there and take your L, and ask people to make reports.
If you were playing on a server you owned or for which you had ban permissions, you could do something about it. Otherwise, you had to hope that an admin was online to ban the cheater. If no one was around to take action, your option was to... sit there, take your L, and ask people to make reports (to the admins). You had the option to hop around between servers until you found one that didn't have cheaters, but is that all that different from just quitting back to matchmaking and hoping you find a match without cheaters?
Edit to add: I'm not disputing that kernel-level anticheat is bad; I agree that it is. I don't think it helps to try and hearken back to a golden age of PC gaming that didn't really exist. Maybe it was easier for server admins to manage because player populations were smaller back then, but that's about all that would have made things "better."
This is drudging up some formative memories. In the counter-strike / TF2 communities you'd have servers that would grant vote kick rights with more playtime and some of those regulars would then apply for mod rights. It worked quite well.
It still doesn't solve the unfair vote-kick problem. People with more playtime don't necessarily have the ability or the tools to judge whether someone is cheating.
Take a look at the Trackmania community: some cheaters are caught years later because they played it smart.
Some cheating can only be detected by looking at statistics, or with hard proof that a cheat was being run.
It's a pub. It doesn't matter as long as it's not obvious aim bots and people are having fun. Besides when it's a 32 player instant respawn death match server you have like 200-300 regulars. That type of cheating was never an issue in those because the servers were always full during peak times and everyone kinda knows each other.
They are not functionally equivalent, unless there are games I'm not familiar with where custom lobbies are published in a list for strangers to join. Normally a custom lobby implies invite only.
Not everyone is interested in a "fair and balanced competitive match" where you're guaranteed to win no more and no less than 50% of the time. I actually find that intolerably boring.
> They are not functionally equivalent, unless there are games I'm not familiar with where custom lobbies are published in a list for strangers to join.
Lots of the most-played competitive games have that, or third-party websites/Discords with links to custom lobbies.
> I have to conclude you're unfamiliar with what multiplayer gaming was like when servers were the norm.
Did you even play a single game competitively? The fact that you keep pushing for server browsers tells me no; you need communities built on something else.
You likely forgot what a hassle server browsers were, and that lots of games didn't have one.
LFG communities were important, and excluding them shows you were only playing casually and have forgotten all the problems server browsers had.
Do you even remember that you could get malware by joining servers from a server list?!
No, I used to play multiplayer games for fun, which was the norm until that option was removed and replaced with derisive "casual" and "competitive" modes.
99% of people who played CS1.x/tf/Q3A/bf1942/cod/etc booted up the game, found a server in the browser with low ping to play on, and if they liked it they favorited it. They came back the next day, and the next, and started to recognize other players. That is the server browser experience.
If you were in the tiny minority of players trying to be "competitive" back then, you're right I don't know what it looked like for you. Sounds like it sucked, honestly, and maybe competitive matchmaking solved some of those problems, but in the bargain we lost a lot of what made those games fun for "casuals" as you smugly call us.
> ...you needed to sink in a lot of time to get the few quality time you wrote about.
Sounds like you've got a skill issue. That doesn't match my experience, like, at all.
But a really, really easy shortcut was to find servers that indicated that they were furry-friendly. This all but guaranteed that
1) The folks on there would be fairly even-keeled and reasonable, and folks who weren't would be rapidly banned forever.
2) The folks on there would generally be good at the game. [0]
3) If you're lucky -and the game is one that permits custom "sprays" (as HL1 and Source engine did)- you might get to see some high-quality-but-thumbnail-sized furry porn.
[0] Seriously, at least back when both server browsers and user-hosted game servers were commonplace, I found a 1:1 correlation between "Are they a furry?" and "Are they particularly good at the game?". It was wild.
> The problem with player hosted servers is that it was very hard to get a fair and balanced competitive match
Playing against overwhelming odds has its own kind of charm. I once spent days just sabotaging the top players on some gun game servers, only winning once or twice myself. Games against friends with various fun handicaps, and flat-out abuse of any knowledge you could gain from playing against the same people repeatedly: what good is a hiding spot when everyone knows you will be there 50% of the time?
"Fair and balanced" games against completely random people are just missing something for me.
This is something matchmaking games totally miss which keeps them from being truly competitive in the way sports or old games were: a competitive community. You need other players with known identities to compare yourself against on a consistent basis.
Of course, classic competitive institutions had problems as well ("he's very competitive" is not necessarily a nice description of a person!), but they seemed more enjoyable than this matchmaking stuff.
I did indeed play in the era LanceH is talking about, and I agree with them! We had many thriving communities with no serious cheating problems because of community moderation.
Yes, there were poorly moderated servers, but you could simply leave and try a different community until you found one that clicked for you. When you require equal moderation everywhere, you throw the baby out with the bath water.
Initially, until you found the right community-run ones? I don't see the issue. Today is worse, especially when there is no server browser, just a black box that drops you into a random match.
I have no ambient lighting.
I have my window open or the CO2 level gets bad.
If I turn on lights, all the fucking insects in the forest will come into my room.
Or I can get a fresh breeze while being on my PC in the evening.
Hmm, wouldn't it sacrifice a better answer in some cases (not sure how many though)?
I'd be surprised if they hadn't specifically trained for structured "correct" output for this, in addition to picking the next token following the structure.
In my experience (I've put hundreds of billions of tokens through structured outputs over the last 18 months), I think the answer is yes, but only in edge cases.
It generally happens when the grammar is highly constrained, for example if a boolean is expected next.
If the model assigns a low probability to both true and false coming next, then the sampling strategy will pick whichever one happens to score highest. Most tokens have very similar probabilities close to 0 most of the time, and if you're picking between two of these then the result will often feel random.
It's always the result of a bad prompt, though: if you improve the prompt so that the model understands the task better, there will be a clear difference in the scores the tokens get, and the result seems less random.
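A toy sketch of the failure mode described above (not any particular library's API; the function and scores are made up): grammar-constrained decoding masks out every token the grammar forbids and picks from what remains, so when the model scores both allowed tokens near zero probability, the "winner" is effectively arbitrary.

```python
def constrained_pick(scores, allowed):
    """Pick the highest-scoring token among those the grammar allows."""
    return max(allowed, key=lambda tok: scores.get(tok, float("-inf")))

# The model "wants" to continue with prose ("The" scores highest overall),
# but the grammar demands a JSON boolean, so we choose between two tokens
# the model considers almost equally unlikely.
scores = {"The": 4.10, "true": -6.21, "false": -6.23, "maybe": 1.00}
print(constrained_pick(scores, {"true", "false"}))  # prints "true"
```

With a better prompt the gap between `true` and `false` widens, and the constrained choice stops feeling like a coin flip.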
It's not just the prompt that matters, it's also field order (and a bunch of other things).
Imagine you're asking your model to give you a list of tasks mentioned in a meeting, along with a boolean indicating whether the task is done. If you put the boolean first, the model must decide both what the task is and whether it is done at the same time. If you put the task description first, the model can separate that work into two distinct steps.
There are more tricks like this. It's really worth thinking about which calculations you delegate to the model and which you do in code, and how you integrate the two.
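A minimal sketch of the field-order point, assuming (as many structured-output implementations do) that the model emits object fields in the order the schema declares them. The schemas and field names here are hypothetical, just to illustrate the meeting-tasks example:

```python
import json

# Boolean first: the model must commit to `done` before it has even
# written out what the task is.
bool_first = {
    "type": "object",
    "properties": {
        "done": {"type": "boolean"},
        "task": {"type": "string"},
    },
}

# Task first: writing the description acts as an intermediate reasoning
# step, so `done` is decided with the task already spelled out.
task_first = {
    "type": "object",
    "properties": {
        "task": {"type": "string"},
        "done": {"type": "boolean"},
    },
}

print(json.dumps(list(task_first["properties"])))  # prints ["task", "done"]
```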
Grammars work best when aligned with the prompt. That is, if your prompt gives you the right format of answer 80% of the time, the grammar will take you to 100%. If it gives you the right answer 1% of the time, the grammar will give you syntactically correct garbage.
Sampling is already constrained with temperature, top_k, top_p, top_a, typical_p, min_p, entropy_penalty, smoothing, etc. Filtering tokens to the ones valid according to a grammar is just yet another alternative. It makes sense and can be used for producing programming-language output as well: what's the point in generating, or bothering with, output that is known up front to be invalid? Better to filter it out and allow valid completions only.
No, that's a rumor lots of people have been taking at face value.
If you do the math, inference is very lucrative.
Here, someone deployed a big model; the cost is $0.20 per 1M tokens:
https://lmsys.org/blog/2025-05-05-large-scale-ep/
The article Zitron links says Cursor has single-digit millions of cash burn with about $1B in the bank (as of August). Assuming that is true, they are losing money but have a long runway.
That article says "Anysphere runs pretty lean with around 150 employees and has a single digit monthly cash burn, a source tells me." That would be total cash burn, i.e., net losses. If their AWS bill is bigger than that it's because they are making up for part of it with revenue.
Ed's mentioned ARR in previous articles and it's not a "generally accepted accounting principle". They cherry pick the highest monthly revenue number and multiply that by 12, but that's not their actual annual revenue.
"Cherry pick the highest" is misleading. If your revenue is growing 10% a month for a year straight and is not seasonal, picking any other than the most recent month to annualize would make no sense.
If a company's revenue in January is $100 and it grows by 10% every month, the December revenue is $285. The year's revenue would be about $2,138, but ARR in December would be $3,423. That's 1.6x the actual revenue.
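The arithmetic above can be checked directly ($100 in January, 10% compounding growth each month):

```python
# Monthly revenue for Jan..Dec: 100 * 1.1^0 .. 100 * 1.1^11
monthly = [100 * 1.1**n for n in range(12)]

december = monthly[-1]   # December revenue
annual = sum(monthly)    # actual revenue for the calendar year
arr = december * 12      # ARR as computed from the December figure

print(round(december), round(annual), round(arr))  # prints 285 2138 3424
```

So annualizing the latest month of a fast-growing company yields a figure about 1.6x the trailing twelve months of actual revenue.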
ARR could be a useful tool to help predict future revenue, but why not simply report actual revenue and suggest it might increase next year? I have found most articles to be unclear about what ARR actually represents.
Why is the calendar year the relevant unit? If you insist on years, then if you consider the year from June to June, $2,138 would be misleading small.
The point of ARR is to give an up to date measure on a rapidly changing number. If you only report projected calendar year revenue, then on January 1 you switch from reporting 2025 annual revenue to 2026 projected revenue, a huge and confusing jump. Why not just report ARR every month? It's basically just a way of reporting monthly revenue — take the number you get and divide it by 12.
I am really skeptical that people are being bamboozled by this in some significant way. Zitron does far more confusing things with numbers in the name of critique.
You're correct that ARR can be both misleading and computed over any 12-month period (I just chose a calendar year to simplify), but the problem is that AI companies tend to release only their latest ARR, and only selectively, which I believe is misleading in the opposite direction.
The "annual" just means that the unit of time is a year. It doesn't mean that it is recurring annually. You can call it Annualized Monthly Recurring Revenue if it makes you feel better.
Well people like Sam Altman have not been entirely honest and there's a reason they're not sharing their actual revenue numbers. If they could show they were growing 10% every month they would.
Eh, when you have a company that’s growing, picking the highest and annualizing it is sensible. If we had a mature company with highly seasonal revenue it would be dishonest.
I mean I think there are instances where OpenAI's revenue is seasonal. Lots of students using it during the school year and cancelling it during summer.
I think you missed the forest for the trees. I am sure the student population has some dropoff during the summer months, but the point is that for businesses that are growing month over month, which most of these have been since creation, you take the highest (latest) number and annualize it.
I am also willing to bet that the student dropoff is not pronounced. I'm thinking more of a business that sells beach umbrellas: they make a lot of sales in the summer months and then next to nothing in the winter. Annualizing that would be dishonest.
I thought a human would be a considerable step up in complexity but I asked it first for a pelican[0] and then for a rat [1] to get out of the bird world and it did a great job on both.
But just for thrills I also asked for a "punk rocker"[2], and the result, while not perfect, is leaps and bounds above anything from the last generation.
0 -- ok, here's the first hurdle! It's giving me "something went wrong" when I try to get a share link on any of my artifacts. So for now it'll have to be a "trust me bro" and I'll try to edit this comment soon.
I never understood the point of the pelican-on-a-bicycle exercise:
LLM coding agents don't have any way to see the output.
It means the only thing this test is testing is the LLMs' ability to memorize.
Because it exercises thinking about a pelican riding a bike (not common) and then describing it using SVG. It's quite nice imho and seems to scale with the power of the model. I'm sure Simon has some actual reasons, though.
I wouldn't say any LLMs are good at it. But it doesn't really matter, it's not a serious thing. It's the equivalent of "hello world" - or whatever your personal "hello world" is - whenever you get your hands on a new language.
The coordinates and shapes of the elements used to form a pelican.
If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.
I bet their ability to form a pelican results purely from someone having already done it before.
> If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.
It's called generalization and yes, they do. I bet you could find plenty of examples of it working on something that truly isn't "present in the training data".
It's funny: you're so convinced that it's not possible without direct memorization, but you forgot to account for emergent behaviors (which are frankly all over the place in LLMs; where have you been?).
At any rate, the pelican thing from simonw is clearly just for fun at this point.