> Large aircraft are the cheapest and most scalable way to deliver a ton of explosive on target.
An important variable missing from your calculus is distance from the munitions factory/supply depot. There are far cheaper and more scalable ways to deliver tons of explosives if your supply lines are short - such as rail, when you're defending your homeland. Carrier groups are both transport and FOBs.
> You should also consider that it is much more difficult to sink a large ship than a small ship.
How did that turn out for the Russian Black Sea flagship, the Moskva?
> It's just a matter of getting the performance good enough.
Who will pay for the ongoing development of (near-)SoTA local models? The good open-weight models are all developed by for-profit companies - you know how that story will end.
Apple, via customers paying for the whole solution (e.g. a laptop that can run decent local models)?
I think Apple had something in the region of $143 billion in revenue in the last quarter.
Not saying it will happen - just that there are a variety of business models out there and in the end it all depends on where consumers put their money.
Spurious keyboard inputs and broken ribbon cables may have been issues in 2003, but tablet-mode laptops made in the last 15 years face no such issues; e.g. the many generations of the Lenovo Yoga series in that period. In 2026, even 7mm-thick phones can have reliable 180°/-180° folding screens - laptops have a lot more volume to play with and fewer lifetime open/close events.
Apple's problems with touchscreen laptops are not mechanical. If Apple were to make a decent touchscreen laptop - say, a 12" MacBook Air with a 360° hinge - it'd cannibalize iPad sales, so they don't make that device, to preserve the segmentation that motivates people to buy both.
That wasn't the only time Jobs trashed a category Apple didn't currently have an on-sale model for but was actively developing; he also slurred 6-inch Android phones as "Hummers", and mocked 7-inch Android tablets as "too small" a little while before Apple launched the iPad Mini.
There’s no contradiction here. Jobs’ point was about the MAIN input method. A touchscreen that requires a stylus as the main input method is still a terrible idea. The Apple Pencil is meant for supplementary and creative input - things you can’t do well with your fingers.
Please, leave that reddit-esque “iSheep”-type of comment out of here.
Chinese companies, starting with CXMT, will own the consumer segment - until they are sanctioned/banned in the US. The rest of the world will be fine, but consumer desktop computing in the US will be akin to the cars in Cuba.
> Now I'm really curious. What field are you in that ndjson files of that size are common?
I'm not OP, but structured JSON logs can easily produce humongous ndjson files, even from a modest fleet of servers over a not-very-long period of time.
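Back of the envelope, with made-up but plausible figures for fleet size and log volume:

```python
# All figures are assumptions for illustration, not measurements.
servers = 200            # a modest fleet
lines_per_sec = 50       # structured log lines per server
bytes_per_line = 400     # a JSON record with a handful of fields
days = 30
total_bytes = servers * lines_per_sec * bytes_per_line * 86_400 * days
print(f"{total_bytes / 1e12:.1f} TB of ndjson")  # ~10.4 TB in a month
```

Tweak any of those numbers down and you still land in the multi-terabyte range pretty quickly.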
Replying here because the other comment is too deeply nested to reply.
Even if it's a once-off, some people handle a lot of once-offs; that's exactly where you need good CLI tooling to support them.
Sure, jq isn't exactly slow, but I've also avoided it in pipelines where I just needed more throughput.
rg was insanely useful on a project I once got with about 5GB of source files, a lot of them auto-generated, that you needed to search through. People were using Notepad++ and waiting minutes for a query to find something in the haystack; rg returned results in seconds.
The use case could be, e.g., processing an old trove of logs into something more easily indexed and queried, and you might want jq as part of that processing pipeline.
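A minimal sketch of that kind of streaming step (the field names "level", "ts", "msg" are assumptions about the log schema, not anything from a real system):

```python
import json

# Stream an ndjson trove line by line, keeping only the records
# worth indexing. Generators keep memory flat regardless of file size.
def extract_errors(lines):
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # old troves often contain corrupt lines
        if rec.get("level") == "error":
            yield {"ts": rec.get("ts"), "msg": rec.get("msg")}
```

The same shape works whether the input is a file, stdin, or a decompression stream.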
Fair, but for a once-off thing performance isn't usually a major factor.
The comment I was replying to implied this was something more regular.
EDIT: why is this being downvoted? I didn't think I was rude. The person I responded to made a good point, I was just clarifying that it wasn't quite the situation I was asking about.
At scale, low performance can very easily mean "longer than the lifetime of the universe to execute." The question isn't how quickly something will get done, but whether it can be done at all.
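To make that concrete, a toy calculation (the operation count and machine speed are illustrative, not tied to anything upthread):

```python
# Illustrative: a brute-force search over 2^100 states on a machine
# doing a billion operations per second.
ops = 2 ** 100
rate = 1e9                         # operations per second
years = ops / rate / (365 * 24 * 3600)
universe_age_years = 1.38e10
print(years > universe_age_years)  # True: roughly 4e13 years
```

Past a certain exponent, no amount of patience or hardware budget turns "slow" into "eventually".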
Good point. I said it above, but I'll repeat it here: I shouldn't have discounted how frequent once-offs can be. I've worked in support before, so I really should've known better.
Certain people/businesses deal with one-off things every day. Even for something truly one-off, if one tool is too slow it might still be the difference between being able to do it once or not at all.
> I feel like we are just inching closer and closer to a world where rapid iteration of software will be by default.
There's a lot of experimentation right now, but one thing that's guaranteed is that the data gatekeepers will slam the door shut[1] - or install a toll booth - once there's less money sloshing about and the winners and losers are clear. At some point in the future, Atlassian and Github may not grant Anthropic access to your tickets unless you're on the relevant tier with the appropriate "NIH AI" surcharge.
1. AI does not suspend or supplant good old capitalism and the cult of profit maximization.
Models aren't just the big bags of floats you imagine them to be. Those bags are there, but there's a whole layer of runtimes, caches, timers, load balancers, classifiers/sanitizers, etc. around them, all of which have tunable parameters that affect the user-perceptible output.
It's still engineering. Even magic alien tech from outer space would end up with an interface layer to manage it :).
ETA: reminds me of biology, too. In life, it turns out that the simpler some functional component looks, the more stupidly overcomplicated it is when you look at it under a microscope.
There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime.
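A minimal sketch of that lever, with made-up thresholds and model names (nothing here reflects any provider's actual routing):

```python
# Illustrative load-shedding lever: route traffic to progressively
# more quantized models as fleet utilization climbs, instead of
# failing requests outright.
def pick_model(current_qps: float, capacity_qps: float) -> str:
    load = current_qps / capacity_qps
    if load < 0.8:
        return "model-fp16"   # full-precision fleet, normal operation
    if load < 0.95:
        return "model-int8"   # quantized: cheaper, slightly nerfed
    return "model-int4"       # last resort before shedding traffic
```

From the operator's side this is just graceful degradation; from the user's side it looks like the model got dumber for an afternoon.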
That isn't true. The whole point is to pick up statistically significant variations quickly, and with the volume of tests they're doing there is plenty of data.
If you turn on the 95% CI bands you can see there is plenty of statistical significance.
Anybody with more than five years in the tech industry has seen this done in every domain, time and again. What evidence do you have that AI is different? That is the extraordinary claim in this case...
Real world usage suggests otherwise. It's been a known trend for a while. Anthropic even confirmed as much ~6 months ago but said it was a "bug" - one that somehow just keeps happening 4-6 months after a model is released.
Real world usage is unlikely to give you the large sample sizes needed to reliably detect the differences between models. Standard error scales as the inverse square root of sample size, so even a difference as large as 10 percentage points would require hundreds of samples.
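Roughly, for two pass rates near 50% (the worst case for variance), a normal-approximation two-sample comparison at the 95% level needs on the order of:

```python
import math

# Back-of-the-envelope sample size to distinguish a 10-point gap
# between two pass rates near 50%, normal approximation, 95% level.
# No power adjustment - a properly powered test needs roughly double.
p, diff, z = 0.5, 0.10, 1.96
n = math.ceil(2 * p * (1 - p) * (z / diff) ** 2)
print(n)  # samples per model
```

That's ~193 tasks per model for a 10-point gap; halve the gap and the requirement quadruples.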
https://marginlab.ai/trackers/claude-code/ tries to track Claude Opus performance on SWE-Bench-Pro, but since they only sample 50 tasks per day, the confidence intervals are very wide. (This was submitted 2 months ago https://news.ycombinator.com/item?id=46810282 when they "detected" a statistically significant deviation, but that was because they used the first day's measurement as the baseline, so at some point they had enough samples to notice that this was significantly different from the long-term average. It seems like they have fixed this error by now.)
It's hard to trust public, high-profile benchmarks, because any change to a specific model (Opus 4.5 in this case) can be rejected internally if it regresses on SWE-Bench-Pro, so everything that actually gets released will perform well on that benchmark.
Any other benchmark at that sample size would have similarly huge error bars. Unless Anthropic makes a model that works 100% of the time or writes a bug that brings it all the way to zero, it's going to work sometimes and fail sometimes, and anyone who thinks they can spot small changes in how often it works without running an astonishingly large number of tests is fooling themselves with measurement noise.
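For a sense of scale, here is a normal-approximation 95% CI on a 56% pass rate measured over 50 tasks (the numbers are illustrative):

```python
import math

# 95% confidence interval (normal approximation) for a pass rate
# estimated from only 50 tasks.
n, passes = 50, 28            # 28/50 = 56% measured pass rate
p_hat = passes / n
half = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - half, p_hat + half
print(f"{lo:.2f} to {hi:.2f}")  # roughly 0.42 to 0.70
```

An interval that wide swallows a 53%-vs-56% "regression" whole.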
They do. I'm currently seeing degradation on Opus 4.6 on tasks it could do without trouble a few months back. Obviously I'm a sample of n=1, but I'm also convinced a new model is around the corner and they preemptively nerf the current model so people notice the "improvement".
Well, I don't see 4.5 on there ... so I'm not sure what you're trying to say.
And today is a 53% pass rate vs. a baseline 56% pass rate. That's a huge difference. If we recall what Anthropic originally promised a "max 5" user https://github.com/anthropics/claude-code/issues/16157#issue... -- which they've since removed from their site...
50-200 prompts. That's an extra 1-6 "wrong solutions" per 5 hours ... and you have to get a lot of wrong answers to arrive at a wrong solution.
I think the conspiracy theories are silly, but equally I think pretending these black boxes are completely stable once they're released is incorrect as well.
Until the AI scrapers[1] come for you at 5k requests per second and you're doing operations in hard mode.
1. Most forges have HTTP pages for discoverability. I suppose one could hypothetically set up an ssh-only forge and statically generate an HTML site periodically, but this is already advanced ops for the average Github user.
This isn't a real thing, and if it ever becomes a thing you can sue them for DDoS and send Sam Altman to jail. AI scraping is in the realm of 1-5 requests per second, not 5,000.