grantpitt · 2025-11-24T19:10:21 1764011421

do say more

GodelNumbering · 2025-11-24T19:18:51 1764011931

Makes it sound like a one trick pony

jascha_eng · 2025-11-24T19:34:56 1764012896

Anthropic is leaning heavily into agentic coding, so it makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark where Google did not get the top spot last week. Claude remains king; that's all that matters here.

Mkengin · 2025-11-24T23:30:09 1764027009

I am eagerly awaiting swe-rebench results for November with all the new models: https://swe-rebench.com/

grantpitt · 2025-11-24T19:27:12 1764012432

well, it's a big trick

grantpitt · 2025-11-21T15:50:24 1763740224

Any application that can run in the browser, will eventually run in the browser.

grantpitt · 2025-11-20T16:09:52 1763654992

Huh, can you share a link? I tried here: https://gemini.google.com/share/e753745dfc5d

evrenesat · 2025-11-20T16:12:52 1763655172

https://gemini.google.com/share/79fe1a38e440

gandreani · 2025-11-20T16:33:10 1763656390

Maybe somewhere in the original comment it would have been fair to mention you can barely see the house in the original photo. This is actually a hilarious complaint

Jaxan · 2025-11-20T16:38:57 1763656737

Maybe. But this is not an edge case. I consider this genuine use of the marketed tool.

evrenesat · 2025-11-20T16:42:47 1763656967

That cannot be a valid excuse. Other than adding extra windows to the clearly visible wall, it's obvious that the model is perfectly capable of "seeing" the house. It just cannot "believe" that there can be a big empty wall on a garden house.

WesleyJohnson · 2025-11-20T21:17:55 1763673475

https://gemini.google.com/share/3b4d2cd55778

grantpitt · 2025-11-18T16:59:11 1763485151

Agreed, it also leads performance on arc-agi-1. Here's the leaderboard where you can toggle between arc-agi-1 and 2: https://arcprize.org/leaderboard

energy123 · 2025-11-18T22:46:26 1763505986

It leads on arc-agi-1 with Gemini 3.0 Deep Think, which uses "tool calls" according to google's post, whereas regular Gemini 3.0 Pro doesn't use "tool calls" for the same benchmark. I am unsure how significant this difference is.

grantpitt · 2025-10-07T17:10:32 1759857032

Very interesting to hear two technologists at a tech business conference say things along the lines of: "our tools do not merely extend us, they transform us", followed up with "we've become numb to the devastating consequences of technology".

(I know I'm somewhat selectively reading but still)

grantpitt · 2025-08-12T17:06:00 1755018360

Interesting because games are exactly the kinds of RL environments that models can effectively learn - but the catch is that they must do this learning on the fly at test time. Very exciting to see this.

grantpitt · on May 24, 2024

Right, like in math, not all infinite sequences contain every finite subsequence. For example, a non-repeating sequence of 2's and 7's contains no sequence "4". The further condition is that the number be normal[1].

Also, TIL we don't know whether π is normal, so the popular claim that "every string of numbers eventually occurs in π" is not known to be true.

[1] https://en.wikipedia.org/wiki/Normal_number
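
A quick sketch of the 2's-and-7's example, assuming the Thue-Morse sequence as the non-repeating sequence (my choice for illustration; any aperiodic sequence over {2, 7} would do). The name `thue_morse_27` is hypothetical. The check only covers a finite prefix, so it illustrates the point rather than proves it:

```python
def thue_morse_27(n):
    """Return the first n terms of the Thue-Morse sequence, written with 2 and 7."""
    # Term i is "2" if i has an even number of 1 bits in binary, else "7".
    return "".join("2" if bin(i).count("1") % 2 == 0 else "7" for i in range(n))


digits = thue_morse_27(10_000)

# The alphabet is {2, 7}, so the finite string "4" can never appear.
contains_four = "4" in digits

# Thue-Morse is cube-free: no block occurs three times in a row,
# so it never settles into a repeating pattern.
contains_cube = any(block * 3 in digits for block in ("2", "7", "27", "72"))

print(digits[:16], contains_four, contains_cube)
```

So the sequence is infinite and never repeats, yet it misses infinitely many finite strings. Normality is the extra condition that rules this out.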

grantpitt · on April 19, 2024

Nice post. With respect to maximizing future options, I find the ideas in the following quotes to be interesting counterpoints.

From '4,000 weeks': "Not only should you settle; ideally you should settle in a way that makes it harder to back out, such as moving in together, or having a child. The irony of all our efforts to avoid facing finitude -- to carry on believing that it might be possible not to choose between mutually exclusive options -- is that when people finally do choose, in a relatively irreversible way, they're usually much happier as a result."

From 'Zero to One': "When people lack concrete plans to carry out, they use formal rules to assemble a portfolio of various options. ... A definite view, by contrast, favors firm convictions. Instead of pursuing many-sided mediocrity and calling it "well-roundedness," a definite person determines the one best thing to do and then does it."

grantpitt · on April 12, 2024

> "By the 1880s, their numbers had plummeted from around 30 million to a mere 325"

I thought that must've been a typo upon first reading. Even 325,000 would be shocking. Amazing that the conservation efforts seem to have worked well.

littlestymaar · on April 12, 2024

It hasn't worked perfectly though, as the vast majority of American bison (all but 4 herds, in fact) aren't true bison but carry cattle genes due to hybridization.

hasmanean · on April 12, 2024

Now the USA has 87 million cattle.

Private cattle ranches could never have competed with 30 million bison owned by nature, free for anyone to take.

cryptonector · on April 12, 2024

Many ranches raise bison. And even deer.

The population being wild back then wouldn't have precluded ranching bison, or even cows.

hasmanean · on April 13, 2024

How do you make a fenced off ranch when 30 million buffalo decide to stampede through?

littlestymaar · on April 14, 2024

You wrote it yourself: you fence it off!

hasmanean · on April 19, 2024

And how does a stampeding herd coordinate to move around a fence?

Buffalo will run off a cliff in the hundreds because those in the back can’t see what’s in the front.

defrost · on April 19, 2024

Or because they've been driven from the rear and prompted by a lead decoy to run toward a natural ha-ha, a drop hidden by the curve of the land.

https://en.wikipedia.org/wiki/Buffalo_jump

https://en.wikipedia.org/wiki/Ha-ha

There are secondhand accounts from indigenous people, and excavated bone sites with tools and spear tips, but reportedly no firsthand recorded observations from Europeans of buffalo running off cliffs.

littlestymaar · on April 21, 2024

Buffalo aren't exactly Main Battle Tanks, you know: it's not that difficult to build fences that are able to stop them (it just won't be the light fences you have in mind).

Also, not deliberately driving the beasts toward your fences is going to help a bit, since buffalo don't run off cliffs on their own…

grantpitt · on March 29, 2024

That quote is him criticizing a certain shift in science and education he perceives, right? Not him advocating for that position.