Hacker News | obblekk's comments

80% on SWE-bench Verified is incredible. A year ago the best model was at ~30%. I wonder if we'll soon have a convincingly superhuman coding capability (even in a narrow field like kernel optimization).

This is the most interesting time for software tools since compilers and static type checking were invented.


Last year’s models were at 50-60% on SWE-bench Verified, actually.


I see 25-29% here https://www.swebench.com/viewer.html for models released in Nov 2024, albeit not on the Verified subset. GPT-4o (Aug 2024) scored 33% on SWE-bench Verified.

Important point, because people have a bias toward underestimating the speed of AI progress.


Do you people think nobody calls your bluff?

Here’s the launch announcement for Claude 3.5 Sonnet from a year and a month ago. Guess the number. OK, I'll tell you: 49.0%. So yeah, the comment you replied to was not really off.

https://www.anthropic.com/news/3-5-models-and-computer-use


1yr timeline is ambitious if it means fully deployed.

Clearly the right thing for Sweden and others to do. Also worrying that even 3yrs into the Russian invasion, bordering countries are urgently increasing their preparedness for future conflicts.


I believe most of those POS systems can operate in offline mode, in Europe at least. I have friends who work for large event organizers, and they have spoken about how if the system is offline the bars can continue to take payments, but there is a risk as a person's account may not have sufficient balance to make the charge when the system comes back online.

Most people here pay by card and I would say the vast majority use debit cards. A lot of people don't even have credit cards, unlike the US.

I'm no expert, so I may be wrong about some of this, and maybe huge events like these have these systems in place because of the risk of having to shut down bars etc. Many events are completely cashless these days.


> I believe most of those POS systems can operate in offline mode, in Europe at least.

I was able to pay "offline" for my groceries at a nearby corner store when their terminal had a really bad connection or none at all, and that happened a lot. They just gathered all the payments, and when the "computer guy" was around he'd upload them to the Internet. The only caveat was that, for some reason, these payments would be stuck on the transaction list for more than a week.
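The "gather payments now, upload them later" behaviour is generally called store-and-forward. A minimal Python sketch of the idea, assuming a hypothetical `submit` callback that stands in for the connection to the payment processor and returns False when a payment is declined; real terminals follow their card scheme's rules, which this does not model:

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Payment:
    card_id: str
    amount_cents: int


class StoreAndForwardTerminal:
    """Toy terminal: records payments while offline, uploads them later."""

    def __init__(self) -> None:
        self.pending: deque[Payment] = deque()

    def take_payment(self, payment: Payment) -> None:
        # Offline: the sale is accepted immediately and only queued locally.
        self.pending.append(payment)

    def upload(self, submit: Callable[[Payment], bool]) -> list[Payment]:
        """Send queued payments in order and return the ones that were declined.

        Declines (e.g. insufficient balance) are the risk the merchant
        takes on by accepting payments offline.
        """
        declined: list[Payment] = []
        while self.pending:
            payment = self.pending.popleft()
            if not submit(payment):
                declined.append(payment)
        return declined


# Usage with a fake issuer that checks balances (hypothetical data).
balances = {"alice": 5_000, "bob": 300}


def fake_submit(payment: Payment) -> bool:
    if balances[payment.card_id] >= payment.amount_cents:
        balances[payment.card_id] -= payment.amount_cents
        return True
    return False


terminal = StoreAndForwardTerminal()
terminal.take_payment(Payment("alice", 1_200))
terminal.take_payment(Payment("bob", 900))
declined = terminal.upload(fake_submit)
print(declined)  # [Payment(card_id='bob', amount_cents=900)]
```

The queue makes the trade-off explicit: the customer walks away with the goods before anyone knows whether the money is there.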


The standards are already designed and widely implemented in Europe and a smallish percentage of transactions are already fully offline.

I suspect this could be implemented with just policy and config changes, with no need to reissue cards or deploy new readers.


Right, basically all EMV cards are ready. You just need one that has some offline tolerance, as there are limits on both the amount and the number of consecutive offline transactions. I believe these settings can be updated on the chip, i.e. your bank will tell you to go to an ATM and perform any operation to make sure it's up to date.
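The two limits can be pictured as counters the chip keeps between online contacts with the bank. The sketch below is a deliberate simplification: the class and field names and the single combined check are hypothetical, not the actual EMV data elements or the full card/terminal risk-management rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OfflineLimits:
    # Hypothetical issuer-set parameters, the kind of thing a bank could
    # refresh on the chip during an online transaction (e.g. at an ATM).
    max_consecutive_offline: int
    max_cumulative_offline_cents: int


@dataclass
class CardOfflineState:
    limits: OfflineLimits
    consecutive_offline: int = 0
    cumulative_offline_cents: int = 0

    def approve_offline(self, amount_cents: int) -> bool:
        """Approve offline only if both the count and the total amount stay within limits."""
        within_count = self.consecutive_offline + 1 <= self.limits.max_consecutive_offline
        within_amount = (
            self.cumulative_offline_cents + amount_cents
            <= self.limits.max_cumulative_offline_cents
        )
        if not (within_count and within_amount):
            return False  # the terminal would have to go online or decline
        self.consecutive_offline += 1
        self.cumulative_offline_cents += amount_cents
        return True

    def sync_online(self, new_limits: OfflineLimits | None = None) -> None:
        """An online transaction resets the counters and may refresh the limits."""
        self.consecutive_offline = 0
        self.cumulative_offline_cents = 0
        if new_limits is not None:
            self.limits = new_limits


card = CardOfflineState(OfflineLimits(max_consecutive_offline=3, max_cumulative_offline_cents=10_000))
print([card.approve_offline(a) for a in (4_000, 4_000, 4_000, 1_000, 100)])
# [True, True, False, True, False]: the third fails on amount, the last on count
```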

Payment terminals might be trickier as we've observed during outages that they currently don't fall back to offline transactions. But their software and business rules can obviously be updated.


This is also partially due to hacking incidents in recent years. In 2021, all 800 Coop grocery stores were closed for a few days due to the Kaseya VSA ransomware attack [1].

[1]: https://en.wikipedia.org/wiki/Kaseya_VSA_ransomware_attack


This is awesome. Coolest hacker demo I've seen in a while.


These critiques of ZIRP never explain what should have been done differently.

It's very unlikely the Fed kept rates too low, because inflation didn't exceed 2% for the 12 years from 2008 to 2021. If the Fed had actually made money "too cheap" inflation would have kicked in much earlier.

More likely, we made it very hard to do new stuff in the country, which made the value of borrowed money low.

Congress spent decades adding regulations (often for good reason) that ultimately resulted in it being too expensive to do most things inside the US. If a business is banned from investing in most new things, it won't need to borrow more money.

The Fed just responded to the market's appraisal of money value. You can even see this in the long term charts - interest rates declined almost continuously from 1984 to 2022.

The entire time, inflation stayed at or below 2%. If the Fed had kept rates higher, theory would predict they would have caused a recession (and Scott Sumner has spent more than a decade arguing this is actually what caused the long, deep recession of 2008 - money was too expensive, even at 0%).

Ultimately money is neutral in the long run. The things that truly matter for growth are laws, culture, natural resources and education. These are the causes of our present social dysfunction. These are the issues we should focus on fixing. Not fiddling with interest rates.


The Fed could have instructed Congress to tax any growth in retained or dividended profits in excess of the annual Fed inflation target. ‘Pay more wages or charge less money’ is a direct lever over inflation, even after a partial reduction in overgrowth penalty tax is granted for research spending, and isn’t prone to causing deflation. Sadly, they don’t consider this lever in-scope for economic policy, and as far as I know they didn’t even try (and so Congress had no opportunity to consider, refuse, or ignore their request).
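To make the proposal concrete, here is one possible reading of it as arithmetic. Everything in it is hypothetical: the comment doesn't specify a tax rate, how the research reduction works, or whether growth is measured on retained profits or dividends, so the sketch treats "profit" as a single number and the research reduction as a capped credit.

```python
def overgrowth_tax(
    prev_profit: float,
    curr_profit: float,
    inflation_target: float,
    tax_rate: float,
    research_spend: float,
    research_credit_rate: float,
) -> float:
    """Tax on profit growth beyond the inflation target, minus a research credit.

    All parameters are hypothetical; this is one reading of the proposal,
    not an existing or formally proposed tax.
    """
    allowed_profit = prev_profit * (1 + inflation_target)
    excess = max(0.0, curr_profit - allowed_profit)
    gross_tax = excess * tax_rate
    # Partial reduction for research spending, capped so the tax never goes negative.
    credit = min(gross_tax, research_spend * research_credit_rate)
    return gross_tax - credit


# Profit grows from 100 to 110 against a 2% target: 8 of "excess" growth.
print(overgrowth_tax(100, 110, 0.02, 0.5, research_spend=4, research_credit_rate=0.5))
```

A business could avoid the tax entirely by keeping profit growth at or under the target, i.e. by paying more in wages or charging less, which is the lever the comment describes.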


Do you have an example of other times the Federal Reserve has lobbied Congress to adopt a policy?

The closest I could find by searching "federal reserve asks congress" was Fed Chair Jerome Powell asking Congress to clarify the legality of marijuana[1].

I think the Fed's independence cuts both ways. They don't negotiate on fiscal policy, and the government doesn't negotiate on monetary policy (except they absolutely try to, obviously).

1: https://apnews.com/article/business-jerome-powell-congress-d...


I’ll consider your question, but I’m unlikely to complete the necessary research in time to respond before the thirty-day comment window closes. I do generally recommend this article as a useful survey slash starting point for understanding the nature of the relationship between Congress and the Fed if you’d like to pursue it further yourself:

https://harvardlawreview.org/print/vol-138/the-federal-reser...


Is anyone arguing that the Fed is responsible for "total price levels"?

The fed funds rate is used as the foundation for nearly all lending that occurs in the US, and setting it is a key tool for regulating credit (the only job they have per this paper).

The paper also lays out how credit/money creation and destruction impacts the overall economy, so I can see how the Fed's actions related to regulating credit impact overall prices, but it's a second order effect at best and there doesn't seem to be any reasonable alternative?

I guess the issue is that the paper keeps talking about inflation caused by "supply-side issues", as if the cause of inflation can be clearly discerned at any point in time or that whatever inflation is caused by supply-side issues isn't tightly linked to the amount of credit available and therefore would fall under the purview of the Fed?


The Fed is responsible for controlling inflation; if taxation of excess business growth (relative to household spending power) beyond the inflation target serves that purpose, then how businesses choose to spend the excess growth (to avoid tax penalties) — either by lowering the price level and/or raising total wages paid — is entirely up to each business to determine in their specific circumstance and industry.


I don't understand your point.

I agree tax rates are disinflationary in theory and I believe in practice to an extent, so it could be another tool in the Fed's toolbox but not one I think they want to have or one that Congress is willing (or arguably able on their own, though they wouldn't have to if the Fed's role were just advisory) to give up.

But what does that have to do with the paper you posted?


Ah, we both agree on the reality of an unwilling Congress, for sure :) I misunderstood your request for a paper as topic unfamiliarity, apologies; feel free to disregard the link posted.


In fact, the Fed did try a time or two to get off the 0% rate in the decade after 2008. The economy reacted badly, so they backed off.


> These critiques of ZIRP never explain what should have been done differently.

Controversially: They don't have to. The point of this piece (and similar ones) isn't "how we should treat the next similar-to-2008 financial crisis", it's that the current desire to "RETRVN TO ZIRP" is dangerous.

Perhaps there truly was no better way to respond to the Great Recession. There's a pretty good case to be made that the recovery has been a wondrous success of modern central banking. That doesn't change the fact that ZIRP had downsides, big ones.

> If the Fed had actually made money "too cheap" inflation would have kicked in much earlier.

I'm not terribly sold on this economic theory, so do take that in mind for the rest of this comment.

One thing to consider: While there was shockingly little inflation in general goods and services, there has been pretty notable asset price inflation. P/E ratios have been steadily creeping up since 2010. The real estate market's supply shortage has turbocharged its inflation.

> Ultimately money is neutral in the long run.

This is true, but not particularly useful; it's a very "long" "long run". The conceit of modern central banking is to "smooth out" that long-term trend, and we have many examples where (even non-modern) monetary policy can screw up a country.

The big problem with Trump's desire to hit the gas on the economy and slam interest rates into the floor is that this just very clearly does not work. At worst he'll quickly find himself in the situation Erdogan got himself into; at best (that is, the "best Trump seizes the Fed" scenario) the US will find itself in an asset bubble economy like Japan.

And unlike Japan and Turkey, the US has a lot more to lose. Pension funds make up a sizable portion of the wealth and spending in the US. If the stock market were to take a hit similar to the Dotcom bubble crash or "Lost Decade(s)", pensions would need to be adjusted downwards, with disastrous and self-reinforcing economic consequences. (Another fun layer to this is that those pensions tend to be supplemented by real estate assets, which are not in a "true" bubble but will most certainly collapse in a steep recession.)


Then you add in the fact pensions are invested into Private Equity, and it’s glaringly obvious how fragile (and already broken) the US economy is.

Returning to ZIRP is bad, as is removing the Fed’s independence, as is allowing PE to continue operating unchecked, as is a whole bunch of other stuff (over regulation of small businesses, under regulation of corporate behemoths, over reliance on government assistance programs by workers of for-profit companies due to low wages, the precarity of gig work, the displacement of educated workers by automation while simultaneously dismantling social safety nets, the student debt crisis, the housing crisis, the auto crisis, infrastructure crisis, etc, etc).

As you pointed out (and the initial detractor ignores), it’s not that ZIRP itself was bad in theory, but rather that a return to it - knowing the harms it caused - is bad, and that there is no outcome of the Fed losing independence that doesn’t end with the wholesale demolition of the foundation of the global economy, that being the US Dollar and Economic engine lifting all boats through political independence.


There was inflation in asset prices during that period. Prices for health and education also went up far above 2% annually. The problem is that CPI is a deliberately incomplete measure.


Have people been dropping Cursor for Claude Code? I've dropped down to using Cursor as just an IDE with autocomplete. Curious if others are doing this too.


Dropped it long ago for Roo Code personally.


Autocomplete isn’t an AI thing


Cursor’s autocomplete is Supermaven (which they acquired).

From the site : “Supermaven uses Babble, a proprietary language model specifically optimized for inline code completion. Our in-house models and serving infrastructure allow us to provide the fastest completions and the longest context window of any copilot.”


LLMs are literally auto-complete models. It just so happens that when your auto-complete model gets big enough, and you poke it in the right way, it accidentally pretends to be intelligent. And it turns out that pretending to be intelligent is almost as useful as actually being intelligent.
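To make the "auto-complete" framing concrete, here is a toy next-word predictor: a bigram counter with greedy decoding. It is vastly simpler than a transformer LLM (no embeddings, no attention, one word of context) and is only meant to show the predict-the-next-token loop; the corpus and function names are made up for illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter[str]]:
    """Count which word follows which: about the crudest possible 'language model'."""
    counts: dict[str, Counter[str]] = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def autocomplete(model: dict[str, Counter[str]], prompt: str, n_words: int) -> str:
    """Greedy decoding: repeatedly append the most frequent next word."""
    words = prompt.split()
    if not words:
        return ""
    for _ in range(n_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # this word was never followed by anything in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


model = train_bigrams("the cat sat on the mat and the cat ate the fish")
print(autocomplete(model, "the", 1))  # the cat
```

An LLM replaces the bigram table with a neural network conditioned on the whole context, but the outer loop (predict a token, append it, repeat) has the same shape.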


Claude Code makes me feel like I'm dispatching a legit engineer to go get something done. But they come back in a minute instead of a week. Most of the time the solution gets the job done. Sometimes it introduces too much complexity, sometimes it's totally wrong, but it gets the job done. Cursor meanwhile just feels like shortening the (copy editor/paste chat/copy chat/paste editor) loop.

For $200/month you can get equivalent value to a team of engineers. Plan accordingly! The stack is no longer safe for employment. You need to move up to manager or move down to metal.


> You need to move up to manager or move down to metal.

Why couldn't Claude do a managers job?


It’s a good question. I think a better benchmark than the current options is “go make $X dollars this quarter.” Right now the models fail this miserably. Claude can’t even run a vending machine inside Anthropic HQ. So there is still some kind of strategic activity that comes naturally to humans that LLMs struggle with. I know the big conundrum is “scaling solves this in the next N years” but my bet is that N > ~20 in this case.


What's an example of something you've had Claude Code do that would take a software engineer a week to do? Just curious.

I see people mention converting old legacy code from an old language to something more modern. I've also seen people mention greenfield projects.

Anything other than this? I'm trying to bring this productivity to my work, but so far I haven't been able to replace a week of work with a few minutes.


Last week I stripped out all CSS from a fairly substantial project and replaced it with Tailwind equivalents; it got all but a few cases right.

That was gemini-cli. I could see some mistakes on a trial run, so I created a GEMINI.md with a system prompt and project description (about 50 lines) that clarified some tricky source layout situations.

The second run was fine and ran for about an hour or so. I had attempted to do it manually a while back, but it started to look like it would take a week or two.


Thanks for the insight. I have seen similar uses at work, where people do a bit of an enhanced codemod to migrate code from one deprecated thing (library, function, syntax) to another. While a codemod has to be programmed exactly, AI gives you the ability to cover spots in the code that may not fit 1-to-1 with the pattern you had in mind.

EDIT: I haven't used Tailwind much but would something like this do what you're saying, or not really? https://www.loopple.com/tools/css-to-tailwind-converter


For the trivial cases that's fine (just using an LLM does the same).

But this particular project is not like a standard site: the CSS is in small fragments across hundreds of files, and in places it uses constants for things like color values.

In that Loopple example you can see the conversion uses Tailwind's arbitrary value notation, the -[], so background-color:#afa8af gets converted to bg-[#afa8af]. I wanted the nearest pure Tailwind class, bg-zinc-400. The agent seems to work out color distance fine, so it does all that in one shot too.
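For the color part, "nearest pure Tailwind class" can be approximated by picking the palette entry with the smallest distance in RGB space. A minimal sketch: the palette is a hand-copied subset of Tailwind v3's default colors (check it against your own version and config), plain RGB distance is a crude stand-in for perceptual color difference (a space like CIELAB does better), and this is not a claim about what the agent does internally.

```python
from __future__ import annotations

# Hand-copied subset of Tailwind CSS v3 default palette colors (illustrative;
# check the values against your own Tailwind version and config).
PALETTE: dict[str, str] = {
    "slate-400": "#94a3b8",
    "gray-400": "#9ca3af",
    "zinc-300": "#d4d4d8",
    "zinc-400": "#a1a1aa",
    "zinc-500": "#71717a",
    "neutral-400": "#a3a3a3",
    "stone-400": "#a8a29e",
}


def hex_to_rgb(value: str) -> tuple[int, int, int]:
    """Parse '#rrggbb' (or shorthand '#rgb') into an (r, g, b) tuple."""
    value = value.lstrip("#")
    if len(value) == 3:
        value = "".join(ch * 2 for ch in value)
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


def nearest_tailwind(hex_color: str, palette: dict[str, str] = PALETTE) -> str:
    """Return the palette name with the smallest squared Euclidean RGB distance."""
    r, g, b = hex_to_rgb(hex_color)

    def distance(name: str) -> int:
        pr, pg, pb = hex_to_rgb(palette[name])
        return (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2

    return min(palette, key=distance)


print(f"bg-{nearest_tailwind('#afa8af')}")  # bg-zinc-400
```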


Good to know it's better at translating code from one style to another! It's one of the golden use cases for AI agent coding at the moment. I've seen that at work as well.


The abstract notes that most of the effect (16deg Celsius) comes from reduced radiative heating and only a few degrees from evaporation.

Mostly, the benefit is that instead of the concrete under you absorbing the sun's energy and re-emitting it as heat, the leaves above you do.

This dramatically reduces the heat we feel at human height.


> the effect (16deg Celsius)

Did I read that right? 16°C seems like an enormous effect.

Seems like trees would be a small investment to effectively get "outdoor AC-ish"?

EDIT: for those of us who are more comfortable with Freedom Units, that's like going from 104°F to 75°F!
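The Fahrenheit figure checks out if the unshaded temperature is 40°C. One subtlety: a 16°C *difference* is a 28.8°F difference, because converting a difference only applies the 9/5 scale and drops the +32 offset. A quick sketch (the 40°C starting point is the assumption implied by 104°F):

```python
def c_to_f(celsius: float) -> float:
    """Convert a temperature reading from °C to °F."""
    return celsius * 9 / 5 + 32


def delta_c_to_f(delta_celsius: float) -> float:
    """Convert a temperature difference: scale only, no +32 offset."""
    return delta_celsius * 9 / 5


hot = 40.0  # assumed unshaded temperature, i.e. the 104°F in the comment
print(f"{c_to_f(hot):.1f}°F -> {c_to_f(hot - 16):.1f}°F, a drop of {delta_c_to_f(16):.1f}°F")
# 104.0°F -> 75.2°F, a drop of 28.8°F
```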


Yes, and cities with lots of trees are way more livable because of this. Planners in our town seem to hate trees with a passion; thank god we’re moving away from this concrete desert.


Trees in cities are expensive to maintain, which is why they're often on the chopping block when budgets get tight. This is especially true in places like Las Vegas where there is little natural tree cover due to the climate. You have to have a staff of arborists to keep the trees alive in such a harsh environment.


Indeed, the town I’m living in was 80% destroyed during WWII, and that still shows in its finances. It’s amazing how long a major disaster affects a region: big drug issues, the highest cancer rate in the country, etc.


The problem is that to get these effects you need large canopies of trees, and to get that to happen the trees have to take the space of something else. For street trees it takes away land from parking or traffic lanes; for properties it occupies both horizontal and vertical square footage since the sky above the tree needs to be clear. These are unpopular with some political affiliations and interest groups.


Replacing inefficient street parking and wide roads with trees is a straightforward win.


It is like Sim City 1 where crossroads generated traffic, so you'd replace them with parks.


Obligatory response to 'Freedom Units' for those who haven't seen it: https://www.youtube.com/watch?v=JYqfVE-fykk


Even without any evaporative effect, the air cooling of leaves (at least bringing them to the surrounding air temperature) happens more easily than that of concrete pavement due to height and larger surface area. The concrete can easily get heated much hotter than the air at even 10-20ft.

Regarding water consumption: Mediterranean species like olive trees are kind of optimized for low water use, for example by having leaves covered with a wax-like coating that reduces evaporation.


MIT should deploy their desert water harvesting tech in LV[0]

they just need to figure out a way to reorient their panels to provide shade?

[0] only sunlight needed

https://www.thebrighterside.news/post/mits-high-tech-hydroge...

Re: olives, hopefully the terpenes can also help cloud formation


A lot of normal consumers pay $20 a month for ChatGPT. I think most software gets bid down in price because the marginal costs are near zero. Where they’re not (LLM token generation), prices don’t plummet and consumers build a different expectation.


Please define "a lot". In terms of the percentage of internet users worldwide, I don't think it even reaches one percent.


this might be a good thing if viewed from the opposite perspective: people with kids/elderly parents usually can't afford to pay as much per person as people traveling alone for fun/corporate travel.


This is incredibly well written.

The oil example is very compelling for import substitution. And the COVID example is interesting in showing that the savings rate only went up as an offset to government spending.

I'd love to see a follow up on (a) is it important for the US to increase domestic savings and (b) what are the best policies to do so, and why are they the best?

I imagine blanket tariffs might actually increase the savings rate because they increase the cost of importing all goods when the domestic alternatives are either inferior or more expensive. But I'm curious if they are the best way to achieve the savings goal.


The only policy that will increase the savings rate is a stable or appreciating currency. People are incentivized to use an inflationary currency so they can maintain value.


But they can spend it on stocks, with the hope of maintaining or increasing value. That's savings, right?


Who doesn't love the value of their savings being built 100% on the speculation of the public perception of made up things?


Savings are a thermodynamic impossibility. Real wealth decays (livestock will die, the roof over your head will leak, the bushel of corn will rot, ...). Savings must be invested for them to have future value.


Money was traditionally a way to store value, and not all things decay at a rate that matters. Roman roads still exist. The Parthenon still exists. Roman coins still exist. In any inflationary currency value erodes, and it also encourages the production of less durable goods as time pressure encourages speed of production and not durability.
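How fast value "erodes" is simple compounding. A small sketch assuming a constant inflation rate; at a steady 2% a year (the Fed's target discussed upthread), a unit of currency keeps about 82% of its purchasing power after 10 years and roughly half after 35:

```python
def purchasing_power(inflation_rate: float, years: int) -> float:
    """Fraction of today's purchasing power a unit of currency keeps
    after `years` of constant annual inflation."""
    return 1 / (1 + inflation_rate) ** years


for years in (10, 20, 35):
    print(years, round(purchasing_power(0.02, years), 3))
```

Real inflation is not constant, so this is a back-of-the-envelope figure rather than a forecast.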


An underrated fact. The only savings that doesn't decay is someone else's debt, too.


Yes, thank you. I paraphrased Soddy, and for him debt was virtual wealth (and not subject to decay).


If only there was some material that didn't degrade over time and is hard to produce. Somebody should invent something like that.


It's circular. Sure, it's pegged to a metal, but that won't tell you how much corn or land or how many homes you can buy with it. What should one oz of gold, hoarded in 2025, be able to buy you in 2050? Many factors will determine that. There's no such thing as fixed value, unless the definition is self-referential.

The only thing to do is turn present-day savings into capital; it's the only claim one can have on wealth in the future.


Its value goes up and down, and is as speculative as anything else.


Price does not equal value. One oz today will buy you about the same as 1 oz 100 years ago.


Yes, it'll buy you an oz of gold.

https://www.macrotrends.net/1333/historical-gold-prices-100-...

What does equal value?


Whatever people buy. If you are looking at the dollar value of gold you have to look at what a dollar would buy that year. That is the value. You will find that a similar amount of gold buys the same amount of things throughout time regardless of the dollar price.

Another way to say that is that the dollar price of gold is correlated with the cumulative inflation of the dollar over time.


That's the inflation-adjusted curve, meaning it's what gold can buy relative to what a dollar can buy.


In 1900 a $20 coin contained 0.9675 ounces of gold. An ounce of gold was legally defined as $20.67; there was no free-floating gold price. A dollar was defined as 1.672 grams of gold at 90% fineness, i.e. about 1.505 grams of pure gold.

1900 $1 is equal to 2025 $172

So a single dollar today will buy 1/172 of what it would in 1900. That is inflation. Not an inflation adjusted curve. Just the drastic devaluation of the dollar.


A single dollar today will buy 1/172 of what it would in 1900, as long as what you're buying is gold. If you are buying anything else, though, your number is not relevant. And that means that your number is useless, because it's only the number if you're buying gold, and gold is almost never what we actually want to buy.
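The 172x figure looks like a gold-price ratio: today's gold price divided by the fixed 1900 price of $20.67. A small sketch of that arithmetic, with today's price left as an input rather than a quoted market figure. As the reply says, the result only measures the dollar against gold; a general price index such as the CPI gives a different multiplier.

```python
GOLD_PRICE_1900 = 20.67  # USD per troy ounce, the legally fixed price cited above


def gold_implied_ratio(gold_price_today: float) -> float:
    """How many of today's dollars buy the gold that one 1900 dollar bought."""
    return gold_price_today / GOLD_PRICE_1900


# A gold price of about $3,555/oz reproduces the ~172x figure.
print(round(gold_implied_ratio(3_555.24)))  # 172
```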


The curve suggests that there would also be periods of inflation and deflation, for instance of a market basket of goods and services, under a gold standard.


In a perpetually inflationary environment it functions that way. Stocks become a universal savings account. Everyone pouring money in raises values whether or not the company being bought has any real value.


(a) There's an argument that people should save more for retirement, but I haven't heard anything more than that about why domestic savings as a whole has to increase. If anything, this is quite a good place to naturally run a deficit! Good rule of law and investment opportunities, as well as future earnings from migrants.

(b) Targeting the fiscal deficit usually works well, especially because it's particularly yawning right now. Forced savings (Singapore-style CPF) work OK too, although only Singaporeans wouldn't consider that a tax.


> only Singaporeans wouldn't consider that a tax

Both forced savings and taxes are legally mandated by the government, but that does not mean that forced savings are taxes. Implementation details matter.

Your money in your own CPF account accumulates interest (at decent/attractive interest rates that generally exceed inflation rates), and is then paid out tax-free to you after retirement.

Additionally CPF funds are managed separately from the government's consolidated revenue. They are administered by the CPF Board and are not used for government expenses in its yearly budget.


Sure sure, my point is stuff like British national insurance and American social security are considered taxes, even though much of the money in expectation goes right back to you in retirement / health spending.

In Singapore, the flows are similar, even though the accounts are broken out individually and the top-ups are explicitly done.

In my view, the key is the government telling you what to spend your money on that gives it the shade of taxation. Whether they do so with labeled accounts or not seems more of an implementation detail.


> Whether they do so with labeled accounts or not seems more of an implementation detail.

I think this is an implementation detail of huge significance.

In the CPF system, your current contributions pay for your own future retirement. In the Social Security system, your current contributions pay for current retirees' retirement.

The CPF system is sustainable, because it truly is a savings scheme. The U.S. Social Security system is not, because it relies on having a tax base that never diminishes.
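A toy model of the difference between the two designs, with hypothetical numbers. It ignores wage growth, investment risk and the many adjustments real systems make (and the funded side depends entirely on the assumed return), so it only illustrates why a pay-as-you-go scheme is sensitive to the ratio of workers to retirees while a funded one is not.

```python
def payg_benefit(wage: float, contribution_rate: float, workers_per_retiree: float) -> float:
    """Pay-as-you-go: each retiree's benefit comes from current workers' contributions."""
    return wage * contribution_rate * workers_per_retiree


def funded_balance(wage: float, contribution_rate: float, annual_return: float, years: int) -> float:
    """Funded: a worker's own contributions accumulate (and compound) until retirement."""
    balance = 0.0
    for _ in range(years):
        balance = balance * (1 + annual_return) + wage * contribution_rate
    return balance


# Hypothetical numbers: the PAYG benefit falls by a third if the ratio drops from 3 to 2,
# while the funded balance depends only on the worker's own contributions and returns.
print(payg_benefit(50_000, 0.12, 3), payg_benefit(50_000, 0.12, 2))
print(round(funded_balance(50_000, 0.12, 0.03, 40)))
```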


> Having access to goods and service is what gives you a better living standard

I think the good-faith critique is that access to imports can be taken away by the other country if it wants, e.g. rare earth metals. So being too heavily reliant on imports, without the capacity to produce domestically, means less access in the long run.


Only if you have a single supplier.

For imports to be useful you need multiple suppliers, all of whom have the capacity to expand if one of the suppliers lets you down.

Same as in business.

Industrial policy should decide domestic vs external production on that basis.

As the world moves to trade blocs, the case for trade between trade blocs falls, precisely because the risk of getting left high and dry increases.
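The "multiple suppliers with room to expand" test resembles the N-1 criterion used in reliability planning: can you still meet demand if any one supplier drops out? A minimal sketch with made-up supplier names and capacities; real industrial policy would also weigh cost, lead times and correlated failures (e.g. several suppliers inside the same trade bloc).

```python
from __future__ import annotations


def survives_loss_of_any_supplier(demand: float, capacity: dict[str, float]) -> bool:
    """True if demand can still be met after any single supplier drops out,
    assuming the remaining suppliers can expand up to their listed capacity."""
    total = sum(capacity.values())
    return total >= demand and all(total - lost >= demand for lost in capacity.values())


print(survives_loss_of_any_supplier(100, {"A": 200}))                     # False: single supplier
print(survives_loss_of_any_supplier(100, {"A": 60, "B": 50, "C": 40}))    # False: losing A leaves 90
print(survives_loss_of_any_supplier(100, {"A": 60, "B": 60, "C": 60}))    # True
```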


I mean China has cut off rare earths from time to time, and from time to time we don't see crippling shortages but a rather quick supply response.

If you're concerned about short / medium term timeframes, I've yet to see a broad analysis that showed stockpiling (can even do it privately!) being insufficient.


Yes. Too much of a good thing, or a good thing at the wrong place or instant of time, ends up not being a good thing.

I would recommend against having sex in a subway station at rush hour, or drinking French Cognac during a job interview, although both are good things.

We can and should discuss how much trade deficit, and the nature of it, but in essence it is still a good thing to have this deficit, as long as you don't owe other countries money in a currency you don't control.


Except the US has a realistic protectionist policy it can use: defence production. It's an industry which is diverse, naturally demands locality, but can also provide an export market.

And very much was a core US growth export till very recently.

