> Krishna also referenced the depreciation of the AI chips inside data centers as another factor: "You've got to use it all in five years because at that point, you've got to throw it away and refill it," he said.
This doesn't seem correct to me, or at least it's built on several shaky assumptions. You would have to 'refill' your hardware if:
- AI accelerator cards all start dying around the 5 year mark, which is possible given the heat density/cooling needs, but doesn't seem all that likely.
- Technology advances such that only the absolute newest cards can be used to run _any_ model profitably, which only seems likely if we see some pretty radical advances in efficiency. Otherwise, assuming your hardware is stable after 5 years of burn-in, it seems like you could continue to run older models on that hardware at only the cost of the floor space/power. Maybe you need new cards for new models for some reason (a new FP format that only new cards support? some magic amount of RAM? etc.), but it seems like there may be room for revenue from older/less capable models at a discounted rate.
Isn’t that what Michael Burry is complaining about? That five years is actually too generous when it comes to depreciation of these assets, and that companies are being too relaxed with that estimate. The real depreciation is more like 2-3 years for these GPUs that cost tens of thousands of dollars apiece.
That's exactly the thing. It's only about bookkeeping.
The big AI corps keep pushing GPU depreciation further into the future, no matter how long the hardware is actually useful. Some of them are now at 6 years. But GPUs are advancing fast, and new hardware brings more flops per watt, so there's a strong incentive to switch to the latest chips. They also run 24/7 at 100% capacity, so after only 1.5 years a fair share of the chips is already toast. How much hardware do they have on their books that's actually no longer useful? No one knows!
Slower depreciation means more profit right now (for those companies that actually make a profit, like MS or Meta), but it's just kicking the can down the road. Eventually all these investments have to come off the books, and that's where it will eat their profits. In 2024, the big AI corps invested about $1 trillion in AI hardware, and next year is expected to be $2 trillion. The interest payments alone on that are crazy. And all of this comes on top of the fact that none of these companies actually make any profit at all with AI (except Nvidia, of course). There's just no way this will pan out.
There are three distinct but related topics here, it's not "just about bookkeeping" (though Michael Burry may be specifically pointing to the bookkeeping being misquoted):
1. Financial depreciation - accounting principles typically follow the useful life of the capital asset (simply put, if an airplane typically gets used for 30 years, they'll split the cost of purchasing it equally across 30 years on their books; a rough sketch after this list shows the effect). Getting this right matters mostly for how future purchases get financed, because of how the bookkeepers show profitability, balance sheets, etc. Cashflow is ultimately what might create an insolvent company.
2. Useful life - per number 1 above - this is the estimated and actual life of the asset. So if the airplane actually is used over 35 years, not 30, its actual useful life is 35 years. This is to your point of "some of them are now at 6 years". Here is where this is going to get super tricky with GPUs. We (a) don't actually know what the useful life is or is going to be for these GPUs (hence Michael Burry's question), and (b) the cost of this is going to get complicated fast. Let's say (I'm making these up) GPU X2000 is 2x the performance of GPU X1000 and your whole data center is full of GPU X1000. Do you replace all of those GPUs to increase throughput?
3. Support & maintenance - this is what actually gets supported by the vendor. There doesn't seem to be any public info about the Nvidia GPUs, but typically these contracts are 3-5 years (usually tied to the useful life) and often can be extended. Again, this is going to get super complicated financially, because we don't know what future advancements in GPU performance might happen (which would necessitate replacing old ones and therefore creating renewed maintenance contracts).
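To make point 1 concrete, here's a minimal sketch of straight-line depreciation with made-up numbers (the fleet cost and revenue figures are purely hypothetical); it shows why stretching the assumed useful life flatters reported profit today:

```python
# Minimal sketch: straight-line depreciation with hypothetical numbers.
# Stretching the assumed useful life shrinks the annual depreciation expense,
# which inflates reported profit now and defers the cost recognition.

def annual_depreciation(purchase_cost, salvage_value, useful_life_years):
    """Straight-line: spread (cost - salvage) equally over the assumed life."""
    return (purchase_cost - salvage_value) / useful_life_years

gpu_fleet_cost = 1_000_000_000   # hypothetical $1B GPU purchase
revenue_per_year = 400_000_000   # hypothetical revenue attributed to that fleet

for life in (3, 5, 6):
    dep = annual_depreciation(gpu_fleet_cost, salvage_value=0, useful_life_years=life)
    print(f"{life}-year life: depreciation ${dep:,.0f}/yr, "
          f"reported operating profit ${revenue_per_year - dep:,.0f}/yr")
```

If the cards actually die or become uncompetitive in 2-3 years, the 5- or 6-year schedule just moves the pain into later accounting periods, which is Burry's complaint.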
Typical load management that’s existed for 70 years: when interactive workloads are off-peak, you do batch processing. For OpenAI that’s anything from LLM evaluation of the day’s conversations to user profile updates.
Flops per watt is relevant for a new data center build-out where you're bottlenecked on electricity, but I'm not sure it matters so much for existing data centers. Electricity is such a small percentage of total cost of ownership. The marginal cost of running a 5 year old GPU for 2 more years is small. The husk of a data center is cheap. It's the cooling, power delivery equipment, networking, GPUs etc. that cost money, and when you retrofit data centers for the latest and greatest GPUs you have to throw away a lot of good equipment. It makes more sense to build new data centers as long as inference demand doesn't level off.
How different is this from rental car companies changing over their fleets? I don't know, this is a genuine question. The cars cost 3-4x as much and last about 2x as long, as far as I know, and the secondary market is still alive.
> How different is this from rental car companies changing over their fleets?
New generations of GPUs leapfrog in efficiency (performance per watt) and vehicles don't? Cars don't get exponentially better every 2–3 years, meaning the second-hand market is alive and well. Some of us are quite happy driving older cars (two parked outside our home right now, both well over 100,000km driven).
If you have a datacentre with older hardware, and your competitor has the latest hardware, you face the same physical space constraints, same cooling and power bills as they do? Except they are "doing more" than you are...
The traditional framing would be cost per flop. At some point your total costs per flop over the next 5 years will be lower if you throw out the old hardware and replace it with newer more efficient models. With traditional servers that's typically after 3-5 years, with GPUs 2-3 years sounds about right
The major reason companies keep their old GPUs around much longer right now is the supply constraints.
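A rough sketch of that cost-per-flop framing, with entirely made-up placeholder numbers (hardware price, power draw, throughput, and electricity rate are not real product specs):

```python
# Rough sketch of the cost-per-flop framing: keep old GPUs vs. replace.
# All numbers are placeholders for illustration, not real product specs.

HOURS_PER_YEAR = 8760
POWER_PRICE = 0.08  # $/kWh, hypothetical industrial rate

def cost_per_pflop_hour(hw_cost, lifetime_years, kw_draw, pflops):
    """Total cost (amortized capex + power) per petaflop-hour delivered."""
    capex_per_hour = hw_cost / (lifetime_years * HOURS_PER_YEAR)
    power_per_hour = kw_draw * POWER_PRICE
    return (capex_per_hour + power_per_hour) / pflops

# Old card: capex is already sunk, so only power matters going forward.
old = cost_per_pflop_hour(hw_cost=0, lifetime_years=1, kw_draw=0.7, pflops=1.0)
# New card: must recover its purchase price over its assumed life, but delivers more.
new = cost_per_pflop_hour(hw_cost=30_000, lifetime_years=3, kw_draw=0.7, pflops=4.0)

print(f"old (sunk cost): ${old:.4f} per PFLOP-hour")
print(f"new            : ${new:.4f} per PFLOP-hour")
```

With these particular placeholders the already-paid-for card still wins on marginal cost (the TCO point made elsewhere in the thread); make the new card efficient enough, or add space/power constraints, and replacement wins - that crossover is the break-even being described.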
The used market is going to be absolutely flooded with millions of old cards. I imagine shipping being the most expensive cost for them. The supply side will be insane.
Think 100 cards but only 1 buyer as a ratio. Profit for ebay sellers will be on "handling", or inflated shipping costs.
I assume NVIDIA and co. already protect themselves in some way, either by the fact that these cards aren't very useful after resale, or by requiring them to go to the grinder after they expire.
In the late '90s, when CPUs were seeing the kind of advances GPUs are now seeing, there wasn't much of a market for two/three-year-old CPUs. (According to a graph I had Gemini create, the Pentium had 100 MFLOPS and the Pentium 4 had 3000 MFLOPS.) I bought motherboards that supported upgrading, but never bothered, because what's the point of going from 400 MHz to 450 MHz when the new ones are 600 or 800 MHz?
I don't think nVidia will have any problem there. If anything, hobbyists being able to use 2025 cards would increase their market by discovering new uses.
Cards don't "expire". There are alternate strategies to selling cards, but if they don't sell the cards, then there is no transfer of ownership, and therefore NVIDIA is entering some form of leasing model.
If NVIDIA is leasing, then you can't use those cards as collateral. You also can't write off depreciation. Part of what we're discussing is that terms of credit are being extended too generously, with depreciation in the mix.
They could require some form of contractual arrangement, perhaps volume discounts for cards, if buyers agree to destroy them at a fixed time. That's very weird though, and I've never heard of such a thing for datacenter gear.
They may protect themselves on the driver side, but someone could still write OSS.
I think it's a bit different because a rental car generates direct revenue that covers its cost. These GPU data centers are being used to train models (which themselves quickly become obsolete) and provide inference at a loss. Nothing in the current chain is profitable except selling the GPUs.
You say this like it's some sort of established fact. My understanding is the exact opposite and that inference is plenty profitable - the reason the companies are perpetually in the red is that they're always heavily investing in the next, larger generation.
I'm not Anthropic's CFO so I can't really prove who's right one way or the other, but I will note that your version relies on everyone involved being really, really stupid.
The current generation of today was the next generation of yesterday.
So, unless the services sold on inference can cover the cost of inference + training AND make money on top, they are still operating at a loss.
“like it's some sort of established fact” -> “My understanding”?! a.k.a pure speculation. Some of you AI fans really need to read your posts out loud before posting them.
The GPUs going into data centers aren't the kind that can just be reused by putting them into a consumer PC and playing some video games, most don't even have video output ports and put out FPS similar to cheap integrated GPUs.
And the big ones don't even have typical PCIe sockets, they are useless outside of behemoth rackmount servers requiring massive power and cooling capacity that even well-equipped homelabs would have trouble providing!
I would presume that some tier shaped market will arise where the new cards are used for the most expensive compute tasks like training new models, the slightly used for inference, older cards for inference of older models, or applied to other markets that have less compute demand (or spend less $ per flop, like someone else mentioned).
It would be surprising to me that all this capital investment just evaporates when a new data center gets built or refitted with new servers. The old gear works, so sell it and price it accordingly.
At that point it isn't a $10k card anymore, it's a $5k card. And possibly not a $5k card for very long in the scenario that the market has been flooded with them.
Ah, well, yes, to a degree that's possible, but at least at the moment you'd still be better off buying a $5k Mac Studio if it's just inference you're doing.
Why would you do that when you can pay someone else to run the model for you on newer more efficient and more profitable hardware? What makes it profitable for you and not for them?
I think it's illustrative to consider the previous computation cycle, a la crypto mining, which passed through a similar lifecycle with energy and GPU accelerators.
The need for cheap wattage forced operations to arbitrage location for the cheapest/most reliable existing supply - there was rarely new buildout, as the cost was to be reimbursed by the coins the mining pool recovered.
The chip situation caused the same appreciation in GPU cards, with periodic offloading of cards to the secondary market (after wear and tear) as newer/faster/more efficient cards came out, until custom ASICs took over the heavy lifting, causing the GPU card market to pivot.
Similarly, in the short to medium term, the uptick of custom ASICs like Google's TPU will definitely make a dent in both capex/opex and potentially also lead to a market of used GPUs as ASICs dominate.
So for GPUs I can certainly see the 5 year horizon making an impact on investment decisions as ASICs proliferate.
It’s far more extreme: old servers are still okay on I/O, and memory latency, etc. won’t change that dramatically so you can still find productive uses for them. AI workloads are hyper-focused on a single type of work and, unlike most regular servers, a limiting factor in direct competition with other companies.
I mean, you could use training GPUs for inference, right? That would be use case number 1 for an 8x A100 box in a couple of years. They can also be used for non-IO-limited things like folding proteins or other 'scientific' use cases. Push comes to shove, I'm sure an old A100 will run Crysis.
All those use cases would probably use up 1% of the current AI infrastructure, let alone what they're planning to build.
Yeah, just like gas, possible uses will expand if AI crashes out, but:
* will these uses cover, say, 60% of all this infra?
* will these uses scale up to use that 60% within the next 5-7 years, while that hardware is still relevant and fully functional?
Also, we still have railroad tracks from the 1800s rail mania that were never truly used to capacity and dot com boom dark fiber that's also never been used fully, even with the internet growing 100x since. And tracks and fiber don't degrade as quickly as server hardware and especially GPUs.
LambdaLabs is still making money off their Tesla V100s, A100s, and A6000s. The older ones are capable enough to run some models and very cheap, so if that's all you need, that's what you'll pick.
The V100 was released in 2017, A6000 in 2020, A100 in 2021.
Power consumption is only part of the equation. More efficient chips => less heat => lower cooling costs and/or higher compute density in the same space.
Even if the power is free you still need a grid connection to move it to where you need it, and, guess what, the US grid is bursting at the seams. This is not just due to data center demand; it was struggling to cope with the transition away from coal well before that point.
You also can’t buy a gas turbine for love nor money at the moment, and they’re not ever going to be free.
If you plonked massive amounts of solar panels and batteries in the Nevada desert, that’s becoming cheap but it ain’t free, particularly as you’ll still need gas backup for a string of cloudy days.
If you think SMRs are going to be cheap I have a bridge to sell you, you’re also not going to build them right next to your data centre because the NRC won’t let you.
So that leaves fusion or geothermal. Geothermal is not presently “very cheap” and fusion power has not been demonstrated to work at any price.
In-house hyperscaler stuff gets shredded, after every single piece of flash storage gets first drilled through and every hard drive gets bent by a hydraulic press. Then it goes into the usual e-waste recycling stream (ie. gets sent to poor countries where precious metals get extracted by people with a halved life expectancy).
Off-the-shelf enterprise gear has a chance to get a second life through remarketing channels, but much of it also gets shredded due to dumb corporate policies. There are stories of some companies refusing to offload a massive decom onto the second hand market as it would actually cause a crash. :)
Similar to corporate laptops where, due to stupid policies, for most BigCos you can't really buy or otherwise get a used laptop, even as the former corporate user of said laptop.
I use (relatively) ancient servers (5-10 years in age) because their performance is completely adequate; they just use slightly more power. As a plus it's easy to buy spare parts, and they run on DDR3, so I'm not paying the current "RAM tax". I generally get such a server, max out its RAM, max out its CPUs and put it to work.
Same, the bang for buck on a 5yo server is insane. I got an old Dell a year ago (to replace our 15yo one that finally died) and it was $1200 AUD for a maxed out recently-retired server with 72TB of hard drives and something like 292GB of RAM.
The idle wattage per module has shrunk from 2.5-3W down to 1-1.2W between DDR3 and DDR5. Assuming a 1.3W difference per stick (so 10.4W across 8 sticks, running 8760 hours a year), a DDR3 machine with 8 sticks would increase your yearly power consumption by almost 1% (assuming an average 10,500 kWh/yr household).
That's only a couple dollars in most cases but the gap is only larger in every other instance. When I upgraded from Zen 2 to Zen 3 it was able to complete the same workload just as fast with half as many cores while pulling over 100W less. Sustained 100% utilization barely even heats a room effectively anymore!
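Spelling out the parent's napkin math with the same assumed figures (1.3 W per stick, 8 sticks, 10,500 kWh/yr household):

```python
# Spelling out the parent's napkin math (same assumed figures).
sticks = 8
watt_diff_per_stick = 1.3          # DDR3 vs DDR5 idle difference, W
hours_per_year = 8760
household_kwh_per_year = 10_500

extra_kwh = sticks * watt_diff_per_stick * hours_per_year / 1000
print(f"extra energy: {extra_kwh:.0f} kWh/yr "
      f"({extra_kwh / household_kwh_per_year:.1%} of household usage)")
# -> roughly 91 kWh/yr, i.e. just under 1% of the assumed household total
```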
The one thing to be careful with Zen 2 onwards is that if your server is going to be idling most of the time then the majority of your power usage comes from the IO die. Quite a few times you'd be better off with the "less efficient" Intel chips because they save 10-20 Watts when doing nothing.
A similar one I just ran into: my Framework Desktop was idling @ 5W more than other reported numbers. Issue turned out to be the 10 year old ATX PSU I was using.
To be clear, this server is very lightly loaded, it's just running our internal network services (file server, VPN/DNS, various web apps, SVN etc.) so it's not like we're flogging a room full of GeForce 1080Ti cards instead of buying a new 4090Ti or whatever. Also it's at work so it doesn't impact the home power bill. :D
Maybe? The price difference on newer hardware can buy a lot of electricity, and if you aren't running stuff at 100% all the time the calculation changes again. Idle power draw on a brand new server isn't significantly different from one that's 5 years old.
Manipulating this for creative accounting seems to be the root of Michael Burry’s argument, although I’m not fluent enough in his figures to map it here. But it's interesting to see IBM argue a similar case (somewhat), and comments ITT hitting the same known facts, in light of Nvidia’s counterpoints to him.
with Michael Lewis, about 30 mins long. Highlights - he thinks we are near the top, his puts are for two years time. If you go long he suggests healthcare stocks. He's been long gold some years, thinks bitcoin is dumb. Thinks this is dotcom bubble #2 except instead of pro investors it's mostly index funds this time. Most recent headlines about him have been bad reporting.
> They still work fine but power costs make them uneconomical compared to latest tech.
That's not necessarily the driving financial decision; in fact I'd argue companies making data center hardware purchases barely look at this number. It's simpler than that - their support runs out and it's cheaper to buy a new piece of hardware (that IS more efficient) because the hardware vendors make extended support inordinately expensive.
Put yourselves in the shoes of a sales person at Dell selling enterprise server hardware and you'll see why this model makes sense.
Eh, not exactly. If you don't run the CPU at 70%+, the rest of the machine isn't that much more inefficient than a model generation or two behind.
It used to be that a new server could use half the power of the old one at idle, but vendors figured out a while ago that servers also need proper power management, and it is much better now.
The last few gens' increases could be summed up as "low % increase in efficiency, with TDP, memory channels and core count increases".
So for loads that aren't CPU-bound, the savings on a newer gen aren't nearly worth the replacement, and for bulk storage the CPU power usage is an even smaller part.
Definitely single thread performance and storage are the main reasons not to use an old server. A 6 year old server didn't have nvme drives, so SATA SSD at best. That's a major slow down if disk is important.
Aside from that there's no reason to not use a dual socket server from 5 years ago instead of a single socket server of today. Power and reliability maybe not as good.
NVMe is just a different form factor for what's essentially a PCIe connection, and adapters are widely available to bridge these formats. Surely old servers will still support PCIe?
I thought the same until I calculated that newer hardware consumes a few times less energy and for something running 24x7 that adds up quite a bit (I live in Europe, energy is quite expensive).
So my homelab equipment is just 5 years old and it will get replaced in 2-3 years with something even more power efficient.
Asking because I just did a quick comparison and it seems to depend. For comparison, I have a really old AMD Athlon "e" processor (literally September 2009 is when it came out according to some quick Google search, tho I probably bought it a few months later than that) that runs at ~45W TDP. In idle conditions, it typically consumes around 10 to 15 watts (internet wisdom, not kill-a-watt wisdom).
Some napkin math says it would take about 40 years of amortization for a replacement to pay for itself at my current power rates for this system. So why would I replace it? Even with some EU countries' power rates, we seem to be at 5-10 years amortization upon replacement. I've been running this motherboard, CPU + RAM combo for ~15 years now, replacing only the hard drives every ~3 years. And the tower it's in is about 25 years old.
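A back-of-envelope version of that amortization math; the replacement cost, power rate, and new system's idle draw here are assumptions, not the poster's actual figures:

```python
# Back-of-envelope version of the parent's amortization math.
# Replacement cost, power rate, and new-system idle draw are assumed; plug in your own.
idle_watts_old = 12      # old Athlon system at idle (the parent's 10-15 W estimate)
idle_watts_new = 8       # assumed idle draw of a modern low-power replacement
power_rate = 0.15        # $/kWh, assumed
replacement_cost = 300   # $, assumed cost of a new board + CPU + RAM

kwh_saved_per_year = (idle_watts_old - idle_watts_new) * 8760 / 1000
savings_per_year = kwh_saved_per_year * power_rate
print(f"~{kwh_saved_per_year:.0f} kWh/yr saved -> ${savings_per_year:.2f}/yr")
print(f"payback: {replacement_cost / savings_per_year:.0f} years")
```

With a mostly-idle box the savings are a few watts, so the payback stretches into decades, the same ballpark as the ~40 years above.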
Oh I forgot, I think I had to buy two new CR2032 batteries during those years (CMOS battery).
Now granted, this processor can basically do "nothing" in comparison to a current system I might buy. But I also don't need more for what it does.
That is definitely true and why I compared idle watts. That Athlon uses the same idle watts as modern mobile CPUs. So no reason to replace during the mostly idle times. Spot on. I can't have this system off during idle time as it wouldn't come up to fulfill its purpose fast enough when needed and it would be a pain to trigger that anyway (I mean, really, port knocking to start up that system type thing). Else I would. That I do do with the HTPC which has a more modern Intel core i3.
The "nothing" here was exactly meant more for the times when it does have to do something. But even then at 45W TDP, as long as it's able to do what it needs to, then the newer CPUs have no real edge. What they gain in performance due to multi core they loose in being essentially equivalent single core performance for what that machine does: HTPC file serving, email server etc.
Spinning rust and fans are the outliers when it comes to longevity in compute hardware. I’ve had to replace a disk or two in my rack at home, but at the end of the day the CPUs, RAM, NICs, etc. all continue to tick along just fine.
When it comes to enterprise deployments, the lifecycle always revolves around price/performance. Why pay for old gear that sucks up power and runs 30% slower than the new hotness, after all!
But, here we are, hitting limits of transistor density. There’s a reason I still can’t get 13th or 14th gen poweredge boxes for the price I paid for my 12th gen ones years ago.
There’s no marginal tax impact of discarding it or not after 5 years - if it was still net useful to keep it powered, they would keep it. Depreciation doesn’t demand you dispose of or sell the item to see the tax benefit.
No, but it tips the scales. If the new hardware is a little more efficient, but perhaps not so much so that you would necessarily replace it, the ability to depreciate the new stuff, but not the old stuff, might tip your decision.
But if your competitor is running newer chips that consume less power per operation, aren't you forced to upgrade as well and dispose of the old hardware?
Sure, assuming the power cost reduction or capability increase justifies the expenditure. It's not clear that that will be the case. That's one of the shaky assumptions I'm referring to. It may be that the 2030 nvidia accelerators will save you $2000 in electricity per month per rack, and you can upgrade the whole rack for the low, low price of $800,000! That may not be worth it at all. If it saves you $200k/per rack or unlocks some additional capability that a 2025 accelerator is incapable of and customers are willing to pay for, then that's a different story. There are a ton of assumptions in these scenarios, and his logic doesn't seem to justify the confidence level.
> Sure, assuming the power cost reduction or capability increase justifies the expenditure. It's not clear that that will be the case.
Share price is a bigger consideration than any +/- differences[1] between expenditure vs productivity delta. GAAP allows some flexibility in how servers are depreciated, so depending on what the company wants to signal to shareholders (investing in infra for future returns vs curtailing costs), it may make sense to shorten or lengthen depreciation time regardless of the actual TCO keep/refresh cost comparisons.
1. Hypothetical scenario: a hardware refresh costs $80B, the actual performance increase is only worth $8B, but the share price bump increases the value of the org's holding of its own shares by $150B. As a CEO/CFO, which action would you recommend, without even considering your own bonus that's implicitly or explicitly tied to share price performance?
Illustration numbers: AI demand premium = $150 hardware with $50 electricity. Normal demand = $50 hardware with $50 electricity. This is Nvidia margins @75% instead of 40%. CAPEX/OPEX is 70%/20% hardware/power instead of customary 50%/40%.
If the bubble crashes, i.e. the AI demand premium evaporates, we're back at $50 hardware and $50 electricity. Likely $50 hardware and $25 electricity if hardware improves. Nvidia back to 30-40% margins, operators on old hardware stuck with stranded assets.
The key thing to understand is that current racks are sold at grossly inflated premiums right now - scarcity pricing/tax. If the current AI economic model doesn't work, then fundamentally that premium goes away and subsequent build-outs are going to be cost-plus/commodity pricing = capex discounted by non-trivial amounts. Any breakthroughs in hardware, i.e. TPU compute efficiency, would stack opex (power) savings. Maybe by year 8, the first gen of data centers is still depreciated at $80 hardware + $50 power vs a new center @ $50 hardware + $25 power. That old data center is a massive write-down because it will generate less revenue than it costs to amortize.
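A toy version of that year-8 comparison, using the commenter's illustrative figures rather than market data:

```python
# Toy comparison of the year-8 scenario above: an old "AI premium" build still
# carrying book value vs. a new commodity-priced build. Figures are the
# commenter's illustrative numbers, not market data.
old_center = {"amortized_hardware": 80, "power": 50}   # bought at scarcity pricing
new_center = {"amortized_hardware": 50, "power": 25}   # cost-plus pricing + better hw

old_total = sum(old_center.values())
new_total = sum(new_center.values())
print(f"old build unit cost: {old_total}  |  new build unit cost: {new_total}")
print(f"old build must undercut by {old_total - new_total} per unit of compute "
      f"or take the difference as a write-down")
```

That gap per unit of compute is what ends up as either a price cut or a write-down on the old build.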
A typical data centre is $2,500 per year per kW load (including overhead, hvac and so on).
If it costs $800,000 to replace the whole rack, then that would pay off in a year if it reduces 320 kW of consumption. Back when we ran servers, we wouldn't assume 100% utilisation but AI workloads do do that; normal server loads would be 10kW per rack and AI is closer to 100. So yeah, it's not hard to imagine power savings of 3.2 racks being worth it.
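The same arithmetic as a quick check, using the parent's $2,500/kW-year and $800,000 figures:

```python
# Checking the parent's numbers: $2,500 per kW-year of data-centre cost,
# $800,000 to replace a rack. How much load reduction pays that back in a year?
cost_per_kw_year = 2_500
rack_replacement_cost = 800_000

kw_reduction_for_one_year_payback = rack_replacement_cost / cost_per_kw_year
print(f"{kw_reduction_for_one_year_payback:.0f} kW")
# -> 320 kW, i.e. roughly 3.2 racks' worth at ~100 kW per AI rack
```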
Thanks for the numbers! Isn't it more likely that the amount of power/heat generated per rack will stay constant over each upgrade cycle, and the upgrade simply unlocks a higher amount of service revenue per rack?
Not in the last few years. CPUs went from ~200W TDP to 500W.
And they went from zero to multiple GPUs per server. Tho we might hit "the chips can't be bigger and the cooling can't get much better" point there.
The usage would be similar if it was say a rack filled with servers full of bulk storage (hard drives generally keep the power usage similar while growing storage).
But CPU/GPU wise, it's just bigger chips/more chiplets, more power.
I'd imagine any flattening might be purely because "we have DC now, re-building cooling for next gen doesn't make sense so we will just build servers with similar power usage as previously", but given how fast AI pushed the development it might not happen for a while.
I've been in university research computing for 15 years, so large enough (~900 nodes) we need a dedicated DC, but not at the same scale as others around here.
Our racks are provisioned so that there are two independent rails, which each can support 7kW. Up until the last few years, this was more than enough power. As CPU TDPs increased, we started to need to do things like not connect some nodes to both redundant rails or mix disk servers into compute racks to keep under 7kW/rack.
A single HGX B300 box has 6x6kW power supplies. Even before we get to paying the (high) power bills, it's going to cost a small fortune to just update the racks, power distribution units, UPS, etc... to even be able to support more than a handful of those things
> Isn't it more likely that the amount of power/heat generated per rack will stay constant over each upgrade cycle,
Power density seems to grow each cycle. But eventually your DC hits power capacity limits, and you have to leave racks empty because there's no power budget.
Or they could charge the same as you and make more money per customer. If they already have as many customers as they can handle doing that may be better than buying hardware to support a larger number of customers.
It’s not about assumptions on the hardware. It’s about the current demands for computation and expected growth of business needs. Since we have a couple years to measure against it should be extremely straightforward to predict. As such I have no reason to doubt the stated projections.
Networking gear was famously overbought. Enterprise hardware is tricky as there isn’t much of a resale market for this gear once all is said and done.
The only valid use case for all of this compute which could reasonably replace AI is BTC mining. I’m uncertain if the increased mining capacity would harm the market or not.
BTC mining on GPUs hasn't been profitable for a long time; it's mostly ASICs now. GPUs can be used for some other altcoins, which makes the potential market for used previous-generation GPUs even smaller.
That assumes you can add compute in a vacuum. If your altcoin receives 10x compute then it becomes 10x more expensive to mine.
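A sketch of why that scaling holds for proof-of-work coins: the protocol emits coins at a roughly fixed rate, so your share (and hence your cost per coin) tracks total network hashrate. Numbers are arbitrary:

```python
# Why dumping more GPUs on a proof-of-work coin raises the cost per coin:
# the emission rate is fixed, so your share shrinks as total hashrate grows.
# Arbitrary illustrative numbers.
coins_emitted_per_day = 1_000
my_hashrate = 10.0
my_power_cost_per_day = 50.0

for network_hashrate in (100.0, 1_000.0):   # 10x more total compute
    my_coins = coins_emitted_per_day * my_hashrate / network_hashrate
    print(f"network hashrate {network_hashrate:>7.0f}: "
          f"I mine {my_coins:.0f} coins/day at ${my_power_cost_per_day / my_coins:.2f}/coin")
```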
That only scales if the coin goes up in value due to the extra "interest". Which isn't impossible but there's a limit, and it's more often to happen to smaller coins.
Failure rates also go up. For AI inference it’s probably not too bad in most cases, just take the node offline and re-schedule the jobs to other nodes.
There is the opportunity cost of using a whole datacenter to house ancient chips, even if they're still running. You're thinking like a personal use chip which you can run as long as it is non-defective. But for datacenters it doesn't make sense to use the same chips for more than a few years and I think 5 years is already stretching their real shelf life.
Do not forget that we're talking about supercomputers. Their interconnect makes machines not easily fungible, so even a low reduction in availability can have dramatic effects.
Also, after the end of the product life, replacement parts may no longer be available.
You need to get pretty creative with repair & refurbishment processes to counter these risks.
Historically, GPUs have improved in efficiency fast enough that people retired their hardware in way less than 5 years.
Also, historically the top of the line fabs were focused on CPUs, not GPUs. That has not been true for a generation, so it's not really clear if the depreciation speed will be maintained.
> Historically, GPUs have improved in efficiency fast enough that people retired their hardware in way less than 5 years.
This was a time when chip transistor cost was decreasing rapidly. A few years earlier even RAM cost was decreasing quickly. But these times are over now. For example, the PlayStation 5 (where the GPU is the main cost), which launched five years ago, even increased in price! This is historically unprecedented.
Most price/performance progress is nowadays made via better GPU architecture instead, but these architectures are already pretty mature, so the room for improvement is limited.
Given that the price per transistor (which TSMC & Co are charging) has decreased ever more slowly in recent years, I assume it will eventually come almost to a halt.
By the way, this is strictly speaking compatible with Moore's law, as it is only about transistors per chip area, not price. Of course the price per chip area was historically approximately constant, which meant exponentially increasing transistor density implied exponentially decreasing transistor price.
> This was a time when chip transistor cost was decreasing rapidly.
GPUs were actually mostly playing catch-up. They were progressively becoming more expensive parts that could afford being built on more advanced fabs.
And I'll have to point out, "advanced fabs" is a completely post-Moore's-law concept. Moore's law is literally about the number of transistors on the most economic package, not any bullshit about area density that marketing people invented in the last decade (you can go read the paper). With Moore's law, the cheapest fab improves quickly enough that it beats whatever more advanced fabs existed before you can even finish designing a product.
5 years is maybe referring to the accounting schedule for depreciation on computer hardware, not the actual useful lifetime of the hardware.
It's a little weird to phrase it like that though because you're right it doesn't mean you have to throw it out. Idk if this is some reflection of how IBM handles finance stuff or what. Certainly not all companies throw out hardware the minute they can't claim depreciation on it. But I don't know the numbers.
Anyway, 5 years is an inflection point in the numbers. Before 5 years you get depreciation to offset some of the cost of running it. After 5 years, you do not, so the math does change.
That is how the investments are costed, though, so it makes sense when we're talking return on investment: you can compare with alternatives under the same evaluation criteria.
General question to people who might actually know.
Is there anywhere that does anything like Backblaze's Hard Drive Failure Rates [1] for GPU Failure Rates in environments like data centers, high-performance computing, super-computers, mainframes?
The best that came back on a search was a semi-modern article from 2023 [2] that appears to be a one-off and mostly related to consumer-facing GPU purchases, rather than bulk data center, constant-usage conditions. It's just difficult to really believe some of these kinds of hardware depreciation numbers since there appears to be so little info other than guesstimates.
On continued checking, I found an arXiv paper from UIUC (Urbana, IL) about a 1,056-GPU A100 and H100 system. [3] However, the paper is primarily about memory issues and per-job downtime that causes task failures and work loss. GPU resilience is discussed, it's just mostly from the perspective of short-term robustness in the face of propagating memory corruption issues and error correction, rather than multi-year, 100%-usage GPU burnout rates.
Any info on the longer term burnout / failure rates for GPUs similar to Backblaze?
Edit: This article [4] claims it's 0.1-2% failure rate per year (0.8% (estimated)) with no real info about where the data came from (cites "industry reports and data center statistics"), and then claims they often last 3-5 years on average.
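For scale, applying that rough 0.8%/yr figure to a large fleet (the fleet size is arbitrary, and the constant-rate assumption is a simplification; real GPUs more likely fail on a bathtub curve):

```python
# Applying the article's rough 0.8%/yr failure-rate estimate to a big fleet.
# Fleet size is arbitrary and the constant-rate assumption is a simplification;
# real GPUs likely fail on a bathtub curve, not uniformly.
fleet = 100_000
annual_failure_rate = 0.008

surviving = fleet
for year in range(1, 6):
    failed_this_year = surviving * annual_failure_rate
    surviving -= failed_this_year
    print(f"year {year}: ~{failed_this_year:,.0f} failures, ~{surviving:,.0f} still running")
```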
When you operate big data centers it makes sense to refresh your hardware every 5 years or so because that’s the point at which the refreshed hardware is enough better to be worth the effort and expense.
You don’t HAVE to, but its more cost effective if you do.
(Source, used to operate big data centers)
It's worse than that in reality: AI chips are on a two-year cadence for backwards compatibility (NVIDIA can basically guarantee it, and you probably won't be able to pay real AI devs enough to stick around to build hardware workarounds). So their accounting is optimistic.
5 years is normal-ish depreciation time frame. I know they are gaming GPUs, but the RTX 3090 came out ~ 4.5 years before the RTX 5090. The 5090 has double the performance and 1/3 more memory. The 3090 is still a useful card even after 5 years.
Given power and price constraints, it's not that you cannot run them in 5 years time it's that you don't want to run them in 5 years time and neither will anyone else that doesn't have free power.