Pentium floating-point division bug (1994) (wikipedia.org)
111 points by whycome on Aug 11, 2023 | 62 comments


That’s why they called it the Pentium and not the 586: on the bench it kept coming out at 585.9999999. I’m sorry.

To pay my way for the silliness: I will highly recommend “The Pentium Chronicles” for a view into that time and place.

It covers the development of the P6 arch, but there is some inside baseball about the P5. And it’s just one of the best books I’ve ever read about really amazingly well-run engineering efforts.


I will always appreciate the old jokes like this. Never apologize for them, no matter how cheesy!

My favorite from the '90s was: what do a computer and an air conditioner have in common? Both stop working when you open windows.


Why is Linux like a Wigwam?

No Windows, no Gates and Apache inside.


TIL that wigwam and wickiup are two names for similar enough dwellings that we may consider them the same thing. I always thought that wigwams were bigger. They're usually called wigwams in the northeast among the Iroquois confederacy and wickiups in the southwest among the Apache.


I think the Iroquois were mostly associated with longhouses, not huts.


yeah I think my issue stems from getting taught that wigwam was a synonym for longhouse when we learned the terms in school in the Northeast (like dead in the middle of an Iroquois nation). Haudenosaunee (endonym for Iroquois) literally references the longhouse. Wigwam has nothing to do with longhouse! It's an Algonquian word (further East from the Iroquois). Will keep this in mind.


This reminds me of another old line:

"In a world without walls or fences, who needs Windows or Gates?"


Your comment reminded me that Cyrix had a "586" chip (technically it was called the 5x86), and 12-year-old me definitely thought it was an Intel when I bought it at a local computer show. Turned out to be a great little processor for the price.

Cyrix was a really fascinating company. Extremely short lived but had a huge impact on Intel's grip over the processor space (Intel lost all their lawsuits against them and nearly faced antitrust proceedings because of it).


Q: What do you get when you cross a mathematician with an Intel Pentium Chip? A: A mad scientist!

This bug is a great counterexample to the adage that 'computers don't make mistakes - they do exactly as they're told'.


I believe there is a lot of overlap between the transcript of an interview Robert Colwell did [1] and his book The Pentium Chronicles. I've read and liked both. I'm linking to archive.org's copy as the original site appears to be defunct.

[1] Oral history of Robert P. Colwell (1954- ) Interviewed by Paul N. Edwards, Assoc. Prof., University of Michigan School of Information, at Colwell’s home near Portland, Oregon, on August 24-25, 2009

https://web.archive.org/web/20210726205114/https://www.sigmi...


I just checked and this book is AU$100. Why so expensive?


When I was running a team that included other managers I bought copies for everyone, and I believe at least one is in a box somewhere. I’ve read it dozens of times so I’m happy to pass it along if I can find it.

No promises (I’ve moved a lot) but if you email me I’ll try to find it and if I do it’s yours.


Appears to be available for lending at archive.org


Wild guess - out of print, highly sought after, collector's item?


It's still in print - it's listed on Amazon UK for £37.95 (~$74 AUD).

If you're lucky enough to know someone with the right IEEE Xplore subscription: https://ieeexplore.ieee.org/book/5989703


0.999 repeating is an infinite summation that converges to 1.
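
Spelled out as a geometric series (standard closed form; nothing beyond the statement above):

     $0.\overline{9} \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}} \;=\; \frac{9/10}{1 - 1/10} \;=\; 1$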


There are multiple proofs of this, but my favorite simple yet informal one is that there is no real number you can add to 0.999 infinitely repeating to make it 1. So mathematically, they are equal.


1.0-0.99999>0

If spheres have theoretically zero contact area, and they stack in a tube, is there zero contact area between the spheres?

We say that the limits of 1/x and 2/x are equal, but they have different slopes approaching said asymptotic limit


[ time to re-share my Dr. Nicely story ]

Dr. Nicely caused quite a bit of excitement at Intel. I was on the p6 architecture team when he discovered the FDIV bug. Our FPU was formally verified and didn't have the same bug. To be nice to Dr Nicely we sent him a pre-release p6 development system to test with his program, to demonstrate that his bug was fixed. He was working on a prime number sieve program and came back reporting that the p6 ran at 1/2 the speed of a Pentium for his code. Wow, another black eye/firestorm caused by Dr. Nicely. He had too much of an audience for us to let him report to the world that this new processor was slower.

So I got to spend a lot of time learning how the sieve works and what was happening. For the most part, it allocates a huge array in memory with each byte representing a number. You walk the array with a stride of each known prime, setting bytes, and whatever is left unset must be prime, i.e. every multiple of 3 is not prime, every multiple of 5 is not prime, every multiple of 7...

So in the steady state, you are writing a single byte to a cache line without reading anything. And every write hits a different cache line.

Now p6 had a write-allocate cache, but the Pentium would only allocate on read, so on the Pentium a write that misses the cache would become a write to memory. On the p6 that write would need to load the cache line from memory into the cache and then the line in the cache was modified. And since every line in the cache was also modified we had to flush some other cache line first to make room. So every 1-byte write would become a 32-byte write to memory followed by a 32-byte read from memory.

Normally write-allocate is a good thing, but in this case, it was a killer. We were stumped.

Then the magic observation: 99% of these writes were marking a space that was already marked. When you get up to walking by large strides most of those were already covered by one of the smaller factors.

So if you change the code from:

     array[N] = 1
to:

     if (!array[N]) array[N] = 1
Now suddenly we are doing a read first, and after that read we skip the write so the data in the cache doesn't become modified and can be discarded in the future. Also, the p6 was a super-scalar machine that ran multiple iterations of this loop in parallel and could have multiple reads going to memory at the same time. With that small tweak, the program got 4X faster and we went from being 1/2X the speed of a Pentium to being twice the speed. And this was at the same clock frequency. The test hardware ran at 100 MHz; we released at 200 MHz and went up from there.
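
A minimal sketch of that pattern in C (a reconstruction for illustration, not the original sieve code; names and bounds are invented), showing the read-before-write tweak:

     #include <stdlib.h>

     /* Byte-per-number sieve: mark composites, leave primes at 0.
      * The conditional store is the key tweak: if the byte is already
      * marked we never dirty its cache line, so the line can later be
      * evicted without a write-back to memory. */
     static unsigned char *sieve(size_t limit)
     {
         unsigned char *composite = calloc(limit + 1, 1);
         if (!composite)
             return NULL;
         for (size_t p = 2; p * p <= limit; p++) {
             if (composite[p])
                 continue;
             for (size_t n = p * p; n <= limit; n += p) {
                 if (!composite[n])      /* read first ...           */
                     composite[n] = 1;   /* ... write only if needed */
             }
         }
         return composite;   /* composite[i] == 0 means i is prime, for i >= 2 */
     }

With the unconditional store, every miss on a write-allocate cache turns into a line fill plus an eventual write-back, which is exactly the traffic described above.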


The Pentium Pro is by far the CPU I remember being the most surprised about, and I was an engineer at Intel when it was released. Such an amazing design and manufacturing technique, and 200 MHz with 256 or 512K L2 cache! That seemed like such a huge cache at the time. Performance on video codecs with some tweaks was a 2x jump as well. Funny enough, I still have a working Pentium Pro 200 system. Good times!


I regret getting rid of mine. Harder to find now, or 'pricey' on eBay compared to 10 years ago.

I'll never forget the size of the cpus compared to others at the time. Running them in dual-cpu setups was fun also.


Yeah, it was surprising how many years after that the perfect-spec developer station was a Pentium Pro, a Mach 64 graphics card, a monochrome card with a mono monitor for WinICE, and of course Windows NT 4.0.


There are a lot of cache-related performance behaviors that can be triggered by the "wrong" code. I was the product manager for a minicomputer system once, and an outside performance consultant stumbled on one of those, which caused a bit of a kerfuffle at the time. As I recall, it wasn't really an issue for real-world workloads, but if you did things just wrong, it could be a significant performance hit.


Could this transformation still be effective on modern processors?


It wouldn't be needed today on x86_64. The underlying idea - priming the processor with hints about what and when - is now standard practice. There are explicit instructions for manipulating the cache and marking data as needed-soon. E.g. PREFETCH, available as __builtin_prefetch in GCC. Added to the Pentium III, I think.
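
For illustration only (the 64-element prefetch distance is an arbitrary guess, not a tuned value), a sketch using the GCC/Clang builtin:

     #include <stddef.h>

     /* Sum an array while hinting upcoming elements into the cache.
      * __builtin_prefetch(addr, rw, locality): rw = 0 means the data
      * will be read; locality 0..3 says how long it should linger. */
     long sum_with_prefetch(const long *data, size_t n)
     {
         long total = 0;
         for (size_t i = 0; i < n; i++) {
             if (i + 64 < n)
                 __builtin_prefetch(&data[i + 64], 0, 1);
             total += data[i];
         }
         return total;
     }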


I recall my former professor at UT Austin, J Strother Moore, sharing how Intel utilized an Automatic Theorem Prover he helped create to spot bugs in processors. But when engineers tweaked the design just a little, thinking it was a no-brainer that it wouldn't cause issues, they skipped the automated checks.

And just like that, the infamous Pentium bug hit the headlines globally.


Thanks for that nice anecdote. Moore is also famous for the Boyer-Moore string matching algorithm.

Also, in the memory of Dr. Thomas R. Nicely (1943-2019), who found the bug:

https://faculty.lynchburg.edu/~nicely/


A joke at the time riffed off a Star Trek TNG quote:

“I am Pentium of Borg. Division is futile.”


The full version goes: "We are Pentium of Borg. Division is futile. You will be approximated."


I forgot that last part! Thanks for that nostalgia blast.


My takeaway from this is a lesson about validation strategy. First, I will point out that exhaustive testing of 2^64 * 2^64 operand pairs is impractical; for scale, recall that there are around 2^75 (I may be off a bit here) atoms in the universe. So you won't run all pairs, much less generate expected results for all pairs. So you must sample. Purely random sampling is mostly going to generate operand pairs that are not edge cases or corner cases, where one or both of the operands falls precisely on a binade boundary (the precise point on the floating-point number line where the exponent increments/decrements by 1). Your sampling must be heavier for the edge cases. Lesson: understand the edge cases and be sure to attack them.
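
A toy illustration in C of what "an operand on a binade boundary" looks like, using IEEE 754 binary64 for concreteness (the helper and the exponent range are invented for illustration, not anything from Intel's validation flow):

     #include <stdint.h>
     #include <stdio.h>
     #include <string.h>

     /* Assemble a binary64 value from its sign, biased exponent and
      * fraction fields. Purely for illustrating edge-case operands. */
     static double make_double(uint64_t sign, uint64_t biased_exp, uint64_t frac)
     {
         uint64_t bits = (sign << 63) | (biased_exp << 52) | (frac & ((1ULL << 52) - 1));
         double d;
         memcpy(&d, &bits, sizeof d);
         return d;
     }

     int main(void)
     {
         /* For each exponent, the binade edges are the smallest and the
          * largest significands; a boundary-weighted sampler would draw
          * operands clustered around values like these. */
         for (uint64_t e = 1021; e <= 1025; e++) {
             double lo = make_double(0, e, 0);                 /* 1.000...0 * 2^(e-1023) */
             double hi = make_double(0, e, (1ULL << 52) - 1);  /* 1.111...1 * 2^(e-1023) */
             printf("exp %4llu: %a .. %a\n", (unsigned long long)e, lo, hi);
         }
         return 0;
     }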


The x87 FPU used here gives 80 bits of storage, so it's actually 2^80*2^80 operand pairs.


Yes, I was simplifying. Also 32-bit mode.


Back in the day this was the more interesting bug to me even if it was less serious in practice: https://en.m.wikipedia.org/wiki/Pentium_F00F_bug


what's interesting is that this isn't as serious as some of the issues today.

just following the example, the result is wrong after the 4th digit. this is absolutely 'could be handled in software with a significant performance loss' territory.

that isn't to say this isn't breaking -- it absolutely is.

what I'm getting at is that my CPU runs 30% slower with mitigations. if I were running multiple servers like the computer I use, and I was near the capacity of my resources, I would need to add some machine(s) -- and people do.

in the same way this is serious, these current issues are the same yet somehow we've lost some stuff along the way and we just don't do recalls or even receive any remedies. where's my check?


For the most part, the bugs we've seen in recent history can be handled at the kernel level - there's largely no need for applications to care about them (browsers are to an extent a counterexample, but even then, if you disable hyperthreading it can basically be ignored at the browser level). The floating point division bug was something that every single application would need to have handling code for. From a "what is the worst-case outcome?" standpoint, the Spectre family of bugs may win. From a "what is going to require the most engineering effort to fix in software?" standpoint, FDIV would probably have been a much bigger deal.


One option would be to have an x86-to-x86 compiler that automatically fixed your code. In principle you don't need access to source, and it could be shipped by the OS.


I have this theory that Intel knew very well what they were doing by cutting corners everywhere. This way their processors were simply faster than the ones manufactured by AMD.

But it was a "fake" advantage (due to ignoring bugs) - now that those bugs have come to light, their processors are slower. But they were already sold.

And yes I know that AMD has some similar bugs.

The US government will not go after Intel, though, due to public safety issues - they want their processors to be manufactured on US soil (a different question is whether they are really safe with all those bugs and spying features, and also whether they are even manufactured inside the USA anymore).

Other governments could go after Intel though.


>And yes I know that AMD has some similar bugs.

So, how does your theory stand?

If we assume that both companies were cutting corners and one was ahead, then?


Intel was vulnerable to Meltdown and Spectre. AMD was only vulnerable to Spectre.

Meltdown arose because vulnerable designs delay memory protection checks until as late as possible, after illegal accesses could have already affected state. This is a questionable decision, as it is playing fast and loose with the most fundamental building blocks of security. Most companies wisely chose a more cautious approach.

If an investigation found that watermelon is carcinogenic, that would be a problem. If that same investigation found that Andy's Farm sells watermelon that contains benzene, Andy's Farm couldn't defend itself by saying all watermelon is carcinogenic.


Until they got caught out on it, competition on performance incentivized both to cut corners on side-channel attack surfaces.


I've had a notion for years that Intel has intentionally failed to pursue some areas of performance improvement so as to reserve them for "official" users.

Way, way back at the beginning of the PC revolution, for example: SRAM vs DRAM? Intel went all in on DRAM. Imagine how different the world could be today if they'd chosen differently and could expect all main RAM to be "cache latency". DRAM would be a slower dynamic store, possibly looking more like hard drives did.

What if they'd decided that unifying a bus was a good idea instead of tossing out new ones every couple of years? Imagine the longevity and variety of the VME bus ecosystem coupled with the size of the PC market, in the 80s.


> for example: SRAM vs DRAM? Intel went all in on DRAM.

This isn't really how it went. There was never a future with hundreds of megabytes of SRAM as it requires significantly more die area to produce and more power to use, making it significantly more expensive. The entire point of caches was because we couldn't afford to just make everything SRAM. Even today, we are only just getting to the point where you might have a few hundred megabytes of SRAM on the most expensive server CPUs.


> The entire point of caches was because we couldn't afford to just make everything SRAM.

Because SRAM is expensive compared to DRAM. A standard SRAM cell needs six transistors per bit, while a DRAM cell needs just one transistor plus a capacitor. With routing, a single SRAM cell ends up several times larger than a DRAM cell (at the same process node). So DRAM can be packed to densities that are not feasible for SRAM. There's a reason AMD (and others) is/are starting to put cache on an entirely separate die (X3D).


We're already up to 1.1 GB of cache:

https://www.servethehome.com/amd-genoa-x-the-1-1gb-l3-cache-...

It's a little bit mindblowing.


UVM, formal verification, and co-simulation through DPI were in their infancy back then, thus such silly (albeit critical) errors kept popping up and were really hard to catch during pre/post-silicon testing.

You can't catch Spectre or similar things, where multiple blocks are all doing the wrong thing together, with UVM or formal nowadays either (the search space is too large), so I'm eager to see what kind of top-level verification methodologies people will invent as designs grow larger and larger.


It's funny how they had to recall the CPUs (even if they tried to make it harder), and now, with all the security issues (from Spectre to Downfall), where the patches cause a massive CPU performance hit, no one is yelling for a recall.


Well, the "solution" to side channel attacks (including those that haven't been discovered yet) on pretty much every out-of-order execution processor out there, which is to say pretty much every high performance processor, is to disable a lot of features and significantly throttle them--which no one really wants to do.


I do. I want my hardware to do exactly what it gets told to do, and nothing else. A pity that that's basically impossible these days.


I would rather call it "noteworthy" than "funny".


Volkswagen also pretty much got away with selling cars that do not reach power specifications (if you remove the environmental problem).


I don't know about outside the US, but in the US, they had to offer to buy back the cars and/or modify them to fit the environmental spec. I don't know about power specs, but the mileage specs were done under the compliant engine control regime; so post modification you would get mileage to spec as opposed to the better mileage available earlier.

The buyback option was based on pricing before the news broke and for a good-condition vehicle; there was an adjustment for odometer miles, but the vehicle could be in any condition, it just had to run and move under its own power. They were not permitted to resell purchased vehicles anywhere globally unless they were modified to properly comply with emissions controls.

And they were forced into selling EVs and building a charging network. Maybe that will work out for them, maybe not.


I had friends at the time who made six-figure gains from buying INTC calls during the initial panic. Fortis fortuna.


I think it was the first time that the "public at large", even the computer nerds, had really realized that the hardware could have bugs, serious bugs.


I recall VB6 having a checkbox for dealing with this bug when you compiled.


My first PC was a Pentium 90 that I got in summer 1994. It had the FDIV bug. I read about the bug, didn't think it was a big deal for how I used my computer and never got a replacement chip. I still have the CPU in a box in the closet.



I have always wondered if anyone was able to construct an interest rate derivatives pick-off play based on this bug. My searches of the intertubes have found nothing so far.


"Intel Inside Can't Divide!" was what people said when this news broke.


Is it known how those few entries in the table got the wrong values?


Intel's whitepaper on it has a bit more information, but it still seems to (after a skim) kinda handwave away just HOW the bug got introduced.

https://users.fmi.uni-jena.de/~nez/rechnerarithmetik_5/fdiv_...


Yes. Off-by-one error in a bash script that generated them.



