I reproduced this on Claude 3.5 Sonnet, but found that changing your prompt to "Which is heavier, a 9.99-pound back of steel ingots or a 10.01-pound bag of fluffy cotton?" corrected its reasoning across repeated tests. For some reason, it was not able to figure out that "10.01" referred to pounds.
2. The cotton bag has a mass of 10.01 pounds, while the steel bag has a mass of 9.99 pounds
3. While steel is much denser than cotton (meaning it takes up less space per pound), this doesn't affect the weight - a pound is a pound regardless of material
4. 10.01 is greater than 9.99 by 0.02 pounds
Therefore, the 10.01-pound bag of fluffy cotton is heavier than the 9.99-pound bag of steel ingots. The cotton may take up much more space, but when we're just comparing weights, density doesn't matter - only the mass does.
Ah, good catch -- I had just copied the OP. Changing "back" to "bag" doesn't seem to change much: it still oscillates between the right and wrong answers, whereas qualifying the 10.01 value as pounds has not yet given me a wrong answer (small sample size, of course).