
suddenlybananas 7 months ago | on: AI agent benchmarks are broken

The model's scoring was done by another model, though, no? That was why the answer was mislabeled as correct. So a different model thought that 45+8=63.