
So it gives you the wrong answer and then you keep telling it how to fix it until it does? What does fancy prompting look like then, just feeding it the solution piece by piece?


Basically yes, but there's a very wide range of how explicit the feedback could be. Here's an example where I tell gpt-4 exactly what the rule is and it still fails:

https://chatgpt.com/share/66e514d3-ca0c-8011-8d1e-43234391a0...

and an example using gpt-4o:

https://chatgpt.com/share/66e515da-a848-8011-987f-71dab56446...

I'd share similar examples using claude-3.5-sonnet, but I can't figure out how to do it from the claude.ai UI.

To be clear, my point is not at all that o1 is so incredibly smart. IMO the ARC-AGI puzzles show very clearly how dumb even the most advanced models are. My point is just that o1 does seem to be noticeably better at solving these problems than previous models.


The easiest way I know of to share Claude chats is by using this Chrome extension to create a GitHub gist:

https://chromewebstore.google.com/detail/claudesave/bmdnfhji...

It's not perfect, but works fine for chats that don't have tables.


> where I tell gpt-4 exactly what the rule is and it still fails

It figured out the rule itself. It has problems applying the rule.

In this example, btw, asking it to write a program will solve the problem.
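
To illustrate what I mean (a rough sketch in Python; the rule below is made up, since the actual puzzle from the linked chat isn't reproduced in this thread): instead of having the model apply the rule cell by cell in the chat, ask it to emit a small transform function and run that code yourself:

    # Hypothetical stand-in rule; the real one only appears in the linked chat.
    def transform(grid):
        """E.g. the model might decide the rule is "mirror each row"."""
        return [row[::-1] for row in grid]

    test_input = [
        [1, 0, 2],
        [0, 3, 0],
    ]

    print(transform(test_input))  # [[2, 0, 1], [0, 3, 0]]

The point being that executing the model-written program sidesteps the step-by-step application errors.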


All examples are 404'd for me.


Hmm. My first thought was that I shared non-public links, but I double-checked I can access them from another machine.


FYI, they load fine for me.


Yeah, it seems to just be an issue with my Firefox configuration -- works fine on Edge.


The pages fail to load on old web browsers.



