
So it gives you the wrong answer and then you keep telling it how to fix it until it does? What does fancy prompting look like then, just feeding it the solution piece by piece?


Basically yes, but there's a very wide range of how explicit the feedback could be. Here's an example where I tell gpt-4 exactly what the rule is and it still fails:

https://chatgpt.com/share/66e514d3-ca0c-8011-8d1e-43234391a0...

and an example using gpt-4o:

https://chatgpt.com/share/66e515da-a848-8011-987f-71dab56446...

I'd share similar examples using claude-3.5-sonnet, but I can't figure out how to do it from the claude.ai UI.

To be clear, my point is not at all that o1 is so incredibly smart. IMO the ARC-AGI puzzles show very clearly how dumb even the most advanced models are. My point is just that o1 does seem to be noticeably better at solving these problems than previous models.


The easiest way I know of to share Claude chats is by using this Chrome extension to create a GitHub gist:

https://chromewebstore.google.com/detail/claudesave/bmdnfhji...

It's not perfect, but works fine for chats that don't have tables.


> where I tell gpt-4 exactly what the rule is and it still fails

It figured out the rule itself. It has problems applying the rule.

In this example, btw, asking it to write a program will solve the problem.
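
To illustrate what I mean (a rough sketch in Python; the rule below is made up, since the actual puzzle from the linked chat isn't reproduced in this thread): instead of having the model apply the rule cell by cell in the chat, ask it to emit a small transform function and run that code yourself:

    # Hypothetical stand-in rule; the real one only appears in the linked chat.
    def transform(grid):
        """E.g. the model might decide the rule is "mirror each row"."""
        return [row[::-1] for row in grid]

    test_input = [
        [1, 0, 2],
        [0, 3, 0],
    ]

    print(transform(test_input))  # [[2, 0, 1], [0, 3, 0]]

The point being that executing the model-written program sidesteps the step-by-step application errors.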


All examples are 404'd for me.


Hmm. My first thought was that I shared non-public links, but I double-checked I can access them from another machine.


FYI, they load fine for me.


Yeah, it seems to just be an issue with my Firefox configuration -- works fine on Edge.


The pages fail to load on old web browsers.



