I have been using GPT-4 daily for coding for probably six months, and I started using Claude Opus immediately when it came out. Opus is ahead of GPT-4. There are times when I test both, but I am almost always solely using Opus. GPT-4's laziness is still a huge issue; it seems like Anthropic specifically trained Opus not to be lazy.
The UX of GPT-4 is better: you can cancel chats, edit old chats, etc. But the raw model is behind. You have to expect that OpenAI is working on something big and is not afraid of lagging behind Anthropic for a while.
Speaking from my own experience, which may differ from the grandparent comment's: I'll ask ChatGPT (on GPT-4) for some analysis or a factual lookup, and I'll get back a kind of generic answer that doesn't address the question. If I then prompt it again with a "please look it up" type message, the next reply will have the results I would have initially expected.
It makes me wonder if OpenAI has been tuning it not to do web queries below a certain threshold of "likely to improve the reply."
I'd say ChatGPT's replies have also gotten slowly worse with each passing month. I suspect that as they tune it to avoid bad outcomes, they're inadvertently also chopping off the high points.