
In general, online rounds of coding competitions are no longer going to work, as GPT-4 has been shown to outperform humans on coding competition tasks. You could allow AI assistance, but that quickly becomes a "who can afford more compute" competition.

RIP coding competitions.

Incidentally, the World Finals were held at a location in the US or Europe up to 2020, and have been online since. Pretty incredibly, Gennady Korotkevich has won the title every year since 2014, except once.



> As GPT-4 has been shown to outperform humans on coding competition tasks

On the same frontpage: https://news.ycombinator.com/item?id=35297067

Food for thought. Also, by their own paper it only has a Codeforces rating of 392, which is below the 5th percentile.


Also, a cute prompt I saw in the same Twitter thread:

```

This is a variant of the monty hall problem:

Suppose you're on a game show, and you're given the choice of three doors. IMPORTANT: All the doors are transparent. Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who clearly knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

```

The response from ChatGPT (v3, non-Plus) is:

```

Yes, it is to your advantage to switch your choice.

In this variant of the Monty Hall problem, the fact that the doors are transparent does not change the underlying probability of the problem.

```

I wonder how GPT-4 does. There does seem to be a bit of overfitting, but I consider the prompt fair enough; it even calls out the trick in capital letters, so that even a tired human should be able to get the answer.

Edit: An even more obvious variant of the prompt is seen in https://news.ycombinator.com/item?id=35192466, which goes further and spells out that the contestant explicitly picks the door with the car. ChatGPT still gets it wrong.


GPT-4 gave me this answer:

In this variant of the Monty Hall problem, since all the doors are transparent, you can clearly see what is behind each door. This changes the nature of the problem entirely. The element of uncertainty, which is present in the original Monty Hall problem, is no longer a factor.

In this scenario, if you pick door No. 1 and see a car behind it, you should stick with your choice, as there is no advantage to switching. If you pick door No. 1 and see a goat behind it, you should switch to door No. 2, as you can clearly see the car behind it.

Since you can see what's behind the doors, the probability of winning the car is no longer based on conditional probability, and the original Monty Hall paradox does not apply. Instead, your decision is simply based on your observation of what's behind each door.


Somehow, this would be one of the most impressive things I've read about GPT-4. It's really difficult to argue that it doesn't have a well-founded understanding of the question, assuming, of course, that this wasn't actually in its training set.

And I see someone DID ask GPT-3.5-based ChatGPT the same question at least a month ago [1], so OpenAI certainly has it on record. That's long enough ago that it could well have been used to fine-tune GPT-4.

[1] https://www.reddit.com/r/bing/comments/117hj14/bing_chat_can...


GPT-4 responses.

-- First variant --

Since all the doors are transparent in this variant of the Monty Hall problem, you can already see what is behind each door when making your initial choice. Therefore, there is no need to rely on probability or switching strategies to maximize your chances of winning the car.

In this case, you simply choose the door with the car visible behind it. The host's action of opening another door and offering you the chance to switch doesn't provide any new information or change the odds, as you can already see the contents of each door.

-- Explicit variant --

In this particular case, since the doors are transparent and you can see that the car is behind door number one, you should stick with your original choice. There is no need to switch doors as the probability of winning the car by staying with door number one is 100%. The Monty Hall problem typically assumes that the doors are not transparent and that the contents behind them are unknown to the contestant.


> I wonder how gpt-4 does.

The problem is that as soon as people started tricking GPT-3-based ChatGPT with problems like that, the correct answers started being used to train the next versions and became part of the dataset.

So GPT-4 or GPT-5 may get the answer right, but that still wouldn't mean anything.


Not the case for GPT-4 though, as its knowledge cutoff is the same as GPT-3's. That's why it's easy to compare the two on the same problems and see the difference.


Not so fast!

Yes, the GPT-4 paper says

> GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its pre-training data cuts off in September 2021 [footnote: The pre-training and post-training data contain a small amount of more recent data], and does not learn from its experience.

But note the mention of more recent data. We know that InstructGPT (GPT-3.5) was RL-trained on examples of previous queries to GPT-3, such as those trick questions. We could assume everything ever sent to OpenAI (after filtering, e.g. for benchmark contamination) is in that post-training set. This is indeed a very small amount of data compared to the trillion-plus tokens of older data it was surely trained on. We also know that when ARC did their evaluations of GPT-4, OpenAI hadn't finished fine-tuning yet, so they've certainly been continuing to do so recently.

See also my other comment https://news.ycombinator.com/item?id=35300668


I suspect you're right. Part of the supervised "learning" is hard-coding answers to gotchas posted on Twitter.


Fairly certain that is not what is going on here. GPT-4 seems genuinely better at reasoning and harder to trick from my testing.


I've read your prompt several times now and still don't understand it. It seems intentionally crafted to be confusing, with messy punctuation. I get lost before finishing the paragraph every time.

Just a couple of years ago, anything other than it responding "I don't understand" would have been science fiction, and now we are surprised it answers incorrectly on something even humans have a hard time parsing.


The point of the prompt is that in the classic game you don’t know what’s behind the doors.

But in the variant of the game used in the prompt, the doors are transparent.

So in this variant you already know what is behind all of the doors, meaning you can simply choose the right door from the start, and also meaning that "revealing" what is behind one of the doors does not change the probability of what is behind the others.
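
To make the switch/stay math concrete, here is a quick Monte Carlo sketch (my own illustration, in Python, not something from the thread) of the classic opaque-door game. The transparent variant needs no simulation at all: you can see the car, so you just take that door.

```

import random

def classic_monty_hall(trials=100_000):
    # Simulate the classic (opaque-door) game and return the
    # win rates for staying with the first pick vs. switching.
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = random.randrange(3)   # door hiding the car
        pick = random.randrange(3)  # contestant's initial pick
        # Host opens a goat door that isn't the contestant's pick.
        opened = random.choice([d for d in range(3) if d not in (pick, car)])
        # Staying wins exactly when the first pick was the car.
        stay_wins += (pick == car)
        # Switching takes the one remaining closed door.
        switched = next(d for d in range(3) if d not in (pick, opened))
        switch_wins += (switched == car)
    return stay_wins / trials, switch_wins / trials

stay, switch = classic_monty_hall()
print(f"stay: {stay:.3f}, switch: {switch:.3f}")  # ~0.333 vs ~0.667
# With transparent doors there is nothing to simulate: you see the car,
# pick that door, and win with probability 1 whether or not you "switch".

```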


Chess already has that problem and is still thriving. Something else is going on here.


I'm sure it outperforms the general population, since most people can't code a hello world, or regurgitate an answer to a problem they have been trained on but don't actually understand, the way ChatGPT can. But if a minimally competent human gave me a completely nonsensical answer to a question they hadn't seen before, as confidently as ChatGPT does, I would think one of us had a stroke.

I would expect a journalist who never had to think through a theory of computation course to make breathless claims that ChatGPT can "solve programming problems", but I'm pretty surprised to hear so many people with jobs in tech repeating these claims, especially since all it would take to trip up ChatGPT is a few seconds typing in a slightly unfamiliar or non-trivial question. It's like a kind of second-order Turing test: if you think ChatGPT can program, you're not a real programmer.


In related news, all chess and Go competitions have been shut down permanently.


GPT-4 is completely incapable of solving any advanced problem from coding or mathematics competitions, and usually doesn't even appear to correctly understand the problem statement (assuming the solutions are not in the training set, of course).

Just try submitting the IOI 2022 and IMO 2022 problems.

Obviously it still outperforms the average human, since the average human has no knowledge of mathematics or computer science whatsoever.


> GPT-4 has been shown to outperform humans on coding competition tasks

Citation?


Probably, like most GPT-4 performance claims, from the non-peer-reviewed, scientifically formatted paper with opaque methodology published as PR for GPT-4.




Sometimes things that have been discussed as possibilities on this forum begin to be repeated and taken as fact. One might say that this forum has hallucinated the capabilities of this bot beyond its actual capacity, and the hallucination grows daily. A citation is indeed needed.


I haven't seen a clear answer on this, actually: there was some confusion about whether the training data contained solutions to older problems, and whether it underperforms on previously unseen ones. Plus it still seemed to underperform on medium and hard problems? Does anyone in the know have a summary on this?


I think it has only been shown that ChatGPT can solve easy and some medium LeetCode-style questions. I doubt it can solve difficult problems that would give human competitive programmers a hard time (for now, anyway...)


Oh, I looked and didn't find anything. I thought about writing something about how I thought it wasn't true, but decided it would be better to just ask if they had a reference for it. Also, it's an extraordinary claim, so pretending it's something everybody already knows is pretty weird.


GPT-4 has been out for what, weeks? We aren't talking established theorems here.


Chess has had engines far better than Magnus Carlsen for years, and it's not dead. People who have fun entering these competitions will continue having fun, while people who don't will keep crying that they're useless or dead or whatever.


That requires a clear ruleset.

In chess it's very clear where to draw the line: no help allowed, you're on your own with your brain.

Where should one draw the line in coding competitions? No ChatGPT? I guess most people would agree. No Copilot? Same as ChatGPT, as the products seem to converge. No Googling? That too will converge to something close to ChatGPT.

If you can't look up information during a contest anymore, it comes down to memorization instead of problem solving. I am afraid the concept of coding contests is dead indeed.


There have, of course, already been coding competitions that didn't allow any sort of reference material, ones that only allowed reference material you brought with you, etc.

And, honestly, plenty of people do have enough memorized to do this stuff.


In-person competitions simply don't allow any Internet access: you have to use their machines, and you can't bring any data, printed material, or electronic devices.


That's not the experience I had at past events. Everyone brought their own devices and solved the problems with whatever tools they deemed suitable.

Actually, I'm attending another such event in a few days and expect to see a lot of ChatGPT sessions, used with varying levels of success.


> GPT-4 has been shown to outperform humans on coding competition tasks

This is not true as of March 2023 (hard to prove a negative, of course, but look at the percentiles in the GPT-4 report).



