> "If the Gatekeeper says "I am examining your source code", the results seen by the Gatekeeper shall again be provided by the AI party, which is assumed to be sufficiently advanced to rewrite its own source code, manipulate the appearance of its own thoughts if it wishes, and so on."

This IMHO is a huge loophole, and I would not accept the bet with it in place. In the real-world scenario I would expect a copy of the AI's source code to exist somewhere outside the box, which would provide some useful information; this protocol instead lets the AI lie about the nature of its code, making the "I am examining your source code" question completely useless.



Do you think having an outside method of examining the source code is enough of an advantage when the AI can rewrite its own source code?


Yes, because examining the old source code allows you to predict its behaviour, including the rewriting of source code. If line 42 says "never rewrite lines 42 or 43" and line 43 says "never kill humans", you would be more likely to let it out of the box than if line 42 said "rewrite whatever you want" and line 43 said "do whatever is necessary to achieve world domination."


> Yes, because examining the old source code allows you to predict its behaviour, including the rewriting of source code.

This is the halting problem (http://en.wikipedia.org/wiki/Halting_problem), and there is no solution.


You are incorrect: the halting problem only proves that you cannot solve it in the general case. The halting behaviour of a very significant subset of programs can be statically determined; it's easy to prove that "main(){}" halts and that "main(){while(true);}" doesn't. It should be trivially obvious that you could group all programs into "Halts" or "Unknown" with no false positives simply by executing each program for X steps and observing the result (a toy version of such a checker is sketched below).

If this were actually a concern of the programmers, they could design the program carefully to ensure it falls into the Halts category.
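
To make that concrete, here is a minimal sketch of such a bounded-step checker in C. The one-register toy machine and its three-instruction set are invented for illustration (they are not from any real analyzer); any interpreter with a step counter works the same way. When the budget runs out it reports "Unknown" rather than "Loops", so it can never produce a false "Halts":

    #include <stdio.h>

    enum op { INC, DEC_JNZ, HALT };           /* toy instruction set */
    struct insn { enum op op; int target; };  /* target used by DEC_JNZ */

    /* Run prog for at most max_steps; 1 = halted, 0 = budget exhausted. */
    static int halts_within(const struct insn *prog, long max_steps)
    {
        long reg = 0, pc = 0;
        for (long step = 0; step < max_steps; step++) {
            switch (prog[pc].op) {
            case INC:
                reg++; pc++;
                break;
            case DEC_JNZ:
                if (reg > 0) { reg--; pc = prog[pc].target; }
                else pc++;
                break;
            case HALT:
                return 1;  /* definitely halts: never a false positive */
            }
        }
        return 0;  /* budget exhausted: "Unknown", not "Loops" */
    }

    int main(void)
    {
        /* analogue of main(){}: halts immediately */
        struct insn trivial[] = { { HALT, 0 } };
        /* analogue of main(){while(true);}: bounces between the two
           instructions forever, so the step budget runs out */
        struct insn loop[] = { { INC, 0 }, { DEC_JNZ, 0 } };

        printf("%s\n", halts_within(trivial, 1000000) ? "Halts" : "Unknown");
        printf("%s\n", halts_within(loop, 1000000) ? "Halts" : "Unknown");
        return 0;
    }

This prints "Halts" then "Unknown". The only tunable is the step budget X; raising it moves more halting programs out of "Unknown", but a transhuman AI is exactly the kind of program you'd expect to stay there.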


> The halting behaviour of a very significant subset of programs can be statically determined...

Technically this may be correct, but I feel confident in asserting that a transhuman AI would not fall into that subset. You would have to run a second AI with the exact same inputs in order to make your 'prediction', leaving you in the same predicament with the second AI.


Until it adds line 165: "ignore line 42".



