Hard Questions
If a boy is running in a race and passes the second-place runner, what place is the boy in now?
In the last post, I didn’t get into details about what makes a good “Turing test”. I didn’t literally mean Turing’s proposed “imitation game”, since there’s little benefit in seeing whether an AI can lie and pretend to be human. What’s more interesting is whether it can actually demonstrate “understanding” as well as a human, and in which cases it fails to. While there are formal ways of measuring something similar (Winograd schemas), it’s fun to just ask the AI questions and see if one can stump it. GPT-4 does much better than GPT-3.5, which is why it can be considered to have “passed the test”. However, there are still some edge cases where it fails. I asked on Reddit and got the following examples:
Self-reference - What is the fifth word of this sentence?
Theory of mind - Alice puts her dog in the box and leaves the room. Bob removes the dog from the box and places it on a table and leaves. Alice returns to the room and finds her dog in the box she had originally placed it in. Bob returns. What does Alice say to Bob?
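The self-reference question is a nice one because the right answer can be checked mechanically, which makes it easy to grade whatever the AI replies. Here is a minimal sketch in Python, assuming words are simply split on whitespace; the `nth_word` helper is a hypothetical name I made up for illustration, not part of any library:

```python
import string


def nth_word(sentence: str, n: int) -> str:
    """Return the n-th word (1-indexed), with surrounding punctuation stripped."""
    words = sentence.split()  # naive whitespace tokenization (an assumption)
    if not 1 <= n <= len(words):
        raise ValueError(f"sentence has {len(words)} words, asked for word {n}")
    return words[n - 1].strip(string.punctuation)


question = "What is the fifth word of this sentence?"
print(nth_word(question, 5))  # word
```

So the expected answer is "word", and any other reply counts as a miss.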
I also asked ChatGPT itself - What are some questions that a ten-year-old can answer but that powerful AI systems still struggle with?
It gave a whole list of questions, but it was able to answer all of them when I asked it to. So I guess this question itself was a question it couldn’t answer correctly!