Research from UCLA psychologists has discovered a surprising new contender in our analogical reasoning battles – the artificial intelligence language model, GPT-3. Apparently, it holds its own against college undergraduates on reasoning problems typical of intelligence tests and the SAT.
But it fails to answer a key question: Is GPT-3 merely parroting human reasoning, or has it stumbled onto a brand-new cognitive process? (And, does this research say more about technology, college students, or intelligence tests?!)
Humans vs GPT-3
OpenAI holds GPT-3’s secrets under tight wraps, so they aren’t going to be much help in figuring out how the algorithm works its “magic.” Despite the mystery, the UCLA researchers found that GPT-3 outperformed their expectations on some tasks. Yet, other tasks saw it crash and burn.
Despite its ability to embarrass some college students, the study’s first author, Taylor Webb, emphasized GPT-3’s limitations. While it excels at analogical reasoning, it fails spectacularly at tasks simple for humans, like using tools to solve physical problems.
Webb and his colleagues tested GPT-3 on problems inspired by Raven’s Progressive Matrices. They translated the visual problems into text and gave the same problems to 40 UCLA undergraduate students.
Not only did GPT-3 perform as well as humans, but it also made similar mistakes.
What the Study Results Mean
GPT-3 solved 80% of the problems correctly, while the human average score was below 60%. The team then tested GPT-3 with SAT analogy questions they believed had never been on the internet (which would mean they weren’t part of the GPT training data). Again, GPT-3 outperformed the average college applicant’s score (then again, we know these tests aren’t really a measure of intelligence).
However, when the researchers tested the program against student volunteers on analogy problems based on short stories, GPT-3 struggled.
And tasks that require understanding physical space continue to baffle the so-called “artificial intelligence.”
“No matter how impressive our results, it’s important to emphasize that this system has major limitations,” said Taylor Webb, the study’s first author. “It can do analogical reasoning, but it can’t do things that are very easy for people, such as using tools to solve a physical task. When we gave it those sorts of problems — some of which children can solve quickly — the things it suggested were nonsensical.”