WTF Fun Fact 13482 – GPT-3 Reasoning Skills

Research from UCLA psychologists has discovered a surprising new contender in our analogical reasoning battles – the artificial intelligence language model, GPT-3. Apparently, it holds its own against college undergraduates on reasoning problems typical of intelligence tests and the SAT.

But the research leaves a key question unanswered: Is GPT-3 merely parroting human reasoning, or has it stumbled onto a brand-new cognitive process? (And does this research say more about technology, college students, or intelligence tests?!)

Humans vs GPT-3

OpenAI keeps GPT-3’s secrets under tight wraps, so the company isn’t going to be much help in figuring out how the algorithm works its “magic.” Despite the mystery, the UCLA researchers found that GPT-3 outperformed their expectations on some tasks. On others, it crashed and burned.

Despite its ability to embarrass some college students, the study’s first author, Taylor Webb, emphasized GPT-3’s limitations. While it excels at analogical reasoning, it fails spectacularly at tasks simple for humans, like using tools to solve physical problems.

Webb and his colleagues tested GPT-3 on problems inspired by Raven’s Progressive Matrices. They translated the visual problems into text and gave the same problems to 40 UCLA undergraduate students.
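To get a feel for what “translating a visual problem into text” might look like, here is a minimal sketch. The bracketed layout, the function name matrix_to_text, and the sample digits are all assumptions made for illustration; the article does not describe the encoding the researchers actually used.

```python
# Hypothetical illustration only: this is NOT the study's actual format.
# It renders a 3x3 grid of digit cells as text and blanks out the last
# cell, which is the one the solver would be asked to fill in.

def matrix_to_text(rows):
    """Render a grid as bracketed rows, replacing the final cell with '?'."""
    lines = []
    for r, row in enumerate(rows):
        cells = []
        for c, cell in enumerate(row):
            is_last = r == len(rows) - 1 and c == len(row) - 1
            if is_last:
                cells.append("[ ? ]")
            else:
                cells.append("[ " + " ".join(str(d) for d in cell) + " ]")
        lines.append(" ".join(cells))
    return "\n".join(lines)


# A made-up problem: each cell holds one digit, counting up across the grid.
problem = [
    [(1,), (2,), (3,)],
    [(4,), (5,), (6,)],
    [(7,), (8,), (9,)],
]

prompt = matrix_to_text(problem)
print(prompt)
```

The point of a text version is that a language model with no image input can read the same pattern a person would see in the picture.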

Not only did GPT-3 perform as well as humans, but it also made similar mistakes.

What the Study Results Mean

GPT-3 solved 80% of the problems correctly, while the human average score was below 60%. The team then tested GPT-3 with SAT analogy questions they believed had never been on the internet (which would mean they weren’t part of the GPT training data). Again, GPT-3 outperformed the average college applicant’s score (then again, we know these tests aren’t really a measure of intelligence).

However, when the researchers tested the program against student volunteers on analogy problems based on short stories, GPT-3 struggled.

And tasks that require understanding physical space continue to baffle the so-called “artificial intelligence.”

“No matter how impressive our results, it’s important to emphasize that this system has major limitations,” said Taylor Webb, the study’s first author. “It can do analogical reasoning, but it can’t do things that are very easy for people, such as using tools to solve a physical task. When we gave it those sorts of problems — some of which children can solve quickly — the things it suggested were nonsensical.”

Source: “GPT-3 can reason about as well as a college student, psychologists report” — ScienceDaily

WTF Fun Fact 13240 – ChatGPT Medical Exam

There’s been a lot of speculation and panic around what the AI tool ChatGPT can and can’t do and whether it’s going to replace/destroy us all. But it looks like it’s not going to be replacing doctors any time soon, even though it may be a semi-reliable source for those studying for the United States Medical Licensing Exam (USMLE).

Did ChatGPT pass a medical licensing exam?

In a word, no. ChatGPT did not “pass” a medical licensing exam, though there will probably be some sensational headlines to the contrary.

A study published February 9, 2023 in the open-access journal PLOS Digital Health by Tiffany Kung, Victor Tseng, and others at AnsibleHealth found that:

“ChatGPT can score at or around the approximately 60 percent passing threshold for the United States Medical Licensing Exam (USMLE), with responses that make coherent, internal sense and contain frequent insights…”

In other words, ChatGPT can generate human-like responses to certain types of questions on the exam.

According to a press release about the study:

“Kung and colleagues tested ChatGPT’s performance on the USMLE, a highly standardized and regulated series of three exams (Steps 1, 2CK, and 3) required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, ranging from biochemistry, to diagnostic reasoning, to bioethics.”

However, they had to remove all image-based questions, so the AI could attempt only 350 of the 376 publicly available questions from the June 2022 exam.

How well did AI do on the medical exam?

There’s a joke people tell that goes a little something like this:
What do you call the person who graduates first in their medical class?
Answer: Doctor.
What do you call the person who graduates last in their medical class?
Answer: Doctor.

In other words, you’re never going to know if your doctor aced their exam or barely passed.

But if ChatGPT is your doctor, you will know it’s incapable of doing really well on the exam (though in some cases, it did pass). After the researchers removed all “indeterminate responses,” “ChatGPT scored between 52.4% and 75.0% across the three USMLE exams. The passing threshold each year is approximately 60%.”
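The numbers are easy to check for yourself. The short sketch below uses only figures quoted in this post (the 52.4% to 75.0% range, the roughly 60% threshold, and the 350 of 376 questions); the variable and function names are made up for illustration, and the threshold is approximate.

```python
# Figures quoted in the article; names here are illustrative assumptions.
PASSING_THRESHOLD = 60.0                 # approximate passing mark, in percent
reported_low, reported_high = 52.4, 75.0  # ChatGPT's reported score range

def clears_threshold(score, threshold=PASSING_THRESHOLD):
    """Return True if a percentage score meets or beats the threshold."""
    return score >= threshold

# Share of the publicly available questions the AI could attempt
# once image-based questions were removed.
attempted, available = 350, 376
share_attempted = round(100 * attempted / available, 1)

print("Lowest reported score passes:", clears_threshold(reported_low))
print("Highest reported score passes:", clears_threshold(reported_high))
print("Share of questions attempted (%):", share_attempted)
```

In other words, the top of the range clears the bar and the bottom does not, which is why “at or around” the threshold is the fair description.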

It did, however, produce novel and clinically significant insights the vast majority of the time. But if you can’t pass, that doesn’t really mean much.

What’s also interesting is that “ChatGPT exceeded the performance of PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored 50.8% on an older dataset of USMLE-style questions.”

But all that means is that ChatGPT is a really good AI language model. It doesn’t mean it’ll be replacing your doctor any time soon. And since it’s not always good at producing correct information, you may not want to use it to get medical advice.

What ChatGPT can be used for is helping to distill jargon-heavy medical writing into language that non-experts can understand. So if there’s a scientific study you’d like to know more about, you can plug that into ChatGPT and get a fairly reliable summary.

Source: “ChatGPT can (almost) pass the US Medical Licensing Exam” — EurekAlert

WTF Fun Fact 13231 – The Downsides of ChatGPT

ChatGPT may seem like the answer students, coders, and writers have been looking for, but it’s far from perfect. In fact, the generative AI can be riddled with misinformation that’s likely to get you caught. And the downsides of ChatGPT are easy to discern since you can just ask the chatbot itself.

What is ChatGPT?

ChatGPT is a chatbot built by OpenAI. Just try it and see for yourself. You can ask it anything.

Of course, that doesn’t mean you’ll get the right answer. Sure, it may be the answer you want to hear. It may be grammatically correct. It may even seem elegantly stated. But look a little closer, and you’ll find it’s relatively simplistic and repetitive. And it can be downright incorrect and misleading.

What are the downsides of ChatGPT?

We went right to the source to find out how the AI might come up short. According to ChatGPT itself (when asked “what is one of the major drawbacks of chatgpt?”):

“One of the major drawbacks of ChatGPT is its lack of context awareness and the possibility of generating biased or incorrect responses.

Since ChatGPT is trained on a large corpus of text data from the internet, it may sometimes generate responses that reflect the biases and inaccuracies present in that data. This can lead to the generation of incorrect information or responses that perpetuate harmful stereotypes or beliefs.

Additionally, ChatGPT operates on a token-by-token basis, meaning that it doesn’t have an overall understanding of the context of a conversation. This can lead to responses that are not fully relevant or coherent, especially when the input is complex or requires a deep understanding of a particular subject.

To mitigate these issues, it is important to carefully evaluate and curate the training data used to train language models like ChatGPT, and to use appropriate post-processing techniques to ensure the quality and accuracy of the generated responses.”

We’re impressed the AI went the extra mile and suggested some more shortcomings!

The future of ChatGPT

Yes, the chatbot is a viral sensation. And it absolutely has the ability to do some good and save some time. But it also has the ability to be used to fill the Internet (and your homework assignments) with all sorts of confident-sounding information that’s factually incorrect. And the kicker is that it may someday be trained on the same incorrect websites it helped create.

Of course, there are some pretty ominous things ChatGPT can be used for. And to be fair, OpenAI has been…well, open about the downsides of ChatGPT. According to Bleeping Computer (cited below), “OpenAI has been upfront about the AI’s shortcomings including its ability to ‘produce harmful instructions or biased content’ and continues to fine-tune ChatGPT.”

Source: “OpenAI’s new ChatGPT bot: 10 dangerous things it’s capable of” — Bleeping Computer