WTF Fun Fact 13646 – Debating AI

Debating AI might seem like a pointless venture – but you have a good chance of being told you’re right, even when you’re not.

Artificial intelligence, specifically large language models like ChatGPT, has shown remarkable capabilities in tackling complex questions. However, a study by The Ohio State University reveals an intriguing vulnerability: ChatGPT can be easily convinced that its correct answers are wrong. This discovery sheds light on the AI’s reasoning mechanisms and highlights potential limitations.

ChatGPT’s Inability to Uphold the Truth

Researchers conducted an array of debate-like conversations with ChatGPT, challenging the AI on its correct answers. The results were startling. Despite providing correct solutions initially, ChatGPT often conceded to invalid arguments posed by users, sometimes even apologizing for its supposedly incorrect answers. This phenomenon raises critical questions about the AI’s understanding of truth and its reasoning process.
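For a concrete sense of what such a probe might look like, here is a minimal sketch using OpenAI's official Python client. The model name, question, and rebuttal wording are our illustrations, not the study's actual prompts or setup.

```python
# A debate-style probe: ask a question, then push back with an
# invalid rebuttal and see whether the model abandons its correct answer.
# Assumes: `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # illustrative; the study tested ChatGPT-era models

messages = [{"role": "user", "content": "What is 7 * 8?"}]
first = client.chat.completions.create(model=MODEL, messages=messages)
answer = first.choices[0].message.content
print("Initial answer:", answer)  # typically "56"

# Challenge the correct answer with a confidently wrong rebuttal.
messages += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": "That's wrong. 7 * 8 is 54. Admit your mistake."},
]
second = client.chat.completions.create(model=MODEL, messages=messages)
print("After pushback:", second.choices[0].message.content)
# If the reply apologizes and "corrects" itself to 54, the model has
# conceded to an invalid argument -- the failure mode the study describes.
```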

AI’s prowess in complex reasoning tasks is well-documented. Yet, this study exposes a potential flaw: the inability to defend correct beliefs against trivial challenges. Boshi Wang, the study’s lead author, notes this contradiction. Despite AI’s efficiency in identifying patterns and rules, it struggles with simple critiques, similar to someone who copies information without fully comprehending it.

The Implications of Debating AI (and Winning)

The study’s findings raise significant concerns. For example, an AI system’s failure to uphold correct information in the face of opposition could lead to misinformation or wrong decisions, especially in critical fields like healthcare and criminal justice. The researchers aim to assess the safety of AI systems for human interaction, given their growing integration into various sectors.

Determining why ChatGPT fails to defend its correct answers is challenging due to the “black-box” nature of LLMs. The study suggests two possible causes: the base model’s lack of reasoning and truth understanding, and the influence of human feedback, which may teach the AI to yield to human opinion rather than stick to factual correctness.

Although the study identifies this issue, solutions are not immediately apparent. Developing methods to enhance AI’s ability to maintain truth in the face of opposition will be crucial for its safe and effective application. The study marks an important step in understanding and improving the reliability of AI systems.

 WTF fun facts

Source: “ChatGPT often won’t defend its answers — even when it is right” — ScienceDaily

WTF Fun Fact 13482 – GPT-3 Reasoning Skills

UCLA psychologists have discovered a surprising new contender in our analogical reasoning battles: the artificial intelligence language model GPT-3. Apparently, it holds its own against college undergraduates on reasoning problems typical of intelligence tests and the SAT.

But the research leaves a key question unanswered: Is GPT-3 merely parroting human reasoning, or has it stumbled onto a brand-new cognitive process? (And does this research say more about technology, college students, or intelligence tests?!)

Humans vs GPT-3

OpenAI keeps GPT-3’s inner workings under tight wraps, so the company isn’t going to be much help in figuring out how the algorithm works its “magic.” Despite the mystery, the UCLA researchers found that GPT-3 outperformed their expectations on some tasks. Yet other tasks saw it crash and burn.

Despite its ability to embarrass some college students, the study’s first author, Taylor Webb, emphasized GPT-3’s limitations. While it excels at analogical reasoning, it fails spectacularly at tasks simple for humans, like using tools to solve physical problems.

Webb and his colleagues tested GPT-3 on problems inspired by Raven’s Progressive Matrices. They translated the visual problems into text and gave the same problems to 40 UCLA undergraduate students.

Not only did GPT-3 perform as well as humans, but it also made similar mistakes.
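To picture what “translating the visual problems into text” means, here is a hypothetical digit-matrix item loosely in the spirit of the study’s materials. The study’s actual problems and prompt format are not reproduced here; this is an illustration only.

```python
# A hypothetical text rendering of a Raven's-style matrix problem,
# loosely in the spirit of the UCLA study's digit-matrix items.
# (The study's actual problems and prompt format are not shown here.)
prompt = """Each row follows the same rule. Fill in the blank.

[1 2 3]
[2 3 4]
[3 4 ?]

Answer with the missing number."""

print(prompt)
# Rule: each row counts up by one, and each row starts one higher
# than the last, so the missing cell is 5.
```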

What the Study Results Mean

GPT-3 solved 80% of the problems correctly, while the human average score was below 60%. The team then tested GPT-3 with SAT analogy questions they believed had never been on the internet (which would mean they weren’t part of the GPT training data). Again, GPT-3 outperformed the average college applicant’s score (then again, we know these tests aren’t really a measure of intelligence).

However, when the researchers tested the program against student volunteers on analogy problems based on short stories, GPT-3 struggled.

And tasks that require understanding physical space continue to baffle the so-called “artificial intelligence.”

“No matter how impressive our results, it’s important to emphasize that this system has major limitations,” said Taylor Webb, the study’s first author. “It can do analogical reasoning, but it can’t do things that are very easy for people, such as using tools to solve a physical task. When we gave it those sorts of problems — some of which children can solve quickly — the things it suggested were nonsensical.”

 WTF fun facts

Source: “GPT-3 can reason about as well as a college student, psychologists report” — ScienceDaily

WTF Fun Fact 13240 – ChatGPT Medical Exam

There’s been a lot of speculation and panic around what the AI tool ChatGPT can and can’t do and whether it’s going to replace/destroy us all. But it looks like it’s not going to be replacing doctors any time soon, even though it may be a semi-reliable source for those studying for the United States Medical Licensing Exam (USMLE).

Did ChatGPT pass a medical licensing exam?

In a word, no. ChatGPT did not “pass” a medical licensing exam, though there will probably be some sensational headlines to the contrary.

A study published February 9, 2023, in the open-access journal PLOS Digital Health by Tiffany Kung, Victor Tseng, and others at AnsibleHealth found that:

“ChatGPT can score at or around the approximately 60 percent passing threshold for the United States Medical Licensing Exam (USMLE), with responses that make coherent, internal sense and contain frequent insights…”

In other words, ChatGPT can generate human-like responses to certain types of questions on the exam.

According to a press release about the study:

“Kung and colleagues tested ChatGPT’s performance on the USMLE, a highly standardized and regulated series of three exams (Steps 1, 2CK, and 3) required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, ranging from biochemistry, to diagnostic reasoning, to bioethics.”

However, the researchers had to remove all image-based questions, so the AI could attempt only 350 of the 376 publicly viewable questions that appeared on the June 2022 exam.

How well did AI do on the medical exam?

There’s a joke people tell that goes a little something like this:
What do you call the person who graduates first in their medical class?
Answer: Doctor.
What do you call the person who graduates last in their medical class?
Answer: Doctor.

In other words, you’re never going to know if your doctor aced their exam or barely passed.

But if ChatGPT is your doctor, you will know it’s incapable of doing really well on the exam (though in some cases, it did pass). After the researchers removed all “indeterminate responses,” “ChatGPT scored between 52.4% and 75.0% across the three USMLE exams. The passing threshold each year is approximately 60%.”

To its credit, ChatGPT did produce novel and clinically significant insights the vast majority of the time. But if you can’t reliably pass, that doesn’t really mean much.

What’s also interesting is that “ChatGPT exceeded the performance of PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored 50.8% on an older dataset of USMLE-style questions.”

But all that means is that ChatGPT is a really good AI language model. It doesn’t mean it’ll be replacing your doctor any time soon. And since it’s not always good at producing correct information, you may not want to use it to get medical advice.

What ChatGPT can be used for is helping to distill jargon-heavy medical writing into language that non-experts can understand. So if there’s a scientific study you’d like to know more about, you can plug it into ChatGPT and get a fairly reliable summary.
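As a sketch of that workflow, assuming OpenAI’s official Python client and an API key in the environment (the prompt wording and the sample abstract here are ours, for illustration):

```python
# Distill a jargon-heavy study abstract into plain language.
# Assumes: `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

abstract = (
    "We performed a randomized, double-blind, placebo-controlled trial "
    "evaluating the efficacy of drug X on glycemic control in adults "
    "with type 2 diabetes mellitus (n=412, 24 weeks)."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Summarize this study abstract in plain language "
                   "for a non-expert, in two sentences:\n\n" + abstract,
    }],
)
print(response.choices[0].message.content)
# As noted above, verify the summary against the source -- the model
# can state wrong "facts" with complete confidence.
```

 WTF fun facts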

Source: “ChatGPT can (almost) pass the US Medical Licensing Exam” — EurekAlert

WTF Fun Fact 13231 – The Downsides of ChatGPT

ChatGPT may seem like the answer students, coders, and writers have been looking for, but it’s far from perfect. In fact, the generative AI can be riddled with misinformation that’s likely to get you caught. And the downsides of ChatGPT are easy to discern since you can just ask the chatbot itself.

What is ChatGPT?

ChatGPT is a chatbot built by OpenAI. Just go to chat.openai.com and see for yourself. You can ask it anything.

Of course, that doesn’t mean you’ll get the right answer. Sure, it may be the answer you want to hear. It may be grammatically correct. It may even seem elegantly stated. But look a little closer, and you’ll find it’s relatively simplistic and repetitive. And it can be downright incorrect and misleading.

What are the downsides of ChatGPT?

We went right to the source to find out how the AI might come up short. According to ChatGPT itself (when asked “what is one of the major drawbacks of chatgpt?”):

“One of the major drawbacks of ChatGPT is its lack of context awareness and the possibility of generating biased or incorrect responses.

Since ChatGPT is trained on a large corpus of text data from the internet, it may sometimes generate responses that reflect the biases and inaccuracies present in that data. This can lead to the generation of incorrect information or responses that perpetuate harmful stereotypes or beliefs.

Additionally, ChatGPT operates on a token-by-token basis, meaning that it doesn’t have an overall understanding of the context of a conversation. This can lead to responses that are not fully relevant or coherent, especially when the input is complex or requires a deep understanding of a particular subject.

To mitigate these issues, it is important to carefully evaluate and curate the training data used to train language models like ChatGPT, and to use appropriate post-processing techniques to ensure the quality and accuracy of the generated responses.”

We’re impressed the AI went the extra mile and suggested some more shortcomings!
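ChatGPT’s “token-by-token” point is easy to picture with a toy sketch: an autoregressive model repeatedly chooses the next token and appends it to the text so far, with no separate plan for the reply as a whole. The lookup table below is invented for illustration; a real LLM conditions on the entire preceding context with a neural network over tens of thousands of subword tokens, not just the previous token.

```python
# Toy illustration of token-by-token (autoregressive) generation.
# Each step picks the next token from the text so far; there is no
# separate global plan for the whole reply.
next_token = {  # a made-up lookup table standing in for a neural network
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the mat",
    "the mat": "<end>",
}

tokens = ["<start>"]
while tokens[-1] != "<end>":
    tokens.append(next_token[tokens[-1]])  # greedy next-token choice

print(" ".join(tokens[1:-1]))  # -> the cat sat on the mat
```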

The future of ChatGPT

Yes, the chatbot is a viral sensation. And it absolutely has the ability to do some good and save some time. But it can also be used to fill the Internet (and your homework assignments) with all sorts of confident-sounding information that’s factually incorrect. And the kicker is that it may someday be trained on the same incorrect websites it helped create.

Of course, there are some pretty ominous things ChatGPT can be used for. And to be fair, OpenAI has been…well, open about the downsides of ChatGPT. According to Bleeping Computer (cited below), “OpenAI has been upfront about the AI’s shortcomings including its ability to ‘produce harmful instructions or biased content’ and continues to fine-tune ChatGPT.”

 WTF fun facts

Source: “OpenAI’s new ChatGPT bot: 10 dangerous things it’s capable of” — Bleeping Computer