
AI Inaccuracy Strikes Again: ChatGPT Competitor Claude 2 Flunks Scientific Accuracy Test Like Other LLMs

On Tuesday, Anthropic released Claude 2, the latest version of its Claude large language model and chatbot, just five months after the original Claude launched.

Widely regarded as a formidable competitor to OpenAI’s ChatGPT, Claude 2 offers a free beta chat experience and brings improvements in coding, mathematics, and reasoning.

It can also generate longer responses and can be accessed via an API. According to Anthropic, the chatbot scores 76% on the bar exam, ranks in the 90th percentile on the GRE writing exam, and can produce documents that run to thousands of tokens. Currently, Claude 2 is only available to users in the US and UK.
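For developers, that API access looks much like earlier Claude models. The snippet below is a minimal sketch of a Claude 2 call using Anthropic’s Python SDK; the prompt text, token cap, and the assumption that an `ANTHROPIC_API_KEY` environment variable is set are illustrative rather than taken from the article.

```python
# Minimal sketch: calling Claude 2 through Anthropic's Python SDK.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY environment variable.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=300,  # illustrative cap on the length of the reply
    prompt=f"{HUMAN_PROMPT} Summarize the main improvements in Claude 2.{AI_PROMPT}",
)

print(completion.completion)
```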

Unlike ChatGPT, which only generates responses to text prompts, Claude 2 has a native file upload feature that lets users upload files such as PDF, TXT, and CSV documents, extract and summarize text from PDFs, and present the information in table format. Users can also feed the chatbot a web link, and Claude 2 will summarize the content behind the link.
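In the chat interface this happens natively, but a similar workflow can be approximated programmatically. The sketch below extracts text from a PDF locally and builds a prompt asking for a table-style summary; the `pypdf` library, the file name, and the prompt wording are assumptions for illustration, not part of Claude 2’s own file handling.

```python
# Illustrative sketch: extract text from a PDF locally, then ask Claude 2
# to summarize it as a table. The file name and prompt wording are placeholders.
from pypdf import PdfReader

reader = PdfReader("quarterly_report.pdf")
document_text = "\n".join(page.extract_text() or "" for page in reader.pages)

prompt_body = (
    "Summarize the key figures in the following document "
    "and present them as a table:\n\n" + document_text
)
# prompt_body can then be sent to Claude 2 exactly as in the API sketch above.
```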

With Claude 2, users can input up to 100,000 tokens (roughly 75,000 words) per prompt, a significant increase from the previous 9,000-token limit. This means the chatbot can now process vast volumes of technical documentation, and even entire books. In contrast, OpenAI’s standard GPT-4 model offers a context limit of 8,000 tokens, with a separate extended model accommodating up to 32,000 tokens for specific use cases.
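For a rough feel of what that window means in practice, the sketch below uses the article’s own approximation of 100,000 tokens to 75,000 words (about 1.33 tokens per word) to estimate whether a document fits in a single prompt; actual counts depend on the tokenizer, and the file name is a placeholder.

```python
# Rough estimate of whether a document fits Claude 2's 100,000-token window,
# using the article's ratio of 100,000 tokens to 75,000 words (~1.33 tokens/word).
# Real token counts depend on the model's tokenizer; this is only a heuristic.
CONTEXT_LIMIT_TOKENS = 100_000
TOKENS_PER_WORD = 100_000 / 75_000  # ~1.33

def estimated_tokens(text: str) -> int:
    return int(len(text.split()) * TOKENS_PER_WORD)

def fits_in_one_prompt(text: str, reply_headroom: int = 5_000) -> bool:
    # Leave headroom so the model still has room to generate its answer.
    return estimated_tokens(text) <= CONTEXT_LIMIT_TOKENS - reply_headroom

with open("technical_manual.txt", encoding="utf-8") as f:
    document = f.read()

print("Estimated tokens:", estimated_tokens(document))
print("Fits in one prompt:", fits_in_one_prompt(document))
```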

Sully Omar, co-founder of the AI agent platform Cognosys.ai, said that Claude 2 is “cheaper and quicker than GPT4,” albeit with a slight lag in output performance.

However, Claude 2 only supports a handful of widely spoken languages, including English, Spanish, Portuguese, French, Mandarin, and German, while ChatGPT supports over 80 languages.

With all the improvements made to Claude 2, expectations for better accuracy were high. Alexandro Marinos, the founder of the container-based tech platform Balena, took it upon himself to put Claude 2 to the test.

Marinos asked Claude 2 a standard question he devised specifically for evaluating the accuracy of large language models (LLMs). The question was: “Does natural immunity to Covid-19 from a previous infection provide better protection compared to vaccination for someone who has not been infected?”

To Marinos’ disappointment, Claude 2 generated talking points and information dating back to 2021 that were “knowably false,” and even included debunked content from 2020.

Claude 2’s performance echoed that of other LLMs Marinos had evaluated before, such as Bard, ChatGPT-4, GPT-4 (API), and StableVicuna. When a Twitter user questioned the tendency of LLMs to simply “regurgitate the talking points they are fed with,” Marinos responded, “With more recent data the answers tend to be better in general.”

However, the test demonstrated that Claude 2, like other LLMs, is not consistently supplied with the latest information, highlighting the persistent issue of accuracy within LLMs as a whole.

Source: mPost
