December 11, 2024

Chinese AI startup DeepSeek’s newest model surpasses OpenAI’s o1 in ‘reasoning’ tasks

Chinese artificial intelligence startup DeepSeek has unveiled a new “reasoning” model that it says compares very favorably with OpenAI’s o1 large language model, which is designed to answer math and science questions with more accuracy than traditional LLMs.

The startup, which is an offshoot of the quantitative hedge fund High-Flyer Capital Management Ltd., revealed on X today that it’s launching a preview of its first reasoning model, DeepSeek-R1.

Reasoning models are different from standard LLMs thanks to their ability to “fact-check” their responses. To do this, they typically spend a much longer time considering how they should respond to a prompt, allowing them to sidestep problems such as “hallucinations,” which are common with chatbots like ChatGPT.

When OpenAI released the o1 model in September, it said the model is much better at dealing with queries and questions that require reasoning skills. That’s because it relies on a machine learning technique known as “chain of thought” or CoT, which allows it to break down complex tasks into smaller steps and carry them out one by one, improving its accuracy.

DeepSeek-R1 works in a similar way: When presented with a complex problem, it plans ahead and works through the steps one after another so it can respond accurately. The process can take a while, though. Like o1, it might need to “think” for up to 10 seconds before it generates a response to a question.

The model’s thought process is entirely transparent too, allowing users to follow it as it tackles the individual steps required to arrive at an answer.

The startup says DeepSeek-R1 bests the capabilities of o1 on two key benchmarks, AIME and MATH. The former is based on the American Invitational Mathematics Examination, a competition-level math exam, while the latter is a collection of challenging math word problems. In addition, the model correctly answered a number of “trick” questions that have tripped up existing models such as GPT-4o and Anthropic PBC’s Claude, VentureBeat reported.

However, DeepSeek-R1 does suffer from a number of issues, with some commenters on X saying that it appears to struggle with logic problems such as Tic-Tac-Toe. That said, o1 also struggled with the same kinds of problems.

Users also reported that DeepSeek doesn’t respond to queries that the Chinese government likely deems to be too sensitive. When asked about topics such as the Tiananmen Square massacre, Chinese President Xi Jinping’s relations with Donald Trump, and the possibility of China invading Taiwan, it consistently replied that it’s “not sure how to approach this type of question.”

DeepSeek’s rejection of politically sensitive queries likely stems from the need for Chinese developers to ensure their models “embody core socialist values.”

That said, some users also revealed that it’s quite easy to jailbreak DeepSeek by prompting it in a way that makes it ignore its guardrails. For example, one user found a way to get it to provide a detailed recipe and instructions for creating methamphetamine, which is, of course, highly illegal in most countries.

DeepSeek is a rather unusual AI startup thanks to its backing by a quantitative hedge fund that aims to use LLMs to enhance its trading strategies. It’s not new on the AI scene, having previously released an LLM called DeepSeek-V2 for general-purpose text and image generation and analysis. It was founded by a computer science graduate called Liang Wenfeng, and has the stated aim of achieving “superintelligent” AI.

DeepSeek-R1 can be accessed via the DeepSeek Chat application on the company’s website. Although it’s free to use, nonpaying users are limited to just 50 messages per day. The company is also planning to make DeepSeek-R1 available through an application programming interface.

Image: SiliconANGLE/Freepik AI
