OpenAI releases o1 model that reasons with a ‘chain of thought’ but is not without its flaws

OpenAI has launched a new series of models that it says “can solve harder problems” than its earlier generative artificial intelligence (GenAI) models.

The California-based company said on Thursday it was releasing an early preview of the series, officially called o1-preview and o1-mini. The series had been code-named Strawberry.

OpenAI said that in its tests the new models performed similarly to PhD students on challenging tasks in physics, chemistry, and biology and did well in maths and coding.

The company said that it tested the model in a qualifying exam for the International Mathematical Olympiad (IMO), a high school math competition.

It had ten hours to solve six challenging algorithmic problems and was allowed 50 submissions per problem.

The o1 model solved 83 per cent of the problems while GPT-4o only solved 13 per cent, according to OpenAI.

What are the drawbacks?

The company notes that the o1 models do not yet have many of ChatGPT's main features, such as browsing the internet for information and uploading files and images. Image-analysing capabilities have also been disabled pending additional testing.

Another drawback is that it is very expensive. The new model is around three times the cost of GPT-4o for input and four times more expensive for output. The o1-preview is $15 (€13.50) per 1 million input tokens and $60 (€54) per 1 million output tokens. Tokens are the chunks of text a model reads and writes, and 1 million tokens is around 750,000 words.
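To put those prices in perspective, here is a small back-of-the-envelope sketch in Python. The per-token rates are simply the o1-preview prices quoted above; the token counts in the example are hypothetical, not real usage figures.

```python
# Back-of-the-envelope cost estimate for a single o1-preview request.
# Prices are the per-million-token rates quoted above; the token counts
# below are hypothetical examples, not real usage figures.

INPUT_PRICE_PER_MILLION = 15.00   # USD per 1 million input tokens
OUTPUT_PRICE_PER_MILLION = 60.00  # USD per 1 million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MILLION + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MILLION

# Example: a 2,000-token prompt with a 10,000-token response
# (the model's hidden reasoning is billed as output tokens).
print(f"${estimate_cost(2_000, 10_000):.4f}")  # -> $0.6300
```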

For the moment the models are not available to free users, but the company said it plans to bring o1-mini to all free ChatGPT users.

OpenAI also said in a technical paper that, according to tester feedback, o1 tends to hallucinate (make things up) more than GPT-4o. It is also less willing to admit when it does not have an answer to a question.

OpenAI co-founder and CEO Sam Altman said in a post on X that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it”.

‘Chain of thought’

OpenAI said that the model works “similar to how a human may think for a long time before responding to a difficult question,” adding that “o1 uses a chain of thought when attempting to solve a problem”.

OpenAI did not show exactly how this “chain of thought” reasoning works, citing competitive advantage as one reason. But it did share “model generated summaries” of the chains of thought.
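For readers who want to see what the preview looks like in practice, the sketch below assumes the official OpenAI Python SDK (the `openai` package) and an API key set in the OPENAI_API_KEY environment variable. It simply sends a question to o1-preview; as described above, the response contains the model's final answer, not the raw chain of thought.

```python
# Minimal sketch of calling the o1-preview model with the OpenAI Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "A train leaves at 14:05 and arrives at 17:50. "
                       "How long is the journey?",
        }
    ],
)

# Only the final answer is returned; the internal chain of thought
# that produced it is not exposed.
print(response.choices[0].message.content)
```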

Working with governments

OpenAI said that to advance its commitments to AI safety, it recently formalised agreements with the US and UK AI Safety Institutes, which included granting the institutes early access to the model prior to public release.

OpenAI did not mention working with European governments.