OpenAI Launches 'Reasoning' AI Model Optimized for STEM

OpenAI has launched o1, a new family of AI models that are optimized for "reasoning-heavy" tasks like math, coding and science.

OpenAI o1-preview and its lighter-weight counterpart, OpenAI o1-mini, use "chain of thought" reasoning to answer prompts. As a result, they may take longer to solve problems, but they are more likely to provide accurate outputs, particularly in response to complex, multistep problems. "Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes," OpenAI said in a blog post.

According to reports, "o1" is the public name for "Strawberry," the top-secret AI project that OpenAI has been working on since at least last year, when it was internally labeled "Q-star."

Though the primary o1 model is still in preview, it represents an important step on OpenAI's road to artificial general intelligence (AGI). According to OpenAI's testing, when it exits preview, o1 will significantly outperform GPT-4o and be on par with human experts when asked to solve complex math, chemistry, physics and biology problems:

"OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA)."

o1 also appears better at warding off jailbreak attacks, which are designed to make AI systems violate their own safeguards around security and responsible use. In what OpenAI called one of its "hardest jailbreaking tests," GPT-4o scored 22 (on a 0-100 scale) compared to o1-preview's 84. OpenAI attributed the improvement to its decision to train o1 to incorporate the company's model behavior policies into its chain of reasoning.

"By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness," OpenAI said. "We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model reasoning about safety rules is more robust to out-of-distribution scenarios."

The o1 family does have its shortcomings. o1-preview is not yet feature-complete, lacking multimodal support and web browsing capabilities. "For many common cases GPT-4o will be more capable in the near term," said OpenAI. Meanwhile, o1-mini is less useful for non-STEM prompts, such as those that require "broad world knowledge."

OpenAI expects to issue regular updates to improve the models. Meanwhile, it said, "We believe o1 — and its successors — will unlock many new use cases for AI in science, coding, math, and related fields."

Both o1-preview and o1-mini are now available to ChatGPT Plus and Team users, while ChatGPT Enterprise and Edu users will get access sometime next week. Non-paying users of ChatGPT will eventually get access to o1-mini, though OpenAI did not provide a timeframe for this.

For more information, visit the OpenAI site.

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.
