OpenAI Launches 'Reasoning' AI Model Optimized for STEM
        
        
        
OpenAI has launched o1, a new family of AI models that are optimized for "reasoning-heavy" tasks like math, coding and science.
OpenAI o1-preview and its lighter-weight counterpart, OpenAI o1-mini, use "chain of thought" reasoning to answer prompts. As a result, they may take longer to respond, but they are more likely to produce accurate outputs, particularly for complex, multistep problems. "Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes," OpenAI said in a blog post.
Based on reports, "o1" is the public name for "Strawberry," the top-secret AI project that OpenAI has been working on since at least last year, when it was internally labeled "Q-star."
Though the primary o1 model is still in preview, it represents an important step in OpenAI's road to artificial general intelligence (AGI). According to OpenAI's testing, when it exits preview, o1 will significantly outperform GPT-4o and be on par with human experts when asked to solve complex math, chemistry, physics and biology problems:
"OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA)."
o1 also appears better at warding off jailbreak attacks, which are designed to make AI systems violate their own safeguards around security and responsible use. In what OpenAI called one of its "hardest jailbreaking tests," GPT-4o scored 22 (on a 0-100 scale) compared to o1-preview's 84. OpenAI attributed the improvement to its decision to train o1 to incorporate the company's model behavior policies into its chain of reasoning.
"By  teaching the model our safety rules and how to reason about them in context, we  found evidence of reasoning capability directly benefiting model robustness,"  OpenAI said. "We believe that using a chain of thought offers significant  advances for safety and alignment because (1) it enables us to observe the  model thinking in a legible way, and (2) the model reasoning about safety rules  is more robust to out-of-distribution scenarios."
The o1 family does have its shortcomings. o1-preview is not yet feature-complete, lacking multimodal support and Web browsing capabilities. "For many common cases GPT-4o will be more capable in the near term," said OpenAI. Meanwhile, o1-mini is less useful for non-STEM prompts, such as those that require "broad world knowledge."
OpenAI expects  to issue regular updates to improve the models. Meanwhile, it said, "We  believe o1 — and its successors — will unlock many new use cases for AI in  science, coding, math, and related fields."
Both o1-preview and o1-mini are now available to ChatGPT Plus and Team users, while ChatGPT Enterprise and Edu users will get access sometime next week. Non-paying users of ChatGPT will eventually get access to o1-mini, though OpenAI did not provide a timeframe for this.
For more information, visit the OpenAI site.