

OpenAI Introduces Its Latest Reasoning Models, o3 and o4-mini: Performance Jumps and Paradigm Innovation

On April 17, 2025, OpenAI officially released its new reasoning models, o3 (the full version) and o4-mini, in a late-night livestream. They replace older models such as o1 and o3-mini. The update brings significant improvements in knowledge-based reasoning, multimodal processing, and coding, and adjusts pricing to give developers and users a more cost-efficient AI experience.

 

I. Model Overview: A Comprehensive Upgrade from Parameters to Positioning

OpenAI's o3 and o4-mini are based on a new architecture and focus on different scenarios:
  • o3: The flagship, full version. It focuses on advanced reasoning and tool use, supports full tool access (e.g., Python, web browsing, and function calls), and for the first time integrates visual reasoning into the chain of thought, which makes it suited to complex problem solving.
  • o4-mini: A lightweight, high-performance model focused on fast, high-level reasoning and code/vision tasks, with an outstanding price/performance ratio.

II. Performance Comparison: Outperforming the Older Models Across the Board

1. Knowledge-based reasoning: a tool-enabled accuracy spike

In math competitions, science problems, and cross-disciplinary tests, o3 and o4-mini clearly outperform the older models, especially when they are allowed to call tools:
Where two values are given for a model, the first is without tools and the second is with Python.

Dataset / task                                o1      o3-mini   o3            o4-mini
AIME 2024 math competition (accuracy %)       74.3    87.3      91.6 / 95.2   93.4 / 98.7
Codeforces coding contest (Elo)               1891    2073      2719          2073
GPQA Diamond science questions (accuracy %)   78      77        83.3          81.4
Humanity's Last Exam (accuracy %)             13.4    20.3      20.3 / 24.9   14.28 / 17.7
Key Findings:
  • With Python enabled, o3's AIME accuracy rose from 91.6% to 95.2%, and its Humanity's Last Exam accuracy rose from 20.3% to 24.9%.
  • Although o4-mini is a lightweight model, it reaches 93.4% on AIME without tools, close to o3 with tools, so its price/performance ratio is outstanding.
  • o4-mini-high solved the latest Project Euler problem in 2 minutes and 55 seconds. The problem is not a simple one: only 15 people solved it within 30 minutes. It was published only a few days earlier, so it is unlikely to appear in o4-mini's training set, which suggests that o4-mini-high relied on "thinking" to solve it.
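The gains from tool use quoted above are simple differences between the "without tools" and "with Python" columns. The sketch below recomputes them in percentage points, using only the figures from the table; the dictionary layout and the function name `tool_gain` are illustrative choices, not part of any benchmark tooling.

```python
# Accuracy (%) figures copied from the table above: (without tools, with Python).
scores = {
    ("AIME 2024", "o3"): (91.6, 95.2),
    ("AIME 2024", "o4-mini"): (93.4, 98.7),
    ("Humanity's Last Exam", "o3"): (20.3, 24.9),
    ("Humanity's Last Exam", "o4-mini"): (14.28, 17.7),
}


def tool_gain(no_tools: float, with_python: float) -> float:
    """Absolute gain in percentage points from enabling Python."""
    # Round to two decimals to hide floating-point noise in the subtraction.
    return round(with_python - no_tools, 2)


gains = {key: tool_gain(*pair) for key, pair in scores.items()}

for (benchmark, model), gain in gains.items():
    print(f"{benchmark:<22} {model:<8} +{gain:.2f} points")
```

Note that these are absolute differences in accuracy, not relative improvements.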

 

2. Multimodal Visual Reasoning: From "Image Recognition" to "Image Thinking"

For the first time, o3 and o4-mini support the integration of visual reasoning into the chain of thought, far surpassing older models in complex image understanding tasks:
Dataset                            Task (accuracy %)                             o1      o3      o4-mini
MMMU (college-level visual math)   Combined formula and figure problem solving   77.6    82.9    81.6
MathVista (visual math)            Geometry and function-graph reasoning         71.8    87.5    84.3
CharXiv-Reasoning                  Scientific chart comprehension                55.1    75.4    72
Significance of the breakthrough: o3 can "look at a picture and think" as humans do, a shift from "pixel processing" to "scene reasoning". In one example, a user took a photo on the way to work and asked o3 to identify the location. The model first cropped and zoomed in on parts of the image, analyzed the key details, then searched related web pages to narrow down the possibilities step by step, and finally gave a specific location.

 

3. Code and engineering capabilities: o3 is the first choice for developers

In software engineering tasks, o3 leads thanks to its tool access and code comprehension, while o4-mini offers a balanced option for lightweight scenarios:
Task                              What it measures                    o1-high   o3-mini   o3-high   o4-mini-high
SWE-bench Verified (accuracy %)   Algorithms / system design          48.9      69.1      69.1      68.1
Aider code editing, whole (%)     Multilingual whole-file rewriting   66.7      81.3      81.3      64.4
SWE-Lancer earnings ($)           Freelance assignments               118,000   177,000   236,000   -
Practical value: o3 earned $236,000 on SWE-Lancer's real-world freelance coding tasks, far outpacing the older models, which makes it a strong core tool for enterprise-level code development; o4-mini is suited to rapid prototyping and lightweight code debugging.

4. Tool Use and Execution: o3 as a New Paradigm for Building Agents

o3 shows stronger task coherence in tool-collaboration scenarios such as multi-turn instruction following, browser operation, and function calling:
Task                   What it measures (accuracy %)               o1-high   o3-mini   o3 (with tools)   o4-mini (with tools)
Scale MultiChallenge   Multi-turn instruction following            28.3      44.93     56.51             42.99
BrowseComp             Browser operation / information retrieval   32.4      50.0      70.8              52.0
Tau-bench              Function calling / structured output        49.7      51.5      57.6 (retail)     65.6 (retail)
Key Benefits: By autonomously operating a virtual browser and calling APIs to generate structured outputs (such as a flight-booking JSON), o3 is capable of commercial-grade automation of complex processes.
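To make "structured output" concrete, here is a minimal sketch of what a function-calling round trip might look like on the application side. It makes no API call: the tool name `book_flight`, its fields, and the sample `model_output` string are all hypothetical, and the checker is a simplified stand-in for a full JSON Schema validator.

```python
import json

# Hypothetical tool definition in a JSON-Schema style. The name "book_flight"
# and its fields are illustrative assumptions, not taken from any real API.
BOOK_FLIGHT_TOOL = {
    "name": "book_flight",
    "description": "Book a one-way flight for a passenger.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string"},
            "passengers": {"type": "integer"},
        },
        "required": ["origin", "destination", "date", "passengers"],
    },
}

# Mapping from the schema type names used above to Python types.
PYTHON_TYPES = {"string": str, "integer": int}


def validate_arguments(raw_json: str, tool: dict) -> dict:
    """Parse model-produced JSON and check it against the tool's parameter spec."""
    args = json.loads(raw_json)
    spec = tool["parameters"]
    missing = [name for name in spec["required"] if name not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for name, value in args.items():
        if name not in spec["properties"]:
            raise ValueError(f"unexpected field: {name}")
        expected = PYTHON_TYPES[spec["properties"][name]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {name!r} should be of type {expected.__name__}")
    return args


# Stand-in for a model response; a real one would come from the API.
model_output = '{"origin": "SFO", "destination": "NRT", "date": "2025-05-01", "passengers": 2}'
booking = validate_arguments(model_output, BOOK_FLIGHT_TOOL)
print(booking["destination"], booking["passengers"])
```

Validating the arguments before acting on them is a common safeguard, since the calling application, not the model, is what actually executes the booking.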

 

III. Parameters and Pricing: Full Optimization of Price/Performance Ratio

Model     Reasoning ability   Speed     Price per 1M tokens (input / output)   Supported inputs   Context window (tokens)
o1        Basic               Slowest   $15 / $60                              Text / image       200,000
o3-mini   High                Medium    $1.1 / $4.4                            Text               200,000
o4-mini   High                Medium    $1.1 / $4.4                            Text / image       200,000
o3        Strongest           Slowest   $10 / $40                              Text / image       200,000
o1-pro    Specialized         Slowest   $150 / $600                            Text / image       200,000
Core adjustments: o3 is priced one-third lower than o1, a much better price/performance ratio; o4-mini costs the same as o3-mini but adds image input and stronger reasoning.
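As a rough illustration of what these prices mean per request, the sketch below estimates costs from the table, treating the listed prices as USD per one million tokens, which is how OpenAI quotes them. The workload of 20,000 input and 5,000 output tokens is a made-up example, and `request_cost` is a hypothetical helper, not part of any SDK.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table above.
PRICES = {
    "o1": (15.0, 60.0),
    "o3-mini": (1.1, 4.4),
    "o4-mini": (1.1, 4.4),
    "o3": (10.0, 40.0),
    "o1-pro": (150.0, 600.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD of one request for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 20,000 input tokens and 5,000 output tokens.
for model in PRICES:
    print(f"{model:<8} ${request_cost(model, 20_000, 5_000):.4f}")

# Relative saving of o3 over o1 on input price (the output ratio is the same).
saving = 1 - PRICES["o3"][0] / PRICES["o1"][0]
print(f"o3 is {saving:.0%} cheaper than o1")
```

Because input and output tokens are priced differently, the actual saving on a bill depends on the mix of the two in a given workload, although for o3 versus o1 both rates drop by the same fraction.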
