OpenAI Launches Its Latest Reasoning Models o3 and o4-mini: A Performance Leap and a Paradigm Shift

On April 17, 2025, OpenAI officially released its new reasoning models, o3 (the full version) and o4-mini, in a late-night live broadcast. They replace older models such as o1 and o3-mini. The update brings significant improvements in knowledge-based reasoning, multimodal processing, and coding, and adjusts pricing to give developers and users a more cost-efficient experience.

I. Model overview: a comprehensive upgrade from parameters to positioning

OpenAI's o3 and o4-mini are based on a new architecture and target different scenarios:

o3: the full flagship model. It focuses on advanced reasoning and tool use, supports the full set of tools (for example Python, web browsing, and function calls), and for the first time integrates visual reasoning into the chain of thought, which makes it suited to complex problem solving.
o4-mini: a lightweight, high-performance model. It focuses on fast high-level reasoning and on code and vision tasks, and offers an outstanding price/performance ratio while remaining efficient.

II. Performance comparison: ahead of the older models on every dimension

1. Knowledge-based reasoning: tools push accuracy higher

In math competitions, science problems, and cross-disciplinary tests, o3 and o4-mini clearly outperform the older models, especially when they are allowed to call tools:

Dataset / task	o1	o3-mini	o3 (no tools)	o3 (with Python)	o4-mini (no tools)	o4-mini (with Python)
AIME 2024 math competition (accuracy %)	74.3	87.3	91.6	95.2	93.4	98.7
Codeforces coding contest (Elo)	1891	2073	–	2719	–	2073
GPQA Diamond science questions (accuracy %)	78	77	83.3	–	81.4	–
Humanity's Last Exam (accuracy %)	13.4	20.3	20.3	24.9	14.28	17.7

Key Findings:

With Python available, o3's AIME accuracy rises from 91.6% to 95.2%, and its Humanity's Last Exam accuracy rises from 20.3% to 24.9% thanks to the toolchain.
Although o4-mini is a lightweight model, it reaches 93.4% on AIME without tools, close to o3 with Python (95.2%), so its price/performance ratio is outstanding. o4-mini-high also solved the latest Project Euler problem in 2 minutes and 55 seconds. The problem is not a simple one: only 15 people solved it within 30 minutes. It was also published only a few days earlier, so it is unlikely to appear in the model's training set, which suggests that o4-mini-high relied on "thinking" to solve it.
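
The size of the tool effect can be checked with a little arithmetic. The sketch below copies the no-tools and with-Python scores from the table above and reports the gain both in percentage points and as a relative change; the names `scores` and `tool_gain` are illustrative only.

```python
# Accuracy (%) copied from the table above: (no tools, with Python).
scores = {
    ("o3", "AIME 2024"): (91.6, 95.2),
    ("o4-mini", "AIME 2024"): (93.4, 98.7),
    ("o3", "Humanity's Last Exam"): (20.3, 24.9),
    ("o4-mini", "Humanity's Last Exam"): (14.28, 17.7),
}


def tool_gain(no_tools: float, with_python: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in %)."""
    absolute = with_python - no_tools
    relative = absolute / no_tools * 100
    return round(absolute, 2), round(relative, 1)


gains = {key: tool_gain(*pair) for key, pair in scores.items()}

for (model, task), (points, relative) in gains.items():
    print(f"{model:8s} {task:22s} +{points} points ({relative}% relative)")
```

On these figures the absolute gain is only a few points in every case, but on Humanity's Last Exam, where the baseline is low, the same few points amount to a relative improvement of more than a fifth.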

2. Multimodal Visual Reasoning: From "Image Recognition" to "Image Thinking"

For the first time, o3 and o4-mini support the integration of visual reasoning into the chain of thought, far surpassing older models in complex image understanding tasks:

Dataset	Task description	o1	o3	o4-mini
MMMU (college-level multimodal understanding)	Combined formula and figure problem solving (accuracy %)	77.6	82.9	81.6
MathVista (visual math)	Geometry / function-graph reasoning (accuracy %)	71.8	87.5	84.3
CharXiv-Reasoning	Scientific chart comprehension (accuracy %)	55.1	75.4	72.0

Significance of the breakthrough: o3 can "look at a picture and think" the way a person does, a paradigm upgrade from "pixel processing" to "scene reasoning". In one example, a user casually took a photo on the way to work and asked o3 to work out where it was taken. The model first cropped and enlarged parts of the picture, analyzed the key information in it, then searched related web pages to narrow down the candidates step by step, and finally gave a specific location.

3. Code and engineering capabilities: o3 is the first choice for developers

In software engineering tasks, o3 leads with tool access and code comprehension, while o4-mini is balanced in lightweight scenarios:

Coding task	Metric	o1-high	o3-mini	o3-high	o4-mini-high
SWE-bench Verified (accuracy %)	Algorithms / system design	48.9	69.1	69.1	68.1
Aider code editing (whole)	Overall multilingual rewrite (%)	66.7	81.3	81.3	64.4
SWE-Lancer task earnings	Freelance assignments ($)	118,000	177,000	236,000	–

Practical value: o3 earns $236,000 on SWE-Lancer's real-world freelance coding tasks, far ahead of the older models, which makes it a core tool for enterprise-level code development; o4-mini is suited to rapid prototyping and lightweight code debugging.

4. Tool use and execution: o3 as a new paradigm for building agents

o3 shows stronger task coherence in tool-collaboration scenarios such as multi-turn instruction following, browser operation, and function calls:

Tool task	Metric	o1-high	o3-mini	o3 (with tools)	o4-mini (with tools)
Scale MultiChallenge	Multi-turn instruction following (accuracy %)	28.3	44.93	56.51	42.99
BrowseComp browser operation	Information retrieval (accuracy %)	32.4	50.0	70.8	52.0
Tau-bench function calls	Structured output (accuracy %)	49.7	51.5	57.6 (Retail)	65.6 (Retail)

Key benefits: o3 can autonomously operate a virtual browser and call APIs to produce structured output, such as a flight-booking JSON object, which gives it commercial-grade capability for automating complex processes.
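
To make "structured output such as flight booking JSON" concrete, here is a minimal sketch of how an application might describe a tool to a model and check the arguments that come back. It makes no API call. The tool name `book_flight`, its fields, and the `sample` string are hypothetical, and the validator covers only the few checks shown; it is not a full JSON Schema implementation.

```python
import json

# Hypothetical tool definition in the JSON Schema style commonly used for
# function calling. The name and fields are illustrative assumptions.
BOOK_FLIGHT_TOOL = {
    "name": "book_flight",
    "description": "Book a one-way flight for a passenger.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
            "date": {"type": "string"},
            "passengers": {"type": "integer"},
        },
        "required": ["origin", "destination", "date"],
    },
}

_JSON_TYPES = {"string": str, "integer": int}


def validate_arguments(raw_arguments: str, tool: dict) -> dict:
    """Parse a JSON argument string and check it against the tool schema.

    Checks required keys, unknown keys, and primitive types only.
    """
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    schema = tool["parameters"]
    missing = [key for key in schema["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, value in args.items():
        if key not in schema["properties"]:
            raise ValueError(f"unexpected field: {key}")
        expected = _JSON_TYPES[schema["properties"][key]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"field {key!r} should be {expected.__name__}")
    return args


# Stand-in for the argument string a model might emit for this tool.
sample = '{"origin": "SFO", "destination": "NRT", "date": "2025-05-01", "passengers": 2}'
booking = validate_arguments(sample, BOOK_FLIGHT_TOOL)
print(booking["destination"])
```

Validating before acting matters in automation: the booking code only ever sees arguments that match the declared shape, and a malformed reply fails loudly instead of producing a wrong booking.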

III. Parameters and Pricing: Full Optimization of Price/Performance Ratio

Model	Reasoning ability	Speed	Price (input / output, USD per million tokens)	Supported inputs	Context window
o1	Basic	Slowest	$15 / $60	Text / Image	200,000
o3-mini	High	Medium	$1.1 / $4.4	Text	200,000
o4-mini	High	Medium	$1.1 / $4.4	Text / Image	200,000
o3	Strongest	Slowest	$10 / $40	Text / Image	200,000
o1-pro	Specialized	Slowest	$150 / $600	Text / Image	200,000

Core adjustments: o3 is priced one third lower than o1 on both input and output, for a much better price/performance ratio; o4-mini is priced the same as o3-mini but adds image input and stronger reasoning.
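
The pricing claims are easy to verify. The sketch below takes the input and output prices from the table above, assumes they are quoted per one million tokens, and computes both the cost of a single request and o3's saving relative to o1. The function names and the example token counts are illustrative.

```python
# (input, output) prices in USD, copied from the table above.
# Assumption: each price is quoted per 1,000,000 tokens.
PRICES = {
    "o1": (15.0, 60.0),
    "o3-mini": (1.1, 4.4),
    "o4-mini": (1.1, 4.4),
    "o3": (10.0, 40.0),
    "o1-pro": (150.0, 600.0),
}
TOKENS_PER_PRICE_UNIT = 1_000_000


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


def saving(new_model: str, old_model: str) -> tuple[float, float]:
    """Fractional price reduction (input, output) of new_model vs old_model."""
    new_in, new_out = PRICES[new_model]
    old_in, old_out = PRICES[old_model]
    return 1 - new_in / old_in, 1 - new_out / old_out


# Example request: 10,000 input tokens and 2,000 output tokens.
for model in ("o1", "o3", "o4-mini"):
    print(f"{model:8s} ${request_cost(model, 10_000, 2_000):.4f}")

input_saving, output_saving = saving("o3", "o1")
print(f"o3 vs o1: {input_saving:.1%} cheaper input, {output_saving:.1%} cheaper output")
```

Because input and output prices fall by the same fraction, the one-third saving holds for any mix of input and output tokens.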
