In the fiercely competitive field of artificial intelligence, Google has once again rewritten the rules of the game. The recently launched Gemini 2.5 Flash not only carries forward the strong performance of Google's flagship model, but also strikes a balance between cost and efficiency through its 'Hybrid Reasoning Architecture' and 'Thinking Budget' mechanism. The release marks AI development's formal entry into the era of "think-on-demand", giving enterprises and developers greater flexibility and cost-effectiveness.
I. Performance Rampage: Redefining the Boundaries of Coding and Reasoning
1. Galton board test: a stunning performance that crushes OpenAI
Gemini 2.5 Flash demonstrated eye-popping power in the recent, hotly debated Galton board physics simulation test. The task required the model to accurately simulate the trajectories of small balls falling through a multi-layer board of pegs, with the balls ultimately settling into a normal distribution. In the test:
- Gemini 2.5 Flash reproduced the physics in just 5 prompts, generating smooth, natural animations consistent with real-world physical behavior.
- OpenAI's models such as GPT-4o mini and o3-mini failed because they could not handle the complex physical interactions, and even made low-level errors such as overlapping balls and anomalous distributions.
- Jeff Dean, Google's chief scientist, personally praised the results of the test, calling it a 'seismic breakthrough in coding power'.
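For readers unfamiliar with the Galton board, the statistical idea behind the test can be sketched in a few lines of Python. This is only a minimal illustration of why the bins end up bell-shaped, not the animated physics simulation the models were asked to write; the function name `simulate_galton` and the choice of 12 rows and 10,000 balls are assumptions made for the example.

```python
import random
from collections import Counter


def simulate_galton(n_balls: int, n_rows: int, seed: int = 0) -> list[int]:
    """Drop n_balls through n_rows of pegs and return the count in each bin.

    Idealized model: at every peg a ball goes left or right with equal
    probability, so its final bin is the number of "right" bounces.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_balls):
        bin_index = sum(rng.random() < 0.5 for _ in range(n_rows))
        counts[bin_index] += 1
    # n_rows pegs give n_rows + 1 possible bins
    return [counts[i] for i in range(n_rows + 1)]


bins = simulate_galton(n_balls=10_000, n_rows=12)
for index, count in enumerate(bins):
    # One '#' per 50 balls gives a rough text histogram
    print(f"{index:2d} {'#' * (count // 50)}")
```

The bin counts follow a binomial distribution, which approaches the normal curve as the number of rows grows. A full simulation has to add collision handling and rendering on top of this, which is where the failures described above occurred.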
II. Technical Core: Hybrid Reasoning Architecture and the "Thinking Budget" Revolution
1. Hybrid reasoning model: a double breakthrough in performance and efficiency
The core innovation of Gemini 2.5 Flash is its hybrid reasoning architecture, a dynamic computing model that balances reasoning speed and accuracy. Unlike traditional models that always reason at full depth, Gemini 2.5 Flash lets developers set the Thinking Budget according to the complexity of the task. The Thinking Budget is the number of tokens the model may use for internal reasoning before generating an answer. The mechanism delivers two breakthroughs:
- Controllable cost: with thinking turned off, the output cost drops to $0.60 per million tokens (only 1/6th of similar models); with the highest thinking budget (24k tokens) turned on, performance is close to Gemini 2.5 Pro.
- Dynamic adaptation: the model automatically adjusts the depth of thinking according to the difficulty of the task. For example, only a few hundred tokens are needed to complete reasoning in simple math problems, while tens of thousands of tokens can be consumed to pursue extreme precision in complex scientific analysis.
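The budget-versus-cost trade-off described above can be sketched as follows. This is a hypothetical helper, not part of any Google SDK: the function names, the linear mapping from complexity to budget, and the assumption that thinking tokens are billed at the "reasoning on" output rate are all illustrative. The prices are the ones quoted in this article, and "24k" is assumed to mean 24 × 1024 tokens.

```python
# Illustrative only: names, mapping, and billing rule are assumptions.
MAX_THINKING_BUDGET = 24 * 1024  # the "24k tokens" ceiling cited above
PRICE_OUTPUT_THINKING_OFF = 0.60  # $ per million output tokens (from this article)
PRICE_OUTPUT_THINKING_ON = 3.50   # $ per million output tokens (from this article)


def choose_thinking_budget(task_complexity: float) -> int:
    """Map a 0.0-1.0 complexity estimate to a thinking-token budget."""
    if not 0.0 <= task_complexity <= 1.0:
        raise ValueError("task_complexity must be between 0.0 and 1.0")
    return round(task_complexity * MAX_THINKING_BUDGET)


def estimate_output_cost(answer_tokens: int, thinking_tokens: int) -> float:
    """Estimate output cost in dollars for one response.

    Assumes thinking tokens are billed like output tokens and that any
    thinking at all switches the whole response to the higher rate.
    """
    rate = PRICE_OUTPUT_THINKING_ON if thinking_tokens > 0 else PRICE_OUTPUT_THINKING_OFF
    return (answer_tokens + thinking_tokens) * rate / 1_000_000


simple = estimate_output_cost(answer_tokens=500, thinking_tokens=0)
hard = estimate_output_cost(answer_tokens=500, thinking_tokens=choose_thinking_budget(1.0))
print(f"thinking off: ${simple:.6f}  full budget: ${hard:.6f}")
```

In Google's `google-genai` Python SDK, the chosen value would typically be passed as `thinking_budget` inside a `ThinkingConfig`; check the current API reference for exact limits and billing rules before relying on these numbers.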
2. Arena test: comprehensively outperforming comparable models
In the arena rankings of the third-party evaluation platform LMArena, Gemini 2.5 Flash ranks second with 1392 Elo points, tied with top models such as GPT-4.5 and Grok-3, and significantly ahead of Claude 3.7 Sonnet (1340 points) and DeepSeek R1 (1358 points). Its areas of strength include:
- Code generation: 63.5% single-attempt pass rate on the LiveCodeBench v5 test (close to DeepSeek R1's 70.6%).
- Mathematical reasoning: scored 78.0% on a single attempt in the AIME 2025 math competition simulation, surpassing Claude 3.7 Sonnet's 27.5%.
- Knowledge quiz: scored 12.1% on the Humanity's Last Exam test, second only to o4-mini (14.3%).
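To put the Elo gaps above in perspective, the standard Elo expected-score formula converts a rating difference into a head-to-head win probability. The sketch below assumes the classic 400-point scale; the platform's own rating method may differ in detail, so treat the results as a rough reading rather than an official statistic.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (a win counts 1, a tie counts 0.5)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings quoted in this article
ratings = {
    "Gemini 2.5 Flash": 1392,
    "DeepSeek R1": 1358,
    "Claude 3.7 Sonnet": 1340,
}

flash = ratings["Gemini 2.5 Flash"]
for name, rating in ratings.items():
    if name != "Gemini 2.5 Flash":
        print(f"Gemini 2.5 Flash vs {name}: {elo_expected_score(flash, rating):.1%}")
```

Under this formula, a lead of 34 to 52 points corresponds to being preferred in roughly 55% to 57% of head-to-head comparisons: a consistent edge, but not a landslide.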
III. Developer Mania: Efficiency Leap and Cost Revolution
1. Extremely fast development experience: from prototype to live in a few lines of code
Developers have begun to utilize the flexibility of Gemini 2.5 Flash to complete complex projects:
- Physics simulation: netizen @RameshR generated a normally distributed Galton board animation in just 5 prompts, while OpenAI's models failed due to flaws in their physics engines.
- Web development: developer @Taro Bushidō used it to build YouTube and Spotify clone interfaces that were praised as a "pixel-perfect restoration of the official design".
- AI agents: an MCP (Model Context Protocol) agent that accesses Airbnb and Google Maps can be built in just 30 lines of Python code.
2. Cost comparisons: a 'price/performance revolution' in AI
The following table compares the pricing of Gemini 2.5 Flash with other models (prices are per million input or output tokens):
Model | Input cost ($/million tokens) | Output cost, reasoning off ($/million tokens) | Output cost, reasoning on ($/million tokens) |
---|---|---|---|
Gemini 2.5 Flash | $0.15 | $0.60 | $3.50 |
GPT-4o Mini | $0.10 | $1.10 | $4.40 |
Claude 3.7 Sonnet | $3.00 | $15.00 | – |
DeepSeek R1 | $3.00 | $15.00 | – |
Note: At a 3:1 ratio of input to output tokens and with reasoning off, the prices in the table put the combined cost of Gemini 2.5 Flash at roughly 1/23rd of Claude 3.7 Sonnet's.
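The combined cost in the note can be checked with a small calculation. The sketch below takes the reasoning-off prices from the table and weights them by a 3:1 input-to-output token mix; the function name `blended_cost` and the dictionary layout are assumptions made for illustration.

```python
# $ per million tokens as (input, output with reasoning off), copied from the table
PRICES = {
    "Gemini 2.5 Flash": (0.15, 0.60),
    "Claude 3.7 Sonnet": (3.00, 15.00),
}


def blended_cost(input_price: float, output_price: float,
                 input_parts: int = 3, output_parts: int = 1) -> float:
    """Average $ per million tokens for a given input:output token mix."""
    total_parts = input_parts + output_parts
    return (input_price * input_parts + output_price * output_parts) / total_parts


flash_cost = blended_cost(*PRICES["Gemini 2.5 Flash"])
claude_cost = blended_cost(*PRICES["Claude 3.7 Sonnet"])
ratio = claude_cost / flash_cost
print(f"Flash: ${flash_cost:.4f}/M  Claude: ${claude_cost:.2f}/M  ratio: {ratio:.1f}x")
```

With these figures the ratio works out to about 23 to 1. Turning reasoning on for Gemini 2.5 Flash raises its output price and narrows the gap considerably, so the comparison depends heavily on the workload.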
The release of Gemini 2.5 Flash marks the beginning of AI models' shift from 'lab toys' to 'productivity tools'. Its hybrid reasoning architecture not only eases the tension between cost and performance, but also hints at the future direction of AI evolution: realizing infinite possibilities with limited compute. As Google continues to iterate (such as the upcoming video generation plug-in), this cost-effectiveness revolution led by Gemini may reshape the global AI development landscape.