Recently, Alibaba's Tongyi Qianwen team released Qwen 3, a new generation of large models that topped the global open-source leaderboards upon launch. Compared with its predecessor, Qwen 3 makes significant advances in reasoning capability, multi-language support, and deployment cost. Its flagship model, Qwen3-235B-A22B, performs on par with or even surpasses top models such as DeepSeek-R1, OpenAI's o1 and o3-mini, xAI's Grok-3, and Google's Gemini-2.5-Pro.

Fully open source Qwen 3 family
The Qwen 3 family of models continues to be open-sourced under the permissive Apache 2.0 license, which allows developers, research organizations, and enterprises worldwide to download and use the models commercially for free. The open-source Qwen 3 family includes two MoE models and six dense models:
- MoE models:
  - Qwen3-235B-A22B (235B total parameters, 22B activated)
  - Qwen3-30B-A3B (30B total parameters, 3B activated)
- Dense models:
  - Qwen3-32B
  - Qwen3-14B
  - Qwen3-8B
  - Qwen3-4B
  - Qwen3-1.7B
  - Qwen3-0.6B

It is worth noting that although Qwen3-235B-A22B has a far larger total parameter count than other open-source models, its actual deployment cost is dramatically lower: only four NVIDIA H20 GPUs are needed to deploy the full model, and its GPU memory footprint is only about one-third that of models with comparable performance.
Superior performance across all major benchmarks
The Qwen 3 series performs strongly across professional benchmarks and has set several open-source model records:
- In AIME25, an Olympiad-level math benchmark, Qwen3 scored 81.5, a new open-source record.
- In LiveCodeBench, which evaluates coding ability, Qwen3 broke the 70-point mark, outperforming Grok-3.
- On ArenaHard, which evaluates alignment with human preferences, Qwen3 scored 95.6, outperforming OpenAI-o1 and DeepSeek-R1.
- In BFCL, which evaluates a model's agent capabilities, Qwen3 reached a new high of 70.8, surpassing top models such as Gemini-2.5-Pro and OpenAI-o1.
Even the smaller models show significant efficiency gains: Qwen3-4B can match the performance of Qwen2.5-72B-Instruct, and the smaller MoE model, Qwen3-30B-A3B, uses only one-tenth as many activated parameters as QwQ-32B yet performs even better.

Groundbreaking "hybrid reasoning" model
One of the biggest innovations in Qwen3 is the introduction of a "hybrid reasoning" mode, which supports seamless switching between thinking and non-thinking modes:
- Thinking mode: the model reasons step by step and gives a final answer after careful deliberation, suitable for complex problems that require in-depth thinking
- Non-thinking mode: the model provides fast, near-instantaneous responses, suitable for simple problems where speed matters more than depth
Users can flexibly control the model's inference process according to the complexity of the task, and can even set a "thinking budget" (the maximum number of tokens the model may spend on reasoning) to find the optimal balance between performance and cost. Benchmarks show that thinking mode significantly improves model performance on tasks such as AIME24, AIME25, LiveCodeBench (v5), and GPQA Diamond.
Alibaba also provides a simple soft-switching mechanism that lets users dynamically control the model's thinking mode by adding "/think" and "/no_think" tags to the conversation.
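As a rough illustration of the soft switch, the sketch below appends the tags to user messages sent to an OpenAI-compatible endpoint; the local URL and served model name are assumptions, and any server exposing Qwen3 (for example via vLLM or SGLang) would work the same way:

from openai import OpenAI

# Assumed: a local OpenAI-compatible server exposing a Qwen3 model.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(question: str, think: bool) -> str:
    # Appending "/think" or "/no_think" toggles the reasoning mode for this turn.
    tag = "/think" if think else "/no_think"
    response = client.chat.completions.create(
        model="Qwen/Qwen3-30B-A3B",  # assumed served model name
        messages=[{"role": "user", "content": f"{question} {tag}"}],
    )
    return response.choices[0].message.content

print(ask("Prove that the square root of 2 is irrational.", think=True))
print(ask("What is the capital of France?", think=False))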

Multi-language support and Agent capability enhancement
The Qwen3 models support 119 languages and dialects, significantly extending their global application potential. At the same time, the models' agent and coding capabilities have been substantially enhanced:
- Native support for MCP protocol
- Powerful tool invocation capabilities
- Works with the Qwen-Agent framework to greatly reduce coding complexity (see the sketch after this list)
- Achieves leading performance in complex agent-based tasks
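A minimal sketch of driving Qwen3 through Qwen-Agent with MCP tools is shown below; the local endpoint, served model name, and the particular MCP servers are assumptions for illustration, and the exact configuration should follow the Qwen-Agent documentation:

from qwen_agent.agents import Assistant

# Assumed: a Qwen3 model served behind an OpenAI-compatible endpoint.
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# Tools: a couple of MCP servers plus the built-in code interpreter (illustrative choices).
tools = [
    {"mcpServers": {
        "time": {"command": "uvx", "args": ["mcp-server-time"]},
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    }},
    "code_interpreter",
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{"role": "user", "content": "Fetch https://qwenlm.github.io/blog/ and summarize the latest Qwen news."}]
# bot.run streams intermediate steps; the last yielded value is the final response list.
for responses in bot.run(messages=messages):
    pass
print(responses)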
Strong technical foundation: 36 trillion token pre-training
Qwen3's superior performance is built on huge training data and a well-designed training process:
- Pre-training data volume reaches 36 trillion tokens, almost twice as much as Qwen 2.5
- Coverage of 119 languages and dialects
- In addition to web data, high-quality information extracted from documents such as PDFs is also included.
- Large amounts of synthetic data generated with Qwen2.5-Math and Qwen2.5-Coder to strengthen math and code capabilities
The pre-training process is divided into three phases:
- Basic language capability building: pre-training on over 30 trillion tokens with context length of 4K tokens
- Knowledge dense optimization: increase the proportion of data for STEM, programming and reasoning tasks, etc., and continue training on an additional 5 trillion tokens
- Context capability extension: use high quality long context data to extend the context length to 32K tokens
Post-training follows a four-stage pipeline of long chain-of-thought cold start, chain-of-thought reinforcement learning, thinking-mode fusion, and general reinforcement learning, producing hybrid models capable of both complex reasoning and rapid response.

Community Reaction and Practical Experience
Within less than 3 hours of Qwen3 being open-sourced, its GitHub repository had already gained 17k stars, triggering an overwhelming response from the open-source community. Apple engineer Awni Hannun announced that Qwen3 is now supported by the MLX framework, allowing Apple devices from the iPhone to the M2/M3 Ultra to run Qwen3 models of different sizes natively.
A number of real-world tests have shown that Qwen3 handles complex reasoning problems such as mathematical proofs and programming tasks with ease. For example, in a complex programming task (writing a Snake game with a chase feature), Qwen3-235B-A22B produced runnable code in only about 3 minutes.
Users who tested it found that, compared with Llama models of the same parameter count, Qwen3 shows clear advantages: it reasons more deeply, maintains longer context, and solves harder problems.
Getting started
The Qwen3 models are now live and available on the ModelScope community, Hugging Face, and GitHub:
- Online Experience:https://chat.qwen.ai/
- ModelScope:https://modelscope.cn/collections/Qwen3-9743180bdc6b48
- Hugging Face:https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f
- GitHub:https://github.com/QwenLM/Qwen3
For deployment, frameworks such as SGLang and vLLM are officially recommended; for local use, tools such as Ollama, LMStudio, MLX, llama.cpp and KTransformers are recommended.
These tools make it easy to integrate Qwen3 into a variety of workflows, whether for research, development, or production. A standard example using the modelscope library (which mirrors the transformers API) is shown below:
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switch between thinking and non-thinking modes. Default is True.
)
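Continuing the example, a minimal generation and decoding step might look like the following; max_new_tokens is an illustrative value, and in thinking mode the decoded output contains the model's reasoning wrapped in <think>...</think> before the final answer:

# tokenize the templated prompt and generate a completion
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)

# strip the prompt tokens and decode only the newly generated part
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))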
Concluding remarks
To date, Alibaba's Tongyi team has open-sourced more than 200 models, with more than 300 million downloads worldwide and more than 100,000 derivative models based on Qwen, surpassing Meta's Llama to become the world's No. 1 open-source model family. The open-sourcing of Qwen3 not only marks another major breakthrough in China's AI technology, but also gives the global AI developer community a powerful new tool to advance the open-source ecosystem.
