Recently, Alibaba's Tongyi Qianwen team released Qwen 3, a new generation of large models that topped the global open-source leaderboards upon launch. Compared with its predecessor, Qwen 3 makes significant advances in reasoning capability, multi-language support, and deployment cost. Its flagship model, Qwen3-235B-A22B, performs on par with or even surpasses top models such as DeepSeek-R1, OpenAI's o1 and o3-mini, xAI's Grok-3, and Google's Gemini-2.5-Pro.

Fully open source Qwen 3 family
The Qwen 3 family of models continues to be open-sourced under the permissive Apache 2.0 license, which allows developers, research organizations, and enterprises worldwide to download and use the models commercially for free. The open-source Qwen 3 family includes two MoE models and six dense models:
- MoE models:
  - Qwen3-235B-A22B (235B total parameters, 22B activated)
  - Qwen3-30B-A3B (30B total parameters, 3B activated)
- Dense models:
  - Qwen3-32B
  - Qwen3-14B
  - Qwen3-8B
  - Qwen3-4B
  - Qwen3-1.7B
  - Qwen3-0.6B

It is worth noting that although Qwen3-235B-A22B has a far larger total parameter count than other open-source models, its actual deployment cost is dramatically lower: only four NVIDIA H20 GPUs are needed to deploy the full model, and its GPU memory footprint is only about one-third that of models with comparable performance.
Superior performance across all major benchmarks
The Qwen 3 series performs strongly across professional benchmarks and has set several open-source model records:
- In AIME25, an Olympiad-level math benchmark, Qwen3 scored 81.5, a new open-source record.
- In LiveCodeBench, which evaluates coding ability, Qwen3 broke the 70-point mark, outperforming Grok-3.
- On ArenaHard, which evaluates alignment with human preferences, Qwen3 scored 95.6, outperforming OpenAI-o1 and DeepSeek-R1.
- In BFCL, which evaluates a model's agent capabilities, Qwen3 reached a new high of 70.8, surpassing top models such as Gemini-2.5-Pro and OpenAI-o1.
Even the smaller models show significant efficiency gains: Qwen3-4B can match the performance of Qwen2.5-72B-Instruct, and the smaller MoE model, Qwen3-30B-A3B, uses only one-tenth as many activated parameters as QwQ-32B yet performs even better.

Groundbreaking "hybrid reasoning" model
One of the biggest innovations in Qwen3 is the introduction of a "hybrid reasoning" mode, which supports seamless switching between thinking and non-thinking modes:
- Thinking mode: the model reasons step by step and gives a final answer after careful deliberation, suitable for complex problems that require in-depth thinking
- Non-thinking mode: the model provides fast, near-instantaneous responses, suitable for simple problems where speed matters more than depth
Users can flexibly control the model's inference process according to the complexity of the task, and can even set a "thinking budget" (the maximum number of tokens the model may spend on reasoning) to find the optimal balance between performance and cost. Benchmarks show that thinking mode significantly improves model performance on tasks such as AIME24, AIME25, LiveCodeBench (v5), and GPQA Diamond.
Alibaba also provides a simple soft-switching mechanism that lets users dynamically control the model's thinking mode by adding "/think" and "/no_think" tags to the conversation.
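As a rough illustration of the soft switch, the sketch below appends the tags to user messages sent to an OpenAI-compatible endpoint; the local URL and served model name are assumptions, and any server exposing Qwen3 (for example via vLLM or SGLang) would work the same way:

from openai import OpenAI

# Assumed: a local OpenAI-compatible server exposing a Qwen3 model.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(question: str, think: bool) -> str:
    # Appending "/think" or "/no_think" toggles the reasoning mode for this turn.
    tag = "/think" if think else "/no_think"
    response = client.chat.completions.create(
        model="Qwen/Qwen3-30B-A3B",  # assumed served model name
        messages=[{"role": "user", "content": f"{question} {tag}"}],
    )
    return response.choices[0].message.content

print(ask("Prove that the square root of 2 is irrational.", think=True))
print(ask("What is the capital of France?", think=False))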

Multi-language support and Agent capability enhancement
The Qwen3 models support 119 languages and dialects, significantly extending their global application potential. At the same time, the models' agent and coding capabilities have been substantially enhanced:
- Native support for MCP protocol
- Powerful tool invocation capabilities
- Works with the Qwen-Agent framework to greatly reduce coding complexity (see the sketch after this list)
- Achieves leading performance in complex agent-based tasks
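A minimal sketch of driving Qwen3 through Qwen-Agent with MCP tools is shown below; the local endpoint, served model name, and the particular MCP servers are assumptions for illustration, and the exact configuration should follow the Qwen-Agent documentation:

from qwen_agent.agents import Assistant

# Assumed: a Qwen3 model served behind an OpenAI-compatible endpoint.
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# Tools: a couple of MCP servers plus the built-in code interpreter (illustrative choices).
tools = [
    {"mcpServers": {
        "time": {"command": "uvx", "args": ["mcp-server-time"]},
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    }},
    "code_interpreter",
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{"role": "user", "content": "Fetch https://qwenlm.github.io/blog/ and summarize the latest Qwen news."}]
# bot.run streams intermediate steps; the last yielded value is the final response list.
for responses in bot.run(messages=messages):
    pass
print(responses)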
Strong technical foundation: 36 trillion token pre-training
Qwen3's superior performance is built on huge training data and a well-designed training process:
- Pre-training data volume reaches 36 trillion tokens, almost twice as much as Qwen 2.5
- Coverage of 119 languages and dialects
- In addition to web data, high-quality information extracted from documents such as PDFs is also included.
- Large amounts of synthetic data generated with Qwen2.5-Math and Qwen2.5-Coder to strengthen math and code capabilities
The pre-training process is divided into three phases:
- Basic language capability building: pre-training on over 30 trillion tokens with context length of 4K tokens
- Knowledge dense optimization: increase the proportion of data for STEM, programming and reasoning tasks, etc., and continue training on an additional 5 trillion tokens
- Context capability extension: use high quality long context data to extend the context length to 32K tokens
Post-training follows a four-stage pipeline of long chain-of-thought cold start, chain-of-thought reinforcement learning, thinking-mode fusion, and general reinforcement learning, producing hybrid models capable of both complex reasoning and rapid response.

Community Reaction and Practical Experience
Within less than 3 hours of Qwen3 being open-sourced, its GitHub repository had already gained 17k stars, triggering an overwhelming response from the open-source community. Apple engineer Awni Hannun announced that Qwen3 is now supported by the MLX framework, allowing Apple devices from the iPhone to the M2/M3 Ultra to run Qwen3 models of different sizes natively.
A number of real-world tests have shown that Qwen3 handles complex reasoning problems such as mathematical proofs and programming tasks with ease. For example, in a complex programming task (writing a Snake game with a chase feature), Qwen3-235B-A22B produced runnable code in only about 3 minutes.
Users who tested it found that, compared with Llama models of the same parameter count, Qwen3 shows clear advantages: it reasons more deeply, maintains longer context, and solves harder problems.
Getting started
The Qwen3 models are now live and available on the ModelScope community, Hugging Face, and GitHub:
- Online Experience:https://chat.qwen.ai/
- ModelScope:https://modelscope.cn/collections/Qwen3-9743180bdc6b48
- Hugging Face:https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f
- GitHub:https://github.com/QwenLM/Qwen3
For deployment, frameworks such as SGLang and vLLM are officially recommended; for local use, tools such as Ollama, LMStudio, MLX, llama.cpp and KTransformers are recommended.
These tools make it easy to integrate Qwen3 into a variety of workflows, whether for research, development, or production. A standard example using the modelscope library (which mirrors the transformers API) is shown below:
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switch between thinking and non-thinking modes. Default is True.
)
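Continuing the example, a minimal generation and decoding step might look like the following; max_new_tokens is an illustrative value, and in thinking mode the decoded output contains the model's reasoning wrapped in <think>...</think> before the final answer:

# tokenize the templated prompt and generate a completion
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=1024)

# strip the prompt tokens and decode only the newly generated part
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
print(tokenizer.decode(output_ids, skip_special_tokens=True))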
Concluding remarks
To date, Alibaba's Tongyi team has open-sourced more than 200 models, with more than 300 million downloads worldwide and more than 100,000 derivative models based on Qwen, surpassing Meta's Llama to become the world's No. 1 open-source model family. The open-sourcing of Qwen3 not only marks another major breakthrough in China's AI technology, but also gives the global AI developer community a powerful new tool to advance the open-source ecosystem.
