A new milestone in AI music creation
As artificial intelligence advances rapidly, music creation is changing with it. Tencent AI Lab recently released SongGeneration, an open-source music generation model that gives technical support to the vision that "everyone can create music".
Traditional music creation often requires specialized musical knowledge and expensive equipment, and SongGeneration lowers both barriers. The model can generate high-quality music and, more importantly, is released as open source, so that ordinary users can try AI-assisted music creation for themselves.
Music generation technology commonly suffers from poor sound quality, insufficient musicality, and slow generation. SongGeneration addresses these problems through its technical architecture and training methodology, setting a new benchmark for the music AI field.
Try the SongGeneration demo: https://huggingface.co/spaces/tencent/SongGeneration
Powerful features put music creation at your fingertips
SongGeneration is equipped with four core features, each of which demonstrates its technological prowess in the field of music generation:
Intelligent Text Control
Users only need to enter a simple combination of keywords to generate a complete musical composition that matches the desired style and mood. For example, when the user enters "happy pop", the system will automatically create a pop song with a happy atmosphere; when the user enters "intense rock", the system will generate a rock song with a strong rhythm. This intuitive interaction makes music creation easier than ever.
Precision style following
This feature allows users to upload a reference audio clip of 10 seconds or more. SongGeneration analyzes the clip and generates a new piece of music that closely matches its style. Whether the reference is pop, rock, Chinese-style music, or a viral internet hit, the model captures and reproduces its stylistic character while keeping the newly generated music musically coherent.
Multi-track generation
SongGeneration automatically generates separate vocal and accompaniment tracks, a feature of great importance for music production. The system keeps the two tracks closely matched in melody, structure, rhythm, and arrangement, which makes post-production editing and mixing much easier.
Timbre Cloning Capability
Timbre following based on reference audio lets SongGeneration generate vocals at a "timbre cloning" level. The resulting songs sound highly similar to the reference voice while remaining natural, high in sound quality, and emotionally expressive.


Revolutionary Technology Architecture and Innovative Breakthroughs
SongGeneration's technical architecture consists of two core components, the data processing pipeline and the generative model, and achieves superior performance through a series of innovative technologies.
Data processing pipeline
The project builds a complete music data processing pipeline that integrates key modules such as vocal-accompaniment separation, structure analysis, and lyrics recognition. Through this pipeline, the system extracts accurate lyrics from raw audio and also obtains labels such as song structure, genre, and sound quality level, which provide a high-quality data foundation for subsequent model training.

Ultra Low Bit Rate Codecs
SongGeneration introduces a dual-channel 48 kHz music codec that is reported to have the lowest bit rate of any open-source model in the industry. The codec is reported to achieve the best music reconstruction currently available at a token rate of only 25 Hz and a bit rate of only 0.35 kbps, which significantly reduces the modeling burden on the language model.
The codec supports two encoding modes, Hybrid and Dual. Hybrid mode models vocals and accompaniment jointly, keeping them harmonically coherent. Dual mode models them independently, preserving finer detail in each track.
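The two codec figures quoted above can be related with simple arithmetic. The sketch below is only a back-of-the-envelope check: it reads 25 Hz as the number of codec frames per second and 0.35 kbps as 350 bits per second. The single-codebook reading in the last step is an assumption made for illustration, not something the article states, and the function names are hypothetical.

```python
# Figures quoted in the article.
FRAME_RATE_HZ = 25     # codec frames (tokens) per second
BITRATE_BPS = 350      # 0.35 kbps expressed in bits per second


def bits_per_frame(bitrate_bps: float, frame_rate_hz: float) -> float:
    """Number of bits available to describe one codec frame."""
    return bitrate_bps / frame_rate_hz


def frames_for_duration(seconds: float, frame_rate_hz: float) -> int:
    """Number of codec frames needed to cover a clip of the given length."""
    return round(seconds * frame_rate_hz)


bits = bits_per_frame(BITRATE_BPS, FRAME_RATE_HZ)

# Assumption: if each frame were a single index into one codebook,
# the codebook would need 2**bits entries.
implied_codebook_size = 2 ** round(bits)

# Sequence length the language model would see for a three-minute song.
frames_3min = frames_for_duration(180, FRAME_RATE_HZ)

print(bits, implied_codebook_size, frames_3min)
```

At this rate a three-minute song is a few thousand frames long, which is the sense in which a low token rate lightens the load on the language model.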

Parallel prediction of multiple token types
The model introduces a "hybrid first, dual-track second" parallel prediction strategy for multiple token types. First, the language model predicts hybrid tokens, which guide high-level structure such as melody and rhythm. Then an extended autoregressive decoder models the dual-track tokens to capture fine-grained variation in the vocals and the accompaniment. This design achieves parallel prediction without significantly increasing the sequence length and avoids mutual interference between token types.
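To see why predicting several token streams in parallel keeps the sequence short, the toy sketch below compares two bookkeeping schemes: interleaving every stream into one long sequence versus carrying one group of tokens per frame. It only counts and groups tokens. It is not the model's decoder, and the function names and the choice of three streams (one mixed, one vocal, one accompaniment) are assumptions made for illustration.

```python
def flattened_length(num_frames: int, streams: int = 3) -> int:
    """Length when all token streams are interleaved into one sequence."""
    return num_frames * streams


def parallel_length(num_frames: int) -> int:
    """Length when each step carries one frame and the streams ride along."""
    return num_frames


def parallel_steps(mixed, vocal, accompaniment):
    """Group three equally long token streams into one tuple per frame."""
    if not (len(mixed) == len(vocal) == len(accompaniment)):
        raise ValueError("all token streams must cover the same frames")
    return list(zip(mixed, vocal, accompaniment))


frames = 4500  # e.g. three minutes at 25 frames per second
print(flattened_length(frames), parallel_length(frames))
print(parallel_steps([1, 2], [3, 4], [5, 6]))
```

With interleaving, the sequence grows in proportion to the number of streams. With per-frame grouping it stays at one step per frame, which matches the article's claim that sequence length does not increase significantly.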
Multidimensional Human Preference Alignment
SongGeneration is described as the industry's first large music generation model aligned with multi-dimensional human preferences. It focuses on three dimensions: musicality, lyrics alignment, and prompt consistency:
Preference type | Construction method | Effect |
---|---|---|
Musicality preference | Train a reward model on a small amount of manually scored data | Improves the artistry and listening experience of the generated music |
Lyrics alignment preference | Count phoneme errors using a pre-trained ASR model | Ensures that the sung content accurately matches the lyrics |
Prompt consistency preference | Compute text-audio similarity with MuQ-MuLan | Improves the model's adherence to user instructions |
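Two of the signals in the table reduce to standard computations. The sketch below is a minimal illustration under stated assumptions. It assumes the phoneme sequences have already been produced (by the lyrics and by an ASR system) and counts errors as a Levenshtein edit distance. It also treats text-audio similarity as the cosine similarity between two embedding vectors. It does not call any ASR model or MuQ-MuLan, the vectors are placeholders, and the article does not specify the exact error metric or similarity function used.

```python
import math


def phoneme_errors(reference, hypothesis):
    """Edit distance (substitutions, insertions, deletions) between
    a reference phoneme sequence and a recognized one."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical phoneme sequences: the lyric versus what was recognized.
lyric = ["HH", "AH", "L", "OW"]
sung = ["Y", "EH", "L", "OW"]
print(phoneme_errors(lyric, sung))

# Placeholder embeddings standing in for a text prompt and an audio clip.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

Fewer phoneme errors and higher similarity would each mark a generated sample as preferable on its dimension. How such scores are turned into training pairs is not described in the article.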
Three-stage training paradigm
The model adopts a three-stage training strategy. The pre-training stage focuses on aligning the different conditional inputs with the music representation. The modular extension training stage trains the extension module to model dual-track tokens in parallel. The multi-preference alignment stage incorporates human preferences so that the model is optimized toward music that people prefer.
Authoritatively recognized for superior performance
In order to comprehensively assess SongGeneration's performance, Tencent AI Lab, in conjunction with the School of Music and Recording Arts at the Communication University of China, established a comprehensive evaluation system that includes objective analysis and subjective perception.
Objective evaluation results
In the objective, tool-based evaluation, SongGeneration was compared with several commercial models (Suno v4.5, Sponge Music, Mureka O1) and open-source models (YuE, DiffRhythm, ACE-Step, SongGen):
Evaluation Dimension | SongGeneration Performance | Ranking |
---|---|---|
Production Quality (PQ) | Excellent | First |
Content Enjoyment (CE) | Excellent | First |
Content Usefulness (CU) | Excellent | First |
Production Complexity (PC) | Good | Leading |

Subjective assessment results
In the subjective human evaluation, SongGeneration performs well on several key dimensions:
- Lyrics accuracy: outperforms many large models, including Suno, showing excellent alignment between the sung vocals and the lyrics
- Melodic performance: excellent in musicality, emotional expression, and melodic line
- Accompaniment quality: the arrangement is rich and varied and blends well with the main melody
- Overall performance: comparable to Suno v4.5, Suno's latest version, reaching the level of commercial models
The results show that SongGeneration ranks first among the open-source models and near the top when compared with commercial models, demonstrating its technical strength and practical value.

Open Ecology Helps Popularize Music Creation
SongGeneration is not only technologically advanced. It is also fully open source, which adds strong momentum to the development of the music AI ecosystem.
Multi-platform experience approach
Currently, users can experience SongGeneration through multiple channels:
- Hugging Face: https://huggingface.co/tencent/SongGeneration
- GitHub repository: https://github.com/tencent-ailab/SongGeneration
- Paper: https://arxiv.org/abs/2506.07520
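For readers who want the released files locally, the Hugging Face repository listed above can be fetched with the `huggingface_hub` library. The sketch below is a minimal example and assumes that library is installed. The repository ID is taken from the URL in the list, while the local directory name and the helper function are hypothetical. It only downloads files and does not show how to run inference, for which the GitHub repository is the place to look.

```python
REPO_ID = "tencent/SongGeneration"  # taken from the Hugging Face URL above


def download_checkpoint(local_dir: str = "./SongGeneration") -> str:
    """Download a snapshot of the repository and return its local path.

    The default directory name is an arbitrary choice for illustration.
    """
    # Imported here so the module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    # Requires network access and enough disk space for the model files.
    print(download_checkpoint())
```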
As an open-source project, SongGeneration opens a new path for the music AI field. It lowers the technical threshold for music creation and gives researchers and developers a powerful foundation to build on. With continued community contributions and ongoing iteration, there is reason to believe that SongGeneration will push the music creation industry toward greater intelligence and wider accessibility.
This release marks a major step forward for AI music creation technology, bringing the vision that "everyone can create music" closer to reality and opening up new possibilities for the future of the music industry.