SongGeneration: An Open-Source Tool That Opens a New Era of AI Music Creation

A new milestone in AI music creation

With the rapid development of artificial intelligence technology, the field of music creation is undergoing an unprecedented change. Recently, Tencent AI Lab released an open-source music generation model called SongGeneration, an innovation that provides strong technical support for the vision of "everyone can create music".

Traditional music creation often requires specialized musical knowledge and expensive equipment. SongGeneration lowers both barriers: the model generates high-quality music and, more importantly, is released as open source, so that ordinary users can try AI-assisted music creation for themselves.

Music generation systems commonly struggle with poor audio quality, weak musicality, and slow generation. SongGeneration addresses these problems through its technical architecture and training methodology, setting a new benchmark for the music AI field.

Try the SongGeneration demo: https://huggingface.co/spaces/tencent/SongGeneration

Powerful features put music creation at your fingertips

SongGeneration is equipped with four core features, each of which demonstrates its technological prowess in the field of music generation:

Intelligent Text Control

Users only need to enter a simple combination of keywords to generate a complete musical composition that matches the desired style and mood. For example, when the user enters "happy pop", the system will automatically create a pop song with a happy atmosphere; when the user enters "intense rock", the system will generate a rock song with a strong rhythm. This intuitive interaction makes music creation easier than ever.

Precision style following

This feature lets users upload a reference audio clip of 10 seconds or more. SongGeneration analyzes the clip and generates a new piece that closely matches its style. Whether the reference is pop, rock, Chinese-style music, or a viral hit, the model captures the defining traits of the style while keeping the new piece musically coherent.

Multi-track generation

SongGeneration automatically generates separate vocal and accompaniment tracks, which matters a great deal in music production. The two tracks are kept closely matched in melody, structure, rhythm, and instrumentation, which makes later editing and mixing much easier.

Timbre cloning

Timbre following based on reference audio lets SongGeneration generate vocals at a "voice cloning" level. The resulting songs sound highly similar to the reference voice while remaining natural, emotionally expressive, and high in audio quality.

Revolutionary Technology Architecture and Innovative Breakthroughs

SongGeneration's technical architecture consists of two core components, the data processing pipeline and the generative model, and achieves superior performance through a series of innovative technologies.

Data processing pipeline

The project builds a complete music data processing pipeline that integrates several key modules, including vocal-accompaniment separation, structure analysis, and lyrics recognition. Through this pipeline, the system extracts lyrics from raw audio and also obtains labels such as song structure, genre, and audio quality level, which provide a high-quality data foundation for model training.

Ultra-low-bitrate codec

SongGeneration introduces a stereo 48 kHz high-quality music codec with the lowest bitrate of any open-source model in the industry. The codec achieves state-of-the-art music reconstruction at a token rate of only 25 Hz and a bitrate of only 0.35 kbps, which significantly reduces the modeling burden on the language model.
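As a rough sanity check, the 25 Hz and 0.35 kbps figures can be related by simple arithmetic. The sketch below assumes 1 kbps = 1000 bits per second and, purely for comparison, 16-bit PCM as the uncompressed baseline. Neither assumption comes from the article, and the function names are illustrative.

```python
# Back-of-the-envelope arithmetic for the quoted codec figures.
FRAME_RATE_HZ = 25   # codec frames (tokens steps) per second, as quoted
BITRATE_BPS = 350    # 0.35 kbps, assuming 1 kbps = 1000 bits per second


def bits_per_frame(bitrate_bps: int, frame_rate_hz: int) -> float:
    """Average number of bits available to encode one codec frame."""
    return bitrate_bps / frame_rate_hz


def frames_for(seconds: int, frame_rate_hz: int = FRAME_RATE_HZ) -> int:
    """Number of codec frames needed to cover a clip of the given length."""
    return seconds * frame_rate_hz


def pcm_bitrate_bps(sample_rate_hz: int, channels: int, bit_depth: int) -> int:
    """Bitrate of uncompressed PCM audio (bit depth is an assumption here)."""
    return sample_rate_hz * channels * bit_depth


print(bits_per_frame(BITRATE_BPS, FRAME_RATE_HZ))      # 14.0
print(frames_for(240))                                 # 6000
print(pcm_bitrate_bps(48_000, 2, 16) // BITRATE_BPS)   # 4388
```

Under these assumptions each frame carries about 14 bits, a four-minute song spans about 6,000 frames, and the codec stream is several thousand times smaller than 16-bit stereo PCM at 48 kHz. Short sequences of this kind are what make the language model's job lighter.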

The codec supports two encoding modes, Hybrid and Dual. Hybrid mode models vocals and accompaniment together, which keeps them harmonically coherent. Dual mode models them independently, which preserves finer detail in each track.

Parallel prediction with multiple classes of tokens

The model introduces a "hybrid first, dual-track second" parallel prediction strategy for multiple token types. First, the language model predicts hybrid tokens, which guide the overall arrangement of high-level structural information such as melody and rhythm. Then an extended autoregressive decoder models the dual-track tokens, capturing fine-grained variations in the vocals and the accompaniment. This design achieves parallel prediction without significantly increasing sequence length and avoids interference between token types.
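The ordering described above can be illustrated with a toy sketch. Everything here is hypothetical: `Step`, `generate`, and the two predictor callables stand in for the language model and the extended decoder, and none of them come from the SongGeneration codebase. The point is only that predicting the vocal and accompaniment tokens side by side at each step keeps the sequence at one position per frame, whereas interleaving three token streams would triple it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    hybrid: int          # token describing the full mix at this frame
    vocal: int           # dual-track token for the vocal stem
    accompaniment: int   # dual-track token for the accompaniment stem


def interleaved_length(num_frames: int, streams: int = 3) -> int:
    """Sequence length if every token stream were flattened into one sequence."""
    return num_frames * streams


def generate(
    num_frames: int,
    predict_hybrid: Callable[[List[int]], int],
    predict_tracks: Callable[[List[int], List[Step]], Tuple[int, int]],
) -> List[Step]:
    """Toy decoding loop: hybrid token first, then both track tokens together."""
    steps: List[Step] = []
    hybrid_history: List[int] = []
    for _ in range(num_frames):
        # Stage 1: the (stand-in) language model picks the hybrid token.
        hybrid = predict_hybrid(hybrid_history)
        hybrid_history.append(hybrid)
        # Stage 2: the (stand-in) decoder emits both track tokens in parallel,
        # conditioned on the hybrid tokens and the steps generated so far.
        vocal, accompaniment = predict_tracks(hybrid_history, steps)
        steps.append(Step(hybrid, vocal, accompaniment))
    return steps


# Deterministic placeholder predictors, used only to make the sketch runnable.
toy_hybrid = lambda history: len(history) % 4
toy_tracks = lambda history, steps: (history[-1] + 10, history[-1] + 20)

song = generate(3, toy_hybrid, toy_tracks)
print(len(song), interleaved_length(3))  # 3 9
```

With three frames, the parallel layout yields three positions while a flattened layout would need nine, which is the sequence-length saving the paragraph refers to.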

Multidimensional Human Preference Alignment

SongGeneration is the industry's first large music generation model to be aligned with multi-dimensional human preferences, covering three dimensions: musicality, lyrics alignment, and prompt consistency:

Preference type	Construction method	Effect
Musicality	Train a reward model on a small amount of manually labeled scoring data	Improves the artistry and listening experience of generated music
Lyrics alignment	Count phoneme errors using a pre-trained ASR model	Ensures that the sung content accurately matches the lyrics
Prompt consistency	Compute text-audio similarity with MuQ-MuLan	Improves how closely the model follows user instructions
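
Two of the scoring signals in the table reduce to standard computations, sketched below. This is a minimal illustration, not the project's implementation: the phoneme sequences are assumed to come from some ASR system, the embedding vectors from some text-audio model such as MuQ-MuLan, and `preference_pair` is a hypothetical helper showing one way scores could be turned into chosen/rejected pairs.

```python
from math import sqrt
from typing import Callable, Sequence, Tuple


def phoneme_errors(reference: Sequence[str], recognized: Sequence[str]) -> int:
    """Edit distance (substitutions, insertions, deletions) between phoneme lists."""
    prev = list(range(len(recognized) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]
        for j, rec_ph in enumerate(recognized, start=1):
            cost = 0 if ref_ph == rec_ph else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def preference_pair(
    candidates: Sequence[str],
    score: Callable[[str], float],
    higher_is_better: bool = True,
) -> Tuple[str, str]:
    """Return (chosen, rejected): the best- and worst-scoring candidates."""
    ranked = sorted(candidates, key=score, reverse=higher_is_better)
    return ranked[0], ranked[-1]


# Hypothetical example: three generated samples for the same lyric line.
reference = ["HH", "AH", "L", "OW"]
recognized = {
    "sample_a": ["HH", "AH", "L", "OW"],  # no errors
    "sample_b": ["HH", "AH", "L"],        # one deletion
    "sample_c": ["Y", "EH", "L", "OW"],   # two substitutions
}
chosen, rejected = preference_pair(
    list(recognized),
    lambda name: phoneme_errors(reference, recognized[name]),
    higher_is_better=False,  # fewer phoneme errors is better
)
print(chosen, rejected)  # sample_a sample_c
```

The same `preference_pair` helper would work for prompt consistency by scoring each sample with `cosine_similarity` between a text embedding and an audio embedding and leaving `higher_is_better` at its default.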

Three-stage training paradigm

The model adopts an innovative three-phase training strategy: the pre-training phase focuses on the modal alignment of different conditional inputs with music representations; the modular extension training phase trains extension modules to achieve parallel modeling of two-track tokens; and the multi-preference alignment training phase integrates human preferences to optimize the model towards generating music that matches human preferences.

Authoritatively recognized for superior performance

In order to comprehensively assess SongGeneration's performance, Tencent AI Lab, in conjunction with the School of Music and Recording Arts at the Communication University of China, established a comprehensive evaluation system that includes objective analysis and subjective perception.

Objective evaluation results

In the objective evaluation, SongGeneration was compared against several commercial models (Suno v4.5, Sponge Music, Mureka O1) and open-source models (YuE, DiffRhythm, ACE-Step, SongGen):

Evaluation dimension	SongGeneration performance	Ranking
Production Quality (PQ)	Excellent	Ranked first
Content Enjoyment (CE)	Excellent	Ranked first
Content Usefulness (CU)	Excellent	Ranked first
Production Complexity (PC)	Good	Among the leaders

Subjective assessment results

SongGeneration performs strongly on several key dimensions in the subjective human evaluation:

Lyrics accuracy: outperforms many large models, including Suno, with excellent alignment between the sung audio and the lyrics
Melodic performance: excellent in musicality, emotional expression, and melodic flow
Accompaniment quality: the instrumentation is rich and varied and blends well with the main melody
Overall performance: comparable to Suno's latest v4.5 release, reaching the level of commercial models

The results show that SongGeneration ranks first among open-source models and also places near the top when compared with commercial models, demonstrating its technical strength and practical value.

An open ecosystem helps popularize music creation

SongGeneration is not only technologically advanced, but more importantly, it adopts a completely open source approach to the community, injecting strong momentum into the development of the music AI ecosystem.

Where to try it

Currently, users can access SongGeneration through several channels:

Hugging Face: https://huggingface.co/tencent/SongGeneration
GitHub repository: https://github.com/tencent-ailab/SongGeneration
Paper: https://arxiv.org/abs/2506.07520

As an open-source project, SongGeneration opens a new path for the music AI field. It lowers the technical barrier to music creation and gives researchers and developers a strong foundation to build on. With continued community contributions and further iteration on the technology, there is good reason to expect SongGeneration to push the music creation industry toward smarter and more widely accessible tools.

This milestone marks a major step forward in AI music creation technology, bringing the vision of "everyone can create music" closer to reality and opening up new possibilities for the future of the music industry.
