Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling
with Residual-based Texture Modeling

Anonymous Author

Abstract. While existing Singing Voice Synthesis systems achieve high-fidelity solo performances, they are constrained by global timbre control, failing to address dynamic multi-singer arrangement and vocal texture within a single song. To address this, we propose Tutti, a unified framework designed for structured multi-singer generation. Specifically, we introduce a Structure-Aware Singer Prompt to enable flexible singer scheduling evolving with musical structure, and propose Complementary Texture Learning via Condition-Guided VAE to capture implicit acoustic textures (e.g., spatial reverberation and spectral fusion) that are complementary to explicit controls. Experiments demonstrate that Tutti excels in precise multi-singer scheduling and significantly enhances the acoustic realism of choral generation, offering a novel paradigm for complex multi-singer arrangement.


Single-Singer

Where the Reference Texture cell is empty, the texture reference is the same audio as the Reference Voice.

The Vevo1.5 samples are generated segment by segment and then spliced together, so the resulting audio may sound choppy.

Lyrics (generated by AI) | Reference Texture | Reference Voice | Vevo1.5 Generated | Tutti Generated (Ours)
[verse]
Raindrops tap the window pane
Steam rising with coffee scent
Pages turn as time moves slow
Waiting for a familiar face to show
[chorus]
That coffee shop corner holds our yesterdays
Every cup holds memories that sway
When music floats between the air
You walk in like you never left there
[verse]
月光洒在小路上
风儿轻轻吹过窗
啦啦啦 心随云飘荡
[verse]
Raindrops scribble poems on cobblestone
Oil-paper umbrella shatters dusk's glow
Tea steam drifts through wooden window seams
Vinyl hums of yellowed yesteryears
Footsteps freeze at the alley's turn
Waiting for a glance that'll never return

Multi-Singer

Where the Reference Texture cell is empty, the texture reference is the same audio as the Reference Voice.

The Vevo1.5 samples are generated segment by segment and then spliced together, so the resulting audio may sound choppy.



Lyrics (generated by AI) | Reference Texture | Reference Voice | Vevo1.5 Generated | Tutti Generated (Ours)
[intro]
[Singer1 verse]
Going out tonight, changes into something red
Her mother doesn't like that kind of dress
Everything she never had, she's showing off
[Singer2 verse]
Driving too fast, moon is breaking through her hair
She's heading for something that she won't forget
Having no regrets is all that she really wants
[Singer1 & Singer2 chorus_multi]
We're only getting older, baby
And I've been thinking about it lately
Does it ever drive you crazy
Just how fast the night changes
Everything that you've ever dreamed of
Disappearing when you wake up
But there's nothing to be afraid of
Even when the night changes
It will never change me and you


Singer1



Singer2

[00:00] verse1
[00:16] verse2
[00:32] chorus
[Singer1 verse]
Yellowed letters folded to boats
Carrying vows from youthful throats
Ink blurs in the rain
Drifting to memories' shore
[Singer2 verse]
Old piano keys hum of longing
Moss on windowsills keeps growing
Time slips through my fingers slow
The summer unopened
[Singer1 & Singer2 chorus_multi]
Wind scatters boats to the sea's edge
Piano chords where we never said goodbye
Moss climbs the door of years
And you're still the love that lingers near


Singer1



Singer2

[00:00] verse1
[00:15] verse2
[00:30] chorus
[Singer1 verse]
泛黄信纸折成纸船
载着年少未寄的誓言
墨迹晕开在雨天
漂向记忆的彼岸
[Singer2 verse]
旧琴键弹着思念
窗台青苔又蔓延
时光在指缝沉淀
未拆封的夏天
[Singer1 & Singer2 chorus_multi]
风吹散纸船漂向海平线
琴声里我们从未说再见
青苔爬上岁月的门沿
你仍是我心尖的眷恋


Singer1



Singer2

[00:00] verse1
[00:16] verse2
[00:31] chorus
[Singer1 verse]
旧书页夹着褪色车票
那年站台汽笛声在飘
银杏叶铺满石板小道
你背影融进秋日薄雾
[Singer2 verse]
咖啡凉在窗台一角
等一句迟到的早安
雨滴在玻璃上画问号
像未寄出的心跳
[Singer1 & Singer2 chorus_multi]
时光是无声的邮差
送回泛黄的诺言
当月光漫过旧相框
你仍住在我眼眶


Singer1



Singer2

[00:00] verse1
[00:16] verse2
[00:34] chorus
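
The multi-singer lyrics above assign singers to sections with bracketed tags such as `[Singer1 verse]` and `[Singer1 & Singer2 chorus_multi]`. As a rough illustration of how that notation could be read into a per-section singer schedule, here is a minimal Python sketch. It only parses the tag format visible on this page; it is not the authors' Structure-Aware Singer Prompt implementation, and the names `Segment` and `parse_tagged_lyrics` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Matches a whole-line tag such as "[verse]", "[Singer1 verse]" or
# "[Singer1 & Singer2 chorus_multi]". The singer list is optional.
TAG_RE = re.compile(
    r"^\[(?:(?P<singers>Singer\d+(?:\s*&\s*Singer\d+)*)\s+)?(?P<section>\w+)\]$"
)


@dataclass
class Segment:
    """One structural section and the singers assigned to it (hypothetical type)."""
    section: str
    singers: list
    lines: list = field(default_factory=list)


def parse_tagged_lyrics(text):
    """Split tagged lyrics into segments; lyric lines before the first tag are ignored."""
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_RE.match(line)
        if match:
            # An untagged section such as "[intro]" gets an empty singer list.
            singers = re.findall(r"Singer\d+", match.group("singers") or "")
            segments.append(Segment(match.group("section"), singers))
        elif segments:
            segments[-1].lines.append(line)
    return segments


example = """[intro]
[Singer1 verse]
Yellowed letters folded to boats
Carrying vows from youthful throats
[Singer2 verse]
Old piano keys hum of longing
[Singer1 & Singer2 chorus_multi]
Wind scatters boats to the sea's edge
"""

segments = parse_tagged_lyrics(example)
for seg in segments:
    print(seg.section, seg.singers, len(seg.lines))
```

Each resulting segment pairs a section label with the singers active in it, which is the kind of information a structure-level singer schedule would need; how Tutti actually encodes and consumes this schedule is not specified on this page.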