Controllable Multi-Speaker Emotional Speech Synthesis With Emotion Representation of High Generalization Capability


1. TTS Samples (English): Cross-Speaker Emotion Transfer on the ESD Emotional Speech Dataset

Emotion | Reference Audio | Target Speaker | Proposed Model (Ours) | Mspk-GST | Mspk-VAE | EDM
Angry
Happy
Surprise
Neutral
Sad

2. TTS Samples (Mandarin): Cross-Speaker Emotion Transfer on the ESD Emotional Speech Dataset

Emotion | Reference Audio | Target Speaker | Proposed Model (Ours) | Mspk-GST | Mspk-VAE | EDM
Angry
Happy
Surprise
Neutral
Sad

3. TTS Samples (Mandarin): Cross-Speaker Emotion Transfer on the DOE Emotional Speech Dataset

Emotion | Reference Audio | Target Speaker | Proposed Model (Ours) | Mspk-GST | Mspk-VAE | EDM
Angry
Happy
Surprise
Sad

4. Additional Samples: Controllable Emotional Speech Synthesis at Different Emotion Intensities

Emotion | Target Speaker | Weak | Medium | Strong
Angry
Happy
Surprise
Sad