Controllable Multi-Speaker Emotional Speech Synthesis With Emotion Representation of High Generalization Capability

1. TTS Samples (English) on Cross-speaker Emotion Transfer on the ESD emotional speech dataset

Emotion | Reference Audio | Target Speaker | The Proposed Model (ours) | Mspk-GST | Mspk-VAE | EDM

Angry

Happy

Surprise

Neutral

Sad

2. TTS Samples (Mandarin) on Cross-speaker Emotion Transfer on the ESD emotional speech dataset

Emotion | Reference Audio | Target Speaker | The Proposed Model (ours) | Mspk-GST | Mspk-VAE | EDM

Angry

Happy

Surprise

Neutral

Sad

3. TTS Samples (Mandarin) on Cross-speaker Emotion Transfer on the DOE emotional speech dataset

Emotion | Reference Audio | Target Speaker | The Proposed Model (ours) | Mspk-GST | Mspk-VAE | EDM

Angry

Happy

Surprise

Sad

4. Extra samples on controllable emotional speech synthesis

Emotion | Target Speaker | Weak | Medium | Strong

Angry

Happy

Surprise

Sad