Training and testing stages of Mimic Blocker. The red arrows represent the computational pathways used to calculate the loss functions.
For an adversarial attack to succeed, the noise-injected style audio (ADVERSARIAL WAVEFORM) must be imperceptible to human listeners, while the converted output (ADVERSARIAL OUTPUT) should deviate as far as possible from the original speaker's vocal characteristics.
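The two requirements above (an imperceptible perturbation, a maximally deviating output) are commonly formalized as a constrained maximization. The sketch below is only an illustration of that general idea, not Mimic Blocker's actual method: it assumes imperceptibility is approximated by an L-infinity bound `eps` on the added noise, and it replaces the voice-conversion pipeline with a hypothetical linear "speaker encoder" `W`, so the gradient of the embedding deviation can be written by hand. `toy_embedding` and `pgd_perturbation` are hypothetical names; the step uses projected sign-gradient ascent (PGD), which is one standard choice among several.

```python
import random


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def sign(x):
    return (x > 0) - (x < 0)


def toy_embedding(W, waveform):
    # Hypothetical stand-in for a speaker encoder: a fixed linear map.
    return matvec(W, waveform)


def embedding_distance(a, b):
    # Squared Euclidean distance between two embeddings.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pgd_perturbation(W, waveform, eps=0.01, step=0.002, iters=50, seed=0):
    """Find noise delta with |delta_i| <= eps that pushes the embedding away.

    Because toy_embedding is linear, the deviation is ||W delta||^2 and its
    gradient with respect to delta is 2 W^T W delta.
    """
    rng = random.Random(seed)
    # Random start inside the eps-ball so the gradient is not identically zero.
    delta = [rng.uniform(-eps, eps) for _ in waveform]
    Wt = transpose(W)
    for _ in range(iters):
        grad = [2 * g for g in matvec(Wt, matvec(W, delta))]
        # Ascent step on the deviation, then project back onto the L-inf ball
        # (the assumed imperceptibility constraint).
        delta = [max(-eps, min(eps, d + step * sign(g))) for d, g in zip(delta, grad)]
    adversarial = [x + d for x, d in zip(waveform, delta)]
    return adversarial, delta


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]
    x = [rng.uniform(-0.5, 0.5) for _ in range(16)]
    adv, delta = pgd_perturbation(W, x)
    print("max |delta|:", max(abs(d) for d in delta))
    print("embedding deviation:", embedding_distance(toy_embedding(W, adv), toy_embedding(W, x)))
```

Each sign step moves every coordinate in the direction of its gradient (or not at all once it hits the bound), so for this convex toy objective the deviation never decreases across iterations. A real system would instead backpropagate through the actual encoder or converter and might use a perceptual rather than an L-infinity constraint.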
| Style Waveform | Original Output | Adversarial Waveform | Adversarial Output |
|---|---|---|---|
| Sample 1 | Sample 1 | Sample 1 | Sample 1 |
| Sample 2 | Sample 2 | Sample 2 | Sample 2 |
| Sample 3 | Sample 3 | Sample 3 | Sample 3 |
| Sample 4 | Sample 4 | Sample 4 | Sample 4 |

| Style Waveform | Original Output | Adversarial Waveform | Adversarial Output |
|---|---|---|---|
| Sample 1 | Sample 1 | Sample 1 | Sample 1 |
| Sample 2 | Sample 2 | Sample 2 | Sample 2 |
| Sample 3 | Sample 3 | Sample 3 | Sample 3 |
| Sample 4 | Sample 4 | Sample 4 | Sample 4 |