Training and testing stages of Mimic Blocker. The red arrows mark the computational pathways used to calculate the loss functions.
For an adversarial attack to succeed, the noise injected into the style audio (Adversarial Waveform) should be imperceptible to human listeners, while the converted output (Adversarial Output) should deviate as far as possible from the original speaker's vocal characteristics.
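The two criteria can be made concrete with a small sketch. The metrics here are assumptions chosen for illustration, not necessarily those used by Mimic Blocker: signal-to-noise ratio (SNR) stands in for imperceptibility, and cosine distance between speaker embeddings stands in for deviation from the original voice. The function names `snr_db` and `cosine_distance` and the toy embedding vectors are hypothetical.

```python
import math


def snr_db(clean, adversarial):
    """SNR in dB of the perturbation added to `clean` (higher = less audible).

    Assumes both sequences have the same length and `clean` is not silent.
    """
    signal_power = sum(x * x for x in clean)
    noise_power = sum((a - x) ** 2 for x, a in zip(clean, adversarial))
    if noise_power == 0:
        return math.inf  # no perturbation at all
    return 10.0 * math.log10(signal_power / noise_power)


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy style waveform: a 440 Hz sine sampled at 16 kHz (illustrative values).
style = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
# Toy perturbation: a small alternating offset standing in for learned noise.
adv_style = [x + 0.001 * (-1) ** n for n, x in enumerate(style)]

# Hypothetical speaker embeddings of the two converted outputs.
emb_original_output = [0.9, 0.1, 0.0]
emb_adversarial_output = [0.1, 0.2, 0.9]

print("perturbation SNR (dB):", snr_db(style, adv_style))
print("speaker distance:", cosine_distance(emb_original_output,
                                            emb_adversarial_output))
```

Under these assumptions, a successful attack would show a high SNR on the style input together with a large speaker distance on the converted output.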
Style Waveform | Original Output | Adversarial Waveform | Adversarial Output |
---|---|---|---|
Sample 1 | Sample 1 | Sample 1 | Sample 1 |
Sample 2 | Sample 2 | Sample 2 | Sample 2 |
Sample 3 | Sample 3 | Sample 3 | Sample 3 |
Sample 4 | Sample 4 | Sample 4 | Sample 4 |

Style Waveform | Original Output | Adversarial Waveform | Adversarial Output |
---|---|---|---|
Sample 1 | Sample 1 | Sample 1 | Sample 1 |
Sample 2 | Sample 2 | Sample 2 | Sample 2 |
Sample 3 | Sample 3 | Sample 3 | Sample 3 |
Sample 4 | Sample 4 | Sample 4 | Sample 4 |