Unsupervised Learning of Disentangled Speech Content and Style Representation

Published: July 2, 2024

Assignee: not available in the USPTO data we have

Inventors: Ruoming Pang, Andros Tjandra, Yu Zhang, Shigeki Karita

Patent Claims

8 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The model of claim 1, wherein the content encoder generates the latent representation of linguistic content as a discrete per-timestep latent representation of linguistic content that discards speaking style variations in the input speech.

3. The model of claim 1, wherein the content encoder is trained using a content VQ loss based on the latent representations of linguistic content generated for each timestep, the content VQ loss encouraging the content encoder to minimize a distance between an output and a nearest codebook.
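As an illustration of the kind of loss claim 3 describes, the following is a minimal sketch, not the patented implementation. It assumes squared Euclidean distance and a mean over timesteps, neither of which the claim specifies. The names nearest_code and content_vq_loss, the toy codebook, and the toy encoder outputs are hypothetical. The sketch only computes the loss value; how gradients are routed during training is outside its scope.

```python
def nearest_code(vector, codebook):
    """Return (index, squared distance) of the codebook entry closest to vector.

    Assumption: squared Euclidean distance; the claim does not name a metric.
    """
    best_index, best_dist = -1, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def content_vq_loss(encoder_outputs, codebook):
    """Mean distance between each per-timestep output and its nearest code.

    Assumption: averaging over timesteps is an illustrative choice.
    """
    distances = [nearest_code(frame, codebook)[1] for frame in encoder_outputs]
    return sum(distances) / len(distances)


# Hypothetical toy data: a two-entry codebook and three timesteps of encoder output.
codebook = [[0.0, 0.0], [1.0, 1.0]]
encoder_outputs = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]

# Discrete per-timestep representation: the index of the nearest code per frame.
indices = [nearest_code(frame, codebook)[0] for frame in encoder_outputs]
loss = content_vq_loss(encoder_outputs, codebook)
```

The list of nearest-code indices is one way to read the "discrete per-timestep latent representation" of claim 2: each frame is replaced by a codebook index, and the loss shrinks as encoder outputs move toward their selected codes.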

5. The model of claim 1, wherein the style encoder is trained using a style regularization loss based on a mean and variance of style latent variables predicted by the style encoder, the style encoder using the style regularization loss to minimize a Kullback-Leibler (KL) divergence between a Gaussian posterior with a unit Gaussian prior.
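For reference, the KL divergence between a Gaussian posterior and a unit Gaussian prior has a well-known closed form when the posterior has diagonal covariance. The sketch below assumes that diagonal form, which the claim does not state, and the function name style_kl_loss is hypothetical. Per dimension, the term is 0.5 * (mean^2 + variance - log(variance) - 1).

```python
import math


def style_kl_loss(mean, variance):
    """KL( N(mean, diag(variance)) || N(0, I) ) in closed form.

    Assumption: the posterior covariance is diagonal, so the divergence
    is a sum of independent per-dimension terms.
    """
    return 0.5 * sum(
        m * m + v - math.log(v) - 1.0 for m, v in zip(mean, variance)
    )


# A posterior that already equals the unit Gaussian prior has zero divergence.
zero_kl = style_kl_loss([0.0, 0.0], [1.0, 1.0])

# Shifting the mean of one dimension by 1 adds 0.5 to the divergence.
shifted_kl = style_kl_loss([1.0], [1.0])
```

Minimizing this quantity pulls the predicted mean toward zero and the predicted variance toward one, which is the regularizing effect the claim attributes to the style regularization loss.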

7. The model of claim 6, wherein the model is trained using a reconstruction loss between the input speech and the reconstruction of the input speech output from the decoder.
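Claim 7 does not say which distance the reconstruction loss uses. As a minimal sketch under the assumption of mean squared error over per-timestep feature frames, with the hypothetical function name reconstruction_loss:

```python
def reconstruction_loss(input_frames, reconstructed_frames):
    """Mean squared error between input frames and decoder output frames.

    Assumption: both arguments are equal-length lists of equal-length
    feature vectors; the claim does not fix the distance measure.
    """
    total, count = 0.0, 0
    for original, reconstructed in zip(input_frames, reconstructed_frames):
        for a, b in zip(original, reconstructed):
            total += (a - b) ** 2
            count += 1
    return total / count


# Hypothetical one-frame example with two feature values.
perfect = reconstruction_loss([[1.0, 2.0]], [[1.0, 2.0]])
imperfect = reconstruction_loss([[0.0, 0.0]], [[1.0, 3.0]])
```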

10. The computer-implemented method of claim 9, wherein processing the input speech to generate the latent representation of linguistic content comprises processing the input speech to generate the latent representation of linguistic content as a discrete per-timestep latent representation of linguistic content that discards speaking style variations in the input speech.

11. The computer-implemented method of claim 9, wherein the content encoder is trained using a content VQ loss based on the latent representations of linguistic content generated for each timestep, the content VQ loss encouraging the content encoder to minimize a distance between an output and a nearest codebook.

13. The computer-implemented method of claim 9, wherein the style encoder is trained using a style regularization loss based on a mean and variance of style latent variables predicted by the style encoder, the style encoder using the style regularization loss to minimize a Kullback-Leibler (KL) divergence between a Gaussian posterior with a unit Gaussian prior.

15. The computer-implemented method of claim 14, wherein the model is trained using a reconstruction loss between the input speech and the reconstruction of the input speech output from the decoder.

Patent Metadata

Filing Date

Unknown
