Legal claims defining the scope of protection, as filed with the USPTO.
2. The model of claim 1, wherein the content encoder generates the latent representation of linguistic content as a discrete per-timestep latent representation of linguistic content that discards speaking style variations in the input speech.
3. The model of claim 1, wherein the content encoder is trained using a content VQ loss based on the latent representations of linguistic content generated for each timestep, the content VQ loss encouraging the content encoder to minimize a distance between an output and a nearest codebook.
5. The model of claim 1, wherein the style encoder is trained using a style regularization loss based on a mean and variance of style latent variables predicted by the style encoder, the style encoder using the style regularization loss to minimize a Kullback-Leibler (KL) divergence between a Gaussian posterior with a unit Gaussian prior.
7. The model of claim 6, wherein the model is trained using a reconstruction loss between the input speech and the reconstruction of the input speech output from the decoder.
10. The computer-implemented method of claim 9, wherein processing the input speech to generate the latent representation of linguistic content comprises processing the input speech to generate the latent representation of linguistic content as a discrete per-timestep latent representation of linguistic content that discards speaking style variations in the input speech.
11. The computer-implemented method of claim 9, wherein the content encoder is trained using a content VQ loss based on the latent representations of linguistic content generated for each timestep, the content VQ loss encouraging the content encoder to minimize a distance between an output and a nearest codebook.
13. The computer-implemented method of claim 9, wherein the style encoder is trained using a style regularization loss based on a mean and variance of style latent variables predicted by the style encoder, the style encoder using the style regularization loss to minimize a Kullback-Leibler (KL) divergence between a Gaussian posterior with a unit Gaussian prior.
15. The computer-implemented method of claim 14, wherein the model is trained using a reconstruction loss between the input speech and the reconstruction of the input speech output from the decoder.
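The three training losses named in the claims above (content VQ loss, style regularization loss, reconstruction loss) can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the squared-Euclidean distance to the nearest codebook vector, the diagonal-Gaussian form of the posterior, and the mean-absolute-error reconstruction distance are all assumptions chosen for illustration, since the claims do not fix these details. A real training setup would also need gradient handling for the quantization step (for example a stop-gradient), which is omitted here.

```python
import math

def content_vq_loss(encoder_outputs, codebook):
    """Mean squared distance from each per-timestep encoder output
    to its nearest codebook vector (distance metric is an assumption)."""
    total = 0.0
    for z in encoder_outputs:
        # Squared Euclidean distance to every codebook vector; keep the smallest.
        total += min(sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook)
    return total / len(encoder_outputs)

def style_kl_loss(mean, variance):
    """KL divergence between a diagonal Gaussian posterior N(mean, variance)
    and a unit Gaussian prior N(0, I), summed over latent dimensions."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mean, variance))

def reconstruction_loss(input_frames, reconstructed_frames):
    """Mean absolute error between input speech frames and the decoder's
    reconstruction (the claims do not specify the distance; this is an assumption)."""
    diffs = [
        abs(a - b)
        for x, y in zip(input_frames, reconstructed_frames)
        for a, b in zip(x, y)
    ]
    return sum(diffs) / len(diffs)
```

In this sketch, a style posterior that already equals the unit Gaussian prior (zero mean, unit variance) gives a style regularization loss of zero, and a content encoder output that coincides with a codebook vector contributes nothing to the content VQ loss.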
July 2, 2024