12027151

Unsupervised Learning of Disentangled Speech Content and Style Representation

Published: July 2, 2024
Assignee: not available in the USPTO data we have
Patent Claims
8 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The model of claim 1, wherein the content encoder generates the latent representation of linguistic content as a discrete per-timestep latent representation of linguistic content that discards speaking style variations in the input speech.

3. The model of claim 1, wherein the content encoder is trained using a content VQ loss based on the latent representations of linguistic content generated for each timestep, the content VQ loss encouraging the content encoder to minimize a distance between an output and a nearest codebook.
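As an illustration of the vector-quantization idea in claims 2 and 3, the sketch below maps each per-timestep encoder output to its nearest codebook entry and averages the resulting distances. It is not the patented implementation: the function names, the squared Euclidean distance, and the toy values are assumptions made for this example, and the stop-gradient handling that a trainable VQ layer would normally need is left out.

```python
# Illustrative sketch only: names, distance metric, and data are assumptions.

def nearest_code(vector, codebook):
    """Return (index, squared distance) of the codebook entry closest to vector."""
    best_index, best_dist = -1, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def content_vq_loss(encoder_outputs, codebook):
    """Mean squared distance from each per-timestep output to its nearest code."""
    distances = [nearest_code(output, codebook)[1] for output in encoder_outputs]
    return sum(distances) / len(distances)


# Toy codebook with two entries and three timesteps of encoder output.
codebook = [[0.0, 0.0], [1.0, 1.0]]
encoder_outputs = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]

# Discrete per-timestep representation: one codebook index per timestep.
indices = [nearest_code(output, codebook)[0] for output in encoder_outputs]
loss = content_vq_loss(encoder_outputs, codebook)
```

The list `indices` plays the role of the discrete per-timestep content representation, and `loss` shrinks as encoder outputs move closer to their assigned codebook entries.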

5. The model of claim 1, wherein the style encoder is trained using a style regularization loss based on a mean and variance of style latent variables predicted by the style encoder, the style encoder using the style regularization loss to minimize a Kullback-Leibler (KL) divergence between a Gaussian posterior with a unit Gaussian prior.
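The KL term in claim 5 has a well-known closed form when the posterior is a Gaussian with diagonal covariance and the prior is a unit Gaussian: KL = 0.5 * sum(mean^2 + variance - log(variance) - 1). The sketch below computes it in plain Python. The diagonal-covariance assumption and the function name `kl_to_unit_gaussian` are choices made for this example, not details stated in the claim.

```python
import math


def kl_to_unit_gaussian(mean, variance):
    """KL( N(mean, diag(variance)) || N(0, I) ), summed over latent dimensions.

    Assumes a diagonal-covariance posterior (an assumption of this sketch)
    and strictly positive variances.
    """
    return 0.5 * sum(
        m * m + v - math.log(v) - 1.0 for m, v in zip(mean, variance)
    )


# A posterior that already matches the prior incurs no penalty.
matched = kl_to_unit_gaussian([0.0, 0.0], [1.0, 1.0])

# Shifting the mean by 1 in a single dimension costs 0.5 nats.
shifted = kl_to_unit_gaussian([1.0], [1.0])
```

Minimizing this quantity pulls the predicted style mean toward zero and the variance toward one, which is the regularizing effect the claim describes.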

7. The model of claim 6, wherein the model is trained using a reconstruction loss between the input speech and the reconstruction of the input speech output from the decoder.
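Claim 7 does not say which distance the reconstruction loss uses. As one possible reading, the sketch below uses mean squared error over per-frame feature vectors. The choice of MSE, the frame-by-feature layout, and the function name are all assumptions for illustration.

```python
def reconstruction_loss(input_frames, reconstructed_frames):
    """Mean squared error between input and reconstructed feature frames.

    Both arguments are lists of equal-length feature vectors, one per
    timestep. MSE is an assumed choice; the claim names no specific metric.
    """
    total, count = 0.0, 0
    for target, output in zip(input_frames, reconstructed_frames):
        for t, o in zip(target, output):
            total += (t - o) ** 2
            count += 1
    return total / count


# A perfect reconstruction gives zero loss.
perfect = reconstruction_loss([[1.0, 2.0]], [[1.0, 2.0]])

# Errors of 1 and 3 in a single two-feature frame give (1 + 9) / 2 = 5.
imperfect = reconstruction_loss([[0.0, 0.0]], [[1.0, 3.0]])
```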

10. The computer-implemented method of claim 9, wherein processing the input speech to generate the latent representation of linguistic content comprises processing the input speech to generate the latent representation of linguistic content as a discrete per-timestep latent representation of linguistic content that discards speaking style variations in the input speech.

11. The computer-implemented method of claim 9, wherein the content encoder is trained using a content VQ loss based on the latent representations of linguistic content generated for each timestep, the content VQ loss encouraging the content encoder to minimize a distance between an output and a nearest codebook.

13. The computer-implemented method of claim 9, wherein the style encoder is trained using a style regularization loss based on a mean and variance of style latent variables predicted by the style encoder, the style encoder using the style regularization loss to minimize a Kullback-Leibler (KL) divergence between a Gaussian posterior with a unit Gaussian prior.

15. The computer-implemented method of claim 14, wherein the model is trained using a reconstruction loss between the input speech and the reconstruction of the input speech output from the decoder.

Patent Metadata

Filing Date

Unknown

Publication Date

July 2, 2024

Inventors

Ruoming Pang
Andros Tjandra
Yu Zhang
Shigeki Karita

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Unsupervised Learning of Disentangled Speech Content and Style Representation” (12027151). https://patentable.app/patents/12027151
