11756572

Self-Supervised Speech Representations for Fake Audio Detection

Published: September 12, 2023
Assignee: not available in our USPTO data
Technical Abstract

Patent Claims
16 claims

Legal claims defining the scope of protection, as filed with the USPTO.

3. The method of claim 2, wherein the single final audio feature vector comprises an averaging of each audio feature vector of the plurality of audio feature vectors.
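As an illustration of the averaging recited in claim 3, the following is a minimal sketch, not the patented implementation. It assumes the plurality of audio feature vectors arrives as equal-length lists of floats, for example one per audio frame. The function name `mean_pool` and the toy values are hypothetical.

```python
from typing import List, Sequence


def mean_pool(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length feature vectors element-wise into one vector."""
    if not feature_vectors:
        raise ValueError("need at least one feature vector")
    dim = len(feature_vectors[0])
    if any(len(v) != dim for v in feature_vectors):
        raise ValueError("all feature vectors must have the same length")
    n = len(feature_vectors)
    # Component i of the result is the mean of component i across all vectors.
    return [sum(v[i] for v in feature_vectors) / n for i in range(dim)]


# Toy input: three hypothetical 2-dimensional frame-level feature vectors.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
final_vector = mean_pool(frames)
```

Averaging yields a final vector whose length does not depend on how many frames the utterance contains, which is one plausible reason to pool before a fixed-size classifier.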

4. The method of claim 2, wherein the single final audio feature vector comprises an aggregate of each audio feature vector of the plurality of audio feature vectors.

5. The method of claim 2, wherein the single fully-connected layer is configured to receive, as input, the single final audio feature vector and generate, as output, the score.
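The single fully-connected layer of claim 5 can be pictured as one weighted sum over the final audio feature vector. The sketch below is an assumption-laden illustration: the claim does not state an activation function, so the sigmoid used here to map the output into (0, 1) is our choice, and the name `fully_connected_score`, the weights, and the bias are hypothetical.

```python
import math
from typing import Sequence


def fully_connected_score(
    x: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Map one feature vector to a scalar score with a single dense layer."""
    if len(x) != len(weights):
        raise ValueError("feature vector and weights must have the same length")
    # Single fully-connected layer: one weighted sum plus a bias term.
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    # Sigmoid squashing (an assumption), written to avoid overflow
    # for large-magnitude logits.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Hypothetical weights; with all-zero parameters the score is exactly 0.5.
neutral_score = fully_connected_score([3.0, 4.0], [0.0, 0.0], 0.0)
```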

6. The method of claim 1, wherein the shallow discriminator model comprises one of a logistic regression model, a linear discriminant analysis model, or a random forest model.
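Of the three shallow discriminator options listed in claim 6, logistic regression is the simplest to sketch. The example below trains one by plain batch gradient descent on toy two-dimensional vectors. Everything in it is assumed for illustration: the function names, the learning rate, the epoch count, the toy data, and the label convention (1 for synthetic speech, 0 for human-originated speech). It is not the training procedure described in the patent.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Score in (0, 1); higher means more likely synthetic (assumed convention)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_score(xi, w, b) - yi  # derivative of log loss w.r.t. logit
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy stand-ins for final audio feature vectors (hypothetical values).
X_train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y_train = [1, 1, 0, 0]  # 1 = synthetic, 0 = human-originated (assumed labels)
w, b = train_logistic_regression(X_train, y_train)
```

A new vector is then scored with `predict_score(vector, w, b)` and compared against a threshold chosen by the caller.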

7. The method of claim 1, wherein the trained self-supervised model is trained on a first training dataset comprising only training samples of human-originated speech.

8. The method of claim 7, wherein the shallow discriminator model is trained on the mixed training utterances and a second training dataset comprising training samples of synthetic speech, the second training dataset smaller than the first training dataset.

9. The method of claim 1, wherein the data processing hardware resides on the user device.

10. The method of claim 1, wherein the trained self-supervised model comprises a representation model derived from a larger trained self-supervised model.

13. The system of claim 12, wherein the single final audio feature vector comprises an averaging of each audio feature vector of the plurality of audio feature vectors.

14. The system of claim 12, wherein the single final audio feature vector comprises an aggregate of each audio feature vector of the plurality of audio feature vectors.

15. The system of claim 12, wherein the single fully-connected layer is configured to receive, as input, the single final audio feature vector and generate, as output, the score.

16. The system of claim 11, wherein the shallow discriminator model comprises one of a logistic regression model, a linear discriminant analysis model, or a random forest model.

17. The system of claim 11, wherein the trained self-supervised model is trained on a first training dataset comprising only training samples of human-originated speech.

18. The system of claim 17, wherein the shallow discriminator model is trained on the mixed training utterances and a second training dataset comprising training samples of synthetic speech, the second training dataset smaller than the first training dataset.

19. The system of claim 11, wherein the data processing hardware resides on the user device.

20. The system of claim 11, wherein the trained self-supervised model comprises a representation model derived from a larger trained self-supervised model.

Patent Metadata

Filing Date

Unknown

Publication Date

September 12, 2023

Inventors

Joel Shor
Alanna Foster Slocum

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Self-Supervised Speech Representations for Fake Audio Detection” (11756572). https://patentable.app/patents/11756572
