Self-Supervised Speech Representations for Fake Audio Detection

Published: September 12, 2023

Assignee: not available in the USPTO data we have

Technical Abstract

Patent Claims

16 claims

Legal claims defining the scope of protection, as filed with the USPTO.

3. The method of claim 2, wherein the single final audio feature vector comprises an averaging of each audio feature vector of the plurality of audio feature vectors.

4. The method of claim 2, wherein the single final audio feature vector comprises an aggregate of each audio feature vector of the plurality of audio feature vectors.

5. The method of claim 2, wherein the single fully-connected layer is configured to receive, as input, the single final audio feature vector and generate, as output, the score.

6. The method of claim 1, wherein the shallow discriminator model comprises one of a logistic regression model, a linear discriminant analysis model, or a random forest model.

7. The method of claim 1, wherein the trained self-supervised model is trained on a first training dataset comprising only training samples of human-originated speech.

8. The method of claim 7, wherein the shallow discriminator model is trained on the mixed training utterances and a second training dataset comprising training samples of synthetic speech, the second training dataset smaller than the first training dataset.

9. The method of claim 1, wherein the data processing hardware resides on the user device.

10. The method of claim 1, wherein the trained self-supervised model comprises a representation model derived from a larger trained self-supervised model.

13. The system of claim 12, wherein the single final audio feature vector comprises an averaging of each audio feature vector of the plurality of audio feature vectors.

14. The system of claim 12, wherein the single final audio feature vector comprises an aggregate of each audio feature vector of the plurality of audio feature vectors.

15. The system of claim 12, wherein the single fully-connected layer is configured to receive, as input, the single final audio feature vector and generate, as output, the score.

16. The system of claim 11, wherein the shallow discriminator model comprises one of a logistic regression model, a linear discriminant analysis model, or a random forest model.

17. The system of claim 11, wherein the trained self-supervised model is trained on a first training dataset comprising only training samples of human-originated speech.

18. The system of claim 17, wherein the shallow discriminator model is trained on the mixed training utterances and a second training dataset comprising training samples of synthetic speech, the second training dataset smaller than the first training dataset.

19. The system of claim 11, wherein the data processing hardware resides on the user device.

20. The system of claim 11, wherein the trained self-supervised model comprises a representation model derived from a larger trained self-supervised model.
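
Claims 3 to 5 (and the parallel system claims 13 to 15) describe averaging a plurality of audio feature vectors into a single final vector, which a single fully-connected layer then maps to a score. The following is a minimal sketch of that data flow only, not the patented implementation. The function names, the example numbers, and the sigmoid used to squash the output are assumptions for illustration. The claims shown here do not specify an activation function, and claim 2, on which these claims depend, is not reproduced in this listing.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one final feature vector."""
    if not frames:
        raise ValueError("need at least one feature vector")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all feature vectors must have the same length")
    count = len(frames)
    return [sum(frame[i] for frame in frames) / count for i in range(dim)]


def fully_connected_score(
    vector: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """One fully-connected layer with a single output unit.

    The sigmoid is an assumption made here so the score lands in (0, 1).
    """
    if len(vector) != len(weights):
        raise ValueError("weights must match the feature vector length")
    logit = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical values: two 2-dimensional feature vectors and made-up weights.
frames = [[0.0, 2.0], [2.0, 4.0]]
weights = [1.0, -1.0]
bias = 2.0

pooled = mean_pool(frames)                            # element-wise mean
score = fully_connected_score(pooled, weights, bias)  # single scalar score
print(pooled, score)
```

In a real system the feature vectors would come from the trained self-supervised model and the weights would be learned. A single linear layer followed by a sigmoid has the same functional form as the logistic regression option listed in claims 6 and 16.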

Patent Metadata

Filing Date

Unknown

Publication Date

September 12, 2023

Inventors

Joel Shor

Alanna Foster Slocum
