US-10659588

Methods and systems for automatic discovery of fraudulent calls using speaker recognition

Published May 19, 2020

Assignee not available in the USPTO data we have

Inventors not available in the USPTO data we have

Technical Abstract

A computer-implemented method for determining potentially undesirable voices, according to some embodiments, includes: receiving a plurality of audio recordings, the plurality of audio recordings comprising voices associated with undesirable activity, and determining a plurality of audio components of each of the plurality of audio recordings. The method may further comprise generating a multi-dimensional vector of audio components, from the plurality of audio components, for each of the plurality of audio recordings to generate a plurality of multi-dimensional vectors of audio components, and comparing audio components between the plurality of multi-dimensional vectors of audio components to determine a plurality of clusters of multi-dimensional vectors, each cluster of the plurality of clusters comprising two or more of the plurality of multi-dimensional vectors of audio components, wherein each cluster of the plurality of clusters corresponds to a blacklisted voice. The method may further comprise receiving an audio recording or audio stream, and determining whether the audio recording or audio stream is associated with a voice associated with undesirable activity based on a comparison to the plurality of clusters.
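The flow described in the abstract (vectors, then clusters, then comparison of a new recording against the clusters) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patent does not prescribe this code, the function names `cluster_voices` and `match_cluster` are hypothetical, the thresholds are arbitrary, and the vectors are taken as already extracted. Average-linkage agglomerative clustering and cosine similarity are used because the claims name those techniques as options, not because they are the only ones covered.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def avg_link(c1, c2, vecs):
    """Average pairwise similarity between the members of two clusters."""
    total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_voices(vecs, merge_threshold=0.8):
    """Greedy agglomerative clustering (average linkage).

    merge_threshold is an assumed value. Returns lists of vector indices;
    only clusters with two or more members are kept, as in the abstract.
    """
    clusters = [[i] for i in range(len(vecs))]
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best_score, (a, b) = max(
            (avg_link(clusters[a], clusters[b], vecs), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if best_score < merge_threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [c for c in clusters if len(c) >= 2]


def match_cluster(query, clusters, vecs, match_threshold=0.8):
    """Return the index of the best-matching cluster, or None if no
    cluster reaches match_threshold (an assumed value)."""
    best, best_score = None, match_threshold
    for idx, members in enumerate(clusters):
        score = sum(cosine(query, vecs[i]) for i in members) / len(members)
        if score >= best_score:
            best, best_score = idx, score
    return best
```

Under these assumptions, a new recording whose vector returns a cluster index from `match_cluster` would be flagged as matching a blacklisted voice, while a result of `None` would leave it unflagged.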

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for determining voices associated with undesirable activity, comprising: receiving a plurality of audio recordings, the plurality of audio recordings comprising voices associated with undesirable activity; determining a plurality of audio components of each of the plurality of audio recordings; generating a multi-dimensional vector of audio components, from the plurality of audio components, for each of the plurality of audio recordings to generate a plurality of multi-dimensional vectors of audio components; comparing audio components between the plurality of multi-dimensional vectors of audio components to determine a plurality of clusters of multi-dimensional vectors, each cluster of the plurality of clusters comprising two or more of the plurality of multi-dimensional vectors of audio components, wherein each cluster of the plurality of clusters corresponds to a blacklisted voice; determining at least one erroneous cluster in the plurality of clusters by filtering the plurality of clusters, according to one or more predetermined thresholds, based on at least one of: cluster size, cluster coherence, cluster member metadata, a distribution of similarity scores, or a proportion of audio recordings associated with each of the plurality of clusters confirmed to be associated with undesirable activity; removing the erroneous cluster from the plurality of clusters; receiving an audio recording or audio stream; and determining whether the audio recording or audio stream is associated with a voice associated with undesirable activity based on a comparison to the plurality of clusters.

2. The method of claim 1, further comprising: performing a nearest neighbor search on each of the plurality of multi-dimensional vectors of audio components to determine a nearest neighbor metric; and discarding one or more of the plurality of audio recordings in response to determining that the one or more of the plurality of audio recordings has a nearest neighbor metric smaller than a predetermined nearest neighbor metric.

3. The method of claim 2, wherein comparing audio components between the plurality of multi-dimensional vectors of audio components comprises determining a minimum similarity between each of the plurality of multi-dimensional vectors of audio components and/or determining an average similarity between each of the plurality of multi-dimensional vectors of audio components.

4. The method of claim 1, wherein determining whether an audio recording or audio stream is associated with a voice associated with undesirable activity comprises: receiving an audio recording, the audio recording comprising a voice; determining a plurality of audio components of the audio recording; generating a multi-dimensional vector of audio components of the audio recording; determining whether the multi-dimensional vector of audio components matches with one of the plurality of clusters of multi-dimensional vectors; and in response to determining that the multi-dimensional vector of audio components clusters with one of the plurality of clusters of the multi-dimensional vectors, flagging the audio recording as being associated with a voice associated with undesirable activity.

5. The method of claim 1, wherein the plurality of clusters of multi-dimensional vectors is determined without human assistance.

6. The method of claim 1, wherein each multi-dimensional vector of audio components of the plurality of multi-dimensional vectors of audio components comprises a fixed number of dimensions.

7. The method of claim 1, wherein comparing audio components between the plurality of multi-dimensional vectors of audio components further comprises performing agglomerative hierarchical clustering across the plurality of audio recordings.

8. The method of claim 1, further comprising: upon determining that the audio recording or audio stream is associated with a voice associated with undesirable activity, determining a genuine account based on the audio recording or audio stream; determining the verified owner of the genuine account; and generating an alert to the verified owner of the genuine account.

9. The method of claim 1, wherein comparing audio components between the plurality of multi-dimensional vectors of audio components further comprises concatenating metadata, associated with the plurality of audio recordings, with the plurality of multi-dimensional vectors.

10. The method of claim 1, wherein the comparison to the plurality of clusters comprises determining a cosine similarity or log-likelihood ratio.

11. The method of claim 1, further comprising: storing the plurality of clusters of multi-dimensional vectors in an undesirable voice data store.

12. A computer-implemented method for determining voices associated with undesirable activity, comprising: receiving an audio recording, the audio recording being associated with a voice associated with undesirable activity; determining a plurality of audio components of the audio recording; generating a multi-dimensional vector of audio components of the audio recording; determining whether the multi-dimensional vector of audio components matches with one of a plurality of clusters of multi-dimensional vectors, the plurality of clusters of multi-dimensional vectors being associated with a plurality of audio recordings, the plurality of audio recordings comprising voices associated with undesirable activity, the plurality of clusters of multi-dimensional vectors being automatically generated based on a plurality of audio recordings without human intervention; determining at least one erroneous cluster in the plurality of clusters of multi-dimensional vectors by filtering the plurality of clusters of multi-dimensional vectors, according to one or more predetermined thresholds, based on at least one of: cluster size, cluster coherence, cluster member metadata, a distribution of similarity scores, or a proportion of audio recordings associated with each of the plurality of clusters of multi-dimensional vectors confirmed to be associated with undesirable activity; removing the erroneous cluster from the plurality of clusters of multi-dimensional vectors; and in response to determining that the multi-dimensional vector of audio components matches one of the plurality of clusters of the multi-dimensional vectors, flagging the audio recording as potentially undesirable.

13. The method of claim 12, wherein determining that the multi-dimensional vector of audio components matches one of the plurality of clusters comprises determining a cosine similarity or log-likelihood ratio.

14. The method of claim 12, wherein each of the plurality of audio recordings contains a single distinct voice.

15. The method of claim 12, wherein the multi-dimensional vector of audio components comprises a fixed number of dimensions.

16. The method of claim 12, wherein determining whether the multi-dimensional vector of audio components matches with one of a plurality of clusters of multi-dimensional vectors is determined without human assistance.

17. The method of claim 12, further comprising: upon determining that the audio recording or audio stream is associated with a potentially undesirable voice, determining a genuine account based on the audio recording or audio stream; determining the verified owner of the genuine account; and generating an alert to the verified owner of the genuine account.

18. The method of claim 12, wherein the multi-dimensional vector of audio components is stored in a voice data store.

19. A computer system for determining potentially undesirable voices, the computer system comprising: a memory storing instructions; and one or more processors configured to execute the instructions to perform operations including: receiving a plurality of audio recordings, the plurality of audio recordings comprising voices associated with undesirable activity; determining a plurality of audio components of each of the plurality of audio recordings; generating a multi-dimensional vector of audio components, from the plurality of audio components, for each of the plurality of audio recordings to generate a plurality of multi-dimensional vectors of audio components; comparing audio components between the plurality of multi-dimensional vectors of audio components to determine a plurality of clusters of multi-dimensional vectors, each cluster of the plurality of clusters comprising two or more of the plurality of multi-dimensional vectors of audio components, wherein each cluster of the plurality of clusters corresponds to a blacklisted voice; determining at least one erroneous cluster in the plurality of clusters by filtering the plurality of clusters, according to one or more predetermined thresholds, based on at least one of: cluster size, cluster coherence, cluster member metadata, a distribution of similarity scores, or a proportion of audio recordings associated with each of the plurality of clusters confirmed to be associated with undesirable activity; removing the erroneous cluster from the plurality of clusters; receiving an audio recording or audio stream; and determining whether the audio recording or audio stream is associated with a voice associated with undesirable activity based on a comparison to the plurality of clusters.
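The two pruning steps recited in the claims (discarding recordings by a nearest neighbor metric, and removing erroneous clusters by thresholds on size, coherence, or the proportion of confirmed recordings) might look roughly like the sketch below. It is one possible reading, not the patented implementation: the names `discard_isolated`, `coherence`, and `filter_clusters` are hypothetical, every threshold value is an assumption, the nearest neighbor metric is taken to be the highest cosine similarity to any other vector, and coherence is taken to be the minimum pairwise similarity within a cluster.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def discard_isolated(vecs, min_nn_similarity=0.5):
    """Return indices of vectors whose nearest neighbor similarity is at
    least min_nn_similarity (assumed threshold); the rest are discarded."""
    kept = []
    for i, v in enumerate(vecs):
        nn = max(
            (cosine(v, w) for j, w in enumerate(vecs) if j != i),
            default=0.0,
        )
        if nn >= min_nn_similarity:
            kept.append(i)
    return kept


def coherence(members, vecs):
    """Minimum pairwise similarity inside a cluster (assumed definition)."""
    if len(members) < 2:
        return 1.0
    return min(cosine(vecs[i], vecs[j]) for i, j in combinations(members, 2))


def filter_clusters(clusters, vecs, confirmed,
                    max_size=50, min_coherence=0.7, min_confirmed=0.5):
    """Split clusters into (kept, removed) using assumed thresholds.

    confirmed is a set of vector indices whose recordings were confirmed
    to be associated with undesirable activity.
    """
    kept, removed = [], []
    for members in clusters:
        frac = sum(1 for i in members if i in confirmed) / len(members)
        ok = (
            len(members) <= max_size
            and coherence(members, vecs) >= min_coherence
            and frac >= min_confirmed
        )
        (kept if ok else removed).append(members)
    return kept, removed
```

The claims require filtering on at least one of the listed criteria; this sketch applies three of them together only to show how each could be computed.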

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L, H04M

Patent Metadata

Filing Date

March 21, 2019

Publication Date

May 19, 2020
