US-6496800

Speaker verification system and method using spoken continuous, random length digit string

Published: December 17, 2002

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

A speaker verification system using the voice of a user uttering a continuous, random length digit string is provided. The speaker verification system includes a random digit generator for generating a continuous, random length digit string; a user interface for providing the continuous, random length digit string; a feature extractor for extracting voice features from the user's voice uttering the continuous, random length digit string; a digit voice verification unit for comparing the voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which match the voice features, and for determining whether the derived digit string is identical to the digit string provided to the user via the user interface; and a speaker verification unit for comparing the voice features with a speaker-dependent model of the user to measure the similarity between them. The speaker-dependent model of the user includes previously determined features of the user's voice, and the speaker verification unit determines whether to approve or reject the user based on the similarity.

Patent Claims

15 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A speaker verification system for verifying a user by the voice of the user uttering a continuous, random length digit string, the speaker verification system comprising: a random digit generator for generating a continuous, random length digit string; a user interface for providing the continuous, random length digit string, which is generated by the random digit generator, to the user and receiving the voice of the user uttering the provided continuous, random length digit string; a feature extractor for extracting voice features from the user's voice which is received via the user interface; a digit voice verification unit for comparing the voice features, which are extracted by the feature extractor, with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which match the voice features, and for determining whether the derived digit string is identical to the digit string, which has been provided to the user via the user interface; and a speaker verification unit for comparing the voice features, which are extracted by the feature extractor, with a speaker-dependent model of the user to measure the similarity between them, the speaker-dependent model of the user including previously determined features of the users' voice, and for determining whether to approve or reject the user based on the similarity, when it is determined that the derived digit string is identical to the digit string which has been provided to the user.
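
The two-stage gate in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the length bounds, the function names, and the `measure_similarity` and `decide` callables are all hypothetical stand-ins for the random digit generator, the similarity measurement, and the approval rule, and the recognized digit string is assumed to come from a separate recognizer.

```python
import secrets


def generate_digit_string(min_len=4, max_len=8):
    """Return a digit string whose length is itself chosen at random.

    The bounds are illustrative assumptions; the claim fixes no values.
    """
    if min_len < 1 or max_len < min_len:
        raise ValueError("require 1 <= min_len <= max_len")
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(str(secrets.randbelow(10)) for _ in range(length))


def verify_user(prompted, recognized, measure_similarity, decide):
    """Check the spoken digits first, then the speaker.

    prompted:           digit string shown to the user
    recognized:         digit string derived from the utterance
    measure_similarity: hypothetical callable returning the speaker score
    decide:             hypothetical callable mapping the score to a bool
    """
    if recognized != prompted:
        # Digit check failed, so the speaker model is never consulted.
        return False
    return bool(decide(measure_similarity()))
```

Because the prompt changes in both content and length on each attempt, a recording of an earlier session is unlikely to pass the first stage.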

2. The speaker verification system of claim 1, wherein the speaker verification unit comprises: a similarity measuring unit for comparing the voice features, which are extracted by the feature extractor, with the speaker-dependent model of the user to measure the similarity between them, when it is determined that the derived digit string is identical to the digit string which has been provided to the user; and a controller for determining whether to approve or reject the user based on the similarity which is measured by the similarity measuring unit and for constructing the speaker information of the user in the speaker-dependent model using the voice features, which are extracted by the feature extractor, when it is determined to approve the user.

3. The speaker verification system of claim 1, wherein the verification unit comprises: a similarity measuring unit for comparing the voice features, which are extracted by the feature extractor, with the speaker-dependent model of the user to measure the similarity between them, when it is determined that the derived digit string is identical to the digit string which has been provided to the user; and a controller for determining whether the user's voice is misappropriated using similarity deviation, while determining whether to approve or reject the user based on the similarity measured by the similarity measuring unit.

4. A speaker verification method for verifying a user by the voice of a user uttering a continuous, random length digit voice, the speaker verification method comprising the steps of: (a) randomly generating a continuous, random length digit string; (b) providing the continuous, random length digit string to the user; (c) receiving the voice of the user uttering the continuous, random length digit string; (d) extracting voice features from the received user's voice; (e) comparing the extracted voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which match the voice features, and determining whether the derived digit string is identical to the digit string provided to the user in the step (b); and (f) comparing the voice features extracted in the step (d), with a speaker-dependent model of the user and determining whether to approve or reject the user.

5. The speaker verification method of claim 4, wherein the step (f) comprises the step of reconstructing the speaker information of the user in the speaker-dependent model using the voice features which are extracted in the step (d), when determining to approve the user.

6. A speaker verification method using continuous, random length digit voice, the speaker verification method comprising the steps of: (a) providing a continuous, random length digit string, which is randomly generated, to a user; (b) receiving the voice of the user uttering the continuous, random length digit string and extracting voice features from the received user's voice; (c) comparing the extracted voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which are matched with the voice features, and determining whether the derived digit string is identical to the digit string, which has been provided to the user in the step (a); (d) comparing the voice features, which has been extracted in the step (c), with speaker model of the user and measuring the similarity between the voice features and the speaker model of the user, the similarity indicating the difference between phonetic values; (e) increasing a first speaker rejection count when the similarity, which is measured in the step (d), is greater than or equal to a predetermined lower similarity threshold and increasing a second speaker rejection count when the similarity is greater than or equal to a predetermined upper similarity threshold; and (f) after repeating the steps (a) through (e) a plurality of times, (f1) approving the user when the first speaker rejection count is 0, and rejecting the user when the second speaker rejection count is at least 1 or the first speaker rejection count exceeds a predetermined rejection count threshold; and (f2) determining whether to approve or reject the user based on the similarity measured in the step (d), when the second speaker rejection count is 0, and the first speaker rejection count is at least 1 and less than or equal to the rejection count threshold.
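
Steps (e) and (f) of claim 6 amount to a small decision table over two counters. The Python sketch below is one reading of that table, not the patentee's code. It takes the claim's wording literally, so each score is a difference measure (larger means less like the enrolled speaker). The names `Outcome`, `count_rejections`, and `decide`, and the requirement that the lower threshold not exceed the upper one, are assumptions made for illustration.

```python
from enum import Enum


class Outcome(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    FURTHER_CHECK = "further_check"  # step (f2): decide from the similarity itself


def count_rejections(scores, lower, upper):
    """Step (e): count attempts at or above each threshold."""
    first = sum(1 for s in scores if s >= lower)
    second = sum(1 for s in scores if s >= upper)
    return first, second


def decide(scores, lower, upper, max_rejections):
    """Step (f): apply the counters after several prompt/response rounds."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    first, second = count_rejections(scores, lower, upper)
    if second >= 1 or first > max_rejections:
        return Outcome.REJECT          # (f1) rejection branch
    if first == 0:
        return Outcome.APPROVE         # (f1) approval branch
    return Outcome.FURTHER_CHECK       # (f2): 1 <= first <= max_rejections
```

For example, with `lower=0.5`, `upper=0.9`, and `max_rejections=1`, the scores `[0.6, 0.1, 0.1]` fall into the intermediate case that claims 8 and 9 go on to refine.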

7. The speaker verification method of claim 6, wherein the step (f) further comprises the step of reconstructing the speaker information of the user in the speaker-dependent model using the voice features extracted in the step (d), when it is determined that the user is approved.

8. The speaker verification method of claim 6, wherein the step (f2) comprises the steps of: (f21) calculating deviation of the similarity measured in the step (d) when the second speaker rejection count is 0, and the first speaker rejection count is at least 1 and less than or equal to the rejection count threshold; and (f22) approving the user when the deviation of the similarity is greater than or equal to a predetermined similarity deviation threshold, and determining whether to approve or reject the user after verifying whether the user's voice is misappropriated when the deviation of the similarity is smaller than the similarity deviation threshold.

9. The speaker verification method of claim 8, wherein the step (f22) comprises the steps of: (f221) approving the user when the deviation of the similarity is greater than or equal to the predetermined similarity deviation threshold, and requesting the user to repeat uttering of the continuous, random length digit string, which has already been input, and receiving the voice of the user uttering the digit string when the deviation of the similarity is smaller than the predetermined similarity deviation threshold; (f222) measuring similarity between the features of the user's voice, which is previously input, with features of the user's voice, which is input later; and (f223) rejecting the user when the similarity, which is measured in the step (f222), is smaller than a predetermined mechanical sound similarity threshold, and approving the user when the similarity, which is measured in the step (f222), is greater than or equal to the predetermined mechanical sound similarity threshold.
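
Claims 8 and 9 describe a check for played-back ("mechanical") sound: unusually uniform scores trigger a repeat of the same digit string, and two utterances that are nearly identical are treated as a recording. The Python sketch below shows one way this could look. Several choices are assumptions rather than anything stated in the claims: "deviation" is taken to be the population standard deviation, feature vectors are assumed to be equal-length lists of floats that are already time-aligned, and the similarity between the two utterances is computed as a mean absolute difference, following claim 6's statement that similarity indicates a difference.

```python
import statistics


def needs_replay_check(scores, deviation_threshold):
    """Steps (f21)/(f221): low spread across attempts is suspicious.

    Returns True when the user should be asked to repeat a digit string.
    """
    return statistics.pstdev(scores) < deviation_threshold


def passes_replay_check(first_features, repeat_features, mechanical_threshold):
    """Steps (f222)/(f223): compare the earlier and the repeated utterance.

    A live speaker never repeats a phrase exactly, so a difference below
    the threshold is treated as played-back sound and rejected.
    """
    if not first_features or len(first_features) != len(repeat_features):
        raise ValueError("feature vectors must be non-empty and equal length")
    difference = sum(
        abs(a - b) for a, b in zip(first_features, repeat_features)
    ) / len(first_features)
    return difference >= mechanical_threshold
```

A real system would need to align the two utterances in time before comparing them; that step is omitted here.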

10. The speaker verification method of claim 9, further comprising the step of reconstructing the speaker information of the user in the speaker-dependent model using the voice features extracted in the step (b), when it is determined that the user is approved.

11. A speaker registration method in a speaker verification system, the speaker registration method comprising the steps of: (a) providing a continuous digit string having various phonetic values to a user; (b) receiving the voice of the user uttering the continuous digit string; (c) extracting voice features from the received user's voice; (d) comparing the extracted voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which match the voice features, and determining whether the derived digit string is identical to the digit string provided to the user in the step (a); (e) comparing the voice features extracted in the step (c), with speaker-dependent model of the user and measuring the similarity between the voice features and the speaker dependent model of the user; and (f) determining whether to register the user based on the measured similarity.

12. The speaker registration method of claim 11, wherein the steps (a) through (e) are repeated until a sufficient model of the user is constructed.

13. A computer-readable recording medium for recording a program which is executed in a computer for speaker verification for verifying a user by the voice of the user uttering continuous, random length digit string, wherein the program comprises the steps of: (a) randomly generating a continuous, random length digit string; (b) providing the continuous, random length digit string to the user; (c) receiving the voice of the user uttering the continuous, random length digit string; (d) extracting voice features from the received user's voice; (e) comparing the extracted voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which match the voice features, and determining whether the derived digit string is identical to the digit string provided to the user in the step (b); and (f) comparing the voice features extracted in the step (d), with a speaker-dependent model of the user and determining whether to approve or reject the user.

14. A computer-readable recording medium for recording a program which is executed in a computer for speaker verification for verifying a user by the voice of the user uttering continuous, random length digit string, wherein the program comprises the steps of: (a) providing a continuous, random length digit string, which is randomly generated, to a user; (b) receiving the voice of the user uttering the continuous, random length digit string and extracting voice features from the received user's voice; (c) comparing the extracted voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which are matched with the voice features, and determining whether the derived digit string is identical to the digit string, which has been provided to the user in the step (a); (d) comparing the voice features, which has been extracted in the step (c), with speaker model of the user and measuring the similarity between the voice features and the speaker model of the user, the similarity indicating the difference between phonetic values; (e) increasing a first speaker rejection count when the similarity, which is measured in the step (d), is greater than or equal to a predetermined lower similarity threshold and increasing a second speaker rejection count when the similarity is greater than or equal to a predetermined upper similarity threshold; and (f) after repeating the steps (a) through (e) a plurality of times, (f1) approving the user when the first speaker rejection count is 0, and rejecting the user when the second speaker rejection count is at least 1 or the first speaker rejection count exceeds a predetermined rejection count threshold; and (f2) determining whether to approve or reject the user based on the similarity measured in the step (d), when the second speaker rejection count is 0, and the first speaker rejection count is at least 1 and less than or equal to the rejection count threshold.

15. A computer-readable recording medium for recording a program which is executed in a computer for speaker registration in a speaker verification system, wherein the program comprises the steps of: (a) providing a continuous digit string having various phonetic values to a user; (b) receiving the voice of the user uttering the continuous digit string; (c) extracting voice features from the received user's voice; (d) comparing the extracted voice features with items in a speaker-independent continuous digit voice model to derive a digit string corresponding to items in the speaker-independent continuous digit voice model, which match the voice features, and determining whether the derived digit string is identical to the digit string provided to the user in the step (a); (e) comparing the voice features extracted in the step (c), with speaker-dependent model of the user and measuring the similarity between the voice features and the speaker dependent model of the user; and (f) determining whether to register the user based on the measured similarity.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

May 1, 2000

Publication Date

December 17, 2002
