10733996

User Authentication

Published: August 4, 2020
Assignee: not available in the USPTO data we have

Patent Claims
29 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A device comprising: a processor configured to: extract a set of parameters from an audio signal; perform liveness verification by determining, based on a first plurality of parameters and a liveness data model, whether the audio signal corresponds to a first audio type indicating spoken speech or a second audio type indicating playback of recorded speech; perform user verification by determining, based on a second plurality of parameters and a user speech model, whether the audio signal corresponds to speech of a particular user associated with the user speech model and that the audio signal corresponds to the first audio type, and refrain from performing the user verification based on determining that the audio signal corresponds to the second audio type; perform keyword verification by determining, based on a third plurality of parameters and a keyword data model, whether the audio signal corresponds to a particular keyword, wherein the set of parameters includes the first plurality of parameters, the second plurality of parameters, and the third plurality of parameters; and generate an output indicating that user authentication is successful in response to determining that the audio signal corresponds to the speech of the particular user, to the particular keyword, and to the first audio type.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.
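The gating logic in claim 1 can be sketched in Python. This is a minimal illustration, not the patented implementation. The three verifiers are hypothetical callables standing in for the liveness data model, user speech model, and keyword data model, and the dictionary keys naming each plurality of parameters are assumptions. For brevity the sketch also skips keyword verification once playback is detected, which claim 1 does not require.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical verifier type: takes one plurality of parameters, returns a verdict.
Verifier = Callable[[Sequence[float]], bool]


@dataclass
class AuthResult:
    success: bool
    reason: str  # "ok", "playback", "user", or "keyword"


def authenticate(
    params: Dict[str, Sequence[float]],
    is_live: Verifier,
    is_user: Verifier,
    has_keyword: Verifier,
) -> AuthResult:
    # Liveness verification on the first plurality of parameters.
    if not is_live(params["liveness"]):
        # Second audio type (playback): refrain from user verification.
        return AuthResult(False, "playback")
    # User verification on the second plurality, reached only for live speech.
    user_ok = is_user(params["user"])
    # Keyword verification on the third plurality of parameters.
    keyword_ok = has_keyword(params["keyword"])
    if user_ok and keyword_ok:
        return AuthResult(True, "ok")
    return AuthResult(False, "user" if not user_ok else "keyword")
```

Success requires all three verdicts, and a playback verdict returns before `is_user` is ever called.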

Claim 2

Original Legal Text

2. The device of claim 1 , wherein the processor is further configured to generate a second output indicating that the user authentication failed in response to determining that the audio signal corresponds to the second audio type.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Additionally, the processor generates a second output indicating that user authentication failed if it determines the audio signal corresponds to recorded playback.

Claim 3

Original Legal Text

3. The device of claim 2 , wherein the processor is configured to generate the second output independently of performing the keyword verification and the user verification.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Additionally, the processor generates a second output indicating that user authentication failed if it determines the audio signal corresponds to recorded playback. This second output indicating failure is generated without needing to perform or consider the results of the keyword verification or the user verification.
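Claims 2 and 3 add a failure output that depends only on the liveness verdict. The sketch below shows that ordering. It assumes hypothetical zero-argument callables for the two remaining verifiers so they can be left uncalled, and the output labels are illustrative rather than taken from the patent.

```python
from enum import Enum
from typing import Callable


class AuthOutput(Enum):
    SUCCESS = "user authentication successful"  # output from claim 1
    FAILED = "user authentication failed"  # second output from claim 2


def run_authentication(
    is_spoken_speech: bool,
    verify_user: Callable[[], bool],
    verify_keyword: Callable[[], bool],
) -> AuthOutput:
    # Claims 2-3: a playback verdict alone yields the failure output. Neither
    # remaining verifier is called, so the result cannot depend on them.
    if not is_spoken_speech:
        return AuthOutput.FAILED
    if verify_user() and verify_keyword():
        return AuthOutput.SUCCESS
    return AuthOutput.FAILED
```

Passing the verifiers as deferred calls makes the independence in claim 3 easy to check. A verifier that raises when invoked never fires on the playback path.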

Claim 4

Original Legal Text

4. The device of claim 1 , wherein the liveness data model is user-independent.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. This liveness data model is specifically user-independent. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 5

Original Legal Text

5. The device of claim 1 , wherein the first plurality of parameters is the same as the second plurality of parameters, and wherein the second plurality of parameters is the same as the third plurality of parameters.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. In this device, the first set of parameters, the second set of parameters, and the third set of parameters are all the same. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 6

Original Legal Text

6. The device of claim 1 , wherein the liveness data model is trained based on a first set of recordings corresponding to spoken speech and a second set of recordings corresponding to played-back speech.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. This liveness data model is specifically trained using a first set of recordings of actual spoken speech and a second set of recordings of played-back speech. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 7

Original Legal Text

7. The device of claim 6 , wherein the processor is configured to determine that the audio signal corresponds to the first audio type based on determining that the liveness data model indicates that the first plurality of parameters corresponds more closely to a second plurality of parameters of the first set of recordings than to a third plurality of parameters of the second set of recordings.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. This liveness data model is specifically trained using a first set of recordings of actual spoken speech and a second set of recordings of played-back speech. The processor determines the audio signal corresponds to live spoken speech if the liveness data model indicates that the first set of parameters from the audio signal is more similar to parameters from the spoken speech recordings than to parameters from the played-back speech recordings. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.
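Claim 7 leaves open how "corresponds more closely" is measured. One simple reading, offered here purely as an assumption, is a nearest-centroid rule. Average the parameter vectors of each training set, then pick whichever average lies closer in Euclidean distance. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    # Component-wise mean of a non-empty set of equal-length parameter vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def is_spoken_speech(
    params: Vector,
    spoken_recordings: Sequence[Vector],
    playback_recordings: Sequence[Vector],
) -> bool:
    # Distance from the signal's parameters to each class average.
    d_spoken = math.dist(params, centroid(spoken_recordings))
    d_playback = math.dist(params, centroid(playback_recordings))
    # First audio type only if strictly closer to the spoken-speech recordings.
    return d_spoken < d_playback
```

In this sketch a tie counts as playback, which errs toward rejecting the signal.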

Claim 8

Original Legal Text

8. The device of claim 1 , wherein the liveness data model includes a machine-learning data model, and wherein the processor is configured to train the liveness data model by: extracting a first set of parameters from a first recording corresponding to spoken speech; extracting a second set of parameters from a second recording corresponding to playback of the first recording; updating the liveness data model based on the first set of parameters corresponding to the first audio type; and updating the liveness data model based on the second set of parameters corresponding to the second audio type.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model, which is a machine-learning data model, to determine if the audio signal is live spoken speech or recorded playback. The processor trains this liveness data model by extracting one set of training parameters from a first recording of spoken speech and another set from a second recording that captures playback of the first recording. It then updates the liveness data model with the parameters from the first recording as examples of live speech and with the parameters from the second recording as examples of recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.
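Claim 8 requires only that the liveness data model be a machine-learning model updated with labelled examples. As an assumed stand-in, the sketch below uses online logistic regression trained with one stochastic-gradient step per example. The class name, learning rate, and feature vectors are hypothetical, and parameter extraction from the recordings is taken as already done.

```python
import math
from typing import Sequence

SPOKEN, PLAYBACK = 1, 0  # labels for the first and second audio types


class OnlineLivenessModel:
    """Hypothetical logistic-regression liveness model, updated one example at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.5) -> None:
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = learning_rate

    def prob_spoken(self, params: Sequence[float]) -> float:
        # Logistic function of a linear score: estimated probability of live speech.
        z = self.bias + sum(w * x for w, x in zip(self.weights, params))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, params: Sequence[float], label: int) -> None:
        # One stochastic-gradient step on the log-loss for a single labelled example.
        error = label - self.prob_spoken(params)
        self.weights = [w + self.lr * error * x for w, x in zip(self.weights, params)]
        self.bias += self.lr * error


def train_on_pair(
    model: OnlineLivenessModel,
    spoken_params: Sequence[float],
    playback_params: Sequence[float],
) -> None:
    # Parameters of the original recording are labelled spoken speech. Parameters of
    # the recording made while that recording was played back are labelled playback.
    model.update(spoken_params, SPOKEN)
    model.update(playback_params, PLAYBACK)
```

Pairing each live recording with a recording of its own playback keeps the speaker and words fixed within a pair. The label then becomes the main difference between the two training examples.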

Claim 9

Original Legal Text

9. The device of claim 1 , wherein the first plurality of parameters indicates characteristics of the audio signal.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification by determining, based on a first set of parameters (which indicate characteristics of the audio signal) and a liveness data model, whether the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 10

Original Legal Text

10. The device of claim 9 , wherein the characteristics include a dynamic frequency range of the audio signal.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal. The processor performs liveness verification by determining, based on a first set of parameters (which indicate characteristics of the audio signal, such as its dynamic frequency range) and a liveness data model, whether the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.
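The claims do not define how a dynamic frequency range is measured. As one assumed definition, the sketch below takes a short frame of samples and computes its magnitude spectrum with a naive discrete Fourier transform. It then reports the width in hertz between the lowest and highest bins that come within `threshold_db` of the strongest bin. The function names and the 40 dB default are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    # Naive O(N^2) DFT magnitudes for bins 0..N/2; adequate for a short frame.
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def dynamic_frequency_range(
    samples: Sequence[float], sample_rate: float, threshold_db: float = 40.0
) -> float:
    """Width in Hz of the band whose bins lie within threshold_db of the peak bin."""
    mags = magnitude_spectrum(samples)
    peak = max(mags)
    if peak == 0.0:
        return 0.0  # silent frame: no measurable range
    floor = peak * 10 ** (-threshold_db / 20)  # dB threshold as an amplitude ratio
    active = [k for k, m in enumerate(mags) if m >= floor]
    bin_hz = sample_rate / len(samples)
    return (active[-1] - active[0]) * bin_hz
```

For example, a 64-sample frame at 8 kHz containing equal-amplitude tones at 500 Hz and 2500 Hz yields a range of 2000 Hz. A single tone yields 0 Hz.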

Claim 11

Original Legal Text

11. The device of claim 1 , further comprising a display, wherein the processor is further configured to: receive user input indicating that the user authentication is to be performed; generate the particular keyword in response to receiving the user input; and provide a graphical user interface (GUI) indicating the particular keyword to the display, wherein the microphone is configured to receive the audio signal subsequent to the processor providing the GUI to the display.

Plain English Translation

A device includes a processor that extracts a set of parameters from an audio signal, and it also includes a display. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Additionally, the processor receives user input to start authentication, generates a specific keyword in response, displays this keyword via a graphical user interface (GUI) on the display, and then receives the audio signal (e.g., via a microphone) *after* the keyword has been displayed.
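Claim 11 orders three events: a user request, display of a freshly generated keyword, and capture of the audio. The sketch below models that ordering. It assumes a hypothetical word list and a `display` callback standing in for the GUI. The claim does not say how the keyword is generated, so random selection with Python's `secrets` module is also an assumption.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical vocabulary for building one-time keywords.
WORDS = ["amber", "harbor", "violet", "canyon", "meadow", "falcon", "orbit", "willow"]


@dataclass
class KeywordChallenge:
    keyword: str
    shown_at: Optional[float] = None  # monotonic time at which the GUI was provided

    def show(self, display: Callable[[str], None]) -> None:
        # Stand-in for providing a GUI indicating the keyword to the display.
        display(f"Please say: {self.keyword}")
        self.shown_at = time.monotonic()

    def accepts(self, captured_at: float) -> bool:
        # Audio counts only if it was captured after the keyword was displayed.
        return self.shown_at is not None and captured_at > self.shown_at


def on_authentication_requested(n_words: int = 2) -> KeywordChallenge:
    # Generate the particular keyword in response to the user input.
    return KeywordChallenge(" ".join(secrets.choice(WORDS) for _ in range(n_words)))
```

A plausible benefit, inferred here rather than stated in the claim, is that a keyword chosen per request is unlikely to appear in any clip recorded beforehand.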

Claim 12

Original Legal Text

12. The device of claim 1 , further comprising: a microphone configured to generate the audio signal responsive to receiving an input audio signal; an antenna; and a transmitter coupled to the antenna and configured to transmit, via the antenna, authentication data to a second device based on determining that the user authentication is successful.

Plain English Translation

A device includes a microphone for generating an audio signal, an antenna, and a transmitter coupled to the antenna. It also includes a processor that extracts parameters from the audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Upon successful user authentication, the transmitter transmits authentication data to a second device via the antenna.

Claim 13

Original Legal Text

13. The device of claim 12 , wherein the microphone, the processor, the antenna, and the transmitter are integrated into a mobile device.

Plain English Translation

A device includes a microphone for generating an audio signal, an antenna, and a transmitter coupled to the antenna. It also includes a processor that extracts parameters from the audio signal. The processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. If the audio signal is determined to be live spoken speech, the processor performs user verification using a second set of parameters and a user speech model to identify a particular user. If the audio signal is determined to be recorded playback, the processor refrains from performing user verification. The processor also performs keyword verification using a third set of parameters and a keyword data model to detect a specific keyword. The overall set of extracted parameters includes these three sets. An output indicating successful user authentication is generated only if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Upon successful user authentication, the transmitter transmits authentication data to a second device via the antenna. All these components – the microphone, processor, antenna, and transmitter – are integrated into a single mobile device.

Claim 14

Original Legal Text

14. A method comprising: receiving an audio signal at a device; extracting, at the device, a set of parameters from the audio signal; performing, at the device, liveness verification by determining, based on a first plurality of parameters and a liveness data model, whether the audio signal corresponds to a first audio type indicating spoken speech or a second audio type indicating playback of recorded speech; performing, at the device, user verification by determining, based on a second plurality of parameters and a user speech model, whether the audio signal corresponds to speech of a particular user associated with the user speech model, wherein the user speech model is distinct from the liveness data model; performing, at the device, keyword verification by determining, based on a third plurality of parameters and a keyword data model, whether the audio signal corresponds to a particular keyword, wherein the set of parameters includes the first plurality of parameters, the second plurality of parameters, and the third plurality of parameters; and generating, at the device, an output indicating that user authentication is successful based on determining that the audio signal corresponds to the speech of the particular user, to the particular keyword, and to the first audio type.

Plain English Translation

A method for user authentication performed by a device involves receiving an audio signal and extracting a set of parameters from it. The method includes performing liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. User verification is performed using a second set of parameters and a user speech model, which is distinct from the liveness data model, to determine if the audio signal matches a particular user's speech. Keyword verification is performed using a third set of parameters and a keyword data model to determine if a specific keyword is present. The extracted set of parameters includes these three parameter sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 15

Original Legal Text

15. The method of claim 14 , wherein the liveness data model is trained based on a first set of recordings corresponding to spoken speech and a second set of recordings corresponding to played-back speech.

Plain English Translation

A method for user authentication performed by a device involves receiving an audio signal and extracting a set of parameters from it. The method includes performing liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. This liveness data model is trained based on a first set of recordings corresponding to actual spoken speech and a second set of recordings corresponding to played-back speech. User verification is performed using a second set of parameters and a user speech model, which is distinct from the liveness data model, to determine if the audio signal matches a particular user's speech. Keyword verification is performed using a third set of parameters and a keyword data model to determine if a specific keyword is present. The extracted set of parameters includes these three parameter sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 16

Original Legal Text

16. The method of claim 15 , wherein the first plurality of parameters indicates a dynamic frequency range of the audio signal, wherein the audio signal is determined to correspond to the first audio type based on determining that the liveness data model indicates that the dynamic frequency range of the audio signal corresponds more closely to first dynamic frequency ranges of the first set of recordings than to second dynamic frequency ranges of the second set of recordings.

Plain English Translation

A method for user authentication performed by a device involves receiving an audio signal and extracting a set of parameters from it. The method includes performing liveness verification using a first set of parameters, which specifically indicate the dynamic frequency range of the audio signal, and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. This liveness data model is trained based on a first set of recordings of actual spoken speech and a second set of recordings of played-back speech. The audio signal is determined to correspond to live spoken speech if the liveness data model indicates that the dynamic frequency range of the audio signal is more similar to the dynamic frequency ranges of the spoken speech recordings than to the dynamic frequency ranges of the played-back speech recordings. User verification is performed using a second set of parameters and a user speech model, which is distinct from the liveness data model, to determine if the audio signal matches a particular user's speech. Keyword verification is performed using a third set of parameters and a keyword data model to determine if a specific keyword is present. The extracted set of parameters includes these three parameter sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.
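Claim 16 reduces the liveness decision to a single scalar feature. As an illustration only, the sketch below fits a normal distribution to the dynamic frequency ranges of each recording set with `statistics.NormalDist.from_samples`. It then labels the signal with whichever distribution gives it the higher density. The Gaussian model and the sample values (in hertz) are assumptions, not figures from the patent.

```python
from statistics import NormalDist
from typing import Sequence


def classify_dynamic_range(
    signal_range_hz: float,
    spoken_ranges_hz: Sequence[float],
    playback_ranges_hz: Sequence[float],
) -> str:
    # Fit one normal distribution per recording set (each needs two or more values).
    spoken = NormalDist.from_samples(spoken_ranges_hz)
    playback = NormalDist.from_samples(playback_ranges_hz)
    # The signal "corresponds more closely" to the set under which it is more likely.
    if spoken.pdf(signal_range_hz) > playback.pdf(signal_range_hz):
        return "spoken"  # first audio type
    return "playback"  # second audio type; ties resolve conservatively here
```

A density comparison, unlike a plain distance to each mean, also accounts for how widely each set's ranges vary.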

Claim 17

Original Legal Text

17. The method of claim 14 , wherein the user speech model is trained based on input audio signals associated with the particular user.

Plain English Translation

A method for user authentication performed by a device involves receiving an audio signal and extracting a set of parameters from it. The method includes performing liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. User verification is performed using a second set of parameters and a user speech model, which is distinct from the liveness data model, to determine if the audio signal matches a particular user's speech. This user speech model is specifically trained using input audio signals associated with that particular user. Keyword verification is performed using a third set of parameters and a keyword data model to determine if a specific keyword is present. The extracted set of parameters includes these three parameter sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.
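Claim 17 says only that the user speech model is trained on the user's own audio. A common simplification, used here as an assumption, is template matching. Average the parameter vectors from several enrollment utterances, then accept a later utterance when its cosine similarity to that average clears a threshold. The class name and the 0.9 threshold are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return sum(x * y for x, y in zip(a, b)) / norm


class UserSpeechModel:
    """Hypothetical template model enrolled from one user's utterances."""

    def __init__(self, enrollment: Sequence[Vector], threshold: float = 0.9) -> None:
        # Template = component-wise mean of the user's enrollment parameter vectors.
        n = len(enrollment)
        self.template: List[float] = [
            sum(v[i] for v in enrollment) / n for i in range(len(enrollment[0]))
        ]
        self.threshold = threshold

    def matches(self, params: Vector) -> bool:
        # Verify a claimed identity: is this utterance close enough to the template?
        return cosine_similarity(params, self.template) >= self.threshold
```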

Claim 18

Original Legal Text

18. The method of claim 14 , wherein the set of parameters is extracted in response to receiving a user input indicating that the user authentication is to be performed.

Plain English Translation

A method for user authentication performed by a device involves receiving an audio signal. In response to receiving a user input indicating that user authentication is to be performed, a set of parameters is extracted from the audio signal. The method then performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. User verification is performed using a second set of parameters and a user speech model, which is distinct from the liveness data model, to determine if the audio signal matches a particular user's speech. Keyword verification is performed using a third set of parameters and a keyword data model to determine if a specific keyword is present. The extracted set of parameters includes these three parameter sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 19

Original Legal Text

19. The method of claim 14 , wherein the user verification is performed in response to determining that the audio signal corresponds to the first audio type.

Plain English Translation

A method for user authentication performed by a device involves receiving an audio signal and extracting a set of parameters from it. The method includes performing liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. User verification is performed using a second set of parameters and a user speech model, which is distinct from the liveness data model, to determine if the audio signal matches a particular user's speech. This user verification step is only performed *after* it has been determined that the audio signal corresponds to live spoken speech. Keyword verification is performed using a third set of parameters and a keyword data model to determine if a specific keyword is present. The extracted set of parameters includes these three parameter sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword.

Claim 20

Original Legal Text

20. A computer-readable storage device storing instructions that, when executed by a processor, cause the processor to perform operations comprising: extracting a set of parameters from an audio signal; performing liveness verification by determining, based on a first plurality of parameters and a liveness data model, whether the audio signal corresponds to a first audio type indicating spoken speech or a second audio type indicating playback of recorded speech; performing user verification by determining, based on a second plurality of parameters and a user speech model, whether the audio signal corresponds to speech of a particular user associated with the user speech model; performing keyword verification by determining, based on a third plurality of parameters and a keyword data model, whether the audio signal corresponds to a particular keyword, wherein the set of parameters includes the first plurality of parameters, the second plurality of parameters, and the third plurality of parameters; generating an output indicating that user authentication is successful in response to determining that the audio signal corresponds to the speech of the particular user, to the particular keyword, and to the first audio type; and updating the user speech model based on the set of parameters in response to determining that the user authentication is successful.

Plain English Translation

A computer-readable storage device stores instructions that, when executed by a processor, cause it to perform user authentication operations. These operations include extracting a set of parameters from an audio signal. It performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. It performs user verification using a second set of parameters and a user speech model to determine if the audio signal matches a particular user's speech. It also performs keyword verification using a third set of parameters and a keyword data model to determine if a specific keyword is present. The overall extracted parameters include these three sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Furthermore, the user speech model is updated based on the extracted parameters if the user authentication is successful.
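Claim 20 adds a model update after each successful authentication. One assumed way to do this with a template-style user model is an exponential moving average. The sketch blends the accepted parameters into the stored template with a small weight, so the model can track gradual changes in the user's voice. The function name and the 0.1 rate are hypothetical.

```python
from typing import List, Sequence


def update_user_model(
    template: Sequence[float],
    params: Sequence[float],
    authenticated: bool,
    rate: float = 0.1,
) -> List[float]:
    """Blend accepted parameters into the stored template (hypothetical EMA update)."""
    if not authenticated:
        return list(template)  # claim 20 updates only after successful authentication
    # Move each template component a fraction `rate` of the way toward the new value.
    return [(1 - rate) * t + rate * p for t, p in zip(template, params)]
```

In this sketch the update runs only after the liveness, user, and keyword checks have all passed. A rejected or replayed utterance therefore never reaches the template.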

Claim 21

Original Legal Text

21. The computer-readable storage device of claim 20, wherein the liveness data model is trained based on a first set of recordings corresponding to spoken speech and a second set of recordings corresponding to played-back speech.

Plain English Translation

A computer-readable storage device stores instructions that, when executed by a processor, cause it to perform user authentication operations. These operations include extracting a set of parameters from an audio signal. It performs liveness verification using a first set of parameters and a liveness data model (which is trained based on a first set of recordings corresponding to actual spoken speech and a second set of recordings corresponding to played-back speech) to determine if the audio signal is live spoken speech or recorded playback. It performs user verification using a second set of parameters and a user speech model to determine if the audio signal matches a particular user's speech. It also performs keyword verification using a third set of parameters and a keyword data model to determine if a specific keyword is present. The overall extracted parameters include these three sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Furthermore, the user speech model is updated based on the extracted parameters if the user authentication is successful.
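Claim 21 states only that the liveness model is trained on two labelled sets of recordings. It does not name a classifier. The sketch below assumes a nearest-centroid classifier over pre-extracted parameter vectors. This is a deliberately simple stand-in, and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_liveness_model(live_recordings, playback_recordings):
    """Fit one centroid per class from parameter vectors of labelled recordings."""
    return {
        "live": centroid(live_recordings),          # first audio type: spoken speech
        "playback": centroid(playback_recordings),  # second audio type: played-back speech
    }


def classify(model, params):
    """Return the label of the nearest class centroid."""
    return min(model, key=lambda label: distance(model[label], params))
```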

Claim 22

Original Legal Text

22. The computer-readable storage device of claim 20, wherein the user speech model is trained based on input audio signals associated with the particular user.

Plain English Translation

A computer-readable storage device stores instructions that, when executed by a processor, cause it to perform user authentication operations. These operations include extracting a set of parameters from an audio signal. It performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. It performs user verification using a second set of parameters and a user speech model (which is trained based on input audio signals associated with the particular user) to determine if the audio signal matches a particular user's speech. It also performs keyword verification using a third set of parameters and a keyword data model to determine if a specific keyword is present. The overall extracted parameters include these three sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Furthermore, the user speech model is updated based on the extracted parameters if the user authentication is successful.
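For claim 22, a hypothetical enrollment routine could build the user speech model from the user's own recordings. In this sketch, which is an assumption and not the patent's method, the template is the mean enrollment vector. The acceptance threshold is the lowest similarity between any enrollment sample and that template, minus an arbitrary margin.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length parameter vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def enroll_user(enrollment_vectors, margin=0.05):
    """Build a hypothetical user speech model from parameters of the user's recordings."""
    n = len(enrollment_vectors)
    template = [sum(col) / n for col in zip(*enrollment_vectors)]
    # Accept anything at least as close as the least-similar enrollment sample, minus a margin.
    threshold = min(cosine(template, v) for v in enrollment_vectors) - margin
    return {"template": template, "threshold": threshold}


def verify_user(model, params):
    """True if the parameters are close enough to the enrolled template."""
    return cosine(model["template"], params) >= model["threshold"]
```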

Claim 23

Original Legal Text

23. The computer-readable storage device of claim 20, wherein the first plurality of parameters is the same as the second plurality of parameters, and wherein the second plurality of parameters is the same as the third plurality of parameters.

Plain English Translation

A computer-readable storage device stores instructions that, when executed by a processor, cause it to perform user authentication operations. These operations include extracting a set of parameters from an audio signal. It performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. It performs user verification using a second set of parameters and a user speech model to determine if the audio signal matches a particular user's speech. It also performs keyword verification using a third set of parameters and a keyword data model to determine if a specific keyword is present. In these operations, the first set of parameters, the second set of parameters, and the third set of parameters are all the same. The overall extracted parameters include these three sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Furthermore, the user speech model is updated based on the extracted parameters if the user authentication is successful.
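Claim 23 covers the case where all three verifications consume the same parameter set. A minimal sketch extracts the parameters once and hands the identical list to each verifier. The per-frame log energy and zero-crossing-rate features, the frame size, and the `extract_parameters` and `run_verifiers` names are hypothetical choices. The claims do not specify which parameters are extracted.

```python
import math


def extract_parameters(samples, frame_size=4):
    """Hypothetical feature extractor: per-frame log energy and zero-crossing rate."""
    params = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        params.extend([math.log(energy + 1e-10), crossings / (frame_size - 1)])
    return params


def run_verifiers(samples, verifiers):
    """Extract once; every verifier receives the identical parameter list."""
    params = extract_parameters(samples)
    return {name: check(params) for name, check in verifiers.items()}
```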

Claim 24

Original Legal Text

24. The computer-readable storage device of claim 20, wherein the operations further comprise: determining that the user authentication is to be performed in response to determining that the audio signal corresponds to the particular keyword; and in response to determining that the user authentication is to be performed, performing the liveness verification and performing the user verification.

Plain English Translation

A computer-readable storage device stores instructions that, when executed by a processor, cause it to perform user authentication operations. These operations include extracting a set of parameters from an audio signal and performing keyword verification, which determines, based on a third set of parameters and a keyword data model, whether the audio signal corresponds to a particular keyword. Detecting the particular keyword triggers user authentication. In response, the processor performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. It also performs user verification using a second set of parameters and a user speech model to determine if the audio signal matches a particular user's speech. The overall extracted parameters include these three sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Furthermore, the user speech model is updated based on the extracted parameters if the user authentication is successful.
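The keyword-gated ordering in claim 24 could look like the following sketch. The three checks are hypothetical callables supplied by the caller. A return value of `None` signals that authentication was never triggered.

```python
from typing import Any, Callable, Optional


def authenticate_on_keyword(params: Any,
                            detect_keyword: Callable[[Any], bool],
                            check_liveness: Callable[[Any], bool],
                            check_user: Callable[[Any], bool]) -> Optional[bool]:
    """Run liveness and user verification only after the keyword is detected."""
    if not detect_keyword(params):
        return None  # keyword absent: user authentication is not performed
    # Keyword found: authentication is to be performed.
    return check_liveness(params) and check_user(params)
```

Python's `and` short-circuits, so a failed liveness check also skips user verification. This matches the refrain-on-playback behavior recited in claim 1.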

Claim 25

Original Legal Text

25. The computer-readable storage device of claim 20, wherein the user speech model is distinct from the liveness data model.

Plain English Translation

A computer-readable storage device stores instructions that, when executed by a processor, cause it to perform user authentication operations. These operations include extracting a set of parameters from an audio signal. It performs liveness verification using a first set of parameters and a liveness data model to determine if the audio signal is live spoken speech or recorded playback. It performs user verification using a second set of parameters and a user speech model, which is distinct from the liveness data model, to determine if the audio signal matches a particular user's speech. It also performs keyword verification using a third set of parameters and a keyword data model to determine if a specific keyword is present. The overall extracted parameters include these three sets. An output indicating successful user authentication is generated if the audio signal is determined to be live spoken speech, corresponds to the particular user's speech, and contains the specific keyword. Furthermore, the user speech model is updated based on the extracted parameters if the user authentication is successful.

Claim 26

Original Legal Text

26. An apparatus comprising: means for generating an output signal responsive to receiving an audio signal; means for extracting a set of parameters from the output signal; means for performing liveness verification by determining, based on a first plurality of parameters and a liveness data model, whether the audio signal corresponds to a first audio type indicating spoken speech or a second audio type indicating playback of recorded speech; means for performing user verification by determining, based on a second plurality of parameters and a user speech model, whether the audio signal corresponds to speech of a particular user associated with the user speech model, wherein the user speech model is distinct from the liveness data model; means for performing keyword verification by determining, based on a third plurality of parameters and a keyword data model, whether the audio signal corresponds to a particular keyword, wherein the set of parameters includes the first plurality of parameters, the second plurality of parameters, and the third plurality of parameters; and means for generating an output indicating that user authentication is successful in response to determining that the audio signal corresponds to the speech of the particular user, to the particular keyword, and to the first audio type.

Plain English Translation

An apparatus for user authentication comprises means for generating an output signal from an audio input. It includes means for extracting a set of parameters from this output signal. It further has means for performing liveness verification by determining, based on a first set of parameters and a liveness data model, whether the audio signal is live spoken speech or recorded playback. It includes means for performing user verification by determining, based on a second set of parameters and a user speech model (which is distinct from the liveness data model), whether the audio signal corresponds to speech of a particular user. It also has means for performing keyword verification by determining, based on a third set of parameters and a keyword data model, whether the audio signal contains a particular keyword. The extracted set of parameters includes these three sets. Finally, it has means for generating an output indicating that user authentication is successful if the audio signal corresponds to the particular user's speech, contains the particular keyword, and is determined to be live spoken speech.

Claim 27

Original Legal Text

27. The apparatus of claim 26, wherein the means for generating the output signal, the means for extracting the set of parameters, the means for performing the liveness verification, the means for performing the user verification, the means for performing the keyword verification, and the means for generating the output are integrated into a communication device, a personal digital assistant (PDA), a computer, a music player, a video player, an entertainment unit, a navigation device, a mobile device, a fixed location data unit, or a set top box.

Plain English Translation

An apparatus for user authentication comprises means for generating an output signal from an audio input. It includes means for extracting a set of parameters from this output signal. It further has means for performing liveness verification by determining, based on a first set of parameters and a liveness data model, whether the audio signal is live spoken speech or recorded playback. It includes means for performing user verification by determining, based on a second set of parameters and a user speech model (which is distinct from the liveness data model), whether the audio signal corresponds to speech of a particular user. It also has means for performing keyword verification by determining, based on a third set of parameters and a keyword data model, whether the audio signal contains a particular keyword. The extracted set of parameters includes these three sets. Finally, it has means for generating an output indicating that user authentication is successful if the audio signal corresponds to the particular user's speech, contains the particular keyword, and is determined to be live spoken speech. All these functional components (the means for generating output signal, extracting parameters, performing verifications, and generating the final output) are integrated into a single device such as a communication device, personal digital assistant (PDA), computer, music player, video player, entertainment unit, navigation device, mobile device, fixed location data unit, or a set top box.

Claim 28

Original Legal Text

28. A device comprising: a processor configured to: extract a set of parameters from an audio signal; perform liveness verification by determining, based on a first plurality of parameters and a liveness data model, whether the audio signal corresponds to a first audio type indicating spoken speech or a second audio type indicating playback of recorded speech; perform user verification by determining, based on a second plurality of parameters and a user speech model, whether the audio signal corresponds to speech of a particular user associated with the user speech; perform keyword verification by determining, based on a third plurality of parameters and a keyword data model, whether the audio signal corresponds to a particular keyword, wherein the set of parameters includes the first plurality of parameters, the second plurality of parameters, and the third plurality of parameters; generate an output indicating that user authentication is successful in response to determining that the audio signal corresponds to the speech of the particular user, to the particular keyword, and to the first audio type; and generate a second output indicating that the user authentication failed in response to determining that the audio signal corresponds to the second audio type, wherein the second output is generated independently of performing the keyword verification and the user verification.

Plain English Translation

A device comprises a processor configured to extract a set of parameters from an audio signal. The processor performs liveness verification by determining, based on a first set of parameters and a liveness data model, whether the audio signal is live spoken speech or recorded playback. It performs user verification using a second set of parameters and a user speech model to determine if the audio signal corresponds to speech of a particular user. It also performs keyword verification using a third set of parameters and a keyword data model to determine if the audio signal corresponds to a particular keyword. The overall extracted parameters include these three sets. The processor generates an output indicating successful user authentication if the audio signal corresponds to the particular user's speech, to the particular keyword, and to live spoken speech. Additionally, the processor generates a second output indicating that user authentication failed if it determines the audio signal corresponds to recorded playback. This second output is generated independently of the keyword verification and the user verification. The failure therefore does not depend on performing those verifications or on their results.
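The early failure path in claim 28 can be sketched by running liveness verification first and returning the failure output before either of the other checks executes. The `AuthResult` enum and the callable parameters are illustrative assumptions. The claim does not say what output is produced when the signal is live but the user or keyword check fails, so returning `FAILED` in that case is a choice made for this sketch.

```python
from enum import Enum
from typing import Any, Callable


class AuthResult(Enum):
    SUCCESS = "success"
    FAILED = "failed"


def authenticate(params: Any,
                 check_liveness: Callable[[Any], bool],
                 check_keyword: Callable[[Any], bool],
                 check_user: Callable[[Any], bool]) -> AuthResult:
    """Check liveness first; playback yields the failure output with no further checks."""
    if not check_liveness(params):
        # Second audio type (playback): keyword and user verification never run.
        return AuthResult.FAILED
    if check_keyword(params) and check_user(params):
        return AuthResult.SUCCESS
    return AuthResult.FAILED
```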

Claim 29

Original Legal Text

29. A device comprising: a processor configured to: extract a set of parameters from an audio signal; perform liveness verification by determining, based on a first plurality of parameters and a liveness data model, whether the audio signal corresponds to a first audio type indicating spoken speech or a second audio type indicating playback of recorded speech, wherein the liveness data model is user-independent; perform user verification by determining, based on a second plurality of parameters and a user speech model, whether the audio signal corresponds to speech of a particular user associated with the user speech model; perform keyword verification by determining, based on a third plurality of parameters and a keyword data model, whether the audio signal corresponds to a particular keyword, wherein the set of parameters includes the first plurality of parameters, the second plurality of parameters, and the third plurality of parameters; and generate an output indicating that user authentication is successful in response to determining that the audio signal corresponds to the speech of the particular user, to the particular keyword, and to the first audio type.

Plain English Translation

A device comprises a processor configured to extract a set of parameters from an audio signal. The processor performs liveness verification by determining, based on a first set of parameters and a liveness data model, whether the audio signal is live spoken speech or recorded playback. The liveness data model used for this verification is user-independent. The processor also performs user verification using a second set of parameters and a user speech model to determine if the audio signal corresponds to speech of a particular user. It performs keyword verification using a third set of parameters and a keyword data model to determine if the audio signal corresponds to a particular keyword. The overall extracted parameters include these three sets. The processor generates an output indicating that user authentication is successful if the audio signal corresponds to the particular user's speech, to the particular keyword, and to live spoken speech.
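A user-independent liveness model (claim 29) allows a single liveness check to serve every enrolled user, with only the user speech model varying per user. The hypothetical `AuthSystem` class below illustrates that split. The threshold lambdas in the usage example are placeholders, not real models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence

Params = Sequence[float]
Check = Callable[[Params], bool]


@dataclass
class AuthSystem:
    """Hypothetical container: one user-independent liveness check shared by all users."""
    liveness_check: Check
    keyword_check: Check
    user_checks: Dict[str, Check] = field(default_factory=dict)

    def enroll(self, user_id: str, user_check: Check) -> None:
        # Only the user speech model is per-user; enrollment never touches liveness.
        self.user_checks[user_id] = user_check

    def authenticate(self, user_id: str, params: Params) -> bool:
        if user_id not in self.user_checks:
            return False
        return (self.liveness_check(params)
                and self.keyword_check(params)
                and self.user_checks[user_id](params))
```

Usage: construct one `AuthSystem` with a single liveness check, then call `enroll` once per user. Every user shares the same `liveness_check` object.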

Patent Metadata

Filing Date

Unknown

Publication Date

August 4, 2020

Inventors

Bhaskara Ramudu Pendyala
Pavan Kumar Kadiyala

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “USER AUTHENTICATION” (10733996). https://patentable.app/patents/10733996

© 2026 Nomic Interactive Technology LLC.
