US-10497364

Multi-user authentication on a device

Published: December 3, 2019
Assignee: not available in USPTO data we have
Inventors: not available in USPTO data we have
Technical Abstract

In some implementations, an utterance is determined to include a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword. In response to determining that an utterance includes a particular user speaking a hotword based at least on a first set of samples of the particular user speaking the hotword, at least a portion of the utterance is stored as a new sample. A second set of samples of the particular user speaking the utterance is obtained, where the second set of samples includes the new sample and less than all the samples in the first set of samples. A second utterance is determined to include the particular user speaking the hotword based at least on the second set of samples of the user speaking the hotword.

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A computer-implemented method comprising: determining, by one or more computers, that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword; in response to determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from the first set of samples of the particular user speaking the hotword, storing at least a portion of the utterance as a new sample; obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples; determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword; and in response to determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword, recognizing the second utterance as having been spoken by the particular user.

Plain English Translation

This invention relates to voice recognition systems, specifically improving hotword detection accuracy for individual users. The problem addressed is the variability in how different users pronounce hotwords, leading to false positives or missed detections in voice-activated systems. The solution involves a personalized hotword detection model that adapts over time. The method uses a computer system to detect when a user speaks a hotword, such as "Hey Siri" or "OK Google," by analyzing audio input. Initially, a first hotword detection model is trained using a set of samples of the user speaking the hotword. When the system detects the hotword in an utterance, it stores a portion of that utterance as a new sample. This new sample is added to a second set of samples, which includes some but not all of the original samples. A second hotword detection model is then generated from this updated set. When the system detects the hotword again using this second model, it recognizes the utterance as coming from the same user. This adaptive approach improves accuracy by continuously refining the model with recent speech samples while retaining some historical data. The system ensures that the hotword detection remains personalized and up-to-date with the user's voice characteristics.
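The adaptation loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: each sample is assumed to be a fixed-length feature vector, the hotword detection model is replaced by a hypothetical `nearest_sample_detect` function that accepts an utterance lying close to any stored sample, and the set size is fixed at the enrollment size so that storing each accepted utterance evicts the oldest sample. The names `AdaptiveHotwordProfile`, `Sample`, and `Detector` are likewise hypothetical.

```python
import math
from collections import deque
from typing import Callable, Deque, List, Sequence

Sample = List[float]  # hypothetical: one fixed-length feature vector per stored utterance
Detector = Callable[[Sequence[Sample], Sample], bool]


class AdaptiveHotwordProfile:
    """Rolling set of hotword samples for one user (illustrative sketch only)."""

    def __init__(self, enrollment: Sequence[Sample], detect: Detector):
        # maxlen equals the enrollment size, so each new sample evicts the oldest one
        # and the updated set always holds fewer than all of the previous samples.
        self.samples: Deque[Sample] = deque(enrollment, maxlen=len(enrollment))
        self.detect = detect

    def process(self, utterance: Sample) -> bool:
        """Return True if the utterance is recognized as this user speaking the hotword."""
        # The detector builds its model from the current set of samples on every call.
        if not self.detect(list(self.samples), utterance):
            return False
        # Accepted: store the utterance as a new sample (the deque drops the oldest).
        self.samples.append(utterance)
        return True


def nearest_sample_detect(samples: Sequence[Sample], utterance: Sample,
                          threshold: float = 1.0) -> bool:
    # Toy stand-in for a hotword detection model: accept the utterance if it lies
    # within `threshold` (Euclidean distance) of any stored sample.
    return any(math.dist(s, utterance) <= threshold for s in samples)


profile = AdaptiveHotwordProfile([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]], nearest_sample_detect)
accepted = profile.process([0.2, 0.1])  # close to the enrolled samples
rejected = profile.process([5.0, 5.0])  # far from every stored sample
```

After the first call the stored set is the two newest enrollment samples plus the accepted utterance; the rejected utterance leaves the set unchanged.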

Claim 2

Original Legal Text

2. The method of claim 1, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises: selecting a predetermined number of recently stored samples as the second set of samples.

Plain English Translation

This invention relates to speech recognition systems that adapt to a user's voice over time. The problem addressed is maintaining accurate recognition as a user's voice changes, for example due to aging, illness, or environmental factors; a model built only from the initial enrollment samples can lose accuracy as the user's voice drifts away from that data. This claim specifies how the second set of samples in claim 1 is formed: the system selects a predetermined number of the most recently stored samples. Because the newly stored sample is the most recent one, it is always included, and once the limit is reached the oldest samples drop out, so the second set contains fewer than all of the samples in the first set. A fixed-size window keeps the hotword detection model tuned to how the user currently sounds while bounding the number of samples that are stored and used to build each model. The method is particularly useful in voice-activated personal assistants and other devices where a user's voice characteristics may change over time.
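Selecting a predetermined number of recently stored samples amounts to sorting by storage time and keeping the newest n. The sketch below assumes each stored sample carries an identifier and a storage timestamp; `StoredSample` and `select_recent` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredSample:
    sample_id: str    # hypothetical identifier
    stored_at: float  # hypothetical storage timestamp (larger = newer)


def select_recent(stored: List[StoredSample], n: int) -> List[StoredSample]:
    """Return the n most recently stored samples, newest first."""
    return sorted(stored, key=lambda s: s.stored_at, reverse=True)[:n]


stored = [
    StoredSample("enroll-1", 1.0),
    StoredSample("enroll-2", 2.0),
    StoredSample("enroll-3", 3.0),
    StoredSample("query-1", 10.0),  # the newly stored sample
]
second_set = select_recent(stored, 3)
```

With n = 3, the second set is `query-1`, `enroll-3`, and `enroll-2`: it includes the new sample and leaves out `enroll-1`, so it holds fewer than all of the original samples.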

Claim 3

Original Legal Text

3. The method of claim 1, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises: selecting both a predetermined number of most recently stored samples and a set of reference samples to combine together as the second set of samples.

Plain English Translation

This invention relates to speech recognition systems that adapt to a user's voice over time. The problem addressed is maintaining accurate speech recognition performance as a user's voice changes, such as due to aging, illness, or environmental factors. Traditional systems rely on static voice models, which degrade in accuracy when the user's voice drifts from the initial training data. The method involves dynamically updating a voice model by combining new speech samples with a subset of previously stored samples. When a new speech sample is obtained, a second set of samples is created by selecting a predetermined number of the most recently stored samples and a set of reference samples. The reference samples are chosen to represent stable aspects of the user's voice, ensuring the updated model retains consistency while incorporating recent changes. This hybrid approach prevents the model from being overly influenced by temporary variations while still adapting to long-term voice changes. The system balances fresh data with established reference points to maintain recognition accuracy over time. This adaptive technique improves speech recognition robustness in real-world scenarios where a user's voice may evolve.
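A minimal sketch of combining reference samples with the most recently stored samples follows. It assumes samples are tracked by identifier with a storage timestamp, and it makes one design choice the claim does not specify: reference samples are excluded from the "recent" pool so that no sample is counted twice. The function name and sample identifiers are hypothetical.

```python
from typing import Dict, List


def select_second_set(reference_ids: List[str], stored_at: Dict[str, float],
                      n: int) -> List[str]:
    """Combine the reference samples with the n most recently stored other samples."""
    refs = set(reference_ids)
    # Exclude references from the "recent" pool so no sample is counted twice.
    candidates = [sid for sid in stored_at if sid not in refs]
    recent = sorted(candidates, key=lambda sid: stored_at[sid], reverse=True)[:n]
    return list(reference_ids) + recent


# Hypothetical storage timestamps keyed by sample id.
stored_at = {"ref-1": 1.0, "ref-2": 2.0, "new-1": 10.0, "new-2": 11.0, "new-3": 12.0}
second_set = select_second_set(["ref-1", "ref-2"], stored_at, 2)
```

Here the two reference samples are always kept, the two newest other samples (`new-3`, `new-2`) are added, and the oldest non-reference sample (`new-1`) is left out.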

Claim 4

Original Legal Text

4. The method of claim 3, wherein the reference samples comprise samples from a registration process for the particular user and the most recent stored samples comprise samples from queries spoken by the particular user.

Plain English Translation

This invention relates to voice recognition systems, specifically improving the accuracy of user authentication based on voice samples. The problem addressed is the variability in a user's voice over time, which can lead to authentication failures. This claim specifies where the two parts of the second set of samples in claim 3 come from: the reference samples are samples collected during the user's initial registration (enrollment) process, and the most recently stored samples are samples taken from queries the user has spoken. The registration samples stay in the set as a stable baseline, while the query samples track how the user currently sounds. A hotword detection model built from this combined set can therefore follow natural variations in the user's voice without drifting away from the originally registered voice, which reduces false rejections of the user while still distinguishing the user from other speakers.

Claim 5

Original Legal Text

5. The method of claim 1, comprising: in response to obtaining the second set of samples, deleting a sample in the first set of samples but not in the second set of samples.

Plain English Translation

This claim adds a storage clean-up step to the hotword detection method of claim 1. Once the second set of samples has been obtained, any sample that belonged to the first set but was not carried over into the second set is deleted. Only the samples still used to build the current hotword detection model are retained, so the storage used for a user's voice samples stays bounded as new samples are added, and superseded recordings of the user speaking the hotword are not kept. Because the deletion is triggered by obtaining the second set, the clean-up happens each time the set of samples is updated.
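The deletion step is a set difference followed by removal from storage. The sketch assumes stored samples live in a dictionary keyed by a hypothetical sample identifier, with raw audio bytes as values; `prune_samples` is an illustrative name, not an API from the patent.

```python
from typing import Dict, Iterable, Set


def prune_samples(store: Dict[str, bytes], first_set: Iterable[str],
                  second_set: Iterable[str]) -> Set[str]:
    """Delete samples that were in the first set but not in the second set."""
    stale = set(first_set) - set(second_set)
    for sample_id in stale:
        store.pop(sample_id, None)  # ignore ids that are already gone
    return stale


store = {"a": b"audio-a", "b": b"audio-b", "c": b"audio-c", "new": b"audio-new"}
removed = prune_samples(store, first_set=["a", "b", "c"], second_set=["b", "c", "new"])
```

Only sample `a` is deleted; the new sample is kept because it belongs to the second set even though it was never in the first.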

Claim 6

Original Legal Text

6. The method of claim 1, wherein determining that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises: generating the first hotword detection model using the first set of samples; inputting the utterance into the first hotword detection model; and determining that the first hotword detection model has classified the utterance as including the particular user speaking the hotword.

Plain English Translation

This invention relates to voice recognition systems, specifically improving hotword detection accuracy by using personalized models. The problem addressed is the difficulty of reliably detecting hotwords (e.g., "Hey Siri," "OK Google") when spoken by different users, as general models often struggle with variations in voice, accent, or environment. The solution involves creating a personalized hotword detection model for each user. First, a set of audio samples is collected where the user repeatedly speaks the hotword. These samples are used to train a first hotword detection model specific to that user. When an utterance is received, it is processed by this personalized model. If the model classifies the utterance as containing the user's voice speaking the hotword, the system confirms the detection. This approach enhances accuracy by adapting to individual voice characteristics, reducing false positives or negatives compared to generic models. The method may also involve updating the model over time with additional samples to maintain performance as the user's voice changes. This technique is particularly useful in smart assistants, voice-controlled devices, and security systems where precise user identification is critical.
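The three steps (generate a model from the samples, input the utterance, read the classification) can be illustrated with a deliberately simple stand-in model. The patent does not say what kind of model is used; this sketch swaps in a toy one that averages the sample feature vectors into a centroid and accepts an utterance whose cosine similarity to that centroid meets a threshold. The class name, the feature vectors, and the 0.9 threshold are all assumptions.

```python
import math
from typing import List, Sequence


class CentroidHotwordModel:
    """Toy user-specific hotword model: mean sample vector plus a cosine threshold."""

    def __init__(self, samples: Sequence[Sequence[float]], threshold: float = 0.9):
        if not samples:
            raise ValueError("at least one sample is required")
        dim = len(samples[0])
        # "Generating the model" here simply means averaging the sample vectors.
        self.centroid: List[float] = [sum(s[i] for s in samples) / len(samples)
                                      for i in range(dim)]
        self.threshold = threshold

    def score(self, utterance: Sequence[float]) -> float:
        """Cosine similarity between the utterance and the centroid."""
        dot = sum(a * b for a, b in zip(self.centroid, utterance))
        norm = math.hypot(*self.centroid) * math.hypot(*utterance)
        return dot / norm if norm else 0.0

    def classify(self, utterance: Sequence[float]) -> bool:
        """True if the utterance is classified as this user speaking the hotword."""
        return self.score(utterance) >= self.threshold


model = CentroidHotwordModel([[1.0, 0.0], [0.9, 0.1]])
```

An utterance such as `[1.0, 0.05]` points almost the same way as the centroid and is accepted, while `[0.0, 1.0]` is nearly orthogonal to it and is rejected. A production system would use a trained acoustic model in place of the centroid.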

Claim 7

Original Legal Text

7. The method of claim 1, wherein determining that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword comprises: generating the second hotword detection model using the second set of samples; inputting the second utterance into the second hotword detection model; and determining that the second hotword detection model has classified the second utterance as including the particular user speaking the hotword.

Plain English Translation

This invention relates to voice recognition systems, specifically improving hotword detection accuracy for individual users. The problem addressed is distinguishing a user's voice speaking a hotword from other voices or background noise, so that voice-controlled devices activate reliably. This claim details how the second detection in claim 1 is performed, mirroring the steps that claim 6 uses for the first detection. A second hotword detection model is generated from the second set of samples, that is, the set containing the newly stored sample and fewer than all of the samples from the first set. The second utterance is input into this second model, and it is treated as the particular user speaking the hotword when the second model classifies it that way. Because the second model is built from an updated set of samples rather than simply a larger one, it reflects how the user has spoken the hotword recently, which helps the particular user's voice keep triggering the hotword reliably while reducing false activations from other speakers.

Claim 8

Original Legal Text

8. The computer-implemented method of claim 1, comprising: receiving a second new sample from a server; and determining that a third utterance includes the particular user speaking the hotword based at least on a third set of samples that includes the second new sample from the server and less than all the samples in the second set of samples.

Plain English Translation

This invention relates to voice recognition systems, specifically methods for detecting a hotword spoken by a particular user. This claim extends the method of claim 1 so that new samples can also arrive from a server. The system receives a second new sample from a server and forms a third set of samples that includes this server-provided sample and fewer than all of the samples in the second set. It then uses the third set to determine that a third utterance includes the particular user speaking the hotword. The set of samples therefore keeps rolling forward whether a new sample is captured locally or delivered by the server, so detection adapts to the user's recent speech while the number of samples used stays bounded. This is useful in applications such as voice assistants, where quick and accurate hotword detection is critical.

Claim 9

Original Legal Text

9. The method of claim 1, comprising: receiving, from a server, indications of samples in a third set of samples; determining samples that are in the third set of samples that are not locally stored; providing a request to server for the samples in the third set of samples that are not locally stored; and receiving the samples that are not locally stored from the server in response to the request.

Plain English Translation

This claim describes how a device keeps its copy of the user's hotword samples synchronized with a server. The server sends indications (for example, identifiers) of the samples that make up a third set of samples. The device compares these indications against the samples it already stores locally, sends the server a request for only the samples in the third set that it does not have, and receives those missing samples in response. Samples the device already holds are not downloaded again, which reduces bandwidth use while keeping the local sample set current with the set maintained by the server.
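The device side of this exchange can be sketched as a filter over the indicated identifiers followed by one fetch for the missing ones. The sketch assumes samples are keyed by identifier, and it models the network request as a caller-supplied `fetch` function; `sync_samples`, `fake_fetch`, and the in-memory server store are hypothetical stand-ins, not a real protocol.

```python
from typing import Callable, Dict, Iterable, List


def sync_samples(local: Dict[str, bytes], indicated_ids: Iterable[str],
                 fetch: Callable[[List[str]], Dict[str, bytes]]) -> List[str]:
    """Download only the indicated samples that are not already stored locally."""
    missing = [sid for sid in indicated_ids if sid not in local]
    if missing:
        # A single request for every missing sample; `fetch` stands in for the network call.
        local.update(fetch(missing))
    return missing


# Hypothetical in-memory server used for illustration.
server_store = {"s1": b"one", "s2": b"two", "s3": b"three"}
calls: List[List[str]] = []


def fake_fetch(ids: List[str]) -> Dict[str, bytes]:
    calls.append(list(ids))
    return {sid: server_store[sid] for sid in ids}


local = {"s1": b"one"}
missing = sync_samples(local, ["s1", "s2", "s3"], fake_fetch)
```

The device already holds `s1`, so it issues a single request for `s2` and `s3` and stores what comes back.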

Claim 10

Original Legal Text

10. The method of claim 1, comprising: providing the first set of samples to a voice-enabled device to enable the voice-enabled device to detect whether the particular user says the hotword, wherein determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises receiving an indication that the voice-enabled device detected that the particular user said the hotword.

Plain English Translation

A voice recognition system improves hotword detection accuracy by personalizing detection to a specific user, addressing the challenge of distinguishing that user's voice from background noise or other speakers on voice-enabled devices. This claim covers a deployment in which detection runs on a separate voice-enabled device. The first set of samples of the particular user speaking the hotword is provided to the voice-enabled device, which uses those samples to detect whether the particular user says the hotword. Determining that an utterance includes the particular user speaking the hotword then consists of receiving an indication from the voice-enabled device that it detected the particular user saying the hotword. Because the device works from the user's own samples, its detection adapts to the user's unique voice patterns, reducing false activations from other speakers or ambient noise.

Claim 11

Original Legal Text

11. The method of claim 1, comprising: generating a hotword detection model using the first set of samples; and providing the hotword detection model to a voice-enabled device to enable the voice-enabled device to detect whether the particular user says the hotword, wherein determining that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises receiving an indication that the voice-enabled device detected that the particular user said the hotword.

Plain English Translation

This invention relates to voice recognition technology, specifically improving hotword detection accuracy for individual users of voice-enabled devices. The problem addressed is that generic hotword models trained on diverse user data can produce false positives or missed activations for a particular user. This claim is a variant of claim 10 in which the voice-enabled device receives a trained model instead of raw samples. A hotword detection model is generated from the first set of samples of the particular user speaking the hotword and provided to the voice-enabled device, such as a smart speaker or virtual assistant. The device uses the model to detect whether the particular user says the hotword, and determining that an utterance includes the particular user speaking the hotword consists of receiving an indication from the device that it made such a detection. Generating the model off the device means the voice-enabled device only has to run it, and because the model is built from the user's own samples, the device responds reliably to that user rather than to other speakers.

Claim 12

Original Legal Text

12. The method of claim 1, comprising: receiving, from a voice-enabled device, a request for a current set of samples for detecting whether the particular user said the hotword; determining samples in the current set of samples that are not locally stored by the voice-enabled device; and providing, to the voice-enabled device, an indication of the samples in the current set of samples and the samples in the current set of samples that are not locally stored by the voice-enabled device.

Plain English Translation

A method for managing voice recognition samples in a voice-enabled device involves updating a set of samples used to detect whether a user has spoken a specific hotword. The voice-enabled device sends a request for the current set of samples to a remote system. The remote system identifies which samples in the current set are not already stored locally on the voice-enabled device. The system then provides the voice-enabled device with an indication of the samples in the current set, including those that are not locally available. This allows the device to update its local storage with the necessary samples for accurate hotword detection. The method ensures that the voice-enabled device has the most recent and relevant samples without requiring a full download of all samples, optimizing storage and bandwidth usage. The approach is particularly useful in environments where network connectivity may be limited or where device storage is constrained. The system dynamically manages sample updates to maintain efficient and reliable hotword detection performance.
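On the server side, answering the device's request reduces to listing the current set and flagging which of its samples the device lacks. The claim does not say how the server learns what the device stores; this sketch assumes the device reports its locally stored sample identifiers with the request. The function name and the response keys `current` and `not_stored_on_device` are hypothetical.

```python
from typing import Dict, Iterable, List


def build_sample_indication(current_ids: List[str],
                            device_local_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Server-side reply to a device's request for the current set of samples."""
    on_device = set(device_local_ids)
    return {
        "current": list(current_ids),
        # Preserve the order of the current set when listing what the device lacks.
        "not_stored_on_device": [sid for sid in current_ids if sid not in on_device],
    }


reply = build_sample_indication(["s1", "s2", "s3"], device_local_ids=["s0", "s1"])
```

The device already has `s1`, so the reply flags only `s2` and `s3`; the stale `s0` on the device is simply absent from the current set.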

Claim 13

Original Legal Text

13. A system comprising: one or more computers; and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: determining, by the one or more computers, that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword; in response to determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from the first set of samples of the particular user speaking the hotword, storing at least a portion of the utterance as a new sample; obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples; determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword; and in response to determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword, recognizing the second utterance as having been spoken by the particular user.

Plain English Translation

The system operates in the domain of voice recognition, specifically for detecting and verifying user-specific hotwords in spoken utterances. The problem addressed is the need for accurate and personalized hotword detection, where a hotword is a predefined wake word or command used to activate a voice-controlled system. Existing systems often rely on generic models that may not adapt well to individual users' voices, leading to false positives or negatives. The system uses a personalized hotword detection model trained on samples of a specific user speaking the hotword. Initially, a first hotword detection model is generated from a first set of samples of the user speaking the hotword. When an utterance is detected as containing the hotword based on this model, a portion of the utterance is stored as a new sample. A second set of samples is then formed, which includes this new sample and a subset of the original samples. A second hotword detection model is generated from this second set. When a subsequent utterance is detected as containing the hotword using this updated model, the system recognizes the utterance as being spoken by the particular user. This approach allows the system to continuously refine its hotword detection model based on new samples, improving accuracy over time. The system ensures that the hotword detection remains personalized and adapts to variations in the user's voice.

Claim 14

Original Legal Text

14. The system of claim 13, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises: selecting a predetermined number of recently stored samples as the second set of samples.

Plain English Translation

This invention relates to a system for processing a user's speech samples to improve hotword detection accuracy. The problem addressed is keeping a user-specific model up to date while managing computational resources efficiently. This claim applies the sample-selection rule of claim 2 to the system of claim 13. The system stores samples of the user speaking the hotword over time. When it forms the second set of samples, it selects a predetermined number of the most recently stored samples, so the newly stored sample is included and the oldest samples from the first set are left out. The second set is then used to generate the updated hotword detection model for that user. Capping the set at a fixed size keeps the model adapted to the user's recent speech while limiting the number of samples that must be stored and processed each time a model is generated. This is particularly useful in voice assistants and other personalized systems where a user's speech patterns may evolve over time.

Claim 15

Original Legal Text

15. The system of claim 13, wherein obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples comprises: selecting both a predetermined number of most recently stored samples and a set of reference samples to combine together as the second set of samples.

Plain English Translation

The system relates to speech recognition technology, specifically keeping speaker-specific hotword detection accurate over time. Voice characteristics may change due to aging, illness, or environmental conditions, and a model built only from static samples can degrade as the speaker's voice evolves. This claim applies the sample-selection rule of claim 3 to the system of claim 13: the second set of samples is formed by combining a predetermined number of the most recently stored samples with a set of reference samples. The recent samples, which include the new sample, keep the set current, while the reference samples provide a stable baseline; the resulting second set contains fewer than all of the samples in the first set. This lets speaker verification adapt to gradual voice changes while preserving the established characteristics of the speaker's voice. The system is particularly useful in applications requiring long-term speaker recognition, such as personalized voice assistants.

Claim 16

Original Legal Text

16. The system of claim 15, wherein the reference samples comprise samples from a registration process for the particular user and the most recent stored samples comprise samples from queries spoken by the particular user.

Plain English Translation

This claim applies the definitions of claim 4 to the system of claim 15. The reference samples in the second set are samples collected during a registration process for the particular user, and the most recently stored samples are samples from queries the user has spoken, such as utterances that were accepted as the user speaking the hotword. Keeping the registration samples anchors the model to the user's registered voice, while the query samples let it follow gradual changes in how the user sounds. This reduces false rejections of the legitimate user without letting the model drift toward other speakers.

Claim 17

Original Legal Text

17. The system of claim 13, the operations comprising: in response to obtaining the second set of samples, deleting a sample in the first set of samples but not in the second set of samples.

Plain English Translation

This claim adds the storage clean-up step of claim 5 to the system of claim 13. After the second set of samples is obtained, the system deletes any sample that was in the first set but is not in the second set. Only the samples used to generate the current hotword detection model are retained, which keeps storage bounded as new samples are added and ensures that superseded recordings of the user speaking the hotword are not kept.

Claim 18

Original Legal Text

18. The system of claim 13, wherein determining that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword comprises: generating the first hotword detection model using the first set of samples; inputting the utterance into the first hotword detection model; and determining that the first hotword detection model has classified the utterance as including the particular user speaking the hotword.

Plain English Translation

This invention relates to voice recognition systems, specifically to improving hotword detection accuracy with personalized models. General hotword models can struggle to detect hotwords (e.g., wake words) reliably across different users because voice, accent, and speaking style vary. Claim 18 specifies how the first detection step of claim 13 is carried out: the system generates the first hotword detection model from the first set of samples of the particular user speaking the hotword, inputs the utterance into that model, and determines that the model has classified the utterance as including that user speaking the hotword. Because the model is built from the user's own samples, it adapts to individual voice characteristics, which can reduce false positives and false negatives compared with a generic model and lets the system distinguish between different users speaking the same hotword. The system may also score an utterance against several models, such as a general hotword model alongside user-specific models, to make detection more robust.
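The three steps in claim 18 (generate a model from samples, input the utterance, read the classification) can be illustrated with a very simple template model. The patent text here does not say what kind of model is used; this sketch assumes each utterance is already a fixed-length feature vector and uses the mean of the samples as a template with a cosine-similarity threshold. `TemplateHotwordModel`, `generate_model`, and the 0.8 threshold are hypothetical. A real system might instead use a trained speaker-verification network, but the generate/input/classify flow would be the same.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]  # hypothetical: fixed-length features for one utterance


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


@dataclass
class TemplateHotwordModel:
    """Hypothetical user-specific model: the mean of the user's hotword samples."""

    centroid: Vector
    threshold: float

    def classify(self, utterance: Sequence[float]) -> bool:
        """True if the utterance is close enough to this user's hotword template."""
        return cosine_similarity(self.centroid, utterance) >= self.threshold


def generate_model(samples: Sequence[Sequence[float]], threshold: float = 0.8) -> TemplateHotwordModel:
    """Build the model from the user's samples (element-wise mean)."""
    if not samples:
        raise ValueError("need at least one sample")
    centroid = [sum(col) / len(samples) for col in zip(*samples)]
    return TemplateHotwordModel(centroid, threshold)


first_set = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1]]
model = generate_model(first_set)         # step 1: generate the first model
print(model.classify([0.95, 0.1, 0.05]))  # steps 2-3: input and classify -> True
print(model.classify([0.0, 1.0, 0.0]))    # a dissimilar utterance -> False
```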

Claim 19

Original Legal Text

19. The system of claim 13, wherein determining that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword comprises: generating the second hotword detection model using the second set of samples; inputting the second utterance into the second hotword detection model; and determining that the second hotword detection model has classified the second utterance as including the particular user speaking the hotword.

Plain English Translation

This invention relates to voice recognition systems, specifically to improving hotword detection accuracy for individual users. General models may fail to distinguish between speakers or to cope with background noise, which makes it hard to reliably detect a user-specific hotword in varying acoustic environments. Claim 19 specifies how the second detection step of claim 13 is carried out: the system generates the second hotword detection model from the second set of samples (the newly stored sample plus some, but not all, of the earlier samples), inputs the second utterance into that model, and determines that the model has classified the second utterance as including the particular user speaking the hotword. Because the second model is rebuilt from an updated sample set, detection tracks the user's current voice, so the model remains effective even if that voice changes subtly over time. The invention targets real-time, user-specific hotword detection for more reliable voice-activated commands in devices such as smart speakers or virtual assistants.
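Claim 19 applies the same generate/input/classify steps to the second sample set. The sketch below relies on hypothetical assumptions (feature vectors, a centroid template, a cosine threshold of 0.9, made-up numbers) and shows why rebuilding the model matters: an utterance that has drifted away from the original enrollment samples is rejected by the first model but accepted by the second. Dropping the oldest sample to form the second set is likewise an assumption.

```python
import math
from typing import List, Sequence

Vector = List[float]  # hypothetical: fixed-length features for one utterance


def generate_hotword_model(samples: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical model: the element-wise mean (centroid) of the sample vectors."""
    if not samples:
        raise ValueError("need at least one sample")
    return [sum(col) / len(samples) for col in zip(*samples)]


def classifies_as_user(model: Sequence[float], utterance: Sequence[float], threshold: float = 0.9) -> bool:
    """True if the cosine similarity between template and utterance meets the threshold."""
    dot = sum(m * u for m, u in zip(model, utterance))
    norm = math.hypot(*model) * math.hypot(*utterance)
    return norm > 0 and dot / norm >= threshold


first_set = [[1.0, 0.0], [1.0, 0.2], [1.0, 0.4]]
new_sample = [1.0, 0.6]                    # accepted earlier by the first model
second_set = first_set[1:] + [new_sample]  # new sample plus fewer than all old ones

first_model = generate_hotword_model(first_set)
second_model = generate_hotword_model(second_set)

second_utterance = [1.0, 0.9]
print(classifies_as_user(first_model, second_utterance))   # False
print(classifies_as_user(second_model, second_utterance))  # True
```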

Claim 20

Original Legal Text

20. A non-transitory computer-readable medium storing instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: determining, by one or more computers, that an utterance includes a particular user speaking a hotword based at least on a first hotword detection model generated from a first set of samples of the particular user speaking the hotword; in response to determining that an utterance includes a particular user speaking a hotword based at least on the first hotword detection model generated from the first set of samples of the particular user speaking the hotword, storing at least a portion of the utterance as a new sample; obtaining a second set of samples of the particular user speaking the utterance, where the second set of samples includes the new sample and less than all the samples in the first set of samples; determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword based at least on a second hotword detection model generated from the second set of samples of the user speaking the hotword; and in response to determining, by the one or more computers, that a second utterance includes the particular user speaking the hotword, recognizing the second utterance as having been spoken by the particular user.

Plain English Translation

This invention relates to voice recognition systems, specifically improving hotword detection accuracy for individual users. The problem addressed is the variability in how different users pronounce hotwords, leading to false positives or negatives in voice-activated systems. The solution involves a personalized hotword detection model that adapts over time. The system first determines whether an utterance contains a hotword spoken by a particular user, using a hotword detection model trained on a set of samples of that user speaking the hotword. When a hotword is detected, at least part of the utterance is stored as a new sample. A second set of samples is then created, combining the new sample with some (but not all) of the original samples. A new hotword detection model is generated from this second set. When another utterance is detected as containing the hotword using this updated model, the system recognizes the utterance as being spoken by the particular user. This approach allows the system to continuously refine its hotword detection model based on real-world usage, improving accuracy over time without requiring manual retraining. The system dynamically updates its training data by retaining some older samples while incorporating new ones, ensuring the model remains relevant to the user's current speech patterns.
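Putting the steps of claim 20 together gives an adapt-on-accept loop. This is a minimal sketch under the same hypothetical assumptions as a simple template approach (precomputed feature vectors, a centroid template, a cosine threshold); `AdaptiveHotwordVerifier` and its `process` method are hypothetical names. Dropping the oldest sample on each acceptance is one way to guarantee that the second set contains the new sample and fewer than all of the first set's samples; the claim itself does not say which older samples are removed.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]  # hypothetical: fixed-length features for one utterance


def centroid(samples: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(samples) for col in zip(*samples)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


class AdaptiveHotwordVerifier:
    """Hypothetical per-user verifier whose template follows recent accepted samples."""

    def __init__(self, user_id: str, enrollment: Sequence[Vector], threshold: float = 0.9) -> None:
        if not enrollment:
            raise ValueError("need at least one enrollment sample")
        self.user_id = user_id
        self.threshold = threshold
        self.samples: List[Vector] = [list(s) for s in enrollment]
        self.model = centroid(self.samples)  # first hotword detection model

    def process(self, utterance: Vector) -> Optional[str]:
        """Return the user ID if the utterance is recognized as this user, else None."""
        if cosine(self.model, utterance) < self.threshold:
            return None
        # Accepted: store the utterance as a new sample, drop the oldest stored
        # sample, and regenerate the model from the resulting second set.
        self.samples = self.samples[1:] + [list(utterance)]
        self.model = centroid(self.samples)
        return self.user_id


verifier = AdaptiveHotwordVerifier("alice", [[1.0, 0.0], [1.0, 0.2], [1.0, 0.4]])
results = [verifier.process(u) for u in ([1.0, 0.9], [1.0, 0.6], [1.0, 0.9])]
print(results)  # [None, 'alice', 'alice']
```

The first `[1.0, 0.9]` is too far from the enrollment template, but once `[1.0, 0.6]` is accepted and the model is rebuilt, the same utterance passes. Because only accepted utterances are added, the template moves only toward inputs the current model already recognizes, so any drift happens in small, verified steps.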

Patent Metadata

Filing Date

April 18, 2018

Publication Date

December 3, 2019

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Multi-user authentication on a device” (US-10497364). https://patentable.app/patents/US-10497364

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-10497364. See llms.txt for full attribution policy.
