10629192

Intelligent Personalized Speech Recognition

Published: April 21, 2020
Assignee: not available in USPTO data we have
Technical Abstract

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A video game system comprising: an audio input interface configured to receive audio data from one or more audio input devices; a display output configured for transmitting graphics data; a data store including computer-executable instructions; and a processor configured to execute the computer-executable instructions to perform a method comprising: determining a first phoneme mapping for each letter or combination of letters in a grammar training set; receiving audio data comprising a speech sample of a user reading a grammar training set; recognizing the speech sample using the first phoneme mapping; determining a first confidence score indicative of how accurately the grammar training set is recognized using the speech sample using the first phoneme mapping; changing the first phoneme mapping to generate a mutated phoneme mapping until a confidence score generated using the mutated phoneme mapping satisfies a defined confidence threshold; storing, in a user profile in the data store, the mutated phoneme mapping that satisfies the defined confidence threshold, wherein the mutated phoneme mapping is associated with the user in the user profile; receiving subsequent audio data from the user; accessing the user profile to retrieve the mutated phoneme mapping that is associated with the user; and using the mutated phoneme mapping to recognize the subsequent audio data received from the user.

Plain English Translation

A video game system is designed to improve speech recognition accuracy for individual users by dynamically adapting phoneme mappings based on their unique speech patterns. The system addresses the problem of generic speech recognition models failing to accurately interpret speech due to variations in pronunciation, accent, or speech patterns among users. The system includes an audio input interface to capture speech samples, a display output for graphics, a data store for storing instructions and user profiles, and a processor executing the instructions. The processor first determines an initial phoneme mapping for letters or letter combinations in a grammar training set. The user reads the training set aloud, and the system analyzes the speech sample using the initial phoneme mapping, generating a confidence score reflecting recognition accuracy. The system then iteratively adjusts the phoneme mapping (mutating it) until the confidence score meets a predefined threshold. The optimized phoneme mapping is stored in a user profile. When the user provides subsequent speech input, the system retrieves the personalized phoneme mapping from the profile and uses it to recognize the new audio data, improving accuracy. This approach ensures that speech recognition adapts to the user's unique pronunciation patterns, enhancing interaction within the video game system.
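The flow recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the phoneme labels, the toy `confidence` function (which simply counts matching letter units), the `heard` dictionary standing in for the user's speech sample, and the in-memory `profiles` store are all hypothetical.

```python
from itertools import product

# Hypothetical phoneme inventory (ARPAbet-style labels, chosen for illustration).
PHONEMES = ["AA", "AE", "AH", "EY", "OW", "M", "T"]


def confidence(mapping, heard):
    """Toy confidence: share of letter units whose mapped phoneme matches the sample."""
    hits = sum(1 for unit, phoneme in heard.items() if mapping.get(unit) == phoneme)
    return hits / len(heard)


def personalize(first_mapping, heard, threshold, max_rounds=5):
    """Mutate the mapping until the confidence score satisfies the threshold."""
    mapping = dict(first_mapping)
    score = confidence(mapping, heard)
    for _ in range(max_rounds):
        for unit, phoneme in product(list(mapping), PHONEMES):
            if score >= threshold:
                return mapping, score
            mutated = {**mapping, unit: phoneme}  # change one letter's phoneme
            mutated_score = confidence(mutated, heard)
            if mutated_score > score:  # keep only mutations that score higher
                mapping, score = mutated, mutated_score
    return mapping, score


# Generic starting mapping versus what this (hypothetical) user actually said.
first_mapping = {"t": "T", "o": "OW", "m": "M", "a": "EY"}
heard = {"t": "T", "o": "AA", "m": "M", "a": "AH"}

# Store the mapping that met the threshold in the user's profile.
profiles = {}
mapping, score = personalize(first_mapping, heard, threshold=0.9)
profiles["player_one"] = {"phoneme_mapping": mapping}


def pronounce(letters, user_id):
    """Look up the stored mapping and apply it to later input from the same user."""
    stored = profiles[user_id]["phoneme_mapping"]
    return [stored[letter] for letter in letters]
```

A real recognizer would score audio against an acoustic model rather than compare dictionaries; the sketch only shows the control flow of mutate, score, store, and retrieve.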

Claim 2

Original Legal Text

2. The video game system of claim 1 , wherein the processor is configured to execute the computer-executable instructions to perform the method further comprising: receiving information indicative of an accent of the user; and selecting the first phoneme mapping based on the information indicative of an accent.

Plain English Translation

This dependent claim adds an accent-aware starting point to the video game system of claim 1. Before personalization begins, the system receives information indicative of the user's accent and uses that information to select the first phoneme mapping. Starting from a mapping that already reflects the user's likely accent can give the mutation process of claim 1 a closer initial guess than a generic mapping would. The claim does not specify how the accent information is obtained or how many candidate mappings exist; it requires only that the first phoneme mapping be selected based on that information.

Claim 3

Original Legal Text

3. The video game system of claim 2 , wherein the information indicative of an accent includes at least one of: an age of the user; a gender of the user; a current location of the user; or a location where the user lived.

Plain English Translation

This dependent claim narrows claim 2 by listing what the information indicative of an accent may include. The information includes at least one of the following: the user's age, the user's gender, the user's current location, or a location where the user lived. Any one of these items is enough to satisfy the claim. As in claim 2, this information serves to select the first phoneme mapping used for speech recognition, so that the personalization process of claim 1 starts from a mapping suited to the user's likely accent.
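As a rough illustration of claims 2 and 3, the following Python sketch selects a first phoneme mapping from the kinds of information claim 3 lists. Everything here is an assumption made for the example: the `AccentInfo` fields, the two regional baseline mappings, and the rule that a place where the user lived takes priority over the current location. The patent text shown here does not specify any of these details.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical baseline mappings; the phoneme choices are illustrative only.
DEFAULT_MAPPING = {"a": "AE", "r": "R"}
BASELINES = {
    "US": {"a": "AE", "r": "R"},
    "GB": {"a": "AA", "r": ""},  # assumed: broad "a", dropped "r"
}


@dataclass
class AccentInfo:
    """Information indicative of an accent (any field may be missing)."""
    age: Optional[int] = None
    gender: Optional[str] = None
    current_location: Optional[str] = None
    lived_location: Optional[str] = None


def select_first_mapping(info):
    """Pick a starting mapping; this sketch uses only the location fields."""
    # Assumed priority: where the user lived, then where the user is now.
    for region in (info.lived_location, info.current_location):
        if region in BASELINES:
            return dict(BASELINES[region])
    return dict(DEFAULT_MAPPING)


chosen = select_first_mapping(AccentInfo(current_location="US", lived_location="GB"))
fallback = select_first_mapping(AccentInfo(age=30))
```

Age and gender are carried in the record but unused here; a fuller sketch could key the baseline table on them as well.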

Claim 4

Original Legal Text

4. The video game system of claim 1 , wherein the processor is configured to execute the computer-executable instructions to perform the method further comprising: changing a phoneme associated with one or more letters in the grammar training set; recognizing the speech sample using the mutated phoneme mapping; determining a second confidence score based on an analysis of the speech sample using the mutated phoneme mapping; comparing the second confidence score to the first confidence score; and based at least in part on the comparison of the second confidence score to the first confidence score, undoing a previous mutation and performing a different mutation to generate the mutated phoneme mapping.

Plain English Translation

This dependent claim adds a trial-and-error refinement step to the video game system of claim 1. The system changes the phoneme associated with one or more letters in the grammar training set, producing a mutated phoneme mapping, and recognizes the user's speech sample again with that mapping. It then determines a second confidence score and compares it with the first confidence score obtained from the initial mapping. Based at least in part on that comparison, the system undoes the previous mutation and performs a different mutation. In practice, a mutation that fails to improve recognition can be rolled back and replaced with another candidate, so the mapping is refined step by step toward the user's actual pronunciation.
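The undo step in claim 4 can be pictured as follows. This is a hedged Python sketch, not the patent's algorithm: the `score` function, the `heard` dictionary standing in for the speech sample, and the fixed list of candidate mutations are hypothetical, and the rule "undo unless the second score is strictly higher" is one possible reading of "based at least in part on the comparison".

```python
def score(mapping, heard):
    """Toy confidence: share of letter units whose phoneme matches the sample."""
    hits = sum(1 for unit, phoneme in heard.items() if mapping.get(unit) == phoneme)
    return hits / len(heard)


def refine(first_mapping, heard, candidates):
    """Try each candidate mutation; undo it when the score does not improve."""
    mapping = dict(first_mapping)
    best = score(mapping, heard)  # first confidence score
    log = []
    for unit, phoneme in candidates:
        previous = mapping[unit]
        mapping[unit] = phoneme  # apply the mutation
        second = score(mapping, heard)  # second confidence score
        if second > best:
            best = second
            log.append(("kept", unit, phoneme))
        else:
            mapping[unit] = previous  # undo, then move on to a different mutation
            log.append(("undone", unit, phoneme))
    return mapping, best, log


first_mapping = {"a": "AE", "o": "OW"}
heard = {"a": "AA", "o": "OW"}
candidates = [("a", "AH"), ("a", "AA")]

refined, best, log = refine(first_mapping, heard, candidates)
```

In this run the first mutation leaves the score unchanged and is undone; the second raises it and is kept.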

Claim 5

Original Legal Text

5. The video game system of claim 1 , wherein changing the first phoneme mapping to generate the mutated phoneme mapping until the confidence score generated using the mutated phoneme mapping satisfies the defined confidence threshold comprises: changing a phoneme associated with one or more letters in the grammar training set; recognizing the speech sample using the mutated phoneme mapping; determining a second confidence score based on an analysis of the speech sample using the mutated phoneme mapping; comparing the second confidence score to the first confidence score; and determining that the grammar training set is more accurately recognized using the mutated phoneme mapping as compared to using the first phoneme mapping.

Plain English Translation

This invention relates to a video game system that improves speech recognition accuracy by dynamically adjusting phoneme mappings. The system addresses the problem of inaccurate speech recognition in games, particularly when players use non-standard pronunciations or dialects. The core technology involves a method for optimizing phoneme mappings in a grammar training set to enhance recognition confidence. The system starts with an initial phoneme mapping and evaluates its performance by analyzing a speech sample. A confidence score is generated based on how well the speech sample matches the expected grammar. If the confidence score falls below a defined threshold, the system iteratively modifies the phoneme mapping by altering phonemes associated with specific letters in the training set. Each modification creates a mutated phoneme mapping, which is then tested against the speech sample to produce a new confidence score. The system compares this new score to the previous one and determines whether the mutated mapping improves recognition accuracy. If the mutated mapping yields a higher confidence score, it replaces the original mapping, ensuring more accurate speech recognition over time. This adaptive approach allows the system to better handle variations in pronunciation, improving player experience in voice-controlled games.

Claim 6

Original Legal Text

6. The video game system of claim 1 , wherein changing the first phoneme mapping comprises randomly changing one or more phoneme mappings.

Plain English Translation

This dependent claim specifies how the first phoneme mapping of claim 1 is changed: the system randomly changes one or more phoneme mappings. Rather than following a fixed rule for which letter or combination of letters to reassign, the system picks the change at random and then evaluates the resulting mutated phoneme mapping against the user's speech sample as described in claim 1. The randomness applies only to the search for a mapping that recognizes the user's speech more accurately.
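A random mutation of the kind claim 6 recites might look like the Python sketch below. The function name, the phoneme inventory, and the choice to mutate exactly `count` letter units are assumptions for illustration; the claim itself says only that one or more phoneme mappings are changed randomly.

```python
import random

# Hypothetical phoneme inventory for the example.
PHONEMES = ["AA", "AE", "AH", "EY", "OW"]


def random_mutation(mapping, phonemes, rng, count=1):
    """Return a copy of the mapping with `count` randomly chosen units reassigned."""
    mutated = dict(mapping)
    units = rng.sample(sorted(mutated), k=min(count, len(mutated)))
    for unit in units:
        # Exclude the current phoneme so every selected unit really changes.
        choices = [p for p in phonemes if p != mutated[unit]]
        mutated[unit] = rng.choice(choices)
    return mutated


first_mapping = {"a": "AE", "e": "EY", "o": "OW"}
mutated = random_mutation(first_mapping, PHONEMES, random.Random(0))
```

Passing in a seeded `random.Random` keeps a run reproducible, which is convenient when comparing confidence scores across mutations.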

Claim 7

Original Legal Text

7. The video game system of claim 1 , wherein the first confidence score is a per phoneme confidence score.

Plain English Translation

This dependent claim specifies that the first confidence score of claim 1 is a per-phoneme confidence score. Rather than only a single score for the whole grammar training set, the system scores recognition at the level of individual phonemes. The claim does not say how the per-phoneme scores are used, but a score at this granularity can indicate which specific letter-to-phoneme assignments match the user's speech poorly and are therefore natural candidates for mutation.
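To make the idea of a per-phoneme confidence score concrete, here is a small Python sketch. It assumes, purely for illustration, that a recognizer reports a probability for each candidate phoneme at each position (`frame_scores`); using the weakest phoneme to decide what to mutate next is also an assumption, since claim 7 does not say how the scores are used.

```python
def per_phoneme_confidence(expected, frame_scores):
    """Pair each expected phoneme with the score the recognizer gave it."""
    return [
        (phoneme, scores.get(phoneme, 0.0))
        for phoneme, scores in zip(expected, frame_scores)
    ]


# Expected phonemes from the current mapping, and hypothetical recognizer output.
expected = ["M", "EY", "T"]
frame_scores = [
    {"M": 0.9},
    {"EY": 0.2, "AH": 0.7},  # the user seems to say "AH" where "EY" was expected
    {"T": 0.8},
]

per_phoneme = per_phoneme_confidence(expected, frame_scores)

# The lowest-scoring phoneme is a natural candidate for the next mutation.
weakest = min(per_phoneme, key=lambda item: item[1])
```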

Claim 8

Original Legal Text

8. A method for voice recognition as implemented by a computing system configured with specific computer-executable instructions, the method comprising: receiving a grammar training set; receiving a speech sample of a user reading at least a first portion of the grammar training set; determining a first phoneme mapping for each letter or combination of letters in the first portion of the grammar training set; recognizing the speech sample using the first phoneme mapping; determining a first confidence score indicative of how accurately the recognized speech sample corresponds to the first portion of the grammar training set by using the first phoneme mapping; changing the first phoneme mapping to generate a mutated phoneme mapping until a confidence score generated using the mutated phoneme mapping satisfies a defined confidence threshold; storing, in a user profile in a data store, the mutated phoneme mapping that satisfies the defined confidence threshold, wherein the mutated phoneme mapping is associated with the user in the user profile; receiving subsequent audio data from the user; accessing the user profile to retrieve the mutated phoneme mapping that is associated with the user; and using the mutated phoneme mapping to recognize the subsequent audio data received from the user.

Plain English Translation

This invention relates to voice recognition systems that adapt to individual users by optimizing phoneme mappings for improved accuracy. The problem addressed is the variability in how different users pronounce words, which leads to recognition errors in conventional systems that rely on generic phoneme mappings. The method receives a grammar training set and a speech sample of the user reading at least a first portion of it. It determines a first phoneme mapping for each letter or combination of letters in that portion, recognizes the speech sample using this mapping, and calculates a first confidence score indicating how accurately the recognized speech corresponds to the training text. The phoneme mapping is then changed (mutated) until a confidence score generated with the mutated mapping satisfies a defined threshold. The mutated mapping that satisfies the threshold is stored in a user profile, associated with the user, and later retrieved to recognize subsequent audio data from the same user. Recognition is thereby tailored to the user's own pronunciation patterns.

Claim 9

Original Legal Text

9. The method of claim 8 , further comprising: receiving information indicative of an accent of the user; and selecting the first phoneme mapping based on the information indicative of an accent.

Plain English Translation

This dependent claim adds an accent-aware starting point to the method of claim 8. The method receives information indicative of the user's accent and selects the first phoneme mapping based on that information. The claim does not require the accent to be detected from audio or compared against stored accent profiles; it requires only that information indicative of an accent be received and used for the selection. Choosing a starting mapping that already reflects the user's likely accent can reduce how far the mutation process of claim 8 has to move the mapping before the confidence threshold is satisfied.

Claim 10

Original Legal Text

10. The method of claim 9 , wherein the information indicative of an accent includes at least one of: an age of the user; a gender of the user; a current location of the user; or a location where the user lived.

Plain English Translation

This dependent claim narrows claim 9 by listing what the information indicative of an accent may include. The information includes at least one of the following: the user's age, the user's gender, the user's current location, or a location where the user lived. Any one of these items is enough to satisfy the claim. As in claim 9, the information is used to select the first phoneme mapping; the claim does not recite extracting accent features from audio.

Claim 11

Original Legal Text

11. The method of claim 8 , further comprising: changing a phoneme associated with one or more letters in the grammar training set; recognizing the speech sample using the mutated phoneme mapping; determining a second confidence score based on an analysis of the speech sample using the mutated phoneme mapping; comparing the second confidence score to the first confidence score; and based at least in part on the comparison of the second confidence score to the first confidence score, performing a different mutation to generate the mutated phoneme mapping.

Plain English Translation

This dependent claim adds a refinement step to the method of claim 8. The method changes the phoneme associated with one or more letters in the grammar training set, producing a mutated phoneme mapping, and recognizes the speech sample again using that mapping. It determines a second confidence score from this analysis and compares it with the first confidence score. Based at least in part on that comparison, the method performs a different mutation to generate the mutated phoneme mapping. Unlike the corresponding system claim 4, this claim does not expressly recite undoing the previous mutation; it recites only that a different mutation is performed in response to the comparison.

Claim 12

Original Legal Text

12. The method of claim 8 , wherein changing the first phoneme mapping to generate the mutated phoneme mapping until the confidence score generated using the mutated phoneme mapping satisfies the defined confidence threshold comprises: changing a phoneme associated with one or more letters in the grammar training set; recognizing the speech sample using the mutated phoneme mapping; determining a second confidence score based on an analysis of the speech sample using the mutated phoneme mapping; comparing the second confidence score to the first confidence score; and determining that the grammar training set is more accurately recognized using the mutated phoneme mapping as compared to using the first phoneme mapping.

Plain English Translation

This invention relates to improving speech recognition accuracy by dynamically adjusting phoneme mappings in a grammar training set. The problem addressed is the inherent variability in how different speakers pronounce words, which can lead to recognition errors when using fixed phoneme mappings. The solution involves iteratively modifying phoneme mappings to optimize recognition confidence. The method begins with an initial phoneme mapping for a grammar training set, which defines how letters or letter combinations correspond to phonemes. A speech sample is recognized using this mapping, and a confidence score is generated based on the recognition result. If this score does not meet a predefined threshold, the phoneme mapping is altered by changing the phoneme associated with one or more letters in the training set. The speech sample is then recognized again using this mutated mapping, and a new confidence score is calculated. The mutated mapping is retained if it yields a higher confidence score than the original, indicating improved recognition accuracy. This process repeats until the confidence score meets the threshold, ensuring the phoneme mapping is optimized for the specific speech sample. The method ensures that the grammar training set is more accurately recognized using the refined phoneme mapping compared to the initial version.

Claim 13

Original Legal Text

13. The method of claim 8 , wherein changing the first phoneme mapping comprises randomly changing one or more phoneme mappings.

Plain English Translation

This dependent claim specifies how the first phoneme mapping of claim 8 is changed: the method randomly changes one or more phoneme mappings. The letter or combination of letters to reassign, and the phoneme assigned to it, are chosen at random rather than by a fixed rule. Each randomly mutated mapping is then evaluated against the user's speech sample as described in claim 8, and the process continues until the confidence score satisfies the defined threshold. The claim concerns the search for a better recognition mapping and does not address speech synthesis.

Claim 14

Original Legal Text

14. The method of claim 8 , further comprising: detecting a word in the subsequent audio data, wherein the word is not included in the grammar training set.

Plain English Translation

This dependent claim adds one step to the method of claim 8: detecting, in the subsequent audio data, a word that is not included in the grammar training set. Because the mutated phoneme mapping is defined for letters and combinations of letters rather than for whole words, it can be applied to words the user never read during training. The claim therefore covers recognition that extends beyond the training vocabulary. It recites only the detection step and does not specify any further handling of such words.
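One way to see why a letter-level mapping can reach words outside the training set is the Python sketch below. It is an assumption-laden illustration: the greedy longest-match segmentation, the `to_phonemes` helper, and the sample mapping are hypothetical and are not described in the claim.

```python
def to_phonemes(word, mapping):
    """Spell a word out as phonemes using greedy longest-match on letter units."""
    units = sorted(mapping, key=len, reverse=True)  # try "sh" before "s"
    phonemes, i = [], 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                phonemes.append(mapping[unit])
                i += len(unit)
                break
        else:
            raise KeyError(f"no phoneme mapping for {word[i]!r}")
    return phonemes


# Hypothetical personalized mapping learned from a training set containing "ship".
training_words = {"ship"}
user_mapping = {"sh": "SH", "s": "S", "h": "HH", "i": "IH", "p": "P", "o": "AA"}

# "shop" was never read during training, yet the mapping still yields a pronunciation.
new_word = "shop"
expected_pronunciation = to_phonemes(new_word, user_mapping)
```

A recognizer could compare such an expected phoneme sequence with what it hears in later audio, which is how a word absent from the training set might still be detected.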

Claim 15

Original Legal Text

15. A non-transitory, computer-readable storage medium storing computer readable instructions that, when executed by one or more processors in a computing device, causes the computing device to perform operations comprising: receiving, through a network connection port, a speech sample of a user reading a grammar training set; determining a first phoneme mapping for each letter or combination of letters in the grammar training set; recognizing the speech sample using the first phoneme mapping; determining a first confidence score indicative of how accurately the grammar training set is recognized using the speech sample using the first phoneme mapping; changing the first phoneme mapping to generate a mutated phoneme mapping until a confidence score generated using the mutated phoneme mapping satisfies a defined confidence threshold; storing, in a user profile in a data store, the mutated phoneme mapping that satisfies the defined confidence threshold, wherein the mutated phoneme mapping is associated with the user in the user profile; receiving subsequent audio data from the user; accessing the user profile to retrieve the mutated phoneme mapping that is associated with the user; and using the mutated phoneme mapping to recognize the subsequent audio data received from the user.

Plain English Translation

This claim covers a non-transitory, computer-readable storage medium holding instructions for speech recognition that adapts to individual users by optimizing phoneme mappings. The problem addressed is the variability in how different users pronounce letters or combinations of letters, which can reduce the accuracy of standard speech recognition models. When executed, the instructions cause a computing device to receive, through a network connection port, a speech sample of a user reading a grammar training set. The device determines a first phoneme mapping for each letter or combination of letters in the training set, recognizes the speech sample using this mapping, and determines a first confidence score reflecting recognition accuracy. It then changes (mutates) the phoneme mapping until a confidence score generated with the mutated mapping satisfies a defined threshold. The mutated mapping that satisfies the threshold is stored in a user profile, associated with the user, and later retrieved to recognize subsequent audio data from the same user. This lets recognition adapt to individual pronunciation.

Claim 16

Original Legal Text

16. The computer-readable storage medium of claim 15 , wherein the computer readable instructions are further configured to cause the computing device to perform operations comprising: receiving information indicative of an accent of the user; and selecting the first phoneme mapping based on the information indicative of an accent.

Plain English Translation

This dependent claim adds an accent-aware starting point to the storage medium of claim 15. The stored instructions further cause the computing device to receive information indicative of the user's accent and to select the first phoneme mapping based on that information. The claim does not say whether the information is provided by the user or inferred, nor does it describe a database of accent-specific mappings; it requires only that the first phoneme mapping be selected based on the accent information. The claim concerns speech recognition, in line with claim 15, and does not extend to speech synthesis.

Claim 17

Original Legal Text

17. The computer-readable storage medium of claim 16 , wherein the information indicative of an accent includes at least one of: an age of the user; a gender of the user; a current location of the user; or a location where the user lived.

Plain English Translation

This dependent claim narrows claim 16 by listing what the information indicative of an accent may include. The information includes at least one of the following: the user's age, the user's gender, the user's current location, or a location where the user lived. Any one of these items is enough to satisfy the claim. As in claim 16, the information is used to select the first phoneme mapping; the claim does not recite tracking changes in the user's accent over time.

Claim 18

Original Legal Text

18. The computer-readable storage medium of claim 15 , wherein the computer readable instructions are further configured to cause the computing device to perform operations comprising: changing a phoneme associated with one or more letters in the grammar training set; recognizing the speech sample using the mutated phoneme mapping; determining a second confidence score based on an analysis of the speech sample using the mutated phoneme mapping; comparing the second confidence score to the first confidence score; and based at least in part on the comparison of the second confidence score to the first confidence score, undo a previous mutation and performing a different mutation to generate the mutated phoneme mapping.

Plain English Translation

Automatic speech recognition (ASR) systems rely on phoneme mappings to convert spoken language into text, and a generic mapping can perform poorly for users whose pronunciation or dialect differs from it. This claim adds a trial-and-error refinement step to the storage medium of claim 15. The grammar training set is text that the user reads aloud; the system recognizes the resulting speech sample using a first phoneme mapping and obtains a first confidence score. The system then mutates the mapping by changing a phoneme associated with one or more letters in the training set, recognizes the same speech sample with the mutated mapping, and determines a second confidence score. It compares the second score to the first. Based at least in part on that comparison, the system undoes the previous mutation and performs a different one to generate the mutated phoneme mapping, for example when the second score shows no improvement. In this way, unhelpful changes are discarded rather than accumulated while the mapping is adapted to the user's pronunciation.
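The mutate, score, compare, and undo cycle described in this claim can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `mutate_with_undo`, the `candidates` table of alternative phonemes, and the `score` callback standing in for a real recognizer's confidence score are all assumptions introduced here.

```python
import random
from typing import Callable, Dict, List

Mapping = Dict[str, str]  # letter or letter combination -> phoneme symbol


def mutate_with_undo(
    mapping: Mapping,
    candidates: Dict[str, List[str]],   # hypothetical: alternative phonemes per letter group
    score: Callable[[Mapping], float],  # hypothetical stand-in for recognizer confidence
    rng: random.Random,
    max_tries: int = 50,
) -> Mapping:
    """Try single-phoneme mutations, undoing any that do not beat the first score."""
    current = dict(mapping)             # work on a copy; the input is left untouched
    first_score = score(current)
    for _ in range(max_tries):
        letters = rng.choice(sorted(candidates))
        previous = current[letters]
        current[letters] = rng.choice(candidates[letters])  # the mutation
        second_score = score(current)
        if second_score > first_score:
            return current              # mutation helped: keep it
        current[letters] = previous     # undo, then loop to try a different mutation
    return current                      # no helpful mutation found within max_tries


# Toy setup: the "confidence" is the fraction of entries matching a hidden target.
target = {"a": "AE", "th": "D"}
start = {"a": "AH", "th": "TH"}
options = {"a": ["AH", "AE", "EY"], "th": ["TH", "D", "T"]}


def toy_score(m: Mapping) -> float:
    return sum(m[k] == target[k] for k in target) / len(target)


result = mutate_with_undo(start, options, toy_score, random.Random(0))
```

Because every rejected mutation is reverted before the next attempt, the returned mapping differs from the starting one in at most one entry and never scores lower than it.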

Claim 19

Original Legal Text

19. The computer-readable storage medium of claim 15, wherein changing the first phoneme mapping to generate the mutated phoneme mapping until the confidence score generated using the mutated phoneme mapping satisfies the defined confidence threshold comprises: changing a phoneme associated with one or more letters in the grammar training set; recognizing the speech sample using the mutated phoneme mapping; determining a second confidence score based on an analysis of the speech sample using the mutated phoneme mapping; comparing the second confidence score to the first confidence score; and determining that the grammar training set is more accurately recognized using the mutated phoneme mapping as compared to using the first phoneme mapping.

Plain English Translation

This invention relates to speech recognition systems that improve accuracy by adjusting phoneme mappings for an individual speaker. The problem addressed is the variability in how different speakers pronounce words, which can reduce recognition accuracy when a fixed phoneme mapping is used. This claim spells out what the mutation step of claim 15 involves. The system starts with a first phoneme mapping and a first confidence score obtained by recognizing a speech sample of the user reading the grammar training set. It changes a phoneme associated with one or more letters in the training set, recognizes the same speech sample using the mutated mapping, and determines a second confidence score. It then compares the second score to the first and determines that the grammar training set is more accurately recognized with the mutated mapping than with the first mapping. Mutation continues until the confidence score satisfies the defined confidence threshold, so the resulting mapping is adapted to that user's pronunciation.
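The threshold-driven loop in this claim can be illustrated with a small deterministic search. The sketch below is written under stated assumptions and is not the method disclosed in the patent: `adapt_until_threshold`, the `candidates` table, and the `confidence` callback are hypothetical names, and a simple greedy sweep is used in place of whatever mutation strategy the patent actually employs.

```python
from typing import Callable, Dict, List, Tuple

Mapping = Dict[str, str]  # letter or letter combination -> phoneme symbol


def adapt_until_threshold(
    first_mapping: Mapping,
    candidates: Dict[str, List[str]],        # hypothetical alternatives per letter group
    confidence: Callable[[Mapping], float],  # hypothetical stand-in for recognizer confidence
    threshold: float,
    max_rounds: int = 10,
) -> Tuple[Mapping, float]:
    """Keep mutations that raise the confidence score; stop once the threshold is met."""
    best = dict(first_mapping)
    best_score = confidence(best)            # first confidence score
    for _ in range(max_rounds):
        improved = False
        for letters, phonemes in candidates.items():
            for phoneme in phonemes:
                if best_score >= threshold:
                    return best, best_score  # threshold satisfied: done
                trial = dict(best)
                trial[letters] = phoneme     # change one phoneme (the mutation)
                trial_score = confidence(trial)  # second confidence score
                if trial_score > best_score:     # more accurately recognized
                    best, best_score = trial, trial_score
                    improved = True
        if not improved:
            break                            # no mutation helps any further
    return best, best_score


# Toy setup: confidence is the fraction of entries matching a hidden target.
target = {"a": "AE", "th": "D", "r": "R"}
first = {"a": "AH", "th": "TH", "r": "R"}
options = {"a": ["AH", "AE", "EY"], "th": ["TH", "D", "T"], "r": ["R", "W"]}


def toy_confidence(m: Mapping) -> float:
    return sum(m[k] == target[k] for k in target) / len(target)


adapted, final_score = adapt_until_threshold(first, options, toy_confidence, threshold=0.9)
```

In a real system the mapping returned here would be what gets stored in the user profile; the toy confidence function only exists to make the loop observable.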

Claim 20

Original Legal Text

20. The computer-readable storage medium of claim 15, wherein the first confidence score is a per word confidence score.

Plain English Translation

This claim narrows the computer-readable storage medium of claim 15 by specifying that the first confidence score is a per word confidence score. In claim 15, the first confidence score indicates how accurately the grammar training set is recognized from the user's speech sample using the first phoneme mapping. Under this claim, that score is determined for individual words rather than only for the speech sample as a whole. A per word score can show which words of the training set are recognized poorly, which may help direct changes to the phoneme mapping toward the letters or letter combinations in those words. The claim itself adds only the per word limitation; it does not specify how the score is computed.
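As a rough illustration of what a per word confidence score makes possible, the sketch below assumes a recognizer that returns one (word, confidence) pair per word of the training set. That return format, the sample values, and the helper names `low_confidence_words` and `overall_confidence` are assumptions made for this example and do not come from the patent.

```python
from typing import List, Tuple

WordScore = Tuple[str, float]  # (recognized word, confidence between 0 and 1)


def low_confidence_words(word_scores: List[WordScore], threshold: float) -> List[str]:
    """Return the words whose individual confidence falls below the threshold."""
    return [word for word, conf in word_scores if conf < threshold]


def overall_confidence(word_scores: List[WordScore]) -> float:
    """Average the per word scores into one sample-level score (0.0 if empty)."""
    if not word_scores:
        return 0.0
    return sum(conf for _, conf in word_scores) / len(word_scores)


# Made-up scores for a user reading the phrase "open the door".
scores = [("open", 0.92), ("the", 0.41), ("door", 0.88)]
weak = low_confidence_words(scores, threshold=0.6)
average = overall_confidence(scores)
```

Here only "the" falls below the threshold, so a mutation step could concentrate on the letter combinations in that word instead of changing the mapping at random.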

Patent Metadata

Filing Date

Unknown

Publication Date

April 21, 2020

Inventors

David Gershon Streat

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “INTELLIGENT PERSONALIZED SPEECH RECOGNITION” (10629192). https://patentable.app/patents/10629192
