US-11527248

Speech recognition with parallel recognition tasks

Published: December 13, 2022
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have

Technical Abstract

The subject matter of this specification can be embodied in, among other things, a method that includes receiving an audio signal and initiating speech recognition tasks by a plurality of speech recognition systems (SRS's). Each SRS is configured to generate a recognition result specifying possible speech included in the audio signal and a confidence value indicating a confidence in a correctness of the speech result. The method also includes completing a portion of the speech recognition tasks including generating one or more recognition results and one or more confidence values for the one or more recognition results, determining whether the one or more confidence values meets a confidence threshold, aborting a remaining portion of the speech recognition tasks for SRS's that have not generated a recognition result, and outputting a final recognition result based on at least one of the generated one or more speech results.
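
The flow described in the abstract (start several recognizers in parallel, stop waiting once a result is confident enough, abort the rest) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: `recognize_in_parallel`, `make_fake_srs`, the tuple layout, and the rule of returning the highest-confidence result are all assumptions, and the recognizers are simulated with timers.

```python
import asyncio


def make_fake_srs(text, confidence, delay):
    """Hypothetical stand-in for an SRS: returns a fixed result after `delay` seconds."""
    async def run(audio):
        await asyncio.sleep(delay)
        return text, confidence
    return run


async def recognize_in_parallel(audio, recognizers, threshold):
    """Run every recognizer on the same audio; abort the rest once one result
    meets the confidence threshold.

    recognizers: dict mapping an SRS name to an async callable(audio)
                 that returns (text, confidence).
    Returns ((name, text, confidence), sorted list of aborted SRS names).
    """
    tasks = {asyncio.create_task(fn(audio)): name for name, fn in recognizers.items()}
    pending = set(tasks)
    results = []
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            text, confidence = task.result()
            results.append((tasks[task], text, confidence))
        if any(confidence >= threshold for _, _, confidence in results):
            break  # a completed result is confident enough
    # Abort the recognizers that have not produced a result yet.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # Assumption: the final result is the most confident one generated so far.
    best = max(results, key=lambda r: r[2])
    return best, sorted(tasks[task] for task in pending)
```

A caller would run this with `asyncio.run(recognize_in_parallel(audio, recognizers, threshold))`. If no result ever meets the threshold, the loop simply waits for every recognizer and returns the most confident result.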

Patent Claims
9 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 2

Original Legal Text

2. The computer-implemented method of claim 1, wherein each of the plurality of same particular speech recognition results is obtained from a different one of the subset of the SRSs.

Plain English Translation

This claim concerns speech recognition that runs several speech recognition systems (SRSs) on the same audio input and looks for agreement among them. Claim 1, which is not reproduced on this page, refers to a plurality of identical recognition results produced by a subset of the SRSs. Claim 2 adds one requirement: each of those identical results is obtained from a different SRS in the subset. Agreement therefore means agreement between distinct systems, not a repeated result from a single system. The rationale is that a result reached independently by several recognizers is more likely to be correct than a result reported by only one, so relying on consensus reduces the effect of errors made by any individual SRS.
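
As a rough illustration of the "same result from different SRSs" condition, the sketch below groups results by recognized text and keeps only texts produced by at least two distinct systems. The function name `group_agreeing_results` and the `(srs_id, text, confidence)` tuple layout are hypothetical choices made for this example, not taken from the patent.

```python
from collections import defaultdict


def group_agreeing_results(results):
    """Group recognition results by text, one entry per distinct SRS.

    results: iterable of (srs_id, text, confidence) tuples (assumed layout).
    Returns {text: {srs_id: confidence}} for texts that at least two
    different SRSs produced.
    """
    groups = defaultdict(dict)
    for srs_id, text, confidence in results:
        # Keep only the first result per SRS, so every agreeing result
        # in a group comes from a different SRS.
        groups[text].setdefault(srs_id, confidence)
    return {text: by_srs for text, by_srs in groups.items() if len(by_srs) >= 2}
```

A repeated result from the same SRS does not count as extra agreement here, which is the point of the claim's "different one of the subset" wording.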

Claim 3

Original Legal Text

3. The computer-implemented method of claim 1, wherein a combination of the confidence values is further weighted based on a distribution of the confidence values obtained from the subset of the SRSs that generated the plurality of same particular speech recognition results.

Plain English Translation

This claim concerns speech recognition that runs several speech recognition systems (SRSs) on the same audio input, each producing a recognition result and a confidence value. Confidence scores from different SRSs can vary widely even when the systems agree on the recognized text, which makes a simple combination of those scores unreliable. When several SRSs in a subset produce the same recognition result, their confidence values are combined, and the combination is further weighted according to how those confidence values are distributed across the agreeing SRSs. For example, an implementation might treat tightly clustered confidence values differently from widely scattered ones; the claim itself does not say how the distribution maps to a weight. The aim is a more consistent confidence assessment for a result on which multiple SRSs agree.
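
One possible way to weight a combination by the distribution of the confidence values is to discount the mean confidence when the values are widely spread. The sketch below assumes exactly that; the function name, the use of the population standard deviation, and the `penalty` parameter are illustrative assumptions, and the claim does not prescribe any particular formula.

```python
from statistics import mean, pstdev


def combine_with_spread_penalty(confidences, penalty=1.0):
    """Combine confidence values from agreeing SRSs, discounted by their spread.

    Assumption: the combination is the mean, and the distribution-based
    weight is 1 - penalty * (population standard deviation), floored at 0.
    """
    base = mean(confidences)
    spread = pstdev(confidences)
    weight = max(0.0, 1.0 - penalty * spread)
    return base * weight
```

Under this assumed rule, three SRSs that all report 0.8 keep a combined confidence of 0.8, while reports of 0.6, 0.8, and 1.0 have the same mean but a lower weighted score because they disagree about how sure they are.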

Claim 4

Original Legal Text

4. The computer-implemented method of claim 1, wherein a combination of the confidence values is further weighted based on one or more characteristics of the subset of the SRSs that generated the plurality of same particular speech recognition results.

Plain English Translation

This claim concerns speech recognition that runs a plurality of speech recognition systems (SRSs) on the same input speech and combines their confidence values. Different SRSs perform differently, so treating their confidence values as equally trustworthy can give unreliable results. When a subset of the SRSs produces the same recognition result, the confidence values associated with that result are combined, and the combination is further weighted based on one or more characteristics of the SRSs in that subset. Claim 5 lists the kinds of characteristics contemplated: overall, contextual, and temporal levels of accuracy for each SRS. Weighting by such characteristics can let agreement among stronger systems count for more than agreement among weaker ones. The technique is relevant to applications that need accurate speech recognition, such as virtual assistants, transcription services, and real-time communication systems.

Claim 5

Original Legal Text

5. The computer-implemented method of claim 4, wherein the one or more characteristics include one or more characteristics selected from a group consisting of one or more overall levels of accuracy for a respective SRS of the subset of the SRSs that generated the plurality of same particular speech recognition results, one or more contextual levels of accuracy within a context for the audio signal for the respective SRS, and one or more temporal levels of accuracy for one or more periods of time for the respective SRS.

Plain English Translation

This claim narrows claim 4 by naming the characteristics that can be used to weight the combined confidence values when several speech recognition systems (SRSs) produce the same result. The characteristics are selected from three kinds of accuracy for each agreeing SRS: its overall level of accuracy, its level of accuracy within the context of the audio signal, and its level of accuracy during one or more periods of time. The claim does not select a single best SRS. Instead, these accuracy levels determine how the combination of the agreeing systems' confidence values is weighted, so that the final confidence can reflect how well each system performs in general, in the current context, and at the relevant time.
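
The three kinds of accuracy named in claim 5 could feed a weight along the following lines. This is a sketch under stated assumptions: the `SrsProfile` class, its field names, the averaging of the three accuracy levels, and the weighted-mean combination are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SrsProfile:
    """Hypothetical accuracy record for one SRS (all values in [0, 1])."""
    overall: float
    by_context: dict = field(default_factory=dict)  # e.g. {"navigation": 0.9}
    by_period: dict = field(default_factory=dict)   # e.g. {"evening": 0.7}

    def weight(self, context, period):
        # Assumption: average the three accuracy levels, falling back to
        # the overall accuracy when a context or period is unknown.
        parts = [
            self.overall,
            self.by_context.get(context, self.overall),
            self.by_period.get(period, self.overall),
        ]
        return sum(parts) / len(parts)


def weighted_confidence(agreeing, profiles, context, period):
    """agreeing: {srs_id: confidence} for SRSs that produced the same result."""
    weights = {s: profiles[s].weight(context, period) for s in agreeing}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(agreeing[s] * weights[s] for s in agreeing) / total
```

With these assumed weights, a confident result from an SRS that is accurate in the current context pulls the combined confidence up more than the same score from a weaker SRS would.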

Claim 6

Original Legal Text

6. The computer-implemented method of claim 1, wherein a combination of the confidence values is further weighted based on a level of similarity between respective SRSs of the subset of the SRSs that generated the plurality of same particular speech recognition results.

Plain English Translation

This claim concerns refining confidence values based on the similarity between speech recognition systems (SRSs). Different SRSs can assign different confidence scores to the same speech input, which makes the scores hard to combine reliably. The method looks at a subset of SRSs that produce the same recognition result for a given input. Each SRS in the subset supplies a confidence value representing the likelihood that the result is correct. These confidence values are combined, and the combination is further weighted based on the level of similarity between the SRSs that agreed. The claim as quoted does not state whether greater similarity raises or lowers the weight. One natural reading is that agreement between dissimilar systems is stronger evidence than agreement between near-identical ones, because similar systems tend to make the same mistakes. The method is relevant wherever multiple SRSs are used to cross-validate results, such as automated transcription or voice-controlled applications.
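
Because the claim text shown here does not fix how similarity maps to a weight, the sketch below leaves that mapping as a pluggable function. Everything in it is an assumption made for illustration: the function names, the `frozenset`-keyed similarity table, and the use of mean pairwise similarity.

```python
from itertools import combinations


def mean_pairwise_similarity(srs_ids, similarity):
    """Average similarity over all pairs of agreeing SRSs.

    similarity: {frozenset({id_a, id_b}): value in [0, 1]} (assumed layout).
    """
    pairs = list(combinations(sorted(srs_ids), 2))
    if not pairs:
        return 0.0
    return sum(similarity[frozenset(pair)] for pair in pairs) / len(pairs)


def combine_with_similarity(agreeing, similarity, weight_fn):
    """agreeing: {srs_id: confidence}; weight_fn maps similarity to a multiplier."""
    base = sum(agreeing.values()) / len(agreeing)
    return base * weight_fn(mean_pairwise_similarity(agreeing, similarity))
```

Passing `lambda s: 1 - 0.5 * s` discounts agreement between similar systems, while `lambda s: 1 + 0.5 * s` rewards it; which direction is appropriate is a design decision the quoted claim leaves open.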

Claim 7

Original Legal Text

7. The computer-implemented method of claim 1, wherein a combination of the confidence values is further weighted based on error rates of the subset of the SRSs that generated the plurality of same particular speech recognition results.

Plain English Translation

This claim concerns weighting confidence values using the error rates of individual speech recognition systems (SRSs). Performance varies among SRSs, and some produce more reliable results than others. The method looks at a subset of SRSs that generate the same recognition result for a given input. Each SRS in the subset assigns a confidence value to its result, indicating its certainty. The combination of those confidence values is further weighted based on the error rates of the SRSs in the subset. The claim does not fix the weighting scheme; a typical choice would give SRSs with lower error rates more influence and SRSs with higher error rates less, so that the most reliable systems have the greatest effect on the final output. The method is relevant to ensemble speech recognition in which multiple SRSs process the same input.
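
A minimal sketch of error-rate weighting, assuming each agreeing SRS is weighted by one minus its error rate and the weighted mean of the confidence values is taken. The function name, the data layout, and the weighting formula are assumptions for illustration and are not specified by the claim.

```python
def error_rate_weighted_confidence(agreeing, error_rates):
    """Weighted mean of confidence values from SRSs that agree on a result.

    agreeing:    {srs_id: confidence}
    error_rates: {srs_id: error rate in [0, 1]} (assumed to be known in advance)
    """
    # Assumption: weight = 1 - error rate, so reliable systems count for more.
    weights = {s: 1.0 - error_rates[s] for s in agreeing}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(agreeing[s] * weights[s] for s in agreeing) / total
```

With confidences of 0.9 and 0.5 and error rates of 0.1 and 0.4, the weighted result sits closer to the more reliable system's 0.9 than the plain average of 0.7 does.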

Claim 9

Original Legal Text

9. The device of claim 8, wherein each of the plurality of same particular speech recognition result is obtained from a different one of the subset of the SRSs.

Plain English Translation

This claim is the device counterpart of claim 2 and concerns a device that uses multiple speech recognition systems (SRSs). Performance varies among SRSs, which can lead to inconsistent or inaccurate transcriptions of spoken language. Claim 8, which is not reproduced on this page, refers to a plurality of identical recognition results produced by a subset of the SRSs. Claim 9 requires that each of those identical results be obtained from a different SRS in the subset, so the agreement is between distinct systems. Because a single SRS may struggle with certain accents, background noise, or complex vocabulary, agreement across distinct systems is stronger evidence that a result is correct. The approach can be applied in real-time applications such as virtual assistants, transcription services, or automated customer support systems.

Claim 10

Original Legal Text

10. The device of claim 8, wherein a combination of the confidence values is further weighted based on one or more characteristics of the subset of the SRSs that generated the plurality of same particular speech recognition results.

Plain English Translation

This claim is the device counterpart of claim 4. Accuracy varies among speech recognition systems (SRSs), which can lead to inconsistent or unreliable transcriptions. The device uses multiple SRSs to process the same input speech and generate a plurality of recognition results. When a subset of the SRSs produces the same recognition result, the confidence values associated with that result are combined, and the combination is weighted based on one or more characteristics of the SRSs in that subset. Accuracy levels are one example of such characteristics. The weighting adjusts the combined confidence to reflect the reliability of the systems that agreed, so that more reliable systems can have a greater influence on the final result.

Claim 11

Original Legal Text

11. The device of claim 8, wherein a combination of the confidence values is further weighted based on a level of similarity between respective SRSs of the subset of the SRSs that generated the plurality of same particular speech recognition results.

Plain English Translation

This claim is the device counterpart of claim 6. The device processes an input audio signal using a subset of speech recognition systems (SRSs), each generating a recognition result and an associated confidence value. When multiple SRSs produce the same recognition result, their confidence values are combined, and the combination is further weighted based on the level of similarity between the agreeing SRSs. The claim as quoted does not state whether greater similarity raises or lowers the weight. The weighting lets the final confidence account both for the fact that systems agree and for how independent that agreement is, which can reduce errors on ambiguous or noisy audio.

Patent Metadata

Filing Date

May 27, 2020

Publication Date

December 13, 2022

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Speech recognition with parallel recognition tasks” (US-11527248). https://patentable.app/patents/US-11527248

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-11527248. See llms.txt for full attribution policy.