10553200

System and Methods for Correcting Text-To-Speech Pronunciation

Published: February 4, 2020
Assignee: not available in USPTO data we have

Patent Claims
12 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A text-to-speech (TTS) server comprising one or more processors in communication with one or more memory devices, the TTS server configured to: generate, for a plurality of first user devices, a first machine pronunciation of text data according to at least one phonetic rule; receive crowdsource data comprising a plurality of pronunciation corrections of the first machine pronunciation from a plurality of audio input devices of the plurality of first user devices, wherein the plurality of first user devices are located in a first geographic location at a time of submission of the pronunciation corrections; generate a second machine pronunciation of the text data by augmenting the at least one phonetic rule based on the crowdsource data; receive, from a second user device, subsequent to generation of the second machine pronunciation, a TTS request including the text data; determine whether the second user device is located within the first geographic location; and provide, via an audio output device of the second user device, one of (i) the first machine pronunciation in response to the second user device being located outside the first geographic location, and (ii) the second machine pronunciation in response to the second user device being located within the first geographic location.

Plain English Translation

A text-to-speech (TTS) server corrects its pronunciation using crowdsourced corrections from users in a specific geographic location. The server, which has one or more processors and memory devices, first generates a machine pronunciation of some text for a group of first user devices according to at least one phonetic rule. Users of those devices submit pronunciation corrections through the devices' audio inputs, and the devices are located in a first geographic location when the corrections are submitted. The server then augments the phonetic rule based on this crowdsourced data to produce a second machine pronunciation of the same text. When a second user device later sends a TTS request containing that text, the server determines whether the device is within the first geographic location. If it is, the device plays the second (corrected) pronunciation; if it is outside, the device plays the first (original) pronunciation. The corrected pronunciation is therefore served only in the location where the corrections originated, and users elsewhere continue to hear the original.
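The location-gated choice between the two pronunciations can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `select_pronunciation`, the region labels, and the phoneme strings are all hypothetical, and the claim does not say how pronunciations or locations are represented.

```python
def select_pronunciation(text, device_region, default, regional):
    """Pick the pronunciation to play for a TTS request.

    default:  dict mapping text -> first (rule-based) pronunciation
    regional: dict mapping (text, region) -> second (corrected) pronunciation
    All names and data shapes here are assumptions made for illustration.
    """
    key = (text, device_region)
    if key in regional:
        # Requesting device is inside the region where corrections originated.
        return regional[key]
    # Device is outside that region: fall back to the original pronunciation.
    return default[text]


# Illustrative data only; the phoneme strings are made up.
default = {"example": "EH G Z AE M P AH L"}
regional = {("example", "region-A"): "IH G Z AA M P AH L"}

inside = select_pronunciation("example", "region-A", default, regional)
outside = select_pronunciation("example", "region-B", default, regional)
```

Keying the corrected pronunciation on the pair (text, region) means a device outside the region never sees it, which mirrors the two branches of the final step of the claim.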

Claim 2

Original Legal Text

2. The TTS server of claim 1 further configured to assign one of the pronunciation corrections submitted by one of the plurality of first user devices to a user profile associated with a user of the one of the plurality of first user devices.

Plain English Translation

This claim adds one feature to the TTS server of claim 1. The server assigns one of the pronunciation corrections submitted by one of the first user devices to a user profile associated with the user of that device. In other words, a submitted correction is linked to the profile of the person who submitted it. The claim does not specify how the profile is subsequently used, and it does not recite validating, rating, or prioritizing corrections.
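As a rough sketch of what linking a correction to a profile might look like, the Python below stores each correction on the submitting user's profile. The `Correction` and `UserProfile` classes, their fields, and the `assign_correction` helper are hypothetical; the claim only requires that a correction be assigned to a profile and does not define a data model.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """A submitted pronunciation correction (fields are assumptions)."""
    text: str
    phonemes: str
    region: str


@dataclass
class UserProfile:
    """Hypothetical profile holding the corrections a user has submitted."""
    user_id: str
    corrections: list = field(default_factory=list)


def assign_correction(profiles, user_id, correction):
    """Attach a correction to the submitting user's profile, creating it if needed."""
    profile = profiles.setdefault(user_id, UserProfile(user_id))
    profile.corrections.append(correction)
    return profile


profiles = {}
assign_correction(profiles, "user-1", Correction("example", "IH G Z AA M P AH L", "region-A"))
```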

Claim 3

Original Legal Text

3. The TTS server of claim 2, wherein the pronunciation correction is configured to override the at least one phonetic rule.

Plain English Translation

This claim narrows the TTS server of claim 2. The pronunciation correction that has been assigned to a user profile is configured to override the at least one phonetic rule used to generate the first machine pronunciation. Where the correction applies, it takes precedence over the pronunciation the rule would otherwise produce. The claim does not recite a separate correction module or any contextual analysis; it states only that the correction can override the rule.
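One plausible reading of "override" is a lookup order in which a stored correction is consulted before the phonetic rule runs. The sketch below assumes that reading; `pronounce`, `apply_rules`, and the toy `spell_out` rule are hypothetical names and are not taken from the patent.

```python
def spell_out(text):
    """Toy stand-in for a phonetic rule: spell the word letter by letter."""
    return "-".join(text.lower())


def pronounce(text, apply_rules, overrides):
    """Return the overriding correction if one exists, else apply the rule.

    overrides: dict mapping text -> corrected pronunciation (assumed shape).
    """
    if text in overrides:
        # The correction takes precedence over the rule-based output.
        return overrides[text]
    return apply_rules(text)


rule_based = pronounce("abc", spell_out, {})
overridden = pronounce("abc", spell_out, {"abc": "ay-bee-see"})
```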

Claim 4

Original Legal Text

4. The TTS server of claim 1 further configured to determine a current location of the plurality of first user devices via location services.

Plain English Translation

This claim adds one feature to the TTS server of claim 1. The server determines the current location of the first user devices, the devices that submit pronunciation corrections, by using location services. This is how the server can establish that the corrections came from the first geographic location. The claim does not name a particular positioning technology, and it does not recite adjusting voice characteristics, speed, emphasis, or content based on location.
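The claim leaves open how a reported location is matched to a geographic region. A simple assumption, used here purely for illustration, is that a region is a latitude/longitude bounding box and that location services return a coordinate pair. The `BoundingBox` class and the coordinates are hypothetical, and the box test does not handle regions that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical region model: a simple latitude/longitude rectangle."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        """True if the point lies inside the box (edges inclusive)."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Illustrative region and device coordinates (made up for this sketch).
region_a = BoundingBox(south=37.0, west=-86.0, north=39.0, east=-85.0)
device_inside = region_a.contains(38.25, -85.75)
device_outside = region_a.contains(40.0, -85.5)
```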

Claim 5

Original Legal Text

5. A computer-implemented method for correcting pronunciation in a text-to-speech (TTS) system, said method implemented using a TTS server in communication with one or more memory devices, said method comprising: generating, by the TTS server for a plurality of first user devices, a first machine pronunciation of text data according to at least one phonetic rule; receiving, by the TTS server, crowdsource data comprising a plurality of pronunciation corrections of the first machine pronunciation from a plurality of audio input devices of the plurality of first user devices, wherein the plurality of first user devices are located in a first geographic location at a time of submission of the pronunciation corrections; generating, by the TTS server, a second machine pronunciation of the text data by augmenting the at least one phonetic rule based on the crowdsource data; receiving, by the TTS server from a second user device, subsequent to generation of the second machine pronunciation, a TTS request including the text data; determining, by the TTS server, whether the second user device is located within the first geographic location; and providing, by the TTS server, via an audio output device of the second user device, one of (i) the first machine pronunciation in response to the second user device being located outside the first geographic location, and (ii) the second machine pronunciation in response to the second user device being located within the first geographic location.

Plain English Translation

This claim recites, as a computer-implemented method, the same steps that the server of claim 1 is configured to perform. A TTS server generates a first machine pronunciation of text data for a group of first user devices according to at least one phonetic rule. The server then receives crowdsourced pronunciation corrections from the audio input devices of those user devices, which are located in a first geographic location when the corrections are submitted. The server augments the phonetic rule based on this crowdsourced data to generate a second machine pronunciation of the text. When a second user device later sends a TTS request that includes the same text, the server determines whether that device is within the first geographic location. If it is, the server provides the second (corrected) pronunciation through the device's audio output; otherwise, it provides the first (original) pronunciation. The effect is that TTS output in the corrected region follows the pronunciation submitted by users there, while output elsewhere is unchanged.
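The claim does not say how many corrections are needed or how they are combined before the phonetic rule is augmented. As one possible approach, the sketch below assumes the audio corrections have already been converted to phoneme strings and adopts the most common one once it reaches a vote threshold. The function name, the `min_votes` parameter, and the threshold value are assumptions, not details from the patent.

```python
from collections import Counter
from typing import Optional


def aggregate_corrections(corrections, min_votes=3) -> Optional[str]:
    """Return the most frequently submitted correction, or None.

    corrections: list of phoneme strings from devices in one region
                 (assumes audio has already been transcribed to phonemes).
    min_votes:   hypothetical threshold before a correction is adopted.
    """
    if not corrections:
        return None
    phonemes, votes = Counter(corrections).most_common(1)[0]
    return phonemes if votes >= min_votes else None


adopted = aggregate_corrections(["A", "A", "B", "A"])
not_enough = aggregate_corrections(["A", "B"])
```

A result of `None` would leave the first machine pronunciation in place for that region until more corrections arrive.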

Claim 6

Original Legal Text

6. The method of claim 5 further comprising assigning one of the pronunciation corrections submitted by one of the plurality of first user devices to a user profile associated with a user of the one of the plurality of first user devices.

Plain English Translation

This claim adds one step to the TTS pronunciation-correction method of claim 5. One of the pronunciation corrections submitted by one of the first user devices is assigned to a user profile associated with the user of that device. The correction is thereby linked to the person who submitted it. The claim concerns text-to-speech output, not speech recognition or language learning, and it does not specify how the profile is used after the assignment.

Claim 7

Original Legal Text

7. The method of claim 6, wherein the pronunciation correction is configured to override the at least one phonetic rule.

Plain English Translation

This claim narrows the method of claim 6. The pronunciation correction assigned to a user profile is configured to override the at least one phonetic rule used to generate the first machine pronunciation. Where the correction applies, it takes precedence over the pronunciation the rule would otherwise produce. The claim does not recite a training phase, analysis of speaker identity, or speech recognition; it states only that the correction can override the rule.

Claim 8

Original Legal Text

8. The method of claim 5 further comprising determining a current location of the plurality of first user devices via location services.

Plain English Translation

This claim adds one step to the TTS pronunciation-correction method of claim 5. The method determines the current location of the first user devices, the devices that submit pronunciation corrections, by using location services. This is how the method can establish that the corrections came from the first geographic location. The claim does not name a particular positioning technology, and it does not recite remote device control, device authentication, or network management.

Claim 9

Original Legal Text

9. A non-transitory computer readable medium that includes computer executable instructions for correcting pronunciation in a text-to-speech (TTS) system, wherein when executed by a TTS server comprising at least one processor in communication with at least one memory device, the computer executable instructions cause the TTS server to: generate, for a plurality of first user devices, a first machine pronunciation of text data according to at least one phonetic rule; receive crowdsource data comprising a plurality of pronunciation corrections of the first machine pronunciation from a plurality of audio input devices of the plurality of first user devices, wherein the plurality of first user devices are located in a first geographic location at a time of submission of the pronunciation corrections; generate a second machine pronunciation of the text data by augmenting the at least one phonetic rule based on the crowdsource data; receive, from a second user device, subsequent to generation of the second machine pronunciation, a TTS request including the text data; determine whether the second user device is located within the first geographic location; and provide, via an audio output device of the second user device, one of (i) the first machine pronunciation in response to the second user device being located outside the first geographic location, and (ii) the second machine pronunciation in response to the second user device being located within the first geographic location.

Plain English Translation

This claim covers a non-transitory computer-readable medium storing instructions that cause a TTS server to perform the same steps as in claims 1 and 5. The server generates a first machine pronunciation of text data for a group of first user devices according to at least one phonetic rule. Users of those devices submit pronunciation corrections through the devices' audio inputs, and the devices are located in a first geographic location when the corrections are submitted. The server collects these corrections as crowdsourced data and augments the phonetic rule based on them to generate a second machine pronunciation of the text. When a second user device later sends a TTS request that includes the same text, the server determines whether that device is within the first geographic location. If it is, the second (corrected) pronunciation is provided through the device's audio output; otherwise, the first (original) pronunciation is provided.

Claim 10

Original Legal Text

10. The non-transitory computer readable medium of claim 9, wherein the computer executable instructions further cause the TTS computing device to assign one of the pronunciation corrections submitted by one of the plurality of first user devices to a user profile associated with a user of the one of the plurality of first user devices.

Plain English Translation

This claim adds one feature to the computer-readable medium of claim 9. The stored instructions further cause the TTS server (which this claim calls the "TTS computing device") to assign one of the pronunciation corrections submitted by one of the first user devices to a user profile associated with the user of that device. The correction is thereby linked to the person who submitted it. The claim does not recite analyzing corrections for common patterns or persisting adjustments across sessions.

Claim 11

Original Legal Text

11. The non-transitory computer readable medium of claim 10, wherein the pronunciation correction is configured to override the at least one phonetic rule.

Plain English Translation

This claim narrows the computer-readable medium of claim 10. The pronunciation correction assigned to a user profile is configured to override the at least one phonetic rule used to generate the first machine pronunciation. Where the correction applies, it takes precedence over the pronunciation the rule would otherwise produce. The claim concerns text-to-speech output rather than speech recognition, and it does not recite analyzing input speech or comparing it to a reference pronunciation.

Claim 12

Original Legal Text

12. The non-transitory computer readable medium of claim 9, wherein the computer executable instructions further cause the TTS server to determine a current location of the plurality of first user devices via location services.

Plain English Translation

This claim adds one feature to the computer-readable medium of claim 9. The stored instructions further cause the TTS server to determine the current location of the first user devices, the devices that submit pronunciation corrections, by using location services. This is how the server can establish that the corrections came from the first geographic location. The claim does not name a particular positioning technology, and it does not recite synchronizing speech across devices, navigation guidance, or localized announcements.

Patent Metadata

Filing Date

Unknown

Publication Date

February 4, 2020

Inventors

Jason Jay Lacoss-Arnold


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “SYSTEM AND METHODS FOR CORRECTING TEXT-TO-SPEECH PRONUNCIATION” (10553200). https://patentable.app/patents/10553200

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10553200. See llms.txt for full attribution policy.