A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method comprising: receiving an utterance; converting, via a processor, the utterance to text; and receiving a dual-tone multiple frequency tone indicating one of activating a function and deactivating the function; and when the dual-tone multiple frequency tone indicates activating the function: performing the function by analyzing the utterance, to yield an inflection pattern associated with the utterance and a characteristic of a user associated with the utterance; modifying the text based on the inflection pattern, to yield modified text; and displaying the modified text and the characteristic.
In this speech-to-text method, the system receives spoken input (an utterance), and a processor converts the utterance into text. The system also receives a DTMF (Dual-Tone Multi-Frequency) tone, like those produced by a telephone keypad, that signals either to activate or to deactivate a function. If the DTMF tone indicates activation, the system analyzes the utterance to determine the speaker's inflection pattern and a characteristic of the speaker, such as gender or accent. It then modifies the initial text based on the inflection pattern (for example, by adding punctuation) and displays both the modified text and the speaker characteristic.
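The DTMF-gated flow of claim 1 can be sketched as a small session object. This is a minimal illustration under stated assumptions, not the patented implementation: the digit assignments (`1` to activate, `0` to deactivate), the `transcribe` and `analyze` callables, and the rule that a rising inflection ends the text with a question mark are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical key assignments; the claims do not specify which digits are used.
ACTIVATE_DIGIT = "1"
DEACTIVATE_DIGIT = "0"


@dataclass
class Result:
    text: str                             # plain transcription
    modified_text: Optional[str] = None   # set only while the function is active
    characteristic: Optional[str] = None  # e.g. an accent or gender label


class Session:
    def __init__(
        self,
        transcribe: Callable[[bytes], str],
        analyze: Callable[[bytes], Tuple[str, str]],
    ) -> None:
        self.transcribe = transcribe  # hypothetical speech recognizer
        self.analyze = analyze        # hypothetical: returns (inflection, characteristic)
        self.active = False

    def on_dtmf(self, digit: str) -> None:
        """Activate or deactivate the analysis function from a keypad digit."""
        if digit == ACTIVATE_DIGIT:
            self.active = True
        elif digit == DEACTIVATE_DIGIT:
            self.active = False
        # Other digits are ignored in this sketch.

    def handle(self, audio: bytes) -> Result:
        text = self.transcribe(audio)
        if not self.active:
            return Result(text)
        inflection, characteristic = self.analyze(audio)
        # Placeholder modification rule: rising inflection -> question mark.
        ending = "?" if inflection == "rising" else "."
        return Result(text, text + ending, characteristic)


def display(result: Result) -> None:
    """Stand-in for the display step: print the text and the characteristic."""
    print(result.modified_text or result.text)
    if result.characteristic:
        print(f"[{result.characteristic}]")


if __name__ == "__main__":
    session = Session(lambda a: "is it raining", lambda a: ("rising", "accent: unknown"))
    display(session.handle(b""))  # function inactive: plain text only
    session.on_dtmf("1")
    display(session.handle(b""))  # function active: modified text plus characteristic
```

Keeping the activation state on the session means transcription always runs, while the inflection and speaker analysis runs only between the activating and deactivating tones.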
2. The method of claim 1, further comprising: receiving, via a dual-tone multiple frequency detector, an input indicating activation of the conversion of the utterance to the text.
The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes receiving, through a DTMF detector, an input indicating that the conversion of the utterance to text should be activated. In other words, a tone recognized by the detector, such as a telephone keypad press, can switch on the speech-to-text conversion itself.
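A DTMF detector like the one in claim 2 can be built on the Goertzel algorithm, which measures signal power at each of the eight standard DTMF frequencies (rows 697, 770, 852 and 941 Hz; columns 1209, 1336, 1477 and 1633 Hz) and picks the strongest row and column. The sketch below assumes 8 kHz telephony audio supplied as a list of floats and uses a hypothetical dominance ratio of 10; a production receiver would also check twist, tone duration and minimum signal level.

```python
import math
from typing import List, Optional

# Standard DTMF frequency grid (Hz): each key is one row tone plus one column tone.
ROW_FREQS = [697, 770, 852, 941]
COL_FREQS = [1209, 1336, 1477, 1633]
KEYPAD = [
    ["1", "2", "3", "A"],
    ["4", "5", "6", "B"],
    ["7", "8", "9", "C"],
    ["*", "0", "#", "D"],
]


def goertzel_power(samples: List[float], sample_rate: int, freq: float) -> float:
    """Power of the DFT bin nearest `freq`, computed with the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq / sample_rate)  # nearest integer DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(
    samples: List[float],
    sample_rate: int = 8000,
    min_ratio: float = 10.0,  # hypothetical dominance ratio
) -> Optional[str]:
    """Return the detected key, or None if no single row/column tone dominates."""
    if not samples:
        return None
    rows = [goertzel_power(samples, sample_rate, f) for f in ROW_FREQS]
    cols = [goertzel_power(samples, sample_rate, f) for f in COL_FREQS]
    r = max(range(4), key=lambda i: rows[i])
    c = max(range(4), key=lambda i: cols[i])
    second_row = sorted(rows)[-2]
    second_col = sorted(cols)[-2]
    # Using <= also rejects silence, where every power is zero.
    if rows[r] <= min_ratio * second_row or cols[c] <= min_ratio * second_col:
        return None
    return KEYPAD[r][c]
```

A block of 205 samples at 8 kHz is a common choice for this kind of detector because it places the DTMF frequencies close to integer DFT bins.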
3. The method of claim 1, further comprising: identifying the user based on speech segments stored in a database.
The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes identifying the speaker. This identification is done by comparing speech segments from the current utterance to speech segments stored in a database of known users.
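Claim 3 does not say how the stored speech segments are compared. One simple possibility, sketched here under the assumption that each segment has already been reduced to a fixed-length feature vector (the feature extraction itself is out of scope), is nearest-neighbour matching by cosine similarity. The function names and the 0.8 acceptance threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_user(
    features: Vector,
    database: Dict[str, List[Vector]],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the enrolled user whose stored segment best matches, or None."""
    best_user: Optional[str] = None
    best_score = threshold
    for user, segments in database.items():
        for stored in segments:
            score = cosine_similarity(features, stored)
            if score >= best_score:
                best_user, best_score = user, score
    return best_user
```

Returning None below the threshold lets the caller treat an unmatched voice as an unknown speaker rather than forcing the closest enrolled user.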
4. The method of claim 1, further comprising: identifying, using the utterance, one of a soft-spoken word, a hard-spoken word, shouting, laughter, and a human expression.
The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes identifying aspects of the utterance itself. Specifically, the system uses the utterance to identify one of the following: a soft-spoken word, a hard-spoken word, shouting, laughter, or another human expression. This information could potentially improve text accuracy or add context to the transcript.
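Claim 4 does not specify how these cues are detected. For the loudness-related ones (soft-spoken words, hard-spoken words, shouting), a crude approach is to compare each word segment's RMS level against thresholds. In this sketch the split into word segments, the threshold values and the "normal" and "silence" labels are hypothetical; laughter and other expressions would need a trained classifier, which is not attempted here.

```python
import math
from typing import List

# Hypothetical thresholds on RMS level in dB, with amplitude 1.0 as the reference.
SHOUT_DB = -12.0
HARD_DB = -20.0
SOFT_DB = -35.0


def rms_db(samples: List[float]) -> float:
    """20*log10 of the segment's RMS amplitude (-inf for silence or no samples)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def classify_loudness(word_samples: List[float]) -> str:
    """Label one word segment by its level: shouting, hard-spoken, normal, soft-spoken."""
    level = rms_db(word_samples)
    if level == float("-inf"):
        return "silence"
    if level >= SHOUT_DB:
        return "shouting"
    if level >= HARD_DB:
        return "hard-spoken"
    if level < SOFT_DB:
        return "soft-spoken"
    return "normal"
```

Absolute thresholds depend heavily on microphone gain; comparing each word against the speaker's own running average would be a more robust variation.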
5. The method of claim 1, further comprising inserting punctuation into the modified text based on the inflection pattern.
The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes automatically inserting punctuation into the modified text. The punctuation is inserted based on the inflection pattern of the speaker's voice, such as adding a question mark at the end of a question.
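A minimal sketch of the idea in claim 5, assuming a pitch tracker has already produced one fundamental-frequency (F0) value per frame, with 0 for unvoiced frames: compare the average pitch of the first and second halves of the last voiced frames, and end the sentence with a question mark if the pitch rose by at least 10%. The frame count, the rise ratio and the function names are hypothetical, and rising final pitch is only a rough cue, since some questions (for example many English wh-questions) end with falling intonation.

```python
from typing import List


def final_punctuation(
    pitch_hz: List[float],
    tail_frames: int = 10,     # hypothetical: how many final voiced frames to inspect
    rise_ratio: float = 1.1,   # hypothetical: required pitch rise (10%)
) -> str:
    """Return '?' if pitch rises over the last voiced frames, else '.'."""
    voiced = [f for f in pitch_hz if f > 0]  # drop unvoiced frames (F0 = 0)
    if len(voiced) < 2:
        return "."
    tail = voiced[-max(tail_frames, 2):]
    half = len(tail) // 2
    start = sum(tail[:half]) / half
    end = sum(tail[half:]) / (len(tail) - half)
    return "?" if end >= rise_ratio * start else "."


def punctuate(text: str, pitch_hz: List[float]) -> str:
    """Replace any trailing sentence punctuation with one chosen from the inflection."""
    return text.rstrip(" .?!") + final_punctuation(pitch_hz)
```

For example, `punctuate("is it raining", [120, 122, 125, 130, 140, 150, 160, 170, 180, 190])` yields `"is it raining?"`.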
6. The method of claim 1, wherein the characteristic comprises a gender.
The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, defines the "characteristic" of the user as the speaker's gender. The system attempts to identify the gender of the speaker based on their voice.
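Claim 6 does not say how gender is inferred. A frequently used but unreliable cue is fundamental frequency (F0). The sketch below estimates F0 with a plain autocorrelation peak search over a 60 to 400 Hz range and applies a hypothetical 165 Hz threshold. The search range, the threshold and the function names are assumptions; pitch ranges of different speakers overlap heavily, so this illustrates the signal path only and is not a dependable classifier.

```python
import math
from typing import List, Optional

F0_THRESHOLD_HZ = 165.0  # hypothetical decision threshold


def estimate_f0(
    samples: List[float],
    sample_rate: int,
    fmin: float = 60.0,   # hypothetical lowest pitch searched
    fmax: float = 400.0,  # hypothetical highest pitch searched
) -> Optional[float]:
    """Estimate F0 as sample_rate / lag of the strongest autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(samples) <= max_lag:
        return None  # too short to cover the longest period searched
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None


def estimate_gender(samples: List[float], sample_rate: int) -> Optional[str]:
    """Crude pitch-threshold label; overlapping voice ranges make this unreliable."""
    f0 = estimate_f0(samples, sample_rate)
    if f0 is None:
        return None
    return "female" if f0 >= F0_THRESHOLD_HZ else "male"
```

The same `estimate_f0` routine, applied frame by frame, could also supply the pitch contour that an inflection-based punctuation step needs.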
7. The method of claim 1, wherein the characteristic comprises an accent.
The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, defines the "characteristic" of the user as the speaker's accent. The system attempts to identify the accent of the speaker based on their voice.
8. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: receiving an utterance; converting the utterance to text; receiving a dual-tone multiple frequency tone indicating one of activating a function and deactivating the function; and when the dual-tone multiple frequency tone indicates activating the function: performing the function by analyzing the utterance, to yield an inflection pattern associated with the utterance and a characteristic of a user associated with the utterance; modifying the text based on the inflection pattern, to yield modified text; and displaying the modified text and the characteristic.
A system for speech-to-text conversion includes a processor and a storage medium containing instructions. When executed, these instructions cause the system to receive spoken input (an utterance), convert it into text, and listen for a DTMF tone that signals to either start or stop a function. If the DTMF tone indicates activation, the system analyzes the utterance to determine the speaker's inflection pattern and a characteristic of the speaker, such as gender or accent. It then modifies the initial text based on the inflection pattern and displays both the modified text and the speaker characteristic.
9. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in the operations further comprising: receiving, via a dual-tone multiple frequency detector, an input indicating activation of the conversion of the utterance to the text.
The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes a DTMF detector. Through this detector, the system receives an input, such as a keypad tone, indicating that the conversion of the utterance to text should be activated.
10. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in the operations further comprising: identifying the user based on speech segments stored in a database.
The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify the speaker. This identification is done by comparing speech segments from the current utterance to speech segments stored in a database of known users.
11. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in the operations further comprising: identifying, using the utterance, one of a soft-spoken word, a hard-spoken word, shouting, laughter, and a human expression.
The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify aspects of the utterance itself. Specifically, the system uses the utterance to identify one of the following: a soft-spoken word, a hard-spoken word, shouting, laughter, or another human expression. This information could potentially improve text accuracy or add context to the transcript.
12. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in the operations further comprising: inserting punctuation into the modified text based on the inflection pattern.
The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes automatically inserting punctuation into the modified text. The punctuation is inserted based on the inflection pattern of the speaker's voice, such as adding a question mark at the end of a question.
13. The system of claim 8, wherein the characteristic comprises a gender.
The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, defines the "characteristic" of the user as the speaker's gender. The system attempts to identify the gender of the speaker based on their voice.
14. The system of claim 8, wherein the characteristic comprises an accent.
The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, defines the "characteristic" of the user as the speaker's accent. The system attempts to identify the accent of the speaker based on their voice.
15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: receiving an utterance; converting the utterance to text; receiving a dual-tone multiple frequency tone indicating one of activating a function and deactivating the function; and when the dual-tone multiple frequency tone indicates activating the function: performing the function by analyzing the utterance, to yield an inflection pattern associated with the utterance and a characteristic of a user associated with the utterance; and modifying the text based on the inflection pattern, to yield modified text; and displaying the modified text and the characteristic.
A computer-readable storage device stores instructions for speech-to-text conversion. When executed by a computing device, these instructions cause the device to receive spoken input (an utterance), convert it into text, and listen for a DTMF tone that signals to either start or stop a function. If the DTMF tone indicates activation, the system analyzes the utterance to determine the speaker's inflection pattern and a characteristic of the speaker, such as gender or accent. It then modifies the initial text based on the inflection pattern and displays both the modified text and the speaker characteristic.
16. The computer-readable storage device of claim 15, the computer-readable storage device having additional instructions which result in the operations further comprising: receiving, via a dual-tone multiple frequency detector, an input indicating activation of the conversion of the utterance to the text.
The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes instructions for receiving, through a DTMF detector, an input indicating that the conversion of the utterance to text should be activated.
17. The computer-readable storage device of claim 15, the computer-readable storage device having additional instructions which result in the operations further comprising: identifying the user based on speech segments stored in a database.
The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify the speaker. This identification is done by comparing speech segments from the current utterance to speech segments stored in a database of known users.
18. The computer-readable storage device of claim 15, the computer-readable storage device having additional instructions which result in the operations further comprising: identifying, using the utterance, one of a soft-spoken word, a hard-spoken word, shouting, laughter, and a human expression.
The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify aspects of the utterance itself. Specifically, the system uses the utterance to identify one of the following: a soft-spoken word, a hard-spoken word, shouting, laughter, or another human expression. This information could potentially improve text accuracy or add context to the transcript.
19. The computer-readable storage device of claim 15, the computer-readable storage device having additional instructions which result in the operations further comprising: inserting punctuation into the modified text based on the inflection pattern.
The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes automatically inserting punctuation into the modified text. The punctuation is inserted based on the inflection pattern of the speaker's voice, such as adding a question mark at the end of a question.
20. The computer-readable storage device of claim 15, wherein the characteristic comprises a gender.
The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, defines the "characteristic" of the user as the speaker's gender. The system attempts to identify the gender of the speaker based on their voice.
September 11, 2012
July 16, 2013