US-8489397

Method and device for providing speech-to-text encoding and telephony service

Published: July 16, 2013

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
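The buffering and subscriber-controlled activation described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `SpeechPacketBuffer`, its methods, and the bounded-queue policy are hypothetical and do not come from the patent.

```python
from collections import deque


class SpeechPacketBuffer:
    """Bounded FIFO for incoming speech packets (hypothetical helper)."""

    def __init__(self, max_packets=256):
        # Assumption: when the buffer is full, the oldest packet is dropped.
        self._packets = deque(maxlen=max_packets)
        self.processing_enabled = False

    def receive(self, packet):
        """Store one packet as it arrives from the telephony interface."""
        self._packets.append(packet)

    def set_processing(self, enabled):
        """Turn speech processing on or off in response to a subscriber command."""
        self.processing_enabled = enabled

    def drain(self):
        """Hand buffered packets to the speech processor, or nothing if disabled."""
        if not self.processing_enabled:
            return []
        drained = list(self._packets)
        self._packets.clear()
        return drained
```

In this sketch, packets keep accumulating while processing is off, so a later activation command picks up the most recent audio.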

Patent Claims

20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method comprising: receiving an utterance; converting, via a processor, the utterance to text; and receiving a dual-tone multiple frequency tone indicating one of activating a function and deactivating the function; and when the dual-tone multiple frequency tone indicates activating the function: performing the function by analyzing the utterance, to yield an inflection pattern associated with the utterance and a characteristic of a user associated with the utterance; modifying the text based on the inflection pattern, to yield modified text; and displaying the modified text and the characteristic.

Plain English Translation

A method for speech-to-text conversion receives spoken input (an utterance), and a processor converts the utterance into text. The method also receives a DTMF (dual-tone multi-frequency) tone, like those produced by a telephone keypad, that signals whether to activate or deactivate a function. When the tone indicates activation, the system analyzes the utterance to determine its inflection pattern and a characteristic of the speaker, such as gender or accent. It then modifies the text based on the inflection pattern (for example, by adding punctuation) and displays both the modified text and the speaker characteristic.
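The control flow of claim 1 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the key assignments (`"1"` to activate, `"0"` to deactivate), the function names, and the stand-in recognizer and analyzer that simply read precomputed fields from a dictionary. The claim does not say what happens when the function is not activated; returning the unmodified text is a choice made for this sketch.

```python
from dataclasses import dataclass

ACTIVATE_TONE = "1"    # hypothetical key assignment
DEACTIVATE_TONE = "0"  # hypothetical key assignment


@dataclass
class Analysis:
    inflection: str      # e.g. "rising" or "falling"
    characteristic: str  # e.g. "accent: unknown"


def convert_to_text(utterance):
    # Stand-in for a real recognizer: the utterance carries its own transcript.
    return utterance["transcript"]


def analyze(utterance):
    # Stand-in for real acoustic analysis: values are precomputed in the input.
    return Analysis(utterance["inflection"], utterance["characteristic"])


def modify_text(text, inflection):
    # One possible modification: choose end punctuation from the inflection.
    return text + ("?" if inflection == "rising" else ".")


def handle_utterance(utterance, dtmf_tone):
    """Return (text to display, characteristic or None)."""
    text = convert_to_text(utterance)
    if dtmf_tone != ACTIVATE_TONE:
        # Assumption: with the function inactive, show the plain transcript.
        return text, None
    analysis = analyze(utterance)
    return modify_text(text, analysis.inflection), analysis.characteristic
```

For example, an utterance transcribed as "are you coming" with a rising inflection would be displayed as "are you coming?" together with the characteristic when the activating tone has been received.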

Claim 2

Original Legal Text

2. The method of claim 1 , further comprising: receiving, via a dual-tone multiple frequency detector, an input indicating activation of the conversion of the utterance to the text.

Plain English Translation

The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes receiving, through a DTMF detector, an input that indicates activation of the speech-to-text conversion itself. In other words, the conversion step can also be switched on by a keypad tone.
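The patent does not say how the DTMF detector works. A common technique for this job is the Goertzel algorithm, which measures signal power at each of the eight standard DTMF frequencies and picks the strongest row and column tone. The sketch below assumes 8 kHz audio, a 205-sample window, and an arbitrary power threshold (`min_power`); `synth_dtmf` is a hypothetical helper that generates a test tone.

```python
import math

ROW_FREQS = (697, 770, 852, 941)       # standard DTMF row tones, in Hz
COL_FREQS = (1209, 1336, 1477, 1633)   # standard DTMF column tones, in Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the signal at one frequency (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_dtmf(samples, sample_rate=8000, min_power=100.0):
    """Return the detected key, or None if no clear tone pair is present."""
    row_p = [goertzel_power(samples, sample_rate, f) for f in ROW_FREQS]
    col_p = [goertzel_power(samples, sample_rate, f) for f in COL_FREQS]
    # Assumption: a fixed threshold is enough to reject silence.
    if max(row_p) < min_power or max(col_p) < min_power:
        return None
    return KEYPAD[row_p.index(max(row_p))][col_p.index(max(col_p))]


def synth_dtmf(key, sample_rate=8000, n=205):
    """Generate n samples of the two-tone signal for one key (test helper)."""
    for r, row in enumerate(KEYPAD):
        if key in row:
            fr, fc = ROW_FREQS[r], COL_FREQS[row.index(key)]
            return [
                0.5 * math.sin(2 * math.pi * fr * i / sample_rate)
                + 0.5 * math.sin(2 * math.pi * fc * i / sample_rate)
                for i in range(n)
            ]
    raise ValueError(f"not a DTMF key: {key!r}")
```

A production detector would typically add checks this sketch omits, such as the relative level of the two tones and a minimum tone duration.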

Claim 3

Original Legal Text

3. The method of claim 1 , further comprising: identifying the user based on speech segments stored in a database.

Plain English Translation

The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes identifying the speaker. This identification is done by comparing speech segments from the current utterance to speech segments stored in a database of known users.
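One way to picture identification against stored speech segments is a nearest-match search over feature vectors. The sketch below is an assumption throughout: the patent does not specify features, a similarity measure, or a threshold. Here each stored segment is a plain list of numbers, similarity is cosine similarity, and `min_similarity` is an arbitrary cutoff.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_user(features, database, min_similarity=0.9):
    """Return the best-matching user name, or None if nothing is close enough.

    `database` maps a user name to a list of stored segment feature vectors
    (a hypothetical layout chosen for this sketch).
    """
    best_user, best_score = None, min_similarity
    for user, segments in database.items():
        for segment in segments:
            score = cosine_similarity(features, segment)
            if score >= best_score:
                best_user, best_score = user, score
    return best_user
```

Returning `None` for a weak match lets the caller fall back to displaying the text without a speaker name.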

Claim 4

Original Legal Text

4. The method of claim 1 , further comprising: identifying, using the utterance, one of a soft-spoken word, a hard-spoken word, shouting, laughter, and a human expression.

Plain English Translation

The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes identifying aspects of the utterance itself. Specifically, the system uses the utterance to identify one of the following: a soft-spoken word, a hard-spoken word, shouting, laughter, or another human expression.
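As a rough illustration of telling soft-spoken, hard-spoken, and shouted speech apart, the sketch below classifies a window of audio by its RMS level. The thresholds are arbitrary assumptions, samples are assumed to be floats in the range -1.0 to 1.0, and laughter or other expressions would need more than loudness, so they are not covered here.

```python
import math


def rms(samples):
    """Root-mean-square level of a window of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def classify_loudness(samples, soft_below=0.05, hard_above=0.3, shout_above=0.6):
    """Label a window by loudness; thresholds are illustrative assumptions."""
    level = rms(samples)
    if level < soft_below:
        return "soft-spoken"
    if level > shout_above:
        return "shouting"
    if level > hard_above:
        return "hard-spoken"
    return None  # ordinary speaking level: no special label
```

A display layer could then render the label alongside the word, for instance by changing its typeface, though that rendering choice is outside this sketch.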

Claim 5

Original Legal Text

5. The method of claim 1 , further comprising inserting punctuation into the modified text based on the inflection pattern.

Plain English Translation

The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes automatically inserting punctuation into the modified text. The punctuation is inserted based on the inflection pattern of the speaker's voice, such as adding a question mark at the end of a question.
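The question-mark example can be sketched by comparing the pitch at the end of an utterance with the pitch of the rest. This is a simplification under stated assumptions: `pitch_hz` is a hypothetical list of per-frame pitch estimates, the last quarter of the frames is treated as the "tail", and `rise_ratio` is an arbitrary threshold for calling the ending a rise.

```python
def punctuate(text, pitch_hz, rise_ratio=1.15):
    """Append '?' if pitch rises at the end of the utterance, otherwise '.'."""
    text = text.rstrip()
    if not text:
        return text
    if len(pitch_hz) < 4:
        # Too few frames to judge the contour; default to a period.
        return text + "."
    split = len(pitch_hz) * 3 // 4
    body_mean = sum(pitch_hz[:split]) / split
    tail_mean = sum(pitch_hz[split:]) / (len(pitch_hz) - split)
    return text + ("?" if tail_mean > body_mean * rise_ratio else ".")
```

For instance, `punctuate("are you coming", [120, 118, 121, 119, 122, 120, 150, 170])` yields a question mark because the final frames average well above the earlier ones.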

Claim 6

Original Legal Text

6. The method of claim 1 wherein the characteristic comprises a gender.

Plain English Translation

The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, specifies that the "characteristic" of the user includes the speaker's gender. The system attempts to identify the gender of the speaker based on their voice.

Claim 7

Original Legal Text

7. The method of claim 1 , wherein the characteristic comprises an accent.

Plain English Translation

The method for speech-to-text conversion described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, specifies that the "characteristic" of the user includes the speaker's accent. The system attempts to identify the accent of the speaker based on their voice.

Claim 8

Original Legal Text

8. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: receiving an utterance; converting the utterance to text; receiving a dual-tone multiple frequency tone indicating one of activating a function and deactivating the function; and when the dual-tone multiple frequency tone indicates activating the function: performing the function by analyzing the utterance, to yield an inflection pattern associated with the utterance and a characteristic of a user associated with the utterance; modifying the text based on the inflection pattern, to yield modified text; and displaying the modified text and the characteristic.

Plain English Translation

A system for speech-to-text conversion includes a processor and a storage medium containing instructions. When executed, these instructions cause the system to receive spoken input (an utterance), convert it into text, and listen for a DTMF tone that signals to either start or stop a function. If the DTMF tone indicates activation, the system analyzes the utterance to determine the speaker's inflection pattern and a characteristic of the speaker, such as gender or accent. It then modifies the initial text based on the inflection pattern and displays both the modified text and the speaker characteristic.

Claim 9

Original Legal Text

9. The system of claim 8 , the computer-readable storage medium having additional instructions stored which result in the operations further comprising: receiving, via a dual-tone multiple frequency detector, an input indicating activation of the conversion of the utterance to the text.

Plain English Translation

The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes receiving, through a DTMF detector, an input that indicates activation of the speech-to-text conversion itself. In other words, the conversion step can also be switched on by a keypad tone.

Claim 10

Original Legal Text

10. The system of claim 8 , the computer-readable storage medium having additional instructions stored which result in the operations further comprising: identifying the user based on speech segments stored in a database.

Plain English Translation

The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify the speaker. This identification is done by comparing speech segments from the current utterance to speech segments stored in a database of known users.

Claim 11

Original Legal Text

11. The system of claim 8 , the computer-readable storage medium having additional instructions stored which result in the operations further comprising: identifying, using the utterance, one of a soft-spoken word, a hard-spoken word, shouting, laughter, and a human expression.

Plain English Translation

The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify aspects of the utterance itself. Specifically, the system uses the utterance to identify one of the following: a soft-spoken word, a hard-spoken word, shouting, laughter, or another human expression.

Claim 12

Original Legal Text

12. The system of claim 8 , the computer-readable storage medium having additional instructions stored which result in the operations further comprising: inserting punctuation into the modified text based on the inflection pattern.

Plain English Translation

The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes automatically inserting punctuation into the modified text. The punctuation is inserted based on the inflection pattern of the speaker's voice, such as adding a question mark at the end of a question.

Claim 13

Original Legal Text

13. The system of claim 8 , wherein the characteristic comprises a gender.

Plain English Translation

The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, specifies that the "characteristic" of the user includes the speaker's gender. The system attempts to identify the gender of the speaker based on their voice.

Claim 14

Original Legal Text

14. The system of claim 8 , wherein the characteristic comprises an accent.

Plain English Translation

The speech-to-text system described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, specifies that the "characteristic" of the user includes the speaker's accent. The system attempts to identify the accent of the speaker based on their voice.

Claim 15

Original Legal Text

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: receiving an utterance; converting the utterance to text; receiving a dual-tone multiple frequency tone indicating one of activating a function and deactivating the function; and when the dual-tone multiple frequency tone indicates activating the function: performing the function by analyzing the utterance, to yield an inflection pattern associated with the utterance and a characteristic of a user associated with the utterance; and modifying the text based on the inflection pattern, to yield modified text; and displaying the modified text and the characteristic.

Plain English Translation

A computer-readable storage device stores instructions for speech-to-text conversion. When executed by a computing device, these instructions cause the device to receive spoken input (an utterance), convert it into text, and listen for a DTMF tone that signals to either start or stop a function. If the DTMF tone indicates activation, the system analyzes the utterance to determine the speaker's inflection pattern and a characteristic of the speaker, such as gender or accent. It then modifies the initial text based on the inflection pattern and displays both the modified text and the speaker characteristic.

Claim 16

Original Legal Text

16. The computer-readable storage device of claim 15 , the computer-readable storage device having additional instructions which result in the operations further comprising: receiving, via a dual-tone multiple frequency detector, an input indicating activation of the conversion of the utterance to the text.

Plain English Translation

The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes instructions for receiving, through a DTMF detector, an input that indicates activation of the speech-to-text conversion itself. In other words, the conversion step can also be switched on by a keypad tone.

Claim 17

Original Legal Text

17. The computer-readable storage device of claim 15 , the computer-readable storage device having additional instructions which result in the operations further comprising: identifying the user based on speech segments stored in a database.

Plain English Translation

The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify the speaker. This identification is done by comparing speech segments from the current utterance to speech segments stored in a database of known users.

Claim 18

Original Legal Text

18. The computer-readable storage device of claim 15 , the computer-readable storage device having additional instructions which result in the operations further comprising: identifying, using the utterance, one of a soft-spoken word, a hard-spoken word, shouting, laughter, and a human expression.

Plain English Translation

The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes functionality to identify aspects of the utterance itself. Specifically, the system uses the utterance to identify one of the following: a soft-spoken word, a hard-spoken word, shouting, laughter, or another human expression.

Claim 19

Original Legal Text

19. The computer-readable storage device of claim 15 , the computer-readable storage device having additional instructions which result in the operations further comprising: inserting punctuation into the modified text based on the inflection pattern.

Plain English Translation

The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, further includes automatically inserting punctuation into the modified text. The punctuation is inserted based on the inflection pattern of the speaker's voice, such as adding a question mark at the end of a question.

Claim 20

Original Legal Text

20. The computer-readable storage device of claim 15 , wherein the characteristic comprises a gender.

Plain English Translation

The computer-readable storage device containing speech-to-text instructions described previously, where spoken input is converted to text, and a DTMF tone activates analysis and text modification based on inflection and speaker characteristics, specifies that the "characteristic" of the user includes the speaker's gender. The system attempts to identify the gender of the speaker based on their voice.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L H04M H04N

Patent Metadata

Filing Date

September 11, 2012

Publication Date

July 16, 2013
