Speech recognition dependent on text message content

Published: December 1, 2015

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model for decoding the acoustic data is identified from a plurality of acoustic models, using a conversational context associated with the text message. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
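The model-selection step in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the keyword sets stand in for the context-specific language models, and the context labels and acoustic model names are assumptions made for illustration, not details from the patent.

```python
from typing import Optional

# Hypothetical keyword sets standing in for context-specific language models.
CONTEXT_KEYWORDS = {
    "scheduling": {"meeting", "tomorrow", "time", "when", "schedule"},
    "navigation": {"where", "address", "directions", "route", "street"},
}

# Hypothetical registry: conversational context -> acoustic model name.
ACOUSTIC_MODELS = {
    "scheduling": "am_scheduling",
    "navigation": "am_navigation",
}
DEFAULT_MODEL = "am_general"  # assumed fallback when no context is found


def identify_context(text_message: str) -> Optional[str]:
    """Return the context whose keywords overlap the message most, or None."""
    words = {w.strip(".,!?").lower() for w in text_message.split()}
    best, best_score = None, 0
    for context, keywords in CONTEXT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = context, score
    return best


def select_acoustic_model(text_message: str) -> str:
    """Pick an acoustic model for the reply based on the incoming message."""
    context = identify_context(text_message)
    return ACOUSTIC_MODELS.get(context, DEFAULT_MODEL)
```

The point of the sketch is only the data flow: the context is derived from the incoming text message before the reply is decoded, and that context, rather than the reply audio itself, drives the choice among several acoustic models.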

Patent Claims

2 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of automatic speech recognition, comprising the steps of: a) receiving a text message at a speech recognition client device; b) processing the text message with conversational context-specific language models and emotional context-specific language models stored on the client device using at least one processor of the client device to identify a conversational context and an emotional context corresponding to the text message; c) synthesizing speech from the text message; d) communicating the synthesized speech via a loudspeaker of the client device to a user of the client device; e) receiving a reply utterance in response to the text message from the user via a microphone of the client device that converts the reply utterance into a speech signal; f) pre-processing the speech signal using the at least one processor to extract acoustic data from the received speech signal; g) communicating the extracted acoustic data, the identified conversational context, and identified emotional context to a speech recognition server; h) identifying an acoustic model of a plurality of acoustic models stored at the server to be used for decoding the acoustic data based on the identified conversational context, the identified emotional context, or both; i) decoding the acoustic data using the identified acoustic model to produce a plurality of hypotheses for the reply utterance; and j) post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance; k) presenting the identified hypothesis to the user; l) seeking confirmation from the user that the identified hypothesis is correct; m) outputting the identified hypothesis as at least part of a reply text message if the user confirms that the identified hypothesis is correct; otherwise n) using the emotional context to improve identification of the acoustic model, and repeating steps e) through m).

2. A method of automatic speech recognition, comprising the steps of: a) receiving a text message at a speech recognition client device; b) processing the text message with conversational context-specific language models and emotional context-specific language models stored on the client device using at least one processor of the client device to identify a conversational context and emotional context corresponding to the text message; c) synthesizing speech from the text message; d) communicating the synthesized speech via a loudspeaker of the client device to a user of the client device; e) receiving a reply utterance in response to the text message from the user via a microphone of the client device that converts the reply utterance into a speech signal; f) pre-processing the speech signal using the at least one processor to extract acoustic data from the received speech signal; g) identifying an acoustic model of a plurality of acoustic models to decode the acoustic data, using the identified conversational context and emotional context associated with the text message; h) decoding the acoustic data using the identified acoustic model to produce a plurality of hypotheses for the reply utterance; i) determining whether a confidence value associated with at least one of the plurality of hypotheses for the reply utterance is greater or less than a confidence threshold; j) communicating the extracted acoustic data, the conversational context, and the emotional context to a speech recognition server, if the confidence value is determined to be less than the confidence threshold, otherwise post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance, and outputting from the client device the identified hypothesis as at least part of a reply text message; k) identifying at the server, an acoustic model of a plurality of acoustic models stored at the server to decode the acoustic data, using the identified conversational context, the emotional context, or both; l) decoding the acoustic data using the acoustic model identified at the server to produce a plurality of hypotheses for the reply utterance; m) post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance; and n) outputting from the server the identified hypothesis as at least part of a reply text message.
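The client/server split in steps i) through n) of claim 2 can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the decoder callables, the `(text, confidence)` hypothesis shape, the 0.7 threshold, and the use of the top hypothesis's confidence for the comparison are all hypothetical choices.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, confidence in [0, 1]); assumed shape
Decoder = Callable[[bytes, str, str], List[Hypothesis]]


def best_hypothesis(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Stand-in for post-processing: keep the highest-confidence hypothesis."""
    return max(hypotheses, key=lambda h: h[1])


def recognize_reply(
    acoustic_data: bytes,
    conversational_context: str,
    emotional_context: str,
    local_decode: Decoder,
    server_decode: Decoder,
    threshold: float = 0.7,  # hypothetical value; the claim does not fix one
) -> Tuple[str, str]:
    """Decode on the client first; fall back to the server on low confidence.

    Returns the chosen reply text and which side ("client" or "server")
    produced it.
    """
    local = local_decode(acoustic_data, conversational_context, emotional_context)
    if local:
        text, confidence = best_hypothesis(local)
        if confidence >= threshold:
            # Confidence is not below the threshold: output from the client.
            return text, "client"
    # Low confidence (or no local hypotheses): send the acoustic data and
    # both contexts to the server, which decodes with its own models.
    remote = server_decode(acoustic_data, conversational_context, emotional_context)
    text, _ = best_hypothesis(remote)
    return text, "server"
```

Passing the decoders in as callables keeps the routing logic separate from any particular recognizer, so the same function can be exercised with stub decoders on either side.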

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

March 25, 2011

Publication Date

December 1, 2015
