US-9058805

Multiple recognizer speech recognition

Published: June 16, 2015

Assignee: not available in USPTO data

Inventors: not available in USPTO data

Technical Abstract

The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance and obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
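The two-recognizer arrangement in the abstract can be illustrated with a toy sketch. It is not the patented implementation: `ToyRecognizer`, the `<>` placeholder, the sample vocabularies, and the first-word classification rule are all assumptions made for illustration, and the "audio" is modeled as a list of already-segmented words rather than real audio data.

```python
from dataclasses import dataclass

PLACEHOLDER = "<>"  # hypothetical marker for an out-of-vocabulary word


@dataclass(frozen=True)
class ToyRecognizer:
    """Stand-in for a speech recognizer restricted to a fixed vocabulary."""
    vocabulary: frozenset

    def transcribe(self, spoken_words):
        # A real recognizer decodes audio with a trained language model;
        # here each word is kept only if it is in the vocabulary.
        return [w if w in self.vocabulary else PLACEHOLDER for w in spoken_words]


# Assumed sample vocabularies (not taken from the patent).
COMMAND_TERMS = frozenset({"call", "text", "open"})
CONTACT_NAMES = frozenset({"arnie"})
EXPANDED_GRAMMAR = COMMAND_TERMS | frozenset({"best", "pizza", "nearby", "weather"})

# Limited recognizer: command terms plus contacts, but not the whole expanded grammar.
limited = ToyRecognizer(COMMAND_TERMS | CONTACT_NAMES)
# Expanded recognizer: every term of the expanded grammar.
expanded = ToyRecognizer(EXPANDED_GRAMMAR)


def classify(first_transcription):
    """Assumed rule: an utterance that opens with a command term is a command."""
    if first_transcription and first_transcription[0] in COMMAND_TERMS:
        return "voice command"
    return "voice query"


utterance = ["call", "arnie"]
first = limited.transcribe(utterance)    # ['call', 'arnie']
second = expanded.transcribe(utterance)  # ['call', '<>']
print(first, second, classify(first))
```

In this sketch the limited recognizer resolves the contact name that the expanded recognizer misses, which is one reason a system might want both transcriptions before deciding how to treat the utterance.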

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method performed by a data processing apparatus, the method comprising: receiving audio data that corresponds to an utterance; obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar; obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar; aligning the first and second transcriptions of the utterance to generate an aligned transcription; and classifying the utterance, based at least on a portion of the aligned transcription, as a voice command or a voice query.

2. The method of claim 1, wherein the operations of at least one of the limited speech recognizer and the expanded speech recognizer are performed at a server computer device.

3. The method of claim 1, further comprising: in response to classifying the utterance, based at least on a portion of the aligned transcription, as the voice command: generating the voice command using at least a portion of the first transcription and at least part of the second transcription; and initiating the voice command; and in response to classifying the utterance as the voice query: generating the voice query using at least a portion of the first transcription and at least part of the second transcription; and initiating the voice query.

4. The method of claim 1, wherein the limited speech recognizer is configured to recognize one or more of a collection of placeholder terms, a collection of voice command terms, and a collection of contact names from a contact list.

5. The method of claim 1, wherein the expanded speech recognizer is configured to recognize one or more of a collection of general grammar terms, a collection of placeholder terms, a collection of proper names, and a collection of voice command terms.

6. The method of claim 5, wherein the expanded speech recognizer is not configured to recognize a collection of contact names from a contact list.

7. The method of claim 1, wherein the operations of at least one of the limited speech recognizer and the expanded speech recognizer are performed at a mobile device.

8. A system comprising: a data processing apparatus; and a non-transitory memory storage storing instructions executable by the data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising: receiving audio data that corresponds to an utterance; obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar; obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar; aligning the first and second transcriptions of the utterance to generate an aligned transcription; and classifying the utterance based at least on a portion of the aligned transcription, as a voice command or a voice query.

9. The system of claim 8, wherein the operations of at least one of the limited speech recognizer and the expanded speech recognizer are performed at a mobile device.

10. The system of claim 8, the operations further comprising: in response to classifying the utterance, based at least on a portion of the aligned transcription, as the voice command: generating the voice command using at least a portion of the first transcription and at least part of the second transcription; and initiating the voice command; and in response to classifying the utterance as the voice query: generating the voice query using at least a portion of the first transcription and at least part of the second transcription; and initiating the voice query.

11. The system of claim 8, wherein the limited speech recognizer is configured to recognize one or more of a collection of placeholder terms, a collection of voice command terms, and a collection of contact names from a contact list.

12. The system of claim 8, wherein the expanded speech recognizer is configured to recognize one or more of a collection of general grammar terms, a collection of placeholder terms, a collection of proper names, and a collection of voice command terms.

13. The system of claim 12, wherein the expanded speech recognizer is not configured to recognize a collection of contact names from a contact list.

14. A non-transitory computer readable medium storing instructions executable by a data processing apparatus and that upon such execution cause the data processing apparatus to perform operations comprising: receiving audio data that corresponds to an utterance; obtaining a first transcription of the utterance that was generated using a limited speech recognizer, wherein the limited speech recognizer comprises a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar; obtaining a second transcription of the utterance that was generated using an expanded speech recognizer, wherein the expanded speech recognizer comprises a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar; aligning the first and second transcriptions of the utterance to generate an aligned transcription; and classifying the utterance based at least on a portion of the aligned transcription, as a voice command or a voice query.

15. The computer readable medium of claim 14, the operations further comprising: in response to classifying the utterance, based at least on a portion of the aligned transcription, as the voice command: generating the voice command using at least a portion of the first transcription and at least part of the second transcription; and initiating the voice command; and in response to classifying the utterance as the voice query: generating the voice query using at least a portion of the first transcription and at least part of the second transcription; and initiating the voice query.

16. The computer readable medium of claim 14, wherein the limited speech recognizer is configured to recognize one or more of a collection of placeholder terms, a collection of voice command terms, and a collection of contact names from a contact list.

17. The computer readable medium of claim 14, wherein the expanded speech recognizer is configured to recognize one or more of a collection of general grammar terms, a collection of placeholder terms, a collection of proper names, and a collection of voice command terms.
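The aligning, classifying, and generating steps recited in the independent claims and in claims 3, 10, and 15 can be sketched as follows. This is only one possible reading under stated assumptions: the word-by-word merge in `align`, the `<>` placeholder, the first-word classification rule, and the layout of the returned dictionary are hypothetical and are not specified by the claims. The sketch also assumes both transcriptions contain the same number of words.

```python
PLACEHOLDER = "<>"               # hypothetical out-of-vocabulary marker
COMMAND_TERMS = {"call", "text"}  # assumed sample of voice command terms


def align(first, second):
    """Merge two transcriptions word by word, preferring a real word
    from the limited recognizer over a placeholder."""
    if len(first) != len(second):
        raise ValueError("this sketch assumes equal-length transcriptions")
    return [a if a != PLACEHOLDER else b for a, b in zip(first, second)]


def classify(aligned):
    """Assumed rule: a leading command term marks a voice command."""
    if aligned and aligned[0] in COMMAND_TERMS:
        return "voice command"
    return "voice query"


def build_action(first, second):
    """Classify the aligned transcription, then assemble the command or
    query from portions of both transcriptions."""
    aligned = align(first, second)
    kind = classify(aligned)
    if kind == "voice command":
        return {
            "type": kind,
            "action": first[0],               # command term (limited recognizer)
            "target": " ".join(first[1:2]),   # e.g. a contact name (limited recognizer)
            "body": " ".join(second[2:]),     # free text (expanded recognizer)
        }
    return {"type": kind, "query": " ".join(aligned)}


first = ["text", "arnie", "<>", "<>", "<>"]          # limited recognizer output
second = ["text", "<>", "running", "ten", "late"]    # expanded recognizer output
print(build_action(first, second))
```

Here the command term and contact name come from the first transcription while the message body comes from the second, which mirrors the claim language about using at least a portion of each transcription.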

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L H04M

Patent Metadata

Filing Date

May 13, 2013

Publication Date

June 16, 2015
