US-7870000

Partially filling mixed-initiative forms from utterances having sub-threshold confidence scores based upon word-level confidence data

Published: January 11, 2011

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

The present disclosure relates to prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played.
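The partial-fill flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Recognition` class, the `partially_fill` function, the 0.0 to 1.0 score range, and the 0.5 thresholds are all assumptions made for the example. The disclosure does not say how a score exactly equal to the threshold is handled; this sketch treats it as not passing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Recognition:
    """Hypothetical recognizer output for one spoken utterance."""
    utterance_confidence: float
    values: Dict[str, str]                # element name -> recognized value
    element_confidence: Dict[str, float]  # element name -> confidence score


def partially_fill(form: Dict[str, str], result: Recognition,
                   utterance_threshold: float = 0.5,
                   element_threshold: float = 0.5) -> List[str]:
    """Store trusted values in `form`; return the elements to re-prompt for."""
    # Whole utterance is trusted: fill every mapped data field.
    if result.utterance_confidence > utterance_threshold:
        form.update(result.values)
        return []

    # Sub-threshold utterance: fall back to element-level scores.
    reprompt = []
    for name, value in result.values.items():
        if result.element_confidence[name] > element_threshold:
            form[name] = value      # first set: stored
        else:
            reprompt.append(name)   # second set: prompt again
    return reprompt


form: Dict[str, str] = {}
result = Recognition(
    utterance_confidence=0.42,
    values={"origin": "Boston", "destination": "Austin", "date": "May 3"},
    element_confidence={"origin": 0.91, "destination": 0.35, "date": 0.80},
)
pending = partially_fill(form, result)
print(form)     # {'origin': 'Boston', 'date': 'May 3'}
print(pending)  # ['destination']
```

In this example the utterance as a whole scores below its threshold, yet two of the three fields are still filled, and only the destination would need a new prompt.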

Patent Claims

12 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A speech processing method, implemented at least in part by at least one computer comprising at least one hardware processor, the method comprising: prompting, via the at least one computer, for a spoken response that provides input for multiple elements; receiving at the at least one computer, a single spoken utterance comprising content for multiple elements, each of which is mapped to a data field; speech-to-text converting, using the at least one computer, the spoken utterance to derive values for each of the multiple elements; determining, using the at least one computer, that an utterance-level confidence score for the spoken utterance falls below an associated certainty threshold; ascertaining, using the at least one computer, element-level confidence scores for each of the derived elements; determining, using the at least one computer, that a first set of the multiple elements each has an element-level confidence score above an associated certainty threshold and that a second set of the multiple elements each has an element-level confidence score below an associated certainty threshold; storing, on the at least one computer, values for data fields mapped to elements in the first set; and prompting, via the at least one computer, for a new spoken response that provides input for elements of the second set.

2. The method of claim 1, wherein the ascertaining step is based on word-level confidence scores.

3. The method of claim 2, further comprising: establishing a configurable percolation algorithm, said percolation algorithm defining a manner in which confidence scores associated with child nodes of a parse-tree are applied to parent nodes of the parse-tree, wherein leaf nodes of the parse-tree are words, each associated with one of the word-level confidence scores.

4. The method of claim 2, said method further comprising: for an element node having multiple component nodes, determining component-level confidence scores for each of the component scores for each of the component nodes; and setting the element-level confidence score to a lowest one of the determined component-level confidence scores.

5. The method of claim 1, wherein the multiple elements are defined within a mixed-initiative form.

6. The method of claim 5, wherein the mixed-initiative form is written in a standardized language that includes language constructs for specifically handling voice input.

7. The method of claim 5, wherein the mixed-initiative form is written in VoiceXML.

8. The method of claim 5, wherein the mixed-initiative form is associated with a grammar document which defines a recognition grammar used in the speech-to-text converting step.

9. The method of claim 8, wherein the grammar document is written in an augmented Backus-Naur form (ABNF) based language.

10. The method of claim 8, wherein the grammar document is written in an Extensible Markup Language (XML) Speech Recognition Grammar Specification (SRGS) based language.

11. The method of claim 1, further comprising: repeating the steps of claim 1 in a recursive fashion until an utterance-level confidence score for a received utterance falls above an associated certainty threshold at which point data fields mapped to the received utterance are all completed.

12. The method of claim 1, wherein said steps of claim 1 are performed by at least one machine in accordance with at least one computer program stored in a computer readable media, said computer programming having a plurality of code sections that are executable by the at least one machine.
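Claims 3 and 4 describe percolating word-level confidence scores up a parse tree, with the lowest component score as one way to set an element's score. A minimal sketch of that idea follows. The `Node` class, the `percolate` function, the `combine` parameter, and the sample scores are hypothetical names and values chosen for illustration; the claims do not prescribe this structure. Using `min` as the default mirrors claim 4, and `mean` is shown only as an example of a different configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    """Parse-tree node; leaves are words carrying word-level confidence."""
    label: str
    confidence: Optional[float] = None  # preset on leaf (word) nodes only
    children: List["Node"] = field(default_factory=list)


def percolate(node: Node,
              combine: Callable[[Sequence[float]], float] = min) -> float:
    """Compute each parent's confidence from its children using `combine`."""
    if not node.children:
        return node.confidence  # leaf: word-level score from the recognizer
    node.confidence = combine([percolate(c, combine) for c in node.children])
    return node.confidence


def mean(scores: Sequence[float]) -> float:
    """Alternative percolation rule: average the child scores."""
    return sum(scores) / len(scores)


# A "date" element with two component nodes, each covering one word.
date = Node("date", children=[
    Node("month", children=[Node("May", 0.75)]),
    Node("day", children=[Node("third", 0.25)]),
])

print(percolate(date))                # 0.25 (lowest component score)
print(percolate(date, combine=mean))  # 0.5
```

With the `min` rule a single poorly recognized word is enough to push the whole element below its threshold, which errs toward re-prompting; an averaging rule is more lenient. Making the rule a parameter is one way to keep the percolation configurable.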

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

March 28, 2007

Publication Date

January 11, 2011
