US-6823308

Speech recognition accuracy in a multimodal input system

Published: November 23, 2004

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data in the at least one further modality are identified. The identified features in the speech and in the data are used in the recognition of words by comparing the identified features with states in models for the words. The models have states for the recognition of speech and, for words having features in at least one further modality associated with them, states for the recognition of events in the further modality or each further modality.
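To make the word models in the abstract more concrete, the following toy sketch in Python scores a sequence of input frames against word models whose states form a two-dimensional array: one dimension for speech states and one for pointing events. Everything specific here is an assumption for illustration and is not taken from the patent: the class name `WordModel2D`, the phoneme-like symbols, the emission probabilities, the equal stay/advance transition weights, and the `p_event` parameter. The search is a Viterbi-style best-path search, a common choice for Hidden Markov models, and not necessarily the procedure the patent uses.

```python
import math

NEG_INF = float("-inf")
LOG_HALF = math.log(0.5)  # assumption: staying and advancing in speech are equally likely


class WordModel2D:
    """Hypothetical word model: a grid of states indexed by
    (speech_state, pointing_events_seen)."""

    def __init__(self, word, speech_states, n_events, p_event=0.9):
        self.word = word
        self.speech_states = speech_states  # one {symbol: probability} dict per speech state
        self.n_events = n_events            # pointing events this word expects
        self.p_event = p_event              # assumed probability of an expected event

    def score(self, frames):
        """Return the best-path log-probability of reaching the final state.

        frames is a list of (speech_symbol, click) pairs, one per time step.
        """
        n_speech = len(self.speech_states)
        paths = {(-1, 0): 0.0}  # virtual start before the first speech state
        for symbol, click in frames:
            nxt = {}
            for (i, j), logp in paths.items():
                # Second dimension: a click advances the event index.
                if click and j < self.n_events:
                    j_new, ev = j + 1, math.log(self.p_event)
                elif click:
                    j_new, ev = j, math.log(1.0 - self.p_event)  # unexpected click
                else:
                    j_new, ev = j, 0.0
                # First dimension: stay in the speech state or advance by one.
                for i_new in (i, i + 1):
                    if not 0 <= i_new < n_speech:
                        continue
                    emit = math.log(self.speech_states[i_new].get(symbol, 1e-6))
                    cand = logp + LOG_HALF + ev + emit
                    if cand > nxt.get((i_new, j_new), NEG_INF):
                        nxt[(i_new, j_new)] = cand
            paths = nxt
        # Final state: last speech state with all expected events seen.
        return paths.get((n_speech - 1, self.n_events), NEG_INF)


def recognise(models, frames):
    """Pick the word whose model has the highest score at its final state."""
    return max(models, key=lambda m: m.score(frames)).word


# Two acoustically similar words that differ in the pointing events they expect.
THIS = WordModel2D("this", [{"dh": 0.9}, {"ih": 0.6, "iy": 0.3}, {"s": 0.6, "z": 0.3}], 1)
THESE = WordModel2D("these", [{"dh": 0.9}, {"iy": 0.6, "ih": 0.3}, {"z": 0.6, "s": 0.3}], 2)
MODELS = [THIS, THESE]

ONE_CLICK = [("dh", False), ("ih", True), ("z", False)]
TWO_CLICKS = [("dh", True), ("ih", True), ("z", False)]

print(recognise(MODELS, ONE_CLICK))
print(recognise(MODELS, TWO_CLICKS))
```

In this sketch the speech evidence for the two words is deliberately ambiguous, so the number of pointing events decides which model reaches its final state with the higher score.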

Patent Claims

31 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A speech recognition method for use in a complementary multimodal input system, the method comprising the steps of: receiving a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input; identifying at least one feature in the speech and at least one feature in the data in said at least one further modality input; and recognising words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, said models also having states for the recognition of events in said at least one further modality, wherein said models each comprise an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.

2. A speech recognition method according to claim 1, wherein said models of words are organised in a net of words in accordance with grammar rules.

3. A speech recognition method according to claim 1, wherein said data in said at least one further modality input comprises data identifying events.

4. A speech recognition method according to claim 1, wherein the words are recognised by sequentially comparing identified features in the speech with the states in a first dimension and also comparing identified features in the further modality input or each further modality input in the further dimension or each respective further dimension to try to reach a final state.

5. A speech recognition method according to claim 1, wherein the states in the models for the recognition of speech comprise states of Hidden Markov models.

6. A speech recognition method according to claim 1, wherein said identified features define events in the further modality input or each further modality input.

7. A speech recognition method according to claim 6, wherein said events comprise pointing events comprising one or more actions.

8. A speech recognition method according to claim 1, wherein said states have probabilities associated therewith and the recognition step comprises comparing the identified features with the states to determine a word with the highest probability at a final state.

9. Program code for controlling a processor to implement the method of claim 1.

10. A carrier medium carrying the program code according to claim 9.

11. A multimodal input method comprising: using the speech recognition method according to any preceding claim to generate recognised words as a first modality input; and processing the recognised words and the further modality input or each further modality input in accordance with rules to generate an input for a process.

12. Speech recognition apparatus for use in a complementary multimodal input system, the apparatus comprising: receiving means for receiving a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input; identifying means for identifying at least one feature in the speech and at least one feature in the data in said at least one further modality input; and recognition means for recognising words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, said models also having states for the recognition of events in said at least one further modality, wherein said recognition means is adapted to use said models each comprising an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.

13. Speech recognition apparatus according to claim 12, including storage means for storing said models.

14. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to use said models organised in a net of words in accordance with grammar rules.

15. Speech recognition apparatus according to claim 12, wherein said receiving means is adapted to receive said data in said at least one further modality input comprising data identifying events.

16. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to recognise the words by sequentially comparing identified features in the speech with the states in a first dimension and also by comparing identified features in the further modality input or each further modality input in the further dimension or each respective further dimension to try to reach a final state.

17. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to use states of Hidden Markov models as the states in the models for the recognition of speech.

18. Speech recognition apparatus according to claim 12, wherein said identifying means is adapted to identify said features defining events in the further modality input or each further modality input.

19. Speech recognition apparatus according to claim 18, wherein said events comprise pointing events comprising one or more actions.

20. Speech recognition apparatus according to claim 12, wherein said recognition means is adapted to use said models wherein said states have probabilities associated therewith, and to compare the identified features with the states to determine a word with the highest probability at a final state.

21. A multimodal input system comprising: speech input means for inputting speech as the first modality input; speech digitizing means for digitizing the input speech; further modality input means for inputting the data in the at least one further modality; the speech recognition apparatus according to any one of claims 12 to 20 for generating recognised words using the digitised speech and the data in the at least one further modality; and processing means for processing the recognized words and the further modality input or each further modality input in accordance with rules to generate an input for a process.

22. A processing system for implementing a process, the system comprising: the multimodal input system according to claim 21 for generating an input; and processing means for processing the generated input.

23. A method of recognising speech using multimodal input data, the method comprising the steps of: receiving multimodal data comprising speech data and data in at least further modality; and recognising words by comparing features in the speech data and features in the further modality data with word models having states for the recognition of speech, wherein said word models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.

24. A speech recognition method according to claim 23, wherein the step of recognising words comprises sequentially comparing features in the speech data with the states in a first dimension and comparing features in the data of the further modality or in each further modality with states in the further dimension or each respective further dimension to try to reach a final state.

25. A speech recognition apparatus, comprising: a receiver operable to receive multimodal data comprising speech data and data in at least one further modality; and a recogniser operable to recognise words by comparing features in the speech data and features in the further modality data with states in word models having states for the recognition of speech, each word model comprising an array of states having a dimensionality equal to the number of modes in the received multimodal input.

26. A speech recognition apparatus according to claim 25, wherein said recogniser means is adapted to recognise words by sequentially comparing features in the speech data with the states in a first dimension and by comparing features in the data in the further modality or each further modality with states in the further dimension or each respective further dimension to try to reach a final state.

27. A speech recognition apparatus for use in a complementary multimodal input system, the apparatus comprising: a receiver operable to receive a complementary multimodal input comprising digitized speech as a first modality input and data in at least one further modality input; an identifier operable to identify at least one feature in the speech and at least one feature in the data in said at least one further modality input; and a recogniser operable to recognise words by comparing identified features in the speech and in the data with states in models for words, said models having states for the recognition of speech, and for words having at least one feature in said at least one further modality associated therewith, the models also having states for the recognition of events in said at least one further modality, wherein said models each comprise an array of states having a dimensionality equal to the number of modes in the received multimodal input that are used in recognising said words.

28. A method of recognising data using multimodal input data, the method comprising the steps of: receiving multimodal data comprising data input using a plurality of modalities; and recognising words by comparing features in the data input using the plurality of modalities with models representing words, the models having states for the recognition of data, wherein said models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.

29. An apparatus for recognising data using multimodal input data, the apparatus comprising: receiving means for receiving multimodal data comprising data input using a plurality of modalities; and recognising means for recognising words by comparing features in the data input using the plurality of modalities with models representing words, the models having states for the recognition of data, wherein said models each comprise an array of states each having a dimensionality equal to the number of modes in the received multimodal data.

30. Program code for controlling a processor to implement the method of claim 28.

31. A carrier medium carrying the program code according to claim 30.
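Claim 11 describes processing the recognised words together with the further modality input in accordance with rules to generate an input for a process. One possible reading of that step is sketched below in Python. The rule used here (bind each deictic word to the next pointing event, in order) and the names `DEICTIC_WORDS` and `build_process_input` are hypothetical and chosen for illustration only; the claims do not specify any particular rule set.

```python
# Assumed rule set: words that must be resolved by a pointing event.
DEICTIC_WORDS = {"this", "that", "these", "him", "her", "here"}


def build_process_input(words, pointing_events):
    """Replace each deictic word with the next pointing target, in order.

    words is the list of recognised words; pointing_events is the list of
    targets selected in the further modality (for example, clicked objects).
    """
    events = iter(pointing_events)
    resolved = []
    for word in words:
        if word in DEICTIC_WORDS:
            target = next(events, None)
            if target is None:
                raise ValueError(f"no pointing event left for {word!r}")
            resolved.append(target)
        else:
            resolved.append(word)
    return resolved


command = build_process_input(
    ["fax", "this", "to", "him"],
    ["report.doc", "John Smith"],
)
print(command)
```

The resolved list stands in for the "input for a process"; a real system would likely map it to a structured command instead of a flat list.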

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

February 16, 2001

Publication Date

November 23, 2004
