Multimodal task execution and text editing for a wearable system

Published: April 16, 2024

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Examples of wearable systems and methods can use multiple inputs (e.g., gesture, head pose, eye gaze, voice, and/or environmental factors (e.g., location)) to determine a command that should be executed and objects in the three-dimensional (3D) environment that should be operated on. The multiple inputs can also be used by the wearable system to permit a user to interact with text, such as, e.g., composing, selecting, or editing text.
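As a rough illustration of combining two inputs, the sketch below pairs a spoken command with whatever the user was looking at when it was spoken. It is only a minimal sketch under stated assumptions, not the patented method: the `GazeFixation` and `VoiceCommand` types, the `resolve_target` function, and the identifiers such as `word_7` are all hypothetical and do not come from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class GazeFixation:
    target_id: str   # hypothetical identifier of the object or text span being looked at
    start_s: float   # fixation start time, in seconds
    end_s: float     # fixation end time, in seconds


@dataclass
class VoiceCommand:
    verb: str        # command word recognized from speech, e.g. "edit"
    time_s: float    # time at which the command was spoken, in seconds


def resolve_target(
    command: VoiceCommand, fixations: list[GazeFixation]
) -> Optional[tuple[str, str]]:
    """Pair a spoken command with the gaze target active when it was spoken."""
    for fixation in fixations:
        if fixation.start_s <= command.time_s <= fixation.end_s:
            return command.verb, fixation.target_id
    return None  # no gaze target at that moment, so the command stays unresolved


fixations = [GazeFixation("word_3", 0.0, 1.2), GazeFixation("word_7", 1.5, 3.0)]
print(resolve_target(VoiceCommand("edit", 2.1), fixations))  # ('edit', 'word_7')
```

Here the voice input supplies the command and the gaze input supplies the object. A real system would likely weigh further inputs, such as gesture or head pose, rather than rely on a single time-overlap check.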

Patent Claims

12 claims

Legal claims defining the scope of protection, as filed with the USPTO.

5. The system of claim 1, wherein the hardware processor is programmed to implement an automated speech recognition (ASR) engine to obtain the transcription.

6. The system of claim 5, wherein the ASR engine is configured to produce a score associated with one or more words in the string of text, which indicates a likelihood that the ASR engine correctly transcribed such words.

7. The system of claim 6, wherein the hardware processor is further programmed to cause the HMD to emphasize the one or more words if the likelihood of correct transcription is below a threshold level.

9. The system of claim 8, wherein the hardware processor is further programmed to determine that the user has given the user input command to edit the user-selected portion of the text based on data from the eye gaze tracking device indicating that the user's gaze has lingered on the portion of the text presented by the display for at least a threshold period of time.

10. The system of claim 8, wherein the hardware processor is programmed to determine that the user has given the user input command to edit the user-selected portion of the text based on data from the audio sensing device and data from the eye gaze tracking device indicating that the audio sensing device received a voice command while the user's gaze was focused on the portion of the text presented by the display.

13. The system of claim 8, wherein the hardware processor is further programmed to produce an automated speech recognition (ASR) score associated with one or more words in the text, which indicates a likelihood that such words are correctly transcribed.

14. The system of claim 13, wherein the hardware processor is further programmed to calculate the aggregated confidence score utilizing the first confidence score, the second confidence score, and the ASR score.

17. The method of claim 15, wherein at least a portion of the text is emphasized on the display where the portion is associated with a low confidence that a translation from the spoken input to the corresponding portion of the text is correct.

19. The method of claim 18, wherein the first mode of user input comprises a speech input received from an audio sensor of the wearable device, wherein the method further comprises transcribing the speech input to identify at least one of the selected portion of text, the subject, or the command operation.

20. The method of claim 18, wherein the second mode of user input comprises an input from at least one of: a user input device, a gesture, or an eye gaze.

21. The method of claim 18, wherein the interaction with the selected portion of text comprises at least one of: selecting, editing, or composing the selected portion of text.

22. The method of claim 18, wherein the subject comprises one or more of: a word, a phrase, or a sentence.
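Two of the mechanisms in the claims above can be sketched in a few lines: emphasizing words whose ASR score falls below a threshold (claims 6 and 7), and detecting that the user's gaze has lingered on a portion of text for a threshold period (claim 9). This is an illustrative sketch only. The claims give no numeric thresholds, so `ASR_THRESHOLD` and `DWELL_THRESHOLD_S` are assumed values. The function names and the use of asterisks as a stand-in for visual emphasis on the display are likewise hypothetical.

```python
from __future__ import annotations

ASR_THRESHOLD = 0.6       # assumed; the claims only say "a threshold level"
DWELL_THRESHOLD_S = 1.0   # assumed; the claims only say "a threshold period of time"


def emphasize_low_confidence(
    words: list[tuple[str, float]], threshold: float = ASR_THRESHOLD
) -> str:
    """Wrap each word whose ASR score is below the threshold in asterisks."""
    return " ".join(f"*{w}*" if score < threshold else w for w, score in words)


def gaze_lingered(
    samples: list[tuple[float, str]],
    target_id: str,
    threshold_s: float = DWELL_THRESHOLD_S,
) -> bool:
    """Return True if gaze stayed on target_id continuously for threshold_s.

    samples are (timestamp_s, target_id) pairs in time order.
    """
    dwell_start = None
    for t, tid in samples:
        if tid == target_id:
            if dwell_start is None:
                dwell_start = t          # gaze has just arrived on the target
            if t - dwell_start >= threshold_s:
                return True
        else:
            dwell_start = None           # gaze left the target, so reset the timer
    return False


transcript = [("send", 0.95), ("the", 0.9), ("fourth", 0.4), ("draft", 0.8)]
print(emphasize_low_confidence(transcript))  # send the *fourth* draft

samples = [(0.0, "w1"), (0.4, "w3"), (0.8, "w3"), (1.2, "w3"), (1.5, "w3")]
print(gaze_lingered(samples, "w3"))  # True
```

Resetting the timer whenever the gaze leaves the target means only an uninterrupted dwell counts. That is one possible reading of "lingered", and a real implementation might tolerate brief glances away.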

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06T

Patent Metadata

Filing Date

December 22, 2021

Publication Date

April 16, 2024
