A system for automatically creating interactive step-by-step guides using wearable devices is proposed. The system includes wearable audio-visual sensors such as a first-person camera, a processor, a computer-readable medium, and a communication interface module to deliver interactive guidance to users.
Legal claims defining the scope of protection, as filed with the USPTO.
1. An interactive guide system comprising: a processor configured to: receive a video captured using a camera of a first client device, the video being captured while a user performs a task with respect to an object and speaks aloud to provide step-by-step instructions for performing the task; extract text of the step-by-step instructions spoken by the user from the video using speech recognition; detect at least one keyword in the text of the step-by-step instructions; extract a respective image frame for each step of the step-by-step instructions based on the detection of the at least one keyword; and assemble, and transmit to a second client device, an interactive guide file including the text of the step-by-step instructions and the respective image frame for each step of the step-by-step instructions.
2. The interactive guide system of claim 1, wherein the processor extracts the text of the step-by-step instructions spoken by the user from the video using a natural language processing (NLP) technique.
3. The interactive guide system of claim 1 further comprising: a computer readable medium coupled to the processor and configured to store program instructions, the processor being configured to execute the program instructions to extract the text of the step-by-step instructions, detect the at least one keyword in the text of the step-by-step instructions, extract the respective image frame for each step of the step-by-step instructions, and assemble the interactive guide file.
4. The interactive guide system of claim 1, wherein the second client device is communicatively coupled to the processor via a communication interface.
5. The interactive guide system of claim 1, wherein the first client device is a wearable device having a first-person camera.
6. A device comprising: a non-transitory computer-readable medium that stores computer-executable instructions that, when executed by a processor, cause a processor to: receive a video captured using a camera of a first client device, the video being captured while a user performs a task with respect to an object and while the user speaks aloud to provide step-by-step instructions for performing the task; extract text of the step-by-step instructions spoken by the user from the video using speech recognition; detect at least one keyword in the text of the step-by-step instructions; extract a respective image frame for each step of the step-by-step instructions based on the detection of the at least one keyword; and assemble, and transmit to a second client device, an interactive guide file including the text of the step-by-step instructions and the respective image frame for each step of the step-by-step instructions.
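The processing recited in claims 1 and 6 (recognized text, keyword detection, one image frame per step, assembled guide file) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the patent: the cue words in `STEP_KEYWORDS`, the `Segment` and `GuideStep` types, the choice of the keyword segment's start time as the frame timestamp, and JSON as the guide file format. Speech recognition and actual frame decoding are outside the sketch; it assumes a recognizer has already produced timestamped text segments, and it only selects the timestamp at which a frame would be extracted.

```python
import json
import re
from dataclasses import dataclass, asdict

# Hypothetical cue words; the claims only require "at least one keyword".
STEP_KEYWORDS = ("first", "next", "then", "finally", "step")


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    text: str     # recognized speech for this segment (assumed input)


@dataclass
class GuideStep:
    index: int
    text: str
    frame_time: float  # timestamp at which a frame would be extracted


def has_keyword(text):
    """Return True if any cue word appears as a whole word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in STEP_KEYWORDS for word in words)


def build_steps(segments):
    """Group timestamped segments into steps, opening a step at each keyword."""
    steps = []
    for seg in segments:
        # The first segment always opens a step, even without a keyword.
        if has_keyword(seg.text) or not steps:
            steps.append(GuideStep(len(steps) + 1, seg.text, seg.start))
        else:
            # No keyword: treat the segment as a continuation of the current step.
            steps[-1].text += " " + seg.text
    return steps


def assemble_guide(segments):
    """Serialize the steps as JSON (an assumed stand-in for the guide file)."""
    return json.dumps({"steps": [asdict(s) for s in build_steps(segments)]}, indent=2)
```

A caller would pass the recognizer output as a list of `Segment` objects to `assemble_guide` and send the resulting string to the second client device; the images themselves would be read from the video at each `frame_time` by whatever video library is in use.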
January 2, 2017
December 7, 2021