Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for annotation of video content in a device communicatively coupled to a network, the method comprising: receiving, in the device, a captured speech segment comprising speech from a user of a second device, wherein the captured speech segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user; converting the captured speech segment to a text-segment; associating the text-segment with the portion of the video content contemporaneously played to the user; and storing in a selectively retrievable manner the text-segment so that the text-segment is associated with the portion of the video content.
A method for annotating video streams involves a device on a network receiving audio from a user on a second device. The user's speech provides a comment on a specific part of the video as it's playing. The system converts this speech into text and links it to the corresponding video segment. This text annotation is stored so it can be retrieved later in association with that specific point in the video.
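As an illustration only, the four steps of claim 1 (receive, convert, associate, store) can be sketched as follows. The names `AnnotationStore`, `annotate`, and `fake_transcribe` are hypothetical and do not come from the patent. The transcriber is a stand-in that simply decodes bytes, not a real speech recognizer, and the store is a plain in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AnnotationStore:
    """Hypothetical in-memory store keyed by (video_id, portion index)."""
    _items: Dict[Tuple[str, int], List[str]] = field(default_factory=dict)

    def save(self, video_id: str, portion: int, text: str) -> None:
        # Associate the text-segment with one portion of one video.
        self._items.setdefault((video_id, portion), []).append(text)

    def lookup(self, video_id: str, portion: int) -> List[str]:
        # Selective retrieval: only annotations for the requested portion.
        return list(self._items.get((video_id, portion), []))


def annotate(speech: bytes, video_id: str, portion: int,
             transcribe: Callable[[bytes], str],
             store: AnnotationStore) -> str:
    """Convert a captured speech segment to text, then associate and store it."""
    text = transcribe(speech)
    store.save(video_id, portion, text)
    return text


def fake_transcribe(speech: bytes) -> str:
    # Assumption for the sketch: the "audio" is already UTF-8 text.
    return speech.decode("utf-8")
```

Passing the transcriber in as a function keeps the sketch independent of any particular speech-recognition engine, which the claim does not specify.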
2. An apparatus for annotation of a video content, the apparatus comprising: a memory; and a processor communicatively coupled to the memory and to a network interface, the processor configured to be communicatively coupled via the network interface to a network; the processor further configured to receive, via the network interface, a captured speech segment comprising speech from a user of a second device coupled to the network, wherein the captured speech segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user; the processor further configured to convert the captured speech segment to a text-segment, to associate the text-segment with the portion of the video content contemporaneously played to the user; and to store in a selectively retrievable manner the text-segment so that the text-segment is associated with the portion of the video content.
An apparatus for annotating video content includes a processor and memory connected to a network. The processor receives audio from a user on another device connected to the network. This audio annotates a specific part of a video that's currently playing to the user. The processor converts this audio into text, links the text to the corresponding part of the video, and stores this linked text so that it is associated with the video content for later retrieval.
3. The method of claim 1 further comprising: streaming the video content via the network to the second device.
In addition to the method described in claim 1, which involves a network device receiving speech that annotates video content, the device also streams the video to the second device where the user is watching. So, the system handles both sending the video and receiving the annotations.
4. The method of claim 1 further comprising: receiving, in the device, a timestamp associated with the captured speech segment; wherein associating the text-segment with the portion of the video content further comprises using the timestamp.
Building upon claim 1, the device also receives a timestamp associated with the captured speech segment. The timestamp is used when associating the converted text with the portion of the video being annotated.
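One possible way to use a timestamp for the association step is to map it onto a list of portion start times. This is a sketch under assumptions: the claim does not say how portions are defined, and `portion_for_timestamp` and the sorted `portion_starts` list (in seconds) are hypothetical.

```python
from bisect import bisect_right
from typing import List


def portion_for_timestamp(portion_starts: List[float], timestamp: float) -> int:
    """Return the index of the portion that contains the timestamp.

    portion_starts must be sorted in ascending order; each entry is the
    start time, in seconds, of one portion of the video.
    """
    if not portion_starts or timestamp < portion_starts[0]:
        raise ValueError("timestamp precedes the first portion")
    # bisect_right counts the starts that are <= timestamp.
    return bisect_right(portion_starts, timestamp) - 1
```

For example, with portions starting at 0, 10, and 25 seconds, a timestamp of 12.5 seconds falls in the portion at index 1.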
5. The method of claim 1 further comprising: generating metadata based on the text-segment.
Expanding on claim 1, after converting the speech to text, the system generates metadata based on the text. This metadata could include keywords, topics, sentiment analysis, or other information extracted from the text that describes the content of the annotation.
6. The method of claim 1 further comprising: generating metadata based on an identified speaker associated with the speech segment.
Expanding on claim 1, the system generates metadata based on the identified speaker of the speech segment. This metadata could include the speaker's name, role, or other identifying information, allowing for speaker-specific annotations.
7. The method of claim 1 further comprising: generating metadata based on specific words of the text-segment.
Expanding on claim 1, the system generates metadata based on specific words within the converted text. For example, certain keywords could trigger specific metadata tags or classifications, providing a more detailed analysis of the video content.
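A keyword-to-tag lookup is one simple way such word-based metadata might be produced. The table `KEYWORD_TAGS`, its contents, and the function `tags_for_text` are invented for illustration; the claim does not name any particular words or tags.

```python
import re
from typing import Dict, List

# Hypothetical mapping from trigger words to metadata tags.
KEYWORD_TAGS: Dict[str, str] = {
    "goal": "scoring",
    "foul": "penalty",
    "replay": "review",
}


def tags_for_text(text: str, keyword_tags: Dict[str, str] = KEYWORD_TAGS) -> List[str]:
    """Return the sorted, de-duplicated tags triggered by words in the text."""
    # Lowercase the text and split it into simple word tokens.
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted({tag for word, tag in keyword_tags.items() if word in words})
```

Matching is on whole words only, so "goalkeeper" would not trigger the "goal" entry in this sketch.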
8. The method of claim 1 further comprising: receiving, in the device, before receiving the captured speech segment, a message comprising a user input selecting an operational state.
Expanding on claim 1, before it receives the speech segment, the device receives a message carrying a user input that selects an operational state, such as a specific mode for the annotation process.
9. The method of claim 1 wherein the operational state is selected from the group consisting of an annotate state, a narrate state, a commentary state, an analyze state, and a review/edit state.
Building upon claim 8, the operational state chosen by the user is selected from annotate, narrate, commentary, analyze, or review/edit. This selection determines how the speech segment is handled and processed by the annotation system.
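The five named states map naturally onto an enumeration. The sketch below assumes, purely for illustration, that the selection arrives as a dictionary with a `"state"` key; the claims do not describe the message format, and `OperationalState` and `parse_state_message` are hypothetical names.

```python
from enum import Enum


class OperationalState(Enum):
    """The five operational states listed in the claim."""
    ANNOTATE = "annotate"
    NARRATE = "narrate"
    COMMENTARY = "commentary"
    ANALYZE = "analyze"
    REVIEW_EDIT = "review/edit"


def parse_state_message(message: dict) -> OperationalState:
    """Read the selected state from a message.

    Raises ValueError if the value is not one of the five states.
    """
    return OperationalState(message["state"].strip().lower())
```

A device could call `parse_state_message` on the message that precedes the speech segment and branch on the result when the segment arrives.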
10. The method of claim 1 wherein storing the text-segment further comprises: storing the text-segment and metadata in a database of a storage device communicatively coupled to the network.
Further to claim 1, storing the text segment involves storing both the text segment and related metadata in a database on a storage device that is connected to the network.
11. The method of claim 1 wherein storing the text-segment further comprises: storing the text-segment in a database of a storage device communicatively coupled to the network.
Further to claim 1, storing the text segment involves storing it in a database on a storage device that is connected to the network.
12. The method of claim 1 further comprising: storing, in a database of a storage device communicatively coupled to the network, metadata comprising a timestamp for associating the text-segment with the portion of the video content.
In addition to the method of claim 1, metadata, including a timestamp associating the text annotation with the specific point in the video, is stored in a database on a network-connected storage device. This allows accurate time-based searching and retrieval of annotations.
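A small relational table is one way to picture timestamped storage and time-based retrieval. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the table name, column names, and helper functions are assumptions made for the example, and a networked storage device would use a different database setup.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the hypothetical annotations table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotations ("
        "video_id TEXT NOT NULL, timestamp_s REAL NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def save_annotation(conn: sqlite3.Connection, video_id: str,
                    timestamp_s: float, text: str) -> None:
    # The connection context manager commits on success.
    with conn:
        conn.execute(
            "INSERT INTO annotations VALUES (?, ?, ?)",
            (video_id, timestamp_s, text),
        )


def annotations_between(conn: sqlite3.Connection, video_id: str,
                        start_s: float, end_s: float) -> List[Tuple[float, str]]:
    """Return (timestamp, text) rows with start_s <= timestamp < end_s."""
    rows = conn.execute(
        "SELECT timestamp_s, text FROM annotations "
        "WHERE video_id = ? AND timestamp_s >= ? AND timestamp_s < ? "
        "ORDER BY timestamp_s",
        (video_id, start_s, end_s),
    )
    return rows.fetchall()
```

Storing the timestamp as its own column lets a query select annotations for a given stretch of a given video without scanning the text itself.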
13. The method of claim 1 further comprising: storing, in a storage device communicatively coupled to the network, a modified version of the video content comprising metadata for associating the text-segment with the portion of the video content.
Expanding on claim 1, the system creates a modified version of the video content. This new version includes metadata that links the text annotation to the correct portion of the video, essentially embedding the annotation information within the video file itself.
14. The method of claim 1 wherein storing the text-segment further comprises: storing, in a storage device communicatively coupled to the network, a modified version of the video content comprising the text-segment and metadata for associating the text-segment with the portion of the video content.
Expanding on claim 1, the system creates a modified version of the video content which includes both the text segment and metadata that links the text annotation to the correct portion of the video, storing this modified video file on a network-connected storage device.
15. A method for annotation of video content in a device communicatively coupled to a network, the method comprising: receiving, in the device, a text-segment of recognized speech comprising recognized speech from a user of a second device coupled to the network, wherein the text-segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user; associating the text-segment with the portion of the video content; and storing in a selectively retrievable manner the text-segment so that it is associated with the portion of the video content.
A method for video annotation involves receiving text of recognized speech from a user on a second device via a network connection. This text annotates a part of the video currently playing to the user. The text is linked to the video segment and stored in a way that it can be retrieved later in association with that specific video portion.
16. The method of claim 15 further comprising: streaming the video content via the network to the second device.
In addition to the text-based annotation method of claim 15, the system also streams the video content to the user's device over the network. This combined functionality allows for both content delivery and user annotation.
17. The method of claim 15 further comprising: receiving, in the device, a timestamp associated with the captured speech segment; wherein associating the text-segment with the portion of the video content further comprises using the timestamp.
Extending claim 15, the annotation process includes receiving a timestamp related to the original speech segment. This timestamp information is used to improve the accuracy of associating the text annotation with the specific moment in the video being viewed.
18. A non-transitory computer-readable medium having computer-executable instructions embodied thereon for annotation of video content in a device communicatively coupled to a network, wherein the instructions, when executed by at least one processor of the device, cause the at least one processor to perform the method of claim 15.
A non-transitory computer-readable medium (like a hard drive or flash drive) stores instructions that, when executed by a processor, perform the method described in claim 15. This includes receiving text-based annotations from a user, linking them to video segments, and storing them for later retrieval.
19. A method for annotation of video content in a device communicatively coupled to a network, the method comprising: receiving, in the device, a text-segment of recognized speech comprising recognized speech from a user of a second device, wherein the text-segment annotates a portion of the video content streamed to the second device for being played to the user contemporaneously with the speech from the user; receiving, in the device, metadata comprising a timestamp for associating the text-segment with the portion of the video content; and storing in a selectively retrievable manner the text-segment so that it is associated with the portion of the video content.
A video annotation process involves receiving text of speech from a user on a second device, where that speech is annotating a video they are watching. It also receives metadata that includes a timestamp to associate the text with the correct video part. The system then stores the linked text annotation so it is retrievable with respect to the video segment.
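In this variant the device receives already-recognized text together with timestamp metadata. A receiving side might parse such a message along the following lines. The JSON layout, the field names (`video_id`, `text`, `metadata.timestamp_s`), and the `TextAnnotation` type are hypothetical, since the claim does not define a wire format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TextAnnotation:
    """A received text-segment plus the timestamp that ties it to the video."""
    video_id: str
    text: str
    timestamp_s: float


def parse_annotation_message(raw: str) -> TextAnnotation:
    """Parse a JSON message carrying a text-segment and timestamp metadata."""
    payload = json.loads(raw)
    timestamp_s = float(payload["metadata"]["timestamp_s"])
    if timestamp_s < 0:
        raise ValueError("timestamp must be non-negative")
    return TextAnnotation(payload["video_id"], payload["text"], timestamp_s)
```

The parsed record could then be handed to whatever storage layer keeps the annotation retrievable for that portion of the video.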
20. The method of claim 19 further comprising: streaming the video content via the network to the second device.
In addition to the method of claim 19, which receives text annotations and metadata, the device also streams the video content over the network to the user's device.
21. The method of claim 19 further comprising: receiving, in the device, a timestamp associated with the captured speech segment; wherein associating the text-segment with the portion of the video content further comprises using the timestamp.
Expanding on claim 19, the annotation process includes receiving a timestamp related to the original speech segment. This timestamp is then used to more accurately associate the text annotation with the specific moment in the video being viewed.
22. A non-transitory computer-readable medium having computer-executable instructions embodied thereon for annotation of video content in a device communicatively coupled to a network, wherein the instructions, when executed by at least one processor of the device, cause the at least one processor to perform the method of claim 19.
A non-transitory computer-readable medium (like a USB drive or SSD) stores software instructions that, when run on a processor, carry out the video annotation method from claim 19. This method involves receiving text annotations and associated metadata (including timestamps), and storing them so they can be linked to specific parts of a video.
July 29, 2014