US-10565988

Speech recognition for internet video search and navigation

Published: February 18, 2020

Assignee: not available in USPTO data

Inventors: not available in USPTO data

Technical Abstract

Speech representing a desired video site or video subject is detected and digitized at a TV remote and then sent to a TV. The TV, or in some embodiments an Internet server communicating with the TV, uses speech recognition to recognize the speech, enters a database using the recognized speech as the entering argument, and returns a link to an Internet site hosting the desired video. The link can be displayed on the TV for selection by a user to retrieve the video.
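The lookup step in the abstract (recognized speech used as the entering argument into a database that returns a link) can be illustrated with a minimal Python sketch. It assumes speech recognition has already produced a text string and models the database as a simple inverted index over text associated with each video, such as descriptive text or a transcript of its voice soundtrack. The class `VideoIndex`, its methods, and the word-overlap ranking are hypothetical illustrations, not the patent's implementation.

```python
from collections import defaultdict


class VideoIndex:
    """Hypothetical inverted index from words to video links (illustrative only).

    Each video's entry is built from associated text, e.g. descriptive text or a
    transcript of its voice soundtrack, supplied here as a plain string.
    """

    def __init__(self) -> None:
        # word -> set of links whose associated text contains that word
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> list:
        # Naive normalization (assumption): lowercase and split on whitespace.
        return text.lower().split()

    def add_video(self, link: str, text: str) -> None:
        # Register every word of the video's associated text under its link.
        for token in self._tokens(text):
            self._postings[token].add(link)

    def lookup(self, recognized_speech: str) -> list:
        # Score each link by how many distinct query words its text contains.
        scores = defaultdict(int)
        for token in set(self._tokens(recognized_speech)):
            for link in self._postings.get(token, ()):
                scores[link] += 1
        # Best match first; ties broken alphabetically so results are stable.
        return sorted(scores, key=lambda link: (-scores[link], link))
```

For example, after indexing one link with the text "cooking pasta at home" and another with "home repair basics", a lookup for "pasta at home" ranks the first link ahead of the second, because it shares three words with the query versus one. The returned links play the role of the link displayed on the TV for the user to select.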

Patent Claims

26 claims


1. A television comprising: an audio video receiver; a display; circuitry configured to: receive speech signals representing a video site or video subject; implement speech recognition on received speech signals to generate recognized speech data representing a video site or video subject; using the recognized speech data representing the video site or video subject, access at least one database including indices derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video, the indices being associated with the at least one database; and at least one index in the indices being correlated with the recognized speech and identified as at least one matching index element from the at least one database, the matching index element being useful for providing video to the display.

2. The television of claim 1, wherein the circuitry is configured to access at least one index derived from text input, for at least a data amount.

3. The television of claim 1, wherein the circuitry is configured to access information received for a most recent time period.

4. The television of claim 1, wherein the circuitry is configured to access a most recent “X” amount of information received, wherein “X” is a data amount.

5. The television of claim 1, wherein the circuitry is configured to access information representing items that are initial, manufacturer-defined grammar.

6. The television of claim 1, wherein the television is configured to maintain a limited grammar database so that memory and processing requirements to process the limited grammar database are manageable within the confines of the processor and memory, the limited grammar database including indices derived at least from the closed captioned text received by the television for a past “X” bytes, the limited grammar database not including indices derived from the closed captioned text received by the television in excess of the past “X” bytes, such that a match to the recognized speech is identified responsive to the recognized speech containing content that has occurred in the broadcast in the past “X” bytes.

7. The television of claim 1, wherein the television comprises a remote control device including the circuitry.

8. The television of claim 1, wherein the indices are derived from at least digitized voice soundtracks that accompany video.

9. The television of claim 1, wherein the indices are derived from at least descriptive text that is associated with video.

10. The television of claim 1, wherein the indices are derived from both digitized voice soundtracks that accompany video and descriptive text that is associated with video.

11. A television comprising: an audio video device (AVD); a remote control; wherein the remote control comprises circuitry to digitize received speech and send the digitized speech to the AVD; wherein the AVD comprises circuitry to: generate wireless commands to an audio video device (AVD); receive digitized speech and generate recognized speech from the digitized speech, the recognized speech being associated with a video; using the recognized speech as entering argument, access a data structure correlating speech associated with video to computer storage locations of stored video, the data structure comprising at least one index derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video; and retrieving, from the data structure, at least an identification of at least one video correlated to a match of the recognized speech.

12. The television of claim 11, wherein the remote control device is configured to access at least in part metadata received in video adapted to be presented on the AVD.

13. The television of claim 11, wherein the remote control device is configured to access at least in part closed caption text received in video adapted to be presented on the AVD.

14. The television of claim 11, wherein the remote control device is configured to access only information received for a most recent time period.

15. The television of claim 11, wherein the remote control device is configured to access only a most recent “X” amount of information received, wherein “X” is a data amount.

16. The television of claim 11, wherein the remote control device is adapted to remotely control a television receiver.

17. The television of claim 11, wherein the data structure is obtained at least in part using metadata received in video presented on an audio video device (AVD).

18. The television of claim 11, wherein the data structure is obtained at least in part using closed caption text received in video presented on the AVD.

19. The television of claim 11, wherein the index is derived from at least digitized voice soundtracks that accompany video.

20. The television of claim 11, wherein the index is derived from at least descriptive text that is associated with video.

21. The television of claim 11, wherein the index is derived from at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video.

22. A machine-executed method comprising: receiving speech signals representing a video site or video subject; implementing speech recognition on received speech signals representing a video site or video subject to generate recognized speech; using the recognized speech representing the video site or video subject, access at least one database including at least one index derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video; and correlating the recognized speech with at least one element of the index identified by the accessing to identify at least one matching index element from the at least one database, the matching index element being useful for providing video to the AVD.

23. The method of claim 22, wherein the index is derived from at least digitized voice soundtracks that accompany video.

24. The method of claim 22, wherein the index is derived from at least descriptive text that is associated with video.

25. The method of claim 22, wherein the index is derived from at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video.

26. A computer-implemented method comprising: recognizing digitized speech representing a video and generating recognized speech in response; using the recognized speech representing a video as entering argument, access a data structure correlating speech associated with video to computer storage locations of stored video, the data structure comprising at least one index derived from at least digitized voice soundtracks that accompany video, or at least descriptive text that is associated with video, or at least both digitized voice soundtracks that accompany video and descriptive text that is associated with video; retrieving, from the data structure, at least an identification of at least one video correlated to a match of the recognized speech.
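Claim 6 describes a limited grammar database built only from the closed captioned text received in the past "X" bytes, so that a spoken query matches only content that recently occurred in the broadcast. A minimal Python sketch of such a rolling window follows, assuming captions arrive as text strings that are stored as UTF-8. The class name `CaptionGrammarWindow`, the whole-line eviction policy, and the plain substring match are hypothetical simplifications; a real recognizer would constrain its grammar to the window rather than test substrings.

```python
from collections import deque


class CaptionGrammarWindow:
    """Hypothetical limited grammar built from the most recent caption bytes."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes  # the "X" bytes budget described in claim 6
        self._lines = deque()       # caption lines, oldest first, as UTF-8 bytes
        self._size = 0              # total caption bytes currently held

    def add_caption(self, text: str) -> None:
        data = text.encode("utf-8")
        self._lines.append(data)
        self._size += len(data)
        # Drop whole caption lines from the oldest end until within budget.
        while self._size > self.max_bytes and self._lines:
            self._size -= len(self._lines.popleft())

    def size(self) -> int:
        return self._size

    def matches(self, recognized_speech: str) -> bool:
        # Simplification (assumption): case-insensitive substring test
        # against the caption text still inside the window.
        window = b" ".join(self._lines).decode("utf-8").lower()
        return recognized_speech.lower() in window
```

Evicting whole caption lines keeps the bookkeeping simple but means the window can hold somewhat fewer than "X" bytes; a byte-exact window would instead trim the oldest line partially.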

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L G06F H04N

Patent Metadata

Filing Date

March 8, 2016

Publication Date

February 18, 2020
