A system and method for video surveillance and searching are disclosed. Video is analyzed and events are automatically detected. Based on the automatically detected events, textual descriptions are generated. The textual descriptions may be used to supplement video viewing and event viewing, and to provide for textual searching for events.
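As a rough illustration of the flow summarized above (detected events turned into text, and that text then used for searching), the following is a minimal sketch only. The `DetectedEvent` record, its fields, and the `describe` and `search` helpers are hypothetical names introduced here for illustration. They are not taken from the patent, and the video analysis step itself is assumed to have already produced the event records.

```python
from dataclasses import dataclass


@dataclass
class DetectedEvent:
    """Hypothetical record of one automatically detected event."""
    actor: str      # e.g. "a person"
    action: str     # e.g. "entered"
    target: str     # e.g. "the loading dock"
    time_s: float   # offset into the video sequence, in seconds


def describe(event: DetectedEvent) -> str:
    """Render an event record as a simple natural-language sentence."""
    minutes, seconds = divmod(int(event.time_s), 60)
    return f"At {minutes:02d}:{seconds:02d}, {event.actor} {event.action} {event.target}."


def search(events: list[DetectedEvent], query: str) -> list[DetectedEvent]:
    """Return events whose description contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [e for e in events if all(t in describe(e).lower() for t in terms)]


# Illustrative data only; a real system would obtain these from video analysis.
events = [
    DetectedEvent("a person", "entered", "the loading dock", 75.0),
    DetectedEvent("a vehicle", "stopped near", "the main gate", 192.5),
]

for e in events:
    print(describe(e))

matches = search(events, "vehicle gate")
print([describe(e) for e in matches])
```

The search here is a plain substring match over the generated sentences, chosen for brevity; the source does not specify how textual searching is implemented.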
1. A video surveillance system, comprising: a first video camera configured to capture at least a first video sequence including a plurality of video images; a processing system configured to automatically detect events; a storage system configured to receive and store records of the automatically detected events with respective geographical locations in association with the automatically detected events, the associated geographical location being at least in part based on map data; and a text generator configured to automatically generate textual descriptions of the automatically detected events, and wherein the automatically detected events are events that relate to the map data and are based on rules that relate to the map data, such that the text generator generates text that combines the map data with the automatically detected events into a report; and wherein a Geographical Information System (GIS) database is used to extract a name of a static scene feature in the video sequence and the name is included in the report.
2. The video surveillance system of claim 1, further comprising: a file generator configured to generate a file including the first video sequence and including embedded text that reflects a first textual description associated with a first detected event that occurs at a first time of the first video sequence and is inserted in the first video sequence based on the first time, and a second textual description associated with a second detected event that occurs at a second time of the first video sequence and is inserted in the first video sequence based on the second time, wherein the file is arranged so that when played back, the first textual description appears in the first video sequence at the first time and the second textual description appears in the first video sequence at the second time, such that the text displayed with the first video sequence changes, wherein the file for the first video sequence includes a first set of video frames, each video frame of the first set corresponding to one of the video images and including for simultaneous display a video image and embedded text.
3. The video surveillance system of claim 2, further comprising: a display system configured to play back the first video sequence and the embedded text, such that during the play back, individual video images of the first video sequence that are associated with an automatically detected event are displayed at the same time as embedded text describing the event.
4. The video surveillance system of claim 2, wherein the first time and second time include at least one of a temporal location within the video sequence and a universal time.
5. The video surveillance system of claim 1, wherein the processing system is further configured to automatically detect events that involve two or more agents in the video sequence.
6. The video surveillance system of claim 1, wherein the textual descriptions include a natural language sentence.
7. A video surveillance system comprising: a first video camera configured to capture a first video sequence including a plurality of video images; a processing system configured to automatically detect events in the video sequence; a storage system configured to receive and store records of the automatically detected events and to associate each automatically detected event with a geographical location where the automatically detected event occurred, wherein the records of the automatically detected events are searchable using one or more dropdown lists; a text generator configured to automatically generate a textual description of a first event of the automatically detected events, the textual description enhanced to include text that reflects the geographical location; a file generator configured to generate a file including the first video sequence and including embedded text that reflects the textual description and that is inserted in the video sequence; and a display system, wherein for each automatically detected event, and based on the textual description for the event, the display system is configured to display information for the event, the information including the text that reflects the geographical location, and to overlay the information on a map of a geographical area that includes the geographical location, such that the information for the event including the text that reflects the geographical location is visually associated with a particular location on the map, and wherein a Geographical Information System (GIS) database is used to extract the text that reflects the geographical location.
8. The video surveillance system of claim 7, wherein the processing system is further configured to receive a search request including one or more natural language terms relating to the first event, and to return the information to be displayed and overlaid on the map of the geographical area, based on the search request.
9. The video surveillance system of claim 8, wherein the search request further includes at least one of: geographical information, or time information.
10. The video surveillance system of claim 8, wherein the display system includes a hand-held device configured to submit the search request and receive and display the information for the first event overlaid on the map of the geographical area.
11. The video surveillance system of claim 8, wherein the processing system is configured to determine search results based on the search request and based on a geographical location of a device that makes the request.
12. The video surveillance system of claim 11, wherein the device is a smart phone.
13. The video surveillance system of claim 11, wherein the file generator is further configured to generate clips of the first video sequence, each clip including a set of video images that comprises an event, and the processing system is further configured to associate data related to the event with the clip.
14. The video surveillance system of claim 13, wherein the processing system is further configured to receive a search request including one or more natural language terms relating to the first event, and return a clip of the first event based on the search request.
15. The video surveillance system of claim 7, further comprising: for the first event, automatically displaying on the map, separately from the textual description for the first event, an indication of a route taken, the route corresponding to the first event.
16. A video surveillance method, comprising: analyzing a first video sequence including a plurality of video images; automatically detecting events in the first video sequence; receiving and storing records of the automatically detected events and associating each of the automatically detected events with a geographical location at which the event occurs, wherein the records of the automatically detected events are searchable using one or more dropdown lists; automatically generating a textual description of a first event of the automatically detected events, the textual description including text describing an actor, an action, and the geographical location; and for the first event of the automatically detected events, causing a display system to display information based on the textual description and overlay the information on a map of a geographical area, such that the text describing the actor, the action, and the geographical location is visually associated with a particular geographical location depicted in the map, and wherein a Geographical Information System (GIS) database is used to extract the text describing the geographical location.
17. The video surveillance method of claim 16, further comprising receiving a search request including one or more natural language terms relating to the first event, and returning the information to be overlaid on the map of the geographical area, based on the search request.
18. The video surveillance method of claim 17, wherein the search request further includes at least one of: geographical information, or temporal information.
19. The video surveillance method of claim 17, further comprising, receiving the search request from a display device, and based on the search request, causing the display device to display the information for the first event overlaid on the map of the geographical area.
20. The video surveillance method of claim 17, further comprising determining search results based on the search request and based on a geographical location of a device that makes the request.
21. The video surveillance method of claim 20, wherein the device is a smart phone.
22. The video surveillance method of claim 16, further comprising generating a file including a video clip of the first video sequence and including embedded text that reflects the textual description and is inserted in the video clip based on a time associated with the first video sequence.
23. The video surveillance method of claim 22, further comprising: causing a display system to display the video clip and the embedded text, such that individual video images of the first video sequence that are associated with an automatically detected event are displayed at the same time as embedded text describing the event.
24. The video surveillance method of claim 16, wherein automatically detecting events includes automatically detecting events between two agents in the video sequence.
25. The video surveillance method of claim 16, further comprising: automatically displaying on the map, separately from the textual description, an indication of a route taken by the actor, the route corresponding to the first event.
26. A video searching method, comprising: capturing a plurality of video sequences from a plurality of respective video sources, each video sequence including a plurality of video images; for each video sequence, automatically detecting an event associated with a set of video images; for each detected event, storing a record of the event and associating with the record geographical information about the event; receiving a search request related to one or more events, the search request generated using at least one drop down menu, the search request including one or more natural language terms; searching from among the stored records for events that satisfy the search request; based on the results of the search, providing a natural language description for each automatically detected event, the natural language description for each automatically detected event including the geographical information about that event; and based on the results of the search, for each automatically detected event, causing a display device to display a map of a geographical area with the natural language description including the geographical information overlaid on the map and wherein a Geographical Information System (GIS) database is used to extract text that reflects the geographical information.
27. The video searching method of claim 26, further comprising: based on the results of the search, for each detected event, additionally transmitting a video clip described by the natural language description and associated with the geographical information, and/or with universal time information.
28. The video searching method of claim 27, further comprising: based on the results of the search, for each detected event, additionally providing, along with the natural language description and with the geographical information, universal time information, and/or a single image associated with the event.
29. The video searching method of claim 26, wherein part of the natural language description for an associated event comprises a link to a video clip for the associated event.
30. The video searching method of claim 29, wherein the video clip includes a series of video images of the associated event and an embedded natural language textual description of the event.
31. The video searching method of claim 26, wherein each video source is a video camera.
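Several of the claims above combine three ideas: a GIS database supplies a place name for the event location, that name is folded into the textual description, and search results can depend on both the query terms and the location of the requesting device. The sketch below illustrates one possible reading of that combination and is not the patented implementation. The in-memory `GIS_FEATURES` table (with made-up bounding boxes), the `EventRecord` fields, the radius-based filter, and every function name are assumptions introduced here; a real deployment would query an actual GIS database and render the returned entries on a map.

```python
import math
from dataclasses import dataclass

# Hypothetical stand-in for a GIS database:
# feature name -> (min_lat, min_lon, max_lat, max_lon). Values are invented.
GIS_FEATURES = {
    "Main Street": (38.9000, -77.0400, 38.9010, -77.0300),
    "North Parking Lot": (38.9020, -77.0400, 38.9030, -77.0390),
}


@dataclass
class EventRecord:
    """Hypothetical stored record of a detected event and where it occurred."""
    actor: str
    action: str
    lat: float
    lon: float


def feature_name(lat, lon):
    """Return the name of the first GIS feature whose box contains the point, else None."""
    for name, (lat0, lon0, lat1, lon1) in GIS_FEATURES.items():
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            return name
    return None


def describe(rec):
    """Build a sentence naming the actor, the action, and the location."""
    place = feature_name(rec.lat, rec.lon) or f"({rec.lat:.4f}, {rec.lon:.4f})"
    return f"{rec.actor} {rec.action} at {place}."


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def search(records, query, device_lat, device_lon, radius_km):
    """Return (lat, lon, text) overlay entries for nearby events matching all query terms."""
    terms = query.lower().split()
    hits = []
    for rec in records:
        text = describe(rec)
        near = haversine_km(rec.lat, rec.lon, device_lat, device_lon) <= radius_km
        if near and all(t in text.lower() for t in terms):
            hits.append((rec.lat, rec.lon, text))
    return hits


# Illustrative records only.
records = [
    EventRecord("A vehicle", "stopped", 38.9005, -77.0350),
    EventRecord("A person", "loitered", 38.9025, -77.0395),
    EventRecord("A vehicle", "stopped", 40.0000, -75.0000),
]

hits = search(records, "vehicle stopped", 38.9006, -77.0349, radius_km=1.0)
for lat, lon, text in hits:
    print(f"pin at ({lat}, {lon}): {text}")
```

Each returned tuple pairs a description with the coordinates where a map layer could place it, which is one simple way to keep the text visually tied to a location. Filtering by a fixed radius around the device is only one interpretation of using the device location; the claims do not specify how that location influences the results.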
February 23, 2017
September 8, 2020