10733457

Method and System for Predicting in Real-Time One or More Potential Threats in Video Surveillance

Published: August 4, 2020
Assignee: not available in USPTO data

Patent Claims
11 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method of predicting in real-time one or more potential threats in video surveillance, the method comprising: receiving, by a threat prediction system, a real-time video feed from a video surveillance system, wherein the video feed comprises a plurality of frames associated with a scene captured at a location of the video surveillance system; identifying, by the threat prediction system, one or more objects in each of the plurality of frames, wherein each of the one or more objects is sequenced with respect to the received plurality of frames; generating, by the threat prediction system, a scene description for each of the plurality of frames based on the one or more objects and context associated with corresponding frames, wherein the scene description comprises sentences describing the scene and gestures, and the context comprises sentences describing the scene along with emotions associated with a user, wherein the user is associated with the one or more objects in the corresponding frames; determining, by the threat prediction system, one or more real-time actions for the scene based on the scene description, wherein the one or more real-time actions are determined using a trained action prediction model which is trained based on a conditional probability of possible state action change from one state to another state from a sequence of possible states; predicting, by the threat prediction system, one or more potential threats to the user associated with the video feed based on the one or more real-time actions; and alerting, by the threat prediction system, the user of the one or more potential threats based on the prediction.

Plain English Translation

This method predicts potential threats in video surveillance in real time. It starts by receiving a live video feed from a surveillance system; the feed consists of frames of a scene captured at the system's location. The system then identifies objects in each frame and keeps them sequenced with respect to the received frames. Next, it generates a scene description for each frame from these objects and the frame's context. The scene description consists of sentences describing the scene and gestures, while the context consists of sentences describing the scene along with the emotions of a user associated with the objects in that frame. Based on the scene description, the method determines real-time actions for the scene using a trained action prediction model. This model is trained on the conditional probability of moving from one state to another within a sequence of possible states. Finally, the system predicts potential threats to the user from these real-time actions and alerts the user to them.
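One way to read the "conditional probability of possible state action change from one state to another state" is as a first-order transition model estimated from observed state sequences. The sketch below assumes that reading; the claim does not specify the estimator, and the function names and state labels ("walking", "approaching", and so on) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_transition_model(sequences):
    """Estimate P(next_state | current_state) from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    model = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        model[current] = {s: c / total for s, c in nxt_counts.items()}
    return model


def predict_next_action(model, current_state):
    """Return the most probable next state, or None if the state was never seen."""
    transitions = model.get(current_state)
    if not transitions:
        return None
    return max(transitions, key=transitions.get)


# Hypothetical training sequences of scene states (illustrative labels only).
training_sequences = [
    ["walking", "approaching", "reaching"],
    ["walking", "approaching", "passing"],
    ["walking", "approaching", "reaching"],
]
model = train_transition_model(training_sequences)
next_action = predict_next_action(model, "approaching")
```

In this toy data, "approaching" is followed by "reaching" in two of three sequences, so that transition gets the highest estimated probability. A real system would derive the states from scene descriptions rather than from hand-written labels.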

Claim 2

Original Legal Text

2. The method as claimed in claim 1 , wherein the one or more objects are identified using a trained object detection model, wherein the object detection model is trained using a plurality of video training feeds using convolution neural network technique.

Plain English Translation

This method predicts real-time threats in video surveillance. It starts by receiving live video from a surveillance system, which includes frames of a scene. The system then identifies objects in each frame, keeping track of their sequence, using a trained object detection model; that model was trained on multiple video training feeds using a convolutional neural network (CNN) technique. Next, it generates a detailed scene description for each frame, combining information about these objects with context. The scene description includes sentences describing the scene and gestures, while the context adds sentences describing the scene along with the emotions of any identified users. Based on this scene description, the method determines real-time actions within the scene using a trained action prediction model. This model is specifically trained on the conditional probability of how actions change from one state to another over time. Finally, the system predicts potential threats to the user based on these real-time actions and then alerts the user about these predicted threats.
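The claim names only a "convolution neural network technique" and gives no architecture. As background, the sketch below shows the basic operation a convolutional layer applies to a frame: sliding a small kernel over the image and summing the element-wise products (strictly a cross-correlation, which is what CNN libraries conventionally call convolution). The function name, the toy frame, and the kernel are all hypothetical and are not taken from the patent.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy 3x4 grayscale "frame" with a dark left half and a bright right half.
frame = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(frame, vertical_edge)
```

The resulting feature map is nonzero only at the column where the brightness changes. In a trained CNN the kernel values are learned from the training feeds rather than written by hand.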

Claim 3

Original Legal Text

3. The method as claimed in claim 1 , wherein the scene description is generated using a trained scene description model, and wherein the scene description model is trained using a plurality of training objects identified for a plurality of video training feeds.

Plain English Translation

This method predicts real-time threats in video surveillance. It starts by receiving live video from a surveillance system, which includes frames of a scene. The system then identifies objects in each frame, keeping track of their sequence. Next, it generates a detailed scene description for each frame using a trained scene description model, which was trained using various objects identified from multiple video training feeds. This scene description combines information about the identified objects with context. The scene description itself comprises sentences describing the scene and gestures, while the context adds sentences describing the scene along with the emotions of any identified users. Based on this scene description, the method determines real-time actions within the scene using a trained action prediction model. This model is specifically trained on the conditional probability of how actions change from one state to another over time. Finally, the system predicts potential threats to the user based on these real-time actions and then alerts the user about these predicted threats.

Claim 4

Original Legal Text

4. The method as claimed in claim 1 , wherein the action prediction model is trained using a plurality of actions identified from a plurality of video training feeds.

Plain English Translation

This method predicts real-time threats in video surveillance. It starts by receiving live video from a surveillance system, which includes frames of a scene. The system then identifies objects in each frame, keeping track of their sequence. Next, it generates a detailed scene description for each frame, combining information about these objects with context. The scene description includes sentences describing the scene and gestures, while the context adds sentences describing the scene along with the emotions of any identified users. Based on this scene description, the method determines real-time actions within the scene using a trained action prediction model. This model is trained on the conditional probability of how actions change from one state to another over time, and its training data consists of actions identified from multiple video training feeds. Finally, the system predicts potential threats to the user based on these real-time actions and then alerts the user about these predicted threats.

Claim 5

Original Legal Text

5. The method as claimed in claim 1 , wherein the one or more potential threats are predicted by mapping each of the one or more real-time actions with a plurality of predefined threats using a trained threat prediction model, wherein the threat prediction model is trained using a plurality of training actions.

Plain English Translation

This method predicts real-time threats in video surveillance. It starts by receiving live video from a surveillance system, which includes frames of a scene. The system then identifies objects in each frame, keeping track of their sequence. Next, it generates a detailed scene description for each frame, combining information about these objects with context. The scene description includes sentences describing the scene and gestures, while the context adds sentences describing the scene along with the emotions of any identified users. Based on this scene description, the method determines real-time actions within the scene using a trained action prediction model. This model is specifically trained on the conditional probability of how actions change from one state to another over time. Finally, the system predicts potential threats to the user by mapping the determined real-time actions to a set of predefined threats, utilizing a trained threat prediction model that was trained using various training actions. The system then alerts the user about these predicted threats.
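The mapping of real-time actions to predefined threats could be sketched as a frequency table learned from labelled training actions. This is an assumption made for illustration: the claim does not say what kind of model is used, and the function names, the `min_share` threshold, and the action and threat labels below are all hypothetical.

```python
from collections import Counter, defaultdict


def train_threat_model(labelled_actions):
    """Count how often each training action was labelled with each predefined threat."""
    counts = defaultdict(Counter)
    for action, threat in labelled_actions:
        counts[action][threat] += 1
    return counts


def map_actions_to_threats(model, actions, min_share=0.5):
    """Map each action to its most frequent threat if that threat's share is high enough."""
    threats = []
    for action in actions:
        threat_counts = model.get(action)
        if not threat_counts:
            continue  # action never seen in training: no mapping
        threat, count = threat_counts.most_common(1)[0]
        if threat != "none" and count / sum(threat_counts.values()) >= min_share:
            threats.append((action, threat))
    return threats


# Hypothetical labelled training actions; "none" marks a benign action.
training = [
    ("reaching_into_bag", "theft"),
    ("reaching_into_bag", "theft"),
    ("reaching_into_bag", "none"),
    ("running_towards_user", "assault"),
    ("waving", "none"),
]
threat_model = train_threat_model(training)
alerts = map_actions_to_threats(threat_model, ["waving", "reaching_into_bag", "jumping"])
```

Here only "reaching_into_bag" produces an alert: "waving" maps to the benign label and "jumping" was never seen in training. The threshold trades missed threats against false alerts and would need tuning on real data.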

Claim 6

Original Legal Text

6. A threat prediction system for predicting in real-time one or more potential threats in video surveillance, comprising: a processor; and a memory communicatively coupled to the processor, wherein the memory stores processor instructions, which, on execution, causes the processor to: receive a real-time video feed from a video surveillance system, wherein the video feed comprises a plurality of frames associated with a scene captured at a location of the video surveillance system; identify one or more objects in each of the plurality of frames, wherein each of the one or more objects is sequenced with respect to the received plurality of frames; generate a scene description for each of the plurality of frames based on the one or more objects and context associated with corresponding frames, wherein the scene description comprises sentences describing the scene and gestures, and the context comprises sentences describing the scene along with emotions associated with a user, wherein the user is associated with the one or more objects in the corresponding frames; determine one or more real-time actions for the scene based on the scene description, wherein the one or more real-time actions are determined using based on a conditional probability of possible state action change from one state to another state from a sequence of possible states; predict one or more potential threats to the user associated with the video feed based on the one or more real-time actions; and alert the user of the one or more potential threats based on the prediction.

Plain English Translation

A threat prediction system comprises a processor and memory to perform real-time threat prediction in video surveillance. The system receives live video from a surveillance system, consisting of frames from a captured scene. It identifies sequenced objects within each frame. For each frame, it generates a scene description based on these objects and context, where the description includes scene sentences and gestures, and the context includes scene sentences plus associated user emotions. Based on this scene description, the system determines real-time actions for the scene using a model trained on conditional probability of state action changes. Subsequently, it predicts potential threats to the user based on these real-time actions and then alerts the user about the predicted threats.
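The processor instructions recited in the system claim form a fixed chain of stages: receive frames, identify objects, describe the scene, determine actions, predict threats, alert. A minimal sketch of that chain follows, assuming each stage is supplied as a plain callable. The class name `ThreatPipeline`, its field names, and the toy lambdas are hypothetical stand-ins, and the context and emotion inputs to the scene description are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ThreatPipeline:
    """Chains the claimed stages; every stage is a caller-supplied placeholder."""
    detect_objects: Callable[[object], List[str]]
    describe_scene: Callable[[List[str]], str]
    determine_actions: Callable[[List[str]], List[str]]
    predict_threats: Callable[[List[str]], List[str]]
    alert: Callable[[str], None]

    def run(self, frames: Sequence[object]) -> List[str]:
        # One scene description per frame, built from that frame's objects.
        descriptions = [self.describe_scene(self.detect_objects(f)) for f in frames]
        actions = self.determine_actions(descriptions)
        threats = self.predict_threats(actions)
        for threat in threats:
            self.alert(threat)
        return threats


# Toy stand-ins, purely illustrative; real stages would be trained models.
sent_alerts: List[str] = []
pipeline = ThreatPipeline(
    detect_objects=lambda frame: frame.split(),
    describe_scene=lambda objects: "scene with " + " and ".join(objects),
    determine_actions=lambda descs: ["approach"] if any("knife" in d for d in descs) else [],
    predict_threats=lambda actions: ["possible assault"] if "approach" in actions else [],
    alert=sent_alerts.append,
)
result = pipeline.run(["person bag", "person knife"])
```

Keeping each stage behind a callable mirrors the way the dependent claims swap in a specific trained model for one stage at a time while leaving the rest of the chain unchanged.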

Claim 7

Original Legal Text

7. The threat prediction system as claimed in claim 6 , wherein the processor identifies the one or more objects using a trained object detection model, wherein the object detection model is trained using a plurality of video training feeds using convolution neural network technique.

Plain English Translation

A threat prediction system comprises a processor and memory to perform real-time threat prediction in video surveillance. The system receives live video from a surveillance system, consisting of frames from a captured scene. The processor identifies sequenced objects within each frame using a trained object detection model, which was specifically trained with multiple video feeds using a Convolutional Neural Network (CNN) technique. For each frame, it generates a scene description based on these objects and context, where the description includes scene sentences and gestures, and the context includes scene sentences plus associated user emotions. Based on this scene description, the system determines real-time actions for the scene using a model trained on conditional probability of state action changes. Subsequently, it predicts potential threats to the user based on these real-time actions and then alerts the user about the predicted threats.

Claim 8

Original Legal Text

8. The threat prediction system as claimed in claim 6 , wherein the processor generates the scene description using a trained scene description model, and wherein the scene description is trained using a plurality of training objects identified for a plurality of video training feeds.

Plain English Translation

A threat prediction system comprises a processor and memory to perform real-time threat prediction in video surveillance. The system receives live video from a surveillance system, consisting of frames from a captured scene. It identifies sequenced objects within each frame. The processor generates a scene description for each frame using a trained scene description model, which was trained using various objects identified from multiple video training feeds. This description is based on the identified objects and context, where the scene description includes sentences describing the scene and gestures, and the context includes sentences describing the scene plus associated user emotions. Based on this scene description, the system determines real-time actions for the scene using a model trained on conditional probability of state action changes. Subsequently, it predicts potential threats to the user based on these real-time actions and then alerts the user about the predicted threats.

Claim 9

Original Legal Text

9. The threat prediction system as claimed in claim 6 , wherein the action prediction model is trained using a plurality of actions identified from a plurality of video training feeds.

Plain English Translation

A threat prediction system comprises a processor and memory to perform real-time threat prediction in video surveillance. The system receives live video from a surveillance system, consisting of frames from a captured scene. It identifies sequenced objects within each frame. For each frame, it generates a scene description based on these objects and context, where the description includes scene sentences and gestures, and the context includes scene sentences plus associated user emotions. Based on this scene description, the system determines real-time actions for the scene using a trained action prediction model. This model is trained on the conditional probability of how actions change from one state to another over time, and its training data consists of actions identified from multiple video training feeds. Subsequently, the system predicts potential threats to the user based on these real-time actions and then alerts the user about the predicted threats.

Claim 10

Original Legal Text

10. The threat prediction system as claimed in claim 6 , wherein the processor predicts the one or more potential threats by mapping each of the one or more real-time actions with a plurality of predefined threats using a trained threat prediction model, and wherein the threat prediction model is trained using a plurality of training actions.

Plain English Translation

A threat prediction system comprises a processor and memory to perform real-time threat prediction in video surveillance. The system receives live video from a surveillance system, consisting of frames from a captured scene. It identifies sequenced objects within each frame. For each frame, it generates a scene description based on these objects and context, where the description includes scene sentences and gestures, and the context includes scene sentences plus associated user emotions. Based on this scene description, the system determines real-time actions for the scene using a model trained on conditional probability of state action changes. Subsequently, the processor predicts potential threats to the user by mapping the determined real-time actions to a set of predefined threats, utilizing a trained threat prediction model that was trained using various training actions. The system then alerts the user about these predicted threats.

Claim 11

Original Legal Text

11. A non-transitory computer readable medium including instruction stored thereon that when processed by at least one processor cause threat prediction system to perform operation comprising: receiving a real-time video feed from a video surveillance system, wherein the video feed comprises a plurality of frames associated with a scene captured at a location of the video surveillance system; identifying one or more objects in each of the plurality of frames, wherein each of the one or more objects is sequenced with respect to the received plurality of frames; generating a scene description for each of the plurality of frames based on the one or more objects and context associated with corresponding frames, wherein the scene description comprises sentences describing the scene and gestures, and the context comprises sentences describing the scene along with emotions associated with a user, wherein the user is associated with the one or more objects in the corresponding frames; determining one or more real-time actions for the scene based on the scene description, wherein the one or more real-time actions are determined using based on a conditional probability of possible state action change from one state to another state from a sequence of possible states; predicting one or more potential threats to the user associated with the video feed based on the one or more real-time actions; and alerting the user of the one or more potential threats based on the prediction.

Plain English Translation

A non-transitory computer readable medium stores instructions that, when executed by a processor, cause a threat prediction system to predict real-time threats in video surveillance. The system receives live video from a surveillance system, comprising frames from a captured scene. It identifies sequenced objects within each frame. For each frame, it generates a scene description based on these objects and context, where the description includes scene sentences and gestures, and the context includes scene sentences plus associated user emotions. Based on this scene description, the system determines real-time actions for the scene using a model trained on conditional probability of state action changes. Subsequently, it predicts potential threats to the user based on these real-time actions and then alerts the user about the predicted threats.

Patent Metadata

Filing Date

Unknown

Publication Date

August 4, 2020

Inventors

Gopichand Agnihotram
Manjunath Ramachandra Iyer


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “METHOD AND SYSTEM FOR PREDICTING IN REAL-TIME ONE OR MORE POTENTIAL THREATS IN VIDEO SURVEILLANCE” (10733457). https://patentable.app/patents/10733457
