Visual Relationship Detection Method and System Based on Region-Aware Learning Mechanisms

Published: April 12, 2022

Assignee: not available in the USPTO data

Inventors: Anan LIU, Hongshuo TIAN, Ning XU, Weizhi NIE, Dan SONG

Patent Claims

8 claims

1. A visual relationship detection method based on a region-aware learning mechanism, executed by a processor, comprising: acquiring a triplet graph structure and combining its features after aggregation with neighboring nodes, using the combined features as nodes in a second graph structure, and connecting the nodes with equiprobable edges to form the second graph structure; combining node features of the second graph structure with features of corresponding entity object nodes in the triplet, using the combined features as a visual attention mechanism and merging internal region visual features extracted by two entity objects, and using the merged region visual features as the visual features to be used in the next message propagation by the corresponding entity object nodes in the triplet; and, after a certain number of message propagations, combining the output triplet node features and the node features of the second graph structure to infer predicates between object sets.

2. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the step of “acquiring a triplet graph structure”, as executed by the processor, specifically comprises: using region visual features of the entity objects as features of a set of nodes in a first graph structure, connecting the entity objects in accordance with probabilities of co-occurrence, and gathering feature information of neighboring nodes by a message propagation mechanism to enhance the visual representation of the current node; using, after each message propagation, the output node features as the visual attention mechanism and also as the visual features to be used in the next message propagation by the nodes in the first graph structure; and using the extracted features of each object set and region visual features of the corresponding two entity objects as a set of nodes, and connecting the nodes in accordance with the statistical probabilities of visual relationships to form a triplet graph structure.

3. The visual relationship detection method based on a region-aware learning mechanism according to claim 2, wherein the first graph structure is specifically as follows: co-occurrence matrices are used as edges of the first graph structure and region visual features are used as vertices of the first graph structure.
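
The first graph structure of claim 3 can be illustrated with a minimal sketch. The claims do not state the propagation rule, so the update below (each node adds the edge-weighted sum of its neighbors' features to its own) is an assumption, as are the row normalization of the co-occurrence counts, the example counts, and the use of a single 2-dimensional feature vector per node.

```python
def row_normalize(counts):
    """Turn raw co-occurrence counts into per-row probabilities."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def propagate(features, edges):
    """One message-propagation step (assumed rule): each node adds the
    edge-weighted sum of its neighbors' features to its own feature."""
    n, dim = len(features), len(features[0])
    updated = []
    for i in range(n):
        message = [0.0] * dim
        for j in range(n):
            if i == j:
                continue  # a node does not send a message to itself
            for d in range(dim):
                message[d] += edges[i][j] * features[j][d]
        updated.append([features[i][d] + message[d] for d in range(dim)])
    return updated


# Hypothetical co-occurrence counts for three entity objects.
cooccurrence_counts = [
    [0, 3, 1],
    [3, 0, 0],
    [1, 0, 0],
]
# Hypothetical region visual features, one vector per vertex.
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

edges = row_normalize(cooccurrence_counts)    # edges of the first graph
enhanced = propagate(node_features, edges)    # enhanced vertex features
```

Here the first node co-occurs mostly with the second, so its enhanced feature is pulled mainly toward the second node's feature.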

4. The visual relationship detection method based on a region-aware learning mechanism according to claim 2, wherein the step of “using, after each message propagation, the output node features as the visual attention mechanism and also as the visual features to be used in the next message propagation by the nodes in the first graph structure”, as executed by the processor, specifically comprises: combining the enhanced node representation with each region visual feature to compute an unnormalized relevance score; normalizing the unnormalized relevance score to acquire a weight distribution value of the visual attention mechanism; computing the weighted sum of the M region features of each entity object using the acquired weight distribution value of the visual attention mechanism, to obtain the merged visual representation; and performing message propagation by using the merged visual representation as the visual features to be used in the next message propagation by the corresponding nodes in the first graph structure.
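
The attention steps of claim 4 (relevance score, normalization, weighted sum over M region features) can be sketched as follows. The claim does not specify how the enhanced node representation is combined with each region feature or how the score is normalized, so the dot-product score and the softmax used here are assumptions, and all values are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(node_repr, region_features):
    """Merge M region features into one vector, weighted by relevance."""
    # Unnormalized relevance score: dot product (an assumed scoring function).
    scores = [sum(q * r for q, r in zip(node_repr, region))
              for region in region_features]
    # Weight distribution of the visual attention mechanism.
    weights = softmax(scores)
    # Weighted sum of the M region features gives the merged representation.
    dim = len(region_features[0])
    merged = [sum(w * region[d] for w, region in zip(weights, region_features))
              for d in range(dim)]
    return weights, merged


node_repr = [1.0, 0.0]                            # enhanced node representation
regions = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]    # M = 3 region features
weights, merged = attend(node_repr, regions)
```

The merged vector would then replace the node's visual feature in the next round of message propagation.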

5. The visual relationship detection method based on a region-aware learning mechanism according to claim 2, wherein the triplet graph structure is specifically as follows: the statistical probabilities of visual relationships are used as edges of the triplet graph structure; and features of each object set and the region visual features of the corresponding two entity objects are used as vertices of the triplet graph structure.
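
The claims do not say how the statistical probabilities of visual relationships are obtained. One plausible source, shown here purely as an assumption, is the relative frequency of each predicate for a given subject and object category pair in annotated training triples. The function name and the annotations below are hypothetical.

```python
from collections import Counter, defaultdict


def relationship_probabilities(triples):
    """Estimate P(predicate | subject, object) from annotated triples."""
    pair_counts = Counter()
    triple_counts = Counter()
    for subject, predicate, obj in triples:
        pair_counts[(subject, obj)] += 1
        triple_counts[(subject, predicate, obj)] += 1
    probs = defaultdict(dict)
    for (subject, predicate, obj), count in triple_counts.items():
        probs[(subject, obj)][predicate] = count / pair_counts[(subject, obj)]
    return dict(probs)


# Hypothetical training annotations: (subject, predicate, object).
annotations = [
    ("person", "riding", "horse"),
    ("person", "riding", "horse"),
    ("person", "next to", "horse"),
    ("person", "wearing", "hat"),
]
edge_probs = relationship_probabilities(annotations)
```

Under this assumption, such values could serve as edge weights between the entity object vertices and the object set vertex of a triplet.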

6. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the second graph structure is specifically formed, as executed by the processor, as follows: acquiring the output features of each triplet graph structure after its aggregation with neighboring nodes, mapping the acquired features to a feature space of the same dimension, and then concatenating them along the feature dimension as the nodes in the second graph structure; and fully connecting the nodes in the second graph structure, wherein the edges connecting each node and its neighboring nodes are equiprobable edges.
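
A minimal sketch of the second graph structure in claim 6 follows. It assumes a plain linear map for the projection to a common dimension and reads "equiprobable edges" as a uniform weight of 1/(n - 1) between every pair of distinct nodes; neither detail is stated in the claims, and the weight matrices and features are hypothetical.

```python
def project(vector, weight):
    """Linear map; weight has one row per output dimension."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]


def build_second_graph(triplet_outputs, weights):
    """Project each triplet's three outputs to a common dimension,
    concatenate them into one node, and fully connect the nodes."""
    nodes = []
    for outputs in triplet_outputs:  # (subject, object set, object) features
        parts = [project(f, w) for f, w in zip(outputs, weights)]
        nodes.append([x for part in parts for x in part])
    n = len(nodes)
    edge = 1.0 / (n - 1) if n > 1 else 0.0  # assumed uniform edge weight
    edges = [[0.0 if i == j else edge for j in range(n)] for i in range(n)]
    return nodes, edges


# Hypothetical projection matrices mapping every input to 2 dimensions.
w_entity = [[1.0, 0.0], [0.0, 1.0]]          # 2-d entity feature to 2-d
w_set = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 3-d object set feature to 2-d

triplet_outputs = [
    ([1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0]),
    ([0.0, 1.0], [1.0, 0.0, 2.0], [2.0, 2.0]),
    ([1.0, 1.0], [0.0, 0.0, 1.0], [3.0, 0.0]),
]
nodes, edges = build_second_graph(triplet_outputs, (w_entity, w_set, w_entity))
```

Each node of the second graph is therefore a 6-dimensional vector here, and every node weights its two neighbors equally.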

7. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the step of “using the combined features as a visual attention mechanism and merging internal region visual features extracted by two entity objects”, as executed by the processor, specifically comprises: computing an unnormalized relevance score from the combined features and each output region visual feature; and normalizing the unnormalized relevance score to acquire a weight distribution value of the visual attention mechanism, and computing the weighted sum of the region features of the corresponding entity object to obtain the merged visual representation.

8. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the step of “combining the output triplet node features and the node features of the second graph structure”, as executed by the processor, specifically comprises: outputting the nodes of each entity object in the triplet graph structure after T_k message propagations, processing them with an average pooling strategy, and then concatenating them with the visual features of the entity object along the feature dimension; and outputting the nodes of the object sets in the triplet graph structure after T_k message propagations, and concatenating them with the object set features of an initialized node and the output of each node in the second graph structure along the feature dimension.
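
The final combination in claim 8 can be sketched as two concatenations. This is an illustration under assumptions: the average pooling is taken over the region vectors of an entity node, the function and parameter names are hypothetical, and the predicate classifier that would consume these vectors is omitted because the claims do not describe it.

```python
def average_pool(vectors):
    """Element-wise mean over a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def readout(entity_regions_out, entity_visual, set_out, set_init, second_graph_out):
    """Assemble the two concatenated representations described in claim 8."""
    # Entity node output after propagation, pooled over regions (assumed),
    # then concatenated with the entity's own visual feature.
    entity_repr = average_pool(entity_regions_out) + entity_visual
    # Object set node output, concatenated with its initial feature and
    # the corresponding second-graph node output.
    set_repr = set_out + set_init + second_graph_out
    return entity_repr, set_repr


entity_repr, set_repr = readout(
    entity_regions_out=[[1.0, 3.0], [3.0, 5.0]],  # two region vectors
    entity_visual=[0.5, 0.5],
    set_out=[1.0, 0.0],
    set_init=[0.0, 1.0],
    second_graph_out=[2.0, 2.0, 2.0],
)
```

List addition in Python concatenates, so both results are single flat vectors along the feature dimension.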

Patent Metadata

Filing Date

Unknown

Publication Date

April 12, 2022

Inventors

Anan LIU

Hongshuo TIAN

Ning XU

Weizhi NIE

Dan SONG
