US-10121076

Recognizing entity interactions in visual media

Published: November 6, 2018

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

An entity interaction recognition system algorithmically recognizes a variety of different types of entity interactions that may be captured in two-dimensional images. In some embodiments, the system estimates the three-dimensional spatial configuration or arrangement of entities depicted in the image. In some embodiments, the system applies a proxemics-based analysis to determine an interaction type. In some embodiments, the system infers, from a characteristic of an entity detected in an image, an area or entity of interest in the image.
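The proxemics-based step mentioned above can be illustrated with a minimal sketch. It assumes that three-dimensional positions for two people have already been estimated in metres, and it uses approximate distance bands from Hall's classic proxemic zones. The function name `proxemics_class`, the `ZONES` table, and the thresholds are illustrative assumptions; the abstract does not say how the patented system defines its classes.

```python
import math

# Approximate upper bounds (in metres) of Hall's proxemic zones.
# These cut-offs are assumptions for illustration, not values from the patent.
ZONES = [(0.45, "intimate"), (1.2, "personal"), (3.6, "social")]


def proxemics_class(p1, p2):
    """Classify the distance between two estimated 3D positions (metres)."""
    d = math.dist(p1, p2)  # Euclidean distance, Python 3.8+
    for upper, name in ZONES:
        if d < upper:
            return name
    return "public"


if __name__ == "__main__":
    # Two people estimated to stand about one metre apart.
    print(proxemics_class((0.0, 0.0, 2.0), (1.0, 0.0, 2.0)))
```

A real system would likely combine such a distance band with other cues (for example body orientation) before deciding on an interaction type; this sketch covers only the distance part.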

Patent Claims

21 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for inferring an area of interest in a two-dimensional image depicting at least one person, the method comprising, with a computing system, algorithmically: locating the at least one person in the image; determining, from the image, a spatial configuration of at least a portion of the at least one person located in the image; estimating a three-dimensional position of the person from the determined spatial configuration; analyzing the three-dimensional position of the person using a proxemics analysis; determining a type of human interaction likely depicted in the image based on the proxemics analysis; and inferring an area of interest in the image based on the determined type of human interaction, the area of interest at least partially spaced from the at least one person, and the area of interest having a size that is greater than zero and less than the size of the entire image.

2. The method of claim 1, wherein the method comprises further inferring the area of interest based on a location of the person in the image and a spatial configuration determined for a plurality of characteristics of the person.

3. The method of claim 2, wherein the plurality of characteristics comprises a face pose and a hand location of the person, and the method further comprises estimating the face pose, estimating the hand location based on the estimated face pose, and inferring the area of interest based on the estimated face pose and the estimated hand location.

4. The method of claim 2, wherein the spatial configuration comprises an estimated face pose of the person, and the method comprises computing a direction of the person's gaze from the estimated face pose, and inferring the area of interest based on the computed direction of the person's gaze.

5. The method of claim 4, further comprising estimating a location of the person's hand and inferring the area of interest based on the location of the person's hand.

6. The method of claim 1, wherein the area of interest comprises a plurality of entities of possible interest, and the method comprises filtering the plurality of entities of possible interest to a smaller number of entities of possible interest based on a determined spatial configuration of at least one entity depicted in the image.

7. The method of claim 1, comprising detecting an attribute of the person depicted in the image, computing a size of the attribute detected in the image, comparing the size of the detected attribute to a threshold size, and inferring the area of interest in the image based on the comparison of the size of the detected attribute to the threshold size.

8. The method of claim 7, wherein the attribute of the person comprises the person's face, and the method comprises computing the size of the detected face, comparing the size of the detected face to a threshold face size and inferring the area of interest in the image based on the comparison of the size of the detected face to the threshold face size.

9. The method of claim 1, wherein the inferred area of interest comprises a plurality of entities of possible interest and the method comprises using one or more characteristics of one or more of the entities of possible interest to filter the plurality of entities of possible of interest to a smaller number of entities of possible interest.

10. The method of claim 1, further comprising a surface, and the method comprises determining a three-dimensional arrangement of the surface and another entity depicted in the image based on a characteristic of the surface.

11. The method of claim 10, comprising associating a surface affordance with the surface based on the characteristic of the surface and the three-dimensional arrangement of the surface and a person depicted in the image.

12. A non-transitory computer readable medium for storing computer instructions that, when executed by one or more processors causes the one or more processors to perform a method for inferring an area of interest in a two-dimensional image depicting at least one person, the method comprising, algorithmically: locating the at least one person in the image; determining, from the image, a spatial configuration of at least a portion of the at least one person located in the image; estimating a three-dimensional position of the person from the determined spatial configuration; analyzing the three-dimensional position of the person using a proxemics analysis; determining a type of human interaction likely depicted in the image based on the proxemics analysis; and inferring an area of interest in the image based on the determined type of human interaction, the area of interest at least partially spaced from the at least one person, and the area of interest having a size that is greater than zero and less than the size of the entire image.

13. A computing system comprising one or more of an image/video tagger, an information retrieval system, and an intelligent assistant and one or more processors to perform a method for inferring an area of interest in a two-dimensional image depicting at least one person, the method comprising, algorithmically: locating the at least one person in the image; determining, from the image, a spatial configuration of at least a portion of the at least one person located in the image; estimating a three-dimensional position of the person from the determined spatial configuration; analyzing the three-dimensional position of the person using a proxemics analysis; determining a type of human interaction likely depicted in the image based on the proxemics analysis; and inferring an area of interest in the image based on the determined type of human interaction, the area of interest at least partially spaced from the at least one person, and the area of interest having a size that is greater than zero and less than the size of the entire image.

14. A method for inferring an area of interest in a recorded two-dimensional image using a characteristic of a gaze of a person depicted in the recorded two-dimensional image, the method comprising, with a computing system, algorithmically: detecting the person in the recorded two-dimensional image; estimating a three-dimensional spatial configuration of at least a portion of the person in the image; inferring a characteristic of the person's gaze in the image based on the estimated spatial configuration; determining a proxemics class associated with the recorded two-dimensional image based on the determined three-dimensional spatial configuration; and inferring an area of interest in the image based on at least one of the inferred characteristic of the person's gaze or the proxemics class.

15. The method of claim 14, wherein the estimated spatial configuration comprises an estimated head pose of the detected person and the method comprises inferring a characteristic of the person's gaze in the image based on the estimated head pose.

16. The method of claim 14, comprising classifying the recorded two-dimensional image as depicting a type of human interaction based on the inferred characteristic of the person's gaze.

17. The method of claim 16, wherein the recorded image comprises a plurality of images and the method comprises tracking the person's gaze by repeating the detecting, estimating, and inferring for the plurality of images over a time period, and reclassifying the recorded image as depicting a different type of human interaction based on the tracking of the person's gaze over the time period.

18. The method of claim 14, comprising inferring a direction of the person's gaze, and inferring an area of interest in the recorded two-dimensional image based on the direction of the person's gaze, wherein the area of interest at least partially spaced from the person, and the area of interest has a size that is greater than zero and less than the size of the entire recorded two-dimensional image.

19. The method of claim 18, wherein the inferring an area of interest comprises inferring a three-dimensional area of interest from the recorded two-dimensional image.

20. A non-transitory computer readable medium for storing computer instructions that, when executed by one or more processors, causes the one or more processors to perform a method for inferring an area of interest in a recorded two-dimensional image using a characteristic of a gaze of a person depicted in the recorded two-dimensional image, the method comprising, algorithmically: detecting the person in the recorded two-dimensional image; estimating a three-dimensional spatial configuration of at least a portion of the person in the image; inferring a characteristic of the person's gaze in the image based on the estimated spatial configuration; determining a proxemics class associated with the recorded two-dimensional image based on the determined three-dimensional spatial configuration; and inferring an area of interest in the image based on at least one of the inferred characteristic of the person's gaze or the proxemics class.

21. A computing system comprising one or more of an image/video tagger, an information retrieval system, and an intelligent assistant and one or more processors to perform a method for inferring an area of interest in a recorded two-dimensional image using a characteristic of a gaze of a person depicted in the recorded two-dimensional image, the method comprising, algorithmically: detecting the person in the recorded two-dimensional image; estimating a three-dimensional spatial configuration of at least a portion of the person in the image; inferring a characteristic of the person's gaze in the image based on the estimated spatial configuration; determining a proxemics class associated with the recorded two-dimensional image based on the determined three-dimensional spatial configuration; and inferring an area of interest in the image based on at least one of the inferred characteristic of the person's gaze or the proxemics class.
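Several of the claims above describe inferring a gaze direction from an estimated head pose and then placing an area of interest that is spaced from the person. The following is a rough sketch of that idea, not the claimed method itself. It assumes a face bounding box `(x, y, w, h)` in pixels and head-pose angles where positive yaw turns the face toward the right of the image and positive pitch looks up. The function name `infer_area_of_interest`, the `reach` multiplier, and the box size (twice the face size) are hypothetical choices made for illustration.

```python
import math


def infer_area_of_interest(face_box, yaw_deg, pitch_deg, image_size, reach=3.0):
    """Return (left, top, right, bottom) of a region along the gaze, or None.

    face_box is (x, y, w, h) in pixels; image_size is (width, height).
    All conventions here are assumptions made for this sketch.
    """
    x, y, w, h = face_box
    img_w, img_h = image_size
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)

    # Project the gaze direction onto the image plane (y grows downward).
    dx = math.sin(yaw) * math.cos(pitch)
    dy = -math.sin(pitch)
    norm = math.hypot(dx, dy)
    if norm < 1e-6:
        # The person looks straight at the camera: no in-image target.
        return None

    # Step away from the face centre along the gaze by `reach` face sizes.
    cx = x + w / 2 + reach * w * dx / norm
    cy = y + h / 2 + reach * h * dy / norm

    # Region twice the face size around that point, clipped to the image.
    left, top = max(0.0, cx - w), max(0.0, cy - h)
    right, bottom = min(float(img_w), cx + w), min(float(img_h), cy + h)
    if right <= left or bottom <= top:
        return None  # the gaze points outside the image
    return (left, top, right, bottom)


if __name__ == "__main__":
    # Face at the left of a 640x480 image, head turned fully to the right.
    print(infer_area_of_interest((100, 100, 50, 50), 90, 0, (640, 480)))
```

With the default `reach`, the returned region does not overlap the face box, which loosely mirrors the "at least partially spaced from the person" language. A fuller treatment would cast the gaze ray in three dimensions and could also use the proxemics class, as the claims allow.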

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06V

Patent Metadata

Filing Date

May 2, 2016

Publication Date

November 6, 2018
