Learning-Based Pose Estimation from Depth Maps

Published: November 12, 2013

Assignee: not available in our USPTO data

Patent Claims

45 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for processing data, comprising: receiving a depth map of a scene containing a humanoid form, the depth map comprising a matrix of pixels having respective pixel depth values; extracting from the depth map respective descriptors based on the depth values in a plurality of patches distributed in respective positions over the humanoid form, wherein extracting the respective descriptors comprises dividing each patch into an array of spatial bins, and computing a vector of descriptor values corresponding to the pixel depth values in each of the spatial bins, wherein the descriptor values are indicative of a statistical distribution of the depth values in each bin; matching the extracted descriptors to previously-stored descriptors in a database; and estimating a pose of the humanoid form based on stored information associated with the matched descriptors.

2. The method according to claim 1, wherein each patch has a center point, and wherein the spatial bins that are adjacent to the center point have smaller respective areas than the spatial bins at a periphery of the patch.

3. The method according to claim 1, wherein each patch has a center point, and wherein the spatial bins are arranged radially around the center point.
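
Illustrative sketch (not part of the claims): claims 1 to 3 describe dividing each patch into spatial bins arranged radially around a center point, with smaller bins near the center, and summarizing the depth values in each bin statistically. The Python below is one possible reading, assuming concentric rings with doubling radii, equal angular sectors, and a per-bin mean and standard deviation of depth relative to the center pixel. The function names, ring radii, sector count, and choice of statistics are all assumptions made for illustration; the claims do not specify them.

```python
import math

def radial_bin_index(dx, dy, ring_edges, n_sectors):
    """Map an offset from the patch center to a bin index, or None if outside the patch."""
    r = math.hypot(dx, dy)
    for ring, outer in enumerate(ring_edges):
        if r <= outer:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            # Clamp guards against angle rounding up to exactly 2*pi.
            sector = min(int(angle / (2 * math.pi) * n_sectors), n_sectors - 1)
            return ring * n_sectors + sector
    return None

def patch_descriptor(depth, cx, cy, ring_edges=(2, 4, 8), n_sectors=4):
    """Return a flat vector [mean, std, mean, std, ...] with one pair per bin.

    depth is a list of rows; values are taken relative to the center pixel
    (an assumption) so the descriptor does not depend on absolute distance.
    """
    n_bins = len(ring_edges) * n_sectors
    samples = [[] for _ in range(n_bins)]
    radius = ring_edges[-1]
    center_depth = depth[cy][cx]
    for y in range(max(0, cy - radius), min(len(depth), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(depth[0]), cx + radius + 1)):
            b = radial_bin_index(x - cx, y - cy, ring_edges, n_sectors)
            if b is not None:
                samples[b].append(depth[y][x] - center_depth)
    descriptor = []
    for vals in samples:
        if vals:
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
        else:
            mean, var = 0.0, 0.0  # empty bin: neutral values (assumption)
        descriptor.extend([mean, math.sqrt(var)])
    return descriptor
```

Because the ring radii double while the sector count stays fixed, bins next to the center cover less area than bins at the periphery, which is the property claim 2 recites.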

4. The method according to claim 1, wherein the descriptor values are indicative of a distribution of at least one type of depth feature in each bin, selected from the group of depth features consisting of depth edges and depth ridges.

5. The method according to claim 1, wherein matching the extracted descriptors comprises finding a respective approximate nearest neighbor of each of the matched extracted descriptors among the stored descriptors in the database.
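
Illustrative sketch (not part of the claims): claim 5 recites approximate nearest-neighbor matching but names no particular algorithm. The Python below uses random-hyperplane hashing, a common locality-sensitive hashing scheme, purely as a stand-in. The class name `HyperplaneIndex`, the number of hyperplanes, and the fallback to a full scan when a bucket is empty are assumptions for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class HyperplaneIndex:
    """Toy locality-sensitive hash index that searches only the query's bucket."""

    def __init__(self, descriptors, n_planes=4, seed=0):
        rng = random.Random(seed)
        dim = len(descriptors[0])
        self.descriptors = descriptors
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.buckets = {}
        for i, d in enumerate(descriptors):
            self.buckets.setdefault(self._key(d), []).append(i)

    def _key(self, d):
        # One sign bit per hyperplane; nearby vectors tend to share a key.
        return tuple(sum(p * x for p, x in zip(plane, d)) >= 0.0
                     for plane in self.planes)

    def query(self, d):
        """Return the index of an approximate nearest stored descriptor."""
        candidates = self.buckets.get(self._key(d))
        if not candidates:
            candidates = range(len(self.descriptors))  # fall back to a full scan
        return min(candidates, key=lambda i: sq_dist(self.descriptors[i], d))
```

The result is approximate because the true nearest neighbor can fall in a different bucket; the trade is a possibly worse match for a much smaller search set.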

6. The method according to claim 1, wherein the descriptors in the database are associated with corresponding pointers to respective locations of body joints, and wherein estimating the pose comprises applying the pointers to the respective positions of the patches from which the matching descriptors were extracted in order to estimate the locations of the joints of the humanoid form.

7. The method according to claim 6, and comprising creating the database by processing a set of training maps in which ground-truth locations of the body joints are indicated in order to find the corresponding pointers.
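
Illustrative sketch (not part of the claims): claims 6 and 7 describe stored descriptors that carry pointers to joint locations, learned from training maps with ground-truth joints. One simple reading, assumed here, is that each pointer is a 3D offset from the patch position to the joint. The functions `build_database` and `apply_pointers` and the data layout are hypothetical.

```python
def build_database(training_samples):
    """Build (descriptor, offsets) entries from labeled training maps.

    training_samples: list of (patches, joints), where patches is a list of
    (position, descriptor) and joints maps a joint name to its ground-truth
    location. Positions and locations are (x, y, z) tuples.
    """
    database = []
    for patches, joints in training_samples:
        for position, descriptor in patches:
            # Pointer = displacement from the patch position to each joint.
            offsets = {
                name: tuple(j - p for j, p in zip(location, position))
                for name, location in joints.items()
            }
            database.append((descriptor, offsets))
    return database

def apply_pointers(position, offsets):
    """Estimate joint locations by adding stored offsets to a new patch position."""
    return {
        name: tuple(p + o for p, o in zip(position, offset))
        for name, offset in offsets.items()
    }
```

At run time, each patch whose descriptor matches a database entry would contribute one such estimate per joint, which a later step can combine.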

8. The method according to claim 1, wherein receiving the depth map comprises receiving a sequence of depth maps, and wherein estimating the pose comprises tracking movement of the humanoid form over multiple frames in the sequence.

9. The method according to claim 8, and comprising controlling a computer application responsively to the tracked movement.

10. A method for processing data, comprising: receiving a depth map of a scene containing a humanoid form, the depth map comprising a matrix of pixels having respective pixel depth values; extracting from the depth map respective descriptors based on the depth values in a plurality of patches distributed in respective positions over the humanoid form, wherein extracting the respective descriptors comprises dividing each patch into an array of spatial bins, and computing a vector of descriptor values corresponding to the pixel depth values in each of the spatial bins, wherein the descriptor values are indicative of a distribution of at least one type of depth feature in each bin, selected from the group of depth features consisting of depth edges and depth ridges, and wherein the distribution is indicative of at least one characteristic of the depth features, selected from the group of characteristics consisting of a spatial distribution of the depth features and a directional distribution of the depth features; matching the extracted descriptors to previously-stored descriptors in a database; and estimating a pose of the humanoid form based on stored information associated with the matched descriptors.
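
Illustrative sketch (not part of the claims): claim 10 refers to a directional distribution of depth edges within a bin. The Python below shows one way such a distribution might be computed for a single region, assuming that an edge pixel is one whose central-difference depth gradient exceeds a threshold and that direction is quantized into equal angular intervals. The threshold, the number of directions, and the function name are assumptions; the claims do not define how edges are detected.

```python
import math

def edge_direction_histogram(depth, threshold=50.0, n_directions=8):
    """Return a normalized histogram of gradient directions at depth-edge pixels.

    depth is a list of rows covering one region (for example, one spatial bin).
    Border pixels are skipped because central differences need both neighbors.
    """
    hist = [0] * n_directions
    for y in range(1, len(depth) - 1):
        for x in range(1, len(depth[0]) - 1):
            gx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            gy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            if math.hypot(gx, gy) < threshold:
                continue  # not a depth edge under the assumed threshold
            angle = math.atan2(gy, gx) % (2 * math.pi)
            k = min(int(angle / (2 * math.pi) * n_directions), n_directions - 1)
            hist[k] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_directions
```

Concatenating one such histogram per spatial bin would give a descriptor vector that reflects both where the edges lie and which way they face.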

11. A method for processing data, comprising: receiving a depth map of a scene containing a humanoid form, the depth map comprising a matrix of pixels having respective pixel depth values; extracting from the depth map respective descriptors based on the depth values in a plurality of patches distributed in respective positions over the humanoid form; matching the extracted descriptors to previously-stored descriptors in a database, wherein the descriptors in the database are associated with corresponding pointers to respective locations of body joints; and estimating a pose of the humanoid form based on stored information associated with the matched descriptors, wherein estimating the pose comprises applying the pointers to the respective positions of the patches from which the matching descriptors were extracted in order to estimate the locations of the joints of the humanoid form, and wherein estimating the pose comprises associating respective weights with the estimated locations of the joints provided by the extracted descriptors, and applying a weighted voting process using the weights to find the locations of the joints.

12. The method according to claim 11, wherein associating the respective weights comprises computing the weights based on at least one weighting term that is selected from a group of weighting terms consisting of: a similarity term, based on a descriptor distance between the matched descriptors; a patch distance term, based on a Euclidean distance between a patch position and a joint location; a joint distance term, based on a joint distance between a given joint location and a parent joint location that has already been estimated; a predictive term, based on a previous joint location derived from a preceding depth map; a variance term, based on a variance of the joint location determined in creating the database; and a bone length term, based on distance between a current estimated bone length and an expected bone length derived from the locations of the joints.

13. The method according to claim 11, wherein associating the respective weights comprises assessing a reliability of the patches providing the estimated locations, and assigning reliability values to the estimated locations based on the assessed reliability.
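
Illustrative sketch (not part of the claims): claims 11 to 13 describe weighting each patch's joint estimate and combining the estimates by weighted voting. The Python below assumes the simplest form of voting, a weighted mean, and shows two of the listed weighting terms (similarity and patch distance) as decaying exponentials. The exponential form, the scale constants, and the function names are assumptions; the claims name the terms but give no formulas.

```python
import math

def similarity_weight(descriptor_distance, scale=1.0):
    """Similarity term: closer descriptor matches get weights nearer to 1."""
    return math.exp(-descriptor_distance / scale)

def patch_distance_weight(patch_position, joint_location, scale=100.0):
    """Patch distance term: patches far from the joint they vote for count less."""
    return math.exp(-math.dist(patch_position, joint_location) / scale)

def weighted_vote(votes):
    """Combine (location, weight) votes into one weighted-mean location."""
    total = sum(w for _, w in votes)
    if total <= 0.0:
        raise ValueError("no votes with positive weight")
    dim = len(votes[0][0])
    return tuple(sum(loc[i] * w for loc, w in votes) / total for i in range(dim))
```

Several terms could be combined by multiplying them into a single weight per vote, for example `similarity_weight(d) * patch_distance_weight(p, j)`, although the claims only require at least one term.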

14. A method for processing data, comprising: receiving a depth map of a scene containing a humanoid form, the depth map comprising a matrix of pixels having respective pixel depth values; extracting from the depth map respective descriptors based on the depth values in a plurality of patches distributed in respective positions over the humanoid form; matching the extracted descriptors to previously-stored descriptors in a database; estimating a pose of the humanoid form based on stored information associated with the matched descriptors; normalizing a depth of the depth map by finding a representative depth coordinate of the humanoid form in the depth map; projecting a point cloud derived from the depth map responsively to the representative depth coordinate; and applying the normalized depth in matching the descriptors and estimating the pose.
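
Illustrative sketch (not part of the claims): claim 14 recites finding a representative depth for the humanoid form and projecting a point cloud accordingly. The Python below is one possible interpretation, assuming a pinhole camera, the median body depth as the representative coordinate, and a shift of the point cloud along the optical axis to a fixed reference depth before re-projection. The focal length, reference depth, body mask input, and function name are all assumptions for illustration.

```python
import statistics

def normalize_depth(depth, mask, focal=500.0, reference_z=2000.0):
    """Shift the body to a reference depth and re-project it to pixel coordinates.

    depth: list of rows of depth values; mask: same shape, True on body pixels.
    Returns (representative_depth, [(u, v, z), ...]) for the body pixels.
    Assumes at least one body pixel and that shifted depths stay positive.
    """
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # principal point at the image center
    body = [depth[y][x] for y in range(h) for x in range(w) if mask[y][x]]
    rep_z = statistics.median(body)
    shift = reference_z - rep_z
    projected = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            z = depth[y][x]
            # Back-project the pixel to a 3D point with the pinhole model.
            px = (x - cx) * z / focal
            py = (y - cy) * z / focal
            z_new = z + shift
            # Re-project the shifted point into the image plane.
            projected.append((px * focal / z_new + cx, py * focal / z_new + cy, z_new))
    return rep_z, projected
```

Under these assumptions a body seen at 1 m and one seen at 3 m end up with similar apparent size, so descriptors extracted afterwards are more directly comparable to the stored ones.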

15. A method for processing data, comprising: receiving a depth map of a scene containing a humanoid form, the depth map comprising a matrix of pixels having respective pixel depth values; extracting from the depth map respective descriptors based on the depth values in a plurality of patches distributed in respective positions over the humanoid form; matching the extracted descriptors to previously-stored descriptors in a database; estimating a pose of the humanoid form based on stored information associated with the matched descriptors, wherein estimating the pose comprises finding respective locations of joints of the humanoid form; and calibrating a scale of the humanoid form by finding a distance between the locations of the joints and scaling the depth map responsively to the distance, and applying the calibrated scale in matching the descriptors and estimating the pose.
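
Illustrative sketch (not part of the claims): claim 15 describes calibrating scale from a distance between joint locations. The Python below assumes the scale factor is the ratio of a reference length (for example, the corresponding distance in the training data) to the measured joint distance, and that scaling is applied to 3D points about a chosen origin. Both helper functions and the notion of a single reference length are hypothetical.

```python
import math

def scale_factor(joint_a, joint_b, reference_length):
    """Ratio that maps the measured joint-to-joint distance onto reference_length."""
    measured = math.dist(joint_a, joint_b)
    if measured == 0.0:
        raise ValueError("joints coincide; cannot calibrate scale")
    return reference_length / measured

def rescale_points(points, origin, factor):
    """Scale 3D points about origin by factor."""
    return [tuple(o + (p - o) * factor for p, o in zip(pt, origin))
            for pt in points]
```

For example, if the estimated shoulder-to-shoulder distance is half the reference value, `scale_factor` returns 2.0 and `rescale_points` doubles every point's displacement from the origin.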

16. Mapping apparatus, comprising: an imaging assembly, which is configured to provide a depth map of a scene containing a humanoid form, the depth map comprising a matrix of pixels having respective pixel depth values; and a processor, which is configured to extract from the depth map respective descriptors based on the depth values in a plurality of patches distributed in respective positions over the humanoid form, to match the extracted descriptors to previously-stored descriptors in a database, and to estimate a pose of the humanoid form based on stored information associated with the matched descriptors, wherein the processor is configured to divide each patch into an array of spatial bins, and to compute the descriptor as a vector of descriptor values corresponding to the pixel depth values in each of the spatial bins, and wherein the descriptor values are indicative of a statistical distribution of the depth values in each bin.

17. The apparatus according to claim 16, wherein each patch has a center point, and wherein the spatial bins that are adjacent to the center point have smaller respective areas than the spatial bins at a periphery of the patch.

18. The apparatus according to claim 16, wherein each patch has a center point, and wherein the spatial bins are arranged radially around the center point.

19. The apparatus according to claim 16, wherein the descriptor values are indicative of a distribution of at least one type of depth feature in each bin, selected from the group of depth features consisting of depth edges and depth ridges.

20. The apparatus according to claim 19, wherein the distribution is indicative of at least one characteristic of the depth features, selected from the group of characteristics consisting of a spatial distribution of the depth features and a directional distribution of the depth features.

21. The apparatus according to claim 16, wherein the processor is configured to match the extracted descriptors by finding a respective approximate nearest neighbor of each of the matched extracted descriptors among the stored descriptors in the database.

22. The apparatus according to claim 16, wherein the descriptors in the database are associated with corresponding pointers to respective locations of body joints, and wherein the processor is configured to estimate the pose by applying the pointers to the respective positions of the patches from which the matching descriptors were extracted in order to estimate the locations of the joints of the humanoid form.

23. The apparatus according to claim 22, wherein the database is created by processing a set of training maps in which ground-truth locations of the body joints are indicated in order to find the corresponding pointers.

24. The apparatus according to claim 22, wherein the processor is configured to associate respective weights with the estimated locations of the joints provided by the extracted descriptors, and to apply a weighted voting process using the weights to find the locations of the joints.

25. The apparatus according to claim 24, wherein the weights comprise at least one weighting term that is selected from a group of weighting terms consisting of: a similarity term, based on a descriptor distance between the matched descriptors; a patch distance term, based on a Euclidean distance between a patch position and a joint location; a joint distance term, based on a joint distance between a given joint location and a parent joint location that has already been estimated; a predictive term, based on a previous joint location derived from a preceding depth map; a variance term, based on a variance of the joint location determined in creating the database; and a bone length term, based on distance between a current estimated bone length and an expected bone length derived from the locations of the joints.

26. The apparatus according to claim 24, wherein the processor is configured to assess a reliability of the patches providing the estimated locations, and to assign reliability values to the estimated locations based on the assessed reliability.

27. The apparatus according to claim 16, wherein the processor is configured to normalize a depth of the depth map by finding a representative depth coordinate of the humanoid form in the depth map and projecting a point cloud derived from the depth map responsively to the representative depth coordinate, and to apply the normalized depth in matching the descriptors and estimating the pose.

28. The apparatus according to claim 16, wherein the processor is configured to find respective locations of joints of the humanoid form, and to calibrate a scale of the humanoid form by finding a distance between the locations of the joints and scaling the depth map responsively to the distance, and to apply the calibrated scale in matching the descriptors and estimating the pose.

29. The apparatus according to claim 16, wherein the imaging assembly is configured to provide a sequence of depth maps, and wherein the processor is configured to track movement of the humanoid form over multiple frames in the sequence.

30. The apparatus according to claim 29, wherein the processor is configured to control a computer application responsively to the tracked movement.

31. A computer software product, comprising a non-transitory computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive a depth map of a scene containing a humanoid form, the depth map comprising a matrix of pixels having respective pixel depth values, to extract from the depth map respective descriptors based on the depth values in a plurality of patches distributed in respective positions over the humanoid form, to match the extracted descriptors to previously-stored descriptors in a database, and to estimate a pose of the humanoid form based on stored information associated with the matched descriptors, wherein the instructions cause the computer to divide each patch into an array of spatial bins, and to compute the descriptor as a vector of descriptor values corresponding to the pixel depth values in each of the spatial bins, and wherein the descriptor values are indicative of a statistical distribution of the depth values in each bin.

32. The product according to claim 31, wherein each patch has a center point, and wherein the spatial bins that are adjacent to the center point have smaller respective areas than the spatial bins at a periphery of the patch.

33. The product according to claim 31, wherein each patch has a center point, and wherein the spatial bins are arranged radially around the center point.

34. The product according to claim 31, wherein the descriptor values are indicative of a distribution of at least one type of depth feature in each bin, selected from the group of depth features consisting of depth edges and depth ridges.

35. The product according to claim 34, wherein the distribution is indicative of at least one characteristic of the depth features, selected from the group of characteristics consisting of a spatial distribution of the depth features and a directional distribution of the depth features.

36. The product according to claim 31, wherein the instructions cause the computer to match the extracted descriptors by finding a respective approximate nearest neighbor of each of the matched extracted descriptors among the stored descriptors in the database.

37. The product according to claim 31, wherein the descriptors in the database are associated with corresponding pointers to respective locations of body joints, and wherein the instructions cause the computer to estimate the pose by applying the pointers to the respective positions of the patches from which the matching descriptors were extracted in order to estimate the locations of the joints of the humanoid form.

38. The product according to claim 37, wherein the database is created by processing a set of training maps in which ground-truth locations of the body joints are indicated in order to find the corresponding pointers.

39. The product according to claim 37, wherein the instructions cause the computer to associate respective weights with the estimated locations of the joints provided by the extracted descriptors, and to apply a weighted voting process using the weights to find the locations of the joints.

40. The product according to claim 39, wherein the weights comprise at least one weighting term that is selected from a group of weighting terms consisting of: a similarity term, based on a descriptor distance between the matched descriptors; a patch distance term, based on a Euclidean distance between a patch position and a joint location; a joint distance term, based on a joint distance between a given joint location and a parent joint location that has already been estimated; a predictive term, based on a previous joint location derived from a preceding depth map; a variance term, based on a variance of the joint location determined in creating the database; and a bone length term, based on distance between a current estimated bone length and an expected bone length derived from the locations of the joints.

41. The product according to claim 39, wherein the instructions cause the computer to assess a reliability of the patches providing the estimated locations, and to assign reliability weights to the estimated locations based on the assessed reliability.

42. The product according to claim 31, wherein the instructions cause the computer to normalize a depth of the depth map by finding a representative depth coordinate of the humanoid form in the depth map and projecting a point cloud derived from the depth map responsively to the representative depth coordinate, and to apply the normalized depth in matching the descriptors and estimating the pose.

43. The product according to claim 31, wherein the instructions cause the computer to find respective locations of joints of the humanoid form, and to calibrate a scale of the humanoid form by finding a distance between the locations of the joints and scaling the depth map responsively to the distance, and to apply the calibrated scale in matching the descriptors and estimating the pose.

44. The product according to claim 31, wherein the instructions cause the computer to receive a sequence of depth maps, and to track movement of the humanoid form over multiple frames in the sequence.

45. The product according to claim 44, wherein the instructions cause the computer to control a computer application responsively to the tracked movement.

Patent Metadata

Filing Date

Unknown

Publication Date

November 12, 2013

Inventors

Shai Litvak
