Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for detecting three-dimensional (3D) objects in a scene, comprising: accessing a set of two-dimensional (2D) images of a scene captured by one or more cameras; detecting, by a 2D object detection process executing on one or more computing devices, an object in the set of 2D images as 2D bounding boxes; matching, by a 3D object detection process executing on the one or more computing devices, ones of the 2D bounding boxes that correspond to the object by determining an object cone for each of the 2D bounding boxes and building a cluster of object cones based at least in part on a score determined from a number of other object cones that contain a common center point; selecting at least a subset of the object cones of the cluster; reconstructing, by the 3D object detection process, a 3D bounding box that represents the object from an intersection of at least the subset of object cones of the cluster; and displaying, on a display screen or storing to a storage device, the 3D bounding box.
2. The method of claim 1, wherein the common center point is a center point for a triplet of intersecting object cones, and the building the cluster comprises: selecting the triplet of intersecting object cones; determining the center point for the triplet; determining one or more other object cones that contain the center point; and adding the one or more other object cones to the triplet to produce the cluster.
3. The method of claim 2, wherein the score is a score for the triplet and building the cluster comprises: determining the score for the triplet based on the number of other object cones that contain the center point for the triplet; and retaining the triplet based on the score for the triplet.
4. The method of claim 1, wherein the reconstructing the 3D bounding box comprises: generating a 3D shape for the cluster based on the intersection of the subset of object cones; and providing the 3D shape for the cluster as the 3D bounding box.
5. The method of claim 4, wherein the subset of object cones are selected as object cones that maximize similarity between a projection of the 3D bounding box onto one or more 2D images of the scene and 2D bounding boxes in the one or more 2D images.
6. The method of claim 4, wherein the reconstructing the 3D bounding box further comprises: merging the 3D shape for the cluster with a 3D shape of one or more other clusters determined to represent the object.
7. The method of claim 4, wherein reconstructing the 3D bounding box further comprises: detecting visibility issues caused by the 3D shape for the cluster occluding a 3D shape of one or more other clusters determined to represent other objects; and in response to the visibility issues, redistributing object cones from the cluster to the one or more other clusters and repeating generating the 3D shape.
8. The method of claim 4, wherein reconstructing the 3D bounding box further comprises: removing one or more other clusters that do not include at least a threshold number of object cones and repeating generating the 3D shape.
9. The method of claim 1, further comprising: refining the 3D bounding box based on structure-from-motion (SfM) photogrammetry performed using the 2D images of the scene.
10. The method of claim 9, wherein the refining further comprises: applying SfM photogrammetry to the 2D images of the scene to determine sparse 3D points that represent the object; determining a SfM-based 3D bounding box that surrounds the 3D points; and reducing a size of the 3D bounding box based on the SfM-based 3D bounding box.
11. The method of claim 1, further comprising: assigning each of the 2D bounding boxes a label, wherein the matching matches 2D bounding boxes having a same label.
12. The method of claim 11, further comprising: assigning the 3D bounding box the label of the matched 2D bounding boxes.
13. A computing device comprising: a processor; and a memory coupled to the processor and configured to store a set of two-dimensional (2D) images of a scene and code executable on the processor for a plurality of software processes, the plurality of software processes including: a 2D object detection process configured to detect an object in the set of 2D images as 2D bounding boxes, and a three-dimensional (3D) object detection process configured to match ones of the 2D bounding boxes that correspond to the object by determining an object cone for each of the 2D bounding boxes and building a cluster of object cones based at least in part on a score determined from a number of other object cones that contain a common center point, select at least a subset of the object cones of the cluster, reconstruct a 3D bounding box that represents the object from an intersection of at least the subset of object cones of the cluster, and output the 3D bounding box.
14. A non-transitory electronic-device readable medium having instructions stored thereon, the instructions when executed by one or more processors of one or more electronic devices operable to: access a set of two-dimensional (2D) images of a scene; detect an object in 2D in the set of 2D images; determine an object cone for each 2D image of the 2D images; build a cluster of object cones for the object by selecting a subset of intersecting object cones, determining a common center point of the subset, determining a score based on whether one or more other object cones contain the common center point, and retaining the cluster based on the score; select at least a subset of the object cones of the cluster; generate a three-dimensional (3D) shape for the cluster based on an intersection of at least the subset of object cones; and output the 3D shape as the object in 3D.
15. The non-transitory electronic-device readable medium of claim 14, wherein the subset is a triplet of intersecting object cones, the common center point is a center point for the triplet of intersecting object cones, the score is a score for the triplet, and the instructions when executed are further operable to: select the triplet of intersecting object cones; determine the center point for the triplet; determine the score for the triplet based on whether one or more other object cones contain the center point; retain the triplet based on the score for the triplet; and add the one or more other object cones to the triplet to produce the cluster.
16. The non-transitory electronic-device readable medium of claim 14, wherein the instructions when executed are further operable to: merge the 3D shape for the cluster with a 3D shape of one or more other clusters determined to represent the object; detect visibility issues caused by the 3D shape for the cluster occluding a 3D shape of one or more other clusters determined to represent the other objects; and in response to the visibility issues, redistribute object cones from the cluster to the one or more other clusters.
17. The non-transitory electronic-device readable medium of claim 14, wherein the instructions when executed are further operable to: refine the 3D bounding box based on a structure-from-motion (SfM) photogrammetry performed using the 2D images of the scene.
18. A method for detecting three-dimensional (3D) objects in a scene, comprising: accessing a set of two-dimensional (2D) images of the scene; detecting, by a 2D object detection process executing on one or more computing devices, an object in 2D in the set of 2D images; determining an object cone for each 2D image of the 2D images; building a cluster of object cones for the object by selecting a subset of intersecting object cones, determining a common center point of the subset, determining a score based on whether one or more other object cones contain the common center point, and retaining the cluster based on the score; selecting at least a subset of the object cones of the cluster; generating a 3D shape for the cluster based on an intersection of at least the subset of object cones; and outputting the 3D shape as the object in 3D.
19. The method of claim 18, wherein the subset is a triplet of intersecting object cones, the common center point is a center point for the triplet of intersecting object cones, the score is a score for the triplet, and the building the cluster further comprises: selecting the triplet of intersecting object cones; determining the center point for the triplet; determining the score for the triplet based on whether one or more other object cones contain the center point; retaining the triplet based on the score for the triplet; and adding the one or more other object cones to the triplet to produce the cluster.
20. The method of claim 18, further comprising: merging the 3D shape for the cluster with a 3D shape of one or more other clusters determined to represent the object; detecting visibility issues caused by the 3D shape for the cluster occluding a 3D shape of one or more other clusters determined to represent the other objects; and in response to the visibility issues, redistributing object cones from the cluster to the one or more other clusters.
21. The method of claim 18, further comprising: refining the 3D bounding box based on a structure-from-motion (SfM) photogrammetry performed using the 2D images of the scene.
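Illustrative sketch (not part of the claims). The cluster-building steps recited in claims 2, 3, 15 and 19 (select a triplet of intersecting object cones, determine its center point, score the triplet by how many other object cones contain that point, retain the triplet based on the score, and add those other cones to form the cluster) can be pictured with a short Python example. Everything below is an assumption made for illustration only: the claims do not specify how a cone is represented or how the center point is computed. Here each object cone is approximated as a circular cone (`apex`, unit `axis`, `half_angle`), the center point is taken as the mean of the pairwise closest-approach midpoints of the cone axes, and "retaining based on the score" is read as keeping the single highest-scoring triplet. The names `Cone`, `triplet_center` and `build_cluster` are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec: return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def add(a: Vec, b: Vec) -> Vec: return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def scale(a: Vec, s: float) -> Vec: return (a[0] * s, a[1] * s, a[2] * s)
def dot(a: Vec, b: Vec) -> float: return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

@dataclass(frozen=True)
class Cone:
    """Assumed stand-in for an object cone: circular cone from a camera center."""
    apex: Vec          # camera center
    axis: Vec          # unit vector through the 2D bounding-box center
    half_angle: float  # radians

    def contains(self, p: Vec) -> bool:
        v = sub(p, self.apex)
        d = math.sqrt(dot(v, v))
        if d == 0.0:
            return True
        return dot(v, self.axis) / d >= math.cos(self.half_angle)

def axis_midpoint(c1: Cone, c2: Cone) -> Optional[Vec]:
    """Midpoint of the shortest segment between two axis lines (None if parallel)."""
    w = sub(c1.apex, c2.apex)
    b, d, e = dot(c1.axis, c2.axis), dot(c1.axis, w), dot(c2.axis, w)
    denom = 1.0 - b * b
    if denom < 1e-9:
        return None
    p1 = add(c1.apex, scale(c1.axis, (b * e - d) / denom))
    p2 = add(c2.apex, scale(c2.axis, (e - b * d) / denom))
    return scale(add(p1, p2), 0.5)

def triplet_center(triplet: List[Cone]) -> Optional[Vec]:
    """Assumed center point: mean of pairwise midpoints, if inside all three cones."""
    mids = [axis_midpoint(a, b) for a, b in combinations(triplet, 2)]
    if any(m is None for m in mids):
        return None
    center = scale(add(add(mids[0], mids[1]), mids[2]), 1.0 / 3.0)
    return center if all(c.contains(center) for c in triplet) else None

def build_cluster(cones: List[Cone]):
    """Return (cone indices, center point, score) for the best-scoring triplet."""
    best = None
    for idx in combinations(range(len(cones)), 3):
        center = triplet_center([cones[i] for i in idx])
        if center is None:
            continue  # the three cones do not share a usable center point
        others = [i for i in range(len(cones))
                  if i not in idx and cones[i].contains(center)]
        if best is None or len(others) > best[2]:
            best = (sorted(idx + tuple(others)), center, len(others))
    return best

# Four cameras looking at the origin plus one unrelated cone (index 4).
cones = [
    Cone((-10.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.1),
    Cone((0.0, -10.0, 0.0), (0.0, 1.0, 0.0), 0.1),
    Cone((0.0, 0.0, -10.0), (0.0, 0.0, 1.0), 0.1),
    Cone((10.0, 0.0, 0.0), (-1.0, 0.0, 0.0), 0.1),
    Cone((50.0, 50.0, 0.0), (1.0, 0.0, 0.0), 0.1),
]
cluster, center, score = build_cluster(cones)
print(cluster, center, score)
```

In this toy scene the four cones aimed at the origin end up in one cluster and the unrelated cone is left out. A real implementation would use the pyramid defined by each 2D bounding box and the camera parameters rather than a circular cone, and would typically retain several triplets rather than only the best one.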
March 22, 2022