Three Dimensional Bounding Box Estimation from Two Dimensional Images

Published: August 4, 2020

Assignee: not available in the USPTO data we have

Inventors: Arsalan Mousavian, John Patrick Flynn, Dragomir Dimitrov Anguelov

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system comprising: one or more processors; and a non-transitory computer readable medium comprising instructions that, when executed by the one or more processors, cause the system to perform operations comprising: receiving sensor data; determining an object in an environment represented in the sensor data; inputting at least a portion of the sensor data into a machine learning algorithm; receiving, based at least in part on the portion of the sensor data and from the machine learning algorithm, output associated with a physical parameter of the object, wherein the machine learning algorithm comprises: a coarse output branch configured to output a coarse output; and a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch; and wherein the output comprises a sum of the offset and a highest confidence value of a set of confidence values associated with the coarse output.

2. The system of claim 1, wherein: a confidence value of the set of confidence values is associated with a potential physical parameter associated with the object.

3. The system of claim 1, the operations further comprising determining, based at least in part on the sensor data, a two dimensional bounding box associated with the object, wherein: the sensor data comprises image data, the inputting is based at least in part on the two dimensional bounding box; the output associated with the physical parameter of the object comprises: an orientation of a three dimensional bounding box associated with the object; and dimensions of the three dimensional bounding box; and the coarse output represents a coarse orientation of the three dimensional bounding box; and the offset represents an orientation offset with respect to the coarse orientation of the three dimensional bounding box.

4. The system of claim 3, wherein: the orientation of the three dimensional bounding box is based at least in part on the coarse orientation and the orientation offset, the orientation represented as an angle between: a first ray originating from a center of a sensor associated with the sensor data and passing through a center of the two dimensional bounding box, and a second ray aligned with a direction of the object.

5. The system of claim 3, the operations further comprising estimating a position of the three dimensional bounding box by associating the three dimensional bounding box with the sensor data.

6. The system of claim 5, wherein: estimating the position of the three dimensional bounding box in the environment comprises minimizing a difference between an association of the three dimensional bounding box with the image data and the two dimensional bounding box.

7. The system of claim 3, wherein the machine learning algorithm is a convolution neural network trained based at least in part on training data comprising a training two dimensional bounding box and an associated ground truth three dimensional bounding box.

8. The system of claim 7, wherein: the training data is based at least in part on a transformation to a training image; and the transformation comprises at least one of: mirroring the training image; adding noise to the training image; resizing the training image; or resizing the training two dimensional bounding box.

9. A method comprising: receiving sensor data; determining an object in an environment represented in the sensor data; inputting at least a portion of the sensor data into a machine learning algorithm; receiving, based at least in part on the portion of the sensor data and from the machine learning algorithm, output associated with a physical parameter of the object, wherein the machine learning algorithm comprises: a coarse output branch configured to output a coarse output; and a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch; and wherein the output comprises a sum of the offset and a highest confidence value of a set of confidence values associated with the coarse output.

10. The method of claim 9, wherein: a confidence value of the set of confidence values is associated with a potential physical parameter associated with the object.

11. The method of claim 9, further comprising: determining, based at least in part on the sensor data, a two dimensional bounding box associated with the object, wherein: the sensor data comprises image data, the inputting is based at least in part on the two dimensional bounding box; the output associated with the physical parameter of the object comprises: an orientation of a three dimensional bounding box associated with the object; and dimensions of the three dimensional bounding box; and the coarse output represents a coarse orientation of the three dimensional bounding box; and the offset represents an orientation offset with respect to the coarse orientation of the three dimensional bounding box.

12. The method of claim 11, wherein the orientation of the three dimensional bounding box is based at least in part on the coarse orientation and the orientation offset, the orientation represented as an angle between: a first ray originating from a center of a sensor associated with the sensor data and passing through a center of the two dimensional bounding box, and a second ray aligned with a direction of the object.

13. The method of claim 11, further comprising: estimating a position of the three dimensional bounding box by associating the three dimensional bounding box with the sensor data.

14. The method of claim 13, wherein estimating the position of the three dimensional bounding box in the environment comprises minimizing a difference between an association of the three dimensional bounding box with the image data and the two dimensional bounding box.

15. A non-transitory computer readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving sensor data; determining an object in an environment represented in the sensor data; inputting at least a portion of the sensor data into a machine learning algorithm; receiving, based at least in part on the portion of the sensor data and from the machine learning algorithm, output associated with a physical parameter of the object, wherein the machine learning algorithm comprises: a coarse output branch configured to output a coarse output; and a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch; and wherein the output comprises a sum of the offset and a highest confidence value of a set of confidence values associated with the coarse output.

16. The non-transitory computer readable medium of claim 15, wherein: a confidence value of the set of confidence values is associated with a potential physical parameter associated with the object.

17. The non-transitory computer readable medium of claim 15, the operations further comprising: determining, based at least in part on the sensor data, a two dimensional bounding box associated with the object, wherein: the sensor data comprises image data, the inputting is based at least in part on the two dimensional bounding box; the output associated with the physical parameter of the object comprises: an orientation of a three dimensional bounding box associated with the object; and dimensions of the three dimensional bounding box; and the coarse output represents a coarse orientation of the three dimensional bounding box; and the offset represents an orientation offset with respect to the coarse orientation of the three dimensional bounding box.

18. The non-transitory computer readable medium of claim 17, wherein the orientation of the three dimensional bounding box is based at least in part on the coarse orientation and the orientation offset, the orientation represented as an angle between: a first ray originating from a center of a sensor associated with the sensor data and passing through a center of the two dimensional bounding box, and a second ray aligned with a direction of the object.

19. The non-transitory computer readable medium of claim 17, the operations further comprising: estimating a position of the three dimensional bounding box by associating the three dimensional bounding box with the sensor data.

20. The non-transitory computer readable medium of claim 19, wherein estimating the position of the three dimensional bounding box in the environment comprises minimizing a difference between an association of the three dimensional bounding box with the image data and the two dimensional bounding box.
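
Illustrative sketch (not part of the patent text). The claims above describe a coarse output branch plus a fine offset branch, an orientation measured relative to the ray through the center of the two dimensional bounding box, and a position estimate obtained by minimizing a difference between the projected three dimensional box and the two dimensional box. The Python below is one possible reading of those steps, not the inventors' implementation. Everything in it is an assumption made for illustration: the function names, evenly spaced orientation bins over a full circle, a pinhole camera with x right, y down, and z forward, a (height, width, length) ordering for dimensions, and the sum of absolute side differences used as the "difference" in claims 6, 14, and 20.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def decode_orientation(bin_confidences, bin_offsets):
    """Combine the coarse and fine branch outputs into one angle.

    Assumption: bin i has its center at i * 2*pi / n, and bin_offsets[i]
    is the fine offset (radians) relative to that center.
    """
    n = len(bin_confidences)
    if n == 0 or n != len(bin_offsets):
        raise ValueError("need one offset per confidence value")
    # Coarse step: pick the bin with the highest confidence value.
    best = max(range(n), key=lambda i: bin_confidences[i])
    coarse = best * (2.0 * math.pi / n)
    # Fine step: add the offset predicted for that bin.
    return wrap_angle(coarse + bin_offsets[best])


def ray_angle(box_center_u, principal_point_u, focal_length_px):
    """Horizontal angle of the ray from the camera center through the
    center of the 2D box (assumed pinhole model, sign convention assumed)."""
    return math.atan2(box_center_u - principal_point_u, focal_length_px)


def project_box(center, dims, yaw, fx, fy, cx, cy):
    """Project the 8 corners of a 3D box and return the enclosing
    2D rectangle as (u_min, v_min, u_max, v_max)."""
    tx, ty, tz = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    us, vs = [], []
    for dx in (-l / 2.0, l / 2.0):
        for dy in (-h / 2.0, h / 2.0):
            for dz in (-w / 2.0, w / 2.0):
                # Rotate about the vertical (y) axis, then translate.
                x = tx + c * dx + s * dz
                y = ty + dy
                z = tz - s * dx + c * dz
                if z <= 0.0:
                    raise ValueError("box corner is behind the camera")
                us.append(fx * x / z + cx)
                vs.append(fy * y / z + cy)
    return (min(us), min(vs), max(us), max(vs))


def reprojection_cost(center, dims, yaw, intrinsics, box_2d):
    """Difference between the projected 3D box and the detected 2D box,
    here the sum of absolute differences of the four sides.
    intrinsics is (fx, fy, cx, cy)."""
    projected = project_box(center, dims, yaw, *intrinsics)
    return sum(abs(p - q) for p, q in zip(projected, box_2d))
```

Under these assumptions, a position estimate would come from searching over candidate values of center (for example with a generic optimizer or a grid search) for the one that gives the smallest reprojection_cost, while dims and yaw stay fixed at the network's outputs.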

Patent Metadata

Filing Date: Unknown

Publication Date: August 4, 2020

Inventors: Arsalan Mousavian, John Patrick Flynn, Dragomir Dimitrov Anguelov
