10733441

Three Dimensional Bounding Box Estimation from Two Dimensional Images

Published: August 4, 2020
Assignee: not available in USPTO data we have

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A system comprising: one or more processors; and a non-transitory computer readable medium comprising instructions that, when executed by the one or more processors, cause the system to perform operations comprising: receiving sensor data; determining an object in an environment represented in the sensor data; inputting at least a portion of the sensor data into a machine learning algorithm; receiving, based at least in part on the portion of the sensor data and from the machine learning algorithm, output associated with a physical parameter of the object, wherein the machine learning algorithm comprises: a coarse output branch configured to output a coarse output; and a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch; and wherein the output comprises a sum of the offset and a highest confidence value of a set of confidence values associated with the coarse output.

Plain English Translation

A system processes sensor data to identify an object and estimate its physical parameters. It inputs a portion of this data into a machine learning algorithm. This algorithm has two branches: a coarse output branch that produces an initial estimate represented by a set of confidence values, and a fine offset branch that calculates an adjustment. The system's final estimate for the object's physical parameter is determined by summing this fine offset with the highest confidence value from the coarse output branch's initial estimate.
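The coarse-plus-offset combination can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. It assumes one common reading of the claim language: each confidence value belongs to a discrete coarse candidate (a "bin"), the candidate with the highest confidence is selected, and the offset predicted for that candidate is added to it. The names `decode_coarse_fine`, `coarse_values`, and `offsets` are hypothetical.

```python
def decode_coarse_fine(confidences, coarse_values, offsets):
    """Combine a coarse branch and a fine offset branch into one estimate.

    Assumption: entry i of each list describes the same coarse candidate.
    """
    if not (len(confidences) == len(coarse_values) == len(offsets)):
        raise ValueError("expected one confidence, coarse value, and offset per candidate")
    if not confidences:
        raise ValueError("at least one coarse candidate is required")
    # Index of the coarse candidate with the highest confidence.
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    # Final output: the selected coarse value refined by its offset.
    return coarse_values[best] + offsets[best]


# Hypothetical example: four coarse candidates, in degrees.
estimate = decode_coarse_fine(
    confidences=[0.10, 0.70, 0.15, 0.05],
    coarse_values=[0.0, 90.0, 180.0, 270.0],
    offsets=[3.0, -4.5, 1.0, 0.0],
)
print(estimate)  # 85.5
```

Splitting the prediction this way lets the coarse branch handle a classification-style choice among a few candidates while the offset branch only has to regress a small correction.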

Claim 2

Original Legal Text

2. The system of claim 1, wherein: a confidence value of the set of confidence values is associated with a potential physical parameter associated with the object.

Plain English Translation

This system processes sensor data to identify an object and estimate its physical parameters. It inputs a portion of this data into a machine learning algorithm. This algorithm has two branches: a coarse output branch that produces an initial estimate as a set of confidence values, where a confidence value is associated with a potential physical parameter for the object, and a fine offset branch that calculates an adjustment. The system's final estimate for the object's physical parameter is determined by summing this fine offset with the highest confidence value from the coarse output branch's initial estimate.

Claim 3

Original Legal Text

3. The system of claim 1, the operations further comprising determining, based at least in part on the sensor data, a two dimensional bounding box associated with the object, wherein: the sensor data comprises image data, the inputting is based at least in part on the two dimensional bounding box; the output associated with the physical parameter of the object comprises: an orientation of a three dimensional bounding box associated with the object; and dimensions of the three dimensional bounding box; and the coarse output represents a coarse orientation of the three dimensional bounding box; and the offset represents an orientation offset with respect to the coarse orientation of the three dimensional bounding box.

Plain English Translation

A system processes image data from sensors to identify an object and determine a two-dimensional bounding box around it. It then inputs the portion of the image data within this two-dimensional bounding box into a machine learning algorithm. This algorithm estimates physical parameters of the object, specifically the orientation and dimensions of a three-dimensional bounding box. The algorithm has a coarse output branch that provides a coarse estimate of the 3D bounding box's orientation, represented by a set of confidence values. A fine offset branch calculates an orientation offset relative to this coarse orientation. The system's final estimated 3D bounding box orientation is derived by summing this fine offset with the highest confidence value from the coarse output branch's orientation estimate.
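One simple way to make the network input depend on the 2D bounding box is to crop the image to that box. The sketch below assumes this cropping approach; the claim itself only requires that the inputting be "based at least in part on" the box. The image is modeled as a list of pixel rows, the box as `(x_min, y_min, x_max, y_max)` with exclusive maxima, and `crop_to_box` is a hypothetical helper name.

```python
def crop_to_box(image, box):
    """Return the part of `image` (a list of pixel rows) inside `box`.

    Assumption: box = (x_min, y_min, x_max, y_max) in integer pixel
    coordinates, with x_max and y_max exclusive.
    """
    x_min, y_min, x_max, y_max = box
    height = len(image)
    width = len(image[0]) if height else 0
    # Clamp the box to the image so a detection that spills over the
    # border still yields a valid patch.
    x0, x1 = max(0, x_min), min(width, x_max)
    y0, y1 = max(0, y_min), min(height, y_max)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("bounding box does not overlap the image")
    return [row[x0:x1] for row in image[y0:y1]]


# Hypothetical 4x6 grayscale image whose pixel value encodes (row, column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
patch = crop_to_box(image, (1, 1, 4, 3))
print(patch)  # [[11, 12, 13], [21, 22, 23]]
```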

Claim 4

Original Legal Text

4. The system of claim 3, wherein: the orientation of the three dimensional bounding box is based at least in part on the coarse orientation and the orientation offset, the orientation represented as an angle between: a first ray originating from a center of a sensor associated with the sensor data and passing through a center of the two dimensional bounding box, and a second ray aligned with a direction of the object.

Plain English Translation

This system processes image data from sensors to identify an object and determine a two-dimensional bounding box around it. It inputs the image data within this 2D bounding box into a machine learning algorithm to estimate the object's three-dimensional bounding box orientation and dimensions. The algorithm uses a coarse output branch for an initial, coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is calculated by combining this offset with the highest confidence coarse orientation. This orientation is specifically defined as the angle between two rays: a first ray originating from the sensor's center and passing through the center of the 2D bounding box, and a second ray aligned with the object's direction.
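The angle between the two rays can be illustrated with a small geometric sketch. It assumes a pinhole camera and works in the horizontal (x, z) plane of the camera frame, neither of which is stated in the claim. The function names, the intrinsics `focal_x` and `principal_x`, and the sign convention are all hypothetical choices made for illustration.

```python
import math


def ray_through_box_center(box, focal_x, principal_x):
    """Direction (x, z) of the ray from the camera center through the
    horizontal center of a 2D box, under an assumed pinhole model."""
    x_min, _, x_max, _ = box
    u_center = 0.5 * (x_min + x_max)
    # Unnormalized direction: a pixel at column u maps to x/z = (u - cx) / fx.
    return (u_center - principal_x, focal_x)


def signed_angle(first, second):
    """Signed angle in radians from `first` to `second`, both 2D vectors.

    The sign convention (counterclockwise positive when x is the first
    axis and z the second) is an arbitrary choice for this sketch.
    """
    cross = first[0] * second[1] - first[1] * second[0]
    dot = first[0] * second[0] + first[1] * second[1]
    return math.atan2(cross, dot)


# Hypothetical case: the box is centered on the principal point, so the
# first ray points straight along +z; the object heads along +x.
camera_ray = ray_through_box_center((300, 100, 340, 200), focal_x=700.0, principal_x=320.0)
object_direction = (1.0, 0.0)
angle = signed_angle(camera_ray, object_direction)
```

Here `angle` comes out as a quarter turn (magnitude pi/2). Measuring orientation relative to the viewing ray rather than the camera axis means the angle depends on how the object appears in its crop, not on where the crop sits in the image.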

Claim 5

Original Legal Text

5. The system of claim 3, the operations further comprising estimating a position of the three dimensional bounding box by associating the three dimensional bounding box with the sensor data.

Plain English Translation

A system processes image data from sensors to identify an object and determine a two-dimensional bounding box around it. It then inputs the portion of the image data within this two-dimensional bounding box into a machine learning algorithm. This algorithm estimates physical parameters of the object, specifically the orientation and dimensions of a three-dimensional bounding box. The algorithm has a coarse output branch that provides a coarse estimate of the 3D bounding box's orientation (as confidence values) and a fine offset branch that calculates an orientation offset relative to this coarse orientation. The system's final estimated 3D bounding box orientation is derived by summing this fine offset with the highest confidence coarse orientation. Additionally, the system estimates the position of the three-dimensional bounding box by associating it with the original sensor data.

Claim 6

Original Legal Text

6. The system of claim 5, wherein: estimating the position of the three dimensional bounding box in the environment comprises minimizing a difference between an association of the three dimensional bounding box with the image data and the two dimensional bounding box.

Plain English Translation

This system processes image data from sensors to identify an object and determine a two-dimensional bounding box around it. It then inputs the image data within this 2D bounding box into a machine learning algorithm. This algorithm estimates physical parameters of the object, specifically the orientation and dimensions of a three-dimensional bounding box. The algorithm utilizes a coarse output branch for a coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is derived by summing this fine offset with the highest confidence coarse orientation. The system also estimates the 3D bounding box's position in the environment by minimizing the difference between how the 3D bounding box projects onto the image data and the previously detected 2D bounding box.
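The position step can be sketched as a small optimization. The following is an illustration under several stated assumptions, not the patented procedure: a pinhole camera with hypothetical intrinsics, rotation about the vertical axis only, a squared-difference cost between the projected and detected 2D boxes, and a simple coordinate pattern search swapped in as a stand-in optimizer. The claim does not name any particular projection model or solver.

```python
import itertools
import math


def project_box(dims, yaw, center, focal, principal):
    """Project the 8 corners of a 3D box; return the enclosing 2D box,
    or None if any corner lies behind the camera."""
    length, height, width = dims
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    us, vs = [], []
    for sx, sy, sz in itertools.product((-0.5, 0.5), repeat=3):
        ox, oy, oz = sx * length, sy * height, sz * width
        # Rotate the corner about the vertical (y) axis, then translate.
        x = center[0] + cos_y * ox + sin_y * oz
        y = center[1] + oy
        z = center[2] - sin_y * ox + cos_y * oz
        if z <= 0.0:
            return None
        us.append(focal * x / z + principal[0])
        vs.append(focal * y / z + principal[1])
    return (min(us), min(vs), max(us), max(vs))


def box_difference(projected, detected):
    """Sum of squared differences between two 2D boxes."""
    if projected is None:
        return math.inf
    return sum((p - d) ** 2 for p, d in zip(projected, detected))


def estimate_position(dims, yaw, detected, focal, principal,
                      start=(0.0, 0.0, 20.0), step=8.0, rounds=60):
    """Pattern search over the box center that reduces the 2D box difference."""
    best = tuple(start)
    best_err = box_difference(project_box(dims, yaw, best, focal, principal), detected)
    for _ in range(rounds):
        improved = False
        for axis in range(3):
            for sign in (-1.0, 1.0):
                candidate = list(best)
                candidate[axis] += sign * step
                err = box_difference(
                    project_box(dims, yaw, tuple(candidate), focal, principal), detected)
                if err < best_err:
                    best, best_err, improved = tuple(candidate), err, True
        if not improved:
            step *= 0.5  # refine the search once no neighbor is better
    return best, best_err


# Hypothetical setup: synthesize a "detected" 2D box from a known position,
# then try to recover a position that explains it.
DIMS, YAW = (4.0, 1.5, 1.8), 0.3
FOCAL, PRINCIPAL = 720.0, (640.0, 360.0)
detected_box = project_box(DIMS, YAW, (1.0, 0.5, 12.0), FOCAL, PRINCIPAL)
position, residual = estimate_position(DIMS, YAW, detected_box, FOCAL, PRINCIPAL)
```

Because the network supplies orientation and dimensions, only the three translation components remain unknown, which is what keeps this search small.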

Claim 7

Original Legal Text

7. The system of claim 3, wherein the machine learning algorithm is a convolution neural network trained based at least in part on training data comprising a training two dimensional bounding box and an associated ground truth three dimensional bounding box.

Plain English Translation

A system processes image data from sensors to identify an object and determine a two-dimensional bounding box around it. It then inputs the portion of the image data within this two-dimensional bounding box into a machine learning algorithm, which is a convolutional neural network. This CNN estimates physical parameters of the object, specifically the orientation and dimensions of a three-dimensional bounding box. The algorithm has a coarse output branch that provides a coarse estimate of the 3D bounding box's orientation (as confidence values) and a fine offset branch that calculates an orientation offset relative to this coarse orientation. The system's final estimated 3D bounding box orientation is derived by summing this fine offset with the highest confidence coarse orientation. The convolutional neural network was trained using data that included training 2D bounding boxes and their corresponding ground truth 3D bounding boxes.

Claim 8

Original Legal Text

8. The system of claim 7, wherein: the training data is based at least in part on a transformation to a training image; and the transformation comprises at least one of: mirroring the training image; adding noise to the training image; resizing the training image; or resizing the training two dimensional bounding box.

Plain English Translation

This system processes image data from sensors to identify an object and determine a two-dimensional bounding box around it. It then inputs the image data within this 2D bounding box into a convolutional neural network (CNN) to estimate the object's three-dimensional bounding box orientation and dimensions. The CNN has a coarse output branch for an initial, coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is calculated by combining this offset with the highest confidence coarse orientation. The CNN was trained using data that included training 2D bounding boxes and their corresponding ground truth 3D bounding boxes. This training data was augmented by transformations applied to training images, such as mirroring, adding noise, resizing the image, or resizing the training 2D bounding box.
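The four listed transformations can be sketched on a toy image. This is a minimal illustration with hypothetical helper names. It assumes a grayscale image stored as a list of pixel rows, boxes given as `(x_min, y_min, x_max, y_max)` in pixel-edge coordinates, Gaussian noise, and nearest-neighbor resizing, none of which the claim specifies. Any matching change to the ground truth 3D box (for example, after mirroring) is not shown.

```python
import random


def mirror(image, box):
    """Flip the image left to right and move the box accordingly."""
    width = len(image[0])
    x_min, y_min, x_max, y_max = box
    flipped = [row[::-1] for row in image]
    return flipped, (width - x_max, y_min, width - x_min, y_max)


def add_noise(image, sigma, rng):
    """Add Gaussian noise to every pixel, clamped to the range [0, 255]."""
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def resize_nearest(image, box, scale):
    """Resize the image by `scale` (nearest neighbor) and scale the box."""
    height, width = len(image), len(image[0])
    new_h, new_w = max(1, round(height * scale)), max(1, round(width * scale))
    resized = [[image[min(height - 1, int(r / scale))][min(width - 1, int(c / scale))]
                for c in range(new_w)] for r in range(new_h)]
    return resized, tuple(v * scale for v in box)


def resize_box(box, factor):
    """Grow or shrink the box about its center, leaving the image unchanged."""
    x_min, y_min, x_max, y_max = box
    cx, cy = 0.5 * (x_min + x_max), 0.5 * (y_min + y_max)
    half_w, half_h = 0.5 * (x_max - x_min) * factor, 0.5 * (y_max - y_min) * factor
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Hypothetical 4x6 training image and box.
image = [[10 * r + c for c in range(6)] for r in range(4)]
box = (1, 1, 4, 3)
flipped, flipped_box = mirror(image, box)
noisy = add_noise(image, sigma=2.0, rng=random.Random(0))
small, small_box = resize_nearest(image, box, 0.5)
loose_box = resize_box((0, 0, 4, 2), 2.0)
```

Resizing the box independently of the image exposes the network to crops that are looser or tighter than the object, which may help when the 2D detector's boxes are imprecise.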

Claim 9

Original Legal Text

9. A method comprising: receiving sensor data; determining an object in an environment represented in the sensor data; inputting at least a portion of the sensor data into a machine learning algorithm; receiving, based at least in part on the portion of the sensor data and from the machine learning algorithm, output associated with a physical parameter of the object, wherein the machine learning algorithm comprises: a coarse output branch configured to output a coarse output; and a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch; and wherein the output comprises a sum of the offset and a highest confidence value of a set of confidence values associated with the coarse output.

Plain English Translation

A method for estimating an object's physical parameters involves receiving sensor data and identifying an object within that data. A portion of the sensor data is then fed into a machine learning algorithm. This algorithm uses a coarse output branch to generate an initial estimate, represented by a set of confidence values, and a fine offset branch to produce an adjustment. The method determines the final physical parameter output by summing this fine offset with the highest confidence value from the coarse output branch's initial estimate.

Claim 10

Original Legal Text

10. The method of claim 9, wherein: a confidence value of the set of confidence values is associated with a potential physical parameter associated with the object.

Plain English Translation

A method for estimating an object's physical parameters involves receiving sensor data and identifying an object within that data. A portion of the sensor data is then fed into a machine learning algorithm. This algorithm uses a coarse output branch to generate an initial estimate as a set of confidence values, where each confidence value is associated with a potential physical parameter for the object. A fine offset branch produces an adjustment. The method determines the final physical parameter output by summing this fine offset with the highest confidence value from the coarse output branch's initial estimate.

Claim 11

Original Legal Text

11. The method of claim 9, further comprising: determining, based at least in part on the sensor data, a two dimensional bounding box associated with the object, wherein: the sensor data comprises image data, the inputting is based at least in part on the two dimensional bounding box; the output associated with the physical parameter of the object comprises: an orientation of a three dimensional bounding box associated with the object; and dimensions of the three dimensional bounding box; and the coarse output represents a coarse orientation of the three dimensional bounding box; and the offset represents an orientation offset with respect to the coarse orientation of the three dimensional bounding box.

Plain English Translation

A method for estimating physical parameters of an object involves receiving image data from sensors, then determining a two-dimensional bounding box associated with the object in that data. A portion of the image data within this 2D bounding box is then inputted into a machine learning algorithm. This algorithm outputs physical parameters, specifically the orientation and dimensions of a three-dimensional bounding box for the object. The algorithm consists of a coarse output branch, which generates a coarse orientation estimate for the 3D bounding box (represented by confidence values), and a fine offset branch, which generates an orientation offset relative to the coarse orientation. The final 3D bounding box orientation is determined by summing this offset with the highest confidence coarse orientation.

Claim 12

Original Legal Text

12. The method of claim 11, wherein the orientation of the three dimensional bounding box is based at least in part on the coarse orientation and the orientation offset, the orientation represented as an angle between: a first ray originating from a center of a sensor associated with the sensor data and passing through a center of the two dimensional bounding box, and a second ray aligned with a direction of the object.

Plain English Translation

A method for estimating physical parameters of an object involves receiving image data from sensors, then determining a two-dimensional bounding box for the object. The image data within this 2D bounding box is inputted into a machine learning algorithm, which outputs the orientation and dimensions of a three-dimensional bounding box. The algorithm uses a coarse output branch for a coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is calculated by combining this offset with the highest confidence coarse orientation. This orientation is specifically defined as the angle between a first ray originating from the sensor's center and passing through the 2D bounding box's center, and a second ray aligned with the object's direction.

Claim 13

Original Legal Text

13. The method of claim 11, further comprising: estimating a position of the three dimensional bounding box by associating the three dimensional bounding box with the sensor data.

Plain English Translation

A method for estimating physical parameters of an object involves receiving image data from sensors, then determining a two-dimensional bounding box associated with the object. The image data within this 2D bounding box is inputted into a machine learning algorithm to output the orientation and dimensions of a three-dimensional bounding box. The algorithm uses a coarse output branch for a coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is determined by summing this offset with the highest confidence coarse orientation. Additionally, the method includes estimating the position of the three-dimensional bounding box by associating it with the original sensor data.

Claim 14

Original Legal Text

14. The method of claim 13, wherein estimating the position of the three dimensional bounding box in the environment comprises minimizing a difference between an association of the three dimensional bounding box with the image data and the two dimensional bounding box.

Plain English Translation

A method for estimating physical parameters of an object involves receiving image data from sensors, then determining a two-dimensional bounding box for the object. The image data within this 2D bounding box is inputted into a machine learning algorithm to output the orientation and dimensions of a three-dimensional bounding box. The algorithm uses a coarse output branch for a coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is determined by summing this offset with the highest confidence coarse orientation. The method further estimates the 3D bounding box's position in the environment by minimizing the difference between how the 3D bounding box projects onto the image data and the previously detected 2D bounding box.

Claim 15

Original Legal Text

15. A non-transitory computer readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving sensor data; determining an object in an environment represented in the sensor data; inputting at least a portion of the sensor data into a machine learning algorithm; receiving, based at least in part on the portion of the sensor data and from the machine learning algorithm, output associated with a physical parameter of the object, wherein the machine learning algorithm comprises: a coarse output branch configured to output a coarse output; and a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch; and wherein the output comprises a sum of the offset and a highest confidence value of a set of confidence values associated with the coarse output.

Plain English Translation

A non-transitory computer readable medium stores instructions that, when executed, enable a system to estimate an object's physical parameters. The system receives sensor data, identifies an object within it, and inputs a portion of this data into a machine learning algorithm. This algorithm has a coarse output branch producing an initial estimate (as a set of confidence values) and a fine offset branch calculating an adjustment. The instructions cause the system to determine the final physical parameter output by summing this fine offset with the highest confidence value from the coarse output branch's initial estimate.

Claim 16

Original Legal Text

16. The non-transitory computer readable medium of claim 15, wherein: a confidence value of the set of confidence values is associated with a potential physical parameter associated with the object.

Plain English Translation

A non-transitory computer readable medium stores instructions that, when executed, enable a system to estimate an object's physical parameters. The system receives sensor data, identifies an object within it, and inputs a portion of this data into a machine learning algorithm. This algorithm has a coarse output branch producing an initial estimate as a set of confidence values, where each confidence value is associated with a potential physical parameter for the object. A fine offset branch calculates an adjustment. The instructions cause the system to determine the final physical parameter output by summing this fine offset with the highest confidence value from the coarse output branch's initial estimate.

Claim 17

Original Legal Text

17. The non-transitory computer readable medium of claim 15, the operations further comprising: determining, based at least in part on the sensor data, a two dimensional bounding box associated with the object, wherein: the sensor data comprises image data, the inputting is based at least in part on the two dimensional bounding box; the output associated with the physical parameter of the object comprises: an orientation of a three dimensional bounding box associated with the object; and dimensions of the three dimensional bounding box; and the coarse output represents a coarse orientation of the three dimensional bounding box; and the offset represents an orientation offset with respect to the coarse orientation of the three dimensional bounding box.

Plain English Translation

A non-transitory computer readable medium stores instructions that, when executed, enable a system to estimate physical parameters of an object. The system receives image data from sensors, determines a two-dimensional bounding box for the object, and inputs the image data within this 2D bounding box into a machine learning algorithm. This algorithm outputs physical parameters, specifically the orientation and dimensions of a three-dimensional bounding box. The algorithm comprises a coarse output branch that provides a coarse orientation estimate for the 3D bounding box (represented by confidence values), and a fine offset branch that calculates an orientation offset relative to the coarse orientation. The final 3D bounding box orientation is determined by summing this offset with the highest confidence coarse orientation.

Claim 18

Original Legal Text

18. The non-transitory computer readable medium of claim 17, wherein the orientation of the three dimensional bounding box is based at least in part on the coarse orientation and the orientation offset, the orientation represented as an angle between: a first ray originating from a center of a sensor associated with the sensor data and passing through a center of the two dimensional bounding box, and a second ray aligned with a direction of the object.

Plain English Translation

A non-transitory computer readable medium stores instructions that, when executed, enable a system to estimate physical parameters of an object. The system receives image data from sensors, determines a two-dimensional bounding box for the object, and inputs the image data within this 2D bounding box into a machine learning algorithm. This algorithm outputs the orientation and dimensions of a three-dimensional bounding box. The algorithm uses a coarse output branch for a coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is calculated by combining this offset with the highest confidence coarse orientation. This orientation is specifically defined as the angle between a first ray originating from the sensor's center and passing through the 2D bounding box's center, and a second ray aligned with the object's direction.

Claim 19

Original Legal Text

19. The non-transitory computer readable medium of claim 17, the operations further comprising: estimating a position of the three dimensional bounding box by associating the three dimensional bounding box with the sensor data.

Plain English Translation

A non-transitory computer readable medium stores instructions that, when executed, enable a system to estimate physical parameters of an object. The system receives image data from sensors, determines a two-dimensional bounding box for the object, and inputs the image data within this 2D bounding box into a machine learning algorithm. This algorithm outputs physical parameters, specifically the orientation and dimensions of a three-dimensional bounding box. The algorithm uses a coarse output branch for a coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is determined by summing this offset with the highest confidence coarse orientation. Additionally, the instructions cause the system to estimate the position of the three-dimensional bounding box by associating it with the original sensor data.

Claim 20

Original Legal Text

20. The non-transitory computer readable medium of claim 19, wherein estimating the position of the three dimensional bounding box in the environment comprises minimizing a difference between an association of the three dimensional bounding box with the image data and the two dimensional bounding box.

Plain English Translation

A non-transitory computer readable medium stores instructions that, when executed, enable a system to estimate physical parameters of an object. The system receives image data from sensors, determines a two-dimensional bounding box for the object, and inputs the image data within this 2D bounding box into a machine learning algorithm. This algorithm outputs the orientation and dimensions of a three-dimensional bounding box. The algorithm uses a coarse output branch for a coarse orientation estimate (as confidence values) and a fine offset branch for an orientation offset. The final 3D bounding box orientation is determined by summing this offset with the highest confidence coarse orientation. The instructions further cause the system to estimate the 3D bounding box's position in the environment by minimizing the difference between how the 3D bounding box projects onto the image data and the previously detected 2D bounding box.

Patent Metadata

Filing Date

Unknown

Publication Date

August 4, 2020

Inventors

Arsalan Mousavian
John Patrick Flynn
Dragomir Dimitrov Anguelov

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “THREE DIMENSIONAL BOUNDING BOX ESTIMATION FROM TWO DIMENSIONAL IMAGES” (10733441). https://patentable.app/patents/10733441