Pixel-Wise Residual Pose Estimation for Monocular Depth Estimation

Published: May 3, 2022

Assignee: not available in USPTO data we have

Inventors: Vitor GUIZILINI, Adrien David GAIDON

Patent Claims

18 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for scene reconstruction, comprising: generating, at a depth estimation model, both a depth estimate and a first pose estimate from a current image; generating, at a pose estimation model, a second pose estimate based on the current image and at least one previous image in a sequence of images; generating a warped image by warping each pixel in the current image based on the depth estimate, the first pose estimate, and the second pose estimate; and controlling an action of an agent based on the generated warped image.

2. The method of claim 1, further comprising: generating a transformation matrix based on the first pose estimate and the second pose estimate; and generating the warped image by warping each pixel in the current image based on the depth estimate and the transformation matrix.

3. The method of claim 1, in which the first pose estimate and the second pose estimate comprise both an x, y, z, translation and a roll, pitch, and yaw translation.

4. The method of claim 1, further comprising obtaining the current image from a monocular camera, in which the current image is a two-dimensional image.

5. The method of claim 1, in which the warped image is a three-dimensional image.

6. The method of claim 1, further comprising determining a local transformation and a global transformation for each pixel.

7. An apparatus for scene reconstruction, comprising: a processor; a memory coupled with the processor; and instructions stored in the memory and operable, when executed by the processor, to cause the apparatus: to generate, at a depth estimation model, both a depth estimate and a first pose estimate from a current image; to generate, at a pose estimation model, a second pose estimate based on the current image and at least one previous image in a sequence of images; to generate a warped image by warping each pixel in the current image based on the depth estimate, the first pose estimate, and the second pose estimate; and to control an action of an agent based on the generated warped image.

8. The apparatus of claim 7, in which execution of the instructions further cause the apparatus: to generate a transformation matrix based on the first pose estimate and the second pose estimate; and to generate the warped image by warping each pixel in the current image based on the depth estimate and the transformation matrix.

9. The apparatus of claim 7, in which the first pose estimate and the second pose estimate comprise both an x, y, z, translation and a roll, pitch, and yaw translation.

10. The apparatus of claim 7, in which: execution of the instructions further cause the apparatus to obtain the current image from a monocular camera; and the current image is a two-dimensional image.

11. The apparatus of claim 7, in which the warped image is a three-dimensional image.

12. The apparatus of claim 7, in which execution of the instructions further cause the apparatus determine a local transformation and a global transformation for each pixel.

13. A non-transitory computer-readable medium having program code recorded thereon for scene reconstruction, the program code executed by a processor and comprising: program code to generate, at a depth estimation model, both a depth estimate and a first pose estimate from a current image; program code to generate, at a pose estimation model, a second pose estimate based on the current image and at least one previous image in a sequence of images; program code to generate a warped image by warping each pixel in the current image based on the depth estimate, the first pose estimate, and the second pose estimate; and program code to control an action of an agent based on the generated warped image.

14. The non-transitory computer-readable medium of claim 13, in which the program code further comprises: program code to generate a transformation matrix based on the first pose estimate and the second pose estimate; and program code to generate the warped image by warping each pixel in the current image based on the depth estimate and the transformation matrix.

15. The non-transitory computer-readable medium of claim 13, in which the first pose estimate and the second pose estimate comprise both an x, y, z, translation and a roll, pitch, and yaw translation.

16. The non-transitory computer-readable medium of claim 13, in which: the program code further comprises program code to obtain the current image from a monocular camera; and the current image is a two-dimensional image.

17. The non-transitory computer-readable medium of claim 13, in which the warped image is a three-dimensional image.

18. The non-transitory computer-readable medium of claim 13, in which the program code further comprises program code to determine a local transformation and a global transformation for each pixel.
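
Illustrative sketch (not part of the claims). The warping step recited in claims 1, 2, and 6 can be pictured roughly as follows. This is a minimal NumPy sketch under several assumptions that the claims do not state: that the first pose estimate is a per-pixel (local) 6-value pose, that the second pose estimate is a single (global) 6-value pose, that the two are combined by multiplying their 4x4 matrices in the order shown, and that a pinhole camera matrix K is available. The function names pose_to_matrix and warp_pixel_coords, the angle convention, and the composition order are all hypothetical and are not taken from the patent.

```python
import numpy as np


def pose_to_matrix(pose):
    """Turn (x, y, z, roll, pitch, yaw) into a 4x4 homogeneous transform.

    Assumed convention: angles in radians, rotation applied as Rz @ Ry @ Rx.
    """
    x, y, z, roll, pitch, yaw = pose
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [x, y, z]
    return T


def warp_pixel_coords(depth, K, local_pose, global_pose):
    """Compute where each pixel of the current image lands after warping.

    depth:       (H, W) depth estimate
    K:           (3, 3) pinhole intrinsics (assumed known)
    local_pose:  (H, W, 6) assumed per-pixel pose estimate
    global_pose: (6,) assumed single pose estimate for the whole image
    Returns an (H, W, 2) array of (u, v) coordinates in the target view.
    """
    H, W = depth.shape
    K_inv = np.linalg.inv(K)
    T_global = pose_to_matrix(global_pose)
    coords = np.zeros((H, W, 2))
    for v in range(H):
        for u in range(W):
            # Back-project the pixel to a 3D point using its depth.
            point = depth[v, u] * (K_inv @ np.array([u, v, 1.0]))
            # Combine the two pose estimates into one transformation matrix
            # (the multiplication order is an assumption of this sketch).
            T = T_global @ pose_to_matrix(local_pose[v, u])
            moved = T @ np.append(point, 1.0)
            # Project the transformed point back onto the image plane.
            uvw = K @ moved[:3]
            coords[v, u] = uvw[:2] / uvw[2]
    return coords


# Example call with made-up values: a flat scene 2 units away and a small
# sideways global translation, with all per-pixel poses set to zero.
K_example = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
depth_example = np.full((3, 4), 2.0)
local_example = np.zeros((3, 4, 6))
global_example = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
coords_example = warp_pixel_coords(depth_example, K_example, local_example, global_example)
```

The double loop is kept for readability; a practical version would vectorize the back-projection and then sample image intensities at the returned coordinates to form the warped image.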

Patent Metadata

Filing Date

Unknown

Publication Date

May 3, 2022

Inventors

Vitor GUIZILINI

Adrien David GAIDON
