Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for scene reconstruction, comprising: generating, at a depth estimation model, both a depth estimate and a first pose estimate from a current image; generating, at a pose estimation model, a second pose estimate based on the current image and at least one previous image in a sequence of images; generating a warped image by warping each pixel in the current image based on the depth estimate, the first pose estimate, and the second pose estimate; and controlling an action of an agent based on the generated warped image.
2. The method of claim 1, further comprising: generating a transformation matrix based on the first pose estimate and the second pose estimate; and generating the warped image by warping each pixel in the current image based on the depth estimate and the transformation matrix.
3. The method of claim 1, in which the first pose estimate and the second pose estimate comprise both an x, y, z, translation and a roll, pitch, and yaw translation.
4. The method of claim 1, further comprising obtaining the current image from a monocular camera, in which the current image is a two-dimensional image.
5. The method of claim 1, in which the warped image is a three-dimensional image.
6. The method of claim 1, further comprising determining a local transformation and a global transformation for each pixel.
7. An apparatus for scene reconstruction, comprising: a processor; a memory coupled with the processor; and instructions stored in the memory and operable, when executed by the processor, to cause the apparatus: to generate, at a depth estimation model, both a depth estimate and a first pose estimate from a current image; to generate, at a pose estimation model, a second pose estimate based on the current image and at least one previous image in a sequence of images; to generate a warped image by warping each pixel in the current image based on the depth estimate, the first pose estimate, and the second pose estimate; and to control an action of an agent based on the generated warped image.
8. The apparatus of claim 7, in which execution of the instructions further cause the apparatus: to generate a transformation matrix based on the first pose estimate and the second pose estimate; and to generate the warped image by warping each pixel in the current image based on the depth estimate and the transformation matrix.
9. The apparatus of claim 7, in which the first pose estimate and the second pose estimate comprise both an x, y, z, translation and a roll, pitch, and yaw translation.
10. The apparatus of claim 7, in which: execution of the instructions further cause the apparatus to obtain the current image from a monocular camera; and the current image is a two-dimensional image.
11. The apparatus of claim 7, in which the warped image is a three-dimensional image.
12. The apparatus of claim 7, in which execution of the instructions further cause the apparatus to determine a local transformation and a global transformation for each pixel.
13. A non-transitory computer-readable medium having program code recorded thereon for scene reconstruction, the program code executed by a processor and comprising: program code to generate, at a depth estimation model, both a depth estimate and a first pose estimate from a current image; program code to generate, at a pose estimation model, a second pose estimate based on the current image and at least one previous image in a sequence of images; program code to generate a warped image by warping each pixel in the current image based on the depth estimate, the first pose estimate, and the second pose estimate; and program code to control an action of an agent based on the generated warped image.
14. The non-transitory computer-readable medium of claim 13, in which the program code further comprises: program code to generate a transformation matrix based on the first pose estimate and the second pose estimate; and program code to generate the warped image by warping each pixel in the current image based on the depth estimate and the transformation matrix.
15. The non-transitory computer-readable medium of claim 13, in which the first pose estimate and the second pose estimate comprise both an x, y, z, translation and a roll, pitch, and yaw translation.
16. The non-transitory computer-readable medium of claim 13, in which: the program code further comprises program code to obtain the current image from a monocular camera; and the current image is a two-dimensional image.
17. The non-transitory computer-readable medium of claim 13, in which the warped image is a three-dimensional image.
18. The non-transitory computer-readable medium of claim 13, in which the program code further comprises program code to determine a local transformation and a global transformation for each pixel.
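The warping step recited in claims 1 to 3 (and mirrored in the apparatus and medium claims) can be illustrated with a short numerical sketch. This is not the patented implementation. The claims do not say how the two pose estimates are combined, which angle convention is used, or what camera model applies, so the sketch assumes a matrix product of the two poses, a Z-Y-X rotation order, and a pinhole intrinsics matrix `K`. All function names are hypothetical, and the depth and pose values would come from the claimed models rather than being supplied by hand.

```python
import numpy as np

def pose_to_matrix(pose):
    """Turn a 6-DoF pose (x, y, z, roll, pitch, yaw) into a 4x4 transform.

    Angles are in radians; the Rz @ Ry @ Rx order is an assumption.
    """
    x, y, z, roll, pitch, yaw = pose
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [x, y, z]
    return T

def combine_poses(first_pose, second_pose):
    """Build one transformation matrix from both pose estimates.

    Assumption: the poses are composed by matrix multiplication; the
    claims only say the matrix is "based on" both estimates.
    """
    return pose_to_matrix(second_pose) @ pose_to_matrix(first_pose)

def warp_pixels(depth, K, T):
    """Return the warped (u, v) coordinates of every pixel, shape (H, W, 2).

    depth: (H, W) array of positive per-pixel depths.
    K:     3x3 pinhole intrinsics (assumed, not part of the claims).
    T:     4x4 transformation matrix from combine_poses.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    # Back-project each pixel to a 3D point using its depth.
    points = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    points_h = np.vstack([points, np.ones((1, points.shape[1]))])
    # Move the points with the combined transform, then re-project.
    moved = (T @ points_h)[:3]
    proj = K @ moved
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(h, w, 2)

# Toy inputs standing in for the model outputs.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)
first_pose = np.zeros(6)
second_pose = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
T = combine_poses(first_pose, second_pose)
warped = warp_pixels(depth, K, T)
```

With these toy values, a pure sideways translation of 0.1 at a constant depth of 2.0 and a focal length of 100 shifts every pixel by 100 × 0.1 / 2.0 = 5 columns, which is a quick sanity check on the geometry. Sampling image intensities at the warped coordinates to form the final warped image is left out of the sketch.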
May 3, 2022