Unsupervised Training of Optical Flow Estimation Neural Networks

Published: February 18, 2025

Assignee: Not available in the USPTO data

Inventors: Daniel Rudolf Maurer, Austin Charles Stone, Alper Ayvaci, Anelia Angelova, Rico Jonschkowski

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method performed by one or more computers and for training a neural network that has a plurality of network parameters and that is configured to receive as input a first image and a second image and to generate as output an optical flow estimate of optical flow between the first image and the second image, the method comprising: obtaining a batch of one or more training image pairs, each training image pair comprising a respective first training image and a respective second training image; for each of the one or more training image pairs: processing the first training image and the second training image using the neural network to generate a final optical flow estimate from the first training image to the second training image; generating a cropped final optical flow estimate from the first training image to the second training image, comprising cropping the final optical flow estimate from the first training image to the second training image; and training the neural network on the one or more training image pairs, the training comprising, for each training image pair, using the cropped final optical flow estimate for the training image pair as a target output for the neural network.
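
The cropping step in claim 1 can be illustrated with a minimal NumPy sketch. It assumes a flow field is stored as an array of shape (H, W, 2) and that both images of a pair would later be cropped with the same window. Under that assumption, cropping leaves each (dx, dy) vector unchanged and only reduces the spatial extent. The function name crop_flow is hypothetical.

```python
import numpy as np

def crop_flow(flow: np.ndarray, top: int, left: int, height: int, width: int) -> np.ndarray:
    """Spatially crop a flow field of shape (H, W, 2).

    Assumes both images are cropped with the same window, so the flow
    vectors themselves need no adjustment; only the extent shrinks.
    """
    H, W, _ = flow.shape
    if top < 0 or left < 0 or top + height > H or left + width > W:
        raise ValueError("crop window exceeds the flow field")
    return flow[top:top + height, left:left + width].copy()
```

If the two images were cropped with different windows, the vectors would also have to be shifted by the difference between the window offsets. The sketch does not handle that case.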

2. The method of claim 1, further comprising, for each of the one or more training image pairs: generating a modified first training image, comprising cropping the first training image in the training image pair; generating a modified second training image, comprising cropping the second training image in the training image pair; and processing the modified first training image and the modified second training image using the neural network to generate one or more modified optical flow estimates, wherein training the neural network on the one or more training image pairs comprises: computing a gradient with respect to the network parameters of a loss function that comprises a first term that measures, for each training image pair, an error between (i) the one or more modified optical flow estimates for the training image pair and (ii) the cropped final optical flow estimate for the training image pair; and updating the network parameters using the gradient.
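
The first loss term of claim 2 compares the network's estimates on cropped images with the cropped estimate it produced on the full images. The NumPy sketch below assumes three things: model is a hypothetical callable that returns a list of (H, W, 2) estimates, the error is a mean absolute (L1) difference, and the last list entry is the final estimate. The claim does not specify the error measure. In a real autodiff framework, the target would be wrapped in a stop-gradient so that it acts as a fixed label.

```python
import numpy as np

def crop(x, top, left, height, width):
    """Crop the two leading (spatial) axes of an image or a flow field."""
    return x[top:top + height, left:left + width]

def first_loss_term(model, img1, img2, window):
    """Mean L1 error between estimates on cropped images and the cropped full-image estimate.

    `model(a, b)` is a hypothetical flow network returning a list of (H, W, 2)
    estimates; the last entry is treated as the final estimate.
    """
    final = model(img1, img2)[-1]                 # pass on the full images
    target = crop(final, *window)                 # cropped final estimate = target
    modified = model(crop(img1, *window), crop(img2, *window))  # pass on the crops
    # With autodiff, `target` would be held constant (stop-gradient) here.
    errors = [np.mean(np.abs(f - target)) for f in modified]
    return float(np.mean(errors))
```

The window argument is a tuple (top, left, height, width). The same window is used for both images and for the target, which matches the assumption in the cropping sketch after claim 1.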

3. The method of claim 2, wherein the neural network is configured to generate the optical flow estimate of optical flow between the first image and the second image by: initializing the optical flow estimate, and at each of a plurality of update iterations, updating the optical flow estimate using features of the first image and the second image, and wherein the one or more modified optical flow estimates include a respective modified optical flow estimate for each of the update iterations.
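
Claim 3 describes iterative refinement of a single flow estimate. A minimal sketch follows. It assumes zero-motion initialization and additive residual updates, which is one common design for recurrent flow networks. The claim itself only requires initializing the estimate and then updating it. The names update_fn and iterative_flow are hypothetical.

```python
import numpy as np

def iterative_flow(features1, features2, update_fn, num_iters=4):
    """Refine a flow estimate over several iterations and keep every intermediate result.

    update_fn(features1, features2, flow) is a hypothetical learned module that
    returns a residual flow of the same shape as `flow`.
    """
    H, W = features1.shape[:2]
    flow = np.zeros((H, W, 2))                    # initialize to zero motion
    estimates = []
    for _ in range(num_iters):
        flow = flow + update_fn(features1, features2, flow)  # residual update
        estimates.append(flow.copy())             # one estimate per update iteration
    return estimates
```

Keeping every intermediate estimate is what allows a loss to be applied at each update iteration, as the following claims describe.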

4. The method of claim 3, wherein the first term measures, for each training image pair and for each of the plurality of update iterations, an error between (i) the respective modified optical flow estimate for the update iteration for the training image pair and (ii) the cropped final optical flow estimate for the training image pair.
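
Applying the first term at every update iteration can be sketched as a weighted sum of per-iteration L1 errors. The exponential weight gamma ** (n - 1 - i) is an assumption borrowed from common practice in recurrent flow training, not something the claim states. With gamma = 1.0, every iteration counts equally.

```python
import numpy as np

def per_iteration_loss(modified_flows, cropped_target, gamma=1.0):
    """Sum of per-iteration mean L1 errors against the cropped final estimate.

    modified_flows: list of (H, W, 2) estimates, one per update iteration.
    gamma < 1 (an assumption) down-weights earlier iterations.
    """
    n = len(modified_flows)
    total = 0.0
    for i, f in enumerate(modified_flows):
        weight = gamma ** (n - 1 - i)             # last iteration gets weight 1
        total += weight * np.mean(np.abs(f - cropped_target))
    return float(total)
```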

5. The method of claim 3, further comprising, for each training image pair and for each of the plurality of update iterations: generating a warped second training image by warping the second training image in the training image pair using the respective modified optical flow estimate for the update iteration, wherein: the loss function further comprises a second term that measures, for each training image pair and for each of the plurality of update iterations, a photometric difference between the warped second training image for the training image pair and for the update iteration and the first training image in the training image pair.
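
The photometric term of claim 5 warps the second image toward the first using the flow, then compares the two. The sketch below makes several assumptions. It uses grayscale images of the same size as the flow, bilinear backward warping, and a mean absolute difference. It also excludes pixels whose sample location falls outside the image, and it ignores any crop offset between the flow and the images. The claim does not fix any of these choices.

```python
import numpy as np

def warp_backward(image, flow):
    """Sample `image` (H, W) at (x + dx, y + dy) with bilinear interpolation.

    Returns the warped image and a mask that is False where the sample
    location lies outside the image.
    """
    H, W = image.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = xs + flow[..., 0]
    y = ys + flow[..., 1]
    valid = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom, valid

def photometric_difference(image1, image2, flow):
    """Mean absolute difference between image1 and image2 warped by the flow (valid pixels only)."""
    warped, valid = warp_backward(image2, flow)
    diff = np.abs(image1 - warped)
    return float(diff[valid].mean()) if valid.any() else 0.0
```

Usage: if the second image is the first shifted one pixel to the right, a flow of dx = 1 everywhere gives a photometric difference of zero on the valid pixels.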

6. The method of claim 5, wherein the photometric difference is measured using an occlusion mask that masks out occluded pixels from contributing to the photometric difference.
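
Claim 6 does not say how the occlusion mask is computed. The sketch below swaps in one well-known heuristic, a forward-backward consistency check. A pixel is treated as occluded when its forward flow and the backward flow at its target location do not approximately cancel. The thresholds alpha1 and alpha2 are illustrative assumptions, and the sketch uses nearest-neighbor lookup to stay short. The masked loss averages only over visible pixels.

```python
import numpy as np

def fb_occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Return 1.0 for visible pixels and 0.0 for pixels flagged as occluded."""
    H, W, _ = flow_fw.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    xt = np.clip(np.rint(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.rint(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    bw_at_target = flow_bw[yt, xt]                # backward flow at the forward target
    mismatch = np.sum((flow_fw + bw_at_target) ** 2, axis=-1)
    bound = alpha1 * (np.sum(flow_fw ** 2, -1) + np.sum(bw_at_target ** 2, -1)) + alpha2
    return (mismatch <= bound).astype(float)

def masked_photometric(image1, warped2, mask):
    """Mean absolute difference over pixels where mask == 1 (occluded pixels excluded)."""
    diff = np.abs(image1 - warped2) * mask
    return float(diff.sum() / max(mask.sum(), 1.0))
```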

7. The method of claim 3, wherein the loss function comprises a third term that measures, for each training image pair and for each of the plurality of update iterations, an edge-aware smoothness of the respective modified optical flow estimate for the training image pair and for the update iteration.
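
An edge-aware smoothness term penalizes flow gradients but down-weights them where the image itself has strong gradients, so that motion boundaries can align with image edges. The sketch below uses a first-order penalty with exponential edge weights on a grayscale image. It is one common formulation, assumed here because the claim does not give the formula. The edge_weight value is illustrative only.

```python
import numpy as np

def edge_aware_smoothness(flow, image, edge_weight=150.0):
    """First-order edge-aware smoothness of flow (H, W, 2) given a grayscale image (H, W)."""
    dflow_x = np.abs(np.diff(flow, axis=1))       # (H, W-1, 2) horizontal flow gradients
    dflow_y = np.abs(np.diff(flow, axis=0))       # (H-1, W, 2) vertical flow gradients
    wx = np.exp(-edge_weight * np.abs(np.diff(image, axis=1)))[..., None]
    wy = np.exp(-edge_weight * np.abs(np.diff(image, axis=0)))[..., None]
    return float(0.5 * (np.mean(wx * dflow_x) + np.mean(wy * dflow_y)))
```

Usage: a flow discontinuity that coincides with an image edge costs almost nothing, while the same discontinuity in a flat image region is penalized.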

8. The method of claim 1, further comprising: obtaining one or more sequences of training images, each sequence comprising (i) a current training image, (ii) a preceding training image that precedes the current training image in the sequence, and (iii) a following training image that follows the current training image in the sequence; for each sequence: processing the current training image and the preceding training image in the sequence using the neural network to generate a backward optical flow estimate from the current training image to the preceding training image; processing the current training image and the following training image in the sequence using the neural network to generate a forward optical flow estimate from the current training image to the forward training image; generating, from the backward optical flow estimate, a prediction of the forward optical flow estimate; and generating an in-painted forward flow estimate from the prediction and from the forward optical flow estimate; computing a gradient with respect to the network parameters of a second loss function that includes a third term that measures, for each sequence, an error between the in-painted forward flow estimate and the forward optical flow estimate; and updating the network parameters using the gradient.
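
As a hand-crafted stand-in for the prediction step of claim 8, one can assume that each pixel of the current image moves with constant velocity. Under that assumption, its motion to the following image is the negation of its motion to the preceding image, both measured at the same pixel of the current image. Claim 9 instead uses a learned inversion model. The error term is sketched as a mean absolute difference, which is also an assumption.

```python
import numpy as np

def constant_velocity_prediction(flow_bw):
    """Predict forward flow (current -> following) from backward flow (current -> preceding).

    Hypothetical stand-in: under constant velocity, forward = -backward at each pixel.
    """
    return -flow_bw

def third_term(inpainted_fw, flow_fw):
    """Mean L1 error between the in-painted forward flow (fixed target) and the forward estimate."""
    return float(np.mean(np.abs(inpainted_fw - flow_fw)))
```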

9. The method of claim 8, wherein generating, from the backward optical flow estimate, a prediction of the forward optical flow estimate comprises: processing an input comprising at least the backward optical flow estimate using a learned inversion machine learning model that has been trained to generate the prediction.

10. The method of claim 9, wherein the input comprises normalized image coordinates of the pixels in the current image.
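
Claims 9 and 10 leave the inversion model's architecture open. The only stated requirement is that its input can include the backward flow and normalized pixel coordinates. The sketch below assumes the following:

- Coordinates are normalized to [-1, 1].
- The model is a per-pixel affine map.
- The map is fit by least squares on the visible pixels of a single sequence. This is one loose reading of the per-pair training in claim 11.

It is an illustration of the input layout, not the claimed model. All function names are hypothetical.

```python
import numpy as np

def inversion_inputs(flow_bw):
    """Stack backward flow with normalized (x, y) coordinates in [-1, 1] -> (H, W, 4)."""
    H, W, _ = flow_bw.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    return np.concatenate([flow_bw, xs[..., None], ys[..., None]], axis=-1)

def fit_linear_inversion(flow_bw, flow_fw, visible):
    """Least-squares affine map from inversion inputs to forward flow, fit on visible pixels."""
    x = inversion_inputs(flow_bw)[visible]              # (N, 4)
    x = np.concatenate([x, np.ones((len(x), 1))], 1)    # append a bias column
    y = flow_fw[visible]                                # (N, 2)
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights                                      # (5, 2)

def predict_inversion(weights, flow_bw):
    """Apply the fitted affine map at every pixel -> predicted forward flow (H, W, 2)."""
    x = inversion_inputs(flow_bw)
    x = np.concatenate([x, np.ones(x.shape[:2] + (1,))], -1)
    return x @ weights
```

Usage: if the forward flow on visible pixels equals the negated backward flow, the fit recovers that relation, and the prediction can then be evaluated at occluded pixels as well.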

11. The method of claim 9, wherein the learned inversion machine learning model is trained specifically for the preceding image—following image pair.

12. The method of claim 8, wherein generating an in-painted forward flow estimate from the prediction and from the forward optical flow estimate comprises: in-painting one or more occluded regions of the forward optical flow estimate using the prediction.
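
In-painting occluded regions can be sketched as a per-pixel selection. Occluded pixels take the prediction, and visible pixels keep the network's own forward estimate. The sketch assumes a boolean occlusion map is already available. Because visible pixels are unchanged, an error between the in-painted flow and the forward flow is non-zero only at occluded pixels.

```python
import numpy as np

def inpaint_forward_flow(flow_fw, prediction, occluded):
    """Fill occluded pixels of the forward flow with the prediction.

    flow_fw, prediction: (H, W, 2); occluded: boolean (H, W), True where occluded.
    """
    return np.where(occluded[..., None], prediction, flow_fw)
```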

13. The method of claim 1, wherein the optical flow estimate of optical flow between the first image and the second image includes, for each pixel in the first image, a respective offset of a corresponding pixel in the second image.
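
The per-pixel offset representation of claim 13 can be read as follows: pixel (x, y) in the first image corresponds to location (x + dx, y + dy) in the second image. The sketch assumes channel 0 holds the horizontal offset and channel 1 the vertical offset. That convention is an assumption, not stated in the claim.

```python
import numpy as np

def corresponding_pixel(flow, x, y):
    """Location in the second image matched to pixel (x, y) of the first image.

    Assumes flow[y, x] = (dx, dy) with dx horizontal and dy vertical.
    """
    dx, dy = flow[y, x]
    return float(x + dx), float(y + dy)
```

Offsets are real-valued, so the corresponding location generally falls between pixel centers. This is why warping typically uses interpolation.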

14. The method of claim 1, wherein: generating a modified first training image further comprises applying one or more data augmentations to the first training image in the pair; and generating a modified second training image further comprises applying one or more data augmentations to the second training image in the pair.
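
Claim 14 adds data augmentation when generating the modified images. The sketch below combines the shared crop with a brightness and contrast jitter applied identically to both images. The jitter ranges and the choice to share parameters across the pair are assumptions. A purely photometric change like this leaves the cropped flow target valid. Geometric augmentations beyond the shared crop, such as flips, would also require transforming the target flow. The function name make_modified_pair is hypothetical.

```python
import numpy as np

def make_modified_pair(img1, img2, window, rng):
    """Crop both images with the same window, then apply a shared photometric jitter.

    Images are assumed to be float arrays with values in [0, 1].
    """
    top, left, h, w = window
    c1 = img1[top:top + h, left:left + w]
    c2 = img2[top:top + h, left:left + w]
    gain = rng.uniform(0.8, 1.2)                  # shared contrast factor (assumed range)
    bias = rng.uniform(-0.1, 0.1)                 # shared brightness shift (assumed range)
    return np.clip(c1 * gain + bias, 0.0, 1.0), np.clip(c2 * gain + bias, 0.0, 1.0)
```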

15. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a neural network that has a plurality of network parameters and that is configured to receive as input a first image and a second image and to generate as output an optical flow estimate of optical flow between the first image and the second image, the operations comprising: obtaining a batch of one or more training image pairs, each training image pair comprising a respective first training image and a respective second training image; for each of the one or more training image pairs: processing the first training image and the second training image using the neural network to generate a final optical flow estimate from the first training image to the second training image; generating a cropped final optical flow estimate from the first training image to the second training image, comprising cropping the final optical flow estimate from the first training image to the second training image; and training the neural network on the one or more training image pairs, the training comprising, for each training image pair, using the cropped final optical flow estimate for the training image pair as a target output for the neural network.

16. A system comprising: one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations training a neural network that has a plurality of network parameters and that is configured to receive as input a first image and a second image and to generate as output an optical flow estimate of optical flow between the first image and the second image, the operations comprising: obtaining a batch of one or more training image pairs, each training image pair comprising a respective first training image and a respective second training image; for each of the one or more training image pairs: processing the first training image and the second training image using the neural network to generate a final optical flow estimate from the first training image to the second training image; generating a cropped final optical flow estimate from the first training image to the second training image, comprising cropping the final optical flow estimate from the first training image to the second training image; and training the neural network on the one or more training image pairs, the training comprising, for each training image pair, using the cropped final optical flow estimate for the training image pair as a target output for the neural network.

17. The system of claim 16, the operations further comprising, for each of the one or more training image pairs: generating a modified first training image, comprising cropping the first training image in the training image pair; generating a modified second training image, comprising cropping the second training image in the training image pair; and processing the modified first training image and the modified second training image using the neural network to generate one or more modified optical flow estimates, wherein training the neural network on the one or more training image pairs comprises: computing a gradient with respect to the network parameters of a loss function that comprises a first term that measures, for each training image pair, an error between (i) the one or more modified optical flow estimates for the training image pair and (ii) the cropped final optical flow estimate for the training image pair; and updating the network parameters using the gradient.

18. The system of claim 17, wherein the neural network is configured to generate the optical flow estimate of optical flow between the first image and the second image by: initializing the optical flow estimate, and at each of a plurality of update iterations, updating the optical flow estimate using features of the first image and the second image, and wherein the one or more modified optical flow estimates include a respective modified optical flow estimate for each of the update iterations.

19. The system of claim 18, wherein the first term measures, for each training image pair and for each of the plurality of update iterations, an error between (i) the respective modified optical flow estimate for the update iteration for the training image pair and (ii) the cropped final optical flow estimate for the training image pair.

20. The system of claim 18, further comprising, for each training image pair and for each of the plurality of update iterations: generating a warped second training image by warping the second training image in the training image pair using the respective modified optical flow estimate for the update iteration, wherein: the loss function further comprises a second term that measures, for each training image pair and for each of the plurality of update iterations, a photometric difference between the warped second training image for the training image pair and for the update iteration and the first training image in the training image pair.

Patent Metadata

Filing Date

Unknown

Publication Date

February 18, 2025

Inventors

Daniel Rudolf Maurer

Austin Charles Stone

Alper Ayvaci

Anelia Angelova

Rico Jonschkowski
