Legal claims defining the scope of protection, as filed with the USPTO.
1. A method of forecasting multiple positions of a subject depicted by an image, the method comprising: receiving, by a forecasting neural network, an image depicting a subject, wherein the forecasting neural network comprises an encoder neural network, a recurrent neural network, and a decoder neural network; extracting, by the encoder neural network, a feature of the received image; providing the extracted feature to the recurrent neural network; determining, by the recurrent neural network, a first modification to the extracted feature; determining, by the recurrent neural network and based on the first modification, a second modification to the extracted feature; generating, by the recurrent neural network, a first forecasted feature based on the determined first modification and a second forecasted feature based on the determined second modification; providing the first forecasted feature and the second forecasted feature to the decoder neural network; and generating, by the decoder neural network, a first set of keypoints and a second set of keypoints, wherein each keypoint in the first set of keypoints indicates a forecasted position of a respective portion of the image subject and each keypoint in the second set of keypoints indicates a second forecasted position of the respective portion of the image subject.
2. The method of claim 1, wherein the recurrent neural network includes a long short term memory (LSTM) neural network.
3. The method of claim 1, wherein the recurrent neural network provides the first forecasted feature based on a convolution of (i) the extracted feature and (ii) memory information describing the first modification.
4. The method of claim 1, further comprising: identifying, by the decoder neural network, a type of each keypoint in the first set of keypoints; and generating a pose based on the identified type of each keypoint in the first set of keypoints and information indicating connections between the identified types.
5. The method of claim 1, further comprising: extracting, by a particular layer included in the encoder, an additional feature from the image; providing the additional extracted feature to an additional recurrent neural network; determining, by the additional recurrent neural network, an additional modification to the additional extracted feature; generating, by the additional recurrent neural network, an additional forecasted feature based on the additional modification; and generating, by an associated layer included in the decoder neural network, an additional set of forecasted keypoints based on the additional forecasted feature.
6. The method of claim 1, further comprising: producing, with a flow decoder neural network, a first set of motion vectors based on the first forecasted feature and a second set of motion vectors based on the second forecasted feature; wherein each motion vector in the first set of motion vectors and each motion vector in the second set of motion vectors corresponds to a respective pixel in the received image.
7. The method of claim 6, further comprising: determining a first forecasted position of each pixel in the received image based on the motion vector, from the first set of motion vectors, that corresponds to the respective pixel in the received image; generating a first image based on the first forecasted position of each pixel in the received image; determining a second forecasted position of each pixel in the received image based on the motion vector, from the second set of motion vectors, that corresponds to the respective pixel in the received image; and generating a second image based on the second forecasted position of each pixel in the received image.
8. A non-transitory computer-readable medium embodying program code for producing multiple poses from an input image, the program code comprising instructions which, when executed by a processor, cause the processor to perform operations comprising: receiving, by a forecasting neural network, an image depicting a subject, wherein the forecasting neural network comprises an encoder neural network, a recurrent neural network, and a decoder neural network; extracting, by the encoder neural network, a feature of the received image; providing the extracted feature to the recurrent neural network; determining, by the recurrent neural network, a first modification to the extracted feature; determining, by the recurrent neural network and based on the first modification, a second modification to the extracted feature; generating, by the recurrent neural network, a first forecasted feature based on the determined first modification and a second forecasted feature based on the determined second modification; providing the first forecasted feature and the second forecasted feature to the decoder neural network; and generating, by the decoder neural network, a first set of keypoints and a second set of keypoints, wherein each keypoint in the first set of keypoints indicates a forecasted position of a respective portion of the image subject and each keypoint in the second set of keypoints indicates a second forecasted position of the respective portion of the image subject.
9. The non-transitory computer-readable medium of claim 8, wherein the recurrent neural network includes a long short term memory (LSTM) neural network.
10. The non-transitory computer-readable medium of claim 8, wherein the recurrent neural network provides the first forecasted feature based on a convolution of (i) the extracted feature and (ii) memory information describing the first modification.
11. The non-transitory computer-readable medium of claim 8, the operations further comprising: identifying, by the decoder neural network, a type of each keypoint in the first set of keypoints; and generating a pose based on the identified type of each keypoint in the first set of keypoints and information indicating connections between the identified types.
12. The non-transitory computer-readable medium of claim 8, the operations further comprising: extracting, by a particular layer included in the encoder, an additional feature from the image; providing the additional extracted feature to an additional recurrent neural network; determining, by the additional recurrent neural network, an additional modification to the additional extracted feature; generating, by the additional recurrent neural network, an additional forecasted feature based on the additional modification; and generating, by an associated layer included in the decoder neural network, an additional set of forecasted keypoints based on the additional forecasted feature.
13. The non-transitory computer-readable medium of claim 8, the operations further comprising: producing, with a flow decoder neural network, a first set of motion vectors based on the first forecasted feature and a second set of motion vectors based on the second forecasted feature; wherein each motion vector in the first set of motion vectors and each motion vector in the second set of motion vectors corresponds to a respective pixel in the received image.
14. The non-transitory computer-readable medium of claim 13, the operations further comprising: determining a first forecasted position of each pixel in the received image based on the motion vector, from the first set of motion vectors, that corresponds to the respective pixel in the received image; generating a first image based on the first forecasted position of each pixel in the received image; determining a second forecasted position of each pixel in the received image based on the motion vector, from the second set of motion vectors, that corresponds to the respective pixel in the received image; and generating a second image based on the second forecasted position of each pixel in the received image.
15. A system for producing multiple poses from an input image, the system comprising: a means for receiving, by a forecasting neural network, an image depicting a subject, wherein the forecasting neural network comprises an encoder neural network, a recurrent neural network, and a decoder neural network; a means for extracting, by the encoder neural network, a feature of the received image; a means for providing the extracted feature to the recurrent neural network; a means for determining, by the recurrent neural network, a first modification to the extracted feature; a means for determining, by the recurrent neural network and based on the first modification, a second modification to the extracted feature; a means for generating, by the recurrent neural network, a first forecasted feature based on the determined first modification and a second forecasted feature based on the determined second modification; a means for providing the first forecasted feature and the second forecasted feature to the decoder neural network; and a means for generating, by the decoder neural network, a first set of keypoints and a second set of keypoints, wherein each keypoint in the first set of keypoints indicates a forecasted position of a respective portion of the image subject and each keypoint in the second set of keypoints indicates a second forecasted position of the respective portion of the image subject.
16. The system of claim 15, wherein the recurrent neural network provides the first forecasted feature based on a convolution of (i) the extracted feature and (ii) memory information describing the first modification.
17. The system of claim 15, further comprising: a means for identifying, by the decoder neural network, a type of each keypoint in the first set of keypoints; and a means for generating a pose based on the identified type of each keypoint in the first set of keypoints and information indicating connections between the identified types.
18. The system of claim 15, further comprising: a means for extracting, by a particular layer included in the encoder, an additional feature from the image; a means for providing the additional extracted feature to an additional recurrent neural network; a means for determining, by the additional recurrent neural network, an additional modification to the additional extracted feature; a means for generating, by the additional recurrent neural network, an additional forecasted feature based on the additional modification; and a means for generating, by an associated layer included in the decoder neural network, an additional set of forecasted keypoints based on the additional forecasted feature.
19. The system of claim 15, further comprising: a means for producing, with a flow decoder neural network, a first set of motion vectors based on the first forecasted feature and a second set of motion vectors based on the second forecasted feature; wherein each motion vector in the first set of motion vectors and each motion vector in the second set of motion vectors corresponds to a respective pixel in the received image.
20. The system of claim 19, further comprising: a means for determining a first forecasted position of each pixel in the received image based on the motion vector, from the first set of motion vectors, that corresponds to the respective pixel in the received image; a means for generating a first image based on the first forecasted position of each pixel in the received image; a means for determining a second forecasted position of each pixel in the received image based on the motion vector, from the second set of motion vectors, that corresponds to the respective pixel in the received image; and a means for generating a second image based on the second forecasted position of each pixel in the received image.
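The data flow recited in claim 1 (encoder, then recurrent network, then decoder) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the patented implementation: the "networks" are untrained stand-ins, and every name here (`encode`, `ToyRecurrentCell`, `forecast_features`, `decode`, the `drift` and `decay` values, and the keypoint offsets) is hypothetical. The sketch only shows how a second modification can depend on the first, and how each forecasted feature yields its own set of keypoints.

```python
from typing import List, Tuple

Feature = List[float]
Keypoint = Tuple[float, float]


def encode(image: List[List[float]]) -> Feature:
    """Toy encoder: the feature is the intensity-weighted centroid (x, y)."""
    total = sum(sum(row) for row in image)
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    return [cx, cy]


class ToyRecurrentCell:
    """Toy recurrent unit: each modification is computed from the previous one."""

    def __init__(self, drift: Feature, decay: float) -> None:
        self.drift = drift  # hypothetical fixed stand-in for learned motion
        self.decay = decay  # how much of the previous modification carries over

    def step(self, previous_modification: Feature) -> Feature:
        return [self.decay * p + d
                for p, d in zip(previous_modification, self.drift)]


def forecast_features(feature: Feature, cell: ToyRecurrentCell,
                      steps: int) -> List[Feature]:
    """Apply successive modifications to the same extracted feature."""
    modification = [0.0] * len(feature)
    forecasts = []
    for _ in range(steps):
        # The second modification is determined from the first, and so on.
        modification = cell.step(modification)
        forecasts.append([f + m for f, m in zip(feature, modification)])
    return forecasts


def decode(feature: Feature, offsets: List[Keypoint]) -> List[Keypoint]:
    """Toy decoder: one keypoint per offset, placed relative to the feature."""
    return [(feature[0] + dx, feature[1] + dy) for dx, dy in offsets]


# A 3x3 "image" whose subject is a single bright pixel in the centre.
image = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
offsets = [(0.0, -1.0), (0.0, 1.0)]  # hypothetical "head" and "foot" keypoints

extracted = encode(image)
cell = ToyRecurrentCell(drift=[1.0, 0.0], decay=0.5)
first_feature, second_feature = forecast_features(extracted, cell, steps=2)
first_keypoints = decode(first_feature, offsets)
second_keypoints = decode(second_feature, offsets)
print(first_keypoints, second_keypoints)
```

In a real system the encoder, recurrent network, and decoder would be trained neural networks (claim 2 names an LSTM as one option for the recurrent part); the fixed arithmetic above merely makes the two-step dependency easy to trace by hand.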
October 9, 2018