A method of video coding at a video coding device includes performing a deformable convolution through a deformable convolutional deep neural network (DNN) to generate one or more first feature maps based on a set of one or more previously reconstructed reference frames, generating a predicted frame based on the one or more first feature maps, and reconstructing a current frame based on the predicted frame. In an embodiment, a set of one or more second feature maps corresponding to the one or more previously reconstructed reference frames can be generated based on a feature extraction DNN. One or more offset maps corresponding to the one or more second feature maps can be generated, respectively, using an offset generation DNN.
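As an illustration of the deformable convolution named in the abstract, the following is a minimal single-channel 2D sketch in plain Python. It is not the patented network. The function names, the `(dy, dx)` offset layout, the zero padding outside the feature map, and the use of bilinear interpolation for fractional positions are all assumptions made for this example. In the described method the offsets would come from an offset generation DNN, whereas here they are supplied by hand.

```python
import math

def bilinear_sample(feature, y, x):
    """Sample a 2D list at a fractional (y, x); positions outside the map read as zero."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value

def deformable_conv2d(feature, kernel, offsets):
    """Hypothetical helper: offsets[i][j][t] is the (dy, dx) shift applied to
    kernel tap t at output position (i, j). Zero offsets give an ordinary
    'same'-size convolution (cross-correlation form)."""
    h, w = len(feature), len(feature[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for t in range(k * k):
                ky, kx = divmod(t, k)
                dy, dx = offsets[i][j][t]
                # Regular grid position of the tap, displaced by its learned offset.
                sample = bilinear_sample(feature, i + ky - r + dy, j + kx - r + dx)
                out[i][j] += kernel[ky][kx] * sample
    return out

def constant_offsets(h, w, k, dy, dx):
    """Build an offset map that applies the same shift to every tap (toy input)."""
    return [[[(dy, dx)] * (k * k) for _ in range(w)] for _ in range(h)]

feature = [[float(4 * i + j) for j in range(4)] for i in range(3)]
center_tap = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

unshifted = deformable_conv2d(feature, center_tap, constant_offsets(3, 4, 3, 0.0, 0.0))
shifted = deformable_conv2d(feature, center_tap, constant_offsets(3, 4, 3, 0.0, 1.0))
half = deformable_conv2d(feature, center_tap, constant_offsets(3, 4, 3, 0.0, 0.5))
```

With a kernel that passes only its center tap, a zero offset map reproduces the input, an offset of one pixel to the right reads each value from its right-hand neighbor, and a half-pixel offset blends the two. This shows how an offset map lets a convolution sample content displaced between frames without an explicit motion vector.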
3. The method of claim 2, wherein the target frame neighbors the current frame when the current frame and the one or more previously reconstructed reference frames are arranged in a display order.
9. The method of claim 8, wherein the deformable convolutional DNN includes one or more 3D deformable convolution layers such that each of the one or more 3D deformable convolution layers is associated with a 3D deformable convolution kernel and a 3D offset map, and a 3D deformable convolution is performed based on the respective 3D deformable convolution kernel and the respective 3D offset map at one of the one or more 3D deformable convolution layers.
11. The method of claim 1, wherein the 4D tensor is formed by stacking the one or more previously reconstructed reference frames such that the 4D tensor includes a plurality of channels and a plurality of resolutions.
13. The method of claim 12, wherein the reference frames are selected from a sequence of frames in a video based on a temporal down-sampling operation, and a frame in the sequence of frames that is not selected by the down-sampling operation is used as the ground-truth frame.
15. The method of claim 14, wherein the target frame is the reference frame that neighbors the ground-truth frame when the ground-truth frame and the reference frames are arranged in display order.
19. The non-transitory computer-readable medium of claim 18, wherein the target frame neighbors the current frame when the current frame and the one or more previously reconstructed reference frames are arranged in a display order.
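The frame selection described in claims 13 and 15 can be sketched as follows. This is a hedged illustration only: the function names, the fixed down-sampling `step`, and the reading of "neighbors" as "closest in display order, with ties going to the earlier frame" are assumptions of this example and are not taken from the claims.

```python
def split_by_temporal_downsampling(num_frames, step):
    """Keep every `step`-th frame index as a reference frame; the frames that
    are skipped become candidates for the ground-truth frame."""
    references = list(range(0, num_frames, step))
    ground_truth_candidates = [i for i in range(num_frames) if i % step != 0]
    return references, ground_truth_candidates

def neighboring_reference(references, ground_truth_index):
    """Pick the reference closest to the ground-truth frame in display order.
    Ties are broken in favor of the earlier frame (an assumption)."""
    return min(references, key=lambda r: (abs(r - ground_truth_index), r))

# Toy sequence of 7 frames, indexed 0..6 in display order, down-sampled by 2.
references, candidates = split_by_temporal_downsampling(7, 2)
target_for_frame_3 = neighboring_reference(references, 3)
```

Here frames 0, 2, 4 and 6 serve as reference frames, frames 1, 3 and 5 are available as ground truth, and frame 2 is chosen as the target frame when frame 3 is the ground truth.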
May 13, 2021
June 27, 2023