Image Compression with Recurrent Neural Networks

Published: January 29, 2019

Assignee: not available in the USPTO data

Inventors: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer implemented method for compressing an image, comprising:
    obtaining an initial input image;
    processing the initial input image and subsequent input images using a neural network system until a compressed representation of the input image with target characteristics is achieved, comprising, for each of a plurality of iterations:
        identifying an input image for the iteration, wherein (i) for a first iteration of the plurality of iterations the input image is the initial input image and (ii) for each iteration of the plurality of iterations other than the first iteration, the input image is a residual error image between a reconstruction of the input image generated at a preceding iteration and the initial input image;
        processing the input image for the iteration through an encoder recurrent neural network to generate a compressed representation of the input image for the iteration;
        processing the compressed representation of the input image for the iteration through a decoder recurrent neural network to generate a reconstruction of the input image for the iteration;
        determining a residual error image between the reconstruction of the input image for the iteration and the initial input image;
        determining, from at least one of the residual error image for the iteration or the compressed representation of the input image for the iteration, whether the target characteristics have been achieved; and
    in response to determining that the target characteristics have been achieved, providing a compressed representation of the initial input image that comprises the compressed representation of the input image for one or more of the iterations of the plurality of iterations.
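The iteration recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed neural network system: `ToyCodec` is a hypothetical stand-in whose "encoder" emits one sign bit per pixel and whose "decoder" keeps a running estimate and a step size as its recurrent state. The stopping test (maximum absolute residual), the parameter names, and the assumed pixel range of [-1, 1] are illustrative assumptions only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCodec:
    """Hypothetical stand-in for the encoder/decoder pair (not a neural network)."""
    step: float = 0.5                               # state: current refinement step
    estimate: list = field(default_factory=list)    # state: running reconstruction

    def encode(self, input_image):
        # One bit per pixel: the sign of the input, in {-1, +1}.
        return [1 if value >= 0 else -1 for value in input_image]

    def decode(self, bits):
        # Refine the running reconstruction, then halve the step for next time.
        if not self.estimate:
            self.estimate = [0.0] * len(bits)
        self.estimate = [e + self.step * b for e, b in zip(self.estimate, bits)]
        self.step /= 2
        return list(self.estimate)


def compress(image, max_abs_error=0.05, max_iterations=16):
    """Iterate until the residual is small enough (assumed target characteristic)."""
    codec = ToyCodec()
    codes = []
    reconstruction = [0.0] * len(image)
    current_input = list(image)          # first iteration: the initial input image
    for _ in range(max_iterations):
        bits = codec.encode(current_input)
        codes.append(bits)
        reconstruction = codec.decode(bits)
        # Residual between this iteration's reconstruction and the initial image.
        residual = [x - r for x, r in zip(image, reconstruction)]
        if max(abs(r) for r in residual) <= max_abs_error:
            break
        current_input = residual         # later iterations: the residual image
    return codes, reconstruction
```

In this toy setting each extra iteration halves the worst-case error, so the list of per-iteration codes plays the role of the compressed representation and a decoder can stop early for a coarser result.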

2. The method of claim 1, wherein the target characteristics include one or more of (i) a target quality metric, and (ii) a target image compression rate.
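As a hedged illustration of the two kinds of target characteristics in claim 2, the sketch below checks a quality target and a rate target. The claim does not name a metric; PSNR as the quality metric, bits per pixel as the rate measure, and the function and parameter names are all assumptions made for this example.

```python
import math


def psnr(original, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB (an assumed choice of quality metric)."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def bits_per_pixel(codes, num_pixels):
    """Total bits emitted over all iterations, divided by the number of pixels."""
    return sum(len(code) for code in codes) / num_pixels


def targets_achieved(original, reconstruction, codes, min_psnr_db=None, max_bpp=None):
    """Return True once any configured target has been reached."""
    if min_psnr_db is not None and psnr(original, reconstruction) >= min_psnr_db:
        return True   # quality target met
    if max_bpp is not None and bits_per_pixel(codes, len(original)) >= max_bpp:
        return True   # rate budget used up
    return False
```

Either target can be left as `None`, which mirrors the "one or more of" wording: a caller may stop on quality alone, on rate alone, or on whichever is reached first.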

3. The method of claim 1, wherein the initial input image is a patch of a larger image.

4. The method of claim 3, further comprising, processing each other patch of the larger image to generate corresponding compressed representations with independent target characteristics for each other patch of the larger image, wherein the corresponding compressed representations with target characteristics have varying image compression rates.

5. The method of claim 1, wherein the encoder recurrent neural network includes a binarizing neural network layer configured to receive a first stack output as input and generate a binarized output, wherein the binarized output is the compressed representation of the input image for the iteration.

6. The method of claim 5, wherein the compressed representation of the input image for the iteration has a predetermined number of bits.

7. The method of claim 6, wherein the number of bits in the compressed representation of the input image may be varied by varying a number of nodes in the binarizing neural network layer before training.

8. The method of claim 6, wherein the number of bits in the compressed representation of the input image corresponds to a number of rows in a linear weight matrix that is used to transform an activation from a previous layer in the neural network system.
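The relationship in claims 6 to 8 between the bit count and the weight matrix can be shown with a small sketch. The function name, the example weights, and the deterministic thresholding at zero are assumptions for illustration; the claims only state that the number of bits corresponds to the number of rows.

```python
import math


def binarizing_layer(weights, activation):
    """weights: m rows of n values; activation: n values; returns m bits in {-1, +1}."""
    # Linear transform: one output per row of the weight matrix.
    pre_activations = [sum(w * a for w, a in zip(row, activation)) for row in weights]
    # tanh squashes each output into the open interval (-1, 1).
    squashed = [math.tanh(p) for p in pre_activations]
    # Assumed deterministic rule: threshold at zero to obtain one bit per output.
    return [1 if s >= 0 else -1 for s in squashed]
```

With a 3 x 4 weight matrix the layer emits exactly three bits whatever the input values, so choosing the number of rows before training fixes the size of the per-iteration code.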

9. The method of claim 5, wherein the binarized output includes a respective discrete representation for each of a predetermined number of output bits, wherein the discrete representations are each in the set {−1,1}, and wherein the binarizing neural network layer is further configured to: process the received first stack output to generate a binarizing neural network layer output with the predetermined number of outputs, wherein the value of each output in the predetermined number of outputs is a real number in a continuous interval between −1 and 1; and for each output in the predetermined number of outputs, produce a corresponding discrete representation of the output in the set {−1,1}.
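Claim 9 maps each real value in the interval between -1 and 1 to a discrete value in {-1, 1} without fixing the rule. The sketch below shows two possible rules; both are assumptions rather than the claimed method. The stochastic rule picks +1 with probability (1 + x) / 2, so the expected output equals x, and the deterministic rule simply thresholds at zero.

```python
import random


def binarize_stochastic(values, rng):
    """Sample +1 with probability (1 + x) / 2, else -1 (assumed sampling rule)."""
    return [1 if rng.random() < (1 + x) / 2 else -1 for x in values]


def binarize_deterministic(values):
    """Threshold at zero (assumed rule, e.g. for use after training)."""
    return [1 if x >= 0 else -1 for x in values]
```

For example, `binarize_stochastic([0.5], random.Random(0))` returns `[1]` about three times out of four on average, while inputs of exactly 1.0 or -1.0 always map to themselves.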

10. The method of claim 5, wherein the binarizing neural network layer is (i) a fully connected neural network layer with hyperbolic tangent activations, or (ii) a convolutional neural network layer followed by a stochastic binary sampler.

11. The method of claim 1, wherein the encoder neural network comprises one or more LSTM neural network layers and one or more convolutional neural network layers, and the decoder neural network comprises one or more LSTM neural network layers and one or more convolutional neural network layers.

12. The method of claim 1, wherein the encoder neural network comprises one or more LSTM neural network layers and one or more convolutional LSTM neural network layers, and the decoder neural network comprises one or more LSTM neural network layers and one or more deconvolutional LSTM neural network layers.

13. The method of claim 1, wherein the neural network system is trained using a single training procedure to learn to generate compressed representations of input images, wherein the training procedure does not depend on a dimension of the input images or a desired compression rate of the generated compressed representations of input images.

14. One or more non-transitory computer-readable media having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform operations comprising:
    obtaining an initial input image;
    processing the initial input image and subsequent input images using a neural network system until a compressed representation of the input image with target characteristics is achieved, comprising, for each of a plurality of iterations:
        identifying an input image for the iteration, wherein (i) for a first iteration of the plurality of iterations the input image is the initial input image and (ii) for each iteration of the plurality of iterations other than the first iteration, the input image is a residual error image between a reconstruction of the input image generated at a preceding iteration and the initial input image;
        processing the input image for the iteration through an encoder recurrent neural network to generate a compressed representation of the input image for the iteration;
        processing the compressed representation of the input image for the iteration through a decoder recurrent neural network to generate a reconstruction of the input image for the iteration;
        determining a residual error image between the reconstruction of the input image for the iteration and the initial input image;
        determining, from at least one of the residual error image for the iteration or the compressed representation of the input image for the iteration, whether the target characteristics have been achieved; and
    in response to determining that the target characteristics have been achieved, providing a compressed representation of the initial input image that comprises the compressed representation of the input image for one or more of the iterations of the plurality of iterations.

15. A neural network system implemented by one or more computers, the neural network system comprising:
    an encoder neural network comprising one or more LSTM neural network layers and one or more non-LSTM neural network layers, wherein the encoder neural network is configured to receive an initial input image at a first of a plurality of iterations and a subsequent input image at each iteration of the plurality of iterations other than the first iteration and generate a first stack output at each of the plurality of iterations;
    a binarizing neural network layer configured to generate a binarized output at each of the plurality of iterations;
    a decoder neural network comprising one or more LSTM neural network layers and one or more non-LSTM neural network layers, wherein the decoder neural network is configured to generate a reconstruction of the input image at each of the plurality of iterations; and
    a residual error calculator configured to determine a residual error image between the reconstruction of the input image and the initial input image and determine, from at least one of the residual error image or the binarized output, whether the binarized output achieves one or more target characteristics.

16. The system of claim 15, wherein the binarized output includes a respective discrete representation for each of a predetermined number of output bits, wherein the discrete representations are each in the set {−1,1}, and wherein the binarizing neural network layer is further configured to: process the received first stack output to generate a binarizing neural network layer output with the predetermined number of outputs, wherein the value of each output in the predetermined number of outputs is a real number in a continuous interval between −1 and 1; and for each output in the predetermined number of outputs, produce a corresponding discrete representation of the output in the set {−1,1}.

17. The system of claim 15, wherein the generated binarized output is a compressed representation of the input image for the iteration and wherein the number of bits in the compressed representation of the input image may be varied by varying a number of nodes in the binarizing neural network layer before training.

18. The system of claim 17, wherein the number of bits in the compressed representation of the input image corresponds to a number of rows in a linear weight matrix that is used to transform an activation from a previous layer in the neural network system.

19. The system of claim 15, wherein the binarizing neural network layer is (i) a fully connected neural network layer with hyperbolic tangent activations, or (ii) a convolutional neural network layer followed by a stochastic binary sampler.

20. The system of claim 15, wherein the neural network system is trained using a single training procedure to learn to generate compressed representations of input images, wherein the training procedure does not depend on a dimension of the input images or a desired compression rate of the generated compressed representations of input images.

Patent Metadata

Filing Date: Unknown

Publication Date: January 29, 2019

Inventors: George Dan Toderici, Sean O'Malley, Rahul Sukthankar, Sung Jin Hwang, Damien Vincent, Nicholas Johnston, David Charles Minnen, Joel Shor, Michele Covell
