Neural Network Layers with a Controlled Degree of Spatial Invariance

Published: April 1, 2025

Assignee: not available in our USPTO data

Inventors: Gamaleldin Elsayed, Prajit Ramachandran, Jon Shlens, Simon Kornblith

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computing system for relaxing spatial invariance in a neural network, comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store: a neural network comprising one or more layers with relaxed spatial invariance, wherein each of the one or more layers is configured to: receive a respective layer input; convolve a plurality of different kernels against the respective layer input to generate a plurality of intermediate outputs, each of the plurality of intermediate outputs having a plurality of portions; relax spatial invariance of the plurality of different kernels by applying, for each of the plurality of intermediate outputs, a respective plurality of weights respectively associated with the plurality of portions to generate a respective weighted output; and generate a respective layer output based on the weighted outputs; and instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations including using the neural network to process a network input to generate a network output.
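As a rough illustration of the layer recited in claim 1, the sketch below convolves several kernels against an input, multiplies each intermediate output elementwise by its own position-dependent weight map, and combines the results. It is an assumption-laden sketch, not the patented implementation: the function names, the array shapes, the zero "same" padding, the treatment of each spatial position as a "portion", and the use of a sum to combine the weighted outputs are all choices made here for illustration.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive zero-padded cross-correlation.

    x: (H, W, C_in) input; kernel: (kh, kw, C_in) with odd kh and kw.
    Returns an (H, W) map, so every kernel sees all input channels.
    """
    H, W, _ = x.shape
    kh, kw, _ = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw, :] * kernel)
    return out

def relaxed_invariance_layer(x, kernels, weights):
    """kernels: (K, kh, kw, C_in); weights: (K, H, W), one weight per kernel per position."""
    # One intermediate output per kernel, each shared across all positions.
    intermediates = np.stack([conv2d_same(x, k) for k in kernels])  # (K, H, W)
    # Position-dependent weights are what relax the spatial invariance.
    weighted = weights * intermediates
    # Assumption: the layer output is the sum of the weighted outputs.
    return weighted.sum(axis=0)
```

With a single kernel and all weights equal to one, this reduces to an ordinary convolution; letting the weights vary by position makes the effective filter differ from one location to another.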

2. The computing system of claim 1, wherein, for at least one of the one or more layers, the respective plurality of weights for each of the intermediate outputs comprises a respective plurality of learned weight parameter values.

3. The computing system of claim 1, wherein, for at least one of the one or more layers, the respective plurality of weights for each of the intermediate outputs comprises, for each intermediate output, a plurality of sums respectively of a plurality of row weight values and a plurality of column weight values.
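One way to read claim 3 is that the weight at row i and column j for a given kernel is the sum of a per-row value and a per-column value. The sketch below builds such weight maps with NumPy broadcasting; the function name and the array shapes are assumptions introduced for illustration.

```python
import numpy as np

def row_column_weights(row_w, col_w):
    """Build factored weight maps.

    row_w: (K, H) row weight values; col_w: (K, W) column weight values.
    Returns (K, H, W) with weights[k, i, j] = row_w[k, i] + col_w[k, j].
    """
    return row_w[:, :, None] + col_w[:, None, :]
```

Under this reading, each kernel needs H + W weight values rather than H * W.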

4. The computing system of claim 1, wherein at least one of the one or more layers comprises one or more learned subnetworks that receive one or more portions of the respective layer input and, in response, predict the respective plurality of weights for one or more of the intermediate outputs.
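Claim 4 does not specify the form of the learned subnetwork. Purely as an illustration, the sketch below uses the simplest stand-in, a per-position linear map from the input channels to one weight per kernel. The names predict_weights, proj, and offset are hypothetical, and an actual subnetwork could be considerably more elaborate.

```python
import numpy as np

def predict_weights(x, proj, offset):
    """Predict per-position weights from the layer input.

    x: (H, W, C) layer input; proj: (C, K) and offset: (K,) are the
    learned parameters of this hypothetical one-layer subnetwork.
    Returns (K, H, W): one weight map per intermediate output.
    """
    logits = x @ proj + offset          # (H, W, K)
    return np.moveaxis(logits, -1, 0)   # (K, H, W)
```

Because the weights here depend on the input rather than being fixed per position, they change with the content of each example.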

5. The computing system of claim 1, wherein each of the one or more layers is further configured to apply, to the respective layer output, a respective bias, the respective bias comprising a plurality of bias values respectively associated with a plurality of rows and a plurality of columns of the respective layer output.
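The bias of claim 5 could be sketched as follows, assuming a single-channel layer output and assuming that the row and column bias values are combined by addition. Both assumptions are made here for illustration; the claim only requires bias values associated with rows and columns.

```python
import numpy as np

def add_row_column_bias(layer_output, row_bias, col_bias):
    """layer_output: (H, W); row_bias: (H,); col_bias: (W,).

    Adds row_bias[i] + col_bias[j] to the output at position (i, j).
    """
    return layer_output + row_bias[:, None] + col_bias[None, :]
```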

6. The computing system of claim 1, wherein the network input comprises: an image; a video; a point cloud; text; or a spectrogram dataset.

7. The computing system of claim 1, wherein the network output identifies: one or more objects contained in the network input; one or more properties of the network input; or one or more frequency positions of the network input.

8. The computing system of claim 1, wherein the number of kernels in the plurality of different kernels is based at least in part on a user-specified spatial rank parameter.

9. The computing system of claim 1, further comprising: pre-processing the network input to normalize a spatial configuration of one or more objects included in the network input.

10. The computing system of claim 1, wherein the sum of weights of the plurality of weights respectively associated with a portion of the plurality of portions is 1.
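Claim 10 requires only that the weights associated with a portion sum to 1; it does not say how that is achieved. One common way to obtain such weights, assumed here for illustration, is a softmax across the kernel axis of unconstrained values.

```python
import numpy as np

def normalize_over_kernels(logits):
    """logits: (K, H, W) unconstrained values.

    Returns (K, H, W) weights that are positive and sum to 1 over the
    K kernels at every position (softmax is an assumed choice).
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=0, keepdims=True)
```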

11. The computing system of claim 1, wherein each of the plurality of different kernels comprises a tensor having a channel depth greater than one.

12. The computing system of claim 1, wherein for each of the one or more layers, the respective layer input comprises a plurality of channels, and wherein to convolve the plurality of different kernels against the respective layer input the layer is configured to convolve each of the plurality of different kernels against all of the plurality of channels of the respective layer input.

13. A computer-implemented method, comprising: obtaining, by one or more computing devices, a neural network comprising one or more layers with relaxed spatial invariance, each of the one or more layers configured to: convolve a plurality of different kernels against a respective layer input to generate a plurality of intermediate outputs, each of the plurality of intermediate outputs having a plurality of portions; apply, for each of the plurality of intermediate outputs, a respective plurality of weights respectively associated with the plurality of portions to generate a respective weighted output; and generate a respective layer output based on the weighted output; obtaining, by the one or more computing devices, a network input; processing, by the one or more computing devices, the network input using the neural network to receive a network output; evaluating, by the one or more computing devices, a loss function that evaluates a difference between the network output and a ground truth associated with network output; modifying, by the one or more computing devices and based on the loss function, one or more values for one or both of: one or more parameters of at least one kernel of the plurality of different kernels; or one or more of the respective plurality of weights for at least one kernel of the plurality of different kernels; and providing, by the one or more computing devices, the plurality of different kernels and the respective plurality of weights as a trained model.
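The training method of claim 13 can be sketched in miniature as below. The sketch makes several simplifying assumptions that are not in the claim: a single-channel input, a mean-squared-error loss, plain gradient descent, and updates to the position weights only while the kernels stay fixed (the option singled out in claim 15). All function and parameter names are hypothetical.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Zero-padded cross-correlation of a single-channel x (H, W) with an odd-sized kernel."""
    kh, kw = kernel.shape
    H, W = x.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    return np.array([[np.sum(padded[i:i + kh, j:j + kw] * kernel)
                      for j in range(W)] for i in range(H)])

def train_weights(x, target, kernels, weights, lr=0.01, steps=200):
    """Fit the (K, H, W) position weights to a ground-truth target of shape (H, W)."""
    # Kernels are frozen in this sketch, so the intermediate outputs never change.
    intermediates = np.stack([conv2d_same(x, k) for k in kernels])  # (K, H, W)
    losses = []
    for _ in range(steps):
        output = (weights * intermediates).sum(axis=0)   # network output
        error = output - target
        losses.append(float(np.mean(error ** 2)))        # evaluate the loss
        # Gradient of the mean squared error with respect to each weight.
        grad = 2.0 * error[None, :, :] * intermediates / error.size
        weights = weights - lr * grad                     # modify the weights
    # Provide the kernels and weights together as the trained model.
    return {"kernels": kernels, "weights": weights}, losses

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
target = rng.normal(size=(4, 4))
kernels = rng.normal(size=(2, 3, 3))
model, losses = train_weights(x, target, kernels, np.ones((2, 4, 4)))
```

Updating the kernels as well would require backpropagating through the convolution, which an automatic-differentiation framework would normally handle.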

14. The computer-implemented method of claim 13, wherein modifying the values comprises modifying values for the one or more parameters of at least one kernel of the plurality of different kernels.

15. The computer-implemented method of claim 13, wherein modifying the values comprises modifying values for one or more of the respective plurality of weights for at least one kernel of the plurality of different kernels.

16. The computer-implemented method of claim 13, further comprising: modifying, by the one or more computing devices and based on the loss function, at least one bias value of a plurality of bias values.

17. The computer-implemented method of claim 13, further comprising: modifying, by the one or more computing devices and based on the loss function, a spatial rank parameter, the spatial rank parameter configured to control a number of kernels in the plurality of kernels.

18. The computer-implemented method of claim 13, further comprising: modifying, by the one or more computing devices and based on the loss function, one or more subnetwork parameters included in one or more learned subnetworks included in at least one of the one or more layers, the one or more learned subnetworks configured to receive one or more portions of the respective layer input and, in response, predict the respective plurality of weights for one or more of the intermediate outputs.

19. One or more tangible, non-transitory computer readable media storing computer-readable instructions that when executed by one or more processors cause the one or more processors to perform operations, the operations comprising: obtaining a neural network comprising one or more layers with relaxed spatial invariance, each of the one or more layers configured to: convolve a plurality of different kernels against a respective layer input to generate a plurality of intermediate outputs, each of the plurality of intermediate outputs having a plurality of portions; apply, for each of the plurality of intermediate outputs, a respective plurality of weights respectively associated with the plurality of portions to generate a respective weighted output; and generate a respective layer output based on the weighted output; obtaining a network input; processing the network input using the neural network to receive a network output; evaluating a loss function that evaluates a difference between the network output and a ground truth associated with network output; modifying, based on the loss function, one or more values for one or both of: one or more parameters of at least one kernel of the plurality of different kernels; or one or more of the respective plurality of weights for at least one kernel of the plurality of different kernels; and providing the plurality of different kernels and the respective plurality of weights as a trained model.

20. The one or more tangible, non-transitory media of claim 19, wherein the operations further comprise modifying, based on the loss function, one or more subnetwork parameters included in one or more learned subnetworks included in at least one of the one or more layers, the one or more learned subnetworks configured to receive one or more portions of the respective layer input and, in response, predict the respective plurality of weights for one or more of the intermediate outputs.

Patent Metadata

Filing Date: Unknown

Publication Date: April 1, 2025

Inventors: Gamaleldin Elsayed, Prajit Ramachandran, Jon Shlens, Simon Kornblith
