Distributed Batch Normalization Using Partial Populations

Published: May 24, 2022

Assignee: not available in the USPTO data we have

Inventors: Larry Robert Dennison, Benjamin Klenk

Technical Abstract

Patent Claims

25 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method for training a neural network model, comprising: processing, by a processor in a plurality of processors, at least one sample included in a set of training samples to generate activations for the at least one sample; analyzing, by the processor, the activations to calculate a statistical measure associated with the activations for the at least one sample; transmitting the statistical measure to at least one additional processor configured to reduce multiple statistical measures received from the plurality of processors to generate normalization parameters associated with a layer of the neural network model; processing, by the processor, one or more additional samples included in the set of training samples to generate one or more additional activations for the one or more additional samples in parallel with the at least one additional processor generating the normalization parameters; receiving the normalization parameters from the at least one additional processor; and applying the normalization parameters to the activations for the at least one sample and the one or more additional activations for the one or more additional samples.

2. The computer-implemented method of claim 1, wherein the statistical measure associated with the activations for the at least one sample is calculated based on an analysis of activations for at least two samples in the set of training samples allocated to the processor.

3. The computer-implemented method of claim 1, wherein the statistical measure is calculated by the processor using a first precision and the normalization parameters are calculated by the at least one additional processor using a second precision.

4. The computer-implemented method of claim 3, wherein the first precision is a 32-bit floating point format and the second precision is a 64-bit floating point format.

5. The computer-implemented method of claim 1, wherein the at least one sample processed to generate the activations is selected from the set of training samples according to a round-robin scheduling mechanism.

6. The computer-implemented method of claim 1, wherein each processor in the plurality of processors comprise a parallel processing unit configured to implement at least a portion of the neural network model.

7. The computer-implemented method of claim 1, wherein the at least one additional processor comprises a switch configured to route data between the plurality of processors, the switch including a cache, and wherein the reduce operation is implemented, at least in part, within the cache.

8. The computer-implemented method of claim 1, wherein analyzing the activations comprises calculating at least one of a mean or a variance for the activations.

9. The computer-implemented method of claim 1, wherein a number of samples in the at least one sample represents a statistically insignificant sample of a mini-batch of training samples.

10. A system for training a neural network model, comprising: a processor configured to: process at least one sample included in a set of training samples to generate activations for the at least one sample, analyze the activations to calculate a statistical measure associated with the activations for the at least one sample, process one or more additional samples included in the set of training samples to generate one or more additional activations for the one or more additional samples, and apply normalization parameters to the activations for the at least one sample and the one or more additional activations for the one or more additional samples; and at least one additional processor in communication with the processor and configured to: receive multiple statistical measures from a plurality of processors, the multiple statistical measures including the statistical measure calculated by the processor, reduce the multiple statistical measures received from the plurality of processors to generate normalization parameters associated with a layer of the neural network model, and transmit the normalization parameters to each of the processors in the plurality of processors, wherein the processor processes the one or more additional samples included in the set of training samples in parallel with the at least one additional processor generating the normalization parameters.

11. The system of claim 10, wherein the processor is a parallel processing unit.

12. The system of claim 10, wherein the statistical measure is calculated by the processor using a first precision and the normalization parameters are generated by the at least one additional processor using a second precision.

13. The system of claim 12, wherein the first precision is a 32-bit floating point format and the second precision is a 64-bit floating point format.

14. The system of claim 11, wherein the at least one additional processor comprises a switch, the switch including a cache, and wherein the reduce operation is implemented, at least in part, within the cache.

15. The system of claim 10, wherein the at least one additional processor comprises a switch configured to route data between the plurality of processors, the switch including a cache, and wherein the reduce operation is implemented, at least in part, within the cache.

16. The system of claim 10, further comprising a host processor configured to distribute the set of training samples to the processor.

17. A non-transitory computer-readable media storing computer instructions for training a neural network model that, when executed by a processor, cause the processor to perform the steps of: processing at least one sample included in a set of training samples to generate activations for the at least one sample; analyzing the activations to calculate a statistical measure associated with the activations for the at least one sample; transmitting the statistical measure to at least one additional processor configured to reduce multiple statistical measures received from a plurality of processors to generate normalization parameters associated with a layer of the neural network model; processing one or more additional samples included in the set of training samples to generate one or more additional activations for the one or more additional samples in parallel with the at least one additional processor generating the normalization parameters; receiving the normalization parameters from the at least one additional processor; and applying the normalization parameters to the activations for the at least one sample and the one or more additional activations for the one or more additional samples.

18. The computer-readable media of claim 17, wherein the statistical measure is calculated by the processor using a first precision and the normalization parameters are calculated by the at least one additional processor using a second precision.

19. The computer-readable media of claim 18, wherein the first precision is a 32-bit floating point format and the second precision is a 64-bit floating point format.

20. The computer-readable media of claim 17, wherein the at least one additional processor comprises a switch, the switch including a cache, and wherein the reduce operation is implemented, at least in part, within the cache.

21. An autonomous vehicle that utilizes a neural network model, comprising: a processor configured to: process at least one sample included in a set of training samples to generate activations for the at least one sample, analyze the activations to calculate a statistical measure associated with the activations for the at least one sample, transmit the statistical measure to at least one additional processor configured to reduce multiple statistical measures received from a plurality of processors to generate normalization parameters associated with a layer of the neural network model, process one or more additional samples included in the set of training samples to generate one or more additional activations for the one or more additional samples, and apply the normalization parameters to the activations for the at least one sample and the one or more additional activations for the one or more additional samples, wherein the at least one additional processor is configured to: receive multiple statistical measures from the plurality of processors, the multiple statistical measures including the statistical measure calculated by the processor, generate the normalization parameters associated with a layer of the neural network model, and transmit the normalization parameters to each of the processors in the plurality of processors, wherein the processor processes the one or more additional samples included in the set of training samples in parallel with the at least one additional processor generating the normalization parameters.

22. The autonomous vehicle of claim 21, further comprising a transceiver configured to transmit the statistical measure to the at least one additional processor and receive the normalization parameters from the at least one additional processor via a wireless communications medium.

23. The autonomous vehicle of claim 21, wherein the at least one additional processor and the plurality of processors are included in the autonomous vehicle.

24. The autonomous vehicle of claim 21, wherein the autonomous vehicle is one of an automobile, a truck, a ship, an aircraft, a spacecraft, or an armored vehicle.

25. The autonomous vehicle of claim 21, wherein at least a portion of the set of training samples is generated by a sensor network included in the autonomous vehicle during manual operation of the autonomous vehicle.
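
Illustrative sketch (not part of the patent text)

The data flow recited in claim 1 can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the patented implementation: each sample yields a single scalar activation, the "statistical measure" is taken to be a (count, sum, sum of squares) triple, the reduction runs sequentially rather than in parallel on a separate processor or switch, and all names (`partial_stats`, `reduce_stats`, `normalize`, `first`, `later`, `eps`) are hypothetical.

```python
import math

def partial_stats(activations):
    """Statistical measure for one processor's first samples.
    Assumption: a (count, sum, sum of squares) triple."""
    n = len(activations)
    s = sum(activations)
    sq = sum(a * a for a in activations)
    return n, s, sq

def reduce_stats(measures, eps=1e-5):
    """Reduce the measures from all processors into normalization
    parameters (mean and inverse standard deviation)."""
    n = sum(m[0] for m in measures)
    s = sum(m[1] for m in measures)
    sq = sum(m[2] for m in measures)
    mean = s / n
    var = max(sq / n - mean * mean, 0.0)  # guard against tiny negative values
    return mean, 1.0 / math.sqrt(var + eps)

def normalize(activations, mean, inv_std):
    """Apply the normalization parameters to a list of activations."""
    return [(a - mean) * inv_std for a in activations]

# Hypothetical activations on two processors (keys 0 and 1).
first = {0: [1.0, 2.0], 1: [3.0, 4.0]}  # partial population: statistics come from these
later = {0: [5.0], 1: [0.0]}            # samples processed while the reduction would run

# Each processor sends its measure; the reducer builds the parameters.
measures = [partial_stats(first[p]) for p in sorted(first)]
mean, inv_std = reduce_stats(measures)

# The same parameters are applied to the first and the additional activations.
normalized = {p: normalize(first[p] + later[p], mean, inv_std) for p in sorted(first)}
print(mean, inv_std)
print(normalized)
```

In this sketch the statistics come only from the `first` samples, while the `later` samples are normalized with the same parameters. In a real system the `later` activations would be computed concurrently with the reduction, which is the overlap the claim describes.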

