A compiler manages memory usage in a machine learning accelerator by ordering the computations of a machine learning network (MLN). The compiler identifies a set of partial networks of the MLN, each representing the portion of the network, spanning multiple layers, on which an output or set of outputs depends. Because any given output may depend on only a limited subset of intermediate outputs from the prior layers, each partial network may include only a small fraction of the intermediate outputs from each layer. Instead of implementing the MLN by computing one layer at a time, the compiler schedules instructions to implement the partial networks sequentially. As each layer of a partial network is completed, its intermediate outputs can be released from memory. The described technique enables intermediate outputs to be streamed directly between processing elements of the machine learning accelerator without requiring large transfers to and from external memory.
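The idea of a partial network can be illustrated with a minimal sketch. It is not the patented compiler; it assumes a hypothetical 1-D, convolution-like network in which each node reads a window of `kernel` neighbouring nodes from the previous layer. The names `build_deps` and `partial_network` are introduced here for illustration only. The sketch walks backward from one output and collects, per layer, the intermediate outputs that output depends on.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (layer index, position within the layer)


def build_deps(num_layers: int, width: int, kernel: int = 3) -> Dict[Node, List[Node]]:
    """Hypothetical network: node (l, i) reads a window of layer l-1 centred on i."""
    half = kernel // 2
    deps: Dict[Node, List[Node]] = {}
    for layer in range(1, num_layers):
        for i in range(width):
            lo, hi = max(0, i - half), min(width - 1, i + half)
            deps[(layer, i)] = [(layer - 1, j) for j in range(lo, hi + 1)]
    return deps


def partial_network(output: Node, deps: Dict[Node, List[Node]]) -> Dict[int, Set[int]]:
    """Collect, per layer, every node the given output transitively depends on."""
    per_layer: Dict[int, Set[int]] = {}
    seen = {output}
    stack = [output]
    while stack:
        layer, pos = stack.pop()
        per_layer.setdefault(layer, set()).add(pos)
        for dep in deps.get((layer, pos), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return per_layer


if __name__ == "__main__":
    deps = build_deps(num_layers=4, width=16)
    pn = partial_network((3, 8), deps)
    for layer in sorted(pn):
        print(f"layer {layer}: {len(pn[layer])} of 16 nodes needed")
```

In this toy setting, a single output in the last layer needs only a narrow cone of nodes in each earlier layer, which is why a schedule that completes one partial network at a time can hold far fewer intermediate outputs in memory than a layer-by-layer schedule.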
Legal claims defining the scope of protection, as filed with the USPTO.
5. The method of claim 1, wherein at least one of the first intermediate outputs of the first partial network overlaps with at least one of the second intermediate outputs of the second partial network.
11. The method of claim 1, wherein scheduling the execution of the instructions comprises determining the order to minimize a total number of data transfers.
12. The method of claim 1, wherein scheduling the execution of the instructions comprises determining the order to minimize a number of data transfers to L2 memory external to the processing elements.
13. The method of claim 1, wherein allocating the computations of the machine learning network to the processing elements comprises allocating computations of consecutive layers of the machine learning network to physically adjacent groups of processing elements.
23. The system of claim 22, wherein the instructions further cause the mesh of interconnected processing elements to implement partial networks of the machine learning network according to an order in which partial networks having overlapping intermediate outputs are implemented consecutively.
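Claims 5 and 23 refer to partial networks whose intermediate outputs overlap and that are implemented consecutively. The claims do not state how such an order is found; the following is only a greedy sketch under stated assumptions. Each partial network is modelled as a set of hypothetical intermediate-output identifiers, and the function `order_by_overlap` (a name introduced here) repeatedly picks the remaining partial network that shares the most intermediate outputs with the one just scheduled.

```python
from typing import Dict, List, Set


def order_by_overlap(partials: Dict[str, Set[str]]) -> List[str]:
    """Greedy heuristic (an assumption, not the claimed method): schedule next
    the partial network sharing the most intermediate outputs with the last one."""
    remaining = sorted(partials)  # sorted names keep tie-breaking deterministic
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        last = partials[order[-1]]
        best = max(remaining, key=lambda name: len(partials[name] & last))
        remaining.remove(best)
        order.append(best)
    return order


if __name__ == "__main__":
    # Hypothetical partial networks and intermediate-output names.
    partials = {
        "A": {"x1", "x2"},
        "B": {"x7", "x8"},
        "C": {"x2", "x3"},
        "D": {"x8", "x9"},
    }
    print(order_by_overlap(partials))
```

With this input, A is followed by C (they share `x2`) and B is followed by D (they share `x8`), so a shared intermediate output can stay resident between the two partial networks that use it instead of being reloaded.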
May 4, 2020
February 21, 2023