Devices and techniques are generally described for machine learning hardware optimization. In some examples, a first computing device may receive first data describing a first machine learning model. A first operator type and a first input size for a first layer of the first machine learning model may be determined from the first data. First executable code may be generated that defines a first operator for the first layer of the first machine learning model. The first operator may be specific to the first input size and the first operator type. The first executable code may be stored in non-transitory computer-readable memory. In some examples, second data may be input into the first machine learning model. The first machine learning model may process the second data to generate first output data based at least in part on execution of the first executable code.
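The flow in the abstract (read a model description, determine each layer's operator type and input size, then generate code specific to that pair) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `MODEL_DESCRIPTION` layout, the function names, and the choice of a dot product as the only supported operator are all assumptions made for illustration, and Python source compiled with `exec` stands in for the executable code the abstract refers to.

```python
from typing import Callable, Dict, List

# Hypothetical "first data": one entry per layer of the model.
MODEL_DESCRIPTION: List[dict] = [
    {"name": "layer0", "operator": "dot", "input_size": 4},
]


def generate_operator_source(op_type: str, input_size: int) -> str:
    """Emit Python source for an operator fixed to one input size."""
    if op_type != "dot":
        raise ValueError(f"unsupported operator type: {op_type}")
    # The size is known at generation time, so the loop is fully unrolled.
    terms = " + ".join(f"x[{i}] * w[{i}]" for i in range(input_size))
    return f"def dot_{input_size}(x, w):\n    return {terms}\n"


def compile_operator(op_type: str, input_size: int) -> Callable:
    """Compile the generated source and return the specialized function."""
    namespace: dict = {}
    exec(generate_operator_source(op_type, input_size), namespace)
    return namespace[f"{op_type}_{input_size}"]


def specialize_model(description: List[dict]) -> Dict[str, Callable]:
    """Build one size-specific operator per layer in the description."""
    return {
        layer["name"]: compile_operator(layer["operator"], layer["input_size"])
        for layer in description
    }


# "Second data" is processed by executing the generated operator.
operators = specialize_model(MODEL_DESCRIPTION)
result = operators["layer0"]([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
print(result)
```

The generated function contains no loop and no size check, which is the point of specializing to a known input size; a general-purpose operator would need both.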
Legal claims defining the scope of protection, as filed with the USPTO.
6. The method of claim 5, further comprising determining, prior to the generating the first executable code, first assembly code stored in the non-transitory computer-readable memory that corresponds to the first operator, wherein the first executable code comprises the first assembly code.
10. The method of claim 5, further comprising programming machine learning accelerator hardware of the first computing device using the first executable code from a first configuration that is agnostic to any machine learning model input size, output size, or operator type to a second configuration that is associated with the first machine learning model.
12. The method of claim 5, wherein a first processor of the first computing device is programmed for general-sized arithmetic operators, the method further comprising generating, using the first executable code, a re-programmed first processor programmed for at least one arithmetic operator that is specific to the first input size.
14. The system of claim 13, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to determine, prior to the generation of the first executable code, first assembly code stored in the at least one non-transitory computer-readable memory that corresponds to the first operator, wherein the first executable code comprises the first assembly code.
20. The system of claim 13, wherein a first processor of the at least one processor is programmed for general-sized arithmetic operators and wherein the at least one non-transitory computer-readable memory stores further instructions that, when executed by the at least one processor, are further effective to generate, using the first executable code, a re-programmed first processor programmed for at least one arithmetic operator that is specific to the first input size.
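Claims 6 and 14 describe finding previously stored assembly code that corresponds to the operator before the executable code is generated, and including that assembly code in the result. A minimal Python sketch of such a lookup follows. The store layout, the function names, the fallback path for operators with no stored entry, and the assembly strings (placeholders, not instructions from any real instruction set) are all assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

OperatorKey = Tuple[str, int]  # (operator type, input size)

# Hypothetical store of assembly snippets; the text is placeholder only.
STORED_ASSEMBLY: Dict[OperatorKey, str] = {
    ("add", 4): "vadd4 r0, r1, r2",
}


def find_stored_assembly(op_type: str, input_size: int) -> Optional[str]:
    """Return stored assembly for this operator and size, if any."""
    return STORED_ASSEMBLY.get((op_type, input_size))


def build_executable_code(op_type: str, input_size: int) -> str:
    """Assemble code text for one operator, reusing stored assembly."""
    header = f"; operator {op_type}, size {input_size}\n"
    stored = find_stored_assembly(op_type, input_size)
    if stored is not None:
        # The generated code comprises the stored assembly.
        return header + stored + "\n"
    # Assumed fallback: defer to a general-sized routine.
    return header + f"call generic_{op_type}\n"


print(build_executable_code("add", 4))
print(build_executable_code("mul", 8))
```

Keying the store on both operator type and input size mirrors the claims' requirement that the operator be specific to both.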
September 28, 2022
August 27, 2024