Systems for neural network computation are provided. A neural network processor comprises a plurality of neural cores. The neural network processor has one or more processor precisions per activation. The processor is configured to accept data having a processor feature dimension. A transformation circuit is coupled to the neural network processor, and is adapted to: receive an input data tensor having an input precision per channel at one or more features; transform the input data tensor from the input precision to the processor precision; divide the input data tensor into a plurality of blocks, each block conforming to one of the processor feature dimensions; and provide each of the plurality of blocks to one of the plurality of neural cores. The neural network processor is adapted to compute, by the plurality of neural cores, the output of one or more neural network layers.
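The transformation steps in the abstract (precision conversion, division into blocks, distribution to cores) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the 8-bit input precision, the 4-bit processor precision, the block size of 4 features, the two cores, the round-robin assignment, and the function names `split_precision`, `to_blocks` and `assign_to_cores`.

```python
def split_precision(value, input_bits=8, proc_bits=4):
    """Split an unsigned integer into chunks of proc_bits each, low chunk first.

    Assumed scheme: an 8-bit channel value becomes two 4-bit values.
    """
    n_chunks = -(-input_bits // proc_bits)  # ceiling division
    mask = (1 << proc_bits) - 1
    return [(value >> (i * proc_bits)) & mask for i in range(n_chunks)]


def to_blocks(features, block_size):
    """Zero-pad a feature list to a multiple of block_size, then cut it into blocks."""
    pad = (-len(features)) % block_size
    padded = list(features) + [0] * pad
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def assign_to_cores(blocks, n_cores):
    """Hypothetical round-robin mapping of blocks to core indices."""
    assignment = {core: [] for core in range(n_cores)}
    for i, block in enumerate(blocks):
        assignment[i % n_cores].append(block)
    return assignment


pixel = [200, 17, 255]  # one RGB pixel, 8 bits per channel (assumed input)

# Each 8-bit channel expands into two 4-bit features.
features = [chunk for channel in pixel for chunk in split_precision(channel)]

# Six features do not fill two blocks of four, so the last block is zero-padded.
blocks = to_blocks(features, block_size=4)
cores = assign_to_cores(blocks, n_cores=2)
```

In this sketch the lower-precision chunks are treated as additional features, so a reduction in precision is traded for a larger feature count, which is then padded to the block size the cores accept.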
Legal claims defining the scope of protection, as filed with the USPTO.
2. The method of claim 1, wherein the input data tensor comprises an image.
3. The method of claim 1, the neural network processor being configured for a predetermined number of features, wherein transforming the input data tensor comprises dividing input features into a plurality of feature sets, each having less than or equal to the predetermined number of features.
4. The method of claim 1, wherein dividing the input data tensor comprises zero-padding the number of blocks in one of the feature dimensions of the input data tensor to conform with one of the feature dimensions of the neural network processor.
5. The method of claim 1, wherein dividing the input data tensor comprises packing the input data tensor.
8. The method of claim 7, wherein the plurality of fixed precision partial sums are intermediate results.
9. The method of claim 8, wherein the intermediate results are weighted sums of a subset of inputs.
10. The method of claim 7, wherein the neural network processor is configured to iteratively compute a partial sum from the plurality of fixed precision partial sums.
12. The system of claim 11, wherein the input data tensor comprises an image.
13. The system of claim 11, wherein transforming the input data tensor comprises dividing each channel into a plurality of values having precisions less than or equal to the one of the processor precisions.
14. The system of claim 11, wherein dividing the input data tensor comprises zero-padding the number of blocks in a feature dimension of the input data tensor to conform with one of the feature dimensions of the neural network processor.
15. The system of claim 11, wherein dividing the input data tensor comprises packing the input data tensor.
18. The system of claim 17, wherein the plurality of fixed precision partial sums are intermediate results.
19. The system of claim 18, wherein the intermediate results are weighted sums of a subset of inputs.
20. The system of claim 17, wherein the neural network processor is configured to iteratively compute a partial sum from the plurality of fixed precision partial sums.
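Claims 8 to 10 and 18 to 20 describe partial sums as intermediate results, each a weighted sum of a subset of inputs, that are combined iteratively. The independent claims they depend on are not reproduced here, so the following Python sketch only illustrates the general idea under stated assumptions: integer weights and inputs, a block size of 4, and the hypothetical function names `partial_sum` and `blocked_dot`.

```python
def partial_sum(weights, inputs):
    """Weighted sum over one subset (block) of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def blocked_dot(weights, inputs, block_size):
    """Accumulate the per-block partial sums one block at a time."""
    total = 0
    for start in range(0, len(inputs), block_size):
        end = start + block_size
        total += partial_sum(weights[start:end], inputs[start:end])
    return total


weights = [1, -2, 3, 0, 2, 1]   # assumed example weights
inputs = [8, 12, 1, 1, 15, 15]  # assumed example inputs

result = blocked_dot(weights, inputs, block_size=4)

# The blocked accumulation matches the weighted sum over all inputs at once.
full = partial_sum(weights, inputs)
```

Because addition is associative, accumulating the block-wise partial sums gives the same value as a single weighted sum over every input, which is what lets each block be handled separately.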
October 11, 2018
December 31, 2024