A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, and generate a plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, and generate a plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs, wherein the matrix computation unit is configured as a two dimensional systolic array comprising a plurality of cells, wherein the plurality of weight inputs is shifted through a first plurality of cells along a first dimension of the systolic array, and wherein the plurality of activation inputs is shifted through a second plurality of cells along a second dimension of the systolic array; and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each of the plurality of accumulated values for the neural network layer generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
A neural network processor circuit performs computations for neural networks consisting of multiple layers. It includes a matrix computation unit and a vector computation unit. The matrix unit, a 2D systolic array of processing cells, receives weight inputs and activation inputs for each layer. Weight inputs shift through cells along one dimension of the array, while activation inputs shift through cells along the other dimension. The matrix unit generates accumulated values based on these inputs. The vector unit then applies an activation function to each accumulated value, generating activated values for that layer.
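The per-layer computation described above can be modeled functionally as a matrix-vector product followed by an elementwise activation. This model ignores the cycle-by-cycle timing of the systolic array. It is a minimal sketch, and several details are assumptions made for illustration rather than anything the claim fixes: the function names, the weight layout `weights[i][j]` (input `i`, output `j`), and the use of ReLU as the activation function.

```python
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def matrix_unit(weights: Matrix, activations: Vector) -> Vector:
    """Functional model of the matrix computation unit:
    accumulated[j] = sum_i weights[i][j] * activations[i]."""
    n_in, n_out = len(weights), len(weights[0])
    if len(activations) != n_in:
        raise ValueError("activation count must match the number of weight rows")
    return [sum(weights[i][j] * activations[i] for i in range(n_in)) for j in range(n_out)]


def relu(v: float) -> float:
    # Assumed activation function; the claim leaves the choice open.
    return max(0.0, v)


def vector_unit(accumulated: Vector, activation_fn: Callable[[float], float] = relu) -> Vector:
    """Apply the activation function to every accumulated value."""
    return [activation_fn(v) for v in accumulated]


def run_layer(weights: Matrix, activations: Vector) -> Vector:
    return vector_unit(matrix_unit(weights, activations))


if __name__ == "__main__":
    w = [[1.0, -4.0], [3.0, 1.0]]
    x = [5.0, 6.0]
    print(matrix_unit(w, x))  # [23.0, -14.0]  (accumulated values)
    print(run_layer(w, x))    # [23.0, 0.0]    (activated values)
```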
2. The circuit of claim 1, further comprising: a unified buffer communicatively coupled to the matrix computation unit and the vector computation unit, where the unified buffer is configured to receive and store output from the vector computation unit, and the unified buffer is configured to send the received output as input to the matrix computation unit.
The neural network processor circuit described above further includes a unified buffer coupled to both the matrix computation unit (the systolic array) and the vector computation unit. The unified buffer stores the output (activated values) of the vector computation unit. It then sends this stored output back to the matrix computation unit as input for subsequent layers or processing steps. Keeping intermediate results on chip in this way allows data reuse and reduces external memory access.
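The feedback loop through the unified buffer can be sketched as follows. Each layer reads its input from the buffer and writes its activated output back to the same place, so that output becomes the next layer's input. This is an illustrative model only. The `UnifiedBuffer` class, its single `"act"` slot, and the ReLU activation are hypothetical choices, not details taken from the claims.

```python
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


class UnifiedBuffer:
    """Hypothetical on-chip buffer holding named activation vectors."""

    def __init__(self) -> None:
        self._slots: Dict[str, Vector] = {}

    def store(self, key: str, values: Vector) -> None:
        self._slots[key] = list(values)

    def load(self, key: str) -> Vector:
        return list(self._slots[key])


def layer(weights: Matrix, x: Vector) -> Vector:
    # Matrix unit (matrix-vector product), then an assumed ReLU in the vector unit.
    acc = [sum(weights[i][j] * x[i] for i in range(len(x))) for j in range(len(weights[0]))]
    return [max(0.0, v) for v in acc]


def run_network(layers: List[Matrix], inputs: Vector) -> Vector:
    buf = UnifiedBuffer()
    buf.store("act", inputs)
    for weights in layers:
        x = buf.load("act")                  # unified buffer -> matrix unit
        buf.store("act", layer(weights, x))  # vector unit -> unified buffer
    return buf.load("act")
```

For example, `run_network([[[1.0, -1.0], [2.0, 1.0]], [[1.0], [1.0]]], [1.0, 2.0])` passes `[5.0, 1.0]` from the first layer into the second and returns `[6.0]`, without the intermediate vector leaving the buffer.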
3. The circuit of claim 2, further comprising: a sequencer configured to receive instructions from a host device and generate a plurality of control signals from the instructions, where the plurality of control signals control dataflow through the circuit; and a direct memory access engine communicatively coupled to the unified buffer and the sequencer, where the direct memory access engine is configured to send the plurality of activation inputs to the unified buffer, where the unified buffer is configured to send the plurality of activation inputs to the matrix computation unit, and where the direct memory access engine is configured to read result data from the unified buffer.
The neural network processor circuit, including the systolic array, vector computation unit, and unified buffer, is further improved by adding a sequencer and a direct memory access (DMA) engine. The sequencer receives instructions from a host device and generates control signals to manage data flow within the circuit. The DMA engine transfers activation inputs to the unified buffer, which then provides them to the matrix computation unit. The DMA engine also reads the processed results from the unified buffer, enabling efficient data transfer between the circuit and external memory.
4. The circuit of claim 3, further comprising: a memory unit configured to send the plurality of weight inputs to the matrix computation unit, and where the direct memory access engine is configured to send the plurality of weight inputs to the memory unit.
The neural network processor circuit (systolic array, vector computation unit, unified buffer, sequencer, and DMA engine) includes a memory unit for storing weight inputs. The DMA engine transfers these weight inputs to this memory unit. The memory unit then sends the weight inputs to the matrix computation unit (systolic array) for neural network processing. This separates weight storage from the unified buffer and allows for dedicated memory bandwidth for weight data.
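The data movement described in the last two paragraphs can be sketched as a small software model. In it, a sequencer decodes host instructions into control signals. A DMA engine then fills the unified buffer with activations and the memory unit with weights, and later reads the result back to the host. The instruction mnemonics (`LOAD_ACT`, `LOAD_W`, `MATMUL`, `STORE`), the control-signal tuple format, and the host-memory dictionary are all hypothetical. The patent claims do not define an instruction set.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Signal = Tuple[str, str, Tuple[Any, ...]]  # (unit, action, operands)


@dataclass
class Chip:
    # Activations live in the unified buffer; weights live in a separate memory unit.
    unified_buffer: Dict[str, Vector] = field(default_factory=dict)
    weight_memory: Dict[str, Matrix] = field(default_factory=dict)


class DMAEngine:
    """Moves data between (hypothetical) host memory and on-chip storage."""

    def __init__(self, host: Dict[str, Any], chip: Chip) -> None:
        self.host, self.chip = host, chip

    def activations_to_buffer(self, key: str) -> None:
        self.chip.unified_buffer[key] = list(self.host[key])

    def weights_to_memory(self, key: str) -> None:
        self.chip.weight_memory[key] = [list(row) for row in self.host[key]]

    def read_result(self, key: str) -> None:
        self.host[key] = list(self.chip.unified_buffer[key])


def sequencer(instructions: List[Tuple[Any, ...]]) -> List[Signal]:
    """Decode host instructions into control signals (unknown opcodes raise KeyError)."""
    table = {
        "LOAD_ACT": ("dma", "activations_to_buffer"),
        "LOAD_W": ("dma", "weights_to_memory"),
        "MATMUL": ("matrix_unit", "matmul_relu"),
        "STORE": ("dma", "read_result"),
    }
    return [(*table[op], tuple(args)) for op, *args in instructions]


def execute(signals: List[Signal], dma: DMAEngine, chip: Chip) -> None:
    for unit, action, args in signals:
        if unit == "dma":
            getattr(dma, action)(*args)
        else:
            w_key, in_key, out_key = args
            w = chip.weight_memory[w_key]    # memory unit -> matrix unit
            x = chip.unified_buffer[in_key]  # unified buffer -> matrix unit
            acc = [sum(w[i][j] * x[i] for i in range(len(x))) for j in range(len(w[0]))]
            chip.unified_buffer[out_key] = [max(0.0, v) for v in acc]  # vector unit output
```

A short program such as `[("LOAD_W", "w"), ("LOAD_ACT", "x"), ("MATMUL", "w", "x", "y"), ("STORE", "y")]` loads both operands and runs one layer. It then leaves the result under `"y"` in host memory. Weights and activations travel along separate paths, which mirrors the split between the memory unit and the unified buffer.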
5. The circuit of claim 1, where the two dimensional systolic array is a square array.
The neural network processor circuit uses a two-dimensional systolic array as its matrix computation unit. This systolic array, which calculates accumulated values from weight and activation inputs, is implemented as a square array, meaning it has the same number of rows and columns.
6. The circuit of claim 1, where, for a given layer in the plurality of layers, a count of the plurality of activation inputs is greater than a size of the second dimension of the systolic array, and where the systolic array is configured to: divide the plurality of activation inputs into portions, where each portion has a size less than or equal to the size of the second dimension; generate, for each portion of activation inputs, a respective portion of accumulated values; and combining each portion of accumulated values to generate a vector of accumulated values for the given layer.
Sometimes a layer has more activation inputs than the size of the second dimension of the systolic array, the dimension along which activation inputs are shifted. In that case, the array divides the activation inputs into portions, each no larger than that dimension. The array computes a set of accumulated values for each portion. These portions of accumulated values are then combined into a complete vector of accumulated values for the layer.
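One natural reading of this splitting is tiling along the reduction dimension of a matrix-vector product. Each portion of activations is paired with the matching slice of weight rows, and each pass yields a partial vector for all outputs. Combining then means summing those partial vectors elementwise. The sketch below assumes that reading. The claim only says the portions are "combined", and `dim_size` is a hypothetical stand-in for the number of activation inputs one pass can accept.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def array_pass(weights: Matrix, x: Vector) -> Vector:
    # One pass through the array: partial accumulated values for every output.
    return [sum(weights[i][j] * x[i] for i in range(len(x))) for j in range(len(weights[0]))]


def tiled_over_activations(weights: Matrix, x: Vector, dim_size: int) -> Vector:
    """Split x into portions of at most dim_size and sum the partial results."""
    if dim_size <= 0:
        raise ValueError("dim_size must be positive")
    result = [0.0] * len(weights[0])
    for start in range(0, len(x), dim_size):
        portion = x[start:start + dim_size]            # <= dim_size activation inputs
        w_portion = weights[start:start + dim_size]    # matching weight rows
        partial = array_pass(w_portion, portion)
        result = [r + p for r, p in zip(result, partial)]  # combine by elementwise sum
    return result
```

With five activation inputs and `dim_size=2`, the function makes three passes (2 + 2 + 1 inputs). Because the arithmetic here is exact, it returns the same vector as a single untiled `array_pass`.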
7. The circuit of claim 1, where, for a given layer in the plurality of layers, a count of the plurality of weight inputs is greater than a size of the first dimension of the systolic array, and where the systolic array is configured to: divide the plurality of weight inputs into portions, where each portion has a size less than or equal to the size of the first dimension; generating, for each portion of weight inputs, a respective portion of accumulated values; and combining each portion of accumulated values to generate a vector of accumulated values for the given layer.
Likewise, a layer may have more weight inputs than the size of the first dimension of the systolic array, the dimension along which weight inputs are shifted. In that case, the array divides the weight inputs into portions, each no larger than that dimension. The array computes a set of accumulated values for each portion. These portions of accumulated values are then combined into a complete vector of accumulated values for the layer.
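A plausible sketch of the weight-side case assumes that each weight portion is a group of weight columns, that is, a group of output neurons. Under that assumption each pass yields accumulated values for a disjoint slice of the outputs, and combining the portions means concatenating them in order. This interpretation, and the hypothetical `dim_size` parameter, are assumptions. The claim does not say how the portions map onto outputs.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def array_pass(weights: Matrix, x: Vector) -> Vector:
    # One pass through the array: accumulated values for the loaded weight columns.
    return [sum(weights[i][j] * x[i] for i in range(len(x))) for j in range(len(weights[0]))]


def tiled_over_outputs(weights: Matrix, x: Vector, dim_size: int) -> Vector:
    """Split the weight columns into groups of at most dim_size and concatenate results."""
    if dim_size <= 0:
        raise ValueError("dim_size must be positive")
    n_out = len(weights[0])
    result: Vector = []
    for start in range(0, n_out, dim_size):
        w_portion = [row[start:start + dim_size] for row in weights]  # <= dim_size columns
        result.extend(array_pass(w_portion, x))                       # combine by concatenation
    return result
```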
8. The circuit of claim 1, where each cell in the plurality of cells comprises: a weight register configured to store a weight input; an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the second dimension; a sum-in register configured to store a previously summed value; multiplication circuitry communicatively coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input; and summation circuitry communicatively coupled to the multiplication circuitry and the sum-in register, where the summation circuitry is configured to output a sum of the product and the previously summed value, and where the summation circuitry is configured to send the sum to another sum-in register in a second adjacent cell along the first dimension.
Each cell within the systolic array of the neural network processor circuit contains a weight register to store a weight input, an activation register to store an activation input and pass it to a neighboring cell, and a sum-in register to store a partial sum. Multiplication circuitry multiplies the weight and activation inputs. Summation circuitry adds the product to the partial sum from the sum-in register. The summation circuitry outputs the new sum and passes it to a neighboring cell's sum-in register.
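The cell behavior described above can be simulated cycle by cycle. Each `Cell` holds a weight register and an activation register, plus a latched sum that becomes the sum-in value of the cell below it. Activations move one cell to the right per cycle along a row, and partial sums move one cell down per cycle along a column. This follows the orientation of Claim 10, where the first dimension is the columns and the second is the rows. The sketch makes two assumptions. First, the weights are taken as already loaded, so the weight-shifting phase is omitted. Second, each row's activation is skewed to enter one cycle after the row above. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    weight: float = 0.0      # weight register (assumed preloaded)
    activation: float = 0.0  # activation register, read by the right-hand neighbour next cycle
    sum_out: float = 0.0     # latched sum, read as sum-in by the cell below next cycle

    def step(self, activation_in: float, sum_in: float) -> None:
        # Multiplication circuitry, then summation with the previously summed value.
        self.activation = activation_in
        self.sum_out = sum_in + self.weight * activation_in


def systolic_matvec(weights: List[List[float]], x: List[float]) -> List[float]:
    """Return out[c] = sum_r weights[r][c] * x[r] using a rows x cols grid of cells."""
    rows, cols = len(weights), len(weights[0])
    grid = [[Cell(weight=weights[r][c]) for c in range(cols)] for r in range(rows)]
    out = [0.0] * cols
    for t in range(rows + cols - 1):
        # Phase 1: read every cell's inputs from registers latched in the previous cycle.
        pending = []
        for r in range(rows):
            for c in range(cols):
                if c == 0:
                    a_in = x[r] if t == r else 0.0  # skewed feed: row r's input enters at cycle r
                else:
                    a_in = grid[r][c - 1].activation
                s_in = grid[r - 1][c].sum_out if r > 0 else 0.0
                pending.append((r, c, a_in, s_in))
        # Phase 2: clock all cells simultaneously.
        for r, c, a_in, s_in in pending:
            grid[r][c].step(a_in, s_in)
        # Column c's complete sum leaves the bottom row at cycle (rows - 1) + c.
        c_done = t - (rows - 1)
        if 0 <= c_done < cols:
            out[c_done] = grid[rows - 1][c_done].sum_out
    return out
```

Reading all inputs before clocking any cell models the registers updating on the same clock edge. Without that two-phase step, a cell would see its neighbour's value from the current cycle instead of the previous one.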
9. The circuit of claim 8, where one or more cells in the plurality of cells are each configured to store the respective sum in a respective accumulator unit, where the respective sum is an accumulated value.
In the systolic array of the neural network processor circuit, one or more cells store their computed sum in a respective accumulator unit, and the stored sum is an accumulated value. The accumulator units collect the array's results before they are passed to the vector computation unit.
10. The circuit of claim 1, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.
In the neural network processor circuit's systolic array, the first dimension of the array (where weight inputs are shifted) corresponds to the columns of the array. The second dimension (where activation inputs are shifted) corresponds to the rows of the array. This describes the physical orientation of the systolic array's data flow.
11. The circuit of claim 1, where the vector computation unit normalizes each activated value to generate a plurality of normalized values.
After the activation function is applied, the vector computation unit in the neural network processor circuit normalizes each activated value to produce a set of normalized values. Normalization scales the data to a consistent range. This improves numerical stability and accuracy in subsequent operations, such as the computations of the next neural network layer. The normalization may involve scaling, centering, or other transformations that standardize the values.
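As one concrete example of the normalization step, the sketch centers a layer's activated values to zero mean and scales them to unit variance. A small `eps` guards against division by zero. This choice of scheme is an assumption for illustration; the claim names normalization but does not specify a method.

```python
import math
from typing import List


def normalize(values: List[float], eps: float = 1e-5) -> List[float]:
    """Zero-mean, unit-variance normalization over one layer's activated values (assumed scheme)."""
    if not values:
        raise ValueError("values must be non-empty")
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in values]
```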
12. The circuit of claim 1, where the vector computation unit pools one or more activated values to generate a plurality of pooled values.
After applying an activation function, the vector computation unit in the neural network processor circuit pools one or more activated values together to generate a set of pooled values. Pooling reduces the dimensionality of the data by combining the outputs of neuron clusters into a single value, which reduces computational complexity and helps to extract dominant features.
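The pooling step can be sketched as a one-dimensional window sliding over the activated values. Each window is reduced to a single value, either its maximum or its average. The claim does not specify the pooling type, window, or stride, so the 1-D layout, the `window=2, stride=2` defaults, and the two `mode` options are assumptions.

```python
from typing import List


def pool_1d(values: List[float], window: int = 2, stride: int = 2, mode: str = "max") -> List[float]:
    """Reduce each window of activated values to one pooled value (assumed 1-D pooling)."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if mode == "max":
        reduce = max
    elif mode == "avg":
        reduce = lambda w: sum(w) / len(w)
    else:
        raise ValueError("mode must be 'max' or 'avg'")
    return [reduce(values[i:i + window]) for i in range(0, len(values) - window + 1, stride)]
```

For the activated values `[1.0, 3.0, 2.0, 5.0, 4.0, 0.0]`, max pooling gives `[3.0, 5.0, 4.0]` and average pooling gives `[2.0, 3.5, 2.0]`. Either way, the layer output shrinks from six values to three.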
13. The circuit of claim 1, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
In the neural network processor circuit's systolic array, the first dimension of the array (where weight inputs are shifted) corresponds to the rows of the array. The second dimension (where activation inputs are shifted) corresponds to the columns of the array. This is the alternative to the orientation described in Claim 10.
14. A method for performing neural network computations for a neural network comprising a plurality of neural network layers using a circuit comprising a matrix computation unit and a vector computation unit coupled to the matrix computation unit, where the matrix computation unit is configured as a two dimensional systolic array comprising a plurality of cells, and wherein the method comprises, for each of the plurality of neural network layers: providing a plurality of weight inputs and a plurality of activation inputs for the neural network layer to the matrix computation unit, comprising: shifting the plurality of weight inputs through a first plurality of cells along a first dimension of the systolic array, and shifting the plurality of activation inputs through a second plurality of cells along a second dimension of the systolic array; generating, using the matrix computation unit, a plurality of accumulated values, wherein the matrix computation unit is configured to receive the plurality of weight inputs and the plurality of activation inputs for the neural network layer and generate the plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and generating, using the vector computation unit, a plurality of activated values for the neural network layer, wherein the matrix computation unit is configured to apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
A method for performing neural network computations uses a circuit comprising a systolic array (the matrix computation unit) and a vector computation unit, and processes each neural network layer in turn. For each layer, the method provides weight and activation inputs to the systolic array: weight inputs are shifted through cells along one dimension of the array, and activation inputs are shifted through cells along the other dimension. The systolic array generates accumulated values from these inputs, and the vector computation unit then applies an activation function to them, generating activated values for the layer.
15. The method of claim 14, further comprising: receiving, by a unified buffer communicatively coupled to the matrix computation unit and the vector computation unit; storing output from the vector computation unit at the unified buffer; sending, from the unified buffer, the received output as input to the matrix computation unit.
The neural network computation method (systolic array and vector unit) is enhanced by using a unified buffer between the systolic array and the vector unit. The unified buffer receives and stores the output (activated values) from the vector unit. Then, the unified buffer sends this stored output back to the systolic array as input for subsequent processing. This facilitates efficient data reuse and minimizes reliance on external memory.
16. The method of claim 15, further comprising: receiving, at a sequencer, instructions from a host device and generating a plurality of control signals from the instructions, where the plurality of control signals control dataflow through the circuit; sending, from a direct memory access engine communicatively coupled to the unified buffer and the sequencer, the plurality of activation inputs to the unified buffer; sending, from the unified buffer, the plurality of activation inputs to the matrix computation unit; and reading, at the direct memory access engine, result data from the unified buffer.
The neural network computation method (systolic array, vector unit, and unified buffer) is further enhanced by incorporating a sequencer and a direct memory access (DMA) engine. The sequencer receives instructions from a host and generates control signals that govern data flow. The DMA engine sends activation inputs to the unified buffer, which then forwards them to the systolic array. The DMA engine also retrieves the results from the unified buffer, enabling streamlined data exchange between the circuit and external memory.
17. The method of claim 16, further comprising: sending, at a memory unit, the plurality of weight inputs to the matrix computation unit; sending, from the direct memory access engine, the plurality of weight inputs to the memory unit.
The neural network computation method (systolic array, vector unit, unified buffer, sequencer, DMA engine) includes sending weight inputs from a memory unit to the systolic array. The DMA engine is used to send these weight inputs to the memory unit initially. This allows for a dedicated memory to supply weight data to the systolic array, decoupled from the activation input flow.
18. The method of claim 14, where the two dimensional systolic array is a square array.
The neural network computation method uses a two-dimensional systolic array as the matrix computation unit. This systolic array, which calculates accumulated values from weight and activation inputs, is implemented as a square array, meaning it has the same number of rows and columns.
19. The method of claim 14, where, for a given layer in the plurality of layers, a count of the plurality of activation inputs is greater than a size of the second dimension of the systolic array, the method further comprising: dividing, at the systolic array, the plurality of activation inputs into portions, where each portion has a size less than or equal to the size of the second dimension; generating, for each portion of activation inputs and at the systolic array, a respective portion of accumulated values; and combining, at the systolic array, each portion of accumulated values to generate a vector of accumulated values for the given layer.
The neural network computation method handles layers whose activation inputs outnumber the size of the second dimension of the systolic array, the dimension along which activation inputs are shifted. The array divides the activation inputs into portions, each no larger than that dimension. For each portion, the array computes a set of accumulated values. These portions of accumulated values are then combined into a complete vector of accumulated values for the layer.
20. The method of claim 14, where, for a given layer in the plurality of layers, a count of the plurality of weight inputs is greater than a size of the first dimension of the systolic array, the method further comprising: dividing, at the systolic array, the plurality of weight inputs into portions, where each portion has a size less than or equal to the size of the first dimension; generating, for each portion of weight inputs and at the systolic array, a respective portion of accumulated values; and combining, at the systolic array, each portion of accumulated values to generate a vector of accumulated values for the given layer.
The neural network computation method handles layers whose weight inputs outnumber the size of the first dimension of the systolic array, the dimension along which weight inputs are shifted. The array divides the weight inputs into portions, each no larger than that dimension. For each portion, the array computes a set of accumulated values. These portions of accumulated values are then combined into a complete vector of accumulated values for the layer.
21. The method of claim 14, where each cell in the plurality of cells comprises: a weight register configured to store a weight input; an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the second dimension; a sum-in register configured to store a previously summed value; multiplication circuitry communicatively coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input; and summation circuitry communicatively coupled to the multiplication circuitry and the sum-in register, where the summation circuitry is configured to output a sum of the product and the previously summed value, and where the summation circuitry is configured to send the sum to another sum-in register in a second adjacent cell along the first dimension.
In the neural network computation method, each cell within the systolic array contains a weight register to store a weight input, an activation register to store an activation input and pass it to a neighboring cell, and a sum-in register to store a partial sum. Multiplication circuitry multiplies the weight and activation inputs. Summation circuitry adds the product to the partial sum from the sum-in register. The summation circuitry outputs the new sum and passes it to a neighboring cell's sum-in register.
22. The method of claim 21, further comprising storing, at one or more cells in the plurality of cells, the respective sum in a respective accumulator unit, where the respective sum is an accumulated value.
In the neural network computation method, one or more cells of the systolic array store their computed sum in a respective accumulator unit, and the stored sum is an accumulated value.
23. The method of claim 14, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.
In the neural network computation method employing a systolic array, the first dimension of the array (where weight inputs are shifted) corresponds to the columns of the array. The second dimension (where activation inputs are shifted) corresponds to the rows of the array.
24. The method of claim 14, further comprising normalizing, at the vector computation unit, each activated value to generate a plurality of normalized values.
The neural network computation method, after applying an activation function, normalizes each activated value to produce a set of normalized values. This normalization is performed within the vector computation unit.
25. The method of claim 14, further comprising pooling, at the vector computation unit, one or more activated values to generate a plurality of pooled values.
The neural network computation method includes pooling one or more activated values together to generate a set of pooled values. This pooling operation is performed within the vector computation unit.
26. The method of claim 14, where the first dimension of the systolic array corresponds to rows of the systolic array, and where the second dimension of the systolic array corresponds to columns of the systolic array.
In the neural network computation method employing a systolic array, the first dimension of the array (where weight inputs are shifted) corresponds to the rows of the array. The second dimension (where activation inputs are shifted) corresponds to the columns of the array.
December 22, 2016
July 18, 2017