Neural Network Device for Neural Network Operation, Operating Method of the Neural Network Device, and Application Processor Including the Same

Published: February 11, 2025

Assignee: not available in the USPTO data we have

Inventors: Hyunpil Kim, Hyunwoo Sim, Seongwoo Ahn, Hasong Kim, Doyoung Lee

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A neural network device, the neural network device comprising: a calculation circuit that includes a first multiplier, a second multiplier, an align shifter, an adder, and a first post adder, wherein the adder shares the first multiplier and the second multiplier, wherein the calculation circuit performs a first dot product operation on a plurality of floating point data pairs or a second dot product operation on a plurality of integer data pairs, wherein in the first dot product operation, the calculation circuit obtains a plurality of fraction multiplication results from the plurality of floating point data pairs, respectively, using the first multiplier, performs an align shift of the plurality of fraction multiplication results based on a maximum value identified from a plurality of exponent addition results that respectively correspond to the plurality of floating point data pairs using the align shifter, adds the aligned plurality of fraction multiplication results and generates first cumulative data using the adder, detects a first leading one by right shifting upper bits of the first cumulative data using the first post adder, detects a second leading one by right shifting lower bits of the first cumulative data that exclude the upper bits of the first cumulative data using the first post adder, and outputs the first cumulative data using the first post adder, and wherein in the second dot product operation, the calculation circuit obtains a plurality of integer multiplication results from the plurality of integer data pairs, respectively, using the second multiplier, adds the plurality of integer multiplication results using the adder and outputs second cumulative data, wherein the adder comprises: a first add circuit that adds upper bits, but not lower bits, of the aligned plurality of fraction multiplication results in the first dot product operation; and a second add circuit that adds lower bits, but not upper bits, of the aligned plurality of fraction multiplication results in the first dot product operation, or adds the plurality of integer multiplication results in the second dot product operation.
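The floating point path recited in claim 1 (multiply fractions, add exponents, align every product to the largest exponent sum, then add) can be illustrated in software. The following is only a minimal sketch, not the claimed circuit: the names decompose and fp_dot, the 11-bit significand width, and the 16 guard bits are assumptions chosen for illustration and do not come from the patent.

```python
import math

SIG_BITS = 11    # assumed significand width (FP16: 10 fraction bits + hidden one)
GUARD_BITS = 16  # assumed extra low-order bits kept while aligning

def decompose(x):
    """Return (sig, exp) with x == sig * 2**exp and |sig| < 2**SIG_BITS."""
    m, e = math.frexp(x)              # x == m * 2**e, 0.5 <= |m| < 1, or m == 0
    sig = int(m * (1 << SIG_BITS))    # exact if x has at most SIG_BITS significant bits
    return sig, e - SIG_BITS

def fp_dot(pairs):
    # Multiply fractions and add exponents, one result per data pair.
    products = []
    for a, b in pairs:
        sig_a, exp_a = decompose(a)
        sig_b, exp_b = decompose(b)
        products.append((sig_a * sig_b, exp_a + exp_b))
    # The largest exponent sum sets the common scale.
    max_exp = max(exp for _, exp in products)
    # Align shift: smaller terms move right; bits shifted past the guard are lost.
    total = sum((frac << GUARD_BITS) >> (max_exp - exp) for frac, exp in products)
    return math.ldexp(total, max_exp - GUARD_BITS)

print(fp_dot([(1.5, 2.0), (0.25, -4.0), (3.0, 0.5)]))  # 3.0 - 1.0 + 1.5 = 3.5
```

Because every product is shifted relative to the single largest exponent, a term that is far smaller than the largest one can be shifted out entirely, which is the usual precision trade-off of this kind of alignment.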

2. The neural network device of claim 1, wherein the calculation circuit further comprises: a second post adder that, in the second dot product operation, adds a first addition result data output from the second add circuit and outputs the second cumulative data, wherein the first post adder in the first dot product operation further generates third addition result data by adding addition result data output from the first add circuit to a second addition result data output from the second add circuit, and outputs the first cumulative data by normalizing and rounding the third addition result data.
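The split between a first add circuit (upper bits), a second add circuit (lower bits), and a post adder that combines both partial sums can be modeled with plain integers. This is a hedged sketch under stated assumptions: the 16-bit split point and the names split_add and post_add are hypothetical, and the claims do not say how carries out of the lower half are handled in hardware.

```python
LOW_BITS = 16  # assumed split point between "lower" and "upper" bits

def split_add(values, low_bits=LOW_BITS):
    """Sum the upper and lower bit fields of each value separately."""
    mask = (1 << low_bits) - 1
    upper_sum = sum(v >> low_bits for v in values)  # role of the first add circuit
    lower_sum = sum(v & mask for v in values)       # role of the second add circuit
    return upper_sum, lower_sum

def post_add(upper_sum, lower_sum, low_bits=LOW_BITS):
    """Recombine the partial sums; lower_sum may exceed low_bits (carries)."""
    return (upper_sum << low_bits) + lower_sum

values = [0x12345678, -0x0F0F0F0F, 0x00FFFFFF, -1]
print(post_add(*split_add(values)) == sum(values))  # True
```

The identity v == ((v >> k) << k) + (v & mask) holds for negative Python integers as well, because the right shift is arithmetic, so the recombined result matches a single wide addition.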

3. The neural network device of claim 2, wherein the calculation circuit receives the plurality of integer data pairs that include first integer data pairs and second integer data pairs, and in the second dot product operation, obtains first integer multiplication results with respect to the first integer data pairs using the first multiplier, and obtains second integer multiplication results with respect to the second integer data pairs using the second multiplier.

4. The neural network device of claim 3, wherein the first add circuit adds the first integer multiplication results in the second dot product operation and wherein the second add circuit adds the second integer multiplication results in the second dot product operation.

5. The neural network device of claim 4, wherein, in the second dot product operation, the second post adder adds the addition result data output from the first add circuit to the addition result data output from the second add circuit and outputs the second cumulative data.
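Claims 3 to 5 describe the integer mode as two lanes: the first multiplier and first add circuit handle one group of integer pairs, the second multiplier and second add circuit handle the other, and the second post adder combines the two partial sums. A minimal sketch follows; the function name int_dot_two_lanes and the even split of the input are assumptions, since the claims do not state how the pairs are divided.

```python
def int_dot_two_lanes(pairs):
    """Integer dot product computed as two partial sums that are then combined."""
    half = len(pairs) // 2                       # assumption: split the pairs in half
    first_pairs, second_pairs = pairs[:half], pairs[half:]
    # Lane 1: first multiplier feeding the first add circuit.
    first_sum = sum(a * b for a, b in first_pairs)
    # Lane 2: second multiplier feeding the second add circuit.
    second_sum = sum(a * b for a, b in second_pairs)
    # Role of the second post adder: combine both partial sums.
    return first_sum + second_sum

pairs = [(3, 4), (-5, 6), (7, -8), (9, 10)]
print(int_dot_two_lanes(pairs))  # 12 - 30 - 56 + 90 = 16
```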

6. The neural network device of claim 2, wherein the calculation circuit gates the second multiplier and the second post adder when performing the first dot product operation, and gates the align shifter and the first post adder when performing the second dot product operation.
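The mode-dependent gating of claim 6 can be summarized as a lookup from operating mode to gated units. The sketch below is illustrative only: the unit names are taken from claims 1 and 2, while the Mode enum, the string identifiers, and the active_units helper are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    FLOAT_DOT = "first dot product operation"
    INT_DOT = "second dot product operation"

ALL_UNITS = {
    "first_multiplier", "second_multiplier", "align_shifter",
    "adder", "first_post_adder", "second_post_adder",
}

# Units that claim 6 says are gated in each mode.
GATED = {
    Mode.FLOAT_DOT: {"second_multiplier", "second_post_adder"},
    Mode.INT_DOT: {"align_shifter", "first_post_adder"},
}

def active_units(mode):
    """Units left running in the given mode."""
    return ALL_UNITS - GATED[mode]

print(sorted(active_units(Mode.INT_DOT)))
```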

7. The neural network device of claim 1, wherein a plurality of input data items included in the plurality of floating point data pairs have different types of formats from that of the first cumulative data, and wherein a plurality of input data items included in the plurality of integer data pairs have different types of formats from that of the second cumulative data.

8. The neural network device of claim 7, wherein the plurality of input data items included in the plurality of floating point data pairs have a floating point 16 (FP16)-type format or a brain float 16 (BF16)-type format, and wherein the first cumulative data has a floating point 32 (FP32)-type format.

9. The neural network device of claim 8, wherein the calculation circuit extends an exponent bit field of first data of the plurality of input data items that have the FP16-type format and extends a fraction bit field of second data of the plurality of input data items that have the BF16-type format.
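The field extension in claim 9 follows from the two input layouts: FP16 has a 5-bit exponent (bias 15) and a 10-bit fraction, while BF16 has an 8-bit exponent (bias 127) and a 7-bit fraction. One way to bring both to a common layout is sketched below, assuming a shared format with an 8-bit exponent and a 10-bit fraction; that shared width is an assumption, not something the claims specify, and the sketch handles only normal numbers (no zeros, subnormals, infinities, or NaNs).

```python
def extend_fp16(bits):
    """FP16 -> (sign, 8-bit biased exponent, 10-bit fraction): exponent is extended."""
    sign = bits >> 15
    exp5 = (bits >> 10) & 0x1F
    frac10 = bits & 0x3FF
    return sign, exp5 - 15 + 127, frac10   # re-bias from 15 to 127

def extend_bf16(bits):
    """BF16 -> (sign, 8-bit biased exponent, 10-bit fraction): fraction is extended."""
    sign = bits >> 15
    exp8 = (bits >> 7) & 0xFF
    frac7 = bits & 0x7F
    return sign, exp8, frac7 << 3          # pad three zero bits on the right

def to_float(sign, exp8, frac10):
    """Value of a normal number in the assumed common layout."""
    return (-1) ** sign * (1 + frac10 / 1024) * 2.0 ** (exp8 - 127)

print(extend_fp16(0x3E00), extend_bf16(0x3FC0))  # both encode 1.5
```

After extension, 1.5 in FP16 (0x3E00) and 1.5 in BF16 (0x3FC0) map to the same triple, so a single multiplier datapath can treat both input formats uniformly.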

10. The neural network device of claim 7, wherein the plurality of input data items in the plurality of integer data pairs have an integer8 (INT8)-type format, and wherein the second cumulative data has an integer32 (INT32)-type format.
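A short arithmetic check shows why an INT32 accumulator pairs naturally with INT8 inputs: the largest INT8 product is (-128) x (-128) = 16384 = 2^14, so many products can be summed before 32 bits overflow. The sketch below is an assumption-laden illustration; the name int8_dot_int32 is hypothetical, and raising an error on overflow is a choice made here, since the claims do not describe overflow behavior.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def int8_dot_int32(pairs, acc=0):
    """Accumulate INT8 x INT8 products into a value that must fit INT32."""
    for a, b in pairs:
        if not (INT8_MIN <= a <= INT8_MAX and INT8_MIN <= b <= INT8_MAX):
            raise ValueError("operand outside INT8 range")
        acc += a * b
    if not (INT32_MIN <= acc <= INT32_MAX):
        raise OverflowError("sum does not fit INT32")  # assumed policy
    return acc

largest_product = INT8_MIN * INT8_MIN          # 16384
max_safe_terms = INT32_MAX // largest_product  # worst-case products that always fit

print(int8_dot_int32([(127, 127), (-128, -128), (5, -3)]))  # 16129 + 16384 - 15 = 32498
print(largest_product, max_safe_terms)                      # 16384 131071
```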

11. The neural network device of claim 1, further comprising: a buffer that stores third cumulative data that is floating point data generated by the calculation circuit, and wherein, in the first dot product operation, the calculation circuit receives the plurality of floating point data pairs and the third cumulative data, performs an align shift of the plurality of fraction multiplication results and a fraction part of the third cumulative data based on the maximum value identified from the plurality of exponent addition results and an exponent part of the third cumulative data, adds the aligned plurality of fraction multiplication results and the aligned fraction part of the third cumulative data using the adder, and outputs the first cumulative data.
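In claim 11 the buffered cumulative value takes part in the alignment: its exponent competes for the maximum, and its fraction is shifted like any product. A minimal sketch of that step follows, representing each term as a hypothetical (significand, exponent) pair whose value is significand x 2^exponent; the function names and the simple truncation of shifted-out bits are assumptions.

```python
def align_and_add(terms):
    """terms: list of (signed integer significand, exponent)."""
    max_exp = max(exp for _, exp in terms)
    # Shift each significand right so all terms share max_exp; low bits are dropped.
    total = sum(sig >> (max_exp - exp) for sig, exp in terms)
    return total, max_exp

def accumulate(products, buffered):
    """Align the products together with the previously buffered sum, then add."""
    return align_and_add(products + [buffered])

# The buffered value has the largest exponent, so it sets the common scale.
print(accumulate([(8, 2), (4, 3)], (1, 5)))   # (3, 5): 32 + 32 + 32 = 96
print(accumulate([(12, 3), (5, 1)], (7, 5)))  # (10, 5): 320, exact sum is 330
```

The second call shows the cost of aligning to a large buffered exponent: low-order bits of the smaller products are discarded before the addition.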

12. The neural network device of claim 11, wherein the buffer stores fourth accumulative data that is integer data generated by the calculation circuit, and wherein, in the second dot product operation, the calculation circuit receives the plurality of integer data pairs and the fourth cumulative data, adds the plurality of fraction multiplication results and the fourth cumulative data using the adder, and outputs the second cumulative data.

13. The neural network device of claim 12, wherein the calculation circuit stores the first cumulative data and the second cumulative data in the buffer.

14. The neural network device of claim 1, wherein the calculation circuit further: detects a first value one by right shifting the upper bits of the first cumulative data for the detection of the first leading one using the first post adder, and detects a second value one by right shifting the lower bits of the first cumulative data that exclude the upper bits of the first cumulative data for the detection of the second leading one using the first post adder.
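Leading-one detection over separate upper and lower bit fields, as recited in claims 1 and 14, can be sketched as two independent searches whose results are then combined. The 32-bit word width, the 16-bit split, the function names, and the rule of preferring the upper result when it exists are all assumptions for illustration; the claims state only that each half is examined by right shifting.

```python
def leading_one_by_shift(value, width):
    """Index (0 = least significant bit) of the highest set bit, or None if zero."""
    for pos in range(width - 1, -1, -1):
        if (value >> pos) & 1:      # right shift, then test the bit that lands at bit 0
            return pos
    return None

def leading_one_split(word, width=32, low_bits=16):
    """Search the upper and lower halves separately and merge the two results."""
    upper = word >> low_bits
    lower = word & ((1 << low_bits) - 1)
    first = leading_one_by_shift(upper, width - low_bits)   # first leading one
    second = leading_one_by_shift(lower, low_bits)          # second leading one
    if first is not None:
        return first + low_bits
    return second

print(leading_one_split(0x00012345))  # 16
print(leading_one_split(0x00000345))  # 9
```

Searching two 16-bit halves in parallel shortens the longest search chain compared with one 32-bit scan, which is a plausible motivation for the split, although the claims do not state one.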

15. A method of operating a neural network device, the operating method comprising: configuring the neural network device to perform both floating point and integer operations; receiving a plurality of data pairs; performing a floating point operation when the plurality of data pairs have a floating point format; performing an integer operation when the plurality of data pairs have an integer format; and storing final data generated through the floating point operation or the integer operation in a memory, wherein performing the floating point operation comprises: obtaining a plurality of fraction multiplication results that respectively correspond to the plurality of data pairs using a floating point multiplier, performing an align shift of the plurality of fraction multiplication results based on a maximum value identified from a plurality of exponent addition results that respectively correspond to the plurality of data pairs using an align shifter, adding upper bits, but not lower bits, of the aligned plurality of fraction multiplication results using a first add circuit included in an adder, adding lower bits, but not upper bits, of the aligned plurality of fraction multiplication results using a second add circuit included in the adder, adding the plurality of fraction multiplication results using a post adder wherein first cumulative data is generated, detecting a first leading one by right shifting upper bits of the first cumulative data using the post adder, and detecting a second leading one by right shifting lower bits of the first cumulative data that exclude the upper bits of the first cumulative data using the post adder, and wherein performing the integer operation comprises: obtaining a plurality of integer multiplication results that respectively correspond to the plurality of data pairs using an integer multiplier, and adding the plurality of integer multiplication results using the adder wherein second cumulative data is generated.

16. The method of claim 15, wherein performing the floating point operation further comprises: adding the aligned plurality of fraction multiplication results using the adder, and outputting the first cumulative data, wherein detecting the first leading one further comprises: detecting a first value one by right shifting the upper bits of the first cumulative data using the post adder, and wherein detecting the second leading one further comprises detecting a second value one by right shifting the lower bits of the first cumulative data that exclude the upper bits of the first cumulative data using the post adder.

17. The method of claim 16, wherein performing the floating point operation further comprises: adding operation results of the first add circuit and the second add circuit wherein the first cumulative data is generated.

18. The method of claim 15, wherein performing the integer operation further comprises: obtaining first integer multiplication results that respectively correspond to first data pairs of the plurality of data pairs using the floating point multiplier and obtaining second integer multiplication results that respectively correspond to second data pairs of the plurality of data pairs using the integer multiplier.

19. The method of claim 18, wherein performing the integer operation further comprises: adding the first integer multiplication results using a first add circuit included in the adder, adding the second integer multiplication results using a second add circuit included in the adder, and adding operation results of the first add circuit and the second add circuit wherein the second cumulative data is generated.

20. An application processor, comprising: a neural network device that includes a floating point multiplier, an integer multiplier, an adder, a first post adder and a memory, wherein the neural network device performs a first dot product operation on a plurality of floating point data pairs or a second dot product operation on a plurality of integer data pairs, wherein, in the first dot product operation, the neural network device obtains a plurality of fraction multiplication results from the plurality of floating point data pairs, respectively, using the floating point multiplier, adds the plurality of fraction multiplication results using the adder wherein first cumulative data is generated, detects a first leading one by right shifting upper bits of the first cumulative data using the first post adder, detects a second leading one by right shifting lower bits of the first cumulative data that exclude the upper bits of the first cumulative data using the first post adder, and stores the first cumulative data in the memory, wherein in the second dot product operation, the neural network device obtains a plurality of integer multiplication results from the plurality of integer data pairs, respectively, using the floating point multiplier and the integer multiplier, adds the plurality of integer multiplication results using the adder wherein second cumulative data is generated, and stores the second cumulative data in the memory, wherein the adder comprises: a first add circuit that adds upper bits, but not lower bits, of the plurality of fraction multiplication results in the first dot product operation; and a second add circuit that adds lower bits, but not upper bits, of the plurality of fraction multiplication results in the first dot product operation, or adds the plurality of integer multiplication results in the second dot product operation.

Patent Metadata

Filing Date: Unknown

Publication Date: February 11, 2025

Inventors: Hyunpil Kim, Hyunwoo Sim, Seongwoo Ahn, Hasong Kim, Doyoung Lee
