US-12282853

Accelerated embedding layer computations

Published: April 22, 2025

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

Methods, systems, and apparatus, including computer-readable media, are described for performing neural network computations using a system configured to implement a neural network on a hardware circuit. The system includes a host that receives a batch of inputs to a neural network layer. Each of the inputs is stored in a memory location identified by an address. The system identifies one or more duplicate addresses in a listing of addresses for one or more inputs. For each duplicate address: the system generates a unique identifier that identifies the duplicate address in the listing of addresses. The system (i) obtains first inputs from memory locations identified by addresses corresponding to the unique identifiers and (ii) generates an output of the layer from the obtained first inputs.
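The deduplication step described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented hardware implementation: the function names `deduplicate_addresses` and `embedding_sum`, the dictionary standing in for memory, the use of a list index as the "unique identifier", and the choice of summation as the layer output are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def deduplicate_addresses(addresses: List[int]) -> Tuple[List[int], List[int]]:
    """Collapse duplicate addresses (hypothetical helper).

    Returns the distinct addresses in first-seen order, plus one identifier
    per original entry that points into the distinct list. Using a list
    index as the identifier is an assumption of this sketch.
    """
    first_seen: Dict[int, int] = {}
    unique: List[int] = []
    identifiers: List[int] = []
    for addr in addresses:
        if addr not in first_seen:
            first_seen[addr] = len(unique)
            unique.append(addr)
        identifiers.append(first_seen[addr])
    return unique, identifiers


def embedding_sum(addresses: List[int], table: Dict[int, Vector]) -> Vector:
    """Fetch each distinct address once, then sum over the original listing.

    Sum pooling is assumed here; the abstract only says an output of the
    layer is generated from the obtained inputs.
    """
    unique, identifiers = deduplicate_addresses(addresses)
    if not unique:
        return []
    fetched = [table[addr] for addr in unique]  # one memory read per distinct address
    output = [0.0] * len(fetched[0])
    for ident in identifiers:
        for d, value in enumerate(fetched[ident]):
            output[d] += value
    return output
```

With the listing `[7, 3, 7, 9, 3]`, the sketch reads memory three times instead of five while still accounting for every entry in the original listing.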

Patent Claims

16 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for performing neural network computations using processor cores in a multi-core processing unit of a hardware integrated circuit configured to implement a neural network, the method comprising: receiving a request comprising a set of addresses; generating a filtered address list based on a filtering operation performed on the set of addresses; retrieving, by a first processor core, a first vector of weights corresponding to a first input feature identified by a first address in the filtered address list; performing a neural network computation using the first input feature and the corresponding first vector of weights; and generating an output for a layer of the neural network based on the neural network computation.

2. The method of claim 1, further comprising: retrieving, by a second processor core, a second vector of weights corresponding to a second input feature identified by a second address in the filtered address list; and performing a neural network computation using the second input feature and the corresponding second vector of weights.

3. The method of claim 1, wherein generating the filtered address list comprises: identifying duplicate addresses in the set of addresses using a scatter circuit of a host processor of the hardware integrated circuit; and filtering out the duplicate addresses based on a metadata tag that augments one or more duplicate addresses in the set of addresses to generate the filtered address list.

4. The method of claim 1, wherein: i) the layer of the neural network is an embedding layer; and ii) the first vector of weights and the second vector of weights are distinct embedding vectors for the embedding layer.

5. The method of claim 4, wherein: i) the first vector of weights is stored in a first data shard corresponding to the first processor core; and ii) the second vector of weights is stored in a second, different data shard corresponding to the second processor core.

6. The method of claim 1, further comprising: distributing addresses in the filtered address list among the first processor core and a second processor core of the multi-core processing unit.

7. The method of claim 2, wherein the neural network computation performed by the first processor core is a first neural network computation and the neural network computation performed by the second processor core is a second neural network computation, the method further comprises: receiving, at the first processor core from the second processor core, a result of the second neural network computation performed at the second processor core; performing, at the first processor core, a reduction operation based on a result of the first neural network computation and the result of the second neural network computation; and generating the output for the layer of the neural network based on a result of the reduction operation.

8. The method of claim 7, further comprising: communicating, from the first processor core to a third processor core of the multi-core processing unit, the result of the reduction operation; and generating, at the third processor core, partial activations based on the result of the reduction operation received from the first processor core.

9. A system for performing neural network computations using processor cores in a multi-core processing unit of the system and a hardware integrated circuit configured to implement the neural network, the system comprising: a processor; and a non-transitory machine-readable medium for storing instructions that are executable by the processor to cause performance of operations comprising: receiving a request comprising a set of addresses; generating a filtered address list based on a filtering operation performed on the set of addresses; retrieving, by a first processor core, a first vector of weights corresponding to a first input feature identified by a first address in the filtered address list; performing a neural network computation using the first input feature and the corresponding first vector of weights; and generating an output for a layer of the neural network based on the neural network computation.

10. The system of claim 9, further comprising: retrieving, by a second processor core, a second vector of weights corresponding to a second input feature identified by a second address in the filtered address list; and performing a neural network computation using the second input feature and the corresponding second vector of weights.

11. The system of claim 9, wherein generating the filtered address list comprises: identifying duplicate addresses in the set of addresses using a scatter circuit of a host processor of the hardware integrated circuit; and filtering out the duplicate addresses based on a metadata tag that augments one or more duplicate addresses in the set of addresses to generate the filtered address list.

12. The system of claim 9, wherein: i) the layer of the neural network is an embedding layer; and ii) the first vector of weights and the second vector of weights are distinct embedding vectors for the embedding layer.

13. The system of claim 12, wherein: i) the first vector of weights is stored in a first data shard corresponding to the first processor core; and ii) the second vector of weights is stored in a second, different data shard corresponding to the second processor core.

14. The system of claim 9, further comprising: distributing addresses in the filtered address list among the first processor core and a second processor core of the multi-core processing unit.

15. The system of claim 10, wherein the neural network computation performed by the first processor core is a first neural network computation and the neural network computation performed by the second processor core is a second neural network computation, the operations further comprising: receiving, at the first processor core from the second processor core, a result of the second neural network computation performed at the second processor core; performing, at the first processor core, a reduction operation based on a result of the first neural network computation and the result of the second neural network computation; and generating the output for the layer of the neural network based on a result of the reduction operation.

16. The system of claim 15, further comprising: communicating, from the first processor core to a third processor core of the multi-core processing unit, the result of the reduction operation; and generating, at the third processor core, partial activations based on the result of the reduction operation received from the first processor core.
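The flow recited across the claims (distributing filtered addresses among cores, retrieving weight vectors from per-core data shards, and reducing partial results at the first core) might look roughly like the Python sketch below. It is a software analogy under stated assumptions, not the claimed circuit: the names `distribute`, `core_partial_sum`, and `reduce_on_first_core` are hypothetical, the modulo rule for assigning addresses to cores is an assumption, and element-wise summation is assumed as the reduction operation.

```python
from typing import Dict, List

Vector = List[float]


def distribute(filtered: List[int], num_cores: int) -> Dict[int, List[int]]:
    """Assign each filtered address to a core.

    Modulo assignment is an assumption; the claims do not specify the rule.
    """
    per_core: Dict[int, List[int]] = {core: [] for core in range(num_cores)}
    for addr in filtered:
        per_core[addr % num_cores].append(addr)
    return per_core


def core_partial_sum(addrs: List[int], shard: Dict[int, Vector], dim: int) -> Vector:
    """Work done by one core: look up its weight vectors in its own shard
    and accumulate them into a partial result."""
    partial = [0.0] * dim
    for addr in addrs:
        for d, weight in enumerate(shard[addr]):
            partial[d] += weight
    return partial


def reduce_on_first_core(partials: List[Vector]) -> Vector:
    """Combine the per-core partial results element-wise (assumed: a sum)."""
    return [sum(values) for values in zip(*partials)]
```

As a usage example, with two shards `[{0: [1.0, 1.0], 2: [0.5, 0.5]}, {1: [2.0, 0.0], 3: [0.0, 3.0]}]` and the filtered list `[0, 1, 3]`, core 0 handles address 0, core 1 handles addresses 1 and 3, and the reduction combines the two partial results into a single layer output.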

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06N, G06F

Patent Metadata

Filing Date

February 20, 2024

Publication Date

April 22, 2025
