US-11531619

High bandwidth memory system with crossbar switch for dynamically programmable distribution scheme

Published: December 20, 2022

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. Each request processing unit includes a plurality of decomposition units and a crossbar switch, the crossbar switch communicatively connecting each of the plurality of decomposition units to each of the plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access the plurality of memory units using a dynamically programmable distribution scheme.
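The abstract does not say how the dynamically programmable distribution scheme maps data to memory units. As a minimal sketch only, the Python below assumes a scheme parameterized by an access unit size and a workload identifier (both appear in the dependent claims) and a simple rotating interleave. The class name, field names, and the modulo formula are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributionScheme:
    """Hypothetical per-processing-element scheme; all names are illustrative."""
    num_memory_units: int   # e.g. 4 for north, east, south, and west units
    access_unit_size: int   # bytes kept together in a single memory unit
    workload_id: int        # identifier associated with the workload

    def memory_unit_for(self, address: int) -> int:
        """Return the index of the memory unit assumed to hold `address`."""
        access_unit = address // self.access_unit_size
        # Assumption: rotating by workload_id gives different workloads
        # different starting units. The patent does not state a formula.
        return (access_unit + self.workload_id) % self.num_memory_units


# Two processing elements programmed with different schemes.
scheme_a = DistributionScheme(num_memory_units=4, access_unit_size=128, workload_id=0)
scheme_b = DistributionScheme(num_memory_units=4, access_unit_size=256, workload_id=1)

placement_a = [scheme_a.memory_unit_for(addr) for addr in range(0, 1024, 128)]
placement_b = [scheme_b.memory_unit_for(addr) for addr in range(0, 1024, 128)]
```

Under these assumptions the same address range lands on different memory units for the two schemes, which is one way to picture the scheme being programmable per processing element.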

Patent Claims

17 claims

Legal claims defining the scope of protection, as filed with the USPTO.

2. The system of claim 1, wherein the first request processing unit of the first memory unit is configured to receive a broadcasted memory request.

3. The system of claim 2, wherein the broadcasted memory request references data stored in each of the plurality of memory units.

4. The system of claim 1, wherein the dynamically programmable distribution scheme utilizes an identifier associated with a workload of the first processing element.

5. The system of claim 1, wherein the first request processing unit is configured to determine whether each of the plurality of partial requests corresponds to corresponding data stored in a corresponding one of the first plurality of memory banks associated with the corresponding request processing unit.

6. The system of claim 1, wherein the first crossbar switch of the first request processing unit is configured to direct a first partial request for data stored in a corresponding one of the first plurality of memory banks to the corresponding memory bank and receive a retrieved data payload from the corresponding memory bank.

7. The system of claim 6, wherein the first request processing unit is configured to prepare a partial response using the retrieved data payload and provide the prepared partial response to the first processing element of the plurality of processing elements.

8. The system of claim 7, wherein the prepared partial response includes a corresponding sequence identifier ordering the partial response among a plurality of partial responses.

9. The system of claim 1, wherein the plurality of memory units includes a north memory unit, an east memory unit, a south memory unit, and a west memory unit.

10. The system of claim 1, wherein the plurality of processing elements are arranged in a two-dimensional array and the communication network communicatively includes a corresponding two-dimensional communication network connecting the plurality of processing elements.

11. The system of claim 10, wherein each decomposition unit of the plurality of decomposition units is configured to only receive a memory request from and only provide a response to processing elements located in a same row or column of the two-dimensional array.

12. The system of claim 4, wherein two or more processing elements of the plurality of processing elements share the identifier.

13. The system of claim 1, wherein a second processing element of the plurality of processing elements is configured with a different dynamically programmable distribution scheme for accessing memory units than the first processing element.

14. The system of claim 1, wherein the control logic unit of the first processing element is further configured with an access unit size for distributing data across the plurality of memory units.

15. The system of claim 1, wherein data elements of a machine learning weight matrix are distributed across the plurality of memory units using the dynamically programmable distribution scheme.

17. The method of claim 16, wherein the first memory unit includes a plurality of connections communicatively connecting the first memory unit to a processor, the processor includes the first processing element among a plurality of processing elements, and the received memory request is received at a first connection of the plurality of connections.

18. The method of claim 17, wherein the partial response is provided to the first processing element via the first connection.

20. The method of claim 19, wherein the first memory request was broadcasted to the plurality of memory units.
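The request flow recited across the claims above (a broadcasted request, decomposition into partial requests, a per-unit ownership check, routing to a bank, and partial responses carrying a sequence identifier) can be pictured with the rough Python model below. It is a sketch under stated assumptions, not the claimed implementation: the constants, the `owner` mapping, and every class and function name are hypothetical, requests are assumed to be aligned to the access unit size, and a dictionary lookup stands in for the crossbar switch.

```python
from __future__ import annotations

from dataclasses import dataclass

NUM_UNITS = 4        # assumed: north, east, south, and west memory units
BANKS_PER_UNIT = 4   # assumed number of banks per memory unit
ACCESS_UNIT = 4      # assumed access unit size in bytes (tiny, for illustration)


@dataclass(frozen=True)
class PartialResponse:
    sequence_id: int  # orders this response among all partial responses
    payload: bytes


def owner(index: int) -> tuple[int, int]:
    """Hypothetical mapping from access unit index to (memory unit, bank)."""
    return index % NUM_UNITS, (index // NUM_UNITS) % BANKS_PER_UNIT


class MemoryUnit:
    def __init__(self, unit_id: int, data: bytes):
        self.unit_id = unit_id
        # Each bank holds only the access units that this memory unit owns.
        self.banks = [dict() for _ in range(BANKS_PER_UNIT)]
        for start in range(0, len(data), ACCESS_UNIT):
            index = start // ACCESS_UNIT
            unit, bank = owner(index)
            if unit == unit_id:
                self.banks[bank][index] = data[start:start + ACCESS_UNIT]

    def handle(self, address: int, length: int) -> list[PartialResponse]:
        """Decompose a broadcast request and serve the locally stored parts."""
        first = address // ACCESS_UNIT
        last = (address + length - 1) // ACCESS_UNIT
        responses = []
        for seq, index in enumerate(range(first, last + 1)):
            unit, bank = owner(index)
            if unit != self.unit_id:
                continue  # this partial request targets another memory unit
            payload = self.banks[bank][index]  # stand-in for crossbar routing
            responses.append(PartialResponse(seq, payload))
        return responses


def read(units: list[MemoryUnit], address: int, length: int) -> bytes:
    """Broadcast one request, then reassemble by sequence identifier."""
    partials = [r for unit in units for r in unit.handle(address, length)]
    partials.sort(key=lambda r: r.sequence_id)
    return b"".join(r.payload for r in partials)


data = bytes(range(64))
units = [MemoryUnit(i, data) for i in range(NUM_UNITS)]
result = read(units, address=8, length=24)
```

In this model each memory unit answers only for the access units it stores, and the sequence identifier lets the requester restore the original order regardless of which unit responds first.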

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F G06N

Patent Metadata

Filing Date

December 17, 2019

Publication Date

December 20, 2022
