On-chip Atomic Transaction Engine

Published: May 17, 2022

Assignee: not available in USPTO data

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method, comprising: executing an instruction on a processor core of a plurality of processor cores, wherein individual ones of the processor cores are coupled via a memory interface to an atomic transaction engine instance and a memory arbitration component different from the atomic transaction engine instance, the executing comprising: identifying that the instruction comprises an access operation targeting a location in a distributed random access memory; sending, responsive to the identifying, parameters for the access operation to the respective atomic transaction engine instance via the respective memory interface; determining, by the atomic transaction engine instance based on the parameters received, that the targeted location of the distributed shared random access memory is controlled by another atomic transaction engine instance, and responsive to the determining: sending, by the atomic transaction engine instance, a request to perform the access operation to the other atomic transaction engine instance via an interconnect; and performing the access operation by the other atomic transaction engine instance, or initiating, by the other atomic transaction engine instance, the performing of the access operation by another processor core of the plurality of processor cores, wherein the performing comprises accessing a memory via the respective memory arbitration component of the other processor core.

2. The method of claim 1, the determining further comprising: executing additional instructions on the processor core; or determining, by pipeline circuitry within the processor core, that the targeted location of the distributed shared random access memory is controlled by another atomic transaction engine instance.

3. The method of claim 1, further comprising: receiving, by the atomic transaction engine instance from the other atomic transaction engine instance, response data for the request; and in response to said receiving: returning, by the atomic transaction engine instance, the response data to the processor core; or writing, by the atomic transaction engine instance, the response data to a location in memory from which the processor core expects to retrieve it.

4. The method of claim 1, wherein the access operation targets multiple portions of a distributed shared memory, each controlled by an atomic transaction engine instance coupled to a different processor core of the plurality of processor cores; wherein sending the request to perform the access operation comprising sending a request to respective ones of the atomic transaction engine instances coupled to different processor cores of the plurality of processor cores; and wherein performing the access operation comprises performing the access operation or initiating the performing of the access operation by respective ones of the atomic transaction engine instances coupled to different processor cores of the plurality of processor cores.

5. The method of claim 1, wherein performing the access operation by the other atomic transaction engine instance, or initiating, by the other atomic transaction engine instance, the performing of the access operation by the other processor core, comprises: determining if the access operation is performable by circuitry within the other atomic transaction engine instance without intervention by the other processor core; performing the access operation by the other atomic transaction engine instance responsive to determining that the access operation is performable by circuitry within the other atomic transaction engine instance without intervention by the other processor core; and initiating, by the other atomic transaction engine instance, the performing of the access operation by another processor core of the plurality of processor cores responsive to determining that the access operation is not performable by circuitry within the other atomic transaction engine instance without intervention by the other processor core.

6. The method of claim 1, wherein initiating, by the other atomic transaction engine instance, the performing of the access operation by another processor core of the plurality of processor cores comprises: writing information about the access operation into one or more storage locations that are accessible to the other processor core; and issuing an interrupt to the other processor core indicating that the access operation should be executed by the other processor core.
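
As an illustration only, the flow recited in claims 1, 5 and 6 can be sketched in software. This is a minimal model under stated assumptions, not the patented hardware: the class names `Core` and `Engine`, the helper `build_system`, the fixed `REGION_SIZE` ownership rule, the operation names (`read`, `write`, `add`, `swap`) and the choice of which operations the engine handles itself are all hypothetical. The memory arbitration component is not modeled, and a request for a locally controlled address is simply performed by the same engine.

```python
REGION_SIZE = 0x1000  # assumption: engine i controls addresses [i*0x1000, (i+1)*0x1000)


class Core:
    """Hypothetical stand-in for a processor core that its engine can interrupt."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.mailbox = []      # storage locations the engine writes and the core reads
        self.interrupts = 0

    def interrupt(self, memory):
        # The core services the interrupt by executing the posted operation itself.
        self.interrupts += 1
        op, addr, value = self.mailbox.pop(0)
        if op == "swap":       # assumed example of an operation the engine cannot perform
            old = memory.get(addr, 0)
            memory[addr] = value
            return old
        raise ValueError(f"unsupported operation: {op}")


class Engine:
    """Hypothetical atomic transaction engine instance paired with one core."""

    ENGINE_OPS = {"read", "write", "add"}  # assumed set handled by engine circuitry

    def __init__(self, engine_id, core):
        self.engine_id = engine_id
        self.core = core
        self.memory = {}       # the portion of shared memory this engine controls
        self.peers = {}        # engine_id -> Engine; stands in for the interconnect

    def submit(self, op, addr, value=None):
        """Entry point used by the local core: route by the owner of the address."""
        owner = addr // REGION_SIZE
        target = self if owner == self.engine_id else self.peers[owner]
        return target.perform(op, addr, value)

    def perform(self, op, addr, value):
        """Run on the controlling engine: do the work here or hand it to the core."""
        if op in self.ENGINE_OPS:
            old = self.memory.get(addr, 0)
            if op == "write":
                self.memory[addr] = value
                return None
            if op == "add":
                self.memory[addr] = old + value
            return old         # "read" and "add" both return the prior value
        # Not performable in the engine: post the operation, then interrupt the core.
        self.core.mailbox.append((op, addr, value))
        return self.core.interrupt(self.memory)


def build_system(n):
    """Create n engine/core pairs that can all reach each other."""
    engines = {i: Engine(i, Core(i)) for i in range(n)}
    for engine in engines.values():
        engine.peers = engines
    return engines
```

In this sketch, calling `build_system(2)` and then `engines[0].submit("add", 0x1004, 3)` forwards the request to engine 1, because address `0x1004` falls in the region that engine 1 is assumed to control. A `swap` request to the same address would instead be posted to core 1's mailbox and completed by the interrupt handler.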

7. A system, comprising: a plurality of processor cores, each coupled via a memory interface to an atomic transaction engine instance and a memory arbitration component different from the atomic transaction engine instance; wherein a processor core of the plurality of processor cores is configured to execute an instruction of a distributed application, wherein to execute the instruction, the processor core is configured to: identify that the instruction comprises an access operation targeting a location in a distributed random access memory; and send, responsive to the identifying, parameters for the access operation to the respective atomic transaction engine instance via the respective memory interface; wherein the atomic transaction engine instance is configured to: determine, based on the parameters received, that the targeted location of the distributed shared random access memory is controlled by another atomic transaction engine instance; and send, responsive to the determining, a request to perform the access operation to the other atomic transaction engine instance via an interconnect; and wherein the other atomic transaction engine instance is configured to: receive the request to perform the access operation from the atomic transaction engine instance via the interconnect; and perform the access operation or initiate performance of the access operation by another processor core of the plurality of processor cores, the performance comprising an access to a memory via the respective memory arbitration component of the other processor core.

8. The system of claim 7, wherein during performance of the access operation, the other atomic transaction engine instance is configured to collect response data associated with execution of the identified operation.

9. The system of claim 7, wherein the other atomic transaction engine instance comprises circuitry configured to perform access operations of a plurality of operation types; and wherein to determine if the access operation is performable by circuitry within the other atomic transaction engine instance without intervention by the other processor core, the other atomic transaction engine instance is configured to determine if the a type of the access operation is one of the plurality of operation types executable by circuitry within the other atomic transaction engine instance.

10. The system of claim 9, wherein the access operation comprises a sequence of operations identified by information in the received request, each of which is of an operation type that is performable by circuitry within the other atomic transaction engine instance.

11. The system of claim 7, wherein the other atomic transaction engine instance is configured to initiate the performance of the access operation by the other processor core in response to determining that the access operation is not performable by circuitry within the other atomic transaction engine instance without intervention by the other processor core; and wherein to initiate the performance of the access operation by the other processor core, the other atomic transaction engine instance is configured to: write information about the access operation into one or more storage locations that are accessible to the other processor core; and issue an interrupt to the other processor core indicating that the access operation should be executed by the other processor core.

12. The system of claim 11, wherein, in response to the interrupt, the other processor core is configured to perform the access operation.

13. The system of claim 7, wherein the other atomic transaction engine instance is further configured to: generate a response associated with performance of the access operation; and return the response to the atomic transaction engine instance.
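
The operation-type check, the sequence of operations, and the collected response described in claims 8 to 10 and 13 might look roughly like the following. Everything named here is an assumption made for illustration: the `OpType` members, the `ENGINE_OP_TYPES` set, and the `Op`, `Response` and `handle_request` names do not come from the patent. When any operation in the sequence is of a type the engine cannot execute, the sketch only reports that fact; handing the work to the core (claim 11) is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto


class OpType(Enum):
    LOAD = auto()
    STORE = auto()
    ADD = auto()
    CALL = auto()   # assumed example of a type that needs the processor core


# Assumed set of operation types executable by circuitry inside the engine.
ENGINE_OP_TYPES = frozenset({OpType.LOAD, OpType.STORE, OpType.ADD})


@dataclass(frozen=True)
class Op:
    kind: OpType
    addr: int
    value: int = 0


@dataclass
class Response:
    request_id: int
    handled_by_engine: bool
    data: list


def performable_by_engine(ops):
    """True only if every operation in the sequence has an engine-supported type."""
    return all(op.kind in ENGINE_OP_TYPES for op in ops)


def handle_request(request_id, ops, memory):
    """Execute a sequence of operations and collect response data along the way."""
    if not performable_by_engine(ops):
        # Nothing is executed; a fuller model would involve the core here.
        return Response(request_id, False, [])
    data = []
    for op in ops:
        if op.kind is OpType.LOAD:
            data.append(memory.get(op.addr, 0))
        elif op.kind is OpType.STORE:
            memory[op.addr] = op.value
        else:  # OpType.ADD: read-modify-write that reports the prior value
            old = memory.get(op.addr, 0)
            memory[op.addr] = old + op.value
            data.append(old)
    return Response(request_id, True, data)
```

Checking the whole sequence before executing any of it is a design choice of this sketch, so that a rejected request leaves memory untouched; the claims do not say when the check happens relative to execution.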

14. A system, comprising: a plurality of processor cores, each coupled via a memory interface to an atomic transaction engine instance and a memory arbitration component different from the atomic transaction engine instance; and a distributed shared memory, wherein each of the plurality of processor cores controls a respective portion of the distributed shared memory; wherein an atomic transaction engine instance coupled to a processor core of the plurality of processor cores comprises circuitry configured to: retrieve a request from a receive queue local to the atomic transaction engine instance, wherein the request identifies an operation that targets a location in the distributed shared memory that is controlled by the processor core coupled to the atomic transaction engine instance; execute the identified operation, the execution comprising an access to a memory via the respective memory arbitration component of the processor core coupled to the atomic transaction engine instance.

15. The system of claim 14, wherein the request was generated by another atomic transaction engine instance on behalf of a processor core of the plurality of processor cores coupled to the other atomic transaction engine instance and was communicated to the atomic transaction engine instance by the other atomic transaction engine instance.

16. The system of claim 14, wherein the request was generated by the atomic transaction engine instance on behalf of the processor core coupled to the atomic transaction engine instance and was placed in the receive queue by the atomic transaction engine.

17. The system of claim 14, wherein the receive queue is one of multiple receive queues local to the atomic transaction engine instance; and wherein each of the multiple receive queues stores requests received from respective different ones of the plurality of atomic transaction engine instances, or stores requests comprising operations of different types or having different priorities.

18. The system of claim 14, wherein atomic transaction engine instances coupled to respective ones of the plurality of processor cores communicate with each other over a dedicated low-latency interconnect.

19. The system of claim 14, wherein the system comprises two or more clusters of processor cores, each cluster comprising multiple processor cores and a respective crossbar over which respective atomic transaction instances associated with each of the multiple processor cores communicate with each other; and wherein atomic transaction instances coupled to respective processor cores in each of the clusters communicate with atomic transaction instances coupled to processor cores in other ones of the clusters over an interconnect between the respective crossbars.

20. The system of claim 14, wherein the processor core coupled to the atomic transaction engine instance comprises address decode circuitry configured to determine if an operation targets a location in the distributed shared memory; and wherein the processor core coupled to the atomic transaction engine instance is configured to: determine, during execution of a distributed application, that an operation of the distributed application targets a location in the distributed shared memory; and in response to determining that the operation of the distributed application targets a location in the distributed shared memory: refrain from advancing the operation within pipeline circuitry of processor core coupled to the atomic transaction engine instance; and provide information about the operation of the distributed application to the atomic transaction engine instance usable for processing the operation of the distributed application.
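
The receive-queue arrangement in claims 14 to 17 can be pictured with a small software model. It is a sketch under assumptions rather than a description of the claimed circuitry: the `ReceiveQueues` class, the `drain` helper, the tuple layout `(source, op, addr, value)`, and the rule that a lower number means higher priority are all hypothetical. Claim 17 also allows queues split by sender or by operation type; only the priority variant is shown.

```python
from collections import deque


class ReceiveQueues:
    """Hypothetical set of receive queues local to one engine, one per priority."""

    def __init__(self, priorities=(0, 1)):
        # Assumption: a lower number means a higher priority.
        self.queues = {p: deque() for p in sorted(priorities)}

    def enqueue(self, request, priority):
        """Place a request from the local engine or a remote engine in a queue."""
        self.queues[priority].append(request)

    def retrieve(self):
        """Return the oldest request from the highest-priority non-empty queue."""
        for queue in self.queues.values():
            if queue:
                return queue.popleft()
        return None


def drain(queues, memory):
    """Retrieve and execute requests until every queue is empty."""
    results = []
    while (request := queues.retrieve()) is not None:
        source, op, addr, value = request
        if op == "write":
            memory[addr] = value
            results.append((source, None))
        elif op == "read":
            results.append((source, memory.get(addr, 0)))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return results
```

A request tagged `"local"` here corresponds to claim 16, where the engine queues work on behalf of its own core, while a tag such as `"remote-3"` corresponds to claim 15, where another engine sent the request.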

Patent Metadata

Filing Date

Unknown

Publication Date

May 17, 2022

Inventors

Rishabh Jain

Erik M. Schlanger
