Providing Memory Dependence Prediction in Block-Atomic Dataflow Architectures

Published: June 16, 2020

Assignee: not available in the USPTO data

Inventors: Chen-Han Ho, Gregory Michael Wright

Patent Claims

23 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A processor-based device, based on a block-atomic dataflow architecture, comprising a hardware memory dependence prediction circuit, wherein the hardware memory dependence prediction circuit comprises: a predictor table configured to store a plurality of predictor table entries each comprising: a store instruction identifier corresponding to an instance of a store instruction; a block reach set comprising a plurality of block identifiers corresponding to a plurality of instruction blocks each containing one or more dependent load instructions having a memory dependence on the instance of the store instruction; and a load set comprising a plurality of load instruction identifiers each corresponding to a dependent load instruction of the one or more dependent load instructions; the hardware memory dependence prediction circuit configured to, upon a fetch of an instruction block by an execution pipeline of the processor-based device: determine, based on one or more store instruction identifiers of the predictor table, whether the instruction block contains a respective one or more store instructions that reach one or more dependent load instructions; and responsive to determining that the instruction block contains the respective one or more store instructions that reach one or more dependent load instructions, mark the respective one or more store instructions as having one or more dependent load instructions to wake.
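To make the structure in claim 1 concrete, the following is a minimal software sketch of a predictor table entry and the fetch-time marking step. It is illustrative only: the claim describes a hardware circuit, and the names `PredictorEntry` and `mark_stores_on_fetch`, as well as the `(block PC, logical order)` tuple used as an identifier, are assumptions made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class PredictorEntry:
    # Assumed encoding: (block PC, logical order of the store within the block).
    store_id: tuple
    # Identifiers of blocks that contain loads depending on this store.
    block_reach_set: set = field(default_factory=set)
    # Identifiers of the dependent loads themselves.
    load_set: set = field(default_factory=set)

def mark_stores_on_fetch(predictor_table, store_ids_in_block):
    """Return the stores of a fetched block that have dependent loads to wake."""
    known = {entry.store_id for entry in predictor_table if entry.load_set}
    return {store_id for store_id in store_ids_in_block if store_id in known}

table = [
    PredictorEntry(store_id=(0x100, 2),
                   block_reach_set={0x140},
                   load_set={(0x140, 1)}),
]
# The fetched block at PC 0x100 holds two stores; only the second is tracked.
marked = mark_stores_on_fetch(table, [(0x100, 0), (0x100, 2)])
print(marked)  # {(256, 2)}
```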

2. The processor-based device of claim 1, wherein the hardware memory dependence prediction circuit is further configured to: determine, based on one or more block reach sets and one or more load sets of the predictor table, whether the instruction block contains a first one or more dependent load instructions reached by one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and responsive to determining that the instruction block contains the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, delay execution of the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets upon execution of the instruction block.
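The load-side check in claim 2 can be sketched in the same spirit. In this hypothetical model the block reach set acts as a coarse filter (is the fetched block reached by the store at all?) and the load set then picks out the individual loads to hold back. The dictionary layout, the string block identifiers, and the function name `loads_to_delay` are assumptions for illustration, not the claimed circuit.

```python
def loads_to_delay(predictor_table, block_id, load_ids_in_block):
    """Return the loads in the fetched block that should be held back."""
    in_block = set(load_ids_in_block)
    delayed = set()
    for entry in predictor_table:
        # Coarse filter: skip entries whose store does not reach this block.
        if block_id not in entry["block_reach_set"]:
            continue
        # Fine filter: keep only loads that are in the entry's load set.
        delayed |= entry["load_set"] & in_block
    return delayed

table = [
    {"store_id": ("B1", 2),
     "block_reach_set": {"B2", "B3"},
     "load_set": {("B2", 0), ("B3", 1)}},
]
print(loads_to_delay(table, "B2", [("B2", 0), ("B2", 4)]))  # {('B2', 0)}
print(loads_to_delay(table, "B9", [("B9", 0)]))             # set()
```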

3. The processor-based device of claim 2, wherein the hardware memory dependence prediction circuit is further configured to: detect execution of a first store instruction; determine whether the first store instruction is marked as having one or more dependent load instructions to wake; and responsive to determining that the first store instruction is marked as having one or more dependent load instructions to wake: identify one or more delayed dependent load instructions of the first store instruction; and wake the one or more delayed dependent load instructions of the first store instruction for execution.

4. The processor-based device of claim 3, wherein the hardware memory dependence prediction circuit is configured to delay execution of the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets by being configured to: generate, based on the predictor table, a load delay marker identifying the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, responsive to determining that the instruction block contains the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and transfer, based on the load delay marker, the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets to a delay buffer.

5. The processor-based device of claim 4, wherein the hardware memory dependence prediction circuit is configured to wake the one or more delayed dependent load instructions of the first store instruction for execution by being configured to: generate, based on the predictor table, a wakeup mask identifying the one or more delayed dependent load instructions to wake; and transfer, based on the wakeup mask, the one or more delayed dependent load instructions from the delay buffer to the execution pipeline of the processor-based device for execution.
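Claims 3 through 5 describe parking predicted-dependent loads in a delay buffer and releasing them with a wakeup mask once the store executes. A minimal sketch of that flow follows, assuming a fixed number of buffer slots and a mask with one bit per slot; the claims do not specify the buffer size or the mask encoding, and the class `DelayBuffer` and its methods are hypothetical.

```python
class DelayBuffer:
    """Toy delay buffer: one slot per parked load, one mask bit per slot."""

    def __init__(self, size=8):
        self.slots = [None] * size

    def park(self, load_id):
        # Place the load in the first free slot (ValueError if the buffer is full).
        index = self.slots.index(None)
        self.slots[index] = load_id
        return index

    def wakeup_mask(self, load_set):
        # Set bit i when slot i holds a load that depends on the executed store.
        mask = 0
        for i, load_id in enumerate(self.slots):
            if load_id is not None and load_id in load_set:
                mask |= 1 << i
        return mask

    def release(self, mask):
        # Hand the selected loads back to the pipeline and free their slots.
        woken = []
        for i in range(len(self.slots)):
            if mask & (1 << i):
                woken.append(self.slots[i])
                self.slots[i] = None
        return woken

buf = DelayBuffer()
buf.park(("B2", 0))
buf.park(("B3", 1))
mask = buf.wakeup_mask({("B3", 1)})  # load set of the store that just executed
woken = buf.release(mask)
print(bin(mask), woken)  # 0b10 [('B3', 1)]
```

A bit mask is used here only because it maps naturally onto a per-slot wake signal; any structure that identifies the loads to wake would fit the claim language equally well.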

6. The processor-based device of claim 2, wherein the hardware memory dependence prediction circuit is further configured to: detect a first memory dependence violation resulting from execution of a first dependent load instruction prior to a corresponding first store instruction; determine that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction; and responsive to determining that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction, generate a new predictor table entry containing a store instruction identifier corresponding to the corresponding first store instruction, a block reach set containing a block identifier corresponding to an instruction block of the first dependent load instruction, and a load set containing a load instruction identifier corresponding to the first dependent load instruction.

7. The processor-based device of claim 2, wherein the hardware memory dependence prediction circuit is further configured to: detect a second memory dependence violation resulting from execution of a second dependent load instruction prior to a corresponding second store instruction; determine that the predictor table stores a predictor table entry having a second store instruction identifier corresponding to the corresponding second store instruction; and responsive to determining that the predictor table stores the predictor table entry having the second store instruction identifier corresponding to the corresponding second store instruction: determine whether a load instruction identifier corresponding to the second dependent load instruction is present in a load set of the predictor table entry; responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is not present in the load set of the predictor table entry: add a load instruction identifier corresponding to the second dependent load instruction to the load set of the predictor table entry; determine whether a block reach set of the predictor table entry contains a block identifier corresponding to an instruction block of the second dependent load instruction; and responsive to determining that the block reach set of the predictor table entry does not contain a block identifier corresponding to the instruction block of the second dependent load instruction, add a block identifier corresponding to the instruction block of the second dependent load instruction to the block reach set of the predictor table entry.
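The training behavior in claims 6 and 7 (allocate an entry on the first violation for a store, extend the existing entry on later violations) can be sketched as follows. The table is modeled as a dictionary keyed by store identifier; that layout, the string identifiers, and the name `train_on_violation` are assumptions for this example, and capacity limits or replacement policy are deliberately left out because the claims do not describe them.

```python
def train_on_violation(table, store_id, load_id, load_block_id):
    """Update the predictor table after a load executed ahead of its store."""
    entry = table.get(store_id)
    if entry is None:
        # Claim 6: no entry for this store yet, so allocate one.
        table[store_id] = {
            "block_reach_set": {load_block_id},
            "load_set": {load_id},
        }
        return
    # Claim 7: entry exists; record the load and its block if they are new.
    if load_id not in entry["load_set"]:
        entry["load_set"].add(load_id)
        if load_block_id not in entry["block_reach_set"]:
            entry["block_reach_set"].add(load_block_id)

table = {}
train_on_violation(table, ("B1", 2), ("B2", 0), "B2")  # first violation
train_on_violation(table, ("B1", 2), ("B3", 1), "B3")  # new load, new block
train_on_violation(table, ("B1", 2), ("B3", 1), "B3")  # already known, no change
print(table)
```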

8. The processor-based device of claim 7, wherein: the plurality of predictor table entries each further comprises a plurality of confidence indicators corresponding to the plurality of load instruction identifiers of the load set; the hardware memory dependence prediction circuit is further configured to, responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is present in the load set of the predictor table entry, increment a confidence indicator corresponding to the load instruction identifier corresponding to the second dependent load instruction; and the hardware memory dependence prediction circuit is configured to delay execution of the first one or more dependent load instructions reached by the one or more store instructions further responsive to one or more confidence indicators respectively corresponding to one or more load instruction identifiers corresponding to the first one or more dependent load instructions exceeding a confidence threshold.

9. The processor-based device of claim 8, wherein the hardware memory dependence prediction circuit is further configured to: detect execution of a delayed dependent load instruction; determine whether a predicted memory dependence for the delayed dependent load instruction is confirmed; and responsive to determining that a predicted memory dependence for the delayed dependent load instruction is not confirmed, decrement a confidence indicator for the delayed dependent load instruction in the predictor table.
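The confidence mechanism of claims 8 and 9 can be illustrated with per-load counters. In this sketch the counters saturate at `CONF_MAX` and a load is delayed only when its counter exceeds `CONF_THRESHOLD`; both constants, the saturating behavior, and the function names are assumptions, since the claims state only that the indicator is incremented, decremented, and compared against a threshold.

```python
CONF_MAX = 3        # assumed saturation limit (e.g. a 2-bit counter)
CONF_THRESHOLD = 1  # assumed threshold; delay only when confidence exceeds it

def on_repeat_violation(confidence, load_id):
    # Claim 8: the load is already in the load set and violated again.
    confidence[load_id] = min(confidence.get(load_id, 0) + 1, CONF_MAX)

def on_unconfirmed_prediction(confidence, load_id):
    # Claim 9: a delayed load turned out not to depend on the store.
    confidence[load_id] = max(confidence.get(load_id, 0) - 1, 0)

def should_delay(confidence, load_id):
    return confidence.get(load_id, 0) > CONF_THRESHOLD

confidence = {}
load = ("B2", 0)
on_repeat_violation(confidence, load)
on_repeat_violation(confidence, load)
delay_before = should_delay(confidence, load)   # counter is 2, above threshold
on_unconfirmed_prediction(confidence, load)
delay_after = should_delay(confidence, load)    # counter is 1, not above threshold
print(delay_before, delay_after)  # True False
```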

10. The processor-based device of claim 1, wherein each store instruction identifier comprises one of the group consisting of: a program counter (PC) of an instruction block containing the instance of the store instruction and an indication of a logical order of the store instruction within the instruction block containing the instance of the store instruction; the PC of the instruction block containing the instance of the store instruction and an offset indicating a location of the store instruction relative to a start of the instruction block containing the instance of the store instruction; and a memory address of the store instruction.

11. The processor-based device of claim 1, wherein each load instruction identifier of the plurality of load instruction identifiers comprises one of the group consisting of: a PC of an instruction block containing a dependent load instruction corresponding to the load instruction identifier and an indication of a logical order of the dependent load instruction within the instruction block containing the dependent load instruction corresponding to the load instruction identifier; the PC of the instruction block and an offset indicating a location of the dependent load instruction corresponding to the load instruction identifier relative to a start of the instruction block containing the dependent load instruction corresponding to the load instruction identifier; and a memory address of the dependent load instruction corresponding to the load instruction identifier.
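Claims 10 and 11 list three alternative identifier forms for stores and loads. The sketch below models them as three small record types. The type names are hypothetical, and the `to_address` helper rests on two assumptions that the claims do not state: that the block PC equals the address of the start of the block, and that the offset is measured in the same units as addresses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ByLogicalOrder:
    block_pc: int   # PC of the instruction block containing the instruction
    order: int      # logical order of the instruction within that block

@dataclass(frozen=True)
class ByOffset:
    block_pc: int   # PC of the instruction block containing the instruction
    offset: int     # location relative to the start of the block

@dataclass(frozen=True)
class ByAddress:
    address: int    # memory address of the instruction itself

def to_address(ident: ByOffset) -> ByAddress:
    # Assumes block_pc is the block's start address and offset uses address units.
    return ByAddress(ident.block_pc + ident.offset)

print(to_address(ByOffset(block_pc=0x100, offset=8)))  # ByAddress(address=264)
```

Frozen dataclasses are hashable, so any of the three forms could serve directly as a set member or table key in a software model.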

12. The processor-based device of claim 1 integrated into an integrated circuit (IC).

13. The processor-based device of claim 1, wherein the processor-based device is selected from the group consisting of: a set top box; a navigation device; a communications device; a fixed location data unit; a mobile location data unit; a global positioning system (GPS) device; a mobile phone; a cellular phone; a smart phone; a session initiation protocol (SIP) phone; a tablet; a server; a computer; a portable computer; a mobile computing device; a wearable computing device; a desktop computer; a personal digital assistant (PDA); a monitor; a computer monitor; a television; a tuner; a radio; a satellite radio; a music player; a digital music player; a portable music player; a digital video player; a video player; a digital video disc (DVD) player; a portable digital video player; an automobile; a vehicle component; avionics systems; a drone; and a multicopter.

14. A method for providing memory dependence prediction, comprising: detecting, by a memory dependence prediction circuit of a processor-based device based on a block-atomic dataflow architecture, a fetch of an instruction block by an execution pipeline of the processor-based device; upon detecting the fetch of the instruction block, determining, based on one or more store instruction identifiers of a predictor table, whether the instruction block contains a respective one or more store instructions that reach one or more dependent load instructions; responsive to determining that the instruction block contains the respective one or more store instructions that reach one or more dependent load instructions, marking the respective one or more store instructions as having one or more dependent load instructions to wake; determining, based on one or more block reach sets and one or more load sets of the predictor table, whether the instruction block contains one or more dependent load instructions reached by one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, wherein each block reach set comprises a plurality of block identifiers corresponding to a plurality of instruction blocks each containing one or more dependent load instructions, and each load set comprises a plurality of load instruction identifiers each corresponding to a dependent load instruction of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and responsive to determining that the instruction block contains the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, delaying execution of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets upon execution of the instruction block.

15. The method of claim 14, further comprising: detecting execution of a store instruction; determining whether the store instruction is marked as having one or more dependent load instructions to wake; and responsive to determining that the store instruction is marked as having one or more dependent load instructions to wake: identifying one or more delayed dependent load instructions of the store instruction; and waking the one or more delayed dependent load instructions of the store instruction for execution.

16. The method of claim 15, wherein delaying execution of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets comprises: generating, based on the one or more block reach sets and the one or more load sets of the predictor table, a load delay marker identifying the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, responsive to determining that the instruction block contains the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and transferring, based on the load delay marker, the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets to a delay buffer.

17. The method of claim 16, wherein waking the one or more delayed dependent load instructions of the store instruction for execution comprises: generating, based on the one or more block reach sets and the one or more load sets of the predictor table, a wakeup mask identifying the one or more delayed dependent load instructions to wake; and transferring, based on the wakeup mask, the one or more delayed dependent load instructions from the delay buffer to the execution pipeline of the processor-based device for execution.

18. The method of claim 14, further comprising: detecting a first memory dependence violation resulting from execution of a first dependent load instruction prior to a corresponding first store instruction; determining that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction; and responsive to determining that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction, generating a new predictor table entry containing a store instruction identifier corresponding to the corresponding first store instruction, a block reach set containing a block identifier corresponding to an instruction block of the first dependent load instruction, and a load set containing a load instruction identifier corresponding to the first dependent load instruction.

19. The method of claim 14, further comprising: detecting a second memory dependence violation resulting from execution of a second dependent load instruction prior to a corresponding second store instruction; determining that the predictor table stores a predictor table entry having a second store instruction identifier corresponding to the corresponding second store instruction; and responsive to determining that the predictor table stores the predictor table entry having the second store instruction identifier corresponding to the corresponding second store instruction: determining whether a load instruction identifier corresponding to the second dependent load instruction is present in a load set of the predictor table entry; responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is not present in the load set of the predictor table entry: adding a load instruction identifier corresponding to the second dependent load instruction to the load set of the predictor table entry; determining whether a block reach set of the predictor table entry contains a block identifier corresponding to an instruction block of the second dependent load instruction; and responsive to determining that the block reach set of the predictor table entry does not contain a block identifier corresponding to the instruction block of the second dependent load instruction, adding a block identifier corresponding to the instruction block of the second dependent load instruction to the block reach set of the predictor table entry.

20. The method of claim 19, wherein: the predictor table further comprises a plurality of confidence indicators corresponding to the plurality of load instruction identifiers of the load set; the method further comprises, responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is present in the load set of the predictor table entry, incrementing a confidence indicator corresponding to the load instruction identifier corresponding to the second dependent load instruction; and delaying execution of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets is further responsive to one or more confidence indicators respectively corresponding to one or more load instruction identifiers corresponding to the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets exceeding a confidence threshold.

21. The method of claim 20, further comprising: detecting execution of a delayed dependent load instruction; determining whether a predicted memory dependence for the delayed dependent load instruction is confirmed; and responsive to determining that a predicted memory dependence for the delayed dependent load instruction is not confirmed, decrementing a confidence indicator for the delayed dependent load instruction in the predictor table.

22. The method of claim 14, wherein each store instruction identifier of the one or more store instruction identifiers comprises one of the group consisting of: a program counter (PC) of an instruction block containing an instance of a corresponding store instruction and an indication of a logical order of the store instruction within the instruction block containing the instance of the corresponding store instruction; the PC of the instruction block containing the instance of the corresponding store instruction and an offset indicating a location of the store instruction relative to a start of the instruction block containing the instance of the corresponding store instruction; and a memory address of the store instruction.

23. The method of claim 14, wherein each load instruction identifier of the plurality of load instruction identifiers comprises one of the group consisting of: a PC of an instruction block containing a dependent load instruction corresponding to the load instruction identifier and an indication of a logical order of the dependent load instruction within the instruction block containing the dependent load instruction corresponding to the load instruction identifier; the PC of the instruction block and an offset indicating a location of the dependent load instruction corresponding to the load instruction identifier relative to a start of the instruction block containing the dependent load instruction corresponding to the load instruction identifier; and a memory address of the dependent load instruction corresponding to the load instruction identifier.

Patent Metadata

Filing Date: Unknown

Publication Date: June 16, 2020

Inventors: Chen-Han Ho, Gregory Michael Wright
