10684859

Providing Memory Dependence Prediction in Block-Atomic Dataflow Architectures

Published: June 16, 2020
Assignee: not available in USPTO data we have
Technical Abstract

Patent Claims
23 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A processor-based device, based on a block-atomic dataflow architecture, comprising a hardware memory dependence prediction circuit, wherein the hardware memory dependence prediction circuit comprises: a predictor table configured to store a plurality of predictor table entries each comprising: a store instruction identifier corresponding to an instance of a store instruction; a block reach set comprising a plurality of block identifiers corresponding to a plurality of instruction blocks each containing one or more dependent load instructions having a memory dependence on the instance of the store instruction; and a load set comprising a plurality of load instruction identifiers each corresponding to a dependent load instruction of the one or more dependent load instructions; the hardware memory dependence prediction circuit configured to, upon a fetch of an instruction block by an execution pipeline of the processor-based device: determine, based on one or more store instruction identifiers of the predictor table, whether the instruction block contains a respective one or more store instructions that reach one or more dependent load instructions; and responsive to determining that the instruction block contains the respective one or more store instructions that reach one or more dependent load instructions, mark the respective one or more store instructions as having one or more dependent load instructions to wake.

Plain English Translation

A processor-based device with a block-atomic dataflow architecture includes a hardware memory dependence prediction circuit to optimize memory access operations. The circuit predicts dependencies between store and load instructions to improve execution efficiency. The predictor table stores entries for store instructions, each entry containing a store instruction identifier, a block reach set, and a load set. The block reach set lists instruction blocks containing load instructions dependent on the store instruction, while the load set lists individual dependent load instructions. When an instruction block is fetched, the circuit checks if it contains store instructions that affect dependent load instructions. If such dependencies are found, the store instructions are marked to wake their dependent loads, ensuring correct execution order while minimizing stalls. This approach enhances performance by reducing unnecessary memory access delays in dataflow architectures.
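
The predictor table entry and the fetch-time marking step described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the patented hardware: the names `PredictorEntry` and `mark_stores_on_fetch` are hypothetical, identifiers are modeled as `(block PC, logical order)` tuples, and the table is a plain dictionary keyed by store instruction identifier.

```python
from dataclasses import dataclass, field


@dataclass
class PredictorEntry:
    # Hypothetical identifier format: (block PC, logical order within block).
    store_id: tuple
    # Identifiers of the blocks that contain loads dependent on this store.
    block_reach_set: set = field(default_factory=set)
    # Identifiers of the individual dependent loads.
    load_set: set = field(default_factory=set)


def mark_stores_on_fetch(predictor_table, store_ids_in_block):
    """Return the stores of a fetched block that have dependent loads to wake."""
    marked = set()
    for store_id in store_ids_in_block:
        entry = predictor_table.get(store_id)
        # A store is marked only if the table knows of loads it reaches.
        if entry is not None and entry.load_set:
            marked.add(store_id)
    return marked


table = {(0x100, 0): PredictorEntry((0x100, 0), {0x200}, {(0x200, 3)})}
marked = mark_stores_on_fetch(table, [(0x100, 0), (0x100, 1)])
print(marked)  # {(256, 0)}
```

Only the first store of the fetched block is marked, because it is the only one with a predictor table entry.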

Claim 2

Original Legal Text

2. The processor-based device of claim 1 , wherein the hardware memory dependence prediction circuit is further configured to: determine, based on one or more block reach sets and one or more load sets of the predictor table, whether the instruction block contains a first one or more dependent load instructions reached by one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and responsive to determining that the instruction block contains the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, delay execution of the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets upon execution of the instruction block.

Plain English Translation

A processor-based device includes a hardware memory dependence prediction circuit designed to optimize instruction execution by predicting and managing memory dependencies. The circuit analyzes instruction blocks to identify load instructions that may depend on prior store instructions, using block reach sets and load sets stored in a predictor table. These sets track relationships between store and load instructions to determine potential dependencies. When the circuit detects that an instruction block contains load instructions dependent on store instructions, it delays execution of those dependent load instructions until the corresponding store instructions have executed. This prevents incorrect data reads and ensures memory consistency, improving processor efficiency and performance by reducing stalls and mispredictions. The system dynamically tracks instruction dependencies in hardware, allowing for real-time adjustments to execution order without software intervention. This approach is particularly useful in high-performance computing environments where memory access patterns are complex and unpredictable.
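
A minimal sketch of the load-side check, assuming the predictor table is a dictionary of entries that each hold a `block_reach_set` and a `load_set`. The function name `loads_to_delay` and the string identifiers are hypothetical. The block reach set acts as a coarse filter, and the load set then picks out the individual loads to delay.

```python
def loads_to_delay(predictor_table, block_id, load_ids_in_block):
    """Return the loads of a fetched block that are predicted to depend on a store."""
    delayed = set()
    candidates = set(load_ids_in_block)
    for entry in predictor_table.values():
        # Coarse filter: skip stores that do not reach this block at all.
        if block_id not in entry["block_reach_set"]:
            continue
        # Fine filter: keep only the loads named in the entry's load set.
        delayed |= entry["load_set"] & candidates
    return delayed


table = {
    "st0": {"block_reach_set": {"B2", "B3"}, "load_set": {"B2.ld1", "B3.ld0"}},
    "st1": {"block_reach_set": {"B4"}, "load_set": {"B4.ld2"}},
}
print(loads_to_delay(table, "B2", ["B2.ld0", "B2.ld1"]))  # {'B2.ld1'}
```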

Claim 3

Original Legal Text

3. The processor-based device of claim 2 , wherein the hardware memory dependence prediction circuit is further configured to: detect execution of a first store instruction; determine whether the first store instruction is marked as having one or more dependent load instructions to wake; and responsive to determining that the first store instruction is marked as having one or more dependent load instructions to wake: identify one or more delayed dependent load instructions of the first store instruction; and wake the one or more delayed dependent load instructions of the first store instruction for execution.

Plain English Translation

Building on claim 2, the hardware memory dependence prediction circuit also handles the wake-up side of the mechanism. When a store instruction executes, the circuit checks whether that store was marked, at block fetch, as having dependent load instructions to wake. If it was, the circuit identifies the delayed dependent load instructions associated with that store and wakes them for execution. Loads that were held back because of a predicted dependence are therefore released once the store they wait on has executed, which preserves correct memory ordering without stalling the loads longer than necessary.
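
The wake-up step can be illustrated with a short sketch. It assumes the set of marked stores, the predictor table, and the set of currently delayed loads are available as ordinary Python containers; the function name `on_store_executed` and the string identifiers are hypothetical.

```python
def on_store_executed(store_id, marked_stores, predictor_table, delayed_loads):
    """Wake the delayed loads that depend on a store that has just executed."""
    if store_id not in marked_stores:
        return []  # Unmarked store: nothing is waiting on it.
    entry = predictor_table.get(store_id, {"load_set": set()})
    # Only loads that are both delayed and in this store's load set are woken.
    woken = sorted(load for load in delayed_loads if load in entry["load_set"])
    delayed_loads.difference_update(woken)
    return woken


table = {"st0": {"load_set": {"ld1", "ld2"}}}
delayed = {"ld1", "ld9"}
print(on_store_executed("st0", {"st0"}, table, delayed))  # ['ld1']
print(delayed)  # {'ld9'}
```

The load `ld9` stays delayed because it is not in the load set of `st0`.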

Claim 4

Original Legal Text

4. The processor-based device of claim 3 , wherein the hardware memory dependence prediction circuit is configured to delay execution of the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets by being configured to: generate, based on the predictor table, a load delay marker identifying the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, responsive to determining that the instruction block contains the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and transfer, based on the load delay marker, the first one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets to a delay buffer.

Plain English Translation

The invention relates to processor-based devices with hardware memory dependence prediction circuits designed to optimize instruction execution by managing memory dependencies. The problem addressed is the inefficiency in modern processors where dependent load instructions (instructions that rely on prior store operations) may execute prematurely, leading to incorrect results or performance penalties due to subsequent corrections. The hardware memory dependence prediction circuit identifies and delays execution of dependent load instructions that are reached by store instructions within a block of instructions. It uses a predictor table to generate a load delay marker for these dependent load instructions. When the instruction block contains such dependent load instructions, the circuit transfers them to a delay buffer, preventing premature execution until the corresponding store operations complete. This ensures correct memory ordering and improves processor efficiency by avoiding speculative execution errors and reducing pipeline stalls. The predictor table helps track relationships between store and load instructions, enabling accurate prediction of dependencies. The delay buffer temporarily holds the dependent load instructions until the necessary store operations are resolved, ensuring data consistency and performance optimization.

Claim 5

Original Legal Text

5. The processor-based device of claim 4 , wherein the hardware memory dependence prediction circuit is configured to wake the one or more delayed dependent load instructions of the first store instruction for execution by being configured to: generate, based on the predictor table, a wakeup mask identifying the one or more delayed dependent load instructions to wake; and transfer, based on the wakeup mask, the one or more delayed dependent load instructions from the delay buffer to the execution pipeline of the processor processor-based device for execution.

Plain English Translation

Building on claim 4, this claim specifies how delayed load instructions are woken. When a marked store instruction executes, the circuit generates, based on the predictor table, a wakeup mask identifying which delayed dependent load instructions to wake. It then uses the wakeup mask to transfer those load instructions from the delay buffer to the execution pipeline of the processor-based device for execution. Loads selected by the mask proceed as soon as their store has executed, while loads not selected remain in the delay buffer.
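
One way to picture the load delay marker of claim 4 and the wakeup mask of claim 5 is as bit masks over load slots. The claims do not specify an encoding, so the one-bit-per-load layout below is an assumption, and the class name `DelayBuffer` and its methods are hypothetical.

```python
class DelayBuffer:
    """Holds delayed loads, indexed by their slot within the instruction block."""

    def __init__(self):
        self.slots = {}  # slot index -> delayed load

    def park(self, block_loads, load_delay_marker):
        """Move loads whose marker bit is set into the buffer; return the rest."""
        ready = []
        for slot, load in enumerate(block_loads):
            if (load_delay_marker >> slot) & 1:
                self.slots[slot] = load
            else:
                ready.append(load)
        return ready

    def wake(self, wakeup_mask):
        """Release buffered loads whose mask bit is set, in slot order."""
        woken = []
        for slot in sorted(self.slots):
            if (wakeup_mask >> slot) & 1:
                woken.append(self.slots.pop(slot))
        return woken


buf = DelayBuffer()
print(buf.park(["ld_a", "ld_b", "ld_c"], 0b101))  # ['ld_b']
print(buf.wake(0b001))  # ['ld_a']
print(buf.wake(0b110))  # ['ld_c']
```

The second wakeup mask also has the bit for slot 1 set, but that load was never delayed, so only `ld_c` is released.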

Claim 6

Original Legal Text

6. The processor-based device of claim 2 , wherein the hardware memory dependence prediction circuit is further configured to: detect a first memory dependence violation resulting from execution of a first dependent load instruction prior to a corresponding first store instruction; determine that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction; and responsive to determining that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction, generate a new predictor table entry containing a store instruction identifier corresponding to the corresponding first store instruction, a block reach set containing a block identifier corresponding to an instruction block of the first dependent load instruction, and a load set containing a load instruction identifier corresponding to the first dependent load instruction.

Plain English Translation

A processor-based device includes a hardware memory dependence prediction circuit designed to improve performance by predicting and handling memory dependence violations. The circuit detects when a dependent load instruction executes before its corresponding store instruction, a condition known as a memory dependence violation. If the predictor table lacks an entry for the store instruction involved in the violation, the circuit generates a new entry. This entry includes an identifier for the store instruction, a block reach set identifying the instruction block containing the dependent load, and a load set with the load instruction identifier. The predictor table is used to anticipate future memory dependence violations, allowing the processor to optimize instruction scheduling and reduce stalls. This mechanism enhances efficiency by dynamically learning and storing patterns of memory dependence violations, enabling proactive adjustments to execution flow. The system is particularly useful in high-performance computing environments where minimizing pipeline stalls is critical. The hardware-based approach ensures low-latency decision-making, improving overall system throughput.

Claim 7

Original Legal Text

7. The processor-based device of claim 2 , wherein the hardware memory dependence prediction circuit is further configured to: detect a second memory dependence violation resulting from execution of a second dependent load instruction prior to a corresponding second store instruction; determine that the predictor table stores a predictor table entry having a second store instruction identifier corresponding to the corresponding second store instruction; and responsive to determining that the predictor table stores the predictor table entry having the second store instruction identifier corresponding to the corresponding second store instruction: determine whether a load instruction identifier corresponding to the second dependent load instruction is present in a load set of the predictor table entry; responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is not present in the load set of the predictor table entry: add a load instruction identifier corresponding to the second dependent load instruction to the load set of the predictor table entry; determine whether a block reach set of the predictor table entry contains a block identifier corresponding to an instruction block of the second dependent load instruction; and responsive to determining that the block reach set of the predictor table entry does not contain a block identifier corresponding to the instruction block of the second dependent load instruction, add a block identifier corresponding to the instruction block of the second dependent load instruction to the block reach set of the predictor table entry.

Plain English Translation

A processor-based device includes a hardware memory dependence prediction circuit designed to improve performance by predicting and managing memory dependence violations. The circuit detects when a dependent load instruction executes before its corresponding store instruction, creating a memory dependence violation. To address this, the circuit checks a predictor table for an entry associated with the store instruction. If found, the circuit determines whether the load instruction's identifier is already in the entry's load set. If not, it adds the load identifier to the load set. The circuit also checks if the instruction block of the load instruction is in the entry's block reach set. If not, it adds the block identifier to the block reach set. This mechanism helps track and predict memory dependencies, allowing the processor to optimize instruction execution and reduce performance penalties caused by such violations. The predictor table dynamically updates to reflect new dependencies, improving accuracy over time. This approach enhances efficiency in out-of-order execution by minimizing stalls and reordering penalties.
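
The training behaviour of claims 6 and 7 can be sketched as a single update routine. The dictionary-based table and the name `train_on_violation` are hypothetical. The nesting follows the claim language: the block reach set is checked only when a new load identifier is added.

```python
def train_on_violation(predictor_table, store_id, load_id, load_block_id):
    """Update the predictor table after a load executed before its store."""
    entry = predictor_table.get(store_id)
    if entry is None:
        # Claim 6: no entry for this store yet, so allocate one.
        predictor_table[store_id] = {
            "block_reach_set": {load_block_id},
            "load_set": {load_id},
        }
        return
    # Claim 7: the entry exists, so extend it if the load is new.
    if load_id not in entry["load_set"]:
        entry["load_set"].add(load_id)
        if load_block_id not in entry["block_reach_set"]:
            entry["block_reach_set"].add(load_block_id)


table = {}
train_on_violation(table, "st0", "ld1", "B2")  # allocates a new entry
train_on_violation(table, "st0", "ld4", "B3")  # extends the existing entry
print(sorted(table["st0"]["load_set"]))         # ['ld1', 'ld4']
print(sorted(table["st0"]["block_reach_set"]))  # ['B2', 'B3']
```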

Claim 8

Original Legal Text

8. The processor-based device of claim 7 , wherein: the plurality of predictor table entries each further comprises a plurality of confidence indicators corresponding to the plurality of load instruction identifiers of the load set; the hardware memory dependence prediction circuit is further configured to, responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is present in the load set of the predictor table entry, increment a confidence indicator corresponding to the load instruction identifier corresponding to the second dependent load instruction; and the hardware memory dependence prediction circuit is configured to delay execution of the first one or more dependent load instructions reached by the one or more store instructions further responsive to one or more confidence indicators respectively corresponding to one or more load instruction identifiers corresponding to the first one or more dependent load instructions exceeding a confidence threshold.

Plain English Translation

Building on claim 7, each predictor table entry also holds a confidence indicator for each load instruction identifier in its load set. When a memory dependence violation occurs and the violating load's identifier is already present in the entry's load set, the circuit increments the confidence indicator for that load. The circuit then delays a dependent load instruction only if its confidence indicator exceeds a confidence threshold. A load that has violated a dependence only occasionally is therefore allowed to execute without delay, while a load that violates repeatedly is held until its store has executed.

Claim 9

Original Legal Text

9. The processor-based device of claim 8 , wherein the hardware memory dependence prediction circuit is further configured to: detect execution of a delayed dependent load instruction; determine whether a predicted memory dependence for the delayed dependent load instruction is confirmed; and responsive to determining that a predicted memory dependence for the delayed dependent load instruction is not confirmed, decrement a confidence indicator for the delayed dependent load instruction in the predictor table.

Plain English Translation

Building on claim 8, this claim lets the predictor unlearn dependencies that stop occurring. A memory dependence exists when a load instruction reads data written by an earlier store instruction; predicting a dependence that does not actually occur delays the load for no benefit. When a delayed dependent load instruction executes, the circuit determines whether the predicted memory dependence was confirmed. If it was not, the circuit decrements the confidence indicator for that load in the predictor table. Combined with the increments of claim 8, repeated unconfirmed predictions lower the indicator until it no longer exceeds the confidence threshold, at which point the load is no longer delayed.
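
The confidence mechanism of claims 8 and 9 can be sketched as a small counter per load. The claims do not give a threshold, a counter width, or saturation behaviour, so `CONFIDENCE_THRESHOLD`, `MAX_CONFIDENCE`, the clamping at zero, and all function names below are assumptions made for illustration.

```python
CONFIDENCE_THRESHOLD = 1  # hypothetical value
MAX_CONFIDENCE = 3        # hypothetical saturation limit


def record_violation(confidence, load_id):
    """Claim 8: raise confidence when a known load violates again."""
    if load_id in confidence:
        confidence[load_id] = min(confidence[load_id] + 1, MAX_CONFIDENCE)
    else:
        confidence[load_id] = 0  # first sighting: load just added to the load set


def record_delayed_execution(confidence, load_id, dependence_confirmed):
    """Claim 9: lower confidence when a predicted dependence is not confirmed."""
    if not dependence_confirmed and load_id in confidence:
        confidence[load_id] = max(confidence[load_id] - 1, 0)


def should_delay(confidence, load_id):
    """Delay only loads whose confidence exceeds the threshold."""
    return confidence.get(load_id, 0) > CONFIDENCE_THRESHOLD


conf = {}
for _ in range(3):
    record_violation(conf, "ld1")       # confidence goes 0, 1, 2
print(should_delay(conf, "ld1"))        # True
record_delayed_execution(conf, "ld1", dependence_confirmed=False)
print(should_delay(conf, "ld1"))        # False
```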

Claim 10

Original Legal Text

10. The processor-based device of claim 1 , wherein each store instruction identifier comprises one of the group consisting of: a program counter (PC) of an instruction block containing the instance of the store instruction and an indication of a logical order of the store instruction within the instruction block containing the instance of the store instruction; the PC of the instruction block containing the instance of the store instruction and an offset indicating a location of the store instruction relative to a start of the instruction block containing the instance of the store instruction; and a memory address of the store instruction.

Plain English Translation

Building on claim 1, this claim defines what a store instruction identifier may be. Each identifier takes one of three forms: (1) the program counter (PC) of the instruction block containing the store instruction, together with an indication of the store's logical order within that block; (2) the PC of that instruction block, together with an offset giving the store's location relative to the start of the block; or (3) the memory address of the store instruction itself. Any of the three forms distinguishes one store instruction from another, so the predictor table can be searched by store instruction identifier when an instruction block is fetched.

Claim 11

Original Legal Text

11. The processor-based device of claim 1 , wherein each load instruction identifier of the plurality of load instruction identifiers comprises one of the group consisting of: a PC of an instruction block containing a dependent load instruction corresponding to the load instruction identifier and an indication of a logical order of the dependent load instruction within the instruction block containing the dependent load instruction corresponding to the load instruction identifier; the PC of the instruction block and an offset indicating a location of the dependent load instruction corresponding to the load instruction identifier relative to a start of the instruction block containing the dependent load instruction corresponding to the load instruction identifier; and a memory address of the dependent load instruction corresponding to the load instruction identifier.

Plain English Translation

Building on claim 1, this claim defines what a load instruction identifier in a load set may be. Each identifier takes one of three forms: (1) the program counter (PC) of the instruction block containing the dependent load instruction, together with an indication of the load's logical order within that block; (2) the PC of that instruction block, together with an offset giving the load's location relative to the start of the block; or (3) the memory address of the dependent load instruction itself. These are the same three forms that claim 10 allows for store instruction identifiers, and they let the circuit match the loads in a fetched instruction block against the load sets in the predictor table.
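
The three identifier forms in claims 10 and 11 can be modeled as small value types. The class names below are hypothetical, and the conversion helper assumes that the block PC is the byte address of the block's first instruction and that the offset is measured in bytes; the claims do not state either detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockOrderId:
    block_pc: int  # PC of the containing instruction block
    order: int     # logical order of the instruction within the block


@dataclass(frozen=True)
class BlockOffsetId:
    block_pc: int  # PC of the containing instruction block
    offset: int    # location relative to the start of the block


@dataclass(frozen=True)
class AddressId:
    address: int   # memory address of the instruction itself


def to_address_id(ident: BlockOffsetId) -> AddressId:
    """Convert form (2) to form (3), assuming a byte offset from the block start."""
    return AddressId(ident.block_pc + ident.offset)


# Frozen dataclasses are hashable, so identifiers can be stored in a load set.
load_set = {BlockOffsetId(0x2000, 0x18)}
print(BlockOffsetId(0x2000, 0x18) in load_set)          # True
print(to_address_id(BlockOffsetId(0x2000, 0x18)))       # AddressId(address=8216)
```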

Claim 12

Original Legal Text

12. The processor-based device of claim 1 integrated into an integrated circuit (IC).

Plain English Translation

Building on claim 1, this claim covers the processor-based device, including its hardware memory dependence prediction circuit and predictor table, when it is integrated into an integrated circuit (IC). The claim adds no further functional features; it only specifies the physical form in which the device is implemented.

Claim 13

Original Legal Text

13. The processor-based device of claim 1 , wherein the processor-based device is selected from the group consisting of: a set top box; a navigation device; a communications device; a fixed location data unit; a mobile location data unit; a global positioning system (GPS) device; a mobile phone; a cellular phone; a smart phone; a session initiation protocol (SIP) phone; a tablet; a server; a computer; a portable computer; a mobile computing device; a wearable computing device; a desktop computer; a personal digital assistant (PDA); a monitor; a computer monitor; a television; a tuner; a radio; a satellite radio; a music player; a digital music player; a portable music player; a digital video player; a video player; a digital video disc (DVD) player; a portable digital video player; an automobile; a vehicle component; avionics systems; a drone; and a multicopter.

Plain English Translation

Building on claim 1, this claim lists the kinds of products the processor-based device may be. The list covers consumer electronics (set top boxes, televisions, monitors, tuners, radios, and music and video players), communications devices (mobile, cellular, smart, and SIP phones), computing devices (tablets, servers, desktop and portable computers, wearable and mobile computing devices, and PDAs), location equipment (navigation devices, GPS devices, and fixed and mobile location data units), and vehicles and aircraft systems (automobiles, vehicle components, avionics systems, drones, and multicopters). The claim adds no new features to the memory dependence prediction circuit; it only identifies the products in which the device of claim 1 may be embodied.

Claim 14

Original Legal Text

14. A method for providing memory dependence prediction, comprising: detecting, by a memory dependence prediction circuit of a processor-based device based on a block-atomic dataflow architecture, a fetch of an instruction block by an execution pipeline of the processor-based device; upon detecting the fetch of the instruction block, determining, based on one or more store instruction identifiers of a predictor table, whether the instruction block contains a respective one or more store instructions that reach one or more dependent load instructions; responsive to determining that the instruction block contains the respective one or more store instructions that reach one or more dependent load instructions, marking the respective one or more store instructions as having one or more dependent load instructions to wake; determining, based on one or more block reach sets and one or more load sets of the predictor table, whether the instruction block contains one or more dependent load instructions reached by one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, wherein each block reach set comprises a plurality of block identifiers corresponding to a plurality of instruction blocks each containing one or more dependent load instructions, and each load set comprises a plurality of load instruction identifiers each corresponding to a dependent load instruction of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and responsive to determining that the instruction block contains the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, delaying execution of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets upon execution of the instruction block.

Plain English Translation

The invention relates to memory dependence prediction in processor-based devices using a block-atomic dataflow architecture. The problem addressed is efficiently identifying and managing dependencies between store and load instructions to optimize execution performance. The method involves a memory dependence prediction circuit that detects the fetch of an instruction block by the processor's execution pipeline. Upon detection, the circuit checks a predictor table to determine if the instruction block contains store instructions that affect dependent load instructions. If such store instructions are found, they are marked to wake dependent load instructions. The circuit then checks the predictor table's block reach sets and load sets to identify dependent load instructions reached by store instructions. Block reach sets contain identifiers for instruction blocks with dependent load instructions, while load sets contain identifiers for specific dependent load instructions. If dependent load instructions are found, their execution is delayed until the corresponding store instructions complete. This approach improves processor efficiency by reducing unnecessary stalls and ensuring correct memory access ordering.

Claim 15

Original Legal Text

15. The method of claim 14 , further comprising: detecting execution of a store instruction; determining whether the store instruction is marked as having one or more dependent load instructions to wake; and responsive to determining that the store instruction is marked as having one or more dependent load instructions to wake: identifying one or more delayed dependent load instructions of the store instruction; and waking the one or more delayed dependent load instructions of the store instruction for execution.

Plain English Translation

This claim adds the wake-up half of the mechanism to the method of claim 14. When a store instruction executes, the memory dependence prediction circuit checks whether that store was marked as having one or more dependent load instructions to wake. If it was, the circuit identifies the delayed load instructions that depend on the store and wakes them for execution. A store that was not marked triggers no wake-up. Loads therefore stay delayed only until the store they are predicted to depend on has executed, which keeps memory accesses correctly ordered without stalling the loads longer than necessary.
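A minimal sketch of the store-execution step, assuming the delayed loads are kept in a dictionary that maps each load to the store it waits on. The function name on_store_execute and this bookkeeping are hypothetical; the claim does not say how delayed loads are recorded.

```python
def on_store_execute(store_id, marked_stores, delayed_loads):
    """Wake the delayed loads of an executed store.

    marked_stores: set of stores marked as having loads to wake.
    delayed_loads: dict mapping a delayed load ID to the store ID it waits on.
    Returns the list of loads woken (empty if the store was not marked).
    """
    if store_id not in marked_stores:
        return []
    # Identify the delayed dependent loads of this store.
    woken = [ld for ld, st in delayed_loads.items() if st == store_id]
    # Waking a load removes it from the delayed set.
    for ld in woken:
        del delayed_loads[ld]
    return woken

marked = {"st_a"}
delayed = {"ld_x": "st_a", "ld_y": "st_b", "ld_z": "st_a"}
```

Executing st_a wakes ld_x and ld_z and leaves ld_y delayed, because ld_y waits on a different store.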

Claim 16

Original Legal Text

16. The method of claim 15 , wherein delaying execution of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets comprises: generating, based on the one or more block reach sets and the one or more load sets of the predictor table, a load delay marker identifying the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets, responsive to determining that the instruction block contains the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets; and transferring, based on the load delay marker, the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets to a delay buffer.

Plain English Translation

This claim specifies how the method of claim 15 delays loads that are predicted to be dependent. When the fetched instruction block contains dependent load instructions reached by stores tracked in the predictor table, the circuit generates a load delay marker from the table's block reach sets and load sets. The marker identifies which load instructions in the block must wait. Based on the marker, those loads are transferred to a delay buffer instead of being allowed to execute, so other instructions can proceed while the loads wait. The loads remain in the delay buffer until they are woken when the corresponding store instruction executes, as in claim 15.
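One way to picture the load delay marker is as a bitmask with one bit per load in the fetched block. The claim does not fix the marker's encoding, so the bitmask, the function names, and the list-based delay buffer below are assumptions for illustration only.

```python
def make_load_delay_marker(block_id, block_loads, entries):
    """Build a bitmask: bit i is set when load i of the block must be delayed.

    entries: list of (block_reach_set, load_set) pairs from the predictor table.
    """
    marker = 0
    for block_reach, load_set in entries:
        if block_id not in block_reach:
            continue  # this store is not predicted to reach the block
        for i, load_id in enumerate(block_loads):
            if load_id in load_set:
                marker |= 1 << i
    return marker

def transfer_to_delay_buffer(block_loads, marker, delay_buffer):
    """Move marked loads into delay_buffer; return the loads free to issue."""
    ready = []
    for i, load_id in enumerate(block_loads):
        if (marker >> i) & 1:
            delay_buffer.append(load_id)
        else:
            ready.append(load_id)
    return ready

block_loads = ["ld0", "ld1", "ld2"]
entries = [({0x200}, {"ld1"}), ({0x300}, {"ld0"}), ({0x200, 0x400}, {"ld2"})]
marker = make_load_delay_marker(0x200, block_loads, entries)
delay_buffer = []
ready = transfer_to_delay_buffer(block_loads, marker, delay_buffer)
```

Here the marker is 0b110: ld1 and ld2 go to the delay buffer, while ld0 is free to issue because the only entry naming it does not reach block 0x200.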

Claim 17

Original Legal Text

17. The method of claim 16 , wherein waking the one or more delayed dependent load instructions of the store instruction for execution comprises: generating, based on the one or more block reach sets and the one or more load sets of the predictor table, a wakeup mask identifying the one or more delayed dependent load instructions to wake; and transferring, based on the wakeup mask, the one or more delayed dependent load instructions from the delay buffer to the execution pipeline of the processor-based device for execution.

Plain English Translation

This invention relates to processor-based devices and specifically addresses the challenge of efficiently managing dependent load instructions in a processor pipeline. The problem arises when a store instruction is executed, potentially affecting subsequent load instructions that depend on the stored data. If these dependent load instructions are not properly synchronized, they may execute before the store completes, leading to incorrect results. The invention provides a solution by using a predictor table to track dependencies between store and load instructions and a delay buffer to temporarily hold dependent load instructions until the store instruction completes. The predictor table stores block reach sets and load sets for each store instruction, which identify potential dependent load instructions. When a store instruction is executed, the predictor table is used to generate a wakeup mask that identifies which dependent load instructions in the delay buffer are now ready for execution. The wakeup mask is then used to transfer the appropriate delayed dependent load instructions from the delay buffer to the execution pipeline, ensuring they execute only after the store instruction has completed. This mechanism improves processor efficiency by reducing stalls and ensuring correct execution order without unnecessary delays.
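The wakeup mask can be sketched in the same spirit, here as a bitmask over delay buffer slots. The slot layout, the (block ID, load ID) tuples, and the function names are hypothetical; the claim only requires that the mask identify the delayed loads to wake and that they be transferred to the execution pipeline.

```python
def make_wakeup_mask(block_reach, load_set, delay_buffer):
    """Bit s is set when delay buffer slot s holds a load of the executed store.

    delay_buffer: list of (block_id, load_id) tuples, or None for an empty slot.
    """
    mask = 0
    for slot, item in enumerate(delay_buffer):
        if item is None:
            continue
        block_id, load_id = item
        if block_id in block_reach and load_id in load_set:
            mask |= 1 << slot
    return mask

def wake(mask, delay_buffer, pipeline):
    """Transfer the masked loads from the delay buffer to the pipeline."""
    for slot in range(len(delay_buffer)):
        if (mask >> slot) & 1:
            pipeline.append(delay_buffer[slot])
            delay_buffer[slot] = None  # free the slot

delay_buffer = [(0x200, "ld1"), (0x300, "ld7"), (0x200, "ld2")]
pipeline = []
mask = make_wakeup_mask({0x200}, {"ld1", "ld2"}, delay_buffer)
wake(mask, delay_buffer, pipeline)
```

Slots 0 and 2 are woken (mask 0b101), and the unrelated load in slot 1 stays in the buffer.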

Claim 18

Original Legal Text

18. The method of claim 14 , further comprising: detecting a first memory dependence violation resulting from execution of a first dependent load instruction prior to a corresponding first store instruction; determining that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction; and responsive to determining that the predictor table does not store a predictor table entry having a first store instruction identifier corresponding to the corresponding first store instruction, generating a new predictor table entry containing a store instruction identifier corresponding to the corresponding first store instruction, a block reach set containing a block identifier corresponding to an instruction block of the first dependent load instruction, and a load set containing a load instruction identifier corresponding to the first dependent load instruction.

Plain English Translation

This claim covers how the predictor table learns a dependence it has not seen before. A memory dependence violation occurs when a dependent load instruction executes before the store instruction it depends on, so the load reads stale data. When the circuit detects such a violation, it checks the predictor table for an entry whose store instruction identifier matches the violating store. If no such entry exists, it generates a new entry containing three things: the store instruction identifier, a block reach set holding the block identifier of the instruction block that contains the load, and a load set holding the load instruction identifier. On later fetches, this entry lets the circuit predict the dependence and delay the load rather than repeat the violation.
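The allocation step can be sketched as follows, assuming the predictor table is a dictionary keyed by store identifier. The function name and the dictionary-of-sets layout are illustrative assumptions, not the patent's hardware organization.

```python
def on_violation_allocate(table, store_id, load_id, load_block_id):
    """Create a predictor table entry for a store seen in a first violation.

    Returns True if a new entry was created, False if one already existed.
    """
    if store_id in table:
        return False  # an existing entry is updated by a separate path
    table[store_id] = {
        "block_reach": {load_block_id},  # block containing the violating load
        "load_set": {load_id},           # the violating load itself
    }
    return True

table = {}
created = on_violation_allocate(table, "st_a", "ld_x", 0x200)
```

After the call, the table holds one entry for st_a that reaches block 0x200 and names ld_x; calling it again for the same store creates nothing.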

Claim 19

Original Legal Text

19. The method of claim 14 , further comprising: detecting a second memory dependence violation resulting from execution of a second dependent load instruction prior to a corresponding second store instruction; determining that the predictor table stores a predictor table entry having a second store instruction identifier corresponding to the corresponding second store instruction; and responsive to determining that the predictor table stores the predictor table entry having the second store instruction identifier corresponding to the corresponding second store instruction: determining whether a load instruction identifier corresponding to the second dependent load instruction is present in a load set of the predictor table entry; responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is not present in the load set of the predictor table entry: adding a load instruction identifier corresponding to the second dependent load instruction to the load set of the predictor table entry; determining whether a block reach set of the predictor table entry contains a block identifier corresponding to an instruction block of the second dependent load instruction; and responsive to determining that the block reach set of the predictor table entry does not contain a block identifier corresponding to the instruction block of the second dependent load instruction, adding a block identifier corresponding to the instruction block of the second dependent load instruction to the block reach set of the predictor table entry.

Plain English Translation

This invention relates to improving memory dependence prediction in computer processors, specifically addressing the problem of detecting and resolving memory dependence violations that occur when a load instruction executes before its corresponding store instruction. The system uses a predictor table to track store instructions and their associated load instructions to predict and mitigate such violations. The method involves detecting a memory dependence violation when a dependent load instruction executes before its corresponding store instruction. The predictor table is queried to determine if an entry exists for the store instruction. If found, the system checks whether the load instruction's identifier is already in the entry's load set. If not, the load instruction identifier is added to the load set. Additionally, the system checks if the instruction block of the load instruction is in the entry's block reach set. If not, the block identifier is added to the block reach set. This mechanism helps the processor predict and handle memory dependence violations more efficiently, reducing performance penalties by avoiding unnecessary stalls or re-executions. The predictor table dynamically updates to reflect new dependencies, improving accuracy over time.
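The update path for an existing entry can be sketched as below. The sketch follows the plain-English reading above, in which the load set and the block reach set are each checked and extended as needed; the function name and the dictionary-of-sets entry are assumptions for illustration.

```python
def on_violation_update(entry, load_id, load_block_id):
    """Extend an existing predictor entry after a violation.

    entry: dict with 'load_set' and 'block_reach' sets.
    Returns (load_added, block_added).
    """
    load_added = block_added = False
    if load_id not in entry["load_set"]:
        entry["load_set"].add(load_id)
        load_added = True
    if load_block_id not in entry["block_reach"]:
        entry["block_reach"].add(load_block_id)
        block_added = True
    return load_added, block_added

entry = {"load_set": {"ld_x"}, "block_reach": {0x200}}
first = on_violation_update(entry, "ld_z", 0x200)   # new load, known block
second = on_violation_update(entry, "ld_w", 0x300)  # new load, new block
third = on_violation_update(entry, "ld_w", 0x300)   # nothing new
```

A new load in an already-reached block adds only a load identifier, a new load in a new block adds both, and a repeated violation adds nothing.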

Claim 20

Original Legal Text

20. The method of claim 19 , wherein: the predictor table further comprises a plurality of confidence indicators corresponding to the plurality of load instruction identifiers of the load set; the method further comprises, responsive to determining that a load instruction identifier corresponding to the second dependent load instruction is present in the load set of the predictor table entry, incrementing a confidence indicator corresponding to the load instruction identifier corresponding to the second dependent load instruction; and delaying execution of the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets is further responsive to one or more confidence indicators respectively corresponding to one or more load instruction identifiers corresponding to the one or more dependent load instructions reached by the one or more store instructions corresponding to the one or more block reach sets and the one or more load sets exceeding a confidence threshold.

Plain English Translation

This claim adds confidence tracking to the method of claim 19. The predictor table holds a confidence indicator for each load instruction identifier in a load set. When a memory dependence violation is detected and the violating load's identifier is already present in the load set of the store's predictor table entry, the confidence indicator for that load identifier is incremented. Delaying a dependent load then requires more than a matching table entry: the load's confidence indicator must also exceed a confidence threshold. Loads whose predicted dependence has not recurred often enough are therefore allowed to execute without delay, which limits stalls caused by low-confidence predictions.
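A small sketch of the confidence logic, under stated assumptions: the threshold value of 1, the initial confidence of 0 for a newly added load, and the function names are all hypothetical, since the claim specifies only that confidence is incremented on a repeat violation and compared against a threshold.

```python
CONFIDENCE_THRESHOLD = 1  # hypothetical value; the claim does not give one

def record_violation(confidence, load_id):
    """Update per-load confidence after a memory dependence violation.

    confidence: dict mapping load IDs in a load set to their indicators.
    """
    if load_id in confidence:
        confidence[load_id] += 1   # load already in the load set: increment
    else:
        confidence[load_id] = 0    # newly added load; initial value is assumed

def should_delay(confidence, load_id, threshold=CONFIDENCE_THRESHOLD):
    """Delay only when the load is tracked and its confidence exceeds the threshold."""
    return load_id in confidence and confidence[load_id] > threshold

confidence = {}
decisions = []
for _ in range(3):
    record_violation(confidence, "ld_x")
    decisions.append(should_delay(confidence, "ld_x"))
```

With these assumed values, the load is delayed only after its third violation, when its confidence reaches 2 and strictly exceeds the threshold of 1.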

Claim 21

Original Legal Text

21. The method of claim 20 , further comprising: detecting execution of a delayed dependent load instruction; determining whether a predicted memory dependence for the delayed dependent load instruction is confirmed; and responsive to determining that a predicted memory dependence for the delayed dependent load instruction is not confirmed, decrementing a confidence indicator for the delayed dependent load instruction in the predictor table.

Plain English Translation

This claim adds the downward adjustment to the confidence tracking of claim 20. When a delayed dependent load instruction executes, the circuit determines whether the memory dependence that was predicted for it is confirmed. If the dependence is not confirmed, meaning the load was delayed unnecessarily, the circuit decrements the confidence indicator for that load in the predictor table. Together with the increment in claim 20, this lets confidence rise when violations recur and fall when a prediction proves wrong, so that a load with an unreliable prediction eventually falls to or below the threshold and stops being delayed.
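The decrement can be sketched as a counter update on execution of a delayed load. Flooring the counter at zero is an assumption borrowed from common saturating-counter practice; the claim says only that the indicator is decremented, and the function name is hypothetical.

```python
def on_delayed_load_executed(confidence, load_id, dependence_confirmed):
    """Lower a load's confidence when its predicted dependence was not confirmed.

    confidence: dict mapping load IDs to confidence indicators.
    """
    if dependence_confirmed:
        return  # the prediction was right; this claim leaves the indicator alone
    # Assumed floor at zero so the indicator never goes negative.
    confidence[load_id] = max(0, confidence.get(load_id, 0) - 1)

confidence = {"ld_x": 2}
on_delayed_load_executed(confidence, "ld_x", dependence_confirmed=True)
after_confirmed = confidence["ld_x"]
on_delayed_load_executed(confidence, "ld_x", dependence_confirmed=False)
after_miss = confidence["ld_x"]
```

A confirmed dependence leaves the indicator at 2, and an unconfirmed one lowers it to 1.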

Claim 22

Original Legal Text

22. The method of claim 14 , wherein each store instruction identifier of the one or more store instruction identifiers comprises one of the group consisting of: a program counter (PC) of an instruction block containing an instance of a corresponding store instruction and an indication of a logical order of the store instruction within the instruction block containing the instance of the corresponding store instruction; the PC of the instruction block containing the instance of the corresponding store instruction and an offset indicating a location of the store instruction relative to a start of the instruction block containing the instance of the corresponding store instruction; and a memory address of the store instruction.

Plain English Translation

This claim specifies what a store instruction identifier in the predictor table can be. Each identifier takes one of three forms: (1) the program counter (PC) of the instruction block containing the store instruction, together with an indication of the store's logical order within that block; (2) the PC of the instruction block together with an offset giving the store's location relative to the start of the block; or (3) the memory address of the store instruction itself. Any of these forms lets the memory dependence prediction circuit match a store in a fetched instruction block against the entries of the predictor table.
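The three identifier forms can be modeled as three small value types that all work as predictor table keys. The class and field names below are hypothetical, and the example values are arbitrary; frozen dataclasses are used so that identifiers of different forms never compare equal even when their numeric fields match.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockPcAndOrder:
    block_pc: int  # PC of the instruction block containing the instruction
    order: int     # logical order of the instruction within the block

@dataclass(frozen=True)
class BlockPcAndOffset:
    block_pc: int  # PC of the instruction block containing the instruction
    offset: int    # location relative to the start of the block

@dataclass(frozen=True)
class InstructionAddress:
    address: int   # memory address of the instruction itself

# All three forms are hashable, so any of them can key a predictor table.
predictor_table = {
    BlockPcAndOrder(0x1000, 2): "entry A",
    BlockPcAndOffset(0x1000, 2): "entry B",
    InstructionAddress(0x1008): "entry C",
}
```

The first two keys carry the same numbers but remain distinct entries, because a logical order and an offset are different kinds of information.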

Claim 23

Original Legal Text

23. The method of claim 14 , wherein each load instruction identifier of the plurality of load instruction identifiers comprises one of the group consisting of: a PC of an instruction block containing a dependent load instruction corresponding to the load instruction identifier and an indication of a logical order of the dependent load instruction within the instruction block containing the dependent load instruction corresponding to the load instruction identifier; the PC of the instruction block and an offset indicating a location of the dependent load instruction corresponding to the load instruction identifier relative to a start of the instruction block containing the dependent load instruction corresponding to the load instruction identifier; and a memory address of the dependent load instruction corresponding to the load instruction identifier.

Plain English Translation

This claim specifies what a load instruction identifier in a load set can be, mirroring the store instruction identifier forms of claim 22. Each identifier takes one of three forms: (1) the program counter (PC) of the instruction block containing the dependent load instruction, together with an indication of the load's logical order within that block; (2) the PC of the instruction block together with an offset giving the load's location relative to the start of the block; or (3) the memory address of the dependent load instruction. Any of these forms lets the memory dependence prediction circuit pick out the specific dependent loads in a fetched instruction block so that only those loads are delayed.

Patent Metadata

Filing Date

Unknown

Publication Date

June 16, 2020

Inventors

Chen-Han Ho
Gregory Michael Wright

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “PROVIDING MEMORY DEPENDENCE PREDICTION IN BLOCK-ATOMIC DATAFLOW ARCHITECTURES” (10684859). https://patentable.app/patents/10684859

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10684859. See llms.txt for full attribution policy.