A method, system and computer program product embodied on a computer-readable medium are provided for managing the execution of out-of-order instructions. The method includes the steps of receiving a plurality of instructions and identifying a subset of instructions in the plurality of instructions to be executed out-of-order.
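The two steps named above (receiving instructions, then identifying the subset to be executed out-of-order) can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each instruction carries an optional `queue_id` metadata field and that a missing value means the instruction stays in order. The names `Instruction`, `queue_id`, and `identify_out_of_order` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instruction:
    opcode: str
    # Hypothetical metadata field: None means "execute in order".
    queue_id: Optional[int] = None


def identify_out_of_order(instructions: List[Instruction]) -> List[Instruction]:
    """Return the instructions whose metadata marks them for out-of-order execution."""
    return [inst for inst in instructions if inst.queue_id is not None]


program = [
    Instruction("MOV"),
    Instruction("TEX", queue_id=0),
    Instruction("FADD", queue_id=1),
    Instruction("ST"),
]
subset = identify_out_of_order(program)
```

Here `subset` holds only the `TEX` and `FADD` instructions, in their original program order.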
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method, comprising: receiving a plurality of instructions by a processor; identifying, by a scheduler unit within the processor, a subset of instructions in the plurality of instructions to be executed out-of-order by inspecting metadata associated with the plurality of instructions; and for each instruction in the subset of instructions, the scheduler unit adds the instruction to a virtual queue specified by metadata for the instruction, wherein the virtual queue is one of a plurality of virtual queues implemented in an on-chip random access memory (RAM), and wherein each virtual queue in the plurality of virtual queues is implemented as a circular FIFO in the RAM.
2. The method of claim 1, wherein the metadata includes an identifier that specifies a particular virtual queue in the plurality of virtual queues.
3. The method of claim 1, wherein the metadata is embedded within the instructions.
4. The method of claim 1, wherein the subset of instructions to be executed out-of-order include arithmetic instructions.
5. The method of claim 1, further comprising enabling hazard protection for input registers associated with a particular instruction when at least one of the instructions to be executed out-of-order is dispatched.
6. The method of claim 1, further comprising disabling hazard protection for the input registers associated with the particular instruction when the particular instruction completes execution.
7. The method of claim 1, wherein the metadata is stored separately from the instructions.
8. The method of claim 1, wherein the processor includes a scheduling unit that is configured to identify the subset of instructions and add the instructions to the plurality of virtual queues.
9. The method of claim 8, wherein the processor includes a dispatch unit that is configured to receive an instruction from a particular virtual queue associated with a functional unit of the processor and issue the instruction from the particular virtual queue only after a delay of N clock cycles since a previous instruction from the particular virtual queue was issued by the dispatch unit.
10. The method of claim 1, wherein the plurality of instructions implement a reductive texture sampling program that includes one or more texture operations.
11. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform steps comprising: receiving a plurality of instructions by the processor; identifying, by a scheduler unit within the processor, a subset of instructions in the plurality of instructions to be executed out-of-order by inspecting metadata associated with the plurality of instructions; and for each instruction in the subset of instructions, the scheduler unit adds the instruction to a virtual queue specified by metadata for the instruction, wherein the virtual queue is one of a plurality of virtual queues implemented in an on-chip random access memory (RAM), and wherein each virtual queue in the plurality of virtual queues is implemented as a circular FIFO in the RAM.
12. The non-transitory computer-readable storage medium of claim 11, the steps further comprising enabling hazard protection for input registers associated with a particular instruction when at least one of the instructions to be executed out-of-order is dispatched.
13. The non-transitory computer-readable storage medium of claim 12, the steps further comprising disabling hazard protection for the input registers associated with the particular instruction when the particular instruction completes execution.
14. The non-transitory computer-readable storage medium of claim 11, wherein the metadata is embedded within the instructions.
15. The non-transitory computer-readable storage medium of claim 11, wherein the processor includes a scheduling unit that is configured to identify the subset of instructions and add the instructions to the plurality of virtual queues; and a dispatch unit that is configured to receive an instruction from a particular virtual queue associated with a functional unit of the processor and issue the instruction from the particular virtual queue only after a delay of N clock cycles since a previous instruction from the particular virtual queue was issued by the dispatch unit.
16. The non-transitory computer-readable storage medium of claim 11, wherein the plurality of instructions implement a reductive texture sampling program that includes one or more texture operations.
17. A system, comprising: a processor configured to: receive a plurality of instructions; identify, by a scheduler unit within the processor, a subset of instructions in the plurality of instructions to be executed out-of-order by inspecting metadata associated with the plurality of instructions; and for each instruction in the subset of instructions, the scheduler unit adds the instruction to a virtual queue specified by metadata for the instruction, wherein the virtual queue is one of a plurality of virtual queues implemented in an on-chip random access memory (RAM), and wherein each virtual queue in the plurality of virtual queues is implemented as a circular FIFO in the RAM.
18. The system of claim 17, wherein the processor comprises a graphics processing unit.
19. The system of claim 17, wherein the metadata is embedded within the instructions.
20. The system of claim 17, wherein the processor includes a scheduling unit that is configured to identify the subset of instructions and add the instructions to the plurality of virtual queues; and a dispatch unit that is configured to receive an instruction from a particular virtual queue associated with a functional unit of the processor and issue the instruction from the particular virtual queue only after a delay of N clock cycles since a previous instruction from the particular virtual queue was issued by the dispatch unit.
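The mechanisms recited in the claims above can be sketched in software as a rough illustration. The Python below models (a) several virtual queues laid out as circular FIFOs in one flat array standing in for the on-chip RAM, (b) a dispatch unit that will not issue from a queue until N cycles have passed since that queue's previous issue, and (c) hazard protection that is enabled on dispatch and disabled on completion. All class names, the equal-depth queue layout, the reading of "a delay of N clock cycles" as `cycle - last_issue >= N`, and the reference-counted hazard tracking are assumptions made for this sketch; the claims do not specify them.

```python
class VirtualQueues:
    """Fixed-depth circular FIFOs carved out of one flat 'RAM' list (assumed layout)."""

    def __init__(self, num_queues, depth):
        self.depth = depth
        self.ram = [None] * (num_queues * depth)  # stand-in for on-chip RAM
        self.head = [0] * num_queues              # read offset within each queue
        self.count = [0] * num_queues             # entries currently stored

    def push(self, qid, item):
        if self.count[qid] == self.depth:
            return False                          # queue full; caller must stall
        tail = (self.head[qid] + self.count[qid]) % self.depth
        self.ram[qid * self.depth + tail] = item
        self.count[qid] += 1
        return True

    def pop(self, qid):
        if self.count[qid] == 0:
            return None                           # queue empty
        item = self.ram[qid * self.depth + self.head[qid]]
        self.head[qid] = (self.head[qid] + 1) % self.depth  # wrap around
        self.count[qid] -= 1
        return item


class DispatchUnit:
    """Issues from a queue only if N cycles have passed since its last issue."""

    def __init__(self, queues, num_queues, delay_n):
        self.queues = queues
        self.delay_n = delay_n
        self.last_issue = [None] * num_queues     # cycle of previous issue per queue

    def try_issue(self, qid, cycle):
        last = self.last_issue[qid]
        if last is not None and cycle - last < self.delay_n:
            return None                           # still inside the N-cycle window
        item = self.queues.pop(qid)
        if item is not None:
            self.last_issue[qid] = cycle
        return item


class HazardTracker:
    """Counts in-flight readers per register (one possible protection scheme)."""

    def __init__(self):
        self.readers = {}

    def on_dispatch(self, input_regs):
        for reg in input_regs:                    # enable protection
            self.readers[reg] = self.readers.get(reg, 0) + 1

    def on_complete(self, input_regs):
        for reg in input_regs:                    # disable protection
            self.readers[reg] -= 1
            if self.readers[reg] == 0:
                del self.readers[reg]

    def is_protected(self, reg):
        return reg in self.readers
```

A scheduler in this model would call `push(queue_id, instruction)` for each instruction whose metadata names a queue, and the dispatch unit would call `try_issue` once per cycle for each queue. Keeping every queue in a single array mirrors the claim language about a plurality of queues sharing one RAM, though real hardware could size or place the queues differently.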
July 18, 2013
April 9, 2019