Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction configured to cause the processor to output a first data value to a first address in a first data cache, outputting, by the processor, the first data value to a second address in a second data cache, receiving a second instruction configured to cause a streaming engine associated with the processor to prefetch data from the first data cache, determining that the first data value has not been outputted from the second data cache to the first data cache, stalling execution of the second instruction, receiving an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache, and resuming execution of the second instruction based on the received indication.
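The stall-and-resume sequence in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `L1Cache`, `drain`, and `prefetch` are hypothetical, a plain dictionary stands in for the first data cache (L2), the same address is used in both caches for simplicity, and the write-back is triggered synchronously where real hardware would wait for the indication.

```python
from dataclasses import dataclass, field


@dataclass
class L1Cache:
    """Hypothetical model of the second data cache (closer to the processor)."""
    # address -> value that has been stored but not yet written out to L2
    pending: dict = field(default_factory=dict)

    def store(self, addr, value):
        """First instruction: the value lands in L1 before it reaches L2."""
        self.pending[addr] = value

    def drain(self, l2):
        """Write pending values to L2; the return value stands in for the
        'has been output' indication sent back to the pipeline."""
        l2.update(self.pending)
        self.pending.clear()
        return True


def prefetch(l1, l2, addrs):
    """Second instruction: read L2 directly, bypassing L1.

    Returns (stalled, data). The prefetch stalls if any requested address
    still has a value sitting in L1, and resumes once L1 reports the
    write-back is done.
    """
    stalled = any(addr in l1.pending for addr in addrs)
    if stalled:
        acknowledged = l1.drain(l2)  # assumption: synchronous stand-in for waiting
        if not acknowledged:
            raise RuntimeError("write-back was not acknowledged")
    return stalled, [l2.get(addr) for addr in addrs]


l1, l2 = L1Cache(), {}
l1.store(0x100, 42)                       # store is still in L1
stalled, data = prefetch(l1, l2, [0x100])
```

Without the stall, the prefetch in this model would read `l2` before the store arrived and return `None` for address `0x100`; that is the ordering hazard the stall is meant to avoid.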
Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for executing a plurality of instructions by a processor, the method comprising: receiving a first instruction configured to cause the processor to output a first data value for storing in a first data cache of a cache hierarchy; in response to the first instruction, outputting, by the processor, the first data value for storing in a second data cache of the cache hierarchy that is between the processor and the first data cache in the cache hierarchy; receiving a second instruction configured to cause a streaming engine associated with the processor to prefetch data that includes the first data value from the first data cache via a data path that bypasses the second data cache; in response to the second instruction, determining that the first data value has not been outputted from the second data cache to the first data cache; stalling execution of the second instruction; receiving an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache; and resuming execution of the second instruction based on the received indication.
2. The method of claim 1, wherein the indication that the first data value has been outputted from the second data cache to the first data cache is based on an acknowledgement from the first data cache that the first data cache has consumed the first data value output from the second data cache.
3. The method of claim 1, wherein the second instruction is received after the first instruction is received.
4. The method of claim 3, further comprising: receiving a third instruction configured to cause the processor to output a third data value to the first data cache; and continuing execution of the first, second, and third instruction based on a determination that the third instruction was received after the second instruction was received.
5. The method of claim 1, further comprising: receiving a third instruction configured to cause the processor to load the first data value from the first data cache, the third instruction received before receiving the second instruction; determining that the first data value has not been loaded from the first data cache; and stalling execution of the second instruction based on the determination that the first data value has not been loaded from the first data cache.
6. The method of claim 1, wherein: the first instruction is associated with a group identifier; and the determining that the first data value has not been outputted from the second data cache to the first data cache includes querying, by the streaming engine, the second data cache based on the group identifier.
7. The method of claim 6, wherein the group identifier includes a first color value associated with the first instruction that is equal to a second color value associated with the second instruction.
8. The method of claim 7, further comprising: tagging a first memory operation associated with the first instruction with the first color value; and tagging a second memory operation associated with the second instruction with the second color value, wherein the first and second color value are based on a processor register field value.
9. The method of claim 6, wherein the first instruction specifies the group identifier.
10. The method of claim 6, wherein: the group identifier is a first group identifier; and the method further comprises: while the execution of the second instruction is stalled, determining that a third instruction is associated with a second group identifier that is different from the first group identifier, and based on the second group identifier being different from the first group identifier, executing the third instruction while the second instruction is stalled.
11. The method of claim 1, wherein the first data cache is an L2 cache and the second data cache is an L1 cache.
12. A processor, comprising: a streaming engine capable of accessing a first data cache of a cache hierarchy; and an instruction execution pipeline controller, the instruction execution pipeline controller including circuitry configured to: receive a first instruction configured to cause the processor to output a first data value to the first data cache; in response to the first instruction, output the first data value to a second data cache between the first data cache and the processor in the cache hierarchy; receive a second instruction configured to cause the streaming engine to prefetch data from the first data cache via a data path that does not include the second data cache; in response to the second instruction, determine that the first data value has not been outputted from the second data cache to the first data cache; stall execution of the second instruction; receive an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache; and resume execution of the second instruction based on the received indication.
13. The processor of claim 12, wherein the indication that the first data value has been outputted from the second data cache to the first data cache is based on an acknowledgement from the first data cache that the first data cache has consumed the first data value output from the second data cache.
14. The processor of claim 12, wherein the second instruction is received after the first instruction is received.
15. The processor of claim 14, wherein circuitry of the instruction execution pipeline controller is further configured to: receive a third instruction configured to cause the processor to output a third data value to the first data cache; and continue execution of the first, second, and third instruction based on a determination that the third instruction was received after the second instruction was received.
16. The processor of claim 12, wherein circuitry of the instruction execution pipeline controller is further configured to: receive a third instruction, configured to cause the processor to load the first data value from the first data cache, the third instruction received before receiving the second instruction; determine that the first data value has not been loaded from the first data cache; and stall execution of the second instruction based on the determination that the first data value has not been loaded from the first data cache.
17. The processor of claim 12, wherein: the first instruction is associated with a group identifier; and the determining that the first data value has not been outputted from the second data cache to the first data cache includes querying the second data cache based on the group identifier.
18. The processor of claim 17, wherein the group identifier includes a first color value associated with the first instruction that is equal to a second color value associated with the second instruction.
19. The processor of claim 18, wherein circuitry of the instruction execution pipeline controller is further configured to: tag a first memory operation associated with the first instruction with the first color value; and tag a second memory operation associated with the second instruction with the second color value, wherein the first and second color value are based on a processor register field value.
20. The processor of claim 12, wherein the first data cache is an L2 cache and the second data cache is an L1 cache.
21. A processing system comprising: a memory space; a processor comprising: a streaming engine capable of autonomously accessing a first data cache of the memory space; and an instruction execution pipeline controller, the instruction execution pipeline controller including circuitry configured to: receive a first instruction configured to cause the processor to output a first data value to a first address in the first data cache, the first instruction associated with a first color value; output the first data value to a second address in a second data cache of the memory space; receive a second instruction configured to cause the streaming engine to prefetch data from the first data cache, the second instruction associated with the first color value; determine that the first data value has not been outputted from the second data cache to the first data cache; stall execution of the second instruction; receive an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache; resume execution of the second instruction based on the received indication; receive a third instruction configured to cause the processor to output a third data value to a third address in the first data cache, the third instruction associated with the first color value; receive a fourth instruction associated with a second color value different from the first color value, the fourth instruction configured to cause the streaming engine to prefetch data from the third address; and execute the fourth instruction without stalling execution of the fourth instruction.
22. The processing system of claim 21, wherein the second data cache comprises an L1 cache and wherein the first data cache comprises an L2 cache.
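The color-based grouping in claims 6 to 10 and 21 can be pictured with a short model in which the stall decision depends only on whether same-colored stores are still waiting in the second data cache. This is a hedged sketch, not the claimed circuitry: `ColoredL1`, `has_pending`, `write_back`, and `must_stall` are hypothetical names, colors are plain integers, and a dictionary stands in for the first data cache (L2).

```python
class ColoredL1:
    """Hypothetical second data cache that tracks pending stores by color."""

    def __init__(self):
        # color -> list of (address, value) pairs not yet written to L2
        self.pending = {}

    def store(self, color, addr, value):
        """Tag the store's memory operation with its color and hold it in L1."""
        self.pending.setdefault(color, []).append((addr, value))

    def has_pending(self, color):
        """Query used by the streaming engine: any stores of this color left?"""
        return bool(self.pending.get(color))

    def write_back(self, color, l2):
        """Move every pending store of one color out to L2."""
        for addr, value in self.pending.pop(color, []):
            l2[addr] = value


def must_stall(l1, prefetch_color):
    """A prefetch stalls only while same-colored stores remain in L1.

    The check looks at the color alone, not at the addresses involved.
    """
    return l1.has_pending(prefetch_color)


l1, l2 = ColoredL1(), {}
l1.store(color=1, addr=0x300, value=7)        # store tagged with color 1

same_color_stalls = must_stall(l1, 1)         # same color: waits for write-back
other_color_stalls = must_stall(l1, 2)        # different color: proceeds

l1.write_back(1, l2)                          # L1 reports color 1 has drained
stalls_after_drain = must_stall(l1, 1)
```

In this model a prefetch tagged with color 2 proceeds even though a color 1 store to the same address is still pending, which mirrors the fourth instruction in claim 21 executing without a stall.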
March 11, 2019
March 30, 2021