US-11347502

Apparatus and method of improved insert instructions

Published: May 31, 2022

Assignee: not available in the source USPTO data

Inventors: not available in the source USPTO data

Technical Abstract

An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. The first and second instructions each insert a first group of input vector elements into one of multiple first non-overlapping sections of their respective first and second resultant vectors. The first group has a first bit width, and each of the first non-overlapping sections has the same bit width as the first group. The third and fourth instructions each insert a second group of input vector elements into one of multiple second non-overlapping sections of their respective third and fourth resultant vectors. The second group has a second bit width that is larger than the first bit width, and each of the second non-overlapping sections has the same bit width as the second group. The apparatus also includes masking layer circuitry that masks the first and third instructions at a first resultant vector granularity and masks the second and fourth instructions at a second resultant vector granularity.
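The four instructions in the abstract differ along two axes: the width of the inserted group and the element granularity at which the mask is applied. The Python sketch below models that behaviour under assumptions that are not stated in the abstract: vectors are little-endian byte strings, the target section is given as a plain integer index, bit *i* of the mask controls result element *i*, and masked-off elements keep the destination's previous value (merge masking). The function name `insert_group` and the 256-bit destination, 128-bit group and 32-/64-bit granularities used below are hypothetical choices for illustration only.

```python
# Hypothetical model of the group-insert instructions described in the abstract.
# Assumptions (not from the patent): little-endian byte-string vectors, an integer
# section index, bit i of `mask` controlling result element i, and merge masking
# (masked-off elements keep their old destination value).

def insert_group(dest: bytes, group: bytes, section: int,
                 mask: int, elem_bits: int) -> bytes:
    """Insert `group` into one non-overlapping section of `dest`, then apply
    `mask` at `elem_bits` granularity over the whole resultant vector."""
    if elem_bits <= 0 or elem_bits % 8:
        raise ValueError("element width must be a positive multiple of 8 bits")
    elem_len = elem_bits // 8
    group_len = len(group)
    if group_len == 0 or group_len % elem_len or len(dest) % group_len:
        raise ValueError("sections must be equal-width and hold whole elements")
    if not 0 <= section < len(dest) // group_len:
        raise ValueError("section index out of range")

    # Unmasked result: the chosen section is overwritten; the others are kept.
    start = section * group_len
    merged = bytearray(dest)
    merged[start:start + group_len] = group

    # Masking layer: an element takes the merged value only if its mask bit
    # is set; otherwise it keeps the old destination value.
    result = bytearray(dest)
    for i in range(len(dest) // elem_len):
        if (mask >> i) & 1:
            lo = i * elem_len
            result[lo:lo + elem_len] = merged[lo:lo + elem_len]
    return bytes(result)


if __name__ == "__main__":
    dest = bytes(32)                 # hypothetical 256-bit destination, all zero
    group = bytes(range(1, 17))      # hypothetical 128-bit group: bytes 1..16
    print(insert_group(dest, group, 1, 0b1100, 32).hex())
    print(insert_group(dest, group, 1, 0b1100, 64).hex())
```

Because mask bits index elements rather than bytes, the same mask means different things at different granularities. With mask `0b1100` and section 1, the 64-bit-granularity call writes the whole group into bytes 16-31, while the 32-bit-granularity call leaves the destination unchanged, since bits 2 and 3 now select elements that lie in section 0.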

Patent Claims

24 claims


1. A processor comprising: a plurality of vector registers including a source vector register greater than 127 bits and a destination vector register greater than 127 bits; instruction decode circuitry to decode instructions; and an execution unit to perform operations specified by the instructions, wherein, in response to the instruction decode circuitry decoding an insert instruction, the execution unit is to copy a 64-bit data element from the source vector register to a 64-bit data element location in the destination vector register without zeroing other data element locations in the destination vector register, wherein the 64-bit data element location of a plurality of 64-bit data element locations in the destination vector register is specified by a first value of an immediate of the insert instruction, and a second value of the insert instruction indicates a 64-bit element width granularity from a plurality of element width granularities.

2. The processor of claim 1, further comprising: instruction fetch circuitry to fetch the instructions from a memory.

3. The processor of claim 1, further comprising: a plurality of cores, the execution unit integral to a first core of the plurality of cores and a second execution unit integral to a second core of the plurality of cores.

4. The processor of claim 3, further comprising: a level 1 data cache and level 1 instruction cache integral to one or more of the cores.

5. The processor of claim 4, further comprising: cache coherency circuitry to maintain coherency between L1 data caches of different cores.

6. The processor of claim 1, further comprising: a translation lookaside buffer to store virtual to physical address translations usable by the execution unit to translate virtual addresses to physical addresses.

7. The processor of claim 1, further comprising: a reorder buffer to store data resulting from out-of-order execution of the instructions.

8. The processor of claim 1, further comprising: register renaming circuitry to identify the source vector register and destination vector register in a physical register file.

9. A method comprising: decoding, by instruction decode circuitry of a processor, at least one instruction into a decoded at least one instruction; and in response to the at least one instruction being an insert instruction, executing the decoded at least one instruction, by an execution unit of the processor, to copy a 64-bit data element from a source vector register greater than 127 bits to a 64-bit data element location in a destination vector register greater than 127 bits without zeroing other data element locations in the destination vector register, wherein the 64-bit data element location of a plurality of 64-bit data element locations in the destination vector register is specified by a first value of an immediate of the insert instruction, and a second value of the insert instruction indicates a 64-bit element width granularity from a plurality of element width granularities.

10. The method of claim 9, further comprising: fetching the at least one instruction from a memory using instruction fetch circuitry.

11. The method of claim 9, wherein the processor further comprises: a plurality of cores, the execution unit integral to a first core of the plurality of cores and a second execution unit integral to a second core of the plurality of cores.

12. The method of claim 11, wherein the processor further comprises: a level 1 data cache and level 1 instruction cache integral to one or more of the cores.

13. The method of claim 12, further comprising: maintaining coherency between L1 data caches of a plurality of cores using cache coherency circuitry.

14. The method of claim 9, further comprising: storing virtual to physical address translations usable by the processor to translate virtual addresses to physical addresses in a translation lookaside buffer.

15. The method of claim 9, further comprising: storing data resulting from out-of-order execution of the at least one instruction in a reorder buffer.

16. The method of claim 9, further comprising: identifying the source vector register and destination vector register in a physical register file using register renaming circuitry.

17. A non-transitory machine readable storage medium including instructions stored thereon which, when executed by a processor, cause the processor to: decode at least one instruction into a decoded at least one instruction with instruction decode circuitry of the processor; and in response to the at least one instruction being an insert instruction, execute the decoded at least one instruction with an execution unit of the processor to copy a 64-bit data element from a source vector register greater than 127 bits to a 64-bit data element location in a destination vector register greater than 127 bits without zeroing other data element locations in the destination vector register, wherein the 64-bit data element location of a plurality of 64-bit data element locations in the destination vector register is specified by a first value of an immediate of the insert instruction, and a second value of the insert instruction indicates a 64-bit element width granularity from a plurality of element width granularities.

18. The non-transitory machine readable storage medium of claim 17, further comprising: fetching the at least one instruction from a memory using instruction fetch circuitry.

19. The non-transitory machine readable storage medium of claim 17, wherein the processor further comprises: a plurality of cores, the execution unit integral to a first core of the plurality of cores and a second execution unit integral to a second core of the plurality of cores.

20. The non-transitory machine readable storage medium of claim 19, wherein the processor further comprises: a level 1 data cache and level 1 instruction cache integral to one or more of the cores.

21. The non-transitory machine readable storage medium of claim 20, further comprising: maintaining coherency between L1 data caches of a plurality of cores using cache coherency circuitry.

22. The non-transitory machine readable storage medium of claim 17, further comprising: storing virtual to physical address translations usable by the processor to translate virtual addresses to physical addresses in a translation lookaside buffer.

23. The non-transitory machine readable storage medium of claim 17, further comprising: storing data resulting from out-of-order execution of the at least one instruction in a reorder buffer.

24. The non-transitory machine readable storage medium of claim 17, further comprising: identifying the source vector register and destination vector register in a physical register file using register renaming circuitry.
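Claims 1, 9 and 17 recite the same core operation in apparatus, method and storage-medium form: copy one 64-bit element from a source vector register wider than 127 bits into a destination slot chosen by an immediate, while leaving every other destination slot untouched. The Python sketch below models only that data movement, with registers held as Python integers. The register width, the use of an 8-bit immediate whose low-order bits pick the slot, the `src_index` parameter for the source element, and the set of alternative granularities are assumptions for illustration; the claims do not say how the "second value" is encoded, so it appears here simply as the hypothetical `elem_bits` argument.

```python
# Hypothetical model of the insert operation in claims 1, 9 and 17, with vector
# registers held as Python ints (bit 0 = least significant bit of the register).
# Assumptions (not from the claims): the low-order bits of an 8-bit immediate
# select the destination slot, the source element comes from slot `src_index`
# (default 0), and the supported granularities are 8, 16, 32 and 64 bits.

SUPPORTED_WIDTHS = (8, 16, 32, 64)


def insert_element(dest: int, src: int, imm8: int, elem_bits: int = 64,
                   reg_bits: int = 512, src_index: int = 0) -> int:
    if reg_bits <= 127:
        raise ValueError("the claims recite registers wider than 127 bits")
    if elem_bits not in SUPPORTED_WIDTHS or reg_bits % elem_bits:
        raise ValueError("unsupported element width granularity")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    num_slots = reg_bits // elem_bits
    if not 0 <= src_index < num_slots:
        raise ValueError("source element index out of range")

    elem_mask = (1 << elem_bits) - 1
    slot = imm8 % num_slots              # "first value": destination slot
    value = (src >> (src_index * elem_bits)) & elem_mask
    shift = slot * elem_bits

    # Clear only the target slot, then OR in the copied element; every other
    # destination slot keeps its previous contents (no zeroing).
    return (dest & ~(elem_mask << shift)) | (value << shift)
```

For example, with a 256-bit destination of all ones and `imm8=2`, bits 128-191 receive the source element and the remaining 192 bits stay set, which is the "without zeroing other data element locations" property the claims emphasize.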

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

March 31, 2017

Publication Date

May 31, 2022
