Legal claims defining the scope of protection, as filed with the USPTO.
1. A computer implemented method comprising: decoding a single instruction; executing the decoded single instruction by: copying, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; accessing an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
2. The computer implemented method of claim 1, wherein the single instruction is a single instruction multiple data (SIMD) instruction.
3. The computer implemented method of claim 2, wherein said copying the set of indices and the corresponding set of mask elements to said index array being performed responsive to a first micro-operation generated by decoding said SIMD instruction.
4. The computer implemented method of claim 1, wherein the executing further comprises: allocating buffer storage for addresses corresponding to the set of indices, and copying vector data elements to the allocated buffer storage.
5. The computer implemented method of claim 1, wherein the mask elements are stored in a register.
6. The computer implemented method of claim 5, wherein the register is architecturally visible.
7. A non-transitory machine readable medium storing a single instruction, when processing by a processor causing the processor to perform a method comprising: decoding the single instruction; executing the decoded single instruction by: copying, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generating a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; accessing an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and changing the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
8. The non-transitory machine readable medium of claim 7, wherein the single instruction is single instruction multiple data (SIMD) instruction.
9. The non-transitory machine readable medium of claim 8, wherein said copying the set of indices and the corresponding set of mask elements to said index array being performed responsive to a first micro-operation generated by decoding said SIMD instruction.
10. The non-transitory machine readable medium of claim 7, wherein the executing further comprises: allocating buffer storage for addresses corresponding to the set of indices, and copying vector data elements to the allocated buffer storage.
11. The non-transitory machine readable medium of claim 7, wherein the mask elements are stored in a register.
12. The non-transitory machine readable medium of claim 11, wherein the register is architecturally visible.
13. The non-transitory machine readable medium of claim 7, wherein the to execute further comprises to: allocate buffer storage for addresses corresponding to the set of indices, and copy vector data elements to the allocated buffer storage.
14. An apparatus comprising: decode circuitry to decode a single instruction; execution circuitry to execute the decoded single instruction to: copy, from one or more registers, a set of indices and a corresponding set of mask elements to an index array, generate a set of addresses from the set of indices in the index array for at least each corresponding mask element having a first value; access an address from the set of addresses to store a corresponding data element if a corresponding mask element has said first value, and change the values of corresponding mask elements from the first value to a second value responsive to completion of their respective stores.
15. The apparatus of claim 14, wherein the single instruction is single instruction multiple data (SIMD) instruction.
16. The apparatus of claim 15, wherein said to copy the set of indices and the corresponding set of mask elements to said index array is performed responsive to a first micro-operation generated by decoding said SIMD instruction.
17. The apparatus of claim 14, wherein the mask elements are to be stored in a register.
18. The apparatus of claim 17, wherein the register is architecturally visible.
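The steps recited in the independent claims (copy indices and mask elements to an index array, generate addresses for elements whose mask has the first value, store each data element, then change the mask element once its store completes) can be modelled in software. The following is only an illustrative sketch, not the patented hardware: the function name `masked_scatter`, the `base` and `scale` address parameters, the choice of 1 and 0 as the "first" and "second" mask values, and the `stop_before_lane` parameter used to model an interrupted execution are all assumptions made for the example and do not appear in the claims.

```python
from typing import Dict, List, Optional


def masked_scatter(
    memory: Dict[int, int],
    base: int,
    scale: int,
    indices: List[int],
    mask: List[int],
    data: List[int],
    stop_before_lane: Optional[int] = None,
) -> List[int]:
    """Software model of a masked scatter store (hypothetical, for illustration).

    mask[i] == 1 is treated as the "first value" (store still pending) and
    mask[i] == 0 as the "second value" (nothing to do or store completed).
    The mask list is updated in place and also returned.
    """
    # Step 1: copy the indices and their mask elements into an index array.
    index_array = list(zip(indices, mask))

    # Step 2: generate an address for each element whose mask has the first value.
    # The base + index * scale form is an assumption; the claims only say that
    # addresses are generated from the indices.
    addresses = {
        lane: base + idx * scale
        for lane, (idx, m) in enumerate(index_array)
        if m == 1
    }

    # Step 3: perform each store, then change that lane's mask element to the
    # second value only after its store has completed.
    for lane, addr in addresses.items():
        if lane == stop_before_lane:
            break  # modelled interruption: later mask elements stay set
        memory[addr] = data[lane]
        mask[lane] = 0

    return mask


if __name__ == "__main__":
    mem: Dict[int, int] = {}
    m = [1, 0, 1, 1]
    # First pass is interrupted before lane 2; only lane 0 is stored.
    masked_scatter(mem, 100, 4, [3, 0, 2, 1], m, [10, 20, 30, 40], stop_before_lane=2)
    print(mem, m)
    # Second pass with the surviving mask finishes the remaining stores.
    masked_scatter(mem, 100, 4, [3, 0, 2, 1], m, [10, 20, 30, 40])
    print(mem, m)
```

In this model, the mask records which stores are still outstanding, so calling the function again with the same mask performs only the stores that have not yet completed. Whether the claimed instruction is intended to be resumed this way is not stated in the claims; it is offered here only as one plausible reading of why mask elements change "responsive to completion of their respective stores".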
December 4, 2018