In-Lane Vector Shuffle Instructions

Published: December 24, 2019

Assignee: Not available in USPTO data

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein

Patent Claims

22 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A processor comprising: a decode unit including circuitry to decode a single instruction specifying a source operand, a destination operand, and an immediate operand, wherein the source operand and the destination operand each have a first lane and a second lane, wherein the first lane of the source operand is to store a first plurality of data elements, wherein the second lane of the source operand is to store a second plurality of data elements, and wherein the immediate operand is to specify a first plurality of control bits, a second plurality of control bits, a third plurality of control bits, and a fourth plurality of control bits; and an execution unit coupled with the decode unit, the execution unit to perform the single instruction and to use the first, second, third, and fourth pluralities of control bits for both the first and second lanes of the source operand, the execution unit to: copy one of the first plurality of data elements specified by the first plurality of control bits to a first data element position of the first lane of the destination operand, copy one of the first plurality of data elements specified by the second plurality of control bits to a second data element position of the first lane of the destination operand, copy one of the first plurality of data elements specified by the third plurality of control bits to a third data element position of the first lane of the destination operand, and copy one of the first plurality of data elements specified by the fourth plurality of control bits to a fourth data element position of the first lane of the destination operand; and copy one of the second plurality of data elements specified by the first plurality of control bits to a first data element position of the second lane of the destination operand, copy one of the second plurality of data elements specified by the second plurality of control bits to a second data element position of the second lane of the destination operand, copy one of the second plurality of data elements specified by the third plurality of control bits to a third data element position of the second lane of the destination operand, and copy one of the second plurality of data elements specified by the fourth plurality of control bits to a fourth data element position of the second lane of the destination operand.

2. The processor of claim 1, wherein the source operand is a 256-bit operand, and wherein each of the first lane of the source operand and the second lane of the source operand is a 128-bit lane.

3. The processor of claim 2, wherein each of the first plurality of data elements is a 32-bit data element and each of the second plurality of data elements is a 32-bit data element.

4. The processor of claim 1, wherein the immediate operand is an 8-bit operand.

5. The processor of claim 4, wherein the first plurality of control bits, the second plurality of control bits, the third plurality of control bits, and the fourth plurality of control bits each consist of 2 bits.

6. The processor of claim 1, wherein the destination operand is a 256-bit operand, and wherein each of the first lane of the destination operand and the second lane of the destination operand is a 128-bit lane.

7. The processor of claim 1, wherein the first lane of the source operand occupies one half of the source operand and the second lane of the source operand occupies another half of the source operand.

8. A system comprising: a plurality of processors; a memory; and a bus to communicatively couple a given processor of the plurality of processors to a plurality of other system components, wherein the given processor includes: a decode unit including circuitry to decode a single instruction specifying a source operand, a destination operand, and an immediate operand, wherein the source operand and the destination operand each have a first lane and a second lane, wherein the first lane of the source operand is to store a first plurality of data elements, wherein the second lane of the source operand is to store a second plurality of data elements, and wherein the immediate operand is to specify a first plurality of control bits, a second plurality of control bits, a third plurality of control bits, and a fourth plurality of control bits; and an execution unit coupled with the decode unit, the execution unit to perform the single instruction and to use the first, second, third, and fourth pluralities of control bits for both the first and second lanes of the source operand, the execution unit to: copy one of the first plurality of data elements specified by the first plurality of control bits to a first data element position of the first lane of the destination operand, copy one of the first plurality of data elements specified by the second plurality of control bits to a second data element position of the first lane of the destination operand, copy one of the first plurality of data elements specified by the third plurality of control bits to a third data element position of the first lane of the destination operand, and copy one of the first plurality of data elements specified by the fourth plurality of control bits to a fourth data element position of the first lane of the destination operand; and copy one of the second plurality of data elements specified by the first plurality of control bits to a first data element position of the second lane of the destination operand, copy one of the second plurality of data elements specified by the second plurality of control bits to a second data element position of the second lane of the destination operand, copy one of the second plurality of data elements specified by the third plurality of control bits to a third data element position of the second lane of the destination operand, and copy one of the second plurality of data elements specified by the fourth plurality of control bits to a fourth data element position of the second lane of the destination operand.

9. The system of claim 8, wherein the source operand is a 256-bit operand, and wherein each of the first lane of the source operand and the second lane of the source operand is a 128-bit lane.

10. The system of claim 9, wherein each of the first plurality of data elements is a 32-bit data element and each of the second plurality of data elements is a 32-bit data element.

11. The system of claim 8, wherein the immediate operand is an 8-bit operand.

12. The system of claim 11, wherein the first plurality of control bits, the second plurality of control bits, the third plurality of control bits, and the fourth plurality of control bits each consist of 2 bits.

13. The system of claim 8, wherein the destination operand is a 256-bit operand, and wherein each of the first lane of the destination operand and the second lane of the destination operand is a 128-bit lane.

14. The system of claim 8, wherein the first lane of the source operand occupies one half of the source operand and the second lane of the source operand occupies another half of the source operand.

15. A system comprising: a memory; and a processor coupled with the memory, the processor including: a decode unit including circuitry to decode a single instruction specifying a source operand, a destination operand, and an immediate operand, wherein the source operand and the destination operand each have a first lane and a second lane, wherein the first lane of the source operand is to store a first plurality of data elements, wherein the second lane of the source operand is to store a second plurality of data elements, and wherein the immediate operand is to specify a first plurality of control bits, a second plurality of control bits, a third plurality of control bits, and a fourth plurality of control bits; and an execution unit coupled with the decode unit, the execution unit to perform the single instruction and to use the first, second, third, and fourth pluralities of control bits for both the first and second lanes of the source operand, the execution unit to: copy one of the first plurality of data elements specified by the first plurality of control bits to a first data element position of the first lane of the destination operand, copy one of the first plurality of data elements specified by the second plurality of control bits to a second data element position of the first lane of the destination operand, copy one of the first plurality of data elements specified by the third plurality of control bits to a third data element position of the first lane of the destination operand, and copy one of the first plurality of data elements specified by the fourth plurality of control bits to a fourth data element position of the first lane of the destination operand; and copy one of the second plurality of data elements specified by the first plurality of control bits to a first data element position of the second lane of the destination operand, copy one of the second plurality of data elements specified by the second plurality of control bits to a second data element position of the second lane of the destination operand, copy one of the second plurality of data elements specified by the third plurality of control bits to a third data element position of the second lane of the destination operand, and copy one of the second plurality of data elements specified by the fourth plurality of control bits to a fourth data element position of the second lane of the destination operand.

16. The system of claim 15, wherein the source operand is a 256-bit operand, and wherein each of the first lane of the source operand and the second lane of the source operand is a 128-bit lane.

17. The system of claim 16, wherein each of the first plurality of data elements is a 32-bit data element and each of the second plurality of data elements is a 32-bit data element.

18. The system of claim 15, wherein the immediate operand is an 8-bit operand.

19. The system of claim 18, wherein the first plurality of control bits, the second plurality of control bits, the third plurality of control bits, and the fourth plurality of control bits each consist of 2 bits.

20. The system of claim 15, wherein the destination operand is a 256-bit operand, and wherein each of the first lane of the destination operand and the second lane of the destination operand is a 128-bit lane.

21. The system of claim 15, wherein the first lane of the source operand occupies one half of the source operand and the second lane of the source operand occupies another half of the source operand.

22. A processor comprising: a decode unit including hardware to decode a single instruction specifying a 256-bit source operand, a 256-bit destination operand, and an 8-bit immediate operand, wherein the 256-bit source operand and the 256-bit destination operand each have a first 128-bit lane and a second 128-bit lane, wherein the first 128-bit lane of the 256-bit source operand is to store a first plurality of 32-bit data elements, wherein the second 128-bit lane of the 256-bit source operand is to store a second plurality of 32-bit data elements, and wherein the 8-bit immediate operand is to specify a first plurality of control bits, a second plurality of control bits, a third plurality of control bits, and a fourth plurality of control bits, wherein the first, second, third, and fourth plurality of control bits are each 2 bits, wherein the first plurality of 32-bit data elements are floating-point data elements; and an execution unit coupled with the decode unit, the execution unit to perform the single instruction and to use the first, second, third, and fourth pluralities of control bits for both the first and second 128-bit lanes of the 256-bit source operand, the execution unit to: store one of the first plurality of 32-bit data elements specified by the first plurality of control bits to a first 32-bit data element position of the first 128-bit lane of the 256-bit destination operand, store one of the first plurality of 32-bit data elements specified by the second plurality of control bits to a second 32-bit data element position of the first 128-bit lane of the 256-bit destination operand, store one of the first plurality of 32-bit data elements specified by the third plurality of control bits to a third 32-bit data element position of the first 128-bit lane of the 256-bit destination operand, and store one of the first plurality of 32-bit data elements specified by the fourth plurality of control bits to a fourth 32-bit data element position of the first 128-bit lane of the 256-bit destination operand; and store one of the second plurality of 32-bit data elements specified by the first plurality of control bits to a first 32-bit data element position of the second 128-bit lane of the 256-bit destination operand, store one of the second plurality of 32-bit data elements specified by the second plurality of control bits to a second 32-bit data element position of the second 128-bit lane of the 256-bit destination operand, store one of the second plurality of 32-bit data elements specified by the third plurality of control bits to a third 32-bit data element position of the second 128-bit lane of the 256-bit destination operand, and store one of the second plurality of 32-bit data elements specified by the fourth plurality of control bits to a fourth 32-bit data element position of the second 128-bit lane of the 256-bit destination operand.
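Illustrative sketch (not part of the patent text): the operation recited in the independent claims can be modeled in a few lines of Python. This is a minimal software model under stated assumptions, not the claimed hardware. The claims do not say which bits of the immediate form each plurality of control bits, so the sketch assumes the first plurality is the lowest-order 2 bits, the second the next 2 bits, and so on, which is the convention used by the x86 AVX `VPERMILPS` immediate form. The names `decode_selectors` and `in_lane_shuffle` are hypothetical, and the 256-bit operand is represented as a list of eight 32-bit floats with the first lane at indices 0 to 3.

```python
from typing import List, Sequence

LANE_ELEMENTS = 4   # four 32-bit elements per 128-bit lane (claims 2, 3, 22)
SELECTOR_BITS = 2   # each plurality of control bits is 2 bits (claims 5, 22)


def decode_selectors(imm8: int) -> List[int]:
    """Split an 8-bit immediate into four 2-bit selectors.

    Assumption: selector i occupies bits [2i+1 : 2i] of the immediate.
    """
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    return [(imm8 >> (SELECTOR_BITS * i)) & 0b11 for i in range(LANE_ELEMENTS)]


def in_lane_shuffle(src: Sequence[float], imm8: int) -> List[float]:
    """Apply the same four selectors to every 4-element lane of src."""
    if len(src) % LANE_ELEMENTS != 0:
        raise ValueError("source length must be a whole number of lanes")
    selectors = decode_selectors(imm8)
    dst: List[float] = []
    for lane_start in range(0, len(src), LANE_ELEMENTS):
        lane = src[lane_start:lane_start + LANE_ELEMENTS]
        # Destination position i of this lane receives the element of the
        # same source lane picked by selector i; data never crosses lanes.
        dst.extend(lane[sel] for sel in selectors)
    return dst


# Two lanes: indices 0-3 model the first lane, indices 4-7 the second.
source = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(in_lane_shuffle(source, 0b00011011))
# prints [3.0, 2.0, 1.0, 0.0, 7.0, 6.0, 5.0, 4.0]
```

With the immediate `0b00011011`, the four selectors decode to 3, 2, 1, 0, so each lane is reversed independently and no element moves from one lane to the other. An immediate of `0b11100100` decodes to 0, 1, 2, 3 and leaves the operand unchanged.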

Patent Metadata

Filing Date: Unknown

Publication Date: December 24, 2019

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
