9720691

Speculative Scalarization in Vector Processing

Published: August 1, 2017
Assignee: not available in the USPTO data
Inventors: Lee Howes

Patent Claims
27 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method comprising: receiving, by a first processor, vector code, wherein the vector code includes a plurality of instructions configured to be compiled for execution by a vector processor; determining, by the first processor during compilation of the vector code, whether at least one instruction of the plurality of instructions is a speculatively uniform instruction, wherein the speculatively uniform instruction is an instruction that cannot be determined to be a uniform operation during compilation and cannot be determined to not be a uniform operation during compilation; generating, by the first processor during compilation of the vector code, uniformity detection code for the at least one speculatively uniform instruction, wherein the uniformity detection code, when executed, is configured to determine whether the at least one speculatively uniform instruction is uniform during runtime; generating, by the first processor during compilation of the vector code, scalar code by scalarizing the at least one speculatively uniform instruction, wherein the scalar code is configured to be compiled by the first processor for execution by a scalar processor, a scalar processing unit of the vector processor, or a vector pipeline of the vector processor; and executing, by the vector processor, the generated uniformity detection code during runtime to determine whether the at least one speculatively uniform instruction is uniform during runtime.

2. The method of claim 1, further comprising modifying, by the first processor during compilation of the vector code, the vector code to include the generated uniformity detection code and the generated scalar code.

3. The method of claim 1, wherein the vector processor is a graphics processing unit (GPU) and the first processor is a central processing unit (CPU), or wherein the vector processor is a GPU and the scalar processor is the first processor, or wherein the vector processor, the first processor, and the scalar processor are the same processor configured to perform vector processing and scalar processing.

4. The method of claim 1, wherein the vector code is single program, multiple data (SPMD) code.

5. The method of claim 1, wherein the vector processor is a single instruction, multiple data (SIMD) architected graphics processing unit (GPU) and the vector code is single program, multiple data (SPMD) code.

6. The method of claim 1, further comprising executing, by the scalar processing unit of the vector processor or the vector pipeline of the vector processor, the generated scalar code if the at least one speculatively uniform instruction is determined to be uniform during runtime based on execution of the generated uniformity detection code.

7. The method of claim 1, further comprising not executing the generated scalar code if the at least one speculatively uniform instruction is determined not to be uniform during runtime based on execution of the generated uniformity detection code.

8. The method of claim 1, further comprising executing the at least one speculatively uniform instruction by a plurality of vector pipelines of the vector processor.
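
As a rough illustration of the compile-time determination recited in claim 1, the following Python sketch sorts instructions of a toy intermediate representation into uniform, non-uniform, and speculatively uniform. The `Instr` type, the operand names, and the rule that any lane-dependent operand makes an instruction non-uniform are all assumptions made for illustration; the claims do not specify how the determination is performed.

```python
from dataclasses import dataclass
from enum import Enum


class Uniformity(Enum):
    UNIFORM = "uniform"                    # known to be the same in every lane
    NON_UNIFORM = "non-uniform"            # known to vary across lanes
    SPECULATIVE = "speculatively uniform"  # neither can be shown at compile time


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy IR instruction: dest = op(operands)."""
    dest: str
    op: str
    operands: tuple


def classify(program, known):
    """Classify each instruction given the uniformity of already-known values.

    `known` maps value names to a Uniformity. Names absent from it (for
    example, values loaded from memory) are treated as SPECULATIVE.
    """
    env = dict(known)
    result = {}
    for ins in program:
        kinds = [env.get(name, Uniformity.SPECULATIVE) for name in ins.operands]
        if all(k is Uniformity.UNIFORM for k in kinds):
            kind = Uniformity.UNIFORM
        elif any(k is Uniformity.NON_UNIFORM for k in kinds):
            # Simplifying assumption: a lane-dependent input makes the
            # result lane-dependent.
            kind = Uniformity.NON_UNIFORM
        else:
            kind = Uniformity.SPECULATIVE
        env[ins.dest] = kind
        result[ins.dest] = kind
    return result


program = [
    Instr("a", "mul", ("scale", "scale")),  # kernel arguments only
    Instr("b", "add", ("lane_id", "a")),    # depends on the lane index
    Instr("c", "add", ("loaded", "a")),     # 'loaded' is unknown until run time
]
known = {"scale": Uniformity.UNIFORM, "lane_id": Uniformity.NON_UNIFORM}
classes = classify(program, known)
for dest, kind in classes.items():
    print(dest, "->", kind.value)
```

In this sketch only instruction `c` would be a candidate for generated uniformity detection code and scalar code, since `a` can be scalarized unconditionally and `b` cannot be scalarized at all.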

9. A device comprising: a memory configured to store vector code; a first processor; and a vector processor, wherein the first processor is configured to: receive vector code from the memory, wherein the vector code includes a plurality of instructions configured to be compiled for execution by a vector processor; determine, during compilation of the vector code, whether at least one instruction of the plurality of instructions is a speculatively uniform instruction, wherein the speculatively uniform instruction is an instruction that cannot be determined to be a uniform operation during compilation and cannot be determined to not be a uniform operation during compilation; generate, during compilation of the vector code, uniformity detection code for the at least one speculatively uniform instruction, wherein the uniformity detection code, when executed, is configured to determine whether the at least one speculatively uniform instruction is uniform during runtime; and generate, during compilation of the vector code, scalar code by scalarizing the at least one speculatively uniform instruction, wherein the scalar code is configured to be compiled by the first processor for execution by a scalar processor, a scalar processing unit of the vector processor, or a vector pipeline of the vector processor, wherein the vector processor is configured to execute the generated uniformity detection code during runtime to determine whether the at least one speculatively uniform instruction is uniform during runtime.

10. The device of claim 9, wherein the first processor is further configured to modify, during compilation of the vector code, the vector code to include the generated uniformity detection code and the generated scalar code.

11. The device of claim 9, wherein the vector processor is a graphics processing unit (GPU) and the first processor is a central processing unit (CPU), or wherein the vector processor is a GPU and the scalar processor is the first processor, or wherein the vector processor, the first processor, and the scalar processor are the same processor configured to perform vector processing and scalar processing.

12. The device of claim 9, wherein the vector code is single program, multiple data (SPMD) code.

13. The device of claim 9, wherein the vector processor is a single instruction, multiple data (SIMD) architected graphics processing unit (GPU) and the vector code is single program, multiple data (SPMD) code.

14. The device of claim 9, wherein the scalar processing unit of the vector processor or the vector pipeline of the vector processor is configured to execute the generated scalar code if the at least one speculatively uniform instruction is determined to be uniform during runtime based on execution of the generated uniformity detection code.

15. The device of claim 9, wherein the vector processor is configured to not execute the generated scalar code if the at least one speculatively uniform instruction is determined not to be uniform during runtime based on execution of the generated uniformity detection code.

16. The device of claim 9, wherein a plurality of vector pipelines of the vector processor are configured to execute the at least one speculatively uniform instruction.

17. An apparatus comprising: means for receiving vector code, wherein the vector code includes a plurality of instructions configured to be compiled for execution by a vector processor; means for determining, during compilation of the vector code, whether at least one instruction of the plurality of instructions is a speculatively uniform instruction, wherein the speculatively uniform instruction is an instruction that cannot be determined to be a uniform operation during compilation and cannot be determined to not be a uniform operation during compilation; means for generating, during compilation of the vector code, uniformity detection code for the at least one speculatively uniform instruction, wherein the uniformity detection code, when executed, is configured to determine whether the at least one speculatively uniform instruction is uniform during runtime; means for generating, during compilation of the vector code, scalar code by scalarizing the at least one speculatively uniform instruction, wherein the scalar code is configured to be compiled for execution by a scalar processor, a scalar processing unit of the vector processor, or a vector pipeline of the vector processor; and means for executing the generated uniformity detection code during runtime to determine whether the at least one speculatively uniform instruction is uniform during runtime.

18. The apparatus of claim 17, further comprising means for modifying, during compilation of the vector code, the vector code to include the generated uniformity detection code and the generated scalar code.

19. The apparatus of claim 17, wherein the vector processor is a graphics processing unit (GPU) and the scalar processor is a central processing unit (CPU), or wherein the vector processor and the scalar processor are the same processor configured to perform vector processing and scalar processing.

20. The apparatus of claim 17, wherein the vector code is single program, multiple data (SPMD) code.

21. The apparatus of claim 17, wherein the vector processor is a single instruction, multiple data (SIMD) architected graphics processing unit (GPU) and the vector code is single program, multiple data (SPMD) code.

22. The apparatus of claim 17, further comprising means for executing the generated scalar code if the at least one speculatively uniform instruction is determined to be uniform during runtime based on execution of the generated uniformity detection code.

23. The apparatus of claim 17, further comprising means for not executing the generated scalar code if the at least one speculatively uniform instruction is determined not to be uniform during runtime based on execution of the generated uniformity detection code.

24. The apparatus of claim 17, further comprising means for executing the at least one speculatively uniform instruction.

25. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors of a computing device to: receive vector code, wherein the vector code includes a plurality of instructions configured to be compiled for execution by a vector processor; determine, during compilation of the vector code, whether at least one instruction of the plurality of instructions is a speculatively uniform instruction, wherein the speculatively uniform instruction is an instruction that cannot be determined to be a uniform operation during compilation and cannot be determined to not be a uniform operation during compilation; generate, during compilation of the vector code, uniformity detection code for the at least one speculatively uniform instruction, wherein the uniformity detection code, when executed, is configured to determine whether the at least one speculatively uniform instruction is uniform during runtime; and generate, during compilation of the vector code, scalar code by scalarizing the at least one speculatively uniform instruction, wherein the scalar code is configured to be compiled for execution by a scalar processor, a scalar processing unit of the vector processor, or a vector pipeline of the vector processor, wherein the instructions, when executed, further cause one or more processors of the computing device to cause the vector processor to execute the generated uniformity detection code during runtime to determine whether the at least one speculatively uniform instruction is uniform during runtime.

26. The non-transitory computer-readable storage medium of claim 25, wherein the instructions, when executed, further cause one or more processors of the computing device to cause the vector processor to: execute the generated scalar code if the at least one speculatively uniform instruction is determined to be uniform during runtime based on execution of the generated uniformity detection code.

27. The non-transitory computer-readable storage medium of claim 25, wherein the instructions, when executed, further cause one or more processors of the computing device to cause the vector processor to: not execute the generated scalar code if the at least one speculatively uniform instruction is determined not to be uniform during runtime based on execution of the generated uniformity detection code.
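
The run-time behavior described across the claims (execute the uniformity detection code, run the scalar code when the instruction turns out to be uniform, and skip it otherwise) can be sketched in Python as follows. This is a minimal simulation under stated assumptions: lanes are modeled as a non-empty list of operand tuples, and `is_uniform` and `run_speculative` are hypothetical helper names, not anything defined by the patent.

```python
def is_uniform(lane_values):
    """Uniformity detection: True when every lane holds the same value."""
    first = lane_values[0]
    return all(v == first for v in lane_values)


def run_speculative(op, lane_operands):
    """Run `op` over per-lane operand tuples, scalarizing when inputs agree.

    Returns (per-lane results, name of the path taken).
    """
    if is_uniform(lane_operands):
        # Scalar path: evaluate once and broadcast the result to every lane.
        value = op(*lane_operands[0])
        return [value] * len(lane_operands), "scalar"
    # Fallback: the scalar code is skipped and each lane computes its own result.
    return [op(*args) for args in lane_operands], "vector"


def add(x, y):
    return x + y


uniform_result = run_speculative(add, [(3, 4)] * 4)
varying_result = run_speculative(add, [(0, 4), (1, 4), (2, 4), (3, 4)])
print(uniform_result)   # ([7, 7, 7, 7], 'scalar')
print(varying_result)   # ([4, 5, 6, 7], 'vector')
```

The check itself costs one comparison per lane, so a design along these lines would presumably pay off only when the scalarized work saved is larger than the detection overhead.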

Patent Metadata

Filing Date: Unknown

Publication Date: August 1, 2017

Inventors: Lee Howes
