10956218

Enqueuing Kernels from Kernels on GPU/CPU

Published: March 23, 2021
Assignee: not available in the USPTO data we have
Technical Abstract

Patent Claims
20 claims

Legal claims defining the scope of protection. Claims are shown in the original legal language, with a plain English translation where one is available.

Claim 1

Original Legal Text

1. A program storage device, on which are stored instructions, comprising instructions that when executed cause one or more compute units to: enqueue a first kernel by a first compute unit for execution on a second compute unit, wherein the first and second compute units have different capabilities; determine, based on the execution of the first kernel, that a condition is met; and in response to the condition being met based on the execution of the first kernel, enqueue a second kernel for execution on the second compute unit.

Plain English Translation

This invention relates to heterogeneous computing systems, in which compute units with different capabilities (for example, CPUs and GPUs) cooperate to execute work. Claim 1 covers a program storage device holding instructions that cause a first compute unit to enqueue a first kernel for execution on a second compute unit with different capabilities. The system then determines, based on the execution of the first kernel, that a condition is met, and in response enqueues a second kernel for execution on the same second compute unit. The claim does not fix when the second enqueue happens; dependent claims 3 and 4 cover enqueuing after the first kernel completes and while it is still running, respectively. Because follow-up work is triggered by what the first kernel actually did at run time, the workload can adapt dynamically rather than being fully scheduled in advance.
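
The control flow in claim 1 can be sketched as a small Python simulation. This is only an illustrative model, not the patented implementation or any real GPU API: `DeviceQueue`, `first_kernel`, `second_kernel`, `host_program`, and the threshold condition are all hypothetical names chosen for the example. The host function stands in for the first compute unit and the queue's `run` loop stands in for the second. Here the second kernel is enqueued from inside the first, in line with the patent's title; claim 1 itself does not say which unit performs the second enqueue.

```python
from collections import deque


class DeviceQueue:
    """Hypothetical stand-in for the second compute unit's command queue."""

    def __init__(self):
        self.pending = deque()
        self.log = []

    def enqueue(self, kernel, *args):
        self.pending.append((kernel, args))

    def run(self):
        # Drain the queue in order; a running kernel may enqueue more work.
        while self.pending:
            kernel, args = self.pending.popleft()
            kernel(self, *args)


def second_kernel(queue, total):
    queue.log.append(f"second_kernel ran with total={total}")


def first_kernel(queue, data, threshold):
    total = sum(data)
    queue.log.append(f"first_kernel computed total={total}")
    # The condition is evaluated from the first kernel's own result.
    if total > threshold:
        # Enqueued from inside the kernel, with no round trip to the host.
        queue.enqueue(second_kernel, total)


def host_program(data, threshold):
    """Plays the role of the first compute unit (for example, a CPU)."""
    queue = DeviceQueue()
    queue.enqueue(first_kernel, data, threshold)
    queue.run()
    return queue.log
```

Calling `host_program([1, 2, 3], 5)` runs both kernels, while `host_program([1, 2, 3], 10)` runs only the first because the assumed condition is never met.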

Claim 2

Original Legal Text

2. The program storage device of claim 1 , wherein the first compute unit is a central processing unit (CPU) and the second compute unit is a graphic processing unit (GPU).

Claim 3

Original Legal Text

3. The program storage device of claim 1 , wherein the second kernel is enqueued after execution of the first kernel is complete.

Claim 4

Original Legal Text

4. The program storage device of claim 1 , wherein the second kernel is enqueued during execution of the first kernel.

Claim 5

Original Legal Text

5. The program storage device of claim 1 , wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a data-parallel kernel with a different range than the first kernel.

Claim 6

Original Legal Text

6. The program storage device of claim 1 , wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a task-parallel kernel.

Claim 7

Original Legal Text

7. The program storage device of claim 1 , wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a data-parallel kernel.

Plain English Translation

Claim 7 narrows claim 1 by kernel type: the first kernel is a task-parallel kernel and the second kernel is a data-parallel kernel. A task-parallel kernel runs as an independent unit of work that can execute concurrently with other tasks, while a data-parallel kernel applies the same operation to many data elements simultaneously, which suits SIMD (Single Instruction, Multiple Data) style hardware. Everything else is inherited from claim 1: a first compute unit enqueues the task-parallel kernel on a second compute unit with different capabilities, and once a condition based on that kernel's execution is met, the data-parallel kernel is enqueued on the same second compute unit. One plausible use is a lightweight task that decides at run time whether to launch a larger parallel computation, as in scientific simulation, machine learning, and other high-performance computing workloads.
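
A minimal sketch of the task-parallel to data-parallel ordering named in claim 7 follows. It is a sequential Python model under stated assumptions: `run_task`, `run_ndrange`, `setup_task`, `square_kernel`, and `execute` are hypothetical names, the "non-empty input" check is an invented condition, and the `for` loop merely stands in for work-items that real hardware would run concurrently.

```python
def run_task(kernel, *args):
    """Task-parallel launch: the kernel body runs as a single instance."""
    return kernel(*args)


def run_ndrange(kernel, global_size, *args):
    """Data-parallel launch: the kernel body runs once per index in the range."""
    for gid in range(global_size):
        kernel(gid, *args)


def square_kernel(gid, src, dst):
    # Same operation applied to every element, selected by its index.
    dst[gid] = src[gid] * src[gid]


def setup_task(src, dst, launches):
    # Single-instance task: inspect the input, then request a data-parallel
    # launch sized to it. The emptiness check is the hypothetical condition.
    if len(src) > 0:
        launches.append((square_kernel, len(src), (src, dst)))


def execute(src):
    dst = [0] * len(src)
    launches = []
    run_task(setup_task, src, dst, launches)
    # Run whatever the task kernel enqueued.
    for kernel, global_size, args in launches:
        run_ndrange(kernel, global_size, *args)
    return dst
```

Keeping the launch decision inside `setup_task` mirrors the idea that the first kernel, not the host, determines whether the second kernel is needed.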

Claim 8

Original Legal Text

8. The program storage device of claim 1 , wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a task-parallel kernel.

Claim 9

Original Legal Text

9. The program storage device of claim 1 , wherein the second compute unit enqueues a barrier on a queue of commands to blocks execution of commands enqueued on the queue of commands after the barrier until the barrier completes.

Claim 10

Original Legal Text

10. The program storage device of claim 1 , wherein the second compute unit enqueues a marker on a queue of commands that does not complete until one or more other commands completes.
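
The barrier of claim 9 and the marker of claim 10 can be contrasted with a toy scheduler. This is a hedged sketch, not a real command-queue API: `Command`, `is_ready`, and `run_out_of_order` are hypothetical, and the "always pick the last ready command" policy is an assumption used only to make out-of-order execution visible. The example also builds the command list directly, whereas the claims have the second compute unit itself enqueue the barrier or marker.

```python
class Command:
    def __init__(self, name, kind="kernel", wait_for=()):
        self.name = name
        self.kind = kind  # "kernel", "barrier", or "marker"
        self.wait_for = tuple(wait_for)
        self.done = False


def is_ready(commands, index):
    cmd = commands[index]
    earlier = commands[:index]
    # Nothing enqueued after an unfinished barrier may start.
    if any(c.kind == "barrier" and not c.done for c in earlier):
        return False
    # A barrier completes only after everything enqueued before it.
    if cmd.kind == "barrier":
        return all(c.done for c in earlier)
    # A marker waits only for the specific commands it names.
    return all(c.done for c in cmd.wait_for)


def run_out_of_order(commands):
    """Complete commands one at a time, always choosing the last ready one."""
    order = []
    while len(order) < len(commands):
        ready = [c for i, c in enumerate(commands)
                 if not c.done and is_ready(commands, i)]
        cmd = ready[-1]
        cmd.done = True
        order.append(cmd.name)
    return order
```

With commands `A, B, barrier, C, D`, neither `C` nor `D` can run until `A`, `B`, and the barrier have finished. With `A, B, marker` where the marker waits only on `B`, the marker can complete before `A` does, because it orders itself after specific commands without blocking the rest of the queue.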

Claim 11

Original Legal Text

11. A computing device, comprising: one or more compute units; and a global memory, coupled to the one or more compute units, on which are stored instructions that when executed cause the one or more compute units to: enqueue a first kernel by a first compute unit for execution on a second compute unit, wherein the first and second compute units have different capabilities; determine, based on the execution of the first kernel, that a condition is met; and in response to the condition being met based on the execution of the first kernel, enqueue a second kernel for execution on the second compute unit.

Claim 12

Original Legal Text

12. The computing device of claim 11 , wherein the first compute unit is a central processing unit (CPU) and the second compute unit is a graphic processing unit (GPU).

Claim 13

Original Legal Text

13. The computing device of claim 11 , wherein the second kernel is enqueued after execution of the first kernel is complete.

Claim 14

Original Legal Text

14. The computing device of claim 11 , wherein the second kernel is enqueued during execution of the first kernel.

Claim 15

Original Legal Text

15. The computing device of claim 11 , wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a data-parallel kernel with a different range than the first kernel.

Plain English Translation

Claim 15 narrows the computing device of claim 11: both kernels are data-parallel, but the second kernel has a different range than the first. A data-parallel kernel processes many data elements simultaneously, as is common in image processing, scientific simulation, and machine learning; its range is most naturally read as the index space of elements over which the kernel is launched. As in claim 11, a first compute unit enqueues the first kernel on a second compute unit with different capabilities, and when a condition based on the first kernel's execution is met, the second kernel is enqueued on that same second compute unit. Because the second range need not match the first, the amount of follow-up parallel work can differ from the initial launch, for example when the first pass shows that only part of the data needs further processing.
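
One way to picture two data-parallel kernels with different ranges is a filter pass followed by a smaller follow-up pass. The sketch below is an assumption-laden illustration, not the claimed implementation: `launch`, `filter_kernel`, `scale_kernel`, and `two_pass` are hypothetical names, the filtering rule is invented, and the sequential loop stands in for concurrently executing work-items.

```python
def launch(kernel, global_size, *args):
    """Data-parallel launch over indices 0..global_size-1 (run sequentially here)."""
    for gid in range(global_size):
        kernel(gid, *args)


def filter_kernel(gid, values, limit, survivors):
    # First pass: record which elements need further work.
    # On real parallel hardware this append would need an atomic operation.
    if values[gid] >= limit:
        survivors.append(gid)


def scale_kernel(gid, values, survivors, out):
    # Second pass: one work-item per surviving element only.
    idx = survivors[gid]
    out[idx] = values[idx] * 2


def two_pass(values, limit):
    survivors = []
    out = list(values)
    ranges = [len(values)]
    launch(filter_kernel, len(values), values, limit, survivors)
    if survivors:  # condition derived from the first kernel's execution
        launch(scale_kernel, len(survivors), values, survivors, out)
        ranges.append(len(survivors))
    return out, ranges
```

For the input `[1, 5, 2, 8]` with a limit of `5`, the first launch covers four indices and the second covers only two.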

Claim 16

Original Legal Text

16. The computing device of claim 11 , wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a task-parallel kernel.

Claim 17

Original Legal Text

17. The computing device of claim 11 , wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a data-parallel kernel.

Claim 18

Original Legal Text

18. The computing device of claim 11 , wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a task-parallel kernel.

Plain English Translation

Claim 18 narrows the computing device of claim 11 by kernel type: the first kernel is a data-parallel kernel and the second kernel is a task-parallel kernel. Data-parallel kernels apply the same operations to many data elements simultaneously, while task-parallel kernels execute as independent tasks that can run concurrently with other work. As in claim 11, a first compute unit enqueues the data-parallel kernel on a second compute unit with different capabilities, and when a condition based on that kernel's execution is met, the task-parallel kernel is enqueued on the same second compute unit. This ordering fits workloads that combine both styles, such as those in scientific computing, machine learning, and real-time data processing, where a wide parallel pass is followed by a single task that consumes its results.
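
The data-parallel to task-parallel ordering can be illustrated with a reduction in which the last work-item to finish enqueues a single follow-up task. This is a sketch under assumptions rather than the patent's method: `partial_sum_kernel`, `final_task`, and `reduce_chunks` are hypothetical names, the "all work-items finished" counter is an invented condition, and a plain list stands in for the device's command queue.

```python
def final_task(partials, state):
    # Task-parallel kernel: a single instance combines the partial results.
    state["result"] = sum(partials)


def partial_sum_kernel(gid, chunks, partials, state, queue):
    # Data-parallel kernel: each work-item sums its own chunk.
    partials[gid] = sum(chunks[gid])
    # On real hardware this decrement would need to be atomic.
    state["remaining"] -= 1
    if state["remaining"] == 0:
        # Condition met during execution: the last work-item enqueues the task.
        queue.append((final_task, (partials, state)))


def reduce_chunks(chunks):
    partials = [0] * len(chunks)
    state = {"remaining": len(chunks), "result": None}
    queue = []
    for gid in range(len(chunks)):  # stands in for a parallel launch
        partial_sum_kernel(gid, chunks, partials, state, queue)
    for task, args in queue:  # run whatever the kernel enqueued
        task(*args)
    return state["result"]
```

With no chunks at all, no work-item runs, the condition is never met, and the task is never enqueued, so the result stays `None`.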

Claim 19

Original Legal Text

19. The computing device of claim 11 , wherein the second compute unit enqueues a barrier on a queue of commands to blocks execution of commands enqueued on the queue of commands after the barrier until the barrier completes.

Claim 20

Original Legal Text

20. The computing device of claim 11 , wherein the second compute unit enqueues a marker on a queue of commands that does not complete until one or more other commands completes.

Patent Metadata

Filing Date

Unknown

Publication Date

March 23, 2021

Inventors

Aaftab A. Munshi

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Enqueuing Kernels from Kernels on GPU/CPU” (10956218). https://patentable.app/patents/10956218