Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A program storage device, on which are stored instructions, comprising instructions that when executed cause one or more compute units to: enqueue a first kernel by a first compute unit for execution on a second compute unit, wherein the first and second compute units have different capabilities; determine, based on the execution of the first kernel, that a condition is met; and in response to the condition being met based on the execution of the first kernel, enqueue a second kernel for execution on the second compute unit.
This invention relates to heterogeneous computing systems in which compute units with different capabilities (for example, CPUs, GPUs, or specialized accelerators) cooperate to execute work. A program storage device holds instructions that cause a first compute unit to enqueue a first kernel for execution on a second compute unit whose capabilities differ from its own. The system then determines, based on the execution of the first kernel, whether a condition is met. If it is, a second kernel is enqueued for execution on the same second compute unit. Claim 1 does not fix when the second kernel is enqueued: dependent claims 3 and 4 cover enqueuing it after the first kernel completes and while the first kernel is still executing, respectively. The effect is that follow-on work is triggered by runtime results rather than being scheduled entirely in advance.
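The flow described in claim 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: `CommandQueue`, `first_kernel`, `second_kernel`, and `host_schedule` are hypothetical names, the queue is a plain Python object rather than a real GPU queue, and the condition ("at least one element exceeds 10") is invented for illustration. It models the variant in claim 3, where the second kernel is enqueued after the first kernel completes.

```python
from collections import deque


class CommandQueue:
    """Toy in-order queue standing in for the second compute unit (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.log = []  # names of kernels that have finished, in order

    def enqueue(self, kernel, *args):
        self.pending.append((kernel, args))

    def run_next(self):
        kernel, args = self.pending.popleft()
        result = kernel(*args)
        self.log.append(kernel.__name__)
        return result


def first_kernel(data):
    # Illustrative first kernel: count the elements above a threshold.
    return sum(1 for x in data if x > 10)


def second_kernel(data):
    # Illustrative second kernel: process only the elements above the threshold.
    return [x * 2 for x in data if x > 10]


def host_schedule(queue, data):
    """Code run by the first compute unit (the host in this sketch)."""
    queue.enqueue(first_kernel, data)
    count = queue.run_next()
    # The condition is determined from the first kernel's execution.
    if count > 0:
        queue.enqueue(second_kernel, data)
        return queue.run_next()
    return None


gpu_queue = CommandQueue("gpu")
result = host_schedule(gpu_queue, [3, 12, 7, 20])
print(result)         # [24, 40]
print(gpu_queue.log)  # ['first_kernel', 'second_kernel']
```

If no element exceeds the threshold, `host_schedule` returns `None` and the second kernel is never enqueued, which is the "condition not met" path.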
2. The program storage device of claim 1, wherein the first compute unit is a central processing unit (CPU) and the second compute unit is a graphic processing unit (GPU).
3. The program storage device of claim 1, wherein the second kernel is enqueued after execution of the first kernel is complete.
4. The program storage device of claim 1, wherein the second kernel is enqueued during execution of the first kernel.
5. The program storage device of claim 1, wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a data-parallel kernel with a different range than the first kernel.
6. The program storage device of claim 1, wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a task-parallel kernel.
7. The program storage device of claim 1, wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a data-parallel kernel.
This invention relates to a program storage device whose instructions combine task-parallel and data-parallel processing in a fixed order: the first kernel is task-parallel and the second kernel is data-parallel. A task-parallel kernel performs independent tasks that can run concurrently with other work, while a data-parallel kernel applies the same operation to many data elements at once, which suits vectorized or SIMD (Single Instruction, Multiple Data) architectures. As in claim 1, the first compute unit enqueues the task-parallel kernel on the second compute unit, and the data-parallel kernel is enqueued on that same unit only once a condition is found to be met based on the first kernel's execution. The claim itself does not recite load balancing or any particular scheduling policy; it covers only this pairing of kernel types. One possible use is a small control task that inspects its input and, depending on what it finds, launches bulk parallel processing.
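The task-parallel to data-parallel hand-off can be sketched as follows. The helpers `run_task` and `run_ndrange` are hypothetical and rest on an assumed model: a task-parallel launch is a single invocation of the kernel, and a data-parallel launch is one invocation per work-item ID. The kernels and the buffer layout (a record count in slot 0) are invented for illustration.

```python
def run_task(kernel, *args):
    """Task-parallel launch: a single invocation of the kernel (assumed model)."""
    return kernel(*args)


def run_ndrange(kernel, global_size, *args):
    """Data-parallel launch: one invocation per work-item ID (assumed model)."""
    return [kernel(gid, *args) for gid in range(global_size)]


def scan_header(buffer):
    # First kernel (task-parallel): read the record count stored in slot 0.
    return buffer[0]


def square_record(gid, buffer):
    # Second kernel (data-parallel): each work-item squares one record.
    return buffer[1 + gid] ** 2


buffer = [3, 2, 5, 7]  # header says 3 records follow
record_count = run_task(scan_header, buffer)

# Condition derived from the first kernel's execution: there is work to do.
squares = run_ndrange(square_record, record_count, buffer) if record_count > 0 else []
print(squares)  # [4, 25, 49]
```

The size of the data-parallel launch is not known until the task-parallel kernel has run, which is why the second kernel cannot simply be enqueued up front.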
8. The program storage device of claim 1, wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a task-parallel kernel.
9. The program storage device of claim 1, wherein the second compute unit enqueues a barrier on a queue of commands to block execution of commands enqueued on the queue of commands after the barrier until the barrier completes.
10. The program storage device of claim 1, wherein the second compute unit enqueues a marker on a queue of commands that does not complete until one or more other commands complete.
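The barrier and marker commands of claims 9 and 10 can be illustrated with a toy out-of-order queue. This is a sketch, not the patented implementation: `drain` is a hypothetical function, the command tuples are an invented encoding, and it assumes that a barrier completes once everything enqueued before it has completed and that a marker waits only on kernels enqueued in its own segment or earlier.

```python
import random


def drain(commands, seed=0):
    """Run a toy out-of-order queue and return the completion order.

    commands: list of ("kernel", name), ("barrier", name),
              or ("marker", name, [names it waits on]).
    """
    rng = random.Random(seed)
    completed = []
    segment = []  # commands enqueued since the last barrier

    def flush():
        # Commands between barriers may complete in any order.
        rng.shuffle(segment)
        deferred = []
        for cmd in segment:
            if cmd[0] == "marker" and not set(cmd[2]) <= set(completed):
                deferred.append(cmd)  # waited-on commands are not done yet
            else:
                completed.append(cmd[1])
        # Deferred markers complete once the rest of the segment has run.
        for cmd in deferred:
            completed.append(cmd[1])
        segment.clear()

    for cmd in commands:
        if cmd[0] == "barrier":
            flush()                   # everything before the barrier finishes
            completed.append(cmd[1])  # then the barrier itself completes
        else:
            segment.append(cmd)       # held back until the next flush
    flush()
    return completed


commands = [
    ("kernel", "a"),
    ("kernel", "b"),
    ("marker", "m", ["a", "b"]),
    ("barrier", "bar"),
    ("kernel", "c"),
]
order = drain(commands, seed=1)
```

Whatever order the shuffle produces, `m` appears after both `a` and `b`, and `c` appears after `bar`. The marker orders itself relative to specific commands, while the barrier orders everything enqueued after it.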
11. A computing device, comprising: one or more compute units; and a global memory, coupled to the one or more compute units, on which are stored instructions that when executed cause the one or more compute units to: enqueue a first kernel by a first compute unit for execution on a second compute unit, wherein the first and second compute units have different capabilities; determine, based on the execution of the first kernel, that a condition is met; and in response to the condition being met based on the execution of the first kernel, enqueue a second kernel for execution on the second compute unit.
12. The computing device of claim 11, wherein the first compute unit is a central processing unit (CPU) and the second compute unit is a graphic processing unit (GPU).
13. The computing device of claim 11, wherein the second kernel is enqueued after execution of the first kernel is complete.
14. The computing device of claim 11, wherein the second kernel is enqueued during execution of the first kernel.
15. The computing device of claim 11, wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a data-parallel kernel with a different range than the first kernel.
This invention relates to computing devices configured to execute data-parallel kernels over different ranges. Data-parallel kernels are functions that process multiple data elements simultaneously and are common in high-performance computing tasks such as image processing, scientific simulations, and machine learning. The range of a kernel can be read as the set of data elements that one launch covers. Under this claim, the computing device of claim 11 (one or more compute units coupled to a global memory) enqueues a first data-parallel kernel over one range, evaluates a condition based on that kernel's execution, and, if the condition is met, enqueues a second data-parallel kernel on the same second compute unit over a different range. The amount of parallel work in the follow-on launch can therefore depend on what the first launch produced, for example by shrinking the range on each pass of a multi-pass computation. The claim does not require any particular scheduling or data-partitioning mechanism.
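A multi-pass reduction is one way to picture data-parallel launches whose range changes from one launch to the next. The sketch below is illustrative only: `run_ndrange` and `pair_sum` are hypothetical names, the same kernel function is reused for every pass (the claim also allows a different second kernel), and the input length is assumed to be a power of two.

```python
def run_ndrange(kernel, global_size, *args):
    """Data-parallel launch: one invocation per work-item ID (assumed model)."""
    return [kernel(gid, *args) for gid in range(global_size)]


def pair_sum(gid, values):
    # Each work-item adds one adjacent pair of partial sums.
    return values[2 * gid] + values[2 * gid + 1]


values = [1, 2, 3, 4, 5, 6, 7, 8]  # length assumed to be a power of two
launch_sizes = []

# Condition checked after each launch: more than one partial sum remains.
while len(values) > 1:
    size = len(values) // 2  # the next launch covers half as many work-items
    launch_sizes.append(size)
    values = run_ndrange(pair_sum, size, values)

print(values)        # [36]
print(launch_sizes)  # [4, 2, 1]
```

Each pass is enqueued only because the previous pass left more than one partial sum, and each pass runs over a smaller range than the one before it.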
16. The computing device of claim 11, wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a task-parallel kernel.
17. The computing device of claim 11, wherein the first kernel is a task-parallel kernel, and wherein the second kernel is a data-parallel kernel.
18. The computing device of claim 11, wherein the first kernel is a data-parallel kernel, and wherein the second kernel is a task-parallel kernel.
This invention relates to computing devices configured to execute different types of parallel processing kernels in a fixed order: the first kernel is data-parallel and the second kernel is task-parallel. Data-parallel kernels process multiple data elements simultaneously using the same operations, while task-parallel kernels execute independent tasks concurrently. Under this claim, the computing device of claim 11 enqueues the data-parallel kernel on the second compute unit, evaluates a condition based on that kernel's execution, and, if the condition is met, enqueues the task-parallel kernel on the same unit. The claim does not recite specialized scheduling hardware or a policy for classifying workloads; it covers only this pairing of kernel types. The pattern fits workloads in which a bulk parallel pass is followed by a step that does not split across data elements, such as gathering or summarizing the results of that pass.
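The data-parallel to task-parallel ordering can be sketched in the same toy model. All names here (`run_ndrange`, `run_task`, `is_out_of_bounds`, `collect_outliers`) are hypothetical, and the condition ("at least one sample was flagged") is invented for illustration.

```python
def run_ndrange(kernel, global_size, *args):
    """Data-parallel launch: one invocation per work-item ID (assumed model)."""
    return [kernel(gid, *args) for gid in range(global_size)]


def run_task(kernel, *args):
    """Task-parallel launch: a single invocation of the kernel (assumed model)."""
    return kernel(*args)


def is_out_of_bounds(gid, samples, limit):
    # First kernel (data-parallel): each work-item checks one sample.
    return abs(samples[gid]) > limit


def collect_outliers(samples, flags):
    # Second kernel (task-parallel): a single task gathers the flagged samples.
    return [s for s, flagged in zip(samples, flags) if flagged]


samples = [0.5, -3.0, 1.2, 4.5]
flags = run_ndrange(is_out_of_bounds, len(samples), samples, 2.0)

# Condition derived from the first kernel's execution: something was flagged.
outliers = run_task(collect_outliers, samples, flags) if any(flags) else []
print(outliers)  # [-3.0, 4.5]
```

When no sample is flagged, the task-parallel kernel is never enqueued, so the follow-up cost is paid only when the data-parallel pass finds something to act on.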
19. The computing device of claim 11, wherein the second compute unit enqueues a barrier on a queue of commands to block execution of commands enqueued on the queue of commands after the barrier until the barrier completes.
20. The computing device of claim 11, wherein the second compute unit enqueues a marker on a queue of commands that does not complete until one or more other commands complete.
March 23, 2021