Legal claims defining the scope of protection, as filed with the USPTO.
1. A method of executing compute kernels, comprising: comparing a kernel profile of a compute kernel to respective processor profiles of a plurality of processors in a heterogeneous computer system; determining a quantity of workgroups to schedule for execution at a time, wherein the determining is based upon current monitored workloads in the plurality of processors compared to a workload threshold and historical information, the historical information including at least one of execution times and processor utilization for each of the workgroups on processors of a first type and of a second type in the plurality of processors, and wherein each of the workgroups comprises multiple instances of an associated compute kernel configured to execute in parallel; and scheduling the workgroups for execution on one or more of the plurality of processors based on the comparing and determining.
2. The method of claim 1, further comprising: selecting at least one processor from the plurality of processors based upon the comparing; and scheduling the compute kernel for execution in the selected at least one processor.
3. The method of claim 2, further comprising: generating the kernel profile during installation or compilation of the compute kernel; and refining the kernel profile based upon dynamic information gathered during preceding executions of corresponding at least one of compute kernels or workgroups.
4. The method of claim 3, wherein generating the kernel profile comprises: determining performance characteristics of the compute kernel; and determining instruction characteristics of the compute kernel.
5. The method of claim 4, wherein generating the kernel profile further comprises: determining processor affinity characteristics for the compute kernel based upon proximity of at least one of data inputs and data outputs of the compute kernel.
6. The method of claim 2, further comprising: characterizing respective processors of the plurality of processors; and generating the processor profiles based upon the characterizing.
7. The method of claim 6, the characterizing respective processors including: determining at least one of a performance or capacity characteristic of a particular one of the respective processors using one or more of a configured property or a dynamically executed test on the particular processor.
8. The method of claim 2, wherein the comparing comprises: determining available processors from the plurality of processors; determining one or more execution-ready compute kernels from a plurality of compute kernels; and matching respective kernel profiles of the one or more execution-ready compute kernels to respective processor profiles of the available processors.
9. The method of claim 8, wherein the matching comprises: selecting a pairing of one of the execution-ready compute kernels and one or more of the available processors; estimating a performance measure to execute the selected compute kernel in the selected one or more available processors; and determining one or more matches for the selected pairing based upon the estimated performance measure.
10. The method of claim 9, wherein the matching further comprises: estimating a consumed energy measure to execute the selected compute kernel in the selected one or more available processors; and determining the one or more matches for the selected pairing based further upon the estimated consumed energy measure.
11. The method of claim 10, wherein the matching further comprises: periodically re-estimating one or more of the performance measure and the consumed energy measure based upon dynamically obtained performance data of the heterogeneous computer system.
12. The method of claim 10, wherein the consumed energy measure is determined based upon at least one of an estimated energy required in executing the pairing and an estimated energy required in transferring data for the pairing.
13. The method of claim 9, wherein the performance measure is determined based upon at least one of an estimated time required in executing the pairing and an estimated time required in transferring data for the pairing.
14. The method of claim 2, wherein the scheduling comprises: determining respective sizes of workloads for the selected compute kernel to execute in respective ones of the selected at least one processor, wherein the selected at least one processor includes a processor of a first type and a processor of a second type; adjusting the determined respective sizes of workloads based upon historical information including dynamically measured at least one of execution times and processor utilization for respective workgroups on processors of the first type and on processors of the second type; and scheduling the compute kernel for execution in the selected at least one processor in accordance with the adjusted determined respective sizes of workloads.
15. The method of claim 14, wherein the scheduling further comprises: comparing current conditions of the heterogeneous computer system with the historical information; determining a current status of the heterogeneous computer system in accordance with the comparing; and adjusting the scheduling based upon the current status.
16. A heterogeneous computer system, comprising: a plurality of processors including at least one processor of a first type and at least one processor of a second type; at least one memory coupled to the plurality of processors; and a unified kernel scheduler configured to: compare a kernel profile of a compute kernel to respective processor profiles of the plurality of processors; determine a quantity of workgroups to schedule for execution at a time, wherein the determining is based upon current monitored workloads in the plurality of processors compared to a workload threshold and historical information, the historical information including at least one of execution times and processor utilization for each of the workgroups on processors of the first type and of the second type in the plurality of processors, and wherein each of the workgroups comprises multiple instances of an associated compute kernel configured to execute in parallel; and schedule the workgroups for execution on one or more of the plurality of processors based on the comparing and determining.
17. The heterogeneous computer system of claim 16, wherein the unified scheduler is further configured to: select at least one processor from the plurality of processors based upon the comparing; and schedule the compute kernel for execution in the selected at least one processor.
18. The heterogeneous computer system of claim 17, further comprising: a unified kernel queue configured to enqueue compute kernels; and wherein the unified scheduler is further configured to: compare the kernel profile of the compute kernel from the unified kernel queue to respective processor profiles of the plurality of processors.
19. The heterogeneous computer system of claim 18, wherein the unified scheduler is further configured to: determine available processors from the plurality of processors; determine one or more execution-ready compute kernels from the plurality of compute kernels; and match respective kernel profiles of the one or more execution-ready compute kernels to respective processor profiles of the available processors.
20. A non-transitory computer readable storage medium storing commands, wherein the commands, if executed, cause a method comprising: comparing a kernel profile of a compute kernel to respective processor profiles of a plurality of processors in a heterogeneous computer system; determining a quantity of workgroups to schedule for execution at a time, wherein the determining is based upon current monitored workloads in the plurality of processors compared to a workload threshold and historical information, the historical information including at least one of execution times and processor utilization for each of the workgroups on processors of a first type and of a second type in the plurality of processors, and wherein each of the workgroups comprises multiple instances of an associated compute kernel configured to execute in parallel; and scheduling the workgroups for execution on one or more of the plurality of processors based on the comparing and determining.
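Illustrative sketch (not part of the claims, and not the patentee's implementation): the flow recited in independent claims 1, 16, and 20 might be approximated in Python roughly as below. Every name here (KernelProfile, ProcessorProfile, match_score, workgroups_per_dispatch, schedule), the single-number "parallelism" profile, the 0.75 workload threshold, and the base batch size of 8 are assumptions chosen only to make the three steps (compare profiles, determine a workgroup quantity from monitored load and history, schedule) concrete.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    name: str
    parallelism: float  # hypothetical 0..1 score of how data-parallel the kernel is


@dataclass
class ProcessorProfile:
    name: str
    kind: str                 # processor type, e.g. "cpu" or "gpu"
    parallel_capacity: float  # hypothetical 0..1 score
    current_load: float       # currently monitored workload, 0..1


def match_score(kernel, proc):
    """Compare a kernel profile to a processor profile (assumed metric)."""
    return 1.0 - abs(kernel.parallelism - proc.parallel_capacity)


def workgroups_per_dispatch(proc, history, load_threshold=0.75, base=8):
    """Decide how many workgroups to schedule at a time on `proc`.

    `history` maps a processor type to past per-workgroup execution times
    in seconds. The threshold and base values are assumptions.
    """
    if proc.current_load >= load_threshold:
        return 1  # processor is busy: hand it the smallest batch
    times = history.get(proc.kind, [])
    if not times:
        return base  # no history for this type yet
    avg = sum(times) / len(times)
    fastest = min(sum(t) / len(t) for t in history.values() if t)
    # Scale the batch by how this type compares with the fastest type.
    return max(1, round(base * fastest / avg))


def schedule(kernel, procs, history, total_workgroups):
    """Return a plan mapping processor name to number of workgroups."""
    if not procs:
        raise ValueError("at least one processor is required")
    # Best-matching processors are offered work first in each round.
    ranked = sorted(procs, key=lambda p: match_score(kernel, p), reverse=True)
    plan = {p.name: 0 for p in procs}
    remaining = total_workgroups
    while remaining > 0:
        for proc in ranked:
            if remaining == 0:
                break
            n = min(remaining, workgroups_per_dispatch(proc, history))
            plan[proc.name] += n
            remaining -= n
    return plan
```

In this sketch a lightly loaded processor of the historically faster type receives larger batches, while a processor whose monitored load exceeds the threshold receives one workgroup per round. The dependent claims (energy estimates, data-transfer time, periodic re-estimation) are not modeled.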
April 22, 2014