Embodiments are disclosed for a utilization-aware approach to cluster scheduling that addresses resource fragmentation and improves cluster utilization and job throughput. In some embodiments, a resource manager at a master node considers the actual usage of running tasks and schedules opportunistic work on underutilized worker nodes. The resource manager monitors resource usage on these nodes and preempts opportunistic containers if the resulting over-subscription becomes untenable. In doing so, the resource manager puts otherwise wasted resources to use while minimizing adverse effects on regularly scheduled tasks.
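The scheduling idea in the abstract can be illustrated with a minimal sketch. It assumes a single resource (CPU cores), utilization expressed as a percentage of node capacity, and a "newest opportunistic container first" preemption order. The class names `Container`, `WorkerNode`, and `UtilizationAwareScheduler` and the threshold names `first_pct`/`second_pct` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    task_id: str
    cores: int
    opportunistic: bool  # True if placed on capacity that is allocated but idle


@dataclass
class WorkerNode:
    name: str
    capacity: int  # total cores on the worker
    used: int = 0  # actual cores in use, as last reported by the worker
    containers: List[Container] = field(default_factory=list)

    def guaranteed_allocated(self) -> int:
        return sum(c.cores for c in self.containers if not c.opportunistic)

    def utilization_pct(self) -> float:
        return 100 * self.used / self.capacity


class UtilizationAwareScheduler:
    """Hypothetical sketch: admit opportunistic work based on actual usage and
    preempt it when usage crosses a higher threshold."""

    def __init__(self, first_pct: int, second_pct: int):
        if second_pct <= first_pct:
            raise ValueError("second threshold must be above the first")
        self.first_pct = first_pct    # allocate opportunistic work below this
        self.second_pct = second_pct  # preempt opportunistic work above this

    def schedule(self, node: WorkerNode, c: Container) -> bool:
        if not c.opportunistic:
            # Regular work is admitted against guaranteed allocations only.
            ok = node.guaranteed_allocated() + c.cores <= node.capacity
        else:
            # Opportunistic work is admitted against *actual* utilization.
            ok = node.utilization_pct() < self.first_pct
        if ok:
            node.containers.append(c)
        return ok

    def enforce(self, node: WorkerNode) -> List[str]:
        """Preempt the newest opportunistic containers until usage is tolerable."""
        preempted = []
        while node.utilization_pct() > self.second_pct:
            opportunistic = [c for c in node.containers if c.opportunistic]
            if not opportunistic:
                break  # only guaranteed work is left; nothing to reclaim
            victim = opportunistic[-1]
            node.containers.remove(victim)
            node.used = max(0, node.used - victim.cores)  # assume its usage is freed
            preempted.append(victim.task_id)
        return preempted
```

A node with 8 cores, 6 of them allocated but only 2 actually busy, can still accept an opportunistic container. If usage later climbs past the second threshold, that container is the first to go. The guaranteed container is never touched.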
Claims, as filed with the USPTO:
2. The method of claim 1, further comprising: de-allocating the opportunistic second-tier resource in response to determining that actual computing resource utilization at the worker node has risen above the second threshold.
3. The method of claim 1, wherein the opportunistic second-tier resource is de-allocated, by the worker node, in response to a determination, by the worker node, that the actual computing resource utilization at the worker node has risen above the second threshold.
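Claim 3 places the de-allocation decision on the worker node itself rather than on the master. The sketch below illustrates one way a worker could do that locally and then report what it killed. It assumes integer utilization percentages, that a killed container's usage is released immediately, and a newest-first kill order. `WorkerPreemptor` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, List, Tuple


class WorkerPreemptor:
    """Hypothetical worker-local enforcement: the worker decides on its own to
    de-allocate opportunistic containers, then reports them upstream."""

    def __init__(self, second_threshold_pct: int):
        self.second_threshold_pct = second_threshold_pct
        # (container_id, observed usage in percent of the node), oldest first
        self.running: Deque[Tuple[str, int]] = deque()
        self.killed_since_report: List[str] = []

    def launch(self, container_id: str, usage_pct: int) -> None:
        self.running.append((container_id, usage_pct))

    def on_local_sample(self, utilization_pct: int) -> int:
        """Handle one local utilization sample; return projected utilization."""
        while utilization_pct > self.second_threshold_pct and self.running:
            cid, usage = self.running.pop()  # newest first (assumed policy)
            utilization_pct -= usage         # assume its usage is released
            self.killed_since_report.append(cid)
        return utilization_pct

    def drain_report(self) -> List[str]:
        """Container IDs to include in the next message to the master."""
        report, self.killed_since_report = self.killed_since_report, []
        return report
```

Deciding locally lets the worker react within one sampling interval instead of waiting a full master round trip. The trade-off is that the master learns about the kill only after the fact.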
4. The method of claim 1, wherein the opportunistic second-tier resource is allocated at the worker node to process the second task after determining that the second task allows processing by opportunistic second-tier resources.
5. The method of claim 1, wherein a request received at the worker node to process a particular task in the distributed computing cluster includes an indication to allow or disallow processing of the particular task using an opportunistic second-tier resource.
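Claims 4 and 5 make opportunistic placement opt-in per task. The following minimal sketch shows a placement check that honors such a flag. It assumes integer cores and utilization percentages. `TaskRequest`, `allow_opportunistic`, and `place` are hypothetical names, not fields defined by the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskRequest:
    task_id: str
    cores: int
    # Hypothetical field carrying the claimed allow/disallow indication.
    allow_opportunistic: bool = True


def place(req: TaskRequest, free_guaranteed_cores: int,
          utilization_pct: int, first_threshold_pct: int) -> Optional[str]:
    """Return the tier the task would run in, or None to keep it queued."""
    if req.cores <= free_guaranteed_cores:
        return "guaranteed"
    if req.allow_opportunistic and utilization_pct < first_threshold_pct:
        return "opportunistic"
    return None  # e.g. a latency-sensitive task that opted out
```

A task that must not risk preemption sets the flag to `False` and simply waits for guaranteed capacity, even when the node is mostly idle.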
6. The method of claim 1, wherein the information received from the worker node includes periodic heartbeat signals, the method further comprising: determining whether to de-allocate the previously allocated opportunistic second-tier resource at the worker node each time a periodic heartbeat signal is received.
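The per-heartbeat check of claim 6 can be sketched as a handler on the master that re-evaluates each node whenever its heartbeat arrives. The heartbeat payload shape, the optional node-supplied threshold, the default of 85%, and the "one container per heartbeat" policy are all assumptions. `Heartbeat` and `ResourceManager` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Heartbeat:
    node_id: str
    utilization_pct: int
    opportunistic_ids: List[str] = field(default_factory=list)  # oldest first
    # Optional node-supplied threshold; hypothetical field name.
    second_threshold_pct: Optional[int] = None


class ResourceManager:
    def __init__(self, default_second_threshold_pct: int = 85):
        self.default_second_threshold_pct = default_second_threshold_pct

    def on_heartbeat(self, hb: Heartbeat) -> List[str]:
        """Decide, once per heartbeat, which containers to de-allocate.
        The returned IDs would ride back on the heartbeat response."""
        threshold = (hb.second_threshold_pct
                     if hb.second_threshold_pct is not None
                     else self.default_second_threshold_pct)
        if hb.utilization_pct > threshold and hb.opportunistic_ids:
            # Assumed policy: reclaim one container (the newest) per heartbeat,
            # then re-check on the next heartbeat instead of over-reacting.
            return [hb.opportunistic_ids[-1]]
        return []
```

Reclaiming one container per heartbeat is a conservative choice. It can take several intervals to relieve heavy pressure, but it avoids killing more work than a short spike requires.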
7. The method of claim 1, wherein the information received from the worker node includes values for the first and/or second threshold.
8. The method of claim 1, further comprising: setting the first and/or second threshold based on the information received from the worker node.
9. The method of claim 1, wherein the first threshold and/or second threshold dynamically adjust in response to changes in actual computing resource utilization at the worker node.
10. The method of claim 1, wherein the first threshold and/or second threshold are specific to the worker node and are different than thresholds at another worker node in the distributed computing cluster.
11. The method of claim 1, wherein the second threshold is higher than the first threshold.
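Claims 9 to 11 describe thresholds that are per node, adjust dynamically, and keep the second above the first. The gap between them acts as a hysteresis band that prevents admit/preempt flapping. The claims do not specify an adjustment rule. The one below is purely hypothetical: it lowers both thresholds by half the recent utilization swing so that volatile nodes keep more headroom. `NodeThresholds` and `cluster` are illustrative names.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class NodeThresholds:
    """Per-node first/second thresholds with a hypothetical adaptive rule."""

    def __init__(self, first_pct: int, second_pct: int, window: int = 5):
        if second_pct <= first_pct:
            raise ValueError("second threshold must be higher than the first")
        self.first_pct = first_pct
        self.second_pct = second_pct
        self.samples: Deque[int] = deque(maxlen=window)  # recent utilization

    def observe(self, utilization_pct: int) -> None:
        self.samples.append(utilization_pct)

    def current(self) -> Tuple[int, int]:
        # Assumed rule, not taken from the patent: back both thresholds off by
        # half the recent swing, and keep second strictly above first.
        swing = max(self.samples) - min(self.samples) if self.samples else 0
        first = max(0, self.first_pct - swing // 2)
        second = max(first + 1, self.second_pct - swing // 2)
        return first, second


# Thresholds are kept per worker, so two nodes can use different values.
cluster: Dict[str, NodeThresholds] = {
    "w1": NodeThresholds(60, 85),
    "w2": NodeThresholds(70, 90),
}
```

After observing samples of 40, 50, and 70 on `w1` (a 30-point swing), its thresholds would move from (60, 85) to (45, 70). `w2` would keep its own (70, 90).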
12. The method of claim 1, further comprising: allocating an opportunistic third-tier resource at the worker node to process a third task in response to determining that the actual computing resource utilization at the worker node is below a third threshold; wherein the opportunistic third-tier resource includes underutilized computing resources previously allocated and guaranteed to the previously allocated first-tier resource and/or the opportunistic second-tier resource, and wherein the opportunistic third-tier resource is subject to de-allocation if the actual computing resource utilization at the worker node rises above a fourth threshold.
13. The method of claim 12, wherein the third threshold is the same as the first threshold, and wherein the fourth threshold is the same as the second threshold.
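Claims 12 and 13 add a third tier that can borrow idle capacity from both guaranteed and second-tier allocations. The sketch below models the tiers and builds the threshold table the way claim 13 describes: the third and fourth thresholds equal the first and second. The order in which tiers are reclaimed (most speculative first) is an assumption the claims do not state. `Tier`, `claim13_thresholds`, `can_admit`, and `victims` are hypothetical names.

```python
from enum import IntEnum
from typing import Dict, List, Tuple

Thresholds = Dict["Tier", Tuple[int, int]]  # tier -> (admit_below, preempt_above)


class Tier(IntEnum):
    GUARANTEED = 1  # first tier: allocated and guaranteed
    SECOND = 2      # opportunistic second tier
    THIRD = 3       # opportunistic third tier


def claim13_thresholds(first_pct: int, second_pct: int) -> Thresholds:
    """The third tier reuses the second tier's (admit, preempt) pair."""
    return {Tier.SECOND: (first_pct, second_pct),
            Tier.THIRD: (first_pct, second_pct)}


def can_admit(tier: Tier, utilization_pct: int, th: Thresholds) -> bool:
    return utilization_pct < th[tier][0]


def victims(containers: List[Tuple[str, Tier, int]], utilization_pct: int,
            th: Thresholds) -> List[str]:
    """containers: (id, tier, usage_pct). Assumed policy: reclaim the most
    speculative tier first and never touch guaranteed work."""
    out = []
    for cid, tier, usage in sorted(containers, key=lambda c: c[1], reverse=True):
        if tier == Tier.GUARANTEED:
            break
        if utilization_pct > th[tier][1]:
            out.append(cid)
            utilization_pct -= usage  # assume its usage is released
    return out
```

With thresholds (60, 85), a node at 95% would shed only its third-tier container if that brings it back to 85%. At 100%, it would also shed the second-tier container.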
14. The method of claim 1, wherein the computing resources at the worker node include any one or more of processing, memory, data storage, I/O, or network resources.
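Claim 14 allows utilization to span several resource types. The claims do not say how thresholds combine across them. The sketch below assumes one plausible rule: a node has headroom for opportunistic work only if *every* tracked resource is below its first threshold, and preemption is triggered if *any* resource exceeds its second threshold. The function names and the resource keys are hypothetical.

```python
from typing import Dict, List

Usage = Dict[str, int]  # resource name -> utilization percent


def can_oversubscribe(util: Usage, first: Usage) -> bool:
    # Every tracked resource must have headroom (assumed conjunctive rule).
    return all(util[r] < first[r] for r in first)


def pressured_resources(util: Usage, second: Usage) -> List[str]:
    # Any single resource over its second threshold triggers preemption
    # (assumed disjunctive rule); the list says which ones are pressured.
    return sorted(r for r in second if util[r] > second[r])
```

Checking memory separately from CPU matters because the two fail differently. A CPU overload slows tasks down, while a memory overload can get processes killed. A conservative "all below / any above" rule guards against the worse case.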
16. The system of claim 15, wherein the information received from the worker node includes periodic heartbeat signals, and wherein the memory has further instructions stored thereon, which when executed by the processor, cause the system to further: determine whether actual resource utilization has risen above the second threshold each time a periodic heartbeat signal is received.
17. The system of claim 15, wherein the first threshold and/or second threshold dynamically adjust in response to changes in actual computing resource utilization at the worker node.
19. The method of claim 18, wherein the information received from the worker node includes periodic heartbeat signals, the method further comprising: determining whether actual resource utilization has risen above the second threshold each time a periodic heartbeat signal is received.
February 21, 2020
August 24, 2021