A job profile describes characteristics of a job. A performance parameter is calculated based on the job profile, and a value of the performance parameter is used to determine an allocation of resources to assign to the job to meet a performance goal associated with the job.
1. A method comprising: receiving, by a system having a processor, a job profile of a job, wherein the job profile describes characteristics of map tasks and reduce tasks, wherein the map tasks produce intermediate results based on segments of input data, and the reduce tasks produce an output based on the intermediate results; providing, in the system, a performance model that calculates a performance parameter based on the characteristics of the job profile, a number of the map tasks, a number of the reduce tasks, and an allocation of resources; identifying, in the system, plural feasible solutions including corresponding different allocations of resources for which respective values of the performance parameter calculated by the performance model satisfy a performance goal associated with the job, wherein each of the different allocations of resources includes a respective number of map slots in which the map tasks are performed, and a respective number of reduce slots in which the reduce tasks are performed; and determining, by the system, a particular allocation of resources selected from the plural feasible solutions to assign to the job to meet the performance goal.
2. The method of claim 1, wherein identifying the plural feasible solutions comprises: determining whether the corresponding value of the performance parameter calculated by the performance model for each of the plural feasible solutions satisfies the performance goal.
3. The method of claim 2, wherein the performance goal is a completion time, and wherein the performance parameter is a time value.
4. The method of claim 2, wherein the performance parameter calculated by the performance model is one of a lower bound parameter, an upper bound parameter, and an intermediate parameter between the lower bound parameter and the upper bound parameter.
5. The method of claim 1, wherein determining the particular allocation of resources comprises selecting from among the feasible solutions in the set according to a predefined criterion.
6. The method of claim 1, wherein the performance parameter is computed by the performance model further based on a number of map slots, a number of reduce slots, an average time duration of a map task, an average time duration of a shuffle phase in a reduce stage that includes the reduce tasks, an average of time duration of a sort phase in the reduce stage, and an average time duration of a reduce phase in the reduce stage.
7. An article comprising at least one non-transitory machine-readable storage medium storing instructions that upon execution cause a system having a processor to: receive a job profile describing a job to be performed in a distributed computing platform having resources, wherein the job profile includes characteristics of a map stage and a reduce stage of the job, the map stage processing input data to produce an intermediate result, and the reduce stage to process the intermediate result to produce an output; use the characteristics of the job profile to calculate corresponding values of a performance parameter for respective different allocations of resources for the job, wherein the performance parameter is computed based on a number of map tasks in the map stage, a number of reduce tasks in the reduce stage, a number of map slots, a number of reduce slots, an average time duration of a map task, an average time duration of a shuffle phase in the reduce stage, an average time duration of a sort phase in the reduce stage, and an average time duration of a reduce phase in the reduce stage; and determine, based on the values of the performance parameter, a specific allocation of the resources for the job that satisfies a performance goal.
8. The article of claim 7, wherein the specific allocation of resources is a feasible solution, and wherein the instructions upon execution cause the system to further: identify plural feasible solutions including corresponding different allocations of resources for which respective values of the performance parameter satisfy the performance goal.
9. The article of claim 8, wherein each of the different allocations of resources of the feasible solutions includes a respective number of map slots and a respective number of reduce slots, where the map tasks are performed in the map slots, and the reduce tasks are performed in the reduce slots, and wherein identifying the plural feasible solutions comprises: iterating through a range of numbers of the map slots; and for each of the numbers of map slots in the range, determining if there is a number of reduce slots for which a calculated value of the performance parameter satisfies the performance goal.
10. The article of claim 8, wherein each of the allocations of resources of the feasible solutions includes a respective number of map slots and a respective number of reduce slots, where the map tasks are performed in the map slots, and the reduce tasks are performed in the reduce slots, and wherein identifying the plural feasible solutions comprises: iterating through a range of numbers of the reduce slots; and for each of the numbers of reduce slots in the range, determining if there is a number of map slots for which a calculated value of the performance parameter satisfies the performance goal.
11. The article of claim 7, wherein the specific allocation of the resources for the job includes a number of map slots and a number of reduce slots, wherein the map slots of the specific allocation are used for executing tasks of the map stage, and the reduce slots of the specific allocation are used for executing tasks of the reduce stage.
12. The article of claim 11, wherein the distributed computing platform has plural physical machines, where each physical machine has a respective set of map and reduce slots.
13. The article of claim 7, wherein the performance goal is a target completion time.
14. The article of claim 7, wherein the performance parameter is an upper bound performance parameter computed further based on a maximum time duration of a map task, a maximum time duration of the shuffle phase, a maximum time duration of the sort phase, and a maximum time duration of the reduce phase.
15. The article of claim 7, wherein the performance parameter is an intermediate performance parameter between an upper bound and a lower bound, wherein the lower bound is computed based on the number of map tasks in the map stage, the number of reduce tasks in the reduce stage, the number of map slots, the number of reduce slots, the average time duration of a map task, the average time duration of the shuffle phase, the average time duration of the sort phase, and the average time duration of the reduce phase, and wherein the upper bound is computed based on the number of the map tasks in the map stage, the number of reduce tasks in the reduce stage, the number of map slots, the number of reduce slots, the average time duration of a map task, a maximum time duration of a map task, the average time duration of the shuffle phase, a maximum time duration of the shuffle phase, the average time duration of the sort phase, a maximum time duration of the sort phase, the average time duration of the reduce phase, and a maximum time duration of the reduce phase.
16. The article of claim 7, wherein the performance parameter is a lower bound performance parameter.
17. A system comprising: storage media to store a job profile, wherein the job profile describes a job including a map stage to produce an intermediate result based on input data, and a reduce stage to produce an output based on the intermediate result; and at least one processor to: provide a performance model that calculates a performance parameter based on the job profile, a number of map tasks in the map stage, a number of reduce tasks in the reduce stage, a number of map slots in which the map tasks are performed, a number of reduce slots in which the reduce tasks are performed, an average time duration of a map task, an average time duration of a shuffle phase in the reduce stage, an average time duration of a sort phase in the reduce stage, and an average time duration of a reduce phase in the reduce stage; and determine, using a value of the performance parameter calculated by the performance model, a particular allocation of resources to assign to the job to meet a performance goal associated with the job.
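The claims name the inputs to the performance model (task counts, slot counts, average and maximum phase durations) but do not state its formulas. The Python sketch below is illustrative only and rests on two assumptions that are not taken from the claims: each phase is bounded with the standard greedy-scheduling makespan bounds for n tasks on k slots (lower bound n·avg/k, upper bound (n−1)·avg/k + max), and the job's completion time is the sum of the map, shuffle, sort, and reduce phase bounds. All names (`Phase`, `JobProfile`, `completion_time_bounds`, `feasible_allocations`, `pick_allocation`) are hypothetical. It iterates over map-slot counts as in claim 9 and applies "fewest total slots" as one possible predefined criterion in the sense of claim 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    avg_s: float  # average task duration in this phase (seconds)
    max_s: float  # maximum task duration in this phase (seconds)


@dataclass(frozen=True)
class JobProfile:
    n_map: int       # number of map tasks
    n_reduce: int    # number of reduce tasks
    map_phase: Phase
    shuffle: Phase
    sort: Phase
    reduce: Phase


def phase_bounds(n_tasks, n_slots, phase):
    """Assumed greedy makespan bounds for n_tasks run on n_slots."""
    low = n_tasks * phase.avg_s / n_slots
    up = (n_tasks - 1) * phase.avg_s / n_slots + phase.max_s
    return low, up


def completion_time_bounds(profile, map_slots, reduce_slots):
    """Return (lower, upper) completion-time bounds for one allocation.

    Assumption: phase bounds are simply added together.
    """
    parts = [phase_bounds(profile.n_map, map_slots, profile.map_phase)]
    for phase in (profile.shuffle, profile.sort, profile.reduce):
        parts.append(phase_bounds(profile.n_reduce, reduce_slots, phase))
    return sum(p[0] for p in parts), sum(p[1] for p in parts)


def feasible_allocations(profile, deadline_s, max_map_slots, max_reduce_slots):
    """For each map-slot count, find the smallest reduce-slot count whose
    upper-bound completion time meets the deadline."""
    feasible = []
    for m in range(1, max_map_slots + 1):
        for r in range(1, max_reduce_slots + 1):
            _, upper = completion_time_bounds(profile, m, r)
            if upper <= deadline_s:
                feasible.append((m, r))
                break  # more reduce slots only lower the bound further
    return feasible


def pick_allocation(feasible):
    """One possible selection criterion: fewest total slots."""
    if not feasible:
        return None
    return min(feasible, key=lambda alloc: alloc[0] + alloc[1])


profile = JobProfile(
    n_map=100, n_reduce=20,
    map_phase=Phase(10.0, 20.0), shuffle=Phase(5.0, 10.0),
    sort=Phase(1.0, 2.0), reduce=Phase(4.0, 8.0),
)
candidates = feasible_allocations(profile, deadline_s=200.0,
                                  max_map_slots=20, max_reduce_slots=20)
print(pick_allocation(candidates))
```

The upper bound is used as the feasibility test here because it is the conservative choice for a deadline; swapping in the lower bound or a value between the two would correspond to the alternatives listed in claim 4.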
February 2, 2011
August 5, 2014