Techniques are provided for scheduling computational tasks among multiple classes of storage resources based on a job classification. A job to be executed is classified into one of a plurality of predefined job classes. Each predefined job class is associated with a corresponding one of a plurality of predefined storage classes. The job is then assigned based on the classification to one of the storage resources of the predefined storage class associated with the classified predefined job class. Exemplary predefined storage classes include a performance class, a capacity class, a key-value storage class, and a shingled disk drive class. Exemplary predefined job classes include a CPU Intensive job class, an IO Intensive job class and a Small IO job class. Data required for a job is optionally prefetched before the job is assigned to a storage device. Data objects to be evicted from a storage device are optionally selected based on an anticipated future access.
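The classify-then-assign flow summarized above can be illustrated with a minimal sketch. It is not the patented implementation. The names `Job`, `classify`, and `Scheduler` are hypothetical, as are the two thresholds, the `est_io_fraction` characteristic, and the particular job-class-to-storage-class mapping; the text only states that each job class has a corresponding storage class and that Small IO classification involves a page size threshold.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import List


class JobClass(Enum):
    CPU_INTENSIVE = "CPU Intensive"
    IO_INTENSIVE = "IO Intensive"
    SMALL_IO = "Small IO"


class StorageClass(Enum):
    PERFORMANCE = "performance"
    CAPACITY = "capacity"
    KEY_VALUE = "key-value"
    SHINGLED = "shingled disk drive"


# Assumed mapping for illustration only; the source does not fix which
# storage class corresponds to which job class.
JOB_TO_STORAGE = {
    JobClass.CPU_INTENSIVE: StorageClass.CAPACITY,
    JobClass.IO_INTENSIVE: StorageClass.PERFORMANCE,
    JobClass.SMALL_IO: StorageClass.KEY_VALUE,
}

PAGE_SIZE_THRESHOLD = 4096  # bytes; assumed value
IO_FRACTION_THRESHOLD = 0.5  # assumed value


@dataclass
class Job:
    name: str
    object_sizes: List[int]  # sizes in bytes of the data objects the job touches
    est_io_fraction: float   # assumed characteristic: share of runtime spent on IO


def classify(job: Job) -> JobClass:
    """Pick a job class from the job's characteristics."""
    # Small IO: every data object fits within the page size threshold.
    if job.object_sizes and max(job.object_sizes) <= PAGE_SIZE_THRESHOLD:
        return JobClass.SMALL_IO
    if job.est_io_fraction >= IO_FRACTION_THRESHOLD:
        return JobClass.IO_INTENSIVE
    return JobClass.CPU_INTENSIVE


class Scheduler:
    """Keeps one queue per job class and routes each incoming job to it."""

    def __init__(self) -> None:
        self.queues = {job_class: deque() for job_class in JobClass}

    def submit(self, job: Job):
        job_class = classify(job)
        self.queues[job_class].append(job)
        return job_class, JOB_TO_STORAGE[job_class]


scheduler = Scheduler()
small = scheduler.submit(Job("index-lookup", [512, 2048], 0.9))
heavy_io = scheduler.submit(Job("table-scan", [10_000_000], 0.8))
compute = scheduler.submit(Job("simulation", [10_000_000], 0.1))
```

Keeping one queue per job class means the storage class is decided once, at submission time, and each queue can then be drained by whatever resources back its storage class.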
Legal claims defining the scope of protection, as filed with the USPTO.
1. A job scheduling method comprising: obtaining at least one job to be executed from a client application; obtaining a classification of a plurality of storage resources in a plurality of storage tiers into one of a plurality of predefined storage classes, wherein said plurality of predefined storage classes comprises at least two of a CPU Intensive storage class, an IO Intensive storage class and a Small IO storage class for said plurality of storage resources in said plurality of storage tiers; maintaining a plurality of job queues comprising at least two of a CPU Intensive job queue, an IO Intensive job queue and a Small IO job queue; classifying, in response to said obtaining said at least one job to be executed from said client application, said at least one job to be executed into a particular one of a plurality of predefined job classes based on one or more characteristics of the at least one job to be executed, wherein said plurality of predefined job classes comprises at least two of a CPU Intensive job class, an IO Intensive job class and a Small IO job class, wherein each of said plurality of predefined job classes is associated with a corresponding one of said plurality of predefined storage classes; and dynamically assigning, in response to said obtaining said at least one job to be executed from said client application, said at least one job to be executed to one of the plurality of job queues for the predefined storage class associated with the particular predefined job class, wherein the storage resource for the predefined storage class is in at least one of the plurality of storage tiers, wherein the step of dynamically assigning said at least one job to be executed to the at least one storage class is based on said classifying of the at least one job to be executed into the particular predefined job class.
2. The method of claim 1 wherein said plurality of predefined storage classes comprises at least two of a performance class that employs storage resources based on performance considerations, a capacity class that employs storage resources based on capacity considerations, a key-value storage class that employs a hardware accelerated key-value store, and a shingled disk drive class.
3. The method of claim 2 wherein said key-value storage class comprises one or more of a key-value flash-based storage system and a key-value disk-based storage system.
4. The method of claim 1 wherein at least one job is assigned to said Small IO job class based on a comparison of one or more data objects associated with said at least one job to a page size threshold.
5. The method of claim 1 further comprising the step of prefetching data required for at least one job assigned to said IO Intensive job class before said at least one job is assigned to a storage device of said corresponding predefined storage class.
6. The method of claim 1 further comprising the step of selecting one or more data objects to be evicted from one or more storage devices based on an anticipated future access.
7. A computer program product comprising a non-transitory processor-readable storage medium having encoded therein executable code of one or more software programs, wherein the one or more software programs when executed perform the steps of the method of claim 1.
8. The method of claim 1 further comprising the steps of querying a multi-tier interface to identify required data sets already resident in a performance storage tier and scheduling one or more jobs requiring said required data sets in a performance storage tier to execute using said performance storage tier.
9. The method of claim 1 further comprising the step of parsing said plurality of job queues to organize data for said classified jobs based on future data accesses.
10. A system, comprising: a memory; and at least one processing device, coupled to the memory, operative to implement the following steps: obtaining at least one job to be executed from a client application; obtaining a classification of a plurality of storage resources in a plurality of storage tiers into one of a plurality of predefined storage classes, wherein said plurality of predefined storage classes comprises at least two of a CPU Intensive storage class, an IO Intensive storage class and a Small IO storage class for said plurality of storage resources in said plurality of storage tiers; maintaining a plurality of job queues comprising at least two of a CPU Intensive job queue, an IO Intensive job queue and a Small IO job queue; classifying, in response to said obtaining said at least one job to be executed from said client application, said at least one job to be executed into a particular one of a plurality of predefined job classes based on one or more characteristics of the at least one job to be executed, wherein said plurality of predefined job classes comprises at least two of a CPU Intensive job class, an IO Intensive job class and a Small IO job class, wherein each of said plurality of predefined job classes is associated with a corresponding one of said plurality of predefined storage classes; and dynamically assigning, in response to said obtaining said at least one job to be executed from said client application, said at least one job to be executed to one of the plurality of job queues for the predefined storage class associated with the particular predefined job class, wherein the storage resource for the predefined storage class is in at least one of the plurality of storage tiers, wherein the step of dynamically assigning said at least one job to be executed to the at least one storage class is based on said classifying of the at least one job to be executed into the particular predefined job class.
11. The system of claim 10 wherein said plurality of predefined storage classes comprises at least two of a performance class that employs storage resources based on performance considerations, a capacity class that employs storage resources based on capacity considerations, a key-value storage class that employs a hardware accelerated key-value store, and a shingled disk drive class.
12. The system of claim 10 wherein at least one job is assigned to said Small IO job class based on a comparison of one or more data objects associated with said at least one job to a page size threshold.
13. The system of claim 10 wherein said at least one processing device is further configured to prefetch data required for at least one job assigned to said IO Intensive job class before said at least one job is assigned to a storage device of said corresponding predefined storage class.
14. The system of claim 10 wherein said at least one processing device is further configured to select one or more data objects to be evicted from one or more storage devices based on an anticipated future access.
15. The system of claim 10 wherein said at least one processing device is further configured to query a multi-tier interface to identify required data sets already resident in a performance storage tier and schedule one or more jobs requiring said required data sets in a performance storage tier to execute using said performance storage tier.
16. The system of claim 10 further comprising the step of parsing said plurality of job queues to organize data for said classified jobs based on future data accesses.
17. A job scheduling system comprising: a plurality of data nodes; and a job scheduling node, wherein the job scheduling node is configured to: communicate with said plurality of said data nodes over a network; obtain at least one job to be executed from a client application; obtain a classification of a plurality of storage resources in a plurality of storage tiers into one of a plurality of predefined storage classes, wherein said plurality of predefined storage classes comprises at least two of a CPU Intensive storage class, an IO Intensive storage class and a Small IO storage class for said plurality of storage resources in said plurality of storage tiers; maintain a plurality of job queues comprising at least two of a CPU Intensive job queue, an IO Intensive job queue and a Small IO job queue; classify, in response to said obtaining said at least one job to be executed from said client application, said at least one job to be executed into a particular one of a plurality of predefined job classes based on one or more characteristics of the at least one job to be executed, wherein said plurality of predefined job classes comprises at least two of a CPU Intensive job class, an IO Intensive job class and a Small IO job class, wherein each of said plurality of predefined job classes is associated with a corresponding one of said plurality of predefined storage classes; and dynamically assign, in response to said obtaining said at least one job to be executed from said client application, said at least one job to be executed to one of the plurality of job queues for the predefined storage class associated with the particular predefined job class, wherein the storage resource for the predefined storage class is in at least one of the plurality of storage tiers, wherein the step of dynamically assigning said at least one job to be executed to the at least one storage class is based on said classifying of the at least one job to be executed into the particular predefined job class.
18. The job scheduling system of claim 17 wherein said plurality of predefined storage classes comprises at least two of a performance class that employs storage resources based on performance considerations, a capacity class that employs storage resources based on capacity considerations, a key-value storage class that employs a hardware accelerated key-value store, and a shingled disk drive class.
19. The job scheduling system of claim 17 wherein at least one job is assigned to said Small IO job class based on a comparison of one or more data objects associated with said at least one job to a page size threshold.
20. The job scheduling system of claim 17 wherein said at least one processing device is further configured to prefetch data required for at least one job assigned to said IO Intensive job class before said at least one job is assigned to a storage device of said corresponding predefined storage class.
21. The job scheduling system of claim 17 wherein said at least one processing device is further configured to select one or more data objects to be evicted from one or more storage devices based on an anticipated future access.
22. The job scheduling system of claim 17 wherein said at least one processing device is further configured to query a multi-tier interface to identify required data sets already resident in a performance storage tier and schedule one or more jobs requiring said required data sets in a performance storage tier to execute using said performance storage tier.
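Several dependent claims above describe prefetching data for queued jobs and choosing eviction victims "based on an anticipated future access", without naming a specific policy. One way to read this, shown here purely as an assumption, is to treat the job queue as a lookahead window: objects no queued job needs are evicted first, then objects whose next use is farthest away (in the style of Belady's farthest-future-use rule). The function names and the representation of a queue as a list of sets of object identifiers are hypothetical.

```python
from typing import List, Optional, Sequence, Set


def next_use_index(obj_id: str, queued_jobs: Sequence[Set[str]]) -> Optional[int]:
    """Return the position of the first queued job that needs obj_id, or None."""
    for position, needed in enumerate(queued_jobs):
        if obj_id in needed:
            return position
    return None


def choose_victims(resident: Sequence[str],
                   queued_jobs: Sequence[Set[str]],
                   count: int) -> List[str]:
    """Pick `count` resident objects to evict, using the queue as lookahead."""

    def sort_key(obj_id: str):
        position = next_use_index(obj_id, queued_jobs)
        if position is None:
            # No queued job needs this object: evict it before anything else.
            return (0, 0, obj_id)
        # Otherwise prefer evicting the object whose next use is farthest away.
        return (1, -position, obj_id)

    return sorted(resident, key=sort_key)[:count]


def prefetch_candidates(resident: Sequence[str],
                        queued_jobs: Sequence[Set[str]]) -> List[str]:
    """List objects that queued jobs need but that are not yet resident,
    ordered by how soon they are first needed."""
    already_here = set(resident)
    ordered: List[str] = []
    for needed in queued_jobs:
        for obj_id in sorted(needed):
            if obj_id not in already_here and obj_id not in ordered:
                ordered.append(obj_id)
    return ordered


resident_objects = ["a", "b", "c", "d"]
queue_lookahead = [{"a"}, {"a", "b"}, {"c"}, {"e"}]
victims = choose_victims(resident_objects, queue_lookahead, 2)
to_prefetch = prefetch_candidates(resident_objects, queue_lookahead)
```

In this example `d` is never needed and `c` is needed last among the resident objects, so those two are selected for eviction, while `e` is the only object that would need to be prefetched.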
June 26, 2015
October 1, 2019