US-10489217

Determining storage tiers for placement of data sets during execution of tasks in a workflow

Published: November 26, 2019

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Provided are a computer program product, system, and method for determining storage tiers for placement of data sets during execution of tasks in a workflow. A representation of a workflow execution pattern of tasks for a job indicates a dependency of the tasks and data sets operated on by the tasks. A determination is made of an assignment of the data sets for the tasks to a plurality of the storage tiers based on the dependency of the tasks indicated in the workflow execution pattern. A moving is scheduled of a subject data set of the data sets operated on by a subject task of the tasks that is subject to an event to an assigned storage tier indicated in the assignment for the subject task. The moving of the data set is scheduled to be performed in response to the event with respect to the subject task.
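As a rough illustration of the abstract, the sketch below models a workflow as tasks with dependencies and data sets, then schedules moves when an event (for example, a task starting) occurs for a task. The names `Task`, `schedule_moves_on_event`, the tier labels, and the shape of the `assignment` mapping are all hypothetical; the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One node of the workflow execution pattern (hypothetical model)."""
    name: str
    data_sets: set                      # names of data sets the task operates on
    depends_on: set = field(default_factory=set)  # names of prerequisite tasks


def schedule_moves_on_event(task, assignment, current_tier):
    """Return (data_set, target_tier) moves to schedule for `task`.

    `assignment` maps a task name to its assigned tier (assumed to have been
    computed beforehand from the task dependencies). A move is scheduled only
    for data sets that are not already on the assigned tier.
    """
    target = assignment[task.name]
    return [
        (ds, target)
        for ds in sorted(task.data_sets)
        if current_tier.get(ds) != target
    ]


extract = Task("extract", {"raw", "index"})
moves = schedule_moves_on_event(
    extract,
    assignment={"extract": "ssd"},
    current_tier={"raw": "hdd", "index": "ssd"},
)
print(moves)
```

Here only `raw` needs to move, because `index` already sits on the tier assigned to the task.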

Patent Claims

21 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer program product for assigning tasks to storage tiers to store data sets processed by the tasks, wherein the computer program product comprises a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause operations, the operations comprising: determining related data sets comprising data sets that one task or a group of interdependent tasks operate upon; determining whether the related data sets can be assigned to a higher performing storage tier, wherein the higher performing storage tier includes faster access storage devices than a relatively lower performing storage tier; assigning the one task or group of interdependent tasks to the higher performing storage tier to which the related data sets can be assigned in response to determining that the related data sets can be assigned to the higher performing storage tier, wherein the assignment of a storage tier to a task provides a preferred assignment of a storage tier for a task; and using an assignment of the higher performing storage tier to the task or group of interdependent tasks to determine whether to schedule to move the related data sets operated on by the task or group of interdependent tasks assigned to the higher performing storage tier that is different from the storage tier to which the related data sets are currently assigned.
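The four operations of claim 1 can be read as a small pipeline. The sketch below is one possible interpretation, not the patented implementation: it assumes that "can be assigned" means the data sets still needing a move fit into the free capacity of the faster tier, which the claim itself does not state. All function names, the `sizes` mapping, and the tier labels are hypothetical.

```python
def related_data_sets(group, task_data):
    """Union of the data sets that the task(s) in `group` operate upon."""
    related = set()
    for task in group:
        related |= task_data[task]
    return related


def plan_tier_assignment(group, task_data, sizes, current_tier, free_fast,
                         fast="fast"):
    """Return (preferred tier per task, list of moves to schedule).

    Assumption: the related data sets "can be assigned" to the faster tier
    when those not already stored there fit into `free_fast` capacity units.
    """
    related = related_data_sets(group, task_data)
    to_move = sorted(d for d in related if current_tier[d] != fast)
    needed = sum(sizes[d] for d in to_move)
    if needed > free_fast:
        return {}, []  # no preferred assignment, nothing to schedule

    # Preferred assignment of the faster tier to each task in the group.
    preferred = {task: fast for task in group}
    # Only data sets currently on a different tier are scheduled to move.
    moves = [(d, fast) for d in to_move]
    return preferred, moves


task_data = {"a": {"x"}, "b": {"x", "y"}}
sizes = {"x": 4, "y": 3}
current = {"x": "fast", "y": "slow"}
print(plan_tier_assignment(["a", "b"], task_data, sizes, current, free_fast=10))
```

With 10 free units, both tasks receive the faster tier as their preferred assignment and only `y` is scheduled to move; with too little free capacity the function returns an empty plan.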

2. The computer program product of claim 1, wherein the group of interdependent tasks are interrelated when at least one of: the group of interdependent tasks concurrently operate on the related data sets; the group of interdependent tasks operate on the related data sets to provide input to a dependent task that must receive the input before the dependent task can execute; and the group of interdependent tasks comprise sequential tasks in one or more jobs when one task of the group of interdependent tasks cannot begin until a previous task in a sequence completes.
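The three alternative conditions in claim 2 could be checked pairwise, as in the sketch below. This is a simplification under stated assumptions: each task carries a planned execution window (`start`, `end`) so that concurrency can be tested, and the `depends_on` field encodes "cannot begin until a previous task completes". The `TaskInfo` type and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInfo:
    name: str
    data_sets: frozenset
    start: int                          # assumed planned start time
    end: int                            # assumed planned end time
    depends_on: frozenset = frozenset()  # tasks that must complete first


def concurrent_on_shared_data(a, b):
    """Condition 1: windows overlap and the tasks share a data set."""
    overlap = a.start < b.end and b.start < a.end
    return overlap and bool(a.data_sets & b.data_sets)


def feed_common_dependent(a, b, all_tasks):
    """Condition 2: both tasks provide input to the same dependent task."""
    return any(a.name in t.depends_on and b.name in t.depends_on
               for t in all_tasks)


def sequential(a, b):
    """Condition 3: one task cannot begin until the other completes."""
    return a.name in b.depends_on or b.name in a.depends_on


def interrelated(a, b, all_tasks):
    """True when at least one of the three conditions holds."""
    return (concurrent_on_shared_data(a, b)
            or feed_common_dependent(a, b, all_tasks)
            or sequential(a, b))


a = TaskInfo("a", frozenset({"x"}), 0, 5)
b = TaskInfo("b", frozenset({"x"}), 3, 8)
c = TaskInfo("c", frozenset({"z"}), 10, 12, frozenset({"a", "b"}))
d = TaskInfo("d", frozenset({"q"}), 20, 30)
tasks = [a, b, c, d]
print(interrelated(a, b, tasks), interrelated(a, c, tasks), interrelated(a, d, tasks))
```

Tasks `a` and `b` qualify under the first two conditions, `a` and `c` under the third, and `a` and `d` under none.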

3. The computer program product of claim 1, wherein only a task of the one task or the group of interdependent tasks that are operating on the related data sets that can be assigned to the higher performing store tier are assigned to the higher performing storage tier.

4. The computer program product of claim 1, where the operations further comprise: assigning data sets not included in the related data sets to a lower performing storage tier than the higher performing storage tier.
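Claim 4 amounts to a default placement rule for everything outside the related data sets. A minimal sketch, assuming only two tiers with the hypothetical labels `fast` and `slow`:

```python
def assign_data_set_tiers(all_data_sets, related, fast="fast", slow="slow"):
    """Map each data set to a tier: related sets go to the faster tier,
    all other data sets to the lower performing tier."""
    return {d: (fast if d in related else slow) for d in sorted(all_data_sets)}


print(assign_data_set_tiers({"x", "y", "logs"}, related={"x", "y"}))
```

A system with more than two tiers would need a rule for choosing among the lower tiers, which the claim leaves open.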

5. The computer program product of claim 1, wherein the one task or the group of interdependent tasks are assigned to the higher performing storage tier while the one task or at least one task in the group of interdependent tasks are operating on the related data sets.

6. The computer program product of claim 1, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is not being operated by one task in the group of interdependent tasks.

7. The computer program product of claim 1, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is being operated by one task in the group of interdependent tasks and when the related data set is currently on a storage tier that is lower performing than the higher performing storage tier.
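Claims 6 and 7 describe two cases handled when a starting task in the group begins. The sketch below combines them in one hypothetical handler. It assumes tiers can be ranked by an integer, and it skips data sets already on the target tier, which follows claim 1's condition that a move applies only when the tiers differ. The names `TIER_RANK` and `moves_on_task_start` are illustrative.

```python
# Assumed ordering of tiers: a larger rank means a higher performing tier.
TIER_RANK = {"slow": 0, "fast": 1}


def moves_on_task_start(related, in_use, current_tier, target="fast"):
    """Return (data_set, target_tier, reason) moves to schedule when a
    starting task in the group begins.

    `in_use` holds the data sets currently operated on by some task in
    the group.
    """
    moves = []
    for d in sorted(related):
        if current_tier[d] == target:
            continue  # already on the target tier, nothing to schedule
        if d not in in_use:
            # Claim 6 case: not being operated on by a task in the group.
            moves.append((d, target, "idle"))
        elif TIER_RANK[current_tier[d]] < TIER_RANK[target]:
            # Claim 7 case: in use, but on a lower performing tier.
            moves.append((d, target, "in use on lower tier"))
    return moves


print(moves_on_task_start(
    related={"x", "y", "z"},
    in_use={"y", "z"},
    current_tier={"x": "slow", "y": "slow", "z": "fast"},
))
```

In this example `x` is moved because it is idle, `y` because it is in use on a lower tier, and `z` is left alone because it is already on the faster tier.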

8. A system coupled to a plurality of storage tiers, comprising: a plurality of computational nodes; and a computer readable storage medium having program instructions that when executed by the computational nodes perform operations, the operations comprising: determining related data sets comprising data sets that one task or a group of interdependent tasks operate upon; determining whether the related data sets can be assigned to a higher performing storage tier, wherein the higher performing storage tier includes faster access storage devices than a relatively lower performing storage tier; assigning the one task or group of interdependent tasks to the higher performing storage tier to which the related data sets can be assigned in response to determining that the related data sets can be assigned to the higher performing storage tier, wherein the assignment of a storage tier to a task provides a preferred assignment of a storage tier for a task; and using an assignment of the higher performing storage tier to the task or group of interdependent tasks to determine whether to schedule to move the related data sets operated on by the task or group of interdependent tasks assigned to the higher performing storage tier that is different from the storage tier to which the related data sets are currently assigned.

9. The system of claim 8, wherein the group of interdependent tasks are interrelated when at least one of: the group of interdependent tasks concurrently operate on the related data sets; the group of interdependent tasks operate on the related data sets to provide input to a dependent task that must receive the input before the dependent task can execute; and the group of interdependent tasks comprise sequential tasks in one or more jobs when one task of the group of interdependent tasks cannot begin until a previous task in a sequence completes.

10. The system of claim 8, wherein only a task of the one task or the group of interdependent tasks that are operating on the related data sets that can be assigned to the higher performing store tier are assigned to the higher performing storage tier.

11. The system of claim 8, where the operations further comprise: assigning data sets not included in the related data sets to a lower performing storage tier than the higher performing storage tier.

12. The system of claim 8, wherein the one task or the group of interdependent tasks are assigned to the higher performing storage tier while the one task or at least one task in the group of interdependent tasks are operating on the related data sets.

13. The system of claim 8, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is not being operated by one task in the group of interdependent tasks.

14. The system of claim 8, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is being operated by one task in the group of interdependent tasks and when the related data set is currently on a storage tier that is lower performing than the higher performing storage tier.

15. A method for assigning tasks to storage tiers to store data sets processed by the tasks, comprising: determining related data sets comprising data sets that one task or a group of interdependent tasks operate upon; determining whether the related data sets can be assigned to a higher performing storage tier, wherein the higher performing storage tier includes faster access storage devices than a relatively lower performing storage tier; assigning the one task or group of interdependent tasks to the higher performing storage tier to which the related data sets can be assigned in response to determining that the related data sets can be assigned to the higher performing storage tier, wherein the assignment of a storage tier to a task provides a preferred assignment of a storage tier for a task; and using an assignment of the higher performing storage tier to the task or group of interdependent tasks to determine whether to schedule to move the related data sets operated on by the task or group of interdependent tasks assigned to the higher performing storage tier that is different from the storage tier to which the related data sets are currently assigned.

16. The method of claim 15, wherein the group of interdependent tasks are interrelated when at least one of: the group of interdependent tasks concurrently operate on the related data sets; the group of interdependent tasks operate on the related data sets to provide input to a dependent task that must receive the input before the dependent task can execute; and the group of interdependent tasks comprise sequential tasks in one or more jobs when one task of the group of interdependent tasks cannot begin until a previous task in a sequence completes.

17. The method of claim 15, wherein only a task of the one task or the group of interdependent tasks that are operating on the related data sets that can be assigned to the higher performing store tier are assigned to the higher performing storage tier.

18. The method of claim 15, further comprising: assigning data sets not included in the related data sets to a lower performing storage tier than the higher performing storage tier.

19. The method of claim 15, wherein the one task or the group of interdependent tasks are assigned to the higher performing storage tier while the one task or at least one task in the group of interdependent tasks are operating on the related data sets.

20. The method of claim 15, wherein when the group of interdependent tasks are operating on the related data sets, further comprising: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is not being operated by one task in the group of interdependent tasks.

21. The method of claim 15, wherein when the group of interdependent tasks are operating on the related data sets, further comprising: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is being operated by one task in the group of interdependent tasks and when the related data set is currently on a storage tier that is lower performing than the higher performing storage tier.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

December 15, 2017

Publication Date

November 26, 2019
