Provided are a computer program product, system, and method for determining storage tiers for placement of data sets during execution of tasks in a workflow. A representation of a workflow execution pattern of tasks for a job indicates a dependency of the tasks and the data sets operated on by the tasks. An assignment of the data sets for the tasks to a plurality of the storage tiers is determined based on the dependency of the tasks indicated in the workflow execution pattern. A move is scheduled of a subject data set, operated on by a subject task that is subject to an event, to the storage tier indicated for the subject task in the assignment. The move of the subject data set is scheduled to be performed in response to the event with respect to the subject task.
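To make the event-driven scheduling concrete, here is a minimal Python sketch. It assumes the workflow execution pattern is a dependency map from each task to the tasks it depends on, that the per-task tier assignment has already been determined, and that a task "start" is the triggering event. The task names, data set names, tier labels (`hdd`, `ssd`), and the helpers `moves_for_event` and `replay` are all hypothetical; the sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to walk tasks in dependency order.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow execution pattern: each task maps to the tasks it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "join": {"clean"},
    "report": {"join"},
}
# Hypothetical data sets operated on by each task.
task_data = {
    "extract": ["raw"],
    "clean": ["raw", "staged"],
    "join": ["staged", "ref"],
    "report": ["joined"],
}


def moves_for_event(task, event, assignment, current_tier):
    """Return (data_set, target_tier) moves triggered by `event` on `task`.

    In this sketch only a "start" event triggers moves (an assumption).
    """
    if event != "start":
        return []
    target = assignment[task]
    return [(ds, target) for ds in task_data[task] if current_tier[ds] != target]


def replay(workflow, assignment, current_tier):
    """Walk tasks in dependency order and collect the moves each start schedules."""
    scheduled = []
    for task in TopologicalSorter(workflow).static_order():
        for ds, tier in moves_for_event(task, "start", assignment, current_tier):
            scheduled.append((task, ds, tier))
            current_tier[ds] = tier  # assume each move completes before the next start
    return scheduled


# Hypothetical per-task assignment, assumed to be derived elsewhere from the dependency.
assignment = {"extract": "hdd", "clean": "ssd", "join": "ssd", "report": "hdd"}
current = {"raw": "hdd", "staged": "hdd", "ref": "hdd", "joined": "hdd"}
moves = replay(workflow, assignment, current)
print(moves)
```

For this chain, starting `clean` schedules `raw` and `staged` for the SSD tier, and starting `join` schedules only `ref`, because `staged` has already been moved.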
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A computer program product for assigning tasks to storage tiers to store data sets processed by the tasks, wherein the computer program product comprises a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause operations, the operations comprising: determining related data sets comprising data sets that one task or a group of interdependent tasks operate upon; determining whether the related data sets can be assigned to a higher performing storage tier, wherein the higher performing storage tier includes faster access storage devices than a relatively lower performing storage tier; assigning the one task or group of interdependent tasks to the higher performing storage tier to which the related data sets can be assigned in response to determining that the related data sets can be assigned to the higher performing storage tier, wherein the assignment of a storage tier to a task provides a preferred assignment of a storage tier for a task; and using an assignment of the higher performing storage tier to the task or group of interdependent tasks to determine whether to schedule to move the related data sets operated on by the task or group of interdependent tasks assigned to the higher performing storage tier that is different from the storage tier to which the related data sets are currently assigned.
Computer program product for optimizing data storage tier assignments. The product addresses the problem of efficiently allocating data to storage tiers of different performance levels to improve processing speed. It includes program instructions that cause a processor to perform several operations. First, it identifies the related data sets: the data sets that a single task or a group of interdependent tasks operate on. Next, it determines whether these related data sets can be assigned to a higher performing storage tier, meaning a tier built from faster access storage devices than a lower performing tier. If they can, it assigns the task or group of tasks to that higher tier; this assignment records a preferred storage tier for the task. Finally, it uses that assignment to decide whether to schedule a move of the related data sets to the higher tier when they currently reside on a different tier.
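The four operations of claim 1 can be sketched in Python as follows. This is an illustration only: it assumes that "can be assigned" means the related data sets fit in the higher tier's free capacity, and the `Tier` class, the `assign_and_plan` function, the sizes, and the tier names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    free_gb: float  # remaining capacity; using capacity as the "can be assigned" test is an assumption


def related_data_sets(group, task_data):
    """Union of the data sets that one task or a group of interdependent tasks operate on."""
    related = set()
    for task in group:
        related.update(task_data[task])
    return related


def assign_and_plan(group, task_data, sizes_gb, high, current_tier):
    """Assign `group` to tier `high` if its related data sets fit there.

    Returns (assignment, moves): assignment maps each task to its preferred tier
    (None if the related data sets cannot be assigned to `high`), and moves lists
    the related data sets whose current tier differs from the assigned one.
    """
    related = related_data_sets(group, task_data)
    to_move = sorted(ds for ds in related if current_tier[ds] != high.name)
    needed = sum(sizes_gb[ds] for ds in to_move)
    if needed > high.free_gb:
        return None, []
    high.free_gb -= needed  # reserve space for the scheduled moves
    assignment = {task: high.name for task in group}
    return assignment, to_move


task_data = {"clean": ["raw", "staged"], "join": ["staged", "ref"]}
sizes_gb = {"raw": 40, "staged": 20, "ref": 5}
current_tier = {"raw": "hdd", "staged": "hdd", "ref": "hdd"}
ssd = Tier("ssd", free_gb=100)

assignment, moves = assign_and_plan(["clean", "join"], task_data, sizes_gb, ssd, current_tier)
print(assignment, moves, ssd.free_gb)
```

Here the three related data sets need 65 GB, which fits, so both tasks are assigned to `ssd` and all three data sets are scheduled to move.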
2. The computer program product of claim 1, wherein the group of interdependent tasks are interrelated when at least one of: the group of interdependent tasks concurrently operate on the related data sets; the group of interdependent tasks operate on the related data sets to provide input to a dependent task that must receive the input before the dependent task can execute; and the group of interdependent tasks comprise sequential tasks in one or more jobs when one task of the group of interdependent tasks cannot begin until a previous task in a sequence completes.
This invention relates to computer program products for assigning storage tiers to tasks and the data sets they process, and specifically to how a group of interdependent tasks is identified. The problem addressed is that tasks which share data or execution dependencies can be slowed if only part of the data they need is placed on faster storage. Under this claim, a group of tasks is interrelated when at least one of the following holds: the tasks concurrently operate on the related data sets; the tasks operate on the related data sets to provide input to a dependent task that cannot execute until it receives that input; or the tasks are sequential tasks in one or more jobs, where one task cannot begin until a previous task in the sequence completes. Grouping tasks this way lets the storage tier assignment of claim 1 cover all of the related data sets the group operates on. This is particularly useful where multiple tasks share data or follow strict execution sequences, such as in workflow automation, distributed computing, or data processing pipelines.
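One way to form such groups is to treat each relationship in claim 2 as an edge between two tasks and take connected components with a union-find structure. Treating the relationships as transitive, and linking a producer directly to the dependent task it feeds, are assumptions of this Python sketch; `interdependent_groups` and the task names are hypothetical.

```python
def interdependent_groups(tasks, concurrent, feeds, precedes):
    """Group tasks linked by any of the three relationships in the claim.

    concurrent: pairs of tasks that concurrently operate on related data sets
    feeds: (producer, dependent) pairs where the dependent needs the producer's input
    precedes: (earlier, later) pairs of sequential tasks in one or more jobs
    """
    parent = {t: t for t in tasks}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps the trees shallow
            t = parent[t]
        return t

    # Every relationship, whatever its kind, merges the two tasks' groups.
    for a, b in [*concurrent, *feeds, *precedes]:
        parent[find(a)] = find(b)

    groups = {}
    for t in tasks:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=sorted)


groups = interdependent_groups(
    tasks=["a", "b", "c", "d", "e", "f"],
    concurrent=[("a", "b")],  # a and b operate on the same data sets at once
    feeds=[("b", "c")],       # b produces input that c needs before it can run
    precedes=[("d", "e")],    # e cannot begin until d completes
)
print(groups)
```

Task `f` has no relationship and ends up as a group of one, which corresponds to the "one task" case in claim 1.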
3. The computer program product of claim 1, wherein only a task of the one task or the group of interdependent tasks that are operating on the related data sets that can be assigned to the higher performing storage tier are assigned to the higher performing storage tier.
This invention relates to optimizing data storage and processing in a multi-tiered storage system. The problem addressed is inefficient resource utilization in systems where tasks operate on related data sets stored across storage tiers with different performance characteristics. Under this claim, among the one task or the group of interdependent tasks operating on the related data sets, only those tasks whose related data sets can be assigned to the higher performing storage tier are assigned to that tier. This prevents the higher performing tier from being overcommitted while still giving its lower latency to the tasks whose data it can hold. The approach is particularly useful where data sets are interdependent and require coordinated processing across multiple storage tiers.
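A greedy Python sketch of this restriction follows. It assumes a single capacity budget for the higher tier, groups considered in a given priority order, and groups that do not share data sets (a shared data set would be counted twice here). The function `assign_fitting_groups` and all data are hypothetical.

```python
def assign_fitting_groups(groups, task_data, sizes_gb, high_free_gb):
    """Assign to the high tier only the groups whose related data sets fit.

    Groups that do not fit in the remaining capacity get no high-tier assignment.
    """
    assigned = {}
    remaining = high_free_gb
    for group in groups:
        related = set().union(*(task_data[t] for t in group))
        need = sum(sizes_gb[ds] for ds in related)
        if need <= remaining:
            remaining -= need
            assigned.update({t: "high" for t in group})
    return assigned, remaining


task_data = {"a": ["x"], "b": ["x", "y"], "c": ["z"], "d": ["w"]}
sizes_gb = {"x": 30, "y": 30, "z": 50, "w": 10}
assigned, left = assign_fitting_groups([["a", "b"], ["c"], ["d"]], task_data, sizes_gb, 75)
print(assigned, left)
```

Group `["c"]` needs 50 GB after 60 GB has gone to `a` and `b`, so it is skipped, while the smaller group `["d"]` still fits.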
4. The computer program product of claim 1, where the operations further comprise: assigning data sets not included in the related data sets to a lower performing storage tier than the higher performing storage tier.
This invention relates to data storage management in computer systems, specifically allocating storage tiers for performance and cost efficiency. The problem addressed is inefficient data placement across storage tiers, which leads to suboptimal performance and higher operational costs. The related data sets, that is, the data sets operated on by one task or a group of interdependent tasks, are assigned to a higher performing storage tier, while all data sets not included in the related data sets are assigned to a lower performing, typically lower cost, storage tier. Fast storage is thus reserved for the data that the assigned tasks actually operate on, and other data is kept on cheaper storage, balancing performance and cost. This is particularly useful in large-scale storage environments where efficient resource allocation is critical.
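The rule in claim 4 reduces to a simple mapping. This Python sketch assumes an ordered list of hypothetical tiers (`nvme`, `ssd`, `hdd`, fastest first) and places unrelated data sets on the next slower tier; the claim itself only requires some tier that is lower performing than the higher tier. `tier_map` is a hypothetical name.

```python
TIERS = ["nvme", "ssd", "hdd"]  # hypothetical tiers, fastest first


def tier_map(all_data_sets, related, high="ssd"):
    """Place related data sets on `high` and every other data set on the next slower tier."""
    low = TIERS[TIERS.index(high) + 1]  # raises IndexError if `high` is already the slowest
    return {ds: (high if ds in related else low) for ds in sorted(all_data_sets)}


placement = tier_map({"raw", "staged", "ref", "archive"}, related={"raw", "staged"})
print(placement)
```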
5. The computer program product of claim 1, wherein the one task or the group of interdependent tasks are assigned to the higher performing storage tier while the one task or at least one task in the group of interdependent tasks are operating on the related data sets.
This invention relates to data storage systems that improve performance by assigning tasks to storage tiers based on the data they operate on. The problem addressed is inefficient resource utilization in multi-tiered storage systems, where the data a task needs may sit on a slower tier than the task would benefit from, creating performance bottlenecks. Under this claim, the one task or the group of interdependent tasks is assigned to the higher performing storage tier while that task, or at least one task in the group, is operating on the related data sets. The higher tier assignment therefore applies during the period in which the related data sets are actually being used, so the faster storage serves the tasks while they need it. This reduces latency for active tasks and keeps the group's shared data on the fastest storage available to it during processing.
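Claim 5 ties the assignment to the period in which tasks are operating. A small hypothetical Python tracker can express this; it assumes the assignment is dropped once no task in the group is running, which the claim itself does not state. `GroupTierTracker` and its methods are illustrative names only.

```python
class GroupTierTracker:
    """Hold a group's high-tier assignment while any of its tasks is operating.

    Releasing the assignment when the last task finishes is an assumption;
    the claim only requires the assignment to hold while a task is operating.
    """

    def __init__(self, group):
        self.group = set(group)
        self.running = set()

    def start(self, task):
        if task in self.group:
            self.running.add(task)

    def finish(self, task):
        self.running.discard(task)

    @property
    def assigned_to_high(self):
        # True while at least one task in the group operates on the related data sets.
        return bool(self.running)


tracker = GroupTierTracker(["clean", "join"])
tracker.start("clean")
tracker.start("join")
tracker.finish("clean")
print(tracker.assigned_to_high)  # still True: "join" is operating
```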
6. The computer program product of claim 1, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is not being operated on by one task in the group of interdependent tasks.
This invention relates to data management in computing systems, specifically optimizing storage performance for interdependent tasks operating on related datasets. The problem addressed is inefficient data access when tasks depend on shared datasets, leading to performance bottlenecks due to suboptimal storage tier placement. The system involves a group of interdependent tasks that process related datasets. When a task in this group begins execution, the system checks if the required dataset is currently in a higher-performing storage tier (e.g., faster memory or SSD). If the dataset is not in the optimal tier, the system schedules its movement to the higher-performing tier, but only if no other task in the group is actively using it. This ensures that data movement does not interfere with ongoing operations, reducing latency and improving overall system efficiency. The approach dynamically adjusts storage placement based on task execution patterns, minimizing unnecessary data transfers and maximizing performance for interdependent workloads. This is particularly useful in environments where tasks share datasets and where storage tiers have varying performance characteristics. The solution balances the need for fast access with the overhead of data movement, optimizing resource utilization.
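The start-time check in claim 6 can be sketched in Python as a filter over the group's related data sets. The function name and tier labels are hypothetical, and skipping data sets that are already on the higher tier is an added no-op check that the claim does not spell out.

```python
def moves_on_start_idle(related, in_use_by_group, current_tier, high="ssd"):
    """Data sets to move to `high` when a task in the group starts.

    A related data set qualifies when no task in the group is operating on it.
    """
    return sorted(
        ds for ds in related
        if ds not in in_use_by_group  # claim 6 condition: not in use by the group
        and current_tier[ds] != high  # added no-op check (assumption)
    )


moves = moves_on_start_idle(
    related={"raw", "staged", "ref"},
    in_use_by_group={"raw"},
    current_tier={"raw": "hdd", "staged": "hdd", "ref": "ssd"},
)
print(moves)  # ['staged']
```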
7. The computer program product of claim 1, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is being operated on by one task in the group of interdependent tasks and when the related data set is currently on a storage tier that is lower performing than the higher performing storage tier.
This invention relates to optimizing data storage performance in computing systems that process interdependent tasks operating on related datasets. The problem addressed is inefficient data access when tasks require datasets stored in lower-performing storage tiers, leading to delays and reduced system performance. The system includes a storage hierarchy with multiple tiers of varying performance levels, such as fast solid-state drives (SSDs) and slower hard disk drives (HDDs). When a task in a group of interdependent tasks starts, the system checks the group's related datasets. If a related dataset is already being operated on by a task in the group and currently resides on a tier that is lower performing than the higher-performing tier (e.g., HDD), the system schedules it to be moved to the higher-performing tier (e.g., SSD). This ensures that datasets in active use by the group reside in the fastest available storage, minimizing latency and improving task execution efficiency. The system adjusts storage placement based on task activity without manual intervention.
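Claim 7 covers the complementary case, where a related data set is already in use by a task in the group. This hypothetical Python sketch assumes tiers are ranked numerically, with a higher rank meaning faster devices; `TIER_RANK` and `moves_on_start_in_use` are illustrative names.

```python
TIER_RANK = {"hdd": 0, "ssd": 1}  # hypothetical tiers; higher rank = faster devices


def moves_on_start_in_use(related, in_use_by_group, current_tier, high="ssd"):
    """Data sets to move to `high` when a task in the group starts.

    A related data set qualifies when a task in the group is operating on it
    and it currently sits on a tier slower than `high`.
    """
    return sorted(
        ds for ds in related
        if ds in in_use_by_group
        and TIER_RANK[current_tier[ds]] < TIER_RANK[high]
    )


moves = moves_on_start_in_use(
    related={"raw", "staged", "ref"},
    in_use_by_group={"raw", "ref"},
    current_tier={"raw": "hdd", "staged": "hdd", "ref": "ssd"},
)
print(moves)  # ['raw']
```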
8. A system coupled to a plurality of storage tiers, comprising: a plurality of computational nodes; and a computer readable storage medium having program instructions that when executed by the computational nodes perform operations, the operations comprising: determining related data sets comprising data sets that one task or a group of interdependent tasks operate upon; determining whether the related data sets can be assigned to a higher performing storage tier, wherein the higher performing storage tier includes faster access storage devices than a relatively lower performing storage tier; assigning the one task or group of interdependent tasks to the higher performing storage tier to which the related data sets can be assigned in response to determining that the related data sets can be assigned to the higher performing storage tier, wherein the assignment of a storage tier to a task provides a preferred assignment of a storage tier for a task; and using an assignment of the higher performing storage tier to the task or group of interdependent tasks to determine whether to schedule to move the related data sets operated on by the task or group of interdependent tasks assigned to the higher performing storage tier that is different from the storage tier to which the related data sets are currently assigned.
A system manages data storage across multiple storage tiers with varying performance levels to optimize task execution. The system includes computational nodes and a storage medium with program instructions for analyzing and assigning data sets to storage tiers. The operations involve identifying related data sets that are accessed by a single task or a group of interdependent tasks. The system evaluates whether these related data sets can be assigned to a higher-performing storage tier, which includes faster access storage devices than lower-performing tiers. If they can, the system assigns the task or group of tasks to that tier, establishing a preferred storage tier assignment for the task. The system then uses this assignment to decide whether to schedule moving the related data sets to the higher-performing tier when they currently reside on a different tier, so that the tasks access their data from faster storage.
9. The system of claim 8, wherein the group of interdependent tasks are interrelated when at least one of: the group of interdependent tasks concurrently operate on the related data sets; the group of interdependent tasks operate on the related data sets to provide input to a dependent task that must receive the input before the dependent task can execute; and the group of interdependent tasks comprise sequential tasks in one or more jobs when one task of the group of interdependent tasks cannot begin until a previous task in a sequence completes.
This invention relates to a system for assigning storage tiers to interdependent tasks in a computing environment, where tasks are linked through shared data or execution order. The problem addressed is that tasks which rely on each other, through concurrent operations on shared data, sequential execution requirements, or input dependencies, can be slowed when only part of the data they need is placed on faster storage. The system groups tasks as interdependent when at least one of three conditions holds: the tasks concurrently operate on the related data sets; the tasks provide input to a dependent task that cannot execute until the input is received; or the tasks are part of a sequence in one or more jobs where one task cannot begin until a prior task in the sequence completes. The data sets operated on by such a group are then treated together as related data sets when the system assigns the group to a storage tier. This approach is particularly useful in distributed computing environments where task coordination is critical for performance.
10. The system of claim 8, wherein only a task of the one task or the group of interdependent tasks that are operating on the related data sets that can be assigned to the higher performing storage tier are assigned to the higher performing storage tier.
This invention relates to a data storage system that assigns tasks to storage tiers with different performance characteristics. The system addresses inefficient resource utilization in distributed computing environments where tasks operate on related data sets stored across multiple tiers. Among the one task or group of interdependent tasks operating on the related data sets, only those tasks whose related data sets can be assigned to the higher performing storage tier are assigned to that tier; the remaining tasks receive no higher tier assignment. This selective assignment keeps the faster tier from being overcommitted and reserves it for tasks whose data it can actually hold, reducing latency for those tasks while leaving other work on lower cost storage.
11. The system of claim 8, where the operations further comprise: assigning data sets not included in the related data sets to a lower performing storage tier than the higher performing storage tier.
The invention relates to a data storage system that assigns data sets to storage tiers according to whether they are related data sets, that is, data sets operated on by one task or a group of interdependent tasks. Related data sets are stored on a higher performing storage tier, such as solid-state drives (SSDs), to reduce latency for the tasks that use them. Data sets not included in the related data sets are assigned to a lower performing storage tier, such as hard disk drives (HDDs), to reduce storage costs. This approach balances performance and cost by reserving faster storage for the data that the assigned tasks actually operate on.
12. The system of claim 8, wherein the one task or the group of interdependent tasks are assigned to the higher performing storage tier while the one task or at least one task in the group of interdependent tasks are operating on the related data sets.
This invention relates to a data storage system that improves performance by assigning tasks to storage tiers with different performance characteristics. The system addresses the challenge of managing workloads in multi-tiered storage environments, where the data a task needs may sit on a slower tier. Under this claim, the one task or group of interdependent tasks is assigned to the higher performing storage tier while that task, or at least one task in the group, is operating on the related data sets. Tying the assignment to the period of active processing means the faster storage serves the tasks while they need low-latency access to their shared data, which is particularly valuable for workloads with interdependent tasks.
13. The system of claim 8, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is not being operated on by one task in the group of interdependent tasks.
This invention relates to data storage management in systems handling interdependent tasks that operate on related datasets. The problem addressed is inefficient data access in multi-tiered storage systems, where tasks may experience delays when the datasets they need are stored in lower-performance tiers. The solution moves datasets to a higher-performance storage tier based on task execution. When a task in a group of interdependent tasks starts, the system examines the group's related datasets. If a related dataset is not currently being operated on by any task in the group, the system schedules its movement to the higher-performance tier in anticipation of its use by the group. Because no task in the group is using the dataset at that moment, the move does not interfere with ongoing operations, and the dataset is on faster storage when the group's tasks later access it. The invention improves system performance by reducing data access delays while managing storage resources efficiently.
14. The system of claim 8, wherein when the group of interdependent tasks are operating on the related data sets, the operations further comprise: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is being operated on by one task in the group of interdependent tasks and when the related data set is currently on a storage tier that is lower performing than the higher performing storage tier.
This invention relates to data storage management in systems handling interdependent tasks that operate on related datasets. The problem addressed is slow data access when tasks require datasets stored in lower-performing storage tiers, leading to delays and suboptimal resource utilization. The system manages data placement across storage tiers of varying performance levels (e.g., fast solid-state drives versus slower disk-based storage). When a task in a group of interdependent tasks starts, the system checks each related dataset. If a related dataset is already being operated on by a task in the group and currently resides on a tier that is lower performing than the higher-performing tier, the system schedules its migration to the higher-performing tier. This ensures that datasets in active use by the group move to faster storage, reducing latency and improving throughput. The approach is beneficial in environments with limited high-performance storage capacity and in workloads such as large-scale data processing and analytics.
15. A method for assigning tasks to storage tiers to store data sets processed by the tasks, comprising: determining related data sets comprising data sets that one task or a group of interdependent tasks operate upon; determining whether the related data sets can be assigned to a higher performing storage tier, wherein the higher performing storage tier includes faster access storage devices than a relatively lower performing storage tier; assigning the one task or group of interdependent tasks to the higher performing storage tier to which the related data sets can be assigned in response to determining that the related data sets can be assigned to the higher performing storage tier, wherein the assignment of a storage tier to a task provides a preferred assignment of a storage tier for a task; and using an assignment of the higher performing storage tier to the task or group of interdependent tasks to determine whether to schedule to move the related data sets operated on by the task or group of interdependent tasks assigned to the higher performing storage tier that is different from the storage tier to which the related data sets are currently assigned.
This invention relates to optimizing data storage performance in multi-tiered storage systems by assigning tasks and their associated data sets to appropriate storage tiers. The problem addressed is inefficient data access in systems where tasks operate on related data sets stored across different storage tiers with varying performance characteristics. The method identifies related data sets, which are the data sets that one task or a group of interdependent tasks operate on. It evaluates whether these related data sets can be assigned to a higher performing storage tier, which includes faster access storage devices than lower performing tiers. If so, the method assigns the task or task group to the higher performing tier, establishing a preferred storage tier assignment for the task. This assignment then determines whether to schedule moving the related data sets to the higher performing tier when they currently reside on a different tier. The approach places the tasks' data sets on the most suitable storage tier, reducing latency and improving overall system performance.
16. The method of claim 15, wherein the group of interdependent tasks are interrelated when at least one of: the group of interdependent tasks concurrently operate on the related data sets; the group of interdependent tasks operate on the related data sets to provide input to a dependent task that must receive the input before the dependent task can execute; and the group of interdependent tasks comprise sequential tasks in one or more jobs when one task of the group of interdependent tasks cannot begin until a previous task in a sequence completes.
This invention relates to methods for assigning storage tiers to tasks in computing systems, specifically addressing how to identify interdependent tasks that operate on related data sets. A group of tasks is treated as interdependent when the tasks are interrelated in at least one of three ways: they concurrently operate on the related data sets; they operate on the related data sets to provide input to a dependent task that requires this input before it can execute; or they are sequential tasks in one or more jobs, where a task cannot begin until a preceding task in the sequence completes. Identifying these groups lets the method of claim 15 consider all of the data sets a group needs together when assigning them to the higher performing tier, so no task in the group is slowed by data left on slower storage. This is applicable in data processing pipelines, workflow automation, and parallel computing systems, where tasks may be spread across multiple nodes.
17. The method of claim 15, wherein only a task of the one task or the group of interdependent tasks that are operating on the related data sets that can be assigned to the higher performing storage tier are assigned to the higher performing storage tier.
This invention relates to data processing systems that manage task execution across multiple storage tiers with varying performance characteristics. The problem addressed is assigning tasks to storage tiers efficiently. The method considers the one task or group of interdependent tasks operating on related data sets, and assigns to the higher performing storage tier only those tasks whose related data sets can be assigned to that tier. Tasks whose related data sets cannot be placed there are not assigned to it. This avoids overcommitting high performance storage while ensuring the tier is used for tasks whose data it can hold. The approach is particularly useful in distributed computing environments where storage performance varies across tiers and tasks have interdependencies that affect data processing efficiency.
18. The method of claim 15, further comprising: assigning data sets not included in the related data sets to a lower performing storage tier than the higher performing storage tier.
This invention relates to data storage management, specifically how storage tiers are allocated to data sets based on how those data sets are related. The problem addressed is inefficient use of storage resources. Data sets that are used together may not be placed on high-performance tiers, which slows access, while unrelated data sets may occupy those tiers and waste fast capacity. The method identifies related data sets, namely the data sets that one task or a group of interdependent tasks operate upon. When they can be, these related data sets are assigned to a higher performing storage tier, such as solid-state drives (SSDs), to improve access speed and reduce latency. Data sets that are not among the related data sets are assigned to a lower performing storage tier, such as hard disk drives (HDDs), to balance cost and performance. Assignments can be updated as tasks run, so that interdependent work gets fast storage while less critical data stays on cheaper tiers. This aligns storage performance with how the data is actually used.
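A minimal Python sketch of the claim 18 rule follows. Any data set outside the related set goes to the lower tier, even if it would fit on the faster one. The function name `assign_data_sets`, the tier labels, and the capacity test used to decide whether a related data set "can be assigned" to the higher tier are hypothetical.

```python
def assign_data_sets(sizes, related, fast_capacity):
    """Place related data sets on the higher tier while capacity allows; every data set
    outside the related set goes to the lower tier (the claim 18 rule).

    sizes: data set name -> size
    related: names of data sets operated on by a task or group of interdependent tasks
    fast_capacity: assumed free space on the higher tier
    """
    assignment, used = {}, 0
    for name in sorted(sizes):  # sorted only for deterministic output
        if name in related and used + sizes[name] <= fast_capacity:
            assignment[name] = "higher"
            used += sizes[name]
        else:
            assignment[name] = "lower"
    return assignment


print(assign_data_sets({"raw": 40, "clean": 30, "logs": 10, "archive": 200},
                       related={"raw", "clean"}, fast_capacity=100))
# {'archive': 'lower', 'clean': 'higher', 'logs': 'lower', 'raw': 'higher'}
```

Note that "logs" is small enough to fit on the higher tier but is placed on the lower tier because no task in the group operates on it.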
19. The method of claim 15, wherein the one task or the group of interdependent tasks are assigned to the higher performing storage tier while the one task or at least one task in the group of interdependent tasks are operating on the related data sets.
This invention relates to data storage systems and methods that optimize performance by assigning tasks to storage tiers according to the tiers' performance characteristics. The problem addressed is inefficient resource utilization in multi-tiered storage systems. The data sets a task accesses may sit on a tier that does not match the task's needs, which creates performance bottlenecks. The method identifies a task or a group of interdependent tasks that operate on related data sets. It assigns them to a higher performing storage tier while the task, or at least one task in the group, is actively operating on those data sets. Performance-critical work therefore benefits from faster storage access while it is running. The assignment is then used to decide whether to schedule moving the related data sets, not the tasks themselves, to faster storage such as solid-state drives (SSDs). Data used by other tasks can remain on slower storage such as hard disk drives (HDDs). The method may also monitor task activity and storage tier utilization to adjust assignments as needed, balancing performance and cost. The approach is particularly useful where workloads vary, such as in cloud computing or enterprise data centers, because better use of storage tiers reduces latency and improves throughput.
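The timing condition of claim 19 can be pictured with the small tracker below. It is a sketch under stated assumptions. The class name `GroupTierTracker`, the tier labels, and the choice to revert a group to "lower" once none of its tasks is operating are all hypothetical; the claim only says the assignment is made while a task is operating.

```python
class GroupTierTracker:
    """Hypothetical tracker: a task, or every task in an interdependent group, holds the
    higher-tier assignment only while at least one task of that group is operating on
    the related data sets. Reverting to 'lower' afterwards is an assumption."""

    def __init__(self, groups):
        # A lone task is simply a group of one.
        self.group_of = {t: gid for gid, members in enumerate(groups) for t in members}
        self.active = {gid: set() for gid in range(len(groups))}

    def start(self, task):
        self.active[self.group_of[task]].add(task)

    def finish(self, task):
        self.active[self.group_of[task]].discard(task)

    def tier(self, task):
        return "higher" if self.active[self.group_of[task]] else "lower"


tracker = GroupTierTracker([["extract", "clean"], ["audit"]])
tracker.start("extract")
print(tracker.tier("clean"), tracker.tier("audit"))  # higher lower
tracker.finish("extract")
print(tracker.tier("clean"))  # lower
```

"clean" reports the higher tier as soon as its group member "extract" starts. This mirrors the "at least one task in the group" wording of the claim.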
20. The method of claim 15, wherein when the group of interdependent tasks are operating on the related data sets, further comprising: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is not being operated by one task in the group of interdependent tasks.
This invention relates to optimizing data storage performance in systems handling interdependent tasks that operate on related data sets. The problem addressed is inefficient data access when tasks need data held on different storage tiers, which creates performance bottlenecks. The solution manages data placement dynamically, driven by task execution, to minimize latency. A group of interdependent tasks operates on related data sets stored across multiple storage tiers with different performance levels. When a task in the group starts, the system checks whether each related data set is already on the higher performing storage tier. If it is not, and no task in the group is currently operating on it, the system schedules the data set to be moved to the higher performing tier. The data is then already on the fastest available tier when the tasks need it, which reduces wait times and improves overall efficiency. Under this rule, data sets that a task in the group is actively using are not moved, so the approach avoids transfers that could disrupt in-progress operations. This dynamic scheduling helps balance performance and resource usage in systems where tasks depend on shared data sets.
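The start-triggered check described above can be sketched in Python. The `DataSet` record, the function `on_task_start`, and the tuple format of the move queue are hypothetical. The real system would hand queued moves to whatever data-migration mechanism it uses, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    # Hypothetical record for a related data set.
    name: str
    tier: str                                    # "higher" or "lower"
    in_use_by: set = field(default_factory=set)  # tasks currently operating on it


def on_task_start(starting_task, group, related, move_queue):
    """When starting_task (a member of group) starts, queue a move to the higher tier
    for each related data set that no task in the group is operating on and that is
    not already on the higher tier."""
    for ds in related:
        operated_on_by_group = bool(ds.in_use_by & group)
        if not operated_on_by_group and ds.tier != "higher":
            move_queue.append((ds.name, "higher", starting_task))
    return move_queue


related = [
    DataSet("a", "lower"),
    DataSet("b", "lower", {"t2"}),     # busy: t2 is in the group
    DataSet("c", "higher"),            # already on the higher tier
    DataSet("d", "lower", {"other"}),  # busy, but only with a task outside the group
]
print(on_task_start("t1", {"t1", "t2"}, related, []))
# [('a', 'higher', 't1'), ('d', 'higher', 't1')]
```

Data set "d" is queued because the claim condition concerns tasks *in the group*. A task outside the group using it does not block the move in this sketch.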
21. The method of claim 15, wherein when the group of interdependent tasks are operating on the related data sets, further comprising: in response to starting a starting task in the group of interdependent tasks, scheduling to move a related data set of the related data sets to the higher performing storage tier when the related data set is being operated by one task in the group of interdependent tasks and when the related data set is currently on a storage tier that is lower performing than the higher performing storage tier.
This invention relates to optimizing data storage performance in systems handling interdependent tasks that operate on related data sets. The problem addressed is slow data access when tasks use data sets stored on lower performing storage tiers, which delays execution and reduces system performance. The solution moves data sets to a higher performing storage tier while they are actively being used. When a starting task in the group of interdependent tasks begins execution, the system may schedule a related data set to be moved to the higher performing tier. It does so if a task in the group is already operating on that data set and the data set currently resides on a lower performing tier. This proactive migration keeps actively used data sets on faster storage, reducing latency and improving overall throughput. The approach is particularly useful where tasks are interdependent and data sets are shared or processed sequentially, as in data analytics, machine learning workflows, or distributed computing systems. Moves are scheduled only for data sets that are both in use by the group and on a slower tier. As a result, the system avoids unnecessary transfers while still improving performance where it is needed.
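The claim 21 condition needs a notion of one tier being "lower performing" than another. The sketch below models that with a rank table. The tier names in `TIER_RANK`, the function `schedule_busy_moves`, and its dictionary-based inputs are hypothetical illustrations, not the patent's data model.

```python
# Hypothetical tier ranking: a larger rank means a faster (higher performing) tier.
TIER_RANK = {"tape": 0, "hdd": 1, "ssd": 2}


def schedule_busy_moves(starting_task, group, target_tier, current_tier, users):
    """On the start of starting_task, return moves for related data sets that are being
    operated on by a task in the group AND sit on a tier slower than target_tier.

    current_tier: data set name -> tier name
    users: data set name -> set of tasks currently operating on it
    """
    moves = []
    for ds, tier in current_tier.items():
        operated_on_by_group = bool(users.get(ds, set()) & group)
        if operated_on_by_group and TIER_RANK[tier] < TIER_RANK[target_tier]:
            moves.append((ds, target_tier, starting_task))
    return moves


print(schedule_busy_moves(
    "t1", {"t1", "t2"}, "ssd",
    current_tier={"a": "hdd", "b": "tape", "c": "ssd", "d": "hdd"},
    users={"a": {"t2"}, "b": {"t1"}, "c": {"t2"}},
))
# [('a', 'ssd', 't1'), ('b', 'ssd', 't1')]
```

Data set "c" is in use but already on the target tier, so it is skipped. "d" is on a slower tier, but no task in the group is operating on it, so this rule does not move it.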
December 15, 2017
November 26, 2019