Legal claims defining the scope of protection. Each claim is shown in its original legal language; selected claims are followed by a plain-English translation.
1. A computer-implemented method for cross-tier data migration in a storage system, the method comprising: receiving, at a seeding module executing in the storage system, a request for migrating a plurality of files from a source tier to a target tier in the storage system; identifying a plurality of containers, each container containing one or more of the plurality of files, and wherein each container is associated with a container identifier; in response to the request, migrating, by the seeding module, data segments, in the plurality of containers, by performing a plurality of sequential operations to copy the data segments in the plurality of containers from the source tier to the target tier, wherein the seeding module includes a resumption context for maintaining a state of the plurality of sequential operations, the state indicating a last successful operation of the plurality of sequential operations, and wherein the resumption context comprises the container identifiers for the plurality of containers, an identifier of a last copied container, and a hash vector; updating, for each data segment copied from the source tier to the target tier, a bit of the hash vector corresponding to a fingerprint for the data segment copied by the last successful operation of the plurality of sequential operations; detecting, by the seeding module, that the data migration is suspended; checking the identifier of the last copied container and the hash vector of the resumption context to determine the last successfully copied container and the data segment copied by the last successful operation of the plurality of sequential operations; resuming the migration of the containers by copying a container following the last successfully copied container.
Storage systems and data migration. This invention addresses the challenge of migrating large amounts of data between storage tiers within a single storage system when the migration may be interrupted. A seeding module in the storage system receives a request to move multiple files from a source tier to a target tier. The files are stored in containers, each associated with a container identifier, and the seeding module copies the containers' data segments from the source tier to the target tier through a series of sequential operations. To survive interruptions, the seeding module maintains a resumption context that records the state of those operations: the identifiers of all containers involved, the identifier of the last container successfully copied, and a hash vector. Each time a data segment is copied, the bit of the hash vector corresponding to that segment's fingerprint is set. If the migration is suspended, the module reads the last-copied-container identifier and the hash vector to determine which container was last completed and which segment was last copied. It then resumes by copying the container that follows the last successfully copied one, so containers that were already copied are not copied again.
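The resume logic of claim 1 can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the patented implementation: containers are Python lists of segment bytes, the target tier is a dict keyed by fingerprint, SHA-1 is an assumed fingerprint function, and a dict that assigns each known fingerprint its own bit stands in for the hash function behind the hash vector (claim 6 uses perfect hash functions for this). All names, including the `stop_after` test hook, are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def fingerprint(segment: bytes) -> bytes:
    # Assumption: SHA-1 content fingerprints; the claims do not name a hash function.
    return hashlib.sha1(segment).digest()


def build_bit_index(source: dict[int, list[bytes]]) -> dict[bytes, int]:
    # Hypothetical stand-in for a perfect hash: each distinct fingerprint gets its own bit.
    index: dict[bytes, int] = {}
    for cid in sorted(source):
        for seg in source[cid]:
            index.setdefault(fingerprint(seg), len(index))
    return index


class Suspended(Exception):
    """Simulates a suspension (GC run, network drop, crash, or user stop)."""


@dataclass
class ResumptionContext:
    container_ids: list[int]             # all containers in the migration, in copy order
    bit_index: dict[bytes, int]          # fingerprint -> bit position in the hash vector
    last_copied_container: int | None = None
    hash_vector: bytearray = field(init=False)

    def __post_init__(self) -> None:
        self.hash_vector = bytearray((len(self.bit_index) + 7) // 8)

    def mark(self, fp: bytes) -> None:
        i = self.bit_index[fp]
        self.hash_vector[i // 8] |= 1 << (i % 8)

    def is_marked(self, fp: bytes) -> bool:
        i = self.bit_index[fp]
        return bool(self.hash_vector[i // 8] & (1 << (i % 8)))


def migrate(source: dict[int, list[bytes]], target: dict[bytes, bytes],
            ctx: ResumptionContext, stop_after: int | None = None) -> int:
    """Copy segments container by container and return how many were copied.

    `stop_after` is a hypothetical test hook: it raises Suspended once that
    many segments have been copied in this call.
    """
    start = 0
    if ctx.last_copied_container is not None:
        # Resume with the container that follows the last fully copied one.
        start = ctx.container_ids.index(ctx.last_copied_container) + 1
    copied = 0
    for cid in ctx.container_ids[start:]:
        for seg in source[cid]:
            fp = fingerprint(seg)
            if ctx.is_marked(fp):
                continue                 # bit already set: segment is on the target tier
            if stop_after is not None and copied == stop_after:
                raise Suspended(f"suspended inside container {cid}")
            target[fp] = seg             # the copy operation
            ctx.mark(fp)                 # record it in the hash vector
            copied += 1
        ctx.last_copied_container = cid  # whole container is now on the target tier
    return copied


source = {1: [b"a", b"b"], 2: [b"c", b"d"], 3: [b"e"]}
ctx = ResumptionContext(container_ids=[1, 2, 3], bit_index=build_bit_index(source))
target: dict[bytes, bytes] = {}
try:
    migrate(source, target, ctx, stop_after=3)  # suspends partway through container 2
except Suspended:
    pass
print(ctx.last_copied_container)     # 1
print(migrate(source, target, ctx))  # 2 (only b"d" and b"e" still needed copying)
```

Because a segment's bit is set only after it is copied, a suspension inside container 2 leaves `last_copied_container` at 1. On resume the loop restarts at container 2, and the hash vector lets it skip `b"c"`, which had already reached the target before the interruption.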
2. The method of claim 1, wherein the plurality of sequential operations includes a copying operation, which copies a subset of the plurality of containers from the source tier to the target tier.
3. The method of claim 2, wherein the seeding module stores a container identifier of a last copied container of the subset of the plurality of containers in the resumption context, such that in the event of a failure of the copying operation, the seeding module is to resume copying from the last copied container.
4. The method of claim 1, wherein the suspending of the data migration is caused by one or more of an execution of a garbage collector, a network disconnection, a system crash, or a user intervention.
5. The method of claim 1, wherein the source tier is an active tier, and the target tier is a cloud tier.
6. The method of claim 1, wherein the resumption context further includes perfect hash functions (PHFs), perfect hash vectors (PHVs), wherein the PHFs and PHVs create a collision-free mapping between the plurality of files and the plurality of containers.
Collision-free mapping in the resumption context. This claim adds perfect hash functions (PHFs) and perfect hash vectors (PHVs) to the resumption context of claim 1. A perfect hash function is built for a fixed, known set of keys and maps each key in that set to a distinct slot, so no two keys collide; a perfect hash vector is a bit vector indexed by such a function, with one bit per slot. Together they give a collision-free mapping between the files being migrated and the containers that hold them. Because every key has its own bit, the state recorded in the resumption context can be checked without the false matches that an ordinary hashed bit vector or Bloom filter can produce, and the vector can be as small as one bit per key.
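The claim does not say how the PHFs are constructed. The sketch below uses hash-and-displace, one standard way to build a minimal perfect hash over a fixed key set, and pairs it with a bit vector to form a PHV. It is an illustration under assumptions, not the patent's construction: the class and function names are hypothetical, BLAKE2b is an arbitrary choice of seeded hash, and the keys could be segment fingerprints or file identifiers. A PHF is collision-free only for the set it was built from, so the key set must be known before the migration starts.

```python
import hashlib


def _h(seed: int, key: bytes) -> int:
    # Seeded 64-bit hash derived from BLAKE2b (seed is prefixed to the key).
    data = seed.to_bytes(8, "little") + key
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


class PerfectHash:
    """Minimal perfect hash over a fixed key set, built by hash-and-displace.

    Every key in the build set maps to a distinct slot in range(n). Keys outside
    the set still map to some slot, so this only answers "which slot", not
    "is this key in the set".
    """

    def __init__(self, keys: list[bytes]) -> None:
        keys = list(dict.fromkeys(keys))          # drop duplicate keys
        self.n = len(keys)
        self.n_buckets = max(1, (self.n + 1) // 2)
        buckets: list[list[bytes]] = [[] for _ in range(self.n_buckets)]
        for k in keys:
            buckets[_h(0, k) % self.n_buckets].append(k)
        self.seeds = [0] * self.n_buckets
        taken = [False] * self.n
        # Place the biggest buckets first, while most slots are still free.
        for b in sorted(range(self.n_buckets), key=lambda i: len(buckets[i]), reverse=True):
            if not buckets[b]:
                continue
            seed = 1
            while True:
                slots = [_h(seed, k) % self.n for k in buckets[b]]
                if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                    break
                seed += 1                         # try the next displacement seed
            for s in slots:
                taken[s] = True
            self.seeds[b] = seed

    def __call__(self, key: bytes) -> int:
        if self.n == 0:
            raise ValueError("empty key set")
        b = _h(0, key) % self.n_buckets
        return _h(self.seeds[b], key) % self.n


class PerfectHashVector:
    """Bit vector indexed by a PerfectHash: one bit per build key, no sharing."""

    def __init__(self, phf: PerfectHash) -> None:
        self.phf = phf
        self.bits = bytearray((phf.n + 7) // 8)

    def set(self, key: bytes) -> None:
        i = self.phf(key)
        self.bits[i // 8] |= 1 << (i % 8)

    def test(self, key: bytes) -> bool:
        i = self.phf(key)
        return bool(self.bits[i // 8] & (1 << (i % 8)))


fps = [hashlib.sha1(f"segment-{i}".encode()).digest() for i in range(200)]
phf = PerfectHash(fps)
print(sorted(phf(fp) for fp in fps) == list(range(200)))  # True: collision-free
phv = PerfectHashVector(phf)
phv.set(fps[7])
print(phv.test(fps[7]), phv.test(fps[8]))  # True False
```

Only the small per-bucket seed table is stored, not the keys themselves; that is what keeps the PHV compact compared with a set of fingerprints. The trade-off is that the structure must be rebuilt if the key set changes.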
7. The method of claim 1, wherein the plurality of operations further includes an install operation, which is to update locations of the files to point to the target tier.
Install operation. This claim adds an install operation to the sequential operations of claim 1. Once the data has been copied, the install operation updates the location recorded for each migrated file so that it points to the target tier. Subsequent accesses to those files are then directed to the target tier, without manual changes to file references.
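A minimal sketch of an install step follows. The claim does not specify ordering or granularity; this sketch assumes install runs after copying and repoints only files whose containers have all been copied, so that a suspended migration never leaves a file pointing at incomplete data. `Tier`, `FileMeta`, and `install` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ACTIVE = "active"
    CLOUD = "cloud"


@dataclass
class FileMeta:
    name: str
    containers: list[int]        # containers holding this file's segments
    tier: Tier = Tier.ACTIVE     # where reads for this file are directed


def install(files: list[FileMeta], copied: set[int], target: Tier) -> list[str]:
    """Repoint files whose containers have all been copied; return their names.

    Files with any container still uncopied keep pointing at the source tier,
    so an interrupted migration never exposes partially copied data.
    """
    installed = []
    for f in files:
        if f.tier is not target and set(f.containers) <= copied:
            f.tier = target
            installed.append(f.name)
    return installed


files = [FileMeta("a.log", [1, 2]), FileMeta("b.log", [3])]
print(install(files, copied={1, 2}, target=Tier.CLOUD))  # ['a.log']
print([f.tier.value for f in files])                      # ['cloud', 'active']
```

Running install again after more containers are copied is safe: files already on the target tier are skipped, so the step is idempotent.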
8. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations comprising: receiving, at a seeding module executing in a storage system, a request for migrating a plurality of files from a source tier to a target tier in the storage system; identifying a plurality of containers, each container containing one or more of the plurality of files, and wherein each container is associated with a container identifier; in response to the request, migrating, by the seeding module, data segments, in the plurality of containers, by performing a plurality of sequential operations to copy the data segments in the plurality of containers from the source tier to the target tier, wherein the seeding module includes a resumption context for maintaining a state of the plurality of sequential operations, the state indicating a last successful operation of the plurality of sequential operations, and wherein the resumption context comprises the container identifiers for the plurality of containers, an identifier of a last copied container, and a hash vector; updating, for each data segment copied from the source tier to the target tier, a bit of the hash vector corresponding to a fingerprint for the data segment copied by the last successful operation of the plurality of sequential operations; detecting, by the seeding module, that the data migration is suspended; checking the identifier of the last copied container and hash vector of the resumption context to determine the last successfully copied container and the data segment copied by the last successful operation of the plurality of sequential operations; resuming the migration of the containers by copying a container following the last successfully copied container.
9. The non-transitory machine-readable medium of claim 8, wherein the plurality of sequential operations includes a copying operation, which copies a subset of the plurality of containers from the source tier to the target tier.
10. The non-transitory machine-readable medium of claim 9, wherein the seeding module stores a container identifier of a last copied container of the subset of the plurality of containers in the resumption context, such that in the event of a failure of the copying operation, the seeding module is to resume copying from the last copied container.
11. The non-transitory machine-readable medium of claim 8, wherein the suspending of the data migration is caused by one or more of an execution of a garbage collector, a network disconnection, a system crash, or a user intervention.
12. The non-transitory machine-readable medium of claim 8, wherein the source tier is an active tier, and the target tier is a cloud tier.
13. The non-transitory machine-readable medium of claim 8, wherein the resumption context further includes perfect hash functions (PHFs), perfect hash vectors (PHVs), wherein the PHFs and PHVs create a collision-free mapping between the plurality of files and the plurality of containers.
14. The non-transitory machine-readable medium of claim 8, wherein the plurality of operations further includes an install operation, which is to update locations of the files to point to the target tier.
15. A data processing system, comprising: a processor; a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations, the operations comprising: receiving, at a seeding module executing in a storage system, a request for migrating a plurality of files from a source tier to a target tier in the storage system; identifying a plurality of containers, each container containing one or more of the plurality of files, and wherein each container is associated with a container identifier; in response to the request, migrating, by the seeding module, data segments, in the plurality of containers, by performing a plurality of sequential operations to copy the data segments in the plurality of containers from the source tier to the target tier, wherein the seeding module includes a resumption context for maintaining a state of the plurality of sequential operations, the state indicating a last successful operation of the plurality of sequential operations, and wherein the resumption context comprises the container identifiers for the plurality of containers, an identifier of a last copied container, and a hash vector; updating, for each data segment copied from the source tier to the target tier, a bit of the hash vector corresponding to a fingerprint for the data segment copied by the last successful operation of the plurality of sequential operations; detecting, by the seeding module, that the data migration is suspended; checking the identifier of the last copied container and the hash vector of the resumption context to determine the last successfully copied container and the data segment copied by the last successful operation of the plurality of sequential operations; resuming the migration of the containers by copying a container following the last successfully copied container.
16. The system of claim 15, wherein the plurality of sequential operations includes a copying operation, which copies a subset of the plurality of containers from the source tier to the target tier.
17. The system of claim 16, wherein the seeding module stores a container identifier of a last copied container of the subset of the plurality of containers in the resumption context, such that in the event of a failure of the copying operation, the seeding module is to resume copying from the last copied container.
18. The system of claim 15, wherein the suspending of the data migration is caused by one or more of an execution of a garbage collector, a network disconnection, a system crash, or a user intervention.
19. The system of claim 15, wherein the source tier is an active tier, and the target tier is a cloud tier.
Active and cloud tiers. This claim specifies that, in the system of claim 15, the source tier is an active tier and the target tier is a cloud tier. The active tier holds data in current use and is typically local, faster storage, while the cloud tier is typically lower-cost remote storage suited to data that is accessed less often. Migrating files from the active tier to the cloud tier lets the system keep less-used data in cheaper storage, and the resumption context of claim 15 allows such a migration to continue after an interruption rather than restart from the beginning.
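When the target is a remote cloud tier, claims 18 and 19 together cover suspensions caused by a network disconnection or a system crash. A resumption context held only in memory would not survive a crash, and the claims do not say where the context is kept. A minimal sketch, assuming the context is checkpointed to a local JSON file after each container: the file layout and the names `save_context` and `load_context` are hypothetical. The write goes to a temporary file that is fsynced and then renamed over the old checkpoint, so a crash mid-write leaves the previous checkpoint intact.

```python
from __future__ import annotations

import contextlib
import json
import os
import tempfile


def save_context(path: str, container_ids: list[int], last_copied: int | None,
                 hash_vector: bytes) -> None:
    """Durably write the resumption context (hypothetical JSON layout)."""
    state = {
        "container_ids": container_ids,
        "last_copied_container": last_copied,
        "hash_vector": bytes(hash_vector).hex(),
    }
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())     # make sure the bytes reach the disk
        os.replace(tmp, path)         # atomic on POSIX: readers see old or new, never half
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


def load_context(path: str) -> dict | None:
    """Return the saved context, or None when no migration was in progress."""
    try:
        with open(path) as fh:
            state = json.load(fh)
    except FileNotFoundError:
        return None
    state["hash_vector"] = bytearray.fromhex(state["hash_vector"])
    return state


with tempfile.TemporaryDirectory() as d:
    ctx_path = os.path.join(d, "seeding.ctx")
    print(load_context(ctx_path))                 # None: nothing to resume
    save_context(ctx_path, [1, 2, 3], 1, bytearray(b"\x07"))
    state = load_context(ctx_path)
    print(state["last_copied_container"], state["hash_vector"])  # 1 bytearray(b'\x07')
```

Checkpointing once per container rather than once per segment keeps the write rate low; segments copied after the last checkpoint are simply copied again on resume, which is harmless because copying a segment is idempotent.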
20. The system of claim 15, wherein the resumption context further includes perfect hash functions (PHFs), perfect hash vectors (PHVs), wherein the PHFs and PHVs create a collision-free mapping between the plurality of files and the plurality of containers.
February 23, 2021