Deduplicated data is packed in a self-contained deduplicated repository having unique data blocks with each being referenced by a globally unique identifier (GUID). The self-contained deduplicated repository has information regarding both deduplicated data files and the unique data blocks of each of the deduplicated data files and a master GUID list containing a location of each of the unique data blocks.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A system for storing deduplicated data in a computing environment, the system comprising: a processor device operable in the computing storage environment, wherein the processor device: packs deduplicated data in a self-contained deduplicated repository having a plurality of unique data blocks with each being referenced by a globally unique identifier (GUID), and the self-contained deduplicated repository also having information regarding both deduplicated data files and the plurality of unique data blocks of each of the deduplicated data files and a master GUID list containing a location of each of the plurality of unique data blocks; loads the self-contained deduplicated repository into a target repository, the target repository being stored in a virtual environment; creates the self-contained deduplicated repository from a list of deduplicated data files, wherein the self-contained deduplicated repository is isolated such that the self-contained deduplicated repository is stored at a remote location without requiring power and contains a table of contents (TOC) of the deduplicated data files, metadata of the deduplicated data files, and one unique copy of each of the plurality of unique data blocks; and merges all deduplicated data files in the self-contained deduplicated repository into a target repository by reading the TOC and deduplication metadata, wherein the TOC and deduplication metadata are stored, and any of the GUID needed by the TOC that do not exist in the target repository are read from the self-contained deduplicated repository and stored in the target repository.
A system stores deduplicated data by packing it into a self-contained repository. The repository holds unique data blocks, each referenced by a globally unique identifier (GUID), together with information about the deduplicated files and their blocks and a master GUID list that records the location of each unique block. The repository is created from a list of deduplicated files and is isolated: it can be stored at a remote location without requiring power. It contains a table of contents (TOC) and metadata for each file, along with one unique copy of each data block. The repository can be loaded into a target repository stored in a virtual environment. During a merge, the system reads and stores the TOC and deduplication metadata of every file; any GUID that a TOC needs but the target repository lacks is read from the self-contained repository and stored in the target repository.
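The merge step described above can be sketched in a few lines. This is only an illustrative in-memory model, not the patented implementation: the `Repository` class, its field names, and the `merge` function are hypothetical, and real repositories would live on storage media rather than in dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for a deduplicated repository."""
    tocs: dict[str, list[str]] = field(default_factory=dict)   # file name -> ordered GUIDs
    metadata: dict[str, dict] = field(default_factory=dict)    # file name -> file metadata
    blocks: dict[str, bytes] = field(default_factory=dict)     # GUID -> one copy of the block


def merge(packed: Repository, target: Repository) -> int:
    """Merge every file of `packed` into `target`; return how many blocks were copied."""
    copied = 0
    for name, toc in packed.tocs.items():
        # The TOC and metadata of each file are always stored in the target.
        target.tocs[name] = list(toc)
        target.metadata[name] = dict(packed.metadata.get(name, {}))
        # Only blocks whose GUID the target does not already hold are read and stored.
        for guid in toc:
            if guid not in target.blocks:
                target.blocks[guid] = packed.blocks[guid]
                copied += 1
    return copied
```

In this sketch, merging the same packed repository a second time copies no blocks, because every GUID is already present in the target.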
2. The system of claim 1 , wherein the processor device examines the TOC of each of the deduplicated data files in the list of deduplicated data files one after another, wherein the master GUID list is maintained, and the TOC is an ordered list of the GUID that make up a file.
The system of claim 1 examines the table of contents (TOC) of each deduplicated data file in the list one after another while maintaining the master GUID list. The TOC is the ordered list of GUIDs that make up a file, so walking every TOC identifies all unique data blocks the files require.
3. The system of claim 1 , wherein the processor device adds to the master GUID list each one of the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.
The system of claim 1 adds to the master GUID list every unique data block required by any of the deduplicated data files in the list, so that every block needed for later retrieval is tracked.
4. The system of claim 1 , wherein the processor device commits to the self-contained deduplicated repository the TOC and metadata of each of the deduplicated data files in the list of deduplicated data files, the master GUID list, and the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.
The system of claim 1 commits to the self-contained deduplicated repository the TOC and metadata of each deduplicated data file in the list, the master GUID list, and every unique data block required by any of those files. Everything needed to restore the files is therefore stored inside the repository itself.
5. The system of claim 1 , wherein the processor device performs one of: ordering the plurality of unique data blocks in the self-contained deduplicated repository by the GUID, and mapping a position of the plurality of unique data blocks in the self-contained deduplicated repository.
The system of claim 1 does one of two things: it orders the unique data blocks within the self-contained deduplicated repository by GUID, or it maps the position of each unique data block within the repository. Either approach lets a block be located from its GUID.
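The two alternatives can be contrasted with a small sketch. It assumes fixed-size blocks for the sorted layout (the claim does not say so), and all function names here are hypothetical.

```python
import bisect

BLOCK_SIZE = 4  # assumption: every block has the same size in the sorted layout


def layout_sorted(blocks: dict[str, bytes]) -> tuple[list[str], bytes]:
    """Option 1: store blocks ordered by GUID; a block's rank gives its offset."""
    guids = sorted(blocks)
    return guids, b"".join(blocks[g] for g in guids)


def read_sorted(guids: list[str], data: bytes, guid: str) -> bytes:
    """Binary-search the sorted GUID list, then slice the block out by rank."""
    i = bisect.bisect_left(guids, guid)
    if i == len(guids) or guids[i] != guid:
        raise KeyError(guid)
    return data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]


def layout_mapped(blocks: dict[str, bytes]) -> tuple[dict[str, tuple[int, int]], bytes]:
    """Option 2: store blocks in any order and map each GUID to (offset, length)."""
    positions: dict[str, tuple[int, int]] = {}
    data = bytearray()
    for guid, block in blocks.items():
        positions[guid] = (len(data), len(block))
        data += block
    return positions, bytes(data)


def read_mapped(positions: dict[str, tuple[int, int]], data: bytes, guid: str) -> bytes:
    """Look the GUID up in the position map and slice the block out."""
    offset, length = positions[guid]
    return data[offset:offset + length]
```

The sorted layout needs no per-block position record but constrains write order; the mapped layout allows blocks of any size in any order at the cost of storing the map.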
6. The system of claim 1 , wherein the processor device performs each of: inputting into the self-contained deduplicated repository the list of deduplicated data files, determining if each one of the deduplicated data files in the list of deduplicated data files are processed, appending metadata and the TOC of the one of the deduplicated data files to the self-contained deduplicated repository if one of the deduplicated data files is not processed, determining if each of the GUID in the one of the deduplicated data files is processed, determining if the GUID is located in the master GUID list, adding the GUID to the master GUID list, writing the master GUID list to the self-contained deduplicated repository, and writing the data of each of the GUID to the self-contained deduplicated repository.
The system of claim 1 takes the list of deduplicated data files as input to the self-contained deduplicated repository. For each file not yet processed, it appends the file's metadata and TOC to the repository. It then goes through each GUID in the file and checks whether the GUID is already in the master GUID list; if not, the GUID is added. Finally, the system writes the master GUID list and the data of each GUID to the repository.
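The packing loop above can be sketched as follows. The `pack` function, the tuple shape of its input, and the dictionary it returns are assumptions made for illustration; the claim does not prescribe any particular encoding.

```python
def pack(files: list[tuple[str, dict, list[str]]], block_store: dict[str, bytes]) -> dict:
    """Build a self-contained repository from (name, metadata, toc) entries.

    `block_store` stands in for the source deduplication system (GUID -> data).
    """
    file_section = []             # metadata and TOC of every file, in input order
    master_guids: list[str] = []  # master GUID list, each GUID exactly once
    seen: set[str] = set()

    for name, metadata, toc in files:
        # Append this file's metadata and TOC to the repository.
        file_section.append({"name": name, "metadata": metadata, "toc": list(toc)})
        # Add each GUID to the master list only if it is not already there.
        for guid in toc:
            if guid not in seen:
                seen.add(guid)
                master_guids.append(guid)

    # Write the master GUID list, then one copy of the data of each GUID.
    return {
        "files": file_section,
        "master_guids": master_guids,
        "blocks": [block_store[g] for g in master_guids],
    }
```

A block shared by several files appears once in `master_guids` and once in `blocks`, however many TOCs reference it.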
7. The system of claim 1 , wherein the processor device retrieves the deduplicated data file from the self-contained deduplicated repository.
The system of claim 1 retrieves a deduplicated data file from the self-contained deduplicated repository.
8. The system of claim 7 , wherein the processor device performs one of: scanning the list of deduplicated data files at the beginning of the self-contained deduplicated repository until locating one of the deduplicated data files to be retrieved, retrieving metadata of the one of the deduplicated data files and the TOC from the self-contained deduplicated repository, skipping to the master GUID list and retrieving the master GUID list from the self-contained deduplicated repository, if each of the GUID in the one of the deduplicated data files is not processed, performing each of: finding a position of a first GUID in the one of the deduplicated data files in the master GUID list, and adding a tuple to the master GUID list that is retrieved, and if each of the GUID in the one of the deduplicated data files is processed, performing each of: sorting the master GUID list that is retrieved by position, determining if all tuples in the master GUID list have been processed, deriving a position of the GUID of a first tuple in the self-contained deduplicated repository from the position in the master GUID list that is retrieved, forwarding to the position of the GUID of the first tuple in the self-contained deduplicated repository and retrieving data of the GUID, and writing the data of the GUID of the first tuple to each position in the one of the deduplicated data according to appearance in consecutive tuples in the master GUID list that is sorted.
The system of claim 7 retrieves a file as follows. It scans the list of deduplicated data files at the beginning of the self-contained repository until it finds the file to retrieve, reads that file's metadata and TOC, then skips ahead to the master GUID list and reads it. While GUIDs in the file remain unprocessed, it finds each GUID's position in the master GUID list and records a tuple for it. Once every GUID in the file has been processed, it sorts the tuples by position. For each tuple in turn, it derives the position of the GUID's data in the repository from the GUID's position in the master GUID list, moves forward to that position, reads the data, and writes it to every place in the file where that GUID appears, as given by the consecutive tuples in the sorted list. Sorting by position means the reads can proceed through the repository in increasing position order.
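A minimal sketch of this retrieval order follows. It assumes fixed-size blocks stored in master-list order, so that a block's offset is its master-list position times `BLOCK_SIZE`; that derivation, the tuple shape, and the `retrieve` function are illustrative assumptions rather than details taken from the claim.

```python
import io
from typing import BinaryIO

BLOCK_SIZE = 4  # assumption: fixed-size blocks stored in master-list order


def retrieve(toc: list[str], master_guids: list[str], block_stream: BinaryIO) -> bytes:
    """Rebuild one file while reading the block area in increasing offset order."""
    master_pos = {guid: i for i, guid in enumerate(master_guids)}

    # One tuple per TOC entry: (position in master list, position in the file).
    tuples = sorted((master_pos[guid], file_pos) for file_pos, guid in enumerate(toc))

    out = [b""] * len(toc)
    last_pos, block = None, b""
    for pos, file_pos in tuples:
        if pos != last_pos:
            # Derive the offset from the master-list position and move forward to it.
            block_stream.seek(pos * BLOCK_SIZE)
            block = block_stream.read(BLOCK_SIZE)
            last_pos = pos
        # Consecutive tuples with the same position reuse the block already read.
        out[file_pos] = block
    return b"".join(out)
```

For example, with `master_guids = ["g1", "g2", "g3"]`, block data `b"AAAABBBBCCCC"` wrapped in `io.BytesIO`, and a TOC of `["g3", "g1", "g3", "g2"]`, each block is read once, in the order g1, g2, g3, and the block for g3 is written to two places in the output.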
9. The system of claim 1 , wherein the processor device restores the deduplicated data file from the self-contained deduplicated repository by reading a table of contents (TOC) of the deduplicated data file and reading each one of the plurality of unique data blocks required by the deduplicated data file using the GUID listed in the TOC.
The system of claim 1 restores a deduplicated data file from the self-contained deduplicated repository by reading the file's TOC and then reading each unique data block the file requires, using the GUIDs listed in the TOC. This reconstructs the original file from its deduplicated blocks.
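In its simplest form, this restore is a walk over the TOC. The sketch below assumes the blocks are available as a GUID-to-bytes mapping; the `restore` function and its error handling are hypothetical.

```python
def restore(toc: list[str], blocks: dict[str, bytes]) -> bytes:
    """Concatenate the block for each GUID in TOC order to rebuild the file."""
    missing = [guid for guid in toc if guid not in blocks]
    if missing:
        # A self-contained repository should hold every block its TOCs reference.
        raise KeyError(f"blocks missing from repository: {missing}")
    return b"".join(blocks[guid] for guid in toc)
```

A GUID that appears several times in the TOC is served from the single stored copy of its block each time.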
10. The system of claim 1 , wherein the processor device allows each deduplication system that draws the GUID for the deduplication system from a similar GUID allocation sphere to share each one of a plurality of self-contained deduplicated repositories, extract files from each one of the plurality of self-contained deduplicated repositories, and append to each one of the plurality of self-contained deduplicated repositories.
The system of claim 1 allows every deduplication system that draws its GUIDs from a similar GUID allocation sphere to share multiple self-contained deduplicated repositories, extract files from them, and append to them. This lets such systems exchange data with one another.
11. A computer program product storing deduplicated data by a processor device, the computer program product comprising a computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that packs deduplicated data in a self-contained deduplicated repository having a plurality of unique data blocks with each being referenced by a globally unique identifier (GUID), and the self-contained deduplicated repository also having information regarding both deduplicated data files and the plurality of unique data blocks of each of the deduplicated data files and a master GUID list containing a location of each of the plurality of unique data blocks; an executable portion that loads the self-contained deduplicated repository into a target repository, the target repository being stored in a virtual environment; an executable portion that creates the self-contained deduplicated repository from a list of deduplicated data files, wherein the self-contained deduplicated repository is isolated such that the self-contained deduplicated repository is stored at a remote location without requiring power and contains a table of contents (TOC) of the deduplicated data files, metadata of the deduplicated data files, and one unique copy of each of the plurality of unique data blocks; and an executable portion that merges all deduplicated data files in the self-contained deduplicated repository into a target repository by reading the TOC and deduplication metadata, wherein the TOC and deduplication metadata are stored, and any of the GUID needed by the TOC that do not exist in the target repository are read from the self-contained deduplicated repository and stored in the target repository.
A computer program product stores deduplicated data by using executable code to pack it into a self-contained repository. The repository holds unique data blocks, each referenced by a globally unique identifier (GUID), together with information about the deduplicated files and their blocks and a master GUID list that records the location of each unique block. The repository is created from a list of deduplicated files and is isolated: it can be stored at a remote location without requiring power. It contains a table of contents (TOC) and metadata for each file, along with one unique copy of each data block. The repository can be loaded into a target repository stored in a virtual environment. During a merge, the code reads and stores the TOC and deduplication metadata of every file; any GUID that a TOC needs but the target repository lacks is read from the self-contained repository and stored in the target repository.
12. The computer program product of claim 11 , further including an executable portion that examines the TOC of each of the deduplicated data files in the list of deduplicated data files one after another, wherein the master GUID list is maintained, and the TOC is an ordered list of the GUID that make up a file.
The computer program product of claim 11 contains executable code that examines the table of contents (TOC) of each deduplicated data file in the list one after another while maintaining the master GUID list. The TOC is the ordered list of GUIDs that make up a file, so walking every TOC identifies all unique data blocks the files require.
13. The computer program product of claim 11 , further including an executable portion that adds to the master GUID list each one of the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.
The computer program product of claim 11 contains executable code that adds to the master GUID list every unique data block required by any of the deduplicated data files in the list, so that every block needed for later retrieval is tracked.
14. The computer program product of claim 11 , further including an executable portion that commits to the self-contained deduplicated repository the TOC and metadata of each of the deduplicated data files in the list of deduplicated data files, the master GUID list, and the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.
The computer program product of claim 11 contains executable code that commits to the self-contained deduplicated repository the TOC and metadata of each deduplicated data file in the list, the master GUID list, and every unique data block required by any of those files. Everything needed to restore the files is therefore stored inside the repository itself.
15. The computer program product of claim 11 , further including an executable portion that performs one of: ordering the plurality of unique data blocks in the self-contained deduplicated repository by the GUID, and mapping a position of the plurality of unique data blocks in the self-contained deduplicated repository.
The computer program product of claim 11 contains executable code that does one of two things: it orders the unique data blocks within the self-contained deduplicated repository by GUID, or it maps the position of each unique data block within the repository. Either approach lets a block be located from its GUID.
16. The computer program product of claim 11 , further including an executable portion that performs each of: inputting into the self-contained deduplicated repository the list of deduplicated data files, determining if each one of the deduplicated data files in the list of deduplicated data files are processed, appending metadata and the TOC of the one of the deduplicated data files to the self-contained deduplicated repository if one of the deduplicated data files is not processed, determining if each of the GUID in the one of the deduplicated data files is processed, determining if the GUID is located in the master GUID list, adding the GUID to the master GUID list, writing the master GUID list to the self-contained deduplicated repository, and writing the data of each of the GUID to the self-contained deduplicated repository.
The computer program product of claim 11 contains executable code that takes the list of deduplicated data files as input to the self-contained deduplicated repository. For each file not yet processed, it appends the file's metadata and TOC to the repository. It then goes through each GUID in the file and checks whether the GUID is already in the master GUID list; if not, the GUID is added. Finally, the code writes the master GUID list and the data of each GUID to the repository.
17. The computer program product of claim 11 , further including an executable portion that retrieves the deduplicated data file from the self-contained deduplicated repository.
The computer program product of claim 11 contains executable code that retrieves a deduplicated data file from the self-contained deduplicated repository.
18. The computer program product of claim 17 , further including an executable portion that retrieves the deduplicated data file from the self-contained deduplicated repository by performing one of: scanning the list of deduplicated data files at the beginning of the self-contained deduplicated repository until locating one of the deduplicated data files to be retrieved, retrieving metadata of the one of the deduplicated data files and the TOC from the self-contained deduplicated repository, skipping to the master GUID list and retrieving the master GUID list from the self-contained deduplicated repository, if each of the GUID in the one of the deduplicated data files is not processed, performing each of: finding a position of a first GUID in the one of the deduplicated data files in the master GUID list, and adding a tuple to the master GUID list that is retrieved, and if each of the GUID in the one of the deduplicated data files is processed, performing each of: sorting the master GUID list that is retrieved by position, determining if all tuples in the master GUID list have been processed, deriving a position of the GUID of a first tuple in the self-contained deduplicated repository from the position in the master GUID list that is retrieved, forwarding to the position of the GUID of the first tuple in the self-contained deduplicated repository and retrieving data of the GUID, and writing the data of the GUID of the first tuple to each position in the one of the deduplicated data according to appearance in consecutive tuples in the master GUID list that is sorted.
To retrieve a file, the computer program product of claim 17 contains executable code that scans the list of deduplicated data files at the beginning of the repository until it finds the desired file. It reads the file's metadata and TOC, then skips ahead to the master GUID list and reads it. While GUIDs in the file remain unprocessed, it finds each GUID's position in the master GUID list and records a tuple for it. Once every GUID in the file has been processed, it sorts the tuples by position and works through them until all have been processed. For each tuple, it derives the position of the GUID's data in the repository from the GUID's position in the master GUID list, moves forward to that position, reads the data, and writes it to every place in the file where that GUID appears, as given by the consecutive tuples in the sorted list.
19. The computer program product of claim 11 , further including an executable portion that restores the deduplicated data file from the self-contained deduplicated repository by reading a table of contents (TOC) of the deduplicated data file and reading each one of the plurality of unique data blocks required by the deduplicated data file using the GUID listed in the TOC.
The computer program product of claim 11 contains executable code that restores a deduplicated data file from the self-contained deduplicated repository by reading the file's TOC and then reading each unique data block the file requires, using the GUIDs listed in the TOC. This reconstructs the original file from its deduplicated blocks.
20. The computer program product of claim 11 , further including an executable portion that allows each deduplication system that draws the GUID for the deduplication system from a similar GUID allocation sphere to share each one of a plurality of self-contained deduplicated repositories, extract files from each one of the plurality of self-contained deduplicated repositories, and append to each one of the plurality of self-contained deduplicated repositories.
The computer program product of claim 11 contains executable code that allows every deduplication system that draws its GUIDs from a similar GUID allocation sphere to share multiple self-contained deduplicated repositories, extract files from them, and append to them. This lets such systems exchange data with one another.
January 10, 2013
June 13, 2017