US-9678971

Packing deduplicated data in a self-contained deduplicated repository

Published: June 13, 2017

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Deduplicated data is packed in a self-contained deduplicated repository having unique data blocks with each being referenced by a globally unique identifier (GUID). The self-contained deduplicated repository has information regarding both deduplicated data files and the unique data blocks of each of the deduplicated data files and a master GUID list containing a location of each of the unique data blocks.
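As a rough illustration of the layout the abstract describes, the sketch below models a repository holding per-file information, a master GUID list, and one copy of each unique block. All names here (`FileEntry`, `SelfContainedRepo`, `add_block`, `read_block`) are hypothetical and chosen for illustration; the patent does not prescribe a concrete format, and the "location" is assumed to be a simple index into an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    metadata: dict
    toc: list  # ordered list of GUIDs that make up the file


@dataclass
class SelfContainedRepo:
    files: list = field(default_factory=list)             # one FileEntry per deduplicated file
    master_guid_list: dict = field(default_factory=dict)  # GUID -> location (index into blocks)
    blocks: list = field(default_factory=list)            # one copy of each unique data block

    def add_block(self, guid: str, data: bytes) -> None:
        # Store a block only once; the master GUID list records where it lives.
        if guid not in self.master_guid_list:
            self.master_guid_list[guid] = len(self.blocks)
            self.blocks.append(data)

    def read_block(self, guid: str) -> bytes:
        # Look up the block's location in the master GUID list, then read it.
        return self.blocks[self.master_guid_list[guid]]


repo = SelfContainedRepo()
repo.add_block("g1", b"hello ")
repo.add_block("g2", b"world")
repo.add_block("g1", b"hello ")  # duplicate GUID, not stored again
repo.files.append(FileEntry("a.txt", {"size": 17}, ["g1", "g2", "g1"]))

# Restoring a file means walking its TOC and reading each referenced block.
restored = b"".join(repo.read_block(g) for g in repo.files[0].toc)
```

Here the file references `g1` twice, yet the repository stores that block once; the file's TOC alone is enough to rebuild the original byte sequence.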

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A system for storing deduplicated data in a computing environment, the system comprising: a processor device operable in the computing storage environment, wherein the processor device: packs deduplicated data in a self-contained deduplicated repository having a plurality of unique data blocks with each being referenced by a globally unique identifier (GUID), and the self-contained deduplicated repository also having information regarding both deduplicated data files and the plurality of unique data blocks of each of the deduplicated data files and a master GUID list containing a location of each of the plurality of unique data blocks; loads the self-contained deduplicated repository into a target repository, the target repository being stored in a virtual environment; creates the self-contained deduplicated repository from a list of deduplicated data files, wherein the self-contained deduplicated repository is isolated such that the self-contained deduplicated repository is stored at a remote location without requiring power and contains a table of contents (TOC) of the deduplicated data files, metadata of the deduplicated data files, and one unique copy of each of the plurality of unique data blocks; and merges all deduplicated data files in the self-contained deduplicated repository into a target repository by reading the TOC and deduplication metadata, wherein the TOC and deduplication metadata are stored, and any of the GUID needed by the TOC that do not exist in the target repository are read from the self-contained deduplicated repository and stored in the target repository.

2. The system of claim 1, wherein the processor device examines the TOC of each of the deduplicated data files in the list of deduplicated data files one after another, wherein the master GUID list is maintained, and the TOC is an ordered list of the GUID that make up a file.

3. The system of claim 1, wherein the processor device adds to the master GUID list each one of the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.

4. The system of claim 1, wherein the processor device commits to the self-contained deduplicated repository the TOC and metadata of each of the deduplicated data files in the list of deduplicated data files, the master GUID list, and the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.

5. The system of claim 1, wherein the processor device performs one of: ordering the plurality of unique data blocks in the self-contained deduplicated repository by the GUID, and mapping a position of the plurality of unique data blocks in the self-contained deduplicated repository.

6. The system of claim 1, wherein the processor device performs each of: inputting into the self-contained deduplicated repository the list of deduplicated data files, determining if each one of the deduplicated data files in the list of deduplicated data files are processed, appending metadata and the TOC of the one of the deduplicated data files to the self-contained deduplicated repository if one of the deduplicated data files is not processed, determining if each of the GUID in the one of the deduplicated data files is processed, determining if the GUID is located in the master GUID list, adding the GUID to the master GUID list, writing the master GUID list to the self-contained deduplicated repository, and writing the data of each of the GUID to the self-contained deduplicated repository.

7. The system of claim 1, wherein the processor device retrieves the deduplicated data file from the self-contained deduplicated repository.

8. The system of claim 7, wherein the processor device performs one of: scanning the list of deduplicated data files at the beginning of the self-contained deduplicated repository until locating one of the deduplicated data files to be retrieved, retrieving metadata of the one of the deduplicated data files and the TOC from the self-contained deduplicated repository, skipping to the master GUID list and retrieving the master GUID list from the self-contained deduplicated repository, if each of the GUID in the one of the deduplicated data files is not processed, performing each of: finding a position of a first GUID in the one of the deduplicated data files in the master GUID list, and adding a tuple to the master GUID list that is retrieved, and if each of the GUID in the one of the deduplicated data files is processed, performing each of: sorting the master GUID list that is retrieved by position, determining if all tuples in the master GUID list have been processed, deriving a position of the GUID of a first tuple in the self-contained deduplicated repository from the position in the master GUID list that is retrieved, forwarding to the position of the GUID of the first tuple in the self-contained deduplicated repository and retrieving data of the GUID, and writing the data of the GUID of the first tuple to each position in the one of the deduplicated data according to appearance in consecutive tuples in the master GUID list that is sorted.

9. The system of claim 1, wherein the processor device restores the deduplicated data file from the self-contained deduplicated repository by reading a table of contents (TOC) of the deduplicated data file and reading each one of the plurality of unique data blocks required by the deduplicated data file using the GUID listed in the TOC.

10. The system of claim 1, wherein the processor device allows each deduplication system that draws the GUID for the deduplication system from a similar GUID allocation sphere to share each one of a plurality of self-contained deduplicated repositories, extract files from each one of the plurality of self-contained deduplicated repositories, and append to each one of the plurality of self-contained deduplicated repositories.

11. A computer program product storing deduplicated data by a processor device, the computer program product comprising a computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that packs deduplicated data in a self-contained deduplicated repository having a plurality of unique data blocks with each being referenced by a globally unique identifier (GUID), and the self-contained deduplicated repository also having information regarding both deduplicated data files and the plurality of unique data blocks of each of the deduplicated data files and a master GUID list containing a location of each of the plurality of unique data blocks; an executable portion that loads the self-contained deduplicated repository into a target repository, the target repository being stored in a virtual environment; an executable portion that creates the self-contained deduplicated repository from a list of deduplicated data files, wherein the self-contained deduplicated repository is isolated such that the self-contained deduplicated repository is stored at a remote location without requiring power and contains a table of contents (TOC) of the deduplicated data files, metadata of the deduplicated data files, and one unique copy of each of the plurality of unique data blocks; and an executable portion that merges all deduplicated data files in the self-contained deduplicated repository into a target repository by reading the TOC and deduplication metadata, wherein the TOC and deduplication metadata are stored, and any of the GUID needed by the TOC that do not exist in the target repository are read from the self-contained deduplicated repository and stored in the target repository.

12. The computer program product of claim 11, further including an executable portion that examines the TOC of each of the deduplicated data files in the list of deduplicated data files one after another, wherein the master GUID list is maintained, and the TOC is an ordered list of the GUID that make up a file.

13. The computer program product of claim 11, further including an executable portion that adds to the master GUID list each one of the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.

14. The computer program product of claim 11, further including an executable portion that commits to the self-contained deduplicated repository the TOC and metadata of each of the deduplicated data files in the list of deduplicated data files, the master GUID list, and the plurality of unique data blocks required by any one of the deduplicated data files in the list of deduplicated data files.

15. The computer program product of claim 11, further including an executable portion that performs one of: ordering the plurality of unique data blocks in the self-contained deduplicated repository by the GUID, and mapping a position of the plurality of unique data blocks in the self-contained deduplicated repository.

16. The computer program product of claim 11, further including an executable portion that performs each of: inputting into the self-contained deduplicated repository the list of deduplicated data files, determining if each one of the deduplicated data files in the list of deduplicated data files are processed, appending metadata and the TOC of the one of the deduplicated data files to the self-contained deduplicated repository if one of the deduplicated data files is not processed, determining if each of the GUID in the one of the deduplicated data files is processed, determining if the GUID is located in the master GUID list, adding the GUID to the master GUID list, writing the master GUID list to the self-contained deduplicated repository, and writing the data of each of the GUID to the self-contained deduplicated repository.

17. The computer program product of claim 11, further including an executable portion that retrieves the deduplicated data file from the self-contained deduplicated repository.

18. The computer program product of claim 17, further including an executable portion that retrieves the deduplicated data file from the self-contained deduplicated repository by performing one of: scanning the list of deduplicated data files at the beginning of the self-contained deduplicated repository until locating one of the deduplicated data files to be retrieved, retrieving metadata of the one of the deduplicated data files and the TOC from the self-contained deduplicated repository, skipping to the master GUID list and retrieving the master GUID list from the self-contained deduplicated repository, if each of the GUID in the one of the deduplicated data files is not processed, performing each of: finding a position of a first GUID in the one of the deduplicated data files in the master GUID list, and adding a tuple to the master GUID list that is retrieved, and if each of the GUID in the one of the deduplicated data files is processed, performing each of: sorting the master GUID list that is retrieved by position, determining if all tuples in the master GUID list have been processed, deriving a position of the GUID of a first tuple in the self-contained deduplicated repository from the position in the master GUID list that is retrieved, forwarding to the position of the GUID of the first tuple in the self-contained deduplicated repository and retrieving data of the GUID, and writing the data of the GUID of the first tuple to each position in the one of the deduplicated data according to appearance in consecutive tuples in the master GUID list that is sorted.

19. The computer program product of claim 11, further including an executable portion that restores the deduplicated data file from the self-contained deduplicated repository by reading a table of contents (TOC) of the deduplicated data file and reading each one of the plurality of unique data blocks required by the deduplicated data file using the GUID listed in the TOC.

20. The computer program product of claim 11, further including an executable portion that allows each deduplication system that draws the GUID for the deduplication system from a similar GUID allocation sphere to share each one of a plurality of self-contained deduplicated repositories, extract files from each one of the plurality of self-contained deduplicated repositories, and append to each one of the plurality of self-contained deduplicated repositories.
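The packing, retrieval, and merge steps recited in the claims above can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: the function names (`pack`, `retrieve`, `merge`), the dictionary layout, and the use of list indices as block positions are all hypothetical. The sketch both orders blocks by GUID and records their positions, although the claims recite performing one of the two.

```python
def pack(file_list, source_blocks):
    """Build a repository from file_list = [(name, metadata, toc)] and
    source_blocks = {guid: bytes} (both shapes are assumptions)."""
    repo = {"files": [], "master": {}, "blocks": []}
    for name, metadata, toc in file_list:
        # Append each file's metadata and TOC to the repository.
        repo["files"].append((name, metadata, list(toc)))
        for guid in toc:
            # Add a GUID to the master list only if it is not already there.
            if guid not in repo["master"]:
                repo["master"][guid] = None  # position assigned below
    # Write one copy of each unique block, ordered by GUID, recording positions.
    for pos, guid in enumerate(sorted(repo["master"])):
        repo["master"][guid] = pos
        repo["blocks"].append(source_blocks[guid])
    return repo


def retrieve(repo, name):
    """Rebuild one file, reading each needed block once, in position order."""
    # Scan the file list until the requested file is found.
    toc = next(t for n, _, t in repo["files"] if n == name)
    # Tuples of (block position in repository, index within the output file).
    tuples = sorted((repo["master"][g], i) for i, g in enumerate(toc))
    out = [None] * len(toc)
    last_pos, data = None, None
    for pos, i in tuples:
        if pos != last_pos:
            # Move forward to the next needed block and read it once.
            data, last_pos = repo["blocks"][pos], pos
        out[i] = data  # write the block to every place it appears in the file
    return b"".join(out)


def merge(repo, target_files, target_blocks):
    """Merge all files into a target; copy only GUIDs the target lacks."""
    copied = []
    for name, metadata, toc in repo["files"]:
        target_files[name] = (metadata, list(toc))
        for guid in toc:
            if guid not in target_blocks:
                target_blocks[guid] = repo["blocks"][repo["master"][guid]]
                copied.append(guid)
    return copied


source = {"b": b"BB", "a": b"AA", "c": b"CC"}
files = [("f1", {"owner": "x"}, ["b", "a", "b"]),
         ("f2", {"owner": "y"}, ["c", "a"])]
repo = pack(files, source)
target_files, target_blocks = {}, {"a": b"AA"}  # target already holds block "a"
copied = merge(repo, target_files, target_blocks)
```

Sorting the tuples by position lets retrieval move through the repository in a single forward pass, which matters if the repository sits on sequential media. In the merge, block `a` already exists in the target, so only `b` and `c` are copied.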

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

January 10, 2013

Publication Date

June 13, 2017
