US-8892526

Deduplication seeding

Published: November 18, 2014

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Apparatus, methods, and other embodiments associated with de-duplication seeding are described. One example method includes re-configuring a data de-duplication repository with a blocklet from a data de-duplication seed corpus. Reconfiguring the repository may include adding a blocklet from the seed corpus to the repository, activating a blocklet identified with the seed corpus in the repository, removing a blocklet from the repository, and de-activating a blocklet in the repository. The example method may also include re-configuring a data de-duplication index associated with the data de-duplication repository with information about the blocklet. Reconfiguring the repository and the index increases the likelihood that a blocklet ingested by a data de-duplication apparatus that relies on the repository and the index will be treated as a duplicate blocklet by the data de-duplication apparatus.
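The effect described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `DedupStore` class, its method names, the use of SHA-256 fingerprints, and the sample blocklets are all assumptions made for illustration. The sketch seeds a toy repository and index before ingest and compares how many blocklets in the same stream are treated as duplicates with and without seeding.

```python
import hashlib


class DedupStore:
    """Toy repository plus index. All names here are hypothetical."""

    def __init__(self):
        self.repository = {}  # fingerprint -> blocklet bytes
        self.index = {}       # fingerprint -> True if the entry is active

    @staticmethod
    def fingerprint(blocklet: bytes) -> str:
        # SHA-256 is an assumption; the source does not name a hash.
        return hashlib.sha256(blocklet).hexdigest()

    def seed(self, seed_corpus):
        """Add (or re-activate) each seed blocklet before any ingest."""
        for blocklet in seed_corpus:
            fp = self.fingerprint(blocklet)
            self.repository.setdefault(fp, blocklet)
            self.index[fp] = True

    def deactivate(self, blocklet: bytes):
        """Mark a blocklet's index entry inactive without deleting it."""
        fp = self.fingerprint(blocklet)
        if fp in self.index:
            self.index[fp] = False

    def ingest(self, blocklet: bytes) -> bool:
        """Return True if the blocklet was treated as a duplicate."""
        fp = self.fingerprint(blocklet)
        if self.index.get(fp):
            return True
        self.repository[fp] = blocklet
        self.index[fp] = True
        return False


def count_duplicates(store, stream):
    """Ingest every blocklet and count how many were duplicates."""
    return sum(store.ingest(b) for b in stream)


# Invented sample data, for illustration only.
stream = [b"header", b"payload-1", b"header", b"footer"]

unseeded = DedupStore()
seeded = DedupStore()
seeded.seed([b"header", b"footer"])

unseeded_hits = count_duplicates(unseeded, stream)
seeded_hits = count_duplicates(seeded, stream)
print(unseeded_hits, seeded_hits)  # prints "1 3"
```

In this toy run the seeded store recognizes the first occurrence of `header` and `footer` as duplicates because the seed corpus placed them in the repository and index ahead of the data stream.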

Patent Claims

6 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer cause the computer to perform a data de-duplication method, the method comprising: re-configuring a data de-duplication repository with a first blocklet taken from a source other than a data stream being ingested by a data de-duplication apparatus, where the source other than the data stream being ingested is a seed corpus, and where re-configuring the repository comprises moving the first blocklet from the seed corpus into the repository; re-configuring a data de-duplication index associated with the data de-duplication repository with index information about the first blocklet, where reconfiguring the data de-duplication repository or the data de-duplication index increases the likelihood that a second blocklet will be treated as a duplicate blocklet when processed by the data de-duplication apparatus using the data de-duplication repository and the data-duplication index to support duplicate blocklet determinations, and generating a new seed corpus, where generating the new seed corpus comprises selecting a seed blocklet from an existing repository based, at least in part, on one or more of, a reference count associated with the seed blocklet, an attribute describing the generalness of the seed blocklet, a trial and error approach, and a random approach.

2. The non-transitory computer-readable medium of claim 1, where re-configuring the repository comprises activating the first blocklet in the repository.

3. The non-transitory computer-readable medium of claim 1, where reconfiguring the index comprises moving information about the first blocklet into the index.

4. The non-transitory computer-readable medium of claim 1, where reconfiguring the index comprises activating information about the first blocklet in the index.

5. The non-transitory computer-readable medium of claim 5, comprising selecting the seed corpus from two or more available seed corpora.

6. The non-transitory computer-readable medium of claim 5, where the seed corpus is selected as a function of one or more of, a relationship between data to be ingested by the data de-duplication apparatus and the seed corpus, a historical performance measurement associated with the seed corpus, an on-the-fly performance measurement associated with the seed corpus, a user action, a calendar date, a day of the week, a time of day, a user identity, and an occurrence of a pre-defined event.
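Two of the selection steps named in the claims can be sketched in Python under stated assumptions. The function names, the fingerprint strings, the corpus names, and every number below are hypothetical illustration values, not data from the patent. The first function picks seed blocklets from an existing repository by reference count (one of the criteria listed in claim 1); the second chooses among available seed corpora by a historical performance measurement (one of the criteria listed in claim 6).

```python
def generate_seed_corpus(reference_counts, max_blocklets):
    """Pick the most-referenced blocklets from an existing repository.

    reference_counts maps a blocklet fingerprint to how many times the
    repository references it. Ties are broken by fingerprint so the
    result is deterministic.
    """
    ranked = sorted(reference_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [fp for fp, _ in ranked[:max_blocklets]]


def select_seed_corpus(historical_hit_rate):
    """Choose the corpus name with the best past duplicate hit rate."""
    return max(historical_hit_rate, key=historical_hit_rate.get)


# Invented values, for illustration only.
reference_counts = {"fp-a": 12, "fp-b": 3, "fp-c": 40, "fp-d": 1}
new_corpus = generate_seed_corpus(reference_counts, max_blocklets=2)

hit_rates = {"email": 0.31, "vm-images": 0.58, "source-code": 0.44}
chosen = select_seed_corpus(hit_rates)

print(new_corpus)  # prints "['fp-c', 'fp-a']"
print(chosen)      # prints "vm-images"
```

Reference count and historical hit rate are only two of the listed criteria; the claims also allow, for example, a generalness attribute, a random approach, or calendar-based selection, none of which this sketch covers.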

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06F

Patent Metadata

Filing Date

January 11, 2012

Publication Date

November 18, 2014
