10565165

Selective Deduplication

Published: February 18, 2020
Assignee: not available in USPTO data we have

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method comprising: determining, by a processor of a computing device, a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device, wherein the first projected likelihood is derived from aggregated statistical information indicating whether the first data object has characteristics that previously provided deduplication benefits; and performing inline deduplication for the first data object based upon the first deduplication priority exceeding a deduplication probability threshold corresponding to a likelihood of achieving the storage space benefit from inline deduplication, wherein the deduplication probability threshold is determined based upon predetermined performance metrics.

Plain English translation pending...
Claim 2

Original Legal Text

2. The method of claim 1, comprising: determining a second deduplication priority for a second data object based upon a second projected likelihood that deduplication of the second data object will provide the storage space benefit.

Plain English Translation

This invention relates to data storage systems, specifically to deciding which data objects are worth deduplicating. The problem addressed is that deduplication consumes computing resources, and spending them on data objects that yield little or no space savings is wasteful. The base method (claim 1) assigns a first data object a deduplication priority that reflects the projected likelihood that deduplicating it will save storage space. That likelihood is derived from aggregated statistical information indicating whether objects with the same characteristics provided deduplication benefits in the past. Claim 2 adds that the method determines a second deduplication priority for a second data object in the same way, from a second projected likelihood that deduplicating that object will save space. Each data object therefore receives its own priority, so deduplication effort can be directed at the objects most likely to produce meaningful savings.
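One way to picture a priority derived from aggregated statistics is a smoothed success rate per group of object characteristics. The sketch below is illustrative only: the class `DedupStats`, the characteristic keys, and the smoothing constants are assumptions made for the example, and the claims do not prescribe any particular formula.

```python
from collections import defaultdict


class DedupStats:
    """Aggregated outcomes of past deduplication, keyed by object characteristics."""

    def __init__(self):
        # key -> [attempts, attempts that saved space]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, key, saved_space):
        """Record one past deduplication outcome for objects with this key."""
        counts = self._counts[key]
        counts[0] += 1
        if saved_space:
            counts[1] += 1

    def priority(self, key, prior=0.5, prior_weight=2.0):
        """Projected likelihood in [0, 1] that deduplicating such an object saves space.

        This is a smoothed success rate (an assumed formula): with no history
        for a key, the estimate equals `prior`.
        """
        attempts, successes = self._counts.get(key, (0, 0))
        return (successes + prior * prior_weight) / (attempts + prior_weight)


stats = DedupStats()
for saved in (True, True, True, False):
    stats.record(("vm-image", "large"), saved)
stats.record(("jpeg", "small"), False)

first_priority = stats.priority(("vm-image", "large"))   # (3 + 1) / (4 + 2)
second_priority = stats.priority(("jpeg", "small"))      # (0 + 1) / (1 + 2)
```

Here the first and second data objects each get their own priority from the same aggregated history, which mirrors the structure of claims 1 and 2.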

Claim 3

Original Legal Text

3. The method of claim 1, comprising: performing inline deduplication for the second data object based upon the second deduplication priority exceeding the deduplication probability threshold.

Plain English Translation

A method for improving storage efficiency by performing inline deduplication only on data objects whose deduplication priority is high enough. Inline deduplication identifies and eliminates redundant data in the write path, as the data is ingested. As required by claim 1, the method determines a deduplication priority for a first data object and performs inline deduplication on it when that priority exceeds a deduplication probability threshold. Claim 3 applies the same test to a second data object: if its deduplication priority, which reflects the projected likelihood that deduplication will save space, exceeds the same threshold, the method performs inline deduplication for the second data object. Deduplication typically involves comparing the object against a deduplication index or other reference data to find matches, so that duplicates are replaced by references rather than stored again. Restricting inline deduplication to objects with a sufficiently high likelihood of benefit balances computational overhead against storage savings, which matters most in systems that ingest large volumes of data.
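The threshold test and the index lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: fixed-size chunking, SHA-256 fingerprints, and the names `inline_dedup_write`, `index`, and `store` are all choices made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed-size chunking; the claims do not specify chunking


def inline_dedup_write(data, priority, threshold, index, store):
    """Write `data`, deduplicating in the write path only if priority > threshold.

    index: dict mapping chunk fingerprint -> chunk id.
    store: list of chunks, standing in for the storage device.
    Returns the list of chunk ids that make up the object.
    """
    dedup = priority > threshold
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest() if dedup else None
        if dedup and fingerprint in index:
            recipe.append(index[fingerprint])  # duplicate: reference the existing chunk
            continue
        store.append(chunk)                    # new (or unchecked) chunk is written
        chunk_id = len(store) - 1
        if dedup:
            index[fingerprint] = chunk_id
        recipe.append(chunk_id)
    return recipe


index, store = {}, []
payload = b"A" * CHUNK_SIZE * 3
recipe_hi = inline_dedup_write(payload, priority=0.8, threshold=0.6, index=index, store=store)
recipe_lo = inline_dedup_write(payload, priority=0.3, threshold=0.6, index=index, store=store)
```

With the high priority, the three identical chunks collapse to one stored chunk. With the low priority, the write skips fingerprinting entirely and stores all three chunks, leaving them for later processing.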

Claim 4

Original Legal Text

4. The method of claim 3, comprising: performing post-processing deduplication for the second data object based upon the second deduplication priority being less than the deduplication probability threshold.

Plain English Translation

A method for selective deduplication that treats data objects differently depending on their deduplication priority. When the second data object's deduplication priority is less than the deduplication probability threshold, meaning inline deduplication is unlikely to yield enough storage savings to justify its cost in the write path, the method performs post-processing deduplication for that object instead. Post-processing deduplication is deferred until after the object has been written. High-priority objects are thus deduplicated immediately, while lower-priority objects are handled in a later step, which balances storage efficiency against write performance. The technique is useful where deduplicating everything in real time is impractical or where deferring some work is needed to keep the system responsive.
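The routing decision between the two deduplication modes can be sketched in a few lines. The function name `route` and the queue are hypothetical. The claims cover a priority "exceeding" the threshold and one "less than" it; treating exact equality as post-processing is an assumption of this sketch.

```python
from collections import deque


def route(object_id, priority, threshold, post_process_queue):
    """Return "inline" or "post-process" for an incoming object."""
    if priority > threshold:
        return "inline"
    # Deferred: the object is written as-is and deduplicated later.
    post_process_queue.append(object_id)
    return "post-process"


pending = deque()
decisions = {
    "first": route("first", 0.82, 0.60, pending),
    "second": route("second", 0.35, 0.60, pending),
}
```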

Claim 5

Original Legal Text

5. The method of claim 1, comprising: performing post-processing deduplication for the first data object based upon the first deduplication priority being less than the deduplication probability threshold.

Plain English Translation

A method for data deduplication in storage systems that balances storage savings against computational overhead. The method determines a deduplication priority for the first data object, reflecting the projected likelihood that deduplicating it will save storage space. Under claim 1, a priority that exceeds the deduplication probability threshold leads to inline deduplication. Claim 5 covers the opposite case: when the first data object's priority is less than the threshold, the system performs post-processing deduplication for it. The object is still deduplicated, but the work is deferred rather than done in the write path, so the cost of inline deduplication is paid only for objects likely to benefit from it. This approach is particularly useful in large-scale storage environments where redundant data is common, such as cloud storage or backup systems.

Claim 6

Original Legal Text

6. The method of claim 1, wherein the performing inline deduplication comprises: performing the inline deduplication upon the first data object stored within a temporary storage location to create a deduplicated first data object that is subsequently stored into persistent storage.

Plain English Translation

This invention relates to data storage systems, specifically methods for improving storage efficiency through inline deduplication. The problem addressed is the inefficiency of storing duplicate data objects, which consumes unnecessary storage space and reduces system performance. The invention provides a solution by performing deduplication during the data write process, rather than as a post-processing step, to minimize storage usage and improve efficiency. The method involves storing a first data object in a temporary storage location. Inline deduplication is then performed on this data object to identify and eliminate redundant data, creating a deduplicated version of the first data object. This deduplicated data is subsequently stored in persistent storage, ensuring that only unique data is retained. The process ensures that deduplication occurs in real-time as data is written, reducing the need for additional storage space and computational resources later. The invention may also include additional steps such as comparing the first data object with previously stored data to identify duplicates, compressing the deduplicated data before storage, and managing metadata to track deduplicated segments. The method is particularly useful in systems where storage efficiency is critical, such as cloud storage, backup systems, and large-scale data centers. By performing deduplication inline, the system avoids the overhead of separate deduplication processes, improving overall performance and reducing storage costs.
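The staging step can be illustrated with a small sketch in which an object sits in a temporary buffer, is deduplicated there, and only the chunks not already present are persisted. The names `flush_staged_object` and `read_object`, the tiny chunk size, and the use of a dict as persistent storage are assumptions made for the example.

```python
import hashlib


def flush_staged_object(staged, persistent_chunks, chunk_size=4):
    """Deduplicate an object held in a staging buffer, then persist it.

    staged: bytes held in the temporary location (for example a write buffer).
    persistent_chunks: dict fingerprint -> chunk bytes, standing in for persistent storage.
    Returns the fingerprints that reconstruct the object, in order.
    """
    recipe = []
    for offset in range(0, len(staged), chunk_size):
        chunk = staged[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # Only chunks not already present reach persistent storage.
        persistent_chunks.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return recipe


def read_object(recipe, persistent_chunks):
    """Rebuild the original bytes from the stored chunks."""
    return b"".join(persistent_chunks[fp] for fp in recipe)


persistent = {}
staged_object = b"abcdabcdabcdwxyz"
recipe = flush_staged_object(staged_object, persistent)
```

The 16-byte object has four chunks but only two distinct ones, so persistent storage receives two chunks and the recipe still reconstructs the full object.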

Claim 7

Original Legal Text

7. The method of claim 4, wherein the performing post-processing deduplication comprises: performing the post-processing deduplication upon the second data object while stored within persistent storage.

Plain English Translation

The invention relates to data deduplication techniques, specifically post-processing deduplication of data objects stored in persistent storage. The problem addressed is the inefficiency of traditional deduplication methods that either perform deduplication during data ingestion or rely on in-memory operations, which can be resource-intensive and may not fully optimize storage efficiency. The method involves performing post-processing deduplication on a second data object after it has been stored in persistent storage. This approach allows for deduplication to occur at a later stage, reducing the computational overhead during the initial data write process. The deduplication process identifies and eliminates redundant data segments within the second data object, comparing it against previously stored data to avoid storing duplicate copies. By performing this operation on data already in persistent storage, the system can leverage additional processing power and time to achieve more thorough deduplication, improving storage efficiency without impacting the performance of data ingestion. This method is particularly useful in systems where data is written frequently and deduplication cannot be performed in real-time due to performance constraints. The post-processing step ensures that storage space is optimized over time, reducing the overall storage footprint of the system. The technique can be applied to various types of persistent storage, including disk-based and solid-state storage systems.

Claim 8

Original Legal Text

8. The method of claim 7, wherein the performing post-processing deduplication comprises: performing the post-processing deduplication based upon a determination that a current system load demand is below a threshold.

Plain English Translation

The invention relates to data deduplication in storage systems, specifically a method for performing post-processing deduplication based on system load conditions. Data deduplication reduces storage requirements by eliminating redundant copies of data, but the process can consume significant computational resources. The invention balances deduplication against system performance by running post-processing deduplication only when the system has capacity to spare. The method involves monitoring the system's current load demand, such as CPU or I/O usage, and comparing it to a threshold. If the current load demand is below the threshold, indicating that the system has available resources, post-processing deduplication is performed on the second data object, which per claim 7 is already in persistent storage. Because post-processing deduplication occurs after data is initially written, the system can prioritize write operations and defer the resource-intensive work, so deduplication does not degrade performance during peak usage periods. Objects whose priority exceeds the deduplication probability threshold are handled separately by inline deduplication, before they reach persistent storage. Note that the load threshold in this claim is distinct from the deduplication probability threshold of claim 1.
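A load-gated pass over the deferred objects might look like the sketch below. Everything here is hypothetical scaffolding: `current_load` is a caller-supplied function returning some load figure (for example CPU utilization between 0 and 1), and `dedup_fn` stands in for the actual post-processing deduplication.

```python
def run_post_processing(pending, current_load, load_threshold, dedup_fn, batch_size=2):
    """Deduplicate up to `batch_size` pending objects while load is below the threshold.

    Returns the ids processed in this pass.
    """
    processed = []
    while pending and len(processed) < batch_size:
        if current_load() >= load_threshold:
            break                      # system is busy; leave the rest for a later pass
        object_id = pending.pop(0)
        dedup_fn(object_id)
        processed.append(object_id)
    return processed


pending = ["obj-1", "obj-2", "obj-3"]
done = []
busy_pass = run_post_processing(pending, lambda: 0.9, 0.5, done.append)
idle_pass = run_post_processing(pending, lambda: 0.2, 0.5, done.append)
```

Checking the load before each object, rather than once per pass, lets the pass stop early if demand rises partway through.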

Claim 9

Original Legal Text

9. The method of claim 4, wherein the performing post-processing deduplication comprises: performing the post-processing deduplication as a background operation.

Plain English Translation

A system and method for data storage optimization, particularly for reducing redundant data storage in a computing environment. The invention addresses the problem of inefficient storage usage due to duplicate data, which consumes unnecessary storage space and computational resources. The solution involves a post-processing deduplication technique that identifies and eliminates redundant data after initial storage operations. This process operates in the background, minimizing disruption to primary system functions. The deduplication mechanism scans stored data to detect identical or substantially similar data blocks, then replaces duplicates with references to a single stored instance. By performing this operation as a background task, the system maintains performance for active processes while gradually optimizing storage efficiency. The method is applicable to various storage systems, including file systems, databases, and cloud storage platforms, where reducing redundancy improves storage capacity and access speed. The background operation ensures that deduplication does not interfere with real-time data access or processing, making it suitable for environments requiring continuous availability. The invention enhances storage utilization without requiring significant changes to existing storage architectures or user workflows.
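As an illustration of a background operation, the sketch below hands deferred objects to a worker thread through a queue so the foreground write path never waits on deduplication. The function `start_background_dedup` and the `None` stop sentinel are assumptions of the example. A real storage system would more likely use its own scheduler than a Python thread.

```python
import queue
import threading


def start_background_dedup(work_queue, dedup_fn):
    """Start a daemon thread that deduplicates queued objects until it sees None."""
    def worker():
        while True:
            object_id = work_queue.get()
            if object_id is None:      # sentinel: stop the worker
                work_queue.task_done()
                break
            dedup_fn(object_id)
            work_queue.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


work = queue.Queue()
deduplicated = []
worker_thread = start_background_dedup(work, deduplicated.append)
for object_id in ("obj-a", "obj-b"):
    work.put(object_id)            # the foreground path returns immediately
work.put(None)
work.join()                        # wait here only so the example is deterministic
worker_thread.join(timeout=5)
```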

Claim 10

Original Legal Text

10. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority before the first data object is stored within persistent storage.

Plain English Translation

A system and method for optimizing data storage by prioritizing deduplication operations before data is written to persistent storage. The invention addresses inefficiencies in traditional storage systems where deduplication occurs after data is stored, leading to redundant storage and increased latency. The method involves analyzing a first data object to determine a deduplication priority before the object is stored. This priority is based on factors such as the likelihood of the data being duplicated elsewhere in the storage system, the size of the data object, and its access patterns. By assessing these factors pre-storage, the system can prioritize deduplication for high-value candidates, reducing storage overhead and improving write performance. The method may also involve comparing the first data object to a reference dataset or metadata to identify potential duplicates before storage. This proactive approach ensures that redundant data is eliminated early in the storage process, minimizing wasted storage space and computational resources. The system may further include mechanisms to dynamically adjust deduplication priorities based on real-time storage conditions and workload demands. This technique is particularly useful in large-scale storage environments where minimizing redundancy is critical for efficiency and cost savings.

Claim 11

Original Legal Text

11. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a size characteristic of the first data object.

Plain English Translation

A method for optimizing data deduplication processes in storage systems addresses the challenge of efficiently identifying and prioritizing redundant data to reduce storage usage and improve performance. The method involves analyzing data objects to determine deduplication priorities based on specific characteristics of the data. In particular, the method evaluates the size of a first data object to assign a deduplication priority. Larger data objects may be prioritized for deduplication to maximize storage savings, as eliminating duplicates of large files can significantly reduce overall storage requirements. The method may also consider other factors, such as frequency of access or similarity to existing data, to further refine the deduplication strategy. By dynamically adjusting deduplication priorities based on object size and other attributes, the system ensures efficient use of storage resources while maintaining system performance. This approach is particularly useful in environments with high data volume and redundancy, such as cloud storage, enterprise backup systems, or distributed file systems.

Claim 12

Original Legal Text

12. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a data object type of the first data object.

Plain English Translation

A system and method for optimizing data deduplication in storage environments by taking the type of each data object into account. Traditional deduplication systems often apply uniform processing to all data, which spends computation on objects that are unlikely to deduplicate well. Here, the deduplication priority of the first data object is determined based upon its data object type (for example documents, images, databases, or system files). Because the priority reflects the projected likelihood of a storage space benefit, object types that have deduplicated well in the past receive a higher priority than types that have not. The claim does not specify which types rank higher. Other characteristics recited in related claims, such as size, last modified timestamp, or update frequency, may be used alongside type to refine the priority. This reduces processing spent on objects unlikely to yield savings and improves overall resource utilization.

Claim 13

Original Legal Text

13. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a last modified timestamp of the first data object.

Plain English Translation

A system and method for optimizing data deduplication in storage environments addresses the challenge of efficiently identifying and prioritizing redundant data to reduce storage usage. The method assigns deduplication priorities using inexpensive metadata attributes, such as timestamps, rather than exhaustive content comparison. Specifically, it determines the deduplication priority of the first data object based upon the object's last modified timestamp. The claim does not say whether recently modified or long-unchanged objects receive the higher priority; under claim 1, that follows from aggregated statistics on whether objects with similar characteristics provided deduplication benefits in the past. Focusing resources on the objects most likely to yield savings reduces the need for exhaustive comparisons across all stored data. The solution is particularly useful in large-scale storage systems where deduplication must balance speed and thoroughness to keep the system responsive.

Claim 14

Original Legal Text

14. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon an update frequency of the first data object.

Plain English Translation

A system and method for optimizing data deduplication in storage systems that takes into account how often a data object changes. The method determines the deduplication priority of the first data object based upon its update frequency. The claim does not say whether frequently updated objects receive a higher or a lower priority; under claim 1, the priority reflects the projected likelihood of a storage space benefit, derived from aggregated statistics on how objects with similar characteristics deduplicated in the past. Directing deduplication at the objects most likely to benefit reduces computational overhead and avoids storing unnecessary duplicates. Other characteristics recited in related claims, such as data object size, type, or last modified timestamp, may be used alongside update frequency to refine the priority. This method is particularly useful in large-scale storage systems where efficient deduplication is critical for maintaining performance and reducing costs.
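Claims 11 through 14 name four characteristics: size, data object type, last modified timestamp, and update frequency. One hedged way to use them together is to reduce each object to a coarse bucket that serves as the lookup key for aggregated statistics. The class `ObjectInfo`, the function `characteristic_key`, and every cut-off value below are invented for illustration; the claims give no bucket boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectInfo:
    size_bytes: int
    object_type: str
    seconds_since_modified: float
    updates_per_day: float


def characteristic_key(info):
    """Map an object to a coarse bucket used to look up aggregated statistics."""
    size = "small" if info.size_bytes < 64 * 1024 else "large"           # assumed cut-off
    age = "recent" if info.seconds_since_modified < 86_400 else "old"    # one day
    churn = "hot" if info.updates_per_day >= 1.0 else "cold"             # assumed cut-off
    return (size, info.object_type, age, churn)


key = characteristic_key(
    ObjectInfo(
        size_bytes=10 * 1024 * 1024,
        object_type="vm-image",
        seconds_since_modified=7 * 86_400,
        updates_per_day=0.1,
    )
)
```

Bucketing keeps the statistics table small, at the cost of treating all objects in a bucket as equally likely to benefit.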

Claim 15

Original Legal Text

15. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon statistic information derived from a previous deduplication operation.

Plain English Translation

A method for optimizing data deduplication in storage systems addresses the challenge of efficiently identifying and removing redundant data to save storage space and improve performance. The method involves determining a deduplication priority for data segments based on statistical information gathered from prior deduplication operations. By analyzing historical deduplication results, the system identifies patterns and trends that indicate which data segments are most likely to be redundant. This statistical analysis helps prioritize segments with higher redundancy likelihood for deduplication, reducing computational overhead and improving efficiency. The method dynamically adjusts the deduplication process based on real-world usage patterns, ensuring that storage resources are used optimally. This approach is particularly useful in large-scale storage environments where manual prioritization is impractical. The statistical information may include metrics such as frequency of occurrence, similarity scores, or historical deduplication success rates. By leveraging this data, the system avoids redundant processing of low-priority segments, enhancing overall system performance and reducing energy consumption. The method is applicable to various storage systems, including cloud storage, enterprise data centers, and backup solutions.
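Statistics from a previous deduplication operation could, for instance, be folded into a running estimate with an exponential moving average. This is one possible technique chosen for the sketch, not something the claim specifies; the function `update_hit_rate` and the `weight` parameter are hypothetical.

```python
def update_hit_rate(previous_rate, chunks_seen, chunks_duplicate, weight=0.2):
    """Blend the duplicate ratio from the latest deduplication run into a running estimate.

    previous_rate: current estimate in [0, 1].
    weight: how much the latest run counts relative to the accumulated history.
    """
    if chunks_seen == 0:
        return previous_rate           # nothing observed; keep the old estimate
    observed = chunks_duplicate / chunks_seen
    return (1 - weight) * previous_rate + weight * observed


# A run in which 90 of 100 chunks were duplicates pulls the estimate upward:
# 0.8 * 0.5 + 0.2 * 0.9
rate = update_hit_rate(0.5, chunks_seen=100, chunks_duplicate=90)
```

A moving average lets recent runs outweigh old ones, which helps when the data mix shifts over time.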

Claim 16

Original Legal Text

16. The method of claim 1, comprising: determining the deduplication probability threshold based upon at least one a read throughput, a write throughput, a read response time, a write response time, network utilization, and a performance characteristic.

Plain English Translation

A method for optimizing data deduplication in storage systems addresses the challenge of balancing storage efficiency with system performance. Deduplication reduces redundant data storage but can affect read/write throughput, response times, and network utilization. The method determines the deduplication probability threshold based upon at least one performance metric: read throughput, write throughput, read response time, write response time, network utilization, or another performance characteristic. The threshold sets how likely a storage space benefit must be before inline deduplication is performed, so a higher threshold sends fewer objects through deduplication in the write path. The claim does not specify how the metrics map to a threshold value or how often the threshold is recomputed. Tying the threshold to performance metrics keeps deduplication from degrading responsiveness or throughput, which is particularly useful where storage efficiency and performance must be carefully balanced, such as cloud storage, enterprise data centers, or high-performance computing systems.
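As a rough illustration of deriving the threshold from performance metrics, the sketch below raises the threshold as write latency and write throughput fall short of target values, so that fewer objects qualify for inline deduplication when the system is struggling. The mapping, the metric names, and the targets are all assumptions; the claim lists the metrics but not a formula.

```python
def dedup_threshold(metrics, targets, base=0.5, max_threshold=0.95):
    """Raise the threshold as measured performance falls short of its targets.

    metrics/targets: dicts with 'write_response_ms' (lower is better) and
    'write_throughput_mbps' (higher is better). Both key names are hypothetical.
    """
    latency_pressure = max(
        0.0, metrics["write_response_ms"] / targets["write_response_ms"] - 1.0
    )
    throughput_pressure = max(
        0.0, 1.0 - metrics["write_throughput_mbps"] / targets["write_throughput_mbps"]
    )
    pressure = min(1.0, latency_pressure + throughput_pressure)
    return base + (max_threshold - base) * pressure


targets = {"write_response_ms": 5.0, "write_throughput_mbps": 400.0}
relaxed = dedup_threshold({"write_response_ms": 4.0, "write_throughput_mbps": 450.0}, targets)
stressed = dedup_threshold({"write_response_ms": 7.5, "write_throughput_mbps": 300.0}, targets)
```

When both metrics meet their targets the threshold stays at `base`; in the stressed case the combined pressure is 0.75, which moves the threshold three quarters of the way toward `max_threshold`.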

Claim 17

Original Legal Text

17. A computing device comprising: a memory having stored thereon instructions for performing a method; and a processor coupled to the memory, the processor configured to execute the instructions to cause the processor to: determine a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device, wherein the first projected likelihood is derived from aggregated statistical information indicating whether the first data object has characteristics that previously provided deduplication benefits; and perform inline deduplication for the first data object based upon the first deduplication priority exceeding a deduplication probability threshold corresponding to a likelihood of achieving the storage space benefit from inline deduplication, wherein the deduplication probability threshold is determined based upon predetermined performance metrics.

Plain English Translation

This invention relates to data storage systems, specifically optimizing inline deduplication to reduce storage consumption while balancing performance. The system addresses the challenge of efficiently identifying data objects likely to yield storage savings through deduplication without excessive computational overhead. A computing device includes a memory storing instructions and a processor that executes them. The processor determines a deduplication priority for a data object from a projected likelihood of storage benefit, which is derived from aggregated statistical information indicating whether objects with the same characteristics previously provided deduplication benefits. If the priority exceeds a deduplication probability threshold, which is determined from predetermined performance metrics, the system performs inline deduplication. The threshold ensures deduplication is applied in the write path only when the expected storage savings justify the computational cost. The statistical approach avoids attempting inline deduplication on every object, reducing latency and resource consumption while preserving most of the storage savings.

Claim 18

Original Legal Text

18. The computing device of claim 17, wherein the instructions cause the processor to: determine the deduplication probability threshold based upon at least one a read throughput, a write throughput, a read response time, a write response time, network utilization, and a performance characteristic.

Plain English Translation

This invention relates to data storage systems, specifically optimizing deduplication processes to balance storage efficiency and performance. The system addresses the challenge of determining when to perform deduplication operations without negatively impacting system performance. Deduplication reduces storage costs by eliminating redundant data, but excessive deduplication can degrade read/write speeds and increase latency. The computing device includes a processor executing instructions to determine the deduplication probability threshold based on system performance metrics. These metrics include read throughput, write throughput, read response time, write response time, network utilization, and other performance characteristics. From one or more of these factors, the system sets a threshold for inline deduplication that keeps storage efficient while maintaining acceptable performance levels. The claim does not say how often the threshold is determined; it may be recalculated continuously or periodically to adapt to changing workload conditions. Such adaptation would prevent performance degradation during high-demand periods while maximizing storage savings when system resources are available. The solution is particularly useful in environments with variable workloads, such as cloud storage or enterprise data centers.

Claim 19

Original Legal Text

19. The computing device of claim 17, wherein the instructions cause the processor to: determine the first deduplication priority based upon an update frequency of the first data object.

Plain English Translation

A system for managing data storage in a computing device addresses the challenge of efficiently storing and retrieving data while minimizing redundancy. The system includes a processor and memory storing instructions that, when executed, perform data deduplication by identifying and removing duplicate copies of data objects. The system prioritizes deduplication operations based on factors such as the update frequency of data objects. Specifically, the system determines a deduplication priority for a first data object by analyzing how frequently the object is updated. Higher update frequencies may result in higher or lower deduplication priority, depending on the system's optimization goals. The system may also compare the first data object with a second data object to identify duplicates and apply deduplication techniques accordingly. The goal is to optimize storage efficiency and performance by intelligently managing redundant data based on usage patterns. This approach helps reduce storage overhead and improve data access speeds, particularly in environments with large datasets or frequent updates.

Claim 20

Original Legal Text

20. A non-transitory machine-readable storage media having stored thereon instructions, for performing a method, which causes a computing device to: determine a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device, wherein the first projected likelihood is derived from aggregated statistical information indicating whether the first data object has characteristics that previously provided deduplication benefits; and perform inline deduplication for the first data object based upon the first deduplication priority exceeding a deduplication probability threshold corresponding to a likelihood of achieving the storage space benefit from inline deduplication, wherein the deduplication probability threshold is determined based upon predetermined performance metrics.

Plain English Translation

This invention relates to data storage systems and addresses the challenge of efficiently performing inline deduplication to reduce storage consumption while balancing performance overhead. The claim covers a non-transitory machine-readable storage medium whose instructions cause a computing device to carry out the same steps as the method of claim 1. The device determines a deduplication priority for a data object by analyzing its characteristics against aggregated statistical data that indicates whether similar objects have previously yielded storage savings through deduplication. The priority reflects the projected likelihood that deduplicating the object will save storage space. If the priority exceeds a deduplication probability threshold, which is determined from predetermined performance metrics, the system performs inline deduplication. The threshold ensures that inline deduplication is applied only when the expected storage savings justify the computational cost, optimizing both storage efficiency and system performance. The approach leverages historical deduplication outcomes to prioritize objects with high likelihoods of benefit, reducing unnecessary deduplication operations and improving overall system efficiency.

Patent Metadata

Filing Date

Unknown

Publication Date

February 18, 2020

Inventors

Damarugendra Mallaiah
Jayanta Basak

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “SELECTIVE DEDUPLICATION” (10565165). https://patentable.app/patents/10565165
