10762051

Reducing Hash Collisions in Large Scale Data Deduplication

Published: September 1, 2020
Assignee: not available in the USPTO data we have
Technical Abstract

Patent Claims
20 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method comprising: obtaining a first data chunk and a second data chunk of a first plurality of data chunks associated with a first data snapshot of a computing system, the first data snapshot comprising a data set indicative of a reproducible machine state of the computing system at a time, and each data chunk of the first plurality of data chunks having a same predetermined data size; selecting, from a plurality of hash functions and at random, a first hash function for a first data chunk of the first plurality of data chunks; selecting, from the plurality of hash functions and at random, a second hash function for a second data chunk of the first plurality of data chunks; hashing the first data chunk with the first hash function to create a first hash value; hashing the second data chunk with the second hash function to create a second hash value; generating a first lookup table having a first hash record that includes the first hash value and a second hash record that includes the second hash value; creating a first archive comprising the first plurality of data chunks and the first lookup table to a datastore; obtaining a third data chunk and a fourth data chunk of a second plurality of data chunks associated with a second data snapshot of the computing system; hashing a third data chunk of the second plurality of data chunks with the first hash function to create a third hash value; hashing a fourth data chunk of the second plurality of data chunks with the second hash function to create a fourth hash value; generating a second lookup table having a third hash record that includes the third hash value and a fourth hash record that includes the fourth hash value; identifying the one or more dissimilar hash values between the first lookup table and the second lookup table, the identifying comprising: comparing the first hash record with the third hash record to determine whether the first hash value and the third hash value match; comparing the second hash record with the fourth hash record to determine whether the second hash value and the fourth hash value match; and identifying the one or more dissimilar data chunks based on the first hash record, the second hash record, the third hash record, and the fourth hash record; and creating a second archive comprising the one or more dissimilar data chunks, and transmitting the second archive to the datastore.

Plain English translation pending...
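Until a translation is available, the flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The pool of hash functions (CRC-32 and Adler-32 from zlib), the 4-byte chunk size, the positional pairing of chunks between snapshots, and every function name are assumptions made for illustration only.

```python
import random
import zlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow

# Hypothetical pool of hash functions (the claim does not name any).
HASH_FUNCTIONS = {"adler32": zlib.adler32, "crc32": zlib.crc32}

def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into chunks of one fixed size, zero-padding the last chunk."""
    padded = data + b"\0" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

def build_first_table(chunks, rng):
    """Pick a hash function at random for each chunk; record (name, hash value)."""
    table = []
    for chunk in chunks:
        name = rng.choice(sorted(HASH_FUNCTIONS))
        table.append((name, HASH_FUNCTIONS[name](chunk)))
    return table

def build_second_table(chunks, first_table):
    """Hash each later chunk with the function chosen earlier for that position."""
    return [(name, HASH_FUNCTIONS[name](chunk))
            for chunk, (name, _) in zip(chunks, first_table)]

def dissimilar_positions(first_table, second_table):
    """Positions whose hash values differ between the two lookup tables."""
    return [i for i, (a, b) in enumerate(zip(first_table, second_table))
            if a[1] != b[1]]

rng = random.Random(0)
snapshot_1 = split_chunks(b"AAAABBBBCCCC")
snapshot_2 = split_chunks(b"AAAAXXXXCCCC")  # only the middle chunk changed

table_1 = build_first_table(snapshot_1, rng)
table_2 = build_second_table(snapshot_2, table_1)
changed = dissimilar_positions(table_1, table_2)
second_archive = [snapshot_2[i] for i in changed]  # only the changed chunks
```

The first archive would hold all of `snapshot_1` plus `table_1`; the second archive holds only the chunks listed in `changed`.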
Claim 2

Original Legal Text

2. The method as recited in claim 1, further comprising: identifying a data chunk pair comprising two data chunks having a same hash value; randomly selecting a complex hash function from a plurality of complex hash functions; hashing the data chunk pair with the complex hash function to create a complex hash value for the data chunk pair; and verifying that the data chunk pair is identical based on the complex hash value.

Plain English Translation

This claim adds a collision check to the method of claim 1. A short hash can give two different data chunks the same value, so a matching hash alone does not prove that the chunks are duplicates. The method identifies a pair of data chunks that share a hash value, randomly selects a complex hash function from a plurality of complex hash functions, and hashes the pair with it to create a complex hash value. The pair is verified as identical based on that complex hash value. Because an accidental collision under the first hash function is unlikely to repeat under a second, independently chosen function, the check reduces the chance that changed data is mistaken for a duplicate and left out of the archive.
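The verification step of claim 2 might look like the following sketch. The pool of "complex" hash functions (SHA-256, SHA-512, BLAKE2b) is an assumption, since the claim names none, and hashing each chunk of the pair separately and comparing digests is one possible reading of "hashing the data chunk pair".

```python
import hashlib
import random

# Hypothetical pool of complex hash functions; the claim does not name any.
COMPLEX_HASHES = ("sha256", "sha512", "blake2b")

def verify_identical(chunk_a: bytes, chunk_b: bytes, rng=random) -> bool:
    """Re-hash both chunks with one randomly chosen strong hash and compare."""
    name = rng.choice(COMPLEX_HASHES)
    digest_a = hashlib.new(name, chunk_a).digest()
    digest_b = hashlib.new(name, chunk_b).digest()
    return digest_a == digest_b

same = verify_identical(b"payload", b"payload")       # a true duplicate
different = verify_identical(b"payload", b"pay1oad")  # differs by one byte
```

Whichever function is drawn, both chunks of a pair are hashed with the same one, so the comparison is meaningful.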

Claim 3

Original Legal Text

3. The method as recited in claim 1, wherein the first hash record, the second hash record, the third hash record, and the fourth hash record comprise a data partition number indicative of an identification of a data partition in an endpoint, a data endpoint number indicative of an identification of the endpoint, and a data timestamp indicative of a time at which the snapshot was recorded.

Plain English Translation

This claim specifies the contents of the hash records used in the method of claim 1. In addition to its hash value, each of the first, second, third, and fourth hash records carries three pieces of metadata: a data partition number identifying a data partition in an endpoint, a data endpoint number identifying the endpoint itself, and a data timestamp indicating when the snapshot was recorded. Together these fields tie each hashed chunk to a specific partition, endpoint, and point in time, which allows records from different snapshots of the same source to be matched up and compared.
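A hash record with the fields recited in claim 3 could be modeled as below. The class name, field names, and the `same_source` helper are hypothetical; the claim specifies only which pieces of information a record contains.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class HashRecord:
    hash_value: int            # hash of one data chunk
    partition_number: int      # identifies the data partition in the endpoint
    endpoint_number: int       # identifies the endpoint
    snapshot_time: datetime    # when the snapshot was recorded

    def same_source(self, other: "HashRecord") -> bool:
        """True when both records describe the same partition and endpoint."""
        return (self.partition_number == other.partition_number
                and self.endpoint_number == other.endpoint_number)

first = HashRecord(0x1A2B, partition_number=2, endpoint_number=7,
                   snapshot_time=datetime(2020, 1, 1, tzinfo=timezone.utc))
third = HashRecord(0x9F00, partition_number=2, endpoint_number=7,
                   snapshot_time=datetime(2020, 1, 2, tzinfo=timezone.utc))

# Same partition and endpoint, later snapshot, different hash: chunk changed.
chunk_changed = first.same_source(third) and first.hash_value != third.hash_value
```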

Claim 4

Original Legal Text

4. The method as recited in claim 1, wherein creating the second archive comprises creating an upload package comprising the one or more dissimilar data chunks, the creating comprising: compressing and encrypting the one or more dissimilar data chunks; transmitting the upload package to the datastore; and updating a datastore index comprising a datastore location of the one or more dissimilar data chunks.

Plain English Translation

This invention relates to data storage systems, specifically methods for efficiently managing and transmitting dissimilar data chunks to a datastore. The problem addressed is the need to securely and efficiently package, transmit, and index dissimilar data chunks when creating a second archive in a distributed storage environment. The method involves creating an upload package containing one or more dissimilar data chunks. The process includes compressing and encrypting these chunks to reduce storage requirements and enhance security. The encrypted and compressed upload package is then transmitted to a datastore. Additionally, the method updates a datastore index to record the location of the stored data chunks, ensuring efficient retrieval and management. This approach optimizes storage efficiency, security, and accessibility in distributed storage systems.
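The packaging steps of claim 4 can be sketched as follows. The claim names no compression or encryption algorithm, so zlib is an assumed choice, and `toy_encrypt` is a deliberately insecure XOR placeholder standing in for a real cipher. The in-memory dictionaries used as datastore and index, and all names, are also hypothetical.

```python
import zlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Placeholder XOR cipher. NOT secure; it only marks where encryption goes."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def build_upload_package(chunks: dict[int, bytes], key: bytes) -> dict[int, bytes]:
    """Compress, then encrypt, each dissimilar chunk (keyed by chunk position)."""
    return {pos: toy_encrypt(zlib.compress(chunk), key)
            for pos, chunk in chunks.items()}

def upload(package: dict[int, bytes], datastore: dict, index: dict,
           archive_id: str) -> None:
    """Store each blob and record its datastore location in the index."""
    for pos, blob in package.items():
        location = f"{archive_id}/{pos}"
        datastore[location] = blob
        index[pos] = location

datastore, index = {}, {}
dissimilar = {1: b"XXXX" * 64}  # position -> changed chunk
package = build_upload_package(dissimilar, key=b"demo-key")
upload(package, datastore, index, archive_id="archive-2")

# Reading back: look up the location, undo the XOR, then decompress.
restored = zlib.decompress(toy_encrypt(datastore[index[1]], b"demo-key"))
```

Compressing before encrypting is the usual order, because well-encrypted data has little redundancy left to compress.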

Claim 5

Original Legal Text

5. A method comprising: obtaining a first data chunk and a second data chunk of a plurality of data chunks associated with a first data snapshot of a computing system; selecting, from a plurality of hash functions, a first hash function for the first data chunk and a second hash function for the second data chunk; writing a first hash record to a first lookup table, the first hash record including a first hash value from hashing the first data chunk with the first hash function; writing a second hash record to the first lookup table, the second hash record including a second hash value from hashing the second data chunk with the second hash function; creating a first archive comprising the plurality of data chunks and the first lookup table to a datastore; obtaining a third data chunk and a fourth data chunk of a second plurality of data chunks associated with a second data snapshot of the computing system; writing a third hash record to a second lookup table, the third hash record including a third hash value from hashing the third data chunk with the first hash function; writing a fourth hash record to the second lookup table, the fourth hash record including a fourth hash value from hashing the fourth data chunk with the second hash function; identifying one or more dissimilar data chunks by comparing hash values between the first lookup table and the second lookup table, wherein comparing the hash values comprises verifying an absence of a hash collision with a complex hash function when the hash values are identical; and creating a second archive comprising the one or more dissimilar data chunks, and transmitting the second archive to the datastore.

Plain English Translation

This invention relates to data deduplication in computing systems, specifically for identifying and archiving dissimilar data chunks between different system snapshots. The problem addressed is efficiently detecting changes in data snapshots while minimizing storage and computational overhead. The method involves capturing multiple data chunks from a first system snapshot, selecting distinct hash functions for different chunks, and generating hash values for each chunk. These hash values are stored in a lookup table alongside the original data chunks, forming a first archive. For a second snapshot, new data chunks are processed similarly, with hash values stored in a second lookup table. The system compares hash values between the two snapshots to identify dissimilar data chunks. If identical hash values are found, a complex hash function is used to verify no collision occurred, ensuring accurate change detection. Only the dissimilar chunks are archived and transmitted, reducing storage requirements. This approach optimizes deduplication by leveraging multiple hash functions and collision verification, ensuring reliable identification of modified data while minimizing redundant storage.
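The comparison step of claim 5, including the collision check, can be sketched as below. To make a collision easy to reproduce, the sketch uses a deliberately poor 8-bit hash. That hash, the use of SHA-256 as the complex hash function, and the function names are illustrative assumptions, not details from the patent.

```python
import hashlib

def weak_hash(chunk: bytes) -> int:
    """Deliberately poor 8-bit hash so that a collision is easy to show."""
    return sum(chunk) % 256

def strong_hash(chunk: bytes) -> bytes:
    """Stand-in for the claim's complex hash function."""
    return hashlib.sha256(chunk).digest()

def find_dissimilar(old_chunks, new_chunks):
    """Return positions of chunks that differ between two snapshots."""
    changed = []
    for i, (old, new) in enumerate(zip(old_chunks, new_chunks)):
        if weak_hash(old) != weak_hash(new):
            changed.append(i)       # hash values differ: chunk changed
        elif strong_hash(old) != strong_hash(new):
            changed.append(i)       # hash values equal, but only by collision
    return changed

old = [b"same", b"ab", b"left"]
new = [b"same", b"ba", b"right"]    # b"ab" and b"ba" collide under weak_hash
changed = find_dissimilar(old, new)
```

Without the second check, position 1 would be treated as unchanged and the new data would be missing from the second archive.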

Claim 6

Original Legal Text

6. The method as recited in claim 5, wherein the first data chunk, the second data chunk, the third data chunk, and the fourth data chunk have a same predetermined data size.

Plain English Translation

This claim narrows the method of claim 5 by requiring that the first, second, third, and fourth data chunks all have the same predetermined data size. In other words, both snapshots are divided into fixed-size chunks rather than variable-size ones. One practical consequence is that a chunk at a given offset in the first snapshot lines up with the chunk at the same offset in the second snapshot, which keeps the chunk-by-chunk hash comparison straightforward. The claim itself recites only the equal size; it does not add encoding, compression, or encryption steps.

Claim 7

Original Legal Text

7. The method as recited in claim 5, wherein verifying the absence of the hash collision comprises: identifying a data chunk pair comprising a first data chunk from the first lookup table and a second data chunk from the second lookup table having a same hash value; randomly selecting the complex hash function from a plurality of complex hash functions; hashing the first data chunk from the first lookup table and the second data chunk from the second lookup table with the complex hash function to generate a complex hash value; and verifying that the data chunk pair is identical based at least in part on the complex hash value.

Plain English Translation

This invention relates to data integrity verification in systems using hash-based lookup tables. The problem addressed is ensuring that hash collisions do not lead to false positives when comparing data chunks across different lookup tables. Hash collisions occur when different data chunks produce the same hash value, which can cause errors in data comparison processes. The method involves verifying the absence of hash collisions between two lookup tables. First, a data chunk pair is identified, consisting of one chunk from each lookup table, where both chunks share the same hash value. Next, a complex hash function is randomly selected from a pool of available complex hash functions. The identified data chunks are then hashed using this complex hash function to generate a new, more robust hash value. Finally, the method verifies whether the data chunks are identical by analyzing the complex hash value. If the complex hash values match, the chunks are confirmed to be identical, ensuring no hash collision occurred. This approach enhances data integrity by reducing the likelihood of false matches due to hash collisions.

Claim 8

Original Legal Text

8. The method as recited in claim 7, wherein the first hash function and the second hash function are randomly selected from a first group of hash functions.

Plain English Translation

This claim narrows the method of claim 7 by specifying how the first and second hash functions are chosen: both are randomly selected from a first group of hash functions. The claim names this first group separately from the plurality of complex hash functions that claim 7 uses to verify that chunks with matching hash values are identical. The first hash function is applied to the first data chunk and the second hash function to the second data chunk, so different chunks of the same snapshot can be hashed with different functions. The apparent aim is that a weakness in any one function, which might cause certain chunks to collide, does not affect every chunk in the snapshot.

Claim 9

Original Legal Text

9. The method as recited in claim 8, wherein the first group of hash functions use fewer bits to store hash values than the plurality of complex hash functions.

Plain English Translation

This claim narrows the method of claim 8 by relating the two sets of hash functions: the first group of hash functions uses fewer bits to store hash values than the plurality of complex hash functions. The short hash values from the first group are the ones written to the lookup tables for every chunk, which keeps the tables small. The longer complex hash values are computed only when two chunks have identical short hash values and a possible collision has to be ruled out. The arrangement accepts a higher collision rate in the first pass in exchange for lower storage cost, and relies on the second pass to catch the collisions.
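The storage difference recited in claim 9 can be made concrete with a short calculation. CRC-32 (32 bits) as a member of the first group and SHA-256 (256 bits) as a complex hash function are assumed examples; the patent text shown here names neither. The chunk count is likewise arbitrary.

```python
import hashlib
import zlib

chunk = b"example chunk"

simple_value = zlib.crc32(chunk)             # fits in 32 bits
simple_bits = 32
complex_bits = hashlib.sha256(chunk).digest_size * 8

def table_bytes(num_chunks: int, bits_per_hash: int) -> int:
    """Bytes needed to store one hash value per chunk."""
    return num_chunks * bits_per_hash // 8

num_chunks = 1_000_000
simple_table = table_bytes(num_chunks, simple_bits)    # hash column, short hashes
complex_table = table_bytes(num_chunks, complex_bits)  # same column, long hashes
ratio = complex_table // simple_table
```

Under these assumptions the short hashes need one eighth of the space, which is the saving the claim attributes to the first group.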

Claim 10

Original Legal Text

10. The method as recited in claim 5, wherein the first hash record, the second hash record, the third hash record, and the fourth hash record comprise a data partition number indicative of an identification of a data partition in an endpoint, a data endpoint number indicative of an identification of the endpoint, and a data timestamp indicative of a time at which the snapshot was saved.

Plain English Translation

This claim specifies the contents of the hash records used in the method of claim 5. In addition to its hash value, each of the first, second, third, and fourth hash records includes a data partition number identifying a data partition in an endpoint, a data endpoint number identifying the endpoint, and a data timestamp indicating when the snapshot was saved. These three fields tie each hashed chunk to a specific partition, endpoint, and point in time, which allows records from the first and second snapshots of the same source to be matched up and compared.

Claim 11

Original Legal Text

11. The method as recited in claim 5, wherein creating the second archive comprises: creating an upload package comprising the one or more dissimilar data chunks, the creating comprising: compressing and encrypting the one or more dissimilar data chunks; encrypting the first data chunk; transmitting the upload package to the datastore; and updating a datastore index comprising a datastore location of the one or more dissimilar data chunks.

Plain English Translation

This claim details how the second archive of claim 5 is created. The dissimilar data chunks, that is, the chunks found to differ between the first and second snapshots, are compressed and encrypted and placed in an upload package. The claim also recites encrypting the first data chunk. The upload package is transmitted to the datastore, and a datastore index is updated with the datastore location of the dissimilar data chunks so that they can be found later.

Claim 12

Original Legal Text

12. The method as recited in claim 5, wherein, after verifying that the hash values are identical, writing a pointer on the second lookup table that is indicative of a data location on the datastore.

Plain English Translation

This claim adds a step to the method of claim 5 for chunks that have not changed. After verifying that the hash values are identical, the method writes a pointer on the second lookup table that indicates a data location on the datastore. The pointer presumably refers to the copy of the chunk already held in the datastore from the first archive, so an unchanged chunk does not need to be stored again. The second lookup table then describes the whole second snapshot: changed chunks are found in the second archive, and unchanged chunks are found by following their pointers.
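The pointer step of claim 12 might be sketched as follows. The dictionary layout of the lookup table, the location string format, and the function name are hypothetical; the claim states only that a pointer indicative of a datastore location is written after the hash values are verified as identical.

```python
from typing import Optional

def add_record(second_table: dict, position: int, hash_value: int,
               verified_identical: bool,
               existing_location: Optional[str]) -> dict:
    """Add a record; for a verified duplicate, point at the stored copy."""
    record = {"hash": hash_value}
    if verified_identical and existing_location is not None:
        # Unchanged chunk: reference the existing data instead of re-uploading.
        record["pointer"] = existing_location
    second_table[position] = record
    return record

second_table = {}
add_record(second_table, 0, 0x1A2B, verified_identical=True,
           existing_location="archive-1/0")
add_record(second_table, 1, 0x9F00, verified_identical=False,
           existing_location=None)

to_upload = [pos for pos, rec in second_table.items() if "pointer" not in rec]
```

Only the positions in `to_upload` would go into the second archive.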

Claim 13

Original Legal Text

13. A system comprising: one or more processors; and memory to store computer-executable instructions that, when executed, cause the one or more processors to: obtain a first data chunk and a second data chunk of a plurality of data chunks associated with a first data snapshot of a computing system; select, from a plurality of hash functions, a first hash function for the first data chunk and a second hash function for the second data chunk; write a first hash record to a first lookup table, the first hash record comprising a hash value from hashing the first data chunk with the first hash function; write a second hash record to the first lookup table, the second hash record comprising a second hash value from hashing the second data chunk with the second hash function; create a first archive comprising the plurality of data chunks and the first lookup table to a datastore; obtain a third data chunk and a fourth data chunk of a second plurality of data chunks associated with a second data snapshot of the computing system; write a third hash record to a second lookup table, the third hash record comprising a third hash value from hashing the third data chunk with the first hash function; write a fourth hash record to the second lookup table, the fourth hash record comprising a fourth hash value from hashing the fourth data chunk with the second hash function; identify one or more dissimilar data chunks by comparing hash values between the first lookup table and the second lookup table, wherein comparing the hash values comprises verifying an absence of a hash collision with a complex hash function when the hash values are identical; and create a second archive comprising the one or more dissimilar data chunks, and transmitting the second archive to the datastore.

Plain English Translation

The system is designed for efficient data deduplication and versioning in computing systems by comparing snapshots to identify and store only dissimilar data chunks. The technology addresses the challenge of managing large volumes of data by reducing storage requirements through deduplication while maintaining data integrity across multiple snapshots. The system processes data chunks from a first snapshot, applying different hash functions to each chunk and storing the resulting hash values in a lookup table. These chunks and the lookup table are then archived. For a second snapshot, the system similarly processes data chunks, storing their hash values in a new lookup table. The system compares hash values between the two lookup tables to identify dissimilar data chunks, verifying no hash collisions with a complex hash function when identical hash values are found. Only the dissimilar chunks are archived and transmitted to the datastore, minimizing redundant storage. This approach ensures efficient deduplication while preserving data consistency across snapshots.

Claim 14

Original Legal Text

14. The system as recited in claim 13, wherein the first data chunk, the second data chunk, the third data chunk, and the fourth data chunk have a same predetermined data size.

Plain English Translation

This claim narrows the system of claim 13 by requiring that the first, second, third, and fourth data chunks all have the same predetermined data size. Both snapshots are therefore divided into fixed-size chunks, so that chunks from the two snapshots can be compared one for one by their hash values. The claim does not recite adjusting the chunk size dynamically; the size is predetermined.

Claim 15

Original Legal Text

15. The system as recited in claim 14, wherein verifying the absence of the hash collision comprises: identifying a data chunk pair comprising a first data chunk from the first lookup table and a second data chunk from the second lookup table having a same hash value; randomly selecting the complex hash function from a plurality of complex hash functions; hashing the first data chunk from the first lookup table and the second data chunk from the second lookup table with the complex hash function to generate a complex hash value; and verifying that the data chunk pair is identical based at least in part on the complex hash value.

Plain English Translation

This invention relates to a system for verifying the absence of hash collisions in data storage or processing applications. The problem addressed is ensuring data integrity when multiple data chunks produce the same hash value, which can lead to collisions and errors in systems relying on hash-based comparisons. The system includes a first lookup table and a second lookup table, each storing data chunks and their corresponding hash values. To verify the absence of hash collisions, the system identifies a data chunk pair where one chunk is from the first lookup table and the other is from the second lookup table, both sharing the same hash value. A complex hash function is then randomly selected from a pool of available complex hash functions. Both data chunks in the pair are hashed using this selected complex hash function, generating a new, more detailed hash value. The system then verifies whether the data chunks are identical by comparing this complex hash value. If the complex hash values match, the chunks are confirmed to be identical, ensuring no collision occurred. This method enhances data integrity by reducing the likelihood of false positives in hash-based comparisons.

Claim 16

Original Legal Text

16. The system as recited in claim 14, wherein the first hash function and the second hash function are randomly selected from a first group of hash functions.

Plain English Translation

This claim narrows the system of claim 14 by specifying that the first hash function and the second hash function are randomly selected from a first group of hash functions. Different chunks of the same snapshot can thus be hashed with different functions drawn from that group. As claim 13 requires, the same functions are then reused for the corresponding chunks of the second snapshot, so that the hash values in the two lookup tables remain comparable.

Claim 17

Original Legal Text

17. The system as recited in claim 16, wherein the first group of hash functions use fewer bits to store hash values than the complex hash functions of the plurality of complex hash functions.

Plain English Translation

This claim narrows the system of claim 16 by requiring that the first group of hash functions use fewer bits to store hash values than the complex hash functions. The shorter hash values are what the lookup tables hold for every chunk, which keeps the tables compact. The longer complex hash values are needed only to rule out a collision when two short hash values are identical. The claim does not recite switching between the two groups based on workload; each group has a fixed role.

Claim 18

Original Legal Text

18. The system as recited in claim 13, wherein creating the second archive comprises: creating an upload package comprising the one or more dissimilar data chunks, the creating comprising: compressing and encrypting the one or more dissimilar data chunks; encrypting the first data chunk; transmitting the upload package to the datastore; and updating a datastore index comprising a datastore location of the one or more dissimilar data chunks.

Plain English Translation

This claim describes how the system of claim 13 creates the second archive. The dissimilar data chunks are the chunks whose hash values differ between the first and second lookup tables, that is, the data that changed between the two snapshots. The system builds an upload package containing these chunks, compressing and encrypting them in the process. The claim also recites encrypting the first data chunk. The upload package is then transmitted to the datastore, and the system updates a datastore index that records where the dissimilar data chunks are located on the datastore so they can be found later. Because only the changed chunks are packaged and sent, the second archive avoids storing data that is already held from the first snapshot, while compression reduces the size of the upload and encryption protects its contents.
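The packaging steps above can be sketched roughly as follows. This is a simplified illustration under several assumptions: the datastore and index are plain in-memory dictionaries, the location format `blob/N` is invented, and `xor_placeholder` is a stand-in for a real cipher that provides no security. The separate step of encrypting the first data chunk is not modeled.

```python
import zlib

def xor_placeholder(data, key=b"demo-key"):
    """PLACEHOLDER ONLY: reversible XOR, not real encryption. Swap in a real cipher."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def create_upload_package(dissimilar_chunks, encrypt):
    """Compress, then encrypt, each dissimilar chunk (keyed by a hypothetical chunk id)."""
    return {cid: encrypt(zlib.compress(chunk)) for cid, chunk in dissimilar_chunks.items()}

def transmit_and_index(package, datastore, index):
    """Store each packaged chunk and record its datastore location in the index."""
    for cid, blob in package.items():
        location = f"blob/{len(datastore)}"   # invented location scheme
        datastore[location] = blob
        index[cid] = location

dissimilar = {7: b"x" * 64}                   # chunk 7 changed between snapshots
datastore, index = {}, {}
package = create_upload_package(dissimilar, xor_placeholder)
transmit_and_index(package, datastore, index)

# Reading back: undo the placeholder "encryption", then decompress.
restored = zlib.decompress(xor_placeholder(datastore[index[7]]))
```

Compression runs before encryption in this sketch because encrypted data generally compresses poorly; the claim lists both operations without fixing their order.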

Claim 19

Original Legal Text

19. The system as recited in claim 13, wherein the computer-executable instructions are further executable by the one or more processors to, after verifying that the hash values are identical, write a pointer on the second lookup table indicative of a data location on the datastore for the data chunk pair.

Plain English Translation

In the system of claim 13, hash records in the first lookup table (from the first snapshot) are compared with the corresponding records in the second lookup table (from the second snapshot). When the system verifies that the hash values of a data chunk pair are identical, it writes a pointer into the second lookup table that indicates where the data for that chunk pair is located on the datastore. The second archive can then refer to data that is already stored rather than holding another copy of it, which is the deduplication benefit: unchanged chunks cost only a pointer, and only chunks with differing hash values need to be uploaded.
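A minimal sketch of the pointer-writing step, assuming each lookup table is a list of dictionaries aligned by chunk position. The field names `hash`, `location`, and `pointer` are hypothetical and chosen only for this example.

```python
def link_unchanged_chunks(first_table, second_table):
    """For each chunk pair with identical hashes, point the second record at stored data."""
    for old, new in zip(first_table, second_table):
        if old["hash"] == new["hash"]:
            # Reuse the location of the chunk already held on the datastore.
            new["pointer"] = old["location"]
    return second_table

first_table = [
    {"hash": "a1", "location": "blob/0"},
    {"hash": "b2", "location": "blob/1"},
]
second_table = [
    {"hash": "a1"},   # unchanged chunk: receives a pointer
    {"hash": "ff"},   # changed chunk: no pointer, must be uploaded
]
link_unchanged_chunks(first_table, second_table)
```

Records that end up without a `pointer` field correspond to the dissimilar chunks that still have to be packaged and sent to the datastore.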

Claim 20

Original Legal Text

20. The system as recited in claim 13, wherein a number of hash functions in the plurality of hash functions is modifiable by a user.

Plain English Translation

In the system of claim 13, the hash function applied to each data chunk is selected from a plurality of hash functions. This claim adds that a user can modify how many hash functions that plurality contains. A larger pool means that a given pair of chunks is less likely to be hashed with the same function, which is the mechanism the patent relies on to reduce hash collisions across a large deduplicated data set. A smaller pool is simpler to manage and may lower computational overhead. Making the number user-configurable lets an operator tune this trade-off for a particular workload without redesigning the system.
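One way a user-modifiable pool size could look is sketched below. The `HashPool` class, its method names, and the list of available functions are all assumptions for illustration; the claim only says that the number of hash functions is modifiable by a user.

```python
import random

# Functions the hypothetical system knows about, in a fixed order.
AVAILABLE = ("sha1", "sha224", "sha256", "sha384",
             "sha512", "sha3_256", "blake2b", "blake2s")

class HashPool:
    """Pool of hash function names whose size the user can change."""

    def __init__(self, size):
        self.resize(size)

    def resize(self, size):
        """Set how many hash functions are in use; reject sizes outside 1..len(AVAILABLE)."""
        if not 1 <= size <= len(AVAILABLE):
            raise ValueError(f"size must be between 1 and {len(AVAILABLE)}")
        self.names = AVAILABLE[:size]

    def pick(self, rng=random):
        """Select one function name at random from the current pool."""
        return rng.choice(self.names)

pool = HashPool(2)          # start with two functions
pool.resize(5)              # user raises the count to five
chosen = pool.pick(random.Random(1))
```

Names from the pool can be passed to `hashlib.new(name, chunk)` to compute the hash for a chunk.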

Patent Metadata

Filing Date

Unknown

Publication Date

September 1, 2020

Inventors

Ravishankar Bhagavandas
Aaron B. Fernandes
Zehua Wang
Jiabin Li
Huihui Li

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “REDUCING HASH COLLISIONS IN LARGE SCALE DATA DEDUPLICATION” (10762051). https://patentable.app/patents/10762051

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/10762051. See llms.txt for full attribution policy.