Secure Hashing of Large Data Files to Verify File Identity

Published: July 29, 2025

Assignee: not available in USPTO data we have

Technical Abstract

Patent Claims

20 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer-implemented method comprising: determining, by one or more processors, a size of a particular file received by an endpoint device; searching for a record indexed in a data structure by searching for a size value index of the record that corresponds to the size of the particular file, the data structure stored on the endpoint device; and in response to finding the record indexed in the data structure: accessing a sequence of records in the data structure, the sequence including multiple records starting from the record, and for each respective record of the multiple records: hashing a respective particular data portion of the particular file that has a location in the particular file that corresponds to a location parameter in the respective record to obtain a respective particular hash result; and determining whether the respective particular hash result matches a corresponding respective previous hash result stored in the respective record, wherein the respective previous hash result is based on a respective associated data portion in an associated file at a location in the associated file that is indicated by the location parameter; and in response to determining that the respective particular hash results match the corresponding respective previous hash results of the multiple records, determining that the particular file is the same as the associated file, obtaining file information previously determined for the associated file, and determining one or more characteristics of the particular file at the endpoint device using the file information.

2. The method of claim 1, further comprising: in response to determining that a particular hash result in the sequence is different than the corresponding previous hash result, searching for another record indexed in the data structure by searching for a size value index of the other record that corresponds to the size of the particular file or based on a matched hash result in a previous record of the sequence.

3. The method of claim 1, wherein each subsequent record in the sequence after the record is indexed based on a respective previous hash result in a previous record of the sequence.

4. The method of claim 1 wherein hashing the respective particular data portion includes storing the respective particular hash result in a cache that is available for later hashes of the respective particular data portion.

5. The method of claim 1, wherein the location parameter in the respective record indicates the location in the associated file that is randomly determined by the endpoint device when the record was created and the associated file was obtained by the endpoint device, such that the sequence of records stores hash results obtained from randomly-located data portions within the associated file.

6. The method of claim 1, wherein a total size of the respective particular data portions is less than the size of the particular file such that an amount of data in the particular file that is hashed is less than all data in the particular file.

7. The method of claim 1, wherein the sequence includes a predetermined number of records, wherein the predetermined number is configurable by the endpoint device.

8. The method of claim 1, wherein the sequence is based on a predetermined order of the multiple records in the data structure.

9. The method of claim 1, wherein the sequence is based on accessing a randomly-determined order or arbitrary order of the multiple records in the data structure.

10. The method of claim 1, further comprising combining the respective particular hash results to create a file reference for the particular file for use by processes executing on the endpoint device.

11. The method of claim 1, wherein determining the one or more characteristics of the particular file at the endpoint device includes determining a security status of the particular file that indicates whether the particular file is at least one of: malicious, potentially malicious, or benign.

12. The method of claim 1, further comprising, in response to not finding any record indexed in the data structure based on the size of the particular file or not finding any record in the data structure in which the respective particular hash result matches a corresponding respective previous hash result, applying a threat detection process to the particular file.

13. The method of claim 1, further comprising: in response to not finding any record indexed in the data structure by searching for the size value index or not finding any record in the data structure in which the respective particular hash result matches a corresponding respective previous hash result: creating a new record in the data structure, the new record being indexed with the size value index that corresponds to the size of the particular file when the new record is an initial record in a new sequence, or indexed based on the respective previous hash result in a previous record in the sequence previous to the new record when the new record is subsequent to the previous record in the sequence; determining a new location of a new data portion in the particular file; hashing the new data portion to obtain a new hash result; and storing the new location and the new hash result in the new record.

14. The method of claim 13, further comprising: creating a predetermined number of additional records in the sequence at successive hierarchical levels of the data structure after the new record; determining, for each additional record, a different respective additional location of a respective additional data portion in the particular file; hashing, for each additional record, the respective additional data portion to obtain a respective additional hash result; and storing, in each additional record, the respective additional location and the respective additional hash result.

15. The method of claim 1, wherein the data structure is encrypted for use only on the endpoint device.

16. The method of claim 1, wherein each particular data portion has a size, and further comprising determining the size of the respective particular data portions based on one or more characteristics of the endpoint device that stores the data structure.

17. The method of claim 1, wherein searching for the record is performed in response to the size of the particular file being greater than a threshold file size, and further comprising: in response to the size of the particular file being less than the threshold file size, hashing all data of the particular file to obtain a single hash result and comparing the single hash result to a single previous hash result for the associated file to determine if the particular file is the same as the associated file.

18. A device comprising: one or more hardware processors; and a memory coupled to the one or more hardware processors, with instructions stored thereon, that when executed by the one or more hardware processors, cause the one or more hardware processors to perform operations comprising: determining a size of a particular file received by the device via a communication network; searching for a record indexed in a data structure by searching for a size value index of the record that corresponds to the size of the particular file, the data structure stored on the device; and in response to finding the record indexed in the data structure: accessing a sequence of records in the data structure, the sequence including multiple records starting from the record, and for each respective record of the multiple records: hashing a respective particular data portion of the particular file that has a location in the particular file that corresponds to a location parameter in the respective record to obtain a respective particular hash result; and determining whether the respective particular hash result matches a corresponding respective previous hash result stored in the respective record, wherein the respective previous hash result is based on a respective associated data portion in an associated file at a location in the associated file that is indicated by the location parameter; and in response to determining that the respective particular hash results match the corresponding respective previous hash results of the multiple records, determining that the particular file is the same as the associated file, obtaining file information previously determined for the associated file, and determining one or more characteristics of the particular file at the device using the file information.

19. The device of claim 18, wherein the execution of the instructions cause the one or more hardware processors to perform operations further comprising: in response to determining that the respective particular hash result is different than the corresponding previous hash result, searching for another record indexed in the data structure by searching for a size value index of the other record that corresponds to the size of the particular file or based on a matched hash result in a previous record of the sequence; and in response to not finding any record indexed in the data structure by searching for a size value index of the other record that corresponds to the size of the particular file or not finding any record in the data structure in which the respective particular hash result matches a corresponding respective previous hash result, applying a threat detection process to the particular file.

20. A non-transitory computer-readable medium with instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform operations comprising: determining a size of a particular file received by an endpoint device coupled to a communication network; searching for a record indexed in a data structure by searching for a size value index of the record that corresponds to the size of the particular file, the data structure stored on the endpoint device; and in response to finding the record indexed in the data structure: accessing a sequence of records in the data structure, the sequence including multiple records starting from the record, and for each respective record of the multiple records: hashing a respective particular data portion of the particular file that has a location in the particular file that corresponds to a location parameter in the respective record to obtain a respective particular hash result; and determining whether the respective particular hash result matches a corresponding respective previous hash result stored in the respective record, wherein the respective previous hash result is based on a respective associated data portion in an associated file at a location in the associated file that is indicated by the location parameter; and in response to determining that the respective particular hash results match the corresponding respective previous hash results of the multiple records, determining that the particular file is the same as the associated file, obtaining file information previously determined for the associated file, and determining one or more characteristics of the particular file at the endpoint device using the file information; and in response to not finding any record indexed in the data structure based on the size of the particular file or not finding any record in the data structure in which the respective particular hash result matches a corresponding respective previous hash result, applying a threat detection process to the particular file.
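
Illustrative sketch (not part of the claims as filed). The flow recited in claims 1, 2, 4, 5, 13 and 14 might look roughly like the Python below. It is a minimal sketch under stated assumptions, not the patented implementation: the names `Record`, `register`, `lookup`, `hash_portion`, `PORTION_SIZE` and `SEQUENCE_LENGTH` are hypothetical, SHA-256 is an assumed hash function (the claims name none), and the data structure is modeled as an in-memory dictionary keyed by file size whose records form a tree, so that each later record is reached through the record before it.

```python
import hashlib
import os
import random
from dataclasses import dataclass, field
from typing import Optional

PORTION_SIZE = 4096   # assumed number of bytes hashed per record
SEQUENCE_LENGTH = 3   # assumed number of records per sequence (claim 7)


@dataclass
class Record:
    offset: int                                   # location parameter
    digest: str                                   # previous hash result
    children: list = field(default_factory=list)  # next records in sequences
    file_info: Optional[dict] = None              # set on the last record only


def hash_portion(path, offset, length=PORTION_SIZE):
    """Hash only the data portion that starts at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(length)).hexdigest()


def register(index, path, file_info, rng=random):
    """Create a new sequence of records at random locations (claims 5, 13, 14)."""
    size = os.path.getsize(path)
    siblings = index.setdefault(size, [])  # first record is indexed by size
    record = None
    for _ in range(SEQUENCE_LENGTH):
        offset = rng.randrange(max(1, size - PORTION_SIZE + 1))
        record = Record(offset, hash_portion(path, offset))
        siblings.append(record)
        siblings = record.children         # next level hangs off this record
    record.file_info = file_info


def _match(path, record, cache):
    """Walk one sequence; return file_info if every record's hash matches."""
    if record.offset not in cache:         # reuse earlier hashes (claim 4)
        cache[record.offset] = hash_portion(path, record.offset)
    if cache[record.offset] != record.digest:
        return None
    if not record.children:
        return record.file_info
    for child in record.children:
        info = _match(path, child, cache)
        if info is not None:
            return info
    return None


def lookup(index, path):
    """Return stored file_info for a matching file, else None (claims 1, 2)."""
    cache = {}
    for root in index.get(os.path.getsize(path), []):
        info = _match(path, root, cache)
        if info is not None:
            return info
    return None  # caller would fall back to threat detection (claim 12)
```

In this sketch, a caller would run `lookup(index, path)` on a newly received file and, on `None`, apply a full scan and then call `register(index, path, {"status": ...})` to record the result. The small-file path of claim 17 (hashing the whole file below a size threshold) and the encryption of claim 15 are left out for brevity. Because only a few portions are hashed, two files of equal size that differ outside the sampled locations would be treated as the same file, which is the trade-off the random locations are meant to make hard to exploit.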

Patent Metadata

Filing Date

Unknown

Publication Date

July 29, 2025

Inventors

James Christopher Carpenter
