8290918

Robust Hashing of Digital Media Data

Published: October 16, 2012
Assignee: not available in the USPTO data
Inventors: Sergey Ioffe
Patent Claims
32 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for generating a fingerprint for a media file, the method executed by at least one computer system and comprising: creating by the at least one computer system, a plurality of randomized versions of the media file, each randomized version of the media file altered to a random extent with respect to an attribute of the media file; generating, in a memory of the at least one computer system, a histogram from the plurality of randomized versions of the media file, the histogram having a plurality of bins each associated with a different feature of the media file, and each bin storing a count of the randomized versions of the media file that have the feature associated with the bin, wherein generating the histogram comprises: creating the plurality of bins in the memory of the at least one computer system, wherein each of the plurality of bins is associated with (1) a particular hash function included in a set of hash functions; and (2) a particular output value for the particular hash function given an input comprising one of the randomized version of the media file; applying the set of hash functions to each randomized version of the media file; and determining the count of the randomized media versions of the media file for each defined bin, each count indicating a number of randomized versions that resulted in the particular output value from the particular hash function; generating the fingerprint for the media file based on the histogram; and storing the fingerprint to a computer-readable storage medium.
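The steps of claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the media file is a byte string, that the randomized attribute is the amount of data cropped from the end, and that each "hash function" is a toy coarse feature (the quantized mean of one slice of the file). The names `randomized_versions`, `coarse_hash`, `build_histogram`, and `fingerprint` are hypothetical and do not come from the patent.

```python
import random
from collections import Counter


def randomized_versions(media: bytes, count: int, max_crop_fraction: float,
                        rng: random.Random) -> list:
    """Create `count` versions of `media`, each cropped by a random amount."""
    versions = []
    for _ in range(count):
        crop = rng.randint(0, int(len(media) * max_crop_fraction))
        versions.append(media[: len(media) - crop])
    return versions


def coarse_hash(index: int, version: bytes, num_hashes: int, levels: int = 8) -> int:
    """Toy stand-in for hash function `index`: quantized mean of the index-th slice."""
    n = len(version)
    start = index * n // num_hashes
    end = max(start + 1, (index + 1) * n // num_hashes)
    chunk = version[start:end]
    return (sum(chunk) // len(chunk)) * levels // 256


def build_histogram(versions: list, num_hashes: int) -> Counter:
    """Each bin is a (hash function index, output value) pair; its count is the
    number of randomized versions that produced that output."""
    histogram = Counter()
    for version in versions:
        for index in range(num_hashes):
            histogram[(index, coarse_hash(index, version, num_hashes))] += 1
    return histogram


def fingerprint(histogram: Counter) -> tuple:
    """Assumed fingerprint form: the sorted (bin, count) pairs of the histogram."""
    return tuple(sorted(histogram.items()))


if __name__ == "__main__":
    media = bytes(range(256)) * 4  # synthetic stand-in for a media file
    versions = randomized_versions(media, 20, 0.1, random.Random(0))
    print(fingerprint(build_histogram(versions, num_hashes=4)))
```

Because every version contributes one output per hash function, the counts in the histogram always sum to the number of versions times the number of hash functions.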

2. The method of claim 1, wherein the randomized attribute comprises an amount of media data to be cropped from the media file.

3. The method of claim 1, further comprising: creating a number of altered versions of each bin; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; and storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function.

4. The method of claim 3, wherein the number of altered versions of a bin is based on the count determined for the bin.
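One way to read claims 3 and 4 together is as a min-hash over bin copies: a bin with count c yields c altered versions, every altered version is hashed, and the one with the smallest output is stored. The sketch below assumes an altered version is simply the pair (bin, copy index) and uses keyed BLAKE2b from Python's `hashlib` as the hash function; both choices, and the names `hash_value` and `min_hash_element`, are illustrative assumptions rather than details from the patent.

```python
import hashlib


def hash_value(seed: int, item: tuple) -> int:
    """Assumed hash function: keyed BLAKE2b over the item's repr (seed must be >= 0)."""
    payload = repr(item).encode("utf-8")
    key = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(payload, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "big")


def min_hash_element(histogram: dict, seed: int):
    """Return the altered version (bin, copy_index) with the smallest hash output.

    Each bin with count c contributes c altered versions, so bins with larger
    counts are more likely to supply the minimum. Returns None if there are no bins.
    """
    best, best_value = None, None
    for bin_key, count in histogram.items():
        for copy_index in range(1, count + 1):
            altered = (bin_key, copy_index)
            value = hash_value(seed, altered)
            if best_value is None or value < best_value:
                best, best_value = altered, value
    return best


if __name__ == "__main__":
    histogram = {("h0", 3): 5, ("h1", 7): 2}  # hypothetical bins and counts
    print(min_hash_element(histogram, seed=1))
```

Two histograms that share most of their altered versions will often share the minimizing element, which is what makes the stored data element useful for comparison.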

5. The method of claim 3, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.

6. The method of claim 3, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.

7. The method of claim 6, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.
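Claims 6 and 7 repeat the smallest-output selection with a second hash function drawn from the same family under a different seed. A hedged sketch of that idea: the family is assumed to be keyed BLAKE2b with the seed as the key, an altered version is assumed to be a (bin, copy index) pair, and `seeded_hash` and `output_vector` are hypothetical names.

```python
import hashlib


def seeded_hash(seed: int, item: tuple) -> int:
    """One member of an assumed hash family, selected by a non-negative seed."""
    payload = repr(item).encode("utf-8")
    digest = hashlib.blake2b(payload, digest_size=8, key=seed.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def output_vector(histogram: dict, seeds: list) -> list:
    """For each seed, store the altered version that gives the smallest hash output."""
    altered = [(bin_key, k)
               for bin_key, count in histogram.items()
               for k in range(1, count + 1)]
    return [min(altered, key=lambda a: seeded_hash(seed, a)) for seed in seeds]


if __name__ == "__main__":
    histogram = {("h0", 3): 2, ("h1", 5): 4}  # hypothetical bins and counts
    # First and second data elements come from seeds 1 and 2 respectively.
    print(output_vector(histogram, seeds=[1, 2]))
```

Each seed yields one data element, so the length of the output vector equals the number of seeds used.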

8. The method of claim 6, further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.

9. The method of claim 8, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.
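The comparison in claim 9 amounts to counting the positions at which two output vectors agree. The sketch below adds a threshold-based cluster decision; the claim does not state how the count is used, so the `min_matches` threshold and the function names are assumptions made for illustration.

```python
def matching_entries(vec_a: list, vec_b: list) -> int:
    """Count positions where the two output vectors hold the same data element."""
    if len(vec_a) != len(vec_b):
        raise ValueError("output vectors must have the same length")
    return sum(1 for a, b in zip(vec_a, vec_b) if a == b)


def same_cluster(vec_a: list, vec_b: list, min_matches: int) -> bool:
    """Assumed rule: two media files share a cluster if enough entries match."""
    return matching_entries(vec_a, vec_b) >= min_matches


if __name__ == "__main__":
    first = [("h0", 1), ("h1", 3), ("h0", 2)]
    second = [("h0", 1), ("h2", 1), ("h0", 2)]
    print(matching_entries(first, second), same_cluster(first, second, min_matches=2))
```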

10. A non-transitory computer-readable storage medium storing computer-executable code, the computer-executable code when executed by a processor causing the processor to perform a process for generating a fingerprint for a media file, the process comprising: creating a plurality of randomized versions of the media file, each randomized version of the media file altered to a random extent with respect to an attribute of the media file; generating a histogram from the plurality of randomized versions of the media file, the histogram having a plurality of bins each associated with a different feature of the media file, and each bin storing a count of the randomized versions of the media file that have the feature associated with the bin, wherein generating the histogram comprises: creating the plurality of bins in the memory of the at least one computer system, wherein each of the plurality of bins is associated with (1) a particular hash function included in a set of hash functions; and (2) a particular output value for the particular hash function given an input comprising one of the randomized version of the media file; applying the set of hash functions to each randomized version of the media file; and determining the count of the randomized media versions of the media file for each defined bin, each count indicating a number of randomized versions that resulted in the particular output value from the particular hash function; generating the fingerprint for the media file based on the histogram; and storing the fingerprint.

11. The computer-readable storage medium of claim 10, wherein the randomized attribute comprises an amount of media data to be cropped from the media file.

12. The computer-readable storage medium of claim 10, wherein the process further comprises: creating a number of altered versions of each bin; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; and storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function.

13. The computer-readable storage medium of claim 12, wherein the number of altered versions of a bin is based on the count determined for the bin.

14. The computer-readable storage medium of claim 12, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.

15. The computer-readable storage medium of claim 12, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.

16. The computer-readable storage medium of claim 15, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.

17. The computer-readable storage medium of claim 15, the process further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.

18. The computer-readable storage medium of claim 17, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.

19. A method for generating a fingerprint for a media file, the method executed by at least one computer system and comprising: determining a plurality of feature descriptors for the media file, the feature descriptors each associated with a position within the media file; coarsely encoding the position of the determined feature descriptors within the media file to determine a segment associated with each of the feature descriptors, the segment comprising a range of possible positions; generating, in a memory of the computer system, a histogram for the media file, the histogram having a plurality of bins each associated with a different segment of the media file, and each bin storing a count of the feature descriptors associated with the segment; creating a number of altered versions of each bin in the histogram; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function; generating the fingerprint for the media file based on the stored first data element; histogram; and storing the fingerprint to a computer-readable storage medium.
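The coarse position encoding and histogram steps of claim 19 can be illustrated as below. The sketch assumes positions are scalar offsets (for example, seconds into an audio or video file) and that segments are equal-width ranges; the claim only requires that a segment comprise a range of possible positions, so equal widths and the names `segment_of` and `segment_histogram` are assumptions.

```python
from collections import Counter


def segment_of(position: float, media_length: float, num_segments: int) -> int:
    """Coarsely encode a position as the index of the equal-width segment containing it."""
    if not 0 <= position <= media_length:
        raise ValueError("position lies outside the media file")
    index = int(position / media_length * num_segments)
    return min(index, num_segments - 1)  # the end position falls in the last segment


def segment_histogram(positions: list, media_length: float, num_segments: int) -> Counter:
    """One bin per segment; each count is the number of feature descriptors in it."""
    return Counter(segment_of(p, media_length, num_segments) for p in positions)


if __name__ == "__main__":
    # Hypothetical descriptor positions in a 10-second clip, split into 5 segments.
    print(segment_histogram([0.5, 1.0, 4.2, 9.9, 10.0], media_length=10.0, num_segments=5))
```

Small shifts in a descriptor's position usually leave its segment unchanged, so the resulting histogram is less sensitive to minor edits than exact positions would be. The histogram can then be fed to the same smallest-output selection described in the remaining steps of the claim.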

20. The method of claim 19, wherein the number of altered versions of a bin is based on the count determined for the bin.

21. The method of claim 19, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.

22. The method of claim 19, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.

23. The method of claim 22, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.

24. The method of claim 22, further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.

25. The method of claim 24, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.

26. A non-transitory computer-readable storage medium storing computer-executable code, the computer-executable code when executed by a processor causing the processor to perform a process generating a fingerprint for a media file, the process comprising: determining a plurality of feature descriptors for the media file, the feature descriptors each associated with a position within the media file; coarsely encoding the position of the determined feature descriptors within the media to determine a segment associated with each of the feature descriptors, the segment comprising a range of possible positions; generating, in a memory of the computer system, a histogram for the media file, the histogram having a plurality of bins each associated with a different segment of the media file, and each bin storing a count of the feature descriptors associated with the segment; creating a number of altered versions of each bin in the histogram; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function; generating the fingerprint for the media file based on the histogram; and storing the fingerprint.

27. The computer-readable storage medium of claim 26, wherein the number of altered versions of a bin is based on the count determined for the bin.

28. The computer-readable storage medium of claim 26, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.

29. The computer-readable storage medium of claim 26, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.

30. The computer-readable storage medium of claim 29, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.

31. The computer-readable storage medium of claim 29, the process further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.

32. The computer-readable storage medium of claim 31, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “ROBUST HASHING OF DIGITAL MEDIA DATA” (8290918). https://patentable.app/patents/8290918
