Legal claims defining the scope of protection, as filed with the USPTO.
1. A method for generating a fingerprint for a media file, the method executed by at least one computer system and comprising: creating by the at least one computer system, a plurality of randomized versions of the media file, each randomized version of the media file altered to a random extent with respect to an attribute of the media file; generating, in a memory of the at least one computer system, a histogram from the plurality of randomized versions of the media file, the histogram having a plurality of bins each associated with a different feature of the media file, and each bin storing a count of the randomized versions of the media file that have the feature associated with the bin, wherein generating the histogram comprises: creating the plurality of bins in the memory of the at least one computer system, wherein each of the plurality of bins is associated with (1) a particular hash function included in a set of hash functions; and (2) a particular output value for the particular hash function given an input comprising one of the randomized version of the media file; applying the set of hash functions to each randomized version of the media file; and determining the count of the randomized media versions of the media file for each defined bin, each count indicating a number of randomized versions that resulted in the particular output value from the particular hash function; generating the fingerprint for the media file based on the histogram; and storing the fingerprint to a computer-readable storage medium.
2. The method of claim 1, wherein the randomized attribute comprises an amount of media data to be cropped from the media file.
3. The method of claim 1, further comprising: creating a number of altered versions of each bin; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; and storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function.
4. The method of claim 3, wherein the number of altered versions of a bin is based on the count determined for the bin.
5. The method of claim 3, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.
6. The method of claim 3, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.
7. The method of claim 6, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.
8. The method of claim 6, further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.
9. The method of claim 8, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.
10. A non-transitory computer-readable storage medium storing computer-executable code, the computer-executable code when executed by a processor causing the processor to perform a process for generating a fingerprint for a media file, the process comprising: creating a plurality of randomized versions of the media file, each randomized version of the media file altered to a random extent with respect to an attribute of the media file; generating a histogram from the plurality of randomized versions of the media file, the histogram having a plurality of bins each associated with a different feature of the media file, and each bin storing a count of the randomized versions of the media file that have the feature associated with the bin, wherein generating the histogram comprises: creating the plurality of bins in the memory of the at least one computer system, wherein each of the plurality of bins is associated with (1) a particular hash function included in a set of hash functions; and (2) a particular output value for the particular hash function given an input comprising one of the randomized version of the media file; applying the set of hash functions to each randomized version of the media file; and determining the count of the randomized media versions of the media file for each defined bin, each count indicating a number of randomized versions that resulted in the particular output value from the particular hash function; generating the fingerprint for the media file based on the histogram; and storing the fingerprint.
11. The computer-readable storage medium of claim 10, wherein the randomized attribute comprises an amount of media data to be cropped from the media file.
12. The computer-readable storage medium of claim 10, wherein the process further comprises: creating a number of altered versions of each bin; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; and storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function.
13. The computer-readable storage medium of claim 12, wherein the number of altered versions of a bin is based on the count determined for the bin.
14. The computer-readable storage medium of claim 12, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.
15. The computer-readable storage medium of claim 12, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.
16. The computer-readable storage medium of claim 15, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.
17. The computer-readable storage medium of claim 15, the process further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.
18. The computer-readable storage medium of claim 17, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.
19. A method for generating a fingerprint for a media file, the method executed by at least one computer system and comprising: determining a plurality of feature descriptors for the media file, the feature descriptors each associated with a position within the media file; coarsely encoding the position of the determined feature descriptors within the media file to determine a segment associated with each of the feature descriptors, the segment comprising a range of possible positions; generating, in a memory of the computer system, a histogram for the media file, the histogram having a plurality of bins each associated with a different segment of the media file, and each bin storing a count of the feature descriptors associated with the segment; creating a number of altered versions of each bin in the histogram; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function; generating the fingerprint for the media file based on the stored first data element; histogram; and storing the fingerprint to a computer-readable storage medium.
20. The method of claim 19, wherein the number of altered versions of a bin is based on the count determined for the bin.
21. The method of claim 19, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.
22. The method of claim 19, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.
23. The method of claim 22, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.
24. The method of claim 22, further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.
25. The method of claim 24, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.
26. A non-transitory computer-readable storage medium storing computer-executable code, the computer-executable code when executed by a processor causing the processor to perform a process generating a fingerprint for a media file, the process comprising: determining a plurality of feature descriptors for the media file, the feature descriptors each associated with a position within the media file; coarsely encoding the position of the determined feature descriptors within the media to determine a segment associated with each of the feature descriptors, the segment comprising a range of possible positions; generating, in a memory of the computer system, a histogram for the media file, the histogram having a plurality of bins each associated with a different segment of the media file, and each bin storing a count of the feature descriptors associated with the segment; creating a number of altered versions of each bin in the histogram; applying a first hash function to the number of altered versions of each bin to generate a plurality of outputs for the first hash function; determining a smallest output for the first hash function; storing a first data element representative of the altered version that yielded the determined smallest output for the first hash function; generating the fingerprint for the media file based on the histogram; and storing the fingerprint.
27. The computer-readable storage medium of claim 26, wherein the number of altered versions of a bin is based on the count determined for the bin.
28. The computer-readable storage medium of claim 26, wherein the number of altered versions of a bin is based on a global weight associated with a set of features specified by the bin.
29. The computer-readable storage medium of claim 26, further comprising: applying a second hash function to the number of altered versions of each bin to generate a plurality of outputs for the second hash function; determining a smallest output for the second hash function; and storing a second data element representative of the altered version that yielded the determined smallest output for the second hash function.
30. The computer-readable storage medium of claim 29, wherein the first hash function and the second hash function are part of a family of hash functions, the first hash function having a first seed value and the second hash function having a second seed value.
31. The computer-readable storage medium of claim 29, the process further comprising: assigning the media file to a cluster of media files based on an output vector comprising at least the first and second data elements.
32. The computer-readable storage medium of claim 31, wherein assigning the media file to a cluster of media files based on the output vector comprises: calculating a number of matching entries for the output vector for the media file and a second output vector for a second media file.
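For illustration only, the steps recited in claims 1 through 9 can be sketched in Python. This is a minimal toy under stated assumptions, not the patented implementation and not an interpretation of claim scope. Every name here (seeded_hash, randomized_versions, feature_hash, build_histogram, min_hash_element, fingerprint, matching_entries) is hypothetical, as are the choice of BLAKE2b as the seeded hash family, trailing-byte cropping as the randomized attribute, a 4-byte sliding window as the "feature", and all default parameter values.

```python
import hashlib
import random
from collections import Counter


def seeded_hash(seed, data):
    """One member of a seeded hash family (assumed: BLAKE2b with a seed salt)."""
    salt = seed.to_bytes(8, "big")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def randomized_versions(media, n, rng):
    """Crop a random number of trailing bytes from the media (cf. claim 2)."""
    max_crop = max(1, len(media) // 4)
    return [media[: len(media) - rng.randrange(max_crop)] for _ in range(n)]


def feature_hash(index, version, buckets=16):
    """Toy feature hash: smallest seeded hash over 4-byte windows, bucketed."""
    windows = (version[j:j + 4] for j in range(max(1, len(version) - 3)))
    return min(seeded_hash(index, w) for w in windows) % buckets


def build_histogram(versions, num_hashes):
    """Each bin is (hash function index, output value); the count is the
    number of randomized versions that produced that output."""
    hist = Counter()
    for version in versions:
        for index in range(num_hashes):
            hist[(index, feature_hash(index, version))] += 1
    return hist


def min_hash_element(hist, seed):
    """Expand each bin into `count` altered versions (cf. claim 4), hash them
    all, and return the altered version with the smallest output (claim 3)."""
    altered = ((bin_key, k) for bin_key, count in hist.items() for k in range(count))
    return min(altered, key=lambda a: seeded_hash(seed, repr(a).encode()))


def fingerprint(media, num_versions=20, num_hashes=4, num_elements=8, rng_seed=0):
    """Output vector with one data element per seeded hash (cf. claims 6-8).
    Requires num_versions >= 1 so that the histogram is non-empty."""
    rng = random.Random(rng_seed)
    hist = build_histogram(randomized_versions(media, num_versions, rng), num_hashes)
    return tuple(min_hash_element(hist, seed) for seed in range(num_elements))


def matching_entries(fp_a, fp_b):
    """Number of positions where two output vectors agree (cf. claim 9)."""
    return sum(a == b for a, b in zip(fp_a, fp_b))
```

In this sketch, a bin with a larger count contributes more altered versions and is therefore more likely to yield the smallest hash output, which is one way to read the count-based weighting of claim 4. Calling matching_entries(fingerprint(a), fingerprint(b)) on two byte strings a and b gives a rough similarity score that a clustering step could threshold.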
October 16, 2012