7328153

Automatic Identification of Sound Recordings

Published: February 5, 2008
Assignee: not available in the USPTO data we have

Patent Claims
63 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1

1. A method of identifying digital recordings, comprising: extracting at least one candidate fingerprint from at least one portion of an unidentified recording, each candidate fingerprint including a predetermined number of candidate values for corresponding frequency ranges and each reference fingerprint including the predetermined number of reference values for the corresponding frequency ranges; and searching for a match between at least one candidate value derived from the at least one candidate fingerprint and at least one reference value in at least one reference fingerprint among a plurality of reference fingerprints, by determining whether each candidate fingerprint matches one of the reference fingerprints based on selectively weighted differences between corresponding candidate and reference values for different frequency ranges.

2

2. A method as recited in claim 1 , wherein said searching comprises computing at least one weighted absolute difference between the at least one candidate fingerprint and the at least one reference fingerprint using a weight based on a value derived from the at least one candidate fingerprint.
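
The weighted comparison in claims 1 and 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the specific weight formula `1 / (1 + |c|)`, and the acceptance threshold are all assumptions chosen only to show a weight derived from the candidate value being applied to an absolute difference per frequency range.

```python
from typing import Dict, Optional, Sequence


def weighted_abs_distance(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of per-range absolute differences, each scaled by a candidate-derived weight."""
    if len(candidate) != len(reference):
        raise ValueError("fingerprints must have the same number of values")
    total = 0.0
    for c, r in zip(candidate, reference):
        # Hypothetical weight: larger candidate values contribute less per unit of difference.
        weight = 1.0 / (1.0 + abs(c))
        total += weight * abs(c - r)
    return total


def find_match(
    candidate: Sequence[float],
    references: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the id of the closest reference whose distance is below the threshold."""
    best_id: Optional[str] = None
    best_distance = threshold
    for ref_id, ref in references.items():
        distance = weighted_abs_distance(candidate, ref)
        if distance < best_distance:
            best_id, best_distance = ref_id, distance
    return best_id
```

A call such as `find_match(fp, refs, threshold=0.5)` returns `None` when no reference is close enough, which corresponds to the recording staying unidentified.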

3

3. A method as recited in claim 1 , further comprising prior to said extracting, expanding dynamic range of the at least one portion of the unidentified recording.

4

4. A method as recited in claim 3 , wherein said expanding of the dynamic range makes all sample values within the at least one portion of an unidentified recording more equally likely.
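
One common way to make sample values "more equally likely" is histogram equalization. The claim does not name a technique, so the sketch below is an assumption: it remaps integer samples through their empirical cumulative distribution, and `equalize_samples` and `levels` are illustrative names.

```python
from collections import Counter
from typing import Dict, List, Sequence


def equalize_samples(samples: Sequence[int], levels: int = 65536) -> List[int]:
    """Remap samples through their empirical CDF so outputs spread over [0, levels - 1]."""
    if not samples:
        return []
    counts = Counter(samples)
    n = len(samples)
    mapping: Dict[int, int] = {}
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        # The fraction of samples at or below this value sets the output level.
        mapping[value] = round((cumulative / n) * (levels - 1))
    return [mapping[s] for s in samples]
```

The mapping is monotonic, so the ordering of sample values is preserved while tightly clustered values are pushed apart.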

5

5. A method as recited in claim 1 , further comprising: storing in a cache memory matched candidate fingerprints with identifiers of corresponding reference fingerprints; and determining whether a new candidate fingerprint is included in the matched candidate fingerprints in the cache memory prior to said searching using the new candidate fingerprint.

6

6. A method as recited in claim 5 , further comprising: indicating a match between the new candidate fingerprint and a corresponding reference fingerprint when the new candidate fingerprint is included in the matched candidate fingerprints in the cache memory; and adding the new candidate fingerprint to the cache memory and associating a corresponding identifier for the corresponding reference fingerprint with new candidate fingerprint in the cache memory.
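
The cache check in claims 5 and 6 might look like the sketch below. It assumes an exact-match lookup keyed on the fingerprint values, which the claims do not require; `FingerprintCache`, `identify`, and the `search` callback are hypothetical names.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


class FingerprintCache:
    """Maps previously matched candidate fingerprints to reference identifiers."""

    def __init__(self) -> None:
        self._matched: Dict[Tuple[float, ...], str] = {}

    def lookup(self, fingerprint: Sequence[float]) -> Optional[str]:
        return self._matched.get(tuple(fingerprint))

    def add(self, fingerprint: Sequence[float], ref_id: str) -> None:
        self._matched[tuple(fingerprint)] = ref_id


def identify(
    candidate: Sequence[float],
    cache: FingerprintCache,
    search: Callable[[Sequence[float]], Optional[str]],
) -> Optional[str]:
    """Consult the cache first; fall back to the full search and remember any hit."""
    cached = cache.lookup(candidate)
    if cached is not None:
        return cached
    ref_id = search(candidate)
    if ref_id is not None:
        cache.add(candidate, ref_id)
    return ref_id
```

Repeated submissions of the same fingerprint then skip the full reference search entirely.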

7

7. A method as recited in claim 1 , further comprising generating each of the candidate and reference fingerprints to include values representing a magnitude of power at frequencies in frequency ranges with mid-range frequencies weighted less than high- and low-range frequencies.

8

8. A method as recited in claim 1 , wherein generation of each of the candidate and reference fingerprints comprises: computing power in each of a plurality of frequency bands; and normalizing the power for each frequency within each band so that a mean of the power within each band is equal to a predetermined value.
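
The normalization step in claim 8 amounts to rescaling each band so that its mean power equals a fixed target. A minimal sketch, assuming the band is given as a list of per-frequency power values and using the illustrative names `normalize_band` and `target_mean`:

```python
from typing import List, Sequence


def normalize_band(powers: Sequence[float], target_mean: float = 1.0) -> List[float]:
    """Scale the power values in one band so that their mean equals target_mean."""
    if not powers:
        return []
    mean = sum(powers) / len(powers)
    if mean == 0:
        # A silent band cannot be rescaled; leave it at zero.
        return [0.0] * len(powers)
    scale = target_mean / mean
    return [p * scale for p in powers]
```

Because every band ends up with the same mean, overall loudness differences between copies of a recording have less influence on the resulting values.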

9

9. A method as recited in claim 1 , wherein generation of each of the candidate and reference fingerprints comprises computing a frequency distribution within each of a plurality of different frequency bands using a finer resolution at lower frequency bands than at higher frequency bands.

10

10. A method as recited in claim 1 , further comprising storing a plurality of the reference fingerprints for each of a plurality of reference recordings, wherein said extracting produces a plurality of candidate fingerprints from successive frames at a regular time interval, and wherein said searching identifies the unidentified recording as corresponding to a single reference recording only if matches are found between the reference fingerprints from the single reference recording and the candidate fingerprints obtained from a predetermined number of the successive frames.

11

11. A method as recited in claim 1 , further comprising storing a plurality of the reference fingerprints for each of a plurality of reference recordings, wherein said extracting produces a plurality of candidate fingerprints, and wherein said searching comprises: finding a first match between a first candidate fingerprint and one of the reference fingerprints for a potentially matching reference recording; and comparing other candidate fingerprints from the unknown recording with the reference fingerprints for the potentially matching reference recording until a predetermined number of matches are found.
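
Claims 10 and 11 both require a predetermined number of fingerprint matches against a single reference recording before the identification is accepted. The sketch below shows that confirmation loop under simple assumptions: `is_match` is a caller-supplied comparison, and `confirm_recording` and `required_matches` are illustrative names.

```python
from typing import Callable, Iterable, Sequence

Fingerprint = Sequence[float]


def confirm_recording(
    candidates: Iterable[Fingerprint],
    reference_fps: Sequence[Fingerprint],
    is_match: Callable[[Fingerprint, Fingerprint], bool],
    required_matches: int,
) -> bool:
    """Return True once enough candidate fingerprints match the one reference recording."""
    matches = 0
    for candidate in candidates:
        if any(is_match(candidate, ref) for ref in reference_fps):
            matches += 1
            if matches >= required_matches:
                # Stop early: the predetermined number of matches has been reached.
                return True
    return False
```

Stopping as soon as the count is reached mirrors the "until a predetermined number of matches are found" wording in claim 11.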

12

12. A method as recited in claim 1 further comprising storing a plurality of the reference fingerprints for each of a plurality of reference recordings, and wherein said searching includes all of the reference fingerprints, unless a match is found.

13

13. A method as recited in claim 1 , further comprising generating the reference fingerprints for reference recordings by extracting a principal fingerprint from a specified portion of each reference recording; extracting auxiliary fingerprints from the reference recording at a regular time interval; computing distance measures from the principal fingerprint to the auxiliary fingerprints, respectively; generating a song profile based on the distance measures; and storing the principal fingerprint combined with the song profile as the reference fingerprint for the reference recording.
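
The reference-generation steps in claim 13 can be sketched as follows. The claim does not fix the distance measure or the storage layout, so the L1 distance and the dictionary returned here are assumptions, as are the names `l1_distance` and `build_reference`.

```python
from typing import Dict, List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def build_reference(
    principal: Sequence[float],
    auxiliaries: Sequence[Sequence[float]],
) -> Dict[str, List[float]]:
    """Combine a principal fingerprint with a song profile of distances to each auxiliary."""
    profile = [l1_distance(principal, aux) for aux in auxiliaries]
    return {"principal": list(principal), "profile": profile}
```

The profile holds one number per auxiliary fingerprint, so it records how the recording drifts away from its principal fingerprint over time without storing every auxiliary fingerprint in full.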

14

14. A method as recited in claim 1 , wherein said extracting comprises: separating the at least one portion of the unidentified recording into frequency bands; computing power spectra for the frequency bands, respectively; and computing at least one value from each power spectra.

15

15. A method as recited in claim 14 , wherein the frequency bands are output from filters derived from one prototype filter corresponding to an analysis wavelet.

16

16. A method as recited in claim 15 , wherein a ratio of bandwidth to center frequency is substantially identical for all of the filters.
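
Claim 16 describes a filter bank in which bandwidth divided by center frequency is the same for every filter, as in a wavelet or constant-Q analysis. The sketch below only computes band parameters, not the filters themselves; the starting frequency, the octave spacing, and the value of `q` are illustrative assumptions.

```python
from typing import List, Tuple


def constant_q_bands(
    f_min: float,
    n_bands: int,
    bands_per_octave: int = 1,
    q: float = 1.4,
) -> List[Tuple[float, float]]:
    """Return (center_frequency, bandwidth) pairs with bandwidth / center fixed at 1 / q."""
    bands = []
    for k in range(n_bands):
        # Centers are geometrically spaced, doubling every bands_per_octave steps.
        center = f_min * 2.0 ** (k / bands_per_octave)
        bands.append((center, center / q))
    return bands
```

With `f_min=100.0` and one band per octave, the centers fall at 100, 200, 400, and 800 Hz, and each band is proportionally as wide as the others.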

17

17. A method as recited in claim 1 , wherein each of the candidate and reference fingerprints include a vector of at least 5 elements having at least 256 values each.

18

18. A method as recited in claim 17 , wherein each of the candidate and reference fingerprints include a vector of up to 38 elements having no more than 65,536 values each.

19

19. A method as recited in claim 18 , wherein each of the candidate and reference fingerprints include a vector of approximately 30 elements of approximately 16 bits each.
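
The sizes in claims 17 to 19 imply a compact fingerprint: roughly 30 elements of 16 bits each is about 60 bytes. The sketch below shows one way to quantize values into 16-bit codes and pack them; the linear quantization range and the name `pack_fingerprint` are assumptions, not details taken from the claims.

```python
import struct
from typing import Sequence


def pack_fingerprint(values: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> bytes:
    """Quantize each value to one of 65,536 levels and pack as little-endian uint16."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)
        codes.append(round((clipped - lo) / (hi - lo) * 65535))
    return struct.pack(f"<{len(codes)}H", *codes)
```

A 30-element fingerprint packed this way occupies 60 bytes, which keeps a large reference database small enough to search quickly.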

20

20. A method as recited in claim 1 , wherein said extracting produces a plurality of candidate fingerprints, each from different copies corresponding to a single reference recording, at least one of the different copies having been modified prior to said extracting.

21

21. A method as recited in claim 20 , wherein the at least one of the different copies having been modified by at least one of a time based audio effect, a frequency based audio effect, and a signal compression scheme.

22

22. A method of identifying digital recordings, comprising: extracting first and second candidate fingerprints from at least one portion of an unidentified recording, the first candidate fingerprint having low discernability of frequency variation from the original and the second candidate fingerprint having low discernability of amplitude variation from the originals; storing, for reference recordings, first reference fingerprints having low discernability of frequency variation and second reference fingerprints with low discernability of amplitude variation; and comparing the first candidate fingerprint with the first reference fingerprints and the second candidate fingerprint with the second reference fingerprints to find a match for the unidentified recording among the reference recordings.

23

23. A method as recited in claim 22 , wherein a first processor is used for said comparing of the first candidate fingerprint with the first reference fingerprints and concurrently a second processor is used for said comparing of the second candidate fingerprint with the second reference fingerprints.

24

24. A method as recited in claim 22 , wherein a first result of said comparing of the first candidate fingerprint with the first reference fingerprints is combined with a second result of said comparing of the second candidate fingerprint with the second reference fingerprints to determine whether corresponding first and second reference fingerprints for both the first and second fingerprints are stored.

25

25. A method as recited in claim 22 , wherein each of the at least one portion of the unidentified recording has a duration of less than 25 seconds.

26

26. A method as recited in claim 25 , wherein each of the at least one portion of the unidentified recording has a duration of at least 10 seconds and no greater than 20 seconds.

27

27. A method of identifying digital recordings, comprising: extracting weighted frequency spectra using overlapping frames with time weighting to smoothly transition between frames of an unidentified recording; and searching for a match between at least one candidate value derived from the weighted frequency spectra and at least one reference value in at least one reference fingerprint among a plurality of reference fingerprints, by transforming the weighted frequency spectra to transformed frequency spectra using a perceptual power scale attenuating high values relative to low values; computing the at least one candidate value from the transformed frequency spectra; and identifying the at least one reference value in the reference fingerprints that matches the at least one candidate value.
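
Two pieces of claim 27 lend themselves to a short sketch: overlapping frames with a time weighting that tapers smoothly at the frame edges, and a power scale that attenuates high values relative to low ones. The Hann window and the power-law exponent of 0.3 below are assumptions standing in for whatever weighting and perceptual scale an implementation would choose; the spectral transform between the two steps is omitted.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Hann window of length n (n >= 2), tapering to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def overlapping_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split samples into windowed frames; hop < frame_len makes the frames overlap."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


def compress_power(power: Sequence[float], exponent: float = 0.3) -> List[float]:
    """Power-law compression: large values are reduced far more than small ones."""
    return [p ** exponent for p in power]
```

With a hop of half the frame length, each sample near a frame edge is also covered near the middle of a neighboring frame, which is what gives the smooth transition between frames.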

28

28. A method of identifying digital recordings, comprising: partitioning at least one portion of an unidentified recording into time-frequency regions, each time-frequency region covering at least three ranges of time frames and at least three ranges of frequencies; weighting the time-frequency regions to produce weighted time-frequency regions with emphasis on at least one middle-time and middle-frequency region; computing at least one candidate value of at least one candidate fingerprint using the weighted time-frequency regions; and searching for a match between the at least one candidate value and at least one reference value in at least one reference fingerprint among a plurality of reference fingerprints.
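
The weighting in claim 28 can be illustrated with a 3 by 3 grid of time-frequency energies in which the middle cell counts most. The specific weights below are an assumption chosen for the example; the claim only requires emphasis on a middle-time, middle-frequency region.

```python
from typing import Optional, Sequence

# Hypothetical weights: rows are frequency ranges, columns are time ranges.
DEFAULT_WEIGHTS = (
    (1.0, 2.0, 1.0),
    (2.0, 4.0, 2.0),
    (1.0, 2.0, 1.0),
)


def weighted_region_value(
    region: Sequence[Sequence[float]],
    weights: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Weighted average of a 3x3 grid of energies, emphasizing the center cell."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    total_weight = sum(sum(row) for row in w)
    weighted_sum = sum(
        region[i][j] * w[i][j] for i in range(len(w)) for j in range(len(w[i]))
    )
    return weighted_sum / total_weight
```

Energy in the center cell moves the result four times as much as the same energy in a corner, so small shifts in time or frequency at the edges of the region change the candidate value less.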

29

29. A method of identifying digital recordings, comprising: generating reference fingerprints for reference recordings by extracting a principal fingerprint from a specified portion of each reference recording; extracting auxiliary fingerprints from the reference recording at a regular time interval; computing reference distance measures from the principal fingerprint to the auxiliary fingerprints, respectively; generating a reference song profile based on the reference distance measures; and storing the principal fingerprint combined with the reference song profile as the reference fingerprint for the reference recording; extracting an initial candidate fingerprint and subsequent candidate fingerprints, following the initial candidate fingerprint at the regular time interval, for an unknown digital recording; and searching for a potentially matching reference recording for the unknown digital recording, by comparing the initial candidate fingerprint with the principal fingerprint for at least one of the reference recordings, and when the potentially matching reference recording is found, computing candidate distance measures from the initial candidate fingerprint to the subsequent candidate fingerprints, respectively; generating a candidate song profile based on the candidate distance measures; and identifying the unknown digital recording as the potentially matching reference recording only if the candidate song profile has a predetermined correlation to the reference song profile for the potentially matching reference recording.
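
The final test in claim 29 compares a candidate song profile with the reference song profile. The claim says only "a predetermined correlation", so the sketch below assumes a Pearson correlation coefficient and a hypothetical acceptance threshold of 0.9.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def profiles_agree(candidate: Sequence[float], reference: Sequence[float],
                   min_correlation: float = 0.9) -> bool:
    """Accept the potential match only if the two song profiles correlate strongly."""
    return pearson(candidate, reference) >= min_correlation
```

Correlation ignores a uniform scaling of the distances, so a copy whose profile has the same shape at a different overall level can still be accepted.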

30

30. A method as recited in claim 29 , wherein said comparing begins prior to completing said extracting of the subsequent candidate fingerprints.

31

31. A method of generating reference fingerprints of reference recordings for identifying unknown digital recordings, comprising: extracting a principal fingerprint from a specified portion of each reference recording and auxiliary fingerprints from the reference recording at regular frame intervals, each of the principal and auxiliary fingerprints including a predetermined number of candidate values for corresponding frequency ranges; computing distance measures from the principal fingerprint to the auxiliary fingerprints, respectively; using selectively weighted difference between corresponding candidate and reference values generating a song profile based on the distance measures; and storing the principal fingerprint combined with the song profile as a reference fingerprint for the reference recording used to identify the unknown digital recordings.

32

32. At least one computer readable medium encoding instructions that when executed cause at least one processor to perform a method of identifying digital recordings, comprising: extracting at least one candidate fingerprint from at least one portion of an unidentified recording, each candidate fingerprint including a predetermined number of candidate values for corresponding frequency ranges and each reference fingerprint including the predetermined number of reference values for the corresponding frequency ranges; and searching for a match between at least one candidate value derived from the at least one candidate fingerprint and at least one reference value in at least one reference fingerprint among a plurality of reference fingerprints, by determining whether each candidate fingerprint matches one of the reference fingerprints based on selectively weighted differences between corresponding candidate and reference values for different frequency ranges.

33

33. At least one computer readable medium as recited in claim 32 , wherein said searching comprises computing at least one weighted absolute difference between the at least one candidate fingerprint and the at least one reference fingerprint using a weight based on a value derived from the at least one candidate fingerprint.

34

34. At least one computer readable medium as recited in claim 32 , further comprising prior to said extracting, expanding dynamic range of the at least one portion of the unidentified recording.

35

35. At least one computer readable medium as recited in claim 34 , wherein said expanding of the dynamic range makes all sample values within the at least one portion of an unidentified recording more equally likely.

36

36. At least one computer readable medium as recited in claim 32 , further comprising: storing in a cache memory matched candidate fingerprints with identifiers of corresponding reference fingerprints; and determining whether a new candidate fingerprint is included in the matched candidate fingerprints in the cache memory prior to said searching using the new candidate fingerprint.

37

37. At least one computer readable medium as recited in claim 36 , further comprising: indicating a match between the new candidate fingerprint and a corresponding reference fingerprint when the new candidate fingerprint is included in the matched candidate fingerprints in the cache memory; and adding the new candidate fingerprint to the cache memory and associating a corresponding identifier for the corresponding reference fingerprint with new candidate fingerprint in the cache memory.

38

38. At least one computer readable medium as recited in claim 32 , further comprising generating each of the candidate and reference fingerprints to include values representing a magnitude of power at frequencies in frequency ranges with mid-range frequencies weighted less than high- and low-range frequencies.

39

39. At least one computer readable medium as recited in claim 32 , wherein generation of each of the candidate and reference fingerprints comprises: computing power in each of a plurality of frequency bands; and normalizing the power for each frequency within each band so that a mean of the power within each band is equal to a predetermined value.

40

40. At least one computer readable medium as recited in claim 32 , wherein generation of each of the candidate and reference fingerprints comprises computing a frequency distribution within each of a plurality of different frequency bands using a finer resolution at lower frequency bands than at higher frequency bands.

41

41. At least one computer readable medium as recited in claim 32 , wherein the portion of the unidentified recording has a duration of less than 25 seconds.

42

42. At least one computer readable medium as recited in claim 41 , wherein the portion of the unidentified recording has a duration of at least 10 seconds and no greater than 20 seconds.

43

43. At least one computer readable medium as recited in claim 32 , further comprising storing a plurality of the reference fingerprints for each of a plurality of reference recordings, wherein said extracting produces a plurality of candidate fingerprints from successive frames at a regular time interval, and wherein said searching identifies the unidentified recording as corresponding to a single reference recording only if matches are found between the reference fingerprints from the single reference recording and the candidate fingerprints obtained from a predetermined number of the successive frames.

44

44. At least one computer readable medium as recited in claim 32 , further comprising storing a plurality of the reference fingerprints for each of a plurality of reference recordings, wherein said extracting produces a plurality of candidate fingerprints, and wherein said searching comprises: finding a first match between a first candidate fingerprint and one of the reference fingerprints for a potentially matching reference recording; and comparing other candidate fingerprints from the unidentified recording with the reference fingerprints for the potentially matching reference recording until a predetermined number of matches are found.

45

45. At least one computer readable medium as recited in claim 32 , further comprising storing a plurality of the reference fingerprints for each of a plurality of reference recordings, and wherein said searching includes all of the reference fingerprints, unless a match is found.

46

46. At least one computer readable medium as recited in claim 32 , further comprising generating the reference fingerprints for reference recordings by extracting a principal fingerprint from a specified portion of each reference recording; extracting auxiliary fingerprints from the reference recording at a regular time interval; computing distance measures from the principal fingerprint to the auxiliary fingerprints, respectively; generating a song profile based on the distance measures; and storing the principal fingerprint combined with the song profile as the reference fingerprint for the reference recording.

47

47. At least one computer readable medium as recited in claim 32 , wherein said extracting comprises: separating the at least one portion of the unidentified recording into frequency bands; computing power spectra for the frequency bands, respectively; and computing at least one value from each power spectra.

48

48. At least one computer readable medium as recited in claim 47 , wherein the frequency bands are output from filters derived from one prototype filter corresponding to an analysis wavelet.

49

49. At least one computer readable medium as recited in claim 48 , wherein a ratio of bandwidth to center frequency is substantially identical for all of the filters.

50

50. At least one computer readable medium as recited in claim 32 , wherein each of the candidate and reference fingerprints include a vector of at least 5 elements having at least 256 values each.

51

51. At least one computer readable medium as recited in claim 50 , wherein each of the candidate and reference fingerprints include a vector of up to 38 elements having no more than 65,536 values each.

52

52. At least one computer readable medium as recited in claim 51 , wherein each of the candidate and reference fingerprints include a vector of approximately 30 elements of approximately 16 bits each.

53

53. At least one computer readable medium as recited in claim 32 , wherein said extracting produces a plurality of candidate fingerprints, each from different copies corresponding to a single reference recording, at least one of the different copies having been modified prior to said extracting.

54

54. At least one computer readable medium as recited in claim 53 , wherein the at least one of the different copies having been modified by at least one of a time based audio effect, a frequency based audio effect, and a signal compression scheme.

55

55. At least one computer readable medium encoding instructions that when executed cause at least one processor to perform a method of identifying digital recordings, comprising: extracting first and second candidate fingerprints from at least one portion of an unidentified recording, the first candidate fingerprint having low discernability of frequency variation from the original and the second candidate fingerprint having low discernability of amplitude variation from the originals; storing, for reference recordings, first reference fingerprints having low discernability of frequency variation and second reference fingerprints with low discernability of amplitude variation; and comparing the first candidate fingerprint with the first reference fingerprints and the second candidate fingerprint with the second reference fingerprints to find a match for the unidentified recording among the reference recordings.

56

56. At least one computer readable medium as recited in claim 55 , wherein a first processor is used for said comparing of the first candidate fingerprint with the first reference fingerprints and concurrently a second processor is used for said comparing of the second candidate fingerprint with the second reference fingerprints.

57

57. At least one computer readable medium as recited in claim 55 , wherein a first result of said comparing of the first candidate fingerprint with the first reference fingerprints is combined with a second result of said comparing of the second candidate fingerprint with the second reference fingerprints to determine whether corresponding first and second reference fingerprints for both the first and second fingerprints are stored.

58

58. At least one computer readable medium encoding instructions that when executed cause at least one processor to perform a method of identifying digital recordings, comprising: extracting weighted frequency spectra using overlapping frames with time weighting to smoothly transition between frames of an unidentified recording; and searching for a match between at least one candidate value derived from the weighted frequency spectra and at least one reference value in at least one reference fingerprint among a plurality of reference fingerprints, by transforming the weighted frequency spectra to transformed frequency spectra using a perceptual power scale attenuating high values relative to low values; computing the at least one candidate value from the transformed frequency spectra; and identifying the at least one reference value in the reference fingerprints that matches the at least one candidate value.

59

59. At least one computer readable medium encoding instructions that when executed cause at least one processor to perform a method of identifying digital recordings, comprising: partitioning at least one portion of an unidentified recording into time-frequency regions, each time-frequency region covering at least three ranges of time frames and at least three ranges of frequencies; weighting time-frequency regions to produce weighted time-frequency regions with emphasis on at least one middle-time and middle-frequency region; computing the at least one candidate value using the weighted time-frequency regions; and identifying the at least one reference value in the reference fingerprints that matches the at least one candidate value.

60

60. At least one computer readable medium encoding instructions that when executed cause at least one processor to perform a method of identifying digital recordings, comprising: generating reference fingerprints for reference recordings by extracting a principal fingerprint from a specified portion of each reference recording; extracting auxiliary fingerprints from the reference recording at a regular time interval; computing reference distance measures from the principal fingerprint to the auxiliary fingerprints, respectively; generating a reference song profile based on the reference distance measures; and storing the principal fingerprint combined with the reference song profile as the reference fingerprint for the reference recording; extracting an initial candidate fingerprint and subsequent candidate fingerprints following the initial candidate fingerprint at the regular time interval, for an unknown digital recording; and searching for a potentially matching reference recording for the unknown digital recording, by comparing the initial candidate fingerprint with the principal fingerprint for at least one of the reference recordings, and when the potentially matching reference recording is found, computing candidate distance measures from the initial candidate fingerprint to the subsequent candidate fingerprints, respectively; generating a candidate song profile based on the candidate distance measures; and identifying the unknown digital recording as the potentially matching reference recording only if the candidate song profile has a predetermined correlation to the reference song profile for the potentially matching reference recording.

61

61. At least one computer readable medium as recited in claim 60 , wherein said comparing begins prior to completing said extracting of the subsequent candidate fingerprints.

62

62. At least one computer readable medium storing at least one program embodying a method of generating reference fingerprints of reference recordings for identifying unknown digital recordings, said method comprising: extracting a principal fingerprint from a specified portion of each reference recording and auxiliary fingerprints from the reference recording at regular frame intervals, each of the principal and auxiliary fingerprints including a predetermined number of candidate values for corresponding frequency ranges; computing distance measures from the principal fingerprint to the auxiliary fingerprints, respectively; using selectively weighted difference between corresponding candidate and reference values generating a song profile based on the distance measures; and storing the principal fingerprint combined with the song profile as the reference fingerprint for the reference recording used in identifying the unknown digital recordings.

63

63. A system for identifying digital recordings, comprising: a storage unit storing reference fingerprints; using selectively weighted differences between corresponding candidate and reference values and a processor, coupled to said storage unit, extracting at least one candidate fingerprint from at least one portion of an unidentified digital recording, each candidate fingerprint including a predetermined number of candidate values for corresponding frequency ranges, and searching for a match between at least one candidate value derived from the at least one candidate fingerprint and at least one reference value in at least one reference fingerprint among the reference fingerprints.

Patent Metadata

Filing Date

Unknown

Publication Date

February 5, 2008

Inventors

Maxwell Wells
Vidya Venkatachalam
Luca Cazzanti
Kwan Fai Cheung
Navdeep Dhillon
Somsak Sukittanon


Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “AUTOMATIC IDENTIFICATION OF SOUND RECORDINGS” (7328153). https://patentable.app/patents/7328153
