US-10424321

Audio data classification

Published: September 24, 2019

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for analyzing an audio sample to determine whether the audio sample includes music audio data. One or more detectors, including a spectral fluctuation detector, a peak repetition detector, and a beat pitch detector, may analyze the audio sample and generate a score that represents whether the audio sample includes music audio data. One or more of the scores may be combined to determine whether the audio sample includes music audio data or non-music audio data.
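As an illustration of the spectral fluctuation detector named in the abstract, the following Python sketch follows the steps recited in claims 16 to 20: gate on the average squared magnitude of the sample, compute a spectrogram, average it over time into a spectral envelope, and take the mean of the absolute differences between adjacent envelope values. The function names, frame size, hop, both threshold values, the naive DFT, and the choice that a higher score indicates music are assumptions made for this example; the claims do not specify them.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram via a naive DFT (rows: frames, columns: frequency bins).

    frame_size and hop are illustrative values, not taken from the patent.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        row = []
        for k in range(frame_size // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            row.append(abs(acc))
        frames.append(row)
    return frames


def average_spectral_envelope(spec):
    """Average each frequency bin over time: one amplitude per frequency bin."""
    n_frames = len(spec)
    return [sum(column) / n_frames for column in zip(*spec)]


def spectral_fluctuation_score(envelope):
    """Mean of the absolute differences between adjacent envelope values."""
    diffs = [abs(b - a) for a, b in zip(envelope, envelope[1:])]
    return sum(diffs) / len(diffs)


def average_squared_magnitude(samples):
    """Average of the squared sample values, used as a loudness gate."""
    return sum(x * x for x in samples) / len(samples)


def classify(samples, energy_threshold=1e-4, score_threshold=0.5):
    """Return True (music), False (non-music), or None if the sample is too quiet.

    Both thresholds and the 'greater or equal' comparison are assumptions.
    """
    if average_squared_magnitude(samples) <= energy_threshold:
        return None  # spectrogram is only computed above the energy threshold
    envelope = average_spectral_envelope(spectrogram(samples))
    return spectral_fluctuation_score(envelope) >= score_threshold
```

Because the score in this sketch scales with signal amplitude, a practical version would likely normalize the input or the envelope before applying a fixed threshold.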

Patent Claims

30 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A computer implemented method comprising: receiving, by an audio classification system, an audio sample that is associated with audio data; computing, by the audio classification system, a spectrogram of the received audio sample; detecting, by the audio classification system, one or more beats in the spectrogram; detecting, by the audio classification system, one or more sustained pitches in the spectrogram around the beats; determining, by the audio classification system for each of the one or more beats, a score based on the sustained pitches around the respective beat using the spectrogram; determining, by the audio classification system using the respective score for each of the one or more beats, a beat pitch score that indicates a likelihood that the audio sample contains music audio data; determining, by the audio classification system, whether the beat pitch score satisfies a beat pitch threshold; and classifying, by the audio classification system, the audio sample as containing music audio data or not containing music audio data based on determining whether the beat pitch score satisfies the beat pitch threshold.

2. The method of claim 1, wherein detecting the one or more beats in the spectrogram comprises: determining one or more horizontal peaks in the spectrogram; generating a sparse representation of the spectrogram based on the horizontal peaks; and detecting the one or more beats in the sparse spectrogram.

3. The method of claim 2, wherein detecting the one or more sustained pitches in the spectrogram around the beats comprises: determining one or more vertical peaks in the spectrogram; and detecting the one or more sustained pitches in the spectrogram around the beats based on the vertical peaks in the spectrogram around the beats.

4. The method of claim 3, further comprising: determining that the audio sample contains music audio data based on the beat pitch score satisfying the beat pitch threshold, classifying the audio sample as containing music audio data or not containing music audio data based on determining whether the beat pitch score satisfies the beat pitch threshold comprises classifying the audio sample as containing music audio data in response to determining that the audio sample contains music audio data.

5. The method of claim 3, wherein determining, for each of the one or more beats, the score based on the sustained pitches around the beat using the spectrogram comprises: determining, for each of the one or more beats using the spectrogram, a window that is centered on the respective beat, each window having a predetermined width in time, each of the predetermined widths in time being the same; determining, for each of the windows, a quantity of vertical peaks in the window; and determining, for each of the one or more beats, a highest score associated with the beat, the highest score based on a highest quantity of vertical peaks in one of the windows associated with the beat; wherein the score for each of the one or more beats comprises the highest score for the beat.

6. A non-transitory computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving, by an audio classification system, an audio sample that is associated with audio data; computing, by the audio classification system, a spectrogram of the received audio sample; detecting, by the audio classification system, one or more beats in the spectrogram; detecting, by the audio classification system, one or more sustained pitches in the spectrogram around the beats; determining, by the audio classification system for each of the one or more beats, a score based on the sustained pitches around the respective beat using the spectrogram; determining, by the audio classification system using the respective score for each of the one or more beats, a beat pitch score that indicates a likelihood that the audio sample contains music audio data; determining, by the audio classification system, whether the beat pitch score satisfies a beat pitch threshold; and classifying, by the audio classification system, the audio sample as containing music audio data or not containing music audio data based on determining whether the beat pitch score satisfies the beat pitch threshold.

7. The computer storage medium of claim 6, wherein detecting the one or more beats in the spectrogram comprises: determining one or more horizontal peaks in the spectrogram; generating a sparse representation of the spectrogram based on the horizontal peaks; and detecting the one or more beats in the sparse spectrogram.

8. The computer storage medium of claim 7, wherein detecting the one or more sustained pitches in the spectrogram around the beats comprises: determining one or more vertical peaks in the spectrogram; and detecting the one or more sustained pitches in the spectrogram around the beats based on the vertical peaks in the spectrogram around the beats.

9. The computer storage medium of claim 8, the operations further comprising: determining that the audio sample contains music audio data based on the beat pitch score satisfying the beat pitch threshold, classifying the audio sample as containing music audio data or not containing music audio data based on determining whether the beat pitch score satisfies the beat pitch threshold comprises classifying the audio sample as containing music audio data in response to determining that the audio sample contains music audio data.

10. The computer storage medium of claim 8, wherein determining, for each of the one or more beats, the score based on the sustained pitches around the beat using the spectrogram comprises: determining, for each of the one or more beats using the spectrogram, a window that is centered on the respective beat, each window having a predetermined width in time, each of the predetermined widths in time being the same; determining, for each of the windows, a quantity of vertical peaks in the window; and determining, for each of the one or more beats, a highest score associated with the beat, the highest score based on a highest quantity of vertical peaks in one of the windows associated with the beat; wherein the score for each of the one or more beats comprises the highest score for the beat.

11. An audio classification system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an audio sample that is associated with audio data; computing a spectrogram of the received audio sample; detecting one or more beats in the spectrogram; detecting one or more sustained pitches in the spectrogram around the beats; determining, for each of the one or more beats, a score based on the sustained pitches around the respective beat using the spectrogram; determining, using the respective score for each of the one or more beats, a beat pitch score that indicates a likelihood that the audio sample contains music audio data; determining whether the beat pitch score satisfies a beat pitch threshold; and classifying the audio sample as containing music audio data or not containing music audio data based on determining whether the beat pitch score satisfies the beat pitch threshold.

12. The system of claim 11, wherein detecting the one or more beats in the spectrogram comprises: determining one or more horizontal peaks in the spectrogram; generating a sparse representation of the spectrogram based on the horizontal peaks; and detecting the one or more beats in the sparse spectrogram.

13. The system of claim 12, wherein detecting the one or more sustained pitches in the spectrogram around the beats comprises: determining one or more vertical peaks in the spectrogram; and detecting the one or more sustained pitches in the spectrogram around the beats based on the vertical peaks in the spectrogram around the beats.

14. The system of claim 13, the operations further comprising: determining that the audio sample contains music audio data based on the beat pitch score satisfying the beat pitch threshold, classifying the audio sample as containing music audio data or not containing music audio data based on determining whether the beat pitch score satisfies the beat pitch threshold comprises classifying the audio sample as containing music audio data in response to determining that the audio sample contains music audio data.

15. The system of claim 13, wherein determining, for each of the one or more beats, the score based on the sustained pitches around the beat using the spectrogram comprises: determining, for each of the one or more beats using the spectrogram, a window that is centered on the respective beat, each window having a predetermined width in time, each of the predetermined widths in time being the same; determining, for each of the windows, a quantity of vertical peaks in the window; and determining, for each of the one or more beats, a highest score associated with the beat, the highest score based on a highest quantity of vertical peaks in one of the windows associated with the beat; wherein the score for each of the one or more beats comprises the highest score for the beat.

16. A computer implemented method comprising: receiving, by an audio classification system, an audio sample that is associated with audio data; computing, by the audio classification system, a spectrogram of the received audio sample; determining, by the audio classification system, an average spectral envelope of the spectrogram that is a curve in the frequency-amplitude plane of the spectrogram; determining, by the audio classification system, one or more differences between adjacent values in the average spectral envelope; determining, by the audio classification system using the differences between adjacent values in the average spectral envelope, a spectral fluctuation score that indicates a likelihood that the audio sample contains music audio data; determining, by the audio classification system, whether the spectral fluctuation score satisfies a threshold score; and classifying, by the audio classification system, the audio sample as containing music audio data or not containing music audio data based on determining whether on the spectral fluctuation score satisfies the threshold score.

17. The method of claim 16, wherein determining, using the differences between adjacent values in the average spectral envelope, the spectral fluctuation score that indicates the likelihood that the audio sample contains music audio data comprises determining a mean of the one or more differences between adjacent values in the average spectral envelope.

18. The method of claim 17, wherein determining the mean of the one or more differences between adjacent values in the average spectral envelope comprises determining the mean of the absolute values of the differences between adjacent values in the average spectral envelope.

19. The method of claim 18, further comprising: approximating a first derivative of the average spectral envelope in the frequency dimension; wherein determining the one or more differences between adjacent values in the average spectral envelope comprises determining the one or more differences between adjacent values in the average spectral envelope based on the first derivative of the average spectral envelope.

20. The method of claim 16, further comprising: determining an average squared magnitude of the audio sample; and comparing the average squared magnitude of the audio sample to a threshold value; wherein computing the spectrogram is based on determining that the average squared magnitude of the audio sample is greater than the threshold value.

21. A non-transitory computer storage medium encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving, by an audio classification system, an audio sample that is associated with audio data; computing, by the audio classification system, a spectrogram of the received audio sample; determining, by the audio classification system, an average spectral envelope of the spectrogram that is a curve in the frequency-amplitude plane of the spectrogram; determining, by the audio classification system, one or more differences between adjacent values in the average spectral envelope; determining, by the audio classification system using the differences between adjacent values in the average spectral envelope, a spectral fluctuation score that indicates a likelihood that the audio sample contains music audio data; determining, by the audio classification system, whether the spectral fluctuation score satisfies a threshold score; and classifying, by the audio classification system, the audio sample as containing music audio data or not containing music audio data based on determining whether on the spectral fluctuation score satisfies the threshold score.

22. The computer storage medium of claim 21, wherein determining, using the differences between adjacent values in the average spectral envelope, the spectral fluctuation score that indicates the likelihood that the audio sample contains music audio data comprises determining a mean of the one or more differences between adjacent values in the average spectral envelope.

23. The computer storage medium of claim 22, wherein determining the mean of the one or more differences between adjacent values in the average spectral envelope comprises determining the mean of the absolute values of the differences between adjacent values in the average spectral envelope.

24. The computer storage medium of claim 23, the operations further comprising: approximating a first derivative of the average spectral envelope in the frequency dimension; wherein determining the one or more differences between adjacent values in the average spectral envelope comprises determining the one or more differences between adjacent values in the average spectral envelope based on the first derivative of the average spectral envelope.

25. The computer storage medium of claim 21, the operations further comprising: determining an average squared magnitude of the audio sample; and comparing the average squared magnitude of the audio sample to a threshold value; wherein computing the spectrogram is based on determining that the average squared magnitude of the audio sample is greater than the threshold value.

26. An audio classification system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an audio sample that is associated with audio data; computing a spectrogram of the received audio sample; determining an average spectral envelope of the spectrogram that is a curve in the frequency-amplitude plane of the spectrogram; determining one or more differences between adjacent values in the average spectral envelope; determining, using the differences between adjacent values in the average spectral envelope, a spectral fluctuation score that indicates a likelihood that the audio sample contains music audio data; determining whether the spectral fluctuation score satisfies a threshold score; and classifying the audio sample as containing music audio data or not containing music audio data based on determining whether on the spectral fluctuation score satisfies the threshold score.

27. The system of claim 26, wherein determining, using the differences between adjacent values in the average spectral envelope, the spectral fluctuation score that indicates the likelihood that the audio sample contains music audio data comprises determining a mean of the one or more differences between adjacent values in the average spectral envelope.

28. The system of claim 27, wherein determining the mean of the one or more differences between adjacent values in the average spectral envelope comprises determining the mean of the absolute values of the differences between adjacent values in the average spectral envelope.

29. The system of claim 28, the operations further comprising: approximating a first derivative of the average spectral envelope in the frequency dimension; wherein determining the one or more differences between adjacent values in the average spectral envelope comprises determining the one or more differences between adjacent values in the average spectral envelope based on the first derivative of the average spectral envelope.

30. The system of claim 26, the operations further comprising: determining an average squared magnitude of the audio sample; and comparing the average squared magnitude of the audio sample to a threshold value; wherein computing the spectrogram is based on determining that the average squared magnitude of the audio sample is greater than the threshold value.
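The beat pitch scoring recited in claims 1, 3, and 5 can be sketched in Python as follows. This is a simplified reading, not the patented implementation: it assumes a "vertical peak" is a local maximum along the frequency axis within one spectrogram frame, uses a single fixed-width window per beat (so the count in that window is trivially the highest for the beat), and aggregates per-beat scores with a mean. The function names, the window width, and the threshold are hypothetical, and beat detection itself (claim 2) is taken as given.

```python
def find_vertical_peaks(spec, min_magnitude=0.0):
    """Return (frame, bin) pairs that are local maxima along the frequency axis.

    Assumption: this is one plausible meaning of 'vertical peak'; the claims
    do not define the term.
    """
    peaks = []
    for t, row in enumerate(spec):
        for f in range(1, len(row) - 1):
            if row[f] > row[f - 1] and row[f] > row[f + 1] and row[f] > min_magnitude:
                peaks.append((t, f))
    return peaks


def beat_scores(beat_frames, vertical_peaks, window_width=5):
    """Count vertical peaks inside a same-width window centered on each beat."""
    half = window_width // 2
    scores = []
    for beat in beat_frames:
        count = sum(1 for (t, _f) in vertical_peaks
                    if beat - half <= t <= beat + half)
        scores.append(count)
    return scores


def beat_pitch_score(scores):
    """Combine per-beat scores into one score (mean is an assumed choice)."""
    return sum(scores) / len(scores) if scores else 0.0


def classify_by_beat_pitch(beat_frames, vertical_peaks,
                           threshold=2.0, window_width=5):
    """Return True if the beat pitch score meets the (hypothetical) threshold."""
    scores = beat_scores(beat_frames, vertical_peaks, window_width)
    return beat_pitch_score(scores) >= threshold
```

For example, with beats at frames 10 and 20 and vertical peaks at frames 9, 10, 12, 20, and 30, the per-beat counts are 3 and 1, giving a beat pitch score of 2.0.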

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L G06F

Patent Metadata

Filing Date

July 1, 2013

Publication Date

September 24, 2019
