US-9589570

Audio classification based on perceptual quality for low or medium bit rates

Published March 7, 2017

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

The quality of encoded signals can be improved by reclassifying AUDIO signals carrying non-speech data as VOICED signals when periodicity parameters of the signal satisfy one or more criteria. In some embodiments, only low or medium bit rate signals are considered for reclassification. The periodicity parameters can include any characteristic or set of characteristics indicative of periodicity. For example, the periodicity parameters may include pitch differences between sub-frames in the audio signal, a normalized pitch correlation for one or more sub-frames, an average normalized pitch correlation for the audio signal, or combinations thereof. Audio signals that are reclassified as VOICED signals may be encoded in the time domain, while audio signals that remain classified as AUDIO signals may be encoded in the frequency domain.

Patent Claims

10 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 1

Original Legal Text

1. A method for encoding signals, the method comprising: receiving, by an audio encoder, a digital signal comprising audio data, wherein the audio data includes data of speech and non-speech sounds; classifying, by the audio encoder, the digital signal as an AUDIO signal based on the audio data in the digital signal; determining, by the audio encoder, whether classifying conditions are satisfied, wherein the classifying conditions include: pitch differences between sub-frames in the digital signal are less than a first threshold, a coding rate of the digital signal is below a second threshold, an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold and a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold, wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively; re-classifying, by the audio encoder, the digital signal as a VOICED signal when the classifying conditions are satisfied; encoding, by the audio encoder, the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and encoding, by the audio encoder, the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.

Plain English Translation

An audio encoder aims to improve coding quality by reclassifying signals that it would otherwise treat as "AUDIO" as "VOICED" signals when they show strong, stable periodicity. The encoder receives a digital signal whose audio data includes speech and non-speech sounds and initially classifies it as "AUDIO." It then checks four conditions: the pitch differences between sub-frames are below a first threshold (each difference being the absolute value of the difference between the pitch values of two sub-frames), the coding rate is below a second threshold, the average normalized pitch correlation across the sub-frames exceeds a third threshold, and a smoothed pitch correlation derived from that average exceeds a fourth threshold. If all four conditions hold, the encoder reclassifies the signal as "VOICED." "VOICED" signals are encoded in the time domain, while signals that remain "AUDIO" are encoded in the frequency domain. Because of the coding-rate condition, reclassification applies only at low or medium bit rates.
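The four-condition test in claim 1 can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the patented implementation: the claims give no numeric thresholds, so the values in `Thresholds` are hypothetical placeholders, and the names `Thresholds`, `pitch_differences`, and `reclassify_audio_frame` are invented for this example. It also assumes that pitch differences are taken between consecutive sub-frames, since the claim only says each difference is taken between two sub-frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical threshold values; the claims do not disclose any numbers."""
    max_pitch_diff: float = 3.0      # first threshold, in samples
    max_rate_bps: int = 24_000       # second threshold, in bits per second
    min_avg_corr: float = 0.80       # third threshold
    min_smoothed_corr: float = 0.70  # fourth threshold


def pitch_differences(pitches):
    """Absolute pitch differences, assuming consecutive sub-frames are paired."""
    return [abs(b - a) for a, b in zip(pitches, pitches[1:])]


def reclassify_audio_frame(pitches, rate_bps, avg_corr, smoothed_corr,
                           th=Thresholds()):
    """Return (final class, coding domain) for a frame first classified as AUDIO."""
    conditions_met = (
        all(d < th.max_pitch_diff for d in pitch_differences(pitches))
        and rate_bps < th.max_rate_bps
        and avg_corr > th.min_avg_corr
        and smoothed_corr > th.min_smoothed_corr
    )
    if conditions_met:
        return "VOICED", "time"      # periodic enough: time-domain coding
    return "AUDIO", "frequency"      # otherwise keep frequency-domain coding


if __name__ == "__main__":
    # Stable pitch at a low rate with high correlation: reclassified.
    print(reclassify_audio_frame([60, 61, 61, 62], 13_200, 0.86, 0.81))
    # ('VOICED', 'time')
    # One large pitch jump between sub-frames blocks reclassification.
    print(reclassify_audio_frame([60, 75, 61, 62], 13_200, 0.86, 0.81))
    # ('AUDIO', 'frequency')
```

The conditions are joined with `and`, so any single failing condition, such as a coding rate at or above the second threshold, keeps the frame in the frequency domain.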

Claim 2

Original Legal Text

2. The method of claim 1, wherein the average normalized pitch correlation value for the sub-frames in the digital signal is obtained by: determining a normalized pitch correlation value for each sub-frame in the digital signal; and dividing the sum of all normalized pitch correlation values by the number of the sub-frames in the digital signal to obtain the average normalized pitch correlation value.

Plain English Translation

To calculate the average normalized pitch correlation used in reclassifying audio signals, the audio encoder first determines the normalized pitch correlation value for each sub-frame of the digital signal. Then, it sums up all these normalized pitch correlation values. Finally, it divides this sum by the total number of sub-frames in the digital signal. This result is the average normalized pitch correlation value, which is then compared against a threshold to determine whether to re-classify an AUDIO signal as a VOICED signal for encoding in the time domain, potentially improving coding quality at low or medium bitrates.
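Claim 2 says only that a normalized pitch correlation is determined for each sub-frame and then averaged. The sketch below fills in the per-sub-frame value with a common normalized cross-correlation between a sub-frame and the segment one pitch lag earlier; that formula, the function names, and the example signal are assumptions. The averaging step follows the claim directly: sum the per-sub-frame values and divide by the number of sub-frames.

```python
import math


def normalized_pitch_correlation(signal, start, length, lag):
    """Normalized correlation of a sub-frame with the segment `lag` samples earlier.

    Assumed formula: sum(x[n] * x[n - lag]) / sqrt(sum(x[n]^2) * sum(x[n - lag]^2)).
    """
    if start - lag < 0:
        raise ValueError("not enough history before the sub-frame for this lag")
    cur = signal[start:start + length]
    past = signal[start - lag:start - lag + length]
    num = sum(a * b for a, b in zip(cur, past))
    energy = sum(a * a for a in cur) * sum(b * b for b in past)
    return num / math.sqrt(energy) if energy > 0 else 0.0


def average_normalized_pitch_correlation(signal, frame_start, subframe_len, lags):
    """Claim 2: sum the per-sub-frame values and divide by the number of sub-frames."""
    values = [
        normalized_pitch_correlation(signal, frame_start + i * subframe_len,
                                     subframe_len, lag)
        for i, lag in enumerate(lags)
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Four 64-sample sub-frames of a sine with a 50-sample period.
    sig = [math.sin(2 * math.pi * n / 50) for n in range(400)]
    # Prints a value very close to 1.0: each sub-frame repeats one period later.
    print(average_normalized_pitch_correlation(sig, 100, 64, [50, 50, 50, 50]))
```

A lag of half a period (25 samples here) would instead give an average close to -1.0, which is why a high average correlation is treated as evidence of periodicity.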

Claim 3

Original Legal Text

3. The method of claim 1, wherein the digital signal carries non-speech data.

Plain English Translation

The method for encoding signals, which involves reclassifying audio signals as VOICED signals based on periodicity parameters, is applied to digital signals specifically carrying non-speech data. This means the reclassification and subsequent encoding process is intended for audio content that is not speech, such as music or sound effects. The purpose is to improve the encoding of non-speech audio by analyzing its pitch characteristics and choosing the more appropriate encoding domain (time or frequency) based on whether it resembles a "VOICED" signal in terms of periodicity.

Claim 4

Original Legal Text

4. The method of claim 1, wherein the digital signal carries music data.

Plain English Translation

The method for encoding signals, which involves reclassifying audio signals as VOICED signals based on periodicity parameters, is applied to digital signals specifically carrying music data. In other words, this dependent claim covers the reclassification and encoding process when the content is music. The goal is to improve the coding quality of music by analyzing its pitch characteristics and choosing time-domain or frequency-domain encoding based on whether the music signal exhibits periodicity similar to a "VOICED" signal.

Claim 6

Original Legal Text

6. The method of claim 5, wherein, P1, P2, P3, and P4 are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each sub-frame.

Plain English Translation

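This claim depends on claim 5, which is not reproduced in this listing, so its full context is unavailable. On its own terms, it specifies that P1, P2, P3, and P4 are the best pitch values found for each sub-frame within a pitch search range running from a minimum pitch limit, PIT_MIN, to a maximum pitch limit, PIT_MAX.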
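Claim 6 states that P1 through P4 are the best pitch values found between PIT_MIN and PIT_MAX for each sub-frame, but it does not define "best." A minimal sketch, assuming "best" means the lag with the highest normalized correlation between the sub-frame and the segment that many samples earlier, and assuming four sub-frames per frame (one for each of P1 to P4). The `PIT_MIN` and `PIT_MAX` values and all function names are hypothetical placeholders.

```python
import math

# Hypothetical search limits in samples; the claims name PIT_MIN and PIT_MAX
# but do not give their values.
PIT_MIN = 34
PIT_MAX = 231


def normalized_correlation(signal, start, length, lag):
    """Assumed pitch criterion: normalized correlation with the segment `lag` samples back."""
    cur = signal[start:start + length]
    past = signal[start - lag:start - lag + length]
    num = sum(a * b for a, b in zip(cur, past))
    energy = sum(a * a for a in cur) * sum(b * b for b in past)
    return num / math.sqrt(energy) if energy > 0 else 0.0


def best_pitch(signal, start, length, pit_min=PIT_MIN, pit_max=PIT_MAX):
    """Exhaustive search for the lag in [pit_min, pit_max] with maximal correlation."""
    if start - pit_max < 0:
        raise ValueError("need at least pit_max samples of history")
    best_lag, best_corr = pit_min, -math.inf
    for lag in range(pit_min, pit_max + 1):
        corr = normalized_correlation(signal, start, length, lag)
        if corr > best_corr:  # strict '>' keeps the shortest lag on ties
            best_lag, best_corr = lag, corr
    return best_lag


def subframe_pitches(signal, frame_start, subframe_len, n_subframes=4,
                     pit_min=PIT_MIN, pit_max=PIT_MAX):
    """Return [P1, P2, P3, P4]: one best pitch value per sub-frame."""
    return [best_pitch(signal, frame_start + i * subframe_len, subframe_len,
                       pit_min, pit_max)
            for i in range(n_subframes)]


if __name__ == "__main__":
    # A sine with a 50-sample period; the narrowed range excludes the multiple at 100.
    sig = [math.sin(2 * math.pi * n / 50) for n in range(600)]
    print(subframe_pitches(sig, 200, 64, pit_min=34, pit_max=80))
    # [50, 50, 50, 50]
```

With the full hypothetical range, multiples of the true period (100, 150, 200) score almost identically, so practical pitch trackers usually add some bias toward shorter lags; the strict comparison here is only a crude version of that idea.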

Claim 8

Original Legal Text

8. An audio encoder comprising: at least one processor; and a computer readable storage medium storing programming for execution by the at least one processor, the programming including instructions to: receive a digital signal comprising audio data, wherein the audio data includes data of speech and non-speech sounds; classify the digital signal as an AUDIO signal based on the audio data in the digital signal; determine whether classifying conditions are satisfied, wherein, the classifying conditions include: pitch differences between sub-frames in the digital signal are less than a first threshold, a coding rate of the digital signal is below a second threshold, an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold and a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold; wherein, each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively; re-classify the digital signal as a VOICED signal when the classifying conditions are satisfied; encode the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and encode the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.

Plain English Translation

An audio encoder, made up of at least one processor and a storage medium holding its programming, performs the same reclassification as the method of claim 1. It receives a digital signal whose audio data includes speech and non-speech sounds and initially classifies it as "AUDIO." It then checks whether the pitch differences between sub-frames are below a first threshold, the coding rate is below a second threshold, the average normalized pitch correlation exceeds a third threshold, and a smoothed pitch correlation derived from that average exceeds a fourth threshold. Each pitch difference is the absolute value of the difference between the pitch values of two sub-frames. If all conditions hold, the encoder reclassifies the signal as "VOICED" and encodes it in the time domain; otherwise the signal remains "AUDIO" and is encoded in the frequency domain. The coding-rate condition confines reclassification to low and medium bit rates.
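An encoder that processes a stream frame by frame has to carry the smoothed pitch correlation from one frame to the next. The claims say only that the smoothed value is "obtained according to" the average normalized pitch correlation, so the sketch below assumes first-order recursive (exponential) smoothing with a hypothetical factor `alpha` and a starting value of 0.0. Every threshold, the class name `ReclassifyingEncoder`, and its method names are illustrative. It handles frames that an initial classifier has already labeled AUDIO and returns a domain label in place of real time-domain and frequency-domain coders.

```python
from dataclasses import dataclass


@dataclass
class ReclassifyingEncoder:
    # All numeric values are hypothetical placeholders, not taken from the patent.
    alpha: float = 0.75              # assumed smoothing factor
    max_pitch_diff: float = 3.0      # first threshold
    max_rate_bps: int = 24_000       # second threshold
    min_avg_corr: float = 0.80       # third threshold
    min_smoothed_corr: float = 0.70  # fourth threshold
    smoothed_corr: float = 0.0       # state carried from frame to frame

    def process_frame(self, pitches, rate_bps, avg_corr):
        """Handle one frame already classified as AUDIO; return (class, domain)."""
        # Assumed exponential smoothing, updated on every frame.
        self.smoothed_corr = (self.alpha * self.smoothed_corr
                              + (1 - self.alpha) * avg_corr)
        diffs = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
        if (all(d < self.max_pitch_diff for d in diffs)
                and rate_bps < self.max_rate_bps
                and avg_corr > self.min_avg_corr
                and self.smoothed_corr > self.min_smoothed_corr):
            return "VOICED", self._encode("VOICED")
        return "AUDIO", self._encode("AUDIO")

    @staticmethod
    def _encode(signal_class):
        # Stand-in for dispatching to a real time-domain or frequency-domain coder.
        return "time-domain" if signal_class == "VOICED" else "frequency-domain"


if __name__ == "__main__":
    enc = ReclassifyingEncoder()
    for frame in range(1, 7):
        # Frames 1-5 stay AUDIO; frame 6 becomes VOICED once the smoothed value exceeds 0.70.
        print(frame, enc.process_frame([60, 61, 61, 62], 13_200, 0.9))
```

With these placeholder values, a run of strongly periodic frames is reclassified only after the smoothed correlation has built up. One plausible reading of the smoothed condition is that it keeps isolated periodic frames from switching the coding domain back and forth.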

Claim 9

Original Legal Text

9. The audio encoder of claim 8, wherein the instructions to determine an average normalized pitch correlation value for the sub-frames in the digital signal include instructions to: determine a normalized pitch correlation value for each sub-frame in the digital signal; and divide the sum of all normalized pitch correlation values by the number of the sub-frames in the digital signal to obtain the average normalized pitch correlation value.

Plain English Translation

In the audio encoder described above, the process to calculate the average normalized pitch correlation involves first determining a normalized pitch correlation value for each sub-frame of the digital signal. It then calculates the sum of all these normalized pitch correlation values. Finally, it divides this sum by the total number of sub-frames in the digital signal to obtain the average normalized pitch correlation value. This average is then used in the logic to determine whether to re-classify an AUDIO signal as a VOICED signal and encode in the time domain.

Claim 10

Original Legal Text

10. The audio encoder of claim 8, wherein the digital signal carries non-speech data.

Plain English Translation

The audio encoder, designed to reclassify audio signals as VOICED based on periodicity to improve encoding, is specifically used for digital signals carrying non-speech data. This dependent claim covers the encoder when it processes audio content other than speech, such as music or environmental sounds. The system analyzes pitch characteristics to determine whether the non-speech signal exhibits periodicity resembling a "VOICED" signal, and selects time-domain or frequency-domain encoding accordingly.

Claim 11

Original Legal Text

11. The audio encoder of claim 8, wherein the digital signal carries music data.

Plain English Translation

The audio encoder, designed to reclassify audio signals as VOICED based on periodicity to improve encoding, is specifically used for digital signals carrying music data. This dependent claim covers the encoder when it processes music. The encoder analyzes pitch characteristics to determine whether the music signal exhibits periodicity resembling a "VOICED" signal, and then chooses time-domain or frequency-domain encoding based on that analysis.

Claim 13

Original Legal Text

13. The audio encoder of claim 12, wherein, P1, P2, P3, and P4 are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each sub-frame.

Plain English Translation

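This claim depends on claim 12, which is not reproduced in this listing, so its full context is unavailable. On its own terms, it specifies that, in the audio encoder, P1, P2, P3, and P4 are the best pitch values found for each sub-frame within a pitch search range running from a minimum pitch limit, PIT_MIN, to a maximum pitch limit, PIT_MAX.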

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

September 13, 2013

Publication Date

March 7, 2017
