Method, Apparatus for Eliminating Popping Sounds at the Beginning of Audio, and Storage Medium

Published: February 5, 2019

Assignee: Not available in the USPTO data

Inventors: Lingcheng KONG

Technical Abstract

Patent Claims

15 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for eliminating popping sounds at beginning of audio, comprising: by a server system, the server system enhancing audio prior to being played and reducing computational complexity associated with enhancing the audio, such that processing capabilities of the server system are enhanced, examining audio frames within a pre-set time period from beginning of audio in a time domain to determine a popping residing section which includes a plurality of audio frames presenting popping sounds; calculating an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; setting amplitudes of the plurality of audio frames in the popping residing section to be zero in response to a determination that the two average values obtained from the calculating are both smaller than a pre-set silencing threshold, wherein setting amplitudes to be zero enables, at least in part, reductions in computational complexity; and reducing the amplitudes of the plurality of audio frames in the popping residing section in response to a determination that both the two average values obtained from the calculating are not smaller than a pre-set silencing threshold; wherein M and K are integers larger than one.

2. The method of claim 1, wherein examining audio frames within a pre-set time period from beginning of audio in a time domain to determine a popping residing section comprises: calculating a short-time energy difference between each pair of adjacent audio frames in turn in a chronological order of audio frames within the pre-set time period from the beginning of the audio in the time domain; and determining a popping start position and a popping end position based on a pre-determined first popping threshold and the calculated short-time energy differences; wherein the popping residing section is a section defined by the popping start position and the popping end position.

3. The method of claim 1, wherein examining audio frames within a pre-set time period from beginning of audio in a time domain to determine a popping residing section comprises: comparing a pre-set popping threshold with a short-time energy of each audio frame in turn in a chronological order of audio frames within the pre-set time period from the beginning of the audio in the time domain, determining a popping start position and a popping end position according to a comparing result; wherein the popping residing section is a section defined by the popping start position and the popping end position.

4. The method of claim 1, wherein reducing the amplitudes of the audio frames in the popping residing section comprises: calculating a reduction coefficient using a largest amplitude of amplitudes of audio frames within the popping residing section and an average value of M audio frames preceding the popping residing section and an average value of K audio frames succeeding the popping residing section; and reducing the amplitudes of the audio frames in the popping residing section using the reduction coefficient.

5. The method of claim 1, wherein reducing the amplitudes of the audio frames in the popping residing section comprises: calculating a reduction coefficient using a largest amplitude of amplitudes of the audio frames within the popping residing section and an average value of amplitudes of audio frames within the popping residing section; and reducing the amplitudes of the audio frames in the popping residing section using the reduction coefficient.

6. An apparatus for eliminating popping sounds at beginning of audio, comprising: a processor and a memory; the memory stores computer-readable instructions executable by the processor to: examine audio frames within a pre-set time period from the beginning of audio in a time domain to obtain a popping residing section which includes a plurality of audio frames presenting popping sounds; and calculate an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; judge whether the two average values are both smaller than a pre-determined silencing threshold, set amplitudes of audio frames within the popping residing section to be zero in response to a determination that both the two average values are smaller than the silencing threshold; or reduce amplitudes of audio frames within the popping residing section in response to a determination that the two average values are not both smaller than the silencing threshold; wherein M and K are integers larger than one, and wherein the apparatus enhances audio prior to being played and reduces computational complexity associated with enhancing the audio, such that processing capabilities of the apparatus are enhanced.

7. The apparatus of claim 6, wherein the instructions are executable by the processor to: calculate a short-time energy difference between each pair of adjacent audio frames in turn in a chronological order of audio frames within the pre-set time period from beginning of audio in a time domain; determine a popping start position and a popping end position based on a pre-determined first popping threshold and the calculated short-time energy differences, wherein the popping residing section is a section defined by the popping start position and the popping end position.

8. The apparatus of claim 6, wherein the instructions are executable by the processor to: compare a pre-set popping threshold with an amplitude of each of audio frames in turn in a chronological order of audio frames within the pre-set time period from beginning of audio in a time domain, determine a popping start position and a popping end position according to a compare result, wherein the popping residing section is a section defined by the popping start position and the popping end position.

9. The apparatus of claim 6, wherein the instructions are executable by the processor to: calculate a reduction coefficient using a largest amplitude of amplitudes of audio frames within the popping residing section and an average value of M audio frames preceding the popping residing section and an average value of M audio frames succeeding the popping residing section; and reduce amplitudes of audio frames within the popping residing section according to the reduction coefficient.

10. The apparatus of claim 6, wherein the instructions are executable by the processor to: calculate a reduction coefficient using a largest amplitude of amplitudes of audio frames within the popping residing section and an average value of amplitudes of audio frames within the popping residing section; and reduce amplitudes of audio frames within the popping residing section according to the reduction coefficient.

11. A non-transitory computer-readable storage medium storing computer-readable instructions that, when executed by a server system of one or more processors, cause the server system to: examine audio frames within a pre-set time period from the beginning of audio in a time domain to obtain a popping residing section which includes a plurality of audio frames presenting popping sounds; and calculate an average value of amplitudes of M audio frames preceding the popping residing section and an average value of amplitudes of K audio frames succeeding the popping residing section; judge whether the two average values are both smaller than a pre-determined silencing threshold, set amplitudes of audio frames within the popping residing section to be zero in response to a determination that both the two average values are smaller than the silencing threshold; or reduce amplitudes of audio frames within the popping residing section in response to a determination that the two average values are not both smaller than the silencing threshold; wherein M and K are integers larger than one, and wherein the server system enhances audio prior to being played and reduces computational complexity associated with enhancing the audio, such that processing capabilities of the server system are enhanced.

12. The non-transitory computer-readable storage medium of claim 11, wherein the instructions are executable by the server system to: calculate a short-time energy difference between each pair of adjacent audio frames in turn in a chronological order of audio frames within the pre-set time period from beginning of audio in a time domain; determine a popping start position and a popping end position based on a pre-determined first popping threshold and the calculated short-time energy differences, wherein the popping residing section is a section defined by the popping start position and the popping end position.

13. The non-transitory computer-readable storage medium of claim 11, wherein the instructions are executable by the server system to: compare a pre-set popping threshold with an amplitude of each of audio frames in turn in a chronological order of audio frames within the pre-set time period from beginning of audio in a time domain, determine a popping start position and a popping end position according to a compare result, wherein the popping residing section is a section defined by the popping start position and the popping end position.

14. The non-transitory computer-readable storage medium of claim 11, wherein the instructions are executable by the server system to: calculate a reduction coefficient using a largest amplitude of amplitudes of audio frames within the popping residing section and an average value of M audio frames preceding the popping residing section and an average value of K audio frames succeeding the popping residing section; and reduce amplitudes of audio frames within the popping residing section according to the reduction coefficient.

15. The non-transitory computer-readable storage medium of claim 11, wherein the instructions are executable by the server system to: calculate a reduction coefficient using a largest amplitude of amplitudes of audio frames within the popping residing section and an average value of amplitudes of audio frames within the popping residing section; and reduce amplitudes of audio frames within the popping residing section according to the reduction coefficient.
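
Claims 1, 6 and 11 share one decision rule: average the amplitudes of the M frames before and the K frames after the popping residing section, set the section to zero if both averages fall below the silencing threshold, and reduce it otherwise. The Python sketch below illustrates that rule under stated assumptions. The claims define neither a frame's "amplitude" nor how much to reduce it, so `frame_amplitude` (mean absolute sample value) and the fixed `gain` are hypothetical choices. Because the section sits at the very beginning of the audio, fewer than M frames may precede it; this sketch averages whatever frames exist and treats an empty neighborhood as silent, which is also an assumption.

```python
from typing import List

Frame = List[float]  # one audio frame as a list of samples


def frame_amplitude(frame: Frame) -> float:
    """Hypothetical per-frame amplitude: mean absolute sample value."""
    return sum(abs(s) for s in frame) / len(frame)


def mean_amplitude(frames: List[Frame]) -> float:
    """Average per-frame amplitude; an empty list counts as silence (0.0)."""
    if not frames:
        return 0.0
    return sum(frame_amplitude(f) for f in frames) / len(frames)


def suppress_pop(frames: List[Frame], start: int, end: int, m: int, k: int,
                 silence_threshold: float, gain: float = 0.1) -> List[Frame]:
    """Mute or attenuate frames[start:end] (end exclusive); returns a copy.

    `gain` is a hypothetical fixed attenuation for the "reduce" branch.
    """
    if m <= 1 or k <= 1:
        raise ValueError("M and K must be integers larger than one")
    avg_before = mean_amplitude(frames[max(0, start - m):start])
    avg_after = mean_amplitude(frames[end:end + k])

    out = [list(f) for f in frames]
    if avg_before < silence_threshold and avg_after < silence_threshold:
        # Both neighborhoods are near-silent: set the section to zero.
        for i in range(start, end):
            out[i] = [0.0] * len(out[i])
    else:
        # Otherwise keep the section audible but attenuate it.
        for i in range(start, end):
            out[i] = [s * gain for s in out[i]]
    return out
```

For two silent frames, a two-frame burst at indices 2 and 3, and three near-silent frames after it, `suppress_pop(frames, 2, 4, m=2, k=3, silence_threshold=0.01)` zeroes frames 2 and 3; if the neighbors were louder than the threshold, the burst would instead be scaled by `gain`.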
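
Claims 2, 7 and 12 locate the popping residing section from the short-time energy differences between adjacent frames, checked against a first popping threshold. The claims do not say exactly how the start and end positions follow from those differences, so the sketch below assumes a simple rule: the section starts at the first frame whose energy rises by more than the threshold over its predecessor and ends (exclusive) at the first later frame whose energy drops by more than the threshold. Short-time energy is taken as the sum of squared samples in a frame, and the pre-set time period is expressed as a hypothetical frame count, `max_frames`.

```python
from typing import List, Optional, Tuple

Frame = List[float]  # one audio frame as a list of samples


def short_time_energy(frame: Frame) -> float:
    """Short-time energy of one frame: sum of squared samples."""
    return sum(s * s for s in frame)


def find_pop_by_energy_diff(frames: List[Frame], max_frames: int,
                            threshold: float) -> Optional[Tuple[int, int]]:
    """Locate a popping section from adjacent-frame energy differences.

    Assumed rule: start at the first sharp energy rise, end (exclusive) at
    the first later sharp drop. Only the first `max_frames` frames are
    examined; returns None if no complete section is found there.
    """
    energies = [short_time_energy(f) for f in frames[:max_frames]]
    start = None
    for i in range(1, len(energies)):
        diff = energies[i] - energies[i - 1]
        if start is None and diff > threshold:
            start = i              # sudden rise: popping start position
        elif start is not None and -diff > threshold:
            return start, i        # sudden fall: popping end position
    return None
```

With six frames `[silence, silence, burst, burst, silence, silence]`, where each burst frame is `[1.0] * 4`, and a threshold of 1.0, the function returns `(2, 4)`, marking frames 2 and 3.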
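
Claim 3 instead compares each frame's short-time energy directly with a pre-set popping threshold, while claims 8 and 13 compare the threshold against each frame's amplitude. The sketch below covers both variants by taking the per-frame measure as a parameter. Treating the first frame above the threshold as the start and the first later frame at or below it as the (exclusive) end is an assumption, since the claims only say the positions are determined from the comparison result. `peak_amplitude` and `max_frames` (the pre-set time period as a frame count) are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[float]  # one audio frame as a list of samples


def short_time_energy(frame: Frame) -> float:
    """Short-time energy of one frame: sum of squared samples."""
    return sum(s * s for s in frame)


def peak_amplitude(frame: Frame) -> float:
    """Hypothetical amplitude measure for the claim 8/13 variant."""
    return max(abs(s) for s in frame)


def find_pop_by_threshold(frames: List[Frame], max_frames: int, threshold: float,
                          measure: Callable[[Frame], float] = short_time_energy
                          ) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first run of frames whose measure exceeds
    `threshold` within the first `max_frames` frames, or None if no run both
    starts and ends inside that window (assumed rule)."""
    start = None
    for i, frame in enumerate(frames[:max_frames]):
        above = measure(frame) > threshold
        if start is None and above:
            start = i              # first frame over the threshold
        elif start is not None and not above:
            return start, i        # first later frame back under it
    return None
```

Passing `measure=peak_amplitude` switches from the energy comparison of claim 3 to the amplitude comparison of claims 8 and 13 without changing the scan itself.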
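
Claims 4, 9 and 14 derive a reduction coefficient from the largest frame amplitude in the popping residing section together with the average amplitudes of the frames before and after it. No formula is given, so the sketch below assumes one: scale the section so that its loudest frame drops to the mean of the two neighborhood averages, capping the coefficient at 1 so the section is never amplified. The amplitude measure (`frame_amplitude`, mean absolute sample value) is likewise an assumption.

```python
from typing import List, Tuple

Frame = List[float]  # one audio frame as a list of samples


def frame_amplitude(frame: Frame) -> float:
    """Hypothetical per-frame amplitude: mean absolute sample value."""
    return sum(abs(s) for s in frame) / len(frame)


def mean_amplitude(frames: List[Frame]) -> float:
    """Average per-frame amplitude; an empty list counts as silence (0.0)."""
    return sum(frame_amplitude(f) for f in frames) / len(frames) if frames else 0.0


def reduce_by_neighbors(frames: List[Frame], start: int, end: int,
                        m: int, k: int) -> Tuple[float, List[Frame]]:
    """Scale frames[start:end] by an assumed neighbor-based coefficient."""
    peak = max(frame_amplitude(f) for f in frames[start:end])
    avg_before = mean_amplitude(frames[max(0, start - m):start])
    avg_after = mean_amplitude(frames[end:end + k])
    reference = (avg_before + avg_after) / 2
    # Assumed formula: bring the loudest frame down to the neighbor level.
    coeff = min(1.0, reference / peak) if peak > 0 else 1.0
    out = [list(f) for f in frames]
    for i in range(start, end):
        out[i] = [s * coeff for s in out[i]]
    return coeff, out
```

For neighbors at amplitude 0.25 and a section whose frames sit at 1.0 and 0.5, the coefficient is 0.25, so the section becomes 0.25 and 0.125. The cap keeps a quiet section from being boosted when its neighbors happen to be louder.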
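
Claims 5, 10 and 15 compute the coefficient from the popping residing section alone: its largest frame amplitude and its average frame amplitude. Assuming the coefficient is their ratio (average divided by peak, which never exceeds 1), the sketch below scales the section so that its loudest frame lands at the section's former average level. Both the ratio and the amplitude measure are assumptions, not formulas taken from the claims.

```python
from typing import List, Tuple

Frame = List[float]  # one audio frame as a list of samples


def frame_amplitude(frame: Frame) -> float:
    """Hypothetical per-frame amplitude: mean absolute sample value."""
    return sum(abs(s) for s in frame) / len(frame)


def reduce_by_section_average(frames: List[Frame], start: int,
                              end: int) -> Tuple[float, List[Frame]]:
    """Scale frames[start:end] by an assumed average-to-peak coefficient."""
    amps = [frame_amplitude(f) for f in frames[start:end]]
    peak = max(amps)
    average = sum(amps) / len(amps)
    # average <= peak, so the coefficient lies in [0, 1] and never amplifies.
    coeff = average / peak if peak > 0 else 1.0
    out = [list(f) for f in frames]
    for i in range(start, end):
        out[i] = [s * coeff for s in out[i]]
    return coeff, out
```

Unlike the neighbor-based variant, this coefficient needs no frames outside the section, which helps when the pop starts at the very first frame and nothing precedes it.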

Patent Metadata

Filing Date

Unknown

