US-9721580

Situation dependent transient suppression

Published: August 1, 2017

Assignee: not available in the USPTO data

Inventors: not available in the USPTO data

Technical Abstract

Provided are methods and systems for providing situation-dependent transient noise suppression for audio signals. Different strategies (e.g., levels of aggressiveness) of transient suppression and signal restoration are applied to audio signals associated with participants in a video/audio conference depending on whether or not each participant is speaking (e.g., whether a voiced segment or an unvoiced/non-speech segment of audio is present). If no participants are speaking or there is an unvoiced/non-speech sound present, a more aggressive strategy for transient suppression and signal restoration is utilized. On the other hand, where voiced audio is detected (e.g., a participant is speaking), the methods and systems apply a softer, less aggressive suppression and restoration process.
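The choice between the two strategies described above can be sketched as a small dispatch on the estimated voice probability. This is only an illustration: the names `SuppressionStrategy`, `choose_strategy`, the 0.5 threshold, and the `max_attenuation` values are assumptions made for the example, not values given in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionStrategy:
    """Hypothetical container for a suppression mode (not from the patent)."""
    name: str
    # Assumed knob: how far a flagged bin may be pulled toward the spectral mean
    # (0.0 = untouched, 1.0 = replaced by the spectral mean).
    max_attenuation: float


SOFT = SuppressionStrategy("soft", 0.5)
AGGRESSIVE = SuppressionStrategy("aggressive", 1.0)


def choose_strategy(voice_probability: float,
                    threshold: float = 0.5) -> SuppressionStrategy:
    """Pick the softer mode when voiced audio is likely, else the aggressive one."""
    if not 0.0 <= voice_probability <= 1.0:
        raise ValueError("voice_probability must lie in [0, 1]")
    # The source does not say what happens exactly at the threshold;
    # this sketch arbitrarily treats that case as non-voiced.
    return SOFT if voice_probability > threshold else AGGRESSIVE


print(choose_strategy(0.9).name)  # voiced segment
print(choose_strategy(0.1).name)  # unvoiced or non-speech segment
```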

Patent Claims

18 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method performed by a teleconference computing device for suppressing transient noise in an audio signal, the method comprising: estimating a voice probability for a segment of the audio signal containing transient noise, the estimated voice probability being a probability that the segment contains voice data; responsive to determining that the estimated voice probability for the segment is greater than a threshold probability, suppressing the transient noise contained in the segment of the audio signal while reducing distortion of the voice data, including: calculating a spectral mean for the audio segment over a plurality of frequency bins of the audio segment, and for each frequency bin of the plurality of frequency bins of the audio segment, if a comparison of a current value of the magnitude of the frequency bin to the spectral mean and to a calculated factor of the spectral mean indicates that transient noise is present, suppressing the transient noise in the frequency bin, wherein the calculated factor of the spectral mean is a fixed spectral weighting that is configured to de-emphasize frequency bins of the plurality of frequency bins corresponding to frequencies at which the voice data is transmitted, wherein suppressing the transient noise includes adjusting the magnitude of the frequency bin to a new value between the spectral mean and the current value of the magnitude of the frequency bin; and responsive to determining that the estimated voice probability for the segment is less than the threshold probability, suppressing the transient noise contained in the segment of the audio signal while not reducing distortion of the voice data, including: calculating a spectral mean for the audio segment over the plurality of frequency bins of the audio segment, and for each frequency bin of the plurality of frequency bins of the audio segment, if a comparison of a magnitude of the frequency bin to the spectral mean indicates that transient noise is present, suppressing the transient noise in the frequency bin, wherein suppressing the transient noise includes adjusting the magnitude of the frequency bin to a new value between the spectral mean and the current value of the magnitude of the frequency bin, wherein the transient noise is at least one of feedback noise, fan noise, and button-clicking noise due to mechanical connection between an audio capture device and a keyboard or trackpad of the teleconferencing computing device.
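The per-bin logic recited in claim 1 can be illustrated with a short sketch. It is one possible reading, not the patented implementation: the function name `suppress_transient`, the 0.5 threshold, the fixed `blend` factor, and the interpretation of the "fixed spectral weighting" as per-bin multipliers greater than 1 in voice bands (so those bins are harder to flag) are all assumptions.

```python
from typing import List, Sequence


def suppress_transient(magnitudes: Sequence[float],
                       voice_probability: float,
                       weights: Sequence[float],
                       threshold: float = 0.5,
                       blend: float = 0.5) -> List[float]:
    """Pull flagged bins toward the spectral mean (illustrative sketch only)."""
    if not magnitudes or len(magnitudes) != len(weights):
        raise ValueError("magnitudes and weights must be non-empty and equal length")
    spectral_mean = sum(magnitudes) / len(magnitudes)
    voiced = voice_probability > threshold
    result = []
    for mag, weight in zip(magnitudes, weights):
        if voiced:
            # Softer branch: the bin must exceed both the spectral mean and
            # the weighted spectral mean before it is treated as transient.
            limit = max(spectral_mean, weight * spectral_mean)
        else:
            # Aggressive branch: compare against the spectral mean alone.
            limit = spectral_mean
        if mag > limit:
            # New value lies between the spectral mean and the current value
            # for any blend strictly between 0 and 1.
            mag = spectral_mean + blend * (mag - spectral_mean)
        result.append(mag)
    return result


frame = [1.0, 1.0, 10.0, 1.0]      # bin 2 carries a loud peak
weights = [1.0, 1.0, 4.0, 1.0]     # assumed: bin 2 lies in a voice band
print(suppress_transient(frame, 0.9, weights))  # voiced: peak is kept
print(suppress_transient(frame, 0.1, weights))  # non-voiced: peak is reduced
```

With these inputs the spectral mean is 3.25, so the voiced branch leaves the peak alone (10 is below the weighted limit of 13) while the non-voiced branch moves it halfway toward the mean.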

2. The method of claim 1, wherein the estimated voice probability is based on voicing information received from a pitch estimator.
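The claims do not say how the pitch estimator produces its voicing information. As a rough illustration, a common approach is to take the peak of the normalized autocorrelation over plausible pitch lags as a voicing score. The function name `voicing_score`, the 60 to 400 Hz pitch range, and the use of this score as a stand-in for a voice probability are assumptions for the example.

```python
import math
from typing import Sequence


def voicing_score(frame: Sequence[float], sample_rate: int,
                  f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Peak normalized autocorrelation over candidate pitch lags, clipped to [0, 1]."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silence: nothing periodic to find
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(len(frame) - 1, int(sample_rate / f_min))
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, corr / energy)
    return min(1.0, max(0.0, best))


# A 100 Hz tone sampled at 8 kHz is strongly periodic, so it scores high.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
print(voicing_score(tone, 8000))
print(voicing_score([0.0] * 400, 8000))
```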

3. The method of claim 1, wherein estimating the voice probability for the segment of the audio signal includes identifying regions of the segment containing voiced speech.

4. The method of claim 3, wherein identifying regions of the segment containing voiced speech includes identifying regions of the segment where the vocal folds are vibrating.

5. The method of claim 1 further comprising: in response to the comparison of the magnitude of the frequency bin to the spectral mean and to the calculated factor of the spectral mean satisfying a first condition, calculating a new magnitude for the frequency bin; and in response to the comparison of the magnitude of the frequency bin to the spectral mean and to the calculated factor of the spectral mean satisfying a second condition, maintaining the magnitude for the frequency bin, wherein the first condition is different from the second condition.

6. The method of claim 1 further comprising: in response to the comparison of the magnitude of the frequency bin to the spectral mean satisfying a first condition, calculating a new magnitude for the frequency bin; and in response to the comparison of the magnitude of the frequency bin to the spectral mean satisfying a second condition, maintaining the magnitude for the frequency bin, wherein the first condition is different from the second condition.

7. The method of claim 5, wherein the new magnitude for the frequency bin is calculated based on the previous magnitude, the spectral mean, and an estimated probability that a transient noise is present in the audio segment.

8. The method of claim 6, wherein the new magnitude for the frequency bin is calculated based on the previous magnitude, the spectral mean, and an estimated probability that a transient noise is present in the audio segment.
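Claims 7 and 8 name the three inputs to the new magnitude but not the formula that combines them. One simple form consistent with the claim language is a linear interpolation weighted by the transient probability. The symbols below are assumptions for illustration: $|X_k|$ is the previous magnitude of bin $k$, $\mu$ is the spectral mean over the $K$ bins, and $p_T \in [0, 1]$ is the estimated probability that a transient is present in the segment.

```latex
\mu = \frac{1}{K} \sum_{k=1}^{K} |X_k|,
\qquad
|\hat{X}_k| = (1 - p_T)\,|X_k| + p_T\,\mu
```

Under this assumed form, $p_T = 0$ leaves the bin unchanged, $p_T = 1$ replaces it with the spectral mean, and any intermediate value yields a magnitude between the spectral mean and the current value, as the independent claims require.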

9. A teleconferencing computing system for suppressing transient noise in an audio signal, the system comprising: at least one processor; and a non-transitory computer-readable medium coupled to the at least one processor having instructions stored thereon which, when executed by the at least one processor, causes the at least one processor to: estimate a voice probability for a segment of the audio signal containing transient noise, the estimated voice probability being a probability that the segment contains voice data; responsive to determining that the estimated voice probability for the segment is greater than a threshold probability, suppress the transient noise contained in the segment of the audio signal while reducing distortion of the voice data, including: calculating a spectral mean for the audio segment over a plurality of frequency bins of the audio segment, and for each frequency bin of the plurality of frequency bins of the audio segment, if a comparison of a current value of the magnitude of the frequency bin to the spectral mean and to a calculated factor of the spectral mean indicates that transient noise is present, suppressing the transient noise in the frequency bin, wherein the calculated factor of the spectral mean is a fixed spectral weighting that is configured to de-emphasize frequency bins of the plurality of frequency bins corresponding to frequencies at which the voice data is transmitted, wherein suppressing the transient noise includes adjusting the magnitude of the frequency bin to a new value between the spectral mean and the current value of the magnitude of the frequency bin; and responsive to determining that the estimated voice probability for the segment is less than the threshold probability, suppress the transient noise contained in the segment of the audio signal while not reducing distortion of the voice data, including: calculating a spectral mean for the audio segment over a plurality of frequency bins of the audio segment, and for each frequency bin of the plurality of frequency bins of the audio segment, if a comparison of a magnitude of the frequency bin to the spectral mean indicates that transient noise is present, suppress the transient noise in the frequency bin, wherein suppressing the transient noise includes adjusting the magnitude of the frequency bin to a new value between the spectral mean and the current value of the magnitude of the frequency bin, wherein the transient noise is at least one of feedback noise, fan noise, and button-clicking noise due to mechanical connection between an audio capture device and a keyboard or trackpad of the teleconferencing computing device.

10. The system of claim 9, the estimated voice probability is based on voicing information received from a pitch estimator.

11. The system of claim 9, wherein the at least one processor is further caused to: identify regions of the segment where the vocal folds are vibrating; and determine that the regions of the segment where the vocal folds are vibrating are regions containing voiced speech.

12. The system of claim 9, wherein the at least one processor is further caused to: in response to the comparison of the magnitude of the frequency bin to the spectral mean and to the calculated factor of the spectral mean satisfying a first condition, calculate a new magnitude for the frequency bin; and in response to the comparison of the magnitude of the frequency bin to the spectral mean and to the calculated factor of the spectral mean satisfying a second condition, maintain the magnitude for the frequency bin, wherein the first condition is different from the second condition.

13. The system of claim 9, wherein the at least one processor is further caused to: in response to the comparison of the magnitude of the frequency bin to the spectral mean satisfying a first condition, calculate a new magnitude for the frequency bin; and in response to the comparison of the magnitude of the frequency bin to the spectral mean satisfying a second condition, maintain the magnitude for the frequency bin, wherein the first condition is different from the second condition.

14. The system of claim 12, wherein the at least one processor is further caused to: calculate the new magnitude for the frequency bin based on the previous magnitude, the spectral mean, and an estimated probability that a transient noise is present in the audio segment.

15. The system of claim 13, wherein the at least one processor is further caused to: calculate the new magnitude for the frequency bin based on the previous magnitude, the spectral mean, and an estimated probability that a transient noise is present in the audio segment.

16. A method performed by a teleconference computing device for suppressing transient noise in an audio signal, the method comprising: estimating a voice probability for a segment of the audio signal containing transient noise, the estimated voice probability being a probability that the segment contains voice data; responsive to determining that the estimated voice probability for the segment is greater than a threshold probability, suppressing the transient noise contained in the segment of the audio signal while reducing distortion of the voice data, including: calculating a spectral mean for the audio segment over a plurality of frequency bins of the audio segment, and for each frequency bin of the plurality of frequency bins of the audio segment, if a comparison of a current value of the magnitude of the frequency bin to the spectral mean and to a calculated factor of the spectral mean indicates that transient noise is present, suppressing the transient noise in the frequency bin, wherein the calculated factor of the spectral mean is a fixed spectral weighting that is configured to de-emphasize frequency bins of the plurality of frequency bins corresponding to frequencies at which the voice data is transmitted, wherein suppressing the transient noise includes adjusting the magnitude of the frequency bin to a new value between the spectral mean and the current value of the magnitude of the frequency bin; and responsive to determining that the estimated voice probability for the segment is less than the threshold probability, suppressing the transient noise contained in the segment of the audio signal while not reducing distortion of the voice data, including: calculating a spectral mean for the audio segment over the plurality of frequency bins of the audio segment, and for each frequency bin of the plurality of frequency bins of the audio segment, if a comparison of a magnitude of the frequency bin to the spectral mean indicates that transient noise is present, suppressing the transient noise in the frequency bin, wherein suppressing the transient noise includes adjusting the magnitude of the frequency bin to a new value between the spectral mean and the current value of the magnitude of the frequency bin, wherein the transient noise is at least one of feedback noise, fan noise, and button-clicking noise due to mechanical connection between an audio capture device and a keyboard or trackpad of the teleconferencing computing device.

17. The method of claim 16, further comprising: in response to the comparison of the magnitude of the frequency bin to the spectral mean and to the calculated factor of the spectral mean satisfying a first condition, calculating a new magnitude for the frequency bin; and in response to the comparison of the magnitude of the frequency bin to the spectral mean and to the calculated factor of the spectral mean satisfying a second condition, maintaining the magnitude for the frequency bin, wherein the first condition is different from the second condition.

18. The method of claim 16, further comprising: in response to the comparison of the magnitude of the frequency bin to the spectral mean satisfying a first condition, calculating a new magnitude for the frequency bin; and in response to the comparison of the magnitude of the frequency bin to the spectral mean satisfying a second condition, maintaining the magnitude for the frequency bin, wherein the first condition is different from the second condition.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G10L

Patent Metadata

Filing Date

March 31, 2014

Publication Date

August 1, 2017
