Sibilance Detection and Mitigation

Published: December 15, 2020

Assignee: not available in the USPTO data we have

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of sibilance detection and mitigation, comprising: extracting a predetermined spectrum feature representing a distribution of signal energy over a voice frequency band from a voice signal; determining a binary voice indicator from the voice signal, the binary voice indicator indicating whether active voice is present in the voice signal; in response to determining the binary voice indicator indicating that the active voice is present in the voice signal, performing: identifying sibilance based on the predetermined spectrum feature; determining whether the identified sibilance is an excessive sibilance based on comparing a level of the identified sibilance in a current frame with a long-term non-sibilance level estimated based on levels of non-sibilance voice in a plurality of frames; and in response to determining that the identified sibilance is an excessive sibilance based on comparing the level of the identified sibilance in the current frame with the long-term non-sibilance level estimated based on the levels of the non-sibilance voice in the plurality of frames, processing the voice signal by decreasing a level of the excessive sibilance so as to suppress the excessive sibilance.

2. The method of claim 1, wherein the identifying sibilance based on the predetermined spectrum feature comprises: classifying the voice signal into a sibilance voice, a non-sibilance voice, or a noise or silence based on the predetermined spectrum feature and the binary voice indicator; and/or wherein the processing the voice signal comprises: processing the voice signal after an automatic gain control is performed on the voice signal.

3. The method of claim 1, further comprising any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, wherein the identifying excessive sibilance based on a level of the identified sibilance comprises: determining whether the identified sibilance is excessive sibilance based on any of the ratio and the peaky degree.

4. The method of claim 1, wherein the processing the voice signal comprises: processing the voice signal according to a sibilance suppression curve, wherein the sibilance is suppressed only when its level is higher than a predetermined level threshold.

5. The method of claim 4, wherein the sibilance suppression curve is an S-shape curve and wherein the sibilance is suppressed linearly or non-linearly when its level is higher than the predetermined level threshold but lower than another predetermined level threshold that is higher than the predetermined level threshold, and wherein the sibilance is suppressed by a predetermined suppression amount when its level is higher than the other predetermined level threshold.

6. The method of claim 5, further comprising any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, wherein the method further comprises: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree, and wherein the controlling an operating mode in which the sibilance is suppressed comprises any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.

7. The method of claim 1, wherein the processing the voice signal comprises: processing the voice signal according to a sibilance suppression curve, wherein the sibilance is suppressed by a predetermined suppression amount when its level is higher than a predetermined level threshold; wherein the method further comprises any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band; wherein the method further comprises: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree, and wherein the controlling an operating mode in which the sibilance is suppressed comprises any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.

8. The method of claim 1, wherein the predetermined spectrum feature comprises any of: a ratio of signal energy in a sibilance frequency band to signal energy in the voice frequency band; a ratio of signal energy in the sibilance frequency band to signal energy in a non-sibilance frequency band; a ratio of signal-to-noise ratio (SNR) in the sibilance frequency band and SNR in the non-sibilance frequency band; a spectrum centroid indicating a frequency position at which a center of mass of the spectrum is located; and a spectrum flux in the sibilance frequency band.

9. The method of claim 3, wherein the peaky degree of the sibilance spectrum is determined based on any of: geometric mean and arithmetic mean of banded energies in the voice frequency band; a variance of adjacent banded energies in the sibilance frequency band; a standard deviation of adjacent banded energies in the sibilance frequency band; a sum of differences among banded energies in the sibilance frequency band; a maximum of differences among banded energies in the sibilance frequency band; a crest factor of banded energies in the sibilance frequency band; and spectral entropy in the voice frequency band.

10. A system of sibilance detection and mitigation, comprising: one or more processors; a non-transitory computer-readable medium storing a sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to perform: extracting a predetermined spectrum feature representing a distribution of signal energy over a voice frequency band from a voice signal; determining a binary voice indicator from the voice signal, the binary voice indicator indicating whether active voice is present in the voice signal; in response to determining the binary voice indicator indicating that the active voice is present in the voice signal, performing: identifying sibilance based on the predetermined spectrum feature; determining whether the identified sibilance is an excessive sibilance based on comparing a level of the identified sibilance in a current frame with a long-term non-sibilance level estimated based on levels of non-sibilance voice in a plurality of frames; and in response to determining that the identified sibilance is an excessive sibilance based on comparing the level of the identified sibilance in the current frame with the long-term non-sibilance level estimated based on the levels of the non-sibilance voice in the plurality of frames, processing the voice signal by decreasing a level of the excessive sibilance so as to suppress the excessive sibilance.

11. The system of claim 10, wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform classifying the voice signal into a sibilance voice, a non-sibilance voice, or a noise or silence based on the predetermined spectrum feature and binary voice indicator; and/or processing the voice signal after an automatic gain control is performed on the voice signal.

12. The system of claim 10, wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, determining whether the identified sibilance is excessive sibilance based on any of the ratio or the peaky degree.

13. The system of claim 10, wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform processing the voice signal according to a sibilance suppression curve, and suppressing the sibilance only when its level is higher than a predetermined level threshold.

14. The system of claim 13, wherein the sibilance suppression curve is an S-shape curve and wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform suppressing the sibilance linearly or non-linearly when its level is higher than the predetermined level threshold but lower than another predetermined level threshold that is higher than the predetermined level threshold, and suppressing the sibilance by a predetermined suppression amount when its level is higher than the other predetermined level threshold.

15. The system of claim 14, wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree, and controlling the operating mode by any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.

16. The system of claim 10, wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform: processing the voice signal according to a sibilance suppression curve; suppressing the sibilance by a predetermined suppression amount when its level is higher than a predetermined level threshold; wherein the system further comprises any of: a level ratio determiner that determines a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and a peaky degree determiner that determines a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band; wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform any of: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree; and controlling the operating mode by any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.

17. The system of claim 10, wherein the predetermined spectrum feature comprises any of: a ratio of signal energy in a sibilance frequency band to signal energy in the voice frequency band; a ratio of signal energy in the sibilance frequency band to signal energy in a non-sibilance frequency band; a ratio of signal-to-noise ratio (SNR) in the sibilance frequency band and SNR in the non-sibilance frequency band; a spectrum centroid indicating a frequency position at which a center of mass of the spectrum is located; and a spectrum flux in the sibilance frequency band.

18. The system of claim 12, wherein the peaky degree of the sibilance spectrum is determined based on any of: geometric mean and arithmetic mean of banded energies in the voice frequency band; a variance of adjacent banded energies in the sibilance frequency band; a standard deviation of adjacent banded energies in the sibilance frequency band; a sum of differences among banded energies in the sibilance frequency band; a maximum of differences among banded energies in the sibilance frequency band; a crest factor of banded energies in the sibilance frequency band; and spectral entropy in the voice frequency band.

19. A non-transitory computer-readable medium storing a sequence of computing instructions, which when executed by one or more processors, causes the one or more processors to perform steps of the method according to claim 1.
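
Illustrative sketch for claim 1 (not part of the claims). The per-frame flow recited in claim 1 (voice gate, sibilance identification, comparison against a long-term non-sibilance level, then attenuation) might look roughly like the following. The class name, the feature threshold, the 6 dB margin, the exponential smoothing, and the rule that attenuates only the portion above the margin are all assumptions made for illustration; the claims do not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SibilanceMitigator:
    # Every constant here is a hypothetical value chosen for illustration.
    feature_threshold: float = 0.5   # spectrum feature at or above this => sibilance
    excess_margin_db: float = 6.0    # sibilance this far above long-term level => excessive
    smoothing: float = 0.9           # exponential smoothing factor for the long-term level
    long_term_db: Optional[float] = None  # long-term non-sibilance level (None until seen)

    def process_frame(self, voice_active: bool, feature: float, level_db: float) -> float:
        """Return the gain in dB (<= 0) to apply to the sibilance in this frame."""
        if not voice_active:
            # Binary voice indicator says no active voice: do nothing.
            return 0.0
        if feature < self.feature_threshold:
            # Non-sibilance voice: update the long-term level over many frames.
            if self.long_term_db is None:
                self.long_term_db = level_db
            else:
                self.long_term_db = (self.smoothing * self.long_term_db
                                     + (1.0 - self.smoothing) * level_db)
            return 0.0
        if self.long_term_db is None:
            # No non-sibilance reference yet, so excess cannot be judged.
            return 0.0
        # Sibilance: compare the current level with the long-term reference.
        excess_db = level_db - self.long_term_db - self.excess_margin_db
        return -excess_db if excess_db > 0.0 else 0.0


mitigator = SibilanceMitigator()
mitigator.process_frame(True, 0.1, -30.0)          # vowel-like frame sets the reference
gain_db = mitigator.process_frame(True, 0.9, -20.0)  # loud sibilant frame is attenuated
```

With these assumed settings, a sibilant frame 10 dB above the reference exceeds the 6 dB margin by 4 dB and is turned down by that amount; a sibilant frame within the margin is left untouched.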
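
Illustrative sketch for claims 4 and 5 (not part of the claims). The S-shape suppression curve can be read as three regions: no suppression at or below a lower threshold, a linear or non-linear ramp between the two thresholds, and a fixed suppression amount at or above the upper threshold. The function name, the default thresholds, the 10 dB amount, and the use of a smoothstep polynomial for the non-linear ramp are assumptions, not values from the patent.

```python
def suppression_db(level_db: float,
                   low_db: float = -30.0,
                   high_db: float = -10.0,
                   max_suppression_db: float = 10.0,
                   smooth: bool = True) -> float:
    """Suppression amount in dB (>= 0) for a given sibilance level."""
    if high_db <= low_db:
        raise ValueError("high_db must be greater than low_db")
    if level_db <= low_db:
        return 0.0                      # at or below the first threshold: untouched
    if level_db >= high_db:
        return max_suppression_db       # at or above the second threshold: fixed amount
    # Position between the two thresholds, in the open interval (0, 1).
    t = (level_db - low_db) / (high_db - low_db)
    if smooth:
        t = t * t * (3.0 - 2.0 * t)     # smoothstep gives the S shape (non-linear ramp)
    return max_suppression_db * t       # smooth=False gives the linear ramp


# Sample the curve every 5 dB from -40 dB to 0 dB.
curve = [suppression_db(level) for level in range(-40, 5, 5)]
```

Claim 6's "operating mode" control could then be modelled as changing `max_suppression_db`, or both `max_suppression_db` and `high_db`, from frame to frame; how those adjustments are derived from the ratio or the peaky degree is not stated in the claims.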
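
Illustrative sketch for claims 8 and 17 (not part of the claims). Three of the listed spectrum features (the sibilance-to-voice energy ratio, the spectrum centroid, and the spectrum flux in the sibilance band) can be computed from a per-frame power spectrum as below. The band edges of 4 to 10 kHz for sibilance and 100 Hz to 10 kHz for voice, the squared-difference form of the flux, and all function names are assumptions; the claims leave these choices open.

```python
from typing import Sequence, Tuple

Band = Tuple[float, float]

SIBILANCE_BAND: Band = (4000.0, 10000.0)  # assumed band edges in Hz
VOICE_BAND: Band = (100.0, 10000.0)       # assumed band edges in Hz


def band_energy(power: Sequence[float], freqs: Sequence[float], band: Band) -> float:
    """Sum of power in bins whose centre frequency lies in [band[0], band[1])."""
    lo, hi = band
    return sum(p for p, f in zip(power, freqs) if lo <= f < hi)


def sibilance_to_voice_ratio(power: Sequence[float], freqs: Sequence[float]) -> float:
    """Energy in the sibilance band divided by energy in the voice band."""
    voice = band_energy(power, freqs, VOICE_BAND)
    return band_energy(power, freqs, SIBILANCE_BAND) / voice if voice > 0.0 else 0.0


def spectral_centroid(power: Sequence[float], freqs: Sequence[float]) -> float:
    """Power-weighted mean frequency (the spectrum's centre of mass) in Hz."""
    total = sum(power)
    return sum(p * f for p, f in zip(power, freqs)) / total if total > 0.0 else 0.0


def sibilance_flux(power: Sequence[float], prev_power: Sequence[float],
                   freqs: Sequence[float]) -> float:
    """Sum of squared frame-to-frame power changes inside the sibilance band."""
    lo, hi = SIBILANCE_BAND
    return sum((p - q) ** 2
               for p, q, f in zip(power, prev_power, freqs) if lo <= f < hi)


freqs = [1000.0, 2000.0, 5000.0, 8000.0]   # toy four-bin spectrum
prev_power = [1.0, 1.0, 1.0, 1.0]
power = [1.0, 1.0, 3.0, 3.0]
ratio = sibilance_to_voice_ratio(power, freqs)
centroid_hz = spectral_centroid(power, freqs)
flux = sibilance_flux(power, prev_power, freqs)
```

In this toy frame most of the energy sits in the upper two bins, so the ratio and the centroid are both high, which is the kind of evidence a classifier could threshold to label the frame as sibilance.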
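
Illustrative sketch for claims 9 and 18 (not part of the claims). Several of the listed "peaky degree" measures reduce to short formulas over the banded energies. The example below shows three: the ratio of geometric mean to arithmetic mean (often called spectral flatness), a crest factor taken here as peak over mean, and the maximum difference between adjacent bands. The function names and the exact definitions (for instance peak over mean rather than peak over RMS) are assumptions.

```python
import math
from typing import Sequence


def spectral_flatness(band_energies: Sequence[float]) -> float:
    """Geometric mean over arithmetic mean: near 1 for flat spectra, near 0 for peaky ones."""
    n = len(band_energies)
    if n == 0 or min(band_energies) <= 0.0:
        return 0.0  # flatness is undefined for empty input or non-positive energies
    arithmetic = sum(band_energies) / n
    geometric = math.exp(sum(math.log(e) for e in band_energies) / n)
    return geometric / arithmetic


def crest_factor(band_energies: Sequence[float]) -> float:
    """Largest banded energy divided by the mean banded energy (assumed definition)."""
    n = len(band_energies)
    if n == 0:
        return 0.0
    mean = sum(band_energies) / n
    return max(band_energies) / mean if mean > 0.0 else 0.0


def max_adjacent_difference(band_energies: Sequence[float]) -> float:
    """Largest absolute difference between neighbouring bands."""
    return max((abs(b - a) for a, b in zip(band_energies, band_energies[1:])),
               default=0.0)


flat_bands = [2.0, 2.0, 2.0, 2.0]    # evenly spread energy
peaky_bands = [1.0, 1.0, 1.0, 5.0]   # energy concentrated in one band
```

A flat set of bands scores a flatness close to 1 and a crest factor of 1, while the peaky set scores lower flatness and a higher crest factor, so either measure could feed the excessive-sibilance decision or the operating-mode control.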

Patent Metadata

Filing Date

Unknown

Publication Date

December 15, 2020

Inventors

Kai LI

David GUNAWAN
