Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method of sibilance detection and mitigation, comprising: extracting a predetermined spectrum feature representing a distribution of signal energy over a voice frequency band from a voice signal; determining a binary voice indicator from the voice signal, the binary voice indicator indicating whether active voice is present in the voice signal; in response to determining the binary voice indicator indicating that the active voice is present in the voice signal, performing: identifying sibilance based on the predetermined spectrum feature; determining whether the identified sibilance is an excessive sibilance based on comparing a level of the identified sibilance in a current frame with a long-term non-sibilance level estimated based on levels of non-sibilance voice in a plurality of frames; and in response to determining that the identified sibilance is an excessive sibilance based on comparing the level of the identified sibilance in the current frame with the long-term non-sibilance level estimated based on the levels of the non-sibilance voice in the plurality of frames, processing the voice signal by decreasing a level of the excessive sibilance so as to suppress the excessive sibilance.
Audio processing for speech enhancement. This invention addresses excessive sibilance, the harsh "s" or "sh" sounds in speech, by detecting and reducing it. The method analyzes a voice signal to extract a spectrum feature that describes how signal energy is distributed across the voice frequency band. It also derives a binary indicator of whether active speech is present in the signal. When active speech is present, the method identifies sibilance using the extracted spectrum feature. It then decides whether that sibilance is excessive by comparing its level in the current frame with a long-term non-sibilance level, which is estimated from the levels of non-sibilant speech across multiple frames. If the sibilance is deemed excessive, its level is decreased to suppress the harsh sound.
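The order of decisions in claim 1 can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the function name, the feature threshold, the excess margin, and the fixed reduction are hypothetical values that do not come from the claim, which specifies neither units nor numbers.

```python
def sibilance_gain_db(voice_active, spectrum_feature, sib_level_db,
                      long_term_nonsib_db, feature_threshold=0.5,
                      excess_margin_db=0.0, reduction_db=6.0):
    """Return the gain in dB (zero or negative) for the sibilance in one frame.

    All default values are illustrative assumptions, not taken from the claim.
    """
    # Step 1: the binary voice indicator gates every later step.
    if not voice_active:
        return 0.0
    # Step 2: identify sibilance from the spectrum feature (assumed rule:
    # the feature exceeds a fixed threshold).
    if spectrum_feature <= feature_threshold:
        return 0.0
    # Step 3: sibilance is excessive only if the current-frame level exceeds
    # the long-term non-sibilance level by more than the assumed margin.
    if sib_level_db - long_term_nonsib_db <= excess_margin_db:
        return 0.0
    # Step 4: decrease the level of the excessive sibilance.
    return -reduction_db
```

For example, a sibilant frame at -10 dB against a long-term non-sibilance level of -30 dB would be attenuated, while the same frame at -40 dB would pass unchanged.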
2. The method of claim 1 , wherein the identifying sibilance based on the predetermined spectrum feature comprises: classifying the voice signal into a sibilance voice, a non-sibilance voice, or a noise or silence based on the predetermined spectrum feature and the binary voice indicator; and/or wherein the processing the voice signal comprises: processing the voice signal after an automatic gain control is performed on the voice signal.
This claim narrows claim 1 in one or both of two ways. First, identifying sibilance may consist of classifying the voice signal into one of three categories: sibilance voice, non-sibilance voice, or noise/silence. The classification relies on both the predetermined spectrum feature and the binary voice indicator that distinguishes speech from non-speech. Second, the sibilance processing may be applied after automatic gain control (AGC) has been performed on the voice signal. AGC adjusts the signal's amplitude toward a consistent level, so the suppression step then operates on a signal whose level has already been adjusted. The claim says "and/or", so either limitation alone, or both together, falls within its scope. The suppression itself is the level reduction already recited in claim 1.
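A minimal sketch of the three-way classification, assuming the spectrum feature is a single number and that a fixed threshold separates sibilant from non-sibilant voice. The names `FrameClass`, `classify_frame`, and `feature_threshold` are hypothetical; the claim does not say how the two inputs are combined.

```python
from enum import Enum


class FrameClass(Enum):
    SIBILANCE = "sibilance voice"
    NON_SIBILANCE = "non-sibilance voice"
    NOISE_OR_SILENCE = "noise or silence"


def classify_frame(voice_active, spectrum_feature, feature_threshold=0.5):
    """Classify one frame from the binary voice indicator and the feature."""
    # Without active voice the frame is noise or silence, whatever the feature.
    if not voice_active:
        return FrameClass.NOISE_OR_SILENCE
    # Assumed rule: a large feature value indicates sibilant voice.
    if spectrum_feature > feature_threshold:
        return FrameClass.SIBILANCE
    return FrameClass.NON_SIBILANCE
```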
3. The method of claim 1 , further comprising any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, wherein the identifying excessive sibilance based on a level of the identified sibilance comprises: determining whether the identified sibilance is excessive sibilance based on any of the ratio and the peaky degree.
This invention relates to audio processing, specifically methods for detecting and managing excessive sibilance in speech signals. Sibilance refers to high-frequency sounds produced by consonants like "s" and "sh," which can become harsh or unnatural when amplified or processed. The problem addressed is the need to accurately identify and mitigate excessive sibilance to improve audio quality in applications such as voice communication, recording, and playback systems. The method involves analyzing an audio signal to identify sibilance by detecting high-frequency components characteristic of sibilant sounds. To determine if the sibilance is excessive, the method calculates a ratio of the sibilance level to a long-term level of non-sibilant voices, providing a relative measure of prominence. Additionally, the method assesses the "peaky degree" of the sibilance spectrum by evaluating banded energies within the sibilance frequency range, which helps distinguish between natural and exaggerated sibilance. The decision to classify sibilance as excessive is based on either the ratio or the peaky degree, or both, ensuring precise detection. This approach enables adaptive processing to reduce or normalize excessive sibilance while preserving natural speech characteristics. The technique is useful in audio enhancement systems, noise reduction algorithms, and real-time communication devices where clear and pleasant speech quality is critical.
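The "any of" decision in claim 3 can be illustrated as below. This is a sketch under assumptions: levels are treated as energies, the ratio is expressed in dB, and both thresholds are hypothetical values chosen for the example. The peaky degree is taken as an already-computed number.

```python
import math


def level_ratio_db(sib_energy, long_term_nonsib_energy, eps=1e-12):
    """Ratio of sibilance level to long-term non-sibilance level, in dB."""
    return 10.0 * math.log10((sib_energy + eps) / (long_term_nonsib_energy + eps))


def is_excessive(ratio_db=None, peaky_degree=None,
                 ratio_threshold_db=0.0, peaky_threshold=3.0):
    """Decide on whichever measurements are available (thresholds assumed)."""
    checks = []
    if ratio_db is not None:
        checks.append(ratio_db > ratio_threshold_db)
    if peaky_degree is not None:
        checks.append(peaky_degree > peaky_threshold)
    # Excessive if any supplied measurement exceeds its threshold.
    return any(checks)
```

Treating either measurement as sufficient is one reading of "any of the ratio and the peaky degree"; an implementation could equally require both.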
4. The method of claim 1 , wherein the processing the voice signal comprises: processing the voice signal according to a sibilance suppression curve, wherein the sibilance is suppressed only when its level is higher than a predetermined level threshold.
This invention relates to audio signal processing, specifically to methods for reducing sibilance in voice signals. Sibilance refers to the harsh, high-frequency sounds produced by consonants like "s" and "sh," which can become overly prominent in recorded or amplified speech, causing listener fatigue. The invention addresses this problem by selectively suppressing sibilance only when its level exceeds a predetermined threshold, ensuring natural-sounding speech while mitigating excessive harshness. The method processes a voice signal by applying a sibilance suppression curve, which attenuates high-frequency components associated with sibilant sounds. The suppression is dynamically adjusted based on the signal's amplitude, activating only when the sibilance level surpasses a predefined threshold. This threshold-based approach prevents over-processing of normal speech, maintaining clarity while reducing distortion. The technique may be implemented in real-time audio systems, such as voice communication devices, recording equipment, or speech enhancement algorithms, to improve listening comfort without altering the natural timbre of the voice. The invention ensures that suppression is applied precisely where needed, avoiding unnecessary attenuation of other high-frequency elements in the speech signal.
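One simple curve that satisfies the "only above a threshold" condition is a compressor-style mapping from input level to output level. The claim does not specify the curve's shape, so the function below, its threshold, and its ratio are all assumptions made for illustration.

```python
def suppress_above_threshold(level_db, threshold_db=-30.0, ratio=2.0):
    """Map an input sibilance level (dB) to an output level (dB).

    At or below the threshold the level passes unchanged; above it, the
    excess over the threshold is divided by `ratio` (assumed value).
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio
```

With these defaults, a level of -20 dB is reduced to -25 dB, while a level of -40 dB is untouched.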
5. The method of claim 4 , wherein the sibilance suppression curve is an S-shape curve and wherein the sibilance is suppressed linearly or non-linearly when its level is higher than the predetermined level threshold but lower than another predetermined level threshold that is higher than the predetermined level threshold, and wherein the sibilance is suppressed by a predetermined suppression amount when its level is higher than the other predetermined level threshold.
This invention relates to audio signal processing, specifically methods for suppressing sibilance in audio signals. Sibilance refers to the harsh, high-frequency sounds produced by certain speech consonants, such as "s" and "sh," which can become overly prominent in recorded or amplified audio. The problem addressed is the need to reduce sibilance while preserving natural speech clarity and avoiding excessive distortion. The method involves applying a sibilance suppression curve to an audio signal, where the curve is shaped like an S-curve. The suppression is applied in stages based on the level of the sibilance. When the sibilance level is between a lower threshold and a higher threshold, the suppression is applied either linearly or non-linearly. If the sibilance level exceeds the higher threshold, a fixed suppression amount is applied. This multi-stage approach ensures that sibilance is reduced progressively, preventing abrupt changes in the audio signal while maintaining natural speech characteristics. The technique is particularly useful in applications like voice recording, broadcasting, and audio enhancement systems where clear and pleasant speech reproduction is critical.
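The three regions of the S-shaped curve can be sketched as a function returning the suppression amount in dB. The threshold values, the maximum suppression, and the choice of a smoothstep polynomial for the non-linear option are assumptions; the claim only requires linear or non-linear suppression between the two thresholds and a predetermined amount above the higher one.

```python
def s_curve_suppression_db(level_db, low_threshold_db=-30.0,
                           high_threshold_db=-20.0,
                           max_suppression_db=10.0, smooth=True):
    """Return how many dB to suppress for a given sibilance level."""
    # Region 1: at or below the lower threshold, no suppression.
    if level_db <= low_threshold_db:
        return 0.0
    # Region 3: at or above the higher threshold, the fixed amount applies.
    if level_db >= high_threshold_db:
        return max_suppression_db
    # Region 2: between the thresholds, ramp from 0 to the fixed amount.
    t = (level_db - low_threshold_db) / (high_threshold_db - low_threshold_db)
    if smooth:
        # Smoothstep: one possible non-linear ramp with flat ends.
        t = t * t * (3.0 - 2.0 * t)
    return max_suppression_db * t
```

The smoothstep option joins the flat regions without a corner, which is one way to obtain the gradual transition the S shape suggests.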
6. The method of claim 5 , further comprising any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, wherein the method further comprises: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree, and wherein the controlling an operating mode in which the sibilance is suppressed comprises any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.
This claim builds on the S-shaped suppression curve of claim 5 by making the curve adaptive. The method computes one or both of two measurements: the ratio of the identified sibilance level to a long-term level of non-sibilant voice, and the "peaky degree" of the sibilance spectrum, derived from banded energies in the sibilance frequency band. Based on either measurement, the method controls the operating mode in which sibilance is suppressed. It does so by adjusting the predetermined suppression amount alone, or by adjusting both the suppression amount and the upper level threshold (the "other" threshold of claim 5, above which the full suppression amount applies). Adapting these parameters is intended to let the suppressor respond to how severe the sibilance is without over-suppressing milder cases. The method is particularly relevant to broadcasting, telephony, and hearing aids, where sibilance can degrade listening comfort.
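A hedged sketch of such operating-mode control follows. The claim states only that the parameters are adjusted based on the ratio or the peaky degree; the severity mapping, the direction of each adjustment, and every constant below are hypothetical choices made for the example.

```python
def clamp01(x):
    """Limit x to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def adjust_operating_mode(ratio_db, peaky_degree,
                          base_suppression_db=6.0,
                          base_upper_threshold_db=-20.0,
                          also_adjust_threshold=False):
    """Return (suppression_db, upper_threshold_db) for the current frame."""
    # Assumed severity score in [0, 1] from either measurement.
    severity = max(clamp01(ratio_db / 20.0),
                   clamp01((peaky_degree - 1.0) / 4.0))
    # Option 1: adjust the suppression amount only.
    suppression_db = base_suppression_db * (1.0 + severity)
    upper_threshold_db = base_upper_threshold_db
    # Option 2: also adjust the upper ("other") level threshold.
    if also_adjust_threshold:
        upper_threshold_db -= 5.0 * severity
    return suppression_db, upper_threshold_db
```

Here more severe sibilance increases the suppression amount and, optionally, lowers the upper threshold so the full amount is reached sooner.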
7. The method of claim 1 , wherein the processing the voice signal comprises: processing the voice signal according to a sibilance suppression curve, wherein the sibilance is suppressed by a predetermined suppression amount when its level is higher than a predetermined level threshold; wherein the method further comprises any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band; wherein the method further comprises: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree, and wherein the controlling an operating mode in which the sibilance is suppressed comprises any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.
This claim combines a simpler suppression curve with adaptive control, and it depends directly on claim 1. The method processes the voice signal according to a sibilance suppression curve that suppresses sibilance by a predetermined amount when its level is higher than a predetermined level threshold. The method also computes one or both of two measurements: the ratio of the identified sibilance level to a long-term level of non-sibilant voice, and the "peaky degree" of the sibilance spectrum, derived from banded energies in the sibilance frequency band. Based on either measurement, it controls the operating mode in which sibilance is suppressed, by adjusting the suppression amount alone or by adjusting both the suppression amount and a second level threshold. The claim calls the latter "the other predetermined level threshold" without introducing it; it appears to correspond to the upper threshold of claim 5. The technique is particularly relevant to telephony, voice recording, and audio broadcasting, where sibilance can degrade listening comfort.
8. The method of claim 1 , wherein the predetermined spectrum feature comprises any of: a ratio of signal energy in a sibilance frequency band to signal energy in the voice frequency band; a ratio of signal energy in the sibilance frequency band to signal energy in a non-sibilance frequency band; a ratio of signal-to-noise ratio (SNR) in the sibilance frequency band and SNR in the non-sibilance frequency band; a spectrum centroid indicating a frequency position at which a center of mass of the spectrum is located; and a spectrum flux in the sibilance frequency band.
This claim lists the spectrum features that may be used to identify sibilance, the high-frequency fricative sound of consonants like "s" or "sh". The predetermined spectrum feature may be any of the following: the ratio of signal energy in a sibilance frequency band to signal energy in the voice frequency band; the ratio of signal energy in the sibilance frequency band to signal energy in a non-sibilance frequency band; the ratio of the signal-to-noise ratio (SNR) in the sibilance band to the SNR in the non-sibilance band; the spectrum centroid, which is the frequency at which the spectrum's center of mass lies; and the spectrum flux in the sibilance band, a measure of how much the spectrum changes over time. Sibilant sounds concentrate energy at high frequencies, so they tend to raise the band-energy ratios and the centroid, which is what lets these features separate sibilance from other speech. The individual features are well-known spectral measures; the claim applies them to sibilance detection.
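Three of the listed features can be computed from banded energies as sketched below. The band layout, the band indices, and the squared-difference definition of flux are assumptions for illustration; the claim names the features without fixing their formulas.

```python
def band_energy_ratio(band_energies, numerator_bands, denominator_bands,
                      eps=1e-12):
    """Energy in one set of bands divided by energy in another set."""
    num = sum(band_energies[i] for i in numerator_bands)
    den = sum(band_energies[i] for i in denominator_bands)
    return num / (den + eps)


def spectrum_centroid_hz(band_energies, band_centers_hz, eps=1e-12):
    """Energy-weighted mean frequency (center of mass of the spectrum)."""
    weighted = sum(e * f for e, f in zip(band_energies, band_centers_hz))
    return weighted / (sum(band_energies) + eps)


def spectrum_flux(prev_energies, cur_energies, bands):
    """Sum of squared frame-to-frame energy changes over the given bands."""
    return sum((cur_energies[i] - prev_energies[i]) ** 2 for i in bands)


# Hypothetical four-band frame; bands 2 and 3 are taken as the sibilance band.
energies = [1.0, 1.0, 2.0, 4.0]
centers_hz = [500.0, 1500.0, 4000.0, 8000.0]
ratio = band_energy_ratio(energies, [2, 3], [0, 1, 2, 3])
centroid = spectrum_centroid_hz(energies, centers_hz)
flux = spectrum_flux([1.0, 1.0, 1.0, 1.0], energies, [2, 3])
```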
9. The method of claim 3 , wherein the peaky degree of the sibilance spectrum is determined based on any of: geometric mean and arithmetic mean of banded energies in the voice frequency band; a variance of adjacent banded energies in the sibilance frequency band; a standard deviation of adjacent banded energies in the sibilance frequency band; a sum of differences among banded energies in the sibilance frequency band; a maximum of differences among banded energies in the sibilance frequency band; a crest factor of banded energies in the sibilance frequency band; and spectral entropy in the voice frequency band.
This claim specifies how the "peaky degree" introduced in claim 3 may be computed. The peaky degree quantifies how uneven the energy distribution of the sibilance spectrum is, which helps identify excessive sibilance. It may be determined from any of the following statistics: the geometric mean and arithmetic mean of banded energies in the voice frequency band; the variance or the standard deviation of adjacent banded energies in the sibilance frequency band; the sum or the maximum of differences among banded energies in the sibilance frequency band; the crest factor (ratio of peak to average) of banded energies in the sibilance frequency band; and the spectral entropy in the voice frequency band. A flat spectrum yields a geometric mean close to its arithmetic mean, a crest factor near one, and high entropy, whereas a spectrum dominated by a narrow peak yields the opposite. The measure supports applications where natural, balanced sibilance is desired, such as voice communication systems, audio mastering, and assistive listening devices.
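Three of these statistics are sketched below on a list of banded energies. The function names and the example spectra are hypothetical, and the claim does not say how a statistic is turned into a final peaky degree, so the functions return the raw values only.

```python
import math


def spectral_flatness(energies, eps=1e-12):
    """Geometric mean divided by arithmetic mean (near 1 for a flat spectrum)."""
    geo = math.exp(sum(math.log(e + eps) for e in energies) / len(energies))
    arith = sum(energies) / len(energies)
    return geo / (arith + eps)


def crest_factor(energies, eps=1e-12):
    """Peak banded energy divided by the mean banded energy."""
    return max(energies) / (sum(energies) / len(energies) + eps)


def spectral_entropy_bits(energies, eps=1e-12):
    """Shannon entropy of the normalized banded energies, in bits."""
    total = sum(energies) + eps
    probs = [e / total for e in energies]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


flat = [1.0, 1.0, 1.0, 1.0]        # evenly spread energy
peaky = [0.01, 0.01, 10.0, 0.01]   # energy concentrated in one band
```

On the flat example the flatness and crest factor are both close to 1 and the entropy is close to 2 bits; on the peaky example the flatness and entropy drop sharply and the crest factor rises.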
10. A system of sibilance detection and mitigation, comprising: one or more processors; a non-transitory computer-readable medium storing a sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to perform: extracting a predetermined spectrum feature representing a distribution of signal energy over a voice frequency band from a voice signal; determining a binary voice indicator from the voice signal, the binary voice indicator indicating whether active voice is present in the voice signal; in response to determining the binary voice indicator indicating that the active voice is present in the voice signal, performing: identifying sibilance based on the predetermined spectrum feature; determining whether the identified sibilance is an excessive sibilance based on comparing a level of the identified sibilance in a current frame with a long-term non-sibilance level estimated based on levels of non-sibilance voice in a plurality of frames; and in response to determining that the identified sibilance is an excessive sibilance based on comparing the level of the identified sibilance in the current frame with the long-term non-sibilance level estimated based on the levels of the non-sibilance voice in the plurality of frames, processing the voice signal by decreasing a level of the excessive sibilance so as to suppress the excessive sibilance.
The system detects and mitigates excessive sibilance in voice signals. Sibilance refers to high-frequency sounds like "s" or "sh" that can become overly pronounced, degrading audio quality. The system processes voice signals to identify and reduce such excessive sibilance while preserving natural speech characteristics. The system extracts a spectrum feature representing energy distribution across voice frequencies from the input signal. It then determines whether active voice is present. If voice is detected, the system analyzes the spectrum feature to identify sibilance. To assess whether the sibilance is excessive, it compares the current frame's sibilance level against a long-term average of non-sibilance voice levels from prior frames. If the sibilance exceeds this threshold, the system processes the signal to reduce its intensity, effectively suppressing the excessive sibilance while maintaining speech clarity. The system dynamically adapts to varying speech patterns by continuously updating the long-term non-sibilance level, ensuring accurate detection and mitigation of sibilance in real-time applications like telephony, voice assistants, or audio processing. The approach avoids over-suppression by distinguishing between natural and excessive sibilance.
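One way to maintain the long-term non-sibilance level is an exponential moving average that is updated only on non-sibilant voice frames. The claim says the level is estimated from non-sibilance voice levels in a plurality of frames but does not name an estimator, so the class below, its name, and its smoothing factor are assumptions.

```python
class LongTermNonSibilanceLevel:
    """Exponentially smoothed level (dB) of non-sibilant voice frames."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha      # assumed smoothing factor, 0 < alpha <= 1
        self.level_db = None    # no estimate until the first qualifying frame

    def update(self, frame_level_db, is_non_sibilance_voice):
        """Fold one frame into the estimate and return the current value."""
        # Sibilant, noise, and silent frames leave the estimate unchanged.
        if not is_non_sibilance_voice:
            return self.level_db
        if self.level_db is None:
            self.level_db = frame_level_db
        else:
            self.level_db += self.alpha * (frame_level_db - self.level_db)
        return self.level_db
```

A small `alpha` makes the estimate change slowly, so a single loud frame does not shift the reference against which sibilance is compared.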
11. The system of claim 10 , wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform classifying the voice signal into a sibilance voice, a non-sibilance voice, or a noise or silence based on the predetermined spectrum feature and binary voice indicator; and/or processing the voice signal after an automatic gain control is performed on the voice signal.
This claim is the system counterpart of claim 2. The processors of the claim 10 system may additionally classify the voice signal into one of three categories: sibilance voice (sounds such as "s" or "sh"), non-sibilance voice (other speech sounds), or noise/silence. The classification uses the predetermined spectrum feature together with the binary voice indicator. Alternatively or in addition, the system may process the voice signal after automatic gain control (AGC) has been performed on it, so that suppression operates on a signal whose amplitude has already been adjusted. The claim says "and/or", so either limitation alone, or both together, falls within its scope.
12. The system of claim 10 , wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, determining whether the identified sibilance is excessive sibilance based on any of the ratio or the peaky degree.
This invention relates to audio processing systems designed to detect and manage excessive sibilance in speech signals. Sibilance refers to high-frequency sounds, such as "s" and "sh" sounds, which can become overly pronounced and harsh in recorded or processed audio. The system analyzes speech signals to identify sibilance and assesses whether it is excessive using two key metrics. First, it calculates the ratio of the sibilance level to the long-term level of non-sibilant speech, providing a relative measure of prominence. Second, it evaluates the "peaky degree" of the sibilance spectrum by analyzing banded energies within the sibilance frequency range, which indicates how sharply the sibilance stands out from surrounding frequencies. The system then determines if the sibilance is excessive based on either or both of these metrics. This approach allows for precise detection of problematic sibilance, enabling subsequent processing steps such as dynamic equalization or compression to improve audio quality. The system is particularly useful in applications like voice recording, telecommunication, and audio mastering, where clear and natural speech reproduction is critical.
13. The system of claim 10 , wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform processing the voice signal according to a sibilance suppression curve, and suppressing the sibilance only when its level is higher than a predetermined level threshold.
This claim is the system counterpart of claim 4. The system processes the voice signal according to a sibilance suppression curve and suppresses sibilance only when its level is higher than a predetermined level threshold. Sibilance at or below the threshold is left untouched, which avoids over-processing normal speech while still reducing the harsh "s" and "sh" sounds that can cause listener fatigue. The claim does not specify the shape of the curve; claim 14 narrows it to an S-shaped curve with a second, higher threshold. Threshold-gated suppression of this kind suits telephony, voice recording, and real-time communication, where intelligibility and listening comfort both matter.
14. The system of claim 13 , wherein the sibilance suppression curve is an S-shape curve and wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform suppressing the sibilance linearly or non-linearly when its level is higher than the predetermined level threshold but lower than another predetermined level threshold that is higher than the predetermined level threshold, and suppressing the sibilance by a predetermined suppression amount when its level is higher than the other predetermined level threshold.
This invention relates to audio processing systems designed to reduce sibilance, which is the harsh, high-frequency sound often produced by consonants like "s" and "sh." The system includes a processor executing instructions to analyze and modify audio signals to mitigate excessive sibilance. The sibilance suppression is controlled by an S-shaped curve, which defines how the system adjusts the audio based on the sibilance level. When the sibilance level exceeds a first threshold but remains below a second, higher threshold, the system suppresses the sibilance either linearly or non-linearly. If the sibilance level surpasses the second threshold, the system applies a fixed suppression amount to reduce the harshness. The system dynamically adjusts the suppression based on real-time audio analysis, ensuring natural-sounding speech while minimizing unpleasant high-frequency artifacts. This approach provides a balanced method for reducing sibilance without overly distorting the audio signal.
15. The system of claim 14 , wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform any of: determining a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and determining a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band, wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree, and controlling the operating mode by any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.
This claim is the system counterpart of claim 6 and makes the S-shaped suppression curve of claim 14 adaptive. The system computes one or both of two measurements: the ratio of the identified sibilance level to a long-term level of non-sibilant voice, and the "peaky degree" of the sibilance spectrum, derived from banded energies in the sibilance frequency band. Based on either measurement, it controls the operating mode in which sibilance is suppressed, by adjusting the predetermined suppression amount alone or by adjusting both the suppression amount and the upper level threshold (the "other" threshold of claim 14). This adaptation is intended to reduce sibilance effectively without over-suppressing or distorting the voice, in applications such as telephony, broadcasting, and voice recording.
16. The system of claim 10 , wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform: processing the voice signal according to a sibilance suppression curve; suppressing the sibilance by a predetermined suppression amount when its level is higher than a predetermined level threshold; wherein the system further comprises any of: a level ratio determiner that determines a ratio of the level of the identified sibilance to a long-term level of non-sibilance voices; and a peaky degree determiner that determines a peaky degree of a sibilance spectrum based on banded energies in a sibilance frequency band; wherein the sequence of computing instructions, which when executed by the one or more processors, causes the one or more processors to further perform any of: controlling an operating mode in which the sibilance is suppressed, based on any of the ratio and the peaky degree; and controlling the operating mode by any of: adjusting the predetermined suppression amount; and adjusting the predetermined suppression amount and the other predetermined level threshold.
This claim is the system counterpart of claim 7. The system processes the voice signal according to a sibilance suppression curve and suppresses sibilance by a predetermined amount when its level is higher than a predetermined level threshold. The system includes one or both of two components: a level ratio determiner, which computes the ratio of the sibilance level to a long-term level of non-sibilant voice, and a peaky degree determiner, which assesses the shape of the sibilance spectrum from banded energies in the sibilance frequency band. Based on either measurement, the system controls the operating mode in which sibilance is suppressed, by adjusting the suppression amount alone or by adjusting both the suppression amount and a second level threshold. The claim calls the latter "the other predetermined level threshold" without introducing it; it appears to correspond to the upper threshold of claim 14. The system is particularly relevant to voice recording, telecommunication, and audio enhancement, where clear, balanced speech is critical.
17. The system of claim 10 , wherein the predetermined spectrum feature comprises any of: a ratio of signal energy in a sibilance frequency band to signal energy in the voice frequency band; a ratio of signal energy in the sibilance frequency band to signal energy in a non-sibilance frequency band; a ratio of signal-to-noise ratio (SNR) in the sibilance frequency band and SNR in the non-sibilance frequency band; a spectrum centroid indicating a frequency position at which a center of mass of the spectrum is located; and a spectrum flux in the sibilance frequency band.
The invention relates to audio signal processing, specifically the spectral features that the sibilance suppression system of claim 10 uses to detect sibilance. Sibilance refers to the high-frequency sounds produced in speech, such as "s" and "sh", which can be hard to distinguish from noise or other audio components. The system extracts one or more predetermined spectrum features from the voice signal. These include the ratio of signal energy in a sibilance frequency band (typically the higher frequencies) to signal energy in the overall voice frequency band, and the ratio of signal energy in the sibilance band to signal energy in a non-sibilance band. The features may also include the ratio between the signal-to-noise ratio (SNR) in the sibilance band and the SNR in the non-sibilance band, the spectrum centroid (the frequency at which the spectrum's center of mass lies), and the spectrum flux in the sibilance band (a measure of how much the spectrum changes over time). Any of these features can be used to identify sibilant sounds, whether the audio is processed in real time or offline.
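Three of the features listed in claim 17 can be sketched in plain Python from per-band energies. This is only an illustration: the band edges used as defaults, the function names, and the squared-difference form of the flux are assumptions, since the claim does not fix any of them.

```python
def band_energy(freqs_hz, energies, lo_hz, hi_hz):
    """Sum the energies of bands whose center frequency lies in [lo_hz, hi_hz)."""
    return sum(e for f, e in zip(freqs_hz, energies) if lo_hz <= f < hi_hz)


def sibilance_to_voice_ratio(freqs_hz, energies,
                             sib_band=(4000.0, 10000.0),    # assumed band edges
                             voice_band=(100.0, 10000.0)):  # assumed band edges
    """Ratio of sibilance-band energy to voice-band energy."""
    voice = band_energy(freqs_hz, energies, *voice_band)
    if voice <= 0.0:
        return 0.0
    return band_energy(freqs_hz, energies, *sib_band) / voice


def spectral_centroid(freqs_hz, energies):
    """Energy-weighted mean frequency (the spectrum's center of mass)."""
    total = sum(energies)
    if total <= 0.0:
        return 0.0
    return sum(f * e for f, e in zip(freqs_hz, energies)) / total


def spectral_flux(prev_energies, cur_energies):
    """Sum of squared band-energy changes between two consecutive frames."""
    return sum((c - p) ** 2 for p, c in zip(prev_energies, cur_energies))


# Usage with four hypothetical bands (center frequencies in Hz).
freqs = [500.0, 2000.0, 5000.0, 8000.0]
frame = [1.0, 1.0, 3.0, 3.0]
ratio = sibilance_to_voice_ratio(freqs, frame)   # 6 / 8 = 0.75
centroid = spectral_centroid(freqs, frame)       # 41500 / 8 = 5187.5
```

A high ratio and a high centroid both indicate that energy is concentrated in the upper part of the voice band, which is where sibilant sounds sit.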
18. The system of claim 12, wherein the peaky degree of the sibilance spectrum is determined based on any of: geometric mean and arithmetic mean of banded energies in the voice frequency band; a variance of adjacent banded energies in the sibilance frequency band; a standard deviation of adjacent banded energies in the sibilance frequency band; a sum of differences among banded energies in the sibilance frequency band; a maximum of differences among banded energies in the sibilance frequency band; a crest factor of banded energies in the sibilance frequency band; and spectral entropy in the voice frequency band.
This invention relates to audio signal processing, specifically to how the system quantifies the shape of the sibilance spectrum in a voice signal. Sibilance refers to high-frequency hissing or whistling sounds in speech, often caused by fricative consonants like "s" or "sh". The system measures the "peaky degree" of the sibilance spectrum, which indicates how strongly the energy is concentrated in a few bands rather than spread evenly. The peaky degree can be determined from any of several measures computed on banded energies. Two are computed over the voice frequency band: a measure based on the geometric mean and the arithmetic mean of the banded energies, and the spectral entropy. The others are computed over the sibilance frequency band: the variance or standard deviation of adjacent banded energies, the sum or the maximum of the differences among banded energies, and the crest factor (the ratio of peak to average energy). These measures quantify how pronounced or irregular the sibilance is, so that the suppression can be adjusted accordingly.
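Some of the peakiness measures named in claim 18 can be sketched as follows. The claim only says the peaky degree is "based on" these quantities, so the exact formulas are assumptions. In particular, combining the geometric and arithmetic means as a ratio (the standard spectral flatness measure) is one common choice, not something the claim states.

```python
import math


def spectral_flatness(energies):
    """Geometric mean divided by arithmetic mean of the banded energies.
    Close to 1 for a flat spectrum, close to 0 for a peaky one."""
    if not energies or min(energies) <= 0.0:
        return 0.0
    arithmetic = sum(energies) / len(energies)
    geometric = math.exp(sum(math.log(e) for e in energies) / len(energies))
    return geometric / arithmetic


def crest_factor(energies):
    """Peak banded energy divided by mean banded energy."""
    if not energies:
        return 0.0
    mean = sum(energies) / len(energies)
    return max(energies) / mean if mean > 0.0 else 0.0


def max_adjacent_difference(energies):
    """Largest absolute difference between neighboring banded energies."""
    return max((abs(b - a) for a, b in zip(energies, energies[1:])), default=0.0)


# Usage: a flat spectrum versus one dominated by a single band.
flat = [2.0, 2.0, 2.0, 2.0]
peaky = [1.0, 1.0, 1.0, 5.0]
flat_score = spectral_flatness(flat)
peaky_score = spectral_flatness(peaky)
```

Note that flatness falls as the spectrum becomes peakier, while the crest factor and the maximum adjacent difference rise, so a combined peaky degree would need to account for the direction of each measure.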
19. A non-transitory computer-readable medium storing a sequence of computing instructions, which when executed by one or more processors, causes the one or more processors to perform steps of the method according to claim 1.
This claim covers a non-transitory computer-readable medium that stores instructions which, when executed by one or more processors, cause them to carry out the sibilance detection and mitigation method of claim 1. That method extracts a spectrum feature describing how signal energy is distributed over the voice frequency band and determines a binary indicator of whether active voice is present in the signal. When active voice is present, sibilance is identified from the spectrum feature, and its level in the current frame is compared with a long-term non-sibilance level estimated from the levels of non-sibilant voice in a plurality of frames. If that comparison shows the sibilance to be excessive, the voice signal is processed to decrease the level of the excessive sibilance. In effect, the claim extends protection from the method itself to storage media carrying software that implements it.
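The per-frame flow of the method of claim 1, which claim 19 incorporates by reference, can be sketched in Python as follows. This is a rough illustration under stated assumptions: the voice and sibilance decisions are taken as inputs, the long-term non-sibilance level is tracked with a simple exponential average, and "excessive" is taken to mean exceeding that level by a fixed margin. The function name, the state dictionary, and all default values are hypothetical and not drawn from the patent.

```python
def process_frame(level_db, is_voice, is_sibilance, state,
                  excess_margin_db=3.0,  # assumed margin for "excessive"
                  suppression_db=6.0,    # assumed suppression amount
                  smoothing=0.9):        # assumed averaging factor
    """Return the gain in dB to apply to this frame.

    `state` carries the long-term non-sibilance level across frames
    under the key "non_sib_level_db".
    """
    if not is_voice:
        # The binary voice indicator gates all further processing.
        return 0.0

    long_term = state.get("non_sib_level_db")

    if not is_sibilance:
        # Non-sibilant voice frames update the long-term level estimate.
        if long_term is None:
            state["non_sib_level_db"] = level_db
        else:
            state["non_sib_level_db"] = (
                smoothing * long_term + (1.0 - smoothing) * level_db
            )
        return 0.0

    if long_term is None:
        # No reference level yet, so the sibilance cannot be judged excessive.
        return 0.0

    if level_db - long_term > excess_margin_db:
        return -suppression_db  # excessive sibilance: decrease its level
    return 0.0


# Usage: one non-sibilant voice frame, then sibilant frames at two levels.
state = {}
process_frame(-20.0, True, False, state)          # sets the reference level
loud = process_frame(-10.0, True, True, state)    # 10 dB above the reference
mild = process_frame(-19.0, True, True, state)    # 1 dB above the reference
```

Keeping the reference level in an explicit `state` object makes the dependence on a plurality of earlier frames visible without requiring a class.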
December 15, 2020