A method, medium, and system scalably encoding/decoding audio/speech. The method includes splitting an input signal into a low frequency band signal that is lower than a predetermined frequency and a high frequency band signal that is higher than the predetermined frequency, scalably encoding the split low frequency band signal into a core layer and one or more extension layers and then decoding the encoded core layer and the encoded extension layers, generating an error signal by using the split low frequency band signal and a decoded signal of the encoded core layer and the encoded extension layers, and encoding the error signal and the high frequency band signal into a signal-to-noise ratio (SNR) enhancement layer and a bandwidth extension layer.
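The band split described in the abstract can be illustrated with a minimal sketch. The abstract only says the input is split at a predetermined frequency; the moving-average low-pass filter, the function name `split_bands`, and the `taps` parameter below are assumptions chosen for brevity, not the filter the patent uses.

```python
def split_bands(signal, taps=5):
    """Split `signal` into (low, high) bands.

    Assumption: a moving-average low-pass filter stands in for the
    unspecified band-splitting filter.
    """
    half = taps // 2
    low = []
    for n in range(len(signal)):
        # Average over a window centred on n, clipped at the signal edges.
        window = signal[max(0, n - half): n + half + 1]
        low.append(sum(window) / len(window))
    # The high band is whatever the low-pass filter removed.
    high = [x - l for x, l in zip(signal, low)]
    return low, high


signal = [1.0, -1.0, 1.0, -1.0, 5.0, 5.0, 5.0, 5.0]
low, high = split_bands(signal)

# Adding the two bands back together reproduces the input.
recombined = [l + h for l, h in zip(low, high)]
```

Defining the high band as the residual of the low-pass filter keeps the two bands complementary, so nothing is lost before the layers are encoded.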
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A method for scalably encoding an input audio/speech signal, the method comprising: encoding, performed by using at least one processing device, a core layer signal associated with a core bandwidth, from the input audio/speech signal; encoding, performed by using at least one processing device, one or more enhancement layer signal associated with one or more extended bandwidth, respectively, from the input audio/speech signal; generating a bitstream by multiplexing the encoded core layer signal and the one or more encoded enhancement layer signal; and transmitting the bitstream to a decoding side, wherein the encoding one or more enhancement layer signal comprises: obtaining one or more extension signal from the core bandwidth and the one or more extended bandwidth; transforming the one or more extension signal from a time domain into a frequency domain; and generating the one or more encoded enhancement layer signal by encoding the one or more transformed extension signal.
A method for scalably encoding an audio/speech signal. A core layer signal associated with a core bandwidth is encoded from the input signal. One or more enhancement layer signals, each associated with an extended bandwidth, are also encoded from the input signal. The encoded layers are multiplexed into a bitstream and transmitted to a decoder. To encode an enhancement layer, an extension signal is obtained from the core and extended bandwidths, transformed from the time domain to the frequency domain, and then encoded. The layered bitstream is what makes the encoding scalable.
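The steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions: the claim does not name a transform or a quantizer, so the orthonormal DCT-II, the uniform quantizer, the step sizes, and the list-of-layers bitstream layout are all hypothetical choices, as are the function names.

```python
import math


def dct(x):
    """Orthonormal DCT-II: time domain -> frequency domain (assumed transform)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def quantize(values, step):
    """Uniform scalar quantizer (assumed); returns integer indices."""
    return [round(v / step) for v in values]


def encode_frame(core_signal, extension_signals, core_step=0.5, ext_step=0.25):
    # Core layer: encode the core-bandwidth signal.
    core_layer = quantize(core_signal, core_step)
    # Enhancement layers: transform each extension signal, then encode it.
    enhancement_layers = [quantize(dct(ext), ext_step) for ext in extension_signals]
    # Multiplex: core layer first, then enhancement layers in order.
    return [core_layer] + enhancement_layers


bitstream = encode_frame([0.4, 1.1, -0.6, 0.0], [[1.0, 1.0, 1.0, 1.0]])
# bitstream[0] is the core layer, bitstream[1] the first enhancement layer
```

Placing the core layer first means a receiver with limited bandwidth can drop trailing layers and still decode a usable signal.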
2. The method of claim 1 further comprising: decoding the encoded core layer signal and the one or more encoded enhancement layer signal; generating an error signal by using the decoded core layer signal and the one or more decoded enhancement signal; and encoding the error signal into one or more signal-to-noise ratio (SNR) enhancement layer signal.
The encoding method of claim 1 also decodes the encoded core and enhancement layer signals at the encoder. An error signal is generated from the decoded core and enhancement layer signals and is then encoded into one or more signal-to-noise ratio (SNR) enhancement layer signals. A decoder that receives these layers can use them to reduce the coding error left by the core and enhancement layers.
3. The method of claim 2 , wherein the generating of the error signal comprises generating the error signal by subtracting the decoded core layer signal and the one or more decoded enhancement layer signal from the one or more enhancement layer signal.
In the method for scalably encoding audio, the error signal is generated by subtracting the decoded core layer signal and the one or more decoded enhancement layer signals from the one or more enhancement layer signals. The resulting difference represents the information lost when the core and enhancement layers were encoded and decoded, and it is this difference that the SNR enhancement layer carries for later correction during decoding.
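The subtraction in claim 3 amounts to a sample-by-sample difference. The sketch below assumes all signals are time-aligned lists of equal length; the function name `error_signal` and the sample values are illustrative only.

```python
def error_signal(enhancement_signal, decoded_core, decoded_enhancements):
    """Subtract the decoded core and decoded enhancement layers
    from the enhancement layer signal, sample by sample."""
    err = list(enhancement_signal)
    for layer in [decoded_core] + list(decoded_enhancements):
        err = [e - d for e, d in zip(err, layer)]
    return err


enhancement_signal = [1.0, 2.0, 3.0]
decoded_core = [0.5, 1.5, 2.5]
decoded_enhancements = [[0.25, 0.25, 0.5]]

err = error_signal(enhancement_signal, decoded_core, decoded_enhancements)
# err == [0.25, 0.25, 0.0]
```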
4. The method of claim 3 , further comprising transforming the error signal from the time domain to the frequency domain, wherein the encoding of the error signal comprises encoding the transformed error signal into the one or more SNR enhancement layer signal.
In the method for scalably encoding audio with error correction, the generated error signal is transformed from the time domain to the frequency domain before encoding. The error signal is then encoded into one or more SNR enhancement layer signals. Transforming the error signal to the frequency domain allows for more efficient encoding and targeted improvement of the audio quality in specific frequency bands.
5. A method for scalably decoding an audio/speech signal, the method comprising: receiving a bitstream transmitted from an encoding side, the bitstream including an encoded core layer signal and one or more encoded enhancement layer signal; decoding, performed by using at least one processing device, the encoded core layer signal associated with a core bandwidth; decoding, performed by using at least one processing device, the one or more encoded enhancement layer signal associated with one or more extended bandwidth, respectively; and reconstructing a bandwidth extended signal for reproduction, based on the decoded core layer signal and the one or more decoded enhancement layer signal, wherein the decoding the one or more encoded enhancement layer comprises: decoding one or more encoded extension signal from the core bandwidth and the one or more extended bandwidth, included in the bitstream; transforming the one or more decoded extension signal from a frequency domain into a time domain; and generating the one or more transformed extension signal as the one or more decoded enhancement layer signal.
A method for decoding a scalably encoded audio/speech signal. A bitstream containing an encoded core layer and one or more encoded enhancement layers is received. The core layer signal, associated with a core bandwidth, is decoded. The one or more enhancement layer signals, each associated with an extended bandwidth, are also decoded. A bandwidth extended signal is reconstructed for reproduction from the decoded core and enhancement layer signals. To decode an enhancement layer, the encoded extension signal from the core and extended bandwidths, included in the bitstream, is decoded. The decoded extension signal is then transformed from the frequency domain to the time domain, and the result is the decoded enhancement layer signal.
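A decoder following claim 5 could look like the sketch below. Several details are assumptions, since the claim does not fix them: the bitstream is a list with the core layer first, layers hold uniform quantizer indices, the transform is the inverse of an orthonormal DCT-II, and reconstruction simply sums time-aligned layers.

```python
import math


def idct(coeffs):
    """Inverse of the orthonormal DCT-II: frequency domain -> time domain."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def dequantize(indices, step):
    """Map quantizer indices back to values (assumed uniform quantizer)."""
    return [q * step for q in indices]


def decode_frame(bitstream, core_step=0.5, ext_step=0.25):
    # Decode the core layer.
    output = dequantize(bitstream[0], core_step)
    # Decode each enhancement layer, then transform it to the time domain.
    for layer in bitstream[1:]:
        enhancement = idct(dequantize(layer, ext_step))
        # Assumed reconstruction: add the layer to the running output.
        output = [o + e for o, e in zip(output, enhancement)]
    return output


full = decode_frame([[1, 2, -1, 0], [8, 0, 0, 0]])
core_only = decode_frame([[1, 2, -1, 0]])
```

Calling `decode_frame` with only the core layer still returns a signal, which is the practical meaning of scalable decoding: each extra layer refines the output rather than being required for it.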
6. The method of claim 5 further comprising: decoding one or more encoded SNR enhancement layer signal, included in the bitstream; and adding the one or more decoded SNR enhancement signal to the decoded core layer signal and the one or more decoded enhancement layer signal.
The audio decoding method further decodes one or more encoded SNR enhancement layer signals included in the bitstream. These decoded SNR enhancement signals are then added to the decoded core layer signal and the one or more decoded enhancement layer signals. This addition corrects errors and improves the signal-to-noise ratio, leading to enhanced audio quality.
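The addition in claim 6 mirrors the encoder-side subtraction. In this sketch, `decoded_signal` stands for the sum of the decoded core and enhancement layers, and the SNR layers are assumed to be already decoded into time-aligned samples; the names and values are illustrative.

```python
def apply_snr_enhancement(decoded_signal, snr_layers):
    """Add each decoded SNR enhancement layer to the decoded signal."""
    out = list(decoded_signal)
    for layer in snr_layers:
        out = [o + s for o, s in zip(out, layer)]
    return out


decoded_signal = [0.75, 1.75, 3.0]
snr_layers = [[0.25, 0.25, 0.0]]

refined = apply_snr_enhancement(decoded_signal, snr_layers)
# refined == [1.0, 2.0, 3.0]

# With no SNR layers received, the decoded signal passes through unchanged.
unrefined = apply_snr_enhancement(decoded_signal, [])
```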
7. A non-transitory computer readable recording medium having recorded thereon a computer program for executing the method of claim 5 .
A non-transitory computer-readable medium stores a computer program for performing the audio decoding method described in claim 5. That method receives a bitstream with encoded core and enhancement layers, decodes the core layer associated with a core bandwidth, decodes the enhancement layers associated with extended bandwidths, and reconstructs a bandwidth extended signal from the decoded layers. Decoding an enhancement layer involves decoding the extension signal, transforming it from the frequency domain to the time domain, and using the result as the decoded enhancement layer signal.
8. The non-transitory computer readable recording medium of claim 7 , further comprising: decoding one or more encoded SNR enhancement layer signal, included in the bitstream; and adding the one or more decoded SNR enhancement signal to the decoded core layer signal and the one or more decoded enhancement layer signal.
The program stored on the computer-readable medium of claim 7 also decodes one or more encoded SNR enhancement layer signals included in the bitstream and adds them to the decoded core and enhancement layer signals. This additional step refines the decoded audio by improving its signal-to-noise ratio, reducing the error introduced during encoding.
9. A system for scalably encoding an input audio/speech signal, the system comprising: at least one processing device configured to: encode a core layer signal associated with a core bandwidth, from the input audio/speech signal; encode one or more enhancement layer signal associated with one or more extended bandwidth, respectively, from the input audio/speech signal; generate a bitstream by multiplexing the encoded core layer signal and the one or more encoded enhancement layer signal; and transmit the bitstream to a decoding side, wherein the processing device is configured to: obtain one or more extension signal from the core bandwidth and the one or more extended bandwidth; transform the one or more extension signal from a time domain into a frequency domain; and generate the one or more encoded enhancement layer signal by encoding the one or more transformed extension signal.
A system for scalably encoding an input audio/speech signal, built around at least one processing device. The processing device encodes a core layer signal associated with a core bandwidth, and one or more enhancement layer signals, each associated with an extended bandwidth beyond the core. The encoded layers are multiplexed into a single bitstream and transmitted to a decoder. To produce an enhancement layer, the processing device obtains an extension signal from the core and extended bandwidths, transforms it from the time domain to the frequency domain, and encodes the transformed signal. Because the bitstream is layered, a decoder can reconstruct the audio at different quality levels depending on how many layers it receives, which suits varying network conditions and device capabilities in applications such as streaming and telecommunication.
10. The system of claim 9 , wherein the processing device is further configured to: decode the encoded core layer signal and the one or more encoded enhancement layer signal; generate an error signal by using the decoded core layer signal and the one or more decoded enhancement signal; and encode the error signal into one or more signal-to-noise ratio (SNR) enhancement layer signal.
In the audio encoding system of claim 9, the processing device also decodes the encoded core and enhancement layer signals. It generates an error signal from the decoded core and enhancement layer signals and encodes that error signal into one or more SNR enhancement layer signals. These layers give the decoder a way to reduce the coding error and so improve audio quality.
11. A system for scalably decoding an audio/speech signal, the system comprising: at least one processing device configured to: receive a bitstream transmitted from an encoding side, the bitstream including an encoded core layer signal and one or more encoded enhancement layer signal; decode the encoded core layer signal associated with a core bandwidth; decode the one or more encoded enhancement layer signal associated with one or more extended bandwidth, respectively; and reconstruct a bandwidth extended signal for reproduction, based on the decoded core layer signal and the one or more decoded enhancement layer signal, wherein the processing device is configured to: decode one or more encoded extension signal from the core bandwidth and the one or more extended bandwidth, included in the bitstream; transform the one or more decoded extension signal from a frequency domain into a time domain; and generate the one or more transformed extension signal as the one or more decoded enhancement layer signal.
A system for decoding scalably encoded audio includes at least one processing device. The processing device receives a bitstream containing encoded core and enhancement layers. It decodes the core layer associated with a core bandwidth and the enhancement layers associated with extended bandwidths, then reconstructs a bandwidth extended signal from the decoded layers. To decode an enhancement layer, the processing device decodes the extension signals included in the bitstream and transforms them from the frequency domain to the time domain; the transformed signals are the decoded enhancement layers.
12. The system of claim 11 , wherein the processing device is further configured to: decode one or more encoded SNR enhancement layer, included in the bitstream; add the one or more decoded SNR enhancement signal to the decoded core layer and one or more decoded enhancement layer signal.
In the audio decoding system of claim 11, the processing device further decodes SNR enhancement layers from the bitstream. The decoded SNR enhancement signals are added to the decoded core and enhancement layer signals. This improves the signal-to-noise ratio by reducing the coding error, and with it the overall audio quality.
October 5, 2012
August 15, 2017