Managing Jitter Buffer Length for Improved Audio Quality

Published: May 24, 2022

Assignee: not available in the USPTO data

Inventors: Matthieu Hodgkinson, Florian Heese, Georg Bannasch

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of improving audio quality in real-time communications over a computer network, the method comprising: providing a jitter buffer configured to temporarily hold audio data received by a computing device over a computer network; measuring packet delays of a plurality of packets received by the computing device, each of the plurality of packets carrying a respective set of audio samples; constructing a histogram of the measured packet delays, the histogram including a set of buckets, each bucket representing a respective packet delay range and counting a number of audio samples that arrived in packets having delays within the respective packet delay range; for each of the set of buckets, generating a prediction of audio playback quality for a trial jitter buffer length set based on the packet delay range represented by the respective bucket; and setting a length of the jitter buffer based on an identified trial jitter buffer length for which a highest audio playback quality is predicted.

2. A method of improving audio quality in real-time communications over a computer network, the method comprising: generating, during a communication session between at least a first computing device and a second computing device over the computer network, multiple audio factors of the communication session, each of the audio factors reflecting a respective characteristic that is susceptible to degradation; combining the audio factors to produce an overall measure of audio quality; and taking remedial action to improve the overall measure of audio quality by adjusting a jitter buffer length of a jitter buffer configured to temporarily hold audio data received by the first computing device over the computer network prior to decoding the audio data, wherein the audio factors include a delay impairment factor generated by a delay impairment estimator, the delay impairment estimator: receiving a first input that provides a current jitter buffer length; receiving a second input that indicates a measure of audio interactivity between the first computing device and the second computing device; and providing an output that conveys a measure of audio quality based on the first input and the second input.

3. The method of claim 2, wherein the audio data includes audio samples and is received in multiple packets having respective sequence identifiers, the sequence identifiers indicating an order in which the packets are generated, and wherein the method further comprises: ordering the audio samples in the jitter buffer based on the sequence identifiers of the packets; and providing the ordered audio samples to an audio decoder configured to decode the audio data.

4. The method of claim 3, wherein a sequence identifier of a respective packet includes a sample index of an audio sample transmitted in the respective packet, the sample index increasing monotonically for successive audio samples.

5. The method of claim 3, wherein the ordered audio data includes a gap where a packet is missing, and wherein the method further comprises: receiving the missing packet after the decoder has processed a portion of the ordered audio data corresponding to the gap; and discarding the missing packet after it has been received.

6. The method of claim 2, further comprising performing a jitter-buffer-length optimization by: measuring packet delays of a plurality of packets received by the first computing device, each of the plurality of packets carrying a respective set of audio samples; constructing a histogram of the measured packet delays, the histogram including a set of buckets, each bucket representing a respective packet delay range and counting a number of audio samples that arrived in packets having delays within the respective packet delay range; for each of the set of buckets, generating a prediction of audio playback quality for a trial jitter buffer length set based on the packet delay range represented by the respective bucket; identifying a trial jitter buffer length for which a highest audio playback quality is predicted; and setting the jitter buffer length based on the identified trial jitter buffer length.

7. The method of claim 6, wherein generating the prediction of audio playback quality includes, for each of the set of buckets: providing a set of audio factors for the trial jitter buffer length set based on the packet delay range represented by the respective bucket; transforming each of the set of audio factors for the respective bucket into a corresponding MOS (Mean Opinion Score) value, each MOS value providing a standardized measure of audio quality; and combining the set of MOS values to generate the prediction of audio playback quality for the respective bucket.

8. The method of claim 6, wherein the audio factors further include a loss impairment factor generated by a loss impairment estimator, the loss impairment estimator: receiving an input that provides a current jitter buffer length; tracking gaps in audio data, the gaps arising from packets that were expected but did not arrive within the current jitter buffer length; and providing an output that conveys a measure of audio quality based on the current jitter buffer length and the gaps.

9. The method of claim 6, wherein the audio factors further include a time-scaling impairment factor generated by a time-scaling impairment estimator, the time-scaling impairment estimator: receiving input that indicates a difference between a current jitter buffer length and a target jitter buffer length; and providing an output that conveys a measure of audio quality based on performing time scaling from the current jitter buffer length to the target jitter buffer length.

10. The method of claim 2, wherein the audio factors further include a loss impairment factor generated by a loss impairment estimator, the loss impairment estimator: receiving an input that provides a current jitter buffer length; tracking gaps in audio data, the gaps arising from packets that were expected but did not arrive within the current jitter buffer length; and providing an output that conveys a measure of audio quality based on the current jitter buffer length and the gaps.

11. The method of claim 2, wherein the audio factors further include a time-scaling impairment factor generated by a time-scaling impairment estimator, the time-scaling impairment estimator: receiving input that indicates a difference between a current jitter buffer length and a target jitter buffer length; and providing an output that conveys a measure of audio quality based on performing time scaling from the current jitter buffer length to the target jitter buffer length.

12. The method of claim 2, wherein the delay impairment estimator further receives a third input that indicates a two-way mouth-to-ear (MTE) delay between the first computing device and the second computing device, and wherein the output is further based on a two-way mouth-to-ear (MTE) delay.

13. The method of claim 2, wherein combining the audio factors to produce the overall measure of audio quality includes: transforming the audio factors into corresponding MOS (Mean Opinion Score) values, each MOS value providing a standardized measure of audio quality; and combining the MOS values to generate the overall measure of audio quality.

14. A computer program product including a set of non-transitory, computer-readable media having instructions which, when executed by control circuitry of a first computing device, cause the control circuitry to perform a method of improving audio quality in real-time communications over a computer network, the method comprising: generating, during a communication session between at least the first computing device and a second computing device over the computer network, multiple audio factors of the communication session, each of the audio factors reflecting a respective characteristic that is susceptible to degradation; combining the audio factors to produce an overall measure of audio quality; and taking remedial action to improve the overall measure of audio quality by adjusting a jitter buffer length of a jitter buffer configured to temporarily hold audio data received by the first computing device over the computer network prior to decoding the audio data, wherein the audio factors include a loss impairment factor generated by a loss impairment estimator, the loss impairment estimator: receiving an input that provides a current jitter buffer length; tracking gaps in audio data, the gaps arising from packets that were expected but did not arrive within the current jitter buffer length; and providing an output that conveys a measure of audio quality based on the current jitter buffer length and the gaps.

15. The computer program product of claim 14, wherein the method further comprises performing a jitter-buffer-length optimization by: measuring packet delays of a plurality of packets received by the first computing device, each of the plurality of packets carrying a respective set of audio samples; constructing a histogram of the measured packet delays, the histogram including a set of buckets, each bucket representing a respective packet delay range and counting a number of the audio samples that arrived in packets having delays within the respective packet delay range; for each of the set of buckets, generating a prediction of audio playback quality for a trial jitter buffer length set based on the packet delay range represented by the respective bucket; identifying a trial jitter buffer length for which a highest audio playback quality is predicted; and setting the jitter buffer length based on the identified trial jitter buffer length.

16. The computer program product of claim 15, wherein generating the prediction of audio playback quality includes, for each of the set of buckets: providing a set of audio factors for the trial jitter buffer length set based on the packet delay range represented by the respective bucket; transforming each of the set of audio factors for the respective bucket into a corresponding MOS (Mean Opinion Score) value, each MOS value providing a standardized measure of audio quality; and combining the set of MOS values to generate the prediction of audio playback quality for the respective bucket.

17. The computer program product of claim 14, wherein the audio factors further include a delay impairment factor generated by a delay impairment estimator, the delay impairment estimator: receiving a first input that provides a current jitter buffer length; receiving a second input that indicates a measure of audio interactivity between the first computing device and the second computing device; and providing an output that conveys a measure of audio quality based on the first input and the second input.

18. The computer program product of claim 14, wherein the audio factors further include a time-scaling impairment factor generated by a time-scaling impairment estimator, the time-scaling impairment estimator: receiving input that indicates a difference between a current jitter buffer length and a target jitter buffer length; and providing an output that conveys a measure of audio quality based on performing time scaling from the current jitter buffer length to the target jitter buffer length.

19. A method of improving audio quality in real-time communications over a computer network, the method comprising: generating, during a communication session between at least a first computing device and a second computing device over the computer network, multiple audio factors of the communication session, each of the audio factors reflecting a respective characteristic that is susceptible to degradation; combining the audio factors to produce an overall measure of audio quality; and taking remedial action to improve the overall measure of audio quality by adjusting a jitter buffer length of a jitter buffer configured to temporarily hold audio data received by the first computing device over the computer network prior to decoding the audio data, wherein the audio factors include a time-scaling impairment factor generated by a time-scaling impairment estimator, the time-scaling impairment estimator: receiving input that indicates a difference between a current jitter buffer length and a target jitter buffer length; and providing an output that conveys a measure of audio quality based on performing time scaling from the current jitter buffer length to the target jitter buffer length.
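Claims 1, 6 and 15 describe a four-step optimization:

- build a histogram of packet delays;
- treat each bucket as a candidate jitter buffer length;
- predict playback quality for each candidate;
- keep the best candidate.

The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation. Delays are assumed to be non-negative and measured relative to the fastest packet. Each trial length is assumed to be the upper edge of its bucket. `predict_quality` is a hypothetical, caller-supplied model that takes the trial length and the fraction of samples that would arrive too late.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple


def build_delay_histogram(
    packets: Iterable[Tuple[float, int]], bucket_ms: float
) -> Dict[int, int]:
    """Count audio samples per delay bucket.

    Each packet is (delay_ms, num_samples); bucket i covers
    [i * bucket_ms, (i + 1) * bucket_ms).
    """
    hist: Counter = Counter()
    for delay_ms, num_samples in packets:
        hist[int(delay_ms // bucket_ms)] += num_samples
    return dict(hist)


def choose_jitter_buffer_length(
    hist: Dict[int, int],
    bucket_ms: float,
    predict_quality: Callable[[float, float], float],
) -> Optional[float]:
    """Try one buffer length per bucket and return the best-scoring one."""
    total = sum(hist.values())
    if total == 0:
        return None
    best_len: Optional[float] = None
    best_quality = float("-inf")
    for idx in sorted(hist):
        trial_len = (idx + 1) * bucket_ms  # assumption: upper edge of the bucket
        # Samples counted in later buckets would miss a buffer of this length.
        late_fraction = sum(n for i, n in hist.items() if i > idx) / total
        quality = predict_quality(trial_len, late_fraction)
        if quality > best_quality:
            best_len, best_quality = trial_len, quality
    return best_len


def toy_model(length_ms: float, late_fraction: float) -> float:
    # Hypothetical quality model: longer buffers and late samples both cost quality.
    return 4.5 - 0.02 * length_ms - 4.0 * late_fraction


packets = [(10, 160), (15, 160), (35, 160), (120, 160)]
hist = build_delay_histogram(packets, bucket_ms=20.0)  # {0: 320, 1: 160, 6: 160}
print(choose_jitter_buffer_length(hist, 20.0, toy_model))  # prints 40.0
```

With this toy model, the 40 ms trial beats both a shorter buffer (too many late samples) and the 140 ms buffer (too much added delay). This is the trade-off the histogram search is meant to resolve.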
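Claims 2 and 17 recite a delay impairment estimator with two inputs, the current jitter buffer length and a measure of audio interactivity. The claims do not say how interactivity is measured or how the two inputs are mapped to quality. The Python sketch below therefore makes hypothetical choices for both:

- Interactivity is derived from how often the active talker changes.
- Buffering beyond an assumed knee (`knee_ms`) lowers a MOS-like score, scaled by that interactivity.

The names `interactivity_from_talkers` and `delay_impairment_mos`, and all default parameters, are illustrative assumptions.

```python
from typing import Optional, Sequence


def interactivity_from_talkers(
    talker_per_frame: Sequence[Optional[str]],
    frame_ms: float,
    switches_per_min_at_max: float = 20.0,
) -> float:
    """Hypothetical 0..1 interactivity measure based on talker switches per minute.

    talker_per_frame holds a talker label per frame, or None for silence.
    """
    if not talker_per_frame:
        return 0.0
    talkers = [t for t in talker_per_frame if t is not None]
    switches = sum(1 for a, b in zip(talkers, talkers[1:]) if a != b)
    minutes = len(talker_per_frame) * frame_ms / 60000.0
    return min(1.0, switches / minutes / switches_per_min_at_max)


def delay_impairment_mos(
    jitter_buffer_ms: float,
    interactivity: float,
    knee_ms: float = 150.0,
    slope_per_ms: float = 0.004,
) -> float:
    """Hypothetical MOS-like score: buffering beyond knee_ms costs quality,
    and the cost grows with how interactive the conversation is."""
    interactivity = min(max(interactivity, 0.0), 1.0)
    excess_ms = max(0.0, jitter_buffer_ms - knee_ms)
    return max(1.0, 4.5 - slope_per_ms * excess_ms * interactivity)
```

In this sketch, a one-sided monologue (interactivity near 0) tolerates a long buffer without penalty. A rapid back-and-forth exchange is penalized for the same buffer length, which reflects the intuition that added latency matters most when people take turns quickly.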
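Claims 3 to 5 describe ordering buffered audio by a monotonically increasing sample index and discarding a missing packet that shows up after its slot has already been decoded. For comparison, RTP audio timestamps (RFC 3550) advance by the number of samples per packet. The Python sketch below is a minimal illustration of that ordering and discard rule, built on a min-heap. It simplifies by assuming that every packet carries exactly `frame_len` samples and that indices start at 0. The class name is hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class ReorderingJitterBuffer:
    """Orders packets by starting sample index and discards packets whose
    slot has already been handed to the decoder.

    Simplifying assumption: every packet carries exactly frame_len samples and
    sample indices are multiples of frame_len, starting at 0.
    """

    def __init__(self, frame_len: int) -> None:
        self.frame_len = frame_len
        self._heap: List[Tuple[int, List[int]]] = []
        self._next_index = 0  # first sample index not yet handed to the decoder

    def push(self, sample_index: int, samples: List[int]) -> bool:
        """Buffer a packet; return False if it arrived too late and was discarded."""
        if sample_index < self._next_index:
            return False
        heapq.heappush(self._heap, (sample_index, samples))
        return True

    def pop_frame(self) -> Optional[List[int]]:
        """Return the next frame in order, or None for a gap the decoder must conceal."""
        # Drop duplicates of frames that were already played out.
        while self._heap and self._heap[0][0] < self._next_index:
            heapq.heappop(self._heap)
        frame = None
        if self._heap and self._heap[0][0] == self._next_index:
            frame = heapq.heappop(self._heap)[1]
        self._next_index += self.frame_len
        return frame
```

For example, if packets with indices 2 and 0 arrive in that order, `pop_frame` still returns index 0 first. If index 4 is still missing when its turn comes, `pop_frame` returns None, and a later `push(4, ...)` is rejected.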
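Claims 7, 13 and 16 transform each audio factor into a MOS value and then combine the MOS values, but they name neither the transform nor the combination rule. The Python sketch below borrows the R-to-MOS conversion from the ITU-T G.107 E-model. Each factor is assumed to act as an impairment subtracted from the G.107 default rating R0 = 93.2. The multiplicative combination of normalized scores is a hypothetical choice, under which every non-negative impairment can only lower the result. It is not the patent's stated method.

```python
from math import prod
from typing import Dict

# Default basic signal-to-noise ratio R0 from the ITU-T G.107 E-model.
R0 = 93.2


def r_to_mos(r: float) -> float:
    """ITU-T G.107 conversion from transmission rating R to estimated MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The cubic dips slightly below 1 for very small R, so clamp it.
    return max(1.0, mos)


def predict_quality(impairments: Dict[str, float]) -> float:
    """Hypothetical combination: map each impairment to its own MOS, normalize
    to 0..1, multiply, and map back onto the MOS scale."""
    mos_best = r_to_mos(R0)
    mos_values = [r_to_mos(R0 - value) for value in impairments.values()]
    normalized = [(m - 1.0) / (mos_best - 1.0) for m in mos_values]
    return 1.0 + (mos_best - 1.0) * prod(normalized)


# Example: per-bucket factors from delay, loss and time-scaling estimators.
print(round(predict_quality({"delay": 5.0, "loss": 10.0, "time_scaling": 2.0}), 2))
```

The same function could score one histogram bucket (claims 7 and 16) or the whole session (claim 13). Only the source of the impairment values differs. `math.prod` requires Python 3.8 or later.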
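Claims 8, 10 and 14 describe a loss impairment estimator that tracks gaps, meaning packets that were expected but did not arrive within the current jitter buffer length. The Python sketch below marks such gaps, estimates a burst ratio from the observed gap runs, and feeds both into the G.107-style effective equipment impairment formula Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl). The default `bpl=25.1` is the value commonly quoted for G.711 with packet loss concealment. Treat it, and the function names, as assumptions rather than values taken from the patent.

```python
from typing import List, Optional, Sequence


def find_gaps(
    arrival_delays_ms: Sequence[Optional[float]], jitter_buffer_ms: float
) -> List[bool]:
    """True where a packet never arrived (None) or arrived too late for the buffer."""
    return [d is None or d > jitter_buffer_ms for d in arrival_delays_ms]


def burst_ratio(gaps: Sequence[bool]) -> float:
    """Observed mean gap length divided by the mean expected under random loss."""
    lost = sum(gaps)
    if lost == 0 or lost == len(gaps):
        return 1.0
    # A gap run starts at a lost packet that follows a received one (or at index 0).
    runs = sum(1 for i, g in enumerate(gaps) if g and (i == 0 or not gaps[i - 1]))
    loss_prob = lost / len(gaps)
    # Under random loss the expected run length is 1 / (1 - loss_prob).
    return (lost / runs) * (1.0 - loss_prob)


def loss_impairment(
    arrival_delays_ms: Sequence[Optional[float]],
    jitter_buffer_ms: float,
    ie: float = 0.0,
    bpl: float = 25.1,
) -> float:
    """G.107-style effective equipment impairment for the given buffer length."""
    gaps = find_gaps(arrival_delays_ms, jitter_buffer_ms)
    if not gaps:
        return ie
    ppl = 100.0 * sum(gaps) / len(gaps)  # packet loss percentage
    burst_r = burst_ratio(gaps)
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)
```

Re-running `loss_impairment` with a longer trial buffer turns late packets back into usable audio, which lowers the impairment. Packets that never arrive (None) stay as gaps at any buffer length.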
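Claims 9, 11, 18 and 19 describe a time-scaling impairment estimator whose input is the difference between the current and target jitter buffer lengths. Closing a difference of d milliseconds by playing audio at (1 ± r) speed modifies roughly d / r milliseconds of audio. The Python sketch below encodes that relationship. It assumes a hypothetical audibility cost per second of modified audio that grows with r squared. The model and its `audibility` constant are illustrative assumptions, not the patent's estimator.

```python
from typing import Tuple


def time_scaling_impairment(
    current_ms: float,
    target_ms: float,
    scale_ratio: float = 0.05,
    audibility: float = 40.0,
) -> Tuple[float, float]:
    """Return (impairment, duration_ms of time-scaled audio).

    Hypothetical model: stretching or compressing at relative rate scale_ratio
    costs audibility * scale_ratio**2 impairment units per second of audio.
    """
    if not 0.0 < scale_ratio < 1.0:
        raise ValueError("scale_ratio must be in (0, 1)")
    diff_ms = abs(target_ms - current_ms)
    duration_ms = diff_ms / scale_ratio  # approximate audio that must be time-scaled
    impairment_per_s = audibility * scale_ratio ** 2
    return impairment_per_s * duration_ms / 1000.0, duration_ms
```

Under this assumption, gentler scaling yields a smaller total impairment but takes longer to reach the target. For example, growing the buffer from 60 ms to 100 ms at 2.5% instead of 5% halves the impairment and doubles the duration. An adaptive controller would have to weigh that against the delay or loss cost of staying at the current length in the meantime.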
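Claim 12 adds a third input to the delay impairment estimator: the two-way mouth-to-ear (MTE) delay. One established way to turn delay into an impairment value is the default Idd formula of the ITU-T G.107 E-model. Idd is 0 for a one-way delay Ta of 100 ms or less. Above that, with X = log2(Ta / 100), Idd = 25 * ((1 + X^6)^(1/6) - 3 * (1 + (X/3)^6)^(1/6) + 2).

The Python sketch below applies that formula under two assumptions. The path is taken to be symmetric, so Ta is half the two-way MTE delay. The interactivity scaling is a hypothetical addition that is not part of G.107.

```python
import math


def idd_from_one_way_delay(ta_ms: float) -> float:
    """Default ITU-T G.107 delay impairment Idd for one-way delay ta_ms."""
    if ta_ms <= 100.0:
        return 0.0
    x = math.log2(ta_ms / 100.0)
    return 25.0 * ((1 + x ** 6) ** (1 / 6) - 3 * (1 + (x / 3) ** 6) ** (1 / 6) + 2)


def delay_impairment_from_mte(two_way_mte_ms: float, interactivity: float) -> float:
    """Hypothetical estimator combining two-way MTE delay with interactivity (0..1)."""
    ta_ms = two_way_mte_ms / 2.0  # assumption: symmetric network and buffering delay
    interactivity = min(max(interactivity, 0.0), 1.0)
    return interactivity * idd_from_one_way_delay(ta_ms)
```

The two-way MTE delay would include the jitter buffer delay on both ends, so each trial buffer length in the histogram search changes this input as well.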

Patent Metadata

Filing Date

Unknown
