US-9270939

System and method for providing error resilience, random access and rate control in scalable video communications

Published: February 23, 2016

Assignee: not available in the USPTO data we have

Inventors: not available in the USPTO data we have

Technical Abstract

Systems and methods for error resilient transmission, rate control, and random access in video communication systems that use scalable video coding are provided. Error resilience is obtained by using information from low resolution layers to conceal or compensate loss of high resolution layer information. The same mechanism is used for rate control by selectively eliminating high resolution layer information from transmitted signals, which elimination can be compensated at the receiver using information from low resolution layers. Further, random access or switching between low and high resolutions is also achieved by using information from low resolution layers to compensate for high resolution spatial layer packets that may have not been received prior to the switching time.
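The rate control and concealment mechanism described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the `Packet` type, the byte budget, and the function names `thin` and `decode_plan` are hypothetical, and a two-layer (base plus one enhancement) spatial structure is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    picture: int  # picture index
    spatial: int  # 0 = base (low resolution), 1 = enhancement (high resolution)
    size: int     # payload size in bytes


def thin(packets, budget_bytes):
    """Keep every base-layer packet; keep enhancement packets only while the
    (hypothetical) byte budget allows. Dropped enhancement data is expected to
    be compensated at the receiver from the base layer."""
    kept = [p for p in packets if p.spatial == 0]
    used = sum(p.size for p in kept)
    for p in packets:
        if p.spatial > 0 and used + p.size <= budget_bytes:
            kept.append(p)
            used += p.size
    return sorted(kept, key=lambda p: (p.picture, p.spatial))


def decode_plan(received, pictures):
    """For each picture, report how a high-resolution output would be produced."""
    have = {(p.picture, p.spatial) for p in received}
    plan = {}
    for n in pictures:
        if (n, 0) not in have:
            plan[n] = "missing"              # nothing to conceal from
        elif (n, 1) in have:
            plan[n] = "decode_enhancement"   # normal high-resolution decoding
        else:
            plan[n] = "conceal_from_base"    # use low-resolution layer data
    return plan


stream = [Packet(0, 0, 100), Packet(0, 1, 400),
          Packet(1, 0, 100), Packet(1, 1, 400)]
forwarded = thin(stream, budget_bytes=700)
print(decode_plan(forwarded, pictures=[0, 1]))
```

In this sketch the same receiver path handles both accidental loss and deliberate elimination, which is the point the abstract makes about reusing one mechanism for error resilience and rate control.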

Patent Claims

30 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A digital video decoding system, the system comprising: a decoder logic configured to decode a received digital video signal, which is coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability, wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers, wherein, for decoding a picture at a target spatial or quality layer higher than the corresponding base layer, the decoder logic is configured to use coded information from a spatial or quality layer of said picture lower than the target layer in the threaded prediction structure when a portion of the target layer's coded information is lost or not available, wherein the digital video decoding system is disposed in a receiving endpoint, the system further comprising: a linking communication network; a conferencing server computer linked to the receiving endpoint and at least one transmitting endpoint by at least one communication channel each over the linking communication network, and at least one endpoint that transmits the coded digital video that is coded in the scalable video coding format, wherein the conferencing server computer is configured to selectively eliminate portions of input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer, prior to creating an output video signal that is forwarded to the receiving endpoint.

2. The system of claim 1 wherein the conferencing server computer linked to the receiving endpoint and at least one transmitting endpoint is one of: a Transcoding Multipoint Control Unit using cascaded decoding and encoding; a Switching Multipoint Control Unit by selecting which input to transmit as output; a Scalable Video Communication Server using selective multiplexing; and a Compositing Scalable Video Communication Server using selective multiplexing and bitstream-level compositing.

3. The system of claim 1 wherein an encoder logic of the at least one transmitting endpoint is configured to encode transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure so that the decoder logic can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder logic, and wherein the conferencing server computer selectively eliminates portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer in non-R frames only, prior to creating the output video signal that is forwarded to the receiving endpoint.

4. The system of claim 1, comprising: a transmitting endpoint that transmits coded digital video using a scalable video coding format; wherein the linking communication network links the transmitting endpoint with the receiving endpoint, wherein the transmitting endpoint is configured to selectively not transmit portions of its input video signal that correspond to layers higher than the base spatial or quality layer prior to creating an output video signal that is transmitted to the at least one receiving endpoint.

5. The system of claim 4 wherein an encoder logic of the transmitting endpoint is configured to encode transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure and such that the decoder logic can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder logic, and wherein the encoder logic selectively does not transmit to the at least one receiving endpoint portions of its input video signal that correspond to layers higher than the base spatial or quality layer in non-R frames only.

6. The system of claim 1, wherein the decoder logic is configured to display decoded output pictures at a desired spatial resolution that falls in between an immediately lower and an immediately higher spatial layer provided by the coded video signal.

7. The system of claim 1, wherein the decoder logic is further configured to operate a decoding loop of the immediately higher spatial layer at a desired spatial resolution by scaling all coded data of the immediately higher spatial layer to the desired spatial resolution, and wherein resultant drift is eliminated by using at least one of: periodic intra pictures; periodic use of intra base layer mode; and full resolution decoding of at least the lowest temporal layer of the immediately higher spatial layer.

8. The system of claim 1, wherein the scalable video coding format includes at least one of: periodic intra pictures, periodic intra macroblocks, and threaded picture prediction, in order to avoid drift when the target layer's coded information that is lost or is not available corresponds to the base temporal layer.

9. The system of claim 1, where the scalable video coding format is based on hybrid coding, the format comprising H.264, VC-1 or AVS standards, wherein the coded information from a spatial or quality layer lower than the target layer used by the decoder logic when some or all of the target layer's coded information is lost or is not available comprises at least one of: motion vector data, appropriately scaled for the target layer's resolution; coded prediction error difference, upsampled to the target layer's resolution; and intra data, upsampled to the target layer's resolution, and wherein the decoder logic is further configured to use the target layer's decoded pictures as references in the decoding process in order to construct decoded output pictures, rather than the lower layer decoded reference pictures.

10. The system of claim 1, wherein the decoder logic is further configured to operate at least one decoding loop for spatial or quality layers higher than the target spatial or quality layer for at least the base temporal layer, so that when the decoder logic switches target layers it can immediately display decoded pictures at the new target layer resolution.

11. A method for decoding a digital video signal, comprising: receiving the digital video signal at a decoder logic, the digital video signal being coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability, wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers; and decoding a picture at a target spatial or quality layer higher than the corresponding base layer using coded information from a spatial or quality layer of said picture lower than the target layer in the threaded prediction structure when a portion of the target layer's coded information is lost or not available; wherein the decoder is disposed in a receiving endpoint in a linking communication network, wherein a conferencing server computer is linked to the receiving endpoint and at least one transmitting endpoint by at least one communication channel each over the linking communication network, and wherein the at least one transmitting endpoint transmits the coded digital video that is coded in the scalable video coding format; the method further comprising: at the conferencing server computer, selectively eliminating portions of input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer, prior to creating an output video signal that is forwarded to the receiving endpoint.

12. The method of claim 11 wherein the conferencing server computer linked to the receiving endpoint and at least one transmitting endpoint is one of: a Transcoding Multipoint Control Unit using cascaded decoding and encoding; a Switching Multipoint Control Unit by selecting which input to transmit as output; a Scalable Video Communication Server using selective multiplexing; and a Compositing Scalable Video Communication Server using selective multiplexing and bitstream-level compositing.

13. The method of claim 11, further comprising, at an encoder logic of the at least one transmitting endpoint, encoding transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure so that the decoder logic can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder logic, and wherein the conferencing server computer selectively eliminates portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer in non-R frames only, prior to creating the output video signal that is forwarded to the receiving endpoint.

14. The method of claim 11, wherein a transmitting endpoint transmits coded digital video using a scalable video coding format; wherein the linking communication network links the transmitting endpoint with the receiving endpoint, the method further comprising, at the transmitting endpoint, selectively not transmitting portions of its input video signal that correspond to layers higher than the base spatial or quality layer, prior to creating an output video signal that is transmitted to the at least one receiving endpoint.

15. The method of claim 14, further comprising, at the transmitting endpoint encoding transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure and such that the decoder logic can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with an encoder logic, and wherein the encoder logic selectively does not transmit to the at least one receiving endpoint portions of its input video signal that correspond to layers higher than the base spatial or quality layer in non-R frames only.

16. The method of claim 11, further comprising, at the decoder logic, displaying decoded output pictures at a desired spatial resolution that falls in between an immediately lower and an immediately higher spatial layer provided by the coded video signal.

17. The method of claim 11, further comprising, at the decoder logic, operating a decoding loop of the immediately higher spatial layer at a desired spatial resolution by scaling all coded data of the immediately higher spatial layer to the desired spatial resolution, and wherein resultant drift is eliminated by using at least one of: periodic intra pictures; periodic use of intra base layer mode; and full resolution decoding of at least the lowest temporal layer of the immediately higher spatial layer.

18. The method of claim 11, wherein the scalable video coding format includes at least one of: periodic intra pictures, periodic intra macroblocks, and threaded picture prediction, in order to avoid drift when the target layer's coded information that is lost or is not available corresponds to the base temporal layer.

19. The method of claim 11, where the scalable video coding format is based on hybrid coding, the format comprising H.264, VC-1 or AVS standards, wherein the coded information from a spatial or quality layer lower than the target layer used by the decoder logic when some or all of the target layer's coded information is lost or is not available comprises at least one of: motion vector data, appropriately scaled for the target layer's resolution; coded prediction error difference, upsampled to the target layer's resolution; and intra data, upsampled to the target layer's resolution, the method further comprising, at the decoder logic using the target layer's decoded pictures as references in the decoding process in order to construct decoded output pictures, rather than the lower layer decoded reference pictures.

20. The method of claim 11 further comprising, at the decoder logic operating at least one decoding loop for spatial or quality layers higher than the target spatial or quality layer for at least the base temporal layer, so that when the decoder logic switches target layers it can immediately display decoded pictures at the new target layer resolution.

21. A non-transitory computer readable medium comprising a set of executable instructions to direct a processor to decode a digital video signal, by: receiving the digital video signal at a decoder logic, the digital video signal being coded in a scalable video coding format supporting temporal scalability and at least one of spatial and quality scalability, wherein the scalable video coding format for spatial scalability includes a base spatial and at least one spatial enhancement layer, for quality scalability includes a base quality layer and at least one quality enhancement layer, and for temporal scalability includes a base temporal layer and at least one temporal enhancement layer, wherein the base temporal layers and enhancement temporal layers are interlinked by a threaded picture prediction structure for at least one of the spatial or quality scalability layers; and decoding a picture at a target spatial or quality layer higher than the corresponding base layer using coded information from a spatial or quality layer of said picture lower than the target layer in the threaded prediction structure when a portion of the target layer's coded information is lost or not available; wherein the decoder logic is disposed in a receiving endpoint in a linking communication network, wherein a conferencing server computer is linked to the receiving endpoint and at least one transmitting endpoint by at least one communication channel each over the linking communication network, and wherein the at least one transmitting endpoint transmits the coded digital video that is coded in the scalable video coding format; at the conferencing server computer, selectively eliminating portions of input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer, prior to creating an output video signal that is forwarded to the receiving endpoint.

22. The non-transitory computer readable medium of claim 21, wherein the conferencing server computer linked to the receiving endpoint and at least one transmitting endpoint is one of: a Transcoding Multipoint Control Unit using cascaded decoding and encoding; a Switching Multipoint Control Unit by selecting which input to transmit as output; a Scalable Video Communication Server using selective multiplexing; and a Compositing Scalable Video Communication Server using selective multiplexing and bitstream-level compositing.

23. The non-transitory computer readable medium of claim 21, further comprising executable instructions to direct the processor to, at an encoder logic of the at least one transmitting endpoint, encoding transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure so that the decoder logic can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with the encoder logic, and wherein the conferencing server computer selectively eliminates portions of the input video signals received from transmitting endpoints that correspond to layers higher than the base spatial or quality layer in non-R frames only, prior to creating the output video signal that is forwarded to the receiving endpoint.

24. The non-transitory computer readable medium of claim 21, wherein a transmitting endpoint transmits coded digital video using a scalable video coding format; wherein the linking communication network links the transmitting endpoint with the receiving endpoint, further comprising executable instructions to direct the processor to, at the transmitting endpoint, selectively not transmitting portions of its input video signal that correspond to layers higher than the base spatial or quality layer, prior to creating an output video signal that is transmitted to the at least one receiving endpoint.

25. The non-transitory computer readable medium of claim 24, further comprising executable instructions to direct the processor to, at the transmitting endpoint encoding transmitted media as frames in a threaded coding structure having a number of different temporal levels, wherein a subset of the frames (“R”) is particularly selected for reliable transport and includes at least the frames of the lowest temporal layer in the threaded coding structure and such that the decoder logic can decode at least a portion of received media based on a reliably received frame of the type R after packet loss or error and thereafter is synchronized with an encoder logic, and wherein the encoder logic selectively does not transmit to the at least one receiving endpoint portions of its input video signal that correspond to layers higher than the base spatial or quality layer in non-R frames only.

26. The non-transitory computer readable medium of claim 21, further comprising executable instructions to direct the processor to, at the decoder logic, displaying decoded output pictures at a desired spatial resolution that falls in between an immediately lower and an immediately higher spatial layer provided by the coded video signal.

27. The non-transitory computer readable medium of claim 21, further comprising executable instructions to direct the processor to, at the decoder logic, operating a decoding loop of the immediately higher spatial layer at a desired spatial resolution by scaling all coded data of the immediately higher spatial layer to the desired spatial resolution, and wherein resultant drift is eliminated by using at least one of: periodic intra pictures; periodic use of intra base layer mode; and full resolution decoding of at least the lowest temporal layer of the immediately higher spatial layer.

28. The non-transitory computer readable medium of claim 21, wherein the scalable video coding format includes at least one of: periodic intra pictures, periodic intra macroblocks, and threaded picture prediction, in order to avoid drift when the target layer's coded information that is lost or is not available corresponds to the base temporal layer.

29. The non-transitory computer readable medium of claim 21, where the scalable video coding format is based on hybrid coding, the format comprising H.264, VC-1 or AVS standards, wherein the coded information from a spatial or quality layer lower than the target layer used by the decoder logic when some or all of the target layer's coded information is lost or is not available comprises at least one of: motion vector data, appropriately scaled for the target layer's resolution; coded prediction error difference, upsampled to the target layer's resolution; and intra data, upsampled to the target layer's resolution, further comprising executable instructions to direct the processor to, at the decoder logic using the target layer's decoded pictures as references in the decoding process in order to construct decoded output pictures, rather than the lower layer decoded reference pictures.

30. The non-transitory computer readable medium of claim 21, further comprising executable instructions to direct the processor to, at the decoder logic operating at least one decoding loop for spatial or quality layers higher than the target spatial or quality layer for at least the base temporal layer, so that when the decoder logic switches target layers it can immediately display decoded pictures at the new target layer resolution.
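Two ideas recur across the claims above: thinning enhancement-layer data in non-R frames only (claims 3, 13 and 23), and scaling lower-layer motion vectors to the target resolution for concealment (claims 9, 19 and 29). The sketch below illustrates both under stated assumptions. It assumes a dyadic threaded structure in which the R frames are exactly the lowest temporal level, whereas the claims only require R to include at least those frames. All function names and the `(picture, spatial_layer)` packet representation are hypothetical and are not taken from the patent.

```python
def temporal_level(n, levels=3):
    """Temporal level of picture n in an assumed dyadic thread.
    With levels=3 the pattern repeats every 4 pictures: 0, 2, 1, 2."""
    period = 1 << (levels - 1)
    r = n % period
    if r == 0:
        return 0
    trailing_zeros = (r & -r).bit_length() - 1
    return levels - 1 - trailing_zeros


def is_r_frame(n, levels=3):
    """Assumption: R frames are exactly the lowest temporal level."""
    return temporal_level(n, levels) == 0


def thin_non_r(packets, levels=3):
    """Drop enhancement-layer packets (spatial_layer > 0) in non-R frames only.
    Each packet is a (picture, spatial_layer) tuple."""
    return [(n, s) for (n, s) in packets
            if s == 0 or is_r_frame(n, levels)]


def scale_motion_vector(mv, base_size, target_size):
    """Scale a base-layer motion vector (dx, dy) to the target resolution.
    Sizes are (width, height); results are rounded to the nearest unit."""
    (dx, dy), (bw, bh), (tw, th) = mv, base_size, target_size
    return (round(dx * tw / bw), round(dy * th / bh))


packets = [(n, s) for n in range(4) for s in (0, 1)]
print(thin_non_r(packets))
print(scale_motion_vector((3, -2), (320, 240), (640, 480)))
```

Keeping the full-resolution data in R frames gives the decoder a periodically complete high-resolution reference, which is one plausible reading of why the claims restrict elimination to non-R frames.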

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

H04N

Patent Metadata

Filing Date

January 28, 2014

Publication Date

February 23, 2016
