US-6057884

Temporal and spatial scaleable coding for video object planes

Published: May 2, 2000

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Patent Claims

36 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of: downsampling pixel data of a first particular one of said VOPs of said input video sequence to provide a first base layer VOP having a reduced spatial resolution; upsampling pixel data of at least a portion of said first base layer VOP to provide a first upsampled VOP in said enhancement layer; differentially encoding said first upsampled VOP using said first particular one of said VOPs of said input video sequence for communication in said enhancement layer at a temporal position corresponding to said first base layer VOP; downsampling pixel data of a second particular one of said VOPs of said input video sequence to provide a second base layer VOP having a reduced spatial resolution; upsampling pixel data of at least a portion of said second base layer VOP to provide a second upsampled VOP in said enhancement layer which corresponds to said first upsampled VOP; using at least one of said first and second base layer VOPs to predict an intermediate VOP corresponding to said first and second upsampled VOPs; and encoding said intermediate VOP for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second upsampled VOPs.
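The spatial part of claim 1 (downsample to a base layer, upsample it again, and code the enhancement layer as a difference against the original) can be illustrated with a minimal sketch. The 2x2 averaging filter, the pixel-replication upsampler, and all function names below are assumptions made for illustration; the claim does not specify any particular filter, and the temporal prediction of the intermediate VOP is not shown.

```python
def downsample_2x(frame):
    """Halve resolution by averaging each 2x2 block (assumed filter)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[y][x] + frame[y][x + 1]
             + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def upsample_2x(frame):
    """Double resolution by pixel replication (assumed filter)."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def residue(original, prediction):
    """Pixel-wise difference between the original VOP and its prediction."""
    return [[o - p for o, p in zip(ro, rp)]
            for ro, rp in zip(original, prediction)]


def reconstruct(prediction, res):
    """Decoder side: add the residue back onto the upsampled base layer."""
    return [[p + r for p, r in zip(rp, rr)]
            for rp, rr in zip(prediction, res)]


# Hypothetical 4x4 luminance samples standing in for an input VOP.
vop = [
    [10, 12, 20, 22],
    [14, 16, 24, 26],
    [30, 32, 40, 42],
    [34, 36, 44, 46],
]
base = downsample_2x(vop)          # base layer VOP, reduced resolution
upsampled = upsample_2x(base)      # upsampled VOP at full resolution
enhancement = residue(vop, upsampled)
restored = reconstruct(upsampled, enhancement)
```

In this lossless toy case `restored` equals `vop`; a real codec would transform and quantize the residue, so the reconstruction would only approximate the input.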

2. The method of claim 1, wherein: said enhancement layer has a higher temporal resolution than said base layer; and said base and enhancement layer are adapted to provide at least one of: (a) a picture-in-picture (PIP) capability wherein a PIP image is carried in said base layer, and (b) a preview access channel capability wherein a preview access image is carried in said base layer.

3. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of: providing a first particular one of said VOPs of said input video sequence for communication in said base layer as a first base layer VOP; downsampling pixel data of at least a portion of said first base layer VOP for communication in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; downsampling corresponding pixel data of said first particular one of said VOPs to provide a comparison VOP; differentially encoding said first downsampled VOP using said comparison VOP; differentially encoding said first base layer VOP using said first particular one of said VOPs by: determining a residue according to a difference between pixel data of said first base layer VOP and pixel data of said first particular one of said VOPs; and spatially transforming said residue to provide transform coefficients; wherein said VOPs in said input video sequence are field mode VOPs, and said first base layer VOP is differentially encoded by reordering lines of said pixel data of said first base layer VOP in a field mode prior to said determining step if said lines of pixel data meet a reordering criteria.

4. The method of claim 3, wherein: said lines of pixel data of said first base layer VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.
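The reordering criterion of claim 4 compares how much adjacent (opposite-field) lines differ against how much lines two apart (same-field) differ. The sketch below is one possible reading, assuming absolute differences, a block given as a list of luminance rows, and a particular pairing of lines; the claim itself fixes none of these details, and the function names are hypothetical.

```python
def should_reorder(block, bias=0):
    """Return True when opposite-field differences exceed
    same-field differences plus a bias term (assumed form)."""
    opposite = 0
    same = 0
    for i in range(0, len(block) - 3, 2):
        rows = zip(block[i], block[i + 1], block[i + 2], block[i + 3])
        for a, b, c, d in rows:
            # Adjacent lines belong to opposite fields.
            opposite += abs(a - b) + abs(b - c)
            # Lines two apart belong to the same field.
            same += abs(a - c) + abs(b - d)
    return opposite > same + bias


def reorder_to_field_mode(block):
    """Group even (top-field) lines first, then odd (bottom-field) lines."""
    return block[0::2] + block[1::2]


# Strongly interlaced content: the two fields differ a lot.
interlaced = [[10, 10], [200, 200], [10, 10], [200, 200]]
# Smooth vertical gradient: adjacent lines are similar.
smooth = [[10, 10], [12, 12], [14, 14], [16, 16]]

reorder_interlaced = should_reorder(interlaced)
reorder_smooth = should_reorder(smooth)
field_block = reorder_to_field_mode(interlaced)
```

With these inputs the interlaced block meets the criterion and the smooth block does not, so only the former would have its lines regrouped by field before the residue is computed.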

5. A method for coding a bi-directionally predicted video object plane (B-VOP), comprising the steps of: scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer; providing first and second base layer VOPs in said base layer which correspond to said input video sequence VOPs; said second base layer VOP being predicted from said first base layer VOP according to a motion vector MV_p; providing said B-VOP in said enhancement layer at a temporal position which is intermediate to that of said first and second base layer VOPs; and encoding said B-VOP using at least one of: (a) a forward motion vector MV_f and (b) a backward motion vector MV_B, obtained by scaling said motion vector MV_p.

6. The method of claim 5, wherein: a temporal distance TR_p separates said first and second base layer VOPs; a temporal distance TR_B separates said first base layer VOP and said B-VOP; m/n is a ratio of the spatial resolution of the first and second base layer VOPs to the spatial resolution of the B-VOP; and at least one of: (a) said forward motion vector MV_f is determined according to the relationship MV_f = (m/n)·TR_B·MV_p/TR_p; and (b) said backward motion vector MV_B is determined according to the relationship MV_B = (m/n)·(TR_B - TR_p)·MV_p/TR_p.
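The two relationships in claim 6 are simple enough to evaluate directly. The following sketch applies them component-wise to a two-dimensional motion vector, using exact fractions to avoid rounding; the function name, the tuple representation, and the sample values are assumptions chosen only to show the arithmetic.

```python
from fractions import Fraction


def scale_motion_vectors(mv_p, tr_p, tr_b, m=1, n=1):
    """Derive forward and backward vectors from MV_p.

    MV_f = (m/n) * TR_B * MV_p / TR_p
    MV_B = (m/n) * (TR_B - TR_p) * MV_p / TR_p
    """
    ratio = Fraction(m, n)
    mv_f = tuple(ratio * tr_b * c / tr_p for c in mv_p)
    mv_b = tuple(ratio * (tr_b - tr_p) * c / tr_p for c in mv_p)
    return mv_f, mv_b


# Hypothetical values: base layer VOPs three time units apart,
# B-VOP one unit after the first, resolution ratio m/n = 2/1.
mv_f, mv_b = scale_motion_vectors(mv_p=(6, -3), tr_p=3, tr_b=1, m=2, n=1)
```

Here `mv_f` works out to (4, -2) and `mv_b` to (-8, 4). The backward vector points the opposite way because TR_B - TR_p is negative whenever the B-VOP lies between the two base layer VOPs.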

7. The method of claim 5, comprising the further step of: encoding said B-VOP using at least one of: (a) a search region of said first base layer VOP whose center is determined according to said forward motion vector MV_f; and (b) a search region of said second base layer VOP whose center is determined according to said backward motion vector MV_B.
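Claim 7 centers a search region on the scaled vector rather than searching from zero displacement. A minimal sketch of that idea follows, assuming a sum-of-absolute-differences (SAD) cost, an exhaustive search over a small square window, and integer-pel vectors; none of these choices come from the claim, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences for one candidate displacement."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def refine_around(cur, ref, bx, by, center, radius=1, size=2):
    """Search a (2*radius+1)^2 window centered on a predicted vector."""
    h, w = len(ref), len(ref[0])
    cx, cy = center
    best = None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            # Skip candidates that fall outside the reference picture.
            if by + dy < 0 or by + dy + size > h:
                continue
            if bx + dx < 0 or bx + dx + size > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1] if best else None


# Hypothetical 6x6 pictures: a 2x2 patch sits at (x=1, y=1) in the
# current picture and at (x=3, y=2) in the reference picture.
ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
ref[2][3], ref[2][4], ref[3][3], ref[3][4] = 9, 8, 7, 6
cur[1][1], cur[1][2], cur[2][1], cur[2][2] = 9, 8, 7, 6

# The scaled vector (1, 1) is close to, but not exactly, the true motion.
found = refine_around(cur, ref, bx=1, by=1, center=(1, 1))
```

The search recovers the displacement (2, 1) while evaluating only nine candidates, which is the practical appeal of seeding the window with a vector derived from the base layer.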

8. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence; said method comprising the steps of: upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution; wherein: a second particular one of said VOPs of said input video sequence is downsampled to provide a second base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said second base layer VOP is upsampled to provide a second upsampled VOP in said enhancement layer which corresponds to said first upsampled VOP; at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second upsampled VOPs; and said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second upsampled VOPs.

9. The method of claim 8, wherein: said enhancement layer has a higher temporal resolution than said base layer; and said base and enhancement layer are adapted to provide at least one of: (a) a picture-in-picture (PIP) capability wherein a PIP image is carried in said base layer, and (b) a preview access channel capability wherein a preview access image is carried in said base layer.

10. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP; pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and said first downsampled VOP is differentially encoded using said comparison VOP; said method comprising the steps of: upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and processing said first enhancement layer VOP with said restored associated spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution; wherein: said first base layer VOP is differentially encoded using said first particular one of said VOPs by determining a residue according to a difference between pixel data of said first base layer VOP and pixel data of said first particular one of said VOPs, and spatially transforming said residue to provide transform coefficients; and said VOPs in said input video sequence are field mode VOPs, and said first base layer VOP is differentially encoded by reordering lines of said pixel data of said first base layer VOP in a field mode prior to determining said residue if said lines of pixel data meet a reordering criteria.

11. The method of claim 10, wherein: said lines of pixel data of said first base layer VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

12. A method for recovering an input video sequence comprising video object planes (VOPs) which was scaled and communicated in a corresponding base layer and enhancement layer in a data stream, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: first and second base layer VOPs are provided in said base layer which correspond to said input video sequence VOPs; said second base layer VOP is predicted from said first base layer VOP according to a motion vector MV_p; a bi-directionally predicted video object plane (B-VOP) is provided in said enhancement layer at a temporal position which is intermediate to that of said first and second base layer VOPs; and said B-VOP is encoded using a forward motion vector MV_f and a backward motion vector MV_B which are obtained by scaling said motion vector MV_p; said method comprising the steps of: recovering said forward motion vector MV_f and said backward motion vector MV_B from said data stream; and decoding said B-VOP using said forward motion vector MV_f and said backward motion vector MV_B.

13. The method of claim 12, wherein: a temporal distance TR_p separates said first and second base layer VOPs; a temporal distance TR_B separates said first base layer VOP and said B-VOP; m/n is a ratio of the spatial resolution of the first and second base layer VOPs to the spatial resolution of the B-VOP; and at least one of: (a) said forward motion vector MV_f is determined according to the relationship MV_f = (m/n)·TR_B·MV_p/TR_p; and (b) said backward motion vector MV_B is determined according to the relationship MV_B = (m/n)·(TR_B - TR_p)·MV_p/TR_p.

14. The method of claim 12, wherein: said B-VOP is encoded using at least one of: (a) a search region of said first base layer VOP whose center is determined according to said forward motion vector MV_f; and (b) a search region of said second base layer VOP whose center is determined according to said backward motion vector MV_B.

15. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence; said apparatus comprising: means for upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and means for processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution; wherein: said VOPs in said input video sequence are field mode VOPs; and said first upsampled VOP is differentially encoded by reordering lines of said pixel data of said first upsampled VOP in a field mode if said lines of pixel data meet a reordering criteria, then determining a residue according to a difference between pixel data of said first upsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence, and spatially transforming said residue to provide transform coefficients.

16. The apparatus of claim 15, wherein: said lines of pixel data of said first upsampled VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

17. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP; pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and said first downsampled VOP is differentially encoded using said comparison VOP; said apparatus comprising: means for upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and means for processing said first enhancement layer VOP with said restored spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution; wherein: said first downsampled VOP is differentially encoded by determining a residue according to a difference between pixel data of said first downsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence, and spatially transforming said residue to provide transform coefficients; and said VOPs in said input video sequence are field mode VOPs, and said first base layer VOP is differentially encoded by reordering lines of said pixel data of said first base layer VOP in a field mode prior to determining said residue if said lines of pixel data meet a reordering criteria.

18. The apparatus of claim 17, wherein: said lines of pixel data of said first base layer VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

19. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which was scaled and communicated in a corresponding base layer and enhancement layer in a data stream, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: first and second base layer VOPs which correspond to said input video sequence VOPs are provided in said base layer; said second base layer VOP is predicted from said first base layer VOP according to a motion vector MV_p; a bi-directionally predicted video object plane (B-VOP) is provided in said enhancement layer at a temporal position which is intermediate to that of said first and second base layer VOPs; and said B-VOP is encoded using a forward motion vector MV_f and a backward motion vector MV_B which are obtained by scaling said motion vector MV_p; said apparatus comprising: means for recovering said forward motion vector MV_f and said backward motion vector MV_B from said data stream; and means for decoding said B-VOP using said forward motion vector MV_f and said backward motion vector MV_B.

20. The apparatus of claim 19, wherein: a temporal distance TR_p separates said first and second base layer VOPs; a temporal distance TR_B separates said first base layer VOP and said B-VOP; m/n is a ratio of the spatial resolution of the first and second base layer VOPs to the spatial resolution of the B-VOP; and at least one of: (a) said forward motion vector MV_f is determined according to the relationship MV_f = (m/n)·TR_B·MV_p/TR_p; and (b) said backward motion vector MV_B is determined according to the relationship MV_B = (m/n)·(TR_B - TR_p)·MV_p/TR_p.

21. The apparatus of claim 19, wherein: said B-VOP is encoded using at least one of: (a) a search region of said first base layer VOP whose center is determined according to said forward motion vector MV_f; and (b) a search region of said second base layer VOP whose center is determined according to said backward motion vector MV_B.

22. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of: downsampling pixel data of a first particular one of said VOPs of said input video sequence to provide a first base layer VOP having a reduced spatial resolution; upsampling pixel data of at least a portion of said first base layer VOP to provide a first upsampled VOP in said enhancement layer; differentially encoding said first upsampled VOP using said first particular one of said VOPs of said input video sequence for communication in said enhancement layer at a temporal position corresponding to said first base layer VOP; wherein said VOPs in said input video sequence are field mode VOPs, and said differentially encoding step comprises the further steps of: reordering lines of said pixel data of said first upsampled VOP in a field mode if said lines of pixel data meet a reordering criteria; then determining a residue according to a difference between pixel data of said first upsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence; and spatially transforming said residue to provide transform coefficients.

23. The method of claim 22, wherein: said lines of pixel data of said first upsampled VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

24. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of: downsampling pixel data of a first particular one of said VOPs of said input video sequence to provide a first base layer VOP having a reduced spatial resolution; upsampling pixel data of at least a portion of said first base layer VOP to provide a first upsampled VOP in said enhancement layer; and differentially encoding said first upsampled VOP using said first particular one of said VOPs of said input video sequence for communication in said enhancement layer at a temporal position corresponding to said first base layer VOP; wherein: said base layer is adapted to carry higher priority, lower bit rate data, and said enhancement layer is adapted to carry lower priority, higher bit rate data.

25. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of: providing a first particular one of said VOPs of said input video sequence for communication in said base layer as a first base layer VOP; downsampling pixel data of at least a portion of said first base layer VOP for communication in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; downsampling corresponding pixel data of said first particular one of said VOPs to provide a comparison VOP; differentially encoding said first downsampled VOP using said comparison VOP; providing a second particular one of said VOPs of said input video sequence for communication in said base layer as a second base layer VOP; downsampling pixel data of at least a portion of said second base layer VOP for communication in said enhancement layer as a second downsampled VOP at a temporal position corresponding to said second base layer VOP; downsampling corresponding pixel data of said second particular one of said VOPs to provide a comparison VOP; differentially encoding said second downsampled VOP using said comparison VOP; using at least one of said first and second base layer VOPs to predict an intermediate VOP corresponding to said first and second downsampled VOPs; and encoding said intermediate VOP for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second downsampled VOPs.

26. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of: providing a first particular one of said VOPs of said input video sequence for communication in said base layer as a first base layer VOP; downsampling pixel data of at least a portion of said first base layer VOP for communication in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; downsampling corresponding pixel data of said first particular one of said VOPs to provide a comparison VOP; and differentially encoding said first downsampled VOP using said comparison VOP; wherein: the base and enhancement layers are adapted to provide a stereoscopic video capability in which image data in the enhancement layer has a lower spatial resolution than image data in the base layer.

27. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence; said method comprising the steps of: upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution; wherein: said VOPs in said input video sequence are field mode VOPs; and said first upsampled VOP is differentially encoded by reordering lines of said pixel data of said first upsampled VOP in a field mode if said lines of pixel data meet a reordering criteria, then determining a residue according to a difference between pixel data of said first upsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence, and spatially transforming said residue to provide transform coefficients.

28. The method of claim 27, wherein: said lines of pixel data of said first upsampled VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

29. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence; said method comprising the steps of: upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution; wherein: said base layer is adapted to carry higher priority, lower bit rate data, and said enhancement layer is adapted to carry lower priority, higher bit rate data.

30. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP; pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and said first downsampled VOP is differentially encoded using said comparison VOP; said method comprising the steps of: upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and processing said first enhancement layer VOP with said restored associated spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution; wherein: a second particular one of said VOPs of said input video sequence is provided in said base layer as a second base layer VOP; pixel data of at least a portion of said second base layer VOP is downsampled and carried in said enhancement layer as a second downsampled VOP at a temporal position corresponding to said second base layer VOP; corresponding pixel data of said second particular one of said VOPs is downsampled to provide a comparison VOP; said second downsampled VOP is differentially encoded using said comparison VOP; at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second downsampled VOPs; and said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second downsampled VOPs.

31. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP; pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and said first downsampled VOP is differentially encoded using said comparison VOP; said method comprising the steps of: upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and processing said first enhancement layer VOP with said restored associated spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution; wherein: said base and enhancement layer are adapted to provide a stereoscopic video capability in which image data in said enhancement layer has a lower spatial resolution than image data in said base layer.

32. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence; said apparatus comprising: means for upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and means for processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution; wherein: a second particular one of said VOPs of said input video sequence is downsampled to provide a second base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said second base layer VOP is upsampled to provide a second upsampled VOP in said enhancement layer which corresponds to said first upsampled VOP; at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second upsampled VOPs; and said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second upsampled VOPs.

33. The apparatus of claim 32, wherein: said enhancement layer has a higher temporal resolution than said base layer; and said base and enhancement layers are adapted to provide at least one of: (a) a picture-in-picture (PIP) capability wherein a PIP image is carried in said base layer, and (b) a preview access channel capability wherein a preview access image is carried in said base layer.

34. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution; pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence; said apparatus comprising: means for upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and means for processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution; wherein: said base layer is adapted to carry higher priority, lower bit rate data, and said enhancement layer is adapted to carry lower priority, higher bit rate data.

35. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP; pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and said first downsampled VOP is differentially encoded using said comparison VOP; said apparatus comprising: means for upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and means for processing said first enhancement layer VOP with said restored spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution; wherein: a second particular one of said VOPs of said input video sequence is provided for communication in said base layer as a second base layer VOP; pixel data of at least a portion of said second base layer VOP is downsampled to provide a second downsampled VOP in said enhancement layer which corresponds to said first upsampled VOP; at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second downsampled VOPs; and said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second downsampled VOPs.

36. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein: a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP; pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP; corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and said first downsampled VOP is differentially encoded using said comparison VOP; said apparatus comprising: means for upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and means for processing said first enhancement layer VOP with said restored spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution; wherein: said base and enhancement layer are adapted to provide a stereoscopic video capability in which image data in said enhancement layer has a lower spatial resolution than image data in said base layer.
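The decoder-side steps recited in claims 32 and 34 (upsample the base layer VOP, then combine it with the differentially encoded enhancement layer VOP to recover the original spatial resolution) can be illustrated with a minimal sketch. The claims do not fix any of the following, so all of it is assumed here for illustration: the 2:1 scaling factor, pixel-replication upsampling, plain decimation for downsampling, a lossless pixel-domain residue, and simple averaging without motion compensation for the intermediate VOP. The function names are hypothetical and do not come from the patent.

```python
# Illustrative sketch only: factor-of-2 scaling, pixel replication, and a
# lossless residue are assumptions, not details taken from the claims.

def downsample_2x(vop):
    """Keep every second pixel in each dimension (assumed decimation filter)."""
    return [row[::2] for row in vop[::2]]


def upsample_2x(vop):
    """Repeat each pixel twice horizontally and vertically (assumed filter)."""
    out = []
    for row in vop:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_residual(original, base):
    """Encoder side: difference between the full-resolution VOP and the
    upsampled base layer VOP, carried in the enhancement layer."""
    up = upsample_2x(base)
    return [[o - u for o, u in zip(orow, urow)]
            for orow, urow in zip(original, up)]


def decode_full_resolution(base, residual):
    """Decoder side: upsample the base layer VOP and add the enhancement
    layer residue to restore the original spatial resolution."""
    up = upsample_2x(base)
    return [[u + r for u, r in zip(urow, rrow)]
            for urow, rrow in zip(up, residual)]


def predict_intermediate(base_a, base_b):
    """Predict a VOP lying temporally between two base layer VOPs by
    averaging their upsampled versions (assumed; no motion compensation)."""
    ua, ub = upsample_2x(base_a), upsample_2x(base_b)
    return [[(a + b) // 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(ua, ub)]


original = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
base = downsample_2x(original)               # reduced-resolution base layer VOP
residual = encode_residual(original, base)   # enhancement layer payload
restored = decode_full_resolution(base, residual)
```

With a lossless residue, `restored` equals `original`. A practical codec would transform and quantize the residue, so the reconstruction would only approximate the input.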

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

G06T H04N

Patent Metadata

Filing Date

Unknown

Publication Date

May 2, 2000
