The usual coding order, according to which the reference view is coded prior to the dependent view and, within each view, a depth map is coded subsequent to the respective picture, may be maintained and does not lead to a sacrifice of efficiency in performing inter-view redundancy removal by, for example, predicting motion data of the current picture of the dependent view from motion data of the current picture of the reference view. Rather, a depth map estimate of the current picture of the dependent view is obtained by warping the depth map of the current picture of the reference view into the dependent view. This bridges the gap between the views and allows various methods of inter-view redundancy reduction to be performed more efficiently. According to another aspect, the following discovery is exploited: the overhead associated with an enlarged list of motion predictor candidates for a block of a picture of a dependent view is comparatively low compared to the gain in motion vector prediction quality that results from adding a motion vector candidate determined from a block of a reference view that is co-located in the disparity-compensated sense.
Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.
1. A decoder for decoding a multi-view signal transmitted via a data stream, comprising: a depth estimator configured for obtaining, using a processor, a depth map of a first view; and a dependent view reconstructor configured for: processing, using the processor, a flag that signals whether first motion data associated with the first view is derived using second motion data associated with a second view; responsive to the flag signaling that the first motion data is to be derived using the second motion data associated with the second view: estimating, using the processor and based on the depth map, a disparity with respect to the first view, identifying, using the processor, for a first picture coding block in the first view, a second picture coding block in the second view based on the disparity, obtaining, using the processor, the second motion data associated with the second picture coding block in the second view, predicting, using the processor, the first motion data associated with the first picture coding block in the first view based on the second motion data and the disparity, the predicting including deriving a first reference picture index associated with the first motion data by modifying a second reference picture index associated with the second motion data such that a first picture order count of a first reference picture is equal to a second picture order count of a second reference picture, adding, using the processor, the first motion data as a candidate in a set of motion data candidates for the first picture coding block in the first view, extracting, using the processor and from the data stream, index information specifying a motion data candidate of the set of motion data candidates for the first picture coding block in the first view, and reconstructing, using the processor, the first picture coding block of the first view by prediction based on the motion data candidate specified by the index information.
This invention relates to video decoding, specifically for multi-view signals transmitted via a data stream. The problem addressed is efficient motion data prediction and reconstruction in multi-view video coding, where motion data from one view can be derived from another view to reduce redundancy and improve compression. The decoder includes a depth estimator that generates a depth map for a first view. A dependent view reconstructor processes a flag indicating whether motion data for the first view should be derived from motion data of a second view. If the flag is set, the reconstructor estimates disparity between the views using the depth map, identifies corresponding coding blocks in the second view, and obtains their motion data. The first view's motion data is then predicted by modifying the second view's motion data, including adjusting reference picture indices to align picture order counts. The predicted motion data is added to a candidate set, and index information from the data stream selects the final motion data for reconstructing the first view's coding block. This approach leverages inter-view dependencies to improve motion prediction accuracy and reduce bitrate, particularly in multi-view video applications like 3D video or immersive media.
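The decoding steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes rectified cameras, so that disparity is d = f * b / Z. It also assumes the reference view's motion field is stored per 8x8 block in a dictionary and that reference picture lists are represented by their POC values. The names `MotionData`, `BLOCK`, `inter_view_candidate` and `select_candidate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BLOCK = 8  # hypothetical granularity at which the reference view's motion field is stored

@dataclass(frozen=True)
class MotionData:
    mv: Tuple[int, int]  # motion vector (x, y) in full samples
    ref_idx: int         # index into the view's reference picture list

def depth_to_disparity(z: float, focal: float, baseline: float) -> int:
    """Disparity for rectified, horizontally aligned cameras: d = f * b / Z."""
    return round(focal * baseline / z)

def inter_view_candidate(
    x: int, y: int, w: int, h: int,
    depth_map: List[List[float]],                   # estimated depth of the first (dependent) view
    ref_motion: Dict[Tuple[int, int], MotionData],  # motion field of the second (reference) view
    ref_view_pocs: List[int],                       # POC of each entry in the reference view's list
    dep_view_pocs: List[int],                       # POC of each entry in the dependent view's list
    focal: float, baseline: float,
) -> Optional[MotionData]:
    # 1. Estimate the disparity from the depth sample at the block center.
    cx, cy = x + w // 2, y + h // 2
    d = depth_to_disparity(depth_map[cy][cx], focal, baseline)
    # 2. Locate the corresponding block in the reference view (assumed sign convention).
    src = ref_motion.get(((cx - d) // BLOCK, cy // BLOCK))
    if src is None:  # e.g. intra-coded or outside the picture: no candidate
        return None
    # 3. Re-map the reference index so both reference pictures share the same POC.
    poc = ref_view_pocs[src.ref_idx]
    if poc not in dep_view_pocs:
        return None
    return MotionData(src.mv, dep_view_pocs.index(poc))

def select_candidate(candidates: List[MotionData], index: int) -> MotionData:
    # 4. The parsed index information picks one entry of the candidate set.
    return candidates[index]
```

Consider an 8x8 block at (8, 0) with uniform depth 10 and f * b = 80. It gets disparity 8 and inherits the motion vector of reference-view block (0, 0). Its reference index is rewritten to point at the dependent-view reference picture with the same POC. If no such picture exists, no inter-view candidate is produced.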
2. The decoder of claim 1, wherein the dependent view reconstructor is further configured for: extracting motion data residual from the data stream; generating refined motion data for the first picture coding block based on the motion data candidate and the motion data residual; and reconstructing the first picture coding block of the first view by prediction based on the refined motion data.
This invention relates to video decoding, specifically to the reconstruction of dependent views in multi-view video coding. The problem addressed is that a motion data candidate, even a good one, may not match the actual motion of a block exactly, so using it unmodified can reduce prediction quality. The dependent view reconstructor first builds the set of motion data candidates for a picture coding block of the first (dependent) view, including the candidate derived from the second (reference) view. It then selects one candidate using the index information. Next, it extracts a motion data residual from the data stream and combines it with the selected candidate to generate refined motion data. The first picture coding block is then reconstructed by prediction based on this refined motion data. Only the difference to a good predictor has to be transmitted, so the motion data can be represented accurately at a low bit cost. The method is particularly useful in multi-view video applications where dependent views must be decoded efficiently.
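The refinement step of claim 2 amounts to adding the transmitted residual to the selected candidate. The sketch below makes one assumption that the claim does not state: the residual refines only the motion vector, and the reference index is inherited from the candidate. `MotionData` and `refine` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class MotionData:
    mv: Tuple[int, int]  # motion vector (x, y)
    ref_idx: int         # reference picture index

def refine(candidate: MotionData, mvd: Tuple[int, int]) -> MotionData:
    # Refined vector = selected candidate (predictor) + transmitted residual.
    # Assumption: the reference index is inherited from the candidate unchanged.
    mx, my = candidate.mv
    dx, dy = mvd
    return MotionData((mx + dx, my + dy), candidate.ref_idx)
```

For example, a candidate vector (3, -1) with residual (1, 2) yields the refined vector (4, 1).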
3. The decoder of claim 1, wherein the depth estimator is configured for obtaining the depth map of the first view by warping another depth map associated with the second view into the depth map of the first view.
This invention relates to depth estimation in multi-view video coding, specifically to providing a depth map for the first (dependent) view at the point where it is needed for inter-view prediction. In the usual coding order, the depth map of the current picture of the dependent view is coded after that picture. It is therefore not yet available when the picture's motion data is predicted, whereas the depth map of the second (reference) view already is. The depth estimator solves this by warping the depth map associated with the second view into the first view. It shifts the depth samples by the corresponding disparity so that they align with the first view's sampling grid. The warped depth map serves as an estimate of the first view's depth without requiring any additional depth data for the first view. It also provides the disparities used to locate corresponding blocks in the second view. The approach is particularly useful in 3D video and other multi-view applications.
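The warping of claims 3 and 9 can be made concrete with the usual relations for rectified, parallel cameras. These relations are assumptions, not taken from the claims. Depth samples v are taken to be 8-bit values that quantize inverse depth between Z_near and Z_far, a convention common in MPEG 3D video test material. f is the focal length in samples and b the camera baseline, and the direction of the shift is a chosen sign convention.

```latex
\frac{1}{Z} = \frac{v}{255}\left(\frac{1}{Z_{\mathrm{near}}} - \frac{1}{Z_{\mathrm{far}}}\right) + \frac{1}{Z_{\mathrm{far}}},
\qquad
d = \frac{f\,b}{Z},
\qquad
x_{\mathrm{dep}} = x_{\mathrm{ref}} + d
```

Each reference-view depth sample at column x_ref is thus moved d samples horizontally. Nearer objects have larger d, so warped samples can collide, in which case the nearer one should win. Warping can also leave holes where background is disoccluded.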
4. The decoder of claim 3, wherein the depth estimator is configured for warping by: obtaining a second disparity associated with a second picture of the second view; and applying the second disparity to a reference depth map of the second view to derive the depth map of the first view.
This invention relates to depth estimation in multi-view video coding, specifically to how a depth map is transferred from one view to another. The problem addressed is that depth information is available for the second (reference) view but must be estimated for the first (dependent) view. The depth estimator performs the warping of claim 3 in two steps. First, it obtains a second disparity associated with a second picture of the second view. Second, it applies this disparity to a reference depth map of the second view, moving each depth sample to the position it occupies in the first view, which yields the depth map of the first view. The resulting depth map provides the disparities needed to locate corresponding blocks between the views for inter-view motion prediction, without requiring an independent depth estimate for the first view.
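A minimal forward-warping sketch of claim 4 takes a reference depth map and a per-sample disparity map. Three assumptions are not stated in the claim. First, the shift is x_dep = x_ref + d. Second, collisions are resolved with a z-buffer that keeps the sample nearest the camera. Third, holes are filled with the farther (background) of the nearest non-empty samples to the left and right. `warp_depth` is a hypothetical name.

```python
from typing import List

def warp_depth(ref_depth: List[List[float]], disparity: List[List[int]]) -> List[List[float]]:
    """Forward-warp a reference-view depth map into the dependent view."""
    h, w = len(ref_depth), len(ref_depth[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xt = x + disparity[y][x]  # assumed sign convention: x_dep = x_ref + d
            if 0 <= xt < w:
                z = ref_depth[y][x]
                # z-buffer test: on collisions keep the sample nearest to the camera
                if out[y][xt] is None or z < out[y][xt]:
                    out[y][xt] = z
    # Fill disocclusion holes with the larger depth (background) of the nearest
    # non-empty samples to the left and right (left may already be a filled hole).
    for row in out:
        for x in range(w):
            if row[x] is None:
                left = next((row[i] for i in range(x - 1, -1, -1) if row[i] is not None), None)
                right = next((row[i] for i in range(x + 1, w) if row[i] is not None), None)
                known = [v for v in (left, right) if v is not None]
                row[x] = max(known) if known else float("inf")  # empty row: treat as far away
    return out
```

Filling holes from the background side reflects the idea that disoccluded regions usually belong to the background. Other hole-filling rules are equally possible.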
5. The decoder of claim 1, wherein the dependent view reconstructor is further configured for: identifying additional second picture coding blocks in the second view; obtaining additional second motion data associated with the additional second picture coding blocks; and estimating the first motion data for the first picture coding block in the first view based on both the second motion data and the additional second motion data.
This invention relates to video decoding, specifically to motion data prediction in multi-view video coding systems. The problem addressed is that the motion data of a single corresponding block in the second (reference) view may be an unreliable predictor for a block in the first (dependent) view. This can happen, for example, when the disparity estimate is inaccurate or the block covers objects that move differently. The dependent view reconstructor therefore identifies further second picture coding blocks in the second view, in addition to the second picture coding block, and obtains their motion data. It then estimates the motion data for the first picture coding block from both the second motion data and the additional second motion data. Drawing on several motion data sources in the reference view makes the prediction more robust than relying on a single block, which improves compression efficiency in multi-view video coding.
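The claims do not say how the several reference-view motion data sets are combined. One plausible rule, shown here purely as an assumption, keeps the most frequent reference index and takes the component-wise median of the matching vectors. `combine` is a hypothetical name, and `None` stands for a reference block without usable motion data.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median_low
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class MotionData:
    mv: Tuple[int, int]
    ref_idx: int

def combine(candidates: List[Optional[MotionData]]) -> Optional[MotionData]:
    avail = [c for c in candidates if c is not None]  # drop blocks without motion data
    if not avail:
        return None
    # Majority vote on the reference index (ties: first encountered wins).
    ref_idx = Counter(c.ref_idx for c in avail).most_common(1)[0][0]
    same = [c for c in avail if c.ref_idx == ref_idx]
    # Component-wise median; median_low keeps the result an actual integer sample value.
    mvx = median_low([c.mv[0] for c in same])
    mvy = median_low([c.mv[1] for c in same])
    return MotionData((mvx, mvy), ref_idx)
```

A median suppresses a single outlier vector, such as one taken from a block that straddles an object boundary, better than an average would.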
6. The decoder of claim 1, wherein the dependent view reconstructor is configured for extracting from the data stream, using the processor, a sub-block syntax element representing a sub-block flag that indicates whether the picture coding block in the first view is to be decoded in units of sub-blocks of the picture coding block, wherein the index information specifies the motion data candidate of the set of motion data candidates for a one of the sub-blocks of the picture coding block in the first view, and the one of the sub-blocks of the picture coding block is reconstructed by prediction based on the motion data candidate specified by the index information for the one of the sub-blocks.
This invention relates to video decoding, specifically to sub-block level motion prediction in multi-view video coding. The problem addressed is that a single set of motion data per picture coding block cannot follow motion that varies within the block. The dependent view reconstructor extracts a sub-block syntax element from the data stream. This element represents a sub-block flag indicating whether the picture coding block in the first view is to be decoded in units of smaller sub-blocks rather than as a whole. If so, the index information selects a motion data candidate from the set of motion data candidates for one of the sub-blocks. That sub-block is then reconstructed by prediction based on the selected candidate. This allows finer-grained motion compensation that adapts to local motion variations within a block. It is helpful in multi-view scenarios where inter-view dependencies are exploited and improves compression efficiency for scenes with diverse motion.
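The sub-block mechanism of claim 6 can be sketched as follows. The block is split into a raster-ordered grid of sub-blocks when the flag is set, and one candidate is chosen per unit. Several details are assumptions: the 4x4 sub-block size, one parsed index per unit, and the callback names `build_candidates` and `read_index`. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Rect = Tuple[int, int, int, int]  # x, y, width, height

def prediction_units(x: int, y: int, w: int, h: int,
                     sub_block_flag: bool, sub: int = 4) -> List[Rect]:
    """Whole block if the flag is off, otherwise a raster-ordered grid of sub-blocks."""
    if not sub_block_flag:
        return [(x, y, w, h)]
    return [(x + dx, y + dy, min(sub, w - dx), min(sub, h - dy))
            for dy in range(0, h, sub) for dx in range(0, w, sub)]

def assign_motion(x: int, y: int, w: int, h: int, sub_block_flag: bool,
                  build_candidates: Callable[[Rect], Sequence[T]],
                  read_index: Callable[[], int]) -> List[Tuple[Rect, T]]:
    result = []
    for unit in prediction_units(x, y, w, h, sub_block_flag):
        cands = build_candidates(unit)               # e.g. inter-view candidate derived at this unit
        result.append((unit, cands[read_index()]))   # one parsed index per unit (assumption)
    return result
```

Because the candidate set is rebuilt per unit, each sub-block can locate its own corresponding region in the reference view rather than sharing one disparity for the whole block.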
7. A method for decoding a multi-view signal transmitted via a data stream, comprising: obtaining a depth map of a first view; processing a flag that signals whether first motion data associated with the first view is derived using second motion data associated with a second view; responsive to the flag signaling that the first motion data is to be derived using the second motion data associated with the second view: estimating, based on the depth map, a disparity with respect to the first view, identifying, for a first picture coding block in the first view, a second picture coding block in the second view based on the disparity, obtaining the second motion data associated with the second picture coding block in the second view, predicting the first motion data associated with the first picture coding block in the first view based on the second motion data and the disparity, the predicting including deriving a first reference picture index associated with the first motion data by modifying a second reference picture index associated with the second motion data such that a first picture order count of a first reference picture is equal to a second picture order count of a second reference picture, adding the first motion data as a candidate in a set of motion data candidates for the first picture coding block in the first view, extracting, from the data stream, index information specifying a motion data candidate of the set of motion data candidates for the first picture coding block in the first view, and reconstructing the first picture coding block of the first view by prediction based on the motion data candidate specified by the index information.
This invention relates to decoding multi-view video signals, specifically to predicting motion data across views. The problem addressed is the redundancy of coding the motion data of each view independently, even though the views show the same scene and their motion data is strongly correlated. The method obtains a depth map of a first view and processes a flag that indicates whether motion data for the first view is to be derived from motion data of a second view. If the flag signals this derivation, the method estimates a disparity with respect to the first view from the depth map. For a coding block in the first view, the corresponding block in the second view is identified based on the disparity, and that block's motion data is used to predict the motion data of the first view's block. Part of this prediction is modifying the reference picture index so that the reference picture used in the first view has the same picture order count as the one used in the second view. This keeps the inherited motion vector pointing to the same time instant. The predicted motion data is added to the set of motion data candidates for the first view's block. Index information from the data stream then specifies which candidate to use, and the block is reconstructed by prediction based on the selected candidate. Reusing motion data across views in this way reduces the bits spent on motion information while keeping the temporal reference consistent.
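The role of the flag in claim 7 can be illustrated by a flag-gated candidate list. The inter-view candidate is only added when the flag signals derivation from the second view. Three details are assumptions rather than claim content: placing the inter-view candidate first, pruning duplicates, and a maximum list size of 5. `build_candidate_list` and `select` are hypothetical names.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def build_candidate_list(spatial: Sequence[T], inter_view: Optional[T],
                         inter_view_flag: bool, max_cands: int = 5) -> List[T]:
    cands: List[T] = []
    # Assumption: when enabled and available, the inter-view candidate goes first.
    if inter_view_flag and inter_view is not None:
        cands.append(inter_view)
    for c in spatial:
        if c not in cands:  # prune exact duplicates
            cands.append(c)
    return cands[:max_cands]

def select(cands: Sequence[T], index: int) -> T:
    # The index information from the data stream addresses the constructed list.
    if not 0 <= index < len(cands):
        raise ValueError("index information out of range")
    return cands[index]
```

The same index selects different motion data depending on the flag. For this reason, the encoder and decoder must construct identical lists.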
8. The method of claim 7, further comprising: extracting motion data residual from the data stream; generating refined motion data for the first picture coding block based on the motion data candidate and the motion data residual; and reconstructing the first picture coding block of the first view by prediction based on the refined motion data.
This invention relates to video coding, specifically to motion prediction in multi-view video coding systems. The problem addressed is that transmitting complete motion data for every picture coding block is costly, while using a predicted candidate unchanged may not match the block's actual motion. Building on the method of claim 7, a motion data candidate for a first picture coding block in the first view is selected from the set of motion data candidates. This set can include the candidate derived from the second view as well as others, for example from neighboring blocks. A motion data residual is then extracted from the data stream; it represents the difference between the actual motion data and the selected candidate. Refined motion data is generated by combining the candidate with the residual, and the first picture coding block is reconstructed by prediction based on this refined motion data. Only the residual has to be transmitted, and it is small when the candidate is a good predictor. This reduces the motion data bit rate while keeping motion compensation accurate.
9. The method of claim 7, wherein the step of obtaining the depth map of the first view comprises warping another depth map associated with the second view into the depth map of the first view.
This invention relates to depth map processing in multi-view video decoding. It addresses the problem of providing depth information for a first view when only a depth map associated with a second view is available. The method obtains the depth map of the first view by warping the depth map associated with the second view into the first view. The warping accounts for the geometric relationship between the two views, typically expressed through disparity, so that the depth samples are aligned with the first view. The warping can create holes and overlaps caused by disocclusions and occlusions, and these may need to be handled, although the claim does not prescribe how. Reusing existing depth data in this way avoids transmitting or separately estimating a depth map for the first view before it is needed for inter-view prediction.
10. The method of claim 9, wherein the step of warping comprises: obtaining a second disparity associated with a second picture of the second view; and applying the second disparity to a reference depth map of the second view to derive the depth map of the first view.
This invention relates to generating depth maps in multi-view video coding by warping a depth map from a second view to a first view using disparity information. First, a second disparity is obtained for a second picture of the second view. This disparity represents the horizontal shift between corresponding points in the first and second views. The second disparity is then applied to a reference depth map of the second view to derive the depth map of the first view, so that depth information is transferred consistently between the views. The method reuses an existing depth map together with disparity data, so it does not require a separate depth estimation algorithm for the first view. This keeps the processing simple enough for real-time use. The approach is useful wherever multiple cameras capture overlapping views of a scene, as in 3D video and immersive media.
11. The method of claim 7, wherein the step of predicting the first motion data comprises: identifying additional second picture coding blocks in the second view; obtaining additional second motion data associated with the additional second picture coding blocks; and estimating the first motion data for the first picture coding block in the first view based on both the second motion data and the additional second motion data.
This invention relates to video coding techniques, specifically improving motion prediction in multi-view video coding systems. The problem addressed is the challenge of accurately predicting motion data for a coding block in one view (the first view) by leveraging motion information from another view (the second view). Traditional methods may rely on limited motion data from a single block in the second view, leading to inaccuracies. The invention enhances motion prediction by using motion data from multiple blocks in the second view. First, additional coding blocks in the second view are identified. Then, motion data associated with these additional blocks is obtained. Finally, the motion data for the coding block in the first view is estimated by combining the motion data from the original block and the additional blocks in the second view. This approach improves prediction accuracy by incorporating a broader set of motion information, reducing errors and enhancing compression efficiency in multi-view video coding. The method is particularly useful in applications requiring high-quality video reconstruction with limited bandwidth, such as 3D video streaming or virtual reality.
12. The method of claim 7, further comprising extracting from the data stream, using the processor, a sub-block syntax element representing a sub-block flag that indicates whether the picture coding block in the first view is to be decoded in units of sub-blocks of the picture coding block, wherein the index information specifies the motion data candidate of the set of motion data candidates for a one of the sub-blocks of the picture coding block in the first view, and the one of the sub-blocks of the picture coding block is reconstructed by prediction based on the motion data candidate specified by the index information for the one of the sub-blocks.
This invention relates to video coding techniques, specifically to multi-view video coding, where multiple views of a scene are encoded and decoded to enable 3D or immersive video applications. The problem addressed is efficient motion data prediction and reconstruction when picture coding blocks are decoded in sub-block units. The method processes a data stream representing a multi-view video sequence in which a first view is decoded using motion data derived from a second view. A sub-block syntax element is extracted from the data stream, indicating whether a picture coding block in the first view is to be decoded in smaller sub-block units. If sub-block decoding is enabled, index information selects a motion data candidate from the set of candidates for a specific sub-block within the picture coding block. That sub-block is then reconstructed by prediction based on the selected candidate. This allows finer granularity in motion compensation and improves compression efficiency, while inter-view dependencies are leveraged to reduce redundancy.
13. An encoder for encoding a multi-view signal into a data stream, comprising: a depth estimator configured for obtaining, using a processor, a depth map of a first view; and a dependent view encoder configured for responsive to a flag signaling that first motion data is to be derived using second motion data associated with a second view: estimating, using the processor and based on the depth map, a disparity with respect to the first view, identifying, using the processor, for a first picture coding block in the first view, a second picture coding block in the second view based on the disparity, obtaining, using the processor, the second motion data associated with the second picture coding block in the second view, predicting, using the processor, the first motion data associated with the first picture coding block in the first view based on the second motion data and the disparity, the predicting including deriving a first reference picture index associated with the first motion data by modifying a second reference picture index associated with the second motion data such that a first picture order count of a first reference picture is equal to a second picture order count of a second reference picture, adding, using the processor, the first motion data as a candidate in a set of motion data candidates for the first picture coding block in the first view, and inserting, using the processor into the data stream, the flag and index information specifying a motion data candidate of the set of motion data candidates for the first picture coding block in the first view, wherein the first picture coding block of the first view is reconstructed using prediction based on the motion data candidate specified by the index information.
This invention relates to multi-view video encoding, specifically to predicting motion data across camera views. The problem addressed is that coding the motion data of each view independently transmits redundant motion information and increases the bit rate. The encoder includes a depth estimator that provides a depth map for a first view, representing the 3D structure of the scene. A dependent view encoder uses motion data of a second view to predict motion data for the first view. When the flag signals this derivation, the encoder estimates the disparity (the difference in position of corresponding points between views) from the depth map. For a coding block in the first view, it identifies the corresponding block in the second view based on this disparity and uses that block's motion data to predict the motion data of the first view's block. Part of this prediction is modifying the reference picture index so that the reference pictures in both views have the same picture order count, which keeps the prediction temporally consistent. The predicted motion data is added to a candidate set for the first view's block. The encoder then inserts the flag and an index into the data stream to signal which candidate was selected. The first view's block is reconstructed using the chosen candidate, in the same way the decoder will reconstruct it. This reuse of motion information across views reduces redundancy and improves compression efficiency.
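The claims do not state how the encoder chooses which candidate to signal. A simple hypothetical rule picks the candidate whose residual would be cheapest. Two assumptions are involved: the residual cost is approximated by the L1 norm of the difference, and each index step costs `index_weight` extra units, roughly as with a truncated unary index code. A real encoder would use rate-distortion optimization with actual bit counts.

```python
from typing import List, Tuple

Vec = Tuple[int, int]

def mvd_cost(pred: Vec, actual: Vec) -> int:
    # Crude proxy for the bits of the motion residual: L1 norm of the difference.
    return abs(actual[0] - pred[0]) + abs(actual[1] - pred[1])

def choose_candidate(cands: List[Vec], actual: Vec, index_weight: int = 1) -> int:
    # Larger indices are assumed to cost more bits (as with a truncated unary code),
    # so the index itself is added to the cost; ties go to the smaller index.
    return min(range(len(cands)),
               key=lambda i: (mvd_cost(cands[i], actual) + index_weight * i, i))
```

With candidates (0, 0), (4, 1) and (5, 1) and an actual vector of (5, 1), candidate 1 wins under the default weight. It needs a one-sample residual but a cheaper index than the exact match at index 2.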
14. The encoder of claim 13, wherein the dependent view encoder is further configured for: determining motion data residual based on a difference between the motion data candidate and the first motion data associated with the picture coding block in the first view; and inserting the motion data residual, without the motion data candidate, into the data stream.
This invention relates to video encoding, specifically to inter-view motion prediction in multi-view video coding. The problem addressed is the cost of transmitting complete motion data for blocks in dependent views. The dependent view encoder determines a motion data residual as the difference between the selected motion data candidate, which may be derived from the reference view, and the actual first motion data of the picture coding block in the dependent view. Only this residual is inserted into the data stream. The candidate itself is not transmitted, because the decoder can rebuild the same candidate set and identify the candidate from the index information. The residual is small when the candidate is a good predictor, so this reduces the bit rate without sacrificing prediction accuracy. This is valuable in bandwidth-sensitive multi-view applications such as 3D video streaming.
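Claim 14 specifies only that the residual, and not the candidate, is inserted into the data stream. To make this concrete, the sketch binarizes the residual with H.264-style signed Exp-Golomb codes (se(v)) and shows the decoder-side inverse. This binarization is chosen purely for illustration and is an assumption; HEVC, for instance, codes motion vector differences differently, with CABAC-coded flags and remainders. All function names are hypothetical.

```python
from typing import Tuple

Vec = Tuple[int, int]

def ue(n: int) -> str:
    """Unsigned Exp-Golomb code ue(v) for n >= 0."""
    b = bin(n + 1)[2:]
    return "0" * (len(b) - 1) + b

def se(k: int) -> str:
    """Signed Exp-Golomb se(v): 0, 1, -1, 2, -2, ... map to code numbers 0, 1, 2, 3, 4, ..."""
    return ue(2 * k - 1 if k > 0 else -2 * k)

def write_mvd(candidate: Vec, actual: Vec) -> str:
    # Encoder: only the residual is written; the candidate is identified by its index elsewhere.
    return se(actual[0] - candidate[0]) + se(actual[1] - candidate[1])

def read_ue(bits: str, pos: int) -> Tuple[int, int]:
    zeros = 0
    while bits[pos + zeros] == "0":  # count leading zeros
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end

def read_se(bits: str, pos: int) -> Tuple[int, int]:
    n, pos = read_ue(bits, pos)
    return ((n + 1) // 2 if n % 2 else -(n // 2)), pos

def read_mv(bits: str, candidate: Vec) -> Vec:
    # Decoder: parse both residual components and add them to the same candidate.
    dx, pos = read_se(bits, 0)
    dy, _ = read_se(bits, pos)
    return (candidate[0] + dx, candidate[1] + dy)
```

For the candidate (4, 1) and actual vector (5, -2), the residual is (1, -3), which is written as "010" followed by "00111". When the residual is zero, only two bits are needed.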
15. The encoder of claim 13, wherein the depth estimator is configured for obtaining the depth map of the first view by warping another depth map associated with the second view into the depth map of the first view.
This invention relates to video encoding systems that use depth estimation for efficient multi-view compression. The problem addressed is providing depth information for a first view so that redundancy between views can be exploited. The encoder includes a depth estimator that generates the depth map for the first view by warping a depth map associated with a second view, i.e., aligning the second view's depth map with the first view's perspective. The warped depth map supplies the disparities used for depth-based inter-view prediction, such as predicting the first view's motion data from that of the second view. The depth information is derived from data already available at both encoder and decoder, so no additional data needs to be transmitted for it. The invention is particularly useful in 3D video and virtual reality applications, where multiple camera feeds must be encoded efficiently.
16. The encoder of claim 15, wherein the depth estimator is configured for warping by: obtaining a second disparity associated with a second picture of the second view; and applying the second disparity to a reference depth map of the second view to derive the depth map of the first view.
This invention relates to video encoding, specifically improving depth estimation for multi-view video coding. The problem addressed is the computational complexity and accuracy challenges in generating depth maps for different camera views, which are essential for efficient multi-view video compression. The encoder includes a depth estimator that warps a reference depth map from a second view to a first view. The warping process involves obtaining a second disparity associated with a second picture of the second view and applying this disparity to the reference depth map of the second view. This generates a depth map for the first view, enabling accurate depth-based prediction and compression. The disparity represents the horizontal shift between corresponding points in the two views, allowing the depth estimator to project depth information from one view to another. This technique reduces the need for independent depth estimation in each view, improving encoding efficiency and reducing computational overhead. The method leverages existing disparity information to derive depth maps, enhancing accuracy while minimizing additional processing. This approach is particularly useful in multi-view video applications such as 3D video, virtual reality, and immersive media, where efficient depth representation is critical for high-quality compression.
17. The encoder of claim 13, wherein the step of predicting the first motion data includes: identifying additional second picture coding blocks in the second view; obtaining additional second motion data associated with the additional second picture coding blocks; and estimating the first motion data for the first picture coding block in the first view based on both the second motion data and the additional second motion data.
This invention relates to video encoding, specifically motion prediction in multi-view video coding. The problem addressed is that predicting motion data for a coding block in one view (the first view) solely from the motion data of a single corresponding block in another view (the second view) can be inaccurate. The solution additionally draws on motion data from further blocks of the second view. The encoder identifies a first picture coding block in the first view and a corresponding second picture coding block in the second view, and obtains the second motion data of that block. It also identifies additional second picture coding blocks in the second view, for example blocks neighboring the corresponding block, and obtains their motion data. The first motion data for the first picture coding block is then estimated from both the second motion data and the additional second motion data. Using several blocks exploits spatial correlation in the second view and refines the motion prediction, which reduces redundancy and improves compression efficiency in multi-view encoding. The solution is particularly useful in applications requiring high-quality multi-view video, such as 3D video streaming or virtual reality.
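The claim does not say how the motion data of the several second-view blocks are combined. As one illustrative assumption, the sketch below takes a majority vote on the reference index and then a component-wise median of the motion vectors that use that index; estimate_first_motion_data is a hypothetical name.

```python
from collections import Counter
from statistics import median_low
from typing import List, Tuple

MotionVector = Tuple[int, int]


def estimate_first_motion_data(
        candidates: List[Tuple[MotionVector, int]]) -> Tuple[MotionVector, int]:
    """Combine (motion vector, reference index) pairs from several second-view blocks."""
    if not candidates:
        raise ValueError("at least one second-view block is required")
    # Majority vote on the reference index; ties go to the first one encountered.
    ref_idx = Counter(r for _, r in candidates).most_common(1)[0][0]
    # Component-wise median over vectors that point to the chosen reference;
    # median_low always returns one of the existing integer components.
    mvs = [mv for mv, r in candidates if r == ref_idx]
    return (median_low([mv[0] for mv in mvs]), median_low([mv[1] for mv in mvs])), ref_idx
```

With candidates [((4, 0), 0), ((6, 1), 0), ((5, -1), 0), ((40, 9), 1)] the function returns ((5, 0), 0): the block with a different reference index is ignored, and the median suppresses isolated outliers among the remaining vectors.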
18. The encoder of claim 13, wherein the dependent view encoder is configured for inserting into the data stream, using the processor, a sub-block syntax element representing a sub-block flag that indicates whether the picture coding block in the first view is to be coded in units of sub-blocks of the picture coding block, wherein the index information specifies the motion data candidate of the set of motion data candidates for a one of the sub-blocks of the picture coding block in the first view, and the one of the sub-blocks of the picture coding block is reconstructed by prediction based on the motion data candidate specified by the index information for the one of the sub-blocks.
This invention relates to video encoding, specifically improving multi-view video coding efficiency through sub-block-level motion prediction. The problem addressed is that block-based motion prediction for dependent views may not fully exploit inter-view correlations, leaving redundant data and raising the bitrate. The encoder includes a dependent view encoder that codes a first view of a multi-view video sequence. It inserts into the data stream a sub-block syntax element representing a sub-block flag, which indicates whether a picture coding block in the first view is coded in units of smaller sub-blocks. When sub-block coding is used, index information specifies, for a given sub-block, a motion data candidate from the set of motion data candidates, and that sub-block is reconstructed by prediction based on the specified candidate. This finer granularity of motion compensation helps in regions with complex motion or view-dependent disparities. Because the flag is signaled per picture coding block, the encoder can choose between whole-block and sub-block coding according to content characteristics, balancing compression efficiency against reconstruction quality.
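A minimal sketch of how the sub-block flag changes which units receive a candidate index. Block, split_into_sub_blocks, resolve_motion, the fixed square sub-block size, and the raster order of sub-blocks are hypothetical choices; the claim only requires that, when the flag is set, the index information selects a motion data candidate for a sub-block.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

MotionVector = Tuple[int, int]


@dataclass(frozen=True)
class Block:
    x: int
    y: int
    size: int


def split_into_sub_blocks(block: Block, sub_size: int) -> List[Block]:
    """Split a square block into square sub-blocks in raster order."""
    return [Block(block.x + dx, block.y + dy, sub_size)
            for dy in range(0, block.size, sub_size)
            for dx in range(0, block.size, sub_size)]


def resolve_motion(block: Block, sub_block_flag: bool, sub_size: int,
                   build_candidates: Callable[[Block], Sequence[MotionVector]],
                   indices: Sequence[int]) -> Dict[Block, MotionVector]:
    """Map each coded unit to the motion data candidate its index selects."""
    units = split_into_sub_blocks(block, sub_size) if sub_block_flag else [block]
    if len(indices) != len(units):
        raise ValueError("expected one candidate index per coded unit")
    return {unit: build_candidates(unit)[i] for unit, i in zip(units, indices)}
```

With the flag cleared, the block is a single unit and one index covers the whole picture coding block; with the flag set, a 16x16 block and 8x8 sub-blocks need four indices, one per sub-block.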
May 11, 2020
February 1, 2022