US-11979596

Joint coding for adaptive motion vector difference resolution

Published: May 7, 2024
Assignee: not available in the USPTO data we have
Inventors: not available in the USPTO data we have
Technical Abstract

This disclosure relates generally to video coding and particularly to methods and systems for providing signaling schemes for jointly coding of motion vector difference with adaptive resolution in compound-reference inter-prediction. An example method for processing a current video block of a video stream is disclosed. The method includes receiving the video stream; determining from the video stream whether joint motion vector difference (MVD) coding is applied to the current video block; determining from the video stream whether adaptive MVD pixel resolution is applied to the current video block; and decoding the current video block based on whether joint MVD coding and whether adaptive MVD pixel resolution are applied to the current video block.

Patent Claims
9 claims

Legal claims defining the scope of protection. Each claim is shown in both the original legal language and a plain English translation.

Claim 6

Original Legal Text

6. The method of claim 1, wherein the at least one syntax element comprises a first flag and a second flag, the first flag indicating whether the motion vector differences of the current video block are jointly signaled or decoded and the second flag indicating whether adaptive MVD pixel resolution is applied to the current video block.

Plain English Translation

This invention relates to video coding techniques, specifically improving motion vector difference (MVD) signaling and resolution in video compression. The problem addressed is inefficient signaling and resolution adaptation of motion vectors, which can lead to redundant data transmission and suboptimal compression efficiency. The method involves using two flags to control MVD processing for a video block. The first flag determines whether the MVDs for the current block are jointly signaled or decoded, allowing for more efficient transmission when MVDs share common characteristics. The second flag indicates whether adaptive MVD pixel resolution is applied, enabling dynamic adjustment of MVD precision based on block characteristics. This adaptive resolution can reduce bitrate by using lower precision when high accuracy is unnecessary, while maintaining quality when needed. The method enhances compression efficiency by reducing redundant signaling and optimizing MVD resolution, leading to better rate-distortion performance in video encoding and decoding. The flags provide explicit control over MVD processing, allowing encoders to adaptively balance computational complexity and coding efficiency. This approach is particularly useful in modern video codecs where motion compensation plays a critical role in achieving high compression ratios.
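As a rough illustration of how a decoder might act on the two flags, the Python sketch below reconstructs the MVDs of a block that uses two reference frames. The function name, the mirroring rule for jointly coded MVDs, and the `coarse_shift` scaling for adaptive resolution are assumptions made for illustration; the claim specifies only what each flag indicates, not how the MVDs are derived.

```python
from typing import List, Tuple

Mvd = Tuple[int, int]  # (horizontal, vertical) motion vector difference


def reconstruct_mvds(coded: List[Mvd], joint_mvd: bool, adaptive_res: bool,
                     coarse_shift: int = 2) -> List[Mvd]:
    """Turn parsed MVD values into one MVD per reference frame.

    coded        -- MVDs as parsed: one entry if joint_mvd, two otherwise
    joint_mvd    -- first flag: MVDs are jointly signaled
    adaptive_res -- second flag: adaptive MVD pixel resolution is applied
    coarse_shift -- hypothetical scale factor (as a shift) for coarse MVDs
    """
    # Assumption: with adaptive resolution, coded values are in coarser
    # units and are scaled back up by a fixed power of two.
    shift = coarse_shift if adaptive_res else 0
    scaled = [(dx << shift, dy << shift) for dx, dy in coded]

    if joint_mvd:
        # Assumption: the second MVD is the mirror image of the first.
        dx, dy = scaled[0]
        return [(dx, dy), (-dx, -dy)]
    return scaled
```

For example, `reconstruct_mvds([(1, -2)], joint_mvd=True, adaptive_res=True)` returns `[(4, -8), (-4, 8)]` under these assumptions, while a block with both flags off passes its two coded MVDs through unchanged.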

Claim 7

Original Legal Text

7. The method of claim 6, wherein the first flag is signaled before the second flag in the video stream.

Plain English Translation

This claim narrows the two-flag signaling of claim 6 by fixing the order of the flags in the video stream. The first flag, which indicates whether the motion vector differences (MVDs) of the current video block are jointly signaled or decoded, appears before the second flag, which indicates whether adaptive MVD pixel resolution is applied to the block. A decoder therefore parses the joint-MVD flag first and already knows its value when it reaches the adaptive-resolution flag. The claim states only this ordering; it does not say that the presence or coding of the second flag depends on the value of the first.
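The ordering in claim 7 can be sketched as a two-step parse. This is a minimal Python sketch that treats each flag as one raw bit; a real decoder would entropy-decode the flags, and the function and key names here are hypothetical.

```python
from typing import Dict, Iterator


def parse_flags_joint_first(bits: Iterator[int]) -> Dict[str, bool]:
    """Read the first flag (joint MVD), then the second (adaptive resolution)."""
    # First flag in stream order: are the block's MVDs jointly signaled?
    joint_mvd = bool(next(bits))
    # Second flag in stream order: is adaptive MVD pixel resolution applied?
    adaptive_res = bool(next(bits))
    return {"joint_mvd": joint_mvd, "adaptive_mvd_resolution": adaptive_res}
```

Calling `parse_flags_joint_first(iter([1, 0]))` yields a block with joint MVD coding on and adaptive resolution off.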

Claim 8

Original Legal Text

8. The method of claim 7, wherein a single context is used for signaling the second flag in the video stream, wherein the second flag comprises one of an amvd_flag, a jmvd_flag, or a joint amvd_flag.

Plain English Translation

This claim adds two limitations to claim 7. First, a single context is used for signaling the second flag. In context-adaptive entropy coding, a context is the probability model used to code a syntax element, so a single context means every occurrence of the flag is coded with the same model rather than with one selected from several. Second, the second flag comprises one of an amvd_flag, a jmvd_flag, or a joint amvd_flag. The claim does not define these names; going by the abbreviations used elsewhere in the patent, they presumably denote an adaptive MVD flag, a joint MVD flag, and a joint adaptive MVD flag. Using one context keeps the entropy coder simple because no context selection is needed, at the cost of not exploiting block-dependent statistics.
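To make the idea of a single context concrete, the Python sketch below models one adaptive probability estimate shared by every occurrence of a flag and reports the ideal coding cost in bits. The class, the exponential update rule, and the `rate` parameter are illustrative assumptions, not the entropy coder described in the patent.

```python
import math
from typing import Iterable


class BinaryContext:
    """One adaptive probability model for a binary flag (hypothetical)."""

    def __init__(self, p_one: float = 0.5, rate: float = 0.05) -> None:
        self.p_one = p_one  # current estimate of P(flag == 1)
        self.rate = rate    # adaptation speed (assumed value)

    def cost_bits(self, bit: int) -> float:
        """Ideal code length of `bit` under the current estimate."""
        p = self.p_one if bit else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bit: int) -> None:
        """Move the estimate toward the value just coded."""
        self.p_one += self.rate * (bit - self.p_one)


def total_cost(flags: Iterable[int], ctx: BinaryContext) -> float:
    """Code every flag with the same context and sum the ideal cost."""
    total = 0.0
    for bit in flags:
        total += ctx.cost_bits(bit)
        ctx.update(bit)
    return total
```

When the flag is mostly 0 across blocks, the shared estimate drifts toward 0 and the average cost per flag falls below one bit, which is the benefit a context model provides over sending raw bits.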

Claim 9

Original Legal Text

9. The method of claim 8, wherein a context for signaling the second flag depends on coded information of the current video block and/or neighboring video block of the current video block.

Plain English Translation

This claim specifies that a context for signaling the second flag depends on coded information of the current video block and/or a neighboring video block of the current video block. In context-adaptive entropy coding, the context is the probability model used to code the flag. Choosing it from already-coded information lets the coder exploit statistical dependencies between blocks: if neighboring blocks use a given mode, the current block is more likely to use it as well, and a better-matched probability model spends fewer bits on the flag. The flag is still signaled; only the model used to code it changes. Because the information is already coded, the encoder and decoder can derive the same context without any extra signaling. Claim 10 lists examples of the coded information.

Claim 10

Original Legal Text

10. The method of claim 9, wherein the coded information comprises at least one of value of the second flag, a reference frame index, or an MVD candidate index of a neighboring video block of the current video block.

Plain English Translation

This claim lists the coded information on which the context for the second flag (claim 9) may depend. It comprises at least one of the following, taken from a neighboring video block of the current video block: the value of the second flag, a reference frame index, or an MVD candidate index. The reference frame index identifies which reference frame a block predicts from. These values are already known to the decoder when it parses the current block's flag, so encoder and decoder can select the same context without extra signaling. For example, if a neighboring block was coded with the second flag set, the current block's flag can be coded with a probability model that favors the set value. The claim does not specify which neighbors are used or how the values map to contexts.
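One common way to derive a context from neighbors is to count how many of them have the flag set. The Python sketch below follows that pattern; the choice of the above and left neighbors and the three-context layout are assumptions for illustration, since the claim names the kinds of coded information but not the mapping.

```python
from typing import Optional


def select_context(above_flag: Optional[bool], left_flag: Optional[bool]) -> int:
    """Pick a context index (0, 1, or 2) for the current block's second flag.

    above_flag / left_flag -- value of the second flag in the neighboring
    block, or None when that neighbor is unavailable (for example at a
    frame edge). An unavailable neighbor is treated as not set.
    """
    return int(bool(above_flag)) + int(bool(left_flag))
```

A block whose two neighbors both have the flag set would use context 2, and a block with no available neighbors would use context 0, so each context can learn a different probability for the flag.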

Claim 11

Original Legal Text

11. The method of claim 6, wherein the second flag is signaled before the first flag in the video stream.

Plain English Translation

This claim is the alternative to claim 7 and reverses the order of the two flags defined in claim 6. The second flag, which indicates whether adaptive MVD pixel resolution is applied to the current video block, is signaled before the first flag, which indicates whether the block's motion vector differences (MVDs) are jointly signaled or decoded. A decoder therefore parses the adaptive-resolution flag first and already knows its value when it reaches the joint-MVD flag. The claim states only this ordering; it does not say that the presence or coding of the first flag depends on the value of the second.

Claim 14

Original Legal Text

14. The method of claim 13, wherein the set of predefined conditions comprise a video block size constraint.

Plain English Translation

A method for video encoding or decoding involves processing video data by applying a set of predefined conditions to determine whether to use a specific coding mode or tool. One such predefined condition is a video block size constraint, which restricts the application of the coding mode or tool based on the size of the video block being processed. The method ensures that the coding mode or tool is only applied when the block size meets certain criteria, such as being larger or smaller than a specified threshold. This constraint helps optimize encoding efficiency, reduce computational complexity, or improve video quality by preventing the use of the coding mode or tool in block sizes where it may be ineffective or detrimental. The method may be part of a broader video coding system that includes additional steps for encoding or decoding video data, such as transforming, quantizing, or entropy coding the data. The video block size constraint ensures that the coding process adapts dynamically to different block sizes, enhancing overall performance.

Claim 16

Original Legal Text

16. The method of claim 15, wherein the position-dependent compound prediction comprises a compound wedge-based prediction.

Plain English Translation

This claim concerns compound inter-prediction in video coding, in which the prediction for a block is formed by combining two reference predictions. Claim 15, from which this claim depends, is not reproduced on this page; as referenced here, it involves a position-dependent compound prediction, meaning the weights used to combine the two predictions vary with sample position inside the block. Claim 16 specifies that this position-dependent compound prediction comprises a compound wedge-based prediction. In wedge-based compound prediction, as used in codecs such as AV1, the block is divided by an oblique (wedge) boundary: samples on one side draw mainly from the first prediction, samples on the other side draw mainly from the second, and the weights are blended near the boundary. This suits blocks that straddle the edge of a moving object, where the two sides follow different motion.
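In video coding terms, a wedge-based compound prediction blends two predictions of the same block using weights that change across a boundary line. The Python sketch below builds such a mask from a line through the block center and blends two predictions with it. The mask formula, the `softness` parameter, and the floating-point weights are illustrative assumptions; real codecs typically use fixed integer mask tables.

```python
from typing import List

Block = List[List[float]]


def wedge_mask(width: int, height: int, nx: float, ny: float,
               softness: float = 1.0) -> Block:
    """Weights in [0, 1] for the first prediction.

    The boundary is a line through the block center with normal (nx, ny).
    Weights ramp linearly from 0 to 1 across the boundary; `softness`
    sets the half-width of the ramp in samples (assumed parameter).
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Signed distance of the sample from the boundary line.
            d = (x - cx) * nx + (y - cy) * ny
            row.append(min(1.0, max(0.0, 0.5 + d / (2.0 * softness))))
        mask.append(row)
    return mask


def blend(pred0: Block, pred1: Block, mask: Block) -> Block:
    """Per-sample weighted sum: mask * pred0 + (1 - mask) * pred1."""
    return [[m * a + (1.0 - m) * b for a, b, m in zip(r0, r1, rm)]
            for r0, r1, rm in zip(pred0, pred1, mask)]
```

With a vertical boundary (`nx=1, ny=0`) on a 4x2 block, the left column takes its samples entirely from the second prediction, the right column entirely from the first, and the two middle columns are mixed.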

Claim 18

Original Legal Text

18. The method of claim 1, wherein a frame or sequence level syntax element overrides lower level signaling in determining whether joint MVD coding or whether adaptive MVD pixel resolution is applied to the current video block.

Plain English Translation

This invention relates to video encoding and decoding, specifically improving motion vector difference (MVD) coding efficiency in video compression. The problem addressed is the need for flexible control over MVD coding strategies to optimize compression performance across different video content types and encoding conditions. The method involves using a syntax element at the frame or sequence level to override lower-level signaling (such as block-level or slice-level signaling) when determining whether to apply joint MVD coding or adaptive MVD pixel resolution to a current video block. Joint MVD coding combines multiple motion vector differences into a single coded unit, while adaptive MVD pixel resolution adjusts the precision of MVD values based on content characteristics. The frame or sequence-level syntax element provides a higher-level control mechanism, allowing the encoder to enforce consistent MVD coding behavior across multiple blocks or frames, which can improve compression efficiency and reduce signaling overhead. This approach is particularly useful in scenarios where lower-level decisions may not align with optimal encoding strategies for the entire frame or sequence. The method ensures that the chosen MVD coding technique is applied uniformly, reducing complexity and improving encoding/decoding efficiency.
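The override described in claim 18 can be pictured as a simple precedence rule. In the Python sketch below, a frame or sequence level value of `None` means the higher level is silent and the block-level flag decides; any other value wins over the block. The tri-state representation and the function name are assumptions for illustration, as the claim does not say how the higher-level syntax element is encoded.

```python
from typing import Optional


def effective_flag(high_level: Optional[bool], block_level: bool) -> bool:
    """Resolve whether a tool (joint MVD coding or adaptive MVD
    resolution) applies to the current block.

    high_level  -- frame or sequence level setting; None means "defer"
    block_level -- value signaled (or inferred) for the current block
    """
    if high_level is None:
        return block_level
    # The higher-level syntax element overrides lower-level signaling.
    return high_level
```

Under this rule, a sequence-level `False` disables the tool for every block regardless of the block flags, which also means an encoder could skip sending those block flags altogether.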

Patent Metadata

Filing Date

July 20, 2022

Publication Date

May 7, 2024

Citation & reuse

Analysis on this page is generated by Patentable — an AI-powered patent intelligence platform. AI-generated summaries, explanations, FAQs, and analysis may be reused with attribution and a visible link back to the canonical URL below. Patent abstracts and claims are USPTO public domain.

Cite as: Patentable. “Joint coding for adaptive motion vector difference resolution” (US-11979596). https://patentable.app/patents/US-11979596

© 2026 Nomic Interactive Technology LLC. Machine-readable context available at /api/llm-context/US-11979596. See llms.txt for full attribution policy.