US-12568223-B2

Restrictions on decoder side motion vector derivation based on coding information

Published: March 3, 2026

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Devices, systems and methods for digital video coding, which include restrictions on decoder-side motion vector derivation based on coding information, are described. An exemplary method for video processing includes making a decision, for a conversion between a current block of a video and a bitstream representation of the video, regarding a selective enablement of a decoder side motion vector derivation (DMVD) tool for the current block, the DMVD tool deriving a refinement of motion information signaled in the bitstream representation, and the conversion using a merge mode and motion vector differences that are indicated by a motion direction and a motion magnitude, and performing, based on the decision, a conversion between the current block and the bitstream representation.

Patent Claims

Legal claims defining the scope of protection, as filed with the USPTO.

. A method for video processing, comprising:

. The method of, wherein the coding mode comprises a merge mode with a motion vector difference.

. The method of, wherein the merge mode with the motion vector difference indicates a motion vector difference and a starting point of motion information.

. The method of, wherein the motion vector difference is indicated by a motion direction and a motion magnitude, and the starting point of motion information is indicated by a merge indication, and wherein a final motion information of the current block is based on the motion vector difference and the starting point.

. The method of, wherein the DMVD tool is disabled when a condition is satisfied, and wherein the condition is at least one of:

. The method of, wherein the DMVD tool further comprises a frame-rate up conversion algorithm.

. The method of, wherein the merge mode with the motion vector difference indicates that the motion vector difference comprises a non-zero motion vector difference (MVD) component.

. The method of, wherein whether the DMVD tool for the current block is enabled is further based on whether the GBI mode is enabled.

. The method of, wherein the DMVD tool is disabled when the current block is coded using the GBI mode.

. The method of, wherein whether the DMVD tool is enabled is further based on motion information of the current block.

. The method of, wherein the DMVD tool is enabled upon a determination that (1) an absolute value of a motion vector, or (2) an absolute value of a motion vector difference, or (3) an absolute value of a motion vector of the current block is (a) less than a threshold TH1, wherein TH1≥0, or (b) greater than a threshold TH2, wherein TH2≥0.

. The method of, wherein the DMVD tool is enabled upon a determination that an absolute value of a decoded motion vector of the current block in a horizontal or vertical prediction direction is (a) less than a threshold THh1 or THv1, respectively, wherein THh1 ≥0 and THv1≥0, or (b) greater than a threshold THh2 or THv2, respectively, wherein THh2≥0 and THv2≥0.

. The method of, wherein the DMVD tool is enabled upon a determination that a function of an absolute value of a decoded motion vector (MV) of the current block in a horizontal prediction direction (MV_x) and a vertical prediction direction (MV_y) is (a) less than a threshold TH1, wherein TH1≥0, or (b) greater than a threshold TH2, wherein TH2≥0,

. The method of, wherein the conversion comprises decoding the current block from the bitstream.

. The method of, wherein the conversion comprises encoding the current block into the bitstream.

. The method of, wherein the DMVD tool is disabled when a product of a width and a height of the current block is less than a predefined value, and the predefined value is 128.

. The method of, wherein the DMVD tool is disabled when the coding mode is enabled for the conversion and a prediction signal of the current block is generated at least based on a weighted sum of an intra prediction signal and an inter prediction signal of the current block by using the coding mode.

. An apparatus for processing video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to:

. A non-transitory computer-readable storage medium storing instructions that cause a processor to:

. A method for storing bitstream of a video, comprising:

Detailed Description

Complete technical specification and implementation details from the patent document.

This application is a continuation of U.S. application Ser. No. 17/071,489, filed on Oct. 15, 2020, which is a continuation of International Application No. PCT/IB2019/058976, filed on Oct. 22, 2019, which claims the priority to and benefits of International Patent Application Nos. PCT/CN2018/111224 and PCT/CN2018/123407, filed on Oct. 22, 2018 and Dec. 25, 2018, respectively. All the aforementioned patent applications are hereby incorporated by reference in their entireties.

This patent document relates to video coding techniques, devices and systems.

In spite of the advances in video compression, digital video still accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.

Devices, systems and methods related to digital video coding, and specifically, to restrictions on decoder-side motion vector derivation based on coding information are described. The described methods may be applied to both the existing video coding standards (e.g., High Efficiency Video Coding (HEVC)) and future video coding standards or video codecs.

In one representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes making a decision, for a conversion between a current block of a video and a bitstream representation of the video, regarding a selective enablement of a decoder side motion vector derivation (DMVD) tool for the current block, wherein the DMVD tool derives a refinement of motion information signaled in the bitstream representation, and wherein the conversion uses a merge mode and motion vector differences that are indicated by a motion direction and a motion magnitude; and performing, based on the decision, a conversion between the current block and the bitstream representation.

In another representative aspect, the disclosed technology may be used to provide a method for video processing. This method includes making a decision, based on decoded motion information associated with a current block of a video, regarding a selective enablement of a decoder side motion vector derivation (DMVD) tool for the current block, wherein the DMVD tool derives a refinement of motion information signaled in a bitstream representation of the video; and performing, based on the decision, a conversion between the current block and the bitstream representation.
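The decision described in the two aspects above can be sketched as follows. This is a minimal, non-authoritative illustration. The `Block` fields, the function names, and the choice to combine the conditions with a logical OR are assumptions made for the example. The individual conditions (GBI mode, a prediction signal formed as a weighted sum of intra and inter prediction, and a width-height product below 128) are taken from the claims; treating the merge mode with motion vector difference as a disabling condition is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_DMVD_AREA = 128  # predefined value named in the claims


@dataclass
class Block:
    # Hypothetical container for the coding information of one block.
    width: int
    height: int
    uses_mmvd: bool = False         # merge mode with motion vector difference
    uses_gbi: bool = False          # GBI mode
    uses_intra_inter: bool = False  # weighted sum of intra and inter prediction


def mmvd_final_mv(start_mv: Tuple[int, int],
                  direction: Tuple[int, int],
                  magnitude: int) -> Tuple[int, int]:
    """Final MV = starting point + motion direction * motion magnitude."""
    return (start_mv[0] + direction[0] * magnitude,
            start_mv[1] + direction[1] * magnitude)


def dmvd_enabled(block: Block) -> bool:
    """Return False if any disabling condition holds (assumed OR-combination)."""
    if block.uses_mmvd or block.uses_gbi or block.uses_intra_inter:
        return False
    if block.width * block.height < MIN_DMVD_AREA:
        return False
    return True
```

For instance, under these assumptions a 16×16 block with none of the listed modes keeps DMVD enabled, while an 8×8 block (area 64) does not.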

In yet another representative aspect, the above-described method is embodied in the form of processor-executable code and stored in a computer-readable program medium.

In yet another representative aspect, a device that is configured or operable to perform the above-described method is disclosed. The device may include a processor that is programmed to implement this method.

In yet another representative aspect, a video decoder apparatus may implement a method as described herein.

The above and other aspects and features of the disclosed technology are described in greater detail in the drawings, the description and the claims.

Due to the increasing demand for higher resolution video, video coding methods and techniques are ubiquitous in modern technology. Video codecs typically include an electronic circuit or software that compresses or decompresses digital video, and are continually being improved to provide higher coding efficiency. A video codec converts uncompressed video to a compressed format or vice versa. There are complex relationships between the video quality, the amount of data used to represent the video (determined by the bit rate), the complexity of the encoding and decoding algorithms, sensitivity to data losses and errors, ease of editing, random access, and end-to-end delay (latency). The compressed format usually conforms to a standard video compression specification, e.g., the High Efficiency Video Coding (HEVC) standard (also known as H.265 or MPEG-H Part 2), the Versatile Video Coding standard to be finalized, or other current and/or future video coding standards.

Embodiments of the disclosed technology may be applied to existing video coding standards (e.g., HEVC, H.265) and future standards to improve compression performance. Section headings are used in the present document to improve readability of the description and do not in any way limit the discussion or the embodiments (and/or implementations) to the respective sections only.

1. Examples of Inter-Prediction in HEVC/H.265

Video coding standards have significantly improved over the years, and now provide, in part, high coding efficiency and support for higher resolutions. Recent standards such as HEVC (H.265) are based on the hybrid video coding structure wherein temporal prediction plus transform coding are utilized.

1.1 Examples of Prediction Modes

Each inter-predicted PU (prediction unit) has motion parameters for one or two reference picture lists. In some embodiments, motion parameters include a motion vector and a reference picture index. In other embodiments, the usage of one of the two reference picture lists may also be signaled using inter_pred_idc. In yet other embodiments, motion vectors may be explicitly coded as deltas relative to predictors.

When a CU is coded with skip mode, one PU is associated with the CU, and there are no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current PU are obtained from neighboring PUs, including spatial and temporal candidates. The merge mode can be applied to any inter-predicted PU, not only for skip mode. The alternative to merge mode is the explicit transmission of motion parameters, where motion vector, corresponding reference picture index for each reference picture list and reference picture list usage are signaled explicitly per each PU.

When signaling indicates that one of the two reference picture lists is to be used, the PU is produced from one block of samples. This is referred to as ‘uni-prediction’. Uni-prediction is available both for P-slices and B-slices.

When signaling indicates that both of the reference picture lists are to be used, the PU is produced from two blocks of samples. This is referred to as ‘bi-prediction’. Bi-prediction is available for B-slices only.

1.1.1 Embodiments of Constructing Candidates for Merge Mode

When a PU is predicted using merge mode, an index pointing to an entry in the merge candidates list is parsed from the bitstream and used to retrieve the motion information. The construction of this list can be summarized according to the following sequence of steps:

Step 1: Initial candidates derivation

Step 2: Additional candidates insertion

The corresponding figure shows an example of constructing a merge candidate list based on the sequence of steps summarized above. For spatial merge candidate derivation, a maximum of four merge candidates are selected among candidates that are located in five different positions. For temporal merge candidate derivation, a maximum of one merge candidate is selected among two candidates. Since a constant number of candidates for each PU is assumed at the decoder, additional candidates are generated when the number of candidates does not reach the maximum number of merge candidates (MaxNumMergeCand), which is signalled in the slice header. Since the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all the PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2N×2N prediction unit.
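The truncated unary binarization mentioned above can be illustrated with a short sketch. The function name and the bin polarity (a run of ones terminated by a zero) are assumptions for illustration; the source only states that truncated unary binarization is used.

```python
def truncated_unary(value: int, c_max: int) -> str:
    """Binarize `value` in the range [0, c_max] with truncated unary coding.

    A value is written as `value` ones followed by a terminating zero,
    except that the largest value c_max omits the terminating zero.
    """
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bits = "1" * value
    if value < c_max:
        bits += "0"
    return bits
```

With MaxNumMergeCand equal to 5, the largest merge index is 4, so `c_max` would be 4 and the last index needs no terminating bin.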

1.1.2 Constructing Spatial Merge Candidates

In the derivation of spatial merge candidates, a maximum of four merge candidates are selected among candidates located in the positions depicted in the corresponding figure. The order of derivation is A1, B1, B0, A0 and B2. Position B2 is considered only when any PU of position A1, B1, B0 or A0 is not available (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved.

To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in the corresponding figure are considered, and a candidate is only added to the list if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the “second PU” associated with partitions different from 2N×2N. As an example, the figures depict the second PU for the cases of N×2N and 2N×N, respectively. When the current PU is partitioned as N×2N, the candidate at position A1 is not considered for list construction. In some embodiments, adding this candidate may lead to two prediction units having the same motion information, which is redundant to just having one PU in a coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N×N.
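The spatial derivation and its limited redundancy check can be sketched as below. This is an illustration under stated assumptions, not the normative process: the position labels A1, B1, B0, A0 and B2 and the compared pairs are taken from the HEVC design as commonly described (the figure with the arrows is not reproduced here), and the `Motion` tuple, the function name and the `second_pu_of` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Motion = Tuple[int, int, int]  # hypothetical (mv_x, mv_y, ref_idx)

ORDER = ("A1", "B1", "B0", "A0", "B2")
# Pairs compared in the redundancy check (assumed from the HEVC design).
COMPARE_WITH = {"A1": (), "B1": ("A1",), "B0": ("B1",),
                "A0": ("A1",), "B2": ("A1", "B1")}
# Position ignored for the second PU of a non-2Nx2N partition.
SECOND_PU_SKIP = {"Nx2N": "A1", "2NxN": "B1"}


def spatial_merge_candidates(neigh: Dict[str, Optional[Motion]],
                             second_pu_of: Optional[str] = None) -> List[Motion]:
    """neigh maps a position to its motion, or None if unavailable or intra coded."""
    neigh = dict(neigh)
    skipped = SECOND_PU_SKIP.get(second_pu_of)
    if skipped is not None:
        neigh[skipped] = None  # treat the skipped position as unavailable
    out: List[Motion] = []
    for pos in ORDER:
        motion = neigh.get(pos)
        if motion is None:
            continue
        if pos == "B2" and len(out) == 4:
            continue  # B2 is only a fallback; at most four spatial candidates
        if any(neigh.get(other) == motion for other in COMPARE_WITH[pos]):
            continue  # same motion as a compared neighbour: redundant
        out.append(motion)
    return out
```

Comparing only the listed pairs, rather than every pair, is what keeps the check cheap; a duplicate between positions that are never compared can therefore still enter the list.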

1.1.3 Constructing Temporal Merge Candidates

In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture which has the smallest POC difference with the current picture within the given reference picture list. The reference picture list to be used for derivation of the co-located PU is explicitly signaled in the slice header.

The corresponding figure shows an example of the derivation of the scaled motion vector for a temporal merge candidate (as the dotted line), which is scaled from the motion vector of the co-located PU using the POC distances, tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal merge candidate is set equal to zero. For a B-slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to make the bi-predictive merge candidate.

In the co-located PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as depicted in the corresponding figure. If the PU at position C0 is not available, is intra coded, or is outside of the current CTU, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal merge candidate.
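The scaling by the POC distances tb and td can be sketched as follows. The source only states that the co-located motion vector is scaled using tb and td; the fixed-point constants and clipping ranges below follow the HEVC-style integer scaling as commonly described and should be read as an assumption, as should the function names. `td` is assumed to be nonzero.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def div_trunc(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as in C."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def scale_mv_component(mv: int, tb: int, td: int) -> int:
    """Scale one motion vector component by roughly tb / td in fixed point."""
    td = clip3(-128, 127, td)
    tb = clip3(-128, 127, tb)
    tx = div_trunc(16384 + (abs(td) >> 1), td)
    factor = clip3(-4096, 4095, (tb * tx + 32) >> 6)  # about 256 * tb / td
    prod = factor * mv
    mag = (abs(prod) + 127) >> 8
    return clip3(-32768, 32767, mag if prod >= 0 else -mag)


def scale_mv(mv, tb: int, td: int):
    """Scale both components of a (mv_x, mv_y) pair."""
    return (scale_mv_component(mv[0], tb, td),
            scale_mv_component(mv[1], tb, td))
```

When tb equals td the vector is returned unchanged, and when tb is half of td the vector is halved, which matches the intuition that motion is assumed to be linear over time.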

1.1.4 Constructing Additional Types of Merge Candidates

Besides spatio-temporal merge candidates, there are two additional types of merge candidates: combined bi-predictive merge candidate and zero merge candidate. Combined bi-predictive merge candidates are generated by utilizing spatio-temporal merge candidates. Combined bi-predictive merge candidate is used for B-Slice only. The combined bi-predictive candidates are generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another. If these two tuples provide different motion hypotheses, they will form a new bi-predictive candidate.

The corresponding figure shows an example of this process, wherein two candidates in the original list (on the left), which have mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate added to the final list (on the right).
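A minimal sketch of the combination step is given below. The dictionary layout of a candidate, the function name, and the order in which candidate pairs are visited are assumptions for illustration; a real codec may use a fixed, predefined pair order.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical layout: each candidate maps "L0" and/or "L1" to
# ((mv_x, mv_y), ref_poc) for the corresponding reference picture list.
ListMotion = Tuple[Tuple[int, int], int]
Candidate = Dict[str, ListMotion]


def add_combined_bi_predictive(cands: List[Candidate],
                               max_cands: int) -> List[Candidate]:
    """Append candidates that pair list-0 motion of one entry with list-1 motion of another."""
    out = list(cands)
    for first, second in permutations(cands, 2):
        if len(out) >= max_cands:
            break
        l0 = first.get("L0")
        l1 = second.get("L1")
        if l0 is None or l1 is None:
            continue
        if l0 == l1:
            continue  # identical motion hypotheses add nothing new
        out.append({"L0": l0, "L1": l1})
    return out
```

For example, a list holding one list-0-only candidate and one list-1-only candidate yields a single new bi-predictive entry.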

Zero motion candidates are inserted to fill the remaining entries in the merge candidates list and therefore hit the MaxNumMergeCand capacity. These candidates have zero spatial displacement and a reference picture index which starts from zero and increases every time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni- and bi-directional prediction, respectively. In some embodiments, no redundancy check is performed on these candidates.
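The zero-candidate padding can be sketched as follows. The candidate layout and the function name are hypothetical, and the behaviour once the reference index runs past the number of available reference pictures (falling back to index 0) is an assumption; the source only states that the index starts at zero and increases with each added candidate.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: "L0"/"L1" map to ((mv_x, mv_y), ref_idx).
Candidate = Dict[str, Tuple[Tuple[int, int], int]]


def fill_with_zero_candidates(cands: List[Candidate],
                              max_num_merge_cand: int,
                              num_ref_idx: int,
                              is_b_slice: bool) -> List[Candidate]:
    """Pad the merge list with zero-motion candidates up to MaxNumMergeCand."""
    out = list(cands)
    zero_idx = 0
    while len(out) < max_num_merge_cand:
        # Assumed fallback: reuse index 0 once all reference pictures are used.
        ref_idx = zero_idx if zero_idx < num_ref_idx else 0
        cand: Candidate = {"L0": ((0, 0), ref_idx)}
        if is_b_slice:
            cand["L1"] = ((0, 0), ref_idx)  # bi-directional: two reference frames
        out.append(cand)
        zero_idx += 1
    return out
```

Because no redundancy check is applied, the padded entries may repeat once the reference indices are exhausted.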

1.1.5 Examples of Motion Estimation Regions for Parallel Processing

To speed up the encoding process, motion estimation can be performed in parallel whereby the motion vectors for all prediction units inside a given region are derived simultaneously. The derivation of merge candidates from spatial neighborhood may interfere with parallel processing as one prediction unit cannot derive the motion parameters from an adjacent PU until its associated motion estimation is completed. To mitigate the trade-off between coding efficiency and processing latency, a motion estimation region (MER) may be defined. The size of the MER may be signaled in the picture parameter set (PPS) using the “log2_parallel_merge_level_minus2” syntax element. When a MER is defined, merge candidates falling in the same region are marked as unavailable and therefore not considered in the list construction.
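A short sketch of the region test follows. It assumes the MER is a square aligned grid whose side is 2 raised to (log2_parallel_merge_level_minus2 + 2) luma samples, which is how the syntax element is commonly interpreted; the function name and coordinate parameters are hypothetical.

```python
def in_same_mer(x_cur: int, y_cur: int, x_nb: int, y_nb: int,
                log2_parallel_merge_level_minus2: int) -> bool:
    """Return True if the current and neighbouring positions share a MER.

    A neighbour inside the same region would be marked unavailable
    for merge list construction.
    """
    shift = log2_parallel_merge_level_minus2 + 2
    return ((x_cur >> shift) == (x_nb >> shift)
            and (y_cur >> shift) == (y_nb >> shift))
```

With a syntax element value of 2 the region is 16×16 samples, so a neighbour at x = 15 lies in a different region from a block starting at x = 16 and remains usable.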

1.2 Embodiments of Advanced Motion Vector Prediction (AMVP)

AMVP exploits the spatio-temporal correlation of a motion vector with neighboring PUs, which is used for explicit transmission of motion parameters. It constructs a motion vector candidate list by first checking the availability of left, above, and temporally neighboring PU positions, removing redundant candidates, and adding zero vectors to make the candidate list a constant length. Then, the encoder can select the best predictor from the candidate list and transmit the corresponding index indicating the chosen candidate. Similarly to merge index signaling, the index of the best motion vector candidate is encoded using truncated unary. The maximum value to be encoded in this case is 2. In the following sections, details about the derivation process of the motion vector prediction candidate are provided.

1.2.1 Examples of Constructing Motion Vector Prediction Candidates

The corresponding figure summarizes the derivation process for the motion vector prediction candidate, which may be implemented for each reference picture list with refidx as an input.

In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For spatial motion vector candidate derivation, two motion vector candidates are eventually derived based on the motion vectors of each PU located in the five different positions shown previously.

For temporal motion vector candidate derivation, one motion vector candidate is selected from two candidates, which are derived based on two different co-located positions. After the first list of spatio-temporal candidates is made, duplicated motion vector candidates in the list are removed. If the number of potential candidates is larger than two, motion vector candidates whose reference picture index within the associated reference picture list is larger than 1 are removed from the list. If the number of spatio-temporal motion vector candidates is smaller than two, additional zero motion vector candidates are added to the list.
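The list construction described above can be sketched as follows. The candidate tuple layout, the function name, the final truncation to two entries, and the reference index of the zero candidates are assumptions for illustration.

```python
from typing import List, Tuple

MvCand = Tuple[int, int, int]  # hypothetical (mv_x, mv_y, ref_idx)


def build_amvp_list(spatial: List[MvCand], temporal: List[MvCand]) -> List[MvCand]:
    """Build a two-entry motion vector predictor list."""
    cands: List[MvCand] = []
    # At most two spatial candidates and one temporal candidate, duplicates removed.
    for cand in list(spatial[:2]) + list(temporal[:1]):
        if cand not in cands:
            cands.append(cand)
    # With more than two candidates, drop those with a reference index above 1.
    if len(cands) > 2:
        cands = [c for c in cands if c[2] <= 1]
    cands = cands[:2]  # assumed: keep the list at two entries
    # Pad with zero motion vectors so the list has constant length.
    while len(cands) < 2:
        cands.append((0, 0, 0))
    return cands
```

A constant list length of two is what allows the chosen index to be signalled with a very short truncated unary code.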

1.2.2 Constructing Spatial Motion Vector Candidates

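In the derivation of spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which are derived from PUs located in the positions shown previously, those positions being the same as those of motion merge. The order of derivation for the left side of the current PU is defined as A0, A1, scaled A0, and scaled A1. The order of derivation for the above side of the current PU is defined as B0, B1, B2, scaled B0, scaled B1, and scaled B2. For each side there are therefore four cases that can be used as a motion vector candidate, with two cases not required to use spatial scaling, and two cases where spatial scaling is used. The four different cases are summarized as follows:

No spatial scaling

(1) Same reference picture list, and same reference picture index (same POC)

(2) Different reference picture list, but same reference picture (same POC)

Spatial scaling

(3) Same reference picture list, but different reference picture (different POC)

(4) Different reference picture list, and different reference picture (different POC)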

The no-spatial-scaling cases are checked first followed by the cases that allow spatial scaling. Spatial scaling is considered when the POC is different between the reference picture of the neighbouring PU and that of the current PU regardless of reference picture list. If all PUs of left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help parallel derivation of left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.

For the spatial scaling case, the motion vector of the neighbouring PU is scaled in a similar manner as for temporal scaling. One difference is that the reference picture list and index of the current PU are given as input; the actual scaling process is the same as that of temporal scaling.
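The two-pass order (no-scaling cases first, then scaling cases) can be sketched as below. This is a simplified illustration: the neighbour representation, the function name, and the use of plain rational scaling with rounding instead of a fixed-point procedure are all assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical neighbour description: ((mv_x, mv_y), POC of its reference picture).
Neighbour = Tuple[Tuple[int, int], int]


def spatial_mv_candidate(neighbours: List[Neighbour], cur_poc: int,
                         cur_ref_poc: int,
                         allow_scaling: bool = True) -> Optional[Tuple[int, int]]:
    """Pick one spatial MV candidate from the neighbours of one side."""
    # First pass: neighbours that refer to the same picture need no scaling.
    for mv, ref_poc in neighbours:
        if ref_poc == cur_ref_poc:
            return mv
    if not allow_scaling:
        return None
    # Second pass: scale the first usable neighbour by the POC distance ratio.
    tb = cur_poc - cur_ref_poc
    for mv, ref_poc in neighbours:
        td = cur_poc - ref_poc
        if td == 0:
            continue  # degenerate distance, cannot scale
        return (round(mv[0] * tb / td), round(mv[1] * tb / td))
    return None
```

The `allow_scaling` flag reflects the rule that scaling for the above motion vector is only permitted when no left candidate is usable.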

1.2.3 Constructing Temporal Motion Vector Candidates

Apart from the reference picture index derivation, all processes for the derivation of temporal merge candidates are the same as for the derivation of spatial motion vector candidates. In some embodiments, the reference picture index is signaled to the decoder.

2. Example of Inter Prediction Methods in Joint Exploration Model (JEM)
