Methods to decode a picture from a bitstream are discussed. A partitioning structure of the picture is determined, wherein the partitioning structure defines at least first and second partitions of the picture. At least one dependency syntax element is decoded from the bitstream, and whether the second partition is dependent on or independent of the first partition is determined based on the at least one dependency syntax element. The picture is decoded from the bitstream based on determining whether the second partition of the picture is dependent on or independent of the first partition of the picture. Related methods of encoding and related devices are also discussed.
. A method for decoding a picture from a bitstream, the method comprising:
This application is a continuation of U.S. patent application Ser. No. 18/539,984 filed on Dec. 14, 2023, which is a continuation of U.S. patent application Ser. No. 17/437,018 filed on Sep. 7, 2021 (now U.S. Pat. No. 11,889,085), which itself is a 35 U.S.C. § 371 national stage application of PCT International Application No. PCT/SE2020/050244 filed on Mar. 5, 2020, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/815,548, filed on Mar. 8, 2019, the disclosures and content of which are incorporated by reference herein in their entireties.
The present disclosure relates generally to communication of pictures/video, and more particularly to methods providing encoding and/or decoding of pictures and/or video and related devices.
HEVC and VVC are discussed below.
High Efficiency Video Coding (HEVC), a.k.a. H.265, is a block-based video codec standardized by ITU-T and MPEG that utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current picture. Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction on the block level using previously decoded reference pictures. The difference between the original pixel data and the predicted pixel data, referred to as the residual, is transformed into the frequency domain, quantized and then entropy coded before being transmitted together with the necessary prediction parameters, such as prediction mode and motion vectors, which are also entropy coded. By quantizing the transformed residuals, a tradeoff between bitrate and quality of the video may be controlled. The level of quantization is determined by the quantization parameter (QP). The decoder performs entropy decoding, inverse quantization and inverse transformation to obtain the residual, and then adds the residual to an intra or inter prediction to reconstruct a picture.
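The bitrate/quality tradeoff controlled by QP can be illustrated with a minimal sketch. It assumes the commonly cited approximation that the quantization step size doubles for every increase of 6 in QP, with a step of 1 at QP 4; this relation is not stated in the text above, and real HEVC codecs use integer scaling tables rather than floating point. The function names are hypothetical.

```python
def qstep(qp: int) -> float:
    """Approximate quantization step size for a given QP (assumed relation)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transformed residual coefficients to integer levels."""
    step = qstep(qp)
    return [round(c / step) for c in coeffs]


def dequantize(levels, qp):
    """Inverse quantization as performed by the decoder."""
    step = qstep(qp)
    return [level * step for level in levels]


if __name__ == "__main__":
    coeffs = [10.0, -3.0, 0.4]
    for qp in (4, 22):
        levels = quantize(coeffs, qp)
        print(qp, levels, dequantize(levels, qp))
```

A higher QP gives a larger step, so more coefficients collapse to zero: fewer bits to code, but a larger reconstruction error.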
MPEG and ITU-T are working on the successor to HEVC within the Joint Video Experts Team (JVET). The name of this video codec under development is Versatile Video Coding (VVC).
QTBT is discussed below.
HEVC uses a block structure where each top-level coding block, i.e. the largest block in the coding block partitioning, referred to as the coding tree unit (CTU), may be partitioned by a quad tree (QT) structure. The resulting coding block partitions, referred to as coding units (CUs), can be recursively further partitioned into smaller equally sized CUs with the quad tree structure down to block size 8×8. In the current version of VVC, the block structure is a bit different compared to HEVC. The block structure in VVC is referred to as quadtree plus binary tree plus ternary tree block structure (QTBT+TT). A CU in QTBT can have either square or rectangular shapes. A coding tree unit (CTU) is first partitioned by a quad tree structure as in HEVC. Then it may be further partitioned with equally sized partitions either vertically or horizontally in a binary structure to form coding blocks, referred to as coding units (CUs). A block could thus have either a square or rectangular shape. The depth of the quad tree and binary tree can be set by the encoder in the bitstream. An example of dividing a CTU using QTBT is illustrated in the accompanying figure. TT adds the possibility to divide a CU into three partitions instead of two equally sized partitions, increasing the possibilities to use a block structure that better fits the content structure in a picture.
Figure: an example of partitioning a CTU into CUs using QTBT.
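The recursive quad tree split of a CTU described above can be sketched as follows. This is only an illustration of the HEVC-style quad tree part (no binary or ternary splits); the function name and the `should_split` callback, which stands in for the encoder's split decision, are hypothetical.

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Return the leaf blocks (x, y, size) of a quad tree rooted at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four equally sized sub-blocks in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_partition(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    return [(x, y, size)]


if __name__ == "__main__":
    # Split a 64x64 CTU once, then split only its top-left 32x32 block again.
    decide = lambda x, y, size: size == 64 or (size == 32 and x == 0 and y == 0)
    for block in quadtree_partition(0, 0, 64, 8, decide):
        print(block)
```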
CABAC is discussed below.
Context Adaptive Binary Arithmetic Coding (CABAC) is an entropy coding tool used in HEVC and VVC. CABAC encodes binary symbols, which keeps the complexity low and allows modelling of probabilities for more frequently used bits of a symbol. The probability models are selected adaptively based on local context, since coding modes are usually well correlated locally. The state of the CABAC machine, including the probabilities of the local context, is continuously updated based on the previously coded blocks. The CABAC states are reset for each picture, independent slice or tile.
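The idea of an adaptively updated probability state can be sketched with a toy model. This is not the CABAC state machine of HEVC or VVC; it assumes a simple exponential update of the probability of a 1 bit and measures the ideal code length of each bit. The class name, the update rate and the initial probability are all hypothetical.

```python
import math


class AdaptiveBitModel:
    """Toy adaptive probability model for one binary context."""

    def __init__(self, p_one=0.5, rate=1 / 16):
        self.p_one = p_one  # current estimate of P(bit == 1)
        self.rate = rate    # adaptation speed (assumed value)

    def cost_bits(self, bit):
        """Ideal code length of `bit` under the current estimate."""
        p = self.p_one if bit else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bit):
        """Move the estimate towards the bit that was just coded."""
        target = 1.0 if bit else 0.0
        self.p_one += self.rate * (target - self.p_one)


def total_cost(bits):
    """Code a bit sequence with a freshly reset model and sum the cost."""
    model = AdaptiveBitModel()
    cost = 0.0
    for bit in bits:
        cost += model.cost_bits(bit)
        model.update(bit)
    return cost


if __name__ == "__main__":
    print(total_cost([1] * 100))     # locally correlated input: cheap
    print(total_cost([1, 0] * 50))   # uncorrelated-looking input: about 1 bit each
```

Creating a new model in `total_cost` plays the role of the state reset at the start of a picture, independent slice or tile: nothing learned elsewhere is carried over.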
Intra random access point (IRAP) pictures and the coded video sequence (CVS) are discussed below.
For single layer coding in HEVC, an access unit (AU) is the coded representation of a single picture. An AU may consist of several video coding layer (VCL) NAL units as well as non-VCL NAL units.
An Intra random access point (IRAP) picture in HEVC is a picture that does not refer to any pictures other than itself for prediction in its decoding process. The first picture in the bitstream in decoding order in HEVC must be an IRAP picture but an IRAP picture may additionally also appear later in the bitstream. HEVC specifies three types of IRAP pictures, the broken link access (BLA) picture, the instantaneous decoder refresh (IDR) picture and the clean random access (CRA) picture.
A coded video sequence (CVS) in HEVC is a series of access units starting at an IRAP access unit up to, but not including, the next IRAP access unit in decoding order. A CRA may or may not start a CVS, while the BLA and IDR types always start a new CVS.
NAL units are discussed below.
Both HEVC and VVC define a Network Abstraction Layer (NAL). All the data, i.e. both Video Coding Layer (VCL) and non-VCL data, in HEVC and VVC is encapsulated in NAL units. NAL units are always byte aligned and can be seen as data packets. A VCL NAL unit contains data that represents picture sample values. A non-VCL NAL unit contains additional associated data such as parameter sets and supplemental enhancement information (SEI) messages. The NAL unit in HEVC begins with a header which specifies the NAL unit type, which identifies what type of data is carried in the NAL unit, as well as the layer ID and the temporal ID to which the NAL unit belongs. The NAL unit type is transmitted in the nal_unit_type codeword in the NAL unit header, and the type indicates and defines how the NAL unit should be parsed and decoded. The rest of the bytes of the NAL unit are payload of the type indicated by the NAL unit type. A bitstream consists of a series of concatenated NAL units.
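As an illustration of the header fields named above, the following sketch parses the two-byte HEVC NAL unit header. The bit layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) is taken from the HEVC specification rather than from this text, and the function name is hypothetical.

```python
def parse_hevc_nal_header(data: bytes) -> dict:
    """Parse the two-byte HEVC NAL unit header at the start of `data`."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        # The layer ID straddles the byte boundary: 1 bit + 5 bits.
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        # The header carries the temporal ID plus 1.
        "temporal_id": (b1 & 0x07) - 1,
    }


if __name__ == "__main__":
    # 0x40 0x01 is the header of a video parameter set (nal_unit_type 32).
    print(parse_hevc_nal_header(bytes([0x40, 0x01])))
```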
Parameter Sets are discussed below.
HEVC specifies three types of parameter sets, the picture parameter set (PPS), the sequence parameter set (SPS) and the video parameter set (VPS). The PPS contains data that is common for a whole picture, the SPS contains data that is common for a coded video sequence (CVS) and the VPS contains data that is common for multiple CVSs.
Tiles are discussed below.
The HEVC video coding standard includes a tool called tiles that divides a picture into rectangular spatially independent regions. Using tiles, a picture in HEVC can be partitioned into rows and columns of samples where a tile is an intersection of a row and a column. The tiles in HEVC are always aligned with CTU boundaries.
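A minimal sketch of how a CTU position maps to a tile when tile boundaries are aligned with CTU boundaries. It assumes uniformly spaced tile columns and rows computed with integer division, and tile indices assigned in raster order; both function names are hypothetical.

```python
from bisect import bisect_right


def uniform_boundaries(size_in_ctus: int, num_parts: int) -> list:
    """CTU positions at which each tile column (or row) starts, plus the end."""
    return [(i * size_in_ctus) // num_parts for i in range(num_parts + 1)]


def tile_index_of_ctu(ctu_x, ctu_y, col_bd, row_bd) -> int:
    """Raster-order index of the tile containing CTU (ctu_x, ctu_y)."""
    col = bisect_right(col_bd, ctu_x) - 1
    row = bisect_right(row_bd, ctu_y) - 1
    num_cols = len(col_bd) - 1
    return row * num_cols + col


if __name__ == "__main__":
    # A picture of 10x6 CTUs split into 3 tile columns and 2 tile rows.
    col_bd = uniform_boundaries(10, 3)
    row_bd = uniform_boundaries(6, 2)
    print(col_bd, row_bd, tile_index_of_ctu(5, 4, col_bd, row_bd))
```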
Slices in HEVC are discussed below.
The concept of slices in HEVC divides the picture into coded slices, where each slice is read in raster scan order in units of CTUs. A slice in HEVC may be either an independent slice or a dependent slice for which the values of some syntax elements of the slice header are inferred from the values for the preceding independent slice in decoding order. Each slice is encapsulated in its own NAL unit. Different coding types could be used for slices of the same picture, i.e. a slice could either be an I-slice, P-slice or B-slice. The main purpose of slices is to enable resynchronization in case of data loss.
Tile groups in VVC are discussed below.
The draft VVC specification does not include slices but a similar concept called tile groups. In HEVC a tile may contain one or more slices, or a slice may contain one or more tiles, but not both of those. In VVC, the concept is more straightforward; a tile group may contain one or more complete tiles. A tile group is used to group multiple tiles to reduce the overhead of each tile. Each tile group is encapsulated in its own NAL unit. For a picture, the tile groups may either be of rectangular shape or comprise one or more tiles in raster scan order. A rectangular tile group consists of M×N tiles where M is the number of tiles vertically and N the number of tiles horizontally in the tile group. A tile group in the VVC draft breaks prediction similar to a tile in HEVC.
Tile grouping as in JVET-M0853 is discussed below.
At the 13th JVET meeting in Marrakech in January 2019, rectangular tile grouping was adopted according to the JVET-M0853 proposal.
The proposed text of JVET-M0853 is shown in the accompanying table.
Semantics for the new/modified syntax elements of that table are provided below.
single_tile_per_tile_group_flag equal to 1 specifies that each tile group that refers to this PPS includes one tile. single_tile_per_tile_group_flag equal to 0 specifies that a tile group that refers to this PPS may include more than one tile.
rect_tile_group_flag equal to 0 specifies that tiles within each tile group are in raster scan order and the tile group information is not signalled in PPS. rect_tile_group_flag equal to 1 specifies that tiles within each tile group cover a rectangular region of the picture and the tile group information is signalled in the PPS. When single_tile_per_tile_group_flag is equal to 1 rect_tile_group_flag is inferred to be equal to 1.
num_tile_groups_in_pic_minus1 plus 1 specifies the number of tile groups in each picture referring to the PPS. The value of num_tile_groups_in_pic_minus1 shall be in the range of 0 to (NumTilesInPic−1), inclusive. When not present and single_tile_per_tile_group_flag is equal to 1, the value of num_tile_groups_in_pic_minus1 is inferred to be equal to (NumTilesInPic−1).
top_left_tile_idx[i] specifies the tile index of the tile located at the top-left corner of the i-th tile group. The value of top_left_tile_idx[i] shall not be equal to the value of top_left_tile_idx[j] for any i not equal to j. When not present, top_left_tile_idx[i] is inferred to be equal to i. The length of the top_left_tile_idx[i] syntax element is Ceil(Log2(NumTilesInPic)) bits.
bottom_right_tile_idx[i] specifies the tile index of the tile located at the bottom-right corner of the i-th tile group. When single_tile_per_tile_group_flag is equal to 1, bottom_right_tile_idx[i] is inferred to be equal to top_left_tile_idx[i]. The length of the bottom_right_tile_idx[i] syntax element is Ceil(Log2(NumTilesInPic)) bits.
It may be a requirement of bitstream conformance that any particular tile shall only be included in one tile group.
The variable NumTilesInTileGroup[i], which specifies the number of tiles in the tile group, and related variables, are derived as follows:
deltaTileIdx[i] = bottom_right_tile_idx[i] − top_left_tile_idx[i]
NumTileRowsInTileGroupMinus1[i] = (deltaTileIdx[i] / (num_tile_columns_minus1 + 1))
NumTileColumnsInTileGroupMinus1[i] = (deltaTileIdx[i] % (num_tile_columns_minus1 + 1))
NumTilesInTileGroup[i] = (NumTileRowsInTileGroupMinus1[i] + 1) * (NumTileColumnsInTileGroupMinus1[i] + 1)
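The derivation of the number of tiles in a rectangular tile group can be sketched as follows: the difference between the bottom-right and top-left tile indices is split into a row offset (integer division by the number of tile columns in the picture) and a column offset (the remainder), and the tile count is the product of both offsets plus one. The function names are hypothetical; the second helper computes the Ceil(Log2(NumTilesInPic)) bit length of the tile index syntax elements using integer arithmetic.

```python
def tile_group_dims(top_left_tile_idx, bottom_right_tile_idx, num_tile_columns_minus1):
    """Return (rows_minus1, columns_minus1, num_tiles) for one tile group."""
    delta_tile_idx = bottom_right_tile_idx - top_left_tile_idx
    columns_in_pic = num_tile_columns_minus1 + 1
    rows_minus1 = delta_tile_idx // columns_in_pic
    columns_minus1 = delta_tile_idx % columns_in_pic
    num_tiles = (rows_minus1 + 1) * (columns_minus1 + 1)
    return rows_minus1, columns_minus1, num_tiles


def tile_idx_bits(num_tiles_in_pic: int) -> int:
    """Ceil(Log2(num_tiles_in_pic)) for num_tiles_in_pic >= 1."""
    return (num_tiles_in_pic - 1).bit_length()


if __name__ == "__main__":
    # 4x4 tile grid; the tile group spans tiles 5, 6, 9 and 10.
    print(tile_group_dims(5, 10, 3))
    print(tile_idx_bits(16))
```

For the 4x4 grid in the usage example, the index difference is 5, giving one extra row and one extra column, i.e. a 2x2 tile group of 4 tiles.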
signalled_tile_group_id_flag equal to 1 specifies that the tile group ID for each tile group is signalled. signalled_tile_group_index_flag equal to 0 specifies that tile group IDs are not signalled. When rect_tile_group_flag is equal to 0, the value of signalled_tile_group_index_flag is inferred to be equal to 0.
signalled_tile_group_id_length_minus1 plus 1 specifies the number of bits used to represent the syntax element tile_group_id[i] when present, and the syntax element tile_group_address in tile group headers. The value of signalled_tile_group_index_length_minus1 shall be in the range of 0 to 15, inclusive. When not present, the value of signalled_tile_group_index_length_minus1 is inferred to be equal to Ceil(Log2(num_tile_groups_in_pic_minus1+1))−1.
tile_group_id[i] specifies the tile group ID of the i-th tile group. The length of the tile_group_id[i] syntax element is tile_set_id_length_minus1+1 bits. When not present, tile_group_id[i] is inferred to be equal to i, for each i in the range of 0 to num_tile_groups_in_pic_minus1, inclusive.
In the present disclosure, syntax examples are shown relative to the proposed text in JVET-M0853 since at the time of writing this text was the most recent VVC tile specification text.
Segment groups, segments, units and partitions are discussed below.
Here we describe segment groups, segments and units. The term segment is used as a more general term than tiles, since embodiments of the present disclosure may be applied to different kinds of picture partitioning schemes and not only tile partitions known from HEVC and the VVC draft. In this section, tile is one embodiment of a segment, but there may also be other segment embodiments.
Figure: a picture partition.
The figure shows a picture of a video stream and an exemplary partitioning of the picture into units, segments and segment groups. One view shows a picture that consists of 64 units. Another view shows the segment structure of the same picture, consisting of 16 segments. The segment structure is shown by dashed lines. Each segment consists of a number of units. A segment can either consist of an integer number of complete units or a combination of complete and partial units. A number of segments form a segment group. A further view shows the segment group partitioning of the same picture, which consists of 8 segment groups. The segment group may consist of segments in raster scan order. Alternatively, the segment group may consist of any group of segments that together form a rectangle. Alternatively, the segment group may consist of any subset of segments.
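The example partitioning (64 units, 16 segments, 8 segment groups) can be sketched in code. The layout is an assumption, since the figure is not reproduced here: an 8×8 grid of units, segments of 2×2 complete units, and segment groups of two consecutive segments in raster scan order. All names are hypothetical.

```python
UNITS_PER_SIDE = 8       # 8 x 8 = 64 units (assumed layout)
SEGMENTS_PER_SIDE = 4    # 4 x 4 = 16 segments (assumed layout)
SEGMENTS_PER_GROUP = 2   # 16 / 2 = 8 segment groups (assumed layout)


def segment_of_unit(unit_x: int, unit_y: int) -> int:
    """Raster-order index of the segment containing the given unit."""
    units_per_segment_side = UNITS_PER_SIDE // SEGMENTS_PER_SIDE
    seg_x = unit_x // units_per_segment_side
    seg_y = unit_y // units_per_segment_side
    return seg_y * SEGMENTS_PER_SIDE + seg_x


def group_of_segment(segment_idx: int) -> int:
    """Index of the segment group for segments grouped in raster scan order."""
    return segment_idx // SEGMENTS_PER_GROUP


if __name__ == "__main__":
    for y in range(UNITS_PER_SIDE):
        print(" ".join(f"{group_of_segment(segment_of_unit(x, y))}"
                       for x in range(UNITS_PER_SIDE)))
```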
A segment may be equivalent to a tile. A segment group may be equivalent to a tile group. In this description, “tile” and “segment” can be used interchangeably. We also define in this description the term “partition” to mean “segment or segment group”. A partition may thus be either a segment or a segment group. The term unit may be equivalent to a CTU or in more general terms a block.
Encoding/decoding using known tile/segment partitioning may reduce compression efficiency and/or cause tiling artifacts.
According to some embodiments of inventive concepts, methods are provided to decode a picture from a bitstream. A partitioning structure of the picture is determined, wherein the partitioning structure defines at least first and second partitions of the picture. At least one dependency syntax element is decoded from the bitstream, and whether the second partition is dependent on or independent of the first partition is determined based on the at least one dependency syntax element. The picture is decoded from the bitstream based on determining whether the second partition of the picture is dependent on or independent of the first partition of the picture.
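The decoding flow summarized above can be sketched under strong assumptions. The text does not specify how the dependency syntax element is coded, so this sketch assumes one hypothetical 1-bit flag per partition after the first, where 1 means the partition is dependent on the preceding partition and 0 means it is independent. The `BitReader` class and `decode_dependencies` function are illustrative names, not part of any codec API.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit


def decode_dependencies(reader: BitReader, num_partitions: int) -> list:
    """Return, per partition, whether it depends on the preceding partition.

    Assumed syntax: one flag per partition after the first. The first
    partition has no predecessor and is treated as independent.
    """
    dependent = [False]
    for _ in range(1, num_partitions):
        dependent.append(reader.read_bit() == 1)
    return dependent


if __name__ == "__main__":
    flags = decode_dependencies(BitReader(bytes([0b10100000])), 4)
    for idx, dep in enumerate(flags):
        print(f"partition {idx}: {'dependent' if dep else 'independent'}")
```

A decoder following this sketch would then decode each partition either using information from the preceding partition or in isolation, according to the decoded flag.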
October 30, 2025