US-12328436

Subpicture sub-bitstream extraction improvements

Published June 10, 2025

Assignee: not available in USPTO data we have

Inventors: not available in USPTO data we have

Technical Abstract

Embodiments for video processing, including video coding, video decoding, and video transcoding, are described. One example method includes performing a conversion between a video and a bitstream of the video according to a rule, where the video comprises one or more layers, the layers comprise one or more video pictures, and the pictures comprise one or more subpictures. The rule defines the network abstraction layer (NAL) units to be extracted from a bitstream during a sub-bitstream extraction process to output a sub-bitstream. The rule further specifies that one or more inputs to the sub-bitstream extraction process include a target output layer set (OLS) index (targetOlsIdx), which identifies the OLS index of a target OLS and is equal to an index into a list of OLSs specified by a video parameter set, and that the one or more inputs satisfy a set of conditions.

Patent Claims

19 claims

Legal claims defining the scope of protection, as filed with the USPTO.

1. A method of video processing, comprising: performing a conversion between a video comprising one or more video pictures comprising one or more subpictures and a bitstream of the video, wherein the bitstream conforms to a first rule that specifies that, during a subpicture sub-bitstream extraction process, one or more parameters for a scaling window applicable to a subpicture are used to determine rewritten values of one or more syntax elements included in the bitstream for a scaling window applicable to a video picture, wherein the one or more syntax elements comprise a first set of syntax elements included in one or more picture parameter sets (PPSs), and the first set of syntax elements comprises at least one of: pps_scaling_win_left_offset, pps_scaling_win_right_offset, pps_scaling_win_top_offset, and pps_scaling_win_bottom_offset, wherein values of the first set of syntax elements in one or more referenced PPS network abstraction layer (NAL) units are rewritten by a calculation using the original values of the first set of syntax elements from the one or more original PPSs before they were rewritten and the original values of a second set of syntax elements from one or more original sequence parameter sets (SPSs) before they were rewritten, and wherein the second set of syntax elements comprises at least one of: sps_subpic_ctu_top_left_x[spIdx], sps_subpic_width_minus1[spIdx], sps_subpic_ctu_top_left_y[spIdx], sps_subpic_height_minus1[spIdx], sps_pic_width_max_in_luma_samples, and sps_pic_height_max_in_luma_samples.

2. The method of claim 1, wherein the one or more parameters comprise at least one of a left offset, a right offset, a top offset, or a bottom offset of the scaling window applicable to the subpicture, and wherein these offsets are derived as follows:
subpicScalWinLeftOffset = pps_scaling_win_left_offset − sps_subpic_ctu_top_left_x[spIdx] * CtbSizeY / SubWidthC
rightSubpicBd = (sps_subpic_ctu_top_left_x[spIdx] + sps_subpic_width_minus1[spIdx] + 1) * CtbSizeY
subpicScalWinRightOffset = (rightSubpicBd >= sps_pic_width_max_in_luma_samples) ? pps_scaling_win_right_offset : pps_scaling_win_right_offset − (sps_pic_width_max_in_luma_samples − rightSubpicBd) / SubWidthC
subpicScalWinTopOffset = pps_scaling_win_top_offset − sps_subpic_ctu_top_left_y[spIdx] * CtbSizeY / SubHeightC
botSubpicBd = (sps_subpic_ctu_top_left_y[spIdx] + sps_subpic_height_minus1[spIdx] + 1) * CtbSizeY
subpicScalWinBotOffset = (botSubpicBd >= sps_pic_height_max_in_luma_samples) ? pps_scaling_win_bottom_offset : pps_scaling_win_bottom_offset − (sps_pic_height_max_in_luma_samples − botSubpicBd) / SubHeightC
wherein CtbSizeY is the width or height of a luma coding tree block or coding tree unit; SubWidthC specifies the width of a video block and SubHeightC specifies the height of a video block, each obtained from a table according to the chroma format of the video picture including the video block; subpicScalWinLeftOffset, subpicScalWinRightOffset, subpicScalWinTopOffset, and subpicScalWinBotOffset specify the left, right, top, and bottom offsets, respectively, of the scaling window applicable to the subpicture; pps_scaling_win_left_offset, pps_scaling_win_right_offset, pps_scaling_win_top_offset, and pps_scaling_win_bottom_offset specify the original left, right, top, and bottom offsets, respectively, applied to a picture size for scaling ratio calculation; sps_subpic_ctu_top_left_x[spIdx] and sps_subpic_ctu_top_left_y[spIdx] specify the original x- and y-coordinates of the coding tree unit located at the top-left corner of the spIdx-th subpicture; sps_subpic_width_minus1[spIdx] plus 1 and sps_subpic_height_minus1[spIdx] plus 1 specify the original width and height, in units of CtbSizeY, of the spIdx-th subpicture; and sps_pic_width_max_in_luma_samples and sps_pic_height_max_in_luma_samples specify the original maximum width and height, in units of luma samples, of each decoded picture referring to the sequence parameter set.

3. The method of claim 1, wherein the conversion is performed according to a second rule that defines network abstraction layer (NAL) units to be extracted from a bitstream during a subpicture sub-bitstream extraction process to output a sub-bitstream, wherein the second rule further specifies that one or more inputs to the subpicture sub-bitstream extraction process include a target output layer set (OLS) index (targetOlsIdx) that identifies an OLS index of a target OLS and is equal to an index to a list of OLSs specified by a video parameter set, and wherein the one or more inputs satisfy a set of conditions.

4. The method of claim 3, wherein the second rule specifies that the one or more inputs further include a target highest temporal identifier value (tIdTarget), wherein the target highest temporal identifier value is in a range of 0 to a maximum number of temporal sublayers that are allowed to be present in a layer specified by the video parameter set, and wherein the maximum number of the temporal sublayers is indicated by a syntax element included in the video parameter set, and the syntax element is vps_max_sublayers_minus1.

5. The method of claim 3, wherein the second rule specifies that the one or more inputs include a list of target subpicture index values, subpicIdxTarget[i] for i from 0 to NumLayersInOls[targetOlsIdx]−1, where NumLayersInOls[i] specifies the number of layers in the i-th OLS and targetOlsIdx indicates the target OLS index.

6. The method of claim 5, wherein the set of conditions comprises a value of subpicIdxTarget[i] being in a range of 0 to sps_num_subpics_minus1 with sps_subpic_treated_as_pic_flag[subpicIdxTarget[i]] equal to 1, wherein sps_num_subpics_minus1 and sps_subpic_treated_as_pic_flag[subpicIdxTarget[i]] are found in or inferred from the sequence parameter set (SPS) referred to by the layer with nuh_layer_id equal to LayerIdInOls[targetOlsIdx][i], wherein sps_num_subpics_minus1 plus 1 specifies the number of subpictures in each picture in a particular layer video sequence, and sps_subpic_treated_as_pic_flag[subpicIdxTarget[i]] equal to 1 specifies that the subpicIdxTarget[i]-th subpicture of each coded picture in the layer is treated as a picture in the decoding process excluding in-loop filtering operations, and wherein, when sps_num_subpics_minus1 for the layer with nuh_layer_id equal to LayerIdInOls[targetOlsIdx][i] is equal to 0, the value of subpicIdxTarget[i] is equal to 0.

7. The method of claim 5, wherein the set of conditions comprises that, for any two different integer values of m and n, when sps_num_subpics_minus1 is greater than 0 for both layers with nuh_layer_id equal to LayerIdInOls[targetOlsIdx][m] and LayerIdInOls[targetOlsIdx][n], respectively, subpicIdxTarget[m] is equal to subpicIdxTarget[n].
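The input conditions of claims 3 to 7 amount to a validation step before extraction starts. The sketch below is a hypothetical checker, not the patent's procedure: `LayerSubpicInfo` stands in for the SPS values of each layer in the target OLS, and `total_num_olss` for the number of OLSs the VPS specifies. It assumes the tIdTarget range of claim 4 is 0 to vps_max_sublayers_minus1, inclusive.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class LayerSubpicInfo:
    """SPS values for one layer of the target OLS (hypothetical container)."""
    num_subpics_minus1: int         # sps_num_subpics_minus1
    treated_as_pic_flag: List[int]  # sps_subpic_treated_as_pic_flag[], found or inferred


def check_extraction_inputs(target_ols_idx, total_num_olss, t_id_target,
                            vps_max_sublayers_minus1, subpic_idx_target, layers):
    """Return a list of violated conditions; an empty list means the inputs are acceptable."""
    errors = []
    if not 0 <= target_ols_idx < total_num_olss:
        errors.append("targetOlsIdx is not an index into the VPS list of OLSs")
    if not 0 <= t_id_target <= vps_max_sublayers_minus1:
        errors.append("tIdTarget out of range")
    # Claim 5: one subpicIdxTarget entry per layer in the target OLS.
    if len(subpic_idx_target) != len(layers):
        errors.append("subpicIdxTarget length differs from NumLayersInOls[targetOlsIdx]")
        return errors
    # Claim 6: each index must name a subpicture treated as a picture.
    for i, (idx, info) in enumerate(zip(subpic_idx_target, layers)):
        if info.num_subpics_minus1 == 0 and idx != 0:
            errors.append(f"layer {i}: subpicIdxTarget must be 0")
        elif not 0 <= idx <= info.num_subpics_minus1:
            errors.append(f"layer {i}: subpicIdxTarget out of range")
        elif info.treated_as_pic_flag[idx] != 1:
            errors.append(f"layer {i}: subpicture is not treated as a picture")
    # Claim 7: layers with more than one subpicture must select the same index.
    multi = [i for i, info in enumerate(layers) if info.num_subpics_minus1 > 0]
    for m, n in combinations(multi, 2):
        if subpic_idx_target[m] != subpic_idx_target[n]:
            errors.append(f"layers {m} and {n}: subpicIdxTarget values differ")
    return errors


layers = [LayerSubpicInfo(3, [1, 1, 1, 1]), LayerSubpicInfo(0, [1])]
print(check_extraction_inputs(0, 2, 0, 1, [2, 0], layers))  # []
```

Collecting every violation, instead of stopping at the first one, is only a convenience choice for diagnostics.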

8. The method of claim 1, wherein the conversion is performed according to a third rule that specifies that, in a process of subpicture sub-bitstream extraction to output a sub-bitstream, removal of (i) a video coding layer (VCL) network abstraction layer (NAL) unit, (ii) filler data NAL units associated with the VCL NAL unit, and (iii) filler payload supplemental enhancement information (SEI) messages associated with the VCL NAL unit is performed regardless of an availability of an external means used to replace a parameter set that is removed during the subpicture sub-bitstream extraction.

9. The method of claim 8, wherein the third rule further specifies that, during the subpicture sub-bitstream extraction process, for each value of i in a range of 0 to NumLayersInOls[targetOlsIdx]−1, all VCL NAL units with nuh_layer_id equal to LayerIdInOls[targetOlsIdx][i] and sh_subpic_id not equal to SubpicIdVal[subpicIdxTarget[i]], their associated filler data NAL units, and their associated SEI NAL units that contain filler payload SEI messages are removed from the output sub-bitstream, and wherein nuh_layer_id is a NAL unit header identifier, LayerIdInOls[targetOlsIdx] specifies the nuh_layer_id values in the output layer set (OLS) with the target output layer set index targetOlsIdx, sh_subpic_id specifies the subpicture identifier of the subpicture that contains a slice, subpicIdxTarget[i] indicates the target subpicture index value for i, and SubpicIdVal[subpicIdxTarget[i]] is the subpicture identifier value of the subpicIdxTarget[i]-th subpicture, i being an integer.

10. The method of claim 8, wherein the third rule further specifies that, during the subpicture sub-bitstream extraction process, when sli_cbr_constraint_flag is equal to 0, all NAL units with nal_unit_type equal to FD_NUT and all SEI NAL units containing filler payload SEI messages are removed.
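Claim 10's condition reduces to a single filter. In this sketch `Nal` is again a hypothetical, minimal model, and `sli_cbr_constraint_flag` is passed in as a plain integer rather than parsed from an SEI message.

```python
from dataclasses import dataclass


@dataclass
class Nal:
    """Simplified NAL unit (hypothetical model)."""
    kind: str                      # "VCL", "FD_NUT", "SEI", ...
    filler_sei_only: bool = False  # SEI NAL unit carrying filler payload SEI messages


def strip_filler_if_not_cbr(nals, sli_cbr_constraint_flag):
    """Remove all filler data when sli_cbr_constraint_flag is 0; otherwise keep everything."""
    if sli_cbr_constraint_flag != 0:
        return list(nals)
    return [
        n for n in nals
        if not (n.kind == "FD_NUT" or (n.kind == "SEI" and n.filler_sei_only))
    ]


stream = [Nal("VCL"), Nal("FD_NUT"), Nal("SEI", filler_sei_only=True), Nal("SEI")]
print(len(strip_filler_if_not_cbr(stream, 0)))  # 2
```

With the flag equal to 0 the sub-bitstream is not constrained to constant bit rate, so the filler data that only padded the bit rate can be discarded.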

11. The method of claim 3, wherein the second rule specifies that the output sub-bitstream contains at least one video coding layer (VCL) NAL unit with nuh_layer_id equal to each of the nuh_layer_id values in the list LayerIdInOls[targetOlsIdx], whereby nuh_layer_id is a NAL unit header identifier and LayerIdInOls[targetOlsIdx] specifies the nuh_layer_id values in the OLS with index targetOlsIdx; or wherein the second rule specifies that the output sub-bitstream contains at least one VCL NAL unit with TemporalId equal to tIdTarget, whereby TemporalId indicates the temporal identifier of a NAL unit and tIdTarget is the target highest TemporalId value provided as an input to the sub-bitstream extraction process, and wherein the bitstream contains one or more coded slice NAL units with TemporalId equal to 0 without needing to contain coded slice NAL units with nuh_layer_id equal to 0; or wherein the output sub-bitstream contains at least one VCL NAL unit with nuh_layer_id equal to LayerIdInOls[targetOlsIdx][i] and with a subpicture identifier, sh_subpic_id, equal to SubpicIdVal[subpicIdxTarget[i]] for each i in a range of 0 to NumLayersInOls[targetOlsIdx]−1, where NumLayersInOls[i] specifies the number of layers in the i-th OLS and i is an integer.
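The alternative output conditions of claim 11 can be checked independently of each other. The sketch below reports each one separately; `Vcl` is a hypothetical record holding only the three fields the conditions inspect, and the dictionary keys are illustrative names, not terms from the claims.

```python
from dataclasses import dataclass


@dataclass
class Vcl:
    """Minimal VCL NAL unit record (hypothetical model)."""
    layer_id: int     # nuh_layer_id
    temporal_id: int  # TemporalId
    subpic_id: int    # sh_subpic_id


def output_checks(vcls, layer_id_in_ols, t_id_target, subpic_id_val_target):
    """Evaluate each alternative condition of claim 11 on the output VCL NAL units."""
    layers_present = {v.layer_id for v in vcls}
    layer_subpic_pairs = {(v.layer_id, v.subpic_id) for v in vcls}
    return {
        # At least one VCL NAL unit for every nuh_layer_id in LayerIdInOls[targetOlsIdx].
        "every_layer_has_vcl": all(l in layers_present for l in layer_id_in_ols),
        # At least one VCL NAL unit with TemporalId equal to tIdTarget.
        "has_tid_target": any(v.temporal_id == t_id_target for v in vcls),
        # For each i, a VCL NAL unit of layer i carrying the target subpicture ID.
        "every_layer_has_target_subpic": all(
            (l, s) in layer_subpic_pairs
            for l, s in zip(layer_id_in_ols, subpic_id_val_target)
        ),
    }


vcls = [Vcl(0, 0, 1), Vcl(1, 1, 1)]
print(output_checks(vcls, [0, 1], 1, [1, 1]))
```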

12. The method of claim 8, wherein the third rule further specifies that an SEI NAL unit containing an SEI message with a particular payload type does not contain another SEI message with a payload type different from the particular payload type, and the SEI message with the particular payload type corresponds to a filler payload SEI message.

13. The method of claim 8, wherein the third rule further specifies that (1) when sli_cbr_constraint_flag is equal to 1, cbr_flag[tIdTarget][j] of the j-th coded picture buffer (CPB) in a particular output layer set (OLS) and hypothetical reference decoder (HRD) parameters syntax structure in all the referenced VPSs is set equal to 1, where j is in a range of 0 to hrd_cpb_cnt_minus1, or (2) when sli_cbr_constraint_flag is equal to 0, cbr_flag[tIdTarget][j] is set equal to 0, and wherein the particular OLS and HRD parameters syntax structure has an index of idx[MultiLayerOlsIdx[targetOlsIdx]] into a list of OLS and HRD parameters syntax structures in a video parameter set (VPS) and applies to the targetOlsIdx-th OLS.
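The cbr_flag rewrite of claim 13 touches one sublayer row in each selected OLS and HRD parameters structure. In this sketch `OlsHrdParams` is a hypothetical stand-in for the structure that idx[MultiLayerOlsIdx[targetOlsIdx]] selects in each referenced VPS; locating those structures in a real bitstream is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OlsHrdParams:
    """Selected OLS/HRD parameters structure of one referenced VPS (hypothetical model)."""
    hrd_cpb_cnt_minus1: int
    cbr_flag: List[List[int]]  # cbr_flag[sublayer][cpb]


def rewrite_cbr_flags(structures, t_id_target, sli_cbr_constraint_flag):
    """Set cbr_flag[tIdTarget][j] for j = 0..hrd_cpb_cnt_minus1 in every structure."""
    new_value = 1 if sli_cbr_constraint_flag == 1 else 0
    for s in structures:
        for j in range(s.hrd_cpb_cnt_minus1 + 1):
            s.cbr_flag[t_id_target][j] = new_value


params = OlsHrdParams(hrd_cpb_cnt_minus1=1, cbr_flag=[[0, 0], [0, 0]])
rewrite_cbr_flags([params], t_id_target=1, sli_cbr_constraint_flag=1)
print(params.cbr_flag)  # [[0, 0], [1, 1]]
```

Following claim 15, the resulting value tells the HRD whether the hypothetical stream scheduler delivers the extracted sub-bitstream in constant bit rate mode (1) or intermittent bit rate mode (0).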

14. The method of claim 13, wherein the third rule further specifies that, (1) in the case that sli_cbr_constraint_flag is equal to 1, the output sub-bitstream is extracted without removing all NAL units with nal_unit_type equal to FD_NUT and filler payload SEI messages that are not associated with the VCL NAL units of a subpicture in subpicIdTarget[ ], or (2) in the case that sli_cbr_constraint_flag is equal to 0, the output sub-bitstream is extracted without removing all NAL units with nal_unit_type equal to FD_NUT and filler payload SEI messages.

15. The method of claim 13, wherein cbr_flag[i][j] equal to 0 specifies that to decode this bitstream by the HRD using the j-th CPB specification, a hypothetical stream scheduler (HSS) operates in an intermittent bit rate mode, and cbr_flag[i][j] equal to 1 specifies that the HSS operates in a constant bit rate (CBR) mode.

16. The method of claim 1, wherein the conversion includes encoding the video into the bitstream.

17. The method of claim 1, wherein the conversion includes decoding the video from the bitstream.

18. An apparatus for processing video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to: perform a conversion between a video comprising one or more video pictures comprising one or more subpictures and a bitstream of the video, wherein the bitstream conforms to a rule that specifies that, during a subpicture sub-bitstream extraction process, one or more parameters for a scaling window applicable to a subpicture are used to determine rewritten values of one or more syntax elements included in the bitstream for a scaling window applicable to a video picture, wherein the one or more syntax elements comprise a first set of syntax elements included in one or more picture parameter sets (PPSs), and the first set of syntax elements comprises at least one of: pps_scaling_win_left_offset, pps_scaling_win_right_offset, pps_scaling_win_top_offset, and pps_scaling_win_bottom_offset, wherein values of the first set of syntax elements in one or more referenced PPS network abstraction layer (NAL) units are rewritten by a calculation using the original values of the first set of syntax elements from the one or more original PPSs before they were rewritten and the original values of a second set of syntax elements from one or more original sequence parameter sets (SPSs) before they were rewritten, wherein the second set of syntax elements comprises at least one of: sps_subpic_ctu_top_left_x[spIdx], sps_subpic_width_minus1[spIdx], sps_subpic_ctu_top_left_y[spIdx], sps_subpic_height_minus1[spIdx], sps_pic_width_max_in_luma_samples, and sps_pic_height_max_in_luma_samples.

19. A non-transitory computer-readable storage medium storing instructions that cause a processor to: perform a conversion between a video comprising one or more video pictures comprising one or more subpictures and a bitstream of the video, wherein the bitstream conforms to a rule that specifies that, during a subpicture sub-bitstream extraction process, one or more parameters for a scaling window applicable to a subpicture are used to determine rewritten values of one or more syntax elements included in the bitstream for a scaling window applicable to a video picture, wherein the one or more syntax elements comprise a first set of syntax elements included in one or more picture parameter sets (PPSs), and the first set of syntax elements comprises at least one of: pps_scaling_win_left_offset, pps_scaling_win_right_offset, pps_scaling_win_top_offset, and pps_scaling_win_bottom_offset, wherein values of the first set of syntax elements in one or more referenced PPS network abstraction layer (NAL) units are rewritten by a calculation using the original values of the first set of syntax elements from the one or more original PPSs before they were rewritten and the original values of a second set of syntax elements from one or more original sequence parameter sets (SPSs) before they were rewritten, wherein the second set of syntax elements comprises at least one of: sps_subpic_ctu_top_left_x[spIdx], sps_subpic_width_minus1[spIdx], sps_subpic_ctu_top_left_y[spIdx], sps_subpic_height_minus1[spIdx], sps_pic_width_max_in_luma_samples, and sps_pic_height_max_in_luma_samples.

Classification Codes (CPC)

Cooperative Patent Classification codes for this invention.

H04N

Patent Metadata

Filing Date

November 8, 2023

Publication Date

June 10, 2025
