| rfc9328.original | rfc9328.txt | |||
|---|---|---|---|---|
| avtcore S. Zhao | Internet Engineering Task Force (IETF) S. Zhao | |||
| Internet-Draft Intel | Request for Comments: 9328 Intel | |||
| Intended status: Standards Track S. Wenger | Category: Standards Track S. Wenger | |||
| Expires: 2 February 2023 Tencent | ISSN: 2070-1721 Tencent | |||
| Y. Sanchez | Y. Sanchez | |||
| Fraunhofer HHI | Fraunhofer HHI | |||
| Y.-K. Wang | Y.-K. Wang | |||
| Bytedance Inc. | Bytedance Inc. | |||
| M. M Hannuksela | M. M Hannuksela | |||
| Nokia Technologies | Nokia Technologies | |||
| 1 August 2022 | December 2022 | |||
| RTP Payload Format for Versatile Video Coding (VVC) | RTP Payload Format for Versatile Video Coding (VVC) | |||
| draft-ietf-avtcore-rtp-vvc-18 | ||||
| Abstract | Abstract | |||
| This memo describes an RTP payload format for the video coding | This memo describes an RTP payload format for the Versatile Video | |||
| standard ITU-T Recommendation H.266 and ISO/IEC International | Coding (VVC) specification, which was published as both ITU-T | |||
| Standard 23090-3, both also known as Versatile Video Coding (VVC) and | Recommendation H.266 and ISO/IEC International Standard 23090-3. VVC | |||
| developed by the Joint Video Experts Team (JVET). The RTP payload | was developed by the Joint Video Experts Team (JVET). The RTP | |||
| format allows for packetization of one or more Network Abstraction | payload format allows for packetization of one or more Network | |||
| Layer (NAL) units in each RTP packet payload as well as fragmentation | Abstraction Layer (NAL) units in each RTP packet payload, as well as | |||
| of a NAL unit into multiple RTP packets. The payload format has wide | fragmentation of a NAL unit into multiple RTP packets. The payload | |||
| applicability in videoconferencing, Internet video streaming, and | format has wide applicability in videoconferencing, Internet video | |||
| high-bitrate entertainment-quality video, among other applications. | streaming, and high-bitrate entertainment-quality video, among other | |||
| applications. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
| provisions of BCP 78 and BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| Internet Standards is available in Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 2 February 2023. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9328. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Overview of the VVC Codec | 1.1. Overview of the VVC Codec | |||
| 1.1.1. Coding-Tool Features (informative) | 1.1.1. Coding-Tool Features (Informative) | |||
| 1.1.2. Systems and Transport Interfaces (informative) | 1.1.2. Systems and Transport Interfaces (Informative) | |||
| 1.1.3. High-Level Picture Partitioning (informative) | 1.1.3. High-Level Picture Partitioning (Informative) | |||
| 1.1.4. NAL Unit Header | 1.1.4. NAL Unit Header | |||
| 1.2. Overview of the Payload Format | 1.2. Overview of the Payload Format | |||
| 2. Conventions | 2. Conventions | |||
| 3. Definitions and Abbreviations | 3. Definitions and Abbreviations | |||
| 3.1. Definitions | 3.1. Definitions | |||
| 3.1.1. Definitions from the VVC Specification | 3.1.1. Definitions from the VVC Specification | |||
| 3.1.2. Definitions Specific to This Memo | 3.1.2. Definitions Specific to This Memo | |||
| 3.2. Abbreviations | 3.2. Abbreviations | |||
| 4. RTP Payload Format | 4. RTP Payload Format | |||
| 4.1. RTP Header Usage | 4.1. RTP Header Usage | |||
| 4.2. Payload Header Usage | 4.2. Payload Header Usage | |||
| 4.3. Payload Structures | 4.3. Payload Structures | |||
| 4.3.1. Single NAL Unit Packets | 4.3.1. Single NAL Unit Packets | |||
| 4.3.2. Aggregation Packets (APs) | 4.3.2. Aggregation Packets (APs) | |||
| 4.3.3. Fragmentation Units | 4.3.3. Fragmentation Units | |||
| 4.4. Decoding Order Number | 4.4. Decoding Order Number | |||
| 5. Packetization Rules | 5. Packetization Rules | |||
| 6. De-packetization Process | 6. De-packetization Process | |||
| 7. Payload Format Parameters | 7. Payload Format Parameters | |||
| 7.1. Media Type Registration | 7.1. Media Type Registration | |||
| 7.2. Optional Parameters Definition | 7.2. Optional Parameters Definition | |||
| 7.3. SDP Parameters | 7.3. SDP Parameters | |||
| 7.3.1. Mapping of Payload Type Parameters to SDP | 7.3.1. Mapping of Payload Type Parameters to SDP | |||
| 7.3.2. Usage with SDP Offer/Answer Model | 7.3.2. Usage with SDP Offer/Answer Model | |||
| 7.3.3. Multicast | 7.3.3. Multicast | |||
| 7.3.4. Usage in Declarative Session Descriptions | 7.3.4. Usage in Declarative Session Descriptions | |||
| 7.3.5. Considerations for Parameter Sets | 7.3.5. Considerations for Parameter Sets | |||
| 8. Use with Feedback Messages | ||||
| 8. Use with Feedback Messages | 8.1. Picture Loss Indication (PLI) | |||
| 8.1. Picture Loss Indication (PLI) | 8.2. Full Intra Request (FIR) | |||
| 8.2. Full Intra Request (FIR) | 9. Security Considerations | |||
| 9. Security Considerations | 10. Congestion Control | |||
| 10. Congestion Control | 11. IANA Considerations | |||
| 11. IANA Considerations | 12. References | |||
| 12. Acknowledgements | 12.1. Normative References | |||
| 13. References | 12.2. Informative References | |||
| 13.1. Normative References | Acknowledgements | |||
| 13.2. Informative References | Authors' Addresses | |||
| Appendix A. Change History | ||||
| Authors' Addresses | ||||
| 1. Introduction | 1. Introduction | |||
| The Versatile Video Coding specification was formally published as | The Versatile Video Coding specification was formally published as | |||
| both ITU-T Recommendation H.266 [VVC] and ISO/IEC International | both ITU-T Recommendation H.266 [VVC] and ISO/IEC International | |||
| Standard 23090-3 [ISO23090-3]. VVC is reported to provide | Standard 23090-3 [ISO23090-3]. VVC is reported to provide | |||
| significant coding efficiency gains over High Efficiency Video Coding | significant coding efficiency gains over High Efficiency Video Coding | |||
| [HEVC], also known as H.265, and other earlier video codecs. | [HEVC], also known as H.265, and other earlier video codecs. | |||
| This memo specifies an RTP payload format for VVC. It shares its | This memo specifies an RTP payload format for VVC. It shares its | |||
| basic design with the NAL (Network Abstraction Layer) unit based RTP | basic design with the NAL-unit-based RTP payload formats of Advanced | |||
| payload formats of AVC Video Coding [RFC6184], Scalable Video Coding | Video Coding (AVC) [RFC6184], Scalable Video Coding (SVC) [RFC6190], | |||
| (SVC) [RFC6190], High Efficiency Video Coding (HEVC) [RFC7798] and | and High Efficiency Video Coding (HEVC) [RFC7798], as well as their | |||
| their respective predecessors. With respect to design philosophy, | respective predecessors. With respect to design philosophy, | |||
| security, congestion control, and overall implementation complexity, | security, congestion control, and overall implementation complexity, | |||
| it has similar properties to those earlier payload format | it has similar properties to those earlier payload format | |||
| specifications. This is a conscious choice, as at least RFC 6184 is | specifications. This is a conscious choice, as at least [RFC6184] is | |||
| widely deployed and generally known in the relevant implementer | widely deployed and generally known in the relevant implementer | |||
| communities. Certain scalability-related mechanisms known from | communities. Certain scalability-related mechanisms known from | |||
| [RFC6190] were incorporated into this document, as VVC version 1 | [RFC6190] were incorporated into this document, as VVC version 1 | |||
| supports temporal, spatial, and signal-to-noise ratio (SNR) | supports temporal, spatial, and signal-to-noise ratio (SNR) | |||
| scalability. | scalability. | |||
| 1.1. Overview of the VVC Codec | 1.1. Overview of the VVC Codec | |||
| VVC and HEVC share a similar hybrid video codec design. In this | VVC and HEVC share a similar hybrid video codec design. In this | |||
| memo, we provide a very brief overview of those features of VVC that | memo, we provide a very brief overview of those features of VVC that | |||
| are, in some form, addressed by the payload format specified herein. | are, in some form, addressed by the payload format specified herein. | |||
| Implementers have to read, understand, and apply the ITU-T/ISO/IEC | Implementers have to read, understand, and apply the ITU-T/ISO/IEC | |||
| specifications pertaining to VVC to arrive at interoperable, well- | specifications pertaining to VVC to arrive at interoperable, well- | |||
| performing implementations. | performing implementations. | |||
| Conceptually, both VVC and HEVC include a Video Coding Layer (VCL), | Conceptually, both VVC and HEVC include a Video Coding Layer (VCL), | |||
| which is often used to refer to the coding-tool features, and a NAL, | which is often used to refer to the coding-tool features, and a NAL, | |||
| which is often used to refer to the systems and transport interface | which is often used to refer to the systems and transport interface | |||
| aspects of the codecs. | aspects of the codecs. | |||
| 1.1.1. Coding-Tool Features (informative) | 1.1.1. Coding-Tool Features (Informative) | |||
| Coding tool features are described below with occasional reference to | Coding-tool features are described below with occasional reference to | |||
| the coding tool set of HEVC, which is well known in the community. | the coding-tool set of HEVC, which is well known in the community. | |||
| Similar to earlier hybrid-video-coding-based standards, including | Similar to earlier hybrid-video-coding-based standards, including | |||
| HEVC, the following basic video coding design is employed by VVC. A | HEVC, the following basic video coding design is employed by VVC. A | |||
| prediction signal is first formed by either intra- or motion- | prediction signal is first formed by either intra- or motion- | |||
| compensated prediction, and the residual (the difference between the | compensated prediction, and the residual (the difference between the | |||
| original and the prediction) is then coded. The gains in coding | original and the prediction) is then coded. The gains in coding | |||
| efficiency are achieved by redesigning and improving almost all parts | efficiency are achieved by redesigning and improving almost all parts | |||
| of the codec over earlier designs. In addition, VVC includes several | of the codec over earlier designs. In addition, VVC includes several | |||
| tools to make the implementation on parallel architectures easier. | tools to make the implementation on parallel architectures easier. | |||
| Finally, VVC includes temporal, spatial, and SNR scalability as well | Finally, VVC includes temporal, spatial, and SNR scalability, as well | |||
| as multiview coding support. | as multiview coding support. | |||
| Coding blocks and transform structure | Coding blocks and transform structure | |||
| Among major coding-tool differences between HEVC and VVC, one of | ||||
| Among major coding-tool differences between HEVC and VVC, one of the | the important improvements is the more flexible coding tree | |||
| important improvements is the more flexible coding tree structure in | structure in VVC, i.e., multi-type tree. In addition to quadtree, | |||
| VVC, i.e., multi-type tree. In addition to quadtree, binary and | binary and ternary trees are also supported, which contributes | |||
| ternary trees are also supported, which contributes significant | significant improvement in coding efficiency. Moreover, the | |||
| improvement in coding efficiency. Moreover, the maximum size of a | maximum size of a coding tree unit (CTU) is increased from 64x64 | |||
| coding tree unit (CTU) is increased from 64x64 to 128x128. To | to 128x128. To improve the coding efficiency of chroma signal, | |||
| improve the coding efficiency of chroma signal, luma chroma separated | luma-chroma-separated trees at CTU level may be employed for intra | |||
| trees at CTU level may be employed for intra-slices. The square | slices. The square transforms in HEVC are extended to non-square | |||
| transforms in HEVC are extended to non-square transforms for | transforms for rectangular blocks resulting from binary and | |||
| rectangular blocks resulting from binary and ternary tree splits. | ternary tree splits. Besides, VVC supports multiple transform | |||
| Besides, VVC supports multiple transform sets (MTS), including DCT-2, | sets (MTSs), including DCT-2, DST-7, and DCT-8, as well as the | |||
| DST-7, and DCT-8 as well as the non-separable secondary transform. | non-separable secondary transform. The transforms used in VVC can | |||
| The transforms used in VVC can have different sizes with support for | have different sizes with support for larger transform sizes. For | |||
| larger transform sizes. For DCT-2, the transform sizes range from | DCT-2, the transform sizes range from 2x2 to 64x64, and for DST-7 | |||
| 2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from | and DCT-8, the transform sizes range from 4x4 to 32x32. In | |||
| 4x4 to 32x32. In addition, VVC also supports sub-block transform for | addition, VVC also supports sub-block transform for both intra- and | |||
| both intra and inter coded blocks. For intra coded blocks, intra | inter-coded blocks. For intra-coded blocks, intra sub- | |||
| sub-partitioning (ISP) may be used to allow sub-block based intra | partitioning (ISP) may be used to allow sub-block-based intra | |||
| prediction and transform. For inter blocks, sub-block transform may | prediction and transform. For inter blocks, sub-block transform | |||
| be used assuming that only a part of an inter-block has non-zero | may be used assuming that only a part of an inter block has non- | |||
| transform coefficients. | zero transform coefficients. | |||
| Entropy coding | Entropy coding | |||
| Similar to HEVC, VVC uses a single entropy-coding engine, which is | ||||
| Similar to HEVC, VVC uses a single entropy-coding engine, which is | based on context adaptive binary arithmetic coding [CABAC] but | |||
| based on context adaptive binary arithmetic coding [CABAC], but with | with the support of multi-window sizes. The window sizes can be | |||
| the support of multi-window sizes. The window sizes can be | initialized differently for different context models. Due to such | |||
| initialized differently for different context models. Due to such a | a design, it has more efficient adaptation speed and better coding | |||
| design, it has more efficient adaptation speed and better coding | efficiency. A joint chroma residual coding scheme is applied to | |||
| efficiency. A joint chroma residual coding scheme is applied to | further exploit the correlation between the residuals of two color | |||
| further exploit the correlation between the residuals of two color | components. In VVC, different residual coding schemes are applied | |||
| components. In VVC, different residual coding schemes are applied | for regular transform coefficients and residual samples generated | |||
| for regular transform coefficients and residual samples generated | using transform-skip mode. | |||
| using transform-skip mode. | ||||
| In-loop filtering | In-loop filtering | |||
| VVC has more feature support in loop filters than HEVC. The | ||||
| VVC has more feature support in loop filters than HEVC. The | deblocking filter in VVC is similar to HEVC but operates at a | |||
| deblocking filter in VVC is similar to HEVC but operates at a smaller | smaller grid. After deblocking and sample adaptive offset (SAO), | |||
| grid. After deblocking and sample adaptive offset (SAO), an adaptive | an adaptive loop filter (ALF) may be used. As a Wiener filter, | |||
| loop filter (ALF) may be used. As a Wiener filter, ALF reduces | ALF reduces distortion of decoded pictures. Besides, VVC | |||
| distortion of decoded pictures. Besides, VVC introduces a new module | introduces a new module called luma mapping with chroma scaling to | |||
| called luma mapping with chroma scaling to fully utilize the dynamic | fully utilize the dynamic range of signal so that rate-distortion | |||
| range of signal so that rate-distortion performance of both Standard | performance of both Standard Dynamic Range (SDR) and High Dynamic | |||
| Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved. | Range (HDR) content is improved. | |||
| Motion prediction and coding | Motion prediction and coding | |||
| Compared to HEVC, VVC introduces several improvements in this | ||||
| area. First, there is the adaptive motion vector resolution | ||||
| (AMVR), which can save bit cost for motion vectors by adaptively | ||||
| signaling motion vector resolution. Then, the affine motion | ||||
| compensation is included to capture complicated motion-like | ||||
| zooming and rotation. Meanwhile, prediction refinement with the | ||||
| optical flow (PROF) with affine mode is further deployed to mimic | ||||
| affine motion at the pixel level. Thirdly, the decoder-side | ||||
| motion vector refinement (DMVR) is a method to derive the motion | ||||
| vector at the decoder side based on block matching so that fewer | ||||
| bits may be spent on motion vectors. Bidirectional optical flow | ||||
| (BDOF) is a similar method to PROF. BDOF adds a sample-wise | ||||
| offset at the 4x4 sub-block level that is derived with equations | ||||
| based on gradients of the prediction samples and a motion | ||||
| difference relative to coding-unit (CU) motion vectors. | ||||
| Furthermore, merge with motion vector difference (MMVD) is a | ||||
| special mode that further signals a limited set of motion vector | ||||
| differences on top of merge mode. In addition to MMVD, there are | ||||
| another three types of special merge modes, i.e., sub-block merge, | ||||
| triangle, and combined intra/inter prediction (CIIP). The sub- | ||||
| block merge list includes one candidate of sub-block temporal | ||||
| motion vector prediction (SbTMVP) and up to four candidates of | ||||
| affine motion vectors. Triangle is based on triangular block | ||||
| motion compensation. CIIP combines intra and inter predictions | ||||
| with weighting. Adaptive weighting may be employed with a block- | ||||
| level tool called bi-prediction with CU-based weighting (BCW), | ||||
| which provides more flexibility than in HEVC. | ||||
| Compared to HEVC, VVC introduces several improvements in this area. | Intra prediction and intra coding | |||
| First, there is the adaptive motion vector resolution (AMVR), which | To capture the diversified local image texture directions with | |||
| can save bit cost for motion vectors by adaptively signaling motion | finer granularity, VVC supports 65 angular directions instead of | |||
| vector resolution. Then the affine motion compensation is included | 33 directions in HEVC. The intra mode coding is based on a 6- | |||
| to capture complicated motion like zooming and rotation. Meanwhile, | most-probable-modes scheme, and the 6 most probable modes are | |||
| prediction refinement with the optical flow with affine mode (PROF) | derived using the neighboring intra prediction directions. In | |||
| is further deployed to mimic affine motion at the pixel level. | addition, to deal with the different distributions of intra | |||
| Thirdly the decoder side motion vector refinement (DMVR) is a method | prediction angles for different block aspect ratios, a wide-angle- | |||
| to derive MV vector at decoder side based on block matching so that | intra-prediction (WAIP) scheme is applied in VVC by including | |||
| fewer bits may be spent on motion vectors. Bi-directional optical | intra prediction angles beyond those present in HEVC. Unlike | |||
| flow (BDOF) is a similar method to PROF. BDOF adds a sample wise | HEVC, which only allows using the most adjacent line of reference | |||
| offset at 4x4 sub-block level that is derived with equations based on | samples for intra prediction, VVC also allows using two further | |||
| gradients of the prediction samples and a motion difference relative | reference lines, known as multi-reference-line (MRL) intra | |||
| to CU motion vectors. Furthermore, merge with motion vector | prediction. The additional reference lines can be only used for | |||
| difference (MMVD) is a special mode, which further signals a limited | the 6 most probable intra prediction modes. To capture the strong | |||
| set of motion vector differences on top of merge mode. In addition | correlation between different color components, in VVC, a cross- | |||
| to MMVD, there are another three types of special merge modes, i.e., | component linear mode (CCLM) is utilized, which assumes a linear | |||
| sub-block merge, triangle, and combined intra-/inter-prediction | relationship between the luma sample values and their associated | |||
| (CIIP). Sub-block merge list includes one candidate of sub-block | chroma samples. For intra prediction, VVC also applies a | |||
| temporal motion vector prediction (SbTMVP) and up to four candidates | position-dependent prediction combination (PDPC) for refining the | |||
| of affine motion vectors. Triangle is based on triangular block | prediction samples closer to the intra prediction block boundary. | |||
| motion compensation. CIIP combines intra- and inter- predictions | Matrix-based intra prediction (MIP) modes are also used in VVC, | |||
| with weighting. Adaptive weighting may be employed with a block- | which generates an up to 8x8 intra prediction block using a | |||
| level tool called bi-prediction with CU based weighting (BCW) which | weighted sum of downsampled neighboring reference samples, and the | |||
| provides more flexibility than in HEVC. | weights are hard-coded constants. | |||
| Intra prediction and intra-coding | ||||
| To capture the diversified local image texture directions with finer | ||||
| granularity, VVC supports 65 angular directions instead of 33 | ||||
| directions in HEVC. The intra mode coding is based on a 6-most- | ||||
| probable-mode scheme, and the 6 most probable modes are derived using | ||||
| the neighboring intra prediction directions. In addition, to deal | ||||
| with the different distributions of intra prediction angles for | ||||
| different block aspect ratios, a wide-angle intra prediction (WAIP) | ||||
| scheme is applied in VVC by including intra prediction angles beyond | ||||
| those present in HEVC. Unlike HEVC which only allows using the most | ||||
| adjacent line of reference samples for intra prediction, VVC also | ||||
| allows using two further reference lines, as known as multi- | ||||
| reference-line (MRL) intra prediction. The additional reference | ||||
| lines can be only used for the 6 most probable intra prediction | ||||
| modes. To capture the strong correlation between different colour | ||||
| components, in VVC, a cross-component linear mode (CCLM) is utilized | ||||
| which assumes a linear relationship between the luma sample values | ||||
| and their associated chroma samples. For intra prediction, VVC also | ||||
| applies a position-dependent prediction combination (PDPC) for | ||||
| refining the prediction samples closer to the intra prediction block | ||||
| boundary. Matrix-based intra prediction (MIP) modes are also used in | ||||
| VVC which generates an up to 8x8 intra prediction block using a | ||||
| weighted sum of downsampled neighboring reference samples, and the | ||||
| weights are hardcoded constants. | ||||
| Other coding-tool features | Other coding-tool features | |||
| VVC introduces dependent quantization (DQ) to reduce quantization | ||||
| error by state-based switching between two quantizers. | ||||
| VVC introduces dependent quantization (DQ) to reduce quantization | 1.1.2. Systems and Transport Interfaces (Informative) | |||
| error by state-based switching between two quantizers. | ||||
| 1.1.2. Systems and Transport Interfaces (informative) | ||||
| VVC inherits the basic systems and transport interfaces designs from | VVC inherits the basic systems and transport interface designs from | |||
| HEVC and AVC. These include the NAL-unit-based syntax structure, the | HEVC and AVC. These include the NAL-unit-based syntax structure, the | |||
| hierarchical syntax and data unit structure, the supplemental | hierarchical syntax and data unit structure, the supplemental | |||
| enhancement information (SEI) message mechanism, and the video | enhancement information (SEI) message mechanism, and the video | |||
| buffering model based on the hypothetical reference decoder (HRD). | buffering model based on the hypothetical reference decoder (HRD). | |||
| The scalability features of VVC are conceptually similar to the | The scalability features of VVC are conceptually similar to the | |||
| scalable variant of HEVC known as SHVC. The hierarchical syntax and | scalable extension of HEVC, known as SHVC. The hierarchical syntax | |||
| data unit structure consists of parameter sets at various levels | and data unit structure consists of parameter sets at various levels | |||
| (decoder, sequence (pertaining to all), sequence (pertaining to a | (i.e., decoder, sequence (pertaining to all), sequence (pertaining to | |||
| single), picture), picture-level header parameters, slice-level | a single), and picture), picture-level header parameters, slice-level | |||
| header parameters, and lower-level parameters. | header parameters, and lower-level parameters. | |||
| A number of key components that influenced the network abstraction | A number of key components that influenced the network abstraction | |||
| layer design of VVC as well as this memo are described below. | layer design of VVC, as well as this memo, are described below. | |||
| Decoding capability information | Decoding capability information | |||
| The decoding capability information includes parameters that stay | The decoding capability information (DCI) includes parameters that | |||
| constant for the lifetime of a VVC bitstream in the duration of a | stay constant for the lifetime of a VVC bitstream in the duration | |||
| video conference, continuous video stream, and similar--any video | of a video conference, continuous video stream, and similar, i.e., | |||
| that is processed by a decoder between setup and teardown. For | any video that is processed by a decoder between setup and | |||
| streaming, the requirement of constant parameters pertains through | teardown. For streaming, the requirement of constant parameters | |||
| splicing. Such information includes profile, level, and sub-profile | pertains through splicing. Such information includes profile, | |||
| information to determine a maximum capability interop point that is | level, and sub-profile information to determine a maximum | |||
| guaranteed to be never exceeded, even if splicing of video sequences | capability interop point that is guaranteed to never be exceeded, | |||
| occurs within a session. It further includes constraint fields (most | even if splicing of video sequences occurs within a session. It | |||
| of which are flags), which can optionally be set to indicate that the | further includes constraint fields (most of which are flags), | |||
| video bitstream will be constrained in the use of certain features as | which can optionally be set to indicate that the video bitstream | |||
| indicated by the values of those fields. With this, a bitstream can | will be constrained in the use of certain features, as indicated | |||
| be labeled as not using certain tools, which allows among other | by the values of those fields. With this, a bitstream can be | |||
| things for resource allocation in a decoder implementation. | labeled as not using certain tools, which allows, among other | |||
| things, for resource allocation in a decoder implementation. | ||||
| Video parameter set | Video parameter set | |||
| The video parameter set (VPS) pertains to one or more coded video | ||||
| The video parameter set (VPS) pertains to one or more coded video | sequences (CVSs) of multiple layers covering the same range of | |||
| sequences (CVSs) of multiple layers covering the same range of access | access units and includes, among other information, decoding | |||
| units, and includes, among other information, decoding dependency | dependency expressed as information for reference-picture-list | |||
| expressed as information for reference picture list construction of | construction of enhancement layers. The VPS provides a "big | |||
| enhancement layers. The VPS provides a "big picture" of a scalable | picture" of a scalable sequence, including what types of operation | |||
| sequence, including what types of operation points are provided, the | points are provided; the profile, tier, and level of the operation | |||
| profile, tier, and level of the operation points, and some other | points; and some other high-level properties of the bitstream that | |||
| high-level properties of the bitstream that can be used as the basis | can be used as the basis for session negotiation and content | |||
| for session negotiation and content selection, etc. One VPS may be | selection, etc. One VPS may be referenced by one or more sequence | |||
| referenced by one or more sequence parameter sets. | parameter sets. | |||
| Sequence parameter set | Sequence parameter set | |||
| The sequence parameter set (SPS) contains syntax elements | ||||
| The sequence parameter set (SPS) contains syntax elements pertaining | pertaining to a coded layer video sequence (CLVS), which is a | |||
| to a coded layer video sequence (CLVS), which is a group of pictures | group of pictures belonging to the same layer, starting with a | |||
| belonging to the same layer, starting with a random access point, and | random access point, and followed by pictures that may depend on | |||
| followed by pictures that may depend on each other, until the next | each other until the next random access point picture. In MPEG-2, | |||
| random access point picture. In MPEG-2, the equivalent of a CVS was | the equivalent of a CVS was a group of pictures (GOP), which | |||
| a group of pictures (GOP), which normally started with an I frame and | normally started with an I frame and was followed by P and B | |||
| was followed by P and B frames. While more complex in its options of | frames. While more complex in its options of random access | |||
| random access points, VVC retains this basic concept. One remarkable | points, VVC retains this basic concept. One remarkable difference | |||
| difference of VVC is that a CLVS may start with a Gradual Decoding | of VVC is that a CLVS may start with a Gradual Decoding Refresh | |||
| Refresh (GDR) picture, without requiring presence of traditional | (GDR) picture without requiring presence of traditional random | |||
| random access points in the bitstream, such as instantaneous decoding | access points in the bitstream, such as instantaneous decoding | |||
| refresh (IDR) or clean random access (CRA) pictures. In many TV-like | refresh (IDR) or clean random access (CRA) pictures. In many TV- | |||
| applications, a CVS contains a few hundred milliseconds to a few | like applications, a CVS contains a few hundred milliseconds to a | |||
| seconds of video. In video conferencing (without switching MCUs | few seconds of video. In video conferencing (without switching | |||
| involved), a CVS can be as long in duration as the whole session. | Multipoint Control Units (MCUs) involved), a CVS can be as long in | |||
| duration as the whole session. | ||||
| Picture and adaptation parameter set | Picture and adaptation parameter set | |||
| The picture parameter set and the adaptation parameter set (PPS and | The picture parameter set (PPS) and the adaptation parameter set | |||
| APS, respectively) carry information pertaining to zero or more | (APS) carry information pertaining to zero or more pictures and | |||
| pictures and zero or more slices, respectively. The PPS contains | zero or more slices, respectively. The PPS contains information | |||
| information that is likely to stay constant from picture to picture, | that is likely to stay constant from picture to picture, at least | |||
| at least for pictures for a certain type-whereas the APS contains | for pictures for a certain type, whereas the APS contains | |||
| information, such as adaptive loop filter coefficients, that are | information, such as adaptive loop filter coefficients, that are | |||
| likely to change from picture to picture or even within a picture. A | likely to change from picture to picture or even within a picture. | |||
| single APS is referenced by all slices of the same picture if that | A single APS is referenced by all slices of the same picture if | |||
| APS contains information about luma mapping with chroma scaling | that APS contains information about luma mapping with chroma | |||
| (LMCS) or scaling list. Different APSs containing ALF parameters can | scaling (LMCS) or a scaling list. Different APSs containing ALF | |||
| be referenced by slices of the same picture. | parameters can be referenced by slices of the same picture. | |||
| Picture header | Picture header | |||
| A picture header (PH) contains information that is common to all | ||||
| A Picture Header contains information that is common to all slices | slices that belong to the same picture. Being able to send that | |||
| that belong to the same picture. Being able to send that information | information as a separate NAL unit when pictures are split into | |||
| as a separate NAL unit when pictures are split into several slices | several slices allows for saving bitrate, compared to repeating | |||
| allows for saving bitrate, compared to repeating the same information | the same information in all slices. However, there might be | |||
| in all slices. However, there might be scenarios where low-bitrate | scenarios where low-bitrate video is transmitted using a single | |||
| video is transmitted using a single slice per picture. Having a | slice per picture. Having a separate NAL unit to convey that | |||
| separate NAL unit to convey that information incurs an overhead | information incurs an overhead for such scenarios. For such | |||
| for such scenarios. For such scenarios, the picture header syntax | scenarios, the picture header syntax structure is directly | |||
| structure is directly included in the slice header, instead of its | included in the slice header, instead of its own NAL unit. The | |||
| own NAL unit. The mode of the picture header syntax structure being | mode of the picture header syntax structure being included in its | |||
| included in its own NAL unit or not can only be switched on/off for | own NAL unit or not can only be switched on/off for an entire CLVS | |||
| an entire CLVS, and can only be switched off when in the entire CLVS | and can only be switched off when, in the entire CLVS, each | |||
| each picture contains only one slice. | picture contains only one slice. | |||
| Profile, tier, and level | Profile, tier, and level | |||
| The profile, tier, and level syntax structures in DCI, VPS, and | ||||
| The profile, tier and level syntax structures in DCI, VPS and SPS | SPS contain profile, tier, and level information for all layers | |||
| contain profile, tier, level information for all layers that refer to | that refer to the DCI, for layers associated with one or more | |||
| the DCI, for layers associated with one or more output layer sets | output layer sets specified by the VPS, and for any layer that | |||
| specified by the VPS, and for any layer that refers to the SPS, | refers to the SPS, respectively. | |||
| respectively. | ||||
| Sub-profiles | Sub-profiles | |||
| Within the VVC specification, a sub-profile is a 32-bit number, | ||||
| Within the VVC specification, a sub-profile is a 32-bit number, coded | coded according to ITU-T Recommendation T.35, that does not carry | |||
| according to ITU-T Rec. T.35, that does not carry a semantics. It is | semantics. It is carried in the profile_tier_level structure and | |||
| carried in the profile_tier_level structure and hence (potentially) | hence is (potentially) present in the DCI, VPS, and SPS. External | |||
| present in the DCI, VPS, and SPS. External registration bodies can | registration bodies can register a T.35 codepoint with ITU-T | |||
| register a T.35 codepoint with ITU-T registration authorities and | registration authorities and associate with their registration a | |||
| associate with their registration a description of bitstream | description of bitstream restrictions beyond the profiles defined | |||
| restrictions beyond the profiles defined by ITU-T and ISO/IEC. This | by ITU-T and ISO/IEC. This would allow encoder manufacturers to | |||
| would allow encoder manufacturers to label the bitstreams generated | label the bitstreams generated by their encoder as complying with | |||
| by their encoder as complying with such a sub-profile. It is expected | such a sub-profile. It is expected that upstream standardization | |||
| that upstream standardization organizations (such as: DVB and ATSC), | organizations (such as Digital Video Broadcasting (DVB) and | |||
| as well as walled-garden video services will take advantage of this | Advanced Television Systems Committee (ATSC)), as well as walled- | |||
| labeling system. In contrast to "normal" profiles, it is expected | garden video services, will take advantage of this labeling system. | |||
| that sub-profiles may indicate encoder choices traditionally left | In contrast to "normal" profiles, it is expected that sub-profiles | |||
| open in the (decoder-centric) video coding specs, such as GOP | may indicate encoder choices traditionally left open in the | |||
| structures, minimum/maximum QP values, and the mandatory use of | (decoder-centric) video coding specifications, such as GOP | |||
| certain tools or SEI messages. | structures, minimum/maximum Quantizer Parameter (QP) values, and | |||
| the mandatory use of certain tools or SEI messages. | ||||
| General constraint fields | General constraint fields | |||
| The profile_tier_level structure carries a considerable number of | ||||
| The profile_tier_level structure carries a considerable number of | constraint fields (most of which are flags), which an encoder can | |||
| constraint fields (most of which are flags), which an encoder can use | use to indicate to a decoder that it will not use a certain tool | |||
| to indicate to a decoder that it will not use a certain tool or | or technology. They were included in reaction to a perceived | |||
| technology. They were included in reaction to a perceived market | market need to label a bitstream as not exercising a certain tool | |||
| need for labeled a bitstream as not exercising a certain tool that | that has become commercially unviable. | |||
| has become commercially unviable. | ||||
| Temporal scalability support | Temporal scalability support | |||
| VVC includes support of temporal scalability, by the inclusion of | ||||
| VVC includes support of temporal scalability, by inclusion of the | the signaling of TemporalId in the NAL unit header, the | |||
| signaling of TemporalId in the NAL unit header, the restriction that | restriction that pictures of a particular temporal sublayer cannot | |||
| pictures of a particular temporal sublayer cannot be used for inter | be used for inter prediction reference by pictures of a lower | |||
| prediction reference by pictures of a lower temporal sublayer, the | temporal sublayer, the sub-bitstream extraction process, and the | |||
| sub-bitstream extraction process, and the requirement that each sub- | requirement that each sub-bitstream extraction output be a | |||
| bitstream extraction output be a conforming bitstream. Media-Aware | conforming bitstream. Media-Aware Network Elements (MANEs) can | |||
| Network Elements (MANEs) can utilize the TemporalId in the NAL unit | utilize the TemporalId in the NAL unit header for stream | |||
| header for stream adaptation purposes based on temporal scalability. | adaptation purposes based on temporal scalability. | |||
| Reference picture resampling (RPR) | Reference picture resampling (RPR) | |||
| In AVC and HEVC, the spatial resolution of pictures cannot change | ||||
| In AVC and HEVC, the spatial resolution of pictures cannot change | unless a new sequence using a new SPS starts, with an intra random | |||
| unless a new sequence using a new SPS starts, with an Intra random | access point (IRAP) picture. VVC enables picture resolution | |||
| access point (IRAP) picture. VVC enables picture resolution change | change within a sequence at a position without encoding an IRAP | |||
| within a sequence at a position without encoding an IRAP picture, | picture, which is always intra coded. This feature is sometimes | |||
| which is always intra-coded. This feature is sometimes referred to | referred to as reference picture resampling (RPR), as the feature | |||
| as reference picture resampling (RPR), as the feature needs | needs resampling of a reference picture used for inter prediction | |||
| resampling of a reference picture used for inter prediction when that | when that reference picture has a different resolution than the | |||
| reference picture has a different resolution than the current picture | current picture being decoded. RPR allows resolution change | |||
| being decoded. RPR allows resolution change without the need of | without the need of coding an IRAP picture and hence avoids a | |||
| coding an IRAP picture and hence avoids a momentary bit rate spike | momentary bit rate spike caused by an IRAP picture in streaming or | |||
| caused by an IRAP picture in streaming or video conferencing | video conferencing scenarios, e.g., to cope with network condition | |||
| scenarios, e.g., to cope with network condition changes. RPR can | changes. RPR can also be used in application scenarios wherein | |||
| also be used in application scenarios wherein zooming of the entire | zooming of the entire video region or some region of interest is | |||
| video region or some region of interest is needed. | needed. | |||
| Spatial, SNR, and multiview scalability | Spatial, SNR, and multiview scalability | |||
| VVC includes support for spatial, SNR, and multiview scalability. | VVC includes support for spatial, SNR, and multiview scalability. | |||
| Scalable video coding is widely considered to have technical benefits | Scalable video coding is widely considered to have technical | |||
| and enrich services for various video applications. Until recently, | benefits and enrich services for various video applications. | |||
| however, the functionality had not been included in the first version | Until recently, however, the functionality had not been included | |||
| of specifications of the video codecs. In VVC, however, all those | in the first version of specifications of the video codecs. In | |||
| forms of scalability are supported in the first version of VVC | VVC, however, all those forms of scalability are supported in the | |||
| natively through the signaling of the nuh_layer_id in the NAL unit | first version of VVC natively through the signaling of the | |||
| header, the VPS which associates layers with given nuh_layer_id to | nuh_layer_id in the NAL unit header, the VPS that associates | |||
| each other, reference picture selection, reference picture resampling | layers with the given nuh_layer_id to each other, reference | |||
| for spatial scalability, and a number of other mechanisms not | picture selection, reference picture resampling for spatial | |||
| relevant for this memo. | scalability, and a number of other mechanisms not relevant for | |||
| this memo. | ||||
| Spatial scalability | Spatial scalability | |||
| With the existence of reference picture resampling (RPR), the | ||||
| With the existence of Reference Picture Resampling (RPR), the | ||||
| additional burden for scalability support is just a | additional burden for scalability support is just a | |||
| modification of the high-level syntax (HLS). The inter-layer | modification of the high-level syntax (HLS). The inter-layer | |||
| prediction is employed in a scalable system to improve the | prediction is employed in a scalable system to improve the | |||
| coding efficiency of the enhancement layers. In addition to | coding efficiency of the enhancement layers. In addition to | |||
| the spatial and temporal motion-compensated predictions that | the spatial and temporal motion-compensated predictions that | |||
| are available in a single-layer codec, the inter-layer | are available in a single-layer codec, the inter-layer | |||
| prediction in VVC uses the possibly resampled video data of the | prediction in VVC uses the possibly resampled video data of the | |||
| reconstructed reference picture from a reference layer to | reconstructed reference picture from a reference layer to | |||
| predict the current enhancement layer. The resampling process | predict the current enhancement layer. The resampling process | |||
| for inter-layer prediction, when used, is performed at the | for inter-layer prediction, when used, is performed at the | |||
| block-level, reusing the existing interpolation process for | block level, reusing the existing interpolation process for | |||
| motion compensation in single-layer coding. It means that no | motion compensation in single-layer coding. It means that no | |||
| additional resampling process is needed to support spatial | additional resampling process is needed to support spatial | |||
| scalability. | scalability. | |||
| SNR scalability | SNR scalability | |||
| SNR scalability is similar to spatial scalability except that | ||||
| SNR scalability is similar to spatial scalability except that | ||||
| the resampling factors are 1:1. In other words, there is no | the resampling factors are 1:1. In other words, there is no | |||
| change in resolution, but there is inter-layer prediction. | change in resolution, but there is inter-layer prediction. | |||
| Multiview scalability | Multiview scalability | |||
| The first version of VVC also supports multiview scalability, | ||||
| The first version of VVC also supports multiview scalability, | ||||
| wherein a multi-layer bitstream carries layers representing | wherein a multi-layer bitstream carries layers representing | |||
| multiple views, and one or more of the represented views can be | multiple views, and one or more of the represented views can be | |||
| output at the same time. | output at the same time. | |||
| SEI messages | SEI messages | |||
| Supplemental enhancement information (SEI) messages are | ||||
| information in the bitstream that do not influence the decoding | ||||
| process as specified in the VVC specification but address issues | ||||
| of representation/rendering of the decoded bitstream, label the | ||||
| bitstream for certain applications, and other, similar tasks. The | ||||
| overall concept of SEI messages and many of the messages | ||||
| | themselves have been inherited from the AVC and HEVC | |||
| specifications. Except for the SEI messages that affect the | ||||
| specification of the hypothetical reference decoder (HRD), other | ||||
| SEI messages for use in the VVC environment, which are generally | ||||
| useful also in other video coding technologies, are not included | ||||
| in the main VVC specification but in a companion specification | ||||
| [VSEI]. | ||||
| Supplemental enhancement information (SEI) messages are information | 1.1.3. High-Level Picture Partitioning (Informative) | |||
| in the bitstream that do not influence the decoding process as | ||||
| specified in the VVC spec, but address issues of representation/ | ||||
| rendering of the decoded bitstream, label the bitstream for certain | ||||
| applications, among other, similar tasks. The overall concept of SEI | ||||
| messages and many of the messages themselves have been inherited from | | |||
| the AVC and HEVC specs. Except for the SEI messages that affect the | ||||
| specification of the hypothetical reference decoder (HRD), other SEI | ||||
| messages for use in the VVC environment, which are generally useful | ||||
| also in other video coding technologies, are not included in the main | ||||
| VVC specification but in a companion specification [VSEI]. | ||||
| 1.1.3. High-Level Picture Partitioning (informative) | ||||
| VVC inherited the concept of tiles and wavefront parallel processing | VVC inherited the concept of tiles and wavefront parallel processing | |||
| (WPP) from HEVC, with some minor to moderate differences. The basic | (WPP) from HEVC, with some minor to moderate differences. The basic | |||
| concept of slices was kept in VVC but designed in an essentially | concept of slices was kept in VVC but designed in an essentially | |||
| different form. VVC is the first video coding standard that includes | different form. VVC is the first video coding standard that includes | |||
| subpictures as a feature, which provides the same functionality as | subpictures as a feature, which provides the same functionality as | |||
| HEVC motion-constrained tile sets (MCTSs) but designed differently to | HEVC motion-constrained tile sets (MCTSs) but designed differently to | |||
| have better coding efficiency and to be friendlier for usage in | have better coding efficiency and to be friendlier for usage in | |||
| application systems. More details of these differences are described | application systems. More details of these differences are described | |||
| below. | below. | |||
| Tiles and WPP | Tiles and WPP | |||
| Same as in HEVC, a picture can be split into tile rows and tile | ||||
| Same as in HEVC, a picture can be split into tile rows and tile | columns in VVC, in-picture prediction across tile boundaries is | |||
| columns in VVC, in-picture prediction across tile boundaries is | disallowed, etc. However, the syntax for signaling of tile | |||
| disallowed, etc. However, the syntax for signaling of tile | partitioning has been simplified by using a unified syntax design | |||
| partitioning has been simplified, by using a unified syntax design | for both the uniform and the non-uniform mode. In addition, | |||
| for both the uniform and the non-uniform mode. In addition, | signaling of entry point offsets for tiles in the slice header is | |||
| signaling of entry point offsets for tiles in the slice header is | optional in VVC, while it is mandatory in HEVC. The WPP design in | |||
| optional in VVC while it is mandatory in HEVC. The WPP design in VVC | VVC has two differences compared to HEVC: i) the CTU row delay is | |||
| has two differences compared to HEVC: i) The CTU row delay is reduced | reduced from two CTUs to one CTU, and ii) signaling of entry point | |||
| from two CTUs to one CTU; ii) signaling of entry point offsets for | offsets for WPP in the slice header is optional in VVC while it is | |||
| WPP in the slice header is optional in VVC while it is mandatory in | mandatory in HEVC. | |||
| HEVC. | ||||
| Slices | Slices | |||
| In VVC, the conventional slices based on CTUs (as in HEVC) or | ||||
| macroblocks (as in AVC) have been removed. The main reasoning | ||||
| behind this architectural change is as follows. The advances in | ||||
| video coding since 2003 (the publication year of AVC v1) have been | ||||
| such that slice-based error concealment has become practically | ||||
| impossible due to the ever-increasing number and efficiency of in- | ||||
| picture and inter-picture prediction mechanisms. An error- | ||||
| concealed picture is the decoding result of a transmitted coded | ||||
| picture for which there is some data loss (e.g., loss of some | ||||
| slices) of the coded picture or a reference picture, as at least | ||||
| some part of the coded picture is not error-free (e.g., that | ||||
| reference picture was an error-concealed picture). For example, | ||||
| when one of the multiple slices of a picture is lost, it may be | ||||
| error-concealed using an interpolation of the neighboring slices. | ||||
| While advanced video coding prediction mechanisms provide | ||||
| significantly higher coding efficiency, they also make it harder | ||||
| for machines to estimate the quality of an error-concealed | ||||
| picture, which was already a hard problem with the use of simpler | ||||
| prediction mechanisms. Advanced in-picture prediction mechanisms | ||||
| also cause the coding efficiency loss due to splitting a picture | ||||
| into multiple slices to be more significant. Furthermore, network | ||||
| | conditions have become significantly better while, at the same time, | |||
| techniques for dealing with packet losses have become | ||||
| significantly improved. As a result, very few implementations | ||||
| have recently used slices for maximum-transmission-unit-size | ||||
| matching. Instead, substantially all applications where low-delay | ||||
| error resilience is required (e.g., video telephony and video | ||||
| conferencing) rely on system/transport-level error resilience | ||||
| (e.g., retransmission or forward error correction) and/or picture- | ||||
| based error resilience tools (e.g., feedback-based error | ||||
| resilience, insertion of IRAPs, scalability with a higher | ||||
| protection level of the base layer, and so on). Considering all | ||||
| the above, nowadays, it is very rare that a picture that cannot be | ||||
| correctly decoded is passed to the decoder, and when such a rare | ||||
| case occurs, the system can afford to wait for an error-free | ||||
| picture to be decoded and available for display without resulting | ||||
| in frequent and long periods of picture freezing seen by end | ||||
| users. | ||||
| In VVC, the conventional slices based on CTUs (as in HEVC) or | Slices in VVC have two modes: rectangular slices and raster-scan | |||
| macroblocks (as in AVC) have been removed. The main reasoning behind | slices. The rectangular slice, as indicated by its name, covers a | |||
| this architectural change is as follows. The advances in video | rectangular region of the picture. Typically, a rectangular slice | |||
| coding since 2003 (the publication year of AVC v1) have been such | consists of several complete tiles. However, it is also possible | |||
| that slice-based error concealment has become practically impossible, | that a rectangular slice is a subset of a tile and consists of one | |||
| due to the ever-increasing number and efficiency of in-picture and | or more consecutive, complete CTU rows within a tile. A raster- | |||
| inter-picture prediction mechanisms. An error-concealed picture is | scan slice consists of one or more complete tiles in a tile | |||
| the decoding result of a transmitted coded picture for which there is | raster-scan order; hence, the region covered by raster-scan slices | |||
| some data loss (e.g., loss of some slices) of the coded picture or a | need not but could have a non-rectangular shape, but it may also | |||
| reference picture for at least some part of the coded picture is not | happen to have the shape of a rectangle. The concept of slices in | |||
| error-free (e.g., that reference picture was an error-concealed | VVC is therefore strongly linked to or based on tiles instead of | |||
| picture). For example, when one of the multiple slices of a picture | CTUs (as in HEVC) or macroblocks (as in AVC). | |||
| is lost, it may be error-concealed using an interpolation of the | ||||
| neighboring slices. While advanced video coding prediction | ||||
| mechanisms provide significantly higher coding efficiency, they also | ||||
| make it harder for machines to estimate the quality of an error- | ||||
| concealed picture, which was already a hard problem with the use of | ||||
| simpler prediction mechanisms. Advanced in-picture prediction | ||||
| mechanisms also cause the coding efficiency loss due to splitting a | ||||
| picture into multiple slices to be more significant. Furthermore, | ||||
| network conditions have become significantly better while at the same time | | |||
| techniques for dealing with packet losses have become significantly | ||||
| improved. As a result, very few implementations have recently used | ||||
| slices for maximum transmission unit size matching. Instead, | ||||
| substantially all applications where low-delay error resilience is | ||||
| required (e.g., video telephony and video conferencing) rely on | ||||
| system/transport-level error resilience (e.g., retransmission, | ||||
| forward error correction) and/or picture-based error resilience tools | ||||
| (feedback-based error resilience, insertion of IRAPs, scalability | ||||
| with higher protection level of the base layer, and so on). | ||||
| Considering all the above, nowadays it is very rare that a picture | ||||
| that cannot be correctly decoded is passed to the decoder, and when | ||||
| such a rare case occurs, the system can afford to wait for an error- | ||||
| free picture to be decoded and available for display without | ||||
| resulting in frequent and long periods of picture freezing seen by | ||||
| end users. | ||||
| Slices in VVC have two modes: rectangular slices and raster-scan | ||||
| slices. The rectangular slice, as indicated by its name, covers a | ||||
| rectangular region of the picture. Typically, a rectangular slice | ||||
| consists of several complete tiles. However, it is also possible | ||||
| that a rectangular slice is a subset of a tile and consists of one or | ||||
| more consecutive, complete CTU rows within a tile. A raster-scan | ||||
| slice consists of one or more complete tiles in a tile raster scan | ||||
| order, hence the region covered by a raster-scan slices need not but | ||||
| could have a non-rectangular shape, but it may also happen to have | ||||
| the shape of a rectangle. The concept of slices in VVC is therefore | ||||
| strongly linked to or based on tiles instead of CTUs (as in HEVC) or | ||||
| macroblocks (as in AVC). | ||||
| Subpictures | Subpictures | |||
| VVC is the first video coding standard that includes the support | ||||
| of subpictures as a feature. Each subpicture consists of one or | ||||
| more complete rectangular slices that collectively cover a | ||||
| rectangular region of the picture. A subpicture may be either | ||||
| specified to be extractable (i.e., coded independently of other | ||||
| subpictures of the same picture and of earlier pictures in | ||||
| decoding order) or not extractable. Regardless of whether a | ||||
| subpicture is extractable or not, the encoder can control whether | ||||
| in-loop filtering (including deblocking, SAO, and ALF) is applied | ||||
| across the subpicture boundaries individually for each subpicture. | ||||
| VVC is the first video coding standard that includes the support of | Functionally, subpictures are similar to the motion-constrained | |||
| subpictures as a feature. Each subpicture consists of one or more | tile sets (MCTSs) in HEVC. They both allow independent coding and | |||
| complete rectangular slices that collectively cover a rectangular | extraction of a rectangular subset of a sequence of coded pictures | |||
| region of the picture. A subpicture may be either specified to be | for use cases like viewport-dependent 360-degree video streaming | |||
| extractable (i.e., coded independently of other subpictures of the | optimization and region of interest (ROI) applications. | |||
| same picture and of earlier pictures in decoding order) or not | ||||
| extractable. Regardless of whether a subpicture is extractable or | ||||
| not, the encoder can control whether in-loop filtering (including | ||||
| deblocking, SAO, and ALF) is applied across the subpicture boundaries | ||||
| individually for each subpicture. | ||||
| Functionally, subpictures are similar to the motion-constrained tile | ||||
| sets (MCTSs) in HEVC. They both allow independent coding and | ||||
| extraction of a rectangular subset of a sequence of coded pictures, | ||||
| for use cases like viewport-dependent 360-degree video streaming | ||||
| optimization and region of interest (ROI) applications. | ||||
| There are several important design differences between subpictures | There are several important design differences between subpictures | |||
| and MCTSs. First, the subpictures feature in VVC allows motion | and MCTSs. First, the subpictures featured in VVC allow motion | |||
| vectors of a coding block pointing outside of the subpicture even | vectors of a coding block to point outside of the subpicture, even | |||
| when the subpicture is extractable by applying sample padding at | when the subpicture is extractable by applying sample padding at | |||
| subpicture boundaries in this case, similarly as at picture | the subpicture boundaries, in this case, similarly as at picture | |||
| boundaries. Second, additional changes were introduced for the | boundaries. Second, additional changes were introduced for the | |||
| selection and derivation of motion vectors in the merge mode and in | selection and derivation of motion vectors in the merge mode and | |||
| the decoder side motion vector refinement process of VVC. This | in the decoder-side motion vector refinement process of VVC. This | |||
| allows higher coding efficiency compared to the non-normative motion | allows higher coding efficiency compared to the non-normative | |||
| constraints applied at the encoder-side for MCTSs. Third, rewriting | motion constraints applied at the encoder-side for MCTSs. Third, | |||
| of SHs (and PH NAL units, when present) is not needed when extracting | rewriting of slice headers (SHs) (and PH NAL units, when present) | |||
| one or more extractable subpictures from a sequence of pictures to | is not needed when extracting one or more extractable subpictures | |||
| create a sub-bitstream that is a conforming bitstream. In sub- | from a sequence of pictures to create a sub-bitstream that is a | |||
| bitstream extractions based on HEVC MCTSs, rewriting of SHs is | conforming bitstream. In sub-bitstream extractions based on HEVC | |||
| needed. Note that in both HEVC MCTSs extraction and VVC subpictures | MCTSs, rewriting of SHs is needed. Note that, in both HEVC MCTSs | |||
| extraction, rewriting of SPSs and PPSs is needed. However, typically | extraction and VVC subpictures extraction, rewriting of SPSs and | |||
| there are only a few parameter sets in a bitstream, while each | PPSs is needed. However, typically, there are only a few | |||
| picture has at least one slice, therefore rewriting of SHs can be a | parameter sets in a bitstream, whereas each picture has at least | |||
| significant burden for application systems. Fourth, slices of | one slice; therefore, rewriting of SHs can be a significant burden | |||
| different subpictures within a picture are allowed to have different | for application systems. Fourth, slices of different subpictures | |||
| NAL unit types. Fifth, VVC specifies HRD and level definitions for | within a picture are allowed to have different NAL unit types. | |||
| subpicture sequences, thus the conformance of the sub-bitstream of | Fifth, VVC specifies HRD and level definitions for subpicture | |||
| each extractable subpicture sequence can be ensured by encoders. | sequences, thus the conformance of the sub-bitstream of each | |||
| extractable subpicture sequence can be ensured by encoders. | ||||
| 1.1.4. NAL Unit Header | 1.1.4. NAL Unit Header | |||
| VVC maintains the NAL unit concept of HEVC with modifications. VVC | VVC maintains the NAL unit concept of HEVC with modifications. VVC | |||
| uses a two-byte NAL unit header, as shown in Figure 1. The payload | uses a two-byte NAL unit header, as shown in Figure 1. The payload | |||
| of a NAL unit refers to the NAL unit excluding the NAL unit header. | of a NAL unit refers to the NAL unit excluding the NAL unit header. | |||
| +---------------+---------------+ | +---------------+---------------+ | |||
| |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| | |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |F|Z| LayerID | Type | TID | | |F|Z| LayerID | Type | TID | | |||
| +---------------+---------------+ | +---------------+---------------+ | |||
| The Structure of the VVC NAL Unit Header. | Figure 1: The Structure of the VVC NAL Unit Header | |||
| Figure 1 | ||||
| The semantics of the fields in the NAL unit header are as specified | The semantics of the fields in the NAL unit header are as specified | |||
| in VVC and described briefly below for convenience. In addition to | in VVC and described briefly below for convenience. In addition to | |||
| the name and size of each field, the corresponding syntax element | the name and size of each field, the corresponding syntax element | |||
| name in VVC is also provided. | name in VVC is also provided. | |||
| F: 1 bit | F: 1 bit | |||
| forbidden_zero_bit. This field is required to be zero in VVC. | ||||
| forbidden_zero_bit. Required to be zero in VVC. Note that the | Note that the inclusion of this bit in the NAL unit header was to | |||
| inclusion of this bit in the NAL unit header was to enable | enable transport of VVC video over MPEG-2 transport systems | |||
| transport of VVC video over MPEG-2 transport systems (avoidance of | (avoidance of start code emulations) [MPEG2S]. In the context of | |||
| start code emulations) [MPEG2S]. In the context of this payload | this payload format, the value 1 may be used to indicate a syntax | |||
| format, the value 1 may be used to indicate a syntax violation, | violation, e.g., for a NAL unit resulting from aggregating a number | |||
| e.g., for a NAL unit resulting from aggregating a number of | of fragmented units of a NAL unit but missing the last fragment, | |||
| fragmented units of a NAL unit but missing the last fragment, as | as described in the last sentence of Section 4.3.3. | |||
| described in the last sentence of section 4.3.3. | ||||
| Z: 1 bit | Z: 1 bit | |||
| nuh_reserved_zero_bit. This field is required to be zero in VVC, | ||||
| nuh_reserved_zero_bit. Required to be zero in VVC, and reserved | and reserved for future extensions by ITU-T and ISO/IEC. | |||
| for future extensions by ITU-T and ISO/IEC. | This memo does not overload the "Z" bit for local extensions a) | |||
| This memo does not overload the "Z" bit for local extensions, as | because overloading the "F" bit is sufficient and b) in order to | |||
| a) overloading the "F" bit is sufficient and b) to preserve the | preserve the usefulness of this memo to possible future versions | |||
| usefulness of this memo to possible future versions of [VVC]. | of [VVC]. | |||
| LayerId: 6 bits | LayerId: 6 bits | |||
| nuh_layer_id. This field identifies the layer a NAL unit belongs | ||||
| nuh_layer_id. Identifies the layer a NAL unit belongs to, wherein | to, wherein a layer may be, e.g., a spatial scalable layer, a | |||
| a layer may be, e.g., a spatial scalable layer, a quality scalable | quality scalable layer, a layer containing a different view, etc. | |||
| layer, a layer containing a different view, etc. | ||||
| Type: 5 bits | Type: 5 bits | |||
| nal_unit_type. This field specifies the NAL unit type, as defined | ||||
| nal_unit_type. This field specifies the NAL unit type as defined | ||||
| in Table 5 of [VVC]. For a reference of all currently defined NAL | in Table 5 of [VVC]. For a reference of all currently defined NAL | |||
| unit types and their semantics, please refer to Section 7.4.2.2 in | unit types and their semantics, please refer to Section 7.4.2.2 in | |||
| [VVC]. | [VVC]. | |||
| TID: 3 bits | TID: 3 bits | |||
| nuh_temporal_id_plus1. This field specifies the temporal | nuh_temporal_id_plus1. This field specifies the temporal | |||
| identifier of the NAL unit plus 1. The value of TemporalId is | identifier of the NAL unit plus 1. The value of TemporalId is | |||
| equal to TID minus 1. A TID value of 0 is illegal to ensure that | equal to TID minus 1. A TID value of 0 is illegal to ensure that | |||
| there is at least one bit in the NAL unit header equal to 1, so to | there is at least one bit in the NAL unit header equal to 1 in | |||
| enable the consideration of start code emulations in the NAL unit | order to enable the consideration of start code emulations in the | |||
| payload data independent of the NAL unit header. | NAL unit payload data independent of the NAL unit header. | |||
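
Illustrative sketch: the two-byte header layout of Figure 1 can be unpacked with a few shifts and masks. The Python below is a minimal example, not a normative parser. The names VvcNalHeader and parse_vvc_nal_header are hypothetical, and the code assumes that bit 0 in Figure 1 is the most significant bit of each byte, as is customary in RFC diagrams. Rejecting a TID of 0 is a design choice made here because the text calls that value illegal.

```python
from typing import NamedTuple


class VvcNalHeader(NamedTuple):
    """Fields of the two-byte VVC NAL unit header (hypothetical helper type)."""
    f: int         # forbidden_zero_bit, 1 bit
    z: int         # nuh_reserved_zero_bit, 1 bit
    layer_id: int  # nuh_layer_id, 6 bits
    nal_type: int  # nal_unit_type, 5 bits
    tid: int       # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1


def parse_vvc_nal_header(data: bytes) -> VvcNalHeader:
    """Split the first two bytes of a NAL unit into the Figure 1 fields."""
    if len(data) < 2:
        raise ValueError("a VVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    header = VvcNalHeader(
        f=(b0 >> 7) & 0x01,         # bit 0 of the first byte
        z=(b0 >> 6) & 0x01,         # bit 1 of the first byte
        layer_id=b0 & 0x3F,         # bits 2..7 of the first byte
        nal_type=(b1 >> 3) & 0x1F,  # bits 0..4 of the second byte
        tid=b1 & 0x07,              # bits 5..7 of the second byte
    )
    if header.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return header
```

For example, the bytes 0x05 0x3A decode to LayerId 5, Type 7, and TID 2, so TemporalId is 1. The F bit is returned rather than rejected, since this payload format allows the value 1 to signal a syntax violation.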
| 1.2. Overview of the Payload Format | 1.2. Overview of the Payload Format | |||
| This payload format defines the following processes required for | This payload format defines the following processes required for | |||
| transport of VVC coded data over RTP [RFC3550]: | transport of VVC coded data over RTP [RFC3550]: | |||
| * Usage of RTP header with this payload format | * usage of the RTP header with this payload format | |||
| * Packetization of VVC coded NAL units into RTP packets using three | * packetization of VVC coded NAL units into RTP packets using three | |||
| types of payload structures: a single NAL unit packet, aggregation | types of payload structures: a single NAL unit packet, aggregation | |||
| packet, and fragment unit | packet, and fragment unit | |||
| * Transmission of VVC NAL units of the same bitstream within a | * transmission of VVC NAL units of the same bitstream within a | |||
| single RTP stream | single RTP stream | |||
| * Media type parameters to be used with the Session Description | * media type parameters to be used with the Session Description | |||
| Protocol (SDP) [RFC8866] | Protocol (SDP) [RFC8866] | |||
| * Usage of RTCP feedback messages | * usage of RTCP feedback messages | |||
| 2. Conventions | 2. Conventions | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 3. Definitions and Abbreviations | 3. Definitions and Abbreviations | |||
| 3.1. Definitions | 3.1. Definitions | |||
| This document uses the terms and definitions of VVC. Section 3.1.1 | This document uses the terms and definitions of VVC. Section 3.1.1 | |||
| lists relevant definitions from [VVC] for convenience. Section 3.1.2 | lists relevant definitions from [VVC] for convenience. Section 3.1.2 | |||
| provides definitions specific to this memo. All the used terms and | provides definitions specific to this memo. All the used terms and | |||
| definitions in this memo are verbatim copies of [VVC] specification. | definitions in this memo are verbatim copies from the [VVC] | |||
| specification. | ||||
| 3.1.1. Definitions from the VVC Specification | 3.1.1. Definitions from the VVC Specification | |||
| Access unit (AU): A set of PUs that belong to different layers and | Access unit (AU): | |||
| contain coded pictures associated with the same time for output from | A set of PUs that belong to different layers and contain coded | |||
| the DPB. | pictures associated with the same time for output from the DPB. | |||
| Adaptation parameter set (APS): A syntax structure containing syntax | Adaptation parameter set (APS): | |||
| elements that apply to zero or more slices as determined by zero or | A syntax structure containing syntax elements that apply to zero | |||
| more syntax elements found in slice headers. | or more slices as determined by zero or more syntax elements found | |||
| in slice headers. | ||||
| Bitstream: A sequence of bits, in the form of a NAL unit stream or a | Bitstream: | |||
| byte stream, that forms the representation of a sequence of AUs | A sequence of bits, in the form of a NAL unit stream or a byte | |||
| forming one or more coded video sequences (CVSs). | stream, that forms the representation of a sequence of AUs forming | |||
| one or more coded video sequences (CVSs). | ||||
| Coded picture: A coded representation of a picture comprising VCL NAL | Coded picture: | |||
| units with a particular value of nuh_layer_id within an AU and | A coded representation of a picture comprising VCL NAL units with | |||
| containing all CTUs of the picture. | a particular value of nuh_layer_id within an AU and containing all | |||
| CTUs of the picture. | ||||
| Clean random access (CRA) PU: A PU in which the coded picture is a | Clean random access (CRA) PU: | |||
| CRA picture. | A PU in which the coded picture is a CRA picture. | |||
| Clean random access (CRA) picture: An IRAP picture for which each VCL | Clean random access (CRA) picture: | |||
| NAL unit has nal_unit_type equal to CRA_NUT. | An IRAP picture for which each VCL NAL unit has nal_unit_type | |||
| equal to CRA_NUT. | ||||
| Coded video sequence (CVS): A sequence of AUs that consists, in | Coded video sequence (CVS): | |||
| decoding order, of a CVSS AU, followed by zero or more AUs that are | A sequence of AUs that consists, in decoding order, of a CVSS AU, | |||
| not CVSS AUs, including all subsequent AUs up to but not including | followed by zero or more AUs that are not CVSS AUs, including all | |||
| any subsequent AU that is a CVSS AU. | subsequent AUs up to but not including any subsequent AU that is a | |||
| CVSS AU. | ||||
| Coded video sequence start (CVSS) AU: An AU in which there is a PU | Coded video sequence start (CVSS) AU: | |||
| for each layer in the CVS and the coded picture in each PU is a CLVSS | An AU in which there is a PU for each layer in the CVS and the | |||
| picture. | coded picture in each PU is a CLVSS picture. | |||
| Coded layer video sequence (CLVS): A sequence of PUs with the same | Coded layer video sequence (CLVS): | |||
| value of nuh_layer_id that consists, in decoding order, of a CLVSS | A sequence of PUs with the same value of nuh_layer_id that | |||
| PU, followed by zero or more PUs that are not CLVSS PUs, including | consists, in decoding order, of a CLVSS PU, followed by zero or | |||
| all subsequent PUs up to but not including any subsequent PU that is | more PUs that are not CLVSS PUs, including all subsequent PUs up | |||
| a CLVSS PU. | to but not including any subsequent PU that is a CLVSS PU. | |||
| Coded layer video sequence start (CLVSS) PU: A PU in which the coded | Coded layer video sequence start (CLVSS) PU: | |||
| picture is a CLVSS picture. | A PU in which the coded picture is a CLVSS picture. | |||
| Coded layer video sequence start (CLVSS) picture: A coded picture | Coded layer video sequence start (CLVSS) picture: | |||
| that is an IRAP picture with NoOutputBeforeRecoveryFlag equal to 1 or | A coded picture that is an IRAP picture with | |||
| a GDR picture with NoOutputBeforeRecoveryFlag equal to 1. | NoOutputBeforeRecoveryFlag equal to 1 or a GDR picture with | |||
| NoOutputBeforeRecoveryFlag equal to 1. | ||||
| Coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs | Coding Tree Block (CTB): | |||
| of chroma samples of a picture that has three sample arrays, or a CTB | An NxN block of samples for some value of N such that the division | |||
| of samples of a monochrome picture or a picture that is coded using | of a component into CTBs is a partitioning. | |||
| three separate colour planes and syntax structures used to code the | ||||
| samples. | ||||
| Decoding Capability Information (DCI): A syntax structure containing | Coding tree unit (CTU): | |||
| syntax elements that apply to the entire bitstream. | A CTB of luma samples, two corresponding CTBs of chroma samples of | |||
| a picture that has three sample arrays, or a CTB of samples of a | ||||
| monochrome picture or a picture that is coded using three separate | ||||
| colour planes and syntax structures used to code the samples. | ||||
| Decoded picture buffer (DPB): A buffer holding decoded pictures for | Coding Unit (CU): | |||
| reference, output reordering, or output delay specified for the | A coding block of luma samples, two corresponding coding blocks of | |||
| hypothetical reference decoder. | chroma samples of a picture that has three sample arrays in the | |||
| single tree mode, or a coding block of luma samples of a picture | ||||
| that has three sample arrays in the dual tree mode, or two coding | ||||
| blocks of chroma samples of a picture that has three sample arrays | ||||
| in the dual tree mode, or a coding block of samples of a | ||||
| monochrome picture, and syntax structures used to code the | ||||
| samples. | ||||
| Gradual decoding refresh (GDR) picture: A picture for which each VCL | Decoding Capability Information (DCI): | |||
| NAL unit has nal_unit_type equal to GDR_NUT. | A syntax structure containing syntax elements that apply to the | |||
| entire bitstream. | ||||
| Instantaneous decoding refresh (IDR) PU: A PU in which the coded | Decoded picture buffer (DPB): | |||
| picture is an IDR picture. | A buffer holding decoded pictures for reference, output | |||
| reordering, or output delay specified for the hypothetical | ||||
| reference decoder. | ||||
| Instantaneous decoding refresh (IDR) picture: An IRAP picture for | Gradual decoding refresh (GDR) picture: | |||
| which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or | A picture for which each VCL NAL unit has nal_unit_type equal to | |||
| IDR_N_LP. | GDR_NUT. | |||
| Intra random access point (IRAP) AU: An AU in which there is a PU for | Instantaneous decoding refresh (IDR) PU: | |||
| each layer in the CVS and the coded picture in each PU is an IRAP | A PU in which the coded picture is an IDR picture. | |||
| picture. | ||||
| Intra random access point (IRAP) PU: A PU in which the coded picture | Instantaneous decoding refresh (IDR) picture: | |||
| is an IRAP picture. | An IRAP picture for which each VCL NAL unit has nal_unit_type | |||
| equal to IDR_W_RADL or IDR_N_LP. | ||||
| Intra random access point (IRAP) picture: A coded picture for which | Intra random access point (IRAP) AU: | |||
| all VCL NAL units have the same value of nal_unit_type in the range | An AU in which there is a PU for each layer in the CVS and the | |||
| of IDR_W_RADL to CRA_NUT, inclusive. | coded picture in each PU is an IRAP picture. | |||
| Layer: A set of VCL NAL units that all have a particular value of | Intra random access point (IRAP) PU: | |||
| nuh_layer_id and the associated non-VCL NAL units. | A PU in which the coded picture is an IRAP picture. | |||
| Network abstraction layer (NAL) unit: A syntax structure containing | Intra random access point (IRAP) picture: | |||
| an indication of the type of data to follow and bytes containing that | A coded picture for which all VCL NAL units have the same value of | |||
| data in the form of an RBSP interspersed as necessary with emulation | nal_unit_type in the range of IDR_W_RADL to CRA_NUT, inclusive. | |||
| prevention bytes. | ||||
| Network abstraction layer (NAL) unit stream: A sequence of NAL units. | Layer: | |||
| A set of VCL NAL units that all have a particular value of | ||||
| nuh_layer_id and the associated non-VCL NAL units. | ||||
| Output Layer Set (OLS): A set of layers for which one or more layers | Network abstraction layer (NAL) unit: | |||
| are specified as the output layers. | A syntax structure containing an indication of the type of data to | |||
| follow and bytes containing that data in the form of an RBSP | ||||
| interspersed as necessary with emulation prevention bytes. | ||||
| Operation point (OP): A temporal subset of an OLS, identified by an | Network abstraction layer (NAL) unit stream: | |||
| OLS index and a highest value of TemporalId. | A sequence of NAL units. | |||
| Picture parameter set (PPS): A syntax structure containing syntax | Output Layer Set (OLS): | |||
| elements that apply to zero or more entire coded pictures as | A set of layers for which one or more layers are specified as the | |||
| determined by a syntax element found in each slice header. | output layers. | |||
| Picture unit (PU): A set of NAL units that are associated with each | Operation point (OP): | |||
| other according to a specified classification rule, are consecutive | A temporal subset of an OLS, identified by an OLS index and a | |||
| in decoding order, and contain exactly one coded picture. | highest value of TemporalId. | |||
| Random access: The act of starting the decoding process for a | Picture Header (PH): | |||
| bitstream at a point other than the beginning of the stream. | A syntax structure containing syntax elements that apply to all | |||
| slices of a coded picture. | ||||
| Sequence parameter set (SPS): A syntax structure containing syntax | Picture parameter set (PPS): | |||
| elements that apply to zero or more entire CLVSs as determined by the | A syntax structure containing syntax elements that apply to zero | |||
| content of a syntax element found in the PPS referred to by a syntax | or more entire coded pictures as determined by a syntax element | |||
| element found in each picture header. | found in each slice header. | |||
| Slice: An integer number of complete tiles or an integer number of | Picture unit (PU): | |||
| consecutive complete CTU rows within a tile of a picture that are | A set of NAL units that are associated with each other according | |||
| exclusively contained in a single NAL unit. | to a specified classification rule, are consecutive in decoding | |||
| order, and contain exactly one coded picture. | ||||
| Slice header (SH): A part of a coded slice containing the data | Random access: | |||
| elements pertaining to all tiles or CTU rows within a tile | The act of starting the decoding process for a bitstream at a | |||
| represented in the slice. | point other than the beginning of the bitstream. | |||
| Sublayer: A temporal scalable layer of a temporal scalable bitstream | Raw Byte Sequence Payload (RBSP): | |||
| consisting of VCL NAL units with a particular value of the TemporalId | A syntax structure containing an integer number of bytes that is | |||
| variable, and the associated non-VCL NAL units. | encapsulated in a NAL unit and is either empty or has the form of | |||
| a string of data bits containing syntax elements followed by an | ||||
| RBSP stop bit and zero or more subsequent bits equal to 0. | ||||
| Subpicture: An rectangular region of one or more slices within a | Sequence parameter set (SPS): | |||
| picture. | A syntax structure containing syntax elements that apply to zero | |||
| or more entire CLVSs as determined by the content of a syntax | ||||
| element found in the PPS referred to by a syntax element found in | ||||
| each picture header. | ||||
| Sublayer representation: A subset of the bitstream consisting of NAL | Slice: | |||
| units of a particular sublayer and the lower sublayers. | An integer number of complete tiles or an integer number of | |||
| consecutive complete CTU rows within a tile of a picture that are | ||||
| exclusively contained in a single NAL unit. | ||||
| Tile: A rectangular region of CTUs within a particular tile column | Slice header (SH): | |||
| and a particular tile row in a picture. | A part of a coded slice containing the data elements pertaining to | |||
| all tiles or CTU rows within a tile represented in the slice. | ||||
| Tile column: A rectangular region of CTUs having a height equal to | Sublayer: | |||
| the height of the picture and a width specified by syntax elements in | A temporal scalable layer of a temporal scalable bitstream | |||
| the picture parameter set. | consisting of VCL NAL units with a particular value of the | |||
| TemporalId variable, and the associated non-VCL NAL units. | ||||
| Tile row: A rectangular region of CTUs having a height specified by | Subpicture: | |||
| syntax elements in the picture parameter set and a width equal to the | A rectangular region of one or more slices within a picture. | |||
| width of the picture. | ||||
| Video coding layer (VCL) NAL unit: A collective term for coded slice | Sublayer representation: | |||
| NAL units and the subset of NAL units that have reserved values of | A subset of the bitstream consisting of NAL units of a particular | |||
| nal_unit_type that are classified as VCL NAL units in this | sublayer and the lower sublayers. | |||
| Specification. | ||||
| Tile: | ||||
| A rectangular region of CTUs within a particular tile column and a | ||||
| particular tile row in a picture. | ||||
| Tile column: | ||||
| A rectangular region of CTUs having a height equal to the height | ||||
| of the picture and a width specified by syntax elements in the | ||||
| picture parameter set. | ||||
| Tile row: | ||||
| A rectangular region of CTUs having a height specified by syntax | ||||
| elements in the picture parameter set and a width equal to the | ||||
| width of the picture. | ||||
| Video coding layer (VCL) NAL unit: | ||||
| A collective term for coded slice NAL units and the subset of NAL | ||||
| units that have reserved values of nal_unit_type that are | ||||
| classified as VCL NAL units in this Specification. | ||||
| 3.1.2. Definitions Specific to This Memo | 3.1.2. Definitions Specific to This Memo | |||
| Media-Aware Network Element (MANE): A network element, such as a | Media-Aware Network Element (MANE): | |||
| middlebox, selective forwarding unit, or application-layer gateway | A network element, such as a middlebox, selective forwarding unit, | |||
| that is capable of parsing certain aspects of the RTP payload headers | or application-layer gateway that is capable of parsing certain | |||
| or the RTP payload and reacting to their contents. | aspects of the RTP payload headers or the RTP payload and reacting | |||
| to their contents. | ||||
| Informative note: The concept of a MANE goes beyond normal routers | | Informative note: The concept of a MANE goes beyond normal | |||
| or gateways in that a MANE has to be aware of the signaling (e.g., | | routers or gateways in that a MANE has to be aware of the | |||
| to learn about the payload type mappings of the media streams), | | signaling (e.g., to learn about the payload type mappings of | |||
| and in that it has to be trusted when working with Secure RTP | | the media streams), and in that it has to be trusted when | |||
| (SRTP). The advantage of using MANEs is that they allow packets | | working with Secure RTP (SRTP). The advantage of using | |||
| to be dropped according to the needs of the media coding. For | | MANEs is that they allow packets to be dropped according to | |||
| example, if a MANE has to drop packets due to congestion on a | | the needs of the media coding. For example, if a MANE has | |||
| certain link, it can identify and remove those packets whose | | to drop packets due to congestion on a certain link, it can | |||
| elimination produces the least adverse effect on the user | | identify and remove those packets whose elimination produces | |||
| experience. After dropping packets, MANEs must rewrite RTCP | | the least adverse effect on the user experience. After | |||
| packets to match the changes to the RTP stream, as specified in | | dropping packets, MANEs must rewrite RTCP packets to match | |||
| Section 7 of [RFC3550]. | | the changes to the RTP stream, as specified in Section 7 of | |||
| | [RFC3550]. | ||||
| NAL unit decoding order: A NAL unit order that conforms to the | NAL unit decoding order: | |||
| constraints on NAL unit order given in Section 7.4.2.4 in [VVC], | A NAL unit order that conforms to the constraints on NAL unit | |||
| and follows the Order of NAL units in the bitstream. | order given in Section 7.4.2.4 in [VVC], and follows the order of NAL | |||
| units in the bitstream. | ||||
| RTP stream (See [RFC7656]): Within the scope of this memo, one RTP | RTP stream (see [RFC7656]): | |||
| stream is utilized to transport a VVC bitstream, which may contain | Within the scope of this memo, one RTP stream is utilized to | |||
| one or more layers, and each layer may contain one or more temporal | transport a VVC bitstream, which may contain one or more layers, | |||
| sublayers. | and each layer may contain one or more temporal sublayers. | |||
| Transmission order: The order of packets in ascending RTP sequence | Transmission order: | |||
| number order (in modulo arithmetic). Within an aggregation packet, | The order of packets in ascending RTP sequence number order (in | |||
| the NAL unit transmission order is the same as the order of | modulo arithmetic). Within an aggregation packet, the NAL unit | |||
| appearance of NAL units in the packet. | transmission order is the same as the order of appearance of NAL | |||
| units in the packet. | ||||
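
Illustrative sketch: "ascending RTP sequence number order (in modulo arithmetic)" means that the 16-bit sequence number wraps from 65535 back to 0, so a plain integer comparison is not enough. The Python below shows one common way to compare two sequence numbers. The function name seq_precedes is hypothetical, and the half-range window (treating differences of less than 32768 as "ahead") is an assumption borrowed from serial number arithmetic, not something this memo specifies.

```python
def seq_precedes(a: int, b: int) -> bool:
    """Return True if 16-bit sequence number a comes before b.

    The forward distance from a to b is computed modulo 2**16. Under the
    half-range convention assumed here, b is ahead of a when that distance
    is non-zero and smaller than half of the sequence number space.
    """
    distance = (b - a) & 0xFFFF
    return 0 < distance < 0x8000
```

With this convention, sequence number 65535 precedes 0 (forward distance 1), whereas 0 does not precede 65535.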
| 3.2. Abbreviations | 3.2. Abbreviations | |||
| AU Access Unit | AU Access Unit | |||
| AP Aggregation Packet | AP Aggregation Packet | |||
| APS Adaptation Parameter Set | APS Adaptation Parameter Set | |||
| CTU Coding Tree Unit | CTU Coding Tree Unit | |||
| CVS Coded Video Sequence | ||||
| DPB Decoded Picture Buffer | CVS Coded Video Sequence | |||
| DCI Decoding Capability Information | DPB Decoded Picture Buffer | |||
| DON Decoding Order Number | DCI Decoding Capability Information | |||
| FIR Full Intra Request | DON Decoding Order Number | |||
| FU Fragmentation Unit | FIR Full Intra Request | |||
| GDR Gradual Decoding Refresh | FU Fragmentation Unit | |||
| HRD Hypothetical Reference Decoder | GDR Gradual Decoding Refresh | |||
| IDR Instantaneous Decoding Refresh | HRD Hypothetical Reference Decoder | |||
| IRAP Intra Random Access Point | IDR Instantaneous Decoding Refresh | |||
| MANE Media-Aware Network Element | IRAP Intra Random Access Point | |||
| MTU Maximum Transfer Unit | MANE Media-Aware Network Element | |||
| NAL Network Abstraction Layer | MTU Maximum Transfer Unit | |||
| NALU Network Abstraction Layer Unit | NAL Network Abstraction Layer | |||
| OLS Output Layer Set | NALU Network Abstraction Layer Unit | |||
| PLI Picture Loss Indication | OLS Output Layer Set | |||
| PPS Picture Parameter Set | PLI Picture Loss Indication | |||
| RPSI Reference Picture Selection Indication | PPS Picture Parameter Set | |||
| SEI Supplemental Enhancement Information | RPSI Reference Picture Selection Indication | |||
| SLI Slice Loss Indication | SEI Supplemental Enhancement Information | |||
| SPS Sequence Parameter Set | SLI Slice Loss Indication | |||
| VCL Video Coding Layer | SPS Sequence Parameter Set | |||
| VPS Video Parameter Set | VCL Video Coding Layer | |||
| VPS Video Parameter Set | ||||
| 4. RTP Payload Format | 4. RTP Payload Format | |||
| 4.1. RTP Header Usage | 4.1. RTP Header Usage | |||
| The format of the RTP header is specified in [RFC3550] (reprinted as | The format of the RTP header is specified in [RFC3550] (reprinted as | |||
| Figure 2 for convenience). This payload format uses the fields of | Figure 2 for convenience). This payload format uses the fields of | |||
| the header in a manner consistent with that specification. | the header in a manner consistent with that specification. | |||
| The RTP payload (and the settings for some RTP header bits) for | The RTP payload (and the settings for some RTP header bits) for | |||
| aggregation packets and fragmentation units are specified in | aggregation packets and fragmentation units are specified in Sections | |||
| Section 4.3.2 and Section 4.3.3, respectively. | 4.3.2 and 4.3.3, respectively. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |V=2|P|X| CC |M| PT | sequence number | | |V=2|P|X| CC |M| PT | sequence number | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | timestamp | | | timestamp | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | synchronization source (SSRC) identifier | | | synchronization source (SSRC) identifier | | |||
| +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | |||
| | contributing source (CSRC) identifiers | | | contributing source (CSRC) identifiers | | |||
| | .... | | | .... | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| RTP Header According to [RFC3550] | Figure 2: RTP Header According to RFC 3550 | |||
| Figure 2 | ||||
| The RTP header information to be set according to this RTP payload | The RTP header information to be set according to this RTP payload | |||
| format is set as follows: | format is set as follows: | |||
| Marker bit (M): 1 bit | Marker bit (M): 1 bit | |||
| Set for the last packet, in transmission order, among each set of | Set for the last packet, in transmission order, among each set of | |||
| packets that contain NAL units of one access unit. This is in | packets that contain NAL units of one access unit. This is in | |||
| line with the normal use of the M bit in video formats to allow an | line with the normal use of the M bit in video formats to allow an | |||
| efficient playout buffer handling. | efficient playout buffer handling. | |||
| Payload Type (PT): 7 bits | Payload Type (PT): 7 bits | |||
| The assignment of an RTP payload type for this new packet format | The assignment of an RTP payload type for this new packet format | |||
| is outside the scope of this document and will not be specified | is outside the scope of this document and will not be specified | |||
| here. The assignment of a payload type has to be performed either | here. The assignment of a payload type has to be performed either | |||
| through the profile used or in a dynamic way. | through the profile used or in a dynamic way. | |||
| Sequence Number (SN): 16 bits | Sequence Number (SN): 16 bits | |||
| Set and used in accordance with [RFC3550]. | Set and used in accordance with [RFC3550]. | |||
| Timestamp: 32 bits | Timestamp: 32 bits | |||
| The RTP timestamp is set to the sampling timestamp of the content. | The RTP timestamp is set to the sampling timestamp of the content. | |||
| A 90 kHz clock rate MUST be used. If the NAL unit has no timing | A 90 kHz clock rate MUST be used. If the NAL unit has no timing | |||
| properties of its own (e.g., parameter set and SEI NAL units), the | properties of its own (e.g., parameter set and SEI NAL units), the | |||
| RTP timestamp MUST be set to the RTP timestamp of the coded | RTP timestamp MUST be set to the RTP timestamp of the coded | |||
| pictures of the access unit in which the NAL unit (according to | pictures of the access unit in which the NAL unit (according to | |||
| Section 7.4.2.4 of [VVC]) is included. Receivers MUST use the RTP | Section 7.4.2.4 of [VVC]) is included. Receivers MUST use the RTP | |||
| timestamp for the display process, even when the bitstream | timestamp for the display process, even when the bitstream | |||
| contains picture timing SEI messages or decoding unit information | contains picture timing SEI messages or decoding unit information | |||
| SEI messages as specified in [VVC]. | SEI messages, as specified in [VVC]. | |||
| Informative note: When picture timing SEI messages are present, | | Informative note: When picture timing SEI messages are | |||
| the RTP sender is responsible for ensuring that the RTP timestamps | | present, the RTP sender is responsible for ensuring that the | |||
| are consistent with the timing information carried in the | | RTP timestamps are consistent with the timing information | |||
| picture timing SEI messages. | | carried in the picture timing SEI messages. | |||
| Synchronization source (SSRC): 32 bits | Synchronization source (SSRC): 32 bits | |||
| Used to identify the source of the RTP packets. A single SSRC is | Used to identify the source of the RTP packets. A single SSRC is | |||
| used for all parts of a single bitstream. | used for all parts of a single bitstream. | |||
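The header fields listed above can be read with a few lines of code. The following is a minimal sketch, not a reference implementation: the names `RtpHeader`, `parse_rtp_header`, and `to_rtp_timestamp` are hypothetical, the bit layout follows Figure 2, and header extensions (X bit) are not interpreted.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the RTP header as drawn in Figure 2 (names are illustrative)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-octet fixed header plus the CSRC list, in network byte order."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack_from("!%dI" % cc, packet, 12) if cc else ()
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),       # M: last packet of an access unit
        payload_type=b1 & 0x7F,       # PT: 7 bits, assigned dynamically or by profile
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=tuple(csrcs),
    )


def to_rtp_timestamp(sampling_time_seconds: float) -> int:
    """Convert a sampling instant to 90 kHz ticks, wrapped to 32 bits."""
    return int(round(sampling_time_seconds * 90000)) & 0xFFFFFFFF
```

A receiver would typically call `parse_rtp_header` first and hand the remaining octets to the payload parsing described in the following sections.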
| 4.2. Payload Header Usage | 4.2. Payload Header Usage | |||
| The first two bytes of the payload of an RTP packet are referred to | The first two bytes of the payload of an RTP packet are referred to | |||
| as the payload header. The payload header consists of the same | as the payload header. The payload header consists of the same | |||
| fields (F, Z, LayerId, Type, and TID) as the NAL unit header as shown | fields (F, Z, LayerId, Type, and TID) as the NAL unit header shown in | |||
| in Section 1.1.4, irrespective of the type of the payload structure. | Section 1.1.4, irrespective of the type of the payload structure. | |||
| The TID value indicates (among other things) the relative importance | The TID value indicates (among other things) the relative importance | |||
| of an RTP packet, for example, because NAL units belonging to higher | of an RTP packet, for example, because NAL units belonging to higher | |||
| temporal sublayers are not used for the decoding of lower temporal | temporal sublayers are not used for the decoding of lower temporal | |||
| sublayers. A lower value of TID indicates a higher importance. | sublayers. A lower value of TID indicates a higher importance. More | |||
| More-important NAL units MAY be better protected against transmission | important NAL units MAY be better protected against transmission | |||
| losses than less-important NAL units. | losses than less-important NAL units. | |||
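As an illustration, the two-octet payload header can be split into its fields as follows. This sketch assumes the bit widths given for the NAL unit header in Section 1.1.4 (F: 1 bit, Z: 1 bit, LayerId: 6 bits, Type: 5 bits, TID: 3 bits), which are not repeated in this section; the function names are hypothetical.

```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    f: int          # 1 bit
    z: int          # 1 bit
    layer_id: int   # 6 bits
    type: int       # 5 bits
    tid: int        # 3 bits, raw field value as carried in the header


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Split the first two octets of an RTP payload into the five fields."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-octet payload header")
    hdr = int.from_bytes(payload[:2], "big")
    return PayloadHeader(
        f=(hdr >> 15) & 0x1,
        z=(hdr >> 14) & 0x1,
        layer_id=(hdr >> 8) & 0x3F,
        type=(hdr >> 3) & 0x1F,
        tid=hdr & 0x7,
    )


def build_payload_header(f: int, z: int, layer_id: int, type_: int, tid: int) -> bytes:
    """Pack the five fields back into two octets (network byte order)."""
    hdr = ((f & 0x1) << 15) | ((z & 0x1) << 14) | ((layer_id & 0x3F) << 8) \
        | ((type_ & 0x1F) << 3) | (tid & 0x7)
    return hdr.to_bytes(2, "big")
```

Because a lower TID indicates higher importance, a sender that applies unequal protection could sort or classify packets on the `tid` field returned here.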
| 4.3. Payload Structures | 4.3. Payload Structures | |||
| Three different types of RTP packet payload structures are specified. | Three different types of RTP packet payload structures are specified. | |||
| A receiver can identify the type of an RTP packet payload through the | A receiver can identify the type of an RTP packet payload through the | |||
| Type field in the payload header. | Type field in the payload header. | |||
| The three different payload structures are as follows: | The three different payload structures are as follows: | |||
| skipping to change at page 23, line 14 ¶ | skipping to change at line 1079 ¶ | |||
| * Aggregation Packet (AP): Contains more than one NAL unit within | * Aggregation Packet (AP): Contains more than one NAL unit within | |||
| one access unit. This payload structure is specified in | one access unit. This payload structure is specified in | |||
| Section 4.3.2. | Section 4.3.2. | |||
| * Fragmentation Unit (FU): Contains a subset of a single NAL unit. | * Fragmentation Unit (FU): Contains a subset of a single NAL unit. | |||
| This payload structure is specified in Section 4.3.3. | This payload structure is specified in Section 4.3.3. | |||
| 4.3.1. Single NAL Unit Packets | 4.3.1. Single NAL Unit Packets | |||
| A single NAL unit packet contains exactly one NAL unit, and consists | A single NAL unit packet contains exactly one NAL unit and consists | |||
| of a payload header as defined in Table 5 of [VVC] (denoted here as | of a payload header, as defined in Table 5 of [VVC] (denoted here as | |||
| PayloadHdr), followed by a conditional 16-bit DONL field (in | PayloadHdr), followed by a conditional 16-bit DONL field (in | |||
| network byte order), and the NAL unit payload data (the NAL unit | network byte order), and the NAL unit payload data (the NAL unit | |||
| excluding its NAL unit header) of the contained NAL unit, as shown in | excluding its NAL unit header) of the contained NAL unit, as shown in | |||
| Figure 3. | Figure 3. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | PayloadHdr | DONL (conditional) | | | PayloadHdr | DONL (conditional) | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | | |||
| | NAL unit payload data | | | NAL unit payload data | | |||
| | | | | | | |||
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| The Structure of a Single NAL Unit Packet | Figure 3: The Structure of a Single NAL Unit Packet | |||
| Figure 3 | ||||
| The DONL field, when present, specifies the value of the 16 least | The DONL field, when present, specifies the value of the 16 least | |||
| significant bits of the decoding order number of the contained NAL | significant bits of the decoding order number of the contained NAL | |||
| unit. If sprop-max-don-diff (see definition in Section 7.2) is | unit. If sprop-max-don-diff (defined in Section 7.2) is greater than | |||
| greater than 0, the DONL field MUST be present, and the variable DON | 0, the DONL field MUST be present, and the variable DON for the | |||
| for the contained NAL unit is derived as equal to the value of the | contained NAL unit is derived as equal to the value of the DONL | |||
| DONL field. Otherwise (sprop-max-don-diff is equal to 0), the DONL | field. Otherwise (sprop-max-don-diff is equal to 0), the DONL field | |||
| field MUST NOT be present. | MUST NOT be present. | |||
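The conditional presence of DONL can be handled as in the sketch below. It assumes that RTP padding has already been removed, that the caller knows the negotiated sprop-max-don-diff value, and that PayloadHdr can be reused as the NAL unit header when rebuilding the NAL unit; the function name is hypothetical.

```python
from typing import Optional, Tuple


def parse_single_nal_unit_packet(
    payload: bytes, sprop_max_don_diff: int
) -> Tuple[Optional[int], bytes]:
    """Return (DON, NAL unit) for a single NAL unit packet.

    DON is None when sprop-max-don-diff is 0, because DONL is then absent.
    """
    if len(payload) < 2:
        raise ValueError("payload shorter than PayloadHdr")
    header = payload[:2]
    if sprop_max_don_diff > 0:
        if len(payload) < 4:
            raise ValueError("DONL field missing")
        don = int.from_bytes(payload[2:4], "big")   # 16 LSBs of the DON
        body = payload[4:]
    else:
        don = None
        body = payload[2:]
    # Assumption: PayloadHdr doubles as the NAL unit header of the contained NAL unit.
    return don, header + body
```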
| 4.3.2. Aggregation Packets (APs) | 4.3.2. Aggregation Packets (APs) | |||
| Aggregation Packets (APs) can reduce packetization overhead for small | Aggregation packets (APs) can reduce packetization overhead for small | |||
| NAL units, such as most of the non-VCL NAL units, which are often | NAL units, such as most of the non-VCL NAL units, which are often | |||
| only a few octets in size. | only a few octets in size. | |||
| An AP aggregates NAL units of one access unit and it MUST NOT contain | An AP aggregates NAL units of one access unit, and it MUST NOT | |||
| NAL units from more than one AU. Each NAL unit to be carried in an | contain NAL units from more than one AU. Each NAL unit to be carried | |||
| AP is encapsulated in an aggregation unit. NAL units aggregated in | in an AP is encapsulated in an aggregation unit. NAL units | |||
| one AP are included in NAL unit decoding order. | aggregated in one AP are included in NAL-unit-decoding order. | |||
| An AP consists of a payload header as defined in Table 5 of [VVC] | An AP consists of a payload header, as defined in Table 5 of [VVC] | |||
| (denoted here as PayloadHdr with Type=28) followed by two or more | (denoted here as PayloadHdr with Type=28), followed by two or more | |||
| aggregation units, as shown in Figure 4. | aggregation units, as shown in Figure 4. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | PayloadHdr (Type=28) | | | | PayloadHdr (Type=28) | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | |||
| | | | | | | |||
| | two or more aggregation units | | | two or more aggregation units | | |||
| | | | | | | |||
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| The Structure of an Aggregation Packet | Figure 4: The Structure of an Aggregation Packet | |||
| Figure 4 | ||||
| The fields in the payload header of an AP are set as follows. The F | The fields in the payload header of an AP are set as follows. The F | |||
| bit MUST be equal to 0 if the F bit of each aggregated NAL unit is | bit MUST be equal to 0 if the F bit of each aggregated NAL unit is | |||
| equal to zero; otherwise, it MUST be equal to 1. The Type field MUST | equal to zero; otherwise, it MUST be equal to 1. The Type field MUST | |||
| be equal to 28. | be equal to 28. | |||
| The value of LayerId MUST be equal to the lowest value of LayerId of | The value of LayerId MUST be equal to the lowest value of LayerId of | |||
| all the aggregated NAL units. The value of TID MUST be the lowest | all the aggregated NAL units. The value of TID MUST be the lowest | |||
| value of TID of all the aggregated NAL units. | value of TID of all the aggregated NAL units. | |||
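The rules for the AP payload header can be sketched as follows. The function name is hypothetical, the field widths are those assumed from Section 1.1.4, and setting the Z bit to 0 is an assumption of this sketch, since the rules above do not mention it.

```python
from typing import List


def ap_payload_header(nal_units: List[bytes]) -> bytes:
    """Derive the 2-octet PayloadHdr (Type=28) for an AP from its NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    headers = [int.from_bytes(n[:2], "big") for n in nal_units]
    # F is 0 only if the F bit of every aggregated NAL unit is 0.
    f = 1 if any((h >> 15) & 0x1 for h in headers) else 0
    # LayerId and TID are the lowest values among the aggregated NAL units.
    layer_id = min((h >> 8) & 0x3F for h in headers)
    tid = min(h & 0x7 for h in headers)
    z = 0  # assumption: reserved bit left at zero
    hdr = (f << 15) | (z << 14) | (layer_id << 8) | (28 << 3) | tid
    return hdr.to_bytes(2, "big")
```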
| Informative note: All VCL NAL units in an AP have the same TID | | Informative note: All VCL NAL units in an AP have the same TID | |||
| value since they belong to the same access unit. However, an AP | | value since they belong to the same access unit. However, an | |||
| may contain non-VCL NAL units for which the TID value in the NAL | | AP may contain non-VCL NAL units for which the TID value in the | |||
| unit header may be different than the TID value of the VCL NAL | | NAL unit header may be different than the TID value of the VCL | |||
| units in the same AP. | | NAL units in the same AP. | |||
| Informative Note: If a system envisions sub-picture level or | | Informative note: If a system envisions subpicture-level or | |||
| picture level modifications, for example by removing sub-pictures | | picture-level modifications, for example, by removing | |||
| or pictures of a particular layer, a good design choice on the | | subpictures or pictures of a particular layer, a good design | |||
| sender's side would be to aggregate NAL units belonging to only | | choice on the sender's side would be to aggregate NAL units | |||
| the same sub-picture or picture of a particular layer. | | belonging to only the same subpicture or picture of a | |||
| | particular layer. | ||||
| An AP MUST carry at least two aggregation units and can carry as many | An AP MUST carry at least two aggregation units and can carry as many | |||
| aggregation units as necessary; however, the total amount of data in | aggregation units as necessary; however, the total amount of data in | |||
| an AP obviously MUST fit into an IP packet, and the size SHOULD be | an AP obviously MUST fit into an IP packet, and the size SHOULD be | |||
| chosen so that the resulting IP packet is smaller than the MTU size | chosen so that the resulting IP packet is smaller than the MTU size | |||
| so to avoid IP layer fragmentation. An AP MUST NOT contain FUs | in order to avoid IP layer fragmentation. An AP MUST NOT contain the | |||
| specified in Section 4.3.3. APs MUST NOT be nested; i.e., an AP can | FUs specified in Section 4.3.3. APs MUST NOT be nested, i.e., an AP | |||
| not contain another AP. | cannot contain another AP. | |||
| The first aggregation unit in an AP consists of a conditional 16-bit | The first aggregation unit in an AP consists of a conditional 16-bit | |||
| DONL field (in network byte order) followed by a 16-bit unsigned size | DONL field (in network byte order), followed by 16 bits of unsigned | |||
| information (in network byte order) that indicates the size of the | size information (in network byte order) that indicate the size of | |||
| NAL unit in bytes (excluding these two octets, but including the NAL | the NAL unit in bytes (excluding these two octets but including the | |||
| unit header), followed by the NAL unit itself, including its NAL unit | NAL unit header), followed by the NAL unit itself, including its NAL | |||
| header, as shown in Figure 5. | unit header, as shown in Figure 5. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | : DONL (conditional) | NALU size | | | : DONL (conditional) | NALU size | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | NALU size | | | | NALU size | | | |||
| +-+-+-+-+-+-+-+-+ NAL unit | | +-+-+-+-+-+-+-+-+ NAL unit | | |||
| | | | | | | |||
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | : | | : | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| The Structure of the First Aggregation Unit in an AP | Figure 5: The Structure of the First Aggregation Unit in an AP | |||
| Figure 5 | ||||
| Informative Note: The first octet of Figure 5 (indicated by the | | Informative note: The first octet of Figure 5 (indicated by the | |||
| first colon) belongs to a previous aggregation unit. It is | | first colon) belongs to a previous aggregation unit. It is | |||
| depicted to emphasize that aggregation units are octet-aligned | | depicted to emphasize that aggregation units are octet aligned | |||
| only. Similarly, the NAL unit carried in the aggregation unit can | | only. Similarly, the NAL unit carried in the aggregation unit | |||
| terminate at the octet boundary. | | can terminate at the octet boundary. | |||
| The DONL field, when present, specifies the value of the 16 least | The DONL field, when present, specifies the value of the 16 least | |||
| significant bits of the decoding order number of the aggregated NAL | significant bits of the decoding order number of the aggregated NAL | |||
| unit. | unit. | |||
| If sprop-max-don-diff is greater than 0, the DONL field MUST be | If sprop-max-don-diff is greater than 0, the DONL field MUST be | |||
| present in an aggregation unit that is the first aggregation unit in | present in an aggregation unit that is the first aggregation unit in | |||
| an AP, and the variable DON for the aggregated NAL unit is derived as | an AP, and the variable DON for the aggregated NAL unit is derived as | |||
| equal to the value of the DONL field, and the variable DON for an | equal to the value of the DONL field, and the variable DON for an | |||
| aggregation unit that is not the first aggregation unit in an AP | aggregation unit that is not the first aggregation unit in an AP- | |||
| aggregated NAL unit is derived as equal to the DON of the preceding | aggregated NAL unit is derived as equal to the DON of the preceding | |||
| aggregated NAL unit in the same AP plus 1 modulo 65536. Otherwise | aggregated NAL unit in the same AP plus 1 modulo 65536. Otherwise | |||
| (sprop-max-don-diff is equal to 0), the DONL field MUST NOT be | (sprop-max-don-diff is equal to 0), the DONL field MUST NOT be | |||
| present in an aggregation unit that is the first aggregation unit in | present in an aggregation unit that is the first aggregation unit in | |||
| an AP. | an AP. | |||
| An aggregation unit that is not the first aggregation unit in an AP | An aggregation unit that is not the first aggregation unit in an AP | |||
| will be followed immediately by a 16-bit unsigned size information | will be followed immediately by 16 bits of unsigned size information | |||
| (in network byte order) that indicates the size of the NAL unit in | (in network byte order) that indicate the size of the NAL unit in | |||
| bytes (excluding these two octets, but including the NAL unit | bytes (excluding these two octets but including the NAL unit header), | |||
| header), followed by the NAL unit itself, including its NAL unit | followed by the NAL unit itself, including its NAL unit header, as | |||
| header, as shown in Figure 6. | shown in Figure 6. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | : NALU size | NAL unit | | | : NALU size | NAL unit | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | |||
| | | | | | | |||
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | : | | : | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| The Structure of an Aggregation Unit That Is Not the First | Figure 6: The Structure of an Aggregation Unit That Is Not the First | |||
| Aggregation Unit in an AP | Aggregation Unit in an AP | |||
| Figure 6 | ||||
| Informative Note: The first octet of Figure 6 (indicated by the | | Informative note: The first octet of Figure 6 (indicated by the | |||
| first colon) belongs to a previous aggregation unit. It is | | first colon) belongs to a previous aggregation unit. It is | |||
| depicted to emphasize that aggregation units are octet-aligned | | depicted to emphasize that aggregation units are octet aligned | |||
| only. Similarly, the NAL unit carried in the aggregation unit can | | only. Similarly, the NAL unit carried in the aggregation unit | |||
| terminate at the octet boundary. | | can terminate at the octet boundary. | |||
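A receiver-side walk over the aggregation units might look like the sketch below. It assumes that RTP padding has been removed and that sprop-max-don-diff is known from signaling; the function name is hypothetical. The DON of the first aggregated NAL unit is taken from DONL, and each following DON is the previous one plus 1 modulo 65536.

```python
from typing import List, Optional, Tuple


def parse_aggregation_packet(
    payload: bytes, sprop_max_don_diff: int
) -> List[Tuple[Optional[int], bytes]]:
    """Return a list of (DON, NAL unit) pairs in NAL unit decoding order."""
    if len(payload) < 2:
        raise ValueError("payload shorter than PayloadHdr")
    if (int.from_bytes(payload[:2], "big") >> 3) & 0x1F != 28:
        raise ValueError("not an aggregation packet (Type != 28)")
    pos = 2
    don: Optional[int] = None
    units: List[Tuple[Optional[int], bytes]] = []
    while pos < len(payload):
        if sprop_max_don_diff > 0:
            if not units:
                # Only the first aggregation unit carries DONL.
                if pos + 2 > len(payload):
                    raise ValueError("truncated DONL field")
                don = int.from_bytes(payload[pos:pos + 2], "big")
                pos += 2
            else:
                don = (don + 1) % 65536
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        size = int.from_bytes(payload[pos:pos + 2], "big")
        pos += 2
        if pos + size > len(payload):
            raise ValueError("NALU size exceeds remaining payload")
        units.append((don, payload[pos:pos + size]))  # includes the NAL unit header
        pos += size
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```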
| Figure 7 presents an example of an AP that contains two aggregation | Figure 7 presents an example of an AP that contains two aggregation | |||
| units, labeled as 1 and 2 in the figure, without the DONL field being | units, labeled as 1 and 2 in the figure, without the DONL field being | |||
| present. | present. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | RTP Header | | | RTP Header | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| skipping to change at page 27, line 26 ¶ | skipping to change at line 1260 ¶ | |||
| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | . . . | NALU 2 Size | NALU 2 HDR | | | . . . | NALU 2 Size | NALU 2 HDR | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | NALU 2 HDR | | | | NALU 2 HDR | | | |||
| +-+-+-+-+-+-+-+-+ NALU 2 Data | | +-+-+-+-+-+-+-+-+ NALU 2 Data | | |||
| | . . . | | | . . . | | |||
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| An Example of an AP Packet Containing | Figure 7: An Example of an AP Packet Containing Two Aggregation | |||
| Two Aggregation Units without the DONL Field | Units without the DONL Field | |||
| Figure 7 | ||||
| Figure 8 presents an example of an AP that contains two aggregation | Figure 8 presents an example of an AP that contains two aggregation | |||
| units, labeled as 1 and 2 in the figure, with the DONL field being | units, labeled as 1 and 2 in the figure, with the DONL field being | |||
| present. | present. | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | RTP Header | | | RTP Header | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| skipping to change at page 28, line 27 ¶ | skipping to change at line 1289 ¶ | |||
| + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | : NALU 2 Size | | | : NALU 2 Size | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | NALU 2 HDR | | | | NALU 2 HDR | | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | | |||
| | | | | | | |||
| | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| An Example of an AP Containing | Figure 8: An Example of an AP Containing Two Aggregation Units | |||
| Two Aggregation Units with the DONL Field | with the DONL Field | |||
| Figure 8 | ||||
| 4.3.3. Fragmentation Units | 4.3.3. Fragmentation Units | |||
| Fragmentation Units (FUs) are introduced to enable fragmenting a | Fragmentation Units (FUs) are introduced to enable fragmenting a | |||
| single NAL unit into multiple RTP packets, possibly without | single NAL unit into multiple RTP packets, possibly without | |||
| cooperation or knowledge of the [VVC] encoder. A fragment of a NAL | cooperation or knowledge of the [VVC] encoder. A fragment of a NAL | |||
| unit consists of an integer number of consecutive octets of that NAL | unit consists of an integer number of consecutive octets of that NAL | |||
| unit. Fragments of the same NAL unit MUST be sent in consecutive | unit. Fragments of the same NAL unit MUST be sent in consecutive | |||
| order with ascending RTP sequence numbers (with no other RTP packets | order with ascending RTP sequence numbers (with no other RTP packets | |||
| within the same RTP stream being sent between the first and last | within the same RTP stream being sent between the first and last | |||
| fragment). | fragment). | |||
| When a NAL unit is fragmented and conveyed within FUs, it is referred | When a NAL unit is fragmented and conveyed within FUs, it is referred | |||
| to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST | to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST | |||
| NOT be nested; i.e., an FU can not contain a subset of another FU. | NOT be nested, i.e., an FU cannot contain a subset of another FU. | |||
| The RTP timestamp of an RTP packet carrying an FU is set to the NALU- | The RTP timestamp of an RTP packet carrying an FU is set to the NALU- | |||
| time of the fragmented NAL unit. | time of the fragmented NAL unit. | |||
| An FU consists of a payload header as defined in Table 5 of [VVC] | An FU consists of a payload header as defined in Table 5 of [VVC] | |||
| (denoted here as PayloadHdr with Type=29), an FU header of one octet, | (denoted here as PayloadHdr with Type=29), an FU header of one octet, | |||
| a conditional 16-bit DONL field (in network byte order), and an FU | a conditional 16-bit DONL field (in network byte order), and an FU | |||
| payload, as shown in Figure 9. | payload (as shown in Figure 9). | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | PayloadHdr (Type=29) | FU header | DONL (cond) | | | PayloadHdr (Type=29) | FU header | DONL (cond) | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| | |||
| | DONL (cond) | | | | DONL (cond) | | | |||
| |-+-+-+-+-+-+-+-+ | | |-+-+-+-+-+-+-+-+ | | |||
| | FU payload | | | FU payload | | |||
| | | | | | | |||
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | :...OPTIONAL RTP padding | | | :...OPTIONAL RTP padding | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| The Structure of an FU | Figure 9: The Structure of an FU | |||
| Figure 9 | ||||
| The fields in the payload header are set as follows. The Type field | The fields in the payload header are set as follows. The Type field | |||
| MUST be equal to 29. The fields F, LayerId, and TID MUST be equal to | MUST be equal to 29. The fields F, LayerId, and TID MUST be equal to | |||
| the fields F, LayerId, and TID, respectively, of the fragmented NAL | the fields F, LayerId, and TID, respectively, of the fragmented NAL | |||
| unit. | unit. | |||
| The FU header consists of an S bit, an E bit, a P bit and a 5-bit | The FU header consists of an S bit, an E bit, a P bit, and a 5-bit | |||
| FuType field, as shown in Figure 10. | FuType field, as shown in Figure 10. | |||
| +---------------+ | +---------------+ | |||
| |0|1|2|3|4|5|6|7| | |0|1|2|3|4|5|6|7| | |||
| +-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
| |S|E|P| FuType | | |S|E|P| FuType | | |||
| +---------------+ | +---------------+ | |||
| The Structure of FU Header | Figure 10: The Structure of the FU Header | |||
| Figure 10 | ||||
| The semantics of the FU header fields are as follows: | The semantics of the FU header fields are as follows: | |||
| S: 1 bit | S: 1 bit | |||
| When set to 1, the S bit indicates the start of a fragmented NAL | When set to 1, the S bit indicates the start of a fragmented NAL | |||
| unit, i.e., the first byte of the FU payload is also the first | unit, i.e., the first byte of the FU payload is also the first | |||
| byte of the payload of the fragmented NAL unit. When the FU | byte of the payload of the fragmented NAL unit. When the FU | |||
| payload is not the start of the fragmented NAL unit payload, the S | payload is not the start of the fragmented NAL unit payload, the S | |||
| bit MUST be set to 0. | bit MUST be set to 0. | |||
| E: 1 bit | E: 1 bit | |||
| When set to 1, the E bit indicates the end of a fragmented NAL | When set to 1, the E bit indicates the end of a fragmented NAL | |||
| unit, i.e., the last byte of the payload is also the last byte of | unit, i.e., the last byte of the payload is also the last byte of | |||
| the fragmented NAL unit. When the FU payload is not the last | the fragmented NAL unit. When the FU payload is not the last | |||
| fragment of a fragmented NAL unit, the E bit MUST be set to 0. | fragment of a fragmented NAL unit, the E bit MUST be set to 0. | |||
| P: 1 bit | P: 1 bit | |||
| When set to 1, the P bit indicates the last FU of the last VCL NAL | When set to 1, the P bit indicates the last FU of the last VCL NAL | |||
| unit of a coded picture, i.e., the last byte of the FU payload is | unit of a coded picture, i.e., the last byte of the FU payload is | |||
| also the last byte of the last VCL NAL unit of the coded picture. | also the last byte of the last VCL NAL unit of the coded picture. | |||
| When the FU payload is not the last fragment of the last VCL NAL | When the FU payload is not the last fragment of the last VCL NAL | |||
| unit of a coded picture, the P bit MUST be set to 0. | unit of a coded picture, the P bit MUST be set to 0. | |||
| FuType: 5 bits | FuType: 5 bits | |||
| The field FuType MUST be equal to the field Type of the fragmented | The field FuType MUST be equal to the field Type of the fragmented | |||
| NAL unit. | NAL unit. | |||
| The DONL field, when present, specifies the value of the 16 least | The DONL field, when present, specifies the value of the 16 least | |||
| significant bits of the decoding order number of the fragmented NAL | significant bits of the decoding order number of the fragmented NAL | |||
| unit. | unit. | |||
| If sprop-max-don-diff is greater than 0, and the S bit is equal to 1, | If sprop-max-don-diff is greater than 0, and the S bit is equal to 1, | |||
| the DONL field MUST be present in the FU, and the variable DON for | the DONL field MUST be present in the FU, and the variable DON for | |||
| the fragmented NAL unit is derived as equal to the value of the DONL | the fragmented NAL unit is derived as equal to the value of the DONL | |||
| field. Otherwise (sprop-max-don-diff is equal to 0, or the S bit is | field. Otherwise (sprop-max-don-diff is equal to 0, or the S bit is | |||
| equal to 0), the DONL field MUST NOT be present in the FU. | equal to 0), the DONL field MUST NOT be present in the FU. | |||
| A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e., | A non-fragmented NAL unit MUST NOT be transmitted in one FU, i.e., | |||
| the Start bit and End bit must not both be set to 1 in the same FU | the Start bit and End bit must not both be set to 1 in the same FU | |||
| header. | header. | |||
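The FU header semantics and the DONL presence rule above can be sketched as a small parser. This is an illustrative sketch, not part of either document version. It assumes that bit 0 in Figure 10 is the most significant bit of the octet, that the DONL field (when present) directly follows the FU header in network byte order, and that the caller has already stripped the two-byte payload header. The names `FuHeader` and `parse_fu` are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class FuHeader(NamedTuple):
    s: bool       # start of a fragmented NAL unit
    e: bool       # end of a fragmented NAL unit
    p: bool       # last FU of the last VCL NAL unit of a coded picture
    fu_type: int  # 5-bit Type of the fragmented NAL unit


def parse_fu(data: bytes, sprop_max_don_diff: int) -> Tuple[FuHeader, Optional[int], bytes]:
    """Split an FU (starting at the FU header octet) into header, DONL, payload.

    Assumption: `data` begins at the FU header, i.e. the payload header
    has already been removed by the caller.
    """
    if len(data) < 1:
        raise ValueError("missing FU header")
    octet = data[0]
    hdr = FuHeader(
        s=bool(octet & 0x80),
        e=bool(octet & 0x40),
        p=bool(octet & 0x20),
        fu_type=octet & 0x1F,
    )
    # A non-fragmented NAL unit must not be sent in one FU.
    if hdr.s and hdr.e:
        raise ValueError("S and E bits are both set")
    offset = 1
    donl = None
    # DONL is present only when sprop-max-don-diff > 0 and S == 1.
    if sprop_max_don_diff > 0 and hdr.s:
        if len(data) < 3:
            raise ValueError("truncated DONL field")
        donl = int.from_bytes(data[1:3], "big")
        offset = 3
    payload = data[offset:]
    if not payload:
        raise ValueError("empty FU payload")
    return hdr, donl, payload
```

A receiver would typically call this once per FU and use the returned DONL, when present, as the DON of the fragmented NAL unit.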
| The FU payload consists of fragments of the payload of the fragmented | The FU payload consists of fragments of the payload of the fragmented | |||
| NAL unit so that if the FU payloads of consecutive FUs, starting with | NAL unit so that, if the FU payloads of consecutive FUs, starting | |||
| an FU with the S bit equal to 1 and ending with an FU with the E bit | with an FU with the S bit equal to 1 and ending with an FU with the E | |||
| equal to 1, are sequentially concatenated, the payload of the | bit equal to 1, are sequentially concatenated, the payload of the | |||
| fragmented NAL unit can be reconstructed. The NAL unit header of the | fragmented NAL unit can be reconstructed. The NAL unit header of the | |||
| fragmented NAL unit is not included as such in the FU payload, but | fragmented NAL unit is not included as such in the FU payload, but | |||
| rather the information of the NAL unit header of the fragmented NAL | rather the information of the NAL unit header of the fragmented NAL | |||
| unit is conveyed in F, LayerId, and TID fields of the FU payload | unit is conveyed in the F, LayerId, and TID fields of the FU payload | |||
| headers of the FUs and the FuType field of the FU header of the FUs. | headers of the FUs and the FuType field of the FU header of the FUs. | |||
| An FU payload MUST NOT be empty. | An FU payload MUST NOT be empty. | |||
| If an FU is lost, the receiver SHOULD discard all following | If an FU is lost, the receiver SHOULD discard all following | |||
| fragmentation units in transmission order corresponding to the same | fragmentation units in transmission order, corresponding to the same | |||
| fragmented NAL unit, unless the decoder in the receiver is known to | fragmented NAL unit, unless the decoder in the receiver is known to | |||
| be prepared to gracefully handle incomplete NAL units. | be prepared to gracefully handle incomplete NAL units. | |||
| A receiver in an endpoint or in a MANE MAY aggregate the first n-1 | A receiver in an endpoint or in a MANE MAY aggregate the first n-1 | |||
| fragments of a NAL unit to an (incomplete) NAL unit, even if fragment | fragments of a NAL unit to an (incomplete) NAL unit, even if fragment | |||
| n of that NAL unit is not received. In this case, the | n of that NAL unit is not received. In this case, the | |||
| forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a | forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a | |||
| syntax violation. | syntax violation. | |||
| 4.4. Decoding Order Number | 4.4. Decoding Order Number | |||
| skipping to change at page 32, line 20 ¶ | skipping to change at line 1445 ¶ | |||
| If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), | If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), | |||
| AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] | AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] | |||
| If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), | If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), | |||
| AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n]) | AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n]) | |||
| If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), | If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), | |||
| AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) | AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) | |||
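The wrap-around cases above can be written as a single function. The following is a sketch only. The last three branches follow the cases shown here; the first two branches (equal DON values, and a forward step smaller than 32768) fall in the portion skipped by this comparison and are assumed to behave as shown. The function name `next_abs_don` is hypothetical.

```python
def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1], and DON[n] (16-bit DONs)."""
    if don == prev_don:
        # Assumed case (not shown in this excerpt): no change.
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        # Assumed case (not shown in this excerpt): small forward step.
        return prev_abs_don + (don - prev_don)
    if don < prev_don and prev_don - don >= 32768:
        # Forward step that wrapped around 65535 -> 0.
        return prev_abs_don + 65536 - prev_don + don
    if don > prev_don and don - prev_don >= 32768:
        # Backward step that wrapped around 0 -> 65535.
        return prev_abs_don - (prev_don + 65536 - don)
    # Remaining case: don < prev_don and prev_don - don < 32768,
    # a small backward step.
    return prev_abs_don - (prev_don - don)
```

For example, a DON that moves from 65535 to 0 is treated as a forward step of one, so AbsDon keeps increasing across the 16-bit wrap.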
| For any two NAL units m and n, the following applies: | For any two NAL units (m and n), the following applies: | |||
| * AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows | * When AbsDon[n] is greater than AbsDon[m], this indicates that NAL | |||
| NAL unit m in NAL unit decoding order. | unit n follows NAL unit m in NAL unit decoding order. | |||
| * When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order | * When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order | |||
| of the two NAL units can be in either order. | of the two NAL units can be in either order. | |||
| * AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes | * When AbsDon[n] is less than AbsDon[m], this indicates that NAL | |||
| NAL unit m in decoding order. | unit n precedes NAL unit m in decoding order. | |||
| Informative note: When two consecutive NAL units in the NAL | | Informative note: When two consecutive NAL units in the NAL | |||
| unit decoding order have different values of AbsDon, the | | unit decoding order have different values of AbsDon, the | |||
| absolute difference between the two AbsDon values may be | | absolute difference between the two AbsDon values may be | |||
| greater than or equal to 1. | | greater than or equal to 1. | |||
| Informative note: There are multiple reasons to allow for the | | Informative note: There are multiple reasons to allow for the | |||
| absolute difference of the values of AbsDon for two consecutive | | absolute difference of the values of AbsDon for two consecutive | |||
| NAL units in the NAL unit decoding order to be greater than | | NAL units in the NAL unit decoding order to be greater than | |||
| one. An increment by one is not required, as at the time of | | one. An increment by one is not required, as at the time of | |||
| associating values of AbsDon to NAL units, it may not be known | | associating values of AbsDon to NAL units, it may not be known | |||
| whether all NAL units are to be delivered to the receiver. For | | whether all NAL units are to be delivered to the receiver. For | |||
| example, a gateway might not forward VCL NAL units of higher | | example, a gateway might not forward VCL NAL units of higher | |||
| sublayers or some SEI NAL units when there is congestion in the | | sublayers or some SEI NAL units when there is congestion in the | |||
| network. In another example, the first intra-coded picture of | | network. In another example, the first intra-coded picture of | |||
| a pre-encoded clip is transmitted in advance to ensure that it | | a pre-encoded clip is transmitted in advance to ensure that it | |||
| is readily available in the receiver, and when transmitting the | | is readily available in the receiver, and when transmitting the | |||
| first intra-coded picture, the originator does not exactly know | | first intra-coded picture, the originator does not exactly know | |||
| how many NAL units will be encoded before the first intra-coded | | how many NAL units will be encoded before the first intra-coded | |||
| picture of the pre-encoded clip follows in decoding order. | | picture of the pre-encoded clip follows in decoding order. | |||
| Thus, the values of AbsDon for the NAL units of the first | | Thus, the values of AbsDon for the NAL units of the first | |||
| intra-coded picture of the pre-encoded clip have to be | | intra-coded picture of the pre-encoded clip have to be | |||
| estimated when they are transmitted, and gaps in values of | | estimated when they are transmitted, and gaps in values of | |||
| AbsDon may occur. | | AbsDon may occur. | |||
| 5. Packetization Rules | 5. Packetization Rules | |||
| The following packetization rules apply: | The following packetization rules apply: | |||
| * If sprop-max-don-diff is greater than 0, the transmission order of | * If sprop-max-don-diff is greater than 0, the transmission order of | |||
| NAL units carried in the RTP stream MAY be different than the NAL | NAL units carried in the RTP stream MAY be different than the NAL | |||
| unit decoding order. Otherwise (sprop-max-don-diff is equal to | unit decoding order. Otherwise (sprop-max-don-diff is equal to | |||
| 0), the transmission order of NAL units carried in the RTP stream | 0), the transmission order of NAL units carried in the RTP stream | |||
| MUST be the same as the NAL unit decoding order. | MUST be the same as the NAL unit decoding order. | |||
| * A NAL unit of a small size SHOULD be encapsulated in an | * A NAL unit of a small size SHOULD be encapsulated in an | |||
| aggregation packet together with one or more other NAL units in | aggregation packet together with one or more other NAL units in | |||
| order to avoid the unnecessary packetization overhead for small | order to avoid the unnecessary packetization overhead for small | |||
| NAL units. For example, non-VCL NAL units such as access unit | NAL units. For example, non-VCL NAL units, such as access unit | |||
| delimiters, parameter sets, or SEI NAL units are typically small | delimiters, parameter sets, or SEI NAL units, are typically small | |||
| and can often be aggregated with VCL NAL units without violating | and can often be aggregated with VCL NAL units without violating | |||
| MTU size constraints. | MTU size constraints. | |||
| * Each non-VCL NAL unit SHOULD, when possible from an MTU size match | * Each non-VCL NAL unit SHOULD, when possible from an MTU size match | |||
| viewpoint, be encapsulated in an aggregation packet together with | viewpoint, be encapsulated in an aggregation packet together with | |||
| its associated VCL NAL unit, as typically a non-VCL NAL unit would | its associated VCL NAL unit, as typically a non-VCL NAL unit would | |||
| be meaningless without the associated VCL NAL unit being | be meaningless without the associated VCL NAL unit being | |||
| available. | available. | |||
| * For carrying exactly one NAL unit in an RTP packet, a single NAL | * For carrying exactly one NAL unit in an RTP packet, a single NAL | |||
| skipping to change at page 34, line 22 ¶ | skipping to change at line 1524 ¶ | |||
| the following description should be seen as an example of a suitable | the following description should be seen as an example of a suitable | |||
| implementation. Other schemes may be used as well, as long as the | implementation. Other schemes may be used as well, as long as the | |||
| output for the same input is the same as the process described below. | output for the same input is the same as the process described below. | |||
| The output is the same when the set of output NAL units and their | The output is the same when the set of output NAL units and their | |||
| order are both identical. Optimizations relative to the described | order are both identical. Optimizations relative to the described | |||
| algorithms are possible. | algorithms are possible. | |||
| All normal RTP mechanisms related to buffer management apply. In | All normal RTP mechanisms related to buffer management apply. In | |||
| particular, duplicated or outdated RTP packets (as indicated by the | particular, duplicated or outdated RTP packets (as indicated by the | |||
| RTP sequence number and the RTP timestamp) are removed. To determine | RTP sequence number and the RTP timestamp) are removed. To determine | |||
| the exact time for decoding, factors such as a possible intentional | the exact time for decoding, factors, such as a possible intentional | |||
| delay to allow for proper inter-stream synchronization MUST be | delay to allow for proper inter-stream synchronization, MUST be | |||
| factored in. | factored in. | |||
| NAL units with NAL unit type values in the range of 0 to 27, | NAL units with NAL unit type values in the range of 0 to 27, | |||
| inclusive, may be passed to the decoder. NAL-unit-like structures | inclusive, may be passed to the decoder. NAL-unit-like structures | |||
| with NAL unit type values in the range of 28 to 31, inclusive, MUST | with NAL unit type values in the range of 28 to 31, inclusive, MUST | |||
| NOT be passed to the decoder. | NOT be passed to the decoder. | |||
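The type-range rule above amounts to a one-line filter. This is a minimal sketch; the function name `may_pass_to_decoder` is hypothetical, and the caller is assumed to have already extracted the 5-bit NAL unit type.

```python
def may_pass_to_decoder(nal_unit_type: int) -> bool:
    """Return True for NAL unit types 0..27, False for 28..31."""
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("NAL unit type is a 5-bit value")
    # Types 28..31 are NAL-unit-like structures and are never
    # handed to the decoder.
    return nal_unit_type <= 27
```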
| The receiver includes a receiver buffer, which is used to compensate | The receiver includes a receiver buffer, which is used to compensate | |||
| for transmission delay jitter within individual RTP stream, and to | for transmission delay jitter within individual RTP streams and to | |||
| reorder NAL units from transmission order to the NAL unit decoding | reorder NAL units from transmission order to the NAL unit decoding | |||
| order. In this section, the receiver operation is described under | order. In this section, the receiver operation is described under | |||
| the assumption that there is no transmission delay jitter within an | the assumption that there is no transmission delay jitter within an | |||
| RTP stream. To make a difference from a practical receiver buffer | RTP stream. To make a difference from a practical receiver buffer | |||
| that is also used for compensation of transmission delay jitter, the | that is also used for compensation of transmission delay jitter, the | |||
| receiver buffer is hereafter called the de-packetization buffer in | receiver buffer is hereafter called the de-packetization buffer in | |||
| this section. Receivers should also prepare for transmission delay | this section. Receivers should also prepare for transmission delay | |||
| jitter; that is, either reserve separate buffers for transmission | jitter, that is, either reserve separate buffers for transmission | |||
| delay jitter buffering and de-packetization buffering or use a | delay jitter buffering and de-packetization buffering or use a | |||
| receiver buffer for both transmission delay jitter and de- | receiver buffer for both transmission delay jitter and de- | |||
| packetization. Moreover, receivers should take transmission delay | packetization. Moreover, receivers should take transmission delay | |||
| jitter into account in the buffering operation, e.g., by additional | jitter into account in the buffering operation, e.g., by additional | |||
| initial buffering before starting of decoding and playback. | initial buffering before starting of decoding and playback. | |||
| The de-packetization process extracts the NAL units from the RTP | The de-packetization process extracts the NAL units from the RTP | |||
| packets in an RTP stream as follows. When an RTP packet carries a | packets in an RTP stream as follows. When an RTP packet carries a | |||
| single NAL unit packet, the payload of the RTP packet is extracted as | single NAL unit packet, the payload of the RTP packet is extracted as | |||
| a single NAL unit, excluding the DONL field, i.e., third and fourth | a single NAL unit, excluding the DONL field, i.e., third and fourth | |||
| bytes, when sprop-max-don-diff is greater than 0. When an RTP packet | bytes, when sprop-max-don-diff is greater than 0. When an RTP packet | |||
| carries an Aggregation Packet, several NAL units are extracted from | carries an aggregation packet, several NAL units are extracted from | |||
| the payload of the RTP packet. In this case, each NAL unit | the payload of the RTP packet. In this case, each NAL unit | |||
| corresponds to the part of the payload of each aggregation unit that | corresponds to the part of the payload of each aggregation unit that | |||
| follows the NALU size field as described in Section 4.3.2. When an | follows the NALU size field, as described in Section 4.3.2. When an | |||
| RTP packet carries a Fragmentation Unit (FU), all RTP packets from | RTP packet carries a Fragmentation Unit (FU), all RTP packets from | |||
| the first FU (with the S field equal to 1) of the fragmented NAL unit | the first FU (with the S field equal to 1) of the fragmented NAL unit | |||
| up to the last FU (with the E field equal to 1) of the fragmented NAL | up to the last FU (with the E field equal to 1) of the fragmented NAL | |||
| unit are collected. The NAL unit is extracted from these RTP packets | unit are collected. The NAL unit is extracted from these RTP packets | |||
| by concatenating all FU payloads in the same order as the | by concatenating all FU payloads in the same order as the | |||
| corresponding RTP packets and prepending the NAL unit header with the | corresponding RTP packets and prepending the NAL unit header with the | |||
| fields F, LayerId, and TID, set to equal to the values of the fields | fields F, LayerId, and TID set to equal the values of the fields F, | |||
| F, LayerId, and TID in the payload header of the FUs respectively, | LayerId, and TID in the payload header of the FUs, respectively, and | |||
| and with the NAL unit type set equal to the value of the field FuType | with the NAL unit type set equal to the value of the field FuType in | |||
| in the FU header of the FUs, as described in Section 4.3.3. | the FU header of the FUs, as described in Section 4.3.3. | |||
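The FU branch of the de-packetization process can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation. It assumes the two-byte VVC NAL unit header layout F (1 bit), Z (1 bit), LayerId (6 bits), Type (5 bits), TID (3 bits), which is defined outside this excerpt, and it assumes the FUs are already in RTP order with any DONL field removed. The rebuilt header is placed in front of the concatenated FU payloads. The name `reassemble_nal_unit` and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Each FU is (payload_header: 2 bytes, fu_header_octet: int, fu_payload: bytes).
Fu = Tuple[bytes, int, bytes]


def reassemble_nal_unit(fus: List[Fu]) -> bytes:
    """Rebuild a fragmented NAL unit from its FUs, given in RTP order."""
    if not fus:
        raise ValueError("no FUs given")
    first_hdr, first_fu, _ = fus[0]
    if not first_fu & 0x80:
        raise ValueError("first FU does not have the S bit set")
    if not fus[-1][1] & 0x40:
        raise ValueError("last FU does not have the E bit set")
    fu_type = first_fu & 0x1F
    body = bytearray()
    for _payload_hdr, fu_octet, fu_payload in fus:
        if fu_octet & 0x1F != fu_type:
            raise ValueError("FuType differs between FUs")
        if not fu_payload:
            raise ValueError("empty FU payload")
        body += fu_payload
    # Byte 0: F and LayerId copied from the payload header; the
    # reserved Z bit (0x40) is written as zero (assumed layout).
    byte0 = first_hdr[0] & 0xBF
    # Byte 1: Type taken from FuType, TID copied from the payload header.
    byte1 = (fu_type << 3) | (first_hdr[1] & 0x07)
    return bytes([byte0, byte1]) + bytes(body)
```

A caller that detects a missing FU would normally drop the collected fragments instead of calling this function, in line with the loss handling described for FUs.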
| When sprop-max-don-diff is equal to 0, the de-packetization buffer | When sprop-max-don-diff is equal to 0, the de-packetization buffer | |||
| size is zero bytes, and the NAL units carried in the single RTP | size is zero bytes, and the NAL units carried in the single RTP | |||
| stream are directly passed to the decoder in their transmission | stream are directly passed to the decoder in their transmission | |||
| order, which is identical to their decoding order. | order, which is identical to their decoding order. | |||
| When sprop-max-don-diff is greater than 0, the process described in | When sprop-max-don-diff is greater than 0, the process described in | |||
| the remainder of this section applies. | the remainder of this section applies. | |||
| There are two buffering states in the receiver: initial buffering and | There are two buffering states in the receiver: initial buffering and | |||
| skipping to change at page 35, line 47 ¶ | skipping to change at line 1598 ¶ | |||
| Initial buffering lasts until the difference between the greatest and | Initial buffering lasts until the difference between the greatest and | |||
| smallest AbsDon values of the NAL units in the de-packetization | smallest AbsDon values of the NAL units in the de-packetization | |||
| buffer is greater than or equal to the value of sprop-max-don-diff. | buffer is greater than or equal to the value of sprop-max-don-diff. | |||
| After initial buffering, whenever the difference between the greatest | After initial buffering, whenever the difference between the greatest | |||
| and smallest AbsDon values of the NAL units in the de-packetization | and smallest AbsDon values of the NAL units in the de-packetization | |||
| buffer is greater than or equal to the value of sprop-max-don-diff, | buffer is greater than or equal to the value of sprop-max-don-diff, | |||
| the following operation is repeatedly applied until this difference | the following operation is repeatedly applied until this difference | |||
| is smaller than sprop-max-don-diff: | is smaller than sprop-max-don-diff: | |||
| * The NAL unit in the de-packetization buffer with the smallest | The NAL unit in the de-packetization buffer with the smallest | |||
| value of AbsDon is removed from the de-packetization buffer and | value of AbsDon is removed from the de-packetization buffer and | |||
| passed to the decoder. | passed to the decoder. | |||
| When no more NAL units are flowing into the de-packetization buffer, | When no more NAL units are flowing into the de-packetization buffer, | |||
| all NAL units remaining in the de-packetization buffer are removed | all NAL units remaining in the de-packetization buffer are removed | |||
| from the buffer and passed to the decoder in the order of increasing | from the buffer and passed to the decoder in the order of increasing | |||
| AbsDon values. | AbsDon values. | |||
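The buffering rule above maps naturally onto a min-heap keyed by AbsDon. The sketch below is one possible shape, under these assumptions: sprop-max-don-diff is greater than 0, each NAL unit arrives with its AbsDon already derived, and initial buffering and the steady state are merged because the visible text uses the same threshold for both. Any further condition in the part of the section skipped by this comparison is not modeled. The class name `DepacketizationBuffer` and its methods are hypothetical.

```python
import heapq
from itertools import count
from typing import Any, List, Optional


class DepacketizationBuffer:
    def __init__(self, sprop_max_don_diff: int) -> None:
        if sprop_max_don_diff <= 0:
            raise ValueError("buffering applies only when sprop-max-don-diff > 0")
        self._limit = sprop_max_don_diff
        self._heap: list = []   # entries are (abs_don, arrival_index, nal_unit)
        self._arrival = count() # tie-breaker for equal AbsDon values
        self._max: Optional[int] = None

    def push(self, abs_don: int, nal_unit: Any) -> List[Any]:
        """Insert one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, next(self._arrival), nal_unit))
        self._max = abs_don if self._max is None else max(self._max, abs_don)
        released = []
        # Remove the smallest AbsDon until the spread drops below the limit.
        while self._heap and self._max - self._heap[0][0] >= self._limit:
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self) -> List[Any]:
        """No more input: drain everything in increasing AbsDon order."""
        released = [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]
        self._max = None
        return released
```

NAL units with equal AbsDon are released in arrival order here, which is one of the orders the text permits. Because the limit is positive, the loop in `push` always leaves at least the NAL unit with the greatest AbsDon in the buffer, so tracking a running maximum is sufficient.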
| 7. Payload Format Parameters | 7. Payload Format Parameters | |||
| This section specifies the optional parameters. A mapping of the | This section specifies the optional parameters. A mapping of the | |||
| parameters with Session Description Protocol (SDP) [RFC4556] is also | parameters with Session Description Protocol (SDP) [RFC8866] is also | |||
| provided for applications that use SDP. | provided for applications that use SDP. | |||
| Parameters starting with the string "sprop" for stream properties can | Parameters starting with the string "sprop" for stream properties can | |||
| be used by a sender to provide a receiver with the properties of the | be used by a sender to provide a receiver with the properties of the | |||
| stream that is or will be sent. The media sender (and not the | stream that is or will be sent. The media sender (and not the | |||
| receiver) selects whether, and with what values, "sprop" parameters | receiver) selects whether, and with what values, "sprop" parameters | |||
| are being sent. This uncommon characteristic of the "sprop" | are being sent. This uncommon characteristic of the "sprop" | |||
| parameters may not be intuitive in the context of some signaling | parameters may not be intuitive in the context of some signaling | |||
| protocol concepts, especially with offer/answer. Please see | protocol concepts, especially with offer/answer. Please see | |||
| Section 7.3.2 for guidance specific to the use of sprop parameters in | Section 7.3.2 for guidance specific to the use of sprop parameters in | |||
| the Offer/Answer case. | the offer/answer case. | |||
| 7.1. Media Type Registration | 7.1. Media Type Registration | |||
| The receiver MUST ignore any parameter unspecified in this memo. | The receiver MUST ignore any parameter unspecified in this memo. | |||
| Type name: video | Type name: video | |||
| Subtype name: H266 | Subtype name: H266 | |||
| Required parameters: N/A | Required parameters: N/A | |||
| Optional parameters: | Optional parameters: profile-id, tier-flag, sub-profile-id, interop- | |||
| constraints, level-id, sprop-sublayer-id, sprop-ols-id, recv- | ||||
| profile-id, tier-flag, sub-profile-id, interop-constraints, level- | sublayer-id, recv-ols-id, max-recv-level-id, sprop-dci, sprop-vps, | |||
| id, sprop-sublayer-id, sprop-ols-id, recv-sublayer-id, recv-ols- | sprop-sps, sprop-pps, sprop-sei, max-lsr, max-fps, sprop-max-don- | |||
| id, max-recv-level-id, sprop-dci, sprop-vps, sprop-sps, sprop-pps, | diff, sprop-depack-buf-bytes, depack-buf-cap (refer to Section 7.2 | |||
| sprop-sei, max-lsr, max-fps, sprop-max-don-diff, sprop-depack-buf- | for definitions). | |||
| bytes, depack-buf-cap (Refer to Section 7.2 for definitions). | ||||
| Encoding considerations: | ||||
| This type is only defined for transfer via RTP (RFC 3550). | ||||
| Security considerations: | ||||
| See Section 9 of RFC XXXX. | Encoding considerations: This type is only defined for transfer via | |||
| RTP [RFC3550]. | ||||
| Interoperability considerations: N/A | Security considerations: See Section 9 of RFC 9328. | |||
| Published specification: | ||||
| Please refer to RFC XXXX and VVC coding specification [VVC]. | Interoperability considerations: N/A | |||
| Applications that use this media type: | Published specification: Please refer to RFC 9328 and VVC coding | |||
| specification [VVC]. | ||||
| Any application that relies on VVC-based video services over RTP | Applications that use this media type: Any application that relies | |||
| on VVC-based video services over RTP | ||||
| Fragment identifier considerations: N/A | Fragment identifier considerations: N/A | |||
| Additional information: N/A | Additional information: N/A | |||
| Person & email address to contact for further information: | Person & email address to contact for further information: | |||
| Stephan Wenger (stewe@stewe.org) | Stephan Wenger (stewe@stewe.org) | |||
| Intended usage: COMMON | Intended usage: COMMON | |||
| Restrictions on usage: N/A | ||||
| Author: See Authors' Addresses section of RFC XXXX. | Restrictions on usage: N/A | |||
| Change controller: | Author: See Authors' Addresses section of RFC 9328. | |||
| IETF <avtcore@ietf.org> | Change controller: IETF <avtcore@ietf.org> | |||
| 7.2. Optional Parameters Definition | 7.2. Optional Parameters Definition | |||
| profile-id, tier-flag, sub-profile-id, interop-constraints, and | profile-id, tier-flag, sub-profile-id, interop-constraints, and | |||
| level-id: | level-id: | |||
| These parameters indicate the profile, the tier, the default | ||||
| These parameters indicate the profile, tier, default level, sub- | level, the sub-profile, and some constraints of the bitstream | |||
| profile, and some constraints of the bitstream carried by the RTP | carried by the RTP stream, or a specific set of the profile, the | |||
| stream, or a specific set of the profile, tier, default level, | tier, the default level, the sub-profile, and some constraints the | |||
| sub-profile and some constraints the receiver supports. | receiver supports. | |||
| The subset of coding tools that may have been used to generate the | The subset of coding tools that may have been used to generate the | |||
| bitstream or that the receiver supports, as well as some | bitstream or that the receiver supports, as well as some | |||
| additional constraints are indicated collectively by profile-id, | additional constraints, are indicated collectively by profile-id, | |||
| sub-profile-id, and interop-constraints. | sub-profile-id, and interop-constraints. | |||
| Informative note: There are 128 values of profile-id. The | | Informative note: There are 128 values of profile-id. The | |||
| subset of coding tools identified by the profile-id can be | | subset of coding tools identified by profile-id can be | |||
| further constrained with up to 255 instances of sub-profile-id. | | further constrained with up to 255 instances of sub-profile- | |||
| In addition, 68 bits included in interop-constraints, which can | | id. In addition, 68 bits included in interop-constraints, | |||
| be extended up to 324 bits provide means to further restrict | | which can be extended up to 324 bits, provide means to | |||
| tools from existing profiles. To be able to support this fine- | | further restrict tools from existing profiles. To be able | |||
| granular signaling of coding tool subsets with profile-id, sub- | | to support this fine-granular signaling of coding-tool | |||
| profile-id and interop-constraints, it would be safe to require | | subsets with profile-id, sub-profile-id, and interop- | |||
| symmetric use of these parameters in SDP offer/answer unless | | constraints, it would be safe to require symmetric use of | |||
| recv-ols-id is included in the SDP answer for choosing one of | | these parameters in SDP offer/answer unless recv-ols-id is | |||
| the layers offered. | | included in the SDP answer for choosing one of the layers | |||
| | offered. | ||||
| The tier is indicated by tier-flag. The default level is | The tier is indicated by tier-flag. The default level is | |||
| indicated by level-id. The tier and the default level specify the | indicated by level-id. The tier and the default level specify the | |||
| limits on values of syntax elements or arithmetic combinations of | limits on values of syntax elements or arithmetic combinations of | |||
| values of syntax elements that are followed when generating the | values of syntax elements that are followed when generating the | |||
| bitstream or that the receiver supports. | bitstream or that the receiver supports. | |||
| In SDP offer/answer, when the SDP answer does not include the | In SDP offer/answer, when the SDP answer does not include the | |||
| recv-ols-id parameter that is less than the sprop-ols-id parameter | recv-ols-id parameter that is less than the sprop-ols-id parameter | |||
| in the SDP offer, the following applies: | in the SDP offer, the following applies: | |||
| - The tier-flag, profile-id, sub-profile-id, and interop- | * The tier-flag, profile-id, sub-profile-id, and interop- | |||
| constraints parameters MUST be used symmetrically, i.e., the | constraints parameters MUST be used symmetrically, i.e., the | |||
| value of each of these parameters in the offer MUST be the same | value of each of these parameters in the offer MUST be the same | |||
| as that in the answer, either explicitly signaled or implicitly | as that in the answer, either explicitly signaled or implicitly | |||
| inferred. | inferred. | |||
| - The level-id parameter is changeable as long as the highest | * The level-id parameter is changeable as long as the highest | |||
| level indicated by the answer is either equal to or lower than | level indicated by the answer is either equal to or lower than | |||
| that in the offer. Note that a highest level higher than | that in the offer. Note that the highest level higher than | |||
| level-id in the offer for receiving can be included as max- | level-id in the offer for receiving can be included as max- | |||
| recv-level-id. | recv-level-id. | |||
| In SDP offer/answer, when the SDP answer does include the recv- | In SDP offer/answer, when the SDP answer does include the recv- | |||
| ols-id parameter that is less than the sprop-ols-id parameter | ols-id parameter that is less than the sprop-ols-id parameter in | |||
| in the SDP offer, the set of tier-flag, profile-id, sub- | the SDP offer, the set of tier-flag, profile-id, sub-profile-id, | |||
| profile-id, interop-constraints, and level-id parameters | interop-constraints, and level-id parameters included in the | |||
| included in the answer MUST be consistent with that for the | answer MUST be consistent with that for the chosen output layer | |||
| chosen output layer set as indicated in the SDP offer, with the | set as indicated in the SDP offer, with the exception that the | |||
| exception that the level-id parameter in the SDP answer is | level-id parameter in the SDP answer is changeable as long as the | |||
| changeable as long as the highest level indicated by the answer | highest level indicated by the answer is either lower than or | |||
| is either lower than or equal to that in the offer. | equal to that in the offer. | |||
| More specifications of these parameters, including how they relate | More specifications of these parameters, including how they relate | |||
| to syntax elements specified in [VVC] are provided below. | to syntax elements specified in [VVC], are provided below. | |||
| profile-id: | profile-id: | |||
| When profile-id is not present, a value of 1 (i.e., the Main 10 | When profile-id is not present, a value of 1 (i.e., the Main 10 | |||
| profile) MUST be inferred. | profile) MUST be inferred. | |||
| When used to indicate properties of a bitstream, profile-id is | When used to indicate properties of a bitstream, profile-id is | |||
| derived from the general_profile_idc syntax element that applies | derived from the general_profile_idc syntax element that applies | |||
| to the bitstream in an instance of the profile_tier_level( ) | to the bitstream in an instance of the profile_tier_level( ) | |||
| syntax structure. | syntax structure. | |||
| VVC bitstreams transported over RTP using the technologies of this | VVC bitstreams transported over RTP using the technologies of this | |||
| memo SHOULD contain only a single profile_tier_level( ) structure | memo SHOULD contain only a single profile_tier_level( ) structure | |||
| in the DCI, unless the sender can assure that a receiver can | in the DCI, unless the sender can assure that a receiver can | |||
| correctly decode the VVC bitstream regardless of which | correctly decode the VVC bitstream, regardless of which | |||
| profile_tier_level( ) structure contained in the DCI was used for | profile_tier_level( ) structure contained in the DCI was used for | |||
| deriving profile-id and other parameters for the SDP O/A exchange. | deriving profile-id and other parameters for the SDP offer/answer | |||
| exchange. | ||||
| As specified in [VVC], a profile_tier_level( ) syntax structure | As specified in [VVC], a profile_tier_level( ) syntax structure | |||
| may be contained in an SPS NAL unit, and one or more | may be contained in an SPS NAL unit, and one or more | |||
| profile_tier_level( ) syntax structures may be contained in a VPS | profile_tier_level( ) syntax structures may be contained in a VPS | |||
| NAL unit and in a DCI NAL unit. One of the following three cases | NAL unit and in a DCI NAL unit. One of the following three cases | |||
| applies to the container NAL unit of the profile_tier_level( ) | applies to the container NAL unit of the profile_tier_level( ) | |||
| syntax structure containing syntax elements used to derive the | syntax structure containing syntax elements used to derive the | |||
| values of profile-id, tier-flag, level-id, sub-profile-id, or | values of profile-id, tier-flag, level-id, sub-profile-id, or | |||
| interop-constraints: 1) The container NAL unit is an SPS, the | interop-constraints: | |||
| bitstream is a single-layer bitstream, and the profile_tier_level( | ||||
| ) syntax structures in all SPSs referenced by the CVSs in the | 1. The container NAL unit is an SPS, the bitstream is a single- | |||
| bitstream has the same values respectively for those | layer bitstream, and the profile_tier_level( ) syntax | |||
| profile_tier_level( ) syntax elements; 2) The container NAL unit | structures in all SPSs referenced by the CVSs in the bitstream | |||
| is a VPS, the profile_tier_level( ) syntax structure is the one in | have the same values respectively for those | |||
| the VPS that applies to the OLS corresponding to the bitstream, | profile_tier_level( ) syntax elements. | |||
| and the profile_tier_level( ) syntax structures applicable to the | ||||
| OLS corresponding to the bitstream in all VPSs referenced by the | 2. The container NAL unit is a VPS, the profile_tier_level( ) | |||
| CVSs in the bitstream have the same values respectively for those | syntax structure is the one in the VPS that applies to the OLS | |||
| profile_tier_level( ) syntax elements; 3) The container NAL unit | corresponding to the bitstream, and the profile_tier_level( ) | |||
| is a DCI NAL unit and the profile_tier_level( ) syntax structures | syntax structures applicable to the OLS corresponding to the | |||
| in all DCI NAL units in the bitstream has the same values | bitstream in all VPSs referenced by the CVSs in the bitstream | |||
| respectively for those profile_tier_level( ) syntax elements. | have the same values respectively for those | |||
| profile_tier_level( ) syntax elements. | ||||
| 3. The container NAL unit is a DCI NAL unit, and the | ||||
| profile_tier_level( ) syntax structures in all DCI NAL units | ||||
| in the bitstream have the same values respectively for those | ||||
| profile_tier_level( ) syntax elements. | ||||
| [VVC] allows for multiple profile_tier_level( ) structures in a | [VVC] allows for multiple profile_tier_level( ) structures in a | |||
| DCI NAL unit, which may contain different values for the syntax | DCI NAL unit, which may contain different values for the syntax | |||
| elements used to derive the values of profile-id, tier-flag, | elements used to derive the values of profile-id, tier-flag, | |||
| level-id, sub-profile-id, or interop-constraints in the different | level-id, sub-profile-id, or interop-constraints in the different | |||
| entries. However, this memo defines only a single profile-id, | entries. However, this memo defines only a single profile-id, | |||
| tier-flag, level-id, sub-profile-id, and interop-constraints. When | tier-flag, level-id, sub-profile-id, and interop-constraints. When | |||
| these parameters are signaled and a DCI NAL unit is present with | these parameters are signaled and a DCI NAL unit is present with | |||
| multiple profile_tier_level( ) structures, these values SHOULD be | multiple profile_tier_level( ) structures, these values SHOULD be | |||
| the same as the first profile_tier_level structure in the DCI, | the same as the first profile_tier_level structure in the DCI, | |||
| unless the sender has ensured that the receiver can decode the | unless the sender has ensured that the receiver can decode the | |||
| bitstream when a different value is chosen. | bitstream when a different value is chosen. | |||
| tier-flag, level-id: | tier-flag, level-id: | |||
| The value of tier-flag MUST be in the range of 0 to 1, inclusive. | The value of tier-flag MUST be in the range of 0 to 1, inclusive. | |||
| The value of level-id MUST be in the range of 0 to 255, inclusive. | The value of level-id MUST be in the range of 0 to 255, inclusive. | |||
| If the tier-flag and level-id parameters are used to indicate | If the tier-flag and level-id parameters are used to indicate | |||
| properties of a bitstream, they indicate the tier and the highest | properties of a bitstream, they indicate the tier and the highest | |||
| level the bitstream complies with. | level the bitstream complies with. | |||
| If the tier-flag and level-id parameters are used for capability | If the tier-flag and level-id parameters are used for capability | |||
| exchange, the following applies. If max-recv-level-id is not | exchange, the following applies. If max-recv-level-id is not | |||
| present, the default level defined by level-id indicates the | present, the default level defined by level-id indicates the | |||
| highest level the codec wishes to support. Otherwise, max-recv- | highest level the codec wishes to support. Otherwise, max-recv- | |||
| level-id indicates the highest level the codec supports for | level-id indicates the highest level the codec supports for | |||
| receiving. For either receiving or sending, all levels that are | receiving. For either receiving or sending, all levels that are | |||
| lower than the highest level supported MUST also be supported. | lower than the highest level supported MUST also be supported. | |||
| If no tier-flag is present, a value of 0 MUST be inferred; if no | If no tier-flag is present, a value of 0 MUST be inferred; if no | |||
| level-id is present, a value of 51 (i.e., level 3.1) MUST be | level-id is present, a value of 51 (i.e., level 3.1) MUST be | |||
| inferred. | inferred. | |||
| Informative note: The level values currently defined in the VVC | | Informative note: The level values currently defined in the | |||
| specification are in the form of "majorNum.minorNum", and the | | VVC specification are in the form of "majorNum.minorNum", | |||
| value of the level-id for each of the levels is equal to | | and the value of the level-id for each of the levels is | |||
| majorNum * 16 + minorNum * 3. It is expected that if any | | equal to majorNum * 16 + minorNum * 3. It is expected that, | |||
| levels are defined in the future, the same convention will be | | if any levels are defined in the future, the same convention | |||
| used, but this cannot be guaranteed. | | will be used, but this cannot be guaranteed. | |||
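The convention in the informative note can be written out as a small conversion sketch. The function names are hypothetical, and the mapping holds only for levels that follow the "majorNum.minorNum" convention described above, which the note says cannot be guaranteed for future levels.

```python
def level_id(major, minor):
    """Map a level "major.minor" to level-id = major * 16 + minor * 3."""
    value = major * 16 + minor * 3
    if not 0 <= value <= 255:
        raise ValueError("level-id must be in the range of 0 to 255")
    return value


def level_name(level_id_value):
    """Inverse mapping, valid only for values that follow the convention."""
    major, rest = divmod(level_id_value, 16)
    if rest % 3 != 0:
        raise ValueError("value does not follow the major.minor convention")
    return f"{major}.{rest // 3}"
```

With this mapping, level 3.1 yields 3 * 16 + 1 * 3 = 51, which matches the default level-id inferred when the parameter is absent.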
| When used to indicate properties of a bitstream, the tier-flag and | When used to indicate properties of a bitstream, the tier-flag and | |||
| level-id parameters are derived respectively from the syntax | level-id parameters are derived respectively from the syntax | |||
| element general_tier_flag, and the syntax element | element general_tier_flag, and the syntax element | |||
| general_level_idc or sub_layer_level_idc[j], that apply to the | general_level_idc or sub_layer_level_idc[j], that apply to the | |||
| bitstream, in an instance of the profile_tier_level( ) syntax | bitstream in an instance of the profile_tier_level( ) syntax | |||
| structure. | structure. | |||
| If the tier-flag and level-id are derived from the | If the tier-flag and level-id are derived from the | |||
| profile_tier_level( ) syntax structure in a DCI NAL unit, the | profile_tier_level( ) syntax structure in a DCI NAL unit, the | |||
| following applies: | following applies: | |||
| - tier-flag = general_tier_flag | * tier-flag = general_tier_flag | |||
| - level-id = general_level_idc | * level-id = general_level_idc | |||
| Otherwise, if the tier-flag and level-id are derived from the | Otherwise, if the tier-flag and level-id are derived from the | |||
| profile_tier_level( ) syntax structure in an SPS or VPS NAL unit, | profile_tier_level( ) syntax structure in an SPS or VPS NAL unit, | |||
| and the bitstream contains the highest sublayer representation in | and the bitstream contains the highest sublayer representation in | |||
| the OLS corresponding to the bitstream, the following applies: | the OLS corresponding to the bitstream, the following applies: | |||
| - tier-flag = general_tier_flag | * tier-flag = general_tier_flag | |||
| - level-id = general_level_idc | ||||
| * level-id = general_level_idc | ||||
| Otherwise, if the tier-flag and level-id are derived from the | Otherwise, if the tier-flag and level-id are derived from the | |||
| profile_tier_level( ) syntax structure in an SPS or VPS NAL | profile_tier_level( ) syntax structure in an SPS or VPS NAL unit, | |||
| unit, and the bitstream does not contain the highest sublayer | and the bitstream does not contain the highest sublayer | |||
| representation in the OLS corresponding to the bitstream, the | representation in the OLS corresponding to the bitstream, the | |||
| following applies, with j being the value of the sprop- | following applies, with j being the value of the sprop-sublayer-id | |||
| sublayer-id parameter: | parameter: | |||
| - tier-flag = general_tier_flag | * tier-flag = general_tier_flag | |||
| - level-id = sub_layer_level_idc[j] | * level-id = sub_layer_level_idc[j] | |||
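The three derivation cases above can be summarized in a short sketch. The class and function names are hypothetical, the profile_tier_level( ) structure is assumed to be already parsed into plain integers, and the container type is passed as a string; none of this is defined by the memo itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfileTierLevel:
    """Assumed, already-parsed subset of a profile_tier_level( ) structure."""
    general_tier_flag: int
    general_level_idc: int
    sub_layer_level_idc: List[int] = field(default_factory=list)


def derive_tier_and_level(ptl, container, has_highest_sublayer=True,
                          sprop_sublayer_id=6):
    """Return (tier-flag, level-id) for a DCI, SPS, or VPS container."""
    if container not in ("DCI", "SPS", "VPS"):
        raise ValueError("unexpected container NAL unit type")
    # DCI case, or SPS/VPS case with the highest sublayer representation.
    if container == "DCI" or has_highest_sublayer:
        return ptl.general_tier_flag, ptl.general_level_idc
    # SPS/VPS case without the highest sublayer: j = sprop-sublayer-id.
    return ptl.general_tier_flag, ptl.sub_layer_level_idc[sprop_sublayer_id]
```

In all three cases tier-flag is taken from general_tier_flag; only the source of level-id changes.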
| sub-profile-id: | sub-profile-id: | |||
| The value of the parameter is a comma-separated (',') list of data | The value of the parameter is a comma-separated (',') list of data | |||
| using base64 encoding (Section 4 of [RFC4648]) representation | using base64 encoding (Section 4 of [RFC4648]) representation | |||
| without "==" padding. | without "==" padding. | |||
| When used to indicate properties of a bitstream, sub-profile-id is | When used to indicate properties of a bitstream, sub-profile-id is | |||
| derived from each of the ptl_num_sub_profiles | derived from each of the ptl_num_sub_profiles | |||
| general_sub_profile_idc[i] syntax elements that apply to the | general_sub_profile_idc[i] syntax elements that apply to the | |||
| bitstream in a profile_tier_level( ) syntax structure. | bitstream in a profile_tier_level( ) syntax structure. | |||
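As an illustration of the value format, the following sketch builds and parses a sub-profile-id string. It assumes that each list entry carries one general_sub_profile_idc[i] value as four big-endian bytes; that byte layout, like the function names, is an assumption made for this example and is not stated in the text above.

```python
import base64


def encode_sub_profile_id(values):
    """Encode integers as a comma-separated list of unpadded base64 items."""
    items = []
    for value in values:
        raw = value.to_bytes(4, "big")  # assumed 32-bit, big-endian layout
        items.append(base64.b64encode(raw).decode("ascii").rstrip("="))
    return ",".join(items)


def decode_sub_profile_id(text):
    """Restore the stripped padding, then decode each list item."""
    values = []
    for item in text.split(","):
        padded = item + "=" * (-len(item) % 4)
        values.append(int.from_bytes(base64.b64decode(padded), "big"))
    return values
```

A four-byte input always produces a base64 string ending in "==", so stripping the padding shortens every item by exactly two characters.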
| interop-constraints: | interop-constraints: | |||
| A base64 encoding (Section 4 of [RFC4648]) representation of the | A base64 encoding (Section 4 of [RFC4648]) representation of the | |||
| data that includes the syntax elements | data that includes the ptl_frame_only_constraint_flag syntax | |||
| ptl_frame_only_constraint_flag and ptl_multilayer_enabled_flag and | element, the ptl_multilayer_enabled_flag syntax element, and the | |||
| the general_constraints_info( ) syntax structure that apply to the | general_constraints_info( ) syntax structure that apply to the | |||
| bitstream in an instance of the profile_tier_level( ) syntax | bitstream in an instance of the profile_tier_level( ) syntax | |||
| structure. | structure. | |||
| If the interop-constraints parameter is not present, the following | If the interop-constraints parameter is not present, the following | |||
| MUST be inferred: | MUST be inferred: | |||
| - ptl_frame_only_constraint_flag = 1 | * ptl_frame_only_constraint_flag = 1 | |||
| - ptl_multilayer_enabled_flag = 0 | * ptl_multilayer_enabled_flag = 0 | |||
| - gci_present_flag in the general_constraints_info( ) syntax | * gci_present_flag in the general_constraints_info( ) syntax | |||
| structure = 0 | structure = 0 | |||
| Using interop-constraints for capability exchange results in a | Using interop-constraints for capability exchange results in a | |||
| requirement on any bitstream to be compliant with the interop- | requirement on any bitstream to be compliant with the interop- | |||
| constraints. | constraints. | |||
| sprop-sublayer-id: | sprop-sublayer-id: | |||
| This parameter MAY be used to indicate the highest allowed value | This parameter MAY be used to indicate the highest allowed value | |||
| of TID in the bitstream. When not present, the value of sprop- | of TID in the bitstream. When not present, the value of sprop- | |||
| sublayer-id is inferred to be equal to 6. | sublayer-id is inferred to be equal to 6. | |||
| The value of sprop-sublayer-id MUST be in the range of 0 to 6, | The value of sprop-sublayer-id MUST be in the range of 0 to 6, | |||
| inclusive. | inclusive. | |||
| sprop-ols-id: | sprop-ols-id: | |||
| This parameter MAY be used to indicate the OLS that the bitstream | This parameter MAY be used to indicate the OLS that the bitstream | |||
| applies to. When not present, the value of sprop-ols-id is | applies to. When not present, the value of sprop-ols-id is | |||
| inferred to be equal to TargetOlsIdx as specified in 8.1.1 in | inferred to be equal to TargetOlsIdx, as specified in | |||
| [VVC]. If this optional parameter is present, sprop-vps MUST also | Section 8.1.1 of [VVC]. If this optional parameter is present, | |||
| be present or its content MUST be known a priori at the receiver. | sprop-vps MUST also be present or its content MUST be known a | |||
| priori at the receiver. | ||||
| The value of sprop-ols-id MUST be in the range of 0 to 256, | The value of sprop-ols-id MUST be in the range of 0 to 256, | |||
| inclusive. | inclusive. | |||
| Informative note: VVC allows having up to 257 output layer sets | | Informative note: VVC allows having up to 257 output layer | |||
| indicated in the VPS as the number of output layer sets minus 2 | | sets indicated in the VPS, as the number of output layer | |||
| is indicated with a field of 8 bits. | | sets minus 2 is indicated with a field of 8 bits. | |||
| recv-sublayer-id: | recv-sublayer-id: | |||
| This parameter MAY be used to signal a receiver's choice of the | This parameter MAY be used to signal a receiver's choice of the | |||
| offered or declared sublayer representations in the sprop-vps and | offered or declared sublayer representations in sprop-vps and | |||
| sprop-sps. The value of recv-sublayer-id indicates the TID of the | sprop-sps. The value of recv-sublayer-id indicates the TID of the | |||
| highest sublayer that a receiver supports. When not present, the | highest sublayer that a receiver supports. When not present, the | |||
| value of recv-sublayer-id is inferred to be equal to the value of | value of recv-sublayer-id is inferred to be equal to the value of | |||
| the sprop-sublayer-id parameter in the SDP offer. | the sprop-sublayer-id parameter in the SDP offer. | |||
| The value of recv-sublayer-id MUST be in the range of 0 to 6, | The value of recv-sublayer-id MUST be in the range of 0 to 6, | |||
| inclusive. | inclusive. | |||
| recv-ols-id: | recv-ols-id: | |||
| This parameter MAY be used to signal a receiver's choice of the | This parameter MAY be used to signal a receiver's choice of the | |||
| offered or declared output layer sets in the sprop-vps. The value | offered or declared output layer sets in sprop-vps. The value of | |||
| of recv-ols-id indicates the OLS index of the bitstream that a | recv-ols-id indicates the OLS index of the bitstream that a | |||
| receiver supports. When not present, the value of recv-ols-id is | receiver supports. When not present, the value of recv-ols-id is | |||
| inferred to be equal to value of the sprop-ols-id parameter | inferred to be equal to the value of the sprop-ols-id parameter | |||
| inferred from or indicated in the SDP offer. When present, the | inferred from or indicated in the SDP offer. When present, the | |||
| value of recv-ols-id must be included only when sprop-ols-id was | value of recv-ols-id must be included only when sprop-ols-id was | |||
| received and must refer to an output layer set in the VPS that | received and must refer to an output layer set in the VPS that | |||
| includes no layers other than all or a subset of the layers of the | includes no layers other than all or a subset of the layers of the | |||
| OLS referred to by sprop-ols-id. If this optional parameter is | OLS referred to by sprop-ols-id. If this optional parameter is | |||
| present, sprop-vps must have been received or its content must be | present, sprop-vps must have been received or its content must be | |||
| known a priori at the receiver. | known a priori at the receiver. | |||
| The value of recv-ols-id MUST be in the range of 0 to 256, | The value of recv-ols-id MUST be in the range of 0 to 256, | |||
| inclusive. | inclusive. | |||
| skipping to change at page 43, line 23 ¶ | skipping to change at line 1947 ¶ | |||
| The value of max-recv-level-id MUST be in the range of 0 to 255, | The value of max-recv-level-id MUST be in the range of 0 to 255, | |||
| inclusive. | inclusive. | |||
| When max-recv-level-id is not present, the value is inferred to be | When max-recv-level-id is not present, the value is inferred to be | |||
| equal to level-id. | equal to level-id. | |||
| max-recv-level-id MUST NOT be present when the highest level the | max-recv-level-id MUST NOT be present when the highest level the | |||
| receiver supports is not higher than the default level. | receiver supports is not higher than the default level. | |||
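The inference and presence rules for max-recv-level-id can be sketched as follows. The function name is hypothetical, and treating a violation of the MUST NOT rule as an error is one possible design choice for a parser, not a behavior the memo prescribes.

```python
def highest_receive_level(level_id, max_recv_level_id=None):
    """Return the highest level supported for receiving."""
    for value in (level_id, max_recv_level_id):
        if value is not None and not 0 <= value <= 255:
            raise ValueError("level values must be in the range of 0 to 255")
    if max_recv_level_id is None:
        # Not present: inferred to be equal to level-id.
        return level_id
    if max_recv_level_id <= level_id:
        # Present although not higher than the default level: not allowed.
        raise ValueError("max-recv-level-id must be higher than level-id")
    return max_recv_level_id
```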
| sprop-dci: | sprop-dci: | |||
| This parameter MAY be used to convey a decoding capability | This parameter MAY be used to convey a decoding capability | |||
| information NAL unit of the bitstream for out-of-band | information NAL unit of the bitstream for out-of-band | |||
| transmission. The parameter MAY also be used for capability | transmission. The parameter MAY also be used for capability | |||
| exchange. The value of the parameter a base64 encoding (Section 4 | exchange. The value of the parameter is a base64 encoding | |||
| of [RFC4648]) representations of the decoding capability | (Section 4 of [RFC4648]) representation of the decoding capability | |||
| information NAL unit as specified in Section 7.3.2.1 of [VVC]. | information NAL unit, as specified in Section 7.3.2.1 of [VVC]. | |||
| sprop-vps: | sprop-vps: | |||
| This parameter MAY be used to convey any video parameter set | ||||
| This parameter MAY be used to convey any video parameter set NAL | NAL unit of the bitstream for out-of-band transmission of | |||
| unit of the bitstream for out-of-band transmission of video | video parameter sets. The parameter MAY also be used for | |||
| parameter sets. The parameter MAY also be used for capability | capability exchange and to indicate substream characteristics | |||
| exchange and to indicate sub-stream characteristics (i.e., | (i.e., properties of output layer sets and sublayer | |||
| properties of output layer sets and sublayer representations as | representations, as defined in [VVC]). The value of the parameter | |||
| defined in [VVC]). The value of the parameter is a comma- | is a comma-separated (',') list of base64 encoding (Section 4 of | |||
| separated (',') list of base64 encoding (Section 4 of [RFC4648]) | [RFC4648]) representations of the video parameter set NAL units, | |||
| representations of the video parameter set NAL units as specified | as specified in Section 7.3.2.3 of [VVC]. | |||
| in Section 7.3.2.3 of [VVC]. | ||||
| The sprop-vps parameter MAY contain one or more than one video | The sprop-vps parameter MAY contain one or more than one video | |||
| parameter set NAL units. However, all other video parameter sets | parameter set NAL units. However, all other video parameter sets | |||
| contained in the sprop-vps parameter MUST be consistent with the | contained in the sprop-vps parameter MUST be consistent with the | |||
| first video parameter set in the sprop-vps parameter. A video | first video parameter set in the sprop-vps parameter. A video | |||
| parameter set vpsB is said to be consistent with another video | parameter set vpsB is said to be consistent with another video | |||
| parameter set vpsA if the number of OLSs in vpsA and vpsB is the | parameter set vpsA if the number of OLSs in vpsA and vpsB is the | |||
| same and any decoder that conforms to the profile, tier, level, | same and any decoder that conforms to the profile, tier, level, | |||
| and constraints indicated by the data starting from the syntax | and constraints indicated by the data starting from the syntax | |||
| element general_profile_idc to the syntax structure | element general_profile_idc to the syntax structure | |||
| general_constraints_info(), inclusive, in the profile_tier_level( | general_constraints_info(), inclusive, in the profile_tier_level( | |||
| ) syntax structure corresponding to any OLS with index olsIdx in | ) syntax structure corresponding to any OLS with index olsIdx in | |||
| vpsA can decode any CVS(s) referencing vpsB when TargetOlsIdx is | vpsA can decode any CVS(s) referencing vpsB when TargetOlsIdx is | |||
| equal to olsIdx that conforms to the profile, tier, level, and | equal to olsIdx that conforms to the profile, tier, level, and | |||
| constraints indicated by the data starting from the syntax element | constraints indicated by the data starting from the syntax element | |||
| general_profile_idc to the syntax structure | general_profile_idc to the syntax structure | |||
| general_constraints_info(), inclusive, in the profile_tier_level( | general_constraints_info(), inclusive, in the profile_tier_level( | |||
| ) syntax structure corresponding to the OLS with index | ) syntax structure corresponding to the OLS with index | |||
| TargetOlsIdx in vpsB. | TargetOlsIdx in vpsB. | |||
| sprop-sps: | sprop-sps: | |||
| This parameter MAY be used to convey sequence parameter set NAL | This parameter MAY be used to convey sequence parameter set NAL | |||
| units of the bitstream for out-of-band transmission of sequence | units of the bitstream for out-of-band transmission of sequence | |||
| parameter sets. The value of the parameter is a comma-separated | parameter sets. The value of the parameter is a comma-separated | |||
| (',') list of base64 encoding (Section 4 of [RFC4648]) | (',') list of base64 encoding (Section 4 of [RFC4648]) | |||
| representations of the sequence parameter set NAL units as | representations of the sequence parameter set NAL units, as | |||
| specified in Section 7.3.2.4 of [VVC]. | specified in Section 7.3.2.4 of [VVC]. | |||
| A sequence parameter set spsB is said to be consistent with | A sequence parameter set spsB is said to be consistent with | |||
| another sequence parameter set spsA if any decoder that conforms | another sequence parameter set spsA if any decoder that conforms | |||
| to the profile, tier, level, and constraints indicated by the data | to the profile, tier, level, and constraints indicated by the data | |||
| starting from the syntax element general_profile_idc to the syntax | starting from the syntax element general_profile_idc to the syntax | |||
| structure general_constraints_info(), inclusive, in the | structure general_constraints_info(), inclusive, in the | |||
| profile_tier_level( ) syntax structure in spsA can decode any | profile_tier_level( ) syntax structure in spsA can decode any | |||
| CLVS(s) referencing spsB that conforms to the profile, tier, | CLVS(s) referencing spsB that conforms to the profile, tier, | |||
| level, and constraints indicated by the data starting from the | level, and constraints indicated by the data starting from the | |||
| syntax element general_profile_idc to the syntax structure | syntax element general_profile_idc to the syntax structure | |||
| general_constraints_info(), inclusive, in the profile_tier_level( | general_constraints_info(), inclusive, in the profile_tier_level( | |||
| ) syntax structure in spsB. | ) syntax structure in spsB. | |||
| sprop-pps: | sprop-pps: | |||
| This parameter MAY be used to convey picture parameter set NAL | This parameter MAY be used to convey picture parameter set NAL | |||
| units of the bitstream for out-of-band transmission of picture | units of the bitstream for out-of-band transmission of picture | |||
| parameter sets. The value of the parameter is a comma-separated | parameter sets. The value of the parameter is a comma-separated | |||
| (',') list of base64 encoding (Section 4 of [RFC4648]) | (',') list of base64 encoding (Section 4 of [RFC4648]) | |||
| representations of the picture parameter set NAL units as | representations of the picture parameter set NAL units, as | |||
| specified in Section 7.3.2.5 of [VVC]. | specified in Section 7.3.2.5 of [VVC]. | |||
| sprop-sei: | sprop-sei: | |||
| This parameter MAY be used to convey one or more SEI messages that | This parameter MAY be used to convey one or more SEI messages that | |||
| describe bitstream characteristics. When present, a decoder can | describe bitstream characteristics. When present, a decoder can | |||
| rely on the bitstream characteristics that are described in the | rely on the bitstream characteristics that are described in the | |||
| SEI messages for the entire duration of the session, independently | SEI messages for the entire duration of the session, independently | |||
| from the persistence scopes of the SEI messages as specified in | from the persistence scopes of the SEI messages, as specified in | |||
| [VSEI]. | [VSEI]. | |||
| The value of the parameter is a comma-separated (',') list of | The value of the parameter is a comma-separated (',') list of | |||
| base64 encoding (Section 4 of [RFC4648]) representations of SEI | base64 encoding (Section 4 of [RFC4648]) representations of SEI | |||
| NAL units as specified in [VSEI]. | NAL units, as specified in [VSEI]. | |||
| Informative note: Intentionally, no list of applicable or | ||||
| inapplicable SEI messages is specified here. Conveying certain | ||||
| SEI messages in sprop-sei may be sensible in some application | ||||
| scenarios and meaningless in others. However, a few examples | ||||
| are described below: | ||||
| 1) In an environment where the bitstream was created from film- | ||||
| based source material, and no splicing is going to occur during | ||||
| the lifetime of the session, the film grain characteristics SEI | ||||
| message is likely meaningful, and sending it in sprop-sei | ||||
| rather than in the bitstream at each entry point may help with | ||||
| saving bits and allows one to configure the renderer only once, | ||||
| avoiding unwanted artifacts. | ||||
| 2) Examples for SEI messages that would be meaningless to be | | Informative note: Intentionally, no list of applicable or | |||
| conveyed in sprop-sei include the decoded picture hash SEI | | inapplicable SEI messages is specified here. Conveying | |||
| message (it is close to impossible that all decoded pictures | | certain SEI messages in sprop-sei may be sensible in some | |||
| have the same hash) or the filler payload SEI message (as | | application scenarios and meaningless in others. However, a | |||
| there is no point in just having more bits in SDP). | | few examples are described below: | |||
| | | ||||
| | In an environment where the bitstream was created from film- | ||||
| | based source material, and no splicing is going to occur | ||||
| | during the lifetime of the session, the film grain | ||||
| | characteristics SEI message is likely meaningful, and | ||||
| | sending it in sprop-sei, rather than in the bitstream at | ||||
| | each entry point, may help with saving bits and allows one | ||||
| | to configure the renderer only once, avoiding unwanted | ||||
| | artifacts. | ||||
| | | ||||
| | Examples for SEI messages that would be meaningless to be | ||||
| | conveyed in sprop-sei include the decoded picture hash SEI | ||||
| | message (it is close to impossible that all decoded pictures | ||||
| | have the same hash) or the filler payload SEI message (as | ||||
| | there is no point in just having more bits in SDP). | ||||
| max-lsr: | max-lsr: | |||
| The max-lsr MAY be used to signal the capabilities of a receiver | The max-lsr MAY be used to signal the capabilities of a receiver | |||
| implementation and MUST NOT be used for any other purpose. The | implementation and MUST NOT be used for any other purpose. The | |||
| value of max-lsr is an integer indicating the maximum processing | value of max-lsr is an integer indicating the maximum processing | |||
| rate in units of luma samples per second. The max-lsr parameter | rate in units of luma samples per second. The max-lsr parameter | |||
| signals that the receiver is capable of decoding video at a higher | signals that the receiver is capable of decoding video at a higher | |||
| rate than is required by the highest level. | rate than is required by the highest level. | |||
| Informative note: When the OPTIONAL media type parameters are | | Informative note: When the OPTIONAL media type parameters | |||
| used to signal the properties of a bitstream, and max-lsr is | | are used to signal the properties of a bitstream, and max- | |||
| not present, the values of tier-flag, profile-id, sub-profile- | | lsr is not present, the values of tier-flag, profile-id, | |||
| id interop-constraints, and level-id must always be such that | | sub-profile-id, interop-constraints, and level-id must | |||
| the bitstream complies fully with the specified profile, tier, | | always be such that the bitstream complies fully with the | |||
| and level. | | specified profile, sub-profile, tier, level, and interop- | |||
| | constraints. | ||||
| When max-lsr is signaled, the receiver MUST be able to decode | When max-lsr is signaled, the receiver MUST be able to decode | |||
| bitstreams that conform to the highest level, with the exception | bitstreams that conform to the highest level, with the exception | |||
| that the MaxLumaSr value in Table 136 of [VVC] for the highest | that the MaxLumaSr value in Table A.3 of [VVC] for the highest | |||
| level is replaced with the value of max-lsr. Senders MAY use this | level is replaced with the value of max-lsr. Senders MAY use this | |||
| knowledge to send pictures of a given size at a higher picture | knowledge to send pictures of a given size at a higher picture | |||
| rate than is indicated in the highest level. | rate than is indicated in the highest level. | |||
| When not present, the value of max-lsr is inferred to be equal to | When not present, the value of max-lsr is inferred to be equal to | |||
| the value of MaxLumaSr given in Table 136 of [VVC] for the highest | the value of MaxLumaSr given in Table A.3 of [VVC] for the highest | |||
| level. | level. | |||
| The value of max-lsr MUST be in the range of MaxLumaSr to 16 * | The value of max-lsr MUST be in the range of MaxLumaSr to 16 * | |||
| MaxLumaSr, inclusive, where MaxLumaSr is given in Table 136 of | MaxLumaSr, inclusive, where MaxLumaSr is given in Table A.3 of | |||
| [VVC] for the highest level. | [VVC] for the highest level. | |||
| max-fps: | max-fps: | |||
| The value of max-fps is an integer indicating the maximum picture | The value of max-fps is an integer indicating the maximum picture | |||
| rate in units of pictures per 100 seconds that can be effectively | rate in units of pictures per 100 seconds that can be effectively | |||
| processed by the receiver. The max-fps parameter MAY be used to | processed by the receiver. The max-fps parameter MAY be used to | |||
| signal that the receiver has a constraint in that it is not | signal that the receiver has a constraint in that it is not | |||
| capable of processing video effectively at the full picture rate | capable of processing video effectively at the full picture rate | |||
| that is implied by the highest level and, when present, max-lsr. | that is implied by the highest level and, when present, max-lsr. | |||
| The value of max-fps is not necessarily the picture rate at which | The value of max-fps is not necessarily the picture rate at which | |||
| the maximum picture size can be sent, it constitutes a constraint | the maximum picture size can be sent; it constitutes a constraint | |||
| on maximum picture rate for all resolutions. | on maximum picture rate for all resolutions. | |||
| Informative note: The max-fps parameter is semantically | | Informative note: The max-fps parameter is semantically | |||
| different from max-lsr in that max-fps is used to signal a | | different from max-lsr in that max-fps is used to signal a | |||
| constraint, lowering the maximum picture rate from what is | | constraint, lowering the maximum picture rate from what is | |||
| implied by other parameters. | | implied by other parameters. | |||
| The encoder MUST use a picture rate equal to or less than this | The encoder MUST use a picture rate equal to or less than this | |||
| value. In cases where the max-fps parameter is absent, the | value. In cases where the max-fps parameter is absent, the | |||
| encoder is free to choose any picture rate according to the | encoder is free to choose any picture rate according to the | |||
| highest level and any signaled optional parameters. | highest level and any signaled optional parameters. | |||
| The value of max-fps MUST be smaller than or equal to the full | The value of max-fps MUST be smaller than or equal to the full | |||
| picture rate that is implied by the highest level and, when | picture rate that is implied by the highest level and, when | |||
| present, max-lsr. | present, max-lsr. | |||
| skipping to change at page 47, line 12 ¶ | skipping to change at line 2120 ¶ | |||
| of any two NAL units naluA and naluB, where naluA follows naluB in | of any two NAL units naluA and naluB, where naluA follows naluB in | |||
| decoding order and precedes naluB in transmission order. | decoding order and precedes naluB in transmission order. | |||
| The value of sprop-max-don-diff MUST be an integer in the range of | The value of sprop-max-don-diff MUST be an integer in the range of | |||
| 0 to 32767, inclusive. | 0 to 32767, inclusive. | |||
| When not present, the value of sprop-max-don-diff is inferred to | When not present, the value of sprop-max-don-diff is inferred to | |||
| be equal to 0. | be equal to 0. | |||
| sprop-depack-buf-bytes: | sprop-depack-buf-bytes: | |||
| This parameter signals the required size of the de-packetization | This parameter signals the required size of the de-packetization | |||
| buffer in units of bytes. The value of the parameter MUST be | buffer in units of bytes. The value of the parameter MUST be | |||
| greater than or equal to the maximum buffer occupancy (in units of | greater than or equal to the maximum buffer occupancy (in units of | |||
| bytes) of the de-packetization buffer as specified in Section 6. | bytes) of the de-packetization buffer, as specified in Section 6. | |||
| The value of sprop-depack-buf-bytes MUST be an integer in the | The value of sprop-depack-buf-bytes MUST be an integer in the | |||
| range of 0 to 4294967295, inclusive. | range of 0 to 4294967295, inclusive. | |||
| When sprop-max-don-diff is present and greater than 0, this | When sprop-max-don-diff is present and greater than 0, this | |||
| parameter MUST be present and the value MUST be greater than 0. | parameter MUST be present and the value MUST be greater than 0. | |||
| When not present, the value of sprop-depack-buf-bytes is inferred | When not present, the value of sprop-depack-buf-bytes is inferred | |||
| to be equal to 0. | to be equal to 0. | |||
| Informative note: The value of sprop-depack-buf-bytes indicates | | Informative note: The value of sprop-depack-buf-bytes | |||
| the required size of the de-packetization buffer only. When | | indicates the required size of the de-packetization buffer | |||
| network jitter can occur, an appropriately sized jitter buffer | | only. When network jitter can occur, an appropriately sized | |||
| has to be available as well. | | jitter buffer has to be available as well. | |||
| depack-buf-cap: | depack-buf-cap: | |||
| This parameter signals the capabilities of a receiver | This parameter signals the capabilities of a receiver | |||
| implementation and indicates the amount of de-packetization buffer | implementation and indicates the amount of de-packetization buffer | |||
| space in units of bytes that the receiver has available for | space in units of bytes that the receiver has available for | |||
| reconstructing the NAL unit decoding order from NAL units carried | reconstructing the NAL unit decoding order from NAL units carried | |||
| in the RTP stream. A receiver is able to handle any RTP stream | in the RTP stream. A receiver is able to handle any RTP stream | |||
| for which the value of the sprop-depack-buf-bytes parameter is | for which the value of the sprop-depack-buf-bytes parameter is | |||
| smaller than or equal to this parameter. | smaller than or equal to this parameter. | |||
| When not present, the value of depack-buf-cap is inferred to be | When not present, the value of depack-buf-cap is inferred to be | |||
| equal to 4294967295. The value of depack-buf-cap MUST be an | equal to 4294967295. The value of depack-buf-cap MUST be an | |||
| integer in the range of 1 to 4294967295, inclusive. | integer in the range of 1 to 4294967295, inclusive. | |||
| Informative note: depack-buf-cap indicates the maximum possible | | Informative note: depack-buf-cap indicates the maximum | |||
| size of the de-packetization buffer of the receiver only, | | possible size of the de-packetization buffer of the receiver | |||
| without allowing for network jitter. | | only, without allowing for network jitter. | |||
| 7.3. SDP Parameters | 7.3. SDP Parameters | |||
| The receiver MUST ignore any parameter unspecified in this memo. | The receiver MUST ignore any parameter unspecified in this memo. | |||
| 7.3.1. Mapping of Payload Type Parameters to SDP | 7.3.1. Mapping of Payload Type Parameters to SDP | |||
| The media type video/H266 string is mapped to fields in the Session | The media type video/H266 string is mapped to fields in the Session | |||
| Description Protocol (SDP) [RFC8866] as follows: | Description Protocol (SDP) [RFC8866] as follows: | |||
| * The media name in the "m=" line of SDP MUST be video. | * The media name in the "m=" line of SDP MUST be video. | |||
| * The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the | * The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the | |||
| media subtype). | media subtype). | |||
| * The clock rate in the "a=rtpmap" line MUST be 90000. | * The clock rate in the "a=rtpmap" line MUST be 90000. | |||
| * The OPTIONAL parameters profile-id, tier-flag, sub-profile-id, | * The OPTIONAL parameters profile-id, tier-flag, sub-profile-id, | |||
| interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id, | interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id, | |||
| recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max- | recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max- | |||
| fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf- | fps, sprop-max-don-diff, sprop-depack-buf-bytes, and depack-buf- | |||
| cap, when present, MUST be included in the "a=fmtp" line of SDP. | cap, when present, MUST be included in the "a=fmtp" line of SDP. | |||
| The fmtp line is expressed as a media type string, in the form of | The fmtp line is expressed as a media type string, in the form of | |||
| a semicolon-separated list of parameter=value pairs. | a semicolon-separated list of parameter=value pairs. | |||
| * The OPTIONAL parameter sprop-vps, sprop-sps, sprop-pps, sprop-sei, | * The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop- | |||
| and sprop-dci, when present, MUST be included in the "a=fmtp" line | sei, and sprop-dci, when present, MUST be included in the "a=fmtp" | |||
| of SDP or conveyed using the "fmtp" source attribute as specified | line of SDP or conveyed using the "fmtp" source attribute as | |||
| in Section 6.3 of [RFC5576]. For a particular media format (i.e., | specified in Section 6.3 of [RFC5576]. For a particular media | |||
| RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or | format (i.e., RTP payload type), sprop-vps, sprop-sps, sprop-pps, | |||
| sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP | sprop-sei, or sprop-dci MUST NOT be both included in the "a=fmtp" | |||
| and conveyed using the "fmtp" source attribute. When included in | line of SDP and conveyed using the "fmtp" source attribute. When | |||
| the "a=fmtp" line of SDP, those parameters are expressed as a | included in the "a=fmtp" line of SDP, those parameters are | |||
| media type string, in the form of a semicolon-separated list of | expressed as a media type string, in the form of a semicolon- | |||
| parameter=value pairs. When conveyed in the "a=fmtp" line of SDP | separated list of parameter=value pairs. When conveyed in the | |||
| for a particular payload type, the parameters sprop-vps, sprop- | "a=fmtp" line of SDP for a particular payload type, the parameters | |||
| sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each | sprop-vps, sprop-sps, sprop-pps, sprop-sei, and sprop-dci MUST be | |||
| SSRC with the payload type. When conveyed using the "fmtp" source | applied to each SSRC with the payload type. When conveyed using | |||
| attribute, these parameters are only associated with the given | the "fmtp" source attribute, these parameters are only associated | |||
| source and payload type as parts of the "fmtp" source attribute. | with the given source and payload type as parts of the "fmtp" | |||
| source attribute. | ||||
| Informative note: Conveyance of sprop-vps, sprop-sps, and | | Informative note: Conveyance of sprop-vps, sprop-sps, and | |||
| sprop-pps using the "fmtp" source attribute allows for out-of- | | sprop-pps using the "fmtp" source attribute allows for out-of- | |||
| band transport of parameter sets in topologies like Topo-Video- | | band transport of parameter sets in topologies like Topo-Video- | |||
| switch-MCU as specified in [RFC7667] | | switch-MCU, as specified in [RFC7667]. | |||
| An general usage of media representation in SDP is as follows: | A general usage of media representation in SDP is as follows: | |||
| m=video 49170 RTP/AVP 98 | m=video 49170 RTP/AVP 98 | |||
| a=rtpmap:98 H266/90000 | a=rtpmap:98 H266/90000 | |||
| a=fmtp:98 profile-id=1; | a=fmtp:98 profile-id=1; | |||
| sprop-vps=<video parameter sets data>; | sprop-vps=<video parameter sets data>; | |||
| sprop-sps=<sequence parameter set data>; | sprop-sps=<sequence parameter set data>; | |||
| sprop-pps=<picture parameter set data>; | sprop-pps=<picture parameter set data>; | |||
| A SIP Offer/Answer exchange wherein both parties are expected to both | A SIP offer/answer exchange wherein both parties are expected to both | |||
| send and receive could look like the following. Only the media | send and receive could look like the following. Only the media | |||
| codec-specific parts of the SDP are shown. Some lines are wrapped | codec-specific parts of the SDP are shown. Some lines are wrapped | |||
| due to text constraints. | due to text constraints. | |||
| Offerer->Answerer: | Offerer->Answerer: | |||
| m=video 49170 RTP/AVP 98 | m=video 49170 RTP/AVP 98 | |||
| a=rtpmap:98 H266/90000 | a=rtpmap:98 H266/90000 | |||
| a=fmtp:98 profile-id=1; level-id=83; | a=fmtp:98 profile-id=1; level-id=83; | |||
| The above represents an offer for symmetric video communication using | The above represents an offer for symmetric video communication using | |||
| [VVC] and it's payload specification, at the main profile and level | [VVC] and its payload specification at the main profile and level 5.1 | |||
| 5.1 (and, as the levels are downgradable, all lower levels. | (and as the levels are downgradable, all lower levels). Informally | |||
| Informally speaking, this offer tells the receiver of the offer that | speaking, this offer tells the receiver of the offer that the sender | |||
| the sender is willing to receive up to 4Kp60 resolution at the | is willing to receive up to 4Kp60 resolution at the maximum bitrates | |||
| maximum bitrates specified in [VVC]. At the same time, if this offer | specified in [VVC]. At the same time, if this offer were accepted | |||
| were accepted "as is", the offerer can expect that the answerer would | "as is", the offerer can expect that the answerer would be able to | |||
| be able to receive and properly decode H.266 media up to and | receive and properly decode H.266 media up to and including level | |||
| including level 5.1. | 5.1. | |||
| Answerer->Offerer: | Answerer->Offerer: | |||
| m=video 49170 RTP/AVP 98 | m=video 49170 RTP/AVP 98 | |||
| a=rtpmap:98 H266/90000 | a=rtpmap:98 H266/90000 | |||
| a=fmtp:98 profile-id=1; level-id=67 | a=fmtp:98 profile-id=1; level-id=67 | |||
| With this answer to the offer above, the system receiving the offer | With this answer to the offer above, the system receiving the offer | |||
| advises the offerer that it is incapable of handling H.266 at level | advises the offerer that it is incapable of handling H.266 at level | |||
| 5.1 but is capable of decoding 1080p60. As H.266 video codecs must | 5.1 but is capable of decoding 1080p60. As H.266 video codecs must | |||
| support decoding at all levels below the maximum level they | support decoding at all levels below the maximum level they | |||
| implement, the resulting user experience would likely be that both | implement, the resulting user experience would likely be that both | |||
| systems send video at 1080p60. However, nothing prevents an encoder | systems send video at 1080p60. However, nothing prevents an encoder | |||
| from further downgrading its sending to, for example 720p30 if it | from further downgrading its sending to, for example, 720p30 if it | |||
| were short of cycles, bandwidth, or for other reasons. | were short of cycles or bandwidth or for other reasons. | |||
| 7.3.2. Usage with SDP Offer/Answer Model | 7.3.2. Usage with SDP Offer/Answer Model | |||
| This section describes the negotiation of unicast messages using the | This section describes the negotiation of unicast messages using the | |||
| offer-answer model as described in [RFC3264] and its updates. The | offer/answer model as described in [RFC3264] and its updates. The | |||
| section is split into subsections, covering a) media format | section is split into subsections, covering a) media format | |||
| configurations not involving non-temporal scalability; b) scalable | configurations not involving non-temporal scalability; b) scalable | |||
| media format configurations; c) the description of the use of those | media format configurations; c) the description of the use of those | |||
| parameters not involving the media configuration itself but rather | parameters not involving the media configuration itself but rather | |||
| the parameters of the payload format design; and d) multicast. | the parameters of the payload format design; and d) multicast. | |||
| 7.3.2.1. Non-scalable media format configuration | 7.3.2.1. Non-scalable Media Format Configuration | |||
| A non-scalable VVC media configuration is such a configuration where | A non-scalable VVC media configuration is such a configuration where | |||
| no non-temporal scalability mechanisms are allowed. In [VVC] version | no non-temporal scalability mechanisms are allowed. In [VVC] version | |||
| 1, that implies that general_profile_idc indicates one of the | 1, it is implied that general_profile_idc indicates one of the | |||
| following profiles: Main10, Main10 Still Picture, Main 10 4:4:4, | following profiles: Main 10, Main 10 Still Picture, Main 10 4:4:4, or | |||
| Main10 4:4:4 Still Picture, with general_profile_idc values of 1, 65, | Main 10 4:4:4 Still Picture, with general_profile_idc values of 1, | |||
| 33, and 97, respectively. Note that non-scalable media | 65, 33, and 97, respectively. Note that non-scalable media | |||
| configurations includes temporal scalability, inline with VVC's | configurations include temporal scalability inline with VVC's design | |||
| design philosophy and profile structure. | philosophy and profile structure. | |||
| The following limitations and rules pertaining to the media | The following limitations and rules pertaining to the media | |||
| configuration apply: | configuration apply: | |||
| * The parameters identifying a media format configuration for VVC | * The parameters identifying a media format configuration for VVC | |||
| are profile-id, tier-flag, sub-profile-id, level-id, and interop- | are profile-id, tier-flag, sub-profile-id, level-id, and interop- | |||
| constraints. These media configuration parameters, except level- | constraints. These media configuration parameters, except level- | |||
| id, MUST be used symmetrically. | id, MUST be used symmetrically. | |||
| The answerer MUST structure its answer in according to one of the | The answerer MUST structure its answer according to one of the | |||
| following three options: | following three options: | |||
| 1) maintain all configuration parameters with the values remaining | 1. maintain all configuration parameters with the values | |||
| the same as in the offer for the media format (payload type), with | remaining the same as in the offer for the media format | |||
| the exception that the value of level-id is changeable as long as | (payload type), with the exception that the value of level-id | |||
| the highest level indicated by the answer is not higher than that | is changeable as long as the highest level indicated by the | |||
| indicated by the offer; | answer is not higher than that indicated by the offer; | |||
| 2) include in the answer the recv-sublayer-id parameter, with a | 2. include in the answer the recv-sublayer-id parameter, with a | |||
| value less than the sprop-sublayer-id parameter in the offer, for | value less than the sprop-sublayer-id parameter in the offer, | |||
| the media format (payload type), and maintain all configuration | for the media format (payload type), and maintain all | |||
| parameters with the values remaining the same as in the offer for | configuration parameters with the values remaining the same as | |||
| the media format (payload type), with the exception that the value | in the offer for the media format (payload type), with the | |||
| of level-id is changeable as long as the highest level indicated | exception that the value of level-id is changeable as long as | |||
| by the answer is not higher than the level indicated by the sprop- | the highest level indicated by the answer is not higher than | |||
| sps or sprop-vps in the offer for the chosen sublayer representation; | the level indicated by sprop-sps or sprop-vps in the offer for the | |||
| or | chosen sublayer representation; or | |||
| 3) remove the media format (payload type) completely (when one or | ||||
| more of the parameter values are not supported). | ||||
| Informative note: The above requirement for symmetric use | 3. remove the media format (payload type) completely (when one or | |||
| does not apply for level-id, and does not apply for the | more of the parameter values are not supported). | |||
| other bitstream or RTP stream properties and capability | ||||
| parameters as described in Section 7.3.2.3 below. | | Informative note: The above requirement for symmetric use does | |||
| | not apply for level-id and does not apply for the other | ||||
| | bitstream or RTP stream properties and capability parameters, | ||||
| | as described in Section 7.3.2.3 below. | ||||
| * To simplify handling and matching of these configurations, the | * To simplify handling and matching of these configurations, the | |||
| same RTP payload type number used in the offer SHOULD also be used | same RTP payload type number used in the offer SHOULD also be used | |||
| in the answer, as specified in [RFC3264]. | in the answer, as specified in [RFC3264]. | |||
| * The same RTP payload type number used in the offer for the media | * The same RTP payload type number used in the offer for the media | |||
| subtype H266 MUST be used in the answer when the answer includes | subtype H266 MUST be used in the answer when the answer includes | |||
| recv-sublayer-id. When the answer does not include recv-sublayer- | recv-sublayer-id. When the answer does not include recv-sublayer- | |||
| id, the answer MUST NOT contain a payload type number used in the | id, the answer MUST NOT contain a payload type number used in the | |||
| offer for the media subtype H266 unless the configuration is | offer for the media subtype H266 unless the configuration is | |||
| exactly the same as in the offer or the configuration in the | exactly the same as in the offer or the configuration in the | |||
| answer only differs from that in the offer with a different value | answer only differs from that in the offer with a different value | |||
| of level-id. The answer MAY contain the recv-sublayer-id | of level-id. The answer MAY contain the recv-sublayer-id | |||
| parameter if an VVC bitstream contains multiple operation points | parameter if a VVC bitstream contains multiple operation points | |||
| (using temporal scalability and sublayers) and sprop-sps or sprop- | (using temporal scalability and sublayers) and sprop-sps or sprop- | |||
| vps is included in the offer where information of sublayers is | vps is included in the offer where information of sublayers is | |||
| present in the first sequence parameter set or video parameter set | present in the first sequence parameter set or video parameter set | |||
| contained in sprop-sps or sprop-vps respectively. If the sprop- | contained in sprop-sps or sprop-vps, respectively. If sprop-sps | |||
| sps or sprop-vps is provided in an offer, an answerer MAY select a | or sprop-vps is provided in an offer, an answerer MAY select a | |||
| particular operation point indicated in the first sequence | particular operation point indicated in the first sequence | |||
| parameter set or video parameter set contained in sprop-sps or | parameter set or video parameter set contained in sprop-sps or | |||
| sprop-vps respectively. When the answer includes a recv-sublayer- | sprop-vps, respectively. When the answer includes a recv- | |||
| id that is less than a sprop-sublayer-id in the offer, the | sublayer-id that is less than a sprop-sublayer-id in the offer, | |||
| following applies: | the following applies: | |||
| 1) When sprop-sps parameter is present, all sequence parameter | 1. When the sprop-sps parameter is present, all sequence | |||
| sets contained in the sprop-sps parameter in the SDP answer and | parameter sets contained in the sprop-sps parameter in the SDP | |||
| all sequence parameter sets sent in-band for either the offerer- | answer and all sequence parameter sets sent in-band for either | |||
| to-answerer direction or the answerer-to-offerer direction MUST be | the offerer-to-answerer direction or the answerer-to-offerer | |||
| consistent with the first sequence parameter set in the sprop-sps | direction MUST be consistent with the first sequence parameter | |||
| parameter of the offer (see the semantics of sprop-sps in | set in the sprop-sps parameter of the offer (see the semantics | |||
| Section 7.1 of this document on one sequence parameter set being | of sprop-sps in Section 7.1 of this document on one sequence | |||
| consistent with another sequence parameter set). | parameter set being consistent with another sequence parameter | |||
| set). | ||||
| 2) When sprop-vps parameter is present, all video parameter sets | 2. When the sprop-vps parameter is present, all video parameter | |||
| contained in the sprop-vps parameter in the SDP answer and all | sets contained in the sprop-vps parameter in the SDP answer | |||
| video parameter sets sent in-band for either the offerer-to- | and all video parameter sets sent in-band for either the | |||
| answerer direction or the answerer-to-offerer direction MUST be | offerer-to-answerer direction or the answerer-to-offerer | |||
| consistent with the first video parameter set in the sprop-vps | direction MUST be consistent with the first video parameter | |||
| parameter of the offer (see the semantics of sprop-vps in | set in the sprop-vps parameter of the offer (see the semantics | |||
| Section 7.1 of this document on one video parameter set being | of sprop-vps in Section 7.1 of this document on one video | |||
| consistent with another video parameter set). | parameter set being consistent with another video parameter | |||
| set). | ||||
| 3) The bitstream sent in either direction MUST conform to the | 3. The bitstream sent in either direction MUST conform to the | |||
| profile, tier, level, and constraints of the chosen sublayer | profile, tier, level, and constraints of the chosen sublayer | |||
| representation as indicated by the profile_tier_level( ) syntax | representation, as indicated by the profile_tier_level( ) | |||
| structure in the first sequence parameter set in the sprop-sps | syntax structure in the first sequence parameter set in the | |||
| parameter or by the first profile_tier_level( ) syntax structure | sprop-sps parameter or by the first profile_tier_level( ) | |||
| in the first video parameter set in the sprop-vps parameter of the | syntax structure in the first video parameter set in the | |||
| offer. | sprop-vps parameter of the offer. | |||
| Informative note: When an offerer receives an answer that | | Informative note: When an offerer receives an answer that does | |||
| does not include recv-sublayer-id, it has to compare payload | | not include recv-sublayer-id, it has to compare payload types | |||
| types not declared in the offer based on the media type | | not declared in the offer based on the media type (i.e., video/ | |||
| (i.e., video/H266) and the above media configuration | | H266) and the above media configuration parameters with any | |||
| parameters with any payload types it has already declared. | | payload types it has already declared. This will enable it to | |||
| This will enable it to determine whether the configuration | | determine whether the configuration in question is new or if it | |||
| in question is new or if it is equivalent to configuration | | is equivalent to configuration already offered, since a | |||
| already offered, since a different payload type number may | | different payload type number may be used in the answer. The | |||
| be used in the answer. The ability to perform operation | | ability to perform operation point selection enables a receiver | |||
| point selection enables a receiver to utilize the temporal | | to utilize the temporal scalable nature of a VVC bitstream. | |||
| scalable nature of an VVC bitstream. | ||||
| 7.3.2.2. Scalable media format configuration | 7.3.2.2. Scalable Media Format Configuration | |||
| A scalable VVC media configuration is a configuration in which non- | A scalable VVC media configuration is a configuration in which non- | |||
| temporal scalability mechanisms are allowed. In [VVC] version 1, | temporal scalability mechanisms are allowed. In [VVC] version 1, it | |||
| that implies that general_profile_idc indicates one of the following | is implied that general_profile_idc indicates one of the following | |||
| profiles: Multilayer Main 10, and Multilayer Main 10 4:4:4, with | profiles: Multilayer Main 10 and Multilayer Main 10 4:4:4, with | |||
| general_profile_idc values of 17 and 49, respectively. | general_profile_idc values of 17 and 49, respectively. | |||
| The following limitations and rules pertaining to the media | The following limitations and rules pertaining to the media | |||
| configuration apply. They are listed in an order that would be | configuration apply. They are listed in an order that would be | |||
| logical for an implementation to follow: | logical for an implementation to follow: | |||
| * The parameters identifying a media format configuration for | * The parameters identifying a media format configuration for | |||
| scalable VVC are profile-id, tier-flag, sub-profile-id, level-id, | scalable VVC are profile-id, tier-flag, sub-profile-id, level-id, | |||
| interop-constraints, and sprop-vps. These media configuration | interop-constraints, and sprop-vps. These media configuration | |||
| parameters, except level-id, MUST be used symmetrically, except as | parameters, except level-id, MUST be used symmetrically, except as | |||
| noted below. | noted below. | |||
| * The answerer MAY include a level-id that MUST be lower than or | * The answerer MAY include a level-id that MUST be lower than or | |||
| equal to the level-id indicated in the offer (either expressed by | equal to the level-id indicated in the offer (either expressed by | |||
| level-id in the offer, or implied by the default level as specific | level-id in the offer or implied by the default level, as | |||
| in Section 7.1). | specified in Section 7.1). | |||
| * When sprop-ols-id is present in an offer, sprop-vps MUST also be | * When sprop-ols-id is present in an offer, sprop-vps MUST also be | |||
| present in the same offer and including at least one valid VPS, so as | present in the same offer and include at least one valid VPS so as to | |||
| to allow the answerer to meaningfully interpret sprop-ols-id and | allow the answerer to meaningfully interpret sprop-ols-id and | |||
| select recv-ols-id (see below). | select recv-ols-id (see below). | |||
| * The answerer MUST NOT include recv-ols-id unless the offer | * The answerer MUST NOT include recv-ols-id unless the offer | |||
| includes sprop-ols-id. When present, recv-ols-id MUST indicate a | includes sprop-ols-id. When present, recv-ols-id MUST indicate a | |||
| supported output layer set in the VPS that includes no layers | supported output layer set in the VPS that includes no layers | |||
| other than all or a subset of the layers of the OLS referred to by | other than all or a subset of the layers of the OLS referred to by | |||
| sprop-ols-id. If unable, the answerer MUST remove the media | sprop-ols-id. If unable, the answerer MUST remove the media | |||
| format. | format. | |||
| Informative note: if an offerer wants to offer more than one | | Informative note: If an offerer wants to offer more than one | |||
| output layer set, it can do so by offering multiple VVC media | | output layer set, it can do so by offering multiple VVC media | |||
| with different payload types. | | with different payload types. | |||
| * The offerer MAY include sprop-sublayer-id which indicates the | * The offerer MAY include sprop-sublayer-id, which indicates the | |||
| highest allowed value of TID in the bitstream. The answerer MAY | highest allowed value of TID in the bitstream. The answerer MAY | |||
| include recv-sublayer-id which can be used to reduce the number of | include recv-sublayer-id, which can be used to reduce the number | |||
| sublayers from the value of sprop-sublayer-id. | of sublayers from the value of sprop-sublayer-id. | |||
| * When the answerer includes recv-ols-id and configuration | * When the answerer includes recv-ols-id and configuration | |||
| parameters profile-id, tier-flag, sub-profile-id, level-id, and | parameters profile-id, tier-flag, sub-profile-id, level-id, and | |||
| interop-constraints, it MUST use the configuration parameter | interop-constraints, it MUST use the configuration parameter | |||
| values as signaled in the sprop-vps for the operating point with | values as signaled in the sprop-vps for the operating point with | |||
| the largest number of sublayers for the chosen output layer set, | the largest number of sublayers for the chosen output layer set, | |||
| with the exception that the value of level-id is changeable as | with the exception that the value of level-id is changeable as | |||
| long as the highest level indicated by the answer is not higher | long as the highest level indicated by the answer is not higher | |||
| than the level indicated by the sprop-vps in the offer for the | than the level indicated by sprop-vps in the offer for the operating | |||
| operating point with the largest number of sublayers for the | point with the largest number of sublayers for the chosen output | |||
| chosen output layer set. | layer set. | |||
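
The offer/answer rules above for scalable VVC can be sketched as a small validation routine. This is an illustrative sketch only, not part of the payload format: the function name, the dictionary representation of the fmtp parameters, and the `default_level_id` argument (standing in for the default level of Section 7.1) are assumptions made for the example, and the check that recv-ols-id refers to a suitable output layer set in the VPS is omitted because it requires parsing the VPS.

```python
def check_scalable_answer(offer, answer, default_level_id):
    """Return a list of rule violations for a scalable VVC offer/answer pair.

    offer, answer: dicts mapping fmtp parameter names to string values
    (a hypothetical representation chosen for this sketch).
    default_level_id: the level inferred when the offer omits level-id;
    the caller supplies it, this sketch does not define it.
    """
    problems = []

    # sprop-ols-id in an offer requires sprop-vps in the same offer.
    if "sprop-ols-id" in offer and "sprop-vps" not in offer:
        problems.append("offer has sprop-ols-id without sprop-vps")

    # recv-ols-id is only allowed when the offer carried sprop-ols-id.
    if "recv-ols-id" in answer and "sprop-ols-id" not in offer:
        problems.append("answer has recv-ols-id but offer lacks sprop-ols-id")

    # The answer's level-id, if present, must not exceed the offered level.
    offered_level = int(offer.get("level-id", default_level_id))
    if "level-id" in answer and int(answer["level-id"]) > offered_level:
        problems.append("answer level-id exceeds offered level")

    return problems
```

An empty list means none of the three checked rules is violated; it does not establish that the answer is valid in every other respect.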
| 7.3.2.3. Payload format configuration | 7.3.2.3. Payload Format Configuration | |||
| The following limitations and rules pertain mostly to the | The following limitations and rules pertain mostly to the | |||
| configuration of the payload format buffer management and apply to | configuration of the payload format buffer management and apply to | |||
| both scalable and non-scalable VVC. | both scalable and non-scalable VVC. | |||
| * The parameters sprop-max-don-diff, and sprop-depack-buf-bytes | * The parameters sprop-max-don-diff and sprop-depack-buf-bytes | |||
| describe the properties of an RTP stream that the offerer or the | describe the properties of an RTP stream that the offerer or the | |||
| answerer is sending for the media format configuration. This | answerer is sending for the media format configuration. This | |||
| differs from the normal usage of the offer/answer parameters: | differs from the normal usage of the offer/answer parameters; | |||
| normally such parameters declare the properties of the bitstream | normally, such parameters declare the properties of the bitstream | |||
| or RTP stream that the offerer or the answerer is able to receive. | or RTP stream that the offerer or the answerer is able to receive. | |||
| When dealing with VVC, the offerer assumes that the answerer will | When dealing with VVC, the offerer assumes that the answerer will | |||
| be able to receive media encoded using the configuration being | be able to receive media encoded using the configuration being | |||
| offered. | offered. | |||
| Informative note: The above parameters apply for any RTP | | Informative note: The above parameters apply for any RTP | |||
| stream, when present, sent by a declaring entity with the same | | stream, when present, sent by a declaring entity with the same | |||
| configuration. In other words, the applicability of the above | | configuration. In other words, the applicability of the above | |||
| parameters to RTP streams depends on the source endpoint. | | parameters to RTP streams depends on the source endpoint. | |||
| Rather than being bound to the payload type, the values may | | Rather than being bound to the payload type, the values may | |||
| have to be applied to another payload type when being sent, as | | have to be applied to another payload type when being sent, as | |||
| they apply for the configuration. | | they apply for the configuration. | |||
| * The capability parameter max-lsr MAY be used to declare further | * The capability parameter max-lsr MAY be used to declare further | |||
| capabilities of the offerer or answerer for receiving. It MUST | capabilities of the offerer or answerer for receiving. It MUST | |||
| NOT be present when the direction attribute is sendonly. | NOT be present when the direction attribute is sendonly. | |||
| * The capability parameter max-fps MAY be used to declare lower | * The capability parameter max-fps MAY be used to declare lower | |||
| capabilities of the offerer or answerer for receiving. It MUST | capabilities of the offerer or answerer for receiving. It MUST | |||
| NOT be present when the direction attribute is sendonly. | NOT be present when the direction attribute is sendonly. | |||
| * When an offerer offers an interleaved stream, indicated by the | * When an offerer offers an interleaved stream, indicated by the | |||
| presence of sprop-max-don-diff with a value larger than zero, the | presence of sprop-max-don-diff with a value larger than zero, the | |||
| offerer MUST include the size of the de-packetization buffer | offerer MUST include the size of the de-packetization buffer | |||
| sprop-depack-buf-bytes. | sprop-depack-buf-bytes. | |||
| * To enable the offerer and answerer to inform each other about | * To enable the offerer and answerer to inform each other about | |||
| their capabilities for de-packetization buffering in receiving RTP | their capabilities for de-packetization buffering in receiving RTP | |||
| streams, both parties are RECOMMENDED to include depack-buf-cap. | streams, both parties are RECOMMENDED to include depack-buf-cap. | |||
| * The sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when present | * The parameters sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when | |||
| (included in the "a=fmtp" line of SDP or conveyed using the "fmtp" | present (included in the "a=fmtp" line of SDP or conveyed using | |||
| source attribute as specified in Section 6.3 of [RFC5576]), are | the "fmtp" source attribute, as specified in Section 6.3 of | |||
| used for out-of-band transport of the parameter sets (DCI, VPS, | [RFC5576]), are used for out-of-band transport of the parameter | |||
| SPS, or PPS, respectively). | sets (DCI, VPS, SPS, or PPS, respectively). | |||
| * The answerer MAY use either out-of-band or in-band transport of | * The answerer MAY use either out-of-band or in-band transport of | |||
| parameter sets for the bitstream it is sending, regardless of | parameter sets for the bitstream it is sending, regardless of | |||
| whether out-of-band parameter sets transport has been used in the | whether out-of-band parameter sets transport has been used in the | |||
| offerer-to-answerer direction. Parameter sets included in an | offerer-to-answerer direction. Parameter sets included in an | |||
| answer are independent of those parameter sets included in the | answer are independent of those parameter sets included in the | |||
| offer, as they are used for decoding two different bitstreams, one | offer, as they are used for decoding two different bitstreams; one | |||
| from the answerer to the offerer and the other in the opposit | from the answerer to the offerer and the other in the opposite | |||
| direction. In case some RTP packets are sent before the SDP | direction. In case some RTP packets are sent before the SDP | |||
| offer/answer settles down, in-band parameter sets MUST be used for | offer/answer settles down, in-band parameter sets MUST be used for | |||
| those RTP stream parts sent before the SDP offer/answer. | those RTP stream parts sent before the SDP offer/answer. | |||
| * The following rules apply to transport of parameter set in the | * The following rules apply to transport of parameter sets in the | |||
| offerer-to-answerer direction. | offerer-to-answerer direction. | |||
| - An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | - An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | |||
| sprop-pps. If none of these parameters is present in the | sprop-pps. If none of these parameters are present in the | |||
| offer, then only in-band transport of parameter sets is used. | offer, then only in-band transport of parameter sets is used. | |||
| - If the level to use in the offerer-to-answerer direction is | - If the level to use in the offerer-to-answerer direction is | |||
| equal to the default level in the offer, the answerer MUST be | equal to the default level in the offer, the answerer MUST be | |||
| prepared to use the parameter sets included in sprop-vps, | prepared to use the parameter sets included in sprop-vps, | |||
| sprop-sps, and sprop-pps (either included in the "a=fmtp" line | sprop-sps, and sprop-pps (either included in the "a=fmtp" line | |||
| of SDP or conveyed using the "fmtp" source attribute) for | of SDP or conveyed using the "fmtp" source attribute) for | |||
| decoding the incoming bitstream, e.g., by passing these | decoding the incoming bitstream, e.g., by passing these | |||
| parameter set NAL units to the video decoder before passing any | parameter set NAL units to the video decoder before passing any | |||
| NAL units carried in the RTP streams. Otherwise, the answerer | NAL units carried in the RTP streams. Otherwise, the answerer | |||
| MUST ignore sprop-vps, sprop-sps, and sprop-pps (either | MUST ignore sprop-vps, sprop-sps, and sprop-pps (either | |||
| included in the "a=fmtp" line of SDP or conveyed using the | included in the "a=fmtp" line of SDP or conveyed using the | |||
| "fmtp" source attribute) and the offerer MUST transmit | "fmtp" source attribute) and the offerer MUST transmit | |||
| parameter sets in-band. | parameter sets in-band. | |||
| * The following rules apply to transport of parameter set in the | * The following rules apply to transport of parameter sets in the | |||
| answerer-to-offerer direction. | answerer-to-offerer direction. | |||
| - An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | - An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or | |||
| sprop-pps. If none of these parameters is present in the | sprop-pps. If none of these parameters are present in the | |||
| answer, then only in-band transport of parameter sets is used. | answer, then only in-band transport of parameter sets is used. | |||
| - The offerer MUST be prepared to use the parameter sets included | - The offerer MUST be prepared to use the parameter sets included | |||
| in sprop-vps, sprop-sps, and sprop-pps (either included in the | in sprop-vps, sprop-sps, and sprop-pps (either included in the | |||
| "a=fmtp" line of SDP or conveyed using the "fmtp" source | "a=fmtp" line of SDP or conveyed using the "fmtp" source | |||
| attribute) for decoding the incoming bitstream, e.g., by | attribute) for decoding the incoming bitstream, e.g., by | |||
| passing these parameter set NAL units to the video decoder | passing these parameter set NAL units to the video decoder | |||
| before passing any NAL units carried in the RTP streams. | before passing any NAL units carried in the RTP streams. | |||
| * When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are | * When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are | |||
| conveyed using the "fmtp" source attribute as specified in | conveyed using the "fmtp" source attribute, as specified in | |||
| Section 6.3 of [RFC5576], the receiver of the parameters MUST | Section 6.3 of [RFC5576], the receiver of the parameters MUST | |||
| store the parameter sets included in sprop-dci, sprop-vps, sprop- | store the parameter sets included in sprop-dci, sprop-vps, sprop- | |||
| sps, and/or sprop-pps and associate them with the source given as | sps, and/or sprop-pps and associate them with the source given as | |||
| part of the "fmtp" source attribute. Parameter sets associated | part of the "fmtp" source attribute. Parameter sets associated | |||
| with one source (given as part of the "fmtp" source attribute) | with one source (given as part of the "fmtp" source attribute) | |||
| MUST only be used to decode NAL units conveyed in RTP packets from | MUST only be used to decode NAL units conveyed in RTP packets from | |||
| the same source (given as part of the "fmtp" source attribute). | the same source (given as part of the "fmtp" source attribute). | |||
| When this mechanism is in use, SSRC collision detection and | When this mechanism is in use, SSRC collision detection and | |||
| resolution MUST be performed as specified in [RFC5576]. | resolution MUST be performed as specified in [RFC5576]. | |||
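
Two of the rules in the list above lend themselves to a mechanical check on an offer: max-lsr and max-fps must be absent when the direction attribute is sendonly, and an interleaved stream (sprop-max-don-diff greater than zero) requires sprop-depack-buf-bytes. The following is a minimal sketch under stated assumptions: the function name and the dictionary form of the parameters are hypothetical, and the remaining rules (parameter set transport, SSRC association) are not covered.

```python
def check_offer_payload_params(params, direction):
    """Return violations of two payload format configuration rules.

    params: dict mapping fmtp parameter names to string values
    (hypothetical representation for this sketch).
    direction: the SDP direction attribute, e.g. "sendrecv" or "sendonly".
    """
    problems = []

    # Receiver capability parameters make no sense for a send-only stream.
    for name in ("max-lsr", "max-fps"):
        if direction == "sendonly" and name in params:
            problems.append(f"{name} present with sendonly")

    # An interleaved stream must announce its de-packetization buffer size.
    interleaved = int(params.get("sprop-max-don-diff", "0")) > 0
    if interleaved and "sprop-depack-buf-bytes" not in params:
        problems.append("interleaved stream without sprop-depack-buf-bytes")

    return problems
```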
| Table 1 lists the interpretation of all the parameters that MAY be | Figure 11 lists the interpretation of all the parameters that MAY be | |||
| used for the various combinations of offer, answer, and direction | used for the various combinations of offer, answer, and direction | |||
| attributes. Note that the two columns wherein the recv-ols-id | attributes. | |||
| parameter is used only apply to answers, whereas the other columns | ||||
| apply to both offers and answers. | ||||
| sendonly --+ | sendonly --+ | |||
| answer: recvonly, recv-ols-id --+ | | answer: recvonly, recv-ols-id --+ | | |||
| recvonly w/o recv-ols-id --+ | | | recvonly w/o recv-ols-id --+ | | | |||
| answer: sendrecv, recv-ols-id --+ | | | | answer: sendrecv, recv-ols-id --+ | | | | |||
| sendrecv w/o recv-ols-id --+ | | | | | sendrecv w/o recv-ols-id --+ | | | | | |||
| | | | | | | | | | | | | |||
| profile-id C D C D P | profile-id C D C D P | |||
| tier-flag C D C D P | tier-flag C D C D P | |||
| level-id D D D D P | level-id D D D D P | |||
| skipping to change at page 57, line 32 ¶ | skipping to change at line 2553 ¶ | |||
| sprop-dci P P - - P | sprop-dci P P - - P | |||
| sprop-sei P P - - P | sprop-sei P P - - P | |||
| sprop-vps P P - - P | sprop-vps P P - - P | |||
| sprop-sps P P - - P | sprop-sps P P - - P | |||
| sprop-pps P P - - P | sprop-pps P P - - P | |||
| sprop-sublayer-id P P - - P | sprop-sublayer-id P P - - P | |||
| recv-sublayer-id O O O O - | recv-sublayer-id O O O O - | |||
| sprop-ols-id P P - - P | sprop-ols-id P P - - P | |||
| recv-ols-id X O X O - | recv-ols-id X O X O - | |||
| Table 1. Interpretation of parameters for various combinations of | ||||
| offers, answers, direction attributes, with and without recv-ols-id. | ||||
| Columns that do not indicate offer or answer apply to both. | ||||
| Legend: | Legend: | |||
| C: configuration for sending and receiving bitstreams | C: configuration for sending and receiving bitstreams | |||
| D: changeable configuration, same as C except possible | D: changeable configuration, same as C, except possible | |||
| to answer with a different but consistent value (see the | to answer with a different but consistent value (see the | |||
| semantics of the six parameters related to profile, tier, | semantics of the six parameters related to profile, tier, | |||
| and level on these parameters being consistent) | and level on these parameters being consistent) | |||
| P: properties of the bitstream to be sent | P: properties of the bitstream to be sent | |||
| R: receiver capabilities | R: receiver capabilities | |||
| O: operation point selection | O: operation point selection | |||
| X: MUST NOT be present | X: MUST NOT be present | |||
| -: not usable, when present MUST be ignored | -: not usable, when present MUST be ignored | |||
| Figure 11: Interpretation of Parameters for Various Combinations | ||||
| of Offers, Answers, and Direction Attributes, with and without | ||||
| recv-ols-id. | ||||
| Parameters used for declaring receiver capabilities are, in general, | Parameters used for declaring receiver capabilities are, in general, | |||
| downgradable; i.e., they express the upper limit for a sender's | downgradable, i.e., they express the upper limit for a sender's | |||
| possible behavior. Thus, a sender MAY select to set its encoder | possible behavior. Thus, a sender MAY select to set its encoder | |||
| using only lower/lesser or equal values of these parameters. | using only lower/lesser or equal values of these parameters. | |||
| When the answer does not include a recv-ols-id that is less than the | When the answer does not include a recv-ols-id that is less than the | |||
| sprop-ols-id in the offer, parameters declaring a configuration point | sprop-ols-id in the offer, parameters declaring a configuration point | |||
| are not changeable, with the exception of the level-id parameter for | are not changeable, with the exception of the level-id parameter for | |||
| unicast usage, and these parameters express values a receiver expects | unicast usage, and these parameters express values a receiver expects | |||
| to be used and MUST be used verbatim in the answer as in the offer. | to be used and MUST be used verbatim in the answer as in the offer. | |||
| When a sender's capabilities are declared with the configuration | When a sender's capabilities are declared with the configuration | |||
| skipping to change at page 58, line 26 ¶ | skipping to change at line 2596 ¶ | |||
| configurations in a single payload type. Thus, when multiple | configurations in a single payload type. Thus, when multiple | |||
| configuration offers are made, each offer requires its own RTP | configuration offers are made, each offer requires its own RTP | |||
| payload type associated with the offer. However, it is possible to | payload type associated with the offer. However, it is possible to | |||
| offer multiple operation points using one configuration in a single | offer multiple operation points using one configuration in a single | |||
| payload type by including sprop-vps in the offer and recv-ols-id in | payload type by including sprop-vps in the offer and recv-ols-id in | |||
| the answer. | the answer. | |||
| An implementation SHOULD be able to understand all media type | An implementation SHOULD be able to understand all media type | |||
| parameters (including all optional media type parameters), even if it | parameters (including all optional media type parameters), even if it | |||
| doesn't support the functionality related to the parameter. This, in | doesn't support the functionality related to the parameter. This, in | |||
| conjunction with proper application logic in the implementation | conjunction with proper application logic in the implementation, | |||
| allows the implementation, after having received an offer, to create | allows the implementation, after having received an offer, to create | |||
| an answer by potentially downgrading one or more of the optional | an answer by potentially downgrading one or more of the optional | |||
| parameters to the point where the implementation can cope, leading to | parameters to the point where the implementation can cope, leading to | |||
| higher chances of interoperability beyond the most basic interop | higher chances of interoperability beyond the most basic interop | |||
| points (for which, as described above, no optional parameters are | points (for which, as described above, no optional parameters are | |||
| necessary). | necessary). | |||
| Informative note: in implementations of previous H.26x payload | | Informative note: In implementations of previous H.26x payload | |||
| formats it was occasionally observed that implementations were | | formats, it was occasionally observed that implementations were | |||
| incapable of parsing most (or all) of the optional parameters. As | | incapable of parsing most (or all) of the optional parameters. | |||
| a result, the offer-answer exchange resulted in a baseline | | As a result, the offer/answer exchange resulted in a baseline | |||
| performance (using the default values for the optional parameters) | | performance (using the default values for the optional | |||
| with the resulting suboptimal user experience. However, there are | | parameters) with the resulting suboptimal user experience. | |||
| valid reasons to forego the implementation complexity of | | However, there are valid reasons to forego the implementation | |||
| implementing the parsing of some or all of the optional | | complexity of implementing the parsing of some or all of the | |||
| parameters, for example, when there is pre-determined knowledge, | | optional parameters, for example, when there is predetermined | |||
| not negotiated by an SDP-based offer/answer process, of the | | knowledge, not negotiated by an SDP-based offer/answer process, | |||
| capabilities of the involved systems (walled gardens, baseline | | of the capabilities of the involved systems (walled gardens, | |||
| requirements defined in application standards higher up in the | | baseline requirements defined in application standards higher | |||
| stack, and similar). | | up in the stack, and similar). | |||
| An answerer MAY extend the offer with additional media format | An answerer MAY extend the offer with additional media format | |||
| configurations. However, to enable their usage, in most cases a | configurations. However, to enable their usage, in most cases, a | |||
| second offer is required from the offerer to provide the bitstream | second offer is required from the offerer to provide the bitstream | |||
| property parameters that the media sender will use. This also has | property parameters that the media sender will use. This also has | |||
| the effect that the offerer has to be able to receive this media | the effect that the offerer has to be able to receive this media | |||
| format configuration, not only to send it. | format configuration, not only to send it. | |||
| 7.3.3. Multicast | 7.3.3. Multicast | |||
| For bitstreams being delivered over multicast, the following rules | For bitstreams being delivered over multicast, the following rules | |||
| apply: | apply: | |||
| skipping to change at page 59, line 46 ¶ | skipping to change at line 2659 ¶ | |||
| as long as the three above rules are obeyed. | as long as the three above rules are obeyed. | |||
| 7.3.4. Usage in Declarative Session Descriptions | 7.3.4. Usage in Declarative Session Descriptions | |||
| When VVC over RTP is offered with SDP in a declarative style, as in | When VVC over RTP is offered with SDP in a declarative style, as in | |||
| Real Time Streaming Protocol (RTSP) [RFC7826] or Session Announcement | Real Time Streaming Protocol (RTSP) [RFC7826] or Session Announcement | |||
| Protocol (SAP) [RFC2974], the following considerations are necessary. | Protocol (SAP) [RFC2974], the following considerations are necessary. | |||
| * All parameters capable of indicating both bitstream properties and | * All parameters capable of indicating both bitstream properties and | |||
| receiver capabilities are used to indicate only bitstream | receiver capabilities are used to indicate only bitstream | |||
| properties. For example, in this case, the parameter profile-id, | properties. For example, in this case, the parameters profile-id, | |||
| tier-flag, level-id declares the values used by the bitstream, not | tier-flag, and level-id declare the values used by the bitstream, | |||
| the capabilities for receiving bitstreams. As a result, the | not the capabilities for receiving bitstreams. As a result, the | |||
| following interpretation of the parameters MUST be used: | following interpretation of the parameters MUST be used: | |||
| - Declaring actual configuration or bitstream properties: | - Declaring actual configuration or bitstream properties: | |||
| o profile-id | o profile-id | |||
| o tier-flag | o tier-flag | |||
| o level-id | o level-id | |||
| skipping to change at page 61, line 11 ¶ | skipping to change at line 2720 ¶ | |||
| reject (RTSP) or not participate in (SAP) the session. It | reject (RTSP) or not participate in (SAP) the session. It | |||
| falls on the creator of the session to use values that are | falls on the creator of the session to use values that are | |||
| expected to be supported by the receiving application. | expected to be supported by the receiving application. | |||
| 7.3.5. Considerations for Parameter Sets | 7.3.5. Considerations for Parameter Sets | |||
| When out-of-band transport of parameter sets is used, parameter sets | When out-of-band transport of parameter sets is used, parameter sets | |||
| MAY still be additionally transported in-band unless explicitly | MAY still be additionally transported in-band unless explicitly | |||
| disallowed by an application, and some of these additional parameter | disallowed by an application, and some of these additional parameter | |||
| sets may update some of the out-of-band transported parameter sets. | sets may update some of the out-of-band transported parameter sets. | |||
| Update of a parameter set refers to the sending of a parameter set of | An update of a parameter set refers to the sending of a parameter set | |||
| the same type using the same parameter set ID but with different | of the same type using the same parameter set ID but with different | |||
| values for at least one other parameter of the parameter set. | values for at least one other parameter of the parameter set. | |||
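
The notion of a parameter set update described above can be illustrated with a small store keyed by parameter set type and ID. This is a sketch under assumptions, not a decoder interface: the class and method names are hypothetical, and comparing the raw payload bytes is used here as a stand-in for "different values for at least one other parameter", which a real implementation would determine by parsing the parameter set.

```python
class ParameterSetStore:
    """Tracks parameter sets received out of band or in-band."""

    def __init__(self):
        # Maps (type, id), e.g. ("SPS", 0), to the last payload seen.
        self._sets = {}

    def put(self, ps_type, ps_id, payload):
        """Store a parameter set and classify it.

        Returns "new" for a previously unseen (type, id), "unchanged"
        for a byte-identical repeat, and "update" when the same
        (type, id) arrives with different content.
        """
        key = (ps_type, ps_id)
        previous = self._sets.get(key)
        self._sets[key] = payload
        if previous is None:
            return "new"
        return "unchanged" if previous == payload else "update"
```

For example, an SPS with ID 0 delivered in sprop-sps and later resent in-band with the same bytes is classified as "unchanged", whereas an in-band SPS with ID 0 and different bytes is classified as "update".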
| 8. Use with Feedback Messages | 8. Use with Feedback Messages | |||
| The following subsections define the use of the Picture Loss | The following subsections define the use of the Picture Loss | |||
| Indication (PLI) and Full Intra Request (FIR) feedback messages with | Indication (PLI) and Full Intra Request (FIR) feedback messages with | |||
| [VVC]. The PLI is defined in [RFC4585], and the FIR message is | [VVC]. The PLI is defined in [RFC4585], and the FIR message is | |||
| defined in [RFC5104]. In accordance with this memo, unlike [HEVC], a | defined in [RFC5104]. In accordance with this memo, unlike [HEVC], a | |||
| sender MUST NOT send Slice Loss Indication (SLI) or Reference Picture | sender MUST NOT send Slice Loss Indication (SLI) or Reference Picture | |||
| Selection Indication (RPSI), and a receiver SHOULD ignore RPSI and | Selection Indication (RPSI), and a receiver SHOULD ignore RPSI and | |||
| treat a received SLI as a PLI. | treat a received SLI as a PLI. | |||
| 8.1. Picture Loss Indication (PLI) | 8.1. Picture Loss Indication (PLI) | |||
| As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a | As specified in Section 6.3.1 of [RFC4585], the reception of a PLI by | |||
| media sender indicates "the loss of an undefined amount of coded | a media sender indicates "the loss of an undefined amount of coded | |||
| video data belonging to one or more pictures". Without having any | video data belonging to one or more pictures". Without having any | |||
| specific knowledge of the setup of the bitstream (such as use and | specific knowledge of the setup of the bitstream (such as use and | |||
| location of in-band parameter sets, non-IRAP decoder refresh points, | location of in-band parameter sets, non-IRAP decoder refresh points, | |||
| picture structures, and so forth), a reaction to the reception of an | picture structures, and so forth), a reaction to the reception of a | |||
| PLI by a VVC sender SHOULD be to send an IRAP picture and relevant | PLI by a VVC sender SHOULD be to send an IRAP picture and relevant | |||
| parameter sets; potentially with sufficient redundancy so as to ensure | parameter sets, potentially with sufficient redundancy so as to ensure | |||
| correct reception. However, sometimes information about the | correct reception. However, sometimes information about the | |||
| bitstream structure is known. For example, state could have been | bitstream structure is known. For example, such information can be | |||
| established outside of the mechanisms defined in this document that | parameter sets that have been conveyed out of band through mechanisms | |||
| parameter sets are conveyed out of band only, and stay static for the | not defined in this document and that are known to stay static for | |||
| duration of the session. In that case, it is obviously unnecessary | the duration of the session. In that case, it is obviously | |||
| to send them in-band as a result of the reception of a PLI. Other | unnecessary to send them in-band as a result of the reception of a | |||
| examples could be devised based on a priori knowledge of different | PLI. Other examples could be devised based on a priori knowledge of | |||
| aspects of the bitstream structure. In all cases, the timing and | different aspects of the bitstream structure. In all cases, the | |||
| congestion control mechanisms of RFC 4585 MUST be observed. | timing and congestion control mechanisms of [RFC4585] MUST be | |||
| observed. | ||||
| 8.2. Full Intra Request (FIR) | 8.2. Full Intra Request (FIR) | |||
| The purpose of the FIR message is to force an encoder to send an | The purpose of the FIR message is to force an encoder to send an | |||
| independent decoder refresh point as soon as possible, while | independent decoder refresh point as soon as possible while observing | |||
| observing applicable congestion-control-related constraints, such as | applicable congestion-control-related constraints, such as those set | |||
| those set out in [RFC8082]). | out in [RFC8082]. | |||
| Upon reception of a FIR, a sender MUST send an IDR picture. | Upon reception of a FIR, a sender MUST send an IDR picture. | |||
| Parameter sets MUST also be sent, except when there is a priori | Parameter sets MUST also be sent, except when there is a priori | |||
| knowledge that the parameter sets have been correctly established. A | knowledge that the parameter sets have been correctly established. A | |||
| typical example for that is an understanding between sender and | typical example for that is an understanding between the sender and | |||
| receiver, established by means outside this document, that parameter | receiver, established by means outside this document, that parameter | |||
| sets are exclusively sent out-of-band. | sets are exclusively sent out of band. | |||
| 9. Security Considerations | 9. Security Considerations | |||
| The scope of this Security Considerations section is limited to the | The scope of this section is limited to the payload format itself and | |||
| payload format itself and to one feature of [VVC] that may pose a | to one feature of [VVC] that may pose a particularly serious security | |||
| particularly serious security risk if implemented naively. The | risk if implemented naively. The payload format, in isolation, does | |||
| payload format, in isolation, does not form a complete system. | not form a complete system. Implementers are advised to read and | |||
| Implementers are advised to read and understand relevant security- | understand relevant security-related documents, especially those | |||
| related documents, especially those pertaining to RTP (see the | pertaining to RTP (see the Security Considerations section in | |||
| Security Considerations section in [RFC3550]), and the security of | [RFC3550]) and the security of the call-control stack chosen (that | |||
| the call-control stack chosen (that may make use of the media type | may make use of the media type registration of this memo). | |||
| registration of this memo). Implementers should also consider known | Implementers should also consider known security vulnerabilities of | |||
| security vulnerabilities of video coding and decoding implementations | video coding and decoding implementations in general and avoid those. | |||
| in general and avoid those. | ||||
| Within this RTP payload format, and with the exception of the user | Within this RTP payload format, and with the exception of the user | |||
| data SEI message as described below, no security threats other than | data SEI message as described below, no security threats other than | |||
| those common to RTP payload formats are known. In other words, | those common to RTP payload formats are known. In other words, | |||
| neither the various media-plane-based mechanisms, nor the signaling | neither the various media-plane-based mechanisms nor the signaling | |||
| part of this memo, seems to pose a security risk beyond those common | part of this memo seem to pose a security risk beyond those common to | |||
| to all RTP-based systems. | all RTP-based systems. | |||
| RTP packets using the payload format defined in this specification | RTP packets using the payload format defined in this specification | |||
| are subject to the security considerations discussed in the RTP | are subject to the security considerations discussed in the RTP | |||
| specification [RFC3550], and in any applicable RTP profile such as | specification [RFC3550] and in any applicable RTP profile, such as | |||
| RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | |||
| SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | |||
| Does Not Mandate a Single Media Security Solution" [RFC7202] | Does Not Mandate a Single Media Security Solution" [RFC7202] | |||
| discusses, it is not an RTP payload format's responsibility to | discusses, it is not an RTP payload format's responsibility to | |||
| discuss or mandate what solutions are used to meet the basic security | discuss or mandate what solutions are used to meet the basic security | |||
| goals like confidentiality, integrity and source authenticity for RTP | goals, like confidentiality, integrity, and source authenticity for | |||
| in general. This responsibility lies with anyone using RTP in an | RTP in general. This responsibility lies with anyone using RTP in an | |||
| application. They can find guidance on available security mechanisms | application. They can find guidance on available security mechanisms | |||
| and important considerations in "Options for Securing RTP Sessions" | and important considerations in "Options for Securing RTP Sessions" | |||
| [RFC7201]. The rest of this section discusses the security-impacting | [RFC7201]. The rest of this section discusses the security-impacting | |||
| properties of the payload format itself. | properties of the payload format itself. | |||
| Because the data compression used with this payload format is applied | Because the data compression used with this payload format is applied | |||
| end-to-end, any encryption needs to be performed after compression. | end to end, any encryption needs to be performed after compression. | |||
| A potential denial-of-service threat exists for data encodings using | A potential denial-of-service threat exists for data encodings using | |||
| compression techniques that have non-uniform receiver-end | compression techniques that have non-uniform receiver-end | |||
| computational load. The attacker can inject pathological datagrams | computational load. The attacker can inject pathological datagrams | |||
| into the bitstream that are complex to decode and that cause the | into the bitstream that are complex to decode and that cause the | |||
| receiver to be overloaded. [VVC] is particularly vulnerable to such | receiver to be overloaded. [VVC] is particularly vulnerable to such | |||
| attacks, as it is extremely simple to generate datagrams containing | attacks, as it is extremely simple to generate datagrams containing | |||
| NAL units that affect the decoding process of many future NAL units. | NAL units that affect the decoding process of many future NAL units. | |||
| Therefore, the usage of data origin authentication and data integrity | Therefore, the usage of data origin authentication and data integrity | |||
| protection of at least the RTP packet is RECOMMENDED but NOT | protection of at least the RTP packet is RECOMMENDED but NOT REQUIRED | |||
| REQUIRED, based on the thoughts of [RFC7202] | based on the thoughts of [RFC7202]. | |||
| Like HEVC [RFC7798], [VVC] includes a user data Supplemental | Like HEVC [RFC7798], [VVC] includes a user data Supplemental | |||
| Enhancement Information (SEI) message. This SEI message allows | Enhancement Information (SEI) message. This SEI message allows | |||
| inclusion of an arbitrary bitstring into the video bitstream. Such a | inclusion of an arbitrary bitstring into the video bitstream. Such a | |||
| bitstring could include JavaScript, machine code, and other active | bitstring could include JavaScript, machine code, and other active | |||
| content. [VVC] leaves the handling of this SEI message to the | content. [VVC] leaves the handling of this SEI message to the | |||
| receiving system. In order to avoid harmful side effects of the user | receiving system. In order to avoid harmful side effects of the user | |||
| data SEI message, decoder implementations cannot naively trust its | data SEI message, decoder implementations cannot naively trust its | |||
| content. For example, it would be a bad and insecure implementation | content. For example, it would be a bad and insecure implementation | |||
| practice to forward any JavaScript a decoder implementation detects | practice to forward any JavaScript a decoder implementation detects | |||
| skipping to change at page 63, line 43 ¶ | skipping to change at line 2848 ¶ | |||
| end points. | end points. | |||
| 10. Congestion Control | 10. Congestion Control | |||
| Congestion control for RTP SHALL be used in accordance with RTP | Congestion control for RTP SHALL be used in accordance with RTP | |||
| [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551] or | [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551] or | |||
| AVPF [RFC4585]. If best-effort service is being used, an additional | AVPF [RFC4585]. If best-effort service is being used, an additional | |||
| requirement is that users of this payload format MUST monitor packet | requirement is that users of this payload format MUST monitor packet | |||
| loss to ensure that the packet loss rate is within an acceptable | loss to ensure that the packet loss rate is within an acceptable | |||
| range. Packet loss is considered acceptable if a TCP flow across the | range. Packet loss is considered acceptable if a TCP flow across the | |||
| same network path, and experiencing the same network conditions, | same network path and experiencing the same network conditions would | |||
| would achieve an average throughput, measured on a reasonable | achieve an average throughput, measured on a reasonable timescale, | |||
| timescale, that is not less than all RTP streams combined are | that is not less than all RTP streams combined are achieving. This | |||
| achieving. This condition can be satisfied by implementing | condition can be satisfied by implementing congestion-control | |||
| congestion-control mechanisms to adapt the transmission rate, the | mechanisms to adapt the transmission rate, by adapting the number | |||
| number of layers subscribed for a layered multicast session, or by | of layers subscribed for a layered multicast session, or by arranging | |||
| arranging for a receiver to leave the session if the loss rate is | for a receiver to leave the session if the loss rate is unacceptably | |||
| unacceptably high. | high. | |||
| The bitrate adaptation necessary for obeying the congestion control | The bitrate adaptation necessary for obeying the congestion control | |||
| principle is easily achievable when real-time encoding is used, for | principle is easily achievable when real-time encoding is used, for | |||
| example, by adequately tuning the quantization parameter. However, | example, by adequately tuning the quantization parameter. However, | |||
| when pre-encoded content is being transmitted, bandwidth adaptation | when pre-encoded content is being transmitted, bandwidth adaptation | |||
| requires the pre-coded bitstream to be tailored for such adaptivity. | requires the pre-coded bitstream to be tailored for such adaptivity. | |||
| The key mechanisms available in [VVC] are temporal scalability, and | The key mechanisms available in [VVC] are temporal scalability and | |||
| spatial/SNR scalability. A media sender can remove NAL units | spatial/SNR scalability. A media sender can remove NAL units | |||
| belonging to higher temporal sublayers (i.e., those NAL units with a | belonging to higher temporal sublayers (i.e., those NAL units with a | |||
| high value of TID) or higher spatio-SNR layers until the sending | high value of TID) or higher spatio-SNR layers until the sending | |||
| bitrate drops to an acceptable range. | bitrate drops to an acceptable range. | |||
| The mechanisms mentioned above generally work within a defined | The mechanisms mentioned above generally work within a defined | |||
| profile and level and, therefore, no renegotiation of the channel is | profile and level; therefore no renegotiation of the channel is | |||
| required. Only when non-downgradable parameters (such as profile) | required. Only when non-downgradable parameters (such as profile) | |||
| are required to be changed does it become necessary to terminate and | are required to be changed does it become necessary to terminate and | |||
| restart the RTP stream(s). This may be accomplished by using | restart the RTP stream(s). This may be accomplished by using | |||
| different RTP payload types. | different RTP payload types. | |||
| MANEs MAY remove certain unusable packets from the RTP stream when | MANEs MAY remove certain unusable packets from the RTP stream when | |||
| that RTP stream was damaged due to previous packet losses. This can | that RTP stream was damaged due to previous packet losses. This can | |||
| help reduce the network load in certain special cases. For example, | help reduce the network load in certain special cases. For example, | |||
| MANEs can remove those FUs where the leading FUs belonging to the | MANEs can remove those FUs where the leading FUs belonging to the | |||
| same NAL unit have been lost or those dependent slice segments when | same NAL unit have been lost or those dependent slice segments when | |||
| the leading slice segments belonging to the same slice have been | the leading slice segments belonging to the same slice have been | |||
| lost, because the trailing FUs or dependent slice segments are | lost, because the trailing FUs or dependent slice segments are | |||
| meaningless to most decoders. MANEs can also remove higher temporal | meaningless to most decoders. MANEs can also remove higher temporal | |||
| scalable layers if the outbound transmission (from the MANE's | scalable layers if the outbound transmission (from the MANE's | |||
| viewpoint) experiences congestion. | viewpoint) experiences congestion. | |||
| 11. IANA Considerations | 11. IANA Considerations | |||
| A new media type, as specified in Section 7.1 of this memo, has been | A new media type has been registered with IANA; see Section 7.1. | |||
| registered with IANA. | ||||
| 12. Acknowledgements | ||||
| Dr. Byeongdoo Choi is thanked for the video codec related technical | ||||
| discussion and other aspects in this memo. Xin Zhao and Dr. Xiang Li | ||||
| are thanked for their contributions on [VVC] specification | ||||
| descriptive content. Spencer Dawkins is thanked for his valuable | ||||
| review comments that led to great improvements of this memo. Some | ||||
| parts of this specification share text with the RTP payload format | ||||
| for HEVC [RFC7798]. We thank the authors of that specification for | ||||
| their excellent work. | ||||
| 13. References | 12. References | |||
| 13.1. Normative References | 12.1. Normative References | |||
| [ISO23090-3] | [ISO23090-3] | |||
| ISO/IEC 23090-3, "Information technology - Coded | International Organization for Standardization, | |||
| representation of immersive media Part 3 Versatile Video | "Information technology - Coded representation of | |||
| Coding", 2021, <https://www.iso.org/standard/73022.html>. | immersive media - Part 3: Versatile video coding", ISO/ | |||
| IEC 23090-3:2022, September 2022, | ||||
| <https://www.iso.org/standard/73022.html>. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model | [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model | |||
| with Session Description Protocol (SDP)", RFC 3264, | with Session Description Protocol (SDP)", RFC 3264, | |||
| DOI 10.17487/RFC3264, June 2002, | DOI 10.17487/RFC3264, June 2002, | |||
| <https://www.rfc-editor.org/info/rfc3264>. | <https://www.rfc-editor.org/info/rfc3264>. | |||
| skipping to change at page 65, line 35 ¶ | skipping to change at line 2926 ¶ | |||
| [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and | [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and | |||
| Video Conferences with Minimal Control", STD 65, RFC 3551, | Video Conferences with Minimal Control", STD 65, RFC 3551, | |||
| DOI 10.17487/RFC3551, July 2003, | DOI 10.17487/RFC3551, July 2003, | |||
| <https://www.rfc-editor.org/info/rfc3551>. | <https://www.rfc-editor.org/info/rfc3551>. | |||
| [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | |||
| Norrman, "The Secure Real-time Transport Protocol (SRTP)", | Norrman, "The Secure Real-time Transport Protocol (SRTP)", | |||
| RFC 3711, DOI 10.17487/RFC3711, March 2004, | RFC 3711, DOI 10.17487/RFC3711, March 2004, | |||
| <https://www.rfc-editor.org/info/rfc3711>. | <https://www.rfc-editor.org/info/rfc3711>. | |||
| [RFC4556] Zhu, L. and B. Tung, "Public Key Cryptography for Initial | ||||
| Authentication in Kerberos (PKINIT)", RFC 4556, | ||||
| DOI 10.17487/RFC4556, June 2006, | ||||
| <https://www.rfc-editor.org/info/rfc4556>. | ||||
| [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, | [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, | |||
| "Extended RTP Profile for Real-time Transport Control | "Extended RTP Profile for Real-time Transport Control | |||
| Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, | Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, | |||
| DOI 10.17487/RFC4585, July 2006, | DOI 10.17487/RFC4585, July 2006, | |||
| <https://www.rfc-editor.org/info/rfc4585>. | <https://www.rfc-editor.org/info/rfc4585>. | |||
| [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data | [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data | |||
| Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, | Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, | |||
| <https://www.rfc-editor.org/info/rfc4648>. | <https://www.rfc-editor.org/info/rfc4648>. | |||
| skipping to change at page 66, line 35 ¶ | skipping to change at line 2966 ¶ | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | |||
| Session Description Protocol", RFC 8866, | Session Description Protocol", RFC 8866, | |||
| DOI 10.17487/RFC8866, January 2021, | DOI 10.17487/RFC8866, January 2021, | |||
| <https://www.rfc-editor.org/info/rfc8866>. | <https://www.rfc-editor.org/info/rfc8866>. | |||
| [VSEI] "Versatile supplemental enhancement information messages | [VSEI] ITU-T, "Versatile supplemental enhancement information | |||
| for coded video bitstreams", 2020, | messages for coded video bitstreams", ITU-T | |||
| Recommendation H.274, May 2022, | ||||
| <https://www.itu.int/rec/T-REC-H.274>. | <https://www.itu.int/rec/T-REC-H.274>. | |||
| [VVC] "Versatile Video Coding, ITU-T Recommendation H.266", | [VVC] ITU-T, "Versatile Video Coding", ITU-T | |||
| 2020, <http://www.itu.int/rec/T-REC-H.266>. | Recommendation H.266, April 2022, | |||
| <http://www.itu.int/rec/T-REC-H.266>. | ||||
| 13.2. Informative References | 12.2. Informative References | |||
| [CABAC] and et al, "Transform coefficient coding in HEVC, IEEE | [CABAC] Sole, J., et al., "Transform coefficient coding in HEVC", | |||
| Transactions on Circuits and Systems for Video | IEEE Transactions on Circuits and Systems for Video | |||
| Technology", DOI 10.1109/TCSVT.2012.2223055, December | Technology, DOI 10.1109/TCSVT.2012.2223055, December 2012, | |||
| 2012, <https://doi.org/10.1109/TCSVT.2012.2223055>. | <https://doi.org/10.1109/TCSVT.2012.2223055>. | |||
| [HEVC] "High efficiency video coding, ITU-T Recommendation | [HEVC] ITU-T, "High efficiency video coding", ITU-T | |||
| H.265", 2019, <https://www.itu.int/rec/T-REC-H.265>. | Recommendation H.265, August 2021, | |||
| <https://www.itu.int/rec/T-REC-H.265>. | ||||
| [MPEG2S] IS0/IEC, "Information technology - Generic coding of | [MPEG2S] International Organization for Standardization, | |||
| moving pictures and associated audio information - Part 1: | "Information technology - Generic coding of moving | |||
| Systems, ISO International Standard 13818-1", 2013. | pictures and associated audio information - Part 1: | |||
| Systems", ISO/IEC 13818-1:2022, September 2022. | ||||
| [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session | [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session | |||
| Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974, | Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974, | |||
| October 2000, <https://www.rfc-editor.org/info/rfc2974>. | October 2000, <https://www.rfc-editor.org/info/rfc2974>. | |||
| [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP | [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP | |||
| Payload Format for H.264 Video", RFC 6184, | Payload Format for H.264 Video", RFC 6184, | |||
| DOI 10.17487/RFC6184, May 2011, | DOI 10.17487/RFC6184, May 2011, | |||
| <https://www.rfc-editor.org/info/rfc6184>. | <https://www.rfc-editor.org/info/rfc6184>. | |||
| skipping to change at page 68, line 5 ¶ | skipping to change at line 3034 ¶ | |||
| [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | |||
| M. Hannuksela, "RTP Payload Format for High Efficiency | M. Hannuksela, "RTP Payload Format for High Efficiency | |||
| Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | |||
| March 2016, <https://www.rfc-editor.org/info/rfc7798>. | March 2016, <https://www.rfc-editor.org/info/rfc7798>. | |||
| [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., | [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., | |||
| and M. Stiemerling, Ed., "Real-Time Streaming Protocol | and M. Stiemerling, Ed., "Real-Time Streaming Protocol | |||
| Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December | Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December | |||
| 2016, <https://www.rfc-editor.org/info/rfc7826>. | 2016, <https://www.rfc-editor.org/info/rfc7826>. | |||
| Appendix A. Change History | Acknowledgements | |||
| To RFC Editor: PLEASE REMOVE THIS SECTION BEFORE PUBLICATION | ||||
| draft-zhao-payload-rtp-vvc-00 ........ initial version | ||||
| draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and | ||||
| corrections | ||||
| draft-ietf-payload-rtp-vvc-00 ........ initial WG draft | ||||
| draft-ietf-payload-rtp-vvc-01 ........ VVC specification update | ||||
| draft-ietf-payload-rtp-vvc-02 ........ VVC specification update | ||||
| draft-ietf-payload-rtp-vvc-03 ........ VVC coding tool introduction | ||||
| update | ||||
| draft-ietf-payload-rtp-vvc-04 ........ VVC coding tool introduction | ||||
| update | ||||
| draft-ietf-payload-rtp-vvc-05 ........ reference update and adding | ||||
| placement for open issues | ||||
| draft-ietf-payload-rtp-vvc-06 ........ address editor's note | ||||
| draft-ietf-payload-rtp-vvc-07 ........ address editor's notes | ||||
| draft-ietf-payload-rtp-vvc-08 ........ address editor's notes | ||||
| draft-ietf-payload-rtp-vvc-09 ........ address editor's notes | ||||
| draft-ietf-payload-rtp-vvc-10 ........ address editor's notes | ||||
| draft-ietf-payload-rtp-vvc-11 ........ address editor's notes | ||||
| draft-ietf-payload-rtp-vvc-12 ........ address editor's notes | ||||
| draft-ietf-payload-rtp-vvc-13 ........ address editor's notes | ||||
| draft-ietf-payload-rtp-vvc-14 ........ address 2nd WGLC comments | Dr. Byeongdoo Choi is thanked for the video-codec-related technical | |||
| discussion and other aspects in this memo. Xin Zhao and Dr. Xiang Li | ||||
| are thanked for their contributions on [VVC] specification | ||||
| descriptive content. Spencer Dawkins is thanked for his valuable | ||||
| review comments that led to great improvements of this memo. Some | ||||
| parts of this specification share text with the RTP payload format | ||||
| for HEVC [RFC7798]. We thank the authors of that specification for | ||||
| their excellent work. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Shuai Zhao | Shuai Zhao | |||
| Intel | Intel | |||
| 2200 Mission College Blvd | 2200 Mission College Blvd | |||
| Santa Clara, 95054 | Santa Clara, 95054 | |||
| United States of America | United States of America | |||
| Email: shuai.zhao@ieee.org | Email: shuai.zhao@ieee.org | |||
| Stephan Wenger | Stephan Wenger | |||
| Tencent | Tencent | |||
| 2747 Park Blvd | 2747 Park Blvd | |||
| End of changes. 344 change blocks. | ||||
| 1262 lines changed or deleted | 1228 lines changed or added | |||
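Section 10 notes that a media sender can remove NAL units belonging to higher temporal sublayers until the sending bitrate drops to an acceptable range. The following is a minimal sketch of that thinning step, not part of the memo. It assumes the two-byte [VVC] NAL unit header, in which the low three bits of the second byte carry the temporal identifier plus 1, and it assumes the sender operates on complete NAL units before packetization (aggregation packets and FUs are not handled). The function names `temporal_id` and `drop_higher_sublayers` are hypothetical.

```python
from typing import Iterable, List


def temporal_id(nal_unit: bytes) -> int:
    """Return the temporal ID of a VVC NAL unit (header field value minus 1)."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit is shorter than its 2-byte header")
    # Second header byte: 5 bits of NAL unit type, then 3 bits of
    # nuh_temporal_id_plus1 (assumed layout, see lead-in).
    tid_plus1 = nal_unit[1] & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return tid_plus1 - 1


def drop_higher_sublayers(nal_units: Iterable[bytes], max_tid: int) -> List[bytes]:
    """Keep only NAL units whose temporal ID does not exceed max_tid."""
    return [nal for nal in nal_units if temporal_id(nal) <= max_tid]
```

A sender reacting to congestion might lower `max_tid` step by step and re-measure the sending bitrate after each step. How the bitrate target is chosen is left to the congestion-control mechanism in use.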