<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info"
     docName="draft-ietf-netvc-requirements-10" ipr="trust200902"
     submissionType="IETF" xml:lang="en" version="3" number="8761"
     consensus="true" symRefs="false" sortRefs="true" tocInclude="true">
<?rfc text-list-symbols="o.-*+"?>
<front>
<title abbrev="Video Codec Requirements and Evaluation">Video Codec Requirements and Evaluation Methodology</title>
<seriesInfo name="RFC" value="8761"/>
<author fullname="Alexey Filippov" initials="A." surname="Filippov">
<organization>Huawei Technologies</organization>
<address>
<email>alexey.filippov@huawei.com</email>
</address>
</author>
<author fullname="Andrey Norkin" initials="A." surname="Norkin">
<organization>Netflix</organization>
<address>
<email>anorkin@netflix.com</email>
</address>
</author>
<author fullname="Jose Roberto Alvarez" initials="J.R." surname="Alvarez">
<organization>Huawei Technologies</organization>
<address>
<email>j.alvarez@ieee.org</email>
</address>
</author>
<date month="April" year="2020"/>
<keyword>Internet-Draft</keyword>
<keyword>NETVC</keyword>
<keyword>evaluation</keyword>
<keyword>requirements</keyword>
<keyword>compression performance</keyword>
<keyword>video coding applications</keyword>
<abstract>
<t>
This document provides requirements for a video codec designed mainly for
use over the Internet. In addition, this document describes an evaluation
methodology for measuring the compression efficiency to determine whether
or not the stated requirements have been fulfilled.
</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t>This document presents the requirements for a video codec designed mainly
for use over the Internet. The requirements encompass a wide range of
applications that use data transmission over the Internet, including Internet
video streaming, IPTV, peer-to-peer video conferencing, video sharing,
screencasting, game streaming, and video monitoring and surveillance. For each
application, typical resolutions, frame rates, and picture-access modes are
presented. Specific requirements related to data transmission over
packet-loss networks are considered as well. In this document, when we
discuss data-protection techniques, we only refer to methods designed and
implemented to protect data inside the video codec since there are many
existing techniques that protect generic data transmitted over networks with
packet losses. From the theoretical point of view, both packet-loss and
bit-error robustness can be beneficial for video codecs. In practice, packet
losses are a more significant problem than bit corruption in IP networks. It
is worth noting that there is an evident interdependence between the possible
amount of delay and the necessity of error-robust video streams:
</t>
<ul spacing="normal">
<li>If the amount of delay is not crucial for an application, then reliable
transport protocols such as TCP that retransmit undelivered packets can be
used to guarantee correct decoding of transmitted data.
</li>
<li>If the amount of delay must be kept low, then either data transmission
should be error free (e.g., by using managed networks) or the compressed
video stream should be error resilient.
</li>
</ul>
<t>Thus, error resilience can be useful for delay-critical applications to
provide low delay in a packet-loss environment.
</t>
</section>
<section anchor="defs" title="Terminology Used in This Document">
<section anchor="def1" title="Definitions">
<dl newline="true">
<dt>High dynamic range imaging</dt>
<dd>A set of techniques that allows a greater dynamic range of exposures or
values (i.e., a wider range of values between light and dark areas) than normal
digital imaging techniques. The intention is to accurately represent the wide
range of intensity levels found in examples such as exterior scenes that
include light-colored items struck by direct sunlight and areas of deep shadow
<xref target="HDR"/>.</dd>
<dt>Random access period</dt>
<dd>The period of time between the two closest independently decodable frames
(pictures).</dd>
<dt>RD-point</dt>
<dd>A point in a two-dimensional rate-distortion space where the values of
bitrate and quality metric are used as x- and y-coordinates, respectively.</dd>
<dt>Visually lossless compression</dt>
<dd>A form or manner of lossy compression where the data that are lost
after the file is compressed and decompressed is not detectable to the eye;
the compressed data appear identical to the uncompressed data
<xref target="COMPRESSION"/>.</dd>
<dt>Wide color gamut</dt>
<dd>A certain complete color subset (e.g., considered in ITU-R BT.2020
<xref target="BT2020-2"/>) that supports a wider range of colors (i.e., an
extended range of colors that can be generated by a specific input or output
device such as a video camera, monitor, or printer and can be interpreted by a
color model) than conventional color gamuts (e.g., considered in ITU-R BT.601
<xref target="BT601"/> or BT.709 <xref target="BT709"/>).</dd>
</dl>
</section>
<section anchor="abbr" title="Abbreviations">
<dl newline="false" indent="12" spacing="normal">
<dt>AI</dt>
<dd>All-Intra (each picture is intra-coded)</dd>
<dt>BD-Rate</dt>
<dd>Bjontegaard Delta Rate</dd>
<dt>FIZD</dt>
<dd>just the First picture is Intra-coded, Zero structural Delay</dd>
<dt>FPS</dt>
<dd>Frames per Second</dd>
<dt>GOP</dt>
<dd>Group of Pictures</dd>
<dt>GPU</dt>
<dd>Graphics Processing Unit</dd>
<dt>HBR</dt>
<dd>High Bitrate Range</dd>
<dt>HDR</dt>
<dd>High Dynamic Range</dd>
<dt>HEVC</dt>
<dd>High Efficiency Video Coding</dd>
<dt>HRD</dt>
<dd>Hypothetical Reference Decoder</dd>
<dt>IPTV</dt>
<dd>Internet Protocol Television</dd>
<dt>LBR</dt>
<dd>Low Bitrate Range</dd>
<dt>MBR</dt>
<dd>Medium Bitrate Range</dd>
<dt>MOS</dt>
<dd>Mean Opinion Score</dd>
<dt>MS-SSIM</dt>
<dd>Multi-Scale Structural Similarity quality index</dd>
<dt>PAM</dt>
<dd>Picture Access Mode</dd>
<dt>PSNR</dt>
<dd>Peak Signal-to-Noise Ratio</dd>
<dt>QoS</dt>
<dd>Quality of Service</dd>
<dt>QP</dt>
<dd>Quantization Parameter</dd>
<dt>RA</dt>
<dd>Random Access</dd>
<dt>RAP</dt>
<dd>Random Access Period</dd>
<dt>RD</dt>
<dd>Rate-Distortion</dd>
<dt>SEI</dt>
<dd>Supplemental Enhancement Information</dd>
<dt>SIMD</dt>
<dd>Single Instruction, Multiple Data</dd>
<dt>SNR</dt>
<dd>Signal-to-Noise Ratio</dd>
<dt>UGC</dt>
<dd>User-Generated Content</dd>
<dt>VDI</dt>
<dd>Virtual Desktop Infrastructure</dd>
<dt>VUI</dt>
<dd>Video Usability Information</dd>
<dt>WCG</dt>
<dd>Wide Color Gamut</dd>
</dl>
</section>
</section>
<section anchor="apps" title="Applications">
<t>In this section, an overview of video codec applications that are currently
available on the Internet market is presented. It is worth noting that there
are different use cases for each application that define a target platform;
hence, there are different types of communication channels involved (e.g.,
wired or wireless channels) that are characterized by different QoS
as well as bandwidth; for instance, wired channels are considerably
less error prone than wireless channels and therefore require different QoS
approaches. The target platform, the channel bandwidth, and the
channel quality determine resolutions, frame rates, and either quality or
bitrates for video streams to be encoded or decoded.
By default, the YCbCr 4:2:0 color format is assumed for
the application scenarios listed below.
</t>
<section title="Internet Video Streaming">
<t>Typical content for this application is movies, TV series and shows, and
animation. Internet video streaming uses a variety of client devices and has
to operate under changing network conditions. For this reason, an adaptive
streaming model has been widely adopted. Video material is encoded at
different quality levels and different resolutions, which are then chosen by a
client depending on its capabilities and current network bandwidth. An example
combination of resolutions and bitrates is shown in <xref target="vid-stream"/>.
</t>
<t>A video encoding pipeline in on-demand Internet video streaming typically operates as follows:
</t>
<ul>
<li>Video is encoded in the cloud by software encoders.
</li>
<li>Source video is split into chunks, each of which is encoded separately, in parallel.
</li>
<li>Closed-GOP encoding with intrapicture intervals of 2-5 seconds (or longer) is used.
</li>
<li>Encoding is perceptually optimized. Perceptual quality is important and should be considered during the codec development.
</li>
</ul>
<table anchor="vid-stream">
<name>
Internet Video Streaming: Typical Values of Resolutions, Frame Rates,
and PAMs</name>
<thead>
<tr>
<th>Resolution *</th>
<th>PAM</th>
<th align="center">Frame Rate, FPS **</th>
</tr>
</thead>
<tbody>
<tr>
<td>4K, 3840x2160</td>
<td>RA</td>
<td align="center" rowspan="10"><br/><br/><br/>24/1.001, 24, 25,
<br/>30/1.001, 30, 50, <br/>60/1.001, 60, 100, <br/>120/1.001, 120</td>
</tr>
<tr>
<td>2K (1080p), 1920x1080</td>
<td>RA</td>
</tr>
<tr>
<td>1080i, 1920x1080*</td>
<td>RA</td>
</tr>
<tr>
<td>720p, 1280x720</td>
<td>RA</td>
</tr>
<tr>
<td>576p (EDTV), 720x576</td>
<td>RA</td>
</tr>
<tr>
<td>576i (SDTV), 720x576*</td>
<td>RA</td>
</tr>
<tr>
<td>480p (EDTV), 720x480</td>
<td>RA</td>
</tr>
<tr>
<td>480i (SDTV), 720x480*</td>
<td>RA</td>
</tr>
<tr>
<td>512x384</td>
<td>RA</td>
</tr>
<tr>
<td>QVGA, 320x240</td>
<td>RA</td>
</tr>
</tbody>
</table>
<t>
*Note: Interlaced content can be handled at the higher system level
and not necessarily by using specialized video coding tools. It is
included in this table only for the sake of completeness, as most
video content today is in the progressive format.
</t>
<t>
**Note: The set of frame rates presented in this table is taken from Table 2 in
<xref target="BT2020-2"/>.
</t>
<t>The characteristics and requirements of this application scenario are as follows:
</t>
<ul>
<li>High encoder complexity (up to 10x and more) can be tolerated since
encoding happens once and in parallel for different segments.
</li>
<li>Decoding complexity should be kept at reasonable levels to enable efficient decoder implementation.
</li>
<li><t>Support and efficient encoding of a wide range of content types and formats is required:</t>
<ul>
<li>High Dynamic Range (HDR), Wide Color Gamut (WCG), high-resolution
(currently, up to 4K), and high-frame-rate content are important use cases; the
codec should be able to encode such content efficiently.
</li>
<li>Improvement of coding efficiency at both lower and higher resolutions is
important since low resolutions are used when streaming in low-bandwidth
conditions.
</li>
<li>Improvement on both "easy" and "difficult" content in terms
of compression efficiency at the same quality level
contributes to the overall bitrate/storage savings.
</li>
<li>Film grain (and sometimes other types of noise) is often present in movies
and similar content; this is usually part of the creative intent.
</li>
</ul></li>
<li>Significant improvements in compression efficiency between generations of
video standards are desirable since this scenario typically assumes long-term
support of legacy video codecs.
</li>
<li>Random access points are inserted frequently (one per 2-5 seconds) to enable switching between resolutions and fast-forward playback.
</li>
<li>The elementary stream should have a model that allows easy parsing and
identification of the sample components.
</li>
<li>Middle QP values are normally used in streaming; this is also the range
where compression efficiency is important for this scenario.
</li>
<li>Scalability or other forms of supporting multiple quality representations
are beneficial if they do not incur significant bitrate overhead and if
mandated in the first version.
</li>
</ul>
</section>
<section title="Internet Protocol Television (IPTV)">
<t>This is a service for delivering television content over IP-based networks. IPTV may be classified into two main groups based on the type of delivery, as follows:
</t>
<ul>
<li>unicast (e.g., for video on demand), where delay is not crucial; and
</li>
<li>multicast/broadcast (e.g., for transmitting news) where
zapping (i.e., stream changing) delay is important.
</li>
</ul>
<t>In the IPTV scenario, traffic is transmitted over managed (QoS-based)
networks. Typical content used in this application is news, movies, cartoons,
series, TV shows, etc. One important requirement for both groups is that random
access to pictures (i.e., the random access period (RAP)) should be kept small
enough (approximately 1-5 seconds). Optional requirements are as follows:
</t>
<ul>
<li>Temporal (frame-rate) scalability; and
</li>
<li>Resolution and quality (SNR) scalability.
</li>
</ul>
<t>
For this application, typical values of resolutions, frame rates, and PAMs
are presented in <xref target="IPTV"/>.
</t>
<table anchor="IPTV">
<name>
IPTV: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead>
<tr>
<th>Resolution *</th>
<th>PAM</th>
<th align="center">Frame Rate, FPS **</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">2160p (4K), 3840x2160</td>
<td>RA</td>
<td align="center" rowspan="8"><br/><br/><br/>24/1.001, 24, 25,
<br/>30/1.001, 30, 50, <br/>60/1.001, 60, 100, <br/>120/1.001, 120</td>
</tr>
<tr>
<td>1080p, 1920x1080</td>
<td>RA</td>
</tr>
<tr>
<td>1080i, 1920x1080*</td>
<td>RA</td>
</tr>
<tr>
<td>720p, 1280x720</td>
<td>RA</td>
</tr>
<tr>
<td>576p (EDTV), 720x576</td>
<td>RA</td>
</tr>
<tr>
<td>576i (SDTV), 720x576*</td>
<td>RA</td>
</tr>
<tr>
<td>480p (EDTV), 720x480</td>
<td>RA</td>
</tr>
<tr>
<td>480i (SDTV), 720x480*</td>
<td>RA</td>
</tr>
</tbody>
</table>
<t>
*Note: Interlaced content can be handled at the higher system level
and not necessarily by using specialized video coding tools. It is
included in this table only for the sake of completeness, as most
video content today is in a progressive format.
</t>
<t>
**Note: The set of frame rates presented in this table is taken
from Table 2 in <xref target="BT2020-2"/>.
</t>
</section>
<section title="Video Conferencing">
<t>This is a form of video connection over the Internet. This form allows
users to establish connections to two or more people by two-way video and
audio transmission for communication in real time. For this application, both
stationary and mobile devices can be used. The main requirements are as
follows:
</t>
<ul>
<li>Delay should be kept as low as possible (the preferable and maximum
end-to-end delay values should be less than 100 ms <xref target="SG-16"/> and
320 ms <xref target="G1091"/>, respectively);
</li>
<li>Temporal (frame-rate) scalability; and
</li>
<li>Error robustness.
</li>
</ul>
<t>
Support of resolution and quality (SNR) scalability is highly
desirable. For this application, typical values of resolutions, frame rates,
and PAMs are presented in <xref target="vid-conf"/>.
</t>
<table anchor="vid-conf">
<name>
Video Conferencing: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead>
<tr>
<th>Resolution</th>
<th>Frame Rate, FPS</th>
<th>PAM</th>
</tr>
</thead>
<tbody>
<tr>
<td>1080p, 1920x1080</td>
<td>15, 30</td>
<td>FIZD</td>
</tr>
<tr>
<td>720p, 1280x720</td>
<td>30, 60</td>
<td>FIZD</td>
</tr>
<tr>
<td>4CIF, 704x576</td>
<td>30, 60</td>
<td>FIZD</td>
</tr>
<tr>
<td>4SIF, 704x480</td>
<td>30, 60</td>
<td>FIZD</td>
</tr>
<tr>
<td>VGA, 640x480</td>
<td>30, 60</td>
<td>FIZD</td>
</tr>
<tr>
<td>360p, 640x360</td>
<td>30, 60</td>
<td>FIZD</td>
</tr>
</tbody>
</table>
</section>
<section title="Video Sharing">
<t>This is a service that allows people to upload and share video data (using
live streaming or not) and watch those videos. It is also known as video
hosting. A typical User-Generated Content (UGC) scenario for this application
is to capture video using mobile cameras such as GoPros or cameras integrated
into smartphones (amateur video). The main requirements are as follows:
</t>
<ul>
<li>Random access to pictures for downloaded video data;
</li>
<li>Temporal (frame-rate) scalability; and
</li>
<li>Error robustness.
</li>
</ul>
<t>Support of resolution and quality (SNR) scalability is highly desirable. For <t>
this application, typical values of resolutions, frame-rates, and RAPs are prese Support of resolution and quality (SNR) scalability is highly
nted in Table 6. desirable. For this application, typical values of resolutions, frame rates,
and PAMs are presented in <xref target="vid-share" />.
</t>
<t>
Typical values of resolutions and frame rates in <xref target="vid-share" /> are
taken from
<xref target="YOUTUBE" />.
</t> </t>
<table anchor="vid-share">
<name>Video Sharing: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead>
<tr>
<th>Resolution</th>
<th>Frame Rate, FPS</th>
<th>PAM</th>
</tr>
</thead>
<tbody>
<tr>
<td>2160p (4K), 3840x2160</td>
<td>24, 25, 30, 48, 50, 60</td>
<td>RA</td>
</tr>
<tr>
<td>1440p (2K), 2560x1440</td>
<td>24, 25, 30, 48, 50, 60</td>
<td>RA</td>
</tr>
<tr>
<td>1080p, 1920x1080</td>
<td>24, 25, 30, 48, 50, 60</td>
<td>RA</td>
</tr>
<tr>
<td>720p, 1280x720</td>
<td>24, 25, 30, 48, 50, 60</td>
<td>RA</td>
</tr>
<tr>
<td>480p, 854x480</td>
<td>24, 25, 30, 48, 50, 60</td>
<td>RA</td>
</tr>
<tr>
<td>360p, 640x360</td>
<td>24, 25, 30, 48, 50, 60</td>
<td>RA</td>
</tr>
</tbody>
</table>
</section>
<section title="Screencasting">
<t>This is a service that allows users to record and distribute
video data from a computer screen. This service requires efficient compression
of computer-generated content with high visual quality up to visually and
mathematically (numerically) lossless <xref target="HEVC-EXT"/>.
Currently, this application includes business presentations (PowerPoint, Word
documents, email messages, etc.), animation (cartoons), gaming content, and
data visualization. This type of content is characterized by fast motion,
rotation, smooth shade, 3D effect, highly saturated colors with full
resolution, and clear textures and sharp edges with distinct colors
<xref target="HEVC-EXT"/>. The application also covers virtual desktop
infrastructure (VDI), screen/desktop sharing and collaboration, supervisory
control and data acquisition (SCADA) display, automotive/navigation display,
cloud gaming, factory automation display, wireless display, display wall,
digital operating room (DiOR), etc. For this application, an important
requirement is the support of low-delay configurations with zero structural
delay for a wide range of video formats (e.g., RGB) in addition to YCbCr 4:2:0
and YCbCr 4:4:4 <xref target="HEVC-EXT"/>.
For this application, typical values of resolutions,
frame rates, and PAMs are presented in <xref target="screencast"/>.
</t>
<table anchor="screencast">
<name>Screencasting for RGB and YCbCr 4:4:4 Format: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead>
<tr>
<th align="center">Resolution</th>
<th align="center">Frame Rate, FPS</th>
<th align="center">PAM</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="3" align="center">Input color format: RGB 4:4:4</td>
</tr>
<tr>
<td>5k, 5120x2880</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>4k, 3840x2160</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>WQXGA, 2560x1600</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>WUXGA, 1920x1200</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>WSXGA+, 1680x1050</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>WXGA, 1280x800</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>XGA, 1024x768</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>SVGA, 800x600</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>VGA, 640x480</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td colspan="3" align="center">Input color format: YCbCr 4:4:4</td>
</tr>
<tr>
<td>5k, 5120x2880</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>4k, 3840x2160</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>1440p (2K), 2560x1440</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>1080p, 1920x1080</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
<tr>
<td>720p, 1280x720</td>
<td>15, 30, 60</td>
<td>AI, RA, FIZD</td>
</tr>
</tbody>
</table>
</section>
<section title="Game Streaming">
<t>This is a service that provides game content over the Internet to different
local devices such as notebooks and gaming tablets. In this category of
applications, the server renders 3D games in a cloud server and streams the
game to any device with a wired or wireless broadband connection
<xref target="GAME"/>. There are low-latency requirements for transmitting
user interactions and receiving game data with a turnaround delay of less than
100 ms. This allows anyone to play (or resume) full-featured games from
anywhere on the Internet <xref target="GAME"/>. An example of this application
is Nvidia Grid <xref target="GAME"/>.
Another application scenario of this category is broadcast of video games
played by people over the Internet in real time or for later viewing
<xref target="GAME"/>. There are many companies, such as Twitch and YY in
China, that enable game broadcasting <xref target="GAME"/>. Games typically
contain a lot of sharp edges and large motion <xref target="GAME"/>. The main
requirements are as follows:
</t>
<ul>
<li>Random access to pictures for game broadcasting;
</li>
<li>Temporal (frame-rate) scalability; and
</li>
<li>Error robustness.
</li>
</ul>
<t>
Support of resolution and quality (SNR) scalability is highly
desirable. For this application, typical values of resolutions, frame rates,
and PAMs are similar to the ones presented in <xref target="vid-conf"/>.
</t>
</section>
<section title="Video Monitoring and Surveillance">
<t>This is a type of live broadcasting over IP-based networks. Video streams
are sent to many receivers at the same time. A new receiver may connect to the
stream at an arbitrary moment, so the random access period should be kept
small enough (approximately 1-5 seconds). Data are transmitted publicly in
the case of video monitoring and privately in the case of video
surveillance. For IP cameras that have to capture, process, and encode video
data, complexity -- including computational and hardware complexity, as well
as memory bandwidth -- should be kept low to allow real-time processing. In
addition, support of a high dynamic range and a monochrome mode (e.g., for
infrared cameras) as well as resolution and quality (SNR) scalability is an
essential requirement for video surveillance. In some use cases, high
video signal fidelity is required even after lossy compression. Typical values
of resolutions, frame rates, and PAMs for video monitoring and surveillance
applications are presented in <xref target="monitoring"/>.
</t>
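The relationship between the random access period and a joining receiver's worst-case tune-in delay stated above is simple arithmetic; the following sketch (illustrative only, not part of the requirements; the function name is ours) makes it explicit:

```python
def intra_period_frames(rap_seconds: float, fps: float) -> int:
    # Frames between successive random access points (RAPs); a receiver
    # joining at an arbitrary moment waits at most rap_seconds for the
    # next decodable intra picture.
    return round(rap_seconds * fps)

# For the 1-5 second RAPs cited above, at 30 fps:
assert intra_period_frames(1, 30) == 30    # one intra picture every 30 frames
assert intra_period_frames(5, 30) == 150   # one intra picture every 150 frames
```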
<table anchor="monitoring">
<name>Video Monitoring and Surveillance: Typical Values of Resolutions, Frame Rates, and PAMs</name>
<thead>
<tr>
<th>Resolution</th>
<th>Frame Rate, FPS</th>
<th>PAM</th>
</tr>
</thead>
<tbody>
<tr>
<td>2160p (4K), 3840x2160</td>
<td>12, 25, 30</td>
<td>RA, FIZD</td>
</tr>
<tr>
<td>5Mpixels, 2560x1920</td>
<td>12, 25, 30</td>
<td>RA, FIZD</td>
</tr>
<tr>
<td>1080p, 1920x1080</td>
<td>25, 30</td>
<td>RA, FIZD</td>
</tr>
<tr>
<td>1.23Mpixels, 1280x960</td>
<td>25, 30</td>
<td>RA, FIZD</td>
</tr>
<tr>
<td>720p, 1280x720</td>
<td>25, 30</td>
<td>RA, FIZD</td>
</tr>
<tr>
<td>SVGA, 800x600</td>
<td>25, 30</td>
<td>RA, FIZD</td>
</tr>
</tbody>
</table>
</section>
</section>
<section title="Requirements">
<t>Taking the requirements discussed above for specific video applications,
this section proposes requirements for an Internet video codec.
</t>
<section anchor="gen-reqs" title="General Requirements">
<section anchor="efficiency" title="Coding Efficiency">
<t>
The most fundamental requirement is coding efficiency, i.e., compression
performance on both "easy" and "difficult" content for applications and use
cases in <xref target="apps"/>. The codec should provide higher coding
efficiency over state-of-the-art video codecs such as HEVC/H.265 and VP9, at
least 25%, in accordance with the methodology described in
<xref target="eval-method"/> of this document. For higher resolutions, the
improvements in coding efficiency are expected to be higher than for lower
resolutions.
</t>
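A coding efficiency gain such as the 25% above is conventionally quantified as a Bjontegaard-delta (BD) rate: the average bitrate difference between two rate-distortion curves over their common quality range. The sketch below is illustrative only (it assumes a cubic fit in the log-rate domain and is not the exact tooling prescribed by the evaluation methodology):

```python
import numpy as np

def bd_rate(rates_a, psnr_a, rates_b, psnr_b):
    """Approximate BD-rate (%) of codec B against anchor A.

    Negative values mean codec B needs less bitrate for the same PSNR.
    """
    la, lb = np.log10(rates_a), np.log10(rates_b)
    pa = np.polyfit(psnr_a, la, 3)          # cubic fit: log-rate vs. PSNR
    pb = np.polyfit(psnr_b, lb, 3)
    lo = max(min(psnr_a), min(psnr_b))      # overlapping PSNR interval
    hi = min(max(psnr_a), max(psnr_b))
    ia, ib = np.polyint(pa), np.polyint(pb)
    avg_a = (np.polyval(ia, hi) - np.polyval(ia, lo)) / (hi - lo)
    avg_b = (np.polyval(ib, hi) - np.polyval(ib, lo)) / (hi - lo)
    return (10 ** (avg_b - avg_a) - 1) * 100

# If codec B reaches the same PSNRs at exactly 75% of the anchor's bitrate,
# the BD-rate is -25%.
anchor = [1000.0, 2000.0, 4000.0, 8000.0]
psnr = [32.0, 35.0, 38.0, 41.0]
tested = [r * 0.75 for r in anchor]
assert abs(bd_rate(anchor, psnr, tested, psnr) + 25.0) < 1e-6
```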
</section>
<section anchor="profiles" title="Profiles and Levels">
<t>Good-quality specification and well-defined profiles and levels are
required to enable device interoperability and facilitate decoder
implementations. A profile consists of a subset of entire bitstream syntax
elements; consequently, it also defines the necessary tools for decoding a
conforming bitstream of that profile. A level imposes a set of numerical
limits to the values of some syntax elements. An example of codec levels to be
supported is presented in <xref target="codec-levels"/>. An actual level
definition should include constraints on features that impact the decoder
complexity. For example, these features might be as follows: maximum bitrate,
line buffer size, memory usage, etc.
</t>
<table anchor="codec-levels">
<name>Codec Levels</name>
<thead>
<tr>
<th>Level</th>
<th>Example picture resolution at highest frame rate</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>128x96(12,288*)@30.0<br/>176x144(25,344*)@15.0</td>
</tr>
<tr>
<td>2</td>
<td>352x288(101,376*)@30.0</td>
</tr>
<tr>
<td>3</td>
<td>352x288(101,376*)@60.0<br/>640x360(230,400*)@30.0</td>
</tr>
<tr>
<td>4</td>
<td>640x360(230,400*)@60.0<br/>960x540(518,400*)@30.0</td>
</tr>
<tr>
<td>5</td>
<td>720x576(414,720*)@75.0<br/>960x540(518,400*)@60.0<br/>1280x720(921,600*)@30.0</td>
</tr>
<tr>
<td>6</td>
<td>1,280x720(921,600*)@68.0<br/>2,048x1,080(2,211,840*)@30.0</td>
</tr>
<tr>
<td>7</td>
<td>1,280x720(921,600*)@120.0<br/>2,048x1,080(2,211,840*)@60.0</td>
</tr>
<tr>
<td>8</td>
<td>1,920x1,080(2,073,600*)@120.0<br/>3,840x2,160(8,294,400*)@30.0<br/>4,096x2,160(8,847,360*)@30.0</td>
</tr>
<tr>
<td>9</td>
<td>1,920x1,080(2,073,600*)@250.0<br/>4,096x2,160(8,847,360*)@60.0</td>
</tr>
<tr>
<td>10</td>
<td>1,920x1,080(2,073,600*)@300.0<br/>4,096x2,160(8,847,360*)@120.0</td>
</tr>
<tr>
<td>11</td>
<td>3,840x2,160(8,294,400*)@120.0<br/>8,192x4,320(35,389,440*)@30.0</td>
</tr>
<tr>
<td>12</td>
<td>3,840x2,160(8,294,400*)@250.0<br/>8,192x4,320(35,389,440*)@60.0</td>
</tr>
<tr>
<td>13</td>
<td>3,840x2,160(8,294,400*)@300.0<br/>8,192x4,320(35,389,440*)@120.0</td>
</tr>
</tbody>
</table>
<t>
*Note: The quantities of pixels are presented for applications in which a
picture can have an arbitrary size (e.g., screencasting).
</t>
</section>
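The asterisked quantities in the codec levels table are simply width x height luma sample counts per picture. A quick sketch (illustrative only; the helper name is ours) verifies the arithmetic:

```python
def luma_samples(width: int, height: int) -> int:
    # Asterisked values in the codec levels table: luma samples per picture.
    return width * height

# Spot-check a few entries from the table.
assert luma_samples(128, 96) == 12_288
assert luma_samples(2048, 1080) == 2_211_840
assert luma_samples(8192, 4320) == 35_389_440
```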
<section anchor="syntax" title="Bitstream Syntax">
<t>Bitstream syntax should allow extensibility and backward
compatibility. New features can be supported easily by using metadata (such as
SEI messages, VUI, and headers) without affecting the bitstream compatibility
with legacy decoders. A newer version of the decoder shall be able to play
bitstreams of an older version of the same or lower profile and level.
</t>
</section>
<section anchor="model" title="Parsing and Identification of Sample Components">
<t>
A bitstream should have a model that allows easy parsing and identification of
the sample components (such as Annex B of ISO/IEC 14496-10
<xref target="ISO14496-10"/> or ISO/IEC 14496-15 <xref target="ISO14496-15"/>).
In particular, information needed for packet handling (e.g., frame type) should
not require parsing anything below the header level.
</t>
</section>
<section anchor="tools" title="Perceptual Quality Tools">
<t>
Perceptual quality tools (such as adaptive QP and quantization matrices)
should be supported by the codec bitstream.
</t>
</section>
<section anchor="buffer" title="Buffer Model">
<t>
The codec specification shall define a buffer model such as a hypothetical
reference decoder (HRD).
</t>
</section>
<section anchor="integration" title="Integration">
<t>
Specifications providing integration with system and delivery layers should be
developed.
</t>
</section>
</section>
<section title="Basic Requirements">
<section title="Input Source Formats">
<t>
Input pictures coded by a video codec should have one of the following formats:
</t>
<ul>
<li>Bit depth: 8 and 10 bits (up to 12 bits for a high profile) per color component.
</li>
<li><t>Color sampling formats:</t>
<ul>
<li>YCbCr 4:2:0
</li>
<li>YCbCr 4:4:4, YCbCr 4:2:2, and YCbCr 4:0:0 (preferably in different profile(s))
</li>
</ul></li>
<li>For profiles with bit depth of 10 bits per sample or higher, support of high dynamic range and wide color gamut.
</li>
<li>Support of arbitrary resolution according to the level constraints for
applications in which a picture can have an arbitrary size (e.g., in
screencasting).
</li>
</ul>
<t>
Exemplary input source formats for codec profiles are shown in
<xref target="exemplary"/>.
</t>
<table anchor="exemplary">
<name>Exemplary Input Source Formats for Codec Profiles</name>
<thead>
<tr>
<th>Profile</th>
<th>Bit depths per color component</th>
<th>Color sampling formats</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>8 and 10</td>
<td>4:0:0 and 4:2:0</td>
</tr>
<tr>
<td>2</td>
<td>8 and 10</td>
<td>4:0:0, 4:2:0, and 4:4:4</td>
</tr>
<tr>
<td>3</td>
<td>8, 10, and 12</td>
<td>4:0:0, 4:2:0, 4:2:2, and 4:4:4</td>
</tr>
</tbody>
</table>
</section>
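The bit depths and color sampling formats above determine the raw picture size a codec of each profile must handle. As a rough sketch (the chroma sampling factors are the standard ones for these formats; the helper name is ours, not the specification's):

```python
# Chroma samples per luma sample for the listed sampling formats.
CHROMA_FACTOR = {"4:0:0": 0.0, "4:2:0": 0.5, "4:2:2": 1.0, "4:4:4": 2.0}

def raw_picture_bits(width: int, height: int, bit_depth: int, sampling: str) -> int:
    # Luma plane plus both chroma planes, all at the same bit depth.
    samples = width * height * (1 + CHROMA_FACTOR[sampling])
    return int(samples * bit_depth)

# An 8-bit 4:2:0 1080p picture holds 1920*1080*1.5 samples,
# i.e., about 3.1 MB of raw data per frame.
assert raw_picture_bits(1920, 1080, 8, "4:2:0") == 24_883_200
```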
<section title="Coding Delay">
<t>
In order to meet coding delay requirements, a video codec should support all of
the following:
</t>
<ul>
<li><t>Support of configurations with zero structural delay, also referred to
as "low-delay" configurations.</t>
<ul>
<li>Note: End-to-end delay should be no more than 320 ms <xref target="G1091"/>,
but it is preferable for its value to be less than 100 ms
<xref target="SG-16"/>.
</li>
</ul></li>
<li>Support of efficient random access point encoding (such as intracoding and
resetting of context variables), as well as efficient switching between
multiple quality representations.
</li>
<li>Support of configurations with nonzero structural delay (such as
out-of-order or multipass encoding) for applications without low-delay
requirements, if such configurations provide additional compression efficiency
improvements.
</li>
</ul>
</section>
<section title="Complexity">
<t>
Encoding and decoding complexity considerations are as follows:
</t>
<ul>
<li>Feasible real-time implementation of both an encoder and a decoder
supporting a chosen subset of tools for hardware and software implementation
on a wide range of state-of-the-art platforms. The subset of real-time encoder
tools should provide meaningful improvement in compression efficiency at
reasonable complexity of hardware and software encoder implementations as
compared to real-time implementations of state-of-the-art video compression
technologies such as HEVC/H.265 and VP9.
</li>
<li>High-complexity software encoder implementations used by offline encoding
applications can have a 10x or more complexity increase compared to
state-of-the-art video compression technologies such as HEVC/H.265 and VP9.
</li>
</ul>
</section>
<section title="Scalability:"> <!-- 4.2.4, line 495--> <section title="Scalability">
<t>
The mandatory scalability requirement is as follows:
</t>
<ul> <ul>
<li>Temporal (frame-rate) scalability should be supported. <li>Temporal (frame-rate) scalability should be supported.
</li> </li>
</ul> </ul>
</section> <!-- ends: "4.2.4 from line 495--> </section>
<section title="Error resilience:"> <!-- 4.2.5, line 501--> <section title="Error Resilience">
<t>
In order to meet the error resilience requirement, a video codec should
satisfy all of the following conditions:
</t>
<ul> <ul>
<li>Error resilience tools that are complementary to the error protection mechan <li>Tools that are complementary to the error-protection
isms implemented on transport level should be supported. mechanisms implemented on the transport level should be supported.
</li> </li>
<li>The codec should support mechanisms that facilitate packetization of a bitst ream for common network protocols. <li>The codec should support mechanisms that facilitate packetization of a bitst ream for common network protocols.
</li> </li>
<li>Packetization mechanisms should enable frame-level error recovery by means o f retransmission or error concealment. <li>Packetization mechanisms should enable frame-level error recovery by means o f retransmission or error concealment.
</li> </li>
<li>The codec should support effective mechanisms for allowing decoding and reco nstruction of significant parts of pictures in the event that parts of the pictu re data are lost in transmission. <li>The codec should support effective mechanisms for allowing decoding and reco nstruction of significant parts of pictures in the event that parts of the pictu re data are lost in transmission.
</li> </li>
<li>The bitstream specification shall support independently decodable sub-frame <li>The bitstream specification shall support independently decodable subframe
units similar to slices or independent tiles. It shall be possible for the encod units similar to slices or independent tiles. It shall be possible for the
er to restrict the bit-stream to allow parsing of the bit-stream after a packet- encoder to restrict the bitstream to allow parsing of the bitstream after a
loss and to communicate it to the decoder. packet loss and to communicate it to the decoder.
</li> </li>
</ul> </ul>
</section> <!-- ends: "4.2.5 from line 501--> </section>
</section> <!-- ends: "4.2 from line 423--> </section>
<section title="Optional requirements"> <!-- 4.3, line 519--> <section title="Optional Requirements">
<section title="Input source formats"> <!-- 4.3.1, line 522--> <section title="Input Source Formats">
<t>
It is a desired but not mandatory requirement for a video codec to support
some of the following features:
</t>
<ul> <ul>
<li>Bit depth: up to 16-bits per color component. <li>Bit depth: up to 16 bits per color component.
</li> </li>
<li>Color sampling formats: RGB 4:4:4. <li>Color sampling formats: RGB 4:4:4.
</li> </li>
<li>Auxiliary channel (e.g., alpha channel) support. <li>Auxiliary channel (e.g., alpha channel) support.
</li> </li>
</ul> </ul>
</section> <!-- ends: "4.3.1 from line 522--> </section>
<section title="Scalability:"> <!-- 4.3.2, line 534--> <section title="Scalability">
<t>
Desirable scalability requirements are as follows:
</t>
<ul> <ul>
<li>Resolution and quality (SNR) scalability that provide low compression effici <li>Resolution and quality (SNR) scalability that provides a low-compression
ency penalty (up to 5% of BD-rate [13] increase per layer with reasonable increa efficiency penalty (increase of up to 5% of BD-rate <xref target="PSNR" /> per
se of both computational and hardware complexity) can be supported in the main p layer with reasonable increase of both computational and hardware complexity)
rofile of the codec being developed by the NETVC WG. Otherwise, a separate profi can be supported in the main profile of the codec being developed by the NETVC
le is needed to support these types of scalability. Working Group. Otherwise, a separate profile is needed to support these types of
scalability.
</li> </li>
<li>Computational complexity scalability(i.e. computational complexity is decrea <li>Computational complexity scalability (i.e., computational complexity is
sing along with degrading picture quality) is desirable. decreasing along with degrading picture quality) is desirable.
</li> </li>
</ul> </ul>
</section> <!-- ends: "4.3.2 from line 534--> </section>
<section title="Complexity:"> <!-- 4.3.3, line 544--> <section title="Complexity">
<t>Tools that enable parallel processing (e.g., slices, tiles, wave front propag <t>Tools that enable parallel processing (e.g., slices, tiles, and wave-front
ation processing) at both encoder and decoder sides are highly desirable for man propagation processing) at both encoder and decoder sides are highly desirable
y applications. for many applications.
</t> </t>
<ul> <ul>
<li>High-level multi-core parallelism: encoder and decoder operation, especially <li>High-level multicore parallelism: encoder and decoder operation,
entropy encoding and decoding, should allow multiple frames or sub-frame region especially entropy encoding and decoding, should allow multiple frames or
s (e.g. 1D slices, 2D tiles, or partitions) to be processed concurrently, either subframe regions (e.g., 1D slices, 2D tiles, or partitions) to be processed
independently or with deterministic dependencies that can be efficiently pipeli concurrently, either independently or with deterministic dependencies that can
ned be efficiently pipelined.
</li> </li>
<li>Low-level instruction set parallelism: favor algorithms that are SIMD/GPU fr <li>Low-level instruction-set parallelism: favor algorithms that are SIMD/GPU
iendly over inherently serial algorithms friendly over inherently serial algorithms
</li> </li>
</ul> </ul>
</section> <!-- ends: "4.3.3 from line 544--> </section>
<section title="Coding efficiency"> <!-- 4.3.4, line 557--> <section title="Coding Efficiency">
<t>Compression efficiency on noisy content, content with film grain, computer ge <t>Compression efficiency on noisy content, content with film grain, computer
nerated content, and low resolution materials is desirable. generated content, and low resolution materials is desirable.
</t> </t>
</section> <!-- ends: "4.3.4 from line 557--> </section>
</section> <!-- ends: "4.3 from line 519--> </section>
</section> <!-- ends: "4 from line 401--> </section>
<section title="Evaluation methodology"> <!-- 5, line 563--> <section anchor="eval-method" title="Evaluation Methodology">
<t>As shown in Fig.1, compression performance testing is performed in 3 overlapp
ed ranges that encompass 10 different bitrate values: <t>As shown in <xref target="QP"/>, compression performance testing is
performed in three overlapped ranges that encompass ten different bitrate values
:
</t> </t>
<ul> <ul>
<li>Low bitrate range (LBR) is the range that contains the 4 lowest bitrates of <li>Low bitrate range (LBR) is the range that contains the four lowest
the 10 specified bitrates (1 of the 4 bitrate values is shared with the neighbor bitrates of the ten specified bitrates (one of the four bitrate values is shared
ing range); with the neighboring range).
</li> </li>
<li>Medium bitrate range (MBR) is the range that contains the 4 medium bitrates <li>Medium bitrate range (MBR) is the range that contains the four medium
of the 10 specified bitrates (2 of the 4 bitrate values are shared with the neig bitrates of the ten specified bitrates (two of the four bitrate values are share
hboring ranges); d with the neighboring ranges).
</li> </li>
<li>High bitrate range (HBR) is the range that contains the 4 highest bitrates o <li>High bitrate range (HBR) is the range that contains the four highest
f the 10 specified bitrates (1 of the 4 bitrate values is shared with the neighb bitrates of the ten specified bitrates (one of the four bitrate values is
oring range). shared with the neighboring range).
</li> </li>
</ul> </ul>
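<t>The overlap structure above can be illustrated in code. The following is a
non-normative Python sketch added for clarity only; the function name and the
use of list indices are illustrative assumptions, not part of this document:</t>
<sourcecode type="python"><![CDATA[
# Non-normative illustration: split ten bitrate values (sorted from
# lowest to highest) into the three overlapping ranges.  The shared
# endpoints reproduce the sharing described above: LBR and MBR share
# one value, MBR and HBR share one value, so MBR shares two in total.
def split_ranges(bitrates):
    assert len(bitrates) == 10
    lbr = bitrates[0:4]    # four lowest; bitrates[3] is shared with MBR
    mbr = bitrates[3:7]    # four medium; both endpoints are shared
    hbr = bitrates[6:10]   # four highest; bitrates[6] is shared with MBR
    return lbr, mbr, hbr
]]></sourcecode>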
<t>Initially, for the codec selected as a reference one (e.g., HEVC or VP9), a
set of ten QP (quantization parameter) values should be specified as in <xref
target="I-D.ietf-netvc-testing" />, and corresponding quality values should be
calculated. In <xref target="QP"/>, QP and quality values are denoted as
"QP0"-"QP9" and "Q0"-"Q9", respectively. To guarantee the overlaps of quality
levels between the bitrate ranges of the reference and tested codecs, a
quality alignment procedure should be performed for each range's outermost
(left- and rightmost) quality levels Qk of the reference codec (i.e., for Q0,
Q3, Q6, and Q9) and the quality levels Q'k (i.e., Q'0, Q'3, Q'6, and Q'9) of
the tested codec. Thus, these quality levels Q'k, and hence the corresponding
QP value QP'k (i.e., QP'0, QP'3, QP'6, and QP'9), of the tested codec should be
selected using the following formulas:
</t>
<artwork name="" type="" align="left" alt=""><![CDATA[
     Q'k = min { abs(Q'i - Qk) },
           i in R
    QP'k = argmin { abs(Q'i(QP'i) - Qk(QPk)) },
           i in R
]]></artwork>
<t>where R is the range of the QP indexes of the tested codec, i.e., the
candidate Internet video codec. The inner quality levels (i.e., Q'1, Q'2, Q'4,
Q'5, Q'7, and Q'8), as well as their corresponding QP values of each range
(i.e., QP'1, QP'2, QP'4, QP'5, QP'7, and QP'8), should be as equidistantly
spaced as possible between the left- and rightmost quality levels without
explicitly mapping their values using the procedure described above.
</t>
<figure anchor="QP">
<name>Quality/QP Alignment for Compression Performance Evaluation
</name>
<artwork>
QP'9 QP'8 QP'7 QP'6 QP'5 QP'4 QP'3 QP'2 QP'1 QP'0  &lt;+-----
 ^    ^    ^    ^    ^    ^    ^    ^    ^    ^     | Tested
 |    |    |    |    |    |    |    |    |    |     | codec
Q'0  Q'1  Q'2  Q'3  Q'4  Q'5  Q'6  Q'7  Q'8  Q'9   &lt;+-----
 ^              ^              ^              ^
 |              |              |              |
Q0   Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9    &lt;+-----
 ^    ^    ^    ^    ^    ^    ^    ^    ^    ^     | Reference
 |    |    |    |    |    |    |    |    |    |     | codec
QP9  QP8  QP7  QP6  QP5  QP4  QP3  QP2  QP1  QP0   &lt;+-----
 +--------------+--------------+--------------+--------&gt;
 ^              ^              ^              ^    Bitrate
 |------LBR-----|              |-----HBR------|
                ^              ^
                |-----MBR------|
</artwork>
</figure>
<t>Since the QP mapping results may vary for different sequences, this quality
alignment procedure eventually needs to be performed separately for each quality
assessment index and each sequence used for codec performance evaluation to
fulfill the requirements described above.
</t>
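<t>In other words, the alignment step amounts to a nearest-quality search over
the tested codec's QP range. The following non-normative Python sketch is
provided for clarity only; the function name and the (QP, quality) pair layout
are illustrative assumptions:</t>
<sourcecode type="python"><![CDATA[
# Non-normative illustration: for a reference quality level qk, select
# the tested codec's (QP'i, Q'i) pair whose quality is closest to qk,
# i.e., the argmin over the tested codec's QP index range R.
def align_qp(tested, qk):
    # tested: list of (qp, quality) pairs measured for the tested codec
    # over its QP range R; qk: an outermost reference quality level
    # (Q0, Q3, Q6, or Q9).
    qp_k, q_k = min(tested, key=lambda pair: abs(pair[1] - qk))
    return qp_k, q_k   # (QP'k, Q'k)
]]></sourcecode>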
<t>To assess the quality of output (decoded) sequences, two indexes (PSNR
<xref target="ISO29170-1"/> and MS-SSIM <xref target="ISO29170-1"/> <xref
target="MULTI-SCALE"/>) are separately computed. In the case of the YCbCr
color format, PSNR should be calculated for each color plane, whereas MS-SSIM
is calculated for the luma channel only. In the case of the RGB color format,
both metrics are computed for R, G, and B channels. Thus, for each sequence,
30 RD-points for PSNR (i.e., three RD-curves, one for each channel) and 10
RD-points for MS-SSIM (i.e., one RD-curve, for luma channel only) should be
calculated in the case of YCbCr. If content is encoded as RGB, 60 RD-points
(30 for PSNR and 30 for MS-SSIM) should be calculated; i.e., three RD-curves
(one for each channel) are computed for PSNR, as well as three RD-curves (one
for each channel) for MS-SSIM.
</t>
<t>Finally, to obtain an integral estimation, BD-rate savings <xref
target="PSNR" /> should be computed for each range and each quality index. In
addition, average values over all three ranges should be provided for both
PSNR and MS-SSIM. A list of video sequences that should be used for testing,
as well as the ten QP values for the reference codec, are defined in <xref
target="I-D.ietf-netvc-testing" />. Testing processes should use the
information on the codec applications presented in this document. As the
reference for evaluation, state-of-the-art video codecs such as HEVC/H.265
<xref target="ISO23008-2"/> <xref target="H265"/> or VP9 must be used. The
reference source code of the HEVC/H.265 codec can be found at <xref
target="HEVC"/>. The HEVC/H.265 codec must be configured according to <xref
target="CONDITIONS"/> and <xref target="intra-period" />.
</t>
<table anchor="intra-period">
<name>Intraperiods for Different HEVC/H.265 Encoding Modes According to
<xref target="CONDITIONS"/></name>
<thead>
<tr>
<th>Intra-period, second</th>
<th>HEVC/H.265 encoding mode according to <xref target="CONDITIONS"/></th>
</tr>
</thead>
<tbody>
<tr>
<td>AI</td>
<td>Intra Main or Intra Main10</td>
</tr>
<tr>
<td>RA</td>
<td>Random access Main or<br/>Random access Main10</td>
</tr>
<tr>
<td>FIZD</td>
<td>Low delay Main or<br/>Low delay Main10</td>
</tr>
</tbody>
</table>
<t>According to the coding efficiency requirement described in <xref
target="efficiency"/>, BD-rate savings calculated for each color plane and
averaged for all the video sequences used to test the NETVC codec should be,
at least,
</t>
<ul>
<li>25% if calculated over the whole bitrate range; and
</li>
<li>15% if calculated for each bitrate subrange (LBR, MBR, HBR).
</li>
</ul>
<t>Since values of the two objective metrics (PSNR and MS-SSIM) are available
for some color planes, each value should meet these coding efficiency
requirements. That is, the final BD-rate saving denoted as S is calculated for
a given color plane as follows:
</t>
<artwork name="" type="" align="left" alt=""><![CDATA[
    S = min { S_psnr, S_ms-ssim }
]]></artwork>
<t>where S_psnr and S_ms-ssim are BD-rate savings calculated for the given
color plane using PSNR and MS-SSIM metrics, respectively.
</t>
<t>In addition to the objective quality measures defined above, subjective
evaluation must also be performed for the final NETVC codec adoption. For
subjective tests, the MOS-based evaluation procedure must be used as described
in Section 2.1 of <xref target="ISO29170-1" />. For perception-oriented tools
that primarily impact subjective quality, additional tests may also be
individually assigned even for intermediate evaluation, subject to a decision
of the NETVC WG.
</t>
</section>
<section title="Security Considerations"> <!-- 6, line 648--> <section title="Security Considerations">
<t>This document itself does not address any security considerations. However, i <t>This document itself does not address any security considerations. However,
t is worth noting that a codec implementation (for both an encoder and a decoder it is worth noting that a codec implementation (for both an encoder and a
) should take into consideration the worst-case computational complexity, memory decoder) should take into consideration the worst-case computational
bandwidth, and physical memory size needed to processes the potentially untrust complexity, memory bandwidth, and physical memory size needed to process the
ed input (e.g., the decoded pictures used as references). potentially untrusted input (e.g., the decoded pictures used as references).
</t> </t>
</section> <!-- ends: "6 from line 648--> </section>
<section title="IANA Considerations"> <!-- 7, line 653--> <section title="IANA Considerations">
<t>This document has no IANA actions. <t>This document has no IANA actions.
</t> </t>
</section> <!-- ends: "7 from line 653--> </section>
<section title="References"> <!-- 8, line 658--> </middle>
<section title="Normative References"> <!-- 8.1, line 661-->
<t>[1] Recommendation ITU-R BT.2020-2: Parameter values for ultra- high defini <back>
tion television systems for production and international programme exchange, 201
5. <references>
</t> <name>References</name>
<t>[2] Recommendation ITU-T G.1091: Quality of Experience requirements for tel <references>
epresence services, 2014. <name>Normative References</name>
</t>
<t>[3] ISO/IEC PDTR 29170-1: Information technology -- Advanced image coding a <reference anchor="BT2020-2" target="https://www.itu.int/rec/R-REC-BT.202
nd evaluation methodologies -- Part 1: Guidelines for 0-2-201510-I/en">
</t> <front>
<t>[4] ISO/IEC 23008-2:2015. Information technology -- High efficiency coding <title>Parameter values for ultra-high definition television systems
and media delivery in heterogeneous environments -- Part 2: High efficiency vide for production and international programme exchange</title>
o coding <author>
</t> <organization>ITU-R</organization>
<t>[5] Recommendation ITU-T H.265: High efficiency video coding, 2013. </author>
</t> <date month="October" year="2015" />
<t>[6] High Efficiency Video Coding (HEVC) reference software (HEVC Test Model </front>
also known as HM) at the web-site of Fraunhofer Institute for Telecommunication <seriesInfo name="ITU-R Recommendation" value="BT.2020-2" />
s, Heinrich Hertz Institute (HHI): https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSo </reference>
ftware/
</t> <reference anchor="G1091" target="https://www.itu.int/rec/T-REC-G.1091/en">
</section> <!-- ends: "8.1 from line 661-->
<section title="Informative References"> <!-- 8.2, line 683--> <front>
<t>[7] Definition of the term "high dynamic range imaging" at the web-site of <title>Quality of Experience requirements for telepresence
Federal Agencies Digital Guidelines Initiative: http://www.digitizationguideline services</title>
s.gov/term.php?term=highdynami crangeimaging <author>
</t> <organization>ITU-T</organization>
<t>[8] Definition of the term "compression, visually lossless" at the web-site </author>
of Federal Agencies Digital Guidelines Initiative: http://www.digitizationguide <date month="October" year="2014" />
lines.gov/term.php?term=compressio nvisuallylossless </front>
</t> <seriesInfo name="ITU-T Recommendation" value="G.1091" />
<t>[9] S. Wenger, "The case for scalability support in version 1 of Future Vid </reference>
eo Coding," Document COM 16-C 988 R1-E of ITU-T Video Coding Experts Group (ITU-
T Q.6/SG 16), Geneva, Switzerland, September 2015. <reference anchor="ISO29170-1" target="https://www.iso.org/standard/63637.html">
</t> <front>
<t>[10] "Recommended upload encoding settings (Advanced)" for the YouTube video <title>Information technology -- Advanced image coding and evaluation --
-sharing service: https://support.google.com/youtube/answer/1722171?hl=en Part 1: Guidelines for image coding system evaluation</title>
</t> <author>
<t>[11] H. Yu, K. McCann, R. Cohen, and P. Amon, "Requirements for future exten <organization>ISO</organization>
sions of HEVC in coding screen content", Document N14174 of Moving Picture Exper </author>
ts Group (ISO/IEC JTC 1/SC 29/ WG 11), San Jose, USA, January 2014. <date month="October" year="2017" />
</t> </front>
<t>[12] Manindra Parhy, "Game streaming requirement for Future Video Coding," D <seriesInfo name="ISO/IEC" value="TR 29170-1:2017" />
ocument N36771 of ISO/IEC Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 1 </reference>
1), Warsaw, Poland, June 2015.
</t> <reference anchor="ISO23008-2" target="https://www.iso.org/standard/67660.html">
<t>[13] G. Bjontegaard, "Calculation of average PSNR differences between RD-cur <front>
ves," Document VCEG-M33 of ITU-T Video Coding Experts Group (ITU-T Q.6/SG 16), A <title>Information technology -- High efficiency coding and media
ustin, Texas, USA, April 2001. delivery in heterogeneous environments -- Part 2: High efficiency video
</t> coding</title>
<t>[14] T. Daede, A. Norkin, and I. Brailovskiy, "Video Codec Testing and Quali <author>
ty Measurement", draft-ietf-netvc-testing-08(work in progress), January 2019, p. <organization>ISO</organization>
23. </author>
</t> <date month="May" year="2018" />
<t>[15] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multi-scale structural sim </front>
ilarity for image quality assessment," Invited Paper, IEEE Asilomar Conference o <seriesInfo name="ISO/IEC" value="23008-2:2015" />
n Signals, Systems and Computers, Nov. 2003, Vol. 2, pp. 1398-1402. </reference>
</t>
<t>[16] F. Bossen, "Common test conditions and software reference configuration <reference anchor="H265" target="https://www.itu.int/rec/T-REC-H.265">
s," Document JCTVC-L1100 of Joint Collaborative Team on Video Coding (JCT-VC) of
the ITU-T Video Coding Experts Group (ITU-T Q.6/SG 16) and ISO/IEC Moving Pictu <front>
re Experts Group (ISO/IEC JTC 1/SC 29/WG 11), Geneva, Switzerland, January 2013. <title>High efficiency video coding</title>
</t>
</section> <!-- ends: "8.2 from line 683--> <author>
</section> <!-- ends: "8 from line 658--> <organization>ITU-T</organization>
<section title="Acknowledgments"> <!-- 9, line 716--> </author>
<t>The authors would like to thank Mr. Paul Coverdale, Mr. Vasily
Rufitskiy, and Dr. Jianle Chen for many useful discussions on this <date month="November" year="2019" />
document and their help while preparing it as well as Mr. Mo Zanaty,
Dr. Minhua Zhou, Dr. Ali Begen, Mr. Thomas Daede, Mr. Adam Roach, </front>
Dr. Thomas Davies, Mr. Jonathan Lennox, Dr. Timothy Terriberry, <seriesInfo name="ITU-T Recommendation" value="H.265" />
Mr. Peter Thatcher, Dr. Jean-Marc Valin, Mr. Roman Danyliw, Mr. Jack </reference>
Moffitt, Mr. Greg Coppa, and Mr. Andrew Krupiczka for their valuable <reference anchor="HEVC" target="https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoft
comments on different revisions of this document. ware/">
<front>
<title>High Efficiency Video Coding (HEVC) reference software (HEVC
Test Model also known as HM)</title>
<author>
<organization>Fraunhofer Institute for
Telecommunications</organization>
</author>
</front>
</reference>
</references>
<references>
<name>Informative References</name>
<reference anchor="HDR"
target="http://www.digitizationguidelines.gov/term.php?term=highdynami
crangeimaging">
<front>
<title>Term: High dynamic range imaging</title>
<author>
<organization>Federal Agencies Digital Guidelines Initiative</organizat
ion>
</author>
</front>
</reference>
<reference anchor="COMPRESSION"
target="http://www.digitizationguidelines.gov/term.php?term=compressio
nvisuallylossless">
<front>
<title>Term: Compression, visually lossless</title>
<author>
<organization>Federal Agencies Digital Guidelines Initiative</organizat
ion>
</author>
</front>
</reference>
<reference anchor="SG-16" target="https://www.itu.int/md/T13-SG16-C-0988/en">
<front>
<title>The case for scalability support in version 1 of Future Video Codi
ng</title>
<author surname="Wenger" initials="S">
<organization>ITU-T</organization>
</author>
<date month="September" year="2015" />
</front>
<seriesInfo name="SG 16 (Study Period 2013)" value="Contribution 988" />
</reference>
<reference anchor="YOUTUBE"
target="https://support.google.com/youtube/answer/1722171?hl=en">
<front>
<title>Recommended upload encoding settings</title>
<author>
<organization>YouTube</organization>
</author>
</front>
</reference>
<reference anchor="HEVC-EXT" target="https://mpeg.chiariglione.org/standards/mpe
g-h/high-efficiency-video-coding/requirements-extension-hevc-coding-screen-conte
nt">
<front>
<title>Requirements for an extension of HEVC for coding of screen content
</title>
<author surname="Yu" initials="H" role="editor"/>
<author surname="McCann" initials="K" role="editor"/>
<author surname="Cohen" initials="R" role="editor"/>
<author surname="Amon" initials="P" role="editor"/>
<date month="January" year="2014" />
</front>
<seriesInfo name="ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts Group"
value="MPEG2013/N14174" />
<seriesInfo name="San Jose," value="USA" />
</reference>
<reference anchor="GAME" target="">
<front>
<title>Game streaming requirement for Future Video Coding</title>
<author surname="Parhy" initials="M"/>
<date month="June" year="2015" />
</front>
<seriesInfo name="ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts Group"
value="N36771" />
<seriesInfo name="Warsaw," value="Poland" />
</reference>
<reference anchor="PSNR" target="https://www.itu.int/wftp3/av-arch/video-site/01
04_Aus/">
<front>
<title>Calculation of average PSNR differences between RD-curves</title>
<author surname="Bjontegaard" initials="G">
<organization>ITU-T</organization>
</author>
<date month="April" year="2001" />
</front>
<seriesInfo name="SG 16" value="VCEG-M33" />
</reference>
<!-- draft-ietf-netvc-testing-09 exists -->
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.draf
t-ietf-netvc-testing-09.xml"/>
<reference anchor="MULTI-SCALE" target="https://ieeexplore.ieee.org/document/129
2216">
<front>
<title>Multiscale structural similarity for image quality assessment</tit
le>
<author surname="Wang" initials="Z"/>
<author surname="Simoncelli" initials="E.P."/>
<author surname="Bovik" initials="A.C."/>
<date month="November" year="2003" />
</front>
<seriesInfo name="IEEE" value="Thirty-Seventh Asilomar Conference on
Signals, Systems and Computers" />
<seriesInfo name="DOI" value="10.1109/ACSSC.2003.1292216" />
</reference>
<reference anchor="CONDITIONS"
target="http://phenix.it-sudparis.eu/jct/doc_end_user/current_document
.php?id=7281">
<front>
<title>Common HM test conditions and software reference configurations</t
itle>
<author surname="Bossen" initials="F">
</author>
<date month="April" year="2013" />
</front>
<seriesInfo name="Joint Collaborative Team on Video Coding (JCT-VC) of
the ITU-T Video Coding Experts Group (ITU-T Q.6/SG 16)
and ISO/IEC Moving Picture Experts Group (ISO/IEC JTC
1/SC 29/WG 11)" value="" />
<seriesInfo name="Document" value="JCTVC-L1100" />
</reference>
<reference anchor="BT601" target="https://www.itu.int/rec/R-REC-BT.601/">
<front>
<title>Studio encoding parameters of digital television for standard 4:3
and wide screen 16:9 aspect ratios</title>
<author>
<organization>ITU-R</organization>
</author>
<date month="March" year="2011" />
</front>
<seriesInfo name="ITU-R Recommendation" value="BT.601" />
</reference>
<reference anchor="ISO14496-10" target="https://www.iso.org/standard/75400.html"
>
<front>
<title>Information technology -- Coding of audio-visual objects -- Part
10: Advanced video coding</title>
<author>
<organization>ISO/IEC</organization>
</author>
</front>
<seriesInfo name="ISO/IEC DIS" value="14496-10" />
</reference>
<reference anchor="ISO14496-15" target="https://www.iso.org/standard/74429.html"
>
<front>
<title>Information technology — Coding of audio-visual objects — Part
15: Carriage of network abstraction layer (NAL) unit structured video
in the ISO base media file format</title>
<author>
<organization>ISO/IEC</organization>
</author>
</front>
<seriesInfo name="ISO/IEC" value="14496-15" />
</reference>
<reference anchor="BT709" target="https://www.itu.int/rec/R-REC-BT.709">
<front>
<title>Parameter values for the HDTV standards for production and
international programme exchange</title>
<author>
<organization>ITU-R</organization>
</author>
<date month="June" year="2015" />
</front>
<seriesInfo name="ITU-R Recommendation" value="BT.709" />
</reference>
</references>
</references>
<section anchor="sect-8" numbered="false" toc="default">
<name>Acknowledgments</name>
<t>The authors would like to thank <contact fullname="Mr. Paul Coverdale"/>,
<contact fullname="Mr. Vasily Rufitskiy"/>, and <contact fullname="Dr. Jianle
Chen"/> for many useful discussions on this document and their help while
preparing it, as well as <contact fullname="Mr. Mo Zanaty"/>, <contact
fullname="Dr. Minhua Zhou"/>, <contact fullname="Dr. Ali Begen"/>, <contact
fullname="Mr. Thomas Daede"/>, <contact fullname="Mr. Adam Roach"/>, <contact
fullname="Dr. Thomas Davies"/>, <contact fullname="Mr. Jonathan Lennox"/>,
<contact fullname="Dr. Timothy Terriberry"/>, <contact fullname="Mr. Peter
Thatcher"/>, <contact fullname="Dr. Jean-Marc Valin"/>, <contact
fullname="Mr. Roman Danyliw"/>, <contact fullname="Mr. Jack Moffitt"/>,
<contact fullname="Mr. Greg Coppa"/>, and <contact fullname="Mr. Andrew
Krupiczka"/> for their valuable comments on different revisions of this
document.
</t> </t>
</section>
</section> <!-- ends: "9 from line 716-->
</middle>
<back>
</back> </back>
</rfc> </rfc>