Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Standards Track                                Ericsson
Expires: November 17, 2012                                  May 16, 2012

                        Codec Control for WebRTC
                draft-westerlund-rtcweb-codec-control-00

Abstract

This document proposes how WebRTC should handle media codec control between peers. By media codec control we mean control of parameters such as video resolution and frame-rate. This covers both the initial establishment of capabilities using the SDP-based JSEP signalling and control during ongoing real-time interactive sessions in response to user and application events. The solution uses SDP to establish initial boundaries that are rarely, if ever, changed. During the session, the RTCP-based Codec Operation Point (COP) signalling solution is used for dynamic control of parameters, enabling timely and responsive control.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 17, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Westerlund & Burman Expires November 17, 2012 [Page 1] Internet-Draft Abbreviated-Title May 2012

Table of Contents

1. Introduction
2. Definitions
   2.1. Abbreviations
   2.2. Requirements Language
3. Overview
4. Requirements and Motivations
   4.1. Use Cases and Requirements
   4.2. Motivations
      4.2.1. Performance
      4.2.2. Ease of Use
5. SDP Usage
6. COP Usage
7. IANA Considerations
8. Security Considerations
9. Acknowledgements
10. References
   10.1. Normative References
   10.2. Informative References
Authors' Addresses

1.
Introduction

In WebRTC there exists a need for codec control to improve the efficiency and user experience of the real-time interactive media transported over a PeerConnection. The fundamental idea of codec control is that the media receiver provides preferences for how it would like the media to be encoded, to best suit the receiver's consumption of the media stream. This includes parameters such as video resolution and frame-rate, and for audio the number of channels and the audio bandwidth. It also includes non-media-specific properties, such as how to apportion the available transmission bit-rate between different media streams.

This document proposes a specific solution for codec control that meets the goals and requirements. It is based on establishing the outer boundaries for codec support and capabilities at PeerConnection establishment using JSEP [I-D.ietf-rtcweb-jsep] and SDP [RFC4566]. During an ongoing session the preferred parameters are signalled using the Codec Operation Point RTCP Extension (COP) [I-D.westerlund-avtext-codec-operation-point]. The JavaScript application will primarily make its preferences clear through its usage of the media elements, for example by selecting the size of the rendering area for video. It can also use the constraints concept in the API to indicate preferences that the browser can weigh into its decision to request particular preferred parameters.

This document first provides a more detailed overview of the solution. It then discusses the use cases and requirements that motivate the solution, followed by an analysis of the benefits and downsides of the proposed solution. Finally, it proposes a specification of how WebRTC should use SDP and COP.

2. Definitions

2.1. Abbreviations

The following abbreviations are used in this document.

COP: Codec Operation Point RTCP Extension, the solution for codec control defined in [I-D.westerlund-avtext-codec-operation-point].
JSEP: JavaScript Session Establishment Protocol [I-D.ietf-rtcweb-jsep].

RTP: Real-time Transport Protocol [RFC3550].

SDP: Session Description Protocol [RFC4566].

2.2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Overview

The basic idea in this proposal is to use JSEP to establish the outer limits for behavior and then use the Codec Operation Point (COP) [I-D.westerlund-avtext-codec-operation-point] proposal to handle dynamic changes during the session.

Boundary conditions are typically media type specific and in some cases also codec specific. Relevant for video are highest resolution, frame-rate and maximum complexity. For H.264 these can be expressed in JSEP SDP using the H.264 RTP payload format [RFC6184], through its profile and level concept. The authors expect something similar for the VP8 payload format [I-D.ietf-payload-vp8].

During the session the browser implementation detects when there is a need to use COP to do any of the following:

a. Request new target values for codec operation, for example because the GUI element displaying a video has changed due to a window resize or a change of purpose. This includes parameters such as resolution, frame-rate, and picture aspect ratio.

b. Change parameters because the display screen attached to the device has changed. Affected parameters include resolution, picture aspect ratio and sample aspect ratio.

c. Indicate when the end-point changes encoding parameters in its role as sender.

d.
Change important parameters affecting the transport of media streams, such as maximum media bit-rate, token bucket size (to control the burstiness of the sender), used RTP payload type, maximum RTP packet size, and application data unit aggregation (to control the number of audio frames in the same RTP packet).

e. Affect the relative prioritization of media streams.

The receiving client may send a COP request in RTCP to request that some set of parameters be changed according to the receiving client's preferences. The application's preferences are primarily indicated through its usage of the media elements. But there exist cases and properties where the application will have to provide additional preference information, for example using the constraints. The browser implementation takes all this information into account when expressing preferences using a set of parameters.

The media sender evaluates the request, weighs it against other potential receivers' requests, and may update one or more (if scalability is supported) codec operation points to better suit the receivers. Any new operation point(s) are announced using a COP Notification. Regardless of whether the codec operation point(s) are changed, the COP request is acknowledged using a COP status message.

Using RTCP and "Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)" [RFC5124], the COP message can in most cases be sent immediately or with a very small delay. As the message travels in the media plane, it will directly reach the peer or the next middlebox that is part of the media path.

4. Requirements and Motivations

This section discusses the use cases and the requirements for codec control. This includes both the ones explicitly discussed in the use case document and derived ones.
This is followed by a discussion of why the proposed mechanism is considered the most suitable for WebRTC.

4.1. Use Cases and Requirements

There are use cases and derived requirements in "Web Real-Time Communication Use-cases and Requirements" [I-D.ietf-rtcweb-use-cases-and-requirements]. A selection of interesting use cases, with the parts of their descriptions that are most applicable to codec control, follows:

4.2.1 - Simple Video Communication Service: Two or more users have loaded a video communication web application into their browsers, provided by the same service provider, and logged into the service it provides. The web service publishes information about user login status by pushing updates to the web application in the browsers. When one online user selects a peer online user, a 1-1 video communication session between the browsers of the two peers is initiated. The invited user might accept or reject the session.

During session establishment a self-view is displayed, and once the session has been established the video sent from the remote peer is displayed in addition to the self-view. During the session, each user can select to remove and re-insert the self-view as often as desired. Each user can also change the sizes of his/her two video displays during the session. Each user can also pause sending of media (audio, video, or both) and mute incoming media.

The two users may be using communication devices of different makes, with different operating systems and browsers from different vendors.

4.2.10 - Multiparty video communication: In this use-case the Simple Video Communication Service use-case (Section 4.2.1) is extended by allowing multiparty sessions. No central server is involved - the browser of each participant sends and receives streams to and from all other session participants.
The web application in the browser of each user is responsible for setting up streams to all receivers.

In order to enhance intelligibility, the web application pans the audio from different participants differently when rendering the audio. This is done automatically, but users can change how the different participants are placed in the (virtual) room. In addition the levels in the audio signals are adjusted before mixing.

Another feature intended to enhance the user experience is that the video window that displays the video of the currently speaking peer is highlighted.

Each video stream received is by default displayed in a thumbnail frame within the browser, but users can change the display size.

Note: What this use-case adds in terms of requirements is capabilities to send streams to and receive streams from several peers concurrently, as well as the capabilities to render the video from all received streams and be able to spatialize, level adjust and mix the audio from all received streams locally in the browser. It also adds the capability to measure the audio level/activity.

4.3.3 - Video conferencing system with central server: An organization uses a video communication system that supports the establishment of multiparty video sessions using a central conference server.

The browser of each participant sends an audio stream (type in terms of mono, stereo, 5.1, ... depending on the equipment of the participant) to the central server. The central server mixes the audio streams (and can in the mixing process naturally add effects such as spatialization) and sends towards each participant a mixed audio stream which is played to the user.

The browser of each participant sends video towards the server. For each participant one high resolution video is displayed in a large window, while a number of low resolution videos are displayed in smaller windows.
The server selects which video streams are forwarded as main and thumbnail videos respectively, based on speech activity. As the video streams to display can change quite frequently (as the conversation flows) it is important that the delay from when a video stream is selected for display until the video can be displayed is short.

Note: This use-case adds requirements on support for fast stream switches (F7). There exist several solutions that enable the server to forward one high resolution and several low resolution video streams: a) each browser could send a high resolution, but scalable, stream, and the server could send just the base layer for the low resolution streams, b) each browser could in a simulcast fashion send one high resolution and one low resolution stream, and the server just selects, or c) each browser sends just a high resolution stream, and the server transcodes it into low resolution streams as required.

The derived requirements that apply to codec control are:

F3: Transmitted streams MUST be rate controlled.

F6: The browser MUST be able to handle high loss and jitter levels in a graceful way.

F7: The browser MUST support fast stream switches.

F24: The browser MUST be able to take advantage of capabilities to prioritize voice and video appropriately.

F25: The browser SHOULD use encoding of streams suitable for the current rendering (e.g. video display size) and SHOULD change parameters if the rendering changes during the session.

It might not be obvious how some of the above requirements affect the question of controlling the media encoder in a transmitter, so this section goes through what the document authors consider to be their applicability. It starts by reviewing the topologies that exist.

Peer to Peer: This is the basic topology used in the use case "Simple Video Communication Service". Two end-points communicate directly with each other.
A PeerConnection directly connects the source and the sink of the media stream. Thus in this case it is simple and straightforward to feed preferences from the sink into the source's media encoder to produce the best possible match that the source is capable of, given the preferences.

Peer to Multiple Peers: A given source has multiple PeerConnections going from the source to a number of receivers, i.e. sinks, as described by the use case "Multiparty video communication". In some implementations this will be realized as a Peer to Peer topology where only the source of the raw media is common between the different PeerConnections. On more resource constrained devices that cannot afford an individual media encoding for each PeerConnection over which the media stream is to be delivered, there exists a need to merge the preferences from the different receivers into a single configuration, or a smaller set of configurations, that can be produced. For codecs that have scalability features, it might be possible to produce multiple actual operation points in a single encoding and media stream. For example, multiple frame rates can be produced by H.264 by encoding using a frame structure where some frames can be removed to produce a lower bit-rate and lower frame rate version of the stream. This may allow multiple concurrent operation points to be produced to meet the demands of an even larger number of preferred operation points.

Centralized Conferencing: This topology consists of a central server to which each conference participant connects his or her PeerConnection(s). Over that PeerConnection the participant will receive all the media streams the conference service thinks should be sent and delivered. The actual central node can work in several different modes for media streams.
It can be a very simple relay node (RTP transport translator [RFC5117]), where it forwards all media streams arriving to it to the other participants, forming a common RTP session among all participants with full visibility. Another mode of operation would be an RTP mixer that forwards selected media streams using a set of SSRCs that the RTP mixer has. The third mode is to perform actual media mixing, where audio is mixed and video is composited into a new video image and encoded again.

This results in two different behaviors regarding who needs to merge multiple expressed preferences. For a simple relay central node, the merging of preferences may be placed on the end-point, similar to the resource constrained peer to multiple peers case above. The second alternative is to let the central node merge the preferences into a single set of preferences, which is then signalled to the media source end-point.

Note: In the above it might be possible to establish multiple PeerConnections between an end-point and the central node. The different PeerConnections would then be used to express different preferences for a given media stream. This enables simulcast delivery to the central node so that it can use more than a single operation point to meet the preferences expressed by the multiple receiving participants. That approach can improve the media quality for end-points capable of receiving and using a higher media quality, since they can avoid being constrained by the lowest common denominator of a single operation point.

Peer Relayed: This is not based on an explicit use case in the use case document. It is based on a usage that appears possible to support, and for which there has been interest. The topology is that Peer A sources a media stream and sends it over a PeerConnection to B. B in its turn has a PeerConnection to Peer C.
B chooses to relay the incoming media stream from A to C. To maintain quality, it is important that B does not decode and re-encode the media stream (transcoding). Thus a case arises where B will have to merge the preferences from itself and C into the preferences it signals to A.

Comments on the applicability of the requirements to codec control:

F3: This requirement requires rate-control on the media streams. There will also be multiple media streams being sent to or from a given end-point. Combined, this creates a potential issue when it comes to prioritization between the different media streams and what policy to use to increase and decrease the bit-rate provided for each media stream. The application's preferences, combined with other properties such as current resolution and frame-rate, affect which parameter is optimal to change when the bit-rate needs to be changed. The other aspect is whether one media stream is less relevant, so that reducing that stream's quality, or even terminating its transmission, while keeping the others unchanged is the best choice for the application. In other cases, applying the cheese cutter principle and reducing all streams in equal proportion is the most desirable. Another aspect is the potential for requesting aggregation of multiple audio frames in the same RTP packet to reduce the overhead, and thus lower the bit-rate, at the cost of some increased delay and packet loss sensitivity.

F6: The browser MUST be able to handle high loss and jitter levels in a graceful way. When such conditions are encountered, it will be highly beneficial for the receiver to be able to indicate that the sender should try to combat this by changing the encoding and media packetization.
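The trade-off behind audio frame aggregation, raised under F3 above, can be made concrete with a small calculation. The following is a minimal sketch, not part of the proposal: it assumes IPv4, UDP and RTP headers without header extensions, ignores any SRTP authentication tag and lower-layer framing, and uses 20 ms audio frames purely as an illustration.

```python
# Assumed per-packet header sizes in bytes (illustrative only):
# IPv4 + UDP + RTP, no RTP header extensions, no SRTP authentication tag.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def header_overhead_bps(frame_ms: float, frames_per_packet: int) -> float:
    """Return the header overhead in bits per second for one audio stream."""
    packets_per_second = 1000.0 / (frame_ms * frames_per_packet)
    header_bytes = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    return header_bytes * 8 * packets_per_second


def added_packetization_delay_ms(frame_ms: float, frames_per_packet: int) -> float:
    """Return the extra delay from waiting for the additional frames."""
    return frame_ms * (frames_per_packet - 1)


# Compare 1, 2 and 4 frames of 20 ms per RTP packet.
for n in (1, 2, 4):
    print(n, header_overhead_bps(20, n), added_packetization_delay_ms(20, n))
```

Under these assumptions, each doubling of the number of frames per packet halves the header overhead while adding one or more frame durations of delay, and each lost packet then removes more audio.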
For example, for audio it might be beneficial to aggregate several frames together and apply additional levels of FEC on the fewer packets that are produced, to reduce the residual audio frame loss.

F7: The browser MUST support fast stream switches. Fast stream switches occur in several ways in WebRTC. One is in centralized conferencing, when relay based central nodes turn individual media streams on and off depending on the application's current needs. Another is RTP mixers that switch input sources for a given outgoing SSRC owned by the mixer. This should have minimal impact on a receiver as there is no SSRC change. Along the same lines, the application can cause media stream changes by changing their usage in the application. When the usage of a media stream changes from being the main video to being a thumbnail of one participant in the session, there exists a need to rapidly switch the video resolution to enable high efficiency and avoid increased bit-rate usage.

F24: The browser MUST be able to take advantage of capabilities to prioritize voice and video appropriately. This requirement comes from the QoS discussion present in use case 4.2.6 (Simple Video Communication Service, QoS). This requirement assumes that the application has a preference for one media type over another. Given this assumption, the same prioritization can actually occur for a number of codec parameters when there exist multiple media streams and one can individually control these media streams. This is another aspect of the discussion for requirement F3.

F25: The browser SHOULD use encoding of streams suitable for the current rendering (e.g. video display size) and SHOULD change parameters if the rendering changes during the session. This requirement comes from a very central case, in which a receiving application changes the display layout and where it places a given media stream.
Thus, changing the amount of screen real estate on which the media stream is viewed also changes which resolution would be optimal for the media sender to use. However, this discussion should not only apply to video resolution. Additional application preferences should result in preferences being expressed to the media sender also for other properties, such as video frame-rate. For audio, the number of audio channels and the audio bandwidth are relevant properties.

The authors hope this section has provided a sufficiently clear picture that there exist both multiple topologies with different behaviors, and different points where preferences might need to be merged. The discussion of the requirements also shows that there are multiple parameters that need to be expressed, not only video resolution.

4.2. Motivations

This section discusses different types of motivations for this solution. It includes a comparison to the solution described in "RTCWEB Resolution Negotiation" [I-D.alvestrand-rtcweb-resolution].

4.2.1. Performance

The proposed solution has the following performance characteristics. The initial phase, establishing the boundaries, is done in parallel with the media codec negotiation and establishment of the PeerConnection. Thus using the signalling plane is optimal, as this process does not create additional delay or significant overhead.

During an ongoing communication session, using COP messages in RTCP has the following properties:

Path Delay: The COP messages are contained in the RTCP packets being sent over the PeerConnection, i.e. the best peer to peer path that ICE has managed to get to work. Thus one can expect this path to be equal to or shorter in delay than the signalling path being used between the PeerConnection end-points. If the signalling message is instead sent over the PeerConnection's data channel, it will be using the same path.
In almost all cases, the direct path between two peers will also be shorter than a path going via the webserver.

Media Plane: The COP messages will always go to the next potential RTP/RTCP processing point, i.e. the one on the other side of the PeerConnection. Even for multiparty sessions using centralized servers, the COP message may need to be processed in the middle to perform the merger of the different participants' preferences.

Overhead: An RTCP COP message can be sent as a reduced size RTCP message [RFC5506], thus carrying minimal unnecessary baggage. For example, a COP Request message requesting a new target resolution from a single SSRC will be 29 bytes. Using reduced size RTCP keeps the average RTCP size down, enables rapid recovery of the early allowed flag in early mode, and in more cases enables the immediate mode.

Minimal Blocking: Using RTCP lets the transmission of COP messages be governed by RTCP's transmission rules. As WebRTC will be using the SAVPF profile, it is possible to use the early mode, allowing an RTCP packet carrying a feedback event, like a COP request, to be sent early and with little delay. It might even be possible to determine that the immediate mode of operation can be enabled, thus allowing the RTCP feedback events to be sent immediately in all cases while using that mode. The small overhead and usage of reduced size RTCP will help ensure that the transmission of a COP message is rarely blocked, and that any blocking is quickly released.

The next aspect of RTCP blocking is that we expect that the application will need to rapidly switch back and forth between codec parameters. This requires both a protocol that allows quick setting of parameters and the possibility to revert to previous preferences while the request is outstanding.
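The need to update or revert a preference while an earlier request is still outstanding can be sketched from the requester's side as follows. This is an illustration only: the class, the version counter and the dictionary of parameters are hypothetical names introduced here and are not taken from the COP draft, which defines its own message formats and identifiers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PendingRequest:
    version: int            # hypothetical counter, incremented per request
    params: Dict[str, int]  # e.g. {"horizontal_pixels": 640, "vertical_pixels": 360}


class PreferenceRequester:
    """Tracks the newest preference sent for one media stream (one SSRC)."""

    def __init__(self) -> None:
        self._next_version = 1
        self.pending: Optional[PendingRequest] = None

    def request(self, params: Dict[str, int]) -> PendingRequest:
        # A new request replaces any request still awaiting a status message;
        # the requester never has to wait for the earlier one to complete.
        req = PendingRequest(self._next_version, dict(params))
        self._next_version += 1
        self.pending = req
        return req

    def on_status(self, version: int) -> bool:
        """Handle an acknowledgement; return True if it settles the newest request."""
        if self.pending is not None and version == self.pending.version:
            self.pending = None
            return True
        return False  # status for a superseded request: keep waiting


requester = PreferenceRequester()
first = requester.request({"horizontal_pixels": 1280, "vertical_pixels": 720})
# The user shrinks the video element before the first request is acknowledged.
second = requester.request({"horizontal_pixels": 320, "vertical_pixels": 180})
print(requester.on_status(first.version))   # acknowledgement of the superseded request
print(requester.on_status(second.version))  # acknowledgement of the newest request
```

The design point is that only the newest preference matters to the requester, so a late acknowledgement of a superseded request is simply ignored.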
COP has support for such updated requests, even if the first request is in flight.

If the above is compared to the properties that Harald Alvestrand's proposal [I-D.alvestrand-rtcweb-resolution] has, the following differences are present.

When it comes to signalling path delay, a signalling plane based solution will in almost all cases at best have the same path delay as a media plane solution, which is achieved by using the data channel to carry the signalling. In that case the only difference will be the message size, which will incur only a minor difference in transfer times. But in cases where the application has not implemented use of the data channel, the signalling path will be worse, possibly significantly.

Using the signalling plane for solutions based on centralized conference mixers can easily result in the request message needing to be processed in the webserver before being forwarded to the mixer node that actually processes the media, followed by the central mixer triggering additional signalling messages to other end-points that also need to react. This can be avoided, assuming that the data channel is used for signalling transport. Using the media plane for such signalling will be equal or better in almost all cases.

When it comes to blocking, there exists a significant issue with using JSEP to carry this type of message. An end-point that has sent an SDP offer in an offer/answer exchange is blocked from sending a new update until it has received a final or provisional answer to that offer. Here COP has a great advantage, as the design has taken rapid change of parameters into consideration and allows multiple outstanding requests.

4.2.2. Ease of Use

We see a great benefit in that COP can be allowed to be mainly driven by the browser implementation and its knowledge of how media elements are currently being used by the application.
For example, the browser can determine the video resolution of the display area, and from that determine that resource consumption would be reduced, and image quality improved or at least maintained, by requesting another target resolution that better suits the current size. There are also other metrics or controls that exist in the browser space, like the congestion control, that can directly use the COP signalling to request more suitable parameters given the situation.

Certain application preferences cannot be determined based solely on the usage. Using the constraints mechanism to indicate preferences is a very suitable solution for most such properties. For example, the relative priority of media streams, or a desire for lower frame rate to avoid reductions in resolution or image quality (SNR), is likely to need constraints.

This type of operation results in better performance for simple applications, where the implementor is not as knowledgeable about the need to initiate signalling to trigger a change of video resolution. It thus provides good performance in more cases and requires less code in such applications.

Still, more advanced applications should have influence on the behavior. This can be realized in several ways. One is to use the constraints to inform the browser about the application's preferences for how to treat the different media streams, thus affecting how COP is used. If required, it is possible to provide additional API functionality for the desired controls.

The authors are convinced that providing ease of use for the simple application is important. Providing more advanced controls for the advanced applications is desirable.

5. SDP Usage

SDP SHALL be used to establish boundaries and capabilities for the media codec control in WebRTC.
This includes the following set of capabilities that can be expressed in SDP:

Codec Capabilities: For all media codecs that have optional functionality, it is necessary to determine which capabilities are available. This concerns both media encoding and the RTP payload format, as codec control can affect both. For codecs where the span of complexities is large, there might exist a need to express the level of complexity supported. For video codecs like H.264 this can be expressed by the profile level ID. These capabilities are expected to be defined by the RTP payload format or in SDP attributes defined in the RTP payload formats to be used.

COP Parameters Supported: SDP SHALL be used to negotiate the set of COP parameters that the peers can express preferences for and for which they will send notifications on their sets of parameter values used.

6. COP Usage

A WebRTC end-point SHALL implement the Codec Operation Point RTCP Extension [I-D.westerlund-avtext-codec-operation-point]. The following COP parameters SHALL be supported:

o Payload Type
o Bitrate
o Token Bucket Size
o Framerate
o Horizontal Pixels
o Vertical Pixels
o Maximum RTP Packet Size
o Maximum RTP Packet Rate
o Application Data Unit Aggregation

Please note that the ALT and ID parameters must also be implemented for COP to function correctly.

To make COP usage efficient, the end-point SHALL implement Reduced size RTCP packets [RFC5506].

In addition to requesting specific frame-rates, the RTCP Codec Control Messages "Temporal-Spatial Trade-off Request and Notification" [RFC5104] are also to be provided. This enables a receiver to make a relative indication of its preferred trade-off between spatial and temporal quality. This provides a highly useful indication to the media sender about what the receiver prefers in a relative sense.
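One way a sender might combine a relative trade-off indication with absolute frame-rate bounds is sketched below. The index range of 0 to 31, where 0 favours spatial quality and 31 favours temporal resolution, follows the Temporal-Spatial Trade-off Request in [RFC5104]. The linear mapping and the function name are assumptions chosen for illustration; how a sender interprets the index is implementation specific.

```python
def frame_rate_for_tradeoff(index: int, min_fps: float, max_fps: float) -> float:
    """Pick a frame rate between the given bounds from a trade-off index.

    index:   0 (highest spatial quality) .. 31 (highest temporal resolution)
    min_fps: lower frame-rate bound, e.g. taken from a receiver preference
    max_fps: upper frame-rate bound, e.g. taken from a receiver preference
    """
    if not 0 <= index <= 31:
        raise ValueError("trade-off index must be in the range 0..31")
    if min_fps > max_fps:
        raise ValueError("min_fps must not exceed max_fps")
    # Assumed policy: interpolate linearly between the two bounds.
    return min_fps + (max_fps - min_fps) * index / 31


print(frame_rate_for_tradeoff(0, 10.0, 30.0))   # lowest allowed frame rate
print(frame_rate_for_tradeoff(31, 10.0, 30.0))  # highest allowed frame rate
```

The bit-rate not spent on frame rate would then be available for resolution or image quality, again within whatever bounds the receiver has expressed.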
The COP framerate or resolution parameters can be used to provide target, maximum or minimum values that further indicate within which set of parameters the sender should find the relative trade-off between spatial and temporal quality.

To enable a receiver to temporarily halt or pause delivery of a given media stream, a WebRTC end-point SHALL also implement "RTP Media Stream Pause and Resume" [I-D.westerlund-avtext-rtp-stream-pause]. This is an important COP-related feature, as described by the use cases and motivations, that enables the receiver to indicate that it prefers to have a given media stream halted if the aggregate media bit-rate is reduced. It can also be used to recover aggregate media bit-rate when the application has no current use for a given media stream but may rapidly need it again due to interactions in the application or with other instances.

7. IANA Considerations

This document makes no request of IANA.

Note to RFC Editor: this section may be removed on publication as an RFC.

8. Security Considerations

The usage of COP and its security issues are described in [I-D.westerlund-avtext-codec-operation-point]. The main threats to this usage of COP are the following:

a. The SDP-based codec boundary signalling and COP parameter negotiation could be intercepted and modified. This would enable denial of service attacks on the end-points, reducing the scope of the COP usage and the media codec parameters to provide sub-optimal quality or block certain features. To prevent this, the SDP needs to be authenticated and integrity protected.

b. The COP messages themselves could be modified to affect the negotiated codec parameters. This could have a severe impact on the media quality, as media streams can be completely throttled or configured to a very reduced framerate or resolution. To prevent this, source authentication and integrity protection must be applied to the RTCP compound packets.

c.
In multi-party applications of COP, an entity may need to combine multiple sets of requested parameters. In these multi-party cases, a particular participant may target the other participants and actively try to degrade their experience. Any COP entity merging sets will need to consider whether a particular participant is actively harmful to the others, and can choose to ignore that entity's requests.

9. Acknowledgements

10. References

10.1. Normative References

[I-D.ietf-rtcweb-jsep] Uberti, J. and C. Jennings, "Javascript Session Establishment Protocol", draft-ietf-rtcweb-jsep-00 (work in progress), March 2012.

[I-D.westerlund-avtext-codec-operation-point] Westerlund, M., Burman, B., and L. Hamm, "Codec Operation Point RTCP Extension", draft-westerlund-avtext-codec-operation-point-00 (work in progress), March 2012.

[I-D.westerlund-avtext-rtp-stream-pause] Akram, A., Burman, B., Grondal, D., and M. Westerlund, "RTP Media Stream Pause and Resume", draft-westerlund-avtext-rtp-stream-pause-01 (work in progress), May 2012.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, February 2008.

[RFC5506] Johansson, I. and M. Westerlund, "Support for Reduced-Size Real-Time Transport Control Protocol (RTCP): Opportunities and Consequences", RFC 5506, April 2009.

10.2.
Informative References

[I-D.alvestrand-rtcweb-resolution] Alvestrand, H., "RTCWEB Resolution Negotiation", draft-alvestrand-rtcweb-resolution-00 (work in progress), April 2012.

[I-D.ietf-payload-vp8] Westin, P., Lundin, H., Glover, M., Uberti, J., and F. Galligan, "RTP Payload Format for VP8 Video", draft-ietf-payload-vp8-04 (work in progress), March 2012.

[I-D.ietf-rtcweb-use-cases-and-requirements] Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-Time Communication Use-cases and Requirements", draft-ietf-rtcweb-use-cases-and-requirements-07 (work in progress), April 2012.

[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008.

[RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)", RFC 5124, February 2008.

[RFC6184] Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP Payload Format for H.264 Video", RFC 6184, May 2011.

Authors' Addresses

Magnus Westerlund
Ericsson
Farogatan 6
SE-164 80 Kista
Sweden

Phone: +46 10 714 82 87
Email: magnus.westerlund@ericsson.com

Bo Burman
Ericsson
Farogatan 6
SE-164 80 Kista
Sweden

Phone: +46 10 714 13 11
Email: bo.burman@ericsson.com