Network Working Group P. Thatcher Internet-Draft J. Uberti Intended status: Standards Track Google Expires: August 8, 2013 February 4, 2013 An argument for encoding multiple media sources per m= section of SDP. draft-pthatcher-mmusic-many-sources-00 Abstract This document explains why it is preferable to have multiple media sources encoded in SDP as one m= section rather than many m= sections, especially when there are a large number of video sources which change frequently and need dynamic resolution changes, such as in a video conferencing system. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 8, 2013. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Thatcher & Uberti Expires August 8, 2013 [Page 1] Internet-Draft Argument for multiple sources per m-line February 2013 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. The downsides of many, frequently changing m= sections. . . . . 3 3. Responses to the benefits of many m= sections. . . . . . . . . 4 4. Other considerations for a video conferencing system. . . . . . 5 4.1. Per-source negotiation is not a major factor . . . . . . . 5 4.2. Source selection . . . . . . . . . . . . . . . . . . . . . 5 4.3. Extensibility to very large or dynamic conferences . . . . 6 5. Use in Production . . . . . . . . . . . . . . . . . . . . . . . 6 6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 6 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 7 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 7 Thatcher & Uberti Expires August 8, 2013 [Page 2] Internet-Draft Argument for multiple sources per m-line February 2013 1. Introduction There's a long debate over whether to encode multiple media sources in SDP as multiple m= sections or whether to encode multiple sources into a single m= section, such as in RFC5576. When there are a large number of sources (10-100, such as in a video conferencing system), multiple m= sections without BUNDLE would require many (10-100) transports, which would incur significant overhead and fragility. This alone would be sufficient argument for not using multiple m= sections when dealing with large numbers of sources. However, what if we can combine BUNDLE and multiple m= sections? Would that be usable for scenarios with many sources that change frequently (such as a video conferencing system)? We believe that such an approach (of BUNDLE + many-m-sections) has a number of significant problems when dealing with many sources, and that an approach using multiple sources m= section (as in RFC5576) should be preferred. 2. The downsides of many, frequently changing m= sections. When there are many sources, and when both sides can add or remove sources at any time (which is common in a video conferencing system), using many m= sections has a number of serious problems. o If the answerer has more sources than the offerer, the answerer cannot put them in the answer, since the answer must have an equal number of m= sections as the offer. o If both ends wish to add or remove sources at the same time, the result is signaling glare, since m= sections are identified by index. It's unclear how this would be resolved efficiently (a known unknown). o If one side adds or removes a source at the same time that the other refers to a source (such as signaling a desired resolution for a specific source) the result is more complex signaling glare, again due to the lookup by index. It's unclear how this would be resolved efficiently (another known unknown). o Adding or removing a source requires sending a full list of all sources, which doesn't scale well to 10s or 100s of sources, especially when they are added or removed frequently. o Because this solution requires BUNDLE, which is still not widely supported, it can be deployed in fewer situations, and will take longer to reach usable support. There are many open questions regarding BUNDLE, including whether partial BUNDLEing is allowed; if it is, then adding a new media stream with BUNDLE will result Thatcher & Uberti Expires August 8, 2013 [Page 3] Internet-Draft Argument for multiple sources per m-line February 2013 in having to generate new transport candidates, in the event the remote side doesn't want to BUNDLE the new stream. o With BUNDLE, use of RFC5576 a=ssrc attributes is required in order for the recipient to be able to properly demultiplex incoming RTP. Payload type isn't sufficient as a demux point, since RTCP packets don't contain RTP payload types. Therefore, this approach requires both sides to understand RFC5576, just like the single-m- section approach. o BUNDLE has many complex rules regarding how the various SDP attributes in the different m= lines need to be constructed, since there is only a single RTP session. For one, all payload types need to be unique, which could complicate the use of per-source attributes with many sources; if having per-source attributes required the payload types to be different, and there were so many sources that the payload type space were exhausted, then what would we do? There are lots of "unknown unknowns" here. Or, as I like to say "thar be dragons". 3. Responses to the benefits of many m= sections. The idea of using many m= sections with BUNDLE is not without merit. The major benefits that have been suggested are: 1. It allows per-source SDP attributes. 2. We have to support multiple m= sections anyway. 3. There's no advantage to multi-source m= sections. 4. There are many unknowns with having multiple sources per m= section. However: 1. For a multi-source m= section, support for most of the important SDP attributes is already defined on a per-source basis in RFC5576, and the few remaining attributes already have proposals. As an example, if we wanted to have two audio sources, one with a high bitrate and one with a low bitrate, we could describe that using the per-source fmtp parameters defined in RFC5576 to indicate, say, high bitrate Opus for one source and low bitrate Opus for the other. In short, this is a solved problem for the multi-source m= section approach, and so the multi-m=section approach doesn't have much of an advantage. 2. While multiple m= sections will still be needed for a finite number of legacy cases, that does not mean multiple m= sections is suitable for all cases, including those with many, frequently changing sources. As has been shown, there are a number of problems. We can continue to use multiple m= sections in certain cases for legacy interop while using multiple sources per m= section going forward so that we can support many sources without the stated problems. Thatcher & Uberti Expires August 8, 2013 [Page 4] Internet-Draft Argument for multiple sources per m-line February 2013 3. As mentioned, many m= sections has a number of significant problems, and the multi-source m= section avoids them. Thus, it does have advantages. 4. The RFC5576-based approach has already been in use for many years in a major video conferencing system (Google+ Hangouts), and has worked very well. We feel that we have searched the "unknown" space and solved all of the major issues. In other words, we feel that the multi-source m= section is mostly a solved problem for the case of many sources that change frequently, such as in a video conferencing system. On the other hand, the multi m= section approach has never been tried by anyone, that we are aware of, with a large number of m= sections and so almost be definition has many unknown unknowns. 4. Other considerations for a video conferencing system. 4.1. Per-source negotiation is not a major factor In a video conferencing system, such as Google+ Hangouts, we have found that it is not only unnecessary to negotiate per-source codecs, RTP header extensions, etc, it's actually something we don't want to do, because it is simply more complexity that isn't needed. We have found that it is best to negotiate the codecs, RTP header extensions, etc, once, independent of sources, and then simply select from the negotiated options the settings for each source. This selection can be done by the receiver, but in many cases (e.g. selection of bitrate), it is far better for the sender to make this choice. The sender is aware of the details of the media it is sending, and the bandwidth estimate to the receiver, so it is in a far better position to decide with what options a given source should be encoded (and it can change these options at any time with no signaling delay). Thus, from our view, the per-source codec, RTP header extension, etc, negotiation of the multi m= section approach is not an advantage, but rather an additional complexity. 4.2. Source selection It is desirable in a video conferencing system to allow the receiver to tell the sender what resolution to send on a per-source basis, and to change that value at any time throughout the call. As discussed, the multiple m= section approach has a problem when once side selects a source resolution while the other side modifies the list of sources. A multi-source m= section approach avoids this, and work as already begun defining source selection SDP, as in https://tools.ietf.org/ Thatcher & Uberti Expires August 8, 2013 [Page 5] Internet-Draft Argument for multiple sources per m-line February 2013 html/draft-lennox-mmusic-sdp-source-selection-02 4.3. Extensibility to very large or dynamic conferences SDP, by its nature, describes the full state of a session, with no support for partial updates. Thus, any mechanism for describing or controlling sources based on SDP will become impractical as the number of sources becomes very large (hundreds or more) or the list of sources changes quickly. In such scenarios, other mechanisms for describing or controlling sources are preferred. Several mechanisms along these lines have been proposed, including the CLUE signaling channel (sent on an independent SCTP data channel), extensions to the XCON conference event package (using SIP SUBSCRIBE/NOTIFY), or the Codec Operation Point (COP) messages (sent over RTCP). In all these cases, however, SDP mechanisms will be used to describe the RTP sessions that carry the sources that are controlled or described externally. It is very desirable that the syntax of the SDP descriptions of these sessions be as similar as possible to that of sessions with sources described in SDP. Multi-source m= sections meet this naturally -- the sessions controlled externally would simply omit the a=ssrc attributes. The multiple m= line proposal, by contrast, does not appear to have any similarly natural way to extend itself to external control. 5. Use in Production We'd like to reiterate that an RFC-5576-based approach, with multiple sources per m= section is already in use by Google+ Hangouts for doing a form video conferencing. It has been used successfully for several years for millions of users and conferences and has scaled well for up to tens of simultaneous sources within a video conference. It is used for logical sources of audio, video, and arbitrary real-time data. We feel that using multiple source per m=section is a proven solution for dealing with large numbers of sources that change frequently, because we've used it and it works. The "unknown" space has been thoroughly explored. On the other hand, we see large risk with going down the multiple m=section approach when dealing with many sources. 6. Conclusion We believe that encoding multiple sources as multiple m= sections does not scale well to many sources, has many problems, and many unknowns. On the contrary, encoding the sources into one m= section, as per RFC5576, scales to many sources, provides a number of Thatcher & Uberti Expires August 8, 2013 [Page 6] Internet-Draft Argument for multiple sources per m-line February 2013 advantages, and has fewer unknowns. 1. It allows per-source SDP attributes for most of the important attributes. 2. It scales to many sources in cases where many m= sections don't. 3. It's been used already and has fewer unknowns. 7. Security Considerations None. 8. IANA Considerations None. 9. Acknowledgements Jonathan Lennox provided the observations on large or dynamic conferences. Authors' Addresses Peter Thatcher Google 747 6th St S Kirkland, WA 98033 USA Email: pthatcher@gmail.com Justin Uberti Google 747 6th St S Kirkland, WA 98033 USA Email: justin@uberti.name Thatcher & Uberti Expires August 8, 2013 [Page 7]