CLUE WG A. Romanow Internet-Draft Cisco Systems Intended status: Informational S. Botzko Expires: January 16, 2013 Polycom July 15, 2012 Requirements for Telepresence Multi-Streams draft-ietf-clue-telepresence-requirements-02.txt Abstract This memo discusses the requirements for a specification that enables telepresence interoperability, by describing the relationship between multiple RTP streams. In addition, the problem statement and definitions are also covered herein. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 16, 2013. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Romanow & Botzko Expires January 16, 2013 [Page 1] Internet-Draft CLUE Telepresence Requirements July 2012 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 4. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 5 5. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 7 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 9. Informative References . . . . . . . . . . . . . . . . . . . . 11 Appendix A. Open issues . . . . . . . . . . . . . . . . . . . . . 11 Appendix B. Changes From Earlier Versions . . . . . . . . . . . . 12 B.1. Changes From Draft -00 . . . . . . . . . . . . . . . . . . 12 B.2. Changes from draft -01 . . . . . . . . . . . . . . . . . . 12 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 Romanow & Botzko Expires January 16, 2013 [Page 2] Internet-Draft CLUE Telepresence Requirements July 2012 1. Introduction Telepresence systems greatly improve collaboration. In a telepresence conference (as used herein), the goal is to create an environment that gives the users a feeling of (co-located) presence - the feeling that a local user is in the same room with other local users and the remote parties. Currently, systems from different vendors often do not interoperate because they do the same tasks differently, as discussed in the Problem Statement section below. The approach taken in this memo is to set requirements for a future specification(s) that, when fulfilled by an implementation of the specification(s), provide for interoperability between IETF protocol based telepresence systems. It is anticipated that a solution for the requirements set out in this memo likely involves the exchange of adequate information about participating sites; information that is currently not standardized by the IETF. The purpose of this document is to describe the requirements for a specification that enables interworking between different SIP-based [RFC3261] telepresence systems, by exchanging and negotiating appropriate information. Non IETF protocol based systems, such as those based on ITU-T Rec. H.323, are out of scope. These requirements are for the specification, they are not requirements on the telepresence systems implementing the solution/protocol that will be specified. Telepresence systems of different vendors, today, can follow radically different architectural approaches while offering a similar user experience. It is not the intention of CLUE to dictate telepresence architectural and implementation choices. CLUE enables interoperability between telepresence systems by exchanging information about the systems' characteristics. Systems can use this information to control their behavior to allow for interoperability between those systems. A telepresence session, requires at least one sending and one receiving endpoint. Multiparty telepresence sessions include more than two endpoints, and centralized infrastructure such as Multipoint Control Units (MCUs) or equivalent. CLUE specifies the syntax, semantics, and control flow of information to enable the best possible user experience at those endpoints. Sending endpoints, or MCUs, are not mandated to use any of the CLUE specifications that describe their capabilities, attributes, or behavior. Similarly, it is not envisioned that endpoints or MCUs must ever take into account information received. However, by making available as much information as possible, and by taking into account Romanow & Botzko Expires January 16, 2013 [Page 3] Internet-Draft CLUE Telepresence Requirements July 2012 as much information as has been received or exchanged, MCUs and endpoints are expected to select operation modes that enable the best possible user experience under their constraints. The document structure is as follows: Definitions are set out, followed by a description of the problem of telepresence interoperability that led to this work. Then the requirements to a specification addressing the current shortcomings are enumerated and discussed. 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 3. Definitions The definitions are from draft-wenger-clue-definitions-01.txt. The editor's notes are not included here. Audio Mixing: refers to the accumulation of scaled audio signals to produce a single audio stream. See RTP Topologies, [RFC5117]. Conference: used as defined in [RFC4353], A Framework for Conferencing within the Session Initiation Protocol (SIP). Endpoint: The logical point of final termination through receiving, decoding and rendering, and/or initiation through capturing, encoding, and sending of media streams. An endpoint consists of one or more physical devices which source and sink media streams, and exactly one [RFC4353] Participant (which, in turn, includes exactly one SIP User Agent). In contrast to an endpoint, an MCU may also send and receive media streams, but it is not the initiator nor the final terminator in the sense that Media is Captured or Rendered. Endpoints can be anything from multiscreen/multicamera rooms to handheld devices. Endpoint Characteristics: include placement of Capture and Rendering Devices, capture/render angle, resolution of cameras and screens, spatial location and mixing parameters of microphones. Endpoint characteristics are not specific to individual media streams sent by the endpoint. Romanow & Botzko Expires January 16, 2013 [Page 4] Internet-Draft CLUE Telepresence Requirements July 2012 Layout: How rendered media streams are spatially arranged with respect to each other on a single screen/mono audio telepresence endpoint, and how rendered media streams are arranged with respect to each other on a multiple screen/speaker telepresence endpoint. Note that audio as well as video is encompassed by the term layout--in other words, included is the placement of audio streams on speakers as well as video streams on video screens. Left: to be interpreted as a stage direction, see also [StageDirection(Wikipedia)] Local: Sender and/or receiver physically co-located ("local") in the context of the discussion. MCU: Multipoint Control Unit (MCU) - a device that connects two or more endpoints together into one single multimedia conference [RFC5117]. An MCU may include a Mixer [RFC4353]. Media: Any data that, after suitable encoding, can be conveyed over RTP, including audio, video or timed text. Model: a set of assumptions a telepresence system of a given vendor adheres to and expects the remote telepresence system(s) also to adhere to. Remote: Sender and/or receiver on the other side of the communication channel (depending on context); not Local. A remote can be an Endpoint or an MCU. Render: the process of generating a representation from a media, such as displayed motion video or sound emitted from loudspeakers. Right: to be interpreted as stage direction, see also [StageDirection(Wikipedia)] Telepresence: an environment that gives non co-located users or user groups a feeling of (co-located) presence - the feeling that a Local user is in the same room with other Local users and the Remote parties. The inclusion of Remote parties is achieved through multimedia communication including at least audio and video signals of high fidelity. 4. Problem Statement In order to create a "being there" experience characteristic of telepresence, media inputs need to be transported, received, and coordinated between participating systems. Different telepresence Romanow & Botzko Expires January 16, 2013 [Page 5] Internet-Draft CLUE Telepresence Requirements July 2012 systems take diverse approaches in crafting a solution, or, they implement similar solutions quite differently. They use disparate techniques, and they describe, control and negotiate media in dissimilar fashions. Such diversity creates an interoperability problem. The same issues are solved in different ways by different systems, so that they are not directly interoperable. This makes interworking difficult at best and sometimes impossible. Worse, many telepresence systems use proprietry protocol extensions to solve telepresence-related problems, even if those extensions are based on common standards such as SIP. Some degree of interworking between systems from different vendors is possible through transcoding and translation. This requires additional devices, which are expensive, often not entirely automatic, and they sometimes introduce unwelcome side effects, such as additional delay or degraded performance. Specialized knowledge is currently required to operate a telepresence conference with endpoints from different vendors, for example to configure transcoding and translating devices. Often such conferences do not start as planned, or are interrupted by difficulties that arise. The general problem that needs to be solved can be described as follows. Today, each endpoint sends audio and video captures based upon an implicitly assumed model for rendering a realistic depiction based on this information. If all endpoints are manufactured by the same vendor, they work with the same model and render the information according to the model implicitly assumed by the vendor. However, if the devices are from different vendors, the models they each use for rendering presence can and usually do differ. The result can be that the telepresence systems actually connect, but the user experience suffers, for example because one system assumes that the first video stream is captured from the right camera, whereas the other assumes the first video stream is captured from the left camera. If Alice and Bob are at different sites, Alice needs to tell Bob about the camera and sound equipment arranement at her site so that Bob's receiver can create an accurate rendering of her site. Alice and Bob need to agree on what the salient characteristics are as well as how to represent and communicate them. Characteristics may include number, placement, capture/render angle, resolution of cameras and screens, spatial location and audio mixing parameters of microphones. The telepresence multi-stream work seeks to describe the sender situation in a way that allows the receiver to render it Romanow & Botzko Expires January 16, 2013 [Page 6] Internet-Draft CLUE Telepresence Requirements July 2012 realistically, though it may have a different rendering model than the sender; and for the receiver to provide information to the sender in order to help the sender create adequate content for interworking. 5. Requirements Although some aspects of these requirements can be met by existing technology, such as SDP, or H.264, nonetheless we state them here to have a complete record of what the requirements for CLUE are, whether new work is needed or they can be met by existing technology. Figuring this out will be part of the solution development, rather than part of the requirements. REQMT-1: The solution MUST support a description of the spatial arrangement of source video images sent in video streams which enables a satisfactory reproduction at the receiver of the original scene. This applies to each site in a point to point or a multipoint meeting and refers to the spatial ordering within a site, not to the ordering of images between sites. Use case point to point symmetric, and all other use cases. REQMT-1a: The solution MUST support a means of allowing the preservation of the order of images in the captured scene. For example, if John is to Susan's right in the image capture, John is also to Susan's right in the rendered image. REQMT-1b: The solution MUST support a means of allowing the preservation of order of images in the scene in two dimensions - horizontal and vertical. REQMT-1c: The solution MUST support a means to identify the point of capture of individual video captures in three dimensions. REQMT-1d: The solution MUST support a means to identify the extent of individual video captures in three dimensions. REQMT-2: The solution MUST support a description of the spatial arrangement of captured source audio sent in audio streams which enables a satisfactory reproduction at the receiver in a spatially correct manner. This applies to each site in a point to point or a multipoint meeting and refers to Romanow & Botzko Expires January 16, 2013 [Page 7] Internet-Draft CLUE Telepresence Requirements July 2012 the spatial ordering within a site, not the ordering of channels between sites. Use case point to point symmetric, and all use cases, especially heterogeneous. REQMT-2a: The solution MUST support a means of preserving the spatial order of audio in the captured scene. For example, if John sounds as if he is at Susan's right in the captured audio, John voice is also placed at Susan's right in the rendered image. REQMT-2b: The solution MUST support a means to identify the number and spatial arrangement of audio channels including monaural, stereophonic (2.0), and 3.0 (left, center, right) audio channels. REQMT-2c: The solution MUST NOT preclude the use of binaural audio. [Edt. This is an outstanding issue. Text will be changed when the issue is resolved.] REQMT-2d: The solution MUST support a means to identify the point of capture of individual audio captures in three dimensions. REQMT-2e: The solution MUST support a means to identify the extent of individual audio captures in three dimensions. REQMT-3: The solution MUST support a mechanism to enable a satisfactory spatial matching between audio and video streams coming from the same endpoints. Use case is point to point symmetric, and all use cases. REQMT-3a: The solution MUST enable individual audio streams to be associated with one or more video image captures, and individual video image captures to be associated with one or more audio captures, for the purpose of rendering proper position. Romanow & Botzko Expires January 16, 2013 [Page 8] Internet-Draft CLUE Telepresence Requirements July 2012 REQMT-3b: The solution MUST enable individual audio streams to be rendered in any desired spatial position. Edt: Rendering is an open issue. Text will be changed when it is resolved.] REQMT-4: The solution MUST enable interoperability between endpoints that have a different number of similar devices. For example, one endpoint may have 1 screen, 1 speaker, 1 camera, 1 mic, and another endpoint may have 3 screens, 2 speakers, 3 cameras and 2 mics. Or, in a multi-point conference, one endpoint may have one screen, another may have 2 screens and a third may have 3 screens. This includes endpoints where the number of devices of a given type is zero. Use case is asymmetric point to point and multipoint. REQMT-5: The solution MUST support means of enabling interoperability between telepresence endpoints where cameras are of different picture aspect ratios. REQMT-6: The solution MUST provide scaling information which enables rendering of a video image at the actual size of the captured scene. REQMT-7: The solution MUST support means of enabling interoperability between telepresence endpoints where displays are of different resolutions. REQMT-8: The solution MUST support methods for handling different bit rates in the same conference. REQMT-9: The solution MUST support means of enabling interoperability between endpoints that send and receive different numbers of media streams. Use case heterogeneous and multipoint. REQMT-10: The solution MUST make it possible for endpoints without support for telepresence extensions to participate in a telepresence session with those that do. REQMT-11: The solution MUST support a mechanism for determining whether or not an endpoint or MCU is capable of telepresence extensions. Romanow & Botzko Expires January 16, 2013 [Page 9] Internet-Draft CLUE Telepresence Requirements July 2012 REQMT-12: The solution MUST support a means to enable more than two sites to participate in a teleconference. Use case multipoint. REQMT-13: The solution MUST support both transcoding and switching approaches to providing multipoint conferences. REQMT-14: The solution MUST support mechanisms to make possible for either or both site switching or segment switching. [Edt: This needs rewording. Deferred until layout discussion is resolved.] REQMT-15: The solution MUST support mechanisms for presentations in such a way that: * Presentations can have different sources * Presentations can be seen by all * There can be variation in placement, number and size of presentations REQMT-16: The solution MUST include extensibility mechanisms. REQMT-17: The solution must support a mechanism for allowing information about media captures to change during a conference. 6. Acknowledgements This draft has benefitted from all the comments on the mailing list and a number of discussions. So many people contributed that it is not possible to list them all. 7. IANA Considerations TBD 8. Security Considerations TBD Romanow & Botzko Expires January 16, 2013 [Page 10] Internet-Draft CLUE Telepresence Requirements July 2012 9. Informative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [RFC4353] Rosenberg, J., "A Framework for Conferencing with the Session Initiation Protocol (SIP)", RFC 4353, February 2006. [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008. [StageDirection(Wikipedia)] Wikipedia, "Blocking (stage), available from http:// en.wikipedia.org/wiki/Stage_direction#Stage_directions", May 2011, . Appendix A. Open issues OPEN-1 Binaural Audio [REQMT-2C] The need to support of binaural audio is unresolved, and the "MUST NOT preclude" language in this requirement is problematic. The authors believe this requirement needs to be either changed or withdrawn, depending on how the issue is resolved. OPEN-2 Reference to Rendering [REQMT-3b] This is the only requirement which refers to rendering. It may also be empty, since receivers can rendering audio captures as they wish. This is deferred until broader discussion on rendering requirements is concluded. OPEN-3 Conference modes [REQMT-14] This wording of this requirement is problematic in part because the conference modes (site switching and segment switching) are not defined. It at least needs rewording. This is deferred until broader discussion on layout is concluded. OPEN-4 Need to capture requirement that attributes can change at any time during the call. Romanow & Botzko Expires January 16, 2013 [Page 11] Internet-Draft CLUE Telepresence Requirements July 2012 OPEN-5 Need to add requirement for three dimensions in the right place OPEN-6 Multi-view, is there a requirement needed? Appendix B. Changes From Earlier Versions Note to the RFC-Editor: please remove this section prior to publication as an RFC. B.1. Changes From Draft -00 o Requirement #2, The solution MUST support a means to identify monaural, stereophonic (2.0), and 3.0 (left, center, right) audio channels. changed to The solution MUST support a means to identify the number and spatial arrangement of audio channels including monaural, stereophonic (2.0), and 3.0 (left, center, right) audio channels. o Added back references to the Use case document. * Requirement #1 Use case point to point symmetric, and all other use cases. * Requirement #2 Use case point to point symmetric, and all use cases, especially heterogeneous. * Requirement #3 Use case point to point symmetric, and all use cases. * Requirement #4 Use case is asymmetric point to point, and multipoint. * Requirement #9 Use case heterogeneous and multipoint. * Requirement #12 Use case multipoint. B.2. Changes from draft -01 Cleaned up the Problem Statement section, re-worded. Added Requirement Paragraph 17 in response to WG Issue #4 to make a requirement for dynamically changing information. Approved by Romanow & Botzko Expires January 16, 2013 [Page 12] Internet-Draft CLUE Telepresence Requirements July 2012 WG Added requirements #1.c and #1.d. Approved by WG Added requirements #2.d and #2.e. Approved by WG Authors' Addresses Allyn Romanow Cisco Systems San Jose, CA 95134 USA Email: allyn@cisco.com Stephen Botzko Polycom Andover, MA 01810 US Email: stephen.botzko@polycom.com Romanow & Botzko Expires January 16, 2013 [Page 13]