CLUE                                                           R. Hansen
Internet-Draft                                             Cisco Systems
Intended status: Standards Track                            A. Pepperell
Expires: December 2, 2012                                    Silverflare
                                                              A. Romanow
                                                              B. Baldino
                                                           Cisco Systems
                                                            M. Duckworth
                                                                 Polycom
                                                            May 31, 2012


           The need for consumer spatial information in CLUE
                  draft-hansen-clue-consumer-layout-00

Abstract

   This draft is for discussion in the CLUE working group.  It proposes
   adding the ability for the consumer to provide specific information
   to the provider.

   This document proposes allowing consumers to include spatial
   parameters in their consumer requests to providers in order to
   improve the provider's ability to assign media to streams in a way
   that is helpful for rendering.  The solution proposed here is in
   partial response to CLUE Task #10, "Does Framework provide sufficient
   info for receiver?"

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 2, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the


Hansen, et al.          Expires December 2, 2012                [Page 1]

Internet-Draft    Consumer spatial information in CLUE          May 2012


   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . . . 3
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 3
   3.  Motivation - Conferencing in CLUE . . . . . . . . . . . . . . . 3
   4.  Issues associated with subscribing to multiple switched
       captures  . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
     4.1.  Provider advertising spatially-related switched
           captures  . . . . . . . . . . . . . . . . . . . . . . . . . 5
   5.  Consumer includes optional spatial information  . . . . . . . . 6
     5.1.  Applicability of consumer spatial information to audio  . . 8
   6.  Implications and conclusions  . . . . . . . . . . . . . . . . . 8
   7.  Security Considerations . . . . . . . . . . . . . . . . . . . . 8
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . . . 8
     8.1.  Normative References  . . . . . . . . . . . . . . . . . . . 8
     8.2.  Informative References  . . . . . . . . . . . . . . . . . . 9
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . . 9


1.  Introduction

   This draft notes some limitations of CLUE when it comes to correctly
   rendering video under certain conditions, and proposes the optional
   addition of spatial information by the consumer to resolve these
   issues.  This does not imply that the authors believe that the
   proposed solution is the only option available; rather, this draft is
   meant as a starting point for discussion.


2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.


3.  Motivation - Conferencing in CLUE

   The current methodology of the CLUE framework
   [I-D.ietf-clue-framework] is well suited to systems with a relatively
   static set of capture devices.  However, scenarios in which a much
   more dynamic set of capture devices is presented to consumers, such
   as voice-switched conferencing where multiple endpoints connect to a
   middle box such as an MCU, present additional challenges.  An example
   of such a scenario is shown below, with four endpoints A, B, C and D
   in a conference:

                   +-----+
        +---+     /       \     +---+
        | A |----/         \----| B |
        +---+   /           \   +---+
               +     MCU     +
        +---+   \           /   +---+
        | C |----\         /----| D |
        +---+     \       /     +---+
                   +-----+

   In this scenario endpoint A is not directly connected to any of the
   other endpoints and so will not have the capture information
   associated with their media streams directly available.

   One approach is for the MCU to advertise B, C and D's captures as
   separate capture scenes to A - A can then subscribe to any capture
   from any of the other endpoints.

   However, as the size of the conference increases the number of
   captures that must be advertised will quickly become impractical.
   Further, in many conferencing scenarios, endpoints do not wish to
   specify the endpoints they want to see - instead they wish to see the
   video and audio from the 'most relevant' endpoints as determined by
   the MCU (where relevance is usually determined by audio activity
   level).  Finally, advertising all available captures in this fashion
   can be problematic when some captures are mutually exclusive, as one
   consumer may ask for one capture and a second consumer for a capture
   that cannot be provided at the same time.

   As such, the MCU has the ability in CLUE to advertise switched
   captures; these don't directly represent specific real video or audio
   captures.  Instead, subscribing to one of these captures means that
   the provider will switch the stream it sends to the consumer based on
   its internal logic.  In the example above, the MCU might advertise a
   single, switched video capture to A; if A subscribed to this then the
   MCU would forward the video stream from B, C or D based on which it
   felt was most relevant (often calculated based on the loudness of an
   associated audio stream).
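
   The switching decision described above can be sketched as follows.
   This is only an illustration, not part of the CLUE framework: the
   EndpointAudio type, the level_db field and the
   pick_switched_source function are hypothetical names, and a real MCU
   would typically smooth audio levels over time rather than compare
   single samples.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EndpointAudio:
    name: str        # endpoint label, e.g. "B"
    level_db: float  # hypothetical recent audio level (higher is louder)


def pick_switched_source(consumer: str,
                         endpoints: List[EndpointAudio]) -> Optional[str]:
    """Return the loudest endpoint other than the consumer, or None."""
    candidates = [e for e in endpoints if e.name != consumer]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.level_db).name


levels = [
    EndpointAudio("A", -20.0),
    EndpointAudio("B", -35.0),
    EndpointAudio("C", -12.0),
    EndpointAudio("D", -50.0),
]
print(pick_switched_source("A", levels))  # prints "C"
```

   Here A subscribes to the single switched capture and receives C's
   stream, since C is the loudest of the other endpoints.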


4.  Issues associated with subscribing to multiple switched captures

   The consumer A from the previous example can subscribe to one or more
   of these switched captures and will receive that many streams from
   the MCU, switched from their originating sources.  However, A does
   not receive the spatial capture information from the originating
   source associated with these streams alongside the RTP packets.  As a
   result, things become more complicated when A subscribes to multiple
   video captures and the other endpoints provide multiple video streams
   with correlated spatial information.  For example, suppose A is a
   three-screen system and hence requests three streams.  If all the
   streams it receives are independent, it can render them as it wishes,
   as shown below where it receives one stream from each of B, C and D:

      +------+ +------+ +------+
      |      | |      | |      |
      |  B   | |  C   | |  D   |
      |      | |      | |      |
      +------+ +------+ +------+

   However, if A receives more than one stream from a particular
   endpoint and these streams have related spatial relationships then it
   is possible for A to lay them out erroneously.  This is illustrated
   below, where A is receiving three streams of video that originated at
   B, which should correctly be ordered (L)eft, (C)enter, (R)ight:


      +------+ +------+ +------+
      |      | |      | |      |
      | B(L) | | B(C) | | B(R) |  Correct
      |      | |      | |      |
      +------+ +------+ +------+

      +------+ +------+ +------+
      |      | |      | |      |
      | B(C) | | B(R) | | B(L) |  Incorrect
      |      | |      | |      |
      +------+ +------+ +------+

   When laid out incorrectly this leads to objects (such as a person
   being viewed) being split into sections displayed in disparate, non-
   contiguous locations.

   This problem could be solved if A had the spatial capture information
   from B.  In a small conference it may be possible for the middle box
   to pre-send all the capture information from all other endpoints to A
   (and to every other endpoint), but as the number of captures per
   endpoint and the number of endpoints in a conference rise, caching
   all the data becomes impractical.

   An alternative would be for A to request the originating capture
   information for streams it is receiving, or for the MCU to send it
   whenever it switches streams.  However, because the RTP packets and
   the CLUE capture information will be sent in separate channels, this
   will lead to cases where A is receiving RTP packets but has not yet
   received the corresponding capture data, and the same problem occurs.
   The endpoint must then choose between displaying nothing and risking
   incorrect layout choices.

4.1.  Provider advertising spatially-related switched captures

   One tool that already exists within the CLUE framework and can
   partially solve this problem is for the MCU to include spatial
   information for the switched captures it advertises.  For example, it
   would advertise three captures, each with area of capture information
   that portrays them as the left, center and right captures of a single
   hypothetical room.  When the MCU has unrelated one-screen streams to
   send to A, it can associate them with whichever switched capture it
   chooses.  But when sending a two- or three-screen set of streams, it
   can ensure that they are laid out adjacent to each other and in the
   correct order.  A could then request these three captures and render
   the streams on its left, center and right screens, needing to take no
   further action to ensure that the streams are correctly laid out.
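
   One way an MCU might fill such spatially ordered switched captures
   is sketched below.  The function name assign_to_slots, the capture
   labels and the greedy policy (skip any source that no longer fits)
   are assumptions made for illustration; the framework does not
   prescribe a selection algorithm.

```python
from typing import Dict, List, Tuple


def assign_to_slots(slots: List[str],
                    sources: List[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map switched captures to streams, keeping each source contiguous.

    slots:   switched capture labels, ordered left to right.
    sources: (endpoint, streams ordered left to right), most relevant
             endpoint first.
    """
    assignment: Dict[str, str] = {}
    next_free = 0
    for _endpoint, streams in sources:
        if next_free + len(streams) > len(slots):
            continue  # does not fit in the remaining slots; try the next
        for offset, stream in enumerate(streams):
            assignment[slots[next_free + offset]] = stream
        next_free += len(streams)
    return assignment


slots = ["VC-left", "VC-center", "VC-right"]

# A three-screen source fills all three captures in the correct order.
three_screen = [("B", ["B(L)", "B(C)", "B(R)"])]
print(assign_to_slots(slots, three_screen))

# A one-screen source followed by a two-screen source.
mixed = [("C", ["C"]), ("D", ["D(L)", "D(R)"])]
print(assign_to_slots(slots, mixed))
```

   Because each source occupies adjacent slots in left-to-right order,
   the consumer never sees the streams of a multi-screen endpoint
   reordered or separated.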


   However, this solution is not sufficient for all use-cases.  The
   issue is that the MCU will need to advertise a suitable separate
   group of switched captures for each endpoint configuration that could
   connect to it.  If the possible endpoint configurations are limited,
   this may still represent a plausible number; for instance, an MCU
   that wanted to support endpoints with one, two, three or four screens
   laid out contiguously left-to-right could advertise a capture set
   with the following entries:

   {
     [VC0]
     [VC1, VC2]
     [VC3, VC4, VC5]
     [VC6, VC7, VC8, VC9]
   }

   where VC0 was a single switched capture, VC1 and VC2 were two
   switched captures each representing half the scene, and so on.
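
   The entries above follow a simple pattern, which the sketch below
   reproduces.  The function name capture_scene_entries is hypothetical
   and the code covers only the contiguous left-to-right case; it is
   included to show how the number of advertised captures grows with
   the number of supported screen counts.

```python
from typing import List


def capture_scene_entries(max_screens: int) -> List[List[str]]:
    """Build one entry of switched captures per supported screen count."""
    entries: List[List[str]] = []
    next_id = 0
    for screens in range(1, max_screens + 1):
        entries.append([f"VC{next_id + i}" for i in range(screens)])
        next_id += screens
    return entries


entries = capture_scene_entries(4)
for entry in entries:
    print(entry)

# Total captures advertised is 1 + 2 + ... + n = n * (n + 1) / 2.
print(sum(len(entry) for entry in entries))  # prints 10
```

   Supporting one to four screens already requires ten switched
   captures, and every further layout family adds entries of its own.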

   But this means that the MCU is only able to support certain pre-
   defined layouts - supporting additional configurations of screens
   (such as a 2x2 array) requires a new entry for each, and designing a
   new endpoint configuration means updating all the MCUs it
   interoperates with.  This problem becomes particularly acute if the
   endpoint has many screens, or wants to perform local composition
   (subscribing to multiple streams per screen and rendering them
   locally for display) - this both substantially increases the number
   of streams that the endpoint would wish to subscribe to, and
   increases the complexity of layouts possible.  For instance, an
   endpoint with two screens that wanted to show a 2x2 grid of
   participants on each would need to subscribe to eight captures with
   appropriate spatial information.


5.  Consumer includes optional spatial information

   We can address these issues and allow an endpoint more complex stream
   rendering configurations, while substantially reducing the number and
   complexity of switched captures the MCU must advertise.  The approach
   is for the consumer to optionally include, as part of its request,
   some information on the spatial relationships within its rendering.
   This allows the MCU to advertise a single collection of switched
   captures with no spatial information for the consumer to subscribe
   to, rather than attempting to anticipate every layout an endpoint
   might desire, and having to advertise an entry for each with suitable
   spatial information.

   There are a number of forms this consumer information could take.
   However, the form most consistent with the existing CLUE data model,
   and offering most flexibility for the future, is for the consumer to
   be able to describe the spatial relationship of its screens in the
   same fashion and using the same system as in the provider's capture
   attributes.  'Area of Display' would be an optional attribute of a
   consumer request, and would have the same properties as the
   provider's 'Area of Capture' (i.e. four co-planar {X,Y,Z}
   coordinates).  If the consumer includes information on the area of
   display the provider may then choose to use that information to
   inform its choice when switching video.  Alternatively, in the cases
   where there were no spatial constraints on the video the provider was
   switching, or where fixed streams were being sent, the area of
   display information could be ignored.
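
   A possible in-memory representation of such an attribute is sketched
   below.  The class name AreaOfDisplay, the corner field names and the
   is_coplanar helper are illustrative assumptions; this draft does not
   define an encoding, and the coordinate units are left unspecified
   here.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]  # an {X,Y,Z} coordinate


@dataclass(frozen=True)
class AreaOfDisplay:
    # Hypothetical corner naming; any consistent ordering would do.
    bottom_left: Point
    bottom_right: Point
    top_left: Point
    top_right: Point

    def is_coplanar(self, tol: float = 1e-9) -> bool:
        """Check that the four corners lie in one plane.

        Uses the scalar triple product of the three edge vectors that
        start at bottom_left; it is zero for co-planar points.
        """
        def sub(a: Point, b: Point) -> Point:
            return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

        u = sub(self.bottom_right, self.bottom_left)
        v = sub(self.top_left, self.bottom_left)
        w = sub(self.top_right, self.bottom_left)
        cross = (u[1] * v[2] - u[2] * v[1],
                 u[2] * v[0] - u[0] * v[2],
                 u[0] * v[1] - u[1] * v[0])
        triple = cross[0] * w[0] + cross[1] * w[1] + cross[2] * w[2]
        return abs(triple) <= tol


flat = AreaOfDisplay((0, 0, 0), (1000, 0, 0), (0, 600, 0), (1000, 600, 0))
bent = AreaOfDisplay((0, 0, 0), (1000, 0, 0), (0, 600, 0), (1000, 600, 50))
print(flat.is_coplanar())  # prints True
print(bent.is_coplanar())  # prints False
```

   A provider receiving an area of display that fails such a check
   could simply treat the request as if the optional attribute were
   absent.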

   A straightforward example of this would be where consumer A is a
   three-screen system wishing to join a large conference including
   one-, two- and three-screen systems.  The MCU offers a capture scene
   including three switched captures, to which A wishes to subscribe.  A
   then sends a choice for each of those captures, and for each choice
   includes an area of display attribute giving the position of each of
   its screens.  The MCU can then use that information to ensure that,
   when switching in the video streams from multi-screen systems, it
   does so in a way that they will be rendered correctly on A.
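
   The three-screen example can be sketched as follows.  The sketch
   assumes that X increases from left to right as seen by the viewer
   and that each area of display is given as four corner points; the
   function names and capture labels are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def order_left_to_right(choices: Dict[str, List[Point]]) -> List[str]:
    """Sort chosen captures by the mean X of their area of display."""
    def mean_x(points: List[Point]) -> float:
        return sum(p[0] for p in points) / len(points)

    return sorted(choices, key=lambda capture: mean_x(choices[capture]))


def map_streams(choices: Dict[str, List[Point]],
                streams_left_to_right: List[str]) -> Dict[str, str]:
    """Pair a source's left-to-right streams with the consumer's screens."""
    ordered = order_left_to_right(choices)
    return dict(zip(ordered, streams_left_to_right))


# A's three choices; note VC0 is A's center screen, not its left one.
choices = {
    "VC0": [(1000, 0, 0), (2000, 0, 0), (1000, 600, 0), (2000, 600, 0)],
    "VC1": [(0, 0, 0), (1000, 0, 0), (0, 600, 0), (1000, 600, 0)],
    "VC2": [(2000, 0, 0), (3000, 0, 0), (2000, 600, 0), (3000, 600, 0)],
}
mapping = map_streams(choices, ["B(L)", "B(C)", "B(R)"])
print(mapping["VC1"], mapping["VC0"], mapping["VC2"])  # B(L) B(C) B(R)
```

   The MCU needs no prior knowledge of A's screen arrangement: the
   ordering comes entirely from the areas of display in A's request.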

   A more complicated example is where A is still a three-screen system
   wishing to join a large conference including one-, two- and three-
   screen systems, but now wishes to receive more than one video stream
   per screen, composing them locally.  The layout A wishes to achieve
   is (three large screens, each with one main video displayed full-
   screen and three picture-in-picture views):

      +-------------+  +-------------+  +-------------+
      |             |  |             |  |             |
      |             |  |             |  |             |
      |             |  |             |  |             |
      |             |  |             |  |             |
      | +-+ +-+ +-+ |  | +-+ +-+ +-+ |  | +-+ +-+ +-+ |
      | +-+ +-+ +-+ |  | +-+ +-+ +-+ |  | +-+ +-+ +-+ |
      +-------------+  +-------------+  +-------------+

   The MCU advertises that it can send at least 12 switched video
   streams to A simultaneously.  A makes 12 choices, including a
   suitable area of display for each one.  This information allows the
   MCU not just to ensure that multi-screen systems are laid out
   correctly, but potentially also to optimize other choices, such as
   not splitting multi-screen systems rendered in the smaller PiP panes
   across bezels, showing presentation and full-motion video received
   from the same participant on the same screen, and so on.
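
   The sketch below builds twelve areas of display for a layout of
   this shape and shows how a provider could work out which PiP panes
   share a screen purely from the geometry.  All dimensions, labels and
   function names are invented for illustration, and the areas are
   assumed to lie in the plane Z = 0.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Area = List[Point]  # bottom-left, bottom-right, top-left, top-right


def rect(x0: float, y0: float, w: float, h: float) -> Area:
    """Four co-planar corners of an axis-aligned rectangle at Z = 0."""
    return [(x0, y0, 0), (x0 + w, y0, 0), (x0, y0 + h, 0), (x0 + w, y0 + h, 0)]


def build_choices(screens: int = 3, pips: int = 3) -> Dict[str, Area]:
    """One full-screen area plus `pips` small panes per screen."""
    screen_w, screen_h, bezel = 1600, 900, 100
    pip_w, pip_h = 400, 225
    gap = (screen_w - pips * pip_w) // (pips + 1)
    choices: Dict[str, Area] = {}
    for s in range(screens):
        left = s * (screen_w + bezel)
        choices[f"main{s}"] = rect(left, 0, screen_w, screen_h)
        for p in range(pips):
            x = left + gap + p * (pip_w + gap)
            choices[f"pip{s}.{p}"] = rect(x, 50, pip_w, pip_h)
    return choices


def contains(outer: Area, inner: Area) -> bool:
    """True if every corner of `inner` lies within `outer` (X/Y only)."""
    (ox0, oy0, _), _, _, (ox1, oy1, _) = outer
    return all(ox0 <= x <= ox1 and oy0 <= y <= oy1 for x, y, _ in inner)


def group_by_screen(choices: Dict[str, Area]) -> Dict[str, List[str]]:
    """Group each pane under the area that encloses it, if any."""
    groups: Dict[str, List[str]] = {}
    for cid, area in choices.items():
        parents = [o for o, oa in choices.items()
                   if o != cid and contains(oa, area)]
        if parents:
            groups.setdefault(parents[0], []).append(cid)
        else:
            groups.setdefault(cid, [])
    return groups


choices = build_choices()
print(len(choices))  # prints 12
for screen, panes in group_by_screen(choices).items():
    print(screen, panes)
```

   With this grouping a provider could, for example, place both
   streams of a two-screen system in adjacent PiP panes of the same
   screen rather than across a bezel.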


5.1.  Applicability of consumer spatial information to audio

   The text above is primarily concerned with resolving issues for
   video, but it may still be relevant for audio; the consumer may wish
   to provide spatial information about the locations at which they will
   be playing out their audio.  However, the authors believe this is
   for the most part less relevant: audio does not have the same rigid
   requirements for playout that were described above for video, and the
   problem can largely be solved with the provider-specified spatial
   coordinates already defined in the specification.


6.  Implications and conclusions

   CLUE has been designed as a provider-oriented protocol, with the
   provider giving a list of the resources it can supply and the
   consumer selecting from these.  This proposal fits into that pattern;
   spatial information included in a consumer request forms part of that
   request, insofar as it does not limit the provider but instead gives
   additional information for the provider to use as it sees fit.
   Consumers that have no need for the spatial information need not
   include it, and providers can choose to ignore the spatial
   information if it is not relevant to their selection process.

   Allowing the optional reuse of spatial information that is currently
   sent only by the provider in the consumer request increases the range
   of problems for which CLUE can provide a solution, while placing no
   additional burden on systems which do not have these concerns, as
   they can safely ignore the information.


7.  Security Considerations

   The proposal herein has no security implications; the new information
   from the consumer is optional and sent at their discretion, and
   reveals nothing that can compromise their system.


8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.


8.2.  Informative References

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-05 (work in progress), May 2012.


Authors' Addresses

   Robert Hansen
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: rohanse2@cisco.com


   Andy Pepperell
   Silverflare

   Email: andy.pepperell@silverflare.com


   Allyn Romanow
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: allyn@cisco.com


   Brian Baldino
   Cisco Systems
   San Jose, CA  95134
   USA

   Email: bbaldino@cisco.com


   Mark Duckworth
   Polycom
   Andover, MA  01810
   USA

   Email: mark.duckworth@polycom.com

