Network Working Group                                            N. Zong
Internet-Draft                                                 L. Dunbar
Intended status: Informational                       Huawei Technologies
Expires: March 10, 2014                                         M. Shore
                                                    No Mountain Software
                                                      September 06, 2013


   Problem Statement for Reliable Virtualized Network Function (VNF)
                draft-zong-vnfpool-problem-statement-01

Abstract

   Virtualization technology has been widely adopted by both Network
   Operators and Data Center providers to deliver services with reduced
   operational and capital costs, automated deployment, and enhanced
   elasticity.  A challenge is how to achieve reliability and high
   availability for virtualized network functions in order to
   facilitate reliable services.

   This document focuses on problems related to the reliability and
   high availability of virtualized network functions.  A discussion of
   reliable virtualized network function pools is presented for the
   purpose of scoping solutions.  Related work, together with its
   potential for reuse and extension, is also introduced.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 10, 2014.


Zong, et al.             Expires March 10, 2014                 [Page 1]

Internet-Draft                Reliable VNF                September 2013


Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Background
   2.  Terminology
   3.  Problems
     3.1.  VNF Instance Selection and Status Monitoring
     3.2.  Backup Selection and Announcement
     3.3.  Service State Synchronization
     3.4.  Transition Handling
     3.5.  Policy Enforcement
   4.  Working Scope
     4.1.  Reference Architecture
     4.2.  Proposed Working Scope
   5.  Related Works
     5.1.  Reliable Server Pool
     5.2.  Virtual Router Redundancy Protocol
     5.3.  VNF Forwarding Graph
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   10. References
   Authors' Addresses

1.  Background

   Network functions such as firewall (FW), Deep Packet Inspection
   (DPI), Load Balancer (LB), and WAN optimization are conventionally
   deployed as sets of dedicated devices in both Network Operators'
   networks and Data Center (DC) networks, as the building blocks of
   services.  In recent years, virtualization technology, from server
   virtualization and network virtualization to network function
   virtualization, has been gaining wider adoption by both Network
   Operators and DC providers, in order to achieve elastic service
   offerings and reduced operational cost [NFV-WP].  The European
   Telecommunications Standards Institute (ETSI) has launched an
   Industry Specification Group (ISG) to study the use cases,
   requirements and architecture of Network Function Virtualization
   (NFV) from the Network Operators' perspective.

   Built upon general purpose servers, instances of a Virtualized
   Network Function (VNF) can be placed in various locations, including
   DC networks, Network Operator networks and even customer premises.
   Furthermore, there are potentially more factors than for dedicated
   devices that cause VNF instance transition (e.g. scaling, migration)
   or even failure, such as resource contention among instances,
   hardware status change, and hardware/software failure at various
   levels.  Therefore, a major challenge is how to achieve reliability
   and high availability for the VNF when its instances are highly
   distributed and operate under dynamic conditions.  For example, the
   Reliability and Availability Working Group (RELAV WG) of the ETSI
   ISG NFV aims to identify the resiliency problems and requirements of
   the services provided by Network Operators [NFV-REL].  An overview
   of VNF use cases focusing on reliability and high availability
   issues can be found in [VNFP-UC].

   In this document, we first give an overview of the problems related
   to the reliability and high availability of VNFs.  We then present
   an applicable architecture of reliable VNF pools to scope potential
   solutions.  Finally, we refer to related work for potential reuse
   and extension of existing approaches.

2.  Terminology

   Reliability and High Availability: the capability of a functional
   entity to consistently provide its function under various dynamic
   and even unexpected conditions such as faults and overload.

   Virtualized Network Function (VNF): a VNF provides the same
   functional behavior and interfaces as the equivalent network
   function, but is deployed as software instances built on top of a
   virtualization platform [NFV-TERM].

   VNF Pool: a group of VNF instances providing the same network
   function.

   Pool Element (PE): a VNF instance inside a VNF pool.

   Pool User (PU): an entity that requests a network function provided
   by a VNF pool.

   Pool Manager (PM): an entity that manages pool elements and
   interacts with pool users to provide a network function.

3.  Problems

   Many network services require multiple network functions to be
   performed sequentially on data packets.  A traditional model for
   multi-tier services is shown below.  For each network function, all
   instances connect to a corresponding entrance point (e.g. an LB)
   that is responsible for sending/receiving data packets to/from the
   selected instance(s) and for steering the data packets between
   different network functions.

                    Service (e.g. VOIP, Web)
     +--------------+  +--------------+       +--------------+
     | function#1   |  |  function#2  |       |  function#n  |
     | +----------+ |  | +----------+ |       | +----------+ |
     | | Instance | |  | | Instance | |... ...| | Instance | |
     | +----------+ |  | +----------+ |       | +----------+ |
     |      |data   |  |      |data   |       |      |data   |
     |      |conn   |  |      |conn   |       |      |conn   |
     | +----------+ |  | +----------+ |       | +----------+ |
     | | Entrance | |  | | Entrance | |       | | Entrance | |
     | |   Point  | |  | |   Point  | |       | |   Point  | |
     | +----------+ |  | +----------+ |       | +----------+ |
     +-----+--------+  +-------+------+       +-------+------+
           |data conn          |data conn             |
           +-------------------+----------------------+

                    Figure 1: Multi-tier Service.


   Such a model works well when all instances of the same network
   function are topologically close to each other.  However, VNF
   instances are highly distributed across DC networks, Network
   Operator networks and even customer premises.  When VNF instances
   are topologically far from each other, there could be many network
   links/nodes between an instance and the corresponding entrance point
   that steers the data packets.  It is even possible for two VNF
   instances of different network functions to be on the same physical
   server while their entrance points are many links/nodes away.  To
   improve network efficiency, it is desirable to establish direct data
   connections between VNF instances, as shown below.

                       Service (e.g. VOIP, Web)
     +----------+           +----------+           +----------+
     |   VNF#1  | data conn |   VNF#2  | data conn |   VNF#n  |
     | Instance |-----------| Instance |- ... ... -| Instance |
     +----------+           +----------+           +----------+
                                 ^
                                 | Virtualization
     +--------------------------------------------------------+
     |                Virtualization Platform                 |
     +--------------------------------------------------------+

              Figure 2: VNF Instances Direct Connection.


   Many of today's dedicated network devices have built-in failure
   protection and recovery mechanisms for reliability and high
   availability.  However, VNF instances are software instances running
   on general purpose servers via a virtualization platform.  There
   are potentially more factors that cause VNF instance transition or
   even failure, such as:

      1) hardware failure;

      2) hardware status change such as server over-utilization, network
      congestion;

      3) software failure at various levels including hypervisor,
      Virtual Machine (VM), VNF instance;

      4) performance degradation due to resource contention between VNF
      instances;

      5) instance migration due to server consolidation, or
      configuration change due to policy.

   Generally, VNF instance transition refers to the actions taken to
   address varying conditions of hardware, software or configuration.
   A transition could be scaling an instance in/out or up/down.  It
   could also be replacing an instance in the same location, or moving
   an instance to another location.

   Therefore, the major challenge is how to achieve reliability and
   high availability for the VNF during VNF instance transition or
   failure when VNF instances are directly connected.  The potential
   issues to be addressed are described in the following subsections.


3.1.  VNF Instance Selection and Status Monitoring

   One basic goal of reliable VNF is to select a suitable VNF instance
   from a group of candidates and to replace that instance if it
   fails.  The issues are:

      1) Who is responsible for selecting the VNF instance, and how is
      it selected?

      2) Who is responsible for monitoring the status of the instance,
      and how is it monitored?
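
   To make the two issues above concrete, the following is a minimal
   sketch, not a proposed solution.  It assumes a hypothetical monitor
   that records the time of each instance's last heartbeat and its
   reported load, treats an instance as failed once no heartbeat has
   been seen for a timeout, and selects the least-loaded live
   instance.  The names Instance, alive and select_instance, the
   3-second timeout and the least-load rule are all illustrative
   assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical record a monitor might keep for one VNF instance."""
    name: str
    last_heartbeat: float  # time (seconds) the last heartbeat was seen
    load: float            # reported utilization in [0.0, 1.0]


def alive(inst: Instance, now: float, timeout: float = 3.0) -> bool:
    """An instance counts as alive if it reported within `timeout`."""
    return now - inst.last_heartbeat <= timeout


def select_instance(candidates: List[Instance],
                    now: float) -> Optional[Instance]:
    """Pick the least-loaded live instance, or None if none is alive."""
    live = [c for c in candidates if alive(c, now)]
    return min(live, key=lambda c: c.load, default=None)


pool = [
    Instance("fw-1", last_heartbeat=10.0, load=0.7),
    Instance("fw-2", last_heartbeat=9.5, load=0.2),
    Instance("fw-3", last_heartbeat=2.0, load=0.1),  # stale heartbeat
]
chosen = select_instance(pool, now=11.0)
print(chosen.name if chosen else "no live instance")  # prints "fw-2"
```

   Which entity would run such logic (a PM, the instances themselves,
   or an external system) is exactly the open question raised above.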

3.2.  Backup Selection and Announcement

   Before a VNF instance fails, one or more backup instances of the
   same network function have to be selected and announced to the
   directly connected instances in the adjacent VNFs.  The issues are:

      1) Who is responsible for selecting and announcing the backup
      instances, and how is this done?

      2) How should transition or failure of a backup instance be
      handled?

3.3.  Service State Synchronization

   For a stateful network function, the service state of the VNF
   instance should also be synchronized between the VNF instance and
   its backup instances.  The issues are:

      1) Who is responsible for collecting/keeping the service state
      of the instance, and how is this done?

      2) How is the service state synchronized with the backup
      instances?
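
   As an illustration of what synchronization could involve, the
   sketch below assumes one simple approach: the active instance
   pushes a full, versioned snapshot of its state to each backup after
   every change, and a backup accepts a snapshot only if it is newer
   than the one it holds.  The classes Primary and Replica and the
   example NAT binding are hypothetical; this is not the mechanism of
   any existing protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """Hypothetical backup instance holding a copy of the state."""
    name: str
    version: int = 0
    state: Dict[str, str] = field(default_factory=dict)

    def apply(self, version: int, snapshot: Dict[str, str]) -> bool:
        """Accept a snapshot only if it is newer than what is held."""
        if version <= self.version:
            return False
        self.version = version
        self.state = dict(snapshot)  # copy, so later changes do not leak
        return True


@dataclass
class Primary:
    """Hypothetical active instance that pushes state to its backups."""
    backups: List[Replica]
    version: int = 0
    state: Dict[str, str] = field(default_factory=dict)

    def update(self, key: str, value: str) -> None:
        """Change the state, then push a full snapshot to all backups."""
        self.state[key] = value
        self.version += 1
        for backup in self.backups:
            backup.apply(self.version, self.state)


backup = Replica("nat-2")
primary = Primary(backups=[backup])
primary.update("10.0.0.5:4321", "198.51.100.7:40001")
print(backup.version, backup.state)
```

   Full snapshots keep the sketch short; incremental updates or
   on-demand transfer are equally plausible and raise the same
   questions of responsibility.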

3.4.  Transition Handling

   It is important to maintain service reliability during VNF instance
   transition.  The issues are:

      1) Who is responsible for notifying the directly connected
      instances in the adjacent VNFs of a VNF instance transition, and
      how is this done?

      2) How can the network connection and session between a new VNF
      instance and the directly connected instances be re-established
      with an acceptable level of service continuity?

3.5.  Policy Enforcement


   There could be policies that reflect the different reliability
   classes of a service and hence affect the selection of VNF
   instances.  Examples include isolation policies requiring that VNF
   instances be placed on separate physical servers or at separate DC
   sites.  Another example is a policy that places some VNF instances
   in topologically close locations.  The issues are:

      1) Who is responsible for receiving and enforcing the policy?

      2) Who is responsible for collecting enough information (e.g.
      network topology, view of instance distribution) to enforce the
      policy, and how is it collected?
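
   The isolation example can be sketched as follows, assuming the
   enforcing entity already holds a placement inventory.  The
   PLACEMENT table, the pick_backup function and the level names
   "server" and "site" are hypothetical and serve only to show the
   kind of information such a policy needs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical inventory: instance name -> (physical server, DC site).
PLACEMENT: Dict[str, Tuple[str, str]] = {
    "dpi-1": ("server-a", "site-1"),
    "dpi-2": ("server-a", "site-1"),
    "dpi-3": ("server-b", "site-1"),
    "dpi-4": ("server-c", "site-2"),
}


def pick_backup(active: str, candidates: List[str],
                level: str) -> Optional[str]:
    """Return the first candidate isolated from `active` at `level`.

    `level` is "server" (different physical server) or "site"
    (different DC site); both names are illustrative.
    """
    index = {"server": 0, "site": 1}[level]
    for name in candidates:
        if PLACEMENT[name][index] != PLACEMENT[active][index]:
            return name
    return None


others = ["dpi-2", "dpi-3", "dpi-4"]
print(pick_backup("dpi-1", others, "server"))  # prints "dpi-3"
print(pick_backup("dpi-1", others, "site"))    # prints "dpi-4"
```

   The check itself is trivial; the harder part, as the second issue
   notes, is obtaining an accurate placement view in the first place.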

4.  Working Scope

   Reliability and high availability aspects of VNF fall within a
   broader problem space than has been identified in this draft.  The
   current focus of our work is to develop tools to improve VNF
   reliability and availability, which will be applicable to a wide
   range of services.

4.1.  Reference Architecture

   There are a number of existing technologies for providing reliable
   and highly available functions, such as Reliable Server Pool
   (RSerPool) [RFC5351] and Virtual Router Redundancy Protocol (VRRP)
   [RFC5798].  Although these technologies apply to different scenarios
   and use different protocols, the underlying idea is similar.  Both
   technologies present the service with an abstract object (e.g. pool
   handle in RSerPool, Virtual Router ID in VRRP) that represents a
   group of functional instances.  The dynamic mapping of the abstract
   object to the actual serving instance, i.e. the selection of the
   serving instance, is managed internally by the group and covers the
   failover procedure.  The advantage of this model is that it provides
   the service with a reliable and highly available function in a
   manner that is transparent to both end-hosts and other service
   sub-systems.

   Based on the above model, we could derive a reference architecture
   for reliable and highly available VNFs, called reliable VNF pools,
   which is illustrated below.  Note that we reuse the term "pool" to
   represent a group of VNF instances without loss of generality.

                             +-----------------+
                             |     Pool User   |
                             +-----------------+
                                 ^        ^
                                 |  (2)   |
                     +-----------+        +-----------+
                     |                                |
                     v                                v
              +--------------+      (4)       +--------------+
              | Pool Manager |<-------------->| Pool Manager |
              +--------------+                +--------------+
                     ^                                ^
                     | (1)                            |
                     v                                v
   +------------------------------+   +------------------------------+
   |+----------+     +----------+ |   | +----------+     +----------+|
   ||   VNF#1  |     |   VNF#1  | |(3)| |   VNF#2  |     |   VNF#2  ||
   || Instance | ... | Instance |<+---+>| Instance | ... | Instance ||
   |+----------+     +----------+ |   | +----------+     +----------+|
   |         VNF#1 Pool           |   |          VNF#2 Pool          |
   +------------------------------+   +------------------------------+

                              Figure 3: Reliable VNF Pools.


   In the architecture of reliable VNF pools, there are multiple VNF
   pools, each containing a group of VNF instances providing the same
   network function.  Each VNF pool has a Pool Manager (PM) that
   manages an abstract object of the VNF pool, including the VNF
   identity (e.g. "FW", "DPI") and the associated addresses of the VNF
   instances.  A Pool User (PU) could be either an application end-host
   or a service sub-system (e.g. an orchestrator in a DC service) that
   requests a network function from the PM.  The PM could also be a
   designated VNF instance in the VNF pool, in the case where the VNF
   instances self-organize to select a designated instance.  In such a
   case, the PM itself provides the network function to the PU as well.
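
   The abstract-object idea can be sketched as below.  The PoolManager
   class and its register, deregister and resolve methods are
   hypothetical names chosen for illustration, and the addresses are
   documentation examples; no existing protocol or API is implied.
   The point is only that the PU keeps asking for the VNF identity
   while the mapping to a serving instance changes underneath.

```python
from typing import Dict, List


class PoolManager:
    """Hypothetical PM mapping a VNF identity to instance addresses."""

    def __init__(self) -> None:
        # VNF identity (e.g. "FW") -> addresses, in order of preference.
        self._pools: Dict[str, List[str]] = {}

    def register(self, vnf_id: str, address: str) -> None:
        """Add an instance address to the pool for `vnf_id`."""
        self._pools.setdefault(vnf_id, []).append(address)

    def deregister(self, vnf_id: str, address: str) -> None:
        """Remove an instance, e.g. after a failure or migration."""
        members = self._pools.get(vnf_id, [])
        if address in members:
            members.remove(address)

    def resolve(self, vnf_id: str) -> str:
        """Return the address of the instance now serving `vnf_id`."""
        members = self._pools.get(vnf_id)
        if not members:
            raise LookupError(f"no instance available for {vnf_id!r}")
        return members[0]


pm = PoolManager()
pm.register("FW", "192.0.2.10")
pm.register("FW", "192.0.2.11")
print(pm.resolve("FW"))             # prints "192.0.2.10"
pm.deregister("FW", "192.0.2.10")   # the serving instance fails
print(pm.resolve("FW"))             # prints "192.0.2.11"
```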

4.2.  Proposed Working Scope

   Based on the reference architecture of reliable VNF pools, we believe
   the following design goals or working scope need to be considered as
   aspects of providing reliable and highly available VNF.

      1) The communication between VNF instances and the responsible PM
      to transmit messages such as VNF Instance Selection, Status
      Monitoring, Service State Synchronization;

      2) The communication between PU and PM to address issues like
      Policy Enforcement, VNF Instance query/response;

      3) The communication between VNF instances in different VNF pools
      to transmit messages such as Backup Announcement, Service State
      Synchronization, Transition Handling;


      4) The communication between different PMs to achieve fault-
      tolerance of PMs themselves including pool information redundancy.

   The main purpose of this section is to scope the solution space.
   Proposed solutions will be addressed in a separate draft.

5.  Related Works

   In this section, we refer to some related work for potential reuse
   and extension.

5.1.  Reliable Server Pool

   Reliable Server Pool (RSerPool) supports high availability and
   scalability of applications through the use of pools of servers
   [RFC5351].  The main functions of RSerPool involve server pool
   management, as well as receiving requests from a client to bind to a
   desired server.  The main protocols developed by RSerPool are the
   Aggregate Server Access Protocol (ASAP) [RFC5352] and the Endpoint
   Handlespace Redundancy Protocol (ENRP) [RFC5353].  The architecture
   of RSerPool is shown below.

                      +--------------+
                      |   Pool User  |
                      +--------------+
                             ^
                             | ASAP
                             V
                      +--------------+   ENRP   +--------------+
                      |  ENRP Server |<-------->|  ENRP Server |
                      +--------------+          +--------------+
                             ^
                             | ASAP
                             V
      +--------------------------------------------------+
      | +----------+   +----------+         +----------+ |
      | |    PE    |   |    PE    | ... ... |    PE    | |
      | +----------+   +----------+         +----------+ |
      | Server Pool                                      |
      +--------------------------------------------------+

                  Figure 4: Reliable Server Pool.


   The similarity and applicability of RSerPool to reliable VNF pools
   include:


      1) Pool Elements (PEs) can be regarded as VNF instances, and an
      ENRP Server can be regarded as a PM;

      2) ASAP could be applicable to reliable VNF pools for VNF instance
      management, policy enforcement, backup announcement, service state
      synchronization, transition handling, and so on;

      3) ENRP could be applicable to reliable VNF pools for fault-
      tolerant PM.

   Potential extensions to meet the full objective of reliable VNF
   pools include:

      1) Extend ASAP to support more policy enforcement such as failure
      isolation;

      2) Extend ASAP to support more efficient instance transition.

5.2.  Virtual Router Redundancy Protocol

   Virtual Router Redundancy Protocol (VRRP) specifies an election
   protocol that dynamically assigns responsibility for a virtual router
   to one of the VRRP routers (i.e. the Master) on a LAN [RFC5798].  The
   election process provides dynamic failover in the forwarding
   responsibility should the Master become unavailable.  The advantage
   of VRRP is a higher-availability default path without requiring
   configuration of dynamic routing or router discovery protocols on
   every end-host.  An example is shown below.

               +---------------+      +---------------+
               | VRRP Router#1 |      | VRRP Router#2 |
               |(Master, IP A) |      |(Backup, IP B) |
               |               |      |               |
       VRID=1  +---------------+      +---------------+
                       |                       |
              ---------+-----------------------+---------
                                   ^
                                   |
                                (IP A)
                                   |
                                +--+--+
                                | Host|
                                +-----+

                      Figure 5: Virtual Router (VRID=1)
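
   The outcome of the election can be sketched as a single function.
   This is a simplification: in VRRP the result emerges from
   advertisements and timers exchanged between routers, not from a
   central computation.  The sketch assumes only the ordering rule
   that the highest priority wins, with ties broken by the higher
   primary IP address; the Router class, the elect_master function
   and the addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import List, Optional


@dataclass
class Router:
    """Hypothetical view of one VRRP router in a virtual router group."""
    name: str
    priority: int    # higher wins; 255 denotes the address owner
    address: str     # primary IP address, used to break priority ties
    up: bool = True


def elect_master(routers: List[Router]) -> Optional[Router]:
    """Return the live router with the highest (priority, address)."""
    live = [r for r in routers if r.up]
    return max(live,
               key=lambda r: (r.priority, ip_address(r.address)),
               default=None)


routers = [
    Router("router-1", priority=255, address="192.0.2.1"),  # owner
    Router("router-2", priority=100, address="192.0.2.2"),
]
print(elect_master(routers).name)   # prints "router-1"
routers[0].up = False               # the Master becomes unavailable
print(elect_master(routers).name)   # prints "router-2"
```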


   The similarity and applicability of VRRP to reliable VNF pools
   include:

      1) VRRP Routers can be regarded as VNF instances;

      2) The Master advertisement and transition between Master and
      Backup procedure can be a part of the function of PM as the
      designated VNF instance in the VNF pool.

   The gaps between VRRP and reliable VNF pools include:

      1) In VRRP, the loss of the master is infrequent, while in
      reliable VNF pools the more frequent transition of VNF instances
      means that failover efficiency is a more pressing concern;

      2) There is no policy enforcement related to reliability in VRRP.

5.3.  VNF Forwarding Graph

   A VNF forwarding graph (a.k.a. service chain in a wider sense)
   defines the sequence of VNF instances that a user session must
   traverse [NFV-UC].  An example of a VNF forwarding graph is a
   topology in which user packets traverse a sequence of VNF instances
   of Intrusion Detection System (IDS), FW, Network Address
   Translation (NAT) and LB.  Different services have different VNF
   forwarding graphs based on specific user needs and therefore service
   logic.

   The VNF forwarding graph and reliable VNF pools are independent of
   but complementary to each other in the following way:

      1) The VNF forwarding graph determines the sequential relation
      between VNF instances, while reliable VNF pools maintain the
      reliability and high availability of each VNF.
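
   The division of labor can be sketched as follows, using the
   IDS, FW, NAT, LB example.  The forwarding graph fixes the order of
   VNFs, while each pool supplies a live instance for its position.
   The instance names, the POOLS table and the build_path function are
   hypothetical, and choosing the first non-failed instance is an
   arbitrary illustrative rule.

```python
from typing import Dict, List, Set

# Forwarding graph from the example above: the order of VNFs only.
FORWARDING_GRAPH: List[str] = ["IDS", "FW", "NAT", "LB"]

# Hypothetical pools: VNF identity -> instances in preference order.
POOLS: Dict[str, List[str]] = {
    "IDS": ["ids-1", "ids-2"],
    "FW": ["fw-1", "fw-2"],
    "NAT": ["nat-1", "nat-2"],
    "LB": ["lb-1", "lb-2"],
}


def build_path(graph: List[str], pools: Dict[str, List[str]],
               failed: Set[str]) -> List[str]:
    """Resolve each VNF in the graph to its first non-failed instance."""
    path = []
    for vnf in graph:
        live = [inst for inst in pools[vnf] if inst not in failed]
        if not live:
            raise LookupError(f"no live instance for {vnf}")
        path.append(live[0])
    return path


print(build_path(FORWARDING_GRAPH, POOLS, failed=set()))
print(build_path(FORWARDING_GRAPH, POOLS, failed={"fw-1"}))
```

   In the second call only the FW position changes (to fw-2); the
   graph itself is untouched, which is the sense in which the two
   concepts are independent.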

6.  Security Considerations

   Any technology that allows the insertion, deletion, reordering, or
   manipulation of network functions has the potential to be subverted
   by an attacker, with serious consequences.  Distributed VNFs
   introduce an additional attack vector, in which bad actors join
   several VNFs of a service.  Replay attacks have the potential to
   cause denial of service or the reordering, addition, or removal of
   VNFs.  VNF reliability technologies must provide cryptographic
   protections against spoofing and insertion attacks as well as replay
   attacks, in the form of client authentication, origin authentication
   on VNF reliability management (control plane) traffic, and replay
   protection.  There may be circumstances under which an attacker
   masquerading as a VNF manager can introduce data leakage or similar
   attacks; consequently, server authentication would be required as
   well.
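
   As a minimal sketch of origin authentication combined with replay
   protection on a control message, the example below assumes a
   pre-shared key, an HMAC-SHA-256 tag computed over a sequence number
   and the payload, and a receiver that rejects any sequence number
   it has already passed.  The key, the message content, and the names
   sign and Receiver are hypothetical; key distribution and the choice
   of an actual security protocol are out of scope here.

```python
import hashlib
import hmac


def sign(key: bytes, seq: int, payload: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over the sequence number and payload."""
    message = seq.to_bytes(8, "big") + payload
    return hmac.new(key, message, hashlib.sha256).digest()


class Receiver:
    """Hypothetical receiver of VNF reliability management messages."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._last_seq = -1  # highest sequence number accepted so far

    def accept(self, seq: int, payload: bytes, tag: bytes) -> bool:
        """Accept only authentic messages with a fresh sequence number."""
        expected = sign(self._key, seq, payload)
        if not hmac.compare_digest(expected, tag):
            return False  # wrong key or modified message
        if seq <= self._last_seq:
            return False  # replayed or stale message
        self._last_seq = seq
        return True


key = b"shared-secret-for-illustration"
receiver = Receiver(key)
payload = b"backup=fw-2"
tag = sign(key, 1, payload)
print(receiver.accept(1, payload, tag))  # prints True
print(receiver.accept(1, payload, tag))  # prints False (replay)
print(receiver.accept(2, payload, tag))  # prints False (tag mismatch)
```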

7.  IANA Considerations

   This document has no actions for IANA.

8.  Acknowledgements

   The authors would like to thank Daniel King from Lancaster
   University, UK for his valuable comments on this draft.

9.  References

9.1.  Normative References

   TBD.

9.2.  Informative References

   [NFV-WP] NFV Whitepaper: "Network Function Virtualization", issue 1,
   2012, http://portal.etsi.org/NFV/NFV_White_Paper.pdf.

   [NFV-REL] ETSI GS NFV REL 001: "Network Function Virtualization;
   Resiliency Requirements", Version 0.0.1, 2013.

   [VNFP-UC] L. Xia, Q. Wu and D. King, "Use cases and Requirements for
   Virtual Service Node Pool Management",
   draft-xia-vsnpool-management-use-case-01, August 2013.

   [NFV-TERM] ETSI GS NFV 003: "Terminology for Main Conceptional
   Entities in NFV", Version 0.0.4, 2013.

   [RFC5351] P. Lei, L. Ong, M. Tuexen and T. Dreibholz, "An Overview of
   Reliable Server Pooling Protocols", RFC 5351, September 2008.

   [RFC5352] R. Stewart, Q. Xie, M. Stillman and M. Tuexen, "Aggregate
   Server Access Protocol (ASAP)", RFC 5352, September 2008.

   [RFC5353] Q. Xie, R. Stewart, M. Stillman, M. Tuexen and A.
   Silverton, "Endpoint Handlespace Redundancy Protocol (ENRP)",
   RFC 5353, September 2008.

   [RFC5798] S. Nadas, "Virtual Router Redundancy Protocol (VRRP)
   Version 3 for IPv4 and IPv6", RFC 5798, March 2010.

   [NFV-UC] ETSI GS NFV 001: "Network Function Virtualization; Use
   Cases", Version 0.0.2, 2013.


10.  References

Authors' Addresses

   Ning Zong
   Huawei Technologies

   Email: zongning@huawei.com


   Linda Dunbar
   Huawei Technologies

   Email: linda.dunbar@huawei.com


   Melinda Shore
   No Mountain Software

   Email: melinda.shore@nomountain.net

