Network Virtualization Overlays Working L. Xia Group Q. Wu Internet-Draft Huawei Intended status: Standards Track June 28, 2013 Expires: December 30, 2013 Tenant system information discovery approaches Gap analysis draft-wu-nvo3-mac-learning-arp-03 Abstract This document analyzes various protocol solutions for tenant system information (e.g. MAC, IP, etc) discovery in the virtualization environment (e.g.,MAC in MAC, MAC in IP, IP in IP) and identifies the gap against NVO3 control plane and data plane requirements. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on December 30, 2013. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Xia & Wu Expires December 30, 2013 [Page 1] Internet-Draft TS Info Discovery Gap Analysis June 2013 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Conventions used in this document . . . . . . . . . . . . . . 4 3. Overview of tenant system information discovery in the virtualization domain using NVO3 . . . . . . . . . . . . . . . 5 3.1. Issues with tenant system information discovery in the virtualization domain using NVO3 . . . . . . . . . . . . . 6 4. Related work for Tenant system information discovery . . . . . 8 4.1. SPB and TRILL . . . . . . . . . . . . . . . . . . . . . . 8 4.2. ARMD and SARP . . . . . . . . . . . . . . . . . . . . . . 8 4.3. BGP/MPLS IP VPNs - Distributed control plane distribution . . . . . . . . . . . . . . . . . . . . . . . 9 4.4. BGP/MPLS Ethernet VPNs and PBB-EVPN . . . . . . . . . . . 10 4.5. VPLS - ARP + data plane learning . . . . . . . . . . . . . 10 4.6. LISP - Centralized control plane distribution . . . . . . 11 5. Gap Analysis and Discuss . . . . . . . . . . . . . . . . . . . 13 6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 15 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17 9. Normative References . . . . . . . . . . . . . . . . . . . . . 18 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 Xia & Wu Expires December 30, 2013 [Page 2] Internet-Draft TS Info Discovery Gap Analysis June 2013 1. Introduction The tenant system information in this document is referred to as L2 address and L3 address of VM. As described in [I.D-ietf-nvo3- framework], for an L2 NVE, the NVE needs to be able to determine MAC addresses of the tenant system. For an L3 NVE, the NVE needs to be able to determine IP addresses of the tenant system. This can be achieved mainly in 3 ways: data plane learning; ARP; control plane distribution (e.g. by BGP or IS-IS). This document analyzes various protocol solutions for tenant system information (e.g. MAC, IP, etc) discovery in the virtualization environment (e.g.,MAC in MAC, MAC in IP, IP in IP) and identifies the gap against NVO3 control plane and data plane requirements. Xia & Wu Expires December 30, 2013 [Page 3] Internet-Draft TS Info Discovery Gap Analysis June 2013 2. Conventions used in this document The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 [RFC2119]. Xia & Wu Expires December 30, 2013 [Page 4] Internet-Draft TS Info Discovery Gap Analysis June 2013 3. Overview of tenant system information discovery in the virtualization domain using NVO3 Tenant system information discovery can be achieved either using dynamic data plane learning or ARP or control plane distribution. This document addresses how tenant system information discovery works in the overlay network enviroment. Figure 1 shows the NVO3 reference architecture for tenant system information discovery. The reference architecture assumes that: o Tenant system A in DC site X wants to establish communication with tenant system B in the DC site Y. o Tenant system A is connecting to VN by attaching to NVE X. Tenant System A knows IP address of Tenant System B using out of band means but does not know MAC address of Tenant System B. o Tenant system B is connecting to VN by attaching to NVE Y. Tenant System B knows IP address of Tenant System A using out of band means but does not know MAC address of Tenant System A. o NVE X associated with tenant system A doesn't know IP address and MAC address of tenant system B. o NVE Y associated with tenant system B doesn't know IP address and MAC address of tenant system A. Xia & Wu Expires December 30, 2013 [Page 5] Internet-Draft TS Info Discovery Gap Analysis June 2013 ,---------. ,' Backend `. ( NVA ) `. ,' `-+------+' | | .--..--. .--. .. ( ' '.--. .-.' L3 ' ( Overlay ) ( '-' .'--'._.'.-._.'.-._) NVE X = // \\ NVE Y = (MAC_X,IP_X) +------+ +-------+(MAC_Y,IP_Y) .-|NVE X | | NVE Y | ( +------+--. ( +-------+.--. .-.' ' .-.' ' ( DC Site X ) ( DC Site Y ) ( .'-' ( .'-' '--'._.'. ) '--'._.'. ) '--' / '--' / \ / \ __/_ \ /_ _\__ '--------' '--------' '--------' '--------' : Tenant : : Tenant : : Tenant : : Tenant : : SystemA: : SystemC: : SystemD: : SystemB: '--------' '--------' '--------' '--------' TSID= TSID= (VNID,MAC_A,IP_A) (VNID,MAC_B,IP_B) Figure 1: Example of NVO3 reference architecture for tenant system information discovery 3.1. Issues with tenant system information discovery in the virtualization domain using NVO3 Here we give an example of tenant system information discovery in large layer 2 domain using NVO3 using traditional approach for MAC address learning. The packet flow and control plane operation are as follows: 1. Tenant system A sends a broadcast ARP message to discover the MAC address of Tenant system B. The message contains IP_B in the ARP message payload. 2. The ARP proxy [RFC1027] in NVE X, receiving the ARP message and knowing source and destination are in the different subnet will encapsulate it with overlay header and outer header and flood it Xia & Wu Expires December 30, 2013 [Page 6] Internet-Draft TS Info Discovery Gap Analysis June 2013 on the overlay network for TSID = . VNID is included in the overlay header. 3. The ARP message will be processed by NVE Y which maintains mapping table matching TSID = . NVE Y, will forward the ARP message to tenant system B. Tenant System B sends ARP reply to tenant system A containing MAC_B. 4. NVE X processes ARP reply message and populates the mapping table with the received entry, then sends it to Tenant System A that includes MAC_B and IP_B of Tenant System B. 5. Tenant system A learns MAC_B from the ARP rely message and can now send a packet to Tenant system B by including MAC_B, and IP_B, as destination addresses. The issues with tenant system information discovery are as follows: o The demand on the forwarding table capacity at each NVE is increased compared to non-virtualized environments since layer 2 network is no longer constrained to small local network and has a need for millions of hosts. o If Address resolution protocol is used for control plane learning, it may cause excessive flooding since ARP packets need to be flooded over the whole overlay network. the ARP/ND processing load imposes great challenge on L2/L3 boundary routers. o Dynamic data plane learning implies that flooding of unknown destinations be supported and hence implies that broadcast and/or multicast be supported or that ingress replication, which may cause excessive flooding issue and lead to significant scalability limitations. o A control plane protocol (e.g., BGP) that carries both MAC and IP addresses eliminates the need for ARP, however some NVEs or DC Gateways may not support complex control plane protocol, for example, BGP protocol. Xia & Wu Expires December 30, 2013 [Page 7] Internet-Draft TS Info Discovery Gap Analysis June 2013 4. Related work for Tenant system information discovery Currently, 3 main solutions or their combination can be used to perform the tenant system information discovery. They are dynamic data plane learning, ARP, control plane distribution (including two options: centralized or distributed). Additionally, the ARP proxy [RFC1027] mechanism can be used for preventing the ARP flooding in the core network and limiting the MAC table size of NVEs and hosts. Here is a brief analysis of them and the associated protocols are discussed. 4.1. SPB and TRILL Shortest Path Bridging (SPB) [SPB] and TRILL [TRILL] are two different methods of IS-IS based overlay that operates over L2 Ethernets. They all use the MAC in MAC encapsulation and have the same default MAC address learning method: o Using IS-IS extension for outer MAC address distribution over the SPB area or TRILL campus network; o Using ARP or data plane snooping for inner MAC address learning of locally attached hosts. o In addition, the TRILL maybe use [draft-ietf-trill-directory-framework] distributes the inner MAC address between all the RBriges In the centralized approach, TRILL may use TRILL ESADI to distribute the inner MAC address between all the RBridges however SPB doesn't support ESADI distribution mechanism. In the distributed approach, SPB and TRILL may use combination of the above 3 methods. 4.2. ARMD and SARP The ARMD WG examined data center scaling issues with a focus on address resolution and developed a problem statement document [RFC6820]. In this document, the scaling issues of MAC address learning related to the overlay-based approach are listed as followed: ARP processing on Routers: This issue mainly concerns about the significant amount of ARP traffic or BUM packets traffic in large L2 broadcast domains and its impact to the routers. Finally, some optimized method are proposed; IPv6 Neighbor Discovery has the similar issue as ARP processing on router; Xia & Wu Expires December 30, 2013 [Page 8] Internet-Draft TS Info Discovery Gap Analysis June 2013 MAC Address Table Size Limitations at Switches: This issue mainly concerns on the MAC Address Table Size Limitations when the VM number is very large in the Virtualized data center environment. In order to tackle the above problems, SARP [SARP] seamlessly supports Layer 2 network virtualization services over the overlay network and significantly reduces their complexity in terms of table size and performance. The overlay networks are only required to map MAC addresses of the SARP proxies, instead of MAC address of the destination end host, to the correct tunnel. 4.3. BGP/MPLS IP VPNs - Distributed control plane distribution BGP/MPLS IP VPNs [RFC4364] provides IP Virtual Private Networks (VPNs) for its customers and support VPN traffic isolation, address overlapping and separation between customer networks. The BGP/MPLS control plane is used to distribute both the VPN labels and the tenant system IP addresses that are used to identify the customer. However BGP/MPLS IP VPN doesn't support interconnection with Data Center (DC) overlay networks and provide a virtual end to end tenant network service to tenant systems in the BGP/MPLS IPVPN.It also has the scalability related problems when IP addresses of a large number of VMs need to be propagated in control plane in the Virtualized data center environments. For an L3 overlay node, the overlay node only needs to determine IP addresses of the tenant system but doesn't need to know the MAC address of the destination system since overlay tunnels the L3 traffic from the tenant system in an encapsulated format to the final destination and doesn't care about the MAC address of destination end system for the inner L3 packet. Therefore overlay node can answer any address resolution query with its own MAC address or one virtual MAC address. In [I.D-ietf-l3vpn-end-system], NVE uses XMPP to exchange information with the tenant system and answer the address resolution query from tenant system with a virtual router MAC address. In order to propagate tenant system information to the whole overlay network environment, [I.D-ietf-l3vpn-end-system]use Route Server to gather VPN membership on each Forwarder and IP addresses that are currently associated with each virtual interface of tenant system and advertise them to the BGP speaker. In addition, BGP speaker also can interact with Route Server to generate tenant system information update to the upstream end systems. Xia & Wu Expires December 30, 2013 [Page 9] Internet-Draft TS Info Discovery Gap Analysis June 2013 4.4. BGP/MPLS Ethernet VPNs and PBB-EVPN Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn] provide an emulated L2 service in which each tenant has its own Ethernet network over a common IP or MPLS infrastructure. PBB-EVPN [I-D.ietf -l2vpn-pbb-evpn] is a combined solution of PBB and E-VPN. They all use BGP for MAC address distribution over the core MPLS/IP network, and use ARP or data plane snooping for MAC address learning of locally attached hosts. In other words, the mapping table information should be distributed to all the remote overlay nodes that belong to the same VN. After that,the tenant system information is distributed from remote overlay nodes to all the remote tenant system. When all the tenant system information is populated, overlay nodes will process the packet from each tenant system and perform a lookup operation in its map table for the destination TSID= and determine which tunnel the packet needs to be sent to. The analysis of their MAC address learning methods is as followed: Pros: o ARP broadcast Suppression: They all construct ARP caches on the PEs and synchronize them either via BGP or data plane snooping. The PEs act as ARP proxies for locally attached hosts, thereby preventing repeated ARP broadcast over the core MPLS/IP network; o Comparing E-VPN, PBB-EVPN reduces the number of BGP MAC advertisement routes, provide C-MAC address mobility, confine the scope of C-MAC address learning to only active flows, offer per site policies and avoid C-MAC address flushing on topology changes. Con: An E-VPN PE sends a BGP MAC Advertisement Route per customer/ client MAC (C-MAC) address. This will raise the scalability related problems in the case of Virtualized data center environments where the number of virtual machines (VMs) is very large. 4.5. VPLS - ARP + data plane learning VPLS is an L2 VPN technology. VPLS uses the ARP and data plane learning for L2 tenant system information discovery, and not advertised and distributed via a BGP/LDP control plane. The analysis of this method is as followed: Pros: Xia & Wu Expires December 30, 2013 [Page 10] Internet-Draft TS Info Discovery Gap Analysis June 2013 o Reducing complexity and work burden of the control plane by decreasing the control packets; o MAC address learning based on active flows can save the space of MAC mapping table. Cons: o PE will learn all active MACs over the associated PW by BUM flooding of data plane. But, some active MACs is not destined to the PE; o Unlike the active MAC withdraw mechanism in control plane, PE cannot flush MAC address real-time in data plane, when host MACs behind the PE are changed. 4.6. LISP - Centralized control plane distribution LISP[RFC6830] essentially provides an IP over IP overlay where the internal addresses are end station Identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. [draft-maino-nvo3-lisp-cp-02] discusses L2 over L3 LISP Encapsulation and proposes a LISP Mapping System for ARP resolution to eliminate the flooding of ARP traffic and further reduce the need for multicast in the underlay network. This system relies on mapping system for tenant system information distribution and involves MAP-request/MAP-Response message exchange between overlay node and mapping system. With introduced LISP Mapping system, the scalability is improved for tenant system information discovery. the packet flow and control plane operation are as follows: o Tenant System A sends a broadcast ARP message to discover the MAC address of Tenant system B. The message contains IP_B in the ARP message payload. o NVE X as an ARP proxy, receiving the ARP message and knowing source and destination are in the different subnet[RFC1027], but rather than flooding it on the overlay network, sends a Map- Request(i.e.,LISP signaling) to the backend LISP mapping system (i.e.,NVA) that maintains mapping information for entire overlay network for TSID = . o The Map-Request is reflected by the backend LISP Mapping system to NVE Y, that will send a Map-Reply back to NVE X containing the mapping TSID=. Alternatively, depending on the Backend LISP Mapping system configuration, the backend LISP Xia & Wu Expires December 30, 2013 [Page 11] Internet-Draft TS Info Discovery Gap Analysis June 2013 Mapping system may send directly a Map-Reply to NVE X. o NVE X populates the mapping table with the received entry, and sends an ARP-Agent Reply to Tenant System A that includes MAC_B and IP_B. o Tenant system A learns MAC_B from the ARP message and can now send a packet to Tenant system B by including MAC_B, and IP_B, as destination addresses. Xia & Wu Expires December 30, 2013 [Page 12] Internet-Draft TS Info Discovery Gap Analysis June 2013 5. Gap Analysis and Discuss The following table compares several tenant system information discovery methods from different aspects under the same network topology and scale. +-----------+-------------+--------------+-------------+------------+ | TS | Forwarding | Packets |Control plane| Directory | | Discovery | table | flooding |Distribution Support | | method | size | impact | support | | +-----------+-------------+--------------+-------------+------------+ | | | | | | | SPB | | | | Trill:Yes | | &TRILL | Mediaum | Medium | Yes | SPB:No | | | | | | | | | | | | | +-----------+-------------+--------------+-------------+------------+ | | | | | | | | | | | | | ARMD&SARP | Small | Medium | No | No | | | | | | | | | | | | | +-----------+-------------+--------------+-------------+------------+ | | | | | | | LISP | | | |LISP Mapping| | + | Medium | Medium | Yes | System | + ARP proxy | | | | | | | | | | | +-----------+-------------+--------------+-------------+------------+ | | | | | | | BGP/MPLS | | | | | | IP | Large | Large | Yes | No | | VPN | | | | | | | | | | | +-----------+-------------+--------------+-------------+------------+ | | | | | | | BGP/MPLS | | | | | | Ethernet | Large | Large | Yes | No | | VPN | | | | | | | | | | | +-----------+-------------+--------------+-------------+------------+ | | | | | | | VPLS | | | | | | + | Medium | Small | Yes | No | | ARP proxy | | | | | | | | | | | +-----------+-------------+--------------+-------------+------------+ Table 1: The comparison between several tenant system Xia & Wu Expires December 30, 2013 [Page 13] Internet-Draft TS Info Discovery Gap Analysis June 2013 information discovery methods Xia & Wu Expires December 30, 2013 [Page 14] Internet-Draft TS Info Discovery Gap Analysis June 2013 6. Conclusion There are three ways for tenant system information discovery, data plane learning and control plane ARP learning and control plane distribution. In large layer 2 domain, the MAC address can not be simply learnt by looking at the outer layer 2 header, instead, Deeper parsing inner Ethernet header is required. However it also introduces a lot of processing overhead. In order to address this issue, the control plane distribution is proposed, and used to carry both MAC address and IP address and eliminate the above data plane learning issue. However distribution protocol is needed. How distribution protocol is used to propagate tenant system information and mapping table information in large scale and in a more efficient way is still under study. Xia & Wu Expires December 30, 2013 [Page 15] Internet-Draft TS Info Discovery Gap Analysis June 2013 7. IANA Considerations This document has no actions for IANA. Xia & Wu Expires December 30, 2013 [Page 16] Internet-Draft TS Info Discovery Gap Analysis June 2013 8. Security Considerations TBC. Xia & Wu Expires December 30, 2013 [Page 17] Internet-Draft TS Info Discovery Gap Analysis June 2013 9. Normative References [ESADI] Eastlake, D., "TRILL (Transparent Interconnection of Lots of Links): ESADI (End Station Address Distribution Information) Protocol", ID draft-ietf-trill-esadi-02, February 2013. [I-D.ietf-l2vpn-evpn] Sajassi, A. and R. Aggarwal, "BGP MPLS Based Ethernet VPN", ID draft-ietf-l2vpn-evpn-03, February 2013. [I-D.ietf-l2vpn-pbb-evpn] Sajassi, A., "PBB-EVPN", ID draft-ietf-l2vpn-pbb-evpn-04, April 2013. [I-D.ietf-trill-directory-framework] Dunbar, L. and D. Eastlake, "TRILL (Transparent Interconnection of Lots of Links): Edge Directory Assistance Framework", ID draft-ietf-trill-directory-framework-05, April 2013. [I.D-ietf-l3vpn-end-system] Marques, P., "BGP-signaled end-system IP/VPNs", ID draft-maino-nvo3-lisp-cp-02, April 2013. [I.D-ietf-nvo3-framework] Lasserre, M., "Framework for DC Network Virtualization", ID draft-ietf-nvo3-framework-00, September 2012. [RFC1027] Carl-Mitchell, S., "Using ARP to Implement Transparent Subnet Gateways", October 1987. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", March 1997. [RFC4364] Rosen, E., "BGP/MPLS IP Virtual Private Networks (VPNs)", February 2006. [RFC6325] Perlman, R., "RBridges: Base Protocol Specification", July 2011. [RFC6820] Farinacci, D., "The Locator/ID Separation Protocol (LISP)", January 2013. [RFC6830] Farinacci, D., "The Locator/ID Separation Protocol (LISP)", January 2013. [SARP] Dunbar, L. and I. Yerushalmi, "Scaling the Address Xia & Wu Expires December 30, 2013 [Page 18] Internet-Draft TS Info Discovery Gap Analysis June 2013 Resolution Protocol for Large Data Centers (SARP)", ID draft-nachum-sarp-04, February 2013. [SPB] "IEEE standard for local and metropolitan area networks: Media access control (MAC) bridges and virtual bridged local area networks -- Amendment 20: Shortest path bridging", IEEE 802.1aq, June 2012. [draft-maino-nvo3-lisp-cp] Maino, F. and R. Aggarwal, "LISP Control Plane for Network Virtualization Overlays", ID draft-maino-nvo3-lisp-cp-02, April 2013. Xia & Wu Expires December 30, 2013 [Page 19] Internet-Draft TS Info Discovery Gap Analysis June 2013 Authors' Addresses Liang Xia Huawei 101 Software Avenue, Yuhua District Nanjing, Jiangsu 210012 China Email: frank.xialiang@huawei.com Qin Wu Huawei 101 Software Avenue, Yuhua District Nanjing, Jiangsu 210012 China Email: bill.wu@huawei.com Xia & Wu Expires December 30, 2013 [Page 20]