INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                                Huawei
                                                           Radia Perlman
                                                  Individual Contributor
                                                            Hongjun Zhai
                                                                     ZTE
                                                         Mukhtiar Shaikh
                                                        Muhammad Durrani
                                                                 Brocade
Expires: September 7, 2014                                 March 6, 2014

        TRILL Active-Active Edge Using Multiple MAC Attachments
                draft-zhang-trill-aa-multi-attach-01.txt

Abstract

   The TRILL active-active service provides end stations with flow-level
   load balancing and resilience against link failures at the edge of
   TRILL campuses. This draft proposes that member RBridges in an
   active-active edge RBridge group use their own nicknames as ingress
   RBridge nicknames to encapsulate frames from attached end systems.
   Remote edge RBridges are therefore required to learn multiple
   locations of one MAC address in one VLAN. The design goals of this
   proposal are discussed in the document.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Mingui Zhang, et al          Expires September 7, 2014          [Page 1]

INTERNET-DRAFT     MAC Multi-Attach for Active/Active      March 6, 2014

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Acronyms and Terminology
     2.1. Acronyms
     2.2. Terminology
   3. Overview
   4. Backward Compatibility
   5. Design Goals
     5.1. No MAC Flip-Flopping (Normal Unicast Egress)
     5.2. Regular Unicast/Multicast Ingress
     5.3. Right Multicast Egress
       5.3.1. No Duplication (Single Exit Point)
       5.3.2. No Echo (Split Horizon)
     5.4. No Black-hole & No Triangular Forwarding
     5.5. Load Balance Towards the AAE
   6. Security Considerations
   7. IANA Considerations
   Acknowledgements
   8. References
     8.1. Normative References
     8.2. Informative References
   Authors' Addresses

1. Introduction

   In the TRILL Active-Active Edge (AAE) topology, a Multi-Chassis Link
   Aggregation Group (MC-LAG) is used to connect multiple RBridges to a
   switch or a vSwitch. An endnode clump is attached to this switch or
   vSwitch. Data traffic within a specific VLAN from this endnode clump
   must be able to be ingressed and egressed by any of these RBridges
   simultaneously. End systems in the clump can spread their traffic
   among these edge RBridges at the flow level. When a link fails, end
   systems can keep using the remaining links in the MC-LAG without
   waiting for TRILL to converge, which provides resilience against
   link failures.

   Since a packet from each endnode can be ingressed by any RBridge in
   the AAE group, a remote edge RBridge may observe multiple attachment
   points (i.e., egress RBridges) for this endnode identified by its MAC
   address. This issue is known as "MAC flip-flopping". Three potential
   solutions address this issue:

   1) AAE member RBridges use a pseudonode nickname, instead of their
      own, as the ingress nickname for end systems attached to the
      MC-LAG. [CMT] is based on this solution.

   2) AAE member RBridges divide the MAC addresses among themselves so
      that each address has one responsible RBridge. A member RBridge
      encapsulates the packet using its own nickname if it is
      responsible for the source MAC address. Otherwise, if the frame is
      known unicast, it encapsulates the packet using the nickname of
      the responsible RBridge; if the frame is multicast, it needs to
      redirect the packet to the responsible RBridge for encapsulation.

   3) AAE member RBridges keep using their own nicknames. Remote edge
      RBridges are required to learn multiple points of attachment per
      VLAN for a MAC address attached to the AAE, and separately time
      each one out.

   The purpose of this I-D is to develop an approach based on solution 3.
   Although this document focuses on exploring solution 3, the major
   design goals discussed here are common to AAE. Other potential
   solutions may benefit from examining the scenarios studied in this
   draft as well.

   The main body of the document is organized as follows. Section 2
   lists the acronyms and terminology. Section 3 gives the overview
   model. Section 4 gives three options for incremental deployment.
   Section 5 describes how this approach meets the design goals.

2. Acronyms and Terminology

2.1. Acronyms

   TRILL: TRansparent Interconnection of Lots of Links

   AAE: Active/Active Edge

   MC-LAG: Multi-Chassis Link Aggregation Group

   IS-IS: Intermediate System to Intermediate System

2.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Familiarity with [RFC6325], [RFC6327], [6327bis] and [RFC6439] is
   assumed in this document.

3. Overview

                     +-----+
                     | RB4 |
          +----------+-----+----------+
          |                           |
          |                           |
          |      Rest of campus       |
          |                           |
          |                           |
          +-+-----+--+-----+--+-----+-+
            | RB1 |  | RB2 |  | RB3 |
            +-----\  +-----+  /-----+
                   \    |    /
                    \   |   /
                       |||MC-LAG
                       |||
                      +---+
                      | B |
                      +---+
                   H1 H2 H3 H4: vlan 10

      Figure 3.1: An example topology of TRILL Active-Active Edge

   Figure 3.1 shows an example network of TRILL Active-Active Edge. In
   this figure, endnodes (H1, H2, H3 and H4) are attached to a bridge
   (B) which communicates with multiple RBridges (RB1, RB2 and RB3) via
   the MC-LAG. Suppose RB4 is a 'remote' RBridge outside the AAE group
   in the TRILL campus. This connection model also applies to a
   virtualized environment, where the physical bridge is replaced with
   a vSwitch and the bare-metal hosts are replaced with virtual
   machines (VMs).

   Member RBridges of the AAE group always encapsulate a packet received
   from their attached endnode clumps using their own nickname,
   regardless of whether it is unicast or multicast. In this proposal,
   all edge RBridges in the entire campus need to learn multiple
   attachment points for each MAC address, and separately time each one
   out.

4. Backward Compatibility

   Three options are listed below to cope with incremental deployment
   scenarios.

   -- Option A

   A new capability announcement, "I can cope with multiple endnode
   attachments", would appear in LSPs. The AAE group can use this
   approach only if all edge RBridges announce this capability. New-type
   TRILL switches will not establish connectivity with legacy RBridges
   that are not capable of coping with multiple endnode attachments, so
   those legacy RBridges are isolated from the new-type TRILL switches.

   Note that only edge RBridges (those that are Appointed Forwarders
   [RFC6439]) need to support this. It does not affect purely transit
   RBridges.

   -- Option B

   Each edge RBridge in the AAE group ingresses data frames from any
   MC-LAG into a specific topology. In this way, the topology ID is used
   at the remote RBridge as the discriminator between different
   locations of a specific MAC address. TRILL MAY reserve a list of
   topology IDs to be dedicated to AAE. RBridges which do not support
   this reserved list MUST NOT establish connectivity with edge RBridges
   in the AAE group.

   -- Option C

   If the data plane learning of all RBridges does not support the
   multi-location learning feature, it is possible to make use of the
   ESADI protocol [ESADI] to distribute MAC addresses. Compared to data
   plane learning, TRILL ESADI allows one RBridge to remember multiple
   locations of a MAC address at the control plane.

5. Design Goals

   Proposals for the major design goals of AAE are explored in this
   section.
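
   The goals in this section all rely on the multi-attachment learning
   described in Section 3. As an illustration only, a minimal sketch of
   such a table, as a remote RBridge like RB4 might keep it, could look
   as follows. The class name, the 300-second aging time, the cost map,
   and the example MAC address are assumptions of this sketch and are
   not specified by this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

AGING_SECONDS = 300.0  # assumed aging time; not specified by this document

Key = Tuple[int, str]  # (VLAN ID, MAC address)


@dataclass
class MultiAttachTable:
    """Hypothetical MAC table keeping several attachment points per key."""

    aging: float = AGING_SECONDS
    # (vlan, mac) -> {ingress nickname -> time the entry was last refreshed}
    entries: Dict[Key, Dict[int, float]] = field(default_factory=dict)

    def learn(self, vlan: int, mac: str, ingress_nickname: int,
              now: float) -> None:
        """Record or refresh one attachment point; others stay untouched."""
        self.entries.setdefault((vlan, mac), {})[ingress_nickname] = now

    def expire(self, now: float) -> None:
        """Time out each attachment point separately."""
        for key in list(self.entries):
            live = {nick: seen for nick, seen in self.entries[key].items()
                    if now - seen < self.aging}
            if live:
                self.entries[key] = live
            else:
                del self.entries[key]

    def attachments(self, vlan: int, mac: str) -> List[int]:
        """Return all currently known egress nicknames for (vlan, mac)."""
        return sorted(self.entries.get((vlan, mac), {}))

    def pick_egress(self, vlan: int, mac: str,
                    cost: Dict[int, int]) -> Optional[int]:
        """Stick to one location: lowest cost, ties broken by nickname."""
        candidates = self.attachments(vlan, mac)
        if not candidates:
            return None
        return min(candidates,
                   key=lambda nick: (cost.get(nick, float("inf")), nick))


# Example: RB4 learns the same MAC in VLAN 10 via RB1 and later via RB2.
table = MultiAttachTable()
table.learn(10, "00:00:5e:00:53:01", ingress_nickname=1, now=0.0)
table.learn(10, "00:00:5e:00:53:01", ingress_nickname=2, now=200.0)
print(table.attachments(10, "00:00:5e:00:53:01"))  # both entries coexist
```

   Learning through a second member adds an entry instead of overwriting
   the first one, and each entry ages out on its own timer. Choosing the
   lowest-cost nickname is just one possible way for the remote RBridge
   to settle on a single location.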

5.1. No MAC Flip-Flopping (Normal Unicast Egress)

   Since all RBridges communicating with the AAE RBridges in the campus
   are able to keep multiple locations for one MAC address, a MAC
   address learnt from one AAE member will not be overwritten by the
   same MAC address learnt from another AAE member. Multiple entries for
   this MAC address will be created. The remote RBridge can stick to one
   of the locations (e.g., the closest one) for each MAC address rather
   than keep flip-flopping among them.

5.2. Regular Unicast/Multicast Ingress

   MC-LAG guarantees that each frame will be sent upward to the AAE via
   exactly one uplink. RBridges in the AAE can simply follow the process
   per [RFC6325] to ingress the frame. For example, each RBridge uses
   its own nickname as the ingress nickname to encapsulate the packet.
   In this scenario, each RBridge takes for granted that it is the
   Appointed Forwarder for the VLANs enabled on this MC-LAG.

5.3. Right Multicast Egress

   A fundamental design goal of AAE is that there is no duplication and
   no forwarding loop.

5.3.1. No Duplication (Single Exit Point)

   When multi-destination packets for a specific VLAN are received from
   the campus, it is important that exactly one RBridge in the AAE group
   lets each multicast packet through, so that no duplication happens.
   The single exit point can be selected by a static algorithm, e.g.,
   the VLAN ID or the source MAC address modulo the number of AAE
   members. Alternatively, AAE member RBridges may listen to the LACP
   PDUs and make use of the hashing function of MC-LAG to determine this
   single exit point.

5.3.2. No Echo (Split Horizon)

   When a multicast frame originated from an MC-LAG is ingressed by an
   RBridge of an AAE group, forwarded across the TRILL network and then
   received by another RBridge in the same AAE group, it is important
   that this RBridge does not egress this frame back to this MC-LAG.
   Otherwise, it will cause a forwarding loop (echo).

   The well-known 'split horizon' technique can be used to eliminate the
   echo issue. The essential point of split horizon is that the MC-LAG
   is assigned an identifier that is unique across the AAE group. When
   an AAE member receives a multicast packet that carries this
   identifier, the receiver MUST NOT egress it to the MC-LAG with the
   same identifier.

   This document proposes to apply split horizon based on the tuple
   consisting of the Fine Grained Label (FGL) plus the ingress RBridge
   nickname. When there are multiple MC-LAGs connected to the same
   RBridge, each MC-LAG MUST be assigned a unique FGL. RBridges in an
   AAE group should discover and remember the nicknames of the other
   members. If RB1 receives a multicast packet whose ingress RBridge is
   in the same AAE group as RB1, RB1 reads the FGL of the packet and
   MUST NOT egress the packet out of the interface configured with the
   same FGL. For other interfaces, RB1 SHOULD egress the packet.

5.4. No Black-hole & No Triangular Forwarding

   If a sub-link of the MC-LAG fails while remote RBridges continue to
   send packets to the MAC addresses they have learnt via the failed
   port, a black hole occurs. The proposal in this draft may make use of
   MAC withdrawal. When a member RBridge detects that the port connected
   to a sub-link of the MC-LAG has failed, all MAC addresses attached to
   this RBridge through the failed sub-link will be flushed. After that,
   no traffic will be sent via the failed port, hence no black hole
   occurs.

5.5. Load Balance Towards the AAE

   Since a remote RBridge can record multiple attachments of one MAC
   address, this remote RBridge can choose to spread the traffic to this
   MAC address towards any of the AAE members. Each of them is able to
   egress the traffic. Flow-level load balancing mechanisms can be
   implemented to optimize the distribution of the traffic load towards
   the AAE group.

6. Security Considerations

   Security issues should be considered when a specific extension is
   made to TRILL. The authenticity of contents transported in IS-IS PDUs
   is enforced using the regular IS-IS security mechanisms
   [ISIS][RFC5310]. For security considerations pertaining to extensions
   hosted by TRILL ESADI, refer to the Security Considerations in
   [ESADI].

7. IANA Considerations

   This document requires no IANA actions. RFC Editor: please remove
   this section before publication.

Acknowledgements

   The authors would like to thank Donald Eastlake, Erik Nordmark,
   Fangwei Hu and Liang Xia for their comments and suggestions.

8. References

8.1. Normative References

   [RFC6325] Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
             Ghanwani, "Routing Bridges (RBridges): Base Protocol
             Specification", RFC 6325, July 2011.

   [RFC6327] Eastlake 3rd, D., Perlman, R., Ghanwani, A., Dutt, D., and
             V. Manral, "Routing Bridges (RBridges): Adjacency", RFC
             6327, July 2011.

   [6327bis] D. Eastlake, R. Perlman, et al, "TRILL: Adjacency",
             draft-ietf-trill-rfc6327bis-04.txt, January 2014, in RFC Ed
             Queue.

   [RFC6439] Perlman, R., Eastlake, D., Li, Y., Banerjee, A., and F. Hu,
             "Routing Bridges (RBridges): Appointed Forwarders", RFC
             6439, November 2011.

   [ESADI]   H. Zhai, F. Hu, et al, "TRILL (Transparent Interconnection
             of Lots of Links): ESADI (End Station Address Distribution
             Information) Protocol", draft-ietf-trill-esadi-05.txt,
             February 2014, work in progress.

8.2. Informative References

   [CMT]     T. Senevirathne, J. Pathangi, et al, "Coordinated Multicast
             Trees (CMT) for TRILL", draft-ietf-trill-cmt-02.txt,
             November 2012, work in progress.

   [ISIS]    ISO, "Intermediate system to Intermediate system routeing
             information exchange protocol for use in conjunction with
             the Protocol for providing the Connectionless-mode Network
             Service (ISO 8473)", ISO/IEC 10589:2002.

   [RFC5310] Bhatia, M., Manral, V., Li, T., Atkinson, R., White, R.,
             and M. Fanto, "IS-IS Generic Cryptographic Authentication",
             RFC 5310, February 2009.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   Email: zhangmingui@huawei.com

   Radia Perlman
   Individual Contributor

   Email: radiaperlman@gmail.com

   Hongjun Zhai
   ZTE Corporation
   68 Zijinghua Road
   Nanjing 200012 China

   Phone: +86-25-52877345
   Email: zhai.hongjun@zte.com.cn

   Mukhtiar Shaikh
   Brocade

   Email: mshaikh@brocade.com

   Muhammad Durrani
   Brocade

   Email: mdurrani@brocade.com