INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Proposed Standard                       Donald Eastlake
Expires: February 23, 2014                                        Huawei
                                                         August 22, 2013

              Problem Statement: TRILL Active/Active Edge
                  draft-zhang-trill-aggregation-04.txt

Abstract

   This document specifies the TRILL active/active edge, which allows multiple RBridges to concurrently forward data frames of the same VLAN on links bundled by a Multi-Chassis Link Aggregation Group. With this kind of connection, end nodes may increase the bandwidth and reliability of their access at the edge of TRILL campuses. This new connection type is required to cause no loops or frame duplication. Besides this basic requirement, this document outlines other potential issues associated with the TRILL active/active edge and investigates how these issues may be addressed.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Mingui Zhang, et al          Expires February 23, 2014          [Page 1]
INTERNET-DRAFT          TRILL Active/Active Edge         August 22, 2013

Table of Contents

   1. Introduction
   2. Acronyms and Terminology
      2.1. Acronyms
      2.2. Terminology
   3. Overview
   4. Frame Processing
      4.1. Unicast Ingressing
      4.2. Unicast Egressing
      4.3. Multicast Ingressing
      4.4. Multicast Egressing
   5. DRB and Pseudonode
   6. MAC Addresses Sharing
   7. Failures and Self-healing
      7.1. Link Failure
      7.2. Node Failure
   8. Reverse Path Forwarding Check
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   Author's Addresses

1. Introduction

   TRILL uses ISIS link state routing to provide least-cost paths between TRILL switches (also known as Routing Bridges or RBridges). When a multi-access LAN link connects end-stations to multiple RBridges, a single RBridge has to be appointed as the frame forwarder for each VLAN-x on this LAN link. Other RBridges MAY be appointed as frame forwarders for other VLANs but MUST be inhibited from forwarding frames for the same VLAN-x on this LAN link [RFC6349].

   An MC-LAG can also be used to connect end-stations to multiple RBridges. There are two possible scenarios: (a) an end-station is connected to multiple RBridges by an MC-LAG directly; (b) end-stations are attached to a bridge and this bridge uses an MC-LAG to connect to multiple RBridges. An MC-LAG may choose any component link to forward frames and never forwards frames between component links. Therefore, it requires the up-connected RBridges to provide active/active attachment instead of the active/standby mode adopted in the Appointed Forwarder mechanism [RFC6349]. This kind of attachment allows end nodes to increase the bandwidth and reliability of their access to the TRILL campus via the MC-LAG.

   Similar to a LAN link, an MC-LAG can be represented by a pseudonode. All member RBridges should report their adjacencies to this pseudonode using LSPs. In this way, the RBridges attached to the same MC-LAG form an active/active edge group. Other RBridges in the campus communicate with this pseudonode using forwarding paths computed according to ISIS link state routing. No additional capabilities are required of these other RBridges.
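
   The pseudonode representation described above can be illustrated with a small path computation. The following is a minimal sketch, not part of any TRILL specification: the topology (RBi directly adjacent to RB1 and RB2), the unit link costs, and the helper names distances() and last_hops() are all assumptions made for illustration. It shows that a remote RBridge running an ordinary shortest-path computation reaches the pseudonode RBv through every member that reports an adjacency to it.

```python
import heapq


def distances(links, source):
    """Dijkstra over an undirected graph given as {(a, b): cost}."""
    adjacency = {}
    for (a, b), cost in links.items():
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, cost in adjacency.get(node, []):
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def last_hops(links, source, target):
    """Neighbors of target that lie on a least-cost path from source."""
    dist = distances(links, source)
    hops = set()
    for (a, b), cost in links.items():
        for near, far in ((a, b), (b, a)):
            if far == target and near in dist and dist[near] + cost == dist.get(target):
                hops.add(near)
    return sorted(hops)


# Adjacencies as they might appear in the link state database.
# Topology and costs are assumed for illustration only.
campus = {
    ("RBi", "RB1"): 1,
    ("RBi", "RB2"): 1,
    ("RB1", "RB2"): 1,
    ("RB1", "RBv"): 1,  # adjacency to the pseudonode reported by RB1
    ("RB2", "RBv"): 1,  # adjacency to the pseudonode reported by RB2
}

both_up = last_hops(campus, "RBi", "RBv")

# Withdraw RB2's adjacency to the pseudonode and recompute.
without_rb2 = {k: v for k, v in campus.items() if k != ("RB2", "RBv")}
one_down = last_hops(without_rb2, "RBi", "RBv")

print(both_up)   # ['RB1', 'RB2']
print(one_down)  # ['RB1']
```

   In this sketch the remote RBridge needs no knowledge of the MC-LAG: it only sees RBv as one more node in the link state database, and withdrawing one adjacency simply shrinks the set of members through which RBv is reached.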

   The baseline requirement is that the active/active edge MUST provide frame forwarding without causing loops or frame duplication in the TRILL campus or at the end node. To work properly, the TRILL active/active edge also has to address several other issues. The purpose of this document is to outline these issues; specific solutions are to be explored in the future as building blocks of the whole TRILL active/active edge mechanism.

   The rest of this document is organized as follows. Section 2 gives acronyms and terminology. Section 3 provides an overview. Section 4 specifies the frame processing behaviors of member RBridges. Section 5 describes how the pseudonode is set up. Section 6 explains MAC address sharing among member RBridges. Section 7 describes the self-healing issue. Section 8 investigates how to pass the Reverse Path Forwarding Check without packet loss.

2. Acronyms and Terminology

2.1. Acronyms

   MC-LAG: Multi-Chassis Link Aggregation Group
   ISIS:   Intermediate System to Intermediate System
   TRILL:  TRansparent Interconnection of Lots of Links
   AF:     Appointed Forwarder
   DT:     Distribution Tree
   RPFC:   Reverse Path Forwarding Check

2.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, the term "end node" means the end station or bridge connected to the TRILL active/active edge by an MC-LAG.

   Familiarity with [RFC6325], [RFC6327], and [RFC6349] is assumed in this document. As in [RFC6325], in this document the word "link" means a "bridged LAN", unless otherwise qualified.

3. Overview

   If an end node (end station or bridge) uses an MC-LAG to connect to multiple edge RBridges, all of these RBridges are expected to ingress and egress frames for the end node.
   In contrast, if multiple RBridges are connected to a LAN link, only one of them can be appointed as the frame forwarder for each VLAN-x [RFC6349], as illustrated in Figure 2.1 (a). The other RBridges are inhibited from ingressing and egressing frames for VLAN-x.

       +-----+                   +-----+
       | RBi |                   | RBi |(Remote RBridge)
       +-----+                   +-----+
     /\/\/\/\/\/\              /\/\/\/\/\/\
    /  Transit   \            /  Transit   \
   <   RBridges   >          <   RBridges   >
    \            /            \            /
     \/\/\/\/\/\/              \/\/\/\/\/\/
      |        |                |        |
   +-----+  +-----+          +-----+  +-----+
   | RB1 |--| RB2 |          | RB1 |--| RB2 |(Active/Active Edge)
   +-----+  +-----+          +-----+  +-----+
     AF\      /                  \      /
        +---+                     *******
        |LAN|                     * RBv * (Virtual RBridge)
        +---+                     *******
                                    | |(MC-LAG)
                                   +---+
                                   | E |
                                   +---+

 (a) Appointed Forwarder  (b) Active/Active Edge

     Figure 2.1: TRILL Appointed Forwarder vs Active-Active Edge

   As illustrated in Figure 2.1 (b), the end node 'E' is attached to both RB1 and RB2 using an MC-LAG. Each member RBridge can ingress and egress frames for the end node for VLAN-x. If each of them uses its own nickname as the ingress nickname, the remote RBridge may observe different locations for one MAC address at different times, which is referred to as the "MAC move" problem in this document. The MAC move problem affects path selection at the remote RBridge. Frames destined to the end node may take different paths, which may reorder the frames of a traffic flow.

   In order to avoid the MAC move problem, all member RBridges should use one common nickname as the ingress nickname in TRILL data frame encapsulation. As shown in Figure 2.1 (b), member RBridges pretend that there is a virtual RBridge connected to them, acting as the appointed forwarder of the end node. It is natural to represent this virtual RBridge as a pseudonode. All RBridges connected to the MC-LAG form adjacencies with the pseudonode. Other RBridges believe there is an RBridge RBv connected to RB1 and RB2.
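
   The MAC move problem and its avoidance can be illustrated with a toy model of the remote RBridge's learning table. This is a minimal sketch under stated assumptions: the class RemoteRBridge, the function simulate(), the VLAN number, and the MAC address (taken from the documentation range) are hypothetical and only model "learn the ingress nickname per (VLAN, MAC)".

```python
class RemoteRBridge:
    """Toy model of MAC learning from decapsulated TRILL data frames."""

    def __init__(self):
        self.mac_table = {}  # (vlan, mac) -> nickname the MAC was last seen behind
        self.moves = 0       # number of times a known MAC changed location

    def learn(self, vlan, mac, ingress_nickname):
        key = (vlan, mac)
        previous = self.mac_table.get(key)
        if previous is not None and previous != ingress_nickname:
            self.moves += 1
        self.mac_table[key] = ingress_nickname


def simulate(ingress_nickname_for):
    """Feed frames that the end node spreads over both component links."""
    remote = RemoteRBridge()
    for member in ["RB1", "RB2", "RB1", "RB2"]:
        remote.learn(10, "00:00:5e:00:53:01", ingress_nickname_for(member))
    return remote.moves


# Each member uses its own nickname: the MAC appears to move on every frame.
moves_own_nickname = simulate(lambda member: member)

# Every member uses the pseudonode nickname RBv: the location is stable.
moves_shared_nickname = simulate(lambda member: "RBv")

print(moves_own_nickname)     # 3
print(moves_shared_nickname)  # 0
```

   With per-member nicknames the learned location flips between RB1 and RB2, whereas with the shared pseudonode nickname the remote RBridge keeps a single, stable entry.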

   Note that member RBridges SHOULD NOT announce that they are the VLAN-x Appointed Forwarder if VLAN-x is enabled on the MC-LAG.

   Although the above example includes two edge RBridges, the TRILL active/active edge solution SHOULD support cases with more than two member RBridges.

4. Frame Processing

   When the end node injects frames into the TRILL campus via a member RBridge, this RBridge encapsulates the native frames on behalf of the pseudonode. When frames are sent to the end node, the pseudonode is supposed to be the egress RBridge. It is REQUIRED that RBridges other than the active/active members need not be aware of the active/active group and need not change their frame processing behavior.

   In contrast to the Appointed Forwarder mechanism, all active/active member RBridges are able to ingress and egress frames of VLAN-x on the same link. It is crucial to avoid loops and duplication in frame processing.

4.1. Unicast Ingressing

   A member RBridge that receives native frames from the end node (the receiver RBridge) encapsulates them using the nickname of the pseudonode as the ingress nickname. When these TRILL data frames arrive at the remote RBridge, the source MAC addresses are learnt during decapsulation. The remote RBridge then regards the pseudonode as the egress RBridge for these MAC addresses.

4.2. Unicast Egressing

   Following the entries learnt in their MAC tables, remote RBridges send TRILL data frames destined to the end node to the pseudonode rather than to individual member RBridges. When a member RBridge receives a TRILL data frame whose egress RBridge is the pseudonode, it can conclude that the frame should be egressed onto the MC-LAG. However, member RBridges MUST NOT egress any TRILL data frame whose ingress RBridge is the pseudonode. Otherwise, loops will occur.

4.3. Multicast Ingressing

   The end node chooses one component link of the MC-LAG to send multicast frames to member RBridges.
   Similar to unicast ingressing, the receiver RBridge encapsulates the native frames using the nickname of the pseudonode as the ingress nickname. Different member RBridges MUST NOT share the same Distribution Tree to ingress multicast frames of a specific VLAN-x from the end node. Otherwise, some multicast frames may be lost due to the Reverse Path Forwarding Check. This issue is detailed in Section 8.

4.4. Multicast Egressing

   Multicast frames sent along the VLAN-x Distribution Tree may reach all member RBridges. However, only one of them can egress the multicast frames onto the MC-LAG. Otherwise, the end node will receive duplicate frames. This requirement can be met if member RBridges calculate the Distribution Tree treating the pseudonode as a normal RBridge. Then only one parent RBridge will be selected for the pseudonode. The other, non-parent member RBridges MUST refrain from egressing multicast frames of VLAN-x onto the MC-LAG. As with unicast egressing, member RBridges MUST NOT egress any multicast frames whose ingress RBridge is the pseudonode.

5. DRB and Pseudonode

   On a LAN link, the DRB (Designated RBridge) MAY give a pseudonode name to the link, issue an LSP (Link State PDU) on behalf of the pseudonode, and issue CSNPs (Complete Sequence Number PDUs) on the link [RFC6325]. Unlike a LAN link, no Hellos are exchanged over the MC-LAG. Thus, the DRB cannot be elected using the Hello protocol. Member RBridges MAY establish a dedicated RBridge Channel to discover each other and elect a DRB for the active/active RBridge group (aDRB) to execute the above tasks: to assign the nickname and to issue the LSP and CSNPs. The member RBridge with the highest priority to be the tree root is a good choice.

   Member RBridges SHOULD be able to discover each other to resolve misconfiguration and failures.
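
   The suggested choice of aDRB can be sketched as a simple selection over the discovered members. This is an illustrative sketch only: the document does not define the election procedure, so the Member record, the elect_adrb() function, the sample priority values, and in particular the tie-break on a higher System ID are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    nickname: str
    root_priority: int  # priority to be a tree root; higher is preferred
    system_id: int      # assumed tie-breaker: the higher value wins


def elect_adrb(members):
    """Pick the member with the highest tree root priority, or None."""
    if not members:
        return None
    return max(members, key=lambda m: (m.root_priority, m.system_id))


group = [
    Member("RB1", 0x8000, 0x0000AA000001),
    Member("RB2", 0xC000, 0x0000AA000002),
    Member("RB3", 0xC000, 0x0000AA000003),
]

adrb = elect_adrb(group)

# If the elected aDRB leaves the group, the remaining members elect again.
new_adrb = elect_adrb([m for m in group if m != adrb])

print(adrb.nickname)      # RB3
print(new_adrb.nickname)  # RB2
```

   Because every member applies the same deterministic rule to the same member list, all of them arrive at the same aDRB, provided that discovery gives them a consistent view of the group.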

   Each member RBridge SHALL report its connection to the MC-LAG. The MAC address of the end node MAY be used to identify the MC-LAG to which the member RBridges are connected.

   One RBridge may be connected to multiple MC-LAGs. It is likely that all these MC-LAGs share the same set of member RBridges. However, these MC-LAGs MUST NOT share the same pseudonode; otherwise the following issue can arise.

   o  Component Links from Different MC-LAGs Cannot be Distinguished: Assume member RBridge RBi is connected to multiple end nodes and these links are all advertised as a single ISIS link "RBi-RBv". Remote RBridges cannot distinguish these links connecting RBi and RBv. When one of these links fails, a dilemma arises. On the one hand, if the failed link is not advertised as a down ISIS link, traffic sent from remote RBridges to RBv via the failed link will be blackholed. On the other hand, if the failed link is announced as a down ISIS link, component links from other MC-LAGs will be mistakenly disconnected.

   The right choice is to represent every MC-LAG as a unique pseudonode. In this way, the failure of a component link of an MC-LAG can be interpreted as an ISIS link failure. Thus the aDRB can issue a new LSP on behalf of the pseudonode to trigger the link state update across the campus.

6. MAC Addresses Sharing

   When a member RBridge learns a MAC address from the encapsulation or decapsulation of a TRILL data frame, it SHOULD share this learning with all other member RBridges. Afterwards, a frame destined to this MAC address can be delivered to the MC-LAG as a unicast native frame, or ingressed into the TRILL campus as a unicast TRILL data frame, by any other member RBridge.
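
   The sharing behavior of this section, including the record formats and aging rules given in the rest of the section, can be sketched as follows. This is a minimal model under stated assumptions: the class MemberRBridge, its method names, the direct peer-to-peer update (standing in for whatever channel carries the sharing), and the 300-second aging time are hypothetical.

```python
AGING_TIME = 300  # seconds; assumed value, identical on every member


class MemberRBridge:
    """Toy MAC table of one member of the active/active edge group."""

    def __init__(self, name, mclag_port):
        self.name = name
        self.mclag_port = mclag_port  # local port attached to the end node
        self.peers = []
        self.table = {}  # (vlan, mac) -> (location, expires_at)

    def _record(self, vlan, mac, location, now):
        # Learning or refreshing an entry always resets its aging timer.
        self.table[(vlan, mac)] = (location, now + AGING_TIME)

    def learn_local(self, vlan, mac, now):
        """Northbound: MAC learnt from a native frame sent by the end node."""
        self._record(vlan, mac, ("port", self.mclag_port), now)
        for peer in self.peers:
            # A peer records the entry against its own MC-LAG port,
            # as if it had learnt the address locally.
            peer._record(vlan, mac, ("port", peer.mclag_port), now)

    def learn_remote(self, vlan, mac, ingress_nickname, now):
        """Southbound: MAC learnt from a decapsulated TRILL data frame."""
        self._record(vlan, mac, ("nickname", ingress_nickname), now)
        for peer in self.peers:
            peer._record(vlan, mac, ("nickname", ingress_nickname), now)

    def lookup(self, vlan, mac, now):
        entry = self.table.get((vlan, mac))
        if entry is None or entry[1] <= now:
            return None  # unknown or aged out
        return entry[0]


rb1 = MemberRBridge("RB1", mclag_port=1)
rb2 = MemberRBridge("RB2", mclag_port=7)
rb1.peers.append(rb2)
rb2.peers.append(rb1)

rb1.learn_local(10, "00:00:5e:00:53:01", now=0)
rb2.learn_remote(10, "00:00:5e:00:53:02", "RBi", now=0)

print(rb2.lookup(10, "00:00:5e:00:53:01", now=1))    # ('port', 7)
print(rb1.lookup(10, "00:00:5e:00:53:02", now=1))    # ('nickname', 'RBi')
print(rb1.lookup(10, "00:00:5e:00:53:01", now=300))  # None
```

   In the sketch every learning event is pushed to all peers immediately, which is the simplest way to keep the tables and aging timers aligned; a real design would have to weigh this against the update volume.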

   a) Northbound Sharing: When a remote RBridge sends data frames to the end node, these frames may arrive at any one of the member RBridges, given that several member RBridges may be on Equal Cost Multiple Paths from the remote RBridge to the pseudonode. If the MAC address of the end node has already been learnt and recorded by any member RBridge, the receiver RBridge SHOULD have recorded this MAC entry (VLAN ID, MAC Address, Port Number) as well, so that the frame can be delivered to the end node as a known unicast frame. Therefore, local MAC addresses learnt from data frames sent by the end node (northbound) SHOULD be shared among member RBridges.

   b) Southbound Sharing: The end node may choose any component link to inject a frame, which achieves load balancing on the MC-LAG. If the destination MAC address has been learnt by any member RBridge, the receiver RBridge SHOULD also hold that MAC record (VLAN ID, MAC Address, Egress RBridge Nickname). Thus the data frame need not be sent as a multicast frame (unknown unicast). Therefore, MAC addresses learnt from data frames sent by remote RBridges to the end node (southbound) should be shared as well.

   When an RBridge learns a source MAC address from a data frame, it records the VLAN ID, the source MAC address, and the location, which can be the incoming port number or the ingress nickname. A MAC address shared by a peer RBridge is recorded as if it were locally learned. For example, when RB1 shares a MAC address with RB2, RB2 should set the incoming port to its own port attached to the end node.

   It is REQUIRED that all member RBridges set the same aging time for each MAC address. Every time a MAC address is learnt or updated, all member RBridges MUST update the record and reset its aging time.

   It is likely that data frames from one source MAC address are received continuously. Updating the entry for this MAC address locally is not a problem.
   However, when this update is propagated among multiple member RBridges, the frequent updates may consume considerable bandwidth. Therefore, member RBridges need a communication channel to realize the MAC sharing, which can be realized through an extension of ESADI or using a dedicated RBridge Channel [Channel].

7. Failures and Self-healing

   Resilience is a major goal of the active/active edge. From the point of view of the end node, the MC-LAG provides reliability of the access link. From the point of view of the member RBridges, a state change of the active/active edge caused by a link or node failure is reflected in updated LSPs from the member RBridges. This provides self-healing of the active/active edge.

7.1. Link Failure

   The failure of a component link of the MC-LAG is translated into an ISIS link failure: if a member RBridge is disconnected from the end node, it sends out an LSP to announce that it is no longer connected to the pseudonode. This triggers the update of the forwarding tables of remote RBridges. Since the other member RBridges have also reported their connections to the pseudonode, remote RBridges in the TRILL campus can send frames to the pseudonode via any other member RBridge. Therefore, the reachability of the end node is not broken by this link failure.

   If the link connecting the aDRB and the end node fails, the link failure triggers the election of a new aDRB. The new aDRB SHOULD reuse the nickname allocated to the pseudonode, which avoids changing the locations of the end node's MAC addresses learnt by remote RBridges.

   The extreme case is that the last component link of the MC-LAG fails. Then the aDRB SHOULD update its LSPs to remove the pseudonode from the campus, which also dissolves the whole active/active edge.

7.2. Node Failure

   The failure of a member RBridge is also reflected by LSP announcements.
   If the aDRB fails, a new aDRB will be elected, and this new aDRB SHOULD reuse the pseudonode nickname allocated by the old aDRB.

8. Reverse Path Forwarding Check

   Reverse Path Forwarding Check (RPFC) is used by TRILL to suppress forwarding loops of multicast frames [RFC6325]. For a specific Distribution Tree (DT), a multicast frame from a specific ingress RBridge can arrive at only one expected link of an RBridge. RBridges MUST drop multicast frames that fail the RPFC [RFC6325].

   When multiple member RBridges ingress multicast frames for VLAN-x of the end node simultaneously, there is no guarantee that these frames always arrive at the expected link of a remote RBridge. The following example explains this issue.

             RBi
            /   \
          RB1   RB2
          /
        RBv

      Figure 7.1: The Distribution Tree, root=RBi

   Suppose a Distribution Tree for the topology of Figure 2.1 (b) is constructed as shown in Figure 7.1. For this Distribution Tree, multicast frames from RBv to RBi are expected to be received at the port attached to RB1. With the active/active connection, RB2 can receive native data frames from the MC-LAG as well. If RB2 adopts the above Distribution Tree, multicast frames from RBv to RBi will be received at the port attached to RB2. This causes a problem: these frames will be discarded according to the RPFC rule.

           RBx                      RBy
            |                        |
           RBi                      RBi
          /   \                    /   \
        RB1   RB2                RB1   RB2
        /                                 \
      RBv                                 RBv

      (a) DT, root=RBx         (b) DT, root=RBy

      Figure 7.2: Assign a Unique Tree to Each Member RBridge

   One way to avoid the above issue is to leverage the fact that RBridges can compute multiple Distribution Trees. A unique Distribution Tree is assigned to each member RBridge for multicast frame distribution. These trees are identified by their root RBridge nicknames. The example in Figure 7.2 illustrates this method, where RB1 and RB2 adopt two different Distribution Trees.
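
   The RPFC behavior in Figures 7.1 and 7.2 can be reproduced with a small model. The sketch below is illustrative only: it represents each Distribution Tree as a plain edge list, and the helpers expected_neighbor() and rpfc_pass() are hypothetical names, not functions of any TRILL implementation. The check accepts a frame at a receiving RBridge only if it arrives from the neighbor that lies on the tree path toward the ingress nickname.

```python
from collections import deque


def expected_neighbor(tree_edges, ingress, receiver):
    """Neighbor of receiver on the tree path from ingress to receiver."""
    if ingress == receiver:
        return None
    adjacency = {}
    for a, b in tree_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Breadth-first search outward from the receiver; toward[n] is the
    # next node on the tree path from n to the receiver.
    toward = {receiver: None}
    queue = deque([receiver])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in toward:
                toward[neighbor] = node
                queue.append(neighbor)
    if ingress not in toward:
        return None  # ingress nickname is not on this tree
    node = ingress
    while toward[node] != receiver:
        node = toward[node]
    return node


def rpfc_pass(tree_edges, ingress, receiver, arrived_from):
    """True if the frame arrived on the link the tree predicts."""
    return expected_neighbor(tree_edges, ingress, receiver) == arrived_from


# Figure 7.1: a single tree in which the pseudonode RBv hangs off RB1.
shared_tree = [("RBi", "RB1"), ("RBi", "RB2"), ("RB1", "RBv")]

# Figure 7.2: one tree per member RBridge.
tree_x = [("RBx", "RBi"), ("RBi", "RB1"), ("RBi", "RB2"), ("RB1", "RBv")]
tree_y = [("RBy", "RBi"), ("RBi", "RB1"), ("RBi", "RB2"), ("RB2", "RBv")]

# Frames carry ingress nickname RBv; RBi checks the link they arrive on.
via_rb1_shared = rpfc_pass(shared_tree, "RBv", "RBi", arrived_from="RB1")
via_rb2_shared = rpfc_pass(shared_tree, "RBv", "RBi", arrived_from="RB2")
via_rb1_tree_x = rpfc_pass(tree_x, "RBv", "RBi", arrived_from="RB1")
via_rb2_tree_y = rpfc_pass(tree_y, "RBv", "RBi", arrived_from="RB2")

print(via_rb1_shared, via_rb2_shared)  # True False
print(via_rb1_tree_x, via_rb2_tree_y)  # True True
```

   On the shared tree, frames ingressed by RB2 fail the check at RBi and would be dropped, while with one tree per member both ingress paths pass.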

   The active/active edge needs to assign at least one Distribution Tree per component link of an MC-LAG, so the maximum allowed number of component links depends on the number of Distribution Trees that all RBridges can compute. However, MC-LAGs in current best practice have two component links, which is well supported by TRILL switches.

   In [CMT], the Affinity TLV is used to achieve the above assignment of Distribution Trees to member RBridges. It is REQUIRED that all RBridges in the campus are able to recognize the Affinity TLV and compute Distribution Trees as this TLV specifies.

   When there is a link or node failure in the active/active edge, the affected Distribution Tree should be re-allocated to another member RBridge. It is RECOMMENDED that this re-allocation be incremental. In other words, Distribution Trees not affected by the failure SHOULD be retained.

9. Security Considerations

   This document raises no new security issues for ISIS.

10. IANA Considerations

   This document requires no IANA actions. RFC Editor: please remove this section before publication.

11. References

11.1. Normative References

   [RFC6325] R. Perlman, D. Eastlake, et al, "RBridges: Base Protocol Specification", RFC 6325, July 2011.

   [RFC6349] R. Perlman, D. Eastlake, et al, "RBridges: Appointed Forwarders", RFC 6439, November 2011.

   [Channel] D. Eastlake, V. Manral, et al, "TRILL: RBridge Channel Support", draft-ietf-trill-rbridge-channel-08.txt, July 2012, work in progress.

   [CMT] T. Senevirathne, J. Pathangi, et al, "Coordinated Multicast Trees (CMT) for TRILL", draft-ietf-trill-cmt-01.txt, November 2012, work in progress.

11.2. Informative References

   None.

Author's Addresses

   Mingui Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   Email: zhangmingui@huawei.com

   Donald E. Eastlake, 3rd
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com