| rfc9722.original | rfc9722.txt | |||
|---|---|---|---|---|
| BESS Working Group P. Brissette | Internet Engineering Task Force (IETF) P. Brissette | |||
| Internet-Draft A. Sajassi | Request for Comments: 9722 A. Sajassi | |||
| Updates: 8584 (if approved) LA. Burdet, Ed. | Updates: 8584 LA. Burdet, Ed. | |||
| Intended status: Standards Track Cisco | Category: Standards Track Cisco | |||
| Expires: 24 May 2025 J. Drake | ISSN: 2070-1721 J. Drake | |||
| Independent | Independent | |||
| J. Rabadan | J. Rabadan | |||
| Nokia | Nokia | |||
| 20 November 2024 | April 2025 | |||
| Fast Recovery for EVPN Designated Forwarder Election | Fast Recovery for EVPN Designated Forwarder Election | |||
| draft-ietf-bess-evpn-fast-df-recovery-12 | ||||
| Abstract | Abstract | |||
| The Ethernet Virtual Private Network (EVPN) solution in RFC 7432 | The Ethernet Virtual Private Network (EVPN) solution in RFC 7432 | |||
| provides Designated Forwarder (DF) election procedures for multihomed | provides Designated Forwarder (DF) election procedures for multihomed | |||
| Ethernet Segments. These procedures have been enhanced further by | Ethernet Segments. These procedures have been enhanced further by | |||
| applying the Highest Random Weight (HRW) algorithm for Designated | applying the Highest Random Weight (HRW) algorithm for DF election to | |||
| Forwarder election to avoid unnecessary DF status changes upon a | avoid unnecessary DF status changes upon a failure. This document | |||
| failure. This document improves these procedures by providing a fast | improves these procedures by providing a fast DF election upon | |||
| Designated Forwarder election upon recovery of the failed link or | recovery of the failed link or node associated with the multihomed | |||
| node associated with the multihomed Ethernet Segment. This document | Ethernet Segment. This document updates RFC 8584 by optionally | |||
| updates RFC 8584 by optionally introducing delays between some of the | introducing delays between some of the events therein. | |||
| events therein. | ||||
| The solution is independent of the number of EVPN Instances (EVIs) | The solution is independent of the number of EVPN Instances (EVIs) | |||
| associated with that Ethernet Segment and it is performed via a | associated with that Ethernet Segment, and it is performed via a | |||
| simple signaling in BGP between the recovered node and each of the | simple signaling in BGP between the recovered node and each of the | |||
| other nodes in the multihoming group. | other nodes in the multihoming group. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
| provisions of BCP 78 and BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| Internet Standards is available in Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 24 May 2025. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9722. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Requirements Language | 1.1. Requirements Language | |||
| 1.2. Terminology | 1.2. Terminology | |||
| 1.3. Challenges with Existing Mechanism | 1.3. Challenges with Existing Mechanism | |||
| 1.4. Design Principles for a Solution | 1.4. Design Principles for a Solution | |||
| 2. DF Election Synchronization Solution | 2. DF Election Synchronization Solution | |||
| 2.1. BGP Encoding | 2.1. BGP Encoding | |||
| 2.2. Timestamp Verification | 2.2. Timestamp Verification | |||
| 2.3. Updates to RFC8584 | 2.3. Updates to RFC 8584 | |||
| 3. Synchronization Scenarios | 3. Synchronization Scenarios | |||
| 3.1. Concurrent Recoveries | 3.1. Concurrent Recoveries | |||
| 4. Backwards Compatibility | 4. Backwards Compatibility | |||
| 5. Security Considerations | 5. Security Considerations | |||
| 6. IANA Considerations | 6. IANA Considerations | |||
| 7. References | 7. References | |||
| 7.1. Normative References | 7.1. Normative References | |||
| 7.2. Informative References | 7.2. Informative References | |||
| Appendix A. Contributors | Acknowledgements | |||
| Appendix B. Acknowledgements | Contributors | |||
| Authors' Addresses | Authors' Addresses | |||
| 1. Introduction | 1. Introduction | |||
| The Ethernet Virtual Private Network (EVPN) solution [RFC7432] is | The Ethernet Virtual Private Network (EVPN) solution [RFC7432] is | |||
| widely used in data center (DC) applications for Network | widely used in data center (DC) applications for Network | |||
| Virtualization Overlay (NVO) and DC interconnect (DCI) services, and | Virtualization Overlay (NVO) and Data Center Interconnect (DCI) | |||
| in service provider (SP) applications for next generation virtual | services and in service provider (SP) applications for next- | |||
| private LAN services. | generation virtual private LAN services. | |||
| [RFC7432] describes Designated Forwarder (DF) election procedures for | [RFC7432] describes Designated Forwarder (DF) election procedures for | |||
| multihomed Ethernet Segments. These procedures are enhanced further | multihomed Ethernet Segments. These procedures are enhanced further | |||
| in [RFC8584] by applying the Highest Random Weight algorithm for DF | in [RFC8584] by applying the Highest Random Weight (HRW) algorithm | |||
| election in order to avoid unnecessary DF status changes upon a link | for DF election in order to avoid unnecessary DF status changes upon | |||
| or node failure associated with the multihomed Ethernet Segment. | a link or node failure associated with the multihomed Ethernet | |||
| Segment. | ||||
| This document makes further improvements to the DF election | This document makes further improvements to the DF election | |||
| procedures in [RFC8584] by providing an option for a fast DF election | procedures in [RFC8584] by providing an option for a fast DF election | |||
| upon recovery of the failed link or node associated with the | upon recovery of the failed link or node associated with the | |||
| multihomed Ethernet Segment. This DF election is achieved | multihomed Ethernet Segment. This DF election is achieved | |||
| independent of the number of EVPN Instances (EVIs) associated with | independent of the number of EVPN Instances (EVIs) associated with | |||
| that Ethernet Segment and it is performed via straightforward | that Ethernet Segment, and it is performed via straightforward | |||
| signaling in BGP between the recovered node and each of the other | signaling in BGP between the recovered node and each of the other | |||
| nodes in the multihomed Ethernet Segment redundancy group. | nodes in the multihomed Ethernet Segment redundancy group. | |||
| This document updates the DF Election Finite State Machine (FSM) | This document updates the DF Election Finite State Machine (FSM) | |||
| described in Section 2.1 of [RFC8584], by optionally introducing | described in Section 2.1 of [RFC8584] by optionally introducing | |||
| delays between some events, as further detailed in Section 2.3. The | delays between some events, as further detailed in Section 2.3. The | |||
| solution is based on a simple one-way signaling mechanism. | solution is based on a simple one-way signaling mechanism. | |||
| 1.1. Requirements Language | 1.1. Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 1.2. Terminology | 1.2. Terminology | |||
| PE: Provider Edge device. | PE: Provider Edge | |||
| Designated Forwarder (DF): A PE that is currently forwarding | DF: Designated Forwarder. A PE that is currently forwarding | |||
| (encapsulating/decapsulating) traffic for a given VLAN in and out | (encapsulating/decapsulating) traffic for a given VLAN in and out | |||
| of a site. | of a site. | |||
| NDF: Non-Designated Forwarder, a PE that is currently blocking | NDF: Non-Designated Forwarder. A PE that is currently blocking | |||
| traffic (see DF above). | traffic (see DF above). | |||
| EVI: An EVPN instance spanning the Provider Edge (PE) devices | EVI: EVPN Instance. It spans the PE devices participating in that | |||
| participating in that EVPN. | EVPN. | |||
| HRW: Highest Random Weight algorithm, [HRW98] | HRW: Highest Random Weight algorithm [HRW98] | |||
| Service carving: DF Election is also referred to as "service | Service carving: This refers to DF election, as defined in | |||
| carving" in [RFC7432] | [RFC7432]. | |||
| SCT: Service Carving Time, defined in this document, the time at | SCT: Service Carving Time. Defined in this document as the time at | |||
| which all nodes participating in an Ethernet Segment perform DF | which all nodes participating in an Ethernet Segment perform DF | |||
| Election. | Election. | |||
| 1.3. Challenges with Existing Mechanism | 1.3. Challenges with Existing Mechanism | |||
| In EVPN technology, multiple Provider Edge (PE) devices encapsulate | In EVPN technology, multiple PE devices encapsulate and decapsulate | |||
| and decapsulate data belonging to the same VLAN. Under certain | data belonging to the same VLAN. Under certain conditions, this may | |||
| conditions, this may cause duplicated Ethernet packets and potential | cause duplicated Ethernet packets and potential loops if there is a | |||
| loops if there is a momentary overlap in forwarding roles between two | momentary overlap in forwarding roles between two or more PE devices, | |||
| or more PE devices, potentially also leading to broadcast storms of | potentially also leading to broadcast storms of frames forwarded back | |||
| frames forwarded back into the VLAN. | into the VLAN. | |||
| EVPN [RFC7432] currently specifies timer-based synchronization among | EVPN [RFC7432] currently specifies timer-based synchronization among | |||
| PE devices within an Ethernet Segment redundancy group. This | PE devices within an Ethernet Segment redundancy group. This | |||
| approach can lead to duplications and potential loops due to multiple | approach can lead to duplications and potential loops due to multiple | |||
| Designated Forwarders (DFs) if the timer interval is too short, or to | DFs if the timer interval is too short or can lead to packet drops if | |||
| packet drops if the timer interval is too long. | the timer interval is too long. | |||
| Split-horizon filtering, as described in Section 8.3 of [RFC7432], | Split-horizon filtering, as described in Section 8.3 of [RFC7432], | |||
| can prevent loops but does not address duplicates. However, if there | can prevent loops but does not address duplicates. However, if there | |||
| are overlapping Designated Forwarders of two different sites | are overlapping DFs of two different sites simultaneously for the | |||
| simultaneously for the same VLAN, the site identifier will differ | same VLAN, the site identifier will differ when the packet re-enters | |||
| when the packet re-enters the Ethernet Segment. Consequently, the | the Ethernet Segment. Consequently, the split-horizon check will | |||
| split-horizon check will fail, resulting in layer-2 loops. | fail, resulting in Layer 2 loops. | |||
| The updated DF procedures outlined in [RFC8584] use the well-known | The updated DF procedures outlined in [RFC8584] use the well-known | |||
| Highest Random Weight (HRW) algorithm to prevent the reshuffling of | HRW algorithm to prevent the reshuffling of VLANs among PE devices | |||
| VLANs among PE devices within the Ethernet Segment redundancy group | within the Ethernet Segment redundancy group during failure or | |||
| during failure or recovery events. This approach minimizes the | recovery events. This approach minimizes the impact on VLANs not | |||
| impact on VLANs not assigned to the failed or recovered ports and | assigned to the failed or recovered ports and eliminates the | |||
| eliminates the occurrence of loops or duplicates during such events. | occurrence of loops or duplicates during such events. | |||
| However, upon PE insertion or a port being newly added to a | However, upon PE insertion or a port being newly added to a | |||
| multihomed Ethernet Segment, HRW cannot help either as a transfer of | multihomed Ethernet Segment, the HRW cannot help either, as a | |||
| DF role to the new port must occur while the old DF is still active. | transfer of the DF role to the new port must occur while the old DF | |||
| is still active. | ||||
| +---------+ | +---------+ | |||
| +-------------+ | | | +-------------+ | | | |||
| | | | | | | | | | | |||
| / | PE1 |----| | +-------------+ | / | PE1 |----| | +-------------+ | |||
| / | | | MPLS/ | | |---CE3 | / | | | MPLS/ | | |---CE3 | |||
| / +-------------+ | VxLAN/ | | PE3 | | / +-------------+ | VxLAN/ | | PE3 | | |||
| CE1 - | Cloud | | | | CE1 - | Cloud | | | | |||
| \ +-------------+ | |---| | | \ +-------------+ | |---| | | |||
| \ | | | | +-------------+ | \ | | | | +-------------+ | |||
| \ | PE2 |----| | | \ | PE2 |----| | | |||
| | | | | | | | | | | |||
| +-------------+ | | | +-------------+ | | | |||
| +---------+ | +---------+ | |||
| Figure 1: CE1 multihomed to PE1 and PE2. | Figure 1: CE1 Multihomed to PE1 and PE2 | |||
| In Figure 1, when PE2 is inserted in the Ethernet Segment or its | In Figure 1, when PE2 is inserted in the Ethernet Segment or its | |||
| CE1-facing interface recovered, PE1 will transfer the DF role of some | CE1-facing interface is recovered, PE1 will transfer the DF role of | |||
| VLANs to PE2 to achieve load balancing. However, because there is no | some VLANs to PE2 to achieve load-balancing. However, because there | |||
| handshake mechanism between PE1 and PE2, overlapping of DF roles for | is no handshake mechanism between PE1 and PE2, overlapping of DF | |||
| a given VLAN is possible which leads to duplication of traffic as | roles for a given VLAN is possible, which leads to duplication of | |||
| well as layer-2 loops. | traffic as well as Layer 2 loops. | |||
| Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer- | Current EVPN specifications [RFC7432] and [RFC8584] rely on a timer- | |||
| based approach for transferring the DF role to the newly inserted | based approach for transferring the DF role to the newly inserted | |||
| device. This can cause the following issues: | device. This can cause the following issues: | |||
| * Loops/Duplicates if the timer value is too short | * Loops and duplicates, if the timer value is too short | |||
| * Prolonged Traffic Blackholing if the timer value is too long | * Prolonged traffic loss, if the timer value is too long | |||
| 1.4. Design Principles for a Solution | 1.4. Design Principles for a Solution | |||
| The clock-synchronization solution for fast DF recovery presented in | The clock-synchronization solution for fast DF recovery presented in | |||
| this document follows several design principles and offers multiple | this document follows several design principles and offers multiple | |||
| advantages, namely: | advantages, namely: | |||
| * Complex handshake signaling mechanisms and state machines are | * Complex handshake signaling mechanisms and state machines are | |||
| avoided in favor of a simple uni-directional signaling approach. | avoided in favor of a simple unidirectional signaling approach. | |||
| * The fast DF recovery solution maintains backwards compatibility | * The fast DF recovery solution maintains backwards compatibility | |||
| (see Section 4) by ensuring that PEs reject any unrecognized new | (see Section 4) by ensuring that PEs reject any unrecognized new | |||
| BGP EVPN Extended Community. | BGP EVPN Extended Community. | |||
| * Existing DF Election algorithms remain supported. | * Existing DF Election algorithms remain supported. | |||
| * The fast DF recovery solution is independent of any BGP delays in | * The fast DF recovery solution is independent of any BGP delays in | |||
| propagation of Ethernet Segment routes (Route Type 4) | propagation of Ethernet Segment routes (Route Type 4) | |||
| * The fast DF recovery solution is agnostic of the actual time | * The fast DF recovery solution is agnostic of the actual time | |||
| synchronization mechanism used; however, an NTP-based | synchronization mechanism used; however, an NTP-based | |||
| representation of time is used for EVPN signaling. | representation of time is used for EVPN signaling. | |||
| The solution in this document relies on nodes in the topology, more | The solution in this document relies on nodes in the topology, more | |||
| specifically the peering nodes of each Ethernet Segment, to be clock- | specifically the peering nodes of each Ethernet Segment, to be clock- | |||
| synchronized and advertise Time Synchronization capability. When | synchronized and to advertise the Time Synchronization capability. | |||
| this is not the case, or clocks are badly desynchronized, network | When this is not the case, or when clocks are badly desynchronized, | |||
| convergence and DF Election are no worse than [RFC7432] due to the | network convergence and DF Election are no worse than that described | |||
| timestamp range checking (Section 2.2). | in [RFC7432] due to the timestamp range checking (Section 2.2). | |||
| 2. DF Election Synchronization Solution | 2. DF Election Synchronization Solution | |||
| The fast DF recovery solution relies on the concept of common clock | The fast DF recovery solution relies on the concept of common clock | |||
| alignment between partner PEs participating in a common Ethernet | alignment between partner PEs participating in a common Ethernet | |||
| Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all | Segment, i.e., PE1 and PE2 in Figure 1. The main idea is to have all | |||
| peering PEs of that Ethernet Segment perform DF election and apply | peering PEs of that Ethernet Segment perform DF election and apply | |||
| the result at the same previously-announced time. | the result at the same previously announced time. | |||
| The DF Election procedure, as described in [RFC7432] and as | The DF Election procedure, as described in [RFC7432] and as | |||
| optionally signaled in [RFC8584], is applied. All PEs attached to a | optionally signaled in [RFC8584], is applied. All PEs attached to a | |||
| given Ethernet Segment are clock-synchronized using a networking | given Ethernet Segment are clock-synchronized using a networking | |||
| protocol for clock synchronization (e.g., NTP, PTP). Whenever | protocol for clock synchronization (e.g., NTP, Precision Time | |||
| possible, recovery activities for failed PEs SHOULD NOT be initiated | Protocol (PTP)). Whenever possible, recovery activities for failed | |||
| until after the underlying clock synchronization protocol has | PEs SHOULD NOT be initiated until after the underlying clock | |||
| converged to benefit from this document's fast DF recovery | synchronization protocol has converged to benefit from this | |||
| procedures. When a new PE is inserted in an Ethernet Segment or a | document's fast DF recovery procedures. When a new PE is inserted in | |||
| failed PE of the Ethernet Segment recovers, that PE communicates to | an Ethernet Segment or when a failed PE of the Ethernet Segment | |||
| peering partners the current time plus the value of the timer for | recovers, that PE communicates to peering partners the current time | |||
| partner discovery from step 2 in Section 8.5 of [RFC7432]. This | plus the value of the timer for partner discovery from step 2 in | |||
| constitutes an "end time" or "absolute time" as seen from the local | Section 8.5 of [RFC7432]. This constitutes an "end time" or | |||
| PE. That absolute time is called the "Service Carving Time" (SCT). | "absolute time" as seen from the local PE. That absolute time is | |||
| called the Service Carving Time (SCT). | ||||
| A new BGP EVPN Extended Community, the Service Carving Time is | A new BGP EVPN Extended Community, the Service Carving Time, is | |||
| advertised along with the Ethernet Segment Route Type 4 (RT-4) and | advertised along with the Ethernet Segment Route Type 4 (RT-4) and | |||
| communicates the Service Carving Time to other partners to ensure an | communicates the SCT to other partners to ensure an orderly transfer | |||
| orderly transfer of forwarding duties. | of forwarding duties. | |||
| Upon receipt of the new BGP EVPN Extended Community, partner PEs can | Upon receipt of the new BGP EVPN Extended Community, partner PEs can | |||
| determine the service carving time of the newly inserted PE. To | determine the SCT of the newly inserted PE. To eliminate any | |||
| eliminate any potential for duplicate traffic or loops, the concept | potential for duplicate traffic or loops, the concept of "skew" is | |||
| of skew is introduced: a small time offset to ensure a controlled and | introduced: a small time offset to ensure a controlled and orderly | |||
| orderly transition when multiple Provider Edge (PE) devices are | transition when multiple PE devices are involved. The previously | |||
| involved. The previously inserted PE(s) must perform service carving | inserted PE(s) must perform service carving first for DF to NDF | |||
| first for DF to NDF transitions. The receiving PEs subtract this | transitions. The receiving PEs subtract this skew (default = 10 ms) | |||
| skew (default = 10ms) from the Service Carving Time and apply DF to NDF | from the Service Carving Time and apply DF to NDF transitions first. | |||
| transitions first. This is followed shortly by the NDF to DF | This is followed shortly by the NDF to DF transitions on both PEs, | |||
| transitions on both PEs, after the skew delay. On the recovering PE, | after the skew delay. On the recovering PE, all services are already | |||
| all services are already in NDF state and no skew for DF to NDF | in NDF state, and no skew for DF to NDF transitions is required. | |||
| transitions is required. | ||||
| This document proposes a default skew value of 10ms to allow | This document proposes a default skew value of 10 ms to allow | |||
| completion of programming the DF to NDF transitions, but | completion of programming the DF to NDF transitions, but | |||
| implementations may make the skew larger (or configurable) taking | implementations may make the skew larger (or configurable) taking | |||
| into consideration scale, hardware capabilities and clock accuracy. | into consideration scale, hardware capabilities, and clock accuracy. | |||
| To summarize, all peering PEs perform service carving almost | To summarize, all peering PEs perform service carving almost | |||
| simultaneously at the time announced by the newly added/recovered PE. | simultaneously at the time announced by the newly added/recovered PE. | |||
| The newly inserted PE initiates the SCT, and triggers service carving | The newly inserted PE initiates the SCT and triggers service carving | |||
| immediately on its local timer expiry. The previously inserted PE(s) | immediately on its local timer expiry. The previously inserted PE(s) | |||
| receiving Ethernet Segment route (RT-4) with an SCT BGP extended | receiving Ethernet Segment route (RT-4) with an SCT BGP extended | |||
| community, perform service carving shortly before Service Carving | community perform service carving shortly before the SCT for DF to | |||
| Time for DF to NDF transitions, and at Service Carving Time for NDF | NDF transitions and at the SCT for NDF to DF transitions. | |||
| to DF transitions. | ||||
| 2.1. BGP Encoding | 2.1. BGP Encoding | |||
| A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is | A BGP extended community, with Type 0x06 and Sub-Type 0x0F, is | |||
| defined to communicate the Service Carving Time for each Ethernet | defined to communicate the SCT for each Ethernet Segment: | |||
| Segment: | ||||
| 1 2 3 | 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ | | Type = 0x06 | Sub-Type(0x0F)| Timestamp Seconds ~ | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ~ Timestamp Seconds | Timestamp Fractional Seconds | | ~ Timestamp Seconds | Timestamp Fraction | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 2: Service Carving Time | Figure 2: Service Carving Time | |||
| The timestamp exchanged uses the NTP prime epoch of January 1, 1900 | The timestamp exchanged uses the NTP prime epoch of 0 h 1 January | |||
| [RFC5905] and an adapted form of the 64-bit NTP Timestamp Format. | 1900 UTC [RFC5905] and an adapted form of the 64-bit NTP timestamp | |||
| The 64-bit NTP Timestamp Format consists of a 32-bit part for Seconds | format. | |||
| and a 32-bit part for Fraction, which are encoded in the Service | ||||
| The 64-bit NTP timestamp format consists of a 32-bit unsigned seconds | ||||
| field and a 32-bit fraction field, which are encoded in the Service | ||||
| Carving Time as follows: | Carving Time as follows: | |||
| * Timestamp Seconds: 32-bit NTP seconds are encoded in this field. | Timestamp Seconds: 32-bit NTP seconds are encoded in this field. | |||
| * Timestamp Fractional Seconds: the high order 16 bits of the NTP | Timestamp Fraction: The high-order 16 bits of the NTP "Fraction" | |||
| 'Fraction' field are encoded in this field. | field are encoded in this field. | |||
| When rebuilding a 64-bit NTP Timestamp Format using the values from a | When rebuilding a 64-bit NTP timestamp format using the values from a | |||
| received SCT BGP extended community, the lower order 16 bits of the | received SCT BGP extended community, the lower-order 16 bits of the | |||
| Fractional field are set to 0. The use of a 16-bit fractional | NTP "Fraction" field are set to 0. The use of a 16-bit fractional | |||
| seconds value yields adequate precision of 15 microseconds (2^-16 s). | seconds value yields adequate precision of 15 microseconds (2^-16 s). | |||
| This document introduces a new flag called Time Synchronization | The format of the DF Election Extended Community that is used in this | |||
| indicated by "T" in the DF Election Capabilities registry defined in | document is: | |||
| [RFC8584] for use in DF Election Extended Community. | ||||
| 1 2 3 | 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| ~ Bitmap | Reserved | | ~ Bitmap | Reserved | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 4: DF Election Extended Community | Figure 3: DF Election Extended Community (RFC 8584) | |||
| Figure 3: DF Election Extended Community | The Bitmap field (2 octets) encodes "capabilities" [RFC8584], where | |||
| this document introduces a new Time Synchronization capability | ||||
| indicated by "T". | ||||
| 1 1 | 1 1 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | |A| |T| | | | |A| |T| | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 5: DF Election Capabilities | Figure 4: Bitmap Field in the DF Election Extended Community | |||
| Figure 4: DF Election Capabilities | ||||
| * Bit 3: Time Synchronization (corresponds to Bit 27 of the DF | Bit 3: Time Synchronization (corresponds to Bit 27 of the DF | |||
| Election Extended Community). When set to 1, it indicates the | Election Extended Community). When set to 1, it indicates the | |||
| desire to use Time Synchronization capability with the rest of the | desire to use the Time Synchronization capability with the rest of | |||
| PEs in the Ethernet Segment. | the PEs in the Ethernet Segment. | |||
| This capability is utilized in conjunction with the agreed-upon DF | This capability is utilized in conjunction with the agreed-upon DF | |||
| Election Type. For instance, if all the PE devices in the Ethernet | Election Type. For instance, if all the PE devices in the Ethernet | |||
| Segment indicate the desire to use the Time Synchronization | Segment indicate the desire to use the Time Synchronization | |||
| capability and request the DF Election Type to be Highest Random | capability and request the DF Election Type to be the HRW, then the | |||
| Weight (HRW), then the HRW algorithm is used in conjunction with this | HRW algorithm is used in conjunction with this capability. A PE that | |||
| capability. A PE which does not support the procedures set out in | does not support the procedures set out in this document or that | |||
| this document, or receives a route from another PE in which the | receives a route from another PE in which the capability is not set | |||
| capability is not set, MUST NOT delay Designated Forwarder election | MUST NOT delay DF election as this could lead to duplicate traffic in | |||
| as this could lead to duplicate traffic in some instances | some instances (overlapping DFs). | |||
| (overlapping Designated Forwarders). | ||||
| 2.2. Timestamp Verification | 2.2. Timestamp Verification | |||
| The NTP Era value is not exchanged and participating PEs may consider | The NTP Era value is not exchanged, and participating PEs may | |||
| the timestamps to be in the same Era as their local value. A DF | consider the timestamps to be in the same Era as their local value. | |||
| Election operation occurring exactly at the next Era transition will | A DF Election operation occurring exactly at the next Era transition | |||
| be sometime on February 7, 2036. Implementors and operators may | will be some time on February 7, 2036. Implementors and operators | |||
| address credible cases of rollover ambiguity (adjacent Eras n and | may address credible cases of rollover ambiguity (adjacent Eras n and | |||
| n+1), as well as the security issue of unreasonably large or | n+1) as well as the security issue of unreasonably large or | |||
| unreasonably small NTP timestamps, in the following manner. | unreasonably small NTP timestamps in the following manner. | |||
| The procedures in this document address implicitly what occurs with | The procedures in this document address implicitly what occurs with | |||
| receiving a SCT value in the past. This would be a naturally | receiving an SCT value in the past. This would be a naturally | |||
| occurring event with a large BGP propagation delay: the receiving PE | occurring event with a large BGP propagation delay: the receiving PE | |||
| treats the DF Election at the peer as having occurred already and | treats the DF Election at the peer as having already occurred and | |||
| proceeds without starting any timer to further delay service carving, | proceeds without starting any timer to further delay service carving, | |||
| effectively falling back on [RFC7432] behavior. A PE which receives | effectively falling back on behavior as specified in [RFC7432]. A PE | |||
| a SCT value smaller than its current time, MUST discard the Service | that receives an SCT value smaller than its current time MUST discard | |||
| Carving Time and SHALL treat the DF Election at the peer as having | the Service Carving Time and SHALL treat the DF Election at the peer | |||
| occurred already. | as having occurred already. | |||
| The more problematic scenario is the PE in Era n+1 that receives an | ||||
| SCT advertised by the PE still in Era n, with a very large SCT value. | ||||
| To address this Era rollover as well as the large values attack | ||||
| vector, implementations MUST validate the received SCT against an | ||||
| upper bound. | ||||
| The more problematic scenario is the PE in Era n+1 which receives a | ||||
| Service Carving Time advertised by the PE still in Era n, with a very | ||||
| large SCT value. To address this Era rollover as well as the large | ||||
| values attack vector, implementations MUST validate the received SCT | ||||
| against an upper-bound. | ||||
| It is left to implementations to decide what constitutes an | It is left to implementations to decide what constitutes an | |||
| "unreasonably large" SCT value. A recommended approach, however, is | "unreasonably large" SCT value. A recommended approach, however, is | |||
| to compare the received offset to the local peering timer value. In | to compare the received offset to the local peering timer value. In | |||
| practice, peering timer values are configured uniformly across | practice, peering timer values are configured uniformly across | |||
| Ethernet-Segment peers and may be treated as an upper-bound on the | Ethernet Segment peers and may be treated as an upper bound on the | |||
| offset of received SCT values. A PE which receives an SCT | offset of received SCT values. A PE that receives an SCT | |||
| representing an offset larger than the local peering timer MUST | representing an offset larger than the local peering timer MUST | |||
| discard the Service Carving Time and SHALL treat the DF Election at | discard the SCT and SHALL treat the DF Election at the peer as having | |||
| the peer as having occurred already, as above. | already occurred, as above. | |||
| 2.3. Updates to RFC8584 | 2.3. Updates to RFC 8584 | |||
| This document introduces an additional delay to the events and | This document introduces an additional delay to the events and | |||
| transitions defined for the default DF election algorithm FSM in | transitions defined for the default DF election algorithm FSM in | |||
| Section 2.1 of [RFC8584] without changing the FSM state or event | Section 2.1 of [RFC8584] without changing the FSM state or event | |||
| definitions themselves. | definitions themselves. | |||
| Upon receiving a RCVD_ES message, the peering PE's Finite State | Upon receiving an RCVD_ES message, the peering PE's FSM transitions | |||
| Machine (FSM) transitions from the DF_DONE (indicating the DF | from the DF_DONE state (indicating the DF election process was | |||
| election process was complete) state to the DF_CALC (indicating that | complete) to the DF_CALC state (indicating that a new DF calculation | |||
| a new DF calculation is needed) state. Due to the Service Carving | is needed). Due to the SCT included in the Ethernet Segment update, | |||
| Time (SCT) included in the Ethernet-Segment update, the completion of | the completion of the DF_CALC state and the subsequent transition | |||
| the DF_CALC state and the subsequent transition back to the DF_DONE | back to the DF_DONE state are delayed. This delay ensures proper | |||
| state are delayed. This delay ensures proper synchronization and | synchronization and prevents conflicts. Consequently, the | |||
| prevents conflicts. Consequently, the accompanying forwarding | accompanying forwarding updates to the DF and NDF states are also | |||
| updates to the Designated Forwarder (DF) and Non-Designated Forwarder | deferred. | |||
| (NDF) states are also deferred. | ||||
| Item 9. in Section 2.1 of [RFC8584], the list "Corresponding actions | ||||
| when transitions are performed or states are entered/exited" is | ||||
| changed as follows: | ||||
| 9. DF_CALC on CALCULATED: Mark the election result for the VLAN or | ||||
| VLAN Bundle. | ||||
| 9.1 If an SCT timestamp is present during the RCVD_ES event of | ||||
| Action 11, wait until the time indicated by the SCT minus | ||||
| skew before proceeding to step 9.3. | ||||
| 9.2 If an SCT timestamp is present during the RCVD_ES event of | ||||
| Action 11, wait until the time indicated by the SCT before | ||||
| proceeding to step 9.4. | ||||
| 9.3 Assume the role of NDF for the local PE concerning the VLAN | Item 9 in Section 2.1 of [RFC8584], in the list "Corresponding | |||
| or VLAN Bundle, and transition to the DF_DONE state. | actions when transitions are performed or states are entered/exited", | |||
| is changed as follows: | ||||
| 9.4 Assume the role of DF for the local PE concerning the VLAN | | 9. DF_CALC on CALCULATED: Mark the election result for the VLAN | |||
| or VLAN Bundle, and transition to the DF_DONE state. | | or VLAN bundle. | |||
| | | ||||
| | 9.1 If no Service Carving Time is present during the RCVD_ES | ||||
| | event of Action 11, proceed to step 9.4 | ||||
| | | ||||
| | 9.2 If a Service Carving Time is present during the RCVD_ES | ||||
| | event of Action 11, wait until the time indicated by the | ||||
| | SCT minus skew before proceeding to step 9.3. | ||||
| | | ||||
| | 9.3 Assume the role of NDF for the local PE concerning the | ||||
| | VLAN or VLAN bundle. Wait the remaining skew time before | ||||
| | proceeding to step 9.4. | ||||
| | | ||||
| | 9.4 Assume the election result's role (DF or NDF) for the | ||||
| | local PE concerning the VLAN or VLAN bundle and | ||||
| | transition to the DF_DONE state. | ||||
| This revised approach ensures proper timing and synchronization in | This revised approach ensures proper timing and synchronization in | |||
| the DF election process, avoiding conflicts and ensuring accurate | the DF election process, avoiding conflicts and ensuring accurate | |||
| forwarding updates. | forwarding updates. | |||
| 3. Synchronization Scenarios | 3. Synchronization Scenarios | |||
| Consider Figure 1 as an example, where initially PE2 has failed and | Consider Figure 1 as an example, where initially PE2 has failed and | |||
| PE1 has taken over. This scenario illustrates the problem with the | PE1 has taken over. This scenario illustrates the problem with the | |||
| DF-Election mechanism described in Section 8.5 of [RFC7432], | DF Election mechanism described in Section 8.5 of [RFC7432], | |||
| specifically in the context of the timer value configured for all PEs | specifically in the context of the timer value configured for all PEs | |||
| on the Ethernet Segment. | on the Ethernet Segment. | |||
| Procedure based on Section 8.5 of [RFC7432] with the default 3-second | The following procedure is based on Section 8.5 of [RFC7432] with the | |||
| timer in step 2: | default 3-second timer in step 2. | |||
| 1. Initial state: PE1 is in a steady-state and PE2 is recovering. | 1. Initial state: PE1 is in a steady-state and PE2 is recovering. | |||
| 2. Recovery: PE2 recovers at an absolute time of t=99. | 2. Recovery: PE2 recovers at an absolute time of t=99. | |||
| 3. Advertisement: PE2 advertises RT-4, sent at t=100, to partner | 3. Advertisement: PE2 advertises RT-4, sent at t=100, to its partner | |||
| PE1. | (PE1). | |||
| 4. Timer Start: PE2 starts a 3-second timer to allow the reception | 4. Timer Start: PE2 starts a 3-second timer to allow the reception | |||
| of RT-4 from other PE nodes. | of RT-4 from other PE nodes. | |||
| 5. Immediate carving: PE1 performs service carving immediately upon | 5. Immediate carving: PE1 performs service carving immediately upon | |||
| RT-4 reception, i.e., t=100 plus some BGP propagation delay. | RT-4 reception, i.e., t=100 plus some BGP propagation delay. | |||
| 6. Delayed Carving: PE2 performs service carving at time t=103. | 6. Delayed Carving: PE2 performs service carving at time t=103. | |||
| [RFC7432] favors traffic drops over duplicate traffic. With the | [RFC7432] favors traffic drops over duplicate traffic. With the | |||
| above procedure, traffic drops will occur as part of each PE recovery | above procedure, traffic drops will occur as part of each PE recovery | |||
| sequence since PE1 transitions some VLANs to Non-Designated Forwarder | sequence since PE1 transitions some VLANs to an NDF immediately upon | |||
| (NDF) immediately upon RT-4 reception. | RT-4 reception. The timer value (default = 3 seconds) directly | |||
| The timer value (default = 3 seconds) directly affects the duration | affects the duration of the packet drops. A shorter (or zero) timer | |||
| of the packet drops. A shorter (or zero) timer may result in | may result in duplicate traffic or traffic loops. | |||
| duplicate traffic or traffic loops. | ||||
| Procedure based on the Service Carving Time (SCT) approach: | The following procedure is based on the SCT approach: | |||
| 1. Initial state: PE1 is in a steady state, and PE2 is recovering. | 1. Initial state: PE1 is in a steady state, and PE2 is recovering. | |||
| 2. Recovery: PE2 recovers at an absolute time of t=99. | 2. Recovery: PE2 recovers at an absolute time of t=99. | |||
| 3. Timer Start: PE2 starts at t=100 a 3-second timer to allow the | 3. Timer Start: PE2 starts at t=100 a 3-second timer to allow the | |||
| reception of RT-4 from other PE nodes. | reception of RT-4 from other PE nodes. | |||
| 4. Advertisement: PE2 advertises RT-4, sent at t=100, with a target | 4. Advertisement: PE2 advertises RT-4, sent at t=100, with a target | |||
| SCT value of t=103 to partner PE1. | SCT value of t=103 to its partner (PE1). | |||
| 5. Service Carving Timer: PE1 starts the service carving timer, with | 5. Service Carving Timer: PE1 starts the service carving timer, with | |||
| the remaining time until t=103. | the remaining time until t=103. | |||
| 6. Simultaneous Carving: Both PE1 and PE2 carve at an absolute time | 6. Simultaneous Carving: Both PE1 and PE2 carve at an absolute time | |||
| of t=103. | of t=103. | |||
| To maintain the preference for minimal loss over duplicate traffic, | To maintain the preference for minimal loss over duplicate traffic, | |||
| PE1 SHOULD carve slightly before PE2 (with skew). The recovering PE2 | PE1 SHOULD carve slightly before PE2 (with skew). The recovering PE2 | |||
| performs both DF to NDF and NDF to DF transitions per VLAN at the | performs both DF-to-NDF and NDF-to-DF transitions per VLAN at the | |||
| timer's expiry. The original PE1, which received the SCT, applies | timer's expiry. The original PE1, which received the SCT, applies | |||
| the following: | the following: | |||
| * DF to NDF Transition(s): at t=SCT minus skew, where both PEs are | * DF-to-NDF Transition(s): at t=SCT minus skew, where both PEs are | |||
| NDF for the skew duration. | NDF for the skew duration. | |||
| * NDF to DF Transition(s): at t=SCT. | * NDF-to-DF Transition(s): at t=SCT. | |||
| This split-behavior ensures a smooth DF role transition with minimal | This split behavior ensures a smooth DF role transition with minimal | |||
| loss. | loss. | |||
| Using the SCT approach, the negative effect of the timer to allow the | The SCT approach mitigates the negative effect of requiring a timer | |||
| reception of Ethernet Segment RT-4 from other PE nodes is mitigated. | for discovery of Ethernet Segment (ES) RT-4 from other PE nodes. | |||
| Furthermore, the BGP transmission delay (from PE2 to PE1) of the ES | Furthermore, the BGP transmission delay (from PE2 to PE1) of the ES | |||
| RT-4 becomes a non-issue. The SCT approach shortens the 3-second | RT-4 becomes a non-issue. The SCT approach shortens the 3-second | |||
| timer window to the order of milliseconds. | timer window to the order of milliseconds. | |||
| The peering timer is a configurable value where 3 seconds represents | The peering timer is a configurable value where 3 seconds represents | |||
| the default. Configuring a timer value of 0, or so small as to | the default. Configuring a timer value of 0, or so small as to | |||
| expire during propagation of the BGP routes, is outside the scope of | expire during propagation of the BGP routes, is outside the scope of | |||
| this document. In reality, the use of the SCT approach presented in | this document. In reality, the use of the SCT approach presented in | |||
| this document encourages the use of larger peering timer values to | this document encourages the use of larger peering timer values to | |||
| overcome any sort of BGP route propagation delays. | overcome any sort of BGP route propagation delays. | |||
| 3.1. Concurrent Recoveries | 3.1. Concurrent Recoveries | |||
| In the eventuality 2 or more PEs in a peering Ethernet Segment group | In the eventuality that two or more PEs in a peering Ethernet Segment | |||
| are recovering concurrently or roughly the same time, each will | group are recovering concurrently or roughly at the same time, each | |||
| advertise a Service Carving Time. This SCT value would correspond to | will advertise a SCT. This SCT value would correspond to what each | |||
| what each recovering PE considers the "end time" for DF Election. A | recovering PE considers the "end time" for DF Election. A similar | |||
| similar situation arises in sequentially recovering PEs, when a | situation arises in sequentially recovering PEs, when a second PE | |||
| second PE recovers approximately at the time of the first PE's | recovers approximately at the time of the first PE's advertised SCT | |||
| advertised SCT expiry, and with its own new SCT-2 outside of the | expiry and with its own new SCT-2 outside of the initial SCT window. | |||
| initial SCT window. | ||||
| In the case of multiple concurrent DF elections, each initiated by | In the case of multiple concurrent DF elections, each initiated by | |||
| one of the recovering PEs, the SCTs must be ordered chronologically. | one of the recovering PEs, the SCTs must be ordered chronologically. | |||
| All PEs SHALL execute only a single DF Election at the service | All PEs SHALL execute only a single DF Election at the service | |||
| carving time corresponding to the largest (latest) received timestamp | carving time corresponding to the largest (latest) received timestamp | |||
| value. This DF Election will lead peering PEs into a single co- | value. This DF Election will lead peering PEs into a single | |||
| ordinated DF Election update. | coordinated DF Election update. | |||
| Example: | Example: | |||
| 1. Initial State: PE1 is in a steady state, with services elected at | 1. Initial State: PE1 is in a steady state, with services elected at | |||
| PE1. | PE1. | |||
| 2. Recovery of PE2: PE2 recovers at time t=100 and advertises RT-4 | 2. Recovery of PE2: PE2 recovers at time t=100 and advertises RT-4 | |||
| with a target SCT value of t=103 to its partners (PE1). | with a target SCT value of t=103 to its partner (PE1). | |||
| 3. Timer Initiation by PE2: PE2 starts a 3-second timer to allow the | 3. Timer Initiation by PE2: PE2 starts a 3-second timer to allow the | |||
| reception of RT-4 from other PE nodes. | reception of RT-4 from other PE nodes. | |||
| 4. Timer Initiation by PE1: PE1 starts the service carving timer, | 4. Timer Initiation by PE1: PE1 starts the service carving timer, | |||
| with the remaining time until t=103. | with the remaining time until t=103. | |||
| 5. Recovery of PE3: PE3 recovers at time t=102 and advertises RT-4 | 5. Recovery of PE3: PE3 recovers at time t=102 and advertises RT-4 | |||
| with a target SCT value of t=105 to its partners (PE1, PE2). | with a target SCT value of t=105 to its partners (PE1, PE2). | |||
| skipping to change at page 13, line 17 ¶ | skipping to change at line 566 ¶ | |||
| 7. Timer Update by PE2: PE2 cancels the running timer and starts the | 7. Timer Update by PE2: PE2 cancels the running timer and starts the | |||
| service carving timer with the remaining time until t=105. | service carving timer with the remaining time until t=105. | |||
| 8. Timer Update by PE1: PE1 updates its service carving timer, with | 8. Timer Update by PE1: PE1 updates its service carving timer, with | |||
| the remaining time until t=105. | the remaining time until t=105. | |||
| 9. Service Carving: PE1, PE2, and PE3 perform service carving at the | 9. Service Carving: PE1, PE2, and PE3 perform service carving at the | |||
| absolute time of t=105. | absolute time of t=105. | |||
| In the eventuality a PE in an Ethernet Segment group recovers during | In the eventuality that a PE in an Ethernet Segment group recovers | |||
| the discovery window specified in Section 8.5 of [RFC7432], and does | during the discovery window specified in Section 8.5 of [RFC7432] and | |||
| not support or advertise the T-bit, then all PEs in the current | does not support or advertise the T-bit, all PEs in the current | |||
| peering sequence SHALL immediately revert to the default [RFC7432] | peering sequence SHALL immediately revert to the default behavior | |||
| behavior. | described in [RFC7432]. | |||
| 4. Backwards Compatibility | 4. Backwards Compatibility | |||
| For the DF election procedures to achieve global convergence and | For the DF election procedures to achieve global convergence and | |||
| unanimity within a redundancy group, it is essential that all | unanimity within a redundancy group, it is essential that all | |||
| participating PEs agree on the DF election algorithm to be employed. | participating PEs agree on the DF election algorithm to be employed. | |||
| However, it is possible that some PEs may continue to use the | However, it is possible that some PEs may continue to use the | |||
| existing modulo-based DF election algorithm from [RFC7432] and not | existing modulo-based DF election algorithm from [RFC7432] and not | |||
| utilize the new Service Carving Time (SCT) BGP extended community. | utilize the new SCT BGP extended community. PEs that operate using | |||
| PEs that operate using the baseline DF election mechanism will simply | the baseline DF election mechanism will simply discard the new SCT | |||
| discard the new SCT BGP extended community as unrecognized. | BGP extended community as unrecognized. | |||
| A PE can indicate its willingness to support clock-synchronized | A PE can indicate its willingness to support clock-synchronized | |||
| carving by signaling the new 'T' DF Election Capability and including | carving by signaling the new "T" DF Election Capability and including | |||
| the new SCT BGP extended community along with the Ethernet Segment | the new SCT BGP extended community along with the Ethernet Segment | |||
| Route Type 4. If one or more PEs attached to the Ethernet Segment do | Route Type 4. If one or more PEs attached to the Ethernet Segment do | |||
| not signal T=1, then all PEs in the Ethernet Segment SHALL revert to | not signal T=1, then all PEs in the Ethernet Segment SHALL revert to | |||
| the timer-based approach as specified in [RFC7432]. This reversion | the timer-based approach as specified in [RFC7432]. This reversion | |||
| is particularly crucial in preventing VLAN shuffling when more than | is particularly crucial in preventing VLAN shuffling when more than | |||
| two PEs are involved. | two PEs are involved. | |||
| In the event a new or extra RT-4 is received without the new 'T' DF | In the event a new or extra RT-4 is received without the new "T" DF | |||
| Election Capability in the midst of an ongoing DF Election sequence, | Election Capability in the midst of an ongoing DF Election sequence, | |||
| all SCT-based delays are cancelled and the DF Election immediately | all SCT-based delays are canceled, and the DF Election is immediately | |||
| applied as specified in [RFC7432], as if no SCT had been previously | applied as specified in [RFC7432], as if no SCT had been previously | |||
| exchanged. | exchanged. | |||
| 5. Security Considerations | 5. Security Considerations | |||
| The mechanisms in this document use the EVPN control plane as defined | The mechanisms in this document use the EVPN control plane as defined | |||
| in [RFC7432]. Security considerations described in [RFC7432] are | in [RFC7432]. Security considerations described in [RFC7432] are | |||
| equally applicable. | equally applicable. | |||
| For the new SCT Extended Community, attack vectors may be setting the | For the new SCT Extended Community, attack vectors may be setting the | |||
| value to zero, to a value in the past or to large times in the | value to zero, to a value in the past, or to large times in the | |||
| future. Handling of this attack vector is addressed in Section 2.2 | future. Handling of this attack vector is addressed in Section 2.2 | |||
| alongside NTP Era rollover ambiguity. | alongside NTP Era rollover ambiguity. | |||
| This document uses MPLS and IP-based tunnel technologies to support | This document uses MPLS- and IP-based tunnel technologies to support | |||
| data plane transport. Security considerations described in [RFC7432] | data plane transport. Security considerations described in [RFC7432] | |||
| and in [RFC8365] are equally applicable. | and [RFC8365] are equally applicable. | |||
| 6. IANA Considerations | 6. IANA Considerations | |||
| IANA maintains the "EVPN Extended Community Sub-Types" registry set | IANA has made the following assignment in the "EVPN Extended | |||
| up by [RFC7153], where the following assignment has been made: | Community Sub-Types" registry set up by [RFC7153]. | |||
| Sub-Type Value Name Reference | +================+======================+===========+ | |||
| -------------- ------------------------- ------------- | | Sub-Type Value | Name | Reference | | |||
| 0x0F Service Carving Time This document | +================+======================+===========+ | |||
| | 0x0F | Service Carving Time | RFC 9722 | | ||||
| +----------------+----------------------+-----------+ | ||||
| IANA maintains the "DF Election Capabilities" registry set up by | Table 1 | |||
| [RFC8584]. IANA is requested to make the following assignment from | ||||
| this registry: | ||||
| Bit Name Reference | IANA has made the following assignment in the "DF Election | |||
| ---- ---------------- ------------- | Capabilities" registry set up by [RFC8584]. | |||
| 3 Time Synchronization This document | ||||
| +=====+======================+===========+ | ||||
| | Bit | Name | Reference | | ||||
| +=====+======================+===========+ | ||||
| | 3 | Time Synchronization | RFC 9722 | | ||||
| +-----+----------------------+-----------+ | ||||
| Table 2 | ||||
| 7. References | 7. References | |||
| 7.1. Normative References | 7.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| skipping to change at page 15, line 33 ¶ | skipping to change at line 679 ¶ | |||
| [RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, | [RFC8584] Rabadan, J., Ed., Mohanty, S., Ed., Sajassi, A., Drake, | |||
| J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet | J., Nagaraj, K., and S. Sathappan, "Framework for Ethernet | |||
| VPN Designated Forwarder Election Extensibility", | VPN Designated Forwarder Election Extensibility", | |||
| RFC 8584, DOI 10.17487/RFC8584, April 2019, | RFC 8584, DOI 10.17487/RFC8584, April 2019, | |||
| <https://www.rfc-editor.org/info/rfc8584>. | <https://www.rfc-editor.org/info/rfc8584>. | |||
| 7.2. Informative References | 7.2. Informative References | |||
| [HRW98] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings | [HRW98] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings | |||
| to Increase Hit Rates", 1998, | to Increase Hit Rates", IEEE/ACM Transactions on | |||
| <https://www.microsoft.com/en-us/research/wp-content/ | Networking, vol. 6, no. 1, February 1998, | |||
| uploads/2017/02/HRW98.pdf>. | <https://www.microsoft.com/en-us/research/wp- | |||
| content/uploads/2017/02/HRW98.pdf>. | ||||
| Appendix A. Contributors | Acknowledgements | |||
| Authors would like to acknowledge helpful comments and contributions | ||||
| of Satya Mohanty and Bharath Vasudevan. Also thank you to Anoop | ||||
| Ghanwani and Gunter van de Velde for their thorough review with | ||||
| valuable comments and corrections. | ||||
| Contributors | ||||
| In addition to the authors listed on the front page, the following | In addition to the authors listed on the front page, the following | |||
| co-authors have also contributed substantially to this document: | coauthors have also contributed substantially to this document: | |||
| Gaurav Badoni | Gaurav Badoni | |||
| Cisco | Cisco | |||
| Email: gbadoni@cisco.com | Email: gbadoni@cisco.com | |||
| Dhananjaya Rao | Dhananjaya Rao | |||
| Cisco | Cisco | |||
| Email: dhrao@cisco.com | Email: dhrao@cisco.com | |||
| Appendix B. Acknowledgements | ||||
| Authors would like to acknowledge helpful comments and contributions | ||||
| of Satya Mohanty and Bharath Vasudevan. Also thank you to Anoop | ||||
| Ghanwani and Gunter van de Velde for their thorough review with | ||||
| valuable comments and corrections. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Patrice Brissette | Patrice Brissette | |||
| Cisco | Cisco | |||
| Email: pbrisset@cisco.com | Email: pbrisset@cisco.com | |||
| Ali Sajassi | Ali Sajassi | |||
| Cisco | Cisco | |||
| Email: sajassi@cisco.com | Email: sajassi@cisco.com | |||
| Luc Andre Burdet (editor) | Luc André Burdet (editor) | |||
| Cisco | Cisco | |||
| Email: lburdet@cisco.com | Email: lburdet@cisco.com | |||
| John Drake | John Drake | |||
| Independent | Independent | |||
| Email: je_drake@yahoo.com | Email: je_drake@yahoo.com | |||
| Jorge Rabadan | Jorge Rabadan | |||
| Nokia | Nokia | |||
| Email: jorge.rabadan@nokia.com | Email: jorge.rabadan@nokia.com | |||
| End of changes. 100 change blocks. | ||||
| 289 lines changed or deleted | 292 lines changed or added | |||
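
As an illustration of the receive-side procedures compared above (Section 2.2 timestamp verification, the revised item 9 of Section 2.3, and the "latest SCT wins" rule of Section 3.1), the following is a minimal Python sketch and not a definitive implementation. It assumes timestamps are plain floating-point seconds on a clock shared by all PEs, so the NTP encoding and Era handling are not modeled. The names `validate_sct`, `effective_sct`, `carving_schedule`, and `CarvingSchedule`, as well as the skew value, are hypothetical and do not appear in the RFC.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PEERING_TIMER_S = 3.0  # default peering timer (RFC 7432, Section 8.5)
SKEW_S = 0.010         # hypothetical skew; the document does not fix a value


def validate_sct(sct: float, now: float,
                 peering_timer: float = PEERING_TIMER_S) -> Optional[float]:
    """Return the SCT if it is usable, otherwise None.

    None means: discard the SCT and treat the DF Election at the peer
    as having already occurred (Section 2.2).
    """
    if sct < now:
        # SCT in the past, e.g. after a large BGP propagation delay.
        return None
    if sct - now > peering_timer:
        # Unreasonably large offset: Era rollover ambiguity or an attack.
        return None
    return sct


def effective_sct(received: Iterable[float], now: float,
                  peering_timer: float = PEERING_TIMER_S) -> Optional[float]:
    """Pick the latest valid SCT among concurrent recoveries (Section 3.1)."""
    valid = [s for s in received
             if validate_sct(s, now, peering_timer) is not None]
    return max(valid) if valid else None


@dataclass
class CarvingSchedule:
    ndf_at: float  # when DF-to-NDF transitions are applied
    df_at: float   # when NDF-to-DF transitions are applied


def carving_schedule(sct: Optional[float], now: float,
                     skew: float = SKEW_S) -> CarvingSchedule:
    """Schedule the split transitions of the revised item 9 (Section 2.3)."""
    if sct is None:
        # No usable SCT: carve immediately, as in RFC 7432.
        return CarvingSchedule(ndf_at=now, df_at=now)
    # Assume NDF at SCT minus skew, then the elected role at SCT.
    return CarvingSchedule(ndf_at=max(now, sct - skew), df_at=sct)
```

With the values from the Section 3.1 example, a PE whose local time is t=102.5 and that has received SCTs of t=103 and t=105 would carve once, at t=105. Clamping `ndf_at` to `now` is a design choice of this sketch for the case where the skew is larger than the time remaining.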