rfc8405.original   rfc8405.txt 
Network Working Group B. Decraene Internet Engineering Task Force (IETF) B. Decraene
Internet-Draft Orange Request for Comments: 8405 Orange
Intended status: Standards Track S. Litkowski Category: Standards Track S. Litkowski
Expires: September 20, 2018 Orange Business Service ISSN: 2070-1721 Orange Business Service
H. Gredler H. Gredler
RtBrick Inc RtBrick Inc.
A. Lindem A. Lindem
Cisco Systems Cisco Systems
P. Francois P. Francois
C. Bowers C. Bowers
Juniper Networks, Inc. Juniper Networks, Inc.
March 19, 2018 June 2018
SPF Back-off Delay algorithm for link state IGPs Shortest Path First (SPF) Back-Off Delay Algorithm for Link-State IGPs
draft-ietf-rtgwg-backoff-algo-10
Abstract Abstract
This document defines a standard algorithm to temporarily postpone or This document defines a standard algorithm to temporarily postpone or
'back-off' link-state IGP Shortest Path First (SPF) computations. "back off" link-state IGP Shortest Path First (SPF) computations.
This reduces the computational load and churn on IGP nodes when This reduces the computational load and churn on IGP nodes when
multiple temporally close network events trigger multiple SPF multiple temporally close network events trigger multiple SPF
computations. computations.
Having one standard algorithm improves interoperability by reducing Having one standard algorithm improves interoperability by reducing
the probability and/or duration of transient forwarding loops during the probability and/or duration of transient forwarding loops during
the IGP convergence when the IGP reacts to multiple temporally close the IGP convergence when the IGP reacts to multiple temporally close
IGP events. IGP events.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[BCP14] [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
This Internet-Draft will expire on September 20, 2018. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8405.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction 1. Introduction
2. High level goals 1.1. Requirements Language
3. Definitions and parameters 2. High-Level Goals
4. Principles of SPF delay algorithm 3. Definitions and Parameters
5. Specification of the SPF delay state machine 4. Principles of the SPF Delay Algorithm
5. Specification of the SPF Delay State Machine
5.1. State Machine 5.1. State Machine
5.2. State 5.2. State
5.3. Timers 5.3. Timers
5.4. FSM Events 5.4. FSM Events
6. Parameters 6. Parameters
7. Partial Deployment 7. Partial Deployment
8. Impact on micro-loops 8. Impact on Micro-loops
9. IANA Considerations 9. IANA Considerations
10. Security considerations 10. Security Considerations
11. Acknowledgements 11. References
12. References 11.1. Normative References
12.1. Normative References 11.2. Informative References
12.2. Informative References Acknowledgements
Authors' Addresses Authors' Addresses
1. Introduction 1. Introduction
Link state IGPs, such as IS-IS [ISO10589-Second-Edition], OSPF Link-state IGPs, such as IS-IS [ISO10589], OSPF [RFC2328], and OSPFv3
[RFC2328] and OSPFv3 [RFC5340], perform distributed route computation [RFC5340], perform distributed route computation on all routers in
on all routers in the area/level. In order to have consistent the area/level. In order to have consistent routing tables across
routing tables across the network, such distributed computation the network, such distributed computation requires that all routers
requires that all routers have the same version of the network have the same version of the network topology (Link-State Database
topology (Link State DataBase (LSDB)) and perform their computation (LSDB)) and perform their computation essentially at the same time.
essentially at the same time.
In general, when the network is stable, there is a desire to trigger In general, when the network is stable, there is a desire to trigger
a new Shortest Path First (SPF) computation as soon as a failure is a new Shortest Path First (SPF) computation as soon as a failure is
detected in order to quickly route around the failure. However, when detected in order to quickly route around the failure. However, when
the network is experiencing multiple failures over a short period of the network is experiencing multiple failures over a short period of
time, there is a conflicting desire to limit the frequency of SPF time, there is a conflicting desire to limit the frequency of SPF
computations, which would allow a reduction in control plane computations, which would allow a reduction in control plane
resources used by IGPs and all protocols/subsystems reacting on the resources used by IGPs and all protocols/subsystems reacting on the
attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], BGP attendant route change, such as LDP [RFC5036], RSVP-TE [RFC3209], BGP
[RFC4271], Fast ReRoute computations (e.g., Loop Free Alternates [RFC4271], Fast Reroute computations (e.g., Loop-Free Alternates
(LFA) [RFC5286]), FIB updates, etc. This also reduces network churn (LFAs) [RFC5286]), FIB updates, etc. This also reduces network churn
and, in particular, reduces the side effects such as micro-loops and, in particular, reduces side effects (such as micro-loops
[RFC5715] that ensue during IGP convergence. [RFC5715]) that ensue during IGP convergence.
To allow for this, IGPs usually implement an SPF Back-off Delay To allow for this, IGPs usually implement an SPF Back-Off Delay
algorithm that postpones or backs-off the SPF computation. However, algorithm that postpones or backs off the SPF computation. However,
different implementations have chosen different algorithms. Hence, different implementations chose different algorithms. Hence, in a
in a multi-vendor network, it's not possible to ensure that all multi-vendor network, it's not possible to ensure that all routers
routers trigger their SPF computation after the same delay. This trigger their SPF computation after the same delay. This situation
situation increases the average and maximum differential delay increases the average and maximum differential delay between routers
between routers completing their SPF computation. It also increases completing their SPF computation. It also increases the probability
the probability that different routers compute their FIBs based on that different routers compute their FIBs based on different LSDB
different LSDB versions. Both factors increase the probability and/ versions. Both factors increase the probability and/or duration of
or duration of micro-loops as discussed in Section 8. micro-loops as discussed in Section 8.
To allow multi-vendor networks to have all routers delay their SPF This document specifies a standard algorithm to allow multi-vendor
computations for the same duration, this document specifies a networks to have all routers delay their SPF computations for the
standard algorithm. same duration.
2. High level goals 1.1. Requirements Language
The high level goals of this algorithm are the following: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
o Very fast convergence for a single event (e.g., link failure). 2. High-Level Goals
o Paced fast convergence for multiple temporally close IGP events The high-level goals of this algorithm are the following:
while IGP stability is considered acceptable.
o Delayed convergence when IGP stability is problematic. This will o very fast convergence for a single event (e.g., link failure),
o paced fast convergence for multiple temporally close IGP events
while IGP stability is considered acceptable,
o delayed convergence when IGP stability is problematic (this will
allow the IGP and related processes to conserve resources during allow the IGP and related processes to conserve resources during
the period of instability. the period of instability), and
o Always try to avoid different SPF_DELAY Section 3 timer values o always try to avoid different SPF_DELAY (Section 3) timer values
across different routers in the area/level. This requires across different routers in the area/level. This requires
specific consideration as different routers may receive IGP specific consideration as different routers may receive IGP
messages at different interval or even order, due to differences messages at different intervals, or even orders, due to
both in the distance from the originator of the IGP event and in differences both in the distance from the originator of the IGP
flooding implementations. event and in flooding implementations.
3. Definitions and parameters 3. Definitions and Parameters
IGP events: The reception or origination of an IGP LSDB change IGP events: The reception or origination of an IGP LSDB change
requiring a new routing table computation. Examples are a topology requiring a new routing table computation. Some examples are a
change, a prefix change and a metric change on a link or prefix. topology change, a prefix change, and a metric change on a link or
Note that locally triggering a routing table computation is not prefix. Note that locally triggering a routing table computation is
considered as an IGP event since other IGP routers are unaware of not considered an IGP event since other IGP routers are unaware of
this occurrence. this occurrence.
Routing table computation, in this document, is scoped to the IGP. Routing table computation, in this document, is scoped to the IGP;
So this is the computation of the IGP RIB, performed by the IGP, so, this is the computation of the IGP RIB, performed by the IGP,
using the IGP LSDB. No distinction is made between the type of using the IGP LSDB. No distinction is made between the type of
computation performed. e.g., full SPF, incremental SPF, Partial Route computation performed, e.g., full SPF, incremental SPF, or Partial
Computation (PRC): the type of computation is a local consideration. Route Computation (PRC); the type of computation is a local
This document may interchangeably use the terms routing table consideration. This document may interchangeably use the terms
computation and SPF computation. "routing table computation" and "SPF computation".
SPF_DELAY: The delay between the first IGP event triggering a new SPF_DELAY: The delay between the first IGP event triggering a new
routing table computation and the start of that routing table routing table computation and the start of that routing table
computation. It can take the following values: computation. It can take the following values:
INITIAL_SPF_DELAY: A very small delay to quickly handle a single INITIAL_SPF_DELAY: A very small delay to quickly handle a single
isolated link failure, e.g., 0 milliseconds. isolated link failure, e.g., 0 milliseconds.
SHORT_SPF_DELAY: A small delay to provide fast convergence in the SHORT_SPF_DELAY: A small delay to provide fast convergence in the
case of a single component failure (node, Shared Risk Link Group case of a single component failure (such as the node failure or
(SRLG)..) that leads to multiple IGP events, e.g., 50-100 Shared Risk Link Group (SRLG) failure) that leads to multiple IGP
milliseconds. events, e.g., 50-100 milliseconds.
LONG_SPF_DELAY: A long delay when the IGP is unstable, e.g., 2 LONG_SPF_DELAY: A long delay when the IGP is unstable, e.g., 2
seconds. Note that this allows the IGP network to stabilize. seconds. Note that this allows the IGP network to stabilize.
TIME_TO_LEARN_INTERVAL: This is the maximum duration typically needed TIME_TO_LEARN_INTERVAL: This is the maximum duration typically needed
to learn all the IGP events related to a single component failure to learn all the IGP events related to a single component failure
(e.g., router failure, SRLG failure), e.g., 1 second. It's mostly (such as router failure or SRLG failure), e.g., 1 second. It's
dependent on failure detection time variation between all routers mostly dependent on failure detection time variation between all
that are adjacent to the failure. Additionally, it may depend on the routers that are adjacent to the failure. Additionally, it may
different IGP implementations/parameters across the network, related depend on the different IGP implementations/parameters across the
to origination and flooding of their link state advertisements. network, related to origination and flooding of their link-state
advertisements.
HOLDDOWN_INTERVAL: The time required with no received IGP events HOLDDOWN_INTERVAL: The time required with no received IGP events
before considering the IGP to be stable again and allowing the before considering the IGP to be stable again and allowing the
SPF_DELAY to be restored to INITIAL_SPF_DELAY. e.g. a SPF_DELAY to be restored to INITIAL_SPF_DELAY, e.g., a
HOLDDOWN_INTERVAL of 3 seconds. The HOLDDOWN_INTERVAL MUST be HOLDDOWN_INTERVAL of 3 seconds. The HOLDDOWN_INTERVAL MUST be
defaulted and configured to be longer than the defaulted and configured to be longer than the
TIME_TO_LEARN_INTERVAL. TIME_TO_LEARN_INTERVAL.
4. Principles of SPF delay algorithm 4. Principles of the SPF Delay Algorithm
For this first IGP event, we assume that there has been a single For this first IGP event, we assume that there has been a single
simple change in the network which can be taken into account using a simple change in the network, which can be taken into account using a
single routing computation (e.g., link failure, prefix (metric) single routing computation (e.g., link failure, prefix (metric)
change) and we optimize for very fast convergence, delaying the change), and we optimize for very fast convergence, which delays the
routing computation by INITIAL_SPF_DELAY. Under this assumption, routing computation by INITIAL_SPF_DELAY. Under this assumption,
there is no benefit in delaying the routing computation. In a there is no benefit in delaying the routing computation. In a
typical network, this is the most common type of IGP event. Hence, typical network, this is the most common type of IGP event. Hence,
it makes sense to optimize this case. it makes sense to optimize this case.
If subsequent IGP events are received in a short period of time If subsequent IGP events are received in a short period of time
(TIME_TO_LEARN_INTERVAL), we then assume that a single component (TIME_TO_LEARN_INTERVAL), we then assume that a single component
failed, but that this failure requires the knowledge of multiple IGP failed, but that this failure requires the knowledge of multiple IGP
events in order for IGP routing to converge. Under this assumption, events in order for IGP routing to converge. Under this assumption,
we want fast convergence since this is a normal network situation. we want fast convergence since this is a normal network situation.
However, there is a benefit in waiting for all IGP events related to However, there is a benefit in waiting for all IGP events related to
this single component failure so that the IGP can compute the post- this single component failure so that the IGP can compute the post-
failure routing table in a single additional route computation. In failure routing table in a single additional route computation. In
this situation, we delay the routing computation by SHORT_SPF_DELAY. this situation, we delay the routing computation by SHORT_SPF_DELAY.
If IGP events are still received after TIME_TO_LEARN_INTERVAL from If IGP events are still received after TIME_TO_LEARN_INTERVAL from
the initial IGP event received in QUIET state Section 5.1, then the the initial IGP event received in QUIET state (see Section 5.1), then
network is presumably experiencing multiple independent failures. In the network is presumably experiencing multiple independent failures.
this case, while waiting for network stability, the computations are In this case, while waiting for network stability, the computations
delayed for a longer time represented by LONG_SPF_DELAY. This SPF are delayed for a longer time, which is represented by
delay is kept until no IGP events are received for HOLDDOWN_INTERVAL. LONG_SPF_DELAY. This SPF delay is kept until no IGP events are
received for HOLDDOWN_INTERVAL.
Note that in order to increase the consistency network wide, the Note that in order to increase the consistency network wide, the
algorithm uses a delay (TIME_TO_LEARN_INTERVAL) from the initial IGP algorithm uses a delay (TIME_TO_LEARN_INTERVAL) from the initial IGP
event, rather than the number of SPF computation performed. Indeed, event rather than the number of SPF computations performed. Indeed,
as all routers may receive the IGP events at different times, we as all routers may receive the IGP events at different times, we
cannot assume that all routers will perform the same number of SPF cannot assume that all routers will perform the same number of SPF
computations. For example, assuming that the SPF delay is 50 ms, computations. For example, assuming that the SPF delay is 50
router R1 may receive 3 IGP events (E1, E2, E3) in those 50 ms and milliseconds, router R1 may receive three IGP events (E1, E2, E3) in
hence will perform a single routing computation. While another those 50 milliseconds and hence will perform a single routing
router R2 may only receive 2 events (E1, E2) in those 50 ms and hence computation, while another router R2 may only receive two events (E1,
will schedule another routing computation when receiving E3. E2) in those 50 milliseconds and hence will schedule another routing
computation when receiving E3.
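The R1/R2 example above can be made concrete with a small sketch. The arrival times below are hypothetical (the text gives none), the SPF delay is fixed at 50 milliseconds as in the example, and the name count_spf_runs is illustrative only. An event arriving exactly when a pending computation runs is assumed to schedule a new one.

```python
def count_spf_runs(arrivals_ms, spf_delay_ms=50):
    """Count routing computations for one router using a fixed SPF delay.

    An event arriving while no computation is pending schedules one
    spf_delay_ms later; events arriving before it runs are absorbed.
    """
    runs = 0
    pending_until = None  # time at which the scheduled computation runs
    for t in sorted(arrivals_ms):
        if pending_until is None or t >= pending_until:
            runs += 1
            pending_until = t + spf_delay_ms
    return runs


# R1 sees E1, E2, E3 within the 50 ms window; R2 sees E3 only afterwards.
r1 = count_spf_runs([0, 10, 20])
r2 = count_spf_runs([0, 10, 60])
```

Here R1 performs one computation and R2 performs two, which is why a rule based on the number of computations performed would put the two routers out of step.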
5. Specification of the SPF delay state machine 5. Specification of the SPF Delay State Machine
This section specifies the finite state machine (FSM) intended to This section specifies the Finite State Machine (FSM) intended to
control the timing of the execution of SPF calculations in response control the timing of the execution of SPF calculations in response
to IGP events. to IGP events.
5.1. State Machine 5.1. State Machine
The FSM is initialized to the QUIET state with all three timers The FSM is initialized to the QUIET state with all three timers
timers (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER) deactivated. (SPF_TIMER, HOLDDOWN_TIMER, and LEARN_TIMER) deactivated.
The events which may change the FSM states are an IGP event or the The events that may change the FSM states are an IGP event or the
expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, LEARN_TIMER). expiration of one timer (SPF_TIMER, HOLDDOWN_TIMER, or LEARN_TIMER).
The following diagram briefly describes the state transitions. The following diagram briefly describes the state transitions.
                    +-------------------+
              +---->|                   |<-------------------+
              |     |      QUIET        |                    |
              +-----|                   |<---------+         |
           7:       +-------------------+          |         |
           SPF_TIMER           |                   |         |
           expiration          |                   |         |
                               | 1: IGP event      |         |
                               |                   |         |
                               v                   |         |
                    +-------------------+          |         |
              +---->|                   |          |         |
              |     |    SHORT_WAIT     |----->----+         |
              +-----|                   |                    |
           2:       +-------------------+  6: HOLDDOWN_TIMER |
           IGP event           |              expiration     |
           8: SPF_TIMER        |                             |
              expiration       |                             |
                               | 3: LEARN_TIMER              |
                               |    expiration               |
                               |                             |
                               v                             |
                    +-------------------+                    |
              +---->|                   |                    |
              |     |     LONG_WAIT     |------------>-------+
              +-----|                   |
           4:       +-------------------+  5: HOLDDOWN_TIMER
           IGP event                          expiration
           9: SPF_TIMER expiration
Figure 1: State Machine Figure 1: State Machine
5.2. State 5.2. State
The naming and semantics of each state correspond directly to the The naming and semantics of each state correspond directly to the
SPF delay used for IGP events received in that state. Three states SPF delay used for IGP events received in that state. Three states
are defined: are defined:
QUIET: This is the initial state, when no IGP events have occurred QUIET: This is the initial state, when no IGP events have occurred
for at least HOLDDOWN_INTERVAL since the previous routing table for at least HOLDDOWN_INTERVAL since the previous routing table
computation. The state is meant to handle link failures very computation. The state is meant to handle link failures very
quickly. quickly.
SHORT_WAIT: State entered when an IGP event has been received in SHORT_WAIT: This is the state entered when an IGP event has been
QUIET state. This state is meant to handle single component failure received in QUIET state. This state is meant to handle single
requiring multiple IGP events (e.g., node, SRLG). component failure requiring multiple IGP events (e.g., node, SRLG).
LONG_WAIT: State reached after TIME_TO_LEARN_INTERVAL. In other LONG_WAIT: This is the state reached after TIME_TO_LEARN_INTERVAL.
words, state reached after TIME_TO_LEARN_INTERVAL in state In other words, this is the state reached after
SHORT_WAIT. This state is meant to handle multiple independent TIME_TO_LEARN_INTERVAL in state SHORT_WAIT. This state is meant to
component failures during periods of IGP instability. handle multiple independent component failures during periods of IGP
instability.
5.3. Timers 5.3. Timers
SPF_TIMER: The FSM timer that uses the computed SPF delay. Upon SPF_TIMER: This is the FSM timer that uses the computed SPF delay.
expiration, the Route Table Computation (as defined in Section 3) is Upon expiration, the routing table computation (as defined in
performed. Section 3) is performed.
HOLDDOWN_TIMER: The FSM timer that is (re)started whan an IGP event HOLDDOWN_TIMER: This is the FSM timer that is (re)started when an IGP
is received and set to HOLDDOWN_INTERVAL. Upon expiration, the FSM event is received and set to HOLDDOWN_INTERVAL. Upon expiration, the
is moved to the QUIET state. FSM is moved to the QUIET state.
LEARN_TIMER: The FSM timer that is started when an IGP event is LEARN_TIMER: This is the FSM timer that is started when an IGP event
recevied while the FSM is in the QUIET state. Upon expiration, the is received while the FSM is in the QUIET state. Upon expiration,
FSM is moved to the LONG_WAIT state. the FSM is moved to the LONG_WAIT state.
5.4. FSM Events 5.4. FSM Events
This section describes the events and the actions performed in This section describes the events and the actions performed in
response. response.
Transition 1: IGP event, while in QUIET state. Transition 1: IGP event while in QUIET state
Actions on event 1: Actions on event 1:
o If SPF_TIMER is not already running, start it with value o If SPF_TIMER is not already running, start it with value
INITIAL_SPF_DELAY. INITIAL_SPF_DELAY.
o Start LEARN_TIMER with TIME_TO_LEARN_INTERVAL. o Start LEARN_TIMER with TIME_TO_LEARN_INTERVAL.
o Start HOLDDOWN_TIMER with HOLDDOWN_INTERVAL. o Start HOLDDOWN_TIMER with HOLDDOWN_INTERVAL.
o Transition to SHORT_WAIT state. o Transition to SHORT_WAIT state.
Transition 2: IGP event, while in SHORT_WAIT. Transition 2: IGP event while in SHORT_WAIT
Actions on event 2: Actions on event 2:
o Reset HOLDDOWN_TIMER to HOLDDOWN_INTERVAL. o Reset HOLDDOWN_TIMER to HOLDDOWN_INTERVAL.
o If SPF_TIMER is not already running, start it with value o If SPF_TIMER is not already running, start it with value
SHORT_SPF_DELAY. SHORT_SPF_DELAY.
o Remain in current state. o Remain in current state.
Transition 3: LEARN_TIMER expiration. Transition 3: LEARN_TIMER expiration
Actions on event 3: Actions on event 3:
o Transition to LONG_WAIT state. o Transition to LONG_WAIT state.
Transition 4: IGP event, while in LONG_WAIT. Transition 4: IGP event while in LONG_WAIT
Actions on event 4: Actions on event 4:
o Reset HOLDDOWN_TIMER to HOLDDOWN_INTERVAL. o Reset HOLDDOWN_TIMER to HOLDDOWN_INTERVAL.
o If SPF_TIMER is not already running, start it with value o If SPF_TIMER is not already running, start it with value
LONG_SPF_DELAY. LONG_SPF_DELAY.
o Remain in current state. o Remain in current state.
Transition 5: HOLDDOWN_TIMER expiration, while in LONG_WAIT. Transition 5: HOLDDOWN_TIMER expiration while in LONG_WAIT
Actions on event 5: Actions on event 5:
o Transition to QUIET state. o Transition to QUIET state.
Transition 6: HOLDDOWN_TIMER expiration, while in SHORT_WAIT. Transition 6: HOLDDOWN_TIMER expiration while in SHORT_WAIT
Actions on event 6: Actions on event 6:
o Deactivate LEARN_TIMER. o Deactivate LEARN_TIMER.
o Transition to QUIET state. o Transition to QUIET state.
Transition 7: SPF_TIMER expiration, while in QUIET. Transition 7: SPF_TIMER expiration while in QUIET
Actions on event 7: Actions on event 7:
o Compute SPF. o Compute SPF.
o Remain in current state. o Remain in current state.
Transition 8: SPF_TIMER expiration, while in SHORT_WAIT. Transition 8: SPF_TIMER expiration while in SHORT_WAIT
Actions on event 8: Actions on event 8:
o Compute SPF. o Compute SPF.
o Remain in current state. o Remain in current state.
Transition 9: SPF_TIMER expiration, while in LONG_WAIT. Transition 9: SPF_TIMER expiration while in LONG_WAIT
Actions on event 9: Actions on event 9:
o Compute SPF. o Compute SPF.
o Remain in current state. o Remain in current state.
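Transitions 1 through 9 can be sketched as a small simulation. This is a minimal illustration, not a reference implementation: the class name SpfBackoff, the use of absolute expiry times on a simulated millisecond clock, and the spf_runs list standing in for "Compute SPF" are all assumptions. Two behaviors the text leaves open are also assumed here: timers that expire at the same instant fire in a fixed order, and an IGP event arriving exactly at a timer's expiry is handled after that timer fires. The delay values are borrowed from the defaults listed in Section 6.

```python
from dataclasses import dataclass, field
from typing import List, Optional

QUIET, SHORT_WAIT, LONG_WAIT = "QUIET", "SHORT_WAIT", "LONG_WAIT"


@dataclass
class SpfBackoff:
    # All delays and intervals are in milliseconds.
    initial_spf_delay: int = 50
    short_spf_delay: int = 200
    long_spf_delay: int = 5000
    time_to_learn_interval: int = 500
    holddown_interval: int = 10000
    state: str = QUIET
    # Each timer holds its absolute expiry time, or None when deactivated.
    spf_timer: Optional[int] = None
    learn_timer: Optional[int] = None
    holddown_timer: Optional[int] = None
    spf_runs: List[int] = field(default_factory=list)

    def advance(self, now: int) -> None:
        """Fire every timer expiring at or before `now`, earliest first."""
        while True:
            timers = {"spf": self.spf_timer, "learn": self.learn_timer,
                      "holddown": self.holddown_timer}
            due = [(t, name) for name, t in timers.items()
                   if t is not None and t <= now]
            if not due:
                return
            when, name = min(due)
            if name == "spf":        # transitions 7, 8, 9: state unchanged
                self.spf_timer = None
                self.spf_runs.append(when)  # stands in for "Compute SPF"
            elif name == "learn":    # transition 3
                self.learn_timer = None
                self.state = LONG_WAIT
            else:                    # transitions 5 and 6
                self.holddown_timer = None
                self.learn_timer = None
                self.state = QUIET

    def igp_event(self, now: int) -> None:
        """Handle an IGP event received at `now` (transitions 1, 2, 4)."""
        self.advance(now)
        delay = {QUIET: self.initial_spf_delay,
                 SHORT_WAIT: self.short_spf_delay,
                 LONG_WAIT: self.long_spf_delay}[self.state]
        if self.spf_timer is None:   # never restart a running SPF_TIMER
            self.spf_timer = now + delay
        self.holddown_timer = now + self.holddown_interval
        if self.state == QUIET:      # transition 1 only
            self.learn_timer = now + self.time_to_learn_interval
            self.state = SHORT_WAIT


fsm = SpfBackoff()
for t in (0, 100, 600):   # hypothetical IGP event arrival times
    fsm.igp_event(t)
fsm.advance(20000)
```

With these hypothetical arrivals, the sketch records computations at 50, 300, and 5600 milliseconds (one per delay tier) and ends in the QUIET state once HOLDDOWN_TIMER expires.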
6. Parameters 6. Parameters
All the parameters MUST be configurable at the protocol instance All the parameters MUST be configurable at the protocol instance
granularity. They MAY be configurable at the area/level granularity. granularity. They MAY be configurable at the area/level granularity.
All the delays (INITIAL_SPF_DELAY, SHORT_SPF_DELAY, LONG_SPF_DELAY, All the delays (INITIAL_SPF_DELAY, SHORT_SPF_DELAY, LONG_SPF_DELAY,
TIME_TO_LEARN_INTERVAL, HOLDDOWN_INTERVAL) SHOULD be configurable at TIME_TO_LEARN_INTERVAL, and HOLDDOWN_INTERVAL) SHOULD be configurable
the millisecond granularity. They MUST be configurable at least at at the millisecond granularity. They MUST be configurable at least
the tenth of second granularity. The configurable range for all the at the tenth of a second granularity. The configurable range for all
parameters SHOULD at least be from 0 milliseconds to 60 seconds. The the parameters SHOULD at least be from 0 milliseconds to 60 seconds.
HOLDDOWN_INTERVAL MUST be defaulted or configured to be longer than The HOLDDOWN_INTERVAL MUST be defaulted or configured to be longer
the TIME_TO_LEARN_INTERVAL. than the TIME_TO_LEARN_INTERVAL.
If this SPF backoff algorithm is enabled by default, then in order to If this SPF Back-Off algorithm is enabled by default, then in order
have consistent SPF delays between implementations with default to have consistent SPF delays between implementations with default
configuration, the following default values SHOULD be implemented: configuration, the following default values SHOULD be implemented:
INITIAL_SPF_DELAY 50 ms, SHORT_SPF_DELAY 200ms, LONG_SPF_DELAY: 5
000ms, TIME_TO_LEARN_INTERVAL 500ms, HOLDDOWN_INTERVAL 10 000ms. INITIAL_SPF_DELAY 50 ms
SHORT_SPF_DELAY 200 ms
LONG_SPF_DELAY 5000 ms
TIME_TO_LEARN_INTERVAL 500 ms
HOLDDOWN_INTERVAL 10000 ms
In order to satisfy the goals stated in Section 2, operators are In order to satisfy the goals stated in Section 2, operators are
RECOMMENDED to configure delay intervals such that INITIAL_SPF_DELAY RECOMMENDED to configure delay intervals such that INITIAL_SPF_DELAY
<= SHORT_SPF_DELAY and SHORT_SPF_DELAY <= LONG_SPF_DELAY. <= SHORT_SPF_DELAY and SHORT_SPF_DELAY <= LONG_SPF_DELAY.
When setting (default) values, one should consider the customers and When setting (default) values, one should consider the customers and
their application requirements, the computational power of the their application requirements, the computational power of the
routers, the size of the network, and, in particular, the number of routers, the size of the network, and, in particular, the number of
IP prefixes advertised in the IGP, the frequency and number of IGP IP prefixes advertised in the IGP, the frequency and number of IGP
events, the number of protocols reactions/computations triggered by events, and the number of protocol reactions/computations triggered
IGP SPF computation (e.g., BGP, PCEP, Traffic Engineering CSPF, Fast by IGP SPF computation (e.g., BGP, Path Computation Element
ReRoute computations). Note that some or all of these factors may Communication Protocol (PCEP), Traffic Engineering Constrained SPF
change over the life of the network. In case of doubt, it's (CSPF), and Fast Reroute computations). Note that some or all of
RECOMMENDED that timer intervals should be chosen conservatively these factors may change over the life of the network. In case of
(i.e., longer timer values). doubt, it's RECOMMENDED that timer intervals should be chosen
conservatively (i.e., longer timer values).
For the standard algorithm to be effective in mitigating micro-loops, For the standard algorithm to be effective in mitigating micro-loops,
it is RECOMMENDED that all routers in the IGP domain, or at least all it is RECOMMENDED that all routers in the IGP domain, or at least all
the routers in the same area/level, have exactly the same configured the routers in the same area/level, have exactly the same configured
values. values.
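As a rough illustration of the recommendation above, an operator could audit collected configurations for areas/levels whose routers do not all share the same timer values. The function name and the input shape (a mapping from (area, router) pairs to a tuple of the five timers in milliseconds) are assumptions made for this sketch; how the values are collected from routers is out of scope.

```python
from collections import defaultdict


def find_inconsistent_areas(router_params):
    """Return a sorted list of areas whose routers use differing timers.

    router_params maps (area, router_name) to a tuple of five values:
    (INITIAL_SPF_DELAY, SHORT_SPF_DELAY, LONG_SPF_DELAY,
     TIME_TO_LEARN_INTERVAL, HOLDDOWN_INTERVAL), all in milliseconds.
    """
    distinct_by_area = defaultdict(set)
    for (area, _router), timers in router_params.items():
        distinct_by_area[area].add(tuple(timers))
    # An area is consistent when exactly one distinct timer tuple is seen.
    return sorted(a for a, seen in distinct_by_area.items() if len(seen) > 1)


# Hypothetical inventory: area "0.0.0.1" has one router with a shorter
# INITIAL_SPF_DELAY than its neighbor.
inventory = {
    ("0.0.0.0", "r1"): (50, 200, 5000, 500, 10000),
    ("0.0.0.0", "r2"): (50, 200, 5000, 500, 10000),
    ("0.0.0.1", "r3"): (50, 200, 5000, 500, 10000),
    ("0.0.0.1", "r4"): (20, 200, 5000, 500, 10000),
}
mismatched = find_inconsistent_areas(inventory)
```

With this inventory, mismatched contains only "0.0.0.1".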
7. Partial Deployment 7. Partial Deployment
In general, the SPF Back-off Delay algorithm is only effective in In general, the SPF Back-Off Delay algorithm is only effective in
mitigating micro-loops if it is deployed, with the same parameters, mitigating micro-loops if it is deployed with the same parameters on
on all routers in the IGP domain or, at least, all routers in an IGP all routers in the IGP domain or, at least, all routers in an IGP
area/level. The impact of partial deployment is dependent on the area/level. The impact of partial deployment is dependent on the
particular event, topology, and the algorithm(s) used on other particular event, the topology, and the algorithm(s) used on other
routers in the IGP area/level. In cases where the previous SPF Back- routers in the IGP area/level. In cases where the previous SPF Back-
off Delay algorithm was implemented uniformly, partial deployment Off Delay algorithm was implemented uniformly, partial deployment
will increase the frequency and duration of micro-loops. Hence, it will increase the frequency and duration of micro-loops. Hence, it
is RECOMMENDED that all routers in the IGP domain or at least within is RECOMMENDED that all routers in the IGP domain, or at least within
the same area/level be migrated to the SPF algorithm described herein the same area/level, be migrated to the SPF algorithm described
at roughly the same time. herein at roughly the same time.
Note that this is not a new consideration as over times, network Note that this is not a new consideration; over time, network
operators have changed SPF delay parameters in order to accommodate operators have changed SPF delay parameters in order to accommodate
new customer requirements for fast convergence, as permitted by new new customer requirements for fast convergence, as permitted by new
software and hardware. They may also have progressively replaced an software and hardware. They may also have progressively replaced an
implementation with a given SPF Back-off Delay algorithm by another implementation with a given SPF Back-Off Delay algorithm by another
implementation with a different one. implementation with a different one.
8. Impact on micro-loops 8. Impact on Micro-loops
Micro-loops during IGP convergence are due to a non-synchronized or Micro-loops during IGP convergence are due to a non-synchronized or
non-ordered update of the forwarding information tables (FIB) non-ordered update of FIBs [RFC5715] [RFC6976] [SPF-MICRO]. FIBs are
[RFC5715] [RFC6976] [I-D.ietf-rtgwg-spf-uloop-pb-statement]. FIBs installed after multiple steps, such as flooding of the IGP event
are installed after multiple steps such as flooding of the IGP event
across the network, SPF wait time, SPF computation, FIB distribution across the network, SPF wait time, SPF computation, FIB distribution
across line cards, and FIB update. This document only addresses the across line cards, and FIB update. This document only addresses the
contribution from the SPF wait time. This standardized procedure contribution from the SPF wait time. This standardized procedure
reduces the probability and/or duration of micro-loops when IGPs reduces the probability and/or duration of micro-loops when IGPs
experience multiple temporally close events. It does not prevent all experience multiple temporally close events. It does not prevent all
micro-loops. However, it is beneficial and is less complex and micro-loops; however, it is beneficial and is less complex and costly
costly to implement when compared to full solutions such as [RFC5715] to implement when compared to full solutions such as [RFC5715] or
or [RFC6976]. [RFC6976].
9. IANA Considerations 9. IANA Considerations
No IANA actions required. This document has no IANA actions.
10. Security considerations 10. Security Considerations
The algorithm presented in this document does not compromise IGP The algorithm presented in this document does not compromise IGP
security. An attacker having the ability to generate IGP events security. An attacker having the ability to generate IGP events
would be able to delay the IGP convergence time. The LONG_SPF_DELAY would be able to delay the IGP convergence time. The LONG_SPF_DELAY
state may help mitigate the effects of Denial-of-Service (DOS) state may help mitigate the effects of Denial-of-Service (DoS)
attacks generating many IGP events. attacks generating many IGP events.
11. Acknowledgements 11. References
We would like to acknowledge Les Ginsberg, Uma Chunduri, Mike Shand
and Alexander Vainshtein for the discussions and comments related to
this document.
12. References
12.1. Normative References 11.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
May 2017, <https://www.rfc-editor.org/info/rfc8174>. May 2017, <https://www.rfc-editor.org/info/rfc8174>.
12.2. Informative References 11.2. Informative References
[I-D.ietf-rtgwg-spf-uloop-pb-statement]
Litkowski, S., Decraene, B., and M. Horneffer, "Link State
protocols SPF trigger and delay algorithm impact on IGP
micro-loops", draft-ietf-rtgwg-spf-uloop-pb-statement-06
(work in progress), January 2018.
[ISO10589-Second-Edition] [ISO10589]
International Organization for Standardization, International Organization for Standardization,
"Intermediate system to Intermediate system intra-domain "Information technology -- Telecommunications and
routeing information exchange protocol for use in information exchange between systems -- Intermediate
conjunction with the protocol for providing the System to Intermediate System intra-domain routeing
connectionless-mode Network Service (ISO 8473)", ISO/ information exchange protocol for use in conjunction with
IEC 10589:2002, Second Edition, Nov 2002. the protocol for providing the connectionless-mode network
service (ISO 8473)", ISO/IEC 10589:2002, Second Edition,
November 2002.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998, DOI 10.17487/RFC2328, April 1998,
<https://www.rfc-editor.org/info/rfc2328>. <https://www.rfc-editor.org/info/rfc2328>.
[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001, Tunnels", RFC 3209, DOI 10.17487/RFC3209, December 2001,
<https://www.rfc-editor.org/info/rfc3209>. <https://www.rfc-editor.org/info/rfc3209>.
skipping to change at page 13, line 38 skipping to change at page 13, line 5
[RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free
Convergence", RFC 5715, DOI 10.17487/RFC5715, January Convergence", RFC 5715, DOI 10.17487/RFC5715, January
2010, <https://www.rfc-editor.org/info/rfc5715>. 2010, <https://www.rfc-editor.org/info/rfc5715>.
[RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C., [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
Francois, P., and O. Bonaventure, "Framework for Loop-Free Francois, P., and O. Bonaventure, "Framework for Loop-Free
Convergence Using the Ordered Forwarding Information Base Convergence Using the Ordered Forwarding Information Base
(oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July
2013, <https://www.rfc-editor.org/info/rfc6976>. 2013, <https://www.rfc-editor.org/info/rfc6976>.
[SPF-MICRO]
Litkowski, S., Decraene, B., and M. Horneffer, "Link State
protocols SPF trigger and delay algorithm impact on IGP
micro-loops", Work in Progress, draft-ietf-rtgwg-spf-
uloop-pb-statement-07, May 2018.
Acknowledgements
We would like to acknowledge Les Ginsberg, Uma Chunduri, Mike Shand,
and Alexander Vainshtein for the discussions and comments related to
this document.
Authors' Addresses Authors' Addresses
Bruno Decraene Bruno Decraene
Orange Orange
Email: bruno.decraene@orange.com Email: bruno.decraene@orange.com
Stephane Litkowski Stephane Litkowski
Orange Business Service Orange Business Service
Email: stephane.litkowski@orange.com Email: stephane.litkowski@orange.com
Hannes Gredler Hannes Gredler
RtBrick Inc RtBrick Inc.
Email: hannes@rtbrick.com Email: hannes@rtbrick.com
Acee Lindem Acee Lindem
Cisco Systems Cisco Systems
301 Midenhall Way 301 Midenhall Way
Cary, NC 27513 Cary, NC 27513
USA United States of America
Email: acee@cisco.com Email: acee@cisco.com
Pierre Francois Pierre Francois
Email: pfrpfr@gmail.com Email: pfrpfr@gmail.com
Chris Bowers Chris Bowers
Juniper Networks, Inc. Juniper Networks, Inc.
1194 N. Mathilda Ave. 1194 N. Mathilda Ave.
Sunnyvale, CA 94089 Sunnyvale, CA 94089
US United States of America
Email: cbowers@juniper.net Email: cbowers@juniper.net
 End of changes. 83 change blocks. 
233 lines changed or deleted 239 lines changed or added
