| rfc9599.original | rfc9599.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
| Internet-Draft Independent | Request for Comments: 9599 Independent | |||
| Updates: 3819 (if approved) J. Kaippallimalil | BCP: 89 J. Kaippallimalil | |||
| Intended status: Best Current Practice Futurewei | Updates: 3819 Futurewei | |||
| Expires: 7 June 2024 5 December 2023 | Category: Best Current Practice August 2024 | |||
| | ISSN: 2070-1721 | |||
| Guidelines for Adding Congestion Notification to Protocols that | Guidelines for Adding Congestion Notification to Protocols That | |||
| Encapsulate IP | Encapsulate IP | |||
| draft-ietf-tsvwg-ecn-encap-guidelines-22 | ||||
| Abstract | Abstract | |||
| The purpose of this document is to guide the design of congestion | The purpose of this document is to guide the design of congestion | |||
| notification in any lower layer or tunnelling protocol that | notification in any lower-layer or tunnelling protocol that | |||
| encapsulates IP. The aim is for explicit congestion signals to | encapsulates IP. The aim is for explicit congestion signals to | |||
| propagate consistently from lower layer protocols into IP. Then the | propagate consistently from lower-layer protocols into IP. Then, the | |||
| IP internetwork layer can act as a portability layer to carry | IP internetwork layer can act as a portability layer to carry | |||
| congestion notification from non-IP-aware congested nodes up to the | congestion notification from non-IP-aware congested nodes up to the | |||
| transport layer (L4). Following these guidelines should assure | transport layer (L4). Specifications that follow these guidelines, | |||
| interworking among IP layer and lower layer congestion notification | whether produced by the IETF or other standards bodies, should assure | |||
| mechanisms, whether specified by the IETF or other standards bodies. | interworking among IP-layer and lower-layer congestion notification | |||
| This document is included in BCP 89 and updates the single paragraph | mechanisms. This document is included in BCP 89 and updates the | |||
| of advice to subnetwork designers about ECN in Section 13 of RFC | single paragraph of advice to subnetwork designers about Explicit | |||
| 3819, by replacing it with a reference to the whole of this document. | Congestion Notification (ECN) in Section 13 of RFC 3819 by replacing | |||
| | it with a reference to this document. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This memo documents an Internet Best Current Practice. | |||
| provisions of BCP 78 and BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| | BCPs is available in Section 2 of RFC 7841. | |||
| This Internet-Draft will expire on 7 June 2024. | Information about the current status of this document, any errata, | |||
| | and how to provide feedback on it may be obtained at | |||
| | https://www.rfc-editor.org/info/rfc9599. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| | in the Revised BSD License. | |||
| This document may contain material from IETF Documents or IETF | ||||
| Contributions published or made publicly available before November | ||||
| 10, 2008. The person(s) controlling the copyright in some of this | ||||
| material may not have granted the IETF Trust the right to allow | ||||
| modifications of such material outside the IETF Standards Process. | ||||
| Without obtaining an adequate license from the person(s) controlling | ||||
| the copyright in such materials, this document may not be modified | ||||
| outside the IETF Standards Process, and derivative works of it may | ||||
| not be created outside the IETF Standards Process, except to format | ||||
| it for publication as an RFC or to translate it into languages other | ||||
| than English. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
| 1.1. Update to RFC 3819 . . . . . . . . . . . . . . . . . . . 5 | 1.1. Update to RFC 3819 | |||
| 1.2. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Scope | |||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 2. Terminology | |||
| 3. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 9 | 3. Modes of Operation | |||
| 3.1. Feed-Forward-and-Up Mode . . . . . . . . . . . . . . . . 10 | 3.1. Feed-Forward-and-Up Mode | |||
| 3.2. Feed-Up-and-Forward Mode . . . . . . . . . . . . . . . . 12 | 3.2. Feed-Up-and-Forward Mode | |||
| 3.3. Feed-Backward Mode . . . . . . . . . . . . . . . . . . . 12 | 3.3. Feed-Backward Mode | |||
| 3.4. Null Mode . . . . . . . . . . . . . . . . . . . . . . . . 14 | 3.4. Null Mode | |||
| 4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | 4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | |||
| Notification . . . . . . . . . . . . . . . . . . . . . . 14 | Notification | |||
| 4.1. IP-in-IP Tunnels with Shim Headers . . . . . . . . . . . 15 | 4.1. IP-in-IP Tunnels with Shim Headers | |||
| 4.2. Wire Protocol Design: Indication of ECN Support . . . . . 16 | 4.2. Wire Protocol Design: Indication of ECN Support | |||
| 4.3. Encapsulation Guidelines . . . . . . . . . . . . . . . . 19 | 4.3. Encapsulation Guidelines | |||
| 4.4. Decapsulation Guidelines . . . . . . . . . . . . . . . . 21 | 4.4. Decapsulation Guidelines | |||
| 4.5. Sequences of Similar Tunnels or Subnets . . . . . . . . . 22 | 4.5. Sequences of Similar Tunnels or Subnets | |||
| 4.6. Reframing and Congestion Markings . . . . . . . . . . . . 23 | 4.6. Reframing and Congestion Markings | |||
| 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | |||
| Notification . . . . . . . . . . . . . . . . . . . . . . 25 | Notification | |||
| 6. Feed-Backward Mode: Guidelines for Adding Congestion | 6. Feed-Backward Mode: Guidelines for Adding Congestion | |||
| Notification . . . . . . . . . . . . . . . . . . . . . . 26 | Notification | |||
| 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 27 | 7. IANA Considerations | |||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | 8. Security Considerations | |||
| 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 28 | 9. Conclusions | |||
| 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 29 | 10. References | |||
| 10.1. Normative References . . . . . . . . . . . . . . . . . . 29 | 10.1. Normative References | |||
| 10.2. Informative References . . . . . . . . . . . . . . . . . 30 | 10.2. Informative References | |||
| Comments Solicited . . . . . . . . . . . . . . . . . . . . . . . 34 | Acknowledgements | |||
| Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 34 | Contributors | |||
| Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 35 | Authors' Addresses | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35 | ||||
| 1. Introduction | 1. Introduction | |||
| The benefits of Explicit Congestion Notification (ECN) described in | In certain networks, it might be possible for traffic to congest non- | |||
| [RFC8087] and summarized below can only be fully realized if support | IP-aware nodes. In such networks, the benefits of Explicit | |||
| for ECN is added to the relevant subnetwork technology, as well as to | Congestion Notification (ECN) described in [RFC8087] and summarized | |||
| IP. When a lower layer buffer drops a packet obviously it does not | below can only be fully realized if support for congestion | |||
| just drop at that layer; the packet disappears from all layers. In | notification is added to the relevant subnetwork technology, as well | |||
| contrast, when active queue management (AQM) at a lower layer marks a | as to IP. When a lower-layer buffer implicitly notifies congestion | |||
| packet with ECN, the marking needs to be explicitly propagated up the | by dropping a packet, it obviously does not just drop at that layer; | |||
| layers. The same is true if AQM marks the outer header of a packet | the packet disappears from all layers. In contrast, when active | |||
| that encapsulates inner tunnelled headers. Forwarding ECN is not as | queue management (AQM) at a lower layer buffer explicitly notifies | |||
| straightforward as other headers because it has to be assumed ECN may | congestion by marking a frame header, the marking needs to be | |||
| be only partially deployed. If a lower layer header that contains | explicitly propagated up the layers. The same is true if AQM marks | |||
| ECN congestion indications is stripped off by a subnet egress that is | the outer header of a packet that encapsulates inner tunnelled | |||
| not ECN-aware, or if the ultimate receiver or sender is not ECN- | headers. Forwarding ECN is not as straightforward as other headers | |||
| aware, congestion needs to be indicated by dropping the packet, not | because it has to be assumed ECN may be only partially deployed. If | |||
| marking it. | a lower-layer header that contains congestion indications is stripped | |||
| | off by a subnet egress that is not ECN-aware, or if the ultimate | |||
| | receiver or sender is not ECN-aware, congestion needs to be indicated | |||
| | by dropping the packet, not marking it. | |||
| The purpose of this document is to guide the addition of congestion | The purpose of this document is to guide the addition of congestion | |||
| notification to any subnet technology or tunnelling protocol, so that | notification to any subnet technology or tunnelling protocol so that | |||
| lower layer AQM algorithms can signal congestion explicitly and it | lower-layer AQM algorithms can signal congestion explicitly and that | |||
| will propagate consistently into encapsulated (higher layer) headers, | signal will propagate consistently into encapsulated (higher-layer) | |||
| otherwise the signals will not reach their ultimate destination. | headers. Otherwise, the signals will not reach their ultimate | |||
| | destination. | |||
| ECN is defined in the IP header (IPv4 and IPv6) [RFC3168] to allow a | ECN is defined in the IP header (IPv4 and IPv6) [RFC3168] to allow a | |||
| resource to notify the onset of queue build-up without having to drop | resource to notify the onset of queue buildup without having to drop | |||
| packets, by explicitly marking a proportion of packets with the | packets by explicitly marking a proportion of packets with the | |||
| congestion experienced (CE) codepoint. | congestion experienced (CE) codepoint. | |||
| Given a suitable marking scheme, ECN removes nearly all congestion | Given a suitable marking scheme, ECN removes nearly all congestion | |||
| loss and it cuts delays for two main reasons: | loss and it cuts delays for two main reasons: | |||
| * It avoids the delay when recovering from congestion losses, which | * It avoids the delay when recovering from congestion losses, which | |||
| particularly benefits small flows or real-time flows, making their | particularly benefits small flows or real-time flows, making their | |||
| delivery time predictably short [RFC2884]; | delivery time predictably short [RFC2884]. | |||
| * As ECN is used more widely by end-systems, it will gradually | * As ECN is used more widely by end systems, it will gradually | |||
| remove the need to configure a degree of delay into buffers before | remove the need to configure a degree of delay into buffers before | |||
| they start to notify congestion (the cause of bufferbloat). This | they start to notify congestion (the cause of bufferbloat). This | |||
| is because drop involves a trade-off between sending a timely | is because drop involves a trade-off between sending a timely | |||
| signal and trying to avoid impairment, whereas ECN is solely a | signal and trying to avoid impairment, whereas ECN is solely a | |||
| signal not an impairment, so there is no harm triggering it | signal and not an impairment, so there is no harm triggering it | |||
| earlier. | earlier. | |||
| Some lower layer technologies (e.g. MPLS, Ethernet) are used to form | Some lower-layer technologies (e.g., MPLS, Ethernet) are used to form | |||
| subnetworks with IP-aware nodes only at the edges. These networks | subnetworks with IP-aware nodes only at the edges. These networks | |||
| are often sized so that it is rare for interior queues to overflow. | are often sized so that it is rare for interior queues to overflow. | |||
| However, until recently this was more due to the inability of TCP to | However, until recently, this was more due to the inability of TCP to | |||
| saturate the links. For many years, fixes such as window scaling | saturate the links. For many years, fixes such as window scaling | |||
| [RFC7323] proved hard to deploy. And the Reno variant of TCP has | [RFC7323] proved hard to deploy and the Reno variant of TCP remained | |||
| remained in widespread use despite its inability to scale to high | in widespread use despite its inability to scale to high flow rates. | |||
| flow rates. However, now that modern operating systems are finally | However, now that modern operating systems are finally capable of | |||
| capable of saturating interior links, even the buffers of well- | saturating interior links, even the buffers of well-provisioned | |||
| provisioned interior switches will need to signal episodes of | interior switches will need to signal episodes of queuing. | |||
| queuing. | ||||
| Propagation of ECN is defined for MPLS [RFC5129], and has been | Propagation of ECN is defined for MPLS [RFC5129] and TRILL [RFC7780] | |||
| defined for TRILL [RFC7780], [I-D.ietf-trill-ecn-support], but it | [RFC9600], but it has yet to be defined for a number of other | |||
| remains to be defined for a number of other subnetwork technologies. | subnetwork technologies. | |||
| Similarly, ECN propagation is yet to be defined for many tunnelling | Similarly, ECN propagation is yet to be defined for many tunnelling | |||
| protocols. [RFC6040] defines how ECN should be propagated for IP-in- | protocols. [RFC6040] defines how ECN should be propagated for IP-in- | |||
| IPv4 [RFC2003], IP-in-IPv6 [RFC2473] and IPsec [RFC4301] tunnels, but | IPv4 [RFC2003], IP-in-IPv6 [RFC2473], and IPsec [RFC4301] tunnels, | |||
| there are numerous other tunnelling protocols with a shim and/or a | but there are numerous other tunnelling protocols with a shim and/or | |||
| layer 2 header between two IP headers (IPv4 or IPv6). Some address | a Layer 2 (L2) header between two IP headers (IPv4 or IPv6). Some | |||
| ECN propagation between the IP headers, but many do not. This | address ECN propagation between the IP headers, but many do not. | |||
| document gives guidance on how to address ECN propagation for future | This document gives guidance on how to address ECN propagation for | |||
| tunnelling protocols, and a companion standards track specification | future tunnelling protocols, and a companion Standards Track | |||
| [I-D.ietf-tsvwg-rfc6040update-shim] updates those existing IP-shim- | specification [RFC9601] updates existing tunnelling protocols with a | |||
| (L2)-IP protocols that are under IETF change control and still widely | shim between IP headers that are under IETF change control and still | |||
| used. | widely used. | |||
| Incremental deployment is the most delicate aspect when adding | Incremental deployment is the most delicate aspect when adding | |||
| support for ECN. The original ECN protocol in IP [RFC3168] was | support for ECN. The original ECN protocol in IP [RFC3168] was | |||
| carefully designed so that a congested buffer would not mark a packet | carefully designed so that a congested buffer would not mark a packet | |||
| (rather than drop it) unless both source and destination hosts were | (rather than drop it) unless both source and destination hosts were | |||
| ECN-capable. Otherwise, its congestion markings would never be | ECN-capable. Otherwise, its congestion markings would never be | |||
| detected and congestion would just build up further. However, to | detected and congestion would just build up further. However, to | |||
| support congestion marking below the IP layer or within tunnels, it | support congestion marking below the IP layer or within tunnels, it | |||
| is not sufficient to only check that the two layer 4 transport end- | is not sufficient to only check that the two layer 4 transport | |||
| points support ECN; correct operation also depends on the | endpoints support ECN; correct operation also depends on the | |||
| decapsulator at each subnet or tunnel egress faithfully propagating | decapsulator at each subnet or tunnel egress faithfully propagating | |||
| congestion notifications to the higher layer. Otherwise, a legacy | congestion notification to the higher layer. Otherwise, a legacy | |||
| decapsulator might silently fail to propagate any ECN signals from | decapsulator might silently fail to propagate any congestion signals | |||
| the outer to the forwarded header. Then the lost signals would never | from the outer header to the forwarded header. Then, the lost | |||
| be detected and again congestion would build up further. The | signals would never be detected and congestion would build up | |||
| guidelines given later require protocol designers to carefully | further. The guidelines given later require protocol designers to | |||
| consider incremental deployment, and suggest various safe approaches | carefully consider incremental deployment and suggest various safe | |||
| for different circumstances. | approaches for different circumstances. | |||
| Of course, the IETF does not have standards authority over every link | Of course, the IETF does not have standards authority over every | |||
| layer protocol. So this document gives guidelines for designing | link-layer protocol; thus, this document gives guidelines for | |||
| propagation of congestion notification across the interface between | designing propagation of congestion notification across the interface | |||
| IP and protocols that may encapsulate IP (i.e. that can be layered | between IP and protocols that may encapsulate IP (i.e., that can be | |||
| beneath IP). Each lower layer technology will exhibit different | layered beneath IP). Each lower-layer technology will exhibit | |||
| issues and compromises, so the IETF or the relevant standards body | different issues and compromises, so the IETF or the relevant | |||
| must be free to define the specifics of each lower layer congestion | standards body must be free to define the specifics of each lower- | |||
| notification scheme. Nonetheless, if the guidelines are followed, | layer congestion notification scheme. Nonetheless, if the guidelines | |||
| congestion notification should interwork between different | are followed, congestion notification should interwork between | |||
| technologies, using IP in its role as a 'portability layer'. | different technologies using IP in its role as a 'portability layer'. | |||
| Therefore, the capitalized terms 'SHOULD' or 'SHOULD NOT' are often | Therefore, the capitalized terms 'SHOULD' or 'SHOULD NOT' are often | |||
| used in preference to 'MUST' or 'MUST NOT', because it is difficult | used in preference to 'MUST' or 'MUST NOT' because it is difficult to | |||
| to know the compromises that will be necessary in each protocol | know the compromises that will be necessary in each protocol design. | |||
| design. If a particular protocol design chooses not to follow a | If a particular protocol design chooses not to follow a 'SHOULD' or | |||
| 'SHOULD (NOT)' given in the advice below, it MUST include a sound | 'SHOULD NOT' given in the advice below, it MUST include a sound | |||
| justification. | justification. | |||
| It has not been possible to give common guidelines for all lower | It has not been possible to give common guidelines for all lower- | |||
| layer technologies, because they do not all fit a common pattern. | layer technologies because they do not all fit a common pattern. | |||
| Instead, they have been divided into a few distinct modes of | Instead, they have been divided into a few distinct modes of | |||
| operation: feed-forward-and-upward; feed-upward-and-forward; feed- | operation: feed-forward-and-up, feed-up-and-forward, feed-backward, | |||
| backward; and null mode. These modes are described in Section 3, | and null mode. These modes are described in Section 3, and separate | |||
| then in the subsequent sections separate guidelines are given for | guidelines are given for each mode in subsequent sections. | |||
| each mode. | ||||
| 1.1. Update to RFC 3819 | 1.1. Update to RFC 3819 | |||
| This document updates the brief advice to subnetwork designers about | This document updates the brief advice to subnetwork designers about | |||
| ECN in Section 13 of [RFC3819], by replacing the last two paragraphs | ECN in Section 13 of [RFC3819] by adding this document (RFC 9599) as | |||
| with the following sentence: | an informative reference and replacing the last two paragraphs with | |||
| | the following sentence: | |||
| By following the guidelines in [this document], subnetwork | ||||
| designers can enable a layer-2 protocol to participate in | ||||
| congestion control without dropping packets via propagation of | ||||
| explicit congestion notification (ECN [RFC3168]) to receivers. | ||||
| and adding [this document] as an informative reference. {RFC Editor: | \| By following the guidelines in [RFC9599], subnetwork designers can | |||
| Please replace both instances of [this document] above with the | \| enable a layer-2 protocol to participate in congestion control | |||
| number of the present RFC when published.} | \| without dropping packets via propagation of Explicit Congestion | |||
| | \| Notification (ECN) [RFC3168] to receivers. | |||
| 1.2. Scope | 1.2. Scope | |||
| This document only concerns wire protocol processing of explicit | This document only concerns wire protocol processing of explicit | |||
| notification of congestion. It makes no changes or recommendations | notification of congestion. It makes no changes or recommendations | |||
| concerning algorithms for congestion marking or for congestion | concerning algorithms for congestion marking or congestion response | |||
| response, because algorithm issues should be independent of the layer | because algorithm issues should be independent of the layer that the | |||
| the algorithm operates in. | algorithm operates in. | |||
| The default ECN semantics are described in [RFC3168] and updated by | The default ECN semantics are described in [RFC3168] and updated by | |||
| [RFC8311]. Also, the guidelines for AQM designers [RFC7567] clarify | [RFC8311]. Also, the guidelines for AQM designers [RFC7567] clarify | |||
| the semantics of both drop and ECN signals from AQM algorithms. | the semantics of both drop and ECN signals from AQM algorithms. | |||
| [RFC4774] is the appropriate best current practice specification of | [RFC4774] is the appropriate best current practice specification of | |||
| how algorithms with alternative semantics for the ECN field can be | how algorithms with alternative semantics for the ECN field can be | |||
| partitioned from Internet traffic that uses the default ECN | partitioned from Internet traffic that uses the default ECN | |||
| semantics. There are two main examples for how alternative ECN | semantics. There are two main examples for how alternative ECN | |||
| semantics have been defined in practice: | semantics have been defined in practice: | |||
| * RFC 4774 suggests using the ECN field in combination with a | * [RFC4774] suggests using the ECN field in combination with a | |||
| Diffserv codepoint such as in PCN [RFC6660], Voice over 3G [UTRAN] | Diffserv codepoint, such as in Pre-Congestion Notification (PCN) | |||
| or Voice over LTE (VoLTE) [LTE-RA]; | [RFC6660], Voice over 3G [UTRAN], or Voice over LTE (VoLTE) | |||
| | [LTE-RA]. | |||
| * RFC 8311 suggests using the ECT(1) codepoint of the ECN field to | * [RFC8311] suggests using the ECT(1) codepoint of the ECN field to | |||
| indicate alternative semantics such as for the experimental Low | indicate alternative semantics, such as for the experimental Low | |||
| Latency Low Loss Scalable throughput (L4S) service [RFC9331]). | Latency, Low Loss, and Scalable throughput (L4S) service | |||
| | [RFC9331]. | |||
| The aim is that the default rules for encapsulating and decapsulating | The aim is that the default rules for encapsulating and decapsulating | |||
| the ECN field are sufficiently generic that tunnels and subnets will | the ECN field are sufficiently generic that tunnels and subnets will | |||
| encapsulate and decapsulate packets without regard to how algorithms | encapsulate and decapsulate packets without regard to how algorithms | |||
| elsewhere are setting or interpreting the semantics of the ECN field. | elsewhere are setting or interpreting the semantics of the ECN field. | |||
| [RFC6040] updates RFC 4774 to allow alternative encapsulation and | [RFC6040] updates [RFC4774] to allow alternative encapsulation and | |||
| decapsulation behaviours to be defined for alternative ECN semantics. | decapsulation behaviours to be defined for alternative ECN semantics. | |||
| However, it reinforces the same point - that it is far preferable to | However, it reinforces the same point -- it is far preferable to try | |||
| try to fit within the common ECN encapsulation and decapsulation | to fit within the common ECN encapsulation and decapsulation | |||
| behaviours, because expecting all lower layer technologies and | behaviours because expecting all lower-layer technologies and tunnels | |||
| tunnels to be updated is likely to be completely impractical. | to be updated is likely to be completely impractical. | |||
| Alternative semantics for the ECN field can be defined to depend on | Alternative semantics for the ECN field can be defined to depend on | |||
| the traffic class indicated by the DSCP. Therefore, correct | the traffic class indicated by the Differentiated Services Code Point | |||
| propagation of congestion signals could depend on correct propagation | (DSCP). Therefore, correct propagation of congestion signals could | |||
| of the DSCP between the layers and along the path. For instance, if | depend on correct propagation of the DSCP between the layers and | |||
| the meaning of the ECN field depends on the DSCP (as in PCN or VoLTE) | along the path. For instance, if the meaning of the ECN field | |||
| and if the outer DSCP is stripped on decapsulation, as in the pipe | depends on the DSCP (as in PCN or VoLTE) and the outer DSCP is | |||
| model of [RFC2983], the special semantics of the ECN field would be | stripped on decapsulation, as in the pipe model of [RFC2983], the | |||
| lost. Similarly, if the DSCP is changed at the boundary between | special semantics of the ECN field would be lost. Similarly, if the | |||
| Diffserv domains, the special ECN semantics would also be lost. This | DSCP is changed at the boundary between Diffserv domains, the special | |||
| is an important implication of the localized scope of most Diffserv | ECN semantics would also be lost. This is an important implication | |||
| arrangements. In this document, correct propagation of traffic class | of the localized scope of most Diffserv arrangements. In this | |||
| information is assumed, while what 'correct' means and how it is | document, correct propagation of traffic class information is assumed | |||
| achieved is covered elsewhere (e.g. RFC 2983) and is outside the | while the meaning of 'correct' and how it is achieved is covered | |||
| scope of the present document. | elsewhere (e.g., [RFC2983]) and is outside the scope of this | |||
| | document. | |||
| The guidelines in this document do ensure that common encapsulation | The guidelines in this document do ensure that common encapsulation | |||
| and decapsulation rules are sufficiently generic to cover cases where | and decapsulation rules are sufficiently generic to cover cases where | |||
| ECT(1) is used instead of ECT(0) to identify alternative ECN | ECT(1) is used instead of ECT(0) to identify alternative ECN | |||
| semantics (as in L4S [RFC9331]) and where ECN marking algorithms use | semantics (as in L4S [RFC9331]) and where ECN-marking algorithms use | |||
| ECT(1) to encode 3 severity levels into the ECN field (e.g. PCN | ECT(1) to encode three severity levels into the ECN field (e.g., PCN | |||
| [RFC6660]) rather than the default of 2. All these different | [RFC6660]) rather than the default of two. All these different | |||
| semantics for the ECN field work because it has been possible to | semantics for the ECN field work because it has been possible to | |||
| define common default decapsulation rules that allow for all cases. | define common default decapsulation rules that allow for all cases | |||
| | [RFC6040]. | |||
| Note that the guidelines in this document do not necessarily require | Note that the guidelines in this document do not necessarily require | |||
| the subnet wire protocol to be changed to add support for congestion | the subnet wire protocol to be changed to add support for congestion | |||
| notification. For instance, the Feed-Up-and-Forward Mode | notification. For instance, the feed-up-and-forward mode | |||
| (Section 3.2) and the Null Mode (Section 3.4) do not. Another way to | (Section 3.2) and the null mode (Section 3.4) do not. Another way to | |||
| add congestion notification without consuming header space in the | add congestion notification without consuming header space in the | |||
| subnet protocol might be to use a parallel control plane protocol. | subnet protocol might be to use a parallel control plane protocol. | |||
| This document focuses on the congestion notification interface | This document focuses on the congestion notification interface | |||
| between IP and lower layer or tunnel protocols that can encapsulate | between IP and lower-layer or tunnel protocols that can encapsulate | |||
| IP, where the term 'IP' includes IPv4 or IPv6, unicast, multicast or | IP, where the term 'IP' includes IPv4 or IPv6, unicast, multicast, or | |||
| anycast. However, it is likely that the guidelines will also be | anycast. However, it is likely that the guidelines will also be | |||
| useful when a lower layer protocol or tunnel encapsulates itself, | useful when a lower-layer protocol or tunnel encapsulates itself, | |||
| e.g. Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah) or when | e.g., Ethernet Media Access Control (MAC) in MAC ([IEEE802.1Q]; | |||
| it encapsulates other protocols. In the feed-backward mode, | previously 802.1ah), or when it encapsulates other protocols. In the | |||
| propagation of congestion signals for multicast and anycast packets | feed-backward mode, propagation of congestion signals for multicast | |||
| is out-of-scope (because the complexity would make it unlikely to be | and anycast packets is out of scope (because the complexity would | |||
| attempted). | make it unlikely to be attempted). | |||
| 2. Terminology | 2. Terminology | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
| BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| Further terminology used within this document: | Further terminology used within this document: | |||
| Protocol data unit (PDU): Information that is delivered as a unit | Protocol data unit (PDU): Information that is delivered as a unit | |||
| among peer entities of a layered network consisting of protocol | among peer entities of a layered network consisting of protocol | |||
| control information (typically a header) and possibly user data | control information (typically a header) and possibly user data | |||
| (payload) of that layer. The scope of this document includes | (payload) of that layer. The scope of this document includes | |||
| layer 2 and layer 3 networks, where the PDU is respectively termed | Layer 2 and Layer 3 networks, where the PDU is respectively termed | |||
| a frame or a packet (or a cell in ATM). PDU is a general term for | a frame or a packet (or a cell in ATM). PDU is a general term for | |||
| any of these. This definition also includes a payload with a shim | any of these. This definition also includes a payload with a shim | |||
| header lying somewhere between layer 2 and 3. | header lying somewhere between layer 2 and 3. | |||
| Transport: The end-to-end transmission control function, | Transport: The end-to-end transmission control function, | |||
| conventionally considered at layer-4 in the OSI reference model. | conventionally considered at layer 4 in the OSI reference model. | |||
| Given the audience for this document will often use the word | Given the audience for this document will often use the word | |||
| transport to mean low level bit carriage, whenever the term is | transport to mean low-level bit carriage, the term will be | |||
| used it will be qualified, e.g. 'L4 transport'. | qualified whenever it is used, e.g., 'L4 transport'. | |||
| Encapsulator: The link or tunnel endpoint function that adds an | Encapsulator: The link or tunnel endpoint function that adds an | |||
| outer header to a PDU (also termed the 'link ingress', the 'subnet | outer header to a PDU (also termed the 'link ingress', the 'subnet | |||
| ingress', the 'ingress tunnel endpoint' or just the 'ingress' | ingress', the 'ingress tunnel endpoint', or just the 'ingress' | |||
| where the context is clear). | where the context is clear). | |||
| Decapsulator: The link or tunnel endpoint function that removes an | Decapsulator: The link or tunnel endpoint function that removes an | |||
| outer header from a PDU (also termed the 'link egress', the | outer header from a PDU (also termed the 'link egress', the | |||
| 'subnet egress', the 'egress tunnel endpoint' or just the 'egress' | 'subnet egress', the 'egress tunnel endpoint', or just the | |||
| where the context is clear). | 'egress' where the context is clear). | |||
| Incoming header: The header of an arriving PDU before encapsulation. | Incoming header: The header of an arriving PDU before encapsulation. | |||
| Outer header: The header added to encapsulate a PDU. | Outer header: The header added to encapsulate a PDU. | |||
| Inner header: The header encapsulated by the outer header. | Inner header: The header encapsulated by the outer header. | |||
| Outgoing header: The header forwarded by the decapsulator. | Outgoing header: The header forwarded by the decapsulator. | |||
| CE: Congestion Experienced [RFC3168] | CE: Congestion Experienced [RFC3168] | |||
| ECT: ECN-Capable (L4) Transport [RFC3168] | ECT: ECN-Capable (L4) Transport [RFC3168] | |||
| Not-ECT: Not ECN-Capable (L4) Transport [RFC3168] | Not-ECT: Not ECN-Capable (L4) Transport [RFC3168] | |||
| Load Regulator: For each flow of PDUs, the transport function that | Load Regulator: For each flow of PDUs, the transport function that | |||
| is capable of controlling the data rate. Typically located at the | is capable of controlling the data rate. Typically located at the | |||
| data source, but in-path nodes can regulate load in some | data source, but in-path nodes can regulate load in some | |||
| congestion control arrangements (e.g. admission control, policing | congestion control arrangements (e.g., admission control, policing | |||
| nodes or transport circuit-breakers [RFC8084]). Note the term "a | nodes, or transport circuit-breakers [RFC8084]). Note that "a | |||
| function capable of controlling the load" deliberately includes a | function capable of controlling the load" deliberately includes a | |||
| transport that does not actually control the load responsively but | transport that does not actually control the load responsively, | |||
| ideally it ought to (e.g. a sending application without congestion | but ideally it ought to (e.g., a sending application without | |||
| control that uses UDP). | congestion control that uses UDP). | |||
| ECN-PDU: A PDU at the IP layer or below with a capacity to signal | ECN-PDU: A PDU at the IP layer or below with a capacity to signal | |||
| congestion that is part of a congestion control feedback loop | congestion that is part of a congestion control feedback loop | |||
| within which all the nodes necessary to propagate the signal back | within which all the nodes necessary to propagate the signal back | |||
| to the Load Regulator are capable of doing that propagation. An | to the Load Regulator are capable of doing that propagation. An | |||
| IP packet with a non-zero ECN field implies that the endpoints are | IP packet with a non-zero ECN field implies that the endpoints are | |||
| ECN-capable, so this would be an ECN-PDU. However, ECN-PDU is | ECN-capable, so this would be an ECN-PDU. However, ECN-PDU is | |||
| intended to be a general term for a PDU at lower layers, as well | intended to be a general term for a PDU at lower layers, as well | |||
| as at the IP layer. | as at the IP layer. | |||
| Not-ECN-PDU: A PDU at the IP layer or below that is part of a | Not-ECN-PDU: A PDU at the IP layer or below that is part of a | |||
| congestion control feedback loop that is not capable of | congestion control feedback loop that is not capable of | |||
| propagating explicit congestion notification signals back to the | propagating ECN signals back to the Load Regulator because at | |||
| Load Regulator, because at least one of the nodes necessary to | least one of the nodes necessary to propagate the signals is | |||
| propagate the signals is incapable of doing that propagation. | incapable of doing that propagation. Note that this definition is | |||
| Note that this definition is a property of the feedback-loop, not | a property of the feedback loop, not necessarily of the PDU | |||
| necessarily of the PDU itself, because in some protocols the PDU | itself; certainly the PDU will self-describe the property in some | |||
| will self-describe the property, but in others the property might | protocols, but in others, the property might be carried in a | |||
| be carried in a separate control-plane context that is somehow | separate control plane context (which is somehow bound to the | |||
| bound to the PDU. | PDU). | |||
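The IP-layer terms above (CE, ECT, Not-ECT, ECN-PDU) can be made concrete with a minimal sketch. The codepoint values are those of the two-bit ECN field defined in RFC 3168; the names `Ecn`, `ecn_from_tos`, and `is_ecn_pdu` are illustrative only and do not come from this document. The sketch covers only the IP case, where a non-zero ECN field implies an ECN-PDU; at lower layers the property may be carried elsewhere, as the Not-ECN-PDU definition notes.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """Two-bit ECN field codepoints from RFC 3168."""
    NOT_ECT = 0b00  # Not ECN-Capable (L4) Transport
    ECT_1 = 0b01    # ECN-Capable Transport, codepoint ECT(1)
    ECT_0 = 0b10    # ECN-Capable Transport, codepoint ECT(0)
    CE = 0b11       # Congestion Experienced


def ecn_from_tos(tos: int) -> Ecn:
    """Extract the ECN field: the two least significant bits of the
    IPv4 TOS octet or IPv6 Traffic Class octet."""
    return Ecn(tos & 0b11)


def is_ecn_pdu(tos: int) -> bool:
    """At the IP layer, any non-zero ECN field implies an ECN-PDU."""
    return ecn_from_tos(tos) is not Ecn.NOT_ECT
```

For instance, `ecn_from_tos(0x02)` yields `Ecn.ECT_0`, and `is_ecn_pdu(0xB8)` is false because the low two bits of that octet are zero.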
| 3. Modes of Operation | 3. Modes of Operation | |||
| This section sets down the different modes by which congestion | This section sets down the different modes by which congestion | |||
| information is passed between the lower layer and the higher one. It | information is passed between the lower layer and the higher one. It | |||
| acts as a reference framework for the following sections, which give | acts as a reference framework for the subsequent sections that give | |||
| normative guidelines for designers of explicit congestion | normative guidelines for designers of congestion notification | |||
| notification protocols, taking each mode in turn: | protocols, taking each mode in turn: | |||
| Feed-Forward-and-Up: Nodes feed forward congestion notification | Feed-Forward-and-Up: Nodes feed forward congestion notification | |||
| towards the egress within the lower layer then up and along the | towards the egress within the lower layer, then up and along the | |||
| layers towards the end-to-end destination at the transport layer. | layers towards the end-to-end destination at the transport layer. | |||
| The following local optimisation is possible: | The following local optimization is possible: | |||
| Feed-Up-and-Forward: A lower layer switch feeds-up congestion | Feed-Up-and-Forward: A lower-layer switch feeds up congestion | |||
| notification directly into the higher layer (e.g. into the ECN | notification directly into the higher layer (e.g., into the ECN | |||
| field in the IP header), irrespective of whether the node is at | field in the IP header), irrespective of whether the node is at | |||
| the egress of a subnet. | the egress of a subnet. | |||
| Feed-Backward: Nodes feed back congestion signals towards the | Feed-Backward: Nodes feed back congestion signals towards the | |||
| ingress of the lower layer and (optionally) attempt to control | ingress of the lower layer and (optionally) attempt to control | |||
| congestion within their own layer. | congestion within their own layer. | |||
| Null: Nodes cannot experience congestion at the lower layer except | Null: Nodes cannot experience congestion at the lower layer except | |||
| at ingress nodes (which are IP-aware or equivalently higher-layer- | at the ingress nodes of the subnet (which are IP-aware or | |||
| aware). | equivalently higher-layer-aware). | |||
| 3.1. Feed-Forward-and-Up Mode | 3.1. Feed-Forward-and-Up Mode | |||
| Like IP and MPLS, many subnet technologies are based on self- | Like IP and MPLS, many subnet technologies are based on self- | |||
| contained protocol data units (PDUs) or frames sent unreliably. They | contained PDUs or frames sent unreliably. They provide no feedback | |||
| provide no feedback channel at the subnetwork layer, instead relying | channel at the subnetwork layer, instead relying on higher layers | |||
| on higher layers (e.g. TCP) to feed back loss signals. | (e.g., TCP) to feed back loss signals. | |||
| In these cases, ECN may best be supported by standardising explicit | In these cases, ECN may best be supported by standardising explicit | |||
| notification of congestion into the lower layer protocol that carries | notification of congestion into the lower-layer protocol that carries | |||
| the data forwards. Then a specification is needed for how the egress | the data forwards. Then, a specification is needed for how the | |||
| of the lower layer subnet propagates this explicit signal into the | egress of the lower-layer subnet propagates this explicit signal into | |||
| forwarded upper layer (IP) header. This signal continues forwards | the forwarded upper-layer (IP) header. This signal continues | |||
| until it finally reaches the destination transport (at L4). Then | forwards until it finally reaches the destination transport (at L4). | |||
| typically the destination will feed this congestion notification back | Typically, the destination will feed this congestion notification | |||
| to the source transport using an end-to-end protocol (e.g. TCP). | back to the source transport using an end-to-end protocol (e.g., | |||
| This is the arrangement that has already been used to add ECN to IP- | TCP). This is the arrangement that has already been used to add ECN | |||
| in-IP tunnels [RFC6040], IP-in-MPLS and MPLS-in-MPLS [RFC5129]. | to IP-in-IP tunnels [RFC6040], IP-in-MPLS, and MPLS-in-MPLS | |||
| [RFC5129]. | ||||
| This mode is illustrated in Figure 1. Along the middle of the | This mode is illustrated in Figure 1. Along the middle of the | |||
| figure, layers 2, 3 and 4 of the protocol stack are shown, and one | figure, layers 2, 3, and 4 of the protocol stack are shown. One | |||
| packet is shown along the bottom as it progresses across the network | packet is shown along the bottom as it progresses across the network | |||
| from source to destination, crossing two subnets connected by a | from source to destination, crossing two subnets connected by a | |||
| router, and crossing two switches on the path across each subnet. | router and crossing two switches on the path across each subnet. | |||
| Congestion at the output of the first switch (shown as *) leads to a | Congestion at the output of the first switch (shown as *) leads to a | |||
| congestion marking in the L2 header (shown as C in the illustration | congestion marking in the L2 header (shown as C in the illustration | |||
| of the packet). The chevrons show the progress of the resulting | of the packet). The chevrons show the progress of the resulting | |||
| congestion indication. It is propagated from link to link across the | congestion indication. It is propagated from link to link across the | |||
| subnet in the L2 header, then when the router removes the marked L2 | subnet in the L2 header. Then, when the router removes the marked L2 | |||
| header, it propagates the marking up into the L3 (IP) header. The | header, it propagates the marking up into the L3 (IP) header. The | |||
| router forwards the marked L3 header into subnet B. The L2 protocol | router forwards the marked L3 header into subnet B. The L2 protocol | |||
| used in subnet B does not support ECN, but the signal proceeds across | used in subnet B does not support congestion notification, but the | |||
| it in the L3 header. | signal proceeds across it in the L3 header. | |||
| Note that there is no implication that each 'C' marking is encoded | Note that there is no implication that each 'C' marking is encoded | |||
| the same; a different encoding might be used for the 'C' marking in | the same; a different encoding might be used for the 'C' marking in | |||
| each protocol. | each protocol. | |||
| Finally, for completeness, we show the L3 marking arriving at the | Finally, for completeness, we show the L3 marking arriving at the | |||
| destination, where the host transport protocol (e.g. TCP) feeds it | destination, where the host transport protocol (e.g., TCP) feeds it | |||
| back to the source in the L4 acknowledgement (the 'C' at L4 in the | back to the source in the L4 acknowledgement (the 'C' at L4 in the | |||
| packet at the top of the diagram). | packet at the top of the diagram). | |||
| _ _ _ | _ _ _ | |||
| /_______ | | |C| ACK Packet (V) | /_______ | | |C| ACK Packet (V) | |||
| \ |_|_|_| | \ |_|_|_| | |||
| +---+ layer: 2 3 4 header +---+ | +---+ layer: 2 3 4 header +---+ | |||
| | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | |||
| | | +---+ | ^ | | | | +---+ | ^ | | |||
| | | . . . . . . Packet U. . | >>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | | | . . . . . . Packet U. . | >>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | |||
| | | +---+ +---+ | ^ | +---+ +---+ | | | | | +---+ +---+ | ^ | +---+ +---+ | | | |||
| | | | *|>>>>>|>>>|>>>>>|>^ | | | | | | |L2 | | | | *|>>>>>|>>>|>>>>>|>^ | | | | | | |L2 | |||
| |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |||
| source subnet A router subnet B dest | source subnet A router subnet B dest | |||
| __ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | __ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | |||
| | | | | | | | | |C| | | |C| | | |C| | Data________\ | | | | | | | | | |C| | | |C| | | |C| | Data________\ | |||
| |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |||
| layer:4 3 2A 4 3 2A 4 3 4 3 2B | layer:4 3 2A 4 3 2A 4 3 4 3 2B | |||
| header | header | |||
| Figure 1: Feed-Forward-and-Up Mode | Figure 1: Feed-Forward-and-Up Mode | |||
| Of course, modern networks are rarely as simple as this text-book | Of course, modern networks are rarely as simple as this textbook | |||
| example, often involving multiple nested layers. For example, a 3GPP | example, often involving multiple nested layers. For example, a | |||
| mobile network may have two IP-in-IP (GTP [GTPv1]) tunnels in series | Third Generation Partnership Project (3GPP) mobile network may have | |||
| and an MPLS backhaul between the base station and the first router. | two IP-in-IP GTP [GTPv1] tunnels in series and an MPLS backhaul | |||
| Nonetheless, the example illustrates the general idea of feeding | between the base station and the first router. Nonetheless, the | |||
| congestion notification forward then upward whenever a header is | example illustrates the general idea of feeding congestion | |||
| removed at the egress of a subnet. | notification forward then upward whenever a header is removed at the | |||
| egress of a subnet. | ||||
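The "up" step at the subnet egress, where the router removes the marked L2 header and propagates the marking into the L3 (IP) header, might be sketched as follows. This is a simplified illustration under stated assumptions, not the normative decapsulation rules: the L2 marking is reduced to a hypothetical boolean `l2_congested`, and the choice to drop a marked packet whose IP header is Not-ECT is an assumption reflecting that such a transport can only be signalled by loss.

```python
from typing import Optional

# ECN field codepoints (RFC 3168).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def propagate_up(l2_congested: bool, ip_ecn: int) -> Optional[int]:
    """Return the ECN field to forward in the IP header after the L2
    header is removed, or None if the packet should be dropped.

    l2_congested is a hypothetical flag: True if the removed L2 header
    carried a congestion marking (the 'C' in Figure 1).
    """
    if not l2_congested:
        # Nothing to propagate; forward the IP header unchanged.
        return ip_ecn
    if ip_ecn == NOT_ECT:
        # Assumption: the L4 transport cannot understand a CE mark,
        # so the congestion signal is conveyed as a drop instead.
        return None
    # ECT(0), ECT(1), or already CE: forward as Congestion Experienced.
    return CE
```

A real specification also has to cover lower-layer encodings with more than one severity level and the invalid combinations of inner and outer markings, which this sketch deliberately leaves out.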
| Note that the FECN (forward ECN) bit in Frame Relay [Buck00] and the | Note that the Forward Explicit Congestion Notification (FECN) bit in | |||
| explicit forward congestion indication (EFCI [ITU-T.I.371]) bit in | Frame Relay [Buck00] and the Explicit Forward Congestion Indication | |||
| ATM user data cells follow a feed-forward pattern. However, in ATM, | (EFCI) [ITU-T.I.371] bit in ATM user data cells follow a feed-forward | |||
| this arrangement is only part of a feed-forward-and-backward pattern | pattern. However, in ATM, this arrangement is only part of a feed- | |||
| at the lower layer, not feed-forward-and-up out of the lower layer — | forward-and-backward pattern at the lower layer, not feed-forward- | |||
| the intention was never to interface to IP ECN at the subnet egress. | and-up out of the lower layer -- the intention was never to interface | |||
| To our knowledge, Frame Relay FECN is solely used to detect where | with IP-ECN at the subnet egress. To our knowledge, Frame Relay FECN | |||
| more capacity should be provisioned. | is solely used by network operators to detect where they should | |||
| provision more capacity. | ||||
| 3.2. Feed-Up-and-Forward Mode | 3.2. Feed-Up-and-Forward Mode | |||
| Ethernet is particularly difficult to extend incrementally to support | Ethernet is particularly difficult to extend incrementally to support | |||
| explicit congestion notification. One way to support ECN in such | congestion notification. One way is to use so-called 'Layer 3 | |||
| cases has been to use so called 'layer-3 switches'. These are | switches'. These are Ethernet switches that dig into the Ethernet | |||
| Ethernet switches that dig into the Ethernet payload to find an IP | payload to find an IP header and manipulate or act on certain IP | |||
| header and manipulate or act on certain IP fields (specifically | fields (specifically Diffserv and ECN). For instance, in Data Center | |||
| Diffserv & ECN). For instance, in Data Center TCP [RFC8257], layer-3 | TCP [RFC8257], Layer 3 switches are configured to mark the ECN field | |||
| switches are configured to mark the ECN field of the IP header within | of the IP header within the Ethernet payload when their output buffer | |||
| the Ethernet payload when their output buffer becomes congested. | becomes congested. With respect to switching, a Layer 3 switch acts | |||
| With respect to switching, a layer-3 switch acts solely on the | solely on the addresses in the Ethernet header; it does not use IP | |||
| addresses in the Ethernet header; it does not use IP addresses, and | addresses and it does not decrement the TTL field in the IP header. | |||
| it does not decrement the TTL field in the IP header. | ||||
| _ _ _ | _ _ _ | |||
| /_______ | | |C| ACK packet (V) | /_______ | | |C| ACK packet (V) | |||
| \ |_|_|_| | \ |_|_|_| | |||
| +---+ layer: 2 3 4 header +---+ | +---+ layer: 2 3 4 header +---+ | |||
| | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet V <<<<<<<<<<<<<|<< |L4 | |||
| | | +---+ | ^ | | | | +---+ | ^ | | |||
| | | . . . >>>> Packet U >>>|>>>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | | | . . . >>>> Packet U >>>|>>>|>>> Packet U >>>>>>>>>>>>|>^ |L3 | |||
| | | +--^+ +---+ | v| +---+ +---+ | ^ | | | | +--^+ +---+ | v| +---+ +---+ | ^ | | |||
| | | | *| | | | >|>>>>>|>>>|>>>>>|>>>|>>>>>|>^ |L2 | | | | *| | | | >|>>>>>|>>>|>>>>>|>>>|>>>>>|>^ |L2 | |||
| |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |___|_____|___|_____|___|_____|___|_____|___|_____|___|_____|___| | |||
| source subnet E router subnet F dest | source subnet E router subnet F dest | |||
| __ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | __ _ _ _| __ _ _ _| __ _ _| __ _ _ _| | |||
| | | | | | | | |C| | | | |C| | | |C|C| Data________\ | | | | | | | | |C| | | | |C| | | |C|C| Data________\ | |||
| |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| Packet (U) / | |||
| layer:4 3 2 4 3 2 4 3 4 3 2 | layer:4 3 2 4 3 2 4 3 4 3 2 | |||
| header | header | |||
| Figure 2: Feed-Up-and-Forward Mode | Figure 2: Feed-Up-and-Forward Mode | |||
| By comparing Figure 2 with Figure 1, it can be seen that subnet E | By comparing Figure 2 with Figure 1, it can be seen that subnet E | |||
| (perhaps a subnet of layer-3 Ethernet switches) works in feed-up-and- | (perhaps a subnet of Layer 3 Ethernet switches) works in feed-up-and- | |||
| forward mode by notifying congestion directly into L3 at the point of | forward mode by notifying congestion directly into L3 at the point of | |||
| congestion, even though the congested switch does not otherwise act | congestion, even though the congested switch does not otherwise act | |||
| at L3. In this example, the technology in subnet F (e.g. MPLS) does | at L3. In this example, the technology in subnet F (e.g., MPLS) does | |||
| support ECN natively, so when the router adds the layer-2 header it | support ECN. So, when the router adds the Layer 2 header, it copies | |||
| copies the ECN marking from L3 to L2 as well, as shown by the 'C's in | the ECN marking from L3 to L2 as well, as shown by the 'C's in both | |||
| both layers. | layers. | |||
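The behaviour of such a switch in feed-up-and-forward mode can be illustrated with a minimal sketch: it looks inside the Ethernet payload, sets CE in the IPv4 ECN field when its output queue is congested, and leaves the Ethernet addresses and the TTL untouched. The function names, the queue-length threshold, and the decision to leave Not-ECT packets unmarked are assumptions for illustration; the sketch also assumes an untagged Ethernet frame (no VLAN tag) carrying IPv4.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETH_HEADER_LEN = 14   # untagged Ethernet header (assumption: no VLAN tag)
ECN_MASK = 0x03
CE = 0x03


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement sum of the 16-bit words of an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_if_congested(frame: bytes, queue_len: int, threshold: int) -> bytes:
    """Return the frame, with CE set in the encapsulated IPv4 header if
    the (hypothetical) output queue length exceeds the threshold."""
    if queue_len <= threshold:
        return frame
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return frame
    ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    ip = bytearray(frame[ETH_HEADER_LEN:ETH_HEADER_LEN + ihl])
    if ip[1] & ECN_MASK == 0:
        # Not-ECT: cannot be marked; left to the queue's drop policy.
        return frame
    ip[1] |= CE                      # set both ECN bits
    ip[10:12] = b"\x00\x00"          # zero checksum before recomputing
    struct.pack_into("!H", ip, 10, ipv4_checksum(bytes(ip)))
    # Ethernet header and TTL are untouched: switching stays at L2.
    return frame[:ETH_HEADER_LEN] + bytes(ip) + frame[ETH_HEADER_LEN + ihl:]
```

Because the ECN field is covered by the IPv4 header checksum, the sketch recomputes the checksum after marking; an IPv6 payload would not need that step.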
| 3.3. Feed-Backward Mode | 3.3. Feed-Backward Mode | |||
| In some layer 2 technologies, explicit congestion notification has | In some Layer 2 technologies, congestion notification has been | |||
| been defined for use internally within the subnet with its own | defined for use internally within the subnet with its own feedback | |||
| feedback and load regulation, but typically the interface with IP for | and load regulation but the interface with IP for ECN has not been | |||
| ECN has not been defined. | defined. | |||
| For instance, for the available bit-rate (ABR) service in ATM, the | For instance, the relative rate mechanism was one of the more popular | |||
| relative rate mechanism was one of the more popular mechanisms for | ways to manage traffic for the Available Bit Rate (ABR) service in | |||
| managing traffic, tending to supersede earlier designs. In this | ATM, and it tended to supersede earlier designs. In this approach, | |||
| approach ATM switches send special resource management (RM) cells in | ATM switches send special resource management (RM) cells in both the | |||
| both the forward and backward directions to control the ingress rate | forward and backward directions to control the ingress rate of user | |||
| of user data into a virtual circuit. If a switch buffer is | data into a virtual circuit. If a switch buffer is approaching | |||
| approaching congestion or is congested it sends an RM cell back | congestion or is congested, it sends an RM cell back towards the | |||
| towards the ingress with respectively the No Increase (NI) or | ingress with respectively the No Increase (NI) or Congestion | |||
| Congestion Indication (CI) bit set in its message type field | Indication (CI) bit set in its message type field [ATM-TM-ABR]. The | |||
| [ATM-TM-ABR]. The ingress then holds or decreases its sending bit- | ingress then holds or decreases its sending bit rate accordingly. | |||
| rate accordingly. | ||||
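The ingress reaction to backward RM cells described above can be sketched roughly as below. Only the hold-on-NI and decrease-on-CI behaviour comes from the text; the class and function names, the multiplicative decrease factor, the additive increase when neither bit is set, and the rate floor are hypothetical values chosen for illustration and are not taken from [ATM-TM-ABR].

```python
from dataclasses import dataclass


@dataclass
class RmCell:
    """Backward resource management cell (only the two bits used here)."""
    ni: bool = False  # No Increase: switch buffer approaching congestion
    ci: bool = False  # Congestion Indication: switch buffer congested


def next_rate(rate: float, cell: RmCell,
              increase: float = 1.0,
              decrease_factor: float = 0.875,
              floor: float = 1.0) -> float:
    """Return the ingress sending rate after processing one RM cell.

    All numeric parameters are illustrative assumptions.
    """
    if cell.ci:
        # Congested: decrease the sending bit rate.
        return max(floor, rate * decrease_factor)
    if cell.ni:
        # Approaching congestion: hold the current rate.
        return rate
    # Neither bit set: assumed permission to increase.
    return rate + increase
```

Note that this control loop ends at the ATM ingress: nothing in it reaches the Load Regulator of an IP flow whose source lies outside the subnet.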
| _ _ _ | _ _ _ | |||
| /_______ | | |C| ACK packet (X) | /_______ | | |C| ACK packet (X) | |||
| \ |_|_|_| | \ |_|_|_| | |||
| +---+ layer: 2 3 4 header +---+ | +---+ layer: 2 3 4 header +---+ | |||
| | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4 | | <|<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Packet X <<<<<<<<<<<<<|<< |L4 | |||
| | | +---+ | ^ | | | | +---+ | ^ | | |||
| | | | *|>>> Packet W >>>>>>>>>>>>|>^ |L3 | | | | *|>>> Packet W >>>>>>>>>>>>|>^ |L3 | |||
| | | +---+ +---+ | | +---+ +---+ | | | | | +---+ +---+ | | +---+ +---+ | | | |||
| | | | | | | | <|<<<<<|<<<|<(V)<|<<<| | |L2 | | | | | | | | <|<<<<<|<<<|<(V)<|<<<| | |L2 | |||
| skipping to change at page 13, line 46 ¶ | skipping to change at line 576 ¶ | |||
| 2 | 2 | |||
| __ _ _ _ __ _ _ _ __ _ _ __ _ _ _ earlier | __ _ _ _ __ _ _ _ __ _ _ __ _ _ _ earlier | |||
| | | | | | | | | | | | | | | | | | | | data________\ | | | | | | | | | | | | | | | | | | | | data________\ | |||
| |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| packet (U) / | |__|_|_|_| |__|_|_|_| |__|_|_| |__|_|_|_| packet (U) / | |||
| layer: 4 3 2 4 3 2 4 3 4 3 2 | layer: 4 3 2 4 3 2 4 3 4 3 2 | |||
| header | header | |||
| Figure 3: Feed-Backward Mode | Figure 3: Feed-Backward Mode | |||
| ATM's feed-backward approach does not fit well when layered beneath | ATM's feed-backward approach does not fit well when layered beneath | |||
| IP's feed-forward approach — unless the initial data source is the | IP's feed-forward approach unless the initial data source is the same | |||
| same node as the ATM ingress. Figure 3 shows the feed-backward | node as the ATM ingress. Figure 3 shows the feed-backward approach | |||
| approach being used in subnet H. If the final switch on the path is | being used in subnet H. If the final switch on the path is congested | |||
| congested (*), it does not feed-forward any congestion indications on | (*), it does not feed forward any congestion indications on the | |||
| packet (U). Instead it sends a control cell (V) back to the router | packet (U). Instead, it sends a control cell (V) back to the router | |||
| at the ATM ingress. | at the ATM ingress. | |||
| However, the backward feedback does not reach the original data | However, the backward feedback does not reach the original data | |||
| source directly because IP does not support backward feedback (and | source directly because IP does not support backward feedback (and | |||
| subnet G is independent of subnet H). Instead, the router in the | subnet G is independent of subnet H). Instead, the router in the | |||
| middle throttles down its sending rate but the original data sources | middle throttles down its sending rate, but the original data sources | |||
| don't reduce their rates. The resulting rate mismatch causes the | don't reduce their rates. The resulting rate mismatch causes the | |||
| middle router's buffer at layer 3 to back up until it becomes | middle router's buffer at layer 3 to back up until it becomes | |||
| congested, which it signals forwards on later data packets at layer 3 | congested, which it signals forwards on later data packets at layer 3 | |||
| (e.g. packet W). Note that the forward signal from the middle router | (e.g., packet W). Note that the forward signal from the middle | |||
| is not triggered directly by the backward signal. Rather, it is | router is not triggered directly by the backward signal. Rather, it | |||
| triggered by congestion resulting from the middle router's mismatched | is triggered by congestion resulting from the middle router's | |||
| rate response to the backward signal. | mismatched rate response to the backward signal. | |||
| In response to this later forward signalling, end-to-end feedback at | In response to this later forward signalling, end-to-end feedback at | |||
| layer-4 finally completes the tortuous path of congestion indications | layer 4 finally completes the tortuous path of congestion indications | |||
| back to the origin data source, as before. | back to the origin data source as before. | |||
| Quantized congestion notification (QCN [IEEE802.1Q]) would suffer | Quantized Congestion Notification (QCN) [IEEE802.1Q] would suffer | |||
| from similar problems if extended to multiple subnets. However, from | from similar problems if extended to multiple subnets. However, QCN | |||
| the start QCN was clearly characterized as solely applicable to a | was clearly characterized as solely applicable to a single subnet | |||
| single subnet (see Section 6). | from the start (see Section 6). | |||
| 3.4. Null Mode | 3.4. Null Mode | |||
| Often link and physical layer resources are 'non-blocking' by design. | Link- and physical-layer resources are often 'non-blocking' by | |||
| In these cases congestion notification may be implemented but it does | design. Congestion notification may be implemented in these cases, | |||
| not need to be deployed at the lower layer; ECN in IP would be | but it does not need to be deployed at the lower layer; ECN in IP | |||
| sufficient. | would be sufficient. | |||
| A degenerate example is a point-to-point Ethernet link. Excess | A degenerate example is a point-to-point Ethernet link. Excess | |||
| loading of the link merely causes the queue from the higher layer to | loading of the link merely causes the queue from the higher layer to | |||
| back up, while the lower layer remains immune to congestion. Even a | back up, while the lower layer remains immune to congestion. Even a | |||
| whole meshed subnetwork can be made immune to interior congestion by | whole meshed subnetwork can be made immune to interior congestion by | |||
| limiting ingress capacity and sufficient sizing of interior links, | limiting ingress capacity and sufficient sizing of interior links, | |||
| e.g. a non-blocking fat-tree network [Leiserson85]. An alternative | e.g., a non-blocking fat-tree network [Leiserson85]. An alternative | |||
| to fat links near the root is numerous thin links with multi-path | to fat links near the root is numerous thin links with multi-path | |||
| routing to ensure even worst-case patterns of load cannot congest any | routing to ensure even worst-case patterns of load cannot congest any | |||
| link, e.g. a Clos network [Clos53]. | link, e.g., a Clos network [Clos53]. | |||
| 4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | 4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | |||
| Notification | Notification | |||
| Feed-forward-and-up is the mode already used for signalling ECN up | Feed-forward-and-up is the mode already used for signalling ECN up | |||
| the layers through MPLS into IP [RFC5129] and through IP-in-IP | the layers through MPLS into IP [RFC5129] and through IP-in-IP | |||
| tunnels [RFC6040], whether encapsulating with IPv4 [RFC2003], IPv6 | tunnels [RFC6040], whether encapsulating with IPv4 [RFC2003], IPv6 | |||
| [RFC2473] or IPsec [RFC4301]. These RFCs take a consistent approach | [RFC2473], or IPsec [RFC4301]. These RFCs take a consistent approach | |||
| and the following guidelines are designed to ensure this consistency | and the following guidelines are designed to ensure this consistency | |||
| continues as ECN support is added to other protocols that encapsulate | continues as ECN support is added to other protocols that encapsulate | |||
| IP. The guidelines are also designed to ensure compliance with the | IP. The guidelines are also designed to ensure compliance with the | |||
| more general best current practice for the design of alternate ECN | more general best current practice for the design of alternate ECN | |||
| schemes given in [RFC4774] and extended by [RFC8311]. | schemes given in [RFC4774] and extended by [RFC8311]. | |||
| The rest of this section is structured as follows: | The rest of this section is structured as follows: | |||
| * Section 4.1 addresses the most straightforward cases, where | * Section 4.1 addresses the most straightforward cases, where | |||
| [RFC6040] can be applied directly to add ECN to tunnels that are | [RFC6040] can be applied directly to add ECN to tunnels that are | |||
| effectively IP-in-IP tunnels, but with shim header(s) between the | effectively IP-in-IP tunnels, but with a shim header(s) between | |||
| IP headers. | the IP headers. | |||
| * The subsequent sections give guidelines for adding ECN to a subnet | * The subsequent sections give guidelines for adding congestion | |||
| technology that uses feed-forward-and-up mode like IP, but it is | notification to a subnet technology that uses feed-forward-and-up | |||
| not so similar to IP that [RFC6040] rules can be applied directly. | mode like IP, but it is not so similar to IP that [RFC6040] rules | |||
| Specifically: | can be applied directly. Specifically: | |||
| - Sections 4.2, 4.3 and 4.4 respectively address how to add ECN | - Sections 4.2, 4.3, and 4.4 address how to add ECN support to | |||
| support to the wire protocol and to the encapsulators and | the wire protocol and to the encapsulators and decapsulators at | |||
| decapsulators at the ingress and egress of the subnet. | the ingress and egress of the subnet, respectively. | |||
| - Section 4.5 deals with the special, but common, case of | - Section 4.5 deals with the special but common case of sequences | |||
| sequences of tunnels or subnets that all use the same | of tunnels or subnets that all use the same technology. | |||
| technology | ||||
| - Section 4.6 deals with the question of reframing when IP | - Section 4.6 deals with the question of reframing when IP | |||
| packets do not map 1:1 into lower layer frames. | packets do not map 1:1 into lower-layer frames. | |||
| 4.1. IP-in-IP Tunnels with Shim Headers | 4.1. IP-in-IP Tunnels with Shim Headers | |||
| A common pattern for many tunnelling protocols is to encapsulate an | A common pattern for many tunnelling protocols is to encapsulate an | |||
| inner IP header with shim header(s) then an outer IP header. A shim | inner IP header with a shim header(s) then an outer IP header. A | |||
| header is defined as one that is not sufficient alone to forward the | shim header is defined as one that is not sufficient alone to forward | |||
| packet as an outer header. Another common pattern is for a shim to | the packet as an outer header. Another common pattern is for a shim | |||
| encapsulate a layer 2 (L2) header, which in turn encapsulates (or | to encapsulate an L2 header, which in turn encapsulates (or might | |||
| might encapsulate) an IP header. [I-D.ietf-tsvwg-rfc6040update-shim] | encapsulate) an IP header. [RFC9601] clarifies that [RFC6040] is | |||
| clarifies that RFC 6040 is just as applicable when there are shim(s) | just as applicable when there are shims and even an L2 header between | |||
| and possibly a L2 header between two IP headers. | two IP headers. | |||
| However, it is not always feasible or necessary to propagate ECN | However, it is not always feasible or necessary to propagate ECN | |||
| between IP headers when separated by a shim. For instance, it might | between IP headers when separated by a shim. For instance, it might | |||
| be too costly to dig to arbitrary depths to find an inner IP header, | be too costly to dig to arbitrary depths to find an inner IP header, | |||
| there may be little or no congestion within the tunnel by design (see | there may be little or no congestion within the tunnel by design (see | |||
| null mode in Section 3.4 above), or a legacy implementation might not | null mode in Section 3.4 above), or a legacy implementation might not | |||
| support ECN. In cases where a tunnel does not support ECN, it is | support ECN. In cases where a tunnel does not support ECN, it is | |||
| important that the ingress does not copy the ECN field from an inner | important that the ingress does not copy the ECN field from an inner | |||
| IP header to an outer. Therefore Section 4 of | IP header to an outer. Therefore, Section 4 of [RFC9601] requires | |||
| [I-D.ietf-tsvwg-rfc6040update-shim] requires network operators to | network operators to configure the ingress of a tunnel that does not | |||
| configure the ingress of a tunnel that does not support ECN so that | support ECN so that it zeros the ECN field in the outer IP header. | |||
| it zeros the ECN field in the outer IP header. | ||||
| Nonetheless, in many cases it is feasible to propagate the ECN field | Nonetheless, in many cases it is feasible to propagate the ECN field | |||
| between IP headers separated by shim header(s) and/or a L2 header, | between IP headers separated by shim headers and/or an L2 header, | |||
| particularly in the typical case when the outer IP header and the | particularly in the typical case when the outer IP header and the | |||
| shim(s) are added (or removed) as part of the same procedure. Even | shim(s) are added (or removed) as part of the same procedure. Even | |||
| if the shim(s) encapsulate a L2 header, it is often possible to find | if a shim encapsulates an L2 header, it is often possible to find an | |||
| an inner IP header within the L2 PDU and propagate ECN between that | inner IP header within the L2 PDU and propagate ECN between that and | |||
| and the outer IP header. This can be thought of as a special case of | the outer IP header. This can be thought of as a special case of the | |||
| the feed-up-and-forward mode (Section 3.2), so the guidelines for | feed-up-and-forward mode (Section 3.2), so the guidelines for this | |||
| this mode apply (Section 5). | mode apply (Section 5). | |||
| Numerous shim protocols have been defined for IP tunnelling. More | Numerous shim protocols have been defined for IP tunnelling. More | |||
| recent ones e.g. Geneve [RFC8926] and Generic UDP Encapsulation | recent ones, e.g., Geneve [RFC8926] and Generic UDP Encapsulation | |||
| (GUE) [I-D.ietf-intarea-gue] cite and follow RFC 6040. And some | (GUE) [INTAREA-GUE], cite and follow [RFC6040]. Some earlier ones, | |||
| earlier ones, e.g. CAPWAP [RFC5415] and LISP [RFC9300], cite RFC | e.g., CAPWAP [RFC5415] and LISP [RFC9300], cite [RFC3168], which is | |||
| 3168, which is compatible with RFC 6040. | compatible with [RFC6040]. | |||
| However, as Section 9.3 of [RFC3168] pointed out, ECN support needs | However, as Section 9.3 of [RFC3168] pointed out, ECN support needs | |||
| to be defined for many earlier shim-based tunnelling protocols, e.g. | to be defined for many earlier shim-based tunnelling protocols, e.g., | |||
| L2TPv2 [RFC2661], L2TPv3 [RFC3931], GRE [RFC2784], PPTP [RFC2637], | L2TPv2 [RFC2661], L2TPv3 [RFC3931], GRE [RFC2784], PPTP [RFC2637], | |||
| GTP [GTPv1], [GTPv1-U], [GTPv2-C] and Teredo [RFC4380] as well as | GTP [GTPv1] [GTPv1-U] [GTPv2-C], and Teredo [RFC4380], as well as | |||
| some recent ones, e.g. VXLAN [RFC7348], NVGRE [RFC7637] and NSH | some recent ones, e.g., VXLAN [RFC7348], NVGRE [RFC7637], and NSH | |||
| [RFC8300]. | [RFC8300]. | |||
| All these IP-based encapsulations can be updated in one shot by | All these IP-based encapsulations can be updated in one shot by | |||
| simple reference to RFC 6040. However, it would not be appropriate | simple reference to [RFC6040]. However, it would not be appropriate | |||
| to update all these protocols from within the present guidance | to update all these protocols from within the present guidance | |||
| document. Instead a companion specification | document. Instead, a companion specification [RFC9601] has the | |||
| [I-D.ietf-tsvwg-rfc6040update-shim] has been prepared that has the | appropriate Standards Track status to update Standards Track | |||
| appropriate standards track status to update standards track | ||||
| protocols. For those that are not under IETF change control, | protocols. For those that are not under IETF change control, | |||
| [I-D.ietf-tsvwg-rfc6040update-shim] can only recommend that the | [RFC9601] can only recommend that the relevant body updates them. | |||
| relevant body updates them. | ||||
| 4.2. Wire Protocol Design: Indication of ECN Support | 4.2. Wire Protocol Design: Indication of ECN Support | |||
| This section is intended to guide the redesign of any lower layer | This section is intended to guide the redesign of any lower-layer | |||
| protocol that encapsulate IP to add native ECN support at the lower | protocol that encapsulates IP to add built-in congestion notification | |||
| layer. It reflects the approaches used in [RFC6040] and in | support at the lower layer using feed-forward-and-up mode. It | |||
| [RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS | reflects the approaches used in [RFC6040] and in [RFC5129]. | |||
| Therefore, IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS | ||||
| encapsulations that already comply with [RFC6040] or [RFC5129] will | encapsulations that already comply with [RFC6040] or [RFC5129] will | |||
| already satisfy this guidance. | already satisfy this guidance. | |||
| A lower layer (or subnet) congestion notification system: | A lower-layer (or subnet) congestion notification system: | |||
| 1. SHOULD NOT apply explicit congestion notifications to PDUs that | 1. SHOULD NOT apply explicit congestion notifications to PDUs that | |||
| are destined for legacy layer-4 transport implementations that | are destined for legacy layer-4 transport implementations that | |||
| will not understand ECN, and | will not understand ECN; and | |||
| 2. SHOULD NOT apply explicit congestion notifications to PDUs if the | 2. SHOULD NOT apply explicit congestion notifications to PDUs if the | |||
| egress of the subnet might not propagate congestion notifications | egress of the subnet might not propagate congestion notification | |||
| onward into the higher layer. | onward into the higher layer. | |||
| We use the term ECN-PDUs for a PDU on a feedback loop that will | We use the term ECN-PDU for a PDU on a feedback loop that will | |||
| propagate congestion notification properly because it meets both | propagate congestion notification properly because it meets both | |||
| the above criteria. And a Not-ECN-PDU is a PDU on a feedback | the above criteria. Additionally, a Not-ECN-PDU is a PDU on a | |||
| loop that does not meet at least one of the criteria, and will | feedback loop that does not meet at least one of the criteria, | |||
| therefore not propagate congestion notification properly. A | and therefore will not propagate congestion notification | |||
| corollary of the above is that a lower layer congestion | properly. A corollary of the above is that a lower-layer | |||
| notification protocol: | congestion notification protocol: | |||
| 3. SHOULD be able to distinguish ECN-PDUs from Not-ECN-PDUs. | 3. SHOULD be able to distinguish ECN-PDUs from Not-ECN-PDUs. | |||
| Note that there is no need for all interior nodes within a subnet to | Note that there is no need for all interior nodes within a subnet to | |||
| be able to mark congestion explicitly. A mix of ECN and drop signals | be able to mark congestion explicitly. A mix of drop and explicit | |||
| from different nodes is fine. However, if _any_ interior nodes might | congestion signals from different nodes is fine. However, if _any_ | |||
| generate ECN markings, guideline 2 above says that all relevant | interior nodes might generate congestion markings, Guideline 2 above | |||
| egress node(s) SHOULD be able to propagate those markings up to the | says that all relevant egress nodes SHOULD be able to propagate those | |||
| higher layer. | markings up to the higher layer. | |||
| In IP, if the ECN field in each PDU is cleared to the Not-ECT (not | In IP, if the ECN field in each PDU is cleared to the Not ECN-Capable | |||
| ECN-capable transport) codepoint, it indicates that the L4 transport | Transport (Not-ECT) codepoint, it indicates that the L4 transport | |||
| will not understand congestion markings. A congested buffer must not | will not understand congestion markings. A congested buffer must not | |||
| mark these Not-ECT PDUs, and therefore has to signal congestion by | mark these Not-ECT PDUs; therefore, it has to signal congestion by | |||
| increasingly applying drop instead. | increasingly applying drop instead. | |||
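
The rule for a congested IP buffer described above can be sketched as follows. This is a minimal illustration only, assuming the 2-bit ECN codepoint values defined in [RFC3168]; the names `Ecn` and `congestion_action` are hypothetical, and the sketch deliberately leaves out the queue management logic that decides when a congestion signal is due.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """2-bit ECN field codepoints in the IP header (values per RFC 3168)."""
    NOT_ECT = 0b00  # transport will not understand congestion markings
    ECT_1 = 0b01    # ECN-capable transport
    ECT_0 = 0b10    # ECN-capable transport
    CE = 0b11       # congestion experienced


def congestion_action(ecn: Ecn) -> str:
    """Say how a congested buffer signals congestion on one packet.

    Hypothetical helper: it assumes the buffer has already decided
    that this packet should carry a congestion signal.
    """
    if ecn == Ecn.NOT_ECT:
        # Marking would go unnoticed by the L4 transport, so drop instead.
        return "drop"
    # ECN-capable packet: set (or leave) the CE codepoint.
    return "mark"
```

A lower layer need not reuse this encoding; the point is only that the system as a whole reaches the same mark-or-drop outcome.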
| The mechanism a lower layer uses to distinguish the ECN-capability of | The mechanism a lower layer uses to distinguish the ECN capability of | |||
| PDUs need not mimic that of IP. The above guidelines merely say that | PDUs need not mimic that of IP. The above guidelines merely say that | |||
| the lower layer system, as a whole, should achieve the same outcome. | the lower-layer system as a whole should achieve the same outcome. | |||
| For instance, ECN-capable feedback loops might use PDUs that are | For instance, ECN-capable feedback loops might use PDUs that are | |||
| identified by a particular set of labels or tags. Alternatively, | identified by a particular set of labels or tags. Alternatively, | |||
| logical link protocols that use flow state might determine whether a | logical-link protocols that use flow state might determine whether a | |||
| PDU can be congestion marked by checking for ECN-support in the flow | PDU can be congestion marked by checking for ECN support in the flow | |||
| state. Other protocols might depend on out-of-band control signals. | state. Other protocols might depend on out-of-band control signals. | |||
| The per-domain checking of ECN support in MPLS [RFC5129] is a good | The per-domain checking of ECN support in MPLS [RFC5129] is a good | |||
| example of a way to avoid sending congestion markings to L4 | example of a way to avoid sending congestion markings to L4 | |||
| transports that will not understand them, without using any header | transports that will not understand them without using any header | |||
| space in the subnet protocol. | space in the subnet protocol. | |||
| In MPLS, header space is extremely limited, therefore RFC5129 does | In MPLS, header space is extremely limited; therefore, [RFC5129] does | |||
| not provide a field in the MPLS header to indicate whether the PDU is | not provide a field in the MPLS header to indicate whether the PDU is | |||
| an ECN-PDU or a Not-ECN-PDU. Instead, interior nodes in a domain are | an ECN-PDU or a Not-ECN-PDU. Instead, interior nodes in a domain are | |||
| allowed to set explicit congestion indications without checking | allowed to set explicit congestion indications without checking | |||
| whether the PDU is destined for an L4 transport that will understand | whether the PDU is destined for an L4 transport that will understand | |||
| them. Nonetheless, this is made safe by requiring that the network | them. Nonetheless, this is made safe by requiring that the network | |||
| operator upgrades all decapsulating edges of a whole domain at once, | operator upgrades all decapsulating edges of a whole domain at once | |||
| as soon as even one switch within the domain is configured to mark | as soon as even one switch within the domain is configured to mark | |||
| rather than drop some PDUs during congestion. Therefore, any edge | rather than drop some PDUs during congestion. Therefore, any edge | |||
| node that might decapsulate a packet will be capable of checking | node that might decapsulate a packet will be capable of checking | |||
| whether the higher layer transport is ECN-capable. When | whether the higher-layer transport is ECN-capable. When | |||
| decapsulating a CE-marked packet, if the decapsulator discovers that | decapsulating a CE-marked packet, if the decapsulator discovers that | |||
| the higher layer (inner header) indicates the transport is not ECN- | the higher layer (inner header) indicates the transport is not ECN- | |||
| capable, it drops the packet — effectively on behalf of the earlier | capable, it drops the packet -- effectively on behalf of the earlier | |||
| congested node (see Decapsulation Guideline 1 in Section 4.4). | congested node (see Decapsulation Guideline 1 in Section 4.4). | |||
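
The deferred-drop behaviour at a decapsulating edge can be sketched as below. This is a rough illustration under stated assumptions, not the procedure of [RFC5129] itself: the outer congestion indication is reduced to a boolean, the inner header uses the IP ECN codepoints, and the names `Ecn` and `decapsulate` are hypothetical. Only the drop case is taken directly from the text above; the other two branches assume ordinary feed-forward-and-up propagation.

```python
from enum import IntEnum
from typing import Optional


class Ecn(IntEnum):
    """2-bit ECN field codepoints in the IP header (values per RFC 3168)."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def decapsulate(outer_congested: bool, inner: Ecn) -> Optional[Ecn]:
    """Return the ECN field to forward, or None if the packet is dropped.

    outer_congested: True if an interior node set an explicit congestion
    indication in the lower-layer (outer) header.
    """
    if not outer_congested:
        # Nothing to propagate: forward the inner header unchanged.
        return inner
    if inner == Ecn.NOT_ECT:
        # The transport would not understand a mark, so the egress drops
        # the packet on behalf of the earlier congested node.
        return None
    # ECN-capable transport: propagate the congestion signal upwards.
    return Ecn.CE
```

Because interior nodes mark without checking the inner header, every decapsulating edge has to implement this check before any interior node is configured to mark.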
| It was only appropriate to define such an incremental deployment | It was only appropriate to define such an incremental deployment | |||
| strategy because MPLS is targeted solely at professional operators, | strategy because MPLS is targeted solely at professional operators | |||
| who can be expected to ensure that a whole subnetwork is consistently | who can be expected to ensure that a whole subnetwork is consistently | |||
| configured. This strategy might not be appropriate for other link | configured. This strategy might not be appropriate for other link | |||
| technologies targeted at zero-configuration deployment or deployment | technologies targeted at zero-configuration deployment or deployment | |||
| by the general public (e.g. Ethernet). For such 'plug-and-play' | by the general public (e.g., Ethernet). For such 'plug-and-play' | |||
| environments it will be necessary to invent a failsafe approach that | environments, it will be necessary to invent a fail-safe approach | |||
| ensures congestion markings will never fall into black holes, no | that ensures congestion markings will never fall into black holes, no | |||
| matter how inconsistently a system is put together. Alternatively, | matter how inconsistently a system is put together. Alternatively, | |||
| congestion notification relying on correct system configuration could | congestion notification relying on correct system configuration could | |||
| be confined to flavours of Ethernet intended only for professional | be confined to flavours of Ethernet intended only for professional | |||
| network operators, such as Provider Backbone Bridges (PBB | network operators, such as Provider Backbone Bridges (PBB) | |||
| [IEEE802.1Q]; previously 802.1ah). | ([IEEE802.1Q]; previously 802.1ah). | |||
| ECN support in TRILL [I-D.ietf-trill-ecn-support] provides a good | ECN support in TRansparent Interconnection of Lots of Links (TRILL) | |||
| example of how to add ECN to a lower layer protocol without relying | [RFC9600] provides a good example of how to add congestion | |||
| on careful and consistent operator configuration. TRILL provides an | notification to a lower-layer protocol without relying on careful and | |||
| extension header word with space for flags of different categories | consistent operator configuration. TRILL provides an extension | |||
| depending on whether logic to understand the extension is critical. | header word with space for flags of different categories depending on | |||
| The congestion experienced marking has been defined as a 'critical | whether logic to understand the extension is critical. The | |||
| ingress-to-egress' flag. So if a transit RBridge sets this flag on a | congestion-experienced marking has been defined as a 'critical | |||
| frame and an egress RBridge does not have any logic to process it, it | ingress-to-egress' flag. So, if a transit RBridge sets this flag on | |||
| will drop it; which is the desired default action anyway. Therefore | a frame and an egress RBridge does not have any logic to process it, | |||
| TRILL RBridges can be updated with support for ECN in no particular | the egress RBridge will drop the frame, which is the desired default | |||
| order and, at the egress of the TRILL campus, congestion notification | action anyway. Therefore, TRILL RBridges can be updated with support | |||
| will be propagated to IP as ECN whenever ECN logic has been | for congestion notification in no particular order and, at the egress | |||
| implemented at the egress, or as drop otherwise. | of the TRILL campus, congestion notification will be propagated to IP | |||
| as ECN whenever ECN logic has been implemented at the egress, or as | ||||
| drop otherwise. | ||||
| QCN [IEEE802.1Q] is not intended to extend beyond a single subnet, or | QCN [IEEE802.1Q] is not intended to extend beyond a single subnet or | |||
| to interoperate with ECN. Nonetheless, the way QCN indicates to | interoperate with IP-ECN. Nonetheless, the way QCN indicates to | |||
| lower layer devices that the end-points will not understand QCN | lower-layer devices that the endpoints will not understand QCN | |||
| provides another example that a lower layer protocol designer might | provides another example that a lower-layer protocol designer might | |||
| be able to mimic for their scenario. An operator can define certain | be able to mimic for their scenario. An operator can define certain | |||
| Priority Code Points (PCPs [IEEE802.1Q]; previously 802.1p) to | Priority Code Points (PCPs [IEEE802.1Q]; previously 802.1p) to | |||
| indicate non-QCN frames and an ingress bridge is required to map | indicate non-QCN frames. Then an ingress bridge has to map each | |||
| arriving not-QCN-capable IP packets to one of these non-QCN PCPs. | arriving not-QCN-capable IP packet to one of these non-QCN PCPs. | |||
| When drop for non-ECN traffic is deferred to the egress of a subnet, | When drop for non-ECN traffic is deferred to the egress of a subnet, | |||
| it cannot necessarily be assumed that one ECN mark is equivalent to | it cannot necessarily be assumed that one congestion mark is | |||
| one drop, as was originally required by [RFC3168]. [RFC8311] updated | equivalent to one drop, as was originally required by [RFC3168]. | |||
| RFC 3168, to allow experimentation with congestion markings that are | [RFC8311] updated [RFC3168] to allow experimentation with congestion | |||
| not equivalent to drop, in particular for L4S [RFC9331]. ECN support | markings that are not equivalent to drop, particularly for L4S | |||
| in TRILL [I-D.ietf-trill-ecn-support] is a good example of a way to | [RFC9331]. ECN support in TRILL [RFC9600] is a good example of a way | |||
| defer drop to the egress of a subnet both when marks are equivalent | to defer drop to the egress of a subnet both when marks are | |||
| to drops (as in RFC 3168) and when they are not (as in L4S). The ECN | equivalent to drops (as in [RFC3168]) and when they are not (as in | |||
| scheme for MPLS [RFC5129] was defined before L4S, so it only | L4S). The ECN scheme for MPLS [RFC5129] was defined before L4S, so | |||
| currently supports deferred drop that is equivalent to ECN-marking. | it only currently supports deferred drop that is equivalent to ECN | |||
| Nonetheless, in principle, MPLS (and potentially future L2 protocols) | marking. Nonetheless, in principle, MPLS (and potentially future L2 | |||
| could support L4S marking and copy TRILL's approach for determining | protocols) could support L4S marking by copying TRILL's approach for | |||
| the drop level of any non-ECN traffic at the subnet egress. | determining the drop level of any non-ECN traffic at the subnet | |||
| egress. | ||||
| 4.3. Encapsulation Guidelines | 4.3. Encapsulation Guidelines | |||
| This section is intended to guide the redesign of any node that | This section is intended to guide the redesign of any node that | |||
| encapsulates IP with a lower layer header when adding native ECN | encapsulates IP with a lower-layer header when adding built-in | |||
| support to the lower layer protocol. It reflects the approaches used | congestion notification support to the lower-layer protocol using | |||
| in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or IP-in- | feed-forward-and-up mode. It reflects the approaches used in | |||
| MPLS or MPLS-in-MPLS encapsulations that already comply with | [RFC6040] and [RFC5129]. Therefore, IP-in-IP tunnels or IP-in-MPLS | |||
| [RFC6040] or [RFC5129] will already satisfy this guidance. | or MPLS-in-MPLS encapsulations that already comply with [RFC6040] or | |||
| [RFC5129] will already satisfy this guidance. | ||||
| 1. Egress Capability Check: A subnet ingress needs to be sure that | 1. Egress Capability Check: A subnet ingress needs to be sure that | |||
| the corresponding egress of a subnet will propagate any | the corresponding egress of a subnet will propagate any | |||
| congestion notification added to the outer header across the | congestion notification added to the outer header across the | |||
| subnet. This is necessary in addition to checking that an | subnet. This is necessary in addition to checking that an | |||
| incoming PDU indicates an ECN-capable (L4) transport. Examples | incoming PDU indicates an ECN-capable (L4) transport. Examples | |||
| of how this guarantee might be provided include: | of how this guarantee might be provided include: | |||
| * by configuration (e.g. if any label switches in a domain | * by configuration (e.g., if any label switch in a domain | |||
| support ECN marking, [RFC5129] requires all egress nodes to | supports congestion marking, [RFC5129] requires all egress | |||
| have been configured to propagate ECN) | nodes to have been configured to propagate ECN). | |||
| * by the ingress explicitly checking that the egress propagates | * by the ingress explicitly checking that the egress propagates | |||
| ECN (e.g. an early attempt to add ECN support to TRILL used | ECN (e.g., an early attempt to add ECN support to TRILL used | |||
| IS-IS to check path capabilities before adding ECN extension | IS-IS to check path capabilities before adding ECN extension | |||
| flags to each frame [RFC7780]). | flags to each frame [RFC7780]). | |||
| * by inherent design of the protocol (e.g. by encoding ECN | * by inherent design of the protocol (e.g., by encoding | |||
| marking on the outer header in such a way that a legacy egress | congestion marking on the outer header in such a way that a | |||
| that does not understand ECN will consider the PDU corrupt or | legacy egress that does not understand ECN will consider the | |||
| invalid and discard it, thus at least propagating a form of | PDU corrupt or invalid and discard it; thus, at least | |||
| congestion signal). | propagating a form of congestion signal). | |||
| 2. Egress Fails Capability Check: If the ingress cannot guarantee | 2. Egress Fails Capability Check: If the ingress cannot guarantee | |||
| that the egress will propagate congestion notification, the | that the egress will propagate congestion notification, the | |||
| ingress SHOULD disable ECN at the lower layer when it forwards | ingress SHOULD disable congestion notification at the lower layer | |||
| the PDU. An example of how the ingress might disable ECN at the | when it forwards the PDU. An example of how the ingress might | |||
| lower layer would be by setting the outer header of the PDU to | disable congestion notification at the lower layer would be by | |||
| identify it as a Not-ECN-PDU, assuming the subnet technology | setting the outer header of the PDU to identify it as a Not-ECN- | |||
| supports such a concept. | PDU, assuming the subnet technology supports such a concept. | |||
| 3. Standard Congestion Monitoring Baseline: Once the ingress to a | 3. Standard Congestion Monitoring Baseline: Once the ingress to a | |||
| subnet has established that the egress will correctly propagate | subnet has established that the egress will correctly propagate | |||
| ECN, on encapsulation it SHOULD encode the same level of | ECN, on encapsulation, it SHOULD encode the same level of | |||
| congestion in outer headers as is arriving in incoming headers. | congestion in outer headers as is arriving in incoming headers. | |||
| For example, it might copy any incoming congestion notification | For example, it might copy any incoming congestion notifications | |||
| into the outer header of the lower layer protocol. | into the outer header of the lower-layer protocol. | |||
| This ensures that bulk congestion monitoring of outer headers | This ensures that bulk congestion monitoring of outer headers | |||
| (e.g. by a network management node monitoring ECN in passing | (e.g., by a network management node monitoring congestion | |||
| frames) will measure congestion accumulated along the whole | markings in passing frames) will measure congestion accumulated | |||
| upstream path — starting from the Load Regulator not just | along the whole upstream path, starting from the Load Regulator | |||
| starting from the ingress of the subnet. A node that is not the | and not just starting from the ingress of the subnet. A node | |||
| Load Regulator SHOULD NOT re-initialize the level of CE markings | that is not the Load Regulator SHOULD NOT re-initialize the level | |||
| in the outer to zero. | of CE markings in the outer header to zero. | |||
| It would still also be possible to measure congestion introduced | It would still also be possible to measure congestion introduced | |||
| across one subnet (or tunnel) by subtracting the level of CE | across one subnet (or tunnel) by subtracting the level of CE | |||
| markings on inner headers from that on outer headers (see | markings on inner headers from that on outer headers (see | |||
| Appendix C of [RFC6040]). For example: | Appendix C of [RFC6040]). For example: | |||
| * If this guideline has been followed and if the level of CE | * If this guideline has been followed and if the level of CE | |||
| markings is 0.4% on the outer and 0.1% on the inner, 0.4% | markings is 0.4% on the outer header and 0.1% on the inner | |||
| congestion has been introduced across all the networks since | header, 0.4% congestion has been introduced across all the | |||
| the load regulator, and 0.3% (= 0.4% - 0.1%) has been | networks since the Load Regulator, and 0.3% (= 0.4% - 0.1%) | |||
| introduced since the ingress to the current subnet (or | has been introduced since the ingress to the current subnet | |||
| tunnel); | (or tunnel). | |||
| * Without this guideline, if the subnet ingress had re- | * Without this guideline, if the subnet ingress had re- | |||
| initialized the outer congestion level to zero, the outer and | initialized the outer congestion level to zero, the outer and | |||
| inner would measure 0.3% and 0.1%. It would still be possible | inner headers would measure 0.3% and 0.1%. It would still be | |||
| to infer that the congestion introduced since the Load | possible to infer that the congestion introduced since the | |||
| Regulator was 0.4% (= 0.1% + 0.3%). But only if the | Load Regulator was 0.4% (= 0.1% + 0.3%), but only if the | |||
| monitoring system somehow knows whether the subnet ingress re- | monitoring system somehow knows whether the subnet ingress re- | |||
| initialized the congestion level. | initialized the congestion level. | |||
| As long as subnet and tunnel technologies use the standard | As long as subnet and tunnel technologies use the standard | |||
| congestion monitoring baseline in this guideline, monitoring | congestion monitoring baseline in this guideline, monitoring | |||
| systems will know to use the former approach, rather than having | systems will know to use the former approach rather than having | |||
| to "somehow know" which approach to use. | to 'somehow know' which approach to use. | |||
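The monitoring arithmetic above can be illustrated with a minimal sketch. The function name `congestion_breakdown` and its `ingress_reset_outer` flag are hypothetical, introduced only for this illustration. Like the example in the text, it assumes marking levels are small enough to be combined by simple addition and subtraction.

```python
def congestion_breakdown(outer_pct, inner_pct, ingress_reset_outer=False):
    """Return (since_load_regulator, since_subnet_ingress) in percent.

    outer_pct / inner_pct: level of CE markings measured on the outer and
    inner headers at the monitoring point.
    ingress_reset_outer: hypothetical flag; True if the subnet ingress
    re-initialized the outer congestion level to zero (not recommended).
    """
    if ingress_reset_outer:
        # The outer header only counts congestion added within this subnet,
        # so the upstream congestion on the inner header must be added back.
        since_ingress = outer_pct
        since_regulator = outer_pct + inner_pct
    else:
        # Guideline followed: the outer header accumulates the whole
        # upstream path, and the inner header shows what arrived at ingress.
        since_regulator = outer_pct
        since_ingress = outer_pct - inner_pct
    return since_regulator, since_ingress


total, within_subnet = congestion_breakdown(outer_pct=0.4, inner_pct=0.1)
```

The flag makes the point of the guideline explicit: a monitoring system needs this extra piece of knowledge only when some ingress nodes re-initialize the outer header.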
| 4.4. Decapsulation Guidelines | 4.4. Decapsulation Guidelines | |||
| This section is intended to guide the redesign of any node that | This section is intended to guide the redesign of any node that | |||
| decapsulates IP from within a lower layer header when adding native | decapsulates IP from within a lower-layer header when adding built-in | |||
| ECN support to the lower layer protocol. It reflects the approaches | congestion notification support to the lower-layer protocol using | |||
| used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or | feed-forward-and-up mode. It reflects the approaches used in | |||
| IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with | [RFC6040] and in [RFC5129]. Therefore, IP-in-IP tunnels or IP-in- | |||
| MPLS or MPLS-in-MPLS encapsulations that already comply with | ||||
| [RFC6040] or [RFC5129] will already satisfy this guidance. | [RFC6040] or [RFC5129] will already satisfy this guidance. | |||
| A subnet egress SHOULD NOT simply copy congestion notification from | A subnet egress SHOULD NOT simply copy congestion notifications from | |||
| outer headers to the forwarded header. It SHOULD calculate the | outer headers to the forwarded header. It SHOULD calculate the | |||
| outgoing congestion notification field from the inner and outer | outgoing congestion notification field from the inner and outer | |||
| headers using the following guidelines. If there is any conflict, | headers using the following guidelines. If there is any conflict, | |||
| rules earlier in the list take precedence over rules later in the | rules earlier in the list take precedence over rules later in the | |||
| list: | list. | |||
| 1. If the arriving inner header is a Not-ECN-PDU it implies the L4 | 1. If the arriving inner header is a Not-ECN-PDU, it implies the L4 | |||
| transport will not understand explicit congestion markings. | transport will not understand explicit congestion markings. | |||
| Then: | Then: | |||
| * If the outer header carries an explicit congestion marking, it | * If the outer header carries an explicit congestion marking, it | |||
| is likely that a protocol error has occurred, so drop is the | is likely that a protocol error has occurred, so drop is the | |||
| only indication of congestion that the L4 transport will | only indication of congestion that the L4 transport will | |||
| understand. If the congestion marking is the most severe | understand. If the outer congestion marking is the most | |||
| possible, the packet MUST be dropped. However, if congestion | severe possible, the packet MUST be dropped. However, if | |||
| can be marked with multiple levels of severity and the | congestion can be marked with multiple levels of severity and | |||
| packet's marking is not the most severe, this requirement can | the packet's outer marking is not the most severe, this | |||
| be relaxed to: the packet SHOULD be dropped. | requirement can be relaxed to: the packet SHOULD be dropped. | |||
| * If the outer is an ECN-PDU that carries no indication of | * If the outer is an ECN-PDU that carries no indication of | |||
| congestion or a Not-ECN-PDU the PDU SHOULD be forwarded, but | congestion or a Not-ECN-PDU the PDU SHOULD be forwarded, but | |||
| still as a Not-ECN-PDU. | still as a Not-ECN-PDU. | |||
| 2. If the outer header does not support explicit congestion | 2. If the outer header does not support congestion notification (a | |||
| notification (a Not-ECN-PDU), but the inner header does (an ECN- | Not-ECN-PDU), but the inner header does (an ECN-PDU), the inner | |||
| PDU), the inner header SHOULD be forwarded unchanged. | header SHOULD be forwarded unchanged. | |||
| 3. In some lower layer protocols congestion may be signalled as a | 3. In some lower-layer protocols, congestion may be signalled as a | |||
| numerical level, such as in the control frames of quantized | numerical level, such as in the control frames of QCN | |||
| congestion notification (QCN [IEEE802.1Q]). If such a multi-bit | [IEEE802.1Q]. If such a multi-bit encoding encapsulates an ECN- | |||
| encoding encapsulates an ECN-capable IP data packet, a function | capable IP data packet, a function will be needed to convert the | |||
| will be needed to convert the quantized congestion level into the | quantized congestion level into the frequency of congestion | |||
| frequency of congestion markings in outgoing IP packets. | markings in outgoing IP packets. | |||
| 4. Congestion indications might be encoded by a severity level. For | 4. Congestion indications might be encoded by a severity level. For | |||
| instance increasing levels of congestion might be encoded by | instance, increasing levels of congestion might be encoded by | |||
| numerically increasing indications, e.g. pre-congestion | numerically increasing indications, e.g., PCN can be encoded in | |||
| notification (PCN) can be encoded in each PDU at three severity | each PDU at three severity levels in IP or MPLS [RFC6660] and the | |||
| levels in IP or MPLS [RFC6660] and the default encapsulation and | default encapsulation and decapsulation rules [RFC6040] are | |||
| decapsulation rules [RFC6040] are compatible with this | compatible with this interpretation of the ECN field. | |||
| interpretation of the ECN field. | ||||
| If the arriving inner header is an ECN-PDU, where the inner and | If the arriving inner header is an ECN-PDU, where the inner and | |||
| outer headers carry indications of congestion of different | outer headers carry indications of congestion of different | |||
| severity, the more severe indication SHOULD be forwarded in | severity, the more severe indication SHOULD be forwarded in | |||
| preference to the less severe. | preference to the less severe. | |||
| 5. The inner and outer headers might carry a combination of | 5. The inner and outer headers might carry a combination of | |||
| congestion notification fields that should not be possible given | congestion notification fields that should not be possible given | |||
| any currently used protocol transitions. For instance, if | any currently used protocol transitions. For instance, if | |||
| Encapsulation Guideline 3 in Section 4.3 had been followed, it | Encapsulation Guideline 3 in Section 4.3 had been followed, it | |||
| should not be possible to have a less severe indication of | should not be possible to have a less severe indication of | |||
| congestion in the outer than in the inner. It MAY be appropriate | congestion in the outer header than in the inner header. It MAY | |||
| to log unexpected combinations of headers and possibly raise an | be appropriate to log unexpected combinations of headers and | |||
| alarm. | possibly raise an alarm. | |||
| If a safe outgoing codepoint can be defined for such a PDU, the | If a safe outgoing codepoint can be defined for such a PDU, the | |||
| PDU SHOULD be forwarded rather than dropped. Some implementers | PDU SHOULD be forwarded rather than dropped. Some implementers | |||
| discard PDUs with currently unused combinations of headers just | discard PDUs with currently unused combinations of headers just | |||
| in case they represent an attack. However, an approach using | in case they represent an attack. However, an approach using | |||
| alarms and policy-mediated drop is preferable to hard-coded drop, | alarms and policy-mediated drop is preferable to hard-coded drop | |||
| so that operators can keep track of possible attacks but | so that operators can keep track of possible attacks, but | |||
| currently unused combinations are not precluded from future use | currently unused combinations are not precluded from future use | |||
| through new standards actions. | through new standards actions. | |||
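As a rough illustration of how the precedence among Decapsulation Guidelines 1, 2, 4, and 5 might be applied, the sketch below uses a hypothetical abstract encoding that is not part of any specification: `None` stands for a Not-ECN-PDU, `0` for an ECN-PDU carrying no congestion indication, and positive integers for increasing severity of congestion marking. Guideline 3 (conversion of a quantized numerical level) is not modelled, and the sketch always drops in the Guideline 1 case even where the text relaxes the requirement to SHOULD.

```python
import logging

NOT_ECN = None  # hypothetical encoding of a Not-ECN-PDU


def decapsulate(inner, outer):
    """Return ('forward', codepoint) or ('drop', None) for the forwarded PDU.

    inner, outer: NOT_ECN, or an integer severity where 0 means
    ECN-capable but unmarked and higher values mean more severe congestion.
    """
    # Guideline 1: the L4 transport will not understand explicit markings.
    if inner is NOT_ECN:
        if outer is not NOT_ECN and outer > 0:
            # Drop is the only signal the transport will understand.
            return ("drop", None)
        return ("forward", NOT_ECN)

    # Guideline 2: the outer header does not support congestion notification.
    if outer is NOT_ECN:
        return ("forward", inner)

    # Guideline 5: a less severe outer than inner should not occur if the
    # encapsulator followed Section 4.3; log it rather than hard-code a drop.
    if outer < inner:
        logging.warning("unexpected header combination: inner=%s outer=%s",
                        inner, outer)

    # Guideline 4: forward the more severe of the two indications.
    return ("forward", max(inner, outer))
```

Because the checks are ordered as in the list, an earlier guideline always takes precedence over a later one when they conflict.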
| 4.5. Sequences of Similar Tunnels or Subnets | 4.5. Sequences of Similar Tunnels or Subnets | |||
| In some deployments, particularly in 3GPP networks, an IP packet may | In some deployments, particularly in 3GPP networks, an IP packet may | |||
| traverse two or more IP-in-IP tunnels in sequence that all use | traverse two or more IP-in-IP tunnels in sequence that all use | |||
| identical technology (e.g. GTP). | identical technology (e.g., GTP). | |||
| In such cases, it would be sufficient for every encapsulation and | In such cases, it would be sufficient for every encapsulation and | |||
| decapsulation in the chain to comply with RFC 6040. Alternatively, | decapsulation in the chain to comply with [RFC6040]. Alternatively, | |||
| as an optimisation, a node that decapsulates a packet and immediately | as an optimization, a node that decapsulates a packet and immediately | |||
| re-encapsulates it for the next tunnel MAY copy the incoming outer | re-encapsulates it for the next tunnel MAY copy the incoming outer | |||
| ECN field directly to the outgoing outer and the incoming inner ECN | ECN field directly to the outgoing outer header and the incoming | |||
| field directly to the outgoing inner. Then the overall behavior | inner ECN field directly to the outgoing inner header. Then, the | |||
| across the sequence of tunnel segments would still be consistent with | overall behaviour across the sequence of tunnel segments would still | |||
| RFC 6040. | be consistent with [RFC6040]. | |||
| Appendix C of RFC6040 describes how a tunnel egress can monitor how | Appendix C of [RFC6040] describes how a tunnel egress can monitor how | |||
| much congestion has been introduced within a tunnel. A network | much congestion has been introduced within a tunnel. A network | |||
| operator might want to monitor how much congestion had been | operator might want to monitor how much congestion had been | |||
| introduced within a whole sequence of tunnels. Using the technique | introduced within a whole sequence of tunnels. Using the technique | |||
| in Appendix C of RFC6040 at the final egress, the operator could | in Appendix C of [RFC6040] at the final egress, the operator could | |||
| monitor the whole sequence of tunnels, but only if the above | monitor the whole sequence of tunnels, but only if the above | |||
| optimisation were used consistently along the sequence of tunnels, in | optimization were used consistently along the sequence of tunnels, in | |||
| order to make it appear as a single tunnel. Therefore, tunnel | order to make it appear as a single tunnel. Therefore, tunnel | |||
| endpoint implementations SHOULD allow the operator to configure | endpoint implementations SHOULD allow the operator to configure | |||
| whether this optimisation is enabled. | whether this optimization is enabled. | |||
| When ECN support is added to a subnet technology, consideration | When congestion notification support is added to a subnet technology, | |||
| SHOULD be given to a similar optimisation between subnets in sequence | consideration SHOULD be given to a similar optimization between | |||
| if they all use the same technology. | subnets in sequence if they all use the same technology. | |||
| 4.6. Reframing and Congestion Markings | 4.6. Reframing and Congestion Markings | |||
| The guidance in this section is worded in terms of framing | The guidance in this section is worded in terms of framing | |||
| boundaries, but it applies equally whether the protocol data units | boundaries, but it applies equally whether the PDUs are frames, | |||
| are frames, cells or packets. | cells, or packets. | |||
| Where an AQM marks the ECN field of IP packets as they queue into a | Where an AQM marks the ECN field of IP packets as they queue into a | |||
| layer-2 link, there will be no problem with framing boundaries, | Layer 2 link, there will be no problem with framing boundaries | |||
| because the ECN markings would be applied directly to IP packets. | because the ECN markings would be applied directly to IP packets. | |||
| The guidance in this section is only applicable where an ECN | The guidance in this section is only applicable where a congestion | |||
| capability is being added to a layer-2 protocol so that layer-2 | notification capability is being added to a Layer 2 protocol so that | |||
| frames can be ECN-marked by an AQM at layer-2. This would only be | Layer 2 frames can be marked by an AQM at layer 2. This would only | |||
| necessary where AQM will be applied at pure layer-2 nodes (without | be necessary where AQM will be applied at pure Layer 2 nodes (without | |||
| IP-awareness). | IP awareness). | |||
| Where ECN marking has had to be applied at non-IP-aware nodes and | Where congestion marking has had to be applied at non-IP-aware nodes | |||
| framing boundaries do not necessarily align with packet boundaries, | and framing boundaries do not necessarily align with packet | |||
| the decapsulating IP forwarding node SHOULD propagate ECN markings | boundaries, the decapsulating IP forwarding node SHOULD propagate | |||
| from layer-2 frame headers to IP packets that may have different | congestion markings from Layer 2 frame headers to IP packets that may | |||
| boundaries as a consequence of reframing. | have different boundaries as a consequence of reframing. | |||
| Two possible design goals for propagating congestion indications, | Two possible design goals for propagating congestion indications, | |||
| described in Section 5.3 of [RFC3168] and Section 2.4 of [RFC7141], | described in Section 5.3 of [RFC3168] and Section 2.4 of [RFC7141], | |||
| are: | are: | |||
| 1. approximate preservation of the presence (and therefore timing) | 1. approximate preservation of the presence (and therefore timing) | |||
| of congestion marks on the L2 frames used to construct an IP | of congestion marks on the L2 frames used to construct an IP | |||
| packet; | packet; | |||
| a. at high frequency of congestion marking, approximate | 2. a. at high frequency of congestion marking, approximate | |||
| preservation of the proportion of congestion marks arriving | preservation of the proportion of congestion marks arriving | |||
| and departing; | and departing; | |||
| b. at low frequency of congestion marking, approximate | b. at low frequency of congestion marking, approximate | |||
| preservation of the timing of congestion marks arriving and | preservation of the timing of congestion marks arriving and | |||
| departing. | departing. | |||
| In either case, an implementation SHOULD ensure that any new incoming | In either case, an implementation SHOULD ensure that any new incoming | |||
| congestion indication is propagated immediately, not held awaiting | congestion indication is propagated immediately; not held awaiting | |||
| the possibility of further congestion indications to be sufficient to | the possibility of further congestion indications to be sufficient to | |||
| indicate congestion on an outgoing PDU [RFC7141]. Nonetheless, to | indicate congestion on an outgoing PDU [RFC7141]. Nonetheless, to | |||
| facilitate pipelined implementation, it would be acceptable for | facilitate pipelined implementation, it would be acceptable for | |||
| congestion marks to propagate to a slightly later IP packet. | congestion marks to propagate to a slightly later IP packet. | |||
| At decapsulation in either case: | At decapsulation in either case: | |||
| * ECN marking propagation logically occurs before application of | * ECN-marking propagation logically occurs before application of | |||
| Decapsulation Guideline 1 in Section 4.4. For instance, if ECN | Decapsulation Guideline 1 in Section 4.4. For instance, if ECN- | |||
| marking propagation would cause an ECN congestion indication to be | marking propagation would cause an ECN congestion indication to be | |||
| applied to an IP packet that is a Not-ECN-PDU, then that IP packet | applied to an IP packet that is a Not-ECN-PDU, then that IP packet | |||
| is dropped in accordance with Guideline 1; | is dropped in accordance with Guideline 1. | |||
| * where a mix of ECN-PDUs and non-ECN-PDUs arrives to construct the | * Where a mix of ECN-PDUs and non-ECN-PDUs arrives to construct the | |||
| same IP packet, the decapsulation spec SHOULD require that packet | same IP packet, the decapsulation specification SHOULD require | |||
| to be discarded. | that packet to be discarded. | |||
| * where a mix of different types of ECN-PDUs arrives to construct | * Where a mix of different types of ECN-PDUs arrives to construct | |||
| the same IP packet, e.g. a mix of frames that map to ECT(0) and | the same IP packet, e.g., a mix of frames that map to ECT(0) and | |||
| ECT(1) IP packets, the decapsulation spec might consider this a | ECT(1) IP packets, the decapsulation specification might consider | |||
| protocol error. But, if the lower layer protocol has defined such | this a protocol error. But, if the lower-layer protocol has | |||
| a mix of types of ECN-PDU as valid, it SHOULD require the | defined such a mix of types of ECN-PDU as valid, it SHOULD require | |||
| resulting IP packet to be set to either ECT(0) or ECT(1). In this | the resulting IP packet to be set to either ECT(0) or ECT(1). In | |||
| case, it SHOULD take into account that the RFC series has so far | this case, it SHOULD take into account that the RFC Series has so | |||
| allowed ECT(0) and ECT(1) to be considered equivalent [RFC3168], | far allowed ECT(0) and ECT(1) to be considered equivalent | |||
| or ECT(1) can provide a less severe congestion marking than CE | [RFC3168]; or ECT(1) can provide a less severe congestion marking | |||
| [RFC6040], or ECT(1) can indicate an unmarked but ECN-capable | than CE [RFC6040]; or ECT(1) can indicate an unmarked but ECN- | |||
| packet that is subject to a different marking algorithm to ECT(0) | capable packet that is subject to a different marking algorithm to | |||
| packets, for example L4S [RFC8311] [RFC9331]. | ECT(0) packets, e.g., L4S [RFC8311] [RFC9331]. | |||
| The following are two ways that goal 1 might be achieved, but they | The following are two ways that goal 1 might be achieved, but they | |||
| are not intended to be the only ways: | are not intended to be the only ways: | |||
| * Every IP PDU that is constructed, in whole or in part, from an L2 | * Every IP PDU that is constructed, in whole or in part, from an L2 | |||
| frame that is marked with a congestion signal, has that signal | frame that is marked with a congestion signal has that signal | |||
| propagated to it; | propagated to it. | |||
| * Every L2 frame that is marked with a congestion signal, propagates | * Every L2 frame that is marked with a congestion signal propagates | |||
| that signal to one IP PDU which is constructed, in whole or in | that signal to one IP PDU that is constructed from it in whole or | |||
| part, from it. If multiple IP PDUs meet this description, the | in part. If multiple IP PDUs meet this description, the choice | |||
| choice can be made arbitrarily but ought to be consistent. | can be made arbitrarily but ought to be consistent. | |||
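The two approaches to goal 1 can be contrasted with a small sketch. It assumes, purely for illustration, that frames and reassembled IP PDUs are described as half-open byte ranges over the same byte stream; the function names and the "first overlapping PDU" tie-break are hypothetical choices, not taken from the guidelines.

```python
def overlaps(a, b):
    """True if half-open byte ranges a=(start, end) and b=(start, end) share octets."""
    return a[0] < b[1] and b[0] < a[1]


def mark_every_overlapping(frames, packets):
    """First approach: every IP PDU built in whole or in part from a
    marked L2 frame is marked.

    frames: list of (start, end, marked); packets: list of (start, end).
    """
    return [any(m and overlaps((fs, fe), p) for fs, fe, m in frames)
            for p in packets]


def mark_one_per_frame(frames, packets):
    """Second approach: each marked L2 frame propagates its mark to exactly
    one IP PDU built from it; here, consistently, the first one."""
    marked = [False] * len(packets)
    for fs, fe, m in frames:
        if not m:
            continue
        for i, p in enumerate(packets):
            if overlaps((fs, fe), p):
                marked[i] = True
                break
    return marked


frames = [(0, 100, False), (100, 200, True), (200, 300, False)]
packets = [(0, 150), (150, 300)]
```

With one marked frame straddling two IP PDUs, the first approach marks both PDUs while the second marks only one, which shows how the choice affects the number of marks when framing boundaries differ.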
| The following gives one way that goal 2 might be achieved, but it is | The following gives one way that goal 2 might be achieved, but it is | |||
| not intended to be the only way: | not intended to be the only way: | |||
| * For each of the streams of frames that encapsulate the IP packets | * For each of the streams of frames that encapsulate the IP packets | |||
| of each IP-ECN codepoint and follow the same path through the | of each IP-ECN codepoint and follow the same path through the | |||
| subnet, a counter ('in') tracks octets arriving within the payload | subnet, a counter ('in') tracks octets arriving within the payload | |||
| of marked L2 frames and another ('out') tracks octets departing in | of marked L2 frames and another ('out') tracks octets departing in | |||
| marked IP packets. While 'in' exceeds 'out', forwarded IP packets | marked IP packets. While 'in' exceeds 'out', forwarded IP packets | |||
| are ECN-marked. If 'out' exceeds 'in' for longer than a timeout, | are ECN-marked. If 'out' exceeds 'in' for longer than a timeout, | |||
| both counters are zeroed, to ensure that the start of the next | both counters are zeroed to ensure that the start of the next | |||
| congestion episode propagates immediately. The 'out' counter | congestion episode propagates immediately. The 'out' counter | |||
| includes octets in reconstructed IP packets that would have been | includes octets in reconstructed IP packets that would have been | |||
| marked, but had to be dropped because they were Not-ECN-PDUs (by | marked, but had to be dropped because they were Not-ECN-PDUs (by | |||
| Decapsulation Guideline 1 in Section 4.4). | Decapsulation Guideline 1 in Section 4.4). | |||
| Generally, the number of L2 frames may be higher (e.g. ATM), similar | Generally, relative to the number of IP PDUs, the number of L2 frames | |||
| to, or lower (e.g. 802.11 aggregation at a L2-only station) than the | may be higher (e.g., ATM), roughly the same, or lower (e.g., 802.11 | |||
| number of IP PDUs, and this distinction may influence the choice of | aggregation at an L2-only station). This distinction may influence | |||
| mechanism. | the choice of mechanism. | |||
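The counter-based way of achieving goal 2 described above could be sketched as follows. This is one possible reading, not a definitive implementation: the class name, method names, and the explicit `now` timestamps are assumptions made for the illustration, and one instance would be needed per stream of frames (per IP-ECN codepoint and path).

```python
class MarkPropagator:
    """Propagate congestion marks from L2 frames to IP packets by octets."""

    def __init__(self, timeout):
        self.in_octets = 0    # 'in': payload octets of marked L2 frames
        self.out_octets = 0   # 'out': octets of marked (or dropped) IP packets
        self.timeout = timeout
        self.out_ahead_since = None  # when 'out' first exceeded 'in'

    def _check_timeout(self, now):
        if self.out_octets > self.in_octets:
            if self.out_ahead_since is None:
                self.out_ahead_since = now
            elif now - self.out_ahead_since > self.timeout:
                # Zero both so the next congestion episode propagates at once.
                self.in_octets = self.out_octets = 0
                self.out_ahead_since = None
        else:
            self.out_ahead_since = None

    def frame_arrived(self, payload_octets, marked, now):
        if marked:
            self.in_octets += payload_octets
        self._check_timeout(now)

    def packet_departing(self, octets, ecn_capable, now):
        """Return 'mark', 'drop', or 'forward' for a reconstructed IP packet."""
        self._check_timeout(now)
        if self.in_octets > self.out_octets:
            # Dropped Not-ECN-PDUs also count towards 'out'.
            self.out_octets += octets
            self._check_timeout(now)
            return "mark" if ecn_capable else "drop"
        return "forward"
```

Counting octets rather than frames keeps the proportion of marked traffic roughly constant even when the number of L2 frames differs from the number of IP PDUs.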
| 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | |||
| Notification | Notification | |||
| The guidance in this section is applicable, for example, when IP | The guidance in this section is applicable, for example, when IP | |||
| packets: | packets: | |||
| * are encapsulated in Ethernet headers, which have no support for | * are encapsulated in Ethernet headers, which have no support for | |||
| ECN; | congestion notification; | |||
| * are forwarded by the eNode-B (base station) of a 3GPP radio access | * are forwarded by the eNode-B (base station) of a 3GPP radio access | |||
| network, which is required to apply ECN marking during congestion, | network, which is required to apply ECN marking during congestion | |||
| [LTE-RA], [UTRAN], but the Packet Data Convergence Protocol (PDCP) | [LTE-RA] [UTRAN], but the Packet Data Convergence Protocol (PDCP) | |||
| that encapsulates the IP header over the radio access has no | that encapsulates the IP header over the radio access has no | |||
| support for ECN. | support for ECN. | |||
| This guidance also generalizes to encapsulation by other subnet | This guidance also generalizes to encapsulation by other subnet | |||
| technologies with no native support for explicit congestion | technologies with no built-in support for congestion notification at | |||
| notification at the lower layer, but with support for finding and | the lower layer, but with support for finding and processing an IP | |||
| processing an IP header. It is unlikely to be applicable or | header. It is unlikely to be applicable or necessary for IP-in-IP | |||
| necessary for IP-in-IP encapsulation, where feed-forward-and-up mode | encapsulation, where feed-forward-and-up mode based on [RFC6040] | |||
| based on [RFC6040] would be more appropriate. | would be more appropriate. | |||
| Marking the IP header while switching at layer-2 (by using a layer-3 | Marking the IP header while switching at layer 2 (by using a Layer 3 | |||
| switch) or while forwarding in a radio access network seems to | switch) or while forwarding in a radio access network seems to | |||
| represent a layering violation. However, it can be considered as a | represent a layering violation. However, it can be considered as a | |||
| benign optimisation if the guidelines below are followed. Feed-up- | benign optimization if the guidelines below are followed. Feed-up- | |||
| and-forward is certainly not a general alternative to implementing | and-forward is certainly not a general alternative to implementing | |||
| feed-forward congestion notification in the lower layer, because: | feed-forward congestion notification in the lower layer, because: | |||
| * IPv4 and IPv6 are not the only layer-3 protocols that might be | * IPv4 and IPv6 are not the only Layer 3 protocols that might be | |||
| encapsulated by lower layer protocols | encapsulated by lower-layer protocols. | |||
| * Link-layer encryption might be in use, making the layer-2 payload | * Link-layer encryption might be in use, making the Layer 2 payload | |||
| inaccessible | inaccessible. | |||
| * Many Ethernet switches do not have 'layer-3 switch' capabilities | * Many Ethernet switches do not have 'Layer 3 switch' capabilities, | |||
| so they cannot read or modify an IP payload | so the ability to read or modify an IP payload cannot be assumed. | |||
| * It might be costly to find an IP header (IPv4 or IPv6) when it may | * It might be costly to find an IP header (IPv4 or IPv6) when it may | |||
| be encapsulated by more than one lower layer header, e.g. | be encapsulated by more than one lower-layer header, e.g., | |||
| Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah). | Ethernet MAC in MAC ([IEEE802.1Q]; previously 802.1ah). | |||
| Nonetheless, configuring lower layer equipment to look for an ECN | Nonetheless, configuring lower-layer equipment to look for an ECN | |||
| field in an encapsulated IP header is a useful optimisation. If the | field in an encapsulated IP header is a useful optimization. If the | |||
| implementation follows the guidelines below, this optimisation does | implementation follows the guidelines below, this optimization does | |||
| not have to be confined to a controlled environment such as within a | not have to be confined to a controlled environment, e.g., within a | |||
| data centre; it could usefully be applied on any network — even if | data centre; it could usefully be applied in any network -- even if | |||
| the operator is not sure whether the above issues will never apply: | the operator is not sure whether the above issues will never apply: | |||
| 1. If a native lower-layer congestion notification mechanism exists | 1. If a built-in lower-layer congestion notification mechanism | |||
| for a subnet technology, it is safe to mix feed-up-and-forward | exists for a subnet technology, it is safe to mix feed-up-and- | |||
| with feed-forward-and-up on other switches in the same subnet. | forward with feed-forward-and-up on other switches in the same | |||
| However, it will generally be more efficient to use the native | subnet. However, it will generally be more efficient to use the | |||
| mechanism. | built-in mechanism. | |||
| 2. The depth of the search for an IP header SHOULD be limited. If | 2. The depth of the search for an IP header SHOULD be limited. If | |||
| an IP header is not found soon enough, or an unrecognized or | an IP header is not found soon enough, or an unrecognized or | |||
| unreadable header is encountered, the switch SHOULD resort to an | unreadable header is encountered, the switch SHOULD resort to an | |||
| alternative means of signalling congestion (e.g. drop, or the | alternative means of signalling congestion (e.g., drop or the | |||
| native lower layer mechanism if available). | built-in lower-layer mechanism if available). | |||
| 3. It is sufficient to use the first IP header found in the stack; | 3. It is sufficient to use the first IP header found in the stack; | |||
| the egress of the relevant tunnel can propagate congestion | the egress of the relevant tunnel can propagate congestion | |||
| notification upwards to any more deeply encapsulated IP headers | notification upwards to any more deeply encapsulated IP headers | |||
| later. | later. | |||
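Guidelines 2 and 3 of this section can be illustrated with a depth-limited header search. The header labels, the `parseable` set, and the default depth limit below are hypothetical values chosen for the sketch; a real switch would parse binary headers rather than a list of names.

```python
IP_HEADERS = frozenset({"ipv4", "ipv6"})


def find_ip_header(stack, max_depth=3,
                   parseable=frozenset({"ethernet", "vlan", "mac-in-mac"})):
    """Return the index of the first IP header in `stack`, or None.

    stack: header names from outermost to innermost.
    None means the switch should resort to an alternative means of
    signalling congestion (e.g., drop or a built-in lower-layer mechanism).
    """
    for depth, header in enumerate(stack[:max_depth]):
        if header in IP_HEADERS:
            # Guideline 3: the first IP header found is sufficient; deeper
            # ones are handled later by the relevant tunnel egress.
            return depth
        if header not in parseable:
            # Unrecognized or unreadable header (e.g., encrypted payload).
            return None
    # Guideline 2: no IP header found within the search limit.
    return None
```

Bounding the search keeps the per-packet cost predictable, which is the concern raised above about multiple layers of encapsulation.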
| 6. Feed-Backward Mode: Guidelines for Adding Congestion Notification | 6. Feed-Backward Mode: Guidelines for Adding Congestion Notification | |||
| It can be seen from Section 3.3 that congestion notification in a | It can be seen from Section 3.3 that congestion notification in a | |||
| subnet using feed-backward mode has generally not been designed to be | subnet using feed-backward mode has generally not been designed to be | |||
| directly coupled with IP layer congestion notification. The subnet | directly coupled with IP-layer congestion notification. The subnet | |||
| attempts to minimize congestion internally, and if the incoming load | attempts to minimize congestion internally, and if the incoming load | |||
| at the ingress exceeds the capacity somewhere through the subnet, the | at the ingress exceeds the capacity somewhere through the subnet, the | |||
| layer 3 buffer into the ingress backs up. Thus, a feed-backward mode | Layer 3 buffer into the ingress backs up. Thus, a feed-backward mode | |||
| subnet is in some sense similar to a null mode subnet, in that there | subnet is in some sense similar to a null mode subnet, in that there | |||
| is no need for any direct interaction between the subnet and higher | is no need for any direct interaction between the subnet and higher- | |||
| layer congestion notification. Therefore no detailed protocol design | layer congestion notification. Therefore, no detailed protocol | |||
| guidelines are appropriate. Nonetheless, a more general guideline is | design guidelines are appropriate. Nonetheless, a more general | |||
| appropriate: | guideline is appropriate: | |||
| A subnetwork technology intended to eventually interface to IP | | A subnetwork technology intended to eventually interface to IP | |||
| SHOULD NOT be designed using only the feed-backward mode, which is | | SHOULD NOT be designed using only the feed-backward mode, which is | |||
| certainly best for a stand-alone subnet, but would need to be | | certainly best for a stand-alone subnet, but would need to be | |||
| modified to work efficiently as part of the wider Internet, | | modified to work efficiently as part of the wider Internet because | |||
| because IP uses feed-forward-and-up mode. | | IP uses feed-forward-and-up mode. | |||
| The feed-backward approach at least works beneath IP, where the term | The feed-backward approach at least works beneath IP, where the term | |||
| 'works' is used only in a narrow functional sense because feed- | 'works' is used only in a narrow functional sense because feed- | |||
| backward can result in very inefficient and sluggish congestion | backward can result in very inefficient and sluggish congestion | |||
| control — except if it is confined to the subnet directly connected | control -- except if it is confined to the subnet directly connected | |||
| to the original data source, when it is faster than feed-forward. It | to the original data source when it is faster than feed-forward. It | |||
| would be valid to design a protocol that could work in feed-backward | would be valid to design a protocol that could work in feed-backward | |||
| mode for paths that only cross one subnet, and in feed-forward-and-up | mode for paths that only cross one subnet, and in feed-forward-and-up | |||
| mode for paths that cross subnets. | mode for paths that cross subnets. | |||
| In the early days of TCP/IP, a similar feed-backward approach was | In the early days of TCP/IP, a similar feed-backward approach was | |||
| tried for explicit congestion signalling, using source-quench (SQ) | tried for explicit congestion signalling using source-quench (SQ) | |||
| ICMP control packets. However, SQ fell out of favour and is now | ICMP control packets. However, SQ fell out of favour and is now | |||
| formally deprecated [RFC6633]. The main problem was that it is hard | formally deprecated [RFC6633]. The main problem was that it is hard | |||
| for a data source to tell the difference between a spoofed SQ message | for a data source to tell the difference between a spoofed SQ message | |||
| and a quench request from a genuine buffer on the path. It is also | and a quench request from a genuine buffer on the path. It is also | |||
| hard for a lower layer buffer to address an SQ message to the | hard for a lower-layer buffer to address an SQ message to the | |||
| original source port number, which may be buried within many layers | original source port number, which may be buried within many layers | |||
| of headers, and possibly encrypted. | of headers and possibly encrypted. | |||
| QCN (also known as backward congestion notification, BCN; see | QCN (also known as Backward Congestion Notification (BCN); see | |||
| Sections 30–33 of [IEEE802.1Q]; previously known as 802.1Qau) uses a | Sections 30-33 of [IEEE802.1Q], previously known as 802.1Qau) uses a | |||
| feed-backward mode structurally similar to ATM's relative rate | feed-backward mode that is structurally similar to ATM's relative | |||
| mechanism. However, QCN confines its applicability to scenarios such | rate mechanism. However, QCN confines its applicability to scenarios | |||
| as some data centres where all endpoints are directly attached by the | such as some data centres where all endpoints are directly attached | |||
| same Ethernet technology. If a QCN subnet were later connected into | by the same Ethernet technology. If a QCN subnet were later | |||
| a wider IP-based internetwork (e.g. when attempting to interconnect | connected into a wider IP-based internetwork (e.g., when attempting | |||
| multiple data centres) it would suffer the inefficiency shown in | to interconnect multiple data centres) it would suffer the | |||
| Figure 3. | inefficiency shown in Figure 3. | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| This section is to be removed before publishing as an RFC. | This document has no IANA actions. | |||
| This memo includes no request to IANA. | ||||
| 8. Security Considerations | 8. Security Considerations | |||
| If a lower layer wire protocol is redesigned to include explicit | If a lower-layer wire protocol is redesigned to include explicit | |||
| congestion signalling in-band in the protocol header, care SHOULD be | congestion signalling in-band in the protocol header, care SHOULD be | |||
| taken to ensure that the field used is specified as mutable during | taken to ensure that the field used is specified as mutable during | |||
| transit. Otherwise interior nodes signalling congestion would | transit. Otherwise, interior nodes signalling congestion would | |||
| invalidate any authentication protocol applied to the lower layer | invalidate any authentication protocol applied to the lower-layer | |||
| header — by altering a header field that had been assumed as | header -- by altering a header field that had been assumed as | |||
| immutable. | immutable. | |||
| The redesign of protocols that encapsulate IP in order to propagate | The redesign of protocols that encapsulate IP in order to propagate | |||
| congestion signals between layers raises potential signal integrity | congestion signals between layers raises potential signal integrity | |||
| concerns. Experimental or proposed approaches exist for assuring the | concerns. Experimental or proposed approaches exist for assuring the | |||
| end-to-end integrity of in-band congestion signals, e.g.: | end-to-end integrity of in-band congestion signals, such as: | |||
| * Congestion exposure (ConEx) for networks to audit that their | * Congestion Exposure (ConEx) for networks: | |||
| congestion signals are not being suppressed by other networks or | ||||
| by receivers, and for networks to police that senders are | - to audit that their congestion signals are not being suppressed | |||
| responding sufficiently to the signals, irrespective of the L4 | by other networks or by receivers; and | |||
| transport protocol used [RFC7713]. | ||||
| - to police that senders are responding sufficiently to the | ||||
| signals, irrespective of the L4 transport protocol used | ||||
| [RFC7713]. | ||||
| * A test for a sender to detect whether a network or the receiver is | * A test for a sender to detect whether a network or the receiver is | |||
| suppressing congestion signals (for example see 2nd para of | suppressing congestion signals (for example, see the second | |||
| Section 20.2 of [RFC3168]). | paragraph of Section 20.2 of [RFC3168]). | |||
| Given these end-to-end approaches are already being specified, it | Given these end-to-end approaches are already being specified, it | |||
| would make little sense to attempt to design hop-by-hop congestion | would make little sense to attempt to design hop-by-hop congestion | |||
| signal integrity into a new lower layer protocol, because end-to-end | signal integrity into a new lower-layer protocol because end-to-end | |||
| integrity inherently achieves hop-by-hop integrity. | integrity inherently achieves hop-by-hop integrity. | |||
| Section 6 gives vulnerability to spoofing as one of the reasons for | Section 6 gives vulnerability to spoofing as one of the reasons for | |||
| deprecating feed-backward mode. | deprecating feed-backward mode. | |||
| 9. Conclusions | 9. Conclusions | |||
| Following the guidance in this document enables ECN support to be | Following the guidance in this document enables ECN support to be | |||
| extended consistently to numerous protocols that encapsulate IP (IPv4 | extended consistently to numerous protocols that encapsulate IP (IPv4 | |||
| and IPv6), so that IP continues to fulfil its role as an end-to-end | and IPv6) so that IP continues to fulfil its role as an end-to-end | |||
| interoperability layer. This includes: | interoperability layer. This includes: | |||
| * A wide range of tunnelling protocols including those with various | * A wide range of tunnelling protocols, including those with various | |||
| forms of shim header between two IP headers, possibly also | forms of shim header between two IP headers, possibly also | |||
| separated by a L2 header; | separated by an L2 header; | |||
| * A wide range of subnet technologies, particularly those that work | * A wide range of subnet technologies, particularly those that work | |||
| in the same 'feed-forward-and-up' mode that is used to support ECN | in the same 'feed-forward-and-up' mode that is used to support ECN | |||
| in IP and MPLS. | in IP and MPLS. | |||
| Guidelines have been defined for supporting propagation of ECN | Guidelines have been defined for supporting propagation of ECN | |||
| between Ethernet and IP on so-called Layer-3 Ethernet switches, using | between Ethernet and IP on so-called Layer 3 Ethernet switches using | |||
| a 'feed-up-and-forward' mode. This approach could enable other | a 'feed-up-and-forward' mode. This approach could enable other | |||
| subnet technologies to pass ECN signals into the IP layer, even if | subnet technologies to pass ECN signals into the IP layer, even if | |||
| they do not support ECN natively. | the lower-layer protocol does not support ECN. | |||
| Finally, attempting to add ECN to a subnet technology in feed- | Finally, attempting to add congestion notification to a subnet | |||
| backward mode is deprecated except in special cases, due to its | technology in feed-backward mode is deprecated except in special | |||
| likely sluggish response to congestion. | cases due to its likely sluggish response to congestion. | |||
| 10. References | 10. References | |||
| 10.1. Normative References | 10.1. Normative References | |||
| [I-D.ietf-trill-ecn-support] | ||||
| Eastlake, D. E. and B. Briscoe, "TRILL (TRansparent | ||||
| Interconnection of Lots of Links): ECN (Explicit | ||||
| Congestion Notification) Support", Work in Progress, | ||||
| Internet-Draft, draft-ietf-trill-ecn-support-07, 25 | ||||
| February 2018, <https://datatracker.ietf.org/doc/html/ | ||||
| draft-ietf-trill-ecn-support-07>. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
| <https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
| skipping to change at page 30, line 13 | skipping to change at line 1339 | |||
| 2008, <https://www.rfc-editor.org/info/rfc5129>. | 2008, <https://www.rfc-editor.org/info/rfc5129>. | |||
| [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion | [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion | |||
| Notification", RFC 6040, DOI 10.17487/RFC6040, November | Notification", RFC 6040, DOI 10.17487/RFC6040, November | |||
| 2010, <https://www.rfc-editor.org/info/rfc6040>. | 2010, <https://www.rfc-editor.org/info/rfc6040>. | |||
| [RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion | [RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion | |||
| Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141, | Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141, | |||
| February 2014, <https://www.rfc-editor.org/info/rfc7141>. | February 2014, <https://www.rfc-editor.org/info/rfc7141>. | |||
| [RFC9600] Eastlake 3rd, D. and B. Briscoe, "TRILL (TRansparent | ||||
| Interconnection of Lots of Links): ECN (Explicit | ||||
| Congestion Notification) Support", RFC 9600, | ||||
| DOI 10.17487/RFC9600, August 2024, | ||||
| <https://www.rfc-editor.org/info/rfc9600>. | ||||
| 10.2. Informative References | 10.2. Informative References | |||
| [ATM-TM-ABR] | [ATM-TM-ABR] | |||
| Cisco, "Understanding the Available Bit Rate (ABR) Service | Cisco, "Understanding the Available Bit Rate (ABR) Service | |||
| Category for ATM VCs", Design Technote 10415, 5 June 2005, | Category for ATM VCs", Design Technote 10415, June 2005, | |||
| <https://www.cisco.com/c/en/us/support/docs/asynchronous- | <https://www.cisco.com/c/en/us/support/docs/asynchronous- | |||
| transfer-mode-atm/atm-traffic- | transfer-mode-atm/atm-traffic- | |||
| management/10415-atmabr.html>. | management/10415-atmabr.html>. | |||
| [Buck00] Buckwalter, J.T., "Frame Relay: Technology and Practice", | [Buck00] Buckwalter, J.T., "Frame Relay: Technology and Practice", | |||
| Pub. Addison Wesley ISBN-13: 978-0201485240, 2000. | Addison-Wesley Professional, ISBN-13 978-0201485240, 2000. | |||
| [Clos53] Clos, C., "A Study of Non-Blocking Switching Networks", | [Clos53] Clos, C., "A Study of Non-Blocking Switching Networks", | |||
| Bell Systems Technical Journal 32(2):406–424, March 1953. | The Bell System Technical Journal, Vol. 32, Issue 2, | |||
| DOI 10.1002/j.1538-7305.1953.tb01433.x, March 1953, | ||||
| <https://doi.org/10.1002/j.1538-7305.1953.tb01433.x>. | ||||
| [GTPv1] 3GPP, "GPRS Tunnelling Protocol (GTP) across the Gn and Gp | [GTPv1] 3GPP, "General Packet Radio Service (GPRS); GPRS | |||
| interface", Technical Specification TS 29.060. | Tunnelling Protocol (GTP) across the Gn and Gp interface", | |||
| Technical Specification 29.060. | ||||
| [GTPv1-U] 3GPP, "General Packet Radio System (GPRS) Tunnelling | [GTPv1-U] 3GPP, "General Packet Radio System (GPRS) Tunnelling | |||
| Protocol User Plane (GTPv1-U)", Technical Specification TS | Protocol User Plane (GTPv1-U)", Technical | |||
| 29.281. | Specification 29.281. | |||
| [GTPv2-C] 3GPP, "Evolved General Packet Radio Service (GPRS) | [GTPv2-C] 3GPP, "3GPP Evolved Packet System (EPS); Evolved General | |||
| Tunnelling Protocol for Control plane (GTPv2-C)", | Packet Radio Service (GPRS) Tunnelling Protocol for | |||
| Technical Specification TS 29.274. | Control plane (GTPv2-C); Stage 3", Technical | |||
| Specification 29.274. | ||||
| [I-D.ietf-intarea-gue] | [IEEE802.1Q] | |||
| IEEE, "IEEE Standard for Local and Metropolitan Area | ||||
| Network--Bridges and Bridged Networks", IEEE Std 802.1Q- | ||||
| 2022, DOI 10.1109/IEEESTD.2022.10004498, December 2022, | ||||
| <https://doi.org/10.1109/IEEESTD.2022.10004498>. | ||||
| [INTAREA-GUE] | ||||
| Herbert, T., Yong, L., and O. Zia, "Generic UDP | Herbert, T., Yong, L., and O. Zia, "Generic UDP | |||
| Encapsulation", Work in Progress, Internet-Draft, draft- | Encapsulation", Work in Progress, Internet-Draft, draft- | |||
| ietf-intarea-gue-09, 26 October 2019, | ietf-intarea-gue-09, 26 October 2019, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-intarea- | <https://datatracker.ietf.org/doc/html/draft-ietf-intarea- | |||
| gue-09>. | gue-09>. | |||
| [I-D.ietf-tsvwg-rfc6040update-shim] | ||||
| Briscoe, B., "Propagating Explicit Congestion Notification | ||||
| Across IP Tunnel Headers Separated by a Shim", Work in | ||||
| Progress, Internet-Draft, draft-ietf-tsvwg-rfc6040update- | ||||
| shim-22, 29 October 2023, | ||||
| <https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg- | ||||
| rfc6040update-shim-22>. | ||||
| [IEEE802.1Q] | ||||
| IEEE, "IEEE Standard for Local and Metropolitan Area | ||||
| Networks—Virtual Bridged Local Area Networks—Amendment 6: | ||||
| Provider Backbone Bridges", IEEE Std 802.1Q-2018, July | ||||
| 2018, <https://ieeexplore.ieee.org/document/8403927>. | ||||
| [ITU-T.I.371] | [ITU-T.I.371] | |||
| ITU-T, "Traffic Control and Congestion Control in B-ISDN", | ITU-T, "Traffic control and congestion control in B-ISDN", | |||
| ITU-T Rec. I.371 (03/04), March 2004, | ITU-T Recommendation I.371, March 2004, | |||
| <https://www.itu.int/rec/T-REC-I.371>. | <https://www.itu.int/rec/T-REC-I.371-200403-I/en>. | |||
| [Leiserson85] | [Leiserson85] | |||
| Leiserson, C.E., "Fat-trees: universal networks for | Leiserson, C.E., "Fat-trees: Universal networks for | |||
| hardware-efficient supercomputing", IEEE Transactions on | hardware-efficient supercomputing", IEEE Transactions on | |||
| Computers 34(10):892–901, October 1985. | Computers, Vol. C-34, Issue 10, | |||
| DOI 10.1109/TC.1985.6312192, October 1985, | ||||
| <https://doi.org/10.1109/TC.1985.6312192>. | ||||
| [LTE-RA] 3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA) | [LTE-RA] 3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA) | |||
| and Evolved Universal Terrestrial Radio Access Network | and Evolved Universal Terrestrial Radio Access Network | |||
| (E-UTRAN); Overall description; Stage 2", Technical | (E-UTRAN); Overall description; Stage 2", Technical | |||
| Specification TS 36.300. | Specification 36.300. | |||
| [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, | [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, | |||
| DOI 10.17487/RFC2003, October 1996, | DOI 10.17487/RFC2003, October 1996, | |||
| <https://www.rfc-editor.org/info/rfc2003>. | <https://www.rfc-editor.org/info/rfc2003>. | |||
| [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in | [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in | |||
| IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, | IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, | |||
| December 1998, <https://www.rfc-editor.org/info/rfc2473>. | December 1998, <https://www.rfc-editor.org/info/rfc2473>. | |||
| [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, | [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, | |||
| skipping to change at page 34, line 21 | skipping to change at line 1544 | |||
| Cabellos, Ed., "The Locator/ID Separation Protocol | Cabellos, Ed., "The Locator/ID Separation Protocol | |||
| (LISP)", RFC 9300, DOI 10.17487/RFC9300, October 2022, | (LISP)", RFC 9300, DOI 10.17487/RFC9300, October 2022, | |||
| <https://www.rfc-editor.org/info/rfc9300>. | <https://www.rfc-editor.org/info/rfc9300>. | |||
| [RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit | [RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit | |||
| Congestion Notification (ECN) Protocol for Low Latency, | Congestion Notification (ECN) Protocol for Low Latency, | |||
| Low Loss, and Scalable Throughput (L4S)", RFC 9331, | Low Loss, and Scalable Throughput (L4S)", RFC 9331, | |||
| DOI 10.17487/RFC9331, January 2023, | DOI 10.17487/RFC9331, January 2023, | |||
| <https://www.rfc-editor.org/info/rfc9331>. | <https://www.rfc-editor.org/info/rfc9331>. | |||
| [UTRAN] 3GPP, "UTRAN Overall Description", Technical | [RFC9601] Briscoe, B., "Propagating Explicit Congestion Notification | |||
| Specification TS 25.401. | Across IP Tunnel Headers Separated by a Shim", RFC 9601, | |||
| DOI 10.17487/RFC9601, August 2024, | ||||
| Comments Solicited | <https://www.rfc-editor.org/info/rfc9601>. | |||
| This section is to be removed before publishing as an RFC. | ||||
| Comments and questions are encouraged and very welcome. They can be | [UTRAN] 3GPP, "UTRAN overall description", Technical | |||
| addressed to the IETF Transport Area working group mailing list | Specification 25.401. | |||
| <tsvwg@ietf.org>, and/or to the authors. | ||||
| Acknowledgements | Acknowledgements | |||
| Thanks to Gorry Fairhurst and David Black for extensive reviews. | Thanks to Gorry Fairhurst and David Black for extensive reviews. | |||
| Thanks also to the following reviewers: Joe Touch, Andrew McGregor, | Thanks also to the following reviewers: Joe Touch, Andrew McGregor, | |||
| Richard Scheffenegger, Ingemar Johansson, Piers O'Hanlon, Donald | Richard Scheffenegger, Ingemar Johansson, Piers O'Hanlon, Donald | |||
| Eastlake, Jonathan Morton, Markku Kojo, Sebastian Möller, Martin Duke | Eastlake 3rd, Jonathan Morton, Markku Kojo, Sebastian Möller, Martin | |||
| and Michael Welzl, who pointed out that lower layer congestion | Duke, and Michael Welzl, who pointed out that lower-layer congestion | |||
| notification signals may have different semantics to those in IP. | notification signals may have different semantics to those in IP. | |||
| Thanks are also due to the tsvwg chairs, TSV ADs and IETF liaison | Thanks are also due to the Transport and Services Working Group | |||
| people such as Eric Gray, Dan Romascanu and Gonzalo Camarillo for | (tsvwg) chairs, TSV ADs and IETF liaison people such as Eric Gray, | |||
| helping with the liaisons with the IEEE and 3GPP. And thanks to | Dan Romascanu and Gonzalo Camarillo for helping with the liaisons | |||
| Georg Mayer and particularly to Erik Guttman for the extensive search | with the IEEE and 3GPP. And thanks to Georg Mayer and particularly | |||
| and categorisation of any 3GPP specifications that cite ECN | to Erik Guttman for the extensive search and categorization of any | |||
| specifications. Thanks also to the Area Reviewers Dan Harkins, Paul | 3GPP specifications that cite ECN specifications. Thanks also to the | |||
| Kyzivat, Sue Hares and Dale Worley. | Area Reviewers Dan Harkins, Paul Kyzivat, Sue Hares, and Dale Worley. | |||
| Bob Briscoe was part-funded by the European Community under its | Bob Briscoe was part-funded by the European Community under its | |||
| Seventh Framework Programme through the Trilogy project (ICT-216372) | Seventh Framework Programme through the Trilogy project (ICT-216372) | |||
| for initial drafts then through the Reducing Internet Transport | for initial drafts then through the Reducing Internet Transport | |||
| Latency (RITE) project (ICT-317700), and for final drafts (from -18) | Latency (RITE) project (ICT-317700), and for final drafts (from -18) | |||
| he was funded by Apple Inc. The views expressed here are solely those | he was funded by Apple Inc. The views expressed here are solely those | |||
| of the authors. | of the authors. | |||
| Contributors | Contributors | |||
| Pat Thaler | Pat Thaler | |||
| Broadcom Corporation (retired) | Broadcom Corporation (retired) | |||
| CA | CA | |||
| USA | United States of America | |||
| Pat was a co-author of this draft, but retired before its | Pat was a coauthor of this document, but retired before its | |||
| publication. | publication. | |||
| Authors' Addresses | Authors' Addresses | |||
| Bob Briscoe | Bob Briscoe | |||
| Independent | Independent | |||
| United Kingdom | United Kingdom | |||
| Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
| URI: https://bobbriscoe.net/ | URI: https://bobbriscoe.net/ | |||
| End of changes. 220 change blocks. | ||||
| 724 lines changed or deleted | 710 lines changed or added | |||