| rfc9332.original | rfc9332.txt | |||
|---|---|---|---|---|
| Transport Area working group (tsvwg) K. De Schepper | Internet Engineering Task Force (IETF) K. De Schepper | |||
| Internet-Draft Nokia Bell Labs | Request for Comments: 9332 Nokia Bell Labs | |||
| Intended status: Experimental B. Briscoe, Ed. | Category: Experimental B. Briscoe, Ed. | |||
| Expires: 2 March 2023 Independent | ISSN: 2070-1721 Independent | |||
| G. White | G. White | |||
| CableLabs | CableLabs | |||
| 29 August 2022 | January 2023 | |||
| DualQ Coupled AQMs for Low Latency, Low Loss and Scalable Throughput | Dual-Queue Coupled Active Queue Management (AQM) for Low Latency, Low | |||
| (L4S) | Loss, and Scalable Throughput (L4S) | |||
| draft-ietf-tsvwg-aqm-dualq-coupled-25 | ||||
| Abstract | Abstract | |||
| This specification defines a framework for coupling the Active Queue | This specification defines a framework for coupling the Active Queue | |||
| Management (AQM) algorithms in two queues intended for flows with | Management (AQM) algorithms in two queues intended for flows with | |||
| different responses to congestion. This provides a way for the | different responses to congestion. This provides a way for the | |||
| Internet to transition from the scaling problems of standard TCP | Internet to transition from the scaling problems of standard TCP- | |||
| Reno-friendly ('Classic') congestion controls to the family of | Reno-friendly ('Classic') congestion controls to the family of | |||
| 'Scalable' congestion controls. These are designed for consistently | 'Scalable' congestion controls. These are designed for consistently | |||
| very Low queuing Latency, very Low congestion Loss and Scaling of | very low queuing latency, very low congestion loss, and scaling of | |||
| per-flow throughput (L4S) by using Explicit Congestion Notification | per-flow throughput by using Explicit Congestion Notification (ECN) | |||
| (ECN) in a modified way. Until the Coupled DualQ, these scalable L4S | in a modified way. Until the Coupled Dual Queue (DualQ), these | |||
| congestion controls could only be deployed where a clean-slate | Scalable L4S congestion controls could only be deployed where a | |||
| environment could be arranged, such as in private data centres. | clean-slate environment could be arranged, such as in private data | |||
| centres. | ||||
| The specification first explains how a Coupled DualQ works. It then | This specification first explains how a Coupled DualQ works. It then | |||
| gives the normative requirements that are necessary for it to work | gives the normative requirements that are necessary for it to work | |||
| well. All this is independent of which two AQMs are used, but | well. All this is independent of which two AQMs are used, but | |||
| pseudocode examples of specific AQMs are given in appendices. | pseudocode examples of specific AQMs are given in appendices. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
| provisions of BCP 78 and BCP 79. | published for examination, experimental implementation, and | |||
| evaluation. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document defines an Experimental Protocol for the Internet | |||
| and may be updated, replaced, or obsoleted by other documents at any | community. This document is a product of the Internet Engineering | |||
| time. It is inappropriate to use Internet-Drafts as reference | Task Force (IETF). It represents the consensus of the IETF | |||
| material or to cite them other than as "work in progress." | community. It has received public review and has been approved for | |||
| publication by the Internet Engineering Steering Group (IESG). Not | ||||
| all documents approved by the IESG are candidates for any level of | ||||
| Internet Standard; see Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 2 March 2023. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9332. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2023 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
| 1.1. Outline of the Problem . . . . . . . . . . . . . . . . . 3 | 1.1. Outline of the Problem | |||
| 1.2. Context, Scope & Applicability . . . . . . . . . . . . . 6 | 1.2. Context, Scope, and Applicability | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 | 1.3. Terminology | |||
| 1.4. Features . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.4. Features | |||
| 2. DualQ Coupled AQM . . . . . . . . . . . . . . . . . . . . . . 11 | 2. DualQ Coupled AQM | |||
| 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 11 | 2.1. Coupled AQM | |||
| 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 12 | 2.2. Dual Queue | |||
| 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 12 | 2.3. Traffic Classification | |||
| 2.4. Overall DualQ Coupled AQM Structure . . . . . . . . . . . 13 | 2.4. Overall DualQ Coupled AQM Structure | |||
| 2.5. Normative Requirements for a DualQ Coupled AQM . . . . . 17 | 2.5. Normative Requirements for a DualQ Coupled AQM | |||
| 2.5.1. Functional Requirements . . . . . . . . . . . . . . . 17 | 2.5.1. Functional Requirements | |||
| 2.5.1.1. Requirements in Unexpected Cases . . . . . . . . 18 | 2.5.1.1. Requirements in Unexpected Cases | |||
| 2.5.2. Management Requirements . . . . . . . . . . . . . . . 19 | 2.5.2. Management Requirements | |||
| 2.5.2.1. Configuration . . . . . . . . . . . . . . . . . . 19 | 2.5.2.1. Configuration | |||
| 2.5.2.2. Monitoring . . . . . . . . . . . . . . . . . . . 21 | 2.5.2.2. Monitoring | |||
| 2.5.2.3. Anomaly Detection . . . . . . . . . . . . . . . . 22 | 2.5.2.3. Anomaly Detection | |||
| 2.5.2.4. Deployment, Coexistence and Scaling . . . . . . . 22 | 2.5.2.4. Deployment, Coexistence, and Scaling | |||
| 3. IANA Considerations (to be removed by RFC Editor) . . . . . . 22 | 3. IANA Considerations | |||
| 4. Security Considerations . . . . . . . . . . . . . . . . . . . 22 | 4. Security Considerations | |||
| 4.1. Low Delay without Requiring Per-Flow Processing . . . . . 22 | 4.1. Low Delay without Requiring Per-flow Processing | |||
| 4.2. Handling Unresponsive Flows and Overload . . . . . . . . 23 | 4.2. Handling Unresponsive Flows and Overload | |||
| 4.2.1. Unresponsive Traffic without Overload . . . . . . . . 24 | 4.2.1. Unresponsive Traffic without Overload | |||
| 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S | |||
| Throughput or Delay? . . . . . . . . . . . . . . . . 25 | Throughput or Delay? | |||
| 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? . . . . 26 | 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? | |||
| 4.2.3.1. Protecting against Overload by Unresponsive | 4.2.3.1. Protecting against Overload by Unresponsive | |||
| ECN-Capable Traffic . . . . . . . . . . . . . . . . 28 | ECN-Capable Traffic | |||
| 5. References . . . . . . . . . . . . . . . . . . . . . . . . . 28 | 5. References | |||
| 5.1. Normative References . . . . . . . . . . . . . . . . . . 28 | 5.1. Normative References | |||
| 5.2. Informative References . . . . . . . . . . . . . . . . . 29 | 5.2. Informative References | |||
| Appendix A. Example DualQ Coupled PI2 Algorithm . . . . . . . . 35 | Appendix A. Example DualQ Coupled PI2 Algorithm | |||
| A.1. Pass #1: Core Concepts . . . . . . . . . . . . . . . . . 35 | A.1. Pass #1: Core Concepts | |||
| A.2. Pass #2: Edge-Case Details . . . . . . . . . . . . . . . 46 | A.2. Pass #2: Edge-Case Details | |||
| Appendix B. Example DualQ Coupled Curvy RED Algorithm . . . . . 51 | Appendix B. Example DualQ Coupled Curvy RED Algorithm | |||
| B.1. Curvy RED in Pseudocode . . . . . . . . . . . . . . . . . 51 | B.1. Curvy RED in Pseudocode | |||
| B.2. Efficient Implementation of Curvy RED . . . . . . . . . . 57 | B.2. Efficient Implementation of Curvy RED | |||
| Appendix C. Choice of Coupling Factor, k . . . . . . . . . . . . 59 | Appendix C. Choice of Coupling Factor, k | |||
| C.1. RTT-Dependence . . . . . . . . . . . . . . . . . . . . . 59 | C.1. RTT-Dependence | |||
| C.2. Guidance on Controlling Throughput Equivalence . . . . . 60 | C.2. Guidance on Controlling Throughput Equivalence | |||
| Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 64 | Acknowledgements | |||
| Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 64 | Contributors | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 65 | Authors' Addresses | |||
| 1. Introduction | 1. Introduction | |||
| This document specifies a framework for DualQ Coupled AQMs, which can | This document specifies a framework for DualQ Coupled AQMs, which can | |||
| serve as the network part of the L4S | serve as the network part of the L4S architecture [RFC9330]. A DualQ | |||
| architecture [I-D.ietf-tsvwg-l4s-arch]. A Coupled DualQ AQM consists | Coupled AQM consists of two queues: L4S and Classic. The L4S queue | |||
| of two queues; L4S and Classic. The L4S queue is intended for | is intended for Scalable congestion controls that can maintain very | |||
| Scalable congestion controls that can maintain very low queuing | low queuing latency (sub-millisecond on average) and high throughput | |||
| latency (sub-millisecond on average) and high throughput at the same | at the same time. The Coupled DualQ acts like a semi-permeable | |||
| time. The Coupled DualQ acts like a semi-permeable membrane: the L4S | membrane: the L4S queue isolates the sub-millisecond average queuing | |||
| queue isolates the sub-millisecond average queuing delay of L4S from | delay of L4S from Classic latency, while the coupling between the | |||
| Classic latency; while the coupling between the queues pools the | queues pools the capacity between both queues so that ad hoc numbers | |||
| capacity between both queues so that ad hoc numbers of capacity- | of capacity-seeking applications all sharing the same capacity can | |||
| seeking applications all sharing the same capacity can have roughly | have roughly equivalent throughput per flow, whichever queue they | |||
| equivalent throughput per flow, whichever queue they use. The DualQ | use. The DualQ achieves this indirectly, without having to inspect | |||
| achieves this indirectly, without having to inspect transport layer | transport-layer flow identifiers and without compromising the | |||
| flow identifiers and without compromising the performance of the | performance of the Classic traffic, relative to a single queue. The | |||
| Classic traffic, relative to a single queue. The DualQ design has | DualQ design has low complexity and requires no configuration for the | |||
| low complexity and requires no configuration for the public Internet. | public Internet. | |||
| 1.1. Outline of the Problem | 1.1. Outline of the Problem | |||
| Latency is becoming the critical performance factor for many (most?) | Latency is becoming the critical performance factor for many (perhaps | |||
| applications on the public Internet, e.g. interactive Web, Web | most) applications on the public Internet, e.g., interactive web, web | |||
| services, voice, conversational video, interactive video, interactive | services, voice, conversational video, interactive video, interactive | |||
| remote presence, instant messaging, online gaming, remote desktop, | remote presence, instant messaging, online gaming, remote desktop, | |||
| cloud-based applications, and video-assisted remote control of | cloud-based applications, and video-assisted remote control of | |||
| machinery and industrial processes. Once access network bit rates | machinery and industrial processes. Once access network bitrates | |||
| reach levels now common in the developed world, further increases | reach levels now common in the developed world, further increases | |||
| offer diminishing returns unless latency is also addressed | offer diminishing returns unless latency is also addressed | |||
| [Dukkipati06]. In the last decade or so, much has been done to | [Dukkipati06]. In the last decade or so, much has been done to | |||
| reduce propagation time by placing caches or servers closer to users. | reduce propagation time by placing caches or servers closer to users. | |||
| However, queuing remains a major intermittent component of latency. | However, queuing remains a major intermittent component of latency. | |||
| Traditionally very low latency has only been available for a few | Previously, very low latency has only been available for a few | |||
| selected low rate applications, that confine their sending rate | selected low-rate applications, that confine their sending rate | |||
| within a specially carved-off portion of capacity, which is | within a specially carved-off portion of capacity, which is | |||
| prioritized over other traffic, e.g. Diffserv EF [RFC3246]. Up to | prioritized over other traffic, e.g., Diffserv Expedited Forwarding | |||
| now it has not been possible to allow any number of low latency, high | (EF) [RFC3246]. Up to now, it has not been possible to allow any | |||
| throughput applications to seek to fully utilize available capacity, | number of low-latency, high throughput applications to seek to fully | |||
| because the capacity-seeking process itself causes too much queuing | utilize available capacity, because the capacity-seeking process | |||
| delay. | itself causes too much queuing delay. | |||
| To reduce this queuing delay caused by the capacity seeking process, | To reduce this queuing delay caused by the capacity-seeking process, | |||
| changes either to the network alone or to end-systems alone are in | changes either to the network alone or to end systems alone are in | |||
| progress. L4S involves a recognition that both approaches are | progress. L4S involves a recognition that both approaches are | |||
| yielding diminishing returns: | yielding diminishing returns: | |||
| * Recent state-of-the-art active queue management (AQM) in the | * Recent state-of-the-art AQM in the network, e.g., Flow Queue CoDel | |||
| network, e.g. FQ-CoDel [RFC8290], PIE [RFC8033], Adaptive | [RFC8290], Proportional Integral controller Enhanced (PIE) | |||
| RED [ARED01] ) has reduced queuing delay for all traffic, not just | [RFC8033], and Adaptive Random Early Detection (ARED) [ARED01]), | |||
| a select few applications. However, no matter how good the AQM, | has reduced queuing delay for all traffic, not just a select few | |||
| the capacity-seeking (sawtoothing) rate of TCP-like congestion | applications. However, no matter how good the AQM, the capacity- | |||
| controls represents a lower limit that will either cause queuing | seeking (sawtoothing) rate of TCP-like congestion controls | |||
| delay to vary or cause the link to be under-utilized. These AQMs | represents a lower limit that will cause either the queuing delay | |||
| are tuned to allow a typical capacity-seeking Reno-friendly flow | to vary or the link to be underutilized. These AQMs are tuned to | |||
| to induce an average queue that roughly doubles the base RTT, | allow a typical capacity-seeking TCP-Reno-friendly flow to induce | |||
| adding 5-15 ms of queuing on average (cf. 500 microseconds with | an average queue that roughly doubles the base round-trip time | |||
| L4S for the same mix of long-running and web traffic). However, | (RTT), adding 5-15 ms of queuing on average for a mix of long- | |||
| for many applications low delay is not useful unless it is | running flows and web traffic (cf. 500 microseconds with L4S for | |||
| consistently low. With these AQMs, 99th percentile queuing delay | the same traffic mix [L4Seval22]). However, for many | |||
| is 20-30 ms (cf. 2 ms with the same traffic over L4S). | applications, low delay is not useful unless it is consistently | |||
| low. With these AQMs, 99th percentile queuing delay is 20-30 ms | ||||
| (cf. 2 ms with the same traffic over L4S). | ||||
| * Similarly, recent research into using e2e congestion control | * Similarly, recent research into using end-to-end congestion | |||
| without needing an AQM in the network (e.g. BBR | control without needing an AQM in the network (e.g., Bottleneck | |||
| [I-D.cardwell-iccrg-bbr-congestion-control]) seems to have hit a | Bandwidth and Round-trip propagation time (BBR) [BBR-CC]) seems to | |||
| similar lower limit to queuing delay of about 20ms on average, but | have hit a similar queuing delay floor of about 20 ms on average, | |||
| there are also regular 25ms delay spikes due to bandwidth probes | but there are also regular 25 ms delay spikes due to bandwidth | |||
| and 60ms spikes due to flow-starts. | probes and 60 ms spikes due to flow-starts. | |||
| L4S learns from the experience of Data Center TCP [RFC8257], which | L4S learns from the experience of Data Center TCP (DCTCP) [RFC8257], | |||
| shows the power of complementary changes both in the network and on | which shows the power of complementary changes both in the network | |||
| end-systems. DCTCP teaches us that two small but radical changes to | and on end systems. DCTCP teaches us that two small but radical | |||
| congestion control are needed to cut the two major outstanding causes | changes to congestion control are needed to cut the two major | |||
| of queuing delay variability: | outstanding causes of queuing delay variability: | |||
| 1. Far smaller rate variations (sawteeth) than Reno-friendly | 1. Far smaller rate variations (sawteeth) than Reno-friendly | |||
| congestion controls; | congestion controls. | |||
| 2. A shift of smoothing and hence smoothing delay from network to | 2. A shift of smoothing and hence smoothing delay from network to | |||
| sender. | sender. | |||
| Without the former, a 'Classic' (e.g. Reno-friendly) flow's round | Without the former, a 'Classic' (e.g., Reno-friendly) flow's RTT | |||
| trip time (RTT) varies between roughly 1 and 2 times the base RTT | varies between roughly 1 and 2 times the base RTT between the | |||
| between the machines in question. Without the latter a 'Classic' | machines in question. Without the latter, a 'Classic' flow's | |||
| flow's response to changing events is delayed by a worst-case | response to changing events is delayed by a worst-case | |||
| (transcontinental) RTT, which could be hundreds of times the actual | (transcontinental) RTT, which could be hundreds of times the actual | |||
| smoothing delay needed for the RTT of typical traffic from localized | smoothing delay needed for the RTT of typical traffic from localized | |||
| CDNs. | Content Delivery Networks (CDNs). | |||
| These changes are the two main features of the family of so-called | These changes are the two main features of the family of so-called | |||
| 'Scalable' congestion controls (which includes DCTCP, TCP Prague and | 'Scalable' congestion controls (which include DCTCP, Prague, and | |||
| SCReAM). Both these changes only reduce delay in combination with a | Self-Clocked Rate Adaptation for Multimedia (SCReAM)). Both of these | |||
| complementary change in the network and they are both only feasible | changes only reduce delay in combination with a complementary change | |||
| with ECN, not drop, for the signalling: | in the network, and they are both only feasible with ECN, not drop, | |||
| for the signalling: | ||||
| 1. The smaller sawteeth allow an extremely shallow ECN packet- | 1. The smaller sawteeth allow an extremely shallow ECN packet- | |||
| marking threshold in the queue. | marking threshold in the queue. | |||
| 2. And no smoothing in the network means that every fluctuation of | 2. No smoothing in the network means that every fluctuation of the | |||
| the queue is signalled immediately. | queue is signalled immediately. | |||
| Without ECN, either of these would lead to very high loss levels. | Without ECN, either of these would lead to very high loss levels. In | |||
| But, with ECN, the resulting high marking levels are just signals, | contrast, with ECN, the resulting high marking levels are just | |||
| not impairments. (Note that BBRv2 [BBRv2] combines the best of both | signals, not impairments. (Note that BBRv2 [BBRv2] combines the best | |||
| worlds - it works as a scalable congestion control when ECN is | of both worlds -- it works as a Scalable congestion control when ECN | |||
| available, but also aims to minimize delay when it isn't.) | is available, but it also aims to minimize delay when ECN is absent.) | |||
| However, until now, Scalable congestion controls (like DCTCP) did not | However, until now, Scalable congestion controls (like DCTCP) did not | |||
| co-exist well in a shared ECN-capable queue with existing Classic | coexist well in a shared ECN-capable queue with existing Classic | |||
| (e.g. Reno [RFC5681] or Cubic [RFC8312]) congestion controls -- | (e.g., Reno [RFC5681] or CUBIC [RFC8312]) congestion controls -- | |||
| Scalable controls are so aggressive that these 'Classic' algorithms | Scalable controls are so aggressive that these 'Classic' algorithms | |||
| would drive themselves to a small capacity share. Therefore, until | would drive themselves to a small capacity share. Therefore, until | |||
| now, L4S controls could only be deployed where a clean-slate | now, L4S controls could only be deployed where a clean-slate | |||
| environment could be arranged, such as in private data centres (hence | environment could be arranged, such as in private data centres (hence | |||
| the name DCTCP). | the name DCTCP). | |||
| One way to solve the problem of coexistence between Scalable and | One way to solve the problem of coexistence between Scalable and | |||
| Classic flows is to use a per-flow-queuing approach such as FQ- | Classic flows is to use a per-flow-queuing (FQ) approach such as FQ- | |||
| CoDel [RFC8290]. It classifies packets by flow identifier into | CoDel [RFC8290]. It classifies packets by flow identifier into | |||
| separate queues in order to isolate sparse flows from the higher | separate queues in order to isolate sparse flows from the higher | |||
| latency in the queues assigned to heavier flows. However, if a | latency in the queues assigned to heavier flows. However, if a | |||
| Classic flow needs both low delay and high throughput, having a queue | Classic flow needs both low delay and high throughput, having a queue | |||
| to itself does not isolate it from the harm it causes to itself. | to itself does not isolate it from the harm it causes to itself. | |||
| Also FQ approaches need to inspect flow identifiers, which is not | Also FQ approaches need to inspect flow identifiers, which is not | |||
| always practical. | always practical. | |||
| In summary, Scalable congestion controls address the root cause of | In summary, Scalable congestion controls address the root cause of | |||
| the latency, loss and scaling problems with Classic congestion | the latency, loss and scaling problems with Classic congestion | |||
| controls. Both FQ and DualQ AQMs can be enablers for this smooth low | controls. Both FQ and DualQ AQMs can be enablers for this smooth | |||
| latency scalable behaviour. The DualQ approach is particularly | low-latency scalable behaviour. The DualQ approach is particularly | |||
| useful because identifying flows is sometimes not practical or | useful because identifying flows is sometimes not practical or | |||
| desirable. | desirable. | |||
| 1.2. Context, Scope & Applicability | 1.2. Context, Scope, and Applicability | |||
| L4S involves complementary changes in the network and on end-systems: | L4S involves complementary changes in the network and on end systems: | |||
| Network: A DualQ Coupled AQM (defined in the present document) or a | Network: | |||
| modification to flow-queue AQMs (described in section 4.2.b of the | A DualQ Coupled AQM (defined in the present document) or a | |||
| L4S architecture [I-D.ietf-tsvwg-l4s-arch]); | modification to flow queue AQMs (described in paragraph "b" in | |||
| Section 4.2 of the L4S architecture [RFC9330]). | ||||
| End-system: A Scalable congestion control (defined in section 4 of | End system: | |||
| the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id]). | A Scalable congestion control (defined in Section 4 of the L4S ECN | |||
| protocol spec [RFC9331]). | ||||
| Packet identifier: The network and end-system parts of L4S can be | Packet identifier: | |||
| deployed incrementally, because they both identify L4S packets | The network and end-system parts of L4S can be deployed | |||
| using the experimentally assigned explicit congestion notification | incrementally, because they both identify L4S packets using the | |||
| (ECN) codepoints in the IP header: ECT(1) and CE [RFC8311] | experimentally assigned ECN codepoints in the IP header: ECT(1) | |||
| [I-D.ietf-tsvwg-ecn-l4s-id]. | and CE [RFC8311] [RFC9331]. | |||
| Data Center TCP (DCTCP [RFC8257]) is an example of a Scalable | DCTCP [RFC8257] is an example of a Scalable congestion control for | |||
| congestion control for controlled environments that has been deployed | controlled environments that has been deployed for some time in | |||
| for some time in Linux, Windows and FreeBSD operating systems. | Linux, Windows, and FreeBSD operating systems. During the progress | |||
| During the progress of this document through the IETF a number of | of this document through the IETF, a number of other Scalable | |||
| other Scalable congestion controls were implemented, e.g. TCP Prague | congestion controls were implemented, e.g., Prague over TCP and QUIC | |||
| [I-D.briscoe-iccrg-prague-congestion-control] [PragueLinux], BBRv2 | [PRAGUE-CC] [PragueLinux], BBRv2 [BBRv2] [BBR-CC], and the L4S | |||
| [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control], QUIC Prague and | variant of SCReAM for real-time media [SCReAM-L4S] [RFC8298]. | |||
| the L4S variant of SCREAM for real-time media [RFC8298]. | ||||
| The focus of this specification is to enable deployment of the | The focus of this specification is to enable deployment of the | |||
| network part of the L4S service. Then, without any management | network part of the L4S service. Then, without any management | |||
| intervention, applications can exploit this new network capability as | intervention, applications can exploit this new network capability as | |||
| their operating systems migrate to Scalable congestion controls, | the applications or their operating systems migrate to Scalable | |||
| which can then evolve _while_ their benefits are being enjoyed by | congestion controls, which can then evolve _while_ their benefits are | |||
| everyone on the Internet. | being enjoyed by everyone on the Internet. | |||
| The DualQ Coupled AQM framework can incorporate any AQM designed for | The DualQ Coupled AQM framework can incorporate any AQM designed for | |||
| a single queue that generates a statistical or deterministic mark/ | a single queue that generates a statistical or deterministic mark/ | |||
| drop probability driven by the queue dynamics. Pseudocode examples | drop probability driven by the queue dynamics. Pseudocode examples | |||
| of two different DualQ Coupled AQMs are given in the appendices. In | of two different DualQ Coupled AQMs are given in the appendices. In | |||
| many cases the framework simplifies the basic control algorithm, and | many cases the framework simplifies the basic control algorithm and | |||
| requires little extra processing. Therefore, it is believed the | requires little extra processing. Therefore, it is believed the | |||
| Coupled AQM would be applicable and easy to deploy in all types of | Coupled AQM would be applicable and easy to deploy in all types of | |||
| buffers; buffers in cost-reduced mass-market residential equipment; | buffers such as buffers in cost-reduced mass-market residential | |||
| buffers in end-system stacks; buffers in carrier-scale equipment | equipment; buffers in end-system stacks; buffers in carrier-scale | |||
| including remote access servers, routers, firewalls and Ethernet | equipment including remote access servers, routers, firewalls, and | |||
| switches; buffers in network interface cards, buffers in virtualized | Ethernet switches; buffers in network interface cards; buffers in | |||
| network appliances, hypervisors, and so on. | virtualized network appliances, hypervisors; and so on. | |||
| For the public Internet, nearly all the benefit will typically be | For the public Internet, nearly all the benefit will typically be | |||
| achieved by deploying the Coupled AQM into either end of the access | achieved by deploying the Coupled AQM into either end of the access | |||
| link between a 'site' and the Internet, which is invariably the | link between a 'site' and the Internet, which is invariably the | |||
| bottleneck (see section 6.4 of[I-D.ietf-tsvwg-l4s-arch] about | bottleneck (see Section 6.4 of [RFC9330] about deployment, which also | |||
| deployment, which also defines the term 'site' to mean a home, an | defines the term 'site' to mean a home, an office, a campus, or | |||
| office, a campus or mobile user equipment). | mobile user equipment). | |||
| Latency is not the only concern of L4S: | Latency is not the only concern of L4S: | |||
| * The "Low Loss" part of the name denotes that L4S generally | * The 'Low Loss' part of the name denotes that L4S generally | |||
| achieves zero congestion loss (which would otherwise cause | achieves zero congestion loss (which would otherwise cause | |||
| retransmission delays), due to its use of ECN. | retransmission delays), due to its use of ECN. | |||
| * The "Scalable throughput" part of the name denotes that the per- | * The 'Scalable throughput' part of the name denotes that the per- | |||
| flow throughput of Scalable congestion controls should scale | flow throughput of Scalable congestion controls should scale | |||
| indefinitely, avoiding the imminent scaling problems with 'TCP- | indefinitely, avoiding the imminent scaling problems with 'TCP- | |||
| Friendly' congestion control algorithms [RFC3649]. | Friendly' congestion control algorithms [RFC3649]. | |||
| The former is clearly in scope of this AQM document. However, the | The former is clearly in scope of this AQM document. However, the | |||
| latter is an outcome of the end-system behaviour, and therefore | latter is an outcome of the end-system behaviour and is therefore | |||
| outside the scope of this AQM document, even though the AQM is an | outside the scope of this AQM document, even though the AQM is an | |||
| enabler. | enabler. | |||
| The overall L4S architecture [I-D.ietf-tsvwg-l4s-arch] gives more | The overall L4S architecture [RFC9330] gives more detail, including | |||
| detail, including on wider deployment aspects such as backwards | on wider deployment aspects such as backwards compatibility of | |||
| compatibility of Scalable congestion controls in bottlenecks where a | Scalable congestion controls in bottlenecks where a DualQ Coupled AQM | |||
| DualQ Coupled AQM has not been deployed. The supporting papers | has not been deployed. The supporting papers [L4Seval22], | |||
| [DualPI2Linux], [PI2], [DCttH19] and [PI2param] give the full | [DualPI2Linux], [PI2], and [PI2param] give the full rationale for the | |||
| rationale for the AQM's design, both discursively and in more precise | AQM design, both discursively and in more precise mathematical form, | |||
| mathematical form, as well as the results of performance evaluations. | as well as the results of performance evaluations. The main results | |||
| The main results have been validated independently when using the | have been validated independently when using the Prague congestion | |||
| Prague congestion control [Boru20] (experiments are run using Prague | control [Boru20] (experiments are run using Prague and DCTCP, but | |||
| and DCTCP, but only the former are relevant for validation, because | only the former is relevant for validation, because Prague fixes a | |||
| Prague fixes a number of problems with the Linux DCTCP code that make | number of problems with the Linux DCTCP code that make it unsuitable | |||
| it unsuitable for the public Internet). | for the public Internet). | |||
| 1.3. Terminology | 1.3. Terminology | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| document are to be interpreted as described in [RFC2119] [RFC8174] | "OPTIONAL" in this document are to be interpreted as described in | |||
| when, and only when, they appear in all capitals, as shown here. | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | ||||
| The DualQ Coupled AQM uses two queues for two services. Each of the | The DualQ Coupled AQM uses two queues for two services: | |||
| following terms identifies both the service and the queue that | ||||
| provides the service: | ||||
| Classic service/queue: The Classic service is intended for all the | Classic Service/Queue: The Classic service is intended for all the | |||
| congestion control behaviours that co-exist with Reno [RFC5681] | congestion control behaviours that coexist with Reno [RFC5681] | |||
| (e.g. Reno itself, Cubic [RFC8312], TFRC [RFC5348]). | (e.g., Reno itself, CUBIC [RFC8312], and TFRC [RFC5348]). The | |||
| term 'Classic queue' means a queue providing the Classic service. | ||||
| Low-Latency, Low-Loss Scalable throughput (L4S) service/queue: The | Low Latency, Low Loss, and Scalable throughput (L4S) Service/ | |||
| 'L4S' service is intended for traffic from scalable congestion | Queue: The 'L4S' service is intended for traffic from Scalable | |||
| control algorithms, such as TCP Prague | congestion control algorithms, such as the Prague congestion | |||
| [I-D.briscoe-iccrg-prague-congestion-control], which was derived | control [PRAGUE-CC], which was derived from Data Center TCP | |||
| from Data Center TCP [RFC8257]. The L4S service is for more | [RFC8257]. The L4S service is for more general traffic than just | |||
| general traffic than just TCP Prague -- it allows the set of | Prague -- it allows the set of congestion controls with similar | |||
| congestion controls with similar scaling properties to Prague to | scaling properties to Prague to evolve, such as the examples | |||
| evolve, such as the examples of Scalable congestion controls | listed below (Relentless, SCReAM, etc.). The term 'L4S queue' | |||
| listed below (Relentless, SCReAM, etc.). | means a queue providing the L4S service. | |||
| Classic Congestion Control: A congestion control behaviour that can | Classic Congestion Control: A congestion control behaviour that can | |||
| co-exist with standard TCP Reno [RFC5681] without causing | coexist with standard Reno [RFC5681] without causing significantly | |||
| significantly negative impact on its flow rate [RFC5033]. With | negative impact on its flow rate [RFC5033]. With Classic | |||
| Classic congestion controls, such as Reno or Cubic, because flow | congestion controls, such as Reno or CUBIC, because flow rate has | |||
| rate has scaled since TCP congestion control was first designed in | scaled since TCP congestion control was first designed in 1988, it | |||
| 1988, it now takes hundreds of round trips (and growing) to | now takes hundreds of round trips (and growing) to recover after a | |||
| recover after a congestion signal (whether a loss or an ECN mark) | congestion signal (whether a loss or an ECN mark) as shown in the | |||
| as shown in the examples in section 5.1 of the L4S | examples in Section 5.1 of the L4S architecture [RFC9330] and in | |||
| architecture [I-D.ietf-tsvwg-l4s-arch] and in [RFC3649]. | [RFC3649]. Therefore, control of queuing and utilization becomes | |||
| Therefore, control of queuing and utilization becomes very slack, | very slack, and the slightest disturbances (e.g., from new flows | |||
| and the slightest disturbances (e.g. from new flows starting) | starting) prevent a high rate from being attained. | |||
| prevent a high rate from being attained. | ||||
| Scalable Congestion Control: A congestion control where the average | Scalable Congestion Control: A congestion control where the average | |||
| time from one congestion signal to the next (the recovery time) | time from one congestion signal to the next (the recovery time) | |||
| remains invariant as the flow rate scales, all other factors being | remains invariant as flow rate scales, all other factors being | |||
| equal. This maintains the same degree of control over queueing | equal. This maintains the same degree of control over queuing and | |||
| and utilization whatever the flow rate, as well as ensuring that | utilization whatever the flow rate, as well as ensuring that high | |||
| high throughput is robust to disturbances. For instance, DCTCP | throughput is robust to disturbances. For instance, DCTCP | |||
| averages 2 congestion signals per round-trip whatever the flow | averages 2 congestion signals per round trip, whatever the flow | |||
| rate, as do other recently developed scalable congestion controls, | rate, as do other recently developed Scalable congestion controls, | |||
| e.g. Relentless TCP [I-D.mathis-iccrg-relentless-tcp], TCP Prague | e.g., Relentless TCP [RELENTLESS], Prague [PRAGUE-CC] | |||
| [I-D.briscoe-iccrg-prague-congestion-control], [PragueLinux], | [PragueLinux], BBRv2 [BBRv2] [BBR-CC], and the L4S variant of | |||
| BBRv2 [BBRv2], [I-D.cardwell-iccrg-bbr-congestion-control] and the | SCReAM for real-time media [SCReAM-L4S] [RFC8298]. For the public | |||
| L4S variant of SCREAM for real-time media [SCReAM], [RFC8298]). | Internet, a Scalable transport has to comply with the requirements | |||
| For the public Internet a Scalable transport has to comply with | in Section 4 of [RFC9331] (a.k.a. the 'Prague L4S requirements'). | |||
| the requirements in Section 4 of [I-D.ietf-tsvwg-ecn-l4s-id] | ||||
| (aka. the 'Prague L4S requirements'). | ||||
| C: Abbreviation for Classic, e.g. when used as a subscript. | C: Abbreviation for Classic, e.g., when used as a subscript. | |||
| L: Abbreviation for L4S, e.g. when used as a subscript. | L: Abbreviation for L4S, e.g., when used as a subscript. | |||
| The terms Classic or L4S can also qualify other nouns, such as | The terms Classic or L4S can also qualify other nouns, such as | |||
| 'codepoint', 'identifier', 'classification', 'packet', 'flow'. | 'codepoint', 'identifier', 'classification', 'packet', and 'flow'. | |||
| For example: an L4S packet means a packet with an L4S identifier | For example, an L4S packet means a packet with an L4S identifier | |||
| sent from an L4S congestion control. | sent from an L4S congestion control. | |||
| Both Classic and L4S services can cope with a proportion of | Both Classic and L4S services can cope with a proportion of | |||
| unresponsive or less-responsive traffic as well, but in the L4S | unresponsive or less-responsive traffic as well but, in the L4S | |||
| case its rate has to be smooth enough or low enough not to build a | case, its rate has to be smooth enough or low enough to not build | |||
| queue (e.g. DNS, VoIP, game sync datagrams, etc.). The DualQ | a queue (e.g., DNS, Voice over IP (VoIP), game sync datagrams, | |||
| Coupled AQM behaviour is defined to be similar to a single FIFO | etc.). The DualQ Coupled AQM behaviour is defined to be similar | |||
| queue with respect to unresponsive and overload traffic. | to a single First-In, First-Out (FIFO) queue with respect to | |||
| unresponsive and overload traffic. | ||||
| Reno-friendly: The subset of Classic traffic that is friendly to the | Reno-friendly: The subset of Classic traffic that is friendly to the | |||
| standard Reno congestion control defined for TCP in [RFC5681]. | standard Reno congestion control defined for TCP in [RFC5681]. | |||
| Reno-friendly is used in place of 'TCP-friendly', given the latter | The TFRC spec [RFC5348] indirectly implies that 'friendly' is | |||
| has become imprecise, because the TCP protocol is now used with so | defined as "generally within a factor of two of the sending rate | |||
| many different congestion control behaviours, and Reno is used in | of a TCP flow under the same conditions". 'Reno-friendly' is used | |||
| non-TCP transports such as QUIC. | here in place of 'TCP-friendly', given the latter has become | |||
| imprecise, because the TCP protocol is now used with so many | ||||
| different congestion control behaviours, and Reno is used in non- | ||||
| TCP transports, such as QUIC [RFC9000]. | ||||
| DualQ or DualQ AQM: Used loosely as shorthand for a Dual-Queue | ||||
| Coupled AQM, where the context makes 'Coupled AQM' obvious. | ||||
| Classic ECN: The original Explicit Congestion Notification (ECN) | Classic ECN: The original Explicit Congestion Notification (ECN) | |||
| protocol [RFC3168], which requires ECN signals to be treated the | protocol [RFC3168] that requires ECN signals to be treated as | |||
| same as drops, both when generated in the network and when | equivalent to drops, both when generated in the network and when | |||
| responded to by the sender. | responded to by the sender. | |||
| For L4S, the names used for the four codepoints of the 2-bit IP- | For L4S, the names used for the four codepoints of the 2-bit IP- | |||
| ECN field are unchanged from those defined in [RFC3168]: Not ECT, | ECN field are unchanged from those defined in the ECN spec | |||
| ECT(0), ECT(1) and CE, where ECT stands for ECN-Capable Transport | [RFC3168], i.e., Not-ECT, ECT(0), ECT(1), and CE, where ECT stands | |||
| and CE stands for Congestion Experienced. A packet marked with | for ECN-Capable Transport and CE stands for Congestion | |||
| the CE codepoint is termed 'ECN-marked' or sometimes just 'marked' | Experienced. A packet marked with the CE codepoint is termed | |||
| where the context makes ECN obvious. | 'ECN-marked' or sometimes just 'marked' where the context makes | |||
| ECN obvious. | ||||
| 1.4. Features | 1.4. Features | |||
| The AQM couples marking and/or dropping from the Classic queue to the | The AQM couples marking and/or dropping from the Classic queue to the | |||
| L4S queue in such a way that a flow will get roughly the same | L4S queue in such a way that a flow will get roughly the same | |||
| throughput whichever it uses. Therefore, both queues can feed into | throughput whichever it uses. Therefore, both queues can feed into | |||
| the full capacity of a link and no rates need to be configured for | the full capacity of a link, and no rates need to be configured for | |||
| the queues. The L4S queue enables Scalable congestion controls like | the queues. The L4S queue enables Scalable congestion controls like | |||
| DCTCP or TCP Prague to give very low and predictably low latency, | DCTCP or Prague to give very low and consistently low latency, | |||
| without compromising the performance of competing 'Classic' Internet | without compromising the performance of competing 'Classic' Internet | |||
| traffic. | traffic. | |||
| Thousands of tests have been conducted in a typical fixed residential | Thousands of tests have been conducted in a typical fixed residential | |||
| broadband setting. Experiments used a range of base round trip | broadband setting. Experiments used a range of base round-trip | |||
| delays up to 100ms and link rates up to 200 Mb/s between the data | delays up to 100 ms and link rates up to 200 Mb/s between the data | |||
| centre and home network, with varying amounts of background traffic | centre and home network, with varying amounts of background traffic | |||
| in both queues. For every L4S packet, the AQM kept the average | in both queues. For every L4S packet, the AQM kept the average | |||
| queuing delay below 1ms (or 2 packets where serialization delay | queuing delay below 1 ms (or 2 packets where serialization delay | |||
| exceeded 1ms on slower links), with 99th percentile no worse than | exceeded 1 ms on slower links), with the 99th percentile being no | |||
| 2ms. No losses at all were introduced by the L4S AQM. Details of | worse than 2 ms. No losses at all were introduced by the L4S AQM. | |||
| the extensive experiments are available [DualPI2Linux], [PI2], | Details of the extensive experiments are available in [L4Seval22] and | |||
| [DCttH19]. Subjective testing using very demanding high bandwidth | [DualPI2Linux]. Subjective testing using very demanding high- | |||
| low latency applications over a single shared access link is also | bandwidth low-latency applications over a single shared access link | |||
| described in [L4Sdemo16] and summarized in the section about | is also described in [L4Sdemo16] and summarized in Section 6.1 of the | |||
| applications in the L4S architecture [I-D.ietf-tsvwg-l4s-arch] . | L4S architecture [RFC9330]. | |||
| In all these experiments, the host was connected to the home network | In all these experiments, the host was connected to the home network | |||
| by fixed Ethernet, in order to quantify the queuing delay that can be | by fixed Ethernet, in order to quantify the queuing delay that can be | |||
| achieved by a user who cares about delay. It should be emphasized | achieved by a user who cares about delay. It should be emphasized | |||
| that L4S support at the bottleneck link cannot 'undelay' bursts | that L4S support at the bottleneck link cannot 'undelay' bursts | |||
| introduced by another link on the path, for instance by legacy Wi-Fi | introduced by another link on the path, for instance by legacy Wi-Fi | |||
| equipment. However, if L4S support is added to the queue feeding the | equipment. However, if L4S support is added to the queue feeding the | |||
| _outgoing_ WAN link of a home gateway, it would be counterproductive | _outgoing_ WAN link of a home gateway, it would be counterproductive | |||
| not to also reduce the burstiness of the _incoming_ Wi-Fi. Also, | not to also reduce the burstiness of the _incoming_ Wi-Fi. Also, | |||
| trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the | trials of Wi-Fi equipment with an L4S DualQ Coupled AQM on the | |||
| _outgoing_ Wi-Fi interface are in progress, and early results of an | _outgoing_ Wi-Fi interface are in progress, and early results of an | |||
| L4S DualQ Coupled AQM in a 5G radio access network testbed with | L4S DualQ Coupled AQM in a 5G radio access network testbed with | |||
| emulated outdoor cell edge radio fading are given in [L4S_5G]. | emulated outdoor cell edge radio fading are given in [L4S_5G]. | |||
| Unlike Diffserv Expedited Forwarding, the L4S queue does not have to | Unlike Diffserv EF, the L4S queue does not have to be limited to a | |||
| be limited to a small proportion of the link capacity in order to | small proportion of the link capacity in order to achieve low delay. | |||
| achieve low delay. The L4S queue can be filled with a heavy load of | The L4S queue can be filled with a heavy load of capacity-seeking | |||
| capacity-seeking flows (TCP Prague etc.) and still achieve low delay. | flows (Prague, BBRv2, etc.) and still achieve low delay. The L4S | |||
| The L4S queue does not rely on the presence of other traffic in the | queue does not rely on the presence of other traffic in the Classic | |||
| Classic queue that can be 'overtaken'. It gives low latency to L4S | queue that can be 'overtaken'. It gives low latency to L4S traffic | |||
| traffic whether or not there is Classic traffic. The tail latency of | whether or not there is Classic traffic. The tail latency of traffic | |||
| traffic served by the Classic AQM is sometimes a little better | served by the Classic AQM is sometimes a little better, sometimes a | |||
| sometimes a little worse, when a proportion of the traffic is L4S. | little worse, when a proportion of the traffic is L4S. | |||
| The two queues are only necessary because: | The two queues are only necessary because: | |||
| * the large variations (sawteeth) of Classic flows need roughly a | * The large variations (sawteeth) of Classic flows need roughly a | |||
| base RTT of queuing delay to ensure full utilization | base RTT of queuing delay to ensure full utilization. | |||
| * Scalable flows do not need a queue to keep utilization high, but | * Scalable flows do not need a queue to keep utilization high, but | |||
| they cannot keep latency predictably low if they are mixed with | they cannot keep latency consistently low if they are mixed with | |||
| Classic traffic, | Classic traffic. | |||
| The L4S queue has latency priority within sub-round trip timescales, | The L4S queue has latency priority within sub-round-trip timescales, | |||
| but over longer periods the coupling from the Classic to the L4S AQM | but over longer periods the coupling from the Classic to the L4S AQM | |||
| (explained below) ensures that it does not have bandwidth priority | (explained below) ensures that it does not have bandwidth priority | |||
| over the Classic queue. | over the Classic queue. | |||
| 2. DualQ Coupled AQM | 2. DualQ Coupled AQM | |||
| There are two main aspects to the approach: | There are two main aspects to the DualQ Coupled AQM approach: | |||
| * The Coupled AQM that addresses throughput equivalence between | 1. The Coupled AQM that addresses throughput equivalence between | |||
| Classic (e.g. Reno, Cubic) flows and L4S flows (that satisfy the | Classic (e.g., Reno or CUBIC) flows and L4S flows (that satisfy | |||
| Prague L4S requirements). | the Prague L4S requirements). | |||
| * The Dual Queue structure that provides latency separation for L4S | 2. The Dual-Queue structure that provides latency separation for L4S | |||
| flows to isolate them from the typically large Classic queue. | flows to isolate them from the typically large Classic queue. | |||
| 2.1. Coupled AQM | 2.1. Coupled AQM | |||
| In the 1990s, the `TCP formula' was derived for the relationship | In the 1990s, the 'TCP formula' was derived for the relationship | |||
| between the steady-state congestion window, cwnd, and the drop | between the steady-state congestion window, cwnd, and the drop | |||
| probability, p of standard Reno congestion control [RFC5681]. To a | probability, p of standard Reno congestion control [RFC5681]. To a | |||
| first order approximation, the steady-state cwnd of Reno is inversely | first-order approximation, the steady-state cwnd of Reno is inversely | |||
| proportional to the square root of p. | proportional to the square root of p. | |||
| The design focuses on Reno as the worst case, because if it does no | The design focuses on Reno as the worst case, because if it does no | |||
| harm to Reno, it will not harm Cubic or any traffic designed to be | harm to Reno, it will not harm CUBIC or any traffic designed to be | |||
| friendly to Reno. TCP Cubic implements a Reno-compatibility mode, | friendly to Reno. TCP CUBIC implements a Reno-friendly mode, which | |||
| which is relevant for typical RTTs under 20ms as long as the | is relevant for typical RTTs under 20 ms as long as the throughput of | |||
| throughput of a single flow is less than about 350Mb/s. In such | a single flow is less than about 350 Mb/s. In such cases, it can be | |||
| cases it can be assumed that Cubic traffic behaves similarly to Reno. | assumed that CUBIC traffic behaves similarly to Reno. The term | |||
| The term 'Classic' will be used for the collection of Reno-friendly | 'Classic' will be used for the collection of Reno-friendly traffic | |||
| traffic including Cubic and potentially other experimental congestion | including CUBIC and potentially other experimental congestion | |||
| controls intended not to significantly impact the flow rate of Reno. | controls intended not to significantly impact the flow rate of Reno. | |||
| A supporting paper [PI2] includes the derivation of the equivalent | A supporting paper [PI2] includes the derivation of the equivalent | |||
| rate equation for DCTCP, for which cwnd is inversely proportional to | rate equation for DCTCP, for which cwnd is inversely proportional to | |||
| p (not the square root), where in this case p is the ECN marking | p (not the square root), where in this case p is the ECN-marking | |||
| probability. DCTCP is not the only congestion control that behaves | probability. DCTCP is not the only congestion control that behaves | |||
| like this, so the term 'Scalable' will be used for all similar | like this, so the term 'Scalable' will be used for all similar | |||
| congestion control behaviours (see examples in Section 1.2). The | congestion control behaviours (see examples in Section 1.2). The | |||
| term 'L4S' is used for traffic driven by a Scalable congestion | term 'L4S' is used for traffic driven by a Scalable congestion | |||
| control that also complies with the additional 'Prague L4S' | control that also complies with the additional 'Prague L4S | |||
| requirements [I-D.ietf-tsvwg-ecn-l4s-id]. | requirements' [RFC9331]. | |||
| For safe co-existence, under stationary conditions, a Scalable flow | For safe coexistence, under stationary conditions, a Scalable flow | |||
| has to run at roughly the same rate as a Reno TCP flow (all other | has to run at roughly the same rate as a Reno TCP flow (all other | |||
| factors being equal). So the drop or marking probability for Classic | factors being equal). So the drop or marking probability for Classic | |||
| traffic, p_C has to be distinct from the marking probability for L4S | traffic, p_C, has to be distinct from the marking probability for L4S | |||
| traffic, p_L. The original ECN specification [RFC3168] required | traffic, p_L. The original ECN spec [RFC3168] required these | |||
| these probabilities to be the same, but [RFC8311] updates RFC 3168 to | probabilities to be the same, but [RFC8311] updates [RFC3168] to | |||
| enable experiments in which these probabilities are different. | enable experiments in which these probabilities are different. | |||
| Also, to remain stable, Classic sources need the network to smooth | Also, to remain stable, Classic sources need the network to smooth | |||
| p_C so it changes relatively slowly. It is hard for a network node | p_C so it changes relatively slowly. It is hard for a network node | |||
| to know the RTTs of all the flows, so a Classic AQM adds a _worst- | to know the RTTs of all the flows, so a Classic AQM adds a _worst- | |||
| case_ RTT of smoothing delay (about 100-200 ms). In contrast, L4S | case_ RTT of smoothing delay (about 100-200 ms). In contrast, L4S | |||
| shifts responsibility for smoothing ECN feedback to the sender, which | shifts responsibility for smoothing ECN feedback to the sender, which | |||
| only delays its response by its _own_ RTT, as well as allowing a more | only delays its response by its _own_ RTT, as well as allowing a more | |||
| immediate response if necessary. | immediate response if necessary. | |||
| The Coupled AQM achieves safe coexistence by making the Classic drop | The Coupled AQM achieves safe coexistence by making the Classic drop | |||
| probability p_C proportional to the square of the coupled L4S | probability p_C proportional to the square of the coupled L4S | |||
| probability p_CL. p_CL is an input to the instantaneous L4S marking | probability p_CL. p_CL is an input to the instantaneous L4S marking | |||
| probability p_L but it changes as slowly as p_C. This makes the Reno | probability p_L, but it changes as slowly as p_C. This makes the | |||
| flow rate roughly equal the DCTCP flow rate, because the squaring of | Reno flow rate roughly equal the DCTCP flow rate, because the | |||
| p_CL counterbalances the square root of p_C in the 'TCP formula' of | squaring of p_CL counterbalances the square root of p_C in the 'TCP | |||
| Classic Reno congestion control. | formula' of Classic Reno congestion control. | |||
| Stating this as a formula, the relation between Classic drop | Stating this as a formula, the relation between Classic drop | |||
| probability, p_C, and the coupled L4S probability p_CL needs to take | probability, p_C, and the coupled L4S probability p_CL needs to take | |||
| the form: | the following form: | |||
| p_C = ( p_CL / k )^2 (1) | p_C = ( p_CL / k )^2, (1) | |||
| where k is the constant of proportionality, which is termed the | where k is the constant of proportionality, which is termed the | |||
| coupling factor. | 'coupling factor'. | |||
| 2.2. Dual Queue | 2.2. Dual Queue | |||
| Classic traffic needs to build a large queue to prevent under- | Classic traffic needs to build a large queue to prevent | |||
| utilization. Therefore, a separate queue is provided for L4S | underutilization. Therefore, a separate queue is provided for L4S | |||
| traffic, and it is scheduled with priority over the Classic queue. | traffic, and it is scheduled with priority over the Classic queue. | |||
| Priority is conditional to prevent starvation of Classic traffic in | Priority is conditional to prevent starvation of Classic traffic in | |||
| certain conditions (see Section 2.4). | certain conditions (see Section 2.4). | |||
| Nonetheless, coupled marking ensures that giving priority to L4S | Nonetheless, coupled marking ensures that giving priority to L4S | |||
| traffic still leaves the right amount of spare scheduling time for | traffic still leaves the right amount of spare scheduling time for | |||
| Classic flows to each get equivalent throughput to DCTCP flows (all | Classic flows to each get equivalent throughput to DCTCP flows (all | |||
| other factors such as RTT being equal). | other factors, such as RTT, being equal). | |||
| 2.3. Traffic Classification | 2.3. Traffic Classification | |||
| Both the Coupled AQM and DualQ mechanisms need an identifier to | Both the Coupled AQM and DualQ mechanisms need an identifier to | |||
| distinguish L4S (L) and Classic (C) packets. Then the coupling | distinguish L4S (L) and Classic (C) packets. Then the coupling | |||
| algorithm can achieve coexistence without having to inspect flow | algorithm can achieve coexistence without having to inspect flow | |||
| identifiers, because it can apply the appropriate marking or dropping | identifiers, because it can apply the appropriate marking or dropping | |||
| probability to all flows of each type. A separate | probability to all flows of each type. A separate specification | |||
| specification [I-D.ietf-tsvwg-ecn-l4s-id] requires the network to | [RFC9331] requires the network to treat the ECT(1) and CE codepoints | |||
| treat the ECT(1) and CE codepoints of the ECN field as this | of the ECN field as this identifier. An additional process document | |||
| identifier. An additional process document has proved necessary to | has proved necessary to make the ECT(1) codepoint available for | |||
| make the ECT(1) codepoint available for experimentation [RFC8311]. | experimentation [RFC8311]. | |||
| For policy reasons, an operator might choose to steer certain packets | For policy reasons, an operator might choose to steer certain packets | |||
| (e.g. from certain flows or with certain addresses) out of the L | (e.g., from certain flows or with certain addresses) out of the L | |||
| queue, even though they identify themselves as L4S by their ECN | queue, even though they identify themselves as L4S by their ECN | |||
| codepoints. In such cases, the L4S ECN | codepoints. In such cases, the L4S ECN protocol [RFC9331] states | |||
| protocol [I-D.ietf-tsvwg-ecn-l4s-id] says that the device "MUST NOT | that the device "MUST NOT alter the end-to-end L4S ECN identifier" so | |||
| alter the end-to-end L4S ECN identifier", so that it is preserved | that it is preserved end to end. The aim is that each operator can | |||
| end-to-end. The aim is that each operator can choose how it treats | choose how it treats L4S traffic locally, but an individual operator | |||
| L4S traffic locally, but an individual operator does not alter the | does not alter the identification of L4S packets, which would prevent | |||
| identification of L4S packets, which would prevent other operators | other operators downstream from making their own choices on how to | |||
| downstream from making their own choices on how to treat L4S traffic. | treat L4S traffic. | |||
| In addition, an operator could use other identifiers to classify | In addition, an operator could use other identifiers to classify | |||
| certain additional packet types into the L queue that it deems will | certain additional packet types into the L queue that it deems will | |||
| not risk harm to the L4S service. For instance addresses of specific | not risk harm to the L4S service, for instance, addresses of specific | |||
| applications or hosts; specific Diffserv codepoints such as EF | applications or hosts; specific Diffserv codepoints such as EF, | |||
| (Expedited Forwarding), Voice-Admit or the Non-Queue-Building (NQB) | Voice-Admit, or the Non-Queue-Building (NQB) per-hop behaviour; or | |||
| per-hop behaviour; or certain protocols (e.g. ARP, DNS) (see | certain protocols (e.g., ARP and DNS) (see Section 5.4.1 of | |||
| Section 5.4.1 of [I-D.ietf-tsvwg-ecn-l4s-id]). Note that the | [RFC9331]. Note that [RFC9331] states that "a network node MUST NOT | |||
| mechanism only reads these identifiers. [I-D.ietf-tsvwg-ecn-l4s-id] | change Not-ECT or ECT(0) in the IP-ECN field into an L4S identifier." | |||
| says it "MUST NOT alter these non-ECN identifiers". Thus, the L | Thus, the L queue is not solely an L4S queue; it can be considered | |||
| queue is not solely an L4S queue, it can be considered more generally | more generally as a low-latency queue. | |||
| as a low latency queue. | ||||
| 2.4. Overall DualQ Coupled AQM Structure | 2.4. Overall DualQ Coupled AQM Structure | |||
| Figure 1 shows the overall structure that any DualQ Coupled AQM is | Figure 1 shows the overall structure that any DualQ Coupled AQM is | |||
| likely to have. This schematic is intended to aid understanding of | likely to have. This schematic is intended to aid understanding of | |||
| the current designs of DualQ Coupled AQMs. However, it is not | the current designs of DualQ Coupled AQMs. However, it is not | |||
| intended to preclude other innovative ways of satisfying the | intended to preclude other innovative ways of satisfying the | |||
| normative requirements in Section 2.5 that minimally define a DualQ | normative requirements in Section 2.5 that minimally define a DualQ | |||
| Coupled AQM. Also, the schematic only illustrates operation under | Coupled AQM. Also, the schematic only illustrates operation under | |||
| normally expected circumstances; behaviour under overload or with | normally expected circumstances; behaviour under overload or with | |||
| operator-specific classifiers is deferred to Section 2.5.1.1. | operator-specific classifiers is deferred to Section 2.5.1.1. | |||
| The classifier on the left separates incoming traffic between the two | The classifier on the left separates incoming traffic between the two | |||
| queues (L and C). Each queue has its own AQM that determines the | queues (L and C). Each queue has its own AQM that determines the | |||
| likelihood of marking or dropping (p_L and p_C). It has been | likelihood of marking or dropping (p_L and p_C). In [PI2], it has | |||
| proved [PI2] that it is preferable to control load with a linear | been proved that it is preferable to control load with a linear | |||
| controller, then square the output before applying it as a drop | controller, then square the output before applying it as a drop | |||
| probability to Reno-friendly traffic (because Reno congestion control | probability to Reno-friendly traffic (because Reno congestion control | |||
| decreases its load proportional to the square-root of the increase in | decreases its load proportional to the square root of the increase in | |||
| drop). So, the AQM for Classic traffic needs to be implemented in | drop). So, the AQM for Classic traffic needs to be implemented in | |||
| two stages: i) a base stage that outputs an internal probability p' | two stages: i) a base stage that outputs an internal probability p' | |||
| (pronounced p-prime); and ii) a squaring stage that outputs p_C, | (pronounced 'p-prime') and ii) a squaring stage that outputs p_C, | |||
| where | where | |||
| p_C = (p')^2. (2) | p_C = (p')^2. (2) | |||
| Substituting for p_C in Eqn (1) gives: | Substituting for p_C in equation (1) gives | |||
| p' = p_CL / k | p' = p_CL / k. | |||
| So the slow-moving input to ECN marking in the L queue (the coupled | So the slow-moving input to ECN marking in the L queue (the coupled | |||
| L4S probability) is: | L4S probability) is | |||
| p_CL = k*p'. (3) | p_CL = k*p'. (3) | |||
| The actual ECN marking probability p_L that is applied to the L queue | The actual ECN-marking probability p_L that is applied to the L queue | |||
| needs to track the immediate L queue delay under L-only congestion | needs to track the immediate L queue delay under L-only congestion | |||
| conditions, as well as track p_CL under coupled congestion | conditions, as well as track p_CL under coupled congestion | |||
| conditions. So the L queue uses a native AQM that calculates a | conditions. So the L queue uses a 'Native AQM' that calculates a | |||
| probability p'_L as a function of the instantaneous L queue delay. | probability p'_L as a function of the instantaneous L queue delay. | |||
| And, given the L queue has conditional priority over the C queue, | And given the L queue has conditional priority over the C queue, | |||
| whenever the L queue grows, the AQM ought to apply marking | whenever the L queue grows, the AQM ought to apply marking | |||
| probability p'_L, but p_L ought not to fall below p_CL. This | probability p'_L, but p_L ought to not fall below p_CL. This | |||
| suggests: | suggests | |||
| p_L = max(p'_L, p_CL), (4) | p_L = max(p'_L, p_CL), (4) | |||
| which has also been found to work very well in practice. | which has also been found to work very well in practice. | |||
| The two transformations of p' in equations (2) and (3) implement the | The two transformations of p' in equations (2) and (3) implement the | |||
| required coupling given in equation (1) earlier. | required coupling given in equation (1) earlier. | |||
| The constant of proportionality or coupling factor, k, in equation | The constant of proportionality or coupling factor, k, in equation | |||
| (1) determines the ratio between the congestion probabilities (loss | (1) determines the ratio between the congestion probabilities (loss | |||
| skipping to change at page 15, line 28 ¶ | skipping to change at line 688 ¶ | |||
| `----------'\\ | AQM |---->: ,'|`-.___.-' | `----------'\\ | AQM |---->: ,'|`-.___.-' | |||
| \\ | |p' | <' | | \\ | |p' | <' | | |||
| \\ `-------' (p'^2) //`' | \\ `-------' (p'^2) //`' | |||
| \\ ^ | // | \\ ^ | // | |||
| \\,. | v p_C // | \\,. | v p_C // | |||
| < | _________ .------.// | < | _________ .------.// | |||
| `\| | | | Drop |/ | `\| | | | Drop |/ | |||
| Classic (C) |queue |===>|/mark | | Classic (C) |queue |===>|/mark | | |||
| __|______| `------' | __|______| `------' | |||
| Figure 1: DualQ Coupled AQM Schematic | Legend: ===> traffic flow | |||
| ---> control dependency | ||||
| Legend: ===> traffic flow; ---> control dependency. | Figure 1: DualQ Coupled AQM Schematic | |||
| After the AQMs have applied their dropping or marking, the scheduler | After the AQMs have applied their dropping or marking, the scheduler | |||
| forwards their packets to the link. Even though the scheduler gives | forwards their packets to the link. Even though the scheduler gives | |||
| priority to the L queue, it is not as strong as the coupling from the | priority to the L queue, it is not as strong as the coupling from the | |||
| C queue. This is because, as the C queue grows, the base AQM applies | C queue. This is because, as the C queue grows, the 'Base AQM' | |||
| more congestion signals to L traffic (as well as C). As L flows | applies more congestion signals to L traffic (as well as to C). As L | |||
| reduce their rate in response, they use less than the scheduling | flows reduce their rate in response, they use less than the | |||
| share for L traffic. So, because the scheduler is work preserving, | scheduling share for L traffic. So, because the scheduler is work | |||
| it schedules any C traffic in the gaps. | preserving, it schedules any C traffic in the gaps. | |||
| Giving priority to the L queue has the benefit of very low L queue | Giving priority to the L queue has the benefit of very low L queue | |||
| delay, because the L queue is kept empty whenever L traffic is | delay, because the L queue is kept empty whenever L traffic is | |||
| controlled by the coupling. Also, there only has to be a coupling in | controlled by the coupling. Also, there only has to be a coupling in | |||
| one direction - from Classic to L4S. Priority has to be conditional | one direction -- from Classic to L4S. Priority has to be conditional | |||
| in some way to prevent the C queue being starved in the short-term | in some way to prevent the C queue from being starved in the short | |||
| (see Section 4.2.2) to give C traffic a means to push in, as | term (see Section 4.2.2) to give C traffic a means to push in, as | |||
| explained next. With normal responsive L traffic, the coupled ECN | explained next. With normal responsive L traffic, the coupled ECN | |||
| marking gives C traffic the ability to push back against even strict | marking gives C traffic the ability to push back against even strict | |||
| priority, by congestion marking the L traffic to make it yield some | priority, by congestion marking the L traffic to make it yield some | |||
| space. However, if there is just a small finite set of C packets | space. However, if there is just a small finite set of C packets | |||
| (e.g. a DNS request or an initial window of data) some Classic AQMs | (e.g., a DNS request or an initial window of data), some Classic AQMs | |||
| will not induce enough ECN marking in the L queue, no matter how long | will not induce enough ECN marking in the L queue, no matter how long | |||
| the small set of C packets waits. Then, if the L queue happens to | the small set of C packets waits. Then, if the L queue happens to | |||
| remain busy, the C traffic would never get a scheduling opportunity | remain busy, the C traffic would never get a scheduling opportunity | |||
| from a strict priority scheduler. Ideally the Classic AQM would be | from a strict priority scheduler. Ideally, the Classic AQM would be | |||
| designed to increase the coupled marking the longer that C packets | designed to increase the coupled marking the longer that C packets | |||
| have been waiting, but this is not always practical - hence the need | have been waiting, but this is not always practical -- hence the need | |||
| for L priority to be conditional. Giving a small weight or limited | for L priority to be conditional. Giving a small weight or limited | |||
| waiting time for C traffic improves response times for short Classic | waiting time for C traffic improves response times for short Classic | |||
| messages, such as DNS requests, and improves Classic flow startup | messages, such as DNS requests, and improves Classic flow startup | |||
| because immediate capacity is available. | because immediate capacity is available. | |||
| Example DualQ Coupled AQM algorithms called DualPI2 and Curvy RED are | Example DualQ Coupled AQM algorithms called 'DualPI2' and 'Curvy RED' | |||
| given in Appendix A and Appendix B. Either example AQM can be used | are given in Appendices A and B. Either example AQM can be used to | |||
| to couple packet marking and dropping across a dual Q. | couple packet marking and dropping across a DualQ: | |||
| DualPI2 uses a Proportional-Integral (PI) controller as the Base AQM. | * DualPI2 uses a Proportional Integral (PI) controller as the Base | |||
| Indeed, this Base AQM with just the squared output and no L4S queue | AQM. Indeed, this Base AQM with just the squared output and no | |||
| can be used as a drop-in replacement for PIE [RFC8033], in which case | L4S queue can be used as a drop-in replacement for PIE [RFC8033], | |||
| it is just called PI2 [PI2]. PI2 is a principled simplification of | in which case it is just called PI2 [PI2]. PI2 is a principled | |||
| PIE that is both more responsive and more stable in the face of | simplification of PIE that is both more responsive and more stable | |||
| dynamically varying load. | in the face of dynamically varying load. | |||
| Curvy RED is derived from RED [RFC2309], except its configuration | * Curvy RED is derived from RED [RED], except its configuration | |||
| parameters are delay-based to make them insensitive to link rate and | parameters are delay-based to make them insensitive to link rate, | |||
| it requires fewer operations per packet than RED. However, DualPI2 | and it requires fewer operations per packet than RED. However, | |||
| is more responsive and stable over a wider range of RTTs than Curvy | DualPI2 is more responsive and stable over a wider range of RTTs | |||
| RED. As a consequence, at the time of writing, DualPI2 has attracted | than Curvy RED. As a consequence, at the time of writing, DualPI2 | |||
| more development and evaluation attention than Curvy RED, leaving the | has attracted more development and evaluation attention than Curvy | |||
| Curvy RED design not so fully evaluated. | RED, leaving the Curvy RED design not so fully evaluated. | |||
| Both AQMs regulate their queue against targets configured in units of | Both AQMs regulate their queue against targets configured in units of | |||
| time rather than bytes. As already explained, this ensures | time rather than bytes. As already explained, this ensures | |||
| configuration can be invariant for different drain rates. With AQMs | configuration can be invariant for different drain rates. With AQMs | |||
| in a dualQ structure this is particularly important because the drain | in a DualQ structure this is particularly important because the drain | |||
| rate of each queue can vary rapidly as flows for the two queues | rate of each queue can vary rapidly as flows for the two queues | |||
| arrive and depart, even if the combined link rate is constant. | arrive and depart, even if the combined link rate is constant. | |||
| It would be possible to control the queues with other alternative | It would be possible to control the queues with other alternative | |||
| AQMs, as long as the normative requirements (those expressed in | AQMs, as long as the normative requirements (those expressed in | |||
| capitals) in Section 2.5 are observed. | capitals) in Section 2.5 are observed. | |||
| The two queues could optionally be part of a larger queuing | The two queues could optionally be part of a larger queuing | |||
| hierarchy, such as the initial example ideas in | hierarchy, such as the initial example ideas in [L4S-DIFFSERV]. | |||
| [I-D.briscoe-tsvwg-l4s-diffserv]. | ||||
| 2.5. Normative Requirements for a DualQ Coupled AQM | 2.5. Normative Requirements for a DualQ Coupled AQM | |||
| The following requirements are intended to capture only the essential | The following requirements are intended to capture only the essential | |||
| aspects of a DualQ Coupled AQM. They are intended to be independent | aspects of a DualQ Coupled AQM. They are intended to be independent | |||
| of the particular AQMs implemented for each queue, but to still | of the particular AQMs implemented for each queue but to still define | |||
| define the DualQ framework built around those AQMs. | the DualQ framework built around those AQMs. | |||
| 2.5.1. Functional Requirements | 2.5.1. Functional Requirements | |||
| A Dual Queue Coupled AQM implementation MUST comply with the | A DualQ Coupled AQM implementation MUST comply with the prerequisite | |||
| prerequisite L4S behaviours for any L4S network node (not just a | L4S behaviours for any L4S network node (not just a DualQ) as | |||
| DualQ) as specified in section 5 of [I-D.ietf-tsvwg-ecn-l4s-id]. | specified in Section 5 of [RFC9331]. These primarily concern | |||
| These primarily concern classification and remarking as briefly | classification and re-marking as briefly summarized earlier in | |||
| summarized in Section 2.3 earlier. But there is also a subsection | Section 2.3. But Section 5.5 of [RFC9331] also gives guidance on | |||
| (5.5) giving guidance on reducing the burstiness of the link | reducing the burstiness of the link technology underlying any L4S | |||
| technology underlying any L4S AQM. | AQM. | |||
| A Dual Queue Coupled AQM implementation MUST utilize two queues, each | A DualQ Coupled AQM implementation MUST utilize two queues, each with | |||
| with an AQM algorithm. | an AQM algorithm. | |||
| The AQM algorithm for the low latency (L) queue MUST be able to apply | The AQM algorithm for the low-latency (L) queue MUST be able to apply | |||
| ECN marking to ECN-capable packets. | ECN marking to ECN-capable packets. | |||
| The scheduler draining the two queues MUST give L4S packets priority | The scheduler draining the two queues MUST give L4S packets priority | |||
| over Classic, although priority MUST be bounded in order not to | over Classic, although priority MUST be bounded in order not to | |||
| starve Classic traffic (see Section 4.2.2). The scheduler SHOULD be | starve Classic traffic (see Section 4.2.2). The scheduler SHOULD be | |||
| work-conserving, or otherwise close to work-conserving. This is | work-conserving, or otherwise close to work-conserving. This is | |||
| because Classic traffic needs to be able to efficiently fill any | because Classic traffic needs to be able to efficiently fill any | |||
| space left by L4S traffic even though the scheduler would otherwise | space left by L4S traffic even though the scheduler would otherwise | |||
| allocate it to L4S. | allocate it to L4S. | |||
| [I-D.ietf-tsvwg-ecn-l4s-id] defines the meaning of an ECN marking on | [RFC9331] defines the meaning of an ECN marking on L4S traffic, | |||
| L4S traffic, relative to drop of Classic traffic. In order to ensure | relative to drop of Classic traffic. In order to ensure coexistence | |||
| coexistence of Classic and Scalable L4S traffic, it says, "The | of Classic and Scalable L4S traffic, it says, "the likelihood that | |||
| likelihood that an AQM drops a Not-ECT Classic packet (p_C) MUST be | the AQM drops a Not-ECT Classic packet (p_C) MUST be roughly | |||
| roughly proportional to the square of the likelihood that it would | proportional to the square of the likelihood that it would have | |||
| have marked it if it had been an L4S packet (p_L)." The term | marked it if it had been an L4S packet (p_L)." The term 'likelihood' | |||
| 'likelihood' is used to allow for marking and dropping to be either | is used to allow for marking and dropping to be either probabilistic | |||
| probabilistic or deterministic. | or deterministic. | |||
| For the current specification, this translates into the following | For the current specification, this translates into the following | |||
| requirement. A DualQ Coupled AQM MUST apply ECN marking to traffic | requirement. A DualQ Coupled AQM MUST apply ECN marking to traffic | |||
| in the L queue that is no lower than that derived from the likelihood | in the L queue that is no lower than that derived from the likelihood | |||
| of drop (or ECN marking) in the Classic queue using Eqn. (1). | of drop (or ECN marking) in the Classic queue using equation (1). | |||
| The constant of proportionality, k, in Eqn (1) determines the | The constant of proportionality, k, in equation (1) determines the | |||
| relative flow rates of Classic and L4S flows when the AQM concerned | relative flow rates of Classic and L4S flows when the AQM concerned | |||
| is the bottleneck (all other factors being equal). The L4S ECN | is the bottleneck (all other factors being equal). The L4S ECN | |||
| protocol [I-D.ietf-tsvwg-ecn-l4s-id] says, "The constant of | protocol [RFC9331] says, "The constant of proportionality (k) does | |||
| proportionality (k) does not have to be standardised for | not have to be standardised for interoperability, but a value of 2 is | |||
| interoperability, but a value of 2 is RECOMMENDED." | RECOMMENDED." | |||
| Assuming Scalable congestion controls for the Internet will be as | Assuming Scalable congestion controls for the Internet will be as | |||
| aggressive as DCTCP, this will ensure their congestion window will be | aggressive as DCTCP, this will ensure their congestion window will be | |||
| roughly the same as that of a standards track TCP Reno congestion | roughly the same as that of a Standards Track TCP Reno congestion | |||
| control (Reno) [RFC5681] and other Reno-friendly controls, such as | control (Reno) [RFC5681] and other Reno-friendly controls, such as | |||
| TCP Cubic in its Reno-compatibility mode. | TCP CUBIC in its Reno-friendly mode. | |||
| The choice of k is a matter of operator policy, and operators MAY | The choice of k is a matter of operator policy, and operators MAY | |||
| choose a different value using the guidelines in Appendix C.2. | choose a different value using the guidelines in Appendix C.2. | |||
| If multiple customers or users share capacity at a bottleneck | If multiple customers or users share capacity at a bottleneck (e.g., | |||
| (e.g. in the Internet access link of a campus network), the | in the Internet access link of a campus network), the operator's | |||
| operator's choice of k will determine capacity sharing between the | choice of k will determine capacity sharing between the flows of | |||
| flows of different customers. However, on the public Internet, | different customers. However, on the public Internet, access network | |||
| access network operators typically isolate customers from each other | operators typically isolate customers from each other with some form | |||
| with some form of layer-2 multiplexing (OFDM(A) in DOCSIS3.1, CDMA in | of Layer 2 multiplexing (OFDM(A) in DOCSIS 3.1, CDMA in 3G, and SC- | |||
| 3G, SC-FDMA in LTE) or L3 scheduling (WRR in DSL), rather than | FDMA in LTE) or Layer 3 scheduling (Weighted Round Robin (WRR) for | |||
| relying on host congestion controls to share capacity between | DSL) rather than relying on host congestion controls to share | |||
| customers [RFC0970]. In such cases, the choice of k will solely | capacity between customers [RFC0970]. In such cases, the choice of k | |||
| affect relative flow rates within each customer's access capacity, | will solely affect relative flow rates within each customer's access | |||
| not between customers. Also, k will not affect relative flow rates | capacity, not between customers. Also, k will not affect relative | |||
| at any times when all flows are Classic or all flows are L4S, and it | flow rates at any times when all flows are Classic or all flows are | |||
| will not affect the relative throughput of small flows. | L4S, and it will not affect the relative throughput of small flows. | |||
| 2.5.1.1. Requirements in Unexpected Cases | 2.5.1.1. Requirements in Unexpected Cases | |||
| The flexibility to allow operator-specific classifiers (Section 2.3) | The flexibility to allow operator-specific classifiers (Section 2.3) | |||
| leads to the need to specify what the AQM in each queue ought to do | leads to the need to specify what the AQM in each queue ought to do | |||
| with packets that do not carry the ECN field expected for that queue. | with packets that do not carry the ECN field expected for that queue. | |||
| It is expected that the AQM in each queue will inspect the ECN field | It is expected that the AQM in each queue will inspect the ECN field | |||
| to determine what sort of congestion notification to signal, then it | to determine what sort of congestion notification to signal, then it | |||
| will decide whether to apply congestion notification to this | will decide whether to apply congestion notification to this | |||
| particular packet, as follows: | particular packet, as follows: | |||
| * If a packet that does not carry an ECT(1) or CE codepoint is | * If a packet that does not carry an ECT(1) or a CE codepoint is | |||
| classified into the L queue: | classified into the L queue, then: | |||
| - if the packet is ECT(0), the L AQM SHOULD apply CE-marking | - if the packet is ECT(0), the L AQM SHOULD apply CE marking | |||
| using a probability appropriate to Classic congestion control | using a probability appropriate to Classic congestion control | |||
| and appropriate to the target delay in the L queue | and appropriate to the target delay in the L queue | |||
| - if the packet is Not-ECT, the appropriate action depends on | - if the packet is Not-ECT, the appropriate action depends on | |||
| whether some other function is protecting the L queue from | whether some other function is protecting the L queue from | |||
| misbehaving flows (e.g. per-flow queue protection | misbehaving flows (e.g., per-flow queue protection | |||
| [I-D.briscoe-docsis-q-protection] or latency policing): | [DOCSIS-Q-PROT] or latency policing): | |||
| o If separate queue protection is provided, the L AQM SHOULD | o if separate queue protection is provided, the L AQM SHOULD | |||
| ignore the packet and forward it unchanged, meaning it | ignore the packet and forward it unchanged, meaning it | |||
| should not calculate whether to apply congestion | should not calculate whether to apply congestion | |||
| notification and it should neither drop nor CE-mark the | notification, and it should neither drop nor CE mark the | |||
| packet (for instance, the operator might classify EF traffic | packet (for instance, the operator might classify EF traffic | |||
| that is unresponsive to drop into the L queue, alongside | that is unresponsive to drop into the L queue, alongside | |||
| responsive L4S-ECN traffic) | responsive L4S-ECN traffic) | |||
| o if separate queue protection is not provided, the L AQM | o if separate queue protection is not provided, the L AQM | |||
| SHOULD apply drop using a drop probability appropriate to | SHOULD apply drop using a drop probability appropriate to | |||
| Classic congestion control and appropriate to the target | Classic congestion control and to the target delay in the L | |||
| delay in the L queue | queue | |||
| * If a packet that carries an ECT(1) codepoint is classified into | * If a packet that carries an ECT(1) codepoint is classified into | |||
| the C queue: | the C queue: | |||
| - the C AQM SHOULD apply CE-marking using the coupled AQM | - the C AQM SHOULD apply CE marking using the Coupled AQM | |||
| probability p_CL (= k*p'). | probability p_CL (= k*p'). | |||
| The above requirements are worded as "SHOULDs", because operator- | The above requirements are worded as "SHOULD"s, because operator- | |||
| specific classifiers are for flexibility, by definition. Therefore, | specific classifiers are for flexibility, by definition. Therefore, | |||
| alternative actions might be appropriate in the operator's specific | alternative actions might be appropriate in the operator's specific | |||
| circumstances. An example would be where the operator knows that | circumstances. An example would be where the operator knows that | |||
| certain legacy traffic marked with one codepoint actually has a | certain legacy traffic set to one codepoint actually has a congestion | |||
| congestion response associated with another codepoint. | response associated with another codepoint. | |||
| If the DualQ Coupled AQM has detected overload, it MUST introduce | If the DualQ Coupled AQM has detected overload, it MUST introduce | |||
| Classic drop to both types of ECN-capable traffic until the overload | Classic drop to both types of ECN-capable traffic until the overload | |||
| episode has subsided. Introducing drop if ECN marking is | episode has subsided. Introducing drop if ECN marking is | |||
| persistently high is recommended by Section 7 of the ECN | persistently high is recommended in Section 7 of the ECN spec | |||
| specification [RFC3168] and Section 4.2.1 of the AQM | [RFC3168] and in Section 4.2.1 of the AQM Recommendations [RFC7567]. | |||
| Recommendations [RFC7567]. | ||||
| 2.5.2. Management Requirements | 2.5.2. Management Requirements | |||
| 2.5.2.1. Configuration | 2.5.2.1. Configuration | |||
| By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | By default, a DualQ Coupled AQM SHOULD NOT need any configuration for | |||
| use at a bottleneck on the public Internet [RFC7567]. The following | use at a bottleneck on the public Internet [RFC7567]. The following | |||
| parameters MAY be operator-configurable, e.g. to tune for non- | parameters MAY be operator-configurable, e.g., to tune for non- | |||
| Internet settings: | Internet settings: | |||
| * Optional packet classifier(s) to use in addition to the ECN field | * Optional packet classifier(s) to use in addition to the ECN field | |||
| (see Section 2.3); | (see Section 2.3). | |||
| * Expected typical RTT, which can be used to determine the queuing | * Expected typical RTT, which can be used to determine the queuing | |||
| delay of the Classic AQM at its operating point, in order to | delay of the Classic AQM at its operating point, in order to | |||
| prevent typical lone flows from under-utilizing capacity. For | prevent typical lone flows from underutilizing capacity. For | |||
| example: | example: | |||
| - for the PI2 algorithm (Appendix A) the queuing delay target is | - for the PI2 algorithm (Appendix A), the queuing delay target is | |||
| dependent on the typical RTT; | dependent on the typical RTT. | |||
| - for the Curvy RED algorithm (Appendix B) the queuing delay at | - for the Curvy RED algorithm (Appendix B), the queuing delay at | |||
| the desired operating point of the curvy ramp is configured to | the desired operating point of the curvy ramp is configured to | |||
| encompass a typical RTT; | encompass a typical RTT. | |||
| - if another Classic AQM was used, it would be likely to need an | - if another Classic AQM was used, it would be likely to need an | |||
| operating point for the queue based on the typical RTT, and if | operating point for the queue based on the typical RTT, and if | |||
| so it SHOULD be expressed in units of time. | so, it SHOULD be expressed in units of time. | |||
| An operating point that is manually calculated might be directly | An operating point that is manually calculated might be directly | |||
| configurable instead, e.g. for links with large numbers of flows | configurable instead, e.g., for links with large numbers of flows | |||
| where under-utilization by a single flow would be unlikely. | where underutilization by a single flow would be unlikely. | |||
| * Expected maximum RTT, which can be used to set the stability | * Expected maximum RTT, which can be used to set the stability | |||
| parameter(s) of the Classic AQM. For example: | parameter(s) of the Classic AQM. For example: | |||
| - for the PI2 algorithm (Appendix A), the gain parameters of the | - for the PI2 algorithm (Appendix A), the gain parameters of the | |||
| PI algorithm depend on the maximum RTT. | PI algorithm depend on the maximum RTT. | |||
| - for the Curvy RED algorithm (Appendix B) the smoothing | - for the Curvy RED algorithm (Appendix B), the smoothing | |||
| parameter is chosen to filter out transients in the queue | parameter is chosen to filter out transients in the queue | |||
| within a maximum RTT. | within a maximum RTT. | |||
| Stability parameter(s) that are manually calculated assuming a | Any stability parameter that is manually calculated assuming a | |||
| maximum RTT might be directly configurable instead. | maximum RTT might be directly configurable instead. | |||
| * Coupling factor, k (see Appendix C.2); | * Coupling factor, k (see Appendix C.2). | |||
| * A limit to the conditional priority of L4S. This is scheduler- | * A limit to the conditional priority of L4S. This is scheduler- | |||
| dependent, but it SHOULD be expressed as a relation between the | dependent, but it SHOULD be expressed as a relation between the | |||
| max delay of a C packet and an L packet. For example: | max delay of a C packet and an L packet. For example: | |||
| - for a WRR scheduler a weight ratio between L and C of w:1 means | - for a WRR scheduler, a weight ratio between L and C of w:1 | |||
| that the maximum delay to a C packet is w times that of an L | means that the maximum delay of a C packet is w times that of | |||
| packet. | an L packet. | |||
| - for a time-shifted FIFO (TS-FIFO) scheduler (see Section 4.2.2) | - for a time-shifted FIFO (TS-FIFO) scheduler (see | |||
| a time-shift of tshift means that the maximum delay to a C | Section 4.2.2), a time-shift of tshift means that the maximum | |||
| packet is tshift greater than that of an L packet. tshift could | delay to a C packet is tshift greater than that of an L packet. | |||
| be expressed as a multiple of the typical RTT rather than as an | tshift could be expressed as a multiple of the typical RTT | |||
| absolute delay. | rather than as an absolute delay. | |||
| * The maximum Classic ECN marking probability, p_Cmax, before | * The maximum Classic ECN-marking probability, p_Cmax, before | |||
| introducing drop. | introducing drop. | |||
| 2.5.2.2. Monitoring | 2.5.2.2. Monitoring | |||
| An experimental DualQ Coupled AQM SHOULD allow the operator to | An experimental DualQ Coupled AQM SHOULD allow the operator to | |||
| monitor each of the following operational statistics on demand, per | monitor each of the following operational statistics on demand, per | |||
| queue and per configurable sample interval, for performance | queue and per configurable sample interval, for performance | |||
| monitoring and perhaps also for accounting in some cases: | monitoring and perhaps also for accounting in some cases: | |||
| * Bits forwarded, from which utilization can be calculated; | * bits forwarded, from which utilization can be calculated; | |||
| * Total packets in the three categories: arrived, presented to the | * total packets in the three categories: arrived, presented to the | |||
| AQM, and forwarded. The difference between the first two will | AQM, and forwarded. The difference between the first two will | |||
| measure any non-AQM tail discard. The difference between the last | measure any non-AQM tail discard. The difference between the last | |||
| two will measure proactive AQM discard; | two will measure proactive AQM discard; | |||
| * ECN packets marked, non-ECN packets dropped, ECN packets dropped, | * ECN packets marked, non-ECN packets dropped, and ECN packets | |||
| which can be combined with the three total packet counts above to | dropped, which can be combined with the three total packet counts | |||
| calculate marking and dropping probabilities; | above to calculate marking and dropping probabilities; and | |||
| * Queue delay (not including serialization delay of the head packet | * queue delay (not including serialization delay of the head packet | |||
| or medium acquisition delay) - see further notes below. | or medium acquisition delay) -- see further notes below. | |||
| Unlike the other statistics, queue delay cannot be captured in a | Unlike the other statistics, queue delay cannot be captured in a | |||
| simple accumulating counter. Therefore, the type of queue delay | simple accumulating counter. Therefore, the type of queue delay | |||
| statistics produced (mean, percentiles, etc.) will depend on | statistics produced (mean, percentiles, etc.) will depend on | |||
| implementation constraints. To facilitate comparative evaluation | implementation constraints. To facilitate comparative evaluation | |||
| of different implementations and approaches, an implementation | of different implementations and approaches, an implementation | |||
| SHOULD allow mean and 99th percentile queue delay to be derived | SHOULD allow mean and 99th percentile queue delay to be derived | |||
| (per queue per sample interval). A relatively simple way to do | (per queue per sample interval). A relatively simple way to do | |||
| this would be to store a coarse-grained histogram of queue delay. | this would be to store a coarse-grained histogram of queue delay. | |||
| This could be done with a small number of bins with configurable | This could be done with a small number of bins with configurable | |||
| skipping to change at page 22, line 10 ¶ | skipping to change at line 991 ¶ | |||
| a sample interval, each bin would accumulate a count of the number | a sample interval, each bin would accumulate a count of the number | |||
| of packets that had fallen within each range. The maximum queue | of packets that had fallen within each range. The maximum queue | |||
| delay per queue per interval MAY also be recorded, to aid | delay per queue per interval MAY also be recorded, to aid | |||
| diagnosis of faults and anomalous events. | diagnosis of faults and anomalous events. | |||
| 2.5.2.3. Anomaly Detection | 2.5.2.3. Anomaly Detection | |||
| An experimental DualQ Coupled AQM SHOULD asynchronously report the | An experimental DualQ Coupled AQM SHOULD asynchronously report the | |||
| following data about anomalous conditions: | following data about anomalous conditions: | |||
| * Start-time and duration of overload state. | * Start time and duration of overload state. | |||
| A hysteresis mechanism SHOULD be used to prevent flapping in and | A hysteresis mechanism SHOULD be used to prevent flapping in and | |||
| out of overload causing an event storm. For instance, exit from | out of overload causing an event storm. For instance, exiting | |||
| overload state could trigger one report, but also latch a timer. | from overload state could trigger one report but also latch a | |||
| Then, during that time, if the AQM enters and exits overload state | timer. Then, during that time, if the AQM enters and exits | |||
| any number of times, the duration in overload state is | overload state any number of times, the duration in overload state | |||
| accumulated, but no new report is generated until the first time | is accumulated, but no new report is generated until the first | |||
| the AQM is out of overload once the timer has expired. | time the AQM is out of overload once the timer has expired. | |||
| 2.5.2.4. Deployment, Coexistence and Scaling | 2.5.2.4. Deployment, Coexistence, and Scaling | |||
| [RFC5706] suggests that deployment, coexistence and scaling should | [RFC5706] suggests that deployment, coexistence, and scaling should | |||
| also be covered as management requirements. The raison d'etre of the | also be covered as management requirements. The raison d'etre of the | |||
| DualQ Coupled AQM is to enable deployment and coexistence of Scalable | DualQ Coupled AQM is to enable deployment and coexistence of Scalable | |||
| congestion controls - as incremental replacements for today's Reno- | congestion controls (as incremental replacements for today's Reno- | |||
| friendly controls that do not scale with bandwidth-delay product. | friendly controls that do not scale with bandwidth-delay product). | |||
| Therefore, there is no need to repeat these motivating issues here | Therefore, there is no need to repeat these motivating issues here | |||
| given they are already explained in the Introduction and detailed in | given they are already explained in the Introduction and detailed in | |||
| the L4S architecture [I-D.ietf-tsvwg-l4s-arch]. | the L4S architecture [RFC9330]. | |||
| The descriptions of specific DualQ Coupled AQM algorithms in the | The descriptions of specific DualQ Coupled AQM algorithms in the | |||
| appendices cover scaling of their configuration parameters, e.g. with | appendices cover scaling of their configuration parameters, e.g., | |||
| respect to RTT and sampling frequency. | with respect to RTT and sampling frequency. | |||
| 3. IANA Considerations (to be removed by RFC Editor) | 3. IANA Considerations | |||
| This specification contains no IANA considerations. | This document has no IANA actions. | |||
| 4. Security Considerations | 4. Security Considerations | |||
| 4.1. Low Delay without Requiring Per-Flow Processing | 4.1. Low Delay without Requiring Per-flow Processing | |||
| The L4S architecture [I-D.ietf-tsvwg-l4s-arch] compares the DualQ and | The L4S architecture [RFC9330] compares the DualQ and FQ approaches | |||
| per-flow-queuing (FQ) approaches to L4S. The privacy considerations | to L4S. The privacy considerations section in that document | |||
| section in that document motivates the DualQ on the grounds that | motivates the DualQ on the grounds that users who want to encrypt | |||
| users who want to encrypt application flow identifiers, e.g. in IPSec | application flow identifiers, e.g., in IPsec or other encrypted VPN | |||
| or other encrypted VPN tunnels, don't have to sacrifice low delay | tunnels, don't have to sacrifice low delay ([RFC8404] encourages | |||
| ([RFC8404] encourages avoidance of such privacy compromises). | avoidance of such privacy compromises). | |||
| The security considerations section of the L4S architecture also | The security considerations section of the L4S architecture [RFC9330] | |||
| includes subsections on policing of relative flow-rates (section 8.1) | also includes subsections on policing of relative flow rates | |||
| and on policing of flows that cause excessive queuing delay (section | (Section 8.1) and on policing of flows that cause excessive queuing | |||
| 8.2). It explains that the interests of users do not collide in the | delay (Section 8.2). It explains that the interests of users do not | |||
| same way for delay as they do for bandwidth. For someone to get more | collide in the same way for delay as they do for bandwidth. For | |||
| of the bandwidth of a shared link, someone else necessarily gets less | someone to get more of the bandwidth of a shared link, someone else | |||
| (a 'zero-sum game'), whereas queuing delay can be reduced for | necessarily gets less (a 'zero-sum game'), whereas queuing delay can | |||
| everyone, without any need for someone else to lose out. It also | be reduced for everyone, without any need for someone else to lose | |||
| explains that, on the current Internet, scheduling usually enforces | out. It also explains that, on the current Internet, scheduling | |||
| separation of bandwidth between 'sites' (e.g. households, businesses | usually enforces separation of bandwidth between 'sites' (e.g., | |||
| or mobile users), but it is not common to need to schedule or police | households, businesses, or mobile users), but it is not common to | |||
| the bandwidth used by individual application flows. | need to schedule or police the bandwidth used by individual | |||
| application flows. | ||||
| By the above arguments, per-flow rate policing might not be necessary | By the above arguments, per-flow rate policing might not be | |||
| and in trusted environments (e.g. private data centres) it is | necessary, and in trusted environments (e.g., private data centres), | |||
| certainly unlikely to be needed. Therefore, because it is hard to | it is certainly unlikely to be needed. Therefore, because it is hard | |||
| avoid complexity and unintended side effects with per-flow rate | to avoid complexity and unintended side effects with per-flow rate | |||
| policing, it needs to be separable from a basic AQM, as an option, | policing, it needs to be separable from a basic AQM, as an option, | |||
| under policy control. On this basis, the DualQ Coupled AQM provides | under policy control. On this basis, the DualQ Coupled AQM provides | |||
| low delay without prejudging the question of per-flow rate policing. | low delay without prejudging the question of per-flow rate policing. | |||
| Nonetheless, the interests of users or flows might conflict, e.g. in | Nonetheless, the interests of users or flows might conflict, e.g., in | |||
| case of accident or malice. Then per-flow rate control could be | case of accident or malice. Then per-flow rate control could be | |||
| necessary. If flow-rate control is needed, it can be provided as a | necessary. If per-flow rate control is needed, it can be provided as | |||
| modular addition to a DualQ. And similarly, if protection against | a modular addition to a DualQ. And similarly, if protection against | |||
| excessive queue delay is needed, a per-flow queue protection option | excessive queue delay is needed, a per-flow queue protection option | |||
| can be added to a DualQ (e.g. [I-D.briscoe-docsis-q-protection]). | can be added to a DualQ (e.g., [DOCSIS-Q-PROT]). | |||
| 4.2. Handling Unresponsive Flows and Overload | 4.2. Handling Unresponsive Flows and Overload | |||
| In the absence of any per-flow control, it is important that the | In the absence of any per-flow control, it is important that the | |||
| basic DualQ Coupled AQM gives unresponsive flows no more throughput | basic DualQ Coupled AQM gives unresponsive flows no more throughput | |||
| advantage than a single-queue AQM would, and that it at least handles | advantage than a single-queue AQM would, and that it at least handles | |||
| overload situations. Overload means that incoming load significantly | overload situations. Overload means that incoming load significantly | |||
| or persistently exceeds output capacity, but it is not intended to be | or persistently exceeds output capacity, but it is not intended to be | |||
| a precise term -- significant and persistent are matters of degree. | a precise term -- significant and persistent are matters of degree. | |||
| A trade-off needs to be made between complexity and the risk of | A trade-off needs to be made between complexity and the risk of | |||
| either traffic class harming the other. In overloaded conditions the | either traffic class harming the other. In overloaded conditions, | |||
| higher priority L4S service will have to sacrifice some aspect of its | the higher priority L4S service will have to sacrifice some aspect of | |||
| performance. Depending on the degree of overload, alternative | its performance. Depending on the degree of overload, alternative | |||
| solutions may relax a different factor: e.g. throughput, delay, drop. | solutions may relax a different factor: for example, throughput, | |||
| These choices need to be made either by the developer or by operator | delay, or drop. These choices need to be made either by the | |||
| policy, rather than by the IETF. Subsequent subsections discuss | developer or by operator policy, rather than by the IETF. Subsequent | |||
| aspects relating to handling of different degrees of overload: | subsections discuss handling different degrees of overload: | |||
| * Unresponsive flows (L and/or C) but not overloaded, i.e. the sum | * Unresponsive flows (L and/or C) but not overloaded, i.e., the sum | |||
| of unresponsive load before adding any responsive traffic is below | of unresponsive load before adding any responsive traffic is below | |||
| capacity; | capacity. | |||
| This case is handled by the regular Coupled DualQ (Section 2.1) | This case is handled by the regular Coupled DualQ (Section 2.1) | |||
| but not discussed there. So below, Section 4.2.1 explains the | but not discussed there. So below, Section 4.2.1 explains the | |||
| design goal, and how it is achieved in practice; | design goal and how it is achieved in practice. | |||
| * Unresponsive flows (L and/or C) causing persistent overload, | * Unresponsive flows (L and/or C) causing persistent overload, i.e., | |||
| i.e. the sum of unresponsive load even before adding any | the sum of unresponsive load even before adding any responsive | |||
| responsive traffic persistently exceeds capacity; | traffic persistently exceeds capacity. | |||
| This case is not covered by the regular Coupled DualQ mechanism | This case is not covered by the regular Coupled DualQ mechanism | |||
| (Section 2.1) but the last para in Section 2.5.1.1 sets out a | (Section 2.1), but the last paragraph in Section 2.5.1.1 sets | |||
| requirement to handle the case where ECN-capable traffic could | out a requirement to handle the case where ECN-capable traffic | |||
| starve non-ECN-capable traffic. Section 4.2.3 below discusses | could starve non-ECN-capable traffic. Section 4.2.3 below | |||
| the general options and gives specific examples. | discusses the general options and gives specific examples. | |||
| * Short-term overload that lies between the 'not overloaded' and | * Short-term overload that lies between the 'not overloaded' and | |||
| 'persistently overloaded' cases. | 'persistently overloaded' cases. | |||
| For the period before overload is deemed persistent, | For the period before overload is deemed persistent, | |||
| Section 4.2.2 discusses options for more immediate mechanisms | Section 4.2.2 discusses options for more immediate mechanisms | |||
| at the scheduler timescale. These prevent short-term | at the scheduler timescale. These prevent short-term | |||
| starvation of the C queue by making the priority of the L queue | starvation of the C queue by making the priority of the L queue | |||
| conditional, as required in Section 2.5.1. | conditional, as required in Section 2.5.1. | |||
| 4.2.1. Unresponsive Traffic without Overload | 4.2.1. Unresponsive Traffic without Overload | |||
| When one or more L flows and/or C flows are unresponsive, but their | When one or more L flows and/or C flows are unresponsive, but their | |||
| total load is within the link capacity so that they do not saturate | total load is within the link capacity so that they do not saturate | |||
| the coupled marking (below 100%), the goal of a DualQ AQM is to | the coupled marking (below 100%), the goal of a DualQ AQM is to | |||
| behave no worse than a single-queue AQM. | behave no worse than a single-queue AQM. | |||
| Tests have shown that this is indeed the case with no additional | Tests have shown that this is indeed the case with no additional | |||
| mechanism beyond the regular Coupled DualQ of Section 2.1 (see the | mechanism beyond the regular Coupled DualQ of Section 2.1 (see the | |||
| results of 'overload experiments' in [DCttH19]). Perhaps counter- | results of 'overload experiments' in [L4Seval22]). Perhaps | |||
| intuitively, whether the unresponsive flow classifies itself into the | counterintuitively, whether the unresponsive flow classifies itself | |||
| L or the C queue, the DualQ system behaves as if it has subtracted | into the L or the C queue, the DualQ system behaves as if it has | |||
| from the overall link capacity. Then, the coupling shares out the | subtracted from the overall link capacity. Then, the coupling shares | |||
| remaining capacity between any competing responsive flows (in either | out the remaining capacity between any competing responsive flows (in | |||
| queue). See also Section 4.2.2, which discusses scheduler-specific | either queue). See also Section 4.2.2, which discusses scheduler- | |||
| details. | specific details. | |||
| 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput | 4.2.2. Avoiding Short-Term Classic Starvation: Sacrifice L4S Throughput | |||
| or Delay? | or Delay? | |||
| Priority of L4S is required to be conditional (see Section 2.4 & | Priority of L4S is required to be conditional (see Sections 2.4 and | |||
| Section 2.5.1) to avoid short-term starvation of Classic. Otherwise, | 2.5.1) to avoid short-term starvation of Classic. Otherwise, as | |||
| as explained in Section 2.4, even a lone responsive L4S flow could | explained in Section 2.4, even a lone responsive L4S flow could | |||
| temporarily block a small finite set of C packets (e.g. an initial | temporarily block a small finite set of C packets (e.g., an initial | |||
| window or DNS request). The blockage would only be brief, but it | window or DNS request). The blockage would only be brief, but it | |||
| could be longer for certain AQM implementations that can only | could be longer for certain AQM implementations that can only | |||
| increase the congestion signal coupled from the C queue when C | increase the congestion signal coupled from the C queue when C | |||
| packets are actually being dequeued. There is then the question of | packets are actually being dequeued. There is then the question of | |||
| whether to sacrifice L4S throughput or L4S delay (or some other | whether to sacrifice L4S throughput or L4S delay (or some other | |||
| policy) to make the priority conditional: | policy) to make the priority conditional: | |||
| Sacrifice L4S throughput: By using weighted round-robin as the | Sacrifice L4S throughput: | |||
| conditional priority scheduler, the L4S service can sacrifice some | By using WRR as the conditional priority scheduler, the L4S | |||
| throughput during overload. This can either be thought of as | service can sacrifice some throughput during overload. This can | |||
| guaranteeing a minimum throughput service for Classic traffic, or | be thought of as guaranteeing either a minimum throughput service | |||
| as guaranteeing a maximum delay for a packet at the head of the | for Classic traffic or a maximum delay for a packet at the head of | |||
| Classic queue. | the Classic queue. | |||
| Cautionary note: a WRR scheduler can only guarantee Classic | | Cautionary note: a WRR scheduler can only guarantee Classic | |||
| throughput if Classic sources are sending enough to use it -- | | throughput if Classic sources are sending enough to use it | |||
| congestion signals can undermine scheduling because they determine | | -- congestion signals can undermine scheduling because they | |||
| how much responsive traffic of each class arrives for scheduling | | determine how much responsive traffic of each class arrives | |||
| in the first place. This is why scheduling is only relied on to | | for scheduling in the first place. This is why scheduling | |||
| handle short-term starvation; until congestion signals build up | | is only relied on to handle short-term starvation, until | |||
| and the sources react. Even during long-term overload (discussed | | congestion signals build up and the sources react. Even | |||
| more fully in Section 4.2.3), it's pragmatic to discard packets | | during long-term overload (discussed more fully in | |||
| from both queues, which again thins the traffic before it reaches | | Section 4.2.3), it's pragmatic to discard packets from both | |||
| the scheduler. This is because a scheduler cannot be relied on to | | queues, which again thins the traffic before it reaches the | |||
| handle long-term overload since the right scheduler weight cannot | | scheduler. This is because a scheduler cannot be relied on | |||
| be known for every scenario. | | to handle long-term overload since the right scheduler | |||
| | weight cannot be known for every scenario. | ||||
| The scheduling weight of the Classic queue should be small | The scheduling weight of the Classic queue should be small (e.g., | |||
| (e.g. 1/16). In most traffic scenarios the scheduler will not | 1/16). In most traffic scenarios, the scheduler will not | |||
| interfere and it will not need to, because the coupling mechanism | interfere and it will not need to, because the coupling mechanism | |||
| and the end-systems will determine the share of capacity across | and the end systems will determine the share of capacity across | |||
| both queues as if it were a single pool. However, if L4S traffic | both queues as if it were a single pool. However, if L4S traffic | |||
| is over-aggressive or unresponsive, the scheduler weight for | is over-aggressive or unresponsive, the scheduler weight for | |||
| Classic traffic will at least be large enough to ensure it does | Classic traffic will at least be large enough to ensure it does | |||
| not starve in the short-term. | not starve in the short term. | |||
| Although WRR scheduling is only expected to address short-term | Although WRR scheduling is only expected to address short-term | |||
| overload, there are (somewhat rare) cases when WRR has an effect | overload, there are (somewhat rare) cases when WRR has an effect | |||
| on capacity shares over longer time-scales. But its effect is | on capacity shares over longer timescales. But its effect is | |||
| minor, and it certainly does no harm. Specifically, in cases | minor, and it certainly does no harm. Specifically, in cases | |||
| where the ratio of L4S to Classic flows (e.g. 19:1) is greater | where the ratio of L4S to Classic flows (e.g., 19:1) is greater | |||
| than the ratio of their scheduler weights (e.g. 15:1), the L4S | than the ratio of their scheduler weights (e.g., 15:1), the L4S | |||
| flows will get less than an equal share of the capacity, but only | flows will get less than an equal share of the capacity, but only | |||
| slightly. For instance, with the example numbers given, each L4S | slightly. For instance, with the example numbers given, each L4S | |||
| flow will get (15/16)/19 = 4.9% when ideally each would get | flow will get (15/16)/19 = 4.9% when ideally each would get 1/20 = | |||
| 1/20=5%. In the rather specific case of an unresponsive flow | 5%. In the rather specific case of an unresponsive flow taking up | |||
| taking up just less than the capacity set aside for L4S | just less than the capacity set aside for L4S (e.g., 14/16 in the | |||
| (e.g. 14/16 in the above example), using WRR could significantly | above example), using WRR could significantly reduce the capacity | |||
| reduce the capacity left for any responsive L4S flows. | left for any responsive L4S flows. | |||
| The scheduling weight of the Classic queue should not be too | The scheduling weight of the Classic queue should not be too | |||
| small, otherwise a C packet at the head of the queue could be | small, otherwise a C packet at the head of the queue could be | |||
| excessively delayed by a continually busy L queue. For instance | excessively delayed by a continually busy L queue. For instance, | |||
| if the Classic weight is 1/16, the maximum that a Classic packet | if the Classic weight is 1/16, the maximum that a Classic packet | |||
| at the head of the queue can be delayed by L traffic is the | at the head of the queue can be delayed by L traffic is the | |||
| serialization delay of 15 MTU-sized packets. | serialization delay of 15 MTU-sized packets. | |||
| Sacrifice L4S Delay: The operator could choose to control overload | Sacrifice L4S delay: | |||
| of the Classic queue by allowing some delay to 'leak' across to | The operator could choose to control overload of the Classic queue | |||
| the L4S queue. The scheduler can be made to behave like a single | by allowing some delay to 'leak' across to the L4S queue. The | |||
| First-In First-Out (FIFO) queue with different service times by | scheduler can be made to behave like a single FIFO queue with | |||
| implementing a very simple conditional priority scheduler that | different service times by implementing a very simple conditional | |||
| could be called a "time-shifted FIFO" (see the Modifier Earliest | priority scheduler that could be called a "time-shifted FIFO" (TS- | |||
| Deadline First (MEDF) scheduler [MEDF]). This scheduler adds | FIFO) (see the Modifier Earliest Deadline First (MEDF) scheduler | |||
| tshift to the queue delay of the next L4S packet, before comparing | [MEDF]). This scheduler adds tshift to the queue delay of the | |||
| it with the queue delay of the next Classic packet, then it | next L4S packet, before comparing it with the queue delay of the | |||
| selects the packet with the greater adjusted queue delay. | next Classic packet, then it selects the packet with the greater | |||
| adjusted queue delay. | ||||
| Under regular conditions, this time-shifted FIFO scheduler behaves | Under regular conditions, the TS-FIFO scheduler behaves just like | |||
| just like a strict priority scheduler. But under moderate or high | a strict priority scheduler. But under moderate or high overload, | |||
| overload it prevents starvation of the Classic queue, because the | it prevents starvation of the Classic queue, because the time- | |||
| time-shift (tshift) defines the maximum extra queuing delay of | shift (tshift) defines the maximum extra queuing delay of Classic | |||
| Classic packets relative to L4S. This would control milder | packets relative to L4S. This would control milder overload of | |||
| overload of responsive traffic by introducing delay to defer | responsive traffic by introducing delay to defer invoking the | |||
| invoking the overload mechanisms in Section 4.2.3, particularly | overload mechanisms in Section 4.2.3, particularly when close to | |||
| when close to the maximum congestion signal. | the maximum congestion signal. | |||
| The example implementations in Appendix A and Appendix B could both | The example implementations in Appendices A and B could both be | |||
| be implemented with either policy. | implemented with either policy. | |||
| 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? | 4.2.3. L4S ECN Saturation: Introduce Drop or Delay? | |||
| This section concerns persistent overload caused by unresponsive L | This section concerns persistent overload caused by unresponsive L | |||
| and/or C flows. To keep the throughput of both L4S and Classic flows | and/or C flows. To keep the throughput of both L4S and Classic flows | |||
| roughly equal over the full load range, a different control strategy | roughly equal over the full load range, a different control strategy | |||
| needs to be defined above the point where the L4S AQM persistently | needs to be defined above the point where the L4S AQM persistently | |||
| saturates to an ECN marking probability of 100% leaving no room to | saturates to an ECN marking probability of 100%, leaving no room to | |||
| push back the load any harder. L4S ECN marking will saturate first | push back the load any harder. L4S ECN marking will saturate first | |||
| (assuming the coupling factor k>1), even though saturation could be | (assuming the coupling factor k>1), even though saturation could be | |||
| caused by the sum of unresponsive traffic in either or both queues | caused by the sum of unresponsive traffic in either or both queues | |||
| exceeding the link capacity. | exceeding the link capacity. | |||
| The term 'unresponsive' includes cases where a flow becomes | The term 'unresponsive' includes cases where a flow becomes | |||
| temporarily unresponsive, for instance, a real-time flow that takes a | temporarily unresponsive, for instance, a real-time flow that takes a | |||
| while to adapt its rate in response to congestion, or a standard Reno | while to adapt its rate in response to congestion, or a standard Reno | |||
| flow that is normally responsive, but above a certain congestion | flow that is normally responsive, but above a certain congestion | |||
| level it will not be able to reduce its congestion window below the | level it will not be able to reduce its congestion window below the | |||
| allowed minimum of 2 segments [RFC5681], effectively becoming | allowed minimum of 2 segments [RFC5681], effectively becoming | |||
| unresponsive. (Note that L4S traffic ought to remain responsive | unresponsive. (Note that L4S traffic ought to remain responsive | |||
| below a window of 2 segments (see the L4S | below a window of 2 segments. See the L4S requirements [RFC9331].) | |||
| requirements [I-D.ietf-tsvwg-ecn-l4s-id]). | ||||
| Saturation raises the question of whether to relieve congestion by | Saturation raises the question of whether to relieve congestion by | |||
| introducing some drop into the L4S queue or by allowing delay to grow | introducing some drop into the L4S queue or by allowing delay to grow | |||
| in both queues (which could eventually lead to drop due to buffer | in both queues (which could eventually lead to drop due to buffer | |||
| exhaustion anyway): | exhaustion anyway): | |||
| Drop on Saturation: Persistent saturation can be defined by a | Drop on Saturation: | |||
| maximum threshold for coupled L4S ECN marking (assuming k>1) | Persistent saturation can be defined by a maximum threshold for | |||
| before saturation starts to make the flow rates of the different | coupled L4S ECN marking (assuming k>1) before saturation starts to | |||
| traffic types diverge. Above that, the drop probability of | make the flow rates of the different traffic types diverge. Above | |||
| Classic traffic is applied to all packets of all traffic types. | that, the drop probability of Classic traffic is applied to all | |||
| Then experiments have shown that queueing delay can be kept at the | packets of all traffic types. Then experiments have shown that | |||
| target in any overload situation, including with unresponsive | queuing delay can be kept at the target in any overload situation, | |||
| traffic, and no further measures are required (Section 4.2.3.1). | including with unresponsive traffic, and no further measures are | |||
| required (Section 4.2.3.1). | ||||
| Delay on Saturation: When L4S marking saturates, instead of | Delay on Saturation: | |||
| introducing L4S drop, the drop and marking probabilities of both | When L4S marking saturates, instead of introducing L4S drop, the | |||
| queues could be capped. Beyond that, delay will grow either | drop and marking probabilities of both queues could be capped. | |||
| solely in the queue with unresponsive traffic (if WRR is used), or | Beyond that, delay will grow either solely in the queue with | |||
| in both queues (if time-shifted FIFO is used). In either case, | unresponsive traffic (if WRR is used) or in both queues (if TS- | |||
| the higher delay ought to control temporary high congestion. If | FIFO is used). In either case, the higher delay ought to control | |||
| the overload is more persistent, eventually the combined DualQ | temporary high congestion. If the overload is more persistent, | |||
| will overflow and tail drop will control congestion. | eventually the combined DualQ will overflow and tail drop will | |||
| control congestion. | ||||
| The example implementation in Appendix A solely applies the "drop on | The example implementation in Appendix A solely applies the "drop on | |||
| saturation" policy. The DOCSIS specification of a DualQ Coupled | saturation" policy. The DOCSIS specification of a DualQ Coupled AQM | |||
| AQM [DOCSIS3.1] also implements the 'drop on saturation' policy with | [DOCSIS3.1] also implements the 'drop on saturation' policy with a | |||
| a very shallow L buffer. However, the addition of DOCSIS per-flow | very shallow L buffer. However, the addition of DOCSIS per-flow | |||
| Queue Protection [I-D.briscoe-docsis-q-protection] turns this into | Queue Protection [DOCSIS-Q-PROT] turns this into 'delay on | |||
| 'delay on saturation' by redirecting some packets of the flow(s) most | saturation' by redirecting some packets of the flow or flows that are | |||
| responsible for L queue overload into the C queue, which has a higher | most responsible for L queue overload into the C queue, which has a | |||
| delay target. If overload continues, this again becomes 'drop on | higher delay target. If overload continues, this again becomes 'drop | |||
| saturation' as the level of drop in the C queue rises to maintain the | on saturation' as the level of drop in the C queue rises to maintain | |||
| target delay of the C queue. | the target delay of the C queue. | |||
| 4.2.3.1. Protecting against Overload by Unresponsive ECN-Capable | 4.2.3.1. Protecting against Overload by Unresponsive ECN-Capable | |||
| Traffic | Traffic | |||
| Without a specific overload mechanism, unresponsive traffic would | Without a specific overload mechanism, unresponsive traffic would | |||
| have a greater advantage if it were also ECN-capable. The advantage | have a greater advantage if it were also ECN-capable. The advantage | |||
| is undetectable at normal low levels of marking. However, it would | is undetectable at normal low levels of marking. However, it would | |||
| become significant with the higher levels of marking typical during | become significant with the higher levels of marking typical during | |||
| overload, when it could evade a significant degree of drop. This is | overload, when it could evade a significant degree of drop. This is | |||
| an issue whether the ECN-capable traffic is L4S or Classic. | an issue whether the ECN-capable traffic is L4S or Classic. | |||
| This raises the question of whether and when to introduce drop of | This raises the question of whether and when to introduce drop of | |||
| ECN-capable traffic, as required by both Section 7 of the ECN | ECN-capable traffic, as required by both Section 7 of the ECN spec | |||
| spec [RFC3168] and Section 4.2.1 of the AQM | [RFC3168] and Section 4.2.1 of the AQM recommendations [RFC7567]. | |||
| recommendations [RFC7567]. | ||||
| As an example, experiments with the DualPI2 AQM (Appendix A) have | As an example, experiments with the DualPI2 AQM (Appendix A) have | |||
| shown that introducing 'drop on saturation' at 100% coupled L4S | shown that introducing 'drop on saturation' at 100% coupled L4S | |||
| marking addresses this problem with unresponsive ECN as well as | marking addresses this problem with unresponsive ECN, and it also | |||
| addressing the saturation problem. At saturation, DualPI2 switches | addresses the saturation problem. At saturation, DualPI2 switches | |||
| into overload mode, where the base AQM is driven by the max delay of | into overload mode, where the Base AQM is driven by the max delay of | |||
| both queues and it introduces probabilistic drop to both queues | both queues, and it introduces probabilistic drop to both queues | |||
| equally. It leaves only a small range of congestion levels just | equally. It leaves only a small range of congestion levels just | |||
| below saturation where unresponsive traffic gains any advantage from | below saturation where unresponsive traffic gains any advantage from | |||
| using the ECN capability (relative to being unresponsive without | using the ECN capability (relative to being unresponsive without | |||
| ECN), and the advantage is hardly detectable (see [DualQ-Test] and | ECN), and the advantage is hardly detectable (see [DualQ-Test] and | |||
| section IV-E of [DCttH19]. Also overload with an unresponsive ECT(1) | section IV-G of [L4Seval22]). Also, overload with an unresponsive | |||
| flow gets no more bandwidth advantage than with ECT(0). | ECT(1) flow gets no more bandwidth advantage than with ECT(0). | |||
| 5. References | 5. References | |||
| 5.1. Normative References | 5.1. Normative References | |||
| [I-D.ietf-tsvwg-ecn-l4s-id] | ||||
| Schepper, K. D. and B. Briscoe, "Explicit Congestion | ||||
| Notification (ECN) Protocol for Very Low Queuing Delay | ||||
| (L4S)", Work in Progress, Internet-Draft, draft-ietf- | ||||
| tsvwg-ecn-l4s-id-28, 8 August 2022, | ||||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| ietf-tsvwg-ecn-l4s-id/>. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, DOI 10.17487/RFC3168, September 2001, | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
| <https://www.rfc-editor.org/info/rfc3168>. | <https://www.rfc-editor.org/info/rfc3168>. | |||
| [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion | |||
| Notification (ECN) Experimentation", RFC 8311, | Notification (ECN) Experimentation", RFC 8311, | |||
| DOI 10.17487/RFC8311, January 2018, | DOI 10.17487/RFC8311, January 2018, | |||
| <https://www.rfc-editor.org/info/rfc8311>. | <https://www.rfc-editor.org/info/rfc8311>. | |||
| [RFC9331] De Schepper, K. and B. Briscoe, Ed., "The Explicit | ||||
| Congestion Notification (ECN) Protocol for Low Latency, | ||||
| Low Loss, and Scalable Throughput (L4S)", RFC 9331, | ||||
| DOI 10.17487/RFC9331, January 2023, | ||||
| <https://www.rfc-editor.org/info/rfc9331>. | ||||
| 5.2. Informative References | 5.2. Informative References | |||
| [Alizadeh-stability] | [Alizadeh-stability] | |||
| Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis | Alizadeh, M., Javanmard, A., and B. Prabhakar, "Analysis | |||
| of DCTCP: Stability, Convergence, and Fairness", ACM | of DCTCP: Stability, Convergence, and Fairness", | |||
| SIGMETRICS 2011 , June 2011, | SIGMETRICS '11: Proceedings of the ACM SIGMETRICS Joint | |||
| <https://dl.acm.org/citation.cfm?id=1993753>. | International Conference on Measurement and Modeling of | |||
| Computer Systems, pp. 73-84, DOI 10.1145/1993744.1993753, | ||||
| June 2011, <https://dl.acm.org/citation.cfm?id=1993753>. | ||||
| [AQMmetrics] | [AQMmetrics] | |||
| Kwon, M. and S. Fahmy, "A Comparison of Load-based and | Kwon, M. and S. Fahmy, "A Comparison of Load-based and | |||
| Queue- based Active Queue Management Algorithms", Proc. | Queue-based Active Queue Management Algorithms", Proc. | |||
| Int'l Soc. for Optical Engineering (SPIE) 4866:35--46 DOI: | Int'l Soc. for Optical Engineering (SPIE), Vol. 4866, pp. | |||
| 10.1117/12.473021, 2002, | 35-46, DOI 10.1117/12.473021, 2002, | |||
| <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | <https://www.cs.purdue.edu/homes/fahmy/papers/ldc.pdf>. | |||
| [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | |||
| Algorithm for Increasing the Robustness of RED's Active | Algorithm for Increasing the Robustness of RED's Active | |||
| Queue Management", ACIRI Technical Report , August 2001, | Queue Management", ACIRI Technical Report 301, August | |||
| <https://www.icir.org/floyd/red.html>. | 2001, <https://www.icsi.berkeley.edu/icsi/node/2032>. | |||
| [BBRv2] Cardwell, N., "BRTCP BBR v2 Alpha/Preview Release", GitHub | [BBR-CC] Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V. | |||
| repository; Linux congestion control module, | Jacobson, "BBR Congestion Control", Work in Progress, | |||
| <https://github.com/google/bbr/blob/v2alpha/README.md>. | Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | |||
| control-02, 7 March 2022, | ||||
| <https://datatracker.ietf.org/doc/html/draft-cardwell- | ||||
| iccrg-bbr-congestion-control-02>. | ||||
| [BBRv2] "TCP BBR v2 Alpha/Preview Release", commit 17700ca, June | ||||
| 2022, <https://github.com/google/bbr>. | ||||
| [Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | [Boru20] Boru Oljira, D., Grinnemo, K-J., Brunstrom, A., and J. | |||
| Taheri, "Validating the Sharing Behavior and Latency | Taheri, "Validating the Sharing Behavior and Latency | |||
| Characteristics of the L4S Architecture", ACM CCR | Characteristics of the L4S Architecture", ACM SIGCOMM | |||
| 50(2):37--44, May 2020, | Computer Communication Review, Vol. 50, Issue 2, pp. | |||
| 37-44, DOI 10.1145/3402413.3402419, May 2020, | ||||
| <https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | <https://dl.acm.org/doi/abs/10.1145/3402413.3402419>. | |||
| [CCcensus19] | [CCcensus19] | |||
| Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and | Mishra, A., Sun, X., Jain, A., Pande, S., Joshi, R., and | |||
| B. Leong, "The Great Internet TCP Congestion Control | B. Leong, "The Great Internet TCP Congestion Control | |||
| Census", Proc. ACM on Measurement and Analysis of | Census", Proceedings of the ACM on Measurement and | |||
| Computing Systems 3(3), December 2019, | Analysis of Computing Systems, Vol. 3, Issue 3, Article | |||
| No. 45, pp. 1-24, DOI 10.1145/3366693, December 2019, | ||||
| <https://doi.org/10.1145/3366693>. | <https://doi.org/10.1145/3366693>. | |||
| [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | |||
| ACM Queue 10(5), May 2012, | ACM Queue, Vol. 10, Issue 5, May 2012, | |||
| <https://queue.acm.org/issuedetail.cfm?issue=2208917>. | <https://queue.acm.org/issuedetail.cfm?issue=2208917>. | |||
| [CRED_Insights] | [CRED_Insights] | |||
| Briscoe, B., "Insights from Curvy RED (Random Early | Briscoe, B. and K. De Schepper, "Insights from Curvy RED | |||
| Detection)", BT Technical Report TR-TUB8-2015-003 | (Random Early Detection)", BT Technical Report, TR- | |||
| arXiv:1904.07339 [cs.NI], July 2015, | TUB8-2015-003, DOI 10.48550/arXiv.1904.07339, August 2015, | |||
| <https://arxiv.org/abs/1904.07339>. | <https://arxiv.org/abs/1904.07339>. | |||
| [DCttH19] De Schepper, K., Bondarenko, O., Tilmans, O., and B. | [DOCSIS-Q-PROT] | |||
| Briscoe, "`Data Centre to the Home': Ultra-Low Latency for | Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | |||
| All", Updated RITE project Technical Report , July 2019, | Algorithm to Preserve Low Latency", Work in Progress, | |||
| <https://bobbriscoe.net/pubs.html#DCttH_TR>. | Internet-Draft, draft-briscoe-docsis-q-protection-06, 13 | |||
| May 2022, <https://datatracker.ietf.org/doc/html/draft- | ||||
| briscoe-docsis-q-protection-06>. | ||||
| [DOCSIS3.1] | [DOCSIS3.1] | |||
| CableLabs, "MAC and Upper Layer Protocols Interface | CableLabs, "DOCSIS 3.1 MAC and Upper Layer Protocols | |||
| (MULPI) Specification, CM-SP-MULPIv3.1", Data-Over-Cable | Interface Specification", CM-SP-MULPIv3.1, Data-Over-Cable | |||
| Service Interface Specifications DOCSIS® 3.1 Version i17 | Service Interface Specifications DOCSIS 3.1 Version I17 or | |||
| or later, 21 January 2019, <https://specification- | later, January 2019, <https://specification- | |||
| search.cablelabs.com/CM-SP-MULPIv3.1>. | search.cablelabs.com/CM-SP-MULPIv3>. | |||
| [DualPI2Linux] | [DualPI2Linux] | |||
| Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | Albisser, O., De Schepper, K., Briscoe, B., Tilmans, O., | |||
| and H. Steen, "DUALPI2 - Low Latency, Low Loss and | and H. Steen, "DUALPI2 - Low Latency, Low Loss and | |||
| Scalable (L4S) AQM", Proc. Linux Netdev 0x13 , March 2019, | Scalable (L4S) AQM", Proceedings of Linux Netdev 0x13 , | |||
| <https://www.netdevconf.org/0x13/session.html?talk- | March 2019, <https://www.netdevconf.org/0x13/ | |||
| DUALPI2-AQM>. | session.html?talk-DUALPI2-AQM>. | |||
| [DualQ-Test] | [DualQ-Test] | |||
| Steen, H., "Destruction Testing: Ultra-Low Delay using | Steen, H., "Destruction Testing: Ultra-Low Delay using | |||
| Dual Queue Coupled Active Queue Management", Master's | Dual Queue Coupled Active Queue Management", Master's | |||
| Thesis, Dept of Informatics, Uni Oslo , May 2017, | Thesis, Department of Informatics, University of Oslo, May | |||
| <https://www.duo.uio.no/bitstream/handle/10852/57424/ | 2017. | |||
| thesis-henrste.pdf?sequence=1>. | ||||
| [Dukkipati06] | [Dukkipati06] | |||
| Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is | Dukkipati, N. and N. McKeown, "Why Flow-Completion Time is | |||
| the Right Metric for Congestion Control", ACM CCR | the Right Metric for Congestion Control", ACM SIGCOMM | |||
| 36(1):59--62, January 2006, | Computer Communication Review, Vol. 36, Issue 1, pp. | |||
| 59-62, DOI 10.1145/1111322.1111336, January 2006, | ||||
| <https://dl.acm.org/doi/10.1145/1111322.1111336>. | <https://dl.acm.org/doi/10.1145/1111322.1111336>. | |||
| [Heist21] Heist, P. and J. Morton, "L4S Tests", GitHub README, | [Heist21] "L4S Tests", commit e21cd91, August 2021, | |||
| August 2021, <https://github.com/heistp/l4s- | <https://github.com/heistp/l4s-tests>. | |||
| tests/#underutilization-with-bursty-traffic>. | ||||
| [I-D.briscoe-docsis-q-protection] | ||||
| Briscoe, B. and G. White, "The DOCSIS(r) Queue Protection | ||||
| Algorithm to Preserve Low Latency", Work in Progress, | ||||
| Internet-Draft, draft-briscoe-docsis-q-protection-06, 13 | ||||
| May 2022, | ||||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| briscoe-docsis-q-protection/>. | ||||
| [I-D.briscoe-iccrg-prague-congestion-control] | ||||
| Schepper, K. D., Tilmans, O., and B. Briscoe, "Prague | ||||
| Congestion Control", Work in Progress, Internet-Draft, | ||||
| draft-briscoe-iccrg-prague-congestion-control-01, 11 July | ||||
| 2022, <https://datatracker.ietf.org/api/v1/doc/document/ | ||||
| draft-briscoe-iccrg-prague-congestion-control/>. | ||||
| [I-D.briscoe-tsvwg-l4s-diffserv] | [L4S-DIFFSERV] | |||
| Briscoe, B., "Interactions between Low Latency, Low Loss, | Briscoe, B., "Interactions between Low Latency, Low Loss, | |||
| Scalable Throughput (L4S) and Differentiated Services", | Scalable Throughput (L4S) and Differentiated Services", | |||
| Work in Progress, Internet-Draft, draft-briscoe-tsvwg-l4s- | Work in Progress, Internet-Draft, draft-briscoe-tsvwg-l4s- | |||
| diffserv-02, 2 July 2018, | diffserv-02, 4 November 2018, | |||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | <https://datatracker.ietf.org/doc/html/draft-briscoe- | |||
| briscoe-tsvwg-l4s-diffserv/>. | tsvwg-l4s-diffserv-02>. | |||
| [I-D.cardwell-iccrg-bbr-congestion-control] | ||||
| Cardwell, N., Cheng, Y., Yeganeh, S. H., Swett, I., and V. | ||||
| Jacobson, "BBR Congestion Control", Work in Progress, | ||||
| Internet-Draft, draft-cardwell-iccrg-bbr-congestion- | ||||
| control-02, 7 March 2022, | ||||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| cardwell-iccrg-bbr-congestion-control/>. | ||||
| [I-D.ietf-tsvwg-l4s-arch] | ||||
| Briscoe, B., Schepper, K. D., Bagnulo, M., and G. White, | ||||
| "Low Latency, Low Loss, Scalable Throughput (L4S) Internet | ||||
| Service: Architecture", Work in Progress, Internet-Draft, | ||||
| draft-ietf-tsvwg-l4s-arch-19, 27 July 2022, | ||||
| <https://datatracker.ietf.org/api/v1/doc/document/draft- | ||||
| ietf-tsvwg-l4s-arch/>. | ||||
| [I-D.mathis-iccrg-relentless-tcp] | ||||
| Mathis, M., "Relentless Congestion Control", Work in | ||||
| Progress, Internet-Draft, draft-mathis-iccrg-relentless- | ||||
| tcp-00, 4 March 2009, <https://www.ietf.org/archive/id/ | ||||
| draft-mathis-iccrg-relentless-tcp-00.txt>. | ||||
| [L4Sdemo16] | [L4Sdemo16] | |||
| Bondarenko, O., De Schepper, K., Tsang, I., and B. | Bondarenko, O., De Schepper, K., Tsang, I., Briscoe, B., | |||
| Briscoe, "Ultra-Low Delay for All: Live Experience, Live | Petlund, A., and C. Griwodz, "Ultra-Low Delay for All: | |||
| Analysis", Proc. MMSYS'16 pp33:1--33:4, May 2016, | Live Experience, Live Analysis", Proceedings of the 7th | |||
| <https//dl.acm.org/citation.cfm?doid=2910017.2910633 | International Conference on Multimedia Systems, Article | |||
| (videos of demos: | No. 33, pp. 1-4, DOI 10.1145/2910017.2910633, May 2016, | |||
| https://riteproject.eu/dctth/#1511dispatchwg )>. | <https://dl.acm.org/citation.cfm?doid=2910017.2910633>. | |||
| [L4Seval22] | ||||
| De Schepper, K., Albisser, O., Tilmans, O., and B. | ||||
| Briscoe, "Dual Queue Coupled AQM: Deployable Very Low | ||||
| Queuing Delay for All", Preprint submitted to IEEE/ACM | ||||
| Transactions on Networking, DOI 10.48550/arXiv.2209.01078, | ||||
| September 2022, <https://arxiv.org/abs/2209.01078>. | ||||
| [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | [L4S_5G] Willars, P., Wittenmark, E., Ronkainen, H., Östberg, C., | |||
| Johansson, I., Strand, J., Lédl, P., and D. Schnieders, | Johansson, I., Strand, J., Lédl, P., and D. Schnieders, | |||
| "Enabling time-critical applications over 5G with rate | "Enabling time-critical applications over 5G with rate | |||
| adaptation", Ericsson - Deutsche Telekom White Paper BNEW- | adaptation", Ericsson - Deutsche Telekom White Paper, | |||
| 21:025455 Uen, May 2021, <https://www.ericsson.com/en/ | BNEW-21:025455, May 2021, <https://www.ericsson.com/en/ | |||
| reports-and-papers/white-papers/enabling-time-critical- | reports-and-papers/white-papers/enabling-time-critical- | |||
| applications-over-5g-with-rate-adaptation>. | applications-over-5g-with-rate-adaptation>. | |||
| [Labovitz10] | [Labovitz10] | |||
| Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | Labovitz, C., Iekel-Johnson, S., McPherson, D., Oberheide, | |||
| J., and F. Jahanian, "Internet Inter-Domain Traffic", Proc | J., and F. Jahanian, "Internet Inter-Domain Traffic", ACM | |||
| ACM SIGCOMM; ACM CCR 40(4):75--86, August 2010, | SIGCOMM Computer Communication Review, Vol. 40, Issue 4, | |||
| pp. 75-86, DOI 10.1145/1851275.1851194, August 2010, | ||||
| <https://doi.org/10.1145/1851275.1851194>. | <https://doi.org/10.1145/1851275.1851194>. | |||
| [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | [LLD] White, G., Sundaresan, K., and B. Briscoe, "Low Latency | |||
| DOCSIS: Technology Overview", CableLabs White Paper , | DOCSIS: Technology Overview", CableLabs White Paper, | |||
| February 2019, <https://cablela.bs/low-latency-docsis- | February 2019, <https://cablela.bs/low-latency-docsis- | |||
| technology-overview-february-2019>. | technology-overview-february-2019>. | |||
| [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - a | [MEDF] Menth, M., Schmid, M., Heiss, H., and T. Reim, "MEDF - A | |||
| simple scheduling algorithm for two real-time transport | Simple Scheduling Algorithm for Two Real-Time Transport | |||
| service classes with application in the UTRAN", Proc. IEEE | Service Classes with Application in the UTRAN", Proc. IEEE | |||
| Conference on Computer Communications (INFOCOM'03) Vol.2 | Conference on Computer Communications (INFOCOM'03), Vol. | |||
| pp.1116-1122, March 2003, | 2, pp. 1116-1122, DOI 10.1109/INFCOM.2003.1208948, March | |||
| <https://infocom2003.ieee-infocom.org/papers/27_04.PDF>. | 2003, <https://doi.org/10.1109/INFCOM.2003.1208948>. | |||
| [PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | [PI2] De Schepper, K., Bondarenko, O., Briscoe, B., and I. | |||
| Tsang, "PI2: A Linearized AQM for both Classic and | Tsang, "PI2: A Linearized AQM for both Classic and | |||
| Scalable TCP", ACM CoNEXT'16 , December 2016, | Scalable TCP", ACM CoNEXT'16, DOI 10.1145/2999572.2999578, | |||
| <https://riteproject.files.wordpress.com/2015/10/ | December 2016, | |||
| pi2_conext.pdf>. | <https://dl.acm.org/doi/10.1145/2999572.2999578>. | |||
| [PI2param] Briscoe, B., "PI2 Parameters", Technical Report TR-BB- | [PI2param] Briscoe, B., "PI2 Parameters", Technical Report, TR-BB- | |||
| 2021-001 arXiv:2107.01003 [cs.NI], July 2021, | 2021-001, arXiv:2107.01003 [cs.NI], | |||
| DOI 10.48550/arXiv.2107.01003, July 2021, | ||||
| <https://arxiv.org/abs/2107.01003>. | <https://arxiv.org/abs/2107.01003>. | |||
| [PRAGUE-CC] | ||||
| De Schepper, K., Tilmans, O., and B. Briscoe, "Prague | ||||
| Congestion Control", Work in Progress, Internet-Draft, | ||||
| draft-briscoe-iccrg-prague-congestion-control-01, 11 July | ||||
| 2022, <https://datatracker.ietf.org/doc/html/draft- | ||||
| briscoe-iccrg-prague-congestion-control-01>. | ||||
| [PragueLinux] | [PragueLinux] | |||
| Briscoe, B., De Schepper, K., Albisser, O., Misund, J., | Briscoe, B., De Schepper, K., Albisser, O., Misund, J., | |||
| Tilmans, O., Kühlewind, M., and A.S. Ahmed, "Implementing | Tilmans, O., Kuehlewind, M., and A. Ahmed, "Implementing | |||
| the `TCP Prague' Requirements for Low Latency Low Loss | the 'TCP Prague' Requirements for L4S", Proceedings of | |||
| Scalable Throughput (L4S)", Proc. Linux Netdev 0x13 , | Linux Netdev 0x13, March 2019, | |||
| March 2019, <https://www.netdevconf.org/0x13/ | <https://www.netdevconf.org/0x13/session.html?talk-tcp- | |||
| session.html?talk-tcp-prague-l4s>. | prague-l4s>. | |||
| [RED] Floyd, S. and V. Jacobson, "Random Early Detection | ||||
| Gateways for Congestion Avoidance", IEEE/ACM Transactions | ||||
| on Networking, Volume 1, Issue 4, pp. 397-413, | ||||
| DOI 10.1109/90.251892, August 1993, | ||||
| <https://dl.acm.org/doi/10.1109/90.251892>. | ||||
| [RELENTLESS] | ||||
| Mathis, M., "Relentless Congestion Control", Work in | ||||
| Progress, Internet-Draft, draft-mathis-iccrg-relentless- | ||||
| tcp-00, 4 March 2009, | ||||
| <https://datatracker.ietf.org/doc/html/draft-mathis-iccrg- | ||||
| relentless-tcp-00>. | ||||
| [RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", | [RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", | |||
| RFC 970, DOI 10.17487/RFC0970, December 1985, | RFC 970, DOI 10.17487/RFC0970, December 1985, | |||
| <https://www.rfc-editor.org/info/rfc970>. | <https://www.rfc-editor.org/info/rfc970>. | |||
| [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | ||||
| S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | ||||
| Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | ||||
| S., Wroclawski, J., and L. Zhang, "Recommendations on | ||||
| Queue Management and Congestion Avoidance in the | ||||
| Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, | ||||
| <https://www.rfc-editor.org/info/rfc2309>. | ||||
| [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, | [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, | |||
| RFC 2914, DOI 10.17487/RFC2914, September 2000, | RFC 2914, DOI 10.17487/RFC2914, September 2000, | |||
| <https://www.rfc-editor.org/info/rfc2914>. | <https://www.rfc-editor.org/info/rfc2914>. | |||
| [RFC3246] Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le | [RFC3246] Davie, B., Charny, A., Bennet, J.C.R., Benson, K., Le | |||
| Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D. | Boudec, J.Y., Courtney, W., Davari, S., Firoiu, V., and D. | |||
| Stiliadis, "An Expedited Forwarding PHB (Per-Hop | Stiliadis, "An Expedited Forwarding PHB (Per-Hop | |||
| Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, | Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, | |||
| <https://www.rfc-editor.org/info/rfc3246>. | <https://www.rfc-editor.org/info/rfc3246>. | |||
| skipping to change at page 34, line 17 ¶ | skipping to change at line 1552 ¶ | |||
| BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, | BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, | |||
| <https://www.rfc-editor.org/info/rfc7567>. | <https://www.rfc-editor.org/info/rfc7567>. | |||
| [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, | [RFC8033] Pan, R., Natarajan, P., Baker, F., and G. White, | |||
| "Proportional Integral Controller Enhanced (PIE): A | "Proportional Integral Controller Enhanced (PIE): A | |||
| Lightweight Control Scheme to Address the Bufferbloat | Lightweight Control Scheme to Address the Bufferbloat | |||
| Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, | Problem", RFC 8033, DOI 10.17487/RFC8033, February 2017, | |||
| <https://www.rfc-editor.org/info/rfc8033>. | <https://www.rfc-editor.org/info/rfc8033>. | |||
| [RFC8034] White, G. and R. Pan, "Active Queue Management (AQM) Based | [RFC8034] White, G. and R. Pan, "Active Queue Management (AQM) Based | |||
| on Proportional Integral Controller Enhanced PIE) for | on Proportional Integral Controller Enhanced (PIE) for | |||
| Data-Over-Cable Service Interface Specifications (DOCSIS) | Data-Over-Cable Service Interface Specifications (DOCSIS) | |||
| Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February | Cable Modems", RFC 8034, DOI 10.17487/RFC8034, February | |||
| 2017, <https://www.rfc-editor.org/info/rfc8034>. | 2017, <https://www.rfc-editor.org/info/rfc8034>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., | [RFC8257] Bensley, S., Thaler, D., Balasubramanian, P., Eggert, L., | |||
| and G. Judd, "Data Center TCP (DCTCP): TCP Congestion | and G. Judd, "Data Center TCP (DCTCP): TCP Congestion | |||
| skipping to change at page 35, line 5 ¶ | skipping to change at line 1586 ¶ | |||
| [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | |||
| R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | |||
| RFC 8312, DOI 10.17487/RFC8312, February 2018, | RFC 8312, DOI 10.17487/RFC8312, February 2018, | |||
| <https://www.rfc-editor.org/info/rfc8312>. | <https://www.rfc-editor.org/info/rfc8312>. | |||
| [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | [RFC8404] Moriarty, K., Ed. and A. Morton, Ed., "Effects of | |||
| Pervasive Encryption on Operators", RFC 8404, | Pervasive Encryption on Operators", RFC 8404, | |||
| DOI 10.17487/RFC8404, July 2018, | DOI 10.17487/RFC8404, July 2018, | |||
| <https://www.rfc-editor.org/info/rfc8404>. | <https://www.rfc-editor.org/info/rfc8404>. | |||
| [SCReAM] Johansson, I., "SCReAM", GitHub repository; , | [RFC9000] Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based | |||
| <https://github.com/EricssonResearch/scream/blob/master/ | Multiplexed and Secure Transport", RFC 9000, | |||
| README.md>. | DOI 10.17487/RFC9000, May 2021, | |||
| <https://www.rfc-editor.org/info/rfc9000>. | ||||
| [RFC9330] Briscoe, B., Ed., De Schepper, K., Bagnulo, M., and G. | ||||
| White, "Low Latency, Low Loss, and Scalable Throughput | ||||
| (L4S) Internet Service: Architecture", RFC 9330, | ||||
| DOI 10.17487/RFC9330, January 2023, | ||||
| <https://www.rfc-editor.org/info/rfc9330>. | ||||
| [SCReAM-L4S] | ||||
| "SCReAM", commit fda6c53, June 2022, | ||||
| <https://github.com/EricssonResearch/scream>. | ||||
| [SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", | [SigQ-Dyn] Briscoe, B., "Rapid Signalling of Queue Dynamics", | |||
| Technical Report TR-BB-2017-001 arXiv:1904.07044 [cs.NI], | Technical Report, TR-BB-2017-001, | |||
| September 2017, <https://arxiv.org/abs/1904.07044>. | DOI 10.48550/arXiv.1904.07044, September 2017, | |||
| <https://arxiv.org/abs/1904.07044>. | ||||
| Appendix A. Example DualQ Coupled PI2 Algorithm | Appendix A. Example DualQ Coupled PI2 Algorithm | |||
| As a first concrete example, the pseudocode below gives the DualPI2 | As a first concrete example, the pseudocode below gives the DualPI2 | |||
| algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | algorithm. DualPI2 follows the structure of the DualQ Coupled AQM | |||
| framework in Figure 1. A simple ramp function (configured in units | framework in Figure 1. A simple ramp function (configured in units | |||
| of queuing time) with unsmoothed ECN marking is used for the Native | of queuing time) with unsmoothed ECN marking is used for the Native | |||
| L4S AQM. The ramp can also be configured as a step function. The | L4S AQM. The ramp can also be configured as a step function. The | |||
| PI2 algorithm [PI2] is used for the Classic AQM. PI2 is an improved | PI2 algorithm [PI2] is used for the Classic AQM. PI2 is an improved | |||
| variant of the PIE AQM [RFC8033]. | variant of the PIE AQM [RFC8033]. | |||
| The pseudocode will be introduced in two passes. The first pass | The pseudocode will be introduced in two passes. The first pass | |||
| explains the core concepts, deferring handling of edge-cases like | explains the core concepts, deferring handling of edge-cases like | |||
| overload to the second pass. To aid comparison, line numbers are | overload to the second pass. To aid comparison, line numbers are | |||
| kept in step between the two passes by using letter suffixes where | kept in step between the two passes by using letter suffixes where | |||
| the longer code needs extra lines. | the longer code needs extra lines. | |||
| All variables are assumed to be floating point in their basic units | All variables are assumed to be floating point in their basic units | |||
| (size in bytes, time in seconds, rates in bytes/second, alpha and | (size in bytes, time in seconds, rates in bytes/second, alpha and | |||
| beta in Hz, and probabilities from 0 to 1. Constants expressed in k | beta in Hz, and probabilities from 0 to 1). Constants expressed in k | |||
| (kilo), M (mega), G (giga), u (micro), m (milli) , %, ... are assumed | (kilo), M (mega), G (giga), u (micro), m (milli), %, and so forth, | |||
| to be converted to their appropriate multiple or fraction to | are assumed to be converted to their appropriate multiple or fraction | |||
| represent the basic units. A real implementation that wants to use | to represent the basic units. A real implementation that wants to | |||
| integer values needs to handle appropriate scaling factors and allow | use integer values needs to handle appropriate scaling factors and | |||
| accordingly appropriate resolution of its integer types (including | allow appropriate resolution of its integer types (including | |||
| temporary internal values during calculations). | temporary internal values during calculations). | |||
| A full open source implementation for Linux is available at: | A full open source implementation for Linux is available at | |||
| https://github.com/L4STeam/sch_dualpi2_upstream and explained in | <https://github.com/L4STeam/sch_dualpi2_upstream> and explained in | |||
| [DualPI2Linux]. The specification of the DualQ Coupled AQM for | [DualPI2Linux]. The specification of the DualQ Coupled AQM for | |||
| DOCSIS cable modems and CMTSs is available in [DOCSIS3.1] and | DOCSIS cable modems and cable modem termination systems (CMTSs) is | |||
| explained in [LLD]. | available in [DOCSIS3.1] and explained in [LLD]. | |||
| A.1. Pass #1: Core Concepts | A.1. Pass #1: Core Concepts | |||
| The pseudocode manipulates three main structures of variables: the | The pseudocode manipulates three main structures of variables: the | |||
| packet (pkt), the L4S queue (lq) and the Classic queue (cq). The | packet (pkt), the L4S queue (lq), and the Classic queue (cq). The | |||
| pseudocode consists of the following six functions: | pseudocode consists of the following six functions: | |||
| * The initialization function dualpi2_params_init(...) (Figure 2) | * The initialization function dualpi2_params_init(...) (Figure 2) | |||
| that sets parameter defaults (the API for setting non-default | that sets parameter defaults (the API for setting non-default | |||
| values is omitted for brevity) | values is omitted for brevity). | |||
| * The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3) | * The enqueue function dualpi2_enqueue(lq, cq, pkt) (Figure 3). | |||
| * The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4) | * The dequeue function dualpi2_dequeue(lq, cq, pkt) (Figure 4). | |||
| * The recurrence function recur(q, likelihood) for de-randomized ECN | * The recurrence function recur(q, likelihood) for de-randomized ECN | |||
| marking (shown at the end of Figure 4). | marking (shown at the end of Figure 4). | |||
| * The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the | * The L4S AQM function laqm(qdelay) (Figure 5) used to calculate the | |||
| ECN-marking probability for the L4S queue | ECN-marking probability for the L4S queue. | |||
| * The base AQM function that implements the PI algorithm | * The Base AQM function that implements the PI algorithm | |||
| dualpi2_update(lq, cq) (Figure 6) used to regularly update the | dualpi2_update(lq, cq) (Figure 6) used to regularly update the | |||
| base probability (p'), which is squared for the Classic AQM as | base probability (p'), which is squared for the Classic AQM as | |||
| well as being coupled across to the L4S queue. | well as being coupled across to the L4S queue. | |||
| It also uses the following functions that are not shown in full here: | It also uses the following functions that are not shown in full here: | |||
| * scheduler(), which selects between the head packets of the two | * scheduler(), which selects between the head packets of the two | |||
| queues; the choice of scheduler technology is discussed later; | queues. The choice of scheduler technology is discussed later. | |||
| * cq.byt() or lq.byt() returns the current length (aka. backlog) of | * cq.byt() or lq.byt() returns the current length (a.k.a. backlog) | |||
| the relevant queue in bytes; | of the relevant queue in bytes. | |||
| * cq.len() or lq.len() returns the current length of the relevant | * cq.len() or lq.len() returns the current length of the relevant | |||
| queue in packets; | queue in packets. | |||
| * cq.time() or lq.time() returns the current queuing delay of the | * cq.time() or lq.time() returns the current queuing delay of the | |||
| relevant queue in units of time (see Note a); | relevant queue in units of time (see Note a below). | |||
| * mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | * mark(pkt) and drop(pkt) for ECN marking and dropping a packet. | |||
| In experiments so far (building on experiments with PIE) on broadband | In experiments so far (building on experiments with PIE) on broadband | |||
| access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms | access links ranging from 4 Mb/s to 200 Mb/s with base RTTs from 5 ms | |||
| to 100 ms, DualPI2 achieves good results with the default parameters | to 100 ms, DualPI2 achieves good results with the default parameters | |||
| in Figure 2. The parameters are categorised by whether they relate | in Figure 2. The parameters are categorised by whether they relate | |||
| to the Base PI2 AQM, the L4S AQM or the framework coupling them | to the PI2 AQM, the L4S AQM, or the framework coupling them together. | |||
| together. Constants and variables derived from these parameters are | Constants and variables derived from these parameters are also | |||
| also included at the end of each category. Each parameter is | included at the end of each category. Each parameter is explained as | |||
| explained as it is encountered in the walk-through of the pseudocode | it is encountered in the walk-through of the pseudocode below, and | |||
| below, and the rationale for the chosen defaults are given so that | the rationale for the chosen defaults are given so that sensible | |||
| sensible values can be used in scenarios other than the regular | values can be used in scenarios other than the regular public | |||
| public Internet. | Internet. | |||
| 1: dualpi2_params_init(...) { % Set input parameter defaults | 1: dualpi2_params_init(...) { % Set input parameter defaults | |||
| 2: % DualQ Coupled framework parameters | 2: % DualQ Coupled framework parameters | |||
| 5: limit = MAX_LINK_RATE * 250 ms % Dual buffer size | 5: limit = MAX_LINK_RATE * 250 ms % Dual buffer size | |||
| 3: k = 2 % Coupling factor | 3: k = 2 % Coupling factor | |||
| 4: % NOT SHOWN % scheduler-dependent weight or equival't parameter | 4: % NOT SHOWN % scheduler-dependent weight or equival't parameter | |||
| 6: | 6: | |||
| 7: % PI2 Classic AQM parameters | 7: % PI2 Classic AQM parameters | |||
| 8: target = 15 ms % Queue delay target | 8: target = 15 ms % Queue delay target | |||
| 9: RTT_max = 100 ms % Worst case RTT expected | 9: RTT_max = 100 ms % Worst case RTT expected | |||
| skipping to change at page 37, line 32 ¶ | skipping to change at line 1718 ¶ | |||
| 18: range = 400 us % Range of L4S ramp in time units | 18: range = 400 us % Range of L4S ramp in time units | |||
| 19: Th_len = 1 pkt % Min L4S marking threshold in packets | 19: Th_len = 1 pkt % Min L4S marking threshold in packets | |||
| 20: % L4S constants | 20: % L4S constants | |||
| 21: p_Lmax = 1 % Max L4S marking prob | 21: p_Lmax = 1 % Max L4S marking prob | |||
| 22: } | 22: } | |||
| Figure 2: Example Header Pseudocode for DualQ Coupled PI2 AQM | Figure 2: Example Header Pseudocode for DualQ Coupled PI2 AQM | |||
| The overall goal of the code is to apply the marking and dropping | The overall goal of the code is to apply the marking and dropping | |||
| probabilities for L4S and Classic traffic (p_L and p_C). These are | probabilities for L4S and Classic traffic (p_L and p_C). These are | |||
| derived from the underlying base probabilities p'_L and p' driven | derived from the underlying base probabilities p'_L and p' driven, | |||
| respectively by the traffic in the L and C queues. The marking | respectively, by the traffic in the L and C queues. The marking | |||
| probability for the L queue (p_L) depends on both the base | probability for the L queue (p_L) depends on both the base | |||
| probability in its own queue (p'_L) and a probability called p_CL, | probability in its own queue (p'_L) and a probability called p_CL, | |||
| which is coupled across from p' in the C queue (see Section 2.4 for | which is coupled across from p' in the C queue (see Section 2.4 for | |||
| the derivation of the specific equations and dependencies). | the derivation of the specific equations and dependencies). | |||
| The probabilities p_CL and p_C are derived in lines 4 and 5 of the | The probabilities p_CL and p_C are derived in lines 4 and 5 of the | |||
| dualpi2_update() function (Figure 6) then used in the | dualpi2_update() function (Figure 6) then used in the | |||
| dualpi2_dequeue() function where p_L is also derived from p_CL at | dualpi2_dequeue() function where p_L is also derived from p_CL at | |||
| line 6 (Figure 4). The code walk-through below builds up to | line 6 (Figure 4). The code walk-through below builds up to | |||
| explaining that part of the code eventually, but it starts from | explaining that part of the code eventually, but it starts from | |||
| skipping to change at page 39, line 5 ¶ | skipping to change at line 1779 ¶ | |||
| 24: q.count += likelihood | 24: q.count += likelihood | |||
| 25: if (q.count > 1) { | 25: if (q.count > 1) { | |||
| 26: q.count -= 1 | 26: q.count -= 1 | |||
| 27: return TRUE | 27: return TRUE | |||
| 28: } | 28: } | |||
| 29: return FALSE | 29: return FALSE | |||
| 30: } | 30: } | |||
| Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 4: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
| When packets arrive, first a common queue limit is checked as shown | When packets arrive, a common queue limit is checked first as shown | |||
| in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | in line 2 of the enqueuing pseudocode in Figure 3. This assumes a | |||
| shared buffer for the two queues (Note b discusses the merits of | shared buffer for the two queues (Note b discusses the merits of | |||
| separate buffers). In order to avoid any bias against larger | separate buffers). In order to avoid any bias against larger | |||
| packets, 1 MTU of space is always allowed, and the limit is | packets, 1 MTU of space is always allowed, and the limit is | |||
| deliberately tested before enqueue. | deliberately tested before enqueue. | |||
| If limit is not exceeded, the packet is timestamped in line 4 (only | If limit is not exceeded, the packet is timestamped in line 4 (only | |||
| if the sojourn time technique is being used to measure queue delay; | if the sojourn time technique is being used to measure queue delay; | |||
| see Note a for alternatives). | see Note a below for alternatives). | |||
| At lines 5-9, the packet is classified and enqueued to the Classic or | At lines 5-9, the packet is classified and enqueued to the Classic or | |||
| L4S queue dependent on the least significant bit of the ECN field in | L4S queue dependent on the least significant bit (LSB) of the ECN | |||
| the IP header (line 6). Packets with a codepoint having an LSB of 0 | field in the IP header (line 6). Packets with a codepoint having an | |||
| (Not-ECT and ECT(0)) will be enqueued in the Classic queue. | LSB of 0 (Not-ECT and ECT(0)) will be enqueued in the Classic queue. | |||
| Otherwise, ECT(1) and CE packets will be enqueued in the L4S queue. | Otherwise, ECT(1) and CE packets will be enqueued in the L4S queue. | |||
| Optional additional packet classification flexibility is omitted for | Optional additional packet classification flexibility is omitted for | |||
| brevity (see the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id]). | brevity (see the L4S ECN protocol [RFC9331]). | |||
| The dequeue pseudocode (Figure 4) is repeatedly called whenever the | The dequeue pseudocode (Figure 4) is repeatedly called whenever the | |||
| lower layer is ready to forward a packet. It schedules one packet | lower layer is ready to forward a packet. It schedules one packet | |||
| for dequeuing (or zero if the queue is empty) then returns control to | for dequeuing (or zero if the queue is empty) then returns control to | |||
| the caller, so that it does not block while that packet is being | the caller so that it does not block while that packet is being | |||
| forwarded. While making this dequeue decision, it also makes the | forwarded. While making this dequeue decision, it also makes the | |||
| necessary AQM decisions on dropping or marking. The alternative of | necessary AQM decisions on dropping or marking. The alternative of | |||
| applying the AQMs at enqueue would shift some processing from the | applying the AQMs at enqueue would shift some processing from the | |||
| critical time when each packet is dequeued. However, it would also | critical time when each packet is dequeued. However, it would also | |||
| add a whole queue of delay to the control signals, making the control | add a whole queue of delay to the control signals, making the control | |||
| loop sloppier (for a typical RTT it would double the Classic queue's | loop sloppier (for a typical RTT, it would double the Classic queue's | |||
| feedback delay). | feedback delay). | |||
| All the dequeue code is contained within a large while loop so that | All the dequeue code is contained within a large while loop so that | |||
| if it decides to drop a packet, it will continue until it selects a | if it decides to drop a packet, it will continue until it selects a | |||
| packet to schedule. Line 3 of the dequeue pseudocode is where the | packet to schedule. Line 3 of the dequeue pseudocode is where the | |||
| scheduler chooses between the L4S queue (lq) and the Classic queue | scheduler chooses between the L4S queue (lq) and the Classic queue | |||
| (cq). Detailed implementation of the scheduler is not shown (see | (cq). Detailed implementation of the scheduler is not shown (see | |||
| discussion later). | discussion later). | |||
| * If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN- | * If an L4S packet is scheduled, in lines 7 and 8 the packet is ECN- | |||
| marked with likelihood p_L. The recur() function at the end of | marked with likelihood p_L. The recur() function at the end of | |||
| Figure 4 is used, which is preferred over random marking because | Figure 4 is used, which is preferred over random marking because | |||
| it avoids delay due to randomization when interpreting congestion | it avoids delay due to randomization when interpreting congestion | |||
| signals, but it still desynchronizes the saw-teeth of the flows. | signals, but it still desynchronizes the sawteeth of the flows. | |||
| Line 6 calculates p_L as the maximum of the coupled L4S | Line 6 calculates p_L as the maximum of the coupled L4S | |||
| probability p_CL and the probability from the native L4S AQM p'_L. | probability p_CL and the probability from the Native L4S AQM p'_L. | |||
| This implements the max() function shown in Figure 1 to couple the | This implements the max() function shown in Figure 1 to couple the | |||
| outputs of the two AQMs together. Of the two probabilities input | outputs of the two AQMs together. Of the two probabilities input | |||
| to p_L in line 6: | to p_L in line 6: | |||
| - p'_L is calculated per packet in line 5 by the laqm() function | - p'_L is calculated per packet in line 5 by the laqm() function | |||
| (see Figure 5), | (see Figure 5), whereas | |||
| - Whereas p_CL is maintained by the dualpi2_update() function | - p_CL is maintained by the dualpi2_update() function, which runs | |||
| which runs every Tupdate (Tupdate is set in line 12 of | every Tupdate (Tupdate is set in line 12 of Figure 2). | |||
| Figure 2). | ||||
| * If a Classic packet is scheduled, lines 10 to 17 drop or mark the | * If a Classic packet is scheduled, lines 10 to 17 drop or mark the | |||
| packet with probability p_C. | packet with probability p_C. | |||
| The Native L4S AQM algorithm (Figure 5) is a ramp function, similar | The Native L4S AQM algorithm (Figure 5) is a ramp function, similar | |||
| to the RED algorithm, but simplified as follows: | to the RED algorithm, but simplified as follows: | |||
| * The extent of the ramp is defined in units of queuing delay, not | * The extent of the ramp is defined in units of queuing delay, not | |||
| bytes, so that configuration remains invariant as the queue | bytes, so that configuration remains invariant as the queue | |||
| departure rate varies. | departure rate varies. | |||
| * It uses instantaneous queueing delay, which avoids the complexity | * It uses instantaneous queuing delay, which avoids the complexity | |||
| of smoothing, but also avoids embedding a worst-case RTT of | of smoothing, but also avoids embedding a worst-case RTT of | |||
| smoothing delay in the network (see Section 2.1). | smoothing delay in the network (see Section 2.1). | |||
| * The ramp rises linearly directly from 0 to 1, not to an | * The ramp rises linearly directly from 0 to 1, not to an | |||
| intermediate value of p'_L as RED would, because there is no need | intermediate value of p'_L as RED would, because there is no need | |||
| to keep ECN marking probability low. | to keep ECN-marking probability low. | |||
| * Marking does not have to be randomized. Determinism is used | * Marking does not have to be randomized. Determinism is used | |||
| instead of randomness; to reduce the delay necessary to smooth out | instead of randomness to reduce the delay necessary to smooth out | |||
| the noise of randomness from the signal. | the noise of randomness from the signal. | |||
| The ramp function requires two configuration parameters, the minimum | The ramp function requires two configuration parameters, the minimum | |||
| threshold (minTh) and the width of the ramp (range), both in units of | threshold (minTh) and the width of the ramp (range), both in units of | |||
| queuing time, as shown in lines 17 & 18 of the initialization | queuing time, as shown in lines 17 and 18 of the initialization | |||
| function in Figure 2. The ramp function can be configured as a step | function in Figure 2. The ramp function can be configured as a step | |||
| (see Note c). | (see Note c). | |||
| Although the DCTCP paper [Alizadeh-stability] recommends an ECN | Although the DCTCP paper [Alizadeh-stability] recommends an ECN- | |||
| marking threshold of 0.17*RTT_typ, it also shows that the threshold | marking threshold of 0.17*RTT_typ, it also shows that the threshold | |||
| can be much shallower with hardly any worse under-utilization of the | can be much shallower with hardly any worse underutilization of the | |||
| link (because the amplitude of DCTCP's sawteeth is so small). Based | link (because the amplitude of DCTCP's sawteeth is so small). Based | |||
| on extensive experiments, for the public Internet the default minimum | on extensive experiments, for the public Internet the default minimum | |||
| ECN marking threshold (target) in Figure 2 is considered a good | ECN-marking threshold (target) in Figure 2 is considered a good | |||
| compromise, even though it is significantly smaller fraction of | compromise, even though it is a significantly smaller fraction of | |||
| RTT_typ. | RTT_typ. | |||
| 1: laqm(qdelay) { % Returns native L4S AQM probability | 1: laqm(qdelay) { % Returns Native L4S AQM probability | |||
| 2: if (qdelay >= maxTh) | 2: if (qdelay >= maxTh) | |||
| 3: return 1 | 3: return 1 | |||
| 4: else if (qdelay > minTh) | 4: else if (qdelay > minTh) | |||
| 5: return (qdelay - minTh)/range % Divide could use a bit-shift | 5: return (qdelay - minTh)/range % Divide could use a bit-shift | |||
| 6: else | 6: else | |||
| 7: return 0 | 7: return 0 | |||
| 8: } | 8: } | |||
| Figure 5: Example Pseudocode for the Native L4S AQM | Figure 5: Example Pseudocode for the Native L4S AQM | |||
| 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | |||
| 2: curq = cq.time() % use queuing time of first-in Classic packet | 2: curq = cq.time() % use queuing time of first-in Classic packet | |||
| 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | |||
| 4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor | 4: p_CL = k * p' % Coupled L4S prob = base prob * coupling factor | |||
| 5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
| 6: prevq = curq | 6: prevq = curq | |||
| 7: } | 7: } | |||
| Figure 6: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | Figure 6: Example PI-update Pseudocode for DualQ Coupled PI2 AQM | |||
| (Clamping p' within the range [0,1] omitted for clarity - see text) | (Note: Clamping p' within the range [0,1] omitted for clarity -- | |||
| see below.) | ||||
| The coupled marking probability, p_CL depends on the base probability | The coupled marking probability p_CL depends on the base probability | |||
| (p'), which is kept up to date by the core PI algorithm in Figure 6 | (p'), which is kept up to date by executing the core PI algorithm in | |||
| executed every Tupdate. | Figure 6 every Tupdate. | |||
| Note that p' solely depends on the queuing time in the Classic queue. | Note that p' solely depends on the queuing time in the Classic queue. | |||
| In line 2, the current queuing delay (curq) is evaluated from how | In line 2, the current queuing delay (curq) is evaluated from how | |||
| long the head packet was in the Classic queue (cq). The function | long the head packet was in the Classic queue (cq). The function | |||
| cq.time() (not shown) subtracts the time stamped at enqueue from the | cq.time() (not shown) subtracts the time stamped at enqueue from the | |||
| current time (see Note a) and implicitly takes the current queuing | current time (see Note a below) and implicitly takes the current | |||
| delay as 0 if the queue is empty. | queuing delay as 0 if the queue is empty. | |||
| The algorithm centres on line 3, which is a classical Proportional- | The algorithm centres on line 3, which is a classical PI controller | |||
| Integral (PI) controller that alters p' dependent on: a) the error | that alters p' dependent on: a) the error between the current queuing | |||
| between the current queuing delay (curq) and the target queuing | delay (curq) and the target queuing delay (target) and b) the change | |||
| delay, 'target'; and b) the change in queuing delay since the last | in queuing delay since the last sample. The name 'PI' represents the | |||
| sample. The name 'PI' represents the fact that the second factor | fact that the second factor (how fast the queue is growing) is | |||
| (how fast the queue is growing) is _P_roportional to load while the | Proportional to load while the first is the Integral of the load (so | |||
| first is the _I_ntegral of the load (so it removes any standing queue | it removes any standing queue in excess of the target). | |||
| in excess of the target). | ||||
| The target parameter can be set based on local knowledge, but the aim | The target parameter can be set based on local knowledge, but the aim | |||
| is for the default to be a good compromise for anywhere in the | is for the default to be a good compromise for anywhere in the | |||
| intended deployment environment -- the public Internet. According to | intended deployment environment -- the public Internet. According to | |||
| [PI2param], the target queuing delay on line 9 of Figure 2 is related | [PI2param], the target queuing delay on line 8 of Figure 2 is related | |||
| to the typical base RTT worldwide, RTT_typ, by two factors: target = | to the typical base RTT worldwide, RTT_typ, by two factors: target = | |||
| RTT_typ * g * f. Below we summarize the rationale behind these | RTT_typ * g * f. Below, we summarize the rationale behind these | |||
| factors and introduce a further adjustment. The two factors ensure | factors and introduce a further adjustment. The two factors ensure | |||
| that, in a large proportion of cases (say 90%), the sawtooth | that, in a large proportion of cases (say 90%), the sawtooth | |||
| variations in RTT of a single flow will fit within the buffer without | variations in RTT of a single flow will fit within the buffer without | |||
| underutilizing the link. Frankly, these factors are educated | underutilizing the link. Frankly, these factors are educated | |||
| guesses, but with the emphasis closer to 'educated' than to 'guess' | guesses, but with the emphasis closer to 'educated' than to 'guess' | |||
| (see [PI2param] for full background): | (see [PI2param] for the full background): | |||
| * RTT_typ is taken as 25 ms. This is based on an average CDN | * RTT_typ is taken as 25 ms. This is based on an average CDN | |||
| latency measured in each country weighted by the number of | latency measured in each country weighted by the number of | |||
| Internet users in that country to produce an overall weighted | Internet users in that country to produce an overall weighted | |||
| average for the Internet [PI2param]. Countries were ranked by | average for the Internet [PI2param]. Countries were ranked by | |||
| number of Internet users, and once 90% of Internet users were | number of Internet users, and once 90% of Internet users were | |||
| covered, smaller countries were excluded to avoid | covered, smaller countries were excluded to avoid small sample | |||
| unrepresentatively small sample sizes. Also, importantly, the | sizes that would be less representative. Also, importantly, the | |||
| data for the average CDN latency in China (with the largest number | data for the average CDN latency in China (with the largest number | |||
| of Internet users) has been removed, because the CDN latency was a | of Internet users) has been removed, because the CDN latency was a | |||
| significant outlier and, on reflection, the experimental technique | significant outlier and, on reflection, the experimental technique | |||
| seemed inappropriate to the CDN market in China. | seemed inappropriate to the CDN market in China. | |||
| * g is taken as 0.38. The factor g is a geometry factor that | * g is taken as 0.38. The factor g is a geometry factor that | |||
| characterizes the shape of the sawteeth of prevalent Classic | characterizes the shape of the sawteeth of prevalent Classic | |||
| congestion controllers. The geometry factor is the fraction of | congestion controllers. The geometry factor is the fraction of | |||
| the amplitude of the sawtooth variability in queue delay that lies | the amplitude of the sawtooth variability in queue delay that lies | |||
| below the AQM's target. For instance, at low bit rate, the | below the AQM's target. For instance, at low bitrates, the | |||
| geometry factor of standard Reno is 0.5, but at higher rates it | geometry factor of standard Reno is 0.5, but at higher rates, it | |||
| tends to just under 1. According to the census of congestion | tends towards just under 1. According to the census of congestion | |||
| controllers conducted by Mishra et al. in Jul-Oct | controllers conducted by Mishra et al. in Jul-Oct 2019 | |||
| 2019 [CCcensus19], most Classic TCP traffic uses Cubic. And, | [CCcensus19], most Classic TCP traffic uses CUBIC. And, according | |||
| according to the analysis in [PI2param], if running over a PI2 | to the analysis in [PI2param], if running over a PI2 AQM, a large | |||
| AQM, a large proportion of this Cubic traffic would be in its | proportion of this CUBIC traffic would be in its Reno-friendly | |||
| Reno-Friendly mode, which has a geometry factor of ~0.39 (all | mode, which has a geometry factor of ~0.39 (for all known | |||
| known implementations). The rest of the Cubic traffic would be in | implementations). The rest of the CUBIC traffic would be in true | |||
| true Cubic mode, which has a geometry factor of ~0.36. Without | CUBIC mode, which has a geometry factor of ~0.36. Without | |||
| modelling the sawtooth profiles from all the other less prevalent | modelling the sawtooth profiles from all the other less prevalent | |||
| congestion controllers, we estimate a 7:3 weighted average of | congestion controllers, we estimate a 7:3 weighted average of | |||
| these two, resulting in an average geometry factor of 0.38. | these two, resulting in an average geometry factor of 0.38. | |||
| * f is taken as 2. The factor f is a safety factor that increases | * f is taken as 2. The factor f is a safety factor that increases | |||
| the target queue to allow for the distribution of RTT_typ around | the target queue to allow for the distribution of RTT_typ around | |||
| its mean. Otherwise, the target queue would only avoid | its mean. Otherwise, the target queue would only avoid | |||
| underutilization for those users below the mean. It also provides | underutilization for those users below the mean. It also provides | |||
| a safety margin for the proportion of paths in use that span | a safety margin for the proportion of paths in use that span | |||
| beyond the distance between a user and their local CDN. | beyond the distance between a user and their local CDN. | |||
| Currently, no data is available on the variance of queue delay | Currently, no data is available on the variance of queue delay | |||
| around the mean in each region, so there is plenty of room for | around the mean in each region, so there is plenty of room for | |||
| this guess to become more educated. | this guess to become more educated. | |||
| * [PI2param] recommends target = RTT_typ * g * f = 25ms * 0.38 * 2 = | * [PI2param] recommends target = RTT_typ * g * f = 25 ms * 0.38 * 2 | |||
| 19 ms. However, a further adjustment is warranted, because target | = 19 ms. However, a further adjustment is warranted, because | |||
| is moving year-on-year. The paper is based on data collected in | target is moving year-on-year. The paper is based on data | |||
| 2019, and it mentions evidence from speedtest.net that suggests | collected in 2019, and it mentions evidence from the Speedtest | |||
| RTT_typ reduced by 17% (fixed) or 12% (mobile) between 2020 and | Global Index that suggests RTT_typ reduced by 17% (fixed) or 12% | |||
| 2021. Therefore, we recommend a default of target = 15 ms at the | (mobile) between 2020 and 2021. Therefore, we recommend a default | |||
| time of writing (2021). | of target = 15 ms at the time of writing (2021). | |||
| Operators can always use the data and discussion in [PI2param] to | Operators can always use the data and discussion in [PI2param] to | |||
| configure a more appropriate target for their environment. For | configure a more appropriate target for their environment. For | |||
| instance, an operator might wish to question the assumptions called | instance, an operator might wish to question the assumptions called | |||
| out in that paper, such as the goal of no underutilization for a | out in that paper, such as the goal of no underutilization for a | |||
| large majority of single flow transfers (given many large transfers | large majority of single flow transfers (given many large transfers | |||
| use multiple flows to avoid the scaling limitations of Classic | use multiple flows to avoid the scaling limitations of Classic | |||
| flows). | flows). | |||
| The two 'gain factors' in line 3 of Figure 6, alpha and beta, | The two 'gain factors' in line 3 of Figure 6, alpha and beta, | |||
| respectively weight how strongly each of the two elements (Integral | respectively weight how strongly each of the two elements (Integral | |||
| and Proportional) alters p'. They are in units of 'per second of | and Proportional) alters p'. They are in units of 'per second of | |||
| delay' or Hz, because they transform differences in queueing delay | delay' or Hz, because they transform differences in queuing delay | |||
| into changes in probability (assuming probability has a value from 0 | into changes in probability (assuming probability has a value from 0 | |||
| to 1). | to 1). | |||
| Alpha and beta determine how much p' ought to change after each | Alpha and beta determine how much p' ought to change after each | |||
| update interval (Tupdate). For smaller Tupdate, p' should change by | update interval (Tupdate). For a smaller Tupdate, p' should change | |||
| the same amount per second, but in finer more frequent steps. So | by the same amount per second but in finer more frequent steps. So | |||
| alpha depends on Tupdate (see line 13 of the initialization function | alpha depends on Tupdate (see line 13 of the initialization function | |||
| in Figure 2). It is best to update p' as frequently as possible, but | in Figure 2). It is best to update p' as frequently as possible, but | |||
| Tupdate will probably be constrained by hardware performance. As | Tupdate will probably be constrained by hardware performance. As | |||
| shown in line 13, the update interval should be frequent enough to | shown in line 12, the update interval should be frequent enough to | |||
| update at least once in the time taken for the target queue to drain | update at least once in the time taken for the target queue to drain | |||
| ('target') as long as it updates at least three times per maximum | ('target') as long as it updates at least three times per maximum | |||
| RTT. Tupdate defaults to 16 ms in the reference Linux implementation | RTT. Tupdate defaults to 16 ms in the reference Linux implementation | |||
| because it has to be rounded to a multiple of 4 ms. For link rates | because it has to be rounded to a multiple of 4 ms. For link rates | |||
| from 4 to 200 Mb/s and a maximum RTT of 100ms, it has been verified | from 4 to 200 Mb/s and a maximum RTT of 100 ms, it has been verified | |||
| through extensive testing that Tupdate=16ms (as also recommended in | through extensive testing that Tupdate = 16 ms (as also recommended | |||
| the PIE spec [RFC8033]) is sufficient. | in the PIE spec [RFC8033]) is sufficient. | |||
| The choice of alpha and beta also determines the AQM's stable | The choice of alpha and beta also determines the AQM's stable | |||
| operating range. The AQM ought to change p' as fast as possible in | operating range. The AQM ought to change p' as fast as possible in | |||
| response to changes in load without over-compensating and therefore | response to changes in load without overcompensating and therefore | |||
| causing oscillations in the queue. Therefore, the values of alpha | causing oscillations in the queue. Therefore, the values of alpha | |||
| and beta also depend on the RTT of the expected worst-case flow | and beta also depend on the RTT of the expected worst-case flow | |||
| (RTT_max). | (RTT_max). | |||
| The maximum RTT of a PI controller (RTT_max in line 10 of Figure 2) | The maximum RTT of a PI controller (RTT_max in line 9 of Figure 2) is | |||
| is not an absolute maximum, but more instability (more queue | not an absolute maximum, but more instability (more queue | |||
| variability) sets in for long-running flows with an RTT above this | variability) sets in for long-running flows with an RTT above this | |||
| value. The propagation delay halfway round the planet and back in | value. The propagation delay halfway round the planet and back in | |||
| glass fibre is 200 ms. However, hardly any traffic traverses such | glass fibre is 200 ms. However, hardly any traffic traverses such | |||
| extreme paths and, since the significant consolidation of Internet | extreme paths and, since the significant consolidation of Internet | |||
| traffic between 2007 and 2009 [Labovitz10], a high and growing | traffic between 2007 and 2009 [Labovitz10], a high and growing | |||
| proportion of all Internet traffic (roughly two-thirds at the time of | proportion of all Internet traffic (roughly two-thirds at the time of | |||
| writing) has been served from content distribution networks (CDNs) or | writing) has been served from CDNs or 'cloud' services distributed | |||
| 'cloud' services distributed close to end-users. The Internet might | close to end users. The Internet might change again, but for now, | |||
| change again, but for now, designing for a maximum RTT of 100ms is a | designing for a maximum RTT of 100 ms is a good compromise between | |||
| good compromise between faster queue control at low RTT and some | faster queue control at low RTT and some instability on the occasions | |||
| instability on the occasions when a longer path is necessary. | when a longer path is necessary. | |||
| Recommended derivations of the gain constants alpha and beta can be | Recommended derivations of the gain constants alpha and beta can be | |||
| approximated for Reno over a PI2 AQM as: alpha = 0.1 * Tupdate / | approximated for Reno over a PI2 AQM as: alpha = 0.1 * Tupdate / | |||
| RTT_max^2; beta = 0.3 / RTT_max, as shown in lines 14 & 15 of | RTT_max^2; beta = 0.3 / RTT_max, as shown in lines 13 and 14 of | |||
| Figure 2. These are derived from the stability analysis in [PI2]. | Figure 2. These are derived from the stability analysis in [PI2]. | |||
| For the default values of Tupdate=16 ms and RTT_max = 100 ms, they | For the default values of Tupdate = 16 ms and RTT_max = 100 ms, they | |||
| result in alpha = 0.16; beta = 3.2 (discrepancies are due to | result in alpha = 0.16; beta = 3.2 (discrepancies are due to | |||
| rounding). These defaults have been verified with a wide range of | rounding). These defaults have been verified with a wide range of | |||
| link rates, target delays and a range of traffic models with mixed | link rates, target delays, and traffic models with mixed and similar | |||
| and similar RTTs, short and long flows, etc. | RTTs, short and long flows, etc. | |||
| In corner cases, p' can overflow the range [0,1] so the resulting | In corner cases, p' can overflow the range [0,1] so the resulting | |||
| value of p' has to be bounded (omitted from the pseudocode). Then, | value of p' has to be bounded (omitted from the pseudocode). Then, | |||
| as already explained, the coupled and Classic probabilities are | as already explained, the coupled and Classic probabilities are | |||
| derived from the new p' in lines 4 and 5 of Figure 6 as p_CL = k*p' | derived from the new p' in lines 4 and 5 of Figure 6 as p_CL = k*p' | |||
| and p_C = p'^2. | and p_C = p'^2. | |||
| Because the coupled L4S marking probability (p_CL) is factored up by | Because the coupled L4S marking probability (p_CL) is factored up by | |||
| k, the dynamic gain parameters alpha and beta are also inherently | k, the dynamic gain parameters alpha and beta are also inherently | |||
| factored up by k for the L4S queue. So, the effective gain factor | factored up by k for the L4S queue. So, the effective gain factor | |||
| for the L4S queue is k*alpha (with defaults alpha = 0.16 Hz and k=2, | for the L4S queue is k*alpha (with defaults alpha = 0.16 Hz and k = | |||
| effective L4S alpha = 0.32 Hz). | 2, effective L4S alpha = 0.32 Hz). | |||
| Unlike in PIE [RFC8033], alpha and beta do not need to be tuned every | Unlike in PIE [RFC8033], alpha and beta do not need to be tuned every | |||
| Tupdate dependent on p'. Instead, in PI2, alpha and beta are | Tupdate dependent on p'. Instead, in PI2, alpha and beta are | |||
| independent of p' because the squaring applied to Classic traffic | independent of p' because the squaring applied to Classic traffic | |||
| tunes them inherently. This is explained in [PI2], which also | tunes them inherently. This is explained in [PI2], which also | |||
| explains why this more principled approach removes the need for most | explains why this more principled approach removes the need for most | |||
| of the heuristics that had to be added to PIE. | of the heuristics that had to be added to PIE. | |||
| Nonetheless, an implementer might wish to add selected details to | Nonetheless, an implementer might wish to add selected details to | |||
| either AQM. For instance the Linux reference DualPI2 implementation | either AQM. For instance, the Linux reference DualPI2 implementation | |||
| includes the following (not shown in the pseudocode above): | includes the following (not shown in the pseudocode above): | |||
| * Classic and coupled marking or dropping (i.e. based on p_C and | * Classic and coupled marking or dropping (i.e., based on p_C and | |||
| p_CL from the PI controller) is not applied to a packet if the | p_CL from the PI controller) is not applied to a packet if the | |||
| aggregate queue length in bytes is < 2 MTU (prior to enqueuing the | aggregate queue length in bytes is < 2 MTU (prior to enqueuing the | |||
| packet or dequeuing it, depending on whether the AQM is configured | packet or dequeuing it, depending on whether the AQM is configured | |||
| to be applied at enqueue or dequeue); | to be applied at enqueue or dequeue); and | |||
| * In the WRR scheduler, the 'credit' indicating which queue should | * in the WRR scheduler, the 'credit' indicating which queue should | |||
| transmit is only changed if there are packets in both queues | transmit is only changed if there are packets in both queues | |||
| (i.e. if there is actual resource contention). This means that a | (i.e., if there is actual resource contention). This means that a | |||
| properly paced L flow might never be delayed by the WRR. The WRR | properly paced L flow might never be delayed by the WRR. The WRR | |||
| credit is reset in favour of the L queue when the link is idle. | credit is reset in favour of the L queue when the link is idle. | |||
| An implementer might also wish to add other heuristics, e.g. burst | An implementer might also wish to add other heuristics, e.g., burst | |||
| protection [RFC8033] or enhanced burst protection [RFC8034]. | protection [RFC8033] or enhanced burst protection [RFC8034]. | |||
| Notes: | Notes: | |||
| a. The drain rate of the queue can vary if it is scheduled relative | a. The drain rate of the queue can vary if it is scheduled relative | |||
| to other queues, or to cater for fluctuations in a wireless | to other queues or if it accommodates fluctuations in a wireless | |||
| medium. To auto-adjust to changes in drain rate, the queue needs | medium. To auto-adjust to changes in drain rate, the queue needs | |||
| to be measured in time, not bytes or packets [AQMmetrics], | to be measured in time, not bytes or packets [AQMmetrics] | |||
| [CoDel]. Queuing delay could be measured directly as the sojourn | [CoDel]. Queuing delay could be measured directly as the sojourn | |||
| time (aka. service time) of the queue, by storing a per-packet | time (a.k.a. service time) of the queue by storing a per-packet | |||
| time-stamp as each packet is enqueued, and subtracting this from | timestamp as each packet is enqueued and subtracting it from the | |||
| the system time when the packet is dequeued. If time-stamping is | system time when the packet is dequeued. If timestamping is not | |||
| not easy to introduce with certain hardware, queuing delay could | easy to introduce with certain hardware, queuing delay could be | |||
| be predicted indirectly by dividing the size of the queue by the | predicted indirectly by dividing the size of the queue by the | |||
| predicted departure rate, which might be known precisely for some | predicted departure rate, which might be known precisely for some | |||
| link technologies (see for example in DOCSIS PIE [RFC8034]). | link technologies (see, for example, DOCSIS PIE [RFC8034]). | |||
| However, sojourn time is slow to detect bursts. For instance, if | However, sojourn time is slow to detect bursts. For instance, if | |||
| a burst arrives at an empty queue, the sojourn time only fully | a burst arrives at an empty queue, the sojourn time only fully | |||
| measures the burst's delay when its last packet is dequeued, even | measures the burst's delay when its last packet is dequeued, even | |||
| though the queue has known the size of the burst since its last | though the queue has known the size of the burst since its last | |||
| packet was enqueued - so it could have signalled congestion | packet was enqueued -- so it could have signalled congestion | |||
| earlier. To remedy this, each head packet can be marked when it | earlier. To remedy this, each head packet can be marked when it | |||
| is dequeued based on the expected delay of the tail packet behind | is dequeued based on the expected delay of the tail packet behind | |||
| it, as explained below, rather than based on the head packet's | it, as explained below, rather than based on the head packet's | |||
| own delay due to the packets in front of it. [Heist21] identifies | own delay due to the packets in front of it. "Underutilization | |||
| a specific scenario where bursty traffic significantly hits | with Bursty Traffic" in [Heist21] identifies a specific scenario | |||
| utilization of the L queue. If this effect proves to be more | where bursty traffic significantly hits utilization of the L | |||
| widely applicable, using the delay behind the head could improve | queue. If this effect proves to be more widely applicable, using | |||
| performance. | the delay behind the head could improve performance. | |||
| The delay behind the head can be implemented by dividing the | The delay behind the head can be implemented by dividing the | |||
| backlog at dequeue by the link rate or equivalently multiplying | backlog at dequeue by the link rate or equivalently multiplying | |||
| the backlog by the delay per unit of backlog. The implementation | the backlog by the delay per unit of backlog. The implementation | |||
| details will depend on whether the link rate is known; if it is | details will depend on whether the link rate is known; if it is | |||
| not, a moving average of the delay per unit backlog can be | not, a moving average of the delay per unit backlog can be | |||
| maintained. This delay consists of serialization as well as | maintained. This delay consists of serialization as well as | |||
| media acquisition for shared media. So the details will depend | media acquisition for shared media. So the details will depend | |||
| strongly on the specific link technology, This approach should be | strongly on the specific link technology. This approach should | |||
| less sensitive to timing errors and cost less in operations and | be less sensitive to timing errors and cost less in operations | |||
| memory than the otherwise equivalent 'scaled sojourn time' | and memory than the otherwise equivalent 'scaled sojourn time' | |||
| metric, which is the sojourn time of a packet scaled by the ratio | metric, which is the sojourn time of a packet scaled by the ratio | |||
| of the queue sizes when the packet departed and | of the queue sizes when the packet departed and arrived | |||
| arrived [SigQ-Dyn]. | [SigQ-Dyn]. | |||
| b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an | b. Line 2 of the dualpi2_enqueue() function (Figure 3) assumes an | |||
| implementation where lq and cq share common buffer memory. An | implementation where lq and cq share common buffer memory. An | |||
| alternative implementation could use separate buffers for each | alternative implementation could use separate buffers for each | |||
| queue, in which case the arriving packet would have to be | queue, in which case the arriving packet would have to be | |||
| classified first to determine which buffer to check for available | classified first to determine which buffer to check for available | |||
| space. The choice is a trade-off; a shared buffer can use less | space. The choice is a trade-off; a shared buffer can use less | |||
| memory whereas separate buffers isolate the L4S queue from tail- | memory whereas separate buffers isolate the L4S queue from tail | |||
| drop due to large bursts of Classic traffic (e.g. a Classic Reno | drop due to large bursts of Classic traffic (e.g., a Classic Reno | |||
| TCP during slow-start over a long RTT). | TCP during slow-start over a long RTT). | |||
| c. There has been some concern that using the step function of DCTCP | c. There has been some concern that using the step function of DCTCP | |||
| for the Native L4S AQM requires end-systems to smooth the signal | for the Native L4S AQM requires end systems to smooth the signal | |||
| for an unnecessarily large number of round trips to ensure | for an unnecessarily large number of round trips to ensure | |||
| sufficient fidelity. A ramp is no worse than a step in initial | sufficient fidelity. A ramp is no worse than a step in initial | |||
| experiments with existing DCTCP. Therefore, it is recommended | experiments with existing DCTCP. Therefore, it is recommended | |||
| that a ramp is configured in place of a step, which will allow | that a ramp is configured in place of a step, which will allow | |||
| congestion control algorithms to investigate faster smoothing | congestion control algorithms to investigate faster smoothing | |||
| algorithms. | algorithms. | |||
| A ramp is more general that a step, because an operator can | A ramp is more general than a step, because an operator can | |||
| effectively turn the ramp into a step function, as used by DCTCP, | effectively turn the ramp into a step function, as used by DCTCP, | |||
| by setting the range to zero. There will not be a divide by zero | by setting the range to zero. There will not be a divide by zero | |||
| problem at line 5 of Figure 5 because, if minTh is equal to | problem at line 5 of Figure 5 because, if minTh is equal to | |||
| maxTh, the condition for this ramp calculation cannot arise. | maxTh, the condition for this ramp calculation cannot arise. | |||
| A.2. Pass #2: Edge-Case Details | A.2. Pass #2: Edge-Case Details | |||
| This section takes a second pass through the pseudocode adding | This section takes a second pass through the pseudocode to add | |||
| details of two edge-cases: low link rate and overload. Figure 7 | details of two edge-cases: low link rate and overload. Figure 7 | |||
| repeats the dequeue function of Figure 4, but with details of both | repeats the dequeue function of Figure 4, but with details of both | |||
| edge-cases added. Similarly, Figure 8 repeats the core PI algorithm | edge-cases added. Similarly, Figure 8 repeats the core PI algorithm | |||
| of Figure 6, but with overload details added. The initialization, | of Figure 6, but with overload details added. The initialization, | |||
| enqueue, L4S AQM and recur functions are unchanged. | enqueue, L4S AQM, and recur functions are unchanged. | |||
| The link rate can be so low that it takes a single packet queue | The link rate can be so low that it takes a single packet queue | |||
| longer to serialize than the threshold delay at which ECN marking | longer to serialize than the threshold delay at which ECN marking | |||
| starts to be applied in the L queue. Therefore, a minimum marking | starts to be applied in the L queue. Therefore, a minimum marking | |||
| threshold parameter in units of packets rather than time is necessary | threshold parameter in units of packets rather than time is necessary | |||
| (Th_len, default 1 packet in line 19 of Figure 2) to ensure that the | (Th_len, default 1 packet in line 19 of Figure 2) to ensure that the | |||
| ramp does not trigger excessive marking on slow links. Where an | ramp does not trigger excessive marking on slow links. Where an | |||
| implementation knows the link rate, it can set up this minimum at the | implementation knows the link rate, it can set up this minimum at the | |||
| time it is configured. For instance, it would divide 1 MTU by the | time it is configured. For instance, it would divide 1 MTU by the | |||
| link rate to convert it into a serialization time, then if the lower | link rate to convert it into a serialization time, then if the lower | |||
| threshold of the Native L AQM ramp was lower than this serialization | threshold of the Native L AQM ramp was lower than this serialization | |||
| time, it could increase the thresholds to shift the bottom of the | time, it could increase the thresholds to shift the bottom of the | |||
| ramp to 2 MTU. This is the approach used in DOCSIS [DOCSIS3.1], | ramp to 2 MTU. This is the approach used in DOCSIS [DOCSIS3.1], | |||
| because the configured link rate is dedicated to the DualQ. | because the configured link rate is dedicated to the DualQ. | |||
| The pseudocode given here applies where the link rate is unknown, | The pseudocode given here applies where the link rate is unknown, | |||
| which is more common for software implementations that might be | which is more common for software implementations that might be | |||
| deployed in scenarios where the link is shared with other queues. In | deployed in scenarios where the link is shared with other queues. In | |||
| lines 5a to 5d in Figure 7 the native L4S marking probability, p'_L, | lines 5a to 5d in Figure 7, the native L4S marking probability, p'_L, | |||
| is zeroed if the queue is only 1 packet (in the default | is zeroed if the queue is only 1 packet (in the default | |||
| configuration). | configuration). | |||
| Linux implementation note: | | Linux implementation note: In Linux, the check that the queue | |||
| | exceeds Th_len before marking with the Native L4S AQM is | ||||
| * In Linux, the check that the queue exceeds Th_len before marking | | actually at enqueue, not dequeue; otherwise, it would exempt | |||
| with the native L4S AQM is actually at enqueue, not dequeue, | | the last packet of a burst from being marked. The result of | |||
| otherwise it would exempt the last packet of a burst from being | | the check is conveyed from enqueue to the dequeue function via | |||
| marked. The result of the check is conveyed from enqueue to the | | a boolean in the packet metadata. | |||
| dequeue function via a boolean in the packet metadata. | ||||
| Persistent overload is deemed to have occurred when Classic drop/ | Persistent overload is deemed to have occurred when Classic drop/ | |||
| marking probability reaches p_Cmax. Above this point, the Classic | marking probability reaches p_Cmax. Above this point, the Classic | |||
| drop probability is applied to both L and C queues, irrespective of | drop probability is applied to both the L and C queues, irrespective | |||
| whether any packet is ECN-capable. ECT packets that are not dropped | of whether any packet is ECN-capable. ECT packets that are not | |||
| can still be ECN-marked. | dropped can still be ECN-marked. | |||
| In line 10 of the initialization function (Figure 2), the maximum | In line 11 of the initialization function (Figure 2), the maximum | |||
| Classic drop probability p_Cmax = min(1/k^2, 1) or 1/4 for the | Classic drop probability p_Cmax = min(1/k^2, 1) or 1/4 for the | |||
| default coupling factor k=2. In practice, 25% has been found to be a | default coupling factor k = 2. In practice, 25% has been found to be | |||
| good threshold to preserve fairness between ECN capable and non ECN | a good threshold to preserve fairness between ECN-capable and non- | |||
| capable traffic. This protects the queues against both temporary | ECN-capable traffic. This protects the queues against both temporary | |||
| overload from responsive flows and more persistent overload from any | overload from responsive flows and more persistent overload from any | |||
| unresponsive traffic that falsely claims to be responsive to ECN. | unresponsive traffic that falsely claims to be responsive to ECN. | |||
| When the Classic ECN marking probability reaches the p_Cmax threshold | When the Classic ECN-marking probability reaches the p_Cmax threshold | |||
| (1/k^2), the marking probability coupled to the L4S queue, p_CL will | (1/k^2), the marking probability that is coupled to the L4S queue, | |||
| always be 100% for any k (by equation (1) in Section 2). So, for | p_CL, will always be 100% for any k (by equation (1) in Section 2.1). | |||
| readability, the constant p_Lmax is defined as 1 in line 22 of the | So, for readability, the constant p_Lmax is defined as 1 in line 21 | |||
| initialization function (Figure 2). This is intended to ensure that | of the initialization function (Figure 2). This is intended to | |||
| the L4S queue starts to introduce dropping once ECN-marking saturates | ensure that the L4S queue starts to introduce dropping once ECN | |||
| at 100% and can rise no further. The 'Prague L4S' | marking saturates at 100% and can rise no further. The 'Prague L4S | |||
| requirements [I-D.ietf-tsvwg-ecn-l4s-id] state that, when an L4S | requirements' [RFC9331] state that when an L4S congestion control | |||
| congestion control detects a drop, it falls back to a response that | detects a drop, it falls back to a response that coexists with | |||
| coexists with 'Classic' Reno congestion control. So it is correct | 'Classic' Reno congestion control. So, it is correct that when the | |||
| that, when the L4S queue drops packets, it drops them proportional to | L4S queue drops packets, it drops them proportional to p'^2, as if | |||
| p'^2, as if they are Classic packets. | they are Classic packets. | |||
| The two queues each test for overload in lines 4b and 12b of the | The two queues each test for overload in lines 4b and 12b of the | |||
| dequeue function (Figure 7). Lines 8c to 8g drop L4S packets with | dequeue function (Figure 7). Lines 8c to 8g drop L4S packets with | |||
| probability p'^2. Lines 8h to 8i mark the remaining packets with | probability p'^2. Lines 8h to 8i mark the remaining packets with | |||
| probability p_CL. Given p_Lmax = 1, all remaining packets will be | probability p_CL. Given p_Lmax = 1, all remaining packets will be | |||
| marked because, to have reached the else block at line 8b, p_CL >= 1. | marked because, to have reached the else block at line 8b, p_CL >= 1. | |||
| Line 2a in the core PI algorithm (Figure 8) deals with overload of | Line 2a in the core PI algorithm (Figure 8) deals with overload of | |||
| the L4S queue when there is little or no Classic traffic. This is | the L4S queue when there is little or no Classic traffic. This is | |||
| necessary, because the core PI algorithm maintains the appropriate | necessary, because the core PI algorithm maintains the appropriate | |||
| drop probability to regulate overload, but it depends on the length | drop probability to regulate overload, but it depends on the length | |||
| of the Classic queue. If there is little or no Classic queue the | of the Classic queue. If there is little or no Classic queue, the | |||
| naive PI update function in Figure 6 would drop nothing, even if the | naive PI-update function (Figure 6) would drop nothing, even if the | |||
| L4S queue were overloaded - so tail drop would have to take over | L4S queue were overloaded -- so tail drop would have to take over | |||
| (lines 2 and 3 of Figure 3). | (lines 2 and 3 of Figure 3). | |||
| Instead, line 2a of the full PI update function in Figure 8 ensures | Instead, line 2a of the full PI-update function (Figure 8) ensures | |||
| that the base PI AQM in line 3 is driven by whichever of the two | that the Base PI AQM in line 3 is driven by whichever of the two | |||
| queue delays is greater, but line 3 still always uses the same | queue delays is greater, but line 3 still always uses the same | |||
| Classic target (default 15 ms). If L queue delay is greater just | Classic target (default 15 ms). If L queue delay is greater just | |||
| because there is little or no Classic traffic, normally it will still | because there is little or no Classic traffic, normally it will still | |||
| be well below the base AQM target. This is because L4S traffic is | be well below the Base AQM target. This is because L4S traffic is | |||
| also governed by the shallow threshold of its own native AQM (lines 5 | also governed by the shallow threshold of its own Native AQM (lines | |||
| and 6 of the dequeue algorithm in Figure 7). So the base AQM will be | 5a to 6 of the dequeue algorithm in Figure 7). So the Base AQM will | |||
| driven to zero and not contribute. However, if the L queue is | be driven to zero and not contribute. However, if the L queue is | |||
| overloaded by traffic that is unresponsive to its marking, the max() | overloaded by traffic that is unresponsive to its marking, the max() | |||
| in line 2 enables the L queue to smoothly take over driving the base | in line 2a of Figure 8 enables the L queue to smoothly take over | |||
| AQM into overload mode even if there is little or no Classic traffic. | driving the Base AQM into overload mode even if there is little or no | |||
| Then the base AQM will keep the L queue to the Classic target | Classic traffic. Then the Base AQM will keep the L queue to the | |||
| (default 15 ms) by shedding L packets. | Classic target (default 15 ms) by shedding L packets. | |||
| 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: dualpi2_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
| 2: while ( lq.byt() + cq.byt() > 0 ) { | 2: while ( lq.byt() + cq.byt() > 0 ) { | |||
| 3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
| 4a: lq.dequeue(pkt) % L4S scheduled | 4a: lq.dequeue(pkt) % L4S scheduled | |||
| 4b: if ( p_CL < p_Lmax ) { % Check for overload saturation | 4b: if ( p_CL < p_Lmax ) { % Check for overload saturation | |||
| 5a: if (lq.len()>Th_len) % >1 packet queued | 5a: if (lq.len()>Th_len) % >1 packet queued | |||
| 5b: p'_L = laqm(lq.time()) % Native LAQM | 5b: p'_L = laqm(lq.time()) % Native LAQM | |||
| 5c: else | 5c: else | |||
| 5d: p'_L = 0 % Suppress marking 1 pkt queue | 5d: p'_L = 0 % Suppress marking 1 pkt queue | |||
| skipping to change at page 50, line 4 ¶ | skipping to change at line 2282 ¶ | |||
| Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | Figure 7: Example Dequeue Pseudocode for DualQ Coupled PI2 AQM | |||
| (Including Code for Edge-Cases) | (Including Code for Edge-Cases) | |||
| 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | 1: dualpi2_update(lq, cq) { % Update p' every Tupdate | |||
| 2a: curq = max(cq.time(), lq.time()) % use greatest queuing time | 2a: curq = max(cq.time(), lq.time()) % use greatest queuing time | |||
| 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | 3: p' = p' + alpha * (curq - target) + beta * (curq - prevq) | |||
| 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | 4: p_CL = p' * k % Coupled L4S prob = base prob * coupling factor | |||
| 5: p_C = p'^2 % Classic prob = (base prob)^2 | 5: p_C = p'^2 % Classic prob = (base prob)^2 | |||
| 6: prevq = curq | 6: prevq = curq | |||
| 7: } | 7: } | |||
| Figure 8: Example PI-Update Pseudocode for DualQ Coupled PI2 AQM | ||||
| Figure 8: Example PI-update Pseudocode for DualQ Coupled PI2 AQM | ||||
| (Including Overload Code) | (Including Overload Code) | |||
| The choice of scheduler technology is critical to overload protection | The choice of scheduler technology is critical to overload protection | |||
| (see Section 4.2.2). | (see Section 4.2.2). | |||
| * A well-understood weighted scheduler such as weighted round-robin | * A well-understood weighted scheduler such as WRR is recommended. | |||
| (WRR) is recommended. As long as the scheduler weight for Classic | As long as the scheduler weight for Classic is small (e.g., 1/16), | |||
| is small (e.g. 1/16), its exact value is unimportant because it | its exact value is unimportant, because it does not normally | |||
| does not normally determine capacity shares. The weight is only | determine capacity shares. The weight is only important to | |||
| important to prevent unresponsive L4S traffic starving Classic | prevent unresponsive L4S traffic starving Classic traffic in the | |||
| traffic in the short term (see Section 4.2.2). This is because | short term (see Section 4.2.2). This is because capacity sharing | |||
| capacity sharing between the queues is normally determined by the | between the queues is normally determined by the coupled | |||
| coupled congestion signal, which overrides the scheduler, by | congestion signal, which overrides the scheduler, by making L4S | |||
| making L4S sources leave roughly equal per-flow capacity available | sources leave roughly equal per-flow capacity available for | |||
| for Classic flows. | Classic flows. | |||
| * Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It | * Alternatively, a time-shifted FIFO (TS-FIFO) could be used. It | |||
| works by selecting the head packet that has waited the longest, | works by selecting the head packet that has waited the longest, | |||
| biased against the Classic traffic by a time-shift of tshift. To | biased against the Classic traffic by a time-shift of tshift. To | |||
| implement time-shifted FIFO, the scheduler() function in line 3 of | implement TS-FIFO, the scheduler() function in line 3 of the | |||
| the dequeue code would simply be implemented as the scheduler() | dequeue code would simply be implemented as the scheduler() | |||
| function at the bottom of Figure 10 in Appendix B. For the public | function at the bottom of Figure 10 in Appendix B. For the public | |||
| Internet a good value for tshift is 50ms. For private networks | Internet, a good value for tshift is 50 ms. For private networks | |||
| with smaller diameter, about 4*target would be reasonable. TS- | with smaller diameter, about 4*target would be reasonable. TS- | |||
| FIFO is a very simple scheduler, but complexity might need to be | FIFO is a very simple scheduler, but complexity might need to be | |||
| added to address some deficiencies (which is why it is not | added to address some deficiencies (which is why it is not | |||
| recommended over WRR): | recommended over WRR): | |||
| - TS-FIFO does not fully isolate latency in the L4S queue from | - TS-FIFO does not fully isolate latency in the L4S queue from | |||
| uncontrolled bursts in the Classic queue; | uncontrolled bursts in the Classic queue; | |||
| - Using sojourn time for TS-FIFO is only appropriate if time- | - using sojourn time for TS-FIFO is only appropriate if | |||
| stamping of packets is feasible; | timestamping of packets is feasible; and | |||
| - Even if time-stamping is supported, the sojourn time of the | - even if timestamping is supported, the sojourn time of the head | |||
| head packet is always stale, so a more instantaneous measure of | packet is always stale, so a more instantaneous measure of | |||
| queue delay could be used (see Note a in Appendix A.1). | queue delay could be used (see Note a in Appendix A.1). | |||
| * A strict priority scheduler would be inappropriate as discussed in | * A strict priority scheduler would be inappropriate as discussed in | |||
| Section 4.2.2. | Section 4.2.2. | |||
| Appendix B. Example DualQ Coupled Curvy RED Algorithm | Appendix B. Example DualQ Coupled Curvy RED Algorithm | |||
| As another example of a DualQ Coupled AQM algorithm, the pseudocode | As another example of a DualQ Coupled AQM algorithm, the pseudocode | |||
| below gives the Curvy RED based algorithm. Although the AQM was | below gives the Curvy-RED-based algorithm. Although the AQM was | |||
| designed to be efficient in integer arithmetic, to aid understanding | designed to be efficient in integer arithmetic, to aid understanding | |||
| it is first given using floating point arithmetic (Figure 10). Then, | it is first given using floating point arithmetic (Figure 10). Then, | |||
| one possible optimization for integer arithmetic is given, also in | one possible optimization for integer arithmetic is given, also in | |||
| pseudocode (Figure 11). To aid comparison, the line numbers are kept | pseudocode (Figure 11). To aid comparison, the line numbers are kept | |||
| in step between the two by using letter suffixes where the longer | in step between the two by using letter suffixes where the longer | |||
| code needs extra lines. | code needs extra lines. | |||
| B.1. Curvy RED in Pseudocode | B.1. Curvy RED in Pseudocode | |||
| The pseudocode manipulates three main structures of variables: the | The pseudocode manipulates three main structures of variables: the | |||
| packet (pkt), the L4S queue (lq) and the Classic queue (cq) and | packet (pkt), the L4S queue (lq), and the Classic queue (cq). It is | |||
| consists of the following five functions: | defined and described below in the following three functions: | |||
| * The initialization function cred_params_init(...) (Figure 2) that | * the initialization function cred_params_init(...) (Figure 2) that | |||
| sets parameter defaults (the API for setting non-default values is | sets parameter defaults (the API for setting non-default values is | |||
| omitted for brevity); | omitted for brevity); | |||
| * The dequeue function cred_dequeue(lq, cq, pkt) (Figure 4); | * the dequeue function cred_dequeue(lq, cq, pkt) (Figure 4); and | |||
| * The scheduling function scheduler(), which selects between the | * the scheduling function scheduler(), which selects between the | |||
| head packets of the two queues. | head packets of the two queues. | |||
| It also uses the following functions that are either shown elsewhere, | It also uses the following functions that are either shown elsewhere | |||
| or not shown in full here: | or not shown in full here: | |||
| * The enqueue function, which is identical to that used for DualPI2, | * the enqueue function, which is identical to that used for DualPI2, | |||
| dualpi2_enqueue(lq, cq, pkt) in Figure 3; | dualpi2_enqueue(lq, cq, pkt) in Figure 3; | |||
| * mark(pkt) and drop(pkt) for ECN-marking and dropping a packet; | * mark(pkt) and drop(pkt) for ECN marking and dropping a packet; | |||
| * cq.byt() or lq.byt() returns the current length (aka. backlog) of | * cq.byt() or lq.byt() returns the current length (a.k.a. backlog) | |||
| the relevant queue in bytes; | of the relevant queue in bytes; and | |||
| * cq.time() or lq.time() returns the current queuing delay of the | * cq.time() or lq.time() returns the current queuing delay of the | |||
| relevant queue in units of time (see Note a in Appendix A.1). | relevant queue in units of time (see Note a in Appendix A.1). | |||
| Because Curvy RED was evaluated before DualPI2, certain improvements | Because Curvy RED was evaluated before DualPI2, certain improvements | |||
| introduced for DualPI2 were not evaluated for Curvy RED. In the | introduced for DualPI2 were not evaluated for Curvy RED. In the | |||
| pseudocode below, the straightforward improvements have been added on | pseudocode below, the straightforward improvements have been added on | |||
| the assumption they will provide similar benefits, but that has not | the assumption they will provide similar benefits, but that has not | |||
| been proven experimentally. They are: i) a conditional priority | been proven experimentally. They are: i) a conditional priority | |||
| scheduler instead of strict priority ii) a time-based threshold for | scheduler instead of strict priority; ii) a time-based threshold for | |||
| the native L4S AQM; iii) ECN support for the Classic AQM. A recent | the Native L4S AQM; and iii) ECN support for the Classic AQM. A | |||
| evaluation has proved that a minimum ECN-marking threshold (minTh) | recent evaluation has proved that a minimum ECN-marking threshold | |||
| greatly improves performance, so this is also included in the | (minTh) greatly improves performance, so this is also included in the | |||
| pseudocode. | pseudocode. | |||
| Overload protection has not been added to the Curvy RED pseudocode | Overload protection has not been added to the Curvy RED pseudocode | |||
| below so as not to detract from the main features. It would be added | below so as not to detract from the main features. It would be added | |||
| in exactly the same way as in Appendix A.2 for the DualPI2 | in exactly the same way as in Appendix A.2 for the DualPI2 | |||
| pseudocode. The native L4S AQM uses a step threshold, but a ramp | pseudocode. The Native L4S AQM uses a step threshold, but a ramp | |||
| like that described for DualPI2 could be used instead. The scheduler | like that described for DualPI2 could be used instead. The scheduler | |||
| uses the simple TS-FIFO algorithm, but it could be replaced with WRR. | uses the simple TS-FIFO algorithm, but it could be replaced with WRR. | |||
| The Curvy RED algorithm has not been maintained or evaluated to the | The Curvy RED algorithm has not been maintained or evaluated to the | |||
| same degree as the DualPI2 algorithm. In initial experiments on | same degree as the DualPI2 algorithm. In initial experiments on | |||
| broadband access links ranging from 4 Mb/s to 200 Mb/s with base RTTs | broadband access links ranging from 4 Mb/s to 200 Mb/s with base RTTs | |||
| from 5 ms to 100 ms, Curvy RED achieved good results with the default | from 5 ms to 100 ms, Curvy RED achieved good results with the default | |||
| parameters in Figure 9. | parameters in Figure 9. | |||
| The parameters are categorised by whether they relate to the Classic | The parameters are categorized by whether they relate to the Classic | |||
| AQM, the L4S AQM or the framework coupling them together. Constants | AQM, the L4S AQM, or the framework coupling them together. Constants | |||
| and variables derived from these parameters are also included at the | and variables derived from these parameters are also included at the | |||
| end of each category. These are the raw input parameters for the | end of each category. These are the raw input parameters for the | |||
| algorithm. A configuration front-end could accept more meaningful | algorithm. A configuration front-end could accept more meaningful | |||
| parameters (e.g. RTT_max and RTT_typ) and convert them into these raw | parameters (e.g., RTT_max and RTT_typ) and convert them into these | |||
| parameters, as has been done for DualPI2 in Appendix A. Where | raw parameters, as has been done for DualPI2 in Appendix A. Where | |||
| necessary, parameters are explained further in the walk-through of | necessary, parameters are explained further in the walk-through of | |||
| the pseudocode below. | the pseudocode below. | |||
| 1: cred_params_init(...) { % Set input parameter defaults | 1: cred_params_init(...) { % Set input parameter defaults | |||
| 2: % DualQ Coupled framework parameters | 2: % DualQ Coupled framework parameters | |||
| 3: limit = MAX_LINK_RATE * 250 ms % Dual buffer size | 3: limit = MAX_LINK_RATE * 250 ms % Dual buffer size | |||
| 4: k' = 1 % Coupling factor as a power of 2 | 4: k' = 1 % Coupling factor as a power of 2 | |||
| 5: tshift = 50 ms % Time shift of TS-FIFO scheduler | 5: tshift = 50 ms % Time-shift of TS-FIFO scheduler | |||
| 6: % Constants derived from Classic AQM parameters | 6: % Constants derived from Classic AQM parameters | |||
| 7: k = 2^k' % Coupling factor from Equation (1) | 7: k = 2^k' % Coupling factor from equation (1) | |||
| 6: | 6: | |||
| 7: % Classic AQM parameters | 7: % Classic AQM parameters | |||
| 8: g_C = 5 % EWMA smoothing parameter as a power of 1/2 | 8: g_C = 5 % EWMA smoothing parameter as a power of 1/2 | |||
| 9: S_C = -1 % Classic ramp scaling factor as a power of 2 | 9: S_C = -1 % Classic ramp scaling factor as a power of 2 | |||
| 10: minTh = 500 ms % No Classic drop/mark below this queue delay | 10: minTh = 500 ms % No Classic drop/mark below this queue delay | |||
| 11: % Constants derived from Classic AQM parameters | 11: % Constants derived from Classic AQM parameters | |||
| 12: gamma = 2^(-g_C) % EWMA smoothing parameter | 12: gamma = 2^(-g_C) % EWMA smoothing parameter | |||
| 13: range_C = 2^S_C % Range of Classic ramp | 13: range_C = 2^S_C % Range of Classic ramp | |||
| 14: | 14: | |||
| 15: % L4S AQM parameters | 15: % L4S AQM parameters | |||
| 16: T = 1 ms % Queue delay threshold for native L4S AQM | 16: T = 1 ms % Queue delay threshold for Native L4S AQM | |||
| 17: % Constants derived from above parameters | 17: % Constants derived from above parameters | |||
| 18: S_L = S_C - k' % L4S ramp scaling factor as a power of 2 | 18: S_L = S_C - k' % L4S ramp scaling factor as a power of 2 | |||
| 19: range_L = 2^S_L % Range of L4S ramp | 19: range_L = 2^S_L % Range of L4S ramp | |||
| 20: } | 20: } | |||
| Figure 9: Example Header Pseudocode for DualQ Coupled Curvy RED AQM | Figure 9: Example Header Pseudocode for DualQ Coupled Curvy RED AQM | |||
| 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
| 2: while ( lq.byt() + cq.byt() > 0 ) { | 2: while ( lq.byt() + cq.byt() > 0 ) { | |||
| 3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
| skipping to change at page 54, line 49 ¶ | skipping to change at line 2468 ¶ | |||
| 30: return lq; | 30: return lq; | |||
| 31: else | 31: else | |||
| 32: return cq; | 32: return cq; | |||
| 33: } | 33: } | |||
| Figure 10: Example Dequeue Pseudocode for DualQ Coupled Curvy RED AQM | Figure 10: Example Dequeue Pseudocode for DualQ Coupled Curvy RED AQM | |||
| The dequeue pseudocode (Figure 10) is repeatedly called whenever the | The dequeue pseudocode (Figure 10) is repeatedly called whenever the | |||
| lower layer is ready to forward a packet. It schedules one packet | lower layer is ready to forward a packet. It schedules one packet | |||
| for dequeuing (or zero if the queue is empty) then returns control to | for dequeuing (or zero if the queue is empty) then returns control to | |||
| the caller, so that it does not block while that packet is being | the caller so that it does not block while that packet is being | |||
| forwarded. While making this dequeue decision, it also makes the | forwarded. While making this dequeue decision, it also makes the | |||
| necessary AQM decisions on dropping or marking. The alternative of | necessary AQM decisions on dropping or marking. The alternative of | |||
| applying the AQMs at enqueue would shift some processing from the | applying the AQMs at enqueue would shift some processing from the | |||
| critical time when each packet is dequeued. However, it would also | critical time when each packet is dequeued. However, it would also | |||
| add a whole queue of delay to the control signals, making the control | add a whole queue of delay to the control signals, making the control | |||
| loop very sloppy. | loop very sloppy. | |||
| The code is written assuming the AQMs are applied on dequeue (Note | The code is written assuming the AQMs are applied on dequeue (Note | |||
| 1). All the dequeue code is contained within a large while loop so | 1). All the dequeue code is contained within a large while loop so | |||
| that if it decides to drop a packet, it will continue until it | that if it decides to drop a packet, it will continue until it | |||
| selects a packet to schedule. If both queues are empty, the routine | selects a packet to schedule. If both queues are empty, the routine | |||
| returns NULL at line 20. Line 3 of the dequeue pseudocode is where | returns NULL at line 20. Line 3 of the dequeue pseudocode is where | |||
| the conditional priority scheduler chooses between the L4S queue (lq) | the conditional priority scheduler chooses between the L4S queue (lq) | |||
| and the Classic queue (cq). The time-shifted FIFO scheduler is shown | and the Classic queue (cq). The TS-FIFO scheduler is shown at lines | |||
| at lines 28-33, which would be suitable if simplicity is paramount | 28-33, which would be suitable if simplicity is paramount (see Note | |||
| (see Note 2). | 2). | |||
| Within each queue, the decision whether to forward, drop or mark is | Within each queue, the decision whether to forward, drop, or mark is | |||
| taken as follows (to simplify the explanation, it is assumed that | taken as follows (to simplify the explanation, it is assumed that U = | |||
| U=1): | 1): | |||
| L4S: If the test at line 3 determines there is an L4S packet to | L4S: | |||
| If the test at line 3 determines there is an L4S packet to | ||||
| dequeue, the tests at lines 5b and 5c determine whether to mark | dequeue, the tests at lines 5b and 5c determine whether to mark | |||
| it. The first is a simple test of whether the L4S queue delay | it. The first is a simple test of whether the L4S queue delay | |||
| (lq.time()) is greater than a step threshold T (Note 3). The | (lq.time()) is greater than a step threshold T (Note 3). The | |||
| second test is similar to the random ECN marking in RED, but with | second test is similar to the random ECN marking in RED but with | |||
| the following differences: i) marking depends on queuing time, not | the following differences: i) marking depends on queuing time, not | |||
| bytes, in order to scale for any link rate without being | bytes, in order to scale for any link rate without being | |||
| reconfigured; ii) marking of the L4S queue depends on a logical OR | reconfigured; ii) marking of the L4S queue depends on a logical OR | |||
| of two tests; one against its own queuing time and one against the | of two tests: one against its own queuing time and one against the | |||
| queuing time of the _other_ (Classic) queue; iii) the tests are | queuing time of the _other_ (Classic) queue; iii) the tests are | |||
| against the instantaneous queuing time of the L4S queue, but a | against the instantaneous queuing time of the L4S queue but | |||
| smoothed average of the other (Classic) queue; iv) the queue is | against a smoothed average of the other (Classic) queue; and iv) | |||
| compared with the maximum of U random numbers (but if U=1, this is | the queue is compared with the maximum of U random numbers (but if | |||
| the same as the single random number used in RED). | U = 1, this is the same as the single random number used in RED). | |||
| Specifically, in line 5a the coupled marking probability p_CL is | Specifically, in line 5a, the coupled marking probability p_CL is | |||
| set to the amount by which the averaged Classic queueing delay Q_C | set to the amount by which the averaged Classic queuing delay Q_C | |||
| exceeds the minimum queuing delay threshold (minTh) all divided by | exceeds the minimum queuing delay threshold (minTh), all divided | |||
| the L4S scaling parameter range_L. range_L represents the queuing | by the L4S scaling parameter range_L. range_L represents the | |||
| delay (in seconds) added to minTh at which marking probability | queuing delay (in seconds) added to minTh at which marking | |||
| would hit 100%. Then in line 5c (if U=1) the result is compared | probability would hit 100%. Then, in line 5c (if U = 1), the | |||
| with a uniformly distributed random number between 0 and 1, which | result is compared with a uniformly distributed random number | |||
| ensures that, over range_L, marking probability will linearly | between 0 and 1, which ensures that, over range_L, marking | |||
| increase with queueing time. | probability will linearly increase with queuing time. | |||
| Classic: If the scheduler at line 3 chooses to dequeue a Classic | Classic: | |||
| packet and jumps to line 7, the test at line 10b determines | If the scheduler at line 3 chooses to dequeue a Classic packet and | |||
| whether to drop or mark it. But before that, line 9a updates Q_C, | jumps to line 7, the test at line 10b determines whether to drop | |||
| which is an exponentially weighted moving average (Note 4) of the | or mark it. But before that, line 9a updates Q_C, which is an | |||
| queuing time of the Classic queue, where cq.time() is the current | exponentially weighted moving average (Note 4) of the queuing time | |||
| instantaneous queueing time of the packet at the head of the | of the Classic queue, where cq.time() is the current instantaneous | |||
| Classic queue (zero if empty) and gamma is the EWMA constant | queuing time of the packet at the head of the Classic queue (zero | |||
| (default 1/32, see line 12 of the initialization function). | if empty), and gamma is the exponentially weighted moving average | |||
| (EWMA) constant (default 1/32; see line 12 of the initialization | ||||
| function). | ||||
| Lines 10a and 10b implement the Classic AQM. In line 10a the | Lines 10a and 10b implement the Classic AQM. In line 10a, the | |||
| averaged queuing time Q_C is divided by the Classic scaling | averaged queuing time Q_C is divided by the Classic scaling | |||
| parameter range_C, in the same way that queuing time was scaled | parameter range_C, in the same way that queuing time was scaled | |||
| for L4S marking. This scaled queuing time will be squared to | for L4S marking. This scaled queuing time will be squared to | |||
| compute Classic drop probability so, before it is squared, it is | compute Classic drop probability. So, before it is squared, it is | |||
| effectively the square root of the drop probability, hence it is | effectively the square root of the drop probability; hence, it is | |||
| given the variable name sqrt_p_C. The squaring is done by | given the variable name sqrt_p_C. The squaring is done by | |||
| comparing it with the maximum out of two random numbers (assuming | comparing it with the maximum out of two random numbers (assuming | |||
| U=1). Comparing it with the maximum out of two is the same as the | U = 1). Comparing it with the maximum out of two is the same as | |||
| logical `AND' of two tests, which ensures drop probability rises | the logical 'AND' of two tests, which ensures drop probability | |||
| with the square of queuing time. | rises with the square of queuing time. | |||
| The AQM functions in each queue (lines 5c & 10b) are two cases of a | The AQM functions in each queue (lines 5c and 10b) are two cases of a | |||
| new generalization of RED called Curvy RED, motivated as follows. | new generalization of RED called 'Curvy RED', motivated as follows. | |||
| When the performance of this AQM was compared with FQ-CoDel and PIE, | When the performance of this AQM was compared with FQ-CoDel and PIE, | |||
| their goal of holding queuing delay to a fixed target seemed | their goal of holding queuing delay to a fixed target seemed | |||
| misguided [CRED_Insights]. As the number of flows increases, if the | misguided [CRED_Insights]. As the number of flows increases, if the | |||
| AQM does not allow host congestion controllers to increase queuing | AQM does not allow host congestion controllers to increase queuing | |||
| delay, it has to introduce abnormally high levels of loss. Then loss | delay, it has to introduce abnormally high levels of loss. Then loss | |||
| rather than queuing becomes the dominant cause of delay for short | rather than queuing becomes the dominant cause of delay for short | |||
| flows, due to timeouts and tail losses. | flows, due to timeouts and tail losses. | |||
| Curvy RED constrains delay with a softened target that allows some | Curvy RED constrains delay with a softened target that allows some | |||
| increase in delay as load increases. This is achieved by increasing | increase in delay as load increases. This is achieved by increasing | |||
| drop probability on a convex curve relative to queue growth (the | drop probability on a convex curve relative to queue growth (the | |||
| square curve in the Classic queue, if U=1). Like RED, the curve hugs | square curve in the Classic queue, if U = 1). Like RED, the curve | |||
| the zero axis while the queue is shallow. Then, as load increases, | hugs the zero axis while the queue is shallow. Then, as load | |||
| it introduces a growing barrier to higher delay. But, unlike RED, it | increases, it introduces a growing barrier to higher delay. But, | |||
| requires only two parameters, not three. The disadvantage of Curvy | unlike RED, it requires only two parameters, not three. The | |||
| RED (compared to a PI controller for example) is that it is not | disadvantage of Curvy RED (compared to a PI controller, for example) | |||
| adapted to a wide range of RTTs. Curvy RED can be used as is when | is that it is not adapted to a wide range of RTTs. Curvy RED can be | |||
| the RTT range to be supported is limited, otherwise an adaptation | used as is when the RTT range to be supported is limited; otherwise, | |||
| mechanism is needed. | an adaptation mechanism is needed. | |||
| From our limited experiments with Curvy RED so far, recommended | From our limited experiments with Curvy RED so far, recommended | |||
| values of these parameters are: S_C = -1; g_C = 5; T = 5 * MTU at the | values of these parameters are: S_C = -1; g_C = 5; T = 5 * MTU at the | |||
| link rate (about 1ms at 60Mb/s) for the range of base RTTs typical on | link rate (about 1 ms at 60 Mb/s) for the range of base RTTs typical | |||
| the public Internet. [CRED_Insights] explains why these parameters | on the public Internet. [CRED_Insights] explains why these | |||
| are applicable whatever rate link this AQM implementation is deployed | parameters are applicable whatever rate link this AQM implementation | |||
| on and how the parameters would need to be adjusted for a scenario | is deployed on and how the parameters would need to be adjusted for a | |||
| with a different range of RTTs (e.g. a data centre). The setting of | scenario with a different range of RTTs (e.g., a data centre). The | |||
| k depends on policy (see Section 2.5 and Appendix C.2 respectively | setting of k depends on policy (see Section 2.5 and Appendix C.2, | |||
| for its recommended setting and guidance on alternatives). | respectively, for its recommended setting and guidance on | |||
| alternatives). | ||||
| There is also a cUrviness parameter, U, which is a small positive | There is also a cUrviness parameter, U, which is a small positive | |||
| integer. It is likely to take the same hard-coded value for all | integer. It is likely to take the same hard-coded value for all | |||
| implementations, once experiments have determined a good value. Only | implementations, once experiments have determined a good value. Only | |||
| U=1 has been used in experiments so far, but results might be even | U = 1 has been used in experiments so far, but results might be even | |||
| better with U=2 or higher. | better with U = 2 or higher. | |||
| Notes: | Notes: | |||
| 1. The alternative of applying the AQMs at enqueue would shift some | 1. The alternative of applying the AQMs at enqueue would shift some | |||
| processing from the critical time when each packet is dequeued. | processing from the critical time when each packet is dequeued. | |||
| However, it would also add a whole queue of delay to the control | However, it would also add a whole queue of delay to the control | |||
| signals, making the control loop sloppier (for a typical RTT it | signals, making the control loop sloppier (for a typical RTT, it | |||
| would double the Classic queue's feedback delay). On a platform | would double the Classic queue's feedback delay). On a platform | |||
| where packet timestamping is feasible, e.g. Linux, it is also | where packet timestamping is feasible, e.g., Linux, it is also | |||
| easiest to apply the AQMs at dequeue because that is where | easiest to apply the AQMs at dequeue, because that is where | |||
| queuing time is also measured. | queuing time is also measured. | |||
| 2. WRR better isolates the L4S queue from large delay bursts in the | 2. WRR better isolates the L4S queue from large delay bursts in the | |||
| Classic queue, but it is slightly less simple than TS-FIFO. If | Classic queue, but it is slightly less simple than TS-FIFO. If | |||
| WRR were used, a low default Classic weight (e.g. 1/16) would | WRR were used, a low default Classic weight (e.g., 1/16) would | |||
| need to be configured in place of the time shift in line 5 of the | need to be configured in place of the time-shift in line 5 of the | |||
| initialization function (Figure 9). | initialization function (Figure 9). | |||
| 3. A step function is shown for simplicity. A ramp function (see | 3. A step function is shown for simplicity. A ramp function (see | |||
| Figure 5 and the discussion around it in Appendix A.1) is | Figure 5 and the discussion around it in Appendix A.1) is | |||
| recommended, because it is more general than a step and has the | recommended, because it is more general than a step and has the | |||
| potential to enable L4S congestion controls to converge more | potential to enable L4S congestion controls to converge more | |||
| rapidly. | rapidly. | |||
| 4. An EWMA is only one possible way to filter bursts; other more | 4. An EWMA is only one possible way to filter bursts; other more | |||
| adaptive smoothing methods could be valid and it might be | adaptive smoothing methods could be valid, and it might be | |||
| appropriate to decrease the EWMA faster than it increases, | appropriate to decrease the EWMA faster than it increases, e.g., | |||
| e.g. by using the minimum of the smoothed and instantaneous queue | by using the minimum of the smoothed and instantaneous queue | |||
| delays, min(Q_C, qc.time()). | delays, min(Q_C, qc.time()). | |||
| B.2. Efficient Implementation of Curvy RED | B.2. Efficient Implementation of Curvy RED | |||
| Although code optimization depends on the platform, the following | Although code optimization depends on the platform, the following | |||
| notes explain where the design of Curvy RED was particularly | notes explain where the design of Curvy RED was particularly | |||
| motivated by efficient implementation. | motivated by efficient implementation. | |||
| The Classic AQM at line 10b calls maxrand(2*U), which gives twice as | The Classic AQM at line 10b in Figure 10 calls maxrand(2*U), which | |||
| much curviness as the call to maxrand(U) in the marking function at | gives twice as much curviness as the call to maxrand(U) in the | |||
| line 5c. This is the trick that implements the square rule in | marking function at line 5c. This is the trick that implements the | |||
| equation (1) (Section 2.1). This is based on the fact that, given a | square rule in equation (1) (Section 2.1). This is based on the fact | |||
| number X from 1 to 6, the probability that two dice throws will both | that, given a number X from 1 to 6, the probability that two dice | |||
| be less than X is the square of the probability that one throw will | throws will both be less than X is the square of the probability that | |||
| be less than X. So, when U=1, the L4S marking function is linear and | one throw will be less than X. So, when U = 1, the L4S marking | |||
| the Classic dropping function is squared. If U=2, L4S would be a | function is linear and the Classic dropping function is squared. If | |||
| square function and Classic would be quartic. And so on. | U = 2, L4S would be a square function and Classic would be quartic. | |||
| And so on. | ||||
| The maxrand(u) function in lines 16-21 simply generates u random | The maxrand(u) function in lines 22-27 simply generates u random | |||
| numbers and returns the maximum. Typically, maxrand(u) could be run | numbers and returns the maximum. Typically, maxrand(u) could be run | |||
| in parallel out of band. For instance, if U=1, the Classic queue | in parallel out of band. For instance, if U = 1, the Classic queue | |||
| would require the maximum of two random numbers. So, instead of | would require the maximum of two random numbers. So, instead of | |||
| calling maxrand(2*U) in-band, the maximum of every pair of values | calling maxrand(2*U) in-band, the maximum of every pair of values | |||
| from a pseudorandom number generator could be generated out-of-band, | from a pseudorandom number generator could be generated out of band | |||
| and held in a buffer ready for the Classic queue to consume. | and held in a buffer ready for the Classic queue to consume. | |||
| 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | 1: cred_dequeue(lq, cq, pkt) { % Couples L4S & Classic queues | |||
| 2: while ( lq.byt() + cq.byt() > 0 ) { | 2: while ( lq.byt() + cq.byt() > 0 ) { | |||
| 3: if ( scheduler() == lq ) { | 3: if ( scheduler() == lq ) { | |||
| 4: lq.dequeue(pkt) % L4S scheduled | 4: lq.dequeue(pkt) % L4S scheduled | |||
| 5: if ((lq.time() > T) OR (Q_C >> (S_L-2) > maxrand(U))) | 5: if ((lq.time() > T) OR (Q_C >> (S_L-2) > maxrand(U))) | |||
| 6: mark(pkt) | 6: mark(pkt) | |||
| 7: } else { | 7: } else { | |||
| 8: cq.dequeue(pkt) % Classic scheduled | 8: cq.dequeue(pkt) % Classic scheduled | |||
| skipping to change at page 58, line 48 ¶ | skipping to change at line 2657 ¶ | |||
| 16: } | 16: } | |||
| 17: } | 17: } | |||
| 18: return(pkt) % return the packet and stop here | 18: return(pkt) % return the packet and stop here | |||
| 19: } | 19: } | |||
| 20: return(NULL) % no packet to dequeue | 20: return(NULL) % no packet to dequeue | |||
| 21: } | 21: } | |||
| Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled | Figure 11: Optimised Example Dequeue Pseudocode for DualQ Coupled | |||
| AQM using Integer Arithmetic | AQM using Integer Arithmetic | |||
| The two ranges, range_L and range_C are expressed as powers of 2 so | The two ranges, range_L and range_C, are expressed as powers of 2 so | |||
| that division can be implemented as a right bit-shift (>>) in lines 5 | that division can be implemented as a right bit-shift (>>) in lines 5 | |||
| and 10 of the integer variant of the pseudocode (Figure 11). | and 10 of the integer variant of the pseudocode (Figure 11). | |||
| For the integer variant of the pseudocode, an integer version of the | For the integer variant of the pseudocode, an integer version of the | |||
| rand() function used at line 25 of the maxrand(function) in Figure 10 | rand() function used at line 25 of the maxrand() function in | |||
| would be arranged to return an integer in the range 0 <= maxrand() < | Figure 10 would be arranged to return an integer in the range 0 <= | |||
| 2^32 (not shown). This would scale up all the floating point | maxrand() < 2^32 (not shown). This would scale up all the floating | |||
| probabilities in the range [0,1] by 2^32. | point probabilities in the range [0,1] by 2^32. | |||
| Queuing delays are also scaled up by 2^32, but in two stages: i) In | Queuing delays are also scaled up by 2^32, but in two stages: i) in | |||
| line 9 queuing time qc.ns() is returned in integer nanoseconds, | line 9, queuing time qc.ns() is returned in integer nanoseconds, | |||
| making the value about 2^30 times larger than when the units were | making the value about 2^30 times larger than when the units were | |||
| seconds, ii) then in lines 5 and 10 an adjustment of -2 to the right | seconds, and then ii) in lines 5 and 10, an adjustment of -2 to the | |||
| bit-shift multiplies the result by 2^2, to complete the scaling by | right bit-shift multiplies the result by 2^2, to complete the scaling | |||
| 2^32. | by 2^32. | |||
| In line 8 of the initialization function, the EWMA constant gamma is | In line 8 of the initialization function, the EWMA constant gamma is | |||
| represented as an integer power of 2, g_C, so that in line 9 of the | represented as an integer power of 2, g_C, so that in line 9 of the | |||
| integer code the division needed to weight the moving average can be | integer code (Figure 11), the division needed to weight the moving | |||
| implemented by a right bit-shift (>> g_C). | average can be implemented by a right bit-shift (>> g_C). | |||
| Appendix C. Choice of Coupling Factor, k | Appendix C. Choice of Coupling Factor, k | |||
| C.1. RTT-Dependence | C.1. RTT-Dependence | |||
| Where Classic flows compete for the same capacity, their relative | Where Classic flows compete for the same capacity, their relative | |||
| flow rates depend not only on the congestion probability, but also on | flow rates depend not only on the congestion probability but also on | |||
| their end-to-end RTT (= base RTT + queue delay). The rates of | their end-to-end RTT (= base RTT + queue delay). The rates of Reno | |||
| Reno [RFC5681] flows competing over an AQM are roughly inversely | [RFC5681] flows competing over an AQM are roughly inversely | |||
| proportional to their RTTs. Cubic exhibits similar RTT-dependence | proportional to their RTTs. CUBIC exhibits similar RTT-dependence | |||
| when in Reno-compatibility mode, but it is less RTT-dependent | when in Reno-friendly mode, but it is less RTT-dependent otherwise. | |||
| otherwise. | ||||
| Until the early experiments with the DualQ Coupled AQM, the | Until the early experiments with the DualQ Coupled AQM, the | |||
| importance of the reasonably large Classic queue in mitigating RTT- | importance of the reasonably large Classic queue in mitigating RTT- | |||
| dependence when the base RTT is low had not been appreciated. | dependence when the base RTT is low had not been appreciated. | |||
| Appendix A.1.6 of the L4S ECN protocol [I-D.ietf-tsvwg-ecn-l4s-id] | Appendix A.1.6 of the L4S ECN Protocol [RFC9331] uses numerical | |||
| uses numerical examples to explain why bloated buffers had concealed | examples to explain why bloated buffers had concealed the RTT- | |||
| the RTT-dependence of Classic congestion controls before that time. | dependence of Classic congestion controls before that time. Then, it | |||
| Then it explains why, the more that queuing delays have reduced, the | explains why, the more that queuing delays have reduced, the more | |||
| more that RTT-dependence has surfaced as a potential starvation | that RTT-dependence has surfaced as a potential starvation problem | |||
| problem for long RTT flows, when competing against very short RTT | for long RTT flows, when competing against very short RTT flows. | |||
| flows. | ||||
| Given that congestion control on end-systems is voluntary, there is | Given that congestion control on end systems is voluntary, there is | |||
| no reason why it has to be voluntarily RTT-dependent. The RTT- | no reason why it has to be voluntarily RTT-dependent. The RTT- | |||
| dependence of existing Classic traffic cannot be 'undeployed'. | dependence of existing Classic traffic cannot be 'undeployed'. | |||
| Therefore, [I-D.ietf-tsvwg-ecn-l4s-id] requires L4S congestion | Therefore, [RFC9331] requires L4S congestion controls to be | |||
| controls to be significantly less RTT-dependent than the standard | significantly less RTT-dependent than the standard Reno congestion | |||
| Reno congestion control [RFC5681], at least at low RTT. Then RTT- | control [RFC5681], at least at low RTT. Then RTT-dependence ought to | |||
| dependence ought to be no worse than it is with appropriately sized | be no worse than it is with appropriately sized Classic buffers. | |||
| Classic buffers. Following this approach means there is no need for | Following this approach means there is no need for network devices to | |||
| network devices to address RTT-dependence, although there would be no | address RTT-dependence, although there would be no harm if they did, | |||
| harm if they did, which per-flow queuing inherently does. | which per-flow queuing inherently does. | |||
| C.2. Guidance on Controlling Throughput Equivalence | C.2. Guidance on Controlling Throughput Equivalence | |||
| The coupling factor, k, determines the balance between L4S and | The coupling factor, k, determines the balance between L4S and | |||
| Classic flow rates (see Section 2.5.2.1 and equation (1)). | Classic flow rates (see Section 2.5.2.1 and equation (1) in | |||
| Section 2.1). | ||||
| For the public Internet, a coupling factor of k=2 is recommended, and | For the public Internet, a coupling factor of k = 2 is recommended | |||
| justified below. For scenarios other than the public Internet, a | and justified below. For scenarios other than the public Internet, a | |||
| good coupling factor can be derived by plugging the appropriate | good coupling factor can be derived by plugging the appropriate | |||
| numbers into the same working. | numbers into the same working. | |||
| To summarize the maths below, from equation (7) it can be seen that | To summarize the maths below, from equation (7) it can be seen that | |||
| choosing k=1.64 would theoretically make L4S throughput roughly the | choosing k = 1.64 would theoretically make L4S throughput roughly the | |||
| same as Classic, _if their actual end-to-end RTTs were the same_. | same as Classic, _if their actual end-to-end RTTs were the same_. | |||
| However, even if the base RTTs are the same, the actual RTTs are | However, even if the base RTTs are the same, the actual RTTs are | |||
| unlikely to be the same, because Classic traffic needs a fairly large | unlikely to be the same, because Classic traffic needs a fairly large | |||
| queue to avoid under-utilization and excess drop. Whereas L4S does | queue to avoid underutilization and excess drop, whereas L4S does | |||
| not. | not. | |||
| Therefore, to determine the appropriate coupling factor policy, the | Therefore, to determine the appropriate coupling factor policy, the | |||
| operator needs to decide at what base RTT it wants L4S and Classic | operator needs to decide at what base RTT it wants L4S and Classic | |||
| flows to have roughly equal throughput, once the effect of the | flows to have roughly equal throughput, once the effect of the | |||
| additional Classic queue on Classic throughput has been taken into | additional Classic queue on Classic throughput has been taken into | |||
| account. With this approach, a network operator can determine a good | account. With this approach, a network operator can determine a good | |||
| coupling factor without knowing the precise L4S algorithm for | coupling factor without knowing the precise L4S algorithm for | |||
| reducing RTT-dependence - or even in the absence of any algorithm. | reducing RTT-dependence -- or even in the absence of any algorithm. | |||
| The following additional terminology will be used, with appropriate | The following additional terminology will be used, with appropriate | |||
| subscripts: | subscripts: | |||
| r: Packet rate [pkt/s] | r: Packet rate [pkt/s] | |||
| R: RTT [s/round] | R: RTT [s/round] | |||
| p: ECN marking probability [] | p: ECN-marking probability [] | |||
| On the Classic side, we consider Reno as the most sensitive and | On the Classic side, we consider Reno as the most sensitive and | |||
| therefore worst-case Classic congestion control. We will also | therefore worst-case Classic congestion control. We will also | |||
| consider Cubic in its Reno-friendly mode ('CReno'), as the most | consider CUBIC in its Reno-friendly mode ('CReno') as the most | |||
| prevalent congestion control, according to the references and | prevalent congestion control, according to the references and | |||
| analysis in [PI2param]. In either case, the Classic packet rate in | analysis in [PI2param]. In either case, the Classic packet rate in | |||
| steady state is given by the well-known square root formula for Reno | steady state is given by the well-known square root formula for Reno | |||
| congestion control: | congestion control: | |||
| r_C = 1.22 / (R_C * p_C^0.5) (5) | r_C = 1.22 / (R_C * p_C^0.5) (5) | |||
| On the L4S side, we consider the Prague congestion | On the L4S side, we consider the Prague congestion control | |||
| control [I-D.briscoe-iccrg-prague-congestion-control] as the | [PRAGUE-CC] as the reference for steady-state dependence on | |||
| reference for steady-state dependence on congestion. Prague conforms | congestion. Prague conforms to the same equation as DCTCP, but we do | |||
| to the same equation as DCTCP, but we do not use the equation derived | not use the equation derived in the DCTCP paper, which is only | |||
| in the DCTCP paper, which is only appropriate for step marking. The | appropriate for step marking. The coupled marking, p_CL, is the | |||
| coupled marking, p_CL, is the appropriate one when considering | appropriate one when considering throughput equivalence with Classic | |||
| throughput equivalence with Classic flows. Unlike step marking, | flows. Unlike step marking, coupled markings are inherently spaced | |||
| coupled markings are inherently spaced out, so we use the formula for | out, so we use the formula for DCTCP packet rate with probabilistic | |||
| DCTCP packet rate with probabilistic marking derived in Appendix A of | marking derived in Appendix A of [PI2]. We use the equation without | |||
| [PI2]. We use the equation without RTT-independence enabled, which | RTT-independence enabled, which will be explained later. | |||
| will be explained later. | ||||
| r_L = 2 / (R_L * p_CL) (6) | r_L = 2 / (R_L * p_CL) (6) | |||
| For packet rate equivalence, we equate the two packet rates and | For packet rate equivalence, we equate the two packet rates and | |||
| rearrange into the same form as Equation (1), so the two can be | rearrange the equation into the same form as equation (1) (copied | |||
| equated and simplified to produce a formula for a theoretical | from Section 2.1) so the two can be equated and simplified to produce | |||
| coupling factor, which we shall call k*: | a formula for a theoretical coupling factor, which we shall call k*: | |||
| r_c = r_L | r_c = r_L | |||
| => p_C = (p_CL/1.64 * R_L/R_C)^2 | => p_C = (p_CL/1.64 * R_L/R_C)^2. | |||
| p_C = ( p_CL / k )^2 (1) | p_C = ( p_CL / k )^2. (1) | |||
| k* = 1.64 * (R_C / R_L) (7) | k* = 1.64 * (R_C / R_L). (7) | |||
| We say that this coupling factor is theoretical, because it is in | We say that this coupling factor is theoretical, because it is in | |||
| terms of two RTTs, which raises two practical questions: i) for | terms of two RTTs, which raises two practical questions: i) for | |||
| multiple flows with different RTTs, the RTT for each traffic class | multiple flows with different RTTs, the RTT for each traffic class | |||
| would have to be derived from the RTTs of all the flows in that class | would have to be derived from the RTTs of all the flows in that class | |||
| (actually the harmonic mean would be needed); ii) a network node | (actually the harmonic mean would be needed) and ii) a network node | |||
| cannot easily know the RTT of the flows anyway. | cannot easily know the RTT of the flows anyway. | |||
| RTT-dependence is caused by window-based congestion control, so it | RTT-dependence is caused by window-based congestion control, so it | |||
| ought to be reversed there, not in the network. Therefore, we use a | ought to be reversed there, not in the network. Therefore, we use a | |||
| fixed coupling factor in the network, and reduce RTT-dependence in | fixed coupling factor in the network and reduce RTT-dependence in L4S | |||
| L4S senders. We cannot expect Classic senders to all be updated to | senders. We cannot expect Classic senders to all be updated to | |||
| reduce their RTT-dependence. But solely addressing the problem in | reduce their RTT-dependence. But solely addressing the problem in | |||
| L4S senders at least makes RTT-dependence no worse - not just between | L4S senders at least makes RTT-dependence no worse -- not just | |||
| L4S senders, but also between L4S and Classic senders. | between L4S senders, but also between L4S and Classic senders. | |||
| Traditionally, throughput equivalence has been defined for flows | Throughput equivalence is defined for flows under comparable | |||
| under comparable conditions, including with the same base | conditions, including with the same base RTT [RFC2914]. So if we | |||
| RTT [RFC2914]. So if we assume the same base RTT, R_b, for | assume the same base RTT, R_b, for comparable flows, we can put both | |||
| comparable flows, we can put both R_C and R_L in terms of R_b. | R_C and R_L in terms of R_b. | |||
| We can approximate the L4S RTT to be hardly greater than the base | We can approximate the L4S RTT to be hardly greater than the base | |||
| RTT, i.e. R_L ~= R_b. And we can replace R_C with (R_b + q_C), where | RTT, i.e., R_L ~= R_b. And we can replace R_C with (R_b + q_C), | |||
| the Classic queue, q_C, depends on the target queue delay that the | where the Classic queue, q_C, depends on the target queue delay that | |||
| operator has configured for the Classic AQM. | the operator has configured for the Classic AQM. | |||
| Taking PI2 as an example Classic AQM, it seems that we could just | Taking PI2 as an example Classic AQM, it seems that we could just | |||
| take R_C = R_b + target (recommended 15 ms by default in | take R_C = R_b + target (recommended 15 ms by default in | |||
| Appendix A.1). However, target is roughly the queue depth reached by | Appendix A.1). However, target is roughly the queue depth reached by | |||
| the tips of the sawteeth of a congestion control, not the average | the tips of the sawteeth of a congestion control, not the average | |||
| [PI2param]. That is R_max = R_b + target. | [PI2param]. That is R_max = R_b + target. | |||
| The position of the average in relation to the max depends on the | The position of the average in relation to the max depends on the | |||
| amplitude and geometry of the sawteeth. We consider two examples: | amplitude and geometry of the sawteeth. We consider two examples: | |||
| Reno [RFC5681], as the most sensitive worst-case, and Cubic [RFC8312] | Reno [RFC5681], as the most sensitive worst case, and CUBIC [RFC8312] | |||
| in its Reno-friendly mode ('CReno') as the most prevalent congestion | in its Reno-friendly mode ('CReno') as the most prevalent congestion | |||
| control algorithm on the Internet according to the references in | control algorithm on the Internet according to the references in | |||
| [PI2param]. Both are AIMD, so we will generalize using b as the | [PI2param]. Both are Additive Increase Multiplicative Decrease | |||
| multiplicative decrease factor (b_r = 0.5 for Reno, b_c = 0.7 for | (AIMD), so we will generalize using b as the multiplicative decrease | |||
| CReno). Then: | factor (b_r = 0.5 for Reno, b_c = 0.7 for CReno). Then | |||
| R_C = (R_max + b*R_max) / 2 | R_C = (R_max + b*R_max) / 2 | |||
| = R_max * (1+b)/2 | = R_max * (1+b)/2. | |||
| R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target). | R_reno = 0.75 * (R_b + target); R_creno = 0.85 * (R_b + target). | |||
| (8) | (8) | |||
| Plugging all this into equation (7) we get a fixed coupling factor | Plugging all this into equation (7), at any particular base RTT, R_b, | |||
| for each: | we get a fixed coupling factor for each: | |||
| k_reno = 1.64*0.75*(R_b+target)/R_b | k_reno = 1.64*0.75*(R_b+target)/R_b | |||
| = 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b) | = 1.23*(1 + target/R_b); k_creno = 1.39 * (1 + target/R_b). | |||
| An operator can then choose the base RTT at which it wants throughput | An operator can then choose the base RTT at which it wants throughput | |||
| to be equivalent. For instance, if we recommend that the operator | to be equivalent. For instance, if we recommend that the operator | |||
| chooses R_b = 25 ms, as a typical base RTT between Internet users and | chooses R_b = 25 ms, as a typical base RTT between Internet users and | |||
| CDNs [PI2param], then these coupling factors become: | CDNs [PI2param], then these coupling factors become: | |||
| k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25) | k_reno = 1.23 * (1 + 15/25) k_creno = 1.39 * (1 + 15/25) | |||
| = 1.97 = 2.22 | = 1.97 = 2.22 | |||
| ~= 2 ~= 2 (9) | ~= 2. ~= 2. (9) | |||
| The approximation is relevant to any of the above example DualQ | The approximation is relevant to any of the above example DualQ | |||
| Coupled algorithms, which use a coupling factor that is an integer | Coupled algorithms, which use a coupling factor that is an integer | |||
| power of 2 to aid efficient implementation. It also fits best to the | power of 2 to aid efficient implementation. It also fits best for | |||
| worst case (Reno). | the worst case (Reno). | |||
| To check the outcome of this coupling factor, we can express the | To check the outcome of this coupling factor, we can express the | |||
| ratio of L4S to Classic throughput by substituting from their rate | ratio of L4S to Classic throughput by substituting from their rate | |||
| equations (5) and (6), then also substituting for p_C in terms of | equations (5) and (6), then also substituting for p_C in terms of | |||
| p_CL, using equation (1) with k=2 as just determined for the | p_CL using equation (1) with k = 2 as just determined for the | |||
| Internet: | Internet: | |||
| r_L / r_C = 2 (R_C * p_C^0.5) / 1.22 (R_L * p_CL) | r_L / r_C = 2 (R_C * p_C^0.5) / 1.22 (R_L * p_CL) | |||
| = (R_C * p_CL) / (1.22 * R_L * p_CL) | = (R_C * p_CL) / (1.22 * R_L * p_CL) | |||
| = R_C / (1.22 * R_L) (10) | = R_C / (1.22 * R_L). (10) | |||
| As an example, we can then consider single competing CReno and Prague | As an example, we can then consider single competing CReno and Prague | |||
| flows, by expressing both their RTTs in (10) in terms of their base | flows, by expressing both their RTTs in (10) in terms of their base | |||
| RTTs, R_bC and R_bL. So R_C is replaced by equation (8) for CReno. | RTTs, R_bC and R_bL. So R_C is replaced by equation (8) for CReno. | |||
| And R_L is replaced by the max() function below, which represents the | And R_L is replaced by the max() function below, which represents the | |||
| effective RTT of the current Prague congestion | effective RTT of the current Prague congestion control [PRAGUE-CC] in | |||
| control [I-D.briscoe-iccrg-prague-congestion-control] in its | its (default) RTT-independent mode, because it sets a floor to the | |||
| (default) RTT-independent mode, because it sets a floor to the | ||||
| effective RTT that it uses for additive increase: | effective RTT that it uses for additive increase: | |||
| ~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ)) | r_L / r_C ~= 0.85 * (R_bC + target) / (1.22 * max(R_bL, R_typ)) | |||
| ~= (R_bC + target) / (1.4 * max(R_bL, R_typ)) | ~= (R_bC + target) / (1.4 * max(R_bL, R_typ)). | |||
| It can be seen that, for base RTTs below target (15 ms), both the | It can be seen that, for base RTTs below target (15 ms), both the | |||
| numerator and the denominator plateau, which has the desired effect | numerator and the denominator plateau, which has the desired effect | |||
| of limiting RTT-dependence. | of limiting RTT-dependence. | |||
| At the start of the above derivations, an explanation was promised | At the start of the above derivations, an explanation was promised | |||
| for why the L4S throughput equation in equation (6) did not need to | for why the L4S throughput equation in equation (6) did not need to | |||
| model RTT-independence. This is because we only use one point - at | model RTT-independence. This is because we only use one point -- at | |||
| the typical base RTT where the operator chooses to calculate the | the typical base RTT where the operator chooses to calculate the | |||
| coupling factor. Then, throughput equivalence will at least hold at | coupling factor. Then throughput equivalence will at least hold at | |||
| that chosen point. Nonetheless, assuming Prague senders implement | that chosen point. Nonetheless, assuming Prague senders implement | |||
| RTT-independence over a range of RTTs below this, the throughput | RTT-independence over a range of RTTs below this, the throughput | |||
| equivalence will then extend over that range as well. | equivalence will then extend over that range as well. | |||
| Congestion control designers can choose different ways to reduce RTT- | Congestion control designers can choose different ways to reduce RTT- | |||
| dependence. And each operator can make a policy choice to decide on | dependence. And each operator can make a policy choice to decide on | |||
| a different base RTT, and therefore a different k, at which it wants | a different base RTT, and therefore a different k, at which it wants | |||
| throughput equivalence. Nonetheless, for the Internet, it makes | throughput equivalence. Nonetheless, for the Internet, it makes | |||
| sense to choose what is believed to be the typical RTT most users | sense to choose what is believed to be the typical RTT most users | |||
| experience, because a Classic AQM's target queuing delay is also | experience, because a Classic AQM's target queuing delay is also | |||
| derived from a typical RTT for the Internet. | derived from a typical RTT for the Internet. | |||
| As a non-Internet example, for localized traffic from a particular | As a non-Internet example, for localized traffic from a particular | |||
| ISP's data centre, using the measured RTTs, it was calculated that a | ISP's data centre, using the measured RTTs, it was calculated that a | |||
| value of k = 8 would achieve throughput equivalence, and experiments | value of k = 8 would achieve throughput equivalence, and experiments | |||
| verified the formula very closely. | verified the formula very closely. | |||
| But, for a typical mix of RTTs across the general Internet, a value | But, for a typical mix of RTTs across the general Internet, a value | |||
| of k=2 is recommended as a good workable compromise. | of k = 2 is recommended as a good workable compromise. | |||
| Acknowledgements | Acknowledgements | |||
| Thanks to Anil Agarwal, Sowmini Varadhan, Gabi Bracha, Nicolas Kuhn, | Thanks to Anil Agarwal, Sowmini Varadhan, Gabi Bracha, Nicolas Kuhn, | |||
| Greg Skinner, Tom Henderson, David Pullen, Mirja Kuehlewind, Gorry | Greg Skinner, Tom Henderson, David Pullen, Mirja Kühlewind, Gorry | |||
| Fairhurst, Pete Heist, Ermin Sakic and Martin Duke for detailed | Fairhurst, Pete Heist, Ermin Sakic, and Martin Duke for detailed | |||
| review comments particularly of the appendices and suggestions on how | review comments, particularly of the appendices, and suggestions on | |||
| to make the explanations clearer. Thanks also to Tom Henderson for | how to make the explanations clearer. Thanks also to Tom Henderson | |||
| insights on the choice of schedulers and queue delay measurement | for insight on the choice of schedulers and queue delay measurement | |||
| techniques. And thanks to the area reviewers Christer Holmberg, Lars | techniques. And thanks to the area reviewers Christer Holmberg, Lars | |||
| Eggert and Roman Danyliw. | Eggert, and Roman Danyliw. | |||
| The early contributions of Koen De Schepper, Bob Briscoe, Olga | The early contributions of Koen De Schepper, Bob Briscoe, Olga | |||
| Bondarenko and Inton Tsang were part-funded by the European Community | Bondarenko, and Inton Tsang were partly funded by the European | |||
| under its Seventh Framework Programme through the Reducing Internet | Community under its Seventh Framework Programme through the Reducing | |||
| Transport Latency (RITE) project (ICT-317700). Contributions of Koen | Internet Transport Latency (RITE) project (ICT-317700). | |||
| De Schepper and Olivier Tilmans were also part-funded by the 5Growth | Contributions of Koen De Schepper and Olivier Tilmans were also | |||
| and DAEMON EU H2020 projects. Bob Briscoe's contribution was also | partly funded by the 5Growth and DAEMON EU H2020 projects. Bob | |||
| part-funded by the Comcast Innovation Fund and the Research Council | Briscoe's contribution was also partly funded by the Comcast | |||
| of Norway through the TimeIn project. The views expressed here are | Innovation Fund and the Research Council of Norway through the TimeIn | |||
| solely those of the authors. | project. The views expressed here are solely those of the authors. | |||
| Contributors | Contributors | |||
| The following contributed implementations and evaluations that | The following contributed implementations and evaluations that | |||
| validated and helped to improve this specification: | validated and helped to improve this specification: | |||
| Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway | Olga Albisser <olga@albisser.org> of Simula Research Lab, Norway | |||
| (Olga Bondarenko during early drafts) implemented the prototype | (Olga Bondarenko during early draft versions) implemented the | |||
| DualPI2 AQM for Linux with Koen De Schepper and conducted | prototype DualPI2 AQM for Linux with Koen De Schepper and conducted | |||
| extensive evaluations as well as implementing the live performance | extensive evaluations as well as implementing the live performance | |||
| visualization GUI [L4Sdemo16]. | visualization GUI [L4Sdemo16]. | |||
| Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> of Nokia | Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> of Nokia Bell | |||
| Bell Labs, Belgium prepared and maintains the Linux implementation | Labs, Belgium prepared and maintains the Linux implementation of | |||
| of DualPI2 for upstreaming. | DualPI2 for upstreaming. | |||
| Shravya K.S. wrote a model for the ns-3 simulator based on the -01 | Shravya K.S. wrote a model for the ns-3 simulator based on draft- | |||
| version of this Internet-Draft. Based on this initial work, Tom | ietf-tsvwg-aqm-dualq-coupled-01 (a draft version of this document). | |||
| Henderson <tomh@tomh.org> updated that earlier model and created a | Based on this initial work, Tom Henderson <tomh@tomh.org> updated | |||
| model for the DualQ variant specified as part of the Low Latency | that earlier model and created a model for the DualQ variant | |||
| DOCSIS specification, as well as conducting extensive evaluations. | specified as part of the Low Latency DOCSIS specification, as well as | |||
| conducting extensive evaluations. | ||||
| Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | Ing Jyh (Inton) Tsang of Nokia, Belgium built the End-to-End Data | |||
| Centre to the Home broadband testbed on which DualQ Coupled AQM | Centre to the Home broadband testbed on which DualQ Coupled AQM | |||
| implementations were tested. | implementations were tested. | |||
| Authors' Addresses | Authors' Addresses | |||
| Koen De Schepper | Koen De Schepper | |||
| Nokia Bell Labs | Nokia Bell Labs | |||
| Antwerp | Antwerp | |||
| Belgium | Belgium | |||
| Email: koen.de_schepper@nokia.com | Email: koen.de_schepper@nokia.com | |||
| URI: https://www.bell-labs.com/about/researcher-profiles/ | URI: https://www.bell-labs.com/about/researcher-profiles/ | |||
| koende_schepper/ | koende_schepper/ | |||
| Bob Briscoe (editor) | Bob Briscoe (editor) | |||
| Independent | Independent | |||
| United Kingdom | United Kingdom | |||
| Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
| URI: https://bobbriscoe.net/ | URI: https://bobbriscoe.net/ | |||
| Greg White | Greg White | |||
| CableLabs | CableLabs | |||
| Louisville, CO, | Louisville, CO | |||
| United States of America | United States of America | |||
| Email: G.White@CableLabs.com | Email: G.White@CableLabs.com | |||
| End of changes. 404 change blocks. | ||||
| 1180 lines changed or deleted | 1200 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||