| rfc9406.original | rfc9406.txt | |||
|---|---|---|---|---|
| Network Working Group P. Balasubramanian | Internet Engineering Task Force (IETF) P. Balasubramanian | |||
| Internet-Draft Confluent | Request for Comments: 9406 Confluent | |||
| Intended status: Standards Track Y. Huang | Category: Standards Track Y. Huang | |||
| Expires: 31 August 2023 M. Olson | ISSN: 2070-1721 M. Olson | |||
| Microsoft | Microsoft | |||
| 27 February 2023 | May 2023 | |||
| HyStart++: Modified Slow Start for TCP | HyStart++: Modified Slow Start for TCP | |||
| draft-ietf-tcpm-hystartplusplus-14 | ||||
| Abstract | Abstract | |||
| This document describes HyStart++, a simple modification to the slow | This document describes HyStart++, a simple modification to the slow | |||
| start phase of congestion control algorithms. Slow start can | start phase of congestion control algorithms. Slow start can | |||
| overshoot the ideal send rate in many cases, causing high packet loss | overshoot the ideal send rate in many cases, causing high packet loss | |||
| and poor performance. HyStart++ uses increase in round-trip delay as | and poor performance. HyStart++ uses increase in round-trip delay as | |||
| a heuristic to find an exit point before possible overshoot. It also | a heuristic to find an exit point before possible overshoot. It also | |||
| adds a mitigation to prevent jitter from causing premature slow start | adds a mitigation to prevent jitter from causing premature slow start | |||
| exit. | exit. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
| provisions of BCP 78 and BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| | Internet Standards is available in Section 2 of RFC 7841. | |||
| This Internet-Draft will expire on 31 August 2023. | Information about the current status of this document, any errata, | |||
| | and how to provide feedback on it may be obtained at | |||
| | https://www.rfc-editor.org/info/rfc9406. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2023 IETF Trust and the persons identified as the | Copyright (c) 2023 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| | in the Revised BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction | |||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology | |||
| 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 3. Definitions | |||
| 4. HyStart++ Algorithm . . . . . . . . . . . . . . . . . . . . . 3 | 4. HyStart++ Algorithm | |||
| 4.1. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 4.1. Summary | |||
| 4.2. Algorithm Details . . . . . . . . . . . . . . . . . . . . 4 | 4.2. Algorithm Details | |||
| 4.3. Tuning constants and other considerations . . . . . . . . 6 | 4.3. Tuning Constants and Other Considerations | |||
| 5. Deployments and Performance Evaluations . . . . . . . . . . . 7 | 5. Deployments and Performance Evaluations | |||
| 6. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | 6. Security Considerations | |||
| 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | 7. IANA Considerations | |||
| 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 | 8. References | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 8.1. Normative References | |||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 8 | 8.2. Informative References | |||
| 9.2. Informative References . . . . . . . . . . . . . . . . . 8 | Acknowledgments | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 | Authors' Addresses | |||
| 1. Introduction | 1. Introduction | |||
| [RFC5681] describes the slow start congestion control algorithm for | [RFC5681] describes the slow start congestion control algorithm for | |||
| TCP. The slow start algorithm is used when the congestion window | TCP. The slow start algorithm is used when the congestion window | |||
| (cwnd) is less than the slow start threshold (ssthresh). During slow | (cwnd) is less than the slow start threshold (ssthresh). During slow | |||
| start, in absence of packet loss signals, TCP increases cwnd | start, in the absence of packet loss signals, TCP increases the cwnd | |||
| exponentially to probe the network capacity. This fast growth can | exponentially to probe the network capacity. This fast growth can | |||
| overshoot the ideal sending rate and cause significant packet loss | overshoot the ideal sending rate and cause significant packet loss | |||
| which cannot always be recovered efficiently. | that cannot always be recovered efficiently. | |||
| HyStart++ uses increase in round-trip delay as a signal to exit slow | HyStart++ builds upon Hybrid Start (HyStart), originally described in | |||
| start before potential packet loss occurs as a result of overshoot. | [HyStart]. HyStart++ uses increase in round-trip delay as a signal | |||
| This is one of two algorithms specified in [HyStart]. After the slow | to exit slow start before potential packet loss occurs as a result of | |||
| start exit, a new Conservative Slow Start (CSS) phase is used to | overshoot. This is one of two algorithms specified in [HyStart] for | |||
| determine whether the slow start exit was premature and to resume | finding a safe exit point for slow start. After the slow start exit, | |||
| slow start. This mitigation improves performance in presence of | a new Conservative Slow Start (CSS) phase is used to determine | |||
| jitter. HyStart++ reduces packet loss and retransmissions, and | whether the slow start exit was premature and to resume slow start. | |||
| improves goodput in lab measurements and real world deployments. | This mitigation improves performance in the presence of jitter. | |||
| | HyStart++ reduces packet loss and retransmissions, and improves | |||
| | goodput in lab measurements and real-world deployments. | |||
| While this document describes Hystart++ for TCP, it can also be used | While this document describes HyStart++ for TCP, it can also be used | |||
| for other transport protocols which use slow start such as QUIC | for other transport protocols that use slow start, such as QUIC | |||
| [RFC9002] or SCTP [RFC9260]. | [RFC9002] or the Stream Control Transmission Protocol (SCTP) | |||
| | [RFC9260]. | |||
| 2. Terminology | 2. Terminology | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 3. Definitions | 3. Definitions | |||
| We repeat here some definition from [RFC5681] to aid the reader. | To aid the reader, we repeat some definitions from [RFC5681]: | |||
| SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the | SENDER MAXIMUM SEGMENT SIZE (SMSS): The size of the largest segment | |||
| largest segment that the sender can transmit. This value can be | that the sender can transmit. This value can be based on the | |||
| based on the maximum transmission unit of the network, the path MTU | maximum transmission unit of the network, the Path MTU Discovery | |||
| discovery [RFC1191], [RFC4821] algorithm, RMSS (see next item), or | algorithm [RFC1191] [RFC4821], RMSS (see next item), or other | |||
| other factors. The size does not include the TCP/IP headers and | factors. The size does not include the TCP/IP headers and | |||
| options. | options. | |||
| RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the | RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The size of the largest | |||
| largest segment the receiver is willing to accept. This is the value | segment that the receiver is willing to accept. This is the value | |||
| specified in the MSS option sent by the receiver during connection | specified in the MSS option sent by the receiver during connection | |||
| startup. Or, if the MSS option is not used, it is 536 bytes | startup. Or, if the MSS option is not used, it is 536 bytes | |||
| [RFC1122]. The size does not include the TCP/IP headers and options. | [RFC1122]. The size does not include the TCP/IP headers and | |||
| | options. | |||
| RECEIVER WINDOW (rwnd): The most recently advertised receiver window. | RECEIVER WINDOW (rwnd): The most recently advertised receiver | |||
| | window. | |||
| CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount | CONGESTION WINDOW (cwnd): A TCP state variable that limits the | |||
| of data a TCP can send. At any given time, a TCP MUST NOT send data | amount of data a TCP can send. At any given time, a TCP MUST NOT | |||
| with a sequence number higher than the sum of the highest | send data with a sequence number higher than the sum of the | |||
| acknowledged sequence number and the minimum of cwnd and rwnd. | highest acknowledged sequence number and the minimum of the cwnd | |||
| | and rwnd. | |||
| 4. HyStart++ Algorithm | 4. HyStart++ Algorithm | |||
| 4.1. Summary | 4.1. Summary | |||
| [HyStart] specifies two algorithms (a "Delay Increase" algorithm and | [HyStart] specifies two algorithms (a "Delay Increase" algorithm and | |||
| an "Inter-Packet Arrival" algorithm) to be run in parallel to detect | an "Inter-Packet Arrival" algorithm) to be run in parallel to detect | |||
| that the sending rate has reached capacity. In practice, the Inter- | that the sending rate has reached capacity. In practice, the Inter- | |||
| Packet Arrival algorithm does not perform well and is not able to | Packet Arrival algorithm does not perform well and is not able to | |||
| detect congestion early, primarily due to ACK compression. The idea | detect congestion early, primarily due to ACK compression. The idea | |||
| of the Delay Increase algorithm is to look for spikes in RTT (round- | of the Delay Increase algorithm is to look for spikes in RTT (round- | |||
| trip time), which suggest that the bottleneck buffer is filling up. | trip time), which suggest that the bottleneck buffer is filling up. | |||
| In HyStart++, a TCP sender uses traditional slow start and then uses | In HyStart++, a TCP sender uses standard slow start and then uses the | |||
| the "Delay Increase" algorithm to trigger an exit from slow start. | Delay Increase algorithm to trigger an exit from slow start. But | |||
| But instead of going straight from slow start to congestion | instead of going straight from slow start to congestion avoidance, | |||
| avoidance, the sender spends a number of RTTs in a Conservative Slow | the sender spends a number of RTTs in a Conservative Slow Start (CSS) | |||
| Start (CSS) phase to determine whether the exit from slow start was | phase to determine whether the exit from slow start was premature. | |||
| premature. During CSS, the congestion window is grown exponentially | During CSS, the congestion window is grown exponentially in a fashion | |||
| like in regular slow start, but with a smaller exponential base, | similar to regular slow start, but with a smaller exponential base, | |||
| resulting in less aggressive growth. If the RTT reduces during CSS, | resulting in less aggressive growth. If the RTT reduces during CSS, | |||
| it's concluded that the RTT spike was not related to congestion | it's concluded that the RTT spike was not related to congestion | |||
| caused by the connection sending at a rate greater than the ideal | caused by the connection sending at a rate greater than the ideal | |||
| send rate, and the connection resumes slow start. If the RTT | send rate, and the connection resumes slow start. If the RTT | |||
| inflation persists throughout CSS, the connection enters congestion | inflation persists throughout CSS, the connection enters congestion | |||
| avoidance. | avoidance. | |||
| 4.2. Algorithm Details | 4.2. Algorithm Details | |||
| The following pseudocode uses a limit, L, to control the | The following pseudocode uses a limit, L, to control the | |||
| aggressiveness of the cwnd increase during both standard slow start | aggressiveness of the cwnd increase during both standard slow start | |||
| and CSS. While an arriving ACK may newly acknowledge an arbitrary | and CSS. While an arriving ACK may newly acknowledge an arbitrary | |||
| number of bytes, the Hystart++ algorithm limits the number of those | number of bytes, the HyStart++ algorithm limits the number of those | |||
| bytes applied to increase the cwnd to L*SMSS bytes. | bytes applied to increase the cwnd to L*SMSS bytes. | |||
| lastRoundMinRTT and currentRoundMinRTT are initialized to infinity at | lastRoundMinRTT and currentRoundMinRTT are initialized to infinity at | |||
| the initialization time. currRTT is the RTT sampled from the latest | the initialization time. currRTT is the RTT sampled from the latest | |||
| incoming ACK and initialized to infinity. | incoming ACK and initialized to infinity. | |||
| lastRoundMinRTT = infinity | lastRoundMinRTT = infinity | |||
| currentRoundMinRTT = infinity | currentRoundMinRTT = infinity | |||
| currRTT = infinity | currRTT = infinity | |||
| Hystart++ measures rounds using sequence numbers, as follows: Define | HyStart++ measures rounds using sequence numbers, as follows: | |||
| windowEnd as a sequence number initialized to SND.NXT. When | ||||
| windowEnd is ACKed, the current round ends and windowEnd is set to | ||||
| SND.NXT. | ||||
| At the start of each round during standard slow start ([RFC5681]) and | * Define windowEnd as a sequence number initialized to SND.NXT. | |||
| CSS, initialize the variables used to compute last round and current | ||||
| round's minimum RTT: | * When windowEnd is ACKed, the current round ends and windowEnd is | |||
| | set to SND.NXT. | |||
| | At the start of each round during standard slow start [RFC5681] and | |||
| | CSS, initialize the variables used to compute the last round's and | |||
| | current round's minimum RTT: | |||
| lastRoundMinRTT = currentRoundMinRTT | lastRoundMinRTT = currentRoundMinRTT | |||
| currentRoundMinRTT = infinity | currentRoundMinRTT = infinity | |||
| rttSampleCount = 0 | rttSampleCount = 0 | |||
| For each arriving ACK in slow start, where N is the number of | For each arriving ACK in slow start, where N is the number of | |||
| previously unacknowledged bytes acknowledged in the arriving ACK: | previously unacknowledged bytes acknowledged in the arriving ACK: | |||
| Update the cwnd: | Update the cwnd: | |||
| cwnd = cwnd + min(N, L * SMSS) | cwnd = cwnd + min(N, L * SMSS) | |||
| Keep track of minimum observed RTT: | Keep track of the minimum observed RTT: | |||
| currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | |||
| rttSampleCount += 1 | rttSampleCount += 1 | |||
| For rounds where at least N_RTT_SAMPLE RTT samples have been obtained | For rounds where at least N_RTT_SAMPLE RTT samples have been obtained | |||
| and currentRoundMinRTT and lastRoundMinRTT are valid, check if delay | and currentRoundMinRTT and lastRoundMinRTT are valid, check to see if | |||
| increase triggers slow start exit: | delay increase triggers slow start exit: | |||
| if ((rttSampleCount >= N_RTT_SAMPLE) AND | if ((rttSampleCount >= N_RTT_SAMPLE) AND | |||
| (currentRoundMinRTT != infinity) AND | (currentRoundMinRTT != infinity) AND | |||
| (lastRoundMinRTT != infinity)) | (lastRoundMinRTT != infinity)) | |||
| Compute a RTT Threshold clamped between MIN_RTT_THRESH and MAX_RTT_THRESH | RttThresh = max(MIN_RTT_THRESH, | |||
| RttThresh = max(MIN_RTT_THRESH, min(lastRoundMinRTT / MIN_RTT_DIVISOR, MAX_RTT_THRESH)) | min(lastRoundMinRTT / MIN_RTT_DIVISOR, MAX_RTT_THRESH)) | |||
| if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) | if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh)) | |||
| cssBaselineMinRtt = currentRoundMinRTT | cssBaselineMinRtt = currentRoundMinRTT | |||
| exit slow start and enter CSS | exit slow start and enter CSS | |||
| For each arriving ACK in CSS, where N is the number of previously | For each arriving ACK in CSS, where N is the number of previously | |||
| unacknowledged bytes acknowledged in the arriving ACK: | unacknowledged bytes acknowledged in the arriving ACK: | |||
| Update the cwnd: | Update the cwnd: | |||
| cwnd = cwnd + (min(N, L * SMSS) / CSS_GROWTH_DIVISOR) | cwnd = cwnd + (min(N, L * SMSS) / CSS_GROWTH_DIVISOR) | |||
| Keep track of minimum observed RTT: | Keep track of the minimum observed RTT: | |||
| currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | currentRoundMinRTT = min(currentRoundMinRTT, currRTT) | |||
| rttSampleCount += 1 | rttSampleCount += 1 | |||
| For CSS rounds where at least N_RTT_SAMPLE RTT samples have been | For CSS rounds where at least N_RTT_SAMPLE RTT samples have been | |||
| obtained, check if current round's minRTT drops below baseline | obtained, check to see if the current round's minRTT drops below | |||
| indicating that HyStart exit was spurious: | baseline (cssBaselineMinRtt) indicating that slow start exit was | |||
| | spurious: | |||
| if (currentRoundMinRTT < cssBaselineMinRtt) | if (currentRoundMinRTT < cssBaselineMinRtt) | |||
| cssBaselineMinRtt = infinity | cssBaselineMinRtt = infinity | |||
| resume slow start including HyStart++ | resume slow start including HyStart++ | |||
| CSS lasts at most CSS_ROUNDS rounds. If the transition into CSS | CSS lasts at most CSS_ROUNDS rounds. If the transition into CSS | |||
| happens in the middle of a round, that partial round counts towards | happens in the middle of a round, that partial round counts towards | |||
| the limit. | the limit. | |||
| If CSS_ROUNDS rounds are complete, enter congestion avoidance by | If CSS_ROUNDS rounds are complete, enter congestion avoidance by | |||
| setting ssthresh to current cwnd. | setting the ssthresh to the current cwnd. | |||
| ssthresh = cwnd | ssthresh = cwnd | |||
| If loss or ECN-marking is observed anytime during standard slow start | If loss or Explicit Congestion Notification (ECN) marking is observed | |||
| or CSS, enter congestion avoidance by setting ssthresh to current | at any time during standard slow start or CSS, enter congestion | |||
| cwnd. | avoidance by setting the ssthresh to the current cwnd. | |||
| ssthresh = cwnd | ssthresh = cwnd | |||
| 4.3. Tuning constants and other considerations | 4.3. Tuning Constants and Other Considerations | |||
| It is RECOMMENDED that a HyStart++ implementation use the following | It is RECOMMENDED that a HyStart++ implementation use the following | |||
| constants: | constants: | |||
| MIN_RTT_THRESH = 4 msec | MIN_RTT_THRESH = 4 msec | |||
| MAX_RTT_THRESH = 16 msec | MAX_RTT_THRESH = 16 msec | |||
| MIN_RTT_DIVISOR = 8 | MIN_RTT_DIVISOR = 8 | |||
| N_RTT_SAMPLE = 8 | N_RTT_SAMPLE = 8 | |||
| CSS_GROWTH_DIVISOR = 4 | CSS_GROWTH_DIVISOR = 4 | |||
| CSS_ROUNDS = 5 | CSS_ROUNDS = 5 | |||
| L = infinity if paced, L = 8 if non-paced | L = infinity if paced, L = 8 if non-paced | |||
| These constants have been determined with lab measurements and real | These constants have been determined with lab measurements and real- | |||
| world deployments. An implementation MAY tune them for different | world deployments. An implementation MAY tune them for different | |||
| network characteristics. | network characteristics. | |||
| The delay increase sensitivity is determined by MIN_RTT_THRESH and | The delay increase sensitivity is determined by MIN_RTT_THRESH and | |||
| MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious | MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious | |||
| exits from slow start. Larger values of MAX_RTT_THRESH may result in | exits from slow start. Larger values of MAX_RTT_THRESH may result in | |||
| slow start not exiting until loss is encountered for connections on | slow start not exiting until loss is encountered for connections on | |||
| large RTT paths. | large RTT paths. | |||
| MIN_RTT_DIVISOR is a fraction of RTT to compute delay threshold. A | MIN_RTT_DIVISOR is a fraction of RTT to compute the delay threshold. | |||
| smaller value would mean a bigger threshold and thus less sensitive | A smaller value would mean a larger threshold and thus less | |||
| to delay increase, and vice versa. | sensitivity to delay increase, and vice versa. | |||
| While all TCP implementations are REQUIRED to take at least one RTT | While all TCP implementations are REQUIRED to take at least one RTT | |||
| sample each round, implementations of HyStart++ are RECOMMENDED to | sample each round, implementations of HyStart++ are RECOMMENDED to | |||
| take at least N_RTT_SAMPLE RTT samples. Using lower values of | take at least N_RTT_SAMPLE RTT samples. Using lower values of | |||
| N_RTT_SAMPLE will lower the accuracy of the measured RTT for the | N_RTT_SAMPLE will lower the accuracy of the measured RTT for the | |||
| round; higher values will improve accuracy at the cost of more | round; higher values will improve accuracy at the cost of more | |||
| processing. | processing. | |||
| The minimum value of CSS_GROWTH_DIVISOR MUST be at least 2. A value | The minimum value of CSS_GROWTH_DIVISOR MUST be at least 2. A value | |||
| of 1 results in the same aggressive behavior as regular slow start. | of 1 results in the same aggressive behavior as regular slow start. | |||
| Values larger than 4 will cause the algorithm to be less aggressive | Values larger than 4 will cause the algorithm to be less aggressive | |||
| and maybe less performant. | and maybe less performant. | |||
| Smaller values of CSS_ROUNDS may miss detecting jitter and larger | Smaller values of CSS_ROUNDS may miss detecting jitter, and larger | |||
| values may limit performance. | values may limit performance. | |||
| Packet pacing [ASA00] is a possible mechanism to avoid large bursts | Packet pacing [ASA00] is a possible mechanism to avoid large bursts | |||
| and their associated harm. A paced TCP implementation SHOULD use L = | and their associated harm. A paced TCP implementation SHOULD use L = | |||
| infinity. Burst concerns are mitigated by pacing and this setting | infinity. Burst concerns are mitigated by pacing, and this setting | |||
| allows for optimal cwnd growth on modern networks. | allows for optimal cwnd growth on modern networks. | |||
| For TCP implementations that pace to mitigate burst concerns, L | For TCP implementations that pace to mitigate burst concerns, L | |||
| values smaller than INFINITY may suffer performance problems due to | values smaller than infinity may suffer performance problems due to | |||
| slow cwnd growth in high speed networks. For non-paced TCP | slow cwnd growth in high-speed networks. For non-paced TCP | |||
| implementations, L values smaller than 8 may suffer performance | implementations, L values smaller than 8 may suffer performance | |||
| problems due to slow cwnd growth in high speed networks; L values | problems due to slow cwnd growth in high-speed networks; L values | |||
| larger than 8 may cause an increase in burstiness and thereby loss | larger than 8 may cause an increase in burstiness and thereby loss | |||
| rates, and result in poor performance. | rates, and result in poor performance. | |||
| An implementation SHOULD use HyStart++ only for the initial slow | An implementation SHOULD use HyStart++ only for the initial slow | |||
| start (when ssthresh is at its initial value of arbitrarily high per | start (when the ssthresh is at its initial value of arbitrarily high | |||
| [RFC5681]) and fall back to using traditional slow start for the | per [RFC5681]) and fall back to using standard slow start for the | |||
| remainder of the connection lifetime. This is acceptable because | remainder of the connection lifetime. This is acceptable because | |||
| subsequent slow starts will use the discovered ssthresh value to exit | subsequent slow starts will use the discovered ssthresh value to exit | |||
| slow start and avoid the overshoot problem. An implementation MAY | slow start and avoid the overshoot problem. An implementation MAY | |||
| use HyStart++ to grow the restart window ([RFC5681]) after a long | use HyStart++ to grow the restart window [RFC5681] after a long idle | |||
| idle period. | period. | |||
| In application limited scenarios, the amount of data in flight could | In application-limited scenarios, the amount of data in flight could | |||
| fall below the bandwidth-delay product (BDP) and result in smaller | fall below the bandwidth-delay product (BDP) and result in smaller | |||
| RTT samples which can trigger an exit back to slow start. It is | RTT samples, which can trigger an exit back to slow start. It is | |||
| expected that a connection might oscillate between CSS and slow start | expected that a connection might oscillate between CSS and slow start | |||
| in such scenarios. But this behavior will neither result in a | in such scenarios. But this behavior will neither result in a | |||
| connection prematurely entering congestion avoidance nor cause | connection prematurely entering congestion avoidance nor cause | |||
| overshooting compared to slow start. | overshooting compared to slow start. | |||
| 5. Deployments and Performance Evaluations | 5. Deployments and Performance Evaluations | |||
| As of February 2023, HyStart++ as described in this document has been | At the time of this writing, HyStart++ as described in this document | |||
| default enabled for all TCP connections in the Windows operating | has been default enabled for all TCP connections in the Windows | |||
| system for over two years with pacing disabled and an actual L = 8. | operating system for over two years with pacing disabled and an | |||
| | actual L = 8. | |||
| In lab measurements with Windows TCP, HyStart++ shows both goodput | In lab measurements with Windows TCP, HyStart++ shows goodput | |||
| improvements as well as reductions in packet loss and retransmissions | improvements as well as reductions in packet loss and retransmissions | |||
| compared to traditional slow start. For example, across a variety of | compared to standard slow start. For example, across a variety of | |||
| tests on a 100 Mbps link with a bottleneck buffer size of bandwidth- | tests on a 100 Mbps link with a bottleneck buffer size of bandwidth- | |||
| delay product, HyStart++ reduces bytes retransmitted by 50% and | delay product, HyStart++ reduces bytes retransmitted by 50% and | |||
| retransmission timeouts (RTOs) by 36%. | retransmission timeouts (RTOs) by 36%. | |||
| In an A/B test where we compare HyStart++ draft 01 to traditional | In an A/B test where we compared an implementation of HyStart++ | |||
| slow start across a large Windows device population, out of 52 | (based on an earlier draft version of this document) to standard slow | |||
| billion TCP connections, 0.7% of connections move from 1 RTO to 0 | start across a large Windows device population, out of 52 billion TCP | |||
| RTOs and another 0.7% connections move from 2 RTOs to 1 RTO with | connections, 0.7% of connections move from 1 RTO to 0 RTOs and | |||
| HyStart++. This test did not focus on send-heavy connections and the | another 0.7% of connections move from 2 RTOs to 1 RTO with HyStart++. | |||
| impact on send-heavy connections is likely much higher. We plan to | This test did not focus on send-heavy connections, and the impact on | |||
| conduct more such production experiments to gather more data in the | send-heavy connections is likely much higher. We plan to conduct | |||
| future. | more such production experiments to gather more data in the future. | |||
| 6. Security Considerations | 6. Security Considerations | |||
| HyStart++ enhances slow start and inherits the general security | HyStart++ enhances slow start and inherits the general security | |||
| considerations discussed in [RFC5681]. | considerations discussed in [RFC5681]. | |||
| An attacker can cause Hystart++ to exit slow start prematurely and | An attacker can cause HyStart++ to exit slow start prematurely and | |||
| impair the performance of a TCP connection by, for example, dropping | impair the performance of a TCP connection by, for example, dropping | |||
| data packets or their acknowledgements. | data packets or their acknowledgments. | |||
| The ACK division attack outlined in [SCWA99] does not affect | The ACK division attack outlined in [SCWA99] does not affect | |||
| Hystart++ because the congestion window increase in Hystart++ is | HyStart++ because the congestion window increase in HyStart++ is | |||
| based on the number of bytes newly acknowledged in each arriving ACK | based on the number of bytes newly acknowledged in each arriving ACK | |||
| rather than by a particular constant on each arriving ACK. | rather than by a particular constant on each arriving ACK. | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| This document has no actions for IANA. | This document has no IANA actions. | |||
| 8. Acknowledgements | ||||
| During the discussions of this work on the TCPM mailing list, in | ||||
| working group meetings, helpful comments, critiques, and reviews were | ||||
| received from (listed alphabetically by last name): Mark Allman, Bob | ||||
| Briscoe, Neal Cardwell, Yuchung Cheng, Junho Choi, Martin Duke, Reese | ||||
| Enghardt, Christian Huitema, Ilpo Järvinen, Yoshifumi Nishida, | ||||
| Randall Stewart, and Michael Tuexen. | ||||
| 9. References | 8. References | |||
| 9.1. Normative References | 8.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
| Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | |||
| <https://www.rfc-editor.org/info/rfc5681>. | <https://www.rfc-editor.org/info/rfc5681>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| 9.2. Informative References | 8.2. Informative References | |||
| [ASA00] Aggarwal, A., Savage, S., and T. Anderson, "Understanding | [ASA00] Aggarwal, A., Savage, S., and T. Anderson, "Understanding | |||
| the Performance of TCP Pacing", Proceedings IEEE INFOCOM | the performance of TCP pacing", Proceedings IEEE INFOCOM | |||
| 2000, DOI 10.1109/INFCOM.2000.832483, 2000, | 2000, DOI 10.1109/INFCOM.2000.832483, March 2000, | |||
| <https://doi.org/10.1109/INFCOM.2000.832483>. | <https://doi.org/10.1109/INFCOM.2000.832483>. | |||
| [HyStart] Ha, S. and I. Ree, "Taming the elephants: New TCP slow | [HyStart] Ha, S. and I. Rhee, "Taming the elephants: New TCP slow | |||
| start", Computer Networks vol. 55, no. 9, pp. 2092-2110, | start", Computer Networks vol. 55, no. 9, pp. 2092-2110, | |||
| DOI 10.1016/j.comnet.2011.01.014, 2011, | DOI 10.1016/j.comnet.2011.01.014, June 2011, | |||
| <https://doi.org/10.1016/j.comnet.2011.01.014>. | <https://doi.org/10.1016/j.comnet.2011.01.014>. | |||
| [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - | [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - | |||
| Communication Layers", STD 3, RFC 1122, | Communication Layers", STD 3, RFC 1122, | |||
| DOI 10.17487/RFC1122, October 1989, | DOI 10.17487/RFC1122, October 1989, | |||
| <https://www.rfc-editor.org/info/rfc1122>. | <https://www.rfc-editor.org/info/rfc1122>. | |||
| [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | |||
| DOI 10.17487/RFC1191, November 1990, | DOI 10.17487/RFC1191, November 1990, | |||
| <https://www.rfc-editor.org/info/rfc1191>. | <https://www.rfc-editor.org/info/rfc1191>. | |||
| skipping to change at page 9, line 37 | skipping to change at line 411 | |||
| [RFC9002] Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection | [RFC9002] Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection | |||
| and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, | and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, | |||
| May 2021, <https://www.rfc-editor.org/info/rfc9002>. | May 2021, <https://www.rfc-editor.org/info/rfc9002>. | |||
| [RFC9260] Stewart, R., Tüxen, M., and K. Nielsen, "Stream Control | [RFC9260] Stewart, R., Tüxen, M., and K. Nielsen, "Stream Control | |||
| Transmission Protocol", RFC 9260, DOI 10.17487/RFC9260, | Transmission Protocol", RFC 9260, DOI 10.17487/RFC9260, | |||
| June 2022, <https://www.rfc-editor.org/info/rfc9260>. | June 2022, <https://www.rfc-editor.org/info/rfc9260>. | |||
| [SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | [SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | |||
| "TCP congestion control with a misbehaving receiver", ACM | "TCP congestion control with a misbehaving receiver", ACM | |||
| Computer Communication Review, 29(5), | SIGCOMM Computer Communication Review, vol. 29, issue 5, | |||
| DOI 10.1145/505696.505704, 1999, | pp. 71-78, DOI 10.1145/505696.505704, October 1999, | |||
| <https://doi.org/10.1145/505696.505704>. | <https://doi.org/10.1145/505696.505704>. | |||
| Acknowledgments | ||||
| During the discussions of this work on the TCPM mailing list and in | ||||
| working group meetings, helpful comments, critiques, and reviews were | ||||
| received from (listed alphabetically by last name) Mark Allman, Bob | ||||
| Briscoe, Neal Cardwell, Yuchung Cheng, Junho Choi, Martin Duke, Reese | ||||
| Enghardt, Christian Huitema, Ilpo Järvinen, Yoshifumi Nishida, | ||||
| Randall Stewart, and Michael Tüxen. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Praveen Balasubramanian | Praveen Balasubramanian | |||
| Confluent | Confluent | |||
| 899 West Evelyn Ave | 899 West Evelyn Ave | |||
| Mountain View, CA 94041 | Mountain View, CA 94041 | |||
| United States of America | United States of America | |||
| Email: pravb.ietf@gmail.com | Email: pravb.ietf@gmail.com | |||
| Yi Huang | Yi Huang | |||
| Microsoft | Microsoft | |||
| One Microsoft Way | One Microsoft Way | |||
| Redmond, WA 94052 | Redmond, WA 98052 | |||
| United States of America | United States of America | |||
| Phone: +1 425 703 0447 | Phone: +1 425 703 0447 | |||
| Email: huanyi@microsoft.com | Email: huanyi@microsoft.com | |||
| Matt Olson | Matt Olson | |||
| Microsoft | Microsoft | |||
| One Microsoft Way | ||||
| Redmond, WA 98052 | ||||
| United States of America | ||||
| Phone: +1 425 538 8598 | Phone: +1 425 538 8598 | |||
| Email: maolson@microsoft.com | Email: maolson@microsoft.com | |||
| End of changes. 59 change blocks. | ||||
| 154 lines changed or deleted | 165 lines changed or added | |||