rfc9840xml2.original.xml | rfc9840.xml | |||
---|---|---|---|---|
<?xml version="1.0" encoding="US-ASCII"?> | <?xml version='1.0' encoding='UTF-8'?> | |||
<!-- edited with XMLSPY v5 rel. 3 U (http://www.xmlspy.com) | ||||
by Daniel M Kohn (private) --> | ||||
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [ | ||||
<!DOCTYPE rfc [ | ||||
<!ENTITY nbsp "&#160;"> | ||||
<!ENTITY zwsp "&#8203;"> | ||||
<!ENTITY nbhy "&#8209;"> | ||||
<!ENTITY wj "&#8288;"> | ||||
]> | ]> | |||
<rfc category="exp" docName="draft-irtf-iccrg-rledbat-10" | ||||
ipr="trust200902"> | ||||
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> | ||||
<?rfc toc="yes" ?> | <rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="exp" docName="draft-irtf-iccrg-rledbat-10" number="9840" consensus="true" ipr="trust200902" obsoletes="" updates="" submissionType="IRTF" xml:lang="en" tocInclude="true" symRefs="true" sortRefs="true" version="3"> | |||
<?rfc symrefs="yes" ?> | ||||
<?rfc sortrefs="yes"?> | ||||
<?rfc iprnotified="no" ?> | ||||
<?rfc strict="yes" ?> | ||||
<front> | <front> | |||
<title abbrev="rLEDBAT">rLEDBAT: receiver-driven Low Extra Delay Background | <title abbrev="rLEDBAT">rLEDBAT: Receiver-Driven Low Extra Delay Background | |||
Transport for TCP | Transport for TCP</title> | |||
</title> | <seriesInfo name="RFC" value="9840"/> | |||
<author fullname="Marcelo Bagnulo" initials="M." surname="Bagnulo"> | <author fullname="Marcelo Bagnulo" initials="M." surname="Bagnulo"> | |||
<organization>Universidad Carlos III de Madrid</organization> | <organization>Universidad Carlos III de Madrid</organization> | |||
<address> | <address> | |||
<email>marcelo@it.uc3m.es</email> | <email>marcelo@it.uc3m.es</email> | |||
</address> | </address> | |||
</author> | </author> | |||
<author fullname="Alberto Garcia-Martinez" initials="A." surname="Garcia-Martinez"> | <author fullname="Alberto García-Martínez" initials="A." surname="García-Martínez"> | |||
<organization>Universidad Carlos III de Madrid</organization> | <organization>Universidad Carlos III de Madrid</organization> | |||
<address> | <address> | |||
<email>alberto@it.uc3m.es</email> | <email>alberto@it.uc3m.es</email> | |||
</address> | </address> | |||
</author> | </author> | |||
<author fullname="Gabriel Montenegro" initials="G." surname="Montenegro"> | <author fullname="Gabriel Montenegro" initials="G." surname="Montenegro"> | |||
<address> | <address> | |||
<email>g.e.montenegro@hotmail.com</email> | <email>g.e.montenegro@hotmail.com</email> | |||
</address> | </address> | |||
</author> | </author> | |||
<author fullname="Praveen Balasubramanian " initials="P." surname="Balasubramanian"> | <author fullname="Praveen Balasubramanian " initials="P." surname="Balasubramanian"> | |||
<organization>Confluent</organization> | <organization>Confluent</organization> | |||
<address> | <address> | |||
<email>pravb.ietf@gmail.com</email> | <email>pravb.ietf@gmail.com</email> | |||
</address> | </address> | |||
</author> | </author> | |||
<date year="2025" month="September"/> | ||||
<date year="2025" /> | <workgroup>Internet Congestion Control</workgroup> | |||
<keyword>Congestion control</keyword> | ||||
<keyword>scavenger/less-than-best-effort traffic</keyword> | ||||
<abstract> | <abstract> | |||
<t> This document specifies rLEDBAT, a set of mechanisms that enable the execution of a less-than-best-effort congestion control algorithm for TCP at the receiver end. This document is a product of the Internet Congestion Control Research Group (ICCRG) of the Internet Research Task Force (IRTF). | <t> This document specifies receiver-driven Low Extra Delay Background Transport (rLEDBAT) -- a set of mechanisms that enable the execution of a less-than-best-effort congestion control algorithm for TCP at the receiver end. This document is a product of the Internet Congestion Control Research Group (ICCRG) of the Internet Research Task Force (IRTF). | |||
</t> | </t> | |||
</abstract> | </abstract> | |||
</front> | </front> | |||
<middle> | <middle> | |||
<section title="Introduction"> | <section numbered="true" toc="default"> | |||
<name>Introduction</name> | ||||
<t>LEDBAT (Low Extra Delay Background Transport) <xref target="RFC6817" /> is a congestion-control algorithm used for less-than-best-effort (LBE) traffic.</t> | <t>LEDBAT (Low Extra Delay Background Transport) <xref target="RFC6817" format="default"/> is a congestion control algorithm used for less-than-best-effort (LBE) traffic.</t> | |||
<t>When LEDBAT traffic shares a bottleneck with other traffic using standard congestion control algorithms (for example, TCP traffic using Cubic<xref target="RFC9438" />, hereafter referred as standard-TCP for short), it reduces its sending rate earlier and more aggressively than standard-TCP congestion control, allowing other non-background traffic to use more of the available capacity. In the absence of competing traffic, LEDBAT aims to make an efficient use of the available capacity, while keeping the queuing delay within predefined bounds.</t> | <t>When LEDBAT traffic shares a bottleneck with other traffic using standard congestion control algorithms (for example, TCP traffic using CUBIC <xref target="RFC9438" format="default"/>, hereafter referred to as "standard-TCP" for short), it reduces its sending rate earlier and more aggressively than standard-TCP congestion control, allowing other non-background traffic to use more of the available capacity. In the absence of competing traffic, LEDBAT aims to make efficient use of the available capacity, while keeping the queuing delay within predefined bounds.</t> | |||
<t>LEDBAT reacts both to packet loss and to variations in delay. With respect to packet loss, LEDBAT reacts with a multiplicative decrease, similar to most TCP congestion controllers. Regarding delay, LEDBAT aims for a target queueing delay. When the measured current queueing delay is below the target, LEDBAT increases the sending rate and when the delay is above the target, it reduces the sending rate. LEDBAT estimates the queuing delay by subtracting the measured current one-way delay from the estimated base one-way delay (i.e. the one-way delay in the absence of queues). </t> | <t>LEDBAT reacts to both packet loss and variations in delay. With respect to packet loss, LEDBAT reacts with a multiplicative decrease, similar to most TCP congestion controllers. Regarding delay, LEDBAT aims for a target queuing delay. When the measured current queuing delay is below the target, LEDBAT increases the sending rate, and when the delay is above the target, it reduces the sending rate. LEDBAT estimates the queuing delay by subtracting the measured current one-way delay from the estimated base one-way delay (i.e., the one-way delay in the absence of queues). </t> | |||
<t>The LEDBAT specification <xref target="RFC6817" /> defines the LEDBAT congestion-control algorithm, implemented in the sender to control its sending rate. LEDBAT is specified in a protocol and layer agnostic manner.</t> | <t>The LEDBAT specification <xref target="RFC6817" format="default"/> defines the LEDBAT congestion control algorithm, implemented in the sender to control its sending rate. LEDBAT is specified in a protocol-agnostic and layer-agnostic manner.</t> | |||
<t>LEDBAT++ <xref target="I-D.irtf-iccrg-ledbat-plus-plus" /> is also an LBE congestion control algorithm which is inspired by LEDBAT while addressing several problems identified with the original LEDBAT specification. In particular the differences between LEDBAT and LEDBAT++ include: i) LEDBAT++ uses the round-trip-time (RTT) (as opposed to the one way delay used in LEDBAT) to estimate the queuing delay; ii) LEDBAT++ uses an Additive Increase/Multiplicative Decrease algorithm to achieve inter-LEDBAT++ fairness and avoid the late-comer advantage observed in LEDBAT; iii) LEDBAT++ performs periodic slowdowns to improve the measurement of the base delay; iv) LEDBAT++ is defined for TCP.</t> | <t>LEDBAT++ <xref target="I-D.irtf-iccrg-ledbat-plus-plus" format="default"/> is also an LBE congestion control algorithm that is inspired by LEDBAT while addressing several problems identified with the original LEDBAT specification. In particular, the differences between LEDBAT and LEDBAT++ include the following:</t> | |||
<t>In this specification, we describe rLEDBAT, a set of mechanisms that enable the execution of an LBE delay-based congestion control algorithm such as LEDBAT or LEDBAT++ at the receiver end of a TCP connection.</t>
<t> The consensus of the Internet Congestion Control Research Group (ICCRG) is to publish this document to encourage further experimentation and review of rLEDBAT. This document is not an IETF product and is not a standard. The status of this document is experimental. In section 4 titled Experiment Considerations, we describe the purpose of the experiment and its current status. </t>
</section> | ||||
<section title="Motivations for rLEDBAT"> | ||||
<t>rLEDBAT enables new use cases and new deployment models, fostering the | ||||
use of LBE traffic. The following scenarios are enabled by rLEDBAT: | ||||
<list> | ||||
<t>Content Delivery Networks and more sophisticated file distribution scenarios: Consider the case where the source of a file to be distributed (e.g., a software developer that wishes to distribute a software update) would prefer to use LBE and it enables LEDBAT/LEDBAT++ in the servers containing the source file. However, because the file is being distributed through a CDN that does not implement LBE congestion control, the result is that the file transfers originated from CDN surrogates will not be using LBE. Interestingly enough, in the case of the software update, the developer may also control the software performing the download in the client, the receiver of the file, but because current LEDBAT/LEDBAT++ are sender-based algorithms, controlling the client is not enough to enable LBE congestion control in the communication. rLEDBAT would enable the use of LBE traffic class for file distribution in this setup. </t>
<t>Interference from proxies and other middleboxes: Proxies and other middleboxes are commonplace in the Internet. For instance, in the case of mobile networks, proxies are frequently used. In the case of enterprise networks, it is common to deploy corporate proxies for filtering and firewalling. In the case of satellite links, Performance Enhancement Proxies (PEPs) are deployed to mitigate the effect of the long delay in TCP connection. These proxies terminate the TCP connection on both ends and prevent the use of LBE congestion control in the segment between the proxy and the sink of the content, the client. By enabling rLEDBAT, clients would be able to enable LBE traffic between them and the proxy.</t>
<t>Receiver-defined preferences. It is frequent that the bottleneck of the communication is the access link. This is particularly true in the case of mobile devices. It is then especially relevant for mobile devices to properly manage the capacity of the access link. With current technologies, it is possible for the mobile device to use different congestion control algorithms expressing different preferences for the traffic. For instance, a device can choose to use standard-TCP for some traffic and to use LEDBAT/LEDBAT++ for other traffic. However, this would only affect the outgoing traffic since both standard-TCP and LEDBAT/LEDBAT++ are sender-driven. The mobile device has no means to manage the traffic in the down-link, which is in most cases, the communication bottleneck for a typical eye-ball end-user. rLEDBAT enables the mobile device to selectively use LBE traffic class for some of the incoming traffic. For instance, by using rLEDBAT, a user can use regular standard-TCP/UDP for video stream (e.g., Youtube) and use rLEDBAT for other background file download.</t>
</list></t> | ||||
</section> | ||||
<section title="rLEDBAT mechanisms"> | ||||
<t>rLEDBAT provides the mechanisms to implement an LBE congestion control algorithm at the receiver-end of a TCP connection. The rLEDBAT receiver controls the sender's rate through the Receive Window announced by the receiver in the TCP header.</t>
<t>rLEDBAT assumes that the sender is a standard TCP sender. rLEDBAT does not require any rLEDBAT-specific modifications to the TCP sender. The envisioned deployment model for rLEDBAT is that the clients implement rLEDBAT and this enables rLEDBAT in communications with existent standard TCP senders. In particular, the sender MUST implement <xref target="RFC9293" /> and it also MUST implement the Time Stamp Option as defined in <xref target="RFC7323" />. Also, the sender should implement some of the standard congestion control mechanisms, such as Cubic <xref target="RFC9438" /> or New Reno <xref target="RFC5681" />. </t>
<t>rLEDBAT does not define a new congestion control algorithm. The LBE congestion control algorithm executed in the rLEDBAT receiver is defined in other documents. The rLEDBAT receiver MUST use an LBE congestion control algorithm. Because rLEDBAT assumes a standard TCP sender, the sender will be using a "best effort" congestion control algorithm (such as Cubic or New Reno). Since rLEDBAT uses the Receive Window to control the sender's rate and the sender calculates the sender's window as the minimum of the Receive window and the congestion window, rLEDBAT will only be effective as long as the congestion control algorithm executed in the receiver yields a smaller window than the one calculated by the sender. This is normally the case when the receiver is using an LBE congestion control algorithm. The rLEDBAT receiver SHOULD use the LEDBAT congestion control algorithm <xref target="RFC6817" /> or the LEDBAT++ congestion control algorithm <xref target="I-D.irtf-iccrg-ledbat-plus-plus" />. The rLEDBAT MAY use other LBE congestion control algorithms defined elsewhere. Irrespective of which congestion control algorithm is executed in the receiver, an rLEDBAT connection will never be more aggressive than standard-TCP since it is always bounded by the congestion control algorithm executed at the sender.</t>
<t>rLEDBAT is essentially composed of three types of mechanisms, namely, those that provide the means to measure the packet delay (either the round trip time or the one way delay, depending on the selected algorithm), mechanisms to detect packet loss and the means to manipulate the Receive Window to control the sender's rate. The former provide input to the LBE congestion control algorithm while the latter uses the congestion window computed by the LBE congestion control algorithm to manipulate the Receive window, as depicted in the figure.</t>
<figure title="The rLEDBAT architecture."> | <ol spacing="normal" type="%i)"> | |||
<artwork align="center"><![CDATA[ | <li>LEDBAT++ uses the round-trip time (RTT) (as opposed to the one-way delay used in LEDBAT) to estimate the queuing delay.</li> | |||
<li>LEDBAT++ uses an additive increase/multiplicative decrease algorithm to achieve inter-LEDBAT++ fairness and avoid the latecomer advantage observed in LEDBAT.</li>
<li>LEDBAT++ performs periodic slowdowns to improve the measurement of the base delay.</li>
<li>LEDBAT++ is defined for TCP.</li> | ||||
</ol> | ||||
<t>In this specification, we describe receiver-driven Low Extra Delay Background Transport (rLEDBAT) -- a set of mechanisms that enable the execution of an LBE delay-based congestion control algorithm such as LEDBAT or LEDBAT++ at the receiver end of a TCP connection.</t>
<t> The consensus of the Internet Congestion Control Research Group (ICCRG) is to publish this document to encourage further experimentation and review of rLEDBAT. This document is not an IETF product and is not an Internet Standards Track specification. The status of this document is Experimental. In <xref target="sect-5" format="default"/> ("<xref target="sect-5" format="title"/>"), we describe the purpose of the experiment and its current status. </t>
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Conventions and Terminology</name> | ||||
<t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", | ||||
"<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", | ||||
"<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>", | ||||
"<bcp14>SHOULD NOT</bcp14>", | ||||
"<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>", | ||||
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document | ||||
are to be interpreted as described in BCP 14 | ||||
<xref target="RFC2119"/> <xref target="RFC8174"/> when, and only | ||||
when, they appear in all capitals, as shown here.</t> | ||||
<t>We use the following abbreviations throughout the text and include them | ||||
here for the reader's convenience:</t> | ||||
<dl spacing="normal" newline="false"> | ||||
<dt>RCV.WND:</dt><dd>The value included in the Receive Window field of | ||||
the TCP header (the computation of which is modified by its | ||||
specification).</dd> | ||||
<dt>SND.WND:</dt><dd>The TCP sender's window.</dd> | ||||
<dt>cwnd:</dt><dd>The congestion window as computed by the congestion | ||||
control algorithm running at the TCP sender.</dd> | ||||
<dt>RLWND:</dt><dd>The window value calculated by the rLEDBAT algorithm. | ||||
</dd> | ||||
<dt>fcwnd:</dt><dd>The value that a standard-TCP receiver compliant with | ||||
<xref target="RFC9293"/> | ||||
calculates to set in the receive window for flow control | ||||
purposes.</dd> | ||||
<dt>RCV.HGH:</dt><dd>The highest sequence number corresponding to a | ||||
received byte of data at one point in time.</dd> | ||||
<dt>TSV.HGH:</dt><dd>The Timestamp Value (TSval) <xref target="RFC7323" | ||||
format="default"/> corresponding to the | ||||
segment in which RCV.HGH was carried at that point in time.</dd> | ||||
<dt>SEG.SEQ:</dt><dd>The sequence number of the last received segment.</dd>
<dt>TSV.SEQ:</dt><dd>The TSval of the last received segment.</dd> | ||||
</dl> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>Motivations for rLEDBAT</name> | ||||
<t>rLEDBAT enables new use cases and new deployment models, fostering the | ||||
use of LBE traffic. The following scenarios are enabled by rLEDBAT: | ||||
</t> | ||||
<dl spacing="normal" newline="true"> | ||||
<dt>Content Delivery Networks (CDNs) and more sophisticated file distribution scenarios:</dt>
<dd>Consider the case where the source of a file to be distributed (e.g., a software developer that wishes to distribute a software update) would prefer to use LBE and enables LEDBAT/LEDBAT++ in the servers containing the source file. However, because the file is being distributed through a CDN that does not implement LBE congestion control, the result is that the file transfers originated from CDN surrogates will not be using LBE. Interestingly enough, in the case of the software update, the developer may also control the software performing the download in the client (the receiver of the file), but because current LEDBAT/LEDBAT++ are sender-based algorithms, controlling the client is not enough to enable LBE congestion control in the communication. rLEDBAT would enable the use of an LBE traffic class for file distribution in this setup.</dd>
<dt>Interference from proxies and other middleboxes:</dt> | ||||
<dd>Proxies and other middleboxes are commonplace in the Internet. For instance, in the case of mobile networks, proxies are frequently used. In the case of enterprise networks, it is common to deploy corporate proxies for filtering and firewalling. In the case of satellite links, Performance Enhancing Proxies (PEPs) are deployed to mitigate the effect of long delays in a TCP connection. These proxies terminate the TCP connection on both ends and prevent the use of LBE congestion control in the segment between the proxy and the sink of the content, the client. By enabling rLEDBAT, clients can then enable LBE traffic between them and the proxy.</dd>
<dt>Receiver-defined preferences:</dt> | ||||
<dd>Frequently, the access link is the communication bottleneck. This is particularly true in the case of mobile devices. It is then especially relevant for mobile devices to properly manage the capacity of the access link. With current technologies, it is possible for the mobile device to use different congestion control algorithms expressing different preferences for the traffic. For instance, a device can choose to use standard-TCP for some traffic and use LEDBAT/LEDBAT++ for other traffic. However, this would only affect the outgoing traffic, since both standard-TCP and LEDBAT/LEDBAT++ are driven by the sender. The mobile device has no means to manage the traffic in the downlink, which is, in most cases, the communication bottleneck for a typical "eyeball" end user. rLEDBAT enables the mobile device to selectively use an LBE traffic class for some of the incoming traffic. For instance, by using rLEDBAT, a user can use regular standard-TCP/UDP for a video stream (e.g., YouTube) and use rLEDBAT for other background file downloads.</dd>
</dl> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>rLEDBAT Mechanisms</name> | ||||
<t>rLEDBAT provides the mechanisms to implement an LBE congestion control algorithm at the receiver end of a TCP connection. The rLEDBAT receiver controls the sender's rate through the receive window announced by the receiver in the TCP header.</t>
<t>rLEDBAT assumes that the sender is a standard-TCP sender. rLEDBAT does not require any rLEDBAT-specific modifications to the TCP sender. The envisioned deployment model for rLEDBAT is that the clients implement rLEDBAT and this enables rLEDBAT in communications with existing standard-TCP senders. In particular, the sender <bcp14>MUST</bcp14> implement <xref target="RFC9293" format="default"/> and also <bcp14>MUST</bcp14> implement the TCP Timestamps (TS) option as defined in <xref target="RFC7323" format="default"/>. Also, the sender should implement some of the standard congestion control mechanisms, such as CUBIC <xref target="RFC9438" format="default"/> or NewReno <xref target="RFC5681" format="default"/> <xref target="RFC6582"/>.</t>
<t>rLEDBAT does not define a new congestion control algorithm. The LBE congestion control algorithm executed in the rLEDBAT receiver is defined in other documents. The rLEDBAT receiver <bcp14>MUST</bcp14> use an LBE congestion control algorithm. Because rLEDBAT assumes a standard-TCP sender, the sender will be using a "best effort" congestion control algorithm (such as CUBIC or NewReno). Since rLEDBAT uses the receive window to control the sender's rate and the sender calculates the sender's window as the minimum of the receive window and the congestion window, rLEDBAT will only be effective as long as the congestion control algorithm executed in the receiver yields a smaller window than the one calculated by the sender. This is normally the case when the receiver is using an LBE congestion control algorithm. The rLEDBAT receiver <bcp14>SHOULD</bcp14> use the LEDBAT congestion control algorithm <xref target="RFC6817" format="default"/> or the LEDBAT++ congestion control algorithm <xref target="I-D.irtf-iccrg-ledbat-plus-plus" format="default"/>. rLEDBAT <bcp14>MAY</bcp14> use other LBE congestion control algorithms defined elsewhere. Irrespective of which congestion control algorithm is executed in the receiver, a rLEDBAT connection will never be more aggressive than standard-TCP, since it is always bounded by the congestion control algorithm executed at the sender.</t>
<t>rLEDBAT is essentially composed of three types of mechanisms, namely those that provide the means to measure the packet delay (either the RTT or the one-way delay, depending on the selected algorithm), mechanisms to detect packet loss, and the means to manipulate the receive window to control the sender's rate. The first two provide input to the LBE congestion control algorithm, while the third uses the congestion window computed by the LBE congestion control algorithm to manipulate the receive window, as depicted in <xref target="fig1"/>.</t>
<figure anchor="fig1"> | ||||
<name>The rLEDBAT Architecture</name> | ||||
<artwork align="center" name="" type="" alt=""><![CDATA[ | ||||
+------------------------------------------+ | +------------------------------------------+ | |||
| TCP receiver | | | TCP Receiver | | |||
| +-----------------+ | | | +-----------------+ | | |||
| | +------------+ | | | | | +------------+ | | | |||
| +---------------------| RTT | | | | | +---------------------| RTT | | | | |||
| | | | Estimation | | | | | | | | Estimation | | | | |||
| | | +------------+ | | | | | | +------------+ | | | |||
| | | | | | | | | | | | |||
| | | +------------+ | | | | | | +------------+ | | | |||
| | +--------------| Loss, RTX | | | | | | +--------------| Loss, RTX | | | | |||
| | | | | Detection | | | | | | | | | Detection | | | | |||
| | | | +------------+ | | | | | | | +------------+ | | | |||
| v v | | | | | v v | | | | |||
| +----------------+ | | | | | +----------------+ | | | | |||
| | LBE Congestion | | rLEDBAT | | | | | LBE Congestion | | rLEDBAT | | | |||
| | Control | | | | | | | Control | | | | | |||
| +----------------+ | | | | | +----------------+ | | | | |||
| | | +------------+ | | | | | | +------------+ | | | |||
| | | | RCV-WND | | | | | | | | RCV.WND | | | | |||
| +---------------->| Control | | | | | +---------------->| Control | | | | |||
| | +------------+ | | | | | +------------+ | | | |||
| +-----------------+ | | | +-----------------+ | | |||
+------------------------------------------+ | +------------------------------------------+ | |||
]]></artwork> | ]]></artwork> | |||
</figure> | ||||
<t>We next describe each of the rLEDBAT components.</t> | ||||
<section numbered="true" toc="default"> | ||||
<name>Controlling the Receive Window</name> | ||||
<t>rLEDBAT uses the TCP receive window (RCV.WND) to enable the receiver to control the sender's rate. <xref target="RFC9293" format="default"/> specifies that the RCV.WND is used to announce the available receive buffer to the sender for flow control purposes. In order to avoid confusion, we will call fcwnd the value that a standard-TCP receiver compliant with <xref target="RFC9293"/> calculates to set in the receive window for flow control purposes. We call RLWND the window value calculated by the rLEDBAT algorithm, and we call RCV.WND the value actually included in the Receive Window field of the TCP header. For a receiver compliant with <xref target="RFC9293"/>, RCV.WND == fcwnd.</t>
<t>In the case of the rLEDBAT receiver, this receiver <bcp14>MUST NOT</bcp14> set the RCV.WND to a value larger than fcwnd and <bcp14>SHOULD</bcp14> set the RCV.WND to the minimum of RLWND and fcwnd, honoring both.</t>
<t>When using rLEDBAT, two congestion controllers are in action in the flow of data from the sender to the receiver, namely the TCP congestion control algorithm on the sender side and the LBE congestion control algorithm executed in the receiver and conveyed to the sender through the RCV.WND. In the normal TCP operation, the sender uses the minimum of the cwnd and the RCV.WND to calculate the SND.WND. This is also true for rLEDBAT, as the sender is a regular TCP sender. This guarantees that the rLEDBAT flow will never transmit more aggressively than a standard-TCP flow, as the sender's congestion window limits the sending rate. Moreover, because an LBE congestion control algorithm such as LEDBAT/LEDBAT++ is designed to react earlier and more aggressively to congestion than regular TCP congestion control, the RLWND contained in the TCP RCV.WND field will generally be smaller than the congestion window calculated by the TCP sender, implying that the rLEDBAT congestion control algorithm will be effectively controlling the sender's window. One exception to this scenario is that at the beginning of the connection, when there is no information to set RLWND, RLWND is set to its maximum value, so that the sending rate of the sender is governed by the flow control algorithm of the receiver and the TCP slow start mechanism of the sender.</t>
<t>In summary, the sender's window is SND.WND = min(cwnd, RLWND, fcwnd)</t>
<section numbered="true" toc="default"> | ||||
<name>Avoiding Window Shrinking</name> | ||||
</figure> | <t>The LEDBAT/LEDBAT++ algorithm executed in a rLEDBAT receiver increases or decreases RLWND according to congestion signals (variations in the estimated queuing delay and packet loss). If RLWND is decreased and directly announced in RCV.WND, this could lead to an announced window that is smaller than what is currently in use. This so-called "shrinking the window" is discouraged as per <xref target="RFC9293" format="default"/>, as it may cause unnecessary packet loss and performance penalties. To be consistent with <xref target="RFC9293" format="default"/>, the rLEDBAT receiver <bcp14>SHOULD NOT</bcp14> shrink the receive window. </t> | |||
<t>We describe each of the rLEDBAT components next.</t> | ||||
<section title="Controlling the receive window"> | ||||
<t>rLEDBAT uses the Receive Window (RCV.WND) of TCP to enable the receiver to control the sender's rate. <xref target="RFC9293" /> defines that the RCV.WND is used to announce the available receive buffer to the sender for flow control purposes. In order to avoid confusion, we will call fcwnd the value that a standard RFC793bis TCP receiver calculates to set in the receive window for flow control purposes. We call RLWND the window value calculated by rLEDBAT algorithm and we call RCV.WND the value actually included in the Receive Window field of the TCP header. For a RFC793bis receiver, RCV.WND == fcwnd.</t> | <t>In order to avoid window shrinking, the receiver <bcp14>MUST</bcp14> only reduce RCV.WND by the number of bytes contained in a received data packet. This may fall short to honor the new calculated value of the RLWND immediately. However, the receiver <bcp14>SHOULD</bcp14> progressively reduce the advertised RCV.WND, always honoring that the reduction is less than or equal to the received bytes, until the target window determined by the rLEDBAT algorithm is reached. This implies that it may take up to one RTT for the rLEDBAT receiver to drain enough in-flight bytes to completely close its receive window without shrinking it. This is sufficient to honor the window output from the LEDBAT/LEDBAT++ algorithms, since they are only allowed to perform at most one multiplicative decrease per RTT.</t> | |||
<t>In the case of rLEDBAT receiver, the rLEDBAT receiver MUST NOT set the RCV.WND to a value larger than fcwnd and it SHOULD set the RCV.WND to the minimum of RLWND and fcwnd, honoring both.</t> | ||||
<t>When using rLEDBAT, two congestion controllers are in action in the fl | </section> | |||
ow of data from the sender to the receiver, namely, the congestion control algor | <section numbered="true" toc="default"> | |||
ithm of TCP in the sender side and the LBE congestion control algorithm execute | <name>Setting the Window Scale Option</name> | |||
d in the receiver and conveyed to the sender through the RCV.WND. In the normal | <t>The Window Scale (WS) option <xref target="RFC7323" format="default | |||
TCP operation, the sender uses the minimum of the congestion window cwnd and the | "/> is a means to increase the maximum window size permitted by the receive wind | |||
receiver window RCV.WND to calculate the sender's window SND.WND. This is also | ow. The WS option defines a scale factor that restricts the granularity of the r | |||
true for rLEDBAT, as the sender is a regular TCP sender. This guarantees that th | eceive window that can be announced. This means that the rLEDBAT client will hav | |||
e rLEDBAT flow will never transmit more aggressively than a standard-TCP flow, a | e to accumulate the increases resulting from multiple received packets and only | |||
s the sender's congestion window limits the sending rate. Moreover, because a LB | convey a change in the window when the accumulated sum of increases is equal to | |||
E congestion control algorithm such as LEDBAT/LEDBAT++ is designed to react earl | or higher than one increase step as imposed by the scaling factor according to t | |||
ier and more aggressively to congestion than regular TCP congestion control, the | he WS option in place for the TCP connection.</t> | |||
RLWND contained in the RCV.WND field of TCP will be in general smaller than the | <t>Changes in the receive window that are smaller than 1 MSS (Maximum | |||
congestion window calculated by the TCP sender, implying that the rLEDBAT conge | Segment Size) are unlikely to have any immediate impact on the sender's rate. As | |||
stion control algorithm will be effectively controlling the sender's window. On | usual, TCP's segmentation practice results in sending full segments (i.e., segm | |||
e exception to this is at the beginning of the connection, when there is no info | ents of size equal to the MSS). <xref target="RFC7323" format="default"/>, which | |||
rmation to set RLWND, then, RLWND is set to its maximum value, so that the sendi | defines the WS option, specifies that allowed values for the WS option are betw | |||
ng rate of the sender is governed by the flow control algorithm of the receiver | een 0 and 14. Assuming an MSS of around 1500 bytes, WS option values between 0 a | |||
and the TCP slow start mechanism of the sender.</t> | nd 11 result in the receive window being expressed in units that are about 1 MSS | |||
or smaller. So, WS option values between 0 and 11 have no impact in rLEDBAT (un | ||||
<t>In summary, the sender's window is: SND.WND = min(cwnd, RLWND, fcwnd)< | less packets smaller than the MSS are being exchanged).</t> | |||
/t> | <t>WS option values higher than 11 can affect the dynamics of rLEDBAT, | |||
since control may become too coarse (e.g., with a WS option value of 14, a chan | ||||
<section title="Avoiding window shrinking"> | ge in one unit of the receive window implies a change of 10 MSS in the effective | |||
window). | ||||
<t>The LEDBAT/LEDBAT++ algorithm executed in a rLEDBAT receiver i | </t> | |||
ncreases or decreases RLWND according to congestion signals (variations on the e | <t>For the above reasons, the rLEDBAT client <bcp14>SHOULD</bcp14> set | |||
stimated queueing delay and packet loss). | WS option values lower than 12. Additional experimentation is required to explo | |||
re the impact of larger WS values on rLEDBAT dynamics.</t> | ||||
If RLWND is decreased and directly announced in RCV.WND, | <t>Note that the recommendation for rLEDBAT to set the WS option value | |||
this could lead to an announced window that is smaller than what is currently in | s to lower values does not preclude communication with servers that set the WS o | |||
use. This so called 'shrinking the window' is discouraged as per <xref target=" | ption values to larger values, since WS option values are set independently for | |||
RFC9293" />, as it may cause unnecessary packet loss and performance penalty. To | each direction of the TCP connection.</t> | |||
be consistent with <xref target="RFC9293" />, the rLEDBAT receiver SHOULD NOT s | </section> | |||
hrink the receive window. </t> | </section> | |||
<section numbered="true" toc="default"> | ||||
<t>In order to avoid window shrinking, the receiver MUST only red | <name>Measuring Delays</name> | |||
uce RCV.WND by the number of bytes upon of a received data packet. This may fall | <t>Both LEDBAT and LEDBAT++ measure base and current delays to estimate | |||
short to honor the new calculated value of the RLWND immediately. However, the | the queuing delay. LEDBAT uses the one-way delay, while LEDBAT++ uses the RTT. I | |||
receiver SHOULD progressively reduce the advertised RCV.WND, always honoring tha | n the next sections, we describe how rLEDBAT mechanisms enable the receiver to m | |||
t the reduction is less or equal than the received bytes, until the target windo | easure the one-way delay or the RTT -- whichever is needed, depending on the con | |||
w determined by the rLEDBAT algorithm is reached. | gestion control algorithm used.</t> | |||
This implies that it may take up to one RTT for the rLEDBAT receiver to drain en | <section numbered="true" toc="default"> | |||
ough in-flight bytes to completely close its receive window without shrinking it | <name>Measuring RTT to Estimate the Queuing Delay</name> | |||
. This is sufficient to honor the window output from the LEDBAT/LEDBAT++ algorit | <t>LEDBAT++ uses the RTT to estimate the queuing delay. In order to es | |||
hms since they only allow to perform at most one multiplicative decrease per RTT | timate the queuing delay using RTT, the rLEDBAT receiver estimates the base RTT | |||
.</t> | (i.e., the constant components of RTT) and also measures the current RTT. By sub | |||
</section> | tracting these two values, we obtain the queuing delay to be used by the rLEDBAT | |||
controller.</t> | ||||
<section title="Setting the Window Scale Option"> | <t>LEDBAT++ discovers the base RTT (RTTb) by taking the minimum value | |||
of the measured RTTs over a period of time. The current RTT (RTTc) is estimated | ||||
<t>The Window Scale (WS) option <xref target="RFC7323" /> is a me | using a number of recent samples and applying a filter, such as the minimum (or | |||
ans to increase the maximum window size permitted by the Receive Window. The WS | the mean) of the last k samples. Using RTT to estimate the queuing delay has a n | |||
option defines a scale factor which restricts the granularity of the receive win | umber of shortcomings and difficulties, as discussed below.</t> | |||
dow that can be announced. This means that the rLEDBAT client will have to accum | <t>The queuing delay measured using RTT also includes the queuing dela | |||
ulate the increases resulting from multiple received packets, and only convey a | y experienced by the return packets in the direction from the rLEDBAT receiver t | |||
change in the window when the accumulated sum of increases is equal or higher th | o the sender. This is a fundamental limitation of this approach. The impact of t | |||
an one increase step as imposed by the scaling factor according to the WS option | his limitation is that the rLEDBAT controller will also react to congestion in t | |||
in place for the TCP connection.</t> | he reverse path direction, resulting in an even more conservative mechanism.</t> | |||
<t>In order to measure RTT, the rLEDBAT client <bcp14>MUST</bcp14> ena | ||||
<t>Changes in the receive window that are smaller than 1 MSS are | ble the TS option <xref target="RFC7323" format="default"/>. By matching the TSv | |||
unlikely to have any immediate impact on the sender's rate, as usual TCP's segme | al carried in outgoing packets with the Timestamp Echo Reply (TSecr) value <xref | |||
ntation practice results in sending full segments (i.e., segments of size equal | target="RFC7323" format="default"/> observed in incoming packets, it is possibl | |||
to the MSS). Current WS option specification <xref target="RFC7323" /> defines t | e to measure RTT. This allows the rLEDBAT receiver to measure RTT even if it is | |||
hat allowed values for the WS option are between 0 and 14. Assuming a MSS around | acting as a pure receiver. In a pure receiver, there is no data flowing from the | |||
1500 bytes, WS option values between 0 and 11 result in the receive window bein | rLEDBAT receiver to the sender, making it impossible to match data packets with | |||
g expressed in units that are about 1 MSS or smaller. So, WS option values betwe | Acknowledgment packets to measure RTT, in contrast to what is usually done in T | |||
en 0 and 11 have no impact in rLEDBAT (unless packets smaller than the MSS are b | CP for other purposes.</t> | |||
eing exchanged).</t> | <t>Depending on the frequency of the local clock used to generate the | |||
values included in the TS option, several packets may carry the same TSval. If t | ||||
<t>WS option values higher than 11 can affect the dynamics of rLE | hat happens, the rLEDBAT receiver will be unable to match the different outgoing | |||
DBAT, since control may become too coarse (e.g., with WS of 14, a change in one | packets carrying the same TSval with the different incoming packets also carryi | |||
unit of the receive window implies a change of 10 MSS in the effective window).< | ng the same TSecr value. However, it is not necessary for rLEDBAT to use all pac | |||
/t> | kets to estimate RTT, and sampling a subset of in-flight packets per RTT is enou | |||
<t>For the above reasons, the rLEDBAT client SHOULD set WS option | gh to properly assess the queuing delay. RTT <bcp14>MUST</bcp14> then be calcula | |||
values lower than 12. Additional experimentation is required to explore the imp | ted as the time since the first packet with a given TSval was sent and the first | |||
act of larger WS values on rLEDBAT dynamics.</t> | packet that was received with the same value contained in the TSecr. Other pack | |||
<t>Note that the recommendation for rLEDBAT to set the WS option | ets with repeated TS values <bcp14>SHOULD NOT</bcp14> be used for RTT calculatio | |||
value to lower values does not precludes the communication with servers that set | ns. </t> | |||
the WS option values to larger values, since the WS option value is set indepen | <t>Several issues must be addressed in order to avoid an artificial in | |||
dently for each direction of the TCP connection.</t> | crease in the observed RTT. Different issues emerge, depending on whether the | |||
</section> | rLEDBAT-capable host is sending data packets or pure ACKs to measure RTT. We nex | |||
</section> | t consider these issues separately.</t> | |||
<section numbered="true" toc="default"> | ||||
<section title="Measuring delays"> | <name>Measuring RTT When Sending Pure ACKs</name> | |||
<t>In this scenario, the rLEDBAT node (node A) sends a pure ACK to t | ||||
<t>Both LEDBAT and LEDBAT++ measure base and current delays to estimate t | he other endpoint of the TCP connection (node B), including the TS option. Upon | |||
he queueing delay. LEDBAT uses the one way delay while LEDBAT++ uses the round t | the reception of the TS option, host B will copy the value of the TSval into the | |||
rip time. In the next sections we describe how rLEDBAT mechanisms enable the rec | TSecr field of the TS option and include that option in the next data packet to | |||
eiver to measure the one way delay or the round trip time, whatever is needed de | wards host A. However, there are two reasons why B may not send a packet immedia | |||
pending on the congestion control algorithm used.</t> | tely back to A, artificially increasing the measured RTT. The first reason is wh | |||
en A has no data to send. | ||||
<section title="Measuring RTT to estimate the queueing delay"> | The second is when A has no available window to put more packets in flight. We n | |||
ext describe how each of these cases is addressed.</t> | ||||
<t>LEDBAT++ uses the round trip time (RTT) to estimate the queueing delay | <t>The case where host B has no data to send when it receives the pu | |||
. In order to estimate the queueing delay using RTT, the rLEDBAT receiver estima | re Acknowledgment is expected to be rare in the rLEDBAT use cases. rLEDBAT | |||
tes the base RTT (i.e., the constant components of RTT) and also measures the cu | will be used mostly for background file transfers, so the expected common case | |||
rrent RTT. By subtracting these two values, we obtain the queuing delay to be us | is that the sender will have data to send throughout the lifetime of the communi | |||
ed by the rLEDBAT controller.</t> | cation. However, if, for example, the file is structured in blocks of data, it m | |||
ay be the case that the sender will seldom have to wait until the next block is | ||||
<t>LEDBAT++ discovers the base RTT (RTTb) by taking the minimum value of | available to proceed with the data transfer. To address this situation, the filt | |||
the measured RTTs over a period of time. The current RTT (RTTc) is estimated usi | er used by the congestion control algorithm executed in the receiver <bcp14>SHOU | |||
ng a number of recent samples and applying a filter, such as the minimum (or the | LD</bcp14> discard outliers (e.g., a MIN filter <xref target="RFC6817"/> would a | |||
mean) of the last k samples. Using RTT to estimate the queueing delay has a num | chieve this) when measuring RTT using pure ACK packets.</t> | |||
ber of shortcomings and difficulties that we discuss next.</t> | <t>This limitation of the sender's window can come from either the T | |||
CP congestion window in host B or the announced receive window from rLEDBAT in h | ||||
<t>The queuing delay measured using RTT includes also the queueing delay | ost A. Normally, the receive window will be the one to limit the sender's transm | |||
experienced by the return packets in the direction from the rLEDBAT receiver to | ission rate, since the LBE congestion control algorithm used by the rLEDBAT node | |||
the sender. This is a fundamental limitation of this approach. The impact of thi | is designed to be more restrictive on the sender's rate than standard-TCP. If t | |||
s error is that the rLEDBAT controller will also react to congestion in the reve | he limiting factor is the congestion window in the sender, it is less relevant i | |||
rse path direction which results in an even more conservative mechanism.</t> | f rLEDBAT further reduces the receive window due to a bloated RTT measurement, s | |||
ince the rLEDBAT node is not actively controlling the sender's rate. Nevertheles | ||||
<t>In order to measure RTT, the rLEDBAT client MUST enable the Time Stamp | s, the proposed approach to discard larger samples would also address this issue | |||
(TS) option <xref target="RFC7323" />. By matching the TSVal value carried in o | .</t> | |||
utgoing packets with the TSecr value observed in incoming packets, it is possibl | <t>To address the case in which the limiting factor is the receive w | |||
e to measure RTT. This allows the rLEDBAT receiver to measure RTT even if it is | indow announced by rLEDBAT, the congestion control algorithm at the receiver <bc | |||
acting as a pure receiver. In a pure receiver there is no data flowing from the | p14>SHOULD</bcp14> discard RTT measurements during the window reduction phase th | |||
rLEDBAT receiver to the sender, making impossible to match data packets with ack | at are triggered by pure ACK packets. The rLEDBAT receiver is aware of whether a | |||
nowledgements packets to measure RTT, as it is usually done in TCP for other pur | given TSval was sent in a pure ACK packet where the window was reduced, and if | |||
poses.</t> | so, it can discard the corresponding RTT measurement. </t> | |||
</section> | ||||
<t>Depending on the frequency of the local clock used to generate the val | <section numbered="true" toc="default"> | |||
ues included in the TS option, several packets may carry the same TSVal value. I | <name>Measuring RTT When Sending Data Packets</name> | |||
f that happens, the rLEDBAT receiver will be unable to match the different outgo | <t>In the case that the rLEDBAT node is sending data packets and mat | |||
ing packets carrying the same TSVal value with the different incoming packets ca | ching them with pure ACKs to measure RTT, a factor that can artificially increas | |||
rrying also the same TSecr value. However, it is not necessary for rLEDBAT to us | e the RTT measured is the presence of delayed Acknowledgments. | |||
e all packets to estimate RTT and sampling a subset of in-flight packets per RTT | According to the TS option generation rules <xref target="RFC7323 | |||
is enough to properly assess the queueing delay. RTT MUST then be calculated as | " format="default"/>, | |||
the time since the first packet with a given TSVal was sent and the first packe | the value included in the TSecr for a delayed ACK is the one in t | |||
t that was received with the same value contained in the TSecr. Other packets wi | he TSval field of the earliest unacknowledged segment. | |||
th repeated TS values SHOULD NOT be used for RTT calculation. </t> | ||||
<t>Several issues must be addressed in order to avoid an artificial incre | ||||
ase of the observed RTT. Different issues emerge depending whether the rLEDBAT c | ||||
apable host is sending data packets or pure ACKs to measure RTT. We next conside | ||||
r the issues separately.</t> | ||||
<section title="Measuring RTT sending pure ACKs"> | ||||
<t>In this scenario, the rLEDBAT node (node A) sends a pure ACK t | ||||
o the other endpoint of the TCP connection (node B), including the TS option. Up | ||||
on the reception of the TS Option, host B will copy the value of the TSVal into | ||||
the TSecr field of the TS option and include that option into the next data pack | ||||
et towards host A. However, there are two reasons why B may not send a packet im | ||||
mediately back to A, artificially increasing the measured RTT. The first reason | ||||
is when A has no data to send. | ||||
The second is when A has no available window to put more packets in-flight. We d | ||||
escribe next how each of these cases is addressed.</t> | ||||
<t>The case where the host B has no data to send when it receives the pur | ||||
e Acknowledgement is expected to be rare in the rLEDBAT use cases. rLEDBAT will | ||||
be used mostly for background file transfers so the expected common case is that | ||||
the sender will have data to send throughout the lifetime of the communication. | ||||
However, if, for example, the file is structured in blocks of data, it may be t | ||||
he case that the sender seldomly will have to wait until the next block is avail | ||||
able to proceed with the data transfer. To address this situation, the filter us | ||||
ed by the congestion control algorithm executed in the receiver SHOULD discard o | ||||
utliers (e.g. a min filter would achieve this) when measuring RTT using pure ACK | ||||
packets.</t> | ||||
<t>This limitation of the sender's window can come either from the TCP co | ||||
ngestion window in host B or from the announced receive window from the rLEDBAT | ||||
in host A. Normally, the receive window will be the one to limit the sender's tr | ||||
ansmission rate, since the LBE congestion control algorithm used by the rLEDBAT | ||||
node is designed to be more restrictive on the sender's rate than standard-TCP. | ||||
If the limiting factor is the congestion window in the sender, it is less releva | ||||
nt if rLEDBAT further reduces the receive window due to a bloated RTT measuremen | ||||
t, since the rLEDBAT node is not actively controlling the sender's rate. Neverth | ||||
eless, the proposed approach to discard larger samples would also address this i | ||||
ssue.</t> | ||||
<t>To address the case in which the limiting factor is the receive window | ||||
announced by rLEDBAT, the congestion control algorithm at the receiver SHOULD d | ||||
iscard RTT measurements during the window reduction phase that are triggered by | ||||
pure ACK packets. The rLEDBAT receiver is aware whether a given TSVal value was | ||||
sent in a pure ACK packet where the window was reduced, and if so, it can discar | ||||
d the corresponding RTT measurement. </t> | ||||
</section> | ||||
<section title="Measuring RTT when sending data packets"> | ||||
<t>In the case that the rLEDBAT node is sending data packets and matching | ||||
them with pure ACKs to measure RTT, a factor that can artificially increase the | ||||
RTT measured is the presence of delayed Acknowledgements. | ||||
According to the TS option generation rules <xref target="RFC7323 | ||||
" />, | ||||
the value included in the TSecr for a delayed ACK is the one in t | ||||
he TSVal field of the earliest unacknowledged segment. | ||||
This may artificially increase the measured RTT. </t> | This may artificially increase the measured RTT. </t> | |||
<t>If both endpoints of the connection are sending data packets, Ack | ||||
<t>If both endpoints of the connection are sending data packets, | nowledgments are piggybacked onto the data packets and they are not delayed. Del | |||
Acknowledgments are piggybacked into the data packets and they are not delayed. | ayed ACKs only increase RTT measurements in the case that the sender has no data | |||
Delayed ACKs only increase RTT measurements in the case that the sender has no d | to send. Since the expected use case for rLEDBAT is that the sender will be sen | |||
ata to send. Since the expected use case for rLEDBAT is that the sender will be | ding background traffic to the rLEDBAT receiver, the cases where delayed ACKs in | |||
sending background traffic to the rLEDBAT receiver, the cases where delayed ACKs | crease the measured RTT are expected to be rare.</t> | |||
increase the measured RTT are expected to be rare.</t> | <t>Nevertheless, measurements based on data packets from the rLEDBAT | |||
<t>Nevertheless, measurements based on data packets from the rLED | node matching pure ACKs from the other end will result in an increased RTT samp | |||
BAT node matching pure ACKs from the other end will result in an increased RTT s | le. The additional increase in the measured RTT will be up to 500 ms. This is be | |||
ample. The additional increase in the measured RTT will be up to 500 ms. The rea | cause delayed ACKs are generated every second data packet received and not delay | |||
son for this is that delayed ACKs are generated every second data packet receive | ed more than 500 ms according to <xref target="RFC9293" format="default"/>. The | |||
d and not delayed more than 500 ms according to <xref target="RFC9293" />. The r | rLEDBAT receiver <bcp14>MAY</bcp14> discard RTT measurements done using data pac | |||
LEDBAT receiver MAY discard RTT measurements done using data packets from the rL | kets from the rLEDBAT receiver and matching pure ACKs, especially if it has rece | |||
EBDAT receiver and matching pure ACKs, especially if it has recent measurements | nt measurements done using other packet combinations. Applying a filter (e.g., a | |||
done using other packet combinations. Also, applying a filter that discards outl | MIN filter) that discards outliers would also address this issue.</t> | |||
iers would also address this issue (e.g. a min filter).</t> | </section> | |||
</section> | </section> | |||
</section> | <section numbered="true" toc="default"> | |||
<name>Measuring One-Way Delay to Estimate the Queuing Delay</name> | ||||
<section title="Measuring one way delay to estimate the queueing | <t>The LEDBAT algorithm uses the one-way delay of packets as input. A | |||
delay"> | TCP receiver can measure the delay of incoming packets directly (as opposed to t | |||
<t>The LEDBAT algorithm uses the one-way delay of packets as inpu | he sender-based LEDBAT, where the receiver measures the one-way delay and needs | |||
t. A TCP receiver can measure the delay of incoming packets directly (as opposed | to convey it to the sender).</t> | |||
to the sender-based LEDBAT, where the receiver measures the one-way delay and n | <t>In the case of TCP, the receiver can use the TS option to measure t | |||
eeds to convey it to the sender).</t> | he one-way delay by subtracting the timestamp contained in the incoming packet f | |||
<t>In the case of TCP, the receiver can use the TimeStamp option | rom the local time at which the packet has arrived. As noted in <xref target="RF | |||
to measure the one way delay by subtracting the timestamp contained in the incom | C6817" format="default"/>, the clock offset between the sender's clock and the r | |||
ing packet from the local time at which the packet has arrived. As noted in <xre | eceiver's clock does not affect the LEDBAT operation, since LEDBAT uses the diff | |||
f target="RFC6817" /> the clock offset between the clock of the sender and the c | erence between the base one-way delay and the current one-way delay to estimate | |||
lock in the receiver does not affect the LEDBAT operation, since LEDBAT uses the | the queuing delay, effectively "canceling out" the clock offset error in the que | |||
difference between the base one way delay and the current one way delay to esti | uing delay estimation. There are, however, two other issues that the rLEDBAT rec | |||
mate the queuing delay, effectively canceling the clock offset error in the queu | eiver needs to take into account in order to properly estimate the one-way delay | |||
eing delay estimation. There are however two other issues that the rLEDBAT recei | , namely the units in which the received timestamps are expressed and the clock | |||
ver needs to take into account in order to properly estimate the one way delay, | skew. These issues are addressed below.</t> | |||
namely, the units in which the received timestamps are expressed and the clock s | <t>In order to measure the one-way delay using TCP timestamps, the rLE | |||
kew. We address them next.</t> | DBAT receiver first needs to discover the units of values in the TS option and t | |||
hen needs to account for the skew between the two endpoint clocks. Note that a m | ||||
<t>In order to measure the one way delay using TCP timestamps, th | ismatch of 100 ppm (parts per million) in the estimation of the sender's clock r | |||
e rLEDBAT receiver, first, needs to discover the units of values in the TS optio | ate accounts for 6 ms of variation per minute in the measured delay. This is jus | |||
n and, second, needs to account for the skew between the two endpoint clocks. No | t one order of magnitude below the target delay set by rLEDBAT (or potentially m | |||
te that a mismatch of 100 ppm (parts per million) in the estimation of the sende | ore if the target is set to lower values, which is possible). Typical skew for u | |||
r's clock rate accounts for 6 ms of variation per minute in the measured delay. | ntrained clocks is reported to be around 100-200 ppm <xref target="RFC6817" form | |||
This just one order of magnitude below the target delay set by rLEDBAT (or poten | at="default"/>.</t> | |||
tially more if the target is set to lower values, which is possible). Typical sk | <t>In order to learn both the TS units and the clock skew, the rLEDBAT | |||
ew for untrained clocks is reported to be around 100-200 ppm <xref target="RFC68 | receiver measures how much local time has elapsed between two packets with diff | |||
17" />.</t> | erent TS values issued by the sender. By comparing the local time difference and | |||
<t>In order to learn both the TS units and the clock skew, the rL | the TS value difference, the receiver can assess the TS units and relative cloc | |||
EDBAT receiver measures how much local time has elapsed between two packets with | k skews. In order for this to be accurate, the packets carrying the different TS | |||
different TS values issued by the sender. By comparing the local time differenc | values should experience equal (or at least similar) delay when traveling from | |||
e and the TS value difference, the receiver can assess the TS units and relative | the sender to the receiver, as any difference in the experienced delays would in | |||
clock skews. In order for this to be accurate, the packets carrying the differe | troduce an error in the unit/skew estimation. One possible approach is to select | |||
nt TS values should experience equal (or at least similar delay) when traveling from the sender to the receiver, as any difference in the experienced delays would introduce error in the unit/skew estimation. One possible approach is to select packets that experienced the minimum delay (i.e. close to zero queueing delay) to make the estimations.</t> | packets that experienced minimal delay (i.e., queuing delay close to zero) to make the estimations.</t> | |||
<t>An additional difficulty regarding the estimation of the TS units and clock skew in the context of (r)LEDBAT is that the LEDBAT congestion controller actions directly affect the (queueing) delay experienced by packets. In particular, if there is an error in the estimation of the TS units/skew, the LEDBAT controller will attempt to compensate it by reducing/increasing the load. The result is that the LEDBAT operation interferes with the TS units/clock skew measurements. Because of this, measurements are more accurate when there is no traffic in the connection (in addition to the packets used for the measurements). The problem is that the receiver is unaware if the sender is injecting traffic at any point in time, and so, it is unable to use these quiet intervals to perform measurements. The receiver can however, force periodic slowdowns, reducing the announced receive window to a few packets and perform the measurements then.</t> | <t>An additional difficulty regarding the estimation of the TS units and clock skew in the context of (r)LEDBAT is that the LEDBAT congestion controller actions directly affect the (queuing) delay experienced by packets. In particular, if there is an error in the estimation of the TS units/skew, the LEDBAT controller will attempt to compensate for it by reducing/increasing the load. The result is that the LEDBAT operation interferes with the TS units/clock skew measurements. Because of this, measurements are more accurate when there is no traffic in the connection (in addition to the packets used for the measurements). The problem is that the receiver is unaware of whether the sender is injecting traffic at any point in time; it is therefore unable to use these quiet intervals to perform measurements. The receiver can, however, force periodic slowdowns, reducing the announced receive window to a few packets and performing the measurements at that time.</t> | |||
<t>It is possible for the rLEDBAT receiver to perform multiple measurements to assess both the TS units and the relative clock skew during the lifetime of the connection, in order to obtain more accurate results. Clock skew measurements are more accurate if the time period used to discover the skew is larger, as the impact of the skew becomes more apparent. It is a reasonable approach for the rLEDBAT receiver to perform an early discovery of the TS units (and the clock skew) using the first few packets of the TCP connection and then improve the accuracy of the TS units/clock skew estimation using periodic measurements later in the lifetime of the connection. </t> | <t>It is possible for the rLEDBAT receiver to perform multiple measurements to assess both the TS units and the relative clock skew during the lifetime of the connection, in order to obtain more accurate results. Clock skew measurements are more accurate if the time period used to discover the skew is larger, as the impact of the skew becomes more apparent. It is a reasonable approach for the rLEDBAT receiver to perform an early discovery of the TS units (and the clock skew) using the first few packets of the TCP connection and then improve the accuracy of the TS units/clock skew estimation using periodic measurements later in the lifetime of the connection. </t> | |||
</section> | </section> | |||
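The minimum-delay selection discussed above can be illustrated with a short Python sketch. It is not the method specified by this document. It is one possible approach, under these assumptions:

- The receiver logs (local receive time in seconds, TSval) pairs.
- Within each measurement window, packets are ranked by an apparent delay computed with a coarse first-to-last rate, and the lowest-delay packet is kept.
- A least-squares line is fitted through the kept samples. Its slope is the sender's TS clock rate measured against the receiver's clock, which combines the TS units and the relative clock skew.

The function names, the window length, and the candidate tick rates used to separate units from skew are hypothetical. TSval wraparound is ignored.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, int]  # (local receive time in seconds, TSval)

# Hypothetical candidate TS clock rates (ticks per second) used to separate
# the nominal TS unit from the relative clock skew.
CANDIDATE_RATES_HZ: Sequence[float] = (1.0, 10.0, 100.0, 1000.0)


def estimate_ts_rate(samples: List[Sample], window_s: float) -> float:
    """Estimate TSval ticks per receiver-clock second from low-delay samples."""
    samples = sorted(samples)
    if len(samples) < 2 or samples[-1][0] == samples[0][0]:
        raise ValueError("need samples spanning a nonzero time interval")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    coarse_rate = (v_last - v_first) / (t_last - t_first)
    if coarse_rate <= 0:
        raise ValueError("TSval did not advance (or wrapped)")

    # With a roughly correct rate, t - v / rate is the one-way delay plus a
    # constant offset (and a small drift), so inside one short window the
    # smallest value marks the packet with the least queuing delay.
    best: Dict[int, Tuple[float, int, float]] = {}
    for t, v in samples:
        window = int((t - t_first) // window_s)
        apparent_delay = t - v / coarse_rate
        if window not in best or apparent_delay < best[window][2]:
            best[window] = (t, v, apparent_delay)

    points = [(t, v) for t, v, _ in best.values()]
    if len(points) < 2:
        return coarse_rate  # too few windows for a regression

    # Least-squares slope of TSval versus receive time over the kept samples.
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    return cov / var


def split_units_and_skew(rate_hz: float) -> Tuple[float, float]:
    """Split a measured rate into (nominal tick rate, relative clock skew)."""
    nominal = min(CANDIDATE_RATES_HZ, key=lambda c: abs(rate_hz / c - 1.0))
    return nominal, rate_hz / nominal - 1.0
```

Consider synthetic data in which every tenth packet sees an empty queue and the sender's 1 ms clock runs 500 ppm fast. On such data, `estimate_ts_rate(samples, window_s=0.5)` returns a value close to 1000.5, and `split_units_and_skew()` reports a 1000 Hz nominal rate with a skew of roughly 5e-4.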
</section>
</section>
<section title="Detecting packet losses and retransmissions"> | <section numbered="true" toc="default"> | |||
<name>Detecting Packet Losses and Retransmissions</name>
<t>The rLEDBAT receiver is capable of detecting retransmitted packets in the following way. We call RCV.HGH the highest sequence number corresponding to a received byte of data (not assuming that all bytes with smaller sequence numbers have been received already, there may be holes) and we call TSV.HGH the TSVal value corresponding to the segment in which that byte was carried. SEG.SEQ stands for the sequence number of a newly received segment and we call TSV.SEQ the TSVal value of the newly received segment.</t> | <t>The rLEDBAT receiver is capable of detecting retransmitted packets as follows. We call RCV.HGH the highest sequence number corresponding to a received byte of data (not assuming that all bytes with smaller sequence numbers have been received already, there may be holes), and we call TSV.HGH the TSval corresponding to the segment in which that byte was carried. SEG.SEQ stands for the sequence number of a newly received segment, and we call TSV.SEQ the TSval of the newly received segment.</t> | |||
<t>If SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH then the newly received segment is a retransmission. This is so because the newly received segment was generated later than another already received segment which contained data with a larger sequence number. This means that this segment was lost and was retransmitted.</t> | <t>If SEG.SEQ < RCV.HGH and TSV.SEQ > TSV.HGH, then the newly received segment is a retransmission. This is so because the newly received segment was generated later than another already-received segment that contained data with a larger sequence number. This means that this segment was lost and was retransmitted.</t> | |||
<t>The proposed mechanism to detect retransmissions at the receiver fails when there are window tail drops. If all packets in the tail of the window are lost, the receiver will not be able to detect a mismatch between the sequence numbers of the packets and the order of the timestamps. In this case, rLEDBAT will not react to losses but the TCP congestion controller at the sender will, most likely reducing its window to 1MSS and take over the control of the sending rate, until slow start ramps up and catches the current value of the rLEDBAT window.</t> | <t>The proposed mechanism to detect retransmissions at the receiver fails when there are window tail drops. If all packets in the tail of the window are lost, the receiver will not be able to detect a mismatch between the sequence numbers of the packets and the order of the timestamps. In this case, rLEDBAT will not react to losses; however, the TCP congestion controller at the sender will, most likely reducing its window to 1 MSS and taking over the control of the sending rate until slow start ramps up and catches the current value of the rLEDBAT window.</t> | |||
</section> | </section> | |||
</section>
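The retransmission test above can be sketched in Python as follows. This is an illustration, not normative text:

- The class and helper names are hypothetical.
- RCV.HGH is taken to be the sequence number of the last byte of the highest segment received so far.
- Sequence numbers and TSvals are compared with 32-bit modular (serial number) arithmetic. This is an assumption about how wraparound is handled.

A late copy of an original transmission carries an older TSval, so it is not flagged. A tail drop still goes unnoticed, because no later segment arrives to expose the mismatch.

```python
MOD = 1 << 32
HALF = 1 << 31


def before(a: int, b: int) -> bool:
    """True if 32-bit value a precedes b in modular (serial number) order."""
    return ((a - b) % MOD) >= HALF


def after(a: int, b: int) -> bool:
    """True if 32-bit value a follows b in modular order."""
    return before(b, a)


class RetransmissionDetector:
    """Tracks RCV.HGH and TSV.HGH and flags retransmitted segments."""

    def __init__(self) -> None:
        self.rcv_hgh = None  # highest sequence number of any received byte
        self.tsv_hgh = None  # TSval of the segment that carried RCV.HGH

    def on_segment(self, seg_seq: int, seg_len: int, tsv_seq: int) -> bool:
        """Process a data segment; return True if it is a retransmission."""
        if seg_len <= 0:
            return False  # segments without data carry no sequence space
        is_retransmission = (
            self.rcv_hgh is not None
            and before(seg_seq, self.rcv_hgh)  # SEG.SEQ < RCV.HGH
            and after(tsv_seq, self.tsv_hgh)  # TSV.SEQ > TSV.HGH
        )
        # Advance RCV.HGH/TSV.HGH only when this segment carries a higher byte.
        last_byte = (seg_seq + seg_len - 1) % MOD
        if self.rcv_hgh is None or after(last_byte, self.rcv_hgh):
            self.rcv_hgh = last_byte
            self.tsv_hgh = tsv_seq
        return is_retransmission
```

For example, after segments starting at 1000 (TSval 10) and 1200 (TSval 12), a segment starting at 1100 is reported as a retransmission only if its TSval is newer than 12.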
<section title="Experiment Considerations"> | ||||
<t>The status of this document is Experimental. The general purpose of the proposed experiment is to gain more experience running rLEDBAT over different network paths to see if the proposed rLEDBAT parameters perform well in different situations. Specifically, we would like to learn about the following aspects of the rLEDBAT mechanism: </t>
<t><list> | ||||
<t>- Interaction between the sender and the receiver Congestion control algorithms. rLEDBAT posits that because the rLEDBAT receiver is using a less-than-best-effort congestion control algorithm, the receiver congestion control algorithm will expose a smaller congestion window (conveyed though the Receive Window) than the one resulting from the congestion control algorithm executed at the sender. One of the purposes of the experiment is learn how these two interact and if the assumption that the receiver side is always controlling the sender's rate (and making rLEDBAT effective) holds. The experiment should include the different congestion control algorithms that are currently widely used in the Internet, including Cubic, BBR and LEDBAT(++).</t>
<t>- Interaction between rLEDBAT and Active Queue Management techniques such as Codel, PIE and L4S.</t>
<t>- How the rLEDBAT should resume after a period during which there was no incoming traffic and the information about the rLEDBAT state information is potentially dated.</t>
</list></t> | ||||
<section title="Status of the experiment at the time of this writing."> | ||||
<t>Currently there are the following implementations of rLEDBAT that can be used for experimentation:
<list> | ||||
<t>- Windows 11. rLEDBAT is available in Microsoft's Windows 11 22H2 since October 2023 <xref target="Windows11" />.</t>
<t>- Windows Server 2022. rLEDBAT is available in Microsoft's Windows Server 2022 since September 2022 <xref target="WindowsServer" />.</t>
<t>- Apple. rLEDBAT is available in MacOS and iOS since 2021 <xref target="Apple" />.</t>
<t>- Linux implementation, open source, available since 2022 at https://github.com/net-research/rledbat_module.</t>
<t>- ns3 implementation, open source, available since 2020 at https://github.com/manas11/implementation-of-rLEDBAT-in-ns-3.</t>
</list></t> | ||||
<t>In addition, rLEDBAT has been deployed by Microsoft in wide scale in the following services:
<list> | ||||
<t>- BITS (Background Intelligent Transfer Service)</t>
<t>- DO (Delivery Optimization) service</t>
<t>- Windows update # using DO</t> | ||||
<t>- Windows Store # using DO</t> | ||||
<t>- OneDrive</t> | ||||
<t>- Windows Error Reporting # wermgr.exe; werfault.exe</t>
<t>- System Center Configuration Manager (SCCM)</t>
<t>- Windows Media Player</t> | ||||
<t>- Microsoft Office</t> | ||||
<t>- Xbox (download games) # using DO</t> | ||||
</list> </t> | ||||
<t> Some initial experiments involving rLEDBAT have been reported in <xref target="COMNET3" />. Experiments involving the interaction of LEDBAT++ and BBR are presented in <xref target="COMNET2" />. An experimental evaluation of the LEDBAT++ algorithm is presented in <xref target="COMNET1" />. As LEDBAT++ is one of the less-than-best-effort congestion control algorithms that rLEDBAT relies on, the results regarding LEDBAT++ interaction with other congestion control algorithms are relevant for the understanding of rLEDBAT as well.</t>
</section> | ||||
</section> | ||||
<section title="Security Considerations"> | ||||
<t>Overall, we believe that rLEDBAT does not introduce any new vulnerabilities to existing TCP endpoints, as it relies on existing TCP knobs, notably the Receive Window and timestamps. </t>
<t>Specifically, rLEDBAT uses RCV.WND to modulate the rate of the sender. An attacker wishing to starve a flow can simply reduce the RCV.WND, irrespective of whether rLEDBAT is being used or not.</t>
<t> We can further ask ourselves whether the attacker can use the rLEDBAT mechanisms in place to force the rLEDBAT receiver to reduce the RCV WND. There are two ways an attacker can do that. One would be to introduce an artificial delay to the packets either by actually delaying the packets or modifying the Timestamps. This would cause the rLEDBAT receiver to believe that a queue is building up and reduce the RCV.WND. Note that an attacker to do that must be on path, so if that is the case, it is probably more direct to simply reduce the RCV.WND.</t>
<t> The other option would be for the attacker to make the rLEDBAT receiver believe that a loss has occurred. To do that, it basically needs to retransmit an old packet (to be precise, it needs to transmit a packet with the right sequence number and the right port and IP numbers). This means that the attacker can achieve a reduction of incoming traffic to the rLEDBAT receiver not only by modifying the RCV.WND field of the packets originated from the rLEDBAT host, but also by injecting packets with the proper sequence number in the other direction. This may slightly expand the attack surface.</t>
</section> | </section> | |||
<section numbered="true" anchor="sect-5" toc="default"> | ||||
<name>Experiment Considerations</name> | ||||
<t>The status of this document is Experimental. The general purpose of the proposed experiment is to gain more experience running rLEDBAT over different network paths to see if the proposed rLEDBAT parameters perform well in different situations. Specifically, we would like to learn about the following aspects of the rLEDBAT mechanism: </t>
<ul spacing="normal"> | ||||
<li> | ||||
<t>Interaction between the sender's and receiver's congestion control algorithms. rLEDBAT posits that because the rLEDBAT receiver is using a less-than-best-effort congestion control algorithm, the receiver's congestion control algorithm will expose a smaller congestion window (conveyed through the receive window) than the one resulting from the congestion control algorithm executed at the sender. One of the purposes of the experiment is to learn how these two algorithms interact and if the assumption that the receiver side is always controlling the sender's rate (and making rLEDBAT effective) holds. The experiment should include the different congestion control algorithms that are currently widely used in the Internet, including CUBIC, Bottleneck Bandwidth and Round-trip propagation time (BBR), and LEDBAT(++).</t>
</li> | ||||
<li> | ||||
<t>Interaction between rLEDBAT and Active Queue Management techniques such as Controlled Delay (CoDel); Proportional Integral controller Enhanced (PIE); and Low Latency, Low Loss, and Scalable Throughput (L4S).</t>
</li> | ||||
<li> | ||||
<t>How rLEDBAT should resume after a period during which there was no incoming traffic and the information about the rLEDBAT state information is potentially dated.</t>
</li> | ||||
</ul> | ||||
<section numbered="true" toc="default"> | ||||
<name>Status of the Experiment at the Time of This Writing</name> | ||||
<t>Currently, the following implementations of rLEDBAT can be used for experimentation:</t>
<ul spacing="normal"> | ||||
<li> | ||||
<t>Windows 11. rLEDBAT is available in Microsoft's Windows 11 22H2 since October 2023 <xref target="Windows11" format="default"/>.</t>
</li> | ||||
<li> | ||||
<t>Windows Server 2022. rLEDBAT is available in Microsoft's Windows Server 2022 since September 2022 <xref target="WindowsServer" format="default"/>.</t>
</li> | ||||
<li> | ||||
<t>Apple. rLEDBAT is available in macOS and iOS since 2021 <xref target="Apple" format="default"/>.</t>
</li> | ||||
<li> | ||||
<t>Linux implementation, open source, available since 2022 <xref target="rledbat_module"/>.</t>
</li> | ||||
<li> | ||||
<t>ns3 implementation, open source, available since 2020 <xref target="rLEDBAT-in-ns-3"/>.</t>
</li> | ||||
</ul> | ||||
<t>In addition, rLEDBAT has been deployed by Microsoft at wide scale in the following services:</t>
<ul spacing="normal"> | ||||
<li> | ||||
<t>BITS (Background Intelligent Transfer Service)</t> | ||||
</li> | ||||
<li> | ||||
<t>DO (Delivery Optimization) service</t> | ||||
</li> | ||||
<li> | ||||
<t>Windows update: using DO</t> | ||||
</li> | ||||
<li> | ||||
<t>Windows Store: using DO</t> | ||||
</li> | ||||
<li> | ||||
<t>OneDrive</t> | ||||
</li> | ||||
<li> | ||||
<t>Windows Error Reporting: wermgr.exe; werfault.exe</t> | ||||
</li> | ||||
<li> | ||||
<t>System Center Configuration Manager (SCCM)</t> | ||||
</li> | ||||
<li> | ||||
<t>Windows Media Player</t> | ||||
</li> | ||||
<li> | ||||
<t>Microsoft Office</t> | ||||
</li> | ||||
<li> | ||||
<t>Xbox (download games): using DO</t> | ||||
</li> | ||||
</ul> | ||||
<section title="IANA Considerations"> | <t> Some initial experiments involving rLEDBAT have been reported in <xref target="COMNET3" format="default"/>. Experiments involving the interaction between LEDBAT++ and BBR are presented in <xref target="COMNET2" format="default"/>. An experimental evaluation of the LEDBAT++ algorithm is presented in <xref target="COMNET1" format="default"/>. As LEDBAT++ is one of the less-than-best-effort congestion control algorithms that rLEDBAT relies on, the results regarding how LEDBAT++ interacts with other congestion control algorithms are relevant for the understanding of rLEDBAT as well.</t> | |||
<t>No actions are required from IANA.</t>
</section> | ||||
</section> | </section> | |||
<section numbered="true" toc="default"> | ||||
<name>Security Considerations</name> | ||||
<t>Overall, we believe that rLEDBAT does not introduce any new vulnerabilities to existing TCP endpoints, as it relies on existing TCP knobs, notably the receive window and timestamps. </t>
<t>Specifically, rLEDBAT uses RCV.WND to modulate the rate of the sender. An attacker wishing to starve a flow can simply reduce the RCV.WND, irrespective of whether rLEDBAT is being used or not.</t>
<t> We can further ask ourselves whether the attacker can use the rLEDBAT mechanisms in place to force the rLEDBAT receiver to reduce the RCV.WND. There are two ways an attacker can do this:</t>
<section title="Acknowledgements"> | <ul spacing="normal"> | |||
<t>This work was supported by the EU through the StandICT projects RXQ, CCI and CEL6, the NGI Pointer RIM project and the H2020 5G-RANGE project and by the Spanish Ministry of Economy and Competitiveness through the 5G-City project (TEC2016-76795-C6-3-R).</t> | <li>One would be to introduce an artificial delay to the packets by either actually delaying the packets or modifying the timestamps. This would cause the rLEDBAT receiver to believe that a queue is building up and reduce the RCV.WND. Note that to do so, an attacker must be on path, so if that is the case, it is probably more direct to simply reduce the RCV.WND.</li> | |||
<t>We would like to thank ICCRG chairs Reese Enghardt and Vidhi Goel for their support on this work. We would also like to thank Daniel Havey for his help. We would like to thank Colin Perkins, Mirja Kuehlewind, and Vidhi Goel for their reviews and comments on earlier versions of this document.</t> | <li>The other option would be for the attacker to make the rLEDBAT receiver believe that a loss has occurred. To do this, it basically needs to retransmit an old packet (to be precise, it needs to transmit a packet with the correct sequence number and the correct port and IP numbers). This means that the attacker can achieve a reduction of incoming traffic to the rLEDBAT receiver not only by modifying the RCV.WND field of the packets originated from the rLEDBAT host but also by injecting packets with the proper sequence number in the other direction. This may slightly expand the attack surface.</li> | |||
</ul> | ||||
</section> | ||||
<section numbered="true" toc="default"> | ||||
<name>IANA Considerations</name> | ||||
<t>This document has no IANA actions.</t> | ||||
</section> | </section> | |||
</middle> | </middle> | |||
<back> | <back> | |||
<references title="Informative References"> | <displayreference target="I-D.irtf-iccrg-ledbat-plus-plus" to="LEDBAT++"/> | |||
<?rfc include='reference.RFC.9293'?> | ||||
<?rfc include='reference.I-D.irtf-iccrg-ledbat-plus-plus" ?> | ||||
<?rfc include="reference.RFC.6817" ?> | ||||
<?rfc include="reference.RFC.7323" ?> | ||||
<?rfc include="reference.RFC.9438"?> | ||||
<?rfc include="reference.RFC.5681"?> | ||||
<reference anchor="Windows11" > | ||||
<front> | ||||
<title>What's new in Delivery Optimization</title> | ||||
<author initials="C.F." surname="Forsmann" fullname="Carmen">
<organization /> | ||||
</author> | ||||
<date year="2023" /> | ||||
</front> | ||||
<seriesInfo name="Microsoft Documentation" value="https://learn.microsoft.com/en-us/windows/deployment/do/whats-new-do" />
<refcontent></refcontent> | ||||
</reference> | ||||
<reference anchor="WindowsServer" > | ||||
<front> | ||||
<title>LEDBAT Background Data Transfer for Windows</title>
<author initials="D.H." surname="Havey" fullname="Daniel">
<organization /> | ||||
</author> | ||||
<date year="2022" /> | ||||
</front> | ||||
<seriesInfo name="Microsoft Blog" value="https://techcommunity.microsoft.com/t5/networking-blog/ledbat-background-data-transfer-for-windows/ba-p/3639278" />
<refcontent></refcontent> | ||||
</reference> | ||||
<reference anchor="Apple" > | ||||
<front> | ||||
<title>Reduce network delays for your app</title>
<author initials="S.C." surname="Stuart" fullname="Cheshire">
<organization /> | ||||
</author> | ||||
<author initials="V.G." surname="Vidhi" fullname=" Goel ">
<organization /> | ||||
</author> | ||||
<date year="2021" /> | ||||
</front> | ||||
<seriesInfo name="WWDC21" value="https://developer.apple.com/videos/play/wwdc2021/10239/" />
<refcontent></refcontent> | ||||
</reference> | ||||
<reference anchor="COMNET3" > | ||||
<front> | ||||
<title> Design, implementation and validation of a receiver-driven less-than-best-effort transport </title>
<author initials="M.B." surname="Bagnulo" fullname="Marcelo Bagnulo">
<organization /> | ||||
</author> | ||||
<author initials="A.G." surname="Garcia-Martinez" fullname="Alberto Garcia-Martinez">
<organization /> | ||||
</author> | ||||
<author initials="A.M." surname="Mandalari" fullname="Anna Maria Mandalari">
<organization /> | ||||
</author> | ||||
<author initials="P.B," surname="Balasubramanian" fullname="Praveen Balasubramanian">
<organization /> | ||||
</author> | ||||
<author initials="D.H." surname="Havey" fullname="Daniel Havey">
<organization /> | ||||
</author> | ||||
<author initials="G.M." surname="Montenegro" fullname="Gabriel Montenegro">
<organization /> | ||||
</author> | ||||
<date year="2022" /> | ||||
</front> | ||||
<seriesInfo name="Computer Networks" value="Volume 233" />
<refcontent></refcontent> | ||||
</reference> | ||||
<reference anchor="COMNET2" > | ||||
<front> | ||||
<title>When less is more: BBR versus LEDBAT++</title>
<author initials="M.B." surname="Bagnulo" fullname="Marcelo Bagnulo">
<organization /> | ||||
</author> | ||||
<author initials="A.G." surname="Garcia-Martinez" fullname="Alberto Garcia-Martinez">
<organization /> | ||||
</author> | ||||
<date year="2022" /> | ||||
</front> | ||||
<seriesInfo name="Computer Networks" value="Volume 219" />
<refcontent></refcontent> | ||||
</reference> | ||||
<reference anchor="COMNET1" > | <references> | |||
<front> | <name>References</name> | |||
<title>An experimental evaluation of LEDBAT++ </title> | <references anchor="sec-normative-references"> | |||
<author initials="M.B." surname="Bagnulo" fullname="Marcelo Bagnulo"> | <name>Normative References</name> | |||
<organization /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/> | |||
</author> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/> | |||
<author initials="A.G." surname="Garcia-Martinez" fullname="Alberto Garcia-Martinez"> | </references> | |||
<organization /> | <references anchor="sec-informative-references"> | |||
</author> | <name>Informative References</name> | |||
<date year="2022" /> | <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9293.xml"/> | |||
</front>
<seriesInfo name='Computer Networks' value="Volume 212"/>
<refcontent></refcontent>
</reference>
</references> | <!-- draft-irtf-iccrg-ledbat-plus-plus (I-D Exists) --> | |||
<xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.irtf-iccrg-ledbat-plus-plus.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6817.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7323.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9438.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5681.xml"/>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6582.xml"/>
<section title="Terminology"> | <reference anchor="Windows11" target="https://learn.microsoft.com/en-us/windows/deployment/do/whats-new-do"> | |||
<front> | ||||
<title>What's new in Delivery Optimization</title> | ||||
<author> | ||||
<organization>Microsoft</organization> | ||||
</author> | ||||
<date month="October" year="2024"/> | ||||
</front> | ||||
<refcontent>Microsoft Windows Documentation</refcontent> | ||||
</reference> | ||||
<t>We use the following abreviations thoughout the text. We include a short list for the reader's convenence:</t> | <reference anchor="WindowsServer" target="https://techcommunity.microsoft.com/t5/networking-blog/ledbat-background-data-transfer-for-windows/ba-p/3639278"> | |||
<t><list> | <front> | |||
<t>RCV.WND: the value included in the Receive Window field of the TCP header (which computation is modified by this specification)</t> | <title>LEDBAT Background Data Transfer for Windows</title> | |||
<t>SND.WND: The TCP sender's window</t> | <author initials="D" surname="Havey" fullname="Daniel"> | |||
<t>cwnd: the consgestion window as computed by the congestion control algorithm running at the TCP sender.</t> | <organization/> | |||
<t>RLWND: the window value calculated by rLEDBAT algorithm</t> | </author> | |||
<t>fcwnd: the value that a standard RFC793bis TCP receiver calculates to set in the receive window for flow control purposes.</t> | <date month="September" year="2022"/> | |||
</front>
<refcontent>Microsoft Networking Blog</refcontent>
</reference>
<t>RCV.HGH: the highest sequence number corresponding to a received byte of data at one point in time</t>
<t>TSV.HGH: TSV.HGH the TSVal value corresponding to the segment in which RCV.HGH was carried at that point in time</t>
<t>SEG.SEQ: the sequence number of the last received segment</t>
<t>TSV.SEQ: the TSVal value of the last received segment</t>
</list></t> | ||||
</section> | ||||
<section title="rLEDBAT pseudo-code"> | <reference anchor="Apple" target="https://developer.apple.com/videos/play/wwdc2021/10239/"> | |||
<front> | ||||
<title>Reduce network delays for your app</title> | ||||
<author initials="S" surname="Cheshire" fullname="Stuart Cheshire"> | ||||
<organization/> | ||||
</author> | ||||
<author initials="V" surname="Goel" fullname="Vidhi Goel "> | ||||
<organization/> | ||||
</author> | ||||
<date year="2021"/> | ||||
</front> | ||||
<refcontent>Apple Worldwide Developers Conference (WWDC2021), Video</refcontent>
</reference> | ||||
<t>We next describe how to integrate the proposed rLEDBAT mechanisms and an LBE delay-based congestion control algorithm such as LEDBAT or LEDBAT++. We describe the integrated algorithm as two procedures, one that is executed when a packet is received by a rLEDBAT-enabled endpoint (Figure 2) and another that is executed when the rLEDBAT-enabled endpoint sends a packet (Figure 3). At the beginning, RLWND is set to its maximum value, so that the sending rate of the sender is governed by the flow control algorithm of the receiver and the TCP slow start mechanism of the sender, and the ackedBytes variable is set to 0. </t> | <reference anchor="COMNET3"> | |||
<front>
<title> Design, implementation and validation of a receiver-driven less-than-best-effort transport </title>
<author initials="M" surname="Bagnulo" fullname="Marcelo Bagnulo">
<organization/>
</author>
<author initials="A" surname="García-Martínez" fullname="Alberto García-Martínez">
<organization/> | ||||
</author> | ||||
<author initials="A.M." surname="Mandalari" fullname="Anna Maria Mandalari">
<organization/> | ||||
</author> | ||||
<author initials="P" surname="Balasubramanian" fullname="Praveen Balasubramanian">
<organization/> | ||||
</author> | ||||
<author initials="D" surname="Havey" fullname="Daniel Havey"> | ||||
<organization/> | ||||
</author> | ||||
<author initials="G" surname="Montenegro" fullname="Gabriel Montenegro">
<organization/> | ||||
</author> | ||||
<date month="September" year="2023"/> | ||||
</front> | ||||
<refcontent>Computer Networks, vol. 233</refcontent> | ||||
<seriesInfo name="DOI" value="10.1016/j.comnet.2023.109841"/> | ||||
</reference> | ||||
<t>We assume that the LBE congestion control algorithm defines a WindowIncrease() function and a WindowDecrease() function. For example, in the case of LEDBAT++, the WindowIncrease() function is an additive increase, while the WindowDecrease() function is a multiplicative decrease. In the case of the WindowIncrease(), we assume that it takes as input the current window size and the number of bytes that were acknowledged since the last window update (ackedBytes) and returns as output the updated window size. In the case of WindowDecrease(), it takes as input the current window size and returns the updated window size. </t> | <reference anchor="COMNET2"> | |||
<front>
<title>When less is more: BBR versus LEDBAT++</title>
<author initials="M" surname="Bagnulo" fullname="Marcelo Bagnulo">
<organization/>
</author>
<author initials="A" surname="García-Martínez" fullname="Alberto García-Martínez">
<organization/> | ||||
</author> | ||||
<date month="December" year="2022"/> | ||||
</front> | ||||
<refcontent>Computer Networks, vol. 219</refcontent> | ||||
<seriesInfo name="DOI" value="10.1016/j.comnet.2022.109460"/> | ||||
</reference> | ||||
<t>The data structures used in the algorithms are as follows. The sentList is a list that contains the TSval and the local send time of each packet sent by the rLEDBAT-enabled endpoint. The TSecr field of the packets received by the rLEDBAT-enabled endpoint are matched with the sendList to compute the RTT.</t> | <reference anchor="COMNET1"> | |||
<front>
<title>An experimental evaluation of LEDBAT++ </title>
<author initials="M" surname="Bagnulo" fullname="Marcelo Bagnulo">
<organization/> | ||||
</author> | ||||
<author initials="A" surname="García-Martínez" fullname="Alberto García-Martínez">
<organization/> | ||||
</author> | ||||
<date month="July" year="2022"/> | ||||
</front> | ||||
<refcontent>Computer Networks, vol. 212</refcontent> | ||||
<seriesInfo name="DOI" value="10.1016/j.comnet.2022.109036"/> | ||||
</reference> | ||||
<t>The RTT values computed for each received packet are stored in the RTTlist, which contains also the received TSecr (to avoid using multiple packets with the same TSecr for RTT calculations, only the first packet received for a given TSecr is used to compute the RTT). It also contains the local time at which the packet was received, to allow selecting the RTTs measured in a given period (e.g., in the last 10 minutes). RTTlist is initialized with all its values to its maximum.</t> | <reference anchor="rledbat_module" target="https://github.com/net-research/rledbat_module"> | |||
<front>
<title>rledbat_module</title>
<author/>
<date month="September" day="9" year="2022"/>
</front>
<refcontent>commit d82ff20</refcontent> | ||||
</reference> | ||||
<figure title="Procedure executed when a packet is received"> | <reference anchor="rLEDBAT-in-ns-3" target="https://github.com/manas11/implementation-of-rLEDBAT-in-ns-3"> | |||
<front> | ||||
<title>Implementation-of-rLEDBAT-in-ns-3</title> | ||||
<author/> | ||||
<date month="June" day="24" year="2020"/> | ||||
</front> | ||||
<refcontent>commit 2ab34ad</refcontent> | ||||
</reference> | ||||
<sourcecode> | </references> | |||
</references> | ||||
<section numbered="true" toc="default"> | ||||
<name>rLEDBAT Pseudocode</name> | ||||
<t>In this section, we describe how to integrate the proposed rLEDBAT mechanisms and an LBE delay-based congestion control algorithm such as LEDBAT or LEDBAT++. We describe the integrated algorithm as two procedures: one that is executed when a packet is received by a rLEDBAT-enabled endpoint (<xref target="fig2"/>) and another that is executed when the rLEDBAT-enabled endpoint sends a packet (<xref target="fig3"/>). At the beginning, RLWND is set to its maximum value, so that the sending rate of the sender is governed by the flow control algorithm of the receiver and the TCP slow start mechanism of the sender, and the ackedBytes variable is set to 0. </t>
<t>We assume that the LBE congestion control algorithm defines a WindowInc | ||||
rease() function and a WindowDecrease() function. For example, in the case of LE | ||||
DBAT++, the WindowIncrease() function is an additive increase, while the WindowD | ||||
ecrease() function is a multiplicative decrease. In the case of the WindowIncrea | ||||
se() function, we assume that it takes as input the current window size and the | ||||
number of bytes that were acknowledged since the last window update (ackedBytes) | ||||
and returns as output the updated window size. In the case of the WindowDecreas | ||||
e() function, it takes as input the current window size and returns the updated | ||||
window size. </t> | ||||
<t>The data structures used in the algorithms are as follows. The sendList | ||||
is a list that contains the TSval and the local send time of each packet sent | ||||
by the rLEDBAT-enabled endpoint. The TSecr field of the packets received by the | ||||
rLEDBAT-enabled endpoint is matched with the sendList to compute the RTT.</t> | ||||
<t>The RTT values computed for each received packet are stored in the RTTl | ||||
ist, which also contains the received TSecr (to avoid using multiple packets wit | ||||
h the same TSecr for RTT calculations, only the first packet received for a give | ||||
n TSecr is used to compute the RTT). It also contains the local time at which th | ||||
e packet was received, to allow selecting the RTTs measured in a given period (e | ||||
.g., in the last 10 minutes). RTTlist is initialized with all its values to its | ||||
maximum.</t> | ||||
<figure anchor="fig2"> | ||||
<name>Procedure Executed When a Packet Is Received</name> | ||||
<sourcecode type="pseudocode"><![CDATA[ | ||||
procedure receivePacket() | procedure receivePacket() | |||
//Looks for first sent packet with same TSval as TSecr, and, | //Looks for first sent packet with same TSval as TSecr, and | |||
//returns time difference | //returns time difference | |||
receivedRTT = computeRTT(sentList, receivedTSecr, receivedTime) | receivedRTT = computeRTT(sendList, receivedTSecr, receivedTime) | |||
//Inserts minimum value for a given receivedTSecr | //Inserts minimum value for a given receivedTSecr | |||
//note that many received packets may contain same receivedTSecr | //Note that many received packets may contain same receivedTSecr | |||
insertRTT (RTTlist, receivedRTT, receivedTSecr, receivedTime) | insertRTT (RTTlist, receivedRTT, receivedTSecr, receivedTime) | |||
filteredRTT = minLastKMeasures(RTTlist, K=4) | filteredRTT = minLastKMeasures(RTTlist, K=4) | |||
baseRTT = minLastNSeconds(RTTlist, N=180) | baseRTT = minLastNSeconds(RTTlist, N=180) | |||
qd = filteredRTT - baseRTT | qd = filteredRTT - baseRTT | |||
//ackedBytes is the number of bytes that can be used to reduce | //ackedBytes is the number of bytes that can be used to reduce | |||
//the Receive Window - without shrinking it - if necessary | //the receive window - without shrinking it - if necessary | |||
ackedBytes = ackedBytes + receiveBytes | ackedBytes = ackedBytes + receiveBytes | |||
if retransmittedPacketDetected then | if retransmittedPacketDetected then | |||
RLWND = DecreaseWindow(RLWND) // Only once per RTT | RLWND = DecreaseWindow(RLWND) //Only once per RTT | |||
end if | end if | |||
if qd < T then | if qd < T then | |||
RLWND = IncreaseWindow(RLWND, ackedBytes) | RLWND = IncreaseWindow(RLWND, ackedBytes) | |||
else | else | |||
RLWND = DecreaseWindow(RLWND) | RLWND = DecreaseWindow(RLWND) | |||
end if | end if | |||
end procedure | end procedure | |||
</sourcecode> | ]]></sourcecode> | |||
</figure> | </figure> | |||
<figure title="Procedure executed when a packet is sent"> | <figure anchor="fig3"> | |||
<sourcecode> | <name>Procedure Executed When a Packet Is Sent</name> | |||
<sourcecode type="pseudocode"><![CDATA[ | ||||
procedure SENDPACKET | procedure SENDPACKET | |||
if (RLWND > RLWNDPrevious) or (RLWND - RLWNDPrevious < ackedBytes) | if (RLWND > RLWNDPrevious) or (RLWND - RLWNDPrevious < ackedBytes) | |||
then | then | |||
RLWNDPrevious = RLWND | RLWNDPrevious = RLWND | |||
else | else | |||
RLWNDPrevious = RLWND - ackedBytes | RLWNDPrevious = RLWND - ackedBytes | |||
end if | end if | |||
ackedBytes = 0 | ackedBytes = 0 | |||
RLWNDPrevious = RLWND | RLWNDPrevious = RLWND | |||
//Compute the RWND to include in the packet | //Compute the RLWND to include in the packet | |||
RLWND = min(RLWND, fcwnd) | RLWND = min(RLWND, fcwnd) | |||
end procedure | end procedure | |||
</sourcecode> | ]]></sourcecode> | |||
</figure> | </figure> | |||
</section> | </section> | |||
<section numbered="false" toc="default"> | ||||
<name>Acknowledgments</name> | ||||
<t>This work was supported by the EU through the StandICT projects RXQ, CC | ||||
I, and CEL6; the NGI Pointer RIM project; and the H2020 5G-RANGE project; and by | ||||
the Spanish Ministry of Economy and Competitiveness through the 5G-City project | ||||
(TEC2016-76795-C6-3-R).</t> | ||||
<t>We would like to thank ICCRG chairs <contact fullname="Reese Enghardt"/ | ||||
> and <contact fullname="Vidhi Goel"/> for their support on this work. We would | ||||
also like to thank <contact fullname="Daniel Havey"/> for his help. We would lik | ||||
e to thank <contact fullname="Colin Perkins"/>, <contact fullname="Mirja Kühlewi | ||||
nd"/>, and <contact fullname="Vidhi Goel"/> for their reviews and comments on ea | ||||
rlier draft versions of this document.</t> | ||||
</section> | ||||
</back> | </back> | |||
</rfc> | </rfc> | |||
End of changes. 52 change blocks. | ||||
804 lines changed or deleted | 838 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |
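The receivePacket() and SENDPACKET procedures in the rLEDBAT Pseudocode section compared above can be sketched in Python as one receiver object. This is a minimal, non-authoritative sketch, not the RFC's reference implementation, and it rests on several assumptions. The class and method names, the 1448-byte MSS, and the 60 ms target T are hypothetical. The additive-increase and multiplicative-decrease steps are placeholders for the WindowIncrease() and WindowDecrease() functions of LEDBAT or LEDBAT++ and are not those algorithms' actual definitions. Times are in seconds, and K=4 and N=180 follow the pseudocode. Where the figures leave details open, the sketch makes its own choices:

- It passes only the current packet's bytes to the increase step, so bytes are not counted twice between two sends.
- It limits both decrease paths to once per RTT.
- Its send step implements the intent stated in the pseudocode comments: the advertised window may shrink by at most ackedBytes, so the right edge of the window never moves left. It does not transcribe the figure line by line.

```python
from collections import deque

MSS = 1448  # hypothetical segment size in bytes (assumption)


class RLedbatReceiver:
    """Hypothetical sketch of rLEDBAT's receivePacket()/SENDPACKET logic."""

    def __init__(self, target=0.060, k=4, base_period=180.0, max_window=1 << 30):
        self.target = target            # T: queuing-delay target (assumed 60 ms)
        self.k = k                      # filteredRTT = min of the last K samples
        self.base_period = base_period  # baseRTT = min over the last N seconds
        self.max_window = max_window
        self.rlwnd = max_window         # RLWND starts at its maximum value
        self.rlwnd_previous = max_window
        self.acked_bytes = 0
        self.send_list = {}             # TSval -> local time of first send
        self.rtt_list = deque()         # (receive time, RTT) samples
        self._last_decrease = None

    def on_send(self, tsval, now, fcwnd):
        """Record the send time and return the window to advertise."""
        self.send_list.setdefault(tsval, now)  # keep the first send per TSval
        # Keep the right edge (ACK + window) from moving left: the window may
        # shrink by at most the bytes acknowledged since the last update.
        floor = self.rlwnd_previous - self.acked_bytes
        advertised = min(max(self.rlwnd, floor), fcwnd)
        self.rlwnd_previous = advertised
        self.acked_bytes = 0
        return advertised

    def on_receive(self, tsecr, now, nbytes, retransmission=False):
        """Update RTT samples and RLWND; return the queuing-delay estimate."""
        # pop() ensures only the first packet echoing a TSecr yields an RTT.
        sent_at = self.send_list.pop(tsecr, None)
        if sent_at is not None:
            self.rtt_list.append((now, now - sent_at))
            self._prune(now)
        self.acked_bytes += nbytes
        filtered = self._min_last_k()
        base = self._min_last_period(now)
        qd = 0.0 if filtered is None or base is None else filtered - base
        if retransmission:
            self._decrease(now, filtered)
        if qd < self.target:
            # Placeholder additive increase (about 1 MSS per window of data).
            step = max(1, MSS * nbytes // self.rlwnd)
            self.rlwnd = min(self.max_window, self.rlwnd + step)
        else:
            self._decrease(now, filtered)
        return qd

    def _decrease(self, now, rtt):
        # Placeholder multiplicative decrease, applied at most once per RTT.
        if self._last_decrease is None or rtt is None or now - self._last_decrease >= rtt:
            self.rlwnd = max(2 * MSS, self.rlwnd // 2)
            self._last_decrease = now

    def _min_last_k(self):
        recent = list(self.rtt_list)[-self.k:]
        return min((rtt for _, rtt in recent), default=None)

    def _min_last_period(self, now):
        return min((rtt for t, rtt in self.rtt_list
                    if now - t <= self.base_period), default=None)

    def _prune(self, now):
        # Drop samples older than the base period, but always keep K of them.
        while len(self.rtt_list) > self.k and now - self.rtt_list[0][0] > self.base_period:
            self.rtt_list.popleft()
```

The sketch removes each sendList entry on its first match, which restricts RTT samples to the first packet that echoes a given TSecr. A complete implementation would also need to age out sendList entries that are never echoed.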