Internet Engineering Task Force A. Sabatini Internet-Draft Broker Communications Inc. Intended status: Standards Track August 28, 2012 Expires: March 1, 2013 TDLC, An improved "Selective Repeat" protocol. draft-sabatini-tdlc-00 Abstract This paper describes an extremely efficient and reliable real-time "Selective Repeat" protocol that was implemented as the backbone transmission protocol for the largest financial information network in the United States during the 1980's and 1990's. The protocol is capable of working efficiently in all environments including those with substantial delay and high error or congestion rates, transmitting optimally down to the point that there is one error per block, never resending a block that has been received correctly and always sending the blocks in an order which is optimal for expeditious delivery. This protocol represents a new way of looking at "Selective Repeat" protocols and the concepts it introduces have wide applicability in other protocols. It works equally well in point to multi-point environments as it does point to point ones. It has been used as both a link level and a transport level protocol. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on March 1, 2013. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. Sabatini Expires March 1, 2013 [Page 1] Internet-Draft TDLC Theory August 2012 This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 2. History . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Underlying Concepts . . . . . . . . . . . . . . . . . . . . . 4 4. The HDLC (or X.25) protocol . . . . . . . . . . . . . . . . . 4 5. Considerations in designing TDLC . . . . . . . . . . . . . . . 5 5.1. Re-Transmission . . . . . . . . . . . . . . . . . . . . . 5 5.2. Sequence state . . . . . . . . . . . . . . . . . . . . . . 5 5.3. Transmit sequence numbering . . . . . . . . . . . . . . . 6 5.4. Window Size . . . . . . . . . . . . . . . . . . . . . . . 7 5.5. Connection and Re-Connection . . . . . . . . . . . . . . . 7 5.6. Use of Receive Ready . . . . . . . . . . . . . . . . . . . 8 5.7. Point to Multipoint . . . . . . . . . . . . . . . . . . . 9 5.8. Calculation of round trip delay . . . . . . . . . . . . . 9 6. Performance . . . . . . . . . . . . . . . . . . . . . . . . . 9 7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 10 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 9. Security Considerations . . . . . . . . . . . . . . . . . . . 10 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 10.1. Normative References . . . . . . . . . . . . . . . . . . . 10 10.2. Informative References . . . . . . . . . . . . . . . . . . 11 Appendix A. Related Readings . . . . . . . . . . . . . . . . . . 11 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 12 Sabatini Expires March 1, 2013 [Page 2] Internet-Draft TDLC Theory August 2012 1. Introduction Although known and used in the financial community for over 20 years, and the subject of presentations to both industry and academic groups there has never been an academic paper written on this protocol. As both the inventor of this protocol and the original implementer I felt it was my responsibility to correct this oversight. This protocol is a direct result of my studies of the Yu-Lin protocol and the first implementation of it was in 1983. On the first anniversary of its industry publication in 1984, Telerate decided not to assert intellectual property rights and to add it to the public domain. Two attempts were made to commercialize it outside of the financial industry, once by AT&T in 1988 and by AT&T Paradyne in 1991 but lack of an approved standard ultimately lead in both cases to AT&T abandoning the project. Since Telerate operated worldwide over an amazing number of communications technologies, TDLC was born out of the need for a protocol that would handle any communications environment no matter how noisy, how slow or how much delay (including multiple satellite hops) was in the path. In later years its properties were found valuable in congestion situations where packets were dropped. 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 2. History Both major classes of automatic repeat request (ARQ) transport protocols, represented by SDLC/HDLC/X.25 and TCP, have an implicit assumption that the delay between sender and receiver is very small. Because of this assumption when used over a link with considerable delay their error recovery mechanisms are inadequate to insure optimum use of the link or consistent delivery times. Over the years a number of papers suggesting improved "GO BACK N", and selective repeat (SR) procedures and a number of protocols (such as SACK, XTP, NETBLT) have been proposed but none addressed the underlying considerations of recreating the current state of the receiver at the transmitter. Sabatini Expires March 1, 2013 [Page 3] Internet-Draft TDLC Theory August 2012 3. Underlying Concepts In order for a sender to know how to optimally transmit messages to a receiver the sender must recreate the state of the receiver as of the last acknowledgement received and then "age" or modify that state by updating it based upon the messages sent since the state implicit in the acknowledgement was current. In order to do this the sender must maintain a transmission order buffer which lists the message number of each message as it is sent. In TDLC we called the index into the transmission order buffer "Send State" and transmitted this state variable with each message. The receiver, after correctly receiving the message, saves this value and returns it (now called "Returned State") and the list of outstanding messages and their status with each acknowledgement. When the sender receives this information it is then capable of constructing a list of missing messages by taking the list of such messages from the received acknowledgement and then removing from that list all messages that have been transmitted since the message which caused the acknowledgement which is all messages sent with indexes between the current "Send State" and the "Returned State" in the acknowledgement message. The list of outstanding messages is sent as two indexes and a bit array whose length is the difference between these two indexes. The first index is the "Next Confirmed", this value confirms that all messages up to (but not including) this value have been received correctly. The second index is the value of the "Next Highest" which is the highest message number received plus one. The reason that these are both "plus one" is to facilitate link start where there are no previous messages and thus the values would become undefined. The bit array has a one bit if the message is being acknowledged; a zero if has not been received. Thus by transmitting the complete acknowledgement information from the receiver along with an indicator to the sender as to its state current at the time of the acknowledgement the sender can accurately recreate the current status of the receiver assuming all "in flight" messages were received and thus only send the unacknowledged messages starting with the oldest along with any new messages whose retransmission is requested. 4. The HDLC (or X.25) protocol All "Selective Repeat" protocols must recreate the state of the receiver and ultimately implement, perhaps implicitly, the concepts outlined above. It is therefore instructive to look at how HDLC actually works. One of the least understood aspects of this protocol is that the receive sequence number performs three functions simultaneously, it provides a floor for the acknowledgement system Sabatini Expires March 1, 2013 [Page 4] Internet-Draft TDLC Theory August 2012 (it acknowledges all messages up to this one), it indicates the last message received correctly (highest in sequence) and it establishes the sequence of the messages received (and thus transmitted). This is accomplished by the rules of its use. Since you only ever acknowledge the last in sequence message you received, you implicitly acknowledge all that preceded it. Since HDLC is a "GO BACK N" protocol, the sequence of the messages is always identical to the order of transmission (you discard those messages which are out of message sequence). Even with the inclusion of a selective reject construct you can only selectively reject the first possible missing message (the message at "N"). Thus to provide any enhancements to this class of protocols, as we have done in TDLC, we must first split these functions into three separate entities, namely a "Confirmed" which indicates that all the messages up through this one have been received properly, a "Highest Received" which indicates the highest message number of any message properly received that is within the window, and a sequence state token which denotes the order in which the messages were sent. 5. Considerations in designing TDLC 5.1. Re-Transmission Given that we must retransmit a series of messages whose number certainly exceeds 8, we must find some convenient but compact methodology to express which messages need to be re-transmitted. The two most obvious are either a bit map or ordered pairs of indexes. Since there is a high probability that the number of messages which needs to be transmitted is fairly small, the most compact system, a bit map, was chosen. Since it is inelegant to send more bits than are actually needed we should only transmit those bits between the limits "Next Confirmed" and "Next highest received". Although it has been found more practical to transmit all of these bits, one need not send the bit for "Next Confirmed", which by definition you know is missing. "Highest Received" is a different case since it may have been set by an "RR" message and thus the message itself may not have been received. 5.2. Sequence state One property of any transmission environment which is often ignored is that of delay. Even on a locally attached link the time involved in physically sending and receiving the message along with the delay before it is actually processed introduces an element of uncertainty as to what the other end of the link is doing, what messages it has seen and whether it has received a request for retransmission. This Sabatini Expires March 1, 2013 [Page 5] Internet-Draft TDLC Theory August 2012 problem can be solved relatively easily by including a sequence state index with each message sent. This token, which is returned with the next transmission from the other end of the link, tells the transmitter which messages were sent after the point that the receiver is acknowledging. Thus each message being transmitted, whether an original transmission or a re-transmission, increments the sequence state. By maintaining a vector, whose index is sequence state, of the message numbers in the order in which the messages were transmitted or retransmitted, when the sequence state is returned to the transmitting end, the transmitter knows which messages the receiver should have seen and if the receiver still indicates that it has not yet received a particular message that was transmitted then it must be queued for transmission again. In a practical application of this technology, a bit map corresponding to all message numbers is kept by the transmitter. As each message is added to the queue to be transmitted, its corresponding bit in the bit map is turned off, and as it is transmitted it is turned on. When a bit map representing message status is received, the bits for all unacknowledged messages are turned off in the transmitter bitmap. Then using the received state as an index into the vector of transmitted messages, any messages that the transmitter has sent since then are turned back on. The bit map of outgoing messages is then rescanned starting at message "Next Confirmed" and the message associated with the first zero bit is set as the next message to transmit. The transmitting process then continues processing the bit map transmitting messages which have a zero bit until it reaches "Next Highest Queued". A variant of TDLC for higher speed links was designed which used two byte message numbers and a list of acknowledged message ranges in place of the bit map in the original protocol 5.3. Transmit sequence numbering One problem associated with SDLC which does not exist, for example, in DECNET is that of losing the last message or series of messages with no indication until the link times out. This is a small problem at low data rates and short transmission delay environments, but can severely degrade the link under any other circumstances. This is due to the fact that only the "I" frame has a transmit sequence number as well as a receive sequence number, thus there is no way to tell from the "link idle" RR that a message is in fact missing. The obvious solution to this problem is to include a second, transmit sequence number on the "RR", "REJ" and "SREJ" packets (which are very short anyway) to facilitate recovery. In TDLC we treat all messages, whether data carrying or not, equally thus the complete state is transmitted with each message. Since we are sending the complete Sabatini Expires March 1, 2013 [Page 6] Internet-Draft TDLC Theory August 2012 state only a "RR" message is needed, the "REJ" and "SREJ" are implicit. 5.4. Window Size One concept that has not been well explored for "GO BACK N" protocols is the effect of window size on data recovery. When not explicitly acknowledging all members in a sequence, but rather individual messages you create a zone of uncertainty as to which message numbers are merely late acknowledgments or retransmissions and which are errored but CRC correct messages (either through the transmission media or program problem). Thus because of these two features one must assume a minimum of a "window-size" leading and a "window-size" trailing interval in which the message number can legally reside. Thus with NO protection from errored messages, the window size can be at most half the message modulus and good practice, because it is easier and faster to deal with powers of 2, dictates we should keep it at one quarter of modulus, thus allowing us to know with certainty that messages outside of this range should be ignored. 5.5. Connection and Re-Connection One of the known problems in SDLC is that of link establishment. In normal SDLC since there is no information passed in the SABM packet other than the request for connection, it is impossible to explicitly tie UA replies to their respective SABM other than by allowing sufficient time to elapse such that it is impossible for more than one SABM or UA to be outstanding (interestingly the same error resolution by waiting for extended periods is the ultimate heart of all Binary Synchronous Protocols). This leads to excessive line time on long delay circuits and still does not guard against SABM collisions where each side initializes separately and tries to send SABM's simultaneously. This problem would just be of theoretical interest if it only occurred when one side or the other accomplished a physical restart of the link, however with the lack of a logical restart mechanism the physical restart using SABM occurs much more frequently in the real world and results in a total destruction and re-establishment of the link every time there is a communications interruption which exceeds the limits of it's timers as often happens with noisy satellite environments or frame relay transmissions in a congested environment. Another frightful by-product of this behavior is that of the link layer of HDLC makes un-OSI demands on the layers above; delivering restart indications all the way up through the protocol stack. The answer to these problems is the introduction of two new commands and two new replies, "INITIAL CONNECT" (ICN), "RE- CONNECT" (RCN), "INITIAL CONNECT ACKNOWLEDGE" (ICA), and "RE-CONNECT ACKNOWLEDGE" (RCA). These new commands allow the connected link layers to differentiate between a Physical Restart by using an ICN Sabatini Expires March 1, 2013 [Page 7] Internet-Draft TDLC Theory August 2012 command or an ICA reply and a Logical Restart by using a RCN command or a RCA reply. When a link is re-established after an interruption, the exchange of RCN/RCA reconnects the link without affecting the upper layers of the protocol. If either side needs to have the link counters initialized or the message buffers flushed it need only send an ICN or, in response to an RCN, an ICA to force link initialization. To guard against possible link re-establishment where the circuit may have been looped back causing acknowledgment and disposal of messages that weren't actually received and the possibility of the physical switching of a circuit from one active port to another, the data field in the ICN or RCN command will have a random identity stamp and ICA or RCA will return these stamps so that replies can be matched. By comparison of these stamps one can establish/determine the absolute link session information. 5.6. Use of Receive Ready Since the goal of a real time protocol is to be event driven ("ACK Clocked" in TCP nomenclature) and thus much less dependent on timers to correct missing message situations the rules of about using Receive Ready have been broadened in order to maximize the probability of error correction without timer expiration. Note that receiving a "RR" packet with a changed send token is a change in the state of the receiver so the receiver must return a "RR" in response. The first rule about using Receive Ready is that after the receipt of an ICA or RCA packet the receiving end will immediately send an "RR" packet. This is in order to allow the far end to start transmitting, the far end, once it is in connection state, can not transmit until an "RR" is received. Like wise if an end sends an ICA or RCA it must immediately follow with a "RR" to allow the far end to start transmitting data. The second rule is that after the transmission of the last information packet "I" a "RR" packet is always sent. Since a loss of the last information packet will not be noticed until a "RR" or "I" is received, if we send one immediately we can help insure the detection of the loss of the last data packet immediately. The third rule is that if there are messages that are unacknowledged or undelivered and the link has gone idle, another "RR" packet is sent after an interval T3 where T3 is typically one quarter of the roundtrip delay between the two ends (or defaulted to one quarter of T1) as a maximum. If there are no messages outstanding "RR" is sent Sabatini Expires March 1, 2013 [Page 8] Internet-Draft TDLC Theory August 2012 as a "heartbeat" message after interval T1 where T1 is typically slightly greater than twice the total roundtrip delay with a default of four seconds. T2, the link disconnected timer, is typically ten times T1 or defaults to 40 seconds. On dedicated links where transmitting a "RR" has no cost the preferred practice is simply to repeatedly send "RR" packets when the link is idle. 5.7. Point to Multipoint TDLC has been used in both polled multipoint and broadcast multipoint with individual low speed acknowledgement channels. The inclusive oring of all retransmission requests accommodates the retransmission needs of all tributaries at once. One major issue in multipoint is a single station making so many retransmission requests that it impairs the entire channel. Since TDLC had stringent real time delivery requirements an upper level started canceling messages which were not received within 10 seconds, causing the offending terminal to restart but protecting the integrity of the overall facility. Since this is not a generalized solution counts were maintained of the number of messages requested for retransmission by station and an error threshold was established, those stations exceeding the threshold were disconnected. Further study is warranted to establish a better threshold mechanism. 5.8. Calculation of round trip delay One pleasant benefit of having a token which is immediately returned by the far end is the easy calculation of round trip delay. In TDLC we can save a time stamp along with the message number in our transmission order array. This allows us to calculate round trip delay when we receive our transmission state value and use it to access the timestamp. Since more than one received message might have the same transmission state value we zero the timestamp after use to indicate that the value should not be used again. Note that if an acknowledgement is lost we will calculate a longer delay than is accurate therefore we must smooth the returned values, typically returning the smallest out of the last N (where N in TDLC was four). 6. Performance The point to point performance of TDLC has been extensively tested (it was a production protocol for twenty years in a worldwide network of 88,000 computers), this testing confirming its near optimal performance no matter what the combination of message loss and delay. This was due to each reply having both the complete state of the receiver and that of the sender at the time the reply was generated. Sabatini Expires March 1, 2013 [Page 9] Internet-Draft TDLC Theory August 2012 For each reply that got through the message stream could be re- optimized to reflect the current state at the far end. Since the point to point version was designed for dedicated bandwidth situations it was normally configured to send "RR" messages at close to continuous rates. Since the hardware generated and processed these directly (later versions of the protocol were implemented directly in the line interface) they imposed no overhead in a dedicated bandwidth situation but provided the earliest possible indication of message loss and thus the least overall latency. 7. Conclusion TDLC represents an entirely new way at looking at selective acknowledgement protocols. Using the concepts outlined a whole new family of protocols optimized for long delay or high loss rates (especially due to noise or congestion) is possible. TDLC itself represents a large improvement over HDLC based protocols and can be used as a replacement transport protocol with little additional overhead anywhere that HDLC could be used. 8. IANA Considerations This memo includes no request to IANA. All drafts are required to have an IANA considerations section (see the update of RFC 2434 [I-D.narten-iana-considerations-rfc2434bis] for a guide). If the draft does not require IANA to do anything, the section contains an explicit statement that this is the case (as above). If there are no requirements for IANA, the section will be removed during conversion into an RFC by the RFC Editor. 9. Security Considerations All drafts are required to have a security considerations section. See RFC 3552 [RFC3552] for a guide. 10. References 10.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. Sabatini Expires March 1, 2013 [Page 10] Internet-Draft TDLC Theory August 2012 10.2. Informative References [I-D.narten-iana-considerations-rfc2434bis] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", draft-narten-iana-considerations-rfc2434bis-09 (work in progress), March 2008. [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981. [RFC0998] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A bulk data transfer protocol", RFC 998, March 1987. [RFC1072] Jacobson, V. and R. Braden, "TCP extensions for long-delay paths", RFC 1072, October 1988. [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP Selective Acknowledgment Options", RFC 2018, October 1996. [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, June 1999. [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003. Appendix A. Related Readings S. Lin and P. S. Yu, "An effective error control scheme for satellite communications", IEEE Transactions on Communications, vol. COM-28, no. 3, pp. 395--401, Mar. 1980. P. S. Yu and S. Lin, "An efficient selective-repeat ARQ scheme for satellite channels and its throughput analysis," IEEE Transactions on Communications, vol. 29, no. 3, pp. 353--363, March 1981. S. Lin and P. S. Yu, "A hybrid ARQ scheme with parity retransmission for error control of satellite channels," IEEE Trans. Commun., vol. 30, pp. 1701-1719, July 1982. M. J. Miller and S. L. Lin. "The analysis of some selective-repeat ARQ schemes with finite receiver buffer", IEEE Transactions on Communications, Vol. COM-29, No. 9, pages 1307--1315, September 1981. S. R. Chandran and S. Lin, "A selective repeat ARQ scheme for point- to-multipoint communications and its throughput analysis," ACM Sabatini Expires March 1, 2013 [Page 11] Internet-Draft TDLC Theory August 2012 Computer Communications Review, vol. 16, pp. 292--301, Aug. 1986. S. R. Chandran and S. Lin, "Selective-repeat-ARQ schemes for broadcast links," IEEE Trans. Commun., vol. 40, no. 1, pp. 12-19, Jan. 1992. H. M. de Lima, O. Carlos and M. B. Duarte, "A Point-to-Multipoint ARQ Scheme with Multicopy Retransmission for High Speed Satellite Communications" IEEE International Telecommunication Symposium ITS'94, (Rio de Janeiro, 1994) A. N. Netravali, W. D. Roome, and K. Sabnani, "Design and implementation of a high-speed transport protocol," IEEE Transactions on Communications, vol. COM38, pp. 2010. K. K. Sabnani and A. N. Netravali, "A high speed transport protocol for datagram/virtual circuit networks", ACM SIGCOMM Computer Communication Review , Symposium proceedings on Communications architectures & protocols SIGCOMM '89, Volume 19 Issue 4, Aug. 1989. D. Towsley and S. Mithal, "A selective repeat ARQ protocol for a point to multipoint channel," in IEEE International Conference on Communications INFOCOM'87, (San Francisco,CA), pp. 521--526, Apr. 1987. E. Weldon, "An improved selective repeat ARQ strategy", IEEE Transactions on Communications, vol. COM-30, no. 3, pp. 480--486, Mar. 1982. J. L. Wang and J. A. Silvester, "Optimal adaptative multireceiver ARQ protocols," IEEE Transactions on Communications, vol. COM-41, pp. 1816--1829, Dec. 1993. T. Shikama and T. Mizuno, "Resequencing schemes for selective-repeat ARQ and their performance", Advanced Information Networking and Applications, 2005. AINA 2005. 19th International Conference on Volume 2, Issue , Page(s): 491 - 494 vol.2, Mar. 2005. R. Sanders and A. Weaver 1990. "The Xpress transfer protocol (XTP): A tutorial", SIGCOMM Comp. Comm. Rev. 20, 5, Pg 67-80, Oct. 1990. Sabatini Expires March 1, 2013 [Page 12] Internet-Draft TDLC Theory August 2012 Author's Address Anthony Sabatini Broker Communications Inc. 200 West 20th Street Apt. 1216 New York, NY 10011 US Phone: +1 212 867 7179 Email: tsabatini@hotmail.com URI: http://tsabatini.com/ Sabatini Expires March 1, 2013 [Page 13]