v6ops J. Jaeggli Internet-Draft Fastly Intended status: Informational M. Hite Expires: August 18, 2014 M. Byerly Zynga February 14, 2014 Close encounters of the icmp type 2 kind (near misses with ICMPv6 PTB) draft-v6ops-jaeggli-pmtud-ecmp-problem-00 Abstract This document calls attention to the problem posed to ICMP type 2 (PTB) messages by using stateless loading balancing in front of statelful services. It discusses operational mitigations that can mitigate this kind of failure. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on August 18, 2014. Copyright Notice Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of Jaeggli, et al. Expires August 18, 2014 [Page 1] Internet-Draft misses with ICMPv6 PTB February 2014 the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 3. mitigation . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.1. Implementation . . . . . . . . . . . . . . . . . . . . . 4 4. Improvements . . . . . . . . . . . . . . . . . . . . . . . . 4 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 7. Security Considerations . . . . . . . . . . . . . . . . . . . 5 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 5 1. Introduction The operators of large internet applications face some challenges associated with scaling their infrastructure. One approach in the toolbox, used in conjunction with others is the stateless distribution of incoming tcp sessions to stateful servers or to middle boxes known as load-balancers (which could be layer-4 devices or full application proxies) using an ecmp hash. Distribution of traffic in this fashion presents a problem when dealing with icmp signaling that does not match the values used compute the hash employed to distribute incoming traffic. A case that is particularly problematic operationally is path MTU discovery (PMTUD). 2. Problem A common application for stateless load-balancing of TCP or UDP flows is to perform an initial subdivision of flows in front of a stateful load-balancer tier or multiple servers such that the workload can be divided into managable chunks. The division is performed using ECMP forwarding and a stateless but sticky algorythm for hashing the flows across the available paths. This is constrained form of anycast distribution where all anycast destinations are eqidistant topologically from the upstream router making the last nexthop forwarding decision. In this approach the hash used in path selection has to use available headers, typically IP-source/IP- destination/protocol/source-port/destination-port for TCP or UDP though it could be limited to IP-source / IP-destination or in the case of IPv6 IP-source/IP-destination/flow-label. A problem common to this approach to distribution through hashing is it's impact on path-mtu-discovery. An ICMP6 type 2 PTB message generated on the path taken by packets originating from a statelessy load balanced server will have the anycast address as desintation and Jaeggli, et al. Expires August 18, 2014 [Page 2] Internet-Draft misses with ICMPv6 PTB February 2014 will be statelessly load-balanced to one of the stateful loadbalancers. While the ICMPv6 PTB message contains as much of the packet that could not be forwarded as possible the payload does not factor into the forwarding decision and it is not possible to reconstruct the results of the inward bound hashing decision (by reverseing the order of the source/dest/source port/dest port/ protocol number or equivalent hash). An example packet flow and topology follow. ptb -> router ecmp -> nexthop L4/L7 load balancer -> destination router --> load-balancer 1 ---> \\--> load-balancer 2 ---> load-balanced service \--> load-balancer N ---> Figure 1 The router ECMP decsion is used because it is part of the forwarding architecture, can be performed at what amounts to line-rate and is also not depedant on the existance of shared memory or coordination across a distributed forwarding architecture. The ECMP decision is determinstic with resepct to packets having the same hash. The typical case where ICMPv6 PTB messages are recieved at the load- balancer is where the path MTU from the client to the load-balancer is limited by a tunnel, which the client is itself not aware of. In the case of a TCP connection where TLS is employed, the first packet that is likely to exceed a tunnel MTU lower than that specified by the MSS on the client and the load-balancer/server is the serverhello and certificate (With http the client is likely to send the first packet exceeding the MTU given the loquatious nature of http clients). Direct experience says that the frequency of PTB messages is fairly low as a percentage of traffic is pretty low, and that says a lot about native vs tunneled multicast deployment and techniques such as happy-eyeballs may actually contribute some amelioration to the IPv6 client experience. Still the expectation is that PMTUD should work and that unecessary breakage of client traffic should work. Some final oberservations are that it is typically not possible even if potentially desirable to be able to independantly set the TCP mss for different address families. The problem as described does also impact IPv4, however the ability to fragment on wire and the relative rarity of sub-1500 byte MTUs that are not coupled to changes in Jaeggli, et al. Expires August 18, 2014 [Page 3] Internet-Draft misses with ICMPv6 PTB February 2014 client behavior (client vpn tunnels for example typically set the MTU accordingly for perfomance reasons) makes the problem sufficiently rare that some depolyments simply choose to ignore it. 3. mitigation Mitigation of the issue of effectively random distribution of ICMPv6 PTB messages involves insuring that the packet in question is distributed to the load-balancer responsible for for the flow for which it is generated. Mitigation could be done by looking into payload of the ICMP message as the load balancer would (to determine which TCP flow it was associated with) before making a forwarding decision. Alternative mitigation is predicated distributing the PTB message to all load-balancers under the assumption that the one that needs it will be able to match it to a flow and the others will safely discard it. Such distribution has signficant implications for resource consumption (it might result in a 10 or more fold increase in the amount of ICMP PTB traffic) and the potential for DOS if not carefully employed. Fortunately the number of flows for which this problem occurs is relatively small (10 or fewer pps on 1Gb/s or more worth of https traffic) and sensible ingress rate limits can protect the load-balancer tiers with potential fallout only under circumstances of deliberate duress. 3.1. Implementation 1. Filter-based-forwarding matches next-header ICMP6 ICMP type-2 and matches a next hop on a particular subnet directly attached to both border routers. (Filter is policed to 1000pps) 2. Filter is applied on input side of all external interfaces 3. A proxy located at the next hop forwards ICMP6 type-2 packets recived at the nexthop to an ethernet multicast address (example ff:ff:ff:ff:ff:ff) on all specificed subnets. This was necessitated by, The routers inability (in IPv6) to forward the same packet to multiple nexthops. 4. load balancer(s) recieve and discard or process packet as needed 4. Improvements There are several ways to imagine that improvements could be made to the situation with respect load balancing ICMP PTB. Jaeggli, et al. Expires August 18, 2014 [Page 4] Internet-Draft misses with ICMPv6 PTB February 2014 1. Routers with sufficient capacity within the lookup process could parse all the way through the L3 or L4 header in the ICMP Payload begining at bit offset 32 of the ICMP header. By reordering the elements of the hash to match the inward direction of the flow the PTB could be directed to the same nexthop as the incoming packets in the flow. 2. The Fib could be programed with a multicast distribution tree that included all of the necessary nexthops. 5. Acknowledgements 6. IANA Considerations This memo includes no request to IANA. 7. Security Considerations Authors' Addresses Joel Jaeggli Fastly Mountain View, CA US Email: joelja@gmail.com Matt Hite Zynga Redwood City, CA US Email: mhite@zynga.com Matt Byerly Zynga HI US Email: mbyerly@zynga.com Jaeggli, et al. Expires August 18, 2014 [Page 5]