Diff: rfc9199.original

	rfc9199.original	rfc9199.txt


	DNSOP Working Group G.C.M. Moura	Independent Submission G. Moura
	Internet-Draft SIDN Labs/TU Delft	Request for Comments: 9199 SIDN Labs/TU Delft
	Intended status: Informational W. Hardaker	Category: Informational W. Hardaker
	Expires: 8 July 2022 J. Heidemann	ISSN: 2070-1721 J. Heidemann
	USC/Information Sciences Institute	USC/Information Sciences Institute
	M. Davids	M. Davids
	SIDN Labs	SIDN Labs

	4 January 2022	March 2022


	Considerations for Large Authoritative DNS Servers Operators	Considerations for Large Authoritative DNS Server Operators
	draft-moura-dnsop-authoritative-recommendations-11

	Abstract	Abstract

	Recent research work has explored the deployment characteristics and	Recent research work has explored the deployment characteristics and
	configuration of the Domain Name System (DNS). This document	configuration of the Domain Name System (DNS). This document
	summarizes the conclusions from these research efforts and offers	summarizes the conclusions from these research efforts and offers
	specific, tangible considerations or advice to authoritative DNS	specific, tangible considerations or advice to authoritative DNS
	server operators. Authoritative server operators may wish to follow	server operators. Authoritative server operators may wish to follow
	these considerations to improve their DNS services.	these considerations to improve their DNS services.

	It is possible that the results presented in this document could be	It is possible that the results presented in this document could be
	applicable in a wider context than just the DNS protocol, as some of	applicable in a wider context than just the DNS protocol, as some of

	the results may generically apply to any stateless/short-duration,	the results may generically apply to any stateless/short-duration
	anycasted service.	anycasted service.

	This document is not an IETF consensus document: it is published for	This document is not an IETF consensus document: it is published for
	informational purposes.	informational purposes.

	Status of This Memo	Status of This Memo


	This Internet-Draft is submitted in full conformance with the	This document is not an Internet Standards Track specification; it is
	provisions of BCP 78 and BCP 79.	published for informational purposes.

	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF). Note that other groups may also distribute
	working documents as Internet-Drafts. The list of current Internet-
	Drafts is at https://datatracker.ietf.org/drafts/current/.


	Internet-Drafts are draft documents valid for a maximum of six months	This is a contribution to the RFC Series, independently of any other
	and may be updated, replaced, or obsoleted by other documents at any	RFC stream. The RFC Editor has chosen to publish this document at
	time. It is inappropriate to use Internet-Drafts as reference	its discretion and makes no statement about its value for
	material or to cite them other than as "work in progress."	implementation or deployment. Documents approved for publication by
		the RFC Editor are not candidates for any level of Internet Standard;
		see Section 2 of RFC 7841.


	This Internet-Draft will expire on 8 July 2022.	Information about the current status of this document, any errata,
		and how to provide feedback on it may be obtained at
		https://www.rfc-editor.org/info/rfc9199.

	Copyright Notice	Copyright Notice

	Copyright (c) 2022 IETF Trust and the persons identified as the	Copyright (c) 2022 IETF Trust and the persons identified as the
	document authors. All rights reserved.	document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal	This document is subject to BCP 78 and the IETF Trust's Legal

	Provisions Relating to IETF Documents (https://trustee.ietf.org/	Provisions Relating to IETF Documents
	license-info) in effect on the date of publication of this document.	(https://trustee.ietf.org/license-info) in effect on the date of
	Please review these documents carefully, as they describe your rights	publication of this document. Please review these documents
	and restrictions with respect to this document. Code Components	carefully, as they describe your rights and restrictions with respect
	extracted from this document must include Revised BSD License text as	to this document.
	described in Section 4.e of the Trust Legal Provisions and are
	provided without warranty as described in the Revised BSD License.

	Table of Contents	Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3	1. Introduction
	2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 3	2. Background
	3. Considerations . . . . . . . . . . . . . . . . . . . . . . . 5	3. Considerations
	3.1. C1: Deploy anycast in every authoritative server to enhance	3.1. C1: Deploy Anycast in Every Authoritative Server to Enhance
	distribution and latency . . . . . . . . . . . . . . . . 5	Distribution and Latency
	3.1.1. Research background . . . . . . . . . . . . . . . . . 5	3.1.1. Research Background
	3.1.2. Resulting considerations . . . . . . . . . . . . . . 6	3.1.2. Resulting Considerations
	3.2. C2: Optimizing routing is more important than location	3.2. C2: Optimizing Routing is More Important than Location
	count and diversity . . . . . . . . . . . . . . . . . . . 7	Count and Diversity
	3.2.1. Research background . . . . . . . . . . . . . . . . . 7	3.2.1. Research Background
	3.2.2. Resulting considerations . . . . . . . . . . . . . . 8	3.2.2. Resulting Considerations
	3.3. C3: Collecting anycast catchment maps to improve	3.3. C3: Collect Anycast Catchment Maps to Improve Design
	design . . . . . . . . . . . . . . . . . . . . . . . . . 8	3.3.1. Research Background
	3.3.1. Research background . . . . . . . . . . . . . . . . . 8	3.3.2. Resulting Considerations
	3.3.2. Resulting considerations . . . . . . . . . . . . . . 9	3.4. C4: Employ Two Strategies When under Stress
	3.4. C4: When under stress, employ two strategies . . . . . . 9	3.4.1. Research Background
	3.4.1. Research background . . . . . . . . . . . . . . . . . 10	3.4.2. Resulting Considerations
	3.4.2. Resulting considerations . . . . . . . . . . . . . . 11	3.5. C5: Consider Longer Time-to-Live Values Whenever Possible
	3.5. C5: Consider longer time-to-live values whenever	3.5.1. Research Background
	possible . . . . . . . . . . . . . . . . . . . . . . . . 11	3.5.2. Resulting Considerations
	3.5.1. Research background . . . . . . . . . . . . . . . . . 11	3.6. C6: Consider the Difference in Parent and Children's TTL
	3.5.2. Resulting considerations . . . . . . . . . . . . . . 13	Values
	3.6. C6: Consider the TTL differences between parents and	3.6.1. Research Background
	children . . . . . . . . . . . . . . . . . . . . . . . . 14	3.6.2. Resulting Considerations
	3.6.1. Research background . . . . . . . . . . . . . . . . . 14	4. Security Considerations
	3.6.2. Resulting considerations . . . . . . . . . . . . . . 15	5. Privacy Considerations
	4. Security considerations . . . . . . . . . . . . . . . . . . . 15	6. IANA Considerations
	5. Privacy Considerations . . . . . . . . . . . . . . . . . . . 15	7. References
	6. IANA considerations . . . . . . . . . . . . . . . . . . . . . 15	7.1. Normative References
	7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 15	7.2. Informative References
	8. References . . . . . . . . . . . . . . . . . . . . . . . . . 16	Acknowledgements
	8.1. Normative References . . . . . . . . . . . . . . . . . . 16	Contributors
	8.2. Informative References . . . . . . . . . . . . . . . . . 17	Authors' Addresses
	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 20

	1. Introduction	1. Introduction


	This document summarizes recent research work that explored the	This document summarizes recent research that explored the deployed
	deployed DNS configurations and offers derived, specific tangible	DNS configurations and offers derived, specific, tangible advice to
	advice to DNS authoritative server operators (DNS operators	DNS authoritative server operators (referred to as "DNS operators"
	hereafter). The considerations (C1--C5) presented in this document	hereafter). The considerations (C1-C6) presented in this document
	are backed by peer-reviewed research works, which used wide-scale	are backed by peer-reviewed research, which used wide-scale Internet
	Internet measurements to draw their conclusions. This document	measurements to draw their conclusions. This document summarizes the
	summarizes the research results and describes the resulting key	research results and describes the resulting key engineering options.
	engineering options. In each section, it points readers to the	In each section, readers are pointed to the pertinent publications
	pertinent publications where additional details are presented.	where additional details are presented.

	These considerations are designed for operators of "large"	These considerations are designed for operators of "large"

	authoritative DNS servers. In this context, "large" authoritative	authoritative DNS servers, which, in this context, are servers with a
	servers refers to those with a significant global user population,	significant global user population, like top-level domain (TLD)
	like top-level domain (TLD) operators, run by either a single or	operators, run by either a single operator or multiple operators.
	multiple operators. Typically these networks are deployed on wide	Typically, these networks are deployed on wide anycast networks
	anycast networks [RFC1546][AnyBest]. These considerations may not be	[RFC1546] [AnyBest]. These considerations may not be appropriate for
	appropriate for smaller domains, such as those used by an	smaller domains, such as those used by an organization with users in
	organization with users in one unicast network, or in one city or	one unicast network or in a single city or region, where operational
	region, where operational goals such as uniform, global low latency	goals such as uniform, global low latency are less required.
	are less required.

	It is possible that the results presented in this document could be	It is possible that the results presented in this document could be
	applicable in a wider context than just the DNS protocol, as some of	applicable in a wider context than just the DNS protocol, as some of

	the results may generically apply to any stateless/short-duration,	the results may generically apply to any stateless/short-duration
	anycasted service. Because the conclusions of the reviewed studies	anycasted service. Because the conclusions of the reviewed studies
	don't measure smaller networks, the wording in this document	don't measure smaller networks, the wording in this document

	concentrates solely on disusing large-scale DNS authoritative	concentrates solely on discussing large-scale DNS authoritative
	services only.	services.

	This document is not an IETF consensus document: it is published for	This document is not an IETF consensus document: it is published for
	informational purposes.	informational purposes.

	2. Background	2. Background


	The DNS has main two types of DNS servers: authoritative servers and	The DNS has two main types of DNS servers: authoritative servers and
	recursive resolvers, shown by a representational deployment model in	recursive resolvers, shown by a representational deployment model in

	Figure 1. An authoritative server (shown as AT1--AT4 in Figure 1)	Figure 1. An authoritative server (shown as AT1-AT4 in Figure 1)
	knows the content of a DNS zone, and is responsible for answering	knows the content of a DNS zone and is responsible for answering
	queries about that zone. It runs using local (possibly automatically	queries about that zone. It runs using local (possibly automatically
	updated) copies of the zone and does not need to query other servers	updated) copies of the zone and does not need to query other servers

	[RFC2181] in order to answer requests. A recursive resolver (Re1--	[RFC2181] in order to answer requests. A recursive resolver
	Re3) is a server that iteratively queries authoritative and other	(Re1-Re3) is a server that iteratively queries authoritative and
	servers to answer queries received from client requests [RFC1034]. A	other servers to answer queries received from client requests
	client typically employs a software library called a stub resolver	[RFC1034]. A client typically employs a software library called a
	(stub in Figure 1) to issue its query to the upstream recursive	"stub resolver" ("stub" in Figure 1) to issue its query to the
	resolvers [RFC1034].	upstream recursive resolvers [RFC1034].

	+-----+ +-----+ +-----+ +-----+	+-----+ +-----+ +-----+ +-----+
	| AT1 | | AT2 | | AT3 | | AT4 |	| AT1 | | AT2 | | AT3 | | AT4 |
	+-----+ +-----+ +-----+ +-----+	+-----+ +-----+ +-----+ +-----+
	^ ^ ^ ^	^ ^ ^ ^
	| | | |	| | | |
	| +-----+ | |	| +-----+ | |
	+------| Re1 |----+| |	+------| Re1 |----+| |
	| +-----+ |	| +-----+ |
	| ^ |	| ^ |
	| | |	| | |
	| +----+ +----+ |	| +----+ +----+ |
	+------|Re2 | |Re3 |------+	+------|Re2 | |Re3 |------+
	+----+ +----+	+----+ +----+
	^ ^	^ ^
	| |	| |
	| +------+ |	| +------+ |
	+-| stub |-+	+-| stub |-+
	+------+	+------+


	Figure 1: Relationship between recursive resolvers (Re) and	Figure 1: Relationship between Recursive Resolvers (Re) and
	authoritative name servers (ATn)	Authoritative Name Servers (ATn)

	DNS queries issued by a client contribute to a user's perceived	DNS queries issued by a client contribute to a user's perceived

	perceived latency and affect user experience [Singla2014] depending	latency and affect the user experience [Singla2014] depending on how
	on how long it takes for responses to be returned. The DNS system	long it takes for responses to be returned. The DNS system has been
	has been subject to repeated Denial of Service (DoS) attacks (for	subject to repeated Denial-of-Service (DoS) attacks (for example, in
	example, in November 2015 [Moura16b]) in order to specifically	November 2015 [Moura16b]) in order to specifically degrade the user
	degrade user experience.	experience.

	To reduce latency and improve resiliency against DoS attacks, the DNS	To reduce latency and improve resiliency against DoS attacks, the DNS
	uses several types of service replication. Replication at the	uses several types of service replication. Replication at the

	authoritative server level can be achieved with (i) the deployment of	authoritative server level can be achieved with the following:
	multiple servers for the same zone [RFC1035] (AT1---AT4 in Figure 1),
	(ii) the use of IP anycast [RFC1546][RFC4786][RFC7094] that allows	i. the deployment of multiple servers for the same zone [RFC1035]
	the same IP address to be announced from multiple locations (each of	(AT1-AT4 in Figure 1);
	referred to as an "anycast instance" [RFC8499]) and (iii) the use of
	load balancers to support multiple servers inside a single	ii. the use of IP anycast [RFC1546] [RFC4786] [RFC7094] that allows
	(potentially anycasted) instance. As a consequence, there are many	the same IP address to be announced from multiple locations
	possible ways an authoritative DNS provider can engineer its	(each of referred to as an "anycast instance" [RFC8499]); and
	production authoritative server network, with multiple viable choices
	and no necessarily single optimal design.	iii. the use of load balancers to support multiple servers inside a
		single (potentially anycasted) instance. As a consequence,
		there are many possible ways an authoritative DNS provider can
		engineer its production authoritative server network with
		multiple viable choices, and there is not necessarily a single
		optimal design.

	3. Considerations	3. Considerations


	In the next sections we cover the specific consideration (C1--C6) for	In the next sections, we cover the specific considerations (C1-C6)
	conclusions drawn within the academic papers about large	for conclusions drawn within academic papers about large
	authoritative DNS server operators. These considerations are	authoritative DNS server operators. These considerations are

	conclusions reached from academic works that authoritative server	conclusions reached from academic work that authoritative server
	operators may wish to consider in order to improve their DNS service.	operators may wish to consider in order to improve their DNS service.
	Each consideration offers different improvements that may impact	Each consideration offers different improvements that may impact
	service latency, routing, anycast deployment, and defensive	service latency, routing, anycast deployment, and defensive

	strategies for example.	strategies, for example.


	3.1. C1: Deploy anycast in every authoritative server to enhance	3.1. C1: Deploy Anycast in Every Authoritative Server to Enhance
	distribution and latency	Distribution and Latency


	3.1.1. Research background	3.1.1. Research Background

	Authoritative DNS server operators announce their service using NS	Authoritative DNS server operators announce their service using NS

	records[RFC1034]. Different authoritative servers for a given zone	records [RFC1034]. Different authoritative servers for a given zone
	should return the same content; typically they stay synchronized	should return the same content; typically, they stay synchronized
	using DNS zone transfers (AXFR[RFC5936] and IXFR[RFC1995]),	using DNS zone transfers (authoritative transfer (AXFR) [RFC5936] and
	coordinating the zone data they all return to their clients.	incremental zone transfer (IXFR) [RFC1995]), coordinating the zone
		data they all return to their clients.

	As discussed above, the DNS heavily relies upon replication to	As discussed above, the DNS heavily relies upon replication to

	support high reliability, ensure capacity and to reduce latency	support high reliability, ensure capacity, and reduce latency
	[Moura16b]. DNS has two complementary mechanisms for service	[Moura16b]. The DNS has two complementary mechanisms for service
	replication: nameserver replication (multiple NS records) and anycast	replication: name server replication (multiple NS records) and
	(multiple physical locations). Nameserver replication is strongly	anycast (multiple physical locations). Name server replication is
	recommended for all zones (multiple NS records), and IP anycast is	strongly recommended for all zones (multiple NS records), and IP
	used by many larger zones such as the DNS Root[AnyFRoot], most top-	anycast is used by many larger zones such as the DNS root [AnyFRoot],
	level domains[Moura16b] and many large commercial enterprises,	most top-level domains [Moura16b], and many large commercial
	governments and other organizations.	enterprises, governments, and other organizations.

	Most DNS operators strive to reduce service latency for users, which	Most DNS operators strive to reduce service latency for users, which
	is greatly affected by both of these replication techniques.	is greatly affected by both of these replication techniques.
	However, because operators only have control over their authoritative	However, because operators only have control over their authoritative

	servers, and not over the client's recursive resolvers, it is	servers and not over the client's recursive resolvers, it is
	difficult to ensure that recursives will be served by the closest	difficult to ensure that recursives will be served by the closest
	authoritative server. Server selection is ultimately up to the	authoritative server. Server selection is ultimately up to the
	recursive resolver's software implementation, and different vendors	recursive resolver's software implementation, and different vendors

	and even different releases employ different criteria to chose the	and even different releases employ different criteria to choose the
	authoritative servers with which to communicate.	authoritative servers with which to communicate.

	Understanding how recursive resolvers choose authoritative servers is	Understanding how recursive resolvers choose authoritative servers is
	a key step in improving the effectiveness of authoritative server	a key step in improving the effectiveness of authoritative server
	deployments. To measure and evaluate server deployments,	deployments. To measure and evaluate server deployments,

	[Mueller17b] deployed seven unicast authoritative name servers in	[Mueller17b] describes the deployment of seven unicast authoritative
	different global locations and then queried them from more than 9000	name servers in different global locations and then queried them from
	RIPE authoritative server operators and their respective recursive	more than 9000 Reseaux IP Europeens (RIPE) authoritative server
	resolvers.	operators and their respective recursive resolvers.


	[Mueller17b] found that recursive resolvers in the wild query all	It was found in [Mueller17b] that recursive resolvers in the wild
	available authoritative servers, regardless of the observed latency.	query all available authoritative servers, regardless of the observed
	But the distribution of queries tends to be skewed towards	latency. But the distribution of queries tends to be skewed towards
	authoritatives with lower latency: the lower the latency between a	authoritatives with lower latency: the lower the latency between a
	recursive resolver and an authoritative server, the more often the	recursive resolver and an authoritative server, the more often the
	recursive will send queries to that server. These results were	recursive will send queries to that server. These results were

	obtained by aggregating results from all of the vantage points and	obtained by aggregating results from all of the vantage points, and
	were not specific to any specific vendor or version.	they were not specific to any vendor or version.

	The authors believe this behavior is a consequence of combining the	The authors believe this behavior is a consequence of combining the
	two main criteria employed by resolvers when selecting authoritative	two main criteria employed by resolvers when selecting authoritative
	servers: resolvers regularly check all listed authoritative servers	servers: resolvers regularly check all listed authoritative servers

	in an NS set to determine which is closer (the least latent) and when	in an NS set to determine which is closer (the least latent), and
	one isn't available selects one of the alternatives.	when one isn't available, it selects one of the alternatives.
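
The selection behavior described above can be illustrated with a toy model. The following is a minimal sketch, not the algorithm of any particular resolver: it assumes a hypothetical resolver that picks among the servers of an NS set with probability inversely proportional to the measured RTT, so every server is still queried but lower-latency servers receive more queries. The server names, the RTT values, and the function names `pick_server` and `simulate` are illustrative assumptions and are not taken from [Mueller17b].

```python
import random
from collections import Counter

# Hypothetical smoothed RTTs (ms) from one recursive resolver to each
# authoritative server in an NS set; the values are illustrative only.
RTT_MS = {"ns1.example.": 20.0, "ns2.example.": 80.0, "ns3.example.": 200.0}


def pick_server(rtt_ms, rng):
    """Pick one server with probability proportional to 1/RTT.

    This is an assumed toy policy; real resolvers use their own,
    vendor-specific selection logic.
    """
    servers = list(rtt_ms)
    weights = [1.0 / rtt_ms[s] for s in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


def simulate(rtt_ms, queries=10_000, seed=1):
    """Count how many of `queries` lookups go to each server."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    return Counter(pick_server(rtt_ms, rng) for _ in range(queries))


if __name__ == "__main__":
    for server, count in simulate(RTT_MS).most_common():
        print(f"{server:14s} {count}")
```

Under this assumed policy, no server is starved of queries, which mirrors the observation that the latency of every NS record matters to the operator.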


	3.1.2. Resulting considerations	3.1.2. Resulting Considerations

	For an authoritative DNS operator, this result means that the latency	For an authoritative DNS operator, this result means that the latency
	of all authoritative servers (NS records) matter, so they all must be	of all authoritative servers (NS records) matter, so they all must be
	similarly capable -- all available authoritatives will be queried by	similarly capable -- all available authoritatives will be queried by
	most recursive resolvers. Unicasted services, unfortunately, cannot	most recursive resolvers. Unicasted services, unfortunately, cannot
	deliver good latency worldwide (a unicast authoritative server in	deliver good latency worldwide (a unicast authoritative server in
	Europe will always have high latency to resolvers in California and	Europe will always have high latency to resolvers in California and
	Australia, for example, given its geographical distance).	Australia, for example, given its geographical distance).

	[Mueller17b] recommends that DNS operators deploy equally strong IP	[Mueller17b] recommends that DNS operators deploy equally strong IP
	anycast instances for every authoritative server (i.e., for each NS	anycast instances for every authoritative server (i.e., for each NS
	record). Each large authoritative DNS server provider should phase	record). Each large authoritative DNS server provider should phase

	out their usage of unicast and deploy a well engineered number of	out its usage of unicast and deploy a number of well-engineered
	anycast instances with good peering strategies so they can provide	anycast instances with good peering strategies so they can provide
	good latency to their global clients.	good latency to their global clients.

	As a case study, the ".nl" TLD zone was originally served on seven	As a case study, the ".nl" TLD zone was originally served on seven
	authoritative servers with a mixed unicast/anycast setup. In early	authoritative servers with a mixed unicast/anycast setup. In early
	2018, .nl moved to a setup with 4 anycast authoritative servers.	2018, .nl moved to a setup with 4 anycast authoritative servers.


	[Mueller17b]'s contribution to DNS service engineering shows that	The contribution of [Mueller17b] to DNS service engineering shows
	because unicast cannot deliver good latency worldwide, anycast needs	that because unicast cannot deliver good latency worldwide, anycast
	to be used to provide a low latency service worldwide.	needs to be used to provide a low-latency service worldwide.


	3.2. C2: Optimizing routing is more important than location count and	3.2. C2: Optimizing Routing is More Important than Location Count and
	diversity	Diversity


	3.2.1. Research background	3.2.1. Research Background

	When selecting an anycast DNS provider or setting up an anycast	When selecting an anycast DNS provider or setting up an anycast

	service, choosing the best number of anycast	service, choosing the best number of anycast instances [RFC4786]
	instances[RFC4786][RFC7094] to deploy is a challenging problem.	[RFC7094] to deploy is a challenging problem. Selecting the right
	Selecting where and how many global locations to announce from using	quantity and set of global locations that should send BGP
	BGP is tricky. Intuitively, one could naively think that the more	announcements is tricky. Intuitively, one could naively think that
	instances the better and simply "more" will always lead to shorter	more instances are better and that simply "more" will always lead to
	response times.	shorter response times.


	This is not necessarily true, however. In fact, [Schmidt17a] found	This is not necessarily true, however. In fact, proper route
	that proper route engineering can matter more than the total number	engineering can matter more than the total number of locations, as
	of locations. They analyzed the relationship between the number of	found in [Schmidt17a]. To study the relationship between the number
	anycast instances and service performance (measuring latency of the	of anycast instances and the associated service performance, the
	round-trip time (RTT)), measuring the overall performance of four DNS	authors measured the round-trip time (RTT) latency of four DNS root
	Root servers. The Root DNS servers are implemented by 12 separate	servers. The root DNS servers are implemented by 12 separate
	organizations serving the DNS root zone at 13 different IPv4/IPv6	organizations serving the DNS root zone at 13 different IPv4/IPv6
	address pairs.	address pairs.

	The results documented in [Schmidt17a] measured the performance of	The results documented in [Schmidt17a] measured the performance of

	the {c,f,k,l}.root-servers.net (hereafter, "C", "F", "K" and "L")	the {c,f,k,l}.root-servers.net (referred to as "C", "F", "K", and "L"
	servers from more than 7.9k RIPE Atlas probes. RIPE Atlas is a	hereafter) servers from more than 7,900 RIPE Atlas probes. RIPE
	Internet measurement platform with more than 12000 global vantage	Atlas is an Internet measurement platform with more than 12,000
	points called "Atlas Probes" -- it is used regularly by both	global vantage points called "Atlas probes", and it is used regularly
	researchers and operators [RipeAtlas15a] [RipeAtlas19a].	by both researchers and operators [RipeAtlas15a] [RipeAtlas19a].


	[Schmidt17a] found that the C server, a smaller anycast deployment	In [Schmidt17a], the authors found that the C server, a smaller
	consisting of only 8 instances, provided very similar overall	anycast deployment consisting of only 8 instances, provided very
	performance in comparison to the much larger deployments of K and L,	similar overall performance in comparison to the much larger
	with 33 and 144 instances respectively. The median RTT for C, K and	deployments of K and L, with 33 and 144 instances, respectively. The
	L root server were all between 30-32ms.	median RTTs for the C, K, and L root servers were all between 30-32
		ms.

	Because RIPE Atlas is known to have better coverage in Europe than	Because RIPE Atlas is known to have better coverage in Europe than
	other regions, the authors specifically analyzed the results per	other regions, the authors specifically analyzed the results per

	region and per country (Figure 5 in [Schmidt17a]), and show that	region and per country (Figure 5 in [Schmidt17a]) and show that known
	known Atlas bias toward Europe does not change the conclusion that	Atlas bias toward Europe does not change the conclusion that properly
	properly selected anycast locations is more important to latency than	selected anycast locations are more important to latency than the
	the number of sites.	number of sites.


	3.2.2. Resulting considerations	3.2.2. Resulting Considerations


	The important conclusion of [Schmidt17a] is that when engineering	The important conclusion from [Schmidt17a] is that when engineering
	anycast services for performance, factors other than just the number	anycast services for performance, factors other than just the number
	of instances (such as local routing connectivity) must be considered.	of instances (such as local routing connectivity) must be considered.
	Specifically, optimizing routing policies is more important than	Specifically, optimizing routing policies is more important than

	simply adding new instances. They showed that 12 instances can	simply adding new instances. The authors showed that 12 instances
	provide reasonable latency, assuming they are globally distributed	can provide reasonable latency, assuming they are globally
	and have good local interconnectivity. However, additional instances	distributed and have good local interconnectivity. However,
	can still be useful for other reasons, such as when handling Denial-	additional instances can still be useful for other reasons, such as
	of-service (DoS) attacks [Moura16b].	when handling DoS attacks [Moura16b].


	3.3. C3: Collecting anycast catchment maps to improve design	3.3. C3: Collect Anycast Catchment Maps to Improve Design


	3.3.1. Research background	3.3.1. Research Background


	An anycast DNS service may be deployed from anywhere from several	An anycast DNS service may be deployed from anywhere and from several
	locations to hundreds of locations (for example, l.root-servers.net	locations to hundreds of locations (for example, l.root-servers.net
	has over 150 anycast instances at the time this was written).	has over 150 anycast instances at the time this was written).
	Anycast leverages Internet routing to distribute incoming queries to	Anycast leverages Internet routing to distribute incoming queries to

	a service's hop-nearest distributed anycast locations. However,	a service's nearest distributed anycast locations measured by the
	usually queries are not evenly distributed across all anycast	number of routing hops. However, queries are usually not evenly
	locations, as found in the case of L-Root [IcannHedge18].	distributed across all anycast locations, as found in the case of
		L-Root when analyzed using Hedgehog [IcannHedgehog].

	Adding locations to or removing locations from a deployed anycast	Adding locations to or removing locations from a deployed anycast
	network changes the load distribution across all of its locations.	network changes the load distribution across all of its locations.
	When a new location is announced by BGP, locations may receive more	When a new location is announced by BGP, locations may receive more
	or less traffic than it was engineered for, leading to suboptimal	or less traffic than they were engineered for, leading to suboptimal
	service performance or even stressing some locations while leaving	service performance or even stressing some locations while leaving

	others underutilized. Operators constantly face this scenario that	others underutilized. Operators constantly face this scenario when
	when expanding an anycast service. Operators cannot easily directly	expanding an anycast service. Operators cannot easily directly
	estimate future query distributions based on proposed anycast network	estimate future query distributions based on proposed anycast network
	engineering decisions.	engineering decisions.


	To address this need and estimate the query loads based on changing,	To address this need and estimate the query loads of an anycast
	in particular expanding, anycast service changes [Vries17b] developed	service undergoing changes (in particular expanding), [Vries17b]
	a new technique enabling operators to carry out active measurements,	describes the development of a new technique enabling operators to
	using an open-source tool called Verfploeter (available at	carry out active measurements using an open-source tool called
	[VerfSrc]). The results allow the creation of detailed anycast maps	Verfploeter (available at [VerfSrc]). The results allow the creation
	and catchment estimates. By running verfploeter combined with a	of detailed anycast maps and catchment estimates. By running
	published IPv4 "hit list", DNS can precisely calculate which remote	Verfploeter combined with a published IPv4 "hit list", operators can
	prefixes will be matched to each anycast instance in a network. At	precisely calculate which remote prefixes will be matched to each
	the moment of this writing, Verfploeter still does not support IPv6	anycast instance in a network. At the time of this writing,
	as the IPv4 hit lists used are generated via frequent large scale	Verfploeter still does not support IPv6 as the IPv4 hit lists used
	ICMP echo scans, which is not possible using IPv6.	are generated via frequent large-scale ICMP echo scans, which is not
		possible using IPv6.


	As proof of concept, [Vries17b] documents how it verfploeter was used	As proof of concept, [Vries17b] documents how Verfploeter was used to
	to predict both the catchment and query load distribution for a new	predict both the catchment and query load distribution for a new
	anycast instance deployed for b.root-servers.net. Using two anycast	anycast instance deployed for b.root-servers.net. Using two anycast
	test instances in Miami (MIA) and Los Angeles (LAX), an ICMP echo	test instances in Miami (MIA) and Los Angeles (LAX), an ICMP echo

	query was sent from an IP anycast addresses to each IPv4 /24 network	query was sent from an IP anycast address to each IPv4 /24 network
	routing block on the Internet.	routing block on the Internet.

	The ICMP echo responses were recorded at both sites and analyzed and	The ICMP echo responses were recorded at both sites and analyzed and

	overlayed onto a graphical world map, resulting in an Internet scale	overlaid onto a graphical world map, resulting in an Internet-scale
	catchment map. To calculate expected load once the production	catchment map. To calculate expected load once the production
	network was enabled, the quantity of traffic received by b.root-	network was enabled, the quantity of traffic received by b.root-
	servers.net's single site at LAX was recorded based on a single day's	servers.net's single site at LAX was recorded based on a single day's

	traffic (2017-04-12, DITL datasets [Ditl17]). [Vries17b] predicted	traffic (2017-04-12, "day in the life" (DITL) datasets [Ditl17]). In
	that 81.6% of the traffic load would remain at the LAX site. This	[Vries17b], it was predicted that 81.6% of the traffic load would
	estimate by verfploeter turned out to be very accurate; the actual	remain at the LAX site. This Verfploeter estimate turned out to be
	measured traffic volume when production service at MIA was enabled	very accurate; the actual measured traffic volume when production
	was 81.4%.	service at MIA was enabled was 81.4%.
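
The load estimate described above amounts to weighting historical per-prefix query counts by the measured catchment. The following is a minimal Python sketch of that arithmetic, not Verfploeter's own code. The function name, the input dictionaries, and the sample data are hypothetical. It assumes a mapping from each /24 prefix to the site that received its ICMP echo reply, plus per-prefix query counts from a past day.

```python
from collections import defaultdict


def predict_site_load(catchment, queries_per_prefix):
    """Return the predicted fraction of total queries per site.

    catchment: dict mapping a /24 prefix to the site that received its reply
    queries_per_prefix: dict mapping a /24 prefix to a historical query count
    Prefixes missing from the catchment map are counted under "unknown".
    """
    totals = defaultdict(int)
    for prefix, count in queries_per_prefix.items():
        totals[catchment.get(prefix, "unknown")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {site: n / grand_total for site, n in totals.items()}


# Illustrative (made-up) data, not measurements from [Vries17b].
catchment = {
    "192.0.2.0/24": "LAX",
    "198.51.100.0/24": "MIA",
    "203.0.113.0/24": "LAX",
}
queries = {
    "192.0.2.0/24": 600,
    "198.51.100.0/24": 200,
    "203.0.113.0/24": 200,
}
shares = predict_site_load(catchment, queries)
```

Comparing such predicted shares with the shares observed after deployment is the kind of check behind the 81.6% versus 81.4% figures above.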

	Verfploeter can also be used to estimate traffic shifts based on	Verfploeter can also be used to estimate traffic shifts based on

	other BGP route engineering techniques (for example, AS path	other BGP route engineering techniques (for example, Autonomous
	prepending or BGP community use) in advance of operational	System (AS) path prepending or BGP community use) in advance of
	deployment. [Vries17b] studied this using prepending with 1-3 hops	operational deployment. This was studied in [Vries17b] using
	at each instance and compared the results against real operational	prepending with 1-3 hops at each instance, and the results were
	changes to validate the techniques accuracy.	compared against real operational changes to validate the accuracy of
		the techniques.


	3.3.2. Resulting considerations	3.3.2. Resulting Considerations

	An important operational takeaway [Vries17b] provides is how DNS	An important operational takeaway [Vries17b] provides is how DNS
	operators can make informed engineering choices when changing DNS	operators can make informed engineering choices when changing DNS
	anycast network deployments by using Verfploeter in advance.	anycast network deployments by using Verfploeter in advance.

	Operators can identify sub-optimal routing situations in advance with	Operators can identify suboptimal routing situations in advance with
	significantly better coverage than using other active measurement	significantly better coverage than using other active
	platforms such as RIPE Atlas. To date, Verfploeter has been deployed	measurement platforms such as RIPE Atlas. To date, Verfploeter has
	on a operational testbed (Anycast testbed) [AnyTest], on a large	been deployed on an operational testbed (anycast testbed) [AnyTest]
	unnamed operator and is run daily at b.root-servers.net[Vries17b].	and on a large unnamed operator, and is run daily at b.root-servers.net
		[Vries17b].

	Operators should use active measurement techniques like Verfploeter	Operators should use active measurement techniques like Verfploeter
	in advance of potential anycast network changes to accurately measure	in advance of potential anycast network changes to accurately measure
	the benefits and potential issues ahead of time.	the benefits and potential issues ahead of time.


	3.4. C4: When under stress, employ two strategies	3.4. C4: Employ Two Strategies When under Stress
	3.4.1. Research background	3.4.1. Research Background

	DDoS attacks are becoming bigger, cheaper, and more frequent	DDoS attacks are becoming bigger, cheaper, and more frequent
	[Moura16b]. The most powerful recorded DDoS attack against DNS	[Moura16b]. The most powerful recorded DDoS attack against DNS

	servers to date reached 1.2 Tbps by using IoT devices [Perlroth16].	servers to date reached 1.2 Tbps by using Internet of Things (IoT)
	How should a DNS operator engineer its anycast authoritative DNS	devices [Perlroth16]. How should a DNS operator engineer its anycast
	server react to such a DDoS attack? [Moura16b] investigates this	authoritative DNS server to react to such a DDoS attack? [Moura16b]
	question using empirical observations grounded with theoretical	investigates this question using empirical observations grounded with
	option evaluations.	theoretical option evaluations.

	An authoritative DNS server deployed using anycast will have many	An authoritative DNS server deployed using anycast will have many
	server instances distributed over many networks. Ultimately, the	server instances distributed over many networks. Ultimately, the
	relationship between the DNS provider's network and a client's ISP	relationship between the DNS provider's network and a client's ISP
	will determine which anycast instance will answer queries for a given	will determine which anycast instance will answer queries for a given

	client, given that BGP is the protocol that maps clients to specific	client, given that the BGP protocol maps clients to specific anycast
	anycast instances by using routing information [RF:KDar02]. As a	instances using routing information. As a consequence, when an
	consequence, when an anycast authoritative server is under attack,	anycast authoritative server is under attack, the load that each
	the load that each anycast instance receives is likely to be unevenly	anycast instance receives is likely to be unevenly distributed (a
	distributed (a function of the source of the attacks), thus some	function of the source of the attacks); thus, some instances may be
	instances may be more overloaded than others which is what was	more overloaded than others, which is what was observed when
	observed analyzing the Root DNS events of Nov. 2015 [Moura16b].	analyzing the root DNS events of November 2015 [Moura16b]. Given the
	Given the fact that different instances may have different capacity	fact that different instances may have different capacities
	(bandwidth, CPU, etc.), making a decision about how to react to	(bandwidth, CPU, etc.), making a decision about how to react to
	stress becomes even more difficult.	stress becomes even more difficult.


	In practice, an anycast instance is overloaded with incoming traffic,	In practice, when an anycast instance is overloaded with incoming
	operators have two options:	traffic, operators have two options:

	* They can withdraw its routes, pre-prepend its AS route to some or	* They can withdraw its routes, pre-prepend its AS route to some or

	all of its neighbors, perform other traffic shifting tricks (such	all of its neighbors, perform other traffic-shifting tricks (such
	as reducing route announcement propagation using BGP	as reducing route announcement propagation using BGP communities
	communities[RFC1997]), or by communicating with its upstream	[RFC1997]), or communicate with its upstream network providers to
	network providers to apply filtering (potentially using FlowSpec	apply filtering (potentially using FlowSpec [RFC8955] or the DDoS
	[RFC8955] or DOTS protocol ([RFC8811], [RFC8782], [RFC8783]).	Open Threat Signaling (DOTS) protocol [RFC8811] [RFC9132]
	These techniques shift both legitimate and attack traffic to other	[RFC8783]). These techniques shift both legitimate and attack
	anycast instances (with hopefully greater capacity) or to block	traffic to other anycast instances (with hopefully greater
	traffic entirely.	capacity) or block traffic entirely.


	* Alternatively, operators can be become a degraded absorber by	* Alternatively, operators can become degraded absorbers by
	continuing to operate, knowing dropping incoming legitimate	continuing to operate, knowingly dropping incoming legitimate
	requests due to queue overflow. However, this approach will also	requests due to queue overflow. However, this approach will also
	absorb attack traffic directed toward its catchment, hopefully	absorb attack traffic directed toward its catchment, hopefully
	protecting the other anycast instances.	protecting the other anycast instances.


	[Moura16b] saw both of these behaviors deployed in practice by	[Moura16b] describes seeing both of these behaviors deployed in
	studying instance reachability and route-trip time (RTTs) in the DNS	practice when studying instance reachability and RTTs in the DNS root
	root events. When withdraw strategies were deployed, the stress of	events. When withdraw strategies were deployed, the stress of
	increased query loads were displaced from one instance to multiple	increased query loads was displaced from one instance to multiple
	other sites. In other observed events, one site was left to absorb	other sites. In other observed events, one site was left to absorb

	the brunt of an attack leaving the other sites to remain relatively	the brunt of an attack, leaving the other sites to remain relatively
	less affected.	less affected.


	3.4.2. Resulting considerations	3.4.2. Resulting Considerations


	Operators should consider having both a anycast site withdraw	Operators should consider having both an anycast site withdraw
	strategy and a absorption strategy ready to be used before a network	strategy and an absorption strategy ready to be used before a network
	overload occurs. Operators should be able to deploy one or both of	overload occurs. Operators should be able to deploy one or both of
	these strategies rapidly. Ideally, these should be encoded into	these strategies rapidly. Ideally, these should be encoded into
	operating playbooks with defined site measurement guidelines for	operating playbooks with defined site measurement guidelines for
	which strategy to employ based on measured data from past events.	which strategy to employ based on measured data from past events.

	[Moura16b] speculates that careful, explicit, and automated	[Moura16b] speculates that careful, explicit, and automated
	management policies may provide stronger defenses to overload events.	management policies may provide stronger defenses to overload events.

	DNS operators should be ready to employ both traditional filtering	DNS operators should be ready to employ both common filtering
	approaches and other routing load balancing techniques	approaches and other routing load-balancing techniques (such as
	(withdraw/prepend/communities or isolate instances), where the best	withdrawing routes, prepending Autonomous Systems (ASes), adding
	choice depends on the specifics of the attack.	communities, or isolating instances), where the best choice depends
		on the specifics of the attack.
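
One way to encode a playbook rule of the kind discussed above is a small decision function. The sketch below is purely illustrative. The function name, its inputs, and the rule itself (withdraw only when the remaining sites have enough spare capacity to take the displaced load, otherwise absorb) are assumptions and are not taken from [Moura16b]. A real playbook would rest on measured data from past events.

```python
def choose_strategy(site_load_qps, site_capacity_qps, spare_capacity_elsewhere_qps):
    """Pick a hypothetical response for one anycast site.

    Returns "normal" when the site is within capacity, "withdraw" when the
    other sites could carry this site's whole load, and "absorb" otherwise,
    so that the overloaded site keeps the attack inside its own catchment.
    """
    if site_load_qps <= site_capacity_qps:
        return "normal"
    if spare_capacity_elsewhere_qps >= site_load_qps:
        return "withdraw"
    return "absorb"


# Hypothetical numbers, in queries per second.
decision = choose_strategy(
    site_load_qps=500_000,
    site_capacity_qps=200_000,
    spare_capacity_elsewhere_qps=150_000,
)
```

With these hypothetical numbers the site stays announced and absorbs, because shifting 500,000 queries per second onto 150,000 of spare capacity would overload the other sites as well.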

	Note that this consideration refers to the operation of just one	Note that this consideration refers to the operation of just one
	anycast service point, i.e., just one anycasted IP address block	anycast service point, i.e., just one anycasted IP address block
	covering one NS record. However, DNS zones with multiple	covering one NS record. However, DNS zones with multiple
	authoritative anycast servers may also expect loads to shift from one	authoritative anycast servers may also expect loads to shift from one

	anycasted server to another, as resolvers switch from on	anycasted server to another, as resolvers switch from one
	authoritative service point to another when attempting to resolve a	authoritative service point to another when attempting to resolve a
	name [Mueller17b].	name [Mueller17b].


	3.5. C5: Consider longer time-to-live values whenever possible	3.5. C5: Consider Longer Time-to-Live Values Whenever Possible


	3.5.1. Research background	3.5.1. Research Background

	Caching is the cornerstone of good DNS performance and reliability.	Caching is the cornerstone of good DNS performance and reliability.
	A 50 ms response to a new DNS query may be considered fast, but a	A 50 ms response to a new DNS query may be considered fast, but a

	less than 1 ms response to a cached entry is far faster. [Moura18b]	response of less than 1 ms to a cached entry is far faster. In
	showed that caching also protects users from short outages and even	[Moura18b], it was shown that caching also protects users from short
	significant DDoS attacks.	outages and even significant DDoS attacks.


	DNS record TTLs (time-to-live values) [RFC1034][RFC1035] directly	Time-to-live (TTL) values [RFC1034] [RFC1035] for DNS records
	control cache durations and affect latency, resilience, and the role	directly control cache durations and affect latency, resilience, and
	of DNS in CDN server selection. Some early work modeled caches as a	the role of DNS in Content Delivery Network (CDN) server selection.
	function of their TTLs [Jung03a], and recent work has examined their	Some early work modeled caches as a function of their TTLs [Jung03a],
	interaction with DNS[Moura18b], but until [Moura19b] no research	and recent work has examined cache interactions with DNS [Moura18b],
	provided considerations about the benefits of various TTL value	but until [Moura19b], no research had provided considerations about
	choices. To study this, Moura et. al. [Moura19b] carried out a	the benefits of various TTL value choices. To study this, Moura et
	measurement study investigating TTL choices and their impact on user	al. [Moura19b] carried out a measurement study investigating TTL
	experiences in the wild. They performed this study independent of	choices and their impact on user experiences in the wild. They
	specific resolvers (and their caching architectures), vendors, or	performed this study independent of specific resolvers (and their
	setups.	caching architectures), vendors, or setups.


	First, they identified several reasons why operators and zone-owners	First, they identified several reasons why operators and zone owners
	may want to choose longer or shorter TTLs:	may want to choose longer or shorter TTLs:


	* As discussed, longer TTLs lead to a longer cache life, resulting	* Longer TTLs, as discussed, lead to a longer cache life, resulting
	in faster responses. [Moura19b] measured this in the wild and	in faster responses. In [Moura19b], this was measured in the
	showed that by increasing the TTL for .uy TLD from 5 minutes	wild, and it showed that by increasing the TTL for the .uy TLD
	(300s) to 1 day (86400s) the latency measured from 15k Atlas	from 5 minutes (300 s) to 1 day (86,400 s), the latency measured
	vantage points changed significantly: the median RTT decreased	from 15,000 Atlas vantage points changed significantly: the median
	from 28.7ms to 8ms, and the 75%ile decreased from 183ms to 21ms.	RTT decreased from 28.7 ms to 8 ms, and the 75th percentile
		decreased from 183 ms to 21 ms.


	* Longer caching times also results in lower DNS traffic:	* Longer caching times also result in lower DNS traffic:
	authoritative servers will experience less traffic with extended	authoritative servers will experience less traffic with extended
	TTLs, as repeated queries are answered by resolver caches.	TTLs, as repeated queries are answered by resolver caches.


	* Consequently, longer caching results in a lower overall cost if	* Longer caching consequently results in a lower overall cost if the
	DNS is metered: some DNS-As-A-Service providers charge a per query	DNS is metered: some providers that offer DNS as a Service charge
	(metered) cost (often in addition to a fixed monthly cost).	a per-query (metered) cost (often in addition to a fixed monthly
		cost).

	* Longer caching is more robust to DDoS attacks on DNS	* Longer caching is more robust to DDoS attacks on DNS

	infrastructure. [Moura18b] also measured and show that DNS	infrastructure. DNS caching was also measured in [Moura18b], and
	caching can greatly reduce the effects of a DDoS on DNS, provided	it showed that the effects of a DDoS on DNS can be greatly
	that caches last longer than the attack.	reduced, provided that the caches last longer than the attack.


	* However, shorter caching supports deployments that may require	* Shorter caching, however, supports deployments that may require
	rapid operational changes: An easy way to transition from an old	rapid operational changes: an easy way to transition from an old
	server to a new one is to simply change the DNS records. Since	server to a new one is to simply change the DNS records. Since
	there is no method to remotely remove cached DNS records, the TTL	there is no method to remotely remove cached DNS records, the TTL
	duration represents a necessary transition delay to fully shift	duration represents a necessary transition delay to fully shift
	from one server to another. Thus, low TTLs allow for more rapid	from one server to another. Thus, low TTLs allow for more rapid
	transitions. However, when deployments are planned in advance	transitions. However, when deployments are planned in advance
	(that is, longer than the TTL), it is possible to lower the TTLs	(that is, longer than the TTL), it is possible to lower the TTLs

	just-before a major operational change and raise them again	just before a major operational change and raise them again
	afterward.	afterward.

	* Shorter caching can also help with a DNS-based response to DDoS	* Shorter caching can also help with a DNS-based response to DDoS
	attacks. Specifically, some DDoS-scrubbing services use the DNS	attacks. Specifically, some DDoS-scrubbing services use the DNS
	to redirect traffic during an attack. Since DDoS attacks arrive	to redirect traffic during an attack. Since DDoS attacks arrive

	unannounced, DNS-based traffic redirection requires the TTL be	unannounced, DNS-based traffic redirection requires that the TTL
	kept quite low at all times to allow operators to suddenly have	be kept quite low at all times to allow operators to suddenly have
	their zone served by a DDoS-scrubbing service.	their zone served by a DDoS-scrubbing service.

	* Shorter caching helps DNS-based load balancing. Many large	* Shorter caching helps DNS-based load balancing. Many large
	services are known to rotate traffic among their servers using	services are known to rotate traffic among their servers using
	DNS-based load balancing. Each arriving DNS request provides an	DNS-based load balancing. Each arriving DNS request provides an

	opportunity to adjust service load by rotating IP address records	opportunity to adjust the service load by rotating IP address
	(A and AAAA) to the lowest unused server. Shorter TTLs may be	records (A and AAAA) to the least-used server. Shorter TTLs
	desired in these architectures to react more quickly to traffic	may be desired in these architectures to react more quickly to
	dynamics. Many recursive resolvers, however, have minimum caching	traffic dynamics. Many recursive resolvers, however, have minimum
	times of tens of seconds, placing a limit on this form of agility.	caching times of tens of seconds, placing a limit on this form of
		agility.
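
The timing behind lowering TTLs ahead of a planned change, mentioned in the list above, can be worked through with simple date arithmetic. This is a minimal sketch under stated assumptions. The function names and the example dates are hypothetical, and it assumes resolvers honor the published TTLs, which, as noted above, is not always the case.

```python
from datetime import datetime, timedelta


def latest_ttl_lowering_time(change_time, current_ttl_seconds):
    """Latest moment to publish the lowered TTL so that every cache entry
    still carrying the old, longer TTL has expired by change_time."""
    return change_time - timedelta(seconds=current_ttl_seconds)


def full_shift_time(change_time, lowered_ttl_seconds):
    """Moment by which caches honoring the lowered TTL have dropped the
    old records after the change is published."""
    return change_time + timedelta(seconds=lowered_ttl_seconds)


# Hypothetical plan: records currently have a 1-day TTL, lowered to 5 minutes.
change = datetime(2022, 3, 10, 12, 0, 0)
deadline = latest_ttl_lowering_time(change, 86400)
shifted = full_shift_time(change, 300)
```

Once the change is complete, the TTL can be raised again to regain the caching benefits listed above.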


	3.5.2. Resulting considerations	3.5.2. Resulting Considerations

	Given these considerations, the proper choice for a TTL depends in	Given these considerations, the proper choice for a TTL depends in
	part on multiple external factors -- no single recommendation is	part on multiple external factors -- no single recommendation is
	appropriate for all scenarios. Organizations must weigh these trade-	appropriate for all scenarios. Organizations must weigh these trade-
	offs and find a good balance for their situation. Still, some	offs and find a good balance for their situation. Still, some
	guidelines can be reached when choosing TTLs:	guidelines can be reached when choosing TTLs:

	* For general DNS zone owners, [Moura19b] recommends a longer TTL of	* For general DNS zone owners, [Moura19b] recommends a longer TTL of

	at least one hour, and ideally 8, 12, or 24 hours. Assuming	at least one hour and ideally 4, 8, or 24 hours. Assuming planned
	planned maintenance can be scheduled at least a day in advance,	maintenance can be scheduled at least a day in advance, long TTLs
	long TTLs have little cost and may, even, literally provide a cost	have little cost and may even literally provide cost savings.
	savings.


	* For registry operators: TLD and other public registration	* For TLD and other public registration operators (for example, most
	operators (for example most ccTLDs and .com, .net, .org) that host	ccTLDs and .com, .net, and .org) that host many delegations (NS
	many delegations (NS records, DS records and "glue" records),	records, DS records, and "glue" records), [Moura19b] demonstrates
	[Moura19b] demonstrates that most resolvers will use the TTL	that most resolvers will use the TTL values provided by the child
	values provided by the child delegations while the others some	delegations while some others will choose the TTL provided by the
	will choose the TTL provided by the parent's copy of the record.	parent's copy of the record. As such, [Moura19b] recommends
	As such, [Moura19b] recommends longer TTLs (at least an hour or	longer TTLs (at least an hour or more) for registry operators as
	more) for registry operators as well for child NS and other	well as for child NS and other records.
	records.

	* Users of DNS-based load balancing or DDoS-prevention services may	* Users of DNS-based load balancing or DDoS-prevention services may
	require shorter TTLs: TTLs may even need to be as short as 5	require shorter TTLs: TTLs may even need to be as short as 5
	minutes, although 15 minutes may provide sufficient agility for	minutes, although 15 minutes may provide sufficient agility for

	many operators. There is always a tussle between shorter TTLs	many operators. There is always a tussle between using shorter
	providing more agility against all the benefits listed above for	TTLs that provide more agility and using longer TTLs that include
	using longer TTLs.	all the benefits listed above.


	* Use of A/AAAA and NS records: The TTLs for A/AAAA records should	* Regarding the use of A/AAAA and NS records, the TTLs for A/AAAA
	be shorter to or equal to the TTL for the corresponding NS records	records should be shorter than or equal to the TTL for the
	for in-bailiwick authoritative DNS servers, since [Moura19b] finds	corresponding NS records for in-bailiwick authoritative DNS
	that once an NS record expires, their associated A/AAAA will also	servers, since [Moura19b] finds that once an NS record expires,
	be re-queried when glue is required to be sent by the parents.	its associated A/AAAA records will also be requeried when glue is
	For out-of-bailiwick servers, A, AAAA and NS records are usually	required to be sent by the parents. For out-of-bailiwick servers,
	all cached independently, so different TTLs can be used	A, AAAA, and NS records are usually all cached independently, so
	effectively if desired. In either case, short A and AAAA records	different TTLs can be used effectively if desired. In either
	may still be desired if DDoS-mitigation services are required.	case, short A and AAAA TTLs may still be desired if DDoS
		mitigation services are required.
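
Two of the guidelines above lend themselves to a simple advisory check: a TTL of at least one hour for general zone owners, and A/AAAA TTLs no longer than the NS TTL for in-bailiwick servers. The sketch below is illustrative only. The function name, its parameters, and the warning texts are assumptions. Operators who rely on DNS-based load balancing or DDoS mitigation may deliberately choose shorter values than this check accepts.

```python
def check_ttls(ns_ttl, address_ttl, in_bailiwick, min_recommended_ttl=3600):
    """Return a list of advisory warnings for one name server's records.

    ns_ttl: TTL of the NS record, in seconds
    address_ttl: TTL of the corresponding A/AAAA records, in seconds
    in_bailiwick: True if the name server's name lies inside the zone
    """
    warnings = []
    if ns_ttl < min_recommended_ttl:
        warnings.append("NS TTL is below the recommended minimum")
    if in_bailiwick and address_ttl > ns_ttl:
        warnings.append("A/AAAA TTL exceeds NS TTL for an in-bailiwick server")
    return warnings


# Hypothetical zone settings: 1-hour NS TTL, 1-day address TTL, in-bailiwick.
issues = check_ttls(ns_ttl=3600, address_ttl=86400, in_bailiwick=True)
```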


	3.6. C6: Consider the TTL differences between parents and children	3.6. C6: Consider the Difference in Parent and Children's TTL Values


	3.6.1. Research background	3.6.1. Research Background

	Multiple record types exist or are related between the parent of a	Multiple record types exist or are related between the parent of a
	zone and the child. At a minimum, NS records are supposed to be	zone and the child. At a minimum, NS records are supposed to be

	identical in the parent (but often are not) as or corresponding IP	identical in the parent (but often are not), as are corresponding IP
	address in "glue" A/AAAA records that must exist for in-bailiwick	addresses in "glue" A/AAAA records that must exist for in-bailiwick
	authoritative servers. Additionally, if DNSSEC ([RFC4033] [RFC4034]	authoritative servers. Additionally, if DNSSEC [RFC4033] [RFC4034]
	[RFC4035] [RFC4509]) is deployed for a zone the parent's DS record	[RFC4035] [RFC4509] is deployed for a zone, the parent's DS record
	must cryptographically refer to a child's DNSKEY record.	must cryptographically refer to a child's DNSKEY record.

	Because some information exists in both the parent and a child, it is	Because some information exists in both the parent and a child, it is
	possible for the TTL values to differ between the parent's copy and	possible for the TTL values to differ between the parent's copy and
	the child's. [Moura19b] examines resolver behaviors when these	the child's. [Moura19b] examines resolver behaviors when these

	values differ in the wild, as they frequently do -- often parent	values differed in the wild, as they frequently do -- often, parent
	zones have defacto TTL values that a child has no control over. For	zones have de facto TTL values that a child has no control over. For
	example, NS records for TLDs in the root zone are all set to 2 days	example, NS records for TLDs in the root zone are all set to 2 days

	(48 hours), but some TLD's have lower values within their published	(48 hours), but some TLDs have lower values within their published
	records (the TTLs for .cl's NS records from their authoritative	records (the TTLs for .cl's NS records from their authoritative
	servers is 1 hour). [Moura19b] also examines the differences in the	servers are 1 hour). [Moura19b] also examines the differences in the
	TTLs between the NS records and the corresponding A/AAAA records for	TTLs between the NS records and the corresponding A/AAAA records for

	the addresses of a nameserver. RIPE Atlas nodes are used to	the addresses of a name server. RIPE Atlas nodes are used to
	determine what resolvers in the wild do with different information,	determine what resolvers in the wild do with different information
	and whether the parent's TTL is used for cache life-times ("parent-	and whether the parent's TTL is used for cache lifetimes ("parent-
	centric") or the child's is used ("child-centric").	centric") or the child's ("child-centric").


	[Moura19b] finds that roughly 90% of resolvers follow the child's	[Moura19b] found that roughly 90% of resolvers follow the child's
	view of the TTL, while 10% appear parent-centric. It additionally	view of the TTL, while 10% appear parent-centric. Additionally, it
	finds that resolvers behave differently for cache lifetimes for in-	found that resolvers behave differently for cache lifetimes for in-
	bailiwick vs out-of-bailiwick NS/A/AAAA TTL combinations.	bailiwick vs. out-of-bailiwick NS/A/AAAA TTL combinations.
	Specifically, when NS TTLs are shorter than the corresponding address	Specifically, when NS TTLs are shorter than the corresponding address

	records, most resolvers will re-query for A/AAAA records for in-	records, most resolvers will requery for A/AAAA records for the in-
	bailiwick resolvers and switch to new address records even if the	bailiwick resolvers and switch to new address records even if the
	cache indicates the original A/AAAA records could be kept longer. On	cache indicates the original A/AAAA records could be kept longer. On
	the other hand, the inverse is true for out-of-bailiwick resolvers:	the other hand, the inverse is true for out-of-bailiwick resolvers:

	If the NS record expires first resolvers will honor the original	if the NS record expires first, resolvers will honor the original
	cache time of the nameserver's address.	cache time of the name server's address.


	3.6.2. Resulting considerations	3.6.2. Resulting Considerations

	The important conclusion from this study is that operators cannot	The important conclusion from this study is that operators cannot
	depend on their published TTL values alone -- the parent's values are	depend on their published TTL values alone -- the parent's values are
	also used for timing cache entries in the wild. Operators that are	also used for timing cache entries in the wild. Operators that are

	planning on infrastructure changes should assume that older	planning on infrastructure changes should assume that an older
	infrastructure must be left on and operational for at least the	infrastructure must be left on and operational for at least the
	maximum of both the parent and child's TTLs.	maximum of both the parent and child's TTLs.
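
The consideration above reduces to taking the larger of the two TTLs. The sketch below is a minimal illustration with hypothetical function names and a made-up change date. The TTL values are the ones quoted earlier in this section: 2 days for TLD NS records in the root zone and 1 hour for .cl's own NS records.

```python
from datetime import datetime, timedelta


def min_overlap_seconds(parent_ttl, child_ttl):
    """Minimum time, in seconds, to keep the old infrastructure running
    after a change, given the parent's and the child's TTL."""
    return max(parent_ttl, child_ttl)


def earliest_decommission(change_time, parent_ttl, child_ttl):
    """Earliest moment at which the old infrastructure can be retired."""
    return change_time + timedelta(seconds=min_overlap_seconds(parent_ttl, child_ttl))


# Parent (root zone) NS TTL of 2 days; child (.cl) NS TTL of 1 hour.
overlap = min_overlap_seconds(parent_ttl=172800, child_ttl=3600)
retire_at = earliest_decommission(datetime(2022, 3, 10, 12, 0, 0), 172800, 3600)
```

In this example the parent's 2-day TTL dominates, so the old servers would need to stay up for at least two days even though the child publishes a 1-hour TTL.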


	4. Security considerations	4. Security Considerations

	This document discusses applying measured research results to	This document discusses applying measured research results to
	operational deployments. Most of the considerations affect mostly	operational deployments. Most of the considerations affect

	operational practice, though a few do have security related impacts.	operational practice, though a few do have security-related impacts.

	Specifically, C4 discusses a couple of strategies to employ when a	Specifically, C4 discusses a couple of strategies to employ when a
	service is under stress from DDoS attacks and offers operators	service is under stress from DDoS attacks and offers operators
	additional guidance when handling excess traffic.	additional guidance when handling excess traffic.

	Similarly, C5 identifies the trade-offs with respect to the	Similarly, C5 identifies the trade-offs with respect to the

	operational and security benefits of using longer time-to-live	operational and security benefits of using longer TTL values.
	values.

	5. Privacy Considerations	5. Privacy Considerations


	This document does not add any practical new privacy issues, aside	This document does not add any new, practical privacy issues, aside
	from possible benefits in deploying longer TTLs as suggested in C5.	from possible benefits in deploying longer TTLs as suggested in C5.
	Longer TTLs may help preserve a user's privacy by reducing the number	Longer TTLs may help preserve a user's privacy by reducing the number

	of requests that get transmitted in both the client-to-resolver and	of requests that get transmitted in both client-to-resolver and
	resolver-to-authoritative cases.	resolver-to-authoritative cases.


	6. IANA considerations	6. IANA Considerations

	This document has no IANA actions.	This document has no IANA actions.


	7. Acknowledgements	7. References

	This document is a summary of the main considerations of six research
	works performed by the authors and others. This document would not
	have been possible without the hard work of these authors and co-
	authors:

	* Ricardo de O. Schmidt

	* Wouter B de Vries

	* Moritz Mueller
	* Lan Wei

	* Cristian Hesselman

	* Jan Harm Kuipers

	* Pieter-Tjerk de Boer

	* Aiko Pras

	We would like also to thank the reviewers of this draft that offered
	valuable suggestions: Duane Wessels, Joe Abley, Toema Gavrichenkov,
	John Levine, Michael StJohns, Kristof Tuyteleers, Stefan Ubbink,
	Klaus Darilion and Samir Jafferali, and comments provided at the IETF
	DNSOP session (IETF104).

	Besides those, we would like thank those acknowledged in the papers
	this document summarizes for helping produce the results: RIPE NCC
	and DNS OARC for their tools and datasets used in this research, as
	well as the funding agencies sponsoring the individual research
	works.

	8. References


	8.1. Normative References	7.1. Normative References

	[RFC1034] Mockapetris, P., "Domain names - concepts and facilities",	[RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
	STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,	STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
	<https://www.rfc-editor.org/info/rfc1034>.	<https://www.rfc-editor.org/info/rfc1034>.

	[RFC1035] Mockapetris, P., "Domain names - implementation and	[RFC1035] Mockapetris, P., "Domain names - implementation and
	specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,	specification", STD 13, RFC 1035, DOI 10.17487/RFC1035,
	November 1987, <https://www.rfc-editor.org/info/rfc1035>.	November 1987, <https://www.rfc-editor.org/info/rfc1035>.

	[RFC1546] Partridge, C., Mendez, T., and W. Milliken, "Host	[RFC1546] Partridge, C., Mendez, T., and W. Milliken, "Host

	skipping to change at page 17, line 26 ¶	skipping to change at line 725 ¶

	[RFC7094] McPherson, D., Oran, D., Thaler, D., and E. Osterweil,	[RFC7094] McPherson, D., Oran, D., Thaler, D., and E. Osterweil,
	"Architectural Considerations of IP Anycast", RFC 7094,	"Architectural Considerations of IP Anycast", RFC 7094,
	DOI 10.17487/RFC7094, January 2014,	DOI 10.17487/RFC7094, January 2014,
	<https://www.rfc-editor.org/info/rfc7094>.	<https://www.rfc-editor.org/info/rfc7094>.

	[RFC8499] Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS	[RFC8499] Hoffman, P., Sullivan, A., and K. Fujiwara, "DNS
	Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499,	Terminology", BCP 219, RFC 8499, DOI 10.17487/RFC8499,
	January 2019, <https://www.rfc-editor.org/info/rfc8499>.	January 2019, <https://www.rfc-editor.org/info/rfc8499>.


	[RFC8782] Reddy.K, T., Ed., Boucadair, M., Ed., Patil, P.,
	Mortensen, A., and N. Teague, "Distributed Denial-of-
	Service Open Threat Signaling (DOTS) Signal Channel
	Specification", RFC 8782, DOI 10.17487/RFC8782, May 2020,
	<https://www.rfc-editor.org/info/rfc8782>.

	[RFC8783] Boucadair, M., Ed. and T. Reddy.K, Ed., "Distributed	[RFC8783] Boucadair, M., Ed. and T. Reddy.K, Ed., "Distributed
	Denial-of-Service Open Threat Signaling (DOTS) Data	Denial-of-Service Open Threat Signaling (DOTS) Data
	Channel Specification", RFC 8783, DOI 10.17487/RFC8783,	Channel Specification", RFC 8783, DOI 10.17487/RFC8783,
	May 2020, <https://www.rfc-editor.org/info/rfc8783>.	May 2020, <https://www.rfc-editor.org/info/rfc8783>.

	[RFC8955] Loibl, C., Hares, S., Raszuk, R., McPherson, D., and M.	[RFC8955] Loibl, C., Hares, S., Raszuk, R., McPherson, D., and M.
	Bacher, "Dissemination of Flow Specification Rules",	Bacher, "Dissemination of Flow Specification Rules",
	RFC 8955, DOI 10.17487/RFC8955, December 2020,	RFC 8955, DOI 10.17487/RFC8955, December 2020,
	<https://www.rfc-editor.org/info/rfc8955>.	<https://www.rfc-editor.org/info/rfc8955>.


	8.2. Informative References	[RFC9132] Boucadair, M., Ed., Shallow, J., and T. Reddy.K,
		"Distributed Denial-of-Service Open Threat Signaling
		(DOTS) Signal Channel Specification", RFC 9132,
		DOI 10.17487/RFC9132, September 2021,
		<https://www.rfc-editor.org/info/rfc9132>.

		7.2. Informative References

	[AnyBest] Woodcock, B., "Best Practices in DNS Service-Provision	[AnyBest] Woodcock, B., "Best Practices in DNS Service-Provision

	Architecture", March 2016,	Architecture", Version 1.2, March 2016,
	<https://meetings.icann.org/en/marrakech55/schedule/mon-	<https://meetings.icann.org/en/marrakech55/schedule/mon-
	tech/presentation-dns-service-provision-07mar16-en.pdf>.	tech/presentation-dns-service-provision-07mar16-en.pdf>.


	[AnyFRoot] Woolf, S., "Anycasting f.root-serers.net", January 2003,	[AnyFRoot] Woolf, S., "Anycasting f.root-servers.net", January 2003,
	<https://archive.nanog.org/meetings/nanog27/presentations/	<https://archive.nanog.org/meetings/nanog27/presentations/
	suzanne.pdf>.	suzanne.pdf>.


	[AnyTest] Schmidt, R.d.O., "Anycast Testbed", December 2018,	[AnyTest] Tangled, "Tangled Anycast Testbed",
	<http://www.anycast-testbed.com/>.	<http://www.anycast-testbed.com/>.


	[Ditl17] OARC, D., "2017 DITL data", October 2018,	[Ditl17] DNS-OARC, "2017 DITL Data", April 2017,
	<https://www.dns-oarc.net/oarc/data/ditl/2017>.	<https://www.dns-oarc.net/oarc/data/ditl/2017>.


	[IcannHedge18]	[IcannHedgehog]
	ICANN, ., "DNS-STATS - Hedgehog 2.4.1", October 2018,	"hedgehog", commit b136eb0, May 2021,
	<http://stats.dns.icann.org/hedgehog/>.	<https://github.com/dns-stats/hedgehog>.


	[Jung03a] Jung, J., Berger, A.W., and H. Balakrishnan, "Modeling	[Jung03a] Jung, J., Berger, A., and H. Balakrishnan, "Modeling TTL-
	TTL-based Internet caches", ACM 2003 IEEE INFOCOM,	based Internet Caches", ACM 2003 IEEE INFOCOM,
	DOI 10.1109/INFCOM.2003.1208693, July 2003,	DOI 10.1109/INFCOM.2003.1208693, July 2003,
	<http://www.ieee-infocom.org/2003/papers/11_01.PDF>.	<http://www.ieee-infocom.org/2003/papers/11_01.PDF>.


	[Moura16b] Moura, G.C.M., Schmidt, R.d.O., Heidemann, J., Mueller,	[Moura16b] Moura, G.C.M., Schmidt, R. de O., Heidemann, J., de Vries,
	M., Wei, L., and C. Hesselman, "Anycast vs DDoS Evaluating	W., Müller, M., Wei, L., and C. Hesselman, "Anycast vs.
	the November 2015 Root DNS Events.", ACM 2016 Internet	DDoS: Evaluating the November 2015 Root DNS Event", ACM
	Measurement Conference, DOI /10.1145/2987443.2987446, 14	2016 Internet Measurement Conference,
	October 2016,	DOI 10.1145/2987443.2987446, November 2016,
	<https://www.isi.edu/~johnh/PAPERS/Moura16b.pdf>.	<https://www.isi.edu/~johnh/PAPERS/Moura16b.pdf>.


	[Moura18b] Moura, G.C.M., Heidemann, J., Mueller, M., Schmidt,	[Moura18b] Moura, G.C.M., Heidemann, J., Müller, M., Schmidt, R. de
	R.d.O., and M. Davids, "When the Dike Breaks: Dissecting	O., and M. Davids, "When the Dike Breaks: Dissecting DNS
	DNS Defenses During DDos", ACM 2018 Internet Measurement	Defenses During DDoS", ACM 2018 Internet Measurement
	Conference, DOI 10.1145/3278532.3278534, 31 October 2018,	Conference, DOI 10.1145/3278532.3278534, October 2018,
	<https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf>.	<https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf>.


	[Moura19b] Moura, G., Hardaker, W., Heidemann, J., and R.d.O.	[Moura19b] Moura, G.C.M., Hardaker, W., Heidemann, J., and R. de O.
	Schmidt, "Cache Me If You Can: Effects of DNS Time-to-	Schmidt, "Cache Me If You Can: Effects of DNS Time-to-
	Live", ACM 2019 Internet Measurement Conference,	Live", ACM 2019 Internet Measurement Conference,

	DOI 10.1145/3355369.3355568, n.d.,	DOI 10.1145/3355369.3355568, October 2019,
	<https://www.isi.edu/~hardaker/papers/2019-10-cache-me-	<https://www.isi.edu/~hardaker/papers/2019-10-cache-me-
	ttls.pdf>.	ttls.pdf>.

	[Mueller17b]	[Mueller17b]

	Mueller, M., Moura, G.C.M., Schmidt, R.d.O., and J.	Müller, M., Moura, G.C.M., Schmidt, R. de O., and J.
	Heidemann, "Recursives in the Wild- Engineering	Heidemann, "Recursives in the Wild: Engineering
	Authoritative DNS Servers.", ACM 2017 Internet Measurement	Authoritative DNS Servers", ACM 2017 Internet Measurement
	Conference, DOI 10.1145/3131365.3131366, October 2017,	Conference, DOI 10.1145/3131365.3131366, November 2017,
	<https://www.isi.edu/%7ejohnh/PAPERS/Mueller17b.pdf>.	<https://www.isi.edu/%7ejohnh/PAPERS/Mueller17b.pdf>.

	[Perlroth16]	[Perlroth16]
	Perlroth, N., "Hackers Used New Weapons to Disrupt Major	Perlroth, N., "Hackers Used New Weapons to Disrupt Major
	Websites Across U.S.", October 2016,	Websites Across U.S.", October 2016,
	<https://www.nytimes.com/2016/10/22/business/internet-	<https://www.nytimes.com/2016/10/22/business/internet-
	problems-attack.html>.	problems-attack.html>.

	[RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S.	[RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S.
	Rose, "DNS Security Introduction and Requirements",	Rose, "DNS Security Introduction and Requirements",

	skipping to change at page 19, line 31 ¶	skipping to change at line 826 ¶
	(DS) Resource Records (RRs)", RFC 4509,	(DS) Resource Records (RRs)", RFC 4509,
	DOI 10.17487/RFC4509, May 2006,	DOI 10.17487/RFC4509, May 2006,
	<https://www.rfc-editor.org/info/rfc4509>.	<https://www.rfc-editor.org/info/rfc4509>.

	[RFC8811] Mortensen, A., Ed., Reddy.K, T., Ed., Andreasen, F.,	[RFC8811] Mortensen, A., Ed., Reddy.K, T., Ed., Andreasen, F.,
	Teague, N., and R. Compton, "DDoS Open Threat Signaling	Teague, N., and R. Compton, "DDoS Open Threat Signaling
	(DOTS) Architecture", RFC 8811, DOI 10.17487/RFC8811,	(DOTS) Architecture", RFC 8811, DOI 10.17487/RFC8811,
	August 2020, <https://www.rfc-editor.org/info/rfc8811>.	August 2020, <https://www.rfc-editor.org/info/rfc8811>.

	[RipeAtlas15a]	[RipeAtlas15a]

	Staff, R.N., "RIPE Atlas A Global Internet Measurement	RIPE Network Coordination Centre (RIPE NCC), "RIPE Atlas:
	Network", September 2015, <http://ipj.dreamhosters.com/wp-	A Global Internet Measurement Network", October 2015,
		<http://ipj.dreamhosters.com/wp-
	content/uploads/issues/2015/ipj18-3.pdf>.	content/uploads/issues/2015/ipj18-3.pdf>.

	[RipeAtlas19a]	[RipeAtlas19a]

	NCC, R., "Ripe Atlas - RIPE Network Coordination Centre",	RIPE Network Coordination Centre (RIPE NCC), "RIPE Atlas",
	September 2019, <https://atlas.ripe.net/>.	<https://atlas.ripe.net>.

	[Schmidt17a]	[Schmidt17a]

	Schmidt, R.d.O., Heidemann, J., and J.H. Kuipers, "Anycast	Schmidt, R. de O., Heidemann, J., and J. Kuipers, "Anycast
	Latency - How Many Sites Are Enough. In Proceedings of the	Latency: How Many Sites Are Enough?", PAM 2017 Passive and
	Passive and Active Measurement Workshop", PAM Passive and	Active Measurement Conference,
	Active Measurement Conference, March 2017,	DOI 10.1007/978-3-319-54328-4_14, March 2017,
	<https://www.isi.edu/%7ejohnh/PAPERS/Schmidt17a.pdf>.	<https://www.isi.edu/%7ejohnh/PAPERS/Schmidt17a.pdf>.

	[Singla2014]	[Singla2014]

	Singla, A., Chandrasekaran, B., Godfrey, P.B., and B.	Singla, A., Chandrasekaran, B., Godfrey, P., and B. Maggs,
	Maggs, "The Internet at the speed of light. In Proceedings	"The Internet at the Speed of Light", 13th ACM Workshop on
	of the 13th ACM Workshop on Hot Topics in Networks (Oct	Hot Topics in Networks, DOI 10.1145/2670518.2673876,
	2014)", ACM Workshop on Hot Topics in Networks, October	October 2014,
	2014,
	<http://speedierweb.web.engr.illinois.edu/cspeed/papers/	<http://speedierweb.web.engr.illinois.edu/cspeed/papers/
	hotnets14.pdf>.	hotnets14.pdf>.


	[VerfSrc] Vries, W.d., "Verfploeter source code", November 2018,	[VerfSrc] "Verfploeter Source Code", commit f4792dc, May 2019,
	<https://github.com/Woutifier/verfploeter>.	<https://github.com/Woutifier/verfploeter>.


	[Vries17b] Vries, W.d., Schmidt, R.d.O., Hardaker, W., Heidemann, J.,	[Vries17b] de Vries, W., Schmidt, R. de O., Hardaker, W., Heidemann,
	Boer, P.d., and A. Pras, "Verfploeter - Broad and Load-	J., de Boer, P-T., and A. Pras, "Broad and Load-Aware
	Aware Anycast Mapping", ACM 2017 Internet Measurement	Anycast Mapping with Verfploeter", ACM 2017 Internet
	Conference, DOI 10.1145/3131365.3131371, October 2017,	Measurement Conference, DOI 10.1145/3131365.3131371,
		November 2017,
	<https://www.isi.edu/%7ejohnh/PAPERS/Vries17b.pdf>.	<https://www.isi.edu/%7ejohnh/PAPERS/Vries17b.pdf>.


		Acknowledgements

		We would like to thank the reviewers of this document who offered
		valuable suggestions as well as comments at the IETF DNSOP session
		(IETF 104): Duane Wessels, Joe Abley, Toema Gavrichenkov, John
		Levine, Michael StJohns, Kristof Tuyteleers, Stefan Ubbink, Klaus
		Darilion, and Samir Jafferali.

		Additionally, we would like thank those acknowledged in the papers
		this document summarizes for helping produce the results: RIPE NCC
		and DNS OARC for their tools and datasets used in this research, as
		well as the funding agencies sponsoring the individual research.

		Contributors

		This document is a summary of the main considerations of six research
		papers written by the authors and the following people who
		contributed substantially to the content and should be considered
		coauthors; this document would not have been possible without their
		hard work:

		* Ricardo de O. Schmidt

		* Wouter B. de Vries

		* Moritz Mueller

		* Lan Wei

		* Cristian Hesselman

		* Jan Harm Kuipers

		* Pieter-Tjerk de Boer

		* Aiko Pras

	Authors' Addresses	Authors' Addresses

	Giovane C. M. Moura	Giovane C. M. Moura
	SIDN Labs/TU Delft	SIDN Labs/TU Delft
	Meander 501	Meander 501
	6825 MD Arnhem	6825 MD Arnhem
	Netherlands	Netherlands


	Phone: +31 26 352 5500	Phone: +31 26 352 5500
	Email: giovane.moura@sidn.nl	Email: giovane.moura@sidn.nl

	Wes Hardaker	Wes Hardaker
	USC/Information Sciences Institute	USC/Information Sciences Institute
	PO Box 382	PO Box 382

	Davis, 95617-0382	Davis, CA 95617-0382
	United States of America	United States of America


	Phone: +1 (530) 404-0099	Phone: +1 (530) 404-0099
	Email: ietf@hardakers.net	Email: ietf@hardakers.net

	John Heidemann	John Heidemann
	USC/Information Sciences Institute	USC/Information Sciences Institute
	4676 Admiralty Way	4676 Admiralty Way

	Marina Del Rey, 90292-6695	Marina Del Rey, CA 90292-6695
	United States of America	United States of America


	Phone: +1 (310) 448-8708	Phone: +1 (310) 448-8708
	Email: johnh@isi.edu	Email: johnh@isi.edu


	Marco Davids	Marco Davids
	SIDN Labs	SIDN Labs
	Meander 501	Meander 501
	6825 MD Arnhem	6825 MD Arnhem
	Netherlands	Netherlands


	Phone: +31 26 352 5500	Phone: +31 26 352 5500
	Email: marco.davids@sidn.nl	Email: marco.davids@sidn.nl

End of changes. 134 change blocks.
	444 lines changed or deleted	453 lines changed or added