| rfc9232xml2.original.xml | rfc9232.xml | |||
|---|---|---|---|---|
| <?xml version="1.0" encoding="US-ASCII"?> | <?xml version="1.0" encoding="utf-8"?> | |||
| <!-- This template is for creating an Internet Draft using xml2rfc, | ||||
| which is available here: http://xml.resource.org. --> | ||||
| <!DOCTYPE rfc SYSTEM "rfc2629.dtd"> | ||||
| <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> | ||||
| <!-- used by XSLT processors --> | ||||
| <!-- For a complete list and description of processing instructions (PIs), | ||||
| please see http://xml.resource.org/authoring/README.html. --> | ||||
| <?rfc strict="yes" ?> | ||||
| <!-- give errors regarding ID-nits and DTD validation --> | ||||
| <!-- control the table of contents (ToC) --> | ||||
| <?rfc toc="yes"?> | ||||
| <!-- generate a ToC --> | ||||
| <?rfc tocdepth="3"?> | ||||
| <!-- the number of levels of subsections in ToC. default: 3 --> | ||||
| <!-- control references --> | ||||
| <?rfc symrefs="yes"?> | ||||
| <!-- use symbolic references tags, i.e, [RFC2119] instead of [1] --> | ||||
| <?rfc sortrefs="yes" ?> | ||||
| <!-- sort the reference entries alphabetically --> | ||||
| <!-- control vertical white space | ||||
| (using these PIs as follows is recommended by the RFC Editor) --> | ||||
| <?rfc compact="yes" ?> | ||||
| <!-- do not start each main section on a new page --> | ||||
| <?rfc subcompact="no" ?> | ||||
| <!-- keep one blank line between list items --> | ||||
| <!-- end of list of popular I-D processing instructions --> | ||||
| <rfc category="info" docName="draft-ietf-opsawg-ntf-13" ipr="trust200902"> | ||||
| <front> | ||||
| <title abbrev="Network Telemetry Framework">Network Telemetry Framework</title> | ||||
| <author fullname="Haoyu Song" initials="H." surname="Song"> | ||||
| <organization>Futurewei</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>USA</country> | ||||
| </postal> | ||||
| <email>haoyu.song@futurewei.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Fengwei Qin" initials="F." surname="Qin"> | ||||
| <organization>China Mobile</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>P.R. China</country> | ||||
| </postal> | ||||
| <email>qinfengwei@chinamobile.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Pedro Martinez-Julia" initials="P." surname="Martinez-Julia"> | ||||
| <organization>NICT</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>Japan</country> | ||||
| </postal> | ||||
| <email>pedro@nict.go.jp</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Laurent Ciavaglia" initials="L." surname="Ciavaglia"> | ||||
| <organization>Rakuten Mobile</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>France</country> | ||||
| </postal> | ||||
| <email>laurent.ciavaglia@rakuten.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Aijun Wang" initials="A." surname="Wang"> | ||||
| <organization>China Telecom</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>P.R. China</country> | ||||
| </postal> | ||||
| <email>wangaj.bri@chinatelecom.cn</email> | ||||
| </address> | ||||
| </author> | ||||
| <date day="3" month="December" year="2021"/> | ||||
| <area>Operation and Management Area</area> | ||||
| <workgroup>OPSAWG</workgroup> | ||||
| <!-- --> | ||||
| <keyword>Telemetry, OAM</keyword> | ||||
| <abstract> | ||||
| <t>Network telemetry is a technology for gaining network insight and facilitatin | ||||
| g efficient and automated network management. It encompasses various techniques | ||||
| for remote data generation, collection, correlation, and consumption. This docum | ||||
| ent describes an architectural framework for network telemetry, motivated by cha | ||||
| llenges that are encountered as part of the operation of networks and by the req | ||||
| uirements that ensue. This document clarifies the terminologies and classifies t | ||||
| he modules and components of a network telemetry system from different perspecti | ||||
| ves. The framework and taxonomy help to set a common ground for the collection o | ||||
| f related work and provide guidance for related technique and standard developme | ||||
| nts.</t> | ||||
| </abstract> | <!DOCTYPE rfc [ | |||
| </front> | <!ENTITY nbsp " "> | |||
| <middle> | <!ENTITY zwsp "​"> | |||
| <section title="Introduction"> | <!ENTITY nbhy "‑"> | |||
| <!ENTITY wj "⁠"> | ||||
| ]> | ||||
| <t> Network visibility is the ability of management tools to see the state and b | <rfc xmlns:xi="http://www.w3.org/2001/XInclude" docName="draft-ietf-opsawg-ntf-1 | |||
| ehavior of a network, which is essential for successful network operation. Netwo | 3" number="9232" ipr="trust200902" obsoletes="" updates="" submissionType="IETF" | |||
| rk Telemetry revolves around network data that can help provide insights about t | category="info" consensus="true" xml:lang="en" tocInclude="true" tocDepth="3" s | |||
| he current state of the network, including network devices, forwarding, control, | ymRefs="true" sortRefs="true" version="3"> | |||
| and management planes, and that can be generated and obtained through a variety | ||||
| of techniques, including but not limited to network instrumentation and measure | ||||
| ments, and that can be processed for purposes ranging from service assurance to | ||||
| network security using a wide variety of data analytical techniques. In this doc | ||||
| ument, Network Telemetry refer to both the data itself (i.e., "Network Telemetry | ||||
| Data"), and the techniques and processes used to generate, export, collect, and | ||||
| consume that data for use by potentially automated management applications. Net | ||||
| work telemetry extends beyond the classical network Operations, Administration, | ||||
| and Management (OAM) techniques and expects to support better flexibility, scala | ||||
| bility, accuracy, coverage, and performance.</t> | ||||
| <t> However, the term "network telemetry" lacks an unambiguous definition. The s | ||||
| cope and coverage of it cause confusion and misunderstandings. It is beneficial | ||||
| to clarify the concept and provide a clear architectural framework for network t | ||||
| elemetry, so we can articulate the technical field, and better align the related | ||||
| techniques and standard works.</t> | ||||
| <t>To fulfill such an undertaking, we first discuss some key characteristics of | ||||
| network telemetry which set a clear distinction from the conventional network OA | ||||
| M and show that some conventional OAM technologies can be considered a subset of | ||||
| the network telemetry technologies. We then provide an architectural framework | ||||
| for network telemetry which includes four modules, each concerned with a differe | ||||
| nt category of telemetry data and corresponding procedures. All the modules are | ||||
| internally structured in the same way, including components that allow the opera | ||||
| tor to configure data sources in regard to what data to generate and how to make | ||||
| that available to client applications, components that instrument the underlyin | ||||
| g data sources, and components that perform the actual rendering, encoding, and | ||||
| exporting of the generated data. We show how the network telemetry framework can | ||||
| benefit the current and future network operations. Based on the distinction of | ||||
| modules and function components, we can map the existing and emerging techniques | ||||
| and protocols into the framework. The framework can also simplify designing, ma | ||||
| intaining, and understanding a network telemetry system. In addition, we outline | ||||
| the evolution stages of the network telemetry system and discuss the potential | ||||
| security concerns. </t> | ||||
| <t> The purpose of the framework and taxonomy is to set a common ground for the | ||||
| collection of related work and provide guidance for future technique and standar | ||||
| d developments. To the best of our knowledge, this document is the first such ef | ||||
| fort for network telemetry in industry standards organizations. This document do | ||||
| es not define specific technologies.</t> | ||||
| <!-- | ||||
| <section title="Requirements Language"> | ||||
| <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | ||||
| "OPTIONAL" in this document are to be interpreted as described in | ||||
| BCP 14 <xref target="RFC2119"></xref><xref target="RFC8174"></xref> w | ||||
| hen, and only when, they appear in all | ||||
| capitals, as shown here.</t> | ||||
| </section> | ||||
| --> | ||||
| <section title="Applicability Statement"> | <!-- xml2rfc v2v3 conversion 3.12.2 --> | |||
| <front> | ||||
| <title abbrev="Network Telemetry Framework">Network Telemetry Framework</tit | ||||
| le> | ||||
| <seriesInfo name="RFC" value="9232"/> | ||||
| <author fullname="Haoyu Song" initials="H." surname="Song"> | ||||
| <organization>Futurewei</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>United States of America</country> | ||||
| </postal> | ||||
| <email>haoyu.song@futurewei.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Fengwei Qin" initials="F." surname="Qin"> | ||||
| <organization>China Mobile</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>China</country> | ||||
| </postal> | ||||
| <email>qinfengwei@chinamobile.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Pedro Martinez-Julia" initials="P." surname="Martinez-Juli | ||||
| a"> | ||||
| <organization>NICT</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>Japan</country> | ||||
| </postal> | ||||
| <email>pedro@nict.go.jp</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Laurent Ciavaglia" initials="L." surname="Ciavaglia"> | ||||
| <organization>Rakuten Mobile</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>France</country> | ||||
| </postal> | ||||
| <email>laurent.ciavaglia@rakuten.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author fullname="Aijun Wang" initials="A." surname="Wang"> | ||||
| <organization>China Telecom</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street/> | ||||
| <city/> | ||||
| <country>China</country> | ||||
| </postal> | ||||
| <email>wangaj3@chinatelecom.cn</email> | ||||
| </address> | ||||
| </author> | ||||
| <date year="2022" month="May" /> | ||||
| <t>Large-scale network data collection is a major threat to user privacy and may | <area>Operations and Management Area</area> | |||
| be indistinguishable from pervasive monitoring <xref target="RFC7258" />. The | <workgroup>OPSAWG</workgroup> | |||
| network telemetry framework presented in this document must not be applied to ge | ||||
| nerating, exporting, collecting, analyzing, or retaining individual user data or | ||||
| any data that can identify end users or characterize their behavior without con | ||||
| sent. Based on this principle, the network telemetry framework is not applicable | ||||
| to networks whose endpoints represent individual users, such as general-purpose | ||||
| access networks. </t> | ||||
| </section> | <keyword>Telemetry</keyword> | |||
| <keyword>OAM</keyword> | ||||
| <section title="Glossary"> | <abstract> | |||
| <t>Before further discussion, we list some key terminology and acronyms used in | <t>Network telemetry is a technology for gaining network insight and facil | |||
| this document. We make an intended differentiation between the terms of network | itating efficient and automated network management. It encompasses various techn | |||
| telemetry and OAM. However, it should be understood that there is not a hard-lin | iques for remote data generation, collection, correlation, and consumption. This | |||
| e distinction between the two concepts. Rather, network telemetry is considered | document describes an architectural framework for network telemetry, motivated | |||
| as an extension of OAM. It covers all the existing OAM protocols but puts more e | by challenges that are encountered as part of the operation of networks and by t | |||
| mphasis on the newer and emerging techniques and protocols concerning all aspect | he requirements that ensue. | |||
| s of network data from acquisition to consumption.</t> | This document clarifies the terminology and classifies the modules and com | |||
| <t> | ponents of a network telemetry system from different perspectives. The framework | |||
| <list style="hanging"> | and taxonomy help to set a common ground for the collection of related work and | |||
| <t hangText="AI:"> Artificial Intelligence. In the network domain, AI refers to | provide guidance for related technique and standard developments.</t> | |||
| the machine-learning based technologies for automated network operation and othe | </abstract> | |||
| r tasks.</t> | </front> | |||
| <t hangText="AM:"> Alternate Marking, a flow performance measurement method, spe | <middle> | |||
| cified in <xref target="RFC8321"/>. </t> | <section numbered="true" toc="default"> | |||
| <t hangText="BMP:"> BGP Monitoring Protocol, specified in <xref target="RFC7854" | <name>Introduction</name> | |||
| />. </t> | <t> Network visibility is the ability of management tools to see the state | |||
| <t hangText="DPI:"> Deep Packet Inspection, referring to the techniques that exa | and behavior of a network, which is essential for successful network operation. | |||
| mine packets beyond the L3/L4 headers. </t> | Network telemetry revolves around network data that 1) can help provide insight | |||
| <t hangText="gNMI:"> gRPC Network Management Interface, a network management pro | s about the current state of the network, including network devices, forwarding, | |||
| tocol from OpenConfig Operator Working Group, mainly contributed by Google. See | control, and management planes; 2) can be generated and obtained through a vari | |||
| <xref target="gnmi"/> for details. </t> | ety of techniques, including but not limited to network instrumentation and meas | |||
| <t hangText="GPB:"> Google Protocol Buffer, an extensible mechanism for serializ | urements; and 3) can be processed for purposes ranging from service assurance to | |||
| ing structured data. See <xref target="gpb" /> for details. </t> | network security using a wide variety of data analytical techniques. In this do | |||
| <t hangText="gRPC:"> gRPC Remote Procedure Call, an open source high performance | cument, network telemetry refers to both the data itself (i.e., "Network Telemet | |||
| RPC framework that gNMI is based on. See <xref target="grpc"/> for details. </t | ry Data") and the techniques and processes used to generate, export, collect, an | |||
| > | d consume that data for use by potentially automated management applications. Ne | |||
| <t hangText="IPFIX:"> IP Flow Information Export Protocol, specified in <xref ta | twork telemetry extends beyond the classical network Operations, Administration, | |||
| rget="RFC7011"/>. </t> | and Management (OAM) techniques and expects to support better flexibility, scal | |||
| <t hangText="IOAM:"> <xref target="I-D.ietf-ippm-ioam-data">In-situ OAM</xref>, | ability, accuracy, coverage, and performance.</t> | |||
| a dataplane on-path telemetry technique. </t> | <t> However, the term "network telemetry" lacks an unambiguous definition. | |||
| <t hangText="JSON:"> An open standard file format and data interchange format th | The scope and coverage of it cause confusion and misunderstandings. It is benef | |||
| at uses human-readable text to store and transmit data objects, specified in <xr | icial to clarify the concept and provide a clear architectural framework for net | |||
| ef target="RFC8259" />. </t> | work telemetry, so we can articulate the technical field and better align the re | |||
| <t hangText="MIB:"> Management Information Base, a database used for managing th | lated techniques and standard works.</t> | |||
| e entities in a network. </t> | <t>To fulfill such an undertaking, we first discuss some key characteristi | |||
| <t hangText="NETCONF:"> Network Configuration Protocol, specified in <xref targe | cs of network telemetry that set a clear distinction from the conventional netwo | |||
| t="RFC6241"/>. </t> | rk OAM and show that some conventional OAM technologies can be considered a subs | |||
| <t hangText="NetFlow:"> A Cisco protocol for flow record collecting, described i | et of the network telemetry technologies. We then provide an architectural frame | |||
| n <xref target="RFC3954"/>. </t> | work for network telemetry that includes four modules, each associated with a di | |||
| <t hangText="Network Telemetry:"> The process and instrumentation for acquiring | fferent category of telemetry data and corresponding procedures. All the modules | |||
| and utilizing network data remotely for network monitoring and operation. A gene | are internally structured in the same way, including components that allow the | |||
| ral term for a large set of network visibility techniques and protocols, concern | operator to configure data sources in regard to what data to generate and how to | |||
| ing aspects like data generation, collection, correlation, and consumption. Netw | make that available to client applications, components that instrument the unde | |||
| ork telemetry addresses the current network operation issues and enables smooth | rlying data sources, and components that perform the actual rendering, encoding, | |||
| evolution toward future intent-driven autonomous networks.</t> | and exporting of the generated data. We show how the network telemetry framewor | |||
| <t hangText="NMS:"> Network Management System, referring to applications that al | k can benefit current and future network operations. Based on the distinction of | |||
| low network administrators to manage a network. </t> | modules and function components, we can map the existing and emerging technique | |||
| <t hangText="OAM:"> Operations, Administration, and Maintenance. A group of netw | s and protocols into the framework. The framework can also simplify designing, m | |||
| ork management functions that provide network fault indication, fault localizati | aintaining, and understanding a network telemetry system. In addition, we outlin | |||
| on, performance information, and data and diagnosis functions. Most conventional | e the evolution stages of the network telemetry system and discuss the potential | |||
| network monitoring techniques and protocols belong to network OAM.</t> | security concerns. </t> | |||
| <t hangText="PBT:"> Postcard-Based Telemetry, a dataplane on-path telemetry tech | ||||
| nique. A representative technique is described in <xref target="I-D.ietf-ippm-io | ||||
| am-direct-export"/>. </t> | ||||
| <t hangText="RESTCONF:"> An HTTP-based protocol that provides a programmatic int | ||||
| erface for accessing data defined in YANG, using the datastore concepts defined | ||||
| in NETCONF, as specified in <xref target="RFC8040"/>. </t> | ||||
| <t hangText="SMIv2:"> Structure of Management Information Version 2, defining MI | ||||
| B objects, specified in <xref target="RFC2578"/>. </t> | ||||
| <t hangText="SNMP:"> Simple Network Management Protocol. Versions 1, 2, and 3 are | ||||
| specified in <xref target="RFC1157"/>, <xref target="RFC3416"/>, and <xref targ | ||||
| et="RFC3411"/>, respectively. </t> | ||||
| <t hangText="XML:"> Extensible Markup Language is a markup language for data enc | ||||
| oding that is both human-readable and machine-readable, specified by W3C <xref t | ||||
| arget="xml" />. </t> | ||||
| <t hangText="YANG:"> YANG is a data modeling language for the definition of data | ||||
| sent over network management protocols such as NETCONF and RESTCONF. YANG i | ||||
| s defined in <xref target="RFC6020"/> and <xref target="RFC7950"/>. </t> | ||||
| <t hangText="YANG ECA:"> A YANG model for Event-Condition-Action policies, defin | ||||
| ed in <xref target="I-D.wwx-netmod-event-yang"/>. </t> | ||||
| <t hangText="YANG-Push:"> A mechanism that allows subscriber applications to req | ||||
| uest a stream of updates from a YANG datastore on a network device. Details are | ||||
| specified in <xref target="RFC8641"/> and <xref target="RFC8639"/>. </t> | ||||
| </list> | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section title="Background"> | ||||
| <t>The term "big data" is used to describe the extremely large volume of data se | ||||
| ts that can be analyzed computationally to reveal patterns, trends, and associat | ||||
| ions. Networks are undoubtedly a source of big data because of their scale and t | ||||
| he volume of network traffic they forward. When a network's endpoints do not rep | ||||
| resent individual users (e.g., in industrial, datacenter, and infrastructure cont | ||||
| exts), network operations can often benefit from large-scale data collection wit | ||||
| hout breaching user privacy.</t> | ||||
| <t>Today one can access advanced big data analytics capability through a plethor | ||||
| a of commercial and open source platforms (e.g., Apache Hadoop), tools (e.g., Ap | ||||
| ache Spark), and techniques (e.g., machine learning). Thanks to the advance of c | ||||
| omputing and storage technologies, network big data analytics gives network oper | ||||
| ators an opportunity to gain network insights and move towards network autonomy. | ||||
| Some operators have started to explore the application of Artificial Intelligence (AI) | ||||
| to make sense of network data. Software tools can use the network data to detec | ||||
| t and react to network faults, anomalies, and policy violations, as well as pred | ||||
| ict future events. In turn, the network policy updates for planning, intrusio | ||||
| n prevention, optimization, and self-healing may be applied.</t> | ||||
| <t>It is conceivable that an <xref target="RFC7575"> autonomic network </xref> i | ||||
| s the logical next step for network evolution following Software-Defined Network | ||||
| ing (SDN), aiming to reduce (or even eliminate) human labor, make more efficient | ||||
| use of network resources, and provide better services more aligned with custome | ||||
| r requirements. The IETF ANIMA working group is dedicated to developing and main | ||||
| taining protocols and procedures for automated network management and control of | ||||
| professionally-managed networks. The related technique of <xref target="I-D.irt | ||||
| f-nmrg-ibn-concepts-definitions">Intent-based Networking (IBN)</xref> requires n | ||||
| etwork visibility and telemetry data in order to ensure that the network is beha | ||||
| ving as intended. </t> | ||||
| <t>However, while the data processing capability is improved and applications re | ||||
| quire more data to function better, the networks lag behind in extracting and tr | ||||
| anslating network data into useful and actionable information in efficient ways. | ||||
| The system bottleneck is shifting from data consumption to data supply. Both th | ||||
| e number of network nodes and the traffic bandwidth keep increasing at a fast pa | ||||
| ce. The network configuration and policy change at shorter intervals than befor | ||||
| e. More subtle events and fine-grained data through all network planes need to b | ||||
| e captured and exported in real time. In a nutshell, it is a challenge to get en | ||||
| ough high-quality data out of the network in a manner that is efficient, timely, | ||||
| and flexible. Therefore, we need to survey the existing technologies and protoc | ||||
| ols and identify any potential gaps.</t> | ||||
| <t>In the remainder of this section, first we clarify the scope of network data | ||||
| (i.e., telemetry data) relevant in this document. Then, we discuss several key u | ||||
| se cases for today's and future network operations. Next, we show why the curren | ||||
| t network OAM techniques and protocols are insufficient for these use cases. The | ||||
| discussion underlines the need for new methods, techniques, and protocols, as w | ||||
| ell as the extensions of existing ones, which we assign under the umbrella term | ||||
| - Network Telemetry. </t> | ||||
| <section title="Telemetry Data Coverage"> | ||||
| <t>Any information that can be extracted from networks (including data plane, co | ||||
| ntrol plane, and management plane) and used to gain visibility or as a basis for a | ||||
| ctions is considered telemetry data. It includes statistics, event records and l | ||||
| ogs, snapshots of state, configuration data, etc. It also covers the outputs of | ||||
| any active and passive measurements <xref target="RFC7799"/>. In some cases, raw | ||||
| data is processed in the network before being sent to a data consumer. Such process | ||||
| ed data is also considered telemetry data. The value of telemetry data varies. I | ||||
| n some cases, if the cost is acceptable, less but higher-quality data is prefer | ||||
| red over lots of low-quality data. A classification of telemetry data is provide | ||||
| d in <xref target="framework"/>. To preserve the privacy of end-users, no user p | ||||
| acket content should be collected. Specifically, the data objects generated, ex | ||||
| ported, and collected by a network telemetry application should not include any | ||||
| packet payload from traffic associated with end-user systems. </t> | ||||
| </section> | ||||
| <section title="Use Cases"> | ||||
| <t>The following set of use cases is essential for network operations. While the | ||||
| list is by no means exhaustive, it is enough to highlight the requirements for | ||||
| data velocity, variety, volume, and veracity, the attributes of big data, in net | ||||
| works. </t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t> Security: Network intrusion detection and prevention systems need to monitor | ||||
| network traffic and activities and act upon anomalies. Given increasingly sophi | ||||
| sticated attack vectors coupled with increasingly severe consequences of securit | ||||
| y breaches, new tools and techniques need to be developed, relying on wider and | ||||
| deeper visibility into networks. The ultimate goal is to achieve security with n | ||||
| o, or only minimal, human intervention, and without disrupting legitimate traffi | ||||
| c flows. </t> | ||||
| <t> Policy and Intent Compliance: Network policies are the rules that constrain | ||||
| the services for network access, provide service differentiation, or enforce spe | ||||
| cific treatment on the traffic. For example, a service function chain is a polic | ||||
| y that requires the selected flows to pass through a set of ordered network func | ||||
| tions. Intent, as defined in <xref target="I-D.irtf-nmrg-ibn-concepts-definition | ||||
| s"/>, is a set of operational goals that a network should meet and outcomes that | ||||
| a network is supposed to deliver, defined in a declarative manner without speci | ||||
| fying how to achieve or implement them. An intent requires a complex translation | ||||
| and mapping process before being applied on networks. While a policy or intent | ||||
| is enforced, the compliance needs to be verified and monitored continuously by r | ||||
| elying on visibility that is provided through network telemetry data. Any viola | ||||
| tion must be reported immediately, potentially resulting in updates to how the p | ||||
| olicy or intent is applied in the network to ensure that it remains in force, or | ||||
| otherwise alerting the network administrator to the policy or intent violation. | ||||
| </t> | ||||
| <t> SLA Compliance: A Service-Level Agreement (SLA) is a service contract betwee | ||||
| n a service provider and a client, which includes the metrics for the service mea | ||||
| surement and remedy/penalty procedures when the service level misses the agreeme | ||||
| nt. Users need to check if they get the service as promised and network operator | ||||
| s need to evaluate how they can deliver services that can meet the SLA based on | ||||
| real-time network telemetry data, including data from network measurements.</t> | ||||
| <t> Root Cause Analysis: Many network failures can be the effect of a sequence of | ||||
| chained events. Troubleshooting and recovery require quick identification of th | ||||
| e root cause of any observable issues. However, the root cause is not always str | ||||
| aightforward to identify, especially when the failure is sporadic and the number | ||||
| of event messages, both related and unrelated to the same cause, is overwhelmin | ||||
| g. While technologies such as machine learning can be used for root cause analys | ||||
| is, it is up to the network to sense and provide the relevant diagnostic data wh | ||||
| ich are either actively fed into, or passively retrieved by, the root cause anal | ||||
| ysis applications.</t> | ||||
| <t> Network Optimization: This covers all short-term and long-term network optim | ||||
| ization techniques, including load balancing, Traffic Engineering (TE), and netw | ||||
| ork planning. Network operators are motivated to optimize their network utilizat | ||||
| ion and differentiate services for better Return On Investment (ROI) or lower Ca | ||||
| pital Expenditures (CAPEX). The first step is to know the real-time network cond | ||||
| itions before applying policies for traffic manipulation. In some cases, micro-b | ||||
| ursts need to be detected in a very short time-frame so that fine-grained traffi | ||||
| c control can be applied to avoid network congestion. Long-term planning of netw | ||||
| ork capacity and topology requires analysis of real-world network telemetry data | ||||
| that is obtained over long periods of time.</t> | ||||
| <t> Event Tracking and Prediction: The visibility into traffic path and performa | ||||
| nce is critical for services and applications that rely on healthy network opera | ||||
| tion. Numerous related network events are of interest to network operators. For | ||||
| example, network operators want to learn where and why packets are dropped for a | ||||
| n application flow. They also want to be warned of issues in advance, so proacti | ||||
| ve actions can be taken to avoid catastrophic consequences. </t> | ||||
| </list> | ||||
| </t> | ||||
| </section> | ||||
| <section title="Challenges"> | ||||
| <t>For a long time, network operators have relied upon <xref target="RFC3416">SN | ||||
| MP</xref>, Command-Line Interface (CLI), or <xref target="RFC5424">Syslog</xref> | ||||
| to monitor the network. Some other OAM techniques as described in <xref target= | ||||
| "RFC7276"/> are also used to facilitate network troubleshooting. These conventio | ||||
| nal techniques are not sufficient to support the above use cases for the followi | ||||
| ng reasons: </t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t>Most use cases need to continuously monitor the network and dynamically refin | ||||
| e the data collection in real-time. Poll-based low-frequency data collection is | ||||
| ill-suited for these applications. Subscription-based streaming data directly pu | ||||
| shed from the data source (e.g., the forwarding chip) is preferred to provide su | ||||
| fficient data quantity and precision at scale.</t> | ||||
| <t>Comprehensive data is needed, ranging from packet processing engines to traff | ||||
| ic manager, from line cards to main control board, from user flows to control pr | ||||
| otocol packets, from device configurations to operations, and from physical laye | ||||
| r to application layer. Conventional OAM only covers a narrow range of data (e.g | ||||
| ., SNMP only handles data from the Management Information Base (MIB)). Classical | ||||
| network devices cannot provide all the necessary probes. More open and programm | ||||
| able network devices are therefore needed.</t> | ||||
| <t>Many application scenarios need to correlate network-wide data from multiple | ||||
| sources (i.e., from distributed network devices, different components of a netwo | ||||
| rk device, or different network planes). A piecemeal solution is often lacking t | ||||
| he capability to consolidate the data from multiple sources. The composition of | ||||
| a complete solution, as partly proposed by <xref target="I-D.pedro-nmrg-anticipa | ||||
| ted-adaptation">Autonomic Resource Control Architecture (ARCA)</xref>, will be em | ||||
| powered and guided by a comprehensive framework. </t> | ||||
| <t>Some conventional OAM techniques (e.g., CLI and Syslog) lack a formal data mo | ||||
| del. The unstructured data hinder the tool automation and application extensibil | ||||
| ity. Standardized data models are essential to support the programmable networks | ||||
| . </t> | ||||
| <t>Although some conventional OAM techniques support data push (e.g., <xref targ | ||||
| et="RFC2981">SNMP Trap</xref><xref target="RFC3877"/>, Syslog, and <xref target= | ||||
| "RFC3176">sFlow</xref>), the pushed data are limited to only predefined manageme | ||||
| nt plane warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). Netwo | ||||
| rk operators require the data with arbitrary source, granularity, and precision | ||||
| which are beyond the capability of the existing techniques. </t> | ||||
| <t>The conventional passive measurement techniques can either consume excessive | ||||
| network resources and produce excessive redundant data, or lead to inaccurate re | ||||
| sults; on the other hand, the conventional active measurement techniques can int | ||||
| erfere with the user traffic and their results are indirect. Techniques that can | ||||
| collect direct and on-demand data from user traffic are more favorable.</t> | ||||
| </list> | ||||
| </t> | ||||
| <t>These challenges were addressed by newer standards and techniques (e.g., IPFI | ||||
| X/NetFlow, Packet Sampling (PSAMP), IOAM, and YANG-Push) and more are emerging. | ||||
| These standards and techniques need to be recognized and accommodated in a new f | ||||
| ramework.</t> | ||||
| </section> | ||||
| <section title="Network Telemetry"> | <t> The purpose of the framework and taxonomy is to set a common ground fo | |||
| <t>Network telemetry has emerged as a mainstream technical term to refer to the | r the collection of related work and provide guidance for future technique and s | |||
| network data collection and consumption techniques. Several network telemetry te | tandard developments. To the best of our knowledge, this document is the first s | |||
| chniques and protocols (e.g., <xref target="RFC7011">IPFIX</xref> and <xref targ | uch effort for network telemetry in industry standards organizations. This docum | |||
| et="grpc">gRPC</xref>) have been widely deployed. Network telemetry allows separ | ent does not define specific technologies.</t> | |||
| ate entities to acquire data from network devices so that data can be visualized | ||||
| and analyzed to support network monitoring and operation. Network telemetry cov | ||||
| ers the conventional network OAM and has a wider scope. For instance, it is expe | ||||
| cted that network telemetry can provide the necessary network insight for autono | ||||
| mous networks and address the shortcomings of conventional OAM techniques. </t> | ||||
| <t>Network telemetry usually assumes machines as data consumers rather than huma | ||||
| n operators. Hence, the network telemetry can directly trigger the automated net | ||||
| work operation, while in contrast some conventional OAM tools were designed and | ||||
| used to help human operators to monitor and diagnose the networks and guide manu | ||||
| al network operations. Such a proposition leads to very different techniques. </ | ||||
| t> | ||||
| <t>Although new network telemetry techniques are emerging and subject to continu | ||||
| ous evolution, several characteristics of network telemetry have been well accep | ||||
| ted. Note that network telemetry is intended to be an umbrella term covering a w | ||||
| ide spectrum of techniques, so the following characteristics are not expected to | ||||
| be held by every specific technique.</t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t>Push and Streaming: Instead of polling data from network devices, telemetry c | ||||
| ollectors subscribe to streaming data pushed from data sources in network device | ||||
| s.</t> | ||||
| <t>Volume and Velocity: The telemetry data is intended to be consumed by machine | ||||
| s rather than by human beings. Therefore, the data volume can be huge and the pro | ||||
| cessing is optimized for the needs of automation in real time.</t> | ||||
| <t>Normalization and Unification: Telemetry aims to address the overall network | ||||
| automation needs. Efforts are made to normalize the data representation and unif | ||||
| y the protocols, so as to simplify data analysis and provide integrated analysis | ||||
| across heterogeneous devices and data sources across a network.</t> | ||||
| <t>Model-based: The telemetry data is modeled in advance, which allows applicatio | ||||
| ns to configure and consume data with ease. </t> | ||||
| <t>Data Fusion: The data for a single application can come from multiple data so | ||||
| urces (e.g., cross-domain, cross-device, and cross-layer) based on common naming | ||||
| /ID and needs to be correlated to take effect.</t> | ||||
| <t>Dynamic and Interactive: Since network telemetry is meant to be used in a cl | ||||
| osed control loop for network automation, it needs to run continuously and adapt | ||||
| to the dynamic and interactive queries from the network operation controller. < | ||||
| /t> | ||||
| </list> | ||||
| </t> | ||||
| <t>In addition, an ideal network telemetry solution may also have the following | ||||
| features or properties:</t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t>In-Network Customization: The data that is generated can be customized in net | ||||
| work at run-time to cater to the specific need of applications. This needs the s | ||||
| upport of a programmable data plane which allows probes with custom functions to | ||||
| be deployed at flexible locations. </t> | ||||
| <t>In-Network Data Aggregation and Correlation: Network devices and aggregation | ||||
| points can work out which events and what data needs to be stored, reported, or | ||||
| discarded, thus reducing the load on the central collection and processing points | ||||
| while still ensuring that the right information is ready to be processed in a t | ||||
| imely way.</t> | ||||
| <t>In-Network Processing: Sometimes it is not necessary or feasible to gather al | ||||
| l information to a central point to be processed and acted upon. It is possible | ||||
| for the data processing to be done in network, allowing reactive actions to be t | ||||
| aken locally.</t> | ||||
| <t>Direct Data Plane Export: The data originating from the data plane forwarding | ||||
| chips can be directly exported to the data consumer for efficiency, especially w | ||||
| hen the data bandwidth is large and real-time processing is required. </t> | ||||
| <t>In-band Data Collection: In addition to the passive and active data collectio | ||||
| n approaches, the new hybrid approach allows data to be collected directly for any ta | ||||
| rget flow on its entire forwarding path <xref target="I-D.song-opsawg-ifit-frame | ||||
| work"/>. </t> | ||||
| </list> | ||||
| </t> | ||||
| <t>It is worth noting that a network telemetry system should not be intrusive to | ||||
| normal network operations; it should avoid the pitfall of the "observer effect". Tha | ||||
| t is, it should not change the network behavior or affect the forwarding perfor | ||||
| mance. Moreover, high-volume telemetry traffic may cause network congestion unle | ||||
| ss proper isolation or traffic engineering techniques are in place, or congestio | ||||
| n control mechanisms ensure that telemetry traffic backs off if it exceeds the n | ||||
| etwork capacity. <xref target="RFC8084" /> and <xref target="RFC8085" /> are rel | ||||
| evant Best Current Practices (BCP) in this space.</t> | ||||
| <t>Although in many cases a system for network telemetry involves a remote data | ||||
| collecting and consuming entity, it is important to understand that there are no | ||||
| inherent assumptions about how a system should be architected. While a network | ||||
| architecture with centralized controller (e.g., SDN) seems a natural fit for net | ||||
| work telemetry, network telemetry can work in distributed fashions as well. For | ||||
| example, telemetry data producers and consumers can have a peer-to-peer relatio | ||||
| nship, in which a network node can be the direct consumer of telemetry data from | ||||
| other nodes. </t> | ||||
| </section> | ||||
| <section title="The Necessity of a Network Telemetry Framework"> | <section numbered="true" toc="default"> | |||
| <t>Network data analytics (e.g., machine learning) is applied for network operat | <name>Applicability Statement</name> | |||
| ion automation, relying on abundant and coherent data from networks. Data acquis | <t>Large-scale network data collection is a major threat to user privacy | |||
| ition that is limited to a single source and static in nature will in many cases | and may be indistinguishable from pervasive monitoring <xref target="RFC7258" f | |||
| not be sufficient to meet an application's telemetry data needs. As a result, m | ormat="default"/>. The network telemetry framework presented in this document m | |||
| ultiple data sources, involving a variety of techniques and standards, will need | ust not be applied to generating, exporting, collecting, analyzing, or retaining | |||
| to be integrated. It is desirable to have a framework that classifies and organ | individual user data or any data that can identify end users or characterize th | |||
| izes different telemetry data sources and types, defines different components of | eir behavior without consent. Based on this principle, the network telemetry fra | |||
| a network telemetry system and their interactions, and helps coordinate and inte | mework is not applicable to networks whose endpoints represent individual users, | |||
| grate multiple telemetry approaches across layers. This allows flexible combinat | such as general-purpose access networks. </t> | |||
| ions of data for different applications, while normalizing and simplifying inter | </section> | |||
| faces. In detail, such a framework would benefit the development of network oper | <section numbered="true" toc="default"> | |||
| ation applications for the following reasons:</t> | <name>Glossary</name> | |||
| <t> | <t>Before further discussion, we list some key terminology and abbreviat | |||
| <list style="symbols"> | ions used in this document. There is an intended differentiation between the ter | |||
| <t>Future networks, autonomous or otherwise, depend on holistic and comprehensiv | ms of network telemetry and OAM. However, it should be understood that there is | |||
| e network visibility. The use cases and applications are better supported | puts more emphasis on the newer and emerging techniques and protocols concerning | |||
| uniformly and coherently using an integrated, converged mechanism and common tel | is considered an extension of OAM. It covers all the existing OAM protocols but | |||
| emetry data representations wherever feasible. Therefore, the protocols and mech | puts more emphasis on the newer and emerging techniques and protocols concerning | |||
| anisms should be consolidated into a minimum yet comprehensive set. A telemetry | all aspects of network data from acquisition to consumption.</t> | |||
| framework can help to normalize the technique developments.</t> | <dl newline="false" spacing="normal" indent="12"> | |||
| <t>Network visibility presents multiple viewpoints. For example, the device view | <dt>AI:</dt> | |||
| point takes the network infrastructure as the monitoring object from which the n | <dd> Artificial Intelligence. In the network domain, AI refers to mach | |||
| etwork topology and device status can be acquired; the traffic viewpoint takes t | ine-learning-based technologies for automated network operation and other tasks. | |||
| he flows or packets as the monitoring object from which the traffic quality and | </dd> | |||
| path can be acquired. An application may need to switch its viewpoint during ope | <dt>AM:</dt> | |||
| ration. It may also need to correlate a service and its impact on user experienc | <dd> Alternate Marking. A flow performance measurement method, as spec | |||
| e to acquire the comprehensive information.</t> | ified in <xref target="RFC8321" format="default"/>. </dd> | |||
| <t>Applications require network telemetry to be elastic in order to make efficie | <dt>BMP:</dt> | |||
| nt use of network resources and reduce the impact of processing related to netwo | <dd>BGP Monitoring Protocol. Specified in <xref target="RFC7854" forma | |||
| rk telemetry on network performance. For example, routine network monitoring sho | t="default"/>. </dd> | |||
| uld cover the entire network with a low data sampling rate. Only when issues ari | <dt>DPI:</dt> | |||
| se or critical trends emerge should telemetry data sources be modified and telem | <dd>Deep Packet Inspection. Refers to the techniques that examine pack | |||
| etry data rates boosted as needed.</t> | ets beyond packet L3/L4 headers. </dd> | |||
| <t>Efficient data aggregation is critical for applications to reduce the overall | <dt>gNMI:</dt> | |||
| quantity of data and improve the accuracy of analysis.</t> | <dd>gRPC Network Management Interface. A network management protocol f | |||
| </list> | rom the OpenConfig Operator Working Group, mainly contributed by Google. See <xr | |||
| </t> | ef target="gnmi" format="default"/> for details. </dd> | |||
| <t> A telemetry framework collects together all the telemetry-related works from | <dt>GPB:</dt> | |||
| different sources and working groups within the IETF. This makes it possible to ass | ructured data. See <xref target="gpb" format="default"/> for details. </dd> | |||
| emble a comprehensive network telemetry system and to avoid repetitious or redun | ructured data. See <xref target="gpb" format="default"/> for details. </dd> | |||
| dant work. The framework should cover the concepts and components from the stand | <dt>gRPC:</dt> | |||
| ardization perspective. This document describes the modules which make up a netw | <dd>gRPC Remote Procedure Call. An open-source high-performance RPC fr | |||
| ork telemetry framework and decomposes the telemetry system into a set of distin | amework that gNMI is based on. See <xref target="grpc" format="default"/> for de | |||
| ct components that existing and future work can easily map to.</t> | tails. </dd> | |||
| <dt>IPFIX:</dt> | ||||
| <dd>IP Flow Information Export Protocol. Specified in <xref target="RF | ||||
| C7011" format="default"/>. </dd> | ||||
| <dt>IOAM:</dt> | ||||
| <dd> | ||||
| <xref target="RFC9197" format="default">In situ OAM</xref>. A data p | ||||
| lane on-path telemetry technique. </dd> | ||||
| <dt>JSON:</dt> | ||||
| <dd>JavaScript Object Notation. An open standard file format and data | ||||
| interchange format that uses human-readable text to store and transmit data obje | ||||
| cts, as specified in <xref target="RFC8259" format="default"/>. </dd> | ||||
| <dt>MIB:</dt> | ||||
| <dd>Management Information Base. A database used for managing the enti | ||||
| ties in a network. </dd> | ||||
| <dt>NETCONF:</dt> | ||||
| <dd>Network Configuration Protocol. Specified in <xref target="RFC6241 | ||||
| " format="default"/>. </dd> | ||||
| <dt>NetFlow:</dt> | ||||
| <dd>A Cisco protocol used for flow record collecting, as described in | ||||
| <xref target="RFC3954" format="default"/>. </dd> | ||||
| <dt>Network Telemetry:</dt> | ||||
| <dd>The process and instrumentation for acquiring and utilizing networ | ||||
| k data remotely for network monitoring and operation. A general term for a large | ||||
| set of network visibility techniques and protocols, concerning aspects like dat | ||||
| a generation, collection, correlation, and consumption. Network telemetry addres | ||||
| ses current network operation issues and enables smooth evolution toward future | ||||
| intent-driven autonomous networks.</dd> | ||||
| <dt>NMS:</dt> | ||||
| <dd>Network Management System. Refers to applications that allow netwo | ||||
| rk administrators to manage a network. </dd> | ||||
| <dt>OAM:</dt> | ||||
| <dd>Operations, Administration, and Maintenance. A group of network ma | ||||
| nagement functions that provide network fault indication, fault localization, pe | ||||
| rformance information, and data and diagnosis functions. Most conventional netwo | ||||
| rk monitoring techniques and protocols belong to network OAM.</dd> | ||||
| </section> | <dt>PBT:</dt> | |||
| </section> | <dd>Postcard-Based Telemetry. A data plane on-path telemetry technique | |||
| . A representative technique is described in <xref target="IPPM-IOAM-DIRECT-EXPO | ||||
| RT" format="default"/>. </dd> | ||||
| <dt>RESTCONF:</dt> | ||||
| <dd> An HTTP-based protocol that provides a programmatic interface for | ||||
| accessing data defined in YANG, using the datastore concepts defined in NETCONF | ||||
| , as specified in <xref target="RFC8040" format="default"/>. </dd> | ||||
| <dt>SMIv2:</dt> | ||||
| <dd>Structure of Management Information Version 2. Defines MIB objects | ||||
| , as specified in <xref target="RFC2578" format="default"/>. </dd> | ||||
| <dt>SNMP:</dt> | ||||
| <dd>Simple Network Management Protocol. Versions 1, 2, and 3 are speci | ||||
| fied in <xref target="RFC1157" format="default"/>, <xref target="RFC3416" format | ||||
| ="default"/>, and <xref target="RFC3411" format="default"/>, respectively. </dd> | ||||
| <dt>XML:</dt> | ||||
| <dd>Extensible Markup Language. A markup language for data encoding th | ||||
| at is both human readable and machine readable, as specified by W3C <xref target | ||||
| ="W3C.REC-xml-20081126" format="default"/>. </dd> | ||||
| <dt>YANG:</dt> | ||||
| <dd>YANG is a data modeling language for the definition of data sent o | ||||
| ver network management protocols such as NETCONF and RESTCONF. YANG is defined i | ||||
| n <xref target="RFC6020" format="default"/> and <xref target="RFC7950" format="d | ||||
| efault"/>. </dd> | ||||
| <dt>YANG ECA:</dt> | ||||
| <dd>A YANG model for Event-Condition-Action policies, as defined in <x | ||||
| ref target="I-D.ietf-netmod-eca-policy" format="default"/>. </dd> | ||||
| <dt>YANG-Push:</dt> | ||||
| <dd> A mechanism that allows subscriber applications to request a stre | ||||
| am of updates from a YANG datastore on a network device. Details are specified i | ||||
| n <xref target="RFC8639" format="default"/> and <xref target="RFC8641" format="d | ||||
| efault"/>. </dd> | ||||
| </dl> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Background</name> | ||||
| <t>The term "big data" is used to describe the extremely large volume of d | ||||
| ata sets that can be analyzed computationally to reveal patterns, trends, and as | ||||
| sociations. Networks are undoubtedly a source of big data because of their scale | ||||
| and the volume of network traffic they forward. When a network's endpoints do n | ||||
| ot represent individual users (e.g., in industrial, data-center, and infrastruct | ||||
| ure contexts), network operations can often benefit from large-scale data collec | ||||
| tion without breaching user privacy.</t> | ||||
| <t>Today, one can access advanced big data analytics capability through a | ||||
| plethora of commercial and open-source platforms (e.g., Apache Hadoop), tools (e | ||||
| .g., Apache Spark), and techniques (e.g., machine learning). Thanks to the advan | ||||
| ce of computing and storage technologies, network big data analytics give networ | ||||
| k operators an opportunity to gain network insights and move towards network aut | ||||
| onomy. Some operators have started to explore the application of Artificial Intelligenc | ||||
| e (AI) to make sense of network data. Software tools can use the network data to | ||||
| detect and react to network faults, anomalies, and policy violations, as well a | ||||
| s predict future events. In turn, the network policy updates for planning, intru | ||||
| sion prevention, optimization, and self-healing may be applied.</t> | ||||
| <t>It is conceivable that an <xref target="RFC7575" format="default"> auto | ||||
| nomic network </xref> is the logical next step for network evolution following S | ||||
| oftware-Defined Networking (SDN), which aims to reduce (or even eliminate) human | ||||
| labor, make more efficient use of network resources, and provide better service | ||||
| s more aligned with customer requirements. The IETF ANIMA Working Group is dedic | ||||
| ated to developing and maintaining protocols and procedures for automated networ | ||||
| k management and control of professionally managed networks. The related techniq | ||||
| ue of <xref target="I-D.irtf-nmrg-ibn-concepts-definitions" format="default">Int | ||||
| ent-Based Networking (IBN)</xref> requires network visibility and telemetry data | ||||
| in order to ensure that the network is behaving as intended. </t> | ||||
| <t>However, while the data processing capability is improved and applicati | ||||
| ons require more data to function better, the networks lag behind in extracting | ||||
| and translating network data into useful and actionable information in efficient | ||||
| ways. The system bottleneck is shifting from data consumption to data supply. B | ||||
| oth the number of network nodes and the traffic bandwidth keep increasing at a f | ||||
| ast pace. The network configuration and policy change at shorter intervals than | ||||
| before. More subtle events and fine-grained data through all network planes nee | ||||
| d to be captured and exported in real time. In a nutshell, it is a challenge to | ||||
| get enough high-quality data out of the network in a manner that is efficient, t | ||||
| imely, and flexible. Therefore, we need to survey the existing technologies and | ||||
| protocols and identify any potential gaps.</t> | ||||
| <t>In the remainder of this section, we first clarify the scope of network | ||||
| data (i.e., telemetry data) relevant in this document. Then, we discuss several | ||||
| key use cases for network operations of today and the future. Next, we show why | ||||
| the current network OAM techniques and protocols are insufficient for these use | ||||
| cases. The discussion underlines the need for new methods, techniques, and prot | ||||
| ocols, as well as the extensions of existing ones, which we assign under the umb | ||||
| rella term "Network Telemetry". </t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Telemetry Data Coverage</name> | ||||
| <t>Any information that can be extracted from networks (including the da | ||||
| ta plane, control plane, and management plane) and used to gain visibility or as | ||||
| a basis for actions is considered telemetry data. It includes statistics, event | ||||
| records and logs, snapshots of state, configuration data, etc. It also covers t | ||||
| he outputs of any active and passive measurements <xref target="RFC7799" format= | ||||
| "default"/>. In some cases, raw data is processed in network before being sent t | ||||
| o a data consumer. Such processed data is also considered telemetry data. The va | ||||
| lue of telemetry data varies. In some cases, if the cost is acceptable, less but | ||||
| higher-quality data are preferred rather than a lot of low-quality data. A clas | ||||
| sification of telemetry data is provided in <xref target="framework" format="def | ||||
| ault"/>. To preserve the privacy of end users, no user packet content should be | ||||
| collected. Specifically, the data objects generated, exported, and collected by | ||||
| a network telemetry application should not include any packet payload from traf | ||||
| fic associated with end-user systems. </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Use Cases</name> | ||||
| <t>The following set of use cases is essential for network operations. W | ||||
| hile the list is by no means exhaustive, it is enough to highlight the requireme | ||||
| nts for data velocity, variety, volume, and veracity, the attributes of big data | ||||
| , in networks. </t> | ||||
| <ul spacing="normal"> | ||||
| <li> Security: Network intrusion detection and prevention systems need | ||||
| to monitor network traffic and activities and act upon anomalies. Given increas | ||||
| ingly sophisticated attack vectors coupled with increasingly severe consequences | ||||
| of security breaches, new tools and techniques need to be developed, relying on | ||||
| wider and deeper visibility into networks. The ultimate goal is to achieve secu | ||||
| rity with no, or only minimal, human intervention and without disrupting legitim | ||||
| ate traffic flows. </li> | ||||
| <li>Policy and Intent Compliance: Network policies are the rules that | ||||
| constrain the services for network access, provide service differentiation, or e | ||||
| nforce specific treatment on the traffic. For example, a service function chain | ||||
| is a policy that requires the selected flows to pass through a set of ordered ne | ||||
| twork functions. Intent, as defined in <xref target="I-D.irtf-nmrg-ibn-concepts- | ||||
| definitions" format="default"/>, is a set of operational goals that a network sh | ||||
| ould meet and outcomes that a network is supposed to deliver, defined in a decla | ||||
| rative manner without specifying how to achieve or implement them. An intent req | ||||
| uires a complex translation and mapping process before being applied on networks | ||||
| . While a policy or intent is enforced, the compliance needs to be verified and | ||||
| monitored continuously by relying on visibility that is provided through network | ||||
| telemetry data. Any violation must be reported immediately - this will alert th | ||||
| e network | ||||
| administrator to the policy or intent violation and will potentially | ||||
| result in updates to how the policy or intent is applied in the network to | ||||
| ensure that it remains in force. </li> | ||||
| <li>SLA Compliance: A Service Level Agreement (SLA) is a service contr | ||||
| act between a service provider and a client, which includes the metrics for the | ||||
| service measurement and remedy/penalty procedures when the service level misses | ||||
| the agreement. Users need to check if they get the service as promised, and netw | ||||
| ork operators need to evaluate how they can deliver services that meet the SLA b | ||||
| ased on real-time network telemetry data, including data from network measuremen | ||||
| ts.</li> | ||||
| <li>Root Cause Analysis: Many network failures can be the effect of a | ||||
| sequence of chained events. Troubleshooting and recovery require quick identific | ||||
| ation of the root cause of any observable issues. However, the root cause is not | ||||
| always straightforward to identify, especially when the failure is sporadic and | ||||
| the number of event messages, both related and unrelated to the same cause, is | ||||
| overwhelming. While technologies such as machine learning can be used for root c | ||||
| ause analysis, it is up to the network to sense and provide the relevant diagnos | ||||
| tic data that are either actively fed into or passively retrieved by the root ca | ||||
| use analysis applications.</li> | ||||
| <li>Network Optimization: This covers all short-term and long-term net | ||||
| work optimization techniques, including load balancing, Traffic Engineering (TE) | ||||
| , and network planning. Network operators are motivated to optimize their networ | ||||
| k utilization and differentiate services for better Return on Investment (ROI) o | ||||
| r lower Capital Expenditure (CAPEX). The first step is to know the real-time net | ||||
| work conditions before applying policies for traffic manipulation. In some cases | ||||
| , microbursts need to be detected in a very short time frame so that fine-graine | ||||
| d traffic control can be applied to avoid network congestion. Long-term planning | ||||
| of network capacity and topology requires analysis of real-world network teleme | ||||
| try data that is obtained over long periods of time.</li> | ||||
| <li>Event Tracking and Prediction: The visibility into traffic path an | ||||
| d performance is critical for services and applications that rely on healthy net | ||||
| work operation. Numerous related network events are of interest to network opera | ||||
| tors. For example, network operators want to learn where and why packets are dro | ||||
| pped for an application flow. They also want to be warned of issues in advance, | ||||
| so proactive actions can be taken to avoid catastrophic consequences. </li> | ||||
| </ul> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Challenges</name> | ||||
| <t>For a long time, network operators have relied upon <xref target="RFC | ||||
| 3416" format="default">SNMP</xref>, Command-Line Interface (CLI), or <xref targe | ||||
| t="RFC5424" format="default">Syslog</xref> to monitor the network. Some other OA | ||||
| M techniques as described in <xref target="RFC7276" format="default"/> are also | ||||
| used to facilitate network troubleshooting. These conventional techniques are no | ||||
| t sufficient to support the above use cases for the following reasons: </t> | ||||
| <ul spacing="normal"> | ||||
| <li>Most use cases need to continuously monitor the network and dynami | ||||
| cally refine the data collection in real time. Poll-based low-frequency data col | ||||
| lection is ill-suited for these applications. Subscription-based streaming data | ||||
| directly pushed from the data source (e.g., the forwarding chip) is preferred to | ||||
| provide sufficient data quantity and precision at scale.</li> | ||||
| <li>Comprehensive data is needed, ranging from packet processing engin | ||||
| es to traffic managers, line cards to main control boards, user flows to control | ||||
| protocol packets, device configurations to operations, and physical layers to a | ||||
| pplication layers. Conventional OAM only covers a narrow range of data (e.g., SN | ||||
| MP only handles data from the Management Information Base (MIB)). Classical netw | ||||
| ork devices cannot provide all the necessary probes. More open and programmable | ||||
| network devices are therefore needed.</li> | ||||
| <li>Many application scenarios need to correlate network-wide data fro | ||||
| m multiple sources (i.e., from distributed network devices, different components | ||||
| of a network device, or different network planes). A piecemeal solution is ofte | ||||
| n lacking the capability to consolidate the data from multiple sources. The comp | ||||
| osition of a complete solution, as partly proposed by <xref target="NMRG-ANTICIP | ||||
| ATED-ADAPTATION" format="default">Autonomic Resource Control Architecture (ARCA) | ||||
| </xref>, will be empowered and guided by a comprehensive framework. </li> | ||||
| <li>Some conventional OAM techniques (e.g., CLI and Syslog) lack a formal | ||||
| data model. Unstructured data hinders tool automation and application | ||||
| extensibility. Standardized data models are essential to support programmable | ||||
| networks. </li> | ||||
| <section anchor="framework" title="Network Telemetry Framework"> | <li>Although some conventional OAM techniques support data push (e.g., | |||
| <t> The top level network telemetry framework partitions the network telemetry i | <xref target="RFC2981" format="default">SNMP Trap</xref><xref target="RFC3877" f | |||
| nto four modules based on the telemetry data object source and represents their | ormat="default"/>, Syslog, and <xref target="RFC3176" format="default">sFlow</xr | |||
| relationship. Once the network operation applications acquire the data from thes | ef>), the pushed data are limited to only predefined management plane warnings ( | |||
| e modules, they can apply data analytics and take actions. At the next level, th | e.g., SNMP Trap) or sampled user packets (e.g., sFlow). Network operators requir | |||
| e framework decomposes each module into separate components. Each of the modules | e the data with arbitrary source, granularity, and precision, which is beyond th | |||
| follows the same underlying structure, with one component dedicated to the conf | e capability of the existing techniques. </li> | |||
| iguration of data subscriptions and data sources, a second component dedicated t | <li>Conventional passive measurement techniques can either consume exc | |||
| o encoding and exporting data, and a third component instrumenting the generatio | essive network resources and produce excessive redundant data or lead to inaccur | |||
| n of telemetry related to the underlying resources. Throughout the framework, th | ate results; on the other hand, conventional active measurement techniques can i | |||
| e same set of abstract data acquiring mechanisms and data types (<xref target="s | nterfere with the user traffic, and their results are indirect. Techniques that | |||
| ec:type"/>) are applied. The two-level architecture with the uniform data abstra | can collect direct and on-demand data from user traffic are more favorable.</li> | |||
| ction helps accurately pinpoint a protocol or technique to its position in a net | </ul> | |||
| work telemetry system or disaggregate a network telemetry system into manageable | <t>These challenges were addressed by newer standards and techniques (e. | |||
| parts.</t> | g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push), and more are e | |||
| <section title="Top Level Modules"> | merging. These standards and techniques need to be recognized and accommodated i | |||
| <t> Telemetry can be applied on the forwarding plane, the control plane, and the | n a new framework.</t> | |||
| management plane in a network, as well as other sources out of the network, as | </section> | |||
| shown in <xref target="figure_1"/>. Therefore, we categorize the network telemet | <section numbered="true" toc="default"> | |||
| ry into four distinct modules (management plane, control plane, forwarding plane | <name>Network Telemetry</name> | |||
| , and external data and event telemetry) with each having its own interface to N | <t>Network telemetry has emerged as a mainstream technical term to refer | |||
| etwork Operation Applications.</t> | to the network data collection and consumption techniques. Several network tele | |||
| <t> | metry techniques and protocols (e.g., <xref target="RFC7011" format="default">IP | |||
| <figure anchor="figure_1" title="Modules in Layer Category of NTF"> | FIX</xref> and <xref target="grpc" format="default">gRPC</xref>) have been widel | |||
| <artwork><![CDATA[ | y deployed. Network telemetry allows separate entities to acquire data from netw | |||
| ork devices so that data can be visualized and analyzed to support network monit | ||||
| oring and operation. Network telemetry covers the conventional network OAM and h | ||||
| as a wider scope. For instance, it is expected that network telemetry can provid | ||||
| e the necessary network insight for autonomous networks and address the shortcom | ||||
| ings of conventional OAM techniques. </t> | ||||
| <t>Network telemetry usually assumes machines as data consumers rather t | ||||
| han human operators. Hence, network telemetry can directly trigger the automated | ||||
| network operation, while in contrast, some conventional OAM tools were designed | ||||
| and used to help human operators to monitor and diagnose the networks and guide | ||||
| manual network operations. This difference in the intended data consumer leads | ||||
| to very different techniques. </t> | ||||
| <t>Although new network telemetry techniques are emerging and subject to | ||||
| continuous evolution, several characteristics of network telemetry have been we | ||||
| ll accepted. Note that network telemetry is intended to be an umbrella term cove | ||||
| ring a wide spectrum of techniques, so the following characteristics are not exp | ||||
| ected to be held by every specific technique.</t> | ||||
| <ul spacing="normal"> | ||||
| <li>Push and Streaming: Instead of polling data from network devices, | ||||
| telemetry collectors subscribe to streaming data pushed from data sources in net | ||||
| work devices.</li> | ||||
| <li>Volume and Velocity: Telemetry data is intended to be consumed by | ||||
| machines rather than by human beings. Therefore, the data volume can be huge, an | ||||
| d the processing is optimized for the needs of automation in real time.</li> | ||||
| <li>Normalization and Unification: Telemetry aims to address the overa | ||||
| ll network automation needs. Efforts are made to normalize the data representati | ||||
| on and unify the protocols, so as to simplify data analysis and provide integrat | ||||
| ed analysis across heterogeneous devices and data sources across a network.</li> | ||||
| <li>Model-Based: Telemetry data is modeled in advance, which allows ap | ||||
| plications to configure and consume data with ease. </li> | ||||
| <li>Data Fusion: The data for a single application can come from multi | ||||
| ple data sources (e.g., cross-domain, cross-device, and cross-layer) that are ba | ||||
| sed on a common name/ID and need to be correlated to take effect.</li> | ||||
| <li>Dynamic and Interactive: Since network telemetry is meant to be used | ||||
| in a closed control loop for network automation, it needs to run continuousl | ||||
| y and adapt to the dynamic and interactive queries from the network operation co | ||||
| ntroller. </li> | ||||
| </ul> | ||||
| <t>In addition, an ideal network telemetry solution may also have the fo | ||||
| llowing features or properties:</t> | ||||
| <ul spacing="normal"> | ||||
| <li>In-Network Customization: The data that is generated can be custom | ||||
| ized in network at runtime to cater to the specific need of applications. This n | ||||
| eeds the support of a programmable data plane, which allows probes with custom f | ||||
| unctions to be deployed at flexible locations. </li> | ||||
| <li>In-Network Data Aggregation and Correlation: Network devices and a | ||||
| ggregation points can work out which events and what data needs to be stored, re | ||||
| ported, or discarded, thus reducing the load on the central collection and proce | ||||
| ssing points while still ensuring that the right information is ready to be proc | ||||
| essed in a timely way.</li> | ||||
| <li>In-Network Processing: Sometimes it is not necessary or feasible t | ||||
| o gather all information to a central point to be processed and acted upon. It i | ||||
| s possible for the data processing to be done in network, allowing reactive acti | ||||
| ons to be taken locally.</li> | ||||
| <li>Direct Data Plane Export: The data originated from data plane forw | ||||
| arding chips can be directly exported to the data consumer for efficiency, espec | ||||
| ially when the data bandwidth is large and real-time processing is required. </l | ||||
| i> | ||||
| <li>In-Band Data Collection: In addition to the passive and active data | ||||
| collection approaches, the new hybrid approach allows data to be collected directly | ||||
| for any target flow on its entire forwarding path <xref target="I-D.song-opsawg | ||||
| -ifit-framework" format="default"/>. </li> | ||||
| </ul> | ||||
| <t>It is worth noting that a network telemetry system should not intrude on | ||||
| normal network operations; that is, it should avoid the pitfall of the | ||||
| "observer effect". It should neither change the network behavior nor affect | ||||
| the forwarding performance. Moreover, high-volume telemetry traffic may cause network congest | ||||
| ion unless proper isolation or traffic engineering techniques are in place, or c | ||||
| ongestion control mechanisms ensure that telemetry traffic backs off if it excee | ||||
| ds the network capacity. <xref target="RFC8084" format="default"/> and <xref tar | ||||
| get="RFC8085" format="default"/> are relevant Best Current Practices (BCPs) in t | ||||
| his space.</t> | ||||
| <t>Although in many cases a system for network telemetry involves a remo | ||||
| te data collecting and consuming entity, it is important to understand that ther | ||||
| e are no inherent assumptions about how a system should be architected. While a | ||||
| network architecture with a centralized controller (e.g., SDN) seems to be a nat | ||||
| ural fit for network telemetry, network telemetry can work in distributed fashio | ||||
| ns as well. For example, telemetry data producers and consumers can have a peer | ||||
| -to-peer relationship, in which a network node can be the direct consumer of tel | ||||
| emetry data from other nodes. </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>The Necessity of a Network Telemetry Framework</name> | ||||
| <t>Network data analytics (e.g., machine learning) is applied for networ | ||||
| k operation automation, relying on abundant and coherent data from networks. Dat | ||||
| a acquisition that is limited to a single source and static in nature will in ma | ||||
| ny cases not be sufficient to meet an application's telemetry data needs. As a r | ||||
| esult, multiple data sources, involving a variety of techniques and standards, w | ||||
| ill need to be integrated. It is desirable to have a framework that classifies a | ||||
| nd organizes different telemetry data sources and types, defines different compo | ||||
| nents of a network telemetry system and their interactions, and helps coordinate | ||||
| and integrate multiple telemetry approaches across layers. This allows flexible | ||||
| combinations of data for different applications, while normalizing and simplify | ||||
| ing interfaces. In detail, such a framework would benefit the development of net | ||||
| work operation applications for the following reasons:</t> | ||||
| <ul spacing="normal"> | ||||
| <li>Future networks, autonomous or otherwise, depend on holistic and | ||||
| comprehensive network visibility. Use cases and applications are best | ||||
| supported uniformly and coherently by an integrated, converged mechanism and | ||||
| common telemetry data representations wherever feasible. Therefore, the | ||||
| protocols and mechanisms should be consolidated into a minimal yet comprehensive | ||||
| set. A telemetry framework can help to normalize technique development.</li> | ||||
| <li>Network visibility presents multiple viewpoints. For example, the | ||||
| device viewpoint takes the network infrastructure as the monitoring object from | ||||
| which the network topology and device status can be acquired, and the traffic vi | ||||
| ewpoint takes the flows or packets as the monitoring object from which the traff | ||||
| ic quality and path can be acquired. An application may need to switch its viewp | ||||
| oint during operation. It may also need to correlate a service and its impact on | ||||
| user experience (UE) to acquire the comprehensive information.</li> | ||||
| <li>Applications require network telemetry to be elastic in order to m | ||||
| ake efficient use of network resources and reduce the impact of processing relat | ||||
| ed to network telemetry on network performance. For example, routine network mon | ||||
| itoring should cover the entire network with a low data sampling rate. Only when | ||||
| issues arise or critical trends emerge should telemetry data sources be modifie | ||||
| d and telemetry data rates be boosted as needed.</li> | ||||
| <li>Efficient data aggregation is critical for applications to reduce | ||||
| the overall quantity of data and improve the accuracy of analysis.</li> | ||||
| </ul> | ||||
| <t>A telemetry framework collects all the telemetry-related work from | ||||
| different sources and working groups within the IETF. This makes it possible to a | ||||
| ssemble a comprehensive network telemetry system and to avoid repetitious or red | ||||
| undant work. The framework should cover the concepts and components from the sta | ||||
| ndardization perspective. This document describes the modules that make up a net | ||||
| work telemetry framework and decomposes the telemetry system into a set of disti | ||||
| nct components that existing and future work can easily map to.</t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="framework" numbered="true" toc="default"> | ||||
| <name>Network Telemetry Framework</name> | ||||
| <t> The top-level network telemetry framework partitions the network telem | ||||
| etry into four modules based on the telemetry data object source and represents | ||||
| their relationship. Once the network operation applications acquire the data fro | ||||
| m these modules, they can apply data analytics and take actions. At the next lev | ||||
| el, the framework decomposes each module into separate components. Each of these | ||||
| modules follows the same underlying structure, with one component dedicated to | ||||
| the configuration of data subscriptions and data sources, a second component ded | ||||
| icated to encoding and exporting data, and a third component instrumenting the g | ||||
| eneration of telemetry related to the underlying resources. Throughout the frame | ||||
| work, the same set of abstract data-acquiring mechanisms and data types (<xref t | ||||
| arget="sec_type" format="default"/>) are applied. The two-level architecture wit | ||||
| h the uniform data abstraction helps accurately pinpoint a protocol or technique | ||||
| to its position in a network telemetry system or disaggregates a network teleme | ||||
| try system into manageable parts.</t> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Top-Level Modules</name> | ||||
| <t> Telemetry can be applied on the forwarding plane, control plane, and | ||||
| management plane in a network, as well as on other sources out of the network, | ||||
| as shown in <xref target="figure_1" format="default"/>. Therefore, we categorize | ||||
| the network telemetry into four distinct modules (management plane, control pla | ||||
| ne, forwarding plane, and external data and event telemetry) with each having it | ||||
| s own interface to network operation applications.</t> | ||||
| <figure anchor="figure_1"> | ||||
| <name>Modules in Layer Category of the Network Telemetry Framework</na | ||||
| me> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| +------------------------------+ | +------------------------------+ | |||
| | | | | | | |||
| | Network Operation |<-------+ | | Network Operation |<-------+ | |||
| | Applications | | | | Applications | | | |||
| | | | | | | | | |||
| +------------------------------+ | | +------------------------------+ | | |||
| ^ ^ ^ | | ^ ^ ^ | | |||
| | | | | | | | | | | |||
| V V | V | V V | V | |||
| +--------------+-----------|---+ +-----------+ | +--------------+-----------|---+ +-----------+ | |||
| skipping to change at line 262 ¶ | skipping to change at line 264 ¶ | |||
| | Management | ^ V | | Telemetry | | | Management | ^ V | | Telemetry | | |||
| | Plane +-------|-------+ | | | | Plane +-------|-------+ | | | |||
| | Telemetry | V | +-----------+ | | Telemetry | V | +-----------+ | |||
| | | Forwarding | | | | Forwarding | | |||
| | | Plane | | | | Plane | | |||
| | <---> | | | <---> | | |||
| | | Telemetry | | | | Telemetry | | |||
| | | | | | | | | |||
| +--------------+---------------+ | +--------------+---------------+ | |||
| ]]></artwork> | ]]></artwork> | |||
| </figure> | </figure> | |||
| </t> | <t>The rationale of this partition lies in the different telemetry data | |||
| <t>The rationale of this partition lies in the different telemetry data objects | objects that result in different data sources and export locations. Such differe | |||
| which result in different data source and export locations. Such differences hav | nces have profound implications on in-network data programming and processing ca | |||
| e profound implications on in-network data programming and processing capability | pability, data encoding and the transport protocol, and required data bandwidth | |||
| , data encoding and transport protocol, and required data bandwidth and latency. | and latency. Data can be sent directly or proxied via the control and management | |||
| Data can be sent directly, or proxied via the control and management planes. Th | planes. There are advantages/disadvantages to both approaches.</t> | |||
| ere are advantages/disadvantages to both approaches.</t> | <t>Note that in some cases, the network controller itself may be the sou | |||
| <t>Note that in some cases the network controller itself may be the source of te | rce of telemetry data that is unique to it or derived from the telemetry data co | |||
| lemetry data that is unique to it or derived from the telemetry data collected f | llected from the network elements. Some of the principles and taxonomy specific | |||
| rom the network elements. Some of the principles and taxonomy specific to the co | to the control plane and management plane telemetry could also be applied to the | |||
| ntrol plane and management plane telemetry could also be applied to the controll | controller when it is required to provide the telemetry data to network operati | |||
| er when it is required to provide the telemetry data to Network Operation Applic | on applications hosted outside. The scope of this document is focused on the net | |||
| ations hosted outside. The scope of the document is focused on the network eleme | work elements telemetry, and further details related to controllers are thus out | |||
| nts telemetry and further details related to controllers are thus out of scope. | of scope. </t> | |||
| </t> | <t>We summarize the major differences of the four modules in <xref targe | |||
| t="table_1"/>. They are compared from six angles:</t> | ||||
| <ul spacing="normal"> | ||||
| <li>Data Object</li> | ||||
| <li>Data Export Location</li> | ||||
| <li>Data Model</li> | ||||
| <li>Data Encoding</li> | ||||
| <li>Telemetry Application Protocol</li> | ||||
| <li>Data Transport Method</li> | ||||
| </ul> | ||||
| <t>Data Object is the target and source of each module. Because the data | ||||
| source varies, the location where data is most conveniently exported also var | ||||
| ies. For example, forwarding plane data mainly originates as data exported from | ||||
| the forwarding Application-Specific Integrated Circuits (ASICs), while control p | ||||
| lane data mainly originates from the protocol daemons running on the control CPU | ||||
| (s). For convenience and efficiency, it is preferred to export the data off the | ||||
| device from locations near the source. Because the locations that can export dat | ||||
| a have different capabilities, different choices of data models, encoding, and t | ||||
| ransport methods are made to balance the performance and cost. For example, the | ||||
| forwarding chip has high throughput but limited capacity for processing complex | ||||
| data and maintaining state, while the main control CPU is capable of complex dat | ||||
| a and state processing but has limited bandwidth for high throughput data. As a | ||||
| result, the suitable telemetry protocol for each module can be different. Some r | ||||
| epresentative techniques are shown in the corresponding table blocks to highligh | ||||
| t the technical diversity of these modules. Note that the selected techniques ju | ||||
| st reflect the de facto state of the art and are by no means exhaustive (e.g., I | ||||
| PFIX can also be implemented over TCP and SCTP, but that is not recommended for | ||||
| the forwarding plane). The key point is that one cannot expect to use a universa | ||||
| l protocol to cover all the network telemetry requirements. </t> | ||||
| <t>We summarize the major differences of the four modules in the following table | <table anchor="table_1"> | |||
| . They are compared from six angles:</t> | <name>Comparison of Data Object Modules</name> | |||
| <t> | <thead> | |||
| <list style="symbols"> | <tr> | |||
| <t>Data Object</t> | <th>Module</th> | |||
| <t>Data Export Location</t> | <th>Management Plane</th> | |||
| <t>Data Model</t> | <th>Control Plane</th> | |||
| <t>Data Encoding</t> | <th>Forwarding Plane</th> | |||
| <t>Telemetry Application Protocol</t> | <th>External Data</th> | |||
| <t>Data Transport Method</t> | </tr> | |||
| </list> | </thead> | |||
| </t> | <tbody> | |||
| <t>Data Object is the target and source of each module. Because the data source | <tr> | |||
| varies, the location where data is most conveniently exported also varies. For | <td>Object</td> | |||
| example, forwarding plane data mainly originates as data exported from the forw | <td>configuration and operation state</td> | |||
| arding Application-Specific Integrated Circuits (ASICs), while control plane dat | <td>control protocol and signaling, RIB</td> | |||
| a mainly originates from the protocol daemons running on the control CPU(s). For | <td>flow and packet QoS, traffic stat., buffer and queue stat., FIB, Acces | |||
| convenience and efficiency, it is preferred to export the data off the device f | s Control List (ACL)</td> | |||
| rom locations near the source. Because the locations that can export data have d | <td>terminal, social, and environmental</td> | |||
| ifferent capabilities, different choices of data model, encoding, and transport | </tr> | |||
| method are made to balance the performance and cost. For example, the forwarding | <tr> | |||
| chip has high throughput but limited capacity for processing complex data and m | <td>Export Location</td> | |||
| aintaining state, while the main control CPU is capable of complex data and stat | <td>main control CPU</td> | |||
| e processing, but has limited bandwidth for high throughput data. As a result, t | <td>main control CPU, linecard CPU, or forwarding chip</td> | |||
| he suitable telemetry protocol for each module can be different. Some representa | <td>forwarding chip or linecard CPU; main control CPU unlikely</td> | |||
| tive techniques are shown in the corresponding table blocks to highlight the tec | <td>various</td> | |||
| hnical diversity of these modules. Note that the selected techniques just reflec | </tr> | |||
| t the de facto state of the art and are by no means exhaustive (e.g., IPFIX can | <tr> | |||
| also be implemented over TCP and SCTP, but that is not recommended for forwardin | <td>Data Model</td> | |||
| g plane). The key point is that one cannot expect to use a universal protocol to | <td>YANG, MIB, syslog</td> | |||
| cover all the network telemetry requirements. </t> | <td>YANG, custom</td> | |||
| <t> | <td>YANG, custom</td> | |||
| <figure anchor="figure_2" title="Comparison of the Data Object Modules"> | <td>YANG, custom</td> | |||
| <artwork><![CDATA[ | </tr> | |||
| +-----------+-------------+-------------+--------------+----------+ | <tr> | |||
| | Module |Management |Control |Forwarding |External | | <td>Data Encoding</td> | |||
| | |Plane |Plane |Plane |Data | | <td>GPB, JSON, XML</td> | |||
| +-----------+-------------+-------------+--------------+----------+ | <td>GPB, JSON, XML, plain text</td> | |||
| |Object |config. & |control |flow & packet |terminal, | | <td>plain text</td> | |||
| | |operation |protocol & |QoS, traffic |social & | | <td>GPB, JSON, XML, plain text</td> | |||
| | |state |signaling, |stat., buffer |environ- | | </tr> | |||
| | | |RIB |& queue stat.,|mental | | <tr> | |||
| | | | |ACL, FIB | | | <td>Application Protocol</td> | |||
| +-----------+-------------+-------------+--------------+----------+ | <td>gRPC, NETCONF, RESTCONF</td> | |||
| |Export |main control |main control |fwding chip |various | | <td>gRPC, NETCONF, IPFIX, traffic mirroring</td> | |||
| |Location |CPU |CPU, |or linecard | | | <td>IPFIX, traffic mirroring, gRPC, NETFLOW</td> | |||
| | | |linecard CPU |CPU; main | | | <td>gRPC</td> | |||
| | | |or forwarding|control CPU | | | </tr> | |||
| | | |chip |unlikely | | | <tr> | |||
| +-----------+-------------+-------------+--------------+----------+ | <td>Data Transport</td> | |||
| |Data |YANG, MIB, |YANG, |YANG, |YANG, | | <td>HTTP(S), TCP</td> | |||
| |Model |syslog |custom |custom |custom | | <td>HTTP(S), TCP, UDP</td> | |||
| +-----------+-------------+-------------+--------------+----------+ | <td>UDP</td> | |||
| |Data |GPB, JSON, |GPB, JSON, |plain text |GPB, JSON,| | <td>HTTP(S), TCP, UDP</td> | |||
| |Encoding |XML |XML, | |XML, plain| | </tr> | |||
| | | |plain text | |text | | </tbody> | |||
| +-----------+-------------+-------------+--------------+----------+ | </table> | |||
| |Application|gRPC,NETCONF,|gRPC,NETCONF,|IPFIX, traffic|gRPC | | ||||
| |Protocol |RESTCONF |IPFIX,traffic|mirroring, | | | ||||
| | | |mirroring |gRPC, NETFLOW | | | ||||
| +-----------+-------------+-------------+--------------+----------+ | ||||
| |Data |HTTP(S), TCP |HTTP(S), TCP,|UDP |HTTP(S), | | ||||
| |Transport | |UDP | |TCP, UDP | | ||||
| +-----------+-------------+-------------+--------------+----------+ | ||||
| ]]> | ||||
| </artwork> | ||||
| </figure> | ||||
| </t> | ||||
| <t>Note that the interaction with the applications that consume network telemetr | ||||
| y data can be indirect. Some in-device data transfer is possible. For example, i | ||||
| n the management plane telemetry, the management plane will need to acquire data | ||||
| from the data plane. Some operational states can only be derived from data plan | ||||
| e data sources such as the interface status and statistics. As another example, | ||||
| obtaining control plane telemetry data may require the ability to access the For | ||||
| warding Information Base (FIB) of the data plane.</t> | ||||
| <t>On the other hand, an application may involve more than one plane and interac | ||||
| t with multiple planes simultaneously. For example, an SLA compliance applicatio | ||||
| n may require both the data plane telemetry and the control plane telemetry.</t> | ||||
| <t>The requirements and challenges for each module are summarized as follows (no | ||||
| te that the requirements may pertain across all telemetry modules; however, we e | ||||
| mphasize those that are most pronounced for a particular plane).</t> | ||||
| <section title="Management Plane Telemetry"> | ||||
| <t>The management plane of network elements interacts with the Network Managemen | ||||
| t System (NMS), and provides information such as performance data, network loggi | ||||
| ng data, network warning and defects data, and network statistics and state data | ||||
| . The management plane includes many protocols, including the classical SNMP and | ||||
| syslog. Regardless of the protocol, management plane telemetry must address the fo | ||||
| llowing requirements:</t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t>Convenient Data Subscription: An application should have the freedom to choos | ||||
| e which data is exported (see section 4.3) and the means and frequency of how th | ||||
| at data is exported (e.g., on-change or periodic subscription).</t> | ||||
| <t>Structured Data: For automatic network operation, machines will replace humans | ||||
| for network data comprehension. Data modeling languages, such as YANG, can effi | ||||
| ciently describe structured data and normalize data encoding and transformation. | ||||
| </t> | ||||
| <t>High Speed Data Transport: In order to keep up with the velocity of informati | ||||
| on, a data source needs to be able to send large amounts of data at high frequen | ||||
| cy. Compact encoding formats or data compression schemes are needed to reduce th | ||||
| e quantity of data and improve the data transport efficiency. The subscription m | ||||
| ode, by replacing the query mode, reduces the interactions between clients and s | ||||
| ervers and helps to improve the data source's efficiency.</t> | ||||
| <t>Network Congestion Avoidance: The application must protect the network from c | ||||
| ongestion by congestion control mechanisms or at least circuit breakers. <xref t | ||||
| arget="RFC8084" /> and <xref target="RFC8085" /> provide some solutions in this | ||||
| space.</t> | ||||
| </list> | ||||
| </t> | ||||
| </section> | ||||
| <section title="Control Plane Telemetry"> | ||||
| <t>The control plane telemetry refers to the health condition monitoring of diff | ||||
| erent network control protocols at all layers of the protocol stack. Keeping tra | ||||
| ck of the operational status of these protocols is beneficial for detecting, loc | ||||
| alizing, and even predicting various network issues, as well as network optimiza | ||||
| tion, in real-time and with fine granularity. Some particular challenges and iss | ||||
| ues faced by the control plane telemetry are as follows: </t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t>One challenging problem for the control plane telemetry is how to correlate t | ||||
| he End-to-End (E2E) Key Performance Indicators (KPI) to a specific layer's KPIs. | ||||
| For example, IPTV users may describe their User Experience (UE) by the video sm | ||||
| oothness and definition. Then in case of an unusually poor UE KPI or a service d | ||||
| isconnection, it is non-trivial to delimit and pinpoint the issue in the respons | ||||
| ible protocol layer (e.g., the Transport Layer or the Network Layer), the respon | ||||
| sible protocol (e.g., ISIS or BGP at the Network Layer), and finally the respons | ||||
| ible device(s) with specific reasons. </t> | ||||
| <t> Conventional OAM-based approaches for control plane KPI measurement include | ||||
| Ping (L3), Traceroute (L3), <xref target="y1731">Y.1731</xref> (L2), and so on. | ||||
| One common issue behind these methods is that they only measure the KPIs instead | ||||
| of reflecting the actual running status of these protocols, making them less ef | ||||
| fective or efficient for control plane troubleshooting and network optimization. | ||||
| </t> | ||||
| <t> An example of the control plane telemetry is the BGP monitoring protocol (BM | ||||
| P). It is currently used for monitoring the BGP routes and enables rich applicat | ||||
| ions, such as BGP peer analysis, AS analysis, prefix analysis, and security anal | ||||
| ysis. However, the monitoring of other layers and protocols and the cross-layer, | ||||
| cross-protocol KPI correlations are still in their infancy (e.g., IGP monitoring | ||||
| is not as extensive as BMP) and require further research. </t> | ||||
| <t> The requirements and solutions for network congestion avoidance also apply | ||||
| to the control plane telemetry. </t> | ||||
| </list> | ||||
| </t> | ||||
| </section> | ||||
| <section title="Forwarding Plane Telemetry"> | ||||
| <t>An effective forwarding plane telemetry system relies on the data that the ne | ||||
| twork device can expose. The quality, quantity, and timeliness of data must meet | ||||
| some stringent requirements. This raises some challenges to the network data pl | ||||
| ane devices where the first-hand data originates.</t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t>A data plane device's main function is user traffic processing and forwarding | ||||
| . While supporting network visibility is important, the telemetry is just an aux | ||||
| iliary function, and it should strive to not impede normal traffic processing an | ||||
| d forwarding (i.e., the forwarding behavior should not be altered and the trade- | ||||
| off between forwarding performance and telemetry should be well-balanced).</t> | ||||
| <t>Network operation applications require end-to-end visibility across various s | ||||
| ources, which can result in a huge volume of data. However, the sheer quantity o | ||||
| f data must not exhaust the network bandwidth, regardless of the data delivery a | ||||
| pproach (i.e., whether through in-band or out-of-band channels).</t> | ||||
| <t>The data plane devices must provide timely data with the minimum possible del | ||||
| ay. Long processing, transport, storage, and analysis delay can impact the effec | ||||
| tiveness of the control loop and even render the data useless.</t> | ||||
| <t>The data should be structured and labeled, and easy for applications to parse | ||||
| and consume. At the same time, the data types needed by applications can vary s | ||||
| ignificantly. The data plane devices need to provide enough flexibility and prog | ||||
| rammability to support the precise data provision for applications.</t> | ||||
| <t>The data plane telemetry should support incremental deployment and work even | ||||
| though some devices are unaware of the system.</t> | ||||
| <t>The requirement and solutions for network congestion avoidance are also appli | ||||
| cable to the forwarding plane telemetry.</t> | ||||
| </list> | ||||
| </t> | ||||
| <t>Although not specific to the forwarding plane, these challenges are more diff | ||||
| icult to the forwarding plane because of the limited resource and flexibility. D | ||||
| ata plane programmability is essential to support network telemetry. Newer data | ||||
| plane forwarding chips are equipped with advanced telemetry features and provide | ||||
| flexibility to support customized telemetry functions. </t> | ||||
| <t>Technique Taxonomy: concerning about how one instruments the telemetry, there | <t>Note that the interaction with the applications that consume network | |||
| can be multiple possible dimensions to classify the forwarding plane telemetry | telemetry data can be indirect. Some in-device data transfer is possible. For ex | |||
| techniques.</t> | ample, in the management plane telemetry, the management plane will need to acqu | |||
| <t> | ire data from the data plane. Some operational states can only be derived from d | |||
| <list style="symbols"> | ata plane data sources such as the interface status and statistics. As another e | |||
| <t> Active, Passive, and Hybrid: This dimension concerns about the end-to-end me | xample, obtaining control plane telemetry data may require the ability to access | |||
| asurement. Active and passive methods (as well as the hybrid types) are well doc | the Forwarding Information Base (FIB) of the data plane.</t> | |||
| umented in <xref target="RFC7799"/>. Passive methods include TCPDUMP, <xref targ | <t>On the other hand, an application may involve more than one plane and | |||
| et="RFC7011">IPFIX</xref>, sFlow, and traffic mirroring. These methods usually h | interact with multiple planes simultaneously. For example, an SLA compliance ap | |||
| ave low data coverage. The bandwidth cost is very high in order to improve the d | plication may require both the data plane telemetry and the control plane teleme | |||
| ata coverage. On the other hand, active methods include Ping, <xref target="RFC4 | try.</t> | |||
| 656">OWAMP</xref>, <xref target="RFC5357">TWAMP</xref>, <xref target="RFC8762">S | <t>The requirements and challenges for each module are summarized as fol | |||
| TAMP</xref>, and <xref target="RFC6812">Cisco's SLA Protocol</xref>. These metho | lows (note that the requirements may pertain across all telemetry modules; howev | |||
| ds are intrusive and only provide indirect network measurements. Hybrid methods, | er, we emphasize those that are most pronounced for a particular plane).</t> | |||
| including <xref target="I-D.ietf-ippm-ioam-data">in-situ OAM</xref>, <xref targ | <section numbered="true" toc="default"> | |||
| et="RFC8321">Alternate-Marking (AM)</xref>, and <xref target="RFC8889">Multipoin | <name>Management Plane Telemetry</name> | |||
| t Alternate Marking</xref>, provide a well-balanced and more flexible approach. | <t>The management plane of network elements interacts with the Network | |||
| However, these methods are also more complex to implement.</t> | Management System (NMS) and provides information such as performance data, netw | |||
| <t> In-Band and Out-of-Band: Telemetry data carried in user packets before being | ork logging data, network warning and defects data, and network statistics and s | |||
| exported to a data collector is considered in-band (e.g., <xref target="I-D.iet | tate data. The management plane includes many protocols, including the classical | |||
| f-ippm-ioam-data">in-situ OAM</xref>). Telemetry data that is directly exported | SNMP and syslog. Regardless the protocol, management plane telemetry must addre | |||
| to a data collector without modifying user packets is considered out-of-band (e. | ss the following requirements:</t> | |||
| g., the postcard-based approach described in <xref target="pbt" />). It is also | <ul spacing="normal"> | |||
| possible to have hybrid methods, where only the telemetry instruction or partial | <li>Convenient Data Subscription: An application should have the fre | |||
| data is carried by user packets (e.g., <xref target="RFC8321">AM</xref>). </t> | edom to choose which data is exported (see <xref target="sec_type" format="defau | |||
| <t> End-to-End and In-Network: End-to-End methods start from, and end at, the ne | lt"/>) and the means and frequency of how that data is exported (e.g., on-change | |||
| twork end hosts (e.g., Ping). In-Network methods work in networks and are transp | or periodic subscription).</li> | |||
| arent to end hosts. However, if needed, In-Network methods can be easily extende | <li>Structured Data: For automatic network operation, machines will | |||
| d into end hosts. </t> | replace humans for network data comprehension. Data modeling languages, such as | |||
| <t> Data Subject: Depending on the telemetry objective, the methods can be flow- | YANG, can efficiently describe structured data and normalize data encoding and t | |||
| based (e.g., <xref target="I-D.ietf-ippm-ioam-data">in-situ OAM</xref>), path-ba | ransformation.</li> | |||
| sed (e.g., Traceroute), and node-based (e.g., <xref target="RFC7011">IPFIX</xref | <li>High-Speed Data Transport: In order to keep up with the velocity | |||
| >). The various data objects can be packet, flow record, measurement, states, an | of information, a data source needs to be able to send large amounts of data at | |||
| d signal.</t> | high frequency. Compact encoding formats or data compression schemes are needed | |||
| </list> | to reduce the quantity of data and improve the data transport efficiency. The s | |||
| </t> | ubscription mode, by replacing the query mode, reduces the interactions between | |||
| </section> | clients and servers and helps to improve the data source's efficiency.</li> | |||
| <section title="External Data Telemetry"> | ||||
| <t>Events that occur outside the boundaries of the network system are another im | <li>Network Congestion Avoidance: The application must protect the | |||
| portant source of network telemetry. Correlating both internal telemetry data an | network from congestion with congestion control mechanisms or, | |||
| d external events with the requirements of network systems, as presented in <xre | at minimum, with circuit breakers. <xref target="RFC8084" format="default"/> | |||
| f target="I-D.pedro-nmrg-anticipated-adaptation"/>, provides a strategic and fun | and <xref target="RFC8085" format="default"/> provide some solutions in this spa | |||
| ctional advantage to management operations. </t> | ce.</li> | |||
| <t>As with other sources of telemetry information, the data and events must meet | </ul> | |||
| strict requirements, especially in terms of timeliness, which is essential to p | </section> | |||
| roperly incorporate external event information into network management applicati | <section numbered="true" toc="default"> | |||
| ons. The specific challenges are described as follows:</t> | <name>Control Plane Telemetry</name> | |||
| <t> | <t>The control plane telemetry refers to the health condition monitori | |||
| <list style="symbols"> | ng of different network control protocols at all layers of the protocol stack. K | |||
| <t>The role of the external event detector can be played by multiple elements, i | eeping track of the operational status of these protocols is beneficial for dete | |||
| ncluding hardware (e.g., physical sensors, such as seismometers) and software (e | cting, localizing, and even predicting various network issues, as well as for ne | |||
| .g., Big Data sources that can analyze streams of information, such as Twitter m | twork optimization, in real time and with fine granularity. Some particular chal | |||
| essages). Thus, the transmitted data must support different shapes but, at the s | lenges and issues faced by the control plane telemetry are as follows: </t> | |||
| ame time, follow a common but extensible schema. </t> | ||||
| <t>Since the main function of the external event detectors is to perform the not | <ul spacing="normal"> | |||
| ifications, their timeliness is assumed. However, once messages have been dispat | <li>How to correlate the End-to-End (E2E) Key Performance Indicators | |||
| ched, they must be quickly collected and inserted into the control plane with va | (KPIs) to a specific layer's KPIs. For example, IPTV users may describe their U | |||
| riable priority, which is higher for important sources and events and lower for | E by the video smoothness and definition. Then in case of an unusually poor UE K | |||
| secondary ones. </t> | PI or a service disconnection, it is non-trivial to delimit and pinpoint the iss | |||
| <t>The schema used by external detectors must be easily adopted by current and f | ue in the responsible protocol layer (e.g., the transport layer or the network l | |||
| uture devices and applications. Therefore, it must be easily mapped to current d | ayer), the responsible protocol (e.g., IS-IS or BGP at the network layer), and f | |||
| ata models, such as in terms of YANG. </t> | inally the responsible device(s) with specific reasons. </li> | |||
| <t>As the communication with external entities outside the boundary of a provide | <li> Conventional OAM-based approaches for control plane KPI measure | |||
| r network may be realized over the Internet, the risk of congestion is even more | ment, which include Ping (L3), Traceroute (L3), <xref target="y1731" format="def | |||
| relevant in this context and proper counter-measures must be taken. Solutions s | ault">Y.1731</xref> (L2), and so on. One common issue behind these methods is th | |||
| uch as network transport circuit breakers are needed as well.</t> | at they only measure the KPIs instead of reflecting the actual running status of | |||
| </list> | these protocols, making them less effective or efficient for control plane trou | |||
| </t> | bleshooting and network optimization. </li> | |||
| <t>Organizing both internal and external telemetry information together will be | <li> How more research is needed for the BGP monitoring protocol (BM | |||
| key for the general exploitation of the management possibilities of current and | P). BMP is an example of the control plane telemetry; it is currently used for m | |||
| future network systems, as reflected in the incorporation of cognitive capabilit | onitoring BGP routes and enables rich applications, such as BGP peer analysis, A | |||
| ies to new hardware and software (virtual) elements. </t> | utonomous System (AS) analysis, prefix analysis, and security analysis. However, | |||
| </section> | the monitoring of other layers, protocols, and the cross-layer, cross-protocol | |||
| </section> | KPI correlations are still in their infancy (e.g., IGP monitoring is not as exte | |||
| <section title="Second Level Function Components"> | nsive as BMP), which requires further research. </li> | |||
| <t>The telemetry module at each plane can be further partitioned into five disti | </ul> | |||
| nct conceptual components:</t> | <t> Note that the requirement and solutions for network congest | |||
| <t> | ion avoidance are also applicable to the control plane telemetry. </t> | |||
| <list style="symbols"> | </section> | |||
| <t> Data Query, Analysis, and Storage: This component works at the network opera | <section numbered="true" toc="default"> | |||
| tion application block in <xref target="figure_1"/>. It is normally a part of th | <name>Forwarding Plane Telemetry</name> | |||
| e network management system at the receiver side. On the one hand, it is respons | <t>An effective forwarding plane telemetry system relies on the data t | |||
| ible for issuing data requirements. The data of interest can be modeled data thr | hat the network device can expose. The quality, quantity, and timeliness of data | |||
| ough configuration or custom data through programming. The data requirements can | must meet some stringent requirements. This raises some challenges for the netw | |||
| be queries for one-shot data or subscriptions for events or streaming data. On | ork data plane devices where the first-hand data originates.</t> | |||
| the other hand, it receives, stores, and processes the returned data from networ | <ul spacing="normal"> | |||
| k devices. Data analysis can be interactive to initiate further data queries. Th | <li>A data plane device's main function is user traffic processing a | |||
| is component can reside in either network devices or remote controllers. It can | nd forwarding. While supporting network visibility is important, the telemetry i | |||
| be centralized and distributed, and involve one or more instances.</t> | s just an auxiliary function, and it should strive to not impede normal traffic | |||
| <t> Data Configuration and Subscription: This component manages data queries on | processing and forwarding (i.e., the forwarding behavior should not be altered, | |||
| devices. It determines the protocol and channel for applications to acquire desi | and the trade-off between forwarding performance and telemetry should be well-ba | |||
| red data. This component is also responsible for configuring the desired data th | lanced).</li> | |||
| at might not be directly available from data sources. The subscription data can | <li>Network operation applications require end-to-end visibility acr | |||
| be described by models, templates, or programs. </t> | oss various sources, which can result in a huge volume of data. However, the she | |||
| <t> Data Encoding and Export: This component determines how telemetry data is de | er quantity of data must not exhaust the network bandwidth, regardless of the da | |||
| livered to the data analysis and storage component with access control. The data | ta delivery approach (i.e., whether through in-band or out-of-band channels).</l | |||
| encoding and the transport protocol may vary due to the data export location.</ | i> | |||
| t> | <li>The data plane devices must provide timely data with the minimum | |||
| <t> Data Generation and Processing: The requested data needs to be captured, fil | possible delay. Long processing, transport, storage, and analysis delay can imp | |||
| tered, processed, and formatted in network devices from raw data sources. This m | act the effectiveness of the control loop and even render the data useless.</li> | |||
| ay involve in-network computing and processing on either the fast path or the sl | <li>The data should be structured, labeled, and easy for application | |||
| ow path in network devices.</t> | s to parse and consume. At the same time, the data types needed by applications | |||
| <t> Data Object and Source: This component determines the monitoring objects and | can vary significantly. The data plane devices need to provide enough flexibilit | |||
| original data sources provisioned in the device. A data source usually just pro | y and programmability to support the precise data provision for applications.</l | |||
| vides raw data which needs further processing. Each data source can be considere | i> | |||
| d a probe. Some data sources can be dynamically installed, while others will be | <li>The data plane telemetry should support incremental deployment a | |||
| more static.</t> | nd work even though some devices are unaware of the system.</li> | |||
| </list> | <li>The requirement and solutions for network congestion avoidance a | |||
| </t> | re also applicable to the forwarding plane telemetry.</li> | |||
| <t> | </ul> | |||
| <figure anchor="figure_3" title="Components in the Network Telemetry Framework"> | <t>Although not specific to the forwarding plane, these challenges are | |||
| <artwork><![CDATA[ | more difficult for the forwarding plane because of the limited resources and fl | |||
| exibility. Data plane programmability is essential to support network telemetry. | ||||
| Newer data plane forwarding chips are equipped with advanced telemetry features | ||||
| and provide flexibility to support customized telemetry functions. </t> | ||||
| <t>Technique Taxonomy: This pertains to how one instruments the teleme | ||||
| try; there can be multiple possible dimensions to classify the forwarding plane | ||||
| telemetry techniques.</t> | ||||
| <ul spacing="normal"> | ||||
| <li> Active, Passive, and Hybrid: This dimension pertains to the end | ||||
| -to-end measurement. Active and passive methods (as well as the hybrid types) ar | ||||
| e well documented in <xref target="RFC7799" format="default"/>. Passive methods | ||||
| include TCPDUMP, <xref target="RFC7011" format="default">IPFIX</xref>, sFlow, an | ||||
| d traffic mirroring. These methods usually have low data coverage. The bandwidth | ||||
| cost is very high in order to improve the data coverage. On the other hand, act | ||||
| ive methods include Ping, the <xref target="RFC4656" format="default">One-Way Ac | ||||
| tive Measurement Protocol (OWAMP)</xref>, the <xref target="RFC5357" format="def | ||||
| ault">Two-Way Active Measurement Protocol (TWAMP)</xref>, the <xref target="RFC8 | ||||
| 762" format="default">Simple Two-way Active Measurement Protocol (STAMP)</xref>, | ||||
| and <xref target="RFC6812" format="default">Cisco's SLA Protocol</xref>. These | ||||
| methods are intrusive and only provide indirect network measurements. Hybrid met | ||||
| hods, including <xref target="RFC9197" format="default">IOAM</xref>, <xref targe | ||||
| t="RFC8321" format="default">Alternate Marking (AM)</xref>, and <xref target="RF | ||||
| C8889" format="default">Multipoint Alternate Marking</xref>, provide a well-bala | ||||
| nced and more flexible approach. However, these methods are also more complex to | ||||
| implement.</li> | ||||
| <li> In-Band and Out-of-Band: Telemetry data carried in user packets | ||||
| before being exported to a data collector is considered in-band (e.g., <xref ta | ||||
| rget="RFC9197" format="default">IOAM</xref>). Telemetry data that is directly ex | ||||
| ported to a data collector without modifying user packets is considered out-of-b | ||||
| and (e.g., the postcard-based approach described in <xref target="pbt" format="d | ||||
| efault"/>). It is also possible to have hybrid methods, where only the telemetry | ||||
| instruction or partial data is carried by user packets (e.g., <xref target="RFC | ||||
| 8321" format="default">AM</xref>). </li> | ||||
| <li> End-to-End and In-Network: End-to-end methods start from, and e | ||||
| nd at, the network end hosts (e.g., Ping). In-network methods work in networks a | ||||
| nd are transparent to end hosts. However, if needed, in-network methods can be e | ||||
| asily extended into end hosts. </li> | ||||
| <li> Data Subject: Depending on the telemetry objective, the methods | ||||
| can be flow based (e.g., <xref target="RFC9197" format="default">IOAM</xref>), | ||||
| path based (e.g., Traceroute), and node based (e.g., <xref target="RFC7011" form | ||||
| at="default">IPFIX</xref>). The various data objects can be packet, flow record, | ||||
| measurement, states, and signal.</li> | ||||
| </ul> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>External Data Telemetry</name> | ||||
| <t>Events that occur outside the boundaries of the network system are | ||||
| another important source of network telemetry. Correlating both internal telemet | ||||
| ry data and external events with the requirements of network systems, as present | ||||
| ed in <xref target="NMRG-ANTICIPATED-ADAPTATION" format="default"/>, provides a | ||||
| strategic and functional advantage to management operations. </t> | ||||
| <t>As with other sources of telemetry information, the data and events | ||||
| must meet strict requirements, especially in terms of timeliness, which is esse | ||||
| ntial to properly incorporate external event information into network management | ||||
| applications. The specific challenges are described as follows:</t> | ||||
| <ul spacing="normal"> | ||||
| <li>The role of the external event detector can be played by multipl | ||||
| e elements, including hardware (e.g., physical sensors, such as seismometers) an | ||||
| d software (e.g., big data sources that can analyze streams of information, such | ||||
| as Twitter messages). Thus, the transmitted data must support different shapes | ||||
| but, at the same time, follow a common but extensible schema. </li> | ||||
| <li>Since the main function of the external event detectors is to pe | ||||
| rform the notifications, their timeliness is assumed. However, once messages hav | ||||
| e been dispatched, they must be quickly collected and inserted into the control | ||||
| plane with variable priority, which is higher for important sources and events a | ||||
| nd lower for secondary ones. </li> | ||||
| <li>The schema used by external detectors must be easily adopted by | ||||
| current and future devices and applications. Therefore, it must be easily mapped | ||||
| to current data models, such as in terms of YANG. </li> | ||||
| <li>As the communication with external entities outside the boundary | ||||
| of a provider network may be realized over the Internet, the risk of congestion | ||||
| is even more relevant in this context and proper countermeasures must be taken. | ||||
| Solutions such as network transport circuit breakers are needed as well.</li> | ||||
| </ul> | ||||
| <t>Organizing both internal and external telemetry information togethe | ||||
| r will be key for the general exploitation of the management possibilities of cu | ||||
| rrent and future network systems, as reflected in the incorporation of cognitive | ||||
| capabilities to new hardware and software (virtual) elements. </t> | ||||
| </section> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Second-Level Function Components</name> | ||||
| <t>The telemetry module at each plane can be further partitioned into fi | ||||
| ve distinct conceptual components:</t> | ||||
| <ul spacing="normal"> | ||||
| <li> Data Query, Analysis, and Storage: This component works at the ne | ||||
| twork operation application block in <xref target="figure_1" format="default"/>. | ||||
| It is normally a part of the network management system at the receiver side. On | ||||
| one hand, it is responsible for issuing data requirements. The data of interest | ||||
| can be modeled data through configuration or custom data through programming. T | ||||
| he data requirements can be queries for one-shot data or subscriptions for event | ||||
| s or streaming data. On the other hand, it receives, stores, and processes the r | ||||
| eturned data from network devices. Data analysis can be interactive to initiate | ||||
| further data queries. This component can reside in either network devices or rem | ||||
| ote controllers. It can be centralized and distributed and involve one or more i | ||||
| nstances.</li> | ||||
| <li> Data Configuration and Subscription: This component manages data | ||||
| queries on devices. It determines the protocol and channel for applications to a | ||||
| cquire desired data. This component is also responsible for configuring the desi | ||||
| red data that might not be directly available from data sources. The subscriptio | ||||
| n data can be described by models, templates, or programs. </li> | ||||
| <li> Data Encoding and Export: This component determines how telemetry | ||||
| data is delivered to the data analysis and storage component with access contro | ||||
| l. The data encoding and the transport protocol may vary due to the data export | ||||
| location.</li> | ||||
| <li> Data Generation and Processing: The requested data needs to be ca | ||||
| ptured, filtered, processed, and formatted in network devices from raw data sour | ||||
| ces. This may involve in-network computing and processing on either the fast pat | ||||
| h or the slow path in network devices.</li> | ||||
| <li> Data Object and Source: This component determines the monitoring | ||||
| objects and original data sources provisioned in the device. A data source usual | ||||
| ly just provides raw data that needs further processing. Each data source can be | ||||
| considered a probe. Some data sources can be dynamically installed, while other | ||||
| s will be more static.</li> | ||||
| </ul> | ||||
| <figure anchor="figure_3"> | ||||
| <name>Components in the Network Telemetry Framework</name> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| +----------------------------------------+ | +----------------------------------------+ | |||
| +----------------------------------------+ | | +----------------------------------------+ | | |||
| | | | | | | | | |||
| | Data Query, Analysis, & Storage | | | | Data Query, Analysis, & Storage | | | |||
| | | + | | | + | |||
| +-------+++ -----------------------------+ | +-------+++ -----------------------------+ | |||
| ||| ^^^ | ||| ^^^ | |||
| ||| ||| | ||| ||| | |||
| ||V ||| | ||V ||| | |||
| +--+V--------------------+++------------+ | +--+V--------------------+++------------+ | |||
| +-----V---------------------+------------+ | | +-----V---------------------+------------+ | | |||
| +---------------------+-------+----------+ | | | +---------------------+-------+----------+ | | | |||
| | Data Configuration | | | | | | Data Configuration | | | | | |||
| | & Subscription | Data Encoding | | | | | & Subscription | Data Encoding | | | | |||
| | (model, template, | & Export | | | | | (model, template, | & Export | | | | |||
| | & program) | | | | | | & program) | | | | | |||
| +---------------------+------------------| | | | +---------------------+------------------| | | | |||
| | | | | | | | | | | |||
| | Data Generation | | | | | Data Generation | | | | |||
| | & Processing | | | | | & Processing | | | | |||
| | | | | | | | | | | |||
| +----------------------------------------| | | | +----------------------------------------| | | | |||
| | | | | | | | | | | |||
| | Data Object and Source | |-+ | | Data Object and Source | |-+ | |||
| | |-+ | | |-+ | |||
| +----------------------------------------+ | +----------------------------------------+ | |||
| ]]></artwork> | ||||
| ]]> | </figure> | |||
| </artwork> | </section> | |||
| </figure> | <section anchor="sec_type" numbered="true" toc="default"> | |||
| </t> | <name>Data Acquisition Mechanism and Type Abstraction</name> | |||
| </section> | <t>Broadly speaking, network data can be acquired through subscription ( | |||
| <section anchor="sec:type" title="Data Acquisition Mechanism and Type Abstractio | push) and query (poll). A subscription is a contract between publisher and subsc | |||
| n"> | riber. After initial setup, the subscribed data is automatically delivered to re | |||
| <t>Broadly speaking, network data can be acquired through subscription (push) an | gistered subscribers until the subscription expires. | |||
| d query (poll). A subscription is a contract between publisher and subscriber. A | There are two variations of subscription. The subscriptions can be predef | |||
| fter initial setup, the subscribed data is automatically delivered to registered | ined, or the subscribers are allowed to configure and tailor the published data | |||
| subscribers until the subscription expires. | to their specific needs.</t> | |||
| There are two variations of subscription. The subscriptions can be either pre-de | <t>In contrast, queries are used when a client expects immediate and one | |||
| fined, or the subscribers are allowed to configure and tailor the published data | -off feedback from network devices. The queried data may be directly extracted f | |||
| to their specific needs.</t> | rom some specific data source or synthesized and processed from raw data. Querie | |||
| <t>In contrast, queries are used when a client expects immediate and one-off fee | s work well for interactive network telemetry applications. </t> | |||
| dback from network devices. The queried data may be directly extracted from some | <t>In general, data can be pulled (i.e., queried) whenever needed, but i | |||
| specific data source, or synthesized and processed from raw data. Queries work | n many cases, pushing the data (i.e., subscription) is more efficient, and it ca | |||
| well for interactive network telemetry applications. </t> | n reduce the latency of a client detecting a change. From the data consumer poin | |||
| <t>In general, data can be pulled (i.e., queried) whenever needed, but in many c | t of view, there are four types of data from network devices that a telemetry da | |||
| ases, pushing the data (i.e., subscription) is more efficient, and can reduce th | ta consumer can subscribe or query:</t> | |||
| e latency of a client detecting a change. From the data consumer point of view, | <ul spacing="normal"> | |||
| there are four types of data from network devices that a telemetry data consumer | <li> Simple Data: Data that are steadily available from some datastore | |||
| can subscribe or query:</t> | or static probes in network devices.</li> | |||
| <t> | <li> Derived Data: Data that need to be synthesized or processed in th | |||
| <list style="symbols"> | e network from raw data from one or more network devices. The data processing fu | |||
| <t> Simple Data: The data that are steadily available from some datastore or sta | nction can be statically or dynamically loaded into network devices.</li> | |||
| tic probes in network devices.</t> | <li> Event-triggered Data: Data that are conditionally acquired based | |||
| <t> Derived Data: The data need to be synthesized or processed in network from r | on the occurrence of some events. An example of event-triggered data could be an | |||
| aw data from one or more network devices. The data processing function can be st | interface changing operational state between up and down. Such data can be acti | |||
| atically or dynamically loaded into network devices.</t> | vely pushed through subscription or passively polled through query. There are ma | |||
| <t> Event-triggered Data: The data are conditionally acquired based on the occur | ny ways to model events, including using Finite State Machine (FSM) or <xref tar | |||
| rence of some events. An example of event-triggered data could be an interface c | get="I-D.ietf-netmod-eca-policy" format="default">Event Condition Action (ECA)</ | |||
| hanging operational state between up and down. Such data can be actively pushed | xref>. </li> | |||
| through subscription or passively polled through query. There are many ways to m | <li> Streaming Data: Data that are continuously generated. It can be a | |||
| odel events, including using Finite State Machine (FSM) or <xref target="I-D.wwx | time series or the dump of databases. For example, an interface packet counter | |||
| -netmod-event-yang">Event Condition Action (ECA)</xref>. </t> | is exported every second. The streaming data reflect real-time network states an | |||
| <t> Streaming Data: The data are continuously generated. It can be time series o | d metrics and require large bandwidth and processing power. The streaming data a | |||
| r the dump of databases. For example, an interface packet counter is exported ev | re always actively pushed to the subscribers.</li> | |||
| ery second. The streaming data reflect realtime network states and metrics and r | </ul> | |||
| equire large bandwidth and processing power. The streaming data are always activ | <t>The above telemetry data types are not mutually exclusive. Rather, th | |||
| ely pushed to the subscribers.</t> | ey are often composite. Derived data is composed of simple data; event-triggered | |||
| </list> | data can be simple or derived; and streaming data can be based on some recurrin | |||
| </t> | g event. The relationships of these data types are illustrated in <xref target=" | |||
| <t>The above telemetry data types are not mutually exclusive. Rather, they are o | figure_0" format="default"/>. </t> | |||
| ften composite. Derived data is composed of simple data; Event-triggered data ca | <figure anchor="figure_0"> | |||
| n be simple or derived; streaming data can be based on some recurring event. The | <name>Data Type Relationship</name> | |||
| relationships of these data types are illustrated in <xref target="figure_0"/>. | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| </t> | ||||
| <t> | ||||
| <figure anchor="figure_0" title="Data Type Relationship"> | ||||
| <artwork><![CDATA[ | ||||
| +----------------------+ +-----------------+ | +----------------------+ +-----------------+ | |||
| | Event-triggered Data |<----+ Streaming Data | | | Event-Triggered Data |<----+ Streaming Data | | |||
| +-------+---+----------+ +-----+---+-------+ | +-------+---+----------+ +-----+---+-------+ | |||
| | | | | | | | | | | |||
| | | | | | | | | | | |||
| | | +--------------+ | | | | | +--------------+ | | | |||
| | +-->| Derived Data |<--+ | | | +-->| Derived Data |<--+ | | |||
| | +------+-------+ | | | +------+-------+ | | |||
| | | | | | | | | |||
| | V | | | V | | |||
| | +--------------+ | | | +--------------+ | | |||
| +------>| Simple Data |<------+ | +------>| Simple Data |<------+ | |||
| +--------------+ | +--------------+ | |||
| ]]> | ]]></artwork> | |||
| </artwork> | </figure> | |||
| </figure> | <t>Subscription usually deals with event-triggered data and streaming da | |||
| </t> | ta, and query usually deals with simple data and derived data. But the other way | |||
| <t>Subscription usually deals with event-triggered data and streaming data, and | s are also possible. Advanced network telemetry techniques are designed mainly f | |||
| query usually deals with simple data and derived data. But the other ways are al | or event-triggered or streaming data subscription and derived data query.</t> | |||
| so possible. Advanced network telemetry techniques are designed mainly for event | </section> | |||
| -triggered or streaming data subscription, and derived data query.</t> | <section numbered="true" toc="default"> | |||
| </section> | <name>Mapping Existing Mechanisms into the Framework</name> | |||
| <section title="Mapping Existing Mechanisms into the Framework"> | <t>The following table shows how the existing mechanisms (mainly publish | |||
| <t>The following table shows how the existing mechanisms (mainly published in IE | ed in IETF and with the emphasis on the latest new technologies) are positioned | |||
| TF and with the emphasis on the latest new technologies) are positioned in the f | in the framework. Given the vast body of existing work, we cannot provide an exh | |||
| ramework. Given the vast body of existing work, we cannot provide an exhaustive | austive list, so the mechanisms in the tables should be considered as just examp | |||
| list, so the mechanisms in the tables should be considered as just examples. Als | les. Also, some comprehensive protocols and techniques may cover multiple aspect | |||
| o, some comprehensive protocols and techniques may cover multiple aspects or mod | s or modules of the framework, so a name in a block only emphasizes one particul | |||
| ules of the framework, so a name in a block only emphasizes one particular chara | ar characteristic of it. More details about some listed mechanisms can be found | |||
| cteristic of it. More details about some listed mechanisms can be found in Appen | in Appendix A.</t> | |||
| dix A.</t> | ||||
| <t> | ||||
| <figure anchor="figure_5" title="Existing Work Mapping"> | ||||
| <artwork><![CDATA[ | ||||
| +-------------+-----------------+---------------+--------------+ | ||||
| | | Management | Control | Forwarding | | ||||
| | | Plane | Plane | Plane | | ||||
| +-------------+-----------------+---------------+--------------+ | ||||
| | data config.| gNMI, NETCONF, | gNMI, NETCONF,| NETCONF, | | ||||
| | & subscribe | RESTCONF, SNMP, | RESTCONF, | RESTCONF, | | ||||
| | | YANG-Push | YANG-Push | YANG-Push | | ||||
| +-------------+-----------------+---------------+--------------+ | ||||
| | data gen. & | MIB, | YANG | IOAM, PSAMP | | ||||
| | process | YANG | | PBT, AM, | | ||||
| +-------------+-----------------+---------------+--------------+ | ||||
| | data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | ||||
| | & export | | | | | ||||
| +-------------+-----------------+---------------+--------------+ | ||||
| ]]> | <table anchor="table_2"> | |||
| </artwork> | <name>Existing Work Mapping</name> | |||
| </figure> | <thead> | |||
| </t> | <tr> | |||
| <th></th> | ||||
| <th>Management Plane</th> | ||||
| <th>Control Plane</th> | ||||
| <th>Forwarding Plane</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td>data configuration and subscribe</td> | ||||
| <td>gNMI, NETCONF, RESTCONF, SNMP, YANG-Push</td> | ||||
| <td>gNMI, NETCONF, RESTCONF, YANG-Push</td> | ||||
| <td>NETCONF, RESTCONF, YANG-Push</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>data generation and process</td> | ||||
| <td>MIB, YANG</td> | ||||
| <td>YANG</td> | ||||
| <td>IOAM, PSAMP, PBT, AM</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>data encoding and export</td> | ||||
| <td>gRPC, HTTP, TCP</td> | ||||
| <td>BMP, TCP</td> | ||||
| <td>IPFIX, UDP</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t>Although the framework is generally suitable for any network environm | ||||
| ents, the multi-domain telemetry has some unique challenges that deserve further | ||||
| architectural consideration, which is out of the scope of this document.</t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="level" numbered="true" toc="default"> | ||||
| <name>Evolution of Network Telemetry Applications</name> | ||||
| <t>Network telemetry is an evolving technical area. As the network moves t | ||||
| owards the automated operation, network telemetry applications undergo several s | ||||
| tages of evolution, which add a new layer of requirements to the underlying netw | ||||
| ork telemetry techniques. Each stage is built upon the techniques adopted by the | ||||
| previous stages plus some new requirements.</t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Stage 0 - Static Telemetry:</dt> | ||||
| <dd> The telemetry data source and type are determined at design time. T | ||||
| he network operator can only configure how to use it with limited flexibility. < | ||||
| /dd> | ||||
| <dt>Stage 1 - Dynamic Telemetry:</dt> | ||||
| <dd> The custom telemetry data can be dynamically programmed or configur | ||||
| ed at runtime without interrupting the network operation, allowing a trade-off a | ||||
| mong resource, performance, flexibility, and coverage.</dd> | ||||
| <dt>Stage 2 - Interactive Telemetry:</dt> | ||||
| <dd> The network operator can continuously customize and fine tune the t | ||||
| elemetry data in real time to reflect the network operation's visibility require | ||||
| ments. Compared with Stage 1, the changes are frequent based on the real-time fe | ||||
| edback. At this stage, some tasks can be automated, but human operators still ne | ||||
| ed to sit in the middle to make decisions. </dd> | ||||
| <dt>Stage 3 - Closed-Loop Telemetry:</dt> | ||||
| <dd> The telemetry is free from the interference of human operators, exc | ||||
| ept for generating the reports. The intelligent network operation engine automat | ||||
| ically issues the telemetry data requests, analyzes the data, and updates the ne | ||||
| twork operations in closed control loops. </dd> | ||||
| </dl> | ||||
| <t>Existing technologies are ready for Stages 0 and 1. Individual applicat | ||||
| ions for Stages 2 and 3 are also possible now. However, the future autonomic net | ||||
| works may need a comprehensive operation management system that works at Stages | ||||
| 2 and 3 to cover all the network operation tasks. A well-defined network telemet | ||||
| ry framework is the first step towards this direction. </t> | ||||
| </section> | ||||
| <section anchor="Security" numbered="true" toc="default"> | ||||
| <name>Security Considerations</name> | ||||
| <t>The complexity of network telemetry raises significant security implica | ||||
| tions. For example, telemetry data can be manipulated to exhaust various network | ||||
| resources at each plane as well as the data consumer; falsified or tampered dat | ||||
| a can mislead the decision-making process and paralyze networks; and wrong confi | ||||
| guration and programming for telemetry is equally harmful. The telemetry data is | ||||
| highly sensitive, which exposes a lot of information about the network and its | ||||
| configuration. Some of that information can make designing attacks against the n | ||||
| etwork much easier (e.g., exact details of what software and patches have been i | ||||
| nstalled) and allows an attacker to determine whether a device may be subject to | ||||
| unprotected security vulnerabilities.</t> | ||||
| <t>Although the framework is generally suitable for any network environments, th | <t>Given that this document has proposed a framework for network telemetry | |||
| e multi-domain telemetry has some unique challenges which deserve further archit | and the telemetry mechanisms discussed are more extensive (in both message freq | |||
| ectural consideration, which is out of the scope of this document.</t> | uency and traffic amount) than the conventional network OAM concepts, we must al | |||
| so anticipate that new security considerations may also arise. A number of | ||||
| techniques already exist for securing the forwarding plane, control plane, and m | ||||
| anagement plane in a network, but it is important to consider if any new threat | ||||
| vectors are now being enabled via the use of network telemetry procedures and me | ||||
| chanisms. </t> | ||||
| <t>This document proposes a conceptual architecture for collecting, trans | ||||
| porting, and analyzing a wide variety of data sources in support of network appl | ||||
| ications. The protocols, data formats, and configurations chosen to implement th | ||||
| is framework will dictate the specific security considerations. These considerat | ||||
| ions may include:</t> | ||||
| <ul spacing="normal"> | ||||
| <li>Telemetry framework trust and policy models;</li> | ||||
| <li>Role management and access control for enabling and disabling teleme | ||||
| try capabilities;</li> | ||||
| <li>Protocol transport used for telemetry data and its inherent security | ||||
| capabilities;</li> | ||||
| <li>Telemetry data stores, storage encryption, methods of access, and re | ||||
| tention practices;</li> | ||||
| <li>Tracking telemetry events and any abnormalities that might identify | ||||
| malicious attacks using telemetry interfaces;</li> | ||||
| <li>Authentication and integrity protection of telemetry data to make da | ||||
| ta more trustworthy; and </li> | ||||
| <li>Segregating the telemetry data traffic from the data traffic carried | ||||
| over the network (e.g., historically management access and management data may | ||||
| be carried via an independent management network).</li> | ||||
| </ul> | ||||
| <t>Some security considerations highlighted above may be minimized or nega | ||||
| ted with policy management of network telemetry. In a network telemetry deployme | ||||
| nt, it would be advantageous to separate telemetry capabilities into different c | ||||
| lasses of policies, i.e., Role-Based Access Control and Event-Condition-Action p | ||||
| olicies. Also, potential conflicts between network telemetry mechanisms must be | ||||
| detected accurately and resolved quickly to avoid unnecessary network telemetry | ||||
| traffic propagation escalating into an unintended or intended denial-of-service | ||||
| attack.</t> | ||||
| <t>Further study of the security issues will be required, and it is expect | ||||
| ed that the security mechanisms and protocols are developed and deployed along w | ||||
| ith a network telemetry system.</t> | ||||
| </section> | ||||
| <section anchor="IANA" numbered="true" toc="default"> | ||||
| <name>IANA Considerations</name> | ||||
| <t>This document has no IANA actions.</t> | ||||
| </section> | ||||
| </section> | </middle> | |||
| </section> | <back> | |||
| <section anchor="level" title="Evolution of Network Telemetry Applications"> | ||||
| <t>Network telemetry is an evolving technical area. As the network moves towards | ||||
| the automated operation, network telemetry applications undergo several stages | ||||
| of evolution which add new layer of requirements to the underlying network telem | ||||
| etry techniques. Each stage is built upon the techniques adopted by the previous | ||||
| stages plus some new requirements.</t> | ||||
| <t> | ||||
| <list style="hanging"> | ||||
| <t hangText="Stage 0 - Static Telemetry:"> The telemetry data source and type ar | ||||
| e determined at design time. The network operator can only configure how to use | ||||
| it with limited flexibility. </t> | ||||
| <t hangText="Stage 1 - Dynamic Telemetry:"> The custom telemetry data can be dyn | ||||
| amically programmed or configured at runtime without interrupting the network op | ||||
| eration, allowing a trade-off among resource, performance, flexibility, and cove | ||||
| rage. </t> | ||||
| <t hangText="Stage 2 - Interactive Telemetry:"> The network operator can continu | ||||
| ously customize and fine tune the telemetry data in real time to reflect the net | ||||
| work operation's visibility requirements. Compared with Stage 1, the changes are | ||||
| frequent based on the real-time feedback. At this stage, some tasks can be auto | ||||
| mated, but human operators still need to sit in the middle to make decisions. </ | ||||
| t> | ||||
| <t hangText="Stage 3 - Closed-loop Telemetry:"> The telemetry is free from the i | ||||
| nterference of human operators, except for generating the reports. The intellige | ||||
| nt network operation engine automatically issues the telemetry data requests, an | ||||
| alyzes the data, and updates the network operations in closed control loops. </t | ||||
| > | ||||
| </list> | ||||
| </t> | ||||
| <t>Existing technologies are ready for stage 0 and stage 1. Individual stage 2 a | ||||
| nd stage 3 applications are also possible now. However, the future autonomic net | ||||
| works may need a comprehensive operation management system which works at stage | ||||
| 2 and stage 3 to cover all the network operation tasks. A well-defined network t | ||||
| elemetry framework is the first step towards this direction. </t> | ||||
| </section> | ||||
| <section anchor="Security" title="Security Considerations"> | ||||
| <t>The complexity of network telemetry raises significant security implications. | ||||
| For example, telemetry data can be manipulated to exhaust various network resou | ||||
| rces at each plane as well as the data consumer; falsified or tampered data can | ||||
| mislead the decision-making and paralyze networks; wrong configuration and progr | ||||
| amming for telemetry is equally harmful. The telemetry data is highly sensitive, | ||||
| which exposes a lot of information about the network and its configuration. Som | ||||
| e of that information can make designing attacks against the network much easier | ||||
| (e.g., exact details of what software and patches have been installed), and all | ||||
| ows an attacker to determine whether a device may be subject to unprotected secu | ||||
| rity vulnerabilities.</t> | ||||
| <t>Given that this document has proposed a framework for network telemetry and t | ||||
| he telemetry mechanisms discussed are more extensive (in both message frequency | ||||
| and traffic amount) than the conventional network OAM concepts, we must also ref | ||||
| lect that various new security considerations may also arise. A number of techni | ||||
| ques already exist for securing the forwarding plane, the control plane, and the | ||||
| management plane in a network, but it is important to consider if any new threa | ||||
| t vectors are now being enabled via the use of network telemetry procedures and | ||||
| mechanisms. </t> | ||||
| <t>This document proposes a conceptual architecture for collecting, transportin | ||||
| g, and analyzing a wide variety of data sources in support of network applicatio | ||||
| ns. The protocols, data formats, and configurations chosen to implement this fra | ||||
| mework will dictate the specific security considerations. These considerations m | ||||
| ay include:</t> | ||||
| <t> | ||||
| <list style="symbols"> | ||||
| <t>Telemetry framework trust and policy model;</t> | ||||
| <t>Role management and access control for enabling and disabling telemetry capab | ||||
| ilities;</t> | ||||
| <t>Protocol transport used for telemetry data and its inherent security capabili | ||||
| ties;</t> | ||||
| <t>Telemetry data stores, storage encryption, methods of access, and retention p | ||||
| ractices;</t> | ||||
| <t>Tracking telemetry events and any abnormalities that might identify malicious | ||||
| attacks using telemetry interfaces.</t> | ||||
| <t>Authentication and integrity protection of telemetry data to make data more t | ||||
| rustworthy. </t> | ||||
| <t>Segregating the telemetry data traffic from the data traffic carried over the | ||||
| network (e.g., historically management access and management data may be carrie | ||||
| d via an independent management network).</t> | ||||
| </list> | ||||
| </t> | ||||
| <t>Some security considerations highlighted above may be minimized or negated wi | ||||
| th policy management of network telemetry. In a network telemetry deployment it | ||||
| would be advantageous to separate telemetry capabilities into different classes | ||||
| of policies, i.e., Role Based Access Control and Event-Condition-Action policies | ||||
| . Also, potential conflicts between network telemetry mechanisms must be detecte | ||||
| d accurately and resolved quickly to avoid unnecessary network telemetry traffic | ||||
| propagation escalating into an unintended or intended denial of service attack. | ||||
| </t> | ||||
| <t>Further study of the security issues will be required, and it is expected tha | ||||
| t the security mechanisms and protocols are developed and deployed along with a | ||||
| network telemetry system.</t> | ||||
| </section> | <displayreference target="I-D.ietf-netconf-distributed-notif" to="NETCONF-DISTRI | |||
| <section anchor="IANA" title="IANA Considerations"> | B-NOTIF"/> | |||
| <t>This document includes no request to IANA.</t> | <displayreference target="I-D.ietf-netconf-udp-notif" to="NETCONF-UDP-NOTIF"/> | |||
| </section> | <displayreference target="I-D.song-ippm-postcard-based-telemetry" to="IPPM-POSTC | |||
| <section anchor="Contributors" title="Contributors"> | ARD-BASED-TELEMETRY"/> | |||
| <t> The other contributors of this document are Tianran Zhou, Zhenbin Li, Zhenqi | <displayreference target="I-D.song-opsawg-ifit-framework" to="OPSAWG-IFIT-FRAMEW | |||
| ang Li, Daniel King, Adrian Farrel, and Alexander Clemm </t> | ORK"/> | |||
| </section> | <displayreference target="I-D.irtf-nmrg-ibn-concepts-definitions" to="NMRG-IBN-C | |||
| <section anchor="Acknowledgments" title="Acknowledgments"> | ONCEPTS-DEFINITIONS"/> | |||
| <t>We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe Clarke, Vi | <displayreference target="I-D.ietf-netmod-eca-policy" to="NETMOD-ECA-POLICY"/> | |||
| ctor Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Ye | ||||
| gani, Young Lee, Qin Wu, Gyan Mishra, Ben Schwartz, Alexey Melnikov, Michael Sch | ||||
| arf, Dhruv Dhody, Martin Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars E | ||||
| ggert, Eric Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many oth | ||||
| ers who have provided helpful comments and suggestions to improve this document. | ||||
| </t> | ||||
| </section> | ||||
| </middle> | ||||
| <back> | ||||
| <!-- | ||||
| <references title="Normative References"> | ||||
| <?rfc include='reference.RFC.2119'?> | ||||
| <?rfc include='reference.RFC.8174'?> | ||||
| </references> | ||||
| --> | ||||
| <references title="Informative References"> | ||||
| <?rfc include='reference.RFC.3954'?> | ||||
| <?rfc include="reference.RFC.6020"?> | ||||
| <?rfc include="reference.RFC.7950"?> | ||||
| <?rfc include="reference.RFC.6241"?> | ||||
| <?rfc include='reference.RFC.7540'?> | ||||
| <?rfc include='reference.RFC.7854'?> | ||||
| <?rfc include='reference.RFC.8321'?> | ||||
| <?rfc include='reference.RFC.7011'?> | ||||
| <?rfc include='reference.RFC.4656'?> | ||||
| <?rfc include='reference.RFC.5357'?> | ||||
| <?rfc include='reference.RFC.5424'?> | ||||
| <?rfc include='reference.RFC.1157'?> | ||||
| <?rfc include='reference.RFC.3176'?> | ||||
| <?rfc include='reference.RFC.3411'?> | ||||
| <?rfc include='reference.RFC.3416'?> | ||||
| <?rfc include='reference.RFC.7276'?> | ||||
| <?rfc include='reference.RFC.7799'?> | ||||
| <?rfc include='reference.RFC.2981'?> | ||||
| <?rfc include='reference.RFC.3877'?> | ||||
| <?rfc include='reference.RFC.7575'?> | ||||
| <?rfc include='reference.RFC.8641'?> | ||||
| <?rfc include='reference.RFC.8639'?> | ||||
| <?rfc include='reference.RFC.6812'?> | ||||
| <?rfc include='reference.RFC.2578'?> | ||||
| <?rfc include='reference.RFC.8762'?> | ||||
| <?rfc include='reference.RFC.8040'?> | ||||
| <?rfc include='reference.RFC.7258'?> | ||||
| <?rfc include='reference.RFC.8259'?> | ||||
| <?rfc include='reference.RFC.8924'?> | ||||
| <?rfc include='reference.RFC.5085'?> | ||||
| <?rfc include='reference.RFC.8084'?> | ||||
| <?rfc include='reference.RFC.8085'?> | ||||
| <?rfc include='reference.RFC.8889'?> | ||||
| <?rfc include='reference.RFC.8671'?> | ||||
| <?rfc include='reference.I-D.ietf-grow-bmp-local-rib'?> | ||||
| <?rfc include='reference.I-D.ietf-netconf-distributed-notif'?> | ||||
| <?rfc include='reference.I-D.ietf-netconf-udp-notif'?> | ||||
| <?rfc include='reference.I-D.song-opsawg-dnp4iq'?> | ||||
| <?rfc include='reference.I-D.ietf-ippm-ioam-data'?> | ||||
| <?rfc include='reference.I-D.ietf-ippm-ioam-direct-export'?> | ||||
| <?rfc include='reference.I-D.pedro-nmrg-anticipated-adaptation'?> | ||||
| <?rfc include='reference.I-D.song-ippm-postcard-based-telemetry'?> | ||||
| <?rfc include='reference.I-D.song-opsawg-ifit-framework'?> | ||||
| <?rfc include='reference.I-D.irtf-nmrg-ibn-concepts-definitions'?> | ||||
| <?rfc include='reference.I-D.wwx-netmod-event-yang'?> | ||||
| <reference anchor="gpb" target="https://developers.google.com/protocol-buffers"> | <references> | |||
| <front> | <name>Informative References</name> | |||
| <title>Google Protocol Buffers</title> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| <author/> | .3954.xml"/> | |||
| <date/> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| </front> | .6020.xml"/> | |||
| </reference> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| <reference anchor="grpc" target="https://grpc.io"> | .7950.xml"/> | |||
| <front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| <title>gRPC, A high performance, open-source universal RPC framework</title> | .6020.xml"/> | |||
| <author/> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| <date/> | .7540.xml"/> | |||
| </front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| </reference> | .7854.xml"/> | |||
| <reference anchor="gnmi" target="https://github.com/openconfig/reference/tree/ma | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| ster/rpc/gnmi"> | .8321.xml"/> | |||
| <front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| <title>gNMI - gRPC Network Management Interface</title> | .7011.xml"/> | |||
| <author/> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| <date/> | .4656.xml"/> | |||
| </front> | <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | |||
| .5357.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .5424.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .1157.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .3176.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .3411.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .3416.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .7276.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .7799.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .2981.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .3877.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .7575.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8641.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8639.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .6812.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .2578.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8762.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8040.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .7258.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8259.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8924.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .5085.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8084.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8085.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8889.xml"/> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .8671.xml"/> | ||||
| <!-- [I-D.ietf-ippm-ioam-data] is now 9197--> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC | ||||
| .9197.xml"/> | ||||
| <!-- [I-D.ietf-grow-bmp-local-rib] Published as RFC 9069 --> | ||||
| <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9069. | ||||
| xml"/> | ||||
| <!-- [I-D.ietf-netconf-distributed-notif] IESG state I-D Exists --> | ||||
| <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-ne | ||||
| tconf-distributed-notif.xml"/> | ||||
| <!-- [I-D.ietf-netconf-udp-notif] IESG state I-D Exists --> | ||||
| <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-ne | ||||
| tconf-udp-notif.xml"/> | ||||
| <!-- [I-D.song-opsawg-dnp4iq] IESG state Expired. Note: included the long form a | ||||
| s the editor role was missing --> | ||||
| <reference anchor="OPSAWG-DNP4IQ"> | ||||
| <front> | ||||
| <title>Requirements for Interactive Query with Dynamic Network Probes</tit | ||||
| le> | ||||
| <author fullname="Haoyu Song" role="editor"> | ||||
| <organization>Huawei Technologies Co., Ltd</organization> | ||||
| </author> | ||||
| <author fullname="Jun Gong"> | ||||
| <organization>Huawei Technologies Co., Ltd</organization> | ||||
| </author> | ||||
| <date month="June" day="19" year="2017" /> | ||||
| </front> | ||||
| <seriesInfo name="Internet-Draft" value="draft-song-opsawg-dnp4iq-01" /> | ||||
| </reference> | </reference> | |||
| <reference anchor="xml" target="https://www.w3.org/TR/2008/REC-xml-20081126/"> | ||||
| <front> | <!-- [I-D.ietf-ippm-ioam-direct-export] IESG state AD Evaluation. Note: included | |||
| <title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title> | the long form as the editor role was missing --> | |||
| <author/> | <reference anchor="IPPM-IOAM-DIRECT-EXPORT"> | |||
| <date/> | <front> | |||
| </front> | <title>In-situ OAM Direct Exporting</title> | |||
| <author fullname="Haoyu Song"> | ||||
| <organization>Futurewei</organization> | ||||
| </author> | ||||
| <author fullname="Barak Gafni"> | ||||
| <organization>Nvidia</organization> | ||||
| </author> | ||||
| <author fullname="Tianran Zhou"> | ||||
| <organization>Huawei</organization> | ||||
| </author> | ||||
| <author fullname="Zhenbin Li"> | ||||
| <organization>Huawei</organization> | ||||
| </author> | ||||
| <author fullname="Frank Brockners"> | ||||
| <organization>Cisco</organization> | ||||
| </author> | ||||
| <author fullname="Shwetha Bhandari" role="editor"> | ||||
| <organization>Thoughtspot</organization> | ||||
| </author> | ||||
| <author fullname="Ramesh Sivakolundu"> | ||||
| <organization>Cisco</organization> | ||||
| </author> | ||||
| <author fullname="Tal Mizrahi" role="editor"> | ||||
| <organization>Huawei</organization> | ||||
| </author> | ||||
| <date month="October" day="13" year="2021" /> | ||||
| </front> | ||||
| <seriesInfo name="Internet-Draft" value="draft-ietf-ippm-ioam-direct-export-0 | ||||
| 7" /> | ||||
| </reference> | </reference> | |||
| <reference anchor="y1731" target="https://www.itu.int/rec/T-REC-Y.1731/en"> | ||||
| <front> | <!-- [I-D.pedro-nmrg-anticipated-adaptation] IESG state Expired. Note: in | |||
| <title>ITU-T Y.1731: OAM Functions and Mechanisms for Ethernet based networks, 2 | cluded the long form as the editor role was missing --> | |||
| 015</title> | <reference anchor="NMRG-ANTICIPATED-ADAPTATION"> | |||
| <author/> | <front> | |||
| <date/> | <title>Exploiting External Event Detectors to Anticipate Resource Requirem | |||
| </front> | ents for the Elastic Adaptation of SDN/NFV Systems</title> | |||
| <author fullname="Pedro Martinez-Julia" role="editor"> | ||||
| <organization>NICT</organization> | ||||
| </author> | ||||
| <date month="June" day="29" year="2018" /> | ||||
| </front> | ||||
| <seriesInfo name="Internet-Draft" value="draft-pedro-nmrg-anticipated-adaptat | ||||
| ion-02" /> | ||||
| </reference> | </reference> | |||
| </references> | <!-- [I-D.song-ippm-postcard-based-telemetry] IESG state I-D Exists --> | |||
| <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.song-ip | ||||
| pm-postcard-based-telemetry.xml"/> | ||||
| <section title="A Survey on Existing Network Telemetry Techniques"> | <!-- [I-D.song-opsawg-ifit-framework] IESG state I-D Exists --> | |||
| <t>In this non-normative appendix, we provide an overview of some existing techn | <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.song-op | |||
| iques and standard proposals for each network telemetry module.</t> | sawg-ifit-framework.xml"/> | |||
| <section title="Management Plane Telemetry"> | ||||
| <section title="Push Extensions for NETCONF"> | <!-- [I-D.irtf-nmrg-ibn-concepts-definitions] IESG state I-D Exists --> | |||
| <t><xref target="RFC6241">NETCONF</xref> is a popular network management protoco | <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.irtf-nm | |||
| l recommended by IETF. Its core strength is for managing configuration, but can | rg-ibn-concepts-definitions.xml"/> | |||
| also be used for data collection. <xref target="RFC8641">YANG-Push</xref> <xref | ||||
| target="RFC8639"/> extends NETCONF and enables subscriber applications to reques | <!-- [I-D.wwx-netmod-event-yang] FYI: I-D.wwx-netmod-event-yang (Expired) was re | |||
| t a continuous, customized stream of updates from a YANG datastore. Providing su | placed by I-D.ietf-netmod-eca-policy - IESG state Expired --> | |||
| ch visibility into changes made upon YANG configuration and operational objects | <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-ne | |||
| enables new capabilities based on the remote mirroring of configuration and oper | tmod-eca-policy.xml"/> | |||
| ational state. Moreover, <xref target="I-D.ietf-netconf-distributed-notif">distr | ||||
| ibuted data collection mechanism</xref> via <xref target="I-D.ietf-netconf-udp-n | <reference anchor="gpb" target="https://developers.google.com/protocol-buf | |||
| otif">UDP based publication channel</xref> provides enhanced efficiency for the | fers"> | |||
| NETCONF based telemetry.</t> | <front> | |||
| </section> | <title>Protocol Buffers</title> | |||
| <section title="gRPC Network Management Interface"> | <author><organization>Google Developers</organization></author> | |||
| <t><xref target="gnmi">gRPC Network Management Interface (gNMI)</xref> is a netw | <date/> | |||
| ork management protocol based on the <xref target="grpc">gRPC</xref> RPC (Remote | </front> | |||
| Procedure Call) framework. With a single gRPC service definition, both configur | </reference> | |||
| ation and telemetry can be covered. gRPC is an <xref target="RFC7540">HTTP/2</xr | ||||
| ef>-based open-source micro-service communication framework. It provides a numbe | <reference anchor="grpc" target="https://grpc.io"> | |||
| r of capabilities which are well-suited for network telemetry, including: </t> | <front> | |||
| <t> | <title>gRPC: A high performance, open source universal RPC framework</ | |||
| <list style="symbols"> | title> | |||
| <t>Full-duplex streaming transport model combined with a binary encoding mechani | <author><organization>gRPC</organization></author> | |||
| sm provides good telemetry efficiency.</t> | <date/> | |||
| <t>gRPC provides higher-level features consistency across platforms that common | </front> | |||
| HTTP/2 libraries typically do not. This characteristic is especially valuable fo | </reference> | |||
| r the fact that telemetry data collectors normally reside on a large variety of | ||||
| platforms.</t> | <reference anchor="gnmi" target="https://datatracker.ietf.org/meeting/98/ma | |||
| <t>The built-in load-balancing and failover mechanism.</t> | terials/slides-98-rtgwg-gnmi-intro-draft-openconfig-rtgwg-gnmi-spec-00"> | |||
| </list> | <front> | |||
| </t> | <title>gRPC Network Management Interface</title> | |||
| </section> | <author initials="R." surname="Shakir" fullname="Rob Shakir"> | |||
| </section> | <organization/> | |||
| <section title="Control Plane Telemetry"> | </author> | |||
| <section title="BGP Monitoring Protocol"> | <author initials="A." surname="Shaikh" fullname="Anees Shaikh"> | |||
| <t><xref target="RFC7854">BGP Monitoring Protocol (BMP)</xref> is used to monito | <organization/> | |||
| r BGP sessions and is intended to provide a convenient interface for obtaining r | </author> | |||
| oute views. </t> | <author initials="P." surname="Borman" fullname="Paul Borman"> | |||
| <t>The BGP routing information is collected from the monitored device(s) to the | <organization/> | |||
| BMP monitoring station by setting up the BMP TCP session. The BGP peers are moni | </author> | |||
| tored by the BMP Peer Up and Peer Down Notifications. The BGP routes (including | <author initials="M." surname="Hines" fullname="Marcus Hines"> | |||
| <xref target="RFC7854"> Adjacency_RIB_In </xref>, <xref target="RFC8671"> Adjace | <organization/> | |||
| ncy_RIB_out</xref>, and <xref target="I-D.ietf-grow-bmp-local-rib">Local_Rib</xr | </author> | |||
| ef>) are encapsulated in the BMP Route Monitoring Message and the BMP Route Mirr | <author initials="C." surname="Lebsack" fullname="Carl Lebsack"> | |||
| oring Message, providing both an initial table dump and real-time route updates. | <organization/> | |||
| In addition, BGP statistics are reported through the BMP Stats Report Message, | </author> | |||
| which could be either timer triggered or event-driven. Future BMP extensions cou | <author initials="C." surname="Morrow" fullname="Chris Morrow"> | |||
| ld further enrich BGP monitoring applications. | <organization/> | |||
| </t> | </author> | |||
| </section> | <date month="March" year="2017"/> | |||
| </section> | </front> | |||
| <section title="Data Plane Telemetry"> | <refcontent>IETF 98</refcontent> | |||
| <section title="The Alternate Marking (AM) technology"> | </reference> | |||
| <t>The Alternate Marking method enables efficient measurements of packet loss, d | ||||
| elay, and jitter both in IP and Overlay Networks, as presented in <xref target=" | <reference anchor="W3C.REC-xml-20081126" target="https://www.w3.org/TR/2008/RE | |||
| RFC8321"/> and <xref target="RFC8889"/>. </t> | C-xml-20081126"> | |||
| <t>This technique can be applied to point-to-point and multipoint-to-multipoint | <front> | |||
| flows. Alternate Marking creates batches of packets by alternating the value of | <title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title> | |||
| 1 bit (or a label) of the packet header. These batches of packets are unambiguou | <author initials="T." surname="Bray" fullname="Tim Bray"> | |||
| sly recognized over the network and the comparison of packet counters for each b | <organization showOnFrontPage="true"/> | |||
| atch allows the packet loss calculation. The same idea can be applied to delay m | </author> | |||
| easurement by selecting ad hoc packets with a marking bit dedicated for delay me | <author initials="J." surname="Paoli" fullname="Jean Paoli"> | |||
| asurements.</t> | <organization showOnFrontPage="true"/> | |||
| <t>Alternate Marking method needs two counters each marking period for each flow | </author> | |||
| under monitor. For instance, by considering n measurement points and m monitore | <author initials="M." surname="Sperberg-McQueen" fullname="Michael S | |||
| d flows, the order of magnitude of the packet counters for each time interval is | perberg-McQueen"> | |||
| n*m*2 (1 per color).</t> | <organization showOnFrontPage="true"/> | |||
| <t>Since networks offer rich sets of network performance measurement data (e.g., | </author> | |||
| packet counters), conventional approaches run into limitations. The bottleneck | <author initials="E." surname="Maler" fullname="Eve Maler"> | |||
| is the generation and export of the data and the amount of data that can be reas | <organization showOnFrontPage="true"/> | |||
| onably collected from the network. In addition, management tasks related to dete | </author> | |||
| rmining and configuring which data to generate lead to significant deployment ch | <author initials="F." surname="Yergeau" fullname="Francois Yergeau"> | |||
| allenges.</t> | <organization showOnFrontPage="true"/> | |||
| <t>The Multipoint Alternate Marking approach, described in <xref target="RFC8889 | </author> | |||
| "/>, aims to resolve this issue and make the performance monitoring more flexibl | <date month="November" year="2008"/> | |||
| e in case a detailed analysis is not needed. </t> | </front> | |||
| <t>An application orchestrates network performance measurements tasks across the | <refcontent>World Wide Web Consortium Recommendation REC-xml-20081126</ | |||
| network to allow for optimized monitoring. The application can choose how roug | refcontent> | |||
| hly or precisely to configure measurement points depending on the application's | </reference> | |||
| requirements.</t> | ||||
| <t>Using Alternate Marking, it is possible to monitor a Multipoint Network witho | <reference anchor="y1731" target="https://www.itu.int/rec/T-REC-Y.1731/en" | |||
| ut in depth examination by using the Network Clustering (subnetworks that are po | > | |||
| rtions of the entire network that preserve the same property of the entire netwo | <front> | |||
| rk, called clusters). So in the case that there is packet loss or the delay is | <title>Operations, administration and maintenance (OAM) functions and | |||
| too high then the specific filtering criteria could be applied to gather a more | mechanisms for Ethernet-based networks</title> | |||
| detailed analysis by using a different combination of clusters up to a per-flow | <author><organization>ITU-T</organization></author> | |||
| measurement as described in <xref target="RFC8321">Alternate-Marking (AM)</xref> | <date month="August" year="2015"/> | |||
| . </t> | </front> | |||
| <t>In summary, an application can configure end-to-end network monitoring. If th | <seriesInfo name="ITU-T Recommendation" value="G.8013/Y.1731"/> | |||
| e network does not experience issues, this approximate monitoring is good enough | </reference> | |||
| and is very cheap in terms of network resources. However, in case of problems, | </references> | |||
| the application becomes aware of the issues from this approximate monitoring and | ||||
| , in order to localize the portion of the network that has issues, configures th | <section numbered="true" toc="default"> | |||
| e measurement points more extensively, allowing more detailed monitoring to be p | <name>A Survey on Existing Network Telemetry Techniques</name> | |||
| erformed. After the detection and resolution of the problem, the initial approxi | <t>In this non-normative appendix, we provide an overview of some existing | |||
| mate monitoring can be used again.</t> | techniques and standard proposals for each network telemetry module.</t> | |||
| </section> | <section numbered="true" toc="default"> | |||
| <section title="Dynamic Network Probe"> | <name>Management Plane Telemetry</name> | |||
| <t>Hardware-based <xref target="I-D.song-opsawg-dnp4iq">Dynamic Network Probe (D | <section numbered="true" toc="default"> | |||
| NP)</xref> proposes a programmable means to customize the data that an applicati | <name>Push Extensions for NETCONF</name> | |||
| on collects from the data plane. A direct benefit of DNP is the reduction of the | <t><xref target="RFC6241" format="default">NETCONF</xref> is a popular | |||
| exported data. A full DNP solution covers several components including data sou | network management protocol recommended by IETF. Its core strength is for manag | |||
| rce, data subscription, and data generation. The data subscription needs to defi | ing configuration, but it can also be used for data collection. <xref target="RF | |||
| ne the derived data which can be composed and derived from the raw data sources. | C8639" format="default">YANG-Push</xref> <xref target="RFC8641" format="default" | |||
| The data generation takes advantage of the moderate in-network computing to pro | /> extends NETCONF and enables subscriber applications to request a continuous, | |||
| duce the desired data.</t> | customized stream of updates from a YANG datastore. Providing such visibility in | |||
| <t>While DNP can introduce unforeseeable flexibility to the data plane telemetry | to changes made upon YANG configuration and operational objects enables new capa | |||
| , it also faces some challenges. It requires a flexible data plane that can be d | bilities based on the remote mirroring of configuration and operational state. M | |||
| ynamically reprogrammed at run-time. The programming API is yet to be defined.</ | oreover, a <xref target="I-D.ietf-netconf-distributed-notif" format="default">di | |||
| t> | stributed data collection mechanism</xref> via a <xref target="I-D.ietf-netconf- | |||
| </section> | udp-notif" format="default">UDP-based publication channel</xref> provides enhanc | |||
| <section title="IP Flow Information Export (IPFIX) Protocol"> | ed efficiency for the NETCONF-based telemetry.</t> | |||
| <t>Traffic on a network can be seen as a set of flows passing through network el | </section> | |||
| ements. | <section numbered="true" toc="default"> | |||
| <xref target="RFC7011">IP Flow Information Export (IPFIX) </xref> | <name>gRPC Network Management Interface</name> | |||
| provides a means of transmitting traffic flow information for administrative or | <t><xref target="gnmi" format="default">gRPC Network Management Interf | |||
| other purposes. A typical IPFIX enabled system includes a pool of Metering Proce | ace (gNMI)</xref> is a network management protocol based on the <xref target="gr | |||
| sses that collects data packets at one or more Observation Points, optionally fi | pc" format="default">gRPC</xref> Remote Procedure Call (RPC) framework. With a s | |||
| lters them and aggregates information about these packets. An Exporter then gath | ingle gRPC service definition, both configuration and telemetry can be covered. | |||
| ers each of the Observation Points together into an Observation Domain and sends | gRPC is an open-source micro-service communication framework based on <xref targ | |||
| this information via the IPFIX protocol to a Collector.</t> | et="RFC7540" format="default">HTTP/2</xref>. It provides a number of capabilitie | |||
| </section> | s that are well-suited for network telemetry, including: </t> | |||
| <section title="In-Situ OAM"> | <ul spacing="normal"> | |||
| <t>Classical passive and active monitoring and measurement techniques are either | <li>A full-duplex streaming transport model; when combined with a bi | |||
| inaccurate or resource-consuming. It is preferable to directly acquire data ass | nary encoding mechanism, it provides good telemetry efficiency.</li> | |||
| ociated with a flow's packets when the packets pass through a network. <xref tar | <li>A higher-level feature consistency across platforms that common | |||
| get="I-D.ietf-ippm-ioam-data">In-situ OAM (iOAM)</xref>, a data generation techn | HTTP/2 libraries typically do not provide. This characteristic is especially val | |||
| ique, embeds a new instruction header to user packets and the instruction direct | uable for the fact that telemetry data collectors normally reside on a large var | |||
| s the network nodes to add the requested data to the packets. Thus, at the path | iety of platforms.</li> | |||
| end, the packet's experience gained on the entire forwarding path can be collect | <li>A built-in load-balancing and failover mechanism.</li> | |||
| ed. Such firsthand data is invaluable to many network OAM applications.</t> | </ul> | |||
| <t>However, iOAM also faces some challenges. The issues on performance impact, s | </section> | |||
| ecurity, scalability and overhead limits, encapsulation difficulties in some pro | </section> | |||
| tocols, and cross-domain deployment need to be addressed.</t> | <section numbered="true" toc="default"> | |||
| </section> | <name>Control Plane Telemetry</name> | |||
| <section anchor="pbt" title="Postcard Based Telemetry"> | <section numbered="true" toc="default"> | |||
| <t>The postcard-based telemetry, as embodied in <xref target="I-D.ietf-ippm-ioam | <name>BGP Monitoring Protocol</name> | |||
| -direct-export">IOAM DEX</xref> and <xref target="I-D.song-ippm-postcard-based-t | <t><xref target="RFC7854" format="default">BMP</xref> is used to monit | |||
| elemetry">IOAM Marking</xref>, is a complementary technique to the passport-base | or BGP sessions and is intended to provide a convenient interface for obtaining | |||
| d IOAM. PBT directly exports data at each node through an independent packet. At | route views. </t> | |||
| the cost of higher bandwidth overhead and the need for data correlation, PBT sh | <t>BGP routing information is collected from the monitored device(s) t | |||
| ows several unique advantages. It can also help to identify packet drop location | o the BMP monitoring station by setting up the BMP TCP session. The BGP peers ar | |||
| in case a packet is dropped on its forwarding path.</t> | e monitored by the BMP Peer Up and Peer Down notifications. The BGP routes (incl | |||
| </section> | uding <xref target="RFC7854" format="default"> Adj_RIB_In </xref>, <xref target= | |||
| <section title="Existing OAM for Specific Data Planes"> | "RFC8671" format="default"> Adj_RIB_out</xref>, and <xref target="RFC9069" forma | |||
| <t> | t="default">local RIB</xref>) are encapsulated in the BMP Route Monitoring Messa | |||
| Various data planes raise unique OAM requirements. IETF has published OAM techni | ge and the BMP Route Mirroring Message, providing both an initial table dump and | |||
| que and framework documents (e.g., <xref target="RFC8924" /> and <xref target="R | real-time route updates. In addition, BGP statistics are reported through the B | |||
| FC5085" />) targeting different data planes such as Multi-Protocol Label Switchi | MP Stats Report Message, which could be either timer triggered or event-driven. | |||
| ng (MPLS), L2 Virtual Private Network (L2-VPN), Network Virtualization Overlays | Future BMP extensions could further enrich BGP monitoring applications. | |||
| (NVO3), Virtual Extensible LAN (VXLAN), Bit Indexed Explicit Replication (BIER), | ||||
| Service Function Chaining (SFC), Segment Routing (SR), and Deterministic Networ | ||||
| king (DETNET). The aforementioned data plane telemetry techniques can be used to | ||||
| enhance the OAM capability on such data planes. | ||||
| </t> | </t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section title="External Data and Event Telemetry"> | <section numbered="true" toc="default"> | |||
| <section title="Sources of External Events"> | <name>Data Plane Telemetry</name> | |||
| <t>To ensure that the information provided by external event detectors and used | <section numbered="true" toc="default"> | |||
| by the network management solutions is meaningful for management purposes, the n | <name>Alternate-Marking (AM) Technology</name> | |||
| etwork telemetry framework must ensure that such detectors (sources) are easily | <t>The Alternate-Marking method enables efficient measurements of pack | |||
| connected to the management solutions (sinks). This requires the specification o | et loss, delay, and jitter both in IP and Overlay Networks, as presented in <xre | |||
| f a list of potential external data sources that could be of interest in network | f target="RFC8321" format="default"/> and <xref target="RFC8889" format="default | |||
| management and match it to the connectors and/or interfaces required to connect | "/>. </t> | |||
| them.</t> | <t>This technique can be applied to point-to-point and multipoint-to-m | |||
| <t>Categories of external event sources that may be of interest to network manag | ultipoint flows. Alternate Marking creates batches of packets by alternating the | |||
| ement include::</t> | value of 1 bit (or a label) of the packet header. These batches of packets are | |||
| <t> | unambiguously recognized over the network, and the comparison of packet counters | |||
| <list style="symbols"> | for each batch allows the packet loss calculation. The same idea can be applied | |||
| <t>Smart objects and sensors. With the consolidation of the Internet of Things~( | to delay measurement by selecting ad hoc packets with a marking bit dedicated f | |||
| IoT) any network system will have many smart objects attached to its physical su | or delay measurements.</t> | |||
| rroundings and logical operation environments. Most of these objects will be ess | <t>The Alternate-Marking method needs two counters each marking period | |||
| entially based on sensors of many kinds (e.g., temperature, humidity, presence) | for each flow under monitor. For instance, by considering n measurement points | |||
| and the information they provide can be very useful for the management of the ne | and m monitored flows, the order of magnitude of the packet counters for each ti | |||
| twork, even when they are not specifically deployed for such purpose. Elements o | me interval is n*m*2 (1 per color).</t> | |||
| f this source type will usually provide a specific protocol for interaction, esp | <t>Since networks offer rich sets of network performance measurement d | |||
| ecially one of those protocols related to IoT, such as the Constrained Applicati | ata (e.g., packet counters), conventional approaches run into limitations. The b | |||
| on Protocol (CoAP).</t> | ottleneck is the generation and export of the data and the amount of data that c | |||
| <t>Online news reporters. Several online news services have the ability to provi | an be reasonably collected from the network. In addition, management tasks relat | |||
| de enormous quantity of information about different events occurring in the worl | ed to determining and configuring which data to generate lead to significant dep | |||
| d. Some of those events can impact on the network system managed by a specific f | loyment challenges.</t> | |||
| ramework and, therefore, such information may be of interest to the management s | <t>The Multipoint Alternate-Marking approach, described in <xref targe | |||
| olution. For instance, diverse security reports, such as the Common Vulnerabilit | t="RFC8889" format="default"/>, aims to resolve this issue and make the performa | |||
| ies and Exposures (CVE), can be issued by the corresponding authority and used b | nce monitoring more flexible in case a detailed analysis is not needed. </t> | |||
| y the management solution to update the managed system if needed. Instead of a s | <t>An application orchestrates network performance measurement tasks a | |||
| pecific protocol and data format, the sources of this kind of information usuall | cross the network to allow for optimized monitoring. The application can choose | |||
| y follow a relaxed but structured format. This format will be part of both the o | how roughly or precisely to configure measurement points depending on the appli | |||
| ntology and information model of the telemetry framework.</t> | cation's requirements.</t> | |||
| <t>Global event analyzers. The advance of Big Data analyzers provides a huge amo | <t>Using Alternate Marking, it is possible to monitor a Multipoint Net | |||
| unt of information and, more interestingly, the identification of events detecte | work without in-depth examination by using Network Clustering (subnetworks that | |||
| d by analyzing many data streams from different origins. In contrast with the ot | are portions of the entire network that preserve the same property of the entire | |||
| her types of sources, which are focused on specific events, the detectors of thi | network, called clusters). So in the case where there is packet loss or the de | |||
| s source type will detect generic events. For example, during a sport event some | lay is too high, the specific filtering criteria could be applied to gather a mo | |||
| unexpected movement makes it fascinating and many people connect to sites that | re detailed analysis by using a different combination of clusters up to a per-fl | |||
| are reporting on the event. The underlying networks supporting the services that | ow measurement as described in the Alternate-Marking document <xref target="RFC8 | |||
| cover the event can be affected by such situation, so their management solution | 321" format="default"/>. </t> | |||
| s should be aware of it. In contrast with the other source types, a new informat | <t>In summary, an application can configure end-to-end network monitor | |||
| ion model, format, and reporting protocol is required to integrate the detectors | ing. If the network does not experience issues, this approximate monitoring is g | |||
| of this type with the management solution.</t> | ood enough and is very cheap in terms of network resources. However, in case of | |||
| </list> | problems, the application becomes aware of the issues from this approximate moni | |||
| toring and, in order to localize the portion of the network that has issues, con | ||||
| figures the measurement points more extensively, allowing more detailed monitori | ||||
| ng to be performed. After the detection and resolution of the problem, the initi | ||||
| al approximate monitoring can be used again.</t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Dynamic Network Probe</name> | ||||
| <t>A hardware-based <xref target="OPSAWG-DNP4IQ" format="default">Dyna | ||||
| mic Network Probe (DNP)</xref> provides a programmable means to customize the da | ||||
| ta that an application collects from the data plane. A direct benefit of DNP is | ||||
| the reduction of the exported data. A full DNP solution covers several component | ||||
| s including data source, data subscription, and data generation. The data subscr | ||||
| iption needs to define the derived data that can be composed and derived from ra | ||||
| w data sources. The data generation takes advantage of the moderate in-network c | ||||
| omputing to produce the desired data.</t> | ||||
| <t>While DNP can introduce unforeseeable flexibility to the data plane | ||||
| telemetry, it also faces some challenges. It requires a flexible data plane tha | ||||
| t can be dynamically reprogrammed at runtime. The programming Application Progra | ||||
| mming Interface (API) is yet to be defined.</t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>IP Flow Information Export (IPFIX) Protocol</name> | ||||
| <t>Traffic on a network can be seen as a set of flows passing through | ||||
| network elements. | ||||
| <xref target="RFC7011" format="default">IPFIX </xref> | ||||
| provides a means of transmitting traffic flow information for administrative or | ||||
| other purposes. A typical IPFIX-enabled system includes a pool of Metering Proce | ||||
| sses that collects data packets at one or more Observation Points, optionally fi | ||||
| lters them, and aggregates information about these packets. An Exporter then gat | ||||
| hers each of the Observation Points together into an Observation Domain and send | ||||
| s this information via the IPFIX protocol to a Collector.</t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>In Situ OAM</name> | ||||
| <t>Classical passive and active monitoring and measurement techniques | ||||
| are either inaccurate or resource consuming. It is preferable to directly acquir | ||||
| e data associated with a flow's packets when the packets pass through a network. | ||||
| <xref target="RFC9197" format="default">IOAM</xref>, a data generation techniqu | ||||
| e, embeds a new instruction header to user packets, and the instruction directs | ||||
| the network nodes to add the requested data to the packets. Thus, at the path's | ||||
| end, the packet's experience gained on the entire forwarding path can be collect | ||||
| ed. Such firsthand data is invaluable to many network OAM applications.</t> | ||||
| <t>However, IOAM also faces some challenges. The issues on performance | ||||
| impact, security, scalability and overhead limits, encapsulation difficulties i | ||||
| n some protocols, and cross-domain deployment need to be addressed.</t> | ||||
| </section> | ||||
| <section anchor="pbt" numbered="true" toc="default"> | ||||
| <name>Postcard-Based Telemetry</name> | ||||
| <t>The postcard-based telemetry, as embodied in <xref target="IPPM-IOA | ||||
| M-DIRECT-EXPORT" format="default">IOAM Direct Export (DEX)</xref> and <xref targ | ||||
| et="I-D.song-ippm-postcard-based-telemetry" format="default">IOAM Marking</xref> | ||||
| , is a complementary technique to the passport-based IOAM <xref target="RFC9197" | ||||
| format="default"/>. PBT directly exports data at each node through an independe | ||||
| nt packet. At the cost of higher bandwidth overhead and the need for data correl | ||||
| ation, PBT shows several unique advantages. It can also help to identify packet | ||||
| drop location in case a packet is dropped on its forwarding path.</t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Existing OAM for Specific Data Planes</name> | ||||
| <t> | ||||
| Various data planes raise unique OAM requirements. IETF has published OAM techni | ||||
| que and framework documents (e.g., <xref target="RFC8924" format="default"/> and | ||||
| <xref target="RFC5085" format="default"/>) targeting different data planes such | ||||
| as Multiprotocol Label Switching (MPLS), L2 Virtual Private Network (VPN), Netw | ||||
| ork Virtualization over Layer 3 (NVO3), Virtual Extensible LAN (VXLAN), Bit Inde | ||||
| x Explicit Replication (BIER), Service Function Chaining (SFC), Segment Routing | ||||
| (SR), and Deterministic Networking (DETNET). The aforementioned data plane telem | ||||
| etry techniques can be used to enhance the OAM capability on such data planes. | ||||
| </t> | </t> | |||
| <t>Additional types of detector types can be added to the system, but they will | </section> | |||
| be generally the result of composing the properties offered by these main classe | </section> | |||
| s.</t> | <section numbered="true" toc="default"> | |||
| </section> | <name>External Data and Event Telemetry</name> | |||
| <section title="Connectors and Interfaces"> | <section numbered="true" toc="default"> | |||
| <t>For allowing external event detectors to be properly integrated with other ma | <name>Sources of External Events</name> | |||
| nagement solutions, both elements must expose interfaces and protocols that are | <t>To ensure that the information provided by external event detectors | |||
| subject to their particular objective. Since external event detectors will be fo | and used by the network management solutions is meaningful for management purpo | |||
| cused on providing their information to their main consumers, which generally wi | ses, the network telemetry framework must ensure that such detectors (sources) a | |||
| ll not be limited to the network management solutions, the framework must includ | re easily connected to the management solutions (sinks). This requires the speci | |||
| e the definition of the required connectors for ensuring the interconnection bet | fication of a list of potential external data sources that could be of interest | |||
| ween detectors (sources) and their consumers within the management systems (sink | in network management and matching it to the connectors and/or interfaces requir | |||
| s) is effective.</t> | ed to connect them.</t> | |||
| <t>In some situations, the interconnection between the external event detectors | <t>Categories of external event sources that may be of interest to net | |||
| and the management system is via the management plane. For those situations ther | work management include:</t> | |||
| e will be a special connector that provides the typical interfaces found in most | <ul spacing="normal"> | |||
| other elements connected to the management plane. For instance, the interfaces | <li>Smart objects and sensors. With the consolidation of the Interne | |||
| could accomplish this with a specific data model (YANG) and specific telemetry p | t of Things (IoT), any network system will have many smart objects attached to i | |||
| rotocol, such as NETCONF, YANG-Push, or gRPC.</t> | ts physical surroundings and logical operation environments. Most of these objec | |||
| </section> | ts will be essentially based on sensors of many kinds (e.g., temperature, humidi | |||
| </section> | ty, and presence), and the information they provide can be very useful for the m | |||
| </section> | anagement of the network, even when they are not specifically deployed for such a | |||
| </back> | purpose. Elements of this source type will usually provide a specific protocol f | |||
| or interaction, especially one of the protocols related to IoT, such as the Cons | ||||
| trained Application Protocol (CoAP).</li> | ||||
| <li>Online news reporters. Several online news services have the abi | ||||
| lity to provide an enormous quantity of information about different events occur | ||||
| ring in the world. Some of those events can have an impact on the network system | ||||
| managed by a specific framework; therefore, such information may be of interest | ||||
| to the management solution. For instance, diverse security reports, such as Com | ||||
| mon Vulnerabilities and Exposures (CVEs), can be issued by the corresponding aut | ||||
| hority and used by the management solution to update the managed system, if need | ||||
| ed. Instead of a specific protocol and data format, the sources of this kind of | ||||
| information usually follow a relaxed but structured format. This format will be | ||||
| part of both the ontology and information model of the telemetry framework.</li> | ||||
| <li>Global event analyzers. The advance of big data analyzers provid | ||||
| es a huge amount of information and, more interestingly, the identification of e | ||||
| vents detected by analyzing many data streams from different origins. In contras | ||||
| t with the other types of sources, which are focused on specific events, the det | ||||
| ectors of this source type will detect generic events. For example, during a spo | ||||
| rts event, some unexpected movement makes it fascinating, and many people connec | ||||
| t to sites that are reporting on the event. The underlying networks supporting t | ||||
| he services that cover the event can be affected by such a situation, so their man | ||||
| agement solutions should be aware of it. In contrast with the other source types | ||||
| , a new information model, format, and reporting protocol are required to integra | ||||
| te the detectors of this type with the management solution.</li> | ||||
| </ul> | ||||
| <t>Additional detector types can be added to the system, but generally | ||||
| they will be the result of composing the properties offered by these main class | ||||
| es.</t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Connectors and Interfaces</name> | ||||
| <t>For allowing external event detectors to be properly integrated wit | ||||
| h other management solutions, both elements must expose interfaces and protocols | ||||
| that are subject to their particular objective. Since external event detectors | ||||
| will be focused on providing their information to their main consumers, which ge | ||||
| nerally will not be limited to the network management solutions, the framework m | ||||
| ust include the definition of the required connectors for ensuring the interconn | ||||
| ection between detectors (sources) and their consumers within the management sys | ||||
| tems (sinks) is effective.</t> | ||||
| <t>In some situations, the interconnection between external event dete | ||||
| ctors and the management system is via the management plane. For those situation | ||||
| s, there will be a special connector that provides the typical interfaces found | ||||
| in most other elements connected to the management plane. For instance, the inte | ||||
| rfaces could accomplish this with a specific data model (YANG) and specific tele | ||||
| metry protocol, such as NETCONF, YANG-Push, or gRPC.</t> | ||||
| </section> | ||||
| <section anchor="Acknowledgments" numbered="false" toc="default"> | ||||
| <name>Acknowledgments</name> | ||||
| <t>We would like to thank <contact fullname="Rob Wilton"/>, <contact fulln | ||||
| ame="Greg Mirsky"/>, <contact fullname="Randy Presuhn"/>, <contact fullname="Joe | ||||
| Clarke"/>, <contact fullname="Victor Liu"/>, <contact fullname="James Guichard" | ||||
| />, <contact fullname="Uri Blumenthal"/>, <contact fullname="Giuseppe Fioccola"/ | ||||
| >, <contact fullname="Yunan Gu"/>, <contact fullname="Parviz Yegani"/>, <contact | ||||
| fullname="Young Lee"/>, <contact fullname="Qin Wu"/>, <contact fullname="Gyan M | ||||
| ishra"/>, <contact fullname="Ben Schwartz"/>, <contact fullname="Alexey Melnikov | ||||
| "/>, <contact fullname="Michael Scharf"/>, <contact fullname="Dhruv Dhody"/>, <c | ||||
| ontact fullname="Martin Duke"/>, <contact fullname="Roman Danyliw"/>, <contact f | ||||
| ullname="Warren Kumari"/>, <contact fullname="Sheng Jiang"/>, <contact fullname= | ||||
| "Lars Eggert"/>, <contact fullname="Éric Vyncke"/>, <contact fullname="Jean-Mich | ||||
| el Combes"/>, <contact fullname="Erik Kline"/>, <contact fullname="Benjamin Kadu | ||||
| k"/>, and many others who have provided helpful comments and suggestions to impr | ||||
| ove this document.</t> | ||||
| </section> | ||||
| <section anchor="Contributors" numbered="false" toc="default"> | ||||
| <name>Contributors</name> | ||||
| <t>The other contributors to this document are <contact fullname="Tianran | ||||
| Zhou"/>, <contact fullname="Zhenbin Li"/>, <contact fullname="Zhenqiang Li"/>, | ||||
| <contact fullname="Daniel King"/>, <contact fullname="Adrian Farrel"/>, and <con | ||||
| tact fullname="Alexander Clemm"/>.</t> | ||||
| </section> | ||||
| </section> | ||||
| </section> | ||||
| </back> | ||||
| </rfc> | </rfc> | |||
| End of changes. 29 change blocks. | ||||
| 1401 lines changed or deleted | 1641 lines changed or added | |||