rfc9232xml2.original.xml   rfc9232.xml 
<?xml version="1.0" encoding="US-ASCII"?> <?xml version="1.0" encoding="utf-8"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
please see http://xml.resource.org/authoring/README.html. -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="3"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic reference tags, i.e., [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
(using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-ietf-opsawg-ntf-13" ipr="trust200902">
<front>
<title abbrev="Network Telemetry Framework">Network Telemetry Framework</title>
<author fullname="Haoyu Song" initials="H." surname="Song">
<organization>Futurewei</organization>
<address>
<postal>
<street/>
<city/>
<country>USA</country>
</postal>
<email>haoyu.song@futurewei.com</email>
</address>
</author>
<author fullname="Fengwei Qin" initials="F." surname="Qin">
<organization>China Mobile</organization>
<address>
<postal>
<street/>
<city/>
<country>P.R. China</country>
</postal>
<email>qinfengwei@chinamobile.com</email>
</address>
</author>
<author fullname="Pedro Martinez-Julia" initials="P." surname="Martinez-Julia">
<organization>NICT</organization>
<address>
<postal>
<street/>
<city/>
<country>Japan</country>
</postal>
<email>pedro@nict.go.jp</email>
</address>
</author>
<author fullname="Laurent Ciavaglia" initials="L." surname="Ciavaglia">
<organization>Rakuten Mobile</organization>
<address>
<postal>
<street/>
<city/>
<country>France</country>
</postal>
<email>laurent.ciavaglia@rakuten.com</email>
</address>
</author>
<author fullname="Aijun Wang" initials="A." surname="Wang">
<organization>China Telecom</organization>
<address>
<postal>
<street/>
<city/>
<country>P.R. China</country>
</postal>
<email>wangaj.bri@chinatelecom.cn</email>
</address>
</author>
<date day="3" month="December" year="2021"/>
<area>Operation and Management Area</area>
<workgroup>OPSAWG</workgroup>
<!-- -->
<keyword>Telemetry, OAM</keyword>
<abstract>
<t>Network telemetry is a technology for gaining network insight and facilitating efficient and automated network management. It encompasses various techniques for remote data generation, collection, correlation, and consumption. This document describes an architectural framework for network telemetry, motivated by challenges that are encountered as part of the operation of networks and by the requirements that ensue. This document clarifies the terminology and classifies the modules and components of a network telemetry system from different perspectives. The framework and taxonomy help to set a common ground for the collection of related work and provide guidance for related technique and standard developments.</t>
</abstract>
</front>
<middle>
<section title="Introduction">
<t> Network visibility is the ability of management tools to see the state and behavior of a network, which is essential for successful network operation. Network telemetry revolves around network data that can help provide insights about the current state of the network, including network devices, forwarding, control, and management planes; that can be generated and obtained through a variety of techniques, including but not limited to network instrumentation and measurements; and that can be processed for purposes ranging from service assurance to network security using a wide variety of data analytical techniques. In this document, network telemetry refers to both the data itself (i.e., "Network Telemetry Data") and the techniques and processes used to generate, export, collect, and consume that data for use by potentially automated management applications. Network telemetry extends beyond the classical network Operations, Administration, and Management (OAM) techniques and expects to support better flexibility, scalability, accuracy, coverage, and performance.</t>
<!DOCTYPE rfc [
<!ENTITY nbsp "&#160;">
<!ENTITY zwsp "&#8203;">
<!ENTITY nbhy "&#8209;">
<!ENTITY wj "&#8288;">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" docName="draft-ietf-opsawg-ntf-13" number="9232" ipr="trust200902" obsoletes="" updates="" submissionType="IETF" category="info" consensus="true" xml:lang="en" tocInclude="true" tocDepth="3" symRefs="true" sortRefs="true" version="3">
<t> However, the term "network telemetry" lacks an unambiguous definition. Its scope and coverage cause confusion and misunderstandings. It is beneficial to clarify the concept and provide a clear architectural framework for network telemetry, so we can articulate the technical field and better align the related techniques and standards work.</t>
<t>To fulfill such an undertaking, we first discuss some key characteristics of network telemetry that set a clear distinction from the conventional network OAM and show that some conventional OAM technologies can be considered a subset of the network telemetry technologies. We then provide an architectural framework for network telemetry that includes four modules, each associated with a different category of telemetry data and corresponding procedures. All the modules are internally structured in the same way, including components that allow the operator to configure data sources in regard to what data to generate and how to make that available to client applications, components that instrument the underlying data sources, and components that perform the actual rendering, encoding, and exporting of the generated data. We show how the network telemetry framework can benefit current and future network operations. Based on the distinction of modules and functional components, we can map the existing and emerging techniques and protocols into the framework. The framework can also simplify designing, maintaining, and understanding a network telemetry system. In addition, we outline the evolution stages of the network telemetry system and discuss the potential security concerns. </t>
<t> The purpose of the framework and taxonomy is to set a common ground for the collection of related work and provide guidance for future technique and standard developments. To the best of our knowledge, this document is the first such effort for network telemetry in industry standards organizations. This document does not define specific technologies.</t>
<!--
<section title="Requirements Language">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 <xref target="RFC2119"></xref><xref target="RFC8174"></xref> when,
and only when, they appear in all
capitals, as shown here.</t>
</section>
-->
<section title="Applicability Statement"> <!-- xml2rfc v2v3 conversion 3.12.2 -->
<front>
<title abbrev="Network Telemetry Framework">Network Telemetry Framework</title>
<seriesInfo name="RFC" value="9232"/>
<author fullname="Haoyu Song" initials="H." surname="Song">
<organization>Futurewei</organization>
<address>
<postal>
<street/>
<city/>
<country>United States of America</country>
</postal>
<email>haoyu.song@futurewei.com</email>
</address>
</author>
<author fullname="Fengwei Qin" initials="F." surname="Qin">
<organization>China Mobile</organization>
<address>
<postal>
<street/>
<city/>
<country>China</country>
</postal>
<email>qinfengwei@chinamobile.com</email>
</address>
</author>
<author fullname="Pedro Martinez-Julia" initials="P." surname="Martinez-Julia">
<organization>NICT</organization>
<address>
<postal>
<street/>
<city/>
<country>Japan</country>
</postal>
<email>pedro@nict.go.jp</email>
</address>
</author>
<author fullname="Laurent Ciavaglia" initials="L." surname="Ciavaglia">
<organization>Rakuten Mobile</organization>
<address>
<postal>
<street/>
<city/>
<country>France</country>
</postal>
<email>laurent.ciavaglia@rakuten.com</email>
</address>
</author>
<author fullname="Aijun Wang" initials="A." surname="Wang">
<organization>China Telecom</organization>
<address>
<postal>
<street/>
<city/>
<country>China</country>
</postal>
<email>wangaj3@chinatelecom.cn</email>
</address>
</author>
<date year="2022" month="May" />
<t>Large-scale network data collection is a major threat to user privacy and may be indistinguishable from pervasive monitoring <xref target="RFC7258" />. The network telemetry framework presented in this document must not be applied to generating, exporting, collecting, analyzing, or retaining individual user data or any data that can identify end users or characterize their behavior without consent. Based on this principle, the network telemetry framework is not applicable to networks whose endpoints represent individual users, such as general-purpose access networks. </t>
</section>
<area>Operations and Management Area</area>
<workgroup>OPSAWG</workgroup>
<keyword>Telemetry</keyword>
<keyword>OAM</keyword>
<section title="Glossary">
<t>Before further discussion, we list some key terminology and acronyms used in this document. We make an intentional distinction between the terms network telemetry and OAM. However, it should be understood that there is not a hard-line distinction between the two concepts. Rather, network telemetry is considered an extension of OAM. It covers all the existing OAM protocols but puts more emphasis on newer and emerging techniques and protocols concerning all aspects of network data from acquisition to consumption.</t>
<t>
<list style="hanging">
<t hangText="AI:"> Artificial Intelligence. In the network domain, AI refers to machine-learning-based technologies for automated network operation and other tasks.</t>
<t hangText="AM:"> Alternate Marking, a flow performance measurement method, specified in <xref target="RFC8321"/>. </t>
<t hangText="BMP:"> BGP Monitoring Protocol, specified in <xref target="RFC7854"/>. </t>
<t hangText="DPI:"> Deep Packet Inspection, referring to the techniques that examine packets beyond their L3/L4 headers. </t>
<t hangText="gNMI:"> gRPC Network Management Interface, a network management protocol from the OpenConfig Operator Working Group, mainly contributed by Google. See <xref target="gnmi"/> for details. </t>
<t hangText="GPB:"> Google Protocol Buffer, an extensible mechanism for serializing structured data. See <xref target="gpb" /> for details. </t>
<t hangText="gRPC:"> gRPC Remote Procedure Call, an open-source, high-performance RPC framework that gNMI is based on. See <xref target="grpc"/> for details. </t>
<t hangText="IPFIX:"> IP Flow Information Export Protocol, specified in <xref target="RFC7011"/>. </t>
<t hangText="IOAM:"> <xref target="I-D.ietf-ippm-ioam-data">In-situ OAM</xref>, a dataplane on-path telemetry technique. </t>
<t hangText="JSON:"> An open standard file format and data interchange format that uses human-readable text to store and transmit data objects, specified in <xref target="RFC8259" />. </t>
<t hangText="MIB:"> Management Information Base, a database used for managing the entities in a network. </t>
<t hangText="NETCONF:"> Network Configuration Protocol, specified in <xref target="RFC6241"/>. </t>
<t hangText="NetFlow:"> A Cisco protocol for flow record collection, described in <xref target="RFC3954"/>. </t>
<t hangText="Network Telemetry:"> The process and instrumentation for acquiring and utilizing network data remotely for network monitoring and operation. A general term for a large set of network visibility techniques and protocols, concerning aspects like data generation, collection, correlation, and consumption. Network telemetry addresses current network operation issues and enables smooth evolution toward future intent-driven autonomous networks.</t>
<t hangText="NMS:"> Network Management System, referring to applications that allow network administrators to manage a network. </t>
<t hangText="OAM:"> Operations, Administration, and Maintenance. A group of network management functions that provide network fault indication, fault localization, performance information, and data and diagnosis functions. Most conventional network monitoring techniques and protocols belong to network OAM.</t>
<t hangText="PBT:"> Postcard-Based Telemetry, a dataplane on-path telemetry technique. A representative technique is described in <xref target="I-D.ietf-ippm-ioam-direct-export"/>. </t>
<t hangText="RESTCONF:"> An HTTP-based protocol that provides a programmatic interface for accessing data defined in YANG, using the datastore concepts defined in NETCONF, as specified in <xref target="RFC8040"/>. </t>
<t hangText="SMIv2:"> Structure of Management Information Version 2, defining MIB objects, specified in <xref target="RFC2578"/>. </t>
<t hangText="SNMP:"> Simple Network Management Protocol. Versions 1, 2, and 3 are specified in <xref target="RFC1157"/>, <xref target="RFC3416"/>, and <xref target="RFC3411"/>, respectively. </t>
<t hangText="XML:"> Extensible Markup Language, a markup language for data encoding that is both human-readable and machine-readable, specified by W3C <xref target="xml" />. </t>
<t hangText="YANG:"> A data modeling language for the definition of data sent over network management protocols such as NETCONF and RESTCONF. YANG is defined in <xref target="RFC6020"/> and <xref target="RFC7950"/>. </t>
<t hangText="YANG ECA:"> A YANG model for Event-Condition-Action policies, defined in <xref target="I-D.wwx-netmod-event-yang"/>. </t>
<t hangText="YANG-Push:"> A mechanism that allows subscriber applications to request a stream of updates from a YANG datastore on a network device. Details are specified in <xref target="RFC8641"/> and <xref target="RFC8639"/>. </t>
</list>
</t>
</section>
</section>
<abstract>
<t>Network telemetry is a technology for gaining network insight and facilitating efficient and automated network management. It encompasses various techniques for remote data generation, collection, correlation, and consumption. This document describes an architectural framework for network telemetry, motivated by challenges that are encountered as part of the operation of networks and by the requirements that ensue.
This document clarifies the terminology and classifies the modules and components of a network telemetry system from different perspectives. The framework and taxonomy help to set a common ground for the collection of related work and provide guidance for related technique and standard developments.</t>
</abstract>
</front>
<middle>
<section numbered="true" toc="default">
<name>Introduction</name>
<t> Network visibility is the ability of management tools to see the state and behavior of a network, which is essential for successful network operation. Network telemetry revolves around network data that 1) can help provide insights about the current state of the network, including network devices, forwarding, control, and management planes; 2) can be generated and obtained through a variety of techniques, including but not limited to network instrumentation and measurements; and 3) can be processed for purposes ranging from service assurance to network security using a wide variety of data analytical techniques. In this document, network telemetry refers to both the data itself (i.e., "Network Telemetry Data") and the techniques and processes used to generate, export, collect, and consume that data for use by potentially automated management applications. Network telemetry extends beyond the classical network Operations, Administration, and Management (OAM) techniques and expects to support better flexibility, scalability, accuracy, coverage, and performance.</t>
<t> However, the term "network telemetry" lacks an unambiguous definition. The scope and coverage of it cause confusion and misunderstandings. It is beneficial to clarify the concept and provide a clear architectural framework for network telemetry, so we can articulate the technical field and better align the related techniques and standard works.</t>
<t>To fulfill such an undertaking, we first discuss some key characteristics of network telemetry that set a clear distinction from the conventional network OAM and show that some conventional OAM technologies can be considered a subset of the network telemetry technologies. We then provide an architectural framework for network telemetry that includes four modules, each associated with a different category of telemetry data and corresponding procedures. All the modules are internally structured in the same way, including components that allow the operator to configure data sources in regard to what data to generate and how to make that available to client applications, components that instrument the underlying data sources, and components that perform the actual rendering, encoding, and exporting of the generated data. We show how the network telemetry framework can benefit current and future network operations. Based on the distinction of modules and function components, we can map the existing and emerging techniques and protocols into the framework. The framework can also simplify designing, maintaining, and understanding a network telemetry system. In addition, we outline the evolution stages of the network telemetry system and discuss the potential security concerns. </t>
<section title="Background">
<t>The term "big data" is used to describe extremely large data sets that can be analyzed computationally to reveal patterns, trends, and associations. Networks are undoubtedly a source of big data because of their scale and the volume of network traffic they forward. When a network's endpoints do not represent individual users (e.g., in industrial, datacenter, and infrastructure contexts), network operations can often benefit from large-scale data collection without breaching user privacy.</t>
<t>Today one can access advanced big data analytics capability through a plethora of commercial and open source platforms (e.g., Apache Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine learning). Thanks to the advance of computing and storage technologies, network big data analytics gives network operators an opportunity to gain network insights and move towards network autonomy. Some operators have started to explore the application of Artificial Intelligence (AI) to make sense of network data. Software tools can use the network data to detect and react to network faults, anomalies, and policy violations, as well as to predict future events. In turn, network policy updates for planning, intrusion prevention, optimization, and self-healing may be applied.</t>
<t>It is conceivable that an <xref target="RFC7575"> autonomic network </xref> is the logical next step for network evolution following Software-Defined Networking (SDN), aiming to reduce (or even eliminate) human labor, make more efficient use of network resources, and provide better services more aligned with customer requirements. The IETF ANIMA working group is dedicated to developing and maintaining protocols and procedures for automated network management and control of professionally managed networks. The related technique of <xref target="I-D.irtf-nmrg-ibn-concepts-definitions">Intent-based Networking (IBN)</xref> requires network visibility and telemetry data in order to ensure that the network is behaving as intended. </t>
<t>However, while data processing capability has improved and applications require more data to function better, networks lag behind in extracting and translating network data into useful and actionable information in efficient ways. The system bottleneck is shifting from data consumption to data supply. Both the number of network nodes and the traffic bandwidth keep increasing at a fast pace. Network configuration and policy change at shorter time intervals than before. More subtle events and fine-grained data through all network planes need to be captured and exported in real time. In a nutshell, it is a challenge to get enough high-quality data out of the network in a manner that is efficient, timely, and flexible. Therefore, we need to survey the existing technologies and protocols and identify any potential gaps.</t>
<t>In the remainder of this section, we first clarify the scope of network data (i.e., telemetry data) relevant in this document. Then, we discuss several key use cases for today's and future network operations. Next, we show why the current network OAM techniques and protocols are insufficient for these use cases. The discussion underlines the need for new methods, techniques, and protocols, as well as extensions of existing ones, which we group under the umbrella term "network telemetry". </t>
<section title="Telemetry Data Coverage">
<t>Any information that can be extracted from networks (including the data plane, control plane, and management plane) and used to gain visibility or as a basis for actions is considered telemetry data. It includes statistics, event records and logs, snapshots of state, configuration data, etc. It also covers the outputs of any active and passive measurements <xref target="RFC7799"/>. In some cases, raw data is processed in the network before being sent to a data consumer. Such processed data is also considered telemetry data. The value of telemetry data varies. In some cases, if the cost is acceptable, a smaller amount of higher-quality data is preferred over a large amount of low-quality data. A classification of telemetry data is provided in <xref target="framework"/>. To preserve the privacy of end users, no user packet content should be collected. Specifically, the data objects generated, exported, and collected by a network telemetry application should not include any packet payload from traffic associated with end-user systems. </t>
</section>
<section title="Use Cases">
<t>The following set of use cases is essential for network operations. While the list is by no means exhaustive, it is enough to highlight the requirements for data velocity, variety, volume, and veracity (the attributes of big data) in networks. </t>
<t>
<list style="symbols">
<t> Security: Network intrusion detection and prevention systems need to monitor network traffic and activities and act upon anomalies. Given increasingly sophisticated attack vectors coupled with increasingly severe consequences of security breaches, new tools and techniques need to be developed, relying on wider and deeper visibility into networks. The ultimate goal is to achieve security with no, or only minimal, human intervention, and without disrupting legitimate traffic flows. </t>
<t> Policy and Intent Compliance: Network policies are the rules that constrain the services for network access, provide service differentiation, or enforce specific treatment on the traffic. For example, a service function chain is a policy that requires the selected flows to pass through a set of ordered network functions. Intent, as defined in <xref target="I-D.irtf-nmrg-ibn-concepts-definitions"/>, is a set of operational goals that a network should meet and outcomes that a network is supposed to deliver, defined in a declarative manner without specifying how to achieve or implement them. An intent requires a complex translation and mapping process before being applied on networks. While a policy or intent is enforced, the compliance needs to be verified and monitored continuously by relying on visibility that is provided through network telemetry data. Any violation must be reported immediately, potentially resulting in updates to how the policy or intent is applied in the network to ensure that it remains in force, or otherwise alerting the network administrator to the policy or intent violation. </t>
<t> SLA Compliance: A Service-Level Agreement (SLA) is a service contract between a service provider and a client, which includes the metrics for the service measurement and remedy/penalty procedures when the service level misses the agreement. Users need to check if they get the service as promised, and network operators need to evaluate how they can deliver services that can meet the SLA based on real-time network telemetry data, including data from network measurements.</t>
<t> Root Cause Analysis: Many network failures can be the effect of a sequence of chained events. Troubleshooting and recovery require quick identification of the root cause of any observable issues. However, the root cause is not always straightforward to identify, especially when the failure is sporadic and the number of event messages, both related and unrelated to the same cause, is overwhelming. While technologies such as machine learning can be used for root cause analysis, it is up to the network to sense and provide the relevant diagnostic data that are either actively fed into, or passively retrieved by, the root cause analysis applications.</t>
<t> Network Optimization: This covers all short-term and long-term network optimization techniques, including load balancing, Traffic Engineering (TE), and network planning. Network operators are motivated to optimize their network utilization and differentiate services for better Return On Investment (ROI) or lower Capital Expenditures (CAPEX). The first step is to know the real-time network conditions before applying policies for traffic manipulation. In some cases, micro-bursts need to be detected in a very short time frame so that fine-grained traffic control can be applied to avoid network congestion. Long-term planning of network capacity and topology requires analysis of real-world network telemetry data that is obtained over long periods of time.</t>
<t> Event Tracking and Prediction: Visibility into traffic paths and performance is critical for services and applications that rely on healthy network operation. Numerous related network events are of interest to network operators. For example, network operators want to learn where and why packets are dropped for an application flow. They also want to be warned of issues in advance, so proactive actions can be taken to avoid catastrophic consequences. </t>
</list>
</t>
</section>
<section title="Challenges">
<t>For a long time, network operators have relied upon <xref target="RFC3416">SNMP</xref>, Command-Line Interface (CLI), or <xref target="RFC5424">Syslog</xref> to monitor the network. Some other OAM techniques as described in <xref target="RFC7276"/> are also used to facilitate network troubleshooting. These conventional techniques are not sufficient to support the above use cases for the following reasons: </t>
<t>
<list style="symbols">
<t>Most use cases need to continuously monitor the network and dynamically refine the data collection in real time. Poll-based, low-frequency data collection is ill-suited for these applications. Subscription-based streaming data directly pushed from the data source (e.g., the forwarding chip) is preferred to provide sufficient data quantity and precision at scale.</t>
<t>Comprehensive data is needed, ranging from packet processing engines to traffic managers, from line cards to main control boards, from user flows to control protocol packets, from device configurations to operations, and from the physical layer to the application layer. Conventional OAM only covers a narrow range of data (e.g., SNMP only handles data from the Management Information Base (MIB)). Classical network devices cannot provide all the necessary probes. More open and programmable network devices are therefore needed.</t>
<t>Many application scenarios need to correlate network-wide data from multiple sources (i.e., from distributed network devices, different components of a network device, or different network planes). A piecemeal solution often lacks the capability to consolidate the data from multiple sources. The composition of a complete solution, as partly proposed by <xref target="I-D.pedro-nmrg-anticipated-adaptation">Autonomic Resource Control Architecture (ARCA)</xref>, will be empowered and guided by a comprehensive framework. </t>
<t>Some conventional OAM techniques (e.g., CLI and Syslog) lack a formal data model. The unstructured data hinder tool automation and application extensibility. Standardized data models are essential to support programmable networks. </t>
<t>Although some conventional OAM techniques support data push (e.g., <xref target="RFC2981">SNMP Trap</xref><xref target="RFC3877"/>, Syslog, and <xref target="RFC3176">sFlow</xref>), the pushed data are limited to only predefined management plane warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). Network operators require data with arbitrary source, granularity, and precision, which is beyond the capability of the existing techniques. </t>
<t>The conventional passive measurement techniques can either consume excessive network resources and produce excessive redundant data or lead to inaccurate results; on the other hand, the conventional active measurement techniques can interfere with the user traffic, and their results are indirect. Techniques that can collect direct and on-demand data from user traffic are more favorable.</t>
</list>
</t>
<t>These challenges were addressed by newer standards and techniques (e.g., IPFIX/NetFlow, Packet Sampling (PSAMP), IOAM, and YANG-Push), and more are emerging. These standards and techniques need to be recognized and accommodated in a new framework.</t>
</section>
<section title="Network Telemetry">
<t>Network telemetry has emerged as a mainstream technical term to refer to the network data collection and consumption techniques. Several network telemetry techniques and protocols (e.g., <xref target="RFC7011">IPFIX</xref> and <xref target="grpc">gRPC</xref>) have been widely deployed. Network telemetry allows separate entities to acquire data from network devices so that data can be visualized and analyzed to support network monitoring and operation. Network telemetry covers the conventional network OAM and has a wider scope. For instance, it is expected that network telemetry can provide the necessary network insight for autonomous networks and address the shortcomings of conventional OAM techniques. </t>
<t> The purpose of the framework and taxonomy is to set a common ground for the collection of related work and provide guidance for future technique and standard developments. To the best of our knowledge, this document is the first such effort for network telemetry in industry standards organizations. This document does not define specific technologies.</t>
<t>Network telemetry usually assumes machines as data consumers rather than human operators. Hence, network telemetry can directly trigger automated network operation, while in contrast some conventional OAM tools were designed and used to help human operators monitor and diagnose networks and guide manual network operations. Such a proposition leads to very different techniques. </t>
<t>Although new network telemetry techniques are emerging and subject to continuous evolution, several characteristics of network telemetry have been well accepted. Note that network telemetry is intended to be an umbrella term covering a wide spectrum of techniques, so the following characteristics are not expected to be held by every specific technique.</t>
<t>
<list style="symbols">
<t>Push and Streaming: Instead of polling data from network devices, telemetry c
ollectors subscribe to streaming data pushed from data sources in network device
s.</t>
<t>Volume and Velocity: The telemetry data is intended to be consumed by machine
s rather than by human being. Therefore, the data volume can be huge and the pro
cessing is optimized for the needs of automation in realtime.</t>
<t>Normalization and Unification: Telemetry aims to address the overall network
automation needs. Efforts are made to normalize the data representation and unif
y the protocols, so as to simplify data analysis and provide integrated analysis
across heterogeneous devices and data sources across a network.</t>
<t>Model-based: The telemetry data is modeled in advance which allows applicatio
ns to configure and consume data with ease. </t>
<t>Data Fusion: The data for a single application can come from multiple data so
urces (e.g., cross-domain, cross-device, and cross-layer) based on common naming
/ID and needs to be correlated to take effect.</t>
<t>Dynamic and Interactive: Since the network telemetry means to be used in a cl
osed control loop for network automation, it needs to run continuously and adapt
to the dynamic and interactive queries from the network operation controller. <
/t>
</list>
</t>
<t>In addition, an ideal network telemetry solution may also have the following
features or properties:</t>
<t>
<list style="symbols">
<t>In-Network Customization: The data that is generated can be customized in
network at run-time to cater to the specific need of applications. This needs
the support of a programmable data plane which allows probes with custom
functions to be deployed at flexible locations. </t>
<t>In-Network Data Aggregation and Correlation: Network devices and aggregation
points can work out which events and what data needs to be stored, reported, or
discarded thus reducing the load on the central collection and processing points
while still ensuring that the right information is ready to be processed in a
timely way.</t>
<t>In-Network Processing: Sometimes it is not necessary or feasible to gather
all information to a central point to be processed and acted upon. It is
possible for the data processing to be done in network, allowing reactive
actions to be taken locally.</t>
<t>Direct Data Plane Export: The data originated from the data plane forwarding
chips can be directly exported to the data consumer for efficiency, especially
when the data bandwidth is large and the real-time processing is required. </t>
<t>In-band Data Collection: In addition to the passive and active data
collection approaches, the new hybrid approach allows to directly collect data
for any target flow on its entire forwarding path <xref
target="I-D.song-opsawg-ifit-framework"/>. </t>
</list>
</t>
<t>It is worth noting that a network telemetry system should not be intrusive to
normal network operations by avoiding the pitfall of the "observer effect". That
is, it should not change the network behavior and affect the forwarding
performance. Moreover, high-volume telemetry traffic may cause network
congestion unless proper isolation or traffic engineering techniques are in
place, or congestion control mechanisms ensure that telemetry traffic backs off
if it exceeds the network capacity. <xref target="RFC8084" /> and <xref
target="RFC8085" /> are relevant Best Current Practices (BCP) in this space.</t>
<t>Although in many cases a system for network telemetry involves a remote data
collecting and consuming entity, it is important to understand that there are no
inherent assumptions about how a system should be architected. While a network
architecture with centralized controller (e.g., SDN) seems a natural fit for
network telemetry, network telemetry can work in distributed fashions as well.
For example, telemetry data producers and consumers can have a peer-to-peer
relationship, in which a network node can be the direct consumer of telemetry
data from other nodes. </t>
</section>
<section title="The Necessity of a Network Telemetry Framework">
<t>Network data analytics (e.g., machine learning) is applied for network
operation automation, relying on abundant and coherent data from networks. Data
acquisition that is limited to a single source and static in nature will in many
cases not be sufficient to meet an application's telemetry data needs. As a
result, multiple data sources, involving a variety of techniques and standards,
will need to be integrated. It is desirable to have a framework that classifies
and organizes different telemetry data source and types, defines different
components of a network telemetry system and their interactions, and helps
coordinate and integrate multiple telemetry approaches across layers. This
allows flexible combinations of data for different applications, while
normalizing and simplifying interfaces. In detail, such a framework would
benefit the development of network operation applications for the following
reasons:</t>
<t>
<list style="symbols">
<t>Future networks, autonomous or otherwise, depend on holistic and
comprehensive network visibility. The use cases and applications are better to
be supported uniformly and coherently using an integrated, converged mechanism
and common telemetry data representations wherever feasible. Therefore, the
protocols and mechanisms should be consolidated into a minimum yet comprehensive
set. A telemetry framework can help to normalize the technique developments.</t>
<t>Network visibility presents multiple viewpoints. For example, the device
viewpoint takes the network infrastructure as the monitoring object from which
the network topology and device status can be acquired; the traffic viewpoint
takes the flows or packets as the monitoring object from which the traffic
quality and path can be acquired. An application may need to switch its
viewpoint during operation. It may also need to correlate a service and its
impact on user experience to acquire the comprehensive information.</t>
<t>Applications require network telemetry to be elastic in order to make
efficient use of network resources and reduce the impact of processing related
to network telemetry on network performance. For example, routine network
monitoring should cover the entire network with a low data sampling rate. Only
when issues arise or critical trends emerge should telemetry data sources be
modified and telemetry data rates boosted as needed.</t>
<t>Efficient data aggregation is critical for applications to reduce the
overall quantity of data and improve the accuracy of analysis.</t>
</list>
</t>
<t> A telemetry framework collects together all the telemetry-related works
from different sources and working groups within IETF. This makes it possible to
assemble a comprehensive network telemetry system and to avoid repetitious or
redundant work. The framework should cover the concepts and components from the
standardization perspective. This document describes the modules which make up a
network telemetry framework and decomposes the telemetry system into a set of
distinct components that existing and future work can easily map to.</t>
<section numbered="true" toc="default">
<name>Applicability Statement</name>
<t>Large-scale network data collection is a major threat to user privacy and
may be indistinguishable from pervasive monitoring <xref target="RFC7258"
format="default"/>. The network telemetry framework presented in this document
must not be applied to generating, exporting, collecting, analyzing, or
retaining individual user data or any data that can identify end users or
characterize their behavior without consent. Based on this principle, the
network telemetry framework is not applicable to networks whose endpoints
represent individual users, such as general-purpose access networks. </t>
</section>
<section numbered="true" toc="default">
<name>Glossary</name>
<t>Before further discussion, we list some key terminology and abbreviations
used in this document. There is an intended differentiation between the terms
of network telemetry and OAM. However, it should be understood that there is not
a hard-line distinction between the two concepts. Rather, network telemetry is
considered an extension of OAM. It covers all the existing OAM protocols but
puts more emphasis on the newer and emerging techniques and protocols concerning
all aspects of network data from acquisition to consumption.</t>
<dl newline="false" spacing="normal" indent="12">
<dt>AI:</dt>
<dd> Artificial Intelligence. In the network domain, AI refers to
machine-learning-based technologies for automated network operation and other
tasks. </dd>
<dt>AM:</dt>
<dd> Alternate Marking. A flow performance measurement method, as specified in
<xref target="RFC8321" format="default"/>. </dd>
<dt>BMP:</dt>
<dd>BGP Monitoring Protocol. Specified in <xref target="RFC7854"
format="default"/>. </dd>
<dt>DPI:</dt>
<dd>Deep Packet Inspection. Refers to the techniques that examine packets beyond
packet L3/L4 headers. </dd>
<dt>gNMI:</dt>
<dd>gRPC Network Management Interface. A network management protocol from the
OpenConfig Operator Working Group, mainly contributed by Google. See <xref
target="gnmi" format="default"/> for details. </dd>
<dt>GPB:</dt>
<dd>Google Protocol Buffer. An extensible mechanism for serializing structured
data. See <xref target="gpb" format="default"/> for details. </dd>
<dt>gRPC:</dt>
<dd>gRPC Remote Procedure Call. An open-source high-performance RPC framework
that gNMI is based on. See <xref target="grpc" format="default"/> for
details. </dd>
<dt>IPFIX:</dt>
<dd>IP Flow Information Export Protocol. Specified in <xref target="RFC7011"
format="default"/>. </dd>
<dt>IOAM:</dt>
<dd>
<xref target="RFC9197" format="default">In situ OAM</xref>. A data plane
on-path telemetry technique. </dd>
<dt>JSON:</dt>
<dd>JavaScript Object Notation. An open standard file format and data
interchange format that uses human-readable text to store and transmit data
objects, as specified in <xref target="RFC8259" format="default"/>. </dd>
<dt>MIB:</dt>
<dd>Management Information Base. A database used for managing the entities in a
network. </dd>
<dt>NETCONF:</dt>
<dd>Network Configuration Protocol. Specified in <xref target="RFC6241"
format="default"/>. </dd>
<dt>NetFlow:</dt>
<dd>A Cisco protocol used for collecting flow records, as described in <xref
target="RFC3954" format="default"/>. </dd>
<dt>Network Telemetry:</dt>
<dd>The process and instrumentation for acquiring and utilizing network data
remotely for network monitoring and operation. A general term for a large set
of network visibility techniques and protocols, concerning aspects like data
generation, collection, correlation, and consumption. Network telemetry
addresses current network operation issues and enables smooth evolution toward
future intent-driven autonomous networks.</dd>
<dt>NMS:</dt>
<dd>Network Management System. Refers to applications that allow network
administrators to manage a network. </dd>
<dt>OAM:</dt>
<dd>Operations, Administration, and Maintenance. A group of network management
functions that provide network fault indication, fault localization,
performance information, and data and diagnosis functions. Most conventional
network monitoring techniques and protocols belong to network OAM.</dd>
</section>
</section>
<dt>PBT:</dt>
<dd>Postcard-Based Telemetry. A data plane on-path telemetry technique. A
representative technique is described in <xref
target="IPPM-IOAM-DIRECT-EXPORT" format="default"/>. </dd>
<dt>RESTCONF:</dt>
<dd> An HTTP-based protocol that provides a programmatic interface for
accessing data defined in YANG, using the datastore concepts defined in NETCONF,
as specified in <xref target="RFC8040" format="default"/>. </dd>
<dt>SMIv2:</dt>
<dd>Structure of Management Information Version 2. Defines MIB objects, as
specified in <xref target="RFC2578" format="default"/>. </dd>
<dt>SNMP:</dt>
<dd>Simple Network Management Protocol. Versions 1, 2, and 3 are specified in
<xref target="RFC1157" format="default"/>, <xref target="RFC3416"
format="default"/>, and <xref target="RFC3411" format="default"/>,
respectively. </dd>
<dt>XML:</dt>
<dd>Extensible Markup Language. A markup language for data encoding that is
both human readable and machine readable, as specified by W3C <xref
target="W3C.REC-xml-20081126" format="default"/>. </dd>
<dt>YANG:</dt>
<dd>YANG is a data modeling language for the definition of data sent over
network management protocols such as NETCONF and RESTCONF. YANG is defined in
<xref target="RFC6020" format="default"/> and <xref target="RFC7950"
format="default"/>. </dd>
<dt>YANG ECA:</dt>
<dd>A YANG model for Event-Condition-Action policies, as defined in <xref
target="I-D.ietf-netmod-eca-policy" format="default"/>. </dd>
<dt>YANG-Push:</dt>
<dd> A mechanism that allows subscriber applications to request a stream of
updates from a YANG datastore on a network device. Details are specified in
<xref target="RFC8639" format="default"/> and <xref target="RFC8641"
format="default"/>. </dd>
</dl>
</section>
</section>
<section numbered="true" toc="default">
<name>Background</name>
<t>The term "big data" is used to describe the extremely large volume of data
sets that can be analyzed computationally to reveal patterns, trends, and
associations. Networks are undoubtedly a source of big data because of their
scale and the volume of network traffic they forward. When a network's
endpoints do not represent individual users (e.g., in industrial, data-center,
and infrastructure contexts), network operations can often benefit from
large-scale data collection without breaching user privacy.</t>
<t>Today, one can access advanced big data analytics capabilities through a
plethora of commercial and open-source platforms (e.g., Apache Hadoop), tools
(e.g., Apache Spark), and techniques (e.g., machine learning). Thanks to
advances in computing and storage technologies, network big data analytics
gives network operators an opportunity to gain network insights and move
towards network autonomy. Some operators have started to explore the
application of Artificial Intelligence (AI) to make sense of network data.
Software tools can use the network data to detect and react to network faults,
anomalies, and policy violations, as well as predict future events. In turn,
network policy updates for planning, intrusion prevention, optimization, and
self-healing may be applied.</t>
<t>It is conceivable that an <xref target="RFC7575" format="default">autonomic
network</xref>, which aims to reduce (or even eliminate) human labor, make more
efficient use of network resources, and provide better services more aligned
with customer requirements, is the logical next step for network evolution
following Software-Defined Networking (SDN). The IETF ANIMA Working Group is
dedicated to developing and maintaining protocols and procedures for automated
network management and control of professionally managed networks. The related
technique of <xref target="I-D.irtf-nmrg-ibn-concepts-definitions"
format="default">Intent-Based Networking (IBN)</xref> requires network
visibility and telemetry data in order to ensure that the network is behaving
as intended. </t>
<t>However, while data processing capability has improved and applications
require more data to function better, networks lag behind in extracting and
translating network data into useful and actionable information in efficient
ways. The system bottleneck is shifting from data consumption to data supply.
Both the number of network nodes and the traffic bandwidth keep increasing at a
fast pace. Network configurations and policies change at finer timescales than
before. More subtle events and fine-grained data from all network planes need to
be captured and exported in real time. In a nutshell, it is a challenge to get
enough high-quality data out of the network in a manner that is efficient,
timely, and flexible. Therefore, we need to survey the existing technologies and
protocols and identify any potential gaps.</t>
<t>In the remainder of this section, we first clarify the scope of network data
(i.e., telemetry data) relevant to this document. Then, we discuss several key
use cases for today's and future network operations. Next, we show why the
current network OAM techniques and protocols are insufficient for these use
cases. The discussion underlines the need for new methods, techniques, and
protocols, as well as the extensions of existing ones, which we assign under the
umbrella term "Network Telemetry". </t>
<section numbered="true" toc="default">
<name>Telemetry Data Coverage</name>
<t>Any information that can be extracted from networks (including the data
plane, control plane, and management plane) and used to gain visibility or as a
basis for actions is considered telemetry data. It includes statistics, event
records and logs, snapshots of state, configuration data, etc. It also covers
the outputs of any active and passive measurements <xref target="RFC7799"
format="default"/>. In some cases, raw data is processed in the network before
being sent to a data consumer. Such processed data is also considered telemetry
data. The value of telemetry data varies. In some cases, if the cost is
acceptable, a smaller amount of higher-quality data is preferred over a large
amount of low-quality data. A classification of telemetry data is provided in
<xref target="framework" format="default"/>. To preserve the privacy of end
users, no user packet content should be collected. Specifically, the data
objects generated, exported, and collected by a network telemetry application
should not include any packet payload from traffic associated with end-user
systems. </t>
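<t>To illustrate the rule that exported data objects carry no packet payload, the following Python sketch reduces the packets of one flow to a record. The record is built from an allow-list of header fields plus derived counters. The input representation and the names <tt>EXPORTABLE_FIELDS</tt> and <tt>to_flow_record</tt> are hypothetical. They are not IPFIX Information Elements or any standardized record format.</t>

```python
# Hypothetical allow-list of header fields that may leave the device.
EXPORTABLE_FIELDS = ("src", "dst", "proto", "src_port", "dst_port")


def to_flow_record(packets):
    """Summarize the packets of one flow into a payload-free record.

    Each packet is a dict that may include a raw "payload"; only
    allow-listed header fields and derived counters are copied out.
    """
    first = packets[0]
    record = {k: first[k] for k in EXPORTABLE_FIELDS if k in first}
    record["packets"] = len(packets)
    record["octets"] = sum(p["length"] for p in packets)
    return record


# Example flow using documentation address ranges.
flow = [
    {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 6,
     "length": 1500, "payload": b"user data"},
    {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 6,
     "length": 40, "payload": b""},
]
record = to_flow_record(flow)
```

<t>The sketch uses an allow-list rather than a deny-list. As a result, any field later added to the packet representation, whether payload-derived or not, is excluded from export by default.</t>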
</section>
<section numbered="true" toc="default">
<name>Use Cases</name>
<t>The following set of use cases is essential for network operations. While
the list is by no means exhaustive, it is enough to highlight the requirements
for data velocity, variety, volume, and veracity (the attributes of big data) in
networks. </t>
<ul spacing="normal">
<li> Security: Network intrusion detection and prevention systems need to
monitor network traffic and activities and act upon anomalies. Given
increasingly sophisticated attack vectors coupled with increasingly severe
consequences of security breaches, new tools and techniques need to be
developed, relying on wider and deeper visibility into networks. The ultimate
goal is to achieve security with no, or only minimal, human intervention and
without disrupting legitimate traffic flows. </li>
<li>Policy and Intent Compliance: Network policies are the rules that constrain
the services for network access, provide service differentiation, or enforce
specific treatment on the traffic. For example, a service function chain is a
policy that requires the selected flows to pass through a set of ordered
network functions. Intent, as defined in <xref
target="I-D.irtf-nmrg-ibn-concepts-definitions" format="default"/>, is a set of
operational goals that a network should meet and outcomes that a network is
supposed to deliver, defined in a declarative manner without specifying how to
achieve or implement them. An intent requires a complex translation and mapping
process before being applied on networks. While a policy or intent is enforced,
the compliance needs to be verified and monitored continuously by relying on
visibility that is provided through network telemetry data. Any violation must
be reported immediately; this will alert the network administrator to the
policy or intent violation and will potentially result in updates to how the
policy or intent is applied in the network to ensure that it remains in
force. </li>
<li>SLA Compliance: A Service Level Agreement (SLA) is a service contract
between a service provider and a client, which includes the metrics for the
service measurement and remedy/penalty procedures when the agreed service level
is not met. Users need to check whether they receive the service as promised,
and network operators need to evaluate how they can deliver services that meet
the SLA based on real-time network telemetry data, including data from network
measurements.</li>
<li>Root Cause Analysis: Many network failures can be the effect of a sequence
of chained events. Troubleshooting and recovery require quick identification of
the root cause of any observable issues. However, the root cause is not always
straightforward to identify, especially when the failure is sporadic and the
number of event messages, both related and unrelated to the same cause, is
overwhelming. While technologies such as machine learning can be used for root
cause analysis, it is up to the network to sense and provide the relevant
diagnostic data that are either actively fed into or passively retrieved by the
root cause analysis applications.</li>
<li>Network Optimization: This covers all short-term and long-term network
optimization techniques, including load balancing, Traffic Engineering (TE), and
network planning. Network operators are motivated to optimize their network
utilization and differentiate services for better Return on Investment (ROI) or
lower Capital Expenditure (CAPEX). The first step is to know the real-time
network conditions before applying policies for traffic manipulation. In some
cases, microbursts need to be detected in a very short time frame so that
fine-grained traffic control can be applied to avoid network congestion.
Long-term planning of network capacity and topology requires analysis of
real-world network telemetry data that is obtained over long periods of
time.</li>
<li>Event Tracking and Prediction: The visibility into traffic path and
performance is critical for services and applications that rely on healthy
network operation. Numerous related network events are of interest to network
operators. For example, network operators want to learn where and why packets
are dropped for an application flow. They also want to be warned of issues in
advance, so proactive actions can be taken to avoid catastrophic
consequences. </li>
</ul>
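<t>The following Python sketch shows why microburst detection needs fine-grained data. It compares the average utilization of a link over a window with per-interval samples of the same link. The 10 Gbit/s link rate, 1 ms sampling interval, 90% threshold, and all function names are assumptions for illustration. This document does not define them.</t>

```python
LINK_RATE_BPS = 10_000_000_000  # assumed 10 Gbit/s link
INTERVAL_S = 0.001              # assumed 1 ms counter sampling interval
THRESHOLD = 0.9                 # assumed alert threshold: 90% utilization


def utilization(octets, interval_s=INTERVAL_S, rate_bps=LINK_RATE_BPS):
    """Fraction of link capacity used by `octets` bytes in one interval."""
    return octets * 8 / (interval_s * rate_bps)


def find_microbursts(octet_samples, threshold=THRESHOLD):
    """Indices of fine-grained intervals whose utilization exceeds threshold."""
    return [i for i, octets in enumerate(octet_samples)
            if utilization(octets) > threshold]


def average_utilization(octet_samples):
    """Utilization averaged over the whole window, as a coarse counter sees it."""
    return utilization(sum(octet_samples),
                       interval_s=len(octet_samples) * INTERVAL_S)


# 100 ms window: steady 10% load with two 1 ms bursts at 96%.
samples = [125_000] * 100
samples[40] = samples[41] = 1_200_000
```

<t>In this example, <tt>average_utilization(samples)</tt> is about 0.117, while <tt>find_microbursts(samples)</tt> returns <tt>[40, 41]</tt>. The coarse average hides the two intervals in which the link was 96% utilized.</t>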
</section>
<section numbered="true" toc="default">
<name>Challenges</name>
<t>For a long time, network operators have relied upon <xref target="RFC3416"
format="default">SNMP</xref>, the Command-Line Interface (CLI), or <xref
target="RFC5424" format="default">Syslog</xref> to monitor the network. Some
other OAM techniques, as described in <xref target="RFC7276" format="default"/>,
are also used to facilitate network troubleshooting. These conventional
techniques are not sufficient to support the above use cases for the following
reasons: </t>
<ul spacing="normal">
<li>Most use cases need to continuously monitor the network and dynamically
refine the data collection in real time. Poll-based low-frequency data
collection is ill-suited for these applications. Subscription-based streaming
data directly pushed from the data source (e.g., the forwarding chip) is
preferred to provide sufficient data quantity and precision at scale.</li>
<li>Comprehensive data is needed, ranging from packet processing engines to
traffic managers, line cards to main control boards, user flows to control
protocol packets, device configurations to operations, and physical layers to
application layers. Conventional OAM only covers a narrow range of data (e.g.,
SNMP only handles data from the Management Information Base (MIB)). Classical
network devices cannot provide all the necessary probes. More open and
programmable network devices are therefore needed.</li>
<li>Many application scenarios need to correlate network-wide data from
multiple sources (i.e., from distributed network devices, different components
of a network device, or different network planes). A piecemeal solution often
lacks the capability to consolidate the data from multiple sources. The
composition of a complete solution, as partly proposed by <xref
target="NMRG-ANTICIPATED-ADAPTATION" format="default">Autonomic Resource Control
Architecture (ARCA)</xref>, will be empowered and guided by a comprehensive
framework. </li>
<li>Some conventional OAM techniques (e.g., CLI and Syslog) lack a formal data
model. The unstructured data hinder tool automation and application
extensibility. Standardized data models are essential to support programmable
networks. </li>
<section anchor="framework" title="Network Telemetry Framework">
<t> The top level network telemetry framework partitions the network telemetry
into four modules based on the telemetry data object source and represents
their relationship. Once the network operation applications acquire the data
from these modules, they can apply data analytics and take actions. At the next
level, the framework decomposes each module into separate components. Each of
the modules follows the same underlying structure, with one component dedicated
to the configuration of data subscriptions and data sources, a second component
dedicated to encoding and exporting data, and a third component instrumenting
the generation of telemetry related to the underlying resources. Throughout the
framework, the same set of abstract data acquiring mechanisms and data types
(<xref target="sec:type"/>) are applied. The two-level architecture with the
uniform data abstraction helps accurately pinpoint a protocol or technique to
its position in a network telemetry system or disaggregate a network telemetry
system into manageable parts.</t>
<section title="Top Level Modules">
<t> Telemetry can be applied on the forwarding plane, the control plane, and the
management plane in a network, as well as other sources out of the network, as
shown in <xref target="figure_1"/>. Therefore, we categorize the network
telemetry into four distinct modules (management plane, control plane,
forwarding plane, and external data and event telemetry) with each having its
own interface to Network Operation Applications.</t>
<t>
<figure anchor="figure_1" title="Modules in Layer Category of NTF">
<artwork><![CDATA[
<li>Although some conventional OAM techniques support data push (e.g., <xref
target="RFC2981" format="default">SNMP Trap</xref><xref target="RFC3877"
format="default"/>, Syslog, and <xref target="RFC3176"
format="default">sFlow</xref>), the pushed data are limited to only predefined
management plane warnings (e.g., SNMP Trap) or sampled user packets (e.g.,
sFlow). Network operators require the data with arbitrary source, granularity,
and precision, which is beyond the capability of the existing techniques. </li>
<li>Conventional passive measurement techniques can either consume excessive
network resources and produce excessive redundant data or lead to inaccurate
results; on the other hand, conventional active measurement techniques can
interfere with the user traffic, and their results are indirect. Techniques that
can collect direct and on-demand data from user traffic are more favorable.</li>
</ul>
<t>These challenges were addressed by newer standards and techniques (e.g.,
IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push), and more are
emerging. These standards and techniques need to be recognized and accommodated
in a new framework.</t>
</section>
<section numbered="true" toc="default">
<name>Network Telemetry</name>
<t>Network telemetry has emerged as a mainstream technical term to refer to the
network data collection and consumption techniques. Several network telemetry
techniques and protocols (e.g., <xref target="RFC7011"
format="default">IPFIX</xref> and <xref target="grpc"
format="default">gRPC</xref>) have been widely deployed. Network telemetry
allows separate entities to acquire data from network devices so that data can
be visualized and analyzed to support network monitoring and operation. Network
telemetry covers the conventional network OAM and has a wider scope. For
instance, it is expected that network telemetry can provide the necessary
network insight for autonomous networks and address the shortcomings of
conventional OAM techniques. </t>
<t>Network telemetry usually assumes that data consumers are machines rather
than human operators. Hence, network telemetry can directly trigger automated
network operations, whereas some conventional OAM tools were designed and used
to help human operators monitor and diagnose networks and guide manual network
operations. This difference in the intended consumer leads to very different
techniques. </t>
<t>Although new network telemetry techniques are emerging and subject to
continuous evolution, several characteristics of network telemetry have been
well accepted. Note that network telemetry is intended to be an umbrella term
covering a wide spectrum of techniques, so the following characteristics are not
expected to hold for every specific technique.</t>
<ul spacing="normal">
<li>Push and Streaming: Instead of polling data from network devices,
telemetry collectors subscribe to streaming data pushed from data sources in
network devices.</li>
<li>Volume and Velocity: Telemetry data is intended to be consumed by machines
rather than by human beings. Therefore, the data volume can be huge, and the
processing is optimized for the needs of automation in real time.</li>
<li>Normalization and Unification: Telemetry aims to address the overall
network automation needs. Efforts are made to normalize the data representation
and unify the protocols, so as to simplify data analysis and provide integrated
analysis across heterogeneous devices and data sources across a network.</li>
<li>Model-Based: Telemetry data is modeled in advance, which allows
applications to configure and consume data with ease. </li>
<li>Data Fusion: The data for a single application can come from multiple data
sources (e.g., cross-domain, cross-device, and cross-layer) that are based on a
common name/ID and need to be correlated to take effect.</li>
<li>Dynamic and Interactive: Since network telemetry is meant to be used in a
closed control loop for network automation, it needs to run continuously and
adapt to dynamic and interactive queries from the network operation
controller. </li>
</ul>
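<t>The following Python sketch illustrates the "Push and Streaming" and "Dynamic and Interactive" characteristics. A collector subscribes once, and the data source then pushes periodic samples or on-change updates without being polled. The model is hypothetical and in-memory only. The class, the method names, and the path strings are assumptions for illustration and do not represent the API of YANG-Push, gNMI, or any other protocol.</t>

```python
from collections import defaultdict


class TelemetrySource:
    """Hypothetical device-side data source that pushes to subscribers."""

    def __init__(self):
        self.state = {}
        # path -> list of (callback, mode, period) subscriptions
        self.subs = defaultdict(list)

    def subscribe(self, path, callback, mode="on-change", period=1):
        """Register once; no further requests are needed from the collector."""
        self.subs[path].append((callback, mode, period))

    def set_value(self, path, value):
        """Update local state and push to on-change subscribers if it changed."""
        changed = self.state.get(path) != value
        self.state[path] = value
        if changed:
            for callback, mode, _ in self.subs.get(path, []):
                if mode == "on-change":
                    callback(path, value)

    def tick(self, t):
        """Advance to time unit t and push any periodic samples that are due."""
        for path, entries in self.subs.items():
            for callback, mode, period in entries:
                if mode == "periodic" and t % period == 0:
                    callback(path, self.state.get(path))


received = []
src = TelemetrySource()
src.subscribe("/if/eth0/oper-status",
              lambda p, v: received.append(("on-change", v)))
src.subscribe("/if/eth0/in-octets",
              lambda p, v: received.append(("periodic", v)),
              mode="periodic", period=2)
src.set_value("/if/eth0/oper-status", "up")     # pushed: value changed
for t in range(4):
    src.set_value("/if/eth0/in-octets", 1000 * t)
    src.tick(t)                                 # pushes at t = 0 and t = 2
src.set_value("/if/eth0/oper-status", "up")     # not pushed: unchanged
src.set_value("/if/eth0/oper-status", "down")   # pushed: value changed
```

<t>After the run, the collector has received four updates. It never issued a request after subscribing, and the repeated "up" value produced no traffic.</t>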
<t>In addition, an ideal network telemetry solution may also have the following
features or properties:</t>
<ul spacing="normal">
<li>In-Network Customization: The data that is generated can be customized in
the network at runtime to cater to the specific needs of applications. This
requires the support of a programmable data plane, which allows probes with
custom functions to be deployed at flexible locations. </li>
<li>In-Network Data Aggregation and Correlation: Network devices and
aggregation points can work out which events and what data need to be stored,
reported, or discarded, thus reducing the load on the central collection and
processing points while still ensuring that the right information is ready to
be processed in a timely way.</li>
<li>In-Network Processing: Sometimes it is not necessary or feasible to gather
all information to a central point to be processed and acted upon. It is
possible for the data processing to be done in the network, allowing reactive
actions to be taken locally.</li>
<li>Direct Data Plane Export: Data originating from data plane forwarding chips
can be directly exported to the data consumer for efficiency, especially when
the data bandwidth is large and real-time processing is required. </li>
<li>In-Band Data Collection: In addition to the passive and active data
collection approaches, the new hybrid approach allows data to be collected
directly for any target flow along its entire forwarding path <xref
target="I-D.song-opsawg-ifit-framework" format="default"/>. </li>
</ul>
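<t>The In-Network Data Aggregation and Correlation property can be illustrated with a small Python sketch of a hypothetical device-side aggregator for a single metric, such as queue depth. It assumes a simple policy chosen only for illustration: raw samples are folded into a per-window summary and then discarded, while any sample above a configured threshold is reported as an event right away.</t>
<sourcecode type="python"><![CDATA[
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WindowSummary:
    count: int
    minimum: float
    maximum: float
    mean: float


@dataclass
class LocalAggregator:
    """Hypothetical in-network aggregator for one metric (e.g., queue depth)."""
    report_threshold: float
    events: List[float] = field(default_factory=list)   # reported at once
    _window: List[float] = field(default_factory=list)  # raw samples, dropped later

    def observe(self, value: float) -> None:
        self._window.append(value)
        if value > self.report_threshold:
            self.events.append(value)

    def close_window(self) -> Optional[WindowSummary]:
        """Summarize the current window, then discard its raw samples."""
        if not self._window:
            return None
        samples, self._window = self._window, []
        return WindowSummary(count=len(samples),
                             minimum=min(samples),
                             maximum=max(samples),
                             mean=sum(samples) / len(samples))


agg = LocalAggregator(report_threshold=80.0)
for depth in [10.0, 12.0, 95.0, 11.0]:
    agg.observe(depth)
summary = agg.close_window()
# Four raw samples leave the device as one summary plus one event (95.0).
]]></sourcecode>
<t>Only the summary and the event need to travel toward the central collector; in such a design the threshold and window length could be configured by the consuming application.</t>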
<t>A network telemetry system should not be intrusive to normal network
operations and should avoid the pitfall of the "observer effect"; that
is, it should neither change the network behavior nor affect the
forwarding performance. Moreover, high-volume telemetry traffic may cause
network congestion unless proper isolation or traffic engineering
techniques are in place, or congestion control mechanisms ensure that
telemetry traffic backs off if it exceeds the network capacity.
<xref target="RFC8084" format="default"/> and <xref target="RFC8085" format="default"/>
are relevant Best Current Practices (BCPs) in this space.</t>
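<t>To make the back-off requirement concrete, the following Python sketch shows a circuit breaker for a telemetry export stream, loosely inspired by the network transport circuit breakers of <xref target="RFC8084" format="default"/>. The loss threshold, the number of consecutive congested intervals needed to trip, and the manual reset are illustrative assumptions and are not values taken from that document.</t>
<sourcecode type="python"><![CDATA[
class TelemetryCircuitBreaker:
    """Stops a telemetry export stream under persistent congestion.

    Assumed policy (illustrative only): an interval counts as congested
    when lost/sent exceeds loss_threshold; the breaker trips after
    trip_after consecutive congested intervals and stays tripped until
    reset() is called.
    """

    def __init__(self, loss_threshold: float = 0.1, trip_after: int = 3) -> None:
        self.loss_threshold = loss_threshold
        self.trip_after = trip_after
        self.tripped = False
        self._congested_intervals = 0

    def end_interval(self, sent: int, lost: int) -> bool:
        """Record one measurement interval; return True if export may continue."""
        if sent > 0 and lost / sent > self.loss_threshold:
            self._congested_intervals += 1
        else:
            # Only persistent congestion trips the breaker.
            self._congested_intervals = 0
        if self._congested_intervals >= self.trip_after:
            self.tripped = True
        return not self.tripped

    def reset(self) -> None:
        """Re-enable export, e.g., after an operator has removed the cause."""
        self.tripped = False
        self._congested_intervals = 0


breaker = TelemetryCircuitBreaker(loss_threshold=0.1, trip_after=3)
allowed = [breaker.end_interval(sent=1000, lost=lost) for lost in (5, 200, 300, 250)]
# allowed == [True, True, True, False]: the third congested interval trips it.
]]></sourcecode>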
<t>Although in many cases a network telemetry system involves a remote
data collecting and consuming entity, there are no inherent assumptions
about how such a system should be architected. While a network
architecture with a centralized controller (e.g., SDN) seems to be a
natural fit for network telemetry, network telemetry can work in a
distributed fashion as well. For example, telemetry data producers and
consumers can have a peer-to-peer relationship, in which a network node
can be the direct consumer of telemetry data from other nodes.</t>
</section>
<section numbered="true" toc="default">
<name>The Necessity of a Network Telemetry Framework</name>
<t>Network data analytics (e.g., machine learning) is applied to network
operation automation and relies on abundant and coherent data from
networks. Data acquisition that is limited to a single source and static
in nature will in many cases not be sufficient to meet an application's
telemetry data needs. As a result, multiple data sources, involving a
variety of techniques and standards, will need to be integrated. It is
desirable to have a framework that classifies and organizes different
telemetry data sources and types, defines the components of a network
telemetry system and their interactions, and helps coordinate and
integrate multiple telemetry approaches across layers. This allows
flexible combinations of data for different applications, while
normalizing and simplifying interfaces. Specifically, such a framework
would benefit the development of network operation applications for the
following reasons:</t>
<ul spacing="normal">
<li>Future networks, autonomous or otherwise, depend on holistic and
comprehensive network visibility. Use cases and applications are better
served when they are supported uniformly and coherently, using an
integrated, converged mechanism and common telemetry data representations
wherever feasible. Therefore, the protocols and mechanisms should be
consolidated into a minimal yet comprehensive set. A telemetry framework
can help to normalize the development of these techniques.</li>
<li>Network visibility presents multiple viewpoints. For example, the
device viewpoint takes the network infrastructure as the monitoring
object, from which the network topology and device status can be
acquired, and the traffic viewpoint takes the flows or packets as the
monitoring object, from which the traffic quality and path can be
acquired. An application may need to switch its viewpoint during
operation. It may also need to correlate a service with its impact on
user experience (UE) to acquire comprehensive information.</li>
<li>Applications require network telemetry to be elastic in order to make
efficient use of network resources and to reduce the impact of
telemetry-related processing on network performance. For example, routine
network monitoring should cover the entire network with a low data
sampling rate. Only when issues arise or critical trends emerge should
telemetry data sources be modified and telemetry data rates be boosted as
needed.</li>
<li>Efficient data aggregation is critical for applications to reduce the
overall quantity of data and improve the accuracy of analysis.</li>
</ul>
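<t>The elasticity described above can be sketched as a small sampling-rate controller. In the following Python fragment, the class name, the boost factor, the rate cap, and the rule of returning to the base rate after a number of quiet intervals are all assumptions made for illustration; how an issue is detected is left outside the sketch.</t>
<sourcecode type="python"><![CDATA[
from dataclasses import dataclass, field


@dataclass
class ElasticSampler:
    """Hypothetical controller for a telemetry sampling rate (samples/s)."""
    base_rate: float = 1.0       # routine, network-wide monitoring rate
    boost_factor: float = 10.0   # multiplier applied while an issue persists
    max_rate: float = 100.0      # cap protecting device and network resources
    quiet_intervals: int = 3     # issue-free intervals before falling back
    rate: float = field(default=0.0, init=False)
    _quiet: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.rate = self.base_rate

    def update(self, issue_detected: bool) -> float:
        """Advance one interval and return the sampling rate to use next."""
        if issue_detected:
            self.rate = min(self.rate * self.boost_factor, self.max_rate)
            self._quiet = 0
        else:
            self._quiet += 1
            if self._quiet >= self.quiet_intervals:
                self.rate = self.base_rate
        return self.rate


sampler = ElasticSampler(quiet_intervals=2)
trace = [sampler.update(flag) for flag in (False, True, True, True, False, False)]
# trace == [1.0, 10.0, 100.0, 100.0, 100.0, 1.0]
]]></sourcecode>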
<t>A telemetry framework brings together all the telemetry-related work
from different sources and working groups within the IETF. This makes it
possible to assemble a comprehensive network telemetry system and to
avoid duplicated work. The framework should cover the concepts and
components from the standardization perspective. This document describes
the modules that make up a network telemetry framework and decomposes the
telemetry system into a set of distinct components to which existing and
future work can easily be mapped.</t>
</section>
</section>
<section anchor="framework" numbered="true" toc="default">
<name>Network Telemetry Framework</name>
<t>The top-level network telemetry framework partitions network telemetry
into four modules based on the source of the telemetry data objects and
shows their relationships. Once the network operation applications
acquire the data from these modules, they can apply data analytics and
take actions. At the next level, the framework decomposes each module
into separate components. Each of these modules follows the same
underlying structure, with one component dedicated to the configuration
of data subscriptions and data sources, a second component dedicated to
encoding and exporting data, and a third component instrumenting the
generation of telemetry related to the underlying resources. Throughout
the framework, the same set of abstract data-acquiring mechanisms and
data types (<xref target="sec_type" format="default"/>) is applied. The
two-level architecture with the uniform data abstraction helps to
accurately pinpoint the position of a protocol or technique in a network
telemetry system, or to disaggregate a network telemetry system into
manageable parts.</t>
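<t>The common structure shared by the four modules can be sketched in Python as three pluggable components wired into one module object. The class names, the path strings, and the choice of JSON encoding are hypothetical; they only show how configuration, instrumentation, and encoding/export can be kept separate and are not interfaces defined by this framework.</t>
<sourcecode type="python"><![CDATA[
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubscriptionConfig:
    """Configuration component: the data items a consumer subscribed to."""
    paths: List[str]


class Instrumentation:
    """Instrumentation component: reads values from the underlying resource."""

    def __init__(self, readers: Dict[str, Callable[[], float]]) -> None:
        self._readers = readers

    def read(self, path: str) -> float:
        return self._readers[path]()


class JsonExporter:
    """Encoding/export component: serializes one snapshot (JSON for brevity)."""

    def encode(self, snapshot: Dict[str, float]) -> bytes:
        return json.dumps(snapshot, sort_keys=True).encode("utf-8")


class TelemetryModule:
    """One plane's telemetry module, assembled from the three components."""

    def __init__(self, config: SubscriptionConfig,
                 instrumentation: Instrumentation,
                 exporter: JsonExporter) -> None:
        self.config = config
        self.instrumentation = instrumentation
        self.exporter = exporter

    def collect(self) -> bytes:
        # Only subscribed paths are read and exported.
        snapshot = {p: self.instrumentation.read(p) for p in self.config.paths}
        return self.exporter.encode(snapshot)


module = TelemetryModule(
    SubscriptionConfig(paths=["interfaces/eth0/in-octets"]),
    Instrumentation({"interfaces/eth0/in-octets": lambda: 1500.0,
                     "system/cpu-load": lambda: 0.3}),
    JsonExporter(),
)
message = module.collect()
# message == b'{"interfaces/eth0/in-octets": 1500.0}'; cpu-load is not subscribed.
]]></sourcecode>
<t>Replacing JsonExporter with a compact binary encoder, or Instrumentation with a forwarding-chip reader, would leave the other components untouched.</t>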
<section numbered="true" toc="default">
<name>Top-Level Modules</name>
<t>Telemetry can be applied on the forwarding plane, control plane, and
management plane in a network, as well as on other sources outside the
network, as shown in <xref target="figure_1" format="default"/>.
Therefore, we categorize network telemetry into four distinct modules
(management plane, control plane, forwarding plane, and external data and
event telemetry), each with its own interface to network operation
applications.</t>
<figure anchor="figure_1">
<name>Modules in Layer Category of the Network Telemetry Framework</name>
<artwork name="" type="" align="left" alt=""><![CDATA[
[Figure 1: the Network Operation Applications interact with four
telemetry modules: Management Plane Telemetry, Control Plane Telemetry,
Forwarding Plane Telemetry, and External Data and Event Telemetry]
]]></artwork>
</figure>
<t>The rationale of this partition lies in the different telemetry data
objects, which result in different data sources and export locations.
Such differences have profound implications on in-network data
programming and processing capability, data encoding and transport
protocols, and the required data bandwidth and latency. Data can be sent
directly or proxied via the control and management planes. Both
approaches have advantages and disadvantages.</t>
<t>Note that in some cases, the network controller itself may be the
source of telemetry data that is unique to it or derived from the
telemetry data collected from the network elements. Some of the
principles and taxonomy specific to control plane and management plane
telemetry could also be applied to the controller when it is required to
provide telemetry data to network operation applications hosted outside.
This document focuses on network element telemetry; further details
related to controllers are out of scope.</t>
<t>We summarize the major differences of the four modules in
<xref target="table_1"/>. They are compared from six angles:</t>
<ul spacing="normal">
<li>Data Object</li>
<li>Data Export Location</li>
<li>Data Model</li>
<li>Data Encoding</li>
<li>Telemetry Application Protocol</li>
<li>Data Transport Method</li>
</ul>
<t>Data Object is the target and source of each module. Because the data
source varies, the location where data is most conveniently exported also
varies. For example, forwarding plane data mainly originates as data
exported from the forwarding Application-Specific Integrated Circuits
(ASICs), while control plane data mainly originates from the protocol
daemons running on the control CPU(s). For convenience and efficiency, it
is preferable to export the data off the device from locations near the
source. Because the locations that can export data have different
capabilities, different choices of data models, encoding, and transport
methods are made to balance performance and cost. For example, the
forwarding chip has high throughput but limited capacity for processing
complex data and maintaining state, while the main control CPU is capable
of complex data and state processing but has limited bandwidth for
high-throughput data. As a result, the suitable telemetry protocol for
each module can be different. Some representative techniques are shown in
the corresponding table cells to highlight the technical diversity of
these modules. Note that the selected techniques just reflect the de facto
state of the art and are by no means exhaustive (e.g., IPFIX can also be
implemented over TCP and SCTP, but that is not recommended for the
forwarding plane). The key point is that one cannot expect a universal
protocol to cover all network telemetry requirements.</t>
<table anchor="table_1">
<name>Comparison of Data Object Modules</name>
<thead>
<tr>
<th>Module</th>
<th>Management Plane</th>
<th>Control Plane</th>
<th>Forwarding Plane</th>
<th>External Data</th>
</tr>
</thead>
<tbody>
<tr>
<td>Object</td>
<td>configuration and operation state</td>
<td>control protocol and signaling, RIB</td>
<td>flow and packet QoS, traffic statistics, buffer and queue statistics, FIB, Access Control List (ACL)</td>
<td>terminal, social, and environmental</td>
</tr>
<tr>
<td>Export Location</td>
<td>main control CPU</td>
<td>main control CPU, linecard CPU, or forwarding chip</td>
<td>forwarding chip or linecard CPU; main control CPU unlikely</td>
<td>various</td>
</tr>
<tr>
<td>Data Model</td>
<td>YANG, MIB, syslog</td>
<td>YANG, custom</td>
<td>YANG, custom</td>
<td>YANG, custom</td>
</tr>
<tr>
<td>Data Encoding</td>
<td>GPB, JSON, XML</td>
<td>GPB, JSON, XML, plain text</td>
<td>plain text</td>
<td>GPB, JSON, XML, plain text</td>
</tr>
<tr>
<td>Application Protocol</td>
<td>gRPC, NETCONF, RESTCONF</td>
<td>gRPC, NETCONF, IPFIX, traffic mirroring</td>
<td>IPFIX, traffic mirroring, gRPC, NETFLOW</td>
<td>gRPC</td>
</tr>
<tr>
<td>Data Transport</td>
<td>HTTP(S), TCP</td>
<td>HTTP(S), TCP, UDP</td>
<td>UDP</td>
<td>HTTP(S), TCP, UDP</td>
</tr>
</tbody>
</table>
<t>Note that the interaction with the applications that consume network
telemetry data can be indirect, since some data is transferred within
the device. For example, management plane telemetry needs to acquire
data from the data plane, because some operational states, such as
interface status and statistics, can only be derived from data plane
sources. As another example, obtaining control plane telemetry data may
require access to the Forwarding Information Base (FIB) of the data
plane.</t>
<t>On the other hand, an application may involve more than one plane
and interact with multiple planes simultaneously. For example, a Service
Level Agreement (SLA) compliance application may require both data plane
telemetry and control plane telemetry.</t>
<t>The requirements and challenges for each module are summarized as
follows. Note that the requirements may pertain to all telemetry modules;
however, we emphasize those that are most pronounced for a particular
plane.</t>
<section numbered="true" toc="default">
<name>Management Plane Telemetry</name>
<t>The management plane of network elements interacts with the Network
Management System (NMS) and provides information such as performance
data, network logging data, network warning and defects data, and network
statistics and state data. The management plane uses many protocols,
including the classic SNMP and syslog. Regardless of the protocol,
management plane telemetry must address the following requirements:</t>
<ul spacing="normal">
<li>Convenient Data Subscription: An application should have the freedom
to choose which data is exported (see <xref target="sec_type" format="default"/>)
and the means and frequency of how that data is exported (e.g., on-change
or periodic subscription).</li>
<li>Structured Data: For automatic network operation, machines will
replace humans for network data comprehension. Data modeling languages,
such as YANG, can efficiently describe structured data and normalize data
encoding and transformation.</li>
<li>High-Speed Data Transport: In order to keep up with the velocity of
information, a data source needs to be able to send large amounts of data
at high frequency. Compact encoding formats or data compression schemes
are needed to reduce the quantity of data and improve the data transport
efficiency. Replacing the query mode with the subscription mode reduces
the interactions between clients and servers and helps to improve the
data source's efficiency.</li>
<li>Network Congestion Avoidance: The application must protect the
network from congestion with congestion control mechanisms or, at
minimum, with circuit breakers. <xref target="RFC8084" format="default"/>
and <xref target="RFC8085" format="default"/> provide some solutions in
this space.</li>
</ul>
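<t>The two subscription styles mentioned under Convenient Data Subscription can be contrasted with a short Python sketch. The Subscription class, the run helper, and the use of integer ticks as time are hypothetical simplifications; the mode names "periodic" and "on-change" mirror the text above and are not tied to the API of any particular subscription protocol.</t>
<sourcecode type="python"><![CDATA[
from typing import Any, List, Optional, Sequence, Tuple

Update = Tuple[int, Any]


class Subscription:
    """Hypothetical subscription to one data item.

    "periodic": push the current value every `period` ticks.
    "on-change": push only when the value differs from the last one pushed.
    """

    def __init__(self, mode: str, period: int = 1) -> None:
        if mode not in ("periodic", "on-change"):
            raise ValueError(f"unknown subscription mode: {mode}")
        if period < 1:
            raise ValueError("period must be at least one tick")
        self.mode = mode
        self.period = period
        self._last: Optional[Any] = None
        self._pushed_once = False

    def on_tick(self, tick: int, value: Any) -> Optional[Update]:
        """Return (tick, value) if an update is pushed at this tick."""
        if self.mode == "periodic":
            return (tick, value) if tick % self.period == 0 else None
        if not self._pushed_once or value != self._last:
            self._last, self._pushed_once = value, True
            return (tick, value)
        return None


def run(sub: Subscription, values: Sequence[Any]) -> List[Update]:
    """Feed one value per tick and collect the pushed updates."""
    updates: List[Update] = []
    for tick, value in enumerate(values):
        update = sub.on_tick(tick, value)
        if update is not None:
            updates.append(update)
    return updates


link_state = ["up", "up", "up", "down", "down", "up"]
periodic_updates = run(Subscription("periodic", period=2), link_state)
on_change_updates = run(Subscription("on-change"), link_state)
# periodic_updates  == [(0, "up"), (2, "up"), (4, "down")]
# on_change_updates == [(0, "up"), (3, "down"), (5, "up")]
]]></sourcecode>
<t>In this trace both subscriptions push three updates, but only the on-change subscription reports the link failure at the tick where it happens and also reports the recovery; the periodic subscription sees the failure one tick late and does not report the recovery within the trace.</t>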
</section>
<section numbered="true" toc="default">
<name>Control Plane Telemetry</name>
<t>Control plane telemetry refers to monitoring the health of the
different network control protocols at all layers of the protocol stack.
Keeping track of the operational status of these protocols is beneficial
for detecting, localizing, and even predicting various network issues, as
well as for network optimization, in real time and with fine granularity.
Some particular challenges and issues faced by control plane telemetry are
as follows:</t>
<ul spacing="normal">
<li>Correlating the End-to-End (E2E) Key Performance Indicators (KPIs)
with a specific layer's KPIs is difficult. For example, IPTV users may
describe their UE by the video smoothness and definition. In case of an
unusually poor UE KPI or a service disconnection, it is non-trivial to
delimit and pinpoint the issue in the responsible protocol layer (e.g.,
the transport layer or the network layer), the responsible protocol
(e.g., IS-IS or BGP at the network layer), and finally the responsible
device(s) with specific reasons.</li>
<li>Conventional OAM-based approaches for control plane KPI measurement
include Ping (L3), Traceroute (L3), <xref target="y1731" format="default">Y.1731</xref>
(L2), and so on. A common issue with these methods is that they only
measure the KPIs instead of reflecting the actual running status of the
protocols, which makes them less effective or efficient for control plane
troubleshooting and network optimization.</li>
<li>More research is needed beyond the BGP Monitoring Protocol (BMP). BMP
is an example of control plane telemetry; it is currently used for
monitoring BGP routes and enables rich applications, such as BGP peer
analysis, Autonomous System (AS) analysis, prefix analysis, and security
analysis. However, the monitoring of other layers and protocols, as well
as cross-layer, cross-protocol KPI correlation, is still in its infancy
(e.g., IGP monitoring is not as extensive as BMP) and requires further
research.</li>
</ul>
nct conceptual components:</t> <t> Note that the requirement and solutions for network congest
<t> ion avoidance are also applicable to the control plane telemetry. </t>
<list style="symbols"> </section>
<section numbered="true" toc="default">
<name>Forwarding Plane Telemetry</name>
<t>An effective forwarding plane telemetry system relies on the data that the network device can expose. The quality, quantity, and timeliness of data must meet some stringent requirements. This raises some challenges for the network data plane devices where the first-hand data originates.</t>
<ul spacing="normal">
<li>A data plane device's main function is user traffic processing and forwarding. While supporting network visibility is important, telemetry is just an auxiliary function, and it should strive not to impede normal traffic processing and forwarding (i.e., the forwarding behavior should not be altered, and the trade-off between forwarding performance and telemetry should be well balanced).</li>
<li>Network operation applications require end-to-end visibility across various sources, which can result in a huge volume of data. However, the sheer quantity of data must not exhaust the network bandwidth, regardless of the data delivery approach (i.e., whether through in-band or out-of-band channels).</li>
<li>The data plane devices must provide timely data with the minimum possible delay. Long processing, transport, storage, and analysis delays can impact the effectiveness of the control loop and even render the data useless.</li>
<li>The data should be structured, labeled, and easy for applications to parse and consume. At the same time, the data types needed by applications can vary significantly. The data plane devices need to provide enough flexibility and programmability to support the precise data provision for applications.</li>
<li>The data plane telemetry should support incremental deployment and work even though some devices are unaware of the system.</li>
<li>The requirements and solutions for network congestion avoidance also apply to forwarding plane telemetry.</li>
</ul>
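<t>The bandwidth requirement above can be made concrete with a token bucket that caps how many bytes of telemetry a device exports per second. The following is a minimal, hypothetical sketch. The class name, the byte-based budget, and the drop-when-over-budget policy are illustrative assumptions. None of them is behavior specified by any telemetry protocol.</t>
<sourcecode type="python"><![CDATA[
import time
from typing import Callable


class TelemetryExportLimiter:
    """Hypothetical token-bucket limiter capping telemetry export in bytes/second."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_bytes_per_s   # sustained export budget
        self.capacity = burst_bytes    # largest burst the bucket can absorb
        self.tokens = burst_bytes      # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Credit tokens for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_export(self, record_size: int) -> bool:
        """Return True if a record of record_size bytes may be exported now."""
        self._refill()
        if record_size <= self.tokens:
            self.tokens -= record_size
            return True
        # Over budget: the caller drops, samples, or defers the record
        # rather than letting telemetry consume user-traffic bandwidth.
        return False


class ManualClock:
    """Deterministic clock for the usage example."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = ManualClock()
limiter = TelemetryExportLimiter(rate_bytes_per_s=1000, burst_bytes=1500, clock=clock)
print(limiter.try_export(1000))  # True: 500 bytes of budget remain
print(limiter.try_export(1000))  # False: over budget
clock.now = 1.0
print(limiter.try_export(1000))  # True: one second refilled the bucket
]]></sourcecode>
<t>The burst size bounds short spikes of telemetry, and the rate bounds the long-term share of link capacity. Both would be operator-configured in such a design.</t>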
<t>Although not specific to the forwarding plane, these challenges are more difficult for the forwarding plane because of its limited resources and flexibility. Data plane programmability is essential to support network telemetry. Newer data plane forwarding chips are equipped with advanced telemetry features and provide flexibility to support customized telemetry functions. </t>
<t>Technique Taxonomy: This pertains to how one instruments the telemetry; there are multiple possible dimensions along which to classify the forwarding plane telemetry techniques.</t>
<ul spacing="normal">
<li> Active, Passive, and Hybrid: This dimension pertains to end-to-end measurement. Active and passive methods (as well as the hybrid types) are well documented in <xref target="RFC7799" format="default"/>. Passive methods include TCPDUMP, <xref target="RFC7011" format="default">IPFIX</xref>, sFlow, and traffic mirroring. These methods usually have low data coverage, and improving their coverage comes at a very high bandwidth cost. On the other hand, active methods include Ping, the <xref target="RFC4656" format="default">One-Way Active Measurement Protocol (OWAMP)</xref>, the <xref target="RFC5357" format="default">Two-Way Active Measurement Protocol (TWAMP)</xref>, the <xref target="RFC8762" format="default">Simple Two-way Active Measurement Protocol (STAMP)</xref>, and <xref target="RFC6812" format="default">Cisco's SLA Protocol</xref>. These methods are intrusive and only provide indirect network measurements. Hybrid methods, including <xref target="RFC9197" format="default">IOAM</xref>, <xref target="RFC8321" format="default">Alternate Marking (AM)</xref>, and <xref target="RFC8889" format="default">Multipoint Alternate Marking</xref>, provide a well-balanced and more flexible approach. However, these methods are also more complex to implement.</li>
<li> In-Band and Out-of-Band: Telemetry data carried in user packets before being exported to a data collector is considered in-band (e.g., <xref target="RFC9197" format="default">IOAM</xref>). Telemetry data that is directly exported to a data collector without modifying user packets is considered out-of-band (e.g., the postcard-based approach described in <xref target="pbt" format="default"/>). It is also possible to have hybrid methods, where only the telemetry instruction or partial data is carried by user packets (e.g., <xref target="RFC8321" format="default">AM</xref>). </li>
<li> End-to-End and In-Network: End-to-end methods start from, and end at, the network end hosts (e.g., Ping). In-network methods work in networks and are transparent to end hosts. However, if needed, in-network methods can be easily extended into end hosts. </li>
<li> Data Subject: Depending on the telemetry objective, the methods can be flow based (e.g., <xref target="RFC9197" format="default">IOAM</xref>), path based (e.g., Traceroute), or node based (e.g., <xref target="RFC7011" format="default">IPFIX</xref>). The data objects can be packets, flow records, measurements, states, and signals.</li>
</ul>
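<t>The in-band versus out-of-band dimension can be illustrated with a toy model of one packet crossing three nodes.</t>
<ul spacing="normal">
<li>In the in-band mode, each node appends a (node, timestamp) record to the packet, and only the last node exports the accumulated trace, loosely in the spirit of IOAM.</li>
<li>In the out-of-band mode, each node exports its own postcard, and the user packet is never modified.</li>
</ul>
<t>This is a hypothetical sketch. The Packet class, the record format, and both functions are illustrative assumptions. They do not follow the IOAM or postcard-based telemetry encodings.</t>
<sourcecode type="python"><![CDATA[
from dataclasses import dataclass, field
from typing import List, Tuple

Hop = Tuple[str, int]  # (node name, timestamp): hypothetical record format


@dataclass
class Packet:
    flow_id: str
    # Telemetry records carried inside the packet (used only in the in-band mode).
    trace: List[Hop] = field(default_factory=list)


def in_band(packet: Packet, path: List[Hop]) -> List[List[Hop]]:
    """Each node appends its record to the packet; the last node exports the trace."""
    for node, ts in path:
        packet.trace.append((node, ts))   # user packet grows at every hop
    exports = [list(packet.trace)]        # single export at the end of the path
    packet.trace.clear()                  # strip telemetry before delivery to the host
    return exports


def out_of_band(packet: Packet, path: List[Hop]) -> List[List[Hop]]:
    """Each node exports its own postcard directly; the user packet is untouched."""
    return [[(node, ts)] for node, ts in path]


path = [("R1", 100), ("R2", 130), ("R3", 175)]
print(in_band(Packet("f1"), path))      # one export holding three hop records
print(out_of_band(Packet("f1"), path))  # three exports holding one record each
]]></sourcecode>
<t>The trade-off is visible in the outputs. The in-band mode produces a single export but enlarges every user packet in flight. The out-of-band mode leaves packets untouched at the cost of one export per node.</t>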
</section>
<section numbered="true" toc="default">
<name>External Data Telemetry</name>
<t>Events that occur outside the boundaries of the network system are another important source of network telemetry. Correlating both internal telemetry data and external events with the requirements of network systems, as presented in <xref target="NMRG-ANTICIPATED-ADAPTATION" format="default"/>, provides a strategic and functional advantage to management operations. </t>
<t>As with other sources of telemetry information, the data and events must meet strict requirements, especially in terms of timeliness, which is essential to properly incorporate external event information into network management applications. The specific challenges are described as follows:</t>
<ul spacing="normal">
<li>The role of the external event detector can be played by multiple elements, including hardware (e.g., physical sensors, such as seismometers) and software (e.g., big data sources that can analyze streams of information, such as Twitter messages). Thus, the transmitted data must support different shapes but, at the same time, follow a common but extensible schema. </li>
<li>Since the main function of the external event detectors is to perform the notifications, their timeliness is assumed. However, once messages have been dispatched, they must be quickly collected and inserted into the control plane with variable priority, which is higher for important sources and events and lower for secondary ones. </li>
<li>The schema used by external detectors must be easily adopted by current and future devices and applications. Therefore, it must be easily mapped to current data models, such as those defined in YANG. </li>
<li>As the communication with external entities outside the boundary of a provider network may be realized over the Internet, the risk of congestion is even more relevant in this context, and proper countermeasures must be taken. Solutions such as network transport circuit breakers are needed as well.</li>
</ul>
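<t>The collection side of these challenges can be sketched as follows. The sketch assumes a hypothetical common event schema. The fixed fields (source, kind, priority) are shared by all detectors. The open-ended attributes map carries detector-specific data. The collector hands events to the control plane in priority order and keeps arrival order among equal priorities. The field names and the numeric priority scale are illustrative assumptions, not a defined YANG model.</t>
<sourcecode type="python"><![CDATA[
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExternalEvent:
    # Hypothetical common fields shared by all external detectors.
    source: str
    kind: str
    priority: int                      # higher value = more important
    # Extensible part: detector-specific payload (sensor readings, text, ...).
    attributes: Dict[str, Any] = field(default_factory=dict)


class EventIngestor:
    """Collects dispatched events and releases them by priority, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: List[tuple] = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def collect(self, event: ExternalEvent) -> None:
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-event.priority, next(self._seq), event))

    def next_for_control_plane(self) -> ExternalEvent:
        return heapq.heappop(self._heap)[2]


ingestor = EventIngestor()
ingestor.collect(ExternalEvent("social-feed", "outage-report", 1, {"text": "link down reported"}))
ingestor.collect(ExternalEvent("seismometer-7", "earthquake", 9, {"magnitude": 5.8}))
print(ingestor.next_for_control_plane().kind)  # earthquake
]]></sourcecode>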
<t>Organizing both internal and external telemetry information together will be key for the general exploitation of the management possibilities of current and future network systems, as reflected in the incorporation of cognitive capabilities into new hardware and software (virtual) elements. </t>
</section>
</section>
<section numbered="true" toc="default">
<name>Second-Level Function Components</name>
<t>The telemetry module at each plane can be further partitioned into five distinct conceptual components:</t>
<ul spacing="normal">
<li> Data Query, Analysis, and Storage: This component works at the network operation application block in <xref target="figure_1" format="default"/>. It is normally a part of the network management system at the receiver side. On the one hand, it is responsible for issuing data requirements. The data of interest can be modeled data through configuration or custom data through programming. The data requirements can be queries for one-shot data or subscriptions for events or streaming data. On the other hand, it receives, stores, and processes the returned data from network devices. Data analysis can be interactive, initiating further data queries. This component can reside in either network devices or remote controllers. It can be centralized or distributed and involve one or more instances.</li>
<li> Data Configuration and Subscription: This component manages data queries on devices. It determines the protocol and channel for applications to acquire desired data. This component is also responsible for configuring the desired data that might not be directly available from data sources. The subscription data can be described by models, templates, or programs. </li>
<li> Data Encoding and Export: This component determines how telemetry data is delivered to the data analysis and storage component with access control. The data encoding and the transport protocol may vary depending on where the data is exported.</li>
<li> Data Generation and Processing: The requested data needs to be captured, filtered, processed, and formatted in network devices from raw data sources. This may involve in-network computing and processing on either the fast path or the slow path in network devices.</li>
<li> Data Object and Source: This component determines the monitoring objects and original data sources provisioned in the device. A data source usually just provides raw data that needs further processing. Each data source can be considered a probe. Some data sources can be dynamically installed, while others will be more static.</li>
</ul>
<figure anchor="figure_3">
<name>Components in the Network Telemetry Framework</name>
<artwork name="" type="" align="left" alt=""><![CDATA[
     +----------------------------------------+
   +----------------------------------------+ |
   |                                        | |
   |    Data Query, Analysis, & Storage     | |
   |                                        | +
   +-------+++ -----------------------------+
           |||                   ^^^
           |||                   |||
           ||V                   |||
        +--+V--------------------+++-------------+
      +----V----------------------+------------+ |
    +---------------------+-------+----------+ | |
    | Data Configuration  |                  | | |
    | & Subscription      | Data Encoding    | | |
    | (model, template,   | & Export         | | |
    | & program)          |                  | | |
    +---------------------+------------------| | |
    |                                        | | |
    |     Data Generation                    | | |
    |     & Processing                       | | |
    |                                        | | |
    +----------------------------------------| | |
    |                                        | | |
    |     Data Object and Source             | |-+
    |                                        |-+
    +----------------------------------------+
]]></artwork>
</figure>
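<t>The following hypothetical sketch follows a single interface counter through the five components, from source to analysis. The function names, object paths, and JSON encoding are all illustrative assumptions. Actual deployments would use the configuration, encoding, and export mechanisms mapped to the framework elsewhere in this document.</t>
<sourcecode type="python"><![CDATA[
import json
from typing import Dict, List


# Data Object and Source: a raw probe (hypothetical interface counters).
def interface_probe() -> Dict[str, int]:
    return {"eth0.rx_packets": 1200, "eth0.rx_errors": 3, "eth1.rx_packets": 80}


# Data Configuration and Subscription: what the application asked for.
subscription = {"objects": ["eth0.rx_errors", "eth0.rx_packets"], "encoding": "json"}


# Data Generation and Processing: filter raw data down to the requested objects.
def generate(raw: Dict[str, int], wanted: List[str]) -> Dict[str, int]:
    return {k: v for k, v in raw.items() if k in wanted}


# Data Encoding and Export: serialize according to the subscription.
def encode(data: Dict[str, int], encoding: str) -> bytes:
    if encoding != "json":
        raise ValueError(f"unsupported encoding: {encoding}")
    return json.dumps(data, sort_keys=True).encode()


# Data Query, Analysis, and Storage: decode, store, and analyze at the receiver.
store: List[Dict[str, int]] = []


def analyze(payload: bytes) -> float:
    record = json.loads(payload)
    store.append(record)
    return record["eth0.rx_errors"] / record["eth0.rx_packets"]  # error ratio


payload = encode(generate(interface_probe(), subscription["objects"]), subscription["encoding"])
print(payload)           # b'{"eth0.rx_errors": 3, "eth0.rx_packets": 1200}'
print(analyze(payload))  # 0.0025
]]></sourcecode>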
</section>
<section anchor="sec_type" numbered="true" toc="default">
<name>Data Acquisition Mechanism and Type Abstraction</name>
<t>Broadly speaking, network data can be acquired through subscription (push) and query (poll). A subscription is a contract between publisher and subscriber. After initial setup, the subscribed data is automatically delivered to registered subscribers until the subscription expires. There are two variations of subscription: the subscriptions can be predefined, or the subscribers are allowed to configure and tailor the published data to their specific needs.</t>
<t>In contrast, queries are used when a client expects immediate and one-off feedback from network devices. The queried data may be directly extracted from some specific data source or synthesized and processed from raw data. Queries work well for interactive network telemetry applications. </t>
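<t>The contrast between the two acquisition mechanisms can be sketched as follows. This is a hypothetical, in-process model. The Publisher class, its method names, and the use of an integer logical time for subscription expiry are assumptions for illustration. They do not correspond to YANG-Push, gNMI, or any other protocol API.</t>
<sourcecode type="python"><![CDATA[
from typing import Callable, Dict, List, Tuple

Callback = Callable[[str, int], None]


class Publisher:
    """Hypothetical device-side publisher supporting subscription (push) and query (poll)."""

    def __init__(self) -> None:
        self.datastore: Dict[str, int] = {}
        # Each subscription: (path, callback, expiry time).
        self.subscriptions: List[Tuple[str, Callback, int]] = []

    def subscribe(self, path: str, callback: Callback, expires_at: int) -> None:
        # Set up once; later updates are delivered without further requests.
        self.subscriptions.append((path, callback, expires_at))

    def query(self, path: str) -> int:
        # One-off, immediate answer to a client request.
        return self.datastore[path]

    def update(self, path: str, value: int, now: int) -> None:
        self.datastore[path] = value
        # Drop expired subscriptions, then push the change to the remaining ones.
        self.subscriptions = [s for s in self.subscriptions if s[2] > now]
        for sub_path, callback, _ in self.subscriptions:
            if sub_path == path:
                callback(path, value)


received: List[Tuple[str, int]] = []
pub = Publisher()
pub.subscribe("if/eth0/oper-status", lambda p, v: received.append((p, v)), expires_at=10)
pub.update("if/eth0/oper-status", 0, now=1)   # pushed to the subscriber
pub.update("if/eth0/oper-status", 1, now=12)  # subscription expired: not pushed
print(received)                               # [('if/eth0/oper-status', 0)]
print(pub.query("if/eth0/oper-status"))      # 1
]]></sourcecode>
<t>The subscriber learns about the first change without asking. After expiry, the current value is still available, but only by polling.</t>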
<t>In general, data can be pulled (i.e., queried) whenever needed, but in many cases, pushing the data (i.e., subscription) is more efficient, and it can reduce the latency of a client detecting a change. From the data consumer's point of view, there are four types of data from network devices that a telemetry data consumer can subscribe to or query:</t>
<ul spacing="normal">
<li> Simple Data: Data that are steadily available from some datastore or static probes in network devices.</li>
<li> Derived Data: Data that need to be synthesized or processed in the network from raw data from one or more network devices. The data processing function can be statically or dynamically loaded into network devices.</li>
<li> Event-triggered Data: Data that are conditionally acquired based on the occurrence of some events. An example of event-triggered data could be an interface changing operational state between up and down. Such data can be actively pushed through subscription or passively polled through query. There are many ways to model events, including using a Finite State Machine (FSM) or <xref target="I-D.ietf-netmod-eca-policy" format="default">Event Condition Action (ECA)</xref>. </li>
<li> Streaming Data: Data that are continuously generated. They can be a time series or the dump of databases. For example, an interface packet counter is exported every second. The streaming data reflect real-time network states and metrics and require large bandwidth and processing power. The streaming data are always actively pushed to the subscribers.</li>
</ul>
<t>The above telemetry data types are not mutually exclusive. Rather, they are often composite. Derived data is composed of simple data; event-triggered data can be simple or derived; and streaming data can be based on some recurring event. The relationships of these data types are illustrated in <xref target="figure_0" format="default"/>. </t>
<figure anchor="figure_0">
<name>Data Type Relationship</name>
<artwork name="" type="" align="left" alt=""><![CDATA[
+----------------------+     +-----------------+
| Event-Triggered Data |<----+  Streaming Data |
+-------+---+----------+     +-----+---+-------+
        |   |                      |   |
        |   |                      |   |
        |   |   +--------------+   |   |
        |   +-->| Derived Data |<--+   |
        |       +------+-------+       |
        |              |               |
        |              V               |
        |       +--------------+       |
        +------>| Simple Data  |<------+
                +--------------+
]]></artwork>
</figure>
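<t>The composition of the four data types can be illustrated with a small, hypothetical example built on an interface packet counter and an operational-status flag:</t>
<ul spacing="normal">
<li>Derived data is computed from simple data.</li>
<li>Event-triggered data is emitted only on a state change.</li>
<li>Streaming data is a recurring export built on the other two.</li>
</ul>
<t>The function names, sample values, and the up/down state encoding are assumptions chosen for illustration.</t>
<sourcecode type="python"><![CDATA[
from typing import Iterator, List, Optional, Tuple

# Simple data: counter readings taken once per second (hypothetical values).
counter_samples = [1000, 1600, 2300, 2300]
oper_status = ["up", "up", "down", "down"]


def derived_rate(prev: int, curr: int, interval_s: float = 1.0) -> float:
    """Derived data: packets per second computed from two simple counter readings."""
    return (curr - prev) / interval_s


def state_change_event(prev: str, curr: str) -> Optional[str]:
    """Event-triggered data: emitted only when the operational state changes."""
    return f"oper-status {prev}->{curr}" if prev != curr else None


def stream() -> Iterator[Tuple[int, float, Optional[str]]]:
    """Streaming data: a recurring (every-second) export built on the other types."""
    for t in range(1, len(counter_samples)):
        rate = derived_rate(counter_samples[t - 1], counter_samples[t])
        event = state_change_event(oper_status[t - 1], oper_status[t])
        yield t, rate, event


exports: List[Tuple[int, float, Optional[str]]] = list(stream())
print(exports)
# [(1, 600.0, None), (2, 700.0, 'oper-status up->down'), (3, 0.0, None)]
]]></sourcecode>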
<t>Subscription usually deals with event-triggered data and streaming data, and query usually deals with simple data and derived data. However, the other combinations are also possible. Advanced network telemetry techniques are designed mainly for event-triggered or streaming data subscription and derived data query.</t>
</section>
<section numbered="true" toc="default">
<name>Mapping Existing Mechanisms into the Framework</name>
<t>The following table shows how the existing mechanisms (mainly published by the IETF, with an emphasis on the latest technologies) are positioned in the framework. Given the vast body of existing work, we cannot provide an exhaustive list, so the mechanisms in the table should be considered as just examples. Also, some comprehensive protocols and techniques may cover multiple aspects or modules of the framework, so a name in a block only emphasizes one particular characteristic of it. More details about some listed mechanisms can be found in Appendix A.</t>
<table anchor="table_2">
<name>Existing Work Mapping</name>
<thead>
<tr>
<th></th>
<th>Management Plane</th>
<th>Control Plane</th>
<th>Forwarding Plane</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data Configuration and Subscription</td>
<td>gNMI, NETCONF, RESTCONF, SNMP, YANG-Push</td>
<td>gNMI, NETCONF, RESTCONF, YANG-Push</td>
<td>NETCONF, RESTCONF, YANG-Push</td>
</tr>
<tr>
<td>Data Generation and Processing</td>
<td>MIB, YANG</td>
<td>YANG</td>
<td>IOAM, PSAMP, PBT, AM</td>
</tr>
<tr>
<td>Data Encoding and Export</td>
<td>gRPC, HTTP, TCP</td>
<td>BMP, TCP</td>
<td>IPFIX, UDP</td>
</tr>
</tbody>
</table>
<t>Although the framework is generally suitable for any network environment, multi-domain telemetry has some unique challenges that deserve further architectural consideration, which is out of the scope of this document.</t>
</section>
</section>
<section anchor="level" numbered="true" toc="default">
<name>Evolution of Network Telemetry Applications</name>
<t>Network telemetry is an evolving technical area. As the network moves towards automated operation, network telemetry applications undergo several stages of evolution, each of which adds a new layer of requirements to the underlying network telemetry techniques. Each stage is built upon the techniques adopted by the previous stages plus some new requirements.</t>
<dl newline="false" spacing="normal">
<dt>Stage 0 - Static Telemetry:</dt>
<dd> The telemetry data source and type are determined at design time. The network operator can only configure how to use it with limited flexibility. </dd>
<dt>Stage 1 - Dynamic Telemetry:</dt>
<dd> The custom telemetry data can be dynamically programmed or configured at runtime without interrupting the network operation, allowing a trade-off among resource, performance, flexibility, and coverage.</dd>
<dt>Stage 2 - Interactive Telemetry:</dt>
<dd> The network operator can continuously customize and fine-tune the telemetry data in real time to reflect the network operation's visibility requirements. Compared with Stage 1, the changes are frequent and based on real-time feedback. At this stage, some tasks can be automated, but human operators still need to sit in the middle to make decisions. </dd>
<dt>Stage 3 - Closed-Loop Telemetry:</dt>
<dd> The telemetry is free from the interference of human operators, except for generating the reports. The intelligent network operation engine automatically issues the telemetry data requests, analyzes the data, and updates the network operations in closed control loops. </dd>
</dl>
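<t>A Stage 3 control loop can be outlined as below. This sketch is hypothetical. The latency threshold, the sampling-interval knob, and the "reroute" action stand in for whatever telemetry requests and network actions a real operation engine would issue. In the loop, the engine repeatedly requests data, analyzes it, and adjusts both the network and the telemetry itself, with no human in the middle.</t>
<sourcecode type="python"><![CDATA[
from typing import Callable, Dict, List


def closed_loop(
    collect: Callable[[float], Dict[str, float]],
    steps: int,
    latency_slo_ms: float = 50.0,
) -> List[str]:
    """Hypothetical Stage 3 loop: request data, analyze it, act, and refine telemetry."""
    interval_s = 10.0                      # current telemetry sampling interval
    actions: List[str] = []
    for _ in range(steps):
        sample = collect(interval_s)       # issue the telemetry data request
        if sample["latency_ms"] > latency_slo_ms:
            actions.append("reroute")      # update the network operation
            interval_s = max(1.0, interval_s / 2)   # look closer while degraded
        else:
            interval_s = min(10.0, interval_s * 2)  # relax telemetry when healthy
        actions.append(f"interval={interval_s:g}s")
    return actions


readings = iter([20.0, 80.0, 90.0, 30.0])
log = closed_loop(lambda interval: {"latency_ms": next(readings)}, steps=4)
print(log)
# ['interval=10s', 'reroute', 'interval=5s', 'reroute', 'interval=2.5s', 'interval=5s']
]]></sourcecode>
<t>Adapting the sampling interval to the observed state shows the difference from Stage 1. There, telemetry is reconfigured occasionally. Here, the engine retunes it on every iteration.</t>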
<t>Existing technologies are ready for Stages 0 and 1. Individual applications for Stages 2 and 3 are also possible now. However, future autonomic networks may need a comprehensive operation management system that works at Stages 2 and 3 to cover all the network operation tasks. A well-defined network telemetry framework is the first step in this direction. </t>
</section>
<section anchor="Security" numbered="true" toc="default">
<name>Security Considerations</name>
<t>The complexity of network telemetry raises significant security implications. For example, telemetry data can be manipulated to exhaust various network resources at each plane as well as the data consumer; falsified or tampered data can mislead the decision-making process and paralyze networks; and wrong configuration and programming for telemetry is equally harmful. The telemetry data is highly sensitive because it exposes a lot of information about the network and its configuration. Some of that information can make designing attacks against the network much easier (e.g., exact details of what software and patches have been installed) and allows an attacker to determine whether a device may be subject to unprotected security vulnerabilities.</t>
<t>Given that this document has proposed a framework for network telemetry and the telemetry mechanisms discussed are more extensive (in both message frequency and traffic amount) than the conventional network OAM concepts, we must also anticipate that new security considerations may arise. A number of techniques already exist for securing the forwarding plane, control plane, and management plane in a network, but it is important to consider whether any new threat vectors are now being enabled via the use of network telemetry procedures and mechanisms. </t>
<t>This document proposes a conceptual architecture for collecting, transporting, and analyzing a wide variety of data sources in support of network applications. The protocols, data formats, and configurations chosen to implement this framework will dictate the specific security considerations. These considerations may include:</t>
<ul spacing="normal">
<li>Telemetry framework trust and policy models;</li>
<li>Role management and access control for enabling and disabling telemetry capabilities;</li>
<li>Protocol transport used for telemetry data and its inherent security capabilities;</li>
<li>Telemetry data stores, storage encryption, methods of access, and retention practices;</li>
<li>Tracking telemetry events and any abnormalities that might identify malicious attacks using telemetry interfaces;</li>
<li>Authentication and integrity protection of telemetry data to make data more trustworthy; and</li>
<li>Segregating the telemetry data traffic from the data traffic carried over the network (e.g., historically, management access and management data may be carried via an independent management network).</li>
</ul>
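<t>As one illustration of the integrity-protection item above, an exporter and a collector that share a key could attach an HMAC-SHA256 tag to each telemetry record, so that tampered records are rejected. This minimal sketch uses the hmac and hashlib modules from Python's standard library. The record format, the compact-JSON canonicalization, and the hard-coded demo key are assumptions. A deployment would normally rely on the security services of the chosen transport (e.g., TLS) and on proper key management.</t>
<sourcecode type="python"><![CDATA[
import hashlib
import hmac
import json
from typing import Any, Dict

DEMO_KEY = b"example-shared-key"  # hypothetical; never hard-code real keys


def sign_record(record: Dict[str, Any], key: bytes = DEMO_KEY) -> Dict[str, Any]:
    """Exporter side: serialize canonically and attach an HMAC-SHA256 tag."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "tag": tag}


def verify_record(message: Dict[str, Any], key: bytes = DEMO_KEY) -> bool:
    """Collector side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, message["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


msg = sign_record({"node": "R1", "if": "eth0", "rx_errors": 3})
print(verify_record(msg))       # True
tampered = dict(msg, body=msg["body"].replace('"rx_errors":3', '"rx_errors":0'))
print(verify_record(tampered))  # False
]]></sourcecode>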
<t>Some of the security considerations highlighted above may be minimized or negated with policy management of network telemetry. In a network telemetry deployment, it would be advantageous to separate telemetry capabilities into different classes of policies, i.e., Role-Based Access Control and Event-Condition-Action policies. Also, potential conflicts between network telemetry mechanisms must be detected accurately and resolved quickly to avoid unnecessary network telemetry traffic propagation escalating into an unintended or intended denial-of-service attack.</t>
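<t>A role-based access control check for telemetry operations might look like the following sketch. The role names, permission strings, and the check_permission helper are hypothetical. They are not drawn from any standard policy model. They only show telemetry capabilities grouped into classes that a given role may or may not exercise.</t>
<sourcecode type="python"><![CDATA[
from typing import Dict, Set

# Hypothetical policy: which telemetry capability classes each role may exercise.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"query"},
    "operator": {"query", "subscribe"},
    "admin": {"query", "subscribe", "enable-telemetry", "disable-telemetry"},
}


class AccessDenied(Exception):
    pass


def check_permission(role: str, action: str) -> None:
    """Raise AccessDenied unless the role's policy allows the telemetry action."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        raise AccessDenied(f"role '{role}' may not perform '{action}'")


def enable_telemetry(role: str, capability: str, enabled: Dict[str, bool]) -> None:
    check_permission(role, "enable-telemetry")
    enabled[capability] = True


state: Dict[str, bool] = {}
enable_telemetry("admin", "ioam-trace", state)
try:
    enable_telemetry("viewer", "ioam-trace", state)
except AccessDenied as err:
    print(err)  # role 'viewer' may not perform 'enable-telemetry'
print(state)    # {'ioam-trace': True}
]]></sourcecode>
<t>Unknown roles map to an empty permission set, so the check fails closed.</t>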
<t>Further study of the security issues will be required, and it is expected that the security mechanisms and protocols will be developed and deployed along with a network telemetry system.</t>
</section>
<section anchor="IANA" numbered="true" toc="default">
<name>IANA Considerations</name>
<t>This document has no IANA actions.</t>
</section>
</middle>
<back>
<section anchor="Security" title="Security Considerations">
<t>The complexity of network telemetry raises significant security implications.
For example, telemetry data can be manipulated to exhaust various network
resources at each plane as well as the data consumer; falsified or tampered data
can mislead decision-making and paralyze networks; and wrong configuration and
programming for telemetry are equally harmful. Telemetry data is highly
sensitive and exposes a lot of information about the network and its
configuration. Some of that information can make designing attacks against the
network much easier (e.g., exact details of what software and patches have been
installed) and can allow an attacker to determine whether a device may be
subject to unprotected security vulnerabilities.</t>
<t>Given that this document proposes a framework for network telemetry and that
the telemetry mechanisms discussed are more extensive (in both message frequency
and traffic amount) than the conventional network OAM concepts, we must also
recognize that new security considerations may arise. A number of techniques
already exist for securing the forwarding plane, the control plane, and the
management plane in a network, but it is important to consider whether any new
threat vectors are enabled by the use of network telemetry procedures and
mechanisms. </t>
<t>This document proposes a conceptual architecture for collecting,
transporting, and analyzing a wide variety of data sources in support of network
applications. The protocols, data formats, and configurations chosen to
implement this framework will dictate the specific security considerations.
These considerations may include:</t>
<t>
<list style="symbols">
<t>Telemetry framework trust and policy model;</t>
<t>Role management and access control for enabling and disabling telemetry
capabilities;</t>
<t>Protocol transport used for telemetry data and its inherent security
capabilities;</t>
<t>Telemetry data stores, storage encryption, methods of access, and retention
practices;</t>
<t>Tracking telemetry events and any abnormalities that might identify malicious
attacks using telemetry interfaces;</t>
<t>Authentication and integrity protection of telemetry data to make data more
trustworthy;</t>
<t>Segregating the telemetry data traffic from the data traffic carried over the
network (e.g., historically, management access and management data may be
carried via an independent management network).</t>
</list>
</t>
<t>Some of the security considerations highlighted above may be minimized or
negated with policy management of network telemetry. In a network telemetry
deployment, it would be advantageous to separate telemetry capabilities into
different classes of policies, i.e., Role-Based Access Control and
Event-Condition-Action policies. Also, potential conflicts between network
telemetry mechanisms must be detected accurately and resolved quickly to prevent
unnecessary network telemetry traffic propagation from escalating into an
unintended or intended denial-of-service attack.
</t>
<t>Further study of the security issues will be required, and it is expected
that security mechanisms and protocols will be developed and deployed along with
a network telemetry system.</t>
</section> <displayreference target="I-D.ietf-netconf-distributed-notif" to="NETCONF-DISTRIB-NOTIF"/>
<section anchor="IANA" title="IANA Considerations">
<t>This document includes no request to IANA.</t> <displayreference target="I-D.ietf-netconf-udp-notif" to="NETCONF-UDP-NOTIF"/>
</section> <displayreference target="I-D.song-ippm-postcard-based-telemetry" to="IPPM-POSTCARD-BASED-TELEMETRY"/>
<section anchor="Contributors" title="Contributors">
<t>The other contributors to this document are Tianran Zhou, Zhenbin Li, <displayreference target="I-D.song-opsawg-ifit-framework" to="OPSAWG-IFIT-FRAMEWORK"/>
Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm.</t>
</section> <displayreference target="I-D.irtf-nmrg-ibn-concepts-definitions" to="NMRG-IBN-CONCEPTS-DEFINITIONS"/>
<section anchor="Acknowledgments" title="Acknowledgments">
<t>We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe Clarke, <displayreference target="I-D.ietf-netmod-eca-policy" to="NETMOD-ECA-POLICY"/>
Victor Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz
Yegani, Young Lee, Qin Wu, Gyan Mishra, Ben Schwartz, Alexey Melnikov, Michael
Scharf, Dhruv Dhody, Martin Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars
Eggert, Eric Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many
others who have provided helpful comments and suggestions to improve this document.
</t>
</section>
</middle>
<back>
<!--
<references title="Normative References">
<?rfc include='reference.RFC.2119'?>
<?rfc include='reference.RFC.8174'?>
</references>
-->
<references title="Informative References">
<?rfc include='reference.RFC.3954'?>
<?rfc include="reference.RFC.6020"?>
<?rfc include="reference.RFC.7950"?>
<?rfc include="reference.RFC.6241"?>
<?rfc include='reference.RFC.7540'?>
<?rfc include='reference.RFC.7854'?>
<?rfc include='reference.RFC.8321'?>
<?rfc include='reference.RFC.7011'?>
<?rfc include='reference.RFC.4656'?>
<?rfc include='reference.RFC.5357'?>
<?rfc include='reference.RFC.5424'?>
<?rfc include='reference.RFC.1157'?>
<?rfc include='reference.RFC.3176'?>
<?rfc include='reference.RFC.3411'?>
<?rfc include='reference.RFC.3416'?>
<?rfc include='reference.RFC.7276'?>
<?rfc include='reference.RFC.7799'?>
<?rfc include='reference.RFC.2981'?>
<?rfc include='reference.RFC.3877'?>
<?rfc include='reference.RFC.7575'?>
<?rfc include='reference.RFC.8641'?>
<?rfc include='reference.RFC.8639'?>
<?rfc include='reference.RFC.6812'?>
<?rfc include='reference.RFC.2578'?>
<?rfc include='reference.RFC.8762'?>
<?rfc include='reference.RFC.8040'?>
<?rfc include='reference.RFC.7258'?>
<?rfc include='reference.RFC.8259'?>
<?rfc include='reference.RFC.8924'?>
<?rfc include='reference.RFC.5085'?>
<?rfc include='reference.RFC.8084'?>
<?rfc include='reference.RFC.8085'?>
<?rfc include='reference.RFC.8889'?>
<?rfc include='reference.RFC.8671'?>
<?rfc include='reference.I-D.ietf-grow-bmp-local-rib'?>
<?rfc include='reference.I-D.ietf-netconf-distributed-notif'?>
<?rfc include='reference.I-D.ietf-netconf-udp-notif'?>
<?rfc include='reference.I-D.song-opsawg-dnp4iq'?>
<?rfc include='reference.I-D.ietf-ippm-ioam-data'?>
<?rfc include='reference.I-D.ietf-ippm-ioam-direct-export'?>
<?rfc include='reference.I-D.pedro-nmrg-anticipated-adaptation'?>
<?rfc include='reference.I-D.song-ippm-postcard-based-telemetry'?>
<?rfc include='reference.I-D.song-opsawg-ifit-framework'?>
<?rfc include='reference.I-D.irtf-nmrg-ibn-concepts-definitions'?>
<?rfc include='reference.I-D.wwx-netmod-event-yang'?>
<reference anchor="gpb" target="https://developers.google.com/protocol-buffers"> <references>
<front> <name>Informative References</name>
<title>Google Protocol Buffers</title> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3954.xml"/>
<author/>
<date/> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6020.xml"/>
</front>
</reference> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7950.xml"/>
<reference anchor="grpc" target="https://grpc.io">
<front> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6241.xml"/>
<title>gRPC, A high performance, open-source universal RPC framework</title>
<author/> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7540.xml"/>
<date/>
</front> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7854.xml"/>
</reference>
<reference anchor="gnmi" target="https://github.com/openconfig/reference/tree/master/rpc/gnmi"> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8321.xml"/>
<front> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7011.xml"/>
<title>gNMI - gRPC Network Management Interface</title>
<author/> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4656.xml"/>
<date/>
</front> <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5357.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5424.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.1157.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3176.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3411.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3416.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7276.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7799.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2981.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3877.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7575.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8641.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8639.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6812.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2578.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8762.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8040.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7258.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8259.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8924.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5085.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8084.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8085.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8889.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8671.xml"/>
<!-- [I-D.ietf-ippm-ioam-data] is now 9197-->
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9197.xml"/>
<!-- [I-D.ietf-grow-bmp-local-rib] Published as RFC 9069 -->
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9069.xml"/>
<!-- [I-D.ietf-netconf-distributed-notif] IESG state I-D Exists -->
<xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-netconf-distributed-notif.xml"/>
<!-- [I-D.ietf-netconf-udp-notif] IESG state I-D Exists -->
<xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-netconf-udp-notif.xml"/>
<!-- [I-D.song-opsawg-dnp4iq] IESG state Expired. Note: included the long form as the editor role was missing -->
<reference anchor="OPSAWG-DNP4IQ">
<front>
<title>Requirements for Interactive Query with Dynamic Network Probes</title>
<author fullname="Haoyu Song" role="editor">
<organization>Huawei Technologies Co., Ltd</organization>
</author>
<author fullname="Jun Gong">
<organization>Huawei Technologies Co., Ltd</organization>
</author>
<date month="June" day="19" year="2017" />
</front>
<seriesInfo name="Internet-Draft" value="draft-song-opsawg-dnp4iq-01" />
</reference> </reference>
<reference anchor="xml" target="https://www.w3.org/TR/2008/REC-xml-20081126/">
<front> <!-- [I-D.ietf-ippm-ioam-direct-export] IESG state AD Evaluation. Note: included
<title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title> the long form as the editor role was missing -->
<author/> <reference anchor="IPPM-IOAM-DIRECT-EXPORT">
<date/> <front>
</front> <title>In-situ OAM Direct Exporting</title>
<author fullname="Haoyu Song">
<organization>Futurewei</organization>
</author>
<author fullname="Barak Gafni">
<organization>Nvidia</organization>
</author>
<author fullname="Tianran Zhou">
<organization>Huawei</organization>
</author>
<author fullname="Zhenbin Li">
<organization>Huawei</organization>
</author>
<author fullname="Frank Brockners">
<organization>Cisco</organization>
</author>
<author fullname="Shwetha Bhandari" role="editor">
<organization>Thoughtspot</organization>
</author>
<author fullname="Ramesh Sivakolundu">
<organization>Cisco</organization>
</author>
<author fullname="Tal Mizrahi" role="editor">
<organization>Huawei</organization>
</author>
<date month="October" day="13" year="2021" />
</front>
<seriesInfo name="Internet-Draft" value="draft-ietf-ippm-ioam-direct-export-07" />
</reference> </reference>
<reference anchor="y1731" target="https://www.itu.int/rec/T-REC-Y.1731/en">
<front> <!-- [I-D.pedro-nmrg-anticipated-adaptation] IESG state Expired. Note: included the long form as the editor role was missing -->
<title>ITU-T Y.1731: OAM Functions and Mechanisms for Ethernet based networks, 2015</title>
<reference anchor="NMRG-ANTICIPATED-ADAPTATION">
<author/> <front>
<date/> <title>Exploiting External Event Detectors to Anticipate Resource Requirements for the Elastic Adaptation of SDN/NFV Systems</title>
</front>
<author fullname="Pedro Martinez-Julia" role="editor">
<organization>NICT</organization>
</author>
<date month="June" day="29" year="2018" />
</front>
<seriesInfo name="Internet-Draft" value="draft-pedro-nmrg-anticipated-adaptation-02" />
</reference> </reference>
</references> <!-- [I-D.song-ippm-postcard-based-telemetry] IESG state I-D Exists -->
<xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.song-ippm-postcard-based-telemetry.xml"/>
<section title="A Survey on Existing Network Telemetry Techniques"> <!-- [I-D.song-opsawg-ifit-framework] IESG state I-D Exists -->
<t>In this non-normative appendix, we provide an overview of some existing techniques <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.song-opsawg-ifit-framework.xml"/>
and standard proposals for each network telemetry module.</t>
<section title="Management Plane Telemetry">
<section title="Push Extensions for NETCONF"> <!-- [I-D.irtf-nmrg-ibn-concepts-definitions] IESG state I-D Exists -->
<t><xref target="RFC6241">NETCONF</xref> is a popular network management protocol <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.irtf-nmrg-ibn-concepts-definitions.xml"/>
recommended by IETF. Its core strength is for managing configuration, but can
also be used for data collection. <xref target="RFC8641">YANG-Push</xref> <xref
target="RFC8639"/> extends NETCONF and enables subscriber applications to request <!-- [I-D.wwx-netmod-event-yang] FYI: I-D.wwx-netmod-event-yang (Expired) was replaced by I-D.ietf-netmod-eca-policy - IESG state Expired -->
a continuous, customized stream of updates from a YANG datastore. Providing such
visibility into changes made upon YANG configuration and operational objects <xi:include href="https://datatracker.ietf.org/doc/bibxml3/reference.I-D.ietf-netmod-eca-policy.xml"/>
enables new capabilities based on the remote mirroring of configuration and operational
state. Moreover, <xref target="I-D.ietf-netconf-distributed-notif">distributed
data collection mechanism</xref> via <reference anchor="gpb" target="https://developers.google.com/protocol-buffers">
<xref target="I-D.ietf-netconf-udp-notif">UDP based publication channel</xref> provides enhanced efficiency for the
NETCONF based telemetry.</t> <front>
</section> <title>Protocol Buffers</title>
<section title="gRPC Network Management Interface"> <author><organization>Google Developers</organization></author>
<t><xref target="gnmi">gRPC Network Management Interface (gNMI)</xref> is a netw <date/>
ork management protocol based on the <xref target="grpc">gRPC</xref> RPC (Remote </front>
Procedure Call) framework. With a single gRPC service definition, both configur </reference>
ation and telemetry can be covered. gRPC is an <xref target="RFC7540">HTTP/2</xr
ef>-based open-source micro-service communication framework. It provides a numbe <reference anchor="grpc" target="https://grpc.io">
r of capabilities which are well-suited for network telemetry, including: </t> <front>
<t> <title>gRPC: A high performance, open source universal RPC framework</title>
<list style="symbols">
<t>Full-duplex streaming transport model combined with a binary encoding mechani <author><organization>gRPC</organization></author>
sm provides good telemetry efficiency.</t> <date/>
<t>gRPC provides higher-level features consistency across platforms that common </front>
HTTP/2 libraries typically do not. This characteristic is especially valuable fo </reference>
r the fact that telemetry data collectors normally reside on a large variety of
platforms.</t> <reference anchor="gnmi" target="https://datatracker.ietf.org/meeting/98/materials/slides-98-rtgwg-gnmi-intro-draft-openconfig-rtgwg-gnmi-spec-00">
<t>The built-in load-balancing and failover mechanism.</t>
</list> <front>
</t> <title>gRPC Network Management Interface</title>
</section> <author initials="R." surname="Shakir" fullname="Rob Shakir">
</section> <organization/>
<section title="Control Plane Telemetry"> </author>
<section title="BGP Monitoring Protocol"> <author initials="A." surname="Shaikh" fullname="Anees Shaikh">
<t><xref target="RFC7854">BGP Monitoring Protocol (BMP)</xref> is used to monito <organization/>
r BGP sessions and is intended to provide a convenient interface for obtaining r </author>
oute views. </t> <author initials="P." surname="Borman" fullname="Paul Borman">
<t>The BGP routing information is collected from the monitored device(s) to the <organization/>
BMP monitoring station by setting up the BMP TCP session. The BGP peers are moni </author>
tored by the BMP Peer Up and Peer Down Notifications. The BGP routes (including <author initials="M." surname="Hines" fullname="Marcus Hines">
<xref target="RFC7854"> Adjacency_RIB_In </xref>, <xref target="RFC8671"> Adjace <organization/>
ncy_RIB_out</xref>, and <xref target="I-D.ietf-grow-bmp-local-rib">Local_Rib</xr </author>
ef>) are encapsulated in the BMP Route Monitoring Message and the BMP Route Mirr <author initials="C." surname="Lebsack" fullname="Carl Lebsack">
oring Message, providing both an initial table dump and real-time route updates. <organization/>
In addition, BGP statistics are reported through the BMP Stats Report Message, </author>
which could be either timer triggered or event-driven. Future BMP extensions <author initials="C." surname="Morrow" fullname="Chris Morrow">
could further enrich BGP monitoring applications. <organization/>
</t> </author>
</section> <date month="March" year="2017"/>
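As a rough illustration of how a BMP monitoring station might separate the messages mentioned above (Route Monitoring, Route Mirroring, Stats Report, Peer Up and Peer Down) from the BMP TCP byte stream, the following Python sketch parses the 6-octet common header of RFC 7854 (version, total message length, message type). It is a minimal sketch under stated assumptions: it only accepts version 3, does not decode per-peer headers or message bodies, and the function names are hypothetical.

```python
import struct
from typing import Iterator, Tuple

# BMP common header (RFC 7854, Section 4.1): version (1 octet),
# message length (4 octets, includes the header), message type (1 octet).
COMMON_HEADER = struct.Struct("!BIB")

BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}


def parse_common_header(buf: bytes) -> Tuple[int, int, str]:
    """Return (version, total message length, message type name)."""
    if len(buf) < COMMON_HEADER.size:
        raise ValueError("buffer shorter than the BMP common header")
    version, length, msg_type = COMMON_HEADER.unpack_from(buf)
    if version != 3:
        raise ValueError(f"unsupported BMP version {version}")
    if length < COMMON_HEADER.size:
        raise ValueError("message length smaller than the common header")
    return version, length, BMP_MSG_TYPES.get(msg_type, f"Unknown({msg_type})")


def split_messages(stream: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (type name, raw message) for each complete message in a byte stream."""
    offset = 0
    while len(stream) - offset >= COMMON_HEADER.size:
        _, length, name = parse_common_header(stream[offset:])
        if offset + length > len(stream):
            break  # partial message: wait for more bytes from the TCP session
        yield name, stream[offset:offset + length]
        offset += length
```

For example, the header-only buffer `struct.pack("!BIB", 3, 6, 4)` parses as an Initiation message. Real Initiation messages also carry Information TLVs, which this sketch deliberately ignores; framing on the length field is what lets a collector consume a continuous TCP stream.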
</section> </front>
<section title="Data Plane Telemetry"> <refcontent>IETF 98</refcontent>
<section title="The Alternate Marking (AM) technology"> </reference>
<t>The Alternate Marking method enables efficient measurements of packet loss,
delay, and jitter both in IP and Overlay Networks, as presented in <reference anchor="W3C.REC-xml-20081126" target="https://www.w3.org/TR/2008/REC-xml-20081126">
<xref target="RFC8321"/> and <xref target="RFC8889"/>. </t>
<t>This technique can be applied to point-to-point and multipoint-to-multipoint <front>
flows. Alternate Marking creates batches of packets by alternating the value of <title>Extensible Markup Language (XML) 1.0 (Fifth Edition)</title>
1 bit (or a label) of the packet header. These batches of packets are unambiguou <author initials="T." surname="Bray" fullname="Tim Bray">
sly recognized over the network and the comparison of packet counters for each b <organization showOnFrontPage="true"/>
atch allows the packet loss calculation. The same idea can be applied to delay m </author>
easurement by selecting ad hoc packets with a marking bit dedicated for delay me <author initials="J." surname="Paoli" fullname="Jean Paoli">
asurements.</t> <organization showOnFrontPage="true"/>
<t>The Alternate Marking method needs two counters per marking period for each flow </author>
being monitored. For instance, considering n measurement points and m monitored <author initials="M." surname="Sperberg-McQueen" fullname="Michael Sperberg-McQueen">
flows, the order of magnitude of the number of packet counters for each time interval is
n*m*2 (1 per color).</t> <organization showOnFrontPage="true"/>
<t>Since networks offer rich sets of network performance measurement data (e.g., </author>
packet counters), conventional approaches run into limitations. The bottleneck <author initials="E." surname="Maler" fullname="Eve Maler">
is the generation and export of the data and the amount of data that can be reas <organization showOnFrontPage="true"/>
onably collected from the network. In addition, management tasks related to dete </author>
rmining and configuring which data to generate lead to significant deployment ch <author initials="F." surname="Yergeau" fullname="Francois Yergeau">
allenges.</t> <organization showOnFrontPage="true"/>
<t>The Multipoint Alternate Marking approach, described in <xref target="RFC8889 </author>
"/>, aims to resolve this issue and make the performance monitoring more flexibl <date month="November" year="2008"/>
e in case a detailed analysis is not needed. </t> </front>
<t>An application orchestrates network performance measurement tasks across the <refcontent>World Wide Web Consortium Recommendation REC-xml-20081126</refcontent>
network to allow for optimized monitoring. The application can choose how roughly
or precisely to configure measurement points depending on the application's </reference>
requirements.</t>
<t>Using Alternate Marking, it is possible to monitor a Multipoint Network witho <reference anchor="y1731" target="https://www.itu.int/rec/T-REC-Y.1731/en"
ut in depth examination by using the Network Clustering (subnetworks that are po >
rtions of the entire network that preserve the same property of the entire netwo <front>
rk, called clusters). So in the case that there is packet loss or the delay is <title>Operations, administration and maintenance (OAM) functions and
too high then the specific filtering criteria could be applied to gather a more mechanisms for Ethernet-based networks</title>
detailed analysis by using a different combination of clusters up to a per-flow <author><organization>ITU-T</organization></author>
measurement as described in <xref target="RFC8321">Alternate-Marking (AM)</xref> <date month="August" year="2015"/>
. </t> </front>
<t>In summary, an application can configure end-to-end network monitoring. If th <seriesInfo name="ITU-T Recommendation" value="G.8013/Y.1731"/>
e network does not experience issues, this approximate monitoring is good enough </reference>
and is very cheap in terms of network resources. However, in case of problems, </references>
the application becomes aware of the issues from this approximate monitoring and
, in order to localize the portion of the network that has issues, configures th <section numbered="true" toc="default">
e measurement points more extensively, allowing more detailed monitoring to be p <name>A Survey on Existing Network Telemetry Techniques</name>
erformed. After the detection and resolution of the problem, the initial approxi <t>In this non-normative appendix, we provide an overview of some existing
mate monitoring can be used again.</t> techniques and standard proposals for each network telemetry module.</t>
</section> <section numbered="true" toc="default">
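The counter logic of the Alternate Marking method described above can be sketched in Python. This toy model assumes a single flow, a marking bit that alternates every period, and two measurement points; the class and function names are hypothetical and are not taken from RFC 8321. Each point keeps one counter per color, and the loss for a batch is the difference between the upstream and downstream counters for that color once the batch has ended.

```python
from typing import List


def marking_color(period: int) -> int:
    """The marking bit alternates every period: 0, 1, 0, 1, ..."""
    return period % 2


class MeasurementPoint:
    """Per-flow state at one measurement point: one packet counter per color."""

    def __init__(self) -> None:
        self.counters: List[int] = [0, 0]

    def observe(self, color: int) -> None:
        self.counters[color] += 1

    def read_and_reset(self, color: int) -> int:
        # Called once the batch of this color has ended (the other color is now
        # being marked), so the counter is stable and can be compared.
        value = self.counters[color]
        self.counters[color] = 0
        return value


def batch_loss(upstream: MeasurementPoint, downstream: MeasurementPoint, color: int) -> int:
    """Packets of one batch that passed upstream but never reached downstream."""
    return upstream.read_and_reset(color) - downstream.read_and_reset(color)


def counter_count(n_points: int, m_flows: int) -> int:
    """Order of magnitude of counters per interval: n * m * 2 (one per color)."""
    return n_points * m_flows * 2
```

For instance, if 100 packets of a batch pass the upstream point and 97 reach the downstream point, `batch_loss` returns 3; with 10 measurement points and 1000 flows, `counter_count(10, 1000)` gives 20000 counters. Reading a counter only after the color has switched is what avoids comparing counters that are still being incremented.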
<section title="Dynamic Network Probe"> <name>Management Plane Telemetry</name>
<t>Hardware-based <xref target="I-D.song-opsawg-dnp4iq">Dynamic Network Probe (D <section numbered="true" toc="default">
NP)</xref> proposes a programmable means to customize the data that an applicati <name>Push Extensions for NETCONF</name>
on collects from the data plane. A direct benefit of DNP is the reduction of the <t><xref target="RFC6241" format="default">NETCONF</xref> is a popular
exported data. A full DNP solution covers several components including data sou network management protocol recommended by IETF. Its core strength is for manag
rce, data subscription, and data generation. The data subscription needs to defi ing configuration, but it can also be used for data collection. <xref target="RF
ne the derived data which can be composed and derived from the raw data sources. C8639" format="default">YANG-Push</xref> <xref target="RFC8641" format="default"
The data generation takes advantage of the moderate in-network computing to pro /> extends NETCONF and enables subscriber applications to request a continuous,
duce the desired data.</t> customized stream of updates from a YANG datastore. Providing such visibility in
<t>While DNP can introduce unforeseeable flexibility to the data plane telemetry to changes made upon YANG configuration and operational objects enables new capa
, it also faces some challenges. It requires a flexible data plane that can be d bilities based on the remote mirroring of configuration and operational state. M
ynamically reprogrammed at run-time. The programming API is yet to be defined.</ oreover, a <xref target="I-D.ietf-netconf-distributed-notif" format="default">di
t> stributed data collection mechanism</xref> via a <xref target="I-D.ietf-netconf-
</section> udp-notif" format="default">UDP-based publication channel</xref> provides enhanc
<section title="IP Flow Information Export (IPFIX) Protocol"> ed efficiency for the NETCONF-based telemetry.</t>
<t>Traffic on a network can be seen as a set of flows passing through network el </section>
ements. <section numbered="true" toc="default">
<xref target="RFC7011">IP Flow Information Export (IPFIX) </xref> <name>gRPC Network Management Interface</name>
provides a means of transmitting traffic flow information for administrative or <t><xref target="gnmi" format="default">gRPC Network Management Interf
other purposes. A typical IPFIX enabled system includes a pool of Metering Proce ace (gNMI)</xref> is a network management protocol based on the <xref target="gr
sses that collects data packets at one or more Observation Points, optionally fi pc" format="default">gRPC</xref> Remote Procedure Call (RPC) framework. With a s
lters them and aggregates information about these packets. An Exporter then gath ingle gRPC service definition, both configuration and telemetry can be covered.
ers each of the Observation Points together into an Observation Domain and sends gRPC is an open-source micro-service communication framework based on <xref targ
this information via the IPFIX protocol to a Collector.</t> et="RFC7540" format="default">HTTP/2</xref>. It provides a number of capabilitie
</section> s that are well-suited for network telemetry, including: </t>
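To make the Metering Process and Exporter roles of IPFIX described above more concrete, the following Python sketch aggregates observed packets into per-flow records keyed by a 5-tuple and builds the 16-octet IPFIX Message Header of RFC 7011 (version 10, length, export time, sequence number, Observation Domain ID). It is a sketch under loud assumptions: it does not encode Template Sets or Data Sets, and the flow key, the record field names, and the function names are hypothetical rather than IPFIX Information Element names.

```python
import struct
import time
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical flow key: (source IP, destination IP, source port, destination port, protocol).
FlowKey = Tuple[str, str, int, int, int]


def meter(packets: Iterable[Tuple[FlowKey, int]]) -> Dict[FlowKey, Dict[str, int]]:
    """Metering Process sketch: aggregate (flow key, packet length) observations."""
    flows: Dict[FlowKey, Dict[str, int]] = defaultdict(lambda: {"packets": 0, "octets": 0})
    for key, length in packets:
        flows[key]["packets"] += 1
        flows[key]["octets"] += length
    return dict(flows)


def ipfix_message_header(body_len: int, seq: int, domain_id: int,
                         export_time: Optional[int] = None) -> bytes:
    """IPFIX Message Header (RFC 7011, Section 3.1): version 10, total length
    including this 16-octet header, export time in seconds, sequence number,
    and Observation Domain ID, all in network byte order."""
    if export_time is None:
        export_time = int(time.time())
    return struct.pack("!HHIII", 10, 16 + body_len, export_time, seq, domain_id)
```

In a real Exporter, the aggregated records would be encoded into Data Sets described by previously sent Templates and appended after this header; the sketch only shows the aggregation step and the fixed header layout.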
<section title="In-Situ OAM"> <ul spacing="normal">
<t>Classical passive and active monitoring and measurement techniques are either <li>A full-duplex streaming transport model; when combined with a bi
inaccurate or resource-consuming. It is preferable to directly acquire data ass nary encoding mechanism, it provides good telemetry efficiency.</li>
ociated with a flow's packets when the packets pass through a network. <xref tar <li>A higher-level feature consistency across platforms that common
get="I-D.ietf-ippm-ioam-data">In-situ OAM (iOAM)</xref>, a data generation techn HTTP/2 libraries typically do not provide. This characteristic is especially val
ique, embeds a new instruction header to user packets and the instruction direct uable for the fact that telemetry data collectors normally reside on a large var
s the network nodes to add the requested data to the packets. Thus, at the path iety of platforms.</li>
end, the packet's experience gained on the entire forwarding path can be collect <li>A built-in load-balancing and failover mechanism.</li>
ed. Such firsthand data is invaluable to many network OAM applications.</t> </ul>
<t>However, iOAM also faces some challenges. The issues on performance impact, s </section>
ecurity, scalability and overhead limits, encapsulation difficulties in some pro </section>
tocols, and cross-domain deployment need to be addressed.</t> <section numbered="true" toc="default">
</section> <name>Control Plane Telemetry</name>
<section anchor="pbt" title="Postcard Based Telemetry"> <section numbered="true" toc="default">
<t>The postcard-based telemetry, as embodied in <xref target="I-D.ietf-ippm-ioam <name>BGP Monitoring Protocol</name>
-direct-export">IOAM DEX</xref> and <xref target="I-D.song-ippm-postcard-based-t <t><xref target="RFC7854" format="default">BMP</xref> is used to monit
elemetry">IOAM Marking</xref>, is a complementary technique to the passport-base or BGP sessions and is intended to provide a convenient interface for obtaining
d IOAM. PBT directly exports data at each node through an independent packet. At route views. </t>
the cost of higher bandwidth overhead and the need for data correlation, PBT sh <t>BGP routing information is collected from the monitored device(s) t
ows several unique advantages. It can also help to identify packet drop location o the BMP monitoring station by setting up the BMP TCP session. The BGP peers ar
in case a packet is dropped on its forwarding path.</t> e monitored by the BMP Peer Up and Peer Down notifications. The BGP routes (incl
</section> uding <xref target="RFC7854" format="default"> Adj_RIB_In </xref>, <xref target=
<section title="Existing OAM for Specific Data Planes"> "RFC8671" format="default"> Adj_RIB_out</xref>, and <xref target="RFC9069" forma
<t> t="default">local RIB</xref>) are encapsulated in the BMP Route Monitoring Messa
Various data planes raise unique OAM requirements. IETF has published OAM techni ge and the BMP Route Mirroring Message, providing both an initial table dump and
que and framework documents (e.g., <xref target="RFC8924" /> and <xref target="R real-time route updates. In addition, BGP statistics are reported through the B
FC5085" />) targeting different data planes such as Multi-Protocol Label Switchi MP Stats Report Message, which could be either timer triggered or event-driven.
ng (MPLS), L2 Virtual Private Network (L2-VPN), Network Virtualization Overlays Future BMP extensions could further enrich BGP monitoring applications.
(NVO3), Virtual Extensible LAN (VXLAN), Bit Indexed Explicit Replication (BIER),
Service Function Chaining (SFC), Segment Routing (SR), and Deterministic Networ
king (DETNET). The aforementioned data plane telemetry techniques can be used to
enhance the OAM capability on such data planes.
</t> </t>
</section> </section>
</section> </section>
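The contrast between passport-based IOAM and postcard-based telemetry (PBT) described above can be illustrated with a toy Python model. It is only a conceptual sketch: the per-hop record (node ID, hop latency), the drop model, and all names are hypothetical, no IOAM or PBT encapsulation is modeled, and postcards are assumed never to be lost themselves. It shows why a dropped packet loses all of its passport data, while postcards exported before the drop still indicate roughly where the packet disappeared.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, int]  # hypothetical per-hop data: (node ID, hop latency in us)


def passport(path: List[Record], drop_at: Optional[int] = None) -> Optional[List[Record]]:
    """IOAM-style: each node appends its data to the packet; export at the path end."""
    carried: List[Record] = []
    for hop, record in enumerate(path):
        if hop == drop_at:
            return None  # the packet, and all data it carried, is lost
        carried.append(record)
    return carried


def postcard(path: List[Record], drop_at: Optional[int] = None) -> List[Record]:
    """PBT-style: each node exports its own record in a separate packet."""
    exported: List[Record] = []
    for hop, record in enumerate(path):
        if hop == drop_at:
            break  # nodes before the drop have already exported their postcards
        exported.append(record)
    return exported


def locate_drop(path_nodes: List[str], postcards: List[Record]) -> Optional[str]:
    """Return the first expected node without a postcard; the packet was dropped
    at that node or on the link leading to it. None means no drop was detected."""
    seen = {node for node, _ in postcards}
    for node in path_nodes:
        if node not in seen:
            return node
    return None
```

The trade-off noted in the text shows up directly: postcards cost one extra packet per hop and must be correlated at the collector, but they survive a drop that erases the passport.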
<section title="External Data and Event Telemetry"> <section numbered="true" toc="default">
<section title="Sources of External Events"> <name>Data Plane Telemetry</name>
<t>To ensure that the information provided by external event detectors and used <section numbered="true" toc="default">
by the network management solutions is meaningful for management purposes, the n <name>Alternate-Marking (AM) Technology</name>
etwork telemetry framework must ensure that such detectors (sources) are easily <t>The Alternate-Marking method enables efficient measurements of pack
connected to the management solutions (sinks). This requires the specification o et loss, delay, and jitter both in IP and Overlay Networks, as presented in <xre
f a list of potential external data sources that could be of interest in network f target="RFC8321" format="default"/> and <xref target="RFC8889" format="default
management and match it to the connectors and/or interfaces required to connect "/>. </t>
them.</t> <t>This technique can be applied to point-to-point and multipoint-to-m
<t>Categories of external event sources that may be of interest to network manag ultipoint flows. Alternate Marking creates batches of packets by alternating the
ement include::</t> value of 1 bit (or a label) of the packet header. These batches of packets are
<t> unambiguously recognized over the network, and the comparison of packet counters
<list style="symbols"> for each batch allows the packet loss calculation. The same idea can be applied
<t>Smart objects and sensors. With the consolidation of the Internet of Things~( to delay measurement by selecting ad hoc packets with a marking bit dedicated f
IoT) any network system will have many smart objects attached to its physical su or delay measurements.</t>
rroundings and logical operation environments. Most of these objects will be ess <t>The Alternate-Marking method needs two counters each marking period
entially based on sensors of many kinds (e.g., temperature, humidity, presence) for each flow under monitor. For instance, by considering n measurement points
and the information they provide can be very useful for the management of the ne and m monitored flows, the order of magnitude of the packet counters for each ti
twork, even when they are not specifically deployed for such purpose. Elements o me interval is n*m*2 (1 per color).</t>
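<t>The following Python sketch illustrates how per-color batch counters produce a loss figure. It assumes that the sender flips the marking bit once per marking period and that each measurement point keeps one counter per color for each flow. All class and function names are hypothetical and are not taken from <xref target="RFC8321" format="default"/>.</t>

```python
"""Minimal sketch of Alternate-Marking (AM) packet loss measurement.

Assumptions (hypothetical, for illustration only): the sender flips one
marking bit (the "color") every marking period, and each measurement
point keeps one packet counter per color for each monitored flow.
"""
from collections import defaultdict
from typing import Iterator


class MeasurementPoint:
    """Counts packets per (flow, color) at one point on the path.

    A real deployment reads each color's counter once its period ends;
    this sketch keeps cumulative counters, so with two periods each
    color holds exactly one batch.
    """

    def __init__(self) -> None:
        self.counters = defaultdict(int)  # (flow_id, color) -> packets seen

    def observe(self, flow_id: str, color: int) -> None:
        self.counters[(flow_id, color)] += 1


def mark(packets_per_period: int, periods: int) -> Iterator[int]:
    """Yield the color of each sent packet, alternating 0/1 per period."""
    for period in range(periods):
        for _ in range(packets_per_period):
            yield period % 2


def batch_loss(up: MeasurementPoint, down: MeasurementPoint,
               flow_id: str, color: int) -> int:
    """Loss in one batch: packets counted upstream minus downstream."""
    return up.counters[(flow_id, color)] - down.counters[(flow_id, color)]


def counters_per_interval(n_points: int, m_flows: int) -> int:
    """Order of magnitude of counters per interval: n*m*2 (one per color)."""
    return n_points * m_flows * 2
```

<t>Take two points, A and B, two marking periods of 100 packets each, and one packet of the first period lost between A and B. In this case, batch_loss(A, B, flow, 0) returns 1 and batch_loss(A, B, flow, 1) returns 0. Separately, counters_per_interval(3, 10) returns 60, matching n*m*2.</t>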
<t>Since networks offer rich sets of network performance measurement data (e.g., packet counters), conventional approaches run into limitations. The bottleneck is the generation and export of the data and the amount of data that can reasonably be collected from the network. In addition, the management tasks of determining and configuring which data to generate lead to significant deployment challenges.</t>
<t>The Multipoint Alternate-Marking approach, described in <xref target="RFC8889" format="default"/>, aims to resolve this issue and make performance monitoring more flexible in cases where a detailed analysis is not needed.</t>
<t>An application orchestrates network performance measurement tasks across the network to allow for optimized monitoring. The application can choose how roughly or precisely to configure measurement points depending on the application's requirements.</t>
<t>Using Alternate Marking, it is possible to monitor a multipoint network without in-depth examination by using Network Clustering (subnetworks that are portions of the entire network that preserve the same property of the entire network, called clusters). If packet loss occurs or the delay is too high, specific filtering criteria can be applied to obtain a more detailed analysis by using a different combination of clusters, up to a per-flow measurement as described in the Alternate-Marking document <xref target="RFC8321" format="default"/>.</t>
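<t>The clustering approach relies on a conservation property. For a given marking batch, the packets counted at the ingress points of a loss-free cluster equal those counted at its egress points. The Python sketch below computes per-cluster loss and lists the clusters that would warrant a more detailed look. It assumes synchronized per-batch counters, and the names and topology are hypothetical.</t>

```python
"""Minimal sketch of cluster-level loss in Multipoint Alternate Marking.

Assumption: a cluster is given as its ingress and egress measurement
points, and each point reports its counter for the same marking batch.
Names and topology are hypothetical.
"""
from typing import Dict, Iterable


def cluster_loss(counters: Dict[str, int],
                 ingress: Iterable[str], egress: Iterable[str]) -> int:
    """Packets lost inside the cluster for one batch.

    With no loss, every packet entering the cluster also leaves it, so
    the ingress and egress totals match; any difference is loss.
    """
    return sum(counters[p] for p in ingress) - sum(counters[p] for p in egress)


def clusters_to_inspect(counters: Dict[str, int], clusters: dict) -> list:
    """Names of clusters with nonzero loss: candidates for finer analysis."""
    return [name for name, (ing, egr) in clusters.items()
            if cluster_loss(counters, ing, egr) != 0]
```

<t>Only clusters that report a nonzero loss need to be examined with a finer combination of clusters. This is what keeps the default, approximate monitoring cheap.</t>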
<t>In summary, an application can configure end-to-end network monitoring. If the network does not experience issues, this approximate monitoring is good enough and is very cheap in terms of network resources. However, in case of problems, the application becomes aware of the issues from this approximate monitoring and, in order to localize the portion of the network that has issues, configures the measurement points more extensively, allowing more detailed monitoring to be performed. After the problem is detected and resolved, the initial approximate monitoring can be used again.</t>
</section>
<section numbered="true" toc="default">
<name>Dynamic Network Probe</name>
<t>A hardware-based <xref target="OPSAWG-DNP4IQ" format="default">Dynamic Network Probe (DNP)</xref> provides a programmable means to customize the data that an application collects from the data plane. A direct benefit of DNP is the reduction of the exported data. A full DNP solution covers several components, including data source, data subscription, and data generation. The data subscription needs to define the derived data that can be composed and derived from raw data sources. The data generation takes advantage of moderate in-network computing to produce the desired data.</t>
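<t>The split between data subscription and data generation can be sketched in Python as follows. The DNP programming API is not defined, so every name here (PacketRecord, Subscription, run_probe) is hypothetical. The sketch only shows the probe deriving a small result in the data plane instead of exporting every raw record.</t>

```python
"""Minimal sketch of a Dynamic Network Probe (DNP) style subscription.

Assumption: the data plane exposes raw per-packet records, and a
subscription names a filter and an aggregate to derive from them. The
probe computes the derived data in the network and exports only the
result. All names are hypothetical; no DNP API has been standardized.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PacketRecord:
    """Raw data source: one record per packet."""
    flow: str
    length: int
    queue_depth: int


@dataclass
class Subscription:
    """Data subscription: which records to use and what to derive."""
    name: str
    match: Callable[[PacketRecord], bool]
    derive: Callable[[List[PacketRecord]], float]


def run_probe(records: List[PacketRecord], sub: Subscription) -> Dict[str, float]:
    """Data generation: group matching records per flow, derive one value each."""
    groups: Dict[str, List[PacketRecord]] = {}
    for rec in records:
        if sub.match(rec):
            groups.setdefault(rec.flow, []).append(rec)
    return {flow: sub.derive(recs) for flow, recs in groups.items()}
```

<t>For instance, consider a subscription for the maximum queue depth seen by packets of at least 1000 bytes in each flow. Applied to four raw records spread over two flows, it exports two values.</t>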
<t>While DNP can introduce unprecedented flexibility to data plane telemetry, it also faces some challenges. It requires a flexible data plane that can be dynamically reprogrammed at runtime, and the programming Application Programming Interface (API) is yet to be defined.</t>
</section>
<section numbered="true" toc="default">
<name>IP Flow Information Export (IPFIX) Protocol</name>
<t>Traffic on a network can be seen as a set of flows passing through network elements. <xref target="RFC7011" format="default">IPFIX</xref> provides a means of transmitting traffic flow information for administrative or other purposes. A typical IPFIX-enabled system includes a pool of Metering Processes that collect data packets at one or more Observation Points, optionally filter them, and aggregate information about these packets. An Exporter then gathers each of the Observation Points together into an Observation Domain and sends this information via the IPFIX protocol to a Collector.</t>
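<t>The following Python sketch shows the metering and export roles described above. It models only the data flow: filtering, per-flow aggregation keyed by the 5-tuple, and grouping of Observation Points into an Observation Domain. The record keys reuse IANA IPFIX Information Element names. The class and function names are hypothetical, and the message encoding of <xref target="RFC7011" format="default"/> is omitted.</t>

```python
"""Minimal sketch of IPFIX-style metering and flow aggregation.

Assumption: packets arrive as dicts; a Metering Process at an Observation
Point optionally filters them and aggregates them into flow records keyed
by the 5-tuple. Record keys follow IANA IPFIX Information Element names;
the RFC 7011 wire encoding is deliberately not modeled.
"""
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

FlowKey = Tuple[str, str, int, int, int]


class MeteringProcess:
    """One Metering Process attached to one Observation Point."""

    def __init__(self, keep: Optional[Callable[[dict], bool]] = None) -> None:
        self.keep = keep or (lambda pkt: True)  # optional packet filter
        self.flows: Dict[FlowKey, List[int]] = defaultdict(lambda: [0, 0])

    def observe(self, pkt: dict) -> None:
        if not self.keep(pkt):
            return
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        counts = self.flows[key]
        counts[0] += 1              # packets in this flow
        counts[1] += pkt["length"]  # octets in this flow


def export(domain_id: int, points: List[MeteringProcess]) -> List[dict]:
    """Exporter: collect the flow records of all points in one domain."""
    records = []
    for mp in points:
        for (src, dst, sport, dport, proto), (pkts, octets) in mp.flows.items():
            records.append({
                "observationDomainId": domain_id,
                "sourceIPv4Address": src,
                "destinationIPv4Address": dst,
                "sourceTransportPort": sport,
                "destinationTransportPort": dport,
                "protocolIdentifier": proto,
                "packetDeltaCount": pkts,
                "octetDeltaCount": octets,
            })
    return records
```

<t>Filtering before aggregation is what allows a Metering Process to report only the traffic of interest. In this sketch, a filter on protocolIdentifier 6 would keep TCP flows and discard the rest.</t>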
</section>
<section numbered="true" toc="default">
<name>In Situ OAM</name>
<t>Classical passive and active monitoring and measurement techniques are either inaccurate or resource-consuming. It is preferable to directly acquire data associated with a flow's packets when the packets pass through a network. <xref target="RFC9197" format="default">IOAM</xref>, a data generation technique, embeds a new instruction header in user packets, and the instruction directs the network nodes to add the requested data to the packets. Thus, at the path's end, the packet's experience gained on the entire forwarding path can be collected. Such firsthand data is invaluable to many network OAM applications.</t>
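<t>The passport model can be illustrated with a short Python sketch. Each transit node appends its node ID and a timestamp to a trace carried by the packet, and the node at the path's end derives per-hop delays. The trace list stands in for the IOAM data fields. The actual header formats of <xref target="RFC9197" format="default"/> are not modeled, and all names are hypothetical.</t>

```python
"""Minimal sketch of passport-style (IOAM-like) data collection.

Assumption: a packet carries a trace list that plays the role of the
IOAM data area, and each transit node appends its node ID and a
timestamp. The node at the path's end reads the whole trace. Real IOAM
encodings (RFC 9197) are not modeled; names are hypothetical.
"""
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Packet:
    payload: bytes
    trace: List[Tuple[int, float]] = field(default_factory=list)  # (node_id, time)


def transit(pkt: Packet, node_id: int, now: float) -> Packet:
    """Each node adds the requested data (here: its ID and a timestamp)."""
    pkt.trace.append((node_id, now))
    return pkt


def hop_delays(pkt: Packet) -> List[Tuple[int, int, float]]:
    """At the path's end: per-hop delay derived from the collected trace."""
    return [(a_id, b_id, round(b_t - a_t, 6))
            for (a_id, a_t), (b_id, b_t) in zip(pkt.trace, pkt.trace[1:])]
```

<t>Because the data travels inside the packet, the record grows with path length. This is one source of the overhead and encapsulation concerns noted for IOAM.</t>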
<t>However, IOAM also faces some challenges. Issues of performance impact, security, scalability and overhead limits, encapsulation difficulties in some protocols, and cross-domain deployment need to be addressed.</t>
</section>
<section anchor="pbt" numbered="true" toc="default">
<name>Postcard-Based Telemetry</name>
<t>Postcard-Based Telemetry (PBT), as embodied in <xref target="IPPM-IOAM-DIRECT-EXPORT" format="default">IOAM Direct Export (DEX)</xref> and <xref target="I-D.song-ippm-postcard-based-telemetry" format="default">IOAM Marking</xref>, is a complementary technique to the passport-based IOAM <xref target="RFC9197" format="default"/>. PBT directly exports data at each node through an independent packet. At the cost of higher bandwidth overhead and the need for data correlation, PBT shows several unique advantages. It can also help to identify where a packet was dropped if the packet does not reach the end of its forwarding path.</t>
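<t>Unlike the passport model, PBT requires the collector to correlate postcards itself. The Python sketch below makes three assumptions: each postcard carries a packet identifier, a node ID, and a timestamp; the path is known; and postcards are not themselves lost. Under those assumptions, the first on-path node that sent no postcard bounds where the packet was dropped. Names are hypothetical.</t>

```python
"""Minimal sketch of postcard-based telemetry (PBT) correlation.

Assumption: every node on a known path exports a separate postcard
(packet_id, node_id, timestamp) to a collector instead of writing into
the packet, and postcards themselves are not lost. Names are hypothetical.
"""
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Postcard = Tuple[int, str, float]  # (packet_id, node_id, timestamp)


def correlate(postcards: List[Postcard]) -> Dict[int, List[str]]:
    """Group postcards per packet, ordering the reporting nodes by time."""
    per_packet = defaultdict(list)
    for pkt_id, node, ts in postcards:
        per_packet[pkt_id].append((ts, node))
    return {pid: [node for _, node in sorted(v)] for pid, v in per_packet.items()}


def drop_location(seen: List[str], path: List[str]) -> Optional[str]:
    """Return the first on-path node that sent no postcard, or None.

    A gap means the packet was dropped after the last reporting node
    and at or before this one.
    """
    for node in path:
        if node not in seen:
            return node
    return None
```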
</section>
<section numbered="true" toc="default">
<name>Existing OAM for Specific Data Planes</name>
<t>
Various data planes raise unique OAM requirements. The IETF has published OAM technique and framework documents (e.g., <xref target="RFC8924" format="default"/> and <xref target="RFC5085" format="default"/>) targeting different data planes such as Multiprotocol Label Switching (MPLS), L2 Virtual Private Network (VPN), Network Virtualization over Layer 3 (NVO3), Virtual Extensible LAN (VXLAN), Bit Index Explicit Replication (BIER), Service Function Chaining (SFC), Segment Routing (SR), and Deterministic Networking (DETNET). The aforementioned data plane telemetry techniques can be used to enhance the OAM capability on such data planes.
</t>
</section>
</section>
<section numbered="true" toc="default">
<name>External Data and Event Telemetry</name>
<section numbered="true" toc="default">
<name>Sources of External Events</name>
<t>To ensure that the information provided by external event detectors and used by the network management solutions is meaningful for management purposes, the network telemetry framework must ensure that such detectors (sources) are easily connected to the management solutions (sinks). This requires the specification of a list of potential external data sources that could be of interest in network management and matching it to the connectors and/or interfaces required to connect them.</t>
<t>Categories of external event sources that may be of interest to network management include:</t>
<ul spacing="normal">
<li>Smart objects and sensors. With the consolidation of the Internet of Things (IoT), any network system will have many smart objects attached to its physical surroundings and logical operation environments. Most of these objects will essentially be based on sensors of many kinds (e.g., temperature, humidity, and presence), and the information they provide can be very useful for the management of the network, even when they are not specifically deployed for such a purpose. Elements of this source type will usually provide a specific protocol for interaction, especially one of the protocols related to IoT, such as the Constrained Application Protocol (CoAP).</li>
<li>Online news reporters. Several online news services have the ability to provide an enormous quantity of information about different events occurring in the world. Some of those events can have an impact on the network system managed by a specific framework; therefore, such information may be of interest to the management solution. For instance, diverse security reports, such as Common Vulnerabilities and Exposures (CVEs), can be issued by the corresponding authority and used by the management solution to update the managed system, if needed. Instead of a specific protocol and data format, the sources of this kind of information usually follow a relaxed but structured format. This format will be part of both the ontology and information model of the telemetry framework.</li>
<li>Global event analyzers. The advance of big data analyzers provides a huge amount of information and, more interestingly, the identification of events detected by analyzing many data streams from different origins. In contrast with the other types of sources, which are focused on specific events, the detectors of this source type will detect generic events. For example, during a sports event, an unexpected turn can make it captivating, and many people then connect to sites that are reporting on the event. The underlying networks supporting the services that cover the event can be affected by such a situation, so their management solutions should be aware of it. Unlike the other source types, this type requires a new information model, format, and reporting protocol to integrate its detectors with the management solution.</li>
</ul>
<t>Additional detector types can be added to the system, but generally they will be the result of composing the properties offered by these main classes.</t>
</section>
<section numbered="true" toc="default">
<name>Connectors and Interfaces</name>
<t>To allow external event detectors to be properly integrated with other management solutions, both elements must expose interfaces and protocols that are subject to their particular objective. Since external event detectors will be focused on providing their information to their main consumers, which generally will not be limited to the network management solutions, the framework must include the definition of the connectors required to ensure that the interconnection between detectors (sources) and their consumers within the management systems (sinks) is effective.</t>
<t>In some situations, the interconnection between external event detectors and the management system is via the management plane. For those situations, there will be a special connector that provides the typical interfaces found in most other elements connected to the management plane. For instance, the interfaces could accomplish this with a specific data model (YANG) and a specific telemetry protocol, such as NETCONF, YANG-Push, or gRPC.</t>
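<t>One way to picture such connectors is a per-source adapter that maps each source's native format into a common event record before handing it to management sinks. The Python sketch below is hypothetical throughout: the record fields, the two source formats, and the function names are illustrative assumptions. Transport details such as CoAP, NETCONF, or gRPC are left out.</t>

```python
"""Minimal sketch of connectors between external event sources and sinks.

Assumption: each external source delivers data in its own shape (here, a
sensor reading and a security advisory), and a per-source connector maps
it into one common event record that management sinks understand. The
record fields and source formats are hypothetical.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ExternalEvent:
    source_type: str  # e.g., "sensor", "news", "analyzer"
    subject: str      # what the event is about
    severity: str
    detail: str


def sensor_connector(msg: dict) -> ExternalEvent:
    """Map a hypothetical sensor reading to the common record."""
    hot = msg["celsius"] >= 40  # illustrative threshold
    return ExternalEvent("sensor", msg["location"],
                         "warning" if hot else "info",
                         f"temperature {msg['celsius']} C")


def advisory_connector(msg: dict) -> ExternalEvent:
    """Map a hypothetical security advisory to the common record."""
    return ExternalEvent("news", msg["product"], "critical", msg["id"])


CONNECTORS: Dict[str, Callable[[dict], ExternalEvent]] = {
    "sensor": sensor_connector,
    "advisory": advisory_connector,
}


def deliver(kind: str, msg: dict,
            sinks: List[Callable[[ExternalEvent], None]]) -> ExternalEvent:
    """Normalize one source message and hand it to every sink."""
    event = CONNECTORS[kind](msg)
    for sink in sinks:
        sink(event)
    return event
```

<t>Keeping source-specific parsing inside the connectors means a sink consumes a single record type. A new source category then needs only a new connector.</t>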
</section>
<section anchor="Acknowledgments" numbered="false" toc="default">
<name>Acknowledgments</name>
<t>We would like to thank <contact fullname="Rob Wilton"/>, <contact fullname="Greg Mirsky"/>, <contact fullname="Randy Presuhn"/>, <contact fullname="Joe Clarke"/>, <contact fullname="Victor Liu"/>, <contact fullname="James Guichard"/>, <contact fullname="Uri Blumenthal"/>, <contact fullname="Giuseppe Fioccola"/>, <contact fullname="Yunan Gu"/>, <contact fullname="Parviz Yegani"/>, <contact fullname="Young Lee"/>, <contact fullname="Qin Wu"/>, <contact fullname="Gyan Mishra"/>, <contact fullname="Ben Schwartz"/>, <contact fullname="Alexey Melnikov"/>, <contact fullname="Michael Scharf"/>, <contact fullname="Dhruv Dhody"/>, <contact fullname="Martin Duke"/>, <contact fullname="Roman Danyliw"/>, <contact fullname="Warren Kumari"/>, <contact fullname="Sheng Jiang"/>, <contact fullname="Lars Eggert"/>, <contact fullname="Éric Vyncke"/>, <contact fullname="Jean-Michel Combes"/>, <contact fullname="Erik Kline"/>, <contact fullname="Benjamin Kaduk"/>, and many others who have provided helpful comments and suggestions to improve this document.</t>
</section>
<section anchor="Contributors" numbered="false" toc="default">
<name>Contributors</name>
<t>The other contributors to this document are <contact fullname="Tianran Zhou"/>, <contact fullname="Zhenbin Li"/>, <contact fullname="Zhenqiang Li"/>, <contact fullname="Daniel King"/>, <contact fullname="Adrian Farrel"/>, and <contact fullname="Alexander Clemm"/>.</t>
</section>
</section>
</section>
</back>
</rfc> </rfc>
 End of changes. 29 change blocks. 
1401 lines changed or deleted 1641 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/