rfc9232.original   rfc9232.txt 
OPSAWG H. Song Internet Engineering Task Force (IETF) H. Song
Internet-Draft Futurewei Request for Comments: 9232 Futurewei
Intended status: Informational F. Qin Category: Informational F. Qin
Expires: 6 June 2022 China Mobile ISSN: 2070-1721 China Mobile
P. Martinez-Julia P. Martinez-Julia
NICT NICT
L. Ciavaglia L. Ciavaglia
Rakuten Mobile Rakuten Mobile
A. Wang A. Wang
China Telecom China Telecom
3 December 2021 May 2022
Network Telemetry Framework Network Telemetry Framework
draft-ietf-opsawg-ntf-13
Abstract Abstract
Network telemetry is a technology for gaining network insight and Network telemetry is a technology for gaining network insight and
facilitating efficient and automated network management. It facilitating efficient and automated network management. It
encompasses various techniques for remote data generation, encompasses various techniques for remote data generation,
collection, correlation, and consumption. This document describes an collection, correlation, and consumption. This document describes an
architectural framework for network telemetry, motivated by architectural framework for network telemetry, motivated by
challenges that are encountered as part of the operation of networks challenges that are encountered as part of the operation of networks
and by the requirements that ensue. This document clarifies the and by the requirements that ensue. This document clarifies the
terminologies and classifies the modules and components of a network terminology and classifies the modules and components of a network
telemetry system from different perspectives. The framework and telemetry system from different perspectives. The framework and
taxonomy help to set a common ground for the collection of related taxonomy help to set a common ground for the collection of related
work and provide guidance for related technique and standard work and provide guidance for related technique and standard
developments. developments.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This document is not an Internet Standards Track specification; it is
provisions of BCP 78 and BCP 79. published for informational purposes.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
This Internet-Draft will expire on 6 June 2022. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9232.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction 1. Introduction
1.1. Applicability Statement 1.1. Applicability Statement
1.2. Glossary 1.2. Glossary
2. Background 2. Background
2.1. Telemetry Data Coverage 2.1. Telemetry Data Coverage
2.2. Use Cases 2.2. Use Cases
2.3. Challenges 2.3. Challenges
2.4. Network Telemetry 2.4. Network Telemetry
2.5. The Necessity of a Network Telemetry Framework 2.5. The Necessity of a Network Telemetry Framework
3. Network Telemetry Framework 3. Network Telemetry Framework
3.1. Top Level Modules 3.1. Top-Level Modules
3.1.1. Management Plane Telemetry 3.1.1. Management Plane Telemetry
3.1.2. Control Plane Telemetry 3.1.2. Control Plane Telemetry
3.1.3. Forwarding Plane Telemetry 3.1.3. Forwarding Plane Telemetry
3.1.4. External Data Telemetry 3.1.4. External Data Telemetry
3.2. Second Level Function Components 3.2. Second-Level Function Components
3.3. Data Acquisition Mechanism and Type Abstraction 3.3. Data Acquisition Mechanism and Type Abstraction
3.4. Mapping Existing Mechanisms into the Framework 3.4. Mapping Existing Mechanisms into the Framework
4. Evolution of Network Telemetry Applications 4. Evolution of Network Telemetry Applications
5. Security Considerations 5. Security Considerations
6. IANA Considerations 6. IANA Considerations
7. Contributors 7. Informative References
8. Acknowledgments Appendix A. A Survey on Existing Network Telemetry Techniques
9. Informative References A.1. Management Plane Telemetry
Appendix A. A Survey on Existing Network Telemetry Techniques A.1.1. Push Extensions for NETCONF
A.1. Management Plane Telemetry A.1.2. gRPC Network Management Interface
A.1.1. Push Extensions for NETCONF A.2. Control Plane Telemetry
A.1.2. gRPC Network Management Interface A.2.1. BGP Monitoring Protocol
A.2. Control Plane Telemetry A.3. Data Plane Telemetry
A.2.1. BGP Monitoring Protocol A.3.1. Alternate-Marking (AM) Technology
A.3. Data Plane Telemetry A.3.2. Dynamic Network Probe
A.3.1. The Alternate Marking (AM) technology A.3.3. IP Flow Information Export (IPFIX) Protocol
A.3.2. Dynamic Network Probe A.3.4. In Situ OAM
A.3.3. IP Flow Information Export (IPFIX) Protocol A.3.5. Postcard-Based Telemetry
A.3.4. In-Situ OAM A.3.6. Existing OAM for Specific Data Planes
A.3.5. Postcard Based Telemetry A.4. External Data and Event Telemetry
A.3.6. Existing OAM for Specific Data Planes A.4.1. Sources of External Events
A.4. External Data and Event Telemetry A.4.2. Connectors and Interfaces
A.4.1. Sources of External Events Acknowledgments
A.4.2. Connectors and Interfaces Contributors
Authors' Addresses Authors' Addresses
1. Introduction 1. Introduction
Network visibility is the ability of management tools to see the Network visibility is the ability of management tools to see the
state and behavior of a network, which is essential for successful state and behavior of a network, which is essential for successful
network operation. Network Telemetry revolves around network data network operation. Network telemetry revolves around network data
that can help provide insights about the current state of the that 1) can help provide insights about the current state of the
network, including network devices, forwarding, control, and network, including network devices, forwarding, control, and
management planes, and that can be generated and obtained through a management planes; 2) can be generated and obtained through a variety
variety of techniques, including but not limited to network of techniques, including but not limited to network instrumentation
instrumentation and measurements, and that can be processed for and measurements; and 3) can be processed for purposes ranging from
purposes ranging from service assurance to network security using a service assurance to network security using a wide variety of data
wide variety of data analytical techniques. In this document, analytical techniques. In this document, network telemetry refers to
Network Telemetry refer to both the data itself (i.e., "Network both the data itself (i.e., "Network Telemetry Data") and the
Telemetry Data"), and the techniques and processes used to generate, techniques and processes used to generate, export, collect, and
export, collect, and consume that data for use by potentially consume that data for use by potentially automated management
automated management applications. Network telemetry extends beyond applications. Network telemetry extends beyond the classical network
the classical network Operations, Administration, and Maintenance Operations, Administration, and Maintenance (OAM) techniques and
(OAM) techniques and expects to support better flexibility, expects to support better flexibility, scalability, accuracy,
scalability, accuracy, coverage, and performance. coverage, and performance.
However, the term "network telemetry" lacks an unambiguous However, the term "network telemetry" lacks an unambiguous
definition. The scope and coverage of it cause confusion and definition. The scope and coverage of it cause confusion and
misunderstandings. It is beneficial to clarify the concept and misunderstandings. It is beneficial to clarify the concept and
provide a clear architectural framework for network telemetry, so we provide a clear architectural framework for network telemetry, so we
can articulate the technical field, and better align the related can articulate the technical field and better align the related
techniques and standard works. techniques and standard works.
To fulfill such an undertaking, we first discuss some key To fulfill such an undertaking, we first discuss some key
characteristics of network telemetry which set a clear distinction characteristics of network telemetry that set a clear distinction
from the conventional network OAM and show that some conventional OAM from the conventional network OAM and show that some conventional OAM
technologies can be considered a subset of the network telemetry technologies can be considered a subset of the network telemetry
technologies. We then provide an architectural framework for network technologies. We then provide an architectural framework for network
telemetry which includes four modules, each concerned with a telemetry that includes four modules, each associated with a
different category of telemetry data and corresponding procedures. different category of telemetry data and corresponding procedures.
All the modules are internally structured in the same way, including All the modules are internally structured in the same way, including
components that allow the operator to configure data sources in components that allow the operator to configure data sources in
regard to what data to generate and how to make that available to regard to what data to generate and how to make that available to
client applications, components that instrument the underlying data client applications, components that instrument the underlying data
sources, and components that perform the actual rendering, encoding, sources, and components that perform the actual rendering, encoding,
and exporting of the generated data. We show how the network and exporting of the generated data. We show how the network
telemetry framework can benefit the current and future network telemetry framework can benefit current and future network
operations. Based on the distinction of modules and function operations. Based on the distinction of modules and function
components, we can map the existing and emerging techniques and components, we can map the existing and emerging techniques and
protocols into the framework. The framework can also simplify protocols into the framework. The framework can also simplify
designing, maintaining, and understanding a network telemetry system. designing, maintaining, and understanding a network telemetry system.
In addition, we outline the evolution stages of the network telemetry In addition, we outline the evolution stages of the network telemetry
system and discuss the potential security concerns. system and discuss the potential security concerns.
The purpose of the framework and taxonomy is to set a common ground The purpose of the framework and taxonomy is to set a common ground
for the collection of related work and provide guidance for future for the collection of related work and provide guidance for future
technique and standard developments. To the best of our knowledge, technique and standard developments. To the best of our knowledge,
skipping to change at page 4, line 35 skipping to change at line 175
The network telemetry framework presented in this document must not The network telemetry framework presented in this document must not
be applied to generating, exporting, collecting, analyzing, or be applied to generating, exporting, collecting, analyzing, or
retaining individual user data or any data that can identify end retaining individual user data or any data that can identify end
users or characterize their behavior without consent. Based on this users or characterize their behavior without consent. Based on this
principle, the network telemetry framework is not applicable to principle, the network telemetry framework is not applicable to
networks whose endpoints represent individual users, such as general- networks whose endpoints represent individual users, such as general-
purpose access networks. purpose access networks.
1.2. Glossary 1.2. Glossary
Before further discussion, we list some key terminology and acronyms Before further discussion, we list some key terminology and
used in this document. We make an intended differentiation between abbreviations used in this document. There is an intended
the terms of network telemetry and OAM. However, it should be differentiation between the terms of network telemetry and OAM.
understood that there is not a hard-line distinction between the two However, it should be understood that there is not a hard-line
concepts. Rather, network telemetry is considered as an extension of distinction between the two concepts. Rather, network telemetry is
OAM. It covers all the existing OAM protocols but puts more emphasis considered an extension of OAM. It covers all the existing OAM
on the newer and emerging techniques and protocols concerning all protocols but puts more emphasis on the newer and emerging techniques
aspects of network data from acquisition to consumption. and protocols concerning all aspects of network data from acquisition
to consumption.
AI: Artificial Intelligence. In the network domain, AI refers to AI: Artificial Intelligence. In the network domain, AI
the machine-learning based technologies for automated network refers to machine-learning-based technologies for
operation and other tasks. automated network operation and other tasks.
AM: Alternate Marking, a flow performance measurement method, AM: Alternate Marking. A flow performance measurement
specified in [RFC8321]. method, as specified in [RFC8321].
BMP: BGP Monitoring Protocol, specified in [RFC7854]. BMP: BGP Monitoring Protocol. Specified in [RFC7854].
DPI: Deep Packet Inspection, referring to the techniques that DPI: Deep Packet Inspection. Refers to the techniques that
examines packet beyond packet L3/L4 headers. examine packets beyond packet L3/L4 headers.
gNMI: gRPC Network Management Interface, a network management gNMI: gRPC Network Management Interface. A network management
protocol from OpenConfig Operator Working Group, mainly protocol from the OpenConfig Operator Working Group,
contributed by Google. See [gnmi] for details. mainly contributed by Google. See [gnmi] for details.
GPB: Google Protocol Buffer, an extensible mechanism for serializing GPB: Google Protocol Buffer. An extensible mechanism for
structured data. See [gpb] for details. serializing structured data. See [gpb] for details.
gRPC: gRPC Remote Procedure Call, an open source high performance gRPC: gRPC Remote Procedure Call. An open-source high-
RPC framework that gNMI is based on. See [grpc] for details. performance RPC framework that gNMI is based on. See
[grpc] for details.
IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. IPFIX: IP Flow Information Export Protocol. Specified in
[RFC7011].
IOAM: In-situ OAM [I-D.ietf-ippm-ioam-data], a dataplane on-path IOAM: In situ OAM [RFC9197]. A data plane on-path telemetry
telemetry technique. technique.
JSON: An open standard file format and data interchange format that JSON: JavaScript Object Notation. An open standard file format
uses human-readable text to store and transmit data objects, and data interchange format that uses human-readable text
specified in [RFC8259]. to store and transmit data objects, as specified in
[RFC8259].
MIB: Management Information Base, a database used for managing the MIB: Management Information Base. A database used for
entities in a network. managing the entities in a network.
NETCONF: Network Configuration Protocol, specified in [RFC6241]. NETCONF: Network Configuration Protocol. Specified in [RFC6241].
NetFlow: A Cisco protocol for flow record collecting, described in NetFlow: A Cisco protocol used for flow record collecting, as
[RFC3954]. described in [RFC3954].
Network Telemetry: The process and instrumentation for acquiring and Network Telemetry: The process and instrumentation for acquiring and
utilizing network data remotely for network monitoring and utilizing network data remotely for network monitoring
operation. A general term for a large set of network visibility and operation. A general term for a large set of network
techniques and protocols, concerning aspects like data generation, visibility techniques and protocols, concerning aspects
collection, correlation, and consumption. Network telemetry like data generation, collection, correlation, and
addresses the current network operation issues and enables smooth consumption. Network telemetry addresses current network
evolution toward future intent-driven autonomous networks. operation issues and enables smooth evolution toward
future intent-driven autonomous networks.
NMS: Network Management System, referring to applications that allow NMS: Network Management System. Refers to applications that
network administrators to manage a network. allow network administrators to manage a network.
OAM: Operations, Administration, and Maintenance. A group of OAM: Operations, Administration, and Maintenance. A group of
network management functions that provide network fault network management functions that provide network fault
indication, fault localization, performance information, and data indication, fault localization, performance information,
and diagnosis functions. Most conventional network monitoring and data and diagnosis functions. Most conventional
techniques and protocols belong to network OAM. network monitoring techniques and protocols belong to
network OAM.
PBT: Postcard-Based Telemetry, a dataplane on-path telemetry PBT: Postcard-Based Telemetry. A data plane on-path telemetry
technique. A representative technique is described in technique. A representative technique is described in
[I-D.ietf-ippm-ioam-direct-export]. [IPPM-IOAM-DIRECT-EXPORT].
RESTCONF: An HTTP-based protocol that provides a programmatic RESTCONF: An HTTP-based protocol that provides a programmatic
interface for accessing data defined in YANG, using the datastore interface for accessing data defined in YANG, using the
concepts defined in NETCONF, as specified in [RFC8040]. datastore concepts defined in NETCONF, as specified in
[RFC8040].
SMIv2: Structure of Management Information Version 2, defining MIB SMIv2: Structure of Management Information Version 2. Defines
objects, specified in [RFC2578]. MIB objects, as specified in [RFC2578].
SNMP: Simple Network Management Protocol. Version 1, 2, and 3 are SNMP: Simple Network Management Protocol. Versions 1, 2, and 3
specified in [RFC1157], [RFC3416], and [RFC3411], respectively. are specified in [RFC1157], [RFC3416], and [RFC3411],
respectively.
XML: Extensible Markup Language is a markup language for data XML: Extensible Markup Language. A markup language for data
encoding that is both human-readable and machine-readable, encoding that is both human readable and machine
specified by W3C [xml]. readable, as specified by W3C [W3C.REC-xml-20081126].
YANG: YANG is a data modeling language for the definition of data YANG: YANG is a data modeling language for the definition of
sent over network management protocols such as the NETCONF and data sent over network management protocols such as
RESTCONF. YANG is defined in [RFC6020] and [RFC7950]. NETCONF and RESTCONF. YANG is defined in [RFC6020] and
[RFC7950].
YANG ECA: A YANG model for Event-Condition-Action policies, defined YANG ECA: A YANG model for Event-Condition-Action policies, as
in [I-D.wwx-netmod-event-yang]. defined in [NETMOD-ECA-POLICY].
YANG-Push: A mechanism that allows subscriber applications to YANG-Push: A mechanism that allows subscriber applications to
request a stream of updates from a YANG datastore on a network request a stream of updates from a YANG datastore on a
device. Details are specified in [RFC8641] and [RFC8639]. network device. Details are specified in [RFC8639] and
[RFC8641].
2. Background 2. Background
The term "big data" is used to describe the extremely large volume of The term "big data" is used to describe the extremely large volume of
data sets that can be analyzed computationally to reveal patterns, data sets that can be analyzed computationally to reveal patterns,
trends, and associations. Networks are undoubtedly a source of big trends, and associations. Networks are undoubtedly a source of big
data because of their scale and the volume of network traffic they data because of their scale and the volume of network traffic they
forward. When a network's endpoints do not represent individual forward. When a network's endpoints do not represent individual
users (e.g. in industrial, datacenter, and infrastructure contexts), users (e.g., in industrial, data-center, and infrastructure
network operations can often benefit from large-scale data collection contexts), network operations can often benefit from large-scale data
without breaching user privacy. collection without breaching user privacy.
Today one can access advanced big data analytics capability through a Today, one can access advanced big data analytics capability through
plethora of commercial and open source platforms (e.g., Apache a plethora of commercial and open-source platforms (e.g., Apache
Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine
learning). Thanks to the advance of computing and storage learning). Thanks to the advance of computing and storage
technologies, network big data analytics gives network operators an technologies, network big data analytics give network operators an
opportunity to gain network insights and move towards network opportunity to gain network insights and move towards network
autonomy. Some operators have started to explore the application of autonomy. Some operators have started to explore the application of
Artificial Intelligence (AI) to make sense of network data. Software Artificial Intelligence (AI) to make sense of network data. Software
tools can use the network data to detect and react to network faults, tools can use the network data to detect and react to network faults,
anomalies, and policy violations, as well as predicting future anomalies, and policy violations, as well as predict future events.
events. In turn, the network policy updates for planning, intrusion In turn, the network policy updates for planning, intrusion
prevention, optimization, and self-healing may be applied. prevention, optimization, and self-healing may be applied.
It is conceivable that an autonomic network [RFC7575] is the logical It is conceivable that an autonomic network [RFC7575] is the logical
next step for network evolution following Software Defined Networking next step for network evolution following Software-Defined Networking
(SDN), aiming to reduce (or even eliminate) human labor, make more (SDN), which aims to reduce (or even eliminate) human labor, make
efficient use of network resources, and provide better services more more efficient use of network resources, and provide better services
aligned with customer requirements. The IETF ANIMA working group is more aligned with customer requirements. The IETF ANIMA Working
dedicated to developing and maintaining protocols and procedures for Group is dedicated to developing and maintaining protocols and
automated network management and control of professionally-managed procedures for automated network management and control of
networks. The related technique of Intent-based Networking (IBN) professionally managed networks. The related technique of
[I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility Intent-Based Networking (IBN) [NMRG-IBN-CONCEPTS-DEFINITIONS]
and telemetry data in order to ensure that the network is behaving as requires network visibility and telemetry data in order to ensure
intended. that the network is behaving as intended.
However, while the data processing capability is improved and However, while the data processing capability is improved and
applications require more data to function better, the networks lag applications require more data to function better, the networks lag
behind in extracting and translating network data into useful and behind in extracting and translating network data into useful and
actionable information in efficient ways. The system bottleneck is actionable information in efficient ways. The system bottleneck is
shifting from data consumption to data supply. Both the number of shifting from data consumption to data supply. Both the number of
network nodes and the traffic bandwidth keep increasing at a fast network nodes and the traffic bandwidth keep increasing at a fast
pace. The network configuration and policy change at smaller time pace. The network configuration and policy change at smaller time
slots than before. More subtle events and fine-grained data through slots than before. More subtle events and fine-grained data through
all network planes need to be captured and exported in real time. In all network planes need to be captured and exported in real time. In
a nutshell, it is a challenge to get enough high-quality data out of a nutshell, it is a challenge to get enough high-quality data out of
the network in a manner that is efficient, timely, and flexible. the network in a manner that is efficient, timely, and flexible.
Therefore, we need to survey the existing technologies and protocols Therefore, we need to survey the existing technologies and protocols
and identify any potential gaps. and identify any potential gaps.
In the remainder of this section, first we clarify the scope of In the remainder of this section, we first clarify the scope of
network data (i.e., telemetry data) relevant in this document. Then, network data (i.e., telemetry data) relevant in this document. Then,
we discuss several key use cases for today's and future network we discuss several key use cases for network operations of today and
operations. Next, we show why the current network OAM techniques and the future. Next, we show why the current network OAM techniques and
protocols are insufficient for these use cases. The discussion protocols are insufficient for these use cases. The discussion
underlines the need for new methods, techniques, and protocols, as underlines the need for new methods, techniques, and protocols, as
well as the extensions of existing ones, which we assign under the well as the extensions of existing ones, which we assign under the
umbrella term - Network Telemetry. umbrella term "Network Telemetry".
2.1. Telemetry Data Coverage 2.1. Telemetry Data Coverage
Any information that can be extracted from networks (including data Any information that can be extracted from networks (including the
plane, control plane, and management plane) and used to gain data plane, control plane, and management plane) and used to gain
visibility or as basis for actions is considered telemetry data. It visibility or as a basis for actions is considered telemetry data.
includes statistics, event records and logs, snapshots of state, It includes statistics, event records and logs, snapshots of state,
configuration data, etc. It also covers the outputs of any active configuration data, etc. It also covers the outputs of any active
and passive measurements [RFC7799]. In some cases, raw data is and passive measurements [RFC7799]. In some cases, raw data is
processed in network before being sent to a data consumer. Such processed in network before being sent to a data consumer. Such
processed data is also considered telemetry data. The value of processed data is also considered telemetry data. The value of
telemetry data varies. In some cases, if the cost is acceptable, telemetry data varies. In some cases, if the cost is acceptable,
less but higher quality data are preferred than lots of low quality less but higher-quality data are preferred rather than a lot of low-
data. A classification of telemetry data is provided in Section 3. quality data. A classification of telemetry data is provided in
To preserve the privacy of end-users, no user packet content should Section 3. To preserve the privacy of end users, no user packet
be collected. Specifically, the data objects generated, exported, content should be collected. Specifically, the data objects
and collected by a network telemetry application should not include generated, exported, and collected by a network telemetry application
any packet payload from traffic associated with end-users systems. should not include any packet payload from traffic associated with
end-user systems.
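As an illustration of the data coverage described above, the following minimal Python sketch models a telemetry record and a helper that derives a record from an observed packet while discarding the payload. The names TelemetryRecord, Plane, DataKind, PAYLOAD_FIELDS, and record_from_packet, as well as the dictionary layout of the packet, are assumptions made for this example only; they are not defined by this document.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class Plane(Enum):
    """Network planes from which telemetry data can be extracted."""
    DATA = "data"
    CONTROL = "control"
    MANAGEMENT = "management"


class DataKind(Enum):
    """Kinds of telemetry data named in the text above."""
    STATISTICS = "statistics"
    EVENT = "event"
    LOG = "log"
    STATE_SNAPSHOT = "state-snapshot"
    CONFIGURATION = "configuration"


# Hypothetical packet fields treated as user content; these are never exported.
PAYLOAD_FIELDS = frozenset({"payload"})


@dataclass
class TelemetryRecord:
    """One unit of telemetry data (illustrative layout, not normative)."""
    plane: Plane
    kind: DataKind
    attributes: Dict[str, Any] = field(default_factory=dict)


def record_from_packet(packet: Dict[str, Any]) -> TelemetryRecord:
    """Build a data plane statistics record from header-level fields only."""
    attributes = {k: v for k, v in packet.items() if k not in PAYLOAD_FIELDS}
    # Keep the payload size as a statistic, but not the payload itself.
    attributes["payload_length"] = len(packet.get("payload", b""))
    return TelemetryRecord(Plane.DATA, DataKind.STATISTICS, attributes)
```

In this sketch, filtering happens at record creation time, so that payload bytes never reach the export or collection stages. A real system would likely apply an equivalent rule at the data source.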
2.2. Use Cases 2.2. Use Cases
The following set of use cases is essential for network operations. The following set of use cases is essential for network operations.
While the list is by no means exhaustive, it is enough to highlight While the list is by no means exhaustive, it is enough to highlight
the requirements for data velocity, variety, volume, and veracity, the requirements for data velocity, variety, volume, and veracity,
the attributes of big data, in networks. the attributes of big data, in networks.
* Security: Network intrusion detection and prevention systems need * Security: Network intrusion detection and prevention systems need
to monitor network traffic and activities and act upon anomalies. to monitor network traffic and activities and act upon anomalies.
Given increasingly sophisticated attack vectors coupled with Given increasingly sophisticated attack vectors coupled with
increasingly severe consequences of security breaches, new tools increasingly severe consequences of security breaches, new tools
and techniques need to be developed, relying on wider and deeper and techniques need to be developed, relying on wider and deeper
visibility into networks. The ultimate goal is to achieve visibility into networks. The ultimate goal is to achieve
security with no, or only minimal, human intervention, and without security with no, or only minimal, human intervention and without
disrupting legitimate traffic flows. disrupting legitimate traffic flows.
* Policy and Intent Compliance: Network policies are the rules that * Policy and Intent Compliance: Network policies are the rules that
constrain the services for network access, provide service constrain the services for network access, provide service
differentiation, or enforce specific treatment on the traffic. differentiation, or enforce specific treatment on the traffic.
For example, a service function chain is a policy that requires For example, a service function chain is a policy that requires
the selected flows to pass through a set of ordered network the selected flows to pass through a set of ordered network
functions. Intent, as defined in functions. Intent, as defined in [NMRG-IBN-CONCEPTS-DEFINITIONS],
[I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational is a set of operational goals that a network should meet and
goals that a network should meet and outcomes that a network is outcomes that a network is supposed to deliver, defined in a
supposed to deliver, defined in a declarative manner without declarative manner without specifying how to achieve or implement
specifying how to achieve or implement them. An intent requires a them. An intent requires a complex translation and mapping
complex translation and mapping process before being applied on process before being applied on networks. While a policy or
networks. While a policy or intent is enforced, the compliance intent is enforced, the compliance needs to be verified and
needs to be verified and monitored continuously by relying on monitored continuously by relying on visibility that is provided
visibility that is provided through network telemetry data. Any through network telemetry data. Any violation must be reported
violation must be reported immediately, potentially resulting in immediately - this will alert the network administrator to the
updates to how the policy or intent is applied in the network to policy or intent violation and will potentially result in updates
ensure that it remains in force, or otherwise alerting the network to how the policy or intent is applied in the network to ensure
administrator to the policy or intent violation. that it remains in force.
* SLA Compliance: A Service-Level Agreement (SLA) is a service * SLA Compliance: A Service Level Agreement (SLA) is a service
contract between a service provider and a client, which include contract between a service provider and a client, which includes
the metrics for the service measurement and remedy/penalty the metrics for the service measurement and remedy/penalty
procedures when the service level misses the agreement. Users procedures when the service level misses the agreement. Users
need to check if they get the service as promised and network need to check if they get the service as promised, and network
operators need to evaluate how they can deliver services that can operators need to evaluate how they can deliver services that meet
meet the SLA based on realtime network telemetry data, including the SLA based on real-time network telemetry data, including data
data from network measurements. from network measurements.
* Root Cause Analysis: Many network failure can be the effect of a * Root Cause Analysis: Many network failures can be the effect of a
sequence of chained events. Troubleshooting and recovery require sequence of chained events. Troubleshooting and recovery require
quick identification of the root cause of any observable issues. quick identification of the root cause of any observable issues.
However, the root cause is not always straightforward to identify, However, the root cause is not always straightforward to identify,
especially when the failure is sporadic and the number of event especially when the failure is sporadic and the number of event
messages, both related and unrelated to the same cause, is messages, both related and unrelated to the same cause, is
overwhelming. While technologies such as machine learning can be overwhelming. While technologies such as machine learning can be
used for root cause analysis, it is up to the network to sense and used for root cause analysis, it is up to the network to sense and
provide the relevant diagnostic data which are either actively fed provide the relevant diagnostic data that are either actively fed
into, or passively retrieved by, the root cause analysis into or passively retrieved by the root cause analysis
applications. applications.
* Network Optimization: This covers all short-term and long-term * Network Optimization: This covers all short-term and long-term
network optimization techniques, including load balancing, Traffic network optimization techniques, including load balancing, Traffic
Engineering (TE), and network planning. Network operators are Engineering (TE), and network planning. Network operators are
motivated to optimize their network utilization and differentiate motivated to optimize their network utilization and differentiate
services for better Return On Investment (ROI) or lower Capital services for better Return on Investment (ROI) or lower Capital
Expenditures (CAPEX). The first step is to know the real-time Expenditure (CAPEX). The first step is to know the real-time
network conditions before applying policies for traffic network conditions before applying policies for traffic
manipulation. In some cases, micro-bursts need to be detected in manipulation. In some cases, microbursts need to be detected in a
a very short time-frame so that fine-grained traffic control can very short time frame so that fine-grained traffic control can be
be applied to avoid network congestion. Long-term planning of applied to avoid network congestion. Long-term planning of
network capacity and topology requires analysis of real-world network capacity and topology requires analysis of real-world
network telemetry data that is obtained over long periods of time. network telemetry data that is obtained over long periods of time.
* Event Tracking and Prediction: The visibility into traffic path * Event Tracking and Prediction: The visibility into traffic path
and performance is critical for services and applications that and performance is critical for services and applications that
rely on healthy network operation. Numerous related network rely on healthy network operation. Numerous related network
events are of interest to network operators. For example, Network events are of interest to network operators. For example, network
operators want to learn where and why packets are dropped for an operators want to learn where and why packets are dropped for an
application flow. They also want to be warned of issues in application flow. They also want to be warned of issues in
advance, so proactive actions can be taken to avoid catastrophic advance, so proactive actions can be taken to avoid catastrophic
consequences. consequences.
2.3. Challenges 2.3. Challenges
For a long time, network operators have relied upon SNMP [RFC3416], For a long time, network operators have relied upon SNMP [RFC3416],
Command-Line Interface (CLI), or Syslog [RFC5424] to monitor the Command-Line Interface (CLI), or Syslog [RFC5424] to monitor the
network. Some other OAM techniques as described in [RFC7276] are network. Some other OAM techniques as described in [RFC7276] are
also used to facilitate network troubleshooting. These conventional also used to facilitate network troubleshooting. These conventional
techniques are not sufficient to support the above use cases for the techniques are not sufficient to support the above use cases for the
following reasons: following reasons:
* Most use cases need to continuously monitor the network and * Most use cases need to continuously monitor the network and
dynamically refine the data collection in real-time. Poll-based dynamically refine the data collection in real time. Poll-based
low-frequency data collection is ill-suited for these low-frequency data collection is ill-suited for these
applications. Subscription-based streaming data directly pushed applications. Subscription-based streaming data directly pushed
from the data source (e.g., the forwarding chip) is preferred to from the data source (e.g., the forwarding chip) is preferred to
provide sufficient data quantity and precision at scale. provide sufficient data quantity and precision at scale.
* Comprehensive data is needed, ranging from packet processing * Comprehensive data is needed, ranging from packet processing
engines to traffic manager, from line cards to main control board, engines to traffic managers, line cards to main control boards,
from user flows to control protocol packets, from device user flows to control protocol packets, device configurations to
configurations to operations, and from physical layer to operations, and physical layers to application layers.
application layer. Conventional OAM only covers a narrow range of Conventional OAM only covers a narrow range of data (e.g., SNMP
data (e.g., SNMP only handles data from the Management Information only handles data from the Management Information Base (MIB)).
Base (MIB)). Classical network devices cannot provide all the Classical network devices cannot provide all the necessary probes.
necessary probes. More open and programmable network devices are More open and programmable network devices are therefore needed.
therefore needed.
* Many application scenarios need to correlate network-wide data * Many application scenarios need to correlate network-wide data
from multiple sources (i.e., from distributed network devices, from multiple sources (i.e., from distributed network devices,
different components of a network device, or different network different components of a network device, or different network
planes). A piecemeal solution often lacks the capability to planes). A piecemeal solution often lacks the capability to
consolidate the data from multiple sources. The composition of a consolidate the data from multiple sources. The composition of a
complete solution, as partly proposed by Autonomic Resource complete solution, as partly proposed by Autonomic Resource
Control Architecture(ARCA) Control Architecture (ARCA) [NMRG-ANTICIPATED-ADAPTATION], will be
[I-D.pedro-nmrg-anticipated-adaptation], will be empowered and empowered and guided by a comprehensive framework.
guided by a comprehensive framework.
* Some conventional OAM techniques (e.g., CLI and Syslog) lack a * Some conventional OAM techniques (e.g., CLI and Syslog) lack a
formal data model. The unstructured data hinder the tool formal data model. The unstructured data hinder the tool
automation and application extensibility. Standardized data automation and application extensibility. Standardized data
models are essential to support the programmable networks. models are essential to support the programmable networks.
* Although some conventional OAM techniques support data push (e.g., * Although some conventional OAM techniques support data push (e.g.,
SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the
pushed data are limited to only predefined management plane pushed data are limited to only predefined management plane
warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow).
Network operators require the data with arbitrary source, Network operators require the data with arbitrary source,
granularity, and precision which are beyond the capability of the granularity, and precision, which is beyond the capability of the
existing techniques. existing techniques.
* The conventional passive measurement techniques can either consume * Conventional passive measurement techniques can either consume
excessive network resources and produce excessive redundant data, excessive network resources and produce excessive redundant data
or lead to inaccurate results; on the other hand, the conventional or lead to inaccurate results; on the other hand, conventional
active measurement techniques can interfere with the user traffic active measurement techniques can interfere with the user traffic,
and their results are indirect. Techniques that can collect and their results are indirect. Techniques that can collect
direct and on-demand data from user traffic are more favorable. direct and on-demand data from user traffic are more favorable.
These challenges were addressed by newer standards and techniques These challenges were addressed by newer standards and techniques
(e.g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push) (e.g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push),
and more are emerging. These standards and techniques need to be and more are emerging. These standards and techniques need to be
recognized and accommodated in a new framework. recognized and accommodated in a new framework.
2.4. Network Telemetry 2.4. Network Telemetry
Network telemetry has emerged as a mainstream technical term to refer Network telemetry has emerged as a mainstream technical term to refer
to the network data collection and consumption techniques. Several to the network data collection and consumption techniques. Several
network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and
gRPC [grpc]) have been widely deployed. Network telemetry allows gRPC [grpc]) have been widely deployed. Network telemetry allows
separate entities to acquire data from network devices so that data separate entities to acquire data from network devices so that data
can be visualized and analyzed to support network monitoring and can be visualized and analyzed to support network monitoring and
operation. Network telemetry covers the conventional network OAM and operation. Network telemetry covers the conventional network OAM and
has a wider scope. For instance, it is expected that network has a wider scope. For instance, it is expected that network
telemetry can provide the necessary network insight for autonomous telemetry can provide the necessary network insight for autonomous
networks and address the shortcomings of conventional OAM techniques. networks and address the shortcomings of conventional OAM techniques.
Network telemetry usually assumes machines as data consumers rather Network telemetry usually assumes machines as data consumers rather
than human operators. Hence, the network telemetry can directly than human operators. Hence, network telemetry can directly trigger
trigger the automated network operation, while in contrast some the automated network operation, while in contrast, some conventional
conventional OAM tools were designed and used to help human operators OAM tools were designed and used to help human operators to monitor
to monitor and diagnose the networks and guide manual network and diagnose the networks and guide manual network operations. Such
operations. Such a proposition leads to very different techniques. a proposition leads to very different techniques.
Although new network telemetry techniques are emerging and subject to Although new network telemetry techniques are emerging and subject to
continuous evolution, several characteristics of network telemetry continuous evolution, several characteristics of network telemetry
have been well accepted. Note that network telemetry is intended to have been well accepted. Note that network telemetry is intended to
be an umbrella term covering a wide spectrum of techniques, so the be an umbrella term covering a wide spectrum of techniques, so the
following characteristics are not expected to be held by every following characteristics are not expected to be held by every
specific technique. specific technique.
* Push and Streaming: Instead of polling data from network devices, * Push and Streaming: Instead of polling data from network devices,
telemetry collectors subscribe to streaming data pushed from data telemetry collectors subscribe to streaming data pushed from data
sources in network devices. sources in network devices.
* Volume and Velocity: The telemetry data is intended to be consumed * Volume and Velocity: Telemetry data is intended to be consumed by
by machines rather than by human being. Therefore, the data machines rather than by human beings. Therefore, the data volume
volume can be huge and the processing is optimized for the needs can be huge, and the processing is optimized for the needs of
of automation in realtime. automation in real time.
* Normalization and Unification: Telemetry aims to address the * Normalization and Unification: Telemetry aims to address the
overall network automation needs. Efforts are made to normalize overall network automation needs. Efforts are made to normalize
the data representation and unify the protocols, so as to simplify the data representation and unify the protocols, so as to simplify
data analysis and provide integrated analysis across heterogeneous data analysis and provide integrated analysis across heterogeneous
devices and data sources across a network. devices and data sources across a network.
* Model-based: The telemetry data is modeled in advance which allows * Model-Based: Telemetry data is modeled in advance, which allows
applications to configure and consume data with ease. applications to configure and consume data with ease.
* Data Fusion: The data for a single application can come from * Data Fusion: The data for a single application can come from
multiple data sources (e.g., cross-domain, cross-device, and multiple data sources (e.g., cross-domain, cross-device, and
cross-layer) based on common naming/ID and needs to be correlated cross-layer) that are based on a common name/ID and need to be
to take effect. correlated to take effect.
* Dynamic and Interactive: Since network telemetry is meant to be * Dynamic and Interactive: Since network telemetry is meant to be
used in a closed control loop for network automation, it needs to used in a closed control loop for network automation, it needs to
run continuously and adapt to the dynamic and interactive queries run continuously and adapt to the dynamic and interactive queries
from the network operation controller. from the network operation controller.
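
The "Push and Streaming" characteristic in the list above can be illustrated with a small Python sketch. It assumes an in-process callback model; the DataSource class and the counter name if_in_octets are hypothetical and stand in for a real subscription protocol.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]


class DataSource:
    """Hypothetical data source inside a network device."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Sample], None]] = []

    def subscribe(self, callback: Callable[[Sample], None]) -> None:
        """Register a collector callback; the collector never polls."""
        self._subscribers.append(callback)

    def emit(self, sample: Sample) -> None:
        """Push a new sample to every subscriber as soon as it exists."""
        for callback in self._subscribers:
            callback(sample)


# A collector subscribes once and then receives pushed samples.
received: List[Sample] = []
source = DataSource()
source.subscribe(received.append)
source.emit({"if_in_octets": 1024.0})
source.emit({"if_in_octets": 2048.0})
```

The collector's only action is the initial subscription; every later sample arrives without a request.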
In addition, an ideal network telemetry solution may also have the In addition, an ideal network telemetry solution may also have the
following features or properties: following features or properties:
* In-Network Customization: The data that is generated can be * In-Network Customization: The data that is generated can be
customized in network at run-time to cater to the specific need of customized in network at runtime to cater to the specific need of
applications. This needs the support of a programmable data plane applications. This needs the support of a programmable data
which allows probes with custom functions to be deployed at plane, which allows probes with custom functions to be deployed at
flexible locations. flexible locations.
* In-Network Data Aggregation and Correlation: Network devices and * In-Network Data Aggregation and Correlation: Network devices and
aggregation points can work out which events and what data needs aggregation points can work out which events and what data needs
to be stored, reported, or discarded thus reducing the load on the to be stored, reported, or discarded, thus reducing the load on
central collection and processing points while still ensuring that the central collection and processing points while still ensuring
the right information is ready to be processed in a timely way. that the right information is ready to be processed in a timely
way.
* In-Network Processing: Sometimes it is not necessary or feasible * In-Network Processing: Sometimes it is not necessary or feasible
to gather all information to a central point to be processed and to gather all information to a central point to be processed and
acted upon. It is possible for the data processing to be done in acted upon. It is possible for the data processing to be done in
network, allowing reactive actions to be taken locally. network, allowing reactive actions to be taken locally.
* Direct Data Plane Export: The data originated from the data plane * Direct Data Plane Export: The data originated from data plane
forwarding chips can be directly exported to the data consumer for forwarding chips can be directly exported to the data consumer for
efficiency, especially when the data bandwidth is large and the efficiency, especially when the data bandwidth is large and real-
real-time processing is required. time processing is required.
* In-band Data Collection: In addition to the passive and active * In-Band Data Collection: In addition to the passive and active
data collection approaches, the new hybrid approach allows data to data collection approaches, the new hybrid approach allows data to
be collected directly for any target flow on its entire forwarding be collected directly for any target flow on its entire forwarding
path [I-D.song-opsawg-ifit-framework]. path [OPSAWG-IFIT-FRAMEWORK].
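
The in-network aggregation property listed above (deciding what is reported and what is discarded) might look like the following Python sketch. The function name aggregate, the threshold rule, and the summary fields are assumptions made for illustration only.

```python
from statistics import mean
from typing import Iterable, Optional


def aggregate(samples: Iterable[float], threshold: float) -> Optional[dict]:
    """Summarize raw samples locally and report only noteworthy windows.

    Returns None when the window can be discarded, so nothing is sent
    to the central collection point.
    """
    values = list(samples)
    if not values or max(values) < threshold:
        return None  # discard: no value reached the reporting threshold
    # Report a compact summary instead of every raw sample.
    return {"count": len(values), "mean": mean(values), "max": max(values)}
```

A device applying such a rule sends one small summary per window at most, which reduces the load on the central processing point.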
It is worth noting that a network telemetry system should not be It is worth noting that a network telemetry system should not be
intrusive to normal network operations by avoiding the pitfall of the intrusive to normal network operations by avoiding the pitfall of the
"observer effect". That is, it should not change the network "observer effect". That is, it should not change the network
behavior and affect the forwarding performance. Moreover, high- behavior and affect the forwarding performance. Moreover, high-
volume telemetry traffic may cause network congestion unless proper volume telemetry traffic may cause network congestion unless proper
isolation or traffic engineering techniques are in place, or isolation or traffic engineering techniques are in place, or
congestion control mechanisms ensure that telemetry traffic backs off congestion control mechanisms ensure that telemetry traffic backs off
if it exceeds the network capacity. [RFC8084] and [RFC8085] are if it exceeds the network capacity. [RFC8084] and [RFC8085] are
relevant Best Current Practices (BCP) in this space. relevant Best Current Practices (BCPs) in this space.
Although in many cases a system for network telemetry involves a Although in many cases a system for network telemetry involves a
remote data collecting and consuming entity, it is important to remote data collecting and consuming entity, it is important to
understand that there are no inherent assumptions about how a system understand that there are no inherent assumptions about how a system
should be architected. While a network architecture with centralized should be architected. While a network architecture with a
controller (e.g., SDN) seems a natural fit for network telemetry, centralized controller (e.g., SDN) seems to be a natural fit for
network telemetry can work in distributed fashions as well. For network telemetry, network telemetry can work in distributed fashions
example, telemetry data producers and consumers can have a peer-to- as well. For example, telemetry data producers and consumers can
peer relationship, in which a network node can be the direct consumer have a peer-to-peer relationship, in which a network node can be the
of telemetry data from other nodes. direct consumer of telemetry data from other nodes.
2.5. The Necessity of a Network Telemetry Framework 2.5. The Necessity of a Network Telemetry Framework
Network data analytics (e.g., machine learning) is applied for Network data analytics (e.g., machine learning) is applied for
network operation automation, relying on abundant and coherent data network operation automation, relying on abundant and coherent data
from networks. Data acquisition that is limited to a single source from networks. Data acquisition that is limited to a single source
and static in nature will in many cases not be sufficient to meet an and static in nature will in many cases not be sufficient to meet an
application's telemetry data needs. As a result, multiple data application's telemetry data needs. As a result, multiple data
sources, involving a variety of techniques and standards, will need sources, involving a variety of techniques and standards, will need
to be integrated. It is desirable to have a framework that to be integrated. It is desirable to have a framework that
classifies and organizes different telemetry data source and types, classifies and organizes different telemetry data sources and types,
defines different components of a network telemetry system and their defines different components of a network telemetry system and their
interactions, and helps coordinate and integrate multiple telemetry interactions, and helps coordinate and integrate multiple telemetry
approaches across layers. This allows flexible combinations of data approaches across layers. This allows flexible combinations of data
for different applications, while normalizing and simplifying for different applications, while normalizing and simplifying
interfaces. In detail, such a framework would benefit the interfaces. In detail, such a framework would benefit the
development of network operation applications for the following development of network operation applications for the following
reasons: reasons:
* Future networks, autonomous or otherwise, depend on holistic and * Future networks, autonomous or otherwise, depend on holistic and
comprehensive network visibility. The use cases and applications comprehensive network visibility. Use cases and applications are
are better to be supported uniformly and coherently using an better when supported uniformly and coherently using an
integrated, converged mechanism and common telemetry data integrated, converged mechanism and common telemetry data
representations wherever feasible. Therefore, the protocols and representations wherever feasible. Therefore, the protocols and
mechanisms should be consolidated into a minimum yet comprehensive mechanisms should be consolidated into a minimum yet comprehensive
set. A telemetry framework can help to normalize the technique set. A telemetry framework can help to normalize the technique
developments. developments.
* Network visibility presents multiple viewpoints. For example, the * Network visibility presents multiple viewpoints. For example, the
device viewpoint takes the network infrastructure as the device viewpoint takes the network infrastructure as the
monitoring object from which the network topology and device monitoring object from which the network topology and device
status can be acquired; the traffic viewpoint takes the flows or status can be acquired, and the traffic viewpoint takes the flows
packets as the monitoring object from which the traffic quality or packets as the monitoring object from which the traffic quality
and path can be acquired. An application may need to switch its and path can be acquired. An application may need to switch its
viewpoint during operation. It may also need to correlate a viewpoint during operation. It may also need to correlate a
service and its impact on user experience to acquire the service and its impact on user experience (UE) to acquire the
comprehensive information. comprehensive information.
* Applications require network telemetry to be elastic in order to * Applications require network telemetry to be elastic in order to
make efficient use of network resources and reduce the impact of make efficient use of network resources and reduce the impact of
processing related to network telemetry on network performance. processing related to network telemetry on network performance.
For example, routine network monitoring should cover the entire For example, routine network monitoring should cover the entire
network with a low data sampling rate. Only when issues arise or network with a low data sampling rate. Only when issues arise or
critical trends emerge should telemetry data sources be modified critical trends emerge should telemetry data sources be modified
and telemetry data rates boosted as needed. and telemetry data rates be boosted as needed.
* Efficient data aggregation is critical for applications to reduce * Efficient data aggregation is critical for applications to reduce
the overall quantity of data and improve the accuracy of analysis. the overall quantity of data and improve the accuracy of analysis.
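
The elasticity requirement in the list above (a low sampling rate for routine monitoring, boosted only when issues arise) can be sketched in Python as follows. The anomaly score, the trigger value, and both rates are hypothetical parameters, not values taken from this document.

```python
def sampling_rate(anomaly_score: float,
                  base_rate: float = 0.001,
                  boosted_rate: float = 0.1,
                  trigger: float = 0.8) -> float:
    """Return the fraction of packets to sample.

    anomaly_score is assumed to lie in [0, 1]; a score at or above
    trigger means an issue or critical trend has been detected.
    """
    if not 0.0 <= anomaly_score <= 1.0:
        raise ValueError("anomaly_score must be within [0, 1]")
    return boosted_rate if anomaly_score >= trigger else base_rate
```

A real system would likely adjust the rate gradually and per data source; the two-level rule here only shows the principle.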
A telemetry framework collects together all the telemetry-related A telemetry framework collects all the telemetry-related works from
works from different sources and working groups within IETF. This different sources and working groups within the IETF. This makes it
makes it possible to assemble a comprehensive network telemetry possible to assemble a comprehensive network telemetry system and to
system and to avoid repetitious or redundant work. The framework avoid repetitious or redundant work. The framework should cover the
should cover the concepts and components from the standardization concepts and components from the standardization perspective. This
perspective. This document describes the modules which make up a document describes the modules that make up a network telemetry
network telemetry framework and decomposes the telemetry system into framework and decomposes the telemetry system into a set of distinct
a set of distinct components that existing and future work can easily components that existing and future work can easily map to.
map to.
3. Network Telemetry Framework 3. Network Telemetry Framework
The top level network telemetry framework partitions the network The top-level network telemetry framework partitions the network
telemetry into four modules based on the telemetry data object source telemetry into four modules based on the telemetry data object source
and represents their relationship. Once the network operation and represents their relationship. Once the network operation
applications acquire the data from these modules, they can apply data applications acquire the data from these modules, they can apply data
analytics and take actions. At the next level, the framework analytics and take actions. At the next level, the framework
decomposes each module into separate components. Each of the modules decomposes each module into separate components. Each of these
follows the same underlying structure, with one component dedicated modules follows the same underlying structure, with one component
to the configuration of data subscriptions and data sources, a second dedicated to the configuration of data subscriptions and data
component dedicated to encoding and exporting data, and a third sources, a second component dedicated to encoding and exporting data,
component instrumenting the generation of telemetry related to the and a third component instrumenting the generation of telemetry
underlying resources. Throughout the framework, the same set of related to the underlying resources. Throughout the framework, the
abstract data acquiring mechanisms and data types (Section 3.3) are same set of abstract data-acquiring mechanisms and data types
applied. The two-level architecture with the uniform data (Section 3.3) are applied. The two-level architecture with the
abstraction helps accurately pinpoint a protocol or technique to its uniform data abstraction helps accurately pinpoint a protocol or
position in a network telemetry system or disaggregate a network technique to its position in a network telemetry system or
telemetry system into manageable parts. disaggregates a network telemetry system into manageable parts.
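The two-level structure described above can be sketched as a small data model. This is only an illustration, not part of the framework: the names Module, Component, and positions() are hypothetical. The module names are those listed in Section 3.1, and the three component roles are taken from the paragraph above (Section 3.2 refines them further).

```python
from enum import Enum
from itertools import product


class Module(Enum):
    """Top-level modules, named as in Section 3.1."""
    MANAGEMENT_PLANE = "management plane telemetry"
    CONTROL_PLANE = "control plane telemetry"
    FORWARDING_PLANE = "forwarding plane telemetry"
    EXTERNAL_DATA = "external data and event telemetry"


class Component(Enum):
    """Per-module roles named in the overview paragraph (hypothetical labels)."""
    CONFIGURATION = "configuration of data subscriptions and data sources"
    ENCODING_EXPORT = "encoding and exporting data"
    GENERATION = "instrumenting the generation of telemetry"


def positions():
    """Enumerate every (module, component) slot a technique could map to."""
    return list(product(Module, Component))


if __name__ == "__main__":
    for module, component in positions():
        print(f"{module.value}: {component.value}")
```

Under these assumptions, a protocol or technique is "pinpointed" by naming one such (module, component) pair.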
3.1. Top Level Modules 3.1. Top-Level Modules
Telemetry can be applied on the forwarding plane, the control plane, Telemetry can be applied on the forwarding plane, control plane, and
and the management plane in a network, as well as other sources out management plane in a network, as well as on other sources out of the
of the network, as shown in Figure 1. Therefore, we categorize the network, as shown in Figure 1. Therefore, we categorize the network
network telemetry into four distinct modules (management plane, telemetry into four distinct modules (management plane, control
control plane, forwarding plane, and external data and event plane, forwarding plane, and external data and event telemetry) with
telemetry) with each having its own interface to Network Operation each having its own interface to network operation applications.
Applications.
+------------------------------+ +------------------------------+
| | | |
| Network Operation |<-------+ | Network Operation |<-------+
| Applications | | | Applications | |
| | | | | |
+------------------------------+ | +------------------------------+ |
^ ^ ^ | ^ ^ ^ |
| | | | | | | |
V V | V V V | V
skipping to change at page 15, line 39 skipping to change at line 709
| Management | ^ V | | Telemetry | | Management | ^ V | | Telemetry |
| Plane +-------|-------+ | | | Plane +-------|-------+ | |
| Telemetry | V | +-----------+ | Telemetry | V | +-----------+
| | Forwarding | | | Forwarding |
| | Plane | | | Plane |
| <---> | | <---> |
| | Telemetry | | | Telemetry |
| | | | | |
+--------------+---------------+ +--------------+---------------+
Figure 1: Modules in Layer Category of NTF Figure 1: Modules in Layer Category of the Network Telemetry
Framework
The rationale of this partition lies in the different telemetry data The rationale of this partition lies in the different telemetry data
objects which result in different data source and export locations. objects that result in different data sources and export locations.
Such differences have profound implications on in-network data Such differences have profound implications on in-network data
programming and processing capability, data encoding and transport programming and processing capability, data encoding and the
protocol, and required data bandwidth and latency. Data can be sent transport protocol, and required data bandwidth and latency. Data
directly, or proxied via the control and management planes. There can be sent directly or proxied via the control and management
are advantages/disadvantages to both approaches. planes. There are advantages/disadvantages to both approaches.
Note that in some cases the network controller itself may be the Note that in some cases, the network controller itself may be the
source of telemetry data that is unique to it or derived from the source of telemetry data that is unique to it or derived from the
telemetry data collected from the network elements. Some of the telemetry data collected from the network elements. Some of the
principles and taxonomy specific to the control plane and management principles and taxonomy specific to the control plane and management
plane telemetry could also be applied to the controller when it is plane telemetry could also be applied to the controller when it is
required to provide the telemetry data to Network Operation required to provide the telemetry data to network operation
Applications hosted outside. The scope of the document is focused on applications hosted outside. The scope of this document is focused
the network elements telemetry and further details related to on the network elements telemetry, and further details related to
controllers are thus out of scope. controllers are thus out of scope.
We summarize the major differences of the four modules in the We summarize the major differences of the four modules in Table 1.
following table. They are compared from six angles: They are compared from six angles:
* Data Object * Data Object
* Data Export Location * Data Export Location
* Data Model * Data Model
* Data Encoding * Data Encoding
* Telemetry Application Protocol * Telemetry Application Protocol
skipping to change at page 16, line 34 skipping to change at line 754
Data Object is the target and source of each module. Because the Data Object is the target and source of each module. Because the
data source varies, the location where data is most conveniently data source varies, the location where data is most conveniently
exported also varies. For example, forwarding plane data mainly exported also varies. For example, forwarding plane data mainly
originates as data exported from the forwarding Application-Specific originates as data exported from the forwarding Application-Specific
Integrated Circuits (ASICs), while control plane data mainly Integrated Circuits (ASICs), while control plane data mainly
originates from the protocol daemons running on the control CPU(s). originates from the protocol daemons running on the control CPU(s).
For convenience and efficiency, it is preferred to export the data For convenience and efficiency, it is preferred to export the data
off the device from locations near the source. Because the locations off the device from locations near the source. Because the locations
that can export data have different capabilities, different choices that can export data have different capabilities, different choices
of data model, encoding, and transport method are made to balance the of data models, encoding, and transport methods are made to balance
performance and cost. For example, the forwarding chip has high the performance and cost. For example, the forwarding chip has high
throughput but limited capacity for processing complex data and throughput but limited capacity for processing complex data and
maintaining state, while the main control CPU is capable of complex maintaining state, while the main control CPU is capable of complex
data and state processing, but has limited bandwidth for high data and state processing but has limited bandwidth for high
throughput data. As a result, the suitable telemetry protocol for throughput data. As a result, the suitable telemetry protocol for
each module can be different. Some representative techniques are each module can be different. Some representative techniques are
shown in the corresponding table blocks to highlight the technical shown in the corresponding table blocks to highlight the technical
diversity of these modules. Note that the selected techniques just diversity of these modules. Note that the selected techniques just
reflect the de facto state of the art and are by no means exhaustive reflect the de facto state of the art and are by no means exhaustive
(e.g., IPFIX can also be implemented over TCP and SCTP, but that is (e.g., IPFIX can also be implemented over TCP and SCTP, but that is
not recommended for forwarding plane). The key point is that one not recommended for the forwarding plane). The key point is that one
cannot expect to use a universal protocol to cover all the network cannot expect to use a universal protocol to cover all the network
telemetry requirements. telemetry requirements.
+-----------+-------------+-------------+--------------+----------+ +=============+===============+==========+==========+===============+
| Module |Management |Control |Forwarding |External | |Module |Management |Control |Forwarding|External Data |
| |Plane |Plane |Plane |Data | | |Plane |Plane |Plane | |
+-----------+-------------+-------------+--------------+----------+ +=============+===============+==========+==========+===============+
|Object |config. & |control |flow & packet |terminal, | |Object |configuration |control |flow and |terminal, |
| |operation |protocol & |QoS, traffic |social & | | |and operation |protocol |packet |social, and |
| |state |signaling, |stat., buffer |environ- | | |state |and |QoS, |environmental |
| | |RIB |& queue stat.,|mental | | | |signaling,|traffic | |
| | | |ACL, FIB | | | | |RIB |stat., | |
+-----------+-------------+-------------+--------------+----------+ | | | |buffer and| |
|Export |main control |main control |fwding chip |various | | | | |queue | |
|Location |CPU |CPU, |or linecard | | | | | |stat., | |
| | |linecard CPU |CPU; main | | | | | |FIB, | |
| | |or forwarding|control CPU | | | | | |Access | |
| | |chip |unlikely | | | | | |Control | |
+-----------+-------------+-------------+--------------+----------+ | | | |List (ACL)| |
|Data |YANG, MIB, |YANG, |YANG |YANG, | +-------------+---------------+----------+----------+---------------+
|Model |syslog |custom |custom, |custom | |Export |main control |main |forwarding|various |
+-----------+-------------+-------------+--------------+----------+ |Location |CPU |control |chip or | |
|Data |GPB, JSON, |GPB, JSON, |plain text |GPB, JSON | | | |CPU, |linecard | |
|Encoding |XML |XML, | |XML, plain| | | |linecard |CPU; main | |
| | |plain text | |text | | | |CPU, or |control | |
+-----------+-------------+-------------+--------------+----------+ | | |forwarding|CPU | |
|Application|gRPC,NETCONF,|gRPC,NETCONF,|IPFIX, traffic|gRPC | | | |chip |unlikely | |
|Protocol |RESTCONF |IPFIX,traffic|mirroring, | | +-------------+---------------+----------+----------+---------------+
| | |mirroring |gRPC, NETFLOW | | |Data Model |YANG, MIB, |YANG, |YANG, |YANG, custom |
+-----------+-------------+-------------+--------------+----------+ | |syslog |custom |custom | |
|Data |HTTP(S), TCP |HTTP(S), TCP,|UDP |HTTP(S), | +-------------+---------------+----------+----------+---------------+
|Transport | |UDP | |TCP, UDP | |Data Encoding|GPB, JSON, XML |GPB, JSON,|plain text|GPB, JSON, XML,|
+-----------+-------------+-------------+--------------+----------+ | | |XML, plain| |plain text |
| | |text | | |
+-------------+---------------+----------+----------+---------------+
|Application |gRPC, NETCONF, |gRPC, |IPFIX, |gRPC |
|Protocol |RESTCONF |NETCONF, |traffic | |
| | |IPFIX, |mirroring,| |
| | |traffic |gRPC, | |
| | |mirroring |NETFLOW | |
+-------------+---------------+----------+----------+---------------+
|Data |HTTP(S), TCP |HTTP(S), |UDP |HTTP(S), TCP, |
|Transport | |TCP, UDP | |UDP |
+-------------+---------------+----------+----------+---------------+
Figure 2: Comparison of the Data Object Modules Table 1: Comparison of Data Object Modules
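As a reading aid, the comparison can be held in a plain lookup structure. The sketch below transcribes the representative techniques from the right-hand (published) table. The dictionary layout, the key names, and the helper modules_using() are assumptions made for illustration, and the entries are no more exhaustive than the table itself.

```python
# Representative techniques per module, transcribed from Table 1.
TABLE_1 = {
    "management plane": {
        "export_location": ["main control CPU"],
        "data_model": ["YANG", "MIB", "syslog"],
        "data_encoding": ["GPB", "JSON", "XML"],
        "application_protocol": ["gRPC", "NETCONF", "RESTCONF"],
        "data_transport": ["HTTP(S)", "TCP"],
    },
    "control plane": {
        "export_location": ["main control CPU", "linecard CPU",
                            "forwarding chip"],
        "data_model": ["YANG", "custom"],
        "data_encoding": ["GPB", "JSON", "XML", "plain text"],
        "application_protocol": ["gRPC", "NETCONF", "IPFIX",
                                 "traffic mirroring"],
        "data_transport": ["HTTP(S)", "TCP", "UDP"],
    },
    "forwarding plane": {
        "export_location": ["forwarding chip", "linecard CPU"],
        "data_model": ["YANG", "custom"],
        "data_encoding": ["plain text"],
        "application_protocol": ["IPFIX", "traffic mirroring", "gRPC",
                                 "NETFLOW"],
        "data_transport": ["UDP"],
    },
    "external data": {
        "export_location": ["various"],
        "data_model": ["YANG", "custom"],
        "data_encoding": ["GPB", "JSON", "XML", "plain text"],
        "application_protocol": ["gRPC"],
        "data_transport": ["HTTP(S)", "TCP", "UDP"],
    },
}


def modules_using(attribute, value):
    """Return the modules whose row lists `value` under `attribute`."""
    return [name for name, row in TABLE_1.items() if value in row[attribute]]


if __name__ == "__main__":
    print(modules_using("application_protocol", "gRPC"))
    print(modules_using("data_transport", "UDP"))
```

A query such as modules_using("application_protocol", "gRPC") returns all four modules, while no single data encoding or transport appears in every row, which is consistent with the point that no universal protocol covers all telemetry requirements.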
Note that the interaction with the applications that consume network Note that the interaction with the applications that consume network
telemetry data can be indirect. Some in-device data transfer is telemetry data can be indirect. Some in-device data transfer is
possible. For example, in the management plane telemetry, the possible. For example, in the management plane telemetry, the
management plane will need to acquire data from the data plane. Some management plane will need to acquire data from the data plane. Some
operational states can only be derived from data plane data sources operational states can only be derived from data plane data sources
such as the interface status and statistics. As another example, such as the interface status and statistics. As another example,
obtaining control plane telemetry data may require the ability to obtaining control plane telemetry data may require the ability to
access the Forwarding Information Base (FIB) of the data plane. access the Forwarding Information Base (FIB) of the data plane.
skipping to change at page 18, line 13 skipping to change at line 835
the control plane telemetry. the control plane telemetry.
The requirements and challenges for each module are summarized as The requirements and challenges for each module are summarized as
follows (note that the requirements may pertain across all telemetry follows (note that the requirements may pertain across all telemetry
modules; however, we emphasize those that are most pronounced for a modules; however, we emphasize those that are most pronounced for a
particular plane). particular plane).
3.1.1. Management Plane Telemetry 3.1.1. Management Plane Telemetry
The management plane of network elements interacts with the Network The management plane of network elements interacts with the Network
Management System (NMS), and provides information such as performance Management System (NMS) and provides information such as performance
data, network logging data, network warning and defects data, and data, network logging data, network warning and defects data, and
network statistics and state data. The management plane includes network statistics and state data. The management plane includes
many protocols, including the classical SNMP and syslog. Regardless many protocols, including the classical SNMP and syslog. Regardless
of the protocol, management plane telemetry must address the following of the protocol, management plane telemetry must address the following
requirements: requirements:
* Convenient Data Subscription: An application should have the * Convenient Data Subscription: An application should have the
freedom to choose which data is exported (see section 4.3) and the freedom to choose which data is exported (see Section 3.3) and the
means and frequency of how that data is exported (e.g., on-change means and frequency of how that data is exported (e.g., on-change
or periodic subscription). or periodic subscription).
* Structured Data: For automatic network operation, machines will * Structured Data: For automatic network operation, machines will
replace human for network data comprehension. Data modeling replace humans for network data comprehension. Data modeling
languages, such as YANG, can efficiently describe structured data languages, such as YANG, can efficiently describe structured data
and normalize data encoding and transformation. and normalize data encoding and transformation.
* High Speed Data Transport: In order to keep up with the velocity * High-Speed Data Transport: In order to keep up with the velocity
of information, a data source needs to be able to send large of information, a data source needs to be able to send large
amounts of data at high frequency. Compact encoding formats or amounts of data at high frequency. Compact encoding formats or
data compression schemes are needed to reduce the quantity of data data compression schemes are needed to reduce the quantity of data
and improve the data transport efficiency. The subscription mode, and improve the data transport efficiency. The subscription mode,
by replacing the query mode, reduces the interactions between by replacing the query mode, reduces the interactions between
clients and servers and helps to improve the data source's clients and servers and helps to improve the data source's
efficiency. efficiency.
* Network Congestion Avoidance: The application must protect the * Network Congestion Avoidance: The application must protect the
network from congestion by congestion control mechanisms or at network from congestion with congestion control mechanisms or, at
least circuit breakers. [RFC8084] and [RFC8085] provide some minimum, with circuit breakers. [RFC8084] and [RFC8085] provide
solutions in this space. some solutions in this space.
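The difference between the on-change and periodic subscription modes mentioned in the first requirement can be sketched as follows. This is a minimal illustration under stated assumptions: the Subscription class, its field names, and the should_export() method are hypothetical and do not correspond to any particular protocol's API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Subscription:
    """Hypothetical subscription record kept by a data source."""
    path: str                    # which data the application asked for
    mode: str                    # "on-change" or "periodic"
    interval_s: float = 0.0      # only used in periodic mode
    last_value: Any = None
    last_sent: Optional[float] = None

    def should_export(self, value: Any, now: float) -> bool:
        """Decide whether `value`, sampled at time `now`, is exported."""
        if self.mode == "on-change":
            # Export the first sample, then only when the value differs.
            due = self.last_sent is None or value != self.last_value
        elif self.mode == "periodic":
            # Export the first sample, then once per interval.
            due = (self.last_sent is None
                   or now - self.last_sent >= self.interval_s)
        else:
            raise ValueError(f"unknown subscription mode: {self.mode}")
        if due:
            self.last_value = value
            self.last_sent = now
        return due


if __name__ == "__main__":
    sub = Subscription(path="interface/oper-status", mode="on-change")
    for t, status in enumerate(["up", "up", "down"]):
        print(t, status, sub.should_export(status, float(t)))
```

In either mode the client states its interest once and the data source pushes updates, which is how the subscription mode reduces client-server interactions compared with the query mode.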
3.1.2. Control Plane Telemetry 3.1.2. Control Plane Telemetry
The control plane telemetry refers to the health condition monitoring The control plane telemetry refers to the health condition monitoring
of different network control protocols at all layers of the protocol of different network control protocols at all layers of the protocol
stack. Keeping track of the operational status of these protocols is stack. Keeping track of the operational status of these protocols is
beneficial for detecting, localizing, and even predicting various beneficial for detecting, localizing, and even predicting various
network issues, as well as network optimization, in real-time and network issues, as well as for network optimization, in real time and
with fine granularity. Some particular challenges and issues faced with fine granularity. Some particular challenges and issues faced
by the control plane telemetry are as follows: by the control plane telemetry are as follows:
* One challenging problem for the control plane telemetry is how to * How to correlate the End-to-End (E2E) Key Performance Indicators
correlate the End-to-End (E2E) Key Performance Indicators (KPI) to (KPIs) to a specific layer's KPIs. For example, IPTV users may
a specific layer's KPIs. For example, IPTV users may describe describe their UE by the video smoothness and definition. Then in
their User Experience (UE) by the video smoothness and definition. case of an unusually poor UE KPI or a service disconnection, it is
Then in case of an unusually poor UE KPI or a service non-trivial to delimit and pinpoint the issue in the responsible
disconnection, it is non-trivial to delimit and pinpoint the issue protocol layer (e.g., the transport layer or the network layer),
in the responsible protocol layer (e.g., the Transport Layer or the responsible protocol (e.g., IS-IS or BGP at the network
the Network Layer), the responsible protocol (e.g., ISIS or BGP at layer), and finally the responsible device(s) with specific
the Network Layer), and finally the responsible device(s) with reasons.
specific reasons.
* Conventional OAM-based approaches for control plane KPI * Conventional OAM-based approaches for control plane KPI
measurement include Ping (L3), Traceroute (L3), Y.1731 [y1731] measurement, which include Ping (L3), Traceroute (L3), Y.1731
(L2), and so on. One common issue behind these methods is that [y1731] (L2), and so on. One common issue behind these methods is
they only measure the KPIs instead of reflecting the actual that they only measure the KPIs instead of reflecting the actual
running status of these protocols, making them less effective or running status of these protocols, making them less effective or
efficient for control plane troubleshooting and network efficient for control plane troubleshooting and network
optimization. optimization.
* An example of the control plane telemetry is the BGP monitoring * How more research is needed for the BGP monitoring protocol (BMP).
protocol (BMP). It is currently used for monitoring the BGP BMP is an example of the control plane telemetry; it is currently
routes and enables rich applications, such as BGP peer analysis, used for monitoring BGP routes and enables rich applications, such
AS analysis, prefix analysis, and security analysis. However, the as BGP peer analysis, Autonomous System (AS) analysis, prefix
monitoring of other layers, protocols and the cross-layer, cross- analysis, and security analysis. However, the monitoring of other
protocol KPI correlations are still in their infancy (e.g., IGP layers, protocols, and the cross-layer, cross-protocol KPI
monitoring is not as extensive as BMP), which require further correlations are still in their infancy (e.g., IGP monitoring is
research. not as extensive as BMP), which requires further research.
* The requirement and solutions for network congestion avoidance are Note that the requirement and solutions for network congestion
also applicable to the control plane telemetry. avoidance are also applicable to the control plane telemetry.
3.1.3. Forwarding Plane Telemetry 3.1.3. Forwarding Plane Telemetry
An effective forwarding plane telemetry system relies on the data An effective forwarding plane telemetry system relies on the data
that the network device can expose. The quality, quantity, and that the network device can expose. The quality, quantity, and
timeliness of data must meet some stringent requirements. This timeliness of data must meet some stringent requirements. This
raises some challenges to the network data plane devices where the raises some challenges for the network data plane devices where the
first-hand data originates. first-hand data originates.
* A data plane device's main function is user traffic processing and * A data plane device's main function is user traffic processing and
forwarding. While supporting network visibility is important, the forwarding. While supporting network visibility is important, the
telemetry is just an auxiliary function, and it should strive to telemetry is just an auxiliary function, and it should strive to
not impede normal traffic processing and forwarding (i.e., the not impede normal traffic processing and forwarding (i.e., the
forwarding behavior should not be altered and the trade-off forwarding behavior should not be altered, and the trade-off
between forwarding performance and telemetry should be well- between forwarding performance and telemetry should be well-
balanced). balanced).
* Network operation applications require end-to-end visibility * Network operation applications require end-to-end visibility
across various sources, which can result in a huge volume of data. across various sources, which can result in a huge volume of data.
However, the sheer quantity of data must not exhaust the network However, the sheer quantity of data must not exhaust the network
bandwidth, regardless of the data delivery approach (i.e., whether bandwidth, regardless of the data delivery approach (i.e., whether
through in-band or out-of-band channels). through in-band or out-of-band channels).
* The data plane devices must provide timely data with the minimum * The data plane devices must provide timely data with the minimum
possible delay. Long processing, transport, storage, and analysis possible delay. Long processing, transport, storage, and analysis
delay can impact the effectiveness of the control loop and even delay can impact the effectiveness of the control loop and even
render the data useless. render the data useless.
* The data should be structured and labeled, and easy for * The data should be structured, labeled, and easy for applications
applications to parse and consume. At the same time, the data to parse and consume. At the same time, the data types needed by
types needed by applications can vary significantly. The data applications can vary significantly. The data plane devices need
plane devices need to provide enough flexibility and to provide enough flexibility and programmability to support the
programmability to support the precise data provision for precise data provision for applications.
applications.
* The data plane telemetry should support incremental deployment and * The data plane telemetry should support incremental deployment and
work even though some devices are unaware of the system. work even though some devices are unaware of the system.
* The requirement and solutions for network congestion avoidance are * The requirement and solutions for network congestion avoidance are
also applicable to the forwarding plane telemetry. also applicable to the forwarding plane telemetry.
Although not specific to the forwarding plane, these challenges are Although not specific to the forwarding plane, these challenges are
more difficult to the forwarding plane because of the limited more difficult for the forwarding plane because of the limited
resource and flexibility. Data plane programmability is essential to resources and flexibility. Data plane programmability is essential
support network telemetry. Newer data plane forwarding chips are to support network telemetry. Newer data plane forwarding chips are
equipped with advanced telemetry features and provide flexibility to equipped with advanced telemetry features and provide flexibility to
support customized telemetry functions. support customized telemetry functions.
Technique Taxonomy: concerning about how one instruments the Technique Taxonomy: This pertains to how one instruments the
telemetry, there can be multiple possible dimensions to classify the telemetry; there can be multiple possible dimensions to classify the
forwarding plane telemetry techniques. forwarding plane telemetry techniques.
* Active, Passive, and Hybrid: This dimension concerns about the * Active, Passive, and Hybrid: This dimension pertains to the end-
end-to-end measurement. Active and passive methods (as well as to-end measurement. Active and passive methods (as well as the
the hybrid types) are well documented in [RFC7799]. Passive hybrid types) are well documented in [RFC7799]. Passive methods
methods include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic mirroring.
mirroring. These methods usually have low data coverage. The These methods usually have low data coverage. The bandwidth cost
bandwidth cost is very high in order to improve the data coverage. is very high in order to improve the data coverage. On the other
On the other hand, active methods include Ping, OWAMP [RFC4656], hand, active methods include Ping, the One-Way Active Measurement
TWAMP [RFC5357], STAMP [RFC8762], and Cisco's SLA Protocol Protocol (OWAMP) [RFC4656], the Two-Way Active Measurement
[RFC6812]. These methods are intrusive and only provide indirect Protocol (TWAMP) [RFC5357], the Simple Two-way Active Measurement
network measurements. Hybrid methods, including in-situ OAM Protocol (STAMP) [RFC8762], and Cisco's SLA Protocol [RFC6812].
[I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and These methods are intrusive and only provide indirect network
Multipoint Alternate Marking [RFC8889], provide a well-balanced measurements. Hybrid methods, including IOAM [RFC9197], Alternate
and more flexible approach. However, these methods are also more Marking (AM) [RFC8321], and Multipoint Alternate Marking
complex to implement. [RFC8889], provide a well-balanced and more flexible approach.
However, these methods are also more complex to implement.
* In-Band and Out-of-Band: Telemetry data carried in user packets * In-Band and Out-of-Band: Telemetry data carried in user packets
before being exported to a data collector is considered in-band before being exported to a data collector is considered in-band
(e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]). Telemetry data (e.g., IOAM [RFC9197]). Telemetry data that is directly exported
that is directly exported to a data collector without modifying to a data collector without modifying user packets is considered
user packets is considered out-of-band (e.g., the postcard-based out-of-band (e.g., the postcard-based approach described in
approach described in Appendix A.3.5). It is also possible to Appendix A.3.5). It is also possible to have hybrid methods,
have hybrid methods, where only the telemetry instruction or where only the telemetry instruction or partial data is carried by
partial data is carried by user packets (e.g., AM [RFC8321]). user packets (e.g., AM [RFC8321]).
* End-to-End and In-Network: End-to-End methods start from, and end * End-to-End and In-Network: End-to-end methods start from, and end
at, the network end hosts (e.g., Ping). In-Network methods work at, the network end hosts (e.g., Ping). In-network methods work
in networks and are transparent to end hosts. However, if needed, in networks and are transparent to end hosts. However, if needed,
In-Network methods can be easily extended into end hosts. in-network methods can be easily extended into end hosts.
* Data Subject: Depending on the telemetry objective, the methods * Data Subject: Depending on the telemetry objective, the methods
can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), can be flow based (e.g., IOAM [RFC9197]), path based (e.g.,
path-based (e.g., Traceroute), and node-based (e.g., IPFIX Traceroute), and node based (e.g., IPFIX [RFC7011]). The various
[RFC7011]). The various data objects can be packet, flow record, data objects can be packet, flow record, measurement, states, and
measurement, states, and signal. signal.
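The taxonomy dimensions above can be recorded as attributes of each technique. The sketch below is illustrative only: the Technique class and the select() helper are hypothetical, and an attribute is set only where the text above classifies the technique explicitly (None means the text does not say).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Technique:
    name: str
    measurement: Optional[str] = None  # "active", "passive", or "hybrid"
    channel: Optional[str] = None      # "in-band", "out-of-band", or "hybrid"
    scope: Optional[str] = None        # "end-to-end" or "in-network"
    subject: Optional[str] = None      # "flow", "path", or "node"


# Classifications taken from the examples given in the list above.
TECHNIQUES = [
    Technique("IOAM", measurement="hybrid", channel="in-band",
              subject="flow"),
    Technique("Alternate Marking", measurement="hybrid", channel="hybrid"),
    Technique("Multipoint Alternate Marking", measurement="hybrid"),
    Technique("Ping", measurement="active", scope="end-to-end"),
    Technique("OWAMP", measurement="active"),
    Technique("TWAMP", measurement="active"),
    Technique("STAMP", measurement="active"),
    Technique("IPFIX", measurement="passive", subject="node"),
    Technique("sFlow", measurement="passive"),
    Technique("traffic mirroring", measurement="passive"),
    Technique("Traceroute", subject="path"),
]


def select(**criteria):
    """Names of techniques whose attributes match all given criteria."""
    return [t.name for t in TECHNIQUES
            if all(getattr(t, key) == value
                   for key, value in criteria.items())]


if __name__ == "__main__":
    print(select(measurement="hybrid"))
    print(select(measurement="passive", subject="node"))
```

Because the dimensions are independent, a single technique can appear under several of them; IOAM, for example, is hybrid, in-band, and flow based in the text above.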
3.1.4. External Data Telemetry 3.1.4. External Data Telemetry
Events that occur outside the boundaries of the network system are Events that occur outside the boundaries of the network system are
another important source of network telemetry. Correlating both another important source of network telemetry. Correlating both
internal telemetry data and external events with the requirements of internal telemetry data and external events with the requirements of
network systems, as presented in network systems, as presented in [NMRG-ANTICIPATED-ADAPTATION],
[I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and provides a strategic and functional advantage to management
functional advantage to management operations. operations.
As with other sources of telemetry information, the data and events As with other sources of telemetry information, the data and events
must meet strict requirements, especially in terms of timeliness, must meet strict requirements, especially in terms of timeliness,
which is essential to properly incorporate external event information which is essential to properly incorporate external event information
into network management applications. The specific challenges are into network management applications. The specific challenges are
described as follows: described as follows:
* The role of the external event detector can be played by multiple * The role of the external event detector can be played by multiple
elements, including hardware (e.g., physical sensors, such as elements, including hardware (e.g., physical sensors, such as
seismometers) and software (e.g., Big Data sources that can seismometers) and software (e.g., big data sources that can
analyze streams of information, such as Twitter messages). Thus, analyze streams of information, such as Twitter messages). Thus,
the transmitted data must support different shapes but, at the the transmitted data must support different shapes but, at the
same time, follow a common but extensible schema. same time, follow a common but extensible schema.
* Since the main function of the external event detectors is to * Since the main function of the external event detectors is to
perform the notifications, their timeliness is assumed. However, perform the notifications, their timeliness is assumed. However,
once messages have been dispatched, they must be quickly collected once messages have been dispatched, they must be quickly collected
and inserted into the control plane with variable priority, which and inserted into the control plane with variable priority, which
is higher for important sources and events and lower for secondary is higher for important sources and events and lower for secondary
ones. ones.
* The schema used by external detectors must be easily adopted by * The schema used by external detectors must be easily adopted by
current and future devices and applications. Therefore, it must current and future devices and applications. Therefore, it must
be easily mapped to current data models, such as those defined in YANG. be easily mapped to current data models, such as those defined in YANG.
* As the communication with external entities outside the boundary * As the communication with external entities outside the boundary
of a provider network may be realized over the Internet, the risk of a provider network may be realized over the Internet, the risk
of congestion is even more relevant in this context and proper of congestion is even more relevant in this context and proper
counter-measures must be taken. Solutions such as network countermeasures must be taken. Solutions such as network
transport circuit breakers are needed as well. transport circuit breakers are needed as well.
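The collection requirements above (a common but extensible schema, plus priority-ordered insertion of dispatched messages) can be illustrated with a minimal sketch. The names `ExternalEvent` and `EventIntake`, the `extensions` field, and the convention that a lower number means higher priority are assumptions made for illustration; this document does not define them.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ExternalEvent:
    """Common envelope; source-specific detail goes in `extensions`."""
    source: str     # hypothetical detector name, e.g., "seismometer-7"
    kind: str       # e.g., "seismic" or "social-media"
    priority: int   # assumption: lower number = more important
    extensions: Dict[str, Any] = field(default_factory=dict)


class EventIntake:
    """Collects dispatched events and releases them by priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, ExternalEvent]] = []
        # The sequence number breaks ties, so events of equal priority
        # leave in arrival order and events are never compared directly.
        self._seq = itertools.count()

    def collect(self, event: ExternalEvent) -> None:
        heapq.heappush(self._heap, (event.priority, next(self._seq), event))

    def next_event(self) -> ExternalEvent:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

A heap keeps both collection and release at logarithmic cost, which is one plausible way to keep insertion quick when many secondary events are waiting.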
Organizing both internal and external telemetry information together Organizing both internal and external telemetry information together
will be key for the general exploitation of the management will be key for the general exploitation of the management
possibilities of current and future network systems, as reflected in possibilities of current and future network systems, as reflected in
the incorporation of cognitive capabilities to new hardware and the incorporation of cognitive capabilities to new hardware and
software (virtual) elements. software (virtual) elements.
3.2. Second Level Function Components 3.2. Second-Level Function Components
The telemetry module at each plane can be further partitioned into The telemetry module at each plane can be further partitioned into
five distinct conceptual components: five distinct conceptual components:
* Data Query, Analysis, and Storage: This component works at the * Data Query, Analysis, and Storage: This component works at the
network operation application block in Figure 1. It is normally a network operation application block in Figure 1. It is normally a
part of the network management system at the receiver side. On part of the network management system at the receiver side. On
the one hand, it is responsible for issuing data requirements. one hand, it is responsible for issuing data requirements. The
The data of interest can be modeled data through configuration or data of interest can be modeled data through configuration or
custom data through programming. The data requirements can be custom data through programming. The data requirements can be
queries for one-shot data or subscriptions for events or streaming queries for one-shot data or subscriptions for events or streaming
data. On the other hand, it receives, stores, and processes the data. On the other hand, it receives, stores, and processes the
returned data from network devices. Data analysis can be returned data from network devices. Data analysis can be
interactive to initiate further data queries. This component can interactive to initiate further data queries. This component can
reside in either network devices or remote controllers. It can be reside in either network devices or remote controllers. It can be
centralized and distributed, and involve one or more instances. centralized and distributed and involve one or more instances.
* Data Configuration and Subscription: This component manages data * Data Configuration and Subscription: This component manages data
queries on devices. It determines the protocol and channel for queries on devices. It determines the protocol and channel for
applications to acquire desired data. This component is also applications to acquire desired data. This component is also
responsible for configuring the desired data that might not be responsible for configuring the desired data that might not be
directly available from data sources. The subscription data can directly available from data sources. The subscription data can
be described by models, templates, or programs. be described by models, templates, or programs.
* Data Encoding and Export: This component determines how telemetry * Data Encoding and Export: This component determines how telemetry
data is delivered to the data analysis and storage component with data is delivered to the data analysis and storage component with
skipping to change at page 23, line 30 skipping to change at line 1075
vary due to the data export location. vary due to the data export location.
* Data Generation and Processing: The requested data needs to be * Data Generation and Processing: The requested data needs to be
captured, filtered, processed, and formatted in network devices captured, filtered, processed, and formatted in network devices
from raw data sources. This may involve in-network computing and from raw data sources. This may involve in-network computing and
processing on either the fast path or the slow path in network processing on either the fast path or the slow path in network
devices. devices.
* Data Object and Source: This component determines the monitoring * Data Object and Source: This component determines the monitoring
objects and original data sources provisioned in the device. A objects and original data sources provisioned in the device. A
data source usually just provides raw data which needs further data source usually just provides raw data that needs further
processing. Each data source can be considered a probe. Some processing. Each data source can be considered a probe. Some
data sources can be dynamically installed, while others will be data sources can be dynamically installed, while others will be
more static. more static.
+----------------------------------------+ +----------------------------------------+
+----------------------------------------+ | +----------------------------------------+ |
| | | | | |
| Data Query, Analysis, & Storage | | | Data Query, Analysis, & Storage | |
| | + | | +
+-------+++ -----------------------------+ +-------+++ -----------------------------+
||| ^^^ ||| ^^^
||| ||| ||| |||
||V ||| ||V |||
+--+V--------------------+++------------+ +--+V--------------------+++------------+
+-----V---------------------+------------+ | +-----V---------------------+------------+ |
+---------------------+-------+----------+ | | +---------------------+-------+----------+ | |
| Data Configuration | | | | | Data Configuration | | | |
| & Subscription | Data Encoding | | | | & Subscription | Data Encoding | | |
| (model, template, | & Export | | | | (model, template, | & Export | | |
| & program) | | | | | & program) | | | |
+---------------------+------------------| | | +---------------------+------------------| | |
| | | | | | | |
| Data Generation | | | | Data Generation | | |
| & Processing | | | | & Processing | | |
| | | | | | | |
+----------------------------------------| | | +----------------------------------------| | |
| | | | | | | |
| Data Object and Source | |-+ | Data Object and Source | |-+
| |-+ | |-+
+----------------------------------------+ +----------------------------------------+
Figure 3: Components in the Network Telemetry Framework Figure 2: Components in the Network Telemetry Framework
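As a rough illustration of how the five components hand data to one another, the following sketch chains them as plain functions. The source names, the fixed counter values, and the use of JSON for encoding are assumptions chosen for brevity and are not prescribed by the framework.

```python
import json
from typing import Callable, Dict, List

# Data Object and Source: raw counters exposed by a hypothetical device.
RAW_SOURCES: Dict[str, Callable[[], int]] = {
    "if0.rx_packets": lambda: 1200,
    "if0.tx_packets": lambda: 900,
}


def configure(names: List[str]) -> List[str]:
    """Data Configuration and Subscription: keep only known sources."""
    return [n for n in names if n in RAW_SOURCES]


def generate(names: List[str]) -> Dict[str, int]:
    """Data Generation and Processing: capture and format raw values."""
    return {n: RAW_SOURCES[n]() for n in names}


def export(record: Dict[str, int]) -> bytes:
    """Data Encoding and Export: JSON is an arbitrary choice here."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


class Collector:
    """Data Query, Analysis, and Storage at the receiver side."""

    def __init__(self) -> None:
        self.store: List[Dict[str, int]] = []

    def request(self, names: List[str]) -> Dict[str, int]:
        # Issue the data requirement, then receive and store the result.
        payload = export(generate(configure(names)))
        record = json.loads(payload.decode("utf-8"))
        self.store.append(record)
        return record
```

In a real deployment, each stage would typically sit behind a protocol boundary rather than a function call; the sketch only shows the direction of the data flow.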
3.3. Data Acquisition Mechanism and Type Abstraction 3.3. Data Acquisition Mechanism and Type Abstraction
Broadly speaking, network data can be acquired through subscription Broadly speaking, network data can be acquired through subscription
(push) and query (poll). A subscription is a contract between (push) and query (poll). A subscription is a contract between
publisher and subscriber. After initial setup, the subscribed data publisher and subscriber. After initial setup, the subscribed data
is automatically delivered to registered subscribers until the is automatically delivered to registered subscribers until the
subscription expires. There are two variations of subscription. The subscription expires. There are two variations of subscription. The
subscriptions can be either pre-defined, or the subscribers are subscriptions can be predefined, or the subscribers are allowed to
allowed to configure and tailor the published data to their specific configure and tailor the published data to their specific needs.
needs.
In contrast, queries are used when a client expects immediate and In contrast, queries are used when a client expects immediate and
one-off feedback from network devices. The queried data may be one-off feedback from network devices. The queried data may be
directly extracted from some specific data source, or synthesized and directly extracted from some specific data source or synthesized and
processed from raw data. Queries work well for interactive network processed from raw data. Queries work well for interactive network
telemetry applications. telemetry applications.
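The difference between the two acquisition modes can be sketched as follows. The `Publisher` class and its method names are hypothetical; expiry of a subscription is modeled simply as an explicit cancel call, which is an assumption rather than something this document specifies.

```python
from typing import Any, Callable, Dict, List

Callback = Callable[[str, Any], None]


class Publisher:
    """Toy data source supporting both query (poll) and subscription (push)."""

    def __init__(self) -> None:
        self._state: Dict[str, Any] = {}
        self._subscribers: Dict[str, List[Callback]] = {}

    def query(self, key: str) -> Any:
        """Poll: immediate, one-off read of the current value."""
        return self._state[key]

    def subscribe(self, key: str, callback: Callback) -> Callable[[], None]:
        """Push: register once, then receive every later update."""
        self._subscribers.setdefault(key, []).append(callback)

        def cancel() -> None:
            # Stands in for the subscription expiring.
            self._subscribers[key].remove(callback)

        return cancel

    def update(self, key: str, value: Any) -> None:
        """Change the value and deliver it to registered subscribers."""
        self._state[key] = value
        for callback in list(self._subscribers.get(key, [])):
            callback(key, value)
```

After `cancel()` is called, later updates remain visible through `query` but are no longer pushed to that subscriber.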
In general, data can be pulled (i.e., queried) whenever needed, but In general, data can be pulled (i.e., queried) whenever needed, but
in many cases, pushing the data (i.e., subscription) is more in many cases, pushing the data (i.e., subscription) is more
efficient, and can reduce the latency of a client detecting a change. efficient, and it can reduce the latency of a client detecting a
From the data consumer point of view, there are four types of data change. From the data consumer point of view, there are four types
from network devices that a telemetry data consumer can subscribe to or of data from network devices that a telemetry data consumer can
query: subscribe to or query:
* Simple Data: The data that are steadily available from some * Simple Data: Data that are steadily available from some datastore
datastore or static probes in network devices. or static probes in network devices.
* Derived Data: The data need to be synthesized or processed in * Derived Data: Data that need to be synthesized or processed in the
network from raw data from one or more network devices. The data network from raw data from one or more network devices. The data
processing function can be statically or dynamically loaded into processing function can be statically or dynamically loaded into
network devices. network devices.
* Event-triggered Data: The data are conditionally acquired based on * Event-triggered Data: Data that are conditionally acquired based
the occurrence of some events. An example of event-triggered data on the occurrence of some events. An example of event-triggered
could be an interface changing operational state between up and data could be an interface changing operational state between up
down. Such data can be actively pushed through subscription or and down. Such data can be actively pushed through subscription
passively polled through query. There are many ways to model or passively polled through query. There are many ways to model
events, including using Finite State Machine (FSM) or Event events, including using Finite State Machine (FSM) or Event
Condition Action (ECA) [I-D.wwx-netmod-event-yang]. Condition Action (ECA) [NETMOD-ECA-POLICY].
* Streaming Data: The data are continuously generated. It can be * Streaming Data: Data that are continuously generated. It can be a
time series or the dump of databases. For example, an interface time series or the dump of databases. For example, an interface
packet counter is exported every second. The streaming data packet counter is exported every second. The streaming data
reflect realtime network states and metrics and require large reflect real-time network states and metrics and require large
bandwidth and processing power. The streaming data are always bandwidth and processing power. The streaming data are always
actively pushed to the subscribers. actively pushed to the subscribers.
The above telemetry data types are not mutually exclusive. Rather, The above telemetry data types are not mutually exclusive. Rather,
they are often composite. Derived data is composed of simple data; they are often composite. Derived data is composed of simple data;
Event-triggered data can be simple or derived; streaming data can be event-triggered data can be simple or derived; and streaming data can
based on some recurring event. The relationships of these data types be based on some recurring event. The relationships of these data
are illustrated in Figure 4. types are illustrated in Figure 3.
+----------------------+ +-----------------+ +----------------------+ +-----------------+
| Event-triggered Data |<----+ Streaming Data | | Event-Triggered Data |<----+ Streaming Data |
+-------+---+----------+ +-----+---+-------+ +-------+---+----------+ +-----+---+-------+
| | | | | | | |
| | | | | | | |
| | +--------------+ | | | | +--------------+ | |
| +-->| Derived Data |<--+ | | +-->| Derived Data |<--+ |
| +------+-------+ | | +------+-------+ |
| | | | | |
| V | | V |
| +--------------+ | | +--------------+ |
+------>| Simple Data |<------+ +------>| Simple Data |<------+
+--------------+ +--------------+
Figure 4: Data Type Relationship Figure 3: Data Type Relationship
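The composition of the data types can be made concrete with a small sketch: simple data are counter samples, derived data are rates computed from pairs of samples, streaming data are those rates produced once per recurring sample, and event-triggered data are raised when a rate crosses a limit. The function names, the sample layout, and the threshold rule are illustrative assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, packet counter)


def derived_rate(prev: Sample, cur: Sample) -> float:
    """Derived data: packets per second from two simple samples."""
    (t0, c0), (t1, c1) = prev, cur
    return (c1 - c0) / (t1 - t0)


def stream_rates(samples: Iterable[Sample]) -> Iterator[float]:
    """Streaming data: one derived value per recurring sample."""
    it = iter(samples)
    prev = next(it, None)
    if prev is None:
        return
    for cur in it:
        yield derived_rate(prev, cur)
        prev = cur


def threshold_events(rates: Iterable[float], limit: float) -> List[int]:
    """Event-triggered data: indices where the rate rises above `limit`."""
    events: List[int] = []
    above = False
    for index, rate in enumerate(rates):
        if rate > limit and not above:
            events.append(index)
        above = rate > limit
    return events
```

Only the upward crossing is reported, so a rate that stays above the limit produces one event rather than one per sample.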
Subscription usually deals with event-triggered data and streaming Subscription usually deals with event-triggered data and streaming
data, and query usually deals with simple data and derived data. data, and query usually deals with simple data and derived data.
However, other combinations are also possible. Advanced network telemetry However, other combinations are also possible. Advanced network telemetry
techniques are designed mainly for event-triggered or streaming data techniques are designed mainly for event-triggered or streaming data
subscription, and derived data query. subscription and derived data query.
3.4. Mapping Existing Mechanisms into the Framework 3.4. Mapping Existing Mechanisms into the Framework
The following table shows how the existing mechanisms (mainly The following table shows how the existing mechanisms (mainly
published in the IETF, with an emphasis on the latest published in the IETF, with an emphasis on the latest
technologies) are positioned in the framework. Given the vast body technologies) are positioned in the framework. Given the vast body
of existing work, we cannot provide an exhaustive list, so the of existing work, we cannot provide an exhaustive list, so the
mechanisms in the table should be considered just examples. mechanisms in the table should be considered just examples.
Also, some comprehensive protocols and techniques may cover multiple Also, some comprehensive protocols and techniques may cover multiple
aspects or modules of the framework, so a name in a block only aspects or modules of the framework, so a name in a block only
emphasizes one particular characteristic of it. More details about emphasizes one particular characteristic of it. More details about
some listed mechanisms can be found in Appendix A. some listed mechanisms can be found in Appendix A.
+-------------+-----------------+---------------+--------------+ +===============+=================+================+============+
| | Management | Control | Forwarding | | | Management | Control Plane | Forwarding |
| | Plane | Plane | Plane | | | Plane | | Plane |
+-------------+-----------------+---------------+--------------+ +===============+=================+================+============+
| data config.| gNMI, NETCONF, | gNMI, NETCONF,| NETCONF, | | data | gNMI, NETCONF, | gNMI, NETCONF, | NETCONF, |
| & subscribe | RESTCONF, SNMP, | RESTCONF, | RESTCONF, | | configuration | RESTCONF, SNMP, | RESTCONF, | RESTCONF, |
| | YANG-Push | YANG-Push | YANG-Push | | and subscribe | YANG-Push | YANG-Push | YANG-Push |
+-------------+-----------------+---------------+--------------+ +---------------+-----------------+----------------+------------+
| data gen. & | MIB, | YANG | IOAM, PSAMP | | data | MIB, YANG | YANG | IOAM, |
| process | YANG | | PBT, AM, | | generation | | | PSAMP, |
+-------------+-----------------+---------------+--------------+ | and process | | | PBT, AM |
| data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | +---------------+-----------------+----------------+------------+
| & export | | | | | data encoding | gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP |
+-------------+-----------------+---------------+--------------+ | and export | | | |
Figure 5: Existing Work Mapping +---------------+-----------------+----------------+------------+
Table 2: Existing Work Mapping
Although the framework is generally suitable for any network Although the framework is generally suitable for any network
environment, multi-domain telemetry has some unique challenges environment, multi-domain telemetry has some unique challenges
which deserve further architectural consideration, which is out of that deserve further architectural consideration, which is out of the
the scope of this document. scope of this document.
4. Evolution of Network Telemetry Applications 4. Evolution of Network Telemetry Applications
Network telemetry is an evolving technical area. As the network Network telemetry is an evolving technical area. As the network
moves towards automated operation, network telemetry applications moves towards automated operation, network telemetry applications
undergo several stages of evolution which add new layer of undergo several stages of evolution, which add a new layer of
requirements to the underlying network telemetry techniques. Each requirements to the underlying network telemetry techniques. Each
stage is built upon the techniques adopted by the previous stages stage is built upon the techniques adopted by the previous stages
plus some new requirements. plus some new requirements.
Stage 0 - Static Telemetry: The telemetry data source and type are Stage 0 - Static Telemetry: The telemetry data source and type are
determined at design time. The network operator can only determined at design time. The network operator can only
configure how to use it with limited flexibility. configure how to use it with limited flexibility.
Stage 1 - Dynamic Telemetry: The custom telemetry data can be Stage 1 - Dynamic Telemetry: The custom telemetry data can be
dynamically programmed or configured at runtime without dynamically programmed or configured at runtime without
interrupting the network operation, allowing a trade-off among interrupting the network operation, allowing a trade-off among
resource, performance, flexibility, and coverage. resource, performance, flexibility, and coverage.
Stage 2 - Interactive Telemetry: The network operator can Stage 2 - Interactive Telemetry: The network operator can
continuously customize and fine-tune the telemetry data in real continuously customize and fine-tune the telemetry data in real
time to reflect the network operation's visibility requirements. time to reflect the network operation's visibility requirements.
Compared with Stage 1, the changes are frequent based on the real- Compared with Stage 1, the changes are frequent based on the real-
time feedback. At this stage, some tasks can be automated, but time feedback. At this stage, some tasks can be automated, but
human operators still need to sit in the middle to make decisions. human operators still need to sit in the middle to make decisions.
Stage 3 - Closed-loop Telemetry: The telemetry is free from the Stage 3 - Closed-Loop Telemetry: The telemetry is free from the
interference of human operators, except for generating the interference of human operators, except for generating the
reports. The intelligent network operation engine automatically reports. The intelligent network operation engine automatically
issues the telemetry data requests, analyzes the data, and updates issues the telemetry data requests, analyzes the data, and updates
the network operations in closed control loops. the network operations in closed control loops.
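A closed control loop of the kind described for Stage 3 can be sketched as a request, analyze, and update cycle. In this hypothetical example, the quantity being adjusted is the telemetry export interval, and the thresholds and halving/doubling rule are assumptions chosen only to keep the sketch short; a real engine would act on network operations more broadly.

```python
from typing import Callable, List


def closed_loop(
    measure: Callable[[int], float],
    interval: int,
    rounds: int,
    high: float = 0.8,
    low: float = 0.2,
    min_interval: int = 1,
    max_interval: int = 60,
) -> List[int]:
    """Request data, analyze it, and update the export interval.

    `measure(interval)` is a hypothetical callable that returns a link
    utilization in [0, 1] observed at the given export interval (seconds).
    """
    history: List[int] = []
    for _ in range(rounds):
        utilization = measure(interval)  # issue the telemetry request
        if utilization > high:           # analyze: busy, so look closer
            interval = max(min_interval, interval // 2)
        elif utilization < low:          # analyze: idle, so back off
            interval = min(max_interval, interval * 2)
        history.append(interval)         # record the updated setting
    return history
```

No human decision appears inside the loop, which is the property that distinguishes this stage from Stage 2.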
Existing technologies are ready for stage 0 and stage 1. Individual Existing technologies are ready for Stages 0 and 1. Individual
stage 2 and stage 3 applications are also possible now. However, the applications for Stages 2 and 3 are also possible now. However, the
future autonomic networks may need a comprehensive operation future autonomic networks may need a comprehensive operation
management system which works at stage 2 and stage 3 to cover all the management system that works at Stages 2 and 3 to cover all the
network operation tasks. A well-defined network telemetry framework network operation tasks. A well-defined network telemetry framework
is the first step towards this direction. is the first step towards this direction.
5. Security Considerations 5. Security Considerations
The complexity of network telemetry raises significant security The complexity of network telemetry raises significant security
implications. For example, telemetry data can be manipulated to implications. For example, telemetry data can be manipulated to
exhaust various network resources at each plane as well as the data exhaust various network resources at each plane as well as the data
consumer; falsified or tampered data can mislead the decision-making consumer; falsified or tampered data can mislead the decision-making
and paralyze networks; wrong configuration and programming for process and paralyze networks; and wrong configuration and
telemetry is equally harmful. The telemetry data is highly programming for telemetry is equally harmful. The telemetry data is
sensitive, which exposes a lot of information about the network and highly sensitive, which exposes a lot of information about the
its configuration. Some of that information can make designing network and its configuration. Some of that information can make
attacks against the network much easier (e.g., exact details of what designing attacks against the network much easier (e.g., exact
software and patches have been installed), and allows an attacker to details of what software and patches have been installed) and allows
determine whether a device may be subject to unprotected security an attacker to determine whether a device may be subject to
vulnerabilities. unprotected security vulnerabilities.
Given that this document has proposed a framework for network Given that this document has proposed a framework for network
telemetry and the telemetry mechanisms discussed are more extensive telemetry and the telemetry mechanisms discussed are more extensive
(in both message frequency and traffic amount) than the conventional (in both message frequency and traffic amount) than the conventional
network OAM concepts, we must also reflect that various new security network OAM concepts, we must also anticipate that new security
considerations may also arise. A number of techniques already exist considerations may also arise. A number of techniques already
for securing the forwarding plane, the control plane, and the exist for securing the forwarding plane, control plane, and
management plane in a network, but it is important to consider if any management plane in a network, but it is important to consider if any
new threat vectors are now being enabled via the use of network new threat vectors are now being enabled via the use of network
telemetry procedures and mechanisms. telemetry procedures and mechanisms.
This document proposes a conceptual architecture for collecting, This document proposes a conceptual architecture for collecting,
transporting, and analyzing a wide variety of data sources in support transporting, and analyzing a wide variety of data sources in support
of network applications. The protocols, data formats, and of network applications. The protocols, data formats, and
configurations chosen to implement this framework will dictate the configurations chosen to implement this framework will dictate the
specific security considerations. These considerations may include: specific security considerations. These considerations may include:
* Telemetry framework trust and policy model; * Telemetry framework trust and policy models;
* Role management and access control for enabling and disabling * Role management and access control for enabling and disabling
telemetry capabilities; telemetry capabilities;
* Protocol transport used for telemetry data and its inherent * Protocol transport used for telemetry data and its inherent
security capabilities; security capabilities;
* Telemetry data stores, storage encryption, methods of access, and * Telemetry data stores, storage encryption, methods of access, and
retention practices; retention practices;
* Tracking telemetry events and any abnormalities that might * Tracking telemetry events and any abnormalities that might
identify malicious attacks using telemetry interfaces; identify malicious attacks using telemetry interfaces;
* Authentication and integrity protection of telemetry data to make * Authentication and integrity protection of telemetry data to make
data more trustworthy. data more trustworthy; and
* Segregating the telemetry data traffic from the data traffic * Segregating the telemetry data traffic from the data traffic
carried over the network (e.g., historically management access and carried over the network (e.g., historically management access and
management data may be carried via an independent management management data may be carried via an independent management
network). network).
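One of the considerations listed above, authentication and integrity protection of telemetry data, can be illustrated with a keyed hash. This is a minimal sketch assuming a pre-shared key and JSON-encoded records; the function names and message layout are hypothetical, and key distribution, replay protection, and confidentiality are out of scope.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign(record: Dict[str, Any], key: bytes) -> Dict[str, str]:
    """Attach an HMAC-SHA256 tag to a serialized telemetry record."""
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode("utf-8"), "tag": tag}


def verify(message: Dict[str, str], key: bytes) -> bool:
    """Return True only if the body matches the tag under `key`."""
    expected = hmac.new(
        key, message["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    return hmac.compare_digest(expected, message["tag"])
```

A collector that rejects records failing `verify` would not act on data altered in transit or produced by a sender that lacks the key.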
Some security considerations highlighted above may be minimized or Some security considerations highlighted above may be minimized or
negated with policy management of network telemetry. In a network negated with policy management of network telemetry. In a network
telemetry deployment it would be advantageous to separate telemetry telemetry deployment, it would be advantageous to separate telemetry
capabilities into different classes of policies, i.e., Role Based capabilities into different classes of policies, i.e., Role-Based
Access Control and Event-Condition-Action policies. Also, potential Access Control and Event-Condition-Action policies. Also, potential
conflicts between network telemetry mechanisms must be detected conflicts between network telemetry mechanisms must be detected
accurately and resolved quickly to avoid unnecessary network accurately and resolved quickly to avoid unnecessary network
telemetry traffic propagation escalating into an unintended or telemetry traffic propagation escalating into an unintended or
intended denial of service attack. intended denial-of-service attack.
Further study of the security issues will be required, and it is Further study of the security issues will be required, and it is
expected that the security mechanisms and protocols are developed and expected that the security mechanisms and protocols are developed and
deployed along with a network telemetry system. deployed along with a network telemetry system.
6. IANA Considerations 6. IANA Considerations
This document includes no request to IANA. This document has no IANA actions.
7. Contributors
The other contributors of this document are Tianran Zhou, Zhenbin Li,
Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm
8. Acknowledgments
We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe
Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe
Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra,
Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin
Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Eric
Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many
others who have provided helpful comments and suggestions to improve
this document.
9. Informative References 7. Informative References
[gnmi] "gNMI - gRPC Network Management Interface", [gnmi] Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
<https://github.com/openconfig/reference/tree/master/rpc/ C., and C. Marrow, "gRPC Network Management Interface",
gnmi>. IETF 98, March 2017,
<https://datatracker.ietf.org/meeting/98/materials/slides-
98-rtgwg-gnmi-intro-draft-openconfig-rtgwg-gnmi-spec-00>.
[gpb] "Google Protocol Buffers", [gpb] Google Developers, "Protocol Buffers",
<https://developers.google.com/protocol-buffers>. <https://developers.google.com/protocol-buffers>.
[grpc] "gRPC, A high performance, open-source universal RPC [grpc] gRPC, "gRPC: A high performance, open source universal RPC
framework", <https://grpc.io>. framework", <https://grpc.io>.
[I-D.ietf-grow-bmp-local-rib] [IPPM-IOAM-DIRECT-EXPORT]
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
"Support for Local RIB in BGP Monitoring Protocol (BMP)",
Work in Progress, Internet-Draft, draft-ietf-grow-bmp-
local-rib-13, 31 August 2021,
<https://www.ietf.org/archive/id/draft-ietf-grow-bmp-
local-rib-13.txt>.
[I-D.ietf-ippm-ioam-data]
Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields
for In-situ OAM", Work in Progress, Internet-Draft, draft-
ietf-ippm-ioam-data-16, 8 November 2021,
<https://www.ietf.org/archive/id/draft-ietf-ippm-ioam-
data-16.txt>.
[I-D.ietf-ippm-ioam-direct-export]
Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F., Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F.,
Bhandari, S., Sivakolundu, R., and T. Mizrahi, "In-situ Bhandari, S., Ed., Sivakolundu, R., and T. Mizrahi, Ed.,
OAM Direct Exporting", Work in Progress, Internet-Draft, "In-situ OAM Direct Exporting", Work in Progress,
draft-ietf-ippm-ioam-direct-export-07, 13 October 2021, Internet-Draft, draft-ietf-ippm-ioam-direct-export-07, 13
<https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- October 2021, <https://datatracker.ietf.org/doc/html/
direct-export-07.txt>. draft-ietf-ippm-ioam-direct-export-07>.
[I-D.ietf-netconf-distributed-notif] [IPPM-POSTCARD-BASED-TELEMETRY]
Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou,
T., Li, Z., Mishra, G., Shin, J., and K. Lee, "In-Situ OAM
Marking-based Direct Export", Work in Progress, Internet-
Draft, draft-song-ippm-postcard-based-telemetry-12, 12 May
2022, <https://datatracker.ietf.org/doc/html/draft-song-
ippm-postcard-based-telemetry-12>.
[NETCONF-DISTRIB-NOTIF]
Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois,
"Subscription to Distributed Notifications", Work in "Subscription to Distributed Notifications", Work in
Progress, Internet-Draft, draft-ietf-netconf-distributed- Progress, Internet-Draft, draft-ietf-netconf-distributed-
notif-02, 6 May 2021, <https://www.ietf.org/archive/id/ notif-03, 10 January 2022,
draft-ietf-netconf-distributed-notif-02.txt>. <https://datatracker.ietf.org/doc/html/draft-ietf-netconf-
distributed-notif-03>.
[I-D.ietf-netconf-udp-notif] [NETCONF-UDP-NOTIF]
Zheng, G., Zhou, T., Graf, T., Francois, P., Feng, A. H., Zheng, G., Zhou, T., Graf, T., Francois, P., Feng, A. H.,
and P. Lucente, "UDP-based Transport for Configured and P. Lucente, "UDP-based Transport for Configured
Subscriptions", Work in Progress, Internet-Draft, draft- Subscriptions", Work in Progress, Internet-Draft, draft-
ietf-netconf-udp-notif-04, 21 October 2021, ietf-netconf-udp-notif-05, 4 March 2022,
<https://www.ietf.org/archive/id/draft-ietf-netconf-udp- <https://datatracker.ietf.org/doc/html/draft-ietf-netconf-
notif-04.txt>. udp-notif-05>.
[I-D.irtf-nmrg-ibn-concepts-definitions] [NETMOD-ECA-POLICY]
Clemm, A., Ciavaglia, L., Granville, L. Z., and J. Wu, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise,
Tantsura, "Intent-Based Networking - Concepts and "A YANG Data model for ECA Policy Management", Work in
Definitions", Work in Progress, Internet-Draft, draft- Progress, Internet-Draft, draft-ietf-netmod-eca-policy-01,
irtf-nmrg-ibn-concepts-definitions-05, 2 September 2021, 19 February 2021, <https://datatracker.ietf.org/doc/html/
<https://www.ietf.org/archive/id/draft-irtf-nmrg-ibn- draft-ietf-netmod-eca-policy-01>.
concepts-definitions-05.txt>.
[I-D.pedro-nmrg-anticipated-adaptation] [NMRG-ANTICIPATED-ADAPTATION]
Martinez-Julia, P., "Exploiting External Event Detectors Martinez-Julia, P., Ed., "Exploiting External Event
to Anticipate Resource Requirements for the Elastic Detectors to Anticipate Resource Requirements for the
Adaptation of SDN/NFV Systems", Work in Progress, Elastic Adaptation of SDN/NFV Systems", Work in Progress,
Internet-Draft, draft-pedro-nmrg-anticipated-adaptation- Internet-Draft, draft-pedro-nmrg-anticipated-adaptation-
02, 29 June 2018, <https://www.ietf.org/archive/id/draft- 02, 29 June 2018, <https://datatracker.ietf.org/doc/html/
pedro-nmrg-anticipated-adaptation-02.txt>. draft-pedro-nmrg-anticipated-adaptation-02>.
[I-D.song-ippm-postcard-based-telemetry]
Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou,
T., Li, Z., Shin, J., and K. Lee, "In-Situ OAM Marking-
based Direct Export", Work in Progress, Internet-Draft,
draft-song-ippm-postcard-based-telemetry-11, 15 November
2021, <https://www.ietf.org/archive/id/draft-song-ippm-
postcard-based-telemetry-11.txt>.
[I-D.song-opsawg-dnp4iq] [NMRG-IBN-CONCEPTS-DEFINITIONS]
Song, H. and J. Gong, "Requirements for Interactive Query Clemm, A., Ciavaglia, L., Granville, L. Z., and J.
with Dynamic Network Probes", Work in Progress, Internet- Tantsura, "Intent-Based Networking - Concepts and
Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017, Definitions", Work in Progress, Internet-Draft, draft-
<https://www.ietf.org/archive/id/draft-song-opsawg-dnp4iq- irtf-nmrg-ibn-concepts-definitions-09, 24 March 2022,
01.txt>. <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg-
ibn-concepts-definitions-09>.
[I-D.song-opsawg-ifit-framework] [OPSAWG-DNP4IQ]
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- Song, H., Ed. and J. Gong, "Requirements for Interactive
situ Flow Information Telemetry", Work in Progress, Query with Dynamic Network Probes", Work in Progress,
Internet-Draft, draft-song-opsawg-ifit-framework-16, 21 Internet-Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017,
October 2021, <https://www.ietf.org/archive/id/draft-song- <https://datatracker.ietf.org/doc/html/draft-song-opsawg-
opsawg-ifit-framework-16.txt>. dnp4iq-01>.
[I-D.wwx-netmod-event-yang] [OPSAWG-IFIT-FRAMEWORK]
Wu, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise, Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "A
"A YANG Data model for ECA Policy Management", Work in Framework for In-situ Flow Information Telemetry", Work in
Progress, Internet-Draft, draft-wwx-netmod-event-yang-10, Progress, Internet-Draft, draft-song-opsawg-ifit-
1 November 2020, <https://www.ietf.org/archive/id/draft- framework-17, 22 February 2022,
wwx-netmod-event-yang-10.txt>. <https://datatracker.ietf.org/doc/html/draft-song-opsawg-
ifit-framework-17>.
[RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin,
"Simple Network Management Protocol (SNMP)", RFC 1157, "Simple Network Management Protocol (SNMP)", RFC 1157,
DOI 10.17487/RFC1157, May 1990, DOI 10.17487/RFC1157, May 1990,
<https://www.rfc-editor.org/info/rfc1157>. <https://www.rfc-editor.org/info/rfc1157>.
[RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J.
Schoenwaelder, Ed., "Structure of Management Information Schoenwaelder, Ed., "Structure of Management Information
Version 2 (SMIv2)", STD 58, RFC 2578, Version 2 (SMIv2)", STD 58, RFC 2578,
DOI 10.17487/RFC2578, April 1999, DOI 10.17487/RFC2578, April 1999,
skipping to change at page 35, line 22 skipping to change at line 1578
Hybrid Performance Monitoring", RFC 8889, Hybrid Performance Monitoring", RFC 8889,
DOI 10.17487/RFC8889, August 2020, DOI 10.17487/RFC8889, August 2020,
<https://www.rfc-editor.org/info/rfc8889>. <https://www.rfc-editor.org/info/rfc8889>.
[RFC8924] Aldrin, S., Pignataro, C., Ed., Kumar, N., Ed., Krishnan, [RFC8924] Aldrin, S., Pignataro, C., Ed., Kumar, N., Ed., Krishnan,
R., and A. Ghanwani, "Service Function Chaining (SFC) R., and A. Ghanwani, "Service Function Chaining (SFC)
Operations, Administration, and Maintenance (OAM) Operations, Administration, and Maintenance (OAM)
Framework", RFC 8924, DOI 10.17487/RFC8924, October 2020, Framework", RFC 8924, DOI 10.17487/RFC8924, October 2020,
<https://www.rfc-editor.org/info/rfc8924>. <https://www.rfc-editor.org/info/rfc8924>.
[xml] "Extensible Markup Language (XML) 1.0 (Fifth Edition)", [RFC9069] Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
<https://www.w3.org/TR/2008/REC-xml-20081126/>. "Support for Local RIB in the BGP Monitoring Protocol
(BMP)", RFC 9069, DOI 10.17487/RFC9069, February 2022,
<https://www.rfc-editor.org/info/rfc9069>.
[y1731] "ITU-T Y.1731: OAM Functions and Mechanisms for Ethernet [RFC9197] Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi,
based networks, 2015", Ed., "Data Fields for In Situ Operations, Administration,
and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197,
May 2022, <https://www.rfc-editor.org/info/rfc9197>.
[W3C.REC-xml-20081126]
Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth
Edition)", World Wide Web Consortium Recommendation REC-
xml-20081126, November 2008,
<https://www.w3.org/TR/2008/REC-xml-20081126>.
[y1731] ITU-T, "Operations, administration and maintenance (OAM)
functions and mechanisms for Ethernet-based networks",
ITU-T Recommendation G.8013/Y.1731, August 2015,
<https://www.itu.int/rec/T-REC-Y.1731/en>. <https://www.itu.int/rec/T-REC-Y.1731/en>.
Appendix A. A Survey on Existing Network Telemetry Techniques Appendix A. A Survey on Existing Network Telemetry Techniques
In this non-normative appendix, we provide an overview of some In this non-normative appendix, we provide an overview of some
existing techniques and standard proposals for each network telemetry existing techniques and standard proposals for each network telemetry
module. module.
A.1. Management Plane Telemetry A.1. Management Plane Telemetry
A.1.1. Push Extensions for NETCONF A.1.1. Push Extensions for NETCONF
NETCONF [RFC6241] is a popular network management protocol NETCONF [RFC6241] is a popular network management protocol
recommended by IETF. Its core strength is for managing recommended by IETF. Its core strength is for managing
configuration, but can also be used for data collection. YANG-Push configuration, but it can also be used for data collection.
[RFC8641] [RFC8639] extends NETCONF and enables subscriber YANG-Push [RFC8639] [RFC8641] extends NETCONF and enables subscriber
applications to request a continuous, customized stream of updates applications to request a continuous, customized stream of updates
from a YANG datastore. Providing such visibility into changes made from a YANG datastore. Providing such visibility into changes made
upon YANG configuration and operational objects enables new upon YANG configuration and operational objects enables new
capabilities based on the remote mirroring of configuration and capabilities based on the remote mirroring of configuration and
operational state. Moreover, distributed data collection mechanism operational state. Moreover, a distributed data collection mechanism
[I-D.ietf-netconf-distributed-notif] via UDP based publication [NETCONF-DISTRIB-NOTIF] via a UDP-based publication channel
channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for [NETCONF-UDP-NOTIF] provides enhanced efficiency for the NETCONF-
the NETCONF based telemetry. based telemetry.
A.1.2. gRPC Network Management Interface A.1.2. gRPC Network Management Interface
gRPC Network Management Interface (gNMI) [gnmi] is a network gRPC Network Management Interface (gNMI) [gnmi] is a network
management protocol based on the gRPC [grpc] RPC (Remote Procedure management protocol based on the gRPC [grpc] Remote Procedure Call
Call) framework. With a single gRPC service definition, both (RPC) framework. With a single gRPC service definition, both
configuration and telemetry can be covered. gRPC is an HTTP/2 configuration and telemetry can be covered. gRPC is an open-source
[RFC7540]-based open-source micro-service communication framework. micro-service communication framework based on HTTP/2 [RFC7540]. It
It provides a number of capabilities which are well-suited for provides a number of capabilities that are well-suited for network
network telemetry, including: telemetry, including:
* Full-duplex streaming transport model combined with a binary * A full-duplex streaming transport model; when combined with a
encoding mechanism provides good telemetry efficiency. binary encoding mechanism, it provides good telemetry efficiency.
* gRPC provides higher-level features consistency across platforms * A higher-level feature consistency across platforms that common
that common HTTP/2 libraries typically do not. This HTTP/2 libraries typically do not provide. This characteristic is
characteristic is especially valuable for the fact that telemetry especially valuable for the fact that telemetry data collectors
data collectors normally reside on a large variety of platforms. normally reside on a large variety of platforms.
* The built-in load-balancing and failover mechanism. * A built-in load-balancing and failover mechanism.
A.2. Control Plane Telemetry A.2. Control Plane Telemetry
A.2.1. BGP Monitoring Protocol A.2.1. BGP Monitoring Protocol
BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP BMP [RFC7854] is used to monitor BGP sessions and is intended to
sessions and is intended to provide a convenient interface for provide a convenient interface for obtaining route views.
obtaining route views.
The BGP routing information is collected from the monitored device(s) BGP routing information is collected from the monitored device(s) to
to the BMP monitoring station by setting up the BMP TCP session. The the BMP monitoring station by setting up the BMP TCP session. The
BGP peers are monitored by the BMP Peer Up and Peer Down BGP peers are monitored by the BMP Peer Up and Peer Down
Notifications. The BGP routes (including Adjacency_RIB_In [RFC7854], notifications. The BGP routes (including Adj_RIB_In [RFC7854],
Adjacency_RIB_out [RFC8671], and Local_Rib Adj_RIB_out [RFC8671], and local RIB [RFC9069]) are encapsulated in
[I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP Route the BMP Route Monitoring Message and the BMP Route Mirroring Message,
Monitoring Message and the BMP Route Mirroring Message, providing providing both an initial table dump and real-time route updates. In
both an initial table dump and real-time route updates. In addition, addition, BGP statistics are reported through the BMP Stats Report
BGP statistics are reported through the BMP Stats Report Message, Message, which could be either timer triggered or event-driven.
which could be either timer triggered or event-driven. Future BMP Future BMP extensions could further enrich BGP monitoring
extensions could further enrich BGP monitoring applications. applications.
A.3. Data Plane Telemetry A.3. Data Plane Telemetry
A.3.1. The Alternate Marking (AM) technology A.3.1. Alternate-Marking (AM) Technology
The Alternate Marking method enables efficient measurements of packet The Alternate-Marking method enables efficient measurements of packet
loss, delay, and jitter both in IP and Overlay Networks, as presented loss, delay, and jitter both in IP and Overlay Networks, as presented
in [RFC8321] and [RFC8889]. in [RFC8321] and [RFC8889].
This technique can be applied to point-to-point and multipoint-to- This technique can be applied to point-to-point and multipoint-to-
multipoint flows. Alternate Marking creates batches of packets by multipoint flows. Alternate Marking creates batches of packets by
alternating the value of 1 bit (or a label) of the packet header. alternating the value of 1 bit (or a label) of the packet header.
These batches of packets are unambiguously recognized over the These batches of packets are unambiguously recognized over the
network and the comparison of packet counters for each batch allows network, and the comparison of packet counters for each batch allows
the packet loss calculation. The same idea can be applied to delay the packet loss calculation. The same idea can be applied to delay
measurement by selecting ad hoc packets with a marking bit dedicated measurement by selecting ad hoc packets with a marking bit dedicated
for delay measurements. for delay measurements.
Alternate Marking method needs two counters each marking period for The Alternate-Marking method needs two counters each marking period
each flow under monitor. For instance, by considering n measurement for each flow under monitor. For instance, by considering n
points and m monitored flows, the order of magnitude of the packet measurement points and m monitored flows, the order of magnitude of
counters for each time interval is n*m*2 (1 per color). the packet counters for each time interval is n*m*2 (1 per color).
Since networks offer rich sets of network performance measurement Since networks offer rich sets of network performance measurement
data (e.g., packet counters), conventional approaches run into data (e.g., packet counters), conventional approaches run into
limitations. The bottleneck is the generation and export of the data limitations. The bottleneck is the generation and export of the data
and the amount of data that can be reasonably collected from the and the amount of data that can be reasonably collected from the
network. In addition, management tasks related to determining and network. In addition, management tasks related to determining and
configuring which data to generate lead to significant deployment configuring which data to generate lead to significant deployment
challenges. challenges.
The Multipoint Alternate Marking approach, described in [RFC8889], The Multipoint Alternate-Marking approach, described in [RFC8889],
aims to resolve this issue and make the performance monitoring more aims to resolve this issue and make the performance monitoring more
flexible in case a detailed analysis is not needed. flexible in case a detailed analysis is not needed.
An application orchestrates network performance measurements tasks An application orchestrates network performance measurement tasks
across the network to allow for optimized monitoring. The across the network to allow for optimized monitoring. The
application can choose how roughly or precisely to configure application can choose how roughly or precisely to configure
measurement points depending on the application's requirements. measurement points depending on the application's requirements.
Using Alternate Marking, it is possible to monitor a Multipoint Using Alternate Marking, it is possible to monitor a Multipoint
Network without in depth examination by using the Network Clustering Network without in-depth examination by using Network Clustering
(subnetworks that are portions of the entire network that preserve (subnetworks that are portions of the entire network that preserve
the same property of the entire network, called clusters). So in the the same property of the entire network, called clusters). So in the
case that there is packet loss or the delay is too high then the case where there is packet loss or the delay is too high, the
specific filtering criteria could be applied to gather a more specific filtering criteria could be applied to gather a more
detailed analysis by using a different combination of clusters up to detailed analysis by using a different combination of clusters up to
a per-flow measurement as described in Alternate-Marking (AM) a per-flow measurement as described in the Alternate-Marking document
[RFC8321]. [RFC8321].
In summary, an application can configure end-to-end network In summary, an application can configure end-to-end network
monitoring. If the network does not experience issues, this monitoring. If the network does not experience issues, this
approximate monitoring is good enough and is very cheap in terms of approximate monitoring is good enough and is very cheap in terms of
network resources. However, in case of problems, the application network resources. However, in case of problems, the application
becomes aware of the issues from this approximate monitoring and, in becomes aware of the issues from this approximate monitoring and, in
order to localize the portion of the network that has issues, order to localize the portion of the network that has issues,
configures the measurement points more extensively, allowing more configures the measurement points more extensively, allowing more
detailed monitoring to be performed. After the detection and detailed monitoring to be performed. After the detection and
resolution of the problem, the initial approximate monitoring can be resolution of the problem, the initial approximate monitoring can be
used again. used again.
A.3.2. Dynamic Network Probe A.3.2. Dynamic Network Probe
Hardware-based Dynamic Network Probe (DNP) [I-D.song-opsawg-dnp4iq] A hardware-based Dynamic Network Probe (DNP) [OPSAWG-DNP4IQ] provides
proposes a programmable means to customize the data that an a programmable means to customize the data that an application
application collects from the data plane. A direct benefit of DNP is collects from the data plane. A direct benefit of DNP is the
the reduction of the exported data. A full DNP solution covers reduction of the exported data. A full DNP solution covers several
several components including data source, data subscription, and data components including data source, data subscription, and data
generation. The data subscription needs to define the derived data generation. The data subscription needs to define the derived data
which can be composed and derived from the raw data sources. The that can be composed and derived from raw data sources. The data
data generation takes advantage of the moderate in-network computing generation takes advantage of the moderate in-network computing to
to produce the desired data. produce the desired data.
While DNP can introduce unforeseeable flexibility to the data plane While DNP can introduce unforeseeable flexibility to the data plane
telemetry, it also faces some challenges. It requires a flexible telemetry, it also faces some challenges. It requires a flexible
data plane that can be dynamically reprogrammed at run-time. The data plane that can be dynamically reprogrammed at runtime. The
programming API is yet to be defined. programming Application Programming Interface (API) is yet to be
defined.
A.3.3. IP Flow Information Export (IPFIX) Protocol A.3.3. IP Flow Information Export (IPFIX) Protocol
Traffic on a network can be seen as a set of flows passing through Traffic on a network can be seen as a set of flows passing through
network elements. IP Flow Information Export (IPFIX) [RFC7011] network elements. IPFIX [RFC7011] provides a means of transmitting
provides a means of transmitting traffic flow information for traffic flow information for administrative or other purposes. A
administrative or other purposes. A typical IPFIX enabled system typical IPFIX-enabled system includes a pool of Metering Processes
includes a pool of Metering Processes that collects data packets at that collects data packets at one or more Observation Points,
one or more Observation Points, optionally filters them and optionally filters them, and aggregates information about these
aggregates information about these packets. An Exporter then gathers packets. An Exporter then gathers each of the Observation Points
each of the Observation Points together into an Observation Domain together into an Observation Domain and sends this information via
and sends this information via the IPFIX protocol to a Collector. the IPFIX protocol to a Collector.
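The filter-then-aggregate role of a Metering Process can be illustrated with a short Python sketch. It is not part of either version of the document and does not implement the IPFIX protocol or its message encoding; the function name, the dictionary-based packet representation, and the choice of a 5-tuple flow key are assumptions made for the example.

```python
from collections import defaultdict


def meter(packets, keep=lambda pkt: True):
    """Aggregate packets observed at one point into per-flow records.

    packets: iterable of dicts with src, dst, sport, dport, proto, and
             length keys (a hypothetical representation).
    keep:    optional filter; packets for which it returns False are skipped.
    """
    flows = defaultdict(lambda: {"packets": 0, "octets": 0})
    for pkt in packets:
        if not keep(pkt):
            continue
        # Flow key assumed here: the classic 5-tuple.
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key]["packets"] += 1
        flows[key]["octets"] += pkt["length"]
    return dict(flows)


observed = [
    {"src": "192.0.2.1", "dst": "192.0.2.2", "sport": 1234, "dport": 80,
     "proto": 6, "length": 100},
    {"src": "192.0.2.1", "dst": "192.0.2.2", "sport": 1234, "dport": 80,
     "proto": 6, "length": 200},
    {"src": "192.0.2.3", "dst": "192.0.2.2", "sport": 5353, "dport": 53,
     "proto": 17, "length": 60},
]

# Keep only TCP (protocol number 6) before aggregating.
records = meter(observed, keep=lambda pkt: pkt["proto"] == 6)
print(records)
```

An Exporter would then encode records such as these and send them to a Collector; that step is out of scope for the sketch.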
A.3.4. In-Situ OAM A.3.4. In Situ OAM
Classical passive and active monitoring and measurement techniques Classical passive and active monitoring and measurement techniques
are either inaccurate or resource-consuming. It is preferable to are either inaccurate or resource consuming. It is preferable to
directly acquire data associated with a flow's packets when the directly acquire data associated with a flow's packets when the
packets pass through a network. In-situ OAM (iOAM) packets pass through a network. IOAM [RFC9197], a data generation
[I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new technique, embeds a new instruction header to user packets, and the
instruction header to user packets and the instruction directs the instruction directs the network nodes to add the requested data to
network nodes to add the requested data to the packets. Thus, at the the packets. Thus, at the path's end, the packet's experience gained
path end, the packet's experience gained on the entire forwarding on the entire forwarding path can be collected. Such firsthand data
path can be collected. Such firsthand data is invaluable to many is invaluable to many network OAM applications.
network OAM applications.
However, iOAM also faces some challenges. The issues on performance However, IOAM also faces some challenges. The issues on performance
impact, security, scalability and overhead limits, encapsulation impact, security, scalability and overhead limits, encapsulation
difficulties in some protocols, and cross-domain deployment need to difficulties in some protocols, and cross-domain deployment need to
be addressed. be addressed.
A.3.5. Postcard Based Telemetry A.3.5. Postcard-Based Telemetry
The postcard-based telemetry, as embodied in IOAM DEX The postcard-based telemetry, as embodied in IOAM Direct Export (DEX)
[I-D.ietf-ippm-ioam-direct-export] and IOAM Marking [IPPM-IOAM-DIRECT-EXPORT] and IOAM Marking
[I-D.song-ippm-postcard-based-telemetry], is a complementary [IPPM-POSTCARD-BASED-TELEMETRY], is a complementary technique to the
technique to the passport-based IOAM. PBT directly exports data at passport-based IOAM [RFC9197]. PBT directly exports data at each
each node through an independent packet. At the cost of higher node through an independent packet. At the cost of higher bandwidth
bandwidth overhead and the need for data correlation, PBT shows overhead and the need for data correlation, PBT shows several unique
several unique advantages. It can also help to identify packet drop advantages. It can also help to identify packet drop location in
location in case a packet is dropped on its forwarding path. case a packet is dropped on its forwarding path.
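The data correlation step and the drop localization mentioned above can be sketched as follows. This Python example is an illustration that appears in neither version of the document. It assumes that each postcard carries a packet identifier and the exporting node, that the collector knows the expected forwarding path, and that postcards themselves are never lost; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def locate_drops(postcards, path):
    """Correlate postcards per packet and report where packets stopped.

    postcards: iterable of (packet_id, node) pairs, in any order.
    path:      expected forwarding path as an ordered list of nodes.
    Returns {packet_id: (last_node_seen, next_expected_node)} for every
    packet that did not reach the last node on the path.
    """
    position = {node: i for i, node in enumerate(path)}
    furthest = defaultdict(lambda: -1)
    for packet_id, node in postcards:
        # Keep the furthest hop that exported a postcard for this packet.
        furthest[packet_id] = max(furthest[packet_id], position[node])
    drops = {}
    for packet_id, idx in furthest.items():
        if idx < len(path) - 1:
            drops[packet_id] = (path[idx], path[idx + 1])
    return drops


postcards = [
    (1, "A"), (1, "B"), (1, "C"),  # packet 1 traversed the whole path
    (2, "A"), (2, "B"),            # packet 2 was last seen at B
    (3, "A"),                      # packet 3 was last seen at A
]
print(locate_drops(postcards, ["A", "B", "C"]))
# {2: ('B', 'C'), 3: ('A', 'B')}
```

Because postcards travel as independent packets, a missing postcard could also mean that the postcard was lost rather than the user packet; a real collector would need to account for that ambiguity.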
A.3.6. Existing OAM for Specific Data Planes A.3.6. Existing OAM for Specific Data Planes
Various data planes raise unique OAM requirements. IETF has Various data planes raise unique OAM requirements. IETF has
published OAM technique and framework documents (e.g., [RFC8924] and published OAM technique and framework documents (e.g., [RFC8924] and
[RFC5085]) targeting different data planes such as Multi-Protocol [RFC5085]) targeting different data planes such as Multiprotocol
Label Switching (MPLS), L2 Virtual Private Network (L2-VPN), Network Label Switching (MPLS), L2 Virtual Private Network (VPN), Network
Virtualization Overlays (NVO3), Virtual Extensible LAN (VXLAN), Bit Virtualization over Layer 3 (NVO3), Virtual Extensible LAN (VXLAN),
Indexed Explicit Replication (BIER), Service Function Chaining (SFC), Bit Index Explicit Replication (BIER), Service Function Chaining
Segment Routing (SR), and Deterministic Networking (DETNET). The (SFC), Segment Routing (SR), and Deterministic Networking (DETNET).
aforementioned data plane telemetry techniques can be used to enhance The aforementioned data plane telemetry techniques can be used to
the OAM capability on such data planes. enhance the OAM capability on such data planes.
A.4. External Data and Event Telemetry A.4. External Data and Event Telemetry
A.4.1. Sources of External Events A.4.1. Sources of External Events
To ensure that the information provided by external event detectors To ensure that the information provided by external event detectors
and used by the network management solutions is meaningful for and used by the network management solutions is meaningful for
management purposes, the network telemetry framework must ensure that management purposes, the network telemetry framework must ensure that
such detectors (sources) are easily connected to the management such detectors (sources) are easily connected to the management
solutions (sinks). This requires the specification of a list of solutions (sinks). This requires the specification of a list of
potential external data sources that could be of interest in network potential external data sources that could be of interest in network
management and match it to the connectors and/or interfaces required management and matching it to the connectors and/or interfaces
to connect them. required to connect them.
Categories of external event sources that may be of interest to Categories of external event sources that may be of interest to
network management include:: network management include:
* Smart objects and sensors. With the consolidation of the Internet * Smart objects and sensors. With the consolidation of the Internet
of Things~(IoT) any network system will have many smart objects of Things (IoT), any network system will have many smart objects
attached to its physical surroundings and logical operation attached to its physical surroundings and logical operation
environments. Most of these objects will be essentially based on environments. Most of these objects will be essentially based on
sensors of many kinds (e.g., temperature, humidity, presence) and sensors of many kinds (e.g., temperature, humidity, and presence),
the information they provide can be very useful for the management and the information they provide can be very useful for the
of the network, even when they are not specifically deployed for management of the network, even when they are not specifically
such purpose. Elements of this source type will usually provide a deployed for such purpose. Elements of this source type will
specific protocol for interaction, especially one of those usually provide a specific protocol for interaction, especially
protocols related to IoT, such as the Constrained Application one of the protocols related to IoT, such as the Constrained
Protocol (CoAP). Application Protocol (CoAP).
* Online news reporters. Several online news services have the * Online news reporters. Several online news services have the
ability to provide enormous quantity of information about ability to provide an enormous quantity of information about
different events occurring in the world. Some of those events can different events occurring in the world. Some of those events can
impact on the network system managed by a specific framework and, have an impact on the network system managed by a specific
therefore, such information may be of interest to the management framework; therefore, such information may be of interest to the
solution. For instance, diverse security reports, such as the management solution. For instance, diverse security reports, such
Common Vulnerabilities and Exposures (CVE), can be issued by the as Common Vulnerabilities and Exposures (CVEs), can be issued by
corresponding authority and used by the management solution to the corresponding authority and used by the management solution to
update the managed system if needed. Instead of a specific update the managed system, if needed. Instead of a specific
protocol and data format, the sources of this kind of information protocol and data format, the sources of this kind of information
usually follow a relaxed but structured format. This format will usually follow a relaxed but structured format. This format will
be part of both the ontology and information model of the be part of both the ontology and information model of the
telemetry framework. telemetry framework.
* Global event analyzers. The advance of Big Data analyzers * Global event analyzers. The advance of big data analyzers
provides a huge amount of information and, more interestingly, the provides a huge amount of information and, more interestingly, the
identification of events detected by analyzing many data streams identification of events detected by analyzing many data streams
from different origins. In contrast with the other types of from different origins. In contrast with the other types of
sources, which are focused on specific events, the detectors of sources, which are focused on specific events, the detectors of
this source type will detect generic events. For example, during this source type will detect generic events. For example, during
a sport event some unexpected movement makes it fascinating and a sports event, some unexpected movement makes it fascinating, and
many people connect to sites that are reporting on the event. The many people connect to sites that are reporting on the event. The
underlying networks supporting the services that cover the event underlying networks supporting the services that cover the event
can be affected by such situation, so their management solutions can be affected by such situation, so their management solutions
should be aware of it. In contrast with the other source types, a should be aware of it. In contrast with the other source types, a
new information model, format, and reporting protocol is required new information model, format, and reporting protocol is required
to integrate the detectors of this type with the management to integrate the detectors of this type with the management
solution. solution.
Additional types of detector types can be added to the system, but Additional detector types can be added to the system, but generally
they will be generally the result of composing the properties offered they will be the result of composing the properties offered by these
by these main classes. main classes.
A.4.2. Connectors and Interfaces A.4.2. Connectors and Interfaces
For allowing external event detectors to be properly integrated with For allowing external event detectors to be properly integrated with
other management solutions, both elements must expose interfaces and other management solutions, both elements must expose interfaces and
protocols that are subject to their particular objective. Since protocols that are subject to their particular objective. Since
external event detectors will be focused on providing their external event detectors will be focused on providing their
information to their main consumers, which generally will not be information to their main consumers, which generally will not be
limited to the network management solutions, the framework must limited to the network management solutions, the framework must
include the definition of the required connectors for ensuring the include the definition of the required connectors for ensuring the
interconnection between detectors (sources) and their consumers interconnection between detectors (sources) and their consumers
within the management systems (sinks) are effective. within the management systems (sinks) are effective.
In some situations, the interconnection between the external event In some situations, the interconnection between external event
detectors and the management system is via the management plane. For detectors and the management system is via the management plane. For
those situations there will be a special connector that provides the those situations, there will be a special connector that provides the
typical interfaces found in most other elements connected to the typical interfaces found in most other elements connected to the
management plane. For instance, the interfaces could accomplish this management plane. For instance, the interfaces could accomplish this
with a specific data model (YANG) and specific telemetry protocol, with a specific data model (YANG) and specific telemetry protocol,
such as NETCONF, YANG-Push, or gRPC. such as NETCONF, YANG-Push, or gRPC.
Acknowledgments
We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe
Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe
Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra,
Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin
Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Éric
Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many
others who have provided helpful comments and suggestions to improve
this document.
Contributors
The other contributors of this document are Tianran Zhou, Zhenbin Li,
Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm.
Authors' Addresses Authors' Addresses
Haoyu Song Haoyu Song
Futurewei Futurewei
United States of America United States of America
Email: haoyu.song@futurewei.com Email: haoyu.song@futurewei.com
Fengwei Qin Fengwei Qin
China Mobile China Mobile
P.R. China China
Email: qinfengwei@chinamobile.com Email: qinfengwei@chinamobile.com
Pedro Martinez-Julia Pedro Martinez-Julia
NICT NICT
Japan Japan
Email: pedro@nict.go.jp Email: pedro@nict.go.jp
Laurent Ciavaglia Laurent Ciavaglia
Rakuten Mobile Rakuten Mobile
France France
Email: laurent.ciavaglia@rakuten.com Email: laurent.ciavaglia@rakuten.com
Aijun Wang Aijun Wang
China Telecom China Telecom
P.R. China China
Email: wangaj3@chinatelecom.cn
Email: wangaj.bri@chinatelecom.cn
 End of changes. 219 change blocks. 
703 lines changed or deleted 722 lines changed or added
