| rfc9232.original | rfc9232.txt | |||
|---|---|---|---|---|
| OPSAWG H. Song | Internet Engineering Task Force (IETF) H. Song | |||
| Internet-Draft Futurewei | Request for Comments: 9232 Futurewei | |||
| Intended status: Informational F. Qin | Category: Informational F. Qin | |||
| Expires: 6 June 2022 China Mobile | ISSN: 2070-1721 China Mobile | |||
| P. Martinez-Julia | P. Martinez-Julia | |||
| NICT | NICT | |||
| L. Ciavaglia | L. Ciavaglia | |||
| Rakuten Mobile | Rakuten Mobile | |||
| A. Wang | A. Wang | |||
| China Telecom | China Telecom | |||
| 3 December 2021 | May 2022 | |||
| Network Telemetry Framework | Network Telemetry Framework | |||
| draft-ietf-opsawg-ntf-13 | ||||
| Abstract | Abstract | |||
| Network telemetry is a technology for gaining network insight and | Network telemetry is a technology for gaining network insight and | |||
| facilitating efficient and automated network management. It | facilitating efficient and automated network management. It | |||
| encompasses various techniques for remote data generation, | encompasses various techniques for remote data generation, | |||
| collection, correlation, and consumption. This document describes an | collection, correlation, and consumption. This document describes an | |||
| architectural framework for network telemetry, motivated by | architectural framework for network telemetry, motivated by | |||
| challenges that are encountered as part of the operation of networks | challenges that are encountered as part of the operation of networks | |||
| and by the requirements that ensue. This document clarifies the | and by the requirements that ensue. This document clarifies the | |||
| terminologies and classifies the modules and components of a network | terminology and classifies the modules and components of a network | |||
| telemetry system from different perspectives. The framework and | telemetry system from different perspectives. The framework and | |||
| taxonomy help to set a common ground for the collection of related | taxonomy help to set a common ground for the collection of related | |||
| work and provide guidance for related technique and standard | work and provide guidance for related technique and standard | |||
| developments. | developments. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
| provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
| approved by the IESG are candidates for any level of Internet | ||||
| Standard; see Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 6 June 2022. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9232. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
| 1.1. Applicability Statement . . . . . . . . . . . . . . . . . 4 | 1.1. Applicability Statement | |||
| 1.2. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Glossary | |||
| 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 2. Background | |||
| 2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 7 | 2.1. Telemetry Data Coverage | |||
| 2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 8 | 2.2. Use Cases | |||
| 2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 9 | 2.3. Challenges | |||
| 2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 11 | 2.4. Network Telemetry | |||
| 2.5. The Necessity of a Network Telemetry Framework . . . . . 13 | 2.5. The Necessity of a Network Telemetry Framework | |||
| 3. Network Telemetry Framework . . . . . . . . . . . . . . . . . 14 | 3. Network Telemetry Framework | |||
| 3.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 15 | 3.1. Top-Level Modules | |||
| 3.1.1. Management Plane Telemetry . . . . . . . . . . . . . 18 | 3.1.1. Management Plane Telemetry | |||
| 3.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 18 | 3.1.2. Control Plane Telemetry | |||
| 3.1.3. Forwarding Plane Telemetry . . . . . . . . . . . . . 19 | 3.1.3. Forwarding Plane Telemetry | |||
| 3.1.4. External Data Telemetry . . . . . . . . . . . . . . . 21 | 3.1.4. External Data Telemetry | |||
| 3.2. Second Level Function Components . . . . . . . . . . . . 22 | 3.2. Second-Level Function Components | |||
| 3.3. Data Acquisition Mechanism and Type Abstraction . . . . . 24 | 3.3. Data Acquisition Mechanism and Type Abstraction | |||
| 3.4. Mapping Existing Mechanisms into the Framework . . . . . 26 | 3.4. Mapping Existing Mechanisms into the Framework | |||
| 4. Evolution of Network Telemetry Applications . . . . . . . . . 27 | 4. Evolution of Network Telemetry Applications | |||
| 5. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | 5. Security Considerations | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 | 6. IANA Considerations | |||
| 7. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 29 | 7. Informative References | |||
| 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 29 | Appendix A. A Survey on Existing Network Telemetry Techniques | |||
| 9. Informative References . . . . . . . . . . . . . . . . . . . 29 | A.1. Management Plane Telemetry | |||
| Appendix A. A Survey on Existing Network Telemetry Techniques . 35 | A.1.1. Push Extensions for NETCONF | |||
| A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 35 | A.1.2. gRPC Network Management Interface | |||
| A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 35 | A.2. Control Plane Telemetry | |||
| A.1.2. gRPC Network Management Interface . . . . . . . . . . 36 | A.2.1. BGP Monitoring Protocol | |||
| A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 36 | A.3. Data Plane Telemetry | |||
| A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 36 | A.3.1. Alternate-Marking (AM) Technology | |||
| A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 36 | A.3.2. Dynamic Network Probe | |||
| A.3.1. The Alternate Marking (AM) technology . . . . . . . . 36 | A.3.3. IP Flow Information Export (IPFIX) Protocol | |||
| A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 38 | A.3.4. In Situ OAM | |||
| A.3.3. IP Flow Information Export (IPFIX) Protocol . . . . . 38 | A.3.5. Postcard-Based Telemetry | |||
| A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 38 | A.3.6. Existing OAM for Specific Data Planes | |||
| A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 39 | A.4. External Data and Event Telemetry | |||
| A.3.6. Existing OAM for Specific Data Planes . . . . . . . . 39 | A.4.1. Sources of External Events | |||
| A.4. External Data and Event Telemetry . . . . . . . . . . . . 39 | A.4.2. Connectors and Interfaces | |||
| A.4.1. Sources of External Events . . . . . . . . . . . . . 39 | Acknowledgments | |||
| A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 41 | Contributors | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 | Authors' Addresses | |||
| 1. Introduction | 1. Introduction | |||
| Network visibility is the ability of management tools to see the | Network visibility is the ability of management tools to see the | |||
| state and behavior of a network, which is essential for successful | state and behavior of a network, which is essential for successful | |||
| network operation. Network Telemetry revolves around network data | network operation. Network telemetry revolves around network data | |||
| that can help provide insights about the current state of the | that 1) can help provide insights about the current state of the | |||
| network, including network devices, forwarding, control, and | network, including network devices, forwarding, control, and | |||
| management planes, and that can be generated and obtained through a | management planes; 2) can be generated and obtained through a variety | |||
| variety of techniques, including but not limited to network | of techniques, including but not limited to network instrumentation | |||
| instrumentation and measurements, and that can be processed for | and measurements; and 3) can be processed for purposes ranging from | |||
| purposes ranging from service assurance to network security using a | service assurance to network security using a wide variety of data | |||
| wide variety of data analytical techniques. In this document, | analytical techniques. In this document, network telemetry refers to | |||
| Network Telemetry refer to both the data itself (i.e., "Network | both the data itself (i.e., "Network Telemetry Data") and the | |||
| Telemetry Data"), and the techniques and processes used to generate, | techniques and processes used to generate, export, collect, and | |||
| export, collect, and consume that data for use by potentially | consume that data for use by potentially automated management | |||
| automated management applications. Network telemetry extends beyond | applications. Network telemetry extends beyond the classical network | |||
| the classical network Operations, Administration, and Maintenance | Operations, Administration, and Maintenance (OAM) techniques and | |||
| (OAM) techniques and expects to support better flexibility, | expects to support better flexibility, scalability, accuracy, | |||
| scalability, accuracy, coverage, and performance. | coverage, and performance. | |||
| However, the term "network telemetry" lacks an unambiguous | However, the term "network telemetry" lacks an unambiguous | |||
| definition. Its scope and coverage cause confusion and | definition. Its scope and coverage cause confusion and | |||
| misunderstandings. It is beneficial to clarify the concept and | misunderstandings. It is beneficial to clarify the concept and | |||
| provide a clear architectural framework for network telemetry, so we | provide a clear architectural framework for network telemetry, so we | |||
| can articulate the technical field, and better align the related | can articulate the technical field and better align the related | |||
| techniques and standard works. | techniques and standard works. | |||
| To fulfill such an undertaking, we first discuss some key | To fulfill such an undertaking, we first discuss some key | |||
| characteristics of network telemetry which set a clear distinction | characteristics of network telemetry that set a clear distinction | |||
| from the conventional network OAM and show that some conventional OAM | from the conventional network OAM and show that some conventional OAM | |||
| technologies can be considered a subset of the network telemetry | technologies can be considered a subset of the network telemetry | |||
| technologies. We then provide an architectural framework for network | technologies. We then provide an architectural framework for network | |||
| telemetry which includes four modules, each concerned with a | telemetry that includes four modules, each associated with a | |||
| different category of telemetry data and corresponding procedures. | different category of telemetry data and corresponding procedures. | |||
| All the modules are internally structured in the same way, including | All the modules are internally structured in the same way, including | |||
| components that allow the operator to configure data sources in | components that allow the operator to configure data sources in | |||
| regard to what data to generate and how to make that available to | regard to what data to generate and how to make that available to | |||
| client applications, components that instrument the underlying data | client applications, components that instrument the underlying data | |||
| sources, and components that perform the actual rendering, encoding, | sources, and components that perform the actual rendering, encoding, | |||
| and exporting of the generated data. We show how the network | and exporting of the generated data. We show how the network | |||
| telemetry framework can benefit the current and future network | telemetry framework can benefit current and future network | |||
| operations. Based on the distinction of modules and function | operations. Based on the distinction of modules and function | |||
| components, we can map the existing and emerging techniques and | components, we can map the existing and emerging techniques and | |||
| protocols into the framework. The framework can also simplify | protocols into the framework. The framework can also simplify | |||
| designing, maintaining, and understanding a network telemetry system. | designing, maintaining, and understanding a network telemetry system. | |||
| In addition, we outline the evolution stages of the network telemetry | In addition, we outline the evolution stages of the network telemetry | |||
| system and discuss the potential security concerns. | system and discuss the potential security concerns. | |||
| The purpose of the framework and taxonomy is to set a common ground | The purpose of the framework and taxonomy is to set a common ground | |||
| for the collection of related work and provide guidance for future | for the collection of related work and provide guidance for future | |||
| technique and standard developments. To the best of our knowledge, | technique and standard developments. To the best of our knowledge, | |||
| skipping to change at page 4, line 35 | skipping to change at line 175 | |||
| The network telemetry framework presented in this document must not | The network telemetry framework presented in this document must not | |||
| be applied to generating, exporting, collecting, analyzing, or | be applied to generating, exporting, collecting, analyzing, or | |||
| retaining individual user data or any data that can identify end | retaining individual user data or any data that can identify end | |||
| users or characterize their behavior without consent. Based on this | users or characterize their behavior without consent. Based on this | |||
| principle, the network telemetry framework is not applicable to | principle, the network telemetry framework is not applicable to | |||
| networks whose endpoints represent individual users, such as general- | networks whose endpoints represent individual users, such as general- | |||
| purpose access networks. | purpose access networks. | |||
| 1.2. Glossary | 1.2. Glossary | |||
| Before further discussion, we list some key terminology and acronyms | Before further discussion, we list some key terminology and | |||
| used in this document. We make an intended differentiation between | abbreviations used in this document. There is an intended | |||
| the terms of network telemetry and OAM. However, it should be | differentiation between the terms of network telemetry and OAM. | |||
| understood that there is not a hard-line distinction between the two | However, it should be understood that there is not a hard-line | |||
| concepts. Rather, network telemetry is considered as an extension of | distinction between the two concepts. Rather, network telemetry is | |||
| OAM. It covers all the existing OAM protocols but puts more emphasis | considered an extension of OAM. It covers all the existing OAM | |||
| on the newer and emerging techniques and protocols concerning all | protocols but puts more emphasis on the newer and emerging techniques | |||
| aspects of network data from acquisition to consumption. | and protocols concerning all aspects of network data from acquisition | |||
| to consumption. | ||||
| AI: Artificial Intelligence. In the network domain, AI refers to | AI: Artificial Intelligence. In the network domain, AI | |||
| the machine-learning based technologies for automated network | refers to machine-learning-based technologies for | |||
| operation and other tasks. | automated network operation and other tasks. | |||
| AM: Alternate Marking, a flow performance measurement method, | AM: Alternate Marking. A flow performance measurement | |||
| specified in [RFC8321]. | method, as specified in [RFC8321]. | |||
| BMP: BGP Monitoring Protocol, specified in [RFC7854]. | BMP: BGP Monitoring Protocol. Specified in [RFC7854]. | |||
| DPI: Deep Packet Inspection, referring to the techniques that | DPI: Deep Packet Inspection. Refers to the techniques that | |||
| examines packet beyond packet L3/L4 headers. | examine packets beyond packet L3/L4 headers. | |||
| gNMI: gRPC Network Management Interface, a network management | gNMI: gRPC Network Management Interface. A network management | |||
| protocol from OpenConfig Operator Working Group, mainly | protocol from the OpenConfig Operator Working Group, | |||
| contributed by Google. See [gnmi] for details. | mainly contributed by Google. See [gnmi] for details. | |||
| GPB: Google Protocol Buffer, an extensible mechanism for serializing | GPB: Google Protocol Buffer. An extensible mechanism for | |||
| structured data. See [gpb] for details. | serializing structured data. See [gpb] for details. | |||
| gRPC: gRPC Remote Procedure Call, an open source high performance | gRPC: gRPC Remote Procedure Call. An open-source high- | |||
| RPC framework that gNMI is based on. See [grpc] for details. | performance RPC framework that gNMI is based on. See | |||
| [grpc] for details. | ||||
| IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. | IPFIX: IP Flow Information Export Protocol. Specified in | |||
| [RFC7011]. | ||||
| IOAM: In-situ OAM [I-D.ietf-ippm-ioam-data], a dataplane on-path | IOAM: In situ OAM [RFC9197]. A data plane on-path telemetry | |||
| telemetry technique. | technique. | |||
| JSON: An open standard file format and data interchange format that | JSON: JavaScript Object Notation. An open standard file format | |||
| uses human-readable text to store and transmit data objects, | and data interchange format that uses human-readable text | |||
| specified in [RFC8259]. | to store and transmit data objects, as specified in | |||
| [RFC8259]. | ||||
| MIB: Management Information Base, a database used for managing the | MIB: Management Information Base. A database used for | |||
| entities in a network. | managing the entities in a network. | |||
| NETCONF: Network Configuration Protocol, specified in [RFC6241]. | NETCONF: Network Configuration Protocol. Specified in [RFC6241]. | |||
| NetFlow: A Cisco protocol for flow record collecting, described in | NetFlow: A Cisco protocol used for flow record collecting, as | |||
| [RFC3954]. | described in [RFC3954]. | |||
| Network Telemetry: The process and instrumentation for acquiring and | Network Telemetry: The process and instrumentation for acquiring and | |||
| utilizing network data remotely for network monitoring and | utilizing network data remotely for network monitoring | |||
| operation. A general term for a large set of network visibility | and operation. A general term for a large set of network | |||
| techniques and protocols, concerning aspects like data generation, | visibility techniques and protocols, concerning aspects | |||
| collection, correlation, and consumption. Network telemetry | like data generation, collection, correlation, and | |||
| addresses the current network operation issues and enables smooth | consumption. Network telemetry addresses current network | |||
| evolution toward future intent-driven autonomous networks. | operation issues and enables smooth evolution toward | |||
| future intent-driven autonomous networks. | ||||
| NMS: Network Management System, referring to applications that allow | NMS: Network Management System. Refers to applications that | |||
| network administrators to manage a network. | allow network administrators to manage a network. | |||
| OAM: Operations, Administration, and Maintenance. A group of | OAM: Operations, Administration, and Maintenance. A group of | |||
| network management functions that provide network fault | network management functions that provide network fault | |||
| indication, fault localization, performance information, and data | indication, fault localization, performance information, | |||
| and diagnosis functions. Most conventional network monitoring | and data and diagnosis functions. Most conventional | |||
| techniques and protocols belong to network OAM. | network monitoring techniques and protocols belong to | |||
| network OAM. | ||||
| PBT: Postcard-Based Telemetry, a dataplane on-path telemetry | PBT: Postcard-Based Telemetry. A data plane on-path telemetry | |||
| technique. A representative technique is described in | technique. A representative technique is described in | |||
| [I-D.ietf-ippm-ioam-direct-export]. | [IPPM-IOAM-DIRECT-EXPORT]. | |||
| RESTCONF: An HTTP-based protocol that provides a programmatic | RESTCONF: An HTTP-based protocol that provides a programmatic | |||
| interface for accessing data defined in YANG, using the datastore | interface for accessing data defined in YANG, using the | |||
| concepts defined in NETCONF, as specified in [RFC8040]. | datastore concepts defined in NETCONF, as specified in | |||
| [RFC8040]. | ||||
| SMIv2: Structure of Management Information Version 2, defining MIB | SMIv2: Structure of Management Information Version 2. Defines | |||
| objects, specified in [RFC2578]. | MIB objects, as specified in [RFC2578]. | |||
| SNMP: Simple Network Management Protocol. Version 1, 2, and 3 are | SNMP: Simple Network Management Protocol. Versions 1, 2, and 3 | |||
| specified in [RFC1157], [RFC3416], and [RFC3411], respectively. | are specified in [RFC1157], [RFC3416], and [RFC3411], | |||
| respectively. | ||||
| XML: Extensible Markup Language is a markup language for data | XML: Extensible Markup Language. A markup language for data | |||
| encoding that is both human-readable and machine-readable, | encoding that is both human readable and machine | |||
| specified by W3C [xml]. | readable, as specified by W3C [W3C.REC-xml-20081126]. | |||
| YANG: YANG is a data modeling language for the definition of data | YANG: YANG is a data modeling language for the definition of | |||
| sent over network management protocols such as the NETCONF and | data sent over network management protocols such as | |||
| RESTCONF. YANG is defined in [RFC6020] and [RFC7950]. | NETCONF and RESTCONF. YANG is defined in [RFC6020] and | |||
| [RFC7950]. | ||||
| YANG ECA: A YANG model for Event-Condition-Action policies, defined | YANG ECA: A YANG model for Event-Condition-Action policies, as | |||
| in [I-D.wwx-netmod-event-yang]. | defined in [NETMOD-ECA-POLICY]. | |||
| YANG-Push: A mechanism that allows subscriber applications to | YANG-Push: A mechanism that allows subscriber applications to | |||
| request a stream of updates from a YANG datastore on a network | request a stream of updates from a YANG datastore on a | |||
| device. Details are specified in [RFC8641] and [RFC8639]. | network device. Details are specified in [RFC8639] and | |||
| [RFC8641]. | ||||
| 2. Background | 2. Background | |||
| The term "big data" is used to describe the extremely large volume of | The term "big data" is used to describe the extremely large volume of | |||
| data sets that can be analyzed computationally to reveal patterns, | data sets that can be analyzed computationally to reveal patterns, | |||
| trends, and associations. Networks are undoubtedly a source of big | trends, and associations. Networks are undoubtedly a source of big | |||
| data because of their scale and the volume of network traffic they | data because of their scale and the volume of network traffic they | |||
| forward. When a network's endpoints do not represent individual | forward. When a network's endpoints do not represent individual | |||
| users (e.g. in industrial, datacenter, and infrastructure contexts), | users (e.g., in industrial, data-center, and infrastructure | |||
| network operations can often benefit from large-scale data collection | contexts), network operations can often benefit from large-scale data | |||
| without breaching user privacy. | collection without breaching user privacy. | |||
| Today one can access advanced big data analytics capability through a | Today, one can access advanced big data analytics capability through | |||
| plethora of commercial and open source platforms (e.g., Apache | a plethora of commercial and open-source platforms (e.g., Apache | |||
| Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine | Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine | |||
| learning). Thanks to advances in computing and storage | learning). Thanks to advances in computing and storage | |||
| technologies, network big data analytics gives network operators an | technologies, network big data analytics give network operators an | |||
| opportunity to gain network insights and move towards network | opportunity to gain network insights and move towards network | |||
| autonomy. Some operators have started to explore the application of | autonomy. Some operators have started to explore the application of | |||
| Artificial Intelligence (AI) to make sense of network data. Software | Artificial Intelligence (AI) to make sense of network data. Software | |||
| tools can use the network data to detect and react to network faults, | tools can use the network data to detect and react to network faults, | |||
| anomalies, and policy violations, as well as predicting future | anomalies, and policy violations, as well as predict future events. | |||
| events. In turn, the network policy updates for planning, intrusion | In turn, the network policy updates for planning, intrusion | |||
| prevention, optimization, and self-healing may be applied. | prevention, optimization, and self-healing may be applied. | |||
| It is conceivable that an autonomic network [RFC7575] is the logical | It is conceivable that an autonomic network [RFC7575] is the logical | |||
| next step for network evolution following Software Defined Networking | next step for network evolution following Software-Defined Networking | |||
| (SDN), aiming to reduce (or even eliminate) human labor, make more | (SDN), which aims to reduce (or even eliminate) human labor, make | |||
| efficient use of network resources, and provide better services more | more efficient use of network resources, and provide better services | |||
| aligned with customer requirements. The IETF ANIMA working group is | more aligned with customer requirements. The IETF ANIMA Working | |||
| dedicated to developing and maintaining protocols and procedures for | Group is dedicated to developing and maintaining protocols and | |||
| automated network management and control of professionally-managed | procedures for automated network management and control of | |||
| networks. The related technique of Intent-based Networking (IBN) | professionally managed networks. The related technique of | |||
| [I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility | Intent-Based Networking (IBN) [NMRG-IBN-CONCEPTS-DEFINITIONS] | |||
| and telemetry data in order to ensure that the network is behaving as | requires network visibility and telemetry data in order to ensure | |||
| intended. | that the network is behaving as intended. | |||
| However, while the data processing capability has improved and | However, while the data processing capability has improved and | |||
| applications require more data to function better, the networks lag | applications require more data to function better, the networks lag | |||
| behind in extracting and translating network data into useful and | behind in extracting and translating network data into useful and | |||
| actionable information in efficient ways. The system bottleneck is | actionable information in efficient ways. The system bottleneck is | |||
| shifting from data consumption to data supply. Both the number of | shifting from data consumption to data supply. Both the number of | |||
| network nodes and the traffic bandwidth keep increasing at a fast | network nodes and the traffic bandwidth keep increasing at a fast | |||
| pace. Network configuration and policy change at shorter time | pace. Network configuration and policy change at shorter time | |||
| intervals than before. More subtle events and fine-grained data through | intervals than before. More subtle events and fine-grained data through | |||
| all network planes need to be captured and exported in real time. In | all network planes need to be captured and exported in real time. In | |||
| a nutshell, it is a challenge to get enough high-quality data out of | a nutshell, it is a challenge to get enough high-quality data out of | |||
| the network in a manner that is efficient, timely, and flexible. | the network in a manner that is efficient, timely, and flexible. | |||
| Therefore, we need to survey the existing technologies and protocols | Therefore, we need to survey the existing technologies and protocols | |||
| and identify any potential gaps. | and identify any potential gaps. | |||
| In the remainder of this section, first we clarify the scope of | In the remainder of this section, we first clarify the scope of | |||
| network data (i.e., telemetry data) relevant in this document. Then, | network data (i.e., telemetry data) relevant in this document. Then, | |||
| we discuss several key use cases for today's and future network | we discuss several key use cases for network operations of today and | |||
| operations. Next, we show why the current network OAM techniques and | the future. Next, we show why the current network OAM techniques and | |||
| protocols are insufficient for these use cases. The discussion | protocols are insufficient for these use cases. The discussion | |||
| underlines the need for new methods, techniques, and protocols, as | underlines the need for new methods, techniques, and protocols, as | |||
| well as the extensions of existing ones, which we assign under the | well as the extensions of existing ones, which we assign under the | |||
| umbrella term - Network Telemetry. | umbrella term "Network Telemetry". | |||
| 2.1. Telemetry Data Coverage | 2.1. Telemetry Data Coverage | |||
| Any information that can be extracted from networks (including data | Any information that can be extracted from networks (including the | |||
| plane, control plane, and management plane) and used to gain | data plane, control plane, and management plane) and used to gain | |||
| visibility or as basis for actions is considered telemetry data. It | visibility or as a basis for actions is considered telemetry data. | |||
| includes statistics, event records and logs, snapshots of state, | It includes statistics, event records and logs, snapshots of state, | |||
| configuration data, etc. It also covers the outputs of any active | configuration data, etc. It also covers the outputs of any active | |||
| and passive measurements [RFC7799]. In some cases, raw data is | and passive measurements [RFC7799]. In some cases, raw data is | |||
| processed in network before being sent to a data consumer. Such | processed in network before being sent to a data consumer. Such | |||
| processed data is also considered telemetry data. The value of | processed data is also considered telemetry data. The value of | |||
| telemetry data varies. In some cases, if the cost is acceptable, | telemetry data varies. In some cases, if the cost is acceptable, | |||
| less but higher quality data are preferred than lots of low quality | less but higher-quality data are preferred rather than a lot of low- | |||
| data. A classification of telemetry data is provided in Section 3. | quality data. A classification of telemetry data is provided in | |||
| To preserve the privacy of end-users, no user packet content should | Section 3. To preserve the privacy of end users, no user packet | |||
| be collected. Specifically, the data objects generated, exported, | content should be collected. Specifically, the data objects | |||
| and collected by a network telemetry application should not include | generated, exported, and collected by a network telemetry application | |||
| any packet payload from traffic associated with end-users systems. | should not include any packet payload from traffic associated with | |||
| | end-user systems. | |||
| 2.2. Use Cases | 2.2. Use Cases | |||
| The following set of use cases is essential for network operations. | The following set of use cases is essential for network operations. | |||
| While the list is by no means exhaustive, it is enough to highlight | While the list is by no means exhaustive, it is enough to highlight | |||
| the requirements for data velocity, variety, volume, and veracity, | the requirements for data velocity, variety, volume, and veracity, | |||
| the attributes of big data, in networks. | the attributes of big data, in networks. | |||
| * Security: Network intrusion detection and prevention systems need | * Security: Network intrusion detection and prevention systems need | |||
| to monitor network traffic and activities and act upon anomalies. | to monitor network traffic and activities and act upon anomalies. | |||
| Given increasingly sophisticated attack vectors coupled with | Given increasingly sophisticated attack vectors coupled with | |||
| increasingly severe consequences of security breaches, new tools | increasingly severe consequences of security breaches, new tools | |||
| and techniques need to be developed, relying on wider and deeper | and techniques need to be developed, relying on wider and deeper | |||
| visibility into networks. The ultimate goal is to achieve | visibility into networks. The ultimate goal is to achieve | |||
| security with no, or only minimal, human intervention, and without | security with no, or only minimal, human intervention and without | |||
| disrupting legitimate traffic flows. | disrupting legitimate traffic flows. | |||
| * Policy and Intent Compliance: Network policies are the rules that | * Policy and Intent Compliance: Network policies are the rules that | |||
| constrain the services for network access, provide service | constrain the services for network access, provide service | |||
| differentiation, or enforce specific treatment on the traffic. | differentiation, or enforce specific treatment on the traffic. | |||
| For example, a service function chain is a policy that requires | For example, a service function chain is a policy that requires | |||
| the selected flows to pass through a set of ordered network | the selected flows to pass through a set of ordered network | |||
| functions. Intent, as defined in | functions. Intent, as defined in [NMRG-IBN-CONCEPTS-DEFINITIONS], | |||
| [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational | is a set of operational goals that a network should meet and | |||
| goals that a network should meet and outcomes that a network is | outcomes that a network is supposed to deliver, defined in a | |||
| supposed to deliver, defined in a declarative manner without | declarative manner without specifying how to achieve or implement | |||
| specifying how to achieve or implement them. An intent requires a | them. An intent requires a complex translation and mapping | |||
| complex translation and mapping process before being applied on | process before being applied on networks. While a policy or | |||
| networks. While a policy or intent is enforced, the compliance | intent is enforced, the compliance needs to be verified and | |||
| needs to be verified and monitored continuously by relying on | monitored continuously by relying on visibility that is provided | |||
| visibility that is provided through network telemetry data. Any | through network telemetry data. Any violation must be reported | |||
| violation must be reported immediately, potentially resulting in | immediately - this will alert the network administrator to the | |||
| updates to how the policy or intent is applied in the network to | policy or intent violation and will potentially result in updates | |||
| ensure that it remains in force, or otherwise alerting the network | to how the policy or intent is applied in the network to ensure | |||
| administrator to the policy or intent violation. | that it remains in force. | |||
| * SLA Compliance: A Service-Level Agreement (SLA) is a service | * SLA Compliance: A Service Level Agreement (SLA) is a service | |||
| contract between a service provider and a client, which include | contract between a service provider and a client, which includes | |||
| the metrics for the service measurement and remedy/penalty | the metrics for the service measurement and remedy/penalty | |||
| procedures when the service level misses the agreement. Users | procedures when the service level misses the agreement. Users | |||
| need to check if they get the service as promised and network | need to check if they get the service as promised, and network | |||
| operators need to evaluate how they can deliver services that can | operators need to evaluate how they can deliver services that meet | |||
| meet the SLA based on realtime network telemetry data, including | the SLA based on real-time network telemetry data, including data | |||
| data from network measurements. | from network measurements. | |||
| * Root Cause Analysis: Many network failure can be the effect of a | * Root Cause Analysis: Many network failures can be the effect of a | |||
| sequence of chained events. Troubleshooting and recovery require | sequence of chained events. Troubleshooting and recovery require | |||
| quick identification of the root cause of any observable issues. | quick identification of the root cause of any observable issues. | |||
| However, the root cause is not always straightforward to identify, | However, the root cause is not always straightforward to identify, | |||
| especially when the failure is sporadic and the number of event | especially when the failure is sporadic and the number of event | |||
| messages, both related and unrelated to the same cause, is | messages, both related and unrelated to the same cause, is | |||
| overwhelming. While technologies such as machine learning can be | overwhelming. While technologies such as machine learning can be | |||
| used for root cause analysis, it is up to the network to sense and | used for root cause analysis, it is up to the network to sense and | |||
| provide the relevant diagnostic data which are either actively fed | provide the relevant diagnostic data that are either actively fed | |||
| into, or passively retrieved by, the root cause analysis | into or passively retrieved by the root cause analysis | |||
| applications. | applications. | |||
| * Network Optimization: This covers all short-term and long-term | * Network Optimization: This covers all short-term and long-term | |||
| network optimization techniques, including load balancing, Traffic | network optimization techniques, including load balancing, Traffic | |||
| Engineering (TE), and network planning. Network operators are | Engineering (TE), and network planning. Network operators are | |||
| motivated to optimize their network utilization and differentiate | motivated to optimize their network utilization and differentiate | |||
| services for better Return On Investment (ROI) or lower Capital | services for better Return on Investment (ROI) or lower Capital | |||
| Expenditures (CAPEX). The first step is to know the real-time | Expenditure (CAPEX). The first step is to know the real-time | |||
| network conditions before applying policies for traffic | network conditions before applying policies for traffic | |||
| manipulation. In some cases, micro-bursts need to be detected in | manipulation. In some cases, microbursts need to be detected in a | |||
| a very short time-frame so that fine-grained traffic control can | very short time frame so that fine-grained traffic control can be | |||
| be applied to avoid network congestion. Long-term planning of | applied to avoid network congestion. Long-term planning of | |||
| network capacity and topology requires analysis of real-world | network capacity and topology requires analysis of real-world | |||
| network telemetry data that is obtained over long periods of time. | network telemetry data that is obtained over long periods of time. | |||
| * Event Tracking and Prediction: The visibility into traffic path | * Event Tracking and Prediction: The visibility into traffic path | |||
| and performance is critical for services and applications that | and performance is critical for services and applications that | |||
| rely on healthy network operation. Numerous related network | rely on healthy network operation. Numerous related network | |||
| events are of interest to network operators. For example, Network | events are of interest to network operators. For example, network | |||
| operators want to learn where and why packets are dropped for an | operators want to learn where and why packets are dropped for an | |||
| application flow. They also want to be warned of issues in | application flow. They also want to be warned of issues in | |||
| advance, so proactive actions can be taken to avoid catastrophic | advance, so proactive actions can be taken to avoid catastrophic | |||
| consequences. | consequences. | |||
| 2.3. Challenges | 2.3. Challenges | |||
| For a long time, network operators have relied upon SNMP [RFC3416], | For a long time, network operators have relied upon SNMP [RFC3416], | |||
| Command-Line Interface (CLI), or Syslog [RFC5424] to monitor the | Command-Line Interface (CLI), or Syslog [RFC5424] to monitor the | |||
| network. Some other OAM techniques as described in [RFC7276] are | network. Some other OAM techniques as described in [RFC7276] are | |||
| also used to facilitate network troubleshooting. These conventional | also used to facilitate network troubleshooting. These conventional | |||
| techniques are not sufficient to support the above use cases for the | techniques are not sufficient to support the above use cases for the | |||
| following reasons: | following reasons: | |||
| * Most use cases need to continuously monitor the network and | * Most use cases need to continuously monitor the network and | |||
| dynamically refine the data collection in real-time. Poll-based | dynamically refine the data collection in real time. Poll-based | |||
| low-frequency data collection is ill-suited for these | low-frequency data collection is ill-suited for these | |||
| applications. Subscription-based streaming data directly pushed | applications. Subscription-based streaming data directly pushed | |||
| from the data source (e.g., the forwarding chip) is preferred to | from the data source (e.g., the forwarding chip) is preferred to | |||
| provide sufficient data quantity and precision at scale. | provide sufficient data quantity and precision at scale. | |||
| * Comprehensive data is needed, ranging from packet processing | * Comprehensive data is needed, ranging from packet processing | |||
| engines to traffic manager, from line cards to main control board, | engines to traffic managers, line cards to main control boards, | |||
| from user flows to control protocol packets, from device | user flows to control protocol packets, device configurations to | |||
| configurations to operations, and from physical layer to | operations, and physical layers to application layers. | |||
| application layer. Conventional OAM only covers a narrow range of | Conventional OAM only covers a narrow range of data (e.g., SNMP | |||
| data (e.g., SNMP only handles data from the Management Information | only handles data from the Management Information Base (MIB)). | |||
| Base (MIB)). Classical network devices cannot provide all the | Classical network devices cannot provide all the necessary probes. | |||
| necessary probes. More open and programmable network devices are | More open and programmable network devices are therefore needed. | |||
| therefore needed. | ||||
| * Many application scenarios need to correlate network-wide data | * Many application scenarios need to correlate network-wide data | |||
| from multiple sources (i.e., from distributed network devices, | from multiple sources (i.e., from distributed network devices, | |||
| different components of a network device, or different network | different components of a network device, or different network | |||
| planes). A piecemeal solution is often lacking the capability to | planes). A piecemeal solution is often lacking the capability to | |||
| consolidate the data from multiple sources. The composition of a | consolidate the data from multiple sources. The composition of a | |||
| complete solution, as partly proposed by Autonomic Resource | complete solution, as partly proposed by Autonomic Resource | |||
| Control Architecture(ARCA) | Control Architecture (ARCA) [NMRG-ANTICIPATED-ADAPTATION], will be | |||
| [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and | empowered and guided by a comprehensive framework. | |||
| guided by a comprehensive framework. | ||||
| * Some conventional OAM techniques (e.g., CLI and Syslog) lack a | * Some conventional OAM techniques (e.g., CLI and Syslog) lack a | |||
| formal data model. The unstructured data hinder the tool | formal data model. The unstructured data hinder the tool | |||
| automation and application extensibility. Standardized data | automation and application extensibility. Standardized data | |||
| models are essential to support the programmable networks. | models are essential to support the programmable networks. | |||
| * Although some conventional OAM techniques support data push (e.g., | * Although some conventional OAM techniques support data push (e.g., | |||
| SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the | SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the | |||
| pushed data are limited to only predefined management plane | pushed data are limited to only predefined management plane | |||
| warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). | warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). | |||
| Network operators require the data with arbitrary source, | Network operators require the data with arbitrary source, | |||
| granularity, and precision which are beyond the capability of the | granularity, and precision, which is beyond the capability of the | |||
| existing techniques. | existing techniques. | |||
| * The conventional passive measurement techniques can either consume | * Conventional passive measurement techniques can either consume | |||
| excessive network resources and produce excessive redundant data, | excessive network resources and produce excessive redundant data | |||
| or lead to inaccurate results; on the other hand, the conventional | or lead to inaccurate results; on the other hand, conventional | |||
| active measurement techniques can interfere with the user traffic | active measurement techniques can interfere with the user traffic, | |||
| and their results are indirect. Techniques that can collect | and their results are indirect. Techniques that can collect | |||
| direct and on-demand data from user traffic are more favorable. | direct and on-demand data from user traffic are more favorable. | |||
| These challenges were addressed by newer standards and techniques | These challenges were addressed by newer standards and techniques | |||
| (e.g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push) | (e.g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push), | |||
| and more are emerging. These standards and techniques need to be | and more are emerging. These standards and techniques need to be | |||
| recognized and accommodated in a new framework. | recognized and accommodated in a new framework. | |||
| 2.4. Network Telemetry | 2.4. Network Telemetry | |||
| Network telemetry has emerged as a mainstream technical term to refer | Network telemetry has emerged as a mainstream technical term to refer | |||
| to the network data collection and consumption techniques. Several | to the network data collection and consumption techniques. Several | |||
| network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and | network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and | |||
| gRPC [grpc]) have been widely deployed. Network telemetry allows | gRPC [grpc]) have been widely deployed. Network telemetry allows | |||
| separate entities to acquire data from network devices so that data | separate entities to acquire data from network devices so that data | |||
| can be visualized and analyzed to support network monitoring and | can be visualized and analyzed to support network monitoring and | |||
| operation. Network telemetry covers the conventional network OAM and | operation. Network telemetry covers the conventional network OAM and | |||
| has a wider scope. For instance, it is expected that network | has a wider scope. For instance, it is expected that network | |||
| telemetry can provide the necessary network insight for autonomous | telemetry can provide the necessary network insight for autonomous | |||
| networks and address the shortcomings of conventional OAM techniques. | networks and address the shortcomings of conventional OAM techniques. | |||
| Network telemetry usually assumes machines as data consumers rather | Network telemetry usually assumes machines as data consumers rather | |||
| than human operators. Hence, the network telemetry can directly | than human operators. Hence, network telemetry can directly trigger | |||
| trigger the automated network operation, while in contrast some | the automated network operation, while in contrast, some conventional | |||
| conventional OAM tools were designed and used to help human operators | OAM tools were designed and used to help human operators to monitor | |||
| to monitor and diagnose the networks and guide manual network | and diagnose the networks and guide manual network operations. Such | |||
| operations. Such a proposition leads to very different techniques. | a proposition leads to very different techniques. | |||
| Although new network telemetry techniques are emerging and subject to | Although new network telemetry techniques are emerging and subject to | |||
| continuous evolution, several characteristics of network telemetry | continuous evolution, several characteristics of network telemetry | |||
| have been well accepted. Note that network telemetry is intended to | have been well accepted. Note that network telemetry is intended to | |||
| be an umbrella term covering a wide spectrum of techniques, so the | be an umbrella term covering a wide spectrum of techniques, so the | |||
| following characteristics are not expected to be held by every | following characteristics are not expected to be held by every | |||
| specific technique. | specific technique. | |||
| * Push and Streaming: Instead of polling data from network devices, | * Push and Streaming: Instead of polling data from network devices, | |||
| telemetry collectors subscribe to streaming data pushed from data | telemetry collectors subscribe to streaming data pushed from data | |||
| sources in network devices. | sources in network devices. | |||
| * Volume and Velocity: The telemetry data is intended to be consumed | * Volume and Velocity: Telemetry data is intended to be consumed by | |||
| by machines rather than by human being. Therefore, the data | machines rather than by human beings. Therefore, the data volume | |||
| volume can be huge and the processing is optimized for the needs | can be huge, and the processing is optimized for the needs of | |||
| of automation in realtime. | automation in real time. | |||
| * Normalization and Unification: Telemetry aims to address the | * Normalization and Unification: Telemetry aims to address the | |||
| overall network automation needs. Efforts are made to normalize | overall network automation needs. Efforts are made to normalize | |||
| the data representation and unify the protocols, so as to simplify | the data representation and unify the protocols, so as to simplify | |||
| data analysis and provide integrated analysis across heterogeneous | data analysis and provide integrated analysis across heterogeneous | |||
| devices and data sources across a network. | devices and data sources across a network. | |||
| * Model-based: The telemetry data is modeled in advance which allows | * Model-Based: Telemetry data is modeled in advance, which allows | |||
| applications to configure and consume data with ease. | applications to configure and consume data with ease. | |||
| * Data Fusion: The data for a single application can come from | * Data Fusion: The data for a single application can come from | |||
| multiple data sources (e.g., cross-domain, cross-device, and | multiple data sources (e.g., cross-domain, cross-device, and | |||
| cross-layer) based on common naming/ID and needs to be correlated | cross-layer) that are based on a common name/ID and need to be | |||
| to take effect. | correlated to take effect. | |||
| * Dynamic and Interactive: Since the network telemetry means to be | * Dynamic and Interactive: Since the network telemetry means to be | |||
| used in a closed control loop for network automation, it needs to | used in a closed control loop for network automation, it needs to | |||
| run continuously and adapt to the dynamic and interactive queries | run continuously and adapt to the dynamic and interactive queries | |||
| from the network operation controller. | from the network operation controller. | |||
| In addition, an ideal network telemetry solution may also have the | In addition, an ideal network telemetry solution may also have the | |||
| following features or properties: | following features or properties: | |||
| * In-Network Customization: The data that is generated can be | * In-Network Customization: The data that is generated can be | |||
| customized in network at run-time to cater to the specific need of | customized in network at runtime to cater to the specific need of | |||
| applications. This needs the support of a programmable data plane | applications. This needs the support of a programmable data | |||
| which allows probes with custom functions to be deployed at | plane, which allows probes with custom functions to be deployed at | |||
| flexible locations. | flexible locations. | |||
| * In-Network Data Aggregation and Correlation: Network devices and | * In-Network Data Aggregation and Correlation: Network devices and | |||
| aggregation points can work out which events and what data needs | aggregation points can work out which events and what data needs | |||
| to be stored, reported, or discarded thus reducing the load on the | to be stored, reported, or discarded, thus reducing the load on | |||
| central collection and processing points while still ensuring that | the central collection and processing points while still ensuring | |||
| the right information is ready to be processed in a timely way. | that the right information is ready to be processed in a timely | |||
| | way. | |||
| * In-Network Processing: Sometimes it is not necessary or feasible | * In-Network Processing: Sometimes it is not necessary or feasible | |||
| to gather all information to a central point to be processed and | to gather all information to a central point to be processed and | |||
| acted upon. It is possible for the data processing to be done in | acted upon. It is possible for the data processing to be done in | |||
| network, allowing reactive actions to be taken locally. | network, allowing reactive actions to be taken locally. | |||
| * Direct Data Plane Export: The data originated from the data plane | * Direct Data Plane Export: The data originated from data plane | |||
| forwarding chips can be directly exported to the data consumer for | forwarding chips can be directly exported to the data consumer for | |||
| efficiency, especially when the data bandwidth is large and the | efficiency, especially when the data bandwidth is large and real- | |||
| real-time processing is required. | time processing is required. | |||
| * In-band Data Collection: In addition to the passive and active | * In-Band Data Collection: In addition to the passive and active | |||
| data collection approaches, the new hybrid approach allows to | data collection approaches, the new hybrid approach allows to | |||
| directly collect data for any target flow on its entire forwarding | directly collect data for any target flow on its entire forwarding | |||
| path [I-D.song-opsawg-ifit-framework]. | path [OPSAWG-IFIT-FRAMEWORK]. | |||
| It is worth noting that a network telemetry system should not be | It is worth noting that a network telemetry system should not be | |||
| intrusive to normal network operations by avoiding the pitfall of the | intrusive to normal network operations by avoiding the pitfall of the | |||
| "observer effect". That is, it should not change the network | "observer effect". That is, it should not change the network | |||
| behavior and affect the forwarding performance. Moreover, high- | behavior and affect the forwarding performance. Moreover, high- | |||
| volume telemetry traffic may cause network congestion unless proper | volume telemetry traffic may cause network congestion unless proper | |||
| isolation or traffic engineering techniques are in place, or | isolation or traffic engineering techniques are in place, or | |||
| congestion control mechanisms ensure that telemetry traffic backs off | congestion control mechanisms ensure that telemetry traffic backs off | |||
| if it exceeds the network capacity. [RFC8084] and [RFC8085] are | if it exceeds the network capacity. [RFC8084] and [RFC8085] are | |||
| relevant Best Current Practices (BCP) in this space. | relevant Best Current Practices (BCPs) in this space. | |||
| Although in many cases a system for network telemetry involves a | Although in many cases a system for network telemetry involves a | |||
| remote data collecting and consuming entity, it is important to | remote data collecting and consuming entity, it is important to | |||
| understand that there are no inherent assumptions about how a system | understand that there are no inherent assumptions about how a system | |||
| should be architected. While a network architecture with centralized | should be architected. While a network architecture with a | |||
| controller (e.g., SDN) seems a natural fit for network telemetry, | centralized controller (e.g., SDN) seems to be a natural fit for | |||
| network telemetry can work in distributed fashions as well. For | network telemetry, network telemetry can work in distributed fashions | |||
| example, telemetry data producers and consumers can have a peer-to- | as well. For example, telemetry data producers and consumers can | |||
| peer relationship, in which a network node can be the direct consumer | have a peer-to-peer relationship, in which a network node can be the | |||
| of telemetry data from other nodes. | direct consumer of telemetry data from other nodes. | |||
| 2.5. The Necessity of a Network Telemetry Framework | 2.5. The Necessity of a Network Telemetry Framework | |||
| Network data analytics (e.g., machine learning) is applied for | Network data analytics (e.g., machine learning) is applied for | |||
| network operation automation, relying on abundant and coherent data | network operation automation, relying on abundant and coherent data | |||
| from networks. Data acquisition that is limited to a single source | from networks. Data acquisition that is limited to a single source | |||
| and static in nature will in many cases not be sufficient to meet an | and static in nature will in many cases not be sufficient to meet an | |||
| application's telemetry data needs. As a result, multiple data | application's telemetry data needs. As a result, multiple data | |||
| sources, involving a variety of techniques and standards, will need | sources, involving a variety of techniques and standards, will need | |||
| to be integrated. It is desirable to have a framework that | to be integrated. It is desirable to have a framework that | |||
| classifies and organizes different telemetry data source and types, | classifies and organizes different telemetry data sources and types, | |||
| defines different components of a network telemetry system and their | defines different components of a network telemetry system and their | |||
| interactions, and helps coordinate and integrate multiple telemetry | interactions, and helps coordinate and integrate multiple telemetry | |||
| approaches across layers. This allows flexible combinations of data | approaches across layers. This allows flexible combinations of data | |||
| for different applications, while normalizing and simplifying | for different applications, while normalizing and simplifying | |||
| interfaces. In detail, such a framework would benefit the | interfaces. In detail, such a framework would benefit the | |||
| development of network operation applications for the following | development of network operation applications for the following | |||
| reasons: | reasons: | |||
| * Future networks, autonomous or otherwise, depend on holistic and | * Future networks, autonomous or otherwise, depend on holistic and | |||
| comprehensive network visibility. The use cases and applications | comprehensive network visibility. Use cases and applications are | |||
| are better to be supported uniformly and coherently using an | better when supported uniformly and coherently using an | |||
| integrated, converged mechanism and common telemetry data | integrated, converged mechanism and common telemetry data | |||
| representations wherever feasible. Therefore, the protocols and | representations wherever feasible. Therefore, the protocols and | |||
| mechanisms should be consolidated into a minimum yet comprehensive | mechanisms should be consolidated into a minimum yet comprehensive | |||
| set. A telemetry framework can help to normalize the technique | set. A telemetry framework can help to normalize the technique | |||
| developments. | developments. | |||
| * Network visibility presents multiple viewpoints. For example, the | * Network visibility presents multiple viewpoints. For example, the | |||
| device viewpoint takes the network infrastructure as the | device viewpoint takes the network infrastructure as the | |||
| monitoring object from which the network topology and device | monitoring object from which the network topology and device | |||
| status can be acquired; the traffic viewpoint takes the flows or | status can be acquired, and the traffic viewpoint takes the flows | |||
| packets as the monitoring object from which the traffic quality | or packets as the monitoring object from which the traffic quality | |||
| and path can be acquired. An application may need to switch its | and path can be acquired. An application may need to switch its | |||
| viewpoint during operation. It may also need to correlate a | viewpoint during operation. It may also need to correlate a | |||
| service and its impact on user experience to acquire the | service and its impact on user experience (UE) to acquire the | |||
| comprehensive information. | comprehensive information. | |||
| * Applications require network telemetry to be elastic in order to | * Applications require network telemetry to be elastic in order to | |||
| make efficient use of network resources and reduce the impact of | make efficient use of network resources and reduce the impact of | |||
| processing related to network telemetry on network performance. | processing related to network telemetry on network performance. | |||
| For example, routine network monitoring should cover the entire | For example, routine network monitoring should cover the entire | |||
| network with a low data sampling rate. Only when issues arise or | network with a low data sampling rate. Only when issues arise or | |||
| critical trends emerge should telemetry data sources be modified | critical trends emerge should telemetry data sources be modified | |||
| and telemetry data rates boosted as needed. | and telemetry data rates be boosted as needed. | |||
| * Efficient data aggregation is critical for applications to reduce | * Efficient data aggregation is critical for applications to reduce | |||
| the overall quantity of data and improve the accuracy of analysis. | the overall quantity of data and improve the accuracy of analysis. | |||
| A telemetry framework collects together all the telemetry-related | A telemetry framework collects all the telemetry-related works from | |||
| works from different sources and working groups within IETF. This | different sources and working groups within the IETF. This makes it | |||
| makes it possible to assemble a comprehensive network telemetry | possible to assemble a comprehensive network telemetry system and to | |||
| system and to avoid repetitious or redundant work. The framework | avoid repetitious or redundant work. The framework should cover the | |||
| should cover the concepts and components from the standardization | concepts and components from the standardization perspective. This | |||
| perspective. This document describes the modules which make up a | document describes the modules that make up a network telemetry | |||
| network telemetry framework and decomposes the telemetry system into | framework and decomposes the telemetry system into a set of distinct | |||
| a set of distinct components that existing and future work can easily | components that existing and future work can easily map to. | |||
| map to. | ||||
| 3. Network Telemetry Framework | 3. Network Telemetry Framework | |||
| The top level network telemetry framework partitions the network | The top-level network telemetry framework partitions the network | |||
| telemetry into four modules based on the telemetry data object source | telemetry into four modules based on the telemetry data object source | |||
| and represents their relationship. Once the network operation | and represents their relationship. Once the network operation | |||
| applications acquire the data from these modules, they can apply data | applications acquire the data from these modules, they can apply data | |||
| analytics and take actions. At the next level, the framework | analytics and take actions. At the next level, the framework | |||
| decomposes each module into separate components. Each of the modules | decomposes each module into separate components. Each of these | |||
| follows the same underlying structure, with one component dedicated | modules follows the same underlying structure, with one component | |||
| to the configuration of data subscriptions and data sources, a second | dedicated to the configuration of data subscriptions and data | |||
| component dedicated to encoding and exporting data, and a third | sources, a second component dedicated to encoding and exporting data, | |||
| component instrumenting the generation of telemetry related to the | and a third component instrumenting the generation of telemetry | |||
| underlying resources. Throughout the framework, the same set of | related to the underlying resources. Throughout the framework, the | |||
| abstract data acquiring mechanisms and data types (Section 3.3) are | same set of abstract data-acquiring mechanisms and data types | |||
| applied. The two-level architecture with the uniform data | (Section 3.3) are applied. The two-level architecture with the | |||
| abstraction helps accurately pinpoint a protocol or technique to its | uniform data abstraction helps accurately pinpoint a protocol or | |||
| position in a network telemetry system or disaggregate a network | technique to its position in a network telemetry system or | |||
| telemetry system into manageable parts. | disaggregates a network telemetry system into manageable parts. | |||
| 3.1. Top Level Modules | 3.1. Top-Level Modules | |||
| Telemetry can be applied on the forwarding plane, the control plane, | Telemetry can be applied on the forwarding plane, control plane, and | |||
| and the management plane in a network, as well as other sources out | management plane in a network, as well as on other sources out of the | |||
| of the network, as shown in Figure 1. Therefore, we categorize the | network, as shown in Figure 1. Therefore, we categorize the network | |||
| network telemetry into four distinct modules (management plane, | telemetry into four distinct modules (management plane, control | |||
| control plane, forwarding plane, and external data and event | plane, forwarding plane, and external data and event telemetry) with | |||
| telemetry) with each having its own interface to Network Operation | each having its own interface to network operation applications. | |||
| Applications. | ||||
| +------------------------------+ | +------------------------------+ | |||
| | | | | | | |||
| | Network Operation |<-------+ | | Network Operation |<-------+ | |||
| | Applications | | | | Applications | | | |||
| | | | | | | | | |||
| +------------------------------+ | | +------------------------------+ | | |||
| ^ ^ ^ | | ^ ^ ^ | | |||
| | | | | | | | | | | |||
| V V | V | V V | V | |||
| skipping to change at page 15, line 39 ¶ | skipping to change at line 709 ¶ | |||
| | Management | ^ V | | Telemetry | | | Management | ^ V | | Telemetry | | |||
| | Plane +-------|-------+ | | | | Plane +-------|-------+ | | | |||
| | Telemetry | V | +-----------+ | | Telemetry | V | +-----------+ | |||
| | | Forwarding | | | | Forwarding | | |||
| | | Plane | | | | Plane | | |||
| | <---> | | | <---> | | |||
| | | Telemetry | | | | Telemetry | | |||
| | | | | | | | | |||
| +--------------+---------------+ | +--------------+---------------+ | |||
| Figure 1: Modules in Layer Category of NTF | Figure 1: Modules in Layer Category of the Network Telemetry | |||
| Framework | ||||
| The rationale of this partition lies in the different telemetry data | The rationale of this partition lies in the different telemetry data | |||
| objects which result in different data source and export locations. | objects that result in different data sources and export locations. | |||
| Such differences have profound implications on in-network data | Such differences have profound implications on in-network data | |||
| programming and processing capability, data encoding and transport | programming and processing capability, data encoding and the | |||
| protocol, and required data bandwidth and latency. Data can be sent | transport protocol, and required data bandwidth and latency. Data | |||
| directly, or proxied via the control and management planes. There | can be sent directly or proxied via the control and management | |||
| are advantages/disadvantages to both approaches. | planes. There are advantages/disadvantages to both approaches. | |||
| Note that in some cases the network controller itself may be the | Note that in some cases, the network controller itself may be the | |||
| source of telemetry data that is unique to it or derived from the | source of telemetry data that is unique to it or derived from the | |||
| telemetry data collected from the network elements. Some of the | telemetry data collected from the network elements. Some of the | |||
| principles and taxonomy specific to the control plane and management | principles and taxonomy specific to the control plane and management | |||
| plane telemetry could also be applied to the controller when it is | plane telemetry could also be applied to the controller when it is | |||
| required to provide the telemetry data to Network Operation | required to provide the telemetry data to network operation | |||
| Applications hosted outside. The scope of the document is focused on | applications hosted outside. The scope of this document is focused | |||
| the network elements telemetry and further details related to | on the network elements telemetry, and further details related to | |||
| controllers are thus out of scope. | controllers are thus out of scope. | |||
| We summarize the major differences of the four modules in the | We summarize the major differences of the four modules in Table 1. | |||
| following table. They are compared from six angles: | They are compared from six angles: | |||
| * Data Object | * Data Object | |||
| * Data Export Location | * Data Export Location | |||
| * Data Model | * Data Model | |||
| * Data Encoding | * Data Encoding | |||
| * Telemetry Application Protocol | * Telemetry Application Protocol | |||
| skipping to change at page 16, line 34 ¶ | skipping to change at line 754 ¶ | |||
| Data Object is the target and source of each module. Because the | Data Object is the target and source of each module. Because the | |||
| data source varies, the location where data is most conveniently | data source varies, the location where data is most conveniently | |||
| exported also varies. For example, forwarding plane data mainly | exported also varies. For example, forwarding plane data mainly | |||
| originates as data exported from the forwarding Application-Specific | originates as data exported from the forwarding Application-Specific | |||
| Integrated Circuits (ASICs), while control plane data mainly | Integrated Circuits (ASICs), while control plane data mainly | |||
| originates from the protocol daemons running on the control CPU(s). | originates from the protocol daemons running on the control CPU(s). | |||
| For convenience and efficiency, it is preferred to export the data | For convenience and efficiency, it is preferred to export the data | |||
| off the device from locations near the source. Because the locations | off the device from locations near the source. Because the locations | |||
| that can export data have different capabilities, different choices | that can export data have different capabilities, different choices | |||
| of data model, encoding, and transport method are made to balance the | of data models, encoding, and transport methods are made to balance | |||
| performance and cost. For example, the forwarding chip has high | the performance and cost. For example, the forwarding chip has high | |||
| throughput but limited capacity for processing complex data and | throughput but limited capacity for processing complex data and | |||
| maintaining state, while the main control CPU is capable of complex | maintaining state, while the main control CPU is capable of complex | |||
| data and state processing, but has limited bandwidth for high | data and state processing but has limited bandwidth for high | |||
| throughput data. As a result, the suitable telemetry protocol for | throughput data. As a result, the suitable telemetry protocol for | |||
| each module can be different. Some representative techniques are | each module can be different. Some representative techniques are | |||
| shown in the corresponding table blocks to highlight the technical | shown in the corresponding table blocks to highlight the technical | |||
| diversity of these modules. Note that the selected techniques just | diversity of these modules. Note that the selected techniques just | |||
| reflect the de facto state of the art and are by no means exhaustive | reflect the de facto state of the art and are by no means exhaustive | |||
| (e.g., IPFIX can also be implemented over TCP and SCTP, but that is | (e.g., IPFIX can also be implemented over TCP and SCTP, but that is | |||
| not recommended for forwarding plane). The key point is that one | not recommended for the forwarding plane). The key point is that one | |||
| cannot expect to use a universal protocol to cover all the network | cannot expect to use a universal protocol to cover all the network | |||
| telemetry requirements. | telemetry requirements. | |||
| +-----------+-------------+-------------+--------------+----------+ | +=============+===============+==========+==========+===============+ | |||
| | Module |Management |Control |Forwarding |External | | |Module |Management |Control |Forwarding|External Data | | |||
| | |Plane |Plane |Plane |Data | | | |Plane |Plane |Plane | | | |||
| +-----------+-------------+-------------+--------------+----------+ | +=============+===============+==========+==========+===============+ | |||
| |Object |config. & |control |flow & packet |terminal, | | |Object |configuration |control |flow and |terminal, | | |||
| | |operation |protocol & |QoS, traffic |social & | | | |and operation |protocol |packet |social, and | | |||
| | |state |signaling, |stat., buffer |environ- | | | |state |and |QoS, |environmental | | |||
| | | |RIB |& queue stat.,|mental | | | | |signaling,|traffic | | | |||
| | | | |ACL, FIB | | | | | |RIB |stat., | | | |||
| +-----------+-------------+-------------+--------------+----------+ | | | | |buffer and| | | |||
| |Export |main control |main control |fwding chip |various | | | | | |queue | | | |||
| |Location |CPU |CPU, |or linecard | | | | | | |stat., | | | |||
| | | |linecard CPU |CPU; main | | | | | | |FIB, | | | |||
| | | |or forwarding|control CPU | | | | | | |Access | | | |||
| | | |chip |unlikely | | | | | | |Control | | | |||
| +-----------+-------------+-------------+--------------+----------+ | | | | |List (ACL)| | | |||
| |Data |YANG, MIB, |YANG, |YANG |YANG, | | +-------------+---------------+----------+----------+---------------+ | |||
| |Model |syslog |custom |custom, |custom | | |Export |main control |main |forwarding|various | | |||
| +-----------+-------------+-------------+--------------+----------+ | |Location |CPU |control |chip or | | | |||
| |Data |GPB, JSON, |GPB, JSON, |plain text |GPB, JSON | | | | |CPU, |linecard | | | |||
| |Encoding |XML |XML, | |XML, plain| | | | |linecard |CPU; main | | | |||
| | | |plain text | |text | | | | |CPU, or |control | | | |||
| +-----------+-------------+-------------+--------------+----------+ | | | |forwarding|CPU | | | |||
| |Application|gRPC,NETCONF,|gRPC,NETCONF,|IPFIX, traffic|gRPC | | | | |chip |unlikely | | | |||
| |Protocol |RESTCONF |IPFIX,traffic|mirroring, | | | +-------------+---------------+----------+----------+---------------+ | |||
| | | |mirroring |gRPC, NETFLOW | | | |Data Model |YANG, MIB, |YANG, |YANG, |YANG, custom | | |||
| +-----------+-------------+-------------+--------------+----------+ | | |syslog |custom |custom | | | |||
| |Data |HTTP(S), TCP |HTTP(S), TCP,|UDP |HTTP(S), | | +-------------+---------------+----------+----------+---------------+ | |||
| |Transport | |UDP | |TCP, UDP | | |Data Encoding|GPB, JSON, XML |GPB, JSON,|plain text|GPB, JSON, XML,| | |||
| +-----------+-------------+-------------+--------------+----------+ | | | |XML, plain| |plain text | | |||
| | | |text | | | | ||||
| +-------------+---------------+----------+----------+---------------+ | ||||
| |Application |gRPC, NETCONF, |gRPC, |IPFIX, |gRPC | | ||||
| |Protocol |RESTCONF |NETCONF, |traffic | | | ||||
| | | |IPFIX, |mirroring,| | | ||||
| | | |traffic |gRPC, | | | ||||
| | | |mirroring |NETFLOW | | | ||||
| +-------------+---------------+----------+----------+---------------+ | ||||
| |Data |HTTP(S), TCP |HTTP(S), |UDP |HTTP(S), TCP, | | ||||
| |Transport | |TCP, UDP | |UDP | | ||||
| +-------------+---------------+----------+----------+---------------+ | ||||
| Figure 2: Comparison of the Data Object Modules | Table 1: Comparison of Data Object Modules | |||
| Note that the interaction with the applications that consume network | Note that the interaction with the applications that consume network | |||
| telemetry data can be indirect. Some in-device data transfer is | telemetry data can be indirect. Some in-device data transfer is | |||
| possible. For example, in the management plane telemetry, the | possible. For example, in the management plane telemetry, the | |||
| management plane will need to acquire data from the data plane. Some | management plane will need to acquire data from the data plane. Some | |||
| operational states can only be derived from data plane data sources | operational states can only be derived from data plane data sources | |||
| such as the interface status and statistics. As another example, | such as the interface status and statistics. As another example, | |||
| obtaining control plane telemetry data may require the ability to | obtaining control plane telemetry data may require the ability to | |||
| access the Forwarding Information Base (FIB) of the data plane. | access the Forwarding Information Base (FIB) of the data plane. | |||
| skipping to change at page 18, line 13 ¶ | skipping to change at line 835 ¶ | |||
| the control plane telemetry. | the control plane telemetry. | |||
| The requirements and challenges for each module are summarized as | The requirements and challenges for each module are summarized as | |||
| follows (note that the requirements may pertain across all telemetry | follows (note that the requirements may pertain across all telemetry | |||
| modules; however, we emphasize those that are most pronounced for a | modules; however, we emphasize those that are most pronounced for a | |||
| particular plane). | particular plane). | |||
| 3.1.1. Management Plane Telemetry | 3.1.1. Management Plane Telemetry | |||
| The management plane of network elements interacts with the Network | The management plane of network elements interacts with the Network | |||
| Management System (NMS), and provides information such as performance | Management System (NMS) and provides information such as performance | |||
| data, network logging data, network warning and defects data, and | data, network logging data, network warning and defects data, and | |||
| network statistics and state data. The management plane includes | network statistics and state data. The management plane includes | |||
| many protocols, including the classical SNMP and syslog. Regardless | many protocols, including the classical SNMP and syslog. Regardless | |||
| of the protocol, management plane telemetry must address the following | of the protocol, management plane telemetry must address the following | |||
| requirements: | requirements: | |||
| * Convenient Data Subscription: An application should have the | * Convenient Data Subscription: An application should have the | |||
| freedom to choose which data is exported (see section 4.3) and the | freedom to choose which data is exported (see Section 3.3) and the | |||
| means and frequency of how that data is exported (e.g., on-change | means and frequency of how that data is exported (e.g., on-change | |||
| or periodic subscription). | or periodic subscription). | |||
| * Structured Data: For automatic network operation, machines will | * Structured Data: For automatic network operation, machines will | |||
| replace human for network data comprehension. Data modeling | replace humans for network data comprehension. Data modeling | |||
| languages, such as YANG, can efficiently describe structured data | languages, such as YANG, can efficiently describe structured data | |||
| and normalize data encoding and transformation. | and normalize data encoding and transformation. | |||
| * High Speed Data Transport: In order to keep up with the velocity | * High-Speed Data Transport: In order to keep up with the velocity | |||
| of information, a data source needs to be able to send large | of information, a data source needs to be able to send large | |||
| amounts of data at high frequency. Compact encoding formats or | amounts of data at high frequency. Compact encoding formats or | |||
| data compression schemes are needed to reduce the quantity of data | data compression schemes are needed to reduce the quantity of data | |||
| and improve the data transport efficiency. The subscription mode, | and improve the data transport efficiency. The subscription mode, | |||
| by replacing the query mode, reduces the interactions between | by replacing the query mode, reduces the interactions between | |||
| clients and servers and helps to improve the data source's | clients and servers and helps to improve the data source's | |||
| efficiency. | efficiency. | |||
| * Network Congestion Avoidance: The application must protect the | * Network Congestion Avoidance: The application must protect the | |||
| network from congestion by congestion control mechanisms or at | network from congestion with congestion control mechanisms or, at | |||
| least circuit breakers. [RFC8084] and [RFC8085] provide some | minimum, with circuit breakers. [RFC8084] and [RFC8085] provide | |||
| solutions in this space. | some solutions in this space. | |||
| 3.1.2. Control Plane Telemetry | 3.1.2. Control Plane Telemetry | |||
| The control plane telemetry refers to the health condition monitoring | The control plane telemetry refers to the health condition monitoring | |||
| of different network control protocols at all layers of the protocol | of different network control protocols at all layers of the protocol | |||
| stack. Keeping track of the operational status of these protocols is | stack. Keeping track of the operational status of these protocols is | |||
| beneficial for detecting, localizing, and even predicting various | beneficial for detecting, localizing, and even predicting various | |||
| network issues, as well as network optimization, in real-time and | network issues, as well as for network optimization, in real time and | |||
| with fine granularity. Some particular challenges and issues faced | with fine granularity. Some particular challenges and issues faced | |||
| by the control plane telemetry are as follows: | by the control plane telemetry are as follows: | |||
| * One challenging problem for the control plane telemetry is how to | * How to correlate the End-to-End (E2E) Key Performance Indicators | |||
| correlate the End-to-End (E2E) Key Performance Indicators (KPI) to | (KPIs) to a specific layer's KPIs. For example, IPTV users may | |||
| a specific layer's KPIs. For example, IPTV users may describe | describe their UE by the video smoothness and definition. Then in | |||
| their User Experience (UE) by the video smoothness and definition. | case of an unusually poor UE KPI or a service disconnection, it is | |||
| Then in case of an unusually poor UE KPI or a service | non-trivial to delimit and pinpoint the issue in the responsible | |||
| disconnection, it is non-trivial to delimit and pinpoint the issue | protocol layer (e.g., the transport layer or the network layer), | |||
| in the responsible protocol layer (e.g., the Transport Layer or | the responsible protocol (e.g., IS-IS or BGP at the network | |||
| the Network Layer), the responsible protocol (e.g., ISIS or BGP at | layer), and finally the responsible device(s) with specific | |||
| the Network Layer), and finally the responsible device(s) with | reasons. | |||
| specific reasons. | ||||
| * Conventional OAM-based approaches for control plane KPI | * Conventional OAM-based approaches for control plane KPI | |||
| measurement include Ping (L3), Traceroute (L3), Y.1731 [y1731] | measurement, which include Ping (L3), Traceroute (L3), Y.1731 | |||
| (L2), and so on. One common issue behind these methods is that | [y1731] (L2), and so on. One common issue behind these methods is | |||
| they only measure the KPIs instead of reflecting the actual | that they only measure the KPIs instead of reflecting the actual | |||
| running status of these protocols, making them less effective or | running status of these protocols, making them less effective or | |||
| efficient for control plane troubleshooting and network | efficient for control plane troubleshooting and network | |||
| optimization. | optimization. | |||
| * An example of the control plane telemetry is the BGP monitoring | * How more research is needed for the BGP monitoring protocol (BMP). | |||
| protocol (BMP). It is currently used for monitoring the BGP | BMP is an example of the control plane telemetry; it is currently | |||
| routes and enables rich applications, such as BGP peer analysis, | used for monitoring BGP routes and enables rich applications, such | |||
| AS analysis, prefix analysis, and security analysis. However, the | as BGP peer analysis, Autonomous System (AS) analysis, prefix | |||
| monitoring of other layers, protocols and the cross-layer, cross- | analysis, and security analysis. However, the monitoring of other | |||
| protocol KPI correlations are still in their infancy (e.g., IGP | layers, protocols, and the cross-layer, cross-protocol KPI | |||
| monitoring is not as extensive as BMP), which require further | correlations are still in their infancy (e.g., IGP monitoring is | |||
| research. | not as extensive as BMP), which requires further research. | |||
| * The requirement and solutions for network congestion avoidance are | Note that the requirement and solutions for network congestion | |||
| also applicable to the control plane telemetry. | avoidance are also applicable to the control plane telemetry. | |||
| 3.1.3. Forwarding Plane Telemetry | 3.1.3. Forwarding Plane Telemetry | |||
| An effective forwarding plane telemetry system relies on the data | An effective forwarding plane telemetry system relies on the data | |||
| that the network device can expose. The quality, quantity, and | that the network device can expose. The quality, quantity, and | |||
| timeliness of data must meet some stringent requirements. This | timeliness of data must meet some stringent requirements. This | |||
| raises some challenges to the network data plane devices where the | raises some challenges for the network data plane devices where the | |||
| first-hand data originates. | first-hand data originates. | |||
| * A data plane device's main function is user traffic processing and | * A data plane device's main function is user traffic processing and | |||
| forwarding. While supporting network visibility is important, the | forwarding. While supporting network visibility is important, the | |||
| telemetry is just an auxiliary function, and it should strive to | telemetry is just an auxiliary function, and it should strive to | |||
| not impede normal traffic processing and forwarding (i.e., the | not impede normal traffic processing and forwarding (i.e., the | |||
| forwarding behavior should not be altered and the trade-off | forwarding behavior should not be altered, and the trade-off | |||
| between forwarding performance and telemetry should be well- | between forwarding performance and telemetry should be well- | |||
| balanced). | balanced). | |||
| * Network operation applications require end-to-end visibility | * Network operation applications require end-to-end visibility | |||
| across various sources, which can result in a huge volume of data. | across various sources, which can result in a huge volume of data. | |||
| However, the sheer quantity of data must not exhaust the network | However, the sheer quantity of data must not exhaust the network | |||
| bandwidth, regardless of the data delivery approach (i.e., whether | bandwidth, regardless of the data delivery approach (i.e., whether | |||
| through in-band or out-of-band channels). | through in-band or out-of-band channels). | |||
| * The data plane devices must provide timely data with the minimum | * The data plane devices must provide timely data with the minimum | |||
| possible delay. Long processing, transport, storage, and analysis | possible delay. Long processing, transport, storage, and analysis | |||
| delay can impact the effectiveness of the control loop and even | delay can impact the effectiveness of the control loop and even | |||
| render the data useless. | render the data useless. | |||
| * The data should be structured and labeled, and easy for | * The data should be structured, labeled, and easy for applications | |||
| applications to parse and consume. At the same time, the data | to parse and consume. At the same time, the data types needed by | |||
| types needed by applications can vary significantly. The data | applications can vary significantly. The data plane devices need | |||
| plane devices need to provide enough flexibility and | to provide enough flexibility and programmability to support the | |||
| programmability to support the precise data provision for | precise data provision for applications. | |||
| applications. | ||||
| * The data plane telemetry should support incremental deployment and | * The data plane telemetry should support incremental deployment and | |||
| work even if some devices are unaware of the system. | work even if some devices are unaware of the system. | |||
| * The requirement and solutions for network congestion avoidance are | * The requirement and solutions for network congestion avoidance are | |||
| also applicable to the forwarding plane telemetry. | also applicable to the forwarding plane telemetry. | |||
| Although not specific to the forwarding plane, these challenges are | Although not specific to the forwarding plane, these challenges are | |||
| more difficult to the forwarding plane because of the limited | more difficult for the forwarding plane because of the limited | |||
| resource and flexibility. Data plane programmability is essential to | resources and flexibility. Data plane programmability is essential | |||
| support network telemetry. Newer data plane forwarding chips are | to support network telemetry. Newer data plane forwarding chips are | |||
| equipped with advanced telemetry features and provide flexibility to | equipped with advanced telemetry features and provide flexibility to | |||
| support customized telemetry functions. | support customized telemetry functions. | |||
| Technique Taxonomy: concerning about how one instruments the | Technique Taxonomy: This pertains to how one instruments the | |||
| telemetry, there can be multiple possible dimensions to classify the | telemetry; there can be multiple possible dimensions to classify the | |||
| forwarding plane telemetry techniques. | forwarding plane telemetry techniques. | |||
| * Active, Passive, and Hybrid: This dimension concerns about the | * Active, Passive, and Hybrid: This dimension pertains to the end- | |||
| end-to-end measurement. Active and passive methods (as well as | to-end measurement. Active and passive methods (as well as the | |||
| the hybrid types) are well documented in [RFC7799]. Passive | hybrid types) are well documented in [RFC7799]. Passive methods | |||
| methods include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic | include TCPDUMP, IPFIX [RFC7011], sFlow, and traffic mirroring. | |||
| mirroring. These methods usually have low data coverage, and | These methods usually have low data coverage, and improving the | |||
| improving the coverage incurs a very high bandwidth cost. | coverage incurs a very high bandwidth cost. On the other | |||
| On the other hand, active methods include Ping, OWAMP [RFC4656], | hand, active methods include Ping, the One-Way Active Measurement | |||
| TWAMP [RFC5357], STAMP [RFC8762], and Cisco's SLA Protocol | Protocol (OWAMP) [RFC4656], the Two-Way Active Measurement | |||
| [RFC6812]. These methods are intrusive and only provide indirect | Protocol (TWAMP) [RFC5357], the Simple Two-way Active Measurement | |||
| network measurements. Hybrid methods, including in-situ OAM | Protocol (STAMP) [RFC8762], and Cisco's SLA Protocol [RFC6812]. | |||
| [I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and | These methods are intrusive and only provide indirect network | |||
| Multipoint Alternate Marking [RFC8889], provide a well-balanced | measurements. Hybrid methods, including IOAM [RFC9197], Alternate | |||
| and more flexible approach. However, these methods are also more | Marking (AM) [RFC8321], and Multipoint Alternate Marking | |||
| complex to implement. | [RFC8889], provide a well-balanced and more flexible approach. | |||
| However, these methods are also more complex to implement. | ||||
| * In-Band and Out-of-Band: Telemetry data carried in user packets | * In-Band and Out-of-Band: Telemetry data carried in user packets | |||
| before being exported to a data collector is considered in-band | before being exported to a data collector is considered in-band | |||
| (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]). Telemetry data | (e.g., IOAM [RFC9197]). Telemetry data that is directly exported | |||
| that is directly exported to a data collector without modifying | to a data collector without modifying user packets is considered | |||
| user packets is considered out-of-band (e.g., the postcard-based | out-of-band (e.g., the postcard-based approach described in | |||
| approach described in Appendix A.3.5). It is also possible to | Appendix A.3.5). It is also possible to have hybrid methods, | |||
| have hybrid methods, where only the telemetry instruction or | where only the telemetry instruction or partial data is carried by | |||
| partial data is carried by user packets (e.g., AM [RFC8321]). | user packets (e.g., AM [RFC8321]). | |||
| * End-to-End and In-Network: End-to-End methods start from, and end | * End-to-End and In-Network: End-to-end methods start from, and end | |||
| at, the network end hosts (e.g., Ping). In-Network methods work | at, the network end hosts (e.g., Ping). In-network methods work | |||
| in networks and are transparent to end hosts. However, if needed, | in networks and are transparent to end hosts. However, if needed, | |||
| In-Network methods can be easily extended into end hosts. | in-network methods can be easily extended into end hosts. | |||
| * Data Subject: Depending on the telemetry objective, the methods | * Data Subject: Depending on the telemetry objective, the methods | |||
| can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), | can be flow based (e.g., IOAM [RFC9197]), path based (e.g., | |||
| path-based (e.g., Traceroute), and node-based (e.g., IPFIX | Traceroute), and node based (e.g., IPFIX [RFC7011]). The various | |||
| [RFC7011]). The various data objects can be packets, flow records, | data objects can be packets, flow records, measurements, states, and | |||
| measurements, states, and signals. | signals. | |||
| 3.1.4. External Data Telemetry | 3.1.4. External Data Telemetry | |||
| Events that occur outside the boundaries of the network system are | Events that occur outside the boundaries of the network system are | |||
| another important source of network telemetry. Correlating both | another important source of network telemetry. Correlating both | |||
| internal telemetry data and external events with the requirements of | internal telemetry data and external events with the requirements of | |||
| network systems, as presented in | network systems, as presented in [NMRG-ANTICIPATED-ADAPTATION], | |||
| [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and | provides a strategic and functional advantage to management | |||
| functional advantage to management operations. | operations. | |||
| As with other sources of telemetry information, the data and events | As with other sources of telemetry information, the data and events | |||
| must meet strict requirements, especially in terms of timeliness, | must meet strict requirements, especially in terms of timeliness, | |||
| which is essential to properly incorporate external event information | which is essential to properly incorporate external event information | |||
| into network management applications. The specific challenges are | into network management applications. The specific challenges are | |||
| described as follows: | described as follows: | |||
| * The role of the external event detector can be played by multiple | * The role of the external event detector can be played by multiple | |||
| elements, including hardware (e.g., physical sensors, such as | elements, including hardware (e.g., physical sensors, such as | |||
| seismometers) and software (e.g., Big Data sources that can | seismometers) and software (e.g., big data sources that can | |||
| analyze streams of information, such as Twitter messages). Thus, | analyze streams of information, such as Twitter messages). Thus, | |||
| the transmitted data must support different shapes but, at the | the transmitted data must support different shapes but, at the | |||
| same time, follow a common but extensible schema. | same time, follow a common but extensible schema. | |||
| * Since the main function of the external event detectors is to | * Since the main function of the external event detectors is to | |||
| perform the notifications, their timeliness is assumed. However, | perform the notifications, their timeliness is assumed. However, | |||
| once messages have been dispatched, they must be quickly collected | once messages have been dispatched, they must be quickly collected | |||
| and inserted into the control plane with variable priority, which | and inserted into the control plane with variable priority, which | |||
| is higher for important sources and events and lower for secondary | is higher for important sources and events and lower for secondary | |||
| ones. | ones. | |||
| * The schema used by external detectors must be easily adopted by | * The schema used by external detectors must be easily adopted by | |||
| current and future devices and applications. Therefore, it must | current and future devices and applications. Therefore, it must | |||
| be easily mapped to current data models, such as in terms of YANG. | be easily mapped to current data models, such as in terms of YANG. | |||
| * As the communication with external entities outside the boundary | * As the communication with external entities outside the boundary | |||
| of a provider network may be realized over the Internet, the risk | of a provider network may be realized over the Internet, the risk | |||
| of congestion is even more relevant in this context and proper | of congestion is even more relevant in this context and proper | |||
| counter-measures must be taken. Solutions such as network | countermeasures must be taken. Solutions such as network | |||
| transport circuit breakers are needed as well. | transport circuit breakers are needed as well. | |||
| Organizing both internal and external telemetry information together | Organizing both internal and external telemetry information together | |||
| will be key for the general exploitation of the management | will be key for the general exploitation of the management | |||
| possibilities of current and future network systems, as reflected in | possibilities of current and future network systems, as reflected in | |||
| the incorporation of cognitive capabilities into new hardware and | the incorporation of cognitive capabilities into new hardware and | |||
| software (virtual) elements. | software (virtual) elements. | |||
| 3.2. Second Level Function Components | 3.2. Second-Level Function Components | |||
| The telemetry module at each plane can be further partitioned into | The telemetry module at each plane can be further partitioned into | |||
| five distinct conceptual components: | five distinct conceptual components: | |||
| * Data Query, Analysis, and Storage: This component works at the | * Data Query, Analysis, and Storage: This component works at the | |||
| network operation application block in Figure 1. It is normally a | network operation application block in Figure 1. It is normally a | |||
| part of the network management system at the receiver side. On | part of the network management system at the receiver side. On | |||
| the one hand, it is responsible for issuing data requirements. | one hand, it is responsible for issuing data requirements. The | |||
| The data of interest can be modeled data through configuration or | data of interest can be modeled data through configuration or | |||
| custom data through programming. The data requirements can be | custom data through programming. The data requirements can be | |||
| queries for one-shot data or subscriptions for events or streaming | queries for one-shot data or subscriptions for events or streaming | |||
| data. On the other hand, it receives, stores, and processes the | data. On the other hand, it receives, stores, and processes the | |||
| returned data from network devices. Data analysis can be | returned data from network devices. Data analysis can be | |||
| interactive to initiate further data queries. This component can | interactive to initiate further data queries. This component can | |||
| reside in either network devices or remote controllers. It can be | reside in either network devices or remote controllers. It can be | |||
| centralized and distributed, and involve one or more instances. | centralized and distributed and involve one or more instances. | |||
| * Data Configuration and Subscription: This component manages data | * Data Configuration and Subscription: This component manages data | |||
| queries on devices. It determines the protocol and channel for | queries on devices. It determines the protocol and channel for | |||
| applications to acquire desired data. This component is also | applications to acquire desired data. This component is also | |||
| responsible for configuring the desired data that might not be | responsible for configuring the desired data that might not be | |||
| directly available from data sources. The subscription data can | directly available from data sources. The subscription data can | |||
| be described by models, templates, or programs. | be described by models, templates, or programs. | |||
| * Data Encoding and Export: This component determines how telemetry | * Data Encoding and Export: This component determines how telemetry | |||
| data is delivered to the data analysis and storage component with | data is delivered to the data analysis and storage component with | |||
| skipping to change at page 23, line 30 | skipping to change at line 1075 | |||
| vary due to the data export location. | vary due to the data export location. | |||
| * Data Generation and Processing: The requested data needs to be | * Data Generation and Processing: The requested data needs to be | |||
| captured, filtered, processed, and formatted in network devices | captured, filtered, processed, and formatted in network devices | |||
| from raw data sources. This may involve in-network computing and | from raw data sources. This may involve in-network computing and | |||
| processing on either the fast path or the slow path in network | processing on either the fast path or the slow path in network | |||
| devices. | devices. | |||
| * Data Object and Source: This component determines the monitoring | * Data Object and Source: This component determines the monitoring | |||
| objects and original data sources provisioned in the device. A | objects and original data sources provisioned in the device. A | |||
| data source usually just provides raw data which needs further | data source usually just provides raw data that needs further | |||
| processing. Each data source can be considered a probe. Some | processing. Each data source can be considered a probe. Some | |||
| data sources can be dynamically installed, while others will be | data sources can be dynamically installed, while others will be | |||
| more static. | more static. | |||
| +----------------------------------------+ | +----------------------------------------+ | |||
| +----------------------------------------+ | | +----------------------------------------+ | | |||
| | | | | | | | | |||
| | Data Query, Analysis, & Storage | | | | Data Query, Analysis, & Storage | | | |||
| | | + | | | + | |||
| +-------+++ -----------------------------+ | +-------+++ -----------------------------+ | |||
| ||| ^^^ | ||| ^^^ | |||
| ||| ||| | ||| ||| | |||
| ||V ||| | ||V ||| | |||
| +--+V--------------------+++------------+ | +--+V--------------------+++------------+ | |||
| +-----V---------------------+------------+ | | +-----V---------------------+------------+ | | |||
| +---------------------+-------+----------+ | | | +---------------------+-------+----------+ | | | |||
| | Data Configuration | | | | | | Data Configuration | | | | | |||
| | & Subscription | Data Encoding | | | | | & Subscription | Data Encoding | | | | |||
| | (model, template, | & Export | | | | | (model, template, | & Export | | | | |||
| | & program) | | | | | | & program) | | | | | |||
| +---------------------+------------------| | | | +---------------------+------------------| | | | |||
| | | | | | | | | | | |||
| | Data Generation | | | | | Data Generation | | | | |||
| | & Processing | | | | | & Processing | | | | |||
| | | | | | | | | | | |||
| +----------------------------------------| | | | +----------------------------------------| | | | |||
| | | | | | | | | | | |||
| | Data Object and Source | |-+ | | Data Object and Source | |-+ | |||
| | |-+ | | |-+ | |||
| +----------------------------------------+ | +----------------------------------------+ | |||
| Figure 3: Components in the Network Telemetry Framework | Figure 2: Components in the Network Telemetry Framework | |||
| 3.3. Data Acquisition Mechanism and Type Abstraction | 3.3. Data Acquisition Mechanism and Type Abstraction | |||
| Broadly speaking, network data can be acquired through subscription | Broadly speaking, network data can be acquired through subscription | |||
| (push) and query (poll). A subscription is a contract between | (push) and query (poll). A subscription is a contract between | |||
| publisher and subscriber. After initial setup, the subscribed data | publisher and subscriber. After initial setup, the subscribed data | |||
| is automatically delivered to registered subscribers until the | is automatically delivered to registered subscribers until the | |||
| subscription expires. There are two variations of subscription. The | subscription expires. There are two variations of subscription. The | |||
| subscriptions can be either pre-defined, or the subscribers are | subscriptions can be predefined, or the subscribers are allowed to | |||
| allowed to configure and tailor the published data to their specific | configure and tailor the published data to their specific needs. | |||
| needs. | ||||
| In contrast, queries are used when a client expects immediate and | In contrast, queries are used when a client expects immediate and | |||
| one-off feedback from network devices. The queried data may be | one-off feedback from network devices. The queried data may be | |||
| directly extracted from some specific data source, or synthesized and | directly extracted from some specific data source or synthesized and | |||
| processed from raw data. Queries work well for interactive network | processed from raw data. Queries work well for interactive network | |||
| telemetry applications. | telemetry applications. | |||
| In general, data can be pulled (i.e., queried) whenever needed, but | In general, data can be pulled (i.e., queried) whenever needed, but | |||
| in many cases, pushing the data (i.e., subscription) is more | in many cases, pushing the data (i.e., subscription) is more | |||
| efficient, and can reduce the latency of a client detecting a change. | efficient, and it can reduce the latency of a client detecting a | |||
| From the data consumer point of view, there are four types of data | change. From the data consumer point of view, there are four types | |||
| from network devices that a telemetry data consumer can subscribe to or | of data from network devices that a telemetry data consumer can | |||
| query: | subscribe to or query: | |||
| * Simple Data: The data that are steadily available from some | * Simple Data: Data that are steadily available from some datastore | |||
| datastore or static probes in network devices. | or static probes in network devices. | |||
| * Derived Data: The data need to be synthesized or processed in | * Derived Data: Data that need to be synthesized or processed in the | |||
| network from raw data from one or more network devices. The data | network from raw data from one or more network devices. The data | |||
| processing function can be statically or dynamically loaded into | processing function can be statically or dynamically loaded into | |||
| network devices. | network devices. | |||
| * Event-triggered Data: The data are conditionally acquired based on | * Event-triggered Data: Data that are conditionally acquired based | |||
| the occurrence of some events. An example of event-triggered data | on the occurrence of some events. An example of event-triggered | |||
| could be an interface changing operational state between up and | data could be an interface changing operational state between up | |||
| down. Such data can be actively pushed through subscription or | and down. Such data can be actively pushed through subscription | |||
| passively polled through query. There are many ways to model | or passively polled through query. There are many ways to model | |||
| events, including using Finite State Machine (FSM) or Event | events, including using Finite State Machine (FSM) or Event | |||
| Condition Action (ECA) [I-D.wwx-netmod-event-yang]. | Condition Action (ECA) [NETMOD-ECA-POLICY]. | |||
| * Streaming Data: The data are continuously generated. It can be | * Streaming Data: Data that are continuously generated. It can be a | |||
| time series or the dump of databases. For example, an interface | time series or the dump of databases. For example, an interface | |||
| packet counter is exported every second. The streaming data | packet counter is exported every second. The streaming data | |||
| reflect realtime network states and metrics and require large | reflect real-time network states and metrics and require large | |||
| bandwidth and processing power. The streaming data are always | bandwidth and processing power. The streaming data are always | |||
| actively pushed to the subscribers. | actively pushed to the subscribers. | |||
| The above telemetry data types are not mutually exclusive. Rather, | The above telemetry data types are not mutually exclusive. Rather, | |||
| they are often composite. Derived data is composed of simple data; | they are often composite. Derived data is composed of simple data; | |||
| Event-triggered data can be simple or derived; streaming data can be | event-triggered data can be simple or derived; and streaming data can | |||
| based on some recurring event. The relationships of these data types | be based on some recurring event. The relationships of these data | |||
| are illustrated in Figure 4. | types are illustrated in Figure 3. | |||
| +----------------------+ +-----------------+ | +----------------------+ +-----------------+ | |||
| | Event-triggered Data |<----+ Streaming Data | | | Event-Triggered Data |<----+ Streaming Data | | |||
| +-------+---+----------+ +-----+---+-------+ | +-------+---+----------+ +-----+---+-------+ | |||
| | | | | | | | | | | |||
| | | | | | | | | | | |||
| | | +--------------+ | | | | | +--------------+ | | | |||
| | +-->| Derived Data |<--+ | | | +-->| Derived Data |<--+ | | |||
| | +------+------ + | | | +------+------ + | | |||
| | | | | | | | | |||
| | V | | | V | | |||
| | +--------------+ | | | +--------------+ | | |||
| +------>| Simple Data |<------+ | +------>| Simple Data |<------+ | |||
| +--------------+ | +--------------+ | |||
| Figure 4: Data Type Relationship | Figure 3: Data Type Relationship | |||
| Subscription usually deals with event-triggered data and streaming | Subscription usually deals with event-triggered data and streaming | |||
| data, and query usually deals with simple data and derived data. Other | data, and query usually deals with simple data and derived data. Other | |||
| combinations are also possible. Advanced network telemetry | combinations are also possible. Advanced network telemetry | |||
| techniques are designed mainly for event-triggered or streaming data | techniques are designed mainly for event-triggered or streaming data | |||
| subscription, and derived data query. | subscription and derived data query. | |||
| 3.4. Mapping Existing Mechanisms into the Framework | 3.4. Mapping Existing Mechanisms into the Framework | |||
| The following table shows how the existing mechanisms (mainly | The following table shows how the existing mechanisms (mainly | |||
| published in the IETF, with an emphasis on the latest | published in the IETF, with an emphasis on the latest | |||
| technologies) are positioned in the framework. Given the vast body | technologies) are positioned in the framework. Given the vast body | |||
| of existing work, we cannot provide an exhaustive list, so the | of existing work, we cannot provide an exhaustive list, so the | |||
| mechanisms in the table should be considered just examples. | mechanisms in the table should be considered just examples. | |||
| Also, some comprehensive protocols and techniques may cover multiple | Also, some comprehensive protocols and techniques may cover multiple | |||
| aspects or modules of the framework, so a name in a block only | aspects or modules of the framework, so a name in a block only | |||
| emphasizes one particular characteristic of it. More details about | emphasizes one particular characteristic of it. More details about | |||
| some listed mechanisms can be found in Appendix A. | some listed mechanisms can be found in Appendix A. | |||
| +-------------+-----------------+---------------+--------------+ | +===============+=================+================+============+ | |||
| | | Management | Control | Forwarding | | | | Management | Control Plane | Forwarding | | |||
| | | Plane | Plane | Plane | | | | Plane | | Plane | | |||
| +-------------+-----------------+---------------+--------------+ | +===============+=================+================+============+ | |||
| | data config.| gNMI, NETCONF, | gNMI, NETCONF,| NETCONF, | | | data | gNMI, NETCONF, | gNMI, NETCONF, | NETCONF, | | |||
| | & subscribe | RESTCONF, SNMP, | RESTCONF, | RESTCONF, | | | configuration | RESTCONF, SNMP, | RESTCONF, | RESTCONF, | | |||
| | | YANG-Push | YANG-Push | YANG-Push | | | and subscribe | YANG-Push | YANG-Push | YANG-Push | | |||
| +-------------+-----------------+---------------+--------------+ | +---------------+-----------------+----------------+------------+ | |||
| | data gen. & | MIB, | YANG | IOAM, PSAMP | | | data | MIB, YANG | YANG | IOAM, | | |||
| | process | YANG | | PBT, AM, | | | generation | | | PSAMP, | | |||
| +-------------+-----------------+---------------+--------------+ | | and process | | | PBT, AM | | |||
| | data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | +---------------+-----------------+----------------+------------+ | |||
| | & export | | | | | | data encoding | gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | |||
| +-------------+-----------------+---------------+--------------+ | | and export | | | | | |||
| Figure 5: Existing Work Mapping | +---------------+-----------------+----------------+------------+ | |||
| Table 2: Existing Work Mapping | ||||
| Although the framework is generally suitable for any network | Although the framework is generally suitable for any network | |||
| environment, multi-domain telemetry has some unique challenges | environment, multi-domain telemetry has some unique challenges | |||
| which deserve further architectural consideration, which is out of | that deserve further architectural consideration, which is out of the | |||
| the scope of this document. | scope of this document. | |||
| 4. Evolution of Network Telemetry Applications | 4. Evolution of Network Telemetry Applications | |||
| Network telemetry is an evolving technical area. As the network | Network telemetry is an evolving technical area. As the network | |||
| moves towards automated operation, network telemetry applications | moves towards automated operation, network telemetry applications | |||
| undergo several stages of evolution which add new layer of | undergo several stages of evolution, which add a new layer of | |||
| requirements to the underlying network telemetry techniques. Each | requirements to the underlying network telemetry techniques. Each | |||
| stage is built upon the techniques adopted by the previous stages | stage is built upon the techniques adopted by the previous stages | |||
| plus some new requirements. | plus some new requirements. | |||
| Stage 0 - Static Telemetry: The telemetry data source and type are | Stage 0 - Static Telemetry: The telemetry data source and type are | |||
| determined at design time. The network operator can only | determined at design time. The network operator can only | |||
| configure how to use it with limited flexibility. | configure how to use it with limited flexibility. | |||
| Stage 1 - Dynamic Telemetry: The custom telemetry data can be | Stage 1 - Dynamic Telemetry: The custom telemetry data can be | |||
| dynamically programmed or configured at runtime without | dynamically programmed or configured at runtime without | |||
| interrupting the network operation, allowing a trade-off among | interrupting the network operation, allowing a trade-off among | |||
| resource, performance, flexibility, and coverage. | resource, performance, flexibility, and coverage. | |||
| Stage 2 - Interactive Telemetry: The network operator can | Stage 2 - Interactive Telemetry: The network operator can | |||
| continuously customize and fine-tune the telemetry data in real | continuously customize and fine-tune the telemetry data in real | |||
| time to reflect the network operation's visibility requirements. | time to reflect the network operation's visibility requirements. | |||
| Compared with Stage 1, the changes are frequent based on the real- | Compared with Stage 1, the changes are frequent based on the real- | |||
| time feedback. At this stage, some tasks can be automated, but | time feedback. At this stage, some tasks can be automated, but | |||
| human operators still need to sit in the middle to make decisions. | human operators still need to sit in the middle to make decisions. | |||
| Stage 3 - Closed-loop Telemetry: The telemetry is free from the | Stage 3 - Closed-Loop Telemetry: The telemetry is free from the | |||
| interference of human operators, except for generating the | interference of human operators, except for generating the | |||
| reports. The intelligent network operation engine automatically | reports. The intelligent network operation engine automatically | |||
| issues the telemetry data requests, analyzes the data, and updates | issues the telemetry data requests, analyzes the data, and updates | |||
| the network operations in closed control loops. | the network operations in closed control loops. | |||
| Existing technologies are ready for stage 0 and stage 1. Individual | Existing technologies are ready for Stages 0 and 1. Individual | |||
| stage 2 and stage 3 applications are also possible now. However, the | applications for Stages 2 and 3 are also possible now. However, the | |||
| future autonomic networks may need a comprehensive operation | future autonomic networks may need a comprehensive operation | |||
| management system which works at stage 2 and stage 3 to cover all the | management system that works at Stages 2 and 3 to cover all the | |||
| network operation tasks. A well-defined network telemetry framework | network operation tasks. A well-defined network telemetry framework | |||
| is the first step in this direction. | is the first step in this direction. | |||
| 5. Security Considerations | 5. Security Considerations | |||
| The complexity of network telemetry raises significant security | The complexity of network telemetry raises significant security | |||
| implications. For example, telemetry data can be manipulated to | implications. For example, telemetry data can be manipulated to | |||
| exhaust various network resources at each plane as well as the data | exhaust various network resources at each plane as well as the data | |||
| consumer; falsified or tampered data can mislead the decision-making | consumer; falsified or tampered data can mislead the decision-making | |||
| and paralyze networks; wrong configuration and programming for | process and paralyze networks; and wrong configuration and | |||
| telemetry is equally harmful. The telemetry data is highly | programming for telemetry is equally harmful. The telemetry data is | |||
| sensitive, as it exposes a lot of information about the network and | highly sensitive, as it exposes a lot of information about the | |||
| its configuration. Some of that information can make designing | network and its configuration. Some of that information can make | |||
| attacks against the network much easier (e.g., exact details of what | designing attacks against the network much easier (e.g., exact | |||
| software and patches have been installed), and allows an attacker to | details of what software and patches have been installed) and allows | |||
| determine whether a device may be subject to unprotected security | an attacker to determine whether a device may be subject to | |||
| vulnerabilities. | unprotected security vulnerabilities. | |||
| Given that this document has proposed a framework for network | Given that this document has proposed a framework for network | |||
| telemetry and the telemetry mechanisms discussed are more extensive | telemetry and the telemetry mechanisms discussed are more extensive | |||
| (in both message frequency and traffic amount) than the conventional | (in both message frequency and traffic amount) than the conventional | |||
| network OAM concepts, we must also reflect that various new security | network OAM concepts, we must also anticipate that new security | |||
| considerations may also arise. A number of techniques already exist | considerations may also arise. A number of techniques already | |||
| for securing the forwarding plane, the control plane, and the | exist for securing the forwarding plane, control plane, and | |||
| management plane in a network, but it is important to consider if any | management plane in a network, but it is important to consider if any | |||
| new threat vectors are now being enabled via the use of network | new threat vectors are now being enabled via the use of network | |||
| telemetry procedures and mechanisms. | telemetry procedures and mechanisms. | |||
| This document proposes a conceptual architecture for collecting, | This document proposes a conceptual architecture for collecting, | |||
| transporting, and analyzing a wide variety of data sources in support | transporting, and analyzing a wide variety of data sources in support | |||
| of network applications. The protocols, data formats, and | of network applications. The protocols, data formats, and | |||
| configurations chosen to implement this framework will dictate the | configurations chosen to implement this framework will dictate the | |||
| specific security considerations. These considerations may include: | specific security considerations. These considerations may include: | |||
| * Telemetry framework trust and policy model; | * Telemetry framework trust and policy models; | |||
| * Role management and access control for enabling and disabling | * Role management and access control for enabling and disabling | |||
| telemetry capabilities; | telemetry capabilities; | |||
| * Protocol transport used for telemetry data and its inherent | * Protocol transport used for telemetry data and its inherent | |||
| security capabilities; | security capabilities; | |||
| * Telemetry data stores, storage encryption, methods of access, and | * Telemetry data stores, storage encryption, methods of access, and | |||
| retention practices; | retention practices; | |||
| * Tracking telemetry events and any abnormalities that might | * Tracking telemetry events and any abnormalities that might | |||
| identify malicious attacks using telemetry interfaces; | identify malicious attacks using telemetry interfaces; | |||
| * Authentication and integrity protection of telemetry data to make | * Authentication and integrity protection of telemetry data to make | |||
| data more trustworthy. | data more trustworthy; and | |||
| * Segregating the telemetry data traffic from the data traffic | * Segregating the telemetry data traffic from the data traffic | |||
| carried over the network (e.g., historically management access and | carried over the network (e.g., historically management access and | |||
| management data may be carried via an independent management | management data may be carried via an independent management | |||
| network). | network). | |||
| Some security considerations highlighted above may be minimized or | Some security considerations highlighted above may be minimized or | |||
| negated with policy management of network telemetry. In a network | negated with policy management of network telemetry. In a network | |||
| telemetry deployment it would be advantageous to separate telemetry | telemetry deployment, it would be advantageous to separate telemetry | |||
| capabilities into different classes of policies, i.e., Role Based | capabilities into different classes of policies, i.e., Role-Based | |||
| Access Control and Event-Condition-Action policies. Also, potential | Access Control and Event-Condition-Action policies. Also, potential | |||
| conflicts between network telemetry mechanisms must be detected | conflicts between network telemetry mechanisms must be detected | |||
| accurately and resolved quickly to avoid unnecessary network | accurately and resolved quickly to avoid unnecessary network | |||
| telemetry traffic propagation escalating into an unintended or | telemetry traffic propagation escalating into an unintended or | |||
| intended denial of service attack. | intended denial-of-service attack. | |||
| Further study of the security issues will be required, and it is | Further study of the security issues will be required, and it is | |||
| expected that the security mechanisms and protocols are developed and | expected that the security mechanisms and protocols are developed and | |||
| deployed along with a network telemetry system. | deployed along with a network telemetry system. | |||
| 6. IANA Considerations | 6. IANA Considerations | |||
| This document includes no request to IANA. | This document has no IANA actions. | |||
| 7. Contributors | ||||
| The other contributors of this document are Tianran Zhou, Zhenbin Li, | ||||
| Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm. | ||||
| 8. Acknowledgments | ||||
| We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe | ||||
| Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe | ||||
| Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra, | ||||
| Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin | ||||
| Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Eric | ||||
| Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many | ||||
| others who have provided helpful comments and suggestions to improve | ||||
| this document. | ||||
| 9. Informative References | 7. Informative References | |||
| [gnmi] "gNMI - gRPC Network Management Interface", | [gnmi] Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack, | |||
| <https://github.com/openconfig/reference/tree/master/rpc/ | C., and C. Marrow, "gRPC Network Management Interface", | |||
| gnmi>. | IETF 98, March 2017, | |||
| <https://datatracker.ietf.org/meeting/98/materials/slides- | ||||
| 98-rtgwg-gnmi-intro-draft-openconfig-rtgwg-gnmi-spec-00>. | ||||
| [gpb] "Google Protocol Buffers", | [gpb] Google Developers, "Protocol Buffers", | |||
| <https://developers.google.com/protocol-buffers>. | <https://developers.google.com/protocol-buffers>. | |||
| [grpc] "gRPC, A high performance, open-source universal RPC | [grpc] gRPC, "gRPC: A high performance, open source universal RPC | |||
| framework", <https://grpc.io>. | framework", <https://grpc.io>. | |||
| [I-D.ietf-grow-bmp-local-rib] | [IPPM-IOAM-DIRECT-EXPORT] | |||
| Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | ||||
| "Support for Local RIB in BGP Monitoring Protocol (BMP)", | ||||
| Work in Progress, Internet-Draft, draft-ietf-grow-bmp- | ||||
| local-rib-13, 31 August 2021, | ||||
| <https://www.ietf.org/archive/id/draft-ietf-grow-bmp- | ||||
| local-rib-13.txt>. | ||||
| [I-D.ietf-ippm-ioam-data] | ||||
| Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields | ||||
| for In-situ OAM", Work in Progress, Internet-Draft, draft- | ||||
| ietf-ippm-ioam-data-16, 8 November 2021, | ||||
| <https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- | ||||
| data-16.txt>. | ||||
| [I-D.ietf-ippm-ioam-direct-export] | ||||
| Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F., | Song, H., Gafni, B., Zhou, T., Li, Z., Brockners, F., | |||
| Bhandari, S., Sivakolundu, R., and T. Mizrahi, "In-situ | Bhandari, S., Ed., Sivakolundu, R., and T. Mizrahi, Ed., | |||
| OAM Direct Exporting", Work in Progress, Internet-Draft, | "In-situ OAM Direct Exporting", Work in Progress, | |||
| draft-ietf-ippm-ioam-direct-export-07, 13 October 2021, | Internet-Draft, draft-ietf-ippm-ioam-direct-export-07, 13 | |||
| <https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- | October 2021, <https://datatracker.ietf.org/doc/html/ | |||
| direct-export-07.txt>. | draft-ietf-ippm-ioam-direct-export-07>. | |||
| [I-D.ietf-netconf-distributed-notif] | [IPPM-POSTCARD-BASED-TELEMETRY] | |||
| Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou, | ||||
| T., Li, Z., Mishra, G., Shin, J., and K. Lee, "In-Situ OAM | ||||
| Marking-based Direct Export", Work in Progress, Internet- | ||||
| Draft, draft-song-ippm-postcard-based-telemetry-12, 12 May | ||||
| 2022, <https://datatracker.ietf.org/doc/html/draft-song- | ||||
| ippm-postcard-based-telemetry-12>. | ||||
| [NETCONF-DISTRIB-NOTIF] | ||||
| Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, | Zhou, T., Zheng, G., Voit, E., Graf, T., and P. Francois, | |||
| "Subscription to Distributed Notifications", Work in | "Subscription to Distributed Notifications", Work in | |||
| Progress, Internet-Draft, draft-ietf-netconf-distributed- | Progress, Internet-Draft, draft-ietf-netconf-distributed- | |||
| notif-02, 6 May 2021, <https://www.ietf.org/archive/id/ | notif-03, 10 January 2022, | |||
| draft-ietf-netconf-distributed-notif-02.txt>. | <https://datatracker.ietf.org/doc/html/draft-ietf-netconf- | |||
| distributed-notif-03>. | ||||
| [I-D.ietf-netconf-udp-notif] | [NETCONF-UDP-NOTIF] | |||
| Zheng, G., Zhou, T., Graf, T., Francois, P., Feng, A. H., | Zheng, G., Zhou, T., Graf, T., Francois, P., Feng, A. H., | |||
| and P. Lucente, "UDP-based Transport for Configured | and P. Lucente, "UDP-based Transport for Configured | |||
| Subscriptions", Work in Progress, Internet-Draft, draft- | Subscriptions", Work in Progress, Internet-Draft, draft- | |||
| ietf-netconf-udp-notif-04, 21 October 2021, | ietf-netconf-udp-notif-05, 4 March 2022, | |||
| <https://www.ietf.org/archive/id/draft-ietf-netconf-udp- | <https://datatracker.ietf.org/doc/html/draft-ietf-netconf- | |||
| notif-04.txt>. | udp-notif-05>. | |||
| [I-D.irtf-nmrg-ibn-concepts-definitions] | [NETMOD-ECA-POLICY] | |||
| Clemm, A., Ciavaglia, L., Granville, L. Z., and J. | Wu, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise, | |||
| Tantsura, "Intent-Based Networking - Concepts and | "A YANG Data model for ECA Policy Management", Work in | |||
| Definitions", Work in Progress, Internet-Draft, draft- | Progress, Internet-Draft, draft-ietf-netmod-eca-policy-01, | |||
| irtf-nmrg-ibn-concepts-definitions-05, 2 September 2021, | 19 February 2021, <https://datatracker.ietf.org/doc/html/ | |||
| <https://www.ietf.org/archive/id/draft-irtf-nmrg-ibn- | draft-ietf-netmod-eca-policy-01>. | |||
| concepts-definitions-05.txt>. | ||||
| [I-D.pedro-nmrg-anticipated-adaptation] | [NMRG-ANTICIPATED-ADAPTATION] | |||
| Martinez-Julia, P., "Exploiting External Event Detectors | Martinez-Julia, P., Ed., "Exploiting External Event | |||
| to Anticipate Resource Requirements for the Elastic | Detectors to Anticipate Resource Requirements for the | |||
| Adaptation of SDN/NFV Systems", Work in Progress, | Elastic Adaptation of SDN/NFV Systems", Work in Progress, | |||
| Internet-Draft, draft-pedro-nmrg-anticipated-adaptation- | Internet-Draft, draft-pedro-nmrg-anticipated-adaptation- | |||
| 02, 29 June 2018, <https://www.ietf.org/archive/id/draft- | 02, 29 June 2018, <https://datatracker.ietf.org/doc/html/ | |||
| pedro-nmrg-anticipated-adaptation-02.txt>. | draft-pedro-nmrg-anticipated-adaptation-02>. | |||
| [I-D.song-ippm-postcard-based-telemetry] | ||||
| Song, H., Mirsky, G., Filsfils, C., Abdelsalam, A., Zhou, | ||||
| T., Li, Z., Shin, J., and K. Lee, "In-Situ OAM Marking- | ||||
| based Direct Export", Work in Progress, Internet-Draft, | ||||
| draft-song-ippm-postcard-based-telemetry-11, 15 November | ||||
| 2021, <https://www.ietf.org/archive/id/draft-song-ippm- | ||||
| postcard-based-telemetry-11.txt>. | ||||
| [I-D.song-opsawg-dnp4iq] | [NMRG-IBN-CONCEPTS-DEFINITIONS] | |||
| Song, H. and J. Gong, "Requirements for Interactive Query | Clemm, A., Ciavaglia, L., Granville, L. Z., and J. | |||
| with Dynamic Network Probes", Work in Progress, Internet- | Tantsura, "Intent-Based Networking - Concepts and | |||
| Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017, | Definitions", Work in Progress, Internet-Draft, draft- | |||
| <https://www.ietf.org/archive/id/draft-song-opsawg-dnp4iq- | irtf-nmrg-ibn-concepts-definitions-09, 24 March 2022, | |||
| 01.txt>. | <https://datatracker.ietf.org/doc/html/draft-irtf-nmrg- | |||
| ibn-concepts-definitions-09>. | ||||
| [I-D.song-opsawg-ifit-framework] | [OPSAWG-DNP4IQ] | |||
| Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- | Song, H., Ed. and J. Gong, "Requirements for Interactive | |||
| situ Flow Information Telemetry", Work in Progress, | Query with Dynamic Network Probes", Work in Progress, | |||
| Internet-Draft, draft-song-opsawg-ifit-framework-16, 21 | Internet-Draft, draft-song-opsawg-dnp4iq-01, 19 June 2017, | |||
| October 2021, <https://www.ietf.org/archive/id/draft-song- | <https://datatracker.ietf.org/doc/html/draft-song-opsawg- | |||
| opsawg-ifit-framework-16.txt>. | dnp4iq-01>. | |||
| [I-D.wwx-netmod-event-yang] | [OPSAWG-IFIT-FRAMEWORK] | |||
| Wu, Q., Bryskin, I., Birkholz, H., Liu, X., and B. Claise, | Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "A | |||
| "A YANG Data model for ECA Policy Management", Work in | Framework for In-situ Flow Information Telemetry", Work in | |||
| Progress, Internet-Draft, draft-wwx-netmod-event-yang-10, | Progress, Internet-Draft, draft-song-opsawg-ifit- | |||
| 1 November 2020, <https://www.ietf.org/archive/id/draft- | framework-17, 22 February 2022, | |||
| wwx-netmod-event-yang-10.txt>. | <https://datatracker.ietf.org/doc/html/draft-song-opsawg- | |||
| ifit-framework-17>. | ||||
| [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, | [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, | |||
| "Simple Network Management Protocol (SNMP)", RFC 1157, | "Simple Network Management Protocol (SNMP)", RFC 1157, | |||
| DOI 10.17487/RFC1157, May 1990, | DOI 10.17487/RFC1157, May 1990, | |||
| <https://www.rfc-editor.org/info/rfc1157>. | <https://www.rfc-editor.org/info/rfc1157>. | |||
| [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. | [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. | |||
| Schoenwaelder, Ed., "Structure of Management Information | Schoenwaelder, Ed., "Structure of Management Information | |||
| Version 2 (SMIv2)", STD 58, RFC 2578, | Version 2 (SMIv2)", STD 58, RFC 2578, | |||
| DOI 10.17487/RFC2578, April 1999, | DOI 10.17487/RFC2578, April 1999, | |||
| skipping to change at page 35, line 22 | skipping to change at line 1578 | |||
| Hybrid Performance Monitoring", RFC 8889, | Hybrid Performance Monitoring", RFC 8889, | |||
| DOI 10.17487/RFC8889, August 2020, | DOI 10.17487/RFC8889, August 2020, | |||
| <https://www.rfc-editor.org/info/rfc8889>. | <https://www.rfc-editor.org/info/rfc8889>. | |||
| [RFC8924] Aldrin, S., Pignataro, C., Ed., Kumar, N., Ed., Krishnan, | [RFC8924] Aldrin, S., Pignataro, C., Ed., Kumar, N., Ed., Krishnan, | |||
| R., and A. Ghanwani, "Service Function Chaining (SFC) | R., and A. Ghanwani, "Service Function Chaining (SFC) | |||
| Operations, Administration, and Maintenance (OAM) | Operations, Administration, and Maintenance (OAM) | |||
| Framework", RFC 8924, DOI 10.17487/RFC8924, October 2020, | Framework", RFC 8924, DOI 10.17487/RFC8924, October 2020, | |||
| <https://www.rfc-editor.org/info/rfc8924>. | <https://www.rfc-editor.org/info/rfc8924>. | |||
| [xml] "Extensible Markup Language (XML) 1.0 (Fifth Edition)", | [RFC9069] Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | |||
| <https://www.w3.org/TR/2008/REC-xml-20081126/>. | "Support for Local RIB in the BGP Monitoring Protocol | |||
| (BMP)", RFC 9069, DOI 10.17487/RFC9069, February 2022, | ||||
| <https://www.rfc-editor.org/info/rfc9069>. | ||||
| [y1731] "ITU-T Y.1731: OAM Functions and Mechanisms for Ethernet | [RFC9197] Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi, | |||
| based networks, 2015", | Ed., "Data Fields for In Situ Operations, Administration, | |||
| and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197, | ||||
| May 2022, <https://www.rfc-editor.org/info/rfc9197>. | ||||
| [W3C.REC-xml-20081126] | ||||
| Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and | ||||
| F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth | ||||
| Edition)", World Wide Web Consortium Recommendation REC- | ||||
| xml-20081126, November 2008, | ||||
| <https://www.w3.org/TR/2008/REC-xml-20081126>. | ||||
| [y1731] ITU-T, "Operations, administration and maintenance (OAM) | ||||
| functions and mechanisms for Ethernet-based networks", | ||||
| ITU-T Recommendation G.8013/Y.1731, August 2015, | ||||
| <https://www.itu.int/rec/T-REC-Y.1731/en>. | <https://www.itu.int/rec/T-REC-Y.1731/en>. | |||
| Appendix A. A Survey on Existing Network Telemetry Techniques | Appendix A. A Survey on Existing Network Telemetry Techniques | |||
| In this non-normative appendix, we provide an overview of some | In this non-normative appendix, we provide an overview of some | |||
| existing techniques and standard proposals for each network telemetry | existing techniques and standard proposals for each network telemetry | |||
| module. | module. | |||
| A.1. Management Plane Telemetry | A.1. Management Plane Telemetry | |||
| A.1.1. Push Extensions for NETCONF | A.1.1. Push Extensions for NETCONF | |||
| NETCONF [RFC6241] is a popular network management protocol | NETCONF [RFC6241] is a popular network management protocol | |||
| recommended by the IETF. Its core strength is for managing | recommended by the IETF. Its core strength is for managing | |||
| configuration, but can also be used for data collection. YANG-Push | configuration, but it can also be used for data collection. | |||
| [RFC8641] [RFC8639] extends NETCONF and enables subscriber | YANG-Push [RFC8639] [RFC8641] extends NETCONF and enables subscriber | |||
| applications to request a continuous, customized stream of updates | applications to request a continuous, customized stream of updates | |||
| from a YANG datastore. Providing such visibility into changes made | from a YANG datastore. Providing such visibility into changes made | |||
| upon YANG configuration and operational objects enables new | upon YANG configuration and operational objects enables new | |||
| capabilities based on the remote mirroring of configuration and | capabilities based on the remote mirroring of configuration and | |||
| operational state. Moreover, distributed data collection mechanism | operational state. Moreover, a distributed data collection mechanism | |||
| [I-D.ietf-netconf-distributed-notif] via UDP based publication | [NETCONF-DISTRIB-NOTIF] via a UDP-based publication channel | |||
| channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for | [NETCONF-UDP-NOTIF] provides enhanced efficiency for the NETCONF- | |||
| the NETCONF based telemetry. | based telemetry. | |||
| A.1.2. gRPC Network Management Interface | A.1.2. gRPC Network Management Interface | |||
| gRPC Network Management Interface (gNMI) [gnmi] is a network | gRPC Network Management Interface (gNMI) [gnmi] is a network | |||
| management protocol based on the gRPC [grpc] RPC (Remote Procedure | management protocol based on the gRPC [grpc] Remote Procedure Call | |||
| Call) framework. With a single gRPC service definition, both | (RPC) framework. With a single gRPC service definition, both | |||
| configuration and telemetry can be covered. gRPC is an HTTP/2 | configuration and telemetry can be covered. gRPC is an open-source | |||
| [RFC7540]-based open-source micro-service communication framework. | micro-service communication framework based on HTTP/2 [RFC7540]. It | |||
| It provides a number of capabilities which are well-suited for | provides a number of capabilities that are well-suited for network | |||
| network telemetry, including: | telemetry, including: | |||
| * Full-duplex streaming transport model combined with a binary | * A full-duplex streaming transport model; when combined with a | |||
| encoding mechanism provides good telemetry efficiency. | binary encoding mechanism, it provides good telemetry efficiency. | |||
| * gRPC provides higher-level features consistency across platforms | * A higher-level feature consistency across platforms that common | |||
| that common HTTP/2 libraries typically do not. This | HTTP/2 libraries typically do not provide. This characteristic is | |||
| characteristic is especially valuable for the fact that telemetry | especially valuable for the fact that telemetry data collectors | |||
| data collectors normally reside on a large variety of platforms. | normally reside on a large variety of platforms. | |||
| * The built-in load-balancing and failover mechanism. | * A built-in load-balancing and failover mechanism. | |||
| A.2. Control Plane Telemetry | A.2. Control Plane Telemetry | |||
| A.2.1. BGP Monitoring Protocol | A.2.1. BGP Monitoring Protocol | |||
| BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP | BMP [RFC7854] is used to monitor BGP sessions and is intended to | |||
| sessions and is intended to provide a convenient interface for | provide a convenient interface for obtaining route views. | |||
| obtaining route views. | ||||
| The BGP routing information is collected from the monitored device(s) | BGP routing information is collected from the monitored device(s) to | |||
| to the BMP monitoring station by setting up the BMP TCP session. The | the BMP monitoring station by setting up the BMP TCP session. The | |||
| BGP peers are monitored by the BMP Peer Up and Peer Down | BGP peers are monitored by the BMP Peer Up and Peer Down | |||
| Notifications. The BGP routes (including Adjacency_RIB_In [RFC7854], | notifications. The BGP routes (including Adj_RIB_In [RFC7854], | |||
| Adjacency_RIB_Out [RFC8671], and Local_Rib | Adj_RIB_Out [RFC8671], and local RIB [RFC9069]) are encapsulated in | |||
| [I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP Route | the BMP Route Monitoring Message and the BMP Route Mirroring Message, | |||
| Monitoring Message and the BMP Route Mirroring Message, providing | providing both an initial table dump and real-time route updates. In | |||
| both an initial table dump and real-time route updates. In addition, | addition, BGP statistics are reported through the BMP Stats Report | |||
| BGP statistics are reported through the BMP Stats Report Message, | Message, which could be either timer triggered or event-driven. | |||
| which could be either timer triggered or event-driven. Future BMP | Future BMP extensions could further enrich BGP monitoring | |||
| extensions could further enrich BGP monitoring applications. | applications. | |||
| A.3. Data Plane Telemetry | A.3. Data Plane Telemetry | |||
| A.3.1. The Alternate Marking (AM) technology | A.3.1. Alternate-Marking (AM) Technology | |||
| The Alternate Marking method enables efficient measurements of packet | The Alternate-Marking method enables efficient measurements of packet | |||
| loss, delay, and jitter both in IP and Overlay Networks, as presented | loss, delay, and jitter both in IP and Overlay Networks, as presented | |||
| in [RFC8321] and [RFC8889]. | in [RFC8321] and [RFC8889]. | |||
| This technique can be applied to point-to-point and multipoint-to- | This technique can be applied to point-to-point and multipoint-to- | |||
| multipoint flows. Alternate Marking creates batches of packets by | multipoint flows. Alternate Marking creates batches of packets by | |||
| alternating the value of 1 bit (or a label) of the packet header. | alternating the value of 1 bit (or a label) of the packet header. | |||
| These batches of packets are unambiguously recognized over the | These batches of packets are unambiguously recognized over the | |||
| network and the comparison of packet counters for each batch allows | network, and the comparison of packet counters for each batch allows | |||
| the packet loss calculation. The same idea can be applied to delay | the packet loss calculation. The same idea can be applied to delay | |||
| measurement by selecting ad hoc packets with a marking bit dedicated | measurement by selecting ad hoc packets with a marking bit dedicated | |||
| for delay measurements. | for delay measurements. | |||
| Alternate Marking method needs two counters each marking period for | The Alternate-Marking method needs two counters each marking period | |||
| each flow under monitor. For instance, by considering n measurement | for each flow under monitor. For instance, by considering n | |||
| points and m monitored flows, the order of magnitude of the packet | measurement points and m monitored flows, the order of magnitude of | |||
| counters for each time interval is n*m*2 (1 per color). | the packet counters for each time interval is n*m*2 (1 per color). | |||
| Since networks offer rich sets of network performance measurement | Since networks offer rich sets of network performance measurement | |||
| data (e.g., packet counters), conventional approaches run into | data (e.g., packet counters), conventional approaches run into | |||
| limitations. The bottleneck is the generation and export of the data | limitations. The bottleneck is the generation and export of the data | |||
| and the amount of data that can be reasonably collected from the | and the amount of data that can be reasonably collected from the | |||
| network. In addition, management tasks related to determining and | network. In addition, management tasks related to determining and | |||
| configuring which data to generate lead to significant deployment | configuring which data to generate lead to significant deployment | |||
| challenges. | challenges. | |||
| The Multipoint Alternate Marking approach, described in [RFC8889], | The Multipoint Alternate-Marking approach, described in [RFC8889], | |||
| aims to resolve this issue and make the performance monitoring more | aims to resolve this issue and make the performance monitoring more | |||
| flexible in case a detailed analysis is not needed. | flexible in case a detailed analysis is not needed. | |||
| An application orchestrates network performance measurements tasks | An application orchestrates network performance measurement tasks | |||
| across the network to allow for optimized monitoring. The | across the network to allow for optimized monitoring. The | |||
| application can choose how roughly or precisely to configure | application can choose how roughly or precisely to configure | |||
| measurement points depending on the application's requirements. | measurement points depending on the application's requirements. | |||
| Using Alternate Marking, it is possible to monitor a Multipoint | Using Alternate Marking, it is possible to monitor a Multipoint | |||
| Network without in depth examination by using the Network Clustering | Network without in-depth examination by using Network Clustering | |||
| (subnetworks that are portions of the entire network that preserve | (subnetworks that are portions of the entire network that preserve | |||
| the same property of the entire network, called clusters). So in the | the same property of the entire network, called clusters). So in the | |||
| case that there is packet loss or the delay is too high then the | case where there is packet loss or the delay is too high, the | |||
| specific filtering criteria could be applied to gather a more | specific filtering criteria could be applied to gather a more | |||
| detailed analysis by using a different combination of clusters up to | detailed analysis by using a different combination of clusters up to | |||
| a per-flow measurement as described in Alternate-Marking (AM) | a per-flow measurement as described in the Alternate-Marking document | |||
| [RFC8321]. | [RFC8321]. | |||
| In summary, an application can configure end-to-end network | In summary, an application can configure end-to-end network | |||
| monitoring. If the network does not experience issues, this | monitoring. If the network does not experience issues, this | |||
| approximate monitoring is good enough and is very cheap in terms of | approximate monitoring is good enough and is very cheap in terms of | |||
| network resources. However, in case of problems, the application | network resources. However, in case of problems, the application | |||
| becomes aware of the issues from this approximate monitoring and, in | becomes aware of the issues from this approximate monitoring and, in | |||
| order to localize the portion of the network that has issues, | order to localize the portion of the network that has issues, | |||
| configures the measurement points more extensively, allowing more | configures the measurement points more extensively, allowing more | |||
| detailed monitoring to be performed. After the detection and | detailed monitoring to be performed. After the detection and | |||
| resolution of the problem, the initial approximate monitoring can be | resolution of the problem, the initial approximate monitoring can be | |||
| used again. | used again. | |||
| A.3.2. Dynamic Network Probe | A.3.2. Dynamic Network Probe | |||
| Hardware-based Dynamic Network Probe (DNP) [I-D.song-opsawg-dnp4iq] | A hardware-based Dynamic Network Probe (DNP) [OPSAWG-DNP4IQ] provides | |||
| proposes a programmable means to customize the data that an | a programmable means to customize the data that an application | |||
| application collects from the data plane. A direct benefit of DNP is | collects from the data plane. A direct benefit of DNP is the | |||
| the reduction of the exported data. A full DNP solution covers | reduction of the exported data. A full DNP solution covers several | |||
| several components including data source, data subscription, and data | components including data source, data subscription, and data | |||
| generation. The data subscription needs to define the derived data | generation. The data subscription needs to define the derived data | |||
| which can be composed and derived from the raw data sources. The | that can be composed and derived from raw data sources. The data | |||
| data generation takes advantage of the moderate in-network computing | generation takes advantage of the moderate in-network computing to | |||
| to produce the desired data. | produce the desired data. | |||
| While DNP can introduce unforeseeable flexibility to the data plane | While DNP can introduce unforeseeable flexibility to the data plane | |||
| telemetry, it also faces some challenges. It requires a flexible | telemetry, it also faces some challenges. It requires a flexible | |||
| data plane that can be dynamically reprogrammed at run-time. The | data plane that can be dynamically reprogrammed at runtime. The | |||
| programming API is yet to be defined. | programming Application Programming Interface (API) is yet to be | |||
| defined. | ||||
| A.3.3. IP Flow Information Export (IPFIX) Protocol | A.3.3. IP Flow Information Export (IPFIX) Protocol | |||
| Traffic on a network can be seen as a set of flows passing through | Traffic on a network can be seen as a set of flows passing through | |||
| network elements. IP Flow Information Export (IPFIX) [RFC7011] | network elements. IPFIX [RFC7011] provides a means of transmitting | |||
| provides a means of transmitting traffic flow information for | traffic flow information for administrative or other purposes. A | |||
| administrative or other purposes. A typical IPFIX enabled system | typical IPFIX-enabled system includes a pool of Metering Processes | |||
| includes a pool of Metering Processes that collects data packets at | that collects data packets at one or more Observation Points, | |||
| one or more Observation Points, optionally filters them and | optionally filters them, and aggregates information about these | |||
| aggregates information about these packets. An Exporter then gathers | packets. An Exporter then gathers each of the Observation Points | |||
| each of the Observation Points together into an Observation Domain | together into an Observation Domain and sends this information via | |||
| and sends this information via the IPFIX protocol to a Collector. | the IPFIX protocol to a Collector. | |||
| A.3.4. In-Situ OAM | A.3.4. In Situ OAM | |||
| Classical passive and active monitoring and measurement techniques | Classical passive and active monitoring and measurement techniques | |||
| are either inaccurate or resource-consuming. It is preferable to | are either inaccurate or resource consuming. It is preferable to | |||
| directly acquire data associated with a flow's packets when the | directly acquire data associated with a flow's packets when the | |||
| packets pass through a network. In-situ OAM (iOAM) | packets pass through a network. IOAM [RFC9197], a data generation | |||
| [I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new | technique, embeds a new instruction header to user packets, and the | |||
| instruction header to user packets and the instruction directs the | instruction directs the network nodes to add the requested data to | |||
| network nodes to add the requested data to the packets. Thus, at the | the packets. Thus, at the path's end, the packet's experience gained | |||
| path end, the packet's experience gained on the entire forwarding | on the entire forwarding path can be collected. Such firsthand data | |||
| path can be collected. Such firsthand data is invaluable to many | is invaluable to many network OAM applications. | |||
| network OAM applications. | ||||
| However, iOAM also faces some challenges. The issues on performance | However, IOAM also faces some challenges. The issues on performance | |||
| impact, security, scalability and overhead limits, encapsulation | impact, security, scalability and overhead limits, encapsulation | |||
| difficulties in some protocols, and cross-domain deployment need to | difficulties in some protocols, and cross-domain deployment need to | |||
| be addressed. | be addressed. | |||
| A.3.5. Postcard Based Telemetry | A.3.5. Postcard-Based Telemetry | |||
| The postcard-based telemetry, as embodied in IOAM DEX | The postcard-based telemetry, as embodied in IOAM Direct Export (DEX) | |||
| [I-D.ietf-ippm-ioam-direct-export] and IOAM Marking | [IPPM-IOAM-DIRECT-EXPORT] and IOAM Marking | |||
| [I-D.song-ippm-postcard-based-telemetry], is a complementary | [IPPM-POSTCARD-BASED-TELEMETRY], is a complementary technique to the | |||
| technique to the passport-based IOAM. PBT directly exports data at | passport-based IOAM [RFC9197]. PBT directly exports data at each | |||
| each node through an independent packet. At the cost of higher | node through an independent packet. At the cost of higher bandwidth | |||
| bandwidth overhead and the need for data correlation, PBT shows | overhead and the need for data correlation, PBT shows several unique | |||
| several unique advantages. It can also help to identify packet drop | advantages. It can also help to identify packet drop location in | |||
| location in case a packet is dropped on its forwarding path. | case a packet is dropped on its forwarding path. | |||
| A.3.6. Existing OAM for Specific Data Planes | A.3.6. Existing OAM for Specific Data Planes | |||
| Various data planes raise unique OAM requirements. IETF has | Various data planes raise unique OAM requirements. IETF has | |||
| published OAM technique and framework documents (e.g., [RFC8924] and | published OAM technique and framework documents (e.g., [RFC8924] and | |||
| [RFC5085]) targeting different data planes such as Multi-Protocol | [RFC5085]) targeting different data planes such as Multiprotocol | |||
| Label Switching (MPLS), L2 Virtual Private Network (L2-VPN), Network | Label Switching (MPLS), L2 Virtual Private Network (VPN), Network | |||
| Virtualization Overlays (NVO3), Virtual Extensible LAN (VXLAN), Bit | Virtualization over Layer 3 (NVO3), Virtual Extensible LAN (VXLAN), | |||
| Indexed Explicit Replication (BIER), Service Function Chaining (SFC), | Bit Index Explicit Replication (BIER), Service Function Chaining | |||
| Segment Routing (SR), and Deterministic Networking (DETNET). The | (SFC), Segment Routing (SR), and Deterministic Networking (DETNET). | |||
| aforementioned data plane telemetry techniques can be used to enhance | The aforementioned data plane telemetry techniques can be used to | |||
| the OAM capability on such data planes. | enhance the OAM capability on such data planes. | |||
| A.4. External Data and Event Telemetry | A.4. External Data and Event Telemetry | |||
| A.4.1. Sources of External Events | A.4.1. Sources of External Events | |||
| To ensure that the information provided by external event detectors | To ensure that the information provided by external event detectors | |||
| and used by the network management solutions is meaningful for | and used by the network management solutions is meaningful for | |||
| management purposes, the network telemetry framework must ensure that | management purposes, the network telemetry framework must ensure that | |||
| such detectors (sources) are easily connected to the management | such detectors (sources) are easily connected to the management | |||
| solutions (sinks). This requires the specification of a list of | solutions (sinks). This requires the specification of a list of | |||
| potential external data sources that could be of interest in network | potential external data sources that could be of interest in network | |||
| management and match it to the connectors and/or interfaces required | management and matching it to the connectors and/or interfaces | |||
| to connect them. | required to connect them. | |||
| Categories of external event sources that may be of interest to | Categories of external event sources that may be of interest to | |||
| network management include:: | network management include: | |||
| * Smart objects and sensors. With the consolidation of the Internet | * Smart objects and sensors. With the consolidation of the Internet | |||
| of Things~(IoT) any network system will have many smart objects | of Things (IoT), any network system will have many smart objects | |||
| attached to its physical surroundings and logical operation | attached to its physical surroundings and logical operation | |||
| environments. Most of these objects will be essentially based on | environments. Most of these objects will be essentially based on | |||
| sensors of many kinds (e.g., temperature, humidity, presence) and | sensors of many kinds (e.g., temperature, humidity, and presence), | |||
| the information they provide can be very useful for the management | and the information they provide can be very useful for the | |||
| of the network, even when they are not specifically deployed for | management of the network, even when they are not specifically | |||
| such purpose. Elements of this source type will usually provide a | deployed for such purpose. Elements of this source type will | |||
| specific protocol for interaction, especially one of those | usually provide a specific protocol for interaction, especially | |||
| protocols related to IoT, such as the Constrained Application | one of the protocols related to IoT, such as the Constrained | |||
| Protocol (CoAP). | Application Protocol (CoAP). | |||
| * Online news reporters. Several online news services have the | * Online news reporters. Several online news services have the | |||
| ability to provide enormous quantity of information about | ability to provide an enormous quantity of information about | |||
| different events occurring in the world. Some of those events can | different events occurring in the world. Some of those events can | |||
| impact on the network system managed by a specific framework and, | have an impact on the network system managed by a specific | |||
| therefore, such information may be of interest to the management | framework; therefore, such information may be of interest to the | |||
| solution. For instance, diverse security reports, such as the | management solution. For instance, diverse security reports, such | |||
| Common Vulnerabilities and Exposures (CVE), can be issued by the | as Common Vulnerabilities and Exposures (CVEs), can be issued by | |||
| corresponding authority and used by the management solution to | the corresponding authority and used by the management solution to | |||
| update the managed system if needed. Instead of a specific | update the managed system, if needed. Instead of a specific | |||
| protocol and data format, the sources of this kind of information | protocol and data format, the sources of this kind of information | |||
| usually follow a relaxed but structured format. This format will | usually follow a relaxed but structured format. This format will | |||
| be part of both the ontology and information model of the | be part of both the ontology and information model of the | |||
| telemetry framework. | telemetry framework. | |||
| * Global event analyzers. The advance of Big Data analyzers | * Global event analyzers. The advance of big data analyzers | |||
| provides a huge amount of information and, more interestingly, the | provides a huge amount of information and, more interestingly, the | |||
| identification of events detected by analyzing many data streams | identification of events detected by analyzing many data streams | |||
| from different origins. In contrast with the other types of | from different origins. In contrast with the other types of | |||
| sources, which are focused on specific events, the detectors of | sources, which are focused on specific events, the detectors of | |||
| this source type will detect generic events. For example, during | this source type will detect generic events. For example, during | |||
| a sport event some unexpected movement makes it fascinating and | a sports event, some unexpected movement makes it fascinating, and | |||
| many people connect to sites that are reporting on the event. The | many people connect to sites that are reporting on the event. The | |||
| underlying networks supporting the services that cover the event | underlying networks supporting the services that cover the event | |||
| can be affected by such situation, so their management solutions | can be affected by such situation, so their management solutions | |||
| should be aware of it. In contrast with the other source types, a | should be aware of it. In contrast with the other source types, a | |||
| new information model, format, and reporting protocol is required | new information model, format, and reporting protocol is required | |||
| to integrate the detectors of this type with the management | to integrate the detectors of this type with the management | |||
| solution. | solution. | |||
| Additional types of detector types can be added to the system, but | Additional detector types can be added to the system, but generally | |||
| they will be generally the result of composing the properties offered | they will be the result of composing the properties offered by these | |||
| by these main classes. | main classes. | |||
| A.4.2. Connectors and Interfaces | A.4.2. Connectors and Interfaces | |||
| For allowing external event detectors to be properly integrated with | For allowing external event detectors to be properly integrated with | |||
| other management solutions, both elements must expose interfaces and | other management solutions, both elements must expose interfaces and | |||
| protocols that are subject to their particular objective. Since | protocols that are subject to their particular objective. Since | |||
| external event detectors will be focused on providing their | external event detectors will be focused on providing their | |||
| information to their main consumers, which generally will not be | information to their main consumers, which generally will not be | |||
| limited to the network management solutions, the framework must | limited to the network management solutions, the framework must | |||
| include the definition of the required connectors for ensuring the | include the definition of the required connectors for ensuring the | |||
| interconnection between detectors (sources) and their consumers | interconnection between detectors (sources) and their consumers | |||
| within the management systems (sinks) are effective. | within the management systems (sinks) are effective. | |||
| In some situations, the interconnection between the external event | In some situations, the interconnection between external event | |||
| detectors and the management system is via the management plane. For | detectors and the management system is via the management plane. For | |||
| those situations there will be a special connector that provides the | those situations, there will be a special connector that provides the | |||
| typical interfaces found in most other elements connected to the | typical interfaces found in most other elements connected to the | |||
| management plane. For instance, the interfaces could accomplish this | management plane. For instance, the interfaces could accomplish this | |||
| with a specific data model (YANG) and specific telemetry protocol, | with a specific data model (YANG) and specific telemetry protocol, | |||
| such as NETCONF, YANG-Push, or gRPC. | such as NETCONF, YANG-Push, or gRPC. | |||
| Acknowledgments | ||||
| We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe | ||||
| Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe | ||||
| Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra, | ||||
| Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin | ||||
| Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Éric | ||||
| Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many | ||||
| others who have provided helpful comments and suggestions to improve | ||||
| this document. | ||||
| Contributors | ||||
| The other contributors of this document are Tianran Zhou, Zhenbin Li, | ||||
| Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Haoyu Song | Haoyu Song | |||
| Futurewei | Futurewei | |||
| United States of America | United States of America | |||
| Email: haoyu.song@futurewei.com | Email: haoyu.song@futurewei.com | |||
| Fengwei Qin | Fengwei Qin | |||
| China Mobile | China Mobile | |||
| P.R. China | China | |||
| Email: qinfengwei@chinamobile.com | Email: qinfengwei@chinamobile.com | |||
| Pedro Martinez-Julia | Pedro Martinez-Julia | |||
| NICT | NICT | |||
| Japan | Japan | |||
| Email: pedro@nict.go.jp | Email: pedro@nict.go.jp | |||
| Laurent Ciavaglia | Laurent Ciavaglia | |||
| Rakuten Mobile | Rakuten Mobile | |||
| France | France | |||
| Email: laurent.ciavaglia@rakuten.com | Email: laurent.ciavaglia@rakuten.com | |||
| Aijun Wang | Aijun Wang | |||
| China Telecom | China Telecom | |||
| P.R. China | China | |||
| Email: wangaj3@chinatelecom.cn | ||||
| Email: wangaj.bri@chinatelecom.cn | ||||
| End of changes. 219 change blocks. | ||||
| 703 lines changed or deleted | 722 lines changed or added | |||