rfc9417.original   rfc9417.txt 
OPSAWG B. Claise Internet Engineering Task Force (IETF) B. Claise
Internet-Draft J. Quilbeuf Request for Comments: 9417 J. Quilbeuf
Intended status: Informational Huawei Category: Informational Huawei
Expires: 7 July 2023 D. Lopez ISSN: 2070-1721 D. Lopez
Telefonica I+D Telefonica I+D
D. Voyer D. Voyer
Bell Canada Bell Canada
T. Arumugam T. Arumugam
Cisco Systems, Inc. Consultant
3 January 2023 June 2023
Service Assurance for Intent-based Networking Architecture Service Assurance for Intent-Based Networking Architecture
draft-ietf-opsawg-service-assurance-architecture-13
Abstract Abstract
This document describes an architecture that aims at assuring that This document describes an architecture that provides some assurance
service instances are running as expected. As services rely upon that service instances are running as expected. As services rely
multiple sub-services provided by a variety of elements including the upon multiple subservices provided by a variety of elements,
underlying network devices and functions, getting the assurance of a including the underlying network devices and functions, getting the
healthy service is only possible with a holistic view of all involved assurance of a healthy service is only possible with a holistic view
elements. This architecture not only helps to correlate the service of all involved elements. This architecture not only helps to
degradation with symptoms of a specific network component but also to correlate the service degradation with symptoms of a specific network
list the services impacted by the failure or degradation of a component but, it also lists the services impacted by the failure or
specific network component. degradation of a specific network component.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This document is not an Internet Standards Track specification; it is
provisions of BCP 78 and BCP 79. published for informational purposes.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
This Internet-Draft will expire on 7 July 2023. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9417.
Copyright Notice Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Terminology 1. Introduction
2. Introduction 2. Terminology
3. A Functional Architecture 3. A Functional Architecture
3.1. Translating a Service Instance Configuration into an 3.1. Translating a Service Instance Configuration into an
Assurance Graph Assurance Graph
3.1.1. Circular Dependencies 3.1.1. Circular Dependencies
3.2. Intent and Assurance Graph 3.2. Intent and Assurance Graph
3.3. Subservices 3.3. Subservices
3.4. Building the Expression Graph from the Assurance Graph 3.4. Building the Expression Graph from the Assurance Graph
3.5. Open Interfaces with YANG Modules 3.5. Open Interfaces with YANG Modules
3.6. Handling Maintenance Windows 3.6. Handling Maintenance Windows
3.7. Flexible Functional Architecture 3.7. Flexible Functional Architecture
3.8. Time window for symptoms history 3.8. Time Window for Symptoms' History
3.9. New Assurance Graph Generation 3.9. New Assurance Graph Generation
4. Security Considerations 4. IANA Considerations
5. IANA Considerations 5. Security Considerations
6. Contributors 6. References
7. References 6.1. Normative References
7.1. Normative References 6.2. Informative References
7.2. Informative References Acknowledgements
Appendix A. Changes between revisions Contributors
Acknowledgements Authors' Addresses
Authors' Addresses
1. Terminology
SAIN agent: A functional component that communicates with a device, a
set of devices, or another agent to build an expression graph from a
received assurance graph and perform the corresponding computation of
the health status and symptoms. A SAIN agent might be running
directly on the device it monitors.
Assurance case: "An assurance case is a structured argument,
supported by evidence, intended to justify that a system is
acceptably assured relative to a concern (such as safety or security)
in the intended operating environment" [Piovesan2017].
Service instance: A specific instance of a service.
Intent: "A set of operational goals (that a network should meet) and
outcomes (that a network is supposed to deliver), defined in a
declarative manner without specifying how to achieve or implement
them" [RFC9315].
Subservice: Part or functionality of the network system that can be
independently assured as a single entity in assurance graph.
Assurance graph: A Directed Acyclic Graph (DAG) representing the
assurance case for one or several service instances. The nodes (also
known as vertices in the context of DAG) are the service instances
themselves and the subservices, the edges indicate a dependency
relation.
SAIN collector: A functional component that fetches or receives the
computer-consumable output of the SAIN agent(s) and process it
locally (including displaying it in a user-friendly form).
DAG: Directed Acyclic Graph.
ECMP: Equal Cost Multiple Paths
Expression graph: A generic term for a DAG representing a computation
in SAIN. More specific terms are:
* Subservice expressions: Is an expression graph representing all
the computations to execute for a subservice.
* Service expressions: Is an expression graph representing all the
computations to execute for a service instance, i.e., including
the computations for all dependent subservices.
* Global computation graph: Is an expression graph representing all
the computations to execute for all service instances (i.e., all
computations performed).
Dependency: The directed relationship between subservice instances in
the assurance graph.
Metric: A piece of information retrieved from the network running the
assured service.
Metric engine: A functional component, part of the SAIN agent, that
maps metrics to a list of candidate metric implementations depending
on the network element.
Metric implementation: Actual way of retrieving a metric from a
network element.
Network service YANG module: describes the characteristics of a
service as agreed upon with consumers of that service [RFC8199].
Service orchestrator: Quoting RFC8199, "Network Service YANG Modules
describe the characteristics of a service, as agreed upon with
consumers of that service. That is, a service module does not expose
the detailed configuration parameters of all participating network
elements and features but describes an abstract model that allows
instances of the service to be decomposed into instance data
according to the Network Element YANG Modules of the participating
network elements. The service-to-element decomposition is a separate
process; the details depend on how the network operator chooses to
realize the service. For the purpose of this document, the term
"orchestrator" is used to describe a system implementing such a
process."
SAIN orchestrator: A functional component that is in charge of
fetching the configuration specific to each service instance and
converting it into an assurance graph.
Health status: Score and symptoms indicating whether a service
instance or a subservice is "healthy". A non-maximal score must
always be explained by one or more symptoms.
Health score: Integer ranging from 0 to 100 indicating the health of
a subservice. A score of 0 means that the subservice is broken, a
score of 100 means that the subservice in question is operating as
expected. The special value -1 can be used to specify that no value
could be computed for that health-score, for instance if some metric
needed for that computation could not be collected.
Strongly connected component: subset of a directed graph such that
there is a (directed) path from any node of the subset to any other
node. A DAG does not contain any strongly connected component.
Symptom: Reason explaining why a service instance or a subservice is
not completely healthy.
2. Introduction 1. Introduction
Network service YANG modules [RFC8199] describe the configuration, Network Service YANG Modules [RFC8199] describe the configuration,
state data, operations, and notifications of abstract representations state data, operations, and notifications of abstract representations
of services implemented on one or multiple network elements. of services implemented on one or multiple network elements.
Service orchestrators use Network service YANG modules that will Service orchestrators use Network Service YANG Modules that will
infer network-wide configuration and, therefore the invocation of the infer network-wide configuration and, therefore, the invocation of
appropriate device modules (Section 3 of [RFC8969]). Knowing that a the appropriate device modules (Section 3 of [RFC8969]). Knowing
configuration is applied doesn't imply that the provisioned service that a configuration is applied doesn't imply that the provisioned
instance is up and running as expected. For instance, the service service instance is up and running as expected. For instance, the
might be degraded because of a failure in the network, the service service might be degraded because of a failure in the network, the
quality may be degraded, or a service function may be reachable at service quality may be degraded, or a service function may be
the IP level but does not provide its intended function. Thus, the reachable at the IP level but does not provide its intended function.
network operator must monitor the service's operational data at the Thus, the network operator must monitor the service's operational
same time as the configuration (Section 3.3 of [RFC8969]). To feed data at the same time as the configuration (Section 3.3 of
that task, the industry has been standardizing on telemetry to push [RFC8969]). To fuel that task, the industry has been standardizing
network element performance information (e.g., on telemetry to push network element performance information (e.g.,
[I-D.ietf-opsawg-yang-vpn-service-pm]). [RFC9375]).
A network administrator needs to monitor their network and services A network administrator needs to monitor its network and services as
as a whole, independently of the management protocols. With a whole, independently of the management protocols. With different
different protocols come different data models, and different ways to protocols come different data models and different ways to model the
model the same type of information. When network administrators deal same type of information. When network administrators deal with
with multiple management protocols, the network management entities multiple management protocols, the network management entities have
have to perform the difficult and time-consuming job of mapping data to perform the difficult and time-consuming job of mapping data
models: e.g., the model used for configuration with the model used models, e.g., the model used for configuration with the model used
for monitoring when separate models or protocols are used. This for monitoring when separate models or protocols are used. This
problem is compounded by a large, disparate set of data sources (MIB problem is compounded by a large, disparate set of data sources
modules, YANG models [RFC7950], IPFIX information elements [RFC7011], (e.g., MIB modules, YANG data models [RFC7950], IP Flow Information
syslog plain text [RFC5424], TACACS+ [RFC8907], RADIUS [RFC2865], Export (IPFIX) information elements [RFC7011], syslog plain text
etc.). In order to avoid this data model mapping, the industry [RFC5424], Terminal Access Controller Access-Control System Plus
converged on model-driven telemetry to stream the service operational (TACACS+) [RFC8907], RADIUS [RFC2865], etc.). In order to avoid this
data, reusing the YANG models used for configuration. Model-driven data model mapping, the industry converged on model-driven telemetry
telemetry greatly facilitates the notion of closed-loop automation to stream the service operational data, reusing the YANG data models
whereby events and updated operational state streamed from the used for configuration. Model-driven telemetry greatly facilitates
network drive remediation changes back into the network. the notion of closed-loop automation, whereby events and updated
operational states streamed from the network drive remediation change
back into the network.
However, it proves difficult for network operators to correlate the However, it proves difficult for network operators to correlate the
service degradation with the network root cause. For example, "Why service degradation with the network root cause, for example, "Why
does my layer 3 virtual private network (L3VPN) fail to connect?" or does my layer 3 virtual private network (L3VPN) fail to connect?" or
"Why is this specific service not highly responsive?". The reverse, "Why is this specific service not highly responsive?" The reverse,
i.e., which services are impacted when a network component fails or i.e., which services are impacted when a network component fails or
degrades, is also important for operators. For example, "Which degrades, is also important for operators, for example, "Which
services are impacted when this specific optic decibel milliwatt services are impacted when this specific optic decibel milliwatt
(dBm) begins to degrade?", "Which applications are impacted by an (dBm) begins to degrade?", "Which applications are impacted by an
imbalance in this equal cost multiple paths (ECMP) bundle?", or "Is imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that
that issue actually impacting any other customers?". This task issue actually impacting any other customers?" This task usually
usually falls under the so-called "Service Impact Analysis" falls under the so-called "Service Impact Analysis" functional block.
functional block.
In this document, we propose an architecture implementing Service This document defines an architecture implementing Service Assurance
Assurance for Intent-Based Networking (SAIN). Intent-based for Intent-based Networking (SAIN). Intent-based approaches are
approaches are often declarative, starting from a statement of "The often declarative, starting from a statement of "The service works as
service works as expected" and trying to enforce it. However, some expected" and trying to enforce it. However, some already-defined
already defined services might have been designed using a different services might have been designed using a different approach.
approach. Aligned with Section 3.3 of [RFC7149], and instead of Aligned with Section 3.3 of [RFC7149], and instead of requiring a
requiring a declarative intent as a starting point, this architecture declarative intent as a starting point, this architecture focuses on
focuses on already defined services and tries to infer the meaning of already-defined services and tries to infer the meaning of "The
"The service works as expected". To do so, the architecture works service works as expected". To do so, the architecture works from an
from an assurance graph, deduced from the configuration pushed to the assurance graph, deduced from the configuration pushed to the device
device for enabling the service instance. If the SAIN orchestrator for enabling the service instance. If the SAIN orchestrator supports
supports it, the service model (Section 2 of [RFC8309]) or the it, the service model (Section 2 of [RFC8309]) or the network model
network model (Section 2.1 of [RFC8969]) can also be used to build (Section 2.1 of [RFC8969]) can also be used to build the assurance
the assurance graph. In that case and if the service model includes graph. In that case and if the service model includes the
the declarative intent as well, the SAIN orchestrator can rely on the declarative intent as well, the SAIN orchestrator can rely on the
declared intent instead of inferring it. The assurance graph may declared intent instead of inferring it. The assurance graph may
also be explicitly completed to add an intent not exposed in the also be explicitly completed to add an intent not exposed in the
service model itself. service model itself.
The assurance graph of a service instance is decomposed into The assurance graph of a service instance is decomposed into
components, which are then assured independently. The top of the components, which are then assured independently. The top of the
assurance graph represents the service instance to assure, and its assurance graph represents the service instance to assure, and its
children represent components identified as its direct dependencies; children represent components identified as its direct dependencies;
each component can have dependencies as well. Components involved in each component can have dependencies as well. Components involved in
the assurance graph of a service are called subservices. The SAIN the assurance graph of a service are called subservices. The SAIN
orchestrator updates automatically the assurance graph when the orchestrator updates the assurance graph automatically when the
service instance is modified. service instance is modified.
When a service is degraded, the SAIN architecture will highlight When a service is degraded, the SAIN architecture will highlight
where in the assurance service graph to look, as opposed to going hop where in the assurance service graph to look, as opposed to going hop
by hop to troubleshoot the issue. More precisely, the SAIN by hop to troubleshoot the issue. More precisely, the SAIN
architecture will associate to each service instance a list of architecture will associate to each service instance a list of
symptoms originating from specific subservices, corresponding to symptoms originating from specific subservices, corresponding to
components of the network. These components are good candidates for components of the network. These components are good candidates for
explaining the source of a service degradation. Not only can this explaining the source of a service degradation. Not only can this
architecture help to correlate service degradation with network root architecture help to correlate service degradation with network root
cause/symptoms, but it can deduce from the assurance graph the list cause/symptoms, but it can deduce from the assurance graph the list
of service instances impacted by a component degradation/failure. of service instances impacted by a component degradation/failure.
This added value informs the operational team where to focus its This added value informs the operational team where to focus its
attention for maximum return. Indeed, the operational team is likely attention for maximum return. Indeed, the operational team is likely
to focus their priority on the degrading/failing components impacting to focus their priority on the degrading/failing components impacting
the highest number of their customers, especially the ones with the the highest number of their customers, especially the ones with the
SLA contracts involving penalties in case of failure. Service-Level Agreement (SLA) contracts involving penalties in case
of failure.
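The reverse lookup described above (which service instances are impacted when a component degrades) can be sketched as a walk over the assurance graph with its edges inverted. This is a minimal illustration, not part of the architecture: the function name, the dictionary-based graph encoding, and the node names are all hypothetical.

```python
from collections import defaultdict, deque


def impacted_services(dependencies, service_instances, failed_component):
    """Return the service instances that depend, directly or transitively,
    on failed_component.

    dependencies maps each node to the nodes it depends on (assumed encoding).
    """
    # Invert the edges: for each node, record which nodes depend on it.
    dependents = defaultdict(set)
    for node, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(node)

    # Breadth-first walk upward from the failed component.
    seen = {failed_component}
    queue = deque([failed_component])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    # Keep only the nodes that are service instances.
    return sorted(seen & set(service_instances))


# Hypothetical assurance graph with two L2VPN instances sharing a device.
dependencies = {
    "l2vpn-1": ["tunnel-a", "peer1-interface"],
    "l2vpn-2": ["tunnel-b"],
    "tunnel-a": ["peer1-device", "peer2-device"],
    "tunnel-b": ["peer2-device", "peer3-device"],
    "peer1-interface": ["peer1-device"],
}
services = ["l2vpn-1", "l2vpn-2"]

print(impacted_services(dependencies, services, "peer2-device"))
```

In this example, "peer2-device" underlies both tunnels, so both service instances are reported, whereas a failure of "peer3-device" would only reach "l2vpn-2".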
This architecture provides the building blocks to assure both This architecture provides the building blocks to assure both
physical and virtual entities and is flexible with respect to physical and virtual entities and is flexible with respect to
services and subservices, of (distributed) graphs, and of components services and subservices of (distributed) graphs and components
(Section 3.7). (Section 3.7).
The architecture presented in this document is implemented by a set The architecture presented in this document is implemented by a set
of YANG modules defined in a companion document of YANG modules defined in a companion document [RFC9418]. These
[I-D.ietf-opsawg-service-assurance-yang]. These YANG modules YANG modules properly define the interfaces between the various
properly define the interfaces between the various components of the components of the architecture to foster interoperability.
architecture in order to foster interoperability.
2. Terminology
SAIN agent: A functional component that communicates with a device,
a set of devices, or another agent to build an expression graph
from a received assurance graph and perform the corresponding
computation of the health status and symptoms. A SAIN agent might
be running directly on the device it monitors.
Assurance case: "An assurance case is a structured argument,
supported by evidence, intended to justify that a system is
acceptably assured relative to a concern (such as safety or
security) in the intended operating environment" [Piovesan2017].
Service instance: A specific instance of a service.
Intent: "A set of operational goals (that a network should meet) and
outcomes (that a network is supposed to deliver) defined in a
declarative manner without specifying how to achieve or implement
them" [RFC9315].
Subservice: A part or functionality of the network system that can
be independently assured as a single entity in an assurance graph.
Assurance graph: A Directed Acyclic Graph (DAG) representing the
assurance case for one or several service instances. The nodes
(also known as vertices in the context of DAG) are the service
instances themselves and the subservices; the edges indicate a
dependency relation.
SAIN collector: A functional component that fetches or receives the
computer-consumable output of the SAIN agent(s) and processes it
locally (including displaying it in a user-friendly form).
DAG: Directed Acyclic Graph.
ECMP: Equal-Cost Multipath.
Expression graph: A generic term for a DAG representing a
computation in SAIN. More specific terms are listed below:
Subservice expressions:
An expression graph representing all the computations to
execute for a subservice.
Service expressions:
An expression graph representing all the computations to
execute for a service instance, i.e., including the
computations for all dependent subservices.
Global computation graph:
An expression graph representing all the computations to
execute for all service instances (i.e., all computations
performed).
Dependency: The directed relationship between subservice instances
in the assurance graph.
Metric: A piece of information retrieved from the network running
the assured service.
Metric engine: A functional component, part of the SAIN agent, that
maps metrics to a list of candidate metric implementations,
depending on the network element.
Metric implementation: The actual way of retrieving a metric from a
network element.
Network Service YANG Module: The characteristics of a service, as
agreed upon with consumers of that service [RFC8199].
Service orchestrator: "Network Service YANG Modules describe the
characteristics of a service, as agreed upon with consumers of
that service. That is, a service module does not expose the
detailed configuration parameters of all participating network
elements and features but describes an abstract model that allows
instances of the service to be decomposed into instance data
according to the Network Element YANG Modules of the participating
network elements. The service-to-element decomposition is a
separate process; the details depend on how the network operator
chooses to realize the service. For the purpose of this document,
the term "orchestrator" is used to describe a system implementing
such a process" [RFC8199].
SAIN orchestrator: A functional component that is in charge of
fetching the configuration specific to each service instance and
converting it into an assurance graph.
Health status: The score and symptoms indicating whether a service
instance or a subservice is "healthy". A non-maximal score must
always be explained by one or more symptoms.
Health score: An integer ranging from 0 to 100 that indicates the
health of a subservice. A score of 0 means that the subservice is
broken, a score of 100 means that the subservice in question is
operating as expected, and the special value -1 can be used to
specify that no value could be computed for that health score, for
instance, if some metric needed for that computation could not be
collected.
Strongly connected component: A subset of a directed graph such that
there is a (directed) path from any node of the subset to any
other node. A DAG does not contain any strongly connected
component.
Symptom: A reason explaining why a service instance or a subservice
is not completely healthy.
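Because the assurance graph is defined as a DAG, a candidate graph can be checked for cycles before it is used. The sketch below relies on Python's standard graphlib module (Python 3.9 or later). The function name and the dictionary encoding (each node mapped to the nodes it depends on) are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def check_assurance_graph(dependencies):
    """Return the nodes ordered so that dependencies come first.

    Raises ValueError if the graph contains a cycle and therefore
    cannot be an assurance graph.
    """
    # TopologicalSorter expects {node: predecessors}; a node's
    # dependencies play the role of its predecessors here.
    sorter = TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as err:
        # The second element of args holds the nodes of the detected cycle.
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err


# Hypothetical graph: a service over a tunnel between two devices.
order = check_assurance_graph({
    "l2vpn": ["tunnel"],
    "tunnel": ["peer1-device", "peer2-device"],
})
print(order)
```

The returned order is also a valid order in which to evaluate health statuses, since every subservice appears after the subservices it depends on.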
3. A Functional Architecture 3. A Functional Architecture
The goal of SAIN is to assure that service instances are operating as The goal of SAIN is to assure that service instances are operating as
expected (i.e., the observed service is matching the expected expected (i.e., the observed service is matching the expected
service) and if not, to pinpoint what is wrong. More precisely, SAIN service) and, if not, to pinpoint what is wrong. More precisely,
computes a score for each service instance and outputs symptoms SAIN computes a score for each service instance and outputs symptoms
explaining that score. The only valid situation where no symptoms explaining that score. The only valid situation where no symptoms
are returned is when the score is maximal, indicating that no issues are returned is when the score is maximal, indicating that no issues
were detected for that service instance. The score augmented with were detected for that service instance. The score augmented with
the symptoms is called the health status. The exact meaning of the the symptoms is called the health status. The exact meaning of the
health score value is out of scope of this document. However the health score value is out of scope of this document. However, the
following constraints should be followed: the higher the score, the following constraints should be followed: the higher the score, the
better the service health is; the two extrema being 0 meaning the better the service health is and the two extrema are 0 meaning the
service is completely broken and 100 meaning the service is service is completely broken, and 100 meaning the service is
completely operational. completely operational.
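The constraints on the health status can be captured in a small data structure. The following is a sketch only; the class and constant names are hypothetical and are not taken from the companion YANG modules. The text does not say whether the special value -1 (no value could be computed) must come with symptoms, so the sketch leaves that case unconstrained as an explicit assumption.

```python
from dataclasses import dataclass

MAX_SCORE = 100     # completely operational
UNKNOWN_SCORE = -1  # no value could be computed


@dataclass(frozen=True)
class HealthStatus:
    """A health score together with the symptoms that explain it."""
    score: int
    symptoms: tuple = ()

    def __post_init__(self):
        # The score is either the special value -1 or within 0..100.
        if self.score != UNKNOWN_SCORE and not 0 <= self.score <= MAX_SCORE:
            raise ValueError(f"score out of range: {self.score}")
        # A non-maximal score must be explained by at least one symptom.
        # Assumption: -1 is not subject to this rule in this sketch.
        if 0 <= self.score < MAX_SCORE and not self.symptoms:
            raise ValueError("a non-maximal score requires at least one symptom")


healthy = HealthStatus(100)
degraded = HealthStatus(40, ("Interface flapping",))
print(healthy, degraded)
```

Rejecting a degraded score without symptoms at construction time keeps the invariant in one place instead of relying on every consumer to check it.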
The SAIN architecture is a generic architecture, which generates an The SAIN architecture is a generic architecture, which generates an
assurance graph from service instance(s), as specified in assurance graph from service instance(s), as specified in
Section 3.1). This architecture is applicable to multiple Section 3.1. This architecture is applicable to not only multiple
environments (e.g. wireline, wireless), but also different domains environments (e.g., wireline and wireless) but also different domains
(e.g. 5G network function virtualization (NFV) domain with a virtual (e.g., 5G network function virtualization (NFV) domain with a virtual
infrastructure manager (VIM), etc.), and as already noted, for infrastructure manager (VIM), etc.) and, as already noted, for
physical or virtual devices, as well as virtual functions. Thanks to physical or virtual devices, as well as virtual functions. Thanks to
the distributed graph design principle, graphs from different the distributed graph design principle, graphs from different
environments/orchestrator can be combined to obtain the graph of a environments and orchestrators can be combined to obtain the graph of
service instance that spans over multiple domains. a service instance that spans over multiple domains.
As an example of a service, let us consider a point-to-point level 2 As an example of a service, let us consider a point-to-point layer 2
virtual private network (L2VPN). [RFC8466] specifies the parameters virtual private network (L2VPN). [RFC8466] specifies the parameters
for such a service. Examples of symptoms might be symptoms reported for such a service. Examples of symptoms might be symptoms reported
by specific subservices "Interface has high error rate" or "Interface by specific subservices, including "Interface has high error rate",
flapping", or "Device almost out of memory" as well as symptoms more "Interface flapping", or "Device almost out of memory", as well as
specific to the service such as "Site disconnected from VPN". symptoms more specific to the service (such as "Site disconnected
from VPN").
To compute the health status of an instance of such a service, the To compute the health status of an instance of such a service, the
service definition is decomposed into an assurance graph formed by service definition is decomposed into an assurance graph formed by
subservices linked through dependencies. Each subservice is then subservices linked through dependencies. Each subservice is then
turned into an expression graph that details how to fetch metrics turned into an expression graph that details how to fetch metrics
from the devices and compute the health status of the subservice. from the devices and compute the health status of the subservice.
The subservice expressions are combined according to the dependencies The subservice expressions are combined according to the dependencies
between the subservices in order to obtain the expression graph which between the subservices in order to obtain the expression graph that
computes the health status of the service instance. computes the health status of the service instance.
The overall SAIN architecture is presented in Figure 1. Based on the The overall SAIN architecture is presented in Figure 1. Based on the
service configuration provided by the service orchestrator, the SAIN service configuration provided by the service orchestrator, the SAIN
orchestrator decomposes the assurance graph. It then sends to the orchestrator decomposes the assurance graph. It then sends to the
SAIN agents the assurance graph along with some other configuration SAIN agents the assurance graph along with some other configuration
options. The SAIN agents are responsible for building the expression options. The SAIN agents are responsible for building the expression
graph and computing the health statuses in a distributed manner. The graph and computing the health statuses in a distributed manner. The
collector is in charge of collecting and displaying the current collector is in charge of collecting and displaying the current
inferred health status of the service instances and subservices. The inferred health status of the service instances and subservices. The
collector also detects changes in the assurance graph structures, for collector also detects changes in the assurance graph structures
instance when a switchover from primary to backup path occurs, and (e.g., an occurrence of a switchover from primary to backup path) and
forwards to the orchestrator, which reconfigures the agents. forwards the information to the orchestrator, which reconfigures the
Finally, the automation loop is closed by having the SAIN collector agents. Finally, the automation loop is closed by having the SAIN
providing feedback to the network/service orchestrator. collector provide feedback to the network/service orchestrator.
In order to make agents, orchestrators and collectors from different In order to make agents, orchestrators, and collectors from different
vendors interoperable, their interface is defined as a YANG model in vendors interoperable, their interface is defined as a YANG module in
a companion document [I-D.ietf-opsawg-service-assurance-yang]. In a companion document [RFC9418]. In Figure 1, the communications that
Figure 1, the communications that are normalized by this YANG model are normalized by this YANG module are tagged with a "Y". The use of
are tagged with a "Y". The use of this YANG model is further this YANG module is further explained in Section 3.5.
explained in Section 3.5.
+-----------------+ +-----------------+
| Service | | Service |
| Orchestrator |<----------------------+ | Orchestrator |<----------------------+
| | | | | |
+-----------------+ | +-----------------+ |
| ^ | | ^ |
| | Network | | | Network |
| | Service | Feedback | | Service | Feedback
| | Instance | Loop | | Instance | Loop
| | Configuration | | | Configuration |
| | | | | |
| V | | V |
| +-----------------+ Graph +-------------------+ | +-----------------+ Graph +-------------------+
| | SAIN | updates | SAIN | | | SAIN | Updates | SAIN |
| | Orchestrator |<--------| Collector | | | Orchestrator |<--------| Collector |
| +-----------------+ +-------------------+ | +-----------------+ +-------------------+
| | ^ | | ^
| Y| Configuration | Health Status | Y| Configuration | Health Status
| | (assurance graph) Y| (Score + Symptoms) | | (Assurance Graph) Y| (Score + Symptoms)
| V | Streamed | V | Streamed
| +-------------------+ | via Telemetry | +-------------------+ | via Telemetry
| |+-------------------+ | | |+-------------------+ |
| ||+-------------------+ | | ||+-------------------+ |
| +|| SAIN |-----------+ | +|| SAIN |-----------+
| +| agent | | +| Agent |
| +-------------------+ | +-------------------+
| ^ ^ ^ | ^ ^ ^
| | | | | | | |
| | | | Metric Collection | | | | Metric Collection
V V V V V V V V
+-------------------------------------------------------------+ +-------------------------------------------------------------+
| (Network) System | | (Network) System |
| | | |
+-------------------------------------------------------------+ +-------------------------------------------------------------+
skipping to change at page 10, line 5 skipping to change at line 407
In order to produce the score assigned to a service instance, the In order to produce the score assigned to a service instance, the
various involved components perform the following tasks: various involved components perform the following tasks:
* Analyze the configuration pushed to the network device(s) for * Analyze the configuration pushed to the network device(s) for
configuring the service instance. From there, determine which configuring the service instance. From there, determine which
information (called a metric) must be collected from the device(s) information (called a metric) must be collected from the device(s)
and which operations to apply to the metrics to compute the health and which operations to apply to the metrics to compute the health
status. status.
* Stream (via telemetry [RFC8641]) operational and config metric * Stream (via telemetry, such as YANG-Push [RFC8641]) operational
values when possible, else continuously poll. and config metric values when possible, else continuously poll.
* Continuously compute the health status of the service instances, * Continuously compute the health status of the service instances
based on the metric values. based on the metric values.
The SAIN architecture requires time synchronization, with Network The SAIN architecture requires time synchronization, with the Network
Time Protocol (NTP) [RFC5905] as a candidate, between all elements: Time Protocol (NTP) [RFC5905] as a candidate, between all elements:
monitored entities, SAIN agents, Service orchestrator, the SAIN monitored entities, SAIN agents, service orchestrator, the SAIN
collector, as well as the SAIN orchestrator. This guarantees the collector, as well as the SAIN orchestrator. This guarantees the
correlations of all symptoms in the system, correlated with the right correlations of all symptoms in the system, correlated with the right
assurance graph version. assurance graph version.
3.1. Translating a Service Instance Configuration into an Assurance 3.1. Translating a Service Instance Configuration into an Assurance
Graph Graph
In order to structure the assurance of a service instance, the SAIN In order to structure the assurance of a service instance, the SAIN
orchestrator decomposes the service instance into so-called orchestrator decomposes the service instance into so-called
subservice instances. Each subservice instance focuses on a specific subservice instances. Each subservice instance focuses on a specific
feature or subpart of the service. feature or subpart of the service.
The decomposition into subservices is an important function of the The decomposition into subservices is an important function of the
architecture, for the following reasons: architecture for the following reasons:
* The result of this decomposition provides a relational picture of * The result of this decomposition provides a relational picture of
a service instance, that can be represented as a graph (called a service instance, which can be represented as a graph (called an
assurance graph) to the operator. assurance graph) to the operator.
* Subservices provide a scope for particular expertise and thereby * Subservices provide a scope for particular expertise and thereby
enable contribution from external experts. For instance, the enable contribution from external experts. For instance, the
subservice dealing with the optics health should be reviewed and subservice dealing with the optic's health should be reviewed and
extended by an expert in optical interfaces. extended by an expert in optical interfaces.
* Subservices that are common to several service instances are * Subservices that are common to several service instances are
reused for reducing the amount of computation needed. For reused for reducing the amount of computation needed. For
instance, the subservice assuring a given interface is reused by instance, the subservice assuring a given interface is reused by
any service instance relying on that interface. any service instance relying on that interface.
The assurance graph of a service instance is a DAG representing the The assurance graph of a service instance is a DAG representing the
structure of the assurance case for the service instance. The nodes structure of the assurance case for the service instance. The nodes
of this graph are service instances or subservice instances. Each of this graph are service instances or subservice instances. Each
edge of this graph indicates a dependency between the two nodes at edge of this graph indicates a dependency between the two nodes at
its extremities: the service or subservice at the source of the edge its extremities, i.e., the service or subservice at the source of the
depends on the service or subservice at the destination of the edge. edge depends on the service or subservice at the destination of the
edge.
Figure 2 depicts a simplistic example of the assurance graph for a Figure 2 depicts a simplistic example of the assurance graph for a
tunnel service. The node at the top is the service instance, the tunnel service. The node at the top is the service instance; the
nodes below are its dependencies. In the example, the tunnel service nodes below are its dependencies. In the example, the tunnel service
instance depends on the "peer1" and "peer2" tunnel interfaces (the instance depends on the "peer1" and "peer2" tunnel interfaces (the
tunnel interfaces created on the peer1 and peer2 devices, tunnel interfaces created on the peer1 and peer2 devices,
respectively), which in turn depend on the respective physical respectively), which in turn depend on the respective physical
interfaces, which finally depend on the respective "peer1" and interfaces, which finally depend on the respective "peer1" and
"peer2" devices. The tunnel service instance also depends on the IP "peer2" devices. The tunnel service instance also depends on the IP
connectivity that depends on the IS-IS routing protocol. connectivity that depends on the IS-IS routing protocol.
+------------------+ +------------------+
| Tunnel | | Tunnel |
skipping to change at page 12, line 7 skipping to change at line 497
+-------------+ +-------------+ +-------------+ +-------------+
| | | | | | | |
| Peer1 | | Peer2 | | Peer1 | | Peer2 |
| Device | | Device | | Device | | Device |
+-------------+ +-------------+ +-------------+ +-------------+
Figure 2: Assurance Graph Example Figure 2: Assurance Graph Example
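As an illustration only (not part of either document version), the dependencies described for Figure 2 can be sketched as an adjacency mapping from each node to its direct dependencies. The node labels and the helper name `all_dependencies` are assumptions made for this sketch; only the dependency relations come from the text.

```python
# Assurance graph of Figure 2: node -> direct dependencies.
# Labels are hypothetical; the edges follow the prose description.
ASSURANCE_GRAPH = {
    "tunnel-service": ["peer1-tunnel-if", "peer2-tunnel-if", "ip-connectivity"],
    "peer1-tunnel-if": ["peer1-physical-if"],
    "peer2-tunnel-if": ["peer2-physical-if"],
    "peer1-physical-if": ["peer1-device"],
    "peer2-physical-if": ["peer2-device"],
    "ip-connectivity": ["is-is"],
    "peer1-device": [],
    "peer2-device": [],
    "is-is": [],
}


def all_dependencies(graph, node):
    """Return every node reachable from `node` (direct and indirect)."""
    seen = set()
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen
```

Calling `all_dependencies(ASSURANCE_GRAPH, "tunnel-service")` returns every other node of the graph, which is the set of subservices whose symptoms may affect the tunnel service instance.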
Depicting the assurance graph helps the operator to understand (and Depicting the assurance graph helps the operator to understand (and
assert) the decomposition. The assurance graph shall be maintained assert) the decomposition. The assurance graph shall be maintained
during normal operation with addition, modification and removal of during normal operation with addition, modification, and removal of
service instances. A change in the network configuration or topology service instances. A change in the network configuration or topology
shall automatically be reflected in the assurance graph. As a first shall automatically be reflected in the assurance graph. As a first
example, a change of routing protocol from IS-IS to OSPF would change example, a change of the routing protocol from IS-IS to OSPF would
the assurance graph accordingly. As a second example, assuming that change the assurance graph accordingly. As a second example, assume
ECMP is in place for the source router for that specific tunnel; in that the ECMP is in place for the source router for that specific
that case, multiple interfaces must now be monitored, on top of the tunnel; in that case, multiple interfaces must now be monitored, in
monitoring the ECMP health itself. addition to monitoring the ECMP health itself.
3.1.1. Circular Dependencies 3.1.1. Circular Dependencies
The edges of the assurance graph represent dependencies. An The edges of the assurance graph represent dependencies. An
assurance graph is a DAG if and only if there are no circular assurance graph is a DAG if and only if there are no circular
dependencies among the subservices, and every assurance graph should dependencies among the subservices, and every assurance graph should
avoid circular dependencies. However, in some cases, circular avoid circular dependencies. However, in some cases, circular
dependencies might appear in the assurance graph. dependencies might appear in the assurance graph.
First, the assurance graph of a whole system is obtained by combining First, the assurance graph of a whole system is obtained by combining
the assurance graph of every service running on that system. Here the assurance graph of every service running on that system. Here,
combining means that two subservices having the same type and the combining means that two subservices having the same type and the
same parameters are in fact the same subservice and thus a single same parameters are in fact the same subservice and thus a single
node in the graph. For instance, the subservice of type "device" node in the graph. For instance, the subservice of type "device"
with the only parameter (the device ID) set to "PE1" will appear only with the only parameter (the device ID) set to "PE1" will appear only
once in the whole assurance graph even if several service instances once in the whole assurance graph, even if several service instances
rely on that device. Now, if two engineers design assurance graphs rely on that device. Now, if two engineers design assurance graphs
for two different services, and engineer A decides that an interface for two different services, and Engineer A decides that an interface
depends on the link it is connected to, but engineer B decides that depends on the link it is connected to, but Engineer B decides that
the link depends on the interface it is connected to, then when the link depends on the interface it is connected to, then when
combining the two assurance graphs, we will have a circular combining the two assurance graphs, we will have a circular
dependency interface -> link -> interface. dependency interface -> link -> interface.
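The combining rule above can be sketched as follows, assuming that a subservice is identified by its type plus its parameters. The helper names `subservice` and `merge` and the parameter names (`device`, `name`, `id`) are hypothetical and chosen only for this illustration.

```python
def subservice(kind, **params):
    """Identity of a subservice: its type plus its (sorted) parameters."""
    return (kind, tuple(sorted(params.items())))


def merge(*graphs):
    """Union of assurance graphs; identical subservices become one node."""
    merged = {}
    for graph in graphs:
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
            for dep in deps:
                merged.setdefault(dep, set())  # keep leaf nodes in the graph
    return merged


# Engineer A: the interface depends on the link.
# Engineer B: the link depends on the interface.
iface = subservice("interface", device="PE1", name="eth0")
link = subservice("link", id="L1")
graph_a = {iface: {link}}
graph_b = {link: {iface}}
merged = merge(graph_a, graph_b)
```

In `merged`, the interface and the link each appear once, and each depends on the other, which is the circular dependency interface -> link -> interface described above.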
Another case possibly resulting in circular dependencies is when Another case possibly resulting in circular dependencies is when
subservices are not properly identified. Assume that we want to subservices are not properly identified. Assume that we want to
assure a cloud-based computing cluster that runs containers. We assure a cloud-based computing cluster that runs containers. We
could represent the cluster by a subservice and the network service could represent the cluster by a subservice and the network service
connecting containers on the cluster by another subservice. We will connecting containers on the cluster by another subservice. We would
likely model that the network service depends on the cluster, because likely model that as the network service depending on the cluster,
the network service runs in a container supported by the cluster. because the network service runs in a container supported by the
Conversely, the cluster depends on the network service for cluster. Conversely, the cluster depends on the network service for
connectivity between containers, which creates a circular dependency. connectivity between containers, which creates a circular dependency.
A finer decomposition might distinguish between the resources for A finer decomposition might distinguish between the resources for
executing containers (a part of our cluster subservice) and the executing containers (a part of our cluster subservice) and the
communication between the containers (which could be modelled in the communication between the containers (which could be modeled in the
same way as communication between routers). same way as communication between routers).
In any case, it is likely that circular dependencies will show up in In any case, it is likely that circular dependencies will show up in
the assurance graph. A first step would be to detect circular the assurance graph. A first step would be to detect circular
dependencies as soon as possible in the SAIN architecture. Such a dependencies as soon as possible in the SAIN architecture. Such a
detection could be carried out by the SAIN orchestrator. Whenever a detection could be carried out by the SAIN orchestrator. Whenever a
circular dependency is detected, the newly added service would not be circular dependency is detected, the newly added service would not be
monitored until more careful modelling or alignment between the monitored until more careful modeling or alignment between the
different teams (engineer A and B) remove the circular dependency. different teams (Engineers A and B) remove the circular dependency.
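A minimal sketch of such a detection step, assuming the assurance graph is held as a mapping from each node to the set of its dependencies, is a depth-first search that reports the first cycle it meets. The function names `find_cycle` and `try_add_service` are hypothetical; neither document version prescribes how a SAIN orchestrator implements this check.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None for a DAG."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:      # back edge: dep is on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def try_add_service(current, new_service_graph):
    """Merge a new service graph; keep the old graph if a cycle appears."""
    candidate = {node: set(deps) for node, deps in current.items()}
    for node, deps in new_service_graph.items():
        candidate.setdefault(node, set()).update(deps)
    cycle = find_cycle(candidate)
    if cycle:
        return current, cycle      # the new service is not monitored yet
    return candidate, None
```

When `try_add_service` returns a cycle, the list of nodes can be handed to the teams involved so that they can agree on the direction of the conflicting dependency.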
As more elaborate solution we could consider a graph transformation: As a more elaborate solution, we could consider a graph
transformation:
* Decompose the graph into strongly connected components. * Decompose the graph into strongly connected components.
* For each strongly connected component: * For each strongly connected component:
- Remove all edges between nodes of the strongly connected - remove all edges between nodes of the strongly connected
component component;
- Add a new "synthetic" node for the strongly connected component - add a new "synthetic" node for the strongly connected
component;
- For each edge pointing to a node in the strongly connected - for each edge pointing to a node in the strongly connected
component, change the destination to the "synthetic" node component, change the destination to the "synthetic" node; and
- Add a dependency from the "synthetic" node to every node in the - add a dependency from the "synthetic" node to every node in the
strongly connected component. strongly connected component.
Such an algorithm would include all symptoms detected by any Such an algorithm would include all symptoms detected by any
subservice in one of the strongly component and make it available to subservice in one of the strongly connected components and make it
any subservice that depends on it. Figure 3 shows an example of such available to any subservice that depends on it. Figure 3 shows an
a transformation. On the left-hand side, the nodes c, d, e and f example of such a transformation. On the left-hand side, the nodes
form a strongly connected component. The status of node a should c, d, e, and f form a strongly connected component. The status of
depend on the status of nodes c, d, e, f, g, and h, but this is hard node a should depend on the status of nodes c, d, e, f, g, and h, but
to compute because of the circular dependency. On the right hand- this is hard to compute because of the circular dependency. On the
side, a depends on all these nodes as well, but there the circular right-hand side, node a depends on all these nodes as well, but the
dependency has been removed. circular dependency has been removed.
+---+ +---+ | +---+ +---+ +---+ +---+ | +---+ +---+
| a | | b | | | a | | b | | a | | b | | | a | | b |
+---+ +---+ | +---+ +---+ +---+ +---+ | +---+ +---+
| | | | | | | | | |
v v | v v v v | v v
+---+ +---+ | +------------+ +---+ +---+ | +------------+
| c |--->| d | | | synthetic | | c |--->| d | | | synthetic |
+---+ +---+ | +------------+ +---+ +---+ | +------------+
^ | | / | | \ ^ | | / | | \
skipping to change at page 14, line 28 skipping to change at line 602
+---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+
| | | | | | | | | |
v v | v v v v | v v
+---+ +---+ | +---+ +---+ +---+ +---+ | +---+ +---+
| g | | h | | | g | | h | | g | | h | | | g | | h |
+---+ +---+ | +---+ +---+ +---+ +---+ | +---+ +---+
Before After Before After
Transformation Transformation Transformation Transformation
Figure 3: Graph transformation Figure 3: Graph Transformation
We consider a concrete example to illustrate this transformation. We consider a concrete example to illustrate this transformation.
Let's assume that Engineer A is building an assurance graph dealing Let's assume that Engineer A is building an assurance graph dealing
with IS-IS and Engineer B is building an assurance graph dealing with with IS-IS and Engineer B is building an assurance graph dealing with
OSPF. The graph from Engineer A could contain the following: OSPF. The graph from Engineer A could contain the following:
+------------+ +------------+
| IS-IS Link | | IS-IS Link |
+------------+ +------------+
| |
v v
+------------+ +------------+
| Phys. Link | | Phys. Link |
+------------+ +------------+
| | | |
v v v v
+-------------+ +-------------+ +-------------+ +-------------+
| Interface 1 | | Interface 2 | | Interface 1 | | Interface 2 |
+-------------+ +-------------+ +-------------+ +-------------+
Figure 4: Fragment of assurance graph from Engineer A Figure 4: Fragment of the Assurance Graph from Engineer A
The graph from Engineer B could contain the following: The graph from Engineer B could contain the following:
+------------+ +------------+
| OSPF Link | | OSPF Link |
+------------+ +------------+
| | | | | |
v | v v | v
+-------------+ | +-------------+ +-------------+ | +-------------+
| Interface 1 | | | Interface 2 | | Interface 1 | | | Interface 2 |
+-------------+ | +-------------+ +-------------+ | +-------------+
| | | | | |
v v v v v v
+------------+ +------------+
| Phys. Link | | Phys. Link |
+------------+ +------------+
Figure 5: Fragment of assurance graph from Engineer B Figure 5: Fragment of the Assurance Graph from Engineer B
Each Interface subservice and the Physical Link subservice are common The Interface subservices and the Physical Link subservice are common
to both fragments above. Each of these subservice appears only once to both fragments above. Each of these subservices appears only once
in the graph merging the two fragments. Dependencies from both in the graph merging the two fragments. Dependencies from both
fragments are included in the merged graph, resulting in a circular fragments are included in the merged graph, resulting in a circular
dependency: dependency:
+------------+ +------------+ +------------+ +------------+
| IS-IS Link | | OSPF Link |---+ | IS-IS Link | | OSPF Link |---+
+------------+ +------------+ | +------------+ +------------+ |
| | | | | | | |
| +-------- + | | | +-------- + | |
v v | | v v | |
skipping to change at page 15, line 46 skipping to change at line 668
| ^ | | | | | ^ | | | |
| | +-------+ | | | | | +-------+ | | |
v | v | v | v | v | v |
+-------------+ +-------------+ | +-------------+ +-------------+ |
| Interface 1 | | Interface 2 | | | Interface 1 | | Interface 2 | |
+-------------+ +-------------+ | +-------------+ +-------------+ |
^ | ^ |
| | | |
+------------------------------+ +------------------------------+
Figure 6: Merging graphs from A and B Figure 6: Merging Graphs from Engineers A and B
The solution presented above would result in graph looking as The solution presented above would result in a graph looking as
follows, where a new "synthetic" node is included. Using that follows, where a new "synthetic" node is included. Using that
transformation, all dependencies are indirectly satisfied for the transformation, all dependencies are indirectly satisfied for the
nodes outside the circular dependency, in the sense that both IS-IS nodes outside the circular dependency, in the sense that both IS-IS
and OSPF links have indirect dependencies to the two interfaces and and OSPF links have indirect dependencies to the two interfaces and
the link. However, the dependencies between the link and the the link. However, the dependencies between the link and the
interfaces are lost as they were causing the circular dependency. interfaces are lost since they were causing the circular dependency.
+------------+ +------------+ +------------+ +------------+
| IS-IS Link | | OSPF Link | | IS-IS Link | | OSPF Link |
+------------+ +------------+ +------------+ +------------+
| | | |
v v v v
+------------+ +------------+
| synthetic | | synthetic |
+------------+ +------------+
| |
+-----------+-------------+ +-----------+-------------+
| | | | | |
v v v v v v
+-------------+ +------------+ +-------------+ +-------------+ +------------+ +-------------+
| Interface 1 | | Phys. Link | | Interface 2 | | Interface 1 | | Phys. Link | | Interface 2 |
+-------------+ +------------+ +-------------+ +-------------+ +------------+ +-------------+
Figure 7: Removing circular dependencies after merging graphs Figure 7: Removing Circular Dependencies after Merging Graphs
from A and B from Engineers A and B
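The transformation can be sketched as follows and checked against Figures 6 and 7. This is an illustration under stated assumptions, not a reference implementation: the graph is a mapping from each node to the set of its dependencies, strongly connected components are computed with Tarjan's algorithm, the function names and the `synthetic-N` naming are hypothetical, and single-node components (including self-loops) are left untouched.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; `graph` maps each node to its dependencies."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:       # node is the root of a component
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            result.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return result


def remove_cycles(graph):
    """Replace each multi-node strongly connected component as described."""
    new_graph = {node: set(deps) for node, deps in graph.items()}
    cyclic = [c for c in strongly_connected_components(graph) if len(c) > 1]
    for number, component in enumerate(cyclic):
        synthetic = f"synthetic-{number}"
        for node, deps in new_graph.items():
            if node in component:
                deps -= component          # remove edges inside the component
            elif deps & component:
                deps -= component          # redirect incoming edges ...
                deps.add(synthetic)        # ... to the synthetic node
        new_graph[synthetic] = set(component)  # synthetic depends on all
    return new_graph


# Merged graph of Figure 6 (fragments from Engineers A and B).
merged = {
    "IS-IS Link": {"Phys. Link"},
    "OSPF Link": {"Interface 1", "Phys. Link", "Interface 2"},
    "Phys. Link": {"Interface 1", "Interface 2"},
    "Interface 1": {"Phys. Link"},
    "Interface 2": {"Phys. Link"},
}
result = remove_cycles(merged)
```

In `result`, both "IS-IS Link" and "OSPF Link" depend only on the synthetic node, which in turn depends on the two interfaces and the physical link, matching Figure 7.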
3.2. Intent and Assurance Graph 3.2. Intent and Assurance Graph
The SAIN orchestrator analyzes the configuration of a service The SAIN orchestrator analyzes the configuration of a service
instance to: instance to do the following:
* Try to capture the intent of the service instance, i.e., what is * Try to capture the intent of the service instance, i.e., What is
the service instance trying to achieve. At least, this requires the service instance trying to achieve? At a minimum, this
the SAIN orchestrator to know the YANG modules that are being requires the SAIN orchestrator to know the YANG modules that are
configured on the devices to enable the service. Note that if the being configured on the devices to enable the service. Note that,
service model or the network model is known to the SAIN if the service model or the network model is known to the SAIN
orchestrator, the latter can exploit it. In that case, the intent orchestrator, the latter can exploit it. In that case, the intent
could be directly extracted and include more details, such as the could be directly extracted and include more details, such as the
notion of sites for a VPN, which is out of scope of the device notion of sites for a VPN, which is out of scope of the device
configuration. configuration.
* Decompose the service instance into subservices representing the * Decompose the service instance into subservices representing the
network features on which the service instance relies. network features on which the service instance relies.
The SAIN orchestrator must be able to analyze configuration pushed to The SAIN orchestrator must be able to analyze the configuration
various devices for configuring a service instance and produce the pushed to various devices of a service instance and produce the
assurance graph for that service instance. assurance graph for that service instance.
To schematize what a SAIN orchestrator does, assume that the To schematize what a SAIN orchestrator does, assume that a service
configuration for a service instance touches two devices and instance touches two devices and configures a virtual tunnel
configure on each device a virtual tunnel interface. Then: interface on each device. Then:
* Capturing the intent would start by detecting that the service * Capturing the intent would start by detecting that the service
instance is actually a tunnel between the two devices, and stating instance is actually a tunnel between the two devices and stating
that this tunnel must be functional. This solution is minimally that this tunnel must be operational. This solution is minimally
invasive as it does not require modifying nor knowing the service invasive, as it does not require modifying nor knowing the service
model. If the service model or network model is known by the SAIN model. If the service model or network model is known by the SAIN
orchestrator, it can be used to further capture the intent and orchestrator, it can be used to further capture the intent and
include more information such as Service Level Objectives. For include more information, such as Service-Level Objectives (e.g.,
instance, the latency and bandwidth requirements for the tunnel, the latency and bandwidth requirements for the tunnel) if present
if present in the service model in the service model.
* Decomposing the service instance into subservices would result in * Decomposing the service instance into subservices would result in
the assurance graph depicted in Figure 2, for instance. the assurance graph depicted in Figure 2, for instance.
The assurance graph, or more precisely the subservices and The assurance graph, or more precisely the subservices and
dependencies that a SAIN orchestrator can instantiate, should be dependencies that a SAIN orchestrator can instantiate, should be
curated. The organization of such a process is out-of-scope for this curated. The organization of such a process (i.e., ensure that
document and should aim to: existing subservices are reused as much as possible and avoid
circular dependencies) is out-of-scope for this document.
* Ensure that existing subservices are reused as much as possible.
* Avoid circular dependencies.
To be applied, SAIN requires a mechanism mapping a service instance To be applied, SAIN requires a mechanism mapping a service instance
to the configuration actually required on the devices for that to the configuration actually required on the devices for that
service instance to run. While the Figure 1 makes a distinction service instance to run. While Figure 1 makes a distinction between
between the SAIN orchestrator and a different component providing the the SAIN orchestrator and a different component providing the service
service instance configuration, in practice those two components are instance configuration, in practice those two components are most
mostly likely combined. The internals of the orchestrator are out of likely combined. The internals of the orchestrator are out of scope
scope of this document. of this document.
3.3. Subservices 3.3. Subservices
A subservice corresponds to subpart or a feature of the network A subservice corresponds to a subpart or a feature of the network
system that is needed for a service instance to function properly. system that is needed for a service instance to function properly.
In the context of SAIN, a subservice is associated to its assurance, In the context of SAIN, a subservice is associated to its assurance,
that is the method for assuring that a subservice behaves correctly. which is the method for assuring that a subservice behaves correctly.
Subservices, just as with services, have high-level parameters that Subservices, just as with services, have high-level parameters that
specify the instance to be assured. The needed parameters depend on specify the instance to be assured. The needed parameters depend on
the subservice type. For example, assuring a device requires a the subservice type. For example, assuring a device requires a
specific deviceId as parameter. For example, assuring an interface specific deviceId as a parameter and assuring an interface requires a
requires a specific combination of deviceId and interfaceId. specific combination of deviceId and interfaceId.
When designing a new type of subservice, one should carefully define When designing a new type of subservice, one should carefully define
what is the assured object or functionality. Then, the parameters what is the assured object or functionality. Then, the parameters
must be chosen as a minimal set that completely identify the object must be chosen as a minimal set that completely identifies the object
(see examples from the previous paragraph). Parameters cannot change (see examples from the previous paragraph). Parameters cannot change
during the lifecycle of a subservice. For instance, an IP address is during the life cycle of a subservice. For instance, an IP address
a good parameter when assuring a connectivity towards that address is a good parameter when assuring a connectivity towards that address
(i.e. a given device can reach a given IP address), however it's not (i.e., a given device can reach a given IP address); however, it's
a good parameter to identify an interface as the IP address assigned not a good parameter to identify an interface, as the IP address
to that interface can be changed. assigned to that interface can be changed.
A subservice is also characterized by a list of metrics to fetch and A subservice is also characterized by a list of metrics to fetch and
a list of operations to apply to these metrics in order to infer a a list of operations to apply to these metrics in order to infer a
health status. health status.
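As a minimal sketch of the parameter rules above (a minimal identifying set that cannot change during the life cycle), a subservice instance can be modeled as an immutable value. The class name, the helper, and the "eth0" value are hypothetical; only deviceId and interfaceId come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subservice:
    """Hypothetical model: a subservice type plus its identifying parameters."""
    type: str
    parameters: tuple  # sorted (name, value) pairs, so equality ignores order


def make_subservice(type_, **parameters):
    # Sorting the pairs makes two instances with the same parameters equal.
    return Subservice(type_, tuple(sorted(parameters.items())))


# Assuring a device needs a deviceId; an interface needs deviceId + interfaceId.
device = make_subservice("device", deviceId="Peer1")
interface = make_subservice("interface", deviceId="Peer1", interfaceId="eth0")
```

Because the dataclass is frozen, any attempt to reassign a parameter after creation raises an error, which mirrors the rule that parameters are fixed for the life of the subservice.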
3.4. Building the Expression Graph from the Assurance Graph 3.4. Building the Expression Graph from the Assurance Graph
From the assurance graph is derived a so-called global computation From the assurance graph, a so-called global computation graph is
graph. First, each subservice instance is transformed into a set of derived. First, each subservice instance is transformed into a set
subservice expressions that take metrics and constants as input of subservice expressions that take metrics and constants as input
(i.e., sources of the DAG) and produce the status of the subservice, (i.e., sources of the DAG) and produce the status of the subservice
based on some heuristics. For instance, the health of an interface based on some heuristics. For instance, the health of an interface
is 0 (minimal score) with the symptom "interface admin-down" if the is 0 (minimal score) with the symptom "interface admin-down" if the
interface is disabled in the configuration. Then for each service interface is disabled in the configuration. Then, for each service
instance, the service expressions are constructed by combining the instance, the service expressions are constructed by combining the
subservice expressions of its dependencies. The way service subservice expressions of its dependencies. The way service
expressions are combined depends on the dependency types (impacting expressions are combined depends on the dependency types (impacting
or informational). Finally, the global computation graph is built by or informational). Finally, the global computation graph is built by
combining the service expressions, to get a global view of all combining the service expressions to get a global view of all
subservices. In other words, the global computation graph encodes subservices. In other words, the global computation graph encodes
all the operations needed to produce health statuses from the all the operations needed to produce health statuses from the
collected metrics. collected metrics.
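The interface example above can be sketched as a subservice expression that maps metrics to a health score and a list of symptoms. This is an illustration only: the metric name admin_enabled and the maximal score of 100 are assumptions, not values taken from this section.

```python
MAX_SCORE = 100  # assumed maximal health score; the text only states 0 as minimal


def interface_health(metrics):
    """Return (health_score, symptoms) for one interface.

    `metrics` is a hypothetical dict of already-collected metric values.
    """
    if not metrics["admin_enabled"]:
        # Interface disabled in the configuration: minimal score plus symptom.
        return 0, ["interface admin-down"]
    return MAX_SCORE, []
```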
The two types of dependencies for combining subservices are: The two types of dependencies for combining subservices are:
Informational Dependency: Type of dependency whose health score Informational Dependency:
does not impact the health score of its parent subservice or The type of dependency whose health score does not impact the
service instance(s) in the assurance graph. However, the symptoms health score of its parent subservice or service instance(s) in
should be taken into account in the parent service instance or the assurance graph. However, the symptoms should be taken into
subservice instance(s), for informational reasons. account in the parent service instance or subservice instance(s)
for informational reasons.
Impacting Dependency: Type of dependency whose score impacts the Impacting Dependency:
score of its parent subservice or service instance(s) in the The type of dependency whose health score impacts the health score
assurance graph. The symptoms are taken into account in the of its parent subservice or service instance(s) in the assurance
parent service instance or subservice instance(s), as the graph. The symptoms are taken into account in the parent service
impacting reasons. instance or subservice instance(s) as the impacting reasons.
The set of dependency type presented here is not exhaustive. More The set of dependency types presented here is not exhaustive. More
specific dependency types can be defined by extending the YANG model. specific dependency types can be defined by extending the YANG
For instance, a connectivity subservice depending on several path module. For instance, a connectivity subservice depending on several
subservices is only partially impacted if only one of these paths path subservices is partially impacted if only one of these paths
fails. Adding these new dependency types requires defining the fails. Adding these new dependency types requires defining the
corresponding operation for combining statuses of subservices. corresponding operation for combining statuses of subservices.
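The two dependency types can be sketched as one combining operation. The choice of taking the minimum score over impacting dependencies is an assumption made for illustration; the architecture leaves the actual heuristics to the implementation. The function and tuple layout are hypothetical.

```python
def combine(own_score, own_symptoms, dependencies):
    """Combine a parent's own status with the statuses of its dependencies.

    `dependencies` is a list of (dependency_type, score, symptoms) tuples,
    where dependency_type is "impacting" or "informational".
    """
    score = own_score
    symptoms = list(own_symptoms)
    for dep_type, dep_score, dep_symptoms in dependencies:
        if dep_type == "impacting":
            # Assumed heuristic: the parent is as healthy as its worst
            # impacting dependency.
            score = min(score, dep_score)
        # Symptoms are carried up for both types; an informational
        # dependency never changes the parent's score.
        symptoms.extend(dep_symptoms)
    return score, symptoms
```

A new dependency type, such as the partially impacting paths mentioned above, would add another branch with its own combining rule.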
Subservices shall not be dependent on the protocol used to retrieve Subservices shall not be dependent on the protocol used to retrieve
the metrics. To justify this, let's consider the interface the metrics. To justify this, let's consider the interface
operational status. Depending on the device capabilities, this operational status. Depending on the device capabilities, this
status can be collected by an industry-accepted YANG module (IETF, status can be collected by an industry-accepted YANG module (e.g.,
Openconfig [OpenConfig]), by a vendor-specific YANG module, or even IETF or Openconfig [OpenConfig]), by a vendor-specific YANG module,
by a MIB module. If the subservice was dependent on the mechanism to or even by a MIB module. If the subservice was dependent on the
collect the operational status, then we would need multiple mechanism to collect the operational status, then we would need
subservice definitions in order to support all different mechanisms. multiple subservice definitions in order to support all different
This also implies that, while waiting for all the metrics to be mechanisms. This also implies that, while waiting for all the
available via standard YANG modules, SAIN agents might have to metrics to be available via standard YANG modules, SAIN agents might
retrieve metric values via non-standard YANG models, via MIB modules, have to retrieve metric values via nonstandard YANG data models, MIB
Command Line Interface (CLI), etc., effectively implementing a modules, the Command-Line Interface (CLI), etc., effectively
normalization layer between data models and information models. implementing a normalization layer between data models and
information models.
In order to keep subservices independent of metric collection method, In order to keep subservices independent of metric collection method
or, expressed differently, to support multiple combinations of (or, expressed differently, to support multiple combinations of
platforms, OSes, and even vendors, the architecture introduces the platforms, OSes, and even vendors), the architecture introduces the
concept of "metric engine". The metric engine maps each device- concept of "metric engine". The metric engine maps each device-
independent metric used in the subservices to a list of device- independent metric used in the subservices to a list of device-
specific metric implementations that precisely define how to fetch specific metric implementations that precisely define how to fetch
values for that metric. The mapping is parameterized by the values for that metric. The mapping is parameterized by the
characteristics (model, OS version, etc.) of the device from which characteristics (i.e., model, OS version, etc.) of the device from
the metrics are fetched. This metric engine is included in the SAIN which the metrics are fetched. This metric engine is included in the
agent. SAIN agent.
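A rough sketch of the metric engine mapping follows. The metric name, the device characteristics, and the implementation strings are all hypothetical placeholders; the only point illustrated is that one device-independent metric resolves to a device-specific implementation selected by device characteristics.

```python
# Each device-independent metric maps to an ordered list of
# (required device characteristics, device-specific implementation).
METRIC_IMPLEMENTATIONS = {
    "interface-oper-status": [
        ({"os": "vendor-os-a"}, "yang:vendor-a-interfaces/oper-status"),
        ({}, "mib:ifOperStatus"),  # fallback: matches any device
    ],
}


def resolve(metric, device):
    """Return the first implementation whose requirements the device meets."""
    for required, implementation in METRIC_IMPLEMENTATIONS[metric]:
        if all(device.get(key) == value for key, value in required.items()):
            return implementation
    raise LookupError(f"no implementation of {metric} for {device}")
```

With this shape, the subservice definition only ever names "interface-oper-status"; the collection mechanism stays inside the agent.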
3.5. Open Interfaces with YANG Modules 3.5. Open Interfaces with YANG Modules
The interfaces between the architecture components are open thanks to The interfaces between the architecture components are open thanks to
the YANG modules specified in the YANG modules specified in [RFC9418]; they specify objects for
[I-D.ietf-opsawg-service-assurance-yang]; they specify objects for
assuring network services based on their decomposition into so-called assuring network services based on their decomposition into so-called
subservices, according to the SAIN architecture. subservices, according to the SAIN architecture.
These modules are intended for the following use cases: These modules are intended for the following use cases:
* Assurance graph configuration: * Assurance graph configuration:
- Subservices: configure a set of subservices to assure, by - Subservices: Configure a set of subservices to assure by
specifying their types and parameters. specifying their types and parameters.
- Dependencies: configure the dependencies between the - Dependencies: Configure the dependencies between the
subservices, along with their types. subservices, along with their types.
* Assurance telemetry: export the health status of the subservices, * Assurance telemetry: Export the health status of the subservices,
along with the observed symptoms. along with the observed symptoms.
Some examples of YANG instances can be found in Appendix A of Some examples of YANG instances can be found in Appendix A of
[I-D.ietf-opsawg-service-assurance-yang]. [RFC9418].
3.6. Handling Maintenance Windows 3.6. Handling Maintenance Windows
Whenever network components are under maintenance, the operator wants Whenever network components are under maintenance, the operator wants
to inhibit the emission of symptoms from those components. A typical to inhibit the emission of symptoms from those components. A typical
use case is device maintenance, during which the device is not use case is device maintenance, during which the device is not
supposed to be operational. As such, symptoms related to the device supposed to be operational. As such, symptoms related to the device
health should be ignored. Symptoms related to the device-specific health should be ignored. Symptoms related to the device-specific
subservices, such as the interfaces, might also be ignored because subservices, such as the interfaces, might also be ignored because
their state changes are probably the consequence of the maintenance. their state changes are probably the consequence of the maintenance.
The ietf-service-assurance model proposed in The ietf-service-assurance model described in [RFC9418] enables
[I-D.ietf-opsawg-service-assurance-yang] enables flagging subservices flagging subservices as under maintenance and, in that case, requires
as under maintenance, and, in that case, requires a string that a string that identifies the person or process that requested the
identifies the person or process who requested the maintenance. When maintenance. When a service or subservice is flagged as under
a service or subservice is flagged as under maintenance, it must maintenance, it must report a generic "Under Maintenance" symptom for
report a generic "Under Maintenance" symptom, for propagation towards propagation towards subservices that depend on this specific
subservices that depend on this specific subservice. Any other subservice. Any other symptom from this service or by one of its
symptom from this service, or by one of its impacting dependencies impacting dependencies must not be reported.
must not be reported.
We illustrate this mechanism on three independent examples based on We illustrate this mechanism on three independent examples based on
the assurance graph depicted in Figure 2: the assurance graph depicted in Figure 2:
* Device maintenance, for instance upgrading the device OS. The * Device maintenance, for instance, upgrading the device OS. The
operator flags the subservice "Peer1" device as under maintenance. operator flags the subservice "Peer1" device as under maintenance.
This inhibits the emission of symptoms, except "Under This inhibits the emission of symptoms, except "Under Maintenance"
Maintenance", from "Peer1 Physical Interface", "Peer1 Tunnel from "Peer1 Physical Interface", "Peer1 Tunnel Interface", and
Interface" and "Tunnel Service Instance". All other subservices "Tunnel Service Instance". All other subservices are unaffected.
are unaffected.
* Interface maintenance, for instance replacing a broken optic. The * Interface maintenance, for instance, replacing a broken optic.
operator flags the subservice "Peer1 Physical Interface" as under The operator flags the subservice "Peer1 Physical Interface" as
maintenance. This inhibits the emission of symptoms, except under maintenance. This inhibits the emission of symptoms, except
"Under Maintenance" from "Peer 1 Tunnel Interface" and "Tunnel "Under Maintenance" from "Peer 1 Tunnel Interface" and "Tunnel
Service Instance". All other subservices are unaffected. Service Instance". All other subservices are unaffected.
* Routing protocol maintenance, for instance modifying parameters or * Routing protocol maintenance, for instance, modifying parameters
redistribution. The operator marks the subservice "IS-IS Routing or redistribution. The operator marks the subservice "IS-IS
Protocol" as under maintenance. This inhibits the emission of Routing Protocol" as under maintenance. This inhibits the
symptoms, except "Under Maintenance", from "IP connectivity" and emission of symptoms, except "Under Maintenance" from "IP
"Tunnel Service Instance". All other subservices are unaffected. connectivity" and "Tunnel Service Instance". All other
subservices are unaffected.
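The three examples above can be reproduced with a small propagation sketch. The dependency map below is an assumption: it is a partial reconstruction inferred from the three examples, not a copy of Figure 2, and all dependencies are treated as impacting.

```python
# parent -> subservices it depends on (assumed, inferred from the examples)
DEPENDS_ON = {
    "Tunnel Service Instance": ["Peer1 Tunnel Interface", "IP connectivity"],
    "Peer1 Tunnel Interface": ["Peer1 Physical Interface"],
    "Peer1 Physical Interface": ["Peer1"],
    "IP connectivity": ["IS-IS Routing Protocol"],
}


def inhibited_by(flagged):
    """Return the subservices that depend, directly or not, on `flagged`.

    These report only the "Under Maintenance" symptom while the flag is set.
    """
    result = set()
    changed = True
    while changed:  # iterate until no new dependent subservice is found
        changed = False
        for parent, children in DEPENDS_ON.items():
            if parent in result:
                continue
            if any(c == flagged or c in result for c in children):
                result.add(parent)
                changed = True
    return result
```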
In each example above, the subservice under maintenance is completely In each example above, the subservice under maintenance is completely
impacting the service instance, putting it under maintenance as well. impacting the service instance, putting it under maintenance as well.
There are use cases where the subservice under maintenance only There are use cases where the subservice under maintenance only
partially impacts the service instance. For instance, consider a partially impacts the service instance. For instance, consider a
service instance supported by both a primary and backup path. If a service instance supported by both a primary and backup path. If a
subservice impacting the primary path is under maintenance, the subservice impacting the primary path is under maintenance, the
service instance might still be functional but degraded. In that service instance might still be functional but degraded. In that
case, the status of the service instance might include "Primary path case, the status of the service instance might include "Primary path
Under Maintenance", "No redundancy" as well as other symptoms from Under Maintenance", "No redundancy", as well as other symptoms from
the backup path to explain the lower health score. In general, the the backup path to explain the lower health score. In general, the
computation of the service instance status from the subservices is computation of the service instance status from the subservices is
done in the SAIN collector whose implementation is out of scope for done in the SAIN collector whose implementation is out of scope for
this document. this document.
The maintenance of a subservice might modify or hide modifications of The maintenance of a subservice might modify or hide modifications of
the structure of the assurance graph. Therefore, unflagging a the structure of the assurance graph. Therefore, unflagging a
subservice as under maintenance should trigger an update of the subservice as under maintenance should trigger an update of the
assurance graph. assurance graph.
3.7. Flexible Functional Architecture 3.7. Flexible Functional Architecture
The SAIN architecture is flexible in terms of components. While the The SAIN architecture is flexible in terms of components. While the
SAIN architecture in Figure 1 makes a distinction between two SAIN architecture in Figure 1 makes a distinction between two
components, the service orchestrator and the SAIN orchestrator, in components, the service orchestrator and the SAIN orchestrator, in
practice those two components are mostly likely combined. Similarly, practice the two components are most likely combined. Similarly, the
the SAIN agents are displayed in Figure 1 as being separate SAIN agents are displayed in Figure 1 as being separate components.
components. Practically, the SAIN agents could be either independent In practice, the SAIN agents could be either independent components
components or directly integrated in monitored entities. A practical or directly integrated in monitored entities. A practical example is
example is an agent in a router. an agent in a router.
The SAIN architecture is also flexible in terms of services and The SAIN architecture is also flexible in terms of services and
subservices. In the proposed architecture, the SAIN orchestrator is subservices. In the defined architecture, the SAIN orchestrator is
coupled to a service orchestrator which defines the kinds of services coupled to a service orchestrator, which defines the kinds of
that the architecture handles. Most examples in this document deal services that the architecture handles. Most examples in this
with the notion of Network Service YANG modules, with well-known document deal with the notion of Network Service YANG Modules with
services such as L2VPN or tunnels. However, the concept of services well-known services, such as L2VPN or tunnels. However, the concept
is general enough to cross into different domains. One of them is of services is general enough to cross into different domains. One
the domain of service management on network elements, which also of them is the domain of service management on network elements,
require their own assurance. Examples include a DHCP server on a which also require their own assurance. Examples include a DHCP
Linux server, a data plane, an IPFIX export, etc. The notion of server on a Linux server, a data plane, an IPFIX export, etc. The
"service" is generic in this architecture and depends on the service notion of "service" is generic in this architecture and depends on
orchestrator and underlying network system, as illustrated by the the service orchestrator and underlying network system, as
following examples: illustrated by the following examples:
* if a main service orchestrator coordinates several lower level * If a main service orchestrator coordinates several lower-level
controllers, a service for the controller can be a subservice from controllers, a service for the controller can be a subservice from
the point of view of the orchestrator. the point of view of the orchestrator.
* A DHCP server/data plane/IPFIX export can be considered as * A DHCP server / data plane / IPFIX export can be considered
subservices for a device. subservices for a device.
* A routing instance can be considered as a subservice for a L3VPN. * A routing instance can be considered a subservice for an L3VPN.
* A tunnel can be considered as a subservice for an application in * A tunnel can be considered a subservice for an application in the
the cloud. cloud.
* A service function can be considered as a subservice for a service * A service function can be considered a subservice for a service
function chain [RFC7665]. function chain [RFC7665].
The assurance graph is created to be flexible and open, regardless of The assurance graph is created to be flexible and open, regardless of
the subservice types, locations, or domains. the subservice types, locations, or domains.
The SAIN architecture is also flexible in terms of distributed The SAIN architecture is also flexible in terms of distributed
graphs. As shown in Figure 1, the architecture comprises several graphs. As shown in Figure 1, the architecture comprises several
agents. Each agent is responsible for handling a subgraph of the agents. Each agent is responsible for handling a subgraph of the
assurance graph. The collector is responsible for fetching the sub- assurance graph. The collector is responsible for fetching the
graphs from the different agents and gluing them together. As an subgraphs from the different agents and gluing them together. As an
example, in the graph from Figure 2, the subservices relative to Peer example, in the graph from Figure 2, the subservices relative to Peer
1 might be handled by a different agent than the subservices relative 1 might be handled by a different agent than the subservices relative
to Peer 2 and the Connectivity and IS-IS subservices might be handled to Peer 2, and the Connectivity and IS-IS subservices might be
by yet another agent. The agents will export their partial graph and handled by yet another agent. The agents will export their partial
the collector will stitch them together as dependencies of the graph, and the collector will stitch them together as dependencies of
service instance. the service instance.
And finally, the SAIN architecture is flexible in terms of what it And finally, the SAIN architecture is flexible in terms of what it
monitors. Most, if not all examples, in this document refer to monitors. Most, if not all, examples in this document refer to
physical components, but this is not a constraint. Indeed, the physical components, but this is not a constraint. Indeed, the
assurance of virtual components would follow the same principles and assurance of virtual components would follow the same principles, and
an assurance graph composed of virtualized components (or a mix of an assurance graph composed of virtualized components (or a mix of
virtualized and physical ones) is supported by this architecture. virtualized and physical ones) is supported by this architecture.
3.8. Time window for symptoms history 3.8. Time Window for Symptoms' History
The health status reported via the YANG modules contains, for each The health status reported via the YANG modules contains, for each
subservice, the list of symptoms. Symptoms have a start and end subservice, the list of symptoms. Symptoms have a start and end
date, making it possible to report symptoms that are no longer date, making it possible to report symptoms that are no longer
occurring. occurring.
The SAIN agent might have to remove some symptoms for specific The SAIN agent might have to remove some symptoms for specific
subservice symptoms, because there are outdated and not relevant any subservice symptoms because they are outdated and no longer relevant
longer, or simply because the SAIN agent needs to free up some space. or simply because the SAIN agent needs to free up some space.
Regardless of the reason, it's important for a SAIN collector Regardless of the reason, it's important for a SAIN collector
(re-)connecting to a SAIN agent to understand the effect of this connecting/reconnecting to a SAIN agent to understand the effect of
garbage collection. this garbage collection.
Therefore, the SAIN agent contains a YANG object specifying the date Therefore, the SAIN agent contains a YANG object specifying the date
and time at which the symptoms' history starts for the subservice and time at which the symptoms' history starts for the subservice
instances. The subservice reports only symptoms that are occurring instances. The subservice reports only symptoms that are occurring
or that have been occurring after the history start date. or that have been occurring after the history start date.
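The reporting rule above can be sketched as a filter over stored symptoms. The dict layout is hypothetical, and treating a symptom that ended exactly at the history start as reportable is an assumption.

```python
from datetime import datetime, timezone


def reportable(symptoms, history_start):
    """Keep symptoms still occurring or that occurred after history_start.

    Each symptom is a hypothetical dict with "start" and "end" datetimes;
    "end" is None while the symptom is still occurring.
    """
    return [
        s for s in symptoms
        if s["end"] is None or s["end"] >= history_start
    ]
```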
3.9. New Assurance Graph Generation 3.9. New Assurance Graph Generation
The assurance graph will change over time, because services and The assurance graph will change over time, because services and
subservices come and go (changing the dependencies between subservices come and go (changing the dependencies between
subservices), or as a result of resolving maintenance issues. subservices) or as a result of resolving maintenance issues.
Therefore, an assurance graph version must be maintained, along with Therefore, an assurance graph version must be maintained, along with
the date and time of its last generation. The date and time of a the date and time of its last generation. The date and time of a
particular subservice instance (again dependencies or under particular subservice instance (again dependencies or under
maintenance) might be kept. From a client point of view, an maintenance) might be kept. From a client point of view, an
assurance graph change is triggered by the value of the assurance- assurance graph change is triggered by the value of the assurance-
graph-version and assurance-graph-last-change YANG leaves. At that graph-version and assurance-graph-last-change YANG leaves. At that
point in time, the client (collector) follows the following process: point in time, the client (collector) follows the following process:
* Keep the previous assurance-graph-last-change value (let's call it * Keep the previous assurance-graph-last-change value (let's call it
time T) time T).
* Run through all subservice instances and process the subservice * Run through all the subservice instances and process the
instances for which the last-change is newer that the time T subservice instances for which the last-change is newer than the
time T.
* Keep the new assurance-graph-last-change as the new referenced * Keep the new assurance-graph-last-change as the new referenced
date and time date and time.
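The three steps above can be sketched as follows. The dictionary layout is hypothetical and is not the structure of the YANG module; timestamps are modeled as Python datetime values.

```python
from datetime import datetime, timezone


def process_graph_change(state, graph):
    """Return the ids of subservice instances changed since the last run.

    `state` holds the collector's reference time; `graph` is a hypothetical
    snapshot with the graph-level and per-subservice last-change values.
    """
    # Step 1: keep the previous assurance-graph-last-change value (time T).
    t = state["last_change"]
    # Step 2: process only the subservices whose last-change is newer than T.
    changed = [
        s["id"] for s in graph["subservices"] if s["last_change"] > t
    ]
    # Step 3: keep the new value as the new reference date and time.
    state["last_change"] = graph["assurance_graph_last_change"]
    return changed
```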
4. Security Considerations 4. IANA Considerations
This document has no IANA actions.
5. Security Considerations
The SAIN architecture helps operators to reduce the mean time to The SAIN architecture helps operators to reduce the mean time to
detect and mean time to repair. However, the SAIN agents must be detect and the mean time to repair. However, the SAIN agents must be
secured: a compromised SAIN agent may be sending wrong root causes or secured; a compromised SAIN agent may be sending incorrect root
symptoms to the management systems. Securing the agents falls back causes or symptoms to the management systems. Securing the agents
to ensuring the integrity and confidentiality of the assurance graph. falls back to ensuring the integrity and confidentiality of the
This can be partially achieved by correctly setting permissions of assurance graph. This can be partially achieved by correctly setting
each node in the YANG model as described in Section 6 of permissions of each node in the YANG data model, as described in
[I-D.ietf-opsawg-service-assurance-yang]. Section 6 of [RFC9418].
Except for the configuration of telemetry, the agents do not need Except for the configuration of telemetry, the agents do not need
"write access" to the devices they monitor. This configuration is "write access" to the devices they monitor. This configuration is
applied with a YANG module, whose protection is covered by Secure applied with a YANG module, whose protection is covered by Secure
Shell (SSH) [RFC6242] for NETCONF or TLS [RFC8446] for RESTCONF. Shell (SSH) [RFC6242] for the Network Configuration Protocol
Devices should be configured so that agents have their own (NETCONF) or TLS [RFC8446] for RESTCONF. Devices should be
credentials with write access only for the YANG nodes configuring the configured so that agents have their own credentials with write
telemetry. access only for the YANG nodes configuring the telemetry.
The data collected by SAIN could potentially be compromising to the The data collected by SAIN could potentially be compromising to the
network or provide more insight into how the network is designed. network or provide more insight into how the network is designed.
Considering the data that SAIN requires (including CLI access in some Considering the data that SAIN requires (including CLI access in some
cases), one should weigh data access concerns with the impact that cases), one should weigh data access concerns with the impact that
reduced visibility will have on being able to rapidly identify root reduced visibility will have on being able to rapidly identify root
causes. causes.
For building the assurance graph, the SAIN orchestrator needs to For building the assurance graph, the SAIN orchestrator needs to
obtain the configuration from the service orchestrator. The latter obtain the configuration from the service orchestrator. The latter
should restrict access of the SAIN orchestrator to information needed should restrict access of the SAIN orchestrator to information needed
to build the assurance graph. to build the assurance graph.
If a closed loop system relies on this architecture then the well If a closed loop system relies on this architecture, then the well-
known issue of those systems also applies, i.e., a lying device or known issue of those systems also applies, i.e., a lying device or
compromised agent could trigger partial reconfiguration of the compromised agent could trigger partial reconfiguration of the
service or network. The SAIN architecture neither augments nor service or network. The SAIN architecture neither augments nor
reduces this risk. An extension of SAIN, out of scope for this reduces this risk. An extension of SAIN, which is out of scope for
document, could detect discrepancies between symptoms reported by this document, could detect discrepancies between symptoms reported
different agents and thus detect anomalies if an agent or a device is by different agents, and thus detect anomalies if an agent or a
lying. device is lying.
If NTP service goes down, the devices' clocks might lose their If NTP service goes down, the devices' clocks might lose their
synchronization. In that case, correlating information from synchronization. In that case, correlating information from
different devices, such as detecting symptoms about a link or different devices, such as detecting symptoms about a link or
correlating symptoms from different devices, will give inaccurate correlating symptoms from different devices, will give inaccurate
results. results.
5. IANA Considerations 6. References
This document includes no request to IANA.
6. Contributors
* Youssef El Fathi
* Eric Vyncke
7. References
7.1. Normative References
[I-D.ietf-opsawg-service-assurance-yang] 6.1. Normative References
Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T.
Arumugam, "YANG Modules for Service Assurance", Work in
Progress, Internet-Draft, draft-ietf-opsawg-service-
assurance-yang-10, 28 November 2022,
<https://www.ietf.org/archive/id/draft-ietf-opsawg-
service-assurance-yang-10.txt>.
[RFC8309] Wu, Q., Liu, W., Farrel, A., and RFC Publisher, "Service [RFC8309] Wu, Q., Liu, W., and A. Farrel, "Service Models
Models Explained", RFC 8309, DOI 10.17487/RFC8309, January Explained", RFC 8309, DOI 10.17487/RFC8309, January 2018,
2018, <https://www.rfc-editor.org/info/rfc8309>. <https://www.rfc-editor.org/info/rfc8309>.
[RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., Geng, [RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and
L., and RFC Publisher, "A Framework for Automating Service L. Geng, "A Framework for Automating Service and Network
and Network Management with YANG", RFC 8969, Management with YANG", RFC 8969, DOI 10.17487/RFC8969,
DOI 10.17487/RFC8969, January 2021, January 2021, <https://www.rfc-editor.org/info/rfc8969>.
<https://www.rfc-editor.org/info/rfc8969>.
7.2. Informative References [RFC9418] Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T.
Arumugam, "YANG Modules for Service Assurance", RFC 9418,
DOI 10.17487/RFC9418, June 2023,
<https://www.rfc-editor.org/info/rfc9418>.
[I-D.ietf-opsawg-yang-vpn-service-pm] 6.2. Informative References
Wu, B., Wu, Q., Boucadair, M., de Dios, O. G., and B. Wen,
"A YANG Model for Network and VPN Service Performance
Monitoring", Work in Progress, Internet-Draft, draft-ietf-
opsawg-yang-vpn-service-pm-15, 11 November 2022,
<https://www.ietf.org/archive/id/draft-ietf-opsawg-yang-
vpn-service-pm-15.txt>.
[OpenConfig] [OpenConfig]
"OpenConfig", <https://openconfig.net>. "OpenConfig", <https://openconfig.net>.
[Piovesan2017] [Piovesan2017]
Piovesan, A. and E. Griffor, "Reasoning About Safety and Piovesan, A. and E. Griffor, "7 - Reasoning About Safety
Security: The Logic of Assurance", 2017, and Security: The Logic of Assurance",
DOI 10.1016/B978-0-12-803773-7.00007-3, 2017,
<https://doi.org/10.1016/B978-0-12-803773-7.00007-3>. <https://doi.org/10.1016/B978-0-12-803773-7.00007-3>.
[RFC2865] Rigney, C., Willens, S., Rubens, A., Simpson, W., and RFC [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson,
Publisher, "Remote Authentication Dial In User Service "Remote Authentication Dial In User Service (RADIUS)",
(RADIUS)", RFC 2865, DOI 10.17487/RFC2865, June 2000, RFC 2865, DOI 10.17487/RFC2865, June 2000,
<https://www.rfc-editor.org/info/rfc2865>. <https://www.rfc-editor.org/info/rfc2865>.
[RFC5424] Gerhards, R. and RFC Publisher, "The Syslog Protocol", [RFC5424] Gerhards, R., "The Syslog Protocol", RFC 5424,
RFC 5424, DOI 10.17487/RFC5424, March 2009, DOI 10.17487/RFC5424, March 2009,
<https://www.rfc-editor.org/info/rfc5424>. <https://www.rfc-editor.org/info/rfc5424>.
[RFC5905] Mills, D., Martin, J., Ed., Burbank, J., Kasch, W., and [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
RFC Publisher, "Network Time Protocol Version 4: Protocol "Network Time Protocol Version 4: Protocol and Algorithms
and Algorithms Specification", RFC 5905, Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
DOI 10.17487/RFC5905, June 2010,
<https://www.rfc-editor.org/info/rfc5905>. <https://www.rfc-editor.org/info/rfc5905>.
[RFC6242] Wasserman, M. and RFC Publisher, "Using the NETCONF [RFC6242] Wasserman, M., "Using the NETCONF Protocol over Secure
Protocol over Secure Shell (SSH)", RFC 6242, Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011,
DOI 10.17487/RFC6242, June 2011,
<https://www.rfc-editor.org/info/rfc6242>. <https://www.rfc-editor.org/info/rfc6242>.
[RFC7011] Claise, B., Ed., Trammell, B., Ed., Aitken, P., and RFC [RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
Publisher, "Specification of the IP Flow Information "Specification of the IP Flow Information Export (IPFIX)
Export (IPFIX) Protocol for the Exchange of Flow Protocol for the Exchange of Flow Information", STD 77,
Information", STD 77, RFC 7011, DOI 10.17487/RFC7011, RFC 7011, DOI 10.17487/RFC7011, September 2013,
September 2013, <https://www.rfc-editor.org/info/rfc7011>. <https://www.rfc-editor.org/info/rfc7011>.
[RFC7149] Boucadair, M., Jacquenet, C., and RFC Publisher, [RFC7149] Boucadair, M. and C. Jacquenet, "Software-Defined
"Software-Defined Networking: A Perspective from within a Networking: A Perspective from within a Service Provider
Service Provider Environment", RFC 7149, Environment", RFC 7149, DOI 10.17487/RFC7149, March 2014,
DOI 10.17487/RFC7149, March 2014,
<https://www.rfc-editor.org/info/rfc7149>. <https://www.rfc-editor.org/info/rfc7149>.
[RFC7665] Halpern, J., Ed., Pignataro, C., Ed., and RFC Publisher, [RFC7665] Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
"Service Function Chaining (SFC) Architecture", RFC 7665, Chaining (SFC) Architecture", RFC 7665,
DOI 10.17487/RFC7665, October 2015, DOI 10.17487/RFC7665, October 2015,
<https://www.rfc-editor.org/info/rfc7665>. <https://www.rfc-editor.org/info/rfc7665>.
[RFC7950] Bjorklund, M., Ed. and RFC Publisher, "The YANG 1.1 Data [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
Modeling Language", RFC 7950, DOI 10.17487/RFC7950, August RFC 7950, DOI 10.17487/RFC7950, August 2016,
2016, <https://www.rfc-editor.org/info/rfc7950>. <https://www.rfc-editor.org/info/rfc7950>.
[RFC8199] Bogdanovic, D., Claise, B., Moberg, C., and RFC Publisher, [RFC8199] Bogdanovic, D., Claise, B., and C. Moberg, "YANG Module
"YANG Module Classification", RFC 8199, Classification", RFC 8199, DOI 10.17487/RFC8199, July
DOI 10.17487/RFC8199, July 2017, 2017, <https://www.rfc-editor.org/info/rfc8199>.
<https://www.rfc-editor.org/info/rfc8199>.
[RFC8446] Rescorla, E. and RFC Publisher, "The Transport Layer [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol
Security (TLS) Protocol Version 1.3", RFC 8446, Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
DOI 10.17487/RFC8446, August 2018,
<https://www.rfc-editor.org/info/rfc8446>. <https://www.rfc-editor.org/info/rfc8446>.
[RFC8466] Wen, B., Fioccola, G., Ed., Xie, C., Jalil, L., and RFC [RFC8466] Wen, B., Fioccola, G., Ed., Xie, C., and L. Jalil, "A YANG
Publisher, "A YANG Data Model for Layer 2 Virtual Private Data Model for Layer 2 Virtual Private Network (L2VPN)
Network (L2VPN) Service Delivery", RFC 8466, Service Delivery", RFC 8466, DOI 10.17487/RFC8466, October
DOI 10.17487/RFC8466, October 2018, 2018, <https://www.rfc-editor.org/info/rfc8466>.
<https://www.rfc-editor.org/info/rfc8466>.
[RFC8641] Clemm, A., Voit, E., and RFC Publisher, "Subscription to [RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications
YANG Notifications for Datastore Updates", RFC 8641, for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
DOI 10.17487/RFC8641, September 2019, September 2019, <https://www.rfc-editor.org/info/rfc8641>.
<https://www.rfc-editor.org/info/rfc8641>.
[RFC8907] Dahm, T., Ota, A., Medway Gash, D.C., Carrel, D., Grant, [RFC8907] Dahm, T., Ota, A., Medway Gash, D.C., Carrel, D., and L.
L., and RFC Publisher, "The Terminal Access Controller Grant, "The Terminal Access Controller Access-Control
Access-Control System Plus (TACACS+) Protocol", RFC 8907, System Plus (TACACS+) Protocol", RFC 8907,
DOI 10.17487/RFC8907, September 2020, DOI 10.17487/RFC8907, September 2020,
<https://www.rfc-editor.org/info/rfc8907>. <https://www.rfc-editor.org/info/rfc8907>.
[RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., Tantsura, J., [RFC9315] Clemm, A., Ciavaglia, L., Granville, L. Z., and J.
and RFC Publisher, "Intent-Based Networking - Concepts and Tantsura, "Intent-Based Networking - Concepts and
Definitions", RFC 9315, DOI 10.17487/RFC9315, October Definitions", RFC 9315, DOI 10.17487/RFC9315, October
2022, <https://www.rfc-editor.org/info/rfc9315>. 2022, <https://www.rfc-editor.org/info/rfc9315>.
Appendix A. Changes between revisions [RFC9375] Wu, B., Ed., Wu, Q., Ed., Boucadair, M., Ed., Gonzalez de
Dios, O., and B. Wen, "A YANG Data Model for Network and
[[RFC editor: please remove this section before publication.]] VPN Service Performance Monitoring", RFC 9375,
DOI 10.17487/RFC9375, April 2023,
v12 - 13 <https://www.rfc-editor.org/info/rfc9375>.
* Addressing IESG telechat feedback
v11 - 12
* Addressing comments from Last call
v10 - v11
* Adding reference to example of network performance model
v09 - v10
* Addressing comments from Rob Wilton
v08 - v09
* Addressing comments from Michael Richardson
v07 - v08
* Propagating removal of under-maintenance flag from the YANG module
v06-07
Addressing comments from Dhruv Dhody and applying pending changes
v03 - v04
* Address comments from Mohamed Boucadair
v00 - v01
* Cover the feedback received during the WG call for adoption
Acknowledgements Acknowledgements
The authors would like to thank Stephane Litkowski, Charles Eckel, The authors would like to thank Stephane Litkowski, Charles Eckel,
Rob Wilton, Vladimir Vassiliev, Gustavo Alburquerque, Stefan Vallin, Rob Wilton, Vladimir Vassiliev, Gustavo Alburquerque, Stefan Vallin,
Eric Vyncke, Mohamed Boucadair, Dhruv Dhody, Michael Richardson and Éric Vyncke, Mohamed Boucadair, Dhruv Dhody, Michael Richardson, and
Rob Wilton for their reviews and feedback. Rob Wilton for their reviews and feedback.
Contributors
* Youssef El Fathi
* Éric Vyncke
Authors' Addresses Authors' Addresses
Benoit Claise Benoit Claise
Huawei Huawei
Email: benoit.claise@huawei.com Email: benoit.claise@huawei.com
Jean Quilbeuf Jean Quilbeuf
Huawei Huawei
Email: jean.quilbeuf@huawei.com Email: jean.quilbeuf@huawei.com
Diego R. Lopez Diego R. Lopez
Telefonica I+D Telefonica I+D
Don Ramon de la Cruz, 82 Don Ramon de la Cruz, 82
Madrid 28006 28006 Madrid
Spain Spain
Email: diego.r.lopez@telefonica.com Email: diego.r.lopez@telefonica.com
Dan Voyer Dan Voyer
Bell Canada Bell Canada
Canada Canada
Email: daniel.voyer@bell.ca Email: daniel.voyer@bell.ca
Thangam Arumugam Thangam Arumugam
Cisco Systems, Inc. Consultant
Milpitas (California), Milpitas, California
United States of America United States of America
Email: tarumuga@cisco.com Email: thangavelu@yahoo.com
 End of changes. 151 change blocks. 
591 lines changed or deleted 545 lines changed or added
