rfc9417v2.txt   rfc9417.txt 
Internet Engineering Task Force (IETF) B. Claise Internet Engineering Task Force (IETF) B. Claise
Request for Comments: 9417 J. Quilbeuf Request for Comments: 9417 J. Quilbeuf
Category: Informational Huawei Category: Informational Huawei
ISSN: 2070-1721 D. Lopez ISSN: 2070-1721 D. Lopez
Telefonica I+D Telefonica I+D
D. Voyer D. Voyer
Bell Canada Bell Canada
T. Arumugam T. Arumugam
Cisco Systems, Inc. Consultant
May 2023 June 2023
Service Assurance for Intent-Based Networking Architecture Service Assurance for Intent-Based Networking Architecture
Abstract Abstract
This document describes an architecture that provides some assurance This document describes an architecture that provides some assurance
that service instances are running as expected. As services rely that service instances are running as expected. As services rely
upon multiple subservices provided by a variety of elements, upon multiple subservices provided by a variety of elements,
including the underlying network devices and functions, getting the including the underlying network devices and functions, getting the
assurance of a healthy service is only possible with a holistic view assurance of a healthy service is only possible with a holistic view
skipping to change at line 99 skipping to change at line 99
Service orchestrators use Network Service YANG Modules that will Service orchestrators use Network Service YANG Modules that will
infer network-wide configuration and, therefore, the invocation of infer network-wide configuration and, therefore, the invocation of
the appropriate device modules (Section 3 of [RFC8969]). Knowing the appropriate device modules (Section 3 of [RFC8969]). Knowing
that a configuration is applied doesn't imply that the provisioned that a configuration is applied doesn't imply that the provisioned
service instance is up and running as expected. For instance, the service instance is up and running as expected. For instance, the
service might be degraded because of a failure in the network, the service might be degraded because of a failure in the network, the
service quality may be degraded, or a service function may be service quality may be degraded, or a service function may be
reachable at the IP level but does not provide its intended function. reachable at the IP level but does not provide its intended function.
Thus, the network operator must monitor the service's operational Thus, the network operator must monitor the service's operational
data at the same time as the configuration (Section 3.3 of data at the same time as the configuration (Section 3.3 of
[RFC8969]). To fuel that task, the industry has been standardizing [RFC8969]). To fuel that task, the industry has been standardizing
on telemetry to push network element performance information (e.g., on telemetry to push network element performance information (e.g.,
[RFC9375]). [RFC9375]).
A network administrator needs to monitor its network and services as A network administrator needs to monitor its network and services as
a whole, independently of the management protocols. With different a whole, independently of the management protocols. With different
protocols come different data models and different ways to model the protocols come different data models and different ways to model the
same type of information. When network administrators deal with same type of information. When network administrators deal with
multiple management protocols, the network management entities have multiple management protocols, the network management entities have
to perform the difficult and time-consuming job of mapping data to perform the difficult and time-consuming job of mapping data
models, e.g., the model used for configuration with the model used models, e.g., the model used for configuration with the model used
for monitoring when separate models or protocols are used. This for monitoring when separate models or protocols are used. This
problem is compounded by a large, disparate set of data sources problem is compounded by a large, disparate set of data sources
(e.g., MIB modules, YANG models [RFC7950], IP Flow Information Export (e.g., MIB modules, YANG data models [RFC7950], IP Flow Information
(IPFIX) information elements [RFC7011], syslog plain text [RFC5424], Export (IPFIX) information elements [RFC7011], syslog plain text
Terminal Access Controller Access-Control System Plus (TACACS+) [RFC5424], Terminal Access Controller Access-Control System Plus
[RFC8907], RADIUS [RFC2865], etc.). In order to avoid this data (TACACS+) [RFC8907], RADIUS [RFC2865], etc.). In order to avoid this
model mapping, the industry converged on model-driven telemetry to data model mapping, the industry converged on model-driven telemetry
stream the service operational data, reusing the YANG models used for to stream the service operational data, reusing the YANG data models
configuration. Model-driven telemetry greatly facilitates the notion used for configuration. Model-driven telemetry greatly facilitates
of closed-loop automation, whereby events and updated operational the notion of closed-loop automation, whereby events and updated
states streamed from the network drive remediation change back into operational states streamed from the network drive remediation change
the network. back into the network.
However, it proves difficult for network operators to correlate the However, it proves difficult for network operators to correlate the
service degradation with the network root cause, for example, "Why service degradation with the network root cause, for example, "Why
does my layer 3 virtual private network (L3VPN) fail to connect?" or does my layer 3 virtual private network (L3VPN) fail to connect?" or
"Why is this specific service not highly responsive?" The reverse, "Why is this specific service not highly responsive?" The reverse,
i.e., which services are impacted when a network component fails or i.e., which services are impacted when a network component fails or
degrades, is also important for operators, for example, "Which degrades, is also important for operators, for example, "Which
services are impacted when this specific optic decibel milliwatt services are impacted when this specific optic decibel milliwatt
(dBm) begins to degrade?", "Which applications are impacted by an (dBm) begins to degrade?", "Which applications are impacted by an
imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that
skipping to change at line 356 skipping to change at line 356
graph and computing the health statuses in a distributed manner. The graph and computing the health statuses in a distributed manner. The
collector is in charge of collecting and displaying the current collector is in charge of collecting and displaying the current
inferred health status of the service instances and subservices. The inferred health status of the service instances and subservices. The
collector also detects changes in the assurance graph structures collector also detects changes in the assurance graph structures
(e.g., an occurrence of a switchover from primary to backup path) and (e.g., an occurrence of a switchover from primary to backup path) and
forwards the information to the orchestrator, which reconfigures the forwards the information to the orchestrator, which reconfigures the
agents. Finally, the automation loop is closed by having the SAIN agents. Finally, the automation loop is closed by having the SAIN
collector provide feedback to the network/service orchestrator. collector provide feedback to the network/service orchestrator.
In order to make agents, orchestrators, and collectors from different In order to make agents, orchestrators, and collectors from different
vendors interoperable, their interface is defined as a YANG model in vendors interoperable, their interface is defined as a YANG module in
a companion document [RFC9418]. In Figure 1, the communications that a companion document [RFC9418]. In Figure 1, the communications that
are normalized by this YANG model are tagged with a "Y". The use of are normalized by this YANG module are tagged with a "Y". The use of
this YANG module is further explained in Section 3.5. this YANG module is further explained in Section 3.5.
+-----------------+ +-----------------+
| Service | | Service |
| Orchestrator |<----------------------+ | Orchestrator |<----------------------+
| | | | | |
+-----------------+ | +-----------------+ |
| ^ | | ^ |
| | Network | | | Network |
| | Service | Feedback | | Service | Feedback
skipping to change at line 811 skipping to change at line 811
account in the parent service instance or subservice instance(s) account in the parent service instance or subservice instance(s)
for informational reasons. for informational reasons.
Impacting Dependency: Impacting Dependency:
The type of dependency whose health score impacts the health score The type of dependency whose health score impacts the health score
of its parent subservice or service instance(s) in the assurance of its parent subservice or service instance(s) in the assurance
graph. The symptoms are taken into account in the parent service graph. The symptoms are taken into account in the parent service
instance or subservice instance(s) as the impacting reasons. instance or subservice instance(s) as the impacting reasons.
The set of dependency types presented here is not exhaustive. More The set of dependency types presented here is not exhaustive. More
specific dependency types can be defined by extending the YANG model. specific dependency types can be defined by extending the YANG
For instance, a connectivity subservice depending on several path module. For instance, a connectivity subservice depending on several
subservices is partially impacted if only one of these paths fails. path subservices is partially impacted if only one of these paths
Adding these new dependency types requires defining the corresponding fails. Adding these new dependency types requires defining the
operation for combining statuses of subservices. corresponding operation for combining statuses of subservices.
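The combining operation mentioned above can be illustrated with a minimal sketch. It is not taken from the document or from [RFC9418]. It assumes health scores are integers from 0 (broken) to 100 (healthy), that an impacting dependency caps the parent's score at the child's score, and that an informational dependency leaves the score untouched. The PARTIAL type and its averaging rule are hypothetical, added only to show what defining a new dependency type might involve; all names are illustrative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class DependencyType(Enum):
    INFORMATIONAL = "informational"
    IMPACTING = "impacting"
    PARTIAL = "partial"  # hypothetical extension, not defined in the source


@dataclass
class Node:
    name: str
    own_score: int  # assumed scale: 0 (broken) to 100 (healthy)
    dependencies: List[Tuple["Node", DependencyType]] = field(default_factory=list)


def health_score(node: Node) -> int:
    """Combine a node's own score with the scores of its dependencies."""
    score = node.own_score
    partial_scores = []
    for child, dep_type in node.dependencies:
        child_score = health_score(child)
        if dep_type is DependencyType.IMPACTING:
            # Assumption: an impacting child caps the parent's score.
            score = min(score, child_score)
        elif dep_type is DependencyType.PARTIAL:
            partial_scores.append(child_score)
        # INFORMATIONAL: symptoms would be reported, score is unaffected.
    if partial_scores:
        # Assumption: partial dependencies are averaged, so one failed
        # path out of several only partially degrades the parent.
        score = min(score, sum(partial_scores) // len(partial_scores))
    return score


path_a = Node("path-a", 100)
path_b = Node("path-b", 0)  # failed path
connectivity = Node(
    "connectivity",
    100,
    [(path_a, DependencyType.PARTIAL), (path_b, DependencyType.PARTIAL)],
)
partial_result = health_score(connectivity)
```

With these assumptions, the connectivity subservice ends up partially degraded rather than failed when one of its two paths is down. A real deployment would choose its own combining rule for each new dependency type.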
Subservices shall not be dependent on the protocol used to retrieve Subservices shall not be dependent on the protocol used to retrieve
the metrics. To justify this, let's consider the interface the metrics. To justify this, let's consider the interface
operational status. Depending on the device capabilities, this operational status. Depending on the device capabilities, this
status can be collected by an industry-accepted YANG module (e.g., status can be collected by an industry-accepted YANG module (e.g.,
IETF or Openconfig [OpenConfig]), by a vendor-specific YANG module, IETF or Openconfig [OpenConfig]), by a vendor-specific YANG module,
or even by a MIB module. If the subservice was dependent on the or even by a MIB module. If the subservice was dependent on the
mechanism to collect the operational status, then we would need mechanism to collect the operational status, then we would need
multiple subservice definitions in order to support all different multiple subservice definitions in order to support all different
mechanisms. This also implies that, while waiting for all the mechanisms. This also implies that, while waiting for all the
metrics to be available via standard YANG modules, SAIN agents might metrics to be available via standard YANG modules, SAIN agents might
have to retrieve metric values via nonstandard YANG models, MIB have to retrieve metric values via nonstandard YANG data models, MIB
modules, the Command-Line Interface (CLI), etc., effectively modules, the Command-Line Interface (CLI), etc., effectively
implementing a normalization layer between data models and implementing a normalization layer between data models and
information models. information models.
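As a rough illustration of the normalization layer described above, the sketch below exposes a single device-independent "interface operational status" metric and hides which mechanism retrieves it. The device dictionaries, key names, and function names are all hypothetical. The only external fact relied on is that the IF-MIB ifOperStatus object encodes up as 1 and down as 2.

```python
from typing import Callable, Dict

Device = Dict[str, str]  # hypothetical stand-in for a device's raw data


def status_from_yang(device: Device) -> str:
    # Assumes the YANG-based source already reports a string such as "up".
    return device["oper_status"]


def status_from_mib(device: Device) -> str:
    # IF-MIB ifOperStatus: 1 means up, 2 means down; others map to "unknown".
    return {"1": "up", "2": "down"}.get(device["if_oper_status"], "unknown")


# One device-independent metric, several device-specific implementations.
IMPLEMENTATIONS: Dict[str, Callable[[Device], str]] = {
    "yang": status_from_yang,
    "if-mib": status_from_mib,
}


def interface_oper_status(device: Device) -> str:
    """Return the normalized status, whatever the collection mechanism."""
    return IMPLEMENTATIONS[device["mechanism"]](device)


yang_device = {"mechanism": "yang", "oper_status": "up"}
mib_device = {"mechanism": "if-mib", "if_oper_status": "2"}
statuses = [interface_oper_status(yang_device), interface_oper_status(mib_device)]
```

A subservice written against interface_oper_status needs only one definition, and supporting a further mechanism such as the CLI would mean adding one more entry to the table.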
In order to keep subservices independent of metric collection method In order to keep subservices independent of metric collection method
(or, expressed differently, to support multiple combinations of (or, expressed differently, to support multiple combinations of
platforms, OSes, and even vendors), the architecture introduces the platforms, OSes, and even vendors), the architecture introduces the
concept of "metric engine". The metric engine maps each device- concept of "metric engine". The metric engine maps each device-
independent metric used in the subservices to a list of device- independent metric used in the subservices to a list of device-
specific metric implementations that precisely define how to fetch specific metric implementations that precisely define how to fetch
skipping to change at line 1042 skipping to change at line 1042
This document has no IANA actions. This document has no IANA actions.
5. Security Considerations 5. Security Considerations
The SAIN architecture helps operators to reduce the mean time to The SAIN architecture helps operators to reduce the mean time to
detect and the mean time to repair. However, the SAIN agents must be detect and the mean time to repair. However, the SAIN agents must be
secured; a compromised SAIN agent may be sending incorrect root secured; a compromised SAIN agent may be sending incorrect root
causes or symptoms to the management systems. Securing the agents causes or symptoms to the management systems. Securing the agents
falls back to ensuring the integrity and confidentiality of the falls back to ensuring the integrity and confidentiality of the
assurance graph. This can be partially achieved by correctly setting assurance graph. This can be partially achieved by correctly setting
permissions of each node in the YANG model, as described in Section 6 permissions of each node in the YANG data model, as described in
of [RFC9418]. Section 6 of [RFC9418].
Except for the configuration of telemetry, the agents do not need Except for the configuration of telemetry, the agents do not need
"write access" to the devices they monitor. This configuration is "write access" to the devices they monitor. This configuration is
applied with a YANG module, whose protection is covered by Secure applied with a YANG module, whose protection is covered by Secure
Shell (SSH) [RFC6242] for the Network Configuration Protocol Shell (SSH) [RFC6242] for the Network Configuration Protocol
(NETCONF) or TLS [RFC8446] for RESTCONF. Devices should be (NETCONF) or TLS [RFC8446] for RESTCONF. Devices should be
configured so that agents have their own credentials with write configured so that agents have their own credentials with write
access only for the YANG nodes configuring the telemetry. access only for the YANG nodes configuring the telemetry.
The data collected by SAIN could potentially be compromising to the The data collected by SAIN could potentially be compromising to the
skipping to change at line 1095 skipping to change at line 1095
Explained", RFC 8309, DOI 10.17487/RFC8309, January 2018, Explained", RFC 8309, DOI 10.17487/RFC8309, January 2018,
<https://www.rfc-editor.org/info/rfc8309>. <https://www.rfc-editor.org/info/rfc8309>.
[RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and [RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and
L. Geng, "A Framework for Automating Service and Network L. Geng, "A Framework for Automating Service and Network
Management with YANG", RFC 8969, DOI 10.17487/RFC8969, Management with YANG", RFC 8969, DOI 10.17487/RFC8969,
January 2021, <https://www.rfc-editor.org/info/rfc8969>. January 2021, <https://www.rfc-editor.org/info/rfc8969>.
[RFC9418] Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T. [RFC9418] Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T.
Arumugam, "YANG Modules for Service Assurance", RFC 9418, Arumugam, "YANG Modules for Service Assurance", RFC 9418,
DOI 10.17487/RFC9418, May 2023, DOI 10.17487/RFC9418, June 2023,
<https://www.rfc-editor.org/info/rfc9418>. <https://www.rfc-editor.org/info/rfc9418>.
6.2. Informative References 6.2. Informative References
[OpenConfig] [OpenConfig]
"OpenConfig", <https://openconfig.net>. "OpenConfig", <https://openconfig.net>.
[Piovesan2017] [Piovesan2017]
Piovesan, A. and E. Griffor, "7 - Reasoning About Safety Piovesan, A. and E. Griffor, "7 - Reasoning About Safety
and Security: The Logic of Assurance", and Security: The Logic of Assurance",
skipping to change at line 1217 skipping to change at line 1217
28006 Madrid 28006 Madrid
Spain Spain
Email: diego.r.lopez@telefonica.com Email: diego.r.lopez@telefonica.com
Dan Voyer Dan Voyer
Bell Canada Bell Canada
Canada Canada
Email: daniel.voyer@bell.ca Email: daniel.voyer@bell.ca
Thangam Arumugam Thangam Arumugam
Cisco Systems, Inc. Consultant
Milpitas, California Milpitas, California
United States of America United States of America
Email: tarumuga@cisco.com Email: thangavelu@yahoo.com
 End of changes. 11 change blocks. 
25 lines changed or deleted 25 lines changed or added
