| rfc9417v2.txt | rfc9417.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force (IETF) B. Claise | Internet Engineering Task Force (IETF) B. Claise | |||
| Request for Comments: 9417 J. Quilbeuf | Request for Comments: 9417 J. Quilbeuf | |||
| Category: Informational Huawei | Category: Informational Huawei | |||
| ISSN: 2070-1721 D. Lopez | ISSN: 2070-1721 D. Lopez | |||
| Telefonica I+D | Telefonica I+D | |||
| D. Voyer | D. Voyer | |||
| Bell Canada | Bell Canada | |||
| T. Arumugam | T. Arumugam | |||
| Cisco Systems, Inc. | Consultant | |||
| May 2023 | June 2023 | |||
| Service Assurance for Intent-Based Networking Architecture | Service Assurance for Intent-Based Networking Architecture | |||
| Abstract | Abstract | |||
| This document describes an architecture that provides some assurance | This document describes an architecture that provides some assurance | |||
| that service instances are running as expected. As services rely | that service instances are running as expected. As services rely | |||
| upon multiple subservices provided by a variety of elements, | upon multiple subservices provided by a variety of elements, | |||
| including the underlying network devices and functions, getting the | including the underlying network devices and functions, getting the | |||
| assurance of a healthy service is only possible with a holistic view | assurance of a healthy service is only possible with a holistic view | |||
| skipping to change at line 99 | skipping to change at line 99 | |||
| Service orchestrators use Network Service YANG Modules that will | Service orchestrators use Network Service YANG Modules that will | |||
| infer network-wide configuration and, therefore, the invocation of | infer network-wide configuration and, therefore, the invocation of | |||
| the appropriate device modules (Section 3 of [RFC8969]). Knowing | the appropriate device modules (Section 3 of [RFC8969]). Knowing | |||
| that a configuration is applied doesn't imply that the provisioned | that a configuration is applied doesn't imply that the provisioned | |||
| service instance is up and running as expected. For instance, the | service instance is up and running as expected. For instance, the | |||
| service might be degraded because of a failure in the network, the | service might be degraded because of a failure in the network, the | |||
| service quality may be degraded, or a service function may be | service quality may be degraded, or a service function may be | |||
| reachable at the IP level but does not provide its intended function. | reachable at the IP level but does not provide its intended function. | |||
| Thus, the network operator must monitor the service's operational | Thus, the network operator must monitor the service's operational | |||
| data at the same time as the configuration (Section 3.3 of | data at the same time as the configuration (Section 3.3 of | |||
| [RFC8969]). To feul that task, the industry has been standardizing | [RFC8969]). To fuel that task, the industry has been standardizing | |||
| on telemetry to push network element performance information (e.g., | on telemetry to push network element performance information (e.g., | |||
| [RFC9375]). | [RFC9375]). | |||
| A network administrator needs to monitor its network and services as | A network administrator needs to monitor its network and services as | |||
| a whole, independently of the management protocols. With different | a whole, independently of the management protocols. With different | |||
| protocols come different data models and different ways to model the | protocols come different data models and different ways to model the | |||
| same type of information. When network administrators deal with | same type of information. When network administrators deal with | |||
| multiple management protocols, the network management entities have | multiple management protocols, the network management entities have | |||
| to perform the difficult and time-consuming job of mapping data | to perform the difficult and time-consuming job of mapping data | |||
| models, e.g., the model used for configuration with the model used | models, e.g., the model used for configuration with the model used | |||
| for monitoring when separate models or protocols are used. This | for monitoring when separate models or protocols are used. This | |||
| problem is compounded by a large, disparate set of data sources | problem is compounded by a large, disparate set of data sources | |||
| (e.g., MIB modules, YANG models [RFC7950], IP Flow Information Export | (e.g., MIB modules, YANG data models [RFC7950], IP Flow Information | |||
| (IPFIX) information elements [RFC7011], syslog plain text [RFC5424], | Export (IPFIX) information elements [RFC7011], syslog plain text | |||
| Terminal Access Controller Access-Control System Plus (TACACS+) | [RFC5424], Terminal Access Controller Access-Control System Plus | |||
| [RFC8907], RADIUS [RFC2865], etc.). In order to avoid this data | (TACACS+) [RFC8907], RADIUS [RFC2865], etc.). In order to avoid this | |||
| model mapping, the industry converged on model-driven telemetry to | data model mapping, the industry converged on model-driven telemetry | |||
| stream the service operational data, reusing the YANG models used for | to stream the service operational data, reusing the YANG data models | |||
| configuration. Model-driven telemetry greatly facilitates the notion | used for configuration. Model-driven telemetry greatly facilitates | |||
| of closed-loop automation, whereby events and updated operational | the notion of closed-loop automation, whereby events and updated | |||
| states streamed from the network drive remediation change back into | operational states streamed from the network drive remediation change | |||
| the network. | back into the network. | |||
| However, it proves difficult for network operators to correlate the | However, it proves difficult for network operators to correlate the | |||
| service degradation with the network root cause, for example, "Why | service degradation with the network root cause, for example, "Why | |||
| does my layer 3 virtual private network (L3VPN) fail to connect?" or | does my layer 3 virtual private network (L3VPN) fail to connect?" or | |||
| "Why is this specific service not highly responsive?" The reverse, | "Why is this specific service not highly responsive?" The reverse, | |||
| i.e., which services are impacted when a network component fails or | i.e., which services are impacted when a network component fails or | |||
| degrades, is also important for operators, for example, "Which | degrades, is also important for operators, for example, "Which | |||
| services are impacted when this specific optic decibel milliwatt | services are impacted when this specific optic decibel milliwatt | |||
| (dBm) begins to degrade?", "Which applications are impacted by an | (dBm) begins to degrade?", "Which applications are impacted by an | |||
| imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that | imbalance in this Equal-Cost Multipath (ECMP) bundle?", or "Is that | |||
| skipping to change at line 356 | skipping to change at line 356 | |||
| graph and computing the health statuses in a distributed manner. The | graph and computing the health statuses in a distributed manner. The | |||
| collector is in charge of collecting and displaying the current | collector is in charge of collecting and displaying the current | |||
| inferred health status of the service instances and subservices. The | inferred health status of the service instances and subservices. The | |||
| collector also detects changes in the assurance graph structures | collector also detects changes in the assurance graph structures | |||
| (e.g., an occurrence of a switchover from primary to backup path) and | (e.g., an occurrence of a switchover from primary to backup path) and | |||
| forwards the information to the orchestrator, which reconfigures the | forwards the information to the orchestrator, which reconfigures the | |||
| agents. Finally, the automation loop is closed by having the SAIN | agents. Finally, the automation loop is closed by having the SAIN | |||
| collector provide feedback to the network/service orchestrator. | collector provide feedback to the network/service orchestrator. | |||
| In order to make agents, orchestrators, and collectors from different | In order to make agents, orchestrators, and collectors from different | |||
| vendors interoperable, their interface is defined as a YANG model in | vendors interoperable, their interface is defined as a YANG module in | |||
| a companion document [RFC9418]. In Figure 1, the communications that | a companion document [RFC9418]. In Figure 1, the communications that | |||
| are normalized by this YANG model are tagged with a "Y". The use of | are normalized by this YANG module are tagged with a "Y". The use of | |||
| this YANG module is further explained in Section 3.5. | this YANG module is further explained in Section 3.5. | |||
| +-----------------+ | +-----------------+ | |||
| | Service | | | Service | | |||
| | Orchestrator |<----------------------+ | | Orchestrator |<----------------------+ | |||
| | | | | | | | | |||
| +-----------------+ | | +-----------------+ | | |||
| | ^ | | | ^ | | |||
| | | Network | | | | Network | | |||
| | | Service | Feedback | | | Service | Feedback | |||
| skipping to change at line 811 | skipping to change at line 811 | |||
| account in the parent service instance or subservice instance(s) | account in the parent service instance or subservice instance(s) | |||
| for informational reasons. | for informational reasons. | |||
| Impacting Dependency: | Impacting Dependency: | |||
| The type of dependency whose health score impacts the health score | The type of dependency whose health score impacts the health score | |||
| of its parent subservice or service instance(s) in the assurance | of its parent subservice or service instance(s) in the assurance | |||
| graph. The symptoms are taken into account in the parent service | graph. The symptoms are taken into account in the parent service | |||
| instance or subservice instance(s) as the impacting reasons. | instance or subservice instance(s) as the impacting reasons. | |||
| The set of dependency types presented here is not exhaustive. More | The set of dependency types presented here is not exhaustive. More | |||
| specific dependency types can be defined by extending the YANG model. | specific dependency types can be defined by extending the YANG | |||
| For instance, a connectivity subservice depending on several path | module. For instance, a connectivity subservice depending on several | |||
| subservices is partially impacted if only one of these paths fails. | path subservices is partially impacted if only one of these paths | |||
| Adding these new dependency types requires defining the corresponding | fails. Adding these new dependency types requires defining the | |||
| operation for combining statuses of subservices. | corresponding operation for combining statuses of subservices. | |||
| Subservices shall not be dependent on the protocol used to retrieve | Subservices shall not be dependent on the protocol used to retrieve | |||
| the metrics. To justify this, let's consider the interface | the metrics. To justify this, let's consider the interface | |||
| operational status. Depending on the device capabilities, this | operational status. Depending on the device capabilities, this | |||
| status can be collected by an industry-accepted YANG module (e.g., | status can be collected by an industry-accepted YANG module (e.g., | |||
| IETF or Openconfig [OpenConfig]), by a vendor-specific YANG module, | IETF or Openconfig [OpenConfig]), by a vendor-specific YANG module, | |||
| or even by a MIB module. If the subservice was dependent on the | or even by a MIB module. If the subservice was dependent on the | |||
| mechanism to collect the operational status, then we would need | mechanism to collect the operational status, then we would need | |||
| multiple subservice definitions in order to support all different | multiple subservice definitions in order to support all different | |||
| mechanisms. This also implies that, while waiting for all the | mechanisms. This also implies that, while waiting for all the | |||
| metrics to be available via standard YANG modules, SAIN agents might | metrics to be available via standard YANG modules, SAIN agents might | |||
| have to retrieve metric values via nonstandard YANG models, MIB | have to retrieve metric values via nonstandard YANG data models, MIB | |||
| modules, the Command-Line Interface (CLI), etc., effectively | modules, the Command-Line Interface (CLI), etc., effectively | |||
| implementing a normalization layer between data models and | implementing a normalization layer between data models and | |||
| information models. | information models. | |||
| In order to keep subservices independent of metric collection method | In order to keep subservices independent of metric collection method | |||
| (or, expressed differently, to support multiple combinations of | (or, expressed differently, to support multiple combinations of | |||
| platforms, OSes, and even vendors), the architecture introduces the | platforms, OSes, and even vendors), the architecture introduces the | |||
| concept of "metric engine". The metric engine maps each device- | concept of "metric engine". The metric engine maps each device- | |||
| independent metric used in the subservices to a list of device- | independent metric used in the subservices to a list of device- | |||
| specific metric implementations that precisely define how to fetch | specific metric implementations that precisely define how to fetch | |||
| skipping to change at line 1042 | skipping to change at line 1042 | |||
| This document has no IANA actions. | This document has no IANA actions. | |||
| 5. Security Considerations | 5. Security Considerations | |||
| The SAIN architecture helps operators to reduce the mean time to | The SAIN architecture helps operators to reduce the mean time to | |||
| detect and the mean time to repair. However, the SAIN agents must be | detect and the mean time to repair. However, the SAIN agents must be | |||
| secured; a compromised SAIN agent may be sending incorrect root | secured; a compromised SAIN agent may be sending incorrect root | |||
| causes or symptoms to the management systems. Securing the agents | causes or symptoms to the management systems. Securing the agents | |||
| falls back to ensuring the integrity and confidentiality of the | falls back to ensuring the integrity and confidentiality of the | |||
| assurance graph. This can be partially achieved by correctly setting | assurance graph. This can be partially achieved by correctly setting | |||
| permissions of each node in the YANG model, as described in Section 6 | permissions of each node in the YANG data model, as described in | |||
| of [RFC9418]. | Section 6 of [RFC9418]. | |||
| Except for the configuration of telemetry, the agents do not need | Except for the configuration of telemetry, the agents do not need | |||
| "write access" to the devices they monitor. This configuration is | "write access" to the devices they monitor. This configuration is | |||
| applied with a YANG module, whose protection is covered by Secure | applied with a YANG module, whose protection is covered by Secure | |||
| Shell (SSH) [RFC6242] for the Network Configuration Protocol | Shell (SSH) [RFC6242] for the Network Configuration Protocol | |||
| (NETCONF) or TLS [RFC8446] for RESTCONF. Devices should be | (NETCONF) or TLS [RFC8446] for RESTCONF. Devices should be | |||
| configured so that agents have their own credentials with write | configured so that agents have their own credentials with write | |||
| access only for the YANG nodes configuring the telemetry. | access only for the YANG nodes configuring the telemetry. | |||
| The data collected by SAIN could potentially be compromising to the | The data collected by SAIN could potentially be compromising to the | |||
| skipping to change at line 1095 | skipping to change at line 1095 | |||
| Explained", RFC 8309, DOI 10.17487/RFC8309, January 2018, | Explained", RFC 8309, DOI 10.17487/RFC8309, January 2018, | |||
| <https://www.rfc-editor.org/info/rfc8309>. | <https://www.rfc-editor.org/info/rfc8309>. | |||
| [RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and | [RFC8969] Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and | |||
| L. Geng, "A Framework for Automating Service and Network | L. Geng, "A Framework for Automating Service and Network | |||
| Management with YANG", RFC 8969, DOI 10.17487/RFC8969, | Management with YANG", RFC 8969, DOI 10.17487/RFC8969, | |||
| January 2021, <https://www.rfc-editor.org/info/rfc8969>. | January 2021, <https://www.rfc-editor.org/info/rfc8969>. | |||
| [RFC9418] Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T. | [RFC9418] Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T. | |||
| Arumugam, "YANG Modules for Service Assurance", RFC 9418, | Arumugam, "YANG Modules for Service Assurance", RFC 9418, | |||
| DOI 10.17487/RFC9418, May 2023, | DOI 10.17487/RFC9418, June 2023, | |||
| <https://www.rfc-editor.org/info/rfc9418>. | <https://www.rfc-editor.org/info/rfc9418>. | |||
| 6.2. Informative References | 6.2. Informative References | |||
| [OpenConfig] | [OpenConfig] | |||
| "OpenConfig", <https://openconfig.net>. | "OpenConfig", <https://openconfig.net>. | |||
| [Piovesan2017] | [Piovesan2017] | |||
| Piovesan, A. and E. Griffor, "7 - Reasoning About Safety | Piovesan, A. and E. Griffor, "7 - Reasoning About Safety | |||
| and Security: The Logic of Assurance", | and Security: The Logic of Assurance", | |||
| skipping to change at line 1217 | skipping to change at line 1217 | |||
| 28006 Madrid | 28006 Madrid | |||
| Spain | Spain | |||
| Email: diego.r.lopez@telefonica.com | Email: diego.r.lopez@telefonica.com | |||
| Dan Voyer | Dan Voyer | |||
| Bell Canada | Bell Canada | |||
| Canada | Canada | |||
| Email: daniel.voyer@bell.ca | Email: daniel.voyer@bell.ca | |||
| Thangam Arumugam | Thangam Arumugam | |||
| Cisco Systems, Inc. | Consultant | |||
| Milpitas, California | Milpitas, California | |||
| United States of America | United States of America | |||
| Email: tarumuga@cisco.com | Email: thangavelu@yahoo.com | |||
| End of changes. 11 change blocks. | ||||
| 25 lines changed or deleted | 25 lines changed or added | |||

This html diff was produced by rfcdiff 1.48.
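
The dependency types quoted in the change block at line 811 can be illustrated with a minimal sketch. It is not taken from RFC 9417 or RFC 9418. The 0..100 score scale, the use of `min` to combine scores across impacting dependencies, and every name below (`Subservice`, `DependencyType`, `Health`, `evaluate`) are assumptions made for illustration only. An informational dependency is assumed here to leave the parent score untouched and to surface the child's symptoms as informational reasons.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class DependencyType(Enum):
    INFORMATIONAL = "informational"  # child symptoms are reported, score is not affected
    IMPACTING = "impacting"          # child score and symptoms propagate to the parent


@dataclass
class Subservice:
    name: str
    own_score: int                   # assumed scale: 0 (broken) .. 100 (healthy)
    symptoms: list[str] = field(default_factory=list)
    dependencies: list[tuple[Subservice, DependencyType]] = field(default_factory=list)


@dataclass
class Health:
    score: int
    impacting_reasons: list[str]
    informational_reasons: list[str]


def evaluate(node: Subservice) -> Health:
    """Recursively combine the health of a node with that of its dependencies."""
    score = node.own_score
    impacting = list(node.symptoms)
    informational: list[str] = []
    for child, dep_type in node.dependencies:
        child_health = evaluate(child)
        if dep_type is DependencyType.IMPACTING:
            # Assumed combining operation: the worst impacting score wins.
            score = min(score, child_health.score)
            impacting.extend(child_health.impacting_reasons)
            informational.extend(child_health.informational_reasons)
        else:
            # Informational: keep the parent score, record all child symptoms.
            informational.extend(child_health.impacting_reasons)
            informational.extend(child_health.informational_reasons)
    return Health(score, impacting, informational)


# Hypothetical assurance graph: a tunnel with one dependency of each type.
link = Subservice("interface", 40, ["high error rate"])
peer = Subservice("peer device", 0, ["device unreachable"])
tunnel = Subservice(
    "tunnel",
    100,
    dependencies=[
        (link, DependencyType.IMPACTING),
        (peer, DependencyType.INFORMATIONAL),
    ],
)
tunnel_health = evaluate(tunnel)
```

A more specific dependency type, such as the partially impacted connectivity subservice mentioned in the same change block, would need its own combining operation in place of `min`. The sketch also re-evaluates a subservice once per parent, so a real assurance graph with shared subservices would likely cache results.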