| rfc9485v4.txt | rfc9485.txt |
|---|---|
| Internet Engineering Task Force (IETF) C. Bormann | Internet Engineering Task Force (IETF) C. Bormann | |||
| Request for Comments: 9485 Universität Bremen TZI | Request for Comments: 9485 Universität Bremen TZI | |||
| Category: Standards Track T. Bray | Category: Standards Track T. Bray | |||
| ISSN: 2070-1721 Textuality | ISSN: 2070-1721 Textuality | |||
| September 2023 | October 2023 | |||
| I-Regexp: An Interoperable Regular Expression Format | I-Regexp: An Interoperable Regular Expression Format | |||
| Abstract | Abstract | |||
| This document specifies I-Regexp, a flavor of regular expression that | This document specifies I-Regexp, a flavor of regular expression that | |||
| is limited in scope with the goal of interoperation across many | is limited in scope with the goal of interoperation across many | |||
| different regular expression libraries. | different regular expression libraries. | |||
| Status of This Memo | Status of This Memo | |||
| skipping to change at line 81 | skipping to change at line 81 | |||
| (abbreviated as "regexp") flavor, I-Regexp. | (abbreviated as "regexp") flavor, I-Regexp. | |||
| I-Regexp does not provide advanced regular expression features such | I-Regexp does not provide advanced regular expression features such | |||
| as capture groups, lookahead, or backreferences. It supports only a | as capture groups, lookahead, or backreferences. It supports only a | |||
| Boolean matching capability, i.e., testing whether a given regular | Boolean matching capability, i.e., testing whether a given regular | |||
| expression matches a given piece of text. | expression matches a given piece of text. | |||
| I-Regexp supports the entire repertoire of Unicode characters | I-Regexp supports the entire repertoire of Unicode characters | |||
| (Unicode scalar values); both the I-Regexp strings themselves and the | (Unicode scalar values); both the I-Regexp strings themselves and the | |||
| strings they are matched against are sequences of Unicode scalar | strings they are matched against are sequences of Unicode scalar | |||
| values (often represented in UTF-8 encoding form [STD63] for | values (often represented in UTF-8 encoding form [RFC3629] for | |||
| interchange). | interchange). | |||
| I-Regexp is a subset of XML Schema Definition (XSD) regular | I-Regexp is a subset of XML Schema Definition (XSD) regular | |||
| expressions [XSD-2]. | expressions [XSD-2]. | |||
| This document includes guidance for converting I-Regexps for use with | This document includes guidance for converting I-Regexps for use with | |||
| several well-known regular expression idioms. | several well-known regular expression idioms. | |||
| The development of I-Regexp was motivated by the work of the JSONPath | The development of I-Regexp was motivated by the work of the JSONPath | |||
| Working Group (WG). The WG wanted to include support for the use of | Working Group (WG). The WG wanted to include support for the use of | |||
| skipping to change at line 337 | skipping to change at line 337 | |||
| libraries in severely constrained environments may not be able to | libraries in severely constrained environments may not be able to | |||
| support I-Regexp conformance. | support I-Regexp conformance. | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| This document has no IANA actions. | This document has no IANA actions. | |||
| 8. Security Considerations | 8. Security Considerations | |||
| While technically out of the scope of this specification, Section 10 | While technically out of the scope of this specification, Section 10 | |||
| ("Security Considerations") of [STD63] applies to implementations. | ("Security Considerations") of [RFC3629] applies to implementations. | |||
| Particular note needs to be taken of the last paragraph of Section 3 | Particular note needs to be taken of the last paragraph of Section 3 | |||
| ("UTF-8 definition") of [STD63]; an I-Regexp implementation may need | ("UTF-8 definition") of [RFC3629]; an I-Regexp implementation may | |||
| to mitigate limitations of the platform implementation in this | need to mitigate limitations of the platform implementation in this | |||
| regard. | regard. | |||
| As discussed in Section 6, more complex regexp libraries may contain | As discussed in Section 6, more complex regexp libraries may contain | |||
| exploitable bugs, which can lead to crashes and remote code | exploitable bugs, which can lead to crashes and remote code | |||
| execution. There is also the problem that such libraries often have | execution. There is also the problem that such libraries often have | |||
| performance characteristics that are hard to predict, leading to | performance characteristics that are hard to predict, leading to | |||
| attacks that overload an implementation by matching against an | attacks that overload an implementation by matching against an | |||
| expensive attacker-controlled regexp. | expensive attacker-controlled regexp. | |||
| I-Regexps have been designed to allow implementation in a way that is | I-Regexps have been designed to allow implementation in a way that is | |||
| skipping to change at line 444 | skipping to change at line 444 | |||
| ietf-jsonpath-base-20>. | ietf-jsonpath-base-20>. | |||
| [PCRE2] "Perl-compatible Regular Expressions (revised API: | [PCRE2] "Perl-compatible Regular Expressions (revised API: | |||
| PCRE2)", <http://pcre.org/current/doc/html/>. | PCRE2)", <http://pcre.org/current/doc/html/>. | |||
| [RE2] "RE2 is a fast, safe, thread-friendly alternative to | [RE2] "RE2 is a fast, safe, thread-friendly alternative to | |||
| backtracking regular expression engines like those used in | backtracking regular expression engines like those used in | |||
| PCRE, Perl, and Python. It is a C++ library.", commit | PCRE, Perl, and Python. It is a C++ library.", commit | |||
| 73031bb, <https://github.com/google/re2>. | 73031bb, <https://github.com/google/re2>. | |||
| | [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO | |||
| | 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November | |||
| | 2003, <https://www.rfc-editor.org/info/rfc3629>. | |||
| [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, | [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, | |||
| DOI 10.17487/RFC7493, March 2015, | DOI 10.17487/RFC7493, March 2015, | |||
| <https://www.rfc-editor.org/info/rfc7493>. | <https://www.rfc-editor.org/info/rfc7493>. | |||
| [STD63] Yergeau, F., "UTF-8, a transformation format of ISO | ||||
| 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November | ||||
| 2003, <https://www.rfc-editor.org/info/rfc3629>. | ||||
| [UNICODE-GLOSSARY] | [UNICODE-GLOSSARY] | |||
| Unicode, Inc., "Glossary of Unicode Terms", | Unicode, Inc., "Glossary of Unicode Terms", | |||
| <https://unicode.org/glossary/>. | <https://unicode.org/glossary/>. | |||
| Acknowledgements | Acknowledgements | |||
| Discussion in the IETF JSONPATH WG about whether to include a regexp | Discussion in the IETF JSONPATH WG about whether to include a regexp | |||
| mechanism into the JSONPath query expression specification and | mechanism into the JSONPath query expression specification and | |||
| previous discussions about the YANG pattern and Concise Data | previous discussions about the YANG pattern and Concise Data | |||
| Definition Language (CDDL) .regexp features motivated this | Definition Language (CDDL) .regexp features motivated this | |||
| End of changes. 6 change blocks. | ||||
| 9 lines changed or deleted | 9 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||