rfc9485v4.txt   rfc9485.txt 
Internet Engineering Task Force (IETF) C. Bormann Internet Engineering Task Force (IETF) C. Bormann
Request for Comments: 9485 Universität Bremen TZI Request for Comments: 9485 Universität Bremen TZI
Category: Standards Track T. Bray Category: Standards Track T. Bray
ISSN: 2070-1721 Textuality ISSN: 2070-1721 Textuality
September 2023 October 2023
I-Regexp: An Interoperable Regular Expression Format I-Regexp: An Interoperable Regular Expression Format
Abstract Abstract
This document specifies I-Regexp, a flavor of regular expression that This document specifies I-Regexp, a flavor of regular expression that
is limited in scope with the goal of interoperation across many is limited in scope with the goal of interoperation across many
different regular expression libraries. different regular expression libraries.
Status of This Memo Status of This Memo
skipping to change at line 81 skipping to change at line 81
(abbreviated as "regexp") flavor, I-Regexp. (abbreviated as "regexp") flavor, I-Regexp.
I-Regexp does not provide advanced regular expression features such I-Regexp does not provide advanced regular expression features such
as capture groups, lookahead, or backreferences. It supports only a as capture groups, lookahead, or backreferences. It supports only a
Boolean matching capability, i.e., testing whether a given regular Boolean matching capability, i.e., testing whether a given regular
expression matches a given piece of text. expression matches a given piece of text.
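
The Boolean matching model described above can be illustrated with a minimal sketch. It assumes the regexp stays within the subset of syntax that I-Regexp and Python's "re" module interpret the same way (simple character classes and quantifiers). The helper name iregexp_matches is hypothetical and is not part of the specification. Constructs such as "." or \p{...} would need translation first and are not handled here.

```python
import re


def iregexp_matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches the whole of `text`.

    Assumption: `pattern` uses only syntax that Python's `re` module
    interprets the same way as I-Regexp. No translation is attempted.
    """
    # fullmatch anchors the match at both ends of the text; only the
    # Boolean outcome is exposed, never a match object or groups.
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    print(iregexp_matches("[a-z]+[0-9]{2}", "abc42"))  # whole text matches
    print(iregexp_matches("[a-z]+", "abc42"))          # only a prefix matches
```

The function returns a plain bool rather than a match object, which mirrors the absence of capture groups in I-Regexp.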
I-Regexp supports the entire repertoire of Unicode characters I-Regexp supports the entire repertoire of Unicode characters
(Unicode scalar values); both the I-Regexp strings themselves and the (Unicode scalar values); both the I-Regexp strings themselves and the
strings they are matched against are sequences of Unicode scalar strings they are matched against are sequences of Unicode scalar
values (often represented in UTF-8 encoding form [STD63] for values (often represented in UTF-8 encoding form [RFC3629] for
interchange). interchange).
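
As an illustration of the "sequence of Unicode scalar values" requirement, the following sketch checks a Python string for surrogate code points (U+D800 through U+DFFF), which are code points but not scalar values. The function name is hypothetical. Whether a platform's strings can contain such values at all depends on the platform; Python strings can.

```python
def is_scalar_value_sequence(s: str) -> bool:
    """Return True if every code point in `s` is a Unicode scalar value.

    Scalar values are all code points except the surrogate range
    U+D800..U+DFFF.
    """
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


if __name__ == "__main__":
    print(is_scalar_value_sequence("h\u00e9llo \U0001F600"))  # no surrogates
    print(is_scalar_value_sequence("a\ud800b"))               # lone surrogate
```

A string that passes this check can always be encoded as UTF-8 for interchange. A string containing a lone surrogate cannot.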
I-Regexp is a subset of XML Schema Definition (XSD) regular I-Regexp is a subset of XML Schema Definition (XSD) regular
expressions [XSD-2]. expressions [XSD-2].
This document includes guidance for converting I-Regexps for use with This document includes guidance for converting I-Regexps for use with
several well-known regular expression idioms. several well-known regular expression idioms.
The development of I-Regexp was motivated by the work of the JSONPath The development of I-Regexp was motivated by the work of the JSONPath
Working Group (WG). The WG wanted to include support for the use of Working Group (WG). The WG wanted to include support for the use of
skipping to change at line 337 skipping to change at line 337
libraries in severely constrained environments may not be able to libraries in severely constrained environments may not be able to
support I-Regexp conformance. support I-Regexp conformance.
7. IANA Considerations 7. IANA Considerations
This document has no IANA actions. This document has no IANA actions.
8. Security Considerations 8. Security Considerations
While technically out of the scope of this specification, Section 10 While technically out of the scope of this specification, Section 10
("Security Considerations") of [STD63] applies to implementations. ("Security Considerations") of [RFC3629] applies to implementations.
Particular note needs to be taken of the last paragraph of Section 3 Particular note needs to be taken of the last paragraph of Section 3
("UTF-8 definition") of [STD63]; an I-Regexp implementation may need ("UTF-8 definition") of [RFC3629]; an I-Regexp implementation may
to mitigate limitations of the platform implementation in this need to mitigate limitations of the platform implementation in this
regard. regard.
As discussed in Section 6, more complex regexp libraries may contain As discussed in Section 6, more complex regexp libraries may contain
exploitable bugs, which can lead to crashes and remote code exploitable bugs, which can lead to crashes and remote code
execution. There is also the problem that such libraries often have execution. There is also the problem that such libraries often have
performance characteristics that are hard to predict, leading to performance characteristics that are hard to predict, leading to
attacks that overload an implementation by matching against an attacks that overload an implementation by matching against an
expensive attacker-controlled regexp. expensive attacker-controlled regexp.
I-Regexps have been designed to allow implementation in a way that is I-Regexps have been designed to allow implementation in a way that is
skipping to change at line 444 skipping to change at line 444
ietf-jsonpath-base-20>. ietf-jsonpath-base-20>.
[PCRE2] "Perl-compatible Regular Expressions (revised API: [PCRE2] "Perl-compatible Regular Expressions (revised API:
PCRE2)", <http://pcre.org/current/doc/html/>. PCRE2)", <http://pcre.org/current/doc/html/>.
[RE2] "RE2 is a fast, safe, thread-friendly alternative to [RE2] "RE2 is a fast, safe, thread-friendly alternative to
backtracking regular expression engines like those used in backtracking regular expression engines like those used in
PCRE, Perl, and Python. It is a C++ library.", commit PCRE, Perl, and Python. It is a C++ library.", commit
73031bb, <https://github.com/google/re2>. 73031bb, <https://github.com/google/re2>.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2003, <https://www.rfc-editor.org/info/rfc3629>.
[RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493, [RFC7493] Bray, T., Ed., "The I-JSON Message Format", RFC 7493,
DOI 10.17487/RFC7493, March 2015, DOI 10.17487/RFC7493, March 2015,
<https://www.rfc-editor.org/info/rfc7493>. <https://www.rfc-editor.org/info/rfc7493>.
[STD63] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2003, <https://www.rfc-editor.org/info/rfc3629>.
[UNICODE-GLOSSARY] [UNICODE-GLOSSARY]
Unicode, Inc., "Glossary of Unicode Terms", Unicode, Inc., "Glossary of Unicode Terms",
<https://unicode.org/glossary/>. <https://unicode.org/glossary/>.
Acknowledgements Acknowledgements
Discussion in the IETF JSONPATH WG about whether to include a regexp Discussion in the IETF JSONPATH WG about whether to include a regexp
mechanism into the JSONPath query expression specification and mechanism into the JSONPath query expression specification and
previous discussions about the YANG pattern and Concise Data previous discussions about the YANG pattern and Concise Data
Definition Language (CDDL) .regexp features motivated this Definition Language (CDDL) .regexp features motivated this
 End of changes. 6 change blocks. 
9 lines changed or deleted 9 lines changed or added
