IETF J. Klensin Internet-Draft S. Moonesamy Obsoletes: 3987 (if approved) July 9, 2012 Intended status: Standards Track Expires: January 10, 2013 An XML-based Simple Resource Identifier draft-klensin-iri-sri-00.txt Abstract While the URI specification has been widely deployed, it has long been recognized that many valid URIs, especially those that contain extensive information in the "tail" are unsuitable for user presentation, especially for internationalized environments. IRIs have been proposed as a solution for that problem but inherit (and are constrained by) the complex and sometimes method-dependent syntax model of URIs as well as positional and ordering assumptions that make them more difficult to localize than is desirable. This specification illustrates a way to define an "above URI" model for a localization-friendly simple reference identifier (SRI) that explicitly identifies fields and is more appropriate than IRIs to support localization. The current version is intended simply to initiate a discussion. In particular, while it is written to use an XML element syntax model, variations using JSON or some other system with explicitly-labeled data fields might be as, or more, appropriate. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on January 10, 2013. Copyright Notice Klensin & Moonesamy Expires January 10, 2013 [Page 1] Internet-Draft SRI July 2012 Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 1.2. Status and Discussion . . . . . . . . . . . . . . . . . . 3 2. Tagged Elements . . . . . . . . . . . . . . . . . . . . . . . 4 3. Data Element Description . . . . . . . . . . . . . . . . . . . 4 3.1. scheme Element . . . . . . . . . . . . . . . . . . . . . . 5 3.2. authority Element . . . . . . . . . . . . . . . . . . . . 5 3.2.1. user-info Element . . . . . . . . . . . . . . . . . . 5 3.2.2. host Element . . . . . . . . . . . . . . . . . . . . . 5 3.2.3. port Element . . . . . . . . . . . . . . . . . . . . . 5 3.3. path Element . . . . . . . . . . . . . . . . . . . . . . . 5 3.4. query Element . . . . . . . . . . . . . . . . . . . . . . 5 3.5. fragment Element . . . . . . . . . . . . . . . . . . . . . 6 4. Internationalization and Escapes . . . . . . . . . . . . . . . 6 5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 8. Security Considerations . . . . . . . . . . . . . . . . . . . 7 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 9.1. Normative References . . . . . . . . . . . . . . . . . . . 7 9.2. Informative References . . . . . . . . . . . . . . . . . . 8 Appendix A. This Specification and the IRI Approach . . . . . . . 8 Appendix B. XML DTD . . . . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 Klensin & Moonesamy Expires January 10, 2013 [Page 2] Internet-Draft SRI July 2012 1. Introduction While the URI specification [RFC3986] has been widely deployed, it has long been recognized that many valid URIs, especially those that contain extensive information in the "tail" are unsuitable for user presentation, especially for internationalized environments. IRIs [RFC3987] have been proposed as a solution for that problem but inherit (and are constrained by) the complex and sometimes method- dependent syntax model of URIs as well as positional and ordering assumptions that make them more difficult to localize than is desirable. This specification illustrates a way to define a localization- friendly "above URI" simple syntax (a "SRI") that explicitly identifies fields and is more appropriate than IRIs to support localization. [[anchor2: Note in Draft: "Simple" is chosen in the grand tradition of "simple" protocols like SMTP and SIP". Certainly the parsing of the compound identifier into components is simpler than the URI model. But suggestions for alternate terms would be welcome if "simple" turns into flame-bait.]] This specification obviates most, if not all, of the perceived need for IRIs and hence obsoletes the specification of them in RFC 3087. A discussion of the reasons for that action appears in Appendix A. 1.1. Terminology The terms "i18n" and "l10n" are liberally used as abbreviations for "internationalization" and "localization", respectively, in this specification. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 1.2. Status and Discussion [[anchor5: RFC Editor: Please remove this subsection.]] This draft is a pre-proposal to stimulate discussion of the IRI approach and alternatives to it. While it is deliberately incomplete, the path to an actual proposal should be clear. Also, the choice of an XML element syntax model [XML] structure was fairly arbitrary. It would probably be equally reasonable to support a JSON [RFC4627] or other structure instead (or additionally) as long as the basic syntax chosen supports clear identification of data elements Klensin & Moonesamy Expires January 10, 2013 [Page 3] Internet-Draft SRI July 2012 and a very precise and context-independent syntax for element values. Discussion of this draft should occur on the IRI WG mailing list. Details about subscription and archives for the list may be found at XXXXX. 2. Tagged Elements Much of the complexity in the URI specification lies in trying to identify and extract the various parts of a URI. That process is complicated by scheme-dependent elements and the associated delimiters which may be reserved or not depending on the scheme. That work may be appropriate if some system actually needs to parse and execute a URI -- an activity that requires understanding the scheme in any event-- but may be less appropriate for an i18n / l10n overlay. This specification overcomes that problem and the associated complexities introduced by characters outside the ASCII repertoire, URI escaping conventions, and so on by eliminating the constraint of forward compatibility with URIs in favor of a more international format that can be easily localized and equally easily be mapped into that URI syntax. 3. Data Element Description This section maps the various components of URIs into XML elements. For purposes of this specification, the URI syntax is discarded; only the data elements are retained. The mapping from an XML-structured document using these elements to URI syntax should be fairly obvious [[anchor8: ...and possibly covered in more detail in a future version of this spec]]. It is obviously possible to specify a collection of elements with this specification that, when mapped back into URI syntax, will be invalid or confusing for a particular scheme. If that is perceived as an issue, specific lists of what elements are valid for which schemes should be easy to compile. The basic structure starts with a localization-friendly element that contains all other elements (and has no direct textual content): [[anchor9: Note in Draft: Each of the subsections that follow can probably benefit from some fleshing-out. For this version, the general intent should be clear. It is likely that several more subsidiary elements are needed, but that is a topic for future discussion.]] Klensin & Moonesamy Expires January 10, 2013 [Page 4] Internet-Draft SRI July 2012 3.1. scheme Element SchemeName The Scheme element has no subsidiary elements. 3.2. authority Element Authority elements as below. The Authority element has the subsidiary elements listed in the subsections below. 3.2.1. user-info Element 3.2.2. host Element Domain names are subject to special rules because of IDNA considerations, so the normal content of the host element is a domain element. [Domain-]relative URIs do not use the domain element. 3.2.2.1. domain Element Fully-qualified-domain-name 3.2.3. port Element NN NN is a numeric port number. 3.3. path Element PathString [[anchor16: Subsidiary elements here, including and/or when appropriate.]] 3.4. query Element QueryString [[anchor18: Subsidiary elements here, including and/or when appropriate.]] Klensin & Moonesamy Expires January 10, 2013 [Page 5] Internet-Draft SRI July 2012 3.5. fragment Element FragmentName or other identifier The Fragment element has no subsidiary elements. 4. Internationalization and Escapes Part of the goal for the format specified here is to express the abstract components of a URI as naturally as possible. Consequently, any text component of any element can be expressed in UTF-8 in normalization form NFC. Escapes ("%" or otherwise) are prohibited except as required by XML. If "%" appears, it must be doubled in mapping to URI syntax. 5. Examples [[anchor22: There should be several of these, each showing a URI and the matching XRI form.]] The URI that would appear as http://example.com/test?sri=http://example.net/ Would appear in this form as: http example.com /test http example.net Klensin & Moonesamy Expires January 10, 2013 [Page 6] Internet-Draft SRI July 2012 [[anchor23: Note in draft: RFC (and I-D) constraints prohibit showing one of these data structures with characters in it outside the ASCII repertoire. If the document ever progresses to RFC, an alternate form that can show such examples including such characters should be a requirement.]] 6. Acknowledgements Some of the structuring information for this document was derived from a W3C working draft on URLs [W3C-URL] as well as the URI specification. The thinking that led to this work started with a discussion many years ago with James Seng in which he pointed out that the "natural" ordering of components of compound identifiers differed by culture. 7. IANA Considerations [[anchor24: RFC Editor: Please remove this section before publication.]] This memo includes no requests to or actions for IANA. 8. Security Considerations The model introduced in this specification does not raise any security issues not already present in the URI specification that would not be caught by a URI processor. Because it is less subtle and complex than the URI specification, it may actually lead to a reduction in vunerabilities. 9. References 9.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005. [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005. Klensin & Moonesamy Expires January 10, 2013 [Page 7] Internet-Draft SRI July 2012 [XML] Bray, T., Ed., Paoli, J., Ed., Sperberg-McQueen, C., Ed., and E. Maler, Ed., "Extensible Markup Language (XML) 1.0 (Second Edition), W3C=20 Recommendation", October 2000, . 9.2. Informative References [IRI-Charter] IETF, "Internationalized Resource Identifiers (iri)", Captured 2012-07-05, 2019, . [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003. [RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006. [RFC5890] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", RFC 5890, August 2010. [RFC6055] Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on Encodings for Internationalized Domain Names", RFC 6055, February 2011. [W3C-URL] W3C, "URL", Captured 2012-07-03, 2012, . Appendix A. This Specification and the IRI Approach The original IRI specification [RFC3987] was intended as a strict superset of the URI syntax [RFC3986] with all URI forms being permitted but with the use of non-escaped UTF-8 strings also being allowed. IRIs were not separate protocol identifiers or intended for use "on the wire". Instead, they were intended as an overlay for URIs that was more convenient for users. In part because of the interaction with the original [RFC3490] and revised [RFC5890] versions of the IDNA specification, the mapping from IRIs to URIs was not unique: one could map a domain name expressed as a UTF-8 string into either a URI escape sequence or into a set of IDNA A-labels. That choice interacted badly with the domain name encoding considerations discussed by the IAB [RFC6055] and, more importantly, with URI comparisons in caches and similar contexts. Based on those and other considerations, an IETF WG charged with IRI Klensin & Moonesamy Expires January 10, 2013 [Page 8] Internet-Draft SRI July 2012 revision [IRI-Charter] concluded that IRIs should be treated as a separate protocol identifier, primarily for use in new protocols, rather than as a strictly-forward-compatible URI overlay. That decision immediately raised the question of whether it was more valuable to preserve a URI-like syntax or depart from it entirely. This specification resulted from the desire to explore the possibilities that would be opened up by abandoning the constraint of apparent similarity to the URI syntax. But, just as the decision to move to a separate protocol identifier essentially recognizes that the IRIs defined in RFC 3987 was not feasible and an IRI variation that defined a new protocol element while retaining the general form of the URI syntax would obsolete 3987, this specification does as well: whether the underlying syntax model is changed or not, the WG has concluded that IRIs as defined in RFC 3987 are inappropriate for general use on the public Internet. Appendix B. XML DTD Klensin & Moonesamy Expires January 10, 2013 [Page 9] Internet-Draft SRI July 2012 Authors' Addresses John C Klensin 1770 Massachusetts Ave, Ste 322 Cambridge, MA 02140 USA Phone: +1 617 491 5735 Email: john-ietf@jck.com Subramanian Moonesamy 76, Ylang Ylang Avenue Quatre Bornes Mauritius Email: sm+ietf@elandsys.com Klensin & Moonesamy Expires January 10, 2013 [Page 10]