Internet-Draft                                                G. Staykov
Intended status: Standards Track                                  VMware
Expires: May 07, 2013                                              J. Hu
                                                                  VMware
                                                       November 07, 2012

                          JSON Canonical Form
                draft-staykov-hu-json-canonical-form-00

Abstract

   A single JSON document can have multiple logically equivalent
   physical representations. While convenient for human interaction,
   this flexibility is inconvenient when a machine is used to assess
   the logical equivalence of documents. In cases where logical
   equivalence is useful, an encoder should produce a canonical form of
   a JSON document. For example, since digital signatures demand the
   same physical representation for logically equivalent documents, a
   canonical physical representation would allow the signature to apply
   to the logical document.

   This Internet-Draft defines a canonical form of JSON documents. Two
   logically equivalent documents should have the same canonical form.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

1. Introduction

   JSON [JSON] is a lightweight data-interchange text format that is
   suitable for both humans and machines. It allows multiple physical
   representations that are logically equivalent. For example, a
   formatting change that adds whitespace and line endings to make a
   document more human readable results in a different representation
   under a byte-for-byte comparison.

   There are cases, however, where it is essential to have a single
   physical representation of a data document. For example, when a
   cryptographic hash is applied over a JSON document, a single
   physical representation allows the hash to represent the logical
   content of the document by removing variation in how that content is
   encoded in JSON.

   Thus a common physical representation of logically equivalent JSON
   documents should be defined. It is called the canonical form.

2. JSON canonical form

   The canonical form is defined by the following rules:

   * The document MUST be encoded in UTF-8 [UTF-8]
   * Non-significant(1) whitespace characters MUST NOT be used
   * Non-significant(1) line endings MUST NOT be used
   * Entries (set of name/value pairs) in JSON objects MUST be sorted
     lexicographically(2) by their names
   * Arrays MUST preserve their initial ordering

   (1) As defined in the JSON data-interchange format [JSON], JSON
   objects consist of multiple "name"/"value" pairs and JSON arrays
   consist of multiple "value" fields. Non-significant means not part
   of a "name" or a "value".

   (2) Lexicographic comparison orders strings from least to greatest
   based on the UCS (Universal Character Set) codepoint values of their
   characters.

2.1. Canonical representation of data types

2.1.1. Double

   The double data type is represented as specified in the XML schema
   standard [XML]:

   * The canonical representation of the double data type consists of
     a mantissa followed by "E", followed by an exponent.
   * Mantissa
     * MUST be represented as a decimal. The decimal point is
       mandatory.
     * There MUST be a single non-zero digit to the left of the decimal
       point (unless a zero is represented).
     * There MUST be at least one digit to the right of the decimal
       point.
   * Exponent
     * A zero exponent is represented by "E0".
   * The "+" sign is prohibited in both the mantissa and the exponent.
   * Leading zeroes are prohibited to the left of the decimal point in
     the mantissa and in the exponent.
   * Special values (NaN, INF) MUST NOT be used.

3. Applications

   The JSON canonical form can be used when digitally signing JSON
   documents generated from a serialization library. Because
   serialization and deserialization libraries might tolerate variation
   in physical representation, different physical representations may
   result after several serialization / deserialization cycles. This
   could cause spurious signature verification failures, because the
   hash digest of the re-serialized document differs from the hash
   digest computed when signing, even though the logical document is
   unchanged. A way to avoid this problem is to use the canonical form
   both when signing and when verifying hash digests.

4. Examples

4.1. Example 1

   Input:

   { "foo" : "foo bar" }

   Canonical form:

   {"foo":"foo bar"}

   Demonstrates:

   * Non-significant whitespace characters and line endings are
     removed.
   * Whitespace inside the names and values of object entries is
     preserved.

4.2. Example 2

   Input:

   { "foo":"bar", "abc":"def", "zoo" : [ "def", "abc" ] }

   Canonical form:

   {"abc":"def","foo":"bar","zoo":["def","abc"]}

   Demonstrates:

   * Non-significant whitespace and line endings are removed.
   * Name/value pairs in JSON objects are sorted lexicographically by
     name.
   * Array order is preserved.

4.3. Example 3

   Input:

   { "d1":-12.34e4, "d2":1E-130, "d3":0.0E-0, "d4":1.2 }

   Canonical form:

   {"d1":-1.234E5,"d2":1.0E-130,"d3":0.0E0,"d4":1.2E0}

   Demonstrates:

   * Various canonical representations of the double data type.

5. Security Considerations

   This document provides groundwork for ensuring data integrity by
   means of digital signatures over JSON messages.

6. IANA Considerations

   This document has no actions for IANA.

7. References

7.1. Normative References

   [JSON]     http://www.json.org/

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [UTF-8]    Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.
              http://www.ietf.org/rfc/rfc3629.txt

   [XML]      http://www.w3.org/TR/xmlschema-2

Authors' Addresses

   Georgi Staykov
   VMware

   Email: gstaykov@vmware.com

   Jeff Hu
   VMware

   Email: jhu@vmware.com
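
   The rules of Section 2 and the double representation of Section
   2.1.1 can be illustrated with a short Python sketch. It is
   non-normative and is not part of the specification. The function
   names canonical_double, canonicalize and canonical_bytes are
   hypothetical. The sketch assumes that only values parsed as floating
   point are doubles, that integers, strings, true, false and null are
   written as Python's json module writes them, and that negative zero
   is written as 0.0E0; the draft does not settle these points.

```python
import json
import math
from decimal import Decimal


def canonical_double(value: float) -> str:
    """Render a finite float as <mantissa>E<exponent> (Section 2.1.1)."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError("NaN and INF have no canonical form")
    if value == 0:
        return "0.0E0"  # assumption: negative zero is not distinguished
    # repr() gives the shortest digits that round-trip; Decimal splits
    # them into sign, coefficient digits and a power-of-ten exponent.
    sign, digits, exponent = Decimal(repr(value)).as_tuple()
    text = "".join(map(str, digits))
    # Move the decimal point to just after the first digit.
    sci_exponent = len(text) - 1 + exponent
    fraction = text[1:].rstrip("0") or "0"
    return f"{'-' if sign else ''}{text[0]}.{fraction}E{sci_exponent}"


def canonicalize(value) -> str:
    """Serialize a parsed JSON value with no non-significant whitespace."""
    if isinstance(value, dict):
        # Python compares str by code point, as footnote (2) requires.
        members = (
            f"{canonicalize(name)}:{canonicalize(value[name])}"
            for name in sorted(value)
        )
        return "{" + ",".join(members) + "}"
    if isinstance(value, list):
        # Array order is preserved.
        return "[" + ",".join(canonicalize(item) for item in value) + "]"
    if isinstance(value, float):
        return canonical_double(value)
    # Assumption: strings, integers, true, false and null use the json
    # module's output, with non-ASCII characters left unescaped.
    return json.dumps(value, ensure_ascii=False)


def canonical_bytes(document: str) -> bytes:
    """Parse a JSON text and return its canonical form encoded as UTF-8."""
    return canonicalize(json.loads(document)).encode("utf-8")
```

   Applied to the inputs of Examples 1 to 3, canonical_bytes returns
   the canonical forms shown in Section 4. A signer and a verifier
   would each compute the hash digest over these bytes rather than over
   the document as received.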