Network Working Group R. Peon Internet-Draft Google, Inc Intended status: Informational Oct 12, 2012 Expires: April 15, 2013 Header Delta-Compression for HTTP/2.0 draft-rpeon-httpbis-header-compression-00 Abstract This document describes a mechanism for compressing streams of groups of key-value pairs, often known as Headers in an HTTP session. See RFC 2616 [RFC2616] or successors for more information about headers. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 15, 2013. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Peon Expires April 15, 2013 [Page 1] Internet-Draft HTTP/2 Header Compression Oct 2012 Table of Contents 1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Example-data-model . . . . . . . . . . . . . . . . . . . . . . 3 4. Compression-overview . . . . . . . . . . . . . . . . . . . . . 4 5. Decompression-overview . . . . . . . . . . . . . . . . . . . . 6 6. On-the-wire-overview . . . . . . . . . . . . . . . . . . . . . 7 7. Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 9 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 9. Requirements Notation . . . . . . . . . . . . . . . . . . . . 10 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 11. Normative References . . . . . . . . . . . . . . . . . . . . . 10 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 10 Peon Expires April 15, 2013 [Page 2] Internet-Draft HTTP/2 Header Compression Oct 2012 1. Overview There have been several problems pointed out with the use of the gzip compressor in SPDY [SPDY]. The biggest of these problems is that it is possible for a smart attacker to inject content into the compressor, and then to test hypotheses about the prior contents of the compressor by examining the output size after each such content injection. The other issue is that gzip often consumes more CPU than many would like, especially in situations where one is doing forward or reverse proxying. The compressor proposed here intends to solve the first issue and significantly mitigate the second, while providing compression that is not too much worse than gzip. 2. Definitions user-agent: The program or device which a human interacts with directly and which typically initiates the transport layer connection or session client: Synonym for user-agent server: The computer or device which typically accepts a connection, stores, and serves data proxy: An entity acting as a server for the client, and a client for the server header: An complete set of key-value pairs, either request- headers, or response headers as defined in RFC2616 [RFC2616] section 5.3 or 6.2, respectively 3. Example-data-model The author's current experimental implementation uses something very close to the following data-model for doing compression. Peon Expires April 15, 2013 [Page 3] Internet-Draft HTTP/2 Header Compression Oct 2012 struct ValEntry { KeyMapIterator krt; ValMapIterator vrt; LRUIterator lrt; Index val_idx; }; typedef map ValMap; struct KeyEntry { int refcnt; Index key_idx; ValMap valmap; } typedef map KeyMap; typedef set HeaderGroup; typedef map HeaderGroups; struct CompressorState { HeaderGroups header_groups; KeyMap key_map; list lru; }; typedef map; KeyIdxMap; typedef map; ValIdxMap; struct DecompressorState { KeyIdxMap key_idx_map; ValIdxMap val_idx_map; list lru; }; 4. Compression-overview Note that this only describes conceptually how the algorithm works and is very much pseudo-code User-agents and servers SHOULD NOT NOT use EREF operations. User-agents and servers SHOULD use huffman-coding when so doing reduces the size of the bytes on the wire Proxies MAY use EREF operations when throughput is more important than client experienced latency. Peon Expires April 15, 2013 [Page 4] Internet-Draft HTTP/2 Header Compression Oct 2012 Proxies MAY choose to encode all strings as literal strings and skip huffman coding. Compress(headers): first_used_val_idx = CurrentValIdx() old_keymap = keymap For header in headers: if key in keymap: keymap[key].refcnt += 1 for header in headers: key_map_iterator = lookup(header.key) if (!key_map_iterator): key_iterator = insert_new_key(header.key) insert_new_value_from_key(key_iterator, header.val) instructions.append(KVSto(key,val)) else if (val_iterator = lookup_val_in_keymap_entry(key_map_iterator, header.val)): // we've found that the key,value pair already exists in the compressor // we now must check to see if it is already visible, in which case // there is nothing to do. If it isn't visible, however, we'll toggle // the visibility on. if (!header_group.includes(val_iterator_>val_idx)): instructions.append(Toggle(val_iterator_>val_idx)); else: // we know we found a key, but not the entire key_value. // thus, we emit a clone() instructions. instructions.append(Clone(key_iterator_>key_id, header.val) if key in old_keymap: keymap[key].refcnt += 1 for header_group_entry in header_group: if header_group_entry not in header: instructions.append(header_group_entry_>idx()) for instruction in instructions: ExecuteInstruction(instructions) SerializeInstructions(output_buffer, instructions, first_used_val_idx) ExecuteInstruction(instruction): while (instruction.size_delta() + state_size > max_state_size): RemoveOldestLRUItem() if instruction.opcode == "clone": InsertKV(instruction.lookup_key(), instruction.val(), GetNextLRUIdx()) elif instruction.opcode == "kvsto": InsertKV(instruction.key(), instruction.val(), GetNextLRUIdx()) Peon Expires April 15, 2013 [Page 5] Internet-Draft HTTP/2 Header Compression Oct 2012 elif instruction.opcode == "togglerange": for (i=instruction.start; i <= instruction.last; ++i) bool prev_state = ToggleIdx(i) if prev_state == NOT_VISIBLE: MoveToFrontOfLRU(i) elif instruction.opcode == "toggle": bool prev_state = ToggleIdx(instruction.index) if prev_state == NOT_VISIBLE: MoveToFrontOfLRU(instruction.index) SerializeInstructions(output_buffer, instructions, first_val_idx): // Note that huffman-encoding is optional-- a bit (somwhere) would // indicate that it was or was not used for any particular string. output_buffer.WriteValIdx(first_val_idx) for toggle in filter(TOGGLE, instructions): output_buffer.WriteBits(TOGGLE) output_buffer.WriteValIdx(toggle.val_idx) for toggle in filter(TOGGLE_RANGE, instructions): output_buffer.WriteBits(TOGGLE_RANGE) output_buffer.WriteValIdx(toggle.start) output_buffer.WriteValIdx(toggle.last) for clone in filter(CLONE, instructions): output_buffer.WriteBits(CLONE) output_buffer.WriteKeyIdx(clone.key_idx) output_buffer.WriteString(huffman.encode(clone.val)) for kvsto in filter(KVSTO, instructions): output_buffer.WriteString(huffman.encode(kvsto.key)) output_buffer.WriteString(huffman.encode(kvsto.val)) for eref in filter(EREF, instructions): output_buffer.WriteString(huffman.encode(kvsto.key)) output_buffer.WriteString(huffman.encode(kvsto.val)) 5. Decompression-overview Note that this only describes conceptually how the algorithm works and is very much pseudo-code Peon Expires April 15, 2013 [Page 6] Internet-Draft HTTP/2 Header Compression Oct 2012 Decode(): header_group = headers.header_group // as received from sender val_idx = headers_frame.first_val_idx // as received from sender. header_groups[header_group].ephemereal_headers.clear() for instruction in instructions: ExecuteInstruction(instruction, header_group) ExecuteInstruction(instruction, header_group): while (instruction.size_delta() + state_size > max_state_size): RemoveOldestLRUItem() if instruction.opcode == TOGGLE: if instruction.idx in header_groups[header_group]: header_groups[header_group].remove(instruction.idx) else: header_groups[header_group].insert(instruction.idx) MoveToHeadOfLRU(instruction.idx) elif instruction.opcode == TOGGLE_GROUP: for i=instruction.begin; i <= instruction.last; ++i: ExecuteInstruction(Toggle(i), header_group); elif instruction.opcode == CLONE: KeyIdxMap::iterator kimi = key_idx_map.lookup(instruction.key_idx) val_idx_map.insert(++val_idx, pair(kimi, instruction.val)) elif instruction.opcode == KVSTO: key_idx = GetNextKeyIdx() KeyIdxMap::iterator kimi = key_idx_map.insert(key_idx, instruction.key) val_idx_map.insert(++val_idx, pair(kimi, instruction.val)) elif instruction.opcode == EREF: header_groups[header_group].ephemereal_headers.append( pair(instruction.key, instruction.val)) // This would return the complete set of headers, e.g. for an HTTP // request or response InflateHeaders(header_group): Headers headers for idx in header_groups[header_group]: headers.append(ValIdxMap[idx].first->first, ValIdxMap[idx].second) headers.append(header_groups[header_group].ephemereal_headers); return headers 6. On-the-wire-overview This would replace the current HEADERS frame described in the SPDY/2 draft specification. Peon Expires April 15, 2013 [Page 7] Internet-Draft HTTP/2 Header Compression Oct 2012 The header-group-id MUST be emitted immediately after the HEADERS frame boilerplate. A field indicating the start of the ID-sequence space MUST be immediately after the header-group-id A sequence of 'Toggle' or 'ToggleRange' operations MUST follow, if any of these operations are to be emitted at all A sequence of 'Clone' or 'KVSto' operations MUST then follow, if any of these operations are to be emitted at all A sequence of 'ERef' operations MUST then follow, if any of these are to be emitted. If the size of the aggregate set of instructions is larger than the frame-length, the 'finished' bit of the HEADERS frame boilerplate MUST be set to 0. This will indicate that the next subsequent HEADERS frame on that stream is a continuation of a previous HEADERS frame If all of the 'Toggle', 'ToggleRange', 'Clone', and 'KVSTo' operations have been emitted, the 'finished' bit of the HEADERS frame which encapsulates the last instruction MUST be set to 1. The rough ABNF for this would be: HEADERS_FRAME: HEADER_FRAME_BOILERPLATE GROUP-ID FIRST-VAL-IDX INSTRUCTIONS HEADER_FRAME_BOILERPLATE: LENGTH STREAM_ID HEADERS_OPCODE HEADER_DONE_FLAG HEADER_DONE_FLAG: (TRUE | FALSE) INSTRUCTIONS : *(TOGGLE | TOGGLERANGE) *( CLONE | KVSTO) *(EREF) TOGGLE: TOGGLE_OPCODE VAL_IDX TOGGLERANGE: TOGGLERANGE_OPCODE VAL_IDX_START VAL_IDX_END CLONE: CLONE_OPCODE KEY_IDX VAL_STRING KVSTO: KVSTO_OPCODE KEY_STRING VAL_STRING EREF: EREF_OPCODE KEY_STRING VAL_STRING KEY_STRING: MAYBE_COMPRESSED_STRING VAL_STRING: MAYBE_COMPRESSED_STRING MAYBE_COMPRESSED_STRING: TRUE HUFFMAN_COMPRESSED_STRING | FALSE STRING_LITERAL VAL_IDX_START: int VAL_IDX_END: int VAL_IDX: int A complete set of headers has only been expressed when the HEADERS frame has the HEADERS_DONE bit set to true. If the headers being Peon Expires April 15, 2013 [Page 8] Internet-Draft HTTP/2 Header Compression Oct 2012 serialized do not fit into one HEADERS frame, the HEADERS_DONE flag for all but the last HEADERS frame MUST be set to false There are alternate serializations which have various interesting properties. Here are some interesting variations among these alternatives: Serialize all constant-length information before serializing any string data. This allows for a clear separation of byte- aligned vs. potentially non-byte-aligned data Have explicit sections for each opcode type, with a sentinel denoting the end of each range of opcode. Since the number of opcode types is small, this could reduce the overhead consumed in repeating the opcode. As above, but have a count for the number of each opcode instead of using a sentinel Use huffman coding on the entire payload of the HEADERS frame, not just the strings. TOGGLES are far more common than other operations, and using fewer bits for these could be beneficial 7. Operations Toggle: This encodes a single ValIdx, which, if already present in the HeaderGroup associated with the current HEADERS frame, will be removed, else added. If added, then the LRU entry which the index references MUST be moved to the head of the LRU. ToggleRange: This incodes two ValIdx, denoting an inclusive range of ValIdx which shall be toggled as per the Toggle operation, above. Clone: This encodes a KeyIdx which references a pre-existing key, and references a new string. The Clone operation inserts a new ValEntry into the ValMap associated with the KeyMap entry identified by the KeyIdx. The ValIdx for this new ValEntry is the ++current_val_idx; The new ValEntry MUST be inserted at the head of the LRU. KVSto: This encodes a new key string and new value string. The key will be inserted into the KeyMap, and assigned a new KeyIdx. The value string will be inserted into the Valmap associated with the KeyMap entry that was just added. As with Clone, a new unique ValIdx will be associated with it in a monotonically increasing sequence by using the value of ++current_val_idx. The new ValEntry MUST be inserted at the head of the LRU. Peon Expires April 15, 2013 [Page 9] Internet-Draft HTTP/2 Header Compression Oct 2012 Eref: This encodes a key and value literal which, while interpreted as part of the header frame, doesn't cause any persistent change in the compressor state. ERef stands for "ephemereal reference". 8. Security Considerations The compressor algorithm described here is expected to be immune to the current attacks against encrypted stream-based compressors such as TLS+gzip, but more scrutiny is warranted. The reason that it is believed that the algorithm(s) expressed here is immune is that any backreference to a header key or value always requires a whole-text match, and thus any probe of the compression context confirms no hypothesis unless the attacker has guessed the entire plaintext. 9. Requirements Notation The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 10. Acknowledgements 11. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. [SPDY] Belshe, M. and R. Peon, "SPDY PROTOCOL", . Author's Address Roberto Peon Google, Inc Email: fenix@google.com Peon Expires April 15, 2013 [Page 10]