| rfc8878xml2.original.xml | rfc8878.xml | |||
|---|---|---|---|---|
| <?xml version="1.0" encoding="US-ASCII"?> | <?xml version="1.0" encoding="utf-8"?> | |||
| <!DOCTYPE rfc SYSTEM "rfc2629.dtd" [ | ||||
| <!ENTITY RFC1952 PUBLIC "" "http://xml2rfc.tools.ietf.org/public/rfc/bibxml/re | ||||
| ference.RFC.1952.xml"> | ||||
| ]> | ||||
| <rfc ipr="trust200902" category="info" obsoletes="8478" | ||||
| docName="draft-kucherawy-rfc8478bis-05"> | ||||
| <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?> | <!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent"> | |||
| <?rfc toc="yes" ?> | <rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" | |||
| <?rfc tocdepth="4" ?> | obsoletes="8478" docName="draft-kucherawy-rfc8478bis-05" number="8878" | |||
| <?rfc symrefs="yes" ?> | updates="" submissionType="IETF" category="info" consensus="true" | |||
| <?rfc sortrefs="yes"?> | xml:lang="en" tocInclude="true" tocDepth="4" symRefs="true" | |||
| <?rfc compact="yes" ?> | sortRefs="true" version="3"> | |||
| <?rfc subcompact="no"?> | ||||
| <front> | <front> | |||
| <title abbrev="application/zstd"> | ||||
| Zstandard Compression and the application/zstd Media Type | ||||
| </title> | ||||
| <author initials="Y." surname="Collet" | ||||
| fullname="Yann Collet"> | ||||
| <organization> | ||||
| </organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street>1 Hacker Way</street> | ||||
| <city>Menlo Park</city> | ||||
| <region>CA</region> | ||||
| <code>94025</code> | ||||
| <country>United States of America</country> | ||||
| </postal> | ||||
| <email>cyan@fb.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author initials="M. S." surname="Kucherawy" | ||||
| fullname="Murray S. Kucherawy" | ||||
| role="editor"> | ||||
| <organization> | ||||
| </organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street>1 Hacker Way</street> | ||||
| <city>Menlo Park</city> | ||||
| <region>CA</region> | ||||
| <code>94025</code> | ||||
| <country>United States of America</country> | ||||
| </postal> | ||||
| <email>msk@fb.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <date year="2020"/> | ||||
| <area>General</area> | ||||
| <keyword>compression</keyword> | <title abbrev="application/zstd"> | |||
| Zstandard Compression and the 'application/zstd' Media Type | ||||
| </title> | ||||
| <seriesInfo name="RFC" value="8878"/> | ||||
| <author initials="Y." surname="Collet" fullname="Yann Collet"> | ||||
| <organization>Facebook</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street>1 Hacker Way</street> | ||||
| <city>Menlo Park</city> | ||||
| <region>CA</region> | ||||
| <code>94025</code> | ||||
| <country>United States of America</country> | ||||
| </postal> | ||||
| <email>cyan@fb.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <author initials="M." surname="Kucherawy" fullname="Murray S. Kucherawy" rol | ||||
| e="editor"> | ||||
| <organization>Facebook</organization> | ||||
| <address> | ||||
| <postal> | ||||
| <street>1 Hacker Way</street> | ||||
| <city>Menlo Park</city> | ||||
| <region>CA</region> | ||||
| <code>94025</code> | ||||
| <country>United States of America</country> | ||||
| </postal> | ||||
| <email>msk@fb.com</email> | ||||
| </address> | ||||
| </author> | ||||
| <date year="2021" month="February" /> | ||||
| <area>General</area> | ||||
| <keyword>compression</keyword> | ||||
| <abstract> | ||||
| <abstract> | <t>Zstandard, or "zstd" (pronounced "zee standard"), is a lossless data | |||
| <t> Zstandard, or "zstd" (pronounced "zee standard"), is a | compression mechanism. This document describes the mechanism and | |||
| data compression mechanism. This document describes the | registers a media type, content encoding, and a structured syntax | |||
| mechanism and registers a media type and content | suffix to be used when transporting zstd-compressed content via MIME.</t> | |||
| encoding to be used when transporting zstd-compressed | ||||
| content via Multipurpose Internet Mail Extensions | ||||
| (MIME). It also registers a corresponding media | ||||
| type, content encoding, and structured syntax suffix. </t> | ||||
| <t> Despite use of the word "standard" as part of its name, | <t> Despite use of the word "standard" as part of Zstandard, | |||
| readers are advised that this document is not an | readers are advised that this document is not an | |||
| Internet Standards Track specification; it is being | Internet Standards Track specification; it is being | |||
| published for informational purposes only. </t> | published for informational purposes only. </t> | |||
| <t> This document replaces and obsoletes RFC 8478. </t> | ||||
| <t> This document replaces and obsoletes RFC 8478. </t> | </abstract> | |||
| </abstract> | </front> | |||
| <middle> | ||||
| </front> | <section anchor="intro" numbered="true" toc="default"> | |||
| <name>Introduction</name> | ||||
| <middle> | <t> Zstandard, or "zstd" (pronounced "zee standard"), is a | |||
| <section anchor="intro" title="Introduction"> | ||||
| <t> Zstandard, or "zstd" (pronounced "zee standard"), is a | ||||
| data compression mechanism, akin to gzip | data compression mechanism, akin to gzip | |||
| <xref target="RFC1952"/>. </t> | <xref target="RFC1952" format="default"/>. </t> | |||
| <t> Despite use of the word "standard" as part of its name, | ||||
| <t> Despite use of the word "standard" as part of its name, | ||||
| readers are advised that this document is not an | readers are advised that this document is not an | |||
| Internet Standards Track specification; it is being | Internet Standards Track specification; it is being | |||
| published for informational purposes only. </t> | published for informational purposes only. </t> | |||
| <t>This document describes the Zstandard format. Also, to enable the | ||||
| <t> This document describes the Zstandard format. Also, | transport of a data object compressed with Zstandard, this document | |||
| to enable the transport of a data object compressed with | registers a media type, content encoding, and structured syntax suffix | |||
| Zstandard, this document registers a media type that can be | that can be used to identify such content when it is used in a | |||
| used to identify such content when it is used in a payload | payload.</t> | |||
| encoded using Multipurpose Internet Mail Extensions | </section> | |||
| (MIME). </t> | <section anchor="definitions" numbered="true" toc="default"> | |||
| </section> | <name>Definitions</name> | |||
| <t>Some terms used elsewhere in this document are defined | ||||
| <section anchor="definitions" title="Definitions"> | here for clarity. </t> | |||
| <t> Some terms used elsewhere in this document are defined | <dl newline="false" spacing="normal"> | |||
| here for clarity. </t> | <dt>uncompressed:</dt> | |||
| <dd>Describes an arbitrary | ||||
| <t> <list style="hanging"> | ||||
| <t hangText="uncompressed:"> Describes an arbitrary | ||||
| set of bytes in their original form, prior to being | set of bytes in their original form, prior to being | |||
| subjected to compression. </t> | subjected to compression. </dd> | |||
| <t hangText="compress, compression:"> The act of processing | ||||
| a set of bytes via the compression mechanism | ||||
| described here. </t> | ||||
| <t hangText="compressed:"> Describes the result of passing | <dt>compressed:</dt> | |||
| <dd>Describes the result of passing | ||||
| a set of bytes through this mechanism. The original | a set of bytes through this mechanism. The original | |||
| input has thus been compressed. </t> | input has thus been compressed. </dd> | |||
| <dt>decompressed:</dt> | ||||
| <t hangText="decompress, decompression:"> The act of | <dd>Describes the result of passing | |||
| processing a set of bytes through the inverse | ||||
| of the compression mechanism described here, in an | ||||
| attempt to recover the original set of bytes prior | ||||
| to compression. </t> | ||||
| <t hangText="decompressed:"> Describes the result of passing | ||||
| a set of bytes through the reverse of this mechanism. | a set of bytes through the reverse of this mechanism. | |||
| When this is successful, the decompressed payload and | When this is successful, the decompressed payload and | |||
| the uncompressed payload are indistinguishable. </t> | the uncompressed payload are indistinguishable. </dd> | |||
| <dt>encode:</dt> | ||||
| <t hangText="encode:"> The process of translating data | <dd>The process of translating data | |||
| from one form to another; this may include compression | from one form to another; this may include compression, | |||
| or it may refer to other translations done as part | or it may refer to other translations done as part | |||
| of this specification. </t> | of this specification. </dd> | |||
| <dt>decode:</dt> | ||||
| <t hangText="decode:"> The reverse of "encode"; describes | <dd>The reverse of "encode"; describes | |||
| a process of reversing a prior encoding to recover | a process of reversing a prior encoding to recover | |||
| the original content. </t> | the original content. </dd> | |||
| <dt>frame:</dt> | ||||
| <t hangText="frame:"> Content compressed by Zstandard is | <dd>Content compressed by Zstandard is | |||
| transformed into a Zstandard frame. Multiple frames | transformed into a Zstandard frame. Multiple frames | |||
| can be appended into a single file or stream. A frame | can be appended into a single file or stream. A frame | |||
| is completely independent, has a defined beginning | is completely independent, has a defined beginning | |||
| and end, and has a set of parameters that tells the | and end, and has a set of parameters that tells the | |||
| decoder how to decompress it. </t> | decoder how to decompress it. </dd> | |||
| <dt>block:</dt> | ||||
| <t hangText="block:"> A frame encapsulates one or multiple | <dd>A frame encapsulates one or multiple | |||
| blocks. Each block contains arbitrary content, which | blocks. Each block contains arbitrary content, which | |||
| is described by its header, and has a guaranteed | is described by its header, and has a guaranteed | |||
| maximum content size that depends upon frame | maximum content size that depends upon frame | |||
| parameters. Unlike frames, each block depends | parameters. Unlike frames, each block depends | |||
| on previous blocks for proper decoding. However, each | on previous blocks for proper decoding. However, each | |||
| block can be decompressed without waiting for its | block can be decompressed without waiting for its | |||
| successor, allowing streaming operations. </t> | successor, allowing streaming operations. </dd> | |||
| <dt>natural order:</dt> | ||||
| <t hangText="natural order:"> A sequence or ordering of | <dd>A sequence or ordering of | |||
| objects or values that is typical of that type of | objects or values that is typical of that type of | |||
| object or value. A set of unique integers, for | object or value. A set of unique integers, for | |||
| example, is in "natural order" if when progressing | example, is in "natural order" if, when progressing | |||
| from one element in the set or sequence to the next, | from one element in the set or sequence to the next, | |||
| there is never a decrease in value. </t> | there is never a decrease in value. </dd> | |||
| </list> </t> | </dl> | |||
| <t>The naming convention for identifiers within the | ||||
| <t> The naming convention for identifiers within the | ||||
| specification is Mixed_Case_With_Underscores. | specification is Mixed_Case_With_Underscores. | |||
| Identifiers inside square brackets indicate that the | Identifiers inside square brackets indicate that the | |||
| identifier is optional in the presented context. </t> | identifier is optional in the presented context. </t> | |||
| </section> | </section> | |||
| <section anchor="compression" numbered="true" toc="default"> | ||||
| <section anchor="compression" title="Compression Algorithm"> | <name>Compression Algorithm</name> | |||
| <t> This section describes the Zstandard algorithm. </t> | <t> This section describes the Zstandard algorithm. </t> | |||
| <t> The purpose of this document is to define a lossless | <t> The purpose of this document is to define a lossless | |||
| compressed data format that is a) independent of the CPU | compressed data format that is a) independent of the CPU | |||
| type, operating system, file system, and character set and | type, operating system, file system, and character set and | |||
| b) is suitable for file compression and pipe and streaming | b) suitable for file compression and pipe and streaming | |||
| compression, using the Zstandard algorithm. The text of | compression, using the Zstandard algorithm. The text of | |||
| the specification assumes a basic background in | the specification assumes a basic background in | |||
| programming at the level of bits and other primitive data | programming at the level of bits and other primitive data | |||
| representations. </t> | representations. </t> | |||
| <t> The data can be produced or consumed, even for an | ||||
| <t> The data can be produced or consumed, even for an | ||||
| arbitrarily long sequentially presented input data stream, | arbitrarily long sequentially presented input data stream, | |||
| using only an a priori bounded amount of intermediate | using only an a priori bounded amount of intermediate | |||
| storage, and hence can be used in data communications. | storage; hence, it can be used in data communications. | |||
| The format uses the Zstandard compression method, and | The format uses the Zstandard compression method, and | |||
| an optional xxHash-64 checksum method | an optional xxHash-64 checksum method | |||
| <xref target="XXHASH"/>, for detection of data | <xref target="XXHASH" format="default"/>, for detection of da ta | |||
| corruption. </t> | corruption. </t> | |||
| <t> The data format defined by this specification does not | ||||
| <t> The data format defined by this specification does not | ||||
| attempt to allow random access to compressed data. </t> | attempt to allow random access to compressed data. </t> | |||
| <t> Unless otherwise indicated below, a compliant compressor | ||||
| <t> Unless otherwise indicated below, a compliant compressor | ||||
| must produce data sets that conform to the specifications | must produce data sets that conform to the specifications | |||
| presented here. However, it does not need to support all | presented here. However, it does not need to support all | |||
| options. </t> | options. </t> | |||
| <t> A compliant decompressor must be able to decompress at | ||||
| <t> A compliant decompressor must be able to decompress at | ||||
| least one working set of parameters that conforms to the | least one working set of parameters that conforms to the | |||
| specifications presented here. It may also ignore | specifications presented here. It may also ignore | |||
| informative fields, such as the checksum. Whenever it does | informative fields, such as the checksum. Whenever it does | |||
| not support a parameter defined in the compressed stream, | not support a parameter defined in the compressed stream, | |||
| it must produce a non-ambiguous error code and associated | it must produce an unambiguous error code and associated | |||
| error message explaining which parameter is | error message explaining which parameter is | |||
| unsupported. </t> | unsupported. </t> | |||
| <t> This specification is intended for use by implementers | ||||
| <t> This specification is intended for use by implementers | ||||
| of software to compress data into Zstandard format and/or | of software to compress data into Zstandard format and/or | |||
| decompress data from Zstandard format. The Zstandard | decompress data from Zstandard format. The Zstandard | |||
| format is supported by an open source reference | format is supported by an open-source reference | |||
| implementation, written in portable C, and available at | implementation, written in portable C, and available at | |||
| <xref target="ZSTD"/>. </t> | <xref target="ZSTD" format="default"/>. </t> | |||
| <section anchor="comp_frames" numbered="true" toc="default"> | ||||
| <section anchor="comp_frames" title="Frames"> | <name>Frames</name> | |||
| <t> Zstandard compressed data is made up of one | <t> Zstandard compressed data is made up of one | |||
| or more frames. Each frame is independent and can | or more frames. Each frame is independent and can | |||
| be decompressed independently of other frames. The | be decompressed independently of other frames. The | |||
| decompressed content of multiple concatenated | decompressed content of multiple concatenated | |||
| frames is the concatenation of each frame's | frames is the concatenation of each frame's | |||
| decompressed content. </t> | decompressed content. </t> | |||
| <t> There are two frame formats defined for | ||||
| <t> There are two frame formats defined for | ||||
| Zstandard: Zstandard frames and skippable frames. | Zstandard: Zstandard frames and skippable frames. | |||
| Zstandard frames contain compressed data, while | Zstandard frames contain compressed data, while | |||
| skippable frames contain custom user metadata. </t> | skippable frames contain custom user metadata. </t> | |||
| <section anchor="comp_zstd_frames" numbered="true" toc="default"> | ||||
| <section anchor="comp_zstd_frames" | <name>Zstandard Frames</name> | |||
| title="Zstandard Frames"> | <t> The structure of a single Zstandard frame | |||
| <t> The structure of a single Zstandard frame | ||||
| is as follows: | is as follows: | |||
| <figure><artwork> | </t> | |||
| +--------------------+------------+ | ||||
| | Magic_Number | 4 bytes | | ||||
| +--------------------+------------+ | ||||
| | Frame_Header | 2-14 bytes | | ||||
| +--------------------+------------+ | ||||
| | Data_Block | n bytes | | ||||
| +--------------------+------------+ | ||||
| | [More Data_Blocks] | | | ||||
| +--------------------+------------+ | ||||
| | [Content_Checksum] | 4 bytes | | ||||
| +--------------------+------------+ | ||||
| </artwork></figure> </t> | ||||
| <t> <list style="hanging"> | ||||
| <t hangText="Magic_Number:"> | ||||
| 4 bytes, little-endian | ||||
| format. Value: 0xFD2FB528. | ||||
| </t> | ||||
| <t hangText="Frame_Header:"> | ||||
| 2 to 14 bytes, detailed in | ||||
| <xref target="comp_frame_hdr"/>. | ||||
| </t> | ||||
| <t hangText="Data_Block:"> | ||||
| Detailed in | ||||
| <xref target="blocks"/>. | ||||
| This is where | ||||
| data appears. | ||||
| </t> | ||||
| <t hangText="Content_Checksum:"> | ||||
| An optional 32-bit checksum, | ||||
| only present if | ||||
| Content_Checksum_Flag is set. | ||||
| The content checksum is the | ||||
| result of the XXH64() hash | ||||
| function | ||||
| <xref target="XXHASH"/> | ||||
| digesting the original | ||||
| (decoded) data as input, and a | ||||
| seed of zero. The low 4 | ||||
| bytes of the checksum are | ||||
| stored in little-endian format. | ||||
| </t> | ||||
| </list> </t> | ||||
| <t> The magic number was selected to be less | ||||
| probable to find at the beginning of an | ||||
| arbitrary file. It avoids trivial | ||||
| patterns (0x00, 0xFF, repeated bytes, | ||||
| increasing bytes, etc.), contains byte | ||||
| values outside of ASCII range, and doesn't | ||||
| map into UTF-8 space, all of which reduce | ||||
| the likelihood of its appearance at the | ||||
| top of a text file. </t> | ||||
| <section anchor="comp_frame_hdr" | ||||
| title="Frame Header"> | ||||
| <t> The frame header has a variable | ||||
| size, with a minimum of 2 bytes | ||||
| and up to 14 bytes depending on | ||||
| optional parameters. The structure | ||||
| of Frame_Header is as follows: | ||||
| <figure><artwork> | <table anchor="single-frame"> | |||
| +-------------------------+-----------+ | <name>The Structure of a Single Zstandard Frame</name> | |||
| | Frame_Header_Descriptor | 1 byte | | <tbody> | |||
| +-------------------------+-----------+ | <tr> | |||
| | [Window_Descriptor] | 0-1 byte | | <td>Magic_Number</td> | |||
| +-------------------------+-----------+ | <td>4 bytes</td> | |||
| | [Dictionary_ID] | 0-4 bytes | | </tr> | |||
| +-------------------------+-----------+ | <tr> | |||
| | [Frame_Content_Size] | 0-8 bytes | | <td>Frame_Header</td> | |||
| +-------------------------+-----------+ | <td>2-14 bytes</td> | |||
| </artwork></figure> </t> | </tr> | |||
| <tr> | ||||
| <td>Data_Block</td> | ||||
| <td>n bytes</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>[More Data_Blocks]</td> | ||||
| <td></td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>[Content_Checksum]</td> | ||||
| <td>4 bytes</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Magic_Number:</dt> | ||||
| <dd> | ||||
| 4 bytes, little-endian format. Value: 0xFD2FB528. | ||||
| </dd> | ||||
| <dt>Frame_Header:</dt> | ||||
| <dd> | ||||
| 2 to 14 bytes, detailed in | ||||
| <xref target="comp_frame_hdr" format="default"/>. | ||||
| </dd> | ||||
| <dt>Data_Block:</dt> | ||||
| <dd> | ||||
| Detailed in <xref target="blocks" format="default"/>. | ||||
| This is where data appears. | ||||
| </dd> | ||||
| <dt>Content_Checksum:</dt> | ||||
| <dd> | ||||
| An optional 32-bit checksum, | ||||
| only present if | ||||
| Content_Checksum_Flag is set. | ||||
| The content checksum is the | ||||
| result of the XXH64() hash | ||||
| function | ||||
| <xref target="XXHASH" format="default"/> | ||||
| digesting the original | ||||
| (decoded) data as input, and a | ||||
| seed of zero. The low 4 | ||||
| bytes of the checksum are | ||||
| stored in little-endian format. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> The magic number was selected to be less | ||||
| probable to find at the beginning of an | ||||
| arbitrary file. It avoids trivial | ||||
| patterns (0x00, 0xFF, repeated bytes, | ||||
| increasing bytes, etc.), contains byte | ||||
| values outside of the ASCII range, and doesn't | ||||
| map into UTF-8 space, all of which reduce | ||||
| the likelihood of its appearance at the | ||||
| top of a text file. </t> | ||||
| <section anchor="comp_frame_hdr" numbered="true" toc="default"> | ||||
| <name>Frame Header</name> | ||||
| <t> The frame header has a variable | ||||
| size, with a minimum of 2 bytes | ||||
| up to a maximum of 14 bytes depending on | ||||
| optional parameters. The structure | ||||
| of Frame_Header is as follows:</t> | ||||
| <section anchor="comp_frame_header_desc" | <table anchor="frame-header"> | |||
| title="Frame_Header_Descriptor"> | <name>The Structure of Frame_Header</name> | |||
| <t> The first header's byte is | <tbody> | |||
| called the | <tr> | |||
| Frame_Header_Descriptor. | <td>Frame_Header_Descriptor </td> | |||
| It describes | <td>1 byte</td> | |||
| which other fields are | </tr> | |||
| present. Decoding this | <tr> | |||
| byte is enough to tell the | <td>[Window_Descriptor]</td> | |||
| size of Frame_Header. | <td>0-1 byte</td> | |||
| </tr> | ||||
| <tr> | ||||
| <td>[Dictionary_ID]</td> | ||||
| <td>0-4 bytes</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>[Frame_Content_Size]</td> | ||||
| <td>0-8 bytes</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <figure><artwork> | <section anchor="comp_frame_header_desc" numbered="true" toc="defaul | |||
| +------------+-------------------------+ | t"> | |||
| | Bit Number | Field Name | | <name>Frame_Header_Descriptor</name> | |||
| +------------+-------------------------+ | <t> The first header's byte is called the | |||
| | 7-6 | Frame_Content_Size_Flag | | Frame_Header_Descriptor. It describes | |||
| +------------+-------------------------+ | which other fields are present. Decoding this | |||
| | 5 | Single_Segment_Flag | | byte is enough to tell the size of Frame_Header. | |||
| +------------+-------------------------+ | </t> | |||
| | 4 | (unused) | | ||||
| +------------+-------------------------+ | ||||
| | 3 | (reserved) | | ||||
| +------------+-------------------------+ | ||||
| | 2 | Content_Checksum_Flag | | ||||
| +------------+-------------------------+ | ||||
| | 1-0 | Dictionary_ID_Flag | | ||||
| +------------+-------------------------+ | ||||
| </artwork></figure> </t> | ||||
| <t> In this table, bit 7 is | <table anchor="Frame-Header-Descriptor"> | |||
| the highest bit, while bit | <name>The Frame_Header_Descriptor</name> | |||
| 0 is the lowest one. </t> | <thead> | |||
| <tr> | ||||
| <th>Bit Number</th> | ||||
| <th>Field Name</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td>7-6</td> | ||||
| <td>Frame_Content_Size_Flag </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>5</td> | ||||
| <td>Single_Segment_Flag </td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>4</td> | ||||
| <td>(unused)</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>3</td> | ||||
| <td>(reserved)</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>2</td> | ||||
| <td>Content_Checksum_Flag</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>1-0</td> | ||||
| <td>Dictionary_ID_Flag</td> | ||||
| </tr> | ||||
| <section title="Frame_Content_Siz | </tbody> | |||
| e_Flag"> | </table> | |||
| <t> | <t> In <xref target="Frame-Header-Descriptor" />, bit 7 is | |||
| This is a 2-bit flag | the highest bit, while bit | |||
| (equivalent to | 0 is the lowest one. </t> | |||
| Frame_Header_Descriptor | <section numbered="true" toc="default"> | |||
| right-shifted 6 bits) | <name>Frame_Content_Size_Flag</name> | |||
| specifying whether | <t> | |||
| Frame_Content_Size | This is a 2-bit flag (equivalent to Frame_Header_Descriptor | |||
| (the decompressed | right-shifted 6 bits) specifying whether Frame_Content_Size | |||
| data size) is provided | (the decompressed data size) is provided within the header. | |||
| within the header. | Frame_Content_Size_Flag provides FCS_Field_Size, which is the | |||
| Frame_Content_Size_Flag | number of bytes used by Frame_Content_Size according to | |||
| provides FCS_Field_Size, | <xref target="frame-content-size-flag"/>: | |||
| which is the | </t> | |||
| number of bytes | ||||
| used by | ||||
| Frame_Content_Size | ||||
| according to | ||||
| the following table: | ||||
| <figure><artwork> | ||||
| +-------------------------+--------+---+---+---+ | ||||
| | Frame_Content_Size_Flag | 0 | 1 | 2 | 3 | | ||||
| +-------------------------+--------+---+---+---+ | ||||
| | FCS_Field_Size | 0 or 1 | 2 | 4 | 8 | | ||||
| +-------------------------+--------+---+---+---+ | ||||
| </artwork></figu | ||||
| re></t> | ||||
| <t> | <table anchor="frame-content-size-flag"> | |||
| When | <name>Frame_Content_Size_Flag Provides FCS_Field_Size</name> | |||
| Frame_Content_Size_Flag | <tbody> | |||
| is 0, FCS_Field_Size | <tr> | |||
| depends on | <td>Frame_Content_Size_Flag</td> | |||
| Single_Segment_Flag: | <td align="center">0</td> | |||
| If Single_Segment_Flag | <td>1</td> | |||
| is set, FCS_Field_Size | <td>2</td> | |||
| is 1. Otherwise, | <td>3</td> | |||
| FCS_Field_Size is 0; | </tr> | |||
| Frame_Content_Size | <tr> | |||
| is not provided. | <td>FCS_Field_Size</td> | |||
| </t> | <td align="center">0 or 1</td> | |||
| </section> | <td>2</td> | |||
| <td>4</td> | ||||
| <td>8</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <section title="Single_Segment_Fl | <t> | |||
| ag"> | When Frame_Content_Size_Flag | |||
| <t> | is 0, FCS_Field_Size | |||
| depends on | ||||
| Single_Segment_Flag: | ||||
| if Single_Segment_Flag | ||||
| is set, FCS_Field_Size | ||||
| is 1. Otherwise, | ||||
| FCS_Field_Size is 0; | ||||
| Frame_Content_Size | ||||
| is not provided. | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Single_Segment_Flag</name> | ||||
| <t> | ||||
| If this flag is | If this flag is | |||
| set, data must | set, data must | |||
| be regenerated | be regenerated | |||
| within a single | within a single | |||
| continuous | continuous | |||
| memory segment. | memory segment. | |||
| </t> | </t> | |||
| <t> | ||||
| <t> | ||||
| In this case, | In this case, | |||
| Window_Descriptor | Window_Descriptor | |||
| byte is skipped, but | byte is skipped, but | |||
| Frame_Content_Size | Frame_Content_Size | |||
| is necessarily | is necessarily | |||
| present. As a | present. As a | |||
| consequence, | consequence, | |||
| the decoder | the decoder | |||
| must allocate a | must allocate a | |||
| memory segment | memory segment | |||
| of size equal | of a size equal to | |||
| or larger than | or larger than | |||
| Frame_Content_Size. | Frame_Content_Size. | |||
| </t> | </t> | |||
| <t> | ||||
| <t> | ||||
| In order to protect the | In order to protect the | |||
| decoder from | decoder from | |||
| unreasonable memory | unreasonable memory | |||
| requirements, | requirements, | |||
| a decoder is | a decoder is | |||
| allowed to reject a | allowed to reject a | |||
| compressed frame that | compressed frame that | |||
| requests a memory size | requests a memory size | |||
| beyond the decoder's | beyond the decoder's | |||
| authorized range. | authorized range. | |||
| </t> | </t> | |||
| <t> | ||||
| <t> | ||||
| For broader | For broader | |||
| compatibility, | compatibility, | |||
| decoders are | decoders are | |||
| recommended to | recommended to | |||
| support memory | support memory | |||
| sizes of at least 8 MB. | sizes of at least 8 MB. | |||
| This is only a | This is only a | |||
| recommendation; | recommendation; | |||
| each decoder is | each decoder is | |||
| free to support | free to support | |||
| higher or lower | higher or lower | |||
| limits, depending on | limits, depending on | |||
| local limitations. | local limitations. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Unused Bit"> | <name>Unused Bit</name> | |||
| <t> | <t> | |||
| A decoder compliant | A decoder compliant | |||
| with this specification | with this specification | |||
| version shall not | version shall not | |||
| interpret this bit. | interpret this bit. | |||
| It might | It might | |||
| be used in a future | be used in a future | |||
| version, to signal a | version to signal a | |||
| property that is not | property that is not | |||
| mandatory to properly | mandatory to properly | |||
| decode the frame. | decode the frame. | |||
| An encoder compliant | An encoder compliant | |||
| with this specification | with this specification | |||
| must set this bit to | must set this bit to | |||
| zero. | zero. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Reserved Bit"> | <name>Reserved Bit</name> | |||
| <t> | <t> | |||
| This bit is reserved | This bit is reserved | |||
| for some future | for some future | |||
| feature. Its value must | feature. Its value must | |||
| be zero. A decoder | be zero. A decoder | |||
| compliant with this | compliant with this | |||
| specification version | specification version | |||
| must ensure it is not | must ensure it is not | |||
| set. This bit may be | set. This bit may be | |||
| used in a future | used in a future | |||
| revision, to signal a | revision to signal a | |||
| feature that must be | feature that must be | |||
| interpreted to decode | interpreted to decode | |||
| the frame correctly. | the frame correctly. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Content_Checksum_ | <name>Content_Checksum_Flag</name> | |||
| Flag"> | <t> | |||
| <t> | ||||
| If this flag is set, a | If this flag is set, a | |||
| 32-bit | 32-bit | |||
| Content_Checksum will | Content_Checksum will | |||
| be present at the | be present at the | |||
| frame's end. | frame's end. | |||
| See the description of | See the description of | |||
| Content_Checksum above. | Content_Checksum above. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Dictionary_ID_Fla | <name>Dictionary_ID_Flag</name> | |||
| g"> | <t> | |||
| <t> | This is a 2-bit flag | |||
| This is a 2-bit flag | (= Frame_Header_Descriptor & 0x3) indicating | |||
| (= Frame_Header_Descripto | whether a dictionary ID | |||
| r & 0x3) indicating | is provided within the | |||
| whether a dictionary ID | header. It also | |||
| is provided within the | specifies the size of | |||
| header. It also | this field as | |||
| specifies the size of | DID_Field_Size: | |||
| this field as | </t> | |||
| DID_Field_Size: | ||||
| <figure><artwork> | ||||
| +--------------------+---+---+---+---+ | ||||
| | Dictionary_ID_Flag | 0 | 1 | 2 | 3 | | ||||
| +--------------------+---+---+---+---+ | ||||
| | DID_Field_Size | 0 | 1 | 2 | 4 | | ||||
| +--------------------+---+---+---+---+ | ||||
| </artwork></figu | ||||
| re></t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="comp_window_descr" | ||||
| title="Window Descriptor"> | ||||
| <t> | ||||
| This provides guarantees about | ||||
| the minimum | ||||
| memory buffer required to | ||||
| decompress a frame. This | ||||
| information is important for | ||||
| decoders to allocate enough | ||||
| memory. | ||||
| </t> | ||||
| <t> | ||||
| The Window_Descriptor byte is | ||||
| optional. When | ||||
| Single_Segment_Flag is set, | ||||
| Window_Descriptor is not | ||||
| present. In this case, | ||||
| Window_Size is | ||||
| Frame_Content_Size, which can | ||||
| be any value from 0 to | ||||
| 2^64-1 bytes (16 ExaBytes). | ||||
| <figure><artwork> | <table anchor="Dictionary-ID-Flag"> | |||
| +------------+----------+----------+ | <name>Dictionary_ID_Flag</name> | |||
| | Bit Number | 7-3 | 2-0 | | <tbody> | |||
| +------------+----------+----------+ | <tr> | |||
| | Field Name | Exponent | Mantissa | | <td>Dictionary_ID_Flag</td> | |||
| +------------+----------+----------+ | <td>0</td> | |||
| </artwork></figure></t> | <td>1</td> | |||
| <td>2</td> | ||||
| <td>3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>DID_Field_Size</td> | ||||
| <td>0</td> | ||||
| <td>1</td> | ||||
| <td>2</td> | ||||
| <td>4</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="comp_window_descr" numbered="true" toc="default"> | ||||
| <name>Window Descriptor</name> | ||||
| <t> | ||||
| This provides guarantees about | ||||
| the minimum | ||||
| memory buffer required to | ||||
| decompress a frame. This | ||||
| information is important for | ||||
| decoders to allocate enough | ||||
| memory. | ||||
| </t> | ||||
| <t> | ||||
| The Window_Descriptor byte is | ||||
| optional. When | ||||
| Single_Segment_Flag is set, | ||||
| Window_Descriptor is not | ||||
| present. In this case, | ||||
| Window_Size is | ||||
| Frame_Content_Size, which can | ||||
| be any value from 0 to | ||||
| 2<sup>64</sup> - 1 bytes (16 ExaBytes). | ||||
| </t> | ||||
| <t> | <table anchor="window-descriptor"> | |||
| The minimum memory buffer size | <name>Window_Descriptor</name> | |||
| is called Window_Size. It is | <tbody> | |||
| described by the following | <tr> | |||
| formulae: | <td>Bit Number</td> | |||
| <td align="center">7-3</td> | ||||
| <td align="center">2-0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td>Field Name</td> | ||||
| <td>Exponent</td> | ||||
| <td>Mantissa</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <figure><artwork> | <t> | |||
| The minimum memory buffer size | ||||
| is called Window_Size. It is | ||||
| described by the following | ||||
| formulas: | ||||
| </t> | ||||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| windowLog = 10 + Exponent; | windowLog = 10 + Exponent; | |||
| windowBase = 1 << windowLog; | windowBase = 1 << windowLog; | |||
| windowAdd = (windowBase / 8) * Mantissa; | windowAdd = (windowBase / 8) * Mantissa; | |||
| Window_Size = windowBase + windowAdd; | Window_Size = windowBase + windowAdd; | |||
| </artwork></figure></t> | ]]></artwork> | |||
| <t> | ||||
| <t> | ||||
| The minimum Window_Size is | The minimum Window_Size is | |||
| 1 KB. The maximum Window_Size | 1 KB. The maximum Window_Size | |||
| is (1<<41) + | is (1<<41) + | |||
| 7*(1<<38) bytes, | 7*(1<<38) bytes, | |||
| which is 3.75 TB. | which is 3.75 TB. | |||
| </t> | </t> | |||
| <t> In general, larger | ||||
| <t> In general, larger | ||||
| Window_Size values tend to | Window_Size values tend to | |||
| improve the compression ratio, bu t | improve the compression ratio, bu t | |||
| at the cost of increased memory | at the cost of increased memory | |||
| usage. </t> | usage. </t> | |||
| <t> | ||||
| <t> | ||||
| To properly decode compressed | To properly decode compressed | |||
| data, a decoder will need to | data, a decoder will need to | |||
| allocate a buffer of at least | allocate a buffer of at least | |||
| Window_Size bytes. | Window_Size bytes. | |||
| </t> | </t> | |||
| <t> | ||||
| <t> | ||||
| In order to protect | In order to protect | |||
| decoders from unreasonable | decoders from unreasonable | |||
| memory requirements, a decoder | memory requirements, a decoder | |||
| is allowed to reject a | is allowed to reject a | |||
| compressed frame that | compressed frame that | |||
| requests a memory size beyond | requests a memory size beyond the | |||
| decoder's authorized range. | decoder's authorized range. | |||
| </t> | </t> | |||
| <t> | ||||
| <t> | ||||
| For improved interoperability, | For improved interoperability, | |||
| it's recommended for decoders | it's recommended for decoders | |||
| to support values of | to support values of | |||
| Window_Size up to 8 MB and | Window_Size up to 8 MB and | |||
| for encoders not to generate | for encoders not to generate | |||
| frames requiring a Window_Size | frames requiring a Window_Size | |||
| larger than 8 MB. | larger than 8 MB. | |||
| It's merely a recommendation | It's merely a recommendation | |||
| though, and decoders are free | though, and decoders are free | |||
| to support larger or lower | to support higher or lower | |||
| limits, depending on local | limits, depending on local | |||
| limitations. | limitations. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section anchor="comp_dictionary_id" numbered="true" toc="default"> | ||||
| <name>Dictionary_ID</name> | ||||
| <section anchor="comp_dictionary_id" | <t> | |||
| title="Dictionary_ID"> | This is a field of variable size, | |||
| <t> | ||||
| This is a variable size field, | ||||
| which contains the ID of the | which contains the ID of the | |||
| dictionary required to properly | dictionary required to properly | |||
| decode the frame. This field | decode the frame. This field | |||
| is optional. When it's not | is optional. When it's not | |||
| present, it's up to the decoder | present, it's up to the decoder | |||
| to know which dictionary | to know which dictionary | |||
| to use. </t> | to use. </t> | |||
| <t> | ||||
| <t> | ||||
| Dictionary_ID field size is | Dictionary_ID field size is | |||
| provided by DID_Field_Size. | provided by DID_Field_Size. | |||
| DID_Field_Size is directly | DID_Field_Size is directly | |||
| derived from the value | derived from the value | |||
| of Dictionary_ID_Flag. One byte | of Dictionary_ID_Flag. One byte | |||
| can represent an ID 0-255; 2 | can represent an ID 0-255; 2 | |||
| bytes can represent an ID | bytes can represent an ID | |||
| 0-65535; 4 bytes can | 0-65535; 4 bytes can | |||
| represent an ID 0-4294967295. | represent an ID 0-4294967295. | |||
| Format is little-endian. | Format is little-endian. | |||
| </t> | </t> | |||
| <t> | ||||
| <t> | ||||
| It is permitted to represent a | It is permitted to represent a | |||
| small ID (for example, 13) with | small ID (for example, 13) with | |||
| a large 4-byte dictionary | a large 4-byte dictionary | |||
| ID, even if it is less | ID, even if it is less | |||
| efficient. | efficient. | |||
| </t> | </t> | |||
| <t> | ||||
| <t> | Within private environments, | |||
| Within private environments, | any dictionary ID | |||
| any dictionary ID | can be used. However, for | |||
| can be used. However, for | frames and dictionaries | |||
| frames and dictionaries | distributed in public space, | |||
| distributed in public space, | Dictionary_ID must be | |||
| Dictionary_ID must be | attributed carefully. | |||
| attributed carefully. | The following | |||
| The following | ranges are reserved for use | |||
| ranges are reserved for use | only with dictionaries that | |||
| only with dictionaries that | have been registered with | |||
| have been registered with | IANA (see <xref target="iana_dict" format="default"/>): | |||
| IANA (see <xref target="iana_dict | </t> | |||
| "/>): | <dl newline="false" spacing="normal"> | |||
| <dt>low range:</dt> | ||||
| <list style="hanging"> | <dd> | |||
| <?rfc subcompact="yes"?> | ||||
| <t hangText="low range:"> | ||||
| <= 32767 | <= 32767 | |||
| </t> | </dd> | |||
| <t hangText="high range:" | <dt>high range:</dt> | |||
| > | <dd> | |||
| >= (1 << 31 | >= (1 << | |||
| ) | 31) | |||
| </t> | </dd> | |||
| <?rfc subcompact="no"?> | </dl> | |||
| </list> </t> | <t> Any other value for | |||
| <t> Any other value for | ||||
| Dictionary_ID can be used | Dictionary_ID can be used | |||
| by private arrangement | by private arrangement | |||
| between participants. </t> | between participants. </t> | |||
| <t> Any payload presented for | ||||
| <t> Any payload presented for | ||||
| decompression that | decompression that | |||
| references an unregistered | references an unregistered | |||
| reserved dictionary ID | reserved dictionary ID | |||
| results in an error. </t> | results in an error. </t> | |||
| </section> | </section> | |||
| <section anchor="comp_frame_content_size" numbered="true" toc="defau | ||||
| <section anchor="comp_frame_content_size" | lt"> | |||
| title="Frame Content Size"> | ||||
| <t> | ||||
| This is the original | ||||
| (uncompressed) size. This | ||||
| information is optional. | ||||
| Frame_Content_Size uses a | ||||
| variable number of bytes, | ||||
| provided by FCS_Field_Size. | ||||
| FCS_Field_Size is provided by | ||||
| the value of | ||||
| Frame_Content_Size_Flag. | ||||
| FCS_Field_Size can be equal to | ||||
| 0 (not present), 1, 2, 4, or | ||||
| 8 bytes. | ||||
| <figure><artwork> | ||||
| +----------------+--------------+ | ||||
| | FCS Field Size | Range | | ||||
| +----------------+--------------+ | ||||
| | 0 | unknown | | ||||
| +----------------+--------------+ | ||||
| | 1 | 0 - 255 | | ||||
| +----------------+--------------+ | ||||
| | 2 | 256 - 65791 | | ||||
| +----------------+--------------+ | ||||
| | 4 | 0 - 2^32 - 1 | | ||||
| +----------------+--------------+ | ||||
| | 8 | 0 - 2^64 - 1 | | ||||
| +----------------+--------------+ | ||||
| </artwork></figure></t> | ||||
| <t> | ||||
| Frame_Content_Size format is | ||||
| little-endian. When | ||||
| FCS_Field_Size is 1, 4, or 8 | ||||
| bytes, the value is read | ||||
| directly. When FCS_Field_Size | ||||
| is 2, the offset of 256 is | ||||
| added. It's allowed to | ||||
| represent a small size (for | ||||
| example 18) using any | ||||
| compatible variant. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="blocks" title="Blocks"> | ||||
| <t> After Magic_Number and | ||||
| Frame_Header, there are some number of | ||||
| blocks. Each frame must have at least | ||||
| 1 block, but there is no upper | ||||
| limit on the number of blocks per | ||||
| frame. </t> | ||||
| <t> The structure of a block is as | ||||
| follows: | ||||
| <figure><artwork> | ||||
| +--------------+---------------+ | ||||
| | Block_Header | Block_Content | | ||||
| +--------------+---------------+ | ||||
| | 3 bytes | n bytes | | ||||
| +--------------+---------------+ | ||||
| </artwork></figure></t> | ||||
| <t> Block_Header uses 3 bytes, | ||||
| written using little-endian | ||||
| convention. It contains three fields: | ||||
| <figure><artwork> | ||||
| +------------+------------+------------+ | ||||
| | Last_Block | Block_Type | Block_Size | | ||||
| +------------+------------+------------+ | ||||
| | bit 0 | bits 1-2 | bits 3-23 | | ||||
| +------------+------------+------------+ | ||||
| </artwork></figure></t> | ||||
| <section title="Last_Block"> | ||||
| <t> The lowest bit (Last_Block) | ||||
| signals whether | ||||
| this block is the last one. | ||||
| The frame will end after this | ||||
| last block. It may be followed | ||||
| by an optional Content_Checksum | ||||
| (see | ||||
| <xref target="comp_zstd_frames"/> | ||||
| ). | ||||
| </t> | ||||
| </section> | ||||
| <section title="Block_Type"> | ||||
| <t> The next 2 bits represent | ||||
| the Block_Type. There are four | ||||
| block types: | ||||
| <figure><artwork> | ||||
| +-----------+------------------+ | ||||
| | Value | Block_Type | | ||||
| +-----------+------------------+ | ||||
| | 0 | Raw_Block | | ||||
| +-----------+------------------+ | ||||
| | 1 | RLE_Block | | ||||
| +-----------+------------------+ | ||||
| | 2 | Compressed_Block | | ||||
| +-----------+------------------+ | ||||
| | 3 | Reserved | | ||||
| +-----------+------------------+ | ||||
| </artwork></figure></t> | ||||
| <t> <list style="hanging"> | ||||
| <t hangText="Raw_Block:"> | ||||
| This is an | ||||
| uncompressed block. | ||||
| Block_Content contains | ||||
| Block_Size bytes. </t> | ||||
| <t hangText="RLE_Block:"> | <name>Frame_Content_Size</name> | |||
| This is a single byte, | <t> | |||
| repeated Block_Size | This is the original | |||
| times. Block_Content | (uncompressed) size. This | |||
| consists of a single | information is optional. | |||
| byte. On the | Frame_Content_Size uses a | |||
| decompression side, | variable number of bytes, | |||
| this byte must be | provided by FCS_Field_Size. | |||
| repeated Block_Size | FCS_Field_Size is provided by | |||
| times. </t> | the value of | |||
| Frame_Content_Size_Flag. | ||||
| FCS_Field_Size can be equal to | ||||
| 0 (not present), 1, 2, 4, or | ||||
| 8 bytes. | ||||
| </t> | ||||
| <t hangText="Compressed_B | <table anchor="Frame-Content-Size"> | |||
| lock:"> | <name>Frame_Content_Size</name> | |||
| This is a compressed | <thead> | |||
| block as described in | <tr> | |||
| <xref target="comp_blocks | <th align="center">FCS Field Size</th> | |||
| "/>. | <th>Range</th> | |||
| Block_Size is the | </tr> | |||
| length of | </thead> | |||
| Block_Content, namely | <tbody> | |||
| the compressed data. | <tr> | |||
| The decompressed size | <td align="center">0</td> | |||
| is not known, but its | <td>unknown</td> | |||
| maximum possible value | </tr> | |||
| is guaranteed (see | <tr> | |||
| below). </t> | <td align="center">1</td> | |||
| <td>0 - 255</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td>256 - 65791</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td>0 - 2<sup>32</sup> - 1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">8</td> | ||||
| <td>0 - 2<sup>64</sup> - 1</td> | ||||
| </tr> | ||||
| <t hangText="Reserved:"> | </tbody> | |||
| This is not a block. | </table> | |||
| This value cannot be | ||||
| used with the current | ||||
| specification. If such | ||||
| a value is present, | ||||
| it is considered to be | ||||
| corrupt data, and a | ||||
| compliant decoder must | ||||
| reject it. </t> | ||||
| </list> </t> | ||||
| </section> | ||||
| <section title="Block_Size"> | <t> | |||
| <t> The upper 21 bits of | Frame_Content_Size format is | |||
| Block_Header represent the | little-endian. When | |||
| Block_Size. </t> | FCS_Field_Size is 1, 4, or 8 | |||
| bytes, the value is read | ||||
| directly. When FCS_Field_Size | ||||
| is 2, the offset of 256 is | ||||
| added. It's allowed to | ||||
| represent a small size (for | ||||
| example, 18) using any | ||||
| compatible variant. | ||||
| </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="blocks" numbered="true" toc="default"> | ||||
| <name>Blocks</name> | ||||
| <t> After Magic_Number and | ||||
| Frame_Header, there are some number of | ||||
| blocks. Each frame must have at least | ||||
| 1 block, but there is no upper | ||||
| limit on the number of blocks per | ||||
| frame. </t> | ||||
| <t> The structure of a block is as | ||||
| follows: | ||||
| </t> | ||||
| <t> When Block_Type is | <table anchor="block"> | |||
| Compressed_Block or Raw_Block, | <name>The Structure of a Block</name> | |||
| Block_Size is the size of | <thead> | |||
| Block_Content (hence excluding | <tr> | |||
| Block_Header). </t> | <th align="center">Block_Header</th> | |||
| <th align="center">Block_Content</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">3 bytes</td> | ||||
| <td align="center">n bytes</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Block_Header uses 3 bytes, | ||||
| written using little-endian | ||||
| convention. It contains three fields: | ||||
| </t> | ||||
| <t> When Block_Type is | <table anchor="block-header"> | |||
| RLE_Block, since Block_Content's | <name>Block_Header</name> | |||
| size is always 1, Block_Size | <thead> | |||
| represents the number of times | <tr> | |||
| this byte must be repeated. </t> | <th align="center">Last_Block</th> | |||
| <th align="center">Block_Type</th> | ||||
| <th align="center">Block_Size</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">bit 0</td> | ||||
| <td align="center">bits 1-2</td> | ||||
| <td align="center">bits 3-23</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Block_Size is limited by | <section numbered="true" toc="default"> | |||
| Block_Maximum_Size (see below). | <name>Last_Block</name> | |||
| </t> | <t> The lowest bit (Last_Block) | |||
| </section> | signals whether | |||
| this block is the last one. | ||||
| The frame will end after this | ||||
| last block. It may be followed | ||||
| by an optional Content_Checksum | ||||
| (see | ||||
| <xref target="comp_zstd_frames" format="default"/>). | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Block_Type</name> | ||||
| <t> The next 2 bits represent | ||||
| the Block_Type. There are four | ||||
| block types: | ||||
| </t> | ||||
| <section | <table anchor="block-types"> | |||
| title="Block_Content and Block_Ma | <name>The Four Block Types</name> | |||
| ximum_Size"> | <thead> | |||
| <t> The size of Block_Content | <tr> | |||
| is limited by | <th align="center">Value</th> | |||
| Block_Maximum_Size, which is | <th>Block_Type</th> | |||
| the smallest of: | </tr> | |||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">Raw_Block</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">RLE_Block</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">Compressed_Block</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">Reserved</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <list style="symbols"> | <dl newline="false" spacing="normal"> | |||
| <t> Window_Size </t> | <dt>Raw_Block:</dt> | |||
| <t> 128 KB </t> | <dd> | |||
| </list> </t> | This is an | |||
| uncompressed block. | ||||
| Block_Content contains | ||||
| Block_Size bytes. </dd> | ||||
| <dt>RLE_Block:</dt> | ||||
| <dd> | ||||
| This is a single byte, | ||||
| repeated Block_Size | ||||
| times. Block_Content | ||||
| consists of a single | ||||
| byte. On the | ||||
| decompression side, | ||||
| this byte must be | ||||
| repeated Block_Size | ||||
| times. </dd> | ||||
| <dt>Compressed_Block:</dt> | ||||
| <dd> | ||||
| This is a compressed | ||||
| block as described in | ||||
| <xref target="comp_blocks" format="default"/>. | ||||
| Block_Size is the | ||||
| length of | ||||
| Block_Content, namely | ||||
| the compressed data. | ||||
| The decompressed size | ||||
| is not known, but its | ||||
| maximum possible value | ||||
| is guaranteed (see | ||||
| below). </dd> | ||||
| <dt>Reserved:</dt> | ||||
| <dd> | ||||
| This is not a block. | ||||
| This value cannot be | ||||
| used with the current | ||||
| specification. If such | ||||
| a value is present, | ||||
| it is considered to be | ||||
| corrupt data, and a | ||||
| compliant decoder must | ||||
| reject it. </dd> | ||||
| </dl> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <t> Block_Maximum_Size is | <name>Block_Size</name> | |||
| <t> The upper 21 bits of | ||||
| Block_Header represent the | ||||
| Block_Size. </t> | ||||
| <t> When Block_Type is | ||||
| Compressed_Block or Raw_Block, | ||||
| Block_Size is the size of | ||||
| Block_Content (hence excluding | ||||
| Block_Header). </t> | ||||
| <t> When Block_Type is | ||||
| RLE_Block, since Block_Content's | ||||
| size is always 1, Block_Size | ||||
| represents the number of times | ||||
| this byte must be repeated. </t> | ||||
| <t> Block_Size is limited by | ||||
| Block_Maximum_Size (see below). | ||||
| </t> | ||||
| </section> | ||||
| <section numbered="true" toc="default"> | ||||
| <name>Block_Content and Block_Maximum_Size</name> | ||||
| <t> The size of Block_Content | ||||
| is limited by | ||||
| Block_Maximum_Size, which is | ||||
| the smallest of: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> Window_Size </li> | ||||
| <li> 128 KB </li> | ||||
| </ul> | ||||
| <t> Block_Maximum_Size is | ||||
| constant for a given frame. | constant for a given frame. | |||
| This maximum is applicable to | This maximum is applicable to | |||
| both the decompressed size | both the decompressed size | |||
| and the compressed size of any | and the compressed size of any | |||
| block in the frame. </t> | block in the frame. </t> | |||
| <t> The reasoning for this | ||||
| <t> The reasoning for this | ||||
| limit is that a decoder can | limit is that a decoder can | |||
| read this information at the | read this information at the | |||
| beginning of a frame and use | beginning of a frame and use | |||
| it to allocate buffers. | it to allocate buffers. | |||
| The guarantees on the size of | The guarantees on the size of | |||
| blocks ensure that the buffers | blocks ensure that the buffers | |||
| will be large enough for any | will be large enough for any | |||
| following block of the valid | following block of the valid | |||
| frame. </t> | frame. </t> | |||
| <t> If the compressed block | <t> If the compressed block | |||
| is larger than the uncompressed | is larger than the uncompressed o | |||
| ne, | ||||
| sending the uncompressed | sending the uncompressed | |||
| block (i.e., a Raw_Block) is | block (i.e., a Raw_Block) is | |||
| recommended instead. </t> | recommended instead. </t> | |||
| </section> | ||||
| </section> | </section> | |||
| <section anchor="comp_blocks" | </section> | |||
| title="Compressed Blocks"> | <section anchor="comp_blocks" numbered="true" toc="default"> | |||
| <t> To decompress a compressed block, | <name>Compressed Blocks</name> | |||
| <t> To decompress a compressed block, | ||||
| the compressed size must be provided | the compressed size must be provided | |||
| from the Block_Size field within | from the Block_Size field within | |||
| Block_Header. </t> | Block_Header. </t> | |||
| <t> A compressed block consists of two | <t> A compressed block consists of two | |||
| sections: a Literals Section | sections: a Literals_Section | |||
| (<xref target="comp_literals"/>) and | (<xref target="comp_literals" format="def | |||
| ault"/>) and | ||||
| a Sequences_Section | a Sequences_Section | |||
| (<xref target="comp_sequences"/>). | (<xref target="comp_sequences" format="de fault"/>). | |||
| The results of the two sections are | The results of the two sections are | |||
| then combined to produce the | then combined to produce the | |||
| decompressed data in Sequence | decompressed data in Sequence | |||
| Execution | Execution | |||
| (<xref target="comp_sequence_exec"/>). | (<xref target="comp_sequence_exec" format | |||
| </t> | ="default"/>). | |||
| </t> | ||||
| <t> To decode a compressed block, the | <t> To decode a compressed block, the | |||
| following elements are necessary: | following elements are necessary: | |||
| <list style="symbols"> | </t> | |||
| <t> Previous decoded data, up | <ul spacing="normal"> | |||
| <li> Previous decoded data, up | ||||
| to a distance of Window_Size, | to a distance of Window_Size, | |||
| or the beginning of the Frame, | or the beginning of the Frame, | |||
| whichever is smaller. | whichever is smaller. | |||
| Single_Segment_Flag | Single_Segment_Flag | |||
| will be set in the latter | will be set in the latter | |||
| case. </t> | case. </li> | |||
| <li> List of "recent offsets" | ||||
| <t> List of "recent offsets" | ||||
| from the previous | from the previous | |||
| Compressed_Block. | Compressed_Block. | |||
| </t> | </li> | |||
| <li> The previous Huffman tree, | ||||
| <t> The previous Huffman tree, | ||||
| required by | required by | |||
| Treeless_Literals_Block type. | Treeless_Literals_Block type. | |||
| </t> | </li> | |||
| <t> Previous Finite State Entropy (FSE) decoding | <li> Previous Finite State Entropy (FSE) decoding | |||
| tables, required by | tables, required by | |||
| Repeat_Mode, for each symbol | Repeat_Mode, for each symbol | |||
| type (literals lengths, | type (literals length codes, | |||
| match lengths, offsets). </t> | match length codes, offset codes) | |||
| </list> </t> | . </li> | |||
| </ul> | ||||
| <t> Note that decoding tables are not | <t> Note that decoding tables are not | |||
| always from the previous | always from the previous | |||
| Compressed_Block: | Compressed_Block: | |||
| <list style="symbols"> | </t> | |||
| <t> Every decoding table can | <ul spacing="normal"> | |||
| come from a dictionary. </t> | <li> Every decoding table can | |||
| come from a dictionary. </li> | ||||
| <t> The Huffman tree comes from | <li> The Huffman tree comes from | |||
| the previous | the previous | |||
| Compressed_Literals_Block. </ | Compressed_Literals_Block. </ | |||
| t> | li> | |||
| </list></t> | </ul> | |||
| <section anchor="comp_literals" numbered="true" toc="default"> | ||||
| <section anchor="comp_literals" | <name>Literals_Section_Header</name> | |||
| title="Literals_Section_Header"> | <t> All literals are regrouped | |||
| <t> All literals are regrouped | ||||
| in the first part of the | in the first part of the | |||
| block. They can be decoded | block. They can be decoded | |||
| first and then copied during | first and then copied during | |||
| Sequence Execution (see | Sequence Execution (see | |||
| <xref target="comp_sequence_exec" />), | <xref target="comp_sequence_exec" format="default"/>), | |||
| or they can be decoded on the | or they can be decoded on the | |||
| flow during Sequence | flow during Sequence | |||
| Execution. </t> | Execution. </t> | |||
| <t> Literals can be stored | ||||
| <t> Literals can be stored | ||||
| uncompressed or compressed | uncompressed or compressed | |||
| using Huffman prefix codes. | using Huffman prefix codes. | |||
| When compressed, an optional | When compressed, an optional | |||
| tree description can be | tree description can be | |||
| present, followed by 1 or | present, followed by 1 or | |||
| 4 streams. | 4 streams. | |||
| <figure><artwork> | </t> | |||
| +----------------------------+ | <table anchor="compressed-literals"> | |||
| | Literals_Section_Header | | <name>Compressed Literals</name> | |||
| +----------------------------+ | <tbody> | |||
| | [Huffman_Tree_Description] | | <tr> | |||
| +----------------------------+ | <td align="center">Literals_Section_Header</td> | |||
| | [Jump_Table] | | </tr> | |||
| +----------------------------+ | <tr> | |||
| | Stream_1 | | <td align="center">[Huffman_Tree_Description]</td> | |||
| +----------------------------+ | </tr> | |||
| | [Stream_2] | | <tr> | |||
| +----------------------------+ | <td align="center">[Jump_Table]</td> | |||
| | [Stream_3] | | </tr> | |||
| +----------------------------+ | <tr> | |||
| | [Stream_4] | | <td align="center">Stream_1</td> | |||
| +----------------------------+ | </tr> | |||
| </artwork></figure></t> | <tr> | |||
| <td align="center">[Stream_2]</td> | ||||
| <section title="Literals_Section_ | </tr> | |||
| Header"> | <tr> | |||
| <t> | <td align="center">[Stream_3]</td> | |||
| This field describes | </tr> | |||
| how literals are | <tr> | |||
| packed. It's a | <td align="center">[Stream_4]</td> | |||
| byte-aligned | </tr> | |||
| variable-size bit field, | </tbody> | |||
| ranging from 1 to | </table> | |||
| 5 bytes, using | <section numbered="true" toc="default"> | |||
| little-endian | <name>Literals_Section_Header</name> | |||
| convention. | <t> | |||
| This field describes | ||||
| <figure><artwork> | how literals are | |||
| +---------------------+-----------+ | packed. It's a | |||
| | Literals_Block_Type | 2 bits | | byte-aligned | |||
| +---------------------+-----------+ | variable-size bit field, | |||
| | Size_Format | 1-2 bits | | ranging from 1 to | |||
| +---------------------+-----------+ | 5 bytes, using | |||
| | Regenerated_Size | 5-20 bits | | little-endian | |||
| +---------------------+-----------+ | convention. | |||
| | [Compressed_Size] | 0-18 bits | | </t> | |||
| +---------------------+-----------+ | ||||
| </artwork></figure>< | ||||
| /t> | ||||
| <t> In this | ||||
| representation, bits at | ||||
| the top are the lowest | ||||
| bits. </t> | ||||
| <t> The | ||||
| Literals_Block_Type | ||||
| field uses the two | ||||
| lowest bits of the | ||||
| first byte, describing | ||||
| four different block | ||||
| types: | ||||
| <figure><artwork> | ||||
| +---------------------------+-------+ | ||||
| | Literals_Block_Type | Value | | ||||
| +---------------------------+-------+ | ||||
| | Raw_Literals_Block | 0 | | ||||
| +---------------------------+-------+ | ||||
| | RLE_Literals_Block | 1 | | ||||
| +---------------------------+-------+ | ||||
| | Compressed_Literals_Block | 2 | | ||||
| +---------------------------+-------+ | ||||
| | Treeless_Literals_Block | 3 | | ||||
| +---------------------------+-------+ | ||||
| </artwork></figure>< | ||||
| /t> | ||||
| <t> <list style="hanging" | ||||
| > | ||||
| <t hangText="Raw_Lite | ||||
| rals_Block:"> | ||||
| Literals are | ||||
| stored | ||||
| uncompressed. | ||||
| Literals_Section_ | ||||
| Content | ||||
| is | ||||
| Regenerated_Size. | ||||
| </t> | ||||
| <t hangText="RLE_Lite | ||||
| rals_Block:"> | ||||
| Literals | ||||
| consist of a | ||||
| single-byte | ||||
| value repeated | ||||
| Regenerated_Size | ||||
| times. | ||||
| Literals_Section_ | ||||
| Content | ||||
| is 1. </t> | ||||
| <t hangText="Compress | ||||
| ed_Literals_Block:"> | ||||
| This is a | ||||
| standard | ||||
| Huffman-compresse | ||||
| d | ||||
| block, starting | ||||
| with a Huffman | ||||
| tree | ||||
| description. | ||||
| See details | ||||
| below. Literals_ | ||||
| Section_Content | ||||
| is | ||||
| Compressed_Size. | ||||
| </t> | ||||
| <t hangText="Treeless | ||||
| _Literals_Block:"> | ||||
| This is a | ||||
| Huffman-compresse | ||||
| d | ||||
| block, using the | ||||
| Huffman tree | ||||
| from the previous | ||||
| Compressed_Litera | ||||
| ls_Block, | ||||
| or a dictionary | ||||
| if there is no pr | ||||
| evious | ||||
| Huffman-compresse | ||||
| d | ||||
| literals block. | ||||
| Huffman_Tree_Desc | ||||
| ription | ||||
| will be | ||||
| skipped. Note | ||||
| that if this | ||||
| mode is | ||||
| triggered | ||||
| without any | ||||
| previous | ||||
| Huffman-table | ||||
| in the frame | ||||
| (or | ||||
| dictionary, per | ||||
| <xref target="com | ||||
| p_dict"/>), | ||||
| it should be | ||||
| treated as data | ||||
| corruption. | ||||
| Literals_Section_ | ||||
| Content | ||||
| is Compressed_Siz | ||||
| e. | ||||
| </t> | ||||
| </list> </t> | ||||
| <t> The Size_Format is | ||||
| divided into two | ||||
| families: | ||||
| <list style="symbols" | ||||
| > | ||||
| <t> | ||||
| For | ||||
| Raw_Literals_Bloc | ||||
| k | ||||
| and | ||||
| RLE_Literals_Bloc | ||||
| k, | ||||
| it's only | ||||
| necessary to | ||||
| decode | ||||
| Regenerated_Size. | ||||
| There is no | ||||
| Compressed_Size | ||||
| field. </t> | ||||
| <t> | ||||
| For | ||||
| Compressed_Block | ||||
| and | ||||
| Treeless_Literals | ||||
| _Block, | ||||
| it's required | ||||
| to decode both | ||||
| Compressed_Size | ||||
| and | ||||
| Regenerated_Size | ||||
| (the | ||||
| decompressed | ||||
| size). It's | ||||
| also necessary | ||||
| to decode the | ||||
| number of | ||||
| streams | ||||
| (1 or 4). </t> | ||||
| </list> </t> | ||||
| <t> For values | ||||
| spanning several bytes, | ||||
| the convention is | ||||
| little endian. </t> | ||||
| <t> Size_Format for | ||||
| Raw_Literals_Block | ||||
| and RLE_Literals_Block | ||||
| uses 1 or 2 bits. Its | ||||
| value is (Literals_Sectio | ||||
| n_Header[0]>>2) & 0x3. | ||||
| <list style="hanging" | ||||
| > | ||||
| <t hangText="Size | ||||
| _Format == 00 or 10:"> | ||||
| Size_Format | ||||
| uses 1 bit. | ||||
| Regenerated_Size | ||||
| uses 5 bits | ||||
| (value 0-31). | ||||
| Literals_Section_ | ||||
| Header | ||||
| uses 1 byte. | ||||
| Regenerated_Size | ||||
| = Literal_Section | ||||
| _Header[0]>>3. | ||||
| </t> | ||||
| <t hangText="Size | ||||
| _Format == 01:"> | ||||
| Size_Format | ||||
| uses 2 bits. | ||||
| Regenerated_Size | ||||
| uses 12 bits | ||||
| (values 0-4095). | ||||
| Literals_Section_ | ||||
| Header | ||||
| uses 2 bytes. | ||||
| Regenerated_Size | ||||
| = | ||||
| (Literals_Section | ||||
| _Header[0]>>4) | ||||
| + | ||||
| (Literals_Section | ||||
| _Header[1]<<4). | ||||
| </t> | ||||
| <t hangText="Size | ||||
| _Format == 11:"> | ||||
| Size_Format | ||||
| uses 2 bits. | ||||
| Regenerated_Size | ||||
| uses 20 bits | ||||
| (values | ||||
| 0-1048575). | ||||
| Literals_Section_ | ||||
| Header | ||||
| uses 3 | ||||
| bytes. | ||||
| Regenerated_Size | ||||
| = | ||||
| (Literals_Section | ||||
| _Header[0]>>4) | ||||
| + | ||||
| (Literals_Section | ||||
| _Header[1]<<4) | ||||
| + | ||||
| (Literals_Section | ||||
| _Header[2]<<12) | ||||
| </t> | ||||
| </list> </t> | <table anchor="Literals_Section_Header"> | |||
| <name>Literals_Section_Header</name> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">Literals_Block_Type</td> | ||||
| <td align="center">2 bits</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">Size_Format</td> | ||||
| <td align="center">1-2 bits</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">Regenerated_Size</td> | ||||
| <td align="center">5-20 bits</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">[Compressed_Size] </td> | ||||
| <td align="center">0-18 bits</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Only Stream_1 is | <t> In this | |||
| present for these | representation, bits at | |||
| cases. Note that it is | the top are the lowest | |||
| permitted to represent | bits. </t> | |||
| a short value (for | <t> The | |||
| example, 13) using a | Literals_Block_Type | |||
| long format, even if | field uses the two | |||
| it's less efficient. | lowest bits of the | |||
| </t> | first byte, describing | |||
| four different block | ||||
| types: | ||||
| </t> | ||||
| <t> Size_Format for | <table anchor="Literals_Block_Type"> | |||
| Compressed_Literals_Block | <name>Literals_Block_Type</name> | |||
| and | <thead> | |||
| Treeless_Literals_Block | <tr> | |||
| always uses 2 bits. | <th align="center">Literals_Block_Type</th> | |||
| <list style="hanging" | <th align="center">Value</th> | |||
| > | </tr> | |||
| <t hangText="Size | </thead> | |||
| _Format == 00:"> | <tbody> | |||
| <tr> | ||||
| <td align="center">Raw_Literals_Block</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">RLE_Literals_Block</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">Compressed_Literals_Block</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">Treeless_Literals_Block</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Raw_Literals_Block:</dt> | ||||
| <dd> | ||||
| Literals are | ||||
| stored | ||||
| uncompressed. | ||||
| Literals_Section_Content | ||||
| is | ||||
| Regenerated_Size. | ||||
| </dd> | ||||
| <dt>RLE_Literals_Block:</dt> | ||||
| <dd> | ||||
| Literals consist of a single-byte value repeated | ||||
| Regenerated_Size times. Literals_Section_Content is | ||||
| 1. </dd> | ||||
| <dt>Compressed_Literals_Block:</dt> | ||||
| <dd> | ||||
| This is a standard Huffman-compressed | ||||
| block, starting with a Huffman tree description. | ||||
| See details below. Literals_Section_Content | ||||
| is Compressed_Size. | ||||
| </dd> | ||||
| <dt>Treeless_Literals_Block:</dt> | ||||
| <dd> | ||||
| This is a | ||||
| Huffman-compressed | ||||
| block, using the | ||||
| Huffman tree | ||||
| from the previous | ||||
| Compressed_Literals_Block | ||||
| or a dictionary | ||||
| if there is no previous | ||||
| Huffman-compressed | ||||
| literals block. | ||||
| Huffman_Tree_Description | ||||
| will be | ||||
| skipped. Note that if this mode is triggered without any previous Huffman table | ||||
| in the frame (or dictionary, per <xref target="comp_dict" format="default"/>), | ||||
| it should be treated as data corruption. Literals_Section_Content is | ||||
| Compressed_Size. | ||||
| </dd> | ||||
| </dl> | ||||
| <t> The Size_Format is divided into two families: | ||||
| </t> | ||||
| <ul spacing="normal"> | ||||
| <li> | ||||
| For Raw_Literals_Block and RLE_Literals_Block, | ||||
| it's only necessary to decode Regenerated_Size. | ||||
| There is no Compressed_Size field.</li> | ||||
| <li> | ||||
| For Compressed_Block and Treeless_Literals_Block, | ||||
| it's required to decode both Compressed_Size | ||||
| and Regenerated_Size (the decompressed | ||||
| size). It's also necessary to decode the | ||||
| number of streams (1 or 4). </li> | ||||
| </ul> | ||||
| <t> For values | ||||
| spanning several bytes, | ||||
| the convention is | ||||
| little endian. </t> | ||||
| <t> Size_Format for | ||||
| Raw_Literals_Block | ||||
| and RLE_Literals_Block | ||||
| uses 1 or 2 bits. Its | ||||
| value is (Literals_Section_Header[0]>>2) & 0x3. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Size_Format == 00 or 10:</dt> | ||||
| <dd> | ||||
| Size_Format | ||||
| uses 1 bit. | ||||
| Regenerated_Size | ||||
| uses 5 bits | ||||
| (value 0-31). | ||||
| Literals_Section_Header | ||||
| uses 1 byte. | ||||
| Regenerated_Size | ||||
| = Literal_Section_Header[0]>>3. | ||||
| </dd> | ||||
| <dt>Size_Format == 01:</dt> | ||||
| <dd> | ||||
| Size_Format | ||||
| uses 2 bits. | ||||
| Regenerated_Size | ||||
| uses 12 bits | ||||
| (values 0-4095). | ||||
| Literals_Section_Header | ||||
| uses 2 bytes. | ||||
| Regenerated_Size | ||||
| = | ||||
| (Literals_Section_Header[0]>>4) | ||||
| + | ||||
| (Literals_Section_Header[1]<<4). | ||||
| </dd> | ||||
| <dt>Size_Format == 11:</dt> | ||||
| <dd> | ||||
| Size_Format | ||||
| uses 2 bits. | ||||
| Regenerated_Size | ||||
| uses 20 bits | ||||
| (values | ||||
| 0-1048575). | ||||
| Literals_Section_Header | ||||
| uses 3 | ||||
| bytes. | ||||
| Regenerated_Size | ||||
| = | ||||
| (Literals_Section_Header[0]>>4) | ||||
| + | ||||
| (Literals_Section_Header[1]<<4) | ||||
| + | ||||
| (Literals_Section_Header[2]<<12). | ||||
| </dd> | ||||
| </dl> | ||||
| <t> Only Stream_1 is | ||||
| present for these | ||||
| cases. Note that it is | ||||
| permitted to represent | ||||
| a short value (for | ||||
| example, 13) using a | ||||
| long format, even if | ||||
| it's less efficient. | ||||
| </t> | ||||
| <t> Size_Format for | ||||
| Compressed_Literals_Block | ||||
| and | ||||
| Treeless_Literals_Block | ||||
| always uses 2 bits. | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Size_Format == 00:</dt> | ||||
| <dd> | ||||
| A single | A single | |||
| stream. Both | stream. Both | |||
| Regenerated_Size | Regenerated_Size | |||
| and Compressed_Si ze | and Compressed_Si ze | |||
| use 10 bits | use 10 bits | |||
| (values | (values | |||
| 0-1023). | 0-1023). | |||
| Literals_Section_ Header | Literals_Section_ Header | |||
| uses 3 | uses 3 | |||
| bytes. | bytes. | |||
| </t> | </dd> | |||
| <dt>Size_Format == 01:</dt> | ||||
| <t hangText="Size | <dd> | |||
| _Format == 01:"> | ||||
| 4 streams. | 4 streams. | |||
| Both | Both | |||
| Regenerated_Size | Regenerated_Size | |||
| and | and | |||
| Compressed_Size | Compressed_Size | |||
| use 10 bits | use 10 bits | |||
| (values | (values | |||
| 0-1023). | 0-1023). | |||
| Literals_Section_ Header | Literals_Section_ Header | |||
| uses 3 | uses 3 | |||
| bytes. | bytes. | |||
| </t> | </dd> | |||
| <dt>Size_Format == 10:</dt> | ||||
| <t hangText="Size | <dd> | |||
| _Format == 10:"> | ||||
| 4 streams. | 4 streams. | |||
| Both | Both | |||
| Regenerated_Size | Regenerated_Size | |||
| and | and | |||
| Compressed_Size | Compressed_Size | |||
| use 14 bits | use 14 bits | |||
| (values | (values | |||
| 0-16383). | 0-16383). | |||
| Literals_Section_ Header | Literals_Section_ Header | |||
| uses 4 | uses 4 | |||
| bytes. | bytes. | |||
| </t> | </dd> | |||
| <dt>Size_Format == 11:</dt> | ||||
| <t hangText="Size | <dd> | |||
| _Format == 11:"> | ||||
| 4 streams. | 4 streams. | |||
| Both | Both | |||
| Regenerated_Size | Regenerated_Size | |||
| and | and | |||
| Compressed_Size | Compressed_Size | |||
| use 18 bits | use 18 bits | |||
| (values | (values | |||
| 0-262143). | 0-262143). | |||
| Literals_Section_ Header | Literals_Section_ Header | |||
| uses 5 | uses 5 | |||
| bytes. | bytes. | |||
| </t> | </dd> | |||
| </list> </t> | </dl> | |||
| <t> Both the | ||||
| <t> Both the | ||||
| Compressed_Size and | Compressed_Size and | |||
| Regenerated_Size fields | Regenerated_Size fields | |||
| follow little-endian | follow little-endian | |||
| convention. Note that | convention. Note that | |||
| Compressed_Size | Compressed_Size | |||
| includes the size of | includes the size of | |||
| the Huffman_Tree_Descript ion | the Huffman_Tree_Descript ion | |||
| when it is | when it is | |||
| present. </t> | present. </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Raw_Literals_Bloc | <name>Raw_Literals_Block</name> | |||
| k"> | <t> | |||
| <t> | ||||
| The data in Stream_1 is | The data in Stream_1 is | |||
| Regenerated_Size bytes | Regenerated_Size bytes | |||
| long. It contains | long. It contains | |||
| the raw literals data | the raw literals data | |||
| to be used during | to be used during | |||
| Sequence Execution | Sequence Execution | |||
| (<xref target="comp_seque | (<xref target="comp_seque | |||
| nces"/>). | nces" format="default"/>). | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="RLE_Literals_Bloc | <name>RLE_Literals_Block</name> | |||
| k"> | <t> | |||
| <t> | ||||
| Stream_1 consists of a | Stream_1 consists of a | |||
| single byte that | single byte that | |||
| should be repeated | should be repeated | |||
| Regenerated_Size times | Regenerated_Size times | |||
| to generate the | to generate the | |||
| decoded literals. | decoded literals. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <name>Compressed_Literals_Block and Treeless_Literals_Block</nam | ||||
| e> | ||||
| <t> | ||||
| <section title="Compressed_Litera | ||||
| ls_Block and Treeless_Literals_Block"> | ||||
| <t> | ||||
| Both of these modes | Both of these modes | |||
| contain Huffman-encoded | contain Huffman-coded | |||
| data. | data. | |||
| For Treeless_Literals_Blo ck, | For Treeless_Literals_Blo ck, | |||
| the Huffman table comes f rom | the Huffman table comes f rom | |||
| the previously | the previously | |||
| compressed literals | compressed literals | |||
| block, or from a | block, or from a | |||
| dictionary; | dictionary; | |||
| see <xref target="comp_di | see <xref target="comp_di | |||
| ct"/>. | ct" format="default"/>. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Huffman_Tree_Desc | <name>Huffman_Tree_Description</name> | |||
| ription"> | <t> | |||
| <t> | ||||
| This section is | This section is | |||
| only present | only present | |||
| when the | when the | |||
| Literals_Block_Type type | Literals_Block_Type type | |||
| is | is | |||
| Compressed_Literals_Block | Compressed_Literals_Block | |||
| (2). The format | (2). The format | |||
| of Huffman_Tree_Descripti on | of Huffman_Tree_Descripti on | |||
| can be found in | can be found in | |||
| <xref target="huffman_tre e_desc"/>. | <xref target="huffman_tre e_desc" format="default"/>. | |||
| The size of | The size of | |||
| Huffman_Tree_Description | Huffman_Tree_Description | |||
| is determined | is determined | |||
| during the | during the | |||
| decoding process. It | decoding process. It | |||
| must be used | must be used | |||
| to determine | to determine | |||
| where streams | where streams | |||
| begin. | begin. | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| Total_Streams_Size = Compressed_Size | Total_Streams_Size = Compressed_Size | |||
| - Huffman_Tree_Description_Size | - Huffman_Tree_Description_Size | |||
| </artwork></figu | ]]></artwork> | |||
| re></t> | </section> | |||
| </section> | <section numbered="true" toc="default"> | |||
| <name>Jump_Table</name> | ||||
| <section title="Jump_Table"> | <t> The Jump_Table | |||
| <t> The Jump_Table | ||||
| is only present | is only present | |||
| when there are | when there are | |||
| 4 Huffman-coded | 4 Huffman-coded | |||
| streams. </t> | streams. </t> | |||
| <t> (Reminder: | ||||
| <t> (Reminder: | ||||
| Huffman-compresse d | Huffman-compresse d | |||
| data | data | |||
| consists of | consists of | |||
| either 1 or | either 1 or | |||
| 4 | 4 | |||
| Huffman-coded | Huffman-coded | |||
| streams.) | streams.) | |||
| </t> | </t> | |||
| <t> If only 1 | ||||
| <t> If only 1 | ||||
| stream is | stream is | |||
| present, it is | present, it is | |||
| a single | a single | |||
| bitstream | bitstream | |||
| occupying the | occupying the | |||
| entire | entire | |||
| remaining | remaining | |||
| portion of the | portion of the | |||
| literals | literals | |||
| block, encoded | block, encoded | |||
| as described | as described | |||
| within | within | |||
| <xref target="huf | <xref target="huf | |||
| fman_coded_streams"/>. | fman_coded_streams" format="default"/>. | |||
| </t> | </t> | |||
| <t> If there are | ||||
| <t> If there are | ||||
| 4 streams, | 4 streams, | |||
| Literals_Section_Header | Literals_Section_Header | |||
| only provides | only provides | |||
| enough | enough | |||
| information to | information to | |||
| know the | know the | |||
| decompressed | decompressed | |||
| and compressed | and compressed | |||
| sizes of all | sizes of all | |||
| 4 streams | 4 streams | |||
| skipping to change at line 1410 ¶ | skipping to change at line 1437 ¶ | |||
| last stream, | last stream, | |||
| which may be | which may be | |||
| up to 3 | up to 3 | |||
| bytes smaller, | bytes smaller, | |||
| to reach a | to reach a | |||
| total | total | |||
| decompressed | decompressed | |||
| size as | size as | |||
| specified in | specified in | |||
| Regenerated_Size. </t> | Regenerated_Size. </t> | |||
| <t> | ||||
| <t> | ||||
| The compressed | The compressed | |||
| size of each | size of each | |||
| stream is | stream is | |||
| provided | provided | |||
| explicitly in | explicitly in | |||
| the Jump_Table. | the Jump_Table. | |||
| The Jump_Table | The Jump_Table | |||
| is 6 bytes long and | is 6 bytes long and | |||
| consists of three | consists of three | |||
| 2-byte | 2-byte | |||
| skipping to change at line 1433 ¶ | skipping to change at line 1459 ¶ | |||
| fields, | fields, | |||
| describing the | describing the | |||
| compressed | compressed | |||
| sizes of the | sizes of the | |||
| first 3 | first 3 | |||
| streams. | streams. | |||
| Stream4_Size | Stream4_Size | |||
| is computed | is computed | |||
| from | from | |||
| Total_Streams_Siz e | Total_Streams_Siz e | |||
| minus sizes of | minus the sizes o f | |||
| other streams. | other streams. | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| Stream4_Size = Total_Streams_Size - 6 | Stream4_Size = Total_Streams_Size - 6 | |||
| - Stream1_Size - Stream2_Size | - Stream1_Size - Stream2_Size | |||
| - Stream3_Size | - Stream3_Size | |||
| </artwork></figu | ]]></artwork> | |||
| re></t> | <t> | |||
| <t> | ||||
| Note that if | Note that if | |||
| Stream1_Size + | Stream1_Size + | |||
| Stream2_Size + | Stream2_Size + | |||
| Stream3_Size | Stream3_Size | |||
| exceeds | exceeds | |||
| Total_Streams_Siz e, | Total_Streams_Siz e, | |||
| the data are | the data are | |||
| considered | considered | |||
| corrupted. </t> | corrupted. </t> | |||
| <t> | <t> | |||
| Each of these | Each of these | |||
| 4 bitstreams | 4 bitstreams | |||
| is then decoded | is then decoded | |||
| independently | independently | |||
| as a | as a | |||
| Huffman-Coded | Huffman-coded | |||
| stream, as | stream, as | |||
| described in | described in | |||
| <xref target="huf | <xref target="huf | |||
| fman_coded_streams"/>. | fman_coded_streams" format="default"/>. | |||
| </t> | </t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="comp_sequences" numbered="true" toc="default"> | ||||
| <section anchor="comp_sequences" | <name>Sequences_Section</name> | |||
| title="Sequences_Section"> | <t> A compressed block is a | |||
| <t> A compressed block is a | ||||
| succession of sequences. | succession of sequences. | |||
| A sequence is a literal | A sequence is a literal | |||
| copy command, followed by | copy command, followed by | |||
| a match copy command. A | a match copy command. A | |||
| literal copy command | literal copy command | |||
| specifies a length. It is | specifies a length. It is | |||
| the number of bytes to be | the number of bytes to be | |||
| copied (or extracted) from | copied (or extracted) from | |||
| the Literals Section. | the Literals_Section. | |||
| A match copy command | A match copy command | |||
| specifies an offset and a | specifies an offset and a | |||
| length. </t> | length. </t> | |||
| <t> When all sequences are | ||||
| <t> When all sequences are | ||||
| decoded, if there are | decoded, if there are | |||
| literals left in the | literals left in the | |||
| literals section, these | Literals_Section, these | |||
| bytes are added at the | bytes are added at the | |||
| end of the block. </t> | end of the block. </t> | |||
| <t> This is described in more | ||||
| <t> This is described in more | ||||
| detail in | detail in | |||
| <xref target="comp_sequence_e | <xref target="comp_sequence_e | |||
| xec"/>. </t> | xec" format="default"/>. </t> | |||
| <t> The Sequences_Section | ||||
| <t> The Sequences_Section | ||||
| regroups all symbols | regroups all symbols | |||
| required to decode | required to decode | |||
| commands. There are three | commands. There are three | |||
| symbol types: literals | symbol types: literals | |||
| lengths, offsets, and match | length codes, offset codes, a | |||
| lengths. They are encoded | nd match | |||
| length codes. They are encod | ||||
| ed | ||||
| together, interleaved, in | together, interleaved, in | |||
| a single "bitstream". </t> | a single "bitstream". </t> | |||
| <t> The Sequences_Section | ||||
| <t> The Sequences_Section | ||||
| starts by a header, | starts by a header, | |||
| followed by optional | followed by optional | |||
| probability tables for | probability tables for | |||
| each symbol type, followed | each symbol type, followed | |||
| by the bitstream. </t> | by the bitstream. </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| <t> <figure><artwork> | ||||
| Sequences_Section_Header | Sequences_Section_Header | |||
| [Literals_Length_Table] | [Literals_Length_Table] | |||
| [Offset_Table] | [Offset_Table] | |||
| [Match_Length_Table] | [Match_Length_Table] | |||
| bitStream | bitStream | |||
| </artwork></figure> </t> | ]]></artwork> | |||
| <t> To decode the | ||||
| <t> To decode the | ||||
| Sequences_Section, it's | Sequences_Section, it's | |||
| necessary to know its | necessary to know its | |||
| size. This size is deduced | size. This size is deduced | |||
| from the size of the Literals _Section: | from the size of the Literals _Section: | |||
| Sequences_Section_Size = Bloc | Sequences_Section_Size = Bloc | |||
| k_Size - Literals_Section_Header - Literals_Section_Content </t> | k_Size - Literals_Section_Header - Literals_Section_Content. </t> | |||
| <section anchor="seq_sec_hdr" numbered="true" toc="default"> | ||||
| <section title="Sequences_Section | <name>Sequences_Section_Header</name> | |||
| _Header" anchor="seq_sec_hdr"> | <t> This header | |||
| <t> This header | ||||
| consists of two | consists of two | |||
| items: | items: | |||
| <list style="symbols" | </t> | |||
| > | <ul spacing="normal"> | |||
| <t> Number_of_Seq | <li> Number_of_Sequences </li> | |||
| uences </t> | <li> Symbol_Compression_Modes </li> | |||
| <t> Symbol_Compre | </ul> | |||
| ssion_Modes </t> | <t> Number_of_Sequences | |||
| </list> </t> | is a variable size | |||
| field using between | ||||
| <t> Number_of_Sequences | 1 and 3 | |||
| is a variable size | bytes. If the | |||
| field using between | first byte is | |||
| 1 and 3 | "byte0": | |||
| bytes. If the | ||||
| first byte is | ||||
| "byte0": | ||||
| <list style="symbols" | </t> | |||
| > | <ul spacing="normal"> | |||
| <t> if | <li> if | |||
| (byte0 == | (byte0 == | |||
| 0): there | 0): there | |||
| are no | are no | |||
| sequences. | sequences. | |||
| The | The | |||
| sequence | sequence | |||
| section | section | |||
| stops here. | stops here. | |||
| Decompressed | Decompressed | |||
| content is | content is | |||
| defined | defined | |||
| entirely as | entirely as | |||
| Literals | Literals_Sect | |||
| Section | ion | |||
| content. | content. | |||
| The FSE | The FSE | |||
| tables used | tables used | |||
| in Repeat_Mod e | in Repeat_Mod e | |||
| are not | are not | |||
| updated. </t> | updated. </li | |||
| > | ||||
| <t> if (byte0 | <li> if (byte0 | |||
| < 128): | < 128): | |||
| Number_of_Seq uences | Number_of_Seq uences | |||
| = byte0. | = byte0. | |||
| Uses 1 | Uses 1 | |||
| byte. </t> | byte. </li> | |||
| <li> if (byte0 | ||||
| <t> if (byte0 | ||||
| < 255): | < 255): | |||
| Number_of_Seq uences | Number_of_Seq uences | |||
| = | = | |||
| ((byte0 - 128 ) | ((byte0 - 128 ) | |||
| << 8) + | << 8) + | |||
| byte1. Uses | byte1. Uses | |||
| 2 bytes. </t> | 2 bytes. </li | |||
| > | ||||
| <t> if (byte0 | <li> if (byte0 | |||
| == 255): | == 255): | |||
| Number_of_Seq uences | Number_of_Seq uences | |||
| = byte1 + | = byte1 + | |||
| (byte2 <&l t; 8) | (byte2 <&l t; 8) | |||
| + 0x7F00. | + 0x7F00. | |||
| Uses 3 | Uses 3 | |||
| bytes. </t> | bytes. </li> | |||
| </list> </t> | </ul> | |||
| <t> Symbol_Compression_Modes | ||||
| <t> Symbol_Compression_Mo | is a single byte, | |||
| des | defining the | |||
| is a single byte, | compression mode of | |||
| defining the | each symbol type. | |||
| compression mode of | </t> | |||
| each symbol type. | ||||
| <figure><artwork> | ||||
| +-------------+----------------------+ | ||||
| | Bit Number | Field Name | | ||||
| +-------------+----------------------+ | ||||
| | 7-6 | Literal_Lengths_Mode | | ||||
| +-------------+----------------------+ | ||||
| | 5-4 | Offsets_Mode | | ||||
| +-------------+----------------------+ | ||||
| | 3-2 | Match_Lengths_Mode | | ||||
| +-------------+----------------------+ | ||||
| | 1-0 | Reserved | | ||||
| +-------------+----------------------+ | ||||
| </artwork></figure> | ||||
| </t> | ||||
| <t> The last field, | <table anchor="Symbol_Compression_Modes"> | |||
| Reserved, must be | <name>Symbol_Compression_Modes</name> | |||
| all zeroes. </t> | <thead> | |||
| <tr> | ||||
| <th align="center">Bit Number</th> | ||||
| <th align="center">Field Name</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">7-6</td> | ||||
| <td align="center">Literal_Lengths_Mode</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">5-4</td> | ||||
| <td align="center">Offsets_Mode</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3-2</td> | ||||
| <td align="center">Match_Lengths_Mode</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1-0</td> | ||||
| <td align="center">Reserved</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Literals_Lengths_Mode | <t> The last field, | |||
| , | Reserved, must be | |||
| Offsets_Mode, and | all zeroes. </t> | |||
| Match_Lengths_Mode | <t> Literals_Lengths_Mode, | |||
| define the | Offsets_Mode, and | |||
| Compression_Mode of | Match_Lengths_Mode | |||
| literals lengths, | define the | |||
| offsets, and match | Compression_Mode of | |||
| lengths symbols, | literals length codes, | |||
| respectively. They | offset codes, and match | |||
| follow the same | length codes, | |||
| enumeration: | respectively. They | |||
| follow the same | ||||
| enumeration: | ||||
| </t> | ||||
| <figure><artwork> | <table anchor="literals"> | |||
| +-------+---------------------+ | <name>Literals_Lengths_Mode, Offsets_Mode, and Match_Lengths_Mode</name> | |||
| | Value | Compression_Mode | | <thead> | |||
| +-------+---------------------+ | <tr> | |||
| | 0 | Predefined_Mode | | <th align="center">Value</th> | |||
| +-------+---------------------+ | <th align="center">Compression_Mode</th> | |||
| | 1 | RLE_Mode | | </tr> | |||
| +-------+---------------------+ | </thead> | |||
| | 2 | FSE_Compressed_Mode | | <tbody> | |||
| +-------+---------------------+ | <tr> | |||
| | 3 | Repeat_Mode | | <td align="center">0</td> | |||
| +-------+---------------------+ | <td align="center">Predefined_Mode</td> | |||
| </artwork></figure>< | </tr> | |||
| /t> | <tr> | |||
| <td align="center">1</td> | ||||
| <td align="center">RLE_Mode</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">FSE_Compressed_Mode</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">Repeat_Mode</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> <list style="hanging" | <dl newline="false" spacing="normal"> | |||
| > | <dt>Predefined_Mode:</dt> | |||
| <t hangText="Predefin | <dd> | |||
| ed_Mode:"> | ||||
| A predefined | A predefined | |||
| FSE | FSE | |||
| (see <xref target ="comp_fse"/>) | (see <xref target ="comp_fse" format="default"/>) | |||
| distribution | distribution | |||
| table is used, as | table is used, as | |||
| defined in | defined in | |||
| <xref target="def ault_dist"/>. | <xref target="def ault_dist" format="default"/>. | |||
| No distribution | No distribution | |||
| table will be | table will be | |||
| present. </t> | present. </dd> | |||
| <dt>RLE_Mode:</dt> | ||||
| <t hangText="RLE_Mode | <dd> | |||
| :"> | ||||
| The table | The table | |||
| description | description | |||
| consists of a | consists of a | |||
| single byte, | single byte, | |||
| which contains | which contains | |||
| the symbol's | the symbol's | |||
| value. This | value. This | |||
| symbol will | symbol will | |||
| be used for | be used for | |||
| all sequences. | all sequences. | |||
| </t> | </dd> | |||
| <dt>FSE_Compressed_Mode:</dt> | ||||
| <t hangText="FSE_Comp | <dd> | |||
| ressed_Mode:"> | ||||
| Standard FSE | Standard FSE | |||
| compression. A | compression. A | |||
| distribution | distribution | |||
| table will be | table will be | |||
| present. The | present. The | |||
| format of this | format of this | |||
| distribution | distribution | |||
| table is | table is | |||
| described in | described in | |||
| <xref target="com p_fse_table"/>. | <xref target="com p_fse_table" format="default"/>. | |||
| Note that the | Note that the | |||
| maximum allowed | maximum allowed | |||
| accuracy log | accuracy log | |||
| for literals | for literals | |||
| length and | length code and | |||
| match length | match length code | |||
| tables is 9, | tables is 9, | |||
| and the | and the | |||
| maximum | maximum | |||
| accuracy log | accuracy log | |||
| for the | for the | |||
| offsets table | offset code table | |||
| is 8. | is 8. | |||
| This mode must | This mode must | |||
| not be used when | not be used when | |||
| only one symbol | only one symbol | |||
| is present; | is present; | |||
| RLE_Mode should | RLE_Mode should | |||
| be used instead | be used instead | |||
| (although any | (although any | |||
| other mode | other mode | |||
| will work). </t> | will work). </dd> | |||
| <dt>Repeat_Mode:</dt> | ||||
| <t hangText="Repeat_M | <dd> | |||
| ode:"> | ||||
| The table used | The table used | |||
| in the previous | in the previous | |||
| Compressed_Block | Compressed_Block | |||
| with | with | |||
| Number_Of_Sequenc es > 0 | Number_Of_Sequenc es > 0 | |||
| will be | will be | |||
| used again, or | used again, or | |||
| if this is the | if this is the | |||
| first block, | first block, | |||
| the table in | the table in | |||
| the dictionary | the dictionary | |||
| will be used. | will be used. | |||
| Note that this | Note that this | |||
| includes | includes | |||
| RLE_Mode, | RLE_Mode, | |||
| skipping to change at line 1745 ¶ | skipping to change at line 1790 ¶ | |||
| No distribution | No distribution | |||
| table will be | table will be | |||
| present. | present. | |||
| If this mode is | If this mode is | |||
| used without | used without | |||
| any previous | any previous | |||
| sequence table | sequence table | |||
| in the frame | in the frame | |||
| (or dictionary; | (or dictionary; | |||
| see | see | |||
| <xref target="com p_dict"/>) | <xref target="com p_dict" format="default"/>) | |||
| to repeat, this | to repeat, this | |||
| should be | should be | |||
| treated as | treated as | |||
| corruption. </t> | corruption. </dd> | |||
| </dl> | ||||
| </list></t> | <section anchor="codes_lengths_offsets" numbered="true" toc="def | |||
| ault"> | ||||
| <section title="Sequence | <name>Sequence Codes for Lengths and Offsets</name> | |||
| Codes for Lengths and Offsets" anchor="codes_lengths_offsets"> | <t> Each symbol is a | |||
| <t> Each symbol is a | ||||
| code in its own | code in its own | |||
| context, which | context, which | |||
| specifies Baseline | specifies Baseline | |||
| and Number_of_Bits | and Number_of_Bits | |||
| to add. Codes are | to add. Codes are | |||
| FSE compressed | FSE compressed | |||
| and interleaved | and interleaved | |||
| with raw additional | with raw additional | |||
| bits in the same | bits in the same | |||
| bitstream. </t> | bitstream. </t> | |||
| <t> Literals length | ||||
| <t> Literals length | ||||
| codes are values | codes are values | |||
| ranging from 0 to | ranging from 0 to | |||
| 35 inclusive. They | 35, inclusive. They | |||
| define lengths from | define lengths from | |||
| 0 to 131071 bytes. | 0 to 131071 bytes. | |||
| The literals length | The literals length | |||
| is equal to the | is equal to the | |||
| decoded Baseline | decoded Baseline | |||
| plus the result of | plus the result of | |||
| reading | reading | |||
| Number_of_Bits bits | Number_of_Bits bits | |||
| from the bitstream, | from the bitstream, | |||
| as a little-endian | as a little-endian | |||
| value. | value. | |||
| <figure><artwork> | </t> | |||
| +----------------------+----------+----------------+ | ||||
| | Literals_Length_Code | Baseline | Number_of_Bits | | ||||
| +----------------------+----------+----------------+ | ||||
| | 0-15 | length | 0 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 16 | 16 | 1 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 17 | 18 | 1 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 18 | 20 | 1 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 19 | 22 | 1 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 20 | 24 | 2 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 21 | 28 | 2 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 22 | 32 | 3 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 23 | 40 | 3 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 24 | 48 | 4 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 25 | 64 | 6 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 26 | 128 | 7 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 27 | 256 | 8 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 28 | 512 | 9 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 29 | 1024 | 10 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 30 | 2048 | 11 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 31 | 4096 | 12 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 32 | 8192 | 13 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 33 | 16384 | 14 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 34 | 32768 | 15 | | ||||
| +----------------------+----------+----------------+ | ||||
| | 35 | 65536 | 16 | | ||||
| +----------------------+----------+----------------+ | ||||
| </artwork></figure>< | ||||
| /t> | ||||
| <t> Match length codes | <table anchor="length"> | |||
| <name>Literals Length Codes</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Literals_Length_Code</th> | ||||
| <th align="center">Baseline</th> | ||||
| <th align="center">Number_of_Bits</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0-15</td> | ||||
| <td align="center">length</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">16</td> | ||||
| <td align="center">16</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">17</td> | ||||
| <td align="center">18</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">18</td> | ||||
| <td align="center">20</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">19</td> | ||||
| <td align="center">22</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">20</td> | ||||
| <td align="center">24</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">21</td> | ||||
| <td align="center">28</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">22</td> | ||||
| <td align="center">32</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">23</td> | ||||
| <td align="center">40</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">24</td> | ||||
| <td align="center">48</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">25</td> | ||||
| <td align="center">64</td> | ||||
| <td align="center">6</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">26</td> | ||||
| <td align="center">128</td> | ||||
| <td align="center">7</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">27</td> | ||||
| <td align="center">256</td> | ||||
| <td align="center">8</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">28</td> | ||||
| <td align="center">512</td> | ||||
| <td align="center">9</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">29</td> | ||||
| <td align="center">1024</td> | ||||
| <td align="center">10</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">30</td> | ||||
| <td align="center">2048</td> | ||||
| <td align="center">11</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">31</td> | ||||
| <td align="center">4096</td> | ||||
| <td align="center">12</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">32</td> | ||||
| <td align="center">8192</td> | ||||
| <td align="center">13</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">33</td> | ||||
| <td align="center">16384</td> | ||||
| <td align="center">14</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">34</td> | ||||
| <td align="center">32768</td> | ||||
| <td align="center">15</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">35</td> | ||||
| <td align="center">65536</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Match length codes | ||||
| are values ranging | are values ranging | |||
| from 0 to 52 | from 0 to 52, | |||
| inclusive. They | inclusive. They | |||
| define lengths from | define lengths from | |||
| 3 to 131074 bytes. | 3 to 131074 bytes. | |||
| The match length is | The match length is | |||
| equal to the | equal to the | |||
| decoded Baseline | decoded Baseline | |||
| plus the result of | plus the result of | |||
| reading | reading | |||
| Number_of_Bits bits | Number_of_Bits bits | |||
| from the bitstream, | from the bitstream, | |||
| as a little-endian | as a little-endian | |||
| value. | value. | |||
| <figure><artwork> | </t> | |||
| +-------------------+-----------------------+----------------+ | ||||
| | Match_Length_Code | Baseline | Number_of_Bits | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 0-31 | Match_Length_Code + 3 | 0 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 32 | 35 | 1 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 33 | 37 | 1 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 34 | 39 | 1 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 35 | 41 | 1 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 36 | 43 | 2 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 37 | 47 | 2 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 38 | 51 | 3 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 39 | 59 | 3 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 40 | 67 | 4 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 41 | 83 | 4 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 42 | 99 | 5 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 43 | 131 | 7 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 44 | 259 | 8 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 45 | 515 | 9 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 46 | 1027 | 10 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 47 | 2051 | 11 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 48 | 4099 | 12 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 49 | 8195 | 13 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 50 | 16387 | 14 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 51 | 32771 | 15 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| | 52 | 65539 | 16 | | ||||
| +-------------------+-----------------------+----------------+ | ||||
| </artwork></figure>< | ||||
| /t> | ||||
| <t> Offset codes are | <table anchor="Match_Length_Code"> | |||
| <name>Match Length Codes</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Match_Length_Code</th> | ||||
| <th align="center">Baseline</th> | ||||
| <th align="center">Number_of_Bits</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0-31</td> | ||||
| <td align="center">Match_Length_Code + 3</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">32</td> | ||||
| <td align="center">35</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">33</td> | ||||
| <td align="center">37</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">34</td> | ||||
| <td align="center">39</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">35</td> | ||||
| <td align="center">41</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">36</td> | ||||
| <td align="center">43</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">37</td> | ||||
| <td align="center">47</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">38</td> | ||||
| <td align="center">51</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">39</td> | ||||
| <td align="center">59</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">40</td> | ||||
| <td align="center">67</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">41</td> | ||||
| <td align="center">83</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">42</td> | ||||
| <td align="center">99</td> | ||||
| <td align="center">5</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">43</td> | ||||
| <td align="center">131</td> | ||||
| <td align="center">7</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">44</td> | ||||
| <td align="center">259</td> | ||||
| <td align="center">8</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">45</td> | ||||
| <td align="center">515</td> | ||||
| <td align="center">9</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">46</td> | ||||
| <td align="center">1027</td> | ||||
| <td align="center">10</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">47</td> | ||||
| <td align="center">2051</td> | ||||
| <td align="center">11</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">48</td> | ||||
| <td align="center">4099</td> | ||||
| <td align="center">12</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">49</td> | ||||
| <td align="center">8195</td> | ||||
| <td align="center">13</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">50</td> | ||||
| <td align="center">16387</td> | ||||
| <td align="center">14</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">51</td> | ||||
| <td align="center">32771</td> | ||||
| <td align="center">15</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">52</td> | ||||
| <td align="center">65539</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Offset codes are | ||||
| values ranging from | values ranging from | |||
| 0 to N. </t> | 0 to N. </t> | |||
| <t> A decoder is free | ||||
| <t> A decoder is free | ||||
| to limit its | to limit its | |||
| maximum supported | maximum supported | |||
| value for N. | value for N. | |||
| Support for values | Support for values | |||
| of at least 22 is | of at least 22 is | |||
| recommended. | recommended. | |||
| At the time of this | At the time of this | |||
| writing, the | writing, the | |||
| reference decoder | reference decoder | |||
| supports a maximum | supports a maximum | |||
| N value of 31. </t> | N value of 31. </t> | |||
| <t> An offset code is | ||||
| <t> An offset code is | ||||
| also the number of | also the number of | |||
| additional bits to | additional bits to | |||
| read in | read in | |||
| little-endian | little-endian | |||
| fashion and can | fashion and can | |||
| be translated into | be translated into | |||
| an Offset_Value | an Offset_Value | |||
| using the following | using the following | |||
| formulas: | formulas: | |||
| <figure><artwork> | </t> | |||
| Offset_Value = (1 << offsetCode) + readNBits(offsetCode); | <artwork name="" type="" align="left" alt=""><![CDATA[ | |||
| Offset_Value = (1 << offsetCode) + readNBits(offsetCode); | ||||
| if (Offset_Value > 3) Offset = Offset_Value - 3; | if (Offset_Value > 3) Offset = Offset_Value - 3; | |||
| </artwork></figure>< | ]]></artwork> | |||
| /t> | <t> This means that | |||
| maximum | ||||
| <t> This means that | Offset_Value is | |||
| maximum | (2<sup>N+1</sup>) - 1, | |||
| Offset_Value is | supporting | |||
| (2^(N+1))-1, | back-reference | |||
| supporting | distance up to | |||
| back-reference | (2<sup>N+1</sup>) - 4, but it is | |||
| distance up to | limited by the | |||
| (2^(N+1))-4, but it i | maximum | |||
| s | back-reference | |||
| limited by the | distance (see | |||
| maximum | <xref target="comp_window_descr" format="default"/>). </t> | |||
| back-reference | <t> Offset_Value from | |||
| distance (see | ||||
| <xref target="comp_wi | ||||
| ndow_descr"/>). </t> | ||||
| <t> Offset_Value from | ||||
| 1 to 3 are special: | 1 to 3 are special: | |||
| they define "repeat | they define "repeat | |||
| codes". This is | codes". This is | |||
| described in more | described in more | |||
| detail in | detail in | |||
| <xref target="repeat_ | <xref target="repeat_ | |||
| offsets"/>. </t> | offsets" format="default"/>. </t> | |||
| </section> | </section> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Decoding | <name>Decoding Sequences</name> | |||
| Sequences"> | <t> FSE bitstreams are | |||
| <t> FSE bitstreams are | ||||
| read in reverse of | read in reverse of | |||
| the direction they | the direction they | |||
| are written. In zstd, | are written. In zstd, | |||
| the compressor | the compressor | |||
| writes bits forward | writes bits forward | |||
| into a block, and | into a block, and | |||
| the decompressor | the decompressor | |||
| must read the | must read the | |||
| bitstream | bitstream | |||
| backwards. </t> | backwards. </t> | |||
| <t> To find the start | ||||
| <t> To find the start | ||||
| of the bitstream, it | of the bitstream, it | |||
| is therefore | is therefore | |||
| necessary to know | necessary to know | |||
| the offset of the | the offset of the | |||
| last byte of the | last byte of the | |||
| block, which can be | block, which can be | |||
| found by counting | found by counting | |||
| Block_Size bytes | Block_Size bytes | |||
| after the block | after the block | |||
| header. </t> | header. </t> | |||
| <t> After writing the | ||||
| <t> After writing the | ||||
| last bit containing | last bit containing | |||
| information, the | information, the | |||
| compressor writes a | compressor writes a | |||
| single 1 bit and | single 1 bit and | |||
| then fills the rest | then fills the rest | |||
| of the byte with | of the byte with | |||
| zero bits. The | zero bits. The | |||
| last byte of the | last byte of the | |||
| compressed | compressed | |||
| bitstream cannot be | bitstream cannot be | |||
| zero for that | zero for that | |||
| reason. </t> | reason. </t> | |||
| <t> When decompressing, | ||||
| <t> When decompressing, | ||||
| the last byte | the last byte | |||
| containing the | containing the | |||
| padding is the | padding is the | |||
| first byte to read. | first byte to read. | |||
| The decompressor | The decompressor | |||
| needs to skip the | needs to skip the | |||
| up to 7 bits of | up to 7 bits of | |||
| 0-padding as well | 0-padding as well | |||
| as the the first 1 | as the first 1 | |||
| bit that occurs. | bit that occurs. | |||
| Afterwards, the | Afterwards, the | |||
| useful part of the | useful part of the | |||
| bitstream | bitstream | |||
| begins. </t> | begins. </t> | |||
| <t> FSE decoding | <t> FSE decoding | |||
| requires a 'state' | requires a 'state' | |||
| to be carried from | to be carried from | |||
| symbol to symbol. | symbol to symbol. | |||
| For more | For more | |||
| explanation on FSE | explanation on FSE | |||
| decoding, see | decoding, see | |||
| <xref target="comp_fs | <xref target="comp_fse" format="default"/>. </t> | |||
| e"/>. </t> | <t> For sequence | |||
| <t> For sequence | ||||
| decoding, a | decoding, a | |||
| separate state | separate state | |||
| keeps track of | keeps track of | |||
| each literal | each literals | |||
| lengths, offsets, | length, offset, | |||
| and match lengths | and match length | |||
| symbols. Some FSE | code. Some FSE | |||
| primitives are | primitives are | |||
| also used. For | also used. For | |||
| more details on | more details on | |||
| the operation of | the operation of | |||
| these primitives, | these primitives, | |||
| see | see | |||
| <xref target="comp_fs | <xref target="comp_fs | |||
| e"/>. </t> | e" format="default"/>. </t> | |||
| <t> The bitstream | ||||
| <t> The bitstream | ||||
| starts with initial | starts with initial | |||
| FSE state values, | FSE state values, | |||
| each using the | each using the | |||
| required number of | required number of | |||
| bits in their | bits in their | |||
| respective | respective | |||
| accuracy, decoded | accuracy, decoded | |||
| previously from | previously from | |||
| their normalized | their normalized | |||
| distribution. It | distribution. It | |||
| starts with | starts with | |||
| Literals_Length_State , | Literals_Length_State , | |||
| followed by | followed by | |||
| Offset_State, and | Offset_State, and | |||
| finally | finally | |||
| Match_Length_State. < /t> | Match_Length_State. < /t> | |||
| <t> Note that all | ||||
| <t> Note that all | ||||
| values are read | values are read | |||
| backward, so the | backward, so the | |||
| 'start' of the | 'start' of the | |||
| bitstream is at the | bitstream is at the | |||
| highest position in | highest position in | |||
| memory, immediately | memory, immediately | |||
| before the last | before the last | |||
| 1 bit for | 1 bit for | |||
| padding. </t> | padding. </t> | |||
| <t> After decoding the | ||||
| <t> After decoding the | ||||
| starting states, a | starting states, a | |||
| single sequence is | single sequence is | |||
| decoded | decoded | |||
| Number_Of_Sequences | Number_Of_Sequences | |||
| times. These | times. These | |||
| sequences are | sequences are | |||
| decoded in order | decoded in order | |||
| from first to last. | from first to last. | |||
| Since the | Since the | |||
| compressor writes | compressor writes | |||
| the bitstream in | the bitstream in | |||
| the forward | the forward | |||
| direction, this | direction, this | |||
| means the | means the | |||
| compressor must | compressor must | |||
| encode the | encode the | |||
| sequences starting | sequences starting | |||
| with the last one | with the last one | |||
| and ending with the | and ending with the | |||
| first. </t> | first. </t> | |||
| <t> For each of the | ||||
| <t> For each of the | ||||
| symbol types, the | symbol types, the | |||
| FSE state can be | FSE state can be | |||
| used to determine | used to determine | |||
| the appropriate | the appropriate | |||
| code. The code then | code. The code then | |||
| defines the | defines the | |||
| Baseline and Number_o f_Bits | Baseline and Number_o f_Bits | |||
| to read for | to read for | |||
| each type. The | each type. The | |||
| description of the | description of the | |||
| codes for how to | codes for how to | |||
| determine these | determine these | |||
| values can be | values can be | |||
| found in | found in | |||
| <xref target="seq_sec | <xref target="seq_sec | |||
| _hdr"/>. </t> | _hdr" format="default"/>. </t> | |||
| <t> Decoding starts by | ||||
| <t> Decoding starts by | ||||
| reading the | reading the | |||
| Number_of_Bits | Number_of_Bits | |||
| required to decode | required to decode | |||
| offset. It | offset. It | |||
| does the same for | does the same for | |||
| Match_Length and | Match_Length and | |||
| then for | then for | |||
| Literals_Length. | Literals_Length. | |||
| This sequence is | This sequence is | |||
| then used for | then used for | |||
| Sequence Execution | Sequence Execution | |||
| (see | (see | |||
| <xref target="comp_se | <xref target="comp_se | |||
| quence_exec"/>). </t> | quence_exec" format="default"/>). </t> | |||
| <t> If it is not the | ||||
| <t> If it is not the | ||||
| last sequence in | last sequence in | |||
| the block, the next | the block, the next | |||
| operation is to | operation is to | |||
| update states. | update states. | |||
| Using the rules | Using the rules | |||
| pre-calculated in | precalculated in | |||
| the decoding | the decoding | |||
| tables, | tables, | |||
| Literals_Length_State | Literals_Length_State | |||
| is updated, | is updated, | |||
| followed by | followed by | |||
| Match_Length_State, | Match_Length_State, | |||
| and then | and then | |||
| Offset_State. | Offset_State. | |||
| See | See | |||
| <xref target="comp_fs e"/> | <xref target="comp_fs e" format="default"/> | |||
| for details on how | for details on how | |||
| to update states | to update states | |||
| from the | from the | |||
| bitstream. </t> | bitstream. </t> | |||
| <t> This operation will | ||||
| <t> This operation will | ||||
| be repeated | be repeated | |||
| Number_of_Sequences | Number_of_Sequences | |||
| times. At the end, | times. At the end, | |||
| the bitstream shall | the bitstream shall | |||
| be entirely | be entirely | |||
| consumed; otherwise, | consumed; otherwise, | |||
| the bitstream is | the bitstream is | |||
| considered | considered | |||
| corrupted. </t> | corrupted. </t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="default_dist" numbered="true" toc="default"> | ||||
| <section anchor="default_dist" | <name>Default Distributions</name> | |||
| title="Default Distribu | <t> If Predefined_Mode | |||
| tions"> | ||||
| <t> If Predefined_Mode | ||||
| is selected for a | is selected for a | |||
| symbol type, its | symbol type, its | |||
| FSE decoding table | FSE decoding table | |||
| is generated from a | is generated from a | |||
| predefined | predefined | |||
| distribution table | distribution table | |||
| defined here. For | defined here. For | |||
| details on how to | details on how to | |||
| convert this | convert this | |||
| distribution into | distribution into | |||
| a decoding table, | a decoding table, | |||
| see <xref target="com | see <xref target="com | |||
| p_fse"/>. </t> | p_fse" format="default"/>. </t> | |||
| <section numbered="true" toc="default"> | ||||
| <section title="Literals | <name>Literals Length Codes</name> | |||
| Length"> | <t> The decoding | |||
| <t> The decoding | ||||
| table uses an | table uses an | |||
| accuracy log of | accuracy log of | |||
| 6 bits (64 | 6 bits (64 | |||
| states). | states). | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| short literalsLength_defaultDistribution[36] = | short literalsLength_defaultDistribution[36] = | |||
| { 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, | { 4, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, | |||
| 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1, | 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 1, 1, | |||
| -1,-1,-1,-1 | -1,-1,-1,-1 | |||
| }; | }; | |||
| </artwork></figu | ]]></artwork> | |||
| re></t> | </section> | |||
| </section> | <section numbered="true" toc="default"> | |||
| <name>Match Length Codes</name> | ||||
| <section title="Match Len | <t> The decoding | |||
| gth"> | ||||
| <t> The decoding | ||||
| table uses an | table uses an | |||
| accuracy log of | accuracy log of | |||
| 6 bits (64 | 6 bits (64 | |||
| states). | states). | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| short matchLengths_defaultDistribution[53] = | short matchLengths_defaultDistribution[53] = | |||
| { 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | { 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | |||
| 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, | |||
| 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1, | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,-1,-1, | |||
| -1,-1,-1,-1,-1 | -1,-1,-1,-1,-1 | |||
| }; | }; | |||
| </artwork></figu | ]]></artwork> | |||
| re></t> | </section> | |||
| </section> | <section numbered="true" toc="default"> | |||
| <name>Offset Codes</name> | ||||
| <section title="Offset Co | <t> The decoding | |||
| des"> | ||||
| <t> The decoding | ||||
| table uses an | table uses an | |||
| accuracy log of | accuracy log of | |||
| 5 bits (32 | 5 bits (32 | |||
| states), and | states) and | |||
| supports a | supports a | |||
| maximum N value | maximum N value | |||
| of 28, allowing | of 28, allowing | |||
| offset values | offset values | |||
| up to | up to | |||
| 536,870,908. </t> | 536,870,908. </t> | |||
| <t> If any sequence | ||||
| <t> If any sequence | ||||
| in the | in the | |||
| compressed | compressed | |||
| block requires | block requires | |||
| a larger offset | a larger offset | |||
| than this, it's | than this, it's | |||
| not possible to | not possible to | |||
| use the default | use the default | |||
| distribution to | distribution to | |||
| represent it. | represent it. | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| short offsetCodes_defaultDistribution[29] = | short offsetCodes_defaultDistribution[29] = | |||
| { 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | { 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, | |||
| 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 | 1, 1, 1, 1, 1, 1, 1, 1,-1,-1,-1,-1,-1 | |||
| }; | }; | |||
| </artwork></figu | ]]></artwork> | |||
| re></t> | </section> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| </section> | <section anchor="comp_sequence_exec" numbered="true" toc="default"> | |||
| <name>Sequence Execution</name> | ||||
| <section anchor="comp_sequence_exec" | <t> Once literals and sequences have been decoded, | |||
| title="Sequence Execution"> | ||||
| <t> Once literals and sequences have been decoded, | ||||
| they are combined to produce the decoded content | they are combined to produce the decoded content | |||
| of a block. </t> | of a block. </t> | |||
| <t> Each sequence consists of a tuple of | ||||
| <t> Each sequence consists of a tuple of | ||||
| (literals_length, offset_value, match_length), | (literals_length, offset_value, match_length), | |||
| decoded as described in the Sequences_Section | decoded as described in the Sequences_Section | |||
| (<xref target="comp_sequences"/>). To execute a | (<xref target="comp_sequences" format="default"/>). T o execute a | |||
| sequence, first copy literals_length bytes from | sequence, first copy literals_length bytes from | |||
| the decoded literals to the output. </t> | the decoded literals to the output. </t> | |||
| <t> Then, match_length bytes are copied from previous | ||||
| <t> Then, match_length bytes are copied from previous | ||||
| decoded data. The offset to copy from is | decoded data. The offset to copy from is | |||
| determined by offset_value: | determined by offset_value: | |||
| <list style="symbols"> | </t> | |||
| <t> if Offset_Value > 3, then the offset is | <ul spacing="normal"> | |||
| Offset_Value - 3; </t> | <li> if Offset_Value > 3, then the offset is | |||
| Offset_Value - 3; </li> | ||||
| <t> if Offset_Value is from 1-3, the offset is | <li> if Offset_Value is from 1-3, the offset is | |||
| a special repeat offset value. See | a special repeat offset value. See | |||
| <xref target="repeat_offsets"/> for how | <xref target="repeat_offsets" format="default | |||
| the offset is determined in this case. </t> | "/> for how | |||
| </list> </t> | the offset is determined in this case. </li> | |||
| </ul> | ||||
| <t> The offset is defined as from the current | <t> The offset is defined as from the current | |||
| position (after copying the literals), so an | position (after copying the literals), so an | |||
| offset of 6 and a match length of | offset of 6 and a match length of | |||
| 3 means that 3 bytes should be copied from 6 bytes | 3 means that 3 bytes should be copied from 6 bytes | |||
| back. Note that all offsets leading to previously | back. Note that all offsets leading to previously | |||
| decoded data must be smaller than Window_Size | decoded data must be smaller than Window_Size | |||
| defined in Frame_Header_Descriptor | defined in Frame_Header_Descriptor | |||
| (<xref target="comp_frame_header_desc"/>). </t> | (<xref target="comp_frame_header_desc" format="defaul | |||
| </section> | t"/>). </t> | |||
| </section> | ||||
| <section title="Repeat Offsets" | <section anchor="repeat_offsets" numbered="true" toc="default"> | |||
| anchor="repeat_offsets"> | <name>Repeat Offsets</name> | |||
| <t> As seen above, the first three values | <t> As seen above, the first three values | |||
| define a repeated offset; we will call | define a repeated offset; we will call | |||
| them Repeated_Offset1, Repeated_Offset2, | them Repeated_Offset1, Repeated_Offset2, | |||
| and Repeated_Offset3. They are sorted in | and Repeated_Offset3. They are sorted in | |||
| recency order, with Repeated_Offset1 | recency order, with Repeated_Offset1 | |||
| meaning "most recent one". </t> | meaning "most recent one". </t> | |||
| <t> If offset_value is 1, then the offset used | ||||
| <t> If offset_value is 1, then the offset used | ||||
| is Repeated_Offset1, etc. </t> | is Repeated_Offset1, etc. </t> | |||
| <t> There is one exception: when the current | ||||
| <t> There is one exception: When the current | ||||
| sequence's literals_length is 0, repeated | sequence's literals_length is 0, repeated | |||
| offsets are shifted by 1, so an | offsets are shifted by 1, so an | |||
| offset_value of 1 means Repeated_Offset2, | offset_value of 1 means Repeated_Offset2, | |||
| an offset_value of 2 means Repeated_Offset3, | an offset_value of 2 means Repeated_Offset3, | |||
| and an offset_value of 3 means | and an offset_value of 3 means | |||
| Repeated_Offset1 - 1_byte. </t> | Repeated_Offset1 - 1_byte. </t> | |||
| <t> For the first block, the starting offset | ||||
| <t> For the first block, the starting offset | ||||
| history is populated with the following | history is populated with the following | |||
| values: Repeated_Offset1 (1), | values: Repeated_Offset1 (1), | |||
| Repeated_Offset2 (4), and | Repeated_Offset2 (4), and | |||
| Repeated_Offset3 (8), unless | Repeated_Offset3 (8), unless | |||
| a dictionary is used, in which case they | a dictionary is used, in which case they | |||
| come from the dictionary. </t> | come from the dictionary. </t> | |||
| <t> Then each block gets its starting offset | ||||
| <t> Then each block gets its starting offset | ||||
| history from the ending values of the most | history from the ending values of the most | |||
| recent Compressed_Block. Note that blocks | recent Compressed_Block. Note that blocks | |||
| that are not Compressed_Block are skipped; | that are not Compressed_Block are skipped; | |||
| they do not contribute to offset | they do not contribute to offset | |||
| history. </t> | history. </t> | |||
| <t> The newest offset takes the lead in offset | <t> During the execution of the sequences of a | |||
| history, shifting others back (up to its | Compressed_Block, the Repeated_Offsets' | |||
| previous place if it was already | values are kept up to date, so that they | |||
| present). This means that when | always represent the three most recently | |||
| Repeated_Offset1 (most recent) is used, | used offsets. In order to achieve that, | |||
| history is unmodified. When | they are updated after executing each | |||
| Repeated_Offset2 is used, it is swapped | sequence in the following way: </t> | |||
| with Repeated_Offset1. If any other offset | ||||
| is used, it becomes Repeated_Offset1, and | ||||
| the rest are shifted back by 1. </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="comp_skippable" | <t> When the sequence's offset_value does not | |||
| title="Skippable Frames"> | refer to one of the Repeated_Offsets -- | |||
| <t> <figure><artwork> | when it has value greater than 3, or when | |||
| +--------------+------------+-----------+ | it has value 3 and the sequence's | |||
| | Magic_Number | Frame_Size | User_Data | | literals_length is zero -- the | |||
| +--------------+------------+-----------+ | Repeated_Offsets' values are shifted back | |||
| | 4 bytes | 4 bytes | n bytes | | one, and Repeated_Offset1 takes on the | |||
| +--------------+------------+-----------+ | value of the offset that was just used. </t> | |||
| </artwork></figure> </t> | ||||
| <t> Skippable frames allow the insertion of | <t> Otherwise, when the sequence's offset_value | |||
| refers to one of the Repeated_Offsets -- | ||||
| when it has value 1 or 2, or when it has | ||||
| value 3 and the sequence's literals_length | ||||
| is non-zero -- the Repeated_Offsets are | ||||
| reordered, so that Repeated_Offset1 takes | ||||
| on the value of the used Repeated_Offset, | ||||
| and the existing values are pushed back | ||||
| from the first Repeated_Offset through to | ||||
| the Repeated_Offset selected by the | ||||
| offset_value. This effectively performs a | ||||
| single-stepped wrapping rotation of the | ||||
| values of these offsets, so that their | ||||
| order again reflects the recency of their | ||||
| use.</t> | ||||
| <t>The following table shows the values of | ||||
| the Repeated_Offsets as a series of | ||||
| sequences are applied to them:</t> | ||||
| <table anchor="repeated_offsets"> | ||||
| <name>Repeated_Offsets</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">offset_&zwsp;value</th> | ||||
| <th align="center">literals_&zwsp;length</th> | ||||
| <th align="center">Repeated_&zwsp;Offset1</th> | ||||
| <th align="center">Repeated_&zwsp;Offset2</th> | ||||
| <th align="center">Repeated_&zwsp;Offset3</th> | ||||
| <th align="left">Comment</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="right"></td> | ||||
| <td align="center"></td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">8</td> | ||||
| <td align="left">starting values</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">1114</td> | ||||
| <td align="center">11</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="left">non-repeat</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">1</td> | ||||
| <td align="center">22</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="left">repeat 1; no change</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">2225</td> | ||||
| <td align="center">22</td> | ||||
| <td align="center">2222</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="center">1</td> | ||||
| <td align="left">non-repeat</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">1114</td> | ||||
| <td align="center">111</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="center">2222</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="left">non-repeat</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">3336</td> | ||||
| <td align="center">33</td> | ||||
| <td align="center">3333</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="center">2222</td> | ||||
| <td align="left">non-repeat</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">2</td> | ||||
| <td align="center">22</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="center">3333</td> | ||||
| <td align="center">2222</td> | ||||
| <td align="left">repeat 2; swap 1 & 2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">3</td> | ||||
| <td align="center">33</td> | ||||
| <td align="center">2222</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="center">3333</td> | ||||
| <td align="left">repeat 3; rotate 3 to 1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">1</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">2221</td> | ||||
| <td align="center">2222</td> | ||||
| <td align="center">1111</td> | ||||
| <td align="left">insert resolved offset</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="right">1</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">2222</td> | ||||
| <td align="center">2221</td> | ||||
| <td align="center">3333</td> | ||||
| <td align="left">repeat 2</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="comp_skippable" numbered="true" toc="default"> | ||||
| <name>Skippable Frames</name> | ||||
| <table anchor="skippable"> | ||||
| <name>Skippable Frames</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Magic_Number</th> | ||||
| <th align="center">Frame_Size</th> | ||||
| <th align="center">User_Data</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">4 bytes</td> | ||||
| <td align="center">4 bytes</td> | ||||
| <td align="center">n bytes</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Skippable frames allow the insertion of | ||||
| user-defined metadata into a flow of concatenated | user-defined metadata into a flow of concatenated | |||
| frames. </t> | frames. </t> | |||
| <t> Skippable frames defined in this specification are | ||||
| <t> Skippable frames defined in this specification are | ||||
| compatible with skippable frames in | compatible with skippable frames in | |||
| <xref target="LZ4"/>. </t> | <xref target="LZ4" format="default"/>. </t> | |||
| <t> From a compliant decoder perspective, skippable | ||||
| <t> From a compliant decoder perspective, skippable | ||||
| frames simply need to be skipped, and their | frames simply need to be skipped, and their | |||
| content ignored, resuming decoding after the | content ignored, resuming decoding after the | |||
| skippable frame. </t> | skippable frame. </t> | |||
| <t> It should be noted that a skippable frame can be | ||||
| <t> It should be noted that a skippable frame can be | ||||
| used to watermark a stream of concatenated frames | used to watermark a stream of concatenated frames | |||
| embedding any kind of tracking information (even | embedding any kind of tracking information (even | |||
| just a Universally Unique Identifier (UUID)). Users w ary of such possibility | just a Universally Unique Identifier (UUID)). Users w ary of such possibility | |||
| should scan the stream of concatenated frames in | should scan the stream of concatenated frames in | |||
| an attempt to detect such frames for analysis or | an attempt to detect such frames for analysis or | |||
| removal. </t> | removal. </t> | |||
| <t> The fields are: | ||||
| </t> | ||||
| <dl newline="false" spacing="normal"> | ||||
| <dt>Magic_Number:</dt> | ||||
| <t> The fields are: | <dd>4 bytes, little-endian format. Value: | |||
| <list style="hanging"> | ||||
| <t hangText="Magic_Number:"> | ||||
| 4 bytes, little-endian format. Value: | ||||
| 0x184D2A5?, which means any value from | 0x184D2A5?, which means any value from | |||
| 0x184D2A50 to 0x184D2A5F. All 16 | 0x184D2A50 to 0x184D2A5F. All 16 | |||
| values are valid to identify a | values are valid to identify a | |||
| skippable frame. This specification | skippable frame. This specification | |||
| does not detail any specific tagging | does not detail any specific tagging | |||
| methods for skippable frames. | methods for skippable frames. | |||
| </t> | </dd> | |||
| <dt>Frame_Size:</dt> | ||||
| <t hangText="Frame_Size:"> | <dd> | |||
| This is the size, in bytes, of the | This is the size, in bytes, of the | |||
| following User_Data (without including | following User_Data (without including | |||
| the magic number nor the size field | the magic number nor the size field | |||
| itself). This field is represented | itself). This field is represented | |||
| using 4 bytes, little-endian format, | using 4 bytes, little-endian format, | |||
| unsigned 32 bits. This means User_Data | unsigned 32 bits. This means User_Data | |||
| can't be bigger than (2^32-1) bytes. | can't be bigger than | |||
| </t> | (2<sup>32</sup> -1) bytes. | |||
| </dd> | ||||
| <t hangText="User_Data:"> | <dt>User_Data:</dt> | |||
| <dd> | ||||
| This field can be anything. Data will | This field can be anything. Data will | |||
| just be skipped by the decoder. | just be skipped by the decoder. | |||
| </t> | </dd> | |||
| </list> </t> | </dl> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="comp_entropy" numbered="true" toc="default"> | ||||
| <section anchor="comp_entropy" title="Entropy Encoding"> | <name>Entropy Encoding</name> | |||
| <t> Two types of entropy encoding are used by the | <t> Two types of entropy encoding are used by the | |||
| Zstandard format: FSE and Huffman coding. | Zstandard format: FSE and Huffman coding. | |||
| Huffman is used to compress literals, while FSE | Huffman is used to compress literals, while FSE | |||
| is used for all other symbols | is used for all other symbols | |||
| (Literals_Length_Code, Match_Length_Code, and offset | (Literals_Length_Code, Match_Length_Code, and offset | |||
| codes) and to compress Huffman headers.</t> | codes) and to compress Huffman headers.</t> | |||
| <section anchor="comp_fse" numbered="true" toc="default"> | ||||
| <section anchor="comp_fse" title="FSE"> | <name>FSE</name> | |||
| <t> FSE, short for Finite State Entropy, is | <t> FSE, short for Finite State Entropy, is | |||
| an entropy codec based on | an entropy codec based on | |||
| <xref target="ANS"/>. | <xref target="ANS" format="default"/>. | |||
| FSE encoding/decoding involves | FSE encoding/decoding involves | |||
| a state that is carried over between | a state that is carried over between | |||
| symbols, so decoding must be done in the | symbols, so decoding must be done in the | |||
| opposite direction as encoding. Therefore, | opposite direction as encoding. Therefore, | |||
| all FSE bitstreams are read from end to | all FSE bitstreams are read from end to | |||
| beginning. Note that the order of the | beginning. Note that the order of the | |||
| bits in the stream is not reversed; | bits in the stream is not reversed; | |||
| they are simply read in the reverse | they are simply read in the reverse | |||
| order from which they were written. </t> | order from which they were written. </t> | |||
| <t> For additional details on FSE, see | ||||
| "FiniteStateEntropy" <xref target="FSE" format="default"/>. </t> | ||||
| <t> For additional details on FSE, see | <t> FSE decoding involves a decoding table | |||
| Finite State Entropy | that has a power-of-2 size and contains | |||
| <xref target="FSE"/>. </t> | ||||
| <t> FSE decoding involves a decoding table | ||||
| that has a power of 2 size and contains | ||||
| three elements: Symbol, Num_Bits, and | three elements: Symbol, Num_Bits, and | |||
| Baseline. The base 2 logarithm | Baseline. The base 2 logarithm | |||
| of the table size is its Accuracy_Log. | of the table size is its Accuracy_Log. | |||
| An FSE state value represents an index in | An FSE state value represents an index in | |||
| this table. </t> | this table. </t> | |||
| <t> To obtain the initial state value, | ||||
| <t> To obtain the initial state value, | ||||
| consume Accuracy_Log bits from the stream | consume Accuracy_Log bits from the stream | |||
| as a little-endian value. The next symbol | as a little-endian value. The next symbol | |||
| in the stream is the Symbol indicated in | in the stream is the Symbol indicated in | |||
| the table for that state. To obtain the | the table for that state. To obtain the | |||
| next state value, the decoder should | next state value, the decoder should | |||
| consume Num_Bits bits from the stream as a | consume Num_Bits bits from the stream as a | |||
| little-endian value and add it to | little-endian value and add it to | |||
| Baseline. </t> | Baseline. </t> | |||
| <section anchor="comp_fse_table" numbered="true" toc="default"> | ||||
| <section anchor="comp_fse_table" | <name>FSE Table Description</name> | |||
| title="FSE Table Description"> | <t> To decode FSE streams, it is necessary | |||
| <t> To decode FSE streams, it is necessary | ||||
| to construct the decoding table. The | to construct the decoding table. The | |||
| Zstandard format encodes FSE table | Zstandard format encodes FSE table | |||
| descriptions as described here. </t> | descriptions as described here. </t> | |||
| <t> An FSE distribution table describes the | ||||
| <t> An FSE distribution table describes the | ||||
| probabilities of all symbols from 0 to | probabilities of all symbols from 0 to | |||
| the last present one (included) on a | the last present one (included) on a | |||
| normalized scale of | normalized scale of | |||
| (1 << Accuracy_Log). | (1 << Accuracy_Log). | |||
| Note that there must be two or | ||||
| more symbols with non-zero probability. | ||||
| </t> | ||||
| <t> A bitstream is read forward, in | Note that there must be two or | |||
| more symbols with nonzero probability. | ||||
| </t> | ||||
| <t> A bitstream is read forward, in | ||||
| little-endian fashion. It is not | little-endian fashion. It is not | |||
| necessary to know its exact size, | necessary to know its exact size, | |||
| since the size will be discovered and | since the size will be discovered and | |||
| reported by the decoding process. The | reported by the decoding process. The | |||
| bitstream starts by reporting on which | bitstream starts by reporting on which | |||
| scale it operates. If low4bits | scale it operates. If low4bits | |||
| designates the lowest 4 bits of | designates the lowest 4 bits of | |||
| the first byte, then | the first byte, then | |||
| Accuracy_Log = low4bits + 5. </t> | Accuracy_Log = low4bits + 5. </t> | |||
| <t> This is followed by each symbol value, | ||||
| <t> This is followed by each symbol value, | ||||
| from 0 to the last present one. The | from 0 to the last present one. The | |||
| number of bits used by each field is | number of bits used by each field is | |||
| variable and depends on: | variable and depends on: | |||
| <list style="hanging"> | </t> | |||
| <t hangText="Remaining probabilities | <dl newline="false" spacing="normal"> | |||
| + 1:"> | <dt>Remaining probabilities + 1:</dt> | |||
| <dd> | ||||
| For example, presuming an | For example, presuming an | |||
| Accuracy_Log of 8, and | Accuracy_Log of 8, and | |||
| presuming 100 probabilities | presuming 100 probabilities | |||
| points have already been | points have already been | |||
| distributed, the decoder may | distributed, the decoder may | |||
| read any value from 0 to | read any value from 0 to | |||
| (256 - 100 + 1) == 157, | (256 - 100 + 1) == 157, | |||
| inclusive. Therefore, it must | inclusive. Therefore, it must | |||
| read log2sup(157) == 8 | read log<sub>2</sub>sup(157) == 8 | |||
| bits. </t> | bits. </dd> | |||
| <dt>Value decoded:</dt> | ||||
| <t hangText="Value decoded:"> | <dd> | |||
| <t> | ||||
| Small values use 1 fewer bit. | Small values use 1 fewer bit. | |||
| For example, presuming values | For example, presuming values | |||
| from 0 to 157 (inclusive) are | from 0 to 157, inclusive, are | |||
| possible, 255 - 157 = 98 values | possible, 255 - 157 = 98 values | |||
| are remaining in an 8-bit | are remaining in an 8-bit | |||
| field. The first 98 values | field. The first 98 values | |||
| (hence from 0 to 97) use only | (hence, from 0 to 97) use only | |||
| 7 bits, and values from 98 to | 7 bits, and values from 98 to | |||
| 157 use 8 bits. This is | 157 use 8 bits. This is | |||
| achieved through this scheme: | achieved through the scheme in | |||
| <figure><artwork> | <xref target="value" />: | |||
| +------------+---------------+-----------+ | </t> | |||
| | Value Read | Value Decoded | Bits Used | | ||||
| +------------+---------------+-----------+ | ||||
| | 0 - 97 | 0 - 97 | 7 | | ||||
| +------------+---------------+-----------+ | ||||
| | 98 - 127 | 98 - 127 | 8 | | ||||
| +------------+---------------+-----------+ | ||||
| | 128 - 225 | 0 - 97 | 7 | | ||||
| +------------+---------------+-----------+ | ||||
| | 226 - 255 | 128 - 157 | 8 | | ||||
| +------------+---------------+-----------+ | ||||
| </artwork></figure> </t> | ||||
| </list></t> | ||||
| <t> Symbol probabilities are read | <table anchor="value"> | |||
| <name>Values Decoded</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Value Read</th> | ||||
| <th align="center">Value Decoded</th> | ||||
| <th align="center">Bits Used</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0 - 97</td> | ||||
| <td align="center">0 - 97</td> | ||||
| <td align="center">7</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">98 - 127</td> | ||||
| <td align="center">98 - 127</td> | ||||
| <td align="center">8</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">128 - 225</td> | ||||
| <td align="center">0 - 97</td> | ||||
| <td align="center">7</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">226 - 255</td> | ||||
| <td align="center">128 - 157</td> | ||||
| <td align="center">8</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </dd> | ||||
| </dl> | ||||
| <t> Symbol probabilities are read | ||||
| one by one, in order. The | one by one, in order. The | |||
| probability is obtained from | probability is obtained from | |||
| Value decoded using the | Value Decoded using the | |||
| formula P = Value - 1. This | formula P = Value - 1. This | |||
| means the value 0 becomes the | means the value 0 becomes the | |||
| negative probability -1. This | negative probability -1. This | |||
| is a special probability that | is a special probability that | |||
| means "less than 1". Its | means "less than 1". Its | |||
| effect on the distribution | effect on the distribution | |||
| table is described below. For | table is described below. For | |||
| the purpose of calculating | the purpose of calculating | |||
| total allocated probability | total allocated probability | |||
| points, it counts as 1. </t> | points, it counts as 1. </t> | |||
| <t> When a symbol has a | ||||
| <t> When a symbol has a | ||||
| probability of zero, it is | probability of zero, it is | |||
| followed by a 2-bit repeat | followed by a 2-bit repeat | |||
| flag. This repeat flag tells | flag. This repeat flag tells | |||
| how many probabilities of | how many probabilities of | |||
| zeroes follow the current one. | zeroes follow the current one. | |||
| It provides a number ranging | It provides a number ranging | |||
| from 0 to 3. If it is a 3, | from 0 to 3. If it is a 3, | |||
| another 2-bit repeat flag | another 2-bit repeat flag | |||
| follows, and so on. </t> | follows, and so on. </t> | |||
| <t> When the last symbol reaches | ||||
| <t> When the last symbol reaches | ||||
| a cumulated total of | a cumulated total of | |||
| (1 << Accuracy_Lo g), | (1 << Accuracy_Lo g), | |||
| decoding is complete. If the | decoding is complete. If the | |||
| last symbol makes the cumulated | last symbol makes the cumulated | |||
| total go above | total go above | |||
| (1 << Accuracy_Log), | (1 << Accuracy_Log), | |||
| distribution is considered | distribution is considered | |||
| corrupted. </t> | corrupted. </t> | |||
| <t> Finally, the decoder can tell | ||||
| <t> Finally, the decoder can tell | ||||
| how many bytes were used in | how many bytes were used in | |||
| this process and how many | this process and how many | |||
| symbols are present. The | symbols are present. The | |||
| bitstream consumes a round | bitstream consumes a round | |||
| number of bytes. Any remaining | number of bytes. Any remaining | |||
| bit within the last byte is | bit within the last byte is | |||
| simply unused. </t> | simply unused. </t> | |||
| <t> The context in which the table | ||||
| <t> The context in which the table | ||||
| is to be used specifies an expected | is to be used specifies an expected | |||
| number of symbols. That expected | number of symbols. That expected | |||
| number of symbols never exceeds 256. | number of symbols never exceeds 256. | |||
| If the number of symbols decoded | If the number of symbols decoded | |||
| is not equal to the expected, the | is not equal to the expected, the | |||
| header should be considered | header should be considered | |||
| corrupt. </t> | corrupt. </t> | |||
| <t> The distribution of normalized | ||||
| <t> The distribution of normalized | ||||
| probabilities is enough to | probabilities is enough to | |||
| create a unique decoding | create a unique decoding | |||
| table. The table has a size | table. The table has a size | |||
| of (1 << Accuracy_Log). | of (1 << Accuracy_Log). | |||
| Each cell describes the symbol | Each cell describes the symbol | |||
| decoded and instructions to | decoded and instructions to | |||
| get the next state. </t> | get the next state. </t> | |||
| <t> Symbols are scanned in their | ||||
| <t> Symbols are scanned in their | ||||
| natural order for "less than 1" | natural order for "less than 1" | |||
| probabilities as described | probabilities as described | |||
| above. Symbols with this | above. Symbols with this | |||
| probability are being | probability are being | |||
| attributed a single cell, | attributed a single cell, | |||
| starting from the end of the | starting from the end of the | |||
| table and retreating. These | table and retreating. These | |||
| symbols define a | symbols define a | |||
| full state reset, reading | full state reset, reading | |||
| Accuracy_Log bits. </t> | Accuracy_Log bits. </t> | |||
| <t> All remaining symbols are | ||||
| <t> All remaining symbols are | ||||
| allocated in their natural | allocated in their natural | |||
| order. Starting from symbol 0 | order. Starting from symbol 0 | |||
| and table position 0, each | and table position 0, each | |||
| symbol gets allocated as many | symbol gets allocated as many | |||
| cells as its probability. Cell | cells as its probability. Cell | |||
| allocation is spread, not | allocation is spread, not | |||
| linear; each successor | linear; each successor | |||
| position follows this rule: | position follows this rule: | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| position += (tableSize >> 1) + (tableSize >> 3) + 3; | position += (tableSize >> 1) + (tableSize >> 3) + 3; | |||
| position &= tableSize - 1; | position &= tableSize - 1; | |||
| </artwork></figure></t> | ]]></artwork> | |||
| <t> A position is skipped if it is | ||||
| <t> A position is skipped if it is | ||||
| already occupied by a "less | already occupied by a "less | |||
| than 1" probability symbol. | than 1" probability symbol. | |||
| Position does not reset between | Position does not reset between | |||
| symbols; it simply iterates | symbols; it simply iterates | |||
| through each position in the | through each position in the | |||
| table, switching to the next | table, switching to the next | |||
| symbol when enough states have | symbol when enough states have | |||
| been allocated to the current | been allocated to the current | |||
| one. </t> | one. </t> | |||
| <t> The result is a list of state | ||||
| <t> The result is a list of state | ||||
| values. Each state will decode | values. Each state will decode | |||
| the current symbol. </t> | the current symbol. </t> | |||
| <t> To get the Number_of_Bits and | ||||
| <t> To get the Number_of_Bits and | ||||
| Baseline required for the next | Baseline required for the next | |||
| state, it is first necessary | state, it is first necessary | |||
| to sort all states in their | to sort all states in their | |||
| natural order. The lower | natural order. The lower | |||
| states will need 1 more bit | states will need 1 more bit | |||
| than higher ones. The process | than higher ones. The process | |||
| is repeated for each symbol. | is repeated for each symbol. | |||
| </t> | </t> | |||
| <t> For example, presuming a symbol | ||||
| <t> For example, presuming a symbol | ||||
| has a probability of 5, it | has a probability of 5, it | |||
| receives five state values. | receives five state values. | |||
| States are sorted in natural | States are sorted in natural | |||
| order. The next power of | order. The next power of | |||
| 2 is 8. The space of | 2 is 8. The space of | |||
| probabilities is divided into | probabilities is divided into | |||
| 8 equal parts. Presuming the | 8 equal parts. Presuming the | |||
| Accuracy_Log is 7, this | Accuracy_Log is 7, this | |||
| defines 128 states, and each | defines 128 states, and each | |||
| share (divided by 8) is 16 | share (divided by 8) is 16 | |||
| in size. In order to reach | in size. In order to reach | |||
| 8, 8 - 5 = 3 lowest states will | 8, 8 - 5 = 3 lowest states will | |||
| count "double", doubling the | count "double", doubling the | |||
| number of shares (32 in width), | number of shares (32 in width), | |||
| requiring 1 more bit in the | requiring 1 more bit in the | |||
| process. </t> | process. </t> | |||
| <t> Baseline is assigned starting | ||||
| <t> Baseline is assigned starting | ||||
| from the higher states using | from the higher states using | |||
| fewer bits, and proceeding | fewer bits, and proceeding | |||
| naturally, then resuming at | naturally, then resuming at | |||
| the first state, each taking | the first state, each taking | |||
| its allocated width from | its allocated width from | |||
| Baseline. </t> | Baseline. </t> | |||
| <t> <figure><artwork> | <table anchor="state"> | |||
| +----------------+-------+-------+--------+------+-------+ | <name>Baseline Assignments</name> | |||
| | state order | 0 | 1 | 2 | 3 | 4 | | <tbody> | |||
| +----------------+-------+-------+--------+------+-------+ | <tr> | |||
| | width | 32 | 32 | 32 | 16 | 16 | | <td align="center">state order</td> | |||
| +----------------+-------+-------+--------+------+-------+ | <td align="center">0</td> | |||
| | Number_of_Bits | 5 | 5 | 5 | 4 | 4 | | <td align="center">1</td> | |||
| +----------------+-------+-------+--------+------+-------+ | <td align="center">2</td> | |||
| | range number | 2 | 4 | 6 | 0 | 1 | | <td align="center">3</td> | |||
| +----------------+-------+-------+--------+------+-------+ | <td align="center">4</td> | |||
| | Baseline | 32 | 64 | 96 | 0 | 16 | | </tr> | |||
| +----------------+-------+-------+--------+------+-------+ | <tr> | |||
| | range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 | | <td align="center">width</td> | |||
| +----------------+-------+-------+--------+------+-------+ | <td align="center">32</td> | |||
| </artwork></figure> </t> | <td align="center">32</td> | |||
| <td align="center">32</td> | ||||
| <td align="center">16</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">Number_of_Bits</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">range number</td> | ||||
| <td align="center">2</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">Baseline</td> | ||||
| <td align="center">32</td> | ||||
| <td align="center">64</td> | ||||
| <td align="center">96</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">range</td> | ||||
| <td align="center">32-63</td> | ||||
| <td align="center">64-95</td> | ||||
| <td align="center">96-127</td> | ||||
| <td align="center">0-15</td> | ||||
| <td align="center">16-31</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> The next state is determined | <t> The next state is determined | |||
| from the current state by | from the current state by | |||
| reading the required | reading the required | |||
| Number_of_Bits and adding the | Number_of_Bits and adding the | |||
| specified Baseline. </t> | specified Baseline. </t> | |||
| <t> See <xref target="app_tables" format="default"/> | ||||
| <t> See <xref target="app_tables"/> | ||||
| for the results of this | for the results of this | |||
| process that are applied to the | process that are applied to the | |||
| default distributions. </t> | default distributions. </t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="comp_huffman" numbered="true" toc="default"> | ||||
| <section anchor="comp_huffman" title="Huffman Coding"> | <name>Huffman Coding</name> | |||
| <t> Zstandard Huffman-coded streams are read | <t> Zstandard Huffman-coded streams are read | |||
| backwards, similar to the FSE bitstreams. | backwards, similar to the FSE bitstreams. | |||
| Therefore, to find the start of the | Therefore, to find the start of the | |||
| bitstream, it is necessary to know the | bitstream, it is necessary to know the | |||
| offset of the last byte of the | offset of the last byte of the | |||
| Huffman-coded stream. </t> | Huffman-coded stream. </t> | |||
| <t> After writing the last bit containing | ||||
| <t> After writing the last bit containing | ||||
| information, the compressor writes a | information, the compressor writes a | |||
| single 1 bit and then fills the rest of | single 1 bit and then fills the rest of | |||
| the byte with 0 bits. The last byte of | the byte with 0 bits. The last byte of | |||
| the compressed bitstream cannot be 0 for | the compressed bitstream cannot be 0 for | |||
| that reason. </t> | that reason. </t> | |||
| <t> When decompressing, the last byte | ||||
| <t> When decompressing, the last byte | ||||
| containing the padding is the first byte | containing the padding is the first byte | |||
| to read. The decompressor needs to skip | to read. The decompressor needs to skip | |||
| the up to 7 bits of 0-padding as well as | the up to 7 bits of 0-padding as well as | |||
| the first 1 bit that occurs. Afterwards, | the first 1 bit that occurs. Afterwards, | |||
| the useful part of the bitstream | the useful part of the bitstream | |||
| begins. </t> | begins. </t> | |||
| <t> The bitstream contains Huffman-coded | ||||
| <t> The bitstream contains Huffman-coded | ||||
| symbols in little-endian order, with the | symbols in little-endian order, with the | |||
| codes defined by the method below. </t> | codes defined by the method below. </t> | |||
| <section anchor="huffman_tree_desc" numbered="true" toc="default"> | ||||
| <section anchor="huffman_tree_desc" | <name>Huffman Tree Description</name> | |||
| title="Huffman Tree Description"> | <t> Prefix coding represents symbols | |||
| <t> Prefix coding represents symbols | ||||
| from an a priori known alphabet by | from an a priori known alphabet by | |||
| bit sequences (codewords), one | bit sequences (codewords), one | |||
| codeword for each symbol, in a | codeword for each symbol, in a | |||
| manner such that different symbols | manner such that different symbols | |||
| may be represented by bit sequences | may be represented by bit sequences | |||
| of different lengths, but a parser | of different lengths, but a parser | |||
| can always parse an encoded string | can always parse an encoded string | |||
| unambiguously | unambiguously, | |||
| symbol by symbol. </t> | symbol by symbol. </t> | |||
| <t> Given an alphabet with known symbol | ||||
| <t> Given an alphabet with known symbol | ||||
| frequencies, the Huffman algorithm | frequencies, the Huffman algorithm | |||
| allows the construction of an | allows the construction of an | |||
| optimal prefix code using the | optimal prefix code using the | |||
| fewest bits of any possible prefix | fewest bits of any possible prefix | |||
| codes for that alphabet. </t> | codes for that alphabet. </t> | |||
| <t> The prefix code must not exceed a | ||||
| <t> The prefix code must not exceed a | ||||
| maximum code length. More bits | maximum code length. More bits | |||
| improve accuracy but yield a larger | improve accuracy but yield a larger | |||
| header size and require more | header size and require more | |||
| memory or more complex decoding | memory or more complex decoding | |||
| operations. This specification | operations. This specification | |||
| limits the maximum code length to | limits the maximum code length to | |||
| 11 bits. </t> | 11 bits. </t> | |||
| <t> All literal values from zero | ||||
| <t> All literal values from zero | ||||
| (included) to the last present one | (included) to the last present one | |||
| (excluded) are represented by | (excluded) are represented by | |||
| Weight with values from 0 to | Weight with values from 0 to | |||
| Max_Number_of_Bits. Transformation | Max_Number_of_Bits. Transformation | |||
| from Weight to Number_of_Bits | from Weight to Number_of_Bits | |||
| follows this pseudocode: | follows this pseudocode: | |||
| <figure><artwork> | </t> | |||
| <sourcecode type="pseudocode"> | ||||
| if Weight == 0 | if Weight == 0 | |||
| Number_of_Bits = 0 | Number_of_Bits = 0 | |||
| else | else | |||
| Number_of_Bits = Max_Number_of_Bits + 1 - Weight | Number_of_Bits = Max_Number_of_Bits + 1 - Weight | |||
| </artwork></figure> </t> | </sourcecode> | |||
| <t> The last symbol's Weight is | ||||
| <t> The last symbol's Weight is | ||||
| deduced from previously decoded | deduced from previously decoded | |||
| ones, by completing to the nearest | ones, by completing to the nearest | |||
| power of 2. This power of 2 gives | power of 2. This power of 2 gives | |||
| Max_Number_of_Bits the depth of | Max_Number_of_Bits the depth of | |||
| the current tree. </t> | the current tree. </t> | |||
| <t> For example, presume the following | ||||
| <t> For example, presume the following | ||||
| Huffman tree must be described: | Huffman tree must be described: | |||
| <figure><artwork> | </t> | |||
| +---------------+----------------+ | ||||
| | Literal Value | Number_of_Bits | | ||||
| +---------------+----------------+ | ||||
| | 0 | 1 | | ||||
| +---------------+----------------+ | ||||
| | 1 | 2 | | ||||
| +---------------+----------------+ | ||||
| | 2 | 3 | | ||||
| +---------------+----------------+ | ||||
| | 3 | 0 | | ||||
| +---------------+----------------+ | ||||
| | 4 | 4 | | ||||
| +---------------+----------------+ | ||||
| | 5 | 4 | | ||||
| +---------------+----------------+ | ||||
| </artwork></figure> </t> | ||||
| <t> The tree depth is 4, since its | <table anchor="Huffman"> | |||
| <name>Huffman Tree</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Literal Value</th> | ||||
| <th align="center">Number_of_Bits</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">5</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> The tree depth is 4, since its | ||||
| longest element uses 4 bits. | longest element uses 4 bits. | |||
| (The longest elements are those | (The longest elements are those | |||
| with the smallest frequencies.) | with the smallest frequencies.) | |||
| Value 5 will not be listed as it | Value 5 will not be listed as it | |||
| can be determined from the values | can be determined from the values | |||
| for 0-4, nor will values above 5 | for 0-4, nor will values above 5 | |||
| as they are all 0. Values from 0 | as they are all 0. Values from 0 | |||
| to 4 will be listed using Weight | to 4 will be listed using Weight | |||
| instead of Number_of_Bits. The | instead of Number_of_Bits. The | |||
| pseudocode to determine Weight is: | pseudocode to determine Weight is: | |||
| <figure><artwork> | </t> | |||
| <sourcecode type="pseudocode"> | ||||
| if Number_of_Bits == 0 | if Number_of_Bits == 0 | |||
| Weight = 0 | Weight = 0 | |||
| else | else | |||
| Weight = Max_Number_of_Bits + 1 - Number_of_Bits | Weight = Max_Number_of_Bits + 1 - Number_of_Bits | |||
| </artwork></figure> </t> | </sourcecode> | |||
| <t> It gives the following series of | ||||
| <t> It gives the following series of | ||||
| weights: | weights: | |||
| <figure><artwork> | </t> | |||
| +---------------+--------+ | ||||
| | Literal Value | Weight | | ||||
| +---------------+--------+ | ||||
| | 0 | 4 | | ||||
| +---------------+--------+ | ||||
| | 1 | 3 | | ||||
| +---------------+--------+ | ||||
| | 2 | 2 | | ||||
| +---------------+--------+ | ||||
| | 3 | 0 | | ||||
| +---------------+--------+ | ||||
| | 4 | 1 | | ||||
| +---------------+--------+ | ||||
| </artwork></figure> </t> | ||||
| <t> The decoder will do the inverse | <table anchor="weights"> | |||
| <name>Weights</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Literal Value</th> | ||||
| <th align="center">Weight</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> The decoder will do the inverse | ||||
| operation: having collected weights | operation: having collected weights | |||
| of literals from 0 to 4, it knows | of literals from 0 to 4, it knows | |||
| the last literal, 5, is present | the last literal, 5, is present | |||
| with a non-zero Weight. The Weight | with a nonzero Weight. The Weight | |||
| of 5 can be determined by advancing | of 5 can be determined by advancing | |||
| to the next power of 2. The sum of | to the next power of 2. The sum of | |||
| 2^(Weight-1) (excluding 0's) is 15. | 2<sup>(Weight-1)</sup> (excluding 0's ) is 15. | |||
| The nearest power of 2 is 16. | The nearest power of 2 is 16. | |||
| Therefore, Max_Number_of_Bits = 4 | Therefore, Max_Number_of_Bits = 4 | |||
| and Weight[5] = 16 - 15 = 1. </t> | and Weight[5] = 16 - 15 = 1. </t> | |||
| <section anchor="huffman_tree_header" numbered="true" toc="default"> | ||||
| <section anchor="huffman_tree_header" | <name>Huffman Tree Header</name> | |||
| title="Huffman Tree Header"> | <t> This is a single byte value | |||
| <t> This is a single byte value | ||||
| (0-255), which describes how | (0-255), which describes how | |||
| the series of weights is | the series of weights is | |||
| encoded. | encoded. | |||
| <list style="hanging"> | </t> | |||
| <t hangText="headerByte < | <dl newline="false" spacing="normal"> | |||
| 128:"> | <dt>headerByte < 128:</dt> | |||
| <dd> | ||||
| The series of weights | The series of weights | |||
| is compressed using | is compressed using | |||
| FSE (see below). The | FSE (see below). The | |||
| length of the | length of the | |||
| FSE-compressed series | FSE-compressed series | |||
| is equal to headerByte | is equal to headerByte | |||
| (0-127). </t> | (0-127). </dd> | |||
| <dt>headerByte >= 128:</dt> | ||||
| <t hangText="headerByte >= 12 | <dd> | |||
| 8:"> | <t> | |||
| This is a direct | This is a direct | |||
| representation, where | representation, where | |||
| each Weight is written | each Weight is written | |||
| directly as a 4-bit | directly as a 4-bit | |||
| field (0-15). They are | field (0-15). They are | |||
| encoded forward, 2 | encoded forward, 2 | |||
| weights to a byte with | weights to a byte with | |||
| the first weight taking | the first weight taking | |||
| the top 4 bits and | the top 4 bits and | |||
| the second taking the | the second taking the | |||
| bottom 4; for example, th e | bottom 4; for example, th e | |||
| following operations | following operations | |||
| could be used to read | could be used to read | |||
| the weights: | the weights: | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| Weight[0] = (Byte[0] >> 4) | Weight[0] = (Byte[0] >> 4) | |||
| Weight[1] = (Byte[0] & 0xf), | Weight[1] = (Byte[0] & 0xf), | |||
| etc. | etc. | |||
| </artwork></figure> | ]]></artwork> | |||
| <t> | ||||
| The full representation | The full representation | |||
| occupies | occupies | |||
| ceiling(Number_of_Symbols /2) | ceiling(Number_of_Symbols /2) | |||
| bytes, meaning it uses | bytes, meaning it uses | |||
| only full bytes even | only full bytes even | |||
| if Number_of_Symbols is | if Number_of_Symbols is | |||
| odd. Number_of_Symbols | odd. Number_of_Symbols | |||
| = headerByte - 127. | = headerByte - 127. | |||
| Note that maximum | Note that maximum | |||
| Number_of_Symbols is | Number_of_Symbols is | |||
| 255 - 127 = 128. If any | 255 - 127 = 128. If any | |||
| literal | literal | |||
| has a value over 128, | has a value over 128, | |||
| raw header mode is not | raw header mode is not | |||
| possible, and it is | possible, and it is | |||
| necessary to use FSE | necessary to use FSE | |||
| compression. </t> | compression. </t> | |||
| </list></t> | </dd> | |||
| </section> | </dl> | |||
| </section> | ||||
| <section anchor="huffman_tree_fse" | <section anchor="huffman_tree_fse" numbered="true" toc="default"> | |||
| title="FSE Compression of Huffman | <name>FSE Compression of Huffman Weights</name> | |||
| Weights"> | <t> In this case, the series of | |||
| <t> In this case, the series of | ||||
| Huffman weights is compressed | Huffman weights is compressed | |||
| using FSE compression. It is a | using FSE compression. It is a | |||
| single bitstream with two | single bitstream with two | |||
| interleaved states, sharing a | interleaved states, sharing a | |||
| single distribution table. </t> | single distribution table. </t> | |||
| <t> To decode an FSE bitstream, it | ||||
| <t> To decode an FSE bitstream, it | ||||
| is necessary to know its | is necessary to know its | |||
| compressed size. Compressed | compressed size. Compressed | |||
| size is provided by headerByte. | size is provided by headerByte. | |||
| It's also necessary to know its | It's also necessary to know its | |||
| maximum possible decompressed | maximum possible decompressed | |||
| size, which is 255, since | size, which is 255, since | |||
| literal values span from 0 to | literal values span from 0 to | |||
| 255, and the last symbol's | 255, and the last symbol's | |||
| Weight is not represented. </t> | Weight is not represented. </t> | |||
| <t> An FSE bitstream starts by | ||||
| <t> An FSE bitstream starts by | ||||
| a header, describing | a header, describing | |||
| probabilities distribution. It | probabilities distribution. It | |||
| will create a decoding table. | will create a decoding table. | |||
| For a list of Huffman weights, | For a list of Huffman weights, | |||
| the maximum accuracy log is 6 | the maximum accuracy log is 6 | |||
| bits. For more details, see | bits. For more details, see | |||
| <xref target="comp_fse_table"/>. | <xref target="comp_fse_table" for | |||
| </t> | mat="default"/>. | |||
| </t> | ||||
| <t> The Huffman header compression | <t> The Huffman header compression | |||
| uses two states, which share | uses two states, which share | |||
| the same FSE distribution | the same FSE distribution | |||
| table. | table. | |||
| The first state (State1) | The first state (State1) | |||
| encodes the even-numbered index | encodes the even-numbered index | |||
| symbols, and the second | symbols, and the second | |||
| (State2) encodes the odd-numbered | (State2) encodes the odd-numbered | |||
| index symbols. State1 is initiali zed | index symbols. State1 is initiali zed | |||
| first, and then State2, and | first, and then State2, and | |||
| they take turns decoding a | they take turns decoding a | |||
| single symbol and updating | single symbol and updating | |||
| their state. | their state. | |||
| For more details | For more details | |||
| on these FSE operations, see | on these FSE operations, see | |||
| <xref target="comp_fse"/>. </t> | <xref target="comp_fse" format="d | |||
| efault"/>. </t> | ||||
| <t> The number of symbols to be | <t> The number of symbols to be | |||
| decoded is determined by | decoded is determined by | |||
| tracking the bitStream overflow | tracking the bitStream overflow | |||
| condition: If updating state | condition: if updating state | |||
| after decoding a symbol would | after decoding a symbol would | |||
| require more bits than remain | require more bits than remain | |||
| in the stream, it is assumed | in the stream, it is assumed | |||
| that extra bits are zero. Then, | that extra bits are zero. Then, | |||
| symbols for each of the | symbols for each of the | |||
| final states are decoded and | final states are decoded and | |||
| the process is complete.</t> | the process is complete.</t> | |||
| </section> | </section> | |||
| <section anchor="huffman_tree_conv" numbered="true" toc="default"> | ||||
| <section anchor="huffman_tree_conv" | <name>Conversion from Weights to Huffman Prefix Codes</name> | |||
| title="Conversion from Weights to | <t> All present symbols will now | |||
| Huffman Prefix Codes"> | ||||
| <t> All present symbols will now | ||||
| have a Weight value. It is | have a Weight value. It is | |||
| possible to transform weights | possible to transform weights | |||
| into Number_of_Bits, using | into Number_of_Bits, using | |||
| this formula: | this formula: | |||
| <figure><artwork> | </t> | |||
| <sourcecode type="pseudocode"> | ||||
| if Weight > 0 | if Weight > 0 | |||
| Number_of_Bits = Max_Number_of_Bits + 1 - Weight | Number_of_Bits = Max_Number_of_Bits + 1 - Weight | |||
| else | else | |||
| Number_of_Bits = 0 | Number_of_Bits = 0 | |||
| </artwork></figure></t> | </sourcecode> | |||
| <t> Symbols are sorted by Weight. | ||||
| <t> Symbols are sorted by Weight. | ||||
| Within the same Weight, symbols | Within the same Weight, symbols | |||
| keep natural sequential | keep natural sequential | |||
| order. Symbols | order. Symbols | |||
| with a Weight of zero are | with a Weight of zero are | |||
| removed. Then, starting from | removed. Then, starting from | |||
| the | the | |||
| lowest Weight, prefix codes | lowest Weight, prefix codes | |||
| are distributed in sequential | are distributed in sequential | |||
| order. </t> | order. </t> | |||
| <t> For example, assume the | ||||
| <t> For example, assume the | ||||
| following list of weights | following list of weights | |||
| has been decoded: | has been decoded: | |||
| <figure><artwork> | </t> | |||
| +---------+--------+ | ||||
| | Literal | Weight | | ||||
| +---------+--------+ | ||||
| | 0 | 4 | | ||||
| +---------+--------+ | ||||
| | 1 | 3 | | ||||
| +---------+--------+ | ||||
| | 2 | 2 | | ||||
| +---------+--------+ | ||||
| | 3 | 0 | | ||||
| +---------+--------+ | ||||
| | 4 | 1 | | ||||
| +---------+--------+ | ||||
| | 5 | 1 | | ||||
| +---------+--------+ | ||||
| </artwork></figure> </t> | ||||
| <t> Sorting by weight and then | <table anchor="decoded-weights"> | |||
| <name>Decoded Weights</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Literal</th> | ||||
| <th align="center">Weight</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">4</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">3</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">2</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">5</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> Sorting by weight and then | ||||
| the natural sequential order | the natural sequential order | |||
| yields the following | yields the following | |||
| distribution: | distribution: | |||
| <figure><artwork> | </t> | |||
| +---------+--------+----------------+--------------+ | ||||
| | Literal | Weight | Number_Of_Bits | Prefix Codes | | ||||
| +---------+--------+----------------|--------------+ | ||||
| | 3 | 0 | 0 | N/A | | ||||
| +---------+--------+----------------|--------------+ | ||||
| | 4 | 1 | 4 | 0000 | | ||||
| +---------+--------+----------------|--------------+ | ||||
| | 5 | 1 | 4 | 0001 | | ||||
| +---------+--------+----------------|--------------+ | ||||
| | 2 | 2 | 3 | 001 | | ||||
| +---------+--------+----------------|--------------+ | ||||
| | 1 | 3 | 2 | 01 | | ||||
| +---------+--------+----------------|--------------+ | ||||
| | 0 | 4 | 1 | 1 | | ||||
| +---------+--------+----------------|--------------+ | ||||
| </artwork></figure> </t> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="huffman_coded_streams" | <table anchor="sorting-by-weight"> | |||
| title="Huffman-Coded Streams"> | <name>Sorting by Weight</name> | |||
| <t> Given a Huffman decoding table, it is | <thead> | |||
| <tr> | ||||
| <th align="center">Literal</th> | ||||
| <th align="center">Weight</th> | ||||
| <th align="center">Number_Of_Bits</th> | ||||
| <th align="center">Prefix Codes</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">N/A</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0000</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">5</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0001</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">2</td> | ||||
| <td align="center">3</td> | ||||
| <td align="center">001</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">3</td> | ||||
| <td align="center">2</td> | ||||
| <td align="center">01</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| </section> | ||||
| </section> | ||||
| <section anchor="huffman_coded_streams" numbered="true" toc="default"> | ||||
| <name>Huffman-Coded Streams</name> | ||||
| <t> Given a Huffman decoding table, it is | ||||
| possible to decode a Huffman-coded | possible to decode a Huffman-coded | |||
| stream. </t> | stream. </t> | |||
| <t> Each bitstream must be read backward, starting from the end and go | ||||
| <t> Each bitstream must be read backward, | ing up to | |||
| which starts from the end and goes up to | ||||
| the beginning. Therefore, it is | the beginning. Therefore, it is | |||
| necessary to know the size of each | necessary to know the size of each | |||
| bitstream. </t> | bitstream. </t> | |||
| <t> It is also necessary to know exactly | ||||
| <t> It is also necessary to know exactly | ||||
| which bit is the last. This is | which bit is the last. This is | |||
| detected by a final bit flag: the | detected by a final bit flag: the | |||
| highest bit of the last byte is a | highest bit of the last byte is a | |||
| final-bit-flag. Consequently, a last | final-bit-flag. Consequently, a last | |||
| byte of 0 is not possible. And the | byte of 0 is not possible. And the | |||
| final-bit-flag itself is not part of | final-bit-flag itself is not part of | |||
| the useful bitstream. Hence, the last | the useful bitstream. Hence, the last | |||
| byte contains between 0 and 7 useful | byte contains between 0 and 7 useful | |||
| bits. </t> | bits. </t> | |||
| <t> Starting from the end, it is possible | ||||
| <t> Starting from the end, it is possible | ||||
| to read the bitstream in a | to read the bitstream in a | |||
| little-endian fashion, keeping track | little-endian fashion, keeping track | |||
| of already used bits. Since the | of already used bits. Since the | |||
| bitstream is encoded in reverse order, | bitstream is encoded in reverse order, | |||
| starting from the end, read symbols in | starting from the end, read symbols in | |||
| forward order. </t> | forward order. </t> | |||
| <t> For example, if the literal sequence | ||||
| <t> For example, if the literal sequence | ||||
| "0145" was encoded using the above prefix | "0145" was encoded using the above prefix | |||
| code, it would be encoded (in reverse | code, it would be encoded (in reverse | |||
| order) as: | order) as: | |||
| <figure><artwork> | </t> | |||
| +---------+----------+ | ||||
| | Symbol | Encoding | | ||||
| +---------+----------+ | ||||
| | 5 | 0000 | | ||||
| +---------+----------+ | ||||
| | 4 | 0001 | | ||||
| +---------+----------+ | ||||
| | 1 | 01 | | ||||
| +---------+----------+ | ||||
| | 0 | 1 | | ||||
| +---------+----------+ | ||||
| | Padding | 00001 | | ||||
| +---------+----------+ | ||||
| </artwork></figure></t> | ||||
| <t> This results in the following 2-byte | <table anchor="coded-example"> | |||
| <name>Literal Sequence "0145"</name> | ||||
| <thead> | ||||
| <tr> | ||||
| <th align="center">Symbol</th> | ||||
| <th align="center">Encoding</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0000</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0001</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">01</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">1</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">Padding</td> | ||||
| <td align="center">00001</td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t> This results in the following 2-byte | ||||
| bitstream: | bitstream: | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| 00010000 00001101 | 00010000 00001101 | |||
| </artwork></figure></t> | ]]></artwork> | |||
| <t> Here is an alternative representation | ||||
| <t> Here is an alternative representation | ||||
| with the symbol codes separated by | with the symbol codes separated by | |||
| underscores: | underscores: | |||
| <figure><artwork> | </t> | |||
| <artwork name="" type="" align="left" alt=""><![CDATA[ | ||||
| 0001_0000 00001_1_01 | 0001_0000 00001_1_01 | |||
| </artwork></figure></t> | ]]></artwork> | |||
| <t> Reading the highest Max_Number_of_Bits | ||||
| <t> Reading the highest Max_Number_of_Bits | ||||
| bits, it's possible to compare the | bits, it's possible to compare the | |||
| extracted value to the decoding table, | extracted value to the decoding table, | |||
| determining the symbol to decode and | determining the symbol to decode and | |||
| number of bits to discard. </t> | number of bits to discard. </t> | |||
| <t> The process continues reading up to | ||||
| <t> The process continues reading up to | ||||
| the required number of symbols per | the required number of symbols per | |||
| stream. If a bitstream is not entirely | stream. If a bitstream is not entirely | |||
| and exactly consumed, hence reaching | and exactly consumed, hence reaching | |||
| exactly its beginning position with | exactly its beginning position with | |||
| all bits consumed, the decoding process | all bits consumed, the decoding process | |||
| is considered faulty. </t> | is considered faulty. </t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="comp_dict" numbered="true" toc="default"> | ||||
| <section anchor="comp_dict" title="Dictionary Format"> | <name>Dictionary Format</name> | |||
| <t> Zstandard is compatible with "raw content" | <t> Zstandard is compatible with "raw content" | |||
| dictionaries, free of any format restriction, | dictionaries, free of any format restriction, | |||
| except that they must be at least 8 bytes. | except that they must be at least 8 bytes. | |||
| These dictionaries function as if they were just | These dictionaries function as if they were just | |||
| the content part of a formatted dictionary. </t> | the content part of a formatted dictionary. </t> | |||
| <t> However, dictionaries created by "zstd --train" | ||||
| <t> However, dictionaries created by "zstd --train" | ||||
| in the reference implementation follow a specific | in the reference implementation follow a specific | |||
| format, described here. </t> | format, described here. </t> | |||
| <t> Dictionaries are not included in the compressed | ||||
| <t> Dictionaries are not included in the compressed | ||||
| content but rather are provided out of band. | content but rather are provided out of band. | |||
| That is, the Dictionary_ID identifies which should | That is, the Dictionary_ID identifies which should | |||
| be used, but this specification does not describe | be used, but this specification does not describe | |||
| the mechanism by which the dictionary is obtained | the mechanism by which the dictionary is obtained | |||
| prior to use during compression or | prior to use during compression or | |||
| decompression. </t> | decompression. </t> | |||
| <t> A dictionary has a size, defined either by a | ||||
| <t> A dictionary has a size, defined either by a | ||||
| buffer limit or a file size. The general format | buffer limit or a file size. The general format | |||
| is: | is: | |||
| <figure><artwork> | </t> | |||
| +--------------+---------------+----------------+---------+ | ||||
| | Magic_Number | Dictionary_ID | Entropy_Tables | Content | | ||||
| +--------------+---------------+----------------+---------+ | ||||
| </artwork></figure> </t> | ||||
| <t> <list style="hanging"> | <table anchor="dictionary"> | |||
| <t hangText="Magic_Number:"> 4 bytes ID, | <name>Dictionary General Format</name> | |||
| value 0xEC30A437, little-endian | <thead> | |||
| format. </t> | <tr> | |||
| <th>Magic_Number</th> | ||||
| <th>Dictionary_ID</th> | ||||
| <th>Entropy_Tables</th> | ||||
| <th>Content</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td></td> | ||||
| <td></td> | ||||
| <td></td> | ||||
| <td></td> | ||||
| </tr> | ||||
| </tbody> | ||||
| </table> | ||||
| <t hangText="Dictionary_ID:"> 4 bytes, stored | <dl newline="false" spacing="normal"> | |||
| <dt>Magic_Number:</dt> | ||||
| <dd> 4 bytes ID, | ||||
| value 0xEC30A437, little-endian | ||||
| format. </dd> | ||||
| <dt>Dictionary_ID:</dt> | ||||
| <dd> | ||||
| <t> 4 bytes, stored | ||||
| in little-endian format. Dictionary_ID | in little-endian format. Dictionary_ID | |||
| can be any value, except 0 (which | can be any value, except 0 (which | |||
| means no Dictionary_ID). It is used by | means no Dictionary_ID). It is used by | |||
| decoders to check if they use the | decoders to check if they use the | |||
| correct dictionary. If the frame is | correct dictionary. If the frame is | |||
| going to be distributed in a private | going to be distributed in a private | |||
| environment, any Dictionary_ID can be | environment, any Dictionary_ID can be | |||
| used. However, for public distribution | used. However, for public distribution | |||
| of compressed frames, the following | of compressed frames, the following | |||
| ranges are reserved and shall not be | ranges are reserved and shall not be | |||
| used: | used: | |||
| <list style="hanging"> | ||||
| <?rfc subcompact="yes"?> | ||||
| <t>low range: <= 32767</t> | ||||
| <t>high range: >= (2^31)</t> | ||||
| <?rfc subcompact="no"?> | ||||
| </list> | ||||
| </t> | </t> | |||
| <dl newline="false" spacing="normal"> | ||||
| <t hangText="Entropy_Tables:"> Follow the | <dt/> | |||
| <dd>low range: <= 32767</dd> | ||||
| <dt/> | ||||
| <dd>high range: >= (2<sup>31</sup>)</dd> | ||||
| </dl> | ||||
| </dd> | ||||
| <dt>Entropy_Tables:</dt> | ||||
| <dd> Follow the | ||||
| same format as the tables in | same format as the tables in | |||
| compressed blocks. See the relevant | compressed blocks. See the relevant | |||
| FSE and Huffman sections for how to | FSE and Huffman sections for how to | |||
| decode these tables. They are stored | decode these tables. They are stored | |||
| in the following order: Huffman table for | in the following order: Huffman table for | |||
| literals, FSE table for offsets, FSE | literals, FSE table for offsets, FSE | |||
| table for match lengths, and FSE table | table for match lengths, and FSE table | |||
| for literals lengths. These tables | for literals lengths. These tables | |||
| populate the Repeat Stats literals | populate the Repeat Stats literals | |||
| mode and Repeat distribution mode for | mode and Repeat distribution mode for | |||
| sequence decoding. It is finally | sequence decoding. It is finally | |||
| followed by 3 offset values, | followed by 3 offset values, | |||
| populating repeat offsets (instead of | populating repeat offsets (instead of | |||
| using {1,4,8}), stored in order, | using {1,4,8}), stored in order, | |||
| 4-bytes little-endian each, for a | 4 bytes little-endian each, for a | |||
| total of 12 bytes. Each repeat offset | total of 12 bytes. Each repeat offset | |||
| must have a value less than the | must have a value less than the | |||
| dictionary size. </t> | dictionary size. </dd> | |||
| <dt>Content:</dt> | ||||
| <t hangText="Content:"> The rest of the | <dd> The rest of the | |||
| dictionary is its content. The content | dictionary is its content. The content | |||
| acts as a "past" in front of data to be | acts as a "past" in front of data to be | |||
| compressed or decompressed, so it can be | compressed or decompressed, so it can be | |||
| referenced in sequence commands. As | referenced in sequence commands. As | |||
| long as the amount of data decoded | long as the amount of data decoded | |||
| from this frame is less than or equal | from this frame is less than or equal | |||
| to Window_Size, sequence commands may | to Window_Size, sequence commands may | |||
| specify offsets longer than the total | specify offsets longer than the total | |||
| length of decoded output so far to | length of decoded output so far to | |||
| reference back to the dictionary, | reference back to the dictionary, | |||
| even parts of the dictionary with | even parts of the dictionary with | |||
| offsets larger than Window_Size. | offsets larger than Window_Size. | |||
| After the total output has surpassed | After the total output has surpassed | |||
| Window_Size, however, this is no longer | Window_Size, however, this is no longer | |||
| allowed, and the dictionary is no | allowed, and the dictionary is no | |||
| longer accessible. </t> | longer accessible. </dd> | |||
| </list> </t> | </dl> | |||
| </section> | </section> | |||
| <section anchor="dict_future" numbered="true" toc="default"> | ||||
| <section anchor="dict_future" title="Use of Dictionaries"> | <name>Use of Dictionaries</name> | |||
| <t> Provisioning for use of dictionaries with zstd is being | <t> Provisioning for use of dictionaries with zstd is being | |||
| explored. See, for example, <xref target="DICT-SEC"/>. | explored. See, for example, <xref target="I-D.handte-httpbis | |||
| -dict-sec" format="default"/>. | ||||
| The likely outcome will be a registry of well-tested | The likely outcome will be a registry of well-tested | |||
| dictionaries optimized for different use cases and | dictionaries optimized for different use cases and | |||
| identifiers for each, possibly with a private negotiation | identifiers for each, possibly with a private negotiation | |||
| mechanism for use of unregistered dictionaries. </t> | mechanism for use of unregistered dictionaries. </t> | |||
| <t> To ensure compatibility with the | ||||
| <t> To ensure compatibility with the | ||||
| future specification of use of dictionaries with zstd | future specification of use of dictionaries with zstd | |||
| payloads, especially with MIME, content encoded with the | payloads, especially with MIME, content encoded with the | |||
| media type registered here should not use a dictionary. | media type registered here should not use a dictionary. | |||
| The exception to this requirement might be a private | The exception to this requirement might be a private | |||
| dictionary negotiaton, suggested above, which is not part | dictionary negotiation, suggested above, which is not part | |||
| of this specification. </t> | of this specification. </t> | |||
| </section> | </section> | |||
| <section anchor="iana" numbered="true" toc="default"> | ||||
| <section anchor="iana" title="IANA Considerations"> | <name>IANA Considerations</name> | |||
| <t> IANA has updated two previously existing registrations and | <t> IANA has updated two previously existing registrations and | |||
| made one new registration as described below. </t> | made one new registration as described below. </t> | |||
| <section anchor="iana_media_type" numbered="true" toc="default"> | ||||
| <section anchor="iana_media_type" | <name>The 'application/zstd' Media Type</name> | |||
| title="The 'application/zstd' Media Type"> | <t> The 'application/zstd' media type identifies a | |||
| <t> The 'application/zstd' media type identifies a | ||||
| block of data that is compressed using zstd | block of data that is compressed using zstd | |||
| compression. The data is a stream of bytes as | compression. The data is a stream of bytes as | |||
| described in this document. IANA has | described in this document. IANA has | |||
| added the following to the "Media Types" | added the following to the "Media Types" | |||
| registry: | registry: | |||
| <list style="hanging"> | </t> | |||
| <t hangText="Type name:"> application </t> | <dl> | |||
| <dt>Type name:</dt> | ||||
| <t hangText="Subtype name:"> zstd </t> | <dd>application</dd> | |||
| <dt>Subtype name:</dt> | ||||
| <t hangText="Required parameters:"> N/A </t> | <dd>zstd</dd> | |||
| <dt>Required parameters:</dt> | ||||
| <t hangText="Optional parameters:"> N/A </t> | <dd>N/A</dd> | |||
| <dt>Optional parameters:</dt> | ||||
| <t hangText="Encoding considerations:"> | <dd>N/A</dd> | |||
| <dt>Encoding considerations:</dt> | ||||
| <dd> | ||||
| binary | binary | |||
| </t> | </dd> | |||
| <dt>Security considerations:</dt> | ||||
| <t hangText="Security considerations:"> | <dd> | |||
| See <xref target="security"/> of | See <xref target="security" format="default"/> of | |||
| [this document] | RFC 8878. | |||
| </t> | </dd> | |||
| <dt>Interoperability considerations:</dt> | ||||
| <t hangText="Interoperability considerations:"> | <dd> | |||
| N/A | N/A | |||
| </t> | </dd> | |||
| <dt>Published specification:</dt> | ||||
| <t hangText="Published specification:"> | <dd> | |||
| [this document] | RFC 8878 | |||
| </t> | </dd> | |||
| <dt>Applications which use this media type:</dt> | ||||
| <t hangText="Applications that use this media typ | <dd> | |||
| e:"> | ||||
| anywhere data size is an issue | anywhere data size is an issue | |||
| </t> | </dd> | |||
| <dt>Fragment identifier considerations:</dt> | ||||
| <t hangText="Fragment identifier considerations:" | <dd> | |||
| > | ||||
| No fragment identifiers are defined | No fragment identifiers are defined | |||
| for this type. | for this type. | |||
| </t> | </dd> | |||
| <dt>Additional information:</dt> | ||||
| <dd> | ||||
| <t><br/></t> | ||||
| <dl spacing="compact"> | ||||
| <dt>Deprecated alias names for this type:</dt> | ||||
| <dd> | ||||
| N/A | ||||
| </dd> | ||||
| <dt>Magic number(s):</dt> | ||||
| <dd> | ||||
| 4 bytes, little-endian format. Value: 0xFD2FB528 | ||||
| </dd> | ||||
| <t hangText="Additional information:"> | <dt>File extension(s):</dt> | |||
| <list style="hanging"> | <dd> | |||
| <t hangText="Magic number(s):"> | zst | |||
| 4 bytes, little-endian fo | </dd> | |||
| rmat. Value: 0xFD2FB528 | <dt>Macintosh file type code(s):</dt> | |||
| </t> | <dd> | |||
| <!--note: changed 'zstd' to 'zst' per author request--> | N/A | |||
| <t hangText="File extension(s):"> | </dd> | |||
| zst | </dl> | |||
| </t> | </dd> | |||
| <t hangText="Macintosh file type | ||||
| code(s):"> | ||||
| N/A | ||||
| </t> | ||||
| </list> | ||||
| </t> | ||||
| <t hangText="For further information:"> | <dt>Person & email address to contact for further | |||
| See <xref target="ZSTD"/> | information:</dt><dd>Yann Collet <cyan@fb.com></dd> | |||
| </t> | ||||
| <t hangText="Intended usage:"> | <dt>Intended usage:</dt> | |||
| <dd> | ||||
| common | common | |||
| </t> | </dd> | |||
| <dt>Restrictions on usage:</dt> | ||||
| <t hangText="Restrictions on usage:"> | <dd> | |||
| N/A | N/A | |||
| </t> | </dd> | |||
| <dt>Author:</dt> | ||||
| <t hangText="Author:"> | <dd> | |||
| Murray S. Kucherawy | Murray S. Kucherawy | |||
| </t> | </dd> | |||
| <dt>Change Controller:</dt> | ||||
| <t hangText="Change Controller:"> | <dd> | |||
| IETF | IETF | |||
| </t> | </dd> | |||
| <!--note: changed 'yes' to 'no' per author request--> | ||||
| <t hangText="Provisional registration:"> | ||||
| no | ||||
| </t> | ||||
| </list> </t> | ||||
| </section> | ||||
| <section anchor="iana_content_encoding" | <dt>Provisional registration:</dt> | |||
| title="Content Encoding"> | <dd> | |||
| <t> IANA has added the following entry | no | |||
| </dd> | ||||
| <dt>For further information:</dt> | ||||
| <dd> | ||||
| See <xref target="ZSTD" format="default"/ | ||||
| > | ||||
| </dd> | ||||
| </dl> | ||||
| </section> | ||||
| <section anchor="iana_content_encoding" numbered="true" toc="default"> | ||||
| <name>Content Encoding</name> | ||||
| <t> IANA has added the following entry | ||||
| to the "HTTP Content Coding Registry" | to the "HTTP Content Coding Registry" | |||
| within the "Hypertext Transfer Protocol (HTTP) | within the "Hypertext Transfer Protocol (HTTP) | |||
| Parameters" registry: | Parameters" registry: | |||
| <list style="hanging"> | </t> | |||
| <t hangText="Name:"> zstd </t> | <dl newline="false" spacing="normal"> | |||
| <dt>Name:</dt> | ||||
| <t hangText="Description:"> A stream of bytes | <dd> zstd </dd> | |||
| <dt>Description:</dt> | ||||
| <dd> A stream of bytes | ||||
| compressed using the Zstandard | compressed using the Zstandard | |||
| protocol </t> | protocol </dd> | |||
| <dt>Reference:</dt> | ||||
| <t hangText="Pointer to specification text:"> | <dd> | |||
| [this document]</t> | RFC 8878</dd> | |||
| </list> </t> | </dl> | |||
| </section> | </section> | |||
| <section anchor="iana_suffix" numbered="true" toc="default"> | ||||
| <section anchor="iana_suffix" | <name>Structured Syntax Suffix</name> | |||
| title="Structured Syntax Suffix"> | <t> IANA has registered the following | |||
| <t> IANA is requested to register the following | into the "Structured Syntax Suffix" | |||
| into the Structured Syntax Suffix registry: | registry: | |||
| <list style="hanging"> | ||||
| <t hangText="Name:"> Zstandard </t> | ||||
| <t hangText="+suffix:"> +zstd </t> | ||||
| <t hangText="Encoding Considerations:"> | ||||
| binary </t> | ||||
| <t hangText="Interoperability Considerations:"> | ||||
| N/A </t> | ||||
| <t hangText="Fragment Identifier Considerations:" | </t> | |||
| > | <dl newline="false" spacing="normal"> | |||
| <dt>Name:</dt> | ||||
| <dd> Zstandard </dd> | ||||
| <dt>+suffix:</dt> | ||||
| <dd> +zstd </dd> | ||||
| <dt>Encoding Considerations:</dt> | ||||
| <dd> | ||||
| binary </dd> | ||||
| <dt>Interoperability Considerations:</dt> | ||||
| <dd> | ||||
| N/A </dd> | ||||
| <dt>Fragment Identifier Considerations:</dt> | ||||
| <dd> | ||||
| The syntax and semantics of fragment | The syntax and semantics of fragment | |||
| identifiers specified for +zstd should | identifiers specified for +zstd should | |||
| be as specified for "application/zstd". | be as specified for 'application/zstd'. | |||
| </t> | </dd> | |||
| <dt>Security Considerations:</dt> | ||||
| <t hangText="Security Considerations:"> | <dd> | |||
| See <xref target="security"/> of | See <xref target="security" format="default"/> of | |||
| [this document]. </t> | RFC 8878. </dd> | |||
| <dt>Contact:</dt> | ||||
| <t hangText="Contact:"> | <dd> | |||
| Refer to the author for the | Refer to the author for the | |||
| application/zstd media type. </t> | 'application/zstd' media type. </dd> | |||
| <dt>Author/Change Controller:</dt> | ||||
| <t hangText="Author/Change Controller:"> | <dd> | |||
| IETF </t> | IETF </dd> | |||
| </list> </t> | </dl> | |||
| </section> | </section> | |||
| <section anchor="iana_dict" numbered="true" toc="default"> | ||||
| <section anchor="iana_dict" title="Dictionaries"> | <name>Dictionaries</name> | |||
| <t> Work in progress includes | <t> Work in progress includes | |||
| development of dictionaries that will optimize | development of dictionaries that will optimize | |||
| compression and decompression of particular | compression and decompression of particular | |||
| types of data. Specification of such | types of data. Specification of such | |||
| dictionaries for public use will necessitate | dictionaries for public use will necessitate | |||
| registration of a code point from the reserved | registration of a code point from the reserved | |||
| range described in | range described in | |||
| <xref target="comp_dictionary_id"/> and its | <xref target="comp_dictionary_id" format="default"/> and its | |||
| association with a specific dictionary. </t> | association with a specific dictionary. </t> | |||
| <t> At present, there are no such dictionaries | ||||
| <t> However, there are at present no such dictionaries | published for public use, so this document | |||
| published for public use, so this document makes | has made | |||
| no immediate request of IANA to create such a | no immediate request of IANA to create such a | |||
| registry. </t> | registry. </t> | |||
| </section> | </section> | |||
| </section> | </section> | |||
| <section anchor="security" numbered="true" toc="default"> | ||||
| <name>Security Considerations</name> | ||||
| <section anchor="security" title="Security Considerations"> | <t> Any data-compression method involves the reduction of | |||
| <t> Any data compression method involves the reduction of | ||||
| redundancy in the data. Zstandard is no exception, | redundancy in the data. Zstandard is no exception, | |||
| and the usual precautions apply. </t> | and the usual precautions apply. </t> | |||
| <t> One should never compress a message whose | ||||
| <t> One should never compress a message whose | ||||
| content must remain secret with a message generated by | content must remain secret with a message generated by | |||
| a third party. Such a compression can be used to guess the | a third party. Such a compression can be used to guess the | |||
| content of the secret message through analysis of | content of the secret message through analysis of | |||
| entropy reduction. | entropy reduction. | |||
| This was demonstrated in the Compression Ratio | This was demonstrated in the Compression Ratio | |||
| Info-leak Made Easy (CRIME) attack <xref target="CRIME"/>, for example. </t> | Info-leak Made Easy (CRIME) attack <xref target="CRIME" format="default"/>, f | |||
| or example. </t> | ||||
| <t> A decoder has to demonstrate capabilities to detect | <t> A decoder has to demonstrate capabilities to detect | |||
| and prevent any kind of data tampering in the compressed | and prevent any kind of data tampering in the compressed | |||
| frame from triggering system faults, such as reading or | frame from triggering system faults, such as reading or | |||
| writing beyond allowed memory ranges. This can be | writing beyond allowed memory ranges. This can be | |||
| guaranteed by either the implementation language | guaranteed by either the implementation language | |||
| or careful bound checkings. Of particular note is the | or careful bound checkings. Of particular note is the | |||
| encoding of Number_of_Sequences values that cause the | encoding of Number_of_Sequences values that cause the | |||
| decoder to read into the block header (and beyond), as | decoder to read into the block header (and beyond), as | |||
| well as the indication of a Frame_Content_Size that is | well as the indication of a Frame_Content_Size that is | |||
| smaller than the actual decompressed data, in an attempt | smaller than the actual decompressed data, in an attempt | |||
| to trigger a buffer overflow. It is highly recommended | to trigger a buffer overflow. It is highly recommended | |||
| to fuzz-test (i.e., provide invalid, unexpected, or | to fuzz-test (i.e., provide invalid, unexpected, or | |||
| random input and verify safe operation of) decoder | random input and verify safe operation of) decoder | |||
| implementations to test and harden their capability to | implementations to test and harden their capability to | |||
| detect bad frames and deal with them without any adverse | detect bad frames and deal with them without any adverse | |||
| system side effect. </t> | system side effect. </t> | |||
| <t> An attacker may provide correctly formed compressed frames | ||||
| <t> An attacker may provide correctly formed compressed frames | ||||
| with unreasonable memory requirements. A decoder must | with unreasonable memory requirements. A decoder must | |||
| always control memory requirements and enforce some | always control memory requirements and enforce some | |||
| (system-specific) limits in order to protect memory usage | (system-specific) limits in order to protect memory usage | |||
| from such scenarios. </t> | from such scenarios. </t> | |||
| <t> Compression can be optimized by training a dictionary | ||||
| <t> Compression can be optimized by training a dictionary | ||||
| on a variety of related content payloads. This dictionary | on a variety of related content payloads. This dictionary | |||
| must then be available at the decoder for decompression | must then be available at the decoder for decompression | |||
| of the payload to be possible. While this document does | of the payload to be possible. While this document does | |||
| not specify how to acquire a dictionary for a given | not specify how to acquire a dictionary for a given | |||
| compressed payload, it is worth noting that third-party | compressed payload, it is worth noting that third-party | |||
| dictionaries may interact unexpectedly with a decoder, | dictionaries may interact unexpectedly with a decoder, | |||
| leading to possible memory or other resource exhaustion | leading to possible memory or other resource-exhaustion | |||
| attacks. We expect such topics to be discussed in further | attacks. We expect such topics to be discussed in further | |||
| detail in the Security Considerations section of a | detail in the Security Considerations section of a | |||
| forthcoming RFC for dictionary acquisition and | forthcoming RFC for dictionary acquisition and | |||
| transmission, but highlight this issue now out of an | transmission, but highlight this issue now out of an | |||
| abundance of caution. </t> | abundance of caution. </t> | |||
| <t> As discussed in <xref target="comp_skippable" format="default"/>, it i | ||||
| <t> As discussed in <xref target="comp_skippable"/>, it is | s | |||
| possible to store arbitrary user metadata in skippable | possible to store arbitrary user metadata in skippable | |||
| frames. While such frames are ignored during decompression | frames. While such frames are ignored during decompression | |||
| of the data, they can be used as a watermark to track | of the data, they can be used as a watermark to track | |||
| the path of the compressed payload. </t> | the path of the compressed payload. </t> | |||
| </section> | </section> | |||
| <section anchor="impl" title="Implementation Status"> | ||||
| <t> Source code for a C language implementation of a | ||||
| Zstandard-compliant library is available at | ||||
| <xref target="ZSTD-GITHUB"/>. This implementation is | ||||
| considered to be the reference implementation and is | ||||
| production ready; it implements the full range of the | ||||
| specification. It is routinely tested against security | ||||
| hazards and widely deployed within Facebook | ||||
| infrastructure. </t> | ||||
| <t> The reference version is optimized for speed and is highly | ||||
| portable. It has been proven to run safely on multiple | ||||
| architectures (e.g., x86, x64, ARM, MIPS, PowerPC, IA64) | ||||
| featuring 32- or 64-bit addressing schemes, a little- or | ||||
| big-endian storage scheme, a number of different operating | ||||
| systems (e.g., UNIX (including Linux, BSD, OS-X, and | ||||
| Solaris) and Windows), and a number of compilers (e.g., | ||||
| gcc, clang, visual, and icc). </t> | ||||
| <t> A comprehensive and current list of known implementations | ||||
| can be found at <xref target="ZSTD"/>. </t> | ||||
| </section> | ||||
| </middle> | ||||
| <back> | ||||
| <references title="Normative References"> | ||||
| <reference anchor="ZSTD" target="http://www.zstd.net"> | ||||
| <front> | ||||
| <title> Zstandard | ||||
| </title> | ||||
| <author fullname="Yann Collet"> | ||||
| </author> | ||||
| <date year="2017"/> | ||||
| </front> | ||||
| </reference> | ||||
| </references> | </middle> | |||
| <back> | ||||
| <references title="Informative References"> | <displayreference target="I-D.handte-httpbis-dict-sec" to="DICT-SEC"/> | |||
| <reference anchor="ANS" | ||||
| target="https://arxiv.org/pdf/1311.2540"> | ||||
| <front> | ||||
| <title> Asymmetric numeral systems: entropy | ||||
| coding combining speed of Huffman | ||||
| coding with compression rate of | ||||
| arithmetic coding | ||||
| </title> | ||||
| <author initials='J' surname='Duda' fullname='Ja | ||||
| rek Duda'/> | ||||
| <date month="January" year="2014"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="DICT-SEC"> | <references> | |||
| <front> | <name>References</name> | |||
| <title> Security Considerations Regarding | <references> | |||
| Compression Dictionaries </title> | <name>Normative References</name> | |||
| <author initials='W' surname='Handte' fullname=' | ||||
| W. Handte'/> | ||||
| <date month="October" year="2019"/> | ||||
| </front> | ||||
| <seriesInfo name="(work in progress)" | ||||
| value="draft-handte-httpbis-dict-sec"/> | ||||
| </reference> | ||||
| <reference anchor="LZ4" | <reference anchor="ZSTD" target="http://www.zstd.net"> | |||
| target="https://github.com/lz4/lz4/blob/master/doc/lz4_Fr | <front> | |||
| ame_format.md"> | <title> Zstandard | |||
| <front> | </title> | |||
| <title>LZ4 Frame Format Description</title> | <author /> | |||
| <author fullname="Yann Collet"/> | </front> | |||
| <date month="January" year="2018"/> | </reference> | |||
| </front> | </references> | |||
| </reference> | ||||
| <reference anchor="FSE" | <references> | |||
| target="https://github.com/Cyan4973/FiniteStateEntropy/"> | <name>Informative References</name> | |||
| <front> | ||||
| <title> FiniteStateEntropy | ||||
| </title> | ||||
| <author fullname="Yann Collet"/> | ||||
| <date month="June" year="2018"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="CRIME" | <reference anchor="ANS" target="https://arxiv.org/pdf/1311.2540"> | |||
| target="https://en.wikipedia.org/w/index.php?title=CRIME& | <front> | |||
| amp;oldid=844538656"> | <title> Asymmetric numeral systems: entropy | |||
| <front> | coding combining speed of Huffman | |||
| <title>CRIME | coding with compression rate of | |||
| </title> | arithmetic coding | |||
| <author/> | </title> | |||
| <date month="June" year="2018"/> | <author initials="J" surname="Duda" fullname="Jarek Duda"/> | |||
| </front> | <date month="January" year="2014"/> | |||
| </reference> | </front> | |||
| </reference> | ||||
| <reference anchor="ZSTD-GITHUB" | <reference anchor='I-D.handte-httpbis-dict-sec'> | |||
| target="https://github.com/facebook/zstd"> | <front> | |||
| <front> | <title>Security Considerations Regarding Compression Dictionaries</title> | |||
| <title>zstd | <author initials='F' surname='Handte' fullname='Felix Handte'> | |||
| </title> | <organization /> | |||
| </author> | ||||
| <date month='October' day='29' year='2019' /> | ||||
| </front> | ||||
| <seriesInfo name='Internet-Draft' value='draft-handte-httpbis-dict-sec-00' /> | ||||
| <format type='TXT' | ||||
| target='http://www.ietf.org/internet-drafts/draft-handte-httpbis-dict-se | ||||
| c-00.txt' /> | ||||
| </reference> | ||||
| <author fullname="Yann Collet"/> | <reference anchor="LZ4" target="https://github.com/lz4/lz4/blob/master/d | |||
| oc/lz4_Frame_format.md"> | ||||
| <front> | ||||
| <title>LZ4 Frame Format Description</title> | ||||
| <author /> | ||||
| <date month="January" year="2019"/> | ||||
| </front> | ||||
| <refcontent>commit ec735ac</refcontent> | ||||
| </reference> | ||||
| <date month="August" year="2018"/> | <reference anchor="FSE" target="https://github.com/Cyan4973/FiniteStateE | |||
| </front> | ntropy/"> | |||
| </reference> | <front> | |||
| <title> FiniteStateEntropy | ||||
| </title> | ||||
| <author /> | ||||
| <date month="July" year="2020"/> | ||||
| </front> | ||||
| <refcontent>commit 12a533a</refcontent> | ||||
| </reference> | ||||
| &RFC1952; | <reference anchor="CRIME" target="https://en.wikipedia.org/w/index.php?t | |||
| itle=CRIME&oldid=844538656"> | ||||
| <front> | ||||
| <title>CRIME | ||||
| </title> | ||||
| <author/> | ||||
| <date month="June" year="2018"/> | ||||
| </front> | ||||
| </reference> | ||||
| <reference anchor="XXHASH" target="http://www.xxhash.org"> | <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/refer | |||
| <front> | ence.RFC.1952.xml"/> | |||
| <title> XXHASH Algorithm | ||||
| </title> | ||||
| <author fullname="(unknown author)"/> | <reference anchor="XXHASH" target="http://www.xxhash.org"> | |||
| <front> | ||||
| <title>xxHash | ||||
| </title> | ||||
| <author /> | ||||
| </front> | ||||
| </reference> | ||||
| <date year="2017"/> | <reference anchor="Err5786" target="https://www.rfc-editor.org/errata/eid5786"> | |||
| </front> | <front> | |||
| </reference> | <title>Erratum ID 5786</title> | |||
| <author><organization>RFC Errata</organization></author> | ||||
| </front> | ||||
| <refcontent>RFC 8478</refcontent> | ||||
| </reference> | ||||
| </references> | <reference anchor="Err6303" target="https://www.rfc-editor.org/errata/eid6303"> | |||
| <front> | ||||
| <title>Erratum ID 6303</title> | ||||
| <author><organization>RFC Errata</organization></author> | ||||
| </front> | ||||
| <refcontent>RFC 8478</refcontent> | ||||
| </reference> | ||||
| <section anchor="app_tables" | </references> | |||
| title="Decoding Tables for Predefined Codes"> | </references> | |||
| <t> This appendix contains FSE decoding tables for the | <section anchor="app_tables" numbered="true" toc="default"> | |||
| predefined literal length, match length, and offset codes. | <name>Decoding Tables for Predefined Codes</name> | |||
| <t> This appendix contains FSE decoding tables for the | ||||
| predefined literals length, match length, and offset codes. | ||||
| The tables have been constructed using the algorithm as | The tables have been constructed using the algorithm as | |||
| given above in <xref target="comp_fse_table"/>. The tables he re can be used as examples | given above in <xref target="comp_fse_table" format="default" />. The tables here can be used as examples | |||
| to crosscheck that an implementation has built its decoding | to crosscheck that an implementation has built its decoding | |||
| tables correctly. </t> | tables correctly. </t> | |||
| <section anchor="app_tables_literal" numbered="true" toc="default"> | ||||
| <name>Literals Length Code Table</name> | ||||
| <section anchor="app_tables_literal" | <table anchor="lit-length-code"> | |||
| title="Literal Length Code Table"> | <name>Literals Length Code</name> | |||
| <t> <figure><artwork> | <thead> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | State | Symbol | Number_Of_Bits | Base | | <th>State</th> | |||
| +-------+--------+----------------+------+ | <th>Symbol</th> | |||
| | 0 | 0 | 0 | 0 | | <th>Number_Of_Bits</th> | |||
| +-------+--------+----------------+------+ | <th>Base</th> | |||
| | 0 | 0 | 4 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | </thead> | |||
| | 1 | 0 | 4 | 16 | | <tbody> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 2 | 1 | 5 | 32 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 3 | 3 | 5 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 4 | 4 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 5 | 6 | 5 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 6 | 7 | 5 | 0 | | <td align="center">4</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 7 | 9 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 8 | 10 | 5 | 0 | | <td align="center">1</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 9 | 12 | 5 | 0 | | <td align="center">4</td> | |||
| +-------+--------+----------------+------+ | <td align="center">16</td> | |||
| | 10 | 14 | 6 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 11 | 16 | 5 | 0 | | <td align="center">2</td> | |||
| +-------+--------+----------------+------+ | <td align="center">1</td> | |||
| | 12 | 18 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 13 | 19 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 14 | 21 | 5 | 0 | | <td align="center">3</td> | |||
| +-------+--------+----------------+------+ | <td align="center">3</td> | |||
| | 15 | 22 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 16 | 24 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 17 | 25 | 5 | 32 | | <td align="center">4</td> | |||
| +-------+--------+----------------+------+ | <td align="center">4</td> | |||
| | 18 | 26 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 19 | 27 | 6 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 20 | 29 | 6 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">6</td> | |||
| | 21 | 31 | 6 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 22 | 0 | 4 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 23 | 1 | 4 | 0 | | <td align="center">6</td> | |||
| +-------+--------+----------------+------+ | <td align="center">7</td> | |||
| | 24 | 2 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 25 | 4 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 26 | 5 | 5 | 0 | | <td align="center">7</td> | |||
| +-------+--------+----------------+------+ | <td align="center">9</td> | |||
| | 27 | 7 | 5 | 32 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 28 | 8 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 29 | 10 | 5 | 32 | | <td align="center">8</td> | |||
| +-------+--------+----------------+------+ | <td align="center">10</td> | |||
| | 30 | 11 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 31 | 13 | 6 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 32 | 16 | 5 | 32 | | <td align="center">9</td> | |||
| +-------+--------+----------------+------+ | <td align="center">12</td> | |||
| | 33 | 17 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 34 | 19 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 35 | 20 | 5 | 0 | | <td align="center">10</td> | |||
| +-------+--------+----------------+------+ | <td align="center">14</td> | |||
| | 36 | 22 | 5 | 32 | | <td align="center">6</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 37 | 23 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 38 | 25 | 4 | 0 | | <td align="center">11</td> | |||
| +-------+--------+----------------+------+ | <td align="center">16</td> | |||
| | 39 | 25 | 4 | 16 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 40 | 26 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 41 | 28 | 6 | 0 | | <td align="center">12</td> | |||
| +-------+--------+----------------+------+ | <td align="center">18</td> | |||
| | 42 | 30 | 6 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 43 | 0 | 4 | 48 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 44 | 1 | 4 | 16 | | <td align="center">13</td> | |||
| +-------+--------+----------------+------+ | <td align="center">19</td> | |||
| | 45 | 2 | 5 | 32 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 46 | 3 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 47 | 5 | 5 | 32 | | <td align="center">14</td> | |||
| +-------+--------+----------------+------+ | <td align="center">21</td> | |||
| | 48 | 6 | 5 | 32 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 49 | 8 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 50 | 9 | 5 | 32 | | <td align="center">15</td> | |||
| +-------+--------+----------------+------+ | <td align="center">22</td> | |||
| | 51 | 11 | 5 | 32 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 52 | 12 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 53 | 15 | 6 | 0 | | <td align="center">16</td> | |||
| +-------+--------+----------------+------+ | <td align="center">24</td> | |||
| | 54 | 17 | 5 | 32 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 55 | 18 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 56 | 20 | 5 | 32 | | <td align="center">17</td> | |||
| +-------+--------+----------------+------+ | <td align="center">25</td> | |||
| | 57 | 21 | 5 | 32 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 58 | 23 | 5 | 32 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 59 | 24 | 5 | 32 | | <td align="center">18</td> | |||
| +-------+--------+----------------+------+ | <td align="center">26</td> | |||
| | 60 | 35 | 6 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 61 | 34 | 6 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 62 | 33 | 6 | 0 | | <td align="center">19</td> | |||
| +-------+--------+----------------+------+ | <td align="center">27</td> | |||
| | 63 | 32 | 6 | 0 | | <td align="center">6</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| </artwork></figure></t> | </tr> | |||
| </section> | <tr> | |||
| <td align="center">20</td> | ||||
| <section anchor="app_tables_match" | <td align="center">29</td> | |||
| title="Match Length Code Table"> | <td align="center">6</td> | |||
| <t> <figure><artwork> | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | State | Symbol | Number_Of_Bits | Base | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">21</td> | |||
| | 0 | 0 | 0 | 0 | | <td align="center">31</td> | |||
| +-------+--------+----------------+------+ | <td align="center">6</td> | |||
| | 0 | 0 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 1 | 1 | 4 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">22</td> | |||
| | 2 | 2 | 5 | 32 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | <td align="center">4</td> | |||
| | 3 | 3 | 5 | 0 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 4 | 5 | 5 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">23</td> | |||
| | 5 | 6 | 5 | 0 | | <td align="center">1</td> | |||
| +-------+--------+----------------+------+ | <td align="center">4</td> | |||
| | 6 | 8 | 5 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 7 | 10 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">24</td> | |||
| | 8 | 13 | 6 | 0 | | <td align="center">2</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 9 | 16 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 10 | 19 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">25</td> | |||
| | 11 | 22 | 6 | 0 | | <td align="center">4</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 12 | 25 | 6 | 0 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 13 | 28 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">26</td> | |||
| | 14 | 31 | 6 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 15 | 33 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 16 | 35 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">27</td> | |||
| | 17 | 37 | 6 | 0 | | <td align="center">7</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 18 | 39 | 6 | 0 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 19 | 41 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">28</td> | |||
| | 20 | 43 | 6 | 0 | | <td align="center">8</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 21 | 45 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 22 | 1 | 4 | 16 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">29</td> | |||
| | 23 | 2 | 4 | 0 | | <td align="center">10</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 24 | 3 | 5 | 32 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 25 | 4 | 5 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">30</td> | |||
| | 26 | 6 | 5 | 32 | | <td align="center">11</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 27 | 7 | 5 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 28 | 9 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">31</td> | |||
| | 29 | 12 | 6 | 0 | | <td align="center">13</td> | |||
| +-------+--------+----------------+------+ | <td align="center">6</td> | |||
| | 30 | 15 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 31 | 18 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 32 | 21 | 6 | 0 | | <td align="center">16</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 33 | 24 | 6 | 0 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 34 | 27 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">33</td> | |||
| | 35 | 30 | 6 | 0 | | <td align="center">17</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 36 | 32 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 37 | 34 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">34</td> | |||
| | 38 | 36 | 6 | 0 | | <td align="center">19</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 39 | 38 | 6 | 0 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 40 | 40 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">35</td> | |||
| | 41 | 42 | 6 | 0 | | <td align="center">20</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 42 | 44 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 43 | 1 | 4 | 32 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">36</td> | |||
| | 44 | 1 | 4 | 48 | | <td align="center">22</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 45 | 2 | 4 | 16 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 46 | 4 | 5 | 32 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">37</td> | |||
| | 47 | 5 | 5 | 32 | | <td align="center">23</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 48 | 7 | 5 | 32 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 49 | 8 | 5 | 32 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">38</td> | |||
| | 50 | 11 | 6 | 0 | | <td align="center">25</td> | |||
| +-------+--------+----------------+------+ | <td align="center">4</td> | |||
| | 51 | 14 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 52 | 17 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">39</td> | |||
| | 53 | 20 | 6 | 0 | | <td align="center">25</td> | |||
| +-------+--------+----------------+------+ | <td align="center">4</td> | |||
| | 54 | 23 | 6 | 0 | | <td align="center">16</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 55 | 26 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">40</td> | |||
| | 56 | 29 | 6 | 0 | | <td align="center">26</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 57 | 52 | 6 | 0 | | <td align="center">32</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 58 | 51 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">41</td> | |||
| | 59 | 50 | 6 | 0 | | <td align="center">28</td> | |||
| +-------+--------+----------------+------+ | <td align="center">6</td> | |||
| | 60 | 49 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| | 61 | 48 | 6 | 0 | | <tr> | |||
| +-------+--------+----------------+------+ | <td align="center">42</td> | |||
| | 62 | 47 | 6 | 0 | | <td align="center">30</td> | |||
| +-------+--------+----------------+------+ | <td align="center">6</td> | |||
| | 63 | 46 | 6 | 0 | | <td align="center">0</td> | |||
| +-------+--------+----------------+------+ | </tr> | |||
| </artwork></figure></t> | <tr> | |||
| </section> | <td align="center">43</td> | |||
| <td align="center">0</td> | ||||
| <section anchor="app_tables_offset" | <td align="center">4</td> | |||
| title="Offset Code Table"> | <td align="center">48</td> | |||
| <t> <figure><artwork> | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | State | Symbol | Number_Of_Bits | Base | | <td align="center">44</td> | |||
| +-------+--------+----------------+------+ | <td align="center">1</td> | |||
| | 0 | 0 | 0 | 0 | | <td align="center">4</td> | |||
| +-------+--------+----------------+------+ | <td align="center">16</td> | |||
| | 0 | 0 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 1 | 6 | 4 | 0 | | <td align="center">45</td> | |||
| +-------+--------+----------------+------+ | <td align="center">2</td> | |||
| | 2 | 9 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 3 | 15 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 4 | 21 | 5 | 0 | | <td align="center">46</td> | |||
| +-------+--------+----------------+------+ | <td align="center">3</td> | |||
| | 5 | 3 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 6 | 7 | 4 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 7 | 12 | 5 | 0 | | <td align="center">47</td> | |||
| +-------+--------+----------------+------+ | <td align="center">5</td> | |||
| | 8 | 18 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 9 | 23 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 10 | 5 | 5 | 0 | | <td align="center">48</td> | |||
| +-------+--------+----------------+------+ | <td align="center">6</td> | |||
| | 11 | 8 | 4 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 12 | 14 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 13 | 20 | 5 | 0 | | <td align="center">49</td> | |||
| +-------+--------+----------------+------+ | <td align="center">8</td> | |||
| | 14 | 2 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 15 | 7 | 4 | 16 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 16 | 11 | 5 | 0 | | <td align="center">50</td> | |||
| +-------+--------+----------------+------+ | <td align="center">9</td> | |||
| | 17 | 17 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 18 | 22 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 19 | 4 | 5 | 0 | | <td align="center">51</td> | |||
| +-------+--------+----------------+------+ | <td align="center">11</td> | |||
| | 20 | 8 | 4 | 16 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 21 | 13 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 22 | 19 | 5 | 0 | | <td align="center">52</td> | |||
| +-------+--------+----------------+------+ | <td align="center">12</td> | |||
| | 23 | 1 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 24 | 6 | 4 | 16 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 25 | 10 | 5 | 0 | | <td align="center">53</td> | |||
| +-------+--------+----------------+------+ | <td align="center">15</td> | |||
| | 26 | 16 | 5 | 0 | | <td align="center">6</td> | |||
| +-------+--------+----------------+------+ | <td align="center">0</td> | |||
| | 27 | 28 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 28 | 27 | 5 | 0 | | <td align="center">54</td> | |||
| +-------+--------+----------------+------+ | <td align="center">17</td> | |||
| | 29 | 26 | 5 | 0 | | <td align="center">5</td> | |||
| +-------+--------+----------------+------+ | <td align="center">32</td> | |||
| | 30 | 25 | 5 | 0 | | </tr> | |||
| +-------+--------+----------------+------+ | <tr> | |||
| | 31 | 24 | 5 | 0 | | <td align="center">55</td> | |||
| +-------+--------+----------------+------+ | <td align="center">18</td> | |||
| </artwork></figure></t> | <td align="center">5</td> | |||
| </section> | <td align="center">32</td> | |||
| </section> | </tr> | |||
| <tr> | ||||
| <td align="center">56</td> | ||||
| <td align="center">20</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">57</td> | ||||
| <td align="center">21</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">58</td> | ||||
| <td align="center">23</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">59</td> | ||||
| <td align="center">24</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">60</td> | ||||
| <td align="center">35</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">61</td> | ||||
| <td align="center">34</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">62</td> | ||||
| <td align="center">33</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">63</td> | ||||
| <td align="center">32</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <section anchor="changes" title="Changes Since RFC8478"> | </tbody> | |||
| <t> The following are the changes in this document relative | </table> | |||
| to RFC 8478: | ||||
| <list style="symbols"> | </section> | |||
| <t> Apply erratum #5786. </t> | <section anchor="app_tables_match" numbered="true" toc="default"> | |||
| <name>Match Length Code Table</name> | ||||
| <t> Clarify forward compatibility regarding | <table anchor="match-length"> | |||
| dictionaries. </t> | <name>Match Length Code Table</name> | |||
| <thead> | ||||
| <tr> | ||||
| <th>State</th> | ||||
| <th>Symbol</th> | ||||
| <th>Number_Of_Bits</th> | ||||
| <th>Base</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">2</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">3</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">5</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">6</td> | ||||
| <td align="center">8</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">7</td> | ||||
| <td align="center">10</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">8</td> | ||||
| <td align="center">13</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">9</td> | ||||
| <td align="center">16</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">10</td> | ||||
| <td align="center">19</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">11</td> | ||||
| <td align="center">22</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">12</td> | ||||
| <td align="center">25</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">13</td> | ||||
| <td align="center">28</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">14</td> | ||||
| <td align="center">31</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">15</td> | ||||
| <td align="center">33</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">16</td> | ||||
| <td align="center">35</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">17</td> | ||||
| <td align="center">37</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">18</td> | ||||
| <td align="center">39</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">19</td> | ||||
| <td align="center">41</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">20</td> | ||||
| <td align="center">43</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">21</td> | ||||
| <td align="center">45</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">22</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">23</td> | ||||
| <td align="center">2</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">24</td> | ||||
| <td align="center">3</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">25</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">26</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">27</td> | ||||
| <td align="center">7</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">28</td> | ||||
| <td align="center">9</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">29</td> | ||||
| <td align="center">12</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">30</td> | ||||
| <td align="center">15</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">31</td> | ||||
| <td align="center">18</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">32</td> | ||||
| <td align="center">21</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">33</td> | ||||
| <td align="center">24</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">34</td> | ||||
| <td align="center">27</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">35</td> | ||||
| <td align="center">30</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">36</td> | ||||
| <td align="center">32</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">37</td> | ||||
| <td align="center">34</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">38</td> | ||||
| <td align="center">36</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">39</td> | ||||
| <td align="center">38</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">40</td> | ||||
| <td align="center">40</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">41</td> | ||||
| <td align="center">42</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">42</td> | ||||
| <td align="center">44</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">43</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">44</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">48</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">45</td> | ||||
| <td align="center">2</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">46</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">47</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">48</td> | ||||
| <td align="center">7</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">49</td> | ||||
| <td align="center">8</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">32</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">50</td> | ||||
| <td align="center">11</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">51</td> | ||||
| <td align="center">14</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">52</td> | ||||
| <td align="center">17</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">53</td> | ||||
| <td align="center">20</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">54</td> | ||||
| <td align="center">23</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">55</td> | ||||
| <td align="center">26</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">56</td> | ||||
| <td align="center">29</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">57</td> | ||||
| <td align="center">52</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">58</td> | ||||
| <td align="center">51</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">59</td> | ||||
| <td align="center">50</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">60</td> | ||||
| <td align="center">49</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">61</td> | ||||
| <td align="center">48</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">62</td> | ||||
| <td align="center">47</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">63</td> | ||||
| <td align="center">46</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <t> Clarify application of Block_Maximum_Size. </t> | </tbody> | |||
| </table> | ||||
| <t> Add structured media type suffix registration. </t> | </section> | |||
| <section anchor="app_tables_offset" numbered="true" toc="default"> | ||||
| <name>Offset Code Table</name> | ||||
| <t> Clarify that the content checksum is always | <table anchor="offset-code"> | |||
| 4 bytes. </t> | <name>Offset Code</name> | |||
| <thead> | ||||
| <tr> | ||||
| <th>State</th> | ||||
| <th>Symbol</th> | ||||
| <th>Number_Of_Bits</th> | ||||
| <th>Base</th> | ||||
| </tr> | ||||
| </thead> | ||||
| <tbody> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">0</td> | ||||
| <td align="center">0</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">1</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">2</td> | ||||
| <td align="center">9</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">3</td> | ||||
| <td align="center">15</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">4</td> | ||||
| <td align="center">21</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">5</td> | ||||
| <td align="center">3</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">6</td> | ||||
| <td align="center">7</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">7</td> | ||||
| <td align="center">12</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">8</td> | ||||
| <td align="center">18</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">9</td> | ||||
| <td align="center">23</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">10</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">11</td> | ||||
| <td align="center">8</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">12</td> | ||||
| <td align="center">14</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">13</td> | ||||
| <td align="center">20</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">14</td> | ||||
| <td align="center">2</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">15</td> | ||||
| <td align="center">7</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">16</td> | ||||
| <td align="center">11</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">17</td> | ||||
| <td align="center">17</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">18</td> | ||||
| <td align="center">22</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">19</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">20</td> | ||||
| <td align="center">8</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">21</td> | ||||
| <td align="center">13</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">22</td> | ||||
| <td align="center">19</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">23</td> | ||||
| <td align="center">1</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">24</td> | ||||
| <td align="center">6</td> | ||||
| <td align="center">4</td> | ||||
| <td align="center">16</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">25</td> | ||||
| <td align="center">10</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">26</td> | ||||
| <td align="center">16</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">27</td> | ||||
| <td align="center">28</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">28</td> | ||||
| <td align="center">27</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">29</td> | ||||
| <td align="center">26</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">30</td> | ||||
| <td align="center">25</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <tr> | ||||
| <td align="center">31</td> | ||||
| <td align="center">24</td> | ||||
| <td align="center">5</td> | ||||
| <td align="center">0</td> | ||||
| </tr> | ||||
| <t> Clarify handling of reserved and corrupt | </tbody> | |||
| inputs. </t> | </table> | |||
| <t> Add fragment identifier considerations to the | </section> | |||
| media type registration. </t> | </section> | |||
| </list> </t> | <section anchor="changes" numbered="true" toc="default"> | |||
| </section> | <name>Changes since RFC 8478</name> | |||
| <t> The following are the changes in this document relative | ||||
| to RFC 8478:</t> | ||||
| <section anchor="ack" title="Acknowledgments"> | <ul spacing="normal"> | |||
| <t> zstd was developed by Yann Collet. </t> | ||||
| <t> Felix Handte and Nick Terrell provided feedback that went | <li> Applied errata <xref target="Err5786" format="default"/> and <xref tar | |||
| into this revision and RFC 8478. RFC 8478 also received | get="Err6303" format="default"/>. </li> | |||
| contributions from Bobo Bose-Kolanu, Kyle Nekritz, and | ||||
| David Schleimer. </t> | ||||
| </section> | ||||
| </back> | <li> | |||
| Clarified forward compatibility regarding dictionaries. </li> | ||||
| <li> | ||||
| Clarified application of Block_Maximum_Size. </li> | ||||
| <li> Added | ||||
| structured media type suffix registration. </li> | ||||
| <li> Clarified | ||||
| that the content checksum is always 4 bytes. </li> | ||||
| <li> Clarified | ||||
| handling of reserved and corrupt inputs. </li> | ||||
| <li> Added fragment | ||||
| identifier considerations to the media type registration. </li> | ||||
| </ul> | ||||
| </section> | ||||
| </rfc> | <section anchor="ack" numbered="false" | |||
| toc="default"> <name>Acknowledgments</name> | ||||
| <t>zstd was | ||||
| developed by <contact fullname="Yann Collet"/>. </t> <t><contact fullname= | ||||
| "Felix | ||||
| Handte"/> and <contact fullname="Nick Terrell"/> provided | ||||
| feedback that went into this revision and RFC 8478. RFC 8478 | ||||
| also received contributions from <contact fullname="Bobo | ||||
| Bose-Kolanu" />, <contact fullname="Kyle Nekritz" />, and | ||||
| <contact fullname="David Schleimer"/>. </t> </section> </back> | ||||
| </rfc> | ||||
| End of changes. 377 change blocks. | ||||
| 2472 lines changed or deleted | 3514 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||