| draft-ietf-sframe-encv1.txt | rfc9605.txt | |||
|---|---|---|---|---|
| sframe E. Omara | Internet Engineering Task Force (IETF) E. Omara | |||
| Internet-Draft Apple | Request for Comments: 9605 Apple | |||
| Intended status: Standards Track J. Uberti | Category: Standards Track J. Uberti | |||
| Expires: 26 December 2024 Google | ISSN: 2070-1721 Fixie.ai | |||
| S. Murillo | S. G. Murillo | |||
| CoSMo Software | CoSMo Software | |||
| R. L. Barnes, Ed. | R. Barnes, Ed. | |||
| Cisco | Cisco | |||
| Y. Fablet | Y. Fablet | |||
| Apple | Apple | |||
| 24 June 2024 | July 2024 | |||
| Secure Frame (SFrame) | Secure Frame (SFrame): Lightweight Authenticated Encryption for Real- | |||
| draft-ietf-sframe-enc-latest | Time Media | |||
| Abstract | Abstract | |||
| This document describes the Secure Frame (SFrame) end-to-end | This document describes the Secure Frame (SFrame) end-to-end | |||
| encryption and authentication mechanism for media frames in a | encryption and authentication mechanism for media frames in a | |||
| multiparty conference call, in which central media servers (Selective | multiparty conference call, in which central media servers (Selective | |||
| Forwarding Units or SFUs) can access the media metadata needed to | Forwarding Units or SFUs) can access the media metadata needed to | |||
| make forwarding decisions without having access to the actual media. | make forwarding decisions without having access to the actual media. | |||
| This mechanism differs from the Secure Real-Time Protocol (SRTP) in | This mechanism differs from the Secure Real-Time Protocol (SRTP) in | |||
| that it is independent of RTP (thus compatible with non-RTP media | that it is independent of RTP (thus compatible with non-RTP media | |||
| transport) and can be applied to whole media frames in order to be | transport) and can be applied to whole media frames in order to be | |||
| more bandwidth efficient. | more bandwidth efficient. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
| provisions of BCP 78 and BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| | Internet Standards is available in Section 2 of RFC 7841. | |||
| This Internet-Draft will expire on 26 December 2024. | Information about the current status of this document, any errata, | |||
| | and how to provide feedback on it may be obtained at | |||
| | https://www.rfc-editor.org/info/rfc9605. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| | in the Revised BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
| 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Terminology | |||
| 3. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Goals | |||
| 4. SFrame . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 4. SFrame | |||
| 4.1. Application Context . . . . . . . . . . . . . . . . . . . 5 | 4.1. Application Context | |||
| 4.2. SFrame Ciphertext . . . . . . . . . . . . . . . . . . . . 7 | 4.2. SFrame Ciphertext | |||
| 4.3. SFrame Header . . . . . . . . . . . . . . . . . . . . . . 7 | 4.3. SFrame Header | |||
| 4.4. Encryption Schema . . . . . . . . . . . . . . . . . . . . 9 | 4.4. Encryption Schema | |||
| 4.4.1. Key Selection . . . . . . . . . . . . . . . . . . . . 10 | 4.4.1. Key Selection | |||
| 4.4.2. Key Derivation . . . . . . . . . . . . . . . . . . . 10 | 4.4.2. Key Derivation | |||
| 4.4.3. Encryption . . . . . . . . . . . . . . . . . . . . . 11 | 4.4.3. Encryption | |||
| 4.4.4. Decryption . . . . . . . . . . . . . . . . . . . . . 13 | 4.4.4. Decryption | |||
| 4.5. Cipher Suites . . . . . . . . . . . . . . . . . . . . . . 15 | 4.5. Cipher Suites | |||
| 4.5.1. AES-CTR with SHA2 . . . . . . . . . . . . . . . . . . 16 | 4.5.1. AES-CTR with SHA2 | |||
| 5. Key Management . . . . . . . . . . . . . . . . . . . . . . . 18 | 5. Key Management | |||
| 5.1. Sender Keys . . . . . . . . . . . . . . . . . . . . . . . 19 | 5.1. Sender Keys | |||
| 5.2. MLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 5.2. MLS | |||
| 6. Media Considerations . . . . . . . . . . . . . . . . . . . . 22 | 6. Media Considerations | |||
| 6.1. Selective Forwarding Units . . . . . . . . . . . . . . . 22 | 6.1. Selective Forwarding Units | |||
| 6.1.1. LastN and RTP Stream Reuse . . . . . . . . . . . . . 23 | 6.1.1. RTP Stream Reuse | |||
| 6.1.2. Simulcast . . . . . . . . . . . . . . . . . . . . . . 23 | 6.1.2. Simulcast | |||
| 6.1.3. SVC . . . . . . . . . . . . . . . . . . . . . . . . . 23 | 6.1.3. Scalable Video Coding (SVC) | |||
| 6.2. Video Key Frames . . . . . . . . . . . . . . . . . . . . 23 | 6.2. Video Key Frames | |||
| 6.3. Partial Decoding . . . . . . . . . . . . . . . . . . . . 24 | 6.3. Partial Decoding | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 24 | 7. Security Considerations | |||
| 7.1. No Header Confidentiality . . . . . . . . . . . . . . . . 24 | 7.1. No Header Confidentiality | |||
| 7.2. No per-Sender Authentication . . . . . . . . . . . . . . 25 | 7.2. No Per-Sender Authentication | |||
| 7.3. Key Management . . . . . . . . . . . . . . . . . . . . . 25 | 7.3. Key Management | |||
| 7.4. Replay . . . . . . . . . . . . . . . . . . . . . . . . . 25 | 7.4. Replay | |||
| 7.5. Risks Due to Short Tags . . . . . . . . . . . . . . . . . 25 | 7.5. Risks Due to Short Tags | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 | 8. IANA Considerations | |||
| 8.1. SFrame Cipher Suites . . . . . . . . . . . . . . . . . . 27 | 8.1. SFrame Cipher Suites | |||
| | 9. Application Responsibilities | |||
| 9. Application Responsibilities . . . . . . . . . . . . . . . . 28 | 9.1. Header Value Uniqueness | |||
| 9.1. Header Value Uniqueness . . . . . . . . . . . . . . . . . 29 | 9.2. Key Management Framework | |||
| 9.2. Key Management Framework . . . . . . . . . . . . . . . . 29 | 9.3. Anti-Replay | |||
| 9.3. Anti-Replay . . . . . . . . . . . . . . . . . . . . . . . 30 | 9.4. Metadata | |||
| 9.4. Metadata . . . . . . . . . . . . . . . . . . . . . . . . 30 | 10. References | |||
| 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 30 | 10.1. Normative References | |||
| 10.1. Normative References . . . . . . . . . . . . . . . . . . 30 | 10.2. Informative References | |||
| 10.2. Informative References . . . . . . . . . . . . . . . . . 31 | Appendix A. Example API | |||
| Appendix A. Example API . . . . . . . . . . . . . . . . . . . . 32 | Appendix B. Overhead Analysis | |||
| Appendix B. Overhead Analysis . . . . . . . . . . . . . . . . . 34 | B.1. Assumptions | |||
| B.1. Assumptions . . . . . . . . . . . . . . . . . . . . . . . 35 | B.2. Audio | |||
| B.2. Audio . . . . . . . . . . . . . . . . . . . . . . . . . . 35 | B.3. Video | |||
| B.3. Video . . . . . . . . . . . . . . . . . . . . . . . . . . 36 | B.4. Conferences | |||
| B.4. Conferences . . . . . . . . . . . . . . . . . . . . . . . 38 | B.5. SFrame over RTP | |||
| B.5. SFrame over RTP . . . . . . . . . . . . . . . . . . . . . 38 | Appendix C. Test Vectors | |||
| Appendix C. Test Vectors . . . . . . . . . . . . . . . . . . . . 40 | C.1. Header Encoding/Decoding | |||
| C.1. Header Encoding/Decoding . . . . . . . . . . . . . . . . 41 | C.2. AEAD Encryption/Decryption Using AES-CTR and HMAC | |||
| C.2. AEAD Encryption/Decryption Using AES-CTR and HMAC . . . . 65 | C.3. SFrame Encryption/Decryption | |||
| C.3. SFrame Encryption/Decryption . . . . . . . . . . . . . . 67 | Acknowledgements | |||
| Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 72 | Contributors | |||
| Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . 72 | Authors' Addresses | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 72 | ||||
| 1. Introduction | 1. Introduction | |||
| Modern multiparty video call systems use Selective Forwarding Unit | Modern multiparty video call systems use Selective Forwarding Unit | |||
| (SFU) servers to efficiently route media streams to call endpoints | (SFU) servers to efficiently route media streams to call endpoints | |||
| based on factors such as available bandwidth, desired video size, | based on factors such as available bandwidth, desired video size, | |||
| codec support, and other factors. An SFU typically does not need | codec support, and other factors. An SFU typically does not need | |||
| access to the media content of the conference, which allows the media | access to the media content of the conference, which allows the media | |||
| to be encrypted "end to end" so that it cannot be decrypted by the | to be encrypted "end to end" so that it cannot be decrypted by the | |||
| SFU. In order for the SFU to work properly, though, it usually needs | SFU. In order for the SFU to work properly, though, it usually needs | |||
| skipping to change at page 4, line 22 | skipping to change at line 158 | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
| BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| MAC: Message Authentication Code | MAC: Message Authentication Code | |||
| E2EE: End-to-End Encryption | E2EE: End-to-End Encryption | |||
| HBH: Hop-By-Hop | HBH: Hop-by-Hop | |||
| We use "Selective Forwarding Unit (SFU)" and "media stream" in a less | We use "Selective Forwarding Unit (SFU)" and "media stream" in a less | |||
| formal sense than in [RFC7656]. An SFU is a selective switching | formal sense than in [RFC7656]. An SFU is a selective switching | |||
| function for media payloads, and a media stream a sequence of media | function for media payloads, and a media stream is a sequence of | |||
| payloads, in both cases regardless of whether those media payloads | media payloads, regardless of whether those media payloads are | |||
| are transported over RTP or some other protocol. | transported over RTP or some other protocol. | |||
| 3. Goals | 3. Goals | |||
| SFrame is designed to be a suitable E2EE protection scheme for | SFrame is designed to be a suitable E2EE protection scheme for | |||
| conference call media in a broad range of scenarios, as outlined by | conference call media in a broad range of scenarios, as outlined by | |||
| the following goals: | the following goals: | |||
| 1. Provide a secure E2EE mechanism for audio and video in conference | 1. Provide a secure E2EE mechanism for audio and video in conference | |||
| calls that can be used with arbitrary SFU servers. | calls that can be used with arbitrary SFU servers. | |||
| 2. Decouple media encryption from key management to allow SFrame to | 2. Decouple media encryption from key management to allow SFrame to | |||
| be used with an arbitrary key management system. | be used with an arbitrary key management system. | |||
| 3. Minimize packet expansion to allow successful conferencing in as | 3. Minimize packet expansion to allow successful conferencing in as | |||
| many network conditions as possible. | many network conditions as possible. | |||
| 4. Independence from the underlying transport, including use in non- | 4. Decouple the media encryption framework from the underlying | |||
| RTP transports, e.g., WebTransport [I-D.ietf-webtrans-overview]. | transport, allowing use in non-RTP scenarios, e.g., WebTransport | |||
| | [WEBTRANSPORT]. | |||
| 5. When used with RTP and its associated error-resilience | 5. When used with RTP and its associated error-resilience | |||
| mechanisms, i.e., RTX and Forward Error Correction (FEC), require | mechanisms, i.e., RTX and Forward Error Correction (FEC), require | |||
| no special handling for RTX and FEC packets. | no special handling for RTX and FEC packets. | |||
| 6. Minimize the changes needed in SFU servers. | 6. Minimize the changes needed in SFU servers. | |||
| 7. Minimize the changes needed in endpoints. | 7. Minimize the changes needed in endpoints. | |||
| 8. Work with the most popular audio and video codecs used in | 8. Work with the most popular audio and video codecs used in | |||
| skipping to change at page 5, line 23 | skipping to change at line 209 | |||
| E2EE, is simple to implement, has no dependencies on RTP, and | E2EE, is simple to implement, has no dependencies on RTP, and | |||
| minimizes encryption bandwidth overhead. This section describes how | minimizes encryption bandwidth overhead. This section describes how | |||
| the mechanism works and includes details of how applications utilize | the mechanism works and includes details of how applications utilize | |||
| SFrame for media protection as well as the actual mechanics of E2EE | SFrame for media protection as well as the actual mechanics of E2EE | |||
| for protecting media. | for protecting media. | |||
| 4.1. Application Context | 4.1. Application Context | |||
| SFrame is a general encryption framing, intended to be used as an | SFrame is a general encryption framing, intended to be used as an | |||
| E2EE layer over an underlying HBH-encrypted transport such as SRTP or | E2EE layer over an underlying HBH-encrypted transport such as SRTP or | |||
| QUIC [RFC3711][I-D.ietf-moq-transport]. | QUIC [RFC3711][MOQ-TRANSPORT]. | |||
| The scale at which SFrame encryption is applied to media determines | The scale at which SFrame encryption is applied to media determines | |||
| the overall amount of overhead that SFrame adds to the media stream | the overall amount of overhead that SFrame adds to the media stream | |||
| as well as the engineering complexity involved in integrating SFrame | as well as the engineering complexity involved in integrating SFrame | |||
| into a particular environment. Two patterns are common: using SFrame | into a particular environment. Two patterns are common: using SFrame | |||
| to encrypt either whole media frames (per frame) or individual | to encrypt either whole media frames (per frame) or individual | |||
| transport-level media payloads (per packet). | transport-level media payloads (per packet). | |||
| For example, Figure 1 shows a typical media sender stack that takes | For example, Figure 1 shows a typical media sender stack that takes | |||
| media from some source, encodes it into frames, divides those frames | media from some source, encodes it into frames, divides those frames | |||
| skipping to change at page 7, line 35 | skipping to change at line 314 | |||
| | | | | | | | | | | |||
| | | | | | | | | | | |||
| | | | | | | | | | | |||
| +->+-------------------------------------------------------+<-+ | +->+-------------------------------------------------------+<-+ | |||
| | | Authentication Tag | | | | | Authentication Tag | | | |||
| | +-------------------------------------------------------+ | | | +-------------------------------------------------------+ | | |||
| | | | | | | |||
| | | | | | | |||
| +--- Encrypted Portion Authenticated Portion ---+ | +--- Encrypted Portion Authenticated Portion ---+ | |||
| | Figure 2: Structure of an SFrame Ciphertext | |||
| When SFrame is applied per packet, the payload of each packet will be | When SFrame is applied per packet, the payload of each packet will be | |||
| an SFrame ciphertext. When SFrame is applied per frame, the SFrame | an SFrame ciphertext. When SFrame is applied per frame, the SFrame | |||
| ciphertext representing an encrypted frame will span several packets, | ciphertext representing an encrypted frame will span several packets, | |||
| with the header appearing in the first packet and the authentication | with the header appearing in the first packet and the authentication | |||
| tag in the last packet. It is the responsibility of the application | tag in the last packet. It is the responsibility of the application | |||
| to reassemble an encrypted frame from individual packets, accounting | to reassemble an encrypted frame from individual packets, accounting | |||
| for packet loss and reordering as necessary. | for packet loss and reordering as necessary. | |||
| 4.3. SFrame Header | 4.3. SFrame Header | |||
| The SFrame header specifies two values from which encryption | The SFrame header specifies two values from which encryption | |||
| parameters are derived: | parameters are derived: | |||
| * A Key ID (KID) that determines which encryption key should be used | * A Key ID (KID) that determines which encryption key should be used | |||
| * A counter (CTR) that is used to construct the nonce for the | * A Counter (CTR) that is used to construct the nonce for the | |||
| encryption | encryption | |||
| Applications MUST ensure that each (KID, CTR) combination is used for | Applications MUST ensure that each (KID, CTR) combination is used for | |||
| exactly one SFrame encryption operation. A typical approach to | exactly one SFrame encryption operation. A typical approach to | |||
| achieve this guarantee is outlined in Section 9.1. | achieve this guarantee is outlined in Section 9.1. | |||
| Config Byte | Config Byte | |||
| | | | | |||
| .-----' '-----. | .-----' '-----. | |||
| | | | | | | |||
| 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
| +-+-+-+-+-+-+-+-+------------+------------+ | +-+-+-+-+-+-+-+-+------------+------------+ | |||
| |X| K |Y| C | KID... | CTR... | | |X| K |Y| C | KID... | CTR... | | |||
| +-+-+-+-+-+-+-+-+------------+------------+ | +-+-+-+-+-+-+-+-+------------+------------+ | |||
| Figure 2: SFrame Header | Figure 3: SFrame Header | |||
| The SFrame header has the overall structure shown in Figure 2. The | The SFrame header has the overall structure shown in Figure 3. The | |||
| first byte is a "config byte", with the following fields: | first byte is a "config byte", with the following fields: | |||
| Extended Key ID Flag (X, 1 bit): Indicates if the K field contains | Extended KID Flag (X, 1 bit): Indicates if the K field contains the | |||
| the Key ID or the Key ID length. | KID or the KID length. | |||
| Key or Key Length (K, 3 bits): If the X flag is set to 0, this field | KID or KID Length (K, 3 bits): If the X flag is set to 0, this field | |||
| contains the Key ID. If the X flag is set to 1, then it contains | contains the KID. If the X flag is set to 1, then it contains the | |||
| the length of the Key ID, minus one. | length of the KID, minus one. | |||
| Extended Counter Flag (Y, 1 bit): Indicates if the C field contains | Extended CTR Flag (Y, 1 bit): Indicates if the C field contains the | |||
| the counter or the counter length. | CTR or the CTR length. | |||
| Counter or Counter Length (C, 3 bits): This field contains the | CTR or CTR Length (C, 3 bits): This field contains the CTR if the Y | |||
| counter (CTR) if the Y flag is set to 0, or the counter length, | flag is set to 0, or the CTR length, minus one, if set to 1. | |||
| minus one, if set to 1. | ||||
| The Key ID and Counter fields are encoded as compact unsigned | The KID and CTR fields are encoded as compact unsigned integers in | |||
| integers in network (big-endian) byte order. If the value of one of | network (big-endian) byte order. If the value of one of these fields | |||
| these fields is in the range 0-7, then the value is carried in the | is in the range 0-7, then the value is carried in the corresponding | |||
| corresponding bits of the config byte (K or C) and the corresponding | bits of the config byte (K or C) and the corresponding flag (X or Y) | |||
| flag (X or Y) is set to zero. Otherwise, the value MUST be encoded | is set to zero. Otherwise, the value MUST be encoded with the | |||
| with the minimum number of bytes required and appended after the | minimum number of bytes required and appended after the config byte, | |||
| config byte, with the Key ID first and Counter second. The header | with the KID first and CTR second. The header field (K or C) is set | |||
| field (K or C) is set to the number of bytes in the encoded value, | to the number of bytes in the encoded value, minus one. The value | |||
| minus one. The value 000 represents a length of 1, 001 a length of | 000 represents a length of 1, 001 a length of 2, etc. This allows a | |||
| 2, etc. This allows a 3-bit length field to represent the value | 3-bit length field to represent the value lengths 1-8. | |||
| lengths 1-8. | ||||
| The SFrame header can thus take one of the four forms shown in | The SFrame header can thus take one of the four forms shown in | |||
| Figure 3, depending on which of the X and Y flags are set. | Figure 4, depending on which of the X and Y flags are set. | |||
| KID < 8, CTR < 8: | KID < 8, CTR < 8: | |||
| +-+-----+-+-----+ | +-+-----+-+-----+ | |||
| |0| KID |0| CTR | | |0| KID |0| CTR | | |||
| +-+-----+-+-----+ | +-+-----+-+-----+ | |||
| KID < 8, CTR >= 8: | KID < 8, CTR >= 8: | |||
| +-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
| |0| KID |1|CLEN | CTR... (length=CLEN) | | |0| KID |1|CLEN | CTR... (length=CLEN) | | |||
| +-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
| skipping to change at page 9, line 25 | skipping to change at line 399 | |||
| KID >= 8, CTR < 8: | KID >= 8, CTR < 8: | |||
| +-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
| |1|KLEN |0| CTR | KID... (length=KLEN) | | |1|KLEN |0| CTR | KID... (length=KLEN) | | |||
| +-+-----+-+-----+------------------------+ | +-+-----+-+-----+------------------------+ | |||
| KID >= 8, CTR >= 8: | KID >= 8, CTR >= 8: | |||
| +-+-----+-+-----+------------------------+------------------------+ | +-+-----+-+-----+------------------------+------------------------+ | |||
| |1|KLEN |1|CLEN | KID... (length=KLEN) | CTR... (length=CLEN) | | |1|KLEN |1|CLEN | KID... (length=KLEN) | CTR... (length=CLEN) | | |||
| +-+-----+-+-----+------------------------+------------------------+ | +-+-----+-+-----+------------------------+------------------------+ | |||
| Figure 3: Forms of Encoded SFrame Header | Figure 4: Forms of Encoded SFrame Header | |||
| 4.4. Encryption Schema | 4.4. Encryption Schema | |||
| SFrame encryption uses an AEAD encryption algorithm and hash function | SFrame encryption uses an AEAD encryption algorithm and hash function | |||
| defined by the cipher suite in use (see Section 4.5). We will refer | defined by the cipher suite in use (see Section 4.5). We will refer | |||
| to the following aspects of the AEAD and the hash algorithm below: | to the following aspects of the AEAD and the hash algorithm below: | |||
| * AEAD.Encrypt and AEAD.Decrypt - The encryption and decryption | * AEAD.Encrypt and AEAD.Decrypt - The encryption and decryption | |||
| functions for the AEAD. We follow the convention of RFC 5116 | functions for the AEAD. We follow the convention of RFC 5116 | |||
| [RFC5116] and consider the authentication tag part of the | [RFC5116] and consider the authentication tag part of the | |||
| skipping to change at page 9, line 49 | skipping to change at line 423 | |||
| * AEAD.Nk - The size in bytes of a key for the encryption algorithm | * AEAD.Nk - The size in bytes of a key for the encryption algorithm | |||
| * AEAD.Nn - The size in bytes of a nonce for the encryption | * AEAD.Nn - The size in bytes of a nonce for the encryption | |||
| algorithm | algorithm | |||
| * AEAD.Nt - The overhead in bytes of the encryption algorithm | * AEAD.Nt - The overhead in bytes of the encryption algorithm | |||
| (typically the size of a "tag" that is added to the plaintext) | (typically the size of a "tag" that is added to the plaintext) | |||
| * AEAD.Nka - For cipher suites using the compound AEAD described in | * AEAD.Nka - For cipher suites using the compound AEAD described in | |||
| Section 4.5.1, the size in bytes of a key for the underlying | Section 4.5.1, the size in bytes of a key for the underlying | |||
| Advanced Encryption Standard Counter Mode (AES-CTR) algorithm | encryption algorithm | |||
| * Hash.Nh - The size in bytes of the output of the hash function | * Hash.Nh - The size in bytes of the output of the hash function | |||
| 4.4.1. Key Selection | 4.4.1. Key Selection | |||
| Each SFrame encryption or decryption operation is premised on a | Each SFrame encryption or decryption operation is premised on a | |||
| single secret base_key, which is labeled with an integer KID value | single secret base_key, which is labeled with an integer KID value | |||
| signaled in the SFrame header. | signaled in the SFrame header. | |||
| The sender and receivers need to agree on which base_key should be | The sender and receivers need to agree on which base_key should be | |||
| skipping to change at page 10, line 23 | skipping to change at line 445 | |||
| on whether a base_key will be used for encryption or decryption only. | on whether a base_key will be used for encryption or decryption only. | |||
| The process for provisioning base_key values and their KID values is | The process for provisioning base_key values and their KID values is | |||
| beyond the scope of this specification, but its security properties | beyond the scope of this specification, but its security properties | |||
| will bound the assurances that SFrame provides. For example, if | will bound the assurances that SFrame provides. For example, if | |||
| SFrame is used to provide E2E security against intermediary media | SFrame is used to provide E2E security against intermediary media | |||
| nodes, then SFrame keys need to be negotiated in a way that does not | nodes, then SFrame keys need to be negotiated in a way that does not | |||
| make them accessible to these intermediaries. | make them accessible to these intermediaries. | |||
| For each known KID value, the client stores the corresponding | For each known KID value, the client stores the corresponding | |||
| symmetric key base_key. For keys that can be used for encryption, | symmetric key base_key. For keys that can be used for encryption, | |||
| the client also stores the next counter value CTR to be used when | the client also stores the next CTR value to be used when encrypting | |||
| encrypting (initially 0). | (initially 0). | |||
| When encrypting a plaintext, the application specifies which KID is | When encrypting a plaintext, the application specifies which KID is | |||
| to be used, and the counter is incremented after successful | to be used, and the CTR value is incremented after successful | |||
| encryption. When decrypting, the base_key for decryption is selected | encryption. When decrypting, the base_key for decryption is selected | |||
| from the available keys using the KID value in the SFrame header. | from the available keys using the KID value in the SFrame header. | |||
| A given base_key MUST NOT be used for encryption by multiple senders. | A given base_key MUST NOT be used for encryption by multiple senders. | |||
| Such reuse would result in multiple encrypted frames being generated | Such reuse would result in multiple encrypted frames being generated | |||
| with the same (key, nonce) pair, which harms the protections provided | with the same (key, nonce) pair, which harms the protections provided | |||
| by many AEAD algorithms. Implementations MUST mark each base_key as | by many AEAD algorithms. Implementations MUST mark each base_key as | |||
| usable for encryption or decryption, never both. | usable for encryption or decryption, never both. | |||
| Note that the set of available keys might change over the lifetime of | Note that the set of available keys might change over the lifetime of | |||
| skipping to change at page 11, line 9 | skipping to change at line 478 | |||
| SFrame encryption and decryption use a key and salt derived from the | SFrame encryption and decryption use a key and salt derived from the | |||
| base_key associated with a KID. Given a base_key value, the key and | base_key associated with a KID. Given a base_key value, the key and | |||
| salt are derived using HMAC-based Key Derivation Function (HKDF) | salt are derived using HMAC-based Key Derivation Function (HKDF) | |||
| [RFC5869] as follows: | [RFC5869] as follows: | |||
| def derive_key_salt(KID, base_key): | def derive_key_salt(KID, base_key): | |||
| sframe_secret = HKDF-Extract("", base_key) | sframe_secret = HKDF-Extract("", base_key) | |||
| sframe_key_label = "SFrame 1.0 Secret key " + KID + cipher_suite | sframe_key_label = "SFrame 1.0 Secret key " + KID + cipher_suite | |||
| sframe_key = HKDF-Expand(sframe_secret, sframe_key_label, AEAD.Nk) | sframe_key = | |||
| | HKDF-Expand(sframe_secret, sframe_key_label, AEAD.Nk) | |||
| sframe_salt_label = "SFrame 1.0 Secret salt " + KID + cipher_suite | sframe_salt_label = "SFrame 1.0 Secret salt " + KID + cipher_suite | |||
| sframe_salt = HKDF-Expand(sframe_secret, sframe_salt_label, AEAD.Nn) | sframe_salt = | |||
| | HKDF-Expand(sframe_secret, sframe_salt_label, AEAD.Nn) | |||
| return sframe_key, sframe_salt | return sframe_key, sframe_salt | |||
| In the derivation of sframe_secret: | In the derivation of sframe_secret: | |||
| * The + operator represents concatenation of byte strings. | * The + operator represents concatenation of byte strings. | |||
| * The KID value is encoded as an 8-byte big-endian integer, not the | * The KID value is encoded as an 8-byte big-endian integer, not the | |||
| compressed form used in the SFrame header. | compressed form used in the SFrame header. | |||
| * The cipher_suite value is a 2-byte big-endian integer representing | * The cipher_suite value is a 2-byte big-endian integer representing | |||
| the cipher suite in use (see Section 8.1). | the cipher suite in use (see Section 8.1). | |||
| The hash function used for HKDF is determined by the cipher suite in | The hash function used for HKDF is determined by the cipher suite in | |||
| use. | use. | |||
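As a rough illustration of the derivation above, the following Python sketch implements HKDF [RFC5869] with the standard library and applies the two labels. The function names and the explicit nk, nn, and hash_name parameters are assumptions made for this example, not part of the specification.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    # HKDF-Extract: PRK = HMAC-Hash(salt, IKM). An empty salt is equivalent
    # to HashLen zero bytes because HMAC zero-pads short keys.
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int,
                hash_name: str = "sha256") -> bytes:
    # HKDF-Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i), concatenated
    # and truncated to the requested length.
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output too long for HKDF-Expand")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_salt(kid: int, base_key: bytes, cipher_suite: int,
                    nk: int, nn: int, hash_name: str = "sha256"):
    # KID is an 8-byte big-endian integer; cipher_suite is 2 bytes big-endian.
    suffix = kid.to_bytes(8, "big") + cipher_suite.to_bytes(2, "big")
    sframe_secret = hkdf_extract(b"", base_key, hash_name)
    sframe_key = hkdf_expand(
        sframe_secret, b"SFrame 1.0 Secret key " + suffix, nk, hash_name)
    sframe_salt = hkdf_expand(
        sframe_secret, b"SFrame 1.0 Secret salt " + suffix, nn, hash_name)
    return sframe_key, sframe_salt
```

For example, derive_key_salt(7, base_key, suite_id, nk=48, nn=12) would match the Nk and Nn constants listed for AES_128_CTR_HMAC_SHA256_80, where suite_id stands for whatever 2-byte identifier Section 8.1 assigns to that suite.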
| 4.4.3. Encryption | 4.4.3. Encryption | |||
| SFrame encryption uses the AEAD encryption algorithm for the cipher | SFrame encryption uses the AEAD encryption algorithm for the cipher | |||
| suite in use. The key for the encryption is the sframe_key and the | suite in use. The key for the encryption is the sframe_key. The | |||
| nonce is formed by XORing the sframe_salt with the current counter, | nonce is formed by first XORing the sframe_salt with the current CTR | |||
| encoded as a big-endian integer of length AEAD.Nn. | value, and then encoding the result as a big-endian integer of length | |||
| AEAD.Nn. | ||||
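The nonce construction can be sketched in Python as follows. The helper name form_nonce is an assumption for illustration; encode_big_endian mirrors the helper named in the pseudocode. XORing the salt with the encoded CTR byte by byte produces the same result as XORing the integers and encoding afterwards, so both wordings of the rule agree.

```python
def encode_big_endian(value: int, length: int) -> bytes:
    # Produces a length-byte string encoding value as a big-endian integer.
    return value.to_bytes(length, "big")


def form_nonce(sframe_salt: bytes, ctr: int) -> bytes:
    # The salt is AEAD.Nn bytes long, so the CTR is padded to the same length.
    ctr_bytes = encode_big_endian(ctr, len(sframe_salt))
    return bytes(a ^ b for a, b in zip(sframe_salt, ctr_bytes))
```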
| The encryptor forms an SFrame header using the CTR and KID values | The encryptor forms an SFrame header using the CTR and KID values | |||
| provided. The encoded header is provided as AAD to the AEAD | provided. The encoded header is provided as AAD to the AEAD | |||
| encryption operation, together with application-provided metadata | encryption operation, together with application-provided metadata | |||
| about the encrypted media (see Section 9.4). | about the encrypted media (see Section 9.4). | |||
| def encrypt(CTR, KID, metadata, plaintext): | def encrypt(CTR, KID, metadata, plaintext): | |||
| sframe_key, sframe_salt = key_store[KID] | sframe_key, sframe_salt = key_store[KID] | |||
| # encode_big_endian(x, n) produces an n-byte string encoding the | # encode_big_endian(x, n) produces an n-byte string encoding the | |||
| skipping to change at page 13, line 42 ¶ | skipping to change at line 572 ¶ | |||
| | +---------------+ | | | +---------------+ | | |||
| +-------------->| SFrame Header | | | +-------------->| SFrame Header | | | |||
| +---------------+ | | +---------------+ | | |||
| | | | | | | | | |||
| | |<----+ | | |<----+ | |||
| | ciphertext | | | ciphertext | | |||
| | | | | | | |||
| | | | | | | |||
| +---------------+ | +---------------+ | |||
| Figure 4: Encrypting an SFrame Ciphertext | Figure 5: Encrypting an SFrame Ciphertext | |||
| 4.4.4. Decryption | 4.4.4. Decryption | |||
| Before decrypting, a receiver needs to assemble a full SFrame | Before decrypting, a receiver needs to assemble a full SFrame | |||
| ciphertext. When an SFrame ciphertext is fragmented into multiple | ciphertext. When an SFrame ciphertext is fragmented into multiple | |||
| parts for transport (e.g., a whole encrypted frame sent in multiple | parts for transport (e.g., a whole encrypted frame sent in multiple | |||
| SRTP packets), the receiving client collects all the fragments of the | SRTP packets), the receiving client collects all the fragments of the | |||
| ciphertext, using appropriate sequencing and start/end markers in the | ciphertext, using appropriate sequencing and start/end markers in the | |||
| transport. Once all of the required fragments are available, the | transport. Once all of the required fragments are available, the | |||
| client reassembles them into the SFrame ciphertext, then it passes | client reassembles them into the SFrame ciphertext and passes the | |||
| the ciphertext to SFrame for decryption. | ciphertext to SFrame for decryption. | |||
| The KID field in the SFrame header is used to find the right key and | The KID field in the SFrame header is used to find the right key and | |||
| salt for the encrypted frame, and the CTR field is used to construct | salt for the encrypted frame, and the CTR field is used to construct | |||
| the nonce. The SFrame decryption procedure is as follows: | the nonce. The SFrame decryption procedure is as follows: | |||
| def decrypt(metadata, sframe_ciphertext): | def decrypt(metadata, sframe_ciphertext): | |||
| KID, CTR, header, ciphertext = parse_ciphertext(sframe_ciphertext) | KID, CTR, header, ciphertext = parse_ciphertext(sframe_ciphertext) | |||
| sframe_key, sframe_salt = key_store[KID] | sframe_key, sframe_salt = key_store[KID] | |||
| skipping to change at page 14, line 28 ¶ | skipping to change at line 607 ¶ | |||
| return AEAD.Decrypt(sframe_key, nonce, aad, ciphertext) | return AEAD.Decrypt(sframe_key, nonce, aad, ciphertext) | |||
| If a ciphertext fails to decrypt because there is no key available | If a ciphertext fails to decrypt because there is no key available | |||
| for the KID in the SFrame header, the client MAY buffer the | for the KID in the SFrame header, the client MAY buffer the | |||
| ciphertext and retry decryption once a key with that KID is received. | ciphertext and retry decryption once a key with that KID is received. | |||
| If a ciphertext fails to decrypt for any other reason, the client | If a ciphertext fails to decrypt for any other reason, the client | |||
| MUST discard the ciphertext. Invalid ciphertexts SHOULD be discarded | MUST discard the ciphertext. Invalid ciphertexts SHOULD be discarded | |||
| in a way that is indistinguishable (to an external observer) from | in a way that is indistinguishable (to an external observer) from | |||
| having processed a valid ciphertext. In other words, the SFrame | having processed a valid ciphertext. In other words, the SFrame | |||
| decrypt operation should be constant time, regardless of whether | decrypt operation should take the same amount of time regardless of | |||
| decryption succeeds or fails. | whether decryption succeeds or fails. | |||
| SFrame Ciphertext | SFrame Ciphertext | |||
| +---------------+ | +---------------+ | |||
| +---------------| SFrame Header | | +---------------| SFrame Header | | |||
| | +---------------+ | | +---------------+ | |||
| | | | | | | | | |||
| | | |-----+ | | | |-----+ | |||
| | | ciphertext | | | | | ciphertext | | | |||
| | | | | | | | | | | |||
| | | | | | | | | | | |||
| skipping to change at page 15, line 43 ¶ | skipping to change at line 648 ¶ | |||
| | | | | |||
| V | V | |||
| +---------------+ | +---------------+ | |||
| | | | | | | |||
| | | | | | | |||
| | plaintext | | | plaintext | | |||
| | | | | | | |||
| | | | | | | |||
| +---------------+ | +---------------+ | |||
| Figure 5: Decrypting an SFrame Ciphertext | Figure 6: Decrypting an SFrame Ciphertext | |||
| 4.5. Cipher Suites | 4.5. Cipher Suites | |||
| Each SFrame session uses a single cipher suite that specifies the | Each SFrame session uses a single cipher suite that specifies the | |||
| following primitives: | following primitives: | |||
| * A hash function used for key derivation | * A hash function used for key derivation | |||
| * An AEAD encryption algorithm [RFC5116] used for frame encryption, | * An AEAD encryption algorithm [RFC5116] used for frame encryption, | |||
| optionally with a truncated authentication tag | optionally with a truncated authentication tag | |||
| This document defines the following cipher suites, with the constants | This document defines the following cipher suites, with the constants | |||
| defined in Section 4.4: | defined in Section 4.4: | |||
| +============================+====+=====+====+====+====+ | +============================+====+=====+====+====+====+ | |||
| | Name | Nh | Nka | Nk | Nn | Nt | | | Name | Nh | Nka | Nk | Nn | Nt | | |||
| +============================+====+=====+====+====+====+ | +============================+====+=====+====+====+====+ | |||
| | AES_128_CTR_HMAC_SHA256_80 | 32 | 16 | 48 | 12 | 10 | | | AES_128_CTR_HMAC_SHA256_80 | 32 | 16 | 48 | 12 | 10 | | |||
| skipping to change at page 17, line 9 ¶ | skipping to change at line 704 ¶ | |||
| In order to allow very short tag sizes, we define a synthetic AEAD | In order to allow very short tag sizes, we define a synthetic AEAD | |||
| function using the authenticated counter mode of AES together with | function using the authenticated counter mode of AES together with | |||
| HMAC for authentication. We use an encrypt-then-MAC approach, as in | HMAC for authentication. We use an encrypt-then-MAC approach, as in | |||
| SRTP [RFC3711]. | SRTP [RFC3711]. | |||
| Before encryption or decryption, encryption and authentication | Before encryption or decryption, encryption and authentication | |||
| subkeys are derived from the single AEAD key. The overall length of | subkeys are derived from the single AEAD key. The overall length of | |||
| the AEAD key is Nka + Nh, where Nka represents the key size for the | the AEAD key is Nka + Nh, where Nka represents the key size for the | |||
| AES block cipher in use and Nh represents the output size of the hash | AES block cipher in use and Nh represents the output size of the hash | |||
| function (as in Table 1). The encryption subkey comprises the first | function (as in Section 4.4). The encryption subkey comprises the | |||
| Nka bytes and the authentication subkey comprises the remaining Nh | first Nka bytes and the authentication subkey comprises the remaining | |||
| bytes. | Nh bytes. | |||
| def derive_subkeys(sframe_key): | def derive_subkeys(sframe_key): | |||
| # The encryption key comprises the first Nka bytes | # The encryption key comprises the first Nka bytes | |||
| enc_key = sframe_key[..Nka] | enc_key = sframe_key[..Nka] | |||
| # The authentication key comprises Nh remaining bytes | # The authentication key comprises Nh remaining bytes | |||
| auth_key = sframe_key[Nka..] | auth_key = sframe_key[Nka..] | |||
| return enc_key, auth_key | return enc_key, auth_key | |||
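A runnable Python rendering of the split above might look like the following. Passing nka and nh as explicit arguments and rejecting keys of the wrong length are choices made for this sketch, not requirements stated in the text.

```python
def derive_subkeys(sframe_key: bytes, nka: int, nh: int):
    # The AEAD key is the encryption subkey followed by the authentication subkey.
    if len(sframe_key) != nka + nh:
        raise ValueError("sframe_key must be exactly Nka + Nh bytes")
    enc_key = sframe_key[:nka]    # first Nka bytes
    auth_key = sframe_key[nka:]   # remaining Nh bytes
    return enc_key, auth_key
```

With the AES_128_CTR_HMAC_SHA256_80 constants (Nka = 16, Nh = 32), a 48-byte sframe_key yields a 16-byte encryption subkey and a 32-byte authentication subkey.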
| skipping to change at page 19, line 18 ¶ | skipping to change at line 778 ¶ | |||
| they can use it to distribute SFrame keys. Each client participating | they can use it to distribute SFrame keys. Each client participating | |||
| in a call generates a fresh base_key value that it will use to | in a call generates a fresh base_key value that it will use to | |||
| encrypt media. The client then uses the E2E-secure channel to send | encrypt media. The client then uses the E2E-secure channel to send | |||
| their encryption key to the other participants. | their encryption key to the other participants. | |||
| In this scheme, it is assumed that receivers have a signal outside of | In this scheme, it is assumed that receivers have a signal outside of | |||
| SFrame for which client has sent a given frame (e.g., an RTP | SFrame for which client has sent a given frame (e.g., an RTP | |||
| synchronization source (SSRC)). SFrame KID values are then used to | synchronization source (SSRC)). SFrame KID values are then used to | |||
| distinguish between versions of the sender's base_key. | distinguish between versions of the sender's base_key. | |||
| Key IDs in this scheme have two parts: a "key generation" and a | KID values in this scheme have two parts: a "key generation" and a | |||
| "ratchet step". Both are unsigned integers that begin at zero. The | "ratchet step". Both are unsigned integers that begin at zero. The | |||
| "key generation" increments each time the sender distributes a new | key generation increments each time the sender distributes a new key | |||
| key to receivers. The "ratchet step" is incremented each time the | to receivers. The ratchet step is incremented each time the sender | |||
| sender ratchets their key forward for forward secrecy: | ratchets their key forward for forward secrecy: | |||
| base_key[i+1] = HKDF-Expand( | base_key[i+1] = HKDF-Expand( | |||
| HKDF-Extract("", base_key[i]), | HKDF-Extract("", base_key[i]), | |||
| "SFrame 1.0 Ratchet", CipherSuite.Nh) | "SFrame 1.0 Ratchet", CipherSuite.Nh) | |||
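The ratchet step can be sketched in Python with the standard library as shown below. The function name ratchet and its default arguments (SHA-256 with Nh = 32) are assumptions for illustration; a real implementation would take the hash and Nh from the cipher suite in use.

```python
import hashlib
import hmac


def ratchet(base_key: bytes, nh: int = 32, hash_name: str = "sha256") -> bytes:
    # HKDF-Extract("", base_key)
    prk = hmac.new(b"", base_key, hash_name).digest()
    # HKDF-Expand(prk, "SFrame 1.0 Ratchet", Nh)
    info = b"SFrame 1.0 Ratchet"
    if nh > 255 * hashlib.new(hash_name).digest_size:
        raise ValueError("requested output too long for HKDF-Expand")
    okm, block, counter = b"", b"", 1
    while len(okm) < nh:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:nh]
```

Because each step is a one-way function of the previous key, a receiver holding base_key[i] can compute base_key[i+1] but not base_key[i-1].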
| For compactness, we do not send the whole ratchet step. Instead, we | For compactness, we do not send the whole ratchet step. Instead, we | |||
| send only its low-order R bits, where R is a value set by the | send only its low-order R bits, where R is a value set by the | |||
| application. Different senders may use different values of R, but | application. Different senders may use different values of R, but | |||
| each receiver of a given sender needs to know what value of R is used | each receiver of a given sender needs to know what value of R is used | |||
| by the sender so that they can recognize when they need to ratchet | by the sender so that they can recognize when they need to ratchet | |||
| (vs. expecting a new key). R effectively defines a reordering | (vs. expecting a new key). R effectively defines a reordering | |||
| window, since no more than 2^R ratchet steps can be active at a given | window, since no more than 2^R ratchet steps can be active at a given | |||
| time. The key generation is sent in the remaining 64 - R bits of the | time. The key generation is sent in the remaining 64 - R bits of the | |||
| Key ID. | KID. | |||
| KID = (key_generation << R) + (ratchet_step % (1 << R)) | KID = (key_generation << R) + (ratchet_step % (1 << R)) | |||
| 64-R bits R bits | 64-R bits R bits | |||
| <---------------> <------------> | <---------------> <------------> | |||
| +-----------------+--------------+ | +-----------------+--------------+ | |||
| | Key Generation | Ratchet Step | | | Key Generation | Ratchet Step | | |||
| +-----------------+--------------+ | +-----------------+--------------+ | |||
| Figure 6: Structure of a KID in the Sender Keys Scheme | Figure 7: Structure of a KID in the Sender Keys Scheme | |||
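The KID layout for this scheme can be sketched in Python as follows. The function names and the range checks are assumptions for illustration. Note that the decoder recovers only the low-order R bits of the ratchet step, which is why each receiver needs to know the value of R used by the sender.

```python
def encode_kid(key_generation: int, ratchet_step: int, r: int) -> int:
    if not 0 <= r <= 64:
        raise ValueError("R must be between 0 and 64")
    if not 0 <= key_generation < (1 << (64 - r)):
        raise ValueError("key generation does not fit in 64 - R bits")
    # Only the low-order R bits of the ratchet step are sent.
    return (key_generation << r) + (ratchet_step % (1 << r))


def decode_kid(kid: int, r: int):
    # Returns (key_generation, low-order R bits of ratchet_step).
    return kid >> r, kid & ((1 << r) - 1)
```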
| The sender signals such a ratchet step update by sending with a KID | The sender signals such a ratchet step update by sending with a KID | |||
| value in which the ratchet step has been incremented. A receiver who | value in which the ratchet step has been incremented. A receiver who | |||
| receives from a sender with a new KID computes the new key as above. | receives from a sender with a new KID computes the new key as above. | |||
| The old key may be kept for some time to allow for out-of-order | The old key may be kept for some time to allow for out-of-order | |||
| delivery, but should be deleted promptly. | delivery, but should be deleted promptly. | |||
| If a new participant joins in the middle of a session, they will need | If a new participant joins in the middle of a session, they will need | |||
| to receive from each sender (a) the current sender key for that | to receive from each sender (a) the current sender key for that | |||
| sender and (b) the current KID value for the sender. Evicting a | sender and (b) the current KID value for the sender. Evicting a | |||
| participant requires each sender to send a fresh sender key to all | participant requires each sender to send a fresh sender key to all | |||
| receivers. | receivers. | |||
| It is up to the application to decide when sender keys are updated. | It is the application's responsibility to decide when sender keys are | |||
| A sender key may be updated by sending a new base_key (updating the | updated. A sender key may be updated by sending a new base_key | |||
| key generation) or by hashing the current base_key (updating the | (updating the key generation) or by hashing the current base_key | |||
| ratchet step). Ratcheting the key forward is useful when adding new | (updating the ratchet step). Ratcheting the key forward is useful | |||
| receivers to an SFrame-based interaction, since it ensures that the | when adding new receivers to an SFrame-based interaction, since it | |||
| new receivers can't decrypt any media encrypted before they were | ensures that the new receivers can't decrypt any media encrypted | |||
| added. If a sender wishes to assure the opposite property when | before they were added. If a sender wishes to assure the opposite | |||
| removing a receiver (i.e., ensuring that the receiver can't decrypt | property when removing a receiver (i.e., ensuring that the receiver | |||
| media after they are removed), then the sender will need to | can't decrypt media after they are removed), then the sender will | |||
| distribute a new sender key. | need to distribute a new sender key. | |||
| 5.2. MLS | 5.2. MLS | |||
| The Messaging Layer Security (MLS) protocol provides group | The Messaging Layer Security (MLS) protocol provides group | |||
| authenticated key exchange [MLS-ARCH] [MLS-PROTO]. In principle, it | authenticated key exchange [MLS-ARCH] [MLS-PROTO]. In principle, it | |||
| could be used to instantiate the sender key scheme above, but it can | could be used to instantiate the sender key scheme above, but it can | |||
| also be used more efficiently directly. | also be used more efficiently directly. | |||
| MLS creates a linear sequence of keys, each of which is shared among | MLS creates a linear sequence of keys, each of which is shared among | |||
| the members of a group at a given point in time. When a member joins | the members of a group at a given point in time. When a member joins | |||
| skipping to change at page 20, line 49 ¶ | skipping to change at line 858 ¶ | |||
| member has a unique sframe_key and sframe_salt that it uses to | member has a unique sframe_key and sframe_salt that it uses to | |||
| encrypt with. Senders may choose any KID value within their assigned | encrypt with. Senders may choose any KID value within their assigned | |||
| set of KID values, e.g., to allow a single sender to send multiple, | set of KID values, e.g., to allow a single sender to send multiple, | |||
| uncoordinated outbound media streams. | uncoordinated outbound media streams. | |||
| base_key = MLS-Exporter("SFrame 1.0 Base Key", "", AEAD.Nk) | base_key = MLS-Exporter("SFrame 1.0 Base Key", "", AEAD.Nk) | |||
| For compactness, we do not send the whole epoch number. Instead, we | For compactness, we do not send the whole epoch number. Instead, we | |||
| send only its low-order E bits, where E is a value set by the | send only its low-order E bits, where E is a value set by the | |||
| application. E effectively defines a reordering window, since no | application. E effectively defines a reordering window, since no | |||
| more than 2^E epochs can be active at a given time. Receivers MUST | more than 2^E epochs can be active at a given time. To handle | |||
| be prepared for the epoch counter to roll over, removing an old epoch | rollover of the epoch counter, receivers MUST remove an old epoch | |||
| when a new epoch with the same E lower bits is introduced. | when a new epoch with the same low-order E bits is introduced. | |||
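One possible way to handle the rollover described above is to index stored epochs by their low-order E bits, so that a new epoch automatically displaces the old one occupying the same slot. The class and method names below are hypothetical, and the secret stored per epoch is treated as an opaque value.

```python
class EpochKeyStore:
    """Keeps at most one epoch per value of the low-order E bits."""

    def __init__(self, e: int):
        self.e = e
        self._slots = {}  # low-order E bits -> (full epoch, secret)

    def add_epoch(self, epoch: int, secret: bytes):
        # Returns the full epoch number that was removed, or None.
        slot = epoch % (1 << self.e)
        evicted = self._slots.get(slot)
        self._slots[slot] = (epoch, secret)
        return evicted[0] if evicted is not None else None

    def lookup(self, epoch_bits: int):
        # Returns (full epoch, secret) for the E bits carried in a KID,
        # or None if no epoch with those bits is currently stored.
        return self._slots.get(epoch_bits)
```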
| Let S be the number of bits required to encode a member index in the | Let S be the number of bits required to encode a member index in the | |||
| group, i.e., the smallest value such that group_size <= (1 << S). | group, i.e., the smallest value such that group_size <= (1 << S). | |||
| The sender index is encoded in the S bits above the epoch. The | The sender index is encoded in the S bits above the epoch. The | |||
| remaining 64 - S - E bits of the KID value are a context value chosen | remaining 64 - S - E bits of the KID value are a context value chosen | |||
| by the sender (context value 0 will produce the shortest encoded | by the sender (context value 0 will produce the shortest encoded | |||
| KID). | KID). | |||
| KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E)) | KID = (context << (S + E)) + (sender_index << E) + (epoch % (1 << E)) | |||
| 64-S-E bits S bits E bits | 64-S-E bits S bits E bits | |||
| <-----------> <------> <------> | <-----------> <------> <------> | |||
| +-------------+--------+-------+ | +-------------+--------+-------+ | |||
| | Context ID | Index | Epoch | | | Context ID | Index | Epoch | | |||
| +-------------+--------+-------+ | +-------------+--------+-------+ | |||
| Figure 7: Structure of a KID for an MLS Sender | Figure 8: Structure of a KID for an MLS Sender | |||
| Once an SFrame stack has been provisioned with the | Once an SFrame stack has been provisioned with the | |||
| sframe_epoch_secret for an epoch, it can compute the required KID | sframe_epoch_secret for an epoch, it can compute the required KID | |||
| values on demand (as well as the resulting SFrame keys/nonces derived | values on demand (as well as the resulting SFrame keys/nonces derived | |||
| from the base_key and KID) as it needs to encrypt or decrypt for a | from the base_key and KID) as it needs to encrypt or decrypt for a | |||
| given member. | given member. | |||
| ... | ... | |||
| | | | | |||
| | | | | |||
| skipping to change at page 22, line 32 ¶ | skipping to change at line 912 ¶ | |||
| | +--> context = 3 --> KID = 0xc20 | | +--> context = 3 --> KID = 0xc20 | |||
| | | | | |||
| | | | | |||
| Epoch 17 +--+-- index=33 --> KID = 0x211 | Epoch 17 +--+-- index=33 --> KID = 0x211 | |||
| | | | | | | |||
| | +-- index=51 --> KID = 0x331 | | +-- index=51 --> KID = 0x331 | |||
| | | | | |||
| | | | | |||
| ... | ... | |||
| Figure 8: An Example Sequence of KIDs for an MLS-based SFrame | Figure 9: An Example Sequence of KIDs for an MLS-based SFrame | |||
| Session (E=4; S=6, Allowing for 64 Group Members) | Session (E=4; S=6, Allowing for 64 Group Members) | |||
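The KID formula for MLS senders can be checked against the epoch 17 values in the figure with a short Python sketch. The function names are assumptions for illustration, and the context defaults to 0 because the figure shows no context for those two entries.

```python
def member_index_bits(group_size: int) -> int:
    # Smallest S such that group_size <= (1 << S).
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return (group_size - 1).bit_length()


def mls_kid(sender_index: int, epoch: int, s: int, e: int, context: int = 0) -> int:
    if not 0 <= sender_index < (1 << s):
        raise ValueError("sender index does not fit in S bits")
    if not 0 <= context < (1 << (64 - s - e)):
        raise ValueError("context does not fit in 64 - S - E bits")
    return (context << (s + e)) + (sender_index << e) + (epoch % (1 << e))
```

With E=4 and S=6, mls_kid(33, 17, 6, 4) gives 0x211 and mls_kid(51, 17, 6, 4) gives 0x331, since epoch 17 contributes only its low-order four bits (the value 1).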
| 6. Media Considerations | 6. Media Considerations | |||
| 6.1. Selective Forwarding Units | 6.1. Selective Forwarding Units | |||
| SFUs (e.g., those described in Section 3.7 of [RFC7667]) receive the | SFUs (e.g., those described in Section 3.7 of [RFC7667]) receive the | |||
| media streams from each participant and select which ones should be | media streams from each participant and select which ones should be | |||
| forwarded to each of the other participants. There are several | forwarded to each of the other participants. There are several | |||
| approaches for stream selection, but in general, the SFU needs to | approaches for stream selection, but in general, the SFU needs to | |||
| access metadata associated with each frame and modify the RTP | access metadata associated with each frame and modify the RTP | |||
| information of the incoming packets when they are transmitted to the | information of the incoming packets when they are transmitted to the | |||
| receiving participants. | receiving participants. | |||
| This section describes how these normal SFU modes of operation | This section describes how these normal SFU modes of operation | |||
| interact with the E2EE provided by SFrame. | interact with the E2EE provided by SFrame. | |||
| 6.1.1. LastN and RTP Stream Reuse | 6.1.1. RTP Stream Reuse | |||
| The SFU may choose to send only a certain number of streams based on | The SFU may choose to send only a certain number of streams based on | |||
| the voice activity of the participants. To avoid the overhead | the voice activity of the participants. To avoid the overhead | |||
| involved in establishing new transport streams, the SFU may decide to | involved in establishing new transport streams, the SFU may decide to | |||
| reuse previously existing streams or even pre-allocate a predefined | reuse previously existing streams or even pre-allocate a predefined | |||
| number of streams and choose in each moment in time which participant | number of streams and choose in each moment in time which participant | |||
| media will be sent through it. | media will be sent through it. | |||
| This means that in the same transport-level stream (e.g., an RTP | This means that the same transport-level stream (e.g., an RTP stream | |||
| stream defined by either SSRC or Media Identification (MID)) may | defined by either SSRC or Media Identification (MID)) may carry media | |||
| carry media from different streams of different participants. As | from different streams of different participants. Because each | |||
| different keys are used by each participant for encoding their media, | participant uses a different key to encrypt their media, the receiver | |||
| the receiver will be able to verify which is the sender of the media | will be able to verify the sender of the media within the RTP stream | |||
| coming within the RTP stream at any given point in time, preventing | at any given point in time. Thus the receiver will correctly | |||
| the SFU trying to impersonate any of the participants with another | associate the media with the sender indicated by the authenticated | |||
| participant's media. | SFrame KID value, irrespective of how the SFU transmits the media to | |||
| the client. | ||||
| Note that in order to prevent impersonation by a malicious | Note that in order to prevent impersonation by a malicious | |||
| participant (not the SFU), a mechanism based on digital signature | participant (not the SFU), a mechanism based on digital signature | |||
| would be required. SFrame does not protect against such attacks. | would be required. SFrame does not protect against such attacks. | |||
| 6.1.2. Simulcast | 6.1.2. Simulcast | |||
| When using simulcast, the same input image will produce N different | When using simulcast, the same input image will produce N different | |||
| encoded frames (one per simulcast layer), which would be processed | encoded frames (one per simulcast layer), which would be processed | |||
| independently by the frame encryptor and assigned a unique counter | independently by the frame encryptor and assigned a unique CTR value | |||
| for each. | for each. | |||
| 6.1.3. SVC | 6.1.3. Scalable Video Coding (SVC) | |||
| In both temporal and spatial scalability, the SFU may choose to drop | In both temporal and spatial scalability, the SFU may choose to drop | |||
| layers in order to match a certain bitrate or to forward specific | layers in order to match a certain bitrate or to forward specific | |||
| media sizes or frames per second. In order to support the SFU | media sizes or frames per second. In order to support the SFU | |||
| selectively removing layers, the sender MUST encapsulate each layer | selectively removing layers, the sender MUST encapsulate each layer | |||
| in a different SFrame ciphertext. | in a different SFrame ciphertext. | |||
| 6.2. Video Key Frames | 6.2. Video Key Frames | |||
| Forward security and post-compromise security require that the E2EE | Forward security and post-compromise security require that the E2EE | |||
| keys (base keys) are updated any time a participant joins or leaves | keys (base keys) are updated any time a participant joins or leaves | |||
| the call. | the call. | |||
| The key exchange happens asynchronously and on a different path than | The key exchange happens asynchronously and on a different path than | |||
| the SFU signaling and media. So it may happen that, when a new | the SFU signaling and media. So it may happen that when a new | |||
| participant joins the call and the SFU side requests a key frame, the | participant joins the call and the SFU side requests a key frame, the | |||
| sender generates the E2EE frame with a key that is not known by the | sender generates the E2EE frame with a key that is not known by the | |||
| receiver, so it will be discarded. When the sender updates its | receiver, so it will be discarded. When the sender updates its | |||
| sending key with the new key, it will send it in a non-key frame, so | sending key with the new key, it will send it in a non-key frame, so | |||
| the receiver will be able to decrypt it, but not decode it. | the receiver will be able to decrypt it, but not decode it. | |||
| The new receiver will then re-request a key frame, but due to sender | The new receiver will then re-request a key frame, but due to sender | |||
| and SFU policies, that new key frame could take some time to be | and SFU policies, that new key frame could take some time to be | |||
| generated. | generated. | |||
| If the sender sends a key frame after the new E2EE key is in use, the | If the sender sends a key frame after the new E2EE key is in use, the | |||
| time required for the new participant to display the video is | time required for the new participant to display the video is | |||
| minimized. | minimized. | |||
| Note that this issue does not arise for media streams that do not | Note that this issue does not arise for media streams that do not | |||
| have dependencies among frames, e.g., audio streams. In these | have dependencies among frames, e.g., audio streams. In these | |||
| streams, each frame is independently decodable, so there is never a | streams, each frame is independently decodable, so a frame never | |||
| need to process together two frames that might be on two sides of a | depends on another frame that might be on the other side of a key | |||
| key rotation. | rotation. | |||
| 6.3. Partial Decoding | 6.3. Partial Decoding | |||
| Some codecs support partial decoding, where individual packets can be | Some codecs support partial decoding, where individual packets can be | |||
| decoded without waiting for the full frame to arrive. When SFrame is | decoded without waiting for the full frame to arrive. When SFrame is | |||
| applied per frame, partial decoding is not possible because the | applied per frame, partial decoding is not possible because the | |||
| decoder cannot access data until an entire frame has arrived and has | decoder cannot access data until an entire frame has arrived and has | |||
| been decrypted. | been decrypted. | |||
| 7. Security Considerations | 7. Security Considerations | |||
| 7.1. No Header Confidentiality | 7.1. No Header Confidentiality | |||
| SFrame provides integrity protection to the SFrame header (the Key ID | SFrame provides integrity protection to the SFrame header (the KID | |||
| and counter values), but it does not provide confidentiality | and CTR values), but it does not provide confidentiality protection. | |||
| protection. Parties that can observe the SFrame header may learn, | Parties that can observe the SFrame header may learn, for example, | |||
| for example, which parties are sending SFrame payloads (from KID | which parties are sending SFrame payloads (from KID values) and at | |||
| values) and at what rates (from CTR values). In cases where SFrame | what rates (from CTR values). In cases where SFrame is used for end- | |||
| is used for end-to-end security on top of hop-by-hop protections | to-end security on top of hop-by-hop protections (e.g., running over | |||
| (e.g., running over SRTP as described in Appendix B.5), the hop-by- | SRTP as described in Appendix B.5), the hop-by-hop security | |||
| hop security mechanisms provide confidentiality protection of the | mechanisms provide confidentiality protection of the SFrame header | |||
| SFrame header between hops. | between hops. | |||
| 7.2. No per-Sender Authentication | 7.2. No Per-Sender Authentication | |||
| SFrame does not provide per-sender authentication of media data. Any | SFrame does not provide per-sender authentication of media data. Any | |||
| sender in a session can send media that will be associated with any | sender in a session can send media that will be associated with any | |||
| other sender. This is because SFrame uses symmetric encryption to | other sender. This is because SFrame uses symmetric encryption to | |||
| protect media data, so that any receiver also has the keys required | protect media data, so that any receiver also has the keys required | |||
| to encrypt packets for the sender. | to encrypt packets for the sender. | |||
| 7.3. Key Management | 7.3. Key Management | |||
| The key exchange mechanism is out of scope of this document; however, | The specifics of key management are beyond the scope of this | |||
| every client SHOULD change their keys when new clients join or leave | document. However, every client SHOULD change their keys when new | |||
| the call for forward secrecy and post-compromise security. | clients join or leave the call for forward secrecy and post- | |||
| compromise security. | ||||
| 7.4. Replay | 7.4. Replay | |||
| The handling of replay is out of the scope of this document. | The handling of replay is out of the scope of this document. | |||
| However, senders MUST reject requests to encrypt multiple times with | However, senders MUST reject requests to encrypt multiple times with | |||
| the same key and nonce since several AEAD algorithms fail badly in | the same key and nonce since several AEAD algorithms fail badly in | |||
| such cases (see, e.g., Section 5.1.1 of [RFC5116]). | such cases (see, e.g., Section 5.1.1 of [RFC5116]). | |||
| 7.5. Risks Due to Short Tags | 7.5. Risks Due to Short Tags | |||
| skipping to change at page 26, line 14 | skipping to change at line 1073 | |||
| * Receivers only accept SFrame ciphertexts over HBH-secure channels | * Receivers only accept SFrame ciphertexts over HBH-secure channels | |||
| (e.g., SRTP security associations or QUIC connections). If this | (e.g., SRTP security associations or QUIC connections). If this | |||
| is the case, only an entity that is part of such a channel can | is the case, only an entity that is part of such a channel can | |||
| mount the above attack. | mount the above attack. | |||
| * The expected packet rate for a media stream is very predictable | * The expected packet rate for a media stream is very predictable | |||
| (and typically far lower than the above example). On the one | (and typically far lower than the above example). On the one | |||
| hand, attacks at this rate will succeed even less often than the | hand, attacks at this rate will succeed even less often than the | |||
| high-rate attack described above. On the other hand, the | high-rate attack described above. On the other hand, the | |||
| application may use an elevated packet-arrival rate as a signal of | application may use an elevated packet arrival rate as a signal of | |||
| a brute-force attack. This latter approach is common in other | a brute-force attack. This latter approach is common in other | |||
| settings, e.g., mitigating brute-force attacks on passwords. | settings, e.g., mitigating brute-force attacks on passwords. | |||
| * Media applications typically do not provide feedback to media | * Media applications typically do not provide feedback to media | |||
| senders as to which media packets failed to decrypt. When media- | senders as to which media packets failed to decrypt. When media- | |||
| quality feedback mechanisms are used, decryption failures will | quality feedback mechanisms are used, decryption failures will | |||
| typically appear as packet losses, but only at an aggregate level. | typically appear as packet losses, but only at an aggregate level. | |||
| * Anti-replay mechanisms (see Section 7.4) prevent the attacker from | * Anti-replay mechanisms (see Section 7.4) prevent the attacker from | |||
| reusing valid ciphertexts (either observed or guessed by the | reusing valid ciphertexts (either observed or guessed by the | |||
| skipping to change at page 26, line 39 | skipping to change at line 1098 | |||
| encrypted content is unchanged. In other words, when the above | encrypted content is unchanged. In other words, when the above | |||
| brute-force attack succeeds, it only allows the attacker to send a | brute-force attack succeeds, it only allows the attacker to send a | |||
| single SFrame ciphertext; the ciphertext cannot be reused because | single SFrame ciphertext; the ciphertext cannot be reused because | |||
| either it will have the same CTR value and be discarded as a | either it will have the same CTR value and be discarded as a | |||
| replay, or else it will have a different CTR value and its tag | replay, or else it will have a different CTR value and its tag | |||
| will no longer be valid. | will no longer be valid. | |||
| Nonetheless, without these mitigations, an application that makes use | Nonetheless, without these mitigations, an application that makes use | |||
| of short tags will be at heightened risk of forgery attacks. In many | of short tags will be at heightened risk of forgery attacks. In many | |||
| cases, it is simpler to use full-size tags and tolerate slightly | cases, it is simpler to use full-size tags and tolerate slightly | |||
| higher-bandwidth usage rather than to add the additional defenses | higher bandwidth usage rather than to add the additional defenses | |||
| necessary to safely use short tags. | necessary to safely use short tags. | |||
| 8. IANA Considerations | 8. IANA Considerations | |||
| IANA has created a new registry called "SFrame Cipher Suites" | IANA has created a new registry called "SFrame Cipher Suites" | |||
| (Section 8.1) under the "SFrame" group registry heading. Assignments | (Section 8.1) under the "SFrame" group registry heading. | |||
| are made via the Specification Required policy [RFC8126]. | ||||
| 8.1. SFrame Cipher Suites | 8.1. SFrame Cipher Suites | |||
| The "SFrame Cipher Suites" registry lists identifiers for SFrame | The "SFrame Cipher Suites" registry lists identifiers for SFrame | |||
| cipher suites as defined in Section 4.5. The cipher suite field is | cipher suites as defined in Section 4.5. The cipher suite field is | |||
| two bytes wide, so the valid cipher suites are in the range 0x0000 to | two bytes wide, so the valid cipher suites are in the range 0x0000 to | |||
| 0xFFFF. | 0xFFFF. Except as noted below, assignments are made via the | |||
| Specification Required policy [RFC8126]. | ||||
| The registration template is as follows: | The registration template is as follows: | |||
| * Value: The numeric value of the cipher suite | * Value: The numeric value of the cipher suite | |||
| * Name: The name of the cipher suite | * Name: The name of the cipher suite | |||
| * Recommended: Whether support for this cipher suite is recommended | * Recommended: Whether support for this cipher suite is recommended | |||
| by the IETF. Valid values are "Y", "N", and "D" as described in | by the IETF. Valid values are "Y", "N", and "D" as described in | |||
| Section 17.1 of [MLS-PROTO]. The default value of the | Section 17.1 of [MLS-PROTO]. The default value of the | |||
| skipping to change at page 28, line 34 | skipping to change at line 1163 | |||
| +--------+----------------------------+---+-----------+------------+ | +--------+----------------------------+---+-----------+------------+ | |||
| Table 2: SFrame Cipher Suites | Table 2: SFrame Cipher Suites | |||
| 9. Application Responsibilities | 9. Application Responsibilities | |||
| To use SFrame, an application needs to define the inputs to the | To use SFrame, an application needs to define the inputs to the | |||
| SFrame encryption and decryption operations, and how SFrame | SFrame encryption and decryption operations, and how SFrame | |||
| ciphertexts are delivered from sender to receiver (including any | ciphertexts are delivered from sender to receiver (including any | |||
| fragmentation and reassembly). In this section, we lay out | fragmentation and reassembly). In this section, we lay out | |||
| additional requirements that an implementation must meet in order for | additional requirements that an application must meet in order for | |||
| SFrame to operate securely. | SFrame to operate securely. | |||
| In general, an application using SFrame is responsible for | In general, an application using SFrame is responsible for | |||
| configuring SFrame. The application must first define when SFrame is | configuring SFrame. The application must first define when SFrame is | |||
| applied at all. When SFrame is applied, the application must define | applied at all. When SFrame is applied, the application must define | |||
| which cipher suite is to be used. If new versions of SFrame are | which cipher suite is to be used. If new versions of SFrame are | |||
| defined in the future, it will be up to the application to determine | defined in the future, it will be the application's responsibility to | |||
| which version should be used. | determine which version should be used. | |||
| This division of responsibilities is similar to the way other media | This division of responsibilities is similar to the way other media | |||
| parameters (e.g., codecs) are typically handled in media | parameters (e.g., codecs) are typically handled in media | |||
| applications, in the sense that they are set up in some signaling | applications, in the sense that they are set up in some signaling | |||
| protocol and not described in the media. Applications might find it | protocol and not described in the media. Applications might find it | |||
| useful to extend the protocols used for negotiating other media | useful to extend the protocols used for negotiating other media | |||
| parameters (e.g., Session Description Protocol (SDP) [RFC8866]) to | parameters (e.g., Session Description Protocol (SDP) [RFC8866]) to | |||
| also negotiate parameters for SFrame. | also negotiate parameters for SFrame. | |||
| 9.1. Header Value Uniqueness | 9.1. Header Value Uniqueness | |||
| skipping to change at page 29, line 27 | skipping to change at line 1203 | |||
| persistent storage, this context needs to include the last-used CTR | persistent storage, this context needs to include the last-used CTR | |||
| value. When the context is used later, the application should use | value. When the context is used later, the application should use | |||
| the stored CTR value to determine the next CTR value to be used in an | the stored CTR value to determine the next CTR value to be used in an | |||
| encryption operation, and then write the next CTR value back to | encryption operation, and then write the next CTR value back to | |||
| storage before using the CTR value for encryption. Storing the CTR | storage before using the CTR value for encryption. Storing the CTR | |||
| value before usage (vs. after) helps ensure that a storage failure | value before usage (vs. after) helps ensure that a storage failure | |||
| will not cause reuse of the same (base_key, KID, CTR) combination. | will not cause reuse of the same (base_key, KID, CTR) combination. | |||
| 9.2. Key Management Framework | 9.2. Key Management Framework | |||
| It is up to the application to provision SFrame with a mapping of KID | The application is responsible for provisioning SFrame with a mapping | |||
| values to base_key values and the resulting keys and salts. More | of KID values to base_key values and the resulting keys and salts. | |||
| importantly, the application specifies which KID values are used for | More importantly, the application specifies which KID values are used | |||
| which purposes (e.g., by which senders). An application's KID | for which purposes (e.g., by which senders). An application's KID | |||
| assignment strategy MUST be structured to assure the non-reuse | assignment strategy MUST be structured to assure the non-reuse | |||
| properties discussed in Section 9.1. | properties discussed in Section 9.1. | |||
| It is also up to the application to define a rotation schedule for | The application is also responsible for defining a rotation schedule | |||
| keys. For example, one application might have an ephemeral group for | for keys. For example, one application might have an ephemeral group | |||
| every call and keep rotating keys when endpoints join or leave the | for every call and keep rotating keys when endpoints join or leave | |||
| call, while another application could have a persistent group that | the call, while another application could have a persistent group | |||
| can be used for multiple calls and simply derives ephemeral symmetric | that can be used for multiple calls and simply derives ephemeral | |||
| keys for a specific call. | symmetric keys for a specific call. | |||
| It should be noted that KID values are not encrypted by SFrame and | It should be noted that KID values are not encrypted by SFrame and | |||
| are thus visible to any application-layer intermediaries that might | are thus visible to any application-layer intermediaries that might | |||
| handle an SFrame ciphertext. If there are application semantics | handle an SFrame ciphertext. If there are application semantics | |||
| included in KID values, then this information would be exposed to | included in KID values, then this information would be exposed to | |||
| intermediaries. For example, in the scheme of Section 5.1, the | intermediaries. For example, in the scheme of Section 5.1, the | |||
| number of ratchet steps per sender is exposed, and in the scheme of | number of ratchet steps per sender is exposed, and in the scheme of | |||
| Section 5.2, the number of epochs and the MLS sender ID of the SFrame | Section 5.2, the number of epochs and the MLS sender ID of the SFrame | |||
| sender are exposed. | sender are exposed. | |||
| skipping to change at page 30, line 21 | skipping to change at line 1242 | |||
| key and nonce. | key and nonce. | |||
| It is not mandatory to implement anti-replay on the receiver side. | It is not mandatory to implement anti-replay on the receiver side. | |||
| Receivers MAY apply time- or counter-based anti-replay mitigations. | Receivers MAY apply time- or counter-based anti-replay mitigations. | |||
| For example, Section 3.3.2 of [RFC3711] specifies a counter-based | For example, Section 3.3.2 of [RFC3711] specifies a counter-based | |||
| anti-replay mitigation, which could be adapted to use with SFrame, | anti-replay mitigation, which could be adapted to use with SFrame, | |||
| using the CTR field as the counter. | using the CTR field as the counter. | |||
| 9.4. Metadata | 9.4. Metadata | |||
| The metadata input to SFrame operations is pure application-specified | The metadata input to SFrame operations is an opaque byte string | |||
| data. As such, it is up to the application to define what | specified by the application. As such, the application needs to | |||
| information should go in the metadata input and ensure that it is | define what information should go in the metadata input and ensure | |||
| provided to the encryption and decryption functions at the | that it is provided to the encryption and decryption functions at the | |||
| appropriate points. A receiver MUST NOT use SFrame-authenticated | appropriate points. A receiver MUST NOT use SFrame-authenticated | |||
| metadata until after the SFrame decrypt function has authenticated | metadata until after the SFrame decrypt function has authenticated | |||
| it, unless the purpose of such usage is to prepare an SFrame | it, unless the purpose of such usage is to prepare an SFrame | |||
| ciphertext for SFrame decryption. Essentially, metadata may be used | ciphertext for SFrame decryption. Essentially, metadata may be used | |||
| "upstream of SFrame" in a processing pipeline, but only to prepare | "upstream of SFrame" in a processing pipeline, but only to prepare | |||
| for SFrame decryption. | for SFrame decryption. | |||
| For example, consider an application where SFrame is used to encrypt | For example, consider an application where SFrame is used to encrypt | |||
| audio frames that are sent over SRTP, with some application data | audio frames that are sent over SRTP, with some application data | |||
| included in the RTP header extension. Suppose the application also | included in the RTP header extension. Suppose the application also | |||
| skipping to change at page 30, line 52 | skipping to change at line 1273 | |||
| data. | data. | |||
| 10. References | 10. References | |||
| 10.1. Normative References | 10.1. Normative References | |||
| [MLS-PROTO] | [MLS-PROTO] | |||
| Barnes, R., Beurdouche, B., Robert, R., Millican, J., | Barnes, R., Beurdouche, B., Robert, R., Millican, J., | |||
| Omara, E., and K. Cohn-Gordon, "The Messaging Layer | Omara, E., and K. Cohn-Gordon, "The Messaging Layer | |||
| Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420, | Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420, | |||
| July 2023, <https://www.rfc-editor.org/rfc/rfc9420>. | July 2023, <https://www.rfc-editor.org/info/rfc9420>. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <https://www.rfc-editor.org/rfc/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated | [RFC5116] McGrew, D., "An Interface and Algorithms for Authenticated | |||
| Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008, | Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008, | |||
| <https://www.rfc-editor.org/rfc/rfc5116>. | <https://www.rfc-editor.org/info/rfc5116>. | |||
| [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand | [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand | |||
| Key Derivation Function (HKDF)", RFC 5869, | Key Derivation Function (HKDF)", RFC 5869, | |||
| DOI 10.17487/RFC5869, May 2010, | DOI 10.17487/RFC5869, May 2010, | |||
| <https://www.rfc-editor.org/rfc/rfc5869>. | <https://www.rfc-editor.org/info/rfc5869>. | |||
| [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for | |||
| Writing an IANA Considerations Section in RFCs", BCP 26, | Writing an IANA Considerations Section in RFCs", BCP 26, | |||
| RFC 8126, DOI 10.17487/RFC8126, June 2017, | RFC 8126, DOI 10.17487/RFC8126, June 2017, | |||
| <https://www.rfc-editor.org/rfc/rfc8126>. | <https://www.rfc-editor.org/info/rfc8126>. | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/rfc/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| 10.2. Informative References | 10.2. Informative References | |||
| [I-D.codec-agnostic-rtp-payload-format] | ||||
| Murillo, S. G. and A. Gouaillard, "Codec agnostic RTP | ||||
| payload format for video", Work in Progress, Internet- | ||||
| Draft, draft-codec-agnostic-rtp-payload-format-00, 19 | ||||
| February 2021, <https://datatracker.ietf.org/doc/html/ | ||||
| draft-codec-agnostic-rtp-payload-format-00>. | ||||
| [I-D.ietf-moq-transport] | ||||
| Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and | ||||
| I. Swett, "Media over QUIC Transport", Work in Progress, | ||||
| Internet-Draft, draft-ietf-moq-transport-04, 29 May 2024, | ||||
| <https://datatracker.ietf.org/doc/html/draft-ietf-moq- | ||||
| transport-04>. | ||||
| [I-D.ietf-webtrans-overview] | ||||
| Vasiliev, V., "The WebTransport Protocol Framework", Work | ||||
| in Progress, Internet-Draft, draft-ietf-webtrans-overview- | ||||
| 07, 4 March 2024, <https://datatracker.ietf.org/doc/html/ | ||||
| draft-ietf-webtrans-overview-07>. | ||||
| [MLS-ARCH] Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., and | [MLS-ARCH] Beurdouche, B., Rescorla, E., Omara, E., Inguva, S., and | |||
| A. Duric, "The Messaging Layer Security (MLS) | A. Duric, "The Messaging Layer Security (MLS) | |||
| Architecture", Work in Progress, Internet-Draft, draft- | Architecture", Work in Progress, Internet-Draft, draft- | |||
| ietf-mls-architecture-13, 22 March 2024, | ietf-mls-architecture-14, 8 July 2024, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-mls- | <https://datatracker.ietf.org/doc/html/draft-ietf-mls- | |||
| architecture-13>. | architecture-14>. | |||
| [MOQ-TRANSPORT] | ||||
| Curley, L., Pugin, K., Nandakumar, S., Vasiliev, V., and | ||||
| I. Swett, Ed., "Media over QUIC Transport", Work in | ||||
| Progress, Internet-Draft, draft-ietf-moq-transport-05, 8 | ||||
| July 2024, <https://datatracker.ietf.org/doc/html/draft- | ||||
| ietf-moq-transport-05>. | ||||
| [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. | |||
| Norrman, "The Secure Real-time Transport Protocol (SRTP)", | Norrman, "The Secure Real-time Transport Protocol (SRTP)", | |||
| RFC 3711, DOI 10.17487/RFC3711, March 2004, | RFC 3711, DOI 10.17487/RFC3711, March 2004, | |||
| <https://www.rfc-editor.org/rfc/rfc3711>. | <https://www.rfc-editor.org/info/rfc3711>. | |||
| [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | |||
| Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, | Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, | |||
| September 2012, <https://www.rfc-editor.org/rfc/rfc6716>. | September 2012, <https://www.rfc-editor.org/info/rfc6716>. | |||
| [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | |||
| B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | |||
| for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | |||
| DOI 10.17487/RFC7656, November 2015, | DOI 10.17487/RFC7656, November 2015, | |||
| <https://www.rfc-editor.org/rfc/rfc7656>. | <https://www.rfc-editor.org/info/rfc7656>. | |||
| [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | |||
| DOI 10.17487/RFC7667, November 2015, | DOI 10.17487/RFC7667, November 2015, | |||
| <https://www.rfc-editor.org/rfc/rfc7667>. | <https://www.rfc-editor.org/info/rfc7667>. | |||
| [RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, | [RFC8723] Jennings, C., Jones, P., Barnes, R., and A.B. Roach, | |||
| "Double Encryption Procedures for the Secure Real-Time | "Double Encryption Procedures for the Secure Real-Time | |||
| Transport Protocol (SRTP)", RFC 8723, | Transport Protocol (SRTP)", RFC 8723, | |||
| DOI 10.17487/RFC8723, April 2020, | DOI 10.17487/RFC8723, April 2020, | |||
| <https://www.rfc-editor.org/rfc/rfc8723>. | <https://www.rfc-editor.org/info/rfc8723>. | |||
| [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: | |||
| Session Description Protocol", RFC 8866, | Session Description Protocol", RFC 8866, | |||
| DOI 10.17487/RFC8866, January 2021, | DOI 10.17487/RFC8866, January 2021, | |||
| <https://www.rfc-editor.org/rfc/rfc8866>. | <https://www.rfc-editor.org/info/rfc8866>. | |||
| [RTP-PAYLOAD] | ||||
| Murillo, S. G., Fablet, Y., and A. Gouaillard, "Codec | ||||
| agnostic RTP payload format for video", Work in Progress, | ||||
| Internet-Draft, draft-gouaillard-avtcore-codec-agn-rtp- | ||||
| payload-01, 9 March 2021, | ||||
| <https://datatracker.ietf.org/doc/html/draft-gouaillard- | ||||
| avtcore-codec-agn-rtp-payload-01>. | ||||
| [TestVectors] | [TestVectors] | |||
| "SFrame Test Vectors", commit 025d568, September 2023, | "SFrame Test Vectors", commit 025d568, September 2023, | |||
| <https://github.com/sframe-wg/sframe/blob/main/test- | <https://github.com/sframe-wg/sframe/blob/025d568/test- | |||
| vectors/test-vectors.json>. | vectors/test-vectors.json>. | |||
| [WEBTRANSPORT] | ||||
| Vasiliev, V., "The WebTransport Protocol Framework", Work | ||||
| in Progress, Internet-Draft, draft-ietf-webtrans-overview- | ||||
| 07, 4 March 2024, <https://datatracker.ietf.org/doc/html/ | ||||
| draft-ietf-webtrans-overview-07>. | ||||
| Appendix A. Example API | Appendix A. Example API | |||
| *This section is not normative.* | *This section is not normative.* | |||
| This section describes a notional API that an SFrame implementation | This section describes a notional API that an SFrame implementation | |||
| might expose. The core concept is an "SFrame context", within which | might expose. The core concept is an "SFrame context", within which | |||
| KID values are meaningful. In the key management scheme described in | KID values are meaningful. In the key management scheme described in | |||
| Section 5.1, each sender has a different context; in the scheme | Section 5.1, each sender has a different context; in the scheme | |||
| described in Section 5.2, all senders share the same context. | described in Section 5.2, all senders share the same context. | |||
| skipping to change at page 33, line 18 | skipping to change at line 1386 | |||
| operations). A key context tracks the key and salt associated to the | operations). A key context tracks the key and salt associated to the | |||
| KID, and the current CTR value. A key context to be used for sending | KID, and the current CTR value. A key context to be used for sending | |||
| also tracks the next CTR value to be used. | also tracks the next CTR value to be used. | |||
| The primary operations on an SFrame context are as follows: | The primary operations on an SFrame context are as follows: | |||
| * *Create an SFrame context:* The context is initialized with a | * *Create an SFrame context:* The context is initialized with a | |||
| cipher suite and no KID mappings. | cipher suite and no KID mappings. | |||
| * *Add a key for sending:* The key and salt are derived from the | * *Add a key for sending:* The key and salt are derived from the | |||
| base key, and are used to initialize a send context, together with | base key and used to initialize a send context, together with a | |||
| a zero counter value. | zero CTR value. | |||
| * *Add a key for receiving:* The key and salt are derived from the | * *Add a key for receiving:* The key and salt are derived from the | |||
| base key, and are used to initialize a receive context. | base key and used to initialize a receive context. | |||
| * *Encrypt a plaintext:* Encrypt a given plaintext using the key for | * *Encrypt a plaintext:* Encrypt a given plaintext using the key for | |||
| a given KID, including the specified metadata. | a given KID, including the specified metadata. | |||
| * *Decrypt an SFrame ciphertext:* Decrypt an SFrame ciphertext with | * *Decrypt an SFrame ciphertext:* Decrypt an SFrame ciphertext with | |||
| the KID and CTR values specified in the SFrame header, and the | the KID and CTR values specified in the SFrame header, and the | |||
| provided metadata. | provided metadata. | |||
| Figure 9 shows an example of the types of structures and methods that | Figure 10 shows an example of the types of structures and methods | |||
| could be used to create an SFrame API in Rust. | that could be used to create an SFrame API in Rust. | |||
| type KeyId = u64; | type KeyId = u64; | |||
| type Counter = u64; | type Counter = u64; | |||
| type CipherSuite = u16; | type CipherSuite = u16; | |||
| struct SendKeyContext { | struct SendKeyContext { | |||
| key: Vec<u8>, | key: Vec<u8>, | |||
| salt: Vec<u8>, | salt: Vec<u8>, | |||
| next_counter: Counter, | next_counter: Counter, | |||
| } | } | |||
| skipping to change at page 34, line 35 | skipping to change at line 1432 | |||
| trait SFrameContextMethods { | trait SFrameContextMethods { | |||
| fn create(cipher_suite: CipherSuite) -> Self; | fn create(cipher_suite: CipherSuite) -> Self; | |||
| fn add_send_key(&self, kid: KeyId, base_key: &[u8]); | fn add_send_key(&self, kid: KeyId, base_key: &[u8]); | |||
| fn add_recv_key(&self, kid: KeyId, base_key: &[u8]); | fn add_recv_key(&self, kid: KeyId, base_key: &[u8]); | |||
| fn encrypt(&mut self, kid: KeyId, metadata: &[u8], | fn encrypt(&mut self, kid: KeyId, metadata: &[u8], | |||
| plaintext: &[u8]) -> Vec<u8>; | plaintext: &[u8]) -> Vec<u8>; | |||
| fn decrypt(&self, metadata: &[u8], ciphertext: &[u8]) -> Vec<u8>; | fn decrypt(&self, metadata: &[u8], ciphertext: &[u8]) -> Vec<u8>; | |||
| } | } | |||
| Figure 9: An Example SFrame API | Figure 10: An Example SFrame API | |||
| Appendix B. Overhead Analysis | Appendix B. Overhead Analysis | |||
| Any use of SFrame will impose overhead in terms of the amount of | Any use of SFrame will impose overhead in terms of the amount of | |||
| bandwidth necessary to transmit a given media stream. Exactly how | bandwidth necessary to transmit a given media stream. Exactly how | |||
| much overhead will be added depends on several factors: | much overhead will be added depends on several factors: | |||
| * The number of senders involved in a conference (length of KID) | * The number of senders involved in a conference (length of KID) | |||
| * The duration of the conference (length of CTR) | * The duration of the conference (length of CTR) | |||
| skipping to change at page 35, line 24 | skipping to change at line 1468 | |||
| In the remainder of this section, we compute overhead estimates for a | In the remainder of this section, we compute overhead estimates for a | |||
| collection of common scenarios. | collection of common scenarios. | |||
| B.1. Assumptions | B.1. Assumptions | |||
| In the below calculations, we make conservative assumptions about | In the below calculations, we make conservative assumptions about | |||
| SFrame overhead so that the overhead amounts we compute here are | SFrame overhead so that the overhead amounts we compute here are | |||
| likely to be an upper bound of those seen in practice. | likely to be an upper bound of those seen in practice. | |||
| +==============+=======+================================+ | +==============+=======+============================+ | |||
| | Field | Bytes | Explanation | | | Field | Bytes | Explanation | | |||
| +==============+=======+================================+ | +==============+=======+============================+ | |||
| | Fixed header | 1 | Fixed | | | Config byte | 1 | Fixed | | |||
| +--------------+-------+--------------------------------+ | +--------------+-------+----------------------------+ | |||
| | Key ID (KID) | 2 | >255 senders; or MLS epoch | | | Key ID (KID) | 2 | >255 senders; or MLS epoch | | |||
| | | | (E=4) and >16 senders | | | | | (E=4) and >16 senders | | |||
| +--------------+-------+--------------------------------+ | +--------------+-------+----------------------------+ | |||
| | Counter | 3 | More than 24 hours of media in | | | Counter | 3 | More than 24 hours of | | |||
| | (CTR) | | common cases | | | (CTR) | | media in common cases | | |||
| +--------------+-------+--------------------------------+ | +--------------+-------+----------------------------+ | |||
| | Cipher | 16 | Full Galois/Counter Mode (GCM) | | | Cipher | 16 | Full authentication tag | | |||
| | overhead | | tag (longest defined here) | | | overhead | | (longest defined here) | | |||
| +--------------+-------+--------------------------------+ | +--------------+-------+----------------------------+ | |||
| Table 3: Overhead Analysis Assumptions | Table 3: Overhead Analysis Assumptions | |||
| In total, then, we assume that each SFrame encryption will add 22 | In total, then, we assume that each SFrame encryption will add 22 | |||
| bytes of overhead. | bytes of overhead. | |||
| We consider two scenarios: applying SFrame per frame and per packet. | We consider two scenarios: applying SFrame per frame and per packet. | |||
| In each scenario, we compute the SFrame overhead in absolute terms | In each scenario, we compute the SFrame overhead in absolute terms | |||
| (kbps) and as a percentage of the base bandwidth. | (kbps) and as a percentage of the base bandwidth. | |||
| B.2. Audio | B.2. Audio | |||
| In audio streams, there is typically a one-to-one relationship | In audio streams, there is typically a one-to-one relationship | |||
| between frames and packets, so the overhead is the same whether one | between frames and packets, so the overhead is the same whether one | |||
| uses SFrame at a per-packet or per-frame level. | uses SFrame at a per-packet or per-frame level. | |||
| Table 4 considers three scenarios that are based on recommended | Table 4 considers three scenarios that are based on recommended | |||
| configurations of the Opus codec [RFC6716]: | configurations of the Opus codec [RFC6716] (where "fps" stands for | |||
| | "frames per second"): | |||
| * Narrow-band (NB) speech: 120 ms packets, 8 kbps | ||||
| * Full-band (FB) speech: 20 ms packets, 32 kbps | ||||
| * Full-band stereo music: 10 ms packets, 128 kbps | ||||
| +================+==============+======+==========+==========+ | +==============+==============+=====+======+==========+==========+ | |||
| | Scenario | Frames per | Base | Overhead | Overhead | | | Scenario | Frame length | fps | Base | Overhead | Overhead | | |||
| | | Second (fps) | kbps | kbps | % | | | | | | kbps | kbps | % | | |||
| +================+==============+======+==========+==========+ | +==============+==============+=====+======+==========+==========+ | |||
| | NB speech, 120 | 8.3 | 8 | 1.4 | 17.9% | | | Narrow-band | 120 ms | 8.3 | 8 | 1.4 | 17.9% | | |||
| | ms packets | | | | | | | speech | | | | | | | |||
| +----------------+--------------+------+----------+----------+ | +--------------+--------------+-----+------+----------+----------+ | |||
| | FB speech, 20 | 50 | 32 | 8.6 | 26.9% | | | Full-band | 20 ms | 50 | 32 | 8.6 | 26.9% | | |||
| | ms packets | | | | | | | speech | | | | | | | |||
| +----------------+--------------+------+----------+----------+ | +--------------+--------------+-----+------+----------+----------+ | |||
| | FB stereo, 10 | 100 | 128 | 17.2 | 13.4% | | | Full-band | 10 ms | 100 | 128 | 17.2 | 13.4% | | |||
| | ms packets | | | | | | | stereo music | | | | | | | |||
| +----------------+--------------+------+----------+----------+ | +--------------+--------------+-----+------+----------+----------+ | |||
| Table 4: SFrame Overhead for Audio Streams | Table 4: SFrame Overhead for Audio Streams | |||
| B.3. Video | B.3. Video | |||
| Video frames can be larger than an MTU and thus are commonly split | Video frames can be larger than an MTU and thus are commonly split | |||
| across multiple packets. Table 5 and Table 6 show the estimated | across multiple packets. Tables 5 and 6 show the estimated overhead | |||
| overhead of encrypting a video stream, where SFrame is applied per | of encrypting a video stream, where SFrame is applied per frame and | |||
| frame and per packet, respectively. The choices of resolution, | per packet, respectively. The choices of resolution, frames per | |||
| frames per second, and bandwidth roughly reflect the capabilities of | second, and bandwidth roughly reflect the capabilities of modern | |||
| modern video codecs across a range from very-low to very-high | video codecs across a range from very low to very high quality. | |||
| quality. | ||||
| +=============+=====+===========+===============+============+ | +=============+=====+===========+===============+============+ | |||
| | Scenario | fps | Base kbps | Overhead kbps | Overhead % | | | Scenario | fps | Base kbps | Overhead kbps | Overhead % | | |||
| +=============+=====+===========+===============+============+ | +=============+=====+===========+===============+============+ | |||
| | 426 x 240 | 7.5 | 45 | 1.3 | 2.9% | | | 426 x 240 | 7.5 | 45 | 1.3 | 2.9% | | |||
| +-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
| | 640 x 360 | 15 | 200 | 2.6 | 1.3% | | | 640 x 360 | 15 | 200 | 2.6 | 1.3% | | |||
| +-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
| | 640 x 360 | 30 | 400 | 5.2 | 1.3% | | | 640 x 360 | 30 | 400 | 5.2 | 1.3% | | |||
| +-------------+-----+-----------+---------------+------------+ | +-------------+-----+-----------+---------------+------------+ | |||
| skipping to change at page 38, line 9 ¶ | skipping to change at line 1578 ¶ | |||
| as the quality of the video improves since bandwidth is driven more | as the quality of the video improves since bandwidth is driven more | |||
| by picture size than frame rate. In the per-packet case, the SFrame | by picture size than frame rate. In the per-packet case, the SFrame | |||
| percentage overhead approaches the ratio between the SFrame overhead | percentage overhead approaches the ratio between the SFrame overhead | |||
| per packet and the MTU (here 22 bytes of SFrame overhead divided by | per packet and the MTU (here 22 bytes of SFrame overhead divided by | |||
| an assumed 1200-byte MTU, or about 1.8%). | an assumed 1200-byte MTU, or about 1.8%). | |||
| B.4. Conferences | B.4. Conferences | |||
| Real conferences usually involve several audio and video streams. | Real conferences usually involve several audio and video streams. | |||
| The overhead of SFrame in such a conference is the aggregate of the | The overhead of SFrame in such a conference is the aggregate of the | |||
| overhead of all the individual streams. Thus, while SFrame incurs a | overhead across all the individual streams. Thus, while SFrame | |||
| large percentage overhead on an audio stream, if the conference also | incurs a large percentage overhead on an audio stream, if the | |||
| involves a video stream, then the audio overhead is likely negligible | conference also involves a video stream, then the audio overhead is | |||
| relative to the overall bandwidth of the conference. | likely negligible relative to the overall bandwidth of the | |||
| | conference. | |||
| For example, Table 7 shows the overhead estimates for a two-person | For example, Table 7 shows the overhead estimates for a two-person | |||
| conference where one person is sending low-quality media and the | conference where one person is sending low-quality media and the | |||
| other is sending high-quality media. (And we assume that SFrame is | other is sending high-quality media. (And we assume that SFrame is | |||
| applied per frame.) The video streams dominate the bandwidth at the | applied per frame.) The video streams dominate the bandwidth at the | |||
| SFU, so the total bandwidth overhead is only around 1%. | SFU, so the total bandwidth overhead is only around 1%. | |||
| +=====================+===========+===============+============+ | +=====================+===========+===============+============+ | |||
| | Stream | Base kbps | Overhead kbps | Overhead % | | | Stream | Base kbps | Overhead kbps | Overhead % | | |||
| +=====================+===========+===============+============+ | +=====================+===========+===============+============+ | |||
| skipping to change at page 38, line 48 ¶ | skipping to change at line 1618 ¶ | |||
| SFrame is a generic encapsulation format, but many of the | SFrame is a generic encapsulation format, but many of the | |||
| applications in which it is likely to be integrated are based on RTP. | applications in which it is likely to be integrated are based on RTP. | |||
| This section discusses how an integration between SFrame and RTP | This section discusses how an integration between SFrame and RTP | |||
| could be done, and some of the challenges that would need to be | could be done, and some of the challenges that would need to be | |||
| overcome. | overcome. | |||
| As discussed in Section 4.1, there are two natural patterns for | As discussed in Section 4.1, there are two natural patterns for | |||
| integrating SFrame into an application: applying SFrame per frame or | integrating SFrame into an application: applying SFrame per frame or | |||
| per packet. In RTP-based applications, applying SFrame per packet | per packet. In RTP-based applications, applying SFrame per packet | |||
| means that the payload of each RTP packet will be an SFrame | means that the payload of each RTP packet will be an SFrame | |||
| ciphertext, starting with an SFrame header, as shown in Figure 10. | ciphertext, starting with an SFrame header, as shown in Figure 11. | |||
| Applying SFrame per frame means that different RTP payloads will have | Applying SFrame per frame means that different RTP payloads will have | |||
| different formats: the first payload of a frame will contain the | different formats: The first payload of a frame will contain the | |||
| SFrame headers, and subsequent payloads will contain further chunks | SFrame headers, and subsequent payloads will contain further chunks | |||
| of the ciphertext, as shown in Figure 11. | of the ciphertext, as shown in Figure 12. | |||
| In order for these media payloads to be properly interpreted by | In order for these media payloads to be properly interpreted by | |||
| receivers, receivers will need to be configured to know which of the | receivers, receivers will need to be configured to know which of the | |||
| above schemes the sender has applied to a given sequence of RTP | above schemes the sender has applied to a given sequence of RTP | |||
| packets. SFrame does not provide a mechanism for distributing this | packets. SFrame does not provide a mechanism for distributing this | |||
| configuration information. In applications that use SDP for | configuration information. In applications that use SDP for | |||
| negotiating RTP media streams [RFC8866], an appropriate extension to | negotiating RTP media streams [RFC8866], an appropriate extension to | |||
| SDP could provide this function. | SDP could provide this function. | |||
| Applying SFrame per frame also requires that packetization and | Applying SFrame per frame also requires that packetization and | |||
| depacketization be done in a generic manner that does not depend on | depacketization be done in a generic manner that does not depend on | |||
| the media content of the packets, since the content being packetized/ | the media content of the packets, since the content being packetized | |||
| depacketized will be opaque ciphertext (except for the SFrame | or depacketized will be opaque ciphertext (except for the SFrame | |||
| header). In order for such a generic packetization scheme to work | header). In order for such a generic packetization scheme to work | |||
| interoperably, one would have to be defined, e.g., as proposed in | interoperably, one would have to be defined, e.g., as proposed in | |||
| [I-D.codec-agnostic-rtp-payload-format]. | [RTP-PAYLOAD]. | |||
| +---+-+-+-------+-+-------------+-------------------------------+<-+ | +---+-+-+-------+-+-----------+------------------------------+<-+ | |||
| |V=2|P|X| CC |M| PT | sequence number | | | |V=2|P|X| CC |M| PT | sequence number | | | |||
| +---+-+-+-------+-+-------------+-------------------------------+ | | +---+-+-+-------+-+-----------+------------------------------+ | | |||
| | timestamp | | | | timestamp | | | |||
| +---------------------------------------------------------------+ | | +------------------------------------------------------------+ | | |||
| | synchronization source (SSRC) identifier | | | | synchronization source (SSRC) identifier | | | |||
| +===============================================================+ | | +============================================================+ | | |||
| | contributing source (CSRC) identifiers | | | | contributing source (CSRC) identifiers | | | |||
| | .... | | | | .... | | | |||
| +---------------------------------------------------------------+ | | +------------------------------------------------------------+ | | |||
| | RTP extension(s) (OPTIONAL) | | | | RTP extension(s) (OPTIONAL) | | | |||
| +->+--------------------+------------------------------------------+ | | +->+-------------------+----------------------------------------+ | | |||
| | | SFrame header | | | | | | SFrame header | | | | |||
| | +--------------------+ | | | | +-------------------+ | | | |||
| | | | | | | | | | | |||
| | | SFrame encrypted and authenticated payload | | | | | SFrame encrypted and authenticated payload | | | |||
| | | | | | | | | | | |||
| +->+---------------------------------------------------------------+<-+ | +->+------------------------------------------------------------+<-+ | |||
| | | SRTP authentication tag | | | | | SRTP authentication tag | | | |||
| | +---------------------------------------------------------------+ | | | +------------------------------------------------------------+ | | |||
| | | | | | | |||
| +--- SRTP Encrypted Portion SRTP Authenticated Portion ---+ | +--- SRTP Encrypted Portion SRTP Authenticated Portion ---+ | |||
| Figure 10: SRTP Packet with SFrame-Protected Payload | Figure 11: SRTP Packet with SFrame-Protected Payload | |||
| +----------------+ +---------------+ | +----------------+ +---------------+ | |||
| | frame metadata | | | | | frame metadata | | | | |||
| +-------+--------+ | | | +-------+--------+ | | | |||
| | | frame | | | | frame | | |||
| | | | | | | | | |||
| | | | | | | | | |||
| | +-------+-------+ | | +-------+-------+ | |||
| | | | | | | |||
| | | | | | | |||
| skipping to change at page 40, line 43 ¶ | skipping to change at line 1703 ¶ | |||
| | | | | | | | | | | |||
| V V V V | V V V V | |||
| +---------------+ +---------------+ +---------------+ | +---------------+ +---------------+ +---------------+ | |||
| | SFrame header | | | | | | | SFrame header | | | | | | |||
| +---------------+ | | | | | +---------------+ | | | | | |||
| | | | payload 2/N | ... | payload N/N | | | | | payload 2/N | ... | payload N/N | | |||
| | payload 1/N | | | | | | | payload 1/N | | | | | | |||
| | | | | | | | | | | | | | | |||
| +---------------+ +---------------+ +---------------+ | +---------------+ +---------------+ +---------------+ | |||
| Figure 11: Encryption Flow with per-Frame Encryption for RTP | Figure 12: Encryption Flow with per-Frame Encryption for RTP | |||
| Appendix C. Test Vectors | Appendix C. Test Vectors | |||
| This section provides a set of test vectors that implementations can | This section provides a set of test vectors that implementations can | |||
| use to verify that they correctly implement SFrame encryption and | use to verify that they correctly implement SFrame encryption and | |||
| decryption. In addition to test vectors for the overall process of | decryption. In addition to test vectors for the overall process of | |||
| SFrame encryption/decryption, we also provide test vectors for header | SFrame encryption/decryption, we also provide test vectors for header | |||
| encoding/decoding, and for AEAD encryption/decryption using the AES- | encoding/decoding, and for AEAD encryption/decryption using the AES- | |||
| CTR construction defined in Section 4.5.1. | CTR construction defined in Section 4.5.1. | |||
| skipping to change at page 72, line 13 ¶ | skipping to change at line 3156 ¶ | |||
| 3c1cc24d56ceabced279 | 3c1cc24d56ceabced279 | |||
| Acknowledgements | Acknowledgements | |||
| The authors wish to specially thank Dr. Alex Gouaillard as one of the | The authors wish to specially thank Dr. Alex Gouaillard as one of the | |||
| early contributors to the document. His passion and energy were key | early contributors to the document. His passion and energy were key | |||
| to the design and development of SFrame. | to the design and development of SFrame. | |||
| Contributors | Contributors | |||
| Frederic Jacobs | Frédéric Jacobs | |||
| Apple | Apple | |||
| Email: frederic.jacobs@apple.com | Email: frederic.jacobs@apple.com | |||
| Marta Mularczyk | Marta Mularczyk | |||
| Amazon | Amazon | |||
| Email: mulmarta@amazon.com | Email: mulmarta@amazon.com | |||
| Suhas Nandakumar | Suhas Nandakumar | |||
| Cisco | Cisco | |||
| Email: snandaku@cisco.com | Email: snandaku@cisco.com | |||
| skipping to change at page 72, line 40 ¶ | skipping to change at line 3183 ¶ | |||
| Phoenix R&D | Phoenix R&D | |||
| Email: ietf@raphaelrobert.com | Email: ietf@raphaelrobert.com | |||
| Authors' Addresses | Authors' Addresses | |||
| Emad Omara | Emad Omara | |||
| Apple | Apple | |||
| Email: eomara@apple.com | Email: eomara@apple.com | |||
| Justin Uberti | Justin Uberti | |||
| | Fixie.ai | |||
| Email: juberti@google.com | Email: justin@fixie.ai | |||
| Sergio Garcia Murillo | Sergio Garcia Murillo | |||
| CoSMo Software | CoSMo Software | |||
| Email: sergio.garcia.murillo@cosmosoftware.io | Email: sergio.garcia.murillo@cosmosoftware.io | |||
| Richard L. Barnes (editor) | Richard Barnes (editor) | |||
| Cisco | Cisco | |||
| Email: rlb@ipv.sx | Email: rlb@ipv.sx | |||
| Youenn Fablet | Youenn Fablet | |||
| Apple | Apple | |||
| Email: youenn@apple.com | Email: youenn@apple.com | |||
| End of changes. 99 change blocks. | ||||
| 324 lines changed or deleted | 325 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. | ||||
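
The overhead figures in Appendix B (Tables 4 and 5, and the per-packet limit in B.3) can be reproduced with a short calculation. The following is a minimal sketch, not part of either document version. The function and constant names are hypothetical. The use of 1024 bits per kilobit is an assumption inferred from the published values (it reproduces the table entries after rounding, whereas 1000 does not); neither version states the convention.

```python
# Sketch of the Appendix B overhead arithmetic (names are hypothetical).

# Table 3: config byte (1) + KID (2) + CTR (3) + authentication tag (16).
SFRAME_OVERHEAD_BYTES = 1 + 2 + 3 + 16  # 22 bytes per SFrame encryption

# Assumption: inferred from the table values, not stated in the source.
BITS_PER_KBIT = 1024


def per_frame_overhead(fps, base_kbps):
    """Return (overhead_kbps, overhead_percent) when SFrame is applied
    once per frame at the given frame rate and base bitrate."""
    overhead_kbps = SFRAME_OVERHEAD_BYTES * 8 * fps / BITS_PER_KBIT
    overhead_percent = 100 * overhead_kbps / base_kbps
    return overhead_kbps, overhead_percent


def per_packet_limit_percent(mtu_bytes=1200):
    """Limit of the per-packet percentage overhead: one SFrame overhead
    per full-size packet, with the 1200-byte MTU assumed in B.3."""
    return 100 * SFRAME_OVERHEAD_BYTES / mtu_bytes


if __name__ == "__main__":
    # Audio rows of Table 4; fps is derived from the frame length in ms.
    audio = [
        ("Narrow-band speech", 1000 / 120, 8),
        ("Full-band speech", 1000 / 20, 32),
        ("Full-band stereo music", 1000 / 10, 128),
    ]
    for name, fps, base in audio:
        kbps, pct = per_frame_overhead(fps, base)
        print(f"{name}: {kbps:.1f} kbps, {pct:.1f}%")
    print(f"Per-packet limit: {per_packet_limit_percent():.1f}%")
```

Deriving the frame rate from the frame length (1000 / 120 rather than the rounded 8.3) matters for the narrow-band row: the rounded rate yields 17.8% instead of the tabulated 17.9%.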