Post-Delivery Message Downgrading for Internationalized Email Messages
Japan Registry Services Co., Ltd.
Chiyoda First Bldg. East 13F, 3-8-1 Nishi-Kanda
Chiyoda-ku, Tokyo 101-0065
Japan
Phone: +81 3 5215 8451
EMail: fujiwara@jprs.co.jp
Applications
Email Address Internationalization (EAI)
Keywords: EAI, Email Address Internationalization, Downgrade, MAIL
The Email Address Internationalization (SMTPUTF8) extension to SMTP allows
UTF-8 characters in mail header fields.
Upgraded POP and IMAP servers support internationalized Email messages.
If a POP/IMAP client does not support Email Address Internationalization,
POP/IMAP servers cannot deliver Internationalized Email Headers to the client
and cannot remove the message.
To avoid that situation,
this document describes a mechanism
for converting internationalized Email messages into the traditional message format.
In the process, message elements
requiring internationalized treatment are recoded or removed
and receivers are able to know that they received messages containing such
elements, even if they cannot process the internationalized elements.
Traditional (legacy) mail systems, which are defined by [RFC5322] and other specifications,
allow only ASCII characters in mail header field values.
The SMTPUTF8 extension
([RFC6530], [RFC6531], and [RFC6532])
allows raw UTF-8 in these mail header fields.
If a header field contains non-ASCII strings,
POP/IMAP servers cannot deliver Internationalized Email Headers to
legacy clients that do not send UTF8 commands or have UTF8 capability.
Also, because they have no obvious or standardized way to explain what
is going on to clients, they cannot even safely discard the message.
There
are four plausible approaches to the problem, with the preferred
one depending on the particular circumstances and relationship
among the delivery SMTP server, the mail store, the POP or IMAP
server, and the users and their Mail User Agent (MUA) clients:
1. If the delivery Mail Transport Agent (MTA) has sufficient knowledge about the POP
   and/or IMAP servers and clients being used, the message
   may be rejected as undeliverable.
2. The message may be downgraded by the POP or IMAP server in a
   way that preserves maximum information at the expense of
   some complexity and that does not create security or operational
   problems in the mail system.
3. Some intermediate downgrading may be applied that balances
   more information loss against lower complexity and greater
   ease of implementation.
4. The POP or IMAP server may fabricate a message whose
   intent is to notify the client that an internationalized
   message is waiting but cannot be delivered until an
   upgraded client is available.

This specification describes the second of these options. It
is worth noting that, at least in the general case, none of
these options preserves sufficient information to guarantee that
it is possible to reply to an incoming message without loss of
information, so the choice may be considered to be among the "least
bad" options.
While this document specifies a well-designed mechanism,
it is only an interim solution while clients are being upgraded.
This message downgrading mechanism converts mail header fields to an
all-ASCII representation. The POP/IMAP servers can use the
downgrading mechanism and deliver the Internationalized Email message in
a traditional form.
Receivers can tell that they have received internationalized messages, or unknown or broken messages.
The internationalized message format allows UTF-8 characters to be used in
mail header fields, MIME header fields, and some trace header fields.
The message downgrading mechanism specified here describes
the conversion method
from such internationalized messages
to the traditional email messages
defined in [RFC5322].
This document provides a precise definition of the
minimum-information-loss message downgrading process.
Downgrading consists of the following three parts:

o  New header field definitions
o  Email header field downgrading
o  MIME header field downgrading
Email header field downgrading generates ASCII-only header fields.
As part of it, this document introduces
header fields starting with "Downgraded-".
They preserve the information that appeared in the original
header fields.
The definition of MIME header fields in Internationalized Email Messages
is given in [RFC6532].
MIME header field downgrading generates ASCII-only MIME header fields.
Displaying downgraded messages that originally contained
internationalized header fields is out of scope of this
document. A POP/IMAP client that does not support the UTF8
extensions defined for POP3 (the UTF8 command)
and IMAP (the ENABLE UTF8=ACCEPT command)
does not understand the internationalized message format.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC 2119 [RFC2119].
All specialized terms used in this specification are defined in the
"Overview and Framework for Internationalized Email" [RFC6530],
in the mail message specifications [RFC5322], or in the MIME documents.
The terms "U-label", "A-label", and "IDNA" are used as defined in [RFC5890].
The terms "ASCII address", "non-ASCII address", "SMTPUTF8", "message", and "internationalized message"
are used as defined in [RFC6530].
The term "non-ASCII string"
is used as defined in [RFC6532].
This section defines the method of conversion to ASCII for each header field
that may contain non-ASCII strings.
The methods for rewriting each ABNF element are described first,
followed by the methods for rewriting each header field.
Header field downgrading is defined below for each ABNF element.
Conversion of the header field terminates when no non-ASCII strings
remain in the header field.
[RFC5322] describes the ABNF elements
<group>, <mailbox>, <unstructured>, <word>, <comment>, and <display-name>.
[RFC2045] describes the ABNF element <value>.
<domain> is updated to allow non-ASCII characters in Section 3.3 of [RFC6531] and Section 3.2 of [RFC6532].
If the header field has an <unstructured> field that
contains non-ASCII strings,
apply the encoding of [RFC2047] with charset UTF-8.
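As an illustration, the following Python sketch applies this step with the standard library. It assumes that the encoding meant here is the MIME encoded-word form of RFC 2047, as produced by `email.header.Header`. The function names `downgrade_unstructured` and `restore` are hypothetical and not part of this specification.

```python
from email.header import Header, decode_header, make_header


def downgrade_unstructured(value: str) -> str:
    """Return an ASCII-only form of an unstructured header field value."""
    if value.isascii():
        # No non-ASCII strings remain, so no conversion is needed.
        return value
    # Header(...).encode() emits encoded-words for the given charset.
    return Header(value, charset="utf-8").encode()


def restore(value: str) -> str:
    """Decode encoded-words back to text (what an upgraded reader might do)."""
    return str(make_header(decode_header(value)))
```

A value that is already ASCII is returned unchanged, which matches the rule that conversion terminates when no non-ASCII strings remain.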
If the header field has any <word>
fields that contain non-ASCII strings,
apply the encoding of [RFC2047] with charset UTF-8.
If the header field has any <comment>
fields that contain non-ASCII strings,
apply the encoding of [RFC2047] with charset UTF-8.
If the header field has any <value> elements defined by [RFC2045]
and those elements contain non-ASCII strings,
encode the <value> elements
according to [RFC2231] with charset UTF-8 and
leave the language information empty.
If the <value> element is <quoted-string>
and it contains <CFWS> outside the DQUOTE,
remove the <CFWS> before this conversion.
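A minimal Python sketch of this step follows. It assumes the extended parameter value encoding of RFC 2231, as implemented by `email.utils.encode_rfc2231`, and passes an empty language tag. The names `downgrade_mime_value` and `restore_mime_value` are hypothetical.

```python
from email.utils import encode_rfc2231, decode_rfc2231
from urllib.parse import unquote


def downgrade_mime_value(value: str) -> str:
    """Encode a non-ASCII MIME parameter value as charset''percent-encoded."""
    if value.isascii():
        return value
    # language="" leaves the language information empty, giving "utf-8''...".
    return encode_rfc2231(value, charset="utf-8", language="")


def restore_mime_value(encoded: str) -> str:
    """Reverse the encoding produced by downgrade_mime_value."""
    charset, _language, text = decode_rfc2231(encoded)
    return unquote(text, encoding=charset)
```

When the result is written back into a header field, RFC 2231 marks the parameter name with a trailing asterisk (for example `filename*=`); the sketch only produces the value.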
If the header field has any <address>
(<mailbox> or <group>) elements, and
they have <display-name> elements that contain non-ASCII strings,
encode the <display-name> elements
according to [RFC2047] with charset UTF-8.
DISPLAY-NAME downgrading uses the same algorithm as WORD downgrading.
If the header field has any <domain> elements
that contain U-labels,
rewrite the non-ASCII domain name into an ASCII domain name using A-labels
as specified in IDNA [RFC5890].
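The U-label to A-label rewrite can be sketched in Python as below. This is an approximation under a stated assumption: the standard library `idna` codec implements the older IDNA 2003 rules, which may differ from the IDNA definition this document relies on for some characters. A production implementation would likely use a dedicated IDNA library. The name `downgrade_domain` is hypothetical.

```python
def downgrade_domain(domain: str) -> str:
    """Rewrite a domain name containing U-labels into A-label (ASCII) form."""
    if domain.isascii():
        return domain
    # Assumption: the built-in "idna" codec (IDNA 2003) is close enough
    # for illustration; each non-ASCII label becomes an "xn--" label.
    return domain.encode("idna").decode("ascii")
```

Labels that are already ASCII, such as the top-level label in `bücher.example`, are left untouched.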
<group> is defined in Section 3.4 of [RFC5322].
The <group> element may contain <mailbox> elements that contain non-ASCII addresses.
If a <group> element contains <mailbox> elements and one of those <mailbox> elements contains a non-ASCII <local-part>,
rewrite the <group> element as

   <display-name> " " <ENCODED_WORD> " :;"

where the <ENCODED_WORD> is the original <group-list> encoded
according to [RFC2047].
Otherwise, the <group> element does not contain a non-ASCII <local-part>.
If the <group> element contains non-ASCII <mailbox> elements,
they contain non-ASCII domain names.
Rewrite the non-ASCII domain names into ASCII domain names using A-labels
as specified in IDNA [RFC5890].
Generated <mailbox> elements contain ASCII addresses only.
If the <local-part> of the <mailbox> element does not contain non-ASCII characters,
the <domain> element may contain non-ASCII characters.
Rewrite the non-ASCII domain name into an ASCII domain name using A-labels
as specified in IDNA [RFC5890].
Otherwise, the <local-part> may contain non-ASCII characters.
The non-ASCII <local-part> has no equivalent format for ASCII addresses.
The <addr-spec> element that contains non-ASCII strings may appear in two forms as:

   "<" <addr-spec> ">"
   <addr-spec>

Rewrite both as:

   <ENCODED-WORD> " :;"

where the <ENCODED-WORD> is the original <addr-spec> encoded
according to [RFC2047].
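The two MAILBOX cases can be sketched in Python as follows. The sketch assumes that the rewritten form for a non-ASCII <local-part> is an empty group, `<ENCODED-WORD> :;`, that the encoded-word is produced with RFC 2047 encoding, and that the standard library `idna` codec (IDNA 2003) is an acceptable stand-in for the A-label conversion. The function name `downgrade_mailbox` is hypothetical, and the input is a bare <addr-spec> without angle brackets.

```python
from email.header import Header


def downgrade_mailbox(addr_spec: str) -> str:
    """Downgrade one <addr-spec>; returns an ASCII address or an empty group."""
    local_part, _, domain = addr_spec.rpartition("@")
    if local_part.isascii():
        # Only the domain can be non-ASCII: convert U-labels to A-labels.
        return f"{local_part}@{domain.encode('idna').decode('ascii')}"
    # A non-ASCII local part has no ASCII equivalent, so the whole
    # addr-spec is preserved as an encoded-word naming an empty group.
    encoded = Header(addr_spec, charset="utf-8").encode()
    return f"{encoded} :;"
```

Note that `Header.encode()` may fold very long values across lines; a caller that needs a single line would have to handle that separately.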
If the header field contains <utf-8-type-addr>
and the <utf-8-type-addr> contains raw non-ASCII strings,
it is in utf-8-address form. Convert it to utf-8-addr-xtext form.
Both utf-8-address and utf-8-addr-xtext are described in [RFC6533].
COMMENT downgrading is also performed in this case.
If the address type is unrecognized and the header field contains non-ASCII strings,
then fall back to using ENCAPSULATION on the entire header field.
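The conversion to utf-8-addr-xtext form can be sketched as below. This is an illustration under assumptions, not a complete implementation: it assumes that utf-8-addr-xtext keeps printable ASCII characters other than "\", "+", and "=" as they are, and writes every other character as `\x{HEX}` with the uppercase hexadecimal code point. It is intended for printable input only, and the name `to_utf8_addr_xtext` is hypothetical.

```python
def to_utf8_addr_xtext(address: str) -> str:
    """Convert a utf-8-address string to an ASCII-only xtext-style form."""
    pieces = []
    for ch in address:
        code_point = ord(ch)
        # Assumed literal set: printable ASCII except "\", "+", and "=".
        if 0x21 <= code_point <= 0x7E and ch not in "\\+=":
            pieces.append(ch)
        else:
            # Everything else is embedded as \x{HEXPOINT}.
            pieces.append("\\x{%X}" % code_point)
    return "".join(pieces)
```

For example, the local part `jörg` becomes `j\x{F6}rg`, while an all-ASCII address without the three special characters is returned unchanged.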
As a last resort, when header fields cannot be converted as
discussed in the previous subsection, the fields are deleted
and replaced by specialized new header fields. Those fields
are defined to preserve, in encoded form, as much
information as possible from the header field values of the
incoming message. The syntax of these new header fields is:
Applying this procedure to the "Received:" header field is
prohibited. ENCAPSULATION downgrading is allowed for the
"Message-ID:", "In-Reply-To:", "References:",
"Original-Recipient:", and "Final-Recipient:" header fields.
To preserve a header field in a "Downgraded-" header field:

1. Generate a new header field.
   The field name is a concatenation of "Downgraded-" and the original field name.
   The initial new field value is the original header field value.
2. Treat the initial new header field value as if it were unstructured,
   and then apply the encoding of [RFC2047] with charset
   UTF-8 as necessary so that the resulting new header field value
   is completely in ASCII.
3. Remove the original header field.
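The steps above can be sketched in Python as follows. The sketch assumes RFC 2047 encoded-words for the encoding step and restricts ENCAPSULATION to the five header fields for which it is allowed. The names `ENCAPSULATION_ALLOWED` and `encapsulate` are hypothetical. Removing the original header field (the last step) is left to the caller, which replaces the original (name, value) pair with the returned one.

```python
from email.header import Header

# Header fields for which ENCAPSULATION downgrading is allowed (lowercased).
ENCAPSULATION_ALLOWED = {
    "message-id",
    "in-reply-to",
    "references",
    "original-recipient",
    "final-recipient",
}


def encapsulate(name: str, value: str) -> tuple[str, str]:
    """Return the "Downgraded-" replacement for one header field."""
    if name.lower() not in ENCAPSULATION_ALLOWED:
        # This also rejects "Received", for which the procedure is prohibited.
        raise ValueError(f"ENCAPSULATION is not allowed for {name!r}")
    # The value is treated as unstructured and encoded only if needed.
    if not value.isascii():
        value = Header(value, charset="utf-8").encode()
    return "Downgraded-" + name, value
```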
establishes a registry of header fields.
This section describes the downgrading method for each header field.
If the entire mail header field contains no non-ASCII strings,
email header field downgrading is not required.
Each header field's downgrading method is described below.
If the header field contains non-ASCII characters, first perform
COMMENT downgrading and DISPLAY-NAME downgrading.
If the header field still contains non-ASCII characters after that,
complete the following two steps:

1. If the header field contains <group> elements
   that contain non-ASCII addresses,
   perform GROUP downgrading on those elements.
2. If the header field contains <mailbox> elements
   that contain non-ASCII addresses,
   perform MAILBOX downgrading on those elements.
This procedure may generate empty <group> elements in the "From:", "Sender:", and "Reply-To:" header fields.
[RFC6854] updates [RFC5322]
to allow (empty) <group> elements in "From:" and "Sender:".
These header fields do not contain non-ASCII strings except in comments.
If the header field contains UTF-8 characters in comments, perform COMMENT downgrading.
Perform ENCAPSULATION downgrading.
If <domain> elements or <mailbox> elements contain U-labels, perform DOMAIN downgrading.
Comments may contain non-ASCII strings; if so, perform COMMENT downgrading.
After the DOMAIN downgrading and the COMMENT downgrading,
if the FOR clause contains a non-ASCII <local-part>, remove the "FOR" clause.
If the ID clause contains a non-ASCII value, remove the "ID" clause.
Perform MIME-VALUE downgrading and COMMENT downgrading.
Perform UNSTRUCTURED downgrading.
Perform WORD downgrading.
Other header fields may also contain non-ASCII strings:
user-defined fields, fields missing from this document,
and fields defined in the future.
They are treated as
"Optional Fields", and their field values are treated as
unstructured, as described in Section 3.6.8 of [RFC5322].

Perform UNSTRUCTURED downgrading.

If the software understands the header field's structure and a downgrading
algorithm other than UNSTRUCTURED is applicable, that software SHOULD use
that algorithm; UNSTRUCTURED downgrading is used as a last resort.

Mailing list header fields (those that start with "List-")
are part of this category.
Both MIME Body-Part header fields and contents of a delivery status notification
may contain non-ASCII characters.
MIME body-part header fields may contain non-ASCII strings.
This section defines the conversion method to ASCII-only header fields
for each MIME header field that contains non-ASCII strings.
Parse the message body's MIME structure at all levels and
check each MIME header field to see whether it contains non-ASCII strings.
If the header field contains non-ASCII strings in the header field value,
the header field is a target of the MIME body-part header field's downgrading.
Each MIME header field's downgrading method is described below.
COMMENT downgrading, MIME-VALUE downgrading, and UNSTRUCTURED downgrading
are the methods defined above for the corresponding ABNF elements.
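The scan over the MIME structure can be sketched in Python with the standard `email` package. The sketch assumes the message is available as a text string, treats every header field whose name starts with "Content-" as a MIME header field, and only reports the fields that need downgrading. The function name `find_non_ascii_mime_headers` and the sample message are hypothetical.

```python
import email


def find_non_ascii_mime_headers(raw_message: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs of MIME header fields with non-ASCII values."""
    message = email.message_from_string(raw_message)
    targets = []
    # walk() visits the top-level message and every nested body part.
    for part in message.walk():
        for name, value in part.items():
            if name.lower().startswith("content-") and not value.isascii():
                targets.append((name, value))
    return targets


SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Description: r\u00e9sum\u00e9 notes\n"
    "\n"
    "body text\n"
    "--b1--\n"
)
```

Each reported field would then be passed to the downgrading method that applies to it.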
The "Content-ID:" header field does not contain non-ASCII strings
except in comments. If the header field contains
UTF-8 characters in comments, perform COMMENT downgrading.
Perform MIME-VALUE downgrading and
COMMENT downgrading.
Perform UNSTRUCTURED downgrading.
If the message contains a delivery status notification
defined in Section 6 of [RFC3461],
perform the following tests and conversions.
If there are "Original-Recipient:" and "Final-Recipient:" header fields,
and the header fields contain non-ASCII strings, perform TYPED-ADDRESS downgrading.
The purpose of post-delivery message downgrading is to allow
POP/IMAP servers to deliver internationalized messages
to traditional POP/IMAP clients
and permit the clients to display those messages.
Users who receive such messages can know that they were
internationalized.
It does not permit receivers to read the messages in their
original form and, in general, will not permit generating
replies, at least without significant user intervention.
A downgraded message's header fields contain ASCII characters only.
However, they still contain MIME-encapsulated header fields that contain
non-ASCII strings. Furthermore, the body part may contain UTF-8 characters.
Implementations parsing Internet messages need
to accept UTF-8 body parts and UTF-8 header fields that are MIME-encoded.
Thus, this document inherits the security considerations of
MIME-encoded header fields.
Rewriting header fields increases the opportunities for
undetected spoofing by malicious senders.
However, the rewritten header field values are
preserved in equivalent MIME form
or in newly defined header fields for which traditional MUAs have no special processing procedures.
The techniques described here invalidate methods that depend
on digital signatures over any part of the message,
which includes the top-level header fields and body-part header fields.
Depending on the specific message being downgraded, at least the
following techniques are likely to break: DomainKeys Identified
Mail (DKIM) and possibly S/MIME and Pretty Good Privacy (PGP).
The downgrade
mechanism SHOULD NOT remove signatures even if the signatures
will fail validation after downgrading. As much of the information as
possible from the original message SHOULD be preserved.
While information in any email header field should usually be treated with
some suspicion, current email systems commonly employ various
mechanisms and protocols to make the information more trustworthy.
Information in the new Downgraded-* header fields is
not inspected by traditional MUAs and may be even less trustworthy
than the traditional header fields.
Note that the Downgraded-* header fields could have been inserted with malicious intent
(and with content unrelated to the traditional header fields);
however, traditional MUAs do not parse Downgraded-* header fields.
In addition, if an Authentication-Results header
field is present,
traditional MUAs may treat the digital signatures as valid.
See the Security Considerations sections in [RFC6854] and [RFC6530] for more discussion.
While [RFC2047] has a specific algorithm to deal with whitespace in
adjacent encoded words, there are a number of deployed implementations
that fail to implement the algorithm correctly. As a result, whitespace
behavior is somewhat unpredictable, in practice, when multiple encoded words
are used. While [RFC5322] states that implementations SHOULD limit lines
to not more than 78 characters, implementations MAY choose to allow
overly long encoded words in order to work around faulty implementations of
[RFC2047]. Implementations that choose to do so SHOULD have an
optional mechanism to limit line length to 78 characters.
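As an illustration of the line-length limit, the Python sketch below encodes a long non-ASCII value with `email.header.Header` and an explicit `maxlinelen` of 78. It assumes RFC 2047 encoded-words; the function name `encode_folded` and the sample value `LONG_SUBJECT` are hypothetical. The encoder splits the value into several encoded words, one per folded line, which is exactly the situation in which the whitespace handling of adjacent encoded words matters.

```python
from email.header import Header, decode_header, make_header


def encode_folded(value: str, max_line_length: int = 78) -> str:
    """Encode a value as encoded-words, folding lines at max_line_length."""
    return Header(value, charset="utf-8", maxlinelen=max_line_length).encode()


def decode_folded(value: str) -> str:
    """Decode folded encoded-words, joining adjacent words without spaces."""
    return str(make_header(decode_header(value)))


LONG_SUBJECT = "日本語の件名です。" * 6
FOLDED = encode_folded(LONG_SUBJECT)
```

A correct decoder drops the folding whitespace between adjacent encoded words, so `decode_folded(FOLDED)` returns the original text; a faulty one may insert spaces at the fold points.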
[RFC5504] specified that no new header fields be registered that begin
with "Downgraded-". That restriction has now been lifted, and this document
makes a new set of registrations, replacing the experimental fields with
standard ones.
The "Downgraded-*" header fields that were registered as experimental
fields in [RFC5504] are no longer in use. IANA has changed the status
from "experimental" to "obsoleted" for every name in the "Permanent Message
Header Field Names" registry that began with "Downgraded-".
The following header fields have been registered in the
"Permanent Message Header Field Names" registry, in accordance with the
procedures set out in [RFC3864].
Header field name              Protocol  Status    Change controller  Specification
Downgraded-Message-Id          mail      standard  IETF               This document
Downgraded-In-Reply-To         mail      standard  IETF               This document
Downgraded-References          mail      standard  IETF               This document
Downgraded-Original-Recipient  mail      standard  IETF               This document
Downgraded-Final-Recipient     mail      standard  IETF               This document
This document draws heavily from the experimental in-transit
message downgrading procedure described in
[RFC5504]. The contributions of the coauthor
of that earlier document, Y. Yoneya, are gratefully
acknowledged.
Significant comments and suggestions were received from John
Klensin, Barry Leiba, Randall Gellens, Pete Resnick, Martin J. Durst,
and other WG participants.
[RFC6854] Update to Internet Message Format to Allow Group Syntax in the "From:" and "Sender:" Header Fields.
The Internet Message Format (RFC 5322) allows "group" syntax in some email header fields, such as "To:" and "CC:", but not in "From:" nor "Sender:". This document updates RFC 5322 to relax that restriction, allowing group syntax in those latter fields, as well as in "Resent-From:" and "Resent-Sender:", in certain situations.

[RFC6856] Post Office Protocol Version 3 (POP3) Support for UTF-8.
This specification extends the Post Office Protocol version 3 (POP3) to support UTF-8 encoded international strings in user names, passwords, mail addresses, message headers, and protocol-level textual strings.

[RFC6855] IMAP Support for UTF-8.
This specification extends the Internet Message Access Protocol version 4rev1 (IMAP4rev1) to support UTF-8 encoded international characters in user names, mail addresses and message headers. This specification replaces RFC 5738.
This appendix shows a message downgrading example.
Consider a received mail message where:
The sender address is a non-ASCII address, "NON-ASCII-LOCAL@example.com". Its display-name is "DISPLAY-LOCAL".
The "To:" header field contains two non-ASCII addresses, "NON-ASCII-REMOTE1@example.net" and "NON-ASCII-REMOTE2@example.com". Their display-names are "DISPLAY-REMOTE1" and "DISPLAY-REMOTE2".
The "Cc:" header field contains a non-ASCII address, "NON-ASCII-REMOTE3@example.org".
Its display-name is "DISPLAY-REMOTE3".
Four display-names contain non-ASCII characters.
The Subject header field is "NON-ASCII-SUBJECT", which contains non-ASCII strings.
The "Message-Id:" header field contains "NON-ASCII-MESSAGE_ID",
which contains non-ASCII strings.
There is an unknown header field "X-Unknown-Header", which contains non-ASCII strings.
The downgraded message is shown in
.
"Return-Path:", "From:", "To:", and "Cc:" header fields are rewritten.
"Subject:" and "X-Unknown-Header:" header fields are encoded
using .
The "Message-Id:" header field is encapsulated as a
"Downgraded-Message-Id:" header field.