An Encoding Parameter for HTTP Basic Authentication

J. Reschke
greenbytes GmbH
Hafenweg 16
Muenster, NW 48155
Germany

Email: julian.reschke@greenbytes.de
URI: http://greenbytes.de/tech/webdav/
The "Basic" authentication scheme defined in RFC 2617 does not properly
define how to treat non-ASCII characters. This has led to a situation
where user agent implementations disagree, and servers make different
assumptions based on the locales they are running in. There is little
interoperability for the non-ASCII characters in the ISO-8859-1 character set, and even less
interoperability for any characters beyond that.
This document defines a backwards-compatible extension to "Basic",
specifying the server's character encoding expectation,
using a new authentication scheme parameter.
Distribution of this document is unlimited. Although this is not a work
item of the HTTPbis Working Group, comments should be sent to the
Hypertext Transfer Protocol (HTTP) mailing list at ietf-http-wg@w3.org,
which may be joined by sending a message with subject
"subscribe" to ietf-http-wg-request@w3.org.
Discussions of the HTTPbis Working Group are archived on the working
group's mailing list archive.
XML versions, latest edits and the issues list for this document
are available from the author's web site.
The "Basic" authentication scheme defined in Section 2 of [RFC2617] does
not properly define how to treat non-ASCII characters ([US-ASCII]): it uses the Base64
([BASE64], Section 4)
encoding of the concatenation of username, separator character, and password
without stating which character encoding to use.
This has led to a situation
where user agent implementations disagree, and servers make different
assumptions based on the locales they are running in. There is little
interoperability for the non-ASCII characters in the ISO-8859-1 character set ([ISO-8859-1], [US-ASCII]),
and even less interoperability for any characters beyond that.
This document defines a backwards-compatible extension to "Basic",
specifying the server's character encoding expectation,
using a new auth-param
for use in the Proxy-Authenticate and WWW-Authenticate header fields,
as defined in [HTTPBIS].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document
are to be interpreted as described in [RFC2119].
In challenges, servers MAY use the "accept-charset" authentication parameter (case-insensitive) to express the
character encoding they expect the user agent to use.
The only allowed value is "UTF-8", to be matched case-insensitively
(see [RFC2978], Section 2.3), indicating that
the server expects the UTF-8 character encoding to be used
([UTF-8]).
Other values are reserved for future use.
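For illustration, a challenge carrying the parameter could look like
Basic realm="foo", accept-charset="UTF-8". The helpers below sketch how a
server might emit, and a client detect, that parameter; the function names
and the regex-based parsing are illustrative assumptions, not part of the
scheme itself:

```python
import re

def build_challenge(realm):
    # Advertise that UTF-8-encoded credentials are expected.
    return 'Basic realm="%s", accept-charset="UTF-8"' % realm

def expects_utf8(challenge):
    # Both the parameter name and the value "UTF-8" are matched
    # case-insensitively, as required above.
    m = re.search(r'accept-charset\s*=\s*"?([^",\s]+)"?',
                  challenge, re.IGNORECASE)
    return bool(m) and m.group(1).upper() == "UTF-8"
```

A user agent not recognizing the parameter simply ignores it, which is
what makes the extension backwards-compatible.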
Note: The 'accept-charset' parameter cannot be included when sending
credentials (e.g. in the Authorization or Proxy-Authorization header fields),
as the "Basic" scheme uses a single base64 token for credentials
('b64token' syntax), not a parameter list ('#auth-param' syntax); see
Section 2.1 of [HTTPBIS].
The user's name is "test", and the password is the string "123" followed by
the Unicode character U+00A3 (POUND SIGN). Following Section 1.2 of [RFC2617], but using the character encoding UTF-8, the user-pass,
converted to a sequence of octets, is:

74 65 73 74 3a 31 32 33 c2 a3
Encoding this octet sequence in Base64 ([BASE64], Section 4) yields:

dGVzdDoxMjPCow==

Thus the Authorization header field would be:

Authorization: Basic dGVzdDoxMjPCow==

Or, for proxy authentication:

Proxy-Authorization: Basic dGVzdDoxMjPCow==
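The derivation above can be checked mechanically; the sketch below relies
only on UTF-8 and Base64 as defined, using Python's standard library:

```python
import base64

username, password = "test", "123\u00a3"  # "123" followed by U+00A3 (POUND SIGN)
user_pass = username + ":" + password

octets = user_pass.encode("utf-8")
# b'test:123\xc2\xa3' -- U+00A3 becomes the two octets C2 A3 in UTF-8

token = base64.b64encode(octets).decode("ascii")
print("Authorization: Basic " + token)
# Authorization: Basic dGVzdDoxMjPCow==
```

Note that under ISO-8859-1 the same password would yield a different
single octet (A3) for the pound sign, and therefore a different token;
this is exactly the ambiguity the new parameter removes.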
This document does not introduce any new security considerations beyond
those defined for the "Basic" authentication scheme ([RFC2617], Section 4), and those
applicable to the handling of UTF-8 ([UTF-8], Section 10).
There are no IANA Considerations related to this specification.
The internationalisation problem was reported as a Mozilla bug back
in the year 2000; see the Mozilla bug tracker for that entry and more
recent related ones.
It was Andrew Clover's idea to address it using a new auth-param.
Thanks to Bjoern Hoehrmann, Amos Jeffries, James Manger, and Martin Thomson for providing feedback on this document.
[RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", Harvard University, sob@harvard.edu.

[RFC2978]    "IANA Charset Registration Procedures".

[UTF-8]      Yergeau, F., "UTF-8, a transformation format of ISO 10646", Alis Technologies, fyergeau@alis.com.

[ISO-8859-1] International Organization for Standardization, "Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1".

[US-ASCII]   American National Standards Institute, "Coded Character Set -- 7-bit American Standard Code for Information Interchange".

[RFC2617]    Franks, J. (Northwestern University, Department of Mathematics, john@math.nwu.edu), Hallam-Baker, P. (Verisign Inc., pbaker@verisign.com), Hostetler, J. (AbiSource, Inc., jeff@AbiSource.com), Lawrence, S. (Agranat Systems, Inc., lawrence@agranat.com), Leach, P. (Microsoft Corporation, paulle@microsoft.com), Luotonen, A. (Netscape Communications Corporation), and Stewart, L. (Open Market, Inc., stewart@OpenMarket.com), "HTTP Authentication: Basic and Digest Access Authentication".

[HTTPBIS]    Fielding, R., Ed. (Adobe Systems Incorporated, fielding@gbiv.com) and Reschke, J., Ed. (greenbytes GmbH, julian.reschke@greenbytes.de), "Hypertext Transfer Protocol (HTTP/1.1): Authentication".

[BASE64]     "The Base16, Base32, and Base64 Data Encodings".

[XHR]        "XMLHttpRequest". Latest version available online.
User agents not implementing this specification should continue to work as
before, ignoring the new parameter.
User agents which already default to the UTF-8 encoding implement
this specification by definition.
Note that some user agents also have different defaults depending
on whether the request originates from page navigation as opposed to a
script-driven request using XMLHttpRequest [XHR].
Other user agents can keep their default behavior, and switch to UTF-8
when seeing the new parameter.
On the other hand, the strategy below may already improve the user-visible
behavior today:
In the first authentication request, choose the character encoding based
on the user's credentials: if they do not need any characters outside
the ISO-8859-1 character set, default to ISO-8859-1, otherwise use
UTF-8.
If the first attempt failed and the encoding used was ISO-8859-1, retry
once with UTF-8 encoding instead.
Note that there's a risk if the site blocks an account after multiple login
failures (for instance, when it doesn't reset the counter after a successful
login).
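That heuristic might be sketched as follows, with send_request a
hypothetical stand-in for the actual request machinery, assumed to return
an HTTP status code:

```python
def encode_credentials(username, password):
    """Pick ISO-8859-1 when the credentials fit, otherwise UTF-8."""
    user_pass = username + ":" + password
    try:
        return user_pass.encode("iso-8859-1"), "iso-8859-1"
    except UnicodeEncodeError:
        return user_pass.encode("utf-8"), "utf-8"

def authenticate(send_request, username, password):
    # send_request(octets) -> HTTP status code; a hypothetical hook.
    octets, encoding = encode_credentials(username, password)
    status = send_request(octets)
    if status == 401 and encoding == "iso-8859-1":
        # Retry once with UTF-8.  Beware: a site that locks accounts
        # after repeated failures may count this retry against the user.
        status = send_request((username + ":" + password).encode("utf-8"))
    return status
```

Note that a password such as "123" followed by U+00A3 is representable in
ISO-8859-1, so against a UTF-8-only server the first attempt fails and
the UTF-8 retry succeeds.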
Origin servers that do not support non-ASCII characters in credentials do not
require any changes.
Origin servers that need to support non-ASCII characters, but can't use
the UTF-8 encoding will not be affected; they will continue to function
as well as before.
Finally, origin servers that need to support non-ASCII characters and can
use the UTF-8 encoding can opt in as described above. In the worst case,
they'll continue to see either broken credentials or no credentials at
all (depending on how legacy clients handle characters they cannot
encode).
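For servers that opt in, a tolerant decoding strategy can ease the
transition; below is a minimal sketch, assuming ISO-8859-1 as the legacy
fallback (a site-specific assumption) and Python's standard base64 module:

```python
import base64

def decode_basic(token):
    """Decode a Basic 'b64token' into (username, password).

    Tries UTF-8 first; falls back to ISO-8859-1 for legacy clients.
    The fallback encoding is an assumption -- pick whatever encoding
    the site's legacy clients actually use.
    """
    octets = base64.b64decode(token)
    try:
        user_pass = octets.decode("utf-8")
    except UnicodeDecodeError:
        # Every octet sequence is valid ISO-8859-1, so this cannot fail.
        user_pass = octets.decode("iso-8859-1")
    username, _, password = user_pass.partition(":")
    return username, password
```

This works because well-formed UTF-8 is rarely produced by accident:
credentials that fail UTF-8 decoding are almost certainly in the legacy
encoding.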
There are sites in use today that default to a locale encoding, such as
ISO-8859-1, and expect user agents to use that encoding. These sites
will break if the user agent uses a different encoding, such as UTF-8.
Although the solution proposed in this document may be applicable to
"Digest" as well, any attempt to update that scheme is likely to be an
uphill battle.
It appears they will.
Add and close issues "credparam" and "paramcase".
Rewrite the deployment considerations.
Note more recent Mozilla bugzilla entry; add behavior of existing UAs
to FAQ (with pointer to test cases).
Add and resolve issue "xhrutf8".
Add and resolve issue "proxy".
Add and resolve issues "paramname" and "sentparam".
Add issues
"terminology" and
"unorm".
Update HTTPbis reference.
Update HTTPbis reference.
Update HTTPbis and XHR references.
Type: edit
julian.reschke@greenbytes.de (2010-08-11):
Umbrella issue for editorial fixes/enhancements.

Type: edit
julian.reschke@greenbytes.de (2012-02-02):
We need a statement about unicode normalization forms.

Type: edit
julian.reschke@greenbytes.de (2012-02-02):
Try to be consistent with the terminology defined in RFC 6365.