IETF64 Review: Internationalized Email and Extensions (IEE)

By: Marcos Sanz

Date: December 7, 2005


More than a year and a half has passed since the last Internationalizing Email Address BoF, which took place at IETF 59 in Seoul, and apparently the time has now come at IETF 64 for a working group to be formed and to take a stab at the issue of fully internationalized email addresses. The biggest push for this work comes from IETFer colleagues in China, Japan and Korea. And not without reason: imagine having to transliterate your name from Hangul or Kanji into ASCII for most of the email addresses you type. This is not only a cumbersome and alienating process, but also an error-prone one. On top of that, consider that you would only have to do so for the left-hand side of the ‘@’ character, since the right-hand side of the address, the domain name, has already been internationalized by the IDNA standard (RFC 3490). Now, how incoherent is the current situation?

Unlike the IDNA solution, the current approach, as discussed at the IEE BoF, does not operate at the presentation level, but consists of a series of far-reaching modifications to the underlying protocols:

  • The definition of a mailbox in RFC 2821 is revisited and updated in order for the Local Part (everything left of the ‘@’) to support the full Unicode range of characters, not only ASCII, and for the Domain part (everything right of the ‘@’) to support the IDNA standard. This new form of mailbox is called the Internationalized eMail Address (IMA).
  • The mail transfer protocol SMTP needs a service extension that will allow mail transfer agents (MTAs) to signal to their clients at session establishment that they support IMAs. Clients that encounter that extension will then be able to transmit IMAs in raw UTF-8 encoding to the MTAs in the SMTP commands. The IMA extension depends on the MTA also supporting the 8BITMIME extension (RFC 1652).
  • IMAs will not be confined to the SMTP envelope, but spread all over the headers of an email, so the need for characters beyond ASCII in the values of header fields becomes obvious. Though MIME-Extensions (RFC 2047) already provide for partial internationalization support in headers by means of different encodings on the wire, these extensions don’t go far enough. Now with IMA, the definition of header fields as of RFC 2822 will be updated for them to natively support the whole Unicode character space. If an SMTP client communicates with an MTA with IMA support, the client can encode any header field in raw UTF-8. The syntax of header names, however, remains unchanged.
  • A last but crucial issue: downgrade mechanisms are defined for the case in which any of the MTAs involved in the delivery chain of a mail does not support the IMA extension. Basically, the MAIL and RCPT SMTP commands will support an optional parameter (ALT-ADDRESS) that allows a client to convey an alternative non-internationalized address, which could be used as a fallback instead of the original IMA. This alternative address could also be generated automatically by applying to the whole IMA an ASCII encoding mechanism similar to the ACE used for domain names, thus mapping it into ASCII. Downgraded addresses of the latter kind would be marked accordingly. It would be up to the final MTA (or mail delivery agent) to decode the downgraded fields and turn them back into IMAs. As a last resort, when everything else fails, the mail could always be bounced back to the sender.
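
To make the interplay of these four points more concrete, here is a minimal Python sketch of the decision a sending client might take for a single hop. It is an illustration under stated assumptions, not the mechanism defined in the drafts: the extension keyword `IMA`, the helper names and the exact wire syntax of the ALT-ADDRESS parameter are all hypothetical, since none of them is specified here. Only the domain mapping relies on something real, namely Python's built-in `idna` codec, which implements RFC 3490.

```python
from typing import NamedTuple, Optional, Set

# Hypothetical EHLO keyword: the article does not say what the extension is called.
IMA_KEYWORD = "IMA"


class Mailbox(NamedTuple):
    local_part: str  # everything left of the last '@'
    domain: str      # everything right of the last '@'


def split_mailbox(address: str) -> Mailbox:
    """Split an address at its last '@' into Local Part and Domain."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not a mailbox: {address!r}")
    return Mailbox(local_part, domain)


def domain_to_ace(domain: str) -> str:
    """Map a domain to its ASCII form with Python's built-in IDNA codec."""
    return domain.encode("idna").decode("ascii")


def mail_from_command(ima: str,
                      alt_address: Optional[str],
                      server_keywords: Set[str]) -> Optional[str]:
    """Choose the MAIL command for one hop, or None if the mail must bounce."""
    mailbox = split_mailbox(ima)

    # An ASCII Local Part needs no extension: only the domain is mapped to ACE.
    if mailbox.local_part.isascii():
        return f"MAIL FROM:<{mailbox.local_part}@{domain_to_ace(mailbox.domain)}>"

    # The server announced the extension together with 8BITMIME, so the IMA
    # can be sent as is (UTF-8 on the wire), with the fallback attached.
    if {IMA_KEYWORD, "8BITMIME"} <= server_keywords:
        command = f"MAIL FROM:<{ima}>"
        if alt_address:
            # Assumed parameter syntax; the drafts may spell this differently.
            command += f" ALT-ADDRESS={alt_address}"
        return command

    # Downgrade: use the alternative ASCII address if the sender supplied one.
    if alt_address:
        return f"MAIL FROM:<{alt_address}>"

    # Last resort: nothing usable, so the mail has to be bounced.
    return None
```

The sketch leaves out the automatically generated, ACE-like downgrade of the whole IMA, as well as any quoting or escaping of the addresses, because neither encoding is described here.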

These mechanisms, in the form of four I-Ds, are targeted at becoming experimental RFCs. Since not fragmenting the existing email system was recognized as a prime directive, these RFCs will not find their way onto the standards track before implementations appear and the extensions have been thoroughly evaluated in daily operations.

Impact on other protocols which make use of email addresses, notably POP and IMAP, and on others such as LDAP, ACAP or S/MIME, will be evaluated and additional documents will be produced. Interaction with mailing lists and similar distribution mechanisms will be studied and operational guidelines for IMA deployment will be documented.

Internationalized domain names are often associated with phishing and other security problems, such as the so-called homograph attack. That is partly unfair: in truth, all of the spoofing-attempt mails currently received by the author, and they are not few, come from traditional ASCII domain names. Since the advent of IDNA, however, some lessons have been learnt, and it is widely believed that the number of characters allowed by IDNA is far beyond what is actually needed. In the author's view, the internationalized email address effort should apply this knowledge from the beginning, constraining the syntax of the Local Part of the address as necessary. It would be sensible to build upon work done by other experts, for instance the General Security Profile for Identifiers defined by the Unicode Consortium.
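
To illustrate what a homograph looks like to a program, the following Python sketch collects a rough "script label" for every letter in a Local Part. It is a crude heuristic chosen for this illustration and not the Unicode Consortium's profile: the label is simply the first word of the character's Unicode name, and the function name is hypothetical. Legitimate mixtures, such as Japanese kana together with Kanji, would also yield several labels, so the result can at most feed a policy decision and is not one in itself.

```python
import unicodedata
from typing import Set


def letter_script_labels(local_part: str) -> Set[str]:
    """Collect the first word of the Unicode name of every letter."""
    labels = set()
    for ch in local_part:
        # Only letters are considered; digits and punctuation are skipped.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            if name:
                labels.add(name.split()[0])
    return labels


# A pure ASCII name versus a lookalike whose second letter is Cyrillic.
plain = letter_script_labels("paypal")
spoof = letter_script_labels("p\u0430ypal")  # U+0430 CYRILLIC SMALL LETTER A
```

Here `plain` contains only `LATIN`, whereas `spoof` contains both `LATIN` and `CYRILLIC`, although the two strings look the same in many fonts.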

All in all, a challenging, multidisciplinary task, which will need as much peer review as possible. What are you waiting for?