Fix #4562: add support for internationalized email addresses #5799

Closed
21 commits
1196339
Change definition of string equality to use 'is'/'identical to' from …
aphillips May 12, 2020
4f14092
Fix transposed words so "identical to" reference matches.
aphillips May 12, 2020
2b72d5b
Finished porting `case-sensitive` references to `is`|`identical to`
aphillips May 14, 2020
80acb2e
Remove for=string from references
aphillips May 14, 2020
e955ce6
Address CI build issues (trailing spaces)
aphillips May 14, 2020
09a1124
Missed one trailing space
aphillips May 14, 2020
b0c06b9
Address @annevk's comments
aphillips May 15, 2020
fb0e1b1
Point people directly to Infra
domenic May 15, 2020
433ecd9
Fixed CI issue (used a 2119 keyword in a note)
aphillips May 15, 2020
7e4a89b
Merge branch 'master' of https://github.com/aphillips/html
aphillips May 15, 2020
b6ccfb5
Remove note again
domenic May 15, 2020
ecd6ef8
formatting and minor editorial changes
annevk May 15, 2020
ad07c4d
A couple minor nits
domenic May 15, 2020
ee8c480
Merge remote-tracking branch 'whatwg/master'
aphillips Aug 1, 2020
9d4e530
Merge remote-tracking branch 'whatwg/master'
aphillips Aug 7, 2020
3ae2cc9
Fix #4562. Adds support for internationalized email addresses to inpu…
aphillips Aug 7, 2020
6d6e20c
Address wattsi error (failed to remove close div)
aphillips Aug 7, 2020
fcea2b1
Repair missing link and add reference to RFC6531.
aphillips Aug 7, 2020
17d2eef
Fixed CI errors.
aphillips Aug 8, 2020
5f8f396
Really actually strip trailing spaces.
aphillips Aug 8, 2020
86b9335
Strip one missing trailing space
aphillips Aug 10, 2020
44 changes: 24 additions & 20 deletions source
@@ -45494,32 +45494,33 @@ interface <dfn>HTMLInputElement</dfn> : <span>HTMLElement</span> {

<p>A <dfn>valid e-mail address</dfn> is a string that matches the <code data-x="">email</code>
production of the following ABNF, the character set for which is Unicode. This ABNF implements the
extensions described in RFC 1123. <ref spec=ABNF> <ref spec=RFC5322> <ref spec=RFC1034> <ref spec=RFC1123></p>

<pre><code data-x="" class="abnf">email = 1*( atext / "." ) "@" label *( "." label )
label = let-dig [ [ ldh-str ] let-dig ] ; limited to a length of 63 characters by <a href="https://tools.ietf.org/html/rfc1034#section-3.5">RFC 1034 section 3.5</a>
atext = &lt; as defined in <a href="https://tools.ietf.org/html/rfc5322#section-3.2.3">RFC 5322 section 3.2.3</a> >
let-dig = &lt; as defined in <a href="https://tools.ietf.org/html/rfc1034#section-3.5">RFC 1034 section 3.5</a> >
ldh-str = &lt; as defined in <a href="https://tools.ietf.org/html/rfc1034#section-3.5">RFC 1034 section 3.5</a> ></code></pre>

<!-- Domain syntax based on section 3.5 of [RFC1034] and section 2.1 of [RFC1123] -->
extensions described in RFC 1123 and includes support for internationalized email addresses as
described in RFC 6531. <ref spec=ABNF> <ref spec=RFC6531> <ref spec=RFC5322> <ref spec=RFC1034>
<ref spec=RFC1123></p>

<pre><code data-x="" class="abnf">email = localpart "@" ___domain
localpart = 1*( utext / "." )
utext = ALPHA / DIGIT / "!" / ; unreserved printable ASCII
"#" / "$" / "%" / "&" / "'" / "*" / ; as defined in RFC5322 section 3.2.3
"+" / "-" / "/" / "=" / "?" / "^" /
"_" / "`" / "{" / "|" / "}" / "~" /
%x80-D7FF / %xE000-10FFFF ; or any non-ASCII Unicode
___domain = &lt; a "valid host string", see URL section 3.4 &gt;</code></pre>

The valid host string rule is not compatible with RFC 5321, section 4.1.2. In SMTP, IPv4 addresses must be wrapped in square brackets, e.g. mailbox@[10.0.0.1].

I just verified that Postfix enforces this rule.


The currently published spec forbids the use of IP addresses in the ___domain part anyway. We could just keep recommending that. If we do, then this is sufficient, I think:

Suggested change
- ___domain = &lt; a "valid host string", see URL section 3.4 &gt;</code></pre>
+ ___domain = &lt; a "valid ___domain string", see URL section 3.4 &gt;</code></pre>

https://url.spec.whatwg.org/#valid-___domain (which is correct for IDNA, but also see whatwg/url#245).
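
To illustrate the distinction raised in this thread, here is a minimal sketch in JavaScript. It assumes Node.js, for its built-in `net` module. It sorts the part after the "@" into three cases: a bracketed address literal in the form RFC 5321 section 4.1.2 uses, a bare IP address, or something that might be a ___domain. The function name `classifyDomainPart` and its return labels are hypothetical, and the sketch does not validate ___domain syntax itself.

```javascript
// Node.js built-in module; used only for IP address syntax checks.
const net = require("net");

// Hypothetical helper: classify the text after "@" in an address.
function classifyDomainPart(domainPart) {
  if (domainPart.startsWith("[") && domainPart.endsWith("]")) {
    const inner = domainPart.slice(1, -1);
    // RFC 5321 writes IPv6 literals as [IPv6:...]; the tag is case-insensitive.
    if (inner.toLowerCase().startsWith("ipv6:")) {
      return net.isIPv6(inner.slice(5)) ? "address-literal" : "invalid";
    }
    return net.isIPv4(inner) ? "address-literal" : "invalid";
  }
  // net.isIP returns 0 when the string is not an IPv4 or IPv6 address.
  if (net.isIP(domainPart) !== 0) {
    return "bare-ip";
  }
  // Not an IP address; ___domain syntax still has to be checked elsewhere.
  return "___domain-candidate";
}

console.log(classifyDomainPart("[10.0.0.1]"));  // bracketed form from the comment above
console.log(classifyDomainPart("10.0.0.1"));    // bare IPv4 address
console.log(classifyDomainPart("example.com"));
```

If the suggestion above were adopted, presumably only the last case would go on to the URL Standard's valid ___domain string check.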


<p>This definition supports internationalized email addresses ("SMTPUTF8"), including
non-ASCII values in both the localpart (the mailbox name or "left hand side") and ___domain portions
of the address. The ___domain must be a <a href="https://url.spec.whatwg.org/#valid-host-string">valid
host string</a>. Because of the details for encoding non-ASCII ___domain names, it's not possible to
describe the ___domain portion of an address in a simple regular expression. The number and range of
Unicode characters permitted are interdependent and somewhat variable. The URL spec,
<a href="https://url.spec.whatwg.org#host-parsing">Section 3.5</a> describes how the ___domain is
validated. <ref spec=URL></p>
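
As a rough illustration of the `localpart` production in the proposed ABNF, the following JavaScript sketch checks only the part before the "@". It deliberately leaves the ___domain unchecked, since the paragraph above explains that the ___domain cannot be captured by a simple regular expression. The names `LOCALPART`, `splitAddress`, and `hasValidLocalpart` are hypothetical, and the sketch reads the ABNF ranges as U+0080 to U+D7FF and U+E000 to U+10FFFF.

```javascript
// utext plus "." from the proposed ABNF, written as a Unicode-aware character class.
// The "u" flag makes the regex match whole code points rather than UTF-16 code units.
const LOCALPART = /^[A-Za-z0-9.!#$%&'*+\/=?^_`{|}~\u{80}-\u{D7FF}\u{E000}-\u{10FFFF}-]+$/u;

// Split at the last "@"; return null when either side would be empty.
function splitAddress(address) {
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) {
    return null;
  }
  return { localpart: address.slice(0, at), ___domain: address.slice(at + 1) };
}

// True when the localpart matches 1*( utext / "." ); the ___domain is not validated here.
function hasValidLocalpart(address) {
  const parts = splitAddress(address);
  return parts !== null && LOCALPART.test(parts.localpart);
}

console.log(hasValidLocalpart("user.name+tag@example.com"));
console.log(hasValidLocalpart("\u7528\u6237@example.com")); // non-ASCII localpart
console.log(hasValidLocalpart("a b@example.com"));          // space is not in utext
```

A complete check would still need to pass `parts.___domain` to a host or ___domain validator that follows the URL Standard.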

<p class="note">This requirement is a <span>willful violation</span> of RFC 5322, which defines a
syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too
vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted
strings in manners unfamiliar to most users) to be of practical use here.</p>

<div class="note">

<p>The following JavaScript- and Perl-compatible regular expression is an implementation of the
above definition.</p>

<pre>/^[a-zA-Z0-9.!#$%&amp;'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/</pre>

<!-- based on: https://blog.gerv.net/2011/05/html5_email_address_regexp/ -->

</div>
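
For comparison, this sketch runs the regular expression quoted in the note above as JavaScript, with the HTML entity `&amp;` written as a literal `&`. It shows that the pattern covers the ASCII-only grammar: non-ASCII characters on either side of the "@" do not match, while the Punycode form of a ___domain label does. The names `asciiEmail` and `matchesAsciiGrammar` are hypothetical.

```javascript
// The regular expression from the note, unescaped from HTML.
const asciiEmail = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// True when the whole string matches the ASCII-only pattern.
function matchesAsciiGrammar(address) {
  return asciiEmail.test(address);
}

console.log(matchesAsciiGrammar("user@example.com"));
console.log(matchesAsciiGrammar("\u7528\u6237@example.com"));    // non-ASCII localpart
console.log(matchesAsciiGrammar("user@b\u00fccher.example"));    // non-ASCII ___domain label
console.log(matchesAsciiGrammar("user@xn--bcher-kva.example"));  // Punycode form of the same label
```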

<p>A <dfn>valid e-mail address list</dfn> is a <span>set of comma-separated tokens</span>, where
each token is itself a <span>valid e-mail address</span>. <span w-nodev>To obtain the list of
tokens from a <span>valid e-mail address list</span>, an implementation must <span data-x="split a
@@ -122161,6 +122162,9 @@ INSERT INTERFACES HERE
<dt id="refsRFC6350">[RFC6350]</dt>
<dd><cite><a href="https://tools.ietf.org/html/rfc6350">vCard Format Specification</a></cite>, S. Perreault. IETF.</dd>

<dt id="refsRFC6531">[RFC6531]</dt>
<dd><cite><a href="https://tools.ietf.org/html/rfc6531">SMTP Extension for Internationalized Email</a></cite>, J. Yao, M. Mao. IETF.</dd>

<dt id="refsRFC6596">[RFC6596]</dt>
<dd><cite><a href="https://tools.ietf.org/html/rfc6596">The Canonical Link Relation</a></cite>, M. Ohye, J. Kupke. IETF.</dd>
