In ECMA-262 5.1, white space characters are defined in section 7.2 as the following characters:
\u0009 \u000B \u000C \u0020 \u00A0 \uFEFF
... and any other character in category "Zs", as defined by Unicode.
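As a rough illustration of that definition, the check can be written as a small predicate. This is only a sketch, in Python rather than JavaScript: `is_es5_whitespace` is a made-up name, and the "Zs" lookup uses whichever Unicode version the Python interpreter ships with, so the answer for borderline characters may differ between installations.

```python
import unicodedata

# Code points listed explicitly in ECMA-262 5.1, section 7.2.
EXPLICIT_WHITE_SPACE = {0x0009, 0x000B, 0x000C, 0x0020, 0x00A0, 0xFEFF}


def is_es5_whitespace(ch):
    """Return True if ch is explicitly listed or has general category Zs.

    The category comes from Python's bundled Unicode data
    (see unicodedata.unidata_version), not from any ECMAScript engine.
    """
    return ord(ch) in EXPLICIT_WHITE_SPACE or unicodedata.category(ch) == "Zs"
```

Because the second half of the test defers to the Unicode database, the result for a given character is only as stable as that character's category.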
Well, that's simple, right? The Unicode database is available for download and easy to parse. Let's do this:
$ wget http://www.unicode.org/Public/6.2.0/ucd/UnicodeData.txt \
-O unicode-6.2.0.txt
$ grep Zs unicode-6.2.0.txt | wc -l
18
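Note that `grep Zs` matches those two letters anywhere in a record. A stricter variant compares only the third semicolon-separated field, which holds the general category. A minimal sketch in Python, where `zs_code_points` is a hypothetical helper and the sample records are trimmed to their first three fields for brevity:

```python
def zs_code_points(unicode_data):
    """Return the code points (hex strings) whose general category is Zs."""
    result = []
    for line in unicode_data.splitlines():
        fields = line.split(";")
        # Field 0 is the code point, field 1 the name, field 2 the category.
        if len(fields) > 2 and fields[2] == "Zs":
            result.append(fields[0])
    return result


# Illustrative records only, trimmed to the first three fields.
sample = "\n".join([
    "0020;SPACE;Zs",
    "0041;LATIN CAPITAL LETTER A;Lu",
    "00A0;NO-BREAK SPACE;Zs",
])
print(zs_code_points(sample))  # ['0020', '00A0']
```

Run against the downloaded file, `len(zs_code_points(open("unicode-6.2.0.txt").read()))` would be expected to agree with the grep count, as long as "Zs" appears in no other field.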
So Unicode 6.2.0 itself assigns 18 characters to category "Zs". But wait, that's Unicode 6.2.0. What about the newer version, 6.3.0?
$ wget http://www.unicode.org/Public/6.3.0/ucd/UnicodeData.txt \
-O unicode-6.3.0.txt
$ grep Zs unicode-6.3.0.txt | wc -l
17
We ended up with one character fewer! What's the difference?
$ diff -y -W80 <(grep Zs unicode-6.3.0.txt) \
<(grep Zs unicode-6.2.0.txt)
0020;SPACE;Zs;0;WS;;;;;N;;;;; 0020;SPACE;Zs;0;WS;;;;;N;;;;;
00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak>
1680;OGHAM SPACE MARK;Zs;0;WS;;;;;N;; 1680;OGHAM SPACE MARK;Zs;0;WS;;;;;N;;
> 180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;W
2000;EN QUAD;Zs;0;WS;2002;;;;N;;;;; 2000;EN QUAD;Zs;0;WS;2002;;;;N;;;;;
2001;EM QUAD;Zs;0;WS;2003;;;;N;;;;; 2001;EM QUAD;Zs;0;WS;2003;;;;N;;;;;
2002;EN SPACE;Zs;0;WS;<compat> 0020;; 2002;EN SPACE;Zs;0;WS;<compat> 0020;;
2003;EM SPACE;Zs;0;WS;<compat> 0020;; 2003;EM SPACE;Zs;0;WS;<compat> 0020;;
2004;THREE-PER-EM SPACE;Zs;0;WS;<comp 2004;THREE-PER-EM SPACE;Zs;0;WS;<comp
2005;FOUR-PER-EM SPACE;Zs;0;WS;<compa 2005;FOUR-PER-EM SPACE;Zs;0;WS;<compa
2006;SIX-PER-EM SPACE;Zs;0;WS;<compat 2006;SIX-PER-EM SPACE;Zs;0;WS;<compat
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0 2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0
2008;PUNCTUATION SPACE;Zs;0;WS;<compa 2008;PUNCTUATION SPACE;Zs;0;WS;<compa
2009;THIN SPACE;Zs;0;WS;<compat> 0020 2009;THIN SPACE;Zs;0;WS;<compat> 0020
200A;HAIR SPACE;Zs;0;WS;<compat> 0020 200A;HAIR SPACE;Zs;0;WS;<compat> 0020
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<n 202F;NARROW NO-BREAK SPACE;Zs;0;CS;<n
205F;MEDIUM MATHEMATICAL SPACE;Zs;0;W 205F;MEDIUM MATHEMATICAL SPACE;Zs;0;W
3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide>
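The same comparison can be made without eyeballing a side-by-side diff, by comparing the category field per code point. A sketch under the same assumptions as the file format above: `categories` and `left_category` are hypothetical helper names, and the sample records are trimmed to their first three fields.

```python
def categories(unicode_data):
    """Map each code point (hex string) to its general category."""
    table = {}
    for line in unicode_data.splitlines():
        fields = line.split(";")
        if len(fields) > 2:
            table[fields[0]] = fields[2]
    return table


def left_category(old_data, new_data, category="Zs"):
    """Return {code point: new category} for characters that had
    `category` in old_data but no longer have it in new_data."""
    old, new = categories(old_data), categories(new_data)
    return {
        cp: new.get(cp)
        for cp, cat in old.items()
        if cat == category and new.get(cp) != category
    }


# Illustrative records only, trimmed to the first three fields.
old_sample = "1680;OGHAM SPACE MARK;Zs\n180E;MONGOLIAN VOWEL SEPARATOR;Zs\n2000;EN QUAD;Zs"
new_sample = "1680;OGHAM SPACE MARK;Zs\n180E;MONGOLIAN VOWEL SEPARATOR;Cf\n2000;EN QUAD;Zs"
print(left_category(old_sample, new_sample))  # {'180E': 'Cf'}
```

Unlike the grep-based diff, this also reports which category the character moved to.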
Apparently, \u180E (Mongolian Vowel Separator) is to category "Zs" as Pluto is to the sun's planets: it used to be a member, but Unicode 6.3.0 reclassified it (to category "Cf").
It seems that Test262 simply does not reflect this change yet. That is probably also why browsers still treat it as white space: changing that would unnecessarily lower their Test262 scores.