Why do they parse 16-bit UCS-2? Although JavaScript strings are internally UCS-2, the files as transferred on the web are all UTF-8, which should be easier to parse fast and involve less conversion, unless there is something really broken in the spec.
> the files as transferred on the web are all UTF-8
Except they're not. There's a ton of ISO-8859-1 out there, as well as Big5, Shift_JIS, and so forth.
So Gecko canonicalizes all input to a single encoding; that happens to be UTF-16 for various historical reasons.
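To make "canonicalizes all input to a single encoding" concrete, here is a rough Python sketch. It is not Gecko's code, and the helper name `to_utf16_code_units` is made up for illustration. Bytes in whatever charset was declared get decoded, then re-expressed as 16-bit code units:

```python
def to_utf16_code_units(raw, declared_charset):
    """Decode bytes from the declared charset and return a list of UTF-16 code units.

    Hypothetical helper for illustration only; a real browser does this
    incrementally and handles charset sniffing and decode errors.
    """
    text = raw.decode(declared_charset)
    utf16 = text.encode("utf-16-le")  # little-endian, no BOM
    return [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]


# The same pipeline accepts very different legacy encodings.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")   # 4 bytes
sjis_bytes = "\u65e5\u672c".encode("shift_jis")   # 4 bytes, 2 characters
big5_bytes = "\u4e2d\u6587".encode("big5")        # 4 bytes, 2 characters

latin1_units = to_utf16_code_units(latin1_bytes, "iso-8859-1")
sjis_units = to_utf16_code_units(sjis_bytes, "shift_jis")
big5_units = to_utf16_code_units(big5_bytes, "big5")
```

After this step the parser only ever sees one representation, whatever the server sent.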
Note, by the way, that since some non-ASCII characters are syntax-special in JavaScript, parsing it in UTF-8 may not in fact be simpler to do fast, because locating particular non-ASCII chars in a UTF-8 stream is a bit of a PITA.
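For example, JavaScript treats U+2028 and U+2029 as line terminators. In UTF-16 each is a single code unit; in UTF-8 each is the three-byte sequence E2 80 A8 or E2 80 A9. A minimal Python sketch of the two scans (function names are made up here, and the UTF-8 version assumes well-formed input):

```python
def find_line_separators_utf16(units):
    """Return indexes of U+2028/U+2029 in a list of UTF-16 code units."""
    return [i for i, u in enumerate(units) if u in (0x2028, 0x2029)]


def find_line_separators_utf8(data):
    """Return byte offsets of U+2028/U+2029 in well-formed UTF-8 bytes."""
    hits = []
    i = 0
    while i + 2 < len(data):
        # Both characters encode as E2 80 A8 / E2 80 A9, so three bytes
        # must be inspected instead of one code unit.
        if data[i] == 0xE2 and data[i + 1] == 0x80 and data[i + 2] in (0xA8, 0xA9):
            hits.append(i)
            i += 3
        else:
            i += 1
    return hits


source = "a\u2028b\u00e9\u2029"
utf8_bytes = source.encode("utf-8")
utf16_bytes = source.encode("utf-16-le")
utf16_units = [int.from_bytes(utf16_bytes[i:i + 2], "little")
               for i in range(0, len(utf16_bytes), 2)]

utf8_hits = find_line_separators_utf8(utf8_bytes)
utf16_hits = find_line_separators_utf16(utf16_units)
```

The UTF-8 scan is not hard, but the hot loop has more branches than a single 16-bit compare.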
Has anyone got any more up-to-date figures than these from 2008 (http://www.w3.org/QA/2008/05/utf8-web-growth.html), when UTF-8 + ASCII was already 50% of the web? I think there is a lot less ISO 8859-1 than there was back then.
I also think that for JavaScript files, almost all will be ASCII, with some UTF-8, and little else. JSON is of course only allowed to be Unicode, and I suspect that almost all of it is UTF-8. I presume that will not generally hit the JavaScript parser (although maybe Gecko also normalises it?).
Locating non-ASCII chars obviously is an issue, but no worse than the line feed issue mentioned in the article. Doubling the size of the input is generally a big performance hit that will be much more significant.
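To put numbers on "doubling", a quick Python sketch (the helper `encoded_sizes` is made up for illustration). The doubling holds for ASCII-only source; text made of CJK characters actually comes out smaller in UTF-16:

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


ascii_sizes = encoded_sizes("var x = 1;")          # typical ASCII-only script text
cjk_sizes = encoded_sizes("\u65e5\u672c\u8a9e")    # three CJK characters
```

Since most script files are close to pure ASCII, the first case is the one that matters here.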
> Doubling the size of the input is generally a big
> performance hit that will be much more significant.
Apart from cache locality issues, it's not, really. And for a linear scan like this prefetching does OK.
And yes, Gecko normalizes JSON input to the JS engine.
I don't have numbers, sorry. But yes, UTF-8 would be the other thing to try normalizing to; it has its own benefits and drawbacks when the system is considered as a whole.
This will make JavaScript parsing faster for the vast majority of sites, and for the majority of users.
If you have a single JavaScript file that is 1MB after gzipping, you really should think about modularising your code and lazy-loading the bits that aren't immediately required. See http://ajaxpatterns.org/On-Demand_Javascript