Mathias Bynens

Valid JavaScript variable names in ECMAScript 6

Tagged with JavaScript, Unicode

ES6 updates the grammar for identifiers. This affects a number of things, but most importantly, identifiers can be used as variable names, and identifier names are valid unquoted property names. This post describes the observable changes compared to the old ES5 behavior.

Reserved words

ES6 reserves the await keyword in module contexts for use in the future.

// Valid in ES5, but invalid in an ES6 module context:
var await;

Escape sequences

The only type of escape sequence allowed in ES5 identifiers is the so-called Unicode escape of the form \uXXXX.

In ES6, the new Unicode code point escape syntax is accepted as well.

// Valid in ES5 and ES6:
var a;

// Valid in ES5 and ES6:
var \u0061;

// Invalid in ES5, but valid in ES6:
var \u{61};
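
As a quick check that the three spellings above are interchangeable, the following sketch declares a variable with each form and reads it back through the plain name a. It assumes an ES6-capable engine (an ES5 engine would reject the third function body with a SyntaxError), and the helper name declareAndRead is made up for this illustration.

```javascript
// Hypothetical helper: compiles a function body that declares a variable
// using the given spelling, then reads it back via the plain name `a`.
function declareAndRead(spelling, value) {
  const body = 'var ' + spelling + ' = ' + JSON.stringify(value) + '; return a;';
  return new Function(body)();
}

const results = [
  declareAndRead('a', 1),       // raw symbol
  declareAndRead('\\u0061', 2), // ES5-style Unicode escape
  declareAndRead('\\u{61}', 3), // ES6 code point escape
];

console.log(results); // [1, 2, 3]
```

All three function bodies bind the same variable, which is why return a; works regardless of how the declaration was spelled.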

Acceptable Unicode symbols

In ES6, identifiers must start with $, _, or any symbol with the Unicode derived core property ID_Start.

The rest of the identifier can contain $, _, U+200C zero width non-joiner, U+200D zero width joiner, or any symbol with the Unicode derived core property ID_Continue.

This differs from the definition for ES5 identifier names that was based on Unicode categories. Consequently, some Unicode symbols that were disallowed in ES5 identifiers can now be used in ES6 identifiers just fine, and vice versa.

// Valid in ES5 & Unicode v5.1.0+, but invalid in ES6:
var ⸯ; // U+2E2F VERTICAL TILDE
var \u2E2F; // U+2E2F VERTICAL TILDE
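
The ES6 rules described above map fairly directly onto a regular expression in engines that support Unicode property escapes (\p{…} with the u flag, which arrived in ES2018, so after ES6 itself). The following is a minimal sketch, not the generated expression mentioned under Resources: it ignores escape sequences and reserved words, and it reflects whatever Unicode version the engine ships rather than v5.1.0. The name looksLikeES6Identifier is invented for this example.

```javascript
// First symbol: $, _, or ID_Start.
// Remaining symbols: $, ZWNJ (U+200C), ZWJ (U+200D), or ID_Continue
// (ID_Continue already includes _).
const identifierPattern = /^[$_\p{ID_Start}][$\u200C\u200D\p{ID_Continue}]*$/u;

// Assumption: the input is the raw identifier text, with no \u escapes in it.
function looksLikeES6Identifier(string) {
  return identifierPattern.test(string);
}

console.log(looksLikeES6Identifier('$scope'));  // true
console.log(looksLikeES6Identifier('_\u0487')); // true: U+0487 is ID_Continue
console.log(looksLikeES6Identifier('\u2E2F'));  // false: U+2E2F is not ID_Start
console.log(looksLikeES6Identifier('1up'));     // false: digits cannot start an identifier
```

Note that this pattern accepts reserved words such as var, so it only covers the symbol-level part of the grammar.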

In ES5 identifiers, astral symbols were disallowed, even when represented as an escaped surrogate pair (\uXXXX\uXXXX).

In ES6, astral ID_Start or ID_Continue symbols in identifiers are accepted when represented as a raw symbol or using a single \u{…} escape sequence.

// Invalid in ES5, but valid in ES6:
var 𐊧; // U+102A7 CARIAN LETTER A2
var \u{102A7}; // U+102A7 CARIAN LETTER A2

// Invalid in ES5 and ES6:
var \uD800\uDEA7; // U+102A7 represented as a surrogate pair
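
To see why the surrogate-pair form is a separate case: in a JavaScript string, an astral symbol occupies two UTF-16 code units, yet the ES6 identifier grammar evaluates each \uXXXX escape on its own, and a lone surrogate is neither ID_Start nor ID_Continue. The sketch below assumes an ES6-compliant engine. parses is a hypothetical helper that only compiles the given source via new Function and never runs it.

```javascript
// Hypothetical helper: reports whether the source compiles as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const symbol = '\u{102A7}'; // CARIAN LETTER A2
const codeUnits = [symbol.charCodeAt(0), symbol.charCodeAt(1)]
  .map((unit) => unit.toString(16).toUpperCase());

console.log(symbol.length); // 2
console.log(codeUnits);     // ['D800', 'DEA7']

console.log(parses('var ' + symbol + ';'));  // true: raw astral symbol
console.log(parses('var \\u{102A7};'));      // true: single code point escape
console.log(parses('var \\uD800\\uDEA7;'));  // false: escaped surrogate pair
```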

The ES5 spec allowed implementations to base their identifier support on Unicode versions as old as Unicode v3.0.0. ES6 lists Unicode v5.1.0 as the minimum Unicode version required for compatibility.

// Valid in ES5, but only works in some ES5 engines (i.e. those with Unicode
// data from v3.2.0 or more recent):
var Ƞ; // U+0220 LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
var \u0220;
// On the other hand, it is guaranteed to work in all ES6-compliant engines.

// Valid in ES5, but only works in some ES5 engines (i.e. those with Unicode
// data from v4.0.0 or more recent):
var ȡ; // U+0221 LATIN SMALL LETTER D WITH CURL
var \u0221;
// On the other hand, it is guaranteed to work in all ES6-compliant engines.

// Valid in ES5, but only works in some ES5 engines (i.e. those with Unicode
// data from v5.1.0 or more recent):
var _҇; // U+0487 COMBINING CYRILLIC POKRYTIE
var _\u0487;
// On the other hand, it is guaranteed to work in all ES6-compliant engines.

No more non-standard behavior

At some point, all major JavaScript engines supported reserved words as identifiers if at least one of the characters was escaped. For example, var var; wouldn’t work, but var v\u0061r; would, even though this was never part of the spec.

// Invalid in ES5 and ES6:
var var;

// Invalid in ES5 and ES6, but supported in old ES5 engines:
var v\u0061r;

ES6 explicitly makes this behavior non-conforming, and implementations are moving away from it. Firefox/SpiderMonkey, Safari/JavaScriptCore, and IE/Chakra have already dropped this behavior; Chrome/Opera/V8 plan to.
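
One way to find out which behavior a given engine has is to feature-test it. This is only a sketch: the function name acceptsEscapedKeyword is hypothetical, and it relies on new Function throwing a SyntaxError at compile time when the body is rejected.

```javascript
// Returns true if the engine still treats an escaped reserved word as an
// ordinary identifier (the old non-standard behavior), false otherwise.
function acceptsEscapedKeyword() {
  try {
    new Function('var v\\u0061r;'); // source text: var v\u0061r;
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(
  acceptsEscapedKeyword()
    ? 'Escaped reserved words are accepted (legacy behavior).'
    : 'Escaped reserved words are rejected (ES6-conforming).'
);
```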

Bugs

There are open bug tickets to fully implement the ES6 identifier grammar in Chrome/Opera/V8 (now fixed), Firefox/SpiderMonkey, Safari/JavaScriptCore, Microsoft Edge/Chakra (now fixed), Acorn (and therefore Babel) (#214 (now fixed), #215), Esprima, and Traceur.

Resources

I wrote some identifier tests based on ES6 and Unicode 5.1.0, i.e. the minimum required Unicode version as per the spec. They helped me find bugs in several engines.

I created a script that generates a regular expression matching only valid identifiers as per ES5 and ES6. At the time of writing, the ES5 version is being used in the Esprima and Acorn parsers, among other open-source projects.

There’s also an online JavaScript identifier validator, a tool that makes it easy for you to check if a given string is a valid variable name in JavaScript.

I’ve updated the unquoted JavaScript property name validator accordingly.

About me

Hi there! I’m Mathias. I work on V8 at Google. HTML, CSS, JavaScript, Unicode, performance, and security get me excited. If you managed to read this far without falling asleep, you should follow me on Twitter and GitHub.

Comments

Jan wrote on :

Damn, for a moment I got excited there, thinking I could finally end my identifiers in question marks. Oh well.

Thanks for the writeup!

Chris Ball wrote on :

Jan: Hmm, but there are >100k valid codepoints — seems like we should be able to find one that’s either actually or just mostly visually identical to the question mark, and use that? :)

Mathias wrote on :

Jan: ƪ, Ɂ, ʔ, and ʡ all look a little bit like a question mark. And here’s a valid identifier that resembles an exclamation mark: ǃ.

To get a quick overview of almost all identifier symbols, I used the following commands:

$ npm install unicode-7.0.0

$ node -p 'require("unicode-7.0.0/Binary_Property/ID_Continue/symbols.js").join(" ")' > symbols.txt

$ less symbols.txt

Jan wrote on :

Chris, Mathias: I have thought of that too, but I suspect this would be rather inconvenient to work with.

It would have only been an aesthetic nicety I’ve come to enjoy in Ruby. As such, it’s not worth jumping through hoops for.

Charles wrote on :

Jan: Maybe you could adopt the Common Lisp convention of putting a p at the end of predicate function names.

Jean-René Bouvier wrote on :

I was looking for an infinity symbol, and the closest I found is U+1011, i.e. the Myanmar letter Tha: ထ.

So you can write:

const ထ = Infinity;
if (1 / ထ != 0) {
  // …
}

Jean-René Bouvier wrote on :

Mathias: If you combine the Latin letter glottal stop (ʔ U+0294) with the combining dot below (̣ U+0323), you get a near perfect question mark, ʔ̣, as well as a valid JavaScript identifier.

Jean-René Bouvier wrote on :

I tried ᝌ (U+174C BUHID LETTER YA) in the validator, but there’s no support for this character set. Now that Google has created the Noto fonts, it might be worth using them to display all/most Unicode characters.

Ricky Reusser wrote on :

I just got a sick feeling in the pit of my stomach at the thought of debugging:

> var с = 17;
> console.log(c);
< Uncaught ReferenceError: c is not defined at <anonymous>:1:1
