Note: This article is based on the ECMAScript 5 specification. For the updated ES2015 version, see Valid JavaScript variable names in ES2015.
Did you know var π = Math.PI; is syntactically valid JavaScript? I thought this was pretty cool, so I decided to look into which Unicode glyphs are allowed in JavaScript variable names, or identifiers as the ECMAScript specification calls them.
Reserved words
The ECMAScript 5.1 spec says:
An Identifier is an IdentifierName that is not a ReservedWord.
The spec describes four groups of reserved words: keywords, future reserved words, null literals and boolean literals.
Keywords are tokens that have special meaning in JavaScript: break, case, catch, continue, debugger, default, delete, do, else, finally, for, function, if, in, instanceof, new, return, switch, this, throw, try, typeof, var, void, while, and with.
Future reserved words are tokens that may become keywords in a future revision of ECMAScript: class, const, enum, export, extends, import, and super. Some future reserved words only apply in strict mode: implements, interface, let, package, private, protected, public, static, and yield.
The null literal is, simply, null.
There are two boolean literals: true and false.
None of the above are allowed as variable names.
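Here’s a minimal sketch that turns the four groups above into a lookup, assuming you only care about ES5.1. The function isReservedWordES5 and its isStrict flag are hypothetical names made up for this example, not part of any library.

```javascript
// ES5.1 reserved words, grouped as described above.
var keywords = [
  'break', 'case', 'catch', 'continue', 'debugger', 'default', 'delete',
  'do', 'else', 'finally', 'for', 'function', 'if', 'in', 'instanceof',
  'new', 'return', 'switch', 'this', 'throw', 'try', 'typeof', 'var',
  'void', 'while', 'with'
];
var futureReservedWords = ['class', 'const', 'enum', 'export', 'extends', 'import', 'super'];
// These are only reserved in strict mode code.
var strictFutureReservedWords = [
  'implements', 'interface', 'let', 'package', 'private', 'protected',
  'public', 'static', 'yield'
];
// The null literal and the two boolean literals.
var literals = ['null', 'true', 'false'];

// Hypothetical helper: returns true if `name` is an ES5 ReservedWord.
function isReservedWordES5(name, isStrict) {
  return keywords.indexOf(name) !== -1 ||
    futureReservedWords.indexOf(name) !== -1 ||
    literals.indexOf(name) !== -1 ||
    (isStrict === true && strictFutureReservedWords.indexOf(name) !== -1);
}

isReservedWordES5('var');       // true
isReservedWordES5('let');       // false
isReservedWordES5('let', true); // true
isReservedWordES5('NaN');       // false: not reserved, even though it behaves specially
```

This only answers the reserved-word half of the question; whether the rest of the string is a valid identifier name is a separate check.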
Non-reserved words that act like reserved words
The NaN, Infinity, and undefined properties of the global object are read-only (non-writable and non-configurable) in ES5. So even though var NaN = 42; in the global scope wouldn’t throw an error in non-strict code, it wouldn’t actually do anything. To avoid confusion, I’d suggest not using these as variable names.
// In the global scope:
var NaN = 42;
console.log(NaN); // NaN
// …but elsewhere:
(function() {
var NaN = 42;
console.log(NaN); // 42
}());
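A quick way to check this in a modern engine is to look at the property descriptors. This sketch assumes globalThis is available (it was only standardized in ES2020); describeGlobal and strictAssignmentError are hypothetical helper names. The second helper also shows that strict mode code doesn’t silently ignore the assignment but throws a TypeError.

```javascript
// Assumes a modern engine: globalThis is an ES2020 addition, not part of ES5.
function describeGlobal(name) {
  var descriptor = Object.getOwnPropertyDescriptor(globalThis, name);
  return { writable: descriptor.writable, configurable: descriptor.configurable };
}

// Returns the name of the error thrown when assigning to the global `name`
// from strict mode code, or null if the assignment went through.
function strictAssignmentError(name) {
  try {
    new Function('"use strict"; ' + name + ' = 42;')();
    return null;
  } catch (error) {
    return error.name;
  }
}

describeGlobal('NaN');        // { writable: false, configurable: false }
strictAssignmentError('NaN'); // 'TypeError'
```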
In strict mode, eval and arguments are disallowed as variable names too. (They kind of act like keywords in that case.)
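You can observe this without running any code: the Function constructor parses a function body and throws a SyntaxError if it’s invalid, but doesn’t call the function. The helper parses below is hypothetical, written just for this sketch.

```javascript
// Hypothetical helper: returns true if `body` parses as a function body.
// The Function constructor compiles the body but never calls it here.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

parses('var eval = 1, arguments = 2;');     // true
parses('"use strict"; var eval = 1;');      // false
parses('"use strict"; var arguments = 1;'); // false
```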
The old ES3 spec defines some reserved words that aren’t reserved words in ES5 anymore: int, byte, char, goto, long, final, float, short, double, native, throws, boolean, abstract, volatile, transient, and synchronized. It’s probably a good idea to avoid these as well, for optimal backwards compatibility.
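Here’s a sketch of how a tool could flag these ES3-only reserved words, while an ES5 (or later) engine happily accepts them. The list comes straight from the paragraph above; isES3OnlyReservedWord is a hypothetical helper name.

```javascript
// Words that ES3 reserved but ES5 no longer does.
var es3OnlyReservedWords = [
  'int', 'byte', 'char', 'goto', 'long', 'final', 'float', 'short',
  'double', 'native', 'throws', 'boolean', 'abstract', 'volatile',
  'transient', 'synchronized'
];

// Hypothetical helper: flags names that an ES3-only engine would reject.
function isES3OnlyReservedWord(name) {
  return es3OnlyReservedWords.indexOf(name) !== -1;
}

// An ES5+ engine treats these as ordinary variable names.
var sum = new Function('var char = 1, goto = 2, int = 3; return char + goto + int;')();
sum;                           // 6
isES3OnlyReservedWord('char'); // true
```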
Valid identifier names
As mentioned before, the spec differentiates between identifier names and identifiers. Identifiers form a subset of identifier names, since identifiers have the extra restriction that no reserved words are allowed. For example, var is a valid identifier name, but it’s an invalid identifier.
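ES5 lets you use any IdentifierName, reserved words included, as a property name in dot notation and object literals, which makes the difference easy to see. The helper parsesAsDeclaration is hypothetical; it relies on the Function constructor throwing a SyntaxError for a body that doesn’t parse.

```javascript
// `var` is a valid IdentifierName, so ES5 accepts it as a property name…
var object = { var: 42 };
object.var; // 42

// …but it isn't a valid Identifier, so it can't be declared as a variable.
function parsesAsDeclaration(name) {
  try {
    new Function('var ' + name + ';');
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

parsesAsDeclaration('var');    // false
parsesAsDeclaration('object'); // true
```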
So, what is allowed in an identifier name?
An identifier must start with $, _, or any character in the Unicode categories “Uppercase letter (Lu)”, “Lowercase letter (Ll)”, “Titlecase letter (Lt)”, “Modifier letter (Lm)”, “Other letter (Lo)”, or “Letter number (Nl)”.
The rest of the string can contain the same characters, plus any U+200C zero width non-joiner characters, U+200D zero width joiner characters, and characters in the Unicode categories “Non-spacing mark (Mn)”, “Spacing combining mark (Mc)”, “Decimal digit number (Nd)”, or “Connector punctuation (Pc)”.
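These two rules translate fairly directly into a regular expression if you can use Unicode property escapes. Those were only added in ES2018, so this is a sketch for modern engines rather than ES5 ones. The name isValidIdentifierNameES5 is hypothetical; the check uses whatever Unicode version the engine ships rather than the one the spec requires, and it ignores both reserved words and Unicode escape sequences.

```javascript
// Requires ES2018+: Unicode property escapes (\p{…}) only work with the `u` flag.
// First character: $, _, or Lu, Ll, Lt, Lm, Lo, Nl.
// Other characters: the same, plus ZWNJ, ZWJ, and Mn, Mc, Nd, Pc.
var identifierNamePattern = /^[$_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}][$_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}\u200C\u200D\p{Mn}\p{Mc}\p{Nd}\p{Pc}]*$/u;

// Hypothetical helper; see the caveats above.
function isValidIdentifierNameES5(name) {
  // ES5 sees a supplementary character as two surrogate halves, which belong
  // to none of the allowed categories, so any surrogate makes the name invalid.
  if (/[\uD800-\uDFFF]/.test(name)) {
    return false;
  }
  return identifierNamePattern.test(name);
}

isValidIdentifierNameES5('π');            // true
isValidIdentifierNameES5('foo\u200Cbar'); // true
isValidIdentifierNameES5('1a');           // false
isValidIdentifierNameES5('☃');            // false
```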
That’s it, really. There are a few things to note, though…
As you know, JavaScript uses UCS-2 internally, and the spec defines “characters” as follows:
Throughout the rest of this document, the phrase “code unit” and the word “character” will be used to refer to a 16-bit unsigned value used to represent a single 16-bit unit of text.
This effectively means that supplementary Unicode characters (e.g. 丽, i.e. U+2F800 CJK Compatibility Ideograph, which is listed in the “Other letter (Lo)” category) are disallowed in identifier names, as JavaScript interprets them as two individual surrogate halves (e.g. \uD87E\uDC00), which don’t match any of the allowed Unicode categories.
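To see what JavaScript actually works with, you can list a string’s code units. This sketch uses a hypothetical toCodeUnits helper; codePointAt is the ES2015 method that, by contrast, decodes the surrogate pair back into a single code point.

```javascript
// U+2F800 CJK COMPATIBILITY IDEOGRAPH-2F800, written as its surrogate pair.
var ideograph = '\uD87E\uDC00';

// Hypothetical helper: lists a string's 16-bit code units as hex.
function toCodeUnits(string) {
  var units = [];
  for (var index = 0; index < string.length; index++) {
    units.push(string.charCodeAt(index).toString(16).toUpperCase());
  }
  return units;
}

ideograph.length;                      // 2
toCodeUnits(ideograph);                // ['D87E', 'DC00']
ideograph.codePointAt(0).toString(16); // '2f800'
```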
Another gotcha is the following:
Unicode escape sequences are also permitted in an IdentifierName, where they contribute a single character. […] A UnicodeEscapeSequence cannot be used to put a character into an IdentifierName that would otherwise be illegal.
This means that you can use var \u0061 and var a interchangeably. Similarly, since var 1 is invalid, so is var \u0031.
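Both halves of that rule can be checked with the Function constructor. Note the doubled backslashes: they put a literal \u0061 into the source text that gets parsed, instead of letting the string literal decode it first. The helper parses is hypothetical.

```javascript
// `\\u0061` in the string literal becomes the escape `\u0061` in the parsed source.
var readA = new Function('var \\u0061 = 42; return a;');
readA(); // 42: \u0061 and a are the same variable

// Hypothetical helper: returns true if `body` parses as a function body.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

parses('var \\u0031;');  // false: "1" can't start an identifier, escaped or not
parses('var a\\u0031;'); // true: "1" is fine after the first character
```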
For web browsers, there is an exception to this rule when reserved words are involved. Most browsers support identifiers that unescape to a reserved word, as long as at least one character is escaped using a Unicode escape sequence. For example, var var; wouldn’t work, but var v\u0061r; would, even though, strictly speaking, the ECMAScript spec disallows it. Subsequent uses of such an identifier must also have at least one character escaped (otherwise the reserved word will be used instead), but it doesn’t have to be the same character(s) that were escaped when the identifier was created. For example, var v\u0061r = 42; alert(va\u0072); would alert 42. This is very confusing, so I wouldn’t recommend relying on this hack. Luckily, it looks like the ECMAScript 6 spec will explicitly make this behavior non-conforming. Firefox/Spidermonkey, Safari/JavaScriptCore, and IE/Chakra have already dropped this behavior.
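In an engine that follows ES2015 or later, you’d expect the escaped-keyword trick to be rejected at parse time, while an escaped name that merely starts like a keyword stays valid. This is a sketch using a hypothetical parses helper, not a conformance test.

```javascript
// Hypothetical helper: returns true if `body` parses as a function body.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

parses('var v\\u0061r;');      // false: unescapes to the keyword `var`
parses('var v\\u0061riable;'); // true: unescapes to `variable`, which isn't reserved
```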
Two IdentifierNames that are canonically equivalent according to the Unicode standard are not equal unless they are represented by the exact same sequence of code units.
So, ma\u00F1ana and man\u0303ana are two different variable names, even though they’re equivalent after Unicode normalization.
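A small sketch that shows both sides: the two spellings only compare equal after normalization (using String.prototype.normalize, an ES2015 addition), and the parser treats them as two distinct variables.

```javascript
var composed = 'ma\u00F1ana';    // ñ as a single code point, U+00F1
var decomposed = 'man\u0303ana'; // n followed by U+0303 COMBINING TILDE

composed === decomposed;                                   // false
composed.normalize('NFC') === decomposed.normalize('NFC'); // true

// As identifiers, the two spellings declare two separate variables.
var values = new Function(
  'var ma\\u00F1ana = 1; var man\\u0303ana = 2; return [ma\\u00F1ana, man\\u0303ana];'
)();
values; // [1, 2]
```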
Examples
The following are all examples of valid JavaScript variable names.
// How convenient!
var π = Math.PI;
// Sometimes, you just have to use the Bad Parts of JavaScript:
var ಠ_ಠ = eval;
// Code, Y U NO WORK?!
var ლ_ಠ益ಠ_ლ = 42;
// How about a JavaScript library for functional programming?
var λ = function() {};
// Obfuscate boring variable names for great justice
var \u006C\u006F\u006C\u0077\u0061\u0074 = 'heh';
// …or just make up random ones
var Ꙭൽↈⴱ = 'huh';
// Did you know about the [.] syntax?
var ᱹ = 1;
console.assert([1, 2, 3][ᱹ] === 2);
// While perfectly valid, this doesn’t work in most browsers:
var foo\u200Cbar = 42;
// This is *not* a bitwise left shift (`<<`):
var 〱〱 = 2;
// This is, though:
〱〱 << 〱〱; // 8
// Give yourself a discount:
var price_9̶9̶_89 = 'cheap';
// Fun with Roman numerals
var Ⅳ = 4;
var Ⅴ = 5;
Ⅳ + Ⅴ; // 9
// Cthulhu was here
var Hͫ̆̒̐ͣ̊̄ͯ͗͏̵̗̻̰̠̬͝ͅE̴̷̬͎̱̘͇͍̾ͦ͊͒͊̓̓̐_̫̠̱̩̭̤͈̑̎̋ͮͩ̒͑̾͋͘Ç̳͕̯̭̱̲̣̠̜͋̍O̴̦̗̯̹̼ͭ̐ͨ̊̈͘͠M̶̝̠̭̭̤̻͓͑̓̊ͣͤ̎͟͠E̢̞̮̹͍̞̳̣ͣͪ͐̈T̡̯̳̭̜̠͕͌̈́̽̿ͤ̿̅̑Ḧ̱̱̺̰̳̹̘̰́̏ͪ̂̽͂̀͠ = 'Zalgo';
Some of these don’t work in all browsers/environments — at least, not yet. See WebKit/JavaScriptCore bug #79353 and #78908 (now fixed), Chrome/V8 bug #1965 (now fixed) and #1958 (now fixed), Internet Explorer/Chakra bug #725622, Opera/Carakan bug DSK-358119 and DSK-357714/CORE-44659 (now fixed), and Firefox/Spidermonkey bug #744784.
I fixed some of these bugs myself by writing patches for V8, WebKit/JavaScriptCore, Esprima, and JSHint.
JavaScript variable name validator
Even if you learned these rules by heart, it would be virtually impossible to memorize every allowed character in the different Unicode categories. If you were to summarize all these rules in a single ASCII-only regular expression for JavaScript, it would be 11,236 characters long.
For that reason, I created mothereff.in/js-variables, a tool that makes it easy for you to check if a given string is a valid variable name in JavaScript.
If a valid variable name is entered, the tool checks if the browser you’re using handles the identifier correctly. If not, it will show a warning, encouraging you to file a browser bug.
The validator will also warn you if you enter an ECMAScript 3 reserved word that isn’t a reserved word anymore. Try char, for example.
This tool uses the Unicode 7.0.0 character database. Of course, not all JavaScript engines have the same level of Unicode support yet. As the spec says:
ECMAScript implementations may recognize identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0.
For this reason, the validator will warn you if a variable name is invalid when the Unicode 3.0 database is used. Try \u0CF1, for example.
Comments
Philip Tellis wrote:
The stats library I built for Node has functions named σ and μ with obvious semantics ;)
Enko wrote:
ಠ_ಠ is a valid Ruby method name:
It’s valid in Python 3 too, and seems to work in PHP as well:
akavi wrote:
For anyone curious, ‘ಠ’ is pronounced as ‘Ta’ (Kannada language). :)
More descriptively, it’s an aspirated unvoiced sub-apical palatal plosive. If you wish to make the sound, position your tongue like you’re trying to stick it down your throat. The bottom of the tongue should be touching the top of your mouth. Then, make a ‘tuh’ sound. If you do it right, it should sound hollow and ‘harder’ than a normal English ‘t’ sound.
Alex wrote:
I’ve never even thought about var π = Math.PI before. That’s awesome!
Kragen Javier Sitaker wrote:
The most legitimate use for this is for top-level namespaces, which need to be short or they’ll junk up your code like crazy. jQuery already took $, and Underscore took _. Maybe 木, ϗ, _⃗, 个, î, 人, Ǝ (not ∃, that’s illegal!), ℵ, 二, ℜ, 龍, ℕ, 八, Δ, 大, ʃ, ː, 卐, or as mentioned above, λ? ˀ is probably too obnoxious though.
For no particularly good reason, ☺ and ☠ are illegal. I think the Plan9 strategy of considering non-ASCII characters as identifier characters by default is probably a better one than changing the language grammar every time the Unicode standard revs.
James wrote:
Thanks for posting this article. That last Cthulhu bit was crazy!
maht wrote:
Kragen: Plan9 uses UTF-8, not Unicode.
Kragen Javier Sitaker wrote:
maht: That’s like saying that the US uses dollars, not money.
oroce wrote:
Awesome. By the way, this works in Perl too:
Nick wrote:
Does this include Georgian characters?
Or is it my browser?
Mathias wrote:
Nick: Yes, it does. ‘ლ’ is the Georgian letter las (U+10DA).
Dev wrote:
Is there a minifier that takes advantage of this? If not there should be!
Mathias wrote:
Dev: Note that while e.g. ლ is one symbol, it takes up 3 bytes in UTF-8 encoding. So minifiers are probably better off with ASCII characters only.
Anthony Mills wrote:
Dev: Also, if you use funky characters and someone screws up the character encoding settings, you’re hooped. Best to just use the normal ranges for characters.
Leo Balter wrote:
Finally my snowman variables!
Mathias wrote:
Leo: Unfortunately ☃ is not a valid identifier. See https://mothereff.in/js-variables#%E2%98%83.
Scott Murphy wrote:
ᐩ, ᐨ, ᐟ, ᑉ, ᐦ, ᐸ, ᐳ, ㅣ, ㅡ are all valid JS variable names but look like operators. Time to write confusing code?
kow wrote:
Elegant, valid and available on most keyboards :)
weiß wrote:
kow: ß is certainly not beta, which would be β. And yes, it’s available on all Greek keyboards.
Thomas wrote:
How could you forget whitespace?
https://mothereff.in/js-variables#%EF%BE%A0%E1%85%A0%E1%85%9F ;)
Julien wrote:
Sadly var √ = Math.sqrt doesn’t work :(
Robert Gust-Bardon wrote:
Mr. Bynens, your work has proved useful when it comes to the effort to fix issues #222 and #324 in UglifyJS.
Mathias wrote:
Robert: Glad to hear! :)
Barney wrote:
Scott: Well, that’s one particular flavour of evil. I’m making a lot of use of top-level objects Α & Β >:D
Matheus wrote:
The following code is also valid:
Craig wrote:
akavi: Thanks for taking the time to paraphrase Wikipedia.
kow: Except for all the ones that aren’t German.
Matty wrote:
I’m happy to see the letters of the Georgian alphabet. :-)
Mariusz Nowak wrote:
It’d be good to add a few words about valid property names. In ES5, there are no real restrictions on them, and we can use keywords for them:
Mathias wrote:
Mariusz: I’ve written an entire post on that: Unquoted property names / object keys in JavaScript.
Evi1M4chine wrote:
My favorite valid ones:
I can add those arrows to any character. And it is a key on my keyboard (NEO 2.0 layout). I create the second and third arrow simply by pressing the arrow key again and again. I can even do it more than 3 times, but it won’t be visible. (I have to use that to troll people reading my code. ;)
I wish @, ↑, etc. were valid though. They make much more sense than $ :/
William wrote:
Julien: You can use “Ѵ” instead:
Here is some info on this character: https://en.wikipedia.org/wiki/Izhitsa
William wrote:
Julien: You can use this though:
Kevin wrote:
Here are a few more valid constants: