One of the more subtle yet awesome changes that HTML5 brings applies to the id attribute. I already tweeted about this a few months ago, but I think this is interesting enough to write about in more than 140 characters.
How id differs between HTML 4.01 and HTML5
The HTML 4.01 spec states that ID tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (-), underscores (_), colons (:), and periods (.). For the class attribute, there is no such limitation. Classnames can contain any character except whitespace (which separates multiple classnames), and they don’t have to start with a letter to be valid.
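As a quick illustration, the HTML 4.01 rule for ID tokens boils down to a single regular expression. The JavaScript below is a minimal sketch: isValidHtml4Id is a hypothetical helper name, and it checks the syntax only, not uniqueness within the document.

```javascript
// Hypothetical helper: does `value` match the HTML 4.01 ID token rule?
// One letter [A-Za-z] first, then any number of letters, digits,
// hyphens, underscores, colons, or periods.
const HTML4_ID_PATTERN = /^[A-Za-z][A-Za-z0-9\-_:.]*$/;

function isValidHtml4Id(value) {
  return HTML4_ID_PATTERN.test(value);
}

console.log(isValidHtml4Id("main-content")); // true
console.log(isValidHtml4Id("items:0.name")); // true
console.log(isValidHtml4Id("24ways"));       // false: starts with a digit
console.log(isValidHtml4Id("#"));            // false: not a letter
```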
HTML5 gets rid of the additional restrictions on the id attribute. The only requirements left (apart from being unique in the document) are that the value must contain at least one character (it can’t be empty), and that it can’t contain any space characters.
This means the rules that apply to values of class and id attributes are now very similar in HTML5.
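The HTML5 rule is even simpler to check. Here is a minimal JavaScript sketch, using a hypothetical helper name (isValidHtml5Id) and treating “space characters” as HTML5 defines them: space, tab, line feed, form feed, and carriage return. Uniqueness within the document is again not checked.

```javascript
// Hypothetical helper: does `value` satisfy the HTML5 id syntax rules?
// HTML5 "space characters": U+0020 SPACE, U+0009 TAB, U+000A LF,
// U+000C FF and U+000D CR.
const HTML5_SPACE_CHARACTER = /[ \t\n\f\r]/;

function isValidHtml5Id(value) {
  // At least one character, and no space characters anywhere.
  return value.length > 0 && !HTML5_SPACE_CHARACTER.test(value);
}

console.log(isValidHtml5Id("#"));            // true
console.log(isValidHtml5Id("♥"));            // true
console.log(isValidHtml5Id("[attr=value]")); // true
console.log(isValidHtml5Id(""));             // false: empty
console.log(isValidHtml5Id("foo bar"));      // false: contains a space
```

Every ID that passes the HTML 4.01 rule also passes this check, since such IDs are never empty and never contain spaces; the reverse is obviously not true.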
Err, what?
Although that probably sounds boring, this actually is pretty cool. In HTML 4.01, the following code is perfectly valid:
<p class="#">Foo.
<p class="##">Bar.
<p class="♥">Baz.
<p class="©">Inga.
<p class="{}">Lorem.
<p class="“‘’”">Ipsum.
<p class="⌘⌥">Dolor.
<p class="{}">Sit.
<p class="[attr=value]">Amet.
Heck, you could even use a brainfuck program as a classname:
<p class="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!
I’ve put up a demo page with some other examples, but I’m sure you can think of more. After all, the possibilities are endless :)
So what’s new?
In HTML5, you can take all of these groovy classnames and use them as values for id attributes. Yes, HTML5 is that awesome.
<p id="#">Foo.
<p id="##">Bar.
<p id="♥">Baz.
<p id="©">Inga.
<p id="{}">Lorem.
<p id="“‘’”">Ipsum.
<p id="⌘⌥">Dolor.
<p id="{}">Sit.
<p id="[attr=value]">Amet.
<p id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!
…you get the idea. I remade the same demo page as before to use ids instead of classes.
How to escape any character in CSS
Writing CSS for this markup is tricky. For example, you can’t just use ## { color: #f00; } to target the element with id="#". Instead, you’ll have to escape the weird characters (in this case, the second #), as in #\# { color: #f00; }. Escaping cancels the special meaning of CSS characters and allows you to refer to characters you cannot easily type out, like crazy Unicode symbols. It gets even trickier if you need to use these escaped CSS selectors in JavaScript as well, since the backslashes themselves then need escaping inside string literals.
That’s why I’ve written a separate blog post explaining how to escape any character in CSS, and how to use escaped CSS selectors in JavaScript.
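As a rough sketch of the idea, the JavaScript below follows the “serialize an identifier” algorithm from the CSSOM spec, which is what the built-in CSS.escape() method implements in modern browsers. It is not the code from that post; cssEscape and idSelector are hypothetical names, and the sketch assumes the escaped value is used in an ID selector.

```javascript
// Sketch of the CSSOM "serialize an identifier" algorithm (what CSS.escape() does).
function cssEscape(value) {
  const chars = Array.from(String(value)); // iterate by code point, not code unit
  let result = "";
  chars.forEach((ch, index) => {
    const code = ch.codePointAt(0);
    if (code === 0x0000) {
      // NULL is replaced with U+FFFD REPLACEMENT CHARACTER.
      result += "\uFFFD";
    } else if (
      (code >= 0x0001 && code <= 0x001f) || code === 0x007f ||
      (index === 0 && code >= 0x0030 && code <= 0x0039) ||
      (index === 1 && code >= 0x0030 && code <= 0x0039 && chars[0] === "-")
    ) {
      // Control characters, a leading digit, or a digit right after a
      // leading hyphen: escape as a hex code point plus a space ("1" -> "\31 ").
      result += "\\" + code.toString(16) + " ";
    } else if (index === 0 && ch === "-" && chars.length === 1) {
      // A lone hyphen is escaped as a character.
      result += "\\-";
    } else if (
      code >= 0x0080 || ch === "-" || ch === "_" || /[0-9A-Za-z]/.test(ch)
    ) {
      // Non-ASCII characters and plain identifier characters stay as-is.
      result += ch;
    } else {
      // Any other ASCII character (#, ., [, {, =, ...) gets a backslash.
      result += "\\" + ch;
    }
  });
  return result;
}

// Hypothetical helper: build an ID selector for any valid HTML5 id value.
function idSelector(id) {
  return "#" + cssEscape(id);
}

console.log(idSelector("#"));            // #\#
console.log(idSelector("##"));           // #\#\#
console.log(idSelector("♥"));            // #♥
console.log(idSelector("[attr=value]")); // #\[attr\=value\]
console.log(idSelector("24ways"));       // #\32 4ways
```

In a browser, the result can be passed straight to document.querySelector(), or you can skip the helper and call CSS.escape() directly. Written as a JavaScript string literal, the selector for id="#" becomes "#\\#". document.getElementById("#") needs no escaping at all, since it takes the raw id value rather than a selector.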
Comments
Glenn Glerum wrote on :
Mathias, is this backwards compatible? Or do older browsers just ignore IDs like that? And what does it do for semantics? It’s fun when you can use a class like "i-just-♥-this-sub-navigation„ø¤º°¨¨°º¤ø ¸„ø¤º°¨¨°º¤ø" but would you put it into practice?
seutje wrote on :
The night after I noticed this in the HTML5 spec was my worst night ever. Never before have I had such violent nightmares about IRC support and people doing fucked up shit, wondering why it doesn’t work entirely as expected… :(
Mathias wrote on :
Glenn: All tests on both demo pages pass in every A-grade browser, including IE6. So yeah, I’d say it’s backwards compatible.
Mathias wrote on :
I’m getting a lot of “Who would ever use this?” and “Any real use case here?” responses, so it seems a little more explanation is needed. While some of my examples will most likely never be used in production, the fact that HTML5 now allows IDs to contain just about any character (as was already the case for the class attribute in HTML 4.01) is definitely an improvement.
As some people on Hacker News have pointed out, this is pretty damn useful:
An <input> element with name="items[0][name]" can now finally have an id matching the name attribute. This would be invalid HTML 4.01, but valid HTML5:
<input type="text" id="items[0][name]" name="items[0][name]">
Non-English speakers can now use IDs in their own language, instead of having to resort to meaningless IDs (e.g. "id1"), transliterating words, or replacing letters with ‘similar’ ones (‘O’ or ‘OE’ for ‘Ø’).
Kroc Camen wrote on :
seutje: This is just standardizing what all browsers already support. Developers have been able to do this all along anyway. Yes, you can shoot yourself in the foot with it, but being able to use accented characters in class and id names is a definite plus and much welcomed.
seutje wrote on :
Kroc Camen: I know, I’ve already run into this nightmare: someone was using underscores in his class names and wasn’t escaping them in the CSS, which caused IE6 to ignore them completely, while all other browsers gladly accept them unescaped:
http://jsbin.com/esofe3 IE6 will show all green, all other browsers will show all red.
Edit: Actually, only the unescaped _foo is a problem, not foo_ or foo_bar. Here’s a better test case: http://jsbin.com/unogus Nice catch, Nicolas Gallagher!
Albert wrote on :
Marvelous! Random usage off the top of my head: links, #456bereast, #321contact, #24ways, etc. Nice, nice, nice!
Weston Ruter wrote on :
Does this new relaxing of ID restrictions apply to HTML5 in the XML serialization, e.g. XHTML5?
David Bishop wrote on :
I don’t see how this makes HTML5 more classy. This just seems… unnecessary at best. Sometimes restrictions such as what exists in the HTML 4.01 spec are necessary to keep developers from doing crazy stuff.
I’m just not sure why most developers would need this; I’m not sure the minor gains are worth the possible headaches that can now be made by poor programmers.
Mathias wrote on :
Weston: Yes, this works in XHTML5 as well.
David: The “classy” part is a pun, since the id attribute restrictions in HTML5 are very similar to those of the class attribute (in HTML4+). HTML5 gives developers more freedom to choose which characters they want to use for IDs. I’m not sure why you think this is a bad thing. To me, it’s definitely an improvement.
Vic Shoup wrote on :
David Bishop: Agreed. Sounds really cool with all the flexibility until you start accounting for all the other things it can impact… Then you have to go in and do those things differently so they behave normally under HTML 5.
Weston Ruter wrote on :
Fascinating. If this works in XHTML5 as well, isn’t this a direct violation of the XML spec? I guess not if DTDs aren’t used anyway, so the id attributes aren’t of the ID type and don’t have to be XML Names.
Mathias wrote on :
Weston: To be honest, I wouldn’t know if this is a violation of the XML spec or not. I’m not much of an XML guy.
I just recreated the entire test case in XHTML5 (I had only tested a few IDs in XML before), and it turns out that in XML mode, there are three invalid ID values on my demo pages:
id="<p>"
id="<><<<>><>"
id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>."
I had to remove these or the page wouldn’t be rendered.
So, in XHTML5, IDs cannot contain an unescaped less-than sign (<). Other than that, everything seems to work fine. Use of the greater-than sign (>) presents no problem whatsoever, and I can see why.
Note that it would be possible to use these three IDs in XHTML by wrapping the contents of the <style> element in a CDATA block so the XML parser ignores it. Then, you could escape the id attribute values in the XHTML to prevent the “Unescaped < not allowed in attribute values” error, but that kind of defeats the purpose IMHO.
Is this what you expected? What does the XML spec say about this?
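To illustrate the attribute-escaping route mentioned in this comment, here is a minimal JavaScript sketch that escapes a value for use inside a double-quoted XML attribute. The helper name escapeXmlAttribute is hypothetical; escaping > is not strictly required in attribute values, but it is harmless.

```javascript
// Hypothetical helper: escape a string for a double-quoted XML attribute value.
// "&" is replaced first so the "&" in the entities added by the later
// replacements is not escaped a second time.
function escapeXmlAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const id = "<p>";
console.log(`<p id="${escapeXmlAttribute(id)}">Foo.</p>`);
// <p id="&lt;p&gt;">Foo.</p>
```

An XML parser turns &lt;p&gt; back into <p>, so the element’s id is still the original three-character value.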
Weston Ruter wrote on :
The XML spec says that you are a very bad man: https://www.w3.org/TR/REC-xml/#id
Christo wrote on :
David Bishop: I agree with David. It’s nice to know the restrictions have been lifted, but let’s stick to self-documenting ids and classes. Future development would be a nightmare if people used such ridiculous naming techniques...
Alex wrote on :
This is wicked! Will solve a lot of my problems. But still a lot of libraries don’t handle this correctly yet e.g. jQuery. And it will destroy all backward compatibility.
Cheers, Alex
Ant Gray wrote on :
Still I see no reason to use that. What is the point of making class and id names harder to read and type?
Timothy (TRiG) wrote on :
As pointed out, for users of some languages, this will make id names easier to read and type. Why so anglocentric?
Tom B wrote on :
Does this apply to all other attributes like rel, rev, data etc.?
Mathias wrote on :
Tom B: This has nothing to do with any other attributes. There’s a list of valid link relations, aka values for the rel attribute. For custom data-* attributes you can use whichever value you like, really.
Leif Halvard Silli wrote on :
Weston Ruter: XML discerns between well-formed (first level) and valid (second level). The rules described in https://www.w3.org/TR/REC-xml/#id are about validity, and not about well-formedness.
Leif Halvard Silli wrote on :
Leif Halvard Silli: No, I was wrong… it is, Weston Ruter, what you said about @ID type, that (really) matters. ;-)
Ryan Florence wrote on :
People crack me up worrying about me (a developer) doing “crazy things” like giving elements more appropriate IDs.
Good write-up.
Robert Siemer wrote on :
Useful? Yes! I have to turn some user-supplied strings into CSS class names, which is easier now:
zcorpan wrote on :
Mathias:
Just escape the < as &lt; (in both the id="" attribute and in the <style> element) in XHTML.
Weston Ruter:
Ignore XML’s concept of ‘valid’. It’s tied to DTDs. DTDs are obsolete. If you use doctypeless XHTML, the id="" attribute is of type “CDATA” in XML, so that rule doesn’t even apply.
David wrote on :
Just as always in web development, you cannot tell if lesser restrictions are a good or a bad thing. What really matters is the context. For a project (either private or for a company/customer) that will always and ever be maintained by you, everything is fine, as long as you do not have to support old browsers (like IE 6). Just use that cool #id or !class (the latter could be interesting for removing rules with modifier classes, and seems very readable).
If you are not the only one who works with the page (HTML, CSS, JS, etc.), in my opinion the good ol’ rules of readable and maintainable code should be applied strictly to every part of the program, simply because it is not a common coding convention to use restricted/non-native characters in IDs and classnames, and it is very likely that other developers will be distracted at first sight, even if you comment it.
For larger teams, some kind of ‘workaround’ could be coding guidelines for the project. But that seems to be a lot of work if it would just allow you to use those $****-style classes and IDs.
avenida gez wrote on :
The point is not whether lesser restrictions are good or bad. The restrictions may be imposed by the nature of the beast itself, so lesser restrictions mean they should be used where they apply. If that were not the case, there would be no case for Unicode either: let’s all use only a-z from now on, forget about Japan, China, and the whole world, and let them make their own software. For example, why does this site require an email address? Can you justify that? It may be fake. Why does every site require an email to post a comment? If I wanted a callback, it wouldn’t need to be required; I would write it myself. So the fact that many, many places require an email, or worse, a sign-in, is a programmer’s sickness: they want to be in control of everything. The same goes for the id attribute restrictions: some may be justified, some not. This case is a programmer giving freedom and responsibility to a programmer; removing the email requirement would be a programmer giving freedom to a user.
Jamie Pate wrote on :
https://www.w3.org/TR/2003/WD-css3-syntax-20030813/#characters ← the CSS3 spec still prohibits class names starting with numbers and all sorts of other things. (Yes, it seems like browsers ignore the spec here, but YOU HAVE BEEN WARNED.)
Mathias wrote on :
Jamie: Did you even read the above article? The CSS spec doesn’t define what is and what isn’t allowed in HTML. HTML does.
sqykly wrote on :
Mathias: I think he’s referring (correctly) to the fact that, in order to apply CSS to an element by its ID, the ID must be valid as part of a CSS selector. Since smiley faces and pirate flags and Klingon letters are not valid as part of a CSS ID selector, using one in your HTML document commits that element to being inaccessible from CSS (via sugary #, at least) even if it is valid HTML. Until CSS (and let’s face it, jQuery, too) is on board, it is bad factoring on a practical level to give my element an ID that looks like a level from Kroz. Side note: just for fun, I am about to go make an HTML document with an element with an ID that is a level from Kroz, and it’s going to be awesome, but I would have to fire me if I saw myself doing that on a serious project.
Mathias wrote on :
sqykly:
What are you talking about? Try this:
Any value for the HTML id attribute can be represented in CSS (or in selectors in JavaScript/jQuery). Check out the tool I link to in this post.
tomByrer wrote on :
Doesn't seem that period is allowed for IDs, at least where CSS is concerned: http://jsbin.com/mimey/1/edit :(
Mathias wrote on :
tomByrer: You just have to escape it. Read the last paragraph in this post.
José A M Pacheco wrote on :
I made a page with only the following:
…and before the </body>:
Tested in Chrome 36, IE8, IE11, and the latest version of Firefox, and it worked in all of them. Does this make document.getElementById obsolete?
Mathias wrote on :
José: See “Named access on the Window object” in the HTML Standard.