The `id` attribute got more `class`y in HTML5

Published 11th July 2010 · tagged with CSS, HTML, Unicode

One of the more subtle yet awesome changes that HTML5 brings applies to the id attribute. I already tweeted about this a few months ago, but I think it is interesting enough to write about in more than 140 characters.

How `id` differs between HTML 4.01 and HTML5

The HTML 4.01 spec states that ID tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (-), underscores (_), colons (:), and periods (.). For the class attribute, there is no such limitation. Classnames can contain any character, and they don’t have to start with a letter to be valid.

HTML5 gets rid of the additional restrictions on the id attribute. The only requirements left — apart from being unique in the document — are that the value must contain at least one character (can’t be empty), and that it can’t contain any space characters.

This means the rules that apply to values of class and id attributes are now very similar in HTML5.
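To make the difference concrete, here is a minimal sketch of both rule sets as JavaScript checks. The function names are made up for illustration, and the HTML5 check assumes that "space characters" means space, tab, line feed, form feed, and carriage return. Neither function checks uniqueness within a document.

```javascript
// HTML 4.01: a letter, followed by any number of letters, digits,
// hyphens, underscores, colons, and periods.
const HTML4_ID = /^[A-Za-z][A-Za-z0-9\-_:.]*$/;

// HTML5 "space characters" (assumed set): space, tab, LF, FF, CR.
const HTML5_SPACE = /[ \t\n\f\r]/;

// Hypothetical helper: does the value satisfy the HTML 4.01 ID token rule?
function isValidHtml4Id(value) {
  return HTML4_ID.test(value);
}

// Hypothetical helper: non-empty and free of space characters.
function isValidHtml5Id(value) {
  return value.length > 0 && !HTML5_SPACE.test(value);
}

const samples = ['nav', '24ways', '♥', 'items[0][name]', 'two words', ''];
for (const id of samples) {
  console.log(JSON.stringify(id), isValidHtml4Id(id), isValidHtml5Id(id));
}
```

Only `nav` passes both checks; `24ways`, `♥`, and `items[0][name]` pass the HTML5 check alone, and the last two samples fail both.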

Err, what?

Although that probably sounds boring, this actually is pretty cool. In HTML 4.01, the following code is perfectly valid:

<p class="#">Foo.
<p class="##">Bar.
<p class="♥">Baz.
<p class="©">Inga.
<p class="{}">Lorem.
<p class="“‘’”">Ipsum.
<p class="⌘⌥">Dolor.
<p class="{}">Sit.
<p class="[attr=value]">Amet.

Heck, you could even use a brainfuck program as a classname:

<p class="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!

I’ve put up a demo page with some other examples, but I’m sure you can think of more. After all, the possibilities are endless :)

So what’s new?

In HTML5, you can take all of these groovy classnames and use them as values for id attributes. Yes, HTML5 is that awesome.

<p id="#">Foo.
<p id="##">Bar.
<p id="♥">Baz.
<p id="©">Inga.
<p id="{}">Lorem.
<p id="“‘’”">Ipsum.
<p id="⌘⌥">Dolor.
<p id="[attr=value]">Amet.
<p id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!

…you get the idea. I remade the same demo page as before to use ids instead of classes.

How to escape any character in CSS

Writing CSS for this markup is tricky. For example, you can’t just use ## { color: #f00; } to target the element with id="#". Instead, you’ll have to escape the weird characters (in this case, the second #), as in #\# { color: #f00; }. Escaping cancels the meaning of special CSS characters and lets you refer to characters you cannot easily type out, like crazy Unicode symbols. It gets even trickier if you need to use these escaped CSS selectors in JavaScript as well.

That’s why I’ve written a separate blog post explaining how to escape any character in CSS, and how to use escaped CSS selectors in JavaScript.
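As a rough illustration of what such escaping involves, the sketch below follows the "serialize an identifier" steps from the CSSOM specification. The function name `escapeCssIdentifier` is hypothetical, and this is not necessarily the approach taken in the linked post. Current browsers ship a built-in `CSS.escape()` for the same job, which is the better choice where it is available.

```javascript
// Hypothetical helper: escape a string so it can be used as a CSS identifier,
// e.g. after "#" in an ID selector or "." in a class selector.
function escapeCssIdentifier(value) {
  const chars = Array.from(String(value)); // iterate by code point
  return chars.map((ch, index) => {
    const cp = ch.codePointAt(0);
    if (cp === 0x0000) return '\uFFFD'; // NUL becomes the replacement character
    const isDigit = cp >= 0x30 && cp <= 0x39;
    if (
      (cp >= 0x0001 && cp <= 0x001f) || cp === 0x007f || // control characters
      (index === 0 && isDigit) ||                        // leading digit
      (index === 1 && isDigit && chars[0] === '-')       // digit after a leading hyphen
    ) {
      return '\\' + cp.toString(16) + ' '; // hex escape, terminated by a space
    }
    if (index === 0 && ch === '-' && chars.length === 1) return '\\-';
    if (cp >= 0x0080 || ch === '-' || ch === '_' || /[0-9A-Za-z]/.test(ch)) {
      return ch; // safe as-is
    }
    return '\\' + ch; // any other ASCII character gets a backslash
  }).join('');
}

console.log('#' + escapeCssIdentifier('#'));              // #\#
console.log('#' + escapeCssIdentifier('24ways'));         // #\32 4ways
console.log('#' + escapeCssIdentifier('items[0][name]')); // #items\[0\]\[name\]
```

The resulting string can be dropped into a stylesheet or passed to `document.querySelector()`. Note that `document.getElementById()` takes the raw, unescaped value.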

Comments

Glenn Glerum wrote on 12th July 2010 at 13:58:

Mathias, is this backwards compatible? Or do older browsers just ignore IDs like that? And what does it do for semantics? It’s fun when you can use a class like "i-just-♥-this-sub-navigation„ø¤º°¨¨°º¤ø ¸„ø¤º°¨¨°º¤ø" but would you put it to practice?

seutje wrote on 12th July 2010 at 14:13:

The night after I noticed this in the HTML5 spec was my worst night ever. Never before have I had such violent nightmares about IRC support and people doing fucked up shit, wondering why it doesn’t work entirely as expected… :(

Mathias wrote on 12th July 2010 at 14:16:

Glenn: All tests on both demo pages pass in every A-grade browser, including IE6. So yeah, I’d say it’s backwards compatible.

Mathias wrote on 12th July 2010 at 14:51:

I’m getting a lot of “Who would ever use this?” and “Any real use case here?” responses, so it seems a little more explanation is needed. While some of my examples will most likely never be used in production, the fact that HTML5 now allows IDs to contain just about any character (as was already the case for the class attribute in HTML 4.01) is definitely an improvement.

As some people on Hacker News have pointed out, this is pretty damn useful:

An <input> element with name="items[0][name]" can now finally have an id matching the name attribute. This would be invalid HTML 4.01, but valid HTML5: <input type="text" id="items[0][name]" name="items[0][name]">
It might be useful for programmers who work in languages other than English, so they don’t have to come up with English names, settle for less descriptive names ("id1"), transliterate words, or replace letters with ‘similar’ ones (‘O’ or ‘OE’ for ‘Ø’).

Kroc Camen wrote on 12th July 2010 at 14:53:

seutje: This is just standardizing what all browsers already support. Developers have been able to do this all along anyway. Yes, you can shoot yourself in the foot with it, but being able to use accented characters in class and id names is a definite plus and much welcomed.

seutje wrote on 12th July 2010 at 15:45:

Kroc Camen: I know, I’ve already run into this nightmare as someone was using underscores in his class names and wasn’t escaping these in the CSS, which caused IE6 to completely ignore it, while all other browsers gladly accept it unescaped: ~~http://jsbin.com/esofe3 IE6 will show all green, all other browsers will show all red.~~

Edit: Actually, only the unescaped _foo is a problem, not foo_ or foo_bar. Here’s a better test case: http://jsbin.com/unogus Nice catch, Nicolas Gallagher!

Albert wrote on 12th July 2010 at 16:40:

Marvelous! Random usage off the top of my head: links, #456bereast, #321contact, #24ways, etc. Nice, nice, nice!

Weston Ruter wrote on 12th July 2010 at 17:51:

Does this new relaxing of ID restrictions apply to HTML5 in the XML serialization, e.g. XHTML5?

David Bishop wrote on 12th July 2010 at 18:05:

I don’t see how this makes HTML5 more classy. This just seems… unnecessary at best. Sometimes restrictions such as those in the HTML 4.01 spec are necessary to keep developers from doing crazy stuff.

I’m just not sure why most developers would need this; I’m not sure the minor gains are worth the possible headaches that can now be made by poor programmers.

Mathias wrote on 12th July 2010 at 18:14:

Weston: Yes, this works in XHTML5 as well.

David: The “classy” part is a pun, since the id attribute restrictions in HTML5 are very similar to those of the class attribute (in HTML4+). HTML5 gives developers more freedom to choose which characters they want to use for IDs. I’m not sure why you think this is a bad thing. To me, it’s definitely an improvement.

Vic Shoup wrote on 12th July 2010 at 18:52:

David Bishop: Agreed. Sounds really cool with all the flexibility until you start accounting for all the other things it can impact… Then you have to go in and do those things differently so they behave normally under HTML 5.

Weston Ruter wrote on 12th July 2010 at 19:38:

Fascinating. If this works in XHTML5 as well, isn’t this a direct violation of the XML spec? I guess not if DTDs aren’t used anyway and so the id attributes aren’t of the ID type—so they don’t have to be XML Names.

Mathias wrote on 12th July 2010 at 20:20:

Weston: To be honest, I wouldn’t know if this is a violation of the XML spec or not. I’m not much of an XML guy.

I just recreated the entire testcase in XHTML5 (I had only tested a few IDs in XML before) and it turns out that in XML mode, there are three invalid ID values on my demo pages:

id="<p>"
id="<><<<>><>"
id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>."

I had to remove these or the page wouldn’t be rendered.

So, in XHTML5, IDs cannot contain an unescaped less-than sign (<). Other than that, everything seems to work fine. Use of the greater-than sign (>) presents no problem whatsoever, and I can see why.

Note that it would be possible to use these three IDs in XHTML by wrapping the contents of the <style> element in a CDATA block so the XML parser ignores it. Then, you could escape the id attribute values in the XHTML to prevent the “Unescaped < not allowed in attribute values” error, but that kind of defeats the purpose IMHO.
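The attribute-value escaping mentioned here could look roughly like the sketch below. It is an illustration only, not part of the demo pages; the function name `escapeXmlAttribute` is made up, and it assumes the attribute is written with double quotes.

```javascript
// Hypothetical helper: make a string safe inside a double-quoted XML attribute.
// The ampersand is replaced first so the entities added later are not re-escaped.
function escapeXmlAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;');
}

console.log(`<p id="${escapeXmlAttribute('<p>')}">`);       // <p id="&lt;p>">
console.log(`<p id="${escapeXmlAttribute('<><<<>><>')}">`);
```

The greater-than sign is left alone, which matches the observation that it causes no problem in attribute values.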

Is this what you expected? What does the XML spec say about this?

Weston Ruter wrote on 13th July 2010 at 06:34:

The XML spec says that you are a very bad man: https://www.w3.org/TR/REC-xml/#id

Christo wrote on 15th July 2010 at 23:42:

David Bishop: I agree with David, nice to know the restrictions have been lifted, but let’s stick to self-documenting ids and classes. Future development would be a nightmare if people used such ridiculous naming techniques...

Alex wrote on 20th July 2010 at 12:30:

This is wicked! Will solve a lot of my problems. But still a lot of libraries don’t handle this correctly yet e.g. jQuery. And it will destroy all backward compatibility.

Cheers, Alex

Ant Gray wrote on 22nd July 2010 at 08:00:

Still I see no reason to use that. What is the point of making class and id names harder to read and type?

Timothy (TRiG) wrote on 27th September 2010 at 19:32:

“What is the point of making class and id names harder to read and type?”

As pointed out, for users of some languages, this will make id names easier to read and type. Why so anglocentric?

Tom B wrote on 1st December 2010 at 13:03:

Does this apply to all other attributes like rel, rev, data etc.?

Mathias wrote on 1st December 2010 at 13:10:

Tom B: This has nothing to do with any other attributes. There’s a list of valid link relations, aka values for the rel attribute. For custom data-* attributes you can use whichever value you like, really.

Leif Halvard Silli wrote on 28th January 2011 at 08:52:

Weston Ruter: XML discerns between well-formed (first level) and valid (second level). The rules described in https://www.w3.org/TR/REC-xml/#id are about validity, and not about well-formedness.

Leif Halvard Silli wrote on 28th January 2011 at 19:03:

Leif Halvard Silli: No, I was wrong… it is, Weston Ruter, what you said about @ID type, that (really) matters. ;-)

Ryan Florence wrote on 24th March 2011 at 04:01:

People crack me up worrying about me (a developer) doing “crazy things” like giving elements more appropriate IDs.

Good write-up.

Robert Siemer wrote on 18th July 2011 at 11:42:

Useful? Yes! I have to turn some user-supplied strings into CSS class names, which is easier now:

handle/filter NUL (NUL, even escaped, provokes undefined behavior in CSS)
escape for HTML (How to handle non-space-whitespace?)
escape for CSS
done
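A minimal sketch of the first two steps in that list might look like this. The helper names are hypothetical, and replacing whitespace with a hyphen is just one assumed answer to the whitespace question raised above. The CSS escaping step is left out here.

```javascript
// Hypothetical helper: turn a user-supplied string into a single class token.
function toClassToken(input) {
  return String(input)
    .replace(/\u0000/g, '')                   // step 1: drop NUL characters
    .replace(/^[ \t\n\f\r]+|[ \t\n\f\r]+$/g, '') // trim leading/trailing space characters
    .replace(/[ \t\n\f\r]+/g, '-');           // assumed policy: inner whitespace becomes "-"
}

// Hypothetical helper: step 2, escape for a double-quoted HTML attribute.
function escapeHtmlAttribute(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;');
}

const token = toClassToken('Tom & "Jerry"\u0000 fans');
console.log(`<p class="${escapeHtmlAttribute(token)}">`);
```

Whitespace has to be dealt with before the HTML step because the class attribute is a space-separated list, so a value containing a space would turn into several class names.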

zcorpan wrote on 12th January 2012 at 12:28:

Mathias:

id="<p>"

Just escape the < as &lt; (in both the id="" attribute and in the <style> element) in XHTML.

Weston Ruter:

“The XML spec says that you are a very bad man: https://www.w3.org/TR/REC-xml/#id”

Ignore XML’s concept of ‘valid’. It’s tied to DTDs. DTDs are obsolete. If you use doctypeless XHTML, the id="" attribute is of type “CDATA” in XML, so that rule doesn’t even apply.

David wrote on 16th January 2013 at 21:13:

Just as always in web development, you cannot tell if lesser restrictions are a good or a bad thing. What really matters is the context. For a project (either private or for a company/customer) that will always and ever be maintained by you, everything is fine, as long as you do not have to support old browsers (like IE 6). Just use that cool #id or !class (the latter could be interesting for removing rules with modifier classes; it seems very readable).

If you are not the only one who works with the page (HTML, CSS, JS, etc.), in my opinion the good ol’ rules of readable and maintainable code should be applied strictly to any part of the program, just because it is not a common coding convention to use restricted/non-native characters in IDs and classnames, and it is very likely that other developers might be distracted at first sight, even if you comment it.

For larger teams, some kind of ‘workaround’ could be coding guidelines for the project. But that seems to be a lot of work if it would just allow you to use those $****-style classes and IDs.

avenida gez wrote on 20th March 2013 at 06:55:

The point is not if lesser restrictions are good or bad. The restrictions may be imposed by the nature of the beast itself, so lesser restrictions means they should be used where they apply. If that were not the case, then there would be no case for Unicode either: let’s all use only a-z from now on, forget about Japan, China, and the whole world, and let them make their own software. For example, why does this site require an email address? Can you justify that? It may be a fake; why do all sites require an email to post a comment? If I needed a callback, it would not need to be required; I would write it myself. So the fact that many, many places require email or, worse, a sign-in is a programmers’ sickness: they want to be in control of everything. It is the same case with the id attribute restrictions: some may be justified, some not. This case is a programmer giving freedom and responsibility to a programmer; removing the email requirement would be a programmer giving freedom to a user.

Jamie Pate wrote on 16th April 2013 at 00:08:

https://www.w3.org/TR/2003/WD-css3-syntax-20030813/#characters ← the CSS3 spec still prohibits class names starting with numbers and all sorts of other things. (Yes, it seems like browsers ignore the spec here, but YOU HAVE BEEN WARNED.)

Mathias wrote on 16th April 2013 at 11:17:

Jamie: Did you even read the above article? The CSS spec doesn’t define what is and what isn’t allowed in HTML. HTML does.

sqykly wrote on 8th September 2013 at 09:54:

Mathias: I think he’s referring (correctly) to the fact that, in order to apply CSS to an element by its ID, the ID must be valid as part of a CSS selector. Since smiley faces and pirate flags and Klingon letters are not valid as part of a CSS ID selector, using one in your HTML document commits that element to being inaccessible from CSS (via sugary #, at least) even if it is valid HTML. Until CSS (and let’s face it, jQuery, too) is on board, it is bad factoring on a practical level to give my element an ID that looks like a level from Kroz. Side note: just for fun, I am about to go make an HTML document with an element with an ID that is a level from Kroz, and it’s going to be awesome, but I would have to fire me if I saw myself doing that on a serious project.

Mathias wrote on 8th September 2013 at 12:39:

sqykly:

“Since smiley faces and pirate flags and Klingon letters are not valid as part of a CSS ID selector […]”

What are you talking about? Try this:

<style>#☺ { background: lime; }</style>
<p id=☺>This paragraph gets a lime background.

Any value for the HTML id attribute can be represented in CSS (or in selectors in JavaScript/jQuery). Check out the tool I link to in this post.

tomByrer wrote on 9th April 2014 at 04:02:

Doesn't seem that period is allowed for IDs, at least where CSS is concerned: http://jsbin.com/mimey/1/edit :(

Mathias wrote on 9th April 2014 at 08:49:

tomByrer: You just have to escape it. Read the last paragraph in this post.

“[…] That’s why I’ve written a separate blog post explaining how to escape any character in CSS, and how to use escaped CSS selectors in JavaScript.”

José A M Pacheco wrote on 19th January 2015 at 16:50:

I made a page with only the following:

<input id=cpf value="anything">

…and before the </body>:

<script> 
  console.log(cpf.value);
  // → 'anything'
</script>

Tested in Chrome 36, IE8, IE11, and the latest version of Firefox, and it worked in all of them. Does this make document.getElementById obsolete?

Mathias wrote on 19th January 2015 at 16:54:

José: See Named access on the Window object in the HTML Standard.

Mathias Bynens
