Mathias Bynens

The id attribute got more classy in HTML5

Published · tagged with CSS, HTML, Unicode

One of the more subtle yet awesome changes that HTML5 brings applies to the id attribute. I already tweeted about this a few months ago, but I think it is interesting enough to write about in more than 140 characters.

How id differs between HTML 4.01 and HTML5

The HTML 4.01 spec states that ID tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (-), underscores (_), colons (:), and periods (.). For the class attribute, there is no such limitation. Classnames can contain any character, and they don’t have to start with a letter to be valid.

HTML5 gets rid of the additional restrictions on the id attribute. The only requirements left — apart from being unique in the document — are that the value must contain at least one character (can’t be empty), and that it can’t contain any space characters.

This means the rules that apply to values of class and id attributes are now very similar in HTML5.
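To make the difference concrete, here is a minimal sketch of both rule sets as JavaScript checks. The function names are made up for illustration, and neither check covers the requirement that the value be unique in the document.

```javascript
// Illustrative helpers only; these names are not part of any spec or library.

// HTML 4.01: a letter, followed by any number of letters, digits,
// hyphens, underscores, colons, and periods.
function isValidHtml4Id(value) {
  return /^[A-Za-z][A-Za-z0-9\-_:.]*$/.test(value);
}

// HTML5: at least one character, and no space characters
// (space, tab, line feed, form feed, carriage return).
function isValidHtml5Id(value) {
  return value.length > 0 && !/[ \t\n\f\r]/.test(value);
}

// Print each candidate with its HTML 4.01 and HTML5 verdicts.
for (const id of ['nav', '24ways', '♥', 'items[0][name]', 'two words', '']) {
  console.log(JSON.stringify(id), isValidHtml4Id(id), isValidHtml5Id(id));
}
```

Only the first candidate passes the HTML 4.01 check, while everything except the last two passes the HTML5 check.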

Err, what?

Although that probably sounds boring, this actually is pretty cool. In HTML 4.01, the following code is perfectly valid:

<p class="#">Foo.
<p class="##">Bar.
<p class="♥">Baz.
<p class="©">Inga.
<p class="{}">Lorem.
<p class="“‘’”">Ipsum.
<p class="⌘⌥">Dolor.
<p class="{}">Sit.
<p class="[attr=value]">Amet.

Heck, you could even use a brainfuck program as a classname:

<p class="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!

I’ve put up a demo page with some other examples, but I’m sure you can think of more. After all, the possibilities are endless :)

So what’s new?

In HTML5, you can take all of these groovy classnames and use them as values for id attributes. Yes, HTML5 is that awesome.

<p id="#">Foo.
<p id="##">Bar.
<p id="♥">Baz.
<p id="©">Inga.
<p id="{}">Lorem.
<p id="“‘’”">Ipsum.
<p id="⌘⌥">Dolor.
<p id="{}">Sit.
<p id="[attr=value]">Amet.
<p id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.">Hello world!

…you get the idea. I remade the same demo page as before to use ids instead of classes.

How to escape any character in CSS

Writing CSS for this markup is tricky. For example, you can’t just use ## { color: #f00; } to target the element with id="#". Instead, you’ll have to escape the weird characters (in this case, the second #). Escaping cancels the special meaning of CSS characters and also lets you refer to characters you cannot easily type out, like crazy Unicode symbols. It gets even trickier if you need to use these escaped CSS selectors in JavaScript as well.

That’s why I’ve written a separate blog post explaining how to escape any character in CSS, and how to use escaped CSS selectors in JavaScript.
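As a rough illustration of what that escaping involves, the sketch below follows the “serialize an identifier” steps from the CSSOM spec. The function name escapeCssIdentifier is made up for this example. In browsers, the built-in CSS.escape() does this job and is the better choice.

```javascript
// Simplified sketch; prefer the built-in CSS.escape() where it is available.
function escapeCssIdentifier(value) {
  const chars = Array.from(String(value)); // iterate by code point
  return chars.map((ch, index) => {
    const code = ch.codePointAt(0);
    // NUL cannot be represented; use the replacement character instead.
    if (code === 0) return '\uFFFD';
    const isDigit = code >= 0x30 && code <= 0x39;
    // Control characters, a leading digit, or a digit right after a leading
    // hyphen are written as a hex escape followed by a space.
    if ((code >= 0x01 && code <= 0x1f) || code === 0x7f ||
        (index === 0 && isDigit) ||
        (index === 1 && isDigit && chars[0] === '-')) {
      return '\\' + code.toString(16) + ' ';
    }
    // A lone hyphen is not a valid identifier on its own.
    if (index === 0 && ch === '-' && chars.length === 1) return '\\-';
    // Non-ASCII characters, hyphens, underscores, and alphanumerics are safe as-is.
    if (code >= 0x80 || /[-_0-9A-Za-z]/.test(ch)) return ch;
    // Everything else gets a backslash in front of it.
    return '\\' + ch;
  }).join('');
}

console.log('#' + escapeCssIdentifier('#'));      // → #\#
console.log('#' + escapeCssIdentifier('24ways')); // → #\32 4ways
console.log('#' + escapeCssIdentifier('♥'));      // → #♥
```

When such a selector is written inside a JavaScript string literal, every backslash has to be doubled once more, which is where the extra trickiness comes from.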

Comments

Glenn Glerum wrote on :

Mathias, is this backwards compatible? Or do older browsers just ignore IDs like that? And what does it do for semantics? It’s fun when you can use a class like "i-just-♥-this-sub-navigation„ø¤º°¨¨°º¤ø ¸„ø¤º°¨¨°º¤ø" but would you put it into practice?

seutje wrote on :

The night after I noticed this in the HTML5 spec was my worst night ever. Never before have I had such violent nightmares about IRC support and people doing fucked up shit, wondering why it doesn’t work entirely as expected… :(

Mathias wrote on :

I’m getting a lot of “Who would ever use this?” and “Any real use case here?” responses, so it seems a little more explanation is needed. While some of my examples will most likely never be used in production, the fact that HTML5 now allows IDs to contain just about any character (as was already the case for the class attribute in HTML 4.01) is definitely an improvement.

As some people on Hacker News have pointed out, this is pretty damn useful:

  • An <input> element with name="items[0][name]" can now finally have an id matching the name attribute. This would be invalid HTML 4.01, but valid HTML5: <input type="text" id="items[0][name]" name="items[0][name]">
  • It might be useful for programmers working in languages other than English, so they don’t have to come up with English names, fall back on less descriptive names ("id1"), transliterate words, or replace letters with ‘similar’ ones (‘O’ or ‘OE’ for ‘Ø’).

Kroc Camen wrote on :

seutje: This is just standardizing what all browsers already support. Developers have been able to do this all along anyway. Yes, you can shoot yourself in the foot with it, but being able to use accented characters in class and id names is a definite plus and much welcomed.

seutje wrote on :

Kroc Camen: I know, I’ve already run into this nightmare as someone was using underscores in his class names and wasn’t escaping these in the CSS, which caused IE6 to completely ignore it, while all other browsers gladly accept it unescaped: http://jsbin.com/esofe3 IE6 will show all green, all other browsers will show all red.

Edit: Actually, only the unescaped _foo is a problem, not foo_ or foo_bar. Here’s a better test case: http://jsbin.com/unogus Nice catch, Nicolas Gallagher!

Albert wrote on :

Marvelous! Random usage off the top of my head: links, #456bereast, #321contact, #24ways, etc. Nice, nice, nice!

David Bishop wrote on :

I don’t see how this makes HTML5 more classy. This just seems… unnecessary at best. Sometimes restrictions such as those in the HTML 4.01 spec are necessary to keep developers from doing crazy stuff.

I’m just not sure why most developers would need this; I’m not sure the minor gains are worth the possible headaches that can now be caused by poor programmers.

Mathias wrote on :

Weston: Yes, this works in XHTML5 as well.

David: The “classy” part is a pun, since the id attribute restrictions in HTML5 are very similar to those of the class attribute (in HTML4+). HTML5 gives developers more freedom to choose which characters they want to use for IDs. I’m not sure why you think this is a bad thing. To me, it’s definitely an improvement.

Vic Shoup wrote on :

David Bishop: Agreed. Sounds really cool with all the flexibility until you start accounting for all the other things it can impact… Then you have to go in and do those things differently so they behave normally under HTML 5.

Weston Ruter wrote on :

Fascinating. If this works in XHTML5 as well, isn’t this a direct violation of the XML spec? I guess not if DTDs aren’t used anyway and so the id attributes aren’t of the ID type—so they don’t have to be XML Names.

Mathias wrote on :

Weston: To be honest, I wouldn’t know if this is a violation of the XML spec or not. I’m not much of an XML guy.

I just recreated the entire testcase in XHTML5 (I had only tested a few IDs in XML before) and it turns out that in XML mode, there are three invalid ID values on my demo pages:

  • id="<p>"
  • id="<><<<>><>"
  • id="++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>."

I had to remove these or the page wouldn’t be rendered.

So, in XHTML5, IDs cannot contain an unescaped less-than sign (<). Other than that, everything seems to work fine. Use of the greater-than sign (>) presents no problem whatsoever, and I can see why.

Note that it would be possible to use these three IDs in XHTML by wrapping the contents of the <style> element in a CDATA block so the XML parser ignores it. Then, you could escape the id attribute values in the XHTML to prevent the “Unescaped < not allowed in attribute values” error, but that kind of defeats the purpose IMHO.
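A minimal sketch of that attribute escaping, assuming a double-quoted attribute value (escapeXmlAttribute is a made-up helper name, not an existing API):

```javascript
// Escape the characters that cannot appear literally in a double-quoted
// XML attribute value. The ampersand goes first so that the entities
// produced by the later replacements are not escaped a second time.
function escapeXmlAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/"/g, '&quot;');
}

console.log(`<p id="${escapeXmlAttribute('<p>')}">`);
// → <p id="&lt;p>">
```

The greater-than sign is left alone, in line with the observation above that it causes no problems.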

Is this what you expected? What does the XML spec say about this?

Christo wrote on :

David Bishop: I agree with David. It’s nice to know the restrictions have been lifted, but let’s stick to self-documenting ids and classes. Future development would be a nightmare if people used such ridiculous naming techniques...

Alex wrote on :

This is wicked! Will solve a lot of my problems. But a lot of libraries, e.g. jQuery, still don’t handle this correctly. And it will destroy all backward compatibility.

Cheers, Alex

Ant Gray wrote on :

Still I see no reason to use that. What is the point of making class and id names harder to read and type?

Timothy (TRiG) wrote on :

“What is the point of making class and id names harder to read and type?”

As pointed out, for users of some languages, this will make id names easier to read and type. Why so anglocentric?

Robert Siemer wrote on :

Useful? Yes! I have to turn some user-supplied strings into CSS class names, which is easier now:

  • handle/filter NUL (NUL, even escaped, provokes undefined behavior in CSS)
  • escape for HTML (How to handle non-space-whitespace?)
  • escape for CSS
  • done

zcorpan wrote on :

Mathias:

id="<p>"

Just escape the < as &lt; (in both the id="" attribute and in the <style> element) in XHTML.

Weston Ruter:

The XML spec says that you are a very bad man: https://www.w3.org/TR/REC-xml/#id

Ignore XML’s concept of ‘valid’. It’s tied to DTDs. DTDs are obsolete. If you use doctypeless XHTML, the id="" attribute is of type “CDATA” in XML, so that rule doesn’t even apply.

David wrote on :

As always in web development, you cannot tell whether fewer restrictions are a good or a bad thing. What really matters is the context. For a project (either private or for a company/customer) that will only ever be maintained by you, everything is fine, as long as you do not have to support old browsers (like IE 6). Just use that cool #id or !class (the latter could be interesting for removing rules with modifier classes; it seems very readable).

If you are not the only one who works with the page (HTML, CSS, JS, etc.), in my opinion the good ol’ rules of readable and maintainable code should be applied strictly to every part of the program, simply because it is not a common coding convention to use restricted/non-native characters in IDs and classnames, and it is very likely that other developers will be confused at first sight, even if you comment it.

For larger teams, some kind of ‘workaround’ could be coding guidelines for the project. But that seems to be a lot of work if it would just allow you to use those $****-style classes and IDs.

avenida gez wrote on :

The point is not whether fewer restrictions are good or bad. Restrictions may be imposed by the nature of the beast itself, so fewer restrictions means they should be used only where they apply. If that were not the case, there would be no case for Unicode either: let’s all use only a-z from now on, forget about Japan, China, and the rest of the world, and let them make their own software. For example, why does this site require an email address? Can you justify that? It may be fake anyway, so why does every site require an email address to post a comment? If I needed a reply, the field would not have to be required; I would fill it in myself. The fact that so many places require an email address, or worse, a sign-in, is a programmers’ sickness: they want to be in control of everything. The same goes for the id attribute restrictions: some may be justified, some not. This case is a programmer giving freedom and responsibility to a programmer; removing the email requirement would be a programmer giving freedom to a user.

sqykly wrote on :

Mathias: I think he’s referring (correctly) to the fact that, in order to apply CSS to an element by its ID, the ID must be valid as part of a CSS selector. Since smiley faces and pirate flags and Klingon letters are not valid as part of a CSS ID selector, using one in your HTML document commits that element to being inaccessible from CSS (via sugary #, at least) even if it is valid HTML. Until CSS (and let’s face it, jQuery, too) is on board, it is bad factoring on a practical level to give my element an ID that looks like a level from Kroz. Side note: just for fun, I am about to go make an HTML document with an element with an ID that is a level from Kroz, and it’s going to be awesome, but I would have to fire me if I saw myself doing that on a serious project.

Mathias wrote on :

sqykly:

“Since smiley faces and pirate flags and Klingon letters are not valid as part of a CSS ID selector […]”

What are you talking about? Try this:

<style>#☺ { background: lime; }</style>
<p id=☺>This paragraph gets a lime background.

Any value for the HTML id attribute can be represented in CSS (or in selectors in JavaScript/jQuery). Check out the tool I link to in this post.

tomByrer wrote on :

Doesn't seem that period is allowed for IDs, at least where CSS is concerned: http://jsbin.com/mimey/1/edit :(

José A M Pacheco wrote on :

I made a page with only the following:

<input id=cpf value="anything">

…and before the </body>:

<script>
console.log(cpf.value);
// → 'anything'
</script>

Tested in Chrome 36, IE8, IE11, and the latest version of Firefox, and it worked in all of them. Does this make document.getElementById obsolete?
