Disclaimer: Many thanks to Juriy ‘kangax’ Zaytsev (Юрий Зайцев) for writing the test case that inspired me to investigate this further, and everyone in #whatwg for helping me parse the specification correctly.
ETAGO in HTML4
The HTML 4.01 spec says:
Although the <style> and <script> elements use CDATA for their data model, for these elements, CDATA must be handled differently by user agents. Markup and entities must be treated as raw text and passed to the application as is. The first occurrence of the character sequence </ (ETAGO or end-tag open delimiter) is treated as terminating the end of the element’s content. In valid documents, this would be the end tag for the element.
Section 3.2.1 in Appendix B is more specific:
When script or style data is the content of an element (<script> and <style>), the data begins immediately after the element start tag and ends at the first ETAGO (</) delimiter followed by a name start character ([a-zA-Z]); note that this may not be the element’s end tag. Authors should therefore escape </ within the content. Escape mechanisms are specific to each scripting or style sheet language.
(Note that this only applies to inline styles and scripts in HTML documents, not external files that are referenced from the HTML.)
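To make the HTML4 rule concrete, here is a minimal JavaScript sketch of where a parser following Appendix B would end the content. The function name html4ContentEnd and the sample style rule are hypothetical, and the sketch models only the quoted rule, not a full SGML parser.

```javascript
// Hypothetical helper: returns the index at which an HTML4-conforming parser
// would end <script>/<style> content, i.e. the first `</` followed by a name
// start character ([a-zA-Z]), or -1 if no such sequence occurs.
function html4ContentEnd(content) {
  return content.search(/<\/[a-zA-Z]/);
}

// Illustrative input only: a style rule whose string contains `</b`.
const css = "p { content: 'a</b'; }";
const end = html4ContentEnd(css);

console.log(end);               // 15
console.log(css.slice(0, end)); // the part a strict HTML4 parser would keep
```

Note that `</` followed by a space or a digit does not match, since neither is a name start character.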
This means that technically the following code is invalid HTML4, and it shouldn’t work:
<!-- Remember, this is HTML4 we’re talking about. Redundant @type attributes ftw! -->
As per HTML4, the second <style> element would be closed as soon as the parser reaches the ETAGO delimiter, and none of the style rules in it would be applied. Paragraphs would get a red background color (see the first <style> element). It would be equivalent to the following non-conformant markup:
bc'; background: green; }</style>
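The same goes for <script> elements:
As per HTML4, the <script> element would be closed at the ETAGO delimiter, and the script would throw a SyntaxError, since it would be interpreted as follows: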
Well, that’s the theory. In reality, no browser ever implemented this. The ETAGO delimiter isn’t respected as a terminating sequence for <style> and <script> elements in any browser. You can easily confirm this yourself by viewing the test cases based on the above code examples: ETAGO delimiter inside a <style> element and ETAGO delimiter inside a <script> element.
Back to reality with HTML5
Rather than expecting existing implementations to change, HTML5 standardizes the behavior that browsers had implemented (with a few security improvements). This is described in the spec as part of the full tokenization algorithm, specifically here and here.
This means the above examples are now valid HTML. And of course, they continue to work correctly, as they always did. Generally, ETAGO delimiters can be used inside of <style> and <script> elements. Just keep in mind that the full </style and </script strings followed by a space character, >, or / will close their respective opening tag.
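The HTML5 rule can be sketched as a small check. The helper name containsClosingSequence is hypothetical. Besides the space, > and / mentioned above, the character class also includes tab, line feed and form feed, which the HTML tokenizer treats the same way as a space, and matching is case-insensitive. The sketch assumes tagName is a plain name such as script or style, and it does not model the <!-- exception for <script> elements.

```javascript
// Hypothetical helper: does `content` contain a sequence that would end a
// <script> or <style> element under the HTML5 rule?
// Characters accepted after the tag name: tab, line feed, form feed,
// carriage return (normalized to a line feed on input), space, `/`, and `>`.
function containsClosingSequence(content, tagName) {
  const pattern = new RegExp('</' + tagName + '[\\t\\n\\f\\r />]', 'i');
  return pattern.test(content);
}

// A bare ETAGO delimiter is harmless in HTML5:
console.log(containsClosingSequence("p { content: 'a</b'; }", 'style'));  // false
// The full closing sequence is not:
console.log(containsClosingSequence("var s = '</script>';", 'script'));   // true
// Escaping the solidus breaks up the sequence:
console.log(containsClosingSequence("var s = '<\\/script>';", 'script')); // false
```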
Semi-related fun fact: Since the <title> element is an RCDATA element that uses the text content model, there’s no need to encode < inside of it unless you want to use </title followed by any of those characters. <title>foo < bar</title> and <title><i>foo</i></title> are perfectly valid markup as per HTML5. The same goes for <textarea>. In spec lingo: <script> and <style> are raw text elements, <textarea> and <title> are RCDATA elements.
For backwards compatibility, there’s an interesting exception to this rule for <script> elements that contain <!-- with a later occurrence of -->. In that case, a </script> end tag that follows a nested <script start tag is allowed in the <script> element’s content. Here’s a valid, working example:
While this is good to know, luckily there are better solutions than this old-school ’90s-style pattern (that only works for <script> elements anyway). Whenever you need to use </style> inside a <style> element, or </script> inside a <script> element, escape the solidus. Using a backslash (\, also known as “reverse solidus character”) is by far the simplest:
/* Using the Unicode code point for the solidus character (see http://mths.be/bax): */
/* Using the shorthand notation for Unicode code points (see http://mths.be/bax): */
content: '<\2F style>';
/* Simply escaping the solidus character with a reverse solidus (\): */
// Using `unescape()`:
document.write(unescape('<script>alert("wtf")%3C/script>')); // Überlame.
// Using string concatenation:
document.write('<script>alert("heh")<' + '/script>'); // Lame.
// Using the octal escape sequence for the solidus character (/):
document.write('<script>alert("hah")<\57script>'); // Lame, deprecated, and disallowed in ES5 strict mode.
// Using the Unicode escape sequence:
document.write('<script>alert("hoh")<\u002Fscript>'); // Lame.
// Using the hexadecimal escape sequence:
document.write('<script>alert("huh")<\x2Fscript>'); // Lame.
// Simply escaping the solidus character:
document.write('<script>alert("O HAI")<\/script>'); // Awesome!
Both these examples are valid HTML, and of course they work as expected in any browser.
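The same \/ escape is handy when data is serialized into an inline <script> element. The following is a minimal sketch, assuming the data is emitted as JSON; the helper name toInlineScriptJson is hypothetical. It relies on \/ being a valid escape for the solidus in both JSON and JavaScript string literals, and it only addresses the </ sequence.

```javascript
// Hypothetical helper: serialize `value` as JSON that can be placed inside an
// inline <script> element. Every `</` becomes `<\/`, so the output never
// contains `</script`, and the decoded data is unchanged.
function toInlineScriptJson(value) {
  return JSON.stringify(value).replace(/<\//g, '<\\/');
}

const data = { html: '<script>alert("O HAI")</script>' };
const safe = toInlineScriptJson(data);

console.log(safe.includes('</'));                 // false
console.log(JSON.parse(safe).html === data.html); // true
```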
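Note that while it’s an edge case, the </script sequence can also appear outside of a string in JavaScript, e.g. in 42 </script/, where a regular expression literal follows the less-than operator. Of course, the simple \/ escape won’t work here. In that case, make sure to use a space before the regex literal: 42 < /script/. (I can’t think of such a case for CSS though. Can you?)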