The XML serialization of HTML5, aka ‘XHTML5’
You know, I’ll always prefer HTML over XHTML because it’s much less verbose and I like to keep things simple. True story.
But that didn’t stop me from wondering how exactly one triggers HTML5’s XML mode — let’s call it XHTML5 from now on. As it turns out, there are three easy steps to convert your HTML5 documents to XHTML5.
Use the correct MIME type
You’re not really using XHTML5 until you’re serving your documents with the corresponding MIME type. For XHTML, this is application/xhtml+xml. You can easily do this by configuring your web server, or by using a server-side scripting language. Here’s an example in PHP:
<?php header('Content-Type: application/xhtml+xml;charset=UTF-8'); ?>
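For readers who don’t use PHP, the same idea can be sketched in Python as a tiny WSGI application. This is only an illustration: the names `xhtml_app` and `XHTML5_PAGE` are made up for this example, and the only thing that matters is the `Content-Type` header.

```python
# A made-up page, used only to have something to send.
XHTML5_PAGE = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>Example XHTML5 document</title></head>\n'
    '<body><p>Served as XML.</p></body>\n'
    '</html>\n'
)


def xhtml_app(environ, start_response):
    """Hypothetical WSGI app that always answers with the XHTML MIME type."""
    body = XHTML5_PAGE.encode("utf-8")
    headers = [
        # This header is what switches a browser to its XML parser.
        ("Content-Type", "application/xhtml+xml; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library’s `wsgiref.simple_server.make_server("localhost", 8000, xhtml_app).serve_forever()` should be enough; the host and port are arbitrary.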
Use the correct DOCTYPE — or don’t use it at all
The DOCTYPE declaration, which can be written as <!doctype html> in ‘regular’ HTML5, can be omitted in XHTML5. If you insist on using it, however, you should know ‘DOCTYPE’ is supposed to be written in uppercase in XML mode, like this:
<!DOCTYPE html>
The second part can be written in lowercase (html), uppercase (HTML) or even mixed case (hTmL), but if you don’t uppercase DOCTYPE, the XML parser will return a syntax error.
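You can check the case rules without a browser. The sketch below feeds a few DOCTYPE variants to Python’s bundled XML parser (expat, via `xml.etree.ElementTree`). That is an assumption on my part: it is a generic non-validating XML parser, not a browser’s, but the well-formedness rules it applies are the same. `parses_as_xml` and `BODY` are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Minimal document body, reused for every DOCTYPE variant below.
BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head><body/></html>'
)


def parses_as_xml(source):
    """Return True if Python's XML parser accepts `source` as well-formed."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


variants = ["<!DOCTYPE html>", "<!DOCTYPE HTML>", "<!DOCTYPE hTmL>",
            "<!doctype html>", ""]
for doctype in variants:
    # Only the lowercase "doctype" keyword is rejected; no DOCTYPE at all is fine.
    print(repr(doctype), parses_as_xml(doctype + BODY))
```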
Specify the XML namespace
XHTML5 requires you to add an XML namespace to the root element of the document, which in this case is of course the html element.
<html xmlns="http://www.w3.org/1999/xhtml">
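To see what the namespace declaration changes, here is a small sketch using Python’s `xml.etree.ElementTree` (again a generic XML parser, not a browser). Without `xmlns`, an XML parser just sees an element that happens to be called `html` in no namespace at all. `root_namespace` is a helper invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_namespace(source):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(source).tag
    # ElementTree spells namespaced tags as "{uri}localname".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


with_ns = '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
without_ns = '<html><body/></html>'

print(root_namespace(with_ns) == XHTML_NS)  # True
print(root_namespace(without_ns))           # None
```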
Putting it all together
That’s all there is to it. You can view an example XHTML5 document. This is pretty much what the source code looks like:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Example XHTML5 document</title>
</head>
<body>
<!-- HERE BE DRAGONS -->
</body>
</html>
Remember, the document must be served as application/xhtml+xml to trigger HTML5 in XML serialization mode. The DOCTYPE declaration is optional in XML mode, but if you don’t want to omit it, it needs to be uppercase.
Of course, you’ll have to use valid XHTML syntax for your document to work. A single syntax error is enough to cause a Yellow Screen of Death instead of the page you wanted to display. Use with caution!
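If you do go down this road, it’s worth checking well-formedness before you publish. Here’s a minimal sketch, assuming Python’s standard XML parser is a good enough stand-in for the browser’s; `first_xml_error` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Return None if `source` is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


good = ('<html xmlns="http://www.w3.org/1999/xhtml">'
        '<body><p>one<br/>two</p></body></html>')
# Same document, but the <br> is left open, as HTML would allow.
bad = good.replace("<br/>", "<br>")

print(first_xml_error(good))  # None
print(first_xml_error(bad))   # a (line, column, message) tuple
```

An XML parser stops at the first error, so this only ever reports one problem at a time.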
Better yet, never ever use XHTML unless you have a very good reason to do so! There’s no added value, and it will only complicate things. When in doubt, use regular HTML5.
Comments
Austin Andrews wrote on :
I don’t think warning people not to use XHTML5 is a good thing. Its strict parsing and namespaces could actually be beneficial to the web.
But really HTML5 is kind of the norm now so it’s kind of a moot point.
Kevin Diale wrote on :
“No added value”. Bleh.
Standardizing the web is something we’ve been moving towards for a long time, HTML5 being based on HTML instead of XHTML aside. If XHTML became the standard, it would ease the use of many different technologies, and quicken development time for anything that had to parse HTML because they wouldn’t have to implement catches for badly-formed documents.
That’s really the term for a document that can’t be easily converted into XML. Badly-formed.
/rant.
A.J. Cates wrote on :
I agree with Austin. Being able to use other XML languages in your document is truly a magical miracle.
Mathias wrote on :
Kevin: How can any document be ‘badly-formed’ if it perfectly conforms to a specification?
A.J. Which other XML languages are you talking about? SVG? MathML? You don’t need XHTML for that; HTML5 will do fine.
A.J. Cates wrote on :
Mathias: FBML comes to mind, XUL is another one, there are literally hundreds of different namespaces that can be useful to use inside an HTML5 document. If you are writing a big application like Facebook, customized namespaces can really speed up and simplify development time.
Austin Andrews wrote on :
You can also extend the DOM quite a bit for custom JS frameworks. Kind of an inline markup for custom UI elements.
Plus inline SVG is kind of ugly in HTML5’s implementation.
Kevin Diale wrote on :
HTML is very much a technology that babies its authors. There’s a way that it’s supposed to be written and a way that it can be written.
Well-formed HTML documents, that is, documents written in the way that HTML is supposed to be written, close everything they open. Every <p> tag has a </p> tag later down the line. It doesn’t extend this to empties like <br> but that’s not terribly important because nothing nests inside of <br>. Nothing can.
Badly-formed HTML documents are documents that are written more sloppily, documents that don’t close what they open. The parser sees that you forgot to close a <p> tag and goes ahead and does it for you. By using HTML in this way you are WILLINGLY GIVING UP CONTROL OF HOW YOUR DOCUMENT RENDERS. You do part of the work and tell the parser to sort it out.
XHTML is HTML with a conscience. “I will not submit myself to tag soup” it says. “I will be forward-facing and integrate properly with all the other languages on the playground. I will be standardized so that I may retain full control over how I am seen.”
It’s not a good thing to be lazy with your markup, and if you’re keeping up with good practices in HTML, it’s only a stone’s throw away from being XHTML. Just close your empties and change the DOCTYPE.
HTML5 being based in HTML is a step backwards. I can guarantee you that.
Mathias wrote on :
A.J. Cates: Sure, but unless you’re doing so (developing the next Facebook) you probably have no reason at all to choose XHTML5 over HTML5.
Also, note that technically, Facebook isn’t using XHTML either; they’re serving their documents with a text/html MIME type.
A.J. Cates wrote on :
Mathias: Or let’s say you want to add a proper Facebook Like button to your page without using disgusting iframes.
And look at the DOCTYPE on Facebook.
Dirk Gadsden wrote on :
Mathias: But they are serving using an XHTML DOCTYPE. (A.J. Cates beat me to the punch.)
I agree with Kevin Diale in the idea that, as Austin said, XHTML’s “strict parsing and namespaces could actually be beneficial to the web.” HTML5 is supposed to represent a new horizon in standardization and compatibility, and I second the idea that it being based in HTML is definitely a step backwards.
A.J. Cates wrote on :
Dirk: To me XHTML represents that horizon of standardization and compatibility, while HTML5 just means new features.
Mathias wrote on :
A.J. and Dirk: Yes, Facebook is using the XHTML 1.0 Strict DOCTYPE. What’s your point? They’re still serving their documents with the text/html MIME type. Allow me to quote myself here: “Note that technically, Facebook isn’t using XHTML either; they’re serving their documents with a text/html MIME type.”
Austin: “Inline SVG is kind of ugly in HTML5's implementation.” Care to elaborate on that?
Kevin You seem to be missing the point that HTML5 is a spec, which HTML parsers should adhere to. Nobody is “WILLINGLY GIVING UP CONTROL OF HOW YOUR DOCUMENT RENDERS” [sic]; you know how it will be interpreted because it’s documented in the specification. There’s nothing malformed about an HTML5 document that omits an optional closing tag, as long as it is written according to the spec.
Dirk Gadsden wrote on :
Mathias: My point is that they’re still writing XHTML; last time I checked XHTML is still valid HTML, so a text/html MIME type is just fine. Also, Kevin is saying that the HTML5 specification should follow XHTML’s footsteps instead of continuing on with the original HTML’s legacy of malformed’ness.
A.J. Cates wrote on :
Mathias: I think what Kevin is trying to say is that the HTML5 spec allows for malformed XML so it becomes difficult to parse and control the document.
Mathias wrote on :
Dirk Gadsden: If you’re sending content as text/html you’re using HTML, not XHTML. A DOCTYPE declaration won’t change that.
And what's with the “malformed” buzzword here? How can something be malformed if it's perfectly conforming to a specification?
Austin Andrews wrote on :
Mathias: I was just putting it out there. It looks rather ugly compared to the XHTML’s implementation. Pointing out in the future the tags will start to mix.
Also I believe anything not adhering to XML syntax should be defined as malformed.
Dirk Gadsden wrote on :
That too, A.J. Cates, good point. I agree with Austin in that HTML is often malformed when compared to XHTML.
I think this is an argument over specifications, Mathias. Also, you’re wrong in saying text/html is not a valid content-type for XHTML, see Wikipedia and the W3C.
Mathias wrote on :
Dirk:
Thanks for making my case! I quote http://www.w3.org/TR/2009/NOTE-xhtml-media-types-20090116/#text-html:
A.J. Cates wrote on :
Mathias: Just says it isn’t suitable for XHTML documents that use foreign namespaces. It says nothing about when the document is served as text/html it becomes invalid, and the XHTML media types are not part of the XHTML 1.0 specs, they are just suggestions from the W3C. If you run a document through the XHTML validator when it’s served as text/html you don’t get any errors and still pass validation but you also get a notice suggesting how it should be served.
Mathias wrote on :
A.J. Cates: This is not about validation, it is about parsing. See Henri Sivonen’s Unofficial FAQ about the Discontinuation of the XHTML2 WG:
The same goes for any other version of XHTML.
Anyhow, this discussion is starting to feel like xkcd.com/386…
A.J. Cates wrote on :
Mathias: What if you’re not serving the file? What if it's an email attachment or you’re opening it up from your hard drive? Looking at how a file was served is one of the worst ways to do content type detection, it’s far easier to actually look inside the file for something like a doc type.
And there is nothing wrong with arguing over the internet, it’s a great way to add really good content. If anybody ever runs into this thread, well they will have the opportunity to learn way too much about XHTML vs. HTML.
Mathias wrote on :
A.J. Cates: Well there you have it, another reason not to use XHTML: it’s not portable.
Why would anyone even bother? In 99.99% of all cases you’re much better off using HTML. And hey, in HTML5, the use of the solidus is entirely optional, so if you really want to, you can still write semi-XHTML like you’re used to. At least you won’t be lying about your document’s MIME type.
This article is about real XHTML, the kind that doesn’t work in Internet Explorer 8 and below. You simply can’t use XHTML in these browsers, because they don’t understand the MIME type and thus fail to render the document. This alone is reason enough to never use XHTML if you don’t have to.
A.J. Cates wrote on :
Mathias: Umm? Are you F-ing kidding me on portability? XHTML = XML which is hands down the most common data format. Nearly every single programming language ships with an XML parser, can’t even begin to say the same about HTML.
The reason I am so for XHTML5 is because it’s always valid HTML5 and XML. You can’t say HTML5 is always valid XML, therefore making it very unportable when compared to the polyglotism of XHTML5.
Mathias wrote on :
A.J. Cates: I was referring to your examples of mail attachments or other local XHTML documents which, like every other XHTML document, will never get parsed in XML mode without the correct MIME type. You won’t have that kind of problem mailing an HTML document.
It’s not XML and won’t be interpreted as such unless you serve it with the right MIME type. Which is something you and many others fail to do. You’ve been talking about XHTML all this time, but what you really mean is “HTML with a stricter syntax”.
Saying HTML is inferior to XHTML because it’s not XML is like saying circles are inferior to squares because they are not rectangles.
gf3 wrote on :
A.J. Cates: NO U. You’re going to look back on this little exchange here and shake your head in embarrassment in a few months time. The points, you are missing them.
Thomas Aylott wrote on :
@Mathias My sentiments exactly!
Having the option to use XML on the web
is awesome for a few nerds (myself occasionally included),
but having loose HTML be the default is obviously much better.
HTML > XML most of the time.
The (very rare) times when XML is better,
we can choose to use it.
Nicholas Wilson wrote on :
Mr Cates, for the record, I should point out that MIME types are certainly not the only way to indicate that an XML parser is to be used. If an XML document is to be downloaded and read from disk locally, it should have a prologue at the start (<?xml ... ?>) which functions like a Content-Type http-equiv meta in HTML, storing an indication within the file itself of how it is to be treated. A complex polyglot document with an XML prologue will be loaded from disk correctly in all modern browsers (FF1+, IE9, etc) and this has been standard behaviour for some time. Documents like this are fairly common, for example the ones my CMS uses internally with DocBook, XHTML, MathML, SVG, and other namespaces, which get XSL transformed to plain XHTML5+MathML+SVG output, with various bits of embedded ARIA, RDF and other extensions (like atom threading, etc) being examples of more foreign things which could be namespaced and added in.
The ease with which complex applications can be built using portable tools like XSLT easily justifies the move to XML authoring and content management, and XML output is then the easiest and most obvious move. When IE gains an XML parser (IE9) and this moves down the pipeline, some larger sites will shift in this direction more. The headache of munging HTML around is not worth it beyond a certain point.
You are right though Mathias that for smaller applications the migration to XML would add some complexity. Added constraints like well-formedness though will add only a trivial amount of complexity unless you approach XML with non-XML tools. String munging will not work efficiently, and will give you lots of headaches because the site is vulnerable to ‘yellow screens of death’, but if you are exploiting XML tools, which are one of the biggest benefits of XML (like DOM, SimpleXML, XSLT, etc), then you will find the added complexity negligible if any. This is really the biggest advocacy or user training problem with XML: people may judge it by how easy it is to output using non-XML tools, which is irrelevant.
albert wrote on :
while most of this argument is over my pathetic head, FBML is a)wack diddy wack but more importantly b)deprecated. seriously, who tf brings FBML into an argument?
Sorcix wrote on :
I use XHTML5 with the correct doctype, mimetype and xml prolog. I've a spider running to check my websites for broken links and unused css classes, which simply uses a Java XML parser. It's that easy.
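A spider like that could start from something as small as the sketch below. It is written in Python rather than Java, uses only the standard library, and the function names are invented for illustration; fetching pages and actually testing each link are left out.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes element names with "{namespace-uri}".
XHTML = "{http://www.w3.org/1999/xhtml}"


def extract_links(source):
    """Return the href of every XHTML <a> element, in document order."""
    root = ET.fromstring(source)
    return [a.get("href") for a in root.iter(XHTML + "a") if a.get("href")]


def used_classes(source):
    """Return the set of class names used anywhere in the document."""
    classes = set()
    for element in ET.fromstring(source).iter():
        classes.update(element.get("class", "").split())
    return classes


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p class="intro lead"><a href="/about">About</a></p>'
    '<p><a href="http://example.com/">Example</a> <a>no href</a></p>'
    '</body></html>'
)

print(extract_links(page))        # ['/about', 'http://example.com/']
print(sorted(used_classes(page))) # ['intro', 'lead']
```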
For browsers that don't support application/xhtml+xml there is a text/html version. After all, browsers send the "Accept" header for a reason.
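That kind of negotiation might look roughly like this. It is a simplified sketch, not a full Accept-header parser: it ignores wildcards such as `*/*` and does not rank types by preference, and `pick_content_type` is a hypothetical helper name.

```python
def pick_content_type(accept_header):
    """Return the XHTML MIME type only if the client lists it explicitly."""
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        if media_type.strip().lower() != "application/xhtml+xml":
            continue
        # An explicit q=0 means "not acceptable", so honour it.
        quality = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            return "application/xhtml+xml"
    # Anything else gets the text/html version.
    return "text/html"


print(pick_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(pick_content_type("text/html, */*"))
```

If you vary the response like this, the server should also send `Vary: Accept` so that caches keep the two versions apart.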