Mathias Bynens

The XML serialization of HTML5, aka ‘XHTML5’

Published · tagged with HTML, HTTP

You know, I’ll always prefer HTML over XHTML because it’s much less verbose and I like to keep things simple. True story.

But that didn’t stop me from wondering how exactly one triggers HTML5’s XML mode — let’s call it XHTML5 from now on. As it turns out, there are three easy steps to convert your HTML5 documents to XHTML5.

Use the correct MIME type

You’re not really using XHTML5 until you’re serving your documents with the corresponding MIME type. For XHTML, this is application/xhtml+xml. You can do this by configuring your web server, or by using a server-side scripting language. Here’s an example in PHP:

<?php header('Content-Type: application/xhtml+xml;charset=UTF-8'); ?>

Note that documents served with this MIME type won’t render in IE8 and older versions. Instead, they’ll trigger a download popup. (There is a workaround that allows you to send them as application/xml instead, but why bother when you could just use HTML?)
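For comparison, here’s a rough sketch of the same idea in Python, written as a plain WSGI application. The names `app`, `XHTML_MIME` and `DOCUMENT` are made up for this illustration; the only part that matters is the Content-Type header.

```python
XHTML_MIME = "application/xhtml+xml"

# Hypothetical document body; any well-formed XHTML5 document would do.
DOCUMENT = (
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    "<head><title>Example XHTML5 document</title></head>\n"
    "<body><p>Served as XHTML5.</p></body>\n"
    "</html>\n"
).encode("utf-8")


def app(environ, start_response):
    """WSGI application that serves DOCUMENT with the XHTML MIME type."""
    headers = [
        ("Content-Type", f"{XHTML_MIME}; charset=UTF-8"),
        ("Content-Length", str(len(DOCUMENT))),
    ]
    start_response("200 OK", headers)
    return [DOCUMENT]
```

To try it locally, you could hand `app` to `wsgiref.simple_server.make_server` from the standard library, or to any other WSGI server.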

Use the correct DOCTYPE — or don’t use it at all

The DOCTYPE declaration, which can be written as <!doctype html> in ‘regular’ HTML5, can be omitted in XHTML5. If you insist on using it, however, you should know ‘DOCTYPE’ is supposed to be written in uppercase in XML mode, like this:

<!DOCTYPE html>

Note that if you don’t uppercase DOCTYPE, the XML parser returns a syntax error.

The second part can be written in lowercase (html), uppercase (HTML) or even mixed case (hTmL) — it still works. However, to conform to the Polyglot Markup Guidelines for HTML-Compatible XHTML Documents, it should be written in lowercase.
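You can get a feel for this outside the browser, too. The sketch below feeds a few DOCTYPE variants to Python’s built-in (non-validating) XML parser, on the assumption that it treats the declaration the same way a browser’s XML parser does. `parses_as_xml` and `BODY` are hypothetical names used only for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Minimal document body to attach each DOCTYPE variant to.
BODY = f'<html xmlns="{XHTML_NS}"><head><title>t</title></head><body/></html>'


def parses_as_xml(source: str) -> bool:
    """Return True if the XML parser accepts the source without errors."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# Uppercase keyword, mixed-case name, lowercase keyword, and no DOCTYPE at all.
for doctype in ("<!DOCTYPE html>", "<!DOCTYPE hTmL>", "<!doctype html>", ""):
    print(repr(doctype), parses_as_xml(doctype + BODY))
```

Only the lowercase `<!doctype html>` variant should be rejected; the other three, including the one without any DOCTYPE, should parse fine.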

Specify the XML namespace

XHTML5 requires you to add an XML namespace to the root element of the document, which in this case is of course the html element.

<html xmlns="http://www.w3.org/1999/xhtml">
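To see what the namespace declaration actually changes, here’s a small Python sketch that parses the root element with and without it. An XML parser has no built-in notion of HTML, so the `xmlns` attribute is the only thing that marks these elements as XHTML. The helper `is_xhtml_element` is a made-up name for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

with_ns = ET.fromstring(f'<html xmlns="{XHTML_NS}"><body/></html>')
without_ns = ET.fromstring("<html><body/></html>")


def is_xhtml_element(element, local_name: str) -> bool:
    """Check that an element is in the XHTML namespace and has the given name."""
    # ElementTree spells namespaced tags as "{namespace-uri}local-name".
    return element.tag == f"{{{XHTML_NS}}}{local_name}"


print(with_ns.tag)     # namespaced tag
print(without_ns.tag)  # plain tag, in no namespace
print(is_xhtml_element(with_ns, "html"), is_xhtml_element(without_ns, "html"))
```

Note that child elements inherit the default namespace, so the `body` element in the first document is an XHTML element as well.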

Putting it all together

That’s all there is to it. You can view an example XHTML5 document. This is pretty much what the source code looks like:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Example XHTML5 document</title>
</head>
<body>
<!-- HERE BE DRAGONS -->
</body>
</html>

Remember, the document must be served as application/xhtml+xml to trigger HTML5 in XML serialization mode. The DOCTYPE declaration is optional in XML mode, but if you don’t want to omit it, it needs to be uppercase.

Of course, you’ll have to use valid XHTML syntax for your document to work. This means you won’t be able to use <noscript> or document.write(). Also, a single syntax error is enough to cause a Yellow Screen of Death instead of the page you wanted to display. Use with caution!
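If you do go down this road, it’s probably wise to check your documents for well-formedness before your visitors’ browsers do it for you. Here’s a minimal sketch using Python’s standard XML parser; `first_xml_error` is a hypothetical helper, and its error messages won’t match what a browser shows.

```python
import xml.etree.ElementTree as ET


def first_xml_error(source: str):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(source)
    except ET.ParseError as error:
        line, column = error.position
        return line, column, str(error)
    return None


samples = {
    "self-closed br": "<p>line one<br />line two</p>",
    "unclosed br": "<p>line one<br>line two</p>",
    "undeclared entity": "<p>a&nbsp;b</p>",
}

for label, markup in samples.items():
    print(label, "->", first_xml_error(markup))
```

The first sample is well-formed; the other two are perfectly ordinary HTML habits that an XML parser refuses to accept.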

Better yet, never ever use XHTML unless you have a very good reason to do so! There’s no added value, and it only complicates things. When in doubt, use regular HTML5.

About me

Hi there! I’m Mathias. I work on Chrome at Google. HTML, CSS, JavaScript, Unicode, performance, and security get me excited. Follow me on Twitter, Bluesky, and GitHub.

Comments

Austin Andrews wrote on :

I don’t think warning people not to use XHTML5 is a good thing. Its strict parsing and namespaces could actually be beneficial to the web.

But really HTML5 is kind of the norm now so it’s kind of a moot point.

Kevin Diale wrote on :

“No added value”. Bleh.

Standardizing the web is something we’ve been moving towards for a long time, HTML5 being based on HTML instead of XHTML aside. If XHTML became the standard, it would ease the use of many different technologies, and quicken development time for anything that had to parse HTML because they wouldn’t have to implement catches for badly-formed documents.

That’s really the term for a document that can’t be easily converted into XML. Badly-formed.

/rant.

wrote on :

Kevin: How can any document be ‘badly-formed’ if it perfectly conforms to a specification?

A.J. Which other XML languages are you talking about? SVG? MathML? You don’t need XHTML for that; HTML5 will do fine.

A.J. Cates wrote on :

Mathias: FBML comes to mind, XUL is another one, there are literally hundreds of different namespaces that can be useful to use inside an HTML5 document. If you are writing a big application like Facebook, customized namespaces can really speed up and simplify development time.

Austin Andrews wrote on :

You can also extend the DOM quite a bit for custom JS frameworks. Kind of an inline markup for custom UI elements.

Plus inline SVG is kind of ugly in HTML5’s implementation.

Kevin Diale wrote on :

HTML is very much a technology that babies its authors. There’s a way that it’s supposed to be written and a way that it can be written.

Well-formed HTML documents, that is, documents written in the way that HTML is supposed to be written, close everything they open. Every <p> tag has a </p> tag later down the line. It doesn’t extend this to empties like <br> but that’s not terribly important because nothing nests inside of <br>. Nothing can.

Badly-formed HTML documents are documents that are written more sloppily, documents that don’t close what they open. The parser sees that you forgot to close a <p> tag and goes ahead and does it for you. By using HTML in this way you are WILLINGLY GIVING UP CONTROL OF HOW YOUR DOCUMENT RENDERS. You do part of the work and tell the parser to sort it out.

XHTML is HTML with a conscience. “I will not submit myself to tag soup” it says. “I will be forward-facing and integrate properly with all the other languages on the playground. I will be standardized so that I may retain full control over how I am seen.”

It’s not a good thing to be lazy with your markup, and if you’re keeping up with good practices in HTML, it’s only a stone’s throw away from being XHTML. Just close your empties and change the DOCTYPE.

HTML5 being based in HTML is a step backwards. I can guarantee you that.

wrote on :

A.J. Cates: Sure, but unless you’re doing so (developing the next Facebook) you probably have no reason at all to choose XHTML5 over HTML5.

Also, note that technically, Facebook isn’t using XHTML either; they’re serving their documents with a text/html MIME type.

Dirk Gadsden wrote on :

Mathias: But they are serving using an XHTML DOCTYPE. (A.J. Cates beat me to the punch.)

I agree with Kevin Diale in the idea that, as Austin said, XHTML’s “strict parsing and namespaces could actually be beneficial to the web.” HTML5 is supposed to represent a new horizon in standardization and compatibility, and I second the idea that it being based in HTML is definitely a step backwards.

wrote on :

A.J. and Dirk Yes, Facebook is using the XHTML 1.0 Strict DOCTYPE. What’s your point? They’re still serving their documents with the text/html MIME type. Allow me to quote myself here: “Note that technically, Facebook isn’t using XHTML either; they’re serving their documents with a text/html MIME type.”

Austin: “Inline SVG is kind of ugly in HTML5’s implementation.” Care to elaborate on that?

Kevin You seem to be missing the point that HTML5 is a spec, which HTML parsers should adhere to. Nobody is “WILLINGLY GIVING UP CONTROL OF HOW YOUR DOCUMENT RENDERS” [sic]; you know how it will be interpreted because it’s documented in the specification. There’s nothing malformed about an HTML5 document that omits an optional closing tag, as long as it is written according to the spec.

Dirk Gadsden wrote on :

Mathias: My point is that they’re still writing XHTML; last time I checked XHTML is still valid HTML, so a text/html MIME type is just fine. Also, Kevin is saying that the HTML5 specification should follow XHTML’s footsteps instead of continuing on with the original HTML’s legacy of malformed’ness.

wrote on :

Dirk Gadsden: If you’re sending content as text/html you’re using HTML, not XHTML. A DOCTYPE declaration won’t change that.

And what’s with the “malformed” buzzword here? How can something be malformed if it’s perfectly conforming to a specification?

Austin Andrews wrote on :

Mathias: I was just putting it out there. It looks rather ugly compared to the XHTML’s implementation. Pointing out in the future the tags will start to mix.

Also I believe anything not adhering to XML syntax should be defined as malformed.

wrote on :

Dirk:

Also, you’re wrong in saying text/html is not a valid content-type for XHTML, see Wikipedia and the W3C.

Thanks for making my case! I quote https://www.w3.org/TR/2009/NOTE-xhtml-media-types-20090116/#text-html:

The text/html media type [RFC2854] is primarily for HTML, not for XHTML. In general, this media type is NOT suitable for XHTML except when the XHTML is conforms to the guidelines in Appendix A. In particular, text/html is NOT suitable for XHTML Family document types that add elements and attributes from foreign namespaces, such as XHTML+MathML [XHTML+MathML].

XHTML documents served as text/html will not be processed as XML [XML10], e.g., well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see guidelines 11 and 13).

A.J. Cates wrote on :

Mathias: Just says it isn’t suitable for XHTML documents that use foreign namespaces. It says nothing about when the document is served as text/html it becomes invalid, and the XHTML media types are not part of the XHTML 1.0 specs, they are just suggestions from the W3C. If you run a document through the XHTML validator when it’s served as text/html you don’t get any errors and still pass validation but you also get a notice suggesting how it should be served.

A.J. Cates wrote on :

Mathias: What if you’re not serving the file? What if it’s an email attachment or you’re opening it up from your hard drive? Looking at how a file was served is one of the worst ways to do content type detection, it’s far easier to actually look inside the file for something like a doc type.

And there is nothing wrong with arguing over the internet, it’s a great way to add really good content. If anybody ever runs into this thread, well they will have the opportunity to learn way too much about XHTML vs. HTML.

wrote on :

A.J. Cates: Well there you have it, another reason not to use XHTML: it’s not portable.

Why would anyone even bother? In 99.99% of all cases you’re much better off using HTML. And hey, in HTML5, the use of the solidus is entirely optional, so if you really want to, you can still write semi-XHTML like you’re used to. At least you won’t be lying about your document’s MIME type.

This article is about real XHTML, the kind that doesn’t work in Internet Explorer 8 and below. You simply can’t use XHTML in these browsers, because they don’t understand the MIME type and thus fail to render the document. This alone is reason enough to never use XHTML if you don’t have to.

A.J. Cates wrote on :

Mathias: Umm? Are you F-ing kidding me on portability? XHTML = XML which is hands down the most common data format. Nearly every single programming language ships with an XML parser, can’t even begin to say the same about HTML.

The reason I am so for XHTML5 is because it’s always valid HTML5 and XML. You can’t say HTML5 is always valid XML, therefore making it very unportable when compared to the polyglotism of XHTML5.

wrote on :

A.J. Cates: I was referring to your examples of mail attachments or other local XHTML documents which, like every other XHTML document, will never get parsed in XML mode without the correct MIME type. You won’t have that kind of problem mailing an HTML document.

The reason I am so for XHTML5 is because it’s always valid HTML5 and XML.

It’s not XML and won’t be interpreted as such unless you serve it with the right MIME type. Which is something you and many others fail to do. You’ve been talking about XHTML all this time, but what you really mean is “HTML with a stricter syntax”.

Saying HTML is inferior to XHTML because it’s not XML is like saying circles are inferior to squares because they are not rectangles.

Thomas Aylott wrote on :

@Mathias My sentiments exactly!

Having the option to use XML on the web
is awesome for a few nerds (myself occasionally included),
but having loose HTML be the default is obviously much better.

HTML > XML most of the time.
The (very rare) times when XML is better,
we can choose to use it.

Nicholas Wilson wrote on :

Mr Cates, for the record, I should point out that MIME types are certainly not the only way to indicate that an XML parser is to be used. If an XML document is to be downloaded and read from disk locally, it should have a prologue at the start (<?xml ... ?>) which functions like a Content-Type http-equiv meta in HTML, storing an indication within the file itself of how it is to be treated. A complex polyglot document with an XML prologue will be loaded from disk correctly in all modern browsers (FF1+, IE9, etc) and this has been standard behaviour for some time. Documents like this are fairly common, for example the ones my CMS uses internally with DocBook, XHTML, MathML, SVG, and other namespaces, which get XSL transformed to plain XHTML5+MathML+SVG output, with various bits of embedded ARIA, RDF and other extensions (like atom threading, etc) being examples of more foreign things which could be namespaced and added in.

The ease with which complex applications can be built using portable tools like XSLT easily justifies the move to XML authoring and content management, and XML output is then the easiest and most obvious move. When IE gains an XML parser (IE9) and this moves down the pipeline, some larger sites will shift in this direction more. The headache of munging HTML around is not worth it beyond a certain point.

You are right though Mathias that for smaller applications the migration to XML would add some complexity. Added constraints like well-formedness though will add only a trivial amount of complexity unless you approach XML with non-XML tools. String munging will not work efficiently, and will give you lots of headaches because the site is vulnerable to ‘yellow screens of death’, but if you are exploiting XML tools, which are one of the biggest benefits of XML (like DOM, SimpleXML, XSLT, etc), then you will find the added complexity negligible if any. This is really the biggest advocacy or user training problem with XML: people may judge it by how easy it is to output using non-XML tools, which is irrelevant.

albert wrote on :

While most of this argument is over my pathetic head, FBML is a) wack diddy wack but more importantly b) deprecated. Seriously, who tf brings FBML into an argument?

Sorcix wrote on :

I use XHTML5 with the correct doctype, mimetype and xml prolog. I’ve a spider running to check my websites on broken links and unused css classes, which simply uses a Java XML parser. It’s that easy.

For browsers that don’t support application/xhtml+xml there is a text/html version. After all, browsers send the Accept header for a reason.

Sorcix wrote on :

My weblog is powered by Tumblr, so I can’t control what they generate. I’m working on a replacement. The blog isn’t served as application/xhtml+xml, so it doesn’t have to be well-formed, and it has a HTML5 doctype, so it isn’t XML at all.

wrote on :

The blog isn’t served as application/xhtml+xml, so it doesn’t have to be well-formed, and it has a HTML5 doctype, so it isn’t XML at all.

XHTML5 documents can have the HTML5 DOCTYPE. Have you read the post above? ;)

If it’s “not XML at all”, why does the <html> element have xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"?

I’d link Hixie’s “Sending XHTML as text/html Considered Harmful” but it looks like mattur already did.

Sorcix wrote on :

Mathias: Yes, I have read the post above. Both my website and my blog have a HTML5 doctype, with a different MIME type. That makes the biggest part of the website XHTML5 and the blog HTML5, as the post above explains.

Also the link is not relevant in my case, since it has a HTML5 doctype (the top of the page explains that it isn’t that relevant anymore in HTML5, read it yourself first, please, thanks!). In HTML5 it’s allowed to use both > and /> so the differences in rendering in browsers are gone. I’m not using <script> and <style> elements so there is no difference in comments in my pages. I fail to see your point with that link?

wrote on :

Sorcix:

Also [Sending XHTML as text/html Considered Harmful] is not relevant in my case, since [my blog] has a HTML5 doctype (the top of the page explains that it isn’t that relevant anymore in HTML5, read it yourself first, please, thanks!).

I’m sorry — what?

In HTML5 it’s allowed to use both > and /> so the differences in rendering in browsers are gone.

What on earth are you talking about?! <div /> still doesn’t close a div if you serve it as HTML — that only happens in XML mode. HTML5 doesn’t change any of that. Please read Hixie’s article more carefully — I think you missed the part about HTML5 allowing XML-like syntax in the same places that XHTML 1.0 Appendix C does. That doesn’t mean HTML5 === XHTML.

Sorcix wrote on :

Mathias: I’m only using empty elements where allowed in both XHTML and HTML, such as <script /> and <link /> tags. I’m not relying on XML to close those elements, as they don’t have to be closed when served as HTML. Why would you want an empty block level element (such as <div />)?

wrote on :

Sorcix: I can see that; I was just replying to your comments. You said Hixie’s article “isn’t that relevant anymore” now that there’s HTML5, and that “the differences in rendering [between HTML and XHTML] in browsers are gone”. Neither of those statements is true.

I’m only using empty elements where allowed in both XHTML and HTML, such as <script /> and <link /> tags.

In what universe is <script /> allowed in HTML? <script></script> is, but <script /> is only valid in XML mode.

Sorcix wrote on :

You’re right, I was confused. <script /> is not allowed indeed, which is why I’m using <script></script> as well, but I’m sure you’d noticed that already.

Balazs wrote on :

Browsers don’t care if you use non-XML HTML markup, but it’s impossible to process a non-XML document outside a browser (believe me, it’s one of the biggest nightmares). This is the reason why I prefer XHTML and XHTML5, especially when markup is not written by hand but generated by software e.g. a WYSIWYG editor or a CMS. Processing an XML document is very simple task for any software since almost every development framework on every platform has an XML parser.

Humans like HTML, applications like XML. It’s better if you use both as needed.

Zimmen wrote on :

Yes, HTML / HTML5 are specs… but it’s a spec that specifies how to handle badly written markup IMHO! The handling of not-so-well-written markup + the interpretation of the spec makes browsers incompatible and slow, this was something we hoped XHTML would fix! I think web pages should work more like “compiled apps” instead of something that works like “fuzzy logic rendering”.

Miller Medeiros wrote on :

I’m also against the XML serialization (serving with the XML MIME type or using the <?xml ..?> header or .xhtml file extension) unless you really need it, it reduces cross-browser support and the fact that a malformed document can trigger an error message for the client instead of displaying “slightly undesired output” can cause headaches without a real gain. (Developers should validate code, not users…)

But I’m totally in favor of using an XML-like syntax (closing tags in the right order, lower-case, etc.) since it is stricter and leaves less room for browsers to ‘guess’ what your intentions are (we all know that specs usually aren’t followed the same way by all vendors). It also allows ‘code folding’ in most editors and makes it easier to understand the document structure (where a node starts and ends)… I guess you also agree with that.

Cheers.

subduedjoy wrote on :

I remember the days before XHTML. What a mess. Browsers continually had to guess at what the web developer meant, and of course, they all guessed differently. How soon we forget.

Ramon wrote on :

Hello,

I am fairly new to web development and, trying to understand the differences between HTML5 and XHTML5, I arrived at this post. I have been reading for a while today and have come to the conclusion that, at least for now, XHTML5 is not worth the effort. Since this is my opinion, I will explain why I think that.

  1. IE8 and below don’t understand XHTML5.
  2. As I have read here and on some other articles, sending XHTML as text/html, is not a good thing, and besides XHTML5 will be read as HTML5, so what’s the point?. A good article from Ian Hickson about this, can be read at http://hixie.ch/advocacy/xhtml. Have to say that I’m not sure if it’s still valid for HTML5, but I’ll take the safe choice on this one for now.
  3. Most importantly, as long as all pages validate, I don’t see anything wrong in using HTML5 syntax, instead of XHTML5, after all, who cares if I write <DIV> or <div>, or if I don’t double quote attribute values (which I do, but that’s not the point). It’s all valid HTML5 according to W3C validator.

If I’m wrong it’s my right, but please do tell me, I’m trying to learn something here. Mathias, thanks for the article. Cheers.

Chris wrote on :

[…] XHTML5 is not worth the effort […]

I’d hardly call it an effort and I think it is totally worth it.

IE8 and below don’t understand XHTML5 […]

IE8 and below don’t understand the application/xhtml+xml MIME type, and even if they did understand and parse it correctly, they would not support the HTML5 features you’d likely be using if you choose XHTML5.

Is that a problem? Maybe, if you are lazy and absolutely must support every browser on the green earth. Anyone who wants a browser capable of handling XML and HTML5 has them at their disposal. Vista and Windows 7 users can download IE9, and everyone can choose to use the latest Firefox or Chrome.

XHTML has always been a superior specification, and it’s no more difficult to learn and implement than HTML. As a matter of fact it’s easier. A small, strict set of rules is always easier to follow than a set of rules with countless exceptions.

I’m currently serving all my content as XHTML5 with an application/xhtml+xml Content-Type.

TechZilla wrote on :

I use XHTML 1.0 Strict, on my site (man page repository). I prefer XHTML 1.1, but my tools only support 1.0.

If I wasn’t using XML I’d be unable to parse my conversions. The whole point of this crap is so we can use all the standard XML parsers and tools, such as XSLT!

Once it’s all done, what am I going to do? Convert everything all over again to HTML? That’s a one way conversion, you can’t return from that with the same XML parser. You would have to convert the, now HTML, source to XML to do any transforms.

This is why XHTML is superior, no question about it. This is the same nonsense arguments against XHTML. They are based on laziness, or a publisher’s response to it. Laziness of the authors, laziness on the authoring tool developers and laziness on browsers developers.

In response to their ineptitude, I serve as text/html. The modern browsers will recognize the source as intended and render as such. Browsers do lenient syntax parsing, but the rendering still ends up standards-compliant.

The big difference in changing the MIME, is how the browsers do the parsing, but if you stick to a certain valid XHTML/HTML polyglot form, you will render as the same correct way. You are not serving HTML, you are serving XHTML and telling the browser to parse it as HTML.

HTML5 is a whole different animal, I don’t use it for multiple reasons. (Not that they won’t change…)

  1. The standard is implemented, but still not official. This is a good enough reason IMO.
  2. XHTML5 support is less mature than HTML5, thus many of my tools can’t use it.
  3. XHTML5 is the “standard” that is forcing the MIME away from text/html. It was previously not suggested, but it was still valid. As we already stated support for the proper MIME is spotty.

Anyone serious about XHTML, should use the mature XHTML 1.1 until XHTML5 support is more mature and implemented.

Matias Meno wrote on :

I still don’t get, after reading the rules for polyglot markup, why some of you would say that it’s «unsafe» or «bad» to serve XHTML documents as HTML. HTML5 is designed to support that, and there is no downside to it: you serve proper XHTML + HTML which means that you do not have to worry about browsers parsing them differently (of course you shouldn’t just switch to the application/xhtml+xml Content-Type, but why would you?) and you have the benefit of easily parsing it as XML.

Anonymous wrote on :

I wholeheartedly agree with avoiding XHTML (any version). It’s a solution in search of a problem and won’t solve whatever problem you had pretty well anyways. XML is just a huge circle-jerk people somehow feel great using, but other than a misplaced, misguided feeling of doing the “right” thing for a completely pointless definition of “right”, there’s no good in using XHTML.

There’s not a single reason to use it, and the arguments typically exposed such as “force it to be nice” are bogus as you can botch anything no matter how strict the specification is, and you can make anything nice no matter how lax the specification is; you are not going to care for validating it, and you are not going to use XSL unless you’re insane.

One of my favourite programming-related quotes presents a perfect view of XML in one line: “The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well.” — Phil Wadler

Emon wrote on :

To me the simple theory is: if things are not organized or patterned, it’ll become a mess when it grows in large. It’s true for anything, from your book library to your city. I should not say something not well unless I introduce something really well. The web’ll not be the semantic as a whole; I might follow 10 days a week and you probably start a new day after 48 hours.

Evi1M4chine wrote on :

The “yellow screen of death” is the WHOLE DAMN POINT of XHTML! It is the ONLY way to guarantee that the developer is not a complete idiot, as invalid code is not let through at all. Just like with compilers.

In fact, HTML5 is a horrible abomination, an evolutionary step back from HTML 4 (just like its users and designers are an evolutionary step back from an intelligent human being), the biggest failure since HTML 3.2, and will result in the same mess of completely shitty and horribly bad HTML messes we can remember from back then. With taxi drivers acting as if they were “pros” for being able to cobble something together, that the browser still manages to decipher (after tons of “looking over errors”, fixes and guesswork).

In the next years, it will be easy again, to tell a professional from a hack. The pro will use XHTML 5. The hack will only have the “skills” to “use” HTML5. The sad thing is, that most clients won’t know that. But I will make damn sure, everyone I know, will know it.

HTML5 IS CONSIDERED HARMFUL. Avoid it like the pest. Use XHTML5 instead.

Markus Reiter wrote on :

Evi1M4chine: I don't think you can call people “pros” because of whether they close their tags or not.

I personally think HTML 5 is harder to write – if you do it right. Think of all the possibilities where you can omit start and end tags, whereas in XHTML there is no discussion about whether to close or not to close a tag because you just have to.

1UnitedPower wrote on :

I know about the date of this posting, but still want to add my thoughts.

There obviously are reasons to prefer XHTML over HTML.

  1. You can make use of processing-instructions. That’s really badass.
  2. You can style your document with XSL and XSLT. Serve as XHTML, print as PDF — that’s so cool.
  3. The syntax is much easier. Just because SGML syntax is more fault-tolerant doesn’t mean it’s easier to read or write.

Clive J. wrote on :

You know, I’ll always prefer HTML over XHTML because it’s much less verbose and I like to keep things simple. True story.

If you think HTML5 is simpler than XHTML5, you’ve clearly never read the HTML5 parsing spec to the level of detail required to implement it. True story.

Also, even if you choose to use the HTML5 serialization, you’re still using XML every time you use foreign content (MathML, SVG). The HTML5 serialization format also makes it impossible to write a streaming parser and thus forces linear space complexity (in stark contrast to XML).

The extra verbosity of XHTML5 is simply what it takes to avoid the awful, over-specified flaws and context-sensitivity involved in parsing HTML5.

P.S. I used to mindlessly claim “XHTML is dead/too verbose/sucks etc. etc.” too, before I spent months working on the alternative, only to discover that it’s 10 times worse.

P.P.S. By “working on the alternative” I meant “working on a spec compliant parser for the alternative (HTML5)”.

John K. wrote on :

You can absolutely send XHTML with a text/html MIME type. It doesn’t mean that it’s somehow not really XHTML or that it’s sloppy or invalid XHTML. It might get passed to a regular HTML parser, but the document itself is XHTML as long as it conforms to the XHTML spec.

The whole point of XHTML was to have a version of HTML that would also be valid XML (hence the name). It would use the tags and attributes from HTML but forbid common HTML practices that were quirky from an XML standpoint. Parsing it as HTML instead of XML doesn’t magically undermine its XHTML status — but it might be less efficient because HTML parsers have to be more complicated and so there will probably be some overhead. In the end, the HTML parser mostly sees an HTML document with a lot of unnecessary (but acceptable) quotation marks, closing tags that could have been left out, and maybe some unrecognized attributes.

P.S. What do you get if you take an XHTML 1.1 document and change its doctype to <!DOCTYPE HTML>? A very tidy HTML 5 document.

William wrote on :

Thanks for this awesome guide. I was wondering how I could convert or transfer my HTML files to XHTML, or should I just rewrite or revamp the whole thing?

John C. Hunt wrote on :

John K.: You are very, VERY clueless and wrong. Running a valid XHTML through an HTML5 parser can have completely unexpected results. It takes very specific effort to make a document both well formed XHTML and proper HTML5. Get a clue before you leave comments around the internet please.

Howard W wrote on :

I guess I’m on the same page as subduedjoy. Maybe HTML5 just standardized the guesses.

Four years ago I started a JSP/Servlet web-app I believed I had properly set up as XHTML5. The other day I was reviewing someone else’s code. The HTML in the IDE had all kind of errors. Self-closing div, inputs with custom attributes… So I started researching what they thought they were doing.

In the process I discovered my own XHTML was being served as text/html. I discovered the meta content-type XHTML does not work in HTML5 and I had a JSP page directive to use text/xml. Once I fixed that, almost every page broke, mostly on entity names. It also broke a jQuery checkbox drop-down that had things like tabIndex, <input disabled>… So I had to fix that too.

Had I properly configured the XHTML5 from the get go I would have got it right in the first place. I don’t think it’s really any harder to do if you write and test. Although markup is not code I’ve always treated like it was. Compilers never guess what I meant, they do exactly what I write.
