Mathias Bynens

The smallest possible valid (X)HTML documents

Tagged with HTML, SGML

I thought it would be fun to document the smallest possible valid HTML documents for each version, so here goes :)

ISO/IEC 15445:2000, also known as “ISO HTML”: 113 bytes

<!DOCTYPE html PUBLIC"ISO/IEC 15445:2000//DTD HTML//EN"><html><head><title></title></head><body><p></body></html>

The DOCTYPE can also be written as <!DOCTYPE html PUBLIC "ISO/IEC 15445:2000//DTD HyperText Markup Language//EN">, but that obviously requires more characters.

Although it tricks the W3C Validator, the space following PUBLIC can be omitted as long as no system identifier is used.

Start and end tags for <html>, <head>, and <body> are required, as well as a block-level element as body content. The end tag for the <p> element can be omitted, though.
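If you want to double-check byte counts like the one above, a few lines of Python will do. This is only a sketch: the constant and function names are mine, and I'm assuming the document is saved as UTF-8 without a trailing newline.

```python
# Byte counts for the ISO HTML document and its two DOCTYPE spellings.
# All names here are illustrative; the markup is copied from this section.
SHORT_DOCTYPE = '<!DOCTYPE html PUBLIC"ISO/IEC 15445:2000//DTD HTML//EN">'
LONG_DOCTYPE = (
    '<!DOCTYPE html PUBLIC '
    '"ISO/IEC 15445:2000//DTD HyperText Markup Language//EN">'
)
BODY = "<html><head><title></title></head><body><p></body></html>"


def byte_length(markup: str) -> int:
    """Return the size of the markup in bytes when encoded as UTF-8."""
    return len(markup.encode("utf-8"))


if __name__ == "__main__":
    # Size of the minimal document, then the cost of the longer DOCTYPE.
    print(byte_length(SHORT_DOCTYPE + BODY))
    print(byte_length(LONG_DOCTYPE) - byte_length(SHORT_DOCTYPE))
```

The same helper works for every other document in this post, since they all consist of ASCII characters only.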

HTML 2.0: 58 bytes

<!DOCTYPE html PUBLIC"-//IETF//DTD HTML 2.0//EN"><title//x

Other than the DOCTYPE, only the <title> element is required, as well as some body content (in this case, the text “x”). The start and end tags for <html>, <head> and <body> may be omitted. (Browsers automatically create these elements.)

You may have noticed the use of <title// instead of <title></title> here. This is a markup minimization feature of SGML named “SHORTTAG NETENABL IMMEDNET”. NET stands for Null End Tag. Basically, this allows shortening tags surrounding a text value. The first slash (/) in <title// stands for the NET-enabling “start-tag close” (NESTC), and the second slash stands for the NET. If you wanted to add some content to the <title> element, you could theoretically use <title/Foo/ instead of <title>Foo</title>.
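To make the shorthand a bit more concrete, here is a toy expansion of the NET syntax into ordinary start and end tags. It is a rough illustration, not an SGML parser: the regular expression and the expand_net helper are my own invention, and they ignore attributes, nested NET-enabled elements and anything else a real SGML declaration might allow.

```python
import re

# Toy pattern: "<name/" opens the element (NESTC) and the next "/" closes it (NET).
# It deliberately ignores attributes and nested shorthand.
NET_SHORTHAND = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/]*)/")


def expand_net(markup: str) -> str:
    """Rewrite <name/content/ as <name>content</name> (illustration only)."""
    return NET_SHORTHAND.sub(r"<\1>\2</\1>", markup)


if __name__ == "__main__":
    print(expand_net("<title//x"))
    print(expand_net("<title/Foo/"))
```

Under these assumptions, <title// turns into an empty <title></title> pair and the trailing x is left alone as body text.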

Note that the following version (54 bytes) seems to have the same effect, according to the W3C Validator:

<!DOCTYPE html PUBLIC"-//IETF//DTD HTML//EN"><title//x

HTML 3.2: 63 bytes

<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 3.2 Final//EN"><title//x

Note that the DOCTYPE for HTML 3.2 and older versions doesn’t really have an effect on your document; browsers still enter quirks mode.

HTML 4.0 Strict: 59 bytes

<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.0//EN"><title//<p>

In HTML4, the body content must contain a block-level element — just text content won’t do. For that reason, an empty <p> element is used.

HTML 4.01 Transitional: 71 bytes

<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.01 Transitional//EN"><title//x

Note that we’re not using the full document type declaration; the system identifier (the URL part that theoretically allows user agents to download the document type definition and any needed entity sets) is optional, so it’s been omitted here.

HTML 4.01 Transitional requires body content, but accepts text content; a block-level element in the <body> isn’t needed.

HTML 4.01 Frameset: 84 bytes

<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.01 Frameset//EN"><title//<frameset/<frame>/

The full DOCTYPE is <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">, but the system identifier may be omitted.

As you can see, we’re using the same SGML trick as before (<frameset/<frame>/) — only this time we’re actually adding content to the wrapper element.

In HTML 4.01 Frameset, the <frameset> element must have a <frame> child element. XHTML 1.0 Frameset does not have this requirement.

HTML 4.01 Strict: 60 bytes

<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.01//EN"><title//<p>

HTML 4.01 + RDFa 1.0: 69 bytes

<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.01+RDFa 1.0//EN"><title//<p>

The full DOCTYPE is <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/html401-rdfa-1.dtd">, but the system identifier may be omitted.

HTML 4.01 + RDFa 1.1: 69 bytes

<!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.01+RDFa 1.1//EN"><title//<p>

The full DOCTYPE is <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01+RDFa 1.1//EN" "http://www.w3.org/MarkUp/DTD/html401-rdfa11-1.dtd">, but the system identifier may be omitted.

XHTML Basic 1.0: 41 bytes

<html><head><title/></head><body/></html>

The DOCTYPE — in this case, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"> — is optional in all XHTML versions, assuming the document is served with the correct Content-Type: application/xhtml+xml header. (That’s a bold assumption.) Note that the xmlns attribute on the root <html> element isn’t required in this version of XHTML.
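Since everything here hinges on the Content-Type header, this is roughly what serving such a document could look like. It's a minimal sketch using Python's built-in wsgiref module; the names app and serve, and port 8000, are arbitrary choices of mine rather than anything the specs prescribe.

```python
from wsgiref.simple_server import make_server

# The DOCTYPE-less XHTML Basic document from this section.
XHTML_BASIC = b"<html><head><title/></head><body/></html>"


def app(environ, start_response):
    """WSGI app that always answers with the document as application/xhtml+xml."""
    headers = [
        ("Content-Type", "application/xhtml+xml"),
        ("Content-Length", str(len(XHTML_BASIC))),
    ]
    start_response("200 OK", headers)
    return [XHTML_BASIC]


def serve(port: int = 8000) -> None:
    """Serve the document on localhost until interrupted (port is an assumption)."""
    with make_server("localhost", port, app) as httpd:
        httpd.serve_forever()
```

Calling serve() and opening http://localhost:8000/ should make a browser use its XML parser. Swap the header for text/html and the same bytes get handled by the HTML parser instead, where <title/> no longer means what you want it to.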

Body content is optional, too.

You may notice the use of <title/> here instead of <title></title>. This is the XHTML equivalent of <title// in HTML serializations. Remember when we talked about SGML, and how HTML defined both its NET and NESTC with a /? The only difference here is that XML defines NESTC with a /, and NET with a > (angle bracket).
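You can convince yourself that an XML parser treats <title/> and <title></title> as the same thing with a quick comparison. This sketch uses Python's xml.etree.ElementTree; the shape and same_tree helpers are made up for the occasion and only compare tag names, text and child order.

```python
import xml.etree.ElementTree as ET


def shape(element):
    """Reduce an element to (tag, text, children) so two trees can be compared."""
    return (element.tag, element.text or "", [shape(child) for child in element])


def same_tree(first: str, second: str) -> bool:
    """Return True if both XML strings parse to the same element structure."""
    return shape(ET.fromstring(first)) == shape(ET.fromstring(second))


if __name__ == "__main__":
    short = "<html><head><title/></head><body/></html>"
    long = "<html><head><title></title></head><body></body></html>"
    print(same_tree(short, long))
```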

XHTML Basic 1.1: 41 bytes

<html><head><title/></head><body/></html>

The DOCTYPE, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">, is optional, again assuming the file is served with the correct MIME type.

XHTML 1.0 Transitional: 78 bytes

<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body/></html>

The DOCTYPE, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">, is optional.

XHTML 1.0 Frameset: 82 bytes

<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><frameset/></html>

The DOCTYPE, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">, is optional.

XHTML 1.0 Strict: 78 bytes

<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body/></html>

The DOCTYPE, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">, is optional.

XHTML + RDFa 1.1: 78 bytes

<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body/></html>

The DOCTYPE, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">, is optional.

XHTML 1.1: 78 bytes

<html xmlns="http://www.w3.org/1999/xhtml"><head><title/></head><body/></html>

The DOCTYPE, <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">, is optional.

HTML5: 15 bytes

<!DOCTYPE html>

That’s right: there’s no <title> element! When a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the <title> element may be omitted.

In all other situations, this is the smallest possible HTML5 document (31 bytes):

<!DOCTYPE html><title>x</title>

Sadly, the SGML trick we used before (<title//) is not allowed in HTML5 anymore. Even if it were, we still couldn’t use it, because HTML5 requires a non-empty content value for the <title> element if it is used. The reasoning is straightforward: if you leave the <title> element empty, it means the document doesn’t need a title, in which case you should simply omit the <title> element entirely (as explained above).
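The three cases (no title, empty title, non-empty title) are easy to tell apart with a small script. What follows is a rough check built on Python's html.parser, not a validator: TitleCollector and title_status are names I made up, and I'm assuming that a whitespace-only title counts as empty.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the text found inside <title> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.seen_title = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.seen_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.text += data


def title_status(markup: str) -> str:
    """Return "missing", "empty" or "ok" depending on the <title> element."""
    parser = TitleCollector()
    parser.feed(markup)
    parser.close()
    if not parser.seen_title:
        return "missing"
    # Assumption: whitespace-only content is treated the same as no content.
    return "ok" if parser.text.strip() else "empty"
```

With this helper, the 15-byte document reports "missing" (fine when a higher-level protocol supplies the title), the 31-byte one reports "ok", and <!DOCTYPE html><title></title> reports "empty", which is the case the spec rules out.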

Note that body content is not required.

XHTML5: 44 bytes

<html xmlns="http://www.w3.org/1999/xhtml"/>

XHTML5 doesn’t require a DOCTYPE. Just like in HTML5, there are cases where a <title> element is not needed. Body content is optional, too.

(Use validator.nu to confirm this; the W3C validator would fall back to XHTML 1.0 Transitional if you tried to validate this.)

Disclaimer

It’s very likely that I missed a possible “optimization”. Please leave a comment if you have any corrections or other feedback!

Update: I’ve set up a repository on GitHub to collect the smallest possible syntactically valid files. Pull requests welcome!

Comments

David Håsäther wrote:

You can skip the space between PUBLIC and the FPI (it’s a so-called “parameter separator” and can be omitted if it doesn’t introduce any ambiguities). So e.g. <!DOCTYPE html PUBLIC"-//W3C//DTD HTML 4.01//EN"> is a legal doctype declaration.

If you have a catalog with DTD mappings properly set up, you don’t even need the FPI (as the parser checks the catalog), so <!DOCTYPE html> is fine even pre-HTML5.

And you could just use another element type name: <!DOCTYPE b>. Which raises the question of whether it is still an HTML4 document. Could be, or could be something else, as it’s not possible to tell from the doctype declaration whether a document is this or that.

David Håsäther wrote:

Also, small correction: if NETENABL is set to IMMEDNET, the NET has to follow NESTC immediately. So e.g. <p/d/ is not allowed in that case. This was introduced in the Web SGML Adaptations annex, where NESTC is / and NET is > which forms the familiar void/empty-element tag (or what you want to call it) <foo/>. (All just theory of course, no XML parser I know of makes use of an SGML declaration.)

Karellen wrote:

Huh. I’d always thought that HTML5 allowed the empty/self-closing tag format (e.g. <p/>) as an equivalent of the empty start tag-end tag pair (e.g. <p></p>) for any element, which could have been used to shorten the HTML5 <title>. Learn something new every day…

sky wrote:

Maybe the HTML5 one with a title could look like this:

<!doctype html><title/>and here write your title:)

Mathias wrote:

sky: No, stuff like <title/> only works in XML serializations (i.e. for XHTML documents served as application/xhtml+xml). Also, your example would place the intended title text outside of the <title> element.

Divya Manian wrote:

The minimal SVG code for a .svg file that is valid would be <svg xmlns="http://www.w3.org/2000/svg"/> (41 bytes). If you merely want the SVG to render on a browser, you can get away with <!doctype html><svg> (20 bytes), serving it as text/html.

Kees wrote:

The WHATWG specs on the title element say “the title element must not be empty”, so you need something in it (conveniently also ending any possible discussion on whether it can be self-closing or not ;]).

It also says the following about the <title> element:

Contexts in which this element can be used: In a <head> element containing no other <title> elements

…which seems to suggest to me that you can’t use it without an enclosing <head> element.

Mathias wrote:

Kees: Good catch! The spec indeed requires a non-empty content value for the <title> element, if any. The HTML validator currently doesn’t detect this, so I’ve filed a bug. I’ll add a note to the article.

Older HTML versions, e.g. HTML4, didn’t have this requirement — so self-closing the <title> element using the SGML trick is still okay there. The HTML standard didn’t have this requirement until December 2012.

The “contexts” sections in the spec are non-normative though, so it’s fine to use it without an enclosing <head> element.

SelenIT wrote:

Just a note on the last comment: using title without head is fine not because the context part of the spec is non-normative, but because both the start and end tags of the html and head elements are optional. So these elements are created by the parser in the appropriate context even without any corresponding tags in the markup, which makes it possible to describe the same correct DOM hierarchy with less code.
