Yesterday, Mike Taylor raised a very interesting question on Twitter:
Anybody know what Safari 5 requires for a page to be Reader-ifiable?
I noticed Reader was already working on this site for most blog posts. For example, the article about the three levels of HTML5 usage triggers the Reader badge in Safari’s address bar. I concluded that the use of the <article> element to wrap the actual content must be one of the things that trigger it.
However, there’s more to it than just using the right markup. For example, Reader doesn’t work on this page containing my notes on document.head, even though the markup is similar to that of any other article on this site. It seems as if the length of the content is important as well. But how does Safari measure content length? Does the number of children of the wrapper element matter? How about the number of characters inside?
Rob Flaherty decided to investigate this further and created some test documents. He made some interesting observations:
- You need a wrapper element around the actual content, other than <body>. It doesn’t really matter which element you choose, as long as it’s not
- Reader requires at least five child elements inside the wrapper. Using double line breaks (<br><br>) inside an element makes it count as two elements.
- Reader doesn’t seem to work for local files.
All valid points, except “Reader requires at least five child elements inside the wrapper”, which doesn’t seem to be true. The number of child elements doesn’t matter; the content length seems to be measured another way. I’ll get to that later.
Clayton Ferris did some additional testing, and concluded the following:
It looks like that Safari Reader will detect a <div> or block level element that contains a header element (<h6>), followed by a certain amount of text. The reader badge will appear when the content text (not including the header) is more than 2,000 characters.
Sadly, none of Clayton’s statements seem to be true. I created some quick test cases to demonstrate it’s just not that simple:
- Test 1: 3 paragraphs, 1,863 characters (including heading and line breaks); Reader fails.
- Test 2: same as test 1, but with <p>.</p> added: 4 paragraphs, 1,866 characters; Reader works.
- Test 3: test without any heading elements — 6 paragraphs, 3,718 characters; Reader works.
This also proves that there is no fixed number of paragraphs (or other elements) needed to enable Reader; it all depends on the contents.
Reader and the Readability bookmarklet
Apple credits Arc90’s Readability experiment in the Safari Acknowledgements (Safari › Help › Acknowledgements or file:///Applications/Safari.app/Contents/Resources/Acknowledgments.html on Mac). This bookmarklet seems to be what Reader is based on, so it’s probably a good idea to dive into its source code.
For example, every paragraph containing double line breaks (<br><br>) counts as two paragraphs, which confirms what Rob concluded after his tests. Direct child text nodes and <div>s that don’t have block-level child elements count as paragraphs as well.
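To make the double line break rule concrete, here is a minimal sketch of how the ‘paragraphs’ inside a single element could be counted. The function name `countParagraphs` and the regular expression are my own assumptions for illustration; this is not code taken from Readability or Safari.

```javascript
// Hypothetical helper (not from Readability or Safari): estimate how many
// "paragraphs" the HTML of a single element yields once runs of two or more
// <br> tags are treated as paragraph breaks.
function countParagraphs(html) {
  // Two or more consecutive <br> tags, optionally separated by whitespace.
  const doubleBreak = /(?:<br\b[^>]*>\s*){2,}/gi;
  return html
    .split(doubleBreak)
    // Strip any remaining tags so only the visible text is left.
    .map((chunk) => chunk.replace(/<[^>]+>/g, "").trim())
    // Ignore chunks that contain no text at all.
    .filter((text) => text.length > 0).length;
}

console.log(countParagraphs("<p>First part.<br><br>Second part.</p>")); // 2
console.log(countParagraphs("<p>One line.<br>Still the same paragraph.</p>")); // 1
```

A single <br> leaves the paragraph count unchanged; only a run of two or more splits the element.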
It turns out Readability then loops through all these ‘paragraphs’ and assigns each of them a score based on how ‘content-y’ it looks. This score is determined by things like the number of commas, class names used in the markup, etc. The content’s length appears to be measured using .innerText; for every 100 characters inside a paragraph, that paragraph’s score goes up. Finally, the elements are tallied and their individual scores are added up. I think it’s safe to assume Safari Reader is triggered based on this algorithm.
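The scoring idea can be sketched roughly as follows. This is only an approximation of the approach described above: the weights, the class name patterns, and the function names `scoreParagraph` and `scoreContainer` are assumptions I made up for illustration, not the values Readability or Safari actually use.

```javascript
// Illustrative patterns and weights only; these are assumptions,
// not the values used by Readability or Safari.
const CLASS_HINTS = {
  positive: /article|body|content|entry|post|text/i,
  negative: /comment|footer|sidebar|sponsor|widget/i,
};

// Score one "paragraph": `text` is its visible text, `className` is the
// class attribute of the element it came from.
function scoreParagraph(text, className = "") {
  let score = 1; // base score for being a paragraph at all
  score += (text.match(/,/g) || []).length; // one point per comma
  score += Math.floor(text.length / 100); // one point per 100 characters
  if (CLASS_HINTS.positive.test(className)) score += 25;
  if (CLASS_HINTS.negative.test(className)) score -= 25;
  return score;
}

// Add up the paragraph scores to get a score for the whole container.
function scoreContainer(paragraphs) {
  return paragraphs.reduce(
    (sum, p) => sum + scoreParagraph(p.text, p.className),
    0
  );
}

const short = { text: "Too short.", className: "" };
const long = { text: "Lorem ipsum, dolor sit amet, ".repeat(10), className: "post" };
console.log(scoreParagraph(short.text)); // 1
console.log(scoreContainer([short, long])); // 49
```

Under these made-up weights, a long, comma-rich paragraph in a container with a ‘content-y’ class name easily outscores a short one, which matches the behaviour seen in the test cases.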
This definitely needs more investigating, but so far, these appear to be the most important factors for Safari’s Reader functionality to kick in:
- Use the right markup, i.e. make sure the most important content is wrapped inside a container element. Whether you use <span> doesn’t seem to matter, as long as it’s not
- The content needs to be long enough. Use enough words, use enough paragraphs, use enough punctuation. Every paragraph should have at least 100 characters.
- Reader doesn’t work for local documents.
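The factors above could be turned into a rough pre-flight check. This is a sketch under stated assumptions: `readerChecklist` and its input shape are hypothetical, it only encodes the observations from this post, and passing it says nothing definitive about what Safari will actually do.

```javascript
// Hypothetical pre-flight check based only on the observations in this post.
// It returns a list of likely problems; an empty list means none were found.
function readerChecklist({ url, wrapperTag, paragraphs }) {
  const problems = [];

  // Reader doesn't seem to work for local documents.
  if (new URL(url).protocol === "file:") {
    problems.push("document is a local file");
  }

  // The content needs a wrapper element other than <body>.
  if (!wrapperTag || wrapperTag.toLowerCase() === "body") {
    problems.push("content needs a wrapper element other than <body>");
  }

  // Every paragraph should have at least 100 characters.
  const shortOnes = paragraphs.filter((text) => text.trim().length < 100).length;
  if (shortOnes > 0) {
    problems.push(`${shortOnes} paragraph(s) shorter than 100 characters`);
  }

  return problems;
}

console.log(
  readerChecklist({
    url: "file:///Users/me/test.html",
    wrapperTag: "body",
    paragraphs: ["Too short."],
  })
);
```

The example call reports all three problems at once; a page served over HTTP with an <article> wrapper and sufficiently long paragraphs returns an empty list.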