How to enable Safari Reader on your site?
Yesterday, Mike Taylor raised a very interesting question on Twitter:
Anybody know what Safari 5 requires for a page to be Reader-ifiable?
I noticed Reader was already working on this site for most blog posts. For example, the article about the three levels of HTML5 usage triggers the Reader badge in Safari’s address bar. I concluded the use of the <article> element to wrap the actual content must be one of the things that trigger it.
Further investigation
However, there’s more to it than just using the right markup. For example, Reader doesn’t work on this page containing my notes on document.head, even though the markup is similar to that of any other article on this site. It seems as if the length of the content is important as well. But how does Safari measure content length? Does the number of children of the wrapper element matter? How about the number of characters inside?
Rob Flaherty decided to investigate this further and created some test documents. He made some interesting observations:
- You need a wrapper element around the actual content, other than <body>. It doesn’t really matter which element you choose, as long as it’s not <p>.
- Reader requires at least five child elements inside the wrapper. Using double line breaks (<br><br>) inside an element makes it count as two elements.
- Reader doesn’t seem to work for local files.
All valid points, except “Reader requires at least five child elements inside the wrapper”, which doesn’t seem to be true. The number of child elements doesn’t matter; the content length seems to be measured another way. I’ll get to that later.
Clayton Ferris did some additional testing, and concluded the following:
It looks like that Safari Reader will detect a <div> or block level element that contains a header element (<h1> to <h6>), followed by a certain amount of text. The reader badge will appear when the content text (not including the header) is more than 2,000 characters.
Sadly, none of Clayton’s statements seem to be true. I created some quick test cases to demonstrate it’s just not that simple:
- Test 1: 3 paragraphs, 1,863 characters (including heading and line breaks); Reader fails.
- Test 2: same as test 1, but with <p>.</p> added (4 paragraphs, 1,866 characters); Reader works.
- Test 3: test without any heading elements (6 paragraphs, 3,718 characters); Reader works.
This also proves that there is no fixed number of paragraphs (or other elements) needed to enable Reader; it all depends on the contents.
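Clayton’s rule can be checked against these three test pages mechanically. Below, his rule is encoded as a small Python predicate; the class and function names are hypothetical, while the page data is exactly what was reported for the tests above. Since the reported character counts include the heading, using the total only over-estimates the text that follows it, which makes no difference here: both pages with a heading stay under 2,000 characters even with the heading included.

```python
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    has_heading: bool    # contains an <h1>-<h6> element
    characters: int      # reported total, heading and line breaks included
    reader_works: bool   # observed result in Safari 5


def clayton_predicts_reader(page: Page) -> bool:
    """Clayton's rule: a heading followed by more than 2,000 characters of text.

    The reported totals include the heading, so this uses an upper bound on
    the amount of text after it.
    """
    return page.has_heading and page.characters > 2000


TESTS = [
    Page("Test 1", has_heading=True, characters=1863, reader_works=False),
    Page("Test 2", has_heading=True, characters=1866, reader_works=True),
    Page("Test 3", has_heading=False, characters=3718, reader_works=True),
]


def contradictions(pages: list[Page]) -> list[str]:
    """Names of the pages where the rule's prediction disagrees with Safari."""
    return [p.name for p in pages if clayton_predicts_reader(p) != p.reader_works]


if __name__ == "__main__":
    print(contradictions(TESTS))  # ['Test 2', 'Test 3']
```

Only test 1 agrees with the rule; test 2 gets Reader with fewer than 2,000 characters, and test 3 gets it without any heading at all.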
Reader and the Readability bookmarklet
Apple credits Arc90’s Readability experiment in the Safari Acknowledgements (Safari › Help › Acknowledgements or file:///Applications/Safari.app/Contents/Resources/Acknowledgments.html on Mac). Reader seems to be based on this bookmarklet, so it’s probably a good idea to dive into the source code.
The source explains several of the results above. For example, every paragraph containing double line breaks (<br><br>) counts as two paragraphs, which confirms what Rob concluded after his tests. Direct child text nodes and <div>s that don’t have block-level child elements count as paragraphs as well.
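Here’s a rough Python sketch of those two rules, working on raw HTML strings. The regular expressions are my approximation of what the Readability source does (it replaces runs of two or more <br>s with </p><p>, and treats a <div> as a paragraph when its inner HTML contains none of a handful of tags such as <p>, <div>, <table> or <ul>). The function names and the (tag, inner HTML) input format are hypothetical, and only the wrapper’s direct children are inspected, whereas Readability looks at every paragraph in the document.

```python
import re
from typing import Optional

# Two or more <br> tags in a row, optionally separated by whitespace.
# Readability replaces such runs with "</p><p>", splitting a paragraph in two.
DOUBLE_BR = re.compile(r"(?:<br[^>]*>[ \n\r\t]*){2,}", re.IGNORECASE)

# If a <div>'s inner HTML contains one of these tags, the <div> is treated as a
# container rather than as a paragraph (an approximation of Readability's list).
BLOCK_CHILD = re.compile(r"<(?:a|blockquote|dl|div|img|ol|p|pre|table|ul)\b", re.IGNORECASE)


def split_on_double_br(inner_html: str) -> list[str]:
    """Return the non-empty chunks left after splitting on double line breaks."""
    return [part for part in DOUBLE_BR.split(inner_html) if part.strip()]


def count_paragraphs(children: list[tuple[Optional[str], str]]) -> int:
    """Count the 'paragraphs' among a wrapper's direct children.

    Each child is a (tag_name, inner_html) pair; tag_name is None for a text node.
    """
    count = 0
    for tag, inner in children:
        if tag is None or tag.lower() == "p":
            # Direct text nodes and <p> elements count, split on double <br>s.
            count += len(split_on_double_br(inner))
        elif tag.lower() == "div" and not BLOCK_CHILD.search(inner):
            # A <div> containing none of the tags above is treated like a <p>.
            count += len(split_on_double_br(inner))
    return count


if __name__ == "__main__":
    children = [
        ("p", "First line.<br><br>Second line."),        # counts as two
        ("div", "Only <em>inline</em> markup in here."),  # counts as one
        ("div", "<p>A nested paragraph.</p>"),           # a container; not inspected further here
    ]
    print(count_paragraphs(children))  # 3
```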
Readability then loops through all these ‘paragraphs’ and assigns each one a score based on how ‘content-y’ it looks. This score is determined by things like the number of commas, the class names used in the markup, etc. The content’s length appears to be measured using .innerText; for every 100 characters inside a paragraph, that paragraph’s score goes up. Finally, the paragraph scores are added up for the elements that contain them. I think it’s safe to assume Safari Reader is triggered based on this algorithm.
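The scoring can be sketched roughly as follows. The exact numbers are my reading of the Readability source and should be treated as assumptions: paragraphs shorter than 25 characters are skipped; each remaining paragraph gets 1 point, plus 1 point per comma-separated chunk, plus 1 point per 100 characters (capped at 3); that score is added to the paragraph’s parent element, and half of it to the grandparent; the element with the highest total wins. Readability also adjusts scores for tag names, class names, IDs and link density, which this sketch leaves out. Element labels like "div#1" are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paragraph:
    text: str                   # the paragraph's innerText
    parent: str                 # hypothetical label for the parent element
    grandparent: Optional[str]  # label for the grandparent element, if any


def paragraph_score(text: str) -> int:
    """Score one paragraph by how 'content-y' it looks (assumed weights)."""
    score = 1                          # base point for every paragraph
    score += len(text.split(","))      # one point per comma-separated chunk
    score += min(len(text) // 100, 3)  # one point per 100 characters, at most 3
    return score


def candidate_scores(paragraphs: list[Paragraph]) -> dict[str, float]:
    """Add each paragraph's score to its parent, and half of it to its grandparent."""
    totals: dict[str, float] = defaultdict(float)
    for p in paragraphs:
        if len(p.text) < 25:
            continue  # too short to be considered content
        score = paragraph_score(p.text)
        totals[p.parent] += score
        if p.grandparent is not None:
            totals[p.grandparent] += score / 2
    return dict(totals)


def top_candidate(paragraphs: list[Paragraph]) -> Optional[str]:
    """The element with the highest total score, or None if nothing qualified."""
    scores = candidate_scores(paragraphs)
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 4
    flat = [Paragraph(text, "article", "body") for _ in range(4)]
    nested = [Paragraph(text, "div#1", "section#1") for _ in range(3)]
    nested.append(Paragraph(text, "div#2", "section#2"))
    print(top_candidate(flat))    # article
    print(top_candidate(nested))  # div#1
```

Under these assumptions a paragraph only ever contributes to its parent and grandparent. In the nested case, an outer <article> around the sections never receives any points, and the largest inner <div> wins instead, which may explain some of the behavior described in the comments below.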
Conclusion
This definitely needs more investigating, but so far, these appear to be the most important factors for Safari’s Reader functionality to kick in:
- Use the right markup, i.e. make sure the most important content is wrapped inside a container element. Whether you use <article>, <div> or even <span> doesn’t seem to matter, as long as it’s not <p>.
- The content needs to be long enough. Use enough words, use enough paragraphs, use enough punctuation. Every paragraph should have at least 100 characters.
- Reader doesn’t work for local documents.
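If you want to check your own pages against these rules of thumb, they fit in a few lines of Python. This is a hypothetical helper of my own, not anything Safari or Readability exposes; the input format (the wrapper’s tag name, the text of each paragraph, and the page URL) is an assumption. Passing the check doesn’t guarantee the Reader badge, since the real scoring also looks at commas, class names and more.

```python
from urllib.parse import urlparse


def reader_checklist(wrapper_tag: str, paragraphs: list[str], url: str) -> list[str]:
    """Return the rules of thumb from the list above that a page breaks.

    Purely heuristic: an empty result does not guarantee the Reader badge.
    """
    problems = []
    if wrapper_tag.lower() in ("p", "body"):
        problems.append("wrap the main content in an element other than <p> or <body>")
    short = [i for i, text in enumerate(paragraphs) if len(text) < 100]
    if short:
        problems.append(f"paragraphs {short} are shorter than 100 characters")
    if urlparse(url).scheme == "file":
        problems.append("Reader does not work for local documents")
    return problems


if __name__ == "__main__":
    print(reader_checklist("article", ["x" * 150, "y" * 80], "file:///tmp/test.html"))
    # ['paragraphs [1] are shorter than 100 characters', 'Reader does not work for local documents']
```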
Comments
Niels Matthijs wrote on :
One thing you seem to have missed is that the reader fails when it has multiple possible reader sections on one page. The reader works on my blog, unless the article has a somewhat populated comment section. At that time there are two areas competing for the reader’s attention, and it appears to drop the functionality completely in such a case.
Still, interesting stuff, comes very close to what I’d figured out myself (minus the math and points system that is :)).
Mathias wrote on :
Niels: Incorrect. If there are multiple possible content sections on the page, the Reader will always target the one with the highest ‘score’.
Safari Reader indeed doesn’t seem to be working for the latest article on your blog. However, the fact that there are multiple possible content sections has nothing to do with this. I quickly made a test case based on your article without the comments section, and as you can see, Reader still doesn’t work.
Niels Matthijs wrote on :
It seems to dislike all my “work” posts btw. Which is strange, because there are plenty of them that feature more content, more paragraphs, more data.
One thing that’s different between the two is that I use extra wrappers for underlying sections (an extra wrapper div for each section starting with an h2, for example). My personal blog posts almost never feature subsections.
My guess: it does not count all paragraphs nested within a certain div, only the paragraphs on the same level? Would suck though.
Rob Flaherty wrote on :
Great post! Brilliant idea checking out the Acknowledgements and the Readability bookmarklet. It’s interesting that the Readability bookmarklet works on all of the test pages that fail. Actually it works on just about any page.
I’m a little surprised the test case based on Niels’ article fails. Do too many nested divs trip up the algorithm?
George Terezakis wrote on :
Another interesting thing to look into would be the way Reader deduces multi-page articles such as this one.
I reckon that it’s looking for a series of 2 or more consecutive links to pages that use the same base URL as the current one, with an increasing numeral as an argument.
Marc wrote on :
Niels: I’d noticed the same thing; a design I'm working on has rather long articles (each wrapped in an <article>) broken down into sections, each of which is wrapped in a <section> and <div> tag, then headed within that with an <h2> (the <article> and <section> tags are purely future-looking semantic structure, for the heck of it; they serve no useful purpose).
With this layout, depending on the content of the largest of the <div> sections, Reader may or may not even appear, and on pages where it does, it only shows the content of the largest <div>. For example, on a page where the largest chunk contains about 8,600 characters and 22 <p> elements, no Reader is offered at all. On a nearly identical one with a main section of 5,800 characters and 13 <p> elements, Reader is offered, but only displays that, ignoring the other 3,000 characters of main content on the page.
It would have been nice had the feature assumed <article> to be meaningful, and filtered down to, say, all the subheadings and paragraphs within it that look ‘content-y’. Maybe in a future version.
Gino Marckx wrote on :
Check this out: http://www.nytimes.com/2010/06/11/education/11cheat.html
Safari Reader even detects multiple pages. Please keep digging; I'm very interested in how to update my sites.
Scot Hacker wrote on :
Same question as George. I run a site with multi-page tutorials. Reader does kick in for the tutorial pages, but does not detect that they're multi-page (it only renders one page at a time in Reader). What's the secret to Reader detecting that a "Next page" link really does go to the next page of the current article?
Jorix wrote on :
Good post! Could be relevant in the future.
David Smith wrote on :
Very useful article - thanks.
Joachim Van Hove wrote on :
Nice post. Looking forward to some more digging.
Thomas Viktil wrote on :
I just discovered that Reader is enabled when I'm logged in to WordPress' wp-admin. I just wrote a couple of notes on a future article, saved it as a draft and voila! The Reader button appears. I clicked it and the only thing it displayed was "Edit" (since I was editing a post).