Math Jazz — Mathias Bynens’s shizzle, y’all




XHTML content negotiation through PHP

The most obvious of the many perils of using XHTML properly is, of course, the fact that IE cannot handle the application/xhtml+xml MIME type. Unless you don’t mind that your site can’t be visited through that wanna-be–browser, this means that you’ll have to serve up your XHTML pages as application/xhtml+xml where possible, but as text/html to the crappier ones. Stuff like this is called content negotiation.

Through mod_rewrite

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_URI} \.xhtml$
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule .* - [T=application/xhtml+xml]

Indeed, XHTML content can be negotiated through some spiffy rewrite rules in your .htaccess, but because of the complexity of this matter, it’s probably a better idea to use PHP instead if you want top-notch, accurate results.
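
To see where the rules fall short, here is a minimal PHP sketch of just the two Accept conditions. The function name mj_rewrite_style_check is made up for illustration, and the REQUEST_URI and HTTP/1.1 conditions are left out. Since the pattern q=0 also matches the start of q=0.9, any q value below 1 is treated like q=0, and the q value of text/html is never looked at.

```php
<?php
// Hypothetical illustration, not part of the original rules: the two
// HTTP_ACCEPT conditions from the rewrite rules, expressed in PHP.
function mj_rewrite_style_check($accept) {
 // Condition 1: the header mentions application/xhtml+xml at all.
 $mentions_xhtml = preg_match('/application\/xhtml\+xml/', $accept) === 1;
 // Condition 2: it is not followed by a q value starting with "q=0".
 // Note that "q=0.9" starts with "q=0" too, so it is rejected as well.
 $not_q_zero = preg_match('/application\/xhtml\+xml\s*;\s*q=0/', $accept) === 0;
 return $mentions_xhtml && $not_q_zero;
}
```

For example, a client sending application/xhtml+xml;q=0.9,text/html;q=0.5 clearly prefers XHTML, yet this check returns false for it.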

Through PHP

Until recently, I was using Simon Jessey’s well-known “serving XHTML with the correct MIME type” script. There were a couple of things about it that really bugged me:

  1. The script changes the MIME type to text/html when application/xhtml+xml isn’t supported, which is of course fine. But it will also alter the contents of the file you’re serving, making it so that you’re pretty much serving real HTML… What’s the point in that? If you want to use HTML, don’t use XHTML, and vice versa. Whatever it is you’re serving to IE, it will be handled as tag soup anyway. So why even bother removing those sexy />? (Besides, there’s more XML-only code than just /> — think of xml:lang, for example.)
  2. Though it flawlessly analyzes the q value for browsers, the script serves XHTML pages as text/html to validators, which might make it look as if you’re serving your XHTML as text/html all the time. And hey, if it’s a validator, it can “handle” application/xhtml+xml.
  3. I personally don’t really like XML prologs. (Don’t ask.) And the alternating DOCTYPE totally swallows as well, for the same reasons I mentioned above (IE doesn’t even have a friggin’ standards mode).

So, I started hacking away at the code, and eventually I came up with my custom PHP XHTML content negotiation script. I added “support” for several validators, which are recognized by the User-Agent strings checked at the top of the script.

Anyway, here’s the script:

<?php
// Default to text/html; switch to XHTML only when the client can take it.
$mime = 'text/html';
// Either header may be missing from a request, so fall back to an empty string.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
// Validators and checkers that can handle application/xhtml+xml.
if(strstr($ua, 'W3C_Validator') ||
strstr($ua, 'WDG_SiteValidator') ||
strstr($ua, 'W3C-checklink') ||
strstr($ua, 'Web-Sniffer') ||
strstr($ua, 'FeedValidator') ||
strstr($ua, 'Poodle predictor') ||
strstr($ua, 'Leknor.com gzip tester')) {
 $mime = 'application/xhtml+xml';
} else {
 if(stristr($accept, 'application/xhtml+xml')) {
  // Longer alternatives come first so that "0.9" isn't cut short to "0".
  $q = '\s*;\s*q=(0\.\d{1,3}|1\.0{1,3}|[01])';
  if(preg_match('/application\/xhtml\+xml' . $q . '/i', $accept, $matches)) {
   $xhtml_q = $matches[1];
   if(preg_match('/text\/html' . $q . '/i', $accept, $matches)) {
    $html_q = $matches[1];
    // Prefer XHTML when the client ranks it at least as high as HTML.
    if((float)$xhtml_q >= (float)$html_q) {
     $mime = 'application/xhtml+xml';
    }
   }
  } else {
   // No explicit q value for XHTML means q=1.
   $mime = 'application/xhtml+xml';
  }
 }
}
header('Content-Type: ' . $mime . ';charset=utf-8');
// Tell caches that the response depends on the Accept header.
header('Vary: Accept');
?>

As you can see, this is far more accurate than the seven-line mod_rewrite solution.
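
If you want to go one step further, the Accept header can be split into its media ranges instead of being matched with one regular expression per type. What follows is only a minimal sketch of that idea, not a replacement for the script above: the names mj_parse_accept and mj_pick_mime are made up for illustration, wildcards such as */* are deliberately ignored, and a type that isn’t listed at all is treated as q=0.

```php
<?php
// Hypothetical helper: turn an Accept header into an array of
// media type => q value. A type without a q parameter gets q=1.
function mj_parse_accept($accept) {
 $types = array();
 foreach(explode(',', $accept) as $range) {
  $parts = explode(';', $range);
  $type = strtolower(trim(array_shift($parts)));
  if($type === '') {
   continue;
  }
  $q = 1.0;
  foreach($parts as $param) {
   if(preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
    $q = (float)$m[1];
   }
  }
  $types[$type] = $q;
 }
 return $types;
}

// Hypothetical helper: choose XHTML only when it is acceptable (q > 0)
// and ranked at least as high as text/html.
function mj_pick_mime($accept) {
 $types = mj_parse_accept($accept);
 $xhtml_q = isset($types['application/xhtml+xml']) ? $types['application/xhtml+xml'] : 0.0;
 $html_q = isset($types['text/html']) ? $types['text/html'] : 0.0;
 if($xhtml_q > 0 && $xhtml_q >= $html_q) {
  return 'application/xhtml+xml';
 }
 return 'text/html';
}
```

Calling mj_pick_mime($_SERVER['HTTP_ACCEPT']) would then give a value to use for $mime. The validator check on the User-Agent string would still have to be done separately.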

Hey, how about AdSense?

Good question. As we all know, AdSense won’t work with correctly served XHTML: its JavaScript relies on document.write, which isn’t available in documents parsed as XML. Again, instead of plainly using Simon Jessey’s workaround, I wrote another function that takes one argument (the $mime that was generated by running the previous script), and then outputs the matching code.

<?php
function mj_adsense($mime) {
 if($mime == 'application/xhtml+xml') {
  echo '<object data="/include/adsense.php" type="text/html"></object>';
 } else {
  echo '<script type="text/javascript"><!--
google_ad_client = "pub-7821233126901128";
google_ad_width = 468;
google_ad_height = 60;
google_ad_format = "468x60_as";
google_ad_channel ="";
google_ad_type = "text";
google_color_border = "B4D0DC";
google_color_bg = "ECF8FF";
google_color_link = "0000CC";
google_color_url = "008000";
google_color_text = "6F6F6F";
//--></script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>';
 }
}
?>

So basically, when serving as application/xhtml+xml, an object will be included in the document (in this case: /include/adsense.php, see below for source code). Otherwise, the standard AdSense JavaScript is used.
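
One thing to watch with the object approach: without explicit dimensions, browsers typically fall back to a default size for the object, which probably won’t match the 468×60 ad unit. A minimal sketch of a helper that builds the object markup with a width and height could look like this (the name mj_adsense_object and its parameters are made up for illustration; the function above doesn’t set any dimensions):

```php
<?php
// Hypothetical helper, not from the original script: build the <object>
// fallback with explicit dimensions matching the ad unit.
function mj_adsense_object($src, $width = 468, $height = 60) {
 return sprintf(
  '<object data="%s" type="text/html" width="%d" height="%d"></object>',
  htmlspecialchars($src, ENT_QUOTES, 'UTF-8'), // escape the URL for use in an attribute
  $width,
  $height
 );
}
```

Inside mj_adsense(), the first echo could then become echo mj_adsense_object('/include/adsense.php');.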

My /include/adsense.php looks like this:

<?php header('Content-Type: text/html;charset=utf-8'); ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<title>AdSense | Math Jazz</title>
<style type="text/css">
body { margin: 0; padding: 0; overflow: hidden; }
</style>
</head>
<body>
<script type="text/javascript"><!--
google_ad_client = "pub-7821233126901128";
google_ad_width = 468;
google_ad_height = 60;
google_ad_format = "468x60_as";
google_ad_channel ="";
google_ad_type = "text";
google_color_border = "B4D0DC";
google_color_bg = "ECF8FF";
google_color_link = "0000CC";
google_color_url = "008000";
google_color_text = "6F6F6F";
//--></script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>
</body>
</html>

Any suggestions as to validators that could be added? Do you know of a better script? Any ideas on how to improve this script? Please, let me know.

Filed under PHP, XHTML, HTTP · April 16th, 2005

Comments (17)

Listed below are the responses for this entry.

  1. Henrik Lied:

    To serve application/xhtml+xml to conforming UAs, I use a rewritten version of Keystone Websites’ script.

    Instead of using PHP’s $_SERVER['HTTP_ACCEPT'], I use apache_request_headers(), as it is more extensive and gives a more complete printout of the accepted headers.

    Comment posted on April 17th, 2005 @ 9:12 pm
  2. Colin D. Devroe:

    Excellent work! I’ll definitely use some of this in a few near-future projects. Thanks for making it available.

    Comment posted on April 18th, 2005 @ 3:52 am
  3. Mathias:

    “To serve application/xhtml+xml to conforming UAs, I use a rewritten version of Keystone Websites’ script.”

    Yeah, Henrik, me too.

    “Instead of using PHP’s $_SERVER['HTTP_ACCEPT'], I use apache_request_headers(), as it is more extensive and gives a more complete printout of the accepted headers.”

    Thanks for the tip, all this sounds very interesting. I shall look into this. Unfortunately, I can’t access your site at the moment (?), so I can’t view your script’s source.

    Comment posted on April 18th, 2005 @ 12:48 pm
  4. Henrik Lied:

    “Thanks for the tip, all this sounds very interesting. I shall look into this. Unfortunately, I can’t access your site at the moment (?), so I can’t view your script’s source.”

    Yes, I’m really sorry about that. My domain registrar forgot to bill me, so the domain expired, and it won’t be available for a few days. I registered misinterpreted.org for one year, just to have something to play with for the next couple of days, though :)

    misinterpreted.org will be up sometime today, but until then, the file is available on Arve Systad’s server.

    Comment posted on April 19th, 2005 @ 3:19 pm
  5. Oliver:

    Google changes its AdSense JavaScript all the time. For one thing, they don’t like it being modified at all (tracking insufficiency). Soon, modifying Google ads may well be out of the question.

    Comment posted on April 20th, 2005 @ 5:52 pm
  6. logtar:

    Here I come thinking I am going to be able to read without thinking and I encounter this…

    Comment posted on April 20th, 2005 @ 6:38 pm
  7. Dante:

    Very cool! You can also have all .html files parsed as application/xhtml+xml by editing the Windows registry (Start > Run > regedit.exe > HKEY_ something [the first or second folder] > .html). I wouldn’t recommend this, though.

    Comment posted on April 24th, 2005 @ 9:00 pm
  8. Mathias:

    Dante, what the fuck?! It is your responsibility as a web developer to serve up your files with the correct MIME type — there’s no way you can assume all your visitors hack their registry like that. Besides, the .html extension makes it look like an ordinary HTML file, which should always be served as text/html. You should’ve put .xhtml files :P

    Comment posted on April 27th, 2005 @ 1:29 pm
  9. Dante:

    Ouch! Calm down man!

    What my previous comment was saying is that you can force all the browsers you use on XP (or at the very least the default browser) to treat the type as application/xhtml+xml. It only affects YOUR BROWSER, not the visitors’.

    Also, I was using your code but threw away the thought of application/xhtml+xml in disgust after I realised I could no longer use innerHTML. Fuck that, I can’t live without innerHTML; using DOM methods all the time makes scripts run way slower.

    Nice script, though.

    Comment posted on April 27th, 2005 @ 11:21 pm
  10. Anne:

    Dante, you make two mistakes (and I make one, by replying so late): (1) innerHTML does work in XHTML; use a recent nightly of Mozilla and be happy, and (2) mapping .html to application/xhtml+xml will not really make it be parsed as XML, because Internet Explorer does not know that MIME type.

    Comment posted on April 28th, 2005 @ 7:13 am
  11. Dante:

    I was using a recent Firefox. The problem is that other people who aren’t won’t be able to enjoy the extra effects my innerHTML scripts provide.

    Comment posted on April 29th, 2005 @ 12:46 am
  12. Mathias:

    “Ouch! Calm down man!”

    Hey Dante, I’m sorry if I sounded like a beyatch up there or anything, I didn’t intend that at all. I just didn’t quite understand what exactly you were telling me to do in the registry (HKEY_–what?), and the overall funkiness of your solution amazed me.

    “It only affects YOUR BROWSER, not the visitors’.”

    Yeah, well that was my whole point.

    Comment posted on April 30th, 2005 @ 4:45 pm
  13. John:

    Nice work! I definitely hope to use this in the near future. Actually, I’m trying to figure out how to write it up using a class.

    Question though.
    I looked at Simon Jessey’s version in order to compare it to yours. In his, he calls a function which converts “/>” to “>” for docs served as text/html. You opted to remove this function. I don’t know where (that’s why I’m asking) but I thought I read that “/>” can break older browsers? I know that adding a space before the slash stops older browsers from outputting “>” everywhere, but are there other problems older browsers have when fed “/>”, with or without the leading space?

    Hope I made sense. Thanks.

    Comment posted on May 6th, 2005 @ 2:14 pm
  14. Mathias:

    “I don’t know where (that’s why I’m asking) but I thought I read that “/>” can break older browsers?”

    That sounds crappy! I never heard anything like it before, though. (An intensive search session for information on this subject had no result, probably because Google doesn’t like />.) Link, anyone?

    After all, this doesn’t have much to do with XHTML content negotiation — it’s more an issue of XHTML in general.

    Comment posted on May 8th, 2005 @ 11:40 am
  15. John:

    Alright, sorry about that comment. I should have done my research before posting. I did an intense Google session as well, and the only thing I found was from the W3C XHTML 1.0 spec, which is what I already knew…

    “Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />.”

    Comment posted on May 9th, 2005 @ 12:50 am
  16. Frédéric Bouchery:

    LOL!

    “XML parsing error”. This is the message displayed when you try to read this page with Firefox 1.0.3. I was obliged to look for my dusty IE to write this comment.

    Is it a rock solid solution? :-)

    Comment posted on May 9th, 2005 @ 7:55 am
  17. Mathias:

    “Is it a rock solid solution? :-)”

    In fact it is, Frédéric! The XML error message you got was caused by John’s latest comment :)

    Comment posted on May 9th, 2005 @ 12:51 pm

Trackbacks & Pingbacks (4)

Listed below are resources on the web that mention this article.

  1. Nexen.net: Négociation de contenu XHTML:

    XHTML content negotiation
    One of the problems when you move your site to XHTML is that Explorer will not recognize the application/xhtml+xml content type, and you will have to adapt in order to serve it the same text as text/html […]

    Trackback made on May 6th, 2005 @ 1:48 pm
  2. Negociação de conteúdo via PHP para servir documentos XHTML como aplicação XML:

    Content negotiation via PHP to serve XHTML documents as an XML application
    Mathias has published an excellent article on his site showing a way to serve an XHTML document as XML via PHP.

    Trackback made on July 8th, 2005 @ 6:03 am
  3. Effair: Billet: WordPress et les types de contenu:

    WordPress and content types
    As we all know, MSIE does not handle MIME types correctly, in particular application/xhtml+xml, the one used to declare XHTML content. […]

    Pingback made on October 29th, 2005 @ 6:19 pm
  4. Gary Court: Proper HTTP content negotiation in PHP:

    […] Last summer, I was looking into a quick PHP script that does HTTP content negotiation for a small project I was working on. After searching around Google for several hours, I found several scripts that determined if a browser supported proper XHTML mime-types or not, but nothing for proper HTTP content negotiation. So I decided to write one. […]

    Pingback made on January 9th, 2006 @ 8:11 pm