Math Jazz — Mathias Bynens’s shizzle, y’all



Texturizing and such

I’ve always refused to follow the US English typography style on this site.

That style puts periods and commas inside the closing quotation mark, even when it doesn’t make sense to do so.

I want my site to make sense. So I started looking at WP’s wptexturize() function (© Matt). As you can read here, I’ve been thinking about doing that for a while now.

Like I said, the problem with wptexturize() is that it fails whenever a closing quotation mark is directly followed by a character other than whitespace. Here’s an example.

My favourite tracks are "40 ft", "Cheating on You", "Michael", and, of course, "Take Me Out".

wptexturized, this would become:

My favourite tracks are “40 ft", “Cheating on You", “Michael", and, of course, “Take Me Out".

After modifying the function, it outputs:

My favourite tracks are “40 ft”, “Cheating on You”, “Michael”, and, of course, “Take Me Out”.

Which is typographically (more) correct as far as I know.
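
To see why the comma case slips through, here’s a minimal sketch that isolates the two double-quote rules from the function. The helper names curl_quotes_original() and curl_quotes_modified() are made up for this illustration; only the patterns come from wptexturize(), and the assumption is that the catch-all replacement near the end of the function is what makes the difference.

```php
<?php
// Hypothetical helpers for illustration; only the patterns are taken from wptexturize().
function curl_quotes_original(string $text): string {
    // Opening quote: a straight quote after whitespace or at the start of the string.
    $text = preg_replace('/(\s|\A)"(?!\s)/', '$1&#8220;', $text);
    // Closing quote: only matched when followed by whitespace or the end of the string.
    $text = preg_replace('/"(\s|\Z)/', '&#8221;$1', $text);
    return $text;
}

function curl_quotes_modified(string $text): string {
    $text = curl_quotes_original($text);
    // Catch-all: any straight double quote still left is treated as a closing quote.
    return str_replace('"', '&#8221;', $text);
}

$sample = 'My favourite tracks are "40 ft", "Michael", and "Take Me Out".';
echo curl_quotes_original($sample), "\n";
echo curl_quotes_modified($sample), "\n";
```

The first line printed still has straight closing quotes in front of each comma and the final period, because neither rule matches a quote followed by punctuation. The second line has them curled.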

Actually, I didn’t do that much to accomplish this… Just follow these instructions if you want to do this too.

Here’s how the function should look after applying the modification.

<?php
function wptexturize($text) {
 $output = '';
 // Capture tags and everything inside them
 $textarr = preg_split("/(<.*>)/Us", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
 $stop = count($textarr); $next = true; // loop stuff
 for ($i = 0; $i < $stop; $i++) {
  $curl = $textarr[$i];

  if (isset($curl[0]) && '<' != $curl[0] && $next) { // If it's not a tag
   $curl = str_replace('---', '&#8212;', $curl);
   $curl = str_replace('--', '&#8211;', $curl);
   $curl = str_replace("...", '&#8230;', $curl);
   $curl = str_replace('``', '&#8220;', $curl);

   // This is a hack, look at this more later. It works pretty well though.
   $cockney = array("'tain't", "'twere", "'twas", "'tis", "'twill", "'til", "'bout", "'nuff", "'round");
   $cockneyreplace = array("&#8217;tain&#8217;t", "&#8217;twere", "&#8217;twas", "&#8217;tis", "&#8217;twill", "&#8217;til", "&#8217;bout", "&#8217;nuff", "&#8217;round");
   $curl = str_replace($cockney, $cockneyreplace, $curl);

   $curl = preg_replace("/'s/", '&#8217;s', $curl);
   $curl = preg_replace("/'(\d\d(?:&#8217;|')?s)/", "&#8217;$1", $curl);
   $curl = preg_replace('/(\s|\A|")\'/', '$1&#8216;', $curl);
   $curl = preg_replace('/(\d+)"/', '$1&Prime;', $curl);
   $curl = preg_replace("/(\d+)'/", '$1&prime;', $curl);
   $curl = preg_replace("/(\S)'([^'\s])/", "$1&#8217;$2", $curl);
   $curl = preg_replace('/(\s|\A)"(?!\s)/', '$1&#8220;', $curl);
   $curl = preg_replace('/"(\s|\Z)/', '&#8221;$1', $curl);
   $curl = preg_replace("/'([\s.]|\Z)/", '&#8217;$1', $curl);
   $curl = preg_replace("/\(tm\)/i", '&#8482;', $curl);
   $curl = preg_replace("/\(c\)/i", '&#169;', $curl);
   $curl = preg_replace("/\(r\)/i", '&#174;', $curl);
   $curl = str_replace("''", '&#8221;', $curl);
   
   $curl = preg_replace('/(\d+)x(\d+)/', "$1&#215;$2", $curl);

   // Catch-all: any straight double quote still left becomes a closing quote
   $curl = str_replace('"', '&#8221;', $curl);

  } elseif (strstr($curl, '<code') || strstr($curl, '<pre') || strstr($curl, '<kbd') || strstr($curl, '<style') || strstr($curl, '<script')) {
   // strstr is fast
   $next = false;
  } else {
   $next = true;
  }
  $output .= $curl;
 }
 return $output;
}
?>
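
For a feel of how the function keeps its hands off markup, here is a small sketch of the split step on its own. The pattern and flag are the ones used in the function above; the sample string is made up for illustration.

```php
<?php
// Same split pattern as in wptexturize(); the sample string is invented for this example.
$text  = 'He said "hi" in <code>"raw"</code> mode.';
$parts = preg_split("/(<.*>)/Us", $text, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($parts as $i => $part) {
    // A piece that starts with '<' is a captured tag; everything else is plain text.
    $kind = (isset($part[0]) && $part[0] === '<') ? 'tag ' : 'text';
    echo $i, ' ', $kind, ' ', var_export($part, true), "\n";
}
```

The pieces alternate between text and tags, so the loop can texturize text pieces only. The $next flag then makes it skip the text piece that directly follows a code, pre, kbd, style or script tag.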
Filed under PHP, XHTML, WordPress · June 25th, 2004

Comments (16)

Listed below are the responses for this entry.

  1. Luke:

    Ok, a couple of things… I started twitching when I saw the commas on the wrong side of the quotes. :eyes:

    Next, you’re missing a few commas:

    My favourite tracks are “40 ft,” “Cheating on You,” “Michael,” and, of course, “Take Me Out”.

    In a list, you always separate each item with a comma. (Unless you’re using a specified writing standard that allows for the missing comma, but most modern English writing styles require the final comma.)

    “Of course” is a dependent clause and needs the separating commas.

    Comment posted on June 25th, 2004 @ 7:25 pm
  2. Mathias:

    Thanks for pointing that out, I edited the post.

    The commas aren’t really on the wrong side of the quotes in my opinion (is this an opinion thing? :lol:) — as I mentioned, I just don’t follow the US English typography style (yet?). It really doesn’t make sense to make these commas go inside the end quotation mark, at least not to me. That song isn’t entitled “Cheating on You comma”, right?

    Comment posted on June 26th, 2004 @ 9:59 am
  3. El Bandano:

    I think Mathias is right.

    Comment posted on June 26th, 2004 @ 11:05 am
  4. Lissa:

    Luke, despite the fact that I swallow the U.S. typography style in doing the paper (eheh), I totally agree with Mathias. Those commas go outside the quotation marks, dude. It just makes sense semantically: the comma isn’t a part of what’s being quoted, so it goes outside, unless it’s dialogue.

    Comment posted on June 26th, 2004 @ 4:33 pm
  5. Luke:

    This is what I get for letting my editors (Lissa) know about my blog… They follow me around and make comments about my editing remarks. :P

    This is one of those times when I think it is an opinion thing. I just don’t like the way it looks. The flow of the list is preserved when the punctuation is on the inside of the quotes. It’s a typographical thing; looks overrule semantics.

    Comment posted on June 26th, 2004 @ 5:29 pm
  6. rob:

    Thanks, I’d noticed this problem but never got round to fixing it myself, you saved me a job :)

    Luke: not even that, proper (at least British) semantics dictate that the punctuation should go outside of the quotation marks unless it is contained in what’s being quoted.

    Comment posted on June 27th, 2004 @ 12:25 pm
  7. Luke:

    Right, but we all know the British can’t speak English worth a hoot.

    I mean, really. How modern can you be if you call a flashlight a torch? ;)

    Comment posted on June 27th, 2004 @ 8:33 pm
  8. flump:

    luke, you deserve a verbal beating for that.

    any british words the americans have, you’ll find they have fewer syllables, as that makes them easier to pronounce. you should be able to work out what i’m getting at here.

    Comment posted on June 27th, 2004 @ 9:01 pm
  9. Luke:

    Actually, I seem to be missing your point. Are you saying the British are more lazy than the Americans thus they can’t handle saying flashlight instead of torch? Or that the English language didn’t originate in Britain? My joke was based on the fact that English is from there, but our languages are very different. Lighten up a bit…you’ll have more fun because in the game of life none of us make it out alive. :ta:

    Comment posted on June 27th, 2004 @ 11:34 pm
  10. David:

    Leaving the quotes and comma location aside, my version of WP (1.3-pre-alpha) already has the correct quotes (inclined to the left and to the right, that is). The problem you are describing might only apply to WP 1.2.

    My two cents. Keep the change! :)

    Comment posted on June 28th, 2004 @ 1:55 pm
  11. David:

    Actually, Mathias, you were right and I was wrong. I have done the modifications you recommended and, fair enough, the quotes are fixed. Thanks!

    P.S.: I have approved your comment over .US :)

    Comment posted on June 28th, 2004 @ 8:49 pm
  12. Mathias:

    You got me there for a second 8-) I started thinking I reinvented the wheel.

    Actually, I hope this does get implemented in WP, as I think it’s part of localization: not every country’s typography style is US English–like.

    Comment posted on June 28th, 2004 @ 8:54 pm
  13. Matt:

    Yep that’s a bug that was introduced with the last version of WP. Sorry! The good news is I’ll fix it before 1.3. The bad news is your fix breaks other cases (curling quotes where they shouldn’t be curled).

    Comment posted on August 27th, 2004 @ 11:12 am
  14. Matt:

    BTW, I like what you’ve done with the comment form around here, especially the highlighting of the new comment.

    Comment posted on August 27th, 2004 @ 11:13 am
  15. Mathias:

    I don’t fully understand where quotes should not be curled… :s

    Thanks for the comment form compliments :) In case you’re interested in redesigning it the way I did, you might want to read this post.

    Comment posted on August 27th, 2004 @ 1:22 pm
  16. Matt:

    This has now been fixed in CVS.

    Comment posted on September 8th, 2004 @ 11:08 am

Trackbacks & Pingbacks (1)

Listed below are resources on the web that mention this article.

  1. Roblog: Texturize Fix: Rob Miller’s Blog:

    […] June 2004 Texturize Fix Filed under: Random Stuff — Rob @ 11:38 am Mathibus offers a fix for the quotation “bug” in Texturize. […]

    Pingback made on June 27th, 2004 @ 10:27 am