
PHP: Displaying Twitter Search Results in HTML

Here's a new use for our RSS Feed Reader - displaying content returned by the Twitter API. In this article we are simply fetching and presenting the most recent 'tweets' containing a particular string, but there's no reason why the same code can't be used for other RSS content returned by Twitter.

Twitter has finally retired their REST API and searching the timeline now requires authentication. The code presented on this page will no longer work.

These links will give you what you need:

Fetching the feed contents from Twitter

The first step is to define our parameters:

<?PHP
  if(!isset($_GET['q']) || !$q = $_GET['q']) $q = 'twitter';

  $NUMITEMS   = 10;
  $BLOGURL    = "http://search.twitter.com/search.rss?q=" . urlencode($q) . "&rpp=$NUMITEMS";
  $TIMEFORMAT = "j F Y, g:ia";
  $CACHEFILE  = "/tmp/" . md5($BLOGURL);
  $CACHETIME  = 0.5; // hours

You can see from the first line that this script accepts a single $_GET parameter $q defining the search string. If this variable is not set then the default search is for 'twitter'. The other parameters should be self-explanatory.

To do a search on Twitter we just send an HTTP request to their search script including the format (in this case 'rss') and the query string. Other more advanced options (see References below) are also available. For our purposes just the rpp (number of tweets to return per page) parameter is enough. No authentication is necessary as we're only reading public content, but it is rate limited.
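As a quick illustration, this is the request URL that the parameters above produce for a hypothetical search on 'php jobs' (the query value is made up for the example):

<?PHP
  // illustration only: the search request URL for a made-up query
  $q = 'php jobs';
  $NUMITEMS = 10;
  echo "http://search.twitter.com/search.rss?q=" . urlencode($q) . "&rpp=$NUMITEMS";
  // output: http://search.twitter.com/search.rss?q=php+jobs&rpp=10
?>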

For security it's a good idea to expand any 'shortened' urls in the results. That way people know what they're clicking on. We do this using curl and a crafty regular expression that detects most known url-shortening services:

# Original PHP code by Chirp Internet: www.chirpinternet.eu
# Please acknowledge use of this code by including this header.

  function expandLinks(&$input)
  {
    // links matching the following regular expressions will be checked for redirects and expanded
    $domains = array(
      '[a-z0-9]{2,3}\.[a-z]{2}',
      '[a-z]{3,4}url\.com'
    );
    if(preg_match_all("@http://((" . implode("|", $domains) . ")/[-a-z0-9]+)@i", $input, $matches)) {
      $matches = array_unique($matches[1]);
      foreach($matches as $shorturl) {
        $command = "curl --head " . escapeshellarg($shorturl) . " | awk '($1 ~ /^Location/){print $2}'";
        if($expandedurl = exec($command)) {
          $input = str_replace("http://$shorturl", htmlspecialchars($expandedurl), $input);
        }
      }
    }
  }

We could also just try to expand every URL detected, but then you're making a lot of unnecessary HEAD requests for those links that pass through Twitter unshortened.
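To make that concrete, here's a small sketch showing which links the regular expression above would treat as shortened (the example URLs are invented):

<?PHP
  // sketch only - the example URLs below are invented
  $domains = array('[a-z0-9]{2,3}\.[a-z]{2}', '[a-z]{3,4}url\.com');
  $pattern = "@http://((" . implode("|", $domains) . ")/[-a-z0-9]+)@i";

  var_dump((bool) preg_match($pattern, "http://bit.ly/abc123"));      // true - short domain, would be expanded
  var_dump((bool) preg_match($pattern, "http://tinyurl.com/xyz789")); // true - matches the 'url.com' pattern
  var_dump((bool) preg_match($pattern, "http://example.com/page"));   // false - left untouched
?>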

Important: Using the PHP Curl library is also possible and might be necessary if you're using Windows, as the script presented above requires command-line curl access and the awk command. You can find a script that uses PHP Curl under References below.
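If command-line curl isn't available, something along the following lines should work using the PHP Curl extension. This is only a sketch of the idea, not the script referenced under References:

<?PHP
  // rough sketch: expand one shortened url using the PHP Curl extension
  // in place of the command-line curl/awk pipeline shown above
  function expandLinkCurl($shorturl)
  {
    $ch = curl_init("http://$shorturl");
    curl_setopt($ch, CURLOPT_NOBODY, TRUE);         // HEAD request only
    curl_setopt($ch, CURLOPT_HEADER, TRUE);         // include response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return rather than output
    $headers = curl_exec($ch);
    curl_close($ch);

    // pull out the Location: header, if the server sent one
    if($headers && preg_match('/^Location:\s*(\S+)/mi', $headers, $regs)) {
      return $regs[1];
    }
    return FALSE;
  }
?>

The return value can then be substituted back into the feed contents just as the exec() version above does.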

Now we need a way to download and store (cache) the feed contents. We do this using a simple function:

  function updateFeed()
  {
    global $BLOGURL, $CACHEFILE;

    ini_set('user_agent', "TheArtOfWeb (http://{$_SERVER['HTTP_HOST']})");

    if($feed_contents = file_get_contents($BLOGURL)) {
      // expand shortened urls
      expandLinks($feed_contents);

      // write feed contents to cache file
      $fp = fopen($CACHEFILE, 'w');
      fwrite($fp, $feed_contents);
      fclose($fp);
    }
  }

When called, the updateFeed function simply downloads and stores the raw RSS feed contents. Note that before making the API request to Twitter we set the user agent. You don't have to set a user agent, but if you don't, your requests can be rate-limited.
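For completeness, the user agent could equally be passed using a stream context rather than ini_set - just a sketch, not part of the code used on this page:

<?PHP
  // alternative sketch: set the user agent via a stream context
  $context = stream_context_create(array(
    'http' => array(
      'user_agent' => "TheArtOfWeb (http://{$_SERVER['HTTP_HOST']})",
    ),
  ));
  $feed_contents = file_get_contents($BLOGURL, FALSE, $context);
?>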

Displaying the search results in HTML format

In the next section we display a short paragraph introducing the search results. Both the search string and the time the cached file was last modified are displayed:

echo "<p>Read the latest tweets mentioning <b>$q</b> as of "; if(file_exists($CACHEFILE)) { echo date('g:ia', filemtime($CACHEFILE)) . ' local time'; } else { echo 'right now'; } echo ":</p>\n\n";

If no cached file exists for this query the updateFeed function is called to fetch it for the first time.

  // download the feed iff cached version is missing
  if(!file_exists($CACHEFILE)) updateFeed();

Finally we're ready to display the search results on the page. This section is almost identical to that introduced in previous articles, the only difference being that it's been customised specifically for Twitter content:

include "rssparser.php"; $rss_parser = new RSSParser($CACHEFILE); // read feed data from cache file $feeddata = $rss_parser->getRawOutput(); extract($feeddata['RSS']['CHANNEL'][0], EXTR_PREFIX_ALL, 'rss'); // display feed items if($rss_ITEM) { echo "<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\">\n"; foreach($rss_ITEM as $itemdata) { preg_match("/^(.*)@twitter\.com \((.*)\)$/", $itemdata['AUTHOR'], $regs); list($foo, $author, $name) = $regs; echo "<tr>\n"; echo "<td><a title=\"$name\" href=\"http://twitter.com/$author\" target=\"_blank\"><img src=\"{$itemdata['GOOGLE:IMAGE_LINK']}\" width=\"48\" height=\"48\" border=\"0\" alt=\"$author\"></a></td>\n"; echo "<td><p><a href=\"http://twitter.com/$author\" target=\"_blank\">$author</a>: "; echo str_replace('<a ', '<a target="_blank" ', stripslashes($itemdata['DESCRIPTION'])); echo "<br>\n"; echo "<small>"; echo date($TIMEFORMAT, strtotime($itemdata['PUBDATE'])); echo " <a href=\"{$itemdata['GUID']}\" target=\"_blank\">View Tweet</a>"; echo "</small>"; echo "</p></td>\n"; echo "</tr>\n"; } echo "</table>\n\n"; } ?>

The output, as you can probably work out, will be a two-column HTML table displaying the user's Twitter icon, the tweet text and the timestamp. We've also included some links where appropriate.
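Purely as an illustration, a single row of that table comes out roughly as follows (the username, avatar, tweet text and timestamp are all invented):

<!-- illustrative output only: all values are made up -->
<tr>
<td><a title="Example User" href="http://twitter.com/example" target="_blank"><img src="http://example.com/avatar.png" width="48" height="48" border="0" alt="example"></a></td>
<td><p><a href="http://twitter.com/example" target="_blank">example</a>: Reading about <a target="_blank" href="http://www.example.com/">PHP and Twitter</a><br>
<small>12 February 2011, 9:30am <a href="http://twitter.com/example/statuses/1234567890" target="_blank">View Tweet</a></small></p></td>
</tr>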

As a final step the following code should be placed right at the end of the page after all the HTML has been displayed:

<?PHP
  // download the feed iff cached version is too old
  if((time() - filemtime($CACHEFILE)) > 3600 * $CACHETIME) {
    flush();
    updateFeed();
  }
?>

The function of this code and the reason for placing it at the end of the page have been described in a previous article. In short, the cache is refreshed only after the page has been sent to the browser (hence the call to flush()), so visitors never have to wait for the feed to be downloaded.

Sample output of Twitter search results

Here you can see the output of this script more or less exactly as presented. All we've done is add some smilies:

Twitter has finally retired their REST API and searching the timeline now requires authentication. The code presented on this page will no longer work.

This script is experimental, so use it at your own risk. Please send questions and comments via the Feedback link below.

Contents of the Twitter search RSS Feed

Here you can see exactly what kind of data we're receiving back from Twitter in RSS format:
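As a rough sketch based on the fields the display code above reads, a single item in the feed is structured something like this (all values are invented for the example):

<item>
  <title>Reading about PHP and Twitter</title>
  <description>Reading about &lt;a href="http://www.example.com/"&gt;PHP and Twitter&lt;/a&gt;</description>
  <pubDate>Sat, 12 Feb 2011 09:30:00 +0000</pubDate>
  <guid>http://twitter.com/example/statuses/1234567890</guid>
  <author>example@twitter.com (Example User)</author>
  <google:image_link>http://example.com/avatar.png</google:image_link>
  <link>http://twitter.com/example/statuses/1234567890</link>
</item>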


All the code in one place

By popular request, here's a copy of all the above code in one place. It's a direct copy of the code used to display the results on this page:

<?PHP
  if(!isset($_GET['q']) || !$q = $_GET['q']) $q = 'twitter';

  $NUMITEMS   = 10;
  $BLOGURL    = "http://search.twitter.com/search.rss?q=" . urlencode($q) . "&rpp=$NUMITEMS";
  $TIMEFORMAT = "j F Y, g:ia";
  $CACHEFILE  = "/tmp/" . md5($BLOGURL);
  $CACHETIME  = 0.5; // hours

# Original PHP code by Chirp Internet: www.chirpinternet.eu
# Please acknowledge use of this code by including this header.

  function expandLinks(&$input)
  {
    // links matching the following regular expressions will be checked for redirects and expanded
    $domains = array(
      '[a-z0-9]{2,3}\.[a-z]{2}',
      '[a-z]{3,4}(url|ly)\.com'
    );
    if(preg_match_all("@http://((" . implode("|", $domains) . ")/[-a-z0-9]+)@i", $input, $matches)) {
      $matches = array_unique($matches[1]);
      foreach($matches as $shorturl) {
        $command = "curl --head " . escapeshellarg($shorturl) . " | awk '($1 ~ /^Location/){print $2}'";
        if($expandedurl = exec($command)) {
          $input = str_replace("http://$shorturl", htmlspecialchars($expandedurl), $input);
        }
      }
    }
  }

  function updateFeed()
  {
    global $BLOGURL, $CACHEFILE;

    ini_set('user_agent', "TheArtOfWeb (http://{$_SERVER['HTTP_HOST']})");

    if($feed_contents = file_get_contents($BLOGURL)) {
      # expand shortened urls
      expandLinks($feed_contents);

      # write feed contents to cache file
      $fp = fopen($CACHEFILE, 'w');
      fwrite($fp, $feed_contents);
      fclose($fp);
    }
  }

  echo "<p>Read the latest tweets mentioning <b>$q</b> as of ";
  if(file_exists($CACHEFILE)) {
    echo date('g:ia', filemtime($CACHEFILE)) . ' local time';
  } else {
    echo 'right now';
  }
  echo ":</p>\n\n";

  # download the feed iff cached version is missing
  if(!file_exists($CACHEFILE)) updateFeed();

  include "rssparser.php";
  $rss_parser = new RSSParser($CACHEFILE);

  # read feed data from cache file
  $feeddata = $rss_parser->getRawOutput();
  extract($feeddata['RSS']['CHANNEL'][0], EXTR_PREFIX_ALL, 'rss');

  # display feed items
  if($rss_ITEM) {
    echo "<table border=\"0\" cellpadding=\"5\" cellspacing=\"0\">\n";
    foreach($rss_ITEM as $itemdata) {
      preg_match("/^(.*)@twitter\.com \((.*)\)$/", $itemdata['AUTHOR'], $regs);
      list($foo, $author, $name) = $regs;
      echo "<tr>\n";
      echo "<td><a title=\"$name\" href=\"http://twitter.com/$author\" target=\"_blank\"><img src=\"{$itemdata['GOOGLE:IMAGE_LINK']}\" width=\"48\" height=\"48\" border=\"0\" alt=\"$author\"></a></td>\n";
      echo "<td><p style=\"width: 600px; overflow: auto;\"><a href=\"http://twitter.com/$author\" target=\"_blank\">$author</a>: ";
      echo str_replace('<a ', '<a rel="nofollow" target="_blank" ', stripslashes($itemdata['DESCRIPTION']));
      echo "<br>\n";
      echo "<small><a style=\"text-decoration: none; color: inherit;\" href=\"{$itemdata['GUID']}\" target=\"_blank\">";
      echo date($TIMEFORMAT, strtotime($itemdata['PUBDATE']));
      echo "</a></small>";
      echo "</p></td>\n";
      echo "</tr>\n";
    }
    echo "</table>\n\n";
  }
?>


And here's the code again to be included at the end of the page:

<?PHP
  // download the feed iff cached version is too old
  if((time() - filemtime($CACHEFILE)) > 3600 * $CACHETIME) {
    flush();
    updateFeed();
  }
?>

References


User Comments

Post your comment or question

12 February, 2011

Thanks for the code. I'm not too familiar with caching yet, but does this cache only the data from the rss feed? Do the remote images get cached to the server as well, so that they're not always loaded remotely?

Yes, only the data from the rss feed is cached - for $CACHETIME hours. Images (the avatar icons) are still sourced from Twitter. You could cache them locally if you wanted, but they will be always changing, and the largest is only around 10kb, so it's probably not worthwhile.

9 February, 2011

I worked through your tutorial and it worked like a charm. Thank you. Only question I have is that it seems to mash all hyperlinks together, removing the spaces between them. This includes hashtags and bit.ly urls (I took out the expand links function).

It looks like there's a bug in the RSS parser - when two links are separated only by white-space the white-space is being removed. I'll see if it can be patched...

7 December, 2010

Question: I might've missed it but where does $rss_ITEM come from?

The preceding extract command extracts elements from a section of the $feeddata array into normal variables prefixed with rss_.
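For example, with some made-up data, the effect is roughly:

<?PHP
  // sketch with made-up data to show what extract() does here
  $channel = array('TITLE' => 'Example feed', 'ITEM' => array( /* feed items */ ));
  extract($channel, EXTR_PREFIX_ALL, 'rss');
  // $rss_TITLE and $rss_ITEM now exist as ordinary variables
?>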

23 September, 2010

Thanks for your code. It's really great!
Regards from Argentina!

22 June, 2010

I'm trying to implement your code on the website to follow tweets containing the keyword tourdekans. If I copy the parts of the code on the site into one file and load it, it displays an error 500. Is it possible to supply me the php file? Second question: how does that user agent work?

I've placed all the code together now at the end of the article. If you have trouble copying, try not using Firefox. For an explanation of the user agent, you should read this article.
