PHP: Combined RSS and Atom Feed Reader

This is a long-overdue upgrade to our original scripts for parsing RSS and Atom feeds from websites. While the old versions relied on the PHP XML library for parsing, our new FeedParser PHP class uses SimpleXML to traverse the elements and namespaces of the feed document.

RSS/Atom Feed Reader Demonstration

Use the form below to select a feed and run it through our parser. Note that most feeds are free to access for personal use, but if you want to use them for a commercial site or application you might have to pay the provider.

RSS/Atom Feed Content

When an RSS or Atom feed has been loaded using the form above the contents will appear in this section. If nothing appears, or an error is displayed, there may be a problem with the feed URL or XML formatting.

Only the first five items from each RSS feed are being displayed.

PHP Source Code

Here is the complete source code for \Chirp\FeedParser:

<?PHP
namespace Chirp;

class FeedParser
{

  // Original PHP code by Chirp Internet: www.chirpinternet.eu
  // Please acknowledge use of this code by including this header.

  private $channel = [];
  private $items = [];

  public function __construct($file)
  {
    libxml_use_internal_errors(TRUE);
    $xml = simplexml_load_file($file, NULL, LIBXML_NOCDATA);
    if(FALSE === $xml) {
      // capture the first libxml message and clear the error buffer before throwing
      $errors = libxml_get_errors();
      $message = $errors ? trim($errors[0]->message) : "Unable to load XML";
      libxml_clear_errors();
      throw new \Exception($message);
    }

    $namespaces = $xml->getNameSpaces(TRUE);

    if($xml->channel) {
      $channel = $xml->channel;
    } else {
      $channel = $xml;
    }

    if($xml->channel->item) {
      $items = $xml->channel->item;
    } elseif($xml->item) {
      $items = $xml->item;
    } elseif($xml->entry) {
      $items = $xml->entry;
    }

    foreach($channel->children() as $key => $val) {
      switch($key)
      {
        case "item":
        case "entry":
          // these will be parsed below as items
          continue 2;

        case "link":
          if($val->attributes()) {
            if("self" == $val->attributes()->rel) {
              // link rel="self" identified
              $this->channel[$key] = (string) $val->attributes()->href;
            }
          } else {
            $this->channel[$key] = (string) $val;
          }
          break;

        default:
          if(count($val) > 1) {
            $this->channel[$key] = (array) $val;
          } elseif($val) {
            $this->channel[$key] = (string) $val;
          }

      }
    }

    if(isset($items) && $items) {
      foreach($items as $item) {

        $item_data = [];

        foreach($item->children() as $key => $val) {
          switch($key)
          {
            case "link":
              if($val->attributes()) {
                if("alternate" == $val->attributes()->rel) {
                  // link rel="alternate" identified
                  $item_data[$key] = (string) $val->attributes()->href;
                }
              } else {
                $item_data[$key] = (string) $val;
              }
              break;

            case "enclosure":
              if($val->attributes()) {
                foreach($val->attributes() as $key2 => $val2) {
                  $item_data[$key][$key2] = (string) $val2;
                }
              }
              break;

            default:
              if($val->attributes()['type'] && ("xhtml" == $val->attributes()['type'])) {
                // store XHTML content as a markup string; calling count() on a string fails in PHP 8
                $item_data[$key] = (string) $val->children()->asXML();
              } elseif(count($val) > 1) {
                $item_data[$key] = (array) $val;
              } else {
                $item_data[$key] = (string) $val;
              }

          }
        }

        if($namespaces) {
          foreach($namespaces as $ns => $url) {
            if(!$ns) continue;
            foreach($item->attributes($ns, TRUE) as $key => $val) {
              $item_data["{$ns}:{$key}"] = (string) $val;
            }
            foreach($item->children($url) as $key => $val) {
              if($item->children($url)->{$key}->attributes()) {
                foreach($item->children($url)->{$key}->attributes() as $key2 => $val2) {
                  $item_data["{$ns}:{$key}"][$key2] = (string) $val2;
                }
              }
              if((string) $val) {
                $item_data["{$ns}:{$key}"] = (string) $val;
              }
            }
          }
        } // for each namespace

        $this->items[] = $item_data;

      } // for each item in feed
    }

  }

  private function finesseDate($str)
  {
    if(strtotime($str)) {
      return date('j F Y', strtotime($str));
    }
    return $str;
  }

  // display a single image as HTML
  private function display_image($arr, $caption = NULL)
  {
    $retval = "";
    if(!isset($arr['url']) || !$arr['url']) {
      return $retval;
    }
    if(isset($arr['link']) && $arr['link']) {
      $retval .= "<a href=\"{$arr['link']}\" target=\"_blank\">";
    }
    $retval .= "<img class=\"feature-image\" src=\"{$arr['url']}\"";
    if(isset($arr['width'], $arr['height']) && $arr['width'] && $arr['height']) {
      $retval .= " width=\"{$arr['width']}\" height=\"{$arr['height']}\"";
    }
    $retval .= " alt=\"" . htmlspecialchars($caption ?? $arr['title'] ?? "") . "\">";
    if(isset($arr['link']) && $arr['link']) {
      $retval .= "</a>";
    }
    return $retval;
  }

  // display a single channel as HTML
  public function display_channel()
  {
    $retval = "";
    $data = $this->channel;
    if(!isset($data['title']) || !$data['title']) {
      return $retval;
    }
    $retval .= "<h1>";
    if(isset($data['link']) && $data['link']) {
      $retval .= "<a href=\"" . htmlspecialchars($data['link']) . "\" target=\"_blank\">";
    }
    $retval .= stripslashes($data['title']);
    if(isset($data['link']) && $data['link']) {
      $retval .= "</a>";
    }
    if(isset($data['subtitle']) && $data['subtitle']) {
      $retval .= "<br>\n<small>" . stripslashes($data['subtitle']) . "</small>";
    }
    $retval .= "</h1>\n";
    if(isset($data['image']) && is_array($data['image']) && $data['image']) {
      $retval .= "<p class=\"image\">" . $this->display_image($data['image']) . "</p>\n";
    }
    if(isset($data['description']) && $data['description']) {
      $retval .= "<p>" . stripslashes($data['description']) . "</p>\n\n";
    }
    $tmp = [];
    if(isset($data['updated']) && $data['updated']) {
      $updated = $this->finesseDate($data['updated']);
      $tmp[] = "Updated: {$updated}";
    }
    if(isset($data['copyright']) && $data['copyright']) {
      $tmp[] = "Copyright: {$data['copyright']}";
    }
    if(isset($data['author']) && $data['author']) {
      if(isset($data['author']['name']) && $data['author']['name']) {
        $author_out = $data['author']['name'];
        if(isset($data['author']['uri']) && $data['author']['uri']) {
          $author_out = "<a href=\"{$data['author']['uri']}\">{$author_out}</a>";
        }
        $tmp[] = "Author: {$author_out}";
      }
    }
    if($tmp) {
      $retval .= "<p><small>" . implode("<br>\n", $tmp) . "</small></p>\n\n";
    }
    unset($tmp);
    $retval .= "<div class=\"divider\"><!-- --></div>\n\n";
    return $retval;
  }

  // display a single item as HTML
  private function display_item($idx)
  {
    $retval = "";
    if(!isset($this->items[$idx])) {
      return $retval;
    }
    $item = $this->items[$idx];
    if(!isset($item['link'])) {
      if(isset($item['guid']) && $item['guid']) {
        $item['link'] = $item['guid'];
      } elseif(isset($item["rdf:about"])) {
        $item['link'] = $item["rdf:about"];
      }
    }
    if(!isset($item['updated'])) {
      if(isset($item['pubDate']) && $item['pubDate']) {
        $item['updated'] = $item['pubDate'];
      } elseif(isset($item["dc:date"])) {
        $item['updated'] = $item["dc:date"];
      }
    }
    if(!isset($item['content'])) {
      if(isset($item['content:encoded']) && $item['content:encoded']) {
        $item['content'] = $item['content:encoded'];
      } elseif(isset($item['description']) && $item['description']) {
        $item['content'] = $item['description'];
      }
    }
    $retval .= "<div class=\"title\">\n";
    if(isset($item['media:thumbnail']) && is_array($item['media:thumbnail']) && $item['media:thumbnail']) {
      $retval .= "<div class=\"thumb\">" . $this->display_image($item['media:thumbnail']) . "</div>\n";
    }
    $retval .= "<h3>";
    if(isset($item['link']) && $item['link']) {
      $retval .= "<a href=\"{$item['link']}\" target=\"_blank\">";
    }
    $retval .= stripslashes($item['title']);
    if(isset($item['link']) && $item['link']) {
      $retval .= "</a>";
    }
    $retval .= "</h3>\n";
    if(isset($item['updated']) && $item['updated']) {
      $item['updated'] = $this->finesseDate($item['updated']);
      $retval .= " <span class=\"updated\">{$item['updated']}</span>";
    }
    $retval .= "</div>\n";
    if(isset($item['media:content']) && is_array($item['media:content']) && $item['media:content']) {
      if(!isset($item['media:content']['type']) || ("image" == $item['media:content']['type'])) {
        $retval .= "<p>" . $this->display_image($item['media:content'], $item['media:description'] ?? "") . "</p>\n";
      }
    }
    if(isset($item['enclosure']) && $item['enclosure']) {
      $retval .= "<p class=\"enclosure\"><strong>Media:</strong> <a href=\"{$item['enclosure']['url']}\">";
      $retval .= $item['enclosure']['type'];
      $retval .= "</a>";
      if(isset($item['enclosure']['length'])) {
        $retval .= " (" . number_format($item['enclosure']['length'] / 1024, 1) . " kb)";
      }
      $retval .= "</p>\n\n";
    }
    if(isset($item['content']) && $item['content']) {
      $retval .= "<p>" . stripslashes($item['content']) . "</p>\n\n";
    }
    return $retval;
  }

  // display $num items from the feed
  public function display_items($num = 5)
  {
    $retval = "";
    for($idx=0; $idx < $num; $idx++) {
      $retval .= $this->display_item($idx);
    }
    return $retval;
  }

  public function get_channel()
  {
    return $this->channel;
  }

  public function get_items($num = 5, $offset = 0)
  {
    return array_slice($this->items, $offset, $num);
  }

}

As you can see, most of the heavy lifting is done in the constructor, which parses the file using SimpleXML and populates the private $channel and $items properties.
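To make the constructor's first step easier to follow, here is a minimal standalone sketch of how the channel node and the item list can be located for RSS 2.0 and Atom documents. The feed_outline function and the two sample feeds are hypothetical, written only for this illustration, and it parses a string rather than a file:

```php
<?php
// Sketch only: feed_outline() is a hypothetical helper, not part of \Chirp\FeedParser.
// It returns the feed title and the titles of its items or entries.
function feed_outline(string $xmlString): array
{
    $xml = simplexml_load_string($xmlString, SimpleXMLElement::class, LIBXML_NOCDATA);
    if ($xml === false) {
        throw new RuntimeException("Could not parse XML");
    }

    if (isset($xml->channel)) {
        // RSS 2.0 layout: <rss><channel><item>
        $channel = $xml->channel;
        $items = $xml->channel->item;
    } else {
        // Atom layout: <feed><entry>, where the root doubles as the channel
        $channel = $xml;
        $items = $xml->entry;
    }

    $titles = [];
    foreach ($items as $item) {
        $titles[] = (string) $item->title;
    }

    return ['title' => (string) $channel->title, 'items' => $titles];
}

// made-up sample documents
$rss = '<rss version="2.0"><channel><title>Example RSS</title>'
     . '<item><title>First</title></item><item><title>Second</title></item>'
     . '</channel></rss>';

$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example Atom</title>'
      . '<entry><title>Only entry</title></entry></feed>';

$rssOutline = feed_outline($rss);
$atomOutline = feed_outline($atom);
```

The class above additionally checks for item elements directly under the root, which this sketch leaves out for brevity.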

The various display_* methods then convert the stored array values into HTML which is returned for display. The two public methods used here other than the constructor are display_channel and display_items.
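If the built-in HTML does not suit your layout, the arrays returned by get_items can be rendered with your own markup. The sketch below assumes items shaped like those arrays (with title, link and guid keys); render_item_list and the sample data are hypothetical and only illustrate the idea:

```php
<?php
// Sketch only: render_item_list() is a hypothetical helper that turns
// item arrays (as returned by get_items) into a simple HTML list.
function render_item_list(array $items): string
{
    $out = "<ul>\n";
    foreach ($items as $item) {
        // escape feed text so stray markup cannot break the page
        $title = htmlspecialchars($item['title'] ?? 'Untitled');
        $link = $item['link'] ?? $item['guid'] ?? null;
        if (is_string($link) && $link !== '') {
            $title = '<a href="' . htmlspecialchars($link) . '">' . $title . '</a>';
        }
        $out .= "  <li>{$title}</li>\n";
    }
    return $out . "</ul>\n";
}

// made-up sample data standing in for $parser->get_items(5)
$items = [
    ['title' => 'Fish & chips', 'link' => 'http://example.com/1'],
    ['title' => 'No link here'],
];

echo render_item_list($items);
```

Unlike the display_* methods, this version escapes titles rather than passing them through, which is the safer choice when you do not control the feed.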

The main improvements over the previous code are:

  • no longer reliant on the eval function;
  • a single script to parse RSS, RSS 2.0 and Atom feeds;
  • generic handling of namespace elements and attributes; and
  • graceful handling of XML parse errors.
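The error handling mentioned in the last point can be sketched in isolation. This is a minimal example assuming the XML is already in a string; load_feed_xml is a hypothetical helper, and the exact wording of the message comes from libxml, so it may vary between versions:

```php
<?php
// Sketch only: load_feed_xml() is a hypothetical helper showing how libxml
// errors can be collected and turned into an exception.
function load_feed_xml(string $xmlString): SimpleXMLElement
{
    // collect parse errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);

    if ($xml === false) {
        $errors = libxml_get_errors();
        $message = $errors ? trim($errors[0]->message) : "Unknown XML error";
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        throw new RuntimeException($message);
    }

    libxml_use_internal_errors($previous);
    return $xml;
}

try {
    // the <title> element is never closed, so parsing should fail
    load_feed_xml('<rss><channel><title>Broken</channel></rss>');
    $result = "parsed";
} catch (RuntimeException $e) {
    $result = "XML parse error: " . $e->getMessage();
}

echo $result, "\n";
```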

Sample Usage

Assuming you have an RSS or Atom XML file stored locally, you can parse and display the contents as HTML as follows:

<?PHP
try {
  $parser = new \Chirp\FeedParser("/path/to/xmlfile.xml");
  echo $parser->display_channel();
  echo $parser->display_items(5);
} catch(\Exception $e) {
  die("XML parse error: " . $e->getMessage());
}
?>

Some additional coding will be necessary if you have to first fetch and cache a remote file before parsing.

Depending on your PHP settings you may be able to just supply a URL to be fetched, but often this functionality has been disabled for security reasons.

If that is the case you will need something like our http_get_contents function.
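As a rough illustration of the fetch-and-cache step, the sketch below stores each feed in a local file and only refetches it once the copy is older than a chosen age. It does not use our http_get_contents function; cached_fetch, its parameters and the cURL closure are all hypothetical names, and the cache directory is assumed to exist and be writable:

```php
<?php
// Sketch only: cached_fetch() is a hypothetical helper. $fetch is any callable
// that takes a URL and returns the response body as a string (or false).
function cached_fetch(string $url, string $cacheDir, int $maxAge, callable $fetch): string
{
    $cacheFile = rtrim($cacheDir, '/') . '/' . sha1($url) . '.xml';
    clearstatcache(true, $cacheFile);

    // reuse the cached copy while it is still fresh
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return $cacheFile;
    }

    $body = $fetch($url);
    if (!is_string($body) || $body === '') {
        if (is_file($cacheFile)) {
            return $cacheFile; // fall back to the stale copy
        }
        throw new RuntimeException("Could not fetch {$url}");
    }

    if (file_put_contents($cacheFile, $body, LOCK_EX) === false) {
        throw new RuntimeException("Could not write {$cacheFile}");
    }
    return $cacheFile;
}

// One possible fetcher, assuming the cURL extension is installed.
$curlFetch = function (string $url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    return curl_exec($ch);
};

// Usage (the URL and cache path are placeholders):
// $file = cached_fetch("https://example.com/feed.xml", "/path/to/cache", 3600, $curlFetch);
// $parser = new \Chirp\FeedParser($file);
```

Passing the fetcher in as a callable keeps the caching logic independent of how the HTTP request is made, so the same function works with cURL, streams or a test stub.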

Namespaces

One of the most painful aspects of XML is dealing with namespaces. In the case of RSS feeds you will find all kinds of prefixes embedded in the XML.

Our feed reader currently recognises a few tags in the rdf (RDF/XML), dc (Dublin Core) and media (Yahoo!) namespaces. Other elements and attributes are loaded by the parser, just not used by the display functions.
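To show what the parser does with prefixed elements, here is a small standalone sketch that collects namespaced children of a single item into prefix:name keys, in the same spirit as the constructor. The sample feed is made up for this example and the code is a simplification rather than an extract from the class:

```php
<?php
// Sketch only: a made-up RSS 2.0 item with Dublin Core and Media RSS elements.
$xmlString = <<<XML
<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example</title>
    <item>
      <title>First post</title>
      <dc:creator>Jane Doe</dc:creator>
      <media:thumbnail url="http://example.com/thumb.jpg" width="75" height="50"/>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString);

// prefix => URI for every namespace used in the document
$namespaces = $xml->getNamespaces(true);

$item = $xml->channel->item[0];
$data = [];

foreach ($namespaces as $prefix => $uri) {
    if ($prefix === '') {
        continue; // skip the default namespace
    }
    // children($uri) returns only the elements belonging to that namespace
    foreach ($item->children($uri) as $name => $node) {
        $attrs = [];
        foreach ($node->attributes() as $attrName => $attrValue) {
            $attrs[$attrName] = (string) $attrValue;
        }
        $text = trim((string) $node);
        // keep the text if there is any, otherwise the attributes
        $data["{$prefix}:{$name}"] = ($text !== '') ? $text : $attrs;
    }
}

print_r($data);
```

Elements with text, such as dc:creator, end up as strings, while empty elements such as media:thumbnail end up as arrays of their attributes, which is the shape the display functions check for.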

You can find some handy resources under References below.

References
