PHP: RSS Feed Reader: Source Code

This page presents a simple class with a constructor and two public functions: getOutput returns an HTML-formatted version of the RSS feed, while getRawOutput returns all the attributes in a single multi-level array.

<?PHP
include "class.myrssparser.php";
# where is the feed located?
$url = "http://www.example.net/rss.xml";
# create object to hold data and display output
$rss_parser = new myRSSParser($url);
$output = $rss_parser->getOutput(); # returns string containing HTML
echo $output;
?>
Yes, it really can be that simple.

Source code of class.myrssparser.php

This class is by no means the be-all and end-all of RSS parsing. It's designed to be simple, functional and easily customisable. It appears to work for all RSS formats, and can be extended to handle new formats, or even further to handle general XML parsing.

File: class.myrssparser.php

<?PHP
# Original PHP code by Chirp Internet: www.chirp.com.au
# Please acknowledge use of this code by including this header.
class myRSSParser
{
# keeps track of current and preceding elements
var $tags = array();
# array containing all feed data
var $output = array();
# return value for display functions
var $retval = "";
# constructor for new object
# (PHP 8 no longer treats a method named after its class as the constructor)
function __construct($file)
{
# instantiate xml-parser and assign this object's methods as event handlers
$xml_parser = xml_parser_create("");
xml_set_element_handler($xml_parser, array($this, "startElement"), array($this, "endElement"));
xml_set_character_data_handler($xml_parser, array($this, "parseData"));
# open file for reading and send data to xml-parser
$fp = @fopen($file, "r") or die("myRSSParser: Could not open $file for input");
while($data = fread($fp, 4096)) {
xml_parse($xml_parser, $data, feof($fp)) or die(
sprintf("myRSSParser: Error <b>%s</b> at line <b>%d</b><br>",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser))
);
}
fclose($fp);
# dismiss xml parser
xml_parser_free($xml_parser);
}
function startElement($parser, $tagname, $attrs=array())
{
# RSS 2.0 - ENCLOSURE
if($tagname == "ENCLOSURE" && $attrs) {
$this->startElement($parser, "ENCLOSURE");
foreach($attrs as $attr => $attrval) {
$this->startElement($parser, $attr);
$this->parseData($parser, $attrval);
$this->endElement($parser, $attr);
}
$this->endElement($parser, "ENCLOSURE");
}
# check if this element can contain others - list may be edited
if(preg_match("/^(RDF|RSS|CHANNEL|IMAGE|ITEM)/", $tagname)) {
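if($this->tags) {
$depth = count($this->tags);
# each() was removed in PHP 8: read the parent's tag name with key() instead
$last = end($this->tags);
$parent = is_array($last) ? key($last) : null;
if($parent) {
if(!isset($this->tags[$depth-1][$parent][$tagname])) $this->tags[$depth-1][$parent][$tagname] = 0;
$this->tags[$depth-1][$parent][$tagname]++;
}
}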
array_push($this->tags, array($tagname => array()));
} else {
if(!preg_match("/^(A|B|I)$/", $tagname)) {
# add tag to tags array
array_push($this->tags, $tagname);
}
}
}
function endElement($parser, $tagname)
{
if(!preg_match("/^(A|B|I)$/", $tagname)) {
# remove tag from tags array
array_pop($this->tags);
}
}
function parseData($parser, $data)
{
# return if data contains no text
if(!trim($data)) return;
$evalcode = "\$this->output";
foreach($this->tags as $tag) {
if(is_array($tag)) {
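# each() was removed in PHP 8: take the key and value of the one-element array
$tagname = key($tag);
$indexes = current($tag);
$evalcode .= "[\"$tagname\"]";
# counters extracted from the parent level supply the numeric index
if(!empty(${$tagname})) $evalcode .= "[" . (${$tagname} - 1) . "]";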
if($indexes) extract($indexes);
} else {
if(preg_match("/^([A-Z]+):([A-Z]+)$/", $tag, $matches)) {
$evalcode .= "[\"$matches[1]\"][\"$matches[2]\"]";
} else {
$evalcode .= "[\"$tag\"]";
}
}
}
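# append the text, creating the element first to avoid "undefined index" notices
eval("if(!isset($evalcode)) $evalcode = ''; $evalcode .= '" . addslashes($data) . "';");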
}
# display a single channel as HTML
function display_channel($data, $limit)
{
extract($data);
if($IMAGE) {
# display channel image(s)
foreach($IMAGE as $image) $this->display_image($image);
}
if($TITLE) {
# display channel information
$this->retval .= "<h1>";
if($LINK) $this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
$this->retval .= stripslashes($TITLE);
if($LINK) $this->retval .= "</a>";
$this->retval .= "</h1>\n";
if($DESCRIPTION) $this->retval .= "<p>$DESCRIPTION</p>\n\n";
$tmp = array();
if($PUBDATE) $tmp[] = "<small>Published: $PUBDATE</small>";
if($COPYRIGHT) $tmp[] = "<small>Copyright: $COPYRIGHT</small>";
if($tmp) $this->retval .= "<p>" . implode("<br>\n", $tmp) . "</p>\n\n";
$this->retval .= "<div class=\"divider\"><!-- --></div>\n\n";
}
if($ITEM) {
# display channel item(s)
foreach($ITEM as $item) {
$this->display_item($item, "CHANNEL");
if(is_int($limit) && --$limit <= 0) break;
}
}
}
# display a single image as HTML
function display_image($data, $parent="")
{
extract($data);
if(!$URL) return;
$this->retval .= "<p>";
if($LINK) $this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
$this->retval .= "<img src=\"$URL\"";
if($WIDTH && $HEIGHT) $this->retval .= " width=\"$WIDTH\" height=\"$HEIGHT\"";
$this->retval .= " border=\"0\" alt=\"$TITLE\">";
if($LINK) $this->retval .= "</a>";
$this->retval .= "</p>\n\n";
}
# display a single item as HTML
function display_item($data, $parent)
{
extract($data);
if(!$TITLE) return;
$this->retval .= "<p><b>";
if($LINK) $this->retval .= "<a href=\"$LINK\" target=\"_blank\">";
$this->retval .= stripslashes($TITLE);
if($LINK) $this->retval .= "</a>";
$this->retval .= "</b>";
if(!$PUBDATE && $DC["DATE"]) $PUBDATE = $DC["DATE"];
if($PUBDATE) $this->retval .= " <small>($PUBDATE)</small>";
$this->retval .= "</p>\n";
# use feed-formatted HTML if provided
if($CONTENT["ENCODED"]) {
$this->retval .= "<p>" . stripslashes($CONTENT["ENCODED"]) . "</p>\n";
} elseif($DESCRIPTION) {
$this->retval .= "<p>" . stripslashes($DESCRIPTION) . "</p>\n\n";
}
# RSS 2.0 - ENCLOSURE
if($ENCLOSURE) {
$this->retval .= "<p><small><b>Media:</b> <a href=\"{$ENCLOSURE['URL']}\">";
$this->retval .= $ENCLOSURE['TYPE'];
$this->retval .= "</a> ({$ENCLOSURE['LENGTH']} bytes)</small></p>\n\n";
}
if($COMMENTS) {
$this->retval .= "<p style=\"text-align: right;\"><small>";
$this->retval .= "<a href=\"$COMMENTS\">Comments</a>";
$this->retval .= "</small></p>\n\n";
}
}
function fixEncoding($input, $output_encoding)
{
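# getRawOutput() passes the whole array: convert each value in turn
if(is_array($input)) {
foreach($input as $key => $value) $input[$key] = $this->fixEncoding($value, $output_encoding);
return $input;
}
if(!function_exists('mb_detect_encoding')) return $input;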
$encoding = mb_detect_encoding($input);
switch($encoding) {
case 'ASCII':
case $output_encoding:
return $input;
case '':
return mb_convert_encoding($input, $output_encoding);
default:
return mb_convert_encoding($input, $output_encoding, $encoding);
}
}
# display entire feed as HTML
function getOutput($limit=false, $output_encoding='UTF-8')
{
$this->retval = "";
$start_tag = key($this->output);
switch($start_tag) {
case "RSS":
# new format - channel contains all
foreach($this->output[$start_tag]["CHANNEL"] as $channel) {
$this->display_channel($channel, $limit);
}
break;
case "RDF:RDF":
# old format - channel and items are separate
if(isset($this->output[$start_tag]['IMAGE'])) {
foreach($this->output[$start_tag]['IMAGE'] as $image) {
$this->display_image($image);
}
}
foreach($this->output[$start_tag]['CHANNEL'] as $channel) {
$this->display_channel($channel, $limit);
}
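foreach($this->output[$start_tag]['ITEM'] as $item) {
$this->display_item($item, $start_tag);
# RDF items sit outside the channel, so apply the item limit here as well
if(is_int($limit) && --$limit <= 0) break;
}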
break;
case "HTML":
die("Error: cannot parse HTML document as RSS");
default:
die("Error: unrecognized start tag '$start_tag' in getOutput()");
}
return $this->fixEncoding($this->retval, $output_encoding);
}
# return raw data as array
function getRawOutput($output_encoding='UTF-8')
{
return $this->fixEncoding($this->output, $output_encoding);
}
}
?>
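If you would rather produce your own markup than use getOutput(), you can work from the array that getRawOutput() returns. The sketch below assumes the layout that the parsing code appears to build: upper-cased tag names, with CHANNEL and ITEM entries numbered from 0. The sample array and the list_item_titles() helper are hypothetical illustrations, not part of the class.

```php
<?php
// Hypothetical sample in the shape myRSSParser appears to build for an
// <rss> feed: upper-cased tag names, CHANNEL and ITEM entries numbered from 0.
$raw = [
    'RSS' => [
        'CHANNEL' => [
            [
                'TITLE' => 'Example Channel',
                'LINK'  => 'http://www.example.net/',
                'ITEM'  => [
                    ['TITLE' => 'First post',  'LINK' => 'http://www.example.net/1'],
                    ['TITLE' => 'Second post', 'LINK' => 'http://www.example.net/2'],
                ],
            ],
        ],
    ],
];

// Hypothetical helper: collect item titles from the <rss> layout (e.g. RSS 2.0)
// or the <rdf:RDF> layout (RSS 1.0), stopping after $limit titles if given.
function list_item_titles(array $raw, ?int $limit = null): array
{
    $items = [];
    if (isset($raw['RSS']['CHANNEL'])) {
        // <rss>: items live inside each channel
        foreach ($raw['RSS']['CHANNEL'] as $channel) {
            foreach ($channel['ITEM'] ?? [] as $item) {
                $items[] = $item;
            }
        }
    } elseif (isset($raw['RDF:RDF']['ITEM'])) {
        // <rdf:RDF>: items are siblings of the channel
        $items = $raw['RDF:RDF']['ITEM'];
    }

    $titles = [];
    foreach ($items as $item) {
        if (!isset($item['TITLE'])) {
            continue;
        }
        // parseData() stores double quotes with a leading backslash, hence stripslashes()
        $titles[] = stripslashes($item['TITLE']);
        if ($limit !== null && count($titles) >= $limit) {
            break;
        }
    }
    return $titles;
}

echo implode(', ', list_item_titles($raw)), "\n"; // First post, Second post
```

In a real script you would replace the sample with $raw = $rss_parser->getRawOutput();.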
The parsing of the RSS feed into a PHP array is done by the myRSSParser class using the startElement, endElement and parseData functions. The remaining functions are used only for displaying the data or accessing the raw data.

Fields Supported by Default

This script supports the following attributes (fields) by default but can easily be extended. See the Feed Reader Demonstration for examples of parsed RSS (and Atom) feeds.

Channel (RSS or RDF:RDF): title, link, description, pubDate, copyright, image (url, link, width, height, title), item
Item: title, link, pubDate (or dc:date), content:encoded (or description), enclosure (url, type, length), comments
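The event-driven approach used by startElement, endElement and parseData can also be written without eval(). The sketch below is a standalone illustration, not the class above: a stack records the path of open elements, and a reference into the output array receives each chunk of text. The function names rss_node() and parse_rss_string() are hypothetical. Treating CHANNEL, IMAGE and ITEM as the repeatable (numbered) tags mirrors the class but is an assumption. Unlike the class, it ignores attributes (so ENCLOSURE data is not captured), keeps names such as DC:DATE as single keys, and drops the text of inline tags like <b>.

```php
<?php
// Return a reference to the node at $path inside $root, creating missing slots.
function &rss_node(array &$root, array $path)
{
    $node = &$root;
    foreach ($path as $key) {
        if ($node !== null && !is_array($node)) {
            // The path runs through a text value (e.g. inline markup inside a
            // description): hand back a throwaway slot so that text is dropped.
            $discard = null;
            return $discard;
        }
        $node = &$node[$key];
    }
    return $node;
}

// Parse an RSS string into nested arrays keyed by upper-cased tag names.
// Assumption: $repeatable lists the tags that get a numeric index per occurrence.
function parse_rss_string(string $xml, array $repeatable = ['CHANNEL', 'IMAGE', 'ITEM']): array
{
    $output = [];
    $stack = []; // one entry per open element: the path keys it contributes

    $parser = xml_parser_create(); // case folding is on by default
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$output, &$stack, $repeatable) {
            if (in_array($name, $repeatable, true)) {
                // Open a new numbered entry under the current parent.
                $parent = &rss_node($output, array_merge([], ...$stack));
                $index = is_array($parent[$name] ?? null) ? count($parent[$name]) : 0;
                $parent[$name][$index] = [];
                $stack[] = [$name, $index];
            } else {
                $stack[] = [$name];
            }
        },
        function ($p, $name) use (&$stack) {
            array_pop($stack);
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$output, &$stack) {
            if (trim($data) === '') {
                return; // skip whitespace-only chunks, as parseData() does
            }
            $node = &rss_node($output, array_merge([], ...$stack));
            if (!is_array($node)) {
                $node .= $data; // text can arrive in several chunks
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf('XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
    return $output;
}

$feed = '<rss version="2.0"><channel><title>Example</title>'
      . '<item><title>One</title></item>'
      . '<item><title>Two &amp; more</title></item></channel></rss>';
$parsed = parse_rss_string($feed);
echo $parsed['RSS']['CHANNEL'][0]['ITEM'][1]['TITLE'], "\n"; // Two & more
```

Using references instead of eval() also means the stored text needs no addslashes()/stripslashes() round trip.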
If you think it's worth adding support for other RSS attributes, please let us know using the Feedback link below.

Multibyte String Function support

The fixEncoding function relies on the Multibyte String functions. Its function_exists() check should already skip the conversion when that extension is missing, but if your PHP install doesn't include Multibyte String Function support and you still see errors, you can get around them by jettisoning the fixEncoding function. In other words, replacing:

return $this->fixEncoding($this->retval, $output_encoding);

with just:

return $this->retval;

If you delete the function itself, also change return $this->fixEncoding($this->output, $output_encoding); in getRawOutput to return $this->output;

The feed will then be displayed using its original character encoding, which may or may not match the encoding of your HTML page, but other than that shouldn't be a problem.
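An alternative to removing the encoding step is to keep one that degrades gracefully. The standalone sketch below (fix_encoding() is a hypothetical name, not the class method) converts a single string or every string inside a nested array, and returns its input untouched when the mbstring extension is not available. Restricting detection to ASCII, UTF-8 and ISO-8859-1 in strict mode is an assumption; feeds in other encodings would need their own entries in that list.

```php
<?php
// Convert a string, or every string inside a nested array, to $output_encoding.
// Without the mbstring extension the input is returned unchanged.
// Assumption: candidate source encodings are ASCII, UTF-8 and ISO-8859-1.
function fix_encoding($input, string $output_encoding = 'UTF-8')
{
    if (is_array($input)) {
        foreach ($input as $key => $value) {
            $input[$key] = fix_encoding($value, $output_encoding);
        }
        return $input;
    }
    if (!is_string($input) || !function_exists('mb_detect_encoding')) {
        return $input;
    }
    // Strict detection: only report an encoding the bytes are actually valid in.
    $encoding = mb_detect_encoding($input, ['ASCII', 'UTF-8', 'ISO-8859-1'], true);
    if ($encoding === false || $encoding === 'ASCII' || $encoding === $output_encoding) {
        return $input;
    }
    return mb_convert_encoding($input, $output_encoding, $encoding);
}

// "\xE9" is e-acute in ISO-8859-1; it is not valid UTF-8 on its own.
$raw = ['TITLE' => "Caf\xE9 news", 'ITEM' => [['TITLE' => 'plain ASCII']]];
$fixed = fix_encoding($raw);
```

With mbstring installed, $fixed['TITLE'] holds the UTF-8 form of the title while the ASCII item title passes through unchanged.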
Feedback and Questions

14 January 2006: Akash Takyar says: Excellent thats what I was looking for. Thanks

13 November 2006: jf says: your source code is very badly formatted when i try to cut and paste it into my php editor - carriage returns are missing, so i have to manually edit the code into order to make it readable.

I suggest you try using a different browser when copying, or check whether your editor supports UNIX/Mac<->Windows line break conversion, but stay tuned as well for a download option.

11 December 2006: Abhijith Babu says: Wonderfull, great job...

2 September 2007: Roozbeh says: Thank You! This awesome; just what I needed.

16 September 2007: Ben says: I changed the 'fopen or die' do be 'fopen; if fp;' because I found that when the feed I was including timed out, my entire front page was die()ing.

What I normally do is have two scripts: script #1 regularly downloads the RSS feed and caches the content, and script #2 displays the feed using the cached file. That way, if the feed source becomes unavailable, your page doesn't die().

6 December 2007: Brent says: I get the following:

That means that Multibyte String Function support hasn't been included in your PHP install. If you remove the fixEncoding function and calls from the PHP script then you can avoid those problems, but you then have to accept the original encoding of the RSS feed.

22 February 2008: Shailesh Gajjar says: I am using the RSS Class but i am getting the problem when i use this RSS URL - www.example.net/atom.xml

There are two types of feed, RSS and Atom, and we have different classes for each of them. It looks like your feed is in Atom format, so you should be using the Atom Feed Reader.

15 September 2008: eviriyanti (-) says: Thanks for this article, its really help me.

19 March 2009: joe w says: this is an excellent tutorial. i searched high and low for an rss tutorial and this one is miles ahead of the others. thank you very much for it. i would like to ask, how do you limit the results per page?

Hi Joe, you just need to pass the number of items you want to display as the first argument to the getOutput() function, for example getOutput(5) to show five items.

27 March 2009: Keith Chadwick (glaslyn iT) says: I have no display whatso ever!!!!

Hi Keith, it sounds like your webserver is denying access to the request from PHP. That can happen, for example, if you have a firewall or filtering rules (mod_rewrite) that deny access when there is no HTTP_USER_AGENT. Check your server logs for a 403 error.

10 June 2009: Ben says: Using blogger's atom.xml, the > and < and some / used in <br /> are not being parsed out, and are appearing in the html. Any ideas?

If you send me the feed URL I can check it out.

20 July 2009: Esteban (Takeoff Media) says: First of all, your RSS Feed Reader class is great. Thanks for sharing it.

A few people have asked about this. I suggest something like the following:

21 September 2009: Jeff Quiros says: Your class.myrssparser.php has been extremely helpful to me in understanding creating/displaying RSS feeds, but in the code as copied onto my server. I get a huge string of error messages. The first few are as follows:

The errors you're seeing are really "Notices" saying that a variable (array index) is being referenced without previously being created/initialised. You can suppress these messages by setting your error_reporting level in PHP to "E_ALL ^ E_NOTICE" so it displays only actual errors and warnings and not notices.
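Ben's two-script approach, combined with the User-Agent point from Keith's question, might be sketched as follows. This is a hypothetical helper (cached_feed_path() is not part of the class): it refreshes a local copy of the feed only when that copy is older than $max_age seconds, sends a User-Agent header with the request, and keeps serving the previous copy instead of calling die() when the remote fetch fails. The User-Agent string and the 10-second timeout are placeholder assumptions.

```php
<?php
// Fetch $url into $cache_file at most once every $max_age seconds.
// If the fetch fails, keep serving the previous copy rather than die()ing.
// Returns the cache path, or null when there is nothing to show yet.
function cached_feed_path(string $url, string $cache_file, int $max_age = 3600): ?string
{
    clearstatcache(true, $cache_file); // avoid stale filemtime() in long-running scripts
    $is_fresh = is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age;
    if (!$is_fresh) {
        // Send a User-Agent header: some servers refuse requests without one.
        // The UA string and timeout below are placeholder assumptions.
        $context = stream_context_create([
            'http' => ['user_agent' => 'Example RSS cache/1.0', 'timeout' => 10],
        ]);
        $data = @file_get_contents($url, false, $context);
        if ($data !== false && $data !== '') {
            // Write to a temporary file, then rename, so readers never see half a feed.
            $tmp = $cache_file . '.tmp';
            if (file_put_contents($tmp, $data) !== false) {
                rename($tmp, $cache_file);
            }
        }
    }
    return is_file($cache_file) ? $cache_file : null;
}
```

The display script can then pass the returned path to new myRSSParser($path), since the constructor simply opens whatever file name it is given with fopen().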
© Copyright 2010 Chirp Internet
- Page Last Modified: 22 November 2009