Wednesday 12 October 2016

Creating an RSS Aggregator with the PHP SimplePie Library

RSS aggregators such as Google Reader provide a great way to quickly peruse the latest updates and other news from websites that you follow regularly. All you need to do is provide the aggregator with each site's RSS feed location, and the aggregator will retrieve and parse the feed, converting it into a format (HTML in the case of Google Reader) that you can easily read.
But what if you want to integrate feeds into your website, or create your own version of an aggregator? Writing custom code capable of efficiently retrieving and parsing the XML that comprises a feed can be a difficult and tedious process, one which has grown increasingly complex with the added support for multimedia content such as podcasts. Thankfully, a number of open source libraries can handle the RSS retrieval and parsing tasks for you. Many of these solutions also offer a number of advanced features such as feed caching in order to reduce bandwidth consumption.
PHP developers are particularly lucky, as a fantastic library named SimplePie not only offers the aforementioned features but also supports both the RSS and Atom formats, handles multiple character encodings, and has an architecture that makes integration with your favorite content management and blogging platforms a breeze. In this tutorial I'll introduce you to SimplePie, showing you how easy it is to create a rudimentary custom RSS aggregator using this powerful library.

Installing SimplePie

SimplePie requires PHP 4.3 or newer, in addition to PHP's PCRE and XML extensions, both of which are enabled by default. Presuming you meet these minimal requirements, browse to SimplePie's GitHub site and download the latest stable version. Unzip the download and place the directory somewhere within your PHP's include path.
To begin using SimplePie, all you need to do is include the simplepie.inc file within your PHP script, a task that is typically done using PHP's require_once statement:

require_once("simplepie.inc");

Provided that you have added the SimplePie directory to PHP's include path, you won't need to reference the path within the require statement.
Finally, create a directory named cache somewhere within your project directory, and change the directory owner to the server daemon owner and the permissions to 755, which will allow the server to write to it. SimplePie will use this directory to cache the RSS feeds.
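
If you would rather verify the cache directory from PHP than by hand, a minimal sketch along these lines may help. The function name ensure_cache_dir() and the example path are assumptions made for illustration; they are not part of SimplePie.

```php
<?php
// Hypothetical helper (not part of SimplePie): make sure the cache
// directory exists and that the current PHP process can write to it.
function ensure_cache_dir($path)
{
    // Create the directory, and any missing parents, with mode 755.
    if (!is_dir($path) && !mkdir($path, 0755, true)) {
        return false;
    }
    // SimplePie can only cache feeds if this process may write here.
    return is_writable($path);
}
```

Calling ensure_cache_dir(__DIR__ . '/cache') early in your script, and stopping with a clear message when it returns false, tends to surface permission problems sooner than a silently uncached feed would.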

Retrieving a Feed

To demonstrate SimplePie's capabilities let's retrieve and publish the WJGilmore.com RSS feed in HTML format. Believe it or not, you can retrieve and parse the feed using just five short statements:

01 $feed = new SimplePie();
02 $feed->set_cache_location('/var/www/dev.spiesindc.com/library/cache/');
03 $feed->set_feed_url('http://feeds.feedburner.com/wjgilmorecom');
04 $feed->init();
05 $feed->handle_content_type();

Line 01 instantiates the SimplePie class, exposing the methods we'll subsequently use to retrieve, parse and render the feed. Line 02 defines the location of the cache directory we created earlier in the tutorial. Line 03 defines the RSS feed we'd like to retrieve. Line 04 retrieves and parses the feed, whether via the cache or by reaching out to the feed's online location. Finally, line 05 sends a Content-Type header that matches the feed's character encoding, so that the browser renders the text correctly.
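
Feeds are not always reachable, so it can be worth checking the result of init(). The sketch below wraps the steps above in a hypothetical load_feed() function (the name and parameters are mine, not SimplePie's). It assumes $feed is a freshly constructed SimplePie object and relies on init() returning false on failure, with error() supplying the message.

```php
<?php
// Hypothetical wrapper around the configuration steps shown above.
// $feed is assumed to be a new, not yet initialized SimplePie object.
function load_feed($feed, $url, $cacheDir, $cacheSeconds = 3600)
{
    $feed->set_cache_location($cacheDir);
    // How long (in seconds) a cached copy is reused before refetching.
    $feed->set_cache_duration($cacheSeconds);
    $feed->set_feed_url($url);

    // init() returns false when the feed could not be fetched or parsed;
    // error() then describes what went wrong.
    if (!$feed->init()) {
        throw new RuntimeException('Feed error: ' . $feed->error());
    }
    return $feed;
}
```

A call might look like load_feed(new SimplePie(), 'http://feeds.feedburner.com/wjgilmorecom', '/var/www/dev.spiesindc.com/library/cache/'), wrapped in a try/catch block that shows a friendly message when the feed is unavailable.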
When the feed has been retrieved and parsed, you can use a number of methods to access the feed data, including the feed title, description, and feed items' title and publication date.

foreach ($feed->get_items() as $item) {
  $permalink = $item->get_permalink();
  $title = $item->get_title();
  // Link each title to its original article, one item per line
  echo "<a href=\"{$permalink}\">{$title}</a><br />\n";
}

Executing this example produces the output presented in Figure 1.
Figure 1. Rendering a Feed's Items to a Web Page
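
The loop above prints only item titles. The feed-level title and description, and each item's publication date, can be read in much the same way. Here is a small sketch, assuming $feed is a SimplePie object that has already been initialized; summarize_feed() is a hypothetical helper name chosen for this example.

```php
<?php
// Hypothetical helper: build a plain-text summary of a parsed feed.
// $feed is assumed to be a SimplePie object on which init() succeeded.
function summarize_feed($feed)
{
    $lines = array();
    // Feed-level metadata
    $lines[] = $feed->get_title() . ': ' . $feed->get_description();
    foreach ($feed->get_items() as $item) {
        // get_date() accepts a PHP date()-style format string.
        $lines[] = $item->get_date('j F Y') . ' - ' . $item->get_title();
    }
    return $lines;
}
```

Printing the result with echo implode("<br />\n", summarize_feed($feed)); would give one line for the feed followed by one dated line per item.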

Retrieving Multiple Feeds

Of course, any useful aggregator needs to retrieve and render multiple feeds at once. SimplePie can not only do this, but can also merge the feeds' entries and order them according to each item's publication date!
To retrieve the feeds, all you need to do is populate an array with the various feed addresses and then pass that array to the set_feed_url() method:

$feeds = array(
  'http://feeds.feedburner.com/wjgilmorecom',
  'http://rss.slashdot.org/Slashdot/slashdot',
  'http://online.wsj.com/xml/rss/3_7455.xml'
);
...
$feed->set_feed_url($feeds);

To prove that the feeds are indeed being sorted according to each item's publication date, I've used the $item->get_feed()->get_title() method to place the feed's title next to each item. The results are presented in Figure 2.
Figure 2. Weaving Multiple RSS Feeds Together
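
For reference, output along the lines of Figure 2 could be produced with a loop such as the one sketched below. The function name render_merged_items(), the markup, and the default limit of ten items are assumptions of mine rather than the exact code behind the figure. $feed is assumed to be an initialized SimplePie object that was given the array of feed addresses.

```php
<?php
// Hypothetical helper: render the merged, date-ordered items as HTML,
// prefixing each entry with the title of the feed it came from.
function render_merged_items($feed, $limit = 10)
{
    $html = '';
    // get_items($start, $length) returns a slice of the merged list.
    foreach ($feed->get_items(0, $limit) as $item) {
        // get_feed() returns the feed object this item belongs to.
        $source = $item->get_feed()->get_title();
        $html .= '<p>[' . $source . '] <a href="' . $item->get_permalink() . '">'
               . $item->get_title() . "</a></p>\n";
    }
    return $html;
}
```

Echoing render_merged_items($feed) after init() would then list the newest entries across all three feeds, each labeled with its source.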

Other SimplePie Features

Parsing feeds is only one of SimplePie's many capabilities. Other features include the ability to retrieve podcasts, send item URLs to other aggregation services such as Digg, Reddit, and Newsvine, and let users easily subscribe to published feeds. Thanks to this great array of features it really would be possible to build a solution that compares to popular tools such as Google Reader!
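
As a taste of the podcast support, SimplePie exposes media attachments through each item's enclosure. The following is only a sketch: list_enclosures() is a hypothetical helper name, and $feed is assumed to be an initialized SimplePie object. Items without a usable attachment are skipped.

```php
<?php
// Hypothetical helper: collect media attachments (for example podcast
// audio files) from a parsed feed.
function list_enclosures($feed)
{
    $media = array();
    foreach ($feed->get_items() as $item) {
        // Depending on the item, get_enclosure() may return null or an
        // enclosure without a link, so both cases are skipped here.
        $enclosure = $item->get_enclosure();
        if ($enclosure === null || $enclosure->get_link() === null) {
            continue;
        }
        $media[] = array(
            'title' => $item->get_title(),
            'url'   => $enclosure->get_link(),
            'type'  => $enclosure->get_type(), // MIME type, e.g. audio/mpeg
        );
    }
    return $media;
}
```

Each returned entry carries enough information to render a download link or hand the URL to an audio player.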

Conclusion

SimplePie provides developers with a turnkey solution for integrating RSS feed support into their Web applications. It really is as simple as, well, pie.
