The Arcadia Mashups Blog: October 2009

Thursday, October 29, 2009

Autodiscoverable RSS Feeds From Cambridge Libraries

One of the many clever things that folk have worked out how to do with web pages, web browsers and suchlike is supporting the autodiscovery of RSS feeds associated with a web page, by declaring the location of the web feed within a <link> tag in the <head> of the page (RSS autodiscovery: howto).
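To make that a little more concrete, here is a minimal sketch of what a feed detector might do with the source of a page: scan the <link> tags and keep the ones that declare an alternate RSS (or Atom) representation. The function names and the regular expression approach are my own assumptions for illustration; a real tool would use a proper HTML parser rather than pattern matching.

```javascript
// Sketch: find autodiscoverable feed links in a string of HTML.
// Assumption: attribute values are quoted; a regex scan keeps the example
// self-contained, but it is not a substitute for a real HTML parser.
function discoverFeeds(html) {
  var feeds = [];
  var linkTags = html.match(/<link\b[^>]*>/gi) || [];
  linkTags.forEach(function (tag) {
    var rel = getAttr(tag, "rel");
    var type = getAttr(tag, "type");
    var href = getAttr(tag, "href");
    var isFeedType = !!type &&
      (type.toLowerCase() === "application/rss+xml" ||
       type.toLowerCase() === "application/atom+xml");
    if (rel && rel.toLowerCase() === "alternate" && isFeedType && href) {
      feeds.push(href);
    }
  });
  return feeds;
}

// Hypothetical helper: read one quoted attribute value out of a single tag.
function getAttr(tag, name) {
  var pattern = new RegExp("\\b" + name + "\\s*=\\s*[\"']([^\"']*)[\"']", "i");
  var m = tag.match(pattern);
  return m ? m[1] : null;
}
```

Calling discoverFeeds() on the source of a page returns a (possibly empty) list of feed URIs, which is all the autodiscovery step really amounts to.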

What this means is that if you have a list of links to library web pages, you can potentially automatically discover any RSS feeds associated with that library (if they have published autodiscoverable feed links, that is).

Some time ago, I put together a quick app that took a screenscraped list of UK HEI Library homepage URIs from somewhere (you wouldn't believe how hard it is to try to find a list of UK HEI library homepages;-) and tried to autodiscover any RSS feeds associated with them - Autodiscoverable RSS Feeds From HEI Library Websites. When I ran the detector just now, I got about a 36% success rate, which is far better than this time last year...

So anyway, I was wondering: how do the Cambridge University Libraries fare?

Looking through my list of handy cam.ac.uk links, here's one for an XML feed of the associated libraries, with links to their homepage: http://www.lib.cam.ac.uk/api/local/libraries_data.cgi

Notice that the URI for the web page of each library can be found down the XML path: libraries.library.web_address

So let's bring this in to a Yahoo Pipes environment, and try to autodetect any RSS feeds linked to from those pages. As well as importing RSS, Yahoo Pipes can also import JSON and XML feeds using the Fetch Data import block. However, I've noticed that the Fetch Data block sometimes chokes (I'm not quite sure why) so instead I use another Yahoo service - YQL - to act as an intermediary that will fetch the xml, maybe process it a little for me, and then pull the result into the pipe:

(You can try this query in the YQL Developer console.)

What the query statement does:
select library.web_address from xml where url='http://www.lib.cam.ac.uk/api/local/libraries_data.cgi'
is grab all the library.web_address elements (that point to the homepage for each library) from the XML page at http://www.lib.cam.ac.uk/api/local/libraries_data.cgi and pass them in to the pipe as XML.
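If you want a feel for what that query is doing outside of YQL, the following sketch pulls the web_address values out of a string of XML. It assumes the libraries > library > web_address structure described above; the function name is hypothetical, and the pattern matching approach is a shortcut that does not decode XML entities or handle nested markup.

```javascript
// Sketch: collect the text of every <web_address> element in an XML string.
// Assumption: each web_address element contains plain text only.
function extractWebAddresses(xml) {
  var addresses = [];
  var pattern = /<web_address>\s*([^<]*?)\s*<\/web_address>/g;
  var match;
  while ((match = pattern.exec(xml)) !== null) {
    // Skip empty elements so we only pass on usable URIs.
    if (match[1]) {
      addresses.push(match[1]);
    }
  }
  return addresses;
}
```

Each item in the returned list then plays the role of one feed item passed into the pipe.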

NB it's trivial to create a simple 'helper' pipe block that acts like a minimal Fetch Data block but actually pulls in the XML file via YQL:

This block could then be included in a pipe in the same way that a Fetch Data block can be...

So what next? Well, now we can use the Feed Autodiscovery block to see if there are any autodiscoverable RSS feeds listed on those web pages.

In order to do this, we need to pop the Feed Autodiscovery inside a Loop block - this allows the pipe to grab any autodiscovered feed URIs and produce a new feed of feed URIs by replacing the original elements that point to the Library homepages. The Emit all results instruction enforces this replacement policy.
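In programming terms, a Loop block set to Emit all results behaves rather like a 'flat map': each input item is replaced by zero, one or many output items. Here is a rough sketch of that idea, not of how Pipes itself is implemented. The lookup table and function names are made up for illustration; in practice the discover step would fetch each page and look for feed links.

```javascript
// Sketch: replace each homepage URI with whatever feed URIs were found for it.
function emitAll(items, discover) {
  var results = [];
  items.forEach(function (item) {
    discover(item).forEach(function (feedUri) {
      results.push(feedUri);
    });
  });
  return results;
}

// Hypothetical data standing in for real autodiscovery results.
var hypotheticalFeeds = {
  "http://a.example.org/": ["http://a.example.org/news.rss", "http://a.example.org/books.rss"],
  "http://c.example.org/": ["http://c.example.org/feed.xml"]
};

// Hypothetical discover function: pages with no known feeds yield an empty list.
function lookupFeeds(pageUri) {
  return hypotheticalFeeds[pageUri] || [];
}

var feedUris = emitAll(
  ["http://a.example.org/", "http://b.example.org/", "http://c.example.org/"],
  lookupFeeds
);
```

Note that the page with no feeds simply drops out of the output, which is why the resulting list of feed URIs can be shorter or longer than the original list of homepages.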

So to recap - we grab a list of webpage URIs from the Cambridge Libraries XML feed:

Then we replace those feed items by any and all autodiscovered feed URIs:

(Who'd have thunk it - Penguin of the Day;-).

Note that the pipe also reports on any broken links it finds in the original homepage list:

We can now use this pipe in another pipe that looks at all the autodiscovered feed URIs, pulls the contents of them into a single feed, and then maybe sorts them by date order (as described in Getting Started With Yahoo Pipes: Merging RSS Feeds):

Again, the pipe reports on any feed URLs that appear to be broken:

So there we have it - a pipe that contains the aggregated feed items from the autodiscoverable RSS feeds listed on Cambridge University Library homepages, all powered by a single XML file containing links to the Library homepages.

Getting Started With Yahoo Pipes: Merging RSS Feeds

If you don't already know what RSS is, you may have noticed the following logo appear on different websites, and even within your browser, and never really been sure what it's actually for...

What it's for is wiring (or plumbing). What it's for is passing content from one web page or application to another. What it's for is never having to visit that web page again to keep up to date with new content that might appear on that web page or website. What it's for is letting you see content from that page or site in another application, such as a feed reader like Google Reader, or a 'web desktop/dashboard' like Netvibes, or Google personal pages. What it's for is turning websites into 'not email', that you can subscribe to from a single application and then view updates from in a single location.

It's also for much more than that, but that's what we'll start with...

But that's not what this post is about... What this post is about is how you can use an online application called Yahoo Pipes to do all sorts of plumbing with RSS feeds.

To get started, you'll need a Yahoo account, then you can create your first pipe...

Learn How to Build a Pipe in Just a Few Minutes @ Yahoo! Video

It's like Lego, but with bits of web content pulled into the pipe from one or more RSS feeds, with the content packaged up into bundles where each bundle contains:
- a title;
- some content (like the body of a blog post or news story), referred to as the description;
- a link (which is often to the original web page that contains the description).

So for example, if we look at the RSS from Cambridge University Library web page, we see links to a variety of RSS feeds.

http://www.lib.cam.ac.uk/toolbox/rss.html

We can pull one or more of these feeds into the pipes environment by creating a new Yahoo pipe and then using the Fetch Feed block from the Sources area of the left hand side bar:

(Highlighting a block by clicking on it lets you preview the output of that block.)

We can combine the output of several feeds simply by adding the URL of each required feed to the Fetch Feed block:

The order of items in the combined feed will be all the items from the last feed in the Fetch Feed block, followed by the items in the feed before it, and so on.

To order the items in the combination feed by date order, use the 'Sort' block:

The field you need to sort on is chosen from the drop down menu - PubDate is the element we want to sort on:
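If it helps to see the 'aggregate and order' idea written down as code, here is a small sketch of the same thing outside Pipes. The item shape (objects with title and pubDate properties), the function name and the choice of newest-first ordering are all assumptions for the sake of the example; the Sort block lets you choose the direction yourself.

```javascript
// Sketch: merge several lists of feed items and sort them by publication date.
// Assumption: each item has a pubDate string that Date can parse
// (for example an RFC 822 style date as used in RSS).
function mergeAndSort(feeds) {
  var merged = [];
  feeds.forEach(function (items) {
    merged = merged.concat(items);
  });
  // Newest items first.
  return merged.sort(function (a, b) {
    return new Date(b.pubDate).getTime() - new Date(a.pubDate).getTime();
  });
}

// Hypothetical items from two feeds.
var feedA = [
  { title: "A1", pubDate: "Mon, 26 Oct 2009 09:00:00 GMT" },
  { title: "A2", pubDate: "Thu, 29 Oct 2009 10:00:00 GMT" }
];
var feedB = [
  { title: "B1", pubDate: "Wed, 28 Oct 2009 12:00:00 GMT" }
];

var combined = mergeAndSort([feedA, feedB]);
```

With the sample items above, the combined list runs A2, B1, A1.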

(Wire the blocks together by clicking on the 'output circle' at the bottom of a block and dragging the 'wire' that is produced onto the 'input circle' at the top of the next block.)

Finally, we need to connect the output block to the pipe to complete it.

If you run the Pipe, you will see its 'front page':

You can now subscribe to the output feed from this pipe (or use it in another pipe...), add it to your Yahoo or Google homepage, and so on.

There's a lot more you can do with Yahoo Pipes, but this is a good start: being able to aggregate (that is merge, or combine) content from several different sources into a single feed, and then order them according to time.

So how else might we use this simple 'aggregate and order' pattern?

How about combining table of contents feeds from different journals (you can find their URIs from TicTocs)?

In this way, you can create a single RSS feed that keeps you up to date with the contents of several different journals you are interested in, and maybe also pulls in content from a recent/new books feed from your Library?

Monday, October 26, 2009

The 'Get Selection' Bookmarklet Pattern

In An Introduction to Bookmarklets, I introduced the idea of a bookmarklet, a browser based bookmark that lets you execute a small Javascript programme in the context of the currently displayed web page, rather than taking you to a bookmarked page.

This was followed by The 'Get Current URL' Bookmarklet Pattern, which described how to create a bookmarklet that would operate on the URI of the currently viewed page, within a generic bookmarklet wrapper.

In this post, I'll provide an example of a bookmarklet pattern that takes some highlighted (that is, selected) text within the current page and passes it to another web page.

In Firefox and Safari web browsers, we can straightforwardly use the javascript function window.getSelection() to grab any selected text in the web page. (You can select text in the normal way - click the cursor on the page at the start of the text you want to select, then drag over the text you want to highlight).

In at least some versions of IE, we need to use the construction document.selection.createRange().text.

These two can be combined in a javascript construction that looks to see if (?) the first construction is available, and if not, then (:) uses the second:
window.getSelection?window.getSelection():document.selection.createRange().text;
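Written out as a function, the same test might look like the sketch below. Passing the window and document objects in as arguments is my own choice, made so the function can be tried out with stand-in objects; in a bookmarklet you would just call it as getSelectedText(window, document). Note that window.getSelection() returns a selection object rather than a plain string, hence the toString() call.

```javascript
// Sketch: return the currently selected text as a string.
// win and doc are normally the browser's window and document objects.
function getSelectedText(win, doc) {
  return win.getSelection
    ? win.getSelection().toString()          // Firefox, Safari and friends
    : doc.selection.createRange().text;      // older versions of IE
}
```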

To see how this works, highlight some text on this page and then click here.

Here is an example that achieves that effect:
Highlight some text and click here.

So how might we use this in practice? How about DOI resolution? (If you don't know about DOIs, they're Digital Object Identifiers - so go Google.. ;-)

DOIs typically look something like this: doi:10.1016/S0040-1625(03)00072-6. A long string of characters (in various formats depending on publisher), often prefixed by doi:

A DOI can point to one or more instances of a document. A DOI resolver will take a DOI and point you to an instance of it depending on various criteria. (In a library setting, this might depend on what online resources your library subscribes to.)

So for example, let's see what the DOI resolver at http://dx.doi.org/ can do with the DOI 10.1016/S0040-1625(03)00072-6...

You can call the resolver with the DOI in the following way:
http://dx.doi.org/doi:THE-DOI_YOU/WANT:RESOLVING

Try it: http://dx.doi.org/doi:10.1016/S0040-1625(03)00072-6

Hopefully, you might now see an opportunity here for a bookmarklet that uses the 'getSelection' pattern? In particular, a bookmarklet that lets a user highlight a DOI and then click on the bookmarklet to resolve that DOI.

Using the window.location=NEWURL trick that we saw in the The 'Get Current URL' Bookmarklet Pattern, we can construct just such a bookmarklet.

Grab the selected text (hopefully corresponding to a valid DOI!;-): var t=window.getSelection?window.getSelection().toString():document.selection.createRange().text;

Construct a URI that will pass this DOI to the DOI resolver: var uri="http://dx.doi.org/doi:"+t;

Go to this URI, and as a result get redirected to an instance of the actual resource: window.location=uri;

We can then simplify this as follows:
var t=window.getSelection?window.getSelection().toString():document.selection.createRange().text; window.location="http://dx.doi.org/doi:"+t;

And pop it into our generic bookmarklet wrapper:
javascript:(function(){var t=window.getSelection?window.getSelection().toString():document.selection.createRange().text; window.location="http://dx.doi.org/doi:"+t;})()

Try it - select this DOI (just the numbers... no leading doi:): 10.1016/S0040-1625(03)00072-6 and then click on this DOI Resolver bookmarklet.
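If you'd rather not have to worry about whether the leading doi: was included in the selection, one possible refinement is to tidy the selected text before building the resolver URI. The following is a sketch of that idea rather than part of the bookmarklet above; the function name is hypothetical.

```javascript
// Sketch: build a resolver URI from selected text, tolerating stray
// whitespace and an optional leading "doi:" in the selection.
function doiResolverUri(selectedText) {
  var doi = selectedText
    .replace(/^\s+|\s+$/g, "")   // trim leading/trailing whitespace
    .replace(/^doi:\s*/i, "");   // drop a leading doi: prefix if present
  return "http://dx.doi.org/doi:" + doi;
}
```

Inside the bookmarklet wrapper, the last step would then become window.location=doiResolverUri(t);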

Here's a tool for helping generate your own bookmarklets using the 'get selection' pattern:

Your bookmarklet:

PS If you leave the space for the URI blank, you can generate a bookmarklet that will let you highlight an unlinked URI in a webpage, like the following one:
http://arcadiaproject.blogspot.com
and 'click through' it (via the bookmarklet) to the corresponding webpage...

PPS in some situations, it might be sensible to 'go defensive' and encode the selected text so that it works nicely in a URI. Do this by adding the step:
t=encodeURIComponent(t);
before the window.location step.
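To see what the encoding step actually does, here are a couple of made-up selections run through encodeURIComponent. Note in particular that the / in a DOI gets encoded too, so it's worth checking that the service you are calling is happy with that before adding this step.

```javascript
// Spaces, ampersands and question marks are percent-encoded.
var encoded = encodeURIComponent("library catalogue & QR codes?");
// encoded is "library%20catalogue%20%26%20QR%20codes%3F"

// The slash in a DOI is encoded as %2F; the brackets are left alone.
var encodedDoi = encodeURIComponent("10.1016/S0040-1625(03)00072-6");
// encodedDoi is "10.1016%2FS0040-1625(03)00072-6"
```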

Wednesday, October 21, 2009

The 'Get Current URL' Bookmarklet Pattern

In An Introduction to Bookmarklets, I introduced the idea of a bookmarklet, a browser based bookmark that lets you execute a small Javascript programme in the context of the currently displayed web page, rather than taking you to a bookmarked page.

In this post, I'll provide an example of a bookmarklet pattern that passes the URL of the current page to another web page.

To show you the effect of the bookmark, click here: see what happens

What you should find is that the URI of the current page is passed to the SplashURL service, which displays a short URL code, and a QR code, that both point back to this page.

If you click through on the link above and look at the URI of the page that is loaded, you will see that it has the form:

http://splashurl.net/?mode=qrcode&url=a_version_of_the_uri_of_this_page

What the link above did - and what a bookmarklet can do, is the following:

look up the URI of the current page using the Javascript 'command' window.location.href;

encode this URI so that it can be used in another URI: encodeURIComponent(window.location.href);

use this encoded version of the URI of the current page in the SplashURL URI:
var newURI="http://bigtiny.ecs.soton.ac.uk?mode=qrcode&url="+encodeURIComponent(window.location.href);

reload the current window with this new URI: window.location=newURI;

That gives us the following Javascript code snippet:
var newURI="http://bigtiny.ecs.soton.ac.uk?mode=qrcode&url="+encodeURIComponent(window.location.href);window.location=newURI;

We can actually write this in a simplified form where we do not create the newURI variable at all. Instead, we simply change the location of the current window to the new URI:
window.location="http://bigtiny.ecs.soton.ac.uk?mode=qrcode&url="+encodeURIComponent(window.location.href);

To use this code in a bookmarklet, or in a web page link*, we need to identify this code snippet as Javascript:
javascript:var newURI="http://bigtiny.ecs.soton.ac.uk?mode=qrcode&url="+encodeURIComponent(window.location.href);window.location=newURI;

[* note that some web publishing platforms will strip javascript code out of a link - Blogger is kind enough to leave it in.]

We might also take a precautionary step of making sure that the program code in the bookmarklet does not conflict with any javascript code already present in the current page:
javascript:(function(){var newURI="http://bigtiny.ecs.soton.ac.uk?mode=qrcode&url="+encodeURIComponent(window.location.href);window.location=newURI;})()

Here's the bookmarklet: SplashQR

That is, we wrap our program snippet in the following:
javascript:(function(){Javascript bookmarklet program code here})()

(To 'install' it on your Firefox or Safari browser, simply drag the link to the bookmarks toolbar. In IE, right click on the link and add to Favourites/Links, then in the View->Toolbar menu option of your browser, make sure the Link Toolbar is ticked.)

So now we have a pattern for creating bookmarklets that can pass the URI of the current page to another page. So if you see a web service that includes the URI of another page in its URI, you can write a bookmarklet to invoke that service on a web page you are currently viewing.
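Because the pattern is so regular, generating this sort of bookmarklet is just a matter of string building. Here is a minimal sketch of a generator function; the function name is hypothetical, and it assumes the service URI you give it ends at the point where the encoded page URI should be appended and contains no double quote characters.

```javascript
// Sketch: build a 'get current URL' bookmarklet for a given service.
// servicePrefix is everything up to where the page URI should go,
// for example "http://example.org/?url=" (a made-up service).
function makeCurrentUrlBookmarklet(servicePrefix) {
  return 'javascript:(function(){window.location="' + servicePrefix +
    '"+encodeURIComponent(window.location.href);})()';
}

var bookmarklet = makeCurrentUrlBookmarklet("http://example.org/?url=");
```

The resulting string is what goes in the href of the link that you drag to your bookmarks toolbar.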

Here's a simple form to help you generate this sort of bookmarklet:

Your bookmarklet:

If you come across any web page/services that include another URI in their URI, particularly in the context of academic, public, legal, governmental, medical or corporate libraries, please post an example, along with a description of what the page does, in the comments below.

In the next post in this series, I'll review how to create a bookmarklet that acts on a piece of text that is highlighted/selected by the user within the current page.

Wednesday, October 7, 2009

An Introduction to Bookmarklets

One of the expected deliverables from my time on the Arcadia Project is a set of posts on different mashup design patterns and implementation patterns that can act as Quick Start tricks and tips for anyone who wants to get started with producing ad hoc, itch scratching and potentially one-shot/disposable applications (that is, lightweight tools put together for a particular task that may or may not ever be useful again...)

On the implementation side, the bookmarklet approach is one I use extensively, so I thought it might be an idea to introduce the idea of bookmarklets, and then describe over a series of posts a set of simple reusable patterns that I return to again and again, as well as a set of bookmarklet generators to get you started creating your own bookmarklets.

So - what is a bookmarklet? In practical terms, a bookmarklet is a button that you can add to your browser that extends the functionality of the browser, or a web page displayed in it. But before we see how bookmarklets work, let's take a step back and look at the anatomy of a web browser for a moment. I'm using the Firefox browser for the screenshots and screencasts/videos, but a similar approach is used in other browsers too:

At the top of the screen is a location, or address bar. This is where you can type the URL of a website you want to visit. Whenever you view a web page, its URL will appear in the location bar. So if you click a link on a Google results page to a BBC webpage, for example, when you view the BBC page its URL will be displayed in the location bar.

Below the location bar is a Bookmark (or Favorites) toolbar. You can typically hide (or reveal) this toolbar via the browser View menu:

The Bookmark toolbar is a place where you can add your own links (that look like buttons) to the browser. They are a bit like shortcuts to applications that you might place on a Windows desktop. So if you visit the same two or three websites regularly, adding a bookmark to them can save you time. In the above screenshot, I have a bookmark to Camtools and the Hermes Webmail system, for example.

Adding a bookmark to the toolbar is easy, and there are several ways to do it which all work, to a greater or lesser extent, in most of the popular web browsers (Internet Explorer, Firefox, Safari, and so on):

Right click on a link on a webpage, then select 'Bookmark [Favorite] this page' (or something similar!). You may need to select the location where you want to store the bookmark - look for the toolbar option.
Drag a link from the page and just drop it on the toolbar.
Drag the page icon from the location bar at the top of the page and then drop it onto the toolbar.

Okay, so that's bookmarks; but what about bookmarklets?

Bookmarklets are mini-programmes that you can run from the browser bookmark toolbar. As programmes, they can take things like the URL of the currently displayed webpage and make use of them in some way, or they can be used to modify the contents of the currently displayed webpage.

So for example, the following 'Split screen bookmarklet' takes the URL of the current page and loads it into two separate frames, to make it easier to take screen grabs of a web page where you want to capture two different areas of the same page in the same screenshot:

Alternatively, the 'Newton QR Code' bookmarklet will add a QRcode (2D barcode) to a results page on the Newton Library Catalogue:

So how does a bookmarklet differ from a bookmark in practical terms? Let's look at the properties of a bookmark (in Firefox, you can do this by right-clicking on a bookmark and selecting the Properties menu option).

You'll see there's just the descriptive text that appears as the button label on the toolbar, and the URL of the bookmarked page.

Here's what a bookmarklet looks like:

The link, rather than starting with http://, starts with javascript: and is then followed by some javascript program code. When you click on the bookmarklet button, this mini-javascript programme will be run. Note that the bookmarklet should only be able to 'see' things in the current page, so if you have multiple tabs open in your browser, clicking on a bookmarklet should only affect the current page you are on.

Okay - so that's a quick intro to bookmarklets. In the next post in this series, I'll review some typical bookmarklet 'patterns', as well as the anatomy of a bookmarklet, before getting into some worked examples of how to create your own bookmarklets.

Thursday, October 1, 2009

Visual Links - Sharing Links With QR Codes

When I had my first look at the Cambridge University Library online catalogue yesterday, I hadn't yet sorted out my computer credentials so I was limited to using the system in a personally stateless way using public terminals.

Some of the links I tried to follow were also blocked (they were trying to go outside the local subdomain, I guess), but they might have been useful and if the opportunity had allowed I might have bookmarked them.

So pondering this, and as a starter for ten, are there any easy takeaways for the casual, non-authenticated user to capture information relating to a particular results page using some sort of mobile device, such as a pencil'n'paper device, or a mobile phone (not that cameras are allowed in the library...)?

I spy one - there's a shortcode possibility in the 'Link to this record' URL, I think? ul-4528529 looks like it could be a handy shortcode to me?

So if a 'Catalogue shortcode' was published on each page:

Catalogue shortcode: ul-4528529

with an easy to find resolver somewhere (the form above works as a stop gap), it's easy enough to scribble down a shortcode for a book's results page on the library catalogue:-)

For getting that link onto a phone, a QR-code would be another possibility. Here's what the page might look like:

To try a similar effect out yourself, add the following bookmarklet to your browser toolbar... (just click and drag it onto your browser bookmarks toolbar in Firefox/Safari, or right click on it and select 'Add to Favorites' in IE/Internet Explorer.)

Newton QR code (uses Google visualisation API QR code generator)

...go to a results page, highlight the link and then click on the bookmarklet:

It also strikes me that a really useful generic tool that could be implemented on public terminals using a browser extension would be a right-click menu option offering to Display this link as a QR-code...?
