======================
Podcache RSS Namespace
======================
:Author: Garth T Kidd
:Organisation: iPodder "Lemon Edition" Team
:Version: $Revision: 1.3 $
:Date: 2004/11/24
This specification describes a simple standard by which friendly people can
host a cache of podcasted content and re-feed it out either as direct
downloads or (more sensibly) via BitTorrent.
Rather than having one output file per feed, this standard assumes a single
file containing cache entries for multiple feeds. Podcatching software can
simply download the cache feed first, look up wanted enclosures in the
cache feed, and download the content from either the cache or directly if
the cache is out of date or doesn't seem to be working.
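As a rough illustration of that lookup-then-fall-back behaviour, here is a minimal sketch. The names ``cache_index``, ``fetch_from_cache`` and ``fetch_direct`` are hypothetical and not part of this standard, and a real podcatcher would also apply the freshness and integrity checks this document describes.

```python
def fetch_enclosure(url, cache_index, fetch_from_cache, fetch_direct):
    """Fetch an enclosure via the cache if possible, else directly.

    url              -- the enclosure URL exactly as given in the podcast feed
    cache_index      -- hypothetical dict mapping original URLs to cache entries
    fetch_from_cache -- hypothetical callable taking a cache entry
    fetch_direct     -- hypothetical callable taking the original URL

    Returns a (source, data) tuple, where source is 'cache' or 'direct'.
    """
    entry = cache_index.get(url)
    if entry is not None:
        try:
            return 'cache', fetch_from_cache(entry)
        except OSError:
            # The cache doesn't seem to be working: fall back to the podcaster.
            pass
    return 'direct', fetch_direct(url)
```

Passing the two fetchers in as callables keeps the sketch independent of any particular HTTP or BitTorrent library.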
This standard assumes and describes a balance of responsibility between the
cache administrators, users, podcatcher developers, and podcasters.
Everyone has a role to play, and the standard describes mechanisms by which
more responsible parties can influence less responsible parties to behave.
We feel this is a much more pragmatic approach than appointing some
podcache standard nanny to yell at people who abuse the capability.
This standard is designed to ensure that:
* Users get their podcasts;
* Podcasters can still see the users, and know that users got the podcast
accurately; and
* Cache administrators have an opportunity to address the users, too.
The `tailored summaries`_ give more information for each of these key
audiences.
.. _hassle Garth:
If you have any questions about this specification, hassle Garth__
either privately or in public in the `ipodder-dev`__ mailing list.
__ mailto:garthk@gmail.com
__ http://groups.yahoo.com/group/ipodder-dev/
.. contents:: Table of Contents
Change Notes
============
* Nov 19: Renamed sections; reorganised content.
* Nov 19: Tidied up `Elements for Podcast Feeds`_, adding
`podcache:expectmd5`_ and `podcache:expectlength`_.
* Nov 18: Initial version
.. _tailored summaries:
Tailored Summaries
==================
To save you the effort of running the protocol in your head to figure out
what it means, this document provides tailored summaries for:
* `podcasters`_;
* `podcatcher developers`_; and
* `podcache providers`_.
If you don't fall into one of these categories, `hassle Garth`_ and he'll
explain it from your choice of frame of reference.
.. _podcasters:
... for Podcasters
------------------
One of the primary design goals of this standard was simplicity for
podcasters:
If you do nothing at all, you'll get almost all of the benefits with none
of the hassle.
If the user has configured a podcache that is caching your podcast, you
shouldn't notice anything except a substantial savings on your bandwidth
bill. A well behaved podcache-enabled podcatcher will still hit your RSS
feed so you know they're listening, will ignore the podcache if it has
fallen out of date, and will download directly from you if the podcache
happens to be broken.
There are some new RSS elements you can add to your feed to help prevent
mistakes in caching your content or turn caching off altogether. For more
details, see `Elements for Podcast Feeds`_. You don't *need* to add these
elements, though: they just make it easier to catch caching bugs, and those
bugs will be detected pretty quickly if just a few popular feeds generate
the elements correctly.
If you'd rather concentrate on getting your content right than fuss about
how the technology works under the hood, and you're using some piece of
software you *didn't* write to generate your RSS, you can leave all the
detail to your software developer. If you hand-roll your RSS and can't be
bothered learning about XML namespaces, you can do absolutely nothing and
watch your bandwidth bill drop without lifting a finger.
.. _podcatcher developers:
... for Podcatcher developers
-----------------------------
On behalf of the podcasters whose feeds you'd like to cache as a favour to
their bandwidth bills, we'd like to ask your assistance in doing everything
you can to cache responsibly, and stop caching when you can't cache
responsibly.
In short, you need to make sure the cache gets it right and that you
bypass the cache if the cache looks like it's out of date or wrong. If you
*don't* do that, any bug in the caching software could cause chaos and
both users and podcasters will lose trust in the idea of caching.
Don't let the sense of urgency make it seem difficult, though. All you need
to do is look for the `Elements for Podcast Feeds`_ when you grab a feed's
RSS and compare their contents against the `Elements for Podcache Feeds`_
contained in the cache feed's RSS, and against the length and MD5 digest of
the enclosure you get via the podcache. It's more work than the podcasters_
have to do, but significantly less than the `podcache providers`_ have to
do.
It's also your responsibility to help the podcache providers stay afloat by
correctly handling the `podcache:cacheannouncement`_ attribute and
inserting the announcements in the playlists full of enclosures downloaded
from or via the podcache. Don't worry: users aren't being forced to do
anything, here. If they don't want to hear announcements, they can stop
using the podcache. It's up to you to enforce that arrangement.
.. _podcache providers:
... for Podcache providers
--------------------------
Oh, boy, do *you* have a lot of hard work to do. You can't escape reading
this entire document and comprehending the detail, I'm afraid. Sorry. If
that strikes you as unfair, consider this: running a podcache saves
podcasters money but costs you money. Due to efficiencies of scale, it'll
probably cost you less than it costs them, but it's still your money being
spent. If you think writing the code is hard, wait until you try to figure
out a way to recover your costs without pissing everyone off. Ouch.
Elements for Podcache Feeds
===========================
The difference between a normal feed and a podcache feed is special
``podcache:`` `item attributes`_ and `enclosure attributes`_ to let
podcatchers know important information about what it is you're caching
so they can
a) use your cache, and
b) use your cache responsibly.
If you don't specify `podcache:originalurl`_, podcatchers won't even know
to grab something from your cache, so we expect you'll be eager to insert
that one. If you don't specify the others, podcatchers should really stop
using your feed because it makes it too difficult for them to ensure
they're not grabbing out-of-date items.
Namespace Definition
--------------------
First, you *should* make XML parsers happy (and let everyone know how to
find this document) by adding the ``podcache`` namespace definition to
your ``rss`` tag::
...
iPodder and anything else using a permissive feed parser won't mind if
you skip that, but I suspect other podcatchers using strict XML parsers
won't use your cache and will either go elsewhere or download the content
directly from the podcaster's site.
.. _item attributes:
Item Attributes
---------------
There's only one new ``item`` attribute to generate:
* `podcache:feedurl`_
podcache:feedurl
~~~~~~~~~~~~~~~~
For each ``item`` you're caching, you *must* add ``podcache:feedurl``
to let compatible podcatchers know which feed the item came out of. If
you're polling the `recent 100 feed`__, you can get this from the
``source`` tag. ::
-
...
__ http://audio.weblogs.com/top100.xml
.. _enclosure attributes:
Enclosure Attributes
--------------------
There are a handful of ``enclosure`` attributes to generate:
* `podcache:originalurl`_
* `podcache:length`_
* `podcache:contentlength`_
* `podcache:etag`_
* `podcache:lastmodified`_
The latter three *should* be used by podcatchers to verify that the
enclosure hasn't been updated since you fetched it.
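One way a podcatcher might perform that check is sketched below. It assumes the podcatcher has already obtained the podcaster's current HTTP response headers somehow (this standard doesn't say how) and holds them in a dict with lower-case names. Treating a missing attribute as "stale" is also an assumption on my part.

```python
def cache_entry_is_fresh(cached, origin_headers):
    """Compare a cache entry's recorded headers with the origin's current ones.

    cached         -- dict of the cached enclosure's podcache:* attributes
    origin_headers -- dict of HTTP response headers from the podcaster's
                      server, keyed by lower-case name (an assumption)
    """
    pairs = [
        ('podcache:contentlength', 'content-length'),
        ('podcache:etag', 'etag'),
        ('podcache:lastmodified', 'last-modified'),
    ]
    for attribute, header in pairs:
        if attribute not in cached:
            # Without the attribute nothing can be verified: treat as stale.
            return False
        # An empty attribute means the cache saw no such header, so it
        # only matches if the origin still sends no such header.
        if cached[attribute] != origin_headers.get(header, ''):
            return False
    return True
```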
Not at all compulsory, but interesting to know about, is a means to podcast
short announcements to people using your cache feed:
* `podcache:cacheannouncement`_
podcache:originalurl
~~~~~~~~~~~~~~~~~~~~
For each enclosure, you *must* add ``podcache:originalurl``. This is the
attribute by which podcatchers will identify that you have a cached version
of an enclosure they want. You *must not* normalize the URL: it must be
*exactly* what was in the original ``url`` attribute, or podcatchers won't
match it. ::
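To show why normalizing breaks matching, here is a small sketch of how a podcatcher might index a cache feed. It assumes the enclosure attributes have already been parsed into plain dicts; the function name and the ``example.com`` and ``cache.example`` URLs are made up for illustration.

```python
def build_cache_index(cache_enclosures):
    """Index cached enclosures by their exact podcache:originalurl.

    cache_enclosures -- iterable of dicts of enclosure attributes
    """
    index = {}
    for attributes in cache_enclosures:
        original = attributes.get('podcache:originalurl')
        if original:
            # No lower-casing, unquoting or trailing-slash trimming: the key
            # must be character-for-character what the podcaster published.
            index[original] = attributes
    return index


index = build_cache_index([
    {'url': 'http://cache.example/ep1.mp3.torrent',
     'podcache:originalurl': 'http://example.com/Show/ep1.mp3'},
])
found = 'http://example.com/Show/ep1.mp3' in index        # exact match
normalized = 'http://example.com/show/ep1.mp3' in index   # lower-cased path
```

Here ``found`` is true and ``normalized`` is false: a plain dictionary lookup is all the matching a podcatcher needs to do, which is why the URL has to be copied verbatim.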
podcache:length
~~~~~~~~~~~~~~~
As some podcasters don't put the right ``length`` in their feed, and the
``length`` in your cached ``item`` will be that of the BitTorrent response
file you generated, you *should* add ``podcache:length`` to let podcatchers
know how big the file they'll get will actually be. They'll also figure
that out once they grab the ``.torrent`` file, but I'm sure they'll make
good use of the information. ::
podcache:contentlength
~~~~~~~~~~~~~~~~~~~~~~
So that podcatchers can verify that what you're seeding matches the
enclosure in the feed you're caching, you should add
``podcache:contentlength`` to let them know the content of the
``Content-Length:`` HTTP header you received when you fetched the content.
If there was no such header, include the attribute but leave it empty. ::
podcache:etag
~~~~~~~~~~~~~
So that podcatchers can verify that what you're seeding matches the
enclosure in the feed you're caching, you should add ``podcache:etag`` to
let them know the content of the ``ETag:`` HTTP header you received when
you fetched the content. If there was no such header, include the
attribute but leave it empty. ::
podcache:lastmodified
~~~~~~~~~~~~~~~~~~~~~
So that podcatchers can verify that what you're seeding matches the
enclosure in the feed you're caching, you should add
``podcache:lastmodified`` to let them know the content of the
``Last-Modified:`` HTTP header you received when you fetched the content.
If there was no such header, include the attribute but leave it empty. ::
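On the podcache side, generating these attributes could look something like the sketch below. The function name is hypothetical, the header dict is assumed to use lower-case names, and writing ``podcache:length`` as a decimal byte count is my assumption rather than something this document spells out.

```python
def enclosure_attributes(original_url, response_headers, actual_length):
    """Build the podcache:* attributes for one cached enclosure.

    original_url     -- the url attribute exactly as found in the source feed
    response_headers -- dict of HTTP headers received when fetching the
                        content, keyed by lower-case name (an assumption)
    actual_length    -- number of bytes actually fetched
    """
    return {
        'podcache:originalurl': original_url,
        'podcache:length': str(actual_length),
        # Absent headers become empty attributes rather than being omitted.
        'podcache:contentlength': response_headers.get('content-length', ''),
        'podcache:etag': response_headers.get('etag', ''),
        'podcache:lastmodified': response_headers.get('last-modified', ''),
    }
```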
Podcache Announcements
----------------------
To make announcements to your cache feed's users, include a normal ``item``
with an ``enclosure`` but without any of the ``podcache`` attributes except
for `podcache:cacheannouncement`_. Podcatchers *must* insert your
announcement into a playlist where the user will hear it.
Podcatchers *must not* provide a way to use a cache feed without
downloading announcements and inserting them into playlists. Podcasters can
talk to their users; podcache providers should also be given an opportunity
to do so. Given the expense of hosting a cache, podcache providers might
even need to advertise. If the podcatchers don't help, the caches and then
the podcasters will be crushed under the weight of their bandwidth bill.
That said, there should be balance: podcatchers *should* make it easy for
users to stop using your feed if they get sick of your announcements. We'd
insist, but we don't need to: basic market selection will take care of it
for us.
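A podcatcher might separate announcements from cached enclosures along these lines. This is only a sketch: it assumes enclosure attributes have been parsed into dicts, and the function name is invented for the example.

```python
def split_announcements(cache_enclosures):
    """Separate announcement enclosures from cached podcast enclosures.

    cache_enclosures -- iterable of dicts of enclosure attributes

    Returns (announcements, cached) as two lists.
    """
    announcements, cached = [], []
    for attributes in cache_enclosures:
        if attributes.get('podcache:cacheannouncement') == 'true':
            # Destined for a playlist where the user will hear it.
            announcements.append(attributes)
        else:
            cached.append(attributes)
    return announcements, cached
```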
podcache:cacheannouncement
~~~~~~~~~~~~~~~~~~~~~~~~~~
The ``podcache:cacheannouncement`` attribute should simply be set to ``true``::
-
...
Elements for Podcast Feeds
==========================
One of the primary design goals of the podcache was simplicity for
podcasters:
If you do nothing at all, you'll get most of the benefits with none of
the hassle.
If the user has configured a podcache that is caching your podcast, you
shouldn't notice anything except a substantial savings on your bandwidth
bill. A well behaved podcache-enabled podcatcher will still hit your RSS
feed so you know they're listening, will try its best to ignore the
podcache if it has fallen out of date, and will download directly from you
if the podcache happens to be broken.
If you want to make your feed podcache-aware, though, there are some
elements you might find interesting:
* `podcache:forbid`_ stops podcatchers from using caches for your feed.
* `podcache:preferredcache`_ indicates your preferred cache.
* `podcache:expectmd5`_ and `podcache:expectlength`_ let caches and
podcatchers make sure they didn't break your enclosure during handling.
Namespace Definition
--------------------
First, you *should* make XML parsers happy (and let everyone know how to
find this document) by adding the ``podcache`` namespace definition to
your ``rss`` tag::
...
This is more urgent for podcasters than it is for podcachers. If someone
can't use a podcache because the feed isn't strict XML, that saves the
podcacher bandwidth. If someone can't download your feed because it isn't
strict XML, they can't listen to you. Oops.
Channel Elements
----------------
The `podcache:forbid`_ and `podcache:preferredcache`_ elements on the
channel control the behaviour of podcatchers.
.. warning::

   We don't know whether to put these in as attributes or elements, hence
   the lack of any example XML. That in turn makes it pretty hard to
   implement either a compliant feed writer or software to read it. Sorry
   about that.

podcache:forbid
~~~~~~~~~~~~~~~
If you specify ``podcache:forbid`` for your channel or any item, iPodder
and any other well behaved podcatcher will ignore any cached entries for
your feed. Well behaved podcaches will stop caching your feed, though they
might keep polling once a day to see if you've taken ``forbid`` off.
Podcatchers *must* obey ``podcache:forbid`` once we figure out where to put
it.
podcache:preferredcache
~~~~~~~~~~~~~~~~~~~~~~~
You can specify a preferred podcache with ``podcache:preferredcache``. This
might be useful if you're in serious trouble with your bandwidth bill: just
configure your feed so that only one cache can download your enclosures,
and set it as your preferred cache.
Podcatchers *may* obey ``podcache:preferredcache``; there might be network
topology or security reasons why they can't access your preferred cache and
might want to instead use some other cache (which might in turn be caching
from your preferred cache).
Enclosure Elements
------------------
The `podcache:expectmd5`_ and `podcache:expectlength`_ attributes on
``enclosure`` tags let caches make sure they got your enclosure accurately,
and let podcatchers know the enclosure they got from the cache is exactly
the same as the enclosure they would have fetched from you directly.
podcache:expectmd5
~~~~~~~~~~~~~~~~~~
So that corruption of your enclosure can be detected even if the length is
correct, put a hexified MD5 digest in the ``podcache:expectmd5`` attribute.
You'll know your implementation is correct if your hexified digest for
the word "fnord" is the same as that given in the example below::
For what it's worth, the Python code to compute the digest is::

    import hashlib

    def hexified_digest(blocks):
        """Return a hexified MD5 digest.

        blocks -- a sequence of blocks of data (bytes).
        """
        engine = hashlib.md5()
        for block in blocks:
            # Feed the data through block by block so a large enclosure
            # never has to be held in memory all at once.
            engine.update(block)
        return engine.hexdigest()

Podcatchers *should* check downloaded enclosures against
``podcache:expectmd5`` and tell users of any mismatches.
podcache:expectlength
~~~~~~~~~~~~~~~~~~~~~
Podcaches and podcatchers can't rely on your ``length`` attribute being
correct because too many feeds either leave it out entirely or put the same
(incorrect) length on every enclosure. To let them know that you're
serious, put another copy in the ``podcache:expectlength`` attribute::
Podcatchers *should* check downloaded enclosures against
``podcache:expectlength`` and tell users of any mismatches.
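Putting the two checks together, a podcatcher's verification step might look like the sketch below. The function name is hypothetical, and I'm assuming ``podcache:expectlength`` is a decimal byte count and that the hex digest should be compared case-insensitively; neither detail is spelled out above.

```python
import hashlib


def verify_enclosure(data, expect_length=None, expect_md5=None):
    """Return a list of human-readable mismatches; empty means all is well.

    data          -- the downloaded enclosure as bytes
    expect_length -- value of podcache:expectlength, or None if absent
    expect_md5    -- value of podcache:expectmd5, or None if absent
    """
    problems = []
    if expect_length is not None and len(data) != int(expect_length):
        problems.append('length is %d, expected %s' % (len(data), expect_length))
    if expect_md5 is not None:
        actual = hashlib.md5(data).hexdigest()
        if actual != expect_md5.strip().lower():
            problems.append('MD5 is %s, expected %s' % (actual, expect_md5))
    return problems
```

Returning the mismatches rather than raising makes it easy to show all of them to the user at once.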