Busy Podcaster's Guide

Author: Garth T Kidd
Organisation: iPodder "Lemon Edition" Team
Version: 1.8
Date: 2004/11/24

In a hurry? Prefer to spend time on your show than tweaking software?

This document explains:

Planned sections of the document will explain:

Nobody seems to be having much trouble getting their RSS feeds right enough to attract listeners, and the penalty for getting it wrong is usually pretty obvious (a full inbox of complaints that they can't download your show), so I'll start with how to avoid being bankrupted by your podcast.

Recent Changes

How to Save Money

Podcasting can cost a pretty penny. The more listeners you have, the more likely it is that your web hosting provider will either send you a bill you weren't expecting, cut your site off, or both.

BitTorrent

BitTorrent is a handy way to dodge the bandwidth bill. It's slightly complicated to use and set up, but unless podcaching takes off or you use a professional hosting service there's not much avoiding it.

The most complicated aspect of using BitTorrent turns out to be figuring out how to make your listeners use it. In short: not all listeners can cope with it. Their podcatcher might not support the protocol, or they might be stuck behind a firewall that blocks the protocol. If you want to keep these listeners happy, you have to find a way of letting them (and, preferably, only them) download directly from your site.

Multiple Feeds

A popular way of handling this is to generate two feeds: one whose enclosures point to torrent files, and one whose enclosures point directly to the content. The problem is that you have to generate multiple feeds. It gets even worse if you want to publish your podcast in multiple encodings (say, one as MP3 and one as M4B so iTunes users can use bookmarks and save disk space): you'd then need four feeds. Even if you and your software can cope, your listeners might not cope with how many orange XML icons they have to choose between.

Another drawback with multiple feeds is that people who could use your torrent feed will be subscribed to your direct download feed, costing you more money than they need to. Meanwhile, people subscribed to your torrent feed will have trouble grabbing episodes if there aren't enough seeders or if they plug into a firewalled network. In short, as well as running multiple feeds you'll also want to use extended feed attributes, the Grumet Kludge and its inverse, or all of the above to make sure that cunning and smart podcatchers can save you money when they can and recover from problems if they have them.

Extended feed attributes

You can use the extended enclosure attributes in the podcast namespace (summarised in this document and fully documented in the podcast namespace specification) to help podcatchers give you the best of all worlds: the greatest number of happy listeners, the smallest bandwidth bill, and only one feed.

If you can't generate the namespaced attributes, or a lot of your listeners' podcatchers don't understand the attributes, you'll either have unhappy listeners or be paying more for bandwidth than you need to depending on how you structure your feed. Fortunately, there's another easy step that can help:

The Grumet Kludge

The Grumet Kludge is for those of you who prefer happy listeners to saving money but still want to save as much money as you can. Adding the extended attributes mentioned above will help even more, as the kludge is somewhat inefficient, but the kludge (invented by Andrew Grumet) has one big advantage: it's really, really simple.

All you have to do is this: publish your response files with the same URL as your audio files, but with .torrent added at the end.

Let's say your audio file is called fnord-20041124.mp3. Cunning clients will try to download fnord-20041124.mp3.torrent first. If that works, they'll use the response file to download your episode via BitTorrent--saving you a small fortune in bandwidth bills. If the BitTorrent download fails for some reason (say, you're out of seeders or they're behind a firewall), they'll give up and download the file directly.

If fnord-20041124.mp3.torrent doesn't exist, cunning clients will make a note and not try the Grumet Kludge for another week.
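For the curious, here's a rough sketch of what a cunning client's side of the kludge might look like. The function names are made up for illustration, and it assumes the enclosure URL has no query string to get in the way of the .torrent suffix.

```python
import time

ONE_WEEK = 7 * 24 * 60 * 60  # seconds; the back-off period described above


def grumet_torrent_url(enclosure_url):
    """Return the URL a cunning client would probe for a response file."""
    return enclosure_url + ".torrent"


def should_try_kludge(last_miss, now=None):
    """Return True unless a probe for the response file failed under a week ago.

    last_miss -- timestamp of the last failed probe, or None if none failed."""
    if last_miss is None:
        return True
    if now is None:
        now = time.time()
    return now - last_miss >= ONE_WEEK
```

A client would call should_try_kludge before each probe and record the time whenever the .torrent URL turns out not to exist.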

The Inverse Grumet Kludge

The Inverse Grumet Kludge is for those of you who prefer saving money to happy listeners. Listeners who can't use BitTorrent will either have to upgrade to a clever podcatcher or use a secondary feed (if you can be bothered publishing one).

For those of you who haven't figured it out just from the name and the scenario, let's continue the previous example:

If you aim url, length and type at fnord-20041124.mp3.torrent, dumb clients will download the BitTorrent response file and then try to get your episode via BitTorrent. That's probably your preferred scenario, because there are a lot of dumb clients out there--every version of every client available up to late November 2004 is a dumb client by this definition, and just those numbers are enough to cause many podcasters severe trouble with their bandwidth bills.

The Inverse Grumet Kludge lets cunning clients deal with BitTorrent failures. They'll strip the .torrent extension and try to fetch fnord-20041124.mp3 directly. Ta-da!
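The fallback is about as simple as it sounds. Here's a minimal sketch, with a hypothetical function name, of how a cunning client might work out the direct URL from a torrent enclosure:

```python
TORRENT_SUFFIX = ".torrent"


def inverse_grumet_direct_url(enclosure_url):
    """Return the direct-download URL implied by a torrent enclosure URL.

    Returns None if the URL doesn't end in .torrent, in which case the
    Inverse Grumet Kludge doesn't apply."""
    if enclosure_url.endswith(TORRENT_SUFFIX) and len(enclosure_url) > len(TORRENT_SUFFIX):
        return enclosure_url[:-len(TORRENT_SUFFIX)]
    return None
```

Returning None rather than guessing keeps the client from hammering your server with requests for files that were never there.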

The Janes Tweak

David "BlogMatrix Jaeger" Janes points out another clever aspect of the Grumet Kludge and its inverse: another hack to podcatcher behaviour would mean that podcasters could make good use of BitTorrent even if they were stuck behind a firewall that stopped BitTorrent. All it'll take is for podcatchers to seed even if their download via BitTorrent didn't succeed.

Stripping away the jargon, what this means is that if you generate your response file (that's the one ending in .torrent) and upload it together with your MP3, here's what'll happen:

  • Your first listener will download the response file.
  • When they try to download via BitTorrent, they'll time out because there won't be any seeds. So, they'll download your MP3 file directly.
  • Once they've finished downloading the MP3 file, they'll start seeding your show.
  • Subsequent listeners will grab the file via BitTorrent rather than directly from your web site.
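From the podcatcher's side, the steps above boil down to something like the sketch below. It's only an illustration: the three callables are hypothetical stand-ins for whatever BitTorrent and HTTP machinery a real podcatcher uses.

```python
def fetch_episode(try_torrent, fetch_direct, seed):
    """Fetch one episode, seeding it however the download succeeded.

    try_torrent  -- callable returning the payload bytes, or None on timeout
    fetch_direct -- callable returning the payload bytes from the web site
    seed         -- callable that starts seeding the given payload
    Returns (payload, source), where source is "torrent" or "direct"."""
    payload = try_torrent()
    source = "torrent"
    if payload is None:
        # No seeds yet (or BitTorrent is blocked): fall back to the web site.
        payload = fetch_direct()
        source = "direct"
    # The tweak: seed even though the BitTorrent download may have failed.
    seed(payload)
    return payload, source
```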

Podcaching

With any luck, you'll soon be able to pretty much sit back and do nothing. If the podcache specification takes off, listeners' podcatchers will automatically consult podcache servers and grab the show from there rather than from your site. Instead of it costing you money, it'll cost them money, and they'll recover that money by... uh...

Okay, nobody has figured out how a podcache might recover its costs. Unlike the boom years there aren't any venture capitalists eager to throw a few million at it in the hope that someone will one day figure it out. Any prospective podcache provider will have the advantage of economy of scale and will be able to use BitTorrent to spread some of the load out amongst the listeners, but it's still far from certain whether or not podcaching is economically viable.

In short: podcaching might turn out to be a brilliant way of saving money that won't require any effort whatsoever, but don't hold your breath.

Summary

  • If you can't tackle generating BitTorrent response files and seeding, you'll need a professional podcast hosting service unless podcaching takes off and saves your skin.
  • Once you've got BitTorrent straightened out, choose whether your priority is happy listeners or saving money.
    • If your priority is happy listeners, serve your audio files directly and apply the Grumet Kludge. Any cunning or smart podcatcher will grab the torrent file and save you some money.
    • If your priority is saving money, serve torrent files out of your feed and apply the Inverse Grumet Kludge so that cunning and smart podcatchers can still download directly if they need to.

How To Get Your Feed Right

Let's lead with some examples. Monkey see; monkey do.

Minimal Example (direct feed with Grumet Kludge for cunning clients)

This is the least you can possibly do and still get adequate behaviour from podcatchers:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>Your podcast's name</title>
        <link>http://your.podcast.web.site/</link>
        <description>A short description of your podcast</description>
        <pubDate>Thu, 4 Nov 2004 08:52:00 +0000</pubDate>
        <item>
            <title>I podcasted!</title>
            <pubDate>Thu, 4 Nov 2004 08:52:00 +0000</pubDate>
            <enclosure 
             url="http://blah.com/blah.mp3" 
             length="102938392" 
             type="audio/mpeg"/>
            <guid>b15e400c8dbd6697f26385216d32a40f</guid>
        </item>
    </channel>
</rss>

Your feed will still be valid RSS if you omit the two pubDate tags and the guid tag, and podcatchers will be able to successfully download your enclosures, but Bad Things will happen later on.

If you'd rather dumb clients grab your enclosures via BitTorrent, you need to make a few changes:

  • url should point to the response file;
  • length should be the length of the response file; and
  • type should be "application/x-bittorrent".
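If you generate your feed with a script, the three changes above might look something like this. It's a sketch only: the function name is made up, it assumes you publish the response file next to the audio with .torrent added to the URL, and it leaves the MIME type for the caller to supply.

```python
def torrent_enclosure_attrs(audio_url, response_file, torrent_type):
    """Return enclosure attributes that point dumb clients at a torrent.

    audio_url     -- URL of the audio file the feed currently serves
    response_file -- bytes of the response file generated for that audio
    torrent_type  -- MIME type to advertise for the response file"""
    return {
        "url": audio_url + ".torrent",      # response file sits beside the audio
        "length": str(len(response_file)),  # size of the response file, not the audio
        "type": torrent_type,
    }
```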

Comprehensive Example (torrent feed, with direct backup for smart clients)

This is hardly comprehensive at all. Can you smell the "UNDER CONSTRUCTION" sign?

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
 xmlns:podcast="http://ipodder.sourceforge.net/docs/podcast.html">
    <channel>
        <title>Your podcast's name</title>
        <link>http://your.podcast.web.site/</link>
        <description>A short description of your podcast</description>
        <pubDate>Thu, 4 Nov 2004 08:52:00 +0000</pubDate>
        <item>
            <title>I podcasted!</title>
            <pubDate>Thu, 4 Nov 2004 08:52:00 +0000</pubDate>
            <enclosure 
             url="http://blah.com/blah.mp3.torrent" 
             length="160452" 
             type="audio/mpeg"
             podcast:expect_md5="b15e400c8dbd6697f26385216d32a40f"
             podcast:expect_length="102938392"
            />
            <guid>b15e400c8dbd6697f26385216d32a40f</guid>
        </item>
    </channel>
</rss>

Feed Elements

Namespace Definitions

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel> ... </channel>
</rss>

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
 xmlns:podcast="http://ipodder.sourceforge.net/docs/podcast.html">
    <channel> ... </channel>
</rss>

Channel Tags

channel title

channel description

channel pubDate

Item Tags

Items must have at least a title or a description. They should also have a pubDate so podcatchers can order playlists appropriately, and a guid so podcatchers can avoid re-downloading items unnecessarily.

If you want to podcast, you also need an enclosure tag.

item title

item description

item pubDate

item guid

If you're generating multiple feeds, make sure that for any item the guid tags are consistent across those feeds.

item podcast:production_notes

Adam and Dave are in sync on namespaced pointers to production notes, and August of the iPodderX team is tuned in also, but they haven't yet tuned us lemon-heads into what they're considering. This attribute could end up called anything. I just hope they don't put it in the enclosure: that'll really muck up the multiple enclosure crowd.

—Garth

enclosure

The RSS spec doesn't explicitly support multiple enclosures per item, but many podcasters have been doing it for a while and permissive feed parsers such as Mark Pilgrim's feedparser.py seem to cope with it.

That said, most podcatchers ignore all but the first enclosure. If you're going to put multiple enclosures in an item, make the first one your audio and use the extras for your production notes or whatever else. Most podcatchers (and RSS aggregators with enclosure support) will at least grab the audio, and some might even grab the others.
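To make the "first enclosure wins" behaviour concrete, here's a small sketch using Python's standard xml.etree.ElementTree module. The function name and the notes.html enclosure are made up for illustration; real podcatchers may well do it differently.

```python
import xml.etree.ElementTree as ET


def primary_enclosure(item):
    """Return the attributes of an item's first enclosure, or None."""
    enclosure = item.find("enclosure")  # find() returns the first match only
    if enclosure is None:
        return None
    return dict(enclosure.attrib)


ITEM_XML = """
<item>
    <title>I podcasted!</title>
    <enclosure url="http://blah.com/blah.mp3" length="102938392" type="audio/mpeg"/>
    <enclosure url="http://blah.com/notes.html" length="2048" type="text/html"/>
</item>
"""

item = ET.fromstring(ITEM_XML)
print(primary_enclosure(item)["url"])  # the audio, because it was listed first
```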

The enclosure tag has three compulsory attributes. We're also trying to shoehorn extra details in there.

David "BlogMatrix Jaeger" Janes has suggested that rather than namespaced podcast:direct_url and podcast:torrent_url we might want to put XHTML link tags in the enclosure, with rel attributes set to "delivery".

Mark Pilgrim's feedparser.py doesn't appear to handle any extra elements under enclosure, though, whether attributes or tags. That shoots us down for the time being. I'll have to investigate.

—Garth

enclosure url

enclosure length

enclosure type

enclosure podcast:expect_length and podcast:expect_md5

The podcast:expect_md5 and podcast:expect_length attributes on enclosure tags let podcatchers make sure they got your enclosure accurately.

Warning

If you're using BitTorrent, podcast:expect_md5 and podcast:expect_length should match the payload, not the BitTorrent response file.

So that corruption of your enclosure can be detected even if the length is correct, put a hexified MD5 digest in the podcast:expect_md5 attribute. You'll know your implementation is correct if your hexified digest for the word "fnord" matches the one in the example below:

<enclosure podcast:expect_md5="b15e400c8dbd6697f26385216d32a40f" />

For what it's worth, the Python code to compute the digest is:

import hashlib

def hexified_digest(blocks):
    """Return a hexified MD5 digest.

    blocks -- a sequence of blocks of data (bytes)."""
    engine = hashlib.md5()
    for block in blocks:
        # Feed the data in block by block so large files needn't fit in memory.
        engine.update(block)
    return engine.hexdigest()

Podcatchers can't rely on your length attribute being correct because too many feeds either leave it out entirely or put the same (incorrect) length on every enclosure. To let them know that you're serious, put another copy in the podcast:expect_length attribute:

<enclosure podcast:expect_length="1875233" length="1875233" />
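On the receiving end, a podcatcher might check a finished download along these lines. This is only a sketch with a hypothetical function name; it assumes the whole payload is in memory and that the attribute values arrive as strings straight from the feed.

```python
import hashlib


def verify_payload(payload, expect_length=None, expect_md5=None):
    """Check downloaded bytes against the podcast:expect_* attributes.

    Pass None for an attribute the feed didn't supply.
    Returns a list of problems, which is empty if everything matches."""
    problems = []
    if expect_length is not None and len(payload) != int(expect_length):
        problems.append("length mismatch")
    if expect_md5 is not None:
        actual = hashlib.md5(payload).hexdigest()
        if actual != expect_md5.strip().lower():
            problems.append("md5 mismatch")
    return problems
```

For a BitTorrent download, payload here means the audio file itself, not the response file.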

enclosure podcast:torrent_url

enclosure podcast:direct_url and podcast:direct_type

enclosure podcast:enclosure_guid

Glossary

Podcast
An RSS feed whose items contain enclosures pointing to audio files, which subscribers listen to as if they were a downloadable radio broadcast.
Podcaster
You, assuming you want to produce a podcast.
Podcatcher
Software that downloads podcasts and puts them on a user's media player.
Podsafe
An adjective describing music you can play on your podcast without being sued.
Podcache
An automated podcast caching service. See the podcache namespace specification for details. Note: nobody has built one yet.
BitTorrent
...
Response file
...
Torrent feed
...

FAQ

Q: My MP3 file is full of clicks and I need to replace it. What do I do?

A: Well...