
Re: [TV] prepare for the deluge



In list.comp.tv, Simon wrote:
> as i'm sure you're aware, ananova pulled the plug on their xmltv-friendly tv
> listings. this leaves your site as one of the few (if not the only) sites that
> provide good quality tv listings for the uk. despite the fact that i live in
> ireland, this data is very important to my experiments with mythtv.

Yes, a few people have been asking recently about regional variations
again. Ananova's decision will probably also mean that /tv2/ becomes
the main site, so I'll have to prioritise & implement the outstanding
features.

> i know that you are aware of mythtv as i've read the mailing list archive.
> 
> right now for xmltv users in the uk and ireland there are a number of choices:
> 
> - get the data from the bbc site using tv_grab_uk_rt
>   to be honest i don't think this is a runner

Agreed.

> - get the data from bleb.org
>   this seems very promising but i suspect if all the mythtv users suddenly
>   move over to your site then you will have a bit of a problem

There are two alternatives:

    1) A new "tv_grab_uk_bleb" is written which takes the XML available
       on bleb.org and transcodes it into XMLTV. This could use a
       (not yet existing) zipped version of the current listings so that
       only one HTTP request is required.

or  2) A complete (perhaps compressed) XMLTV file is made available,
       perhaps generated programmatically using the transcoding CGI
       previously discussed. "tv_grab_uk_bleb" then basically just
       downloads this file.
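
To make alternative 1 concrete, here is a minimal Python sketch of the transcoding step. It assumes a zip holding one XML file per channel. The input layout it parses is an assumption, not the real bleb.org format: a <channel id="..." date="YYYY-MM-DD"> root with <programme> children carrying <title>, <start>/<end> as HHMM, and an optional <desc>. The output follows the standard XMLTV shape, with <tv> containing all <channel> elements followed by the <programme> elements.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

XMLTV_TIME = "%Y%m%d%H%M%S"


def _xmltv_times(date, start, end):
    """Turn '2003-05-20', '2330', '0030' into XMLTV start/stop strings.

    A stop time at or before the start is assumed to fall on the next day.
    Timezone offsets are left out of this sketch."""
    s = datetime.strptime(f"{date} {start}", "%Y-%m-%d %H%M")
    e = datetime.strptime(f"{date} {end}", "%Y-%m-%d %H%M")
    if e <= s:
        e += timedelta(days=1)
    return s.strftime(XMLTV_TIME), e.strftime(XMLTV_TIME)


def transcode_zip(zip_bytes):
    """Transcode every channel file in a zip of (hypothetical) bleb-style XML
    into a single XMLTV document, returned as a string."""
    tv = ET.Element("tv", {"generator-info-name": "tv_grab_uk_bleb sketch"})
    programmes = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".xml"):
                continue
            root = ET.fromstring(zf.read(name))
            chan_id, date = root.get("id"), root.get("date")
            channel = ET.SubElement(tv, "channel", {"id": chan_id})
            ET.SubElement(channel, "display-name").text = chan_id
            for prog in root.findall("programme"):
                start, stop = _xmltv_times(date, prog.findtext("start"), prog.findtext("end"))
                programmes.append((chan_id, start, stop,
                                   prog.findtext("title"), prog.findtext("desc")))
    # XMLTV puts all <channel> elements before any <programme> element.
    for chan_id, start, stop, title, desc in programmes:
        p = ET.SubElement(tv, "programme", {"start": start, "stop": stop, "channel": chan_id})
        ET.SubElement(p, "title", {"lang": "en"}).text = title
        if desc:
            ET.SubElement(p, "desc", {"lang": "en"}).text = desc
    return ET.tostring(tv, encoding="unicode")
```

Under alternative 2, this transcoding would run once on the server, and the grabber would only download and decompress the result.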

> - use your code and run it locally
>   you talk about making your code available so that we can grab it ourselves.
>   i for one, would be very eager to have a look at it. rather than generating
>   your xml i would like to feed it directly into my mysql database.

The way I'd be keen to do this is to have a fetching module, TV::Fetch::XML,
which downloads the data from bleb.org; this means that the individual
channel sites aren't being hammered by X users.
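
TV::Fetch::XML would presumably be a Perl module. The Python below only illustrates the idea: every request goes to one central site, never to the channel sites. The base URL and the /<day>/<channel>.xml path layout are hypothetical. The on-disk cache, keyed on the absolute date, is an added assumption so that re-running the grabber on the same day costs no further requests.

```python
import datetime
import os
import urllib.request

# Hypothetical URL layout; the real paths on bleb.org may differ.
BASE_URL = "http://www.bleb.org/tv/data"


def _http_get(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_listing(channel, day, cache_dir, get=_http_get, today=None):
    """Return raw listings XML for one channel, `day` days from today.

    Data comes only from the central site (BASE_URL). Results are cached on
    disk under the absolute date, so a second call for the same channel and
    date is served locally without another HTTP request."""
    date = (today or datetime.date.today()) + datetime.timedelta(days=day)
    path = os.path.join(cache_dir, f"{channel}-{date.isoformat()}.xml")
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return fh.read()
    data = get(f"{BASE_URL}/{day}/{channel}.xml")
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return data
```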

> - build a distributed network of grabbers
>   this would spread the load. as more sites joined this network, maybe we
>   could add new sources of data and build up a very high quality corpus of
>   listings data

That's an interesting one. Perhaps another alternative would be mirrors
of the XML data? This has the advantage of keeping the number of
times each site is scraped low, while also avoiding any bandwidth
problems on bleb.org. A central mirror list could be held at a known
URL, and a random (or nearby) mirror used by each user.
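
A client of such a mirror list might look like the minimal sketch below. It assumes a plain-text list format: one mirror base URL per line, with blank lines and "#" comments ignored. The download function is passed in by the caller. Mirrors are tried in random order, and any that fail are skipped. Choosing a geographically "close" mirror is not attempted here.

```python
import random


def parse_mirror_list(text):
    """One mirror base URL per line; blank lines and '#' comments are ignored
    (an assumed format, not an existing one)."""
    return [line.strip() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def fetch_from_mirrors(mirrors, path, get, rng=random):
    """Try the mirrors in random order and return the first successful download.

    Random ordering spreads users across mirrors; falling through on OSError
    (which includes urllib's URLError) means one dead mirror doesn't break
    the grabber."""
    order = list(mirrors)
    rng.shuffle(order)
    errors = []
    for base in order:
        try:
            return get(base.rstrip("/") + "/" + path.lstrip("/"))
        except OSError as exc:
            errors.append((base, exc))
    raise RuntimeError(f"all {len(order)} mirrors failed: {errors}")
```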

TBH, at the moment I think there's plenty of bandwidth to spare. If usage
gets excessive, the only alternative will be to make the XML available
only through an interface where you specify:

    * Channels (eg. bbc1, bbc2, itv1)
    * Days (eg. -1, 0, 1, 2)
    * Format (eg. "bleb" XML, XMLTV)
    * Compression (eg. zip, tar.gz/bz2)

Then everyone can do something like:

    wget "http://bleb.org/tv/data/listings?channels=bbc1,bbc2,itv1&days=6&file=zip"
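
A grabber could build that query URL programmatically, as in this sketch. The endpoint and the channels/days/file parameters come from the example URL. The "format" parameter name is hypothetical, since no name for it was given. Commas are left unescaped so the URL matches the hand-written one.

```python
from urllib.parse import urlencode

# Endpoint and parameter names from the example; "format" is hypothetical.
LISTINGS_URL = "http://bleb.org/tv/data/listings"


def listings_url(channels, days, compression="zip", fmt=None):
    """Build a listings query URL for the proposed bleb.org interface."""
    params = {"channels": ",".join(channels), "days": days, "file": compression}
    if fmt is not None:
        params["format"] = fmt
    # safe="," keeps the channel list readable (bbc1,bbc2 rather than bbc1%2Cbbc2)
    return LISTINGS_URL + "?" + urlencode(params, safe=",")
```

When such a URL is pasted into a shell, it needs quoting, because "&" would otherwise background the command.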
 
> i would appreciate feedback on these ideas

They're appreciated. I look forward to your comments :-)

Cheers,

Andrew

-- 
Andrew Flegg -- mailto:andrew@xxxxxxxx  |  http://www.bleb.org/