Re: [TV] prepare for the deluge
In list.comp.tv, Simon wrote:
> as i'm sure you're aware, ananova pulled the plug on their xmltv friendly tv
> listings. this leaves your site as one of the few (if not the only) sites that
> provide good quality tv listings for the uk. despite the fact that i live in
> ireland, this data is very important to my experiments with mythtv.
Yes, a few people have been asking recently about regional variations
again. Ananova's decision will probably mean that /tv2/ will also become
the main site, so I'll have to prioritise & implement the outstanding
features.
> i know that you are aware of mythtv as i've read the mailing list archive.
>
> right now for xmltv users in the uk and ireland there are a number of choices:
>
> - get the data from the bbc site using tv_grab_uk_rt
> to be honest i don't think this is a runner
Agreed.
> - get the data from bleb.org
> this seems very promising but i suspect if all the mythtv users suddenly
> move over to your site then you will have a bit of a problem
There are two alternatives:
1) A new "tv_grab_uk_bleb" is written which takes the XML available
on bleb.org and transcodes it into XMLTV. This could use a
theoretically zipped version of the current listings so that only
one HTTP request was required.
or 2) A complete (perhaps compressed) XMLTV file was available -
perhaps generated programmatically using the transcoding CGI
previously discussed. "tv_grab_uk_bleb" basically then downloads
this file.
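To make option 1 a bit more concrete, here is a rough Python sketch of the transcoding step. The input layout (a <channel> element with "id" and "date" attributes, and <programme> children holding <title>, <start> and <end>) is an assumption made for illustration, not a description of the real bleb.org XML, and the "+0000" offset is a placeholder. Only the output side follows the XMLTV conventions of <tv>, <channel> and <programme start=... stop=... channel=...>.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Hypothetical input: element and attribute names are assumptions,
# not the actual bleb.org feed format.
SAMPLE = """<channel id="bbc1" date="20/04/2003">
  <programme><title>News</title><start>1800</start><end>1830</end></programme>
  <programme><title>Late Film</title><start>2330</start><end>0115</end></programme>
</channel>"""


def _at(day, hhmm):
    """Combine a date with an 'HHMM' string."""
    return day.replace(hour=int(hhmm[:2]), minute=int(hhmm[2:]))


def _stamp(moment, offset):
    """Format a datetime as an XMLTV timestamp, e.g. '20030420180000 +0000'."""
    return moment.strftime("%Y%m%d%H%M%S") + " " + offset


def to_xmltv(listing_xml, offset="+0000"):
    """Convert one channel's listings for one day into an XMLTV string.

    The UTC offset is passed in because the assumed input carries none.
    """
    src = ET.fromstring(listing_xml)
    channel_id = src.get("id")
    day = datetime.strptime(src.get("date"), "%d/%m/%Y")

    tv = ET.Element("tv")
    channel = ET.SubElement(tv, "channel", id=channel_id)
    ET.SubElement(channel, "display-name").text = channel_id

    for prog in src.findall("programme"):
        start = _at(day, prog.findtext("start"))
        stop = _at(day, prog.findtext("end"))
        if stop <= start:
            # The programme runs past midnight, so it ends the next day.
            stop += timedelta(days=1)
        out = ET.SubElement(
            tv,
            "programme",
            start=_stamp(start, offset),
            stop=_stamp(stop, offset),
            channel=channel_id,
        )
        ET.SubElement(out, "title").text = prog.findtext("title")

    return ET.tostring(tv, encoding="unicode")


if __name__ == "__main__":
    print(to_xmltv(SAMPLE))
```

A real grabber would also need to handle programmes that start after midnight but are listed under the previous day, which this sketch does not attempt.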
> - use your code and run it locally
> you talk about making your code available so that we can grab it ourselves.
> i for one, would be very eager to have a look at it. rather than generating
> your xml i would like to feed it directly into my mysql database.
The way I'd be keen to do this is to have a fetching module, TV::Fetch::XML,
which downloads the data from bleb.org; this means that the individual
channel sites aren't being hammered by X users.
> - build a distributed network of grabbers
> this would spread the load. as more sites joined this network, maybe we
> could add new sources of data and build up a very high quality corpus of
> listings data
That's an interesting one. Perhaps another alternative would be mirrors
of the XML data? This has the advantage of keeping the number of
times a site is scraped low, while also preventing any bandwidth
problems on bleb.org. A central mirror list could be held at a known URL
and a random/close one used for each user.
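As a sketch of the client side of the mirror idea, something like the following Python would do. The list format (one base URL per line, "#" for comments) and the mirror host names are invented for illustration; only random selection is shown, since picking a "close" mirror would need extra information such as measured latency.

```python
import random

# Hypothetical mirror list; in practice this text would be downloaded
# from the agreed central URL. The hosts below are made up.
MIRROR_LIST = """\
# listings mirrors, one base URL per line
http://mirror1.example.org/tv/data/
http://mirror2.example.net/tv/data/
"""


def parse_mirrors(text):
    """Return the mirror URLs, skipping blank lines and '#' comments."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


def pick_mirror(mirrors, rng=random):
    """Choose one mirror at random so load spreads across all of them."""
    if not mirrors:
        raise ValueError("mirror list is empty")
    return rng.choice(mirrors)


if __name__ == "__main__":
    print(pick_mirror(parse_mirrors(MIRROR_LIST)))
```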
TBH, at the moment I think there's plenty of bandwidth to spare. If it
gets too excessive then the only alternative will be to only make the
XML available through an interface where you specify:
* Channels (eg. bbc1, bbc2, itv1)
* Days (eg. -1, 0, 1, 2)
* Format (eg. "bleb" XML, XMLTV)
* Compression (eg. zip, tar.gz/bz2)
Then everyone can do something like:
wget "http://bleb.org/tv/data/listings?channels=bbc1,bbc2,itv1&days=6&file=zip"
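A grabber could build the same request in code rather than by hand. This is a minimal Python sketch, assuming the proposed interface ends up with the path and the parameter names "channels", "days" and "file" used in the example; none of that exists yet, so treat every name here as provisional.

```python
from urllib.parse import urlencode

# Taken from the example request; this is a proposed interface,
# not a confirmed endpoint.
BASE_URL = "http://bleb.org/tv/data/listings"


def listings_url(channels, days, file_format="zip"):
    """Build the listings URL for the given channels, day count and format."""
    query = urlencode(
        {
            "channels": ",".join(channels),
            "days": days,
            "file": file_format,
        },
        safe=",",  # keep the commas between channel names readable
    )
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(listings_url(["bbc1", "bbc2", "itv1"], 6))
```

Building the query string this way also sidesteps shell quoting problems with "&".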
> i would appreciate feedback on these ideas
They're appreciated. I look forward to your comments :-)
Cheers,
Andrew
--
Andrew Flegg -- mailto:andrew@xxxxxxxx | http://www.bleb.org/