
Re: [TV] prepare for the deluge

To: tvdevel@xxxxxxxx
Subject: Re: [TV] prepare for the deluge
From: Andrew Flegg <andrew@xxxxxxxx>
Date: Fri, 23 Jan 2004 18:31:28 +0000
Delivery-date: Sun, 25 Jan 2004 18:35:05 +0000
Envelope-to: tvdevel@xxxxxxxxxxxxx
In-reply-to: <200401231751.09451.simon@xxxxxxxx>
References: <200401231751.09451.simon@xxxxxxxx>
Resent-date: Sun, 25 Jan 2004 18:34:31 +0000
Resent-from: andrew@xxxxxxxx
Resent-message-id: <E1Akp5b-0001sE-00@xxxxxxxxxxxxxxxxxxxxx>
Resent-to: tvdevel@xxxxxxxx
User-agent: slrn/0.9.7.4 (Linux)

In list.comp.tv, Simon wrote:
> as i'm sure you're aware, ananova pulled the plug on their xmltv friendly tv
> listings. this leaves your site as one of the few sites (if not the only one)
> that provide good quality tv listings for the uk. despite the fact that i live
> in ireland, this data is very important to my experiments with mythtv.

Yes, a few people have been asking recently about regional variations
again. Ananova's decision will probably mean that /tv2/ will also become
the main site, so I'll have to prioritise & implement the outstanding
features.

> i know that you are aware of mythtv as i've read the mailing list archive.
> 
> right now for xmltv users in the uk and ireland there are a number of choices:
> 
> - get the data from the bbc site using tv_grab_uk_rt
>   to be honest i don't think this is a runner

Agreed.

> - get the data from bleb.org
>   this seems very promising but i suspect if all the mythtv users suddenly
>   move over to your site then you will have a bit of a problem

There are two alternatives:

    1) A new "tv_grab_uk_bleb" is written which takes the XML available
       on bleb.org and transcodes it into XMLTV. This could use a
       (currently hypothetical) zipped version of the current listings
       so that only one HTTP request is required.

or  2) A complete (perhaps compressed) XMLTV file is made available -
       perhaps generated programmatically using the transcoding CGI
       previously discussed. "tv_grab_uk_bleb" then basically just
       downloads this file.
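
As a rough sketch of what the transcoding step in option 1 could look
like (in Python rather than as a real XMLTV grabber): the input layout
below, a <channel> element with "id" and "date" attributes holding
<programme> children with <title>, <start> and <end>, is an assumption
made for illustration, and to_xmltv() is a made-up name. The output uses
the usual XMLTV shape: a <tv> root, a <channel> entry, and <programme>
elements with start/stop/channel attributes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Assumed input layout, for illustration only.
SAMPLE = """<channel id="bbc1" date="23/01/2004">
  <programme>
    <title>News</title>
    <start>1800</start>
    <end>1830</end>
  </programme>
  <programme>
    <title>Late Film</title>
    <start>2330</start>
    <end>0110</end>
  </programme>
</channel>"""


def _clock(day, hhmm):
    """Combine a date with an 'HHMM' string."""
    return day.replace(hour=int(hhmm[:2]), minute=int(hhmm[2:]))


def _stamp(moment):
    """Format a datetime as an XMLTV timestamp (timezone omitted here)."""
    return moment.strftime("%Y%m%d%H%M%S")


def to_xmltv(source_xml):
    """Transcode one channel/day document (assumed layout) into XMLTV."""
    src = ET.fromstring(source_xml)
    channel_id = src.get("id")
    day = datetime.strptime(src.get("date"), "%d/%m/%Y")

    tv = ET.Element("tv")
    channel = ET.SubElement(tv, "channel", id=channel_id)
    ET.SubElement(channel, "display-name").text = channel_id

    for prog in src.findall("programme"):
        start = _clock(day, prog.findtext("start"))
        stop = _clock(day, prog.findtext("end"))
        if stop <= start:
            # The programme runs past midnight, so it ends the next day.
            stop += timedelta(days=1)
        out = ET.SubElement(tv, "programme", start=_stamp(start),
                            stop=_stamp(stop), channel=channel_id)
        ET.SubElement(out, "title").text = prog.findtext("title")

    return ET.tostring(tv, encoding="unicode")


if __name__ == "__main__":
    print(to_xmltv(SAMPLE))
```

This only handles an end time that wraps past midnight; a programme
that itself starts after midnight but is listed under the previous day
would need extra care, as would timezone offsets.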

> - use your code and run it locally
>   you talk about making your code available so that we can grab it ourselves.
>   i for one, would be very eager to have a look at it. rather than generating
>   your xml i would like to feed it directly into my mysql database.

The way I'd be keen to do this is to have a fetching module,
TV::Fetch::XML, which downloads the data from bleb.org. That way the
individual channel sites aren't being hammered by X users.

> - build a distributed network of grabbers
>   this would spread the load. as more sites joined this network, maybe we
>   could add new sources of data and build up a very high quality corpus of
>   listings data

That's an interesting one. Perhaps another alternative would be mirrors
of the XML data? This has the advantage of keeping the number of
times a site is scraped low, while also avoiding any bandwidth problems
on bleb.org. A central mirror list could be held at a known URL, and
each user would then use a random or nearby mirror.
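
A minimal sketch of the client side of the mirror idea, assuming the
central list is a plain-text file with one mirror base URL per line and
"#" for comments. That file format, the example.org hostnames and the
function names are all made up for illustration. It only covers the
"random" choice; picking a "close" mirror would need some extra measure
such as latency.

```python
import random

# Hypothetical contents of the central mirror list.
MIRROR_LIST = """\
# TV listings mirrors
http://mirror1.example.org/tv/data/
http://mirror2.example.org/tv/data/

http://mirror3.example.org/tv/data/
"""


def parse_mirror_list(text):
    """Return the mirror URLs, skipping blank lines and '#' comments."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            mirrors.append(line)
    return mirrors


def pick_mirror(mirrors, rng=random):
    """Pick one mirror uniformly at random to spread the load."""
    if not mirrors:
        raise ValueError("mirror list is empty")
    return rng.choice(mirrors)


if __name__ == "__main__":
    print(pick_mirror(parse_mirror_list(MIRROR_LIST)))
```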

TBH, at the moment I think there's plenty of bandwidth to spare. If
usage gets excessive, the alternative will be to make the XML available
only through an interface where you specify:

    * Channels (eg. bbc1, bbc2, itv1)
    * Days (eg. -1, 0, 1, 2)
    * Format (eg. "bleb" XML, XMLTV)
    * Compression (eg. zip, tar.gz/bz2)

Then everyone can do something like:

    wget 'http://bleb.org/tv/data/listings?channels=bbc1,bbc2,itv1&days=6&file=zip'
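
For a grabber that builds this URL itself, something along these lines
would do. It is only a sketch: the interface is a proposal at this
point, so the path and the "channels", "days" and "file" parameter names
are simply copied from the wget example, and listings_url() is a made-up
helper name.

```python
from urllib.parse import urlencode

# Path copied from the wget example; the interface itself is only proposed.
BASE_URL = "http://bleb.org/tv/data/listings"


def listings_url(channels, days, file_format):
    """Build the request URL for the proposed listings interface."""
    query = {
        "channels": ",".join(channels),
        "days": days,
        "file": file_format,
    }
    # safe="," keeps the commas in the channel list readable in the URL.
    return BASE_URL + "?" + urlencode(query, safe=",")


if __name__ == "__main__":
    print(listings_url(["bbc1", "bbc2", "itv1"], 6, "zip"))
```

Building the URL in code also sidesteps shell quoting: on a command
line, an unquoted "&" would be taken as "run in the background".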

> i would appreciate feedback on these ideas

They're appreciated. I look forward to your comments :-)

Cheers,

Andrew

-- 
Andrew Flegg -- mailto:andrew@xxxxxxxx  |  http://www.bleb.org/

References:
- prepare for the deluge
  - From: Simon Kenyon
