Re: [TV] Source code, GPL and licenses...
OK, so I pulled and scanned the code (tv-v2.00-20040126.tar.gz).
What I'm not seeing is the grabber source. Any suggestions?
Looking at dgc2xml.pl, I'm guessing that you have URLs for Digiguide
data for some sites.
I've also gone back through the archives and seen the 'mirroring'
discussion.
I'd like to add some thoughts in this area (bear in mind I'm thinking primarily
of PVR use here, although all the Peers could act as bleb www and XML mirrors too).
Listings data changes after publication and before it goes 'live'.
I think there are 3 reasonable stages:
14 days away : Fuzzy data, quite a few unknown slots. Useful advance
notice of some things.
7 days away : Pretty solid. Ready to be paper published.
1 day away : Last-minute schedule changes.
I call this the +1/+7/+14 approach.
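As a rough illustration of +1/+7/+14, here is a minimal sketch assuming each listing date is simply re-grabbed when it is 14, 7 and 1 days away. The `STAGES` table and `dates_to_grab()` are hypothetical names, not anything in the tv-v2.00 tarball.

```python
from datetime import date, timedelta
from typing import Dict

# Hypothetical: look-ahead offsets (days) for the +1/+7/+14 approach,
# labelled with the stages described above.
STAGES = {14: "fuzzy advance notice", 7: "ready for paper publication", 1: "last-minute changes"}


def dates_to_grab(today: date) -> Dict[date, str]:
    """Listing dates worth (re)grabbing on `today`, mapped to the stage each is in."""
    return {today + timedelta(days=offset): stage for offset, stage in STAGES.items()}


if __name__ == "__main__":
    for day, stage in sorted(dates_to_grab(date(2004, 1, 26)).items()):
        print(day.isoformat(), stage)
```

Run on 2004-01-26 this lists 2004-01-27, 2004-02-02 and 2004-02-09, so each day's Grab set touches three listing dates rather than all 14.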
Terminology
========
Source : master site for Digiguide/XML/HTML listings data.
Network : An ordered list of Peers offering the bleb.org service
Master : www.bleb.org
Peer : www.another.site in the Network
Grab : A grab from the Source site
Scheduled Grab : Each channel has a schedule for obtaining data from
the Source.
Client : application (not a browser) using bleb data (e.g. xmltv's
tv_grab_uk_bleb)
Use : a Client grab or use of the service.
Objective
======
Service remains available if any Network member fails.
Reasonable Client load balancing
Minimal load on Sources
Occasional
High level Peer behaviour
=================
A Peer regularly [on the hour + 2 mins * position in the Network list]
connects to the top entry in the Network list to synchronise listings
data and code [rsync].
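The staggered sync time could be computed like this. It's a sketch assuming the Master is position 0 and there are fewer than 30 members (so no offset reaches the next hour); `next_sync()` is a made-up name.

```python
from datetime import datetime, timedelta

SYNC_SPACING = timedelta(minutes=2)  # "2 mins * position in the Network list"


def next_sync(now: datetime, position: int) -> datetime:
    """Next sync time for the member at `position` (assumed: 0 = Master)."""
    if not 0 <= position < 30:
        # Offsets of 60+ minutes would spill into the next hour's slots.
        raise ValueError("this sketch assumes fewer than 30 Network members")
    slot = now.replace(minute=0, second=0, microsecond=0) + SYNC_SPACING * position
    if slot <= now:  # this hour's slot has already passed, so use next hour's
        slot += timedelta(hours=1)
    return slot
```

For example, at 10:03 the Peer in position 1 would next sync at 11:02. Presumably the stagger is there so the Peers don't all hit the top entry at once.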
If the connection fails, the Network member immediately tries the next
entry in the Network list, and so on, until the next entry points to
itself; in that case it checks whether a Scheduled Grab is due and, if so,
tries to Grab from the Source.
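The failover walk down the Network list might look roughly like this. `try_sync` and `grab_if_due` are hypothetical hooks (the first could wrap an rsync run and report success); nothing here is taken from the existing code.

```python
from typing import Callable, Optional, Sequence


def sync_or_grab(network: Sequence[str], me: str,
                 try_sync: Callable[[str], bool],
                 grab_if_due: Callable[[], None]) -> Optional[str]:
    """Return the host we synchronised from, or None if we fell back to the Source."""
    for host in network:
        if host == me:
            # Every entry above us is unreachable (or we are the Master):
            # check the Scheduled Grabs and Grab from the Source if due.
            grab_if_due()
            return None
        if try_sync(host):  # hypothetical hook, e.g. wrapping an rsync run
            return host
    raise ValueError(f"{me!r} is not in the Network list")
```

Note that the Master, being the first entry, reaches `host == me` straight away, which gives the Master behaviour below for free.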
The Master, being at the top of the list, immediately checks whether a
Scheduled Grab is due and tries to Grab from the Source.
The grabbed data is stamped so that later 'is this Scheduled Grab due?'
checks fail, i.e. the same Grab isn't repeated.
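One way to do the stamping, assuming a per-channel stamp file holding the time of the last successful Grab (the file layout and function names are hypothetical):

```python
from datetime import datetime, timedelta
from pathlib import Path


def is_grab_due(stamp: Path, interval: timedelta, now: datetime) -> bool:
    """A Scheduled Grab is due if it never ran or its stamp is older than `interval`."""
    if not stamp.exists():
        return True
    last = datetime.fromisoformat(stamp.read_text().strip())
    return now - last >= interval


def mark_grabbed(stamp: Path, now: datetime) -> None:
    """Stamp the data so is_grab_due() fails until the next interval comes round."""
    stamp.write_text(now.isoformat())
```

If the stamp files live alongside the listings, rsync carries them to the Peers too, so a Peer that takes over should see the same "already grabbed" state as the Master.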
If any Grab fails, a notification is sent.
Discussion
This is a fairly simple distribution approach - less to go wrong.
Clock sync between the Network servers isn't critical (but NTP is always
good).
Peers will never attempt a Grab unless the Master and all intervening
Peers are unreachable.
When any Network member returns it should not respond to requests until
it has synchronised with a superior (or the Sources in the case of the
Master).
There's one obvious timing issue: we need to understand how long a full
set of Grabs will take, allowing for parallelism and perhaps the +1/+7/+14
approach.
Worst case is the Master going down just before a Grab set is due to
complete, with the next Peer starting the Grab very late and causing eager
Clients to effectively miss a day's update. Clients could notice this and
compensate the next day.
Note that Grabs are retried after an hour, which is a problem if a Grab
takes over an hour: that would require some locking. If the grabbing
server fails while holding the lock, the Peer taking over should ignore
the lock.
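A simple version of that lock, as a sketch only: it assumes a lock file containing the owner's hostname and a hypothetical `is_alive()` reachability check. The check-then-write is not atomic, which seems tolerable here because normally only one server Grabs at a time.

```python
from pathlib import Path
from typing import Callable


def acquire_grab_lock(lock: Path, me: str, is_alive: Callable[[str], bool]) -> bool:
    """Take the Grab lock unless a different, still-reachable server holds it."""
    if lock.exists():
        owner = lock.read_text().strip()
        if owner != me and is_alive(owner):
            return False  # a live server is mid-Grab; don't start a second one
        # Otherwise the owner is us (a retry) or has failed: ignore the stale lock.
    lock.write_text(me)
    return True


def release_grab_lock(lock: Path, me: str) -> None:
    """Remove the lock, but only if we are the one holding it."""
    if lock.exists() and lock.read_text().strip() == me:
        lock.unlink()
```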
High level Client behaviour
==================
Initially Clients pull the Network list from the Master; thereafter they
select a machine from the Network at random and use that. If a
connection times out then they try a different Peer. The Network list is
updated each time the Client uses the bleb service.
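The Client side could be as small as this sketch. `fetch` is a hypothetical hook that returns the listings plus the current Network list, or raises `TimeoutError`; the real tv_grab_uk_bleb may work quite differently.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical fetch hook: returns (listings, current Network list) or raises TimeoutError.
Fetch = Callable[[str], Tuple[bytes, List[str]]]


def use_bleb(network: Sequence[str], fetch: Fetch,
             rng: Optional[random.Random] = None) -> Tuple[bytes, List[str]]:
    """One Use: try Network members in random order until one answers."""
    rng = rng or random.Random()
    hosts = list(network)
    rng.shuffle(hosts)  # random choice = simple client-side load balancing
    for host in hosts:
        try:
            listings, new_network = fetch(host)
        except TimeoutError:
            continue  # temporary Peer failure: just try another Peer
        return listings, new_network  # caller stores the refreshed Network list
    raise RuntimeError("no Network member reachable")
```

For the very first Use, a Client would call it with `["www.bleb.org"]` as the Network list and keep whatever list comes back.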
Discussion
Load balancing is done simply by the client.
Temporary Peer failure is hardly noticed.
Out-of-service Peers can be removed from the Network list and will cause
only minor inconvenience until the first Use after their removal (nothing
stops the Client from blacklisting a Peer). If the Master goes down for
good (for whatever reason), the next entry in the Network list can issue
a new Network list.
Service Uses should probably be scheduled to allow time for the Sources to
be consulted; maybe 3am onwards?
[I wrote this and it's true, so I'll leave it in, but on review it seems
like overkill:] From a 'security' point of view, any Peer can permanently
hijack the Clients using it by providing a bad Network file. If we care,
we could digitally sign the file and give trusted admins access to the
signing key. Clients would then refuse to honour a bad Network file.
This is just a start - I've no doubt made mistakes :)
David