Re: [TV] XMLTV and PHP - written a parser?
Thanks for that, that's perfect.
Incidentally, it displays as inline text in Gmail (indeed, there's no
attachment showing, that's how clever Gmail is) - but doubtless it
would have opened in SubEthaEdit on this Apple Mac.
Windows? Pah!
On Sun, 02 Jan 2005 23:33:47 +0000, simon <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> James,
>
> The perl source is attached. It was developed and is run on a linux system
> hence the lack of file extension. I find that wordpad reads the file well
> under windows. To give you a starter, the rough logic is as follows;
>
> - Connect to the database
> - Delete all previous records to clean the database before this load
> - Read a configuration file which is for my application (sets lead in, lead
> out, etc for the recordings)
> - Open and process previously downloaded XMLTV files writing various
> records either directly from the XMLTV listings or derived for my own
> application
>
> Hope this helps and good luck with the project.
>
> Regards
>
> Simon
>
>
> James Cridland wrote:
> Simon, that's very kind - I know little perl, but it might be a good start,
> as you say... On Sun, 02 Jan 2005 20:32:00 +0000, simon
> <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi James, As part of my home grown PVR I've written a bit of perl code that
> loads a mysql database from the XMLTV files downloaded from the Radio Times
> site. It's not terribly generic since I too am intrinsically lazy and wanted
> it only for myself ...(path of least resistance) but it does parse the XMLTV
> files and load a database. It should work equally well with any SQL based
> database although I've never tried it. If this sounds like a good starting
> point for what you want then drop me a line and I'll gladly donate it.
> Happy New Year Simon James Cridland wrote:
> Hello. I'm planning to use bleb.org's data, with others, to do a few things
> at www.mediauk.com - not just TV schedules, but to build a database of
> television programmes and add to this database with more information from
> other sources.
> I'd like to import the XMLTV format, primarily because other broadcasters
> may use that too - and, where it exists, I'll take it direct from them. Now,
> being intrinsically lazy... has anyone written a simple importer for the
> bleb.org files (and I'm thinking particularly of the XMLTV files)? I'm happy
> to put in the spadework, but if someone's already invented that wheel, I'd
> prefer not to have to reinvent it, if at all possible. And, as (I think) the
> first poster to this list in 2005, Happy New Year.
> ----------------------------------------------------------------------
> Distributed to the bleb.org/tv developer list. Archive available at:
> http://www.bleb.org/tv/maillist/ To unsubscribe, send 'UNSUBSCRIBE
> james.cridland@xxxxxxxxx' to mailto:tvdevel-request@xxxxxxxxx If you have
> any problems please contact mailto:listmaster@xxxxxxxx
> #!/usr/bin/perl
> #
> # TV Recording - Listings DB load Interface
> #
> # Runs immediately after the XML TV listings have been downloaded. Parses
> # the XML listings files and loads them into a database which is used as a
> # central repository for listing on the web interface and holding the
> # recording information.
> #
> # Version Change
> # 0.0 Start of the script
> # 0.1 Tidied up the code to implement strict
> # 2.0 Implemented XMLTV format input file with multi day capability
> #     and lots more fields
> #
> use XML::Twig;
> use Time::Local;
> use DBI;
> use strict;
> use vars qw(
> $dsn
> $db_user_name
> $db_password
> $dbh
> $listconf
> $channel_file
> $cf_leadin
> $cf_leadout
> @listings
> @channtitles
> $configuration
> %CHANHASH
> $xml_file
> $rec_count
> $dbg
> $timenow
> $sixdays
> );
> $|=1;
> $dbg="n"; # Set to "y" to dump records as they are loaded
> $configuration="/etc/gluepvr/gluepvr.conf";
> $dsn='DBI:mysql:tvrec:localhost';
> $db_user_name='apache'; # Webserver user name
> $db_password='xxxxx'; # Webserver password
> $dbh = DBI->connect($dsn,$db_user_name,$db_password)
> or die "Can't connect: ", $DBI::errstr;
> $dbh->{RaiseError} = 1;
> $rec_count=0;
>
> #------------------------------------------------------------------------
> # MAIN logic
> #------------------------------------------------------------------------
> my $sqldel = "DELETE FROM listings"; # Delete everything from the listings table
> my $sth = $dbh->prepare($sqldel); # Prepare and execute the SQL
> $sth->execute;
> $sth->finish; # Release the statement handle
>
> $timenow = time(); # Get the current time
> $sixdays = $timenow + (86400*6); # Get six days time
> &config_file; # Process the configuration file
> my $twigtv= new XML::Twig # Set the twig handling routines
> (
> TwigHandlers =>
> {
> channel => \&channel_handler,
> programme => \&program_handler
> }
> );
>
> # Parse the xmltv listings file
> my $rc=$twigtv->parsefile($xml_file)
> or die "Can't Open XMLTV listings file $xml_file ";
>
> open (CONF, ">$channel_file") # Write the channel listings file
> or die "Can't write channel listings file $channel_file: $!";
> print CONF @channtitles;
> close CONF;
> print "\n"; # To leave the load count on the screen
> $dbh->disconnect; # Disconnect database
> #------------------------------------------------------------------------
> # Handle the channel description elements from the xmltv listing
> #------------------------------------------------------------------------
> sub channel_handler
> {
> my($twig,$channel)=@_; # Get the twig and element
> my $chtitle=$channel->first_child('display-name')->text; # Extract the display name
> $chtitle=~ s/\s/_/g; # Replace all white space with "_"
> my $chid=$channel->att('id'); # Extract the id
> my $chanline="$chtitle=$chid\n"; # Create a string
> push @channtitles,$chanline; # Add them to the array for eventual output
> $CHANHASH{$chid}=$chtitle; # Create a hash for later reference
> }
> #------------------------------------------------------------------------
> # Parse the programme and load the database
> #------------------------------------------------------------------------
> sub program_handler
> {
> my ($twig,$prog)=@_; # Get the twig and the element
>
> my $xml_start_time="";
> my $xml_stop_time="";
> my $channel_url="";
> my $title="";
> my $desc="";
> my $category="";
> my $sth="";
> my $cert="";
> my $rating="";
>
> my $rc = eval
> # Not all elements always exist so catch errors in case they don't
> {
> $xml_start_time=$prog->att('start');
> # Start time
> $xml_stop_time=$prog->att('stop');
> # End time
> $channel_url=$prog->att('channel');
> # Listing Channel URL
> $title=$prog->first_child('title')->text;
> # Program title
> $desc=$prog->first_child('desc')->text;
> # Program description
> $category=$prog->first_child('category')->text;
> # Program category
> };
>
> $cert="";
> $rating="";
>
> if ($category eq "film")
> {
> $rc=eval
> {
> $cert=$prog->first_child('rating')->first_child('value')->text;
> $rating=$prog->first_child('star-rating')->first_child('value')->text;
> }
> }
>
> # Parse out the components
> my ($syyyy,$smon,$sdd,$shh,$smm,$sss)=$xml_start_time=~
> /(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.+)$/;
> my ($eyyyy,$emon,$edd,$ehh,$emm,$ess)=$xml_stop_time=~
> /(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.+)$/;
>
> my $recdate="$sdd.$smon.$syyyy";
> # Set start of recording date
>
> my $start_epoc = timelocal($sss,$smm,$shh,$sdd,$smon-1,$syyyy-1900);
> # Calculate the start time epoch
> if ($start_epoc > $sixdays){return();}
> # Only interested in up to six days worth
> my $end_epoc = timelocal($ess,$emm,$ehh,$edd,$emon-1,$eyyyy-1900);
> # Calculate the end time epoch
>
> my $indexref="$start_epoc$channel_url";
> # Create a unique reference using the time and the channel
>
> my $end_record = $end_epoc + ($cf_leadout*60);
> # Add the lead out time
> my $start_record = $start_epoc - ($cf_leadin*60);
> # Subtract the lead in time (for duration calculation)
>
> (my $dhh, my $dmm)= &time_duration($start_record,$end_record);
> # Calculate the HH:MM format of the duration
> (my $filler,my $rec_mm,my $rec_hh,my @ignored) = localtime($start_epoc);
> # Convert the start time back to HH MM
>
> $rec_mm=&at_format($rec_mm);
> # Simple leading zero formatting
> $rec_hh=&at_format($rec_hh);
> $dhh=&at_format($dhh);
> $dmm=&at_format($dmm);
>
> $desc=&cleanse($desc);
> # Format for html display
> $title=&cleanse($title);
> $category=&cleanse($category);
> $rec_count++;
> if($dbg eq "y")
> {
> print "-----------------------------------------------\n";
> print "Index ref $indexref\n";
> print "Channel $CHANHASH{$channel_url}\n";
> print "Title $title\n";
> print "Category $category\n";
> print "Start date $recdate\n";
> print "Start time $rec_hh:$rec_mm\n";
> print "Duration $dhh:$dmm:00\n";
> print "Description $desc\n";
> print "Channel URL $channel_url\n";
> print "Start Epoc $start_epoc\n";
> print "End Epoc $end_epoc\n";
> print "Certificate $cert\n";
> print "Star Rating $rating\n";
> print "-----------------------------------------------\n";
> }
>
> print "Records Loaded $rec_count\r";
>
> my $sqlins = "INSERT INTO listings
> (indexref,
> channel,
> title,
> category,
> start_date,
> start_time,
> duration,
> description,
> rating,
> cert,
> infourl,
> recstat,
> start_epoc,
> end_epoc)
> VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
>
> # Bind the values through placeholders so that DBI quotes them safely
> $sth = $dbh->prepare($sqlins);
> $sth->execute($indexref,
> $CHANHASH{$channel_url},
> $title,
> $category,
> $recdate,
> "$rec_hh:$rec_mm",
> "$dhh:$dmm:00",
> $desc,
> $rating,
> $cert,
> $channel_url,
> ' ',
> $start_epoc,
> $end_epoc);
>
> $sth->finish; # Release the statement handle
> return();
> }
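
One thing worth noting about program_handler above: the pattern captures whatever follows the seconds (normally a UTC offset such as +0000) but never uses it, and timelocal reads the digits in the machine's own time zone. Below is a minimal sketch of a conversion that honours the offset. It assumes full 14-digit timestamps with an optional numeric offset. xmltv_to_epoch is a hypothetical name, not part of Simon's script, and treating a missing offset as UTC is an assumption.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: convert an XMLTV timestamp such as
# "20050102233347 +0000" to a Unix epoch, honouring the optional offset.
# Returns undef when the string does not match the assumed layout.
sub xmltv_to_epoch {
    my ($stamp) = @_;
    my ($yyyy, $mon, $dd, $hh, $mm, $ss, $sign, $oh, $om) =
        $stamp =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:\s*([+-])(\d{2})(\d{2}))?\s*$/
        or return;

    # Read the digits as UTC first (timegm croaks on impossible dates)
    my $epoch = timegm($ss, $mm, $hh, $dd, $mon - 1, $yyyy);

    # A "+0100" stamp is one hour ahead of UTC, so subtract the offset
    if (defined $sign) {
        my $offset = ($oh * 60 + $om) * 60;
        $epoch += ($sign eq '+') ? -$offset : $offset;
    }
    return $epoch;
}
```

Inside the handler it could stand in for the two timelocal calls, for example my $start_epoc = xmltv_to_epoch($prog->att('start'));
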
>
> #-------------------------------------------------------------------------
> # Calculate the difference between the start and finish time
> # hence the duration of the recording
> #-------------------------------------------------------------------------
> sub time_duration
> {
> my $sepoch=shift();
> my $eepoch=shift();
>
> my $diff = $eepoch-$sepoch;
>
> (my $rss, my $rmm, my $rhh, my @rest) = gmtime($diff);
> return($rhh,$rmm);
>
> }
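
A small caveat on time_duration above: gmtime wraps at 24 hours, so a recording longer than a day would report too few hours. Below is a sketch that does the arithmetic directly and pads with sprintf, which also covers what at_format does further down. duration_hhmm is a hypothetical name, and the sketch assumes the end time is not earlier than the start time.

```perl
use strict;
use warnings;

# Hypothetical helper: format the gap between two epochs as HH:MM.
# Hours are not wrapped at 24, and both fields are zero padded.
sub duration_hhmm {
    my ($start_epoch, $end_epoch) = @_;
    my $minutes = int(($end_epoch - $start_epoch) / 60);   # whole minutes only
    return sprintf("%02d:%02d", int($minutes / 60), $minutes % 60);
}
```

The result can be split on ":" if the hours and minutes are needed separately, as they are for the duration column.
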
> #---------------------------------------------------------------------------
> # Process the main configuration file
> #---------------------------------------------------------------------------
> sub config_file
> {
> my @conf_file = (1,1);
> open (CONF, $configuration)
> or die("Could NOT open configuration file $configuration");
> @conf_file = <CONF>;
> close (CONF);
>
> foreach my $line (@conf_file)
> {
> if (!($line =~ m/\#/))
> {
> my @line_split = split /=/, $line;
> chomp $line_split[1] if defined $line_split[1]; # Drop the trailing newline
> if ($line_split[0] eq 'lead_in') { $cf_leadin = $line_split[1]; }
> if ($line_split[0] eq 'lead_out') { $cf_leadout = $line_split[1]; }
> if ($line_split[0] eq 'channel_file') { $channel_file = $line_split[1]; }
> if ($line_split[0] eq 'xmltv_file') { $xml_file = $line_split[1]; }
> }
> }
> }
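
For anyone adapting config_file above, here is a rough sketch of the same key=value reading done into a hash, with surrounding white space trimmed. read_config is a hypothetical name. Unlike the original, which skips any line containing a "#", this version only skips lines that start with one, and that difference is my assumption about what is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: read key=value pairs from an open filehandle
# and return them as a hash. Comment lines and blank lines are skipped.
sub read_config {
    my ($fh) = @_;
    my %conf;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;            # comment or blank line
        my ($key, $value) = split /=/, $line, 2;   # split on the first "=" only
        next unless defined $value;                # no "=" on this line
        s/^\s+|\s+$//g for $key, $value;           # trim both ends
        $conf{$key} = $value;
    }
    return %conf;
}
```

With that in place the settings would be read as $conf{lead_in}, $conf{lead_out}, $conf{channel_file} and $conf{xmltv_file}.
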
> #--------------------------------------------------------------------------
> # Some simple formatting to keep the "at" command happy
> #--------------------------------------------------------------------------
> sub at_format
> {
> my $num=shift();
> if ($num < 10) {$num = "0$num";}
> return($num);
> }
> #-----------------------------------------------------------------------
> # Remove non-printable characters from text
> #-----------------------------------------------------------------------
> sub cleanse
> {
> my ($var);
> $var=$_[0];
>
> #$var =~ s/\./ /g;
> $var =~ s/\'//g;
> $var =~ s/\,/ /g;
>
> $var =~ s/%([0-9][0-9])/pack("C", hex($1))/eg;
> $var =~ s/%0D/ /g; # No /e here, so the replacement is a literal space
> $var =~ s/%0A/ /g;
> $var =~ s/%([3-F][3-F])/pack("C", hex($1))/eg;
> $var =~ s/%([5-F][5-F])/pack("C", hex($1))/eg;
> $var =~ s/%([2-F][2-F])/pack("C", hex($1))/eg;
>
> return($var);
> }
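
If the intent of the %XX lines in cleanse above is to undo percent escapes, the several character-range passes can probably be collapsed into one. This is a sketch under that assumption, not a drop-in replacement. decode_percent is a hypothetical name, and it leaves the quote and comma handling alone.

```perl
use strict;
use warnings;

# Hypothetical helper: turn %0D and %0A into spaces, then decode every
# remaining %XX escape (upper or lower case hex) into its character.
sub decode_percent {
    my ($text) = @_;
    $text =~ s/%0[DdAa]/ /g;                        # CR and LF become spaces
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # all other escapes
    return $text;
}
```

A lone "%" that is not followed by two hex digits is left untouched.
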
>
>
>
--
http://james.cridland.net/