James,

The Perl source is attached. It was developed and is run on a Linux system, hence the lack of a file extension. I find that WordPad reads the file well under Windows.

To give you a starter, the rough logic is as follows:

- Connect to the database
- Delete all previous records to clean the database before this load
- Read a configuration file which is for my application (sets lead in, lead out, etc. for the recordings)
- Open and process previously downloaded XMLTV files, writing various records either taken directly from the XMLTV listings or derived for my own application

Hope this helps and good luck with the project.

Regards
Simon

James Cridland wrote:
> Simon, that's very kind - I know little Perl, but it might be a good start, as you say...
>
> On Sun, 02 Jan 2005 20:32:00 +0000, simon <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi James,
>>
>> As part of my home grown PVR I've written a bit of Perl code that loads a MySQL database from the XMLTV files downloaded from the Radio Times site. It's not terribly generic since I too am intrinsically lazy and wanted it only for myself (path of least resistance), but it does parse the XMLTV files and load a database. It should work equally well with any SQL based database, although I've never tried it.
>>
>> If this sounds like a good starting point for what you want then drop me a line and I'll gladly donate them.
>>
>> Happy New Year
>> Simon
>>
>> James Cridland wrote:
>>> Hello. I'm planning to use bleb.org's data, with others, to do a few things at www.mediauk.com - not just TV schedules, but to build a database of television programmes and add to this database with more information from other sources.
>>>
>>> I'd like to import the XMLTV format, primarily because other broadcasters may use that too - and, where it exists, I'll take it direct from them.
>>>
>>> Now, being intrinsically lazy... has anyone written a simple importer for the bleb.org files (and I'm thinking particularly of the XMLTV files)? I'm happy to put in the spadework, but if someone's already invented that wheel, I'd prefer not to have to reinvent it, if at all possible.
>>>
>>> And, as (I think) the first poster to this list in 2005, Happy New Year.
>>>
>>> ----------------------------------------------------------------------
>>> Distributed to the bleb.org/tv developer list. Archive available at: http://www.bleb.org/tv/maillist/
>>> To unsubscribe, send 'UNSUBSCRIBE james.cridland@xxxxxxxxx' to mailto:tvdevel-request@xxxxxxxx.
>>> If you have any problems please contact mailto:listmaster@xxxxxxxx
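
Note on the configuration step: the attached script reads key=value lines (lead_in, lead_out, channel_file, xmltv_file) and keeps each value exactly as read, including the trailing newline. The following is only a minimal sketch of a stricter reader, not the script's own routine. The helper name read_config, the sample values and the /tmp path are made up for illustration, and unlike the script it skips only lines that begin with "#" rather than any line containing one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the attached script): read key=value
# settings from an open filehandle, skipping blank and comment lines.
sub read_config {
    my ($fh) = @_;
    my %conf;
    while (my $line = <$fh>) {
        chomp $line;                            # drop the trailing newline
        next if $line =~ /^\s*(?:#|$)/;         # skip comments and blank lines
        my ($key, $value) = split /=/, $line, 2;
        next unless defined $value;             # ignore lines without "="
        s/^\s+|\s+$//g for $key, $value;        # trim surrounding whitespace
        $conf{$key} = $value;
    }
    return %conf;
}

# Example input held in memory; these values are illustrative only.
my $sample = <<'END';
# recording padding in minutes
lead_in=2
lead_out=5
xmltv_file=/tmp/listings.xml
END

open my $fh, '<', \$sample or die "Cannot open in-memory config: $!";
my %conf = read_config($fh);
close $fh;

print "$_ = $conf{$_}\n" for sort keys %conf;
```

Returning a hash keeps the settings in one place; the attached script instead assigns each setting to its own package variable.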
#!/usr/bin/perl
#
# TV Recording - Listings DB load Interface
#
# Runs immediately after the XML TV listings have been downloaded. Parses the XML listings files and
# loads them into a database which is used as a central repository for listing on the web interface
# and holding the recording information.
#
# Version   Change
# 0.0       Start of the script
# 0.1       Tidied up the code to implement strict
# 2.0       Implemented XMLTV format input file with multi day capability and lots more fields
#
use XML::Twig;
use Time::Local;
use DBI;
use strict;
use vars qw( $dsn $db_user_name $db_password $dbh $listconf $channel_file $cf_leadin $cf_leadout
             @listings @channtitles $configuration %CHANHASH $xml_file $rec_count $dbg
             $timenow $sixdays );

$|=1;
$dbg="n";                                       # Set to "y" to dump records as they are loaded
$configuration="/etc/gluepvr/gluepvr.conf";
$dsn='DBI:mysql:tvrec:localhost';
$db_user_name='apache';                         # Webserver user name
$db_password='xxxxx';                           # Webserver password
$dbh = DBI->connect($dsn,$db_user_name,$db_password) or die "Can't connect: ", $DBI::errstr;
$dbh->{RaiseError} = 1;
$rec_count=0;

#------------------------------------------------------------------------
# MAIN logic
#------------------------------------------------------------------------
my $sqldel = "DELETE FROM listings";            # Delete everything from the listings file
my $sth = $dbh->prepare($sqldel);               # Prepare and execute the SQL
$sth->execute;
$sth->finish;                                   # Commit database changes

$timenow = time();                              # Get the current time
$sixdays = $timenow + (86400*6);                # Get six days time

&config_file;                                   # Process the configuration file

my $twigtv= new XML::Twig                       # Set the twig handling routines
    ( TwigHandlers => { channel   => \&channel_handler,
                        programme => \&program_handler }
    );
my $rc=$twigtv->parsefile($xml_file) or die "Can't Open XMLTV listings file $xml_file "; # Parse the xmltv listings file

open (CONF, ">$channel_file");                  # Write the channel listings file
print CONF @channtitles;
close CONF;

print "\n";                                     # To leave the load count on the screen
$dbh->disconnect;                               # Disconnect database

#------------------------------------------------------------------------
# Handle the channel description elements from the xmltv listing
#------------------------------------------------------------------------
sub channel_handler
{
    my($twig,$channel)=@_;                                      # Get the twig and element
    my $chtitle=$channel->first_child('display-name')->text;    # Extract the display name
    $chtitle=~ s/\s/\_/;                                        # Replace white space with "_"
    my $chid=$channel->att('id');                               # Extract the id
    my $chanline="$chtitle=$chid\n";                            # Create a string
    push @channtitles,$chanline;                                # Add them to the array for eventual output
    $CHANHASH{$chid}=$chtitle;                                  # Create a hash for later reference
}

#------------------------------------------------------------------------
# Parse the programme and load the database
#------------------------------------------------------------------------
sub program_handler
{
    my ($twig,$prog)=@_;                        # Get the twig and the element
    my $xml_start_time="";
    my $xml_stop_time="";
    my $channel_url="";
    my $title="";
    my $desc="";
    my $category="";
    my $sth="";
    my $cert="";
    my $rating="";

    my $rc = eval                               # Not all elements always exist so catch errors in case they don't
    {
        $xml_start_time=$prog->att('start');                # Start time
        $xml_stop_time=$prog->att('stop');                  # End time
        $channel_url=$prog->att('channel');                 # Listing Channel URL
        $title=$prog->first_child('title')->text;           # Program title
        $desc=$prog->first_child('desc')->text;             # Program description
        $category=$prog->first_child('category')->text;     # Program category
    };

    $cert="";
    $rating="";
    if ($category eq "film")
    {
        $rc=eval
        {
            $cert=$prog->first_child('rating')->first_child('value')->text;
            $rating=$prog->first_child('star-rating')->first_child('value')->text;
        }
    }

    my ($syyyy,$smon,$sdd,$shh,$smm,$sss)=$xml_start_time=~ /(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.+)$/; # Parse out the components
    my ($eyyyy,$emon,$edd,$ehh,$emm,$ess)=$xml_stop_time=~ /(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.+)$/;

    my $recdate="$sdd.$smon.$syyyy";                                        # Set start of recording date
    my $start_epoc = timelocal($sss,$smm,$shh,$sdd,$smon-1,$syyyy-1900);    # Calculate the start time epoch
    if ($start_epoc > $sixdays){return();}                                  # Only interested in up to six days worth
    my $end_epoc = timelocal($ess,$emm,$ehh,$edd,$emon-1,$eyyyy-1900);      # Calculate the end time epoch
    my $indexref="$start_epoc$channel_url";                                 # Create a unique reference using the time and the channel
    my $end_record = $end_epoc + ($cf_leadout*60);                          # Add the lead out time
    my $start_record = $start_epoc - ($cf_leadin*60);                       # Add the lead in time (for duration calculation)
    (my $dhh, my $dmm)= &time_duration($start_record,$end_record);          # Calculate the HH:MM format of the duration
    (my $filler,my $rec_mm,my $rec_hh,my @ignored) = localtime($start_epoc); # Convert the start time back to HH MM

    $rec_mm=&at_format($rec_mm);                # Simple leading zero formating
    $rec_hh=&at_format($rec_hh);
    $dhh=&at_format($dhh);
    $dmm=&at_format($dmm);

    $desc=&cleanse($desc);                      # Format for html display
    $title=&cleanse($title);
    $category=&cleanse($category);

    $rec_count++;

    if($dbg eq "y")
    {
        print "-----------------------------------------------\n";
        print "Index ref $indexref\n";
        print "Channel $CHANHASH{$channel_url}\n";
        print "Title $title\n";
        print "Category $category\n";
        print "Start date $recdate\n";
        print "Start time $rec_hh:$rec_mm\n";
        print "Duration $dhh:$dmm:00\n";
        print "Description $desc\n";
        print "Channel URL $channel_url\n";
        print "Start Epoc $start_epoc\n";
        print "End Epoc $end_epoc\n";
        print "Certificate $cert\n";
        print "Star Rating $rating\n";
        print "-----------------------------------------------\n";
    }

    print "Records Loaded $rec_count\r";

    my $sqlins = "INSERT INTO listings (indexref, channel, title, category, start_date, start_time,
                  duration, description, rating, cert, infourl, recstat, start_epoc, end_epoc)
                  VALUES ('$indexref', '$CHANHASH{$channel_url}', '$title', '$category', '$recdate',
                  '$rec_hh:$rec_mm', '$dhh:$dmm:00', '$desc', '$rating', '$cert', '$channel_url',
                  ' ', '$start_epoc', '$end_epoc')";
    $sth = $dbh->prepare($sqlins);
    $sth->execute;
    $sth->finish;                               # Commit database changes
    return();
}

#-------------------------------------------------------------------------
# Calculate the difference between the start and finish time
# hence the duration of the recording
#-------------------------------------------------------------------------
sub time_duration
{
    my $sepoch=shift();
    my $eepoch=shift();
    my $diff = $eepoch-$sepoch;
    (my $rss, my $rmm, my $rhh, my @rest) = gmtime($diff);
    return($rhh,$rmm);
}

#---------------------------------------------------------------------------
# Process the main configuration file
#---------------------------------------------------------------------------
sub config_file
{
    my @conf_file = (1,1);
    open (CONF, $configuration) or die("Could NOT open configuration file $configuration");
    @conf_file = <CONF>;
    close (CONF);
    foreach my $line (@conf_file)
    {
        if (!($line =~ m/\#/))
        {
            my @line_split = split /=/, $line;
            if ($line_split[0] eq 'lead_in')      { $cf_leadin = $line_split[1]; }
            if ($line_split[0] eq 'lead_out')     { $cf_leadout = $line_split[1]; }
            if ($line_split[0] eq 'channel_file') { $channel_file = $line_split[1]; }
            if ($line_split[0] eq 'xmltv_file')   { $xml_file = $line_split[1]; }
        }
    }
}

#--------------------------------------------------------------------------
# Some simple formating to keep the "at" command happy
#--------------------------------------------------------------------------
sub at_format
{
    my $num=shift();
    if ($num < 10) {$num = "0$num";}
    return($num);
}

#-----------------------------------------------------------------------
# Remove none printable characters from text
#-----------------------------------------------------------------------
sub cleanse
{
    my ($var);
    $var=$_[0];
    #$var =~ s/\./ /g;
    $var =~ s/\'//g;
    $var =~ s/\,/ /g;
    $var =~ s/%([0-9][0-9])/pack("C", hex($1))/eg;
    $var =~ s/%0D/ /eg;
    $var =~ s/%0A/ /eg;
    $var =~ s/%([3-F][3-F])/pack("C", hex($1))/eg;
    $var =~ s/%([5-F][5-F])/pack("C", hex($1))/eg;
    $var =~ s/%([2-F][2-F])/pack("C", hex($1))/eg;
    return($var);
}
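
Note on the time handling: program_handler matches the trailing part of each XMLTV timestamp (for example " +0000") but discards it, and then calls timelocal, so start and stop times are interpreted in the local zone of the machine running the script. Where listings carry a different UTC offset, something along the following lines might be used instead. This is a sketch only: xmltv_to_epoch is a hypothetical name, not part of the script, and treating a timestamp without an offset as UTC is an assumption about the feed.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper (not part of the script above): convert an XMLTV
# timestamp such as "20050102203000 +0100" to a Unix epoch, applying the
# UTC offset when one is present. Returns undef if the stamp is malformed.
sub xmltv_to_epoch {
    my ($stamp) = @_;
    my ($y, $mo, $d, $h, $mi, $s, $sign, $oh, $om) =
        $stamp =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:\s*([+-])(\d{2})(\d{2}))?/
        or return;

    # Treat the digits as UTC first, then correct by the stated offset.
    my $epoch = timegm($s, $mi, $h, $d, $mo - 1, $y);
    if (defined $sign) {
        my $offset = ($oh * 3600) + ($om * 60);
        # A "+0100" stamp is one hour ahead of UTC, so subtract the offset.
        $epoch += ($sign eq '+') ? -$offset : $offset;
    }
    return $epoch;
}

my $utc = xmltv_to_epoch("20050102203000 +0000");
my $cet = xmltv_to_epoch("20050102203000 +0100");
print "UTC stamp: $utc\n";
print "+0100 stamp: $cet\n";
```

The two sample stamps differ by exactly 3600 seconds, which is the one-hour offset being applied.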