
Re: [TV] XMLTV and PHP - written a parser?



James,

The Perl source is attached. It was developed and runs on a Linux system, hence the lack of a file extension. I find that WordPad reads the file well under Windows. To give you a start, the rough logic is as follows:

- Connect to the database
- Delete all previous records to clean the database before this load
- Read a configuration file which is for my application (sets lead in, lead out, etc for the recordings)
- Open and process the previously downloaded XMLTV files, writing records that are either taken directly from the XMLTV listings or derived for my own application
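
As an illustration of the "derived" records in the last step, here is a minimal sketch of the time arithmetic: an XMLTV timestamp is turned into an epoch value, the lead in and lead out (in minutes) are applied, and the recording duration is formatted as HH:MM. The helper names `xmltv_to_epoch` and `recording_window` and the sample values are hypothetical, and the sketch assumes the UTC offset can be ignored and the time treated as local time.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;

# Hypothetical helper: convert an XMLTV timestamp such as
# "20050102203000 +0000" to epoch seconds. The UTC offset is
# ignored (assumption) and the time is treated as local time.
sub xmltv_to_epoch {
    my ($stamp) = @_;
    my ($yyyy, $mon, $dd, $hh, $mm, $ss) =
        $stamp =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/
        or return;                      # Not a usable timestamp
    return timelocal($ss, $mm, $hh, $dd, $mon - 1, $yyyy);
}

# Hypothetical helper: widen the programme slot by the lead in and
# lead out (both in minutes) and work out the recording duration.
sub recording_window {
    my ($start, $stop, $lead_in, $lead_out) = @_;
    my $start_epoch = xmltv_to_epoch($start);
    my $stop_epoch  = xmltv_to_epoch($stop);
    return unless defined $start_epoch && defined $stop_epoch;

    my $from    = $start_epoch - $lead_in * 60;
    my $to      = $stop_epoch + $lead_out * 60;
    my $minutes = int(($to - $from) / 60);
    my $duration = sprintf "%02d:%02d", int($minutes / 60), $minutes % 60;
    return ($from, $to, $duration);
}

# Sample values only: a one hour programme with 2 minutes lead in
# and 5 minutes lead out.
my ($from, $to, $duration) =
    recording_window("20050102203000 +0000", "20050102213000 +0000", 2, 5);
print "Record for $duration\n";
```

A programme without a usable start or stop time makes `recording_window` return an empty list, so the caller can simply skip it.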

Hope this helps and good luck with the project.

Regards

Simon

James Cridland wrote:
Simon, that's very kind - I know little perl, but it might be a good
start, as you say...


On Sun, 02 Jan 2005 20:32:00 +0000, simon <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
  
Hi James,

As part of my home grown PVR I've written a bit of perl code that loads
a mysql database from the XMLTV files downloaded from the Radio Times
site. It's not terribly generic since I too am intrinsically lazy and
wanted it only for myself ...(path of least resistance) but it does
parse the XMLTV files and load a database. It should work equally well
with any SQL based database although I've never tried it.

If this sounds like a good starting point for what you want then drop me
a line and I'll gladly donate it.

Happy New Year

Simon


James Cridland wrote:

    
Hello.

I'm planning to use bleb.org's data, with others, to do a few things
at www.mediauk.com - not just TV schedules, but to build a database of
television programmes and add to this database with more information
from other sources.
    
I'd like to import the XMLTV format, primarily because other
broadcasters may use that too - and, where it exists, I'll take it
direct from them.

Now, being intrinsically lazy... has anyone written a simple importer
for the bleb.org files (and I'm thinking particularly of the XMLTV
files)? I'm happy to put in the spadework, but if someone's already
invented that wheel, I'd prefer not to have to reinvent it, if at all
possible.

And, as (I think) the first poster to this list in 2005, Happy New Year.



      
----------------------------------------------------------------------
Distributed to the bleb.org/tv developer list.
Archive available at: http://www.bleb.org/tv/maillist/

To unsubscribe, send 'UNSUBSCRIBE james.cridland@xxxxxxxxx' to
mailto:tvdevel-request@xxxxxxxx. If you have any problems please contact
mailto:listmaster@xxxxxxxx


    


  
#!/usr/bin/perl 
#
# TV Recording - Listings DB load Interface 
#
# Runs immediately after the XML TV listings have been downloaded. Parses the XML listings files and 
# loads them into a database which is used as a central repository for listing on the web interface
# and holding the recording information.
#
# Version	Change
# 0.0		Start of the script
# 0.1		Tidied up the code to implement strict
# 2.0		Implemented XMLTV format input file with multi day capability and lots more fields
#
use XML::Twig;
use Time::Local;
use DBI;
use strict;
use vars qw(
	    $dsn
	    $db_user_name
	    $db_password
	    $dbh
	    $listconf
	    $channel_file
	    $cf_leadin
	    $cf_leadout
	    @listings
	    @channtitles
	    $configuration
	    %CHANHASH
	    $xml_file
	    $rec_count
	    $dbg
	    $timenow
	    $sixdays
	    );
$|=1;
$dbg="n";						# Set to "y" to dump records as they are loaded
$configuration="/etc/gluepvr/gluepvr.conf";
$dsn='DBI:mysql:tvrec:localhost';
$db_user_name='apache';					# Webserver user name
$db_password='xxxxx';					# Webserver password
$dbh = DBI->connect($dsn,$db_user_name,$db_password) or die "Can't connect: ", $DBI::errstr;
$dbh->{RaiseError} = 1;
$rec_count=0;

#------------------------------------------------------------------------
# MAIN logic
#------------------------------------------------------------------------
my $sqldel = "DELETE FROM listings";			# Delete everything from the listings table
my $sth = $dbh->prepare($sqldel);			# Prepare and execute the SQL
$sth->execute;
$sth->finish;						# Release the statement handle

$timenow = time();					# Get the current time
$sixdays = $timenow + (86400*6);			# Get six days time
&config_file;						# Process the configuration file
my $twigtv = XML::Twig->new				# Set the twig handling routines
		(
		TwigHandlers => 
		    {
		    channel   => \&channel_handler,
		    programme => \&program_handler
		    }
		);

my $rc=$twigtv->parsefile($xml_file) or die "Can't Open XMLTV listings file $xml_file "; # Parse the xmltv listings file

open (CONF, ">$channel_file") or die "Can't write channel file $channel_file: $!";	# Write the channel listings file
print CONF @channtitles;
close CONF;
print "\n";						# To leave the load count on the screen
$dbh->disconnect; 					# Disconnect database
#------------------------------------------------------------------------
# Handle the channel description elements from the xmltv listing
#------------------------------------------------------------------------
sub channel_handler
{
 my($twig,$channel)=@_;						# Get the twig and element
 my $chtitle=$channel->first_child('display-name')->text;	# Extract the display name
 $chtitle=~ s/\s/_/g;						# Replace every white space character with "_"
 my $chid=$channel->att('id');					# Extract the id
 my $chanline="$chtitle=$chid\n";				# Create a string
 push @channtitles,$chanline;					# Add them to the array for eventual output
 $CHANHASH{$chid}=$chtitle;					# Create a hash for later reference
}
#------------------------------------------------------------------------
# Parse the programme and load the database
#------------------------------------------------------------------------
sub program_handler
{
my ($twig,$prog)=@_;					# Get the twig and the element

my $xml_start_time="";
my $xml_stop_time="";
my $channel_url="";
my $title="";
my $desc="";
my $category="";
my $sth="";
my $cert="";
my $rating="";

my $rc = eval									# Not all elements always exist; first_child_text returns "" for a missing one
 {
    $xml_start_time=$prog->att('start');					# Start time
    $xml_stop_time=$prog->att('stop');						# End time
    $channel_url=$prog->att('channel');						# Listing Channel URL
    $title=$prog->first_child_text('title');					# Program title
    $desc=$prog->first_child_text('desc');					# Program description
    $category=$prog->first_child_text('category');				# Program category
 };

$cert="";
$rating="";

if ($category eq "film")
 {
 $rc=eval { $cert=$prog->first_child('rating')->first_child('value')->text; };		# Certificate, if present
 $rc=eval { $rating=$prog->first_child('star-rating')->first_child('value')->text; };	# Star rating, if present
 }

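my ($syyyy,$smon,$sdd,$shh,$smm,$sss)=$xml_start_time=~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/;     # Parse out the components (any UTC offset is ignored)
my ($eyyyy,$emon,$edd,$ehh,$emm,$ess)=$xml_stop_time=~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/;
return() unless defined $syyyy && defined $eyyyy;				# Skip programmes without a usable start and stop time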

my $recdate="$sdd.$smon.$syyyy";						# Set start of recording date

my $start_epoc = timelocal($sss,$smm,$shh,$sdd,$smon-1,$syyyy-1900);		# Calculate the start time epoch 
if ($start_epoc > $sixdays){return();}						# Only interested in up to six days worth 
my $end_epoc = timelocal($ess,$emm,$ehh,$edd,$emon-1,$eyyyy-1900);     		# Calculate the end time epoch 

my $indexref="$start_epoc$channel_url";						# Create a unique reference using the time and the channel
  
my $end_record = $end_epoc + ($cf_leadout*60);	     				# Add the lead out time 
my $start_record = $start_epoc - ($cf_leadin*60);				# Subtract the lead in time (for duration calculation) 

(my $dhh, my $dmm)= &time_duration($start_record,$end_record);     		# Calculate the HH:MM format of the duration
(my $filler,my $rec_mm,my $rec_hh,my  @ignored) = localtime($start_epoc);	# Convert the start time back to HH MM

$rec_mm=&at_format($rec_mm);							# Simple leading zero formatting
$rec_hh=&at_format($rec_hh);
$dhh=&at_format($dhh);
$dmm=&at_format($dmm);
 
$desc=&cleanse($desc);								# Format for html display
$title=&cleanse($title);
$category=&cleanse($category);
$rec_count++;
if($dbg eq "y")
{
    print "-----------------------------------------------\n";
    print "Index ref	$indexref\n";
    print "Channel	$CHANHASH{$channel_url}\n";
    print "Title	$title\n";
    print "Category	$category\n";
    print "Start date	$recdate\n";
    print "Start time	$rec_hh:$rec_mm\n";
    print "Duration	$dhh:$dmm:00\n";
    print "Description	$desc\n";
    print "Channel URL	$channel_url\n";
    print "Start Epoc	$start_epoc\n";
    print "End Epoc	$end_epoc\n";
    print "Certificate  $cert\n";
    print "Star Rating  $rating\n";
    print "-----------------------------------------------\n";
}

print "Records Loaded $rec_count\r"; 
 
  my $sqlins = "INSERT INTO listings 
                (indexref,
		 channel,
		 title,
		 category,
		 start_date,
		 start_time,
		 duration,
		 description,
		 rating,
		 cert,
		 infourl,
		 recstat,
		 start_epoc,
		 end_epoc)
               VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)";	# Placeholders let DBI do the quoting

$sth = $dbh->prepare($sqlins);
$sth->execute($indexref,
	      $CHANHASH{$channel_url},
	      $title,
	      $category,
	      $recdate,
	      "$rec_hh:$rec_mm",
	      "$dhh:$dmm:00",
	      $desc,
	      $rating,
	      $cert,
	      $channel_url,
	      '&nbsp;',
	      $start_epoc,
	      $end_epoc);

$sth->finish;				# Release the statement handle
return();
}

#-------------------------------------------------------------------------
# Calculate the difference between the start and finish time
# hence the duration of the recording
#-------------------------------------------------------------------------
sub time_duration
{
my $sepoch=shift();
my $eepoch=shift();

my $diff = $eepoch-$sepoch;

(my $rss, my $rmm, my $rhh, my @rest) = gmtime($diff);
return($rhh,$rmm); 

}
#---------------------------------------------------------------------------
# Process the main configuration file
#--------------------------------------------------------------------------- 
sub config_file
{
 my @conf_file = (1,1);
 open (CONF, $configuration) or die("Could NOT open configuration file $configuration");
 @conf_file = <CONF>;
 close (CONF);

 foreach my $line (@conf_file)
 {
	chomp $line;						# Strip the newline so file names and numbers are clean
	if (!($line =~ m/\#/))					# Skip any line containing a comment marker
	{
	my @line_split = split /=/, $line, 2;			# Split on the first "=" only
	if ($line_split[0] eq 'lead_in') 	{ $cf_leadin = $line_split[1]; }
	if ($line_split[0] eq 'lead_out') 	{ $cf_leadout = $line_split[1]; }
	if ($line_split[0] eq 'channel_file') 	{ $channel_file = $line_split[1]; }
	if ($line_split[0] eq 'xmltv_file') 	{ $xml_file = $line_split[1]; }
	}
 }
}
#--------------------------------------------------------------------------
# Some simple formatting to keep the "at" command happy
#--------------------------------------------------------------------------
sub at_format
{
 my $num=shift();
 if ($num < 10) {$num = "0$num";}
 return($num); 
}
#-----------------------------------------------------------------------
# Remove quotes and commas and decode %XX escapes in text
#-----------------------------------------------------------------------
sub cleanse
{
my ($var); 
$var=$_[0];

#$var =~ s/\./ /g;
$var =~ s/\'//g;
$var =~ s/\,/ /g;

$var =~ s/%0[DdAa]/ /g;						# Encoded CR and LF become spaces
$var =~ s/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg;		# Decode the remaining %XX escapes in a single pass

return($var);
}