Re: [TV] XMLTV and PHP - written a parser?


The Perl source is attached. It was developed on, and runs on, a Linux system, hence the lack of a file extension. I find that WordPad reads the file well under Windows. To give you a start, the rough logic is as follows:

- Connect to the database
- Delete all previous records to clean the database before this load
- Read a configuration file which is for my application (sets lead in, lead out, etc for the recordings)
- Open and process the previously downloaded XMLTV files, writing various records either directly from the XMLTV listings or derived for my own application
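The lead in and lead out settings widen each programme into a recording window. Here is a minimal sketch of that arithmetic in plain Perl. It assumes both settings are given in minutes, as the script's `*60` suggests, and `recording_window` is a hypothetical name. Unlike the script's `gmtime`-based duration, this one does not wrap at 24 hours:

```perl
use strict;
use warnings;

# Hypothetical helper: pad a programme's start/stop epochs with lead-in and
# lead-out minutes, returning the padded window and its length as HH:MM:SS.
sub recording_window {
    my ($start_epoch, $stop_epoch, $lead_in, $lead_out) = @_;
    my $rec_start = $start_epoch - $lead_in * 60;     # start recording early
    my $rec_end   = $stop_epoch + $lead_out * 60;     # stop recording late
    my $secs      = $rec_end - $rec_start;
    my $duration  = sprintf('%02d:%02d:%02d',
        int($secs / 3600), int(($secs % 3600) / 60), $secs % 60);
    return ($rec_start, $rec_end, $duration);
}

# A 30 minute programme with a 2 minute lead-in and a 5 minute lead-out
my ($s, $e, $d) = recording_window(1104697800, 1104699600, 2, 5);
print "$d\n";    # prints 00:37:00
```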

Hope this helps and good luck with the project.



James Cridland wrote:
Simon, that's very kind - I know little Perl, but it might be a good
start, as you say...

On Sun, 02 Jan 2005 20:32:00 +0000, simon <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi James,

As part of my home-grown PVR I've written a bit of Perl code that loads
a MySQL database from the XMLTV files downloaded from the Radio Times
site. It's not terribly generic, since I too am intrinsically lazy and
wanted it only for myself (path of least resistance), but it does
parse the XMLTV files and load a database. It should work equally well
with any SQL-based database, although I've never tried it.
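XMLTV `start` and `stop` attributes hold timestamps of the form `YYYYMMDDhhmmss`, optionally followed by a UTC offset such as `+0100`. The script hands the date fields to `timelocal` and ignores the offset. The sketch below applies the offset instead, using `timegm` from the core Time::Local module. `xmltv_to_epoch` is a hypothetical name, and a timestamp without an offset is treated as UTC here, which is an assumption worth checking against the XMLTV DTD:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: convert an XMLTV timestamp such as "20050102203000 +0100"
# to a Unix epoch, applying the optional +HHMM/-HHMM offset (no offset = UTC).
# Returns undef unless the string starts with a full 14-digit timestamp.
sub xmltv_to_epoch {
    my ($stamp) = @_;
    my ($y, $mon, $d, $h, $min, $s, $tz) =
        $stamp =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\s*([+-]\d{4})?/
        or return;
    my $epoch = timegm($s, $min, $h, $d, $mon - 1, $y);    # read the fields as UTC
    if (defined $tz) {
        my ($sign, $oh, $om) = $tz =~ /^([+-])(\d{2})(\d{2})$/;
        my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
        $epoch -= $offset;                                 # local time minus offset gives UTC
    }
    return $epoch;
}

my $example = xmltv_to_epoch("20050102203000 +0000");
print "$example\n";    # prints 1104697800
```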

If this sounds like a good starting point for what you want, then drop me
a line and I'll gladly donate it.

Happy New Year


James Cridland wrote:


I'm planning to use bleb.org's data, with others, to do a few things
at www.mediauk.com - not just TV schedules, but to build a database of
television programmes and add to this database with more information
from other sources.
I'd like to import the XMLTV format, primarily because other
broadcasters may use that too - and, where it exists, I'll take it
direct from them.

Now, being intrinsically lazy... has anyone written a simple importer
for the bleb.org files (and I'm thinking particularly of the XMLTV
files)? I'm happy to put in the spadework, but if someone's already
invented that wheel, I'd prefer not to have to reinvent it, if at all possible.

And, as (I think) the first poster to this list in 2005, Happy New Year.

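#!/usr/bin/perl
# TV Recording - Listings DB load Interface
# Runs immediately after the XMLTV listings have been downloaded. Parses the XML listings files and
# loads them into a database which is used as a central repository for listings on the web interface
# and holds the recording information.
# Version	Change
# 0.0		Start of the script
# 0.1		Tidied up the code to implement strict
# 2.0		Implemented XMLTV format input file with multi day capability and lots more fields
use strict;
use warnings;
use XML::Twig;
use Time::Local;
use DBI;

# Settings and state shared by the main logic and the subroutines below
my $dbg           = "n";                          # Set to "y" to dump records as they are loaded
my $db_user_name  = 'apache';                     # Webserver user name
my $db_password   = 'xxxxx';                      # Webserver password
my $dsn           = 'DBI:mysql:database=xxxxx';   # Placeholder: the original DSN was not included
my $configuration = 'xxxxx';                      # Placeholder: the configuration file path was not included
my ($cf_leadin, $cf_leadout) = (0, 0);            # Lead in/out in minutes, set by config_file
my ($channel_file, $xml_file);                    # File names, set by config_file
my (@channtitles, %CHANHASH);                     # Channel data collected by channel_handler
my ($timenow, $sixdays);                          # Load window, set in the main logic
my $rec_count = 0;                                # Number of programmes loaded so far

my $dbh = DBI->connect($dsn, $db_user_name, $db_password) or die "Can't connect: ", $DBI::errstr;
$dbh->{RaiseError} = 1;
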
# MAIN logic
$| = 1;                                           # Unbuffered output so the load count updates in place
$dbh->do("DELETE FROM listings");                 # Delete everything from the listings table (AutoCommit is on by default)

$timenow = time();                                # Get the current time
$sixdays = $timenow + (86400*6);                  # Get six days time
config_file();                                    # Process the configuration file
my $twigtv = XML::Twig->new(                      # Set the twig handling routines
    twig_handlers => {
        channel   => \&channel_handler,
        programme => \&program_handler,
    },
);

$twigtv->safe_parsefile($xml_file)                # Parse the xmltv listings file
    or die "Can't parse XMLTV listings file $xml_file: $@";

open(my $chan_fh, '>', $channel_file)             # Write the channel listings file
    or die "Can't write channel file $channel_file: $!";
print $chan_fh @channtitles;
close $chan_fh;
print "\n";                                       # To leave the load count on the screen
$dbh->disconnect;                                 # Disconnect database

# Handle the channel description elements from the xmltv listing
sub channel_handler {
    my ($twig, $channel) = @_;                                # Get the twig and element
    my $chtitle = $channel->first_child_text('display-name'); # Extract the display name ('' if missing)
    $chtitle =~ s/\s/_/g;                                     # Replace every white space character with "_"
    my $chid = $channel->att('id');                           # Extract the id
    my $chanline = "$chtitle=$chid\n";                        # Create a "title=id" line
    push @channtitles, $chanline;                             # Add it to the array for eventual output
    $CHANHASH{$chid} = $chtitle;                              # Create a hash for later reference
    $twig->purge;                                             # Free the memory used by the parsed elements
}

# Parse the programme and load the database
sub program_handler {
    my ($twig, $prog) = @_;                                   # Get the twig and the element

    # Not all elements always exist: first_child_text() returns '' for a
    # missing child, so no eval is needed to catch them
    my $xml_start_time = $prog->att('start')   // '';        # Start time
    my $xml_stop_time  = $prog->att('stop')    // '';        # End time
    my $channel_url    = $prog->att('channel') // '';        # Listing Channel URL
    my $title          = $prog->first_child_text('title');   # Program title
    my $desc           = $prog->first_child_text('desc');    # Program description
    my $category       = $prog->first_child_text('category'); # Program category
    my $cert           = "";
    my $rating         = "";

    if ($category eq "film") {
        # The body of this block was lost; assumed to read the certificate and
        # star rating from the XMLTV <rating> and <star-rating> elements
        if (my $r = $prog->first_child('rating'))      { $cert   = $r->first_child_text('value'); }
        if (my $s = $prog->first_child('star-rating')) { $rating = $s->first_child_text('value'); }
    }
    $twig->purge;                                             # Everything needed has been copied out

    # Parse out the components; any trailing timezone offset is ignored
    my ($syyyy,$smon,$sdd,$shh,$smm,$sss) = $xml_start_time =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/
        or return;                                            # Skip programmes without a usable start time
    my ($eyyyy,$emon,$edd,$ehh,$emm,$ess) = $xml_stop_time =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/
        or return;                                            # ... or without a usable stop time

    my $recdate = "$sdd.$smon.$syyyy";                        # Set start of recording date

    my $start_epoc = timelocal($sss,$smm,$shh,$sdd,$smon-1,$syyyy);  # Calculate the start time epoch
    return if $start_epoc > $sixdays;                                # Only interested in up to six days worth
    my $end_epoc = timelocal($ess,$emm,$ehh,$edd,$emon-1,$eyyyy);    # Calculate the end time epoch

    my $indexref     = "$start_epoc$channel_url";             # Create a unique reference using the time and the channel
    my $end_record   = $end_epoc + ($cf_leadout*60);          # Add the lead out time
    my $start_record = $start_epoc - ($cf_leadin*60);         # Add the lead in time (for duration calculation)

    my ($dhh, $dmm) = time_duration($start_record, $end_record);  # Calculate the HH:MM format of the duration
    my (undef, $rec_mm, $rec_hh) = localtime($start_epoc);        # Convert the start time back to HH MM

    $rec_mm = at_format($rec_mm);                             # Simple leading zero formatting
    $desc   = cleanse($desc);                                 # Format for html display
    my $chname = $CHANHASH{$channel_url} // '';               # Channel name, if its <channel> element was seen
    if ($dbg eq "y") {
        print "-----------------------------------------------\n";
        print "Index ref\t$indexref\n";
        print "Channel\t$chname\n";
        print "Title\t$title\n";
        print "Category\t$category\n";
        print "Start date\t$recdate\n";
        print "Start time\t$rec_hh:$rec_mm\n";
        print "Duration\t$dhh:$dmm:00\n";
        print "Description\t$desc\n";
        print "Channel URL\t$channel_url\n";
        print "Start Epoc\t$start_epoc\n";
        print "End Epoc\t$end_epoc\n";
        print "Certificate\t$cert\n";
        print "Star Rating\t$rating\n";
        print "-----------------------------------------------\n";
    }

    $rec_count++;
    print "Records Loaded $rec_count\r";
    # The original VALUES list was cut off after '$indexref'; the column order
    # below is an assumption that mirrors the debug dump. Placeholders let DBI
    # quote each value, so titles containing apostrophes no longer break the SQL.
    $dbh->do("INSERT INTO listings VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)", undef,
             $indexref, $chname, $title, $category, $recdate, "$rec_hh:$rec_mm",
             "$dhh:$dmm:00", $desc, $channel_url, $start_epoc, $end_epoc, $cert, $rating);
}

# Calculate the difference between the start and finish time
# hence the duration of the recording
sub time_duration {
    my ($sepoch, $eepoch) = @_;

    my $diff = $eepoch - $sepoch;

    my (undef, $rmm, $rhh) = gmtime($diff);                   # Valid for durations under 24 hours
    return (sprintf('%02d', $rhh), sprintf('%02d', $rmm));    # Two-digit hours and minutes
}

# Process the main configuration file
sub config_file {
    open(my $conf, '<', $configuration) or die "Could NOT open configuration file $configuration: $!";
    while (my $line = <$conf>) {
        chomp $line;
        next if $line =~ /#/;                                 # Skip comment lines
        my ($key, $value) = split /=/, $line, 2;
        next unless defined $value;                           # Ignore lines without "key=value"
        $value =~ s/^\s+|\s+$//g;                             # Trim surrounding white space
        if    ($key eq 'lead_in')      { $cf_leadin    = $value; }
        elsif ($key eq 'lead_out')     { $cf_leadout   = $value; }
        elsif ($key eq 'channel_file') { $channel_file = $value; }
        elsif ($key eq 'xmltv_file')   { $xml_file     = $value; }
    }
    close $conf;
}

# Some simple formatting to keep the "at" command happy
sub at_format {
    my $num = shift;
    return sprintf("%02d", $num);                             # Add a leading zero to numbers below 10
}

# Remove non-printable characters from text
sub cleanse {
    my ($var) = @_;

    #$var =~ s/\./ /g;
    $var =~ s/'//g;                                           # Strip apostrophes
    $var =~ s/,/ /g;                                          # Replace commas with spaces

    $var =~ s/%0[DA]/ /gi;                                    # Encoded carriage returns and line feeds become spaces
    $var =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;              # Decode the remaining %XX escapes
    $var =~ s/[[:cntrl:]]/ /g;                                # Any other control characters become spaces
    return $var;
}
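The `at_format` helper and the `DD.MM.YYYY` recording date suggest the recordings are scheduled with at(1), whose date syntax accepts `DD.MM.[CC]YY`. The scheduling code itself isn't part of this script. As a sketch, an `at`-style time can be built from an epoch with the core POSIX `strftime`. This assumes local time is what `at` should see, and `at_timespec` is a hypothetical name:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch as "HH:MM DD.MM.YYYY", a time-and-date
# form accepted by at(1). strftime's %H, %M, %d and %m are already zero-padded,
# so no separate leading-zero step is needed.
sub at_timespec {
    my ($epoch) = @_;
    return strftime('%H:%M %d.%m.%Y', localtime($epoch));
}

my $spec = at_timespec(time() + 3600);
print "$spec\n";    # one hour from now, in local time
```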