
Re: [TV] XMLTV and PHP - written a parser?



Thanks for that, that's perfect.

Incidentally, it displays as inline text in Gmail (indeed, there's no
attachment showing, that's how clever Gmail is) - but doubtless it
would have opened in SubEthaEdit on this Apple Mac.

Windows? Pah!


On Sun, 02 Jan 2005 23:33:47 +0000, simon <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>  James,
>  
>  The Perl source is attached. It was developed and is run on a Linux system,
> hence the lack of file extension. I find that WordPad reads the file well
> under Windows. To give you a starter, the rough logic is as follows:
>  
>  - Connect to the database
>  - Delete all previous records to clean the database before this load
>  - Read a configuration file which is for my application (sets lead-in,
> lead-out, etc. for the recordings)
>  - Open and process previously downloaded XMLTV files writing various
> records either directly from the XMLTV listings or derived for my own
> application
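The configuration step in the list above reads simple settings. A minimal sketch of that step, assuming one `key=value` pair per line with `#` marking a comment line; the key names `lead_in`, `lead_out` and `xmltv_file` come from the attached script, while the helper name `parse_config` and the sample values are hypothetical. It strips the trailing newline and surrounding whitespace so values can be used directly as file names or numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "key=value" lines into a hash reference.
# Comment lines (starting with '#') and blank lines are skipped; keys and
# values are trimmed so trailing newlines do not leak into file names.
sub parse_config {
    my ($fh) = @_;
    my %conf;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*#/ || $line !~ /\S/;   # comment or blank line
        my ($key, $value) = split /=/, $line, 2;     # split on the first '=' only
        next unless defined $value;
        s/^\s+|\s+$//g for $key, $value;             # trim whitespace
        $conf{$key} = $value;
    }
    return \%conf;
}

# Usage with an in-memory file (sample values are made up):
my $text = "# gluepvr settings\nlead_in=2\nlead_out=5\nxmltv_file=/tmp/tv.xml\n";
open my $fh, '<', \$text or die "Cannot open in-memory config: $!";
my $conf = parse_config($fh);
close $fh;
print "lead_in=$conf->{lead_in} lead_out=$conf->{lead_out}\n";
```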
>  
>  Hope this helps and good luck with the project.
>  
>  Regards
>  
>  Simon
> 
>  
>  James Cridland wrote:
>  
>  Simon, that's very kind - I know little Perl, but it might be a good start,
>  as you say...
>  
>  On Sun, 02 Jan 2005 20:32:00 +0000, simon
>  <simon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>  
>  Hi James,
>  
>  As part of my home-grown PVR I've written a bit of Perl code that loads a
>  MySQL database from the XMLTV files downloaded from the Radio Times site.
>  It's not terribly generic since I too am intrinsically lazy and wanted it
>  only for myself... (path of least resistance), but it does parse the XMLTV
>  files and load a database. It should work equally well with any SQL-based
>  database, although I've never tried it. If this sounds like a good starting
>  point for what you want, then drop me a line and I'll gladly donate them.
>  
>  Happy New Year
>  
>  Simon
>  
>  James Cridland wrote:
>  
>  Hello. I'm planning to use bleb.org's data, with others, to do a few things
>  at www.mediauk.com - not just TV schedules, but to build a database of
>  television programmes and add to this database with more information from
>  other sources.
>  
>  I'd like to import the XMLTV format, primarily because other broadcasters
>  may use that too - and, where it exists, I'll take it direct from them. Now,
>  being intrinsically lazy... has anyone written a simple importer for the
>  bleb.org files (and I'm thinking particularly of the XMLTV files)? I'm happy
>  to put in the spadework, but if someone's already invented that wheel, I'd
>  prefer not to have to reinvent it, if at all possible. And, as (I think) the
>  first poster to this list in 2005, Happy New Year.
> ----------------------------------------------------------------------
> Distributed to the bleb.org/tv developer list. Archive available at:
> http://www.bleb.org/tv/maillist/
> To unsubscribe, send 'UNSUBSCRIBE james.cridland@xxxxxxxxx' to
> mailto:tvdevel-request@xxxxxxxxx
> If you have any problems please contact mailto:listmaster@xxxxxxxx
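One detail any XMLTV importer has to handle is the timestamp format in the `start` and `stop` attributes, e.g. `20050102233347 +0000`. The attached script below parses these with a regex and `timelocal`, ignoring the offset. A minimal sketch of an alternative, assuming fourteen digits followed by an optional `+HHMM`/`-HHMM` offset: it converts to a UTC epoch with the core `Time::Local::timegm` and applies the offset, and treats a stamp with no offset as UTC (an assumption of this sketch). It also mirrors the script's "only the next six days" filter. The helper names `xmltv_to_epoch` and `within_days` are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: convert an XMLTV timestamp such as
# "20050102233347 +0000" into seconds since the Unix epoch (UTC).
# Returns undef if the string does not start with 14 digits.
sub xmltv_to_epoch {
    my ($stamp) = @_;
    my ($y, $mon, $d, $h, $min, $s, $tz) =
        $stamp =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\s*([+-]\d{4})?/
        or return;
    my $epoch = timegm($s, $min, $h, $d, $mon - 1, $y);
    if (defined $tz) {
        # "+0100" means the listed time is one hour ahead of UTC, so subtract it.
        my ($sign, $oh, $om) = $tz =~ /^([+-])(\d{2})(\d{2})$/;
        my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
        $epoch -= $offset;
    }
    return $epoch;
}

# Hypothetical helper: true if $epoch is no more than $days days after $now,
# matching the script's "if ($start_epoc > $sixdays) { return(); }" check.
sub within_days {
    my ($epoch, $now, $days) = @_;
    return $epoch <= $now + 86400 * $days;
}

my $start = xmltv_to_epoch('20050102233347 +0000');
print "start epoch: $start\n";
print "within six days of now\n" if within_days($start, time(), 6);
```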
> #!/usr/bin/perl
> #
> # TV Recording - Listings DB load Interface
> #
> # Runs immediately after the XML TV listings have been downloaded. Parses the
> # XML listings files and loads them into a database which is used as a
> # central repository for listing on the web interface and holding the
> # recording information.
> #
> # Version       Change
> # 0.0           Start of the script
> # 0.1           Tidied up the code to implement strict
> # 2.0           Implemented XMLTV format input file with multi-day
> #               capability and lots more fields
> #
> use XML::Twig;
> use Time::Local;
> use DBI;
> use strict;
> use vars qw(
>             $dsn
>             $db_user_name
>             $db_password
>             $dbh
>             $listconf
>             $channel_file
>             $cf_leadin
>             $cf_leadout
>             @listings
>             @channtitles
>             $configuration
>             %CHANHASH
>             $xml_file
>             $rec_count
>             $dbg
>             $timenow
>             $sixdays
>             );
> $|=1;
> $dbg="n";                                     # Set to "y" to dump records as they are loaded
> $configuration="/etc/gluepvr/gluepvr.conf";
> $dsn='DBI:mysql:tvrec:localhost';
> $db_user_name='apache';                       # Webserver user name
> $db_password='xxxxx';                         # Webserver password
> $dbh = DBI->connect($dsn,$db_user_name,$db_password)
>     or die "Can't connect: $DBI::errstr";
> $dbh->{RaiseError} = 1;
> $rec_count=0;
> 
> #------------------------------------------------------------------------
> # MAIN logic
> #------------------------------------------------------------------------
> my $sqldel = "DELETE FROM listings";          # Delete everything from the listings table
> my $sth = $dbh->prepare($sqldel);             # Prepare and execute the SQL
> $sth->execute;
> $sth->finish;                                 # Done with the handle (AutoCommit, on by default, has already committed)

> $timenow = time();                            # Get the current time
> $sixdays = $timenow + (86400*6);              # Get the time six days from now
> &config_file;                                 # Process the configuration file
> my $twigtv= new XML::Twig                     # Set the twig handling routines
>                 (
>                 TwigHandlers =>
>                     {
>                     channel   => \&channel_handler,
>                     programme => \&program_handler
>                     }
>                 );
> 
> my $rc=$twigtv->parsefile($xml_file)          # Parse the xmltv listings file
>     or die "Can't open XMLTV listings file $xml_file";
> 
> open (CONF, ">$channel_file")                 # Write the channel listings file
>     or die "Can't write channel file $channel_file: $!";
> print CONF @channtitles;
> close CONF;
> print "\n";                                   # To leave the load count on the screen
> $dbh->disconnect;                             # Disconnect database
> #------------------------------------------------------------------------
> # Handle the channel description elements from the xmltv listing
> #------------------------------------------------------------------------
> sub channel_handler
> {
>  my($twig,$channel)=@_;                                   # Get the twig and element
>  my $chtitle=$channel->first_child('display-name')->text; # Extract the display name
>  $chtitle=~ s/\s/\_/g;                                    # Replace every white space character with "_"
>  my $chid=$channel->att('id');                            # Extract the id
>  my $chanline="$chtitle=$chid\n";                         # Create a string
>  push @channtitles,$chanline;                             # Add them to the array for eventual output
>  $CHANHASH{$chid}=$chtitle;                               # Create a hash for later reference
> }
> #------------------------------------------------------------------------
> # Parse the programme and load the database
> #------------------------------------------------------------------------
> sub program_handler
> {
> my ($twig,$prog)=@_;                                    # Get the twig and the element
> 
> my $xml_start_time="";
> my $xml_stop_time="";
> my $channel_url="";
> my $title="";
> my $desc="";
> my $category="";
> my $sth="";
> my $cert="";
> my $rating="";
> 
> # Not all elements always exist, so catch errors in case they don't
> my $rc = eval
>  {
>     $xml_start_time=$prog->att('start');                # Start time
>     $xml_stop_time=$prog->att('stop');                  # End time
>     $channel_url=$prog->att('channel');                 # Listing channel URL
>     $title=$prog->first_child('title')->text;           # Program title
>     $desc=$prog->first_child('desc')->text;             # Program description
>     $category=$prog->first_child('category')->text;     # Program category
>  };
> 
> $cert="";
> $rating="";
> 
> if ($category eq "film")
>  {
>  $rc=eval
>   {
>    $cert=$prog->first_child('rating')->first_child('value')->text;
>    $rating=$prog->first_child('star-rating')->first_child('value')->text;
>   }
>  }
> 
> # Parse out the components
> my ($syyyy,$smon,$sdd,$shh,$smm,$sss)=$xml_start_time=~
>     /(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.+)$/;
> my ($eyyyy,$emon,$edd,$ehh,$emm,$ess)=$xml_stop_time=~
>     /(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(.+)$/;
> 
> my $recdate="$sdd.$smon.$syyyy";                        # Set start of recording date

> # Calculate the start time epoch
> my $start_epoc = timelocal($sss,$smm,$shh,$sdd,$smon-1,$syyyy-1900);
> if ($start_epoc > $sixdays){return();}                  # Only interested in up to six days' worth
> # Calculate the end time epoch
> my $end_epoc = timelocal($ess,$emm,$ehh,$edd,$emon-1,$eyyyy-1900);

> my $indexref="$start_epoc$channel_url";                 # Unique reference from the time and the channel

> my $end_record = $end_epoc + ($cf_leadout*60);          # Add the lead-out time
> my $start_record = $start_epoc - ($cf_leadin*60);       # Subtract the lead-in time (for duration calculation)

> (my $dhh, my $dmm)= &time_duration($start_record,$end_record);          # HH:MM format of the duration
> (my $filler,my $rec_mm,my $rec_hh,my  @ignored) = localtime($start_epoc); # Start time back to HH MM

> $rec_mm=&at_format($rec_mm);                            # Simple leading-zero formatting
> $rec_hh=&at_format($rec_hh);
> $dhh=&at_format($dhh);
> $dmm=&at_format($dmm);

> $desc=&cleanse($desc);                                  # Format for HTML display
> $title=&cleanse($title);
> $category=&cleanse($category);
> $rec_count++;
> if($dbg eq "y")
> {
>     print "-----------------------------------------------\n";
>     print "Index ref    $indexref\n";
>     print "Channel      $CHANHASH{$channel_url}\n";
>     print "Title        $title\n";
>     print "Category     $category\n";
>     print "Start date   $recdate\n";
>     print "Start time   $rec_hh:$rec_mm\n";
>     print "Duration     $dhh:$dmm:00\n";
>     print "Description  $desc\n";
>     print "Channel URL  $channel_url\n";
>     print "Start Epoc   $start_epoc\n";
>     print "End Epoc     $end_epoc\n";
>     print "Certificate  $cert\n";
>     print "Star Rating  $rating\n";
>     print "-----------------------------------------------\n";
> }
> 
> print "Records Loaded $rec_count\r";
> 
>   my $sqlins = "INSERT INTO listings
>                 (indexref,
>                  channel,
>                  title,
>                  category,
>                  start_date,
>                  start_time,
>                  duration,
>                  description,
>                  rating,
>                  cert,
>                  infourl,
>                  recstat,
>                  start_epoc,
>                  end_epoc)
>                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
> 
> # Bind the values through placeholders so quotes in the data cannot break the SQL
> $sth = $dbh->prepare($sqlins);
> $sth->execute($indexref, $CHANHASH{$channel_url}, $title, $category,
>               $recdate, "$rec_hh:$rec_mm", "$dhh:$dmm:00", $desc,
>               $rating, $cert, $channel_url, '&nbsp;', $start_epoc, $end_epoc);
> 
> $sth->finish;                           # Done with the handle (AutoCommit has already saved the row)
> return();
> }
> 
> #-------------------------------------------------------------------------
> # Calculate the difference between the start and finish time
> # hence the duration of the recording
> #-------------------------------------------------------------------------
> sub time_duration
> {
> my $sepoch=shift();
> my $eepoch=shift();
> 
> my $diff = $eepoch-$sepoch;
> 
> (my $rss, my $rmm, my $rhh, my @rest) = gmtime($diff);
> return($rhh,$rmm);
> 
> }
> #---------------------------------------------------------------------------
> # Process the main configuration file
> #---------------------------------------------------------------------------
> sub config_file
> {
>  my @conf_file = (1,1);
>  open (CONF, $configuration) or die("Could NOT open configuration file $configuration: $!");
>  @conf_file = <CONF>;
>  close (CONF);

>  foreach my $line (@conf_file)
>  {
>         chomp $line;                            # Drop the newline so values are clean
>         if (!($line =~ m/\#/))
>         {
>         my @line_split = split /=/, $line;
>         if ($line_split[0] eq 'lead_in')        { $cf_leadin = $line_split[1]; }
>         if ($line_split[0] eq 'lead_out')       { $cf_leadout = $line_split[1]; }
>         if ($line_split[0] eq 'channel_file')   { $channel_file = $line_split[1]; }
>         if ($line_split[0] eq 'xmltv_file')     { $xml_file = $line_split[1]; }
>         }
>  }
> }
> #--------------------------------------------------------------------------
> # Some simple formatting to keep the "at" command happy
> #--------------------------------------------------------------------------
> sub at_format
> {
>  my $num=shift();
>  if ($num < 10) {$num = "0$num";}
>  return($num);
> }
> #-----------------------------------------------------------------------
> # Remove non-printable characters from text
> #-----------------------------------------------------------------------
> sub cleanse
> {
> my ($var);
> $var=$_[0];
> 
> #$var =~ s/\./ /g;
> $var =~ s/\'//g;
> $var =~ s/\,/ /g;
> 
> $var =~ s/%([0-9][0-9])/pack("C", hex($1))/eg;
> $var =~ s/%0D/ /g;
> $var =~ s/%0A/ /g;
> $var =~ s/%([3-F][3-F])/pack("C", hex($1))/eg;
> $var =~ s/%([5-F][5-F])/pack("C", hex($1))/eg;
> $var =~ s/%([2-F][2-F])/pack("C", hex($1))/eg;
> 
> return($var);
> }
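The `channel_handler` routine in the script turns each XMLTV `<channel>` into a `Display_Name=id` line plus an id-to-name lookup. A minimal sketch of just that mapping, without XML::Twig, assuming the id and display name have already been extracted; the helper name `channel_map` and the sample ids and names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring channel_handler: given [id, display-name]
# pairs, build "Display_Name=id" lines and an id => name lookup hash.
sub channel_map {
    my @pairs = @_;
    my (@lines, %name_for);
    for my $pair (@pairs) {
        my ($id, $name) = @$pair;
        (my $title = $name) =~ s/\s/_/g;   # every whitespace character becomes "_"
        push @lines, "$title=$id\n";
        $name_for{$id} = $title;
    }
    return (\@lines, \%name_for);
}

my ($lines, $names) = channel_map(
    [ 'example1.id', 'Example One' ],      # made-up ids and names
    [ 'example2.id', 'Example Two HD' ],
);
print @$lines;
```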
> 
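The script widens each recording by the configured lead-in and lead-out (in minutes), then uses `time_duration` (built on `gmtime`) and `at_format` to produce an `HH:MM:00` duration. A minimal sketch of the same calculation with integer arithmetic and `sprintf`, assuming epoch-second inputs; the helper name `recording_window` is hypothetical. Unlike a `gmtime`-based difference, it does not wrap round for recordings of 24 hours or more.

```perl
use strict;
use warnings;

# Hypothetical helper: recording window and its "HH:MM:00" duration.
# $lead_in and $lead_out are minutes, as in the lead_in/lead_out settings.
sub recording_window {
    my ($start, $end, $lead_in, $lead_out) = @_;
    my $rec_start = $start - $lead_in * 60;    # start recording early
    my $rec_end   = $end + $lead_out * 60;     # stop recording late
    my $secs      = $rec_end - $rec_start;
    # Zero-padded hours and minutes; seconds are always "00", as in the script.
    my $duration  = sprintf '%02d:%02d:00', int($secs / 3600), int(($secs % 3600) / 60);
    return ($rec_start, $rec_end, $duration);
}

# A 30-minute programme with 2 minutes lead-in and 5 minutes lead-out.
my ($s, $e, $dur) = recording_window(1_000_000, 1_000_000 + 30 * 60, 2, 5);
print "$dur\n";   # 30 + 2 + 5 = 37 minutes, so "00:37:00"
```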
> 
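The `cleanse` routine strips quotes and commas and decodes `%XX` escapes. A minimal sketch of a similar clean-up, assuming the input may contain `%XX` escapes and literal line breaks; the helper name `tidy_text` is hypothetical, and the order differs from the script on purpose: it decodes escapes first and strips characters afterwards, so an escaped apostrophe (`%27`) is removed too.

```perl
use strict;
use warnings;

# Hypothetical helper in the spirit of cleanse(): decode %XX escapes, turn
# CR/LF into spaces, drop apostrophes and turn commas into spaces.
sub tidy_text {
    my ($text) = @_;
    $text =~ s/%0[DA]/ /gi;                          # escaped CR/LF -> space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # other %XX escapes
    $text =~ s/[\r\n]+/ /g;                          # literal line breaks -> space
    $text =~ tr/'//d;                                # drop apostrophes
    $text =~ tr/,/ /;                                # commas -> spaces
    return $text;
}

print tidy_text("Tom%27s film%2C part%0D%0Atwo"), "\n";
```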
> 


-- 
http://james.cridland.net/