Re: [TV] Character set oddity on HTML pages
In list.comp.tv, I wrote:
> In list.comp.tv, Gerph wrote:
>> which I /think/ is the correct sequence for the e with an accent over it
>> if you were displaying ISO 8859-1 plain version of the UTF-8 encoded
>> character (sorry that sounds complicated).
Actually it's what you get when you:
* Take a UTF-8 character (in this case 'e' with an acute accent)
* Encode each of the two bytes making it up as a separate UTF-8
character (i.e. we've now got 4 bytes)
* Read those Unicode characters back in
* Convert them to HTML entities
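For anyone wanting to reproduce it, here's a rough sketch of those four
steps in Python (not the Perl the grabber actually uses, which isn't
shown here). The function name and the choice of numeric entities are my
own assumptions for illustration; only the sequence of steps comes from
the list above.

```python
def double_encode_to_entities(text: str) -> str:
    """Illustrative only: mimic the double-encoding steps described above."""
    # Step 1: take the character as UTF-8 (e-acute becomes 2 bytes: c3 a9).
    utf8_bytes = text.encode("utf-8")

    # Step 2: treat each byte as a character in its own right (the
    # ISO 8859-1 reading) and encode those as UTF-8 again: 4 bytes.
    twice_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

    # Step 3: read those Unicode characters back in.
    reread = twice_encoded.decode("utf-8")

    # Step 4: convert anything non-ASCII to a numeric HTML entity
    # (assumed entity style; named entities would look different).
    return "".join(f"&#{ord(c)};" if ord(c) > 127 else c for c in reread)


def undo_double_encoding(mangled: str) -> str:
    """Reverse steps 2 and 3 for text that was mangled this way."""
    return mangled.encode("latin-1").decode("utf-8")


print(double_encode_to_entities("\u00e9"))  # prints &#195;&#169;
```

So a single e-acute comes out as the two entities for A-tilde and the
copyright sign, which matches the sort of thing seen on the pages.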
> Yeah, I think Perl's either not being as clever as I thought it was, or
> something's changed in a recent update.
I think it's the former (along with one of the websites switching its
encoding to UTF-8, which Perl wasn't automagically handling). I also
think it's now fixed, but the fix may have ended up breaking one of the
other channels. Let me know if the XML's no longer valid UTF-8 (or if
it or the website is displaying the wrong characters).
Andrew Flegg -- mailto:andrew@xxxxxxxx | http://www.bleb.org/