Info Font, Symbol, Character Question Marks and Diamond error icons.

Unicode, UTF-8 encoding, and Windows-1252 encoding

I've seen many posts where typographic quotation marks are being rendered as junk characters. This is a typical example: "a â€˜scoreâ€™ can be worked out"

That could be the result of Unicode characters being stored using UTF-8 encoding but interpreted and displayed using Windows-1252 encoding. Working backwards from what's displayed:

â€˜ in Windows-1252 encoding maps to the bytes E2 80 98 = 1110 0010 1000 0000 1001 1000

Using UTF-8 encoding those three bytes represent a single character (more properly, a Unicode code point) using the 16 bits within the curly brackets: 1110 {0010} 10{00 0000} 10{01 1000} The other bits are part of the UTF-8 encoding overhead and are ignored here.

That leaves: 0010 0000 0001 1000 = 20 18 (hexadecimal) = code point 8216 (U+2018) = left single quotation mark. Similarly, â€™ maps to Unicode code point 8217 (U+2019), right single quotation mark.
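For anyone who wants to check the arithmetic, here is a small Python sketch of the same steps. The variable names are only for illustration; the byte values are the ones worked through above.

```python
# The three bytes that UTF-8 uses for the left single quotation mark.
raw = bytes([0xE2, 0x80, 0x98])

# Misreading those bytes as Windows-1252 yields three junk characters.
junk = raw.decode("cp1252")

# Reading them as UTF-8 yields the single intended character.
good = raw.decode("utf-8")

# Manual decode: keep the low 4 bits of the lead byte and the low
# 6 bits of each continuation byte, then join them into 16 bits.
code_point = ((raw[0] & 0x0F) << 12) | ((raw[1] & 0x3F) << 6) | (raw[2] & 0x3F)

print(junk, good, code_point, hex(code_point))
```

The masks 0x0F and 0x3F simply drop the UTF-8 overhead bits, which is the same thing the curly brackets mark out by hand.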

I wish I could say there's a trivial fix for this, but if your database has data encoded both ways there's probably not one that's going to work perfectly.
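As a rough illustration of why mixed data is awkward, here is a minimal Python sketch of a per-string repair. `repair_mojibake` is a hypothetical helper, not anything from the forum software, and it assumes the only damage is UTF-8 text that was read as Windows-1252.

```python
def repair_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8-read-as-Windows-1252 damage if present."""
    try:
        # Recover the original bytes, then read them as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip failed, so assume the text was already correct.
        return text


print(repair_mojibake("a \u00e2\u20ac\u02dcscore\u00e2\u20ac\u2122 can be worked out"))
print(repair_mojibake("a \u2018score\u2019 can be worked out"))
```

This is a heuristic, not a perfect fix: text that was correct but happens to survive the round trip would be changed, and damaged text containing bytes that Windows-1252 leaves undefined is left as it is.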

You might try changing this:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

to this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

UTF-8 encoding is the standard now, and has been for some years. But some Windows-1252 encodings are not valid UTF-8 encodings, and those characters (for example, accented letters stored as a single byte above 7F) might display as something else, such as a question mark or a diamond icon.
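Here is a short Python sketch of that failure mode, using the £ sign as the example character. It assumes a lenient decoder that substitutes the replacement character instead of raising an error, which is how browsers typically behave.

```python
# The pound sign as Windows-1252 stores it: a single byte, A3.
pound_cp1252 = "£".encode("cp1252")

# A lone A3 is not a valid UTF-8 sequence, so a lenient UTF-8 decoder
# substitutes U+FFFD, the replacement character.
shown = pound_cp1252.decode("utf-8", errors="replace")

# The same character stored as UTF-8 takes two bytes and decodes cleanly.
pound_utf8 = "£".encode("utf-8")

print(pound_cp1252, repr(shown), pound_utf8)
```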
 
the problem is that we have to ensure vb and the db are both using utf-8 encoding which requires us to run a job on the whole db. we'll get it addressed - just need to schedule the db job (which means taking site down for a while) and do it right.

alasdair
 
������

Awww.. Cute. ��

It's a unicode unicorn.

EDIT: Annd they got fucked on my post. That's strange.. They render fine when I'm typing it. Why did it change yours and not mine? Might go back to the test subforum and play around with it more. It got changed from U+1F984 to the Unicode replacement character U+FFFD.
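For reference, a quick Python sketch of what those two code points are. It only inspects the characters; it does not show where in the site's pipeline the substitution happened, which this thread does not establish.

```python
import unicodedata

unicorn = "\U0001F984"
replacement = "\ufffd"

# The unicorn lies outside the 16-bit range, so it needs four bytes
# in UTF-8 and a surrogate pair (two code units) in UTF-16.
utf8_bytes = unicorn.encode("utf-8")
utf16_units = len(unicorn.encode("utf-16-be")) // 2

print(unicodedata.name(unicorn), utf8_bytes.hex(" "), utf16_units)
print(unicodedata.name(replacement))
```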
 

Got it, thank you!
 

Makes sense since the difficulties all kicked off when the site went down in October and we switched to the backup db (iirc).
 
All of this should be resolved now.

We officially have £ signs back, welcome to the 80's!
 