ofnuts wrote:
....
This ('×,Ä,Ü' for 'ž,Ž,š') is exactly what you get when you use CP852 to display something which is intended for CP1250. So it looks like your input data is CP1250. What kind of errors do you get when you use CP1250?
The final question mark really is the ASCII code for a question mark (63 in decimal), so the data may have been corrupted on input, maybe because the 'ć' was not considered a valid character. But then an automatic change of 'i?' to 'ić' at the end of strings may be a possible way to fix this.
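For anyone who wants to check the CP852/CP1250 claim above, here is a minimal sketch. It is written for Python 3 (the code in this thread is Python 2) and only uses the standard `cp1250` and `cp852` codecs; the variable names are made up for the illustration.

```python
# Python 3 sketch: the same bytes read under two different code pages.
intended = "žŽš"

# Bytes as a CP1250 source would store them.
raw = intended.encode("cp1250")      # b'\x9e\x8e\x9a'

# Reading those bytes as CP852 yields the garbled characters.
misread = raw.decode("cp852")
print(misread)                       # ×ÄÜ

# Reading them as CP1250 restores the original text.
restored = raw.decode("cp1250")
print(restored)                      # žŽš
```

So the garbling of 'ž,Ž,š' is a display problem only: the bytes themselves are intact and decode correctly as CP1250.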
Here is what I get when I apply CP1250 to that bit of code,
for row in results:
    trial_nam = row[0]
    print trial_nam
    print unicode(trial_nam, encoding='CP1250')
Karadži?
Karadži?
Staniši? & Simatovi?
Staniši? & Simatovi?
Boškoski & Tar?ulovski
Boškoski & Tar?ulovski
?or?evi?
?or?evi?
Ražnatovi?, Željko - "Arkan"
Ražnatovi?, Željko - "Arkan"
So the results are exactly the same whether I apply the unicode function or not. The only things missing at this point are the accented 'c' and 'd' characters ('ć', 'č', 'đ' and 'Đ'), which all come out as '?'.
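One possible explanation for the identical output, sketched in Python 3: if the '?' is already stored as the byte 63, no choice of code page at decode time can bring the letter back. The lossy step below (encoding to `cp1252` with `errors="replace"`) is purely an assumption for the sake of the demonstration; nothing in this thread says how the data was actually written.

```python
# Python 3 sketch. ASSUMPTION: the lossy step was a conversion to a code
# page that has no 'ć' (cp1252 is used here only as an example).
original = "Karadžić"

# 'ž' exists in cp1252, 'ć' does not, so 'ć' is replaced by a literal '?'.
stored = original.encode("cp1252", errors="replace")
print(stored)        # b'Karad\x9ei?'
print(stored[-1])    # 63, the ASCII code of '?'

# Decoding as CP1250 recovers 'ž' but can only return '?' for byte 63.
decoded = stored.decode("cp1250")
print(decoded)       # Karadži?
```

If something like this happened, the fix has to be made where the data is written or imported, not where it is printed.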
Again, it should be something like,
Karadžić
Stanišić & Simatović
Boškoski & Tarčulovski
Đorđević
Ražnatović, Željko - "Arkan"
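The 'i?' to 'ić' substitution suggested in the quoted post could be sketched roughly like this in Python 3. The function name `repair_trailing_ic` and the sample list are hypothetical, and the rule is only a heuristic: it assumes every 'i?' at the end of a word was originally 'ić', and it cannot recover the 'č' or 'đ' cases.

```python
import re

# Heuristic: 'i?' that is not followed by another word character.
TRAILING_IC = re.compile(r"i\?(?!\w)")


def repair_trailing_ic(name):
    """Replace a word-final 'i?' with 'ić' (hypothetical helper)."""
    return TRAILING_IC.sub("ić", name)


names = [
    "Karadži?",
    "Staniši? & Simatovi?",
    "Boškoski & Tar?ulovski",
    "?or?evi?",
]
repaired = [repair_trailing_ic(n) for n in names]
for name in repaired:
    print(name)
# Karadžić
# Stanišić & Simatović
# Boškoski & Tar?ulovski
# ?or?ević
```

The last two lines show the limit of the approach: 'Tarčulovski' and 'Đorđević' still need the original data or a manual lookup table, and any genuine question mark after a word ending in 'i' would be rewritten by mistake.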
I might add that the names are Serbian. The native script for Serbian is Cyrillic, but printing Serbian words in English text requires the characters used by Croatians, who write in the basic Latin script plus these extra characters in order to accommodate the language, which the Croatians also speak.