OK, I have a small test file that contains utf-8 codes. Here it is (the language is Wolof)

Fˆndeen d‘kk la bu ay wolof aki seereer a fa nekk. DigantŽem ak
Cees jur—om-benni kilomeetar la. MbŽyum gerte ‘pp ci diiwaan bi mu

that is what it looks like in a vanilla editor, but in hex it is:

xxd test.txt
0000000: 46cb 866e 6465 656e 2064 e280 986b 6b20  F..ndeen d...kk 
0000010: 6c61 2062 7520 6179 2077 6f6c 6f66 2061  la bu ay wolof a
0000020: 6b69 2073 6565 7265 6572 2061 2066 6120  ki seereer a fa 
0000030: 6e65 6b6b 2e20 4469 6761 6e74 c5bd 656d  nekk. Digant..em
0000040: 2061 6b0d 0a43 6565 7320 6a75 72e2 8094   ak..Cees jur...
0000050: 6f6d 2d62 656e 6e69 206b 696c 6f6d 6565  om-benni kilomee
0000060: 7461 7220 6c61 2e20 4d62 c5bd 7975 6d20  tar la. Mb..yum 
0000070: 6765 7274 6520 e280 9870 7020 6369 2064  gerte ...pp ci d
0000080: 6969 7761 616e 2062 6920 6d75 0d0a       iiwaan bi mu..

The second character [cb86] is a non-standard coding for a-grave [à] which is found quite
consistently in web documents, although in 'real' utf-8, a-grave would be c3a0. Real utf-8 works
beautifully on Macs and under Windows.

I handle the fake utf-8 by using a character map which included the pair { ˆ à } because that
little caret is what cb86 generates, and everything works fine ON A MAC for displaying text (in a text widget)
like this:

Fàndeen dëkk la bu ay wolof aki seereer a fa nekk. Digantéem ak
Cees juróom-benni kilomeetar la. Mbéyum gerte ëpp ci diiwaan bi mu

On a PC - using the same file (shared) the first three characters read in are
46 cb 20 (using no fconfigure). I have run through ALL the possible encodings
and can never get the same map to work. [There are twenty that will allow 46 cb 86]

Sorry this is so long, but if anyone has a clue, I would love to hear it.

Tel Monks