encoding convertfrom

    :   '''`[encoding] convertfrom`''' ?''encoding''? ''data''

Convert ''data'' to Unicode from the specified ''encoding''.
The characters in ''data'' are treated as binary data
where the lower 8 bits of each character are taken as a single byte (i.e., ''data'' is interpreted as a byte array).
The resulting sequence of bytes is converted from ''encoding'' to a [Unicode]
string.
If ''encoding'' is not specified, the current [encoding system%|%system
encoding] is used.
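
As a minimal sketch of the description above, the following decodes the same character from two different encodings. The variable names are illustrative only, and the byte values are chosen for this example.

======
# Two bytes that encode U+00E9 (e with acute accent) in utf-8.
set bytes [binary format H* c3a9]
set text [encoding convertfrom utf-8 $bytes]
set byteCount [string length $bytes]   ;# 2 bytes in
set charCount [string length $text]    ;# 1 character out
scan $text %c codepoint                ;# $codepoint == 233

# The same character is a single byte, 0xE9, in iso8859-1.
set latin [encoding convertfrom iso8859-1 [binary format H* e9]]
set same [string equal $text $latin]   ;# $same == 1
======

The result depends entirely on which ''encoding'' is named: the input is only a sequence of bytes until it is decoded.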



** Invalid Data **

[PYK] 2017-08-19: If a string to be converted from `utf-8` contains invalid
utf-8 byte sequences, each invalid byte is interpreted as an 8-bit integer and
converted to the Unicode character at that code point. I.e., `encoding
convertfrom utf-8` will never fail, so it cannot be used to determine whether
a string is valid utf-8.

======
set value [binary format c 239]
set value [encoding convertfrom utf-8 $value]
scan $value %c codepoint ; # $codepoint == 239
======

For comparison, here is the same operation on a valid utf-8 sequence: 

======
set value [binary format ccc 239 188 129]
set value [encoding convertfrom utf-8 $value]
scan $value %c codepoint ; # $codepoint == 65281
======
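
One possible workaround, offered only as a rough sketch, is to decode the bytes, encode the result again, and compare it with the input. `roundTripsAsUtf8` is a hypothetical helper, not a Tcl command. The check is a heuristic: depending on the Tcl version, some invalid sequences may survive the round trip and some valid ones (for example 4-byte sequences) may not. The `catch` is there on the assumption that some Tcl versions raise an error on invalid input instead of converting it.

======
# Hypothetical helper, not part of Tcl: returns 1 if $bytes survives a
# utf-8 decode/encode round trip unchanged, 0 otherwise.
proc roundTripsAsUtf8 {bytes} {
    # catch covers Tcl versions that raise an error on invalid input.
    if {[catch {encoding convertfrom utf-8 $bytes} text]} {
        return 0
    }
    if {[catch {encoding convertto utf-8 $text} again]} {
        return 0
    }
    return [string equal $bytes $again]
}

set lone  [roundTripsAsUtf8 [binary format c 239]]            ;# 0
set valid [roundTripsAsUtf8 [binary format ccc 239 188 129]]  ;# 1
======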



** Fonts and Encodings **
[MG]: I have a bit of a strange problem, hopefully someone can help. This example script shows what I'm trying to do - it displays cp437-encoded text in a [text] widget:


======
text .t -font Term
pack .t
.t insert end [format %c 152]
.t insert end [encoding convertfrom cp437 [format %c 152]]
======


The Term font being used is available at http://8bit.memoryleak.org/Flag/Term.ttf and is designed for displaying cp437-encoded text.

Character 152 in cp437 is a y-umlaut. However, the first insert displays a placeholder character (a solid down-arrow) instead. The second does display a y-umlaut, but it does so by mapping to character 255, which isn't available in the Term font (because it has no meaning in cp437), so Tcl uses a fallback font, and it looks totally wrong (Term is fixed-width and quite bold; the fallback font, Lucida Sans Unicode, doesn't match up at all).

I can use the Term font in other (non-Tcl) applications, for instance MS Word, and insert char 152, which gives a y-umlaut without any problem. I honestly have no idea what's causing this issue; can anyone shed any light?
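
To illustrate the code points involved in the script above, this sketch inspects the two strings being inserted. It assumes the `cp437` encoding is available in the Tcl installation; the variable names are illustrative only.

======
# The first insert passes the character with code point 152 (U+0098) as is.
set raw [format %c 152]
scan $raw %c rawCode                 ;# $rawCode == 152

# The second insert treats 152 as a cp437 byte and decodes it,
# which yields the Unicode character at code point 255 (U+00FF).
set decoded [encoding convertfrom cp437 $raw]
scan $decoded %c decodedCode         ;# $decodedCode == 255
======

So the two inserts ask the widget to render two different Unicode characters, which matches the different glyphs described above.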

** See Also **

   * [encoding]
   * [encoding convertto]
   * [encoding names]
   * [encoding system]

<<categories>> Command | Tcl syntax help | Binary Data