Richard Suchenwirth - The little tool below lets you select an encoding in the listbox and displays the characters between \x00 and \xFF of that encoding, so it is best suited to single-byte encodings. It is especially useful for checking the various cp... code pages.
    package require Tk

    listbox .lb -yscrollcommand ".y set" -width 16
    bind .lb <Double-1> {showCodepage .t [selection get]}
    scrollbar .y -command ".lb yview"
    text .t -bg white -height 32 -wrap word
    pack .lb .y .t -side left -fill y
    pack .t -fill both -expand 1
    foreach encoding [lsort [encoding names]] {
        .lb insert end $encoding
    }

    proc showCodepage {w encoding} {
        $w delete 1.0 end
        wm title . $encoding
        set hexdigits [list 0 1 2 3 4 5 6 7 8 9 A B C D E F]
        foreach high $hexdigits {
            foreach low $hexdigits {
                set c [encoding convertfrom $encoding [subst \\x$high$low]]
                $w insert end "$high$low:$c "
            }
            $w insert end \n\n
        }
    } ;# RS
How does one do the reverse of this: given an encoding name and a character or utf-8 decimal value (possibly > 255):
    scan $c %c decVal
How can one get from this to the code to use for that character in the particular encoding given? - RS: Elementary:
    encoding convertto $target_encoding $utf8string
    encoding convertto $target_encoding [format %c $decimal_unicode]
But this gives one a character, not a decimal value, doesn't it (but I am confused, so please bear with me)? For example I know that a bullet \u2022 is actually decimal-165 in the macRoman encoding, decimal-8226 in utf-8, and something else in iso8859-1, but given a utf8string "\u2022", how do I generate those numbers (which I understand are the code-page indices or whatever of that glyph in that encoding)? - RS: Characters are decimal values. After encoding convertto, you can just scan it out:
    % scan [encoding convertto macRoman \u2022] %c
    165
But:
    % scan [encoding convertto utf-8 \u2022] %c
    226
    % scan [encoding convertto iso8859-1 \u2022] %c
    63
Aren't correct....
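A possible explanation for the two puzzling results, offered as a sketch and not as RS's own answer: scan %c reads only the first character of its input. In utf-8 the bullet is encoded as three bytes, and 226 (0xE2) is just the first of them. iso8859-1 has no bullet at all, so Tcl 8.x substitutes a question mark, which is 63. The helper names encodedBytes and representable below are made up for this illustration, and binary scan with the cu format is assumed to be available (Tcl 8.5 or later).

```tcl
# Hypothetical helper (not part of the original page): return the decimal
# values of all bytes that a string occupies in the given encoding.
proc encodedBytes {encoding string} {
    set bytes [encoding convertto $encoding $string]
    # cu* = every remaining byte as an unsigned 8-bit integer
    binary scan $bytes cu* values
    return $values
}

# Hypothetical helper: 1 if the string survives a round trip through the
# encoding, i.e. every character has a representation there, else 0.
proc representable {encoding string} {
    if {[catch {encoding convertto $encoding $string} bytes]} {
        # Assumption: a strict converter raises an error instead of
        # substituting a replacement character.
        return 0
    }
    string equal $string [encoding convertfrom $encoding $bytes]
}

puts [encodedBytes macRoman \u2022]    ;# one byte
puts [encodedBytes utf-8 \u2022]       ;# three bytes
puts [representable iso8859-1 \u2022]  ;# the bullet is not in Latin-1
```

Under these assumptions the three puts lines print 165, then 226 128 162, then 0. Checking representable first avoids mistaking the fallback 63 for a real code-page position.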