A recent discussion in the chat room turned on the frustration of receiving characters in an unknown encoding, and the wish for a utility that turns such characters into something nearly readable. It resulted in these suggestions:

RS: Larry: Yes. One might wrap that functionality in a text widget, into which you paste the suspicious page, and have a listbox to choose from everything that "encoding names" offers... a double-click on a listbox item converts the text contents.

RS: Rolf: "encoding convertfrom foo" turns the questionable characters into UTF-8, which you can inspect directly in your text widget. Writing it to a text file, or stdout, involves an implicit "encoding convertto [encoding system]", and the system encoding often is iso8859-1 or cp1252. So if the input had Russian or Greek characters, these will not come through to the system encoding, but will be replaced by question marks.
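The listbox idea can also be tried without Tk. The following is only a minimal sketch, assuming Tcl 8.5 or later (for dict); the proc name tryEncodings and the sample string are made up for illustration. It decodes the same raw bytes with every encoding that "encoding names" reports and collects the results, so the candidates can be compared side by side.

```tcl
# Hypothetical helper: decode the same raw bytes with every known encoding.
# Returns a dict mapping encoding name -> decoded text.
proc tryEncodings {bytes} {
    set result [dict create]
    foreach name [lsort [encoding names]] {
        # Skip encodings that refuse the input instead of aborting the loop.
        if {[catch {encoding convertfrom $name $bytes} text]} {
            continue
        }
        dict set result $name $text
    }
    return $result
}

# Sample input (an assumption): an apostrophe stored as byte 0x92.
set suspicious "It\x92s here"
dict for {name text} [tryEncodings $suspicious] {
    puts [format "%-15s %s" $name $text]
}
```

Keep in mind that puts converts to the system encoding again, so characters that encoding cannot represent may still show up as question marks on the terminal; a Tk text widget does not have that limitation.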

Alas, the situation is this: LV regularly receives emails, files, web pages, etc. without an encoding specified. While most of the text appears correct, the punctuation is skewed: it shows up as ? or \x92, etc., instead of things like " or ' or -.

All he wants is to make the weird punctuation marks readable (he figures that in their creators' editors they look like various fancy punctuation characters).

Examples: The Euro sign is represented in Windows cp1252 as \x80. If you receive such strings, you can see the real Euro sign with

 encoding convertfrom cp1252 \x80

Back you go with

 encoding convertto cp1252 \u20AC

To find out which default encoding is used for communication with the OS (including file I/O), use

 encoding system

To see which encodings are delivered with your Tcl version, use

 encoding names
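The four commands above can be checked together in a short script. This is a sketch, assuming Tcl 8.5 or later (for the "in" operator); the variable names are arbitrary. Because a terminal may not display the Euro sign itself, it prints the code point and the byte value instead.

```tcl
# Decode the cp1252 byte 0x80 and report the resulting code point.
set euro [encoding convertfrom cp1252 \x80]
scan $euro %c code
puts [format "U+%04X" $code]

# Encode the Euro sign again and dump the resulting byte as hex.
set bytes [encoding convertto cp1252 \u20AC]
binary scan $bytes H* hex
puts $hex

# cp1252 is only usable if it is among the shipped encodings.
puts [expr {"cp1252" in [encoding names]}]

# The encoding that puts, file I/O, etc. use by default on this machine.
puts [encoding system]
```

On a stock installation the first three lines of output should be U+20AC, 80 and 1; the last line depends on the machine.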

Can I use the 'encoding' command (or some appropriate 'fconfigure -encoding') to take a Tcl source file (*.tcl) in an arbitrary encoding and output a well-formed Tcl source file which is pure ASCII (i.e. all chars > 127 have been converted to \uhhhh Unicode escape sequences)?

RS: Sure, but some assembly required - like this:

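 # Replace every character above 127 with its \uhhhh escape
 proc u2x s {
    set res ""
    foreach i [split $s ""] {
        scan $i %c int
        if {$int<128} {
           append res $i
        } else {
           append res \\u[format %04.4X $int]
        }
    }
    set res
 }
 # Read the file in its original encoding, escape it, write it back out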
 set fp [open $filename]
 fconfigure $fp -encoding $originalEncoding
 set data [u2x [read $fp [file size $filename]]]
 close $fp
 set fp2 [open $newFilename w]
 puts -nonewline $fp2 $data
 close $fp2 

The "u2x" functionality is easily done, but it is also built in somewhere in Tk: on Unix, codes for which no font has a character are substituted in "\uxxxx" style... (Windows mostly shows an empty rectangle). See Unicode and UTF-8
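One way to gain confidence in the escaped output is a round trip: since \uhhhh is ordinary Tcl backslash syntax, subst with command and variable substitution switched off should turn the escaped text back into the original. The sketch below restates u2x so that it runs on its own; the sample string is an assumption, and characters beyond U+FFFF are not considered.

```tcl
# Restated here only so that the snippet is self-contained.
proc u2x s {
    set res ""
    foreach i [split $s ""] {
        scan $i %c int
        if {$int < 128} {
            append res $i
        } else {
            append res \\u[format %04X $int]
        }
    }
    return $res
}

set original "caf\u00E9 costs 5 \u20AC"
set escaped  [u2x $original]

# The escaped form contains only ASCII characters.
puts $escaped
puts [string is ascii $escaped]

# Backslash substitution alone restores the original string.
set restored [subst -nocommands -novariables $escaped]
puts [string equal $original $restored]
```

Note that this check only suits plain text. In a real Tcl source file subst would also process the backslash sequences that were there before escaping (\n, \t, and so on), so the result would differ from the original file for reasons unrelated to u2x.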

