A method of [encoding] [UNICODE] characters. It uses a variable number of bytes per character (1 to 4 in standard UTF-8; older Tcl releases that handle only the Basic Multilingual Plane internally never need more than 3), and it has the useful property that characters from the [ASCII] subset (the majority of those found in most [Tcl] programs and much other text) are encoded as single bytes.

Internally, Tcl uses a pseudo-UTF-8 encoding for most of its strings. This differs from the standard encoding in exactly one way: the NUL character (\u0000) is encoded using two bytes, 0xC0 0x80 (i.e. in overlong, non-shortest form). Because a real zero byte then never occurs inside the string data, strings can serve as binary-safe containers while still keeping the C-string property that a zero byte terminates the string.

See also [Unicode and UTF-8].

----
[DKF]: Here's a little utility procedure I wrote today when I needed to convert a UNICODE character into a set of UTF-8 encoded hex digits (for a C string literal):

======
proc toutf8 c {
    # Encode the character(s) as standard UTF-8 bytes
    set s [encoding convertto utf-8 $c]
    # Split the byte string into a list of unsigned 8-bit integers
    binary scan $s cu* x
    # Emit one \xNN escape per byte
    format [string repeat \\x%x [string length $s]] {*}$x
}
======

Demonstrating:

===
'''%''' toutf8 \u1234
''\xe1\x88\xb4''
===

----
!!!!!!
%|[Category Glossary]|%
!!!!!!