A method of [encoding] [UNICODE] characters. Each character takes a variable number of bytes: 1 to 4 in standard UTF-8, with characters in the Basic Multilingual Plane needing at most 3. It has the useful property that characters from the [ASCII] subset (the majority of those found in most [Tcl] programs and in much other text) are encoded as single bytes.
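As a rough illustration of the variable width, the sketch below counts the bytes that `encoding convertto utf-8` produces for a few characters. The helper name `utf8len` is made up for this example, and only Basic Multilingual Plane characters are used, since handling of characters beyond U+FFFF varies between Tcl versions.
======
# utf8len is a hypothetical helper, not part of Tcl:
# it returns the number of bytes in the UTF-8 encoding of its argument.
proc utf8len c {
    string length [encoding convertto utf-8 $c]
}

foreach c [list A \u00e9 \u1234] {
    # scan with %c yields the code point of the character
    puts [format "U+%04X -> %d byte(s)" [scan $c %c] [utf8len $c]]
}
# Prints:
#   U+0041 -> 1 byte(s)
#   U+00E9 -> 2 byte(s)
#   U+1234 -> 3 byte(s)
======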

Internally, Tcl uses a pseudo-UTF-8 encoding for most of its strings. This differs from the standard encoding in one important way: the NUL character (\u0000) is encoded using two bytes, 0xC0 0x80 (i.e. in overlong rather than shortest form). This means that strings can serve as binary-safe containers while still maintaining the C-string property of having a zero byte terminate the string.
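The internal two-byte form is not directly visible at the script level, but its effect is. The following minimal sketch (variable names are just for illustration) shows that an embedded NUL does not truncate a string, and that converting to the external utf-8 encoding yields an ordinary zero byte:
======
set s "a\u0000b"

# The embedded NUL is an ordinary character: the string is not cut short.
puts [string length $s]    ;# 3

# In the external utf-8 encoding, NUL comes out as a plain 0x00 byte.
binary scan [encoding convertto utf-8 $s] H* hex
puts $hex                  ;# 610062
======
At the C level, the same string would be held as the bytes 61 C0 80 62 followed by a terminating 00, assuming the internal representation described above.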

See also [Unicode and UTF-8].
----
[DKF]: Here's a little utility procedure I wrote today when I needed to convert a UNICODE character into a set of UTF-8 encoded hex digits (for a C string literal):
======
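# Requires Tcl 8.5+ (the cu scan specifier and {*} expansion).
proc toutf8 c {
    # Encode the character(s) as UTF-8 bytes.
    set s [encoding convertto utf-8 $c]
    # Unpack the bytes as a list of unsigned 8-bit integers.
    binary scan $s cu* x
    # Emit one two-digit \xNN escape per byte.
    format [string repeat \\x%02x [string length $s]] {*}$x
}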
======
Demonstrating:
===
'''%''' toutf8 \u1234
''\xe1\x88\xb4''
===
----
!!!!!!
%|[Category Glossary]|%
!!!!!!