**update 7 mar 2024**

The '''CORE''' problem is ''not'' just `string length`. The '''CORE''' problem is the representation of the string in `TCL` (and in all other programming languages as well).

Let's start simple: the problem is the same as the ''multibyte chars problem'' raised ~30 years ago.

   * At the beginning there was only `ascii` and a string was a `char*`. After a while people figured out that there are languages with ''more'' chars than plain `ascii`.
   * It took ~20 years to implement `utf8` and finally `utf16` in all programming languages.

`TCL` uses `utf8` as script encoding and `utf16` internally as string representation, because a ''string'' is only useful if ''index'' operations such as `string index $str 0` can be performed, and `utf16` is '''index-able''' while `utf8` is not.

`string length` is not the only problem: none of the ''index operations'' such as `string index`, `string range` or `format %40s` work on such a string either.

The '''CORE''' problem is that the ''terminal chars'' are embedded into the string and are '''not''' stored as a character attribute.

   * the ''terminal'' is the main ''presentation layer'' of `tcl`.
   * the ''gui toolkit'' is the main ''presentation layer'' of `tk`.

The current `string length` in `tcl` does something in between `string bytelength` and a hypothetical `string visual-length`: ''multibyte chars'' are reduced to a '''single char''', but invisible ''console control chars'' are counted like a `char`.

***solution***

The solution is the same as the one used for `utf16` or `tk`: every `char` has to be represented by a data structure that holds

   1. the ''multibyte char'' (`utf16`)
   1. the ''terminal encoding'' (color, bold, underline, ...)

I call this `utf32` (my own name for it, not the Unicode UTF-32 encoding).

`puts` then operates as follows:

   1. connected to a `terminal`: translate the `utf32` string into a ''stream'' understood by the ''terminal''.
   1. connected to a `file`: do a `utf32` write into the file (perhaps the ''utf32'' can be ''encoded'' into `utf8`).
   1. finally you can write a ''terminal-colored string'' into a file and read it back with the ''color'' information still available.

I understand that '''this''' is not a job for `tcl` alone, because '''all''' programming languages are involved. That is why I think a `utf32` consortium would be good, in which programmers of '''all''' languages work together to implement this.

**original problem**

Hi,

I use a library to format and print a ''debugging message'' which also uses the ''linux terminal color codes''.

The '''CORE''' problem is that `string length ..` is used to format the message and to append some additional information, BUT `string length ..` counts the NON-visible color codes as well as the visible chars, which results in a MIS-formatted string.

example: (the color is not visible in this thread !! - but everything is "green")

======
-> outTXT -> | |DEBUG[1]: aa -> LIST | <- count color codes
-> | |DEBUG[1]: -> | a1 : 111 | <- count color codes
-> | |DEBUG[1]: -> | a2 : 222 | <- count color codes
-> | |DEBUG[1]: -> | a3 : 333 |
======

a color code is something like

======
namespace eval lib_debug {
  ## ------------------------------------------------------------------------
  ## DEBUG helper
  ##   Black         0;30     Dark Gray     1;30
  ##   Red           0;31     Light Red     1;31
  ##   Green         0;32     Light Green   1;32
  ##   Brown/Orange  0;33     Yellow        1;33
  ##   Blue          0;34     Light Blue    1;34
  ##   Purple        0;35     Light Purple  1;35
  ##   Cyan          0;36     Light Cyan    1;36
  ##   Light Gray    0;37     White         1;37
  ##
  ## 8bit (256) colors: https://stackoverflow.com/questions/4842424/list-of-ansi-color-escape-sequences
  ## every code starts with the ESC char (\033) followed by "["
  variable CL_COLOR
  set CL_COLOR(red)       "\033\[1;31m"
  set CL_COLOR(green)     "\033\[1;32m"
  set CL_COLOR(yellow)    "\033\[1;33m"
  set CL_COLOR(blue)      "\033\[1;34m"
  set CL_COLOR(purple)    "\033\[38;5;206m"
  set CL_COLOR(cyan)      "\033\[1;36m"
  set CL_COLOR(lightcyan) "\033\[38;5;51m"
  set CL_COLOR(white)     "\033\[1;37m"
  set CL_COLOR(grey)      "\033\[38;5;254m"
  set CL_COLOR(orange)    "\033\[38;5;202m"
  set CL_COLOR(no)        ""
  variable CL_RESET       "\033\[0;m"
}
======

code:

======
set tstmsg "$lib_debug::CL_COLOR(red)123$lib_debug::CL_RESET"
puts "$tstmsg"
puts [string length $tstmsg]
======

result:

======
123    (the color of "123" is red)
15     (this should be 3; the other 12 chars are the two escape sequences)
======

There is something MISSING in TCL: a `string length ...` that only counts the !! VISIBLE !! chars.

regards, ao

----
[GWL] The definition of what a !! VISIBLE !! char is depends on the device that is displaying the string. This is not something TCL is missing. This is very application specific.