The CORE problem is not just the string length; the CORE problem is the representation of the string in Tcl.
Let's start simple: the problem is the same as the multibyte-char problem raised ~30 years ago.
tcl uses utf8 as script-encoding and utf16 internal as string representation because
The string length problem is not the only one; there is also the index-operation problem, like:
that doesn't work.
The CORE problem is that the terminal control chars are embedded in the string instead of being stored as a character attribute.
The current string length in Tcl does something in between string bytelength and a visual string length: multibyte chars are reduced to a single char, but invisible console control chars are each counted as a char.
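To make the "in between" point concrete, here is a minimal sketch that compares the character count with the UTF-8 byte count for one small string. The test string is an assumption chosen for illustration (a red "é" followed by a reset); on an ANSI terminal it shows a single glyph.

```tcl
# ESC [ 1 ; 3 1 m  +  é  +  ESC [ 0 ; m
# \033 is the ESC control char, \u00e9 is one char but two bytes in UTF-8.
set s "\033\[1;31m\u00e9\033\[0;m"

# Characters as seen by [string length]: é counts once, every char of the
# two escape sequences counts too.
set charLen [string length $s]

# Bytes after converting to UTF-8: é now counts twice.
set byteLen [string length [encoding convertto utf-8 $s]]

puts "chars=$charLen bytes=$byteLen"
```

Neither number matches the one glyph the terminal displays, which is the gap described above.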
The solution is the same as in utf16 or tk: every char has to be represented by a data structure that holds
I call this utf32
The puts operate as:
I understand that this is not a job for Tcl alone, because all programming languages are involved. That is why I think a utf32 consortium would be good, so that programmers of all languages get together to implement this shit.
Hi,
I use a library to format and print debugging messages, and it also uses the Linux terminal color codes.
The CORE problem is that string length is used to format the message and finally add some additional information etc., BUT string length counts the NON-visible color codes as well as the visible chars, which results in a MIS-formatted string.
example: (color is not visible in this thread !! - but all is "green")
-> outTXT<multi-line>
-> | |DEBUG[1]: aa -> LIST | <- count color codes
-> | |DEBUG[1]: -> | a1 : 111 | <- count color codes
-> | |DEBUG[1]: -> | a2 : 222 | <- count color codes
-> | |DEBUG[1]: -> | a3 : 333 |
a color-code is something like
## ------------------------------------------------------------------------
## DEBUG helper
##   Black         0;30     Dark Gray     1;30
##   Red           0;31     Light Red     1;31
##   Green         0;32     Light Green   1;32
##   Brown/Orange  0;33     Yellow        1;33
##   Blue          0;34     Light Blue    1;34
##   Purple        0;35     Light Purple  1;35
##   Cyan          0;36     Light Cyan    1;36
##   Light Gray    0;37     White         1;37
##
## 8bit (256) colors: https://stackoverflow.com/questions/4842424/list-of-ansi-color-escape-sequences
variable CL_COLOR
set CL_COLOR(red)       "\033\[1;31m"
set CL_COLOR(green)     "\033\[1;32m"
set CL_COLOR(yellow)    "\033\[1;33m"
set CL_COLOR(blue)      "\033\[1;34m"
set CL_COLOR(purple)    "\033\[38;5;206m"
set CL_COLOR(cyan)      "\033\[1;36m"
set CL_COLOR(lightcyan) "\033\[38;5;51m"
set CL_COLOR(white)     "\033\[1;37m"
set CL_COLOR(grey)      "\033\[38;5;254m"
set CL_COLOR(orange)    "\033\[38;5;202m"
set CL_COLOR(no)        ""
variable CL_RESET "\033\[0;m"
code:
set tstmsg "$lib_debug::CL_COLOR(red)123$lib_debug::CL_RESET"
puts "$tstmsg"
puts [string length $tstmsg]
result:
123 (the color of "123" is red)
15 (this should be 3)
There is something MISSING in Tcl: a string length that counts only the !! VISIBLE !! chars.
Regards, ao
GWL The definition of what a !! VISIBLE !! char is depends on the device that displays the string. This is not something Tcl is missing; it is very application specific.
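For the specific case in this thread (an ANSI terminal, with color codes as the only non-printing content), an application-level helper could look roughly like the sketch below. The proc names visibleLength and padVisible are hypothetical, and the pattern only covers SGR sequences of the form ESC [ ... m; other escape sequences, wide chars and combining chars are not handled.

```tcl
# Hypothetical helper: length of a string as an ANSI terminal would show it,
# assuming the only non-printing content is SGR sequences (ESC [ ... m).
proc visibleLength {s} {
    # Remove every SGR sequence, then count what is left.
    return [string length [regsub -all {\u001b\[[0-9;]*m} $s {}]]
}

# Hypothetical helper: right-pad a colored string to a visible width.
proc padVisible {s width} {
    set missing [expr {$width - [visibleLength $s]}]
    if {$missing > 0} {
        append s [string repeat " " $missing]
    }
    return $s
}

set tstmsg "\033\[1;31m123\033\[0;m"
puts [string length $tstmsg]    ;# 15
puts [visibleLength $tstmsg]    ;# 3
puts "|[padVisible $tstmsg 6]|" ;# "123" in red, then three spaces
```

A formatting library would call visibleLength wherever it currently calls string length to compute column widths.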
JMN Processing strings containing ANSI SGR sequences can be pretty tricky. A good first step is to use a regex to split into a list of plaintext sections and code sections - possibly each individual ANSI code or possibly all lumped together depending on what you want to do.
If you want to manipulate things and join blocks of text together - you need to do ANSI resets and replays, which is possible, but escalates the complexity quickly.
Perl has Text::ANSI::Util and others, which are good for some ideas.
For my ANSI handling in Tcl I do a perl style split to get an always zero-length or odd length list; starting and ending always with plaintext. You can then do a foreach {plaintext code} $ansisplits {dostuff...} which makes things manageable.
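A minimal sketch of such a split, assuming only SGR sequences (ESC [ ... m) need to be recognized. The proc name ansiSplit is hypothetical and this is not JMN's actual code, just one way to get the same list shape.

```tcl
# Hypothetical helper: split into {plaintext code plaintext code ... plaintext}.
# Returns an empty list for an empty string, otherwise a list of odd length
# that starts and ends with a (possibly empty) plaintext element.
proc ansiSplit {s} {
    if {$s eq ""} { return {} }
    set parts {}
    set pos 0
    # -indices yields one {first last} pair per matched SGR sequence.
    foreach match [regexp -all -inline -indices {\u001b\[[0-9;]*m} $s] {
        lassign $match first last
        lappend parts [string range $s $pos [expr {$first - 1}]]
        lappend parts [string range $s $first $last]
        set pos [expr {$last + 1}]
    }
    # Trailing plaintext (empty if the string ends with a code).
    lappend parts [string range $s $pos end]
    return $parts
}

set ansisplits [ansiSplit "\033\[1;31m123\033\[0;m"]
set visible 0
foreach {plaintext code} $ansisplits {
    # On the last iteration code is empty, because the list length is odd.
    incr visible [string length $plaintext]
}
puts [llength $ansisplits] ;# 5
puts $visible              ;# 3
```

Keeping the codes in the list, rather than stripping them, is what makes later steps such as truncating or re-joining colored text possible.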