Version 1 of string bytelength

Updated 2003-10-16 20:13:35

string bytelength string

Returns a decimal string giving the number of bytes used to represent string in memory. Because UTF-8 uses one to three bytes to represent Unicode characters, the byte length will not be the same as the character length in general. The cases where a script cares about the byte length are rare. In almost all cases, you should use the string length operation (including determining the length of a Tcl ByteArray object). Refer to the Tcl_NumUtfChars manual entry for more details on the UTF-8 representation.


Example:

   string bytelength "abc"

would return 3.
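For plain ASCII the result is the same as [string length]. The sketch below, which assumes a Tcl 8.x interpreter, compares the two commands on strings that contain non-ASCII characters. The byte counts follow from the UTF-8 encodings of U+00E9 (two bytes) and U+20AC (three bytes).

```tcl
# Compare the character count with the internal UTF-8 byte count.
foreach s [list "abc" "caf\u00e9" "\u20ac10"] {
    puts [format "%-8s chars=%d bytes=%d" $s [string length $s] [string bytelength $s]]
}
```

For "caf\u00e9" this should report 4 characters but 5 bytes, and for "\u20ac10" it should report 3 characters but 5 bytes.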

This command matters because the value it returns is the number of bytes of memory used by the string representation. Since Tcl uses UTF-8 internally, a Unicode character can take from one to three bytes. If your code really cares about memory size, this is the command you need.
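The one-to-three-byte rule can be checked by hand. The proc below is a hypothetical helper (utf8ByteEstimate is not part of Tcl) that predicts the byte count from each character's code point. It assumes the string contains only characters in the range U+0001 to U+FFFF: no NUL characters and nothing outside the Basic Multilingual Plane, where Tcl's internal encoding may differ from this simple rule.

```tcl
# Hypothetical helper: predict the internal UTF-8 byte count of a string
# from its code points, using the one-to-three-byte rule.
# Assumption: every character is in U+0001..U+FFFF.
proc utf8ByteEstimate {str} {
    set total 0
    foreach ch [split $str ""] {
        # %c stores the character's Unicode value; whitespace is not skipped
        scan $ch %c cp
        if {$cp < 0x80} {
            incr total 1   ;# ASCII: one byte
        } elseif {$cp < 0x800} {
            incr total 2   ;# e.g. accented Latin letters: two bytes
        } else {
            incr total 3   ;# rest of the BMP, e.g. the euro sign: three bytes
        }
    }
    return $total
}

set s "na\u00efve \u20ac \u4e2d"
puts "estimate: [utf8ByteEstimate $s]  bytelength: [string bytelength $s]"
```

Under these assumptions the estimate and [string bytelength] should agree (14 bytes for 9 characters here).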

Note that, perhaps confusingly, [string bytelength] should not be used with binary data. It measures the length in bytes of the UTF-8 representation of a string. Binary data should not be converted to UTF-8 at all, so [string bytelength] gives a misleading answer for it. Use [string length] instead.
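A related case: if what you need is the size of a string once encoded for output (for example, a length field written before the data on a socket), one approach is to convert the string to binary data explicitly and then measure it with [string length], in line with the advice above. A minimal sketch, assuming UTF-8 is the target encoding:

```tcl
set text "caf\u00e9"
# encoding convertto returns binary data (one byte per character),
# so [string length] on the result counts encoded bytes.
set bytes [encoding convertto utf-8 $text]
puts "characters: [string length $text]  encoded bytes: [string length $bytes]"
```

Here the string has 4 characters and its UTF-8 encoding should be 5 bytes long. Swap utf-8 for another encoding name to measure a different output encoding.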

US: Proof for the sceptical:

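 # Build the list 0..255, one entry per possible byte value
 set cl {}
 for {set n 0} {$n < 256} {incr n} {
   lappend cl $n
 }
 # Pack each value into a single byte (c = 8-bit integer)
 set str [binary format c* $cl]
 # string length counts the 256 bytes; string bytelength counts the
 # larger UTF-8 form, since values above 127 need two bytes each
 puts "len: [string length $str]"
 puts "blen: [string bytelength $str]"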
