Version 17 of string length

Updated 2020-06-14 00:47:51 by pooryorick

string length returns the number of characters in a value.

Synopsis

string length string

Description

A single character may require more than one byte of storage, so the length of a string may be smaller than the number of bytes required to store it. A string in which every character requires only one byte of storage may be represented internally as a ByteArray. binary format creates such values, as does reading from a binary-encoded channel. Because binary data is typically represented as a string in which each character requires only one byte of storage, string length is the right routine for getting the size of binary data. string bytelength is a historical oddity and is deprecated.
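
The difference between characters and bytes can be seen in a short sketch. The variable names are illustrative only, and the byte count assumes the UTF-8 encoding, in which the character \u00ef occupies two bytes:

```tcl
# Characters versus bytes (illustrative variable names).
set s "na\u00efve"                        ;# 5 characters
set chars [string length $s]              ;# counts characters, not bytes

# Encode to UTF-8; the result is binary data, one byte per character.
set utf8  [encoding convertto utf-8 $s]
set bytes [string length $utf8]

# A value created by binary format: three bytes, so its length is 3.
set bin  [binary format H* 00ff10]
set blen [string length $bin]

puts "chars=$chars bytes=$bytes binary=$blen"
```

Here string length is applied both to the text and to the binary values; no separate byte-counting command is needed.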

The following alias has a name familiar from C:

interp alias {} strlen {} string length

Implementing string length

On 2003-10-17, in the Tcl chatroom, some of us played around with silly pure-Tcl implementations of string length:

proc strlen s {
    set n 0
    foreach char [split $s {}] {incr n}
    set n
} ;# RS


proc strlen s {
    llength [split $s {}]
} ;# AM


proc strlen string {
    regsub -all . $string +1 string
    expr 0$string
} ;# MS


interp alias {} strlen {} regexp -all . ;# MS, jcw


proc strlen string {
    expr 0[regsub -all . $string +1]
} ;# dkf


# The "functional" way:

proc strlen s {
    expr {[regexp {.(.*)} $s - s] ? (1+[strlen $s]) : 0}
} ;# EB

ulis: A recursive way:

proc strlen string {
    if {$string eq {}} {
        return 0
    } else {
        expr {[strlen [string range $string 1 end]] + 1}
    }
}

And the classical iterative way:

proc strlen {string} {
    set n 0
    while {$string ne {}} {
        set string [string range $string 1 end]
        incr n
    }
    return $n
}

Powers of ten (works only for short strings):

proc strlen s {
    expr round(log10(1.[regsub -all . $s *10]))
} ;# RS
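
Since all of these procedures share the name strlen, only one can be defined at a time. As a rough sketch, two of them are repeated here under hypothetical names (strlen_split, strlen_regexp) and compared with the built-in by a small helper, check_strlen, which is likewise made up for this illustration:

```tcl
# Two of the implementations above, renamed so they can coexist.
proc strlen_split s {
    llength [split $s {}]
}
proc strlen_regexp s {
    regexp -all . $s
}

# Compare a candidate command with string length on each sample.
# Returns "ok", or a description of the first mismatch.
proc check_strlen {cmd samples} {
    foreach s $samples {
        set got  [$cmd $s]
        set want [string length $s]
        if {$got != $want} {
            return "mismatch for [list $s]: got $got, want $want"
        }
    }
    return ok
}

set samples [list {} a "hello world" "two\nlines" "\u00e9t\u00e9" \
    [string repeat x 1000]]
foreach cmd {strlen_split strlen_regexp} {
    puts "$cmd: [check_strlen $cmd $samples]"
}
```

The sample list includes the empty string, a newline, and non-ASCII characters, which are the cases most likely to trip up a naive implementation.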

At times, people ask which is the best way to determine whether a string is the empty string:

string equal x x$str

string equal {} $str

![string compare {} $str]

[string length $str] == 0

![string length $str]

$str eq {}

string length and string equal are a bit quicker, as they first look to see whether the strings are of equal size.
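
Anyone who wants to measure this on their own system could start from a rough sketch like the following. The wrapper procedure names are hypothetical, and the numbers reported by time depend on the Tcl version and platform, so none are shown here:

```tcl
# Wrap three of the emptiness tests in procedures so they get byte-compiled.
proc empty_equal  str { string equal {} $str }
proc empty_length str { expr {[string length $str] == 0} }
proc empty_eq     str { expr {$str eq {}} }

# Time each test on a non-empty string; time reports microseconds per iteration.
set str [string repeat x 100]
foreach cmd {empty_equal empty_length empty_eq} {
    puts [format "%-14s %s" $cmd [time {$cmd $str} 100000]]
}
```

Differences between the variants are likely to be small, so repeated runs are advisable before drawing conclusions.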

regexp Bug

Negative-length strings: A bug (SF 230589) in regexp produces incredible consequences:

% regexp {[ ]*(^|[^@])@} x@ m; puts [string length $m]
-109537

Numbers vary by platform; the above was 8.4.1 on Solaris. (CMcC via RS)