Generating random strings

Arjen Markus (23 July 2002) This script presents two methods of generating arbitrary strings:

  • The binary format command converts a number into a character (or, more generally, a binary string). The advantage of this method is that you can easily create strings with characters that cannot be typed directly on the keyboard.
  • By selecting characters at random from a given string you can control both which characters you get in the output and their relative frequency (by duplicating characters that should appear more often).

Note that these are only two techniques; many more are possible. Some are variations on the theme (using non-uniform random distributions, for instance), others are different techniques altogether, such as Markov chains. If you have an interesting method, please add it!
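As one possible answer to the invitation above, here is a minimal sketch of a first-order Markov chain generator: each character is chosen at random from the characters that follow the previous one in a sample text. The proc names markovTable and randomMarkovString and the sample sentence are illustrative assumptions, not part of the original script, and the sketch assumes Tcl 8.5 or later because it uses dict.

```tcl
# Build a first-order transition table: for each character in the sample,
# collect the list of characters that directly follow it. Characters that
# follow more often appear more often in the list, so they are picked
# more often.
proc markovTable {sample} {
    set table [dict create]
    set last [expr {[string length $sample] - 1}]
    for {set i 0} {$i < $last} {incr i} {
        set cur  [string index $sample $i]
        set next [string index $sample [expr {$i + 1}]]
        dict lappend table $cur $next
    }
    return $table
}

# Generate a string of the given length by walking the transition table.
proc randomMarkovString {length sample} {
    if {$length <= 0 || $sample eq ""} {
        return ""
    }
    set table [markovTable $sample]
    set n [string length $sample]

    # Start from a randomly chosen character of the sample
    set ch [string index $sample [expr {int(rand()*$n)}]]
    set txt $ch
    for {set i 1} {$i < $length} {incr i} {
        if {[dict exists $table $ch]} {
            set followers [dict get $table $ch]
            set ch [lindex $followers [expr {int(rand()*[llength $followers])}]]
        } else {
            # Dead end: this character occurs only at the very end of
            # the sample, so restart from a random character
            set ch [string index $sample [expr {int(rand()*$n)}]]
        }
        append txt $ch
    }
    return $txt
}

puts [randomMarkovString 30 "the quick brown fox jumps over the lazy dog"]
```

A longer sample text gives a richer table; a higher-order chain (keyed on the last two or three characters) would be a natural extension.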


#
# Proc to generate a string of (binary) characters
# Range defaults to 'A'-'z' (this includes several non-alphabetic
# characters)
#
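# Store the character codes of 'A' and 'z' in the variables A and z
binary scan A c A
binary scan z c z
proc randomDelimString [list length [list min $A] [list max $z]] {
    # +1 so that $max itself can be generated: int(rand()*$range)
    # yields 0 .. $range-1
    set range [expr {$max-$min+1}]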

    set txt ""
    for {set i 0} {$i < $length} {incr i} {
       set ch [expr {$min+int(rand()*$range)}]
       append txt [binary format c $ch]
    }
    return $txt
}

#
# Proc to generate a string of (given) characters
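# Range defaults to "ABCDEF...wxyz"
#
proc randomRangeString {length {chars "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}} {
    # Use the full length: int(rand()*$range) yields 0 .. $range-1,
    # which are exactly the valid indices into $chars
    set range [string length $chars]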

    set txt ""
    for {set i 0} {$i < $length} {incr i} {
       set pos [expr {int(rand()*$range)}]
       append txt [string index $chars $pos]
    }
    return $txt
}

puts [randomDelimString 30]
puts [randomRangeString 30]
puts [randomRangeString 30 "aaabcdeeeeee"]

#
# Time the performance
#
puts [time {set x [randomDelimString 100]} 10000]
puts [time {set x [randomRangeString 100]} 10000]

if { 0 } {
   # Sample output (Pentium II, 350 MHz, running Windows 98, Tcl 8.3.4):

 [lmZQAiB]Hb_hFpEw`LiEQSjOSsBfC
 BnnWCymqGYaFyAKOAMPfmnYKRJTugu
 abcbdeaeeadeeeceaebaaebeeeebee
 3564 microseconds per iteration
 2862 microseconds per iteration

}
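The duplication trick used in the third call above ("aaabcdeeeeee") can also be written with explicit weights, which is handy when the frequencies are not small integers. The following is a minimal sketch under that assumption; the proc name randomWeightedString and the flat "char weight char weight ..." list format are illustrative choices, not part of the original script.

```tcl
# weights is a flat list: char weight char weight ...
proc randomWeightedString {length weights} {
    # Sum the weights once
    set total 0.0
    foreach {ch w} $weights {
        set total [expr {$total + $w}]
    }
    if {$total <= 0} {
        error "weights must have a positive total"
    }

    set txt ""
    for {set i 0} {$i < $length} {incr i} {
        # Pick a point in [0,total) and walk the list until the
        # cumulative weight passes it
        set r [expr {rand()*$total}]
        foreach {ch w} $weights {
            set r [expr {$r - $w}]
            if {$r < 0} break
        }
        # If rounding left r >= 0, ch is simply the last character
        append txt $ch
    }
    return $txt
}

# Same relative frequencies as "aaabcdeeeeee": a=3, b=1, c=1, d=1, e=6
puts [randomWeightedString 30 {a 3 b 1 c 1 d 1 e 6}]
```

Walking the list is linear in the number of distinct characters, which is fine for small alphabets; for large ones a precomputed cumulative table with a binary search would be the usual refinement.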

In the chatroom, RS came up with the following solution:

proc lpick L {lindex $L [expr {int(rand()*[llength $L])}]}
proc randomlyPicked { length {chars {C G T A}} } {
    # Initialise res so that a length of 0 returns "" instead of raising an error
    set res ""
    for {set i 0} {$i<$length} {incr i} {append res [lpick $chars]}
    return $res
}

puts [randomlyPicked 30]

Stu 2007-02-14 The CAS extension also provides random char/string commands.


The random page also has a random string generator.


Stu 2007-10-23 Random A-Za-z one-liner:

proc randAZazStr {len} { return [subst [string repeat {[format %c [expr {int(rand() * 26) + (rand() > .5 ? 97 : 65)}]]} $len]] }

aspect 2014-12: this previously had int(rand() * 10) > 5, which is a long-winded and incorrect way to flip coins (it's true only 40% of the time). A caution to those careless with entropy!
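The 40% figure can be checked by enumeration: int(rand()*10) yields each of the ten values 0..9 with equal probability, and only 6, 7, 8 and 9 are greater than 5. The short sketch below counts them; the proc name coin is an illustrative assumption, shown only as one fair alternative.

```tcl
# Count how many of the ten equally likely values 0..9 satisfy "> 5"
set hits 0
for {set n 0} {$n < 10} {incr n} {
    if {$n > 5} {incr hits}
}
puts "$hits of 10 outcomes are > 5"

# A fair coin: rand() is uniform on the interval (0,1), so this is
# true half of the time
proc coin {} {expr {rand() < 0.5}}
puts [coin]
```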