Version 9 of Interning a String

Updated 2006-09-21 12:36:59

>Strick> In some languages ( e.g. [L1 ] ) you can "intern" a string, which gives you one unique "interned instance" of the string with that value.

Here's a proc to do it in Tcl:

 proc intern s {
        if { ! [info exists ::INTERN($s)] } { set ::INTERN($s) $s }
        set ::INTERN($s)
 }

I used Tcl a dozen years before needing this, but today i wrote a program that used many of the same long strings (file pathnames) in some Tcl lists.

There's two different reasons you might want to intern a string:

  • To make string comparisons as fast as pointer comparisons
  • To save memory, when strings with the same value are used many times

Most often you intern strings for the first reason, but in Tcl that can't work: there are no separate "value" and "address" comparisons. Today I interned for the second reason.

MS remarks that Tcl has a shared pool of literal strings. Any code within proc bodies reuses literals from the shared pool. In a sense, all literal strings are already "interned", and this mechanism is mainly useful for strings that are generated at runtime.

NEM has interned strings for the first reason (fast comparisons). In Tcl, I'd do that like this:

 set INTERN_COUNTER 0
 proc intern str {
     if {![info exists ::INTERN($str)]} {
         set ::INTERN($str) [incr ::INTERN_COUNTER]
     }
     return $::INTERN($str)
 }

That returns a unique integer for each string and you can then do integer tests for equality. Interned strings are used often for programming language interpreters to represent symbols/ atoms/variables names to optimise for equality checks and hash-table lookups.


A quick look at tclExecute.c shows that the byte code interpreter in fact contains the trivial optimization of string compare for comparing an instance to itself - JBR

 "tclExecute.c" line 2457 of 6089 
       if (valuePtr == value2Ptr) {
            /*
               In the pure equality case, set lengths too for
               the checks below (or we could goto beyond it).
             */
            iResult = s1len = s2len = 0;
        } else if ((valuePtr->typePtr == &tclByteArrayType)

Interning a string using method 1 above optimized string compare and memory usage.


[ Category String Processing | Category Performance ]