atoms

Richard Suchenwirth 2007-05-22 - Some languages (like Lisp) have the concept of "atoms": values (often strings) that are stored in only one place, so that all uses of a value share the same representation. This reduces memory needs for often-repeated strings and allows fast comparison: two atoms are equal if and only if they have the same address (on the C side). Here's a quick sketch of how to do that in Tcl:

proc atom x {
    if {[catch {set ::Atom($x)}]} {
        set ::Atom($x) $x
    } else {set ::Atom($x)}
}
#-- Testing
% set res {}
% foreach test {foo bar grill bar foo} {lappend res [atom $test]}
% parray Atom
Atom(bar)   = bar
Atom(foo)   = foo
Atom(grill) = grill
% puts $res
foo bar grill bar foo

The test shows that although the atom values appear several times in the result, each one is stored in the array only once.

Disadvantage: If used uncritically, the Atom array may grow considerably in long-running apps, because entries are never removed. Values that are unlikely to occur repeatedly are better left un-atomized. For instance, in a library database application, author names, publisher names and years are well worth sharing as atoms, but book titles probably are not.
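
One way to keep the array from growing without bound is to count how many holders each atom has and to drop the entry when the last one lets go. The following is only a minimal sketch under that assumption: the procs atomRef and atomRelease and the array ::AtomRefs are hypothetical names, not part of the original proposal, and the auto-creating incr needs Tcl 8.5 or later.

```tcl
# Hypothetical reference-counted variant of the atom idea.
proc atomRef x {
    if {![info exists ::Atom($x)]} {
        set ::Atom($x) $x
    }
    # In Tcl 8.5+, incr creates the counter at 0 if it does not exist yet.
    incr ::AtomRefs($x)
    return $::Atom($x)
}

proc atomRelease x {
    # Releasing an unknown atom is silently ignored.
    if {![info exists ::AtomRefs($x)]} return
    if {[incr ::AtomRefs($x) -1] <= 0} {
        unset ::Atom($x) ::AtomRefs($x)
    }
}

atomRef foo; atomRef foo; atomRef bar
atomRelease foo    ;# foo still has one holder
atomRelease foo    ;# last holder gone, entry removed
puts [lsort [array names ::Atom]]   ;# prints: bar
```

The price is that callers must pair every atomRef with an atomRelease, which is exactly the bookkeeping plain atoms avoid.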

DKF: This is sometimes called "intern"ing the strings. Tk interns many strings internally (as Tk_Uids); this is the most common cause of memory leaks in Tk applications, though in many situations it doesn't matter.

DKF: Tcl also effectively interns constants, though less aggressively (and with a more controlled lifespan).


escargo - What would be the difference between [catch set ::Atom($x)] and [catch {set ::Atom($x)}]?

One does what you expect, and one doesn't. Only the first argument to catch is executed; in your first example that is the one-word command "set", and ::Atom($x) is taken as the name of the variable that receives the result. - RS: oops - fixed.. :^) - not examining catch results may indeed hide bad bugs...
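
The difference between the two catch forms can be seen in a short experiment. This is just an illustrative sketch; the variables rc1, rc2 and msg are made up for the demonstration.

```tcl
unset -nocomplain ::Atom
set x foo

# Unbraced: catch gets two arguments. The script is just "set" (which
# fails for lack of arguments), and ::Atom(foo) is the variable in
# which catch stores the resulting error message.
set rc1 [catch set ::Atom($x)]
set msg $::Atom($x)
puts "unbraced: rc=$rc1, Atom(foo) now holds: $msg"

unset ::Atom

# Braced: the whole command is the script. Reading the missing element
# fails, and nothing is written to the array.
set rc2 [catch {set ::Atom($x)}]
puts "braced:   rc=$rc2, element exists: [info exists ::Atom($x)]"
```

Both calls return 1 here, but the unbraced form returns 1 no matter whether the atom exists, and it leaves an error message in the array as a side effect.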

LV Hey, anyone know if there is a functional difference between the atom proc above and this one?

proc atom x {
    if {[info exists ::Atom($x)]} {
        set ::Atom($x)
    } else {
        set ::Atom($x) $x
    }
}

The main reason I like this is that the if states more clearly why each branch is executed. - RS: see Tcl performance: catch vs. info.
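
For anyone who wants to measure the two variants on their own machine, here is a rough sketch using the built-in time command. The proc names atomCatch and atomInfo are invented here so that both versions can coexist; the numbers depend on Tcl version and hardware, so none are quoted.

```tcl
# The two variants from this page under distinct, hypothetical names.
proc atomCatch x {
    if {[catch {set ::Atom($x)}]} {
        set ::Atom($x) $x
    } else {set ::Atom($x)}
}
proc atomInfo x {
    if {[info exists ::Atom($x)]} {
        set ::Atom($x)
    } else {
        set ::Atom($x) $x
    }
}

# Hit case: the atom already exists on every call after the first.
puts "catch, hit:  [time {atomCatch foo} 10000]"
puts "info,  hit:  [time {atomInfo foo} 10000]"

# Miss case: remove the element before each call so it is always new.
puts "catch, miss: [time {unset -nocomplain ::Atom(new); atomCatch new} 10000]"
puts "info,  miss: [time {unset -nocomplain ::Atom(new); atomInfo new} 10000]"
```

In ordinary use both procs return the same value for the same input; the timing only shows which check is cheaper for a given mix of hits and misses.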