2008-02-24

 1. I keep a large collection of text data as a list in memory (lappend x {text...} etc.).
 2. I want to search this data.

Here's what happens:

 set match [lsearch -regexp $x {needle}]

-> memory usage of the tcl process more than doubles (before: 80MB, after: 200MB). (EDIT 2008-02-25: it stays at 80MB using -glob.)

 foreach k $x { if {[regexp -nocase {needle} $k]} {puts "match"} }

-> ditto

 foreach k $x { if {[regexp -nocase {needle} [list $k]]} {puts "match"} }

-> Eureka! Total memory usage stays at 80MB.

I'm still not quite sure what's going on; I guess it's about keeping lists 'pure'. I'm now consulting these pages: [list], [shimmering], [pure list].

6am EDIT: I think I almost get it now. regexp treats $x as a string and forces every element into a string representation AS WELL AS a list representation. Sigh. lsearch is forcing a string representation of all elements, which seems unavoidable. I could also lappend items as strings rather than lists, but I'm saving my data as Tcl source for various reasons, and of course { } is much cleaner in that case. -[hans]

----

I see a common misconception about Tcl here. Please note that using curly braces to quote a command argument *does not* turn the contents of the {...} into a list. IOW, your ''lappend x {text...} etc'' should result in exactly the same value as ''lappend x "text..." etc''.

As to the original issue of memory growth, I would be curious to read what the core maintainers have to say. A -glob search would also be interesting. - [RT] 24Feb08

Ah, as for the glob search, indeed there's no memory doubling with that; thanks for the hint. Also, I'm not talking about values but about storage here. At the very least, "lappend x {ABC}" results in only an internal list representation of the 3 bytes ABC being stored, with no string rep, which is exactly what I want: maximum memory efficiency.
See below for more experimenting, especially eval "lappend a {$t}". -[hans] 2008-02-25

----

[Lars H]: Are you talking about the object getting a string representation, or getting a String internal representation? (They're not the same.) For

 foreach k $x { if {[regexp -nocase {needle} [list $k]]} {puts "match"} }

to make a difference, it seems it would have to be the latter. In order to get a string rep for `[[list $k]]`, one first needs the string rep of `$k`, but since `[[list $k]]` is not the same [Tcl_Obj] as `$k`, an intrep imposed upon `[[list $k]]` by [regexp] will not be retained in the elements of `$x`. That String intreps use 2 bytes for every character is consistent with a jump from 80MB to 200MB, if you have about 60M ASCII characters without an intrep in your big list (60M x 2 bytes = 120MB extra).

I once had a similar difficulty, but the other way round, with a program that dumped a list of lists of integers to a text file. In order to generate a string rep for a list of integers, Tcl first had to generate string reps for all the integers, and that similarly doubled the memory usage during the final dump. In that case I solved it by feeding each integer through [format] before I did anything stringy to it, since that gave a [Tcl_Obj] that wasn't shared with the big list of lists of integers...

----

Very well explained, makes sense to me. Let's play around a little more.

 set t [string repeat "a" 1000]

Nr.1

 for {set x 0} {$x<50000} {incr x} { lappend a $t }

-> mem allocated: 413696 (OK, only references)

Nr.2

 for {set x 0} {$x<50000} {incr x} { lappend a [list $t] }

-> mem allocated: 3153920 (not what I thought it would do)

Nr.3

 for {set x 0} {$x<50000} {incr x} { lappend a [split $t "\x000"] }

-> mem allocated: 55214080 (closer)

Nr.4

 for {set x 0} {$x<50000} {incr x} { eval "lappend a {$t}" }

-> mem allocated: 52420608 (nice)

I think I will stick to Nr.4; it looks like if I'm careful I can get close to a 1:1 ratio of text characters to bytes of memory. No lsearch -regexp, but lsearch -glob is fine.
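Going by Lars H's explanation, a regexp search need not be ruled out entirely: what seems to matter is that [regexp] imposes its intrep on a throwaway object rather than on the element stored in the big list. Below is a minimal sketch of that idea, assuming the elements are plain text; the proc name findMatches and the sample data are hypothetical. It matches against [format %s $k] instead of [list $k], because [list] wraps an element containing whitespace in braces (and backslash-escapes special characters), which changes what anchored patterns such as ^needle see. Its actual memory behaviour has not been measured here.

```tcl
# Hypothetical helper: return the indices of the elements of $items that
# match the regular expression $pattern (case-insensitively).
# [regexp] is run on a fresh copy of each element, so whatever internal
# representation it builds belongs to the copy, which is freed after the
# match, not to the element kept in the big list.
proc findMatches {items pattern} {
    set hits {}
    set idx 0
    foreach k $items {
        # [format %s $k] returns a new, unshared string object with the
        # same characters as $k; unlike [list $k] it adds no braces or
        # backslashes. "--" guards against patterns starting with "-".
        if {[regexp -nocase -- $pattern [format %s $k]]} {
            lappend hits $idx
        }
        incr idx
    }
    return $hits
}

# Hypothetical sample data.
set x [list "Hay and more hay" "a NEEDLE here" "needle at start" "nothing"]
puts [findMatches $x {needle}]   ;# prints: 1 2
puts [findMatches $x {^needle}]  ;# prints: 2

# For comparison: [list] braces an element that contains spaces, so an
# anchored pattern no longer matches the quoted form.
puts [list "needle at start"]                    ;# prints: {needle at start}
puts [regexp {^needle} [list "needle at start"]] ;# prints: 0
```

The per-element copy only costs the size of one element at a time, so it should not change the overall footprint the way imposing an intrep on every stored element does.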
-[hans] 2008-02-24

----

See also:

   * [Compact Data Storage]