2008-02-24

1. I keep a large collection of text data as a list in memory (lappend x {text...} etc)

2. I want to search this data. Here's what happens.

 set match [lsearch -regexp $x {needle}]

-> memory usage of the tcl process more than doubles (before: 80MB, after: 200MB) (EDIT 2008-02-25: stays at 80MB using -glob)

 foreach k $x { if {[regexp -nocase {needle} $k]} {puts "match"} }

-> ditto

 foreach k $x { if {[regexp -nocase {needle} [list $k]]} {puts "match"} }

-> Heureka! total memory usage stays at 80MB.

I'm still not quite sure what's going on, it's about keeping lists 'pure' I guess. I'm now consulting these pages: [list] [shimmering] [pure list].

6am EDIT: I think I almost get it now. regexp treats $x as string and forces every element into a string representation AS WELL as a list representation. sigh. lsearch is forcing a string representation of all elements. seems unavoidable. I could also lappend items as strings not lists, but I'm saving my data as TCL source for various reasons, and of course { } is much cleaner in that case. -[hans]

----

I see a common misconception about Tcl here. Please notice that using curly braces to quote a command argument *does not* turn the contents of the {...} into a list. IOW, your ''lappend x {text...} etc'' should result in exactly the same value as ''lappend x "text..." etc''.

As to the original issue of memory growth, I would be curious to read what the core maintainers have to say. Also interesting would be a -glob search. - [RT] 24Feb08

Ah, as for the glob search, indeed there's no memory doubling with that, thanx for the hint. -[hans] 2008-02-25

----

[Lars H]: Are you talking about the object getting a string representation, or getting a String internal representation? (They're not the same.)
For

 foreach k $x { if {[regexp -nocase {needle} [list $k]]} {puts "match"} }

to make a difference, it seems it'd have to be the latter (in order to get a string rep for `[[list $k]]`, one first needs the string rep of `$k`, but since `[[list $k]]` is not the same [Tcl_Obj] as `$k`, an intrep imposed upon `[[list $k]]` by [regexp] will not be retained in the elements of `$x`); that String intreps use 2 bytes for every character is consistent with a jump from 80MB to 200MB, if you have about 60M ASCII characters without intrep in your big list (60M characters at 2 bytes each is the extra 120MB).

I once had a similar difficulty, but the other way round, with a program that dumped a list of lists of integers to a text file. In order to generate a stringrep for a list of integers, Tcl first had to generate stringreps for all the integers, and that similarly doubled the memory usage during the final dump. In that case I solved it by feeding each integer through [format] before I did anything stringy to it, since that gave a [Tcl_Obj] that wasn't shared with the big list of lists of integers...

----

I can see a lot clearer now. I think. Maybe. ''internal'' stringreps are always 2 bytes/char. But a "string intrep" is not an "int rep", right? I have to admit all this stuff and terminology (int reps, intreps, string intreps....) is mildly confusing. -[hans] 2008-02-25

----

See also:

   * [Compact Data Storage]
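
----

A minimal sketch of the two workarounds discussed above, both of which rely on handing the "stringy" operation a fresh [Tcl_Obj] instead of an element of the big list. The proc names `search_unshared` and `format_row` and the sample data are hypothetical, chosen only for illustration; the actual memory behaviour depends on the Tcl version and is not measured here.

```tcl
# Sketch only: proc names and sample data are hypothetical.

# Search without passing the list's own elements to [regexp].
# [list $k] builds a temporary value, so (as described above) any
# internal representation [regexp] creates lands on that temporary
# rather than on the element stored in the big list.
proc search_unshared {data pattern} {
    set hits {}
    foreach k $data {
        if {[regexp -nocase -- $pattern [list $k]]} {
            lappend hits $k
        }
    }
    return $hits
}

# Turn a row of integers into text by passing each one through
# [format %d], which returns a new value for every number, so the
# row itself is never asked for its string representation.
proc format_row {row} {
    set out {}
    foreach n $row {
        lappend out [format %d $n]
    }
    return [join $out " "]
}

set texts {}
lappend texts {a Needle in a haystack} {plain hay} {another NEEDLE}
puts [search_unshared $texts {needle}]
puts [format_row [list 1 [expr {1 + 1}] 3]]
```

Note that `[list $k]` matches against the list-quoted form of the element (braces or backslashes may be added), so patterns anchored with `^` or `$` can behave differently than they would against `$k` itself.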