Measuring memory usage

Measuring memory usage presents techniques and tools for determining the memory requirements and/or actual memory usage of algorithms and programs.

The reason these are not on the Compact data storage page (where many of them are used) is that Tcl itself does not by default provide any way in which this can be measured, and hence one must instead rely on various platform-dependent tricks. There is of course the memory command, and the commands below try to make use of it when it is available, but in practice it usually isn't.


The growth procedure

Idea: Evaluate a couple of scripts (sequences of commands) and measure the size of the shell before, after, and in between every pair of them. To avoid errors caused by having made other measurements previously, the scripts are evaluated in a shell which constitutes a completely different process and which is terminated when all the scripts have been evaluated.

Syntax:

growth <script> ?<script> ...?

Returns: a list of measurements of the size of the subshell spawned. If there are N script arguments then the return value will be a list with N+1 elements.

Limitation: Makes use of the ps command, so it probably only works on UNIX.

Note: The code assumes that there is a file "memsizehelper.tcl" in the current directory. The first block of code below will create this file if it doesn't already exist.

if {![file exists memsizehelper.tcl]} then {
    set F [open memsizehelper.tcl w]
    puts $F {
        fileevent stdin readable {
            catch [join [gets stdin] \n]
            if {[catch {
                puts stdout [list mark [
                    lindex [split [lindex [split [memory info] \n] 3]] end]]
            }]} then {puts stdout [list mark ?]}
        }
        vwait forever
    }
    close $F
}

proc growth args {
    variable the_child [open |[list tclsh memsizehelper.tcl] r+]
    fconfigure $the_child -blocking 0
    lappend args exit
    variable to_do $args
    variable measurements [list]
    fileevent $the_child readable {
        if {[gets $the_child the_line] < 0} then {
            close $the_child
            set the_child {}
        } elseif {![string match {mark *} $the_line]} then {
            puts stdout $the_line
        } else {
            if {![regexp {^mark ([0-9]+)$} $the_line -> measured_size]} then {
                # Change the [exec ps ...] below if it doesn't work in general
                scan [lindex [split [exec ps --pid [
                    pid $the_child] --format vsize] \n] end] %d measured_size
                set measured_size [expr {1024*$measured_size}]
            }
            lappend measurements $measured_size
            puts $the_child [split [lindex $to_do 0] \n]
            set to_do [lreplace $to_do 0 0]
            flush $the_child
        }
    }
    uplevel #0 [list set oldsize 0]
    puts $the_child {} 
    flush $the_child
    vwait the_child
    return $measurements
}

Example of use:

growth {set a [string repeat Abracadabra 2000]}

returns something like

2097152 2142208

and the difference between these numbers is the amount of bytes that the child shell grew.

Trickier example (using Delta):

for {set N 100} {$N <= 3000} {incr N 100} {
    set D [Delta [growth [format {
        set a [string repeat Abracadabra %d]
    } $N]]]
    puts stderr [format {%5d %10d} $N $D]
}

The output is

 100       8192
 200       8192
 300       8192
 400       8192
 500       8192
 600      20480
 700      20480
 800      20480
 900      20480
1000      20480
1100      45056
1200      45056
1300      45056
1400      45056
1500      45056
1600      45056
1700      45056
1800      45056
1900      45056
2000      45056
2100      90112
2200      90112
2300      90112
2400      90112
2500      90112
2600      90112
2700      90112
2800      90112
2900      90112
3000      90112

and this illustrates a problem with this kind of measurements. If the number of parts of an object grows as (TeX notation:) $c_1 n$, then the amount of memory Tcl allocates tends to grow as $c_1 2^{\lceil c_2 + \log_2 n \rceil - c_2}$, for some $0<c_2<1$ (end TeX notation). This makes single-data-point measurements risky.


The Delta procedure

Takes a list of numbers as argument and returns the list of differences between adjacent numbers.

proc Delta list {
    set res [list]
    set prev [lindex $list 0]
    foreach item [lrange $list 1 end] {
        lappend res [expr {$item-$prev}]
        set prev $item
    }
    set res
}

For Windows, here's what I'm using:

proc usedmem x {
    set q1 [join [regexp -inline Name.* [exec pslist -m $x 2>nul]]]
    set words {}
    while {[scan $q1 %s%n word length] == 2} {
        lappend words $word
        set q1 [string range $q1 $length end]
    }
    return [expr {[lindex $words$ 14] * 1024}]
}    

pslist is part of Sysinternals Process Utilities and can be downloaded here here .

-hans

See Also

Character byte size in tcl memory management
Memory Usage Pattern Graph
Compact data storage
tcl size