Version 8 of Measuring memory usage

Updated 2008-02-25 00:12:17 by hans

This page is for collecting tools (probably mainly scripts) that are useful for measuring how much memory is consumed by a certain algorithm. The reason these are not on the Compact data storage page (where many of them are used) is that Tcl itself does not by default provide any way in which this can be measured, and hence one must instead rely on various platform-dependent tricks. There is of course the memory command, and the commands below try to make use of it when it is available, but in practice it usually isn't.


The growth procedure

Idea: Evaluate a couple of scripts (sequences of commands) and measure the size of the shell before, after, and in between every pair of them. To avoid errors caused by having made other measurements previously, the scripts are evaluated in a shell which constitutes a completely different process and which is terminated when all the scripts have been evaluated.

Syntax: growth <script> ?<script> ...?

Returns: a list of measurements of the size of the subshell spawned. If there are N script arguments then the return value will be a list with N+1 elements.

Limitation: Makes use of the ps command, so it probably only works on UNIX.

Note: The code assumes that there is a file "memsizehelper.tcl" in the current directory. The first block of code below will create this file if it doesn't already exist.

 if {![file exists memsizehelper.tcl]} then {
     set F [open memsizehelper.tcl w]
     puts $F {
         fileevent stdin readable {
             catch [join [gets stdin] \n]
             if {[catch {
                 puts stdout "mark [lindex [split [lindex [split [memory info] \n] 3]] end]"
             }]} then {puts stdout "mark ?"}
         }
         vwait forever
     }
     close $F
 }

 proc growth {args} {
     variable the_child [open "|tclsh memsizehelper.tcl" r+]
     fconfigure $the_child -blocking 0
     lappend args exit
     variable to_do $args
     variable measurements [list]
     fileevent $the_child readable {
         if {[gets $the_child the_line] < 0} then {
             close $the_child
             set the_child {}
         } elseif {![string match "mark *" $the_line]} then {
             puts stdout $the_line
         } else {
             if {![regexp {^mark ([0-9]+)$} $the_line -> measured_size]} then {
                 # Change the [exec ps ...] below if it doesn't work in general
                 scan [lindex [split\
                   [exec ps --pid [pid $the_child] --format vsize]\
                   \n] end] %d measured_size
                 set measured_size [expr {1024*$measured_size}]
             }
             lappend measurements $measured_size
             puts $the_child [split [lindex $to_do 0] \n]
             set to_do [lreplace $to_do 0 0]
             flush $the_child
         }
     }
     uplevel #0 [list set oldsize 0]
     puts $the_child ""
     flush $the_child
     vwait the_child
     return $measurements
 }

Example of use:

 growth {set a [string repeat Abracadabra 2000]}

returns something like

 2097152 2142208

and the difference between these numbers is the amount of bytes that the child shell grew.

Trickier example (using Delta):

 for {set N 100} {$N<=3000} {incr N 100} {
    set D [Delta [growth [format {
       set a [string repeat Abracadabra %d]
    } $N]]]
    puts stderr [format {%5d %10d} $N $D]
 }

The output is

  100       8192
  200       8192
  300       8192
  400       8192
  500       8192
  600      20480
  700      20480
  800      20480
  900      20480
 1000      20480
 1100      45056
 1200      45056
 1300      45056
 1400      45056
 1500      45056
 1600      45056
 1700      45056
 1800      45056
 1900      45056
 2000      45056
 2100      90112
 2200      90112
 2300      90112
 2400      90112
 2500      90112
 2600      90112
 2700      90112
 2800      90112
 2900      90112
 3000      90112

and this illustrates a problem with this kind of measurements. If the number of parts of an object grows as (TeX notation:) $c_1 n$, then the amount of memory Tcl allocates tends to grow as $c_1 2^{\lceil c_2 + \log_2 n \rceil - c_2}$, for some $0<c_2<1$ (end TeX notation). This makes single data point measurements risky.


The Delta procedure

Takes a list of numbers as argument and returns the list of differences between adjacent numbers.

 proc Delta {list} {
     set res [list]
     set prev [lindex $list 0]
     foreach item [lrange $list 1 end] {
         lappend res [expr {$item-$prev}]
         set prev $item
     }
     set res
 }

For Windows, here's what I'm using:

   proc usedmem {x} {
     set q1 [join [regexp -inline {Name.*} [exec pslist -m $x 2>nul]]]
     set words {}
     while {[scan $q1 %s%n word length] == 2} {
         lappend words $word
         set q1 [string range $q1 $length end]
     }
     return [expr [lindex $words$ 14] *1024]
    }    

pslist is part of pstools and can be downloaded here [L1 ].

-hans



[ Category Internals | Category Performance ]