Garbage collection

'''Garbage Collection''' is the cleanup of resources that are no longer needed.




** See Also **

   [Arts and Crafts of Tcl-Tk programming]:   

   [Reference counted command objects]:   

   [linked lists]:   
   
   [Tcl and LISP]:   
   
   [complex data structures]:   

   [event processing & garbage collection in Tcl and Tk]:   [RJM]: I have recently had practical experience with timing-related garbage collection in Tcl/Tk. This new page is referenced here in order to document it.

   [Reference counting vs. tracing garbage collection]:   



** Further Reading **

   [http://blogs.msdn.com/b/oldnewthing/archive/2010/08/09/10047586.aspx%|%Everybody thinks about garbage collection the wrong way], Raymond Chen, 2010-08-09:   ''Garbage collection is simulating a computer with an infinite amount of memory''

   [https://news.ycombinator.com/item?id=23856169%|%What's New in Lua 5.4]:   A discussion of why garbage collection doesn't fit into Tcl, i.e. why it's pointless to try to clean up references in a system where each object is referenced by name rather than by a direct binding.  [cow%|%copy-on-write] is the technique that fits Tcl.



** Description **

'''Garbage collection''' is used in other languages to address a problem Tcl
doesn't have:  the disposal of objects that are no longer reachable because all
references to them have been destroyed.  In the case of a system that
'''produces no garbage''', i.e., has no explicit references to objects, a garbage
collector becomes pointless.  Tcl is such a system.  Variables and routines
cannot be passed as arguments to other routines, so it becomes a simple matter
of deleting a variable, namespace or routine when one is finished with it.
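
A minimal sketch of that explicit cleanup. The names `tmp`, `helper` and `scratch` are hypothetical, chosen only for illustration:

======
# A variable, a routine and a namespace, each removed explicitly
set tmp {some value}
proc helper {} {return hi}
namespace eval scratch {variable x 1}

unset tmp                   ;# delete the variable
rename helper {}            ;# delete the routine
namespace delete scratch    ;# delete the namespace and everything in it

puts [info exists tmp]                  ;# 0
puts [llength [info commands helper]]   ;# 0
puts [namespace exists scratch]         ;# 0
======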

In Tcl, [data structure%|%data structures] are not built by directly
referencing other structures.  Instead, values are used directly, and
[copy-on-write] is used internally to store the value only once even though it
may appear at the script level in multiple places.
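
At the script level this shows up as plain value semantics. A small illustration, with the variable names `a` and `b` chosen arbitrarily:

======
set a {1 2 3}
set b $a        ;# b now holds the same value as a
lappend b 4     ;# modifying b leaves a untouched
puts $a         ;# 1 2 3
puts $b         ;# 1 2 3 4
======

Whether the two variables share storage before the `lappend` is an internal matter; the script can only observe that changing one never changes the other.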

The only place at the script level where a reference is created is when
`[upvar]` or `[namespace upvar]` is used to create a local reference in a
routine to a variable outside the routine.  This reference is naturally cleaned
up when the routine returns.
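
A short sketch of such a reference and its automatic cleanup. The routine name `bump` and the variable names are hypothetical:

======
proc bump {varname} {
    upvar 1 $varname v   ;# v is a local link to the caller's variable
    incr v
}

set n 5
bump n
puts $n                 ;# 6
puts [info exists v]    ;# 0: the link vanished when bump returned
======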

----

The following "'''AI koan'''" (see [http://www.catb.org/~esr/jargon/html/koans.html%|%The Jargon File]  for more) points out a fundamental difference between the Tcl and [Lisp] approaches to when unused memory is reclaimed and the implications this has for what can be a value.

One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons."

Moon patiently told the student the following story:
    "One day a student came to Moon and said: 'I understand how to make a better garbage collector ...

(Editorial note: Pure reference-count garbage collectors have problems with circular structures that point to themselves. On the primitive level Tcl avoids that problem by the principle that ''everything is a string'', since strings don't point to anything.)
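
As an illustration of the editorial note, an attempt to put a list inside itself stores a copy of its current value rather than a pointer back to it, so no cycle can form:

======
set l {a b}
lappend l $l        ;# appends the current value of l, not a reference to l
puts $l             ;# a b {a b}
puts [llength $l]   ;# 3
======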



** Reference Counting **


At the [C] level a count of references to each [Tcl_Obj] is kept within the
object itself. Each time the address of the object is stored somewhere the
reference count is incremented, and each time the address is released the
reference count is decremented.
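
The count can be glimpsed from a script, assuming Tcl 8.6 or later, through the `::tcl::unsupported::representation` command. This is only a sketch: the command is unsupported, its output format may change, and the absolute numbers depend on how the script is evaluated, so only the difference between the two readings is meaningful. The helper `refcount` is a hypothetical name.

======
# Extract the reference count from the (unsupported) representation report
proc refcount {value} {
    regexp {refcount of (\d+)} [::tcl::unsupported::representation $value] -> n
    return $n
}

set a [string repeat x 10]
set before [refcount $a]
set b $a                    ;# a second variable now refers to the same Tcl_Obj
set after [refcount $a]
puts "before: $before, after: $after"
======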

An [extension] can implement and register its own [Tcl_Obj] internal
representation in order to more efficiently handle the data it uses, and the
standard reference-counting scheme will take care of the storage management.

An alternative is to provide an interface that returns a unique identifier or
allows the user to provide a unique name as a [handle] when creating a new set
of resources.
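
A minimal sketch of the handle approach at the script level. The `stack` namespace and its routines are hypothetical and serve only to show the pattern: the handle is an ordinary string, so the resources behind it live until the caller releases them explicitly.

======
namespace eval stack {
    variable data          ;# array: handle -> list of items
    variable nextid 0
}

# Allocate a new stack and return its handle
proc stack::create {} {
    variable data
    variable nextid
    set handle stack[incr nextid]
    set data($handle) {}
    return $handle
}

proc stack::push {handle item} {
    variable data
    lappend data($handle) $item
    return
}

proc stack::pop {handle} {
    variable data
    set item [lindex $data($handle) end]
    set data($handle) [lrange $data($handle) 0 end-1]
    return $item
}

# Release the resources behind a handle
proc stack::destroy {handle} {
    variable data
    unset data($handle)
}

set s [stack::create]
stack::push $s a
stack::push $s b
puts [stack::pop $s]    ;# b
stack::destroy $s       ;# forgetting this call leaks the stack
======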



** Binding Cleanup to a Variable **


[Arjen Markus]: There has been much discussion about ''references'' to data in Tcl, in order to build [complex data structures] and such. Inevitably, garbage collection pops up.

This page is meant to show that at some mundane level you can let Tcl do the job for you. The script below will create a counter "object" that keeps its internal state hidden. It juggles a bit with the counter object and then throws it away.

The essence: the internal state is stored via the `[interp alias]` mechanism, which allows extra arguments, and the counter itself is destroyed via a `[trace]` on the variable.

----

======
namespace eval Counter {
    variable nextid 0

    proc makecounter {name initial} {
        upvar $name vname
        variable nextid

        set vname [namespace current]::$nextid
        # Delete the counter command when the caller's variable is unset
        uplevel [list trace add variable $name unset [
            list [namespace current]::deletecounter $vname]]

        interp alias {} $vname {} [
            namespace current]::increasecounter $vname $initial
        incr nextid
    }

    proc increasecounter {cmdname current} {
        set result [expr {$current+1}]
        # Re-create the alias so that it carries the new state
        interp alias {} $cmdname {} [
            namespace current]::increasecounter $cmdname $result
        return $result
    }

    proc deletecounter {aliasname counter dummy op} {
        interp alias {} $aliasname {}
        # puts "deleted: $counter"
    }

} ;# End of namespace

#
# Create a counter
#
Counter::makecounter count1 0
# puts [trace info variable count1]
puts "1? [$count1]"
puts "2? [$count1]"

#
# Copy the counter (not a reference, just a harmless alias)
#
set count2 $count1
puts "3? [$count2]"

#
# Deleting the alias has no effect
#
unset count2
puts "4? [$count1]"

#
# Deleting the true counter does!
#
set count3 $count1
unset count1
puts "5? [$count3]"
======

Result:

======none
1? 1
2? 2
3? 3
4? 4
invalid command name "::Counter::0"
    while executing
"$count3"
    invoked from within
"puts "5? [$count3]"
"
    (file "counter.tcl" line 52)
======



** Misc **

I was pondering the [http] package; the need to call `http::cleanup` when done with a token and the potential for leaking
memory just seems wrong.  So I was thinking about a Tcl-level garbage collector, and came up with the following.  I suppose it's a mark & sweep collector of sorts, although it doesn't do any marking or recursive sweep.

======
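proc gc-find pattern {
    # Candidate tokens: all variables matching the pattern
    set vars [info vars $pattern]
    set searchspace [uplevel 1 {info vars}]
    foreach var $searchspace {
        # Skip variables that are declared but have no value yet
        if {![uplevel 1 [list info exists $var]]} continue
        if {[uplevel 1 [list array exists $var]]} {
            # Read the array in the caller's scope, not in this one
            foreach {k v} [uplevel 1 [list array get $var]] {
                check-item $v vars
            }
        } else {
            check-item [uplevel 1 [list set $var]] vars
        }
    }
    # Whatever is left is not mentioned in any of the caller's variables
    return $vars
}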

proc check-item {item vars} {
    upvar $vars vlist
    catch {
        foreach el $item {
            set s [lsearch -exact $vlist $el]
            if {$s > -1} {
                set vlist [lreplace $vlist $s $s]
            }
        }
    }
}
======

One would periodically call it as follows:

======
foreach tok [gc-find {::http::[0-9]*}] {
    ::http::cleanup $tok
}
======

It assumes that each token is stored either as an individual item in a list or as a variable's value by itself, and it doesn't search
namespaces other than the global one.

[RLE] 2011-09-22: Has anyone considered that, with `[dict]` available in 8.5+, the [http] package could return a result [dict] instead of a handle to an [array]?  Garbage collection would then happen automatically when the [dict]'s reference count fell to zero.
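
A sketch of what such an interface might look like. `fake_geturl` is a hypothetical stand-in that performs no network access and is not part of the [http] package; the keys `url`, `status` and `body` are likewise assumptions.

======
# Hypothetical stand-in for a fetch routine: returns its result as a dict value
proc fake_geturl {url} {
    return [dict create url $url status ok body "<html>...</html>"]
}

proc show {url} {
    set result [fake_geturl $url]
    puts "[dict get $result status]: [dict get $result url]"
    # No cleanup call is needed: the dict is a value and is released
    # once the local variable result disappears
}

show http://example.com/    ;# prints "ok: http://example.com/"
======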

----

[AM] 2007-12-17: In response to a thread on the [comp.lang.tcl] group, I experimented a bit with procedure traces.
The idea I had was that the usual way of creating objects is to create a new procedure/command. If, however, you want to create a ''local''
object, i.e. an object that should exist only during the lifetime of a procedure, there is no way for Tcl to
know that that is what you intended. So there is no way to actually remove it when the procedure returns.

Unless you help it a bit.

And that is what is done in the slightly silly script below:

======
# gc.tcl --
#     An experiment with garbage-collecting "objects"

# localobj --
#     Create a _local_ object
#
# Arguments:
#     name      Name of the object/command
#
# Result:
#     None
#
# Side effects:
#     Creates a new command and a trace on the _calling_
#     procedure if needed
#
proc localobj name {
    global local_objects

    # Create the object
    proc $name {cmd} {
        if {$cmd eq {run}} {
            puts "[lindex [info level 0] 0]: Boo!"
        } else {
            return -code error "Command unknown: $cmd"
        }
    }

    # Administration:
    # - Store the command for later GC
    # - Add a trace to the caller, if this was not done yet
    # (Take care of global objects though!)
    if {[info level] > 1} {
        set caller [lindex [info level -1] 0]
        if {![info exists local_objects($caller)]} {
            trace add execution $caller leave [list localobj_destroy $caller]
        }
        lappend local_objects($caller) $name
    }
}

# localobj_destroy --
#     Destroy the caller's local objects
#
# Arguments:
#     caller    Name of the caller
#     command   Original command (ignored)
#     code      Return code (ignored)
#     result    Result of the command (ignored)
#     ops       Operation (ignored)
#
# Result:
#     None
#
# Side effects:
#     Destroys all objects created in the caller procedure
#
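proc localobj_destroy {caller command code result ops} {
    global local_objects

    foreach obj $local_objects($caller) {
        rename $obj {}
    }
    # Keep the (now empty) entry: the trace on the caller stays in place,
    # so a later call must not add a second trace
    set local_objects($caller) {}
}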

# main --
#     Test this
#
proc myproc {} {
    localobj cow1

    puts Myproc
    cow1 run
    cow2 run
    myproc2
}
proc myproc2 {} {
    # localobj cow1 ;# Hm, would override the other one

    puts Myproc2
    cow1 run                ;# cow1 was created by the calling procedure - it is still available. This is a slight flaw ;)
    cow2 run                ;# cow2 was created as a _global_ object, is this a flaw?
}

localobj cow2

myproc

puts Main
# Object "cow1" no longer exists, so we get an error message
if {[catch {cow1 run} msg]} {
    puts "Error: $msg"
}
cow2 run    ;# the global object is still available

======

----

KD: Wouldn't it be better to call localobj with the name of a local variable, in which the name of the object will then be stored? In this way, Tcl's inherent rules for destroying local variables can be used to destroy the object itself too:

======
proc localobj_destroy {name args} {
    puts "Destroying $name"
    rename $name {}
}

proc localobj &name {
    global handlecounter
    if {![info exists handlecounter]} {set handlecounter 0}
    upvar 1 ${&name} name
    set name handle#[incr handlecounter]
    puts "Creating variable ${&name} = proc $name"
    proc $name {args} {puts "executing: [info level 0]"}
    trace add variable name unset [list [
        namespace which localobj_destroy] $name]
}

#Testing
proc myproc2 {} {
    localobj foo
    $foo testlocal2
}
proc myproc1 {} {
    localobj foo
    $foo testlocal1
    myproc2
}
localobj foo   ;# this one is in fact global
$foo testglobal
myproc1
======

Result:

======none
Creating variable foo = proc handle#1
executing: handle#1 testglobal
Creating variable foo = proc handle#2
executing: handle#2 testlocal1
Creating variable foo = proc handle#3
executing: handle#3 testlocal2
Destroying handle#3
Destroying handle#2
======

[AM] 2007-12-18: Some discussion on this solution was lost, due to a problem with
the disks. However, consider the following fragment:

======
proc myproc {} {
    localobj foo
    $foo testlocal1
    set bar $foo
    unset foo
    $bar testlocal2
}
======

If I understand the code correctly, then this won't work as expected:
[unset%|%unsetting] ''foo'' will cause the associated object to disappear, leaving 
''bar'' to pick up the pieces.

KD: Yes, that's right. By declaring `localobj foo`, you are signing a contract that the lifetime of the object is tied to the lifetime of the variable `foo`. Usually that's also the lifetime of the procedure call, unless `foo` is [unset] manually.

[AM]: Hm, it is not a perfect solution, but it does have its attractive points - mine was inspired by a partial/incorrect
understanding of [incr Tcl]. Your solution restricts objects to the procedure that created them (unless you
pass them to a called procedure). Good :)

[DKF]: One solution is to give objects a method that instructs them to "return themselves to the caller", which gives them a chance to manage their reference counting/[trace%|%traces]. Another is to allow creators of an object to specify what variable to couple the lifetime to, which allows for management via `[upvar]`; [NAP] does this via the '''as''' method IIRC.



<<categories>> Internals