Directory recursion

Difference between version 32 and 33 - Previous - Next
[AMG]: This page gives some '''directory recursion''' scripts.  Each prinbouts the names
of aoll sub andi rectorutines ofor the cuworreking with a
directory, but of course you can changie
them to do whratever chyou need. 



** See Also **
   [glob]:   aA built-in Tcl command, matches files by name.
   [fileutil::traverse]:   a tcll%|%fib module, providing utilities ::for traversing directory]: hierarchies.  
   [rfilecurstil::trave_globrse]:   A [tcllib] module, paroviding utilities ofor thraversing [TdireclX]tory phierarckaghies.
   [for_recursive_glob]:   pPart of the [TclX] package.
   [Matthias Hoffmann - Tcl-Codr_re-Snippetcurs - misc - ve_globx]:   Part of the [TclX] package.
   `[ycgl%|%ycl dobfir iternd]`:   Walk a directory asyAnchronously using using `[ycl coro relay]`.  Supporlts depth and brenadth-first opvera tion and dynamic pruning o`f dirlecutoriesl::find`.
   [Matthias Hoffmann - Tcl-Code-Snippets - misc - globx%|%globx], by [Matthias Hoffmann]:   

   `[ycl%|%ycl dir iter]`:   Traverse a directory hierarchy asynchronously using using `[ycl coro relay]`.  Supports depth-first and breadth-first operation, as well as dynamic pruning of directories.

   [TkTreemap]:   See ''DirTree'' procedure where directories are stored in a nested list


** Description **
[LV]: So what is breadth-first traversal vs depth-first traversal?  Is this
distinguishing between whether one sees the files of a directory first or last?

[AMG]: The following directory tree listing is generated using a depth-first
traversal.

======none
/
/a
/a/1
/a/2
/b
/b/1
/b/2
======

Here's the same tree again generated using a breadth-first traversal.

======
/
/a
/b
/a/1
/a/2
/b/1
/b/2
======

See the difference?  Depth-first single-mindedly concentrates on one branch of
the tree, and only after reaching a leaf node will it back up and try something
else.  Breadth-first takes turns following all branches it encounters.  I'm
aware that my explanation sucks, so instead I'll let
[http://www.jwz.org/xscreensaver/%|%XScreenSaver] talk for me.

[http://www.jwz.org/xscreensaver/screenshots/maze.jpg]
[http://www.jwz.org/xscreensaver/screenshots/coral.jpg]

Maze performs a depth-first traversal, and coral performs a breadth-first
traversal.

Whether files are visited before, after, or intermingled with directories is a
separate issue.  It just depends on your code.  Some relevant terms to explore
are pre-order, in-order, and post-order.

The depth/breadth issue, on the other hand, is determined by whether a stack or
a queue is used to keep track of the directories that have been discovered but
not yet visited.

By the way, all these concepts are useful in expression parsing.  I'd write
more, but work is calling...

----

'''1. Iterative, breadth-first traversal, Tcl 8.5, [[[lappend]]]'''

======
proc ftw_1 {{dirs .}} {
    while {[llength $dirs]} {
        set dirs [lassign $dirs name]
        lappend dirs {*}[glob -nocomplain -directory $name -type d *]
        puts $name
    }
}
======

'''2. Iterative, breadth-first traversal, Tcl 8.5, [[[list]]] + [{*}]'''

======
proc ftw_2 {{dirs .}} {
    while {[llength $dirs]} {
        set dirs [list {*}[lassign $dirs name] {*}[
            glob -nocomplain -directory $name -type d *]]
        puts $name
    }
}
======

'''3. Iterative, breadth-first traversal, Tcl 8.4'''

======
proc ftw_3 {{dirs .}} {
    while {[llength $dirs]} {
        set name [lindex $dirs 0]
        set dirs [concat [lrange $dirs 1 end] [
            glob -nocomplain -directory [lindex $dirs 0] -type d *]]
        puts $name
    }
}
======

'''4. Iterative, depth-first traversal, Tcl 8.5, [[[lassign]]]'''

======
proc ftw_4 {{dirs .}} {
    while {[llength $dirs]} {
        set dirs [lassign $dirs name]
        set dirs [list {*}[
            glob -nocomplain -directory $name -type d *] {*}$dirs]
        puts $name
    }
}
======

'''5. Iterative, depth-first traversal, Tcl 8.5, [[[lreplace]]]'''

======
proc ftw_5 {{dirs .}} {
    while {[llength $dirs]} {
        set name [lindex $dirs 0]
        set dirs [lreplace $dirs 0 0 {*}[
            glob -nocomplain -directory [lindex $dirs 0] -type d *]]
        puts $name
    }
}
======

'''6. Iterative, depth-first traversal, Tcl 8.5, [[[lrange]]]'''

======
proc ftw_6 {{dirs .}} {
    while {[llength $dirs]} {
        set name [lindex $dirs 0]
        set dirs [list {*}[
            glob -nocomplain -directory [lindex $dirs 0] -type d *] {*}[
                lrange $dirs 1 end]]
        puts $name
    }
}
======

'''7. Iterative, depth-first traversal, Tcl 8.4'''

======
proc ftw_7 {{dirs .}} {
    while {[llength $dirs]} {
        set name [lindex $dirs 0]
        set dirs [concat [glob -nocomplain -directory [
            lindex $dirs 0] -type d *] [lrange $dirs 1 end]]
        puts $name
    }
}
======

'''8. Recursive, depth-first traversal'''

======
proc ftw_8 {{name .}} {
    puts $name
    foreach subdir [glob -nocomplain -directory $name -type d *] {
        ftw_8 $subdir
    }
}
======

'''9. Iterative, breadth-first traversal, Tcl 8.5, [[[lassign]]] + [[[concat]]]'''

[AMG]: New test for 2014, showing off how to pop the first list item and append to the list, all in the same line.

======
proc ftw_9 {{dirs .}} {
    while {[llength $dirs]} {
        set dirs [concat [lassign $dirs name] [glob -nocomplain -directory $name -type d *]]
        puts $name
    }
}
======

'''10. Iterative, breadth-first traversal, Tcl 8.5, [[[lassign]]] + [{*}]'''

[AMG]: Like 9, but without [[[concat]]].

======
proc ftw_10 {{dirs .}} {
    while {[llength $dirs]} {
        set dirs [list {*}[lassign $dirs name] {*}[glob -nocomplain -directory $name -type d *]]
        puts $name
    }
}
======

For this application, [[[linsert]]] is too clumsy because it errors when
instructed to insert zero elements; therefore it would have to be wrapped with
an [[[if]]] (or [[[catch]]], hehehe).

I [[[time]]]d the above procedures (sans [[[puts]]]) on a certain directory
tree.  Here are the results:

======
% foreach p [info procs ftw_*] {puts "$p: [time $p 10]"}
ftw_1: 519323.9 microseconds per iteration
ftw_2: 512807.2 microseconds per iteration
ftw_3: 665287.1 microseconds per iteration
ftw_4: 518121.0 microseconds per iteration
ftw_5: 515800.6 microseconds per iteration
ftw_6: 515826.2 microseconds per iteration
ftw_7: 595750.0 microseconds per iteration
ftw_8: 515477.5 microseconds per iteration
======

And a bit of analysis:

   * The fastest two do the least amount of list and variable manipulation.  By no means does recursion make ftw_8 slow.
   * The slowest two are the ones using [[[concat]]].  Yay [{*}]!
   * The fastest implementation is ftw_2, but breadth-first is an unconventional directory traversal.
   * The fastest depth-first implementation is ftw_8, since recursion apparently is quite fast.
   * I don't know why ftw_3 and ftw_7 have different execution times, but in repeated trials the difference is consistent.

[AMG]: Fresh numbers from 2014!  All the above dates back to 2006...  Now with Tcl version 85f74e89a1 [http://core.tcl.tk/tcl/info/85f74e89a1e9af1b87d0bbdbb6fb4b5c95c0cc15], which is somewhere between 8.6.1 and the forthcoming 8.6.2.  By the way, I'm testing a version of the above with all the [[[puts]]] lines removed.

======
ftw_1 : 72062.8 microseconds per iteration
ftw_2 : 70604.1 microseconds per iteration
ftw_3 : 73941.0 microseconds per iteration
ftw_4 : 67606.9 microseconds per iteration
ftw_5 : 71432.7 microseconds per iteration
ftw_6 : 72495.7 microseconds per iteration
ftw_7 : 67910.9 microseconds per iteration
ftw_8 : 68738.3 microseconds per iteration
ftw_9 : 70887.5 microseconds per iteration
ftw_10: 71153.5 microseconds per iteration
======

I added ftw_9 and ftw_10 to show off a one-liner construction, and I ditched the "!= 0" comparisons for the [[[llength]]] calls because the explicit compare with zero produces slower [bytecode]s.


** Reverse Traverse **

Sometimes it's desirable to traverse a hierarchy backwards, such that deeper
vertices are visited first.  One example is when renaming all the directories
and files in a tree.  If parents are renamed before children, the newly-renamed
directories might not get traversed.  Using the example from the top of this
page, here's what a reverse traversal would look like:

======
/a/1
/a/2
/a
/b/1
/b/2
/b
/
======

Here is a procedure that visits the contents in that order:

======
package require fileutil

if 0 {
    this iterative depth-first directory traversal takes care apply $script to
    deeper paths first so that paths aren't invalidated by renaming before they
    can be processed
}
proc reverse_traverse_iter {path cmd} {
    set origpath $path 
    set orignormalized [::fileutil::fullnormalize $path]
    set cursor [list $path]
    while 1 {
        if {$cursor eq {}} {
            break
        }
        set path [join $cursor /] 
        set descend 1
        if {[file type $path] eq {link}} {
            set pathfullnormalized [::fileutil::fullnormalize $path]
            if {[string first $orignormalized/ $pathfullnormalized] == 0} {
                #symlink to another directory in this tree.  avoid cyclic
                #loops.  Don't descend.
                set descend 0
            }
        }
        if {$descend} {
            set children [
                glob -nocomplain -tails -directory $path -types {d hidden} *]
            lappend children {*}[
                glob -nocomplain -tails -directory $path -type d *]
            foreach child $children[set children {}] {
                if {$child ni {. ..}} {
                    dict set dirs {*}$cursor $child {}
                    lappend children $child
                }
            }
        } else {
            set children {}
        }
        if {[llength $children]} {
            lappend cursor [lindex $children 0]
        } else {
            while 1 {
                if {$descend} {
                    set files [
                        glob -nocomplain -directory $path -types hidden *]
                    lappend files {*}[glob -nocomplain -directory $path *]
                    foreach fname $files[set files {}] {
                        if {![file isdirectory $fname]} {
                            lappend files $fname
                        }
                    }
                } else {
                    set files {}
                }
                lappend files $path
                foreach fname $files {
                    {*}$cmd $fname
                }
                dict unset dirs {*}$cursor
                set cursor [lreplace $cursor end end]
                if {$cursor eq {}} {
                    break
                }
                set children [dict get $dirs {*}$cursor]
                set childnames [dict keys $children]
                if {[llength $childnames]} {
                    lappend cursor [lindex $childnames 0]
                    break
                } else {
                    set path [join $cursor /]
                }
                #descend should be 1 for any additional iterations of this
                #loop
                set descend 1
            }
        }
    }
}
======

Example: 

======
reverse_traverse_iter . {apply {fname {
    if {[file isdirectory $fname]} {
        puts "directory: $fname"
    } else {
        puts "file: $fname"
    }
}}} 
======

Of course, the recursive version is shorter, and the [NRE] of modern Tcl takes
the stack pressure: 

======
package require fileutil

proc walkin {path cmd} {
    set normalized [::fileutil::fullnormalize $path]
    set myname [lindex [info level 0] 0]
    set children [glob -nocomplain -directory $path -types hidden *]
    lappend children {*}[glob -nocomplain -directory $path *]
    foreach child $children[set children {}] {
        if {[file tail $child] in {. ..}} {
            continue
        }
        if {[file isdirectory $child]} {
            if {[file type $child] eq "link"} {
                set normalizedchild [fileutil::fullnormalize $child]
                if {[string first $normalized/ $normalizedchild] == 0} {
                    #symlink to a directory in $path.  Avoid cyclic traversal.
                    #Don't descend.
                } else {
                    $myname $child $cmd
                }
            }
        }
        {*}$cmd $child
    }
}
======


** Page Authors **

   [AMG]:   

<<categories>> File | Performance | Algorithm