Version 35 of fileutil

Updated 2007-05-31 05:58:24 by Laif

Documentation can be found at http://tcllib.sourceforge.net/doc/fileutil.html


Currently the fileutil package contains a find, grep, and cat. Other procs that would be useful to add would include wc, tee, head, tail, and perhaps some awk'ish type functions ala Tclx.

For more on grep, see "NR-grep: A Fast and Flexible Pattern Matching Tool" [L1 ].


Perhaps even some code like Glenn Jackman's:

    proc touch {filename {time ""}} {
        if {[string length $time] == 0} {set time [clock seconds]}
        file mtime $filename $time
        file atime $filename $time
    }

glennj: This proc has been accepted into tcllib 1.2: http://tcllib.sourceforge.net/doc/fileutil.html

US Unix-like touch:

    proc touch {filename {time ""}} {
        if {![file exists $filename]} {
           close [open $filename a]
           }
        if {[string length $time] == 0} {set time [clock seconds]}
        file mtime $filename $time
        file atime $filename $time
    }

What other file-related procs would be useful?


2003-11-28 VI Nice of you to ask. There's a list above, other than that: tail -f, split, join. I use tkcon as my main shell on a wimpy laptop. Fewer dlls loaded is good..


2003-12-16 SS Trying to improve over the Tcl implementation of wc in the Great Language Shootout I wrote this, that seems half in execution time against big files:

 set text [read stdin]
 set c [string length $text]
 set l [expr {[llength [split $text "\n\r"]]-1}]
 set T [split $text "\n\r\t "]
 set w [expr {[llength $T]-[llength [lsearch -all -exact $T {}]]-1}]
 puts "\t$l\t$w\t$c"

Output seems to be identical to GNU's wc command.


SEH 20060723 -- The proc fileutil::find is useful, but it has several deficiencies:

  • On Windows, hidden files are mishandled.
  • On Windows, checks to avoid infinite loops due to nested symbolic links are not done.
  • On Unix, nested loop checking requires a "file stat" of each file/dir encountered, a significant performance hit.
  • The basedir from which the search starts is not included in the results, as it is with GNU find.
  • If the basedir is a file, it is returned in the result not as a list element (like glob) but as a string.
  • The proc calls itself recursively, and thus risks running into interp recursion limits for very large systems.
  • fileutil.tcl contains three separate instantiations of proc find for varying os's/versions. Maintenance nightmare.

The following code eliminates all the above deficiencies. It checks for nested symbolic links in a platform-independent way, and scans directory hierarchies without recursion.

For speed and simplicity, it takes advantage of glob's ability to use multiple patterns to scan deeply into a directory structure in a single command, hence the name. Its calling syntax is the same as fileutil::find, so with a name change it could be used as a drop-in replacement:

 proc globfind {{basedir .} {filtercmd {}}} {
    set depth 16
    set filt [string length $filtercmd]
    set basedir [file normalize $basedir]
    file stat $basedir fs
    set linkName $basedir
    while {$fs(type) eq "link"} {
       if {[catch {file stat [set linkName [file normalize [file link $linkName]]] fs}]} {
          break
       }
    }
    if {$fs(type) eq "file"} {
       set filename $basedir
       if {!$filt || [uplevel $filtercmd [list $filename]]} {
          return [list $filename]
       }
    }
    set globPatternTotal {}
    set globPattern *
    set incrPattern /*
    for {set i 0} {$i < $depth} {incr i} {
       lappend globPatternTotal $globPattern
       append globPattern $incrPattern
    }

    lappend checkDirs $basedir
    set returnFiles {}
    set redo 0
    set terminate 0
    set hidden {}
    while {!$terminate} {
       set currentDir [lindex $checkDirs 0]
       if {!$redo} {
          set allFiles [eval [list glob -directory $currentDir -nocomplain] \
                $hidden $globPatternTotal]
       }
       set redo 0
       set termFile [lindex $allFiles end]
       set termFile [lrange [file split $termFile] [llength [file split $currentDir]] end]
       if {![llength $hidden]} {
          set checkDirs [lrange $checkDirs 1 end]
       }
       foreach test {checkdirs length duplicate recursion prune} {
          switch $test {
             checkdirs {
                set afIndex [llength $allFiles]
                incr afIndex -1
                for {set i $afIndex} {$i >= 0} {incr i -1} {
                   set cdir [lindex $allFiles $i]
                   if {[llength [lrange [file split $cdir] [llength [file split $currentDir]] end]] < $depth} {
                      break
                   }
                   file stat $cdir fs
                   set linkName $cdir
                   while {$fs(type) eq "link"} {
                      if [catch {
                         file stat [set linkName [file normalize [file link $linkName]]] fs
                      }] {break}
                   }
                   if {$fs(type) eq "directory"} {
                      lappend checkDirs $cdir
                   }
                }
             }                                   {
             length {
                if {[llength $termFile] < $depth} {
                   break
                }
             }
             duplicate {
                set recurseTest 0
                set dupFile [lindex $allFiles end]
                set dupFile [lrange [file split $dupFile] [llength [file split $basedir]] end]
                set dupFileEndDir [expr [llength $dupFile] - 2]
                if {[lsearch $dupFile [lindex $dupFile end-1]] < $dupFileEndDir} {
                   set recurseTest 1
                }
             }
             recursion {
                if {!$recurseTest} {continue}
                if {($hidden == {})} {
                   set type [list -types l]
                } else {
                   set type [list -types "hidden l"]
                }

                set linkFiles {}
                set linkDir $currentDir
                while 1 {
                   set linkFiles [concat $linkFiles [eval \
                         [list glob -directory $linkDir -nocomplain] $type $globPatternTotal]]
                   if {$linkDir eq $basedir} {break}
                   set linkDir [file dirname $linkDir]
                }
                array unset links
                set linkFiles [lsort -unique $linkFiles]
                foreach lf $linkFiles {
                   set ltarget [file normalize [file readlink $lf]]
                   if {[array names links -exact $ltarget] ne {}} {
                      lappend pruneLinks $lf
                      set redo 1
                   }
                   array set links "$ltarget $lf"
                }
             }
             prune {
                if {![info exists pruneLinks]} {
                   continue
                }
                set afIndex [llength $allFiles]
                incr afIndex -1
                set cdIndex [llength $checkDirs]
                incr cdIndex -1
                set rfIndex [llength $returnFiles]
                incr rfIndex -1
                foreach pl $pruneLinks {
                   for {set i $afIndex} {$i >= 0} {incr i -1} {
                      set af [lindex $allFiles $i]
                      if {![string first $pl/ $af]} {
                         set allFiles [lreplace $allFiles $i $i]
                      }
                   }
                   for {set i $cdIndex} {$i >= 0} {incr i -1} {
                      set cd [lindex $checkDirs $i]
                      if {![string first $pl/ $cd]} {
                         set checkDirs [lreplace $checkDirs $i $i]
                      }
                   }
                   for {set i $rfIndex} {$i >= 0} {incr i -1} {
                      set rf [lindex $returnFiles $i]
                      if {![string first $pl/ $rf]} {
                         set returnFiles [lreplace $returnFiles $i $i]
                      }
                   }
                }
                unset pruneLinks
             }
             default {}
          }
       }
       if {$redo} continue
       if {$hidden eq {}} {
          set hidden [list -types hidden]
       } else {
          set hidden {}
          if {[llength $checkDirs] == 0} {
             set terminate 1
          }
       }
       set returnFiles [concat $returnFiles $allFiles]
    }
    set filterFiles {}
    foreach filename [lsort -unique [linsert $returnFiles end $basedir]] {
       if {!$filt || [uplevel $filtercmd [list $filename]]} {
          lappend filterFiles $filename
       }
    }
    return $filterFiles
 }

LV People who write utilities that seem like they would be useful should submit them to the tcllib sf.net feature request facility, so the maintainers can take a look at them.


gavino posted a question on comp.lang.tcl:

"I can not figure out the [globfind] syntax to limit it to finding say .pdf files. ... please someone post and [sic] example."

and Gerald Lester replied:

 proc PdfOnly {fileName} {
     return [string equal [string tolower [file extension $fileName] .pdf]
 }

 set fileList [globfind $dir PdfOnly] 

SEH 20070317 -- A simpler alternative:

 set fileList [globfind $dir {string match -nocase *.pdf}]

It should be noted by those who are not familiar with unix - that even in windows xp, if ::fileutil::find encounters a folder named with a single tilde (~), it will append the contents of the person's home directory to the search results.


See also Unixy minitools


Category Package, subset Tcllib, Category File