'''`[https://github.com/tcllab/svnsync/tree/master/globfind%|%globfind]`''' is designed as a fast and simple alternative to `[fileutil]::find`.

** News **

   [SEH] 2022-09-07: globfind 2.0 released.:   globfind is a directory hierarchy search utility. It takes advantage of glob's ability to use multiple patterns to scan deeply into a directory structure in a single command, hence the name. Version 2.0 is a rewrite from scratch to be faster, more compact, more featureful and more error-resilient. On searches of large directory spaces for files matching a glob pattern, globfind typically runs about three times faster than fileutil::findByPattern, and at about 150% of the speed of GNU find. Support for Tcl versions before 8.5 has been dropped.

[https://wiki.tcl-lang.org/revision/globfind?V=21%|%Link to previous version of page]

`fileutil::find` is useful, but it has several deficiencies:

   * On [Microsoft Windows%|%Windows], hidden files are mishandled.
   * On Windows, checks to avoid infinite loops due to nested symbolic links are not done.
   * On Unix, nested loop checking requires a "file stat" of each file/dir encountered, a significant performance hit.
   * The basedir from which the search starts is not included in the results, as it is with GNU find.
   * If the basedir is a file, it is returned in the result not as a list element (like glob) but as a string.
   * `[fileutil%|%fileutil::find]` calls itself recursively, and thus risks running into interp recursion limits on very large file systems.
   * fileutil.tcl contains three separate instantiations of `find` for varying operating systems and versions, which is a maintenance nightmare.

`globfind` eliminates all the above deficiencies. It checks for nested symbolic links in a platform-independent way and scans directory hierarchies without recursion. For speed and simplicity, `globfind` takes advantage of [glob%|%glob's] ability to use multiple patterns to scan deeply into a directory structure in a single command, hence the name.
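
As a rough illustration of the multi-pattern idea described above (this is not `globfind`'s actual code), the sketch below builds the patterns `*`, `*/*`, `*/*/*` and so on, and hands them all to a single `glob` call. The proc name `deepglob` and its `depth` argument are hypothetical, chosen only for this example, and the sketch assumes Tcl 8.5 or later for the `{*}` expansion syntax.

======
# Hypothetical helper for illustration only; not part of globfind.
# Lists every file and directory found up to $depth levels below
# $basedir using one glob command with several patterns.
proc deepglob {basedir depth} {
    # glob needs at least one pattern, so a depth below 1 yields nothing
    if {$depth < 1} {
        return {}
    }
    set patterns {}
    set pat *
    for {set i 0} {$i < $depth} {incr i} {
        # Collect *, */*, */*/*, ... one pattern per directory level
        lappend patterns $pat
        append pat /*
    }
    # -nocomplain returns an empty list instead of raising an error
    # when nothing matches; hidden files are not matched here because
    # no "-types hidden" is given
    return [glob -nocomplain -directory $basedir -- {*}$patterns]
}
======

For example, `deepglob [pwd] 3` would list the entries found in the current directory and up to two directory levels beneath it. `globfind` itself does considerably more than this sketch, such as checking for nested symbolic links and applying filters.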
Its calling syntax is the same as `[fileutil%|%fileutil::find]`, so with a name change it could be used as a drop-in replacement:

======
Usage: globfind ?basedir? ?filtercmd? ?switches?

Options:

basedir - the directory from which to start the search. Defaults to current directory.

filtercmd - Tcl command; for each file found in the basedir, the filename will be
    appended to filtercmd and the result will be evaluated. The evaluation should
    return a boolean value; only files whose return code is true will be included
    in the final return result. ex: {file isdir}

switches - The switches will "prefilter" the results before the filtercmd is applied.
    The available switches are:

    -depth - sets the number of levels down from the basedir into which the
        filesystem hierarchy will be searched. A value of zero is interpreted
        as infinite depth.

    -pattern - a glob-style filename-matching wildcard. ex: -pattern *.pdf

    -types - any value acceptable to the "types" switch of the glob command.
        ex: -types {d hidden}

    -redundancy - eliminates redundant listing of real files that may occur due
        to symbolic links that link to directories within basedir (at the cost
        of slower execution). Stores names of such symbolic links in
        ::fileutil::globfind::redundant_files. Sets
        ::fileutil::globfind::REDUNDANCY to 1 if redundancies found, otherwise 0.
======

** See Also **

   [Matthias Hoffmann - Tcl-Code-Snippets - misc - globx]:

   [getfiles cached]:   [JOB] - 2016-07-12 20:07:18: Relies on the same glob statement plus the ability to store search results in an additional cache file.

** Misc **

[AQI] 2016-07-16: `rglob` is another much faster alternative to `[fileutil]::find`.
======
proc rglob {basedir pattern} {

    # Fix the directory name, this ensures the directory name is in the
    # native format for the platform and contains a final directory separator
    set basedir [string trimright [file join [file normalize $basedir] { }]]
    set fileList {}

    # Look in the current directory for matching files, -type {f r}
    # means only readable normal files are looked at, -nocomplain stops
    # an error being thrown if the returned list is empty
    foreach fileName [glob -nocomplain -type {f r} -path $basedir $pattern] {
        lappend fileList $fileName
    }

    # Now look for any subdirectories in the current directory
    foreach dirName [glob -nocomplain -type {d r} -path $basedir *] {
        # Recursively call the routine on the subdirectory and append any
        # new files to the results
        set subDirList [rglob $dirName $pattern]
        if { [llength $subDirList] > 0 } {
            foreach subDirFile $subDirList {
                lappend fileList $subDirFile
            }
        }
    }
    return $fileList
}
======

'''example:'''

======none
time {util::rglob [pwd] *.tcl} 100
1505.3 microseconds per iteration

time {fileutil::findByPattern [pwd] *.tcl} 100
8957.71 microseconds per iteration

Both provide the same result for basic glob matches.
It's even better on larger file structures:

time {fileutil::findByPattern [pwd] *.tcl} 10
2696438.9 microseconds per iteration

time {util::rglob [pwd] *.tcl} 10
277771.5 microseconds per iteration
======

** Page Authors **

   [Gerald Lester]:

   [PYK]:

<<categories>> Package | File | find files