Additional file commands

Read a text file into a string:

proc readfile filename {
    set fp [open $filename]
    set res [read $fp [file size $filename]]
    close $fp
    set res 
}

JCW: Just out of curiosity... would the minimal code below be equivalent, i.e. is the close indeed automatic in this context?

proc readfile filename {
    read [open $filename] [file size $filename]
}

RS: According to man close, "Channels are automatically closed when an interpreter is destroyed and when the process exits." From this I infer that they are not closed automatically elsewhere. A handle of a file open for reading can't do much harm, except leak some memory (and it may cause the app to run out of available file descriptors...) Anyway, I've applied the simple rule "if you opened it, close it" ;-)

JCW: Yes, ok, but doesn't the file get closed when (and because) the last reference goes away? Or is bytecode compilation affecting this and leading to a kept reference? Inquiring minds want to know... :o)

Vince: There's a reference in the interpreter (as file channels will show). The only way to get rid of this reference is with close (unless you resort to C code).
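
The lingering reference can be observed from a script. The following is a minimal sketch, not part of the original discussion; the scratch file name leakdemo.tmp and the proc name leaky_readfile are made up for the demonstration.

```tcl
# Scratch file for the demonstration (hypothetical name).
set demo [file join [pwd] leakdemo.tmp]
set fp [open $demo w]
puts $fp "hello"
close $fp

# The "minimal" reader discussed above: opens, reads, never closes.
proc leaky_readfile filename {
    read [open $filename] [file size $filename]
}

# Compare the interpreter's channel list before and after the call.
set before [file channels]
leaky_readfile $demo
set leaked {}
foreach ch [file channels] {
    if {[lsearch -exact $before $ch] < 0} {lappend leaked $ch}
}
puts "channels left open by the call: $leaked"

# Clean up: at script level, close is what drops the reference.
foreach ch $leaked {close $ch}
file delete $demo
```

The channel opened inside the proc is still listed by file channels after the proc has returned, which is the reference Vince mentions.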

Arjen Markus: There is a good reason for the file not closing automatically (the same I guess as with any other programming language): you may want to do more with the file, such as append to the end, replace the contents and so on.

The minimal procedure can even be shorter:

proc readfile filename {
    read [open $filename]
}

as by default read consumes whatever is available.

RS: Yes, but using the default buffer size, which may run slower.. I was taught years ago that one should always specify file size, for performance.

Stephen Trier: That's what I thought, too, but JH says on c.l.t that it isn't true any more. His TclBench (Tcl Normalized Benchmarks) results confirm it.
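
Anyone who wants to check this on their own installation can time both variants. This is only a rough sketch: the scratch file name, its size, the iteration count, and the proc names read_sized and read_default are arbitrary choices made here, and the numbers will vary by Tcl version and platform.

```tcl
# Build a scratch file of moderate size (hypothetical name).
set f [file join [pwd] readbench.tmp]
set fp [open $f w]
puts -nonewline $fp [string repeat "0123456789abcdef\n" 50000]
close $fp

# Variant 1: pass the file size to read.
proc read_sized filename {
    set fp [open $filename]
    set res [read $fp [file size $filename]]
    close $fp
    return $res
}

# Variant 2: let read consume everything up to end of file.
proc read_default filename {
    set fp [open $filename]
    set res [read $fp]
    close $fp
    return $res
}

puts "with size:    [time {read_sized $f} 20]"
puts "without size: [time {read_default $f} 20]"
file delete $f
```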

A one-liner that doesn't leave files open can be made with the help of the K combinator. (Glossary of terms)

proc K {x y} {set x}

proc readfile filename {
    K [read [set fp [open $filename]]] [close $fp]
}

Theo Verelst: I'd be interested in knowing whether the read returns at all, or within a guaranteed time interval, with a valid file: on Unix flavours the file might be updated during the read, and on Windows, well, ehh, something may happen to it at any time...

Regardless of whether I close it or the interpreter does. And then I'd like to know which version I got, and whether programs accessing it will deadlock because of such considerations, i.e. maybe the open/close construct serves a potential purpose in such a thinking.

dizzy: A one-liner that doesn't need the K combinator:

proc readfile {filename} {
    return [read [set fp [open $filename]]][close $fp]
}
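
Both one-liners leave the channel open if read itself raises an error, because close is never reached. A sketch of a variant that closes in either case, assuming Tcl 8.6 or later (where try ... finally is available); the name readfile_safe is made up here:

```tcl
proc readfile_safe filename {
    set fp [open $filename]
    try {
        # The result of read becomes the result of the proc.
        return [read $fp]
    } finally {
        # Runs whether read succeeded or raised an error.
        close $fp
    }
}
```

On older interpreters the same effect can be had with catch around the read, followed by close and a re-raised error.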

Touch a file: without modifying the contents, change a file's access time to (now); create it if it doesn't exist, with 0 bytes contents:

proc touch filename {
    close [open $filename a]
} ;# after Don Porter in c.l.t, brought here by RS 

Why not just use the following?

file mtime $filename [clock seconds]

RS: In very new versions, file mtime can be used with a second argument to set the mtime; but not in 8.2.3.

CL: makes explicit that, while well-meaning newcomers to Tcl often think in terms of, for example,

exec touch $filename

the forms above are more portable, secure, performance-preserving, ...


LV: with all the bug fixes that have occurred since Tcl 8.2.3, I hope people upgrade to 8.3.4 soon...

RS: I am actually running 8.3.4 (even compiled it myself ;-), but my Winhelp is still the 8.2.3 one... But I am not hurried to chase after the latest versions - we install Cygwins here that are still on 8.0.4, so our scripts cannot be too demanding ... Also, file mtime .. works only on existing files. To have the full "touch" functionality (create if not exists), the close [open ...] approach is better.. and shorter (in words) too ;-)

LV: However, the close/open has the downside of not having the real touch functionality of being able to set the modification time to whatever time one wants - which file mtime does have.

glennj: the touch proc in tcllib's fileutil package does all this. See https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tcllib/files/modules/fileutil/fileutil.md
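
For those who do not want to pull in tcllib, the two approaches discussed above can be combined. This is a minimal sketch, assuming a Tcl version in which file mtime accepts a second argument; the name touch_at and its optional time parameter are invented for illustration. Setting the time explicitly also avoids relying on the open/close pair to update the timestamp of an existing file, which may not happen on every system when nothing is written.

```tcl
proc touch_at {filename {time ""}} {
    if {$time eq ""} {
        set time [clock seconds]
    }
    # Append mode creates the file if it is missing and keeps existing contents.
    close [open $filename a]
    # Set the modification time explicitly (seconds since the epoch).
    file mtime $filename $time
}
```

Called with just a file name it stamps the file with the current time; a second argument sets any other time.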


Determine whether a file is "binary", i.e. contains at least one NUL byte (after a c.l.t hint by Donal Fellows):

proc file'is'binary {filename} {
   set fp [open $filename]
   fconfigure $fp -translation binary
   set sample [read $fp 1024]
   close $fp
   expr {[string first \x00 $sample]>=0}
} ;# RS

Another version, with belt and suspenders, resulting from the same thread: isBinary binary file discriminator


Arjen Markus: Often I read data files as if they were just another Tcl source file (via source). This can get me into problems if special characters, such as [ and ], are part of the contents. Is there a general strategy for working around this? (The background is that in one application the files I source consist of a mixture of Tcl and C/Java/... code fragments, and hence are littered with characters that are special in various ways. I do have a solution, but something does not quite feel right.)

RS: Hard to tell in that generality. Tcl is so dynamic that with every proc or so you change the language, and much more deeply with unknown... You might source untrusted files in a safe interpreter (inside a catch) to prevent havoc, and have that interpreter return the wanted data to you (this way filtering out possible other side effects).

Benny Riefenstahl: list does the job of quoting content in all cases that are needed to create safe code dynamically. I routinely do persistence as

puts $fp "set some_varname [list $some_data]"
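
A round trip shows both suggestions working together. This is a sketch under assumptions made here: the scratch file name persist_demo.tcl and the sample data are invented, and the saved script is read by the main interpreter and evaluated in a safe one, rather than sourced there directly.

```tcl
# Data full of characters that are special to Tcl.
set some_data "a \[b\] \$c \{ \"d\""

# Persist it as a Tcl script, quoted with list as shown above.
set f [file join [pwd] persist_demo.tcl]
set fp [open $f w]
puts $fp "set some_varname [list $some_data]"
close $fp

# Read the script back and evaluate it in a safe interpreter.
set fp [open $f]
set script [read $fp]
close $fp
set i [interp create -safe]
interp eval $i $script
set restored [interp eval $i {set some_varname}]
interp delete $i
file delete $f

puts [expr {$restored eq $some_data ? "round trip ok" : "mismatch"}]
```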

Count lines in one or several files:

proc wc-l files {
    set n 0
    foreach file $files {
        if {![catch {set fp [open $file]}]} {
            # -nonewline: a trailing newline must not count as an extra, empty line
            incr n [llength [split [read -nonewline $fp] \n]]
            close $fp
        }
    }
    set n
} ;# RS

Usage example: wc-l [glob *.tcl]


Read a line specified by its number from a text file:

proc gotoline {f n} {
    seek $f 0 ;# Start with first line
    for {set i 1} {$i<$n} {incr i} {
        gets $f 
    }
    # gets returns -1 at end of file; eof alone is only set after a failed read
    if {[gets $f line] < 0} {
        error "There is no line $n in this file"
    }
    return $line
}

There are numerous ways to improve this procedure. For instance, store the line number from the previous call in a global variable and seek to the start only if the needed line comes before the current one. (from a c.l.t post by Victor Wagner)
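
One possible reading of that suggestion, as a sketch only: the proc name gotoline_cached and the global array gotoline_next are invented here, and the code assumes nothing else moves the channel's read position between calls.

```tcl
proc gotoline_cached {f n} {
    # gotoline_next($f) holds the number of the line the next gets will return.
    global gotoline_next
    if {![info exists gotoline_next($f)] || $n < $gotoline_next($f)} {
        # Unknown position, or the wanted line is behind us: start over.
        seek $f 0
        set gotoline_next($f) 1
    }
    # Skip forward to the wanted line.
    while {$gotoline_next($f) < $n} {
        if {[gets $f line] < 0} {
            error "There is no line $n in this file"
        }
        incr gotoline_next($f)
    }
    if {[gets $f line] < 0} {
        error "There is no line $n in this file"
    }
    incr gotoline_next($f)
    return $line
}
```

Reading lines in ascending order then costs one gets per line instead of a rescan from the top each time.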

---

JPTanguay: A one-liner that uses external commands:

lindex [split [exec cat [file nativename c:/windows/tips.txt] ] \n ] 45
  or 
lindex [split [exec command /c type [file nativename c:/windows/tips.txt] ] \n ] 45

---

Arjen Markus: My favourite technique nowadays is to use interp alias. You can store hidden arguments that way and there is no need to litter the global namespace. See Garbage collection for an example.


Find an executable file from the PATH environment variable:

proc path'separator {} {
    switch -- $::tcl_platform(platform) {
        unix    {return ":"}
        windows {return ";"}
        default {error "unknown platform"}
    }
}
proc path'find filename {
    foreach path [split $::env(PATH) [path'separator]] {
       set try [file join $path $filename]
       if {[file readable $try]} {return $try}
    }
    error "no such file: $filename"
} ;# RS

Note that this is at the core of the command auto_execok predefined in the tcl script library.

AK: Except that it always assumes ":" as the separator.

RS: From what I see, it's rather constantly ";" - the whole auto_execok looks a bit windows-centric... but it does the job that I wrote the above for, so it's more minimal (=better) to use auto_execok. Thanks!
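
For comparison, a small wrapper around auto_execok that keeps the error behaviour of path'find. This is a sketch; the name find_program is made up. auto_execok returns an empty string when nothing is found, and otherwise a list of words suitable for exec, which can be more than one word.

```tcl
proc find_program name {
    set cmd [auto_execok $name]
    if {$cmd eq ""} {
        error "no such program: $name"
    }
    return $cmd
}

# Example: look for a tclsh on the PATH; the result depends on the machine.
if {[catch {find_program tclsh} result]} {
    puts "not found: $result"
} else {
    puts "found: $result"
}
```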


File line termination:

CR and/or LF? Donal Fellows shows the way:

proc file_lineterm {filename} {
    set fd1 [open $filename r]
    set fd2 [open $filename r];# Avoids most synch problems...
    fconfigure $fd2 -translation binary
    set EOLidx [string length [gets $fd1]]
    close $fd1
    read $fd2 $EOLidx
    set EOLchars [read $fd2 2]
    close $fd2
    if {[string equal $EOLchars "\r\n"]} {
        return "crlf" ;# DOS/Windows
    } elseif {[string equal [string index $EOLchars 0] "\r"]} {
        return "cr"   ;# Mac
    } elseif {[string equal [string index $EOLchars 0] "\n"]} {
        return "lf"   ;# Unix
    } else {
        return "unknown"
    }
}

Lars H 2003-02-27: Doesn't this procedure, rather than determining what the line terminator in the specified file is, determine what Tcl considers to be the line terminator? Answering myself 9 Sep 2004: Yes it does, but by default Tcl treats all of lf, cr, and crlf as line terminators when reading files, so you really do get the wanted information about which of those three this file uses.


File reader:

takes a filename, returns the lines of that file as a list. Trivial algorithm, but note the "whitespace sugar": mentions of a variable are vertically aligned to indicate data flow ;-)

proc file_lines {fn} {
    set f [open $fn r]
    set t [read $f [file size $fn]]
    close $f
    split $t \n
} ;#RS

Normalize line endings in a text file: Software other than Tcl can have all sorts of problems when text files were edited on a different platform, because line endings are sometimes CR-LF, sometimes LF, sometimes CR, and some editors normalize part of them and others do not... Tcl reads lines in any convention, and writes in the local convention, so this simple procedure helps, where even dos2unix fails:

proc normalizeTextfile filename {
    set fp [open $filename]
    set data [read $fp]
    close $fp
    set fp [open $filename w]
    # don't add extra newline
    puts -nonewline $fp $data
    close $fp
} ;#RS
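
To normalize to a specific convention instead of the local one, the output channel's -translation option can be set before writing. A sketch along the same lines; the name normalizeTextfileTo and its eol parameter (one of lf, cr, crlf) are additions made here:

```tcl
proc normalizeTextfileTo {filename {eol lf}} {
    # Reading with the default (auto) translation accepts lf, cr and crlf.
    set fp [open $filename]
    set data [read $fp]
    close $fp
    set fp [open $filename w]
    # Write every line ending in the requested convention.
    fconfigure $fp -translation $eol
    # don't add extra newline
    puts -nonewline $fp $data
    close $fp
}
```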

Determine EOL Sequence Length:

MG has just needed to find out how long the EOL sequence in a particular file was. Checking fconfigure $fid -translation didn't help (as it was just 'auto' on the read, and always crlf (Win XP) on the write, even if the file used Unix line-endings). Came up with this (which is possibly obvious, but I'm rather pleased with myself anyway;)

proc lineEndingSize file {
    set fid [open $file r]
    chan gets $fid line
    set size [expr {[chan tell $fid] - [string bytelength $line]}]
    close $fid
    return $size;
}

DKF: That can go wrong. The problem is that you need to use the encoded length of the string read, and some encodings have multiple ways of encoding a particular character. This is all rather nasty; even guessing based on the value of fconfigure -encoding can go wrong! So instead try this:

proc lineEndingSize file {
   set f [open $file]
   gets $f line
   set after [tell $f]
   seek $f 0
   read $f [string length $line]
   set before [tell $f]
   close $f
   return [expr {$after - $before}]
}

MG: I'll try that instead, thanks :)


Resolve symbolic links through recursion: Recursion is helpful in trying to follow a series or chain of symlinks to its final destination. Recursion stops when anything that's not a link is encountered. -CJU

proc resolve_link {f} {
    if {![file exists $f]} {
         error "bad/nonexistent/inaccessible/evil link: $f"
    }
    if {[file type $f] == "link"} {
        set target [file readlink $f]
        if {[file pathtype $target] == "absolute"} {
            resolve_link $target
        } else {
            resolve_link "[file dirname $f]/$target"
        }
    } else {
        return $f
    }
}

Example:

set my_target [resolve_link $link]

Check whether a channel is still open:

proc channelOpen? ch {expr {![catch {eof $ch}]}} ;# RS

Another version, which is several times faster on channels which no longer exist, but a little slower on ones that do...

proc channelOpen?2 {ch} {expr {[llength [file channels $ch]] > 0}} ;# MG

Check whether two files are equal (functional, and with K combinator - RS):

proc files_equal {file1 file2} {
    expr {[file size $file1] == [file size $file2]
          && [readfile $file1] eq [readfile $file2]}
}
proc readfile {filename} {
    set fp [open $filename]
    K [read $fp] [close $fp]
}
proc K {a b} {set a}

Cute names for file I/O:

proc < filename {
    set fp [open $filename]
    K [read $fp] [close $fp]
}
proc > {filename string} {
    set fp [open $filename w]
    puts $fp $string
    close $fp
}
proc >> {filename string} {
    set fp [open $filename a]
    puts $fp $string
    close $fp
}

Examples:

foreach file [glob *.tcl] {
    > $file.bak [< $file]
    >> logfile "copied $file"
}

Iterate over the lines of a text file:

proc fforeach {_var filename body} {
    upvar 1 $_var var
    set f [open $filename]
    while {[gets $f var] >= 0} {uplevel 1 $body}
    close $f
} ;# RS

Example, with simple grep-like functionality:

fforeach line try.txt {
    if {[regexp error $line]} {puts $line}
}

See Also

Building a file index
How do I remove one line from a file?
Indexed file reading
indexing a flat file, and a readBytes proc
inserting lines in between a file and not end of line
random line from file
Shuffle a file
How do I read and write files in Tcl