'''Slurp! -- reading a file'''

The basic file commands are [open], [close], [gets], [read], and [puts]. There are also [seek], [tell], and [eof]. Additionally, there are the various subcommands of the [file] command, as well as [fblocked], [fconfigure], Tcl_StandardChannels(3), [flush], [fileevent], and [filename].

One way to get file data in Tcl is to 'slurp' up the file into a text variable. This works really well if the files are known to be small.

    # Slurp up the data file
    set fp [open "somefile" r]
    set data [read $fp]
    close $fp

Now you can [split] the file data into lines, and process it to your heart's content.

NOTE: The mention of ''split'' is important here: input data is seldom well-behaved or well-structured, and needs to be processed in this way to ensure that any potential Tcl metacharacters are appropriately quoted into list format. '''However''', the split ''transforms'' the data, so one may have lost some of the original contents of the line. Anyone have techniques for ensuring that the data remains '''exactly''' the same (other than appropriate quoting)?

    # Process data file
    set data [split $data "\n"]
    foreach line $data {
        # do some line processing here
    }

RWT

It even works well for large files, but there is one trick that might be necessary: determine the size of the file first, then [read] that many bytes. This allows the channel code to optimize buffer handling (preallocation of a buffer of the correct size). I no longer know who posted this first. This is something for the [Tcl Performance] page as well.

    # Slurp up the data file, optimized buffer handling
    set fsize [file size "somefile"]
    set fp [open "somefile" r]
    set data [read $fp $fsize]
    close $fp

[AK] The comment above stating that you must use split to make sure you can deal with the contents is only true if you want to use list commands (such as [foreach] in the example). If you always treat it as a string, then you don't have to worry about unbalanced braces and such. One way to avoid split, if you are just searching for something on each line, is to use the ''-line'' option of [regexp], which forces matching line by line, combined with the ''-inline'' switch, which returns the matches as a list instead of placing them in variables; then iterate over that list, e.g.

    foreach {fullmatch submatch1 submatch2} [regexp -line -inline $exp $str] {
        # process matches here - don't have to worry about skipping lines
        # because only matches make it here.
    }

This assumes $exp contains an expression with two parenthesized subexpressions. [BBH]

'''[glennj]''' - Note that you must use the ''-all'' switch, otherwise the foreach loop will only execute once (on the first match):

    foreach {line sub1 sub2} [regexp -all -line -inline $exp $str] {...}

----
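A partial answer to the question above about keeping the data '''exactly''' the same: read line by line with [gets] instead of slurping and splitting. Each line then arrives as a plain string with no list quoting applied (though [gets] does strip the trailing newline). A minimal sketch, reusing the hypothetical "somefile" from the examples above:

    # Read a file line by line
    set fp [open "somefile" r]
    while {[gets $fp line] >= 0} {
        # $line holds one line, minus its trailing newline,
        # otherwise exactly as it appears in the file
    }
    close $fp

----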
'''Writing a file'''

I just noticed that there isn't an example of writing a file, so here goes:

    # create some data
    set data "This is some test data.\n"

    # pick a filename - if you don't include a path,
    # it will be saved in the current directory
    set filename "test.txt"

    # open the filename for writing
    set fileId [open $filename "w"]

    # send the data to the file -
    # failure to add '-nonewline' will result in an extra newline
    # at the end of the file
    puts -nonewline $fileId $data

    # close the file, ensuring the data is written out before you continue
    # with processing
    close $fileId

A simple Tk example using the text widget is at [Text Widget Example]. ''so'' 4/21/01

----

But a file without a newline at the end of its last line is not considered too well-behaved. Since in this example a plain string is [puts]'ed, I'd advocate a newline here. It's different when you save a text widget's content, where a newline after the last line is always guaranteed; there the ''-nonewline'' switch is well in place. ''[RS]''

----

Also note that when ''-nonewline'' is used, the data might not reach the file immediately. In the example, the output buffer is flushed when the file is closed, but if closing the file happens later in the program, that might be a problem. I believe adding a [[[flush] $fileId]] every time ''-nonewline'' is used is a good idea (unless you ''want'' to delay writing, of course). -- [Peter Lewerin]

----

An alternative to the above is to [[fconfigure $fileId -buffering none]] to force automatic flushing whenever data is written. [Andreas Kupries]

----

Configuring away buffering ''might'' cause confusion in a larger script because of its action-at-a-distance characteristics. -- [Peter Lewerin]

----

'''Using a Lisp pattern for reading and writing'''

I've found it quite useful to use this pattern to work with files:

    proc with-open-file {fname mode fp block} {
        upvar 1 $fp fpvar       ;# make the channel visible in the caller
        set fpvar [open $fname $mode]
        #fconfigure $fpvar -translation binary
        uplevel 1 $block        ;# run the caller's script
        close $fpvar
    }

Usage is like this:

    with-open-file $fname w fp {
        puts -nonewline $fp $data
        puts $fp "more data"
    }

    with-open-file $fname r fp {
        set line [gets $fp]
    }

This scheme hides implementation details and therefore allows the file handling to be modified at run time (by adjusting with-open-file). --mfi

----

'''Reading and writing binary data and unusual encodings'''

Anyone have the know-how to add to this page to cover reading and writing binary data, and data from various languages (i.e. interacting with the encoding system)?

RS: If the encoding is known, just add the command

    fconfigure $fp -encoding $enc

between the [[open]] and the first read access. From then on, it's as transparent as reading pure ASCII. See also [Unicode file reader], [A little Unicode editor].

----

Often new programmers stop by comp.lang.tcl and ask "how can I replace the information in just one line of a file with Tcl?" Tcl uses standard I/O for input and output to text and binary files (as do many other applications), and Tcl itself does not provide a simple method for doing this. Using Tcl's low-level file input/output, one can do a very crude version of it, making use of [open], [seek], [puts], and [close]. This allows you to replace a string of N bytes with another string of the same number of bytes. If the OS being used has other means of accessing and updating files, the developer will likely need to find an OS-specific solution. There are also a number of extensions for interacting with relational and other types of databases (such as Oracle, Sybase, etc.).
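----

The crude in-place replacement mentioned above, as a minimal sketch. The filename, byte offset, and replacement text here are made up for illustration; the one hard rule is that the new string must occupy exactly as many bytes as the text it overwrites:

    # Overwrite N bytes at a known byte offset with N new bytes
    set fp [open "test.txt" r+]         ;# open for both reading and writing
    fconfigure $fp -translation binary  ;# work in bytes, not characters/lines
    seek $fp 10 start                   ;# move to byte offset 10
    puts -nonewline $fp "same-length"   ;# must be the same byte length as the old text
    close $fp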