**Slurp! -- reading a file**

The Tcl file commands are [file], [open], [close], [gets], [read], [puts], [seek], [tell], [eof], [fblocked], [fconfigure], [flush], [fileevent], and [filename]; see also Tcl_StandardChannels(3).

One way to get file data into Tcl is to 'slurp' the whole file up into a text variable. This works really well if the files are known to be small.

======
# Slurp up the data file
set fp [open "somefile" r]
set file_data [read $fp]
close $fp
======

Now you can [split] file_data into lines, and process it to your heart's content.

NOTE: The mention of ''split'' is important here - input data is seldom well-behaved/structured, and needs to be processed in this way to ensure that any potential Tcl metacharacters are appropriately quoted into list format.

======
# Process data file
set data [split $file_data "\n"]
foreach line $data {
    # do some line processing here
}
======

'''However''', the split ''transforms'' the data, so that one may have lost some of the original contents of the line. Anyone have techniques for ensuring that the data remains '''exactly''' the same (other than appropriate quoting)?

[NEM] [split] doesn't lose any data (except the newlines). All of the data in the file is still completely present in the list that split returns. In particular, [[join [[split $data \n]] \n]] will result in the same file contents.

[LV] NEM, perhaps the writer was thinking about this scenario.

======
$ cat /tmp/testdata.txt
This is a test.
How in the world will this work?
If I have a [ or I have a ]
and other such things, will they remain?
And what about   extra spaces
or even a	tab?
$ tclsh8.6
% set fd [open "/tmp/testdata.txt" "r"]
file4
% set a [read $fd]
This is a test.
How in the world will this work?
If I have a [ or I have a ]
and other such things, will they remain?
And what about   extra spaces
or even a	tab?
% set b [split $a]
This is a test. How in the world will this work? If I have a {[} or I have a \] and other such things, will they remain? And what about {} {} extra spaces or even a tab? {} {}
% set c [join $b]
This is a test. How in the world will this work? If I have a [ or I have a ] and other such things, will they remain? And what about extra spaces or even a tab?
% puts $c
This is a test. How in the world will this work? If I have a [ or I have a ] and other such things, will they remain? And what about extra spaces or even a tab?
% set fo [open "/tmp/testo.txt" "w"]
file5
% puts $fo $c
% close $fo
% exit
srv20 (178) $ cmp /tmp/testdata.txt /tmp/testo.txt
/tmp/testdata.txt /tmp/testo.txt differ: char 16, line 1
======

Most of the newlines in the original file are gone, as is the tab that was in /tmp/testdata.txt right before the word tab.

[LV] I'm not certain, but I think the original poster of that question is asking ''how would one read through file_data (in this example) and process each line of input, without the use of split?'' One could use regular expressions, I guess, to work through the data, obtaining lines to process. Any other techniques that would be useful in this scenario?

RWT: It even works well for large files, but there is one trick that might be necessary: determine the size of the file first, then [read] that many bytes. This allows the channel code to optimize buffer handling (preallocating a buffer of the correct size). I don't know anymore who posted this first, but you only need this for Tcl 8.0. This is something for the [Tcl Performance] page as well.
======
# Slurp up the data file, optimized buffer handling
# Only needed for Tcl 8.0
set fsize [file size "somefile"]
set fp [open "somefile" r]
set data [read $fp $fsize]
close $fp
======

[dizzy] [TclX] has a read_file command:

======
set data [read_file -nonewline $filename]
set bytes [read_file $filename $numbytes]
======

Also under [Unix], if you are not concerned about performance, you can do: set data [exec cat $filename] -- [AK]

[LV] AK, what would be the advantage of using [exec] and cat to read in the file in this manner? Just curious.

For the simple task of reading a whole file and splitting it into a list, there is a [critcl] version over at [loadf].

The comment above stating that you must use split to make sure you can deal with the contents is only true if you want to use list commands (such as [foreach] in the example). If you always treat the data as a string, then you don't have to worry about unbalanced braces and such. One way to avoid split, if you are just searching for something on each line, is to use the -line option of regexp, which forces matching line by line, combine it with the -inline switch to return the matches as a list instead of placing them in variables, and iterate over that list, e.g.

======
foreach {fullmatch submatch1 submatch2} [regexp -line -inline $exp $str] {
    # process matches here - don't have to worry about skipping lines
    # because only matches make it here.
}
======

This assumes that exp contains an expression with 2 parenthesized subsections.

[LV] With regards to the comment about exp containing an expression - do you mean like: set exp {(a).*(b)} - and also, should that $str be $data (if set in the same universe as the read example above)?

[BBH] ''glennj'' - Note that you must use the ''-all'' switch, otherwise the foreach loop will only execute once (on the first match):

======
foreach {line sub1 sub2} [regexp -all -line -inline $exp $str] {...}
======

[LV] Same here, right - exp as I mention above, and str should be data? What if what I am looking for is a single regular expression, or even a constant string?

If you want to receive the input from the file line by line, without having to split it and worry about losing brackets, one way is to use fconfigure and then read the data line by line, e.g.

======
# read the file one line at a time
set fp [open "somefile" r]
fconfigure $fp -buffering line
gets $fp data
while {$data != ""} {
    puts $data
    gets $fp data
}
close $fp
======

This is also very useful for command-response type network sockets (POP, SMTP, etc.) mailto:douglas@networkhackers.com

[ZB] 2009-11-18 I've got a feeling the above example has a serious flaw: it'll stop reading at the first "empty" line in the file. My proposal would rather be:

======
# read the file one line at a time
set fp [open "somefile" r]
while { [gets $fp data] >= 0 } {
    puts $data
}
close $fp
======

I'm not sure whether "fconfigure $fp -buffering line" is really necessary.

Back to the original topic of reading in data: it just astonished [CL] to search the Wiki for what used to be the most common input idioms, and not find them '''at all'''. Before memory seemed so inexpensive, input was commonly done as

======
set fp [open $some_file]
while {-1 != [gets $fp line]} {
    puts "The current line is '$line'."
}
======

Newcomers often try to write this with [eof], and generally confuse themselves in the process [http://phaseit.net/claird/comp.lang.tcl/fmm.html#eof]. It calls for

======
set fp [open $some_file]
while 1 {
    set line [gets $fp line]
    if [eof $fp] break
    puts "The current line is '$line'."
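    # NB: when gets is given a variable name it returns the line length,
    # so $line above ends up holding a number - see the correction below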
}
======

or equivalent.

[smh] Minor fix, CL - in your 2nd example the line ''set line [gets $fp line]'' should read either ''set linelength [gets $fp line]'' or simply ''gets $fp line'', due to the syntax of [gets], which when passed a 2nd argument reads the line into the named variable and returns the line length.

----

**Writing a file**

I just noticed that there isn't an example of writing a file, so here goes:

'''note:''' When you are writing into a file, its contents will not be visible until the end of the execution of the Tcl script (unless the channel is flushed, as noted below). So if you are going to use "tail -f" midway through to check it out, I am sorry to say you will find that the size of the file is 0. Its contents are visible only after the script has finished executing.

[LV] Of course, if there is a need to see the file while it is open, just be certain your code invokes [flush], which will force the output to the file instead of waiting until there is a large enough chunk of data to force out.

======
# create some data
set data "This is some test data.\n"

# pick a filename - if you don't include a path,
# it will be saved in the current directory
set filename "test.txt"

# open the filename for writing
set fileId [open $filename "w"]

# send the data to the file -
# failure to add '-nonewline' will result in an extra newline
# at the end of the file
puts -nonewline $fileId $data

# close the file, ensuring the data is written out before you continue
# with processing.
close $fileId
======

A simple tk example using the text widget is at [Text Widget Example]. ''so'' 4/21/01

----

But a file without a newline at the end of its last line is not considered too well-behaved. Since in this example a plain string is [puts]'ed, I'd advocate a newline here. It's different when you save a text widget's content, where a newline after the last line is always guaranteed; there the ''-nonewline'' switch is well in place. ''[RS]''

[LV] RS, in the example above, $data has a newline in it, so it should be ''all good''.

(About the extra newline thing... under Unix (possibly POSIX), all text files are supposed to end in a blank line, hence the "extra" newline. This is the proper behavior, even if it isn't technically required for most things these days. Unix text editors and tools still enforce it, however. Should it be considered a bug if Tcl behaves this way under Win32? --CJU)

[LV] This comment puzzles me. I have never seen a requirement for text files to end in blank lines. I have seen a few broken programs which generated an error if the last line in a file didn't end in a newline - but that doesn't create a blank line. And I don't understand the last question - if Tcl behaves which way under Win32?

----

In situations where output is line buffered (the default for text), `puts -nonewline` does not immediately deliver the output. One solution, if this is a problem, is to add `[flush] $fileId`. An alternative is to [[fconfigure $fileId -buffering none]] to force automatic flushing whenever data is written. [Andreas Kupries].
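To make both options concrete, here is a minimal sketch of the two approaches just described (the file name "example.log" is only a placeholder, not taken from the examples above):

======
set fileId [open "example.log" w]

# Option 1: keep the default buffering and flush explicitly when needed
puts -nonewline $fileId "partial output..."
flush $fileId                       ;# the data is now visible to e.g. tail -f

# Option 2: turn buffering off so every write is delivered immediately
fconfigure $fileId -buffering none
puts $fileId "more output"

close $fileId                       ;# close also flushes any pending output
======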
----

**Using a [Lisp] pattern for reading and writing**

I've found it quite useful to use this pattern to work with files:

======
proc with-open-file {fname mode fp block} {
    upvar 1 $fp fpvar
    set binarymode 0
    if {[string equal [string index $mode end] b]} {
        set mode [string range $mode 0 end-1]
        set binarymode 1
    }
    set fpvar [open $fname $mode]
    if {$binarymode} {
        fconfigure $fpvar -translation binary
    }
    uplevel 1 $block
    close $fpvar
}
======

Usage is like this:

======
with-open-file $fname w fp {
    puts -nonewline $fp $data
    puts $fp "more data"
}

with-open-file $fname r fp {
    set line [gets $fp]
}
======

This scheme hides implementation details and therefore allows file handling to be modified at run time (by adjusting with-open-file). More about this on the [Emulating closures in Tcl] page. --[mfi].

----

**Reading and writing binary data and unusual encodings**

Anyone have the know-how to add to this page to cover reading and writing binary data and data from various languages (i.e. interacting with the encoding system)?

[RS]: If the encoding is known, just add the command

======
fconfigure $fp -encoding $enc
======

between the [[open]] and the first read access. From then on, it's transparent like reading pure ASCII. See also [Unicode file reader], [A little Unicode editor].

[AK]: Regarding the other part of the question, see [Working with binary data].

----

Often new programmers stop by comp.lang.tcl and ask "how can I replace the information in just one line of a file with Tcl". Tcl uses something called standard I/O for input and output to text and standard binary files (as do many other applications), and Tcl itself does not provide a simple method for doing this. Using Tcl's low-level file input/output, one can do a very crude version of this, making use of [open], [seek], [puts], and [close]. This would allow you to replace a string of N bytes with another string of the same number of bytes. If the OS being used has other means of accessing and updating files, the developer will likely need to find an OS-specific solution. There are also a number of extensions for interacting with relational and other types of databases (such as Oracle, Sybase, etc.).

----

I always worried about losing data, having to filter out or replace all command characters before using one of Tcl's list commands (e.g. lindex, lrange, lsearch), but I also never wanted to lose the convenience they offer. So for my IRC bot, written in tclsh, I wrote a proc to use in place of the lindex command. It will not give exactly the same output: it may also return the " " (space) that is part of the requested element. By adding ! to the number argument it also returns the space after the requested element. I'm not sure whether this also works for other types of data; I have only tested it with SMTP, IRC, and XML/RSS data, and it works fine. This is my proc:

======
proc xindex { data num } {
    set index [set 0 [set 1 [set 2 [set 3 0]]]]
    if {[string index $num 0] == "!"} {
        set 1 !
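        # a leading "!" flag was found: remember it in variable 1, then strip it from num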
        set num [string range $num 1 end]
    }
    while {$0 <= [string length $data] && $index != $num} {
        set 3 [string index $data $0]
        if {[string index $data [expr {$0 + 1}]] == " " && $3 != " "} {
            incr index
        }
        set 2 [incr 0]
    }
    if {$num != 0} {
        set data [string range $data 1 end]
    }
    while {[string index $data $2] == " "} {
        incr 2
    }
    while {[string index $data $2] != " " && $2 <= [string length $data]} {
        incr 2
    }
    if {[string index $data $2] == " " && $1 != "!"} {
        set 2 [expr {$2 - 1}]
    }
    return [string range $data $0 $2]
}
======

I hope someone may benefit from it. It still works fine for me, but I have never used it on types of data other than those I mentioned. - Obzolete

----

[overloading file handling functions]

<> Tutorial | File