Version 17 of Text processing tips

Updated 2012-11-13 05:39:31 by pooryorick

Continued Lines

AM I had a problem with files that contain "continued lines", where a trailing backslash means the line continues on the next one. Here is a sketch:

line with info \
      continued on the next line (see the backslash) \
      and the info is: Aha=BBBB
another line with info - here the info is: Aha=CC

I needed to extract information from the complete lines. Usually I read files line by line and analyse the lines one by one, but you cannot do that with this type of layout. Or can you? Here is my little trick:

set contents [read $infile]
# Join continued lines: backslash + newline becomes a single space
set contents [string map [list "\\\n" " "] $contents]
foreach line [split $contents \n] {
    # ... process the line ...
}

This little fragment of code:

  • reads the complete file (in my case the files were not very large)
  • replaces each trailing backslash (and the newline after it) with a single space
  • splits the contents into separate lines again

There is no need to check whether a line is complete or not; a few commands do the job.
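AM's fragment can be wrapped up as a small reusable proc. The following is a minimal sketch; the proc name `joinContinuedLines` and the sample string are made up for illustration. As a variation on the original, it uses `regsub` rather than `string map`, so that the indentation at the start of each continuation line is swallowed as well.

```tcl
# Hypothetical helper: join "continued lines" (backslash + newline)
# and return the list of logical lines.
proc joinContinuedLines {text} {
    # Replace backslash, newline and any blanks that start the next
    # physical line with a single space
    regsub -all {\\\n[ \t]*} $text " " text
    return [split $text \n]
}

# Sample input modelled on the sketch above
set sample "line with info \\\n      continued: Aha=BBBB\nanother line: Aha=CC"
foreach line [joinContinuedLines $sample] {
    # Pull the value after "Aha=" out of each complete line
    if {[regexp {Aha=(\S+)} $line -> info]} {
        puts "info: $info"
    }
}
```

Like the original trick, this also joins a line that ends in an escaped backslash (`\\`); if such lines can occur in your files, a more careful parser is needed.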

Trailing Newline

CLN When you get the contents of a text widget, you get an extra trailing newline. If you read the contents of a file, insert them into the widget, and just save those contents, you'll add a blank line at the end of the file on each save. The solution is to save one character fewer than [$text get 1.0 end] returns, something like [puts $fid [string range [$text get 1.0 end] 0 end-1]].

ECS: Why not this: [puts $fid [$text get 1.0 end-1c]]?

CLN Oops. I guess that'll work, too (though I haven't verified either).

rdt Well, isn't this because your puts is adding the newline? So wouldn't 'puts -nonewline $text' just do the job?

CLN No good deed goes unpunished! ;-) In my too-quick example, both the get from the text widget and the [puts] added newlines. Try this:

% text .t
.t
% pack .t
% .t insert end "Foo"
% string length [.t get 1.0 end]
4

So, I guess to write out only what you see in the text widget, you'd have to do:

puts -nonewline $fid [.t get 1.0 end-1c]

RS: Note that text files not ending in a newline are considered ill-behaved by some tools; diff, for example, flags them with "No newline at end of file".
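Following RS's remark, one option is to normalize the ending when saving, so the file always ends in exactly one newline no matter how many the widget hands back. A minimal sketch, assuming Tcl 8.6 or later (for `file tempfile`); the proc name `saveWithOneNewline` is hypothetical, and the usage part simulates the widget contents with a plain string so it runs without Tk.

```tcl
# Hypothetical helper: write content to path so that the file ends
# with exactly one newline, whatever the content ended with.
proc saveWithOneNewline {path content} {
    set fid [open $path w]
    # Strip all trailing newlines, then let [puts] append exactly one
    puts $fid [string trimright $content \n]
    close $fid
}

# Usage without Tk: "Foo\n" is what [.t get 1.0 end] returns for "Foo"
set fid [file tempfile path]
close $fid
saveWithOneNewline $path "Foo\n"

# Read the file back to see what was actually written
set fid [open $path r]
set saved [read $fid]
close $fid
file delete $path
puts [string length $saved]   ;# 4: "Foo" plus one newline
```

In a Tk application the call would look like `saveWithOneNewline $filename [$text get 1.0 end]`, where `$filename` is again a placeholder. The trade-off: `string trimright` also removes blank lines deliberately left at the end of the text, whereas `puts -nonewline $fid [$text get 1.0 end-1c]` writes exactly what the widget shows.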

Evaluate a Data Format

schlenk If a data file or text is already quite similar to a Tcl program, one can sometimes easily map it to a Tcl program and just execute it. One example of this:

A plotter data file like this:

 ;PU 640, 6900
 ;PD   640,  6909
 ,   640,  6913
 ,   640,  6917
 ,   640,  6921
 ,   640,  6924
 ;PU   641,  6928
 ;PD   641,  6932
 ,   642,  6936
 ,   643,  6940
 ,   644,  6944
 ,   645,  6947
 ,   646,  6951

This already looks quite similar to a Tcl program; we just need to reformat it a little. This does the trick:

 # Join all lines, start a new command at each ";", and drop the commas
 set data [string map {\n "" ; \n , ""} $data]

Now we have to set up a safe evaluation environment, so that the data cannot do anything surprising:

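# Commands we do not know about are silently ignored
proc dummy_unknown {args} {return}
# Pen down: report each (x, y) pair in turn
proc PD {args} {
    foreach {x y} $args { puts "PenDown ( $x , $y )" }
}
# Pen up: report each (x, y) pair in turn
proc PU {args} {
    foreach {x y} $args { puts "PenUp ( $x , $y )" }
}
# A safe interpreter, emptied of all its commands...
set i [interp create -safe]
$i eval {namespace delete ::}
# ...that only knows about unknown, PD and PU
interp alias $i unknown {} dummy_unknown
interp alias $i PD {} PD
interp alias $i PU {} PU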

Finally, just evaluate our little program:

interp eval $i $data
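Putting the pieces together, here is a self-contained sketch that runs with a plain tclsh. The sample is a shortened copy of the plotter data above. As a variation on the original (not part of it), PD and PU append to a global list `::moves` instead of printing, so the result can be inspected afterwards. The trick assumes the data contains no `[` or `$`, since the interpreter would still try to substitute those.

```tcl
# Sample input: a shortened copy of the plotter data shown above
set data {;PU 640, 6900
;PD   640,  6909
,   640,  6913
;PU   641,  6928
;PD   641,  6932
,   642,  6936}

# Join all lines, start a new command at each ";", drop the commas
set data [string map {\n "" ; \n , ""} $data]

# Record moves in a list instead of printing them
set ::moves {}
proc PD {args} {
    foreach {x y} $args { lappend ::moves [list down $x $y] }
}
proc PU {args} {
    foreach {x y} $args { lappend ::moves [list up $x $y] }
}
# Silently ignore any command we do not know about
proc dummy_unknown {args} {return}

# Empty safe interpreter that only knows PD, PU and unknown
set i [interp create -safe]
$i eval {namespace delete ::}
interp alias $i unknown {} dummy_unknown
interp alias $i PD {} PD
interp alias $i PU {} PU

# Run the converted data, then clean up the interpreter
interp eval $i $data
interp delete $i

foreach move $::moves { puts $move }
```

Recording into a list rather than printing keeps the drawing logic separate from the parsing, which makes it easy to feed the moves to a canvas or a test later.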

Split Text into Lines

Splitting or processing a text file as a list of lines:

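set lines [split [read -nonewline $fd] \n]

so the number of lines in the file is [llength $lines], line n (counting from 0) is [lindex $lines $n], and so forth. The -nonewline option drops the final newline; without it, a file that ends in a newline (as it should) gives an extra empty element at the end of the list.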
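The effect of the final newline can be checked with a small self-contained script, assuming Tcl 8.6 or later for `file tempfile`. It writes a three-line file and reads it back with and without `read -nonewline`; the file contents are made up for illustration.

```tcl
# Write a small, well-behaved text file (it ends with a newline)
set fd [file tempfile path]
puts $fd "alpha"
puts $fd "beta"
puts $fd "gamma"
close $fd

# Plain read: the final newline yields an empty last element
set fd [open $path r]
set withNewline [split [read $fd] \n]
close $fd

# read -nonewline: one list element per line
set fd [open $path r]
set lines [split [read -nonewline $fd] \n]
close $fd
file delete $path

puts [llength $withNewline]   ;# 4
puts [llength $lines]         ;# 3
puts [lindex $lines 1]        ;# beta (indices start at 0)
```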