Text processing tips

See Also

String Processing
text processing
Parsing

Continued Lines

AM: I had a problem with files that contain "continued lines". Here is a sketch:

line with info \
    continued on the next line (see the backslash) \
    and the info is: Aha=BBBB
another line with info - here the info is: Aha=CC

I needed to extract information from the complete lines. Usually I read files line by line and analyse the lines one at a time. You cannot do that with this type of layout. Or can you? Here is my little trick:

set contents [read $infile]
set contents [string map [list "\\\n" " "] $contents]
foreach line [split $contents \n] {
    # ... process the line ...
}

This little fragment of code:

  • reads the complete file (in my case they were not very large)
  • replaces each trailing backslash (and the newline that follows it) with a single space
  • splits the contents into separate lines again

No need to check if the line is complete or not - just use a few commands.
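
As a self-contained illustration, the same trick can be wrapped in a small proc and tried on an in-memory string instead of a file. The proc name joinContinuedLines and the sample text are made up for this sketch, and it assumes a backslash marks a continuation only when it is immediately followed by the newline.

```tcl
# Hypothetical helper: join backslash-continued lines and return the
# complete lines as a list. It works on a string, so no file is needed.
proc joinContinuedLines {contents} {
    # A backslash directly followed by a newline becomes a single space
    set contents [string map [list "\\\n" " "] $contents]
    return [split $contents \n]
}

# Sample text (made up); "\\" inside double quotes is one literal backslash
set sample [join [list \
    "line with info \\" \
    "    and the info is: Aha=BBBB" \
    "another line with info: Aha=CC"] \n]

foreach line [joinContinuedLines $sample] {
    # Pull out the value that follows "Aha="
    if {[regexp {Aha=(\w+)} $line -> value]} {
        puts $value
    }
}
```

With this sample the loop sees two complete lines and prints BBBB and CC.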

Trailing Newline

CLN: When you get the contents of a text widget, you get an extra trailing newline. If you read the contents of a file, insert them into the widget, and save those contents unchanged, you'll add a blank line to the end of the file with each save. The solution is to save one character fewer than [$text get 1.0 end] returns, something like [puts $fid [string range [$text get 1.0 end] 0 end-1]].

ECS: Why not this: [puts $fid [$text get 1.0 end-1c]]?

CLN: Oops. I guess that'll work, too (though I haven't verified either).

rdt: Well, isn't this because your puts is adding the newline? So wouldn't 'puts -nonewline $fid [$text get 1.0 end]' just do the job?

CLN: No good deed goes unpunished! ;-) In my too-quick example, both the text widget's get and the puts added a newline. Try this:

% text .t
.t
% pack .t
% .t insert end "Foo"
% string length [.t get 1.0 end]
4

So, I guess to write out only what you see in the text widget, you'd have to do:

puts -nonewline [.t get 1.0 end-1c]

RS: Note that text files not ending in a newline are considered ill-behaved, e.g. by diff...
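
One way to reconcile these points might be to let the saved text end in exactly one newline, however many the widget content ends with. This is only a sketch: the proc name withSingleNewline is hypothetical, and it works on a plain string so it can be tried without Tk.

```tcl
# Hypothetical helper: return the text ending in exactly one newline.
proc withSingleNewline {text} {
    # Drop every trailing newline, then add a single one back
    return "[string trimright $text \n]\n"
}

# Possible use with a text widget .t and an open channel $fid:
#   puts -nonewline $fid [withSingleNewline [.t get 1.0 end]]
```

Note the design choice: blank lines that the user deliberately left at the end of the text are dropped as well, and an empty widget is saved as a single newline.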

Evaluate a Data Format

schlenk: If a data file or text is already quite similar to a Tcl program, one can sometimes map it to a Tcl program and simply execute it. One example:

A plotter data file like this:

;PU 640, 6900
;PD   640,  6909
,   640,  6913
,   640,  6917
,   640,  6921
,   640,  6924
;PU   641,  6928
;PD   641,  6932
,   642,  6936
,   643,  6940
,   644,  6944
,   645,  6947
,   646,  6951

already looks quite similar to a Tcl program. We just need to reformat it a little, and this does the trick:

set data [string map {\n {} ; \n  , {}} $data]
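
To see what the mapping does, it can help to run it on a short sample. In the braced map list, \n stands for a newline: newlines are removed, each semicolon becomes a newline, and commas are removed. The sample below is an excerpt of the data file above; the variable names sample and program are only for illustration.

```tcl
# Three lines taken from the plotter data above
set sample ";PU 640, 6900\n;PD   640,  6909\n,   640,  6913\n"

# newline -> nothing, semicolon -> newline, comma -> nothing
set program [string map {\n {} ; \n , {}} $sample]
puts $program
```

Each PU or PD group ends up on a line of its own, with the coordinates of the continuation lines appended as further arguments to the preceding command.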

Now we have to set up a nice evaluation environment, so we do not get any surprises:

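# Silently ignore any command in the data that we do not handle
proc dummy_unknown {args} {return}
# Handlers for the plotter commands: report each coordinate pair
proc PD args {
    foreach {x y} $args { puts "PenDown ( $x , $y )" }
}
proc PU args {
    foreach {x y} $args { puts "PenUp ( $x , $y )" }
}
# Create a safe interpreter and empty its global namespace
set i [interp create -safe]
$i eval {namespace delete ::}
# Expose only the commands the data is allowed to call
interp alias $i unknown {} dummy_unknown
interp alias $i PD {} PD
interp alias $i PU {} PU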

And finally, just evaluate our little program:

interp eval $i $data

Split Text into Lines

Splitting or processing a text file as a list of lines:

set lines [split [read $fd] \n]

So the number of lines in the file is [llength $lines], the n'th line (counting from 0) is [lindex $lines $n], and so forth. If the file ends with a newline, the list has one extra, empty element at the end.
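
A small sketch of how a trailing newline shows up in the resulting list, using an in-memory string in place of [read $fd] (the sample text is made up). Reading with [read -nonewline $fd] would be another way to drop a single trailing newline.

```tcl
# Made-up sample standing in for the contents of a file
set data "alpha\nbeta\ngamma\n"

set lines [split $data \n]
puts [llength $lines]        ;# 4: the last element is an empty string

# Remove trailing newlines before splitting to keep only the real lines
set lines [split [string trimright $data \n] \n]
puts [llength $lines]        ;# 3
puts [lindex $lines 0]       ;# alpha (list indices start at 0)
```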