Version 15 of Text processing tips

Updated 2008-07-18 15:45:48 by AK

Arjen Markus (9 March 2005) I created this page to collect small code fragments about the various tricks you can use to process text. Most of the time these tricks do not merit a page of their own, but they do merit a place on the Wiki :).


AM I had a problem with files that contain "continued lines". Here is a sketch:

    line with info \
          continued on the next line (see the backslash) \
          and the info is: Aha=BBBB
    another line with info - here the info is: Aha=CC

I needed to extract information from the complete lines. Usually I read files line by line and analyse the lines one by one, but you cannot do that with this type of layout. Or can you? Here is my little trick:

   set contents [read $infile]
   set contents [string map [list "\\\n" " "] $contents]
   foreach line [split $contents \n] {
      # ... process the line ...
   }

This little fragment of code:

  • reads the complete file (in my case they were not very large)
  • replaces each trailing backslash (and the newline that follows it) with a single space
  • splits the contents into separate lines again

No need to check if the line is complete or not - just use a few commands.
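As a minimal sketch of the trick above, here it is applied to an in-memory string instead of a file. The proc name joinContinuedLines and the regular expression used to pull out the "Aha=" value are assumptions made for illustration; they are not part of the original fragment.

```tcl
# Hypothetical helper: join backslash-continued lines, then split into logical lines
proc joinContinuedLines {contents} {
    # A backslash immediately followed by a newline becomes a single space
    set contents [string map [list "\\\n" " "] $contents]
    return [split $contents \n]
}

# Sample data modelled on the sketch above (two logical lines, three physical ones joined)
set sample "line with info \\\n      continued on the next line \\\n      and the info is: Aha=BBBB\nanother line with info - here the info is: Aha=CC"

set lines [joinContinuedLines $sample]
set values {}
foreach line $lines {
    # Collect whatever follows "Aha=" on each logical line
    if {[regexp {Aha=(\S+)} $line -> value]} {
        lappend values $value
    }
}
puts $values
```

Note that only a backslash directly before the newline is treated as a continuation; a backslash followed by trailing blanks is left alone.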


CLN When you get the contents of a text widget, you get an extra trailing newline. If you read the contents of a file, insert it in the widget, and just save those contents, you'll add a blank line at the end of the file for each save. The solution is to save one less character than [$text get 1.0 end] returns, something like [puts $fid [string range [$text get 1.0 end] 0 end-1]].

ECS: Why not this: [puts $fid [$text get 1.0 end-1c]]?

CLN Oops. I guess that'll work, too (though I haven't verified either).

rdt Well, isn't this because your puts is adding the newline? So wouldn't 'puts -nonewline $text' just do the job?

CLN No good deed goes unpunished! ;-) In my too-quick example, both the get from the text widget and the [puts] added newlines. Try this:

  % text .t
  .t
  % pack .t
  % .t insert end "Foo"
  % string length [.t get 1.0 end]
  4

So, I guess to write out only what you see in the text widget, you'd have to do:

  puts -nonewline [.t get 1.0 end-1c]

RS: Note that text files not ending in a newline are considered ill-behaved, e.g. by diff...
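One way to reconcile the two remarks above is to write exactly what the widget shows and add a final newline only when it is missing. The following is a sketch under that assumption; the proc name textForSaving is hypothetical, and the string passed in is assumed to come from [$text get 1.0 end-1c].

```tcl
# Hypothetical helper: make sure non-empty text ends in exactly the newline it needs.
# The argument is assumed to be the result of [$text get 1.0 end-1c].
proc textForSaving {contents} {
    if {$contents ne "" && [string index $contents end] ne "\n"} {
        append contents \n
    }
    return $contents
}

# Usage with a text widget .t and an open channel $fid (both assumed to exist):
#   puts -nonewline $fid [textForSaving [.t get 1.0 end-1c]]
```

Because the helper leaves text that already ends in a newline untouched, repeated load and save cycles should not keep adding blank lines.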


schlenk If a data file or text is already quite similar to a Tcl program, one can sometimes easily map it to a Tcl program and just execute it. One example of this:

A plotter data file like this:

 ;PU 640, 6900
 ;PD   640,  6909
 ,   640,  6913
 ,   640,  6917
 ,   640,  6921
 ,   640,  6924
 ;PU   641,  6928
 ;PD   641,  6932
 ,   642,  6936
 ,   643,  6940
 ,   644,  6944
 ,   645,  6947
 ,   646,  6951

This already looks quite similar to a Tcl program; we just need to reformat it a little bit. This does the trick:

 set data [string map {  \n "" ; \n  , "" } $data]
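To see what the mapping produces, here is a small sketch on a shortened sample (the sample string is an assumption, cut down from the plotter data above). The braced argument is parsed as a list by string map, so \n inside it stands for a real newline: newlines are removed, each semicolon starts a new line, and commas are dropped.

```tcl
# Shortened sample of the plotter data (assumed to have no leading blanks)
set data ";PU 640, 6900\n;PD 640, 6909\n, 640, 6913\n;PU 641, 6928\n"

# Same mapping as above: newline -> "", ";" -> newline, "," -> ""
set script [string map { \n "" ; \n , "" } $data]

# Each pen command now sits on its own line with all of its coordinates
puts [string trim $script]
```

The continuation lines that start with a comma end up as extra arguments of the preceding PD command, which is why PD and PU take a variable number of arguments.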

Now we have to set up a nice evaluation environment, so we do not get surprised:

 proc dummy_unknown {args} {return}
 proc PD {args} { 
     foreach {x y} $args { puts "PenDown ( $x , $y )" }
 }
 proc PU {args} {
     foreach {x y} $args { puts "PenUp ( $x , $y )" }
 }
 set i [interp create -safe ]
 $i eval {namespace delete ::}
 interp alias $i unknown {} dummy_unknown
 interp alias $i PD {} PD
 interp alias $i PU {} PU

Finally, just evaluate our little program:

 interp eval $i $data

Splitting or processing a text file as a list of lines:

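    set lines  [split [read -nonewline $fd] \n]

so the number of lines in the file is [llength $lines], the n'th line (counting from 1) is [lindex $lines [expr {$n - 1}]], and so forth. The -nonewline option drops the final newline of the file; without it, the list ends with an extra empty element.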
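Two details are easy to trip over here: list indices start at 0, and text that ends in a newline splits into one extra, empty element. A minimal sketch on an in-memory string follows; the proc name splitLines is hypothetical, and it drops a single trailing newline much as [read -nonewline] does.

```tcl
# Hypothetical helper: split text into lines, ignoring one final newline
proc splitLines {contents} {
    if {[string index $contents end] eq "\n"} {
        set contents [string range $contents 0 end-1]
    }
    return [split $contents \n]
}

set contents "first\nsecond\nthird\n"

set naive [split $contents \n]      ;# four elements, the last one empty
set lines [splitLines $contents]    ;# three elements

puts "naive: [llength $naive] elements, cleaned: [llength $lines] lines"
puts "line 2 is: [lindex $lines 1]" ;# index 1 holds the second line
```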


... next trick? ...


Category Parsing Category File Category String Processing