[GPS] May 28, 2002: Diff is a common tool used in Unix-like systems. It compares two (sometimes more) files and outputs the differences. I wrote a version that only tracks content changes after seeing some code done by Master Tcl'er [KBK]. Here is my version:

----

 #!/bin/tclsh8.3
 #Thanks to Kevin Kenny for showing me an example of how to do this.
 #This is all new code by George Peter Staplin.
 #It doesn't deal with \n differences on blank lines, only actual content changes and spaces.
 #The SECTION usage reminds me of COBOL, and should help if I write a patch program.
 #Version 3

 proc diff {origList newList} {
   set result "SECTION lines-removed-from-new-file:\n"
   set origLine 1
   set count 0

   foreach orig $origList {
     set didNotMatch 1
     foreach new [lrange $newList $count end] {
       if {[string equal $orig $new]} {
         set didNotMatch 0
         incr count
         break
       }
     }
     if {$didNotMatch} {
       set formattedLine [format "%-5d" $origLine]
       append result "$formattedLine < $orig\n"
     }
     incr origLine
   }

   append result "SECTION lines-added-to-new-file:\n"
   set newLine 1
   set count 0

   foreach new $newList {
     set didNotMatch 1
     foreach orig [lrange $origList $count end] {
       if {[string equal $orig $new]} {
         set didNotMatch 0
         incr count
         break
       }
     }
     if {$didNotMatch} {
       set formattedLine [format "%-5d" $newLine]
       append result "$formattedLine > $new\n"
     }
     incr newLine
   }
   return $result
 }

 proc main {argc argv} {
   if {$argc != 2} {
     return -code error "please use 2 arguments when calling $::argv0"
   }

   set orig [lindex $argv 0]
   set new [lindex $argv 1]

   set fi [open $orig r]
   set data [read $fi]
   close $fi
   set origList [split $data \n]

   set fi [open $new r]
   set data [read $fi]
   close $fi
   set newList [split $data \n]

   puts [diff $origList $newList]
 }
 main $::argc $::argv

----

Example output:

 $ ./gps_diff_2.tcl diff1.test diff2.test
 SECTION lines-removed-from-new-file:
 11    < regsub -all {\t} $data " " newData
 12    < regsub -all {\n} $newData "\n " theData
 SECTION lines-added-to-new-file:
 11    > regsub -all {\n} $data "\n " theData
 15    > puts DONE

----
[Arjen Markus] We have faced a slightly different problem: comparing two files with special care for (floating-point) numbers. The solution was simple in design:

   * Read the files line by line (all lines should be comparable; we did not need to deal with inserted or deleted lines).
   * Split the lines into words and compare the words either as strings or as numbers.
   * By using [[string is double]] we identified whether a "word" is actually a number and, if so, we compared the two words numerically (even allowing a certain tolerance if required).

This way you are immune to numbers formatted in different ways: 0.1, +.1, 1.0E-01 and +1.00e-001 all spell the same number, and you can encounter all of these forms (sometimes you have less than perfect control over the precise format).

----
[Arjen Markus] Question: would this not be a nice addition to the fileutil module in Tcllib?

----
See also [diff in Tcl]
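----
The word-by-word numeric comparison that [Arjen Markus] describes above could look roughly like the sketch below. It is not his code: the proc names wordsMatch, linesMatch and numericDiff and the tol parameter are made up for illustration. It assumes that both files have the same number of lines, that words are separated by whitespace, and that an absolute tolerance is good enough.

```tcl
# Hypothetical helpers, not from the original post.

# Compare two words: numerically (within an absolute tolerance) when both
# are valid floating-point numbers, otherwise as plain strings.
proc wordsMatch {a b {tol 0.0}} {
    if {[string is double -strict $a] && [string is double -strict $b]} {
        # If the arithmetic itself fails (e.g. on NaN), fall through
        # to the string comparison below.
        if {![catch {expr {abs($a - $b) <= $tol}} same]} {
            return $same
        }
    }
    return [string equal $a $b]
}

# Compare two lines word by word. Words are runs of non-whitespace, so
# differences in spacing are ignored; a different word count is a mismatch.
proc linesMatch {lineA lineB {tol 0.0}} {
    set wordsA [regexp -all -inline {\S+} $lineA]
    set wordsB [regexp -all -inline {\S+} $lineB]
    if {[llength $wordsA] != [llength $wordsB]} {
        return 0
    }
    foreach a $wordsA b $wordsB {
        if {![wordsMatch $a $b $tol]} {
            return 0
        }
    }
    return 1
}

# Return the 1-based numbers of the lines that differ. The two lists are
# walked in parallel, so inserted or deleted lines are not handled.
proc numericDiff {linesA linesB {tol 0.0}} {
    set differing {}
    set n 0
    foreach a $linesA b $linesB {
        incr n
        if {![linesMatch $a $b $tol]} {
            lappend differing $n
        }
    }
    return $differing
}

# Usage: only line 2 differs; lines 1 and 3 agree within the tolerance.
set old [list "x = 0.1" "name foo" "y = 2.50"]
set new [list "x = +1.00e-001" "name bar" "y = 2.5001"]
puts [numericDiff $old $new 1e-3]
```

A relative tolerance may suit data with widely varying magnitudes better; the absolute one is used here only to keep the sketch short.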