Version 8 of diff

Updated 2002-05-29 10:28:38

GPS May 28, 2002: diff is a common tool on Unix-like systems. It compares two (sometimes more) files and outputs the differences. After seeing some code by Master Tcl'er KBK, I wrote a version that only tracks content changes: it reports the lines that appear in one file but have no identical counterpart in the other.

Here is my version:


  #!/bin/tclsh8.3

  #Thanks to Kevin Kenny for showing me an example of how to do this.
  #This is all new code by George Peter Staplin.
  #It doesn't deal with \n differences on blank lines, only actual content changes and spaces.
  #Version 4

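  # Return a report of every line in origList that has no identical,
  # not-yet-matched line in newList. Each reported line is prefixed with
  # its line number in origList and the marker given in char.
  proc diff:compareLists {origList newList char} {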
    set origLine 1
    set result ""

    foreach orig $origList {
      set didNotMatch 1
      set count 0

      foreach new $newList {
        if {[string equal $orig $new]} {
          set newList [lreplace $newList $count $count]
          set didNotMatch 0
          break
        }
        incr count
      }

      if {$didNotMatch} {
        set formattedLine [format "%-5d" $origLine]
        append result "$formattedLine $char $orig\n"
      }
      incr origLine
    }

    return $result 
  }

  proc diff {origList newList} {
    set result "SECTION lines-removed-from-new-file:\n" 

    append result [diff:compareLists $origList $newList "<"]

    append result "SECTION lines-added-to-new-file:\n"

    # Run diff:compareLists with the lists swapped to find the lines
    # that exist only in the new file.
    append result [diff:compareLists $newList $origList ">"]

    return $result
  }

  proc main {argc argv} {

    if {$argc != 2} {
      return -code error "please use 2 arguments when calling $::argv0"
    }

    set orig [lindex $argv 0]
    set new [lindex $argv 1]

    set fi [open $orig r]
    set data [read $fi]
    close $fi

    set origList [split $data \n]

    set fi [open $new r]
    set data [read $fi]
    close $fi

    set newList [split $data \n]

    puts [diff $origList $newList]
  }
  main $::argc $::argv

Example output:

 $ ./gps_diff_2.tcl diff1.test diff2.test        
 SECTION lines-removed-from-new-file:
 11    <         regsub -all {\t} $data "  " newData
 12    <         regsub -all {\n} $newData "\n  " theData
 SECTION lines-added-to-new-file:
 11    >         regsub -all {\n} $data "\n  " theData
 15    >         puts DONE
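
Because each line is matched against any identical line in the other file, the comparison ignores line order: a line that merely moves is not reported. The following is a minimal sketch of that matching step on in-memory lists, using lsearch -exact in place of the nested loop. The proc name unmatchedLines is made up for this illustration and is not part of the script above.

```tcl
# Hypothetical helper (not part of the script above): return the
# {lineNumber line} pairs from listA that have no identical,
# not-yet-used partner in listB.
proc unmatchedLines {listA listB} {
    set result {}
    set lineNo 1
    foreach line $listA {
        set idx [lsearch -exact $listB $line]
        if {$idx >= 0} {
            # Consume the match so that duplicate lines are paired one-to-one.
            set listB [lreplace $listB $idx $idx]
        } else {
            lappend result [list $lineNo $line]
        }
        incr lineNo
    }
    return $result
}

set old {a b c d}
set new {d a x b}
puts "removed: [unmatchedLines $old $new]"
puts "added:   [unmatchedLines $new $old]"
```

Here d has moved from the end to the front, yet only c (removed) and x (added) are reported, each with its line number in the list it came from.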

Arjen Markus: We have faced a slightly different problem, namely two files that had to be compared with special care for (floating-point) numbers. The solution was simple in design:

  • Read the files line by line (all lines should be comparable; we did not need to deal with inserted or deleted lines).
  • Split the lines into words and compare the words either as strings or as numbers.
  • Use [string is double] to identify whether a "word" is actually a number and, if so, compare the words numerically (allowing a certain tolerance if required).

This way you are immune to numbers formatted in different ways: 0.1, +.1, 1.0E-01 and +1.00e-001 all spell the same number, and you can encounter all of these forms (sometimes you have less than perfect control over the precise format).
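
The steps above might be sketched as follows. This is not the original code, only an illustration under a few assumptions: the proc names wordsEqual and linesEqual are invented here, words are taken to be separated by whitespace, and the tolerance is treated as an absolute difference.

```tcl
# Hypothetical helper: compare two words numerically when both look like
# numbers, otherwise as plain strings. tolerance is an assumed absolute
# tolerance; the default of 0.0 demands numerically identical values.
proc wordsEqual {a b {tolerance 0.0}} {
    if {[string is double -strict $a] && [string is double -strict $b]} {
        return [expr {abs($a - $b) <= $tolerance}]
    }
    return [string equal $a $b]
}

# Hypothetical helper: compare two lines word by word.
proc linesEqual {lineA lineB {tolerance 0.0}} {
    # Treat each run of non-whitespace characters as one word.
    set wordsA [regexp -all -inline {\S+} $lineA]
    set wordsB [regexp -all -inline {\S+} $lineB]
    if {[llength $wordsA] != [llength $wordsB]} {
        return 0
    }
    foreach a $wordsA b $wordsB {
        if {![wordsEqual $a $b $tolerance]} {
            return 0
        }
    }
    return 1
}

puts [linesEqual "x = 0.1 m" "x = 1.0E-01 m"]
puts [linesEqual "x = 0.1 m" "x = 0.1001 m" 1.0e-3]
puts [linesEqual "x = 0.1 m" "x = 0.2 m"]
```

The first two comparisons report the lines as equal (1), the third as different (0). A relative tolerance, or special handling of words such as 010 that Tcl may read as octal, could be added in wordsEqual if the data calls for it.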


Arjen Markus: Question: would this not be a nice addition to the fileutil module in Tcllib?

GPS: Maybe it would...


See also diff in Tcl