Version 15 of diff

Updated 2003-06-18 07:58:18

See diff in Tcl

The code that was previously on this page was, according to its author, of poor quality and has been removed.

Arjen Markus We faced a slightly different problem: comparing two files while taking special care with (floating-point) numbers. The design of the solution was simple:

  • Read the files line by line (corresponding lines were always comparable, so we did not need to deal with inserted or deleted lines).
  • Split the lines into words and compare the words either as strings or as numbers.
  • Use [string is double] to determine whether a "word" is actually a number and, if so, compare the two words numerically (allowing a certain tolerance if required).
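The three steps above can be sketched in Tcl as follows. This is a minimal illustration, not the code Arjen Markus actually used: the procedure names compareLines and compareFiles are hypothetical, and the tolerance is assumed to be an absolute one (a relative tolerance may suit some data better). Words are extracted with a regular expression rather than [split], so that runs of blanks do not produce empty "words".

```tcl
# Hypothetical helper: compare two lines word by word.
# Words that both look like numbers ([string is double]) are compared
# numerically with an absolute tolerance; all other words as strings.
proc compareLines {line1 line2 {tolerance 0.0}} {
    set words1 [regexp -all -inline {\S+} $line1]
    set words2 [regexp -all -inline {\S+} $line2]
    if {[llength $words1] != [llength $words2]} {
        return 0
    }
    foreach w1 $words1 w2 $words2 {
        if {[string is double -strict $w1] && [string is double -strict $w2]} {
            # Numeric comparison: 0.1 and 1.0E-01 count as equal
            if {abs($w1 - $w2) > $tolerance} {
                return 0
            }
        } elseif {$w1 ne $w2} {
            # Plain string comparison for everything else
            return 0
        }
    }
    return 1
}

# Hypothetical driver: compare two files line by line (no inserted or
# deleted lines) and return the numbers of the lines that differ.
# A line present in only one of the files counts as a difference.
proc compareFiles {file1 file2 {tolerance 0.0}} {
    set f1 [open $file1]
    set f2 [open $file2]
    set differences {}
    set lineno 0
    while {1} {
        set n1 [gets $f1 line1]
        set n2 [gets $f2 line2]
        if {$n1 < 0 && $n2 < 0} {
            break
        }
        incr lineno
        if {$n1 < 0 || $n2 < 0 || ![compareLines $line1 $line2 $tolerance]} {
            lappend differences $lineno
        }
    }
    close $f1
    close $f2
    return $differences
}
```

For example, [compareLines "t 1.0" "t 1.001" 0.01] returns 1, while the same call with a tolerance of 0.0 returns 0.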

This makes the comparison immune to differences in number formatting: 0.1, +.1, 1.0E-01 and +1.00e-001 all denote the same number, and you may encounter any of these forms (sometimes you have less than perfect control over the precise output format).
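A quick way to convince yourself of this in Tcl: each of the spellings above passes [string is double], and a numeric comparison (==) treats them as equal, whereas a string comparison (eq) does not. The procedure name allSameNumber is hypothetical and only serves this illustration.

```tcl
# Hypothetical helper: return 1 if every string in the list is a number
# and all of them are numerically equal to the first one, else 0.
proc allSameNumber {strings} {
    set first [lindex $strings 0]
    foreach s $strings {
        # Short-circuit: the numeric test (!=) only runs on numbers
        if {![string is double -strict $s] || $s != $first} {
            return 0
        }
    }
    return 1
}

puts [allSameNumber {0.1 +.1 1.0E-01 +1.00e-001}]   ;# prints 1
puts [allSameNumber {0.1 0.10001}]                 ;# prints 0
# A plain string comparison sees two different words:
puts [expr {"0.1" eq "+.1"}]                       ;# prints 0
```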


Arjen Markus Question: wouldn't this be a nice addition to the fileutil module in Tcllib?

GPS Maybe it would...

Arjen Markus If so, in my opinion it would benefit from two user-supplied procedures:

  • A procedure to compare lines (for instance, one that ignores white space, or one that interprets numbers as numbers, which was my original problem)
  • A procedure to process the output (for instance, presenting the differences the way Tkdiff does)
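The two hooks proposed above could be passed in as command prefixes. The sketch below is one possible shape for such an interface, not an existing Tcllib API: diffLines, ignoreWhiteSpace and printDifference are hypothetical names, and, like the original problem, it assumes the lines correspond one to one (no insertions or deletions). The {*} expansion syntax requires Tcl 8.5 or later.

```tcl
# Hypothetical interface (not part of Tcllib's fileutil):
#   compareCmd is called as {*}$compareCmd line1 line2 and returns 1 if equal
#   reportCmd  is called as {*}$reportCmd lineno line1 line2 per difference
# Returns the number of differing lines.
proc diffLines {lines1 lines2 compareCmd reportCmd} {
    set count 0
    set lineno 0
    # foreach pads the shorter list with empty strings
    foreach l1 $lines1 l2 $lines2 {
        incr lineno
        if {![{*}$compareCmd $l1 $l2]} {
            {*}$reportCmd $lineno $l1 $l2
            incr count
        }
    }
    return $count
}

# Example comparison hook: ignore differences in white space
proc ignoreWhiteSpace {a b} {
    expr {[regexp -all -inline {\S+} $a] eq [regexp -all -inline {\S+} $b]}
}

# Example output hook: print each difference in a simple two-line form
proc printDifference {lineno a b} {
    puts "line $lineno:\n  < $a\n  > $b"
}

set old {"set x 1" "set  y 2" "puts done"}
set new {"set x 1" "set y 2" "puts finished"}
diffLines $old $new ignoreWhiteSpace printDifference
```

With these hooks only line 3 is reported; the extra blank in line 2 is ignored. A numeric comparison or a Tkdiff-style display would simply be other procedures passed in the same way.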

Arjen Markus A few thoughts on improving performance:

  • Store the lines as {lineno content} pairs
  • Sort the pairs by content (lsort can do this with its "-index" option)
  • Replace the inner loop with a binary search.

This would reduce the number of iterations from O(N^2) to O(N log N): sorting costs O(N log N), and each of the N lookups costs O(log N). But perhaps it is not worth the trouble :-)
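Since the original code has been removed, the following is only a sketch of the idea: for every line of the first file, it finds the line numbers in the second file with identical content, using lsort -index and a hand-written lower-bound binary search in place of a nested loop. The names matchLines and lowerBound are hypothetical, and this is not a complete diff (it does not compute an alignment of the two files). lsort is a stable merge sort, so duplicate lines keep their original order and their line numbers come out ascending.

```tcl
# Hypothetical helper: index of the first {lineno content} pair in the
# sorted list whose content is >= value (lower-bound binary search).
proc lowerBound {sorted value} {
    set lo 0
    set hi [llength $sorted]
    while {$lo < $hi} {
        set mid [expr {($lo + $hi) / 2}]
        if {[string compare [lindex $sorted $mid 1] $value] < 0} {
            set lo [expr {$mid + 1}]
        } else {
            set hi $mid
        }
    }
    return $lo
}

# Hypothetical: returns {lineno1 {matching linenos in lines2} ...}
proc matchLines {lines1 lines2} {
    # Store the second file's lines as {lineno content} pairs ...
    set pairs {}
    set lineno 0
    foreach line $lines2 {
        lappend pairs [list [incr lineno] $line]
    }
    # ... and sort them by content, i.e. element 1 of each pair
    set sorted [lsort -index 1 $pairs]

    set result {}
    set lineno 0
    foreach line $lines1 {
        incr lineno
        # Binary search instead of scanning all of lines2
        set i [lowerBound $sorted $line]
        set matches {}
        # Collect all duplicates, which are adjacent after sorting
        while {$i < [llength $sorted] && [lindex $sorted $i 1] eq $line} {
            lappend matches [lindex $sorted $i 0]
            incr i
        }
        lappend result $lineno $matches
    }
    return $result
}

puts [matchLines {a b c} {c a x a}]   ;# prints 1 {2 4} 2 {} 3 1
```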


See also diff in Tcl and Using Snit to glue diff, patch, and md5sum