Version 20 of diff

Updated 2007-11-15 13:20:54 by LV

The word diff, in many computing circles, refers to comparing two items and displaying, in some manner, the differences between them. Most frequently it is a comparison of two files. If the output is text, the Unix tradition is to express the differences as the changes that must be made to the first file to turn it into the second file.
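To make the "changes to the first file" idea concrete, here is a minimal line-level diff sketch in Tcl based on the longest common subsequence (LCS). It is only an illustration: the proc name lcsDiff and the "- "/"+ " prefixes are assumptions of this sketch. It uses O(N*M) time and memory, while real diff tools use considerably faster algorithms. It needs Tcl 8.5 or later for the max() function.

```tcl
# Minimal LCS-based line diff (illustrative sketch, Tcl 8.5+).
# Returns a list of lines prefixed with "  " (common), "- " (only in a)
# or "+ " (only in b); applying the -/+ changes to a yields b.
proc lcsDiff {a b} {
    set n [llength $a]
    set m [llength $b]
    # L($i,$j) holds the LCS length of the suffixes a[i..end] and b[j..end]
    for {set i $n} {$i >= 0} {incr i -1} {
        for {set j $m} {$j >= 0} {incr j -1} {
            set i1 [expr {$i + 1}]
            set j1 [expr {$j + 1}]
            if {$i == $n || $j == $m} {
                set L($i,$j) 0
            } elseif {[lindex $a $i] eq [lindex $b $j]} {
                set L($i,$j) [expr {$L($i1,$j1) + 1}]
            } else {
                set L($i,$j) [expr {max($L($i1,$j), $L($i,$j1))}]
            }
        }
    }
    # Walk the table from the start, emitting common, deleted and added lines
    set out {}
    set i 0
    set j 0
    while {$i < $n && $j < $m} {
        set x [lindex $a $i]
        set y [lindex $b $j]
        set i1 [expr {$i + 1}]
        set j1 [expr {$j + 1}]
        if {$x eq $y} {
            lappend out "  $x"
            set i $i1
            set j $j1
        } elseif {$L($i1,$j) >= $L($i,$j1)} {
            lappend out "- $x"
            set i $i1
        } else {
            lappend out "+ $y"
            set j $j1
        }
    }
    # Whatever remains exists in only one of the inputs
    for {} {$i < $n} {incr i} { lappend out "- [lindex $a $i]" }
    for {} {$j < $m} {incr j} { lappend out "+ [lindex $b $j]" }
    return $out
}
```

For example, lcsDiff {apple banana cherry} {apple cherry date} yields the lines "  apple", "- banana", "  cherry" and "+ date": delete banana, then add date.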

Often a GUI application uses coloring or other techniques to convey more information about what changed. Some applications highlight entire lines, while others highlight particular characters.


See diff in Tcl

The code that was here was crap (according to the author) and has been removed.

Arjen Markus We have faced a slightly different problem: two files that should be compared with special care for (floating-point) numbers. The solution was simple in design:

  • Read the files line by line (all lines were expected to correspond, so we did not need to deal with inserted or deleted lines).
  • Split each line into words and compare corresponding words either as strings or as numbers.
  • Using [string is double], we determined whether a "word" is actually a number and, if so, compared the two words numerically (even allowing a certain tolerance if required).

This way you are immune to differences in number formatting: 0.1, +.1, 1.0E-01 and +1.00e-001 all spell the same number, and you may encounter any of these forms (sometimes you have less than perfect control over the precise format).
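The approach described above could be sketched roughly as follows. This is not Arjen Markus' original code; the proc names wordsEqual, linesEqual and compareFiles are hypothetical, and the tolerance is assumed to be absolute (0.0 meaning exact numeric equality). Splitting uses regexp rather than split so that runs of white space do not produce empty words.

```tcl
# Sketch of a word-by-word comparison that treats numbers as numbers.
# tol is an absolute tolerance (an assumption; 0.0 means exact equality).
proc wordsEqual {w1 w2 tol} {
    # Identical spellings are trivially equal
    if {$w1 eq $w2} {
        return 1
    }
    if {[string is double -strict $w1] && [string is double -strict $w2]} {
        # Both words parse as numbers: compare their values, not their spelling
        return [expr {abs($w1 - $w2) <= $tol}]
    }
    # Otherwise the differing strings are simply different
    return 0
}

proc linesEqual {line1 line2 {tol 0.0}} {
    # Split on runs of white space (unlike [split], this yields no empty words)
    set words1 [regexp -all -inline {\S+} $line1]
    set words2 [regexp -all -inline {\S+} $line2]
    if {[llength $words1] != [llength $words2]} {
        return 0
    }
    foreach w1 $words1 w2 $words2 {
        if {![wordsEqual $w1 $w2 $tol]} {
            return 0
        }
    }
    return 1
}

# Compare two files line by line; returns the numbers of the lines that differ.
# As in the original problem, inserted or deleted lines are not handled.
proc compareFiles {file1 file2 {tol 0.0}} {
    set f1 [open $file1 r]
    set f2 [open $file2 r]
    set diffs {}
    set lineno 0
    while {1} {
        set n1 [gets $f1 line1]
        set n2 [gets $f2 line2]
        if {$n1 < 0 && $n2 < 0} break
        incr lineno
        # A line present in only one file also counts as a difference
        if {$n1 < 0 || $n2 < 0 || ![linesEqual $line1 $line2 $tol]} {
            lappend diffs $lineno
        }
    }
    close $f1
    close $f2
    return $diffs
}
```

With this, linesEqual "x = 0.1 m" "x = +1.00e-001 m" is true, while linesEqual "x = 0.1" "x = 0.2" is false unless a tolerance of at least 0.1 is given.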


Arjen Markus Question: wouldn't this be a nice addition to the fileutil module in Tcllib?

GPS maybe it would...

Arjen Markus If so, it would benefit (in my opinion) from two custom procedures:

  • A procedure one can supply to compare the lines (for instance, to ignore white space or to interpret numbers as numbers, which was my original problem)
  • A procedure to process the output (for instance, to display it the way Tkdiff does)
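The two hooks suggested above could be passed as Tcl command prefixes and invoked with {*} expansion (Tcl 8.5+). The sketch below is hypothetical: compareLines, ignoreWhitespace and collectDiff are invented for illustration and are not part of fileutil or any other Tcllib module. Like the original problem, it compares line i with line i and does not detect insertions or deletions.

```tcl
# Hypothetical interface sketch (not an existing Tcllib API).
# compareCmd is called as: {*}$compareCmd line1 line2   -> true if "equal"
# reportCmd  is called as: {*}$reportCmd lineno line1 line2
proc compareLines {lines1 lines2 compareCmd reportCmd} {
    set lineno 0
    # foreach pads the shorter list with empty strings, so a missing line
    # is compared as an empty line
    foreach l1 $lines1 l2 $lines2 {
        incr lineno
        if {![{*}$compareCmd $l1 $l2]} {
            {*}$reportCmd $lineno $l1 $l2
        }
    }
}

# Example comparison hook: ignore differences in white space
proc ignoreWhitespace {a b} {
    expr {[regexp -all -inline {\S+} $a] eq [regexp -all -inline {\S+} $b]}
}

# Example output hook: collect differences in a global list variable
proc collectDiff {varName lineno a b} {
    upvar #0 $varName result
    lappend result [list $lineno $a $b]
}

set ::diffs {}
compareLines {"a  b" "c d"} {"a b" "c e"} ignoreWhitespace {collectDiff ::diffs}
```

Using command prefixes rather than bare proc names lets callers pass extra leading arguments, as {collectDiff ::diffs} does here; a Tkdiff-style viewer could be plugged in the same way as the reporting hook.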

Arjen Markus A few thoughts for improving the performance:

  • Store the lines as {lineno content}
  • Sort by content (lsort has this ability via "-index")
  • Use binary search to replace the inner loop.

This would reduce the number of iterations from O(N^2) to O(N log N). But perhaps it is not worth the trouble :-)
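The three steps above might look like this in Tcl 8.5 or later. This is a sketch only: the proc name linesNotIn is hypothetical, and it covers just the lookup part (which lines of the first file occur nowhere in the second), not a full diff. The key point is that lsearch -sorted performs a binary search, so the list must be sorted the same way it is searched: here lsort -index 1 and lsearch -sorted -index 1 both use the default ASCII ordering.

```tcl
# Sketch: find lines of lines1 that occur nowhere in lines2,
# using sort + binary search instead of a linear inner loop.
proc linesNotIn {lines1 lines2} {
    # Store each line of the second file as {lineno content}
    set indexed {}
    set lineno 0
    foreach line $lines2 {
        lappend indexed [list [incr lineno] $line]
    }
    # Sort by content (element 1 of each pair): O(N log N), done once
    set sorted [lsort -index 1 $indexed]
    set missing {}
    set lineno 0
    foreach line $lines1 {
        incr lineno
        # Binary search on the sorted list: O(log N) per lookup
        if {[lsearch -sorted -index 1 $sorted $line] < 0} {
            lappend missing [list $lineno $line]
        }
    }
    return $missing
}
```

For example, linesNotIn {a b c d} {d b x} returns {1 a} and {3 c}. Because the search result is an index into the sorted list, the original line number of a match can be recovered with lindex $sorted $idx 0.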


See also Using Snit to glue diff, patch, and md5sum.


CL has received mild testimonials about "Active File Compare", available through http://formulasoft.com. There's no particular Tcl connection; it's just been valuable to me as a Tcl developer when working under Windows.


From comp.lang.tcl on Aug 22, 2007, we find this note:

An unofficial patched version of the XDelta3 binary diff compression package is available at

http://downloads.sourceforge.net/tcldbrcs/xdelta30q-prepatched-tcl.tar.gz?use_mirror=osdn

[and] supports a simple but flexible callback interface to feed/extract data to/from the compressor.

Tcl examples included.

Jean-Samuel Gauthier


comparing files in tcl