text_replacer.tcl

Updated 2009-07-30 20:04:11 by rpremuz

rpremuz (2009-01-08)

text_replacer.tcl [L1 ] replaces character strings in the specified plain text files according to the given mappings.

Script usage:
text_replacer.tcl ?options? mappingsFile ?inputFile...?

mappingsFile
  Path of the text file that defines the string replacements as pairs of
  consecutive lines in the following format:
  o:<old_string>
  n:<new_string>
  Every exact occurrence of <old_string>, which must not be empty, is
  replaced by <new_string>.
  Lines that contain only whitespace or begin with a hash (#) are ignored.
  The script sorts the mappings in decreasing order of the old strings
  (largest first), so that a longer old string such as "abc" takes
  precedence over its prefix "ab".

inputFile
  Path of the text file in which strings are to be replaced. If no files are
  specified, stdin is filtered to stdout.
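The mappings file format described above could be parsed roughly as follows. This is a minimal sketch and not the script's actual code. The proc name parseMappings is hypothetical. It takes the file contents as a string, so reading the file with the chosen -encoding and -translation is left to the caller. It also assumes two things: a repeated old string keeps the last new string, and the replacements are applied with Tcl's [string map], which tries the keys in list order at each position of the text.

```tcl
# Hypothetical sketch: turn the text of a mappings file into a flat
# {old new old new ...} list for [string map]. Not the script's own code.
proc parseMappings {text} {
    set pairs [dict create]
    set expectNew 0
    set lineNo 0
    foreach line [split $text \n] {
        incr lineNo
        # Skip whitespace-only lines and comment lines beginning with #.
        if {[string trim $line] eq "" || [string index $line 0] eq "#"} {
            continue
        }
        if {!$expectNew} {
            # An o: line must carry a non-empty old string.
            if {![string match "o:?*" $line]} {
                error "line $lineNo: expected o:<old_string> (non-empty)"
            }
            set old [string range $line 2 end]
            set expectNew 1
        } else {
            if {![string match "n:*" $line]} {
                error "line $lineNo: expected n:<new_string>"
            }
            # Assumption: a repeated old string keeps the last new string.
            dict set pairs $old [string range $line 2 end]
            set expectNew 0
        }
    }
    if {$expectNew} {
        error "o: line at end of mappings without a following n: line"
    }
    # Decreasing order puts a string before its own prefixes ("ab" > "a"),
    # so [string map] tries the longer old string first.
    set mappings {}
    foreach old [lsort -decreasing [dict keys $pairs]] {
        lappend mappings $old [dict get $pairs $old]
    }
    return $mappings
}
```

For example, [string map [parseMappings "o:a\nn:Y\no:ab\nn:X\n"] abac] returns XYc, because ab is tried before a.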

Options:
-encoding <encoding>
  Specifies the encoding of the input, mappings and output text. The
  'encoding names' Tcl command lists the accepted encodings. The default
  encoding is the platform- and locale-dependent system encoding.
  (Note: on the MS Windows platform, if stdin or stdout is connected to a
  console, its default encoding is unicode; otherwise it is the system
  encoding.)

-translation <translation>
  Specifies the translation mode for end-of-line characters in input, mappings
  and output. The accepted values are:
    lf    The EOL is a single newline (linefeed) character. This is typically
          used on Unix-like platforms.
    crlf  The EOL is a carriage return character followed by a linefeed
          character. This is typically used on MS DOS/Windows platforms.
    auto  This is the default mode. On input, the EOL may be lf or crlf. On
          output, the EOL is platform-specific (lf on Unix, crlf on Windows).

-t        Prints the execution time of the main procedure to stderr.
-h        Prints this help to stdout.
--        Specifies the end of options.
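The option handling above could be sketched as shown below. This is only an illustration under several assumptions. The proc names parseOptions and configureChannel are hypothetical. Options must come before the positional arguments, and an unknown option is treated as an error. The sketch checks each value against [encoding names] or the three documented translation modes. Each channel is then set up with fconfigure.

```tcl
# Hypothetical sketch of the option handling; returns {optionDict remainingArgs}.
proc parseOptions {argv} {
    set opts [dict create encoding "" translation auto time 0 help 0]
    while {[llength $argv] > 0} {
        set arg [lindex $argv 0]
        switch -exact -- $arg {
            -encoding - -translation {
                if {[llength $argv] < 2} {
                    error "missing value for $arg"
                }
                set value [lindex $argv 1]
                if {$arg eq "-encoding" && $value ni [encoding names]} {
                    error "unknown encoding: $value"
                }
                if {$arg eq "-translation" && $value ni {lf crlf auto}} {
                    error "unknown translation: $value"
                }
                # Store under the key "encoding" or "translation".
                dict set opts [string range $arg 1 end] $value
                set argv [lrange $argv 2 end]
            }
            -t { dict set opts time 1; set argv [lrange $argv 1 end] }
            -h { dict set opts help 1; set argv [lrange $argv 1 end] }
            -- { set argv [lrange $argv 1 end]; break }
            default {
                if {[string match {-?*} $arg]} {
                    error "unknown option: $arg"
                }
                # First positional argument: stop option parsing.
                break
            }
        }
    }
    return [list $opts $argv]
}

# Apply the chosen encoding and EOL translation to a channel.
proc configureChannel {chan opts} {
    if {[dict get $opts encoding] ne ""} {
        fconfigure $chan -encoding [dict get $opts encoding]
    }
    fconfigure $chan -translation [dict get $opts translation]
}
```

With -t, the main procedure could then be timed with Tcl's built-in time command, e.g. puts stderr [time {main}], where main stands for the script's main procedure.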

The script first reads the mappings from the mappings file. Then, for each
specified input file, it reads the file line by line and writes each line,
with the replacements defined by the mappings applied, to a temporary output
file. If the input file is a symbolic link, the file that the link points to
is opened.
If the processing actually changed the text, the input file is renamed to
<inputFile>.old and the temporary output file is renamed to <inputFile>.
Otherwise, the temporary output file is deleted.
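The per-file processing could look roughly like the sketch below. It is an illustration, not the script's code. Several details are assumptions: the proc name replaceInFile, the temporary file name <file>.tmp<pid>, and the limit of 32 link hops. The sketch follows links by hand with file readlink, so the backup and the new file replace the link's target rather than the link itself. It returns 1 if the text changed and 0 otherwise.

```tcl
# Hypothetical sketch: apply a {old new ...} mappings list to one file.
proc replaceInFile {path mappings {encoding ""} {translation auto}} {
    # Follow symbolic links (at most 32 hops) to the file they point to.
    for {set hops 0} {[file type $path] eq "link"} {incr hops} {
        if {$hops >= 32} { error "$path: too many levels of symbolic links" }
        set path [file join [file dirname $path] [file readlink $path]]
    }
    set tmp $path.tmp[pid]
    set in [open $path r]
    if {[catch {open $tmp w} out]} {
        close $in
        error "$tmp: $out"
    }
    foreach chan [list $in $out] {
        if {$encoding ne ""} { fconfigure $chan -encoding $encoding }
        fconfigure $chan -translation $translation
    }
    set changed 0
    # Replace line by line; note that puts also ends the last line with an EOL.
    set rc [catch {
        while {[gets $in line] >= 0} {
            set newLine [string map $mappings $line]
            if {$newLine ne $line} { set changed 1 }
            puts $out $newLine
        }
    } err]
    catch {close $in}
    if {[catch {close $out} closeErr] && !$rc} {
        set rc 1
        set err $closeErr
    }
    if {$rc} {
        # On error, leave the input untouched and drop the temporary file.
        file delete -- $tmp
        error "$path: $err"
    }
    if {$changed} {
        file rename -force -- $path $path.old
        file rename -- $tmp $path
    } else {
        file delete -- $tmp
    }
    return $changed
}
```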

If an error occurs during reading or writing, an error message is written to
stderr, the input file is left unchanged and the temporary output file is
deleted.
The script prints debug messages to stderr if the DEBUG environment variable
is set to 1.

Exit code:
  0 - No errors during the processing of input files (or stdin).
  1 - An error occurred during the processing of input files (or stdin).
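The error reporting, debug output and exit codes described above could be combined as in this sketch. The proc names debug and processFiles are hypothetical. The sketch assumes the script carries on with the remaining files after an error. The per-file work is passed in as a command prefix, so any procedure that raises a Tcl error on failure can be plugged in.

```tcl
# Print a debug message to stderr only when the DEBUG environment
# variable is set to 1.
proc debug {msg} {
    if {[info exists ::env(DEBUG)] && $::env(DEBUG) eq "1"} {
        puts stderr "DEBUG: $msg"
    }
}

# Run the command prefix processCmd on each file. Errors are reported on
# stderr; the result is the exit code (0 = no errors, 1 = an error occurred).
proc processFiles {files processCmd} {
    set exitCode 0
    foreach path $files {
        debug "processing $path"
        if {[catch {{*}$processCmd $path} err]} {
            puts stderr "text_replacer.tcl: $err"
            set exitCode 1
        }
    }
    return $exitCode
}

# Typical use at the end of the script (replaceFile is a placeholder):
#   exit [processFiles $inputFiles [list replaceFile $mappings]]
```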

Use this page for discussion and improvements.