What: | char2ent.tcl |
Where: | https://pastebin.com/c2dPigpd |
Description: | Read an HTML or XML file containing literal special characters, then replace specials with appropriate SGML entities and output. |
Currently at version 0.0 . | |
Updated: | 11/2004 |
Contact: LES
LES on November 18, 2004: Get the code here: [L1 ]
Or here:
#!/usr/bin/env tclsh

# char2ent.tcl - Opens an HTML or XML file with special characters
# (diacritics) written in plain text, replaces these special
# characters with appropriate HTML or XML entities and writes the
# output to a new file.
#
# Author: Luciano Espirito Santo
#
# History
#
# Version 1.0   2004-11-18   Luciano Espirito Santo
#   First version. Alpha stage.
#
# KNOWN ISSUES:
# - No user-proof measures, no error or exception handling, no nothing!
#   No guarantees! Use it at your own risk!
# - Tested on Windows 98 and Linux only.
#
# TODO:
# - Extend it so it can handle all possible characters (Unicode).
# - Make the ability to do the INVERSE operation (that would include the
#   ability to tell non-escaped entities from escaped entities and NOT
#   replace the escaped entities).
# - Make it handle STDIN.
#
# LICENSE: BSD
#
# How to use it:
#
# char2ent.tcl --help

# ----------------------------------------------------------------
# Do not change anything below this point unless you know what you're doing.

# Print help text and exit if '--help' is the only argument
if {$argc == 1 && [lindex $argv 0] == "--help"} {
    puts ""
    puts "char2ent, by Luciano Espirito Santo - 2004"
    puts ""
    puts {Usage: char2ent -[option] "input file" "output file"}
    puts ""
    puts "Possible options:"
    puts "-h: convert special characters to HTML entities"
    puts "-x: convert special characters to XML entities"
    puts ""
    puts {"input file" MUST exist}
    puts {"output file" is created automatically if it does not exist}
    puts {"input file" and "output file" MUST NOT be the same file}
    puts ""
    puts {Example: char2ent -x "sample.xml" "converted.xml"}
    puts ""
    exit
}

# Complain and exit if option is neither '-h' nor '-x'
if {[lindex $argv 0] != "-h" && [lindex $argv 0] != "-x"} {
    puts "Error! Try 'char2ent --help' to see how to use this program.\n"
    exit
}

# Complain and exit if not exactly 3 arguments (option, input, output) are found
if {$argc != 3} {
    puts "Error! You must use exactly 3 arguments.\n"
    puts "$argv\n"
    puts "Error! Try 'char2ent --help' to see how to use this program.\n"
    exit
}

# Complain and exit if input file does not exist
if {! [file exists [lindex $argv 1]]} {
    puts "Error! File \"[lindex $argv 1]\" not found!\n"
    exit
}

# Complain and exit if input file is not readable
if {! [file readable [lindex $argv 1]]} {
    puts "Error! Permission denied to read [ lindex $argv 1 ]!\n"
    exit
}

# Complain if input file and output file are the same
if {[lindex $argv 1] == [lindex $argv 2]} {
    puts "Error! \"input file\" and \"output file\" must not be the same.\n"
    exit
}

# Try to open input file for reading.
# Complain and exit in case of errors.
if {[catch {set IF [open [lindex $argv 1] r]} IFerror]} {
    puts "Error! $IFerror\n"
    exit
}

# Try to open output file for writing.
# Complain, close input file and exit in case of errors.
if {[catch {set OF [open [lindex $argv 2] w]} OFerror]} {
    close $IF
    puts "Error! $OFerror\n"
    exit
}

# ================================================
# Two files open. No errors this far. Let's replace.

# The three lists are parallel: element i of HTML and XML replaces
# element i of CHARS.
set CHARS {
    ª º
    À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö
    Ø Ù Ú Û Ü Ý Þ ß
    à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö
    ø ù ú û ü ý þ ÿ
    Œ œ Ÿ
}

# Named HTML entities (entity names are case-sensitive)
set HTML {
    &ordf; &ordm;
    &Agrave; &Aacute; &Acirc; &Atilde; &Auml; &Aring; &AElig; &Ccedil;
    &Egrave; &Eacute; &Ecirc; &Euml; &Igrave; &Iacute; &Icirc; &Iuml;
    &ETH; &Ntilde; &Ograve; &Oacute; &Ocirc; &Otilde; &Ouml;
    &Oslash; &Ugrave; &Uacute; &Ucirc; &Uuml; &Yacute; &THORN; &szlig;
    &agrave; &aacute; &acirc; &atilde; &auml; &aring; &aelig; &ccedil;
    &egrave; &eacute; &ecirc; &euml; &igrave; &iacute; &icirc; &iuml;
    &eth; &ntilde; &ograve; &oacute; &ocirc; &otilde; &ouml;
    &oslash; &ugrave; &uacute; &ucirc; &uuml; &yacute; &thorn; &yuml;
    &OElig; &oelig; &Yuml;
}

# Numeric character references (decimal code points)
set XML {
    &#170; &#186;
    &#192; &#193; &#194; &#195; &#196; &#197; &#198; &#199;
    &#200; &#201; &#202; &#203; &#204; &#205; &#206; &#207;
    &#208; &#209; &#210; &#211; &#212; &#213; &#214;
    &#216; &#217; &#218; &#219; &#220; &#221; &#222; &#223;
    &#224; &#225; &#226; &#227; &#228; &#229; &#230; &#231;
    &#232; &#233; &#234; &#235; &#236; &#237; &#238; &#239;
    &#240; &#241; &#242; &#243; &#244; &#245; &#246;
    &#248; &#249; &#250; &#251; &#252; &#253; &#254; &#255;
    &#338; &#339; &#376;
}

set TEXT [read $IF]

# Replace one character at a time, picking the replacement list by option
for {set i 0} {$i < [llength $CHARS]} {incr i} {
    switch -- [lindex $argv 0] {
        "-h" {set REPL [lindex $HTML $i]}
        "-x" {set REPL [lindex $XML $i]}
    }
    set TEXT [string map [list [lindex $CHARS $i] $REPL] $TEXT]
}

puts -nonewline $OF $TEXT
close $IF
close $OF
exit
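The first TODO item (handling all possible characters) could be approached without a lookup table, by emitting a numeric character reference for anything outside ASCII. The following is only a sketch under that assumption; the proc name nonAsciiToNumeric is hypothetical and is not part of char2ent.tcl.

```tcl
# Hypothetical helper, not part of char2ent.tcl: replace every
# character above U+007F with a decimal numeric character reference.
proc nonAsciiToNumeric {text} {
    set out ""
    foreach ch [split $text ""] {
        # scan %c yields the code point of the character
        scan $ch %c code
        if {$code > 127} {
            append out "&#$code;"
        } else {
            append out $ch
        }
    }
    return $out
}

# \u escapes keep this example independent of the script's file encoding
puts [nonAsciiToNumeric "\u0178 and \u0153"]   ;# prints: &#376; and &#339;
```

Numeric references are valid in both HTML and XML, so this would mainly suit the -x mode; named HTML entities would still need a table. Like the original script, the sketch leaves markup characters such as & and < alone. Characters outside the Basic Multilingual Plane may need extra care on Tcl versions before 9.0.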
The code was posted here entirely at first, but I never succeeded in having all characters displayed correctly. Today (May 2007), I saw that many other characters were displayed incorrectly: a whole series of them displayed as question marks. I spent almost half an hour trying to correct it, but I couldn't make it work. Wikit keeps complaining about some "bogus" character and blames my browser (Firefox). Konqueror didn't work either. I give up. Just download the thing and move on.
LES in 2022: it seems the Wiki displays all the characters correctly now. Yay! But it kills all the XML characters when editing. No yay! :-(
RS Note that read returns the file's trailing newline along with everything else, so it is best to write such a file-string back out with puts -nonewline; otherwise you get an extra newline. Also, string map is happy with long maps, so you can avoid the for loop by just coding:
set XMLmap {& &amp; < &lt; > &gt; ...}
set HTMLmap {Ä &Auml; ...}
...
set myMap $XMLmap
...
set myText [string map $myMap $myText]
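RS's suggestion can be tried out with a small self-contained script. This is a sketch, not the original program: the proc name char2ent and the deliberately short maps are assumptions made for illustration, and a real map would list every character the script handles.

```tcl
# Illustrative maps only; extend them with the full character set.
set XMLmap  {& &amp; < &lt; > &gt; Ä &#196; é &#233;}
set HTMLmap {Ä &Auml; é &eacute; ß &szlig;}

# Hypothetical helper: pick a map by option and apply it in one pass.
proc char2ent {mode text} {
    global XMLmap HTMLmap
    switch -- $mode {
        -x      { set myMap $XMLmap }
        -h      { set myMap $HTMLmap }
        default { error "unknown mode \"$mode\": expected -h or -x" }
    }
    # string map does not rescan text it has already replaced, so the
    # "&" that starts "&lt;" is not escaped a second time.
    return [string map $myMap $text]
}

puts [char2ent -x "Äpfel & Birnen <b>"]   ;# prints: &#196;pfel &amp; Birnen &lt;b&gt;
puts [char2ent -h "Straße, café"]         ;# prints: Stra&szlig;e, caf&eacute;
```

A single string map call also sidesteps an ordering problem that a character-by-character loop would have if & were ever added to the list: a later pass could re-escape the ampersands produced by an earlier one.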
arjen - 2022-09-15 07:05:39
We will be looking into this issue: the &...; entities are displayed as the characters they represent, rather than the entities themselves.