|What:| char2ent.tcl|
| Where:| https://pastebin.com/c2dPigpd|
| Description:| Read an HTML or XML file containing literal special characters, then replace specials with appropriate SGML entities and output.|
|| Currently at version 0.0.|
| Updated:| 11/2004|
Contact: [LES]
----
[LES] on November 18, 2004: Get the code here: [https://pastebin.com/c2dPigpd]
Or here:
======
#!/usr/bin/env tclsh
# char2ent.tcl - Opens an HTML or XML file with special characters
# (diacritics) written in plain text, replaces these special
# characters with appropriate HTML or XML entities and writes the
# output to a new file.
#
# Author: Luciano Espirito Santo
#
# History
#
# Version 1.0 2004-11-18 Luciano Espirito Santo
# First version. Alpha stage.
#
# KNOWN ISSUES:
# - No user-proof measures, no error or exception handling, no nothing!
# No guarantees! Use it at your own risk!
# - Tested on Windows 98 and Linux only.
#
# TODO:
# - Extend it so it can handle all possible characters (Unicode).
# - Add the ability to do the INVERSE operation (that would include the
# ability to tell non-escaped entities from escaped entities and NOT
# replace the escaped entities).
# - Make it handle STDIN.
#
# LICENSE: BSD
#
# How to use it:
#
# char2ent.tcl --help
# ----------------------------------------------------------------
# Do not change anything below this point unless you know what you're doing.
# Print help text and exit if '--help' is the only argument
if {$argc == 1 && [lindex $argv 0] == "--help"} {
puts ""
puts "char2ent, by Luciano Espirito Santo - 2004"
puts ""
puts {Usage: char2ent -[option] "input file" "output file"}
puts ""
puts "Possible options:"
puts "-h: convert special characters to HTML entities"
puts "-x: convert special characters to XML entities"
puts ""
puts {"input file" MUST exist}
puts {"output file" is created automatically if it does not exist}
puts {"input file" and "output file" MUST NOT be the same file}
puts ""
puts {Example: char2ent -x "sample.xml" "converted.xml"}
puts ""
exit
}
# Complain and exit if option is neither '-h' nor '-x'
if {[lindex $argv 0] != "-h" && [lindex $argv 0] != "-x"} {
puts "Error! Try 'char2ent --help' to see how to use this program.\n"
exit
}
# Complain and exit if not exactly 3 arguments (option, input, output) are found
if {$argc != 3} {
puts "Error! You must use exactly 3 arguments.\n"
puts "$argv\n"
puts "Error! Try 'char2ent --help' to see how to use this program.\n"
exit
}
# Complain and exit if input file does not exist
if {! [file exists [lindex $argv 1]]} {
puts "Error! File \"[lindex $argv 1]\" not found!\n"
exit
}
# Complain and exit if input file is not readable
if {! [file readable [lindex $argv 1]]} {
puts "Error! Permission denied to read [ lindex $argv 1 ]!\n"
exit
}
# Complain if input file and output file are the same
if {[lindex $argv 1] == [lindex $argv 2]} {
puts "Error! \"input file\" and \"output file\" must not be the same.\n"
exit
}
# Try to open input file for reading.
# Complain and exit in case of errors.
if {[catch {set IF [open [lindex $argv 1] r]} IFerror]} {
puts "Error! $IFerror\n"
exit
}
# Try to open output file for writing.
# Complain, close input file and exit in case of errors.
if {[catch {set OF [open [lindex $argv 2] w]} OFerror]} {
close $IF
puts "Error! $OFerror\n"
exit
}
# ================================================
# Two files open. No errors this far. Let's replace.
set CHARS { ª º À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï Ð Ñ
Ò Ó Ô Õ Ö Ø Ù Ú Û Ü
Ý Þ ß à á â ã ä å æ
ç è é ê ë ì í î ï ð
ñ ò ó ô õ ö ø ù ú û
ü ý þ ÿ Œ œ Ÿ
}
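set HTML { &ordf; &ordm; &Agrave; &Aacute; &Acirc; &Atilde; &Auml;
&Aring; &AElig; &Ccedil; &Egrave; &Eacute; &Ecirc; &Euml;
&Igrave; &Iacute; &Icirc; &Iuml; &ETH; &Ntilde; &Ograve;
&Oacute; &Ocirc; &Otilde; &Ouml; &Oslash; &Ugrave; &Uacute;
&Ucirc; &Uuml; &Yacute; &THORN; &szlig; &agrave; &aacute;
&acirc; &atilde; &auml; &aring; &aelig; &ccedil; &egrave;
&eacute; &ecirc; &euml; &igrave; &iacute; &icirc; &iuml; &eth;
&ntilde; &ograve; &oacute; &ocirc; &otilde; &ouml; &oslash;
&ugrave; &uacute; &ucirc; &uuml; &yacute; &thorn; &yuml;
&OElig; &oelig; &Yuml;
}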
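# Numeric character references (decimal form assumed), in the same order as CHARS.
set XML { &#170; &#186; &#192; &#193; &#194; &#195; &#196; &#197; &#198; &#199;
&#200; &#201; &#202; &#203; &#204; &#205; &#206; &#207; &#208; &#209;
&#210; &#211; &#212; &#213; &#214; &#216; &#217; &#218; &#219; &#220;
&#221; &#222; &#223; &#224; &#225; &#226; &#227; &#228; &#229; &#230;
&#231; &#232; &#233; &#234; &#235; &#236; &#237; &#238; &#239; &#240;
&#241; &#242; &#243; &#244; &#245; &#246; &#248; &#249; &#250; &#251;
&#252; &#253; &#254; &#255; &#338; &#339; &#376;
}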
set TEXT [read $IF]
for {set i 0} {$i < [llength $CHARS]} {incr i} {
switch -- [lindex $argv 0] {
"-h" {set REPL [lindex $HTML $i]}
"-x" {set REPL [lindex $XML $i]}
}
set TEXT [string map "[lindex $CHARS $i] $REPL" $TEXT]
}
puts -nonewline $OF $TEXT
close $IF
close $OF
exit
======
The full code was originally posted here, but I never managed to get all the characters to display correctly. Today (May 2007), I saw that many other characters were displayed incorrectly: a whole series of them showed up as question marks. I spent almost half an hour trying to fix it, but I couldn't make it work. Wikit keeps complaining about some "bogus" character and blames my browser (Firefox). Konqueror didn't work either. I give up. Just download the thing and move on.
[LES] in 2022: it seems the Wiki displays all the characters correctly now. Yay!
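On the TODO item about covering all of Unicode: for the XML case, one possible approach is to compute numeric character references instead of listing them by hand. This is only a sketch. The proc name `toNumericRefs` is hypothetical and not part of char2ent.tcl, it leaves `&`, `<` and `>` untouched, and on Tcl 8.x characters outside the Basic Multilingual Plane may not come out as a single reference.
======
# Hypothetical helper (not in char2ent.tcl): replace every non-ASCII
# character with a decimal numeric character reference such as &#233;
proc toNumericRefs {text} {
    set out ""
    foreach ch [split $text ""] {
        # scan with %c yields the character's code point as an integer
        scan $ch %c code
        if {$code > 127} {
            append out "&#$code;"
        } else {
            append out $ch
        }
    }
    return $out
}

puts [toNumericRefs "caf\u00e9"]   ;# prints: caf&#233;
======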
----
[RS] Note that [read] includes the trailing newline; to write such a file-string out it is best to use ''[puts] -nonewline'' so you don't get an extra one. Also, [string map] is happy with long maps, so you can avoid the [for] loop by just coding
======
set XMLmap {& &amp; < &lt; > &gt; ...}
set HTMLmap {Ä &Auml; ...}
...
set myMap $XMLmap
...
set myText [string map $myMap $myText]
======
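A minimal sketch of the single-map approach suggested above, assuming the characters and entities are kept as parallel lists as in the script. The proc name `buildMap` and the three-item sample lists are illustrative only; they are not part of char2ent.tcl.
======
# Hypothetical helper: interleave two parallel lists into the
# {from to from to ...} form that [string map] expects.
proc buildMap {chars entities} {
    set map {}
    foreach c $chars e $entities {
        lappend map $c $e
    }
    return $map
}

# Shortened sample lists; the script itself has 67 entries in each.
set CHARS {à é ü}
set HTML  {&agrave; &eacute; &uuml;}

set HTMLmap [buildMap $CHARS $HTML]

# One [string map] call replaces the whole [for] loop.
puts [string map $HTMLmap "déjà vu"]   ;# prints: d&eacute;j&agrave; vu
======
Because [string map] makes a single pass over the text, output it has already produced is not scanned again, so an `& &amp;` pair can sit in the same map without re-escaping the ampersands of the other entities.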
<<categories>> Application | Characters | Dev. Tools | XML