|What:| char2ent.tcl|
| Where:| https://pastebin.com/c2dPigpd|
| Description:| Read an HTML or XML file containing literal special characters, replace them with the appropriate SGML entities and write the result to a new file.|
|| Currently at version 0.0.|
| Updated:| 11/2004|
Contact: [LES]
----
[LES] on November 18, 2004: Get the code here: [https://pastebin.com/c2dPigpd]
Or here:
======
#!/usr/bin/env tclsh
# char2ent.tcl - Opens an HTML or XML file with special characters
# (diacritics) written in plain text, replaces these special
# characters with appropriate HTML or XML entities and writes the
# output to a new file.
#
# Author: Luciano Espirito Santo
#
# History
#
# Version 1.0 2004-11-18 Luciano Espirito Santo
# First version. Alpha stage.
#
# KNOWN ISSUES:
# - No user-proof measures, no error or exception handling, no nothing!
# No guarantees! Use it at your own risk!
# - Tested on Windows 98 and Linux only.
#
# TODO:
# - Extend it so it can handle all possible characters (Unicode).
# - Add the ability to do the INVERSE operation (that would include the
# ability to tell non-escaped entities from escaped entities and NOT
# replace the escaped entities).
# - Make it handle STDIN.
#
# LICENSE: BSD
#
# How to use it:
#
# char2ent.tcl --help
# ----------------------------------------------------------------
# Do not change anything below this point unless you know what you're doing.
# Print help text and exit if '--help' is the only argument
if {$argc == 1 && [lindex $argv 0] eq "--help"} {
puts ""
puts "char2ent, by Luciano Espirito Santo - 2004"
puts ""
puts {Usage: char2ent -[option] "input file" "output file"}
puts ""
puts "Possible options:"
puts "-h: convert special characters to HTML entities"
puts "-x: convert special characters to XML entities"
puts ""
puts {"input file" MUST exist}
puts {"output file" is created automatically if it does not exist}
puts {"input file" and "output file" MUST NOT be the same file}
puts ""
puts {Example: char2ent -x "sample.xml" "converted.xml"}
puts ""
exit
}
# Complain and exit if option is neither '-h' nor '-x'
# (errors go to stderr and exit with a non-zero status)
if {[lindex $argv 0] ne "-h" && [lindex $argv 0] ne "-x"} {
puts stderr "Error! Try 'char2ent --help' to see how to use this program.\n"
exit 1
}
# Complain and exit if not exactly 3 arguments (option, input, output) are found
if {$argc != 3} {
puts stderr "Error! You must use exactly 3 arguments.\n"
puts stderr "$argv\n"
puts stderr "Error! Try 'char2ent --help' to see how to use this program.\n"
exit 1
}
# Complain and exit if input file does not exist
if {![file exists [lindex $argv 1]]} {
puts stderr "Error! File \"[lindex $argv 1]\" not found!\n"
exit 1
}
# Complain and exit if input file is not readable
if {![file readable [lindex $argv 1]]} {
puts stderr "Error! Permission denied to read \"[lindex $argv 1]\"!\n"
exit 1
}
# Complain and exit if input file and output file are the same
# (paths are normalized first, so "./a.xml" and "a.xml" also count as the same)
if {[file normalize [lindex $argv 1]] eq [file normalize [lindex $argv 2]]} {
puts stderr "Error! \"input file\" and \"output file\" must not be the same.\n"
exit 1
}
# Try to open input file for reading.
# Complain and exit in case of errors.
if {[catch {set IF [open [lindex $argv 1] r]} IFerror]} {
puts stderr "Error! $IFerror\n"
exit 1
}
# Try to open output file for writing.
# Complain, close input file and exit in case of errors.
if {[catch {set OF [open [lindex $argv 2] w]} OFerror]} {
close $IF
puts stderr "Error! $OFerror\n"
exit 1
}
# ================================================
# Two files open. No errors so far. Let's replace.
# Special characters, in the same order as the HTML and XML lists below
set CHARS {ª º À Á Â Ã Ä Å
Æ Ç È É Ê Ë Ì Í
Î Ï Ð Ñ Ò Ó Ô Õ
Ö Ø Ù Ú Û Ü Ý Þ
ß à á â ã ä å æ
ç è é ê ë ì í î
ï ð ñ ò ó ô õ ö
ø ù ú û ü ý þ ÿ
Œ œ Ÿ
}
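# Named HTML entities, one per character in CHARS (entity names are case-sensitive)
set HTML {&ordf; &ordm; &Agrave; &Aacute; &Acirc; &Atilde; &Auml; &Aring;
&AElig; &Ccedil; &Egrave; &Eacute; &Ecirc; &Euml; &Igrave; &Iacute;
&Icirc; &Iuml; &ETH; &Ntilde; &Ograve; &Oacute; &Ocirc; &Otilde;
&Ouml; &Oslash; &Ugrave; &Uacute; &Ucirc; &Uuml; &Yacute; &THORN;
&szlig; &agrave; &aacute; &acirc; &atilde; &auml; &aring; &aelig;
&ccedil; &egrave; &eacute; &ecirc; &euml; &igrave; &iacute; &icirc;
&iuml; &eth; &ntilde; &ograve; &oacute; &ocirc; &otilde; &ouml;
&oslash; &ugrave; &uacute; &ucirc; &uuml; &yacute; &thorn; &yuml;
&OElig; &oelig; &Yuml;
}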
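# XML predefines no named entities for these characters, so use decimal
# numeric character references, one per character in CHARS
set XML {&#170; &#186; &#192; &#193; &#194; &#195; &#196; &#197;
&#198; &#199; &#200; &#201; &#202; &#203; &#204; &#205;
&#206; &#207; &#208; &#209; &#210; &#211; &#212; &#213;
&#214; &#216; &#217; &#218; &#219; &#220; &#221; &#222;
&#223; &#224; &#225; &#226; &#227; &#228; &#229; &#230;
&#231; &#232; &#233; &#234; &#235; &#236; &#237; &#238;
&#239; &#240; &#241; &#242; &#243; &#244; &#245; &#246;
&#248; &#249; &#250; &#251; &#252; &#253; &#254; &#255;
&#338; &#339; &#376;
}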
set TEXT [read $IF]
for {set i 0} {$i < [llength $CHARS]} {incr i} {
switch -- [lindex $argv 0] {
"-h" {set REPL [lindex $HTML $i]}
"-x" {set REPL [lindex $XML $i]}
}
set TEXT [string map [list [lindex $CHARS $i] $REPL] $TEXT]
}
puts -nonewline $OF $TEXT
close $IF
close $OF
exit
======
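For the -x option, XML predefines only five named entities (amp, lt, gt, quot, apos), so the conversion amounts to writing each character's Unicode code point as a numeric character reference. That means the XML list could be computed instead of typed by hand, which would also cover characters outside the list (the Unicode item in the TODO). A minimal sketch, assuming Tcl 8.5 or later and that the text was already read with the correct encoding; the proc name `toNumericRefs` is hypothetical. In Tcl 8.x builds, characters outside the Basic Multilingual Plane may not be handled as single characters.

======
# Hypothetical helper (not part of char2ent): replace every non-ASCII
# character in $text with a decimal numeric reference such as &#233;.
# ASCII characters, including & < and >, are left untouched, so the
# markup of an HTML or XML file survives.
proc toNumericRefs {text} {
    set out ""
    foreach ch [split $text ""] {
        # %c stores the character's Unicode code point as an integer
        scan $ch %c cp
        if {$cp > 127} {
            append out "&#$cp;"
        } else {
            append out $ch
        }
    }
    return $out
}

# \u escapes keep the example independent of this file's encoding
puts [toNumericRefs "A\u00e7\u00e3o"]
# prints: A&#231;&#227;o
======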
The code was posted here entirely at first, but I never succeeded in having all characters displayed correctly. Today (May 2007), I saw that many other characters were displayed incorrectly: a whole series of them displayed as question marks. I spent almost half an hour trying to correct it, but I couldn't make it work. Wikit keeps complaining about some "bogus" character and blames my browser (Firefox). Konqueror didn't work either. I give up. Just download the thing and move on.
[LES] in 2022: it seems the Wiki displays all the characters correctly now. Yay! But it kills all the XML entities when editing. No yay! :-(
----
[RS] Note that [read] includes the trailing newline; to write such a file-string out it is best to use ''[puts] -nonewline'' so you don't get an extra one. Also, [string map] is happy with long maps, so you can avoid the [for] loop by just coding
======
set XMLmap {& &amp; < &lt; > &gt; ...}
set HTMLmap {Ä &Auml; ...}
...
set myMap $XMLmap
...
set myText [string map $myMap $myText]
======
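Following RS's suggestion, the two parallel lists in char2ent (CHARS and HTML, or CHARS and XML) can be zipped into a single key/value map with [foreach] and applied with one [string map] call instead of one call per character. A sketch, with only a few entries of the full lists shown; the proc name `buildMap` is hypothetical.

======
# Hypothetical helper: zip a list of characters and a parallel list of
# replacements into one key/value list suitable for [string map].
proc buildMap {chars repls} {
    set map {}
    foreach c $chars r $repls {
        lappend map $c $r
    }
    return $map
}

# A few entries of the full CHARS and HTML lists; \u escapes keep the
# example independent of the encoding this file is saved in.
set CHARS [list \u00c4 \u00e3 \u00e7 \u00e9]
set HTML  {&Auml; &atilde; &ccedil; &eacute;}

# One map, one pass over the text, no explicit loop over the characters
set HTMLmap [buildMap $CHARS $HTML]
puts [string map $HTMLmap "<p>A\u00e7\u00e3o caf\u00e9</p>"]
# prints: <p>A&ccedil;&atilde;o caf&eacute;</p>
======

RS's `XMLmap` entries for `&`, `<` and `>` suit plain text; char2ent works on complete HTML or XML files, where escaping `<` and `>` would also escape the tags themselves, so they are left out of this map.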
----
'''[arjen] - 2022-09-15 07:05:39'''
We will be looking into this issue: the &...; entities are displayed as the characters they represent, rather than the entities themselves.
<<categories>> Application | Characters | Dev. Tools | XML