Eurolish

Richard Suchenwirth 2000-04-13 -- Here is a routine from The Lish family that produces the Unicode characters for European accented and other special characters (e.g. c cedilla, oe ligature) from plain 7-bit ASCII input. Accents are written as a trailing apostrophe (acute), back-apostrophe (grave), or caret (circumflex); the dieresis (umlaut dots) is written as two apostrophes, because an unbalanced double quote might confuse Tcl. Ligatures are joined by an ampersand (this gets us into trouble with HTML, so it might change). Cedillas are marked by a back-apostrophe, so that the more natural comma is not lost from ordinary text. Note how the carets had to be escaped to suppress their normal meaning in regexps (beginning of string, or negated class). Back-apostrophes were also chosen for other special characters, since they are rarely used outside of shell scripts.
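
The caret escaping can be tried in isolation. This is only a sketch: the two-entry list demo and the variable text are made up for illustration and are not part of the real table.

```tcl
# Hypothetical two-entry table. The braces keep the bracket expression
# e[\^] intact, so the regexp engine sees an escaped, literal caret.
set demo {{e[\^]} \u00ea c` \u00e7}

# An unescaped caret is an anchor, so this pattern cannot match "e^".
set plain   [regexp {e^} "e^tre"]
set escaped [regexp {e[\^]} "e^tre"]

set text "e^tre franc`ais"
foreach {from to} $demo {
    regsub -all -- $from $text $to text
}
puts "$plain $escaped $text"
```

The \u escapes in the list are turned into characters when Tcl parses the list elements, which is why the table can be written in pure ASCII.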

This mapping should be sufficient for rendering Albanian, Croatian, Czech, Danish, Finnish, French, German, Icelandic, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish (including inverted exclamation and question marks), Swedish, and Turkish. If not, just edit this page! Enjoy ;-)

 set i18n_euro2u { 
           A`' \u0104 a`' \u0105 A'` \u0102 a'` \u0103
           A`` \uc5 A` \uc0 A'' \uc4 A' \uc1 {A[\^]} \uc2 A~ \uc3
           a`` \ue5 a` \ue0 a'' \ue4 a' \ue1 {a[\^]} \ue2 a~ \ue3
           A&E \uc6 a&e \ue6   
           C` \uc7 c` \ue7 C' \u0106 c' \u0107
           {C[\^]} \u010c {c[\^]} \u010d 
           D`` \u0110 d`` \u0111 D` \ud0 d` \uf0
           E`' \u011a e`' \u011b E~ \u0118 e~ \u0119
           E`` \u20ac E` \uc8 E'' \ucb E' \uc9 {E[\^]} \uca
           e` \ue8 e'' \ueb e' \ue9 {e[\^]} \uea
           {G[\^]} \u011e {g[\^]} \u011f        
           I`` \u0130 I` \ucc I'' \ucf I' \ucd {I[\^]} \uce
           i`` \u0131 i` \uec i'' \uef i' \ued {i[\^]} \uee
           L`` \ua3  L` \u0141 l` \u0142
           N~ \ud1 n~ \uf1 N' \u0143 n' \u0144 
           {N[\^]} \u0147  {n[\^]} \u0148
           O'' \ud6 O~ \ud5 O`` \u0150 O` \ud2 O' \ud3 {O[\^]} \ud4
           o`` \u0151 o` \uf2 o'' \uf6 o' \uf3 {o[\^]} \uf4 o~ \uf5
           O/ \ud8 o/ \uf8 O&E \u0152 o&e \u0153
           {R[\^]} \u0158 {r[\^]} \u0159
           S' \u015a s' \u015b s&s \udf {S[\^]} \u0160 {s[\^]} \u0161
           S` \u015e s` \u015f
           T~ \ude t~ \ufe T` \u0162 t` \u0163 
           U`' \u016e u`' \u016f
           U` \ud9 U'' \udc U' \uda {U[\^]} \udb          
           u` \uf9 u'' \ufc u' \ufa {u[\^]} \ufb
           y'' \uff y' \ufd Y' \udd
           Z' \u0179 z' \u017a Z` \u017b z` \u017c 
           {Z[\^]} \u017d {z[\^]} \u017e
           <&< \uab >&> \ubb _a \uaa _o \uba !~ \ua1 {[?]~} \ubf
 }
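 proc eurolish args {
        # With no arguments, convert a demo string.
        if {![llength $args]} {
           set args [list "!~Hola! Un co&eur franc`ais, e^tre A``lborgga&ede"]
        }
        # Join the words, so an argument containing spaces is not brace-quoted.
        set text [join $args]
        # Apply each (pattern, replacement) pair in table order.
        foreach {from to} $::i18n_euro2u {
           regsub -all -- $from $text $to text
        }
        return $text
 }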
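
The loop applies the pairs in table order, so a longer sequence has to be listed before any shorter sequence that is its prefix (A`` before A`, for instance). A small sketch of why, using a hypothetical helper apply_pairs and two made-up two-entry lists rather than the real table:

```tcl
# Hypothetical helper: same loop as in eurolish, but with the table
# passed in as an argument so that two orderings can be compared.
proc apply_pairs {pairs text} {
    foreach {from to} $pairs {
        regsub -all -- $from $text $to text
    }
    return $text
}

# Longer sequence first: A`` is consumed as a whole.
set good {A`` \u00c5 A` \u00c0}
# Shorter sequence first: A` eats the first back-apostrophe of A``.
set bad  {A` \u00c0 A`` \u00c5}

set r_good [apply_pairs $good "A``lborg"]
set r_bad  [apply_pairs $bad  "A``lborg"]
puts "$r_good / $r_bad"
```

With the second ordering a stray back-apostrophe is left behind, so anyone extending the table should probably keep the longer sequences in front.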

DKF: Extended to support a few other languages as well (most notably German). In fact, it should be possible to support most Western European languages (and Icelandic?) with the same translation mechanism...

RS: That's why it has now moved from its original name, Franlish, to Eurolish ;-) I moved the table out of the proc so you can display it for reference (use subst -nocommands $::i18n_euro2u). I have also just added the Euro currency sign (E``), which a Eurolish must include, the sterling sign (L``) for the others and, serious again, Czech and Polish.
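
As a rough illustration of the subst hint, here is what it does to a short, made-up excerpt of the table (the variable names excerpt and shown are assumptions for this sketch):

```tcl
# Hypothetical excerpt; the full list lives in ::i18n_euro2u.
set excerpt {A`` \uc5 A` \uc0 {A[\^]} \uc2}

# subst replaces the \u escapes with the characters they stand for and
# turns \^ into ^; -nocommands keeps it from treating [...] as a command.
set shown [subst -nocommands $excerpt]
puts $shown
```

Each ASCII sequence then appears next to the character it produces, which makes the table usable as a quick reference chart.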


See also An anomaly in case conversion for Turkish.