[Richard Suchenwirth] 2001-01-16 -- The ''Lish'' family is a set of [transliteration]s, all designed to convert strings in lowly 7-bit ASCII to the appropriate Unicode strings (see [Unicode and UTF-8]) in some major non-Latin writing systems. The name comes from the common suffix "lish" as in English, which is actually the neutral element of the family, faithfully returning its input ;-)

Some rules of thumb:
   * One *lish character should unambiguously map to one target character, wherever applicable.
   * One target letter should be represented by one *lish letter ([[A-Za-z]]), wherever applicable. Special characters and digits should be avoided for coding letters.
   * Mappings should look intuitive and/or follow established practices.
   * In languages that distinguish case, the substitutes for upper- and lowercase letters should also correspond casewise in lower ASCII (e.g. see [Ruslish]).

It all began with ''[Greeklish]'', which is not my invention but is used by Greeks on the Internet to write Greek without Greek fonts or character set support. I just extended the practice I found with the convention of marking accented vowels with a trailing apostrophe (so it's no longer a strict 1:1 transliteration).

The *lish procedures can be called with any number of arguments, for convenience. So you can just type
 arblish dby w Abw Zby
 ruslish Moskva i Leningrad
 greeklish Ellhnikh' Dhmokrati'a
and watch the output on any Unicode-enabled device (e.g. all Tk widgets that accept text).

BTW: printing Unicode text works quite nicely on NT: display it in a Tk text widget, then copy and paste it into Notepad with a Unicode font set.
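To make the idea concrete, here is a minimal sketch of how a *lish proc can be built on Tcl's [string map]. The proc name greeklish_sketch and its mapping table are hypothetical: the table is a small subset picked for the example above, not the actual table of [Greeklish]. It shows the trailing-apostrophe convention for accented vowels, uppercase keys mapping to uppercase letters, and the "any number of arguments" convenience.

```tcl
# Hypothetical, minimal Greeklish-like transliterator (not the real [Greeklish] table).
proc greeklish_sketch args {
    # [string map] tries its keys in list order at each position, so the
    # two-character "vowel + apostrophe" keys must precede the plain vowels.
    set map {
        a' \u03AC e' \u03AD h' \u03AE i' \u03AF o' \u03CC u' \u03CD w' \u03CE
        A \u0391 D \u0394 E \u0395
        a \u03B1 e \u03B5 h \u03B7 i \u03B9 k \u03BA l \u03BB m \u03BC
        n \u03BD o \u03BF r \u03C1 t \u03C4 u \u03C5 w \u03C9
    }
    # Accept any number of words, joined with single spaces.
    string map $map [join $args]
}

puts [greeklish_sketch Ellhnikh' Dhmokrati'a] ;# prints: Ελληνική Δημοκρατία
```

Because keys are tried in order, putting h' before h is what makes the accented form win; letters missing from the table simply pass through unchanged.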
Depending on job requirements and interests, the family grew, and now contains (see also [Languages supported by Lish]):
   * Arblish -- see [A simple Arabic renderer], r2l and context glyphs
   * [Chinlish] -- Pinyin words to Unicode, partial solution; add the words you want
   * [Eurolish] -- Danish, French, German, Icelandic, Italian, Spanish, Swedish
   * [Greeklish] -- the mother of all Lishes
   * [Hanglish] -- computing Hangul 2.0 Unicodes from Jamo equivalents
   * [Heblish] -- for Hebrew (r2l, context forms of letters explicitly indicated)
   * [Japlish] -- work in progress, here's a first shot
   * [Ruslish] -- for Russian
   * [APLish] -- for APL (not exactly a natural language)
   * Monglish -- for [Mongolian in Tcl strimjes]; different because the vertical writing requires a bottom-up design for pixel fonts, and the output is bitmap images of Mongolian text

For frequent use in multilingual contexts, one might introduce two-letter language code aliases: ar, gr, kr, iv, jp, ru. Source text with such embeddings just needs to be subst-ed to turn into nice Unicode. One minor flaw remains: text is left-justified even for Arabic and Hebrew, but in plain text you can't do much more than pad with spaces ;-(

With this set of transliterations, I've basically covered most of what the fabulous ''Bitstream Cyberbit'' (available from http://jefferson.village.virginia.edu/IBabble/download/cyberbit.html ) and other monster fonts have to offer. Any volunteers for "Thailish" ;-?

'''Future plans''': The parts of the Lish family developed over the years and were only recently put under their common wrapper. You can see this evolution in the code. When I have some time, I'd like to unify concepts and interfaces further, to turn the whole thing into [the i18n package]. Most parts of the Lish family are coming together under the roof of ''taiku'' (see [taiku goes multilingual]) or its little [PocketPC] brother [iKu].

----
See [lish2html] for a filter that substitutes embedded *lish calls from and to HTML files.
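The two-letter alias idea for multilingual source text (and the embedded-call substitution behind [lish2html]) can be sketched with [interp alias] and [subst]. ruslish_sketch is a hypothetical stand-in covering only the six letters of the example word; a real setup would alias ru, gr, etc. to the actual *lish procs.

```tcl
# Hypothetical stand-in for [Ruslish]: just enough letters for "Moskva".
proc ruslish_sketch args {
    string map {
        M \u041C o \u043E s \u0441 k \u043A v \u0432 a \u0430
    } [join $args]
}

# Two-letter language alias: "ru" forwards its arguments to the proc.
interp alias {} ru {} ruslish_sketch

# Source text with an embedded call. -novariables and -nobackslashes
# restrict subst to command substitution, so any "$" or "\" in the
# surrounding text is left untouched.
set source {Greetings from [ru Moskva].}
puts [subst -novariables -nobackslashes $source] ;# prints: Greetings from Москва.
```

Restricting subst to command substitution is a design choice for this sketch: only the bracketed *lish calls are expanded, while ordinary text passes through as is.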
----
[Arts and crafts of Tcl-Tk programming]
----
[Category Characters]