ITRANS to Unicode Converter

WJG (26/09/09) Sometimes it is really easy to do relatively complex jobs in Tcl. But I guess regular Tclers know that anyway. Handling a lot of texts, possibly in mixed romanised scripts, is a daily task for me. SCIM on Linux goes a long way to help, but I constantly need to switch my input system between a number of encodings. Sometimes it helps to use ASCII key combinations. This is where ITRANS [2] comes in useful. ITRANS is a system developed as a means of depicting Indic languages using the base ASCII character set. So, it's easy! Imagine my nightmares of using mixed Chinese characters, Pinyin, romanised Sanskrit (IAST) [1] and English key mappings! It's a lot of switching about. Hence this little routine. I guess that an ITRANS to Devanagari converter is now called for.

#!/bin/sh
# the next line restarts using tclsh \
exec tclsh "$0" "$@"

#===============
# ITrans-Unicode.tcl
#===============
# Author:   William J Giddings
# Date:     26/09/09
#===============

#---------------
# convert ITRANS ascii sequence to Unicode
#---------------
proc ITrans { key } {

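    # Longer keys (.rr, .ll) are listed before their prefixes (.r, .l):
    # string map tries keys in list order, so the reverse order would
    # leave the longer keys unreachable.
    return [string map {
    "aa" "ā"
    "AA" "Ā"
    "ii" "ī"
    "II" "Ī"
    "uu" "ū"
    "UU" "Ū"
    ".rr" "ṝ"
    ".RR" "Ṝ"
    ".r" "ṛ"
    ".R" "Ṛ"
    ".ll" "ḹ"
    ".LL" "Ḹ"
    ".l" "ḷ"
    ".L" "Ḷ"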
    ".M" "ṁ"
    ".m" "ṃ"
    ".h" "ḥ"
    ".H" "Ḥ"
    ";n" "ṅ"
    ";N" "Ṅ"
    "~n" "ñ"
    "~N" "Ñ"
    ".t" "ṭ"
    ".T" "Ṭ"
    ".d" "ḍ"
    ".D" "Ḍ"
    ".n" "ṇ"
    ".N" "Ṇ"
    ";s" "ś"
    ";S" "Ś"
    ".s" "ṣ"
    ".S" "Ṣ"
    } $key]

}

# Demonstration code
puts [ITrans "praj~naapaaramitaa"]
puts [ITrans "ratnagu.nasa.Mcayagaathaa"]
puts [ITrans "K.r.s.na"]

The demonstration code produces this output:

prajñāpāramitā
ratnaguṇasaṁcayagāthā
Kṛṣṇa
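
One detail of [string map] matters for tables like this one: keys are tried in list order, so a short key such as .r listed ahead of .rr will always win and the longer key is never reached. The following is only a sketch of one way to guard against that, assuming a small helper (sortMapByKeyLength and compareKeyLengthDesc are hypothetical names, not part of the script above) that reorders any mapping so longer keys come first. The characters are written as \u escapes so the snippet does not depend on the file encoding.

```tcl
# Hypothetical helper: reorder a [string map] list so that longer keys
# come first and are not shadowed by shorter keys that prefix them.
proc sortMapByKeyLength {mapping} {
    # Group the flat key/value list into pairs
    set pairs {}
    foreach {key value} $mapping {
        lappend pairs [list $key $value]
    }
    # Sort the pairs by descending key length
    set sorted [lsort -command compareKeyLengthDesc $pairs]
    # Flatten the pairs back into a key/value list
    set result {}
    foreach pair $sorted {
        lappend result [lindex $pair 0] [lindex $pair 1]
    }
    return $result
}

# Comparison command for lsort: longer keys sort first
proc compareKeyLengthDesc {a b} {
    expr {[string length [lindex $b 0]] - [string length [lindex $a 0]]}
}

# \u1E5B is r with dot below, \u1E5D is r with dot below and macron
set shadowed [list ".r" "\u1E5B" ".rr" "\u1E5D"]
set naive  [string map $shadowed "pit.rr"]
set sorted [string map [sortMapByKeyLength $shadowed] "pit.rr"]
puts [expr {$naive eq "pit\u1E5Br"}]   ;# 1: .r matched first, a plain r is left over
puts [expr {$sorted eq "pit\u1E5D"}]   ;# 1: .rr matched as a whole
```

Since lsort is stable, keys of equal length keep their original relative order, so the rest of a table is left as written.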

DGP This script should use the -encoding option of tclsh to be sure that tclsh uses the encoding in which all the characters in this script are stored in the file. This is probably utf-8.
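
Besides starting the script with tclsh -encoding utf-8, another way to sidestep the question of how the file is stored is to keep the source pure ASCII and write the replacement characters as \u escapes. The sketch below assumes that approach; ITransEscaped is a hypothetical name, and only a handful of the mappings are shown. It is an illustration of the idea, not a replacement for the proc above.

```tcl
# Pure-ASCII variant of the mapping (partial, for illustration only).
# Code points: \u0101 a-macron, \u1E5B r-dot-below, \u1E63 s-dot-below,
# \u1E47 n-dot-below, \u00F1 n-tilde.
proc ITransEscaped {key} {
    # [list] is used instead of braces so that the \u escapes are substituted
    return [string map [list \
        "aa" "\u0101" \
        ".r" "\u1E5B" \
        ".s" "\u1E63" \
        ".n" "\u1E47" \
        "~n" "\u00F1" \
    ] $key]
}

# Emit UTF-8 on stdout regardless of the system default encoding
fconfigure stdout -encoding utf-8
puts [ITransEscaped "K.r.s.na"]
```

The braced list in the original proc is easier to read; the escaped form trades that readability for independence from the encoding of the script file.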


jbr - 2009-11-13 22:23:24

Here are a bunch of other characters with a TeX mapping. [3]

WJG (25/03/13) It might be worth noting, for any Linux users who happen to be Sanskritists or who work with any other Indic language, that a while ago I submitted an IAST module based upon the above script for distribution with the M17N library used by SCIM to handle multi-language input. This certainly makes implementing complex input methods more manageable. This code snippet need not be forgotten, however, as there are quite a number of texts on the web encoded in ITRANS which can be downloaded and converted using this simple proc.
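
Converting a downloaded text could look roughly like the sketch below. ITransFile and the file names in the closing comment are hypothetical, and the small stand-in ITrans is defined only so the snippet runs by itself; with the full proc from this page already loaded, the stand-in is skipped. Note that everything in the file is passed through the mapping, so any English headers or markup in a downloaded text would be converted too.

```tcl
# Stand-in so the sketch runs alone; in practice the full ITrans proc
# from this page would already be defined and this block is skipped.
if {[llength [info commands ITrans]] == 0} {
    proc ITrans {key} {
        return [string map [list "aa" "\u0101" "ii" "\u012B" "uu" "\u016B"] $key]
    }
}

# Hypothetical helper: read an ITRANS file, write the converted text as UTF-8
proc ITransFile {inPath outPath} {
    set in [open $inPath r]
    fconfigure $in -encoding utf-8 ;# ITRANS is plain ASCII, a subset of UTF-8
    set text [read $in]
    close $in

    set out [open $outPath w]
    fconfigure $out -encoding utf-8
    puts -nonewline $out [ITrans $text]
    close $out
}

# Example call with hypothetical file names:
# ITransFile gita-itrans.txt gita-iast.txt
```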