Richard Suchenwirth 2001-02-06 -- Hangul is the Korean writing system. Each syllable is represented by an arrangement, often square, of its constituent letters ("jamo") in either left-right or top-bottom fashion. Transliteration is the element-by-element conversion of text from one writing system to another (if the target is English/Latin, it is also called romanization). "Hanglish" is the name of the following romanization scheme from The Lish family, chosen in analogy to Greeklish, which is often used on the Net to write Greek in Latin letters.
There is an ISO agreement (ISO/TC46/SC2/WG4, 1992) on Hangul transliteration from which I slightly deviate:
One could still express the composition with AI for "ae" and VI for "yae". For best adaptation to OCR/interpretation needs, however, I prefer to use the two left-over letters: F for "yae", R for "ae".
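The two spellings are easy to reconcile mechanically. A minimal sketch, assuming a helper name normalizeVowels that is purely illustrative and not part of the code on this page, folds the digraphs AI and VI into the single letters R and F:

```tcl
# Hypothetical helper: fold the digraph spellings into the single-letter ones.
# Assumes the input is one Hanglish syllable or a hyphen-separated word.
proc normalizeVowels {hanglish} {
    # AI -> R ("ae"), VI -> F ("yae"); uppercase first so "ai" and "AI" both match
    return [string map {AI R VI F} [string toupper $hanglish]]
}

puts [normalizeVowels gai]   ;# GR
puts [normalizeVowels GVI]   ;# GF
```

This is safe because neither AI nor VI is itself a vowel spelling in the scheme, so no valid syllable is changed by the mapping.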
After so much theory, here's the code:
    proc hangul2hanglish {numuc} {
        # takes a numeric Unicode so far (until scan works, from 8.1b1)
        set ncount [expr {21*28}]
        set index [expr {$numuc - 0xAC00}] ;# offset of Unicode 2.0 Hangul
        append res [lindex {G GG N D DD L M B BB S SS Q J JJ C K T P H} \
            [expr {int($index/$ncount)}]]
        append res [lindex {A R V F E EI X XI O OA OR OI Y U UE UEI UI Z W WI I} \
            [expr {int(($index%$ncount)/28)}]]
        append res [lindex {"" G GG GS N NJ NH D L LG LM LB LS LT LP LH \
            M B BS S SS Q J C K T P H} \
            [expr {$index%28}]]
        return $res
    }

    proc hanglish2uc {hanglish} {
        # convert a Hanglish string to one Unicode 2.0 Hangul if possible
        set L ""; set V "" ;# in case regexp doesn't hit
        regexp {^([GNDLMBSQJCKTPH]+)([ARVFEIXOYUZW]+)([GNDLMBSQJCKTPH]*)$} \
            [string toupper $hanglish] -> L V T ;# lead consonant - vowel - trail cons.
        if {$L=="" || $V==""} {return $hanglish}
        set l [lsearch -exact {G GG N D DD L M B BB S SS Q J JJ C K T P H} $L]
        set v [lsearch -exact {A R V F E EI X XI O OA OR OI Y U UE UEI UI Z W WI I} $V]
        set t [lsearch -exact {"" G GG GS N NJ NH D L LG LM LB LS LT LP LH \
            M B BS S SS Q J C K T P H} $T] ;# trailing consonants
        # any part missing from its table: give the input back unchanged
        # (tested explicitly, so no separate [min] command is needed)
        if {$l<0 || $v<0 || $t<0} {return $hanglish}
        set uc [expr {$l*21*28 + $v*28 + $t + 0xAC00}]
        return [format %c $uc]
    }

    proc hanglish {args} {
        # tolerant converter: makes Unicode 2.0 Hangul where possible
        set res ""
        foreach i $args {
            set word ""
            foreach {from to} {
                ai r
                vi f
            } {regsub -all $from $i $to i}
            foreach j [split $i "-"] {
                set t [hanglish2uc $j]
                if {$j==$t} {set word $i; break} ;# all syllables must fit
                append word $t
            }
            lappend res $word
        }
        return $res
    }
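Both directions rest on the same arithmetic: the precomposed syllables start at U+AC00 and are laid out as 19 lead consonants by 21 vowels by 28 trailing positions (the first of which means "no trailing consonant"). As a worked check, here is a small sketch that splits a code point into its three table indices and rebuilds it. The proc names decompose and compose are illustrative only.

```tcl
# Illustrative helpers, not part of the routines above.
# Split a Hangul syllable code point into {lead vowel trail} indices.
proc decompose {codepoint} {
    set index [expr {$codepoint - 0xAC00}]
    set lead  [expr {$index / (21 * 28)}]
    set vowel [expr {($index % (21 * 28)) / 28}]
    set trail [expr {$index % 28}]
    return [list $lead $vowel $trail]
}

# Rebuild the code point from the three indices.
proc compose {lead vowel trail} {
    return [expr {$lead * 21 * 28 + $vowel * 28 + $trail + 0xAC00}]
}

puts [decompose 0xD55C]             ;# 18 0 4
puts [format %X [compose 18 0 4]]   ;# D55C
```

Looking the indices 18, 0 and 4 up in the three tables gives H, A and N, so U+D55C is HAN in Hanglish.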
Usage example: [hanglish Se-qul] produces the Hangul for South Korea's capital, Seoul. Note that the circle jamo is written as Q, although it is silent at the beginning of a syllable (at the end, it is /ng/).
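The example can also be checked in the reverse direction. The following is only a sketch, assuming a Tcl version in which scan supports %c; the name hangul2hanglishString is hypothetical, and the three tables are repeated from the code above so that the snippet runs on its own.

```tcl
# Sketch of a string-level reverse converter (hypothetical name).
proc hangul2hanglishString {text} {
    set leads  {G GG N D DD L M B BB S SS Q J JJ C K T P H}
    set vowels {A R V F E EI X XI O OA OR OI Y U UE UEI UI Z W WI I}
    set trails {"" G GG GS N NJ NH D L LG LM LB LS LT LP LH
                M B BS S SS Q J C K T P H}
    set syllables {}
    foreach char [split $text ""] {
        scan $char %c code   ;# numeric code point of this character
        if {$code < 0xAC00 || $code > 0xD7A3} {
            lappend syllables $char   ;# not a precomposed syllable: keep as is
            continue
        }
        set index [expr {$code - 0xAC00}]
        set l [lindex $leads  [expr {$index / 588}]]
        set v [lindex $vowels [expr {($index % 588) / 28}]]
        set t [lindex $trails [expr {$index % 28}]]
        lappend syllables $l$v$t
    }
    return [join $syllables -]
}

puts [hangul2hanglishString \uC11C\uC6B8]   ;# SE-QUL
```

The result comes back in upper case; since the conversion to Hangul uppercases its input anyway, SE-QUL and Se-qul are equivalent.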
These routines have been incorporated into taiku (see taiku goes multilingual), which also introduces some liberalisations: for Q you can write NG, or you can omit it at syllable-initial position, so se-ul has the same effect there. For conversion in both directions, with a GUI, see A little Hangul converter.
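How taiku implements these liberalisations is not shown here. As a rough sketch of the idea only, the relaxed spellings could be rewritten into strict Hanglish one hyphen-separated syllable at a time; the name strictHanglish is an assumption of this example, not taiku's API.

```tcl
# Hypothetical normalizer, not taiku's actual code.
proc strictHanglish {word} {
    set out {}
    foreach syl [split [string toupper $word] -] {
        # NG is accepted as a spelling of Q; no strict syllable contains "NG"
        set syl [string map {NG Q} $syl]
        # a syllable that starts with a vowel letter gets the silent Q in front
        if {[regexp {^[ARVFEIXOYUZW]} $syl]} {
            set syl Q$syl
        }
        lappend out $syl
    }
    return [join $out -]
}

puts [strictHanglish se-ul]   ;# SE-QUL
puts [strictHanglish gang]    ;# GAQ
```

The NG substitution is unambiguous because neither the lead table nor the trail table contains the sequence NG.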