Heblish

Richard Suchenwirth 2001-01-04 - "Heblish" ("Hebrew from English", termed in analogy to Greeklish) is a transliteration of the Hebrew alphabet from 7-bit ASCII to Unicode (cf. Unicode and UTF-8, The Lish family). Mostly lowercase letters are used (Hebrew does not distinguish case), final forms of five Hebrew letters are mapped to the uppercase equivalents (K, M, N, F, Y).

WikiDbImage heblish.jpg

Direction is changed to right-to-left, except for letterless words (e.g. numbers). Usage example: "heblish irwsliM" gives the Hebrew spelling for Jerusalem. See also bidi rendering.

 array set i18n_a2hb {
    a \u5d0 b \u5d1 g \u5d2 d \u5d3 h \u5d4 w \u5d5 z \u5d6 x \u5d7 u \u5d8
    i \u5d9 K \u5da k \u5db l \u5dc M \u5dd m \u5de N \u5df n \u5e0 o \u5e1
    e \u5e2 F \u5e3 f \u5e4 Y \u5e5 y \u5e6 q \u5e7 r \u5e8 s \u5e9 t \u5ea
    ( ) ) (
 }

 proc heblish {args} {
    global i18n_a2hb
    set res {}
    foreach word [split $args] {
        if [regexp {^[^A-Za-z]+$} $word] {
            lpush res $word
        } else {
            set t ""
            foreach i [split $word ""] {
                if {[array names i18n_a2hb $i]!=""} {
                    prepend t $i18n_a2hb($i)
                } elseif {[array names i18n_a2hb [string tolower $i]]!=""} {
                    prepend t $i18n_a2hb([string tolower $i]) ;# tolerant
                } else {
                    prepend t $i
                }
            }
            lpush res $t
        }
    }
    return $res
 }

 proc lpush {_list what} {
    upvar $_list L
    if ![info exists L] {set L {}}
    set L [concat [list $what] $L]
 }

 proc prepend {_var string} {
    upvar $_var var
    if ![info exists var] {set var ""}
    set var $string$var
 }

KBK - What's the current thinking on Unicode combining forms? This setup leaves us without vowel points, dotted sin/shin, and dagesh / mapiq. (Obviously, the text widget doesn't do them either.) - RS: Right. The combining characters (U+05B0..05C2) are contained in Cyberbit, but are rendered non-combined (both by Tcl's text widget and M$'s Notepad), so they're of little use yet. I know too little about Truetype internals, so have no way to check whether the font lacks that information, or it is not used by the renderer. Sorry for the moment...

The "unicodeedit.tcl" script has very similar to the above logic in place, plus a visual (on-window, mousable) keyboard for a variety of options (default ISO-8859-1, also ISO-8859-8). I found it on a website, without attribution, nor internal attribution last night and foolishly forgot to also save the .fr URL.

It does adjacent-character-logic RtoL and LtoR keying, saves to Unicode, UTF-8, or translates to any other encoding supported by tcl's encoding/ tables. (Has anyone done an Olb 8-bit font .enc for pre-pointed glyphs?) I haven't fully explored the font, to see if there are more than sin/shin dots, but believe that there are.

As for dagesh/mapiq, sin/shin dots, text effects like dotted-sin+holam become a choice of the user, in the long run, and really, both should be available.

As posted "unicodeedit.tcl" ignores points and cantilation, and only puts up an ISO-8859-8 (or perhaps Windows-1255) "keyboard". With the right fonts available, it _is_ very easily extendable to either use (Olb pre-pointed) keyboard "keys", or to use the U+0591..U+05C4 range as overstrikes. But I have yet to find a great font for overstrike (Dec 2002), in a Type1 or BDF I'm not sure if David.ttf, has overstrike spacing encoded (yet). Of course, the widely useed Elronet fonts are plain 8859-8, no points? m$ Webfonts Times, Courier and Arial, of course, have the full range, or nearly so, but I keep reading reports of bad rendering.

David/DavidBD/DavidTR .ttf fonts have the following in the glyph post table: ISO-8859-1, plus the following names sheva hatafsegol hatafpatah hatafqamats hiriq tsere segol patah qamats holam qibuts dagesh meteg maqaf rafe paseq shindot sindot sofpasuq

alef bet gimel dalet he vav zayin het tet yod kaffinal kaf lamed memfinal mem nunfinal nun samekh ayin pefinal pe tsadifinal tsadi qof resh shin tav

vavdbl vavyod yoddbl geresh gershayim lefttoright righttoleft

newsheqel finalkafwithsheva finalkafwithqamats lamedwithholam lamedwithdageshandholam alternativeayin alefwithpatah alefwithqamats alefwithmapiq finalpewithdagesh vavwithholam betwithrafe kafwithrafe pewithrafe aleflamed shinwithshindot shinwithsindot shinwithdageshandshindot shinwithdageshandsindot betwithdagesh gimelwithdagesh daletwithdagesh hewithmapiq vavwithdagesh zayinwithdagesh tetwithdagesh yodwithdagesh finalkafwithdagesh kafwithdagesh lamedwithdagesh memwithdagesh nunwithdagesh samekhwithdagesh pewithdagesh tsadiwithdagesh qofwithdagesh reshwithdagesh shinwithdagesh and tavwithdagesh

Portions (c) Monotype, portions (c) Type Solutions Inc., portions Kivun Computers Ltd. in the 1033 index. This info from a ttfdump.pl output of David.ttf (but bold and transparent variants appear to be nearly identical).

UCD-2 [L1 ] Hebrew PDF Charts [L2 ]:

Hebrew [L3 ] U+0590 .. Combining forms
Currency [L4 ] U+20AA
Presentation [L5 ] U+FB1D .. Combined
Discussion [L6 ] Middle-Eastern Scripts

One Hebrew<->ASCII transliteration of sdome following is "Michigan-Clairmont", see http://www.dreamwater.org/bccox/heb_unic_conv.html for the table, and a converter.

Ruffy

I run a Windows 2000 box and run tcl via a DOS window.

When I open a .txt file containg Hebrew, and write to a .txt file the Hebrew lines therein, the Hebrew shows properly in the output file. But parallel output to the DOS window (stdout) does not show the Hebrew lettering; Even if I change stdout's encoding via "fconfigure stdout -encoding xxxx"; I've tried UTF-8, ISO8859-1, UNICODE but none of these encodings work. I'd like to see Hebrew on the DOS screen to debug processing I'll undertake.

Can you help me out? Much obliged and thanks.

MG I suspect (read as: largely guess) that what you need to do is use

  set fid [open $file r]
  fconfigure $fid -translation binary

Category Characters

Category Human Language