bidi rendering

Summary

Richard Suchenwirth: bidi is short for bidirectional and is especially used in situations where right-to-left (r2l) text (e.g. Arabic, Hebrew - see Heblish) is embedded in left-to-right text or vice versa. The Unicode standard specifies that text is always stored in logical order, so in memory (or in files), Arabic and Hebrew run in the order of ascending addresses, exactly like left-to-right text, and it is the task of the rendering software to restore the habitual direction.
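To make "logical order" concrete, here is a minimal sketch using the Hebrew word "shalom" (the variable names are only illustrative). Index 0 holds the letter that is read first, even though a bidi-aware renderer draws it at the right edge of the word:

```tcl
# Hebrew "shalom" in logical (reading) order: shin, lamed, vav, final mem
set word "\u05e9\u05dc\u05d5\u05dd"

# The first stored character is the first one read (shin) ...
set first [scan [string index $word 0] %c]
# ... and the last stored character is the last one read (final mem)
set last [scan [string index $word end] %c]

puts [format "first stored: U+%04X, last stored: U+%04X" $first $last]
```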

Description

Tk has no bidi facilities yet, so Unicode characters, even those from r2l scripts, must be stored naively left-to-right (that is, in visual order) if they are to appear correctly on screen. In taiku (taiku goes multilingual) I ran into the problem that my simple "printer driver", which lets IE do the work, failed on Hebrew, because IE has bidi handling and thus wrongly re-reverses Hebrew words. OK, so here's a simple bidi handler that reverses sequences of r2l characters. The conditions for prepending a character to a swapping sequence are:

the character is in the Arabic or Hebrew range of Unicode, or:
the character is in a set of "direction-independent" ones (presently dash and space), in which case it is only prepended if the buffer is not empty.

All other characters append the buffer to the output and flush it before being inserted themselves. As IE is also able to render Arabic into context-dependent glyphs, that step was not necessary here (otherwise, see A simple Arabic renderer). Handling of mixed lines with both English and Arabic words is satisfactory. Embedded Indo-Arabic digits seem to turn off IE's bidi totally, so better avoid them for now.

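proc fixBidi s {
   set res {}; set buf {}
   foreach c [split $s ""] {
       # r2l characters always go into the buffer; space and dash only
       # when an r2l run is already in progress
       if {[r2l $c] || ([regexp {[ -]} $c] && $buf ne "")} {
           set buf $c$buf ;# prepend, which reverses the run
       } else {
           append res $buf$c ;# empty buffer won't hurt
           set buf {}
       }
   }
   append res $buf ;# in case some were left at end of line; also the return value
}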
proc r2l c {
   scan $c %c uc
   expr {$uc >= 0x05b0 && $uc <= 0x065f
      || $uc >= 0x066a && $uc <= 0x06bf
   } ;# Arabic context glyphs need not be reverted for IE
}
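To watch the buffer at work, the rule described above can be re-implemented with a step log. This is only an illustrative sketch: isR2L, traceBidi and codepoints are hypothetical names, not part of the original code, and lassign needs Tcl 8.5 or later. Output is printed as code points so that a bidi-aware terminal cannot reorder it.

```tcl
# Hypothetical helpers that mirror the rule described above
proc isR2L c {
    scan $c %c uc
    expr {($uc >= 0x05b0 && $uc <= 0x065f) || ($uc >= 0x066a && $uc <= 0x06bf)}
}
proc codepoints s {
    set out {}
    foreach c [split $s ""] { lappend out [format U+%04X [scan $c %c]] }
    return $out
}
proc traceBidi s {
    set res {}; set buf {}; set steps {}
    foreach c [split $s ""] {
        if {[isR2L $c] || ([regexp {[ -]} $c] && $buf ne "")} {
            set buf $c$buf       ;# r2l, or space/dash inside a run: prepend
        } else {
            append res $buf$c    ;# anything else: flush the buffer, then emit
            set buf {}
        }
        lappend steps [list $c $buf]   ;# remember buffer state after this character
    }
    append res $buf
    list $res $steps
}

# "ab ", then alef, bet, space, gimel, then "!"
set in "ab \u05d0\u05d1 \u05d2!"
lassign [traceBidi $in] out steps
foreach step $steps {
    lassign $step c buf
    puts "[codepoints $c] -> buffer: [codepoints $buf]"
}
puts "result: [codepoints $out]"
```

The r2l run "alef, bet, space, gimel" comes out as "gimel, space, bet, alef", while "ab " and "!" stay where they were. Note that a space or dash at the very end of an r2l run is also prepended, so it ends up in front of the reversed run.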

The next routine "fixes" a file with partial bidi content, so that it can be correctly rendered by bidi-aware software such as Outlook, WordPad, PowerPoint, or IE. Note that it overwrites the file in place.

proc fixBidiFile {filename} {
   set fp [open $filename]
   fconfigure $fp -encoding utf-8
   set data [read -nonewline $fp] ;# drop the final newline, so no empty line is added
   close $fp
   set fp [open $filename w]
   fconfigure $fp -encoding utf-8
   foreach line [split $data \n] {
      puts $fp [fixBidi $line]
   }
   close $fp
}
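Because the routine above overwrites its input, the original text is gone unless a copy is kept. A possible variant, sketched here under the assumption that a separate output file is acceptable, takes the per-line command as a parameter and leaves the input untouched. The name convertFileLines and the demo file names are hypothetical, and {*} needs Tcl 8.5 or later.

```tcl
# Hypothetical helper: apply a per-line command to a UTF-8 file and
# write the result to a different file.
proc convertFileLines {inName outName cmd} {
    set in [open $inName]
    fconfigure $in -encoding utf-8
    set data [read -nonewline $in]   ;# ignore the final newline
    close $in
    set out [open $outName w]
    fconfigure $out -encoding utf-8
    foreach line [split $data \n] {
        puts $out [{*}$cmd $line]    ;# cmd is a command prefix, e.g. fixBidi
    }
    close $out
}

# Self-contained demo with a stand-in command; with the procs from this
# page loaded, the last argument would simply be fixBidi.
set f [open demo-in.txt w]
fconfigure $f -encoding utf-8
puts $f "one\ntwo"
close $f
convertFileLines demo-in.txt demo-out.txt {string toupper}

set f [open demo-out.txt]
fconfigure $f -encoding utf-8
set converted [read -nonewline $f]
close $f
file delete demo-in.txt demo-out.txt
```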

Devanagari i: The Indian alphabets normally run from left to right. In at least the Devanagari alphabet there is one special case, though: the vowel (short-i) is written to the left of its consonant, even though it follows it as phoneme. One good example word is हिन्दी HindI (the second being a long-i), where the letter order from left to right is iHndI. On Windows XP, some renderers for Devanagari do this transposition automatically, others (including the one that Tk uses) don't.

For a fix to this problem, a regsub should be enough to locally reverse the character order:

regsub -all {(.)\u093f} $str \u093f\\1 str
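As a quick check, here is the same substitution applied to the example word, spelled out with \u escapes so the code points are visible (the variable names are only illustrative):

```tcl
# Hindi in logical order: ha, short-i sign, na, virama, da, long-i sign
set str "\u0939\u093f\u0928\u094d\u0926\u0940"

# Move every short-i sign (U+093F) in front of the character it follows
regsub -all {(.)\u093f} $str "\u093f\\1" visual

foreach c [split $visual ""] {
    puts [format U+%04X [scan $c %c]]
}
```

After the swap the short-i sign comes first, followed by ha; the long-i sign (U+0940) is left alone. The pattern swaps the sign with exactly one preceding character, so consonant clusters may need more than this.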

RS 2008-06-27: Another problem in Tk rendering of Devanagari, for which I have no solution, can also be demonstrated on Hindi हिन्दी. In proper rendering, the "n" does not have its vertical bar, but rather just sticks to the following "d". The Tk rendering provides a full "n" with subscript \u094D (silent vowel indicator), like हिन् दी. I'm not sure how much this inhibits readability, or how "wrong" it looks to experienced users of Devanagari...

APN: It looks completely wrong to me. Though I'm not a native Hindi speaker per se, I'm close enough, I think. For what it's worth, Firefox 2 also has this rendering problem. I'm surprised, because I had thought (naively, I now realize) that Windows does all the rendering and thus all apps would display the same way.

RS 2008-07-03: Fixed thanks to help from JH and KBK - on Win 2000 and up, one must select the "Complex scripts" support. See Wikipedia (Windows XP and Server 2003). Then Tk renders correctly.

Matta 2011-05-10 20:18:42: Bidirectional languages in Tk now work fine on Windows. Arabic is presented correctly, although selection of text in a text box is weird, even if the text box is right-justified or right-aligned. So if the user is likely to copy and paste, do not reverse the order of r2l characters on Windows.

Unfortunately, Tk on Linux still shows the Arabic letters (i) in left-to-right order and (ii) disconnected, both of which render the text unreadable. Well, it can be read, but it is like decrypting a code!

Superlinux 2013-July-5: The Arabic problem described by Matta above is now about 80% solved. Please read Arabic Character Renderer For Readability In TCL/Tk.

Superlinux - 2014-03-18 17:38:51

I made another solution to this: https://wiki.tcl-lang.org/39542. It displays Arabic correctly alongside English. However, when you save the text, the Arabic is saved in reverse order. The English is not touched at all. I haven't tested printing on paper using this method, but I am 90% sure that the Arabic will also be printed reversed.

Linuxpeter - 2022-04-11 14:15

My solution called TkBidi [L1 ] should work 100% in any Tk widget for Arabic, Persian, Urdu and Hebrew. It has been extensively tested on our own Tcl application BiblePix [L2 ].

Natural Languages

Category Human Language

Arts and Crafts of Tcl-Tk Programming