[Richard Suchenwirth] 2001-06-21 - A Python user from Latvia asked
how to make a Tk widget display Cyrillic (Russian) characters when text is typed into it on a normal Latin (English) keyboard.

This is related to the more general problem of so-called "input managers": software that maps keyboard input to widget output in a non-trivial way. Each platform has existing solutions, but we don't have a generalized approach in Tk yet. So here comes my tiny "Tk input manager", or briefly ''tim'', which takes the first steps in that direction.

As required, the first application is for Cyrillic, but the
principle is clear enough to add other mappings with little effort (for
a mapping ''foo'', add a proc ''foo'' and a proc setup''foo''). One
problem is that the Cyrillic alphabet contains more than the 26 [[A-Z]]
characters of the Latin one, so a sort-of "dead key" approach had to be
taken. In the sketch below, I disable the exclamation mark on its own, but use it
at the beginning of two-character sequences that produce one Cyrillic
character each (see [Ruslish] for a discussion of this approach). To get
a real exclamation mark, type a space after it (thanks to rmax for that tip!).

The widgets (text or entry, as both accept the ''insert insert'' method)
created with the prefix ''tim::Russian'' get the ''Russian''
bindings prepended to their [bindtags] list; apart from that, they inherit
everything from the original widgets.

A real input manager of course needs some more work: for example, toggling the keyboard encoding at runtime, displaying the current scheme, etc. Most challenging, however, is the extension to writing systems with a large character set: Chinese, Japanese, Korean...
See [taiku goes multilingual]
 namespace eval tim {
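	# Public constructor: "tim::Russian entry .e ?options?" creates the
	# widget and attaches the Russian bindings to it.
	proc Russian {type w args} {makeit Russian $type $w $args}

	# Create widget $w of class $type, set up the bindings for mapping
	# $name on first use, and put $name first in the widget's bindtags.
	proc makeit {name type w argl} {
		variable know
		if {![info exists know($name)]} {setup$name}
		eval [list $type $w] $argl
		bindtags $w [concat $name [bindtags $w]]
		set w
	}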
	proc setupRussian {} {
		variable know; set know(Russian) 1
		foreach {in out} {
		    ! "{}" !<space> !
            A  \u0410 B  \u0411 V  \u0412 G \u0413 D \u0414 E \u0415
            !Z \u0416 Z  \u0417 I  \u0418 J \u0419 K \u041A L \u041b
            M  \u041c N  \u041d O  \u041e P \u041f R \u0420 S \u0421
            T  \u0422 U  \u0423 F  \u0424 X \u0425 C \u0426 !C \u0427
            !S \u0428 !T \u0429 Q  \u042a Y \u042b H \u042c !E \u042d
            !U \u042e !A \u042F !O \u0401
            a  \u0430 b  \u0431 v  \u0432 g \u0433 d \u0434 e \u0435
            !z \u0436 z  \u0437 i  \u0438 j \u0439 k \u043a l \u043b
            m  \u043c n  \u043d o  \u043e p \u043f r \u0440 s \u0441
            t  \u0442 u  \u0443 f  \u0444 x \u0445 c \u0446 !c \u0447
            !s \u0448 !t \u0449 q  \u044a y \u044b h \u044c !e \u044d
            !u \u044e !a \u044f !o \u0451
            
		} {bind Russian $in "%W insert insert $out; break"}
	}
 }
 #------------------------------- demo and test code, usage examples
 # Hint: type in e.g. "Moskva i Leningrad - dva gorody Rossii".
 # or: "!A ne zna!u nicevo, a ne ponema!u nicevo."
 
 tim::Russian entry .e
 tim::Russian text .t
 eval pack [winfo children .] -fill x
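
As an illustration of the "add a proc ''foo'' and a setup''foo''" recipe, here is a minimal sketch of a second mapping. The name ''Greek'' and its four-letter table are assumptions made up for this example and are not part of the original code; the sketch relies on ''tim::makeit'' as defined above and needs a running Tk.

 namespace eval tim {
     # Hypothetical second mapping, following the same pattern as Russian
     proc Greek {type w args} {makeit Greek $type $w $args}
     proc setupGreek {} {
         variable know; set know(Greek) 1
         foreach {in out} {
             a \u03b1 b \u03b2 g \u03b3 d \u03b4
         } {bind Greek $in "%W insert insert $out; break"}
     }
 }
 # Usage: typing "abgd" into this entry should insert alpha, beta, gamma, delta
 tim::Greek entry .g
 pack .g -fill x

A complete mapping would of course need the whole alphabet, and probably a dead-key prefix like the one used for Russian.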
----

[Dimitry Golubovsky] - If you have an X Window keyboard switch set up to generate keycodes for non-Latin characters (e.g. Cyrillic), you may also use the following approach.

Put the following someplace in your script to be executed at startup:

 # Pairs of <Keysym> and \u Unicode character value. Add a pair for every
 # character you want to map: this is not limited to Cyrillic.
 # (These notes must stay outside the braced list, where # is ordinary data.)
 foreach {keysym unichar} {
   <Cyrillic_yu> \u044e
   <Cyrillic_a> \u0430
   <Cyrillic_be> \u0431
   <Cyrillic_SHCHA> \u0429
   <Cyrillic_CHE> \u0427
   <Cyrillic_HARDSIGN> \u042a
 } {
   bind . $keysym \
     [concat "catch \{" \
             {[focus -displayof %W] insert insert} \
             $unichar \
             "\}" \
             {;break}]
 }

In principle, the whole of <keysymdef.h> could be processed (perhaps manually), so that a Unicode mapping is set for every keysym.

The map contains pairs of an event's symbolic code and a Unicode character value. The iterator walks over the map and binds each event to the insertion of the corresponding Unicode character at the current insertion point of the widget in focus. The insertion is wrapped in catch, so if an error occurs (for example, the widget in focus does not support "insert insert"), no error popup should appear.

Scripts bound to the root window are constructed for each event code mapped. The trick is that [[focus -displayof %W]] must be evaluated when the script is invoked, but $unichar must be supplied at the time of binding. Therefore concat is called to build the script from parts.
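
To see what concat actually produces, the same call can be run outside of bind in a plain tclsh. This is only an illustrative sketch; the variable name ''script'' is introduced here and does not appear in the code above.

 # $unichar holds the six literal characters \u044e, as in the braced map
 set unichar {\u044e}
 set script [concat "catch \{" \
                    {[focus -displayof %W] insert insert} \
                    $unichar \
                    "\}" \
                    {;break}]
 puts $script
 # prints: catch { [focus -displayof %W] insert insert \u044e } ;break

The bracketed focus command and the \u escape are both still unevaluated in the resulting string; they are only substituted when the binding fires.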

This method could be generalized further if 'event generate' made it possible to create events that substitute %A (the Unicode character value). Currently, of the substitutions relevant to this issue, only %k and %K are supported. The root of the issue seems to lie in the X locale setup: for some reason, keysyms outside Latin-1 are not translated into Unicode properly by Xlib.

To find out whether your keyboard switch is set up to produce correct key codes, use the method described in [bind], topic by [KBK] (You can find the keysym ... ).
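
One way to run such a check, not necessarily the one described on that page, is a catch-all key binding in a wish session. A minimal sketch:

 # Print keysym (%K), keycode (%k) and character (%A) for every key pressed
 # while the toplevel . or one of its children has the focus.
 bind . <KeyPress> {puts [list %K %k %A]}

If the switch works, pressing a key in Cyrillic mode should report a keysym such as Cyrillic_yu rather than a Latin one.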

I switch my keyboard using xmodmap, and the beginning of my .Xmodmap file looks like this:

 !...........................................................................
 !        Key   Base              Shift           Mode    Mode+Shift
 !---------------------------------------------------------------------------
 keycode  24    = q               Q               Cyrillic_shorti       Cyrillic_SHORTI
 keycode  25    = w               W               Cyrillic_tse          Cyrillic_TSE
 keycode  26    = e               E               Cyrillic_u            Cyrillic_U
 keycode  27    = r               R               Cyrillic_ka           Cyrillic_KA
 keycode  28    = t               T               Cyrillic_ie           Cyrillic_IE
 keycode  29    = y               Y               Cyrillic_en           Cyrillic_EN

----

[Peter Schweitzer] 2006-03-24 I'm puzzled by a line above, within proc setupRussian:
           t  \u0442 u  \u0443 f  \u0444 x \22323u044b h \u044c !e \u044d
Is "\22323" correct syntax?  Also the article [Ruslish] seems to show the
value \u0445 corresponding to "x"; should it be different here?

[RS] - no, seems like some chaos paste, in which a row of the table got lost. Fixed - thanks!
----
[i18n - writing for the world] -  [Arts and crafts of Tcl-tk programming]

[[
[Category Characters] |
[Category Human Language]
]]