Non-ASCII characters

Arjen Markus (17 September 2002) The purpose of this page is to collect information about typing characters that are not on a plain keyboard. I started it after lamenting that I could not properly type "francais", for instance (it still does not work here :-(), after being introduced to the wonders of "xmodmap", and after hearing of the trouble [Keiichi Takahashi] was experiencing with wish 8.4.0 when using Japanese input methods.


Methods of typing accented characters in X Window (UNIX, Linux):

  • First of all, use "xev" to find out what the keycode or keysym is for a key that you do not use under X Window, like the "menu key" on a typical PC keyboard:
   > xev 

   (put the mouse cursor in the window that appears and press the
   selected key.)
  • With the command xmodmap -e "keycode ... = Multi_key" or xmodmap -e "keysym ... = Multi_key" (fill in the keycode or keysym reported by xev), you can turn this key into a compose key that combines the next two characters you type into one:
   Multi_key, then ", then U becomes U-umlaut (well, if it works)
  • The expression itself can be put in a file .Xmodmap or .xmodmaprc in your home directory:
   keycode 101 = Multi_key

Note: this worked on my Linux machine, using KDE as the window manager. However, under Reflection X on a Sun Solaris machine with CDE the results were less than satisfactory.
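
From the Tcl side, a Tk binding can play much the same role as xev, and it also shows whether the compose key actually delivers the combined character to a Tk application. The following is only a sketch: describeKey is a made-up helper name, and the binding assumes a running wish with a display. %k, %K and %A are Tk's bind substitutions for the keycode, the keysym and the character of the event.

```tcl
# Hypothetical helper: format one key event for display.
proc describeKey {keycode keysym char} {
    if {$char eq ""} {
        # Keys such as modifiers typically carry no character of their own.
        return "keycode $keycode keysym $keysym (no character)"
    }
    # Turn the character into its Unicode code point.
    scan $char %c code
    return [format "keycode %s keysym %s char U+%04X" $keycode $keysym $code]
}

# Install the binding only when Tk is loaded (that is, under wish).
if {[llength [info commands bind]]} {
    bind . <KeyPress> {puts [describeKey %k %K %A]}
}
```

Run it in wish, give the window the focus and press the key in question; each key press prints one line on the console.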


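KPV To convert non-ASCII characters into HTML entities you can use (needs Tcl 8.5 or later for lreverse and {*}):

    # Walk the matches from last to first, so that the indices of the
    # remaining matches stay valid while the string grows.
    foreach index [lreverse [regexp -all -indices -inline {[^\x1-\x7F]} $html]] {
        # Replace the character with its decimal numeric entity, e.g. &#231;
        scan [string range $html {*}$index] %c char
        set html [string replace $html {*}$index "&#${char};"]
    }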
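
For reuse, the loop above can be wrapped in a proc, and the same back-to-front technique gives an inverse. This is a sketch only: the names encodeEntities and decodeEntities are made up for illustration, and decodeEntities as written handles only decimal numeric entities (&#NNN;), not named or hexadecimal ones.

```tcl
# KPV's loop from above, wrapped in a proc.
proc encodeEntities {html} {
    foreach index [lreverse [regexp -all -indices -inline {[^\x1-\x7F]} $html]] {
        scan [string range $html {*}$index] %c char
        set html [string replace $html {*}$index "&#${char};"]
    }
    return $html
}

# Hypothetical inverse: turn decimal numeric entities back into characters.
proc decodeEntities {html} {
    foreach index [lreverse [regexp -all -indices -inline {&#[0-9]+;} $html]] {
        lassign $index first last
        # The digits sit between "&#" and ";". scan %d reads them as
        # decimal, so a leading zero is not mistaken for octal.
        scan [string range $html [expr {$first + 2}] [expr {$last - 1}]] %d code
        set html [string replace $html $first $last [format %c $code]]
    }
    return $html
}

set text "fran\u00e7ais"
set encoded [encodeEntities $text]
puts $encoded                                     ;# fran&#231;ais
puts [expr {[decodeEntities $encoded] eq $text}]  ;# 1
```

The test string is written with a \u escape so that the example does not depend on the encoding of the script file.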

[Keiichi Takahashi] (17 September 2002) The following is almost the same as what I posted to c.l.t. on 15-Sep-2002.

I have a problem using Tcl/Tk 8.4.0 on Linux. The platform I am using is shown below:

  • Platform: Red Hat Linux 7.3 (Intel)
  • Kernel: 2.4.18-10
  • X Window: XFree86-4.2.0-8

The problem is a segmentation fault, which is reported on the console when exiting wish.

This symptom is observed when the LANG environment variable is set to ja_JP.eucJP, and probably also requires an input-method client for character conversion, such as kinput2 or xwnmo, to be configured through the XMODIFIERS environment variable.

The symptom has never been seen when LANG is set to C, en_US.ISO8859-1 and so on.
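
To see which of these settings a given tclsh or wish process has actually inherited, the variables can be read from the global env array. A minimal sketch: the proc name inputEnv is made up for illustration, and LC_ALL and LC_CTYPE are listed on the assumption that they matter too, since they can override LANG.

```tcl
# Hypothetical helper: report the locale and input-method variables
# as a flat name/value list.
proc inputEnv {} {
    set report {}
    foreach name {LANG LC_ALL LC_CTYPE XMODIFIERS} {
        if {[info exists ::env($name)]} {
            lappend report $name $::env($name)
        } else {
            lappend report $name "(unset)"
        }
    }
    return $report
}

foreach {name value} [inputEnv] {
    puts [format "%-10s %s" $name $value]
}
```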

Exiting wish under gdb gives the following backtrace:

   (gdb) r
   Starting program: /home/bitwalk/bin/wish8.4
   % button .b -text "Hello"
   .b
   % pack .b
   % exit

   Program received signal SIGSEGV, Segmentation fault.
   0x402f400f in free () from /lib/libc.so.6
   (gdb) bt
   #0  0x4207afbf in chunk_free () from /lib/i686/libc.so.6
   #1  0x4207ad14 in free () from /lib/i686/libc.so.6
   #2  0x401c4e07 in _XFreeAtomTable () from /usr/X11R6/lib/libX11.so.6
   #3  0x401c947a in _XFreeDisplayStructure () from /usr/X11R6/lib/libX11.so.6
   #4  0x401b9067 in XCloseDisplay () from /usr/X11R6/lib/libX11.so.6
   #5  0x40057a13 in TkpCloseDisplay () from /usr/lib/libtk8.4.so
   #6  0x400518e0 in TkCloseDisplay () from /usr/lib/libtk8.4.so
   #7  0x40053a8c in DeleteWindowsExitProc () from /usr/lib/libtk8.4.so
   #8  0x4011cbfa in Tcl_Finalize () from /usr/lib/libtcl8.4.so
   #9  0x4011c9e8 in Tcl_Exit () from /usr/lib/libtcl8.4.so
   #10 0x4010095b in Tcl_ExitObjCmd () from /usr/lib/libtcl8.4.so
      :
      :

I am afraid this is caused by the lack of a closing procedure for XIM, the X Input Method.

In the function TkpCloseDisplay(dispPtr) in tkUnixEvent.c, there is no call to XDestroyIC before the call to XCloseIM(dispPtr->inputMethod).

I tried to add XDestroyIC there, but it looks difficult for me to do, since each XIM input context is kept per window in winPtr->inputContext, which is handled locally in tkEvent.c and seems to have no relation to dispPtr->inputMethod.

Or the problem may have some other root cause....

Anyway, for users who want to use XIM on a UNIX-like OS, for Japanese for example, this must be a big obstacle to upgrading to Tcl/Tk 8.4.0, because the issue seems to be locale-related.


LV Also, if you want to use XIM on SPARC Solaris, you need to patch Tk 8.4.[0-6], because it currently doesn't work with kinput.


See also the Lish family for several converters from ASCII to outlandish characters, the i18n package, taiku goes multilingual and iKu.


[ Category Human Language | Category Characters ]