Version 0 of Common Questions about Tcl/Tk and Japanese language support

Updated 2001-04-12 07:16:12

Purpose: to accumulate common questions (and hopefully answers!) relating to the use of Tk and Japanese language input, output, etc.


Jim Breen asks:

I am porting an application to Tcl/Tk which uses mixed Japanese and non-Japanese text. Although I can display Japanese fine, a problem remains with input of Japanese text directly into text boxes, etc.

Are there any existing or planned extensions to allow Tcl/Tk to interface the common "input methods" which enable on-the-fly conversion of typed text? The sorts of IMs I am referring to include XIM or kinput2 (for Unix) and Microsoft's Global IME.

There are patched Japanese versions of Tcl/Tk which engage XIM and/or kinput2 when inputting, but I don't want to get locked into non-standard approaches.

Hoping someone will either tell me it's there already, or it's coming in 8.4.

Without IM capability or plugins, I18N is only half-done.


Larry Virden replies:

I've a user doing the same - except in his case (see in a comp.lang.tcl thread I _tried_ to start earlier) the Japanese is NOT being displayed fine... the font is pretty ugly compared to the patched version of Tk 4 previously being used...

I am needing the same information regarding input methods, adding Canna to the list ...

In my case, platform is SPARC Solaris 8.


More information from Jim Breen:

Re the rendering of Japanese characters:

The default is very ugly. I was able to improve it greatly by cranking up the point size, e.g.:

        set ntext [encoding convertfrom euc-jp "......"]

        font create gothicfont -family gothic -size 16
        label .jpl -text $ntext -font gothicfont

Strangely, when I moved it to 24pt, it went "blocky". It appears not to render TTFs. Odd, considering the Gothic font is a TTF.

Re Input Methods:

Sort-of answering my own question, using material emailed to me (the person who sent it has a site-block on posting.)

Supplied Information:

  For UNIX, you can get the XIM patch from the following site:

  ftp://ftp.sra.co.jp/pub/lang/tcl/jp/tcltk8.3.1i18n.patch.tar.gz

  For Windows IME, you can use Mr. Yamamoto's patch:

  http://www3.ocn.ne.jp/~yamako/tcl/8.3.2/ime832r1.zip
  http://www3.ocn.ne.jp/~yamako/tcl/8.4a1/ime84a1r.zip

  You can also download a full package with an installer, which was compiled
  with the IME patch and packaged with Inno Setup:

  http://members.nbci.com/tcltk/bitwalk/win/tcl832.exe
  (note: This includes several extensions such as Incr Tcl, BLT, Tix,
  Tktable and so on.)

I have been pointed at the Tcl 8.4 Roadmap. It states under Miscellaneous that XIM support is to come in 8.4 (8.4a3) and marks it as complexity "3". No mention of the Microsoft IME (yes, it does need the application to use the API. Netscape provided IME API support in 4.74)


Jim Breen later writes on news:comp.lang.tcl

I am working with files containing Japanese text in the "EUC-JP" encapsulation. For display in Tcl/Tk, I convert them to Unicode with 'set ublah [encoding convertfrom euc-jp $jblah]'.

This works fine where the Japanese is from the more common JIS X 0208 character set, but fails when the characters are from the supplementary JIS X 0212 set, which is encapsulated in EUC-JP as 3 bytes, the first being 0x8F. (The JIS X 0212 characters have all been in Unicode from 1.0.)

Is there something extra I have to do to get them converted, or is it really the case that the EUC-JP -> Unicode conversion in Tcl/Tk8.1 and later is incomplete? If the latter, in other words it is a bug, what is best way to get it fixed for the 8.4 stable release?

(I guess I'm not that surprised that the 6,000 characters in JIS X 0212 are not being handled, because they are not used that often, largely because the coding used by Microsoft (Shift-JIS) cannot encapsulate them. They *are* however in EUC-JP and Unicode, and should not be overlooked. See http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0212.TXT)
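
A minimal sketch of how the behaviour described above could be checked from a plain tclsh. It converts one two-byte JIS X 0208 sequence and one three-byte 0x8F-prefixed sequence and prints what comes back. The proc name `codepoints` and the sample bytes are illustrative choices, not from the original posting: `C6 FC` is a JIS X 0208 kanji, and `8F B0 A1` is assumed to be the EUC-JP form of JIS X 0212 code 0x3021 from the mapping table linked above. What the second conversion yields depends on the Tcl version, so no result is claimed here.

```tcl
# Format each character of a decoded string as a U+XXXX code point.
proc codepoints {str} {
    set out {}
    foreach ch [split $str {}] {
        scan $ch %c cp
        lappend out [format U+%04X $cp]
    }
    return $out
}

# Raw EUC-JP bytes, built with [binary format] so that the encoding of this
# script file does not matter. The sample bytes are assumptions (see above).
set samples [list \
    "JIS X 0208 (2 bytes)"        [binary format H* c6fc] \
    "JIS X 0212 (0x8F + 2 bytes)" [binary format H* 8fb0a1]]

foreach {label bytes} $samples {
    # catch: some Tcl versions may raise an error on bytes they cannot map.
    if {[catch {encoding convertfrom euc-jp $bytes} u]} {
        puts "$label: conversion failed: $u"
    } else {
        puts "$label: [string length $u] char(s): [codepoints $u]"
    }
}
```

If the second line reports a placeholder such as U+003F, or more than one character, the 0x8F sequences are not being mapped by the euc-jp converter in that build.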

Jeff Hobbs replies:

Have you tried explicitly the jis0212 encoding instead? I thought our encodings were derived from Unicode 2.1, but perhaps there was a flaw in translation. I do see though that there are separate euc-jp, jis0208 and jis0212 encodings.

Jim Breen replies:

I have now, changing the "encoding system" to "jis0212". At that point the client stopped functioning; entry boxes wouldn't accept text, eventually XFree crashed,....

: (It) worries me as they are really different things. jis0208 and jis0212 are two sets of codes, with jis0212 an extension of jis0208. euc-jp is an encapsulation system which can represent both of them.

In fact when I use jis0212 characters, they are mixed up with jis0208 ones. By themselves they are meaningless. Perhaps this was not quite understood when the I18N code went into 8.1.
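
One way to try Jeff's suggestion without touching "encoding system" might be to decode the byte string by hand and call the jis0212 encoding only for the 0x8F-prefixed sequences. The following is a rough sketch under stated assumptions, not a tested fix: `eucjp_decode` is a hypothetical helper name, and it assumes that Tcl's jis0212 encoding expects plain two-byte codes with the high bit cleared (0x21..0x7E), which is how the 7-bit JIS sets are normally represented.

```tcl
# Hypothetical helper (not part of Tcl): decode EUC-JP bytes, routing
# 0x8F-prefixed three-byte sequences through the separate jis0212 encoding.
proc eucjp_decode {bytes} {
    # Turn the byte string into a list of unsigned byte values.
    binary scan $bytes c* signed
    set codes {}
    foreach c $signed { lappend codes [expr {$c & 0xFF}] }

    set out ""
    set n [llength $codes]
    set i 0
    while {$i < $n} {
        set b [lindex $codes $i]
        if {$b == 0x8F && $i + 2 < $n} {
            # JIS X 0212: drop the prefix and clear the high bit of both bytes
            # (assumption: jis0212 wants 7-bit codes).
            set hi [expr {[lindex $codes [expr {$i + 1}]] & 0x7F}]
            set lo [expr {[lindex $codes [expr {$i + 2}]] & 0x7F}]
            append out [encoding convertfrom jis0212 [binary format cc $hi $lo]]
            incr i 3
        } elseif {($b == 0x8E || $b >= 0xA1) && $i + 1 < $n} {
            # Two-byte sequence: half-width katakana (0x8E) or JIS X 0208.
            append out [encoding convertfrom euc-jp \
                [string range $bytes $i [expr {$i + 1}]]]
            incr i 2
        } else {
            # ASCII or a stray byte: leave it to the euc-jp converter.
            append out [encoding convertfrom euc-jp [string index $bytes $i]]
            incr i
        }
    }
    return $out
}
```

Usage would look like 'set ublah [eucjp_decode $jblah]', where $jblah holds the raw bytes read from a channel configured with '-translation binary'. Since the system encoding is left alone, entry widgets and the rest of the client should be unaffected.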


Later in the discussion (not everything has reached me here yet), the following exchange occurred on news:comp.lang.tcl :

 Date: Thu Mar 22 01:47:54 EST 2001
 From: [Ioi Lam]

 Jeffrey Hobbs wrote:
 >
 > Ioi Lam writes:
 >
 > > I have submitted a patch (402993) to TK for using IME on English
 > > Win2000. So far it is not integrated yet. Can someone in TCT look at
 > > this and integrate it?
 > >
 > > If all works well I am willing to implement Global IME as requested by
 > > Jim. It should be fairly easy, copying code from Mozilla, etc. However, I
 > > want to make sure the code has a chance of being used :-)
 >
 > I will look at this later today or tomorrow morning.  However, if
 > I could get a confirmation from Jim (or any other IME user), that
 > would be great.  I just want the warm fuzzy feeling that the
 > patch worked for differently configured machines with IME (since
 > I have no IME enabled machine myself).

In case anyone is as confused as me about IME for Win32, there are 4 ways of inputting CJK (Chinese/Japanese/Korean) characters for Win32:

(a) "old" IME: This is what you get when you buy a CJK edition of Windows (e.g., Japanese Edition of Windows 98). Tk already supports this today.

(b) 3rd party "wrapper" IME: in the old days, if you had an English Win95, you couldn't input CJK characters. To solve this, some 3rd party companies started selling so-called "wrapper" IMEs. Examples are UnionWay, TwinBridge, etc.

(c) Microsoft Global IME. As part of its instinct to destroy other people's business, Microsoft started (around the time of IE 4.0) giving away its own wrapper IME for English Windows. They call this Global IME. It works under Win95/98/NT. Apps can access Global IME with a COM API.

(d) Win2K IME. With Win2K, Microsoft decided Global IME is too much a hack, so they support yet another kind of IME. Any version of Win2K supports CJK IMEs. You just have to go into the Control Panel and enable them. This works almost the same way as the "old" IME in (a), except the user can change the input language on the fly, so you have to process extra events in order to support it.

So, what's TK's status? (a) is supported today. (d) works after my patch to handle the input language change events. (c) doesn't work, but I am willing (make that *very willing*) to work on it. (b) has too many variations so that's a bit tough.

As for testing my patch, I tested under Japanese Win98 to make sure it doesn't break (a). I then tested under English W2K with CJK IMEs installed, and it proved that (d) also works.

For sanity's sake, I tested under plain English versions of Win98/NT/2K and nothing seems to break.

A side effect of this patch also gives you some of (b): my Chinese PenPower input tablet starts working under TK.

Jeff, if you have Win2K and want to taste the Joy of IME, I can send you a page on how to input Japanese :-)

Hope that's enough information.


Ioi Lam: I've uploaded my instructions for setting up W2K for inputting Japanese with TK

http://tixlibrary.sourceforge.net/i18n/tk_w2kime/

It's a similar process for setting up Chinese or Korean input.

At this page, you can also find a patched wish832d.exe that can handle CJK input on W2k.


Can someone show me a code sample for being able to copy and paste text from a Kanji display to a Kanji input text entry? Right now, the text is being pasted as ???? for me.
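
A possible starting point, offered as a sketch rather than a confirmed answer: on X11 the default selection type is STRING, which cannot carry Kanji, so one thing to try is asking the clipboard owner for the UTF8_STRING type first and falling back to STRING only if that fails. `paste_kanji` is a hypothetical helper name. The sketch assumes a Tk version that understands UTF8_STRING (8.4 or later, as far as I know) and a source application that offers that type.

```tcl
# Hypothetical helper: paste the clipboard into entry or text widget $w,
# preferring the UTF8_STRING selection type over the default STRING type.
proc paste_kanji {w} {
    if {[catch {
        selection get -displayof $w -selection CLIPBOARD -type UTF8_STRING
    } txt]} {
        # The owner does not offer UTF8_STRING: fall back to plain STRING.
        set txt [selection get -displayof $w -selection CLIPBOARD]
    }
    # Both entry and text widgets accept "insert" as an index.
    $w insert insert $txt
    return $txt
}

# Example wiring (run under wish; the widget name and key are arbitrary):
#   entry .e
#   pack .e
#   bind .e <Control-y> {paste_kanji %W; break}
```

If the text still arrives as ????, the application that owns the clipboard is probably only offering STRING, in which case the fix has to happen on the copying side.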