localization

aka l10n

In software, localization is the process of making a program able to adapt to local conventions for sorting, "commification" (I guess that's what you would call selecting the character used to mark off orders of magnitude in numbers, i.e., 1,000,000 versus 1.000.000 (1)), and other issues.

Also, translation of all GUI texts to the local, or selected, language (see msgcat).


DKF: Perhaps it is easier to put it like this: Localization is the process of taking a program (typically one that has undergone the process of internationalization) and adapting it to the particular situation in which it is found (generally called the locale.) This includes things like showing the right strings to the user, handling numbers, dates and monetary values correctly, doing the right thing when comparing two strings, etc.


VL: As mentioned at Tcl'2003E, for some examples of just how easy it is to work with l10n in Tcl, see [L1 ]. It shows some of the ICU data in Tcl-form. It can of course be downloaded with all the data from sourceforge, see [L2 ].


(1) RS thinks number formatting is a more understandable term for that.

escargo thinks numeric presentation or numeric representation is a better term for that.

aa interjects: I think numeric representation is more appropriate as a description of the underlying bits in computer memory than of the displayed characters. The term presentation is quite nice, though.

escargo continues: There are three issues of presentation that I can think of.

  1. Marking off magnitudes.
  2. Marking off fractions.
  3. Expressions involving currency.
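The first two of these can be sketched in a few lines of Tcl. The proc name formatNumber and its arguments are made up for this page and are not part of Tcl or of any package mentioned here; the sketch only handles plain decimal strings and leaves currency alone.

```tcl
# Hypothetical helper: rewrite a plain decimal string using the given
# grouping mark (magnitudes) and decimal mark (fractions).
proc formatNumber {value {group ,} {decimal .}} {
    # Split into sign, integer digits and optional fraction digits.
    if {![regexp {^([-+]?)(\d+)(?:\.(\d+))?$} $value -> sign int frac]} {
        error "not a plain decimal number: $value"
    }
    # Peel off three digits at a time from the right.
    set tail ""
    while {[string length $int] > 3} {
        set tail $group[string range $int end-2 end]$tail
        set int [string range $int 0 end-3]
    }
    set result $sign$int$tail
    if {$frac ne ""} {
        append result $decimal $frac
    }
    return $result
}

puts [formatNumber 1000000]        ;# default marks: comma and period
puts [formatNumber 1000000 . ,]    ;# period for groups, comma for fractions
puts [formatNumber -1234.5 . ,]
```

Passing the marks as arguments keeps the proc independent of where the locale information comes from.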

aa: What about dates and times?

VL: Date and time settings need to be considered, but, as today, the programmer would still have to use clock format. Do we currently collect any sort of setting from the OS regarding date and time presentation? - VL
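In Tcl 8.5 and later, clock format accepts -locale and -timezone options, and the locale name system asks Tcl to follow the operating system's settings where it can. A minimal sketch, assuming Tcl 8.5 or newer; the helper name localDate is made up for illustration, and the exact day and month names depend on the locale data installed with Tcl:

```tcl
# Hypothetical helper: format a time for a given locale and time zone.
proc localDate {seconds {locale system} {zone :UTC}} {
    return [clock format $seconds -format {%A, %d %B %Y} \
                -locale $locale -timezone $zone]
}

puts [localDate 0 en]       ;# English day and month names
puts [localDate 0 de]       ;# German names, if the locale data is present
puts [localDate 0 system]   ;# whatever the OS settings say
```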

Luciano ES 2003-08-20: As a hobbyist programmer and translator by trade, I think that "localization" is the best term because that's the term used by the entire translation and l10n industry. Unless you mean something else, of course. In practice, some l10n is almost invariably applied to some extent even in "pure" translation, because some cultural differences cannot be ignored. In the present context, you can't simply "translate" numbers. Each locale's conventions must be taken into consideration. Another example is that Tk also uses "named colors", and many of these colors' names will not be recognized in other countries if rendered into strictly literal translation. Games probably provide the best example to illustrate the issue. Many worldwide popular games, including some already represented in this wiki board in Tcl/Tk code, have names whose meanings do not correspond to the meaning of the names they carry in other cultures. For example: if you translate "Solitaire" into Brazilian Portuguese literally, everyone will think you mean another game, not the one you meant. If you translate "Monopoly" or "Clue" literally, no one will have the faintest idea of what you mean, although these games are (or used to be when I was a kid) very popular in Brazil. So numbers are just one of the many aspects of localization. If you don't like the name, then you'd better decide if this page is going to be truly about localization in general or merely about numeric representation. We can't change the names of the pages, so I guess you don't have much of a choice. :-)


VL 2003-08-20: Since there seems to be a small 'project' to localize the messages in Tk's dialogs etc., shouldn't that be at least mentioned on this wiki? It seemed quite obvious at Tcl'03'europe that there was a great deal of interest in translation of the core. But no one seems to have even mentioned it since. Is the problem interest, coordination, or what?


Luciano ES 2002-08-30: It just crossed my mind, and I never saw anyone mention it, that Tcl must be the only language that can be translated/localized, allowing the programmer to program in their own native language. For example: instead of saying...

foreach i $list { 
    if { [eof] } { 
        set a [ expr $a + 1 ]
        puts "the end - total is $a" 
    } else { gets 
    }
}

... I could source one file full of renames and say, in Portuguese:

percorre i $lista { 
    se { [fda] } { 
        seja a [ calcule $a + 1 ]
        cuspa "Acabou - o total é de $a" 
    } euc { pl 
    }
}

... or more verbose...

{para cada} i $lista { 
    se { {fim do arquivo} } { 
        seja a [ calcule $a + 1 ]
        cuspa "Acabou - o total é de $a" 
    } {em ultimo caso} { {pega linha} 
    }
}

Weird, but totally doable!

Programming and many IT things are developed mostly by Americans, and I really resent the way so many developers simply ignore that other languages have diacritics. Take e-mail and http, for example. It's like they're saying "we did it for us only, we don't expect you lot to use it". The ugly consequence of that is a current generation of youngsters that were raised in front of computers, e-mail, icq etc., interacting with people in their native languages, but leaving out the accents in characters and sometimes coming up with some really weird spellings (e.g. naum = não). Many people still refrain from using accented characters in their mail because they think it still is "forbidden" by all computers. I love my native language and I hate reading mail written in "accent-less" speak. Just to give you ASCII-0127-only speakers an idea, it's almost as bad as reading something in "l33t speak". It's an almost irreversible damage on other cultures, caused by those who were too lazy to consider the full ASCII range right from the beginning.

In the examples above, the commands could even have accents and diacritics, and Tcl wouldn't care. :-)

DKF: Alas, the keywords understood by if are not changeable through rename (they're hard-coded in the if implementation). But they're also, with the exception of elseif if you're using it, unnecessary, so you can leave them out and still have your script work. :-)
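A minimal sketch of both points, using the Portuguese names from the example above. It uses interp alias rather than rename so that the original command names keep working for library code; that choice is an assumption made for this sketch, not what Luciano ES described.

```tcl
# Portuguese command names, added as aliases for the built-in commands.
interp alias {} seja    {} set
interp alias {} calcule {} expr
interp alias {} cuspa   {} puts
interp alias {} se      {} if

seja a 1
# No "then" or "else" keywords: the second body is the else branch.
se {$a > 0} {
    seja a [calcule {$a + 1}]
} {
    seja a 0
}
cuspa "total: $a"
```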

EF: It can be done in C as well, using a bunch of #define. I actually learnt to program in C that way as a student and remember how hard it was to adapt to the real English syntax. I am not entirely convinced that it was a good teaching idea...


TV: I remember trying some time ago to get the actual time of day, possibly in some time zone, and that was not easy. I'm not sure whether it was easier earlier with the clock ticks approach, but I didn't like that idea even as much as, for instance, the Unix clock seconds idea of counting from a certain fixed time. But what a lot of options, on the other hand. That should be doable as list-based processing, but I don't know whether that isn't already the way it is kept in the core.

Localization reminds me of a DLL per language dialect, which I find quite unpleasant and not logically connected with the idea of a function library, more with a little database. Also, I am almost a hater of Dutch program versions; I'd rather have the English one than die in that way, but that is personal.

The reasonable enough idea to present a program in various languages would at least make me think about a straightforward method: define some array/list/database (the order and content can be fixed, so what does it matter) with all language strings with a handy key-method (a sequential number is easiest but least illustrative), and refer to the language strings by reference. Replacing the little database with the database of choice is then flexible and code-safe. RS: The canonical solution for this database is msgcat...
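A minimal sketch of that keyed lookup with msgcat. The catalog entries are made up for illustration; real programs usually keep them in per-locale .msg files loaded with msgcat::mcload.

```tcl
package require msgcat

# Made-up entries: the base (English) string is the key.
msgcat::mcset es "File" "Archivo"
msgcat::mcset de "File" "Datei"

foreach locale {es de} {
    msgcat::mclocale $locale
    puts "$locale: [msgcat::mc File]"
}

# A string with no translation comes back unchanged.
puts [msgcat::mc "Presentation Order"]
```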

Just for the idea, let's make a little example; maybe that sets some things off. A little window with original language and translation entry, next and first button, a text_from widget grabber, and a put_translation_in_widgets live updater. If anything, at least good Tcl/Tk code practice.

http://www.theover.org/Wiki/lang1.png


AM: The problem occurs with scan and format too: in some countries, a comma is used instead of a period and vice versa (notably: Holland, Germany, ...)

I have written a small package to deal with the problem: Extensions to scan and format
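That package is not reproduced here. As a rough sketch of the input side only, the following turns a localized number string into something expr and scan can read. The proc name parseNumber and its arguments are made up for this page.

```tcl
# Hypothetical helper: convert a localized number string into the form
# Tcl itself understands (no grouping marks, period as decimal mark).
proc parseNumber {text {group ,} {decimal .}} {
    # Drop the grouping marks and normalise the decimal mark in one pass.
    set plain [string map [list $group "" $decimal .] $text]
    if {![string is double -strict $plain]} {
        error "cannot read \"$text\" as a number"
    }
    return $plain
}

puts [parseNumber "1.234,5" . ,]
puts [expr {[parseNumber "1.234,5" . ,] * 2}]
```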


WJP 2007-05-16: I am curious if anyone has addressed the problem of changing the language of the interface at runtime, after the GUI has been created. The problem is that if we use msgcat on the arguments of -label and -text options when creating widgets, we get whatever language is in force at the time that the GUI is created. If the user subsequently chooses another language, that choice will have no effect on the existing text.

The blunt force approach is to destroy and rebuild the GUI. If it is complex that may be difficult, consume a lot of resources, take more time than is desirable, and look strange to the user.

The naive approach is to manually configure the various widgets when the language changes, but this quickly becomes cumbersome and error prone. I've got a partial solution in the form of the following procedure:

proc RelabelChildren r {
    foreach w [winfo children $r] {
        switch -- [winfo class $w] {
            Canvas        -
            Entry         -
            Frame         -
            Listbox       -
            Panedwindow   -
            Scrollbar     -
            Text          -
            Toplevel      -
            tk_optionMenu {
                RelabelChildren $w
            }
            Button      -
            Checkbutton -
            Label       -
            Labelframe  -
            Menubutton  -
            Message     -
            Radiobutton -
            Spinbox {
                if {[info exists ::TextBase($w)]} {
                    $w configure -text [_ $::TextBase($w)]
                }
                RelabelChildren $w
            }
            Scale {
                if {[info exists ::TextBase($w)]} {
                    $w configure -label [_ $::TextBase($w)]
                }
            }
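            Menu {
                # The menu itself never has any text to configure.
                set TearoffP [$w cget -tearoff]  ;# Tearoff has index 0 if present.
                set Last [$w index last]
                # An empty menu has no numeric last index; skip it.
                if {[string is integer -strict $Last]} {
                    for {set k $TearoffP} {$k <= $Last} {incr k} {
                        if {[info exists ::TextBase($w,$k)]} {
                            $w entryconfigure $k -label [_ $::TextBase($w,$k)]
                        }
                    }
                }
                RelabelChildren $w
            }
            default {
                # Any other class: nothing to relabel here, but still descend.
                RelabelChildren $w
            }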
        }
    }
}

The idea here is to start at the top of a tree or subtree of widgets (normally ., but in some cases we know that we can start lower, or the argument might be a toplevel) and recursively descend the tree. For each widget, we determine its class, and if it has a -text or -label option, we change the text to the new output of msgcat by using configure. Menu entries are a bit more complicated but the idea is the same.

This works quite nicely but for the problem of knowing what string to pass to msgcat to translate. An easy approach would be to write, e.g.:

$w configure -text [_ [$w cget -text]]

but this will return the current text, which may not be the base text for the message catalog. (Suppose, for example, that the base text is in English but that Spanish translations have already been assigned, so that something labelled "File" is now "Archivo". Now we switch to German.

[$w cget -text]

now has the value "Archivo". If the base language of the German message catalogue is English,

[_ Archivo]

is not going to yield Datei as we wish since it is indexed under File.)

The upshot is that we need to store the base values of all of the relevant strings. The above procedure uses a global array TextBase, which is initialized by calls like these:

set ::TextBase([label $w.order.title]) "Presentation Order"
set ::TextBase($m,0) File
set ::TextBase($m,1) Options
set ::TextBase($m,2) Help
set ::TextBase($m.file,4) Quit

which either include the widget creation as in the first example or are interspersed among the widget creation calls as in the remainder. Getting menu entries right is especially challenging.
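One way to cut down on the bookkeeping is to record the base string at the point where the translated text is first requested. This is only a sketch: the helper name BaseText is made up, and it assumes that _ is shorthand for msgcat::mc, which the page does not spell out.

```tcl
package require msgcat

# Assumption: _ is a thin wrapper around msgcat::mc.
proc _ {s} { return [msgcat::mc $s] }

# Hypothetical helper: remember the base string under the given key
# (a widget path, or "menu,index" for a menu entry) and return its
# translation for the current locale.
proc BaseText {key text} {
    set ::TextBase($key) $text
    return [_ $text]
}

# Intended use, which needs Tk and is therefore left commented out:
#   label $w.order.title -text [BaseText $w.order.title "Presentation Order"]
#   $m add cascade -label [BaseText $m,0 File] -menu $m.file
```

RelabelChildren can then be called unchanged after each change of locale.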

This is the best solution that I have found so far. Has anyone got anything better?

If this is the best approach, I wonder if it would not be a nice modification to Tk to have each widget automatically store the base string. It would cost a bit of memory to keep what might be extra copies of the strings, but would make it much easier for the programmer.

MB 2007-10-28: This is an interesting solution. The solution chosen by RS in https://wiki.tcl-lang.org/2967 was to manually configure the menu. Your solution is more automatic and has the advantage that no msgcat::mc has to be introduced into the base Tcl script for it to be translated. But your solution to find the base string is not really economical because it is not so simple and requires memory. I think that a simpler approach would be to invert the dictionary stored in the message catalog. No command to do this is currently available in the msgcat package. Modifying the msgcat package to include such a command would be simple because the latest version of the msgcat package is based on a Tcl 8.5 dict. One could search for the key corresponding to the given value.

WJP 2007-10-31: That sounds like a good approach, but it assumes that the mapping from the base language to the other languages is invertible. I don't know if this will often occur in practice, but in principle I don't think that we can assume this. Suppose that two English entries map onto the same Spanish string. If we now change locale to German and invert the English->Spanish mapping to get the index under which to look up the German translation, we will have an irresolvable ambiguity.
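The ambiguity is easy to reproduce with a plain Tcl 8.5 dict standing in for the catalog. The entries below are made up, and invertCatalog is a hypothetical helper, not a msgcat command.

```tcl
# Made-up English->Spanish catalog in which two base strings share
# one translation.
set es [dict create "Open" "Abrir" "Open file" "Abrir" "Quit" "Salir"]

# Invert a catalog, keeping every base string that maps to a translation.
proc invertCatalog {catalog} {
    set inverse [dict create]
    dict for {base translated} $catalog {
        dict lappend inverse $translated $base
    }
    return $inverse
}

set inverse [invertCatalog $es]
puts [dict get $inverse Salir]   ;# one candidate: safe to look up
puts [dict get $inverse Abrir]   ;# two candidates: which one was meant?
```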

See Also

When size matters
Localizing Tk's messages
A complaint about limiting a translated string to the length of the string in the original language.
Delimiting Numbers