Version 27 of localization

Updated 2003-09-01 07:49:00

aka l10n (an "l", ten more letters, then an "n").

In software, localization is the process of making a program able to adapt to local conventions of sorting (collation), "commification" (I guess that's what you would call selecting the character used to mark off orders of magnitude in numbers, e.g. 1,000,000 versus 1.000.000 (1)), and other issues.

It also covers translating all GUI texts into the local, or selected, language (see msgcat).
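A minimal sketch of the msgcat approach. The catalog entries and locales below are made up for illustration; a real application would usually keep them in per-locale .msg files loaded with ::msgcat::mcload.

```tcl
package require msgcat

# Hypothetical catalog entries, registered for the global namespace.
::msgcat::mcset de    "File"        "Datei"
::msgcat::mcset pt_br "File"        "Arquivo"
::msgcat::mcset de    "Total is %d" "Summe: %d"

# Select a locale explicitly (msgcat normally picks one from the
# environment, e.g. LANG or LC_ALL).
::msgcat::mclocale de
puts [::msgcat::mc "File"]            ;# Datei
puts [::msgcat::mc "Total is %d" 42]  ;# Summe: 42, extra args go through format
puts [::msgcat::mc "Edit"]            ;# Edit, no translation so the source string is used

::msgcat::mclocale pt_br
puts [::msgcat::mc "File"]            ;# Arquivo
```

The program text only ever contains the source strings; switching the locale switches every lookup.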


DKF - Perhaps it is easier to put it like this: Localization is the process of taking a program (typically one that has undergone the process of internationalization) and adapting it to the particular situation in which it is found (generally called the locale.) This includes things like showing the right strings to the user, handling numbers, dates and monetary values correctly, doing the right thing when comparing two strings, etc.
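As a small illustration of "doing the right thing when comparing two strings": plain lsort compares by Unicode code point, so capitals sort before lowercase letters and accented letters land after "z". The sketch below folds a handful of Portuguese accented letters before comparing. The proc names and the folding table are hypothetical, and real collation (as in ICU) is far more involved.

```tcl
# Hypothetical accent folding for a few Portuguese letters (not complete).
proc fold_accents {s} {
    set map [list \u00e1 a \u00e0 a \u00e2 a \u00e3 a \u00e9 e \u00ea e \
                  \u00ed i \u00f3 o \u00f4 o \u00f5 o \u00fa u \u00e7 c]
    string map $map [string tolower $s]
}

# Compare on the folded form first; break ties with a plain comparison
# so the ordering stays deterministic.
proc compare_pt {a b} {
    set r [string compare [fold_accents $a] [fold_accents $b]]
    if {$r != 0} { return $r }
    string compare $a $b
}

set words [list "\u00e1rvore" abacaxi azul Bola]
puts [lsort $words]                      ;# code-point order: Bola abacaxi azul árvore
puts [lsort -command compare_pt $words]  ;# abacaxi árvore azul Bola
```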


As mentioned at Tcl'2003E: for some examples of just how easy it is to work with l10n in Tcl, see [L1 ]. It shows some of the ICU data in Tcl form. It can of course be downloaded, with all the data, from SourceForge; see [L2 ]. - VL


(1) RS thinks number formatting is a more understandable term for that.

escargo thinks numeric presentation or numeric representation is a better term for that.

aa thinks numeric representation is more appropriate as a description of the underlying bits in computer memory than of the displayed characters. The term presentation is quite nice, though.

There are three issues of presentation that I can think of.

  1. Marking off magnitudes.
  2. Marking off fractions.
  3. Expressions involving currency.
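As far as I know, Tcl's own format and expr always use "." as the decimal point and never group digits, whatever the locale, so issues 1 and 2 are left to the program. A minimal sketch with the separators passed in explicitly; format_number is a hypothetical helper, and currency (issue 3) is not handled.

```tcl
# Hypothetical helper: format a number with explicit separators.
#   value      - the number to format
#   decimals   - digits after the decimal separator
#   groupSep   - character marking off thousands (issue 1)
#   decimalSep - character marking off the fraction (issue 2)
proc format_number {value decimals groupSep decimalSep} {
    set s [format %.${decimals}f $value]
    # Split off the sign so it is not counted as a digit.
    set sign ""
    if {[string index $s 0] eq "-"} {
        set sign "-"
        set s [string range $s 1 end]
    }
    lassign [split $s .] int frac
    # Insert the group separator every three digits, from the right.
    set grouped ""
    while {[string length $int] > 3} {
        set grouped "$groupSep[string range $int end-2 end]$grouped"
        set int [string range $int 0 end-3]
    }
    set grouped "$int$grouped"
    if {$decimals > 0} {
        return "$sign$grouped$decimalSep$frac"
    }
    return "$sign$grouped"
}

puts [format_number 1000000 0 , .]      ;# 1,000,000
puts [format_number 1000000 0 . ,]      ;# 1.000.000
puts [format_number 1234567.891 2 . ,]  ;# 1.234.567,89
```

In a real program the two separators would come from the locale (for example via msgcat) rather than being hard-coded at each call.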

What about dates and times? - aa

Date and time settings need to be considered too, but today the programmer has to pick the clock format explicitly. Do we collect any settings from the OS regarding date and time presentation? - VL
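For what it's worth, later Tcl versions (8.5 and up) answer part of this: clock format accepts a -locale option backed by the message catalogs shipped with Tcl, %x and %X select the locale's own date and time layouts, and -locale system is meant to ask the OS (at least on Windows). A minimal sketch, using a fixed time so the output is reproducible:

```tcl
# Fixed example time: 0 is 1970-01-01 00:00:00 UTC (a Thursday).
set t 0
foreach loc {C de fr} {
    puts "$loc: [clock format $t -format {%A, %d %B %Y} -locale $loc -timezone :UTC]"
}
# %x and %X use whatever date/time layout the locale's catalog defines.
puts [clock format $t -format {%x %X} -locale de -timezone :UTC]
```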

2003-aug-20: Luciano ES, hobbyist programmer and translator by trade, thinks that "localization" is the best term because it is the term used by the entire translation and l10n industry. Unless you mean something else, of course. In practice, some l10n is almost always applied even in "pure" translation, because some cultural differences cannot be ignored. In the present context, you can't simply "translate" numbers: each locale's conventions must be taken into consideration. Another example: Tk uses "named colors", and many of these names will not be recognized in other countries if translated strictly literally. Games probably provide the best illustration. Many games popular worldwide, including some already implemented in Tcl/Tk on this wiki, have names whose meanings do not match the names they carry in other cultures. For example, if you translate "Solitaire" literally into Brazilian Portuguese, everyone will think you mean another game. If you translate "Monopoly" or "Clue" literally, no one will have the faintest idea what you mean, although these games are (or used to be when I was a kid) very popular in Brazil. So numbers are just one of the many aspects of localization. If you don't like the name, you'd better decide whether this page is going to be truly about localization in general or merely about numeric representation. We can't change the names of pages, so I guess you don't have much of a choice. :-)


VL 2003-aug-20: Since there seems to be a small 'project' to localize the messages in Tk's dialogs etc., shouldn't that at least be mentioned on this wiki? It seemed quite obvious at Tcl'03'europe that there was a great deal of interest in translating the core. But no one seems to have mentioned it since. Is the problem interest, coordination, or something else?


30 Aug, 2002 - Luciano ES: It just crossed my mind, and I've never seen anyone mention it, that Tcl must be the only language that can itself be translated/localized, allowing programmers to program in their own native language. For example, instead of saying...

        # $chan is assumed to be an already-open channel
        foreach i $list {
                if {[eof $chan]} {
                        set a [expr {$a + 1}]
                        puts "the end - total is $a"
                } else {
                        gets $chan
                }
        }

... I could source one file full of renames and say, in Portuguese:

        percorre i $lista {
                se {[fda $canal]} {
                        seja a [calcule {$a + 1}]
                        cuspa "Acabou - o total é de $a"
                } euc {
                        pl $canal
                }
        }

... or more verbose...

        {para cada} i $lista {
                se {[{fim do arquivo} $canal]} {
                        seja a [calcule {$a + 1}]
                        cuspa "Acabou - o total é de $a"
                } {em ultimo caso} {
                        {pega linha} $canal
                }
        }

Weird, but totally doable!
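A sketch of what such a "file full of renames" might look like, using interp alias so the English names keep working (rename would remove them). One catch: else and elseif are not commands but keywords parsed by if itself, so no rename or alias can translate euc or {em ultimo caso}; the se wrapper below maps them before handing everything to the real if. The file name portugues.tcl is hypothetical, tailcall needs Tcl 8.6 or later, and the optional then keyword is not handled.

```tcl
# Hypothetical "portugues.tcl": Portuguese names for some core commands.
foreach {nome original} {
    percorre            foreach
    {para cada}         foreach
    seja                set
    calcule             expr
    cuspa               puts
    fda                 eof
    {fim do arquivo}    eof
    pl                  gets
    {pega linha}        gets
} {
    interp alias {} $nome {} $original
}

# "se" = if, translating the Portuguese else keywords on the way.
proc se {args} {
    set out {}
    set i 0
    set n [llength $args]
    while {$i < $n} {
        # condition and body of the current branch
        lappend out [lindex $args $i] [lindex $args [expr {$i + 1}]]
        incr i 2
        if {$i >= $n} break
        set kw [lindex $args $i]
        if {$kw in {euc {em ultimo caso}}} {
            lappend out else {*}[lrange $args [expr {$i + 1}] end]
            break
        } elseif {$kw eq "elseif"} {
            lappend out elseif
            incr i
        } else {
            # plain "else" or a bare else-body: pass through unchanged
            lappend out {*}[lrange $args $i end]
            break
        }
    }
    # tailcall runs if in the caller's frame, so the branches see the
    # caller's variables, as with a plain if.
    tailcall if {*}$out
}
```

Sourcing such a file first lets the rest of a script use the Portuguese names alongside the English ones.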

Programming and many IT things are developed mostly by Americans, and I really resent the way so many developers simply ignore that other languages have diacritics. Take e-mail and HTTP, for example. It's as if they were saying "we did it for us only, we don't expect you lot to use it". The ugly consequence is a current generation of youngsters raised in front of computers, e-mail, ICQ etc., interacting with people in their native languages but leaving out the accents and sometimes coming up with really weird spellings (e.g. naum = não). Many people still refrain from using accented characters in their mail because they think these are still "forbidden" by all computers. I love my native language and I hate reading mail written in "accent-less" speak. Just to give you ASCII-0127-only speakers an idea, it's almost as bad as reading something in "l33t speak". It's almost irreversible damage to other cultures, caused by those who were too lazy to consider characters beyond 7-bit ASCII right from the beginning.

In the examples above, the commands could even have accents and diacritics, and Tcl wouldn't care. :-)


TV: I remember trying some time ago to get the actual time of day, possibly in some particular time zone, and that was not easy. I'm not sure it was easier earlier with the clock ticks approach, but I liked that idea even less than, for instance, the Unix idea of counting seconds since a fixed point in time. On the other hand, what a lot of options. It should be doable as list-based processing, but I don't know whether that isn't already how the core keeps it.
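For later readers: clock seconds is still Unix-style seconds since the epoch, but since Tcl 8.5 clock format can render that count in any time zone, either by name (Tcl ships its own zone data) or as a fixed offset. A minimal sketch; time_of_day is a hypothetical helper name.

```tcl
# Hypothetical helper: time of day in a given zone, now or at a given time.
#   zone - a zone name such as :Europe/Amsterdam, or an offset like +0530
#   secs - seconds since the epoch; defaults to the current time
proc time_of_day {zone {secs {}}} {
    if {$secs eq ""} { set secs [clock seconds] }
    clock format $secs -format %H:%M:%S -timezone $zone
}

puts [time_of_day :Europe/Amsterdam]  ;# current time in Amsterdam
puts [time_of_day +0530]              ;# current time at a fixed UTC offset
puts [time_of_day :UTC 0]             ;# 00:00:00, the epoch itself
```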

Localization reminds me of one DLL per language dialect, which I find rather unappealing; it is not logically connected with the idea of a function library, but more with a little database. Also, I almost hate Dutch program versions; I'd rather have the English ones than die that way, but that is personal.

The reasonable idea of presenting a program in various languages would at least make me think of a straightforward method: define some array/list/database (the order and content can be fixed, so which one hardly matters) holding all the language strings under a handy key scheme (a sequential number is easiest but least illustrative), and refer to the strings only by key. Swapping the little database for one of your choice is then flexible and code-safe. RS: The canonical solution for this database is msgcat...
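TV's idea can be sketched with a single array keyed by "language,key", using mnemonic keys instead of sequential numbers. The array name, keys, and the tr proc are hypothetical; as RS says, msgcat already provides this (plus fallbacks and per-locale files), so this is only to show the mechanism.

```tcl
# Hypothetical string table: one array, keyed by "language,key".
array set strings {
    en,greeting  "Hello"
    nl,greeting  "Hallo"
    en,quit      "Quit"
    nl,quit      "Afsluiten"
    en,save      "Save"
}
set language nl

# Look up a key in the current language, fall back to English,
# and finally to the key itself so a missing entry is still visible.
proc tr {key} {
    global strings language
    foreach lang [list $language en] {
        if {[info exists strings($lang,$key)]} {
            return $strings($lang,$key)
        }
    }
    return $key
}

puts [tr greeting]  ;# Hallo
puts [tr save]      ;# Save, no Dutch entry so English is used
```

Replacing the array with another store only means changing tr; callers never see where the strings come from.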

Just to illustrate the idea, let's make a little example; maybe that gets some things going. A little window with an original-language entry and a translation entry, Next and First buttons, a text_from widget grabber, and a put_translation_in_widgets live updater. If nothing else, it is good Tcl/Tk coding practice.

http://195.241.128.75/Wiki/lang1.png


Category Porting