** Description **

[Richard Suchenwirth] 2001-06-11: Case conversion means changing uppercase characters to lowercase, or vice versa. Tcl does this quite nicely with ''string toupper/tolower'', even with characters outside the ASCII range, e.g. the Cyrillic alphabet. There is, however, one case in Turkish where special rules apply: the uppercase of our well-known letter "i" must be the dotted I \u0130, while the lowercase of good old "I" must be a dotless i, \u0131. This means the dot is treated as a diacritic, which is pretty logical, but contrary to the habit in other languages.

Here's a nifty routine for use in Turkish applications that handles these special cases as well as the rest of the conversion:

======
proc tr:to {cmd str} {
    # Pre-map the two Turkish special cases; everything else
    # is left to Tcl's built-in Unicode case tables.
    switch -- $cmd {
        upper {regsub -all i $str \u0130 str}
        lower {regsub -all I $str \u0131 str}
        default {error "bad option '$cmd': must be upper or lower"}
    }
    # $cmd is now known to be "upper" or "lower",
    # so this calls string toupper or string tolower.
    string to$cmd $str
}

# usage examples
tr:to upper izmir  ;# produces İZMİR
tr:to lower YILDIZ ;# produces yıldız
======

Notice how the minor command, after filtering for correctness, is pasted into the ''string to$cmd'' call.

See also [Eurolish] for easy input of Turkish diacritics, and [The Lish family] for the whole picture.

----

An even worse anomaly, which is not correctly reversible, exists in German: the lowercase ''Eszett/scharfes S'' (ß, \u00DF) corresponds to two uppercase letters, SS, but not every SS sequence may be lowercased back to ß. For example, ''Maße'' (measures) and ''Masse'' (mass) both uppercase to MASSE.

----

'''Greek Sigma''': There are two different lowercase forms of the Greek letter Sigma, \u03C2 (used only at the end of a word) and \u03C3 (used in all other positions), but only one uppercase form, \u03A3. The code point just before it, \u03A2, is unassigned, so software that wants to keep this distinction might 'abuse' it for an uppercase final Sigma... [RS]

----

[LV]: Richard, has this special case been mentioned to Scriptics so that they might have the routines do the right thing without programmers having to special-case things?

[RS]: No.
The problem is that there is no general solution. Even a system localized in Turkey would be wrong to always apply toupper/tolower as above when dealing with filenames: imagine how much code would break (there are files like CONFIG.SYS...). The application must be aware that a string is Turkish, and only then apply tr:to {upper,lower} to it.

----

[KBK]: Case conversion is also different in Dutch, where converting 'ijssel' to titlecase results in 'IJssel' (see [Things Dutch]).

[AM]: Alas, precious little software is aware of this, one culprit being MS Word (unless you take the pains to instruct it to do the "right thing"). At the beginning of a word, any combination "ij" is to be capitalised as "IJ".

<> Category Human Language | i18n - writing for the world | RS
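In the same spirit as ''tr:to'', the Dutch "IJ" rule described by KBK and AM can be sketched as a small proc. This is a minimal sketch, not a built-in: the name ''nl:totitle'' is hypothetical, and it assumes the only special case is a word-initial "ij" (in any case), with every other word left to ''string totitle''.

```tcl
# nl:totitle is a hypothetical helper, not part of Tcl.
# Assumption: a word starting with "ij" (any case) gets "IJ" capitalised;
# the rest of the word is lowercased, as string totitle would do.
proc nl:totitle {word} {
    if {[string match -nocase ij* $word]} {
        return IJ[string tolower [string range $word 2 end]]
    }
    # No Dutch special case: ordinary title-casing
    string totitle $word
}

puts [nl:totitle ijssel]   ;# IJssel
puts [nl:totitle IJMUIDEN] ;# IJmuiden
puts [nl:totitle utrecht]  ;# Utrecht
```

Like ''tr:to'', this works on a single word and should only be applied when the application knows the text is Dutch; a sentence would first have to be split into words.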