Why AndroWish switched to TCL_UTF_MAX=6

  • Emoji support requires covering the full range of Unicode code points (0x000000...0x10ffff).
  • Unicode 8.0 support in Tcl was introduced by Jan Nijtmans during EuroTcl 2015.
  • Emoji are quite popular on mobile devices and have been fully supported on Android since the 4.4 release.
  • There's a nice TrueType font named Symbola, which implements most of Emoji.
  • By redefining TCL_UTF_MAX (default value 3), both the range of supported code points and the in-core memory requirements can be adjusted.
  • Convention: TCL_UTF_MAX=3 and TCL_UTF_MAX=4 fit in a 16 bit memory representation (sizeof (Tcl_UniChar)).
  • Convention: TCL_UTF_MAX>4 needs a 32 bit memory representation (sizeof (Tcl_UniChar)).
  • TCL_UTF_MAX>=4 is able to cover the full range of code points (0x000000...0x10ffff).
  • TCL_UTF_MAX=4 requires surrogate pairs internally to represent a code point larger than 0x00ffff, using two consecutive 16 bit values as in UTF-16 notation.
  • Using surrogate pairs is expensive and error-prone when counting code points, and it repeats a range of strange problems observed in the two most popular programming languages of the year of this writing (whose names both start with the letter 'J').
  • TCL_UTF_MAX=6 forces a 32 bit internal Tcl_UniChar representation, which eliminates the counting issues as well as the escaping issues.
  • A 32 bit representation fits well with font rendering using FreeType.
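
The counting problem mentioned in the list can be illustrated with a small sketch. It is not Tcl code and does not use any Tcl API; the function names to_utf16_units and count_code_points are hypothetical and chosen only for this illustration, and the input is assumed to be well formed. It shows how a code point above 0x00ffff becomes two 16 bit values, so that the number of storage units no longer equals the number of code points, whereas a 32 bit representation keeps one unit per code point.

```python
# Illustrative sketch only; these helpers are hypothetical, not part of Tcl.

def to_utf16_units(code_point):
    """Return the 16 bit units that represent one code point in UTF-16."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not code points of their own")
    if code_point <= 0xFFFF:
        return [code_point]                # fits in a single 16 bit unit
    offset = code_point - 0x10000          # remaining 20 bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits -> low surrogate
    return [high, low]

def count_code_points(units):
    """Count code points in well-formed 16 bit units by skipping low surrogates."""
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)

text = [0x41, 0x1F600, 0x42]   # "A", U+1F600 (an emoji), "B"
units16 = [u for cp in text for u in to_utf16_units(cp)]
units32 = list(text)           # 32 bit representation: one unit per code point

# Number of 16 bit units, code points found in them, number of 32 bit units
print(len(units16), count_code_points(units16), len(units32))
```

With 16 bit units, every length or index operation has to scan for surrogates as count_code_points does here; with 32 bit units the length of the array is already the answer.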

And last but not least:

  • We are the Borg! AndroWish need not be compatible with nothing. You will be assimilated.

But there's hope:

https://web.archive.org/web/20150105000208if_/http://www.ch-werner.de/AndroWish/aw-no-work.jpg

Another article explaining the problem is The Tragedy of UCS-2.