Tcl Meetup, 2022-03-09
Summary of the surrogate pair issue: Tcl 8.6 has always had trouble handling surrogate pairs. That handling changed around 8.6.10 (?) to be like 8.7. With that handling, the string length of certain Unicode scalar values (those that need a surrogate pair) is 2, as documented in TIP 542. In 9.0, the string length of any single Unicode code point will be 1. Jan Nijtmans suggested releasing 8.7 and 9.0 at the same time.
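For readers unfamiliar with where a length of 2 comes from, here is a minimal C sketch of the standard UTF-16 arithmetic. It is not taken from the Tcl sources; the function names utf16_units and to_surrogates are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of UTF-16 code units needed to store one
 * Unicode code point. Code points above U+FFFF need a surrogate pair. */
static size_t utf16_units(uint32_t cp)
{
    return (cp >= 0x10000 && cp <= 0x10FFFF) ? 2 : 1;
}

/* Hypothetical helper: split a code point above U+FFFF into its high and
 * low surrogates. The caller must ensure cp is in 0x10000..0x10FFFF. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    uint32_t v = cp - 0x10000;              /* 20 significant bits remain */
    *hi = (uint16_t)(0xD800 | (v >> 10));   /* top 10 bits */
    *lo = (uint16_t)(0xDC00 | (v & 0x3FF)); /* bottom 10 bits */
}
```

An implementation that counts UTF-16 code units would report 2 for an emoji such as U+1F600 (stored as D83D DE00), while one that counts code points would report 1.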
Jan and Don discussed Tcl_UniCharToUtf, which maintains some state information about the string in the output buffer. When it encounters the first half of a surrogate pair, it can wait to output a character until it is called with the second half of the pair, even though it takes only one character as an argument each time it is called.
Tcl_UniCharToUtf is used throughout the C sources, so teasing the ad-hoc UTF-16 encoder/decoder out of Tcl will be a chore.
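The "wait for the second half" behaviour described above can be sketched in C as follows. This is an illustration of the general technique only, not Tcl's implementation: the names Utf16Encoder, put_utf8 and encoder_feed are hypothetical, the pending state is kept in a separate struct rather than in the output buffer, and replacing an unpaired surrogate with U+FFFD is an assumed policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder state: a high surrogate waiting for its partner. */
typedef struct {
    uint16_t pending_high;   /* 0 when nothing is pending */
} Utf16Encoder;

/* Write one code point (<= 0x10FFFF) as UTF-8; returns bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Feed one UTF-16 code unit. Returns the number of UTF-8 bytes written to
 * out, which may be 0 while a high surrogate is being held back.
 * out must have room for at least 6 bytes. */
static size_t encoder_feed(Utf16Encoder *enc, uint16_t unit, unsigned char *out)
{
    size_t n = 0;

    if (enc->pending_high != 0) {
        uint16_t high = enc->pending_high;
        enc->pending_high = 0;
        if (unit >= 0xDC00 && unit <= 0xDFFF) {
            /* Pair complete: combine both halves into one code point. */
            uint32_t cp = 0x10000 +
                (((uint32_t)(high - 0xD800) << 10) | (uint32_t)(unit - 0xDC00));
            return put_utf8(cp, out);
        }
        /* Unpaired high surrogate (assumed policy: emit U+FFFD). */
        n = put_utf8(0xFFFD, out);
    }
    if (unit >= 0xD800 && unit <= 0xDBFF) {
        enc->pending_high = unit;            /* hold until the next call */
        return n;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
        return n + put_utf8(0xFFFD, out + n); /* lone low surrogate */
    }
    return n + put_utf8(unit, out + n);
}
```

Feeding D83D returns 0 bytes, and feeding DE00 next returns the 4 bytes F0 9F 98 80. The sketch omits a flush step for a high surrogate left pending at the end of input. Any caller of an interface shaped like this has to cope with calls that produce nothing, which hints at why untangling such logic from a widely used function is laborious.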
Brian mentioned that 8.7 crashed when he pasted an emoji into a text widget.
Jan said that only a little more work remains on the Unicode issues in 8.7/9.0 to make them ready for a new release. In particular, he mentioned TIPs 573, 601, 607 and 619.
The fossil branch https://core.tcl-lang.org/tk/timeline?r=glyph_indexing_2 was mentioned as a case of a workaround Tk requires to resolve some current Unicode issues, which would presumably be made unnecessary by the above TIPs.
Don responded to requests for feedback on how people could lighten his workload in preparing new releases. He mentioned: