[Keith Vetter] - 2024-01-15: Emoji are weird [https://unicode.org/emoji/principles.html%|% %|%]. After initially being ignored by the Unicode Consortium, they were only adopted in 2007 as part of the Supplementary Multilingual Plane (SMP) [https://www.unicode.org/reports/tr51/#Introduction%|% %|%]. But being part of the SMP meant that many systems had to be updated to handle them. This includes Tcl, which didn't get initial emoji handling until version 8.6.10. Alas, I'm stuck on version 8.6.9, so I thought I was out of luck with regard to using emoji in my software.

But Unicode is complicated, and there's an obscure feature called Variation Selectors [https://www.unicode.org/charts/PDF/UFE00.pdf%|% %|%] [https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)%|% %|%]. They are 16 combining characters (U+FE00 - U+FE0F) which can change the glyph variant of the preceding character. Variation Selectors U+FE0E and U+FE0F select the text-style and emoji-style glyph variants, respectively. As an example, the character \u25b6 "▶" is "BLACK RIGHT-POINTING TRIANGLE", but \u25b6\uFE0F is an emoji play button [https://emojipedia.org/play-button%|%(view here)%|%].

Turns out there are over 150 code points in the Basic Multilingual Plane (BMP) which, when followed by the Emoji Style Variation Selector, display as emoji characters. A full list is at https://www.unicode.org/Public/emoji/5.0/emoji-variation-sequences.txt.

Below is a short program which downloads the master list of emoji-variation-sequences and displays them--as a table if Tk is loaded, or as text otherwise.
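
Before the full program, here is the core trick in isolation, as a minimal sketch. The helper name `WithSelector` is hypothetical and is not used by the program that follows; it only shows that a variation sequence is an ordinary two-character string, the base BMP character followed by U+FE0E or U+FE0F. Whether the result actually renders as an emoji depends on the fonts and the platform.

======
# Hypothetical helper: append a variation selector to a single character.
# style is "text" (U+FE0E) or "emoji" (U+FE0F).
proc WithSelector {char style} {
    switch -- $style {
        text    { set vs \uFE0E }
        emoji   { set vs \uFE0F }
        default { error "unknown style '$style': expected text or emoji" }
    }
    return $char$vs
}

# BLACK RIGHT-POINTING TRIANGLE followed by the emoji-style selector
set play [WithSelector \u25b6 emoji]

# The sequence is two characters long, and the second one is the selector
puts "length: [string length $play]"
scan [string index $play 1] %c second
puts [format "second code point: U+%04X" $second]
======

This should print a length of 2 and a second code point of U+FE0F. Both characters are in the BMP, which is why this works even on a Tcl that cannot handle SMP code points.
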
======
package require Tk
package require http
package require tls
http::register https 443 [list ::tls::socket -tls1 1]

set url https://www.unicode.org/Public/emoji/5.0/emoji-variation-sequences.txt

proc GetEmojiData {url} {
    set token [::http::geturl $url]
    set ncode [::http::ncode $token]
    set data [::http::data $token]
    ::http::cleanup $token
    return $data
}

proc ExtractCodePoints {emoji_data} {
    # 3299 FE0F ; emoji style; # (1.1) CIRCLED IDEOGRAPH SECRET
    # 1F004 FE0E ; text style; # (5.1) MAHJONG TILE RED DRAGON

    set result {}
    foreach line [split $emoji_data \n] {
        if {[string first "emoji style" $line] == -1} continue
        set codepoint [lindex [split $line " "] 0]
        if {[string length $codepoint] > 4} continue
        set n [regexp {\# \([.0-9]+\) (.*)} $line _ name]
        if {$n == 0} {
            puts stderr "could not extract name from '$line'"
            continue
        }
        lappend result [list $codepoint $name]
    }
    return $result
}

proc MakeTreeview {} {
    destroy {*}[winfo child .]

    set headers [list Raw FE0E FE0F Name]
    set tree ".tree"
    ::ttk::treeview $tree -columns $headers -yscroll ".vsb set" -xscroll ".hsb set" -height 30
    scrollbar .vsb -orient vertical -command "$tree yview"
    scrollbar .hsb -orient horizontal -command "$tree xview"

    set font [::ttk::style lookup Treeview -font]
    foreach col $headers {
        $tree heading $col -text $col -anchor c
        set width [font measure $font " $col "]
        if {$col eq "Name"} { set width 300 }
        $tree column $col -width $width
    }
    $tree heading \#0 -text Codepoint
    set width [font measure $font " Codepoint "]
    $tree column \#0 -width $width

    grid $tree .vsb -sticky nsew
    grid .hsb -sticky nsew
    grid column . 0 -weight 1
    grid row . 0 -weight 1
    return $tree
}

proc Show {tree codepoint raw FE0E FE0F name} {
    if {$tree ne ""} {
        set values [list $raw $FE0E $FE0F $name]
        $tree insert {} end -text $codepoint -values $values
    } else {
        puts "$codepoint\t$raw\t$FE0E\t$FE0F\t$name"
        flush stdout
    }
}

################################################################

set emoji_data [GetEmojiData $url]
set code_points [ExtractCodePoints $emoji_data]

set tree ""
if {[info exists ::tk_version]} {
    set tree [MakeTreeview]
} else {
    puts "Hex\tRaw\tFE0E\tFE0F\tName"
}

foreach pair $code_points {
    lassign $pair codepoint name
    scan $codepoint %x hex
    set raw [format %c $hex]
    set FE0E [format %c%c $hex 0xFE0E]
    set FE0F [format %c%c $hex 0xFE0F]
    Show $tree $codepoint $raw $FE0E $FE0F $name
}
======

----

[Jeff Smith] 2024-01-16 : Below is an online demo using [CloudTk]. This demo runs “Emoji, Unicode and Variation Selectors” in an Alpine Linux Docker Container. It is a 28.3MB image which is made up of Alpine Linux + tclkit + Emoji-Unicode-and-Variation-Selectors.kit + tls1.7.18 + libx11 + libxft + fontconfig. It is run under a user account in the Container. The Container is restrictive with permissions for "Other" removed for "execute" and "read" for certain directories.

[KPV] 2024-01-17: Alas, the online demo, at least on my browser, wouldn't display the emoji characters properly. So here's a screen shot instead:

[emoji and variation selectors]

----

[MG] 2024-01-21: Just tried this on Windows 7 using Tcl/Tk 8.6.9; unfortunately, none of the rows display emojis. A few display the exact same thing in all 3 columns; most display a single character in column one, then the same character followed by an "unprintable character" box in columns 2 and 3.