Emoji, Unicode and Variation Selectors

Keith Vetter - 2024-01-15: Emoji are weird. After initially being ignored by the Unicode Consortium, they were only adopted in 2007 as part of the Supplementary Multilingual Plane (SMP). But being part of the SMP meant that many systems had to be updated to handle them. This includes Tcl, which didn't get initial emoji handling until version 8.6.10.

Alas, I'm stuck on version 8.6.9, so I thought I was out of luck in regards to using emoji in my software.

But Unicode is complicated, and there's an obscure feature called Variation Selectors.

They are 16 combining characters (U+FE00 - U+FE0F) which can change the glyph variant of the preceding character. Variation Selectors U+FE0E and U+FE0F select the text-style and emoji-style variants, respectively.

As an example, the character \u25b6 "▶" is "BLACK RIGHT-POINTING TRIANGLE", but \u25b6\uFE0F is an emoji play button.
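As a rough sketch of the idea, a base character and its two variation sequences can be built with plain string operations. The proc name VariationSequences is hypothetical and not part of Tcl. Each sequence is two code points long; whether it renders as a single glyph depends on the font and the renderer.

```tcl
# Hypothetical helper: given a BMP code point in hex, return the bare
# character, its text-style sequence and its emoji-style sequence.
proc VariationSequences {hex} {
    scan $hex %x cp
    set base [format %c $cp]
    # U+FE0E requests text style, U+FE0F requests emoji style
    return [list $base "$base\uFE0E" "$base\uFE0F"]
}

lassign [VariationSequences 25B6] raw text emoji
# The selector is a separate code point appended to the base character
puts "[string length $raw] [string length $text] [string length $emoji]"  ;# prints: 1 2 2
```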

It turns out there are over 150 BMP code points which, when followed by the Emoji Style Variation Selector, display as emoji characters. A full list is at https://www.unicode.org/Public/emoji/5.0/emoji-variation-sequences.txt.
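To give a feel for that file, here is a minimal sketch that parses a single entry and flags whether its base code point lies in the BMP. The two sample lines are the ones quoted in the comments of the program on this page; the proc name ParseVariationLine and the dict keys are hypothetical, and the pattern assumes every entry follows the same layout as those samples.

```tcl
# Hypothetical helper: split one entry of emoji-variation-sequences.txt
# into its fields. Returns an empty dict for lines that do not match.
proc ParseVariationLine {line} {
    set pattern {^([0-9A-Fa-f]+) (FE0E|FE0F)\s*; (text|emoji) style;\s*# \(([.0-9]+)\) (.*)$}
    if {![regexp $pattern $line -> cp selector style version name]} {
        return {}
    }
    scan $cp %x value
    # BMP code points are those up to U+FFFF
    return [dict create codepoint $cp selector $selector style $style \
        name $name bmp [expr {$value <= 0xFFFF}]]
}

foreach line {
    "3299 FE0F  ; emoji style; # (1.1) CIRCLED IDEOGRAPH SECRET"
    "1F004 FE0E ; text style;  # (5.1) MAHJONG TILE RED DRAGON"
} {
    puts [ParseVariationLine $line]
}
```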

Below is a short program which downloads the master list of emoji variation sequences and displays them: as a table if Tk is loaded, or as text otherwise.

# Tk is optional: remove this line to get plain-text output in tclsh.
package require Tk
package require http
package require tls
http::register https 443 [list ::tls::socket -tls1 1]

set url https://www.unicode.org/Public/emoji/5.0/emoji-variation-sequences.txt

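# Download the emoji variation sequence list and return its text.
proc GetEmojiData {url} {
    set token [::http::geturl $url]
    set ncode [::http::ncode $token]
    set data [::http::data $token]
    ::http::cleanup $token
    if {$ncode != 200} {
        error "failed to download $url: HTTP status $ncode"
    }
    return $data
}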

proc ExtractCodePoints {emoji_data} {
    # 3299 FE0F  ; emoji style; # (1.1) CIRCLED IDEOGRAPH SECRET
    # 1F004 FE0E ; text style;  # (5.1) MAHJONG TILE RED DRAGON

    set result {}
    foreach line [split $emoji_data \n] {
        if {[string first "emoji style" $line] == -1} continue
        set codepoint [lindex [split $line " "] 0]
        # Keep only BMP code points (at most 4 hex digits)
        if {[string length $codepoint] > 4} continue
        set n [regexp {\# \([.0-9]+\) (.*)} $line _ name]
        if {$n == 0} {
            puts stderr "could not extract name from '$line'"
            continue
        }
        lappend result [list $codepoint $name]
    }
    return $result
}

proc MakeTreeview {} {
    destroy {*}[winfo child .]
    set headers [list Raw FE0E FE0F Name]
    set tree ".tree"
    ::ttk::treeview $tree -columns $headers -yscroll ".vsb set" -xscroll ".hsb set" -height 30
    scrollbar .vsb -orient vertical -command "$tree yview"
    scrollbar .hsb -orient horizontal -command "$tree xview"

    set font [::ttk::style lookup Treeview -font]

    foreach col $headers {
        $tree heading $col -text $col -anchor c
        set width [font measure $font "   $col   "]
        if {$col eq "Name"} { set width 300 }
        $tree column $col -width $width
    }
    $tree heading \#0 -text Codepoint
    set width [font measure $font " Codepoint "]
    $tree column \#0 -width $width

    grid $tree .vsb -sticky nsew
    grid .hsb -sticky nsew
    grid columnconfigure . 0 -weight 1
    grid rowconfigure . 0 -weight 1

    return $tree
}

proc Show {tree codepoint raw FE0E FE0F name} {
    if {$tree ne ""} {
        set values [list $raw $FE0E $FE0F $name]
        $tree insert {} end -text $codepoint -values $values
    } else {
        puts "$codepoint\t$raw\t$FE0E\t$FE0F\t$name"
        flush stdout
    }
}

set emoji_data [GetEmojiData $url]
set code_points [ExtractCodePoints $emoji_data]

set tree ""
if {[info exists ::tk_version]} {
    set tree [MakeTreeview]
} else {
    puts "Hex\tRaw\tFE0E\tFE0F\tName"
}

foreach pair $code_points {
    lassign $pair codepoint name

    scan $codepoint %x hex
    set raw [format %c $hex]
    set FE0E [format %c%c $hex 0xFE0E]
    set FE0F [format %c%c $hex 0xFE0F]
    Show $tree $codepoint $raw $FE0E $FE0F $name
}

Jeff Smith 2024-01-16 : Below is an online demo using CloudTk. This demo runs “Emoji, Unicode and Variation Selectors” in an Alpine Linux Docker Container. It is a 28.3MB image which is made up of Alpine Linux + tclkit + Emoji-Unicode-and-Variation-Selectors.kit + tls1.7.18 + libx11 + libxft + fontconfig. It is run under a user account in the Container. The Container is restrictive with permissions for "Other" removed for "execute" and "read" for certain directories.

Jeff Smith 2024-01-22 : I loaded NotoColorEmoji.ttf and DejaVuSans.ttf fonts in the Container and it gives a better output but still not the same as Keith’s screenshot. (Using Tclkit 8.6.9)

KPV 2024-01-17: Alas, the online demo, at least on my browser, wouldn't display the emoji characters properly. So here's a screenshot instead: (image: emoji and variation selectors)

MG 2024-01-21: Just tried this on Windows 7 using Tcl/Tk 8.6.9; unfortunately, none of the rows display emojis. A few display the exact same thing in all 3 columns; most display a single character in column one, then the same character followed by an "unprintable character" box in columns 2 and 3.

greg 2024-01-21: Could be due to the font used. Maybe try a font from the Segoe UI family, for example {"Segoe UI Emoji" 10} (see font support).

MG I don't have Segoe UI Emoji unfortunately; tried "Segoe UI" and "Segoe UI Symbol" but have the same outcome as the default font - both the FE0E and FE0F columns are displaying the extra byte as a separate unprintable character in most instances and not combining with or changing the first character.
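For anyone wanting to try the font suggestion systematically, here is a minimal sketch that picks the first installed family from a list of candidates. The proc name PickEmojiFont and the candidate list are assumptions made for illustration; in a Tk session the installed families would come from [font families]. Whether a given Tk build actually renders emoji glyphs with the chosen font is a separate question this sketch does not answer.

```tcl
# Hypothetical helper: return the first candidate that appears in the
# list of installed families (case-insensitive), or "" if none does.
proc PickEmojiFont {families candidates} {
    foreach name $candidates {
        if {[lsearch -exact -nocase $families $name] >= 0} {
            return $name
        }
    }
    return ""
}

# Hard-coded family list so the example runs without Tk
set installed {Arial "Segoe UI" "Segoe UI Symbol" Courier}
set candidates {"Segoe UI Emoji" "Noto Color Emoji" "Segoe UI Symbol"}
puts [PickEmojiFont $installed $candidates]

# With Tk loaded, the result could be applied to the table like this:
#   set family [PickEmojiFont [font families] $candidates]
#   if {$family ne ""} { ttk::style configure Treeview -font [list $family 10] }
```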

KPV What displays if you run this in tclsh (no tk or wish)? It should output to the console, which I hope will have emoji support. Likewise, what happens if you type at the command prompt: echo -e "\u25b6 \u25b6\uFE0F"? It should show a black triangle and an emoji play button.

MG In tclsh, virtually every character in all 3 columns is just a ? (and the 0E/0F columns always have 2 characters). "echo" in my cmd prompt just displays the following string literally, no expansion, so nothing useful.
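One possible cause of the ? characters is an output encoding that cannot represent the code points, although a font or console limitation would look much the same. As a hedged diagnostic sketch, the hypothetical proc CanEncode below checks whether a string survives a round trip through a given encoding; passing it [fconfigure stdout -encoding] would test the console channel.

```tcl
# Hypothetical helper: report whether $text survives conversion to
# $encoding and back. The catch covers Tcl versions that raise an
# error instead of substituting a replacement character.
proc CanEncode {encoding text} {
    if {[catch {encoding convertto $encoding $text} bytes]} {
        return 0
    }
    return [expr {[encoding convertfrom $encoding $bytes] eq $text}]
}

set sample "\u25b6\uFE0F"
puts "utf-8:     [CanEncode utf-8 $sample]"
puts "iso8859-1: [CanEncode iso8859-1 $sample]"
# To test the console channel itself:
#   puts [CanEncode [fconfigure stdout -encoding] $sample]
```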

ray2501 You can check TIP 600: Migration guide for Tcl 8.6/8.7/9.0. Tcl has supported emoji since 8.6.10.
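A script could check for that version at run time and fall back to the BMP variation-selector trick otherwise. This is only a sketch: the proc name HasInitialEmojiSupport is hypothetical, and the 8.6.10 threshold is simply the one quoted on this page.

```tcl
# Hypothetical helper: true when the given patch level is 8.6.10 or later.
proc HasInitialEmojiSupport {patchlevel} {
    return [expr {[package vcompare $patchlevel 8.6.10] >= 0}]
}

if {[HasInitialEmojiSupport [info patchlevel]]} {
    puts "Tcl [info patchlevel]: initial emoji handling should be available"
} else {
    puts "Tcl [info patchlevel]: limited to BMP code points plus variation selectors"
}
```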