Tilde Substitution

JMN 2023-08-27 With the implementation of tip 602 https://core.tcl-lang.org/tips/doc/trunk/tip/602.md The information below no longer applies for Tcl9

The [file tildeexpand $path] and [file home] commands can be used to substitute ~ when desired.

I use an alias in my tclsh for commandline use: e.g interp alias "" ~ "" file join $::env(HOME)

Where previously you might type: glob -dir ~/.fossils * it becomes: glob -dir [~ .fossils] *

This bakes the homedir into the alias - which as far as I know is fine, as I'm not aware of any platforms that really allow a process to change the user it's running as - but in case it matters for some reason, you could always alias it as:

interp alias "" ~ "" apply {args {file join $::env(HOME) $args}}


Tcl supports csh-style tilde substitution. If a file name starts with a tilde, then the file name will be interpreted as if the first element is replaced with the location of the home directory for the given user. If the tilde is followed immediately by a separator, then the $HOME environment variable is substituted. Otherwise the characters between the tilde and the next separator are taken as a user name, which is used to retrieve the user's home directory for substitution.

The Macintosh and Windows platforms do not support tilde substitution when a user name follows the tilde. On these platforms, attempts to use a tilde followed by a user name will generate an error that the user does not exist when Tcl attempts to interpret that part of the path or otherwise access the file. The behaviour of these paths when not trying to interpret them is the same as on Unix. File names that have a tilde without a user name will be correctly substituted using the $HOME environment variable, just like for Unix.


Tilde substitution considered harmful

SEH: Many Tcl programmers have encountered the situation of writing code that takes lists of arbitrary file names as input, having that code work perfectly well for considerable time, then suddenly seeing it blow up and start throwing errors for no evident reason. Upon looking into the matter, said programmer discovers that the blowups occur when the first character of an input pathname is a tilde, and realizes the horror of tilde substitution; that is, sometimes when a path name starts with a tilde, the interpreter attempts to replace the tilde with a guess at a home directory value, which may or may actually exist, whether or not such substitution might make sense in context.

Fixing a tilde substitution problem in a single instance might be straightforward, but if you don't want it to happen again you may find it notoriously difficult to make your code really bulletproof against it. And this is not just frustrating and time-consuming, it feels non-tclish.

One of the great strengths of Tcl is the fact that its straightforward evaluation rules and the list data type usually make it trivially easy to handle arbitrary data without ambiguity. Tilde substitution undermines this strength and makes it impossible to treat file name information with the kind of non-ambiguity a Tcl programmer comes to expect.

It may be impossible to eliminate tilde substitution behavior in the interpreter for reasons of backward compatibility, but I think it's enough of a problem to justify adding an easy way to turn it off, such as a global- or namespace-level configuration option.

Until that day, I hope to use this page to collect and contribute techniques and code to combat tilde substitution problems.


AMG: Here's some code from an old version of Wibble that may be helpful:

# Version of [file join] that doesn't do ~user substitution and ignores leading
# slashes, for all elements except the first.
proc filejoin {args} {
    for {set i 1} {$i < [llength $args]} {incr i} {
        lset args $i ./[lindex $args $i]
    }
    string map {./ ""} [file join {*}$args]
}

However, it has problems with path elements ending in dot.

SEH: Since the only character not allowed in a Unix filename is null, here is a version of "file join" that takes advantage of that fact to neutralize tilde substitution:

proc filejoin args {
        if {[string index [set result [eval file join $args]] 0] ne {~}} {
                return $result
        }
        foreach tindex [lsearch -all $args ~*] {
                set arg [string replace [lindex $args $tindex] 0 0 \x0]
                lset args $tindex $arg 
        }
        set result [eval file join $args]
        string map {\x0 ~} $result 
}

Null is not allowed in Windows filenames either, so this should work as a cross-platform solution.

AMG: Regarding NUL, see this Tcl bug: [L1 ]. Even though NUL may be invalid in filenames, some (not all) of the [file] commands accept it.

SEH: Excellent point. I hope to create wrappers for cd and open as well, I will consider filtering filenames with nulls as a bonus feature of the wrappers.

SEH Below is a full-blown package providing wrappers for all Tcl commands for which tilde substitution is an issue. The wrapper functions behave the same way as the commands they replace, except tilde substitution is not done.

if 0 {

tilde.tcl --

Version 0.3
Author: Stephen Huntley ([email protected])

Package tilde provides wrappers for common filesystem-access functions
(auto_execok, cd, exec, file, glob, open) which neutralize the tilde
substitution behavior of those commands.  

The tilde substitution feature in the Tcl interpreter tends to be 
counter-productive in code designed to process sets of arbitrary file names not 
known to the programmer in advance.  Said code may seem to work fine until a 
data set is processed containing a pathname that starts with a tilde character.
Then the code suddenly starts throwing errors, because the interpreter rewrites
the pathname without warning or opportunity to opt out, leading to unpredictable
results.

The equivalent functions provided here:
(tauto_execok, tcd, texec, tfile, tglob, topen) have the same syntax as the
originals named above, and give the same results with the exception that leading
tildes in pathnames are treated as ordinary characters, and no substitution
is performed.

As an added bonus, in the more powerful commands pathnames containing a null
character (\x00) are banned, and trying to pass a parameter with a null will 
result in an error.

This is because some of the built-in Tcl commands improperly treat a null in a
filename as a string terminator rather than a proper part of the name. Thus the
result returned by the command is likely to be not what is expected, which may
in turn lead to security and stability issues.

Null characters in pathnames are impermissible in standard Unix and Windows 
filesystems, so they should not ordinarily be encountered.

}

package require Tcl 8.4

namespace eval ::tilde {

proc tauto_execok args {
        set arg [lindex $args 0]
        if {[string index $arg 0] eq {~}} {
                set arg "./$arg"
                lset args 0 $arg
        }
        eval auto_execok $args
}

proc tcd args {
        set arg [lindex $args 0]
        if {[string first \x0 $arg] > -1} {error "Null character not permitted in path."}
        if {[string index $arg 0] eq {~}} {
                set arg "./$arg"
                lset args 0 $arg
        }
        eval cd $args
}

proc texec args {
        set argIndex [llength $args]
        for {set i 0} {$i < $argIndex} {incr i} {
                set arg [lindex $args $i]
                if {[string index $arg 0] eq {-}} continue
                break
        }
        if {[string first \x0 $arg] > -1} {error "Null character not permitted in path."}
        if {[string index $arg 0] eq {~}} {
                set arg "./$arg"
                lset args $i $arg
        }
        eval exec $args
}

proc fileattributes {file {attribute {}} args} {
        if {[string first \x0 $file] > -1} {error "Null character not permitted in path."}
        if {[string index $file 0] eq {~}} {
                set file ./$file
        }
        eval file attributes [list $file] $attribute $args
}

proc filejoin args {
        if {[string index [set result [eval file join $args]] 0] ne {~}} {
                return $result
        }
        foreach tindex [lsearch -all $args ~*] {
                set arg [string replace [lindex $args $tindex] 0 0 \x0]
                lset args $tindex $arg 
        }
        set result [eval file join $args]
        string map {\x0 ~} $result 
}

proc tfile args {
        set command [lindex $args 0]
        if {([lsearch -all $args ~*] eq {}) && ([lsearch -all $args *\x0*] eq {})} {
                if {[string match *stat $command]} {
                        set var [lindex $args 2]
                        upvar $var statvar
                        lset args 2 statvar
                }
                return [eval file $args]
        }
        set args [lrange $args 1 end]
        switch -glob -- $command {
                ati* -
                co* -
                de* -
                li* -
                m* -
                ren* {
                        # atime copy delete link mkdir mtime rename
                        if {[lsearch $args *\x0*] > -1} {error "Null character not permitted in argument values."}
                        foreach tindex [lsearch -all $args ~*] {
                                set arg ./[lindex $args $tindex]
                                lset args $tindex $arg 
                        }
                        set eval_command "file $command $args"
                }
                att* {
                        # attributes
                        set eval_command "fileattributes $args"
                }
                di* {
                        # dirname
                        set name [lindex $args 0]
                        if {[string index $name 0] eq {~}} {
                                set name .
                                lset args 0 $name 
                        }
                        set eval_command "file dirname $args"
                }
                exe* -
                exi* -
                i* -
                no* -
                o* -
                p* -
                rea* -
                se* -
                si* -
                sy* -
                t* -
                w* {
                        # executable exists isdirectory isfile 
                        # normalize owned pathtype readable readlink separator
                        # size system tail type writable
                        set name [lindex $args 0]
                        if {[string first \x0 $name] > -1} {
                                switch -glob -- $command {
                                        exe* -
                                        exi* -
                                        is* -
                                        o* -
                                        reada* -
                                        w* {
                                                return 0
                                        }
                                        p* -
                                        se*
                                        sy*
                                        ta* {
                                                 
                                        }
                                        default {
                                                error "Null character not permitted in path."
                                        }
                                }

                        }
                        if {[string index $name 0] eq {~}} {
                                set name ./$name
                                lset args 0 $name 
                        }
                        set eval_command "file $command $args"
                }
                j* {
                        # join
                        set eval_command "filejoin $args"
                }
                ls* -
                st* {
                        # lstat stat
                        set name [lindex $args 0]
                        set var [lindex $args 1]
                        upvar $var statvar
                        if {[string index $name 0] eq {~}} {
                                set name ./$name
                                lset args 0 $name
                                lset args 1 statvar
                        }
                        set eval_command "file $command $args"
                }
                na* {
                        # nativename
                        set name [lindex $args 0]
                        set name [string map {~ \x0} $name]
                        lset args 0 $name
                        set name [eval file nativename $args]
                        set eval_command "string map {\x0 ~} [list $name]"
                }
                default {
                        set eval_command "file $command $args"
                }
        }
        eval $eval_command
}

proc tglob args {
        if {[lsearch $args *\x0*] > -1} {error "Null character not permitted in argument values."}
        if {[lsearch -all $args ~*] eq {}} {
                return [eval glob $args]
        }
        set pathedit 0
        set argIndex [llength $args]
        set newArgs [list]
        for {set i 0} {$i < $argIndex} {incr i} {
                set arg [lindex $args $i]
                switch -glob -- $arg {
                        -d* -
                        -p* {
                                lappend newArgs $arg
                                incr i
                                set arg [lindex $args $i]
                                if {[string index $arg 0] eq {~}} {
                                        set arg "./$arg"
                                        set pathedit 1
                                }
                                lappend newArgs $arg
                        }
                        -j* -
                        -n* -
                        -ta* {
                                lappend newArgs $arg
                        }
                        -ty* {
                                lappend newArgs $arg
                                incr i
                                set arg [lindex $args $i]
                                lappend newArgs $arg
                        }
                        -- {
                                incr i
                                break
                        }
                        default break
                }
                
        }
        set args [lrange $args $i end]
        foreach tindex [lsearch -all $args ~*] {
                set arg [lindex $args $tindex]
                set arg \[~\][string range $arg 1 end]
                set args [lset args $tindex $arg] 
        }
        set newArgs [concat $newArgs $args]
#        set result [glob {*}$newArgs]
        set result [eval glob $newArgs]
        if {$pathedit} {
                foreach path [lsearch -all $result ./*] {
                        set arg [lindex $result $path]
                        set arg [string range $arg 2 end]
                        set result [lset result $path $arg]
                }
        }
        return $result
}

proc topen args {
        set arg [lindex $args 0]
        if {[string index $arg 0] eq {|}} {
                set arg [string range $arg 1 end]
                set cmd [lindex $arg 0]
                if {[string first \x0 $cmd] > -1} {error "Null character not permitted in path."}
                if {[string index $cmd 0] eq {~}} {
                        set cmd ./$cmd
                        lset arg 0 $cmd
                }
                set arg [linsert $arg 0 |]
                lset args 0 $arg
        } else {
                if {[string first \x0 $arg] > -1} {error "Null character not permitted in path."}
                if {[string index $arg 0] eq {~}} {
                        set arg ./$arg
                        lset args 0 $arg
                }
        }
        eval open $args
}

namespace export tauto_execok tcd texec tfile tglob topen
} ; # end namespace tilde

package provide tilde 0.3