TIP #431 Discussion

TIP #431 Discussion summarizes the discussions that occurred on tcl-core mailing list and in the Tcl Chatroom starting on 2014-09-12.

Over several hours in the chat, this TIP was discussed. aspect tries to summarise the eventual consensus below. Any errors or misrepresentations are my own, so speak up :-).

See Also

file mkdir
also contains a discussion of this issue

Problem Description

Short version (what?)

The public function Tcl_FSCreateDirectory ("attempts to create the directory given by calling the owning filesystem's “create directory” function") returns EEXIST under conditions where this maps to the operating system's behaviour. This error is not visible from the script level. It should be.

Detailed version (why?)

A process may desire to create a directory that cooperating processes stay out of. On Unix systems, the mkdir() system call and the accompanying command can be used for this purpose, since it fails if the directory already exists, allowing the caller to know that it created the directory. This fact can be used as a synchronization primitive among the cooperating processes. In contrast, file mkdir behaves like the Unix command mkdir -p, which does not return an error if the directory already exists, thus losing the detail that is leveraged as the synchronization primitive.

file mkdir as implemented eats the EEXIST (or equivalent) error from the underlying syscall. This is desirable with mkdir -p style usage, where an arbitrary number of parent directories may or may not need to be created first, but makes it impossible for a script to safely generate a directory for its own use.

Presently, the best option a script writer has is to call file mkdir $dir after ensuring file exists $dir returns false. But here lies a race condition: in the short interval between file exists returning false and file mkdir being called, another process might have created the directory. file mkdir turns EEXIST into success, and the programmer is none the wiser.

Security Implications

In the 90s this used to be known as a directory race, these days apparently it's called "TOCTOU" - Time-of-check / Time-of-use error.

Consider a program that runs with elevated privileges, which responds to requests from users by creating a directory. This could be saving results in their homedir, creating a ~/.config directory or writing in /tmp. On receiving and validating a request, the program will do something like:

set dir [file join ~$user results_dir]
if {[file exists $dir]} {   ;#  1
    error "Directory exists!"
}
file mkdir $dir    ;#  2
file attributes $dir -owner $user
# .. save results into the new directory

The interval between 1 and 2 provides a short opportunity where the user can create ~/results_dir. A malicious user might create it as a symbolic link to a more sensitive location controlled by the program. file attributes will follow the link and change ownership of wherever it points! Examples of exactly this class of attack in the wild: http://www.securityfocus.com/bid/2207/discuss http://www.openwall.com/lists/oss-security/2014/01/28/8

This trivial case is easily sidestepped by checking file isdirectory, at which point the attack just gets a bit more complicated - perhaps the attacker needs to replace the containing directory with a symlink. The ensuing arms race is unwinnable.

Conversely, correct behaviour (up to the operating system's guarantees) can be obtained by a batch file or shell script. Or an extension that uses only Tcl_FSCreateDirectory, which doesn't exhibit the TOCTOU fault.

Note also that Tcl's open supports EXCL, whose role is to avoid exactly these problems when creating files.

Security Warnings

The absence of TOCTOU errors in mkdir(2 ) and http://msdn.microsoft.com/en-us/library/windows/desktop/aa363855%28v=vs.85%29.aspx%|% CreateDirectory is contingent on guarantees provided by the operating system and file system: on ancient Unices, the above described race was baked into the system call itself! Today, POSIX mandates protection from symlink attacks since 2003 , but risks apparently still exist for some networked filesystems. When providing assurances about reliability or security to your users, be clear about your assumptions.

Amusingly, it's come out in researching this issue that races have come out in the opposite direction: this report is of a bug in shtool where a command which should have provided mkdir -p semantics reported EEXIST in corner cases, causing parallel make -j to fail. Tcl's implementation of file mkdir does not appear to permit such an error.

Discussion

While exposing mkdtemp(3) as file tempdir is desirable, the underlying issue can be addressed more usefully in a much simpler fashion.

In principle, it may be possible to recover security by checking the ownership and permissions of the directory after calling file mkdir, but this is nearly impossible to get correct. And there is no safe way to recover -- if we believe the directory has been tampered with, we cannot safely change it.

By contrast, the system call itself offers an "atomic create" which returns an error when the directory already exists. TclpObjCreateDirectory passes this error back to its caller, but TclFileMakeDirsCmd hides it.

So the solution seems to be a variant of file mkdir that respects EEXIST. Such a command could be used to implement file tempdir correctly at the script level.

Since file mkdir is N-ary, we can't simply add an option so a new command is needed: the best suggested name so far seems to be: file createdir.

Suggested behaviour for this command:

  file createdir $dir
    Attempt to create the directory specified.  Raises an error if:
      1 - The immediate parent of $dir (ie, [file dirname $dir]) does not exist (POSIX ENOENT)
      2 - A directory named $dir already exists (POSIX EEXIST)
    Otherwise, behaviour is equivalent to [file mkdir] called with one argument.

This corresponds to a simple call to Tcl_FSCreateDirectoryProc(), exposing the POSIX errors mentioned above. On Windows, these are translated from the native errors (ERROR_ALREADY_EXISTS, ERROR_PATH_NOT_FOUND) in tclWinError.c and this translation is tested in winFCmd-4.3 and 4.2.

A future enhancement might add platform-specific prefix options to file createdir, exposing permissions (on Unix) or security attributes (on Windows), but that's beyond scope for now.

Proof-of-concept implementation

This is a naive patch against trunk b5ecfdaff3:

Index: generic/tclCmdAH.c
==================================================================
--- generic/tclCmdAH.c
+++ generic/tclCmdAH.c
@@ -951,10 +951,11 @@
     static const EnsembleImplMap initMap[] = {
         {"atime",        FileAttrAccessTimeCmd,        TclCompileBasic1Or2ArgCmd, NULL, NULL, 0},
         {"attributes",        TclFileAttrsCmd,        NULL, NULL, NULL, 0},
         {"channels",        TclChannelNamesCmd,        TclCompileBasic0Or1ArgCmd, NULL, NULL, 0},
         {"copy",        TclFileCopyCmd,                NULL, NULL, NULL, 0},
+        {"createdir",        TclFileCreateDirCmd,        TclCompileBasic1ArgCmd, NULL, NULL, 0},
         {"delete",        TclFileDeleteCmd,        TclCompileBasicMin0ArgCmd, NULL, NULL, 0},
         {"dirname",        PathDirNameCmd,                TclCompileBasic1ArgCmd, NULL, NULL, 0},
         {"executable",        FileAttrIsExecutableCmd, TclCompileBasic1ArgCmd, NULL, NULL, 0},
         {"exists",        FileAttrIsExistingCmd,        TclCompileBasic1ArgCmd, NULL, NULL, 0},
         {"extension",        PathExtensionCmd,        TclCompileBasic1ArgCmd, NULL, NULL, 0},

Index: generic/tclFCmd.c
==================================================================
--- generic/tclFCmd.c
+++ generic/tclFCmd.c
@@ -209,10 +209,32 @@
  * Side effects:
  *        See the user documentation.
  *
  *----------------------------------------------------------------------
  */
+int
+TclFileCreateDirCmd(
+    ClientData clientData,        /* Unused */
+    Tcl_Interp *interp,                /* Used for error reporting. */
+    int objc,                        /* Number of arguments */
+    Tcl_Obj *const objv[])        /* Argument strings passed to Tcl_FileCmd. */
+{
+    if (objc != 2) {
+        Tcl_WrongNumArgs(interp, 1, objv, "target");
+        return TCL_ERROR;
+    }
+
+    if (Tcl_FSCreateDirectory(objv[1]) != TCL_OK) {
+        Tcl_SetObjResult(interp, Tcl_ObjPrintf(
+                    "can't create directory \"%s\": %s",
+                    TclGetString(objv[1]), Tcl_PosixError(interp)));
+        return TCL_ERROR;
+    }
+
+    return TCL_OK;
+}
+
 
 int
 TclFileMakeDirsCmd(
     ClientData clientData,        /* Unused */
     Tcl_Interp *interp,                /* Used for error reporting. */

Index: generic/tclInt.h
==================================================================
--- generic/tclInt.h
+++ generic/tclInt.h
@@ -2891,10 +2891,11 @@
 MODULE_SCOPE int        TclEvalEx(Tcl_Interp *interp, const char *script,
                             int numBytes, int flags, int line,
                             int *clNextOuter, const char *outerScript);
 MODULE_SCOPE Tcl_ObjCmdProc TclFileAttrsCmd;
 MODULE_SCOPE Tcl_ObjCmdProc TclFileCopyCmd;
+MODULE_SCOPE Tcl_ObjCmdProc TclFileCreateDirCmd;
 MODULE_SCOPE Tcl_ObjCmdProc TclFileDeleteCmd;
 MODULE_SCOPE Tcl_ObjCmdProc TclFileLinkCmd;
 MODULE_SCOPE Tcl_ObjCmdProc TclFileMakeDirsCmd;
 MODULE_SCOPE Tcl_ObjCmdProc TclFileReadLinkCmd;
 MODULE_SCOPE Tcl_ObjCmdProc TclFileRenameCmd;

Index: tests/fCmd.test
==================================================================
--- tests/fCmd.test
+++ tests/fCmd.test
@@ -390,10 +390,29 @@
 } -constraints {notRoot} -body {
     file mkdir tf1
     file exists tf1
 } -result {1}
 
+;# not sure how to number these ..
+test fCmd-4.20 {TclFileCreateDirCmd: TclpCreateDirectory succeeds} -setup {
+    cleanup
+} -constraints {notRoot} -body {
+    file createdir td1
+    file exists td1
+} -result {1}
+test fCmd-4.21 {TclFileCreateDirCmd: errno: EEXIST} -setup {
+    cleanup
+} -constraints {notRoot} -body {
+    file createdir td1
+    list [catch {file createdir td1} msg] $msg $errorCode
+} -result {1 {can't create directory "td1": file already exists} {POSIX EEXIST {file already exists}}}
+test fCmd-4.22 {TclFileCreateDirCmd: errno: ENOENT} -setup {
+    cleanup
+} -constraints {notRoot} -body {
+    list [catch {file createdir td1/td2} msg] $msg $errorCode
+} -result {1 {can't create directory "td1/td2": no such file or directory} {POSIX ENOENT {no such file or directory}}}
+
 test fCmd-5.1 {TclFileDeleteCmd: FileForceOption fails} -constraints {notRoot} -body {
     file delete -xyz
 } -returnCodes error -result {bad option "-xyz": must be -force or --}
 test fCmd-5.2 {TclFileDeleteCmd: accept 0 files (TIP 323)} -body {
     file delete -force -force

A Script-level Alternative

PYK 2014-09-16: I'm in favour of a built-in command that fails if the directory can't be created or already exists, but I think it's already possible to securely get the desired functionality, a cooperative directory lock, without such a change. The procedures in the following example break down the process into discrete tasks:

directory creation
just make sure the directory exists.
directory restriction
guarantee ownership and restricted permissions on the directory.
acquire a lock
Use exclusive creation of a separate lock file as the synchronization primitive.

Those steps can then be combined to create a locked directory, as createlockeddire does.

In order not to create a critical race condition, the example has to cheat by using the system's stat command, which illustrates that maybe what's really needed is for file stat to additionally return permissions.

The created lockfile is left open, which makes it possible to use external tools such as lsof to decide if the process owning the lock is still around.

The example below detects direct compromise of the target directory. There is no way for it to report what file the attacker changed permissions on, but the fact the the only possible attack is to restrict permissions mitigates the potential consequences, and external tools could be used to find out which files have recently had their permissions changed. It doesn't detect when the attack window was used to silently change permissions on some other file, but the fact that the attack would only result in locking down the permissions on that file mitigates the risk. In general, any chmod operation is potentially subject to an attack in which the attacker moves the directory out from under the caller, yet changing file permissions is a necessary task. The way to manage this issue is to only create directories and chmod directories within a directory over which you or someone you trust (root?) already has control. In other words, get your filesystem permissions right!

#! /bin/env tclsh

namespace import ::tcl::mathop::-

proc createdir fname {
    file mkdir $fname
    file attributes $fname -permissions 0600
    lassign [exec stat -c {%F %a %U} $fname] type permissions userid

    if {$type ne {directory} || $permissions ne 600 || $userid ne $::tcl_platform(user)} {
        if {$type ne {directory}} {
            set compromised type 
            set desired directory
            set actual $type
        } elseif {$permissions ne 0600} {
            set compromised permissions
            set desired 0600
            set actual $permissions
        } else {
            set compromised owner
            set desired $::tcl_platform(user)
            set actual $userid
        }
        error [list {race condition} {take evasive action!} $compromised [
            list $desired ne $actual] $fname]
    }
    file attributes $fname -permissions 700
    return $fname
}

proc lockdir  {fname args} {
    #lock file is is relative to $fname
    set lock ./.lock
    set timeout 0 
    foreach {opt val} $args {
        switch $opt {
            lock {
                if {$val ne {}} {
                    set lock $val
                }
            }
            timeout {
                if {![string is integer $timeout]} {
                    error [list {timeout must be an integer} not $opt]
                }
                set timeout $val
            }
        }
    }
    set lock [file join $fname $lock]
    set start [clock milliseconds]
    while {[catch {set chan [open $lock {RDWR CREAT EXCL}]} chan copts]} {
        set current [clock milliseconds]
        set elapsed [- $current $start]
        if {$elapsed >= $timeout} {
            error [list {timed out after} $elapsed file $lock]
        }
        after 1
    }
    return [list $fname $lock $chan]
}

proc createlockeddir {fname {lock {}}} {
    set fname [createdir $fname]
    lassign  [lockdir $fname lock $lock] fname lock chan
    return [list $fname $lock $chan]
}

Example:

% lassign [createlockeddir somedir] fname chan
file3
% lassign [lockdir somedir timeout 5000] fname chan 
{timed out after} 5000 file somedir/./.lock
% close file3
% file delete somedir/./.lock
% lassign [lockdir somedir timeout 5000] fname chan
file3

aspect notes the exec stat call and proposes an equally portable alternative:

proc createdir fname {
    exec mkdir $fname
    return $fname
}

.. on Windows I guess it would look something like exec cmd /c md $fname.

PYK 2014-09-17: Yes, of course, but the presence of the exec call above illustrates what may be considered a deficiency in file stat, just as this example illustrates what may be considered a deficiency in file mkdir.

Page Authors

aspect
PYK