tclvfs gotchas

tclvfs is an excellent facility; however, there are some gotchas (as you'd expect for something that exposes the soft underbelly of tcl's file system).

This page is for folklore and "here be dragons" advice. Please track changes to any folklore you relate here, so it can be a useful resource for implementors and not a collection of old wives' tales.

Also please file bug reports or feature requests for problems you run into where you believe there is a better or more correct behaviour.

Also please make sure you develop with the very latest CVS version of tclvfs. It's a continuously improving system. Old versions can crash on unusual cases, and you can waste a lot of time if you don't track CVS during development.

SEH -- The fileattributes vfs handler is a potential performance land mine. If you do [file attributes $file] on a file in a virtual filesystem, the vfs extension fulfills it by making a call to the fileattributes handler for a list of all attribute types, then a separate call to get each attribute value. On my Windows box that's seven separate calls.

If you've stacked two vfs's on top of each other, then the top vfs must resolve each of those 7 calls with a [file attributes] call of its own, which generates 7 new calls from the extension to the bottom vfs for each of the 7 calls to the top vfs.

If you want to stack several vfs's on top of each other the result is a geometric explosion of calls to fileattributes handlers, which puts a serious practical limit on the stackability of virtual filesystems.
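The growth described above can be put into numbers with a small model. This is only a sketch: the proc name handlerCalls is made up for illustration, and the figure of 7 calls per query is the one observed on the Windows box mentioned above, not a constant of tclvfs.

```tcl
# Hypothetical model: one [file attributes] query on a layer costs
# $perQuery handler calls, and each of those calls triggers a full
# query on the layer below it.
proc handlerCalls {depth {perQuery 7}} {
    set calls 1
    for {set i 0} {$i < $depth} {incr i} {
        set calls [expr {$calls * $perQuery}]
    }
    return $calls
}

foreach depth {1 2 3 4} {
    puts "stack depth $depth: [handlerCalls $depth] handler calls reach the bottom vfs"
}
```

Under these assumptions a stack of three vfs's already turns one query into 343 calls at the bottom layer.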

This problem can be ameliorated with caching of the results of [file attributes] calls within vfs namespaces, but in the future it would be nice if such complexity were dealt with within the extension itself, say with an addition to the fileattributes handler options to include an "allvalues" index that would return all file attribute values in one call, thus keeping performance of virtual filesystems roughly linear with stacking.

This of course would be very helpful for vfs's that employ network connections as well; getting all attribute info with one call would be far preferable, performance-wise, to dealing with the network latency that comes with multiple separate calls.

Vince perhaps you should comment on the TIP to move tclvfs into the Tcl core [L1 ] before it is voted on, to try to get Andreas to improve the handling of 'file attributes'...

CMcC It's not as bad as all that, is it? Tcl i/o core calls to get the list of all attribute names (which tclvfs converts to a [$vfs fileattributes] call with no args.) The i/o core then calls to get an attribute with each attribute index. If your vfs is calling [file attributes] to get the list of names *it* can export to its caller, then of course each of the child vfs's attribute values will be calculated.

The reason this is a problem is that tcl [file] doesn't have an interface equivalent to the AttrString command, so there's no way for you to know which names the underlying fs supports (and the underlying file has) without getting the names *and* values, but all that's really needed is the names.

Seems there are only a few ways out of this:

   1. Add a subcommand to file like [file attrnames] which returns just the names and not the values.
   2. vfs stacking implementations assume that all the underlying files will support a static set of attribute names, perform an initial [file attributes $path] with no attribute name args, and cache the lsorted array names which result from it.

What's happening, in summary: tcl doesn't export the weaker "get attribute names" facility, so we (perforce) use the stronger and more expensive [file attributes $path] command to get those names. But we need not do that *per*call*; we could do it on tclvfs initialisation and return that cached value.

Of course, the set of supported attributes *could* be different for each file in a file system, and tcl does support this (which is kinda funky) but because it provides no way to discover the supported attribute names without the associated values, it's going to be an expensive thing to do.
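The caching idea can be sketched as follows. The namespace attrcache, the proc attrcache::names and its arguments are hypothetical names invented for this illustration, and the sketch assumes that every file under a mount point supports the same attribute names, which (as noted above) Tcl does not guarantee.

```tcl
# Hypothetical helper: remember the attribute names once per mount point.
namespace eval attrcache {
    variable cache
    array set cache {}
}

proc attrcache::names {mount samplePath} {
    variable cache
    if {![info exists cache($mount)]} {
        # The one expensive call: it fetches names *and* values,
        # but only the names are kept.
        array set attrs [file attributes $samplePath]
        set cache($mount) [lsort [array names attrs]]
    }
    return $cache($mount)
}

# Usage: the first call pays for [file attributes]; later calls for the
# same mount point are answered from the cache.
puts [attrcache::names demoMount [pwd]]
```

A fileattributes handler in the top vfs could return this cached list when asked for names, and only touch the lower vfs when a specific value is requested.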

SEH [file exists $fileName] returns 0 if $fileName exists but is not executable. Use glob to test for existence.

Vince This bug is invalid. It is user error -- 'file exists' works perfectly if you implement your vfs 'access' handler correctly.

SEH I'll restate the nature of the gotcha: the docs for the Tclvfs package don't contain enough detail about the access handler method to allow certainty that you've implemented it correctly.

The vfs 'access' handler takes a mode argument in the form of an integer the meaning of which is not explained anywhere in the docs. A utility command (accessMode) is provided which resolves the integer into a string, but there is no explanation of the meanings of the possible strings either. Most of the meanings can be guessed, but if you guess wrong on one, your access handler won't work as you expect. The possible outputs of accessMode and their meanings are (I believe):

 F     file exists
 X     file is executable
 W     file is writable
 XW    file is executable and writable
 R     file is readable
 RX    file is readable and executable
 RW    file is readable and writable

I don't know if these designations are meant to check for exclusivity or simply sufficiency; i.e., if a file is readable and writable and the access mode is R, does access properly throw an error or not?

Vince feel free to file a bug report against the documentation. FYI, any unix man page for 'access()' will tell you how to interpret these fields (but of course we should still document it correctly in tclvfs).
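For reference, here is a minimal sketch of the access() semantics those man pages describe: the requested mode is a bit mask, and the check succeeds when every requested bit is granted (sufficiency, not exclusivity). The numeric values F_OK=0, X_OK=1, W_OK=2, R_OK=4 are the usual Unix ones and are an assumption here, though they agree with the order of the accessMode strings listed above. The proc name modeAllowed is hypothetical.

```tcl
# Assumed POSIX-style bits: F_OK=0 (existence only), X_OK=1, W_OK=2, R_OK=4.
# Hypothetical helper: true when every requested bit is present in the
# permissions the file actually grants.
proc modeAllowed {requested granted} {
    expr {($requested & $granted) == $requested}
}

# A file that is readable and writable but not executable grants R|W = 6.
set granted 6
foreach {mode label} {0 F 1 X 2 W 3 XW 4 R 5 RX 6 RW} {
    if {[modeAllowed $mode $granted]} {
        set verdict allowed
    } else {
        set verdict denied
    }
    puts "$label ($mode): $verdict"
}
```

Under these assumptions a request for R on a readable and writable file is allowed, and a request for F (mode 0) succeeds for any file that exists, whether or not it is executable.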

CMcC it's more than a documentation bug - the vfs.c code for handling return from access callback is pretty horrible - there has got to be a better way to do it.

Here's the (excerpted) code to call the tclvfs' access callback as at 28Dec04, interspersed are my comments:

    returnVal = Tcl_EvalObjEx(interp, mountCmd, 
                              TCL_EVAL_GLOBAL | TCL_EVAL_DIRECT);

The tcl access callback is evaluated - at this point returnVal can be one of TCL_OK (0), TCL_ERROR, TCL_BREAK, TCL_CONTINUE, or anything returned using [return -code].

    if (returnVal != TCL_OK && returnVal != -1) {
        VfsInternalError(interp);
    }

So, anything but a normal [return] or a [return -code -1] is an internal error.

    if (returnVal != 0) {
        Tcl_SetErrno(ENOENT);
        return -1;

This code is equivalent to if (returnVal != TCL_OK)

    } else {
        return returnVal;
    }

And this else branch can only ever return TCL_OK (0).

It would be more intuitive to actually decode the returned value of the access callback, rather than rely upon the return -code.
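The behaviour of the excerpt can be mimicked at script level to see what each kind of handler return leads to. This is a model of the C logic quoted above, not the tclvfs implementation; the three handler procs and modelAccessResult are hypothetical names.

```tcl
# Three hypothetical access handlers, for illustration only.
proc allowAccess {} { return }
proc denyAccess {} { return -code -1 }
proc brokenAccess {} { error "bug in handler" }

# Model of the quoted C logic. Returns a list of:
#   result  (0 = access ok, -1 = failure)
#   errno   (ENOENT on failure, empty otherwise)
#   logged  (1 where vfs.c would call VfsInternalError)
proc modelAccessResult {handler} {
    set code [catch {$handler} msg]
    set logged 0
    if {$code != 0 && $code != -1} {
        set logged 1
    }
    if {$code != 0} {
        return [list -1 ENOENT $logged]
    }
    return [list 0 {} $logged]
}

foreach h {allowAccess denyAccess brokenAccess} {
    puts "$h -> [modelAccessResult $h]"
}
```

In this model both a [return -code -1] and a thrown error end up as a -1 result with ENOENT; the only difference is whether the internal-error path runs first. Whatever value the handler returns is ignored.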

SEH Extension has a deadly crashing bug when attempting to close a file opened for write. See [L2 ] (remove when fixed/closed with a note to use latest CVS)

Vince This has now been fixed. Use latest cvs.

SEH -- The vfs api layer will catch all errors in the close callback procedure, so if the procedure aborts due to an error, the close command that invoked the callback will still return OK, giving no indication that the close procedure failed and thus possibly lost data.
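One way to avoid losing such failures silently is to trap them inside the callback and report them yourself. The following is a minimal sketch; the proc name guarded is hypothetical, and writing to stderr is just one possible way to surface the problem.

```tcl
# Hypothetical wrapper: run a script in the caller's scope, report any
# error on stderr, and return 1 on success or 0 on failure.
proc guarded {label script} {
    if {[catch {uplevel 1 $script} err]} {
        puts stderr "$label failed: $err"
        return 0
    }
    return 1
}

# Usage inside a close callback might look like this (the body is a
# stand-in for the real work of saving the channel contents):
set ok [guarded "close callback for demo.txt" {
    error "could not write back to the underlying store"
}]
puts "callback succeeded: $ok"
```

This does not make the user's [close] fail, since the vfs layer still reports OK, but at least the failure leaves a trace.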

SEH -- Although the w channel open mode is supported in theory, in practice when the channel is passed to the close callback to be handled no capability is provided to read what's been written to the channel. Thus the written data is inaccessible to the vfs programmer.

So unless you want to limit yourself to read-only filesystems, you must kludge things somehow, like silently switching the mode to w+ in the open procedure and hoping nobody using the vfs assumes a file opened for writing only will have reads blocked.

CMcC I had problems with an old version of memchan which aped this. As I see it, the problem at base is that the implementor of a vfs wants to create a channel in one mode (to give to a user), and subsequently escalate that mode to perform more powerful operations.

It would be possible to achieve this escalation in the tclvfs C code which calls the close callback - automatically escalating mode to the maximum possible. There is currently a TIP to expose those mode bits to extensions to try to solve this problem.

I note that the problem hasn't been obvious to date because most of the distributed filesystems use memchan, and memchans are always maximally permissive.

SEH -- You can't use fcopy in the close callback, or otherwise pass data directly from one file channel to another. You have to read channel data into a variable, then write it from the variable to the target channel.

CMcC Once a channel is closed, the only file facilities which the i/o core will permit are read and write. Asynchronous operations ought to be possible, but are probably risky.

SEH -- My understanding was that the channel is not closed in the close callback, merely restricted to seek and read operations. Fcopy illustrates the problem I'm talking about, but you can try the following in the callback procedure:

 set newchannel [open newfile.txt w]
 seek $callbackchannel 0
 puts $newchannel [read $callbackchannel]

and you will notice that no data is transmitted to newchannel, and no error is generated. You will only notice that there was a problem when you find that the data you thought you'd backed up is gone forever.

CMcC I just tried that (with the addition of [close $newchannel] :) and it worked fine. I had an old version of memchan, though, which didn't support seek properly (I think that was the problem) and which indeed failed silently to allow me to read its contents back ... it just always returned "". I found I had both a /usr/lib/Memchan2.2 and a /usr/lib/Memchan2.2.0 which was more current, but which was never being found by the package mechanism.

I think we need to get to the bottom of the problem you observe.

SEH -- I'm not using Memchan at all. The above callbackchannel was created by opening an existing real file.

CMcC -- Ok, well this is puzzling and disturbing ... I'm getting the output in the output file. Can you add tests to discover whether the problem is (a) empty read, (b) failed write, (c) some other error? Can you write to stderr from the close callback?

SEH -- I tried a simple scenario here at work (Win2000) and it seems to work OK as you say. At home (WinXP) is where I have the problems. The problem may be OS-dependent, or it may stem from the rather more complicated arrangements (daisy-chained vfs's) I'm using.
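A diagnostic version of the copy, along the lines CMcC asks for, might look like the sketch below. It reports how much was read and how much reached the target, so an empty read can be told apart from a failed write. The proc name copyChannelChecked is hypothetical; note that it switches the source channel to binary translation so that byte counts are comparable, which is a side effect you may not want outside of debugging.

```tcl
# Hypothetical diagnostic helper: copy everything from an open channel to
# a new file via a variable, reporting byte counts on stderr.
proc copyChannelChecked {src dstPath} {
    fconfigure $src -translation binary
    seek $src 0
    set data [read $src]
    puts stderr "read [string length $data] bytes from $src"

    set dst [open $dstPath w]
    fconfigure $dst -translation binary
    puts -nonewline $dst $data
    close $dst

    set written [file size $dstPath]
    puts stderr "wrote $written bytes to $dstPath"
    if {$written != [string length $data]} {
        error "short write: expected [string length $data] bytes, got $written"
    }
    return $written
}
```

Calling this from the close callback with the callback channel and a path on a real filesystem should show on stderr which step, if any, comes up empty.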

13dec04 jcw - I'm having some trouble throwing errors from VFS driver code. I'm trying to report an error inside matchindirectory, when called with a bad path. There appears to be a "vfs::filesystem posixerror $N" call for this purpose, but I can't make it work. Looking into tclvfs's "vfs.c" code, something strange seems to be in there: the return code is -1 instead of TCL_OK. It leads to the following effect:

    % vfs::filesystem posixerror 13
    command returned bad code: -1
    %

CMcC look above for discussion of the access callback - it relies upon a [return -code -1] to indicate no access, perhaps this is related?

Ok, RTFM vfs-fsapi.man ...

    The generic layer expects that the subcommands of a handler signal
    error conditions by calling [[vfs::filesystem posixerror]] with
    the appropriate posix error code instead of throwing a tcl error.
    If the latter is done nevertheless it will be treated as an unknown
    posix error.

A consensus seems to be forming that [vfs::filesystem posixerror] is a bad thing:

   1. It generates errors which purport to be from a posix command call, but which aren't; they're from the tclvfs layer.
   2. The posixerror array isn't done portably, so on some platforms you will get false errors reported.
   3. [posixerror] sets errno, which it should not do, as that will mask real posix errors.

The consensus is that the API needs changing, that simple [error] should be enough.

Vince inserts that the entire purpose of introducing vfs::filesystem posixerror was to be able to set errno from Tcl code to emulate posix errors from a filesystem implementation. So your (3) above is considered to be the defining feature of this command, not a problem with it!! The reason for this is that Tcl's filesystem API is in most cases only able to transfer a posix error code, not an arbitrary string, from a filesystem implementation to Tcl's innards.

Meanwhile, some API elements require a [return -code -1] to work properly. I'll try to document them.

Ok, all tclvfs callbacks / api elements can return -1 to indicate failure. They can also [error] to indicate failure, but this provokes a call to VfsInternalError, which is usually not really what you want.

Vince inserts that error is currently defined to be exactly the wrong thing to do. Tclvfs documents that it is a bug in your vfs handler if it throws an error that is not a posix error (see the 2003-02-20 changelog entry).

I guess the problem is to distinguish an expected failure (such as when you tried to stat a file but the file didn't exist) from an unexpected failure, such as when your tclvfs implementation has a bug.

It seems that expected failures are indicated by [return -code -1] and the others will result in some kind of logging behavior.

This seems reasonable, even unavoidable, but should be documented explicitly - not indirectly by [vfs::filesystem posixerror].

[Category VFS]