tclvfs gotchas

tclvfs is an excellent facility; however, there are some gotchas (as you'd expect for something that exposes the soft underbelly of tcl's file system).

This page is for folklore and "here be dragons" advice. Please track changes to any folklore you relate here, so that it stays a useful resource for implementors rather than a collection of old wives' tales.

Also please file bug reports or feature requests for problems you run into where you believe there is a better or more correct behaviour.

Also please make sure you develop with the very latest CVS version of tclvfs. It's a continuously improving system: old versions can crash in unusual cases, and you can waste a lot of time if you don't track CVS during development.

SEH 20080304 -- I have found that some virtual filesystems that work perfectly well in a tclsh interpreter suddenly start to malfunction when run via tclkit. The culprit seems to be the Memchan package, or rather the fact that tclkit overrides Memchan with its own equivalent "rechan" code on startup. If a vfs uses memchans, then when it runs inside a tclkit it may suddenly start throwing "bad file number" errors when it attempts to write to the channel.

My best guess is that, since tclkit appears to run wrapped application code in a slave interpreter, the "rechan" channel, which is created in the master interpreter, is not being properly shared with the slave, so it is not usable there.

SEH -- The fileattributes vfs handler is a potential performance land mine. If you do [file attributes $file] on a file in a virtual filesystem, the vfs extension fulfills it by calling the fileattributes handler once for a list of all attribute names, then once more for each attribute value. On my Windows box that's seven separate calls.

If you've stacked two vfs's on top of each other, the top vfs must resolve each of those 7 calls with a [file attributes] call of its own, which generates another 7 calls from the extension to the bottom vfs for each of the 7 calls to the top vfs: 49 calls in all at the bottom.

If you stack several vfs's on top of each other, the result is a geometric explosion of calls to fileattributes handlers, which puts a serious practical limit on how far virtual filesystems can be stacked.
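To put rough numbers on the explosion, here is a C sketch of a hypothetical cost model: each [file attributes] query on a layer costs c handler calls (c = 7 in the Windows case above), and each of those calls is resolved by one full [file attributes] query on the layer below, so the layer at depth d (top = 1) sees c^d calls. The function names are made up for illustration.

```c
/* Hypothetical cost model: one [file attributes] query on a layer costs
   calls_per_query handler calls, and each of those calls is resolved by
   one full [file attributes] query on the layer below. */
unsigned long calls_at_layer(unsigned long calls_per_query, unsigned depth) {
    unsigned long calls = 1;
    for (unsigned d = 0; d < depth; d++) {
        calls *= calls_per_query;   /* every call fans out again below */
    }
    return calls;
}

/* Total handler calls across a stack of `layers` virtual filesystems. */
unsigned long total_calls(unsigned long calls_per_query, unsigned layers) {
    unsigned long total = 0;
    for (unsigned d = 1; d <= layers; d++) {
        total += calls_at_layer(calls_per_query, d);
    }
    return total;
}
```

For c = 7 this gives 7, 49 and 343 calls on the first three layers (399 in total). With a single "allvalues"-style call per query (c = 1), total_calls(1, n) is just n, which is the linear behaviour asked for in the next paragraph.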

This problem can be ameliorated by caching the results of [file attributes] calls within vfs namespaces, but in the future it would be nice if the extension itself dealt with this complexity, say by adding an "allvalues" index to the fileattributes handler options that returns all file attribute values in one call, thus keeping the performance of virtual filesystems roughly linear in the number of stacked layers.

This would of course also help vfs's that use network connections: getting all attribute info in one call would perform much better than paying the network latency of multiple separate calls.

Vince, perhaps you should comment on the "move tclvfs to the Tcl core" TIP [L1 ] before it is voted on, to try to get Andreas to improve the handling of 'file attributes'...

CMcC It's not as bad as all that, is it? The Tcl i/o core makes one call to get the list of all attribute names (which tclvfs converts into a [$vfs fileattributes] call with no args), then one call per attribute index to get each value. If your vfs calls [file attributes] to get the list of names *it* can export to its caller, then of course each of the child vfs's attribute values will be calculated as well.

The reason this is a problem is that tcl [file] doesn't have an interface equivalent to the AttrString command, so there's no way for you to know which names the underlying fs supports (and the underlying file has) without getting the names *and* values, but all that's really needed is the names.

Seems there are only a few ways out of this:

1. add a subcommand to file, like [file attrnames], which returns just the names and not the values.
2. have vfs stacking implementations assume that all the underlying files support a static set of attribute names, perform one initial [file attributes $path] with no attribute name args, and cache the lsorted array names which result from it.

What's happening, in summary: Tcl doesn't export the weaker "get attribute names" facility, so we (perforce) use the stronger and more expensive [file attributes $path] command to get those names. But we need not do that *per call*; we could do it once at tclvfs initialisation and return the cached value.

Of course, the set of supported attributes *could* be different for each file in a file system, and Tcl does support this (which is kinda funky), but because it provides no way to discover the supported attribute names without their values, supporting that is going to be expensive.

SEH [file exists $fileName] returns 0 if $fileName exists but is not executable. Use glob to test for existence.

Vince This bug is invalid. It is user error -- 'file exists' works perfectly if you implement your vfs 'access' handler correctly.

SEH I'll restate the nature of the gotcha: the docs for the Tclvfs package don't contain enough detail about the access handler method to allow certainty that you've implemented it correctly.

The vfs 'access' handler takes a mode argument in the form of an integer whose meaning is not explained anywhere in the docs. A utility command (accessMode) is provided that converts the integer into a string, but the meanings of the possible strings aren't explained either. Most of the meanings can be guessed, but if you guess one wrong, your access handler won't work as you expect. The possible outputs of accessMode and their meanings are (I believe):

 F      file exists
 X      file is executable
 W      file is writable
 XW     file is executable and writable
 R      file is readable
 RX     file is readable and executable
 RW     file is readable and writable

I don't know if these designations are meant to check for exclusivity or simply sufficiency; i.e., if a file is readable and writable and the access mode is R, does access properly throw an error or not?

Vince feel free to file a bug report against the documentation. FYI, any unix man page for 'access()' will tell you how to interpret these fields (but of course we should still document it correctly in tclvfs).
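A minimal C sketch of those access() semantics, assuming the integer is the usual bitwise OR of the POSIX flags R_OK, W_OK and X_OK from <unistd.h>, with F_OK (0 on common systems) meaning "existence only". The first helper reproduces the letter strings listed above; the second encodes the POSIX rule that the check succeeds when all requested permissions are granted. Both helper names are hypothetical, not part of tclvfs.

```c
#include <unistd.h>

/* Hypothetical helper: render an access() mode in the letter style of the
   list above (R, X, W order chosen only to match it). buf needs 4 bytes. */
void access_mode_letters(int mode, char buf[4]) {
    int n = 0;
    if (mode & R_OK) buf[n++] = 'R';
    if (mode & X_OK) buf[n++] = 'X';
    if (mode & W_OK) buf[n++] = 'W';
    if (n == 0) buf[n++] = 'F';   /* F_OK is 0 on common systems: existence only */
    buf[n] = '\0';
}

/* POSIX access() semantics: the check passes when every requested
   permission is granted; extra granted permissions don't matter. */
int access_permitted(int requested, int granted) {
    return (requested & ~granted) == 0;
}
```

So, for SEH's example, a file that is readable and writable passes a check with mode R; the handler should only report failure when some requested permission is missing (or the file doesn't exist).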

CMcC It's more than a documentation bug - the vfs.c code for handling the return from the access callback is pretty horrible - there has got to be a better way to do it.

Here's the (excerpted) code that calls the tclvfs access callback, as of 28Dec04, interspersed with my comments:

    returnVal = Tcl_EvalObjEx(interp, mountCmd, 
                              TCL_EVAL_GLOBAL | TCL_EVAL_DIRECT);

The tcl access callback is evaluated - at this point returnVal can be one of TCL_OK (0), TCL_ERROR, TCL_BREAK, TCL_CONTINUE, or anything returned using [return -code].

    if (returnVal != TCL_OK && returnVal != -1) {
        VfsInternalError(interp);
    }

So, anything but a normal [return] or a [return -code -1] is an internal error.

    if (returnVal != 0) {
        Tcl_SetErrno(ENOENT);
        return -1;

This code is equivalent to if (returnVal != TCL_OK)

    } else {
        return returnVal;
    }

And this else branch will always return TCL_OK, because returnVal can only be 0 here.

It would be more intuitive to actually decode the returned value of the access callback, rather than rely upon the return -code.
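To make the control flow of the excerpt easier to follow, here is a self-contained C model of it. It is not the real vfs.c code (which needs tcl.h and an interpreter), and the names model_access_result, MODEL_TCL_OK and MODEL_TCL_ERROR are made up. It shows that only TCL_OK counts as access granted, that every other code becomes -1 with errno set to ENOENT, and that codes other than 0 and -1 are additionally treated as internal errors.

```c
#include <errno.h>
#include <stdbool.h>

/* Mirrors tcl.h, where TCL_OK is 0 and TCL_ERROR is 1. */
enum { MODEL_TCL_OK = 0, MODEL_TCL_ERROR = 1 };

/* Model of the excerpted access logic (as of 28Dec04).
   Returns what the C access hook would return; *internal_error is set
   where vfs.c would call VfsInternalError. */
int model_access_result(int returnVal, bool *internal_error) {
    /* Only a plain [return] (TCL_OK) or [return -code -1] is "expected". */
    *internal_error = (returnVal != MODEL_TCL_OK && returnVal != -1);
    if (returnVal != MODEL_TCL_OK) {
        errno = ENOENT;   /* every failure looks like "no such file" */
        return -1;
    }
    return returnVal;     /* always MODEL_TCL_OK here */
}
```

The outcome is effectively boolean: the callback's own result value is never inspected, only its return code.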

Lars H: I have no opinion on whether using the return code is a good idea or not in this case, but I find it somewhat troubling that the return code -1 is used. The uplevel page contains a discussion which assigns a completely different meaning to this return code, and there is also a pure Tcl implementation of that mechanism here on the wiki.

SEH The extension has a deadly crashing bug when attempting to close a file opened for writing. See [L2 ] (remove when fixed/closed, with a note to use latest CVS)

Vince This has now been fixed. Use latest cvs.

SEH -- The vfs api layer will catch all errors in the close callback procedure, so if the procedure aborts due to an error, the close command that invoked the callback will still return OK, giving no indication that the close procedure failed and thus possibly lost data.

SEH -- Although the w channel open mode is supported in theory, in practice, when the channel is passed to the close callback to be handled, no capability is provided to read what's been written to it. Thus the written data is inaccessible to the vfs programmer.

So unless you want to limit yourself to read-only filesystems, you must kludge things somehow, like silently switching the mode to w+ in the open procedure and hoping nobody using the vfs assumes a file opened for writing only will have reads blocked.

CMcC I had problems with an old version of memchan which aped this. As I see it, the problem at base is that the implementor of a vfs wants to create a channel in one mode (to give to a user) and subsequently escalate that mode to perform more powerful operations.

It would be possible to achieve this escalation in the tclvfs C code which calls the close callback - automatically escalating mode to the maximum possible. There is currently a TIP to expose those mode bits to extensions to try to solve this problem.

I note that the problem hasn't been obvious to date because most of the distributed filesystems use memchan, and memchans are always maximally permissive.

SEH -- You can't use fcopy in the close callback, or otherwise pass data directly from one file channel to another. You have to read channel data into a variable, then write it from the variable to the target channel.

CMcC Once a channel is closed, the only file facilities which the i/o core will permit are read and write. Asynchronous operations ought to be possible, but are probably risky.

SEH -- My understanding was that the channel is not closed in the close callback, merely restricted to seek and read operations. Fcopy illustrates the problem I'm talking about, but you can try the following in the callback procedure:

 set newchannel [open newfile.txt w]
 seek $callbackchannel 0
 puts $newchannel [read $callbackchannel]

and you will notice that no data is transmitted to newchannel, and no error is generated. You will only notice that there was a problem when you find that the data you thought you'd backed up is gone forever.

CMcC I just tried that (with the addition of [close $newchannel] :) and it worked fine. I had an old version of memchan, though, which didn't support seek properly (I think that was the problem) and which indeed silently failed to let me read its contents back ... it just always returned "". I found I had both a /usr/lib/Memchan2.2 and a more current /usr/lib/Memchan2.2.0, but the latter was never being found by the package mechanism.

I think we need to get to the bottom of the problem you observe.

SEH -- I'm not using Memchan at all. The above callbackchannel was created by opening an existing real file.

CMcC -- Ok, well this is puzzling and disturbing ... I'm getting the output in the output file. Can you add tests to discover whether the problem is (a) empty read, (b) failed write, (c) some other error? Can you write to stderr from the close callback?

SEH -- I tried a simple scenario here at work (Win2000) and it seems to work OK as you say. At home (WinXP) is where I have the problems. The problem may be OS-dependent, or it may stem from the rather more complicated arrangements (daisy-chained vfs's) I'm using.

13dec04 jcw - I'm having some trouble throwing errors from VFS driver code. I am trying to report an error inside matchindirectory when it is called with a bad path. There appears to be a "vfs::filesystem posixerror $N" call for this purpose, but I can't make it work. Looking into tclvfs's "vfs.c" code, something strange seems to be in there - the error is -1 instead of TCL_OK. It leads to the following effect:

    % vfs::filesystem posixerror 13
    command returned bad code: -1
    %

CMcC look above for discussion of the access callback - it relies upon a [return -code -1] to indicate no access, perhaps this is related?

Ok, RTFM vfs-fsapi.man ...

    The generic layer expects that the subcommands of a handler signal
    error conditions by calling [[vfs::filesystem posixerror]] with
    the appropriate posix error code instead of throwing a tcl error.
    If the latter is done nevertheless it will be treated as an unknown
    posix error.

A consensus seems to be forming that [vfs::filesystem posixerror] is a bad thing:

1. it generates errors which purport to be from a posix command call, but aren't; they're from the tclvfs layer.
2. the posixerror array isn't done portably, so on some platforms you will get false errors reported.
3. [posixerror] sets errno, which it should not do, as that will mask real posix errors.

The consensus is that the API needs changing, that simple [error] should be enough.

Meanwhile, some API elements require a [return -code -1] to work properly. I'll try to document them.

Ok, all tclvfs callbacks/API elements can return -1 to indicate failure. They can also [error] to indicate failure, but this provokes a call to VfsInternalError, which is usually not really what you want.

I guess the problem is to distinguish an expected failure (such as when you tried to stat a file but the file didn't exist) from an unexpected failure, such as when your tclvfs implementation has a bug.

It seems that expected failures are indicated by [return -code -1] and the others will result in some kind of logging behavior.

This seems reasonable, even unavoidable, but should be documented explicitly - not indirectly by [vfs::filesystem posixerror].

Vince inserts that the entire purpose of introducing vfs::filesystem posixerror was to be able to set errno from Tcl code to emulate posix errors from a filesystem implementation. So your (3) above is considered to be the defining feature of this command, not a problem with it!! The reason for this is that Tcl's filesystem API is in most cases only able to transfer a posix error code, not an arbitrary string, from a filesystem implementation to Tcl's innards.

Vince inserts that error is currently defined to be exactly the wrong thing to do. Tclvfs documents that it is a bug in your vfs handler if it throws an error that is not a posix error (see the 2003-02-20 changelog entry).

Vince finally adds that he doesn't see the consensus referred to above, but has no problem with tclvfs being changed if someone can provide a full proposal that works, explaining how 'error' is sufficient to do everything that needs to be done without masking bugs. What happens if my 'access' vfs handler does [error "This is an error."] in response to a [file exists] command at the Tcl level? That arbitrary string cannot be turned into a posix error code, and cannot be passed through to Tcl (Tcl_FSAccess does not permit that). So what is this "consensus proposal"? (I should say I'm more than happy to see posixerror disappear, but can't see any proposal here which will permit that.)

CMcC You got me, Vince, the consensus was 'someone said, and I found plausible' :)

So, taking Tcl_FSAccess, my understanding is that it only cares about the return value [return -code -1], which means that the requested access is not permitted, or anything else [return whatever] which means the requested access is permitted ... the errno in the case of failure is set to ENOENT.

Considering the other API functions, does it actually matter if the posix error isn't set? What breaks if it's not set, or if it's set to 0 or somesuch?

Vince: there are lots more posix errors that access() can return beyond ENOENT. You are right that if all of these commands want to return is success or failure then ok vs error would work, but in reality there are many more possible error codes, so we need a mechanism by which such codes could be returned. So, I would turn the question back to you: if you're not going to use the posixerror mechanism to set errors, how do you propose to signal all the different kinds of errors that might be triggered? (tclErrno.h lists them -- we might not need them all, but we do need more than 1 or 2)

Basically, you want to know if the error in, e.g., 'file delete $path' is because the file is busy, it's a non-empty directory, the permissions are inadequate, it doesn't even exist, the filesystem is read-only, etc etc.

CMcC: I don't know how I'd write a program which discriminated between those conditions and took different actions depending upon them. I usually just need success-or-failure along with a string to tell me what the problem was (so I can rectify it).

Would the problem you see be solved in all cases if [error] result was passed back as a string to the tcl i/o core, and the core preserved it?

CMcC: I see, by inspecting the code, that the tcl i/o core in fact does discriminate between different values of errno. One example is tclFCmd.c TclFileMakeDirsCmd(), where errno value of ENOENT is significant.

It follows, then, that something like posixerror is needed. A question: how can we know that the C value of ENOENT is the same as the vfsUtils.tcl value of ENOENT? Are they standards? Will the values ever vary between implementations? Can we really guarantee that?

Next problem: we need to document, for each vfs hook, which values of errno are discriminated and the functional impact of each significant value.

Vince: indeed, the tcl fs core discriminates between all sorts of different values of errno. Just search the Tcl sources for the regexp == E[A-Z]{4,5}[^A-Z_] to see a fairly broad selection. I don't know if the numerical values are standards, and I agree we do need to document these.

Finally, yes, one way of solving this would be for the [error] result to be passed back through the Tcl fs core in all cases, but in most respects it is easier to use (or at least allow) posix error codes, because they make it easier for a vfs to produce exactly the same error messages as Tcl's core. This is very nice for consistency: in principle it makes it much easier to consider using large chunks of fCmd.test, cmdAH.test, fileName.test, etc., as actual tests of any new vfs and not just of Tcl's core. For example, we ought to be able to run most of those tests from inside a tclkit and have them pass.
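On the portability question raised above: POSIX defines the errno symbolic names (ENOENT, EBUSY, ...) as distinct positive values (with a few permitted aliases) but does not fix their numbers, and they do differ between platforms (EAGAIN, for instance, is 11 on Linux but 35 on macOS and the BSDs). A Tcl-side table such as the one in vfsUtils.tcl is therefore only correct if it matches the C library Tcl was built against. C code sidesteps this by comparing against the macros, as in this sketch of a caller acting differently on a failed delete; the enum and function names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical follow-up actions after a failed delete. */
typedef enum { DEL_ALREADY_GONE, DEL_RETRY_LATER, DEL_GIVE_UP } delete_action;

/* Compare against the errno macros, never against hard-coded numbers,
   because the numeric values vary between platforms. */
delete_action classify_delete_error(int err) {
    switch (err) {
    case ENOENT:            /* nothing there: treat as success */
        return DEL_ALREADY_GONE;
    case EBUSY:             /* in use right now: worth another try */
        return DEL_RETRY_LATER;
    default:                /* EACCES, EROFS, ENOTEMPTY, ...: report it */
        return DEL_GIVE_UP;
    }
}

/* Prints the C library's message for err, whatever its numeric value. */
void report_delete_error(const char *path, int err) {
    fprintf(stderr, "cannot delete \"%s\": %s\n", path, strerror(err));
}
```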

CMcC 20050112 ... -permissions must be attribute index 2 if you want to [file copy] into the vfs.

This is a bug in the Tcl vfs core [L3 ], not in tclvfs per se; however, it's as well to be aware of it until it's fixed.

When a cross-filesystem [file copy] has copied a file, it endeavours (under Unix only) to set the -permissions attribute of the newly created file to that of the original. Unfortunately, the attribute index of -permissions is hard-coded to the integer 2, regardless of the set of attributes published by the vfs. This is fairly annoying if you're not expecting it :)

Vince - this bug has now been fixed (Jan 2005).

SEH -- 17Mar05 -- I ran into a gotcha this past week that is most likely a Win filesystem problem rather than a Tclvfs bug, but it still severely impacted my vfs development activities, so I'll describe it under the category of "here be dragons."

It seems there is a significant time lag between successful return of a [file delete] call and the point when [file exists] on the same item will return 0.

I was working on a sync script involving a virtual filesystem that worked fine in debug, but the auditing record that counted what was added and deleted malfunctioned when run in real time. At first, the thought that a timing issue had somehow manifested itself in normally-procedural Tcl made my brain explode, but many hours of careful analysis made it clear that when the script did [file delete] on a directory and then globbed the contents of the containing directory right afterward, the supposedly-deleted file showed up in the list returned by [glob].

In order to make the script work as desired, I had to put the delete call in a while loop; to wit:

 while {[file exists "item"]} {file delete -force "item"}

This may have been overkill, but by the end of the experience I was thirsting for blood.

I would be interested to know if users more experienced in Windows system programming have additional insight into this behavior, and if it's appropriate or possible for the Tcl core to address it.

MG Would it perhaps be better to use

  file delete $item
  while { [file exists $item] } {continue}

to save running the [file delete] multiple times? I haven't checked, but I'd assume that would be less CPU-intensive.

SEH -- Yes, it might be better; that's what I meant when I said my solution may be overkill. But there is still a good reason to prefer my solution over yours: if the delete fails for a transitory reason (like a network service interruption) without throwing an error, your solution will hang the program. Since an error in a close callback in a virtual filesystem will not cause the [close] statement to return an error, I think there is valid cause for worry on that front.
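A middle ground between the two loops is to retry only a bounded number of times and pause between attempts, so a delete that never takes effect is reported instead of hanging the program. This is sketched in C against the POSIX calls remove(), access() and nanosleep() rather than Tcl's [file delete]/[file exists]; the function name, retry count and delay are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: delete `path` and wait until it is really gone.
   Returns 0 once the path no longer exists, -1 if it is still there
   after max_tries attempts. */
int delete_until_gone(const char *path, int max_tries, long delay_ns) {
    struct timespec delay = { .tv_sec = delay_ns / 1000000000L,
                              .tv_nsec = delay_ns % 1000000000L };
    for (int i = 0; i < max_tries; i++) {
        if (remove(path) != 0 && errno == ENOENT) {
            return 0;                       /* nothing left to delete */
        }
        if (access(path, F_OK) != 0 && errno == ENOENT) {
            return 0;                       /* the delete has taken effect */
        }
        nanosleep(&delay, NULL);            /* give the filesystem time */
    }
    return -1;                              /* caller decides what to do */
}
```

Unlike [file delete -force], remove() only deletes files and empty directories, so this illustrates the retry policy rather than being a drop-in replacement.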

FPX just ran into the following gotcha: virtual filesystems are not visible to tk_chooseDirectory or tk_getOpenFile. Because both dialogs are implemented natively, they are not aware of the VFS plumbing that exists within Tcl. And here I was hoping that I could easily navigate and pick from the contents of a ZIP file using tk_getOpenFile.

DKF: They are, but only on Unix (because the dialogs there are written in Tcl). Try using tk::dialog::file::chooseDir:: (yes, with a :: at the end) or tk::dialog::file:: open (or ... save to do tk_getSaveFile) instead.
