Version 88 of VFS

Updated 2003-04-22 17:19:06

VFS - Virtual File System

[Anyone have a definition for this term?]


As of Tcl 8.4a3 (and very robustly in Tcl 8.4), Tcl's filesystem layer is completely 'virtual filesystem aware'. In principle, this means you can divert all desired filesystem activity away from the native operating system and to something else.

In practice, this means that, given appropriate extensions, any ordinary Tcl code can use the standard file commands (cd, pwd, glob, file, load) and operate on 'virtual files' without realising it. Such virtual files could be remote files (on ftp sites or over an http connection) or inside archives (zip files, for instance). The idea is that a script doesn't need to know what a file actually is or where exactly it lives (note: the 'file system' command provides introspection into what kind of filesystem a file is actually on, if that information is needed). In fact, any extensions to Tcl (such as Tk, BLT, Img, etc) which make use of Tcl's cross-platform filesystem API will also automatically be virtual filesystem aware. (So 'image create -file ftp://blah ' works just fine).
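As a small illustration of the introspection mentioned above, here is a minimal sketch using only core commands (Tcl 8.4 or later). The helper name describePath is made up for this example; the only real API used is 'file system', whose first list element names the filesystem type.

```tcl
# describePath is a hypothetical helper, not part of Tcl.
# It returns the type of filesystem that claims the given path.
proc describePath {path} {
    # 'file system' returns a list; the first element is the filesystem
    # type ("native" for ordinary disk files). Any further elements are
    # optional and filesystem-specific.
    set info [file system $path]
    return [lindex $info 0]
}

puts "[pwd] is on the '[describePath [pwd]]' filesystem"
```

For a path inside a mounted virtual filesystem the first element would name that filesystem instead of "native"; what exactly is reported depends on the extension providing the mount.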

Tcl's core doesn't come with any such useful extensions (apart from the one which lets you use the native filesystem, as ever!). But a number of such extensions do exist -- see below.

(Historical note: the VFS idea for Tcl was originally built in pure Tcl by Matt Newman for use in TclKit, and later re-introduced at the C level by Vince Darley, as described in TIP #17 [L1 ]). The nice thing about it being at the C level is that all Tcl extensions (including Tk) are automatically vfs-aware (or, more precisely, completely unaware that they are operating on non-native files).

(Editor note: try to keep tclvfs-specific material on that page, and general discussions of Tcl's VFS capabilities here)

Known users of Tcl's FS API

Here are the known extensions which make use of this API:

native filesystem

Tcl's core hooks the native filesystem into these APIs to provide Tcl's standard file support. This is coded in tclIOUtil.c and the various platform-specific files (tclWinFile.c, tclWinFCmd.c, etc).

prowrap

There is now an updated version of zvfs.c available on http://sourceforge.net/projects/tclpro/ . Check out the Patches link.

There is a provisional patch for prowrap which attempts to make it use these APIs. [L2 ] Unfortunately, prowrap's 'build' system is so unfriendly that this patch has not been tested. With a little effort, this patch would allow prowrap to support 'glob' (as well as generally have a more robust virtual filesystem).

testreporting filesystem

This is part of Tcl's test code/suite. When registered it allows all filesystem activity to be reported upon (a brief message describing each filesystem access is printed to stderr by default).

tclvfs extension

This is at http://sourceforge.net/projects/tclvfs/ (see also tclvfs) and should compile easily on win/unix/macos (there's just one file, vfs.c, to compile). This extension, which can then be accessed with 'package require vfs', exposes the vfs API to Tcl scripts. This means you can then write a completely new filesystem using pure Tcl.

The tclvfs extension actually comes with a library of Tcl code which implements a number of such filesystems. You can mount .zip archives, ftp sites, http sites, webdav remote disks, metakit archives (as used by tclkit), and even mount Tcl's proc namespaces as a filesystem!
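As a rough sketch of what mounting looks like in practice, the following assumes the tclvfs extension is installed and uses its zip driver. The helper name listArchive is invented for this example, and mounting the archive on its own path is just one common choice of mount point.

```tcl
package require vfs::zip

# listArchive is a hypothetical helper, not part of tclvfs.
# It mounts a zip archive, lists its top-level entries with the ordinary
# 'glob' command, and unmounts it again.
proc listArchive {zipfile} {
    # Mount the archive on top of its own path, so the .zip file
    # behaves like a directory while mounted.
    set handle [vfs::zip::Mount $zipfile $zipfile]

    # Plain core command; it has no idea the directory is virtual.
    set names [glob -nocomplain -directory $zipfile -tails *]

    vfs::zip::Unmount $handle $zipfile
    return [lsort $names]
}
```

Usage would be something like 'puts [listArchive mystuff.zip]', where mystuff.zip stands for any archive you have to hand. While the archive is mounted, commands such as open, source and file copy work on paths inside it as well.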


If you would like to add/use new filesystem types, you can do that in one of two ways: use 'tclvfs' and write the new filesystem in Tcl; or write a new filesystem in C (or indeed you could write/modify something like tclvfs which exposes the vfs to tcl in a different way).


Examples

Take a look at the One-line web browser in Tcl.

See the tclvfs page for examples there.


Future Vfs Thoughts (i.e. not implemented in Tcl 8.4)!

asynchronicity

Asynchronous filesystem access would be useful, primarily for copying files, but also potentially other purposes:

    file copy -command progress http://a.b.c/foo .   

    set fd [open http://a.b.c/foo w]
    puts $fd -command progress $megabytes

    load -command progress http://a.b.c/tk84.dll

Note that both 'file rename' and 'load' may require cross-filesystem copies.

Note: strictly speaking, the second of the three examples above (puts ... -command) doesn't have anything to do with the new vfs API. It is 100% channel based. In fact the real need here is for something which bridges the current gap between the channel api and the vfs api. This gap is structurally very clean, but, from the examples above, needs to be plugged!
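Until something like 'file copy -command' exists, part of this can be approximated at the channel level with the existing 'fcopy -command', since 'open' is itself vfs-aware. The sketch below is only that: asyncCopy and asyncCopyDone are names made up for this example, and whether the copy is truly asynchronous depends on the channels a given filesystem hands back.

```tcl
# asyncCopy is a hypothetical helper. It copies src to dst in the
# background and calls doneCmd with the byte count and an error string
# (empty on success) when finished. It needs a running event loop.
proc asyncCopy {src dst doneCmd} {
    set in  [open $src r]
    set out [open $dst w]
    # Copy the bytes unchanged, with no end-of-line or encoding translation.
    fconfigure $in  -translation binary
    fconfigure $out -translation binary
    fcopy $in $out -command [list asyncCopyDone $in $out $doneCmd]
}

# Called by fcopy when the copy completes; 'err' is only supplied on failure.
proc asyncCopyDone {in out doneCmd bytes {err ""}} {
    close $in
    close $out
    uplevel #0 $doneCmd [list $bytes $err]
}
```

A caller would define its own callback, say 'proc report {bytes err} {...}', call 'asyncCopy $from $to report', and then enter the event loop (vwait, or just Tk's main loop). This gives a completion callback rather than a progress report, and it does nothing for 'load' or 'file rename', which is exactly the gap described above.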

overlapping mounts

Allow a single directory's contents to come from a merge of a variety of sources.

WebDAV

[future VFS thoughts: BXXP? WebDAV?]

WebDAV would be great! It should be possible to write a WebDAV implementation entirely in Tcl using the tclvfs extension. See tclvfs for a simple webdav implementation (needs work).

Translucent File System (Writeable CD-ROMs)

Allow a proxy for a CD-ROM that passes reads through but stores files that are written in a proxy directory structure. The program would have no simple way of knowing that the data being read comes from a read-only medium. See http://wks.uts.ohio-state.edu/sysadm_course/html/sysadm-67.html escargo

file overlays

Closely related is the idea of mounting several directories on top of each other, with things like 'glob' returning the union (or the concatenation) of all underlying items. Some of this is possible now in VFS (but there is no implementation), but I'm sure some of this will require further Tcl core enhancements.

file locking

Many asynchronous protocols or operations would benefit from some kind of locking mechanism. It might be good to extend the interface in the future with this. (Note that any vfs can provide its own 'file attributes' both readable and writable so in principle any other features can be supported through attributes using the existing vfs code).
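To make the attribute idea a little more concrete, here is a minimal sketch that asks whether the filesystem behind a path offers a given attribute. The helper name hasAttribute is invented, and '-locked' in the usage note is a purely hypothetical attribute that no existing filesystem is known to provide.

```tcl
# hasAttribute is a hypothetical helper, not part of Tcl.
# It returns 1 if the filesystem containing 'path' reports an attribute
# called 'name' for that path, and 0 otherwise.
proc hasAttribute {path name} {
    # With no option given, 'file attributes' returns a flat list of
    # option/value pairs supported for this path.
    foreach {opt value} [file attributes $path] {
        if {$opt eq $name} {
            return 1
        }
    }
    return 0
}
```

A script could then test 'hasAttribute $path -locked' before trying 'file attributes $path -locked 1', falling back to some other strategy where the attribute is not offered.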

exec

This Tcl command effectively boils down to forking off a variety of processes and hooking their input/output/errors up appropriately. Most of this code is quite generic, and ends up in 'TclpCreateProcess' for the actual forking and execution of another process (whose name is given by 'argv[0]' in TclpCreateProcess). Would it be possible to make a Tcl_FSCreateProcess which can pass the command on either to the native filesystem or to virtual filesystems? The simple answer is "yes", given that we can simply examine 'argv[0]', see if it is a path in a virtual filesystem, and then hand it off appropriately, but could a vfs actually implement anything sensible?

The kind of thing I'm thinking of is this: we mount an ftp site and would then like to execute various ftp commands directly. Now, we could use 'ftp::Quote' (from the ftp package) to send commands directly, but why not 'exec' them? If my ftp site is mounted at /tcl/ftppub, why couldn't "exec /tcl/ftppub FOO arg1 arg2" attempt a verbatim "FOO arg1 arg2" command on the ftp connection? (Or would perhaps "exec /tcl/ftppub/FOO arg1 arg2" be the command?). Similarly a Tcl 'namespace' filesystem could use 'exec' to evaluate code in the relevant namespace (of course you could just use 'namespace eval' directly, but then you couldn't hook the code up to input/output pipes).

If anyone wants to have a look at a patch which does most of this on Windows: ftp://ftp.ucsd.edu/pub/alpha/tcl/chanExec.patch should do the trick. 4 tests fail, however, so any help fixing that would be great. It should be pretty easy to move the patch over to Unix as well.

See VFS, exec and command pipelines


JCW VFS in the core offers an interesting option: taking all file system calls out of the core, and bringing them back in as an extension. This is one of the areas where I see a quite unique opportunity: with FS calls out (and using a wrapped exe such as TclKit to provide just access to runtime files stored at the end of the executable), one has a system which is provably incapable of accessing (let alone modifying) the local disk. Such a setup, with sockets still included, would offer a formidable alternative to client-side Java deployment, with all the power of scripting and Tk at its disposal. There are a few extra steps to take (such as disabling exec, and load, and piped open)*, but that's about it. The 8.4a4 core has all the logic in place to start modularizing file system calls. TclKit already fully relies on memory-mapped access to the executable for all runtime files, so we're pretty close to such a setup, IMO...

(*) There's no need to disable 'load' since it will be disabled anyway if a native filesystem isn't present. Both 'exec' and piped open end up at TclpCreateProcess (see 'exec' discussion above). This and related procedures (in tclPipe.c and tclWin|UnixPipe.c) in turn call 'TclpOpenFile, TclpCloseFile, TclpMakeFile, etc'. The question is whether we can suitably abstract all of this away into the filesystem table. The easiest would perhaps be if the generic code tclPipe.c used channels everywhere instead of TclFile file descriptors. Then there would only need to be a single filesystem-specific call 'TclpCreateProcess' which could go in the lookup table. This would then allow us (very easily) to create a version of Tcl which is incapable of accessing the local disk, since all native filesystem support could actually be in a separate library.

Does anyone understand enough about the unix/win pipeline code (remember there's no 'exec' on macos) to know whether a shift from TclFile to Channels would be a problem? See above, for a patch which does most of this: ftp://ftp.ucsd.edu/pub/alpha/tcl/chanExec.patch

Such a rewritten exec core could allow diversion of pipes to or from ordinary Tcl code as well (as in Streams). This would make a 'stream' a real, robust, part of Tcl.


21oct02 jcw - Idea for yet another VFS driver: a read-only filesystem which does a script-indirection. In other words, when opening file "abc", the driver reads in "abc", evals the contents as a Tcl script, and returns the output. A refinement would be to call a "read" proc when reading, a "stat" proc when doing a stat on the file, etc.

The uses? Well, endless really - imagine a set of scripts, each of which does a build of an extension. Then doing a full build consists of nothing more than doing "file copy ...". In fact, one could do "load blah.so", with the extension built on the fly.

Or more traditionally: storing scripts which generate HTML pages, and using such a set to re-generate a website. Or scripts which take a certain datafile, apply some transformations, and return the result.

Endless. The possibilities of such, eh, "active scripting files" :) really are endless.


filesystem attributes

See vfs filesystem configuration


Core VFS support


Category Package Category Acronym