This page was triggered by a collate/broadcast virtual filesystem. The title does not fully cover the topic, though, so here's a brief intro:
Tcl offers two basic lookup concepts:
- Arrays are one-dimensional, i.e. they map a key directly to a value. They support traces to act on data access and changes.
- A VFS is hierarchical and uses some sort of "path", with parents (directories) and children (directories and files).
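A tiny illustration of the two lookup styles side by side (the `/tmp` listing is just an example directory and will vary by system):

```tcl
# Flat lookup: an array maps a key string straight to a value.
set a(greeting) "hello"
puts $a(greeting)        ;# hello
puts [array names a]     ;# greeting

# Hierarchical lookup: a path walks from parents to children,
# and glob enumerates one directory level at a time.
puts [glob -nocomplain -directory /tmp *]
```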
The question I have is whether this dichotomy is useful or unfortunate. It is obvious that both models work extremely well, and that together everything needed can be done. Tcl's virtual access to file systems means we now have a way to fake tons of things, such as starkits, which look like a file system but actually live inside a single file/database. VFS can do a lot more. Access to http/ftp/webdav servers is an obvious step, but why stop there? There is a TclVFS driver that exposes Tcl's own variable/array/namespace hierarchy, for example. Another one exposes Tcl's proc/namespace hierarchy. This means that if you have a file browser, it can be used to access all of Tcl's state and code through introspection. Yummie. That might be an interesting simplification for IDE use, for example.
I've occasionally toyed with the idea of virtualizing arrays as well, although Tcl does not appear to have all the hooks in place for it yet. Wouldn't it be nice to have something which looks like an array, but which actually is a persistent data structure, possibly far larger than what comfortably fits in memory? This is one of the research areas for Metakit, btw. Such virtualized arrays might be simplistic (implemented through today's trace mechanism) or more sophisticated (fully transparent on-demand load/save).
(CMcC: I've written systems which map array semantics onto a database; it works quite well via trace.)
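A minimal sketch of the trace-based approach. The commands `store-get` and `store-put` are hypothetical stand-ins for whatever backing store you use (flat files, Metakit, a database); the trace machinery itself is standard Tcl:

```tcl
# Map array semantics onto external storage via variable traces.
# store-get/store-put are placeholders for the real backing store.

proc persist-read {name key op} {
    upvar 1 $name arr
    # Load the element on demand just before Tcl reads it.
    # (Traces on the variable are suspended while this runs,
    # so the assignment does not re-trigger the write trace.)
    set arr($key) [store-get $key]
}

proc persist-write {name key op} {
    upvar 1 $name arr
    # Push the new value out to the backing store after each write.
    store-put $key $arr($key)
}

trace add variable db read  persist-read
trace add variable db write persist-write
```

After this, `$db(somekey)` and `set db(somekey) value` behave like ordinary array accesses, with the storage traffic happening behind the scenes.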
Another example is Tequila, which can map array entries to a central server, and to files on a backing store or rows in a Metakit datafile. Tequila has proven to be a pretty effective model, but it suffers from the fact that everything usually ends up being resident *and* persistent at the same time, losing quite a bit w.r.t. true memory-mapped on-demand data handling and virtual memory paging.
So now we have a VFS which can map onto arrays, and arrays which can map onto a file system or a database. All this is relatively low-hanging fruit, there are no doubt a lot more ideas and cross-over points yet to be discovered.
Where does it end? Is there duplication of effort and concepts which could be merged? Or are there fundamental issues which indicate that the distinction should be kept? What if *everything* in Tcl were a file system through VFS - would that be the scripted equivalent of Plan 9? Conversely, what if *everything* were an array or some data structure like it - would that be the scripted equivalent of WinFS? Would either help towards a simpler / more capable design for say Tcl 9?
To put it differently, do we really want to maintain all these separate enumeration mechanisms in Tcl: glob for files, array names for arrays, and info vars / info procs / info commands for namespaces?
I think the dichotomy between fs and array is perhaps weaker than your analysis suggests: just as we can have an array("a,b") which resembles a two-dimensional array, we can look at array("a/b") and a vfs element "a/b" - it is possible to interpret this hierarchically, but we are not logically required to do so.
In the standard Tcl fs, we can't open and read or write a file whose name is the same as a directory, but other file systems (e.g. mkvfs, httpvfs) don't share this limitation.
It's logically possible to treat a vfs as a flat mapping from path to content; "/" need have no special interpretation.
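The flat view is easy to demonstrate with a plain array: "/" is just another character in the key, and "hierarchy" falls out of string matching (the paths below are made up for illustration):

```tcl
# Nothing forces "/" to mean hierarchy: an array can hold full
# paths as ordinary keys, giving a flat path -> content mapping.
set fs(etc/motd)      "welcome"
set fs(etc/hosts)     "127.0.0.1 localhost"
set fs(var/log/x.log) ""

# Enumerate the "children" of etc/ purely by pattern matching:
puts [lsort [array names fs etc/*]]   ;# etc/hosts etc/motd
```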
There's an implementation problem with trying to present a vfs as an array. One can use trace to implement file read/write as array element read/write, but trace isn't strong enough to replace glob with array names, because the trace callback is given too little information to construct the kind of return value expected. It seems to me that trace array is nearly useless.
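To make the asymmetry concrete, here is a sketch. Element access maps onto file I/O cleanly (`vfs-read` is a hypothetical command standing in for an `open`/`read` pair), but the `array` trace op, while it does fire on `array names`, cannot influence the result:

```tcl
# Read trace: populate the element on demand before Tcl fetches it.
# vfs-read is a placeholder for real file I/O against the vfs.
proc fs-read {name key op} {
    upvar 1 $name fs
    set fs($key) [vfs-read $key]
}
trace add variable fs read fs-read

# The "array" trace op fires when [array names fs] runs, but the
# callback receives no pattern and has no way to supply the listing
# that glob would have produced - it can only observe, not answer.
trace add variable fs array {apply {{name key op} {
    puts "array command touched $name; its result is not ours to choose"
}}}
```

This is exactly the gap mentioned above: reads and writes virtualize well through today's traces, but enumeration does not.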