''Could become a TIP if there is enough support''

'''Abstract'''

Tcl currently starts up from the native filesystem, unless special hooks/builds are created (like TclKit) to divert that startup through the use of the new Tcl 8.4 vfs functionality. Tcl's core distribution should allow a more configurable distribution startup, including full support for making standalone distributions. Given the flexibility which is possible, the same mechanism should allow network startup, .zip startup, .kit startup or the ordinary 'files all over your filesystem' startup.

'''Rationale'''

[KBK] By now, there are a number of files that must be installed at some well-known place in the filesystem (designated by ''$::tcl_library'') and available at run time, complicating deployment of Tcl-based applications. These include:

   * The standard scripts, ''init.tcl'', ''auto.tcl'', ''package.tcl'', etc., without which the Tcl interpreter itself doesn't work.
   * The standard packages, ''http'', ''msgcat'', and ''tcltest''.
   * The encoding files, without which the Tcl system finds itself unable to read files in any encoding except UTF-8.

Soon to come will no doubt be:

   * Message catalogs for localization of error messages, formats of numbers, times, and currencies, names of languages and countries, and so on; these are fundamental to internationalized systems.
   * Time zone information files. Tcl has been too long at the mercy of the implementation of strftime on the local system; there are a number of bugs in time zone management that simply are not fixable without local definition of time zone names, offsets, and DST rules. I've been contemplating some fairly bizarre hacks on this issue to avoid confronting the question of how to distribute the roughly five hundred files of a few hundred bytes each that any reasonably complete time-zone library would need.
   * Additional bundled packages as we move to larger "batteries included" distributions.
In addition, a universally-available internal VFS is a common component for [starkit] support, and for single-file packaging tools such as [freewrap] and the ''prowrap'' component of TclPro. It's an answer to a commonly asked question: [How can I compile Tcl type scripts into binary code]?

There are several open issues with actually getting an internal VFS into place.

   * ''Fabricating the VFS.'' There is an open question about how to get the VFS content installed in the executable. Executable file formats vary, and compiler-based solutions are problematic (many compilers, for instance, have inconveniently small fixed limits on the size of compiled-in data). For this reason, the internal VFS probably must be optional, with the Tcl system capable of running out of a native filesystem.
   * ''Choice of 'mount point' name.'' The VFS must have a name that cannot collide with any path name in the native filesystem. While awkward, this issue can probably be managed, for instance by naming its mount point to be the same as the result of [[info nameofexecutable]]. It's still something we have to settle on before we pass a formal TIP on the subject.
   * ''Choice of format.'' ZIP/JAR format is probably the best choice, since there is a wealth of code (for instance, the ZIP filesystem of Tobe [http://www.hwaci.com/sw/tobe/zvfs.html]) available for the purpose, and the format is familiar. There are nevertheless several competitors, for instance Jean-Claude Wippler's successful VFS based on MetaKit. Once again, a decision will have to be made.

I don't see any of these problems as insurmountable, and I think this is a direction we should be going. I welcome the comments of others!

See [Built-in VFS]

----

''[JCW] - Cool, very cool, Kevin. Let me take you up on that offer:''

First, as you may have noticed, I've repositioned your remarks - they were placed in a weird spot (in the middle of a sentence?). Perhaps your browser played tricks on you? Your comments were most enlightening to me.
It's good to make the VFS concept a central one - even when ultimately it means there '''are''' no issues, since it's all the same regardless of what handler is used. But from another angle, the terminology "internal" VFS reflects what I think might be an inappropriate perspective: "Tcl is the center of the universe, let's extend it and internalize more". I can understand this approach - given the history and how it must have been a struggle to make Tcl become more of a foundation for application development over the years, as opposed to JO's early positioning as an add-on C library.

I consider '''persistence''' to be a more effective center of the universe, if there needs to be one at all. Or in different words: what happens when a system stops doing things and you return to it later. Code and programs (and interpreters, at a different level), act as "operators" on data. Code evolves and gets replaced - data is a passive component (conversion happens, but is always painful). Data is the fixed point, code comes and goes.

So from where I stand, I'm far more inclined to say that there is a data model, such as a hierarchical file system (this itself may one day need to be revisited, btw), and that some of its objects happen to be runnable and able to make things come to life. Tcl is not the heart, but a (major) part of it, "/" is the center of things.

That perspective highlights an option I've been mulling over for some time: taking all native file-system access out of the Tcl core, and bringing it back as an extension using VFS. So "/" would start out unattached to any real-world data, and Tcl then starts mounting things like the local file system (or part of it), and the VFS it needs to access the standard runtime files you described above. The value of this should not be under-estimated: a scripting infrastructure which can be stripped down to not contain a single local filesystem call can be of great importance for some deployment scenarios.
Think for example of a network-client which runs locally but cannot touch the local disk, not even if hacked. One could still access the local VFS btw, by using a read-only memory-mapped file (as MetaKit does).

There's a bootstrap problem when "/" is the center of everything, just as there is for a CPU trying to figure out how to boot a "file" off a hard disk before it understands the file system. Addressing that in a generic manner would help all VFS implementations greatly.

The choice of ZIP vs. MetaKit has a number of dimensions. Right now, ZIP cannot be written from VFS (though a trivial bit of Tcl, such as [http://www.equi4.com/critlib/zipper.README] could fix that). More important to me, is the transaction model which MetaKit supports, meaning that changes are safe - even if the process is interrupted. That too can be had in other ways, transactions are no rocket science - I just never bothered because I already had a solution. Convenience and acceptance come to mind, which was no doubt a key reason to pick ZIP in Java as well. Encryption will become an issue. Performance is an issue as well, an area in which I consider MK superior to ZIP because its file index (zip would call it the "catalog") can be traversed on demand.

What this tells me is that the choice of "vfs handler" should be kept open. There is value in having more than one storage implementation, and people will gain from having a trade-off option to pick one over the other. Or to come up with yet better designs, for that matter.

One could technically build a tclkit (which is MK centric) to use ZIP archives as its "internal VFS", but the generality of being able to treat the tclkit executable '''itself''' as simply another scripted document, is starting to simplify things considerably.
To illustrate: the latest SDX utility [http://www.equi4.com/pub/tk/examples/sdx.README] makes the construction of single-file executables like ProWrap and FreeWrap trivial - so that all deployment scenarios from 1 to N files are treated uniformly.

The one restriction that keeps popping up is that executables can only run in read-only mode - it makes sense, but it means 1-file deployment can never use a r/w VFS setup. This OS-imposed limitation means ZIP and MK become equivalent for 1-file use. It's the 2-or-more-files deployment scenarios which benefit from MK, which is what scripted documents are all about.

Note that file format issues are just about to become irrelevant in normal use - as one can "VFS-mount" any of them and then simply do a (recursive) "file copy" in Tcl.

If zlib compress/decompress is added to the Tcl core, one can write VFS readers for both ZIP and MK SD's in pure Tcl. In fact, I have that code for both - it just needs some work to clean things up and release it all as a set of packages. Creating ZIP files from pure Tcl is easy, as mentioned before. Creating MK datafiles from pure Tcl is not - the file format of MK is non-trivial, and aimed at high performance and at a much larger set of uses than zip archives. But it's a matter of time, because that is exactly where I intend to go.

A last comment on why perhaps the term "internal VFS" may not be optimal long term: suppose one uses a net-based VFS handler, that would allow fetching all run-time files off the net in whatever format one likes. Not very different from running off NFS, but there may be more aspects to this, such as security or ownership issues.

----

''[Vince] - Just a few comments on the very interesting discussion above''

Certainly Tcl needs to support some sort of packaging by default, with zip/jar being indeed the obvious choice. I also see why it would not be the best idea to fix on a single choice.
There are three reasons for this: (i) Given the generic VFS support in Tcl, Tcl doesn't actually care how it gets hold of the files at all, (ii) JCW and perhaps others might prefer to build things using other filesystems (metakit), (iii) we might even want to initialise Tcl off a network.

So, it seems as if what Tcl needs, extremely early in the startup sequence, is a call to a new ''TclInitializeFilesystem'', the result of which must be that there is at least one entry in Tcl's filesystem lookup table. Then the rest of Tcl's startup proceeds as normal. The default TclInitializeFilesystem will register the ''&nativeFilesystem'' table (from tclIOUtil.c), but as long as there is a mechanism (whether this is compile-time or not, I'm not sure) for a different action to be taken, then a Zip, Metakit, or Network filesystem could be registered instead, as appropriate.

----

[KBK] (11 April 2002) -- OK, I think we're all reading from the same page here. I wasn't trying to suggest that there necessarily be one and only one choice for an "internal" VFS - one that Core functionality can use for packaging. Rather, I was trying to say that there should be ''at least'' one internal VFS packaged with the Core, so that we can be more weakly dependent on the host FS.

I think we could still salvage the use of [[info nameofexecutable]] as a mount point, if we introduce another layer of naming beneath it; perhaps the initialization should mount Tcl's own internal VFS as [[file join [[info nameofexecutable]] Tcl]] or something.

----

[JCW] Vince's ''TclInitializeFilesystem'' sounds like a very good idea to me. It de-couples a key policy decision.

W.r.t. [[info nameofexe]] I am not sure. It assumes that what Tcl considers "/" maps to some real world "/". We do need a spot to anchor information so the startup process can access "files" such as init.tcl (at least that's what they look like).

Ponder, ponder...
how about taking the model of VFS '''being''' the implementation of Uniform Resource Locators. An idea launched on the chat earlier today: when one sees a path "haha://blah", then what that really means is that the VFS handler called "haha" deals with "blah". Plus on top the powerful concept of mounting, so things may switch schemes without knowledge of the URL user.

Now imagine (actually it exists) a file system which maps into Tcl's namespaces and vars/arrays. Let's call it the "var" scheme. I.e.

 set fd [open var://tcl_platform/platform]
 puts [gets $fd]
 close $fd

Maybe the name "inner" would be more accurate.

Now, drop the assumption that real files need to be mounted in certain spots for Tcl to find them. One thing we do know, is that the "var" scheme always exists (every Tcl interp has vars, or at least the capability to have them). Drop all library searching logic, and *define* Tcl startup as doing:

 source var://tcl/config/boot.tcl

No ifs, no buts, always. Then startup becomes a matter of putting the right stuff in ::tcl::config(boot.tcl) - it'd be totally flexible, wouldn't it? It could be two-stage, i.e. mount something else, and source the "real" init.tcl?

----

[Vince] -- just like to echo jcw's point that [[info nameof]] is not necessarily a good place to look, because if we haven't mounted a native filesystem the string it returns will not be very meaningful, and a lot of the core code might not like it at all (the problem is that the first parts of the path will not exist). For this reason I think the path would have to be one that is considered absolute, otherwise the same trouble will always arise. This means that either it is under '/' or it is a new toplevel 'drive' (which the new code allows quite happily, and is probably preferable in this case, since, again '/' won't necessarily exist). This could be something like what jcw suggests (var://...) or something like 'tcl://' under which everything lies....
Anyway, that is really just a 'cosmetic' issue, since whatever vfs is written to support the new Tcl filesystem, it will be pretty trivial to modify it to accept any particular mount point as its source.
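
----

As a rough illustration of the "var" scheme startup that JCW describes above, here is a minimal pure-Tcl sketch. It does not register a real VFS handler; it only emulates the mapping from a var:// path to an array element and the "always source boot.tcl" rule. The procs ''varUrlToName'', ''varRead'' and ''varSource'' are hypothetical names invented for this example, and the mapping rule (all path components but the last name the array, the last one is the element key) is an assumption inferred from the two paths quoted in the discussion.

```tcl
# Hypothetical helper: translate "var://a/b/key" into the array
# element name ::a::b(key). Assumed mapping, not an existing API.
proc varUrlToName {url} {
    if {![regexp {^var://(.+)/([^/]+)$} $url -> path key]} {
        error "not a var:// URL: $url"
    }
    return "::[string map {/ ::} $path]($key)"
}

# Hypothetical stand-in for "open" plus "read" on the var scheme:
# return the content of the variable the URL refers to.
proc varRead {url} {
    set name [varUrlToName $url]
    if {![info exists $name]} {
        error "no such variable: $name"
    }
    return [set $name]
}

# Hypothetical stand-in for "source var://...": evaluate the stored
# script at global level, as a real "source" at startup would.
proc varSource {url} {
    uplevel #0 [varRead $url]
}

# The embedding application (or the build) decides what goes here.
# A real boot script could mount a ZIP or MetaKit VFS and then
# source the "real" init.tcl as a second stage.
namespace eval ::tcl {}
set ::tcl::config(boot.tcl) {
    set ::bootStage "booted from var scheme"
}

# Startup, with no library searching logic at all.
varSource var://tcl/config/boot.tcl
puts $::bootStage

# The example from the discussion, read through the same mapping.
puts [varRead var://tcl_platform/platform]
```

In a real implementation the lookup would presumably sit behind Tcl's filesystem layer, so that the ordinary ''open'' and ''source'' commands accept such paths; the sketch only shows that the boot policy reduces to what is stored in ::tcl::config(boot.tcl).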