Version 14 of StarSite

Updated 2002-12-01 08:07:07

Name for the concept of placing the contents of a whole website into a single file and serving it directly from there. By Neil Madden.


Neil already uses something close to this for his personal website, based on Metakit and tDOM: he stores XML and converts it to HTML. This is currently done offline. Once this is changed to online generation, the StarSite is complete.

AK: Related to this is a notion from BrowseX. This browser allows the retrieval of a website and its storage in a zip archive. This is usually meant for easy transfer of a website, but it also allows display of the archive via BrowseX without unpacking it (IIRC - AK).


AK: Obvious extensions of the concept.

  • A local mode, i.e. running the Starkit containing the StarSite, or providing access to it [*], in a non-web environment pops up a Tk-based display which allows browsing the site without a web browser.
  • An extension of such a local mode would be to enable the editing of pages in the site.
  • Allow the starkit to run not only as a CGI-type application, but also as its own web server.

Depending on the exact nature of how web pages are stored in the StarSite this can have significant overlap with the code base providing the Tcl'ers Wiki. Especially as some discussed extensions to it would allow the storage of not only Wiki markup, but other types of data as well, like images, HTML, or XML. The similarity should be clear by now.

AK: [*] I should explain. My first association was that the code providing access to the contents of the StarSite was part of the StarSite Metakit, in a VFS, making the StarSite a StarKit, or StarPack. As the Wiki codebase shows, that doesn't have to be the case. The StarKit containing the code can be distinct from the Metakit database containing the StarSite.

Regarding editing: For HTML this might have to be a free-form editor. For XML we can use StarDOM. Also note that the AlphaTk editor already has a Wiki mode. Extending this to HTML and XML modes might be simple. This implies that the StarSite access StarKit does not need to have the editor embedded into it, although that is an option too. It just has to have a way of invoking editors, so that we can hook our preferred editor into it.

NEM - Yes, I was thinking along these lines. Some other things I am considering are:

  • Storing data with a mime-type association (text/html, text/xml, image/gif etc). I don't believe that mk4vfs does this presently.
  • Allowing viewing the database as a metakit database, or a filesystem (both are useful at times).
  • Some sort of authentication/access-control built in. Wiki type applications with universal access are useful for some things, but often, you want more security. This needs to be designed in from the start, to be effective.
  • Versioning/Archiving (just like the wiki, but maybe more fine-grained?)
  • Ability to run as a standalone HTTP server or as CGI, with a consistent scripting API in both environments (i.e. a script shouldn't care).
  • Some mechanism for plugging in XML/XSLT transformations.
  • Ability to query database using XPath???
  • Ability to group items together (for instance, grouping identical pictures in different formats: a .gif/.jpg for web grouped with a bitmap for WAP).

Lots to think about. I think that getting the authentication/access-control stuff right will be the toughest bit. Looking at Zope, everything seems to be an "object" (including users, scripts, static content), which can have access permissions granted to it. I know adding security features might seem like overkill, but it needs to be there from the start if anyone wants to use it (which I will). Adding it in later would be a bit hackish.


NEM - Excellent summary. This is exactly what I was planning. My main interest was in XML/XSLT generation of content, but really anything should be possible. The StarSite would sit on the server and intercept requests using the PATH_TRANSLATED variable. So, for instance, on my website currently, the script xml.cgi can be invoked like http://www.tallniel.co.uk/cgi-bin/xml.cgi/home.xml which grabs the home.xml file and applies the necessary stylesheets to it. Likewise, images could also be requested and returned from the database. Having MetaKit as the backend allows for sophisticated searching and user interface options (session management, personalisation etc). Mirroring a site would be a matter of copying one highly compressed MetaKit datafile. I find this concept quite exciting.
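The request interception described above can be sketched roughly as follows. This is only an illustration in Python, not the xml.cgi script itself: it assumes the extra path after the script name arrives in the CGI variable PATH_INFO, and the names STORE, resource_path and lookup are made up for the example. The in-memory dictionary stands in for the single-file database.

```python
import posixpath

# Hypothetical stand-in for the single-file database:
# site-relative path -> (mime type, content bytes).
STORE = {
    "/home.xml": ("text/xml", b"<page><title>Home</title></page>"),
    "/logo.gif": ("image/gif", b"GIF89a"),
}

def resource_path(environ):
    """Return the normalised site-relative path taken from PATH_INFO."""
    raw = environ.get("PATH_INFO", "") or "/"
    # normpath collapses "." and ".." so a request cannot climb above the root.
    return posixpath.normpath("/" + raw.lstrip("/"))

def lookup(environ):
    """Return (status, mime type, body) for a CGI-style request."""
    path = resource_path(environ)
    if path in STORE:
        mime, body = STORE[path]
        return 200, mime, body
    return 404, "text/plain", b"not found: " + path.encode("utf-8")
```

A real StarSite would apply stylesheets or other processing between the lookup and the response; the sketch only covers the mapping from request path to stored item.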


Note that the Wikit is a case of putting the _contents_ of a website into a file. I see above that Starsite would include a web server and, rather than using a markup style and conversion like the wikit does right now, would use xml as the markup and tdom or tclxml as the conversion software. Another difference appears to be that wikit is about content management, in a sense, in that visitors to the web site have the ability to update the pages. What other differences are envisioned?

Well - as far as I am aware, wikit only allows the inclusion of the textual content in one file. The StarSite concept takes this a bit further, by allowing images, media etc to be stored in the same file, as well as other information (e.g. a user database). The idea of a starsite, as I (NEM), envision it, is that it should be able to do whatever a normal website can do, but with the added advantage of having everything in one file. So, you could, theoretically, put a wikit inside a StarSite. That is how I see it developing. At the moment it is nothing but this collection of ideas. When things start to reach a more coherent state, I (and any others who wish to join me) will sit down and start making it. The ability to update a StarSite (or parts thereof) over the web, is a feature I would like to include. The XML references are just there as that is what I like to create my site in. However, I feel StarSite should be broader than that. It should be a means of encapsulating a whole web site, with various common functionality available to make things easier (collaborative editing, authentication, session management, data storage etc). In the simplest case, a person would fire it up at home and use the Tk GUI to add static content (HTML, pictures etc). When finished, they would simply ftp the file to the webspace they use (in a cgi directory), and it would just work (just like starkits - no hassle installation). Alternatively, it could run as its own webserver, for intranets and the like. StarKits solve installation problems for regular applications. StarSites would solve it for web applications.

AK:

  • Look at Ideas for Wikit enhancements and Christophe Muller to see the overlaps.
  • Using mime/type association for the content: Exactly as proposed for the wiki. Note that the wiki stores its pages directly in Metakit tables. It does not use the mk4VFS for its contents.
  • mime-types / mk4vfs: Interesting idea. Generalized: User-defined attributes for files. I am not sure, but I believe there are even native filesystems which might support this. Needs research.
  • Authentication/Security: Agree with building this in from the start.
  • Authentication/Security: Has to allow deactivation. Example: Wiki
  • Versioning/Archiving: The wiki codebase itself remembers the times of any change, and also saves out any change to a directory, if so configured. The only thing it does not remember is the exact changes/diffs in the internal database. The history of the Tcler's Wiki itself is a daily CVS import of the current state, making this more coarse-grained than the wiki codebase is able to support.
  • Regarding plugins: Ties to mime-types in my view. Based on the mime-type of a content page, and the chosen output medium we can choose which renderer to use, which editor to use, etc. The wiki already has several Wiki Markup renderers chosen automatically upon 'format' flag and medium (Tk vs. Web).
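The idea of choosing a renderer from the mime-type and the output medium could look something like the sketch below. It is written in Python purely for illustration; the registry, the decorator, and the mime-type name text/x-wiki are all assumptions made for the example and are not taken from the wiki codebase.

```python
import html

# Registry of renderers keyed by (mime type, output medium).
RENDERERS = {}

def renderer(mime, medium):
    """Decorator that registers a function as the renderer for mime on medium."""
    def register(func):
        RENDERERS[(mime, medium)] = func
        return func
    return register

@renderer("text/x-wiki", "web")
def wiki_to_html(text):
    # Placeholder for a real markup converter: escape and wrap in a paragraph.
    return "<p>" + html.escape(text) + "</p>"

@renderer("text/x-wiki", "tk")
def wiki_to_plain(text):
    # A Tk display could receive the text unchanged and style it itself.
    return text

def render(mime, medium, text):
    """Render text with whichever renderer is registered for this pair."""
    try:
        func = RENDERERS[(mime, medium)]
    except KeyError:
        raise LookupError(f"no renderer for {mime} on {medium}") from None
    return func(text)
```

Editors could be selected from a second registry of the same shape, keyed on mime-type alone.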

NEM 30Nov2002: Latest brainstorming on this (flow of control of a request coming into a starsite):

  • The whole system sits on a special virtual filesystem, with some differences:
    • Files have a mime-type associated with them.
    • As well as directories, there is the concept of sections. These are mounted on to directory points, and control access to all files from that point down (until a new section starts).
    • These sections are essentially directories, but with some procedures associated with them - namely a handle request procedure, and a handle error procedure. (Possibly others).
    • Sections have an access-control list associated with them. This consists of a list of groups and a set of permissions. Initially, I think the following permissions:
      • page - create, delete, read, edit.
      • subsection - create, delete, read, edit.
    • Groups are like they are on UNIX. Users are people viewing/editing the site. Users belong to groups. There are two special users: anonymous is a non-logged-in user, webmaster is the super-user. There are likewise two such-named groups which contain these users. The webmaster (or admin, or root, ...) group has complete access to everything, while the anonymous group typically would only have read-only access (notice, though, how a section can override this in its access control list, so a wiki could work). A user who has edit permissions on a section can alter the permissions (?? - maybe).
  • Access to this VFS is through a special API (probably not the standard Tcl VFS API, due to the need for mime-type associations).
  • Right, now onto how this all works:
  • A request comes into the starsite (either through the built-in webserver, or CGI or...). The first stage is to authenticate the user. A separate (replaceable) module handles this. It simply does all it has to do to determine who the user is. It removes any trace of its mechanism from the input (so, if it used a cookie, it would remove the cookie from the list passed in). It returns the username of the person making the request, or anonymous if they are not logged in. This module could work in any way, and so will be replaceable. It only works out the user name, it does not do access-control.
  • Next step, the starsite runtime looks at the requested URL, and figures out which section it falls in (as sections are mapped onto directories, this will be by just finding the most specific directory which is a section map point).
  • The starsite works out the format that the client wants the result in. It will use a (customizable) algorithm based on the specific request (e.g. if .html was requested then return HTML), accept-type headers, and finally, as a last resort, user-agent strings.
  • Access-control: The starsite then looks up the access control list of the section in question, and compares it against the groups which this user belongs to. If the user has access to this section, then we call the request handler for this section, passing in the requested URL, the requested mime-type, and the arguments passed (from ?blah=foo&a=b stuff, from POSTed data, etc.). PATH_INFO/PATH_TRANSLATED stuff will not be passed in here - it will be used to figure out the requested URL.
  • The request handler retrieves the file, and performs whatever processing it needs to do (e.g. dynamically generating the file, applying style-sheets, etc), and returns the file contents. The mime-type etc will already have been set. There may be an API for adding extra headers etc.
  • If an error occurs at any time, or if the user doesn't have the correct permissions, then the section's error-handler proc will be called with a mime-type and the error message. It should format a nice error message and return it.
  • How to enforce access-control within a section handler? Well, here I thought the best way would be to only allow access to the VFS through a special API. When a request comes in, the request handler is called in a new interp (or one from a pool). This interp is a safe interp with access to the VFS API set up through aliases. These aliases incorporate the username into them, so that they can check access control without the content-handler having to pass through the name of the user (that would be open to attack).
  • All access to the VFS and StarSite internals would be through these safe interps with checked access control. This keeps the starsite secure (at least, I think so, but I'm not a security expert - comments appreciated).
  • Versioning/history could be activated on a per-section basis, by adding more information to the interpreter aliases for the API - if an argument is flagged in the call then a versioning routine is called. In fact, the sections could each have an update-handler which handles edits of files, and can store away the old version.
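To make the flow above a little more concrete, here is a rough sketch of three of its pieces: finding the most specific section for a URL, checking the section's access-control list against the user's groups, and handing a handler an API that already has the user name bound into it. It is in Python only for illustration, and every name and data value in it (SECTIONS, GROUPS, FILES, find_section, allowed, make_api) is invented for the example. A Python closure is not a security boundary in the way a Tcl safe interp with aliases is; the closure only shows the idea that the handler never supplies the user name itself.

```python
import posixpath

# Hypothetical data: section mount points with their access-control lists
# (group name -> set of permissions), group membership, and stored files.
SECTIONS = {
    "/": {"anonymous": {"read"}},
    "/wiki": {"anonymous": {"read", "edit"}},
    "/private": {"staff": {"read"}},
}
GROUPS = {
    "anonymous": {"anonymous"},
    "webmaster": {"webmaster"},
    "alice": {"staff", "anonymous"},
}
FILES = {"/index.html": b"home", "/private/plan.html": b"secret"}

def find_section(path):
    """Return the most specific section mount point that contains path."""
    current = posixpath.normpath(path)
    while True:
        if current in SECTIONS:
            return current
        if current == "/":
            raise LookupError("no root section defined")
        current = posixpath.dirname(current)

def allowed(user, section, permission):
    """True if any of the user's groups grants permission on section."""
    groups = GROUPS.get(user, {"anonymous"})
    if "webmaster" in groups:
        return True  # the super-user group has complete access
    acl = SECTIONS[section]
    return any(permission in acl.get(group, set()) for group in groups)

def make_api(user):
    """Build the file-access API handed to a section handler.

    The user name is captured here, so the handler cannot claim to be
    somebody else when it asks for a file.
    """
    def read(path):
        path = posixpath.normpath(path)
        if not allowed(user, find_section(path), "read"):
            raise PermissionError(f"{user} may not read {path}")
        return FILES[path]
    return {"read": read}
```

For example, make_api("alice")["read"]("/private/plan.html") returns the stored bytes, while the same call made through make_api("anonymous") raises PermissionError, which the section's error handler would then turn into a formatted error page.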

This is just some brainstorming, and hasn't been thought through to the bitter end. I quite like the design, but I'm willing to take criticism to perfect this in the design stage. Consider this, a request for comments! Neil Madden.

30nov02 jcw - Interesting ideas... can you elaborate on the usage scenarios? Is this for deployment, i.e. creating a complete site and shipping it? Or more to keep things manageable and self-contained?

Note that authorization per dir/subdir is supported in Apache through ".htaccess" - if tclhttpd has similar capabilities, that might be a very quick path to add such features to StarSite, since tclhttpd can work with (as well as *in*) VFS.

Currently, it is not easy to extend VFS with extra info such as a mime type, even though Metakit could easily deal with it. The reason is that the VFS layer opens with a certain layout, which would lose any fields added. Hm, having said that - it's probably possible to open, and immediately reset the layout to include those fields again - data would not be lost. But this leads to another problem: how to make VFS aware of fields such as a mime type. My hunch is that you're best off maintaining a separate data structure for mime types. If stored as a Metakit view, it could still be in the same VFS file (i.e. starkit), with some tricky hacking. If you'd like an example of how to store other views in a starkit, next to the VFS file system tree, let me know.
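The "separate data structure in the same file" idea can be illustrated with a small sketch. This is not Metakit and not the starkit layout: Python's sqlite3 module is used here only as a stand-in for a single-file database, and the table and function names are invented. The point is just that file contents and mime types live in two independent tables (views) inside one file, so the file-tree layout does not need to know about mime types.

```python
import sqlite3

def open_site(filename=":memory:"):
    """Open (or create) a one-file site store with two independent tables."""
    db = sqlite3.connect(filename)
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)")
    db.execute("CREATE TABLE IF NOT EXISTS mimetypes (path TEXT PRIMARY KEY, mime TEXT)")
    return db

def store(db, path, data, mime):
    """Store file contents and the mime type in their separate tables."""
    with db:  # commit both inserts together
        db.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (path, data))
        db.execute("INSERT OR REPLACE INTO mimetypes VALUES (?, ?)", (path, mime))

def mime_of(db, path, default="application/octet-stream"):
    """Look up the mime type for path, falling back to a generic default."""
    row = db.execute("SELECT mime FROM mimetypes WHERE path = ?", (path,)).fetchone()
    return row[0] if row else default
```

Code that only understands the files table keeps working unchanged, which is the property that makes a side table attractive compared with adding fields to the existing layout.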

30nov02 NEM - To answer your questions in order: I see starsite as being able to create self-contained sites and then deploy them as complete items, but also to allow editing after they have been deployed. I envision creating a general web interface which allows creating new sections etc. This could be enabled or disabled, even on a per-section basis. Also, section handlers running under appropriate permissions (the permissions of whoever is accessing them, not whoever created them) could update content, and add new content, to that section.

Authorization: I intend to make this fully customizable. Someone could write an authorization routine which uses .htaccess, for instance (although I don't know how this would work in a VFS). Other authorization methods could be used as well. For instance, for my own personal site, I would probably use a custom login procedure, as I do not like .htaccess and I only have CGI (with no SSL). I was thinking about including tclhttpd in the basic starsite so that it can run standalone. It would also be able to run in a CGI environment.

VFS: Yes, currently it would not be easy to add mime-types etc to a VFS. My usage of the term VFS was perhaps confusing, as I was thinking more in general terms of accessing a sort-of filesystem through an API, rather than particularly using Tcl's VFS layer. It would be nice to use it, but I'm not sure how useful it would be (the API commands would probably be quite different to open, read, close etc).

Examples would be nice. I think I could figure it out, but probably best to find out the best way of doing it.

30nov02 jcw - Ah, ok, hence the reference to Zope - a site, ready to be filled in by content providers, comes to mind. One more thought: maybe WebDAV makes sense in this context? I'll try to come up with an example for storing data alongside VFS in the same file; one such use would be to have wikit store its pages in the same file.

30 Nov 2002 escargo - I don't know if it is practical, but perhaps your permissions could be more general. In a Multics system, one of the permissions was append. That meant that you could not modify existing content, but you could add to the end. (This applied to directories, but the notion should be transferable to other domains.) This would be more of a notes file capability than a wiki capability, but still might be worth considering.

1dec02 NEM - Good idea. The permissions were off the top of my head. Append is a good one. Can we come up with a complete list? Maybe it would be possible to allow a site maintainer to define their own set of permissions? Hmm.. I think a reasonable list would probably be best. Time to think of use-cases, I guess.
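As a starting point for the use-cases, an append permission could be checked along these lines. This is only a sketch in Python under the assumption that "append" means the old content must survive unchanged at the start of the new content; the function name and the permission names as strings are made up for the example.

```python
def check_update(permissions, old, new):
    """Return True if replacing old with new is allowed by permissions."""
    if "edit" in permissions:
        return True  # full edit rights allow any change
    if "append" in permissions:
        # Append-only: existing content must remain intact as a prefix.
        return new.startswith(old)
    return False
```

With only append granted, check_update({"append"}, "note 1\n", "note 1\nnote 2\n") is allowed, while rewriting or shortening the existing text is refused.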


Category Internet