Version 0 of Concepts of Architectural Design for Tcl Applications

Alexandre Ferrieux - July 98

Introduction

This guide was motivated by my experience with questions on the comp.lang.tcl usenet newsgroup. Very frequently, Tcl novices get lost in problems created by a poor architecture. Interestingly, Tcl has now gained sufficient maturity to allow fast and easy design of efficient and beautiful architectures. But, like any other full-featured programming language, it cannot prevent people from writing *bad* ones. Hence this guide.

Intended Audience

The intended audience is primarily 'Tcl novices': people with some previous exposure to programming in general (e.g. other languages), who are new to Tcl. Now I'm not saying it couldn't benefit 'Programming novices', who are discovering Tcl *and* programming at the same time (lucky them, wish I had something like Tcl to start with ;-), but it is more a 'concept paper' than a tutorial. The goal is to promote ideas, with the help of *illustrative* examples; for the corresponding recommended style/idioms, please look at real tutorials (there are plenty of good ones around - see e.g. http://www.purl.org/net/tcl-faq/ or http://starbase.neosoft.com/~claird/comp.lang.tcl/tcl_tutorials.html ).

Scope

At the end of this article is a short list of simple ways of using Tcl for software reuse in a (possibly) heterogeneous, two-layer (explained below) context. It does *not* cover the simpler one-layer apps (where reuse occurs through the 'source' or 'package require' commands), because IMHO that case is sufficiently well-known...

White vs. Black boxes

Software architecture is all about *reuse*: nobody likes to reinvent the wheel.

There are two distinct approaches for reuse:

        - "white boxes", where all the  pieces  to  be  integrated are
          available   in  source  form,  and  the   user  is  supposed
          (encouraged)   to  read  and   understand  thoroughly  their
          internals  and possibly copy and modify them to suit his/her
          needs.

        - "black boxes", where  only binaries  are  available, or when
          the sources are not supposed  to be read,  and certainly not
          to  be modified.  The intended use is through a well-defined
          API, which plays a role of a contract between developers and
          "insulates" them from details on either side.

It is not hard to imagine the class of situations to which each one applies:

        - White-boxing  is restricted to tight developer  interaction,
          i.e.   small  teams,  preferably  one  standalone  developer
          reusing his own tools.  Maybe its sole advantage is a global
          optimization  of the components, since  their design choices
          can  be questioned at any time.  But its massive drawback is
          the  maintenance  headache induced by code duplication.   (A
          famous  example   is   Knuth's  TeX-MetaFont  duo,  where  a
          significant  part   of  code  is  *nearly*  common,  through
          duplication and slight modification.  It is okay  for people
          like   Knuth,   under   the  strong   ruling   of  'Literate
          Programming',  but   few  other   situations   have   enough
          discipline to manage the problem).

        - Black-boxing, on  the  other  hand,  applies to  the general
          case: when the  team is larger, or (more interestingly) when
          insulation from details is sought. Another  (bad) reason for
          this good choice is when different source languages (with no
          bilingual expert around) are to be mixed.

Interestingly, the now seemingly doomed white-boxing scheme has been endorsed (for years) by most OO languages, where a minimal prerequisite to object reuse is a class-method-oriented link API. As an example, Sun's now dead CTI library (XTL) could *only* be accessed by its originating language, C++. Sometimes, things get even worse: you need to use the same compiler !

After a period of heavy support for the white-box OO framework, Microsoft itself acknowledged the failure of the model, and now puts its full weight into a black-boxing, language-neutral framework: COM.

It is thus clear that nowadays, black-boxing is the way to go in most cases. Now in this black-box context, a paradigm has emerged that allows one to build applications with minimal "mixing entropy": the Two-Layer model.

The Two-Layer model

In this framework, several independent and efficient functional "atoms" (building blocks) are made available to a single, general-purpose, highly readable integration language. The efficient atoms define the lower layer (L1), the integration language the higher one (L2).

The justification for this paradigm is fairly obvious: the two (rather incompatible) tasks of CPU-efficiency and "human-efficiency" are separately undertaken by the two layers, so one gets the best from both worlds.

The first and most popular example of this architecture is given by the Unix shells (sh, csh, and the like): human-efficiency is obtained via the super-simple command-line syntax, limited quoting/substitution abilities, and handy wildcards. CPU-efficiency is handled by handcrafted gems like grep(1) or sort(1), which are spawned as subprocesses of the shell.

The second, less obvious example of this is the late endorsement by Microsoft of the "scripting" paradigm within the COM/OLE automation framework. The result is less convincing, because of the heavy impact of GUI-only tradition (and also of bad design choices at early stages), but the idea is there.

Tcl is the third example along these lines. Its crossplatform scope makes it roughly a child of both previous examples, although the purity and orthogonality of its principles discard any possibility of a relationship with the second one ;-).

Unixians, read on !

All of the above is obvious to people familiar with Unix. It might interest them, however, to know that Tcl's inter-process communication (IPC) capabilities far exceed those of, say, the excellent Bourne Shell. One specific item, 'fileevent', has no equivalent in any modern shell. Its power shouldn't be overlooked.

An ordered approach list for Tcl

Below is an introductory list of approaches for creating a modular two-layer Tcl application, ordered by increasing complexity & power.

Others are available - e.g. lower-level IPC like shared memory and message queues, or Tk-specific mechanisms like 'send'. However, they are not described in this document because they're less portable, or overconstrained, or too low-level for Tcl's shell spirit, or several of these at once.

List summary:

        1) Child + exec
        2) Child + fileevent
        3) Loadable extensions
        4) Custom main() linked with libtcl

1) Child + exec

 (See exec.n)

Here and in the examples below, "xxx.n" or "yyy.3" are the names of the (nroff-based) Unix manual pages for Tcl, n being the 'scripting language commands' section, 3 the 'C functions'. On Mac and Windows, just type the name without extension in the Help application.

  1a) Synchronous

        set result [exec cmd args 2>@ stderr]

        Pros:
                - easy !
                - familiar ground for Unixians (cf. shell backquotes)
        Cons:
                - synchronous, blocking
                - not for a tight loop (fork() overhead)
                - not available on the Mac.
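
As a minimal sketch of the synchronous case, the wrapper below shows one way to recover the child's exit status when exec raises an error. It assumes a Unix-like system with 'sh' on the PATH; 'run_sync' is a hypothetical helper name, not part of Tcl.

```tcl
# Run a command synchronously and return the list {exitstatus output}.
# 'run_sync' is a hypothetical name used for illustration only.
proc run_sync {args} {
    global errorCode
    # 2>@ stderr passes the child's stderr through untouched, so only a
    # non-zero exit status (not stderr chatter) makes exec fail.
    if {[catch {eval exec $args [list 2>@ stderr]} output]} {
        if {[lindex $errorCode 0] == "CHILDSTATUS"} {
            # errorCode has the form {CHILDSTATUS pid exitstatus}
            return [list [lindex $errorCode 2] $output]
        }
        # Anything else (e.g. command not found) is re-raised as is.
        error $output
    }
    return [list 0 $output]
}

puts [run_sync sh -c {echo hello}]
puts [lindex [run_sync sh -c {exit 3}] 0]
```

On failure, the output part also carries the message exec appends ("child process exited abnormally"), so callers who need the raw output may want to strip it.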

  1b) Asynchronous

        set pid [exec cmd args 2>@ stderr &]

        Pros:
                - async !
        Cons:
                - no result or exit status back
                - no notification on termination
                - not available on the Mac (though it can be simulated
                  by sending Apple  Events to the Finder,  or  through
                  AppleScript).
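
The fire-and-forget behaviour can be seen in the short sketch below, which again assumes a Unix-like system with 'sh', 'sleep' and 'cat' available. The only thing that comes back from exec is the list of process ids.

```tcl
# With a trailing &, exec returns at once with the child's pid;
# neither its output nor its exit status will ever reach this script.
set pid [exec sh -c {sleep 1} &]
puts "spawned child $pid; Tcl carries on without waiting"

# For a pipeline, one pid per stage is returned.
set pids [exec sh -c {echo a} | cat > /dev/null &]
puts "a pipeline returns [llength $pids] pids"
```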

2) Child + fileevent

        (See open.n, fileevent.n, close.n)

        set f [open "|cmd args 2>@ stderr" r]
        fileevent $f readable [list gotline $f]
        proc gotline f {
                if {[gets $f line]<0} {
                        # EOF on the child's stdout: it died !
                        catch {close $f} msg ;  # catch to get exit status
                        return ;                # (see errorCode in tclvars.n)
                }
                # use $line !
        }

        Pros:
                - async !
                - full I/O with "r+" open mode
                - the child remains alive all the time it is needed
                  (small fork overhead: just once)
                - full notification of stdout close (approximates exit)
                - full exit status back (catch {close})
                - flexible (same with sockets, named pipes, ptys)
                - nice  integration   (through  vwait)  with  "after"
                  handlers and with GUI (Tk event loop)

        Cons:
                - IPC and parsing overhead
                - no pipes on the Mac (but sockets are OK).
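
To make the "r+" and exit status points above concrete, here is a minimal sketch of a two-way conversation with a long-lived child. It assumes a Unix-like 'sh'; the child script, the 'quit' convention and the names 'onReadable' and 'state' are all invented for the example.

```tcl
# Hypothetical child: echoes each line with a prefix, exits with 2 on "quit".
set child {
while read x; do
    if [ "$x" = quit ]; then exit 2; fi
    echo "got $x"
done
}
set f [open "|[list sh -c $child 2>@ stderr]" r+]
fconfigure $f -buffering line     ;# flush each request as soon as it is written

proc onReadable {f} {
    global state errorCode
    if {[gets $f line] < 0} {
        # EOF on the child's stdout: close reaps it and reports its status.
        set state(status) 0
        if {[catch {close $f}] && [lindex $errorCode 0] == "CHILDSTATUS"} {
            set state(status) [lindex $errorCode 2]
        }
        set state(done) 1
        return
    }
    lappend state(lines) $line
}
fileevent $f readable [list onReadable $f]

puts $f hello
puts $f world
puts $f quit
vwait state(done)                 ;# run the event loop until the child is gone

puts "lines: $state(lines)"
puts "exit status: $state(status)"
```

The same handler would work unchanged on a socket, which is the flexibility the list above refers to.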

3) Loadable extensions

        (see load.n, Tcl_CreateCommand.3)

        load mydll
        mycall myargs ...

        Pros:
                - fast !!!
                - Crystal-simple API. Write an extension in 10 minutes.
                - Can even  use  non-Tcl  extensions on  Win32  (Robin
                  Becker's generic DLL caller)
        Cons:
                - Nothing asynchronous (callbacks) up to now
                - Maintenance     headache   regarding    directories,
                  versions, and names...
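
One possible way to tame the directories, versions and names headache is to hide the load behind Tcl's package mechanism (see package.n). The sketch below only registers the recipe and loads nothing; the package name 'mydll', the version 1.0 and the install directory are hypothetical.

```tcl
# Hypothetical install directory; inside a real pkgIndex.tcl file,
# Tcl sets the variable 'dir' itself.
set dir [file join [pwd] lib mydll]

# Build the library file name portably (.so, .dll, ... per platform).
set path [file join $dir mydll[info sharedlibextension]]

# Tell Tcl how to provide version 1.0 of 'mydll' on demand.
# The load script runs only when someone does 'package require'.
package ifneeded mydll 1.0 [list load $path]

# Application code could then say, without knowing any path:
#     package require mydll 1.0
#     mycall myargs ...
puts [package ifneeded mydll 1.0]
```

For 'package require' to succeed, the extension's init function would also have to declare the package from C (Tcl_PkgProvide); that part is not shown here.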

4) Linking libtcl to custom main()

        (described in old JO draft)

        <main.c>:
                ...
                Tcl_CreateCommand(...)
                ...

        cc ... -lmylib -ltcl
        -> builds a custom tclsh/wish extended with mylib's primitives.

        Pros:
                - only solution  in  case of legacy code accessible as
                  statically  linked libraries  only  (which cannot be
                  glued  inside  a dynamic  one,  e.g.  because  of OS
                  shortcomings, or non-relocatable objects).
                - Easy  packaging   (hidden   inside   the  tclsh/wish
                  binary).
        Cons:
                - Complexity  increases dramatically  in  the  case of
                  multiple orthogonal add-ons.
                - 'Easy  packaging' comes  at  a cost:  packaging  Tcl
                  itself  (which  means  compiling   the  tcl  library
                  scripts into  a single binary). Short  of that,  you
                  have to worry about directories anyway,  so the case
                  of  the  (static  vs.   loadable)  extension becomes
                  negligible.
                - I'd say  'deprecated' in most modern,  unconstrained
                  cases ;-)

Discussion and caveats

My strong suggestion here is to *always* make sure no approach earlier on the list than the one you choose can do the job. Occam's razor is a golden tool in software design (and in other areas as well ;-).

If you don't care for this dumb linear 'algorithm', here are a few rules of thumb to help you choose:

        - 1a  (sync exec) is  primarily  for executables which quickly
          yield a short result, like 'uname -a'.  Quickly so that  Tcl
          is not blocked for too long, and short because the amount of
          output that can be stored in memory is *finite*.   Take care
          not to reinvent  the wheel though (e.g. [exec date] has been
          obsoleted by [clock format [clock seconds]]).

        - 1b (async exec) is  a  "fire-and-forget"  operation.  It  is
          often used to  spawn some passive  document-viewing GUI app,
          like "xv",  since  the user-triggered end of a viewer has no
          effect  on the rest of the app.   A case that violates  this
          slightly can often still use the method, with the help of  a
          shell:

                 exec sh -c "dosomething > tmpfile;emacs tmpfile;rm -f tmpfile" &

          Here,  'sh' waits  for  the end  of  the  viewing  operation
          because tmpfile serves no  other purpose and should thus  be
          deleted.


        - 2 (fileevent) is a Swiss Army knife.  Considering all it can
          do, the 'complexity' involved is really  minimal.   The  one
          caveat  is  that  you need to understand the  nature  of its
          asynchronicity: fileevent registers an *event handler* (just
          like 'after' or Tk events),  which  can  only  be called  at
          well-defined times  (see vwait.n,  update.n), unlike  signal
          handlers or alternate threads  (which  need more  protection
          against  reentrancy  problems).  Once  you are familiar with
          this  event-driven  programming  model,  you'll  wonder  how
          people survived  in the  prehistoric times  when it was  not
          available  (or  restricted to Tk,  where it was born).  As a
          popular example, using  fileevent  with sockets  instead  of
          pipes   is  frequently   a   winner,  because  a   network's
          unavoidable delays  call for asynchronicity.  Note that this
          makes distributed architectures trivial to build with Tcl.

           Portability  issue:  fileevent  works  with sockets  on all
          platforms  in 8.0p2, but to get fileevent to work with pipes
          on Windows {95,NT},  you will need 8.1. (and again, on MacOS
          you'll need sheer magic ;-).

        - 3 (loadable extensions) Is the recommended way ... to extend
          Tcl, i.e. to  add features  and/or performance that couldn't
          be  achieved in  Tcl  or  through IPC:  typically,  I/O with
          exotic  binary  file  formats  or socket protocols.  Another
          interesting  case  is  extending  *Tk*,  e.g.   with new  or
          enhanced widgets.

           Naming issue: as always with link-level  reuse, people must
          avoid  stepping on each other's toes.  Namespaces  are handy
          here (see namespace.n).

        - 4 (static  extensions)  As  stated above,  a few  situations
          discourage the use  of dynamic linking (I know AIX, at least
          in not-so-old versions, had a braindead way of  handling it;
          I'm  told that  omitting  -fPIC  on  some  platforms  builds
          non-relocatable code, unusable to build dynamic libraries).
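
To illustrate the remark in item 2 about fileevent and sockets, here is a minimal sketch of a line-oriented server and its client, run in one process for brevity. The procedure names 'accept' and 'serve', the upper-casing "service" and the use of the loopback address are assumptions made for the example; port 0 asks the OS for any free port.

```tcl
# Server side: called by Tcl for each incoming connection.
proc accept {chan addr port} {
    fconfigure $chan -buffering line
    fileevent $chan readable [list serve $chan]
}
# Called whenever a client has sent something (or hung up).
proc serve {chan} {
    if {[gets $chan line] < 0} {
        close $chan                       ;# client went away
        return
    }
    puts $chan [string toupper $line]     ;# the hypothetical "service"
}
set server [socket -server accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# Client side: same event-driven style as with a pipe.
set c [socket 127.0.0.1 $port]
fconfigure $c -buffering line
fileevent $c readable {set reply [gets $c]}
puts $c "hello"
vwait reply                               ;# event loop serves both ends
puts $reply

close $c
close $server
```

Splitting the two halves into separate scripts on separate machines only requires replacing 127.0.0.1 by the server's host name.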

Thanks

My hearty thanks go to Larry Virden, for his thorough and insightful reviewing and editing, and also to Cameron Laird and Jean-Claude Wippler for their support of this paper.