Tcl IO Drivers

The Tcl I/O system as seen by a driver (channel type), aka Channel System, Driver Perspective, by aku, is reproduced below in its entirety.

HOWTO

 A. Kupries
 TclIODriver                                Andreas Computer Laboratories
                                                      (Me, myself and I)
                                                       November 14, 2000

The Tcl I/O system as seen by a driver (channel type)

Abstract

This document describes the I/O system used in the Tcl core as it is seen from a driver implementing a channel type.

   1.     Introduction
   2.     Main facilities in the core
   3.     Writing a channel driver
   3.1    InstanceData
   3.2    Creation of channels
   3.2.1  Creation of a base channel
   3.2.2  Creation of a transformation
   3.3    Destruction of channels
   3.4    Accessing the channel downstream
   3.5    The driver in detail
   3.5.1  GetHandleProc
   3.5.2  SetOptionProc
   3.5.3  GetOptionProc
   3.5.4  SeekProc
   3.5.5  BlockModeProc
   3.5.6  CloseProc
   3.5.7  InputProc
   3.5.8  OutputProc
   3.5.9  WatchProc
   3.5.10 HandlerProc
   3.5.11 FlushProc
          References
          Author's Address
   A.     Glossary
   B.     Acknowledgements

1. Introduction

The main concept of the I/O system used by the Tcl core is the abstract notion of channels, which unifies different paths for communication, and the accompanying split of this subsystem into two layers: one generic in nature, the other handling the specifics of the various communication channels.

It is this second layer which is home to the drivers implementing channel types, bridging the gap between the generic layer and the operating system that provides the actual facilities for communication. Its interface to the generic layer is what we will describe here.

Before embarking on this task, some other things should be noted. In the beginning, the I/O system had only drivers for things like files, pipes and sockets. But with the inclusion of the stacked channel patch into the core in 8.2, two different types of drivers can now be written: drivers like the ones mentioned before, i.e. base (fundamental, bottom) drivers, and transformations (also called filtering channels). Both types are described in this document, but whereas the statements regarding base drivers are valid across the various versions of the core, the statements regarding transformations apply only to Tcl 8.4 and beyond. The reason for this restriction is that the interface to transformations (and their semantics) differs considerably between the various versions of the core, and while documenting the differences is not impossible (only tedious), I am currently not in the mood for this boring task. People who want to know how to support a transformation across versions are hereby directed to the Trf[L1 ] extension and the various compile-time and run-time tricks and decisions it uses to do so.

2. Main facilities in the core

The main entrance to understanding a channel driver is the Tcl_ChannelType[L2 ] structure as it lists all the functionality a driver has to implement for a correct integration into the (I/O) core.

These function vectors are explained later, in Section 3.5.

Of the many channel-related functions in the public API only some are of interest to a channel driver.

A principal API used by all drivers is

Tcl_NotifyChannel[L3 ]

which allows a driver to communicate with the notification/event subsystem and to post events when it is readable or writable.

For the creation of new channels two different APIs are available, one for each type of driver, base and transformation:

Tcl_CreateChannel[L4 ] and
Tcl_StackChannel[L5 ]

Whereas the first function creates an independent channel, the second pushes the new transformation over an existing channel, thus forming a stack of transformations with a base driver at the bottom. This latter function is complemented by two more functions, one to remove the topmost transformation from its stack, the other to retrieve the channel below a transformation:

Tcl_UnstackChannel[L6 ] and
Tcl_GetStackedChannel[L7 ]

All these functions (and some more) will be explained later in more detail.

At this point it is also important to know that the core makes the following guarantees with respect to channels and their possible stacking:

Stacking a transformation on a channel given through a Tcl_Channel token (a reference) invalidates neither this nor any other reference held by C code. When writing to a channel represented by such an older token, the I/O system automatically compensates for the stack above it.
Only the topmost channel in a stack will do EOL-translation, UTF <-> XX encoding and buffering. All channels below will neither buffer, nor translate EOL, nor encode UTF.
Events posted by a channel are always filtered through all the channels above before being handed to channel handlers, either in C or in Tcl. All channels the event is passing through are allowed to absorb it in the process of their own work. This means that transformations are allowed and able to talk to and negotiate in an asynchronous manner with their counterpart on the other side of their channel before allowing the higher layers their turn with respect to events. An example of a transformation requiring such a facility is the implementation of the TLS/SSL protocol[L8 ] which has to set up the secure channel before allowing the normal communication.
The above also implies that the topmost channel in a stack is always notified last.
A transformation channel automatically starts out with the same blocking mode as the channel it replaces.
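The event guarantees above can be illustrated with a small model. The sketch below is hypothetical (Layer, post_event and the absorbing flag are not Tcl API): an event posted by the base channel travels upward through every layer above it, any layer may absorb it (for example a TLS-like layer that is still negotiating), and the topmost layer sees it last.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the event path through a channel stack; none of
 * these names belong to the Tcl C API. */

#define EV_READABLE 1

typedef struct Layer {
    const char *name;
    int absorbing;              /* nonzero: swallow events (e.g. handshake) */
    struct Layer *above;        /* NULL for the topmost layer */
} Layer;

static char trace[128];         /* order in which the layers saw the event */

/* A layer handles an event coming from below and returns the mask it
 * passes on upward; 0 means the event was absorbed. */
static int layer_handle(Layer *layer, int mask) {
    strcat(trace, layer->name);
    strcat(trace, " ");
    return layer->absorbing ? 0 : mask;
}

/* Post an event from 'origin' upward. Returns the mask that finally reaches
 * the channel handlers, or 0 if some layer absorbed it on the way. */
static int post_event(Layer *origin, int mask) {
    Layer *l;

    for (l = origin->above; l != NULL && mask != 0; l = l->above) {
        mask = layer_handle(l, mask);
    }
    return mask;
}
```

With an absorbing layer in the middle, the handlers never see the event; once that layer stops absorbing, the event reaches the top, which is notified last.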

3. Writing a channel driver

3.1 InstanceData

Whenever a channel is created the core will not only get a reference to the structure containing the references to the driver procedures, i.e. the channel type, but a reference to a structure allocated by the caller as well. This reference is given to all driver procedures when called for that particular channel. The internals of this structure are known only to the channel driver; the core will just pass the reference around. This allows us to associate the specific state of the driver with the channel.

The following information should be present in the instance data, not necessarily under the names I gave them; you are free to choose your own identifiers:

Tcl_Channel channel;: is a backlink from the instanceData to the channel. Without this link the driver will be unable to access and manipulate its channel as all driver procedures are called only with the instance data as argument. The token to store is the result value of Tcl_CreateChannel[L9 ] (or Tcl_StackChannel[L10 ]).

All transformations need access to the channel below them so that they are able to read from and/or write to it. No additional item is required; just call Tcl_GetStackedChannel (Section 3.4) with this item as argument to obtain the necessary token.

Tcl_TimerToken timer;: Such a timer is necessary if the driver is able to buffer processed data the generic I/O layer has no knowledge of. It will be used to flush out such data. See Section 3.5.9 for more explanations. Transformations usually do such buffering.

int flags;: Transformations have to remember the current blocking mode to handle EOF on input correctly. See Section 3.5.7 for more explanations. Base channels, on the other hand, usually don't need this, as they can propagate this information to the operating system. Exceptions are channel types like memchan[L11 ] which hold all their information in memory.

int mask;: Transformations have to remember the current interest in events; see Section 3.5.9 for more explanations. Other channel types may use this too to detect and skip calls which do not change the mask.
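Gathered into one structure, the items above might look like the following sketch. The type and field names (XXInstance, XX_ASYNC, xx_update_mask) are illustrative assumptions, and Tcl_Channel and Tcl_TimerToken are replaced by stand-in typedefs so that the fragment compiles without tcl.h. The helper shows the last use mentioned for 'mask': detecting and skipping calls that do not change the interest in events.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the Tcl types so that this sketch compiles on its own;
 * a real driver gets them from <tcl.h>. */
typedef void *Tcl_Channel;
typedef void *Tcl_TimerToken;

#define XX_ASYNC 1              /* hypothetical flag bit: non-blocking mode */

typedef struct XXInstance {
    Tcl_Channel channel;        /* backlink: result of Tcl_CreateChannel
                                 * or Tcl_StackChannel */
    Tcl_TimerToken timer;       /* flushes data buffered inside the driver */
    int flags;                  /* remembered blocking mode (XX_ASYNC) */
    int mask;                   /* current interest in events */
} XXInstance;

/* Record a new interest in events. Returns 0 if nothing changed, i.e. the
 * caller may skip reconfiguring the notifier, and 1 otherwise. */
static int xx_update_mask(XXInstance *instPtr, int mask) {
    assert(instPtr != NULL);
    if (instPtr->mask == mask) {
        return 0;
    }
    instPtr->mask = mask;
    return 1;
}
```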

3.2 Creation of channels

Every channel has to have a creation command at the tcl level and an equivalent procedure at the C-level. Depending on the type of the channel the procedure has to use either Tcl_CreateChannel[L12 ] or Tcl_StackChannel[L13 ] to create and configure the generic part of the new channel. These two cases are described in the next two subsections.

3.2.1 Creation of a base channel

The main call to create a new base channel is Tcl_CreateChannel[L14 ]. Before doing it the channeltype-specific creation procedure has to

process its arguments for possible options,
allocate the specific channel structure (clientdata),
create a name for the channel, like 'sockXX', 'fileN', etc. and
initialize the specific channel structure.

After the creation of the channel it might be necessary to configure it, for example to make it non-blocking by default. Note that the backlink to the channel has to be initialized with the result of the call to Tcl_CreateChannel[L15 ].

Creation procedure skeleton for a base channel

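int XX_CreateChannel(ClientData cd, Tcl_Interp *interp,
                int objc, Tcl_Obj *CONST objv[]) {
        XXInstance *instPtr;    /* instance data, see Section 3.1 */
        char name[32];

        ''process arguments'' /* objc, objv */

        ''generate_name_for_new_channel(name)''; /* "xx0", "xx1", ... */

        instPtr = (XXInstance *) Tcl_Alloc(sizeof(XXInstance));

        ''initialize instPtr''... /* state, config ... */

        /* Tcl_CreateChannel does not take the interpreter; registering
         * the channel makes it visible to the scripts in 'interp'. */
        instPtr->channel = Tcl_CreateChannel(&chan_type, name,
                (ClientData) instPtr, TCL_READABLE | TCL_WRITABLE);
        Tcl_RegisterChannel(interp, instPtr->channel);

        ''configure the channel according to arguments, if necessary''

        Tcl_SetResult(interp, name, TCL_VOLATILE);
        return TCL_OK;
}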

The chan_type in the code above is the structure containing the references to the driver procedures for the base channel.
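Name generation was left abstract in the skeleton. One hypothetical way to produce names such as 'xx0', 'xx1', ... is a counter formatted into a caller-supplied buffer. The helper name and the prefix are assumptions of this sketch; a threaded build would have to protect the counter, and a driver wrapping an OS object may prefer to derive the name from its handle instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "xx0", "xx1", ... into buf. Returns 0 on
 * success, -1 if the buffer is too small (the counter is then not used up).
 * The static counter is not thread-safe. */
static int xx_generate_name(char *buf, size_t size) {
    static unsigned long counter = 0;
    int n = snprintf(buf, size, "xx%lu", counter);

    if (n < 0 || (size_t) n >= size) {
        return -1;
    }
    counter++;
    return 0;
}
```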

3.2.2 Creation of a transformation

In contrast to base channel types, the creation procedure must not use Tcl_CreateChannel[L16 ], as that would create a new and separate channel, but has to use Tcl_StackChannel[L17 ] instead. This procedure takes as one of its parameters a reference to an existing channel and creates a new channel structure holding the state of the transformation. A token for this new structure is returned. When the old channel, i.e. the one the transformation was stacked upon, is later accessed via Tcl_Read, Tcl_Write et al., the system automatically redirects such calls to the top of the stack. In other words, all Tcl_Channel tokens stay valid, independent of where they are in a stack, yet no backdoors are opened. The latter is not completely true, but we will come to this later on.

Other things, like the creation and initialization of the necessary clientData for the transformation, have to be done as usual.

The backlink to the channel of the transformation has to be initialized with the result of the call to Tcl_StackChannel[L18 ].

Creation procedure skeleton for a transformation

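int XX_CreateTransformation(ClientData cd, Tcl_Interp *interp,
                int objc, Tcl_Obj *CONST objv[]) {
        int mode;       /* access mode of the channel we stack upon */
        XXInstance *instPtr;
        Tcl_Channel old_channel = Tcl_GetChannel(interp,
                Tcl_GetString(objv[1]), &mode);

        if (old_channel == NULL) {
                return TCL_ERROR;
        }

        instPtr = (XXInstance *) Tcl_Alloc(sizeof(XXInstance));
        ''initialize instPtr...''

        instPtr->channel = Tcl_StackChannel(interp, &trans_type,
                (ClientData) instPtr, mode, old_channel);
        if (instPtr->channel == NULL) {
                Tcl_Free((char *) instPtr);
                return TCL_ERROR;
        }

        Tcl_SetObjResult(interp,
                Tcl_NewStringObj(Tcl_GetChannelName(old_channel), -1));
        return TCL_OK;
}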

The trans_type in the code above is the structure containing the references to the driver procedures for the transformation.
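The token guarantee described above (old tokens stay valid and are redirected to the top of the stack) can be modelled in a few lines. This is a hypothetical sketch, not the core's implementation: all layers of a stack share one state record that knows the current top, chan_write plays the role of Tcl_Write, write_raw that of Tcl_WriteRaw, and the toy transformation simply upper-cases what it passes downstream.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical model of token redirection in a channel stack. */
typedef struct State State;

typedef struct Layer {
    State *state;               /* shared by all layers of one stack */
    struct Layer *below;        /* NULL for the base channel */
    int upcase;                 /* toy transformation: upper-case output */
} Layer;

struct State {
    Layer *top;                 /* current top of the stack */
    char sink[64];              /* what the base channel "wrote to the OS" */
};

/* Raw write: talks to exactly this layer, without any redirection. */
static void write_raw(Layer *l, const char *data) {
    char tmp[64];
    size_t i;

    if (l->below == NULL) {     /* base channel: append to the sink */
        strncat(l->state->sink, data,
                sizeof l->state->sink - strlen(l->state->sink) - 1);
        return;
    }
    for (i = 0; data[i] != '\0' && i < sizeof tmp - 1; i++) {
        tmp[i] = l->upcase ? (char) toupper((unsigned char) data[i]) : data[i];
    }
    tmp[i] = '\0';
    write_raw(l->below, tmp);   /* transformed data goes downstream */
}

/* Public write: any token, old or new, is redirected to the top. */
static void chan_write(Layer *token, const char *data) {
    write_raw(token->state->top, data);
}

/* Push transformation 't' over the stack that 'token' belongs to. */
static void stack_layer(Layer *t, Layer *token) {
    t->state = token->state;
    t->below = token->state->top;
    token->state->top = t;
}
```

Writing through the old base token before and after stacking shows the effect: only the data written after the push passes through the transformation.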

3.3 Destruction of channels

Destruction of channels is done with either Tcl_UnregisterChannel[L19 ] or Tcl_UnstackChannel[L20 ]. Both of them can be called with any channel and will always compensate if the channel was part of a stack. The first always destroys all channels in a stack, from top to bottom, whereas the second will always destroy just the topmost channel of a stack. Both procedures are equivalent if there is only one channel in the stack.

As Tcl_UnregisterChannel[L21 ] knows that the whole stack of channels is in destruction it does not deal with events anymore, except for destroying the internal data structures supposed to deal with them. But it does ask the various channels in the stack to flush buffered information (on the write side) down the stack so that nothing which is stuck is lost. This is not possible for information in the upward/read buffers, as there is no ultimate receiver for them, so these bytes are lost.

Tcl_UnstackChannel[L22 ] does essentially the same as Tcl_UnregisterChannel[L23 ] above, except that it takes action to keep the event system up and running. Again, information in the generic read buffers is lost, but for a reason: anything in the input queue and the push-back buffers of the transformation going away is transformed data that has not yet been read. As unstacking means that the caller does not want to see transformed data any more, these bytes have to be discarded. Information stored in buffers internal to the transformation and not yet transformed should be saved for later reads without the transformation in place, but we currently don't have an API to do this. Consequence: no transformation should read more information than it is willing to transform at once, or unstacking will cause gaps in the data read from a channel. This may change in the future, as some of the required mechanisms are already in place in the core, internally.
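The consequence stated above can be demonstrated with a toy transformation that fetches more bytes from below than it hands out. In this hypothetical model (Base, Trans and the readAhead parameter are assumptions, and the transformation itself is the identity), unstacking drops whatever is still pending, so a greedy read-ahead of 8 bytes leaves a gap, while a read-ahead matching the amount transformed does not.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the data loss on unstacking. */
typedef struct {
    const char *data;           /* everything the base channel can deliver */
    size_t pos;                 /* bytes already delivered */
} Base;

typedef struct {
    Base *below;
    size_t readAhead;           /* bytes fetched from below per refill */
    char pending[16];           /* fetched but not yet handed out */
    size_t npending;
} Trans;

/* Raw read of up to n bytes from the base channel. */
static size_t base_read(Base *b, char *buf, size_t n) {
    size_t avail = strlen(b->data) - b->pos;

    if (n > avail) {
        n = avail;
    }
    memcpy(buf, b->data + b->pos, n);
    b->pos += n;
    return n;
}

/* Hand out up to n bytes, refilling the internal buffer with readAhead
 * bytes from below whenever it runs dry. */
static size_t trans_read(Trans *t, char *buf, size_t n) {
    assert(t->readAhead <= sizeof t->pending);
    if (t->npending == 0) {
        t->npending = base_read(t->below, t->pending, t->readAhead);
    }
    if (n > t->npending) {
        n = t->npending;
    }
    memcpy(buf, t->pending, n);
    memmove(t->pending, t->pending + n, t->npending - n);
    t->npending -= n;
    return n;
}

/* Unstacking: whatever is still pending is dropped; returns the loss. */
static size_t trans_unstack(Trans *t) {
    size_t lost = t->npending;

    t->npending = 0;
    return lost;
}
```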

Whatever way was used to destroy a channel, the system calls the CloseProc (Section 3.5.6) of its driver so that the driver can clean up its data structures.

3.4 Accessing the channel downstream

This section is relevant only to transformations.

To access the channel below itself a driver just has to call Tcl_GetStackedChannel[L24 ] with the token of its own channel (the backlink we talked about in Section 3.1). The function returns the token for the channel we want. A (Tcl_Channel) NULL indicates that the channel used as argument was at the bottom of the stack.
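Because Tcl_GetStackedChannel returns NULL at the bottom of a stack, walking down to the base channel is a simple loop. The sketch below uses a hypothetical Chan structure and a get_stacked function with the same contract, so that it runs without tcl.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel in a stack. get_stacked() mirrors the
 * contract of Tcl_GetStackedChannel: it returns the channel below, or NULL
 * when its argument is already the base channel. */
typedef struct Chan {
    const char *name;
    struct Chan *below;
} Chan;

static Chan *get_stacked(Chan *chan) {
    return chan->below;
}

/* Walk down to the base channel, e.g. the only one owning an OS handle. */
static Chan *bottom_of(Chan *chan) {
    Chan *next;

    while ((next = get_stacked(chan)) != NULL) {
        chan = next;
    }
    return chan;
}

/* Count the channels below 'chan'. */
static int depth_below(Chan *chan) {
    int n = 0;

    while ((chan = get_stacked(chan)) != NULL) {
        n++;
    }
    return n;
}
```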

3.5 The driver in detail

Now that the environment of the driver is a little better known, we can explain the operation of the various driver procedures in detail. Every description starts with the general conditions under which the procedure is called by the generic I/O layer of the Tcl core and then proceeds to the specifics a transformation has to take care of.

But before we come to this we also have to talk about driver versions. The core currently supports version 1 and version 2 drivers. The latter was introduced with Tcl 8.3.2, during the second rewrite of the stacked channel code. It was required because some functionality needed new vectors, and the core also had to have a way to determine whether the new fields were valid or not. Simply accessing them is out of the question, as the older structures don't have them: we would read random information and most likely crash later on.

The structure for a channel type looks like this:

Tcl_ChannelType definition

   typedef struct Tcl_ChannelType {
       char *typeName;                  /* The name of the channel type in Tcl
                                         * commands. This storage is owned by
                                         * the channel type. */
       Tcl_ChannelTypeVersion version;  /* Version of the channel type. */
       Tcl_DriverCloseProc *closeProc;  /* Procedure to call to close the
                                         * channel, or TCL_CLOSE2PROC if the
                                         * close2Proc should be used
                                         * instead. */
       Tcl_DriverInputProc *inputProc;  /* Procedure to call for input
                                         * on channel. */
       Tcl_DriverOutputProc *outputProc;
                                        /* Procedure to call for output
                                         * on channel. */
       Tcl_DriverSeekProc *seekProc;    /* Procedure to call to seek
                                         * on the channel. May be NULL. */
       Tcl_DriverSetOptionProc *setOptionProc;
                                        /* Set an option on a channel. */
       Tcl_DriverGetOptionProc *getOptionProc;
                                        /* Get an option from a channel. */
       Tcl_DriverWatchProc *watchProc;  /* Set up the notifier to watch
                                         * for events on this channel. */
       Tcl_DriverGetHandleProc *getHandleProc;
                                        /* Get an OS handle from the channel
                                         * or NULL if not supported. */
       Tcl_DriverClose2Proc *close2Proc;
                                        /* Procedure to call to close the
                                         * channel if the device supports
                                         * closing the read & write sides
                                         * independently. */
       Tcl_DriverBlockModeProc *blockModeProc;
                                        /* Set blocking mode for the
                                         * raw channel. May be NULL. */
       /*
        * Only valid in TCL_CHANNEL_VERSION_2 channels
        */
       Tcl_DriverFlushProc *flushProc;  /* Procedure to call to flush a
                                         * channel. May be NULL. */
       Tcl_DriverHandlerProc *handlerProc;
                                        /* Procedure to call to handle a
                                         * channel event.  This will be passed
                                         * up the stacked channel chain. */
   } Tcl_ChannelType;

For this discussion only the four fields 'version', 'blockModeProc', 'flushProc' and 'handlerProc' are relevant.

Any value is acceptable for 'version', but 'TCL_CHANNEL_VERSION_1' and 'TCL_CHANNEL_VERSION_2' are special, as they explicitly spell out the version of the driver. Any other value causes the system to assume that the structure describes a version 1 driver and that the 'version' field actually contains the reference to 'blockModeProc'. Looking at older versions of tcl.h one will find exactly this definition of the structure. If the special values are used, 'blockModeProc' must be found in the field of that name, and for a version 2 driver 'flushProc' and 'handlerProc' are valid as well.

Now that I have explained the complex logic I should also note that code which only needs to read the various fields of a Tcl_ChannelType[L25 ] has to use the accessor functions in the following list. These functions have the logic above built into them and return the correct value independent of the driver they access. This is especially valuable for transformation channels. Someone setting up a Tcl_ChannelType[L26 ] structure for a new driver still has to know the rules, though.

Tcl_ChannelBlockModeProc[L27 ]
Tcl_ChannelCloseProc[L28 ]
Tcl_ChannelClose2Proc[L29 ]
Tcl_ChannelInputProc[L30 ]
Tcl_ChannelOutputProc[L31 ]
Tcl_ChannelSeekProc[L32 ]
Tcl_ChannelSetOptionProc[L33 ]
Tcl_ChannelGetOptionProc[L34 ]
Tcl_ChannelWatchProc[L35 ]
Tcl_ChannelGetHandleProc[L36 ]
Tcl_ChannelFlushProc[L37 ]
Tcl_ChannelHandlerProc[L38 ]
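The rule for the 'version' field, which the accessors above hide from their callers, can be modelled as follows. This is a hypothetical model of the rule as stated in this section, not the core's source: Proc, ModelType and the MODEL_VERSION_* tags are inventions of the sketch, and procedures are represented by pointers to descriptor records so that the overloaded slot can be an ordinary data pointer. As with the real tags, the special values are small integers cast to pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a "procedure" is a pointer to a descriptor record. */
typedef struct { const char *name; } Proc;

#define MODEL_VERSION_1 ((const void *) (uintptr_t) 1)
#define MODEL_VERSION_2 ((const void *) (uintptr_t) 2)

typedef struct {
    const void *version;        /* a tag, or the blockModeProc of old drivers */
    const Proc *blockModeProc;  /* meaningful only if 'version' is a tag */
    const Proc *flushProc;      /* meaningful only for version 2 (real old
                                 * structures do not have this field at all) */
} ModelType;

static int model_version(const ModelType *t) {
    return (t->version == MODEL_VERSION_2) ? 2 : 1;
}

static const Proc *model_block_mode_proc(const ModelType *t) {
    if (t->version == MODEL_VERSION_1 || t->version == MODEL_VERSION_2) {
        return t->blockModeProc;
    }
    /* implicit version 1: the slot itself holds the procedure */
    return (const Proc *) t->version;
}

static const Proc *model_flush_proc(const ModelType *t) {
    return (model_version(t) == 2) ? t->flushProc : NULL;
}
```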

Some questions of policy:

When should one write a version 1 driver, when one for version 2? And what is the 'best' way to write a version 1 driver?

When defining a transformation, writing a version 2 driver is recommended, as only that version has the best support for integrating transformations and event processing. The disadvantage is that this restricts the driver to Tcl 8.3.2, 8.4 and up. On the other hand, someone dead set on writing a transformation supporting all (or some) versions of the core with stacked channels should take a good look at Trf[L39 ] for the necessary voodoo to make such a beast work.

When writing a base channel it is on the other hand recommended to use a version 1 driver as that version will support all versions of the core with the least hassle involved. Because of this reasoning it is also recommended not to use 'TCL_CHANNEL_VERSION_1' but to define the version implicitly, i.e. to store 'blockModeProc' in that field. Usage of 'TCL_CHANNEL_VERSION_1' would again restrict the driver to 8.3.2 and beyond.

3.5.1 GetHandleProc

This procedure is called by the C-API function Tcl_GetChannelHandle[L40 ] to retrieve the OS-specific handle associated with the queried channel.

Channel types implementing communication paths independent of the OS, like memchan[L41 ], have to return a NULL handle (erroring out is not possible).

Transformations don't need to bother with this function. The generic layer always queries the bottommost channel in a stack, as that is the only one which can have OS-specific handles. In other words, transformations are never queried for this information.

3.5.2 SetOptionProc

This procedure is called by the generic I/O layer whenever Tcl_SetChannelOption[L42 ] is used (for example by 'fconfigure') and a non-standard option was specified as argument.

For a base channel the handling is straightforward. If there are no options to set, just set the associated field in Tcl_ChannelType[L43 ] to NULL. Otherwise compare the specified name against the options supported by the driver and act accordingly. If the option is not known, use Tcl_BadChannelOption[L44 ] to generate the error message.

Transformation channels are basically the same, except for unknown options: they have the additional option of delegating the call to the channel downstream. I personally recommend delegating the call, and for this reason I also recommend implementing this function even if the transformation has no options of its own.

SetOptionProc skeleton for a transformation

static int SetOptionProc(ClientData instanceData, Tcl_Interp *interp,
                CONST char *optionName, CONST char *value) {
        XXInstance *instPtr = (XXInstance *) instanceData;
        Tcl_Channel parent;
        Tcl_DriverSetOptionProc *setOptionProc;

        ''... handle your own options''

        /* delegate unknown options downstream */

        parent = Tcl_GetStackedChannel(instPtr->channel);
        setOptionProc = Tcl_ChannelSetOptionProc(Tcl_GetChannelType(parent));

        if (setOptionProc == NULL) {
                /* the channel downstream has no options either */
                return Tcl_BadChannelOption(interp, optionName, "");
        }
        return setOptionProc(Tcl_GetChannelInstanceData(parent),
                interp, optionName, value);
}
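The delegation pattern of the skeleton, reduced to its logic, looks like the following. This is a hypothetical model that runs without tcl.h: each layer understands a single option, unknown names travel down the stack, and -1 stands for the error a driver would report through Tcl_BadChannelOption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of option delegation in a channel stack. */
typedef struct Layer {
    const char *ownOption;      /* the single option this layer understands */
    int value;                  /* its current value */
    struct Layer *below;        /* NULL for the base channel */
} Layer;

/* Returns 0 on success, -1 for an option no channel in the stack knows. */
static int set_option(Layer *l, const char *name, const char *value) {
    if (l->ownOption != NULL && strcmp(name, l->ownOption) == 0) {
        l->value = atoi(value);
        return 0;
    }
    if (l->below == NULL) {
        return -1;              /* a real driver: Tcl_BadChannelOption */
    }
    return set_option(l->below, name, value);   /* delegate downstream */
}
```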

3.5.3 GetOptionProc

This procedure is called by the generic I/O layer whenever Tcl_GetChannelOption[L45 ] is used (for example by 'fconfigure') to query the value of a non-standard option, or of all options.

The channel type has to implement everything from Section 3.5.2 and some more: not only because read-only options make sense (write-only ones not so much), but also because there is a special case in which the channel is asked for the values of all of its options (the option name is then NULL).

For transformations it is again possible to delegate options unknown to it to the underlying channel. In the case of a query for all options such a delegation will generate a mighty long result. Pruning the unnecessary options values from the result of the underlying channel (-encoding, -buffering, -translation) is possible, but tedious (We are working with DStrings, not Tcl_Obj'ects, and especially no ListObj'ects).

GetOptionProc skeleton for a transformation

static int GetOptionProc(ClientData instanceData, Tcl_Interp *interp,
                CONST char *optionName, Tcl_DString *dsPtr) {
        XXInstance *instPtr = (XXInstance *) instanceData;

        if (optionName == NULL) {
                /* query for all options */
                Tcl_Channel parent;
                Tcl_DriverGetOptionProc *getOptionProc;

                ''... add our own options to the result''

                /* add the options of the channel downstream
                 * to the result
                 */

                parent = Tcl_GetStackedChannel(instPtr->channel);
                getOptionProc =
                        Tcl_ChannelGetOptionProc(Tcl_GetChannelType(parent));

                if (getOptionProc == NULL) {
                        return TCL_OK;
                }
                return getOptionProc(Tcl_GetChannelInstanceData(parent),
                        interp, optionName, dsPtr);
        } else {
                Tcl_Channel parent;
                Tcl_DriverGetOptionProc *getOptionProc;

                ''... handle your own options.''

                /* delegate queries for unknown options downstream */

                parent = Tcl_GetStackedChannel(instPtr->channel);
                getOptionProc =
                        Tcl_ChannelGetOptionProc(Tcl_GetChannelType(parent));

                if (getOptionProc == NULL) {
                        return Tcl_BadChannelOption(interp, optionName, "");
                }
                return getOptionProc(Tcl_GetChannelInstanceData(parent),
                        interp, optionName, dsPtr);
        }
}

P.S. Given the similarities in the way the delegation is handled by the two branches of the if statement above it makes sense to factor this code into a separate procedure.
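Following that suggestion, a model with the delegation factored into one helper might look like this. Again this is a hypothetical sketch without tcl.h: a fixed character buffer stands in for the Tcl_DString (a real driver would use Tcl_DStringAppendElement), and a NULL name means "query for all options", as it does for a real GetOptionProc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of GetOptionProc with factored delegation. */
typedef struct Layer {
    const char *ownOption;      /* the single option this layer reports */
    int value;
    struct Layer *below;        /* NULL for the base channel */
} Layer;

static int get_option(Layer *l, const char *name, char *out, size_t size);

/* Used by both branches: forward the query to the layer below.
 * 'name' == NULL means "all options". */
static int delegate_get(Layer *l, const char *name, char *out, size_t size) {
    if (l->below == NULL) {
        return (name == NULL) ? 0 : -1;   /* nothing more / unknown option */
    }
    return get_option(l->below, name, out, size);
}

/* Append text to out, truncating if the buffer is full. */
static void append(char *out, size_t size, const char *text) {
    strncat(out, text, size - strlen(out) - 1);
}

static int get_option(Layer *l, const char *name, char *out, size_t size) {
    char item[32];

    if (name == NULL) {                   /* query for all options */
        snprintf(item, sizeof item, "%s %d ", l->ownOption, l->value);
        append(out, size, item);
        return delegate_get(l, NULL, out, size);
    }
    if (strcmp(name, l->ownOption) == 0) {
        snprintf(item, sizeof item, "%d", l->value);
        append(out, size, item);
        return 0;
    }
    return delegate_get(l, name, out, size);
}
```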

3.5.4 SeekProc

This procedure is called by the generic I/O layer whenever the user asks the channel to move or query the 'file access point'. The respective public entries for these functions are Tcl_Seek[L46 ] and Tcl_Tell[L47 ]. The tell functionality is requested by 'mode == SEEK_CUR' and 'offset == 0'.

With respect to seeking the core currently distinguishes between seekable and unseekable channels. The latter are marked by setting 'seekProc' to NULL. This is currently true for the "tty", "tcp" and "pipe" types, i.e. serial lines, sockets and pipes. This distinction as supported by the Tcl core is actually a bit more restrictive than necessary, as all of the currently unseekable channels could support limited seeking. I am speaking of forward seeking or 'skipping'. For now we will have to live with this restriction.

For a new base channel the implementation of this procedure should be straightforward. Unseekable channels (like udp) and forward-seekable channels just don't implement it; the other types either forward the call to the OS or simply manipulate their internal state. See memchan[L48 ] for an example of the latter.

SeekProc skeleton for OS associated channels

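static int SeekProc(ClientData instanceData, long offset, int mode,
                int *errorCodePtr) {
        XXInstance *instPtr = (XXInstance *) instanceData;
        long result;

        ... flush possibly waiting output
        ... discard possibly waiting input

        /* let the OS compute the new location from offset, mode and the
         * current location; 'fd' is a hypothetical field holding the OS
         * handle, lseek stands for whatever call the OS provides */
        result = lseek(instPtr->fd, (off_t) offset, mode);

        *errorCodePtr = (result == -1) ? Tcl_GetErrno() : 0;
        return (int) result;
}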

SeekProc skeleton for non-OS channels

static int SeekProc(ClientData instanceData, long offset, int mode,
                int *errorCodePtr) {
        XXInstance *instPtr = (XXInstance *) instanceData;
        long X;

        ... compute the new location X from offset, mode and current location

        if (''X out of bounds'') {
                *errorCodePtr = EINVAL;
                return -1;
        }

        instPtr->state = X;
        *errorCodePtr = 0;
        return (int) X;
}
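A runnable version of the non-OS skeleton, in the spirit of memchan, is sketched below. The MemInstance fields and the choice to refuse seeks beyond the current end are assumptions of this sketch; memchan's actual policy may differ. It also shows that a 'tell' is just a seek with mode SEEK_CUR and offset 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>              /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical state of an in-memory channel. */
typedef struct {
    long location;              /* current access point */
    long length;                /* bytes currently stored */
} MemInstance;

/* Same contract as a SeekProc: returns the new location, or -1 with
 * *errorCodePtr set to a POSIX error code. */
static int mem_seek(MemInstance *m, long offset, int mode, int *errorCodePtr) {
    long newLocation;

    switch (mode) {
    case SEEK_SET: newLocation = offset;               break;
    case SEEK_CUR: newLocation = m->location + offset; break;
    case SEEK_END: newLocation = m->length + offset;   break;
    default:
        *errorCodePtr = EINVAL;
        return -1;
    }
    if (newLocation < 0 || newLocation > m->length) {
        *errorCodePtr = EINVAL;           /* out of bounds: state unchanged */
        return -1;
    }
    m->location = newLocation;
    *errorCodePtr = 0;
    return (int) newLocation;
}
```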

For transformations seeking is a hard problem. Should they seek using their own notion of access point? Or should they use the notion of the underlying channel and then try to adapt their own state for fine-positioning? Should they allow seeking at all?

Depending on the transformation, either of the first two may be impossible. Nice examples are compressors (like zlib) with their completely non-linear and position-dependent relationship between the number of bytes coming in from the downstream channel and the number going out to its caller. Another reason could be that the transformation state is not reversible, i.e. cannot be rolled back in a simple way without hogging memory. An example of this would be an encryption transformation using a cryptographically strong hash function to go from the current state to the state for the encryption of the next byte (or block). This is not reversible: we can go forward from state to state, but not back to an old state, except by saving them all.

So the simplest policy when dealing with seeking is to propagate the request unchanged to the underlying channel, discard all information in the internal buffers of the transformation and then hope for the best. Data waiting to be written is converted as if it were the last block, in other words the special end-of-information processing is applied, and then flushed. The current state is abandoned too. The next call to InputProc (Section 3.5.7) or OutputProc (Section 3.5.8) is handled as if it were the first call to the transformation.

The above is basically a strategy of 'the user knows best, and is able to compute a place that makes sense and does not create garbage during recovery'. It could also be called 'head-in-the-sand'. In the end this simply means that the user of a certain transformation has to understand its properties, and whether a seek on it makes sense at all, before trying to seek.

Remark: It is possible to deal even with non-reversible state, by recording all read/write calls and maintaining an exact image of the information read/written so far, but this is, ah, memory-intensive, to put it mildly. Also note that the notion of channels with non-reversible state is equivalent to the notion of forward-seekable channels.

SeekProc skeleton with 1:1 pass

static int SeekProc(ClientData instanceData, long offset, int mode,
                int *errorCodePtr) {
        XXInstance *instPtr = (XXInstance *) instanceData;
        Tcl_Channel parent;
        Tcl_DriverSeekProc *parentSeekProc;

        ... flush waiting output
        ... flush waiting input, if possible (e.g. into a configured variable!)

        /* Chain the call */

        parent = Tcl_GetStackedChannel(instPtr->channel);
        parentSeekProc = Tcl_ChannelSeekProc(Tcl_GetChannelType(parent));

        if (parentSeekProc == NULL) {
                /* the channel downstream is unseekable */
                *errorCodePtr = EINVAL;
                return -1;
        }

        /* pass our own errorCodePtr so that the caller sees the error */
        return parentSeekProc(Tcl_GetChannelInstanceData(parent),
                offset, mode, errorCodePtr);
}

As a last note, Trf[L49 ] implements a much more complex seeking model for transformations, but describing it is beyond the scope of this document. Go to its documentation instead.

3.5.5 BlockModeProc

This procedure is called by the generic I/O layer whenever Tcl_SetChannelOption[L50 ] is used (for example by 'fconfigure') and '-blocking' was specified as the name of the option.

For a base channel this procedure has to take the necessary actions at the OS level to switch the OS object underlying the managed channel into (non-)blocking behaviour.

A transformation channel however just has to remember this information in its instance data so that InputProc (Section 3.5.7) is able to deal correctly with empty reads on the downstream channel. The generic layer takes care of notifying all channels in a stack so that all have the same information.

BlockModeProc skeleton

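static int BlockModeProc(ClientData instanceData, int mode) {
        XXInstance *instPtr = (XXInstance *) instanceData;

        if (mode == TCL_MODE_NONBLOCKING) {
                instPtr->flags |= ASYNC;    /* ASYNC: a flag bit of the driver */
        } else {
                instPtr->flags &= ~ASYNC;
        }
        return 0;       /* 0 on success, otherwise a POSIX error code */
}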
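The flag recorded above matters when a raw read downstream returns no bytes. The sketch below is one plausible policy, stated as an assumption rather than the core's rule: in non-blocking mode an empty read that is not at EOF is reported as -1 with EAGAIN, anything else is reported as EOF (0). MODE_*, XX_ASYNC and the helper names are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>

#define MODE_BLOCKING    0      /* stand-ins for TCL_MODE_BLOCKING and */
#define MODE_NONBLOCKING 1      /* TCL_MODE_NONBLOCKING */
#define XX_ASYNC         1      /* hypothetical flag bit */

typedef struct {
    int flags;
} XXInstance;

/* BlockModeProc logic of a transformation: just remember the mode. */
static int xx_block_mode(XXInstance *inst, int mode) {
    if (mode == MODE_NONBLOCKING) {
        inst->flags |= XX_ASYNC;
    } else {
        inst->flags &= ~XX_ASYNC;
    }
    return 0;
}

/* What an InputProc might return after its raw read downstream delivered
 * no bytes; 'downstreamEof' tells whether the channel below is at EOF. */
static int xx_empty_read(const XXInstance *inst, int downstreamEof,
                         int *errorCodePtr) {
    if (!downstreamEof && (inst->flags & XX_ASYNC)) {
        *errorCodePtr = EAGAIN;   /* no data yet, try again later */
        return -1;
    }
    *errorCodePtr = 0;            /* report EOF to the generic layer */
    return 0;
}
```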

3.5.6 CloseProc

This procedure is called by the generic I/O layer to tell a channel that it is about to be destroyed. See Section 3.3 for the procedures which can invoke it.

It is the responsibility of the procedure to clean up any data structures held by the channel. Another task is the removal of all event-related things, like ChannelHandlers and Timers, although this could be filed under 'clean up of any data structures held by the channel' too.

Transformations have the additional responsibility of completing the conversion of all incomplete information sitting in their internal write buffers and writing the result into the downstream channel, to ensure a clean closure.

CloseProc skeleton

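static int
CloseProc(ClientData instanceData, Tcl_Interp *interp)
{
        /* TransformInstance is the driver's own (hypothetical) type. */
        TransformInstance *trans = (TransformInstance *) instanceData;

        /* Delete the timer, if any. See 'WatchProc' too. */
        if (trans->timer != NULL) {
                Tcl_DeleteTimerHandler(trans->timer);
                trans->timer = NULL;
        }

        /*
        | Do last minute conversions on the r/w buffers and try to
        | flush their results to the underlying channel, e.g.
        |
        | Tcl_Channel parent = Tcl_GetStackedChannel(trans->channel);
        | Tcl_WriteRaw(parent, buffer, bufsize);
        */

        /* Free the data structures on the heap (assuming ckalloc). */
        ckfree((char *) trans);
        return 0;       /* 0 on success, else a POSIX error code */
}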

The part marked with | is specific to transformations and not required for a base channel.

3.5.7 InputProc

This procedure is called by the generic I/O layer whenever some input is required. Entry points which can cause this are Tcl_Read[L51 ], Tcl_ReadChars[L52 ], Tcl_Gets[L53 ] and Tcl_GetsObj[L54 ].

A base channel just has to call the appropriate OS functionality to get the information, or retrieve it from its internal buffers.

A transformation, on the other hand, has to ask the channel downstream for the data to convert when reading (just as it has to write the converted data to it when writing). Using Tcl_Read[L55 ] for this is not allowed under any circumstances. As said several times before, the generic layer compensates for the existence of a stack when dealing with a channel. So reading from the channel downstream with Tcl_Read[L56 ] actually reads from the topmost channel in the stack. That channel, our transformation, will again try to get information from downstream, jump to the top again, ad infinitum, or rather until the stack overflows.

To get around this problem two special APIs were introduced, the Raw functions. They always access the channel given as their argument, without compensating for stacking, thus enabling a transformation to talk directly to the channel downstream. It is important to note that these functions pose a risk too. Their use from within the driver of a transformation is required, but nothing prevents their use outside of such a driver as well. This means that it is possible to write tools which bypass channels in a stack and cause all sorts of (de)synchronisation and security problems.

Here we need Tcl_ReadRaw[L57 ].
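The recursion problem and its fix can be illustrated with a toy model of a two-channel stack, written in plain C without Tcl. Everything in it is hypothetical: `Chan` replaces the real channel structures, `stacked_read()` stands in for Tcl_Read (it always starts at the top of the stack), `raw_read()` stands in for Tcl_ReadRaw (it talks to exactly the channel it is given), and a depth guard replaces what in a real process would be a stack overflow.

```c
#include <ctype.h>
#include <string.h>

/* Toy channel stack; not the Tcl API. */
typedef struct Chan Chan;
struct Chan {
    Chan *top;    /* topmost channel of the stack this channel belongs to */
    Chan *below;  /* channel downstream, NULL for the base channel */
    int (*input)(Chan *self, char *buf, int toRead);
};

static int depth = 0;   /* current nesting level of stacked_read() */

/* Models Tcl_Read(): compensates for stacking by always reading from the
 * top of the stack, whatever channel it was given. */
static int stacked_read(Chan *ch, char *buf, int toRead) {
    int n;
    if (depth >= 8) {
        return -1;               /* stand-in for the stack blowing up */
    }
    depth++;
    n = ch->top->input(ch->top, buf, toRead);
    depth--;
    return n;
}

/* Models Tcl_ReadRaw(): talks to exactly the channel it was given. */
static int raw_read(Chan *ch, char *buf, int toRead) {
    return ch->input(ch, buf, toRead);
}

/* Base channel: always produces the bytes "hello". */
static int base_input(Chan *self, char *buf, int toRead) {
    static const char data[] = "hello";
    int n = (int) strlen(data);
    (void) self;
    if (n > toRead) {
        n = toRead;
    }
    memcpy(buf, data, (size_t) n);
    return n;
}

/* Upper-casing transformation that reads downstream the right way. */
static int upper_input_raw(Chan *self, char *buf, int toRead) {
    int n = raw_read(self->below, buf, toRead);
    for (int i = 0; i < n; i++) {
        buf[i] = (char) toupper((unsigned char) buf[i]);
    }
    return n;
}

/* The same transformation using the stacking-aware read: it re-enters
 * itself until the depth guard gives up and -1 propagates upward. */
static int upper_input_stacked(Chan *self, char *buf, int toRead) {
    int n = stacked_read(self->below, buf, toRead);
    for (int i = 0; i < n; i++) {
        buf[i] = (char) toupper((unsigned char) buf[i]);
    }
    return n;
}

/* Builds a two-channel stack in which 'trans' covers 'base'. */
static void build_stack(Chan *base, Chan *trans,
                        int (*transInput)(Chan *, char *, int)) {
    base->top = trans;  base->below = NULL;  base->input = base_input;
    trans->top = trans; trans->below = base; trans->input = transInput;
}
```

Reading from the top of the stack with `upper_input_raw` installed yields "HELLO"; installing `upper_input_stacked` instead makes the same call return -1 once the guard trips, which is the model's version of the crash.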

Instead of a skeleton, which would be overwhelming even if trimmed down, I list the rules the input procedures of my transformations are based upon.

If the request can be satisfied by the information in the internal read buffers of the transformation, just use their contents.
If there is not enough data in the buffers to satisfy the request, ask the underlying channel for more.
- In blocking mode this will wait until we get some data or hit EOF.
  - After returning from the read we first have to convert the bytes read and then check whether the result is enough to satisfy the initial request. If not, we have to query the channel downstream again.
  - If we hit EOF instead, we have to convert any incomplete information in the internal buffers using whatever special handling the transformation defines, and then return the (possibly partial) result. The EOF condition must not be signaled upward to our caller until our internal buffers are empty.
- In non-blocking mode we get either nothing, some data, or EOF. Data and EOF are handled as in blocking mode. If nothing was retrieved, we simply return the partial result. If that result is empty because the internal buffers hold nothing either, we have to return -1 and signal the error EWOULDBLOCK.
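These rules can be exercised without Tcl in a small simulation. It is a sketch under stated assumptions, not Tcl code: `Source` and `source_read_raw()` are hypothetical stand-ins for the channel downstream and Tcl_ReadRaw, the hex-decoding transformation is invented for the example, and its EOF rule (a lone trailing digit becomes the high nibble of a byte) is an arbitrary choice of "special handling". Like a Tcl input procedure, `HexInputProc()` returns the number of bytes delivered, 0 for EOF, or -1 with an error code.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical downstream: a script of non-empty chunks, where NULL means
 * "nothing available right now" and running off the end means EOF. */
typedef struct {
    const char **chunks;
    int count, next;
    size_t off;                      /* read position inside chunks[next] */
} Source;

static int source_read_raw(Source *s, char *buf, int size, int blocking, int *err) {
    while (s->next < s->count) {
        const char *c = s->chunks[s->next];
        size_t len;
        if (c == NULL) {
            s->next++;
            if (blocking) continue;  /* blocking: model the wait by skipping */
            *err = EWOULDBLOCK;
            return -1;
        }
        len = strlen(c) - s->off;
        if (len > (size_t) size) len = (size_t) size;
        memcpy(buf, c + s->off, len);
        s->off += len;
        if (s->off == strlen(c)) { s->next++; s->off = 0; }
        return (int) len;
    }
    return 0;                        /* EOF */
}

/* Hypothetical transformation decoding pairs of hex digits into bytes. */
typedef struct {
    Source *down;
    int blocking;
    int pendingDigit;                /* unpaired hex digit, or -1 */
    unsigned char out[256];          /* converted bytes not yet delivered */
    int outLen;
    int eof;                         /* downstream has reported EOF */
} HexIn;

static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static void convert(HexIn *h, const char *raw, int n) {
    for (int i = 0; i < n; i++) {
        int v = hexval(raw[i]);
        if (v < 0) continue;                          /* ignore non-hex bytes */
        if (h->pendingDigit < 0) { h->pendingDigit = v; continue; }
        h->out[h->outLen++] = (unsigned char) (h->pendingDigit << 4 | v);
        h->pendingDigit = -1;
    }
}

static int HexInputProc(HexIn *h, char *buf, int toRead, int *errorCodePtr) {
    char raw[64];
    int n;
    if (toRead <= 0) return 0;
    if (toRead > (int) sizeof h->out) toRead = (int) sizeof h->out;
    /* Use buffered data first; ask downstream only while still short. */
    while (h->outLen < toRead && !h->eof) {
        int want = 2 * ((int) sizeof h->out - h->outLen);  /* out[] cannot overflow */
        int err = 0;
        if (want > (int) sizeof raw) want = (int) sizeof raw;
        n = source_read_raw(h->down, raw, want, h->blocking, &err);
        if (n > 0) {
            convert(h, raw, n);                 /* convert, then re-check */
        } else if (n == 0) {
            h->eof = 1;                         /* EOF: flush incomplete data */
            if (h->pendingDigit >= 0) {
                h->out[h->outLen++] = (unsigned char) (h->pendingDigit << 4);
                h->pendingDigit = -1;
            }
        } else if (h->outLen == 0) {
            *errorCodePtr = err;                /* non-blocking, nothing at all */
            return -1;
        } else {
            break;                              /* non-blocking: partial result */
        }
    }
    n = h->outLen < toRead ? h->outLen : toRead;   /* 0 only at EOF, buffers empty */
    memcpy(buf, h->out, (size_t) n);
    memmove(h->out, h->out + n, (size_t) (h->outLen - n));
    h->outLen -= n;
    return n;
}
```

Fed the chunks "4", (nothing), "142", (nothing), "4" in non-blocking mode, successive calls return -1 (EWOULDBLOCK), then "AB", then "@", and finally 0 for EOF. In blocking mode a single call returns "AB@" and the next one 0.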

Other things to consider:

If a transformation uses an interpreter to evaluate scripts during its work, it has to use Tcl_SaveResult[L58 ] and Tcl_RestoreResult[L59 ] to protect the result area of the interpreter. This is necessary because the I/O system, i.e. the calling procedure, may hold an unannounced reference to the result object. Not doing this may crash the interpreter with a corrupted list of free objects.
Writing to the underlying channel is allowed! An example using this is the SSL/TLS transformation[L60 ] created by Matt Newman[L61 ]. Before going into a transparent encryption mode it handles the complete handshake between server and client required to set up the encryption. As long as the handshake is not complete, nothing can be read from or written to the channel.
The InputProc can have any type and number of side effects. Examples:
- Identity transformation collecting statistics (frequency of bytes, byte-pairs, triplets, etc.)
- Splitter: Identity transformation piping the information flowing through it to a second channel (different from the channel downstream).
Recursively reading from or writing to the transformation itself (maybe indirectly, see the splitter) is not a good idea; it may lead to infinite loops.

3.5.8 OutputProc

This procedure is called by the generic I/O layer whenever something is written to the channel and an I/O buffer in the generic layer is flushed. Entry points which can cause this are Tcl_Write[L62 ], Tcl_WriteChars[L63 ] and Tcl_WriteObj[L64 ].

A base channel just has to call the appropriate OS functionality to write the information, or to store it into its internal buffers.

Transformations, on the other hand, have to convert as much as possible of the data they got and write the result to the channel downstream (strictly speaking they don't have to, but not writing it makes little sense). As with InputProc (Section 3.5.7), using Tcl_Write[L65 ] and its relatives is forbidden and would crash the system.

Here we need the second of the two Raw APIs, namely Tcl_WriteRaw[L66 ].

Data which cannot be converted at once has to be buffered internally, to be converted together with the data supplied by future write requests.
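The buffering of unconvertible data across writes can be sketched with a hypothetical hex-decoding transformation in plain C. `Sink` and `sink_write_raw()` are illustrative stand-ins for the channel downstream and Tcl_WriteRaw, and `HexClose()` shows the corresponding last-minute flush a CloseProc would perform, using an arbitrary rule (a lone digit becomes the high nibble of a byte). Like a Tcl output procedure, `HexOutputProc()` reports how many bytes it accepted; here that is always all of them, because the buffered digit is not lost.

```c
#include <string.h>

/* Hypothetical stand-in for the channel downstream. */
typedef struct {
    char data[256];
    int len;
} Sink;

static void sink_write_raw(Sink *s, const char *buf, int n) {
    if (n > (int) sizeof s->data - s->len) {
        n = (int) sizeof s->data - s->len;   /* toy sink: drop what does not fit */
    }
    memcpy(s->data + s->len, buf, (size_t) n);
    s->len += n;
}

/* Hypothetical hex-decoding transformation, write side. */
typedef struct {
    Sink *down;
    int pendingDigit;    /* unpaired hex digit kept for the next write, or -1 */
} HexOut;

static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Converts as much as possible and writes the result downstream; an
 * unpaired digit is buffered and combined with the next write. */
static int HexOutputProc(HexOut *h, const char *buf, int toWrite, int *errorCodePtr) {
    char conv[128];
    int n = 0;
    (void) errorCodePtr;                     /* the toy sink cannot fail */
    for (int i = 0; i < toWrite; i++) {
        int v = hexval(buf[i]);
        if (v < 0) continue;                 /* ignore non-hex bytes */
        if (h->pendingDigit < 0) {
            h->pendingDigit = v;
            continue;
        }
        conv[n++] = (char) (h->pendingDigit << 4 | v);
        h->pendingDigit = -1;
        if (n == (int) sizeof conv) {        /* conversion buffer full: pass it down */
            sink_write_raw(h->down, conv, n);
            n = 0;
        }
    }
    if (n > 0) {
        sink_write_raw(h->down, conv, n);
    }
    return toWrite;                          /* all input accepted */
}

/* Last-minute flush of the incomplete digit, as a CloseProc would do. */
static void HexClose(HexOut *h) {
    if (h->pendingDigit >= 0) {
        char c = (char) (h->pendingDigit << 4);
        sink_write_raw(h->down, &c, 1);
        h->pendingDigit = -1;
    }
}
```

Writing "4" alone produces nothing downstream; a following "14" produces "A" and leaves the second "4" buffered, which a later "2" completes to "B".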

As with InputProc (Section 3.5.7) this procedure is free to read from the underlying channel too, or from some other channel, or ...

3.5.9 WatchProc

This procedure is called by the generic I/O layer whenever the user (or the system) announces its (dis)interest in events on the channel. It is called throughout the system, for example when channel handlers are added to or removed from a channel, or after the execution of channel handlers (as they may change the interest).

Base channels have to check the mask and then (un)register themselves with the notifier. In most cases this involves using the existing APIs and some OS handle for the channel, but in more complex cases it might be necessary to write an entirely new event source and add it to the notifier. This is beyond the scope of this HOWTO.

The most relevant entries in the API for this are

Tcl_CreateFileHandler[L67 ],
Tcl_DeleteFileHandler[L68 ],
Tcl_CreateTimerHandler[L69 ] and
Tcl_DeleteTimerHandler[L70 ]

The behaviour required of transformations is much simpler, and fixed: the following is what has to be done for smooth interoperation with the notifier and for working fileevents. A correct implementation of HandlerProc (Section 3.5.10) is also essential.

Two things have to be done.

Propagation of the mask information to the channel downstream. All transformations have to cooperate in this, or else the base channel at the bottom won't register with the notifier for events. This contrasts with BlockModeProc (Section 3.5.5), where the generic layer itself notifies all channels in the stack. Here we decided against such an automatism because the current design allows the various transformations to modify the mask before passing it down, i.e. to add the events they are interested in beyond the interest of the script level, without any changes to the structure of Tcl_ChannelType[L71 ] or to the semantics of this vector. Such changes would have been necessary had we made this an automatic chaining in the generic part of the I/O system.
Setting up and destroying the timer used to flush out the internal read buffers. This needs a bit more explanation, which will be given after the skeleton code for a WatchProc.

WatchProc skeleton

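static void
WatchProc(ClientData instanceData, int mask)
{
        /* TransformInstance is the driver's own (hypothetical) type. */
        TransformInstance *trans = (TransformInstance *) instanceData;
        Tcl_Channel parent;
        Tcl_DriverWatchProc *watchProc;

        /* Remember the mask internally, then pass it to the
         * channel downstream, possibly modified.
         */
        trans->watchMask = mask;

        parent = Tcl_GetStackedChannel(trans->channel);
        watchProc = Tcl_ChannelWatchProc(Tcl_GetChannelType(parent));
        watchProc(Tcl_GetChannelInstanceData(parent), mask);

        /* Manage the timer. 'readBufferLength' is a hypothetical
         * count of converted bytes waiting in the transformation.
         */
        if (!(mask & TCL_READABLE) || (trans->readBufferLength == 0)) {
                /* A pending timer may exist, but either there is
                 * no (more) interest in the events it generates
                 * or nothing is available for reading. Remove
                 * it, if it exists.
                 */
                if (trans->timer != NULL) {
                        Tcl_DeleteTimerHandler(trans->timer);
                        trans->timer = NULL;
                }
        } else if (trans->timer == NULL) {
                /* There is interest in readable events and we
                 * actually have data waiting, so create a timer
                 * to flush that data, unless one already exists.
                 * FLUSH_DELAY is a hypothetical delay in ms.
                 */
                trans->timer = Tcl_CreateTimerHandler(FLUSH_DELAY,
                        ChannelHandlerTimer, (ClientData) trans);
        }
}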

The handler procedure for the timer managed above looks as follows. It basically proclaims that the transformation channel is readable. There is no need to recreate the timer here, because the generic layer will call WatchProc (Section 3.5.9) after it has handled the event, and WatchProc will then do the right thing (see above).

Timer handler skeleton

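static void
ChannelHandlerTimer(ClientData clientData)
{
        TransformInstance *trans = (TransformInstance *) clientData;

        trans->timer = NULL;    /* the timer has fired; forget its token */
        Tcl_NotifyChannel(trans->channel, TCL_READABLE);
}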

Now the promised explanation about the necessity of the timer.

Consider this scenario:

A transformation is stacked upon a socket and its internal read buffer is empty. The transformation does not merge lines.
A fileevent script is set up and waiting for calls.
The socket has data available, say 400 bytes, spread over several lines. These are the last lines on the channel, i.e. they are followed by EOF. The notifier generates the appropriate 'readable' event.
This event triggers the execution of the fileevent script in the top channel.
The executed script uses gets to read a single line.
As the buffers are empty (see above), the transformation asks the socket for data to convert, using Tcl_ReadRaw[L72 ] and a standard buffer of 4K size. Thus it gets all waiting bytes from the socket. These are converted, resulting in several lines (no merge). Some of them are delivered up into the generic layer and its buffering, but not all (small buffer size). At least one line remains in the read buffer(s) of the transformation itself.
The script processes the one line it got and then goes back to sleep.
Now what?
The generic I/O layer finds that its buffers are not empty and uses a timer to generate additional 'readable' events to clear them.
In the end the generic I/O buffers are empty. Now what, again?
Nothing. No events, and no processing of the remaining line(s) stored in the transformation.
Why? Because the socket has an EOF pending and will not generate events anymore, and the generic layer has empty buffers and ceases to generate events too. It has no knowledge about the buffers inside the driver, i.e. the transformation. So the script will not wake up again: it will neither ask for the line nor detect the pending EOF. We are hung.

The solution to this hang is the same one used by the generic layer, but applied from inside the transformation this time:

The transformation has to check itself for data waiting to be read and then use a timer to generate the necessary 'readable' events. That is what the timer shown in the WatchProc does.
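The effect of the timer can be seen in a deliberately tiny model of the scenario above, in plain C without Tcl. Everything in it (`Trans`, `watch()`, `run_event_loop()`) is hypothetical: the socket is assumed to be at EOF and the generic buffers empty, the event loop is reduced to "fire the timer if it is armed", the script reads one line per event, and the pending EOF itself is not modelled.

```c
/* Toy model of the hang and its fix; no Tcl involved. With the socket at
 * EOF and the generic buffers empty, the transformation's own timer is
 * the only remaining source of 'readable' events. */
typedef struct {
    int pendingLines;    /* lines still sitting in the transformation */
    int watchReadable;   /* script-level interest in readable events */
    int timerArmed;      /* stand-in for trans->timer != NULL */
} Trans;

/* The timer part of WatchProc: armed only when someone wants readable
 * events and converted data is waiting. */
static void watch(Trans *t) {
    t->timerArmed = t->watchReadable && t->pendingLines > 0;
}

/* Hypothetical event loop; returns how many buffered lines reach the
 * fileevent script. */
static int run_event_loop(Trans *t, int transformationUsesTimer) {
    int delivered = 0;
    for (;;) {
        if (transformationUsesTimer) {
            watch(t);        /* interest is re-announced after each event */
        }
        if (!t->timerArmed) {
            break;           /* no event source left: a real loop would hang */
        }
        t->timerArmed = 0;   /* the timer handler forgets its token, then  */
        t->pendingLines--;   /* notifies TCL_READABLE and the script's     */
        delivered++;         /* gets consumes one line                     */
    }
    return delivered;
}
```

Without the timer none of three buffered lines is ever delivered; with it all three are, after which `watch()` stops re-arming because nothing is pending.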

3.5.10 HandlerProc

This vector is the second part of the integration of transformations with the notifier. It is called by the generic layer whenever an event happens at the channel downstream. This also means that it is never called for the channel at the bottom of a stack. In other words, base channels don't have to implement this function.

Transformations on the other hand have to implement it, and the minimally required implementation will pass the incoming event mask through, unchanged.

To understand this, an explanation is in order. The function is called with a mask in which bits are set for all events which happened on the channel downstream. The caller then expects the return value of the function to be the same mask, but with all the bits cleared whose events were handled by the function itself.

Because of this a transformation is able to absorb and handle events without the channel (or script) above being aware of them and the associated processing. The TLS transformation[L73 ], for example, uses this facility to handle the whole negotiation phase. Only after the encryption is set up are events passed unchanged to the higher layers.
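The absorbing behaviour can be sketched in plain C under clearly hypothetical assumptions: `EV_READABLE` and `EV_WRITABLE` stand in for Tcl's TCL_READABLE and TCL_WRITABLE, and the two-event "handshake" is a placeholder, not how the TLS extension actually negotiates. The function has the shape described above: it receives the events that happened downstream and returns them minus the ones it handled itself.

```c
/* Hypothetical stand-ins for Tcl's TCL_READABLE and TCL_WRITABLE bits. */
enum { EV_READABLE = 1 << 1, EV_WRITABLE = 1 << 2 };

/* Hypothetical state of a transformation that must negotiate first. */
typedef struct {
    int handshakeDone;
    int handshakeEvents;   /* downstream events consumed by the negotiation */
} Negotiator;

static int NegotiatingHandlerProc(Negotiator *t, int interestMask) {
    int ours = EV_READABLE | EV_WRITABLE;
    if (t->handshakeDone) {
        return interestMask;          /* minimal behaviour: pass through */
    }
    if (interestMask & ours) {
        /* Toy negotiation: considered finished after two events. */
        if (++t->handshakeEvents >= 2) {
            t->handshakeDone = 1;
        }
    }
    return interestMask & ~ours;      /* absorbed: the layer above sees nothing */
}
```

During the toy negotiation every readable or writable event is swallowed (the function returns 0); afterwards the same call returns its input unchanged, which is the minimal pass-through implementation.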

HandlerProc skeleton

static int
HandlerProc(ClientData instanceData, int interestMask)
{
        /* TransformInstance is the driver's own (hypothetical) type. */
        TransformInstance *trans = (TransformInstance *) instanceData;

        /*
         * An event occurred in the underlying channel. This
         * transformation doesn't process such events and thus
         * returns the incoming mask unchanged.
         *
         * We do delete an existing timer. It has not fired yet,
         * but we are here, so the channel below generated such
         * an event and we don't have to. The renewal of the
         * interest after the execution of channel handlers will
         * eventually cause us to recreate the timer.
         */
        if (trans->timer != NULL) {
                Tcl_DeleteTimerHandler(trans->timer);
                trans->timer = NULL;
        }
        return interestMask;
}

3.5.11 FlushProc

This vector is currently not used by the generic layer of the I/O system. In the future it might be used to separate the actions of flushing and writing data.

Author's Address

   Andreas Kupries
   Andreas Computer Laboratories (Me, myself and I)


Appendix A. Glossary

A short glossary of terms used in this paper that have so far been explained only briefly, or not at all.

Tcl_Channel: An opaque token for channels, and used by all interfaces accessing channels. Internally it is a pointer to the relevant data structures (Channel*).

stack: If one or more transformations are stacked upon an arbitrary other channel I use this word to refer to the whole group of channels.

(un)cover: Placing a transformation on a channel C "covers" C, removing the transformation "uncovers" it again.

Appendix B. Acknowledgements

This HOWTO was written in XML using the DTD developed by Marshall T. Rose for writing RFCs and I-Ds, see RFC 2629[L74 ], and converted to text and HTML with his tool, xml2rfc.
