Version 28 of Tcl_Obj

Updated 2005-06-02 11:14:08 by btheado

This structure is the fundamental value type used in the Tcl core. It represents a value that may have a string (UTF-8) representation, some other internal representation (for example an integer or double if it is a number, or a collection of values if it is a list), or both, and it may move between these representations pretty much at will. It is reference counted, and the allocator for it is very heavily tuned.

NEM - See [L1 ] for the original paper, written by Brian T. Lewis [L2 ].

It has a deeply unfortunate name, but the far more apt Tcl_Value was previously taken for handling user-defined expr functions...

RS thinks that the name is ok if one does not expect OO features, class membership etc. Objects have been there long before OO, and the name is certainly not under a monopoly (I'd object against that ;-). But the basic feature of Tcl_Obj's is that they have a string representation and possibly a problem-oriented one, and each can be regenerated from the other (also if you define your own Obj types). If such type conversions occur frequently, this costs performance: the so-called shimmering occurs. E.g. see what happens to i below:

 for {set i 0} {$i<10} {incr i} { #here we need the integer rep
    puts [string length $i]      ;#here the string rep..
    puts [llength $i]            ;# and here the list rep, so int rep goes away 
 }
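
A rough C model of what happens to i in that loop may help. This is only a sketch: MiniObj and its functions are made-up names, not the real Tcl_Obj API. The string rep is kept permanently valid for simplicity, and list parsing is reduced to counting space-separated words. It only shows that asking for a different internal rep throws away the cached one, so alternating requests rebuild the rep each time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Tcl_Obj; NOT the real tcl.h layout. */
typedef enum { REP_NONE, REP_INT, REP_LIST } RepKind;

typedef struct MiniObj {
    char    bytes[32];    /* string rep; always valid in this toy model */
    RepKind kind;         /* which internal rep is currently cached */
    long    intValue;     /* meaningful only when kind == REP_INT */
    int     listLength;   /* meaningful only when kind == REP_LIST */
    int     conversions;  /* how often an internal rep had to be (re)built */
} MiniObj;

void MiniInit(MiniObj *o, const char *s) {
    strncpy(o->bytes, s, sizeof o->bytes - 1);
    o->bytes[sizeof o->bytes - 1] = '\0';
    o->kind = REP_NONE;
    o->intValue = 0;
    o->listLength = 0;
    o->conversions = 0;
}

/* Return the integer value, parsing the string only if no int rep is cached. */
long MiniGetInt(MiniObj *o) {
    if (o->kind != REP_INT) {
        o->intValue = strtol(o->bytes, NULL, 10);
        o->kind = REP_INT;          /* any cached list rep is now gone */
        o->conversions++;
    }
    return o->intValue;
}

/* Return the list length, re-parsing only if no list rep is cached. */
int MiniListLength(MiniObj *o) {
    if (o->kind != REP_LIST) {
        int n = 0, inWord = 0;
        for (const char *p = o->bytes; *p != '\0'; p++) {
            if (*p == ' ') {
                inWord = 0;
            } else if (!inWord) {
                inWord = 1;
                n++;
            }
        }
        o->listLength = n;
        o->kind = REP_LIST;         /* any cached int rep is now gone */
        o->conversions++;
    }
    return o->listLength;
}
```

Asking for the integer twice in a row converts once; putting a list-length request in between forces a second integer conversion, which is the cost described above.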



[CMcC] I've put together a summary page of Tcl_Objs current for 8.4, containing information culled from the source.

A Tcl_Obj is defined as

an integer refCount
representing the number of references to this Tcl_Obj
a char *bytes
being the string representation of the object (under the doctrine `everything's a string')
an integer length
being the length of the string representation
a pointer to a Tcl_ObjType
which contains the type's name, and pointers to functions implementing the four fundamental operations which all Tcl_Obj instances are expected to implement.
a union internalRep
which is used to store up to two pointers of information which is opaque to Tcl.

Each Tcl_ObjType contains the following four function pointers plus a name.

freeIntRepProc
Called to free any storage for the type's internal rep. NULL if the internal rep does not need freeing.
dupIntRepProc
Called to create a new object as a copy of an existing object; NULL indicates that the default strategy (copy the whole internalRep union) is sufficient.
updateStringProc
Called to update the string rep from the type's internal representation. (Not sure what NULL means for this; IME that's not an especially good idea.)
setFromAnyProc
Called to convert the object's internal rep to this type. Frees the internal rep of the old type. Returns TCL_ERROR on failure. NULL indicates that objects of this type can't normally be created (typically because extra context is needed.)
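
To illustrate the NULL conventions for freeIntRepProc and dupIntRepProc described above, here is a self-contained toy model. ToyObj, ToyType and the two sample types are hypothetical stand-ins, not the declarations from tcl.h, and updateStringProc and setFromAnyProc are left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyObj ToyObj;

/* Hypothetical cut-down type record: a name plus two of the four procs. */
typedef struct ToyType {
    const char *name;
    void (*freeIntRepProc)(ToyObj *obj);            /* NULL: nothing to free */
    void (*dupIntRepProc)(ToyObj *src, ToyObj *dup); /* NULL: copy the union */
} ToyType;

struct ToyObj {
    const ToyType *typePtr;   /* NULL means "no internal rep" */
    union {
        long  longValue;
        void *otherValuePtr;
    } internalRep;
};

/* Give dup the same internal rep as src, following the NULL convention. */
void ToyDupIntRep(ToyObj *src, ToyObj *dup) {
    dup->typePtr = NULL;
    if (src->typePtr == NULL) {
        return;                                   /* nothing to copy */
    }
    if (src->typePtr->dupIntRepProc == NULL) {
        dup->internalRep = src->internalRep;      /* default: copy whole union */
        dup->typePtr = src->typePtr;
    } else {
        src->typePtr->dupIntRepProc(src, dup);    /* type-specific copy */
    }
}

/* Drop the internal rep, calling the type's free proc only if it has one. */
void ToyFreeIntRep(ToyObj *obj) {
    if (obj->typePtr != NULL && obj->typePtr->freeIntRepProc != NULL) {
        obj->typePtr->freeIntRepProc(obj);
    }
    obj->typePtr = NULL;
}

/* Sample type whose rep is a heap string, so it needs both procs. */
void BufFree(ToyObj *obj) {
    free(obj->internalRep.otherValuePtr);
    obj->internalRep.otherValuePtr = NULL;
}

void BufDup(ToyObj *src, ToyObj *dup) {
    const char *s = src->internalRep.otherValuePtr;
    char *copy = malloc(strlen(s) + 1);
    assert(copy != NULL);
    strcpy(copy, s);
    dup->internalRep.otherValuePtr = copy;
    dup->typePtr = src->typePtr;
}

const ToyType toyIntType = { "toyInt", NULL, NULL };      /* plain value */
const ToyType toyBufType = { "toyBuf", BufFree, BufDup }; /* owns memory */
```

The integer-like type gets away with two NULL entries because copying the union copies the whole value, while the buffer type must supply both procs or two objects would end up owning the same allocation.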

Joe Mistachkin -- 13/Oct/2003 -- The following is a pre-TIP preview of some enhancements to the Tcl_Obj system I [and others] would like.

The following changes:

<some excerpts are from Tcl'ers chat>

 Tcl_Obj always has SOME object type.
 The whole idea is that you rely on the clientData to be YOURS even when the object type is not.
 I guess a convention (again only a convention) is to point to the clientData's associated object type in the first sizeof(void *) word.
 That way you can safely check.
 Instead, we can make it more "safe" and generalized by using this:

      struct Tcl_ObjData { 
        int size;
        Tcl_ObjType *typePtr; 
        ClientData clientData;   
        int coreFlags;
        int extFlags;
      };

Now, we need a new callback for Tcl_Obj's so they can be notified when the object is being DESTROYED.

      typedef void (Tcl_FreeObjProc) _ANSI_ARGS_((struct Tcl_Obj *objPtr));

Next, we modify the Tcl_ObjType struct like so:

      typedef struct Tcl_ObjType {
          char *name;                 /* Name of the type, e.g. "int". */
          Tcl_FreeInternalRepProc *freeIntRepProc;
                                      /* Called to free any storage for the type's
                                       * internal rep. NULL if the internal rep
                                       * does not need freeing. */
          Tcl_DupInternalRepProc *dupIntRepProc;
                                      /* Called to create a new object as a copy
                                       * of an existing object. */
          Tcl_UpdateStringProc *updateStringProc;
                                      /* Called to update the string rep from the
                                       * type's internal representation. */
          Tcl_SetFromAnyProc *setFromAnyProc;
                                      /* Called to convert the object's internal
                                       * rep to this type. Frees the internal rep
                                       * of the old type. Returns TCL_ERROR on
                                       * failure. */
          Tcl_FreeObjProc *freeObjProc;
                                      /* Called when the object refcount reaches
                                       * zero just prior to the object being freed. */
      } Tcl_ObjType;

Finally, we modify the Tcl_Obj struct like so:

      typedef struct Tcl_Obj {
        int refCount;               /* When 0 the Tcl_FreeObjProc will be called
                                     * and the object will be freed.
                                     * We may also need to call the Tcl_ObjData's
                                     * Tcl_ObjType freeProc. */
        char *bytes;                /* This points to the first byte of the
                                     * object's string representation. The array
                                     * must be followed by a null byte (i.e., at
                                     * offset length) but may also contain
                                     * embedded null characters. The array's
                                     * storage is allocated by ckalloc. NULL
                                     * means the string rep is invalid and must
                                     * be regenerated from the internal rep.
                                     * Clients should use Tcl_GetStringFromObj
                                     * or Tcl_GetString to get a pointer to the
                                     * byte array as a readonly value. */
        int length;                 /* The number of bytes at *bytes, not
                                     * including the terminating null. */
        Tcl_ObjType *typePtr;       /* Denotes the object's type. Always
                                     * corresponds to the type of the object's
                                     * internal rep. NULL indicates the object
                                     * has no internal rep (has no type). */
        union {                     /* The internal representation: */
                long longValue;        /*   - a long integer value */
                double doubleValue;    /*   - a double-precision floating value */
                VOID *otherValuePtr;   /*   - another, type-specific value */
                Tcl_WideInt wideValue; /*   - a long long value */
                struct {               /*   - internal rep as two pointers */
                  VOID *ptr1;
                  VOID *ptr2;
                } twoPtrValue;
        } internalRep;
        Tcl_ObjData *dataPtr;    /* The extra information for use by the "owner" of this object. */
        char *file;              /* The file where this object was allocated/initialized. */
        int line;                /* The source line where this object was allocated/initialized. */
      } Tcl_Obj;

NOTE: The "file" and "line" fields should reside DIRECTLY in the Tcl_Obj for robustness.


 scenario #1. internal rep changes (gets freed)
 step #1. if obj->dataPtr is non-NULL, and obj->dataPtr->typePtr->freeIntRepProc isn't NULL, call the freeIntRepProc (in this step the called freeIntRepProc CANNOT modify the "outer" Tcl_Obj data UNLESS the objTypes match exactly).
 step #2. check the result, if it's an error, stop processing and return the error.
 step #3. next, call the obj->typePtr->freeIntRepProc, if it's non-NULL (it CAN touch any of the "inner" or "outer" data).
 step #4. check the result, if it's an error, stop processing and return the error.
 step #5. done, actually free the int rep.

 scenario #2: refcount == 0, object is about to be DESTROYED
 step #1. if obj->dataPtr->typePtr->freeObjProc != NULL, then call it (in this step the called freeObjProc CANNOT modify the Tcl_Obj UNLESS the objTypes match exactly).
 step #2. check the result, if it's an error, stop processing and return the error.
 step #3. next, call the obj->typePtr->freeObjProc, if it's non-NULL (it CAN touch anything in the "inner" or "outer" data).
 step #4. check the result, if it's an error, stop processing and return the error.
 step #5. actually FREE the object if both calls succeeded.

 NOTES: 

 The "outer" data is the data directly inside the Tcl_Obj.
 The "inner" data is the data inside the contained Tcl_ObjData.
 The called procs need to verify that the pointers they need are valid prior to trying to free/use them.
 And to be fully robust... if null pointers are considered an error by the procs, the procs should return TCL_ERROR.
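
The ordering in scenario #2 can be sketched as follows. All Sk* names are invented for this illustration and none of this is core code. Note one deliberate assumption: the step list talks about checking a result, while the Tcl_FreeObjProc typedef shown earlier returns void, so this sketch uses a hypothetical callback that returns a status code.

```c
#include <assert.h>
#include <stddef.h>

#define SK_OK    0
#define SK_ERROR 1

typedef struct SkObj SkObj;

/* Hypothetical: unlike the void typedef proposed above, this returns a status. */
typedef int (SkFreeObjProc)(SkObj *obj);

typedef struct SkType {
    const char    *name;
    SkFreeObjProc *freeObjProc;   /* may be NULL */
} SkType;

typedef struct SkData {           /* stands in for the proposed Tcl_ObjData */
    const SkType *typePtr;
    void         *clientData;
} SkData;

struct SkObj {
    int           refCount;
    const SkType *typePtr;        /* type of the "outer" object */
    SkData       *dataPtr;        /* "inner" data, may be NULL */
    int           freed;          /* stands in for really releasing the object */
};

/* Destroy obj: inner callback first, then outer, stopping at the first error. */
int SkDestroy(SkObj *obj) {
    /* steps 1-2: the owner of the inner data is told first */
    if (obj->dataPtr != NULL && obj->dataPtr->typePtr != NULL
            && obj->dataPtr->typePtr->freeObjProc != NULL) {
        if (obj->dataPtr->typePtr->freeObjProc(obj) != SK_OK) {
            return SK_ERROR;
        }
    }
    /* steps 3-4: then the object's current type */
    if (obj->typePtr != NULL && obj->typePtr->freeObjProc != NULL) {
        if (obj->typePtr->freeObjProc(obj) != SK_OK) {
            return SK_ERROR;
        }
    }
    /* step 5: both calls succeeded (or were absent), so free the object */
    obj->freed = 1;
    return SK_OK;
}

/* Two sample callbacks used to exercise the sequence. */
int skCalls = 0;

int SkCountingFree(SkObj *obj) {
    (void)obj;
    skCalls++;
    return SK_OK;
}

int SkFailingFree(SkObj *obj) {
    (void)obj;
    return SK_ERROR;
}
```

With a counting callback on both the inner and the outer type, two calls are made and the object is freed; with a failing inner callback the outer one is never reached and the object survives.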

More discussion...

 Tcl_ObjData probably ought to have a refcount field (I assume you'd share them between duplicated objects, yes?).
 Ok, now for refcounting Tcl_ObjData.
 We could do that... It would complicate things a bit.
 We would need the same sharing semantics that Tcl_Obj's have
 I think we may have reasons NOT to share Tcl_ObjData's
 Because, presumably two "identical" Tcl_Obj's may need entirely different internal clientData "handles".
 How does that differ from the case where you have a [puts $x,[string length $x]] in between?
 The internal rep gets shimmered away.
 No problem though
 Who retains the knowledge of how to duplicate the objdata?
 The ObjType DuplicateObjProc, in theory.
 Which is now serving two purposes?
 No.
 It's serving one "purpose", to "duplicate" a Tcl_Obj, which includes any subordinate data.

Marco Maggi (Oct 14, 2003) I'm not getting it. Can you add the explanation of a real world example?

As it is now, Tcl_Obj is a data proxy for its internal and external representation:

  -------------
 | user module |-------
  -------------        |    -------    --------------
                        -->| data  |->| internal rep |
                        -->| proxy |   --------------
  -------------        |    -------
 | user module |-------        |       --------------
  -------------                 ----->| external rep |
                                       --------------

the data proxy allows copy-on-write for the representations.
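
The copy-on-write rule can be sketched in a few lines of C. CowObj and its helpers are invented for this illustration, and this toy starts the reference count at 1 for simplicity. The real core has Tcl_IsShared and Tcl_DuplicateObj for the same job, but they are not used here. The only point is that a writer makes a fresh object first when someone else also holds a reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted value; not the real Tcl_Obj. */
typedef struct CowObj {
    int  refCount;
    long value;
} CowObj;

/* Create a value owned by one holder (toy convention: count starts at 1). */
CowObj *CowNew(long value) {
    CowObj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refCount = 1;
    o->value = value;
    return o;
}

/* Shared means more than one holder would observe a modification. */
int CowIsShared(const CowObj *o) {
    return o->refCount > 1;
}

/*
 * Store a new value through one holder's reference and return the object
 * that holder should use from now on. A shared object is left untouched:
 * the holder gives up its reference and receives a private object instead.
 */
CowObj *CowSet(CowObj *o, long value) {
    if (CowIsShared(o)) {
        o->refCount--;
        o = CowNew(value);
    } else {
        o->value = value;
    }
    return o;
}

/* Drop one reference, freeing the object when the last holder lets go. */
void CowRelease(CowObj *o) {
    if (--o->refCount == 0) {
        free(o);
    }
}
```

Two user modules holding the same object therefore never see each other's writes, which is what the proxy arrangement in the diagram is meant to guarantee.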

What you are proposing is to add another data reference. Do you want the internal and external representations to (1) serve as a data proxy for the real data, or do you want them to be (2) a shimmerable representation of the real data?

  -------------
 | user module |-------
  -------------        |    -------    --------------
                        -->| data  |->| internal rep |
                        -->| proxy |   --------------
  -------------        |    -------
 | user module |-------      |   |     --------------
  -------------              |    --->| external rep |
                             v         --------------
                         -----------   
                        | real data |
                         -----------

Example of option (1): a module, implemented at the C level, has the responsibility of a big vector of elements; you register a pointer to the vector as real data in a Tcl_Obj, and let the representations represent an index in the vector; in this case the object type is something like "index-in-a-vector".

Example of option (2): a module, implemented at the C level, instantiates a tree structure; you register a pointer to the root node as real data in a Tcl_Obj, and let the representations offer a view over the tree's nodes; that way the representations may be shimmered at will: the tree is still there.

In your scenarios there's the possibility that the object destruction returns an error: is this correct?


Joe Mistachkin -- 14/Oct/2003 -- First, these changes would facilitate the ability for extension-specific data to survive the internal rep being shimmered away. Second, it would allow extensions to know when objects of their type get destroyed. As for the possibility of the object destruction returning an error, I was under the impression that was the case now. However, it appears to NOT be the case. I do not propose changing the Tcl_FreeInternalRepProc to be capable of returning an error.


DKF 140803: Let's see if I've got this all straight in my mind:

The proposal is to add a new representation slot to Tcl_Objs with different management semantics to the current internalRep?

The semantics are that the new slot is a pointer (or NULL) to some other structure that is self-describing (i.e. contains a pointer to some type structure) to some degree. There are two standard operations on the overall object that affect the slot: duplication and deletion.

duplication
When an object is duplicated and the slot is non-NULL, the slot's duplication operation (if not NULL) is called to perform whatever duplication operation is required. The target object's slot state is NULL prior to the operation, and it is entirely up to the duplication operation to manage the slot.
deletion
When the object is deleted overall, or if something decides to force the clearing of the slot, and the slot is not NULL, the slot's deletion operation is called. If the slot's deletion operation is NULL, the slot's pointer will be passed to ckfree()/Tcl_Free().
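
A small sketch of these two rules, as read from the description above: the target slot starts out NULL and stays NULL when there is no duplication operation, and a slot without a deletion operation is simply handed to the allocator's free routine. Every name here (SlotObj, SlotData, SlotOps and so on) is hypothetical, and plain free() stands in for ckfree()/Tcl_Free().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SlotData SlotData;

/* Hypothetical per-slot operations; either entry may be NULL. */
typedef struct SlotOps {
    void (*dupProc)(const SlotData *src, SlotData **dstSlot);
    void (*deleteProc)(SlotData *data);
} SlotOps;

/* Self-describing slot contents: the ops pointer comes first. */
struct SlotData {
    const SlotOps *ops;
    int            payload;
};

/* Only the part of the object that matters here: the extra slot. */
typedef struct SlotObj {
    SlotData *slot;
} SlotObj;

/* Duplication: dst's slot is NULL first; the dup operation decides the rest. */
void SlotObjDup(const SlotObj *src, SlotObj *dst) {
    dst->slot = NULL;
    if (src->slot != NULL && src->slot->ops->dupProc != NULL) {
        src->slot->ops->dupProc(src->slot, &dst->slot);
    }
}

/* Deletion or forced clearing: use the delete operation, else just free(). */
void SlotObjClear(SlotObj *obj) {
    if (obj->slot == NULL) {
        return;
    }
    if (obj->slot->ops->deleteProc != NULL) {
        obj->slot->ops->deleteProc(obj->slot);
    } else {
        free(obj->slot);
    }
    obj->slot = NULL;
}

/* Sample duplication operation that makes an independent copy. */
void CopyingDup(const SlotData *src, SlotData **dstSlot) {
    SlotData *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    *copy = *src;
    *dstSlot = copy;
}

const SlotOps copyingOps = { CopyingDup, NULL };  /* copied on duplication */
const SlotOps inertOps   = { NULL, NULL };        /* never follows a duplicate */

SlotData *SlotDataNew(const SlotOps *ops, int payload) {
    SlotData *d = malloc(sizeof *d);
    assert(d != NULL);
    d->ops = ops;
    d->payload = payload;
    return d;
}
```

A slot using inertOps does not travel with duplicates at all, which is one way to read "it is entirely up to the duplication operation to manage the slot".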

(Warning: No use cases for the flags field!)

Slot creation is not defined here, but it is assumed to be up to the code that manufactured the object. No core object will use the slot (what about object-shimmer-to-list-and-lappend?) but the core is allowed to clear the slot. It is strongly recommended that other transparent collection types do not use the slot for their primary information store either, though perhaps they can hold metadata there?

Because both dup and free operations are controllable, user code can implement slot sharing between duplicated objects if it wishes.

Will it be possible to pack a Tcl_ObjData into the front of another structure so you can cut the number of calls to ckalloc()? If that's the case, user code can easily add a new field (like a refcount) if desired.
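
One way to picture the packing idea, assuming the slot header is nothing more than a pointer to its operations: put the header first in a larger struct, allocate both with a single call, and add whatever extra fields (here a refcount) the extension wants. PkHeader, PkOps and SharedVec are hypothetical names, and malloc()/free() stand in for ckalloc()/ckfree().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PkHeader PkHeader;

/* Hypothetical slot operations: duplicate a header, and delete one. */
typedef struct PkOps {
    PkHeader *(*dupProc)(PkHeader *src);
    void      (*deleteProc)(PkHeader *hdr);
} PkOps;

/* The generic part that a core would know about. */
struct PkHeader {
    const PkOps *ops;
};

/*
 * Extension-side structure. The header is the FIRST member, so a pointer
 * to the header can be converted back to a pointer to the whole struct,
 * and one allocation covers both.
 */
typedef struct SharedVec {
    PkHeader header;
    int      refCount;   /* extra field added by the extension */
    int      length;     /* stands in for the real payload */
} SharedVec;

/* Duplication shares the same block and just counts the extra holder. */
PkHeader *SharedVecDup(PkHeader *src) {
    SharedVec *v = (SharedVec *)src;
    v->refCount++;
    return src;
}

/* Deletion drops one holder and frees the block with the last one. */
void SharedVecDelete(PkHeader *hdr) {
    SharedVec *v = (SharedVec *)hdr;
    if (--v->refCount == 0) {
        free(v);
    }
}

const PkOps sharedVecOps = { SharedVecDup, SharedVecDelete };

/* A single allocation yields header and payload together. */
SharedVec *SharedVecNew(int length) {
    SharedVec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->header.ops = &sharedVecOps;
    v->refCount = 1;
    v->length = length;
    return v;
}
```

Because the dup and delete operations belong to the extension, sharing between duplicated objects needs no help from the core beyond calling them at the right moments.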

Should the slot type structure have a versioning/size field? No need to duplicate the mistake that was made with Tcl_ObjType...


Joe Mistachkin -- 15/Oct/2003 -- Ok.

  1. There are quite a few things that extensions [potentially] need to know about to be robust. First, the extension needs to know when objects of the custom type are "created". This functionality is already present in Tcl today since the extension has to "cook up" its objects from the generic ones. Second, the extension needs to know when somebody wants to duplicate objects of the custom type. Third, the extension needs to know when the internal rep for objects of the custom type is being freed. Finally, it needs to know when objects of the custom type are being totally destroyed. I believe that these modifications would address all these needs and still leave room for future expansion.
  2. I have modified the above Tcl_ObjData struct to have an additional flags field. The "coreFlags" field is for exclusive use by things inside of the core. Extensions may query it but NOT modify it. The "extFlags" field is for exclusive use by "extensions" (whatever is implementing the objType in question, which may be the core); the core may query it but NOT modify it.
  3. Yes, we probably need a size field. Added above.

DKF: Things are looking good; I'm just trying to understand everything. :^)

DKF 11/11/03: One possibility might be to allow for separate control (via configure's --enable-debug option which is already heavily overloaded) of whether objects have the debugging info in them. Some kinds of debugging (e.g. of general memory use) don't want the overhead of an extra 8 bytes per object, and other kinds of debugging (e.g. who's allocating that bad object?) need it...

NEM 22Mar2004: Need some clarification here. If Tcl_ObjType is still responsible for dealing with the Tcl_ObjData stuff, then wouldn't the following scenario be quite likely (assuming use of Tcl_ObjData becomes wide-spread):

   1 Tcl_Obj is given some Tcl_ObjType (typeA), which installs some Tcl_ObjData
   2 Tcl_Obj is shimmered to some other type (typeB) which installs some new Tcl_ObjData, causing the deletion of typeA's

This would seem to solve nothing in this case. Or have I misunderstood? Perhaps Tcl_ObjData can only be set once, and it is an error to try to overwrite it? This second option would seem to imply that converting a Tcl_Obj to some type may cause an error, even if the string rep is entirely compatible. This would further imply that we would have added a form of static typing to Tcl, wouldn't it? Perhaps someone can clear this up for me.


See also Creating and Using Tcl Handles in C Extensions


[ Category Concept

Category Internals ]