Fixing C Access to Bignums

The following routines in Tcl's public interface, tcl.h, make use of the mp_int type:

  • Tcl_NewBignumObj(mp_int *)
  • Tcl_DbNewBignumObj(mp_int *, const char *, int)
  • Tcl_SetBignumObj(Tcl_Obj *, mp_int *)
  • Tcl_GetBignumFromObj(Tcl_Interp *, Tcl_Obj *, mp_int *)
  • Tcl_TakeBignumFromObj(Tcl_Interp *, Tcl_Obj *, mp_int *)
  • Tcl_InitBignumFromDouble(Tcl_Interp *, double, mp_int *)

Tcl's public header, tcl.h, also includes these declarations:

typedef struct mp_int mp_int;
typedef unsigned long mp_digit;

which allows the compiler to make sense of the function declarations.

Callers of those functions must be able to allocate an mp_int struct so they can pass its address in. In order to do this a definition of the mp_int struct has to be in scope, but Tcl's public header tcl.h does not provide one.
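To make the problem concrete, here is a minimal sketch using hypothetical names (demo_int, demo_set_small, demo_caller are illustrations only, not part of Tcl or libtommath). It shows that a forward declaration is enough to compile a prototype taking a pointer, but that a caller can only allocate the struct once the full definition is in scope.

```c
#include <assert.h>
#include <stddef.h>

/* What a public header can get away with: an incomplete type and a
 * prototype that only mentions a pointer to it. (Hypothetical names.) */
typedef struct demo_int demo_int;
void demo_set_small(demo_int *out, int value);

/* demo_int x;    <- would not compile at this point: incomplete type */

/* What some other header has to supply before callers can allocate one.
 * The fields are an assumption for the demo, not the real mp_int layout. */
struct demo_int {
    int used;              /* number of digits in use */
    int sign;              /* +1 or -1 */
    unsigned long digit0;  /* magnitude, one digit only in this demo */
};

void demo_set_small(demo_int *out, int value)
{
    assert(out != NULL);
    out->used = (value != 0);
    out->sign = (value < 0) ? -1 : 1;
    /* Unsigned arithmetic gives the magnitude without signed overflow. */
    out->digit0 = (value < 0) ? 0UL - (unsigned long) value
                              : (unsigned long) value;
}

/* Caller side: needs the complete struct to put one on the stack. */
static int demo_caller(int value)
{
    demo_int big;
    demo_set_small(&big, value);
    return big.sign * (int) big.digit0;
}
```

Moving the struct definition below demo_caller would make the declaration of big fail to compile, which is the position a caller of the Tcl routines is in with tcl.h alone.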

Q1: What header file are callers of these routines expected to #include to get a suitable mp_int definition in scope?

Q2: If the answer is something other than a corrected tcl.h, how are we to verify that the mp_digit definition active wherever that definition comes from is the same as that in tcl.h? (Failure to make them consistent leads to a binary incompatibility.)

Several of Tcl's own source files need to call these routines, and they have the same problem. Their solution is #include "tommath.h". When the Tcl sources are built, the -I compiler options are set so that this resolves to tcl/generic/tommath.h, which is only a wrapper that pulls in tcl/generic/tclTomMath.h. That file in turn is a patched version of libtommath's original header, tcl/libtommath/tommath.h. That's the Tcl internals answer to Q1.

Each Tcl source file that uses the mp_int struct follows the pattern of #include "tcl.h" before #include "tommath.h". In addition, tcl/generic/tclTomMath.h has been patched so that when MP_DIGIT_DECLARED is defined, any declarations of mp_digit are skipped. This combination is the Tcl internals answer to Q2.
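The include-order mechanism can be sketched in a single file. The two guarded blocks below are simplified stand-ins for the relevant parts of tcl.h and tclTomMath.h, not copies of the real headers; their exact layout is an assumption, and active_digit_bytes is a hypothetical helper added only to report which definition won.

```c
#include <assert.h>
#include <stddef.h>

/* ---- stand-in for the relevant part of tcl.h (included first) ---- */
#ifndef MP_DIGIT_DECLARED
typedef unsigned long mp_digit;
#define MP_DIGIT_DECLARED
#endif

/* ---- stand-in for the relevant part of tclTomMath.h ---- */
#ifndef MP_DIGIT_DECLARED
typedef unsigned int mp_digit;  /* skipped: the macro is already defined */
#define MP_DIGIT_DECLARED
#endif

/* Hypothetical helper: width in bytes of whichever typedef took effect. */
static size_t active_digit_bytes(void)
{
    return sizeof(mp_digit);
}
```

With the blocks in this order the first typedef controls. Swapping the two blocks would silently select the other digit type, which is why the include order matters.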

The file tcl/generic/tclTomMath.h is patched so that if you include it without first including tcl.h (and no other special MP_* directive is active), the controlling mp_digit definition is:

typedef unsigned int mp_digit;

Because of the Tcl internals answer to Q2, though, the tcl.h declaration of mp_digit controls, and the mp_int struct in use throughout Tcl's implementation is built on an array of unsigned long digits. This is most unfortunate on L64 systems (those where long is 64 bits wide), because the rest of tcl/generic/tclTomMath.h forces a configuration where only 28 bits of each digit are used to store bits of the bignums. This doubles the memory weight of bignums on L64 systems for no gain at all. (See [L1 ].)
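The arithmetic behind the doubling can be sketched as follows. The helper names and the DEMO_DIGIT_BIT macro are hypothetical; the only figure taken from the text is the 28 payload bits per digit. The digit width is passed in as a byte count so the comparison does not depend on the platform compiling the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DIGIT_BIT 28  /* payload bits stored per digit, per the text */

/* Number of digits needed to hold a value of the given bit length. */
static size_t digits_needed(size_t value_bits)
{
    return (value_bits + DEMO_DIGIT_BIT - 1) / DEMO_DIGIT_BIT;
}

/* Bytes occupied by the digit array when each digit is digit_bytes wide. */
static size_t digit_array_bytes(size_t value_bits, size_t digit_bytes)
{
    assert(digit_bytes > 0);
    return digits_needed(value_bits) * digit_bytes;
}
```

For a 1024-bit value this gives 37 digits, so 148 bytes with 4-byte digits against 296 bytes with 8-byte digits. The digit count is the same in both cases because only 28 bits of each digit carry data.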

Q3: Can we change tcl.h so that mp_digit is unsigned int instead of unsigned long? Can we do it in a patch release?

This would be no change on L32 systems. On L64 systems it would be a huge improvement in memory efficiency.

It's a binary incompatibility, which is normally off limits in a patch release, but I think this is really a (very late) bug fix in the new interface routines of Tcl 8.5. I think it's worth doing, but we should find out if/how it will cause problems to any existing users of the interface.

Q4: Why is it that the mp_int and mp_digit declarations in tcl.h are bracketed by MP_*_DECLARED tests? What is it protecting against?

When building Tcl itself, there will not be anything defining these directives, so Tcl's declarations will control. When building an extension or embedding program, if we let some other header control and choose different definitions, we will end up with code that is binary incompatible with the prior Tcl build. I think these protection lines are wrong and should go away. If I'm wrong, I need to learn what their value is, and adding some comments explaining it would be a good idea.

Since tcl.h does not give callers everything they need, what do folks try? From what I see, they copy Tcl's own sources, and take note that we install the header file tclTomMath.h (and its junior partner tclTomMathDecls.h), so they #include "tclTomMath.h" to get the needed mp_int struct definition.

Problem is, this is broken. (See [L2 ].)

Aha! We actually tell users of those routines what to do in their documentation! [L3 ]. So #include "tclTomMath.h" is what we currently tell people to do. It's still broken though.

Q5: How best to fix this brokenness?

See the patch attached now to [L4 ]. Any comments pro or con?

Assuming a fix for that, or just the workaround of making an empty tommath_class.h file visible to your build process, the #include "tclTomMath.h" declares a collection of mp_* routines (mp_add, etc.) which can either be used directly, or, when TCL_USE_STUBS is active, after a call to the routine Tcl_TomMath_InitStubs().

Q6: Since these routines come courtesy of libtcl, shouldn't we document this?

I don't really care about this next issue, but I include it for the sake of completeness.

Q7: Is it possible for a program to use both the Tcl library and the original libtommath library (not our fork!)?

And finally, as a placeholder for things I know Kevin Kenny would like to pursue:

Q8: What about the other libtommath routines?

My aim is to address as many of these questions as we can at the BOF on 2009 Sept. 30 at Tcl 2009. Feel free to add relevant comments or additional related questions here.

AMG: See my comments on tcl::tommath. Also see [L5 ] for a relevant bug report.