Purpose: The Tcl maintainers are working on an issue where overflow of a 32-bit integer, followed by conversion to wide, can yield a literal with an inappropriate internal representation. KBK has been asked to start a Wiki page to track the discussions, since they are getting complex enough to be difficult to track in the Chat and email, and a Wiki page supports collaborative editing, unlike a SourceForge bug report.
Background: Tcl 8.4 has a serious bug [L1 ] in its processing of integer literals that exceed the range of a 32-bit word. The issue is that, for backward compatibility with earlier versions of Tcl, integers that fit into a 32-bit unsigned word are treated as 32-bit constants. The problem that arises is that these constants acquire an internal representation that can then be sign-extended to a wide integer. The wide integer will have an incorrect value. [L2 ] is a related bug.
The bug is truly insidious because it pollutes shared literals, so unrelated code can stumble over problems. The following illustrates the sort of bizarre results the literal pollution can cause.
    % proc b {} { set x 2200000000 ; puts [expr { wide($x) + 1 }] }
    % b
    2200000001
    % proc a {} { clock format 2200000000 }
    % a; b
    -2094967295
Certain changes to [string is integer] in 8.4.3 and later releases have made the following script also fail similarly, although it works from 8.0 to 8.4.2:
    % proc b {} { set x 2200000000 ; puts [expr { wide($x) + 1 }] }
    % b
    2200000001
    % proc a {} { string is integer 2200000000 }
    % a; b
    -2094967295
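The order dependence can be modelled outside Tcl. The following C sketch is a deliberately simplified stand-in, not Tcl's actual implementation: the names Literal, literal_get_int32 and literal_get_wide are hypothetical, and it assumes that a command wanting a 32-bit integer caches a wrapped value on the shared literal, which a later wide conversion then sign extends.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for a shared literal: one string, one cached rep.
 * All names here are hypothetical; this is not Tcl's Tcl_Obj. */
typedef enum { REP_NONE, REP_INT32 } RepKind;

typedef struct {
    const char *text;   /* the literal as written in the script */
    RepKind     kind;   /* which internal representation is cached */
    int32_t     int32;  /* valid only when kind == REP_INT32 */
} Literal;

/* Models a command that asks for a 32-bit integer. The low 32 bits of the
 * parsed value are kept (wrapping on overflow) and cached on the shared
 * object. */
int32_t literal_get_int32(Literal *lit)
{
    assert(lit != NULL && lit->text != NULL);
    if (lit->kind != REP_INT32) {
        uint32_t low = (uint32_t) strtoull(lit->text, NULL, 0);
        /* Reinterpret the low 32 bits as two's complement, portably. */
        lit->int32 = (low <= (uint32_t) INT32_MAX)
            ? (int32_t) low
            : (int32_t) ((int64_t) low - INT64_C(4294967296));
        lit->kind = REP_INT32;
    }
    return lit->int32;
}

/* Models wide($x): reuse a cached 32-bit rep if one is present (the bug),
 * otherwise parse the string directly as a 64-bit value. */
int64_t literal_get_wide(const Literal *lit)
{
    assert(lit != NULL && lit->text != NULL);
    if (lit->kind == REP_INT32) {
        return (int64_t) lit->int32;   /* plain sign extension */
    }
    return (int64_t) strtoll(lit->text, NULL, 0);
}
```

In this model, a Literal with text "2200000000" gives 2200000001 for literal_get_wide plus 1 while no representation is cached. Once literal_get_int32 has been called on the same object, the same expression gives -2094967295, mirroring the transcripts above.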
What's happening: The following table gives examples of each type of 32-bit integer conversion that's possible, and the notes below explain what is going on.
    ----------------------------------------------------------------
    Constant        32-bit representation                       Note
    ----------------------------------------------------------------
    -0x100000000    -- integer value too large to represent --  *1
    ----------------------------------------------------------------
    -0xffffffff     0x1                                         *2
    -0x80000001     0x7fffffff                                  *2
    ----------------------------------------------------------------
    -0x80000000     0x80000000                                  *3
    -0x7fffffff     0x80000001                                  *3
    -0x1            0xffffffff                                  *3
    ----------------------------------------------------------------
    -0x0            0x0                                         *4
    0x0             0x0                                         *4
    0x1             0x1                                         *4
    0x7fffffff      0x7fffffff                                  *4
    ----------------------------------------------------------------
    0x80000000      0x80000000                                  *5
    0xffffffff      0xffffffff                                  *5
    ----------------------------------------------------------------
    0x100000000     -- integer value too large to represent --  *6
    ----------------------------------------------------------------
The significant cases above are 2 (constants from -0xffffffff through -0x80000001) and 5 (constants from 0x80000000 through 0xffffffff). In both cases, the "integer" internal representation, if sign extended to "wide", yields an incorrect value. In both cases, the "wide" value is correct if the upper 32 bits are instead filled with the complement of the sign bit, leaving the low 32 bits unchanged. The following table gives examples for each case.
    ----------------------------------------------------------------
    Constant        Incorrect             Correct               Note
    ----------------------------------------------------------------
    -0xffffffff     0x0000000000000001    0xffffffff00000001    *2
    -0x80000001     0x000000007fffffff    0xffffffff7fffffff    *2
    -0x80000000                           0xffffffff80000000    *3
    -0x00000001                           0xffffffffffffffff    *3
    -0x00000000                           0x0000000000000000    *4
    0x00000000                            0x0000000000000000    *4
    0x00000001                            0x0000000000000001    *4
    0x7fffffff                            0x000000007fffffff    *4
    0x80000000      0xffffffff80000000    0x0000000080000000    *5
    0xffffffff      0xffffffffffffffff    0x00000000ffffffff    *5
    ----------------------------------------------------------------
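The two ways of widening can be written down directly on bit patterns. This is only an illustrative sketch: widen_sign_extend and widen_complement_extend are hypothetical names, not functions in the Tcl Core, and they work on unsigned values so that the results can be compared with the hexadecimal columns of the table.

```c
#include <stdint.h>

/* Ordinary widening: the upper 32 bits copy the sign bit (bit 31).
 * For cases 2 and 5 this produces the "Incorrect" column. */
uint64_t widen_sign_extend(uint32_t rep)
{
    return (rep & UINT32_C(0x80000000))
        ? (UINT64_C(0xffffffff00000000) | rep)
        : (uint64_t) rep;
}

/* Widening for overflowed constants (cases 2 and 5): the upper 32 bits
 * are filled with the complement of the sign bit, and the low 32 bits
 * are left unchanged. This produces the "Correct" column for those cases. */
uint64_t widen_complement_extend(uint32_t rep)
{
    return (rep & UINT32_C(0x80000000))
        ? (uint64_t) rep
        : (UINT64_C(0xffffffff00000000) | rep);
}
```

For example, the 32-bit representation 0x80000000 of the constant 0x80000000 widens to 0xffffffff80000000 with the first function and to 0x0000000080000000 with the second. Note that the second function is only appropriate for constants known to have overflowed; applied to an ordinary integer such as 0x1 it would give the wrong answer.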
So, how to fix this bug? KBK's initial thought is to introduce another object type in tclObj.c: tclOverflowedIntType. This object type will represent objects that were converted on input to 32-bit integers with overflow. It will behave identically to tclIntType in that Tcl_GetIntFromObj will return the 32-bit value. But the places in the Core where an integer representation is retrieved and then sign extended to wide will change to extend with the complement of the sign bit, as shown above for cases 2 and 5.
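A rough model of the proposal, to make the intended behaviour concrete. Everything here is an assumption for illustration: ModelObj, MODEL_INT, MODEL_OVERFLOWED_INT and the model_* functions are hypothetical and do not correspond to real Core code. The sketch only shows that a type tag recorded at parse time is enough to pick the right widening later, while the 32-bit view stays the same for both types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for tclIntType and the proposed overflowed type. */
typedef enum { MODEL_INT, MODEL_OVERFLOWED_INT } ModelType;

typedef struct {
    ModelType type;
    uint32_t  bits;   /* the 32-bit representation, as in the first table */
} ModelObj;

/* Parse a literal such as "2200000000" or "-0x80000001".
 * Returns 0 on success, -1 if the magnitude does not fit in 32 bits
 * (cases 1 and 6). Malformed input is not handled in this sketch. */
int model_set_from_string(ModelObj *obj, const char *text)
{
    assert(obj != NULL && text != NULL);
    int negative = (text[0] == '-');
    unsigned long long mag = strtoull(text + negative, NULL, 0);

    if (mag > 0xffffffffULL) {
        return -1;
    }
    /* Keep the low 32 bits of the (possibly negated) value. */
    obj->bits = negative ? (uint32_t) (0ULL - mag) : (uint32_t) mag;
    /* Overflowed means outside the signed 32-bit range (cases 2 and 5). */
    obj->type = (mag > (negative ? 0x80000000ULL : 0x7fffffffULL))
        ? MODEL_OVERFLOWED_INT
        : MODEL_INT;
    return 0;
}

/* The 32-bit view is identical for both types. */
int32_t model_get_int(const ModelObj *obj)
{
    return (obj->bits <= (uint32_t) INT32_MAX)
        ? (int32_t) obj->bits
        : (int32_t) ((int64_t) obj->bits - INT64_C(4294967296));
}

/* The wide view depends on the type: ordinary integers extend with the
 * sign bit, overflowed integers extend with its complement. */
int64_t model_get_wide(const ModelObj *obj)
{
    int sign_bit  = (int) ((obj->bits >> 31) & 1u);
    int fill_ones = (obj->type == MODEL_INT) ? sign_bit : !sign_bit;

    /* Upper 32 bits all ones is the same as subtracting 2^32. */
    return fill_ones
        ? (int64_t) obj->bits - INT64_C(4294967296)
        : (int64_t) obj->bits;
}
```

Under this model, the literal 2200000000 is tagged as overflowed; model_get_int still returns the wrapped 32-bit value that existing scripts see, while model_get_wide returns 2200000000.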
It is KBK's belief that this change should not break existing scripts, since they will see the same 32-bit behavior that they did before. It should also not break existing extensions, even those that reach into the internal representation; the worst that it will cause is to make them do needless calls to convert the type.
One remaining issue with this idea is the question of what Tcl_ConvertToType should do if requested to convert one of these overflowed integers to tclIntType. KBK believes that the most backward-compatible action is probably to have it silently convert to tclOverflowedIntType instead; any code that expects the internal representation afterward will see the correct data in objPtr->internalRep.longValue and will notice the difference only if it explicitly checks objPtr->typePtr. A riskier alternative is to return TCL_ERROR with a message indicating that the value is too large to represent. KBK believes that both alternatives are low-risk, because there are few if any callers of Tcl_ConvertToType; no Core caller ever requests an explicit integer conversion in this manner.
Jacob Levy: KBK's tcl-core message sounded more alarming than it appears here. For starters, where in the core did this bug manifest itself? And does this only affect TclInt or is it, as DKF hints, an issue with setting the type of the object incorrectly?
In any case, I'm hoping that any fix for this will not break existing extensions that use the core's internals such as Feather, Jacl, and e4Graph.
DKF: Rummaging around in Tcl's guts is not a great thing to do. Only the owner code of a type should look at the internal rep of objects of that type.
Jacob Levy: DKF, I'm not sure what the above means; please explain.
DGP: Here's a simple example of what DKF means. Say you've been handed a Tcl_Obj, and you want to store its integer value in a C variable of type int. The right way to do that is:
    Tcl_GetIntFromObj(interp, objPtr, &value);
The wrong way to do that is:
    Tcl_ConvertToType(interp, objPtr, Tcl_GetObjType("int"));
    value = (int) objPtr->internalRep.longValue;
The idea is that Tcl knows how it stores internal representations of integers. Let it pull out the value for you. Don't try to do Tcl's work for it (and in the process create future breakage if Tcl ever changes its mind).
Snipped the rest which was veering into a different topic.
Lars H: IMHO, these bugs demonstrate that 32-bit integer overflow is something that is alien to the spirit of Tcl. Integers in Tcl should be proper integers (mathematical Z) rather than some standard C datatype. Proper integers for Tcl 8.5 or 9.0 is a sketch for how that could be achieved (comments welcome).
HTL: Are the bug IDs in the Background section correct? The first bug was closed in 2004, and the second link leads to an error page.