32-bit integer overflow

Purpose: The Tcl maintainers are working on a bug in which a 32-bit integer that overflows and is then converted to wide can leave a literal with an inappropriate internal representation. KBK has been asked to start a Wiki page to track the discussion, which has grown too complex to follow in the Chat and email; unlike a SourceForge bug report, a Wiki page supports collaborative editing.

Background: Tcl 8.4 has a serious bug [L1 ] in its processing of integer literals that exceed the range of a 32-bit signed word. For backward compatibility with earlier versions of Tcl, integer literals that fit into a 32-bit unsigned word are treated as 32-bit constants. These constants acquire a 32-bit internal representation that can later be sign-extended to a wide integer, and the resulting wide integer has an incorrect value.

The bug is truly insidious because it pollutes shared literals, so unrelated code that merely uses the same literal can fail. The following session illustrates the sort of bizarre results that literal pollution can cause.

 % proc b {} { set x 2200000000 ; puts [expr { wide($x) + 1 }] }
 % b
 2200000001
 % proc a {} { clock format 2200000000 }
 % a; b
 -2094967295

Certain changes to [string is integer] in 8.4.3 and later releases have made the following script also fail similarly, although it works from 8.0 to 8.4.2:

 % proc b {} { set x 2200000000 ; puts [expr { wide($x) + 1 }] }
 % b
 2200000001
 % proc a {} { string is integer 2200000000 }
 % a; b
 -2094967295

What's happening: The following table gives examples of each type of 32-bit integer conversion that's possible, and the notes below explain what is going on.

 -----------------------------------------------------------------------------
 Constant       Tcl 7.6         Tcl 8.x                                   Note
                and earlier
 -----------------------------------------------------------------------------
 -0x100000000   0x00000000      -- integer value too large to represent -- *1
 -----------------------------------------------------------------------------
 -0xffffffff    0x1             0x1                                        *2
 -0x80000001    0x7fffffff      0x7fffffff                                 *2
 -----------------------------------------------------------------------------
 -0x80000000    0x80000000      0x80000000                                 *3
 -0x7fffffff    0x80000001      0x80000001                                 *3
 -0x1           0xffffffff      0xffffffff                                 *3
 -----------------------------------------------------------------------------
 -0x0           0x0             0x0                                        *4
  0x0           0x0             0x0                                        *4
  0x1           0x1             0x1                                        *4
  0x7fffffff    0x7fffffff      0x7fffffff                                 *4
 -----------------------------------------------------------------------------
  0x80000000    0x80000000      0x80000000                                 *5
  0xffffffff    0xffffffff      0xffffffff                                 *5
 -----------------------------------------------------------------------------
  0x100000000   0x0             -- integer value too large to represent -- *6
 -----------------------------------------------------------------------------

1 - Numbers less than -2**32+1 cannot be represented as 32-bit integers. Tcl 7.6 and earlier versions did not detect this, and simply took the least significant 32 bits of any integer presented, however large. Tcl 8.x rejects these numbers in any context requiring a 32-bit integer. In 8.4 and beyond, these numbers can have a "wide" internal representation as a 64-bit number.
2 - Numbers between -2**32+1 and -2**31-1 cannot be represented as 32-bit integers, but we suspect that they appear occasionally in scripts that intend to treat them as 32-bit constants (-0xffffffff is an example). They are handled by converting the absolute value as an unsigned integer, and then two's-complementing the result. This is a case where the "wide" and "integer" internal representations disagree.
3 - Numbers between -2**31 and -1 are negative signed integers that fit conveniently in a 32-bit signed word. They do not cause trouble with conversion between "integer" and "wide."
4 - Numbers between 0 and 2**31-1 are positive signed integers that fit conveniently in a 32-bit signed word. They do not cause trouble with conversion between "integer" and "wide."
5 - Numbers between 2**31 and 2**32-1 are positive integers that require a 32-bit unsigned word to represent. Tcl 8.x converts them to an "integer" internal representation. This is a case where the "wide" and "integer" internal representations disagree.
6 - Numbers greater than or equal to 2**32 are positive integers that do not fit in a 32-bit word. Tcl 7.6 and earlier versions convert them without complaint, truncating to the least significant 32 bits. Tcl 8.x rejects them in a context where 32-bit integers are required, but will convert them to "wide."
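
The six cases can be modelled in a few lines of C. This is only a sketch of the behaviour described in the notes, not the Core's actual parsing code: the names parse32_tcl8_model and note_for are hypothetical, and the literal is assumed to arrive already split into a sign flag and a magnitude.

```c
#include <stdint.h>

/* Hypothetical model of the Tcl 8.x column of the table above.
 * Returns 0 and stores the resulting 32-bit pattern in *bits, or
 * returns -1 when the magnitude does not fit in 32 bits (notes 1, 6). */
int parse32_tcl8_model(int negative, uint64_t magnitude, uint32_t *bits)
{
    if (magnitude > 0xffffffffu) {
        return -1;              /* "integer value too large to represent" */
    }
    uint32_t m = (uint32_t) magnitude;
    /* Negative literals: convert the absolute value as unsigned, then
     * take the two's complement (unsigned arithmetic wraps modulo 2**32). */
    *bits = negative ? (uint32_t) (0u - m) : m;
    return 0;
}

/* Hypothetical helper: which numbered note applies to a literal. */
int note_for(int negative, uint64_t magnitude)
{
    if (magnitude > 0xffffffffu) return negative ? 1 : 6;
    if (negative && magnitude > 0x80000000u) return 2;
    if (negative && magnitude >= 1) return 3;
    if (magnitude <= 0x7fffffffu) return 4;   /* includes -0x0 */
    return 5;
}
```

Under this model, -0xffffffff yields the pattern 0x1 and falls under note 2, while 0x80000000 keeps its pattern and falls under note 5; those are the two cases where the 32-bit pattern no longer identifies the original number on its own.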

The significant cases above are 2 (numbers between -0xffffffff and -0x80000001) and 5 (numbers between 0x80000000 and 0xffffffff). In both these cases, the "integer" internal representation, if sign extended to "wide", will result in an incorrect value. In both cases, the "wide" value will be correct if the upper 32 bits are filled with the complement of the sign bit instead of with the sign bit itself. The following table gives examples for each case.

 Constant       Incorrect               Correct                 Note
 -------------------------------------------------------------------
 -0xffffffff    0x0000000000000001      0xffffffff00000001       *2
 -0x80000001    0x000000007fffffff      0xffffffff7fffffff       *2
 -0x80000000                            0xffffffff80000000       *3
 -0x00000001                            0xffffffffffffffff       *3
 -0x00000000                            0x0000000000000000       *4
  0x00000000                            0x0000000000000000       *4
  0x00000001                            0x0000000000000001       *4
  0x7fffffff                            0x000000007fffffff       *4
  0x80000000    0xffffffff80000000      0x0000000080000000       *5
  0xffffffff    0xffffffffffffffff      0x00000000ffffffff       *5
 -------------------------------------------------------------------
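
The two columns of the table can be reproduced with a small C sketch. The function names widen_sign_extend and widen_overflowed are hypothetical and do not appear in the Tcl sources; the arithmetic is done in 64 bits throughout so that the result does not depend on how the compiler converts out-of-range values.

```c
#include <stdint.h>

/* Ordinary sign extension: the upper 32 bits copy bit 31.
 * This produces the "Incorrect" column for cases 2 and 5. */
int64_t widen_sign_extend(uint32_t bits)
{
    int64_t v = (int64_t) bits;
    if (bits & 0x80000000u) {
        v -= 4294967296LL;      /* 2**32: set the upper 32 bits */
    }
    return v;
}

/* Extension with the complement of bit 31: the upper 32 bits are
 * the opposite of the sign bit. This produces the "Correct" column
 * for cases 2 and 5, and must be used only for those cases. */
int64_t widen_overflowed(uint32_t bits)
{
    int64_t v = (int64_t) bits;
    if (!(bits & 0x80000000u)) {
        v -= 4294967296LL;      /* 2**32: set the upper 32 bits */
    }
    return v;
}
```

With the literal from the opening example, widen_sign_extend(2200000000u) + 1 gives -2094967295, the polluted result, whereas widen_overflowed(2200000000u) + 1 gives 2200000001.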

So, how to fix this bug? KBK's initial thought is to introduce another object type in tclObj.c: tclOverflowedIntType. This object type will represent objects that were converted on input to 32-bit integers with overflow. It will behave identically to tclIntType in that Tcl_GetIntFromObj will return the 32-bit value. But the places in the Core where an integer representation is retrieved and then sign extended to wide will change to sign extend with the complement of the sign bit, as shown above in 2 and 5.
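
A rough sketch of the idea, using a mock object rather than the real Tcl_Obj: MockObj, MockTypeTag, mock_get_int and mock_get_wide are all hypothetical stand-ins, and the mock stores a fixed 32-bit value for clarity. It only illustrates that the 32-bit accessor is unchanged while the widening path consults the type.

```c
#include <stdint.h>

/* Hypothetical stand-ins for tclIntType and the proposed
 * tclOverflowedIntType; not the real Tcl structures. */
typedef enum { MOCK_INT_TYPE, MOCK_OVERFLOWED_INT_TYPE } MockTypeTag;

typedef struct {
    MockTypeTag type;
    int32_t longValue;          /* 32-bit internal representation */
} MockObj;

/* Callers that want 32 bits see the same value for either type. */
int32_t mock_get_int(const MockObj *obj)
{
    return obj->longValue;
}

/* Callers that widen must check the type first. */
int64_t mock_get_wide(const MockObj *obj)
{
    int64_t v = obj->longValue;         /* ordinary sign extension */
    if (obj->type == MOCK_OVERFLOWED_INT_TYPE) {
        /* Flip the upper 32 bits so they hold the complement of
         * the sign bit: add 2**32 if negative, else subtract it. */
        v += (obj->longValue < 0) ? 4294967296LL : -4294967296LL;
    }
    return v;
}
```

For an object holding -2094967296 (the 32-bit pattern of 2200000000), mock_get_int returns -2094967296 for both types, while mock_get_wide returns 2200000000 only when the object is tagged as overflowed.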

It is KBK's belief that this change should not break existing scripts, since they will see the same 32-bit behavior that they did before. It should also not break existing extensions, even those that reach into the internal representation; the worst that it will cause is to make them do needless calls to convert the type.

One remaining issue with this idea is the question of what Tcl_ConvertToType should do if requested to convert one of these overflowed integers to tclIntType. KBK is of the belief that the most backward-compatible action is probably to have it silently convert to tclOverflowedIntType instead; any code that is expecting the internal representation afterward will see the correct data in objPtr->internalRep.longValue and will only notice the difference if it explicitly checks objPtr->typePtr. A somewhat riskier alternative is to return TCL_ERROR with a message indicating that the value is too large to represent. KBK believes that both alternatives are low-risk, because there are few if any callers of Tcl_ConvertToType - no Core caller ever requests an explicit integer conversion in this manner.
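
The two alternatives can be contrasted with another mock. Everything here is hypothetical (DemoObj, DemoPolicy, demo_convert_to_int and the DEMO_ constants); it is not the signature or behaviour of the real Tcl_ConvertToType, only an illustration of what a caller would observe under each choice.

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the real Tcl types or status codes. */
typedef enum { DEMO_INT_TYPE, DEMO_OVERFLOWED_INT_TYPE } DemoTypeTag;
typedef enum { DEMO_POLICY_SILENT, DEMO_POLICY_ERROR } DemoPolicy;
enum { DEMO_OK = 0, DEMO_ERROR = 1 };

typedef struct {
    DemoTypeTag type;
    int32_t longValue;
} DemoObj;

/* Models a request to convert obj to the plain integer type. */
int demo_convert_to_int(DemoObj *obj, DemoPolicy policy)
{
    if (obj->type != DEMO_OVERFLOWED_INT_TYPE) {
        return DEMO_OK;         /* already a plain integer */
    }
    if (policy == DEMO_POLICY_ERROR) {
        return DEMO_ERROR;      /* "value too large to represent" */
    }
    /* Silent alternative: report success but keep the overflowed
     * type; longValue is untouched, so a caller reading it sees
     * the same 32-bit data as before. */
    return DEMO_OK;
}
```

Under the silent policy a caller sees a difference only if it inspects the type tag afterwards; under the error policy it must be prepared for a failure it never had to handle before.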