Version 32 of Tcl and octal numbers

Updated 2012-12-23 12:59:56 by amw

Summary

expr interprets strings beginning with "0" as octal numbers if possible.

AMG: Not in Tcl 9 [L1 ]! Tcl 9 octal numbers must begin with "0o". Tcl 8.6 supports (but does not require) this notation.
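Code that has to run on both Tcl 8.5/8.6 and Tcl 9 can avoid the question entirely by using explicit radix prefixes. This is a minimal sketch, assuming Tcl 8.5 or later, where the 0o and 0b prefixes are available. The results don't depend on how a bare leading zero is treated:

```tcl
# Explicit radix prefixes give the same values in Tcl 8.5, 8.6 and 9.
set a [expr {0o17}]    ;# octal 17
set b [expr {0x1F}]    ;# hexadecimal 1F
set c [expr {0b101}]   ;# binary 101
puts "$a $b $c"        ;# prints: 15 31 5
```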

Description

This can be a problem because leading zeros are often meant to "pad" a number to a fixed number of digits, not to mark it as octal. Here's an example of the gotcha:

set HH 08
set MM 45
set SS 00
# later, add one hour:
set newtime [expr {$HH + 1}]
# Tcl 8.5/8.6 reports:
can't use invalid octal number "08" as operand of "+"

08 is not a valid octal number, and in Tcl 8.x expr won't simply interpret it as a decimal "8" that's left-padded with a "0".

[scan], on the other hand, can be told to interpret numbers with leading zeros as decimal numbers, so a naïve fix would be:

scan $HH %d HH

which strips the hazardous leading zeros. This is also safer than [string trimleft $HH 0], which fails if $HH ever ends up as "00" (the result is an empty string) or if $HH is negative, because the leading "-" stops the trim and "-08" keeps its zero (not likely in a clock context, but the argument still applies).

DKF
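A small side-by-side comparison of the two approaches (a sketch; the loop values are only test inputs). [scan] with %d always reads decimal digits. [string trimleft] only removes characters, so it turns "00" into an empty string and leaves "-08" untouched:

```tcl
foreach HH {08 00 -08} {
    # %d parses the digits as decimal, whatever the leading zeros
    set viaScan [scan $HH %d]
    # trimleft just deletes leading "0" characters, nothing more
    set viaTrim [string trimleft $HH 0]
    puts "$HH -> scan: $viaScan, trimleft: '$viaTrim'"
}
# prints:
# 08 -> scan: 8, trimleft: '8'
# 00 -> scan: 0, trimleft: ''
# -08 -> scan: -8, trimleft: '-08'
```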

Lars H, 2008-07-04: One problem with the above, which turned up when doing arithmetic on currency, is that %d doesn't handle arbitrarily large integers (even though Tcl can): it's restricted to the range of numbers int() can output. In order to handle integers with an arbitrary number of digits, it is necessary to do

scan $HH %lld HH

Obviously this is not an issue with a number of hours in a day, but it can be an issue for other numbers.
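A short sketch of the %lld form, assuming Tcl 8.5 or later (the doubled ll modifier and arbitrary-precision integers are not available in older versions). The variable name amount is only illustrative:

```tcl
# A zero-padded value with more digits than a 64-bit integer can hold.
set amount 000123456789012345678901234567890

# %lld reads the digits as a decimal integer of any size.
scan $amount %lld amount
puts $amount                 ;# prints: 123456789012345678901234567890

# The result is an ordinary Tcl integer, so [expr] can use it directly.
puts [expr {$amount + 1}]    ;# prints: 123456789012345678901234567891
```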


glennj: one potential pitfall of [scan] is that it can mask errors. The following example fails as expected:

set n 09blah42
incr n
expected integer but got "09blah42"
while evaluating {incr n}

However:

set n 09blah42
scan $n %d n
incr n ;# ==> n is now 10

Application writers might actually want to trap an invalid entry like that.


AMW 2012-12-23: I use the following regexp in my code to prevent accidental interpretation as octal:

# assure that $num is not interpreted as octal:
regexp {^0*(\d+)} $num _dummy num
# 0 -> 0
# 0000 -> 0
# 0123 -> 123
# notanumber -> notanumber
# 0123dollar -> 123

If you want to avoid the interpretation of the last example as a number, add a final $ to the regexp:

regexp {^0*(\d+)$} $num _dummy num
# 0 -> 0
# 0000 -> 0
# 0123 -> 123
# notanumber -> notanumber
# 0123dollar -> 0123dollar (no match, so num is left unchanged)
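The anchored pattern can be wrapped in a helper so it is applied the same way everywhere. This is only a sketch: the proc name stripLeadingZeros is hypothetical. A sign or surrounding whitespace makes the pattern fail to match, and the input then comes back unchanged for the caller to reject:

```tcl
# Hypothetical helper: strip leading zeros from an all-digit string so that
# Tcl 8.x [expr] cannot mistake it for octal. Anything that is not purely
# digits (a sign, spaces, trailing text) is returned unchanged.
proc stripLeadingZeros {num} {
    regexp {^0*(\d+)$} $num -> num
    return $num
}

foreach s {0 0000 0123 notanumber 0123dollar} {
    puts "$s -> [stripLeadingZeros $s]"
}
```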

[2003-03-12] I see Kevin Kenny contributed the following to c.l.t (comp.lang.tcl):

proc forceInteger { x } {
    set count [scan $x %d%s n rest]
    if { $count <= 0 || ( $count == 2 && ![string is space $rest] ) } {
        return -code error "not an integer: \"$x\""
    }
    return $n
}
% forceInteger x
not an integer: "x"
% forceInteger 123
123
% forceInteger 08
8

This also covers my preceding concern:

% forceInteger 09blah42
not an integer: "09blah42"

Better than an explicit test of

string is space $rest

is to just skip (optional) spaces in the scan pattern:

proc forceInteger { x } {
    set count [scan $x {%d %c} n c]
    if { $count != 1 } {
        return -code error "not an integer: \"$x\""
    }
    return $n
}

Donald Arseneau
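The scan-count check can be seen at work with a small driver for Donald Arseneau's version. The proc is repeated so the snippet runs on its own. The list of inputs is only a hypothetical test set, and [catch] turns the error into a printable message:

```tcl
# Donald Arseneau's variant from above, repeated so this snippet is standalone.
proc forceInteger { x } {
    # %d reads a decimal integer; " " skips optional whitespace; %c succeeds
    # only if some other character follows, which makes the count 2.
    set count [scan $x {%d %c} n c]
    if { $count != 1 } {
        return -code error "not an integer: \"$x\""
    }
    return $n
}

foreach input {123 08 " 42 " 09blah42 x ""} {
    if {[catch {forceInteger $input} result]} {
        puts "\"$input\" rejected: $result"
    } else {
        puts "\"$input\" accepted: $result"
    }
}
```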



[Refer to http://phaseit.net/claird/comp.lang.tcl/fmm.html#zero ]

[Explain improved diagnostic in 8.3.]


The question recently came up: how do I display the octal value of a character in Tcl? According to RS, the complete sequence is

format 0%o [scan a %c]

(but only with Tcl more recent than 8.2 or so; older scan works slightly differently).
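A sketch of the round trip, assuming Tcl 8.5 or later. Because [scan ... %o] always reads octal digits, the reverse conversion does not depend on how the interpreter treats a bare leading 0. Under Tcl 9 you might prefer format 0o%o, so the text stays octal if it is later fed to [expr]:

```tcl
set code [scan a %c]          ;# character code of "a"
set oct  [format 0%o $code]   ;# same value written in octal, with a leading 0
puts "$code $oct"             ;# prints: 97 0141

# %o in [scan] reads octal explicitly, in every Tcl version.
set back [scan $oct %o]
puts [format %c $back]        ;# prints: a
```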

Non-arithmetic Use of expr

RS 2006-06-19 A word of warning: expr may normalize strings that look like octals to decimal, even if no arithmetic operation was performed on them:

% set bond james
% expr {$bond eq ""? "-": "$bond"}
james
% set bond 0070
% expr {$bond eq ""? "-": "$bond"}
56

But no complaints if the string cannot be parsed as octal:

% set bond 008
% expr {$bond eq ""? "-": "$bond"}
008

In such cases it's better and more robust to use if:

if {$bond eq ""} {set bond -}
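Here is the if-based approach wrapped in a helper. The name displayOrDash is hypothetical, not from the original. Only the condition goes through expression evaluation. The value itself is returned as the original string, so nothing gets normalized:

```tcl
# Hypothetical helper: show "-" for an empty value, otherwise the value
# exactly as stored. The value never passes through [expr] as a result.
proc displayOrDash {value} {
    if {$value eq ""} {
        return "-"
    }
    return $value
}

puts [displayOrDash ""]      ;# prints: -
puts [displayOrDash 0070]    ;# prints: 0070
puts [displayOrDash james]   ;# prints: james
```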

string is ... Does Not Understand Octal

string is integer 098 ;# 0

IDG: There appears to be an octal-related bug in string is: string is double 098 returns 1 (8.4.1 on windoze).

PT: At the very least it's inconsistent (tcl 8.5a0 win98)

% string is integer 098
0
% string is double 098
1

TIP #114

TIP 114 proposes modifying Tcl in a future release so that numbers beginning with 0 will not be interpreted by default as being expressed in octal. The proposer believes that far more users stumble upon this feature by accident than use it intentionally.

Misc


HE 2006-06-20 (replying to RS's warning under Non-arithmetic Use of expr): Strange behavior! The manpage of expr says:

 eq ne
  Boolean string equal and string not equal.
  Each operator produces a zero/one result.
  The operand types are interpreted only as strings.

There is no mention of this behavior. (I vaguely remember that these two operators were added precisely to avoid this problem.) More interesting: the manpage of if says:

 The if command evaluates expr1 as an expression (in the same way that expr evaluates its argument).

Is there something wrong?

JMN 2006-10-19 Not really. The normalization isn't happening in the eq operation; it happens when expr returns the result. This may make it clearer:

 % expr {$bond eq ""? "-": "hello $bond"}
 hello 0070
 % expr {$bond}
 56
 % expr {$bond eq 56}
 0
 % expr {$bond == 56}
 1

LV Note the previous discussions on this page regarding the precautions one should keep in mind when using eq on tcl variables which contain numeric values. I am not certain I can think of a case where one would use eq when comparing numeric values...
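The difference between the two operators is easiest to see without octal at all. A minimal sketch (the operands are just examples) showing that == compares numeric-looking strings by value while eq compares them as written. This behaves the same way in Tcl 8.x and Tcl 9:

```tcl
puts [expr {"1.0" == "1"}]     ;# prints: 1  (numeric comparison)
puts [expr {"1.0" eq "1"}]     ;# prints: 0  (string comparison)
puts [expr {"0x10" == 16}]     ;# prints: 1  (0x10 is read as sixteen)
puts [expr {"0x10" eq "16"}]   ;# prints: 0  (different characters)
```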

2003-12-22 VI What I'd like more than what TIP 114 specifies is a prefix like 0d, which would force the rest of the number to be interpreted as decimal.

For the specific clock case, where we know we have two digits, I like to use expr like this:

set m [clock format [clock seconds] -format %m]
set m [expr 1$m % 100]
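VI's trick can be wrapped in a helper. The name twoDigitToInt is hypothetical, not from the original. It assumes the input is exactly two decimal digits, as produced by clock format codes such as %m, %d or %H. Quoting "1$mm" inside a braced [expr] keeps the expression brace-safe while still gluing the 1 onto the digits:

```tcl
# Hypothetical helper for the "prepend 1, then take % 100" trick.
proc twoDigitToInt {mm} {
    # "1$mm" is a three-digit number with no leading zero (108 for "08"),
    # so it cannot be read as octal; % 100 removes the added 1 again.
    return [expr {"1$mm" % 100}]
}

puts [twoDigitToInt 08]   ;# prints: 8
puts [twoDigitToInt 12]   ;# prints: 12
set m [clock format [clock seconds] -format %m]
puts [twoDigitToInt $m]
```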

LV Right now, when I say:

set abc "\1"

abc is set to an octal 001.

Once this TIP is implemented, what will happen to the above code? Is it going to change behavior? - RS: abc is set to a string of one character U+0001 (ASCII SOH) - unaware of decimal, hex, or octal. This page is about parsing integers from strings, and U+0001 or any non-digit cannot be parsed as integer anyway :^)
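RS's point can be shown in code. A backslash escape such as \1 is resolved by the Tcl parser into a character, never into a number, so the integer-parsing rules discussed on this page don't touch it. This is a minimal sketch; the \101 line is an extra illustration, not from the original:

```tcl
set abc "\1"
puts [string length $abc]    ;# prints: 1 (a single character)
puts [scan $abc %c]          ;# prints: 1 (its code point, U+0001)
puts "\101"                  ;# prints: A (octal 101 is 65)
```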


Chris Nelson points out that octality afflicts not only Tcl [L2 ].


snichols 02/26/07 I tested octal arithmetic in both Ruby's irb interpreter and Python's interpreter and they behave in a similar way. So, this seems to be a common pitfall with other scripting languages too:

Ruby's IRB

 irb(main):001:0> 09 + 100
 SyntaxError: compile error
 (irb):1: Illegal octal digit
 09 + 100
  ^

Python

 >>> 09 + 100
  File "<stdin>", line 1
    09 + 100
     ^
 SyntaxError: invalid token