Playing with obfuscation

Richard Suchenwirth 2006-08-26 - While many of us Tclers want our source code as open as can be, others come up every now and then asking how to make their code unreadable. So this weekend I experimented with the concept of obfuscation. My first idea was to apply a string map and wrap the mapped string together with its reverse mapping, so that evaluating the result brings the original input back:

proc obfusc {map str} {
    list string map [lrevert $map] [string map $map $str]
}
#-- Tcl 8.5 and later offer this list reversal built-in, as lreverse:
proc lrevert list {
    set res {}
    set i [llength $list]
    while {$i>0} {lappend res [lindex $list [incr i -1]]}
    set res
}

Now for the map. As this is only a sketch, I map the most frequent letters (etaoin shrdlu) and some digits, taking care to do this in x->y, y->x pairs to prevent artefacts. When the target cannot occur in the input string (as in the "high control" range \x80..\x9F), the pairing is not needed, so mapping Tcl's syntactically special characters is one-way only:

set map {
    e t t e a o o a i n n i s h h s r d d r l u u l
    0 - - 0 1 b b 1 2 c c 2 3 f f 3 4 g g 4
    \n \x80 \t \x81 \{ \x82 \} \x83 \[ \x84 \] \x85 $ \x86 " " \x87
}
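Why the pairing matters can be seen on a three-letter input. This is only a sketch: it assumes Tcl 8.5 or later, so that the built-in lreverse can stand in for lrevert, and the variable names are made up for illustration.

```tcl
# Assumes Tcl 8.5+ (built-in lreverse); all names here are illustrative.
set oneway {e t}        ;# e -> t only
set paired {e t t e}    ;# e -> t and t -> e
set input  tee

set a [string map $oneway $input]    ;# ttt: mapped e's and original t's collide
set b [string map $paired $input]    ;# ett

set back1 [string map [lreverse $oneway] $a]    ;# eee, the input is lost
set back2 [string map [lreverse $paired] $b]    ;# tee, the input again
puts "$back1 $back2"
```

With the one-way map, the reverse mapping cannot tell which t's were originally e's; with the pair, every character has exactly one way back.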

For testing (the results of which I cannot show here, due to unprintable characters), I did:

 set fusc [obfusc $map $input]     ;# returns gibberish
 string equal [eval $fusc] $input  ;# should always be 1

To run the code, just do

 eval [eval $fusc]
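Since the real map produces unprintable output, the round trip is easier to follow with a small printable map. A minimal sketch, assuming Tcl 8.5 or later (lreverse replaces lrevert) and a toy map and input invented for this illustration:

```tcl
# obfusc restated from above, using the built-in lreverse (Tcl 8.5+)
proc obfusc {map str} {
    list string map [lreverse $map] [string map $map $str]
}

set map   {e t t e a o o a}    ;# toy printable map, for illustration only
set input {puts "tea"}
set fusc  [obfusc $map $input]

puts [lindex $fusc end]                    ;# the mapped text: pues "eto"
puts [string equal [eval $fusc] $input]    ;# 1
eval [eval $fusc]                          ;# runs the original script, printing: tea
```

The first eval undoes the mapping and returns the original script as a string; the second eval runs it.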

But the disadvantage is that the map is part of the fusc string itself. One might hide it elsewhere, but I wasn't really after "hardened" obfuscation :^)

So I thought of another, more lightweight method. Every character corresponds to an unsigned integer, with conversions done by scan %c and format %c. This integer can of course be used in arithmetic. For example, scan " " %c yields 32; add 1, and format %c 33 gives the character "!". To revert that operation, just subtract the same offset. So one little proc is sufficient for both en- and decoding:

proc shiftn {n str} {
    set res ""
    foreach c [split $str ""] {
        append res [format %c [expr {[scan $c %c]+$n}]]
    }
    set res
}

Though simple, the outcome looks baffling:

 % shiftn 1 [info body shiftn]
 ␋!!!!tfu!sft!##␋!!!!gpsfbdi!d!\tqmju!%tus!##^!|␋!!!!!!!!bqqfoe!sft!\gpsnbu!&d!\fyqs!|\tdbo!%d!&d^,%o~^^␋!!!!~␋!!!!tfu!sft␋!

And applying shiftn -1 faithfully returns the original input. Such shifted strings can be deployed together with the shiftn function, which itself is still nicely readable. So I finally applied some manual obfuscation, with variable names 0, 1, O, l and bad indentation for this "user" version:

proc < 1\ 0\ {l\ ""} {foreach O [split $0 $l] {
append l [format %c [expr {[scan $O %c]+$1}]]};set l}

Testing looked good:

 < -1 [< 1 [info body shiftn]]
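The character arithmetic that both procs rely on can be traced on a short printable string. A minimal sketch, with shiftn restated so that the snippet runs on its own; the variable names are illustrative:

```tcl
# shiftn restated from above so the snippet is self-contained
proc shiftn {n str} {
    set res ""
    foreach c [split $str ""] {
        append res [format %c [expr {[scan $c %c]+$n}]]
    }
    set res
}

set code [scan " " %c]                 ;# 32, the code of the space character
set next [format %c [expr {$code+1}]]  ;# "!", which is code 33

set shifted [shiftn 1 "hello, world"]  ;# ifmmp-!xpsme
set restored [shiftn -1 $shifted]      ;# hello, world
puts "$code $next $shifted $restored"
```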

But finally, be aware that this is no encryption. It can just deter curious onlookers, but anyone who really wants to find out can easily do it - replacing the eval by puts is enough to reveal the secret:

 % puts [< -1 {qvut!#ifmmp-!xpsme"#}]
 puts "hello, world!"

rdt 2006.08.26 Nice, but shouldn't the shiftn be adding/subtracting modulo some number to wrap the upper characters around to the lower values?
RS Possibly.. but as in Tcl all characters are Unicodes, I didn't see this as a problem. There's a long way to \uFFFF :)
rdt I guess you are right (as always). I was thinking of the range of \u0020 ... \u007f.
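A minimal sketch of the wrap-around variant discussed above, assuming the printable ASCII range \u0020 ... \u007e (95 characters) rather than \u0020 ... \u007f, so that DEL is never produced. The name shiftwrap and the choice to pass every other character through unchanged are assumptions made for this illustration, not part of the original code:

```tcl
# Hypothetical variant of shiftn: rotate within printable ASCII (32..126)
# and leave all other characters (newlines, tabs, non-ASCII) untouched.
proc shiftwrap {n str} {
    set lo   32    ;# code of the first character in the range (space)
    set span 95    ;# number of code points in 32..126
    set res ""
    foreach c [split $str ""] {
        set code [scan $c %c]
        if {$code >= $lo && $code < $lo + $span} {
            # Tcl's % takes the sign of the divisor, so a negative n wraps too
            set code [expr {($code - $lo + $n) % $span + $lo}]
        }
        append res [format %c $code]
    }
    return $res
}

set wrapped  [shiftwrap 1 "~"]     ;# "~" is the last character, wraps to a space
set restored [shiftwrap -1 [shiftwrap 1 {puts "hello, world!"}]]
puts $restored
```

The price of staying printable is that newlines and tabs pass through unshifted, so the line structure of the source remains visible.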