base64

'''[http://en.wikipedia.org/wiki/Base64%|%base64]''' is a method for encoding binary data in an [ASCII] format.



** Tcl Tools **

   [binary]:   provides commands to encode and decode base64 (since [Tcl 8.6])

   [Tcllib] [Tcllib base64%|%base64]:   a pure-Tcl implementation for encoding and decoding base64

   [base 64 encoding (Garrigues)] (defunct):   

   [Trf]:   a C-coded extension with many encoding capabilities, including base64 encoding and decoding that can replace the tcllib versions with far faster ones.  Jeff McWhirter, who's working on Tcl-coded Web-mail, has coded a small [C] object [http://www.infocetera.com/dir/tcl/files_get/base64.c] (lost) just to do base64.

   [Creating image photo data from GIF files]:   an image creator that uses 'package require base64' to generate an 'image create photo some_name.gif -data {...}' statement. The statement may be long, but the image can then be used in your Tcl code under whatever name you give it (or just drop the .gif off the end of the name in that image statement).
   [base64url]:   implementation of the URL-safe variant of Base64.

** Documentation **

    [RFC] [http://www.rfc-editor.org/info/rfc4648%|%4648]:   canonical reference on base64 encoding

    [http://merrigrove.blogspot.com/2014/04/what-heck-is-base64-encoding-really.html%|%What the Heck is Base64 Encoding really?], Daniel Eklund, 2014-04:   



** Description **

Base64 is a method of encoding arbitrary binary data so that the result contains only printable [ASCII] characters and can therefore pass unchanged even through old applications that are not 8-bit clean.

Base64 is related to [uuencode%|%uuencoding] in that it uses the same mechanism of distributing the bits of 3 bytes across 4 bytes, 6 bits each, but it uses a different table to map the resulting values to printable characters.  Because every 3 input bytes become 4 output characters, base64-encoded data takes one-third more space than the original data, plus any padding and line breaks.
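
The 3-bytes-to-4-characters mechanism can be traced by hand. The following is only an illustrative sketch, assuming Tcl 8.6 or later (for `scan %b` and `binary encode`); the variable names are made up for this example.

======
# Assumes Tcl 8.6+ (scan %b and binary encode are used).
set alphabet ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

# Three input bytes give 24 bits ...
binary scan Man B* bits

# ... which are cut into four 6-bit groups, each an index into the alphabet.
set encoded {}
foreach group [regexp -all -inline {[01]{6}} $bits] {
    scan $group %b index
    append encoded [string index $alphabet $index]
}
puts "$bits -> $encoded"   ;# 010011010110000101101110 -> TWFu

# Size check: 300 bytes in, 400 characters out (no line wrapping by default).
puts [string length [binary encode base64 [string repeat \x00 300]]]   ;# 400
======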

[Base64] is used in [MIME] and by the [Tk] [image] commands.

[BAS]: There's also a derivative of base64 commonly referred to as base64url, which essentially just makes the base64-encoded string viable for inclusion as a query parameter in a URL. This is used in [http://tools.ietf.org/html/draft-jones-json-web-token-01%|%JWT]. From what I understand, it replaces the + encoded char with - and the / with _, and drops the = padding. So, I believe it would be something like (using the Tcl 8.6 binary command):

======
set base64url [string map {+ - / _ = {}} [binary encode base64 $string]] ;# untested
======

And of course, decoding would just be the reverse process.

[ray2501]: Thanks to [BAS] for the information. Here is my version of base64url encode and decode.

======
proc base64url_encode {string} {
    tailcall string map {+ - / _ = {}} [binary encode base64 $string]
}

proc base64url_decode {string} {
    tailcall binary decode base64 [string map {- + _ /} $string]
}
======
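
Not every base64 decoder accepts input whose `=` padding has been stripped, so it may be safer to restore the padding before decoding. The following is a sketch under that assumption, for Tcl 8.6 or later; the proc names `b64url_encode` and `b64url_decode` are hypothetical and chosen only for this example.

======
# Assumes Tcl 8.6+ (binary encode/decode). Proc names are illustrative only.
proc b64url_encode {data} {
    # Swap the two URL-unsafe characters and drop the padding.
    string map {+ - / _ = {}} [binary encode base64 $data]
}

proc b64url_decode {text} {
    set text [string map {- + _ /} $text]
    # Restore the = padding so that the length is a multiple of 4 again.
    # (A length of 4n+1 cannot come from a valid encoding; it is not checked here.)
    set pad [expr {(4 - [string length $text] % 4) % 4}]
    append text [string repeat = $pad]
    binary decode base64 $text
}

puts [b64url_encode [binary format H* fbff]]   ;# -_8
puts [b64url_decode [b64url_encode ab]]        ;# ab
======

Both procs work on bytes. Text containing characters outside the 8-bit range would first need converting, for example with `encoding convertto utf-8`.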

** base64 by [RS] **

2006-07-23: [RS] could not resist doing base64 encoding as a little weekend fun project, like this:

======
proc b64en str {
    binary scan $str B* bits
    switch [expr {[string length $bits]%6}] {
        0 {set tail {}}
        2 {append bits 0000; set tail ==}
        4 {append bits 00; set tail =}
    }
    return [string map {
        000000 A 000001 B 000010 C 000011 D 000100 E 000101 F
        000110 G 000111 H 001000 I 001001 J 001010 K 001011 L
        001100 M 001101 N 001110 O 001111 P 010000 Q 010001 R
        010010 S 010011 T 010100 U 010101 V 010110 W 010111 X
        011000 Y 011001 Z 011010 a 011011 b 011100 c 011101 d
        011110 e 011111 f 100000 g 100001 h 100010 i 100011 j
        100100 k 100101 l 100110 m 100111 n 101000 o 101001 p
        101010 q 101011 r 101100 s 101101 t 101110 u 101111 v
        110000 w 110001 x 110010 y 110011 z 110100 0 110101 1
        110110 2 110111 3 111000 4 111001 5 111010 6 111011 7
        111100 8 111101 9 111110 + 111111 /
    } $bits]$tail
}
======

On short test strings, it delivers the same results as "the real thing". What's missing is insertion of whitespace for longer strings. In any case, I think this implementation is pretty educational...
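
The missing whitespace insertion could be added as a separate step. The helper below is only a sketch: `wrap64` is a hypothetical name, and the default width of 76 is the line length commonly used for [MIME], taken here as an assumption.

======
# Hypothetical helper: break a long encoded string into lines of at most
# $width characters, joined by newlines.
proc wrap64 {text {width 76}} {
    set lines {}
    for {set i 0} {$i < [string length $text]} {incr i $width} {
        lappend lines [string range $text $i [expr {$i + $width - 1}]]
    }
    join $lines \n
}

puts [wrap64 ABCDEFGH 3]
======

It would be used as `wrap64 [b64en $data]`. The built-in `binary encode base64` of Tcl 8.6 offers a `-maxlen` option for the same purpose.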



** base64 by [Jannis] **

[Jannis] 2009-12-08: Wrote a base64 decoder in the same manner as above. No warranties, works for me and I learned a lot. Doesn't honor "wrap characters", just as the encoder above. Many thanks to [RS] for the encoder!

======
proc b64de {str} {
    set tail [expr {[string length $str] - [string length [string trimright $str =]]}]
    set str [string trimright $str =]
    set bits [string map {
        A 000000 B 000001 C 000010 D 000011 E 000100 F 000101
        G 000110 H 000111 I 001000 J 001001 K 001010 L 001011
        M 001100 N 001101 O 001110 P 001111 Q 010000 R 010001
        S 010010 T 010011 U 010100 V 010101 W 010110 X 010111
        Y 011000 Z 011001 a 011010 b 011011 c 011100 d 011101
        e 011110 f 011111 g 100000 h 100001 i 100010 j 100011
        k 100100 l 100101 m 100110 n 100111 o 101000 p 101001
        q 101010 r 101011 s 101100 t 101101 u 101110 v 101111
        w 110000 x 110001 y 110010 z 110011 0 110100 1 110101
        2 110110 3 110111 4 111000 5 111001 6 111010 7 111011
        8 111100 9 111101 + 111110 / 111111
    } $str]
    set bytes [binary format B* $bits]
    # Caution: with two = signs this removes one byte too many; see the corrected version below.
    return [string range $bytes 0 end-$tail]
}
======

----

[DominicErnst] 2011-12-06: I just tried [Jannis]' small base64 decoder and it's not working properly! The problem is that the decoder does not remove the 2- or 4-bit tail at the end; in some cases, this deletes the last character of the decoded string. I fixed this issue with my own version of b64de as follows:

======
proc b64de {_str} {
    set nstr [string trimright $_str =]
    set dstr [string map {
        A 000000 B 000001 C 000010 D 000011 E 000100 F 000101
        G 000110 H 000111 I 001000 J 001001 K 001010 L 001011
        M 001100 N 001101 O 001110 P 001111 Q 010000 R 010001
        S 010010 T 010011 U 010100 V 010101 W 010110 X 010111
        Y 011000 Z 011001 a 011010 b 011011 c 011100 d 011101
        e 011110 f 011111 g 100000 h 100001 i 100010 j 100011
        k 100100 l 100101 m 100110 n 100111 o 101000 p 101001
        q 101010 r 101011 s 101100 t 101101 u 101110 v 101111
        w 110000 x 110001 y 110010 z 110011 0 110100 1 110101
        2 110110 3 110111 4 111000 5 111001 6 111010 7 111011
        8 111100 9 111101 + 111110 / 111111
    } $nstr]
    switch [expr {[string length $_str] - [string length $nstr]}] {
        0 {#nothing to do}
        1 {set dstr [string range $dstr 0 {end-2}]}
        2 {set dstr [string range $dstr 0 {end-4}]}
    }
    return [binary format B* $dstr]
}
======
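
Since the problem only shows up for some input lengths, a decoder is best checked with inputs that produce zero, one, and two padding characters. The following is a sketch of such a check, assuming Tcl 8.6 or later so that the built-in `binary encode base64` can serve as the reference; `check_decoder` is a hypothetical name.

======
# Hypothetical test helper: run a decoder (given as a command prefix) over
# inputs of different lengths and return the plain texts it got wrong.
proc check_decoder {decoder} {
    set failures {}
    foreach plain {{} a ab abc abcd abcde abcdef} {
        set encoded [binary encode base64 $plain]
        if {[{*}$decoder $encoded] ne $plain} {
            lappend failures $plain
        }
    }
    return $failures
}

# The built-in decoder should report no failures.
puts [llength [check_decoder {binary decode base64}]]
======

With one of the b64de procs from this page defined, `check_decoder b64de` would list the inputs it fails on.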



** See Also **

   [base58]:   Like base64, but without these characters: `+/0OIl`

   [http://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/base64/ascii85.html%|%ascii85]:   Like base64, but with only a 25% size increase.


** Historical: Bugs **

There are incompatibilities between various base64 packages and versions of Tcl.  [CL] has pursued this most. The root problem is a change in the semantics of [binary].  Tcl 8.0 with the base64 in tcllib 0.8 and before was definitely bad.  This thread [http://groups.google.com/groups?th=81c2fae612d54e6a] details this aspect of the change from 2.1 to 2.2 of the base64.tcl source code.

----

Please note that Tk has some base64 decoding support built in, which is different code from what is in tcllib.  There may be other implementations around as well.  This matters when fixing bugs, because a fix in one implementation does not reach the others.

----

Older versions of [Critcl] also had a Base64 implementation, see "ascenc" in [http://www.equi4.com/critlib/%|%critlib] and [a critical mindset about policy]


<<categories>> Tcllib | Category File | Category Internet | Category Data Serialization Format