Poly1305 is a cryptographic Message Authentication Code (MAC). That is, it takes a secret key and a message and produces a short "tag" that is extremely difficult to forge unless you know the secret key. It therefore can be used to authenticate a message and ensure it hasn't been tampered with. Compared to the more widely known HMAC, Poly1305 is extremely fast. (Or rather, has the potential to be - I make no claims about the speed of the implementation on this page).
The downside is that if an attacker sees two raw Poly1305 tags from the same key but for different messages, then they can recover the key using straightforward arithmetic. For this reason, modes that use Poly1305 either encrypt the tag with a separate key or derive a fresh Poly1305 key for each message. It is most often used in combination with the ChaCha20 stream cipher (e.g., in TLS), where the combined authenticated encryption scheme is known as ChaCha20-Poly1305, or "ChaPoly" for short.
The name derives from it being a polynomial evaluation MAC (i.e., it treats the message as a large polynomial with secret coefficients), based on the prime number 2^130-5.
The following code is based on the well-known poly1305-donna implementation that is used in many cryptographic libraries. It tries to ensure that no numeric variable is ever promoted out of the range of a 64-bit wide, to reduce the risk of timing side-channels due to dynamic memory allocation in the bignum code. Integers are represented as 5 26-bit limbs (giving 130 bits in total), and all operations are performed with 32-to-64 bit arithmetic. The extra bits are used to delay carry propagation for efficiency. This code should be constant time (i.e., the time should not be variable according to any secret inputs), but it hasn't had a lot of testing for that yet.
# poly1305.tcl -- # # Implementation of the Poly1305 authenticator. # # Copyright (c) 2018 Neil Madden. # License: Tcl-style package require Tcl 8.6 package provide poly1305 1.0.0 # poly1305 compute key message # poly1305 verify key message tag # # Computes or verifies a Poly-1305 message authentication code (MAC). # Poly1305 is a fast, cryptographically secure polynomial MAC. It is # information theoretically secure, although only if the key is only used # to authenticate a single message. The key can be recovered if more than # one message tag is computed with the same key. Therefore either a per-message # fresh key should be computed, or else the tag should be encrypted under # a separate key. # namespace eval ::poly1305 { namespace export compute verify namespace ensemble create #proc debug {msg args} { puts [format $msg {*}$args] } proc debug args {} proc clamp r { list [expr {[lindex $r 0] & 0x3ffffff}] \ [expr {[lindex $r 1] & 0x3ffff03}] \ [expr {[lindex $r 2] & 0x3ffc0ff}] \ [expr {[lindex $r 3] & 0x3f03fff}] \ [expr {[lindex $r 4] & 0x00fffff}] } proc verify {key data tag} { set expected [compute $key $data] equals $expected $tag } # constant time equality check proc equals {a b} { if {[string length $a] != [string length $b]} { return 0 } set ret 0 binary scan $a c* as binary scan $b c* bs foreach x $as y $bs { set ret [expr {$ret | ($x ^ $y)}] } expr {$ret == 0} } # Loads a 130-bit little-endian number and represents it as 5 26-bit limbs. # This internal format allows to delay carry propagation, resulting in a fast # and constant-time representation. proc load_130_le_26 bin { # Scan as 5 32-bit little-endian integers, rewinding (X) by one byte between each. # We then mask each down to a 26-bit integer. if {[binary scan $bin iuXiuXiuXiuXiu x0 x1 x2 x3 x4] != 5} { error "unable to parse data: [binary encode hex $bin]" } set x0 [expr {$x0 & 0x03ffffff}] set x1 [expr {($x1 >> 2) & 0x03ffffff}] set x2 [expr {($x2 >> 4) & 0x03ffffff}] set x3 [expr {($x3 >> 6) & 0x03ffffff}] set x4 [expr {($x4 >> 8) & 0x03ffffff}] list $x0 $x1 $x2 $x3 $x4 } proc carry limb { list [expr {$limb & 0x03FFFFFF}] [expr {$limb >> 26}] } namespace eval tcl { namespace eval mathfunc {} } proc tcl::mathfunc::mul64 {x y} { expr {wide(wide($x) * wide($y))} } proc compute {key data} { variable INT32 if {[string length $key] != 32} { error "key must be exactly 32 bytes" } set r [clamp [load_130_le_26 $key]] lassign $r r0 r1 r2 r3 r4 binary scan $key iu4iu4 -> s set s1 [expr {$r1 * 5}] set s2 [expr {$r2 * 5}] set s3 [expr {$r3 * 5}] set s4 [expr {$r4 * 5}] lassign {0 0 0 0 0} a0 a1 a2 a3 a4 set len [string length $data] for {set i 0} {$i < $len} {incr i 16} { set end [expr {min($i + 15, $len-1)}] set bytes [string range $data $i $end]\x01[string repeat \x00 [expr {15 - ($end - $i)}]] debug "block = %s" [binary encode hex [string reverse $bytes]] binary scan [string index $bytes 16] c c lassign [load_130_le_26 $bytes] n0 n1 n2 n3 n4 incr a0 $n0 incr a1 $n1 incr a2 $n2 incr a3 $n3 incr a4 [expr {$n4 | ($c << 24)}] debug "a = %08x %08x %08x %08x %08x" $a0 $a1 $a2 $a3 $a4 set d0 [expr {mul64($a0,$r0) + mul64($a1,$s4) + mul64($a2,$s3) + mul64($a3,$s2) + mul64($a4,$s1)}] set d1 [expr {mul64($a0,$r1) + mul64($a1,$r0) + mul64($a2,$s4) + mul64($a3,$s3) + mul64($a4,$s2)}] set d2 [expr {mul64($a0,$r2) + mul64($a1,$r1) + mul64($a2,$r0) + mul64($a3,$s4) + mul64($a4,$s3)}] set d3 [expr {mul64($a0,$r3) + mul64($a1,$r2) + mul64($a2,$r1) + mul64($a3,$r0) + mul64($a4,$s4)}] set d4 [expr {mul64($a0,$r4) + mul64($a1,$r3) + mul64($a2,$r2) + mul64($a3,$r1) + mul64($a4,$r0)}] debug "d = %08x %08x %08x %08x %08x" $d0 $d1 $d2 $d3 $d4 # Reduce mod 2^130-5 (partially) lassign [carry $d0] a0 c incr d1 $c lassign [carry $d1] a1 c incr d2 $c lassign [carry $d2] a2 c incr d3 $c lassign [carry $d3] a3 c incr d4 $c lassign [carry $d4] a4 c incr a0 [expr {$c * 5}] lassign [carry $a0] a0 c incr a1 $c debug "a = %08x %08x %08x %08x %08x" $a0 $a1 $a2 $a3 $a4 } # Final reduction mod 2^130-5 lassign [carry $a1] a1 c incr a2 $c lassign [carry $a2] a2 c incr a3 $c lassign [carry $a3] a3 c incr a4 $c lassign [carry $a4] a4 c incr a0 [expr {$c * 5}] lassign [carry $a0] a0 c incr a1 $c debug "a = %08x %08x %08x %08x %08x" $a0 $a1 $a2 $a3 $a4 # a-p set g0 [expr {$a0 + 5}] lassign [carry $g0] g0 c set g1 [expr {$a1 + $c}] lassign [carry $g1] g1 c set g2 [expr {$a2 + $c}] lassign [carry $g2] g2 c set g3 [expr {$a3 + $c}] lassign [carry $g3] g3 c set g4 [expr {$a4 + $c - (1 << 26)}] debug "g = %08x %08x %08x %08x %08x" $g0 $g1 $g2 $g3 $g4 # Use bit-slicing to select a if a < p or a - p if a >= p set mask [expr {$g4 >> 63}] set a0 [expr {$a0 & $mask}] set a1 [expr {$a1 & $mask}] set a2 [expr {$a2 & $mask}] set a3 [expr {$a3 & $mask}] set a4 [expr {$a4 & $mask}] # a = a mod 2^128 set a0 [expr {($a0 | ($a1 << 26)) & 0xFFFFFFFF}] set a1 [expr {(($a1 >> 6) | ($a2 << 20)) & 0xFFFFFFFF}] set a2 [expr {(($a2 >> 12) | ($a3 << 14)) & 0xFFFFFFFF}] set a3 [expr {(($a3 >> 18) | ($a4 << 8 )) & 0xFFFFFFFF}] # mac = (a + s) mod 2^128 set c [expr {$a0 + [lindex $s 0]}] set a0 [expr {$c & 0xFFFFFFFF}] set c [expr {$a1 + [lindex $s 1] + ($c >> 32)}] set a1 [expr {$c & 0xFFFFFFFF}] set c [expr {$a2 + [lindex $s 2] + ($c >> 32)}] set a2 [expr {$c & 0xFFFFFFFF}] set c [expr {$a3 + [lindex $s 3] + ($c >> 32)}] set a3 [expr {$c & 0xFFFFFFFF}] binary format iiii $a0 $a1 $a2 $a3 } } ##### TESTS ##### if {![info exists argv0] || [file tail [info script]] ne [file tail $argv0]} { # Not running as main script so return to avoid running tests below return } proc fromHex hex { regsub -all {\s+} $hex {} hex regsub -all {:} $hex {} hex regsub -all {..} $hex {\x\0} hex subst $hex } proc assertEqual {a b {msg ""}} { if {$a ne $b} { puts "FAIL - assertion failed: expecting '[binary encode hex $a]' to equal '[binary encode hex $b]' $msg" } else { puts "PASS - $msg" } } set key [fromHex {85:d6:be:78:57:55:6d:33:7f:44:52:fe:42:d5:06:a8:01:0 3:80:8a:fb:0d:b2:fd:4a:bf:f6:af:41:49:f5:1b}] set msg "Cryptographic Forum Research Group" set tag [fromHex {a8:06:1d:c1:30:51:36:c6:c2:2b:8b:af:0c:01:27:a9}] assertEqual [poly1305 compute $key $msg] $tag "poly1305 compute" assertEqual 1 [poly1305 verify $key $msg $tag] "poly1305 verify" assertEqual 0 [poly1305 verify $key $msg [string range $tag 1 end]] "poly1305 verify - wrong length" assertEqual 0 [poly1305 verify $key $msg [string reverse $tag]] "poly1305 verify - wrong tag" # vim: ft=tcl
Here is a simple performance comparison of the above code compared to tcllib's HMAC-sha256 code (pure Tcl implementations for both):
128 bytes Poly1305 - pure Tcl - 128 bytes 108bd1726bfc1e416a70895135b9d815 193.6676 microseconds per iteration HMAC-SHA256 - pure Tcl - 128 bytes 8e5353b62994e5e7d30ac7d8d758d21800f34fbf53f2f3705197ba28d000d138 919.1384 microseconds per iteration 1024 bytes Poly1305 - pure Tcl - 1024 bytes 886efe4dfaee12d84278cff7ec560c9b 1476.0026 microseconds per iteration HMAC-SHA256 - pure Tcl - 1024 bytes 6584d1549f6b4f25d639c354d6145c1a769276a8f889df887e4d58a70506b50b 3037.9848 microseconds per iteration 2048 bytes Poly1305 - pure Tcl - 2048 bytes 0669408bbc52f1c842241437933371ac 2862.2865 microseconds per iteration HMAC-SHA256 - pure Tcl - 2048 bytes 0688c0aa0c6487a4d43e18905c3c4925cf879a42dc78c9eced988057b486b7d0 5592.3639 microseconds per iteration 4096 bytes Poly1305 - pure Tcl - 4096 bytes e9f266f8a7e78a2227d18cbe8fddf1e6 5752.5026 microseconds per iteration HMAC-SHA256 - pure Tcl - 4096 bytes cabbdbeb3f913cb3b372c92c89588ac08cfa84ac0d7b9afcd6b1e45b453dfbf6 10104.6838 microseconds per iteration 8092 bytes Poly1305 - pure Tcl - 8092 bytes a206dfe8d8f817e728e44af1b25696b9 11050.4179 microseconds per iteration HMAC-SHA256 - pure Tcl - 8092 bytes 205d7c5b828bb9750816add762bbbbd7b671e460730bfb1c16ce58749e45d49a 19202.6242 microseconds per iteration 65536 bytes Poly1305 - pure Tcl - 65536 bytes 675c087034022440f1b3242cb776c3bd 88696.99309999999 microseconds per iteration HMAC-SHA256 - pure Tcl - 65536 bytes 759e2868b90b38ef0ac7d7d98d8807076c5849fbb04749f68412330f1fae1b92 147730.62420000002 microseconds per iteration 1048576 bytes Poly1305 - pure Tcl - 1048576 bytes 07870a9f9f3db815bc66bd523c19386b 1403489.3573 microseconds per iteration HMAC-SHA256 - pure Tcl - 1048576 bytes 4719f43cd96770ee9707409b3f4c7148bd4c26ea4b6289dca2b3ec70ab792569 2381001.2313 microseconds per iteration
Poly1305 is roughly 2-5x faster than HMAC for short messages, and about 1.75x faster for longer messages. Both are quite slow in Tcl though, so a good C implementation is a better bet if you need to authenticate large amounts of data. (It's also considerably easier to write constant-time code in C).
nem - 2018-11-02 12:20:53
I should say that when used in the AEAD construction of ChaCha20-Poly1305, timing side-channels are of relatively less concern as the derived poly1305 key is unique to each message and the only other data fed into the MAC is the encrypted ciphertext, which is obviously not secret. So the chance of an attacker learning anything interesting from such a timing side-channel is very slim (this is not the case for AES-GCM for example, where the same MAC key is used for every message).