JSON value extraction benchmark

dbohdan 2017-02-03: This benchmark compares how quickly various JSON libraries for Tcl can extract a deeply nested string value from a large JSON blob (circa 18 MiB). The comparison may help you choose among the growing number of similar JSON-parsing libraries for Tcl, but, like any microbenchmark, it has a limited scope. Libraries may be faster or slower under different circumstances.

How to run the benchmark

Install Git, jq, Tcl 8.6, Tcllib and (optionally) wiki-reaper. You will also need unzip and either curl or wget to fetch the data set. Install the Tcl extensions rl_json, SQLite with JSON1, tcl-duktape and yajl-tcl. Then run the POSIX shell commands below.

mkdir jsonbench
cd jsonbench
git clone https://github.com/dbohdan/jimhttp
curl -sf -o AllSets.json.zip https://mtgjson.com/json/AllSets.json.zip
# or wget https://mtgjson.com/json/AllSets.json.zip
unzip AllSets.json.zip
rm AllSets.json.zip
# Instead of running the next command you can manually copy the code from the
# "Code" section of this wiki page and save it as jsonbench.tcl.
wiki-reaper 48500 3 | tee jsonbench.tcl
tclsh jsonbench.tcl
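
Before running the benchmark, it can help to confirm that every Tcl package the script loads is actually installed. The following is a minimal sketch and not part of the benchmark itself: the package names are the ones jsonbench.tcl requires, while the helper name checkPackage is made up for illustration. The jimhttp JSON parser is not covered because it is sourced from the cloned repository rather than loaded as a package.

```tcl
# Hypothetical preflight helper (not part of jsonbench.tcl).
# Returns the version of the package if it can be loaded, or an empty
# string if "package require" fails.
proc checkPackage name {
    if {[catch {package require $name} version]} {
        return {}
    }
    return $version
}

# Package names as used by the "package require" calls in jsonbench.tcl.
foreach name {fileutil json duktape duktape::oo rl_json sqlite3 yajltcl} {
    set version [checkPackage $name]
    if {$version eq {}} {
        puts "$name: MISSING"
    } else {
        puts "$name: $version"
    }
}
```

Any package reported as MISSING will make jsonbench.tcl stop at the corresponding package require line.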

Sample results

These results are from running the benchmark on WSL (Ubuntu 16.04) on a Phenom II X4 955 CPU with no CPU-intensive tasks running in the background. The benchmark used version 3.8.1 (Jan 23, 2017) of the MTG JSON data set. The jq version was 1.5.

5 iterations

Package versions:
Tcl           --  8.6.5
Tcllib json   --  1.3.3
duktape       --  0.3.0
jimhttp json  --  2.1.0
rl_json       --  0.9.7
sqlite3       --  3.18.0
yajltcl       --  1.6.2

Running the benchmark with 5 iterations for each library
jq:              1062 ms
tcl-duktape:     1670 ms
jimhttp JSON:   37941 ms
rl_json:           99 ms
Tcllib JSON:    25261 ms
SQLite JSON1:      58 ms
yajl-tcl:         284 ms

20 iterations

Package versions:
Tcl           --  8.6.5
Tcllib json   --  1.3.3
duktape       --  0.3.0
jimhttp json  --  2.1.0
rl_json       --  0.9.7
sqlite3       --  3.18.0
yajltcl       --  1.6.2

Running the benchmark with 20 iterations for each library
jq:              1051 ms
tcl-duktape:     1634 ms
jimhttp JSON:   38772 ms
rl_json:           28 ms
Tcllib JSON:    26073 ms
SQLite JSON1:      57 ms
yajl-tcl:         284 ms

Code

#! /usr/bin/env tclsh
# version 0.4.0
package require fileutil
puts {Package versions:}
puts "Tcl           --  [package require Tcl]"
puts "Tcllib json   --  [package require json]"
puts "duktape       --  [package require duktape]"
package require duktape::oo
source jimhttp/json.tcl
puts "jimhttp json  --  $::json::version"
puts "rl_json       --  [package require rl_json]"
puts "sqlite3       --  [package require sqlite3]"
puts "yajltcl       --  [package require yajltcl]"
puts {}

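# Convert the result of [time] ("N microseconds per iteration") to whole
# milliseconds.
proc ms timeResult {
    return [expr {round([lindex $timeResult 0] / 1000.0)}]
}

# Run $command on $data $times times, checking the extracted value on every
# iteration. Return the average time per iteration in milliseconds.
proc benchmark {command data times result} {
    return [ms [time {
        set actualResult [$command $data]
        if {$actualResult ne $result} {error "bad result: \"$actualResult\""}
    } $times]]
}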

proc jq data {
    return [exec jq -r {.INV.cards[68].flavor} << $data]
}

proc duktape data {
    set j [::duktape::oo::JSON new $::duk $data]
    set result [$j get INV cards 68 flavor]
    $j destroy
    return $result
}

proc jimhttp-json data {
    return [dict get [::json::parse $data] INV cards 68 flavor]
}

proc rl_json data {
    return [::rl_json::json get $data INV cards 68 flavor]
}

proc tcllib-json data {
    set parsed [::json::json2dict $data]
    return [dict get [lindex [dict get $parsed INV cards] 68] flavor]
}

proc sqlite3-json1 data {
    return [lindex [::sq3 eval {
        select json_extract($data, '$.INV.cards[68].flavor')
    }] 0]
}

proc yajltcl data {
    set parsed [::yajl::json2dict $data]
    return [dict get [lindex [dict get $parsed INV cards] 68] flavor]
}

proc report {displayName n} {
    puts [format {%-13s  %6u ms} ${displayName}:  $n]
}

proc main {} {
    set times 20
    set value {Children claim no two feathers are exactly the same color,\
            then eagerly gather them for proof.}

    set sets [::fileutil::cat AllSets.json]

    puts "Running the benchmark with $times iterations for each library"
    report jq             [benchmark jq $sets $times $value]
    set ::duk [::duktape::oo::Duktape new]
    report tcl-duktape    [benchmark duktape $sets $times $value]
    $::duk destroy
    report {jimhttp JSON} [benchmark jimhttp-json $sets $times $value]
    report rl_json        [benchmark rl_json $sets $times $value]
    report {Tcllib JSON}  [benchmark tcllib-json $sets $times $value]
    sqlite3 ::sq3 :memory:
    ::sq3 enable_load_extension 1
    report {SQLite JSON1} [benchmark sqlite3-json1 $sets $times $value]
    ::sq3 close
    report yajl-tcl       [benchmark yajltcl $sets $times $value]
}

main
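
The timings the script prints are averages rather than totals: Tcl's time command runs a script the given number of times and returns a string of the form "N microseconds per iteration", which the ms proc then rounds to whole milliseconds. A small standalone sketch of that conversion follows; the timed script and the sample input string are made up for illustration.

```tcl
# Same conversion as the ms proc in the benchmark: take the leading number
# from the result of [time] and convert microseconds to whole milliseconds.
proc ms timeResult {
    return [expr {round([lindex $timeResult 0] / 1000.0)}]
}

# [time] reports the average over all iterations, not the total.
set timeResult [time {string length [string repeat x 100000]} 20]
puts $timeResult
puts "[ms $timeResult] ms per iteration"

# Made-up input to show the rounding.
puts [ms {1062499.0 microseconds per iteration}]  ;# prints 1062
```

Because both figures are per-iteration averages, the 5-iteration and 20-iteration results can be compared directly.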

Discussion

ak - 2017-05-01 23:44:31

Note that while basic Tcllib is pure Tcl, you can use critcl to build tcllibc, a C-based accelerator for various parts of Tcllib, including the json package. It might be interesting to see how much this accelerator helps the json extractor.
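
One way to tell whether the accelerator is in play on a given system is to check if tcllibc can be loaded next to json. This is a sketch under assumptions: ::json::Implementations is taken to be Tcllib's command for listing the loaded implementations of the json package (for example tcl or critcl), and since it is not part of the documented interface the code calls it only if it exists.

```tcl
package require json

# tcllibc is the package that holds Tcllib's critcl-built C accelerators.
if {[catch {package require tcllibc} tcllibcVersion]} {
    puts "tcllibc not available, so json cannot be using the C accelerator"
} else {
    puts "tcllibc $tcllibcVersion loaded"
}

# Assumption: ::json::Implementations lists the json implementations that
# were loaded successfully. Guarded because it may not exist in every
# Tcllib version.
if {[llength [info commands ::json::Implementations]]} {
    puts "json implementations: [::json::Implementations]"
}
```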


dbohdan 2018-03-31: It is worth noting that the jimhttp JSON parser is a lot slower in Tcl 8.6 than it is in Jim Tcl. The following benchmark shows the difference. To run the benchmark script I used a Tcl 8.6.8 Tclkit built with KitCreator and a Jim Tcl v0.77 binary built locally with GCC 5.4.0 with the default CFLAGS given by its configure script: -g -O2 -fno-unwind-tables -fno-asynchronous-unwind-tables.

$ du -h AllSets.json  # v3.14 Mar 7, 2018 -- much larger than v3.8.1 above
29M AllSets.json

$ cat bench.tcl
puts "$::tcl_platform(engine) [info patchlevel]"
set ch [open $argv]; set json [read $ch]; close $ch
source json.tcl
set i 0
puts [time {::json::parse $json; puts [incr i]} 5]

$ for bin in /tmp/benchkits/tclkit /tmp/benchkits/jimtcl/jimsh; \
do $bin bench.tcl AllSets.json; done
Tcl 8.6.8
1
2
3
4
5
148498381.0 microseconds per iteration
Jim 0.77
1
2
3
4
5
25020679 microseconds per iteration

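That's roughly a 6x difference (about 148.5 s versus 25.0 s per iteration). The same does not happen with Tcllib JSON hacked up to work in Jim Tcl: there the two interpreters finish within about 7% of each other.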

$ diff -r /usr/share/tcltk/tcllib1.17/json/json.tcl /tmp/tcllib-json/json.tcl
9d8
< package require Tcl 8.4
32c31
<       if {![package vsatisfies [package provide Tcl] 8.4]} {return 0}
---
>       return 0
diff -r /usr/share/tcltk/tcllib1.17/json/json_tcl.tcl /tmp/tcllib-json/json_tcl.tcl
12,14d11
< if {![package vsatisfies [package provide Tcl] 8.5]} {
<     package require dict
< }

$ cat bench2.tcl
puts "$::tcl_platform(engine) [info patchlevel]"
set ch [open $argv]; set json [read $ch]; close $ch
source /tmp/tcllib-json/json.tcl
set i 0
puts [time {::json::json2dict $json; puts [incr i]} 5]

$ for bin in /tmp/benchkits/tclkit /tmp/benchkits/jimtcl/jimsh; \
do $bin bench2.tcl AllSets.json; done
Tcl 8.6.8
1
2
3
4
5
27883525.0 microseconds per iteration
Jim 0.77
1
2
3
4
5
29755740 microseconds per iteration
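
The ratios behind this comparison can be recomputed from the per-iteration timings printed above. A minimal sketch; the helper name ratio and the layout of the timings dictionary are made up for illustration, while the numbers are copied from the two benchmark runs.

```tcl
# Per-iteration timings in microseconds, copied from the output above.
set timings {
    {jimhttp JSON} {tcl 148498381.0 jim 25020679}
    {Tcllib JSON}  {tcl 27883525.0  jim 29755740}
}

# Hypothetical helper: how many times longer $a took than $b.
proc ratio {a b} {
    return [expr {double($a) / $b}]
}

dict for {parser t} $timings {
    puts [format {%-13s  Tcl 8.6.8 / Jim 0.77 = %.2f} ${parser}: \
            [ratio [dict get $t tcl] [dict get $t jim]]]
}
```

The jimhttp JSON ratio comes out at roughly 5.9, while the Tcllib JSON ratio is about 0.94, meaning Tcl 8.6.8 was slightly faster than Jim Tcl for that parser.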