JSON value extraction benchmark

[dbohdan] 2017-02-03: This benchmark compares the speed with which various [JSON] libraries for Tcl can extract a deeply nested string value from a http://mtgjson.com/%|%large JSON blob%|% (circa 18 MiB). The comparison may help you choose one out of a growing number of similar JSON-parsing libraries for Tcl, but, like any microbenchmark, it has a limited scope. Libraries may be faster or slower under different circumstances.


** How to run the benchmark **

Install [Git], [jq], Tcl 8.6, [Tcllib] and (optionally) [wiki-reaper]. Install the Tcl extensions [rl_json], SQLite with [JSON1], [tcl-duktape] and [yajl-tcl]. Then run the POSIX shell commands below.

======none
mkdir jsonbench
cd jsonbench
git clone https://github.com/dbohdan/jimhttp
curl -sf -o AllSets.json.zip https://mtgjson.com/json/AllSets.json.zip
# or wget https://mtgjson.com/json/AllSets.json.zip
unzip AllSets.json.zip
rm AllSets.json.zip
# Instead of running the next command you can manually copy the code from the
# "Code" section of this wiki page and save it as jsonbench.tcl.
wiki-reaper 48500 3 | tee jsonbench.tcl
tclsh jsonbench.tcl
======
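
Because `jsonbench.tcl` stops at the first `package require` that fails, it may save time to check the prerequisites first. Below is a minimal sketch of such a check. `check-deps` is a hypothetical helper, not part of the benchmark: it tries each package the benchmark loads, looks for `jq` with `auto_execok`, and checks that `jimhttp/json.tcl` exists relative to the current directory.

======
# Hypothetical pre-flight check (not part of jsonbench.tcl): report which of
# the benchmark's dependencies can be found before running it.
proc check-deps {} {
    set status {}
    # Tcl packages that jsonbench.tcl loads with [package require].
    foreach pkg {fileutil json duktape duktape::oo rl_json sqlite3 yajltcl} {
        if {[catch {package require $pkg} version]} {
            dict set status $pkg missing
        } else {
            dict set status $pkg $version
        }
    }
    # jq is an external program that the benchmark runs with [exec].
    set jq [auto_execok jq]
    dict set status jq [expr {$jq eq "" ? "missing" : $jq}]
    # jimhttp's JSON library is sourced from the cloned repository.
    dict set status jimhttp/json.tcl [expr {
        [file exists jimhttp/json.tcl] ? "present" : "missing"
    }]
    return $status
}

dict for {name state} [check-deps] {
    puts [format {%-18s %s} $name $state]
}
======

Run it from the `jsonbench` directory. Any entry reported as `missing` needs to be installed (or, for `jimhttp/json.tcl`, cloned) before the benchmark can run to completion.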


** Sample results **

These results are from running the benchmark on https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux%|%WSL%|% (Ubuntu 16.04) on a http://www.cpubenchmark.net/singleThread.html#rk368%|%Phenom II X4 955%|% CPU with no CPU-intensive tasks running in the background. The benchmark used version 3.8.1 (Jan 23, 2017) of the MTG JSON data set. The jq version was 1.5. Each figure is the mean time per iteration in milliseconds, as computed by the `ms` proc from the output of `time`.

*** 5 iterations ***

======none
Package versions:
Tcl           --  8.6.5
Tcllib json   --  1.3.3
duktape       --  0.3.0
jimhttp json  --  2.1.0
rl_json       --  0.9.7
sqlite3       --  3.18.0
yajltcl       --  1.6.2

Running the benchmark with 5 iterations for each library
jq:              1062 ms
tcl-duktape:     1670 ms
jimhttp JSON:   37941 ms
rl_json:           99 ms
Tcllib JSON:    25261 ms
SQLite JSON1:      58 ms
yajl-tcl:         284 ms
======

*** 20 iterations ***

======none
Package versions:
Tcl           --  8.6.5
Tcllib json   --  1.3.3
duktape       --  0.3.0
jimhttp json  --  2.1.0
rl_json       --  0.9.7
sqlite3       --  3.18.0
yajltcl       --  1.6.2

Running the benchmark with 20 iterations for each library
jq:              1051 ms
tcl-duktape:     1634 ms
jimhttp JSON:   38772 ms
rl_json:           28 ms
Tcllib JSON:    26073 ms
SQLite JSON1:      57 ms
yajl-tcl:         284 ms
======


** Code **

======
#! /usr/bin/env tclsh
# version 0.4.0
package require fileutil
puts {Package versions:}
puts "Tcl           --  [package require Tcl]"
puts "Tcllib json   --  [package require json]"
puts "duktape       --  [package require duktape]"
package require duktape::oo
source jimhttp/json.tcl
puts "jimhttp json  --  $::json::version"
puts "rl_json       --  [package require rl_json]"
puts "sqlite3       --  [package require sqlite3]"
puts "yajltcl       --  [package require yajltcl]"
puts {}

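# Convert the output of [time] ("N microseconds per iteration") to whole
# milliseconds per iteration.
proc ms timeResult {
    return [expr {round([lindex $timeResult 0] / 1000.0)}]
}

# Run $command on $data $times times, failing if any call returns something
# other than the expected $result. Returns the mean time per call in ms.
proc benchmark {command data times result} {
    return [ms [time {
        set actualResult [$command $data]
        if {$actualResult ne $result} {error "bad result: \"$actualResult\""}
    } $times]]
}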

proc jq data {
    return [exec jq -r {.INV.cards[68].flavor} << $data]
}

proc duktape data {
    set j [::duktape::oo::JSON new $::duk $data]
    set result [$j get INV cards 68 flavor]
    $j destroy
    return $result
}

proc jimhttp-json data {
    return [dict get [::json::parse $data] INV cards 68 flavor]
}

proc rl_json data {
    return [::rl_json::json get $data INV cards 68 flavor]
}

proc tcllib-json data {
    set parsed [::json::json2dict $data]
    return [dict get [lindex [dict get $parsed INV cards] 68] flavor]
}

proc sqlite3-json1 data {
    return [lindex [::sq3 eval {
        select json_extract($data, '$.INV.cards[68].flavor')
    }] 0]
}

proc yajltcl data {
    set parsed [::yajl::json2dict $data]
    return [dict get [lindex [dict get $parsed INV cards] 68] flavor]
}

proc report {displayName n} {
    puts [format {%-13s  %6u ms} ${displayName}:  $n]
}

proc main {} {
    # Iterations per library; the sample results above used 5 and 20.
    set times 20
    set value {Children claim no two feathers are exactly the same color,\
            then eagerly gather them for proof.}

    set sets [::fileutil::cat AllSets.json]

    puts "Running the benchmark with $times iterations for each library"
    report jq             [benchmark jq $sets $times $value]
    set ::duk [::duktape::oo::Duktape new]
    report tcl-duktape    [benchmark duktape $sets $times $value]
    $::duk destroy
    report {jimhttp JSON} [benchmark jimhttp-json $sets $times $value]
    report rl_json        [benchmark rl_json $sets $times $value]
    report {Tcllib JSON}  [benchmark tcllib-json $sets $times $value]
    sqlite3 ::sq3 :memory:
    ::sq3 enable_load_extension 1
    report {SQLite JSON1} [benchmark sqlite3-json1 $sets $times $value]
    ::sq3 close
    report yajl-tcl       [benchmark yajltcl $sets $times $value]
}

main
======


** Discussion **

'''[ak] - 2017-05-01 23:44:31'''

Note that while basic Tcllib is pure Tcl, you can use [critcl] and `make tcllibc` to generate a C-based accelerator for various parts of Tcllib, including the json package.
It might be interesting to see how much this accelerator helps the json extractor.
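
One way to try this would be to time `::json::json2dict` under each implementation in turn. The sketch below rests on an assumption: `time-json-impls` is a hypothetical name, and it relies on the internal procs `::json::LoadAccelerator` and `::json::SwitchTo`, which Tcllib's `json.tcl` appears to use to choose between its `critcl` and `tcl` implementations. They are not documented API and may differ between Tcllib versions, so the helper returns an empty dict when it cannot find them.

======
# Hypothetical helper (not part of jsonbench.tcl): time ::json::json2dict
# under each implementation of Tcllib's json package that can be loaded.
# Assumption: json.tcl provides the internal procs ::json::LoadAccelerator
# and ::json::SwitchTo; they are not documented API. Returns a dict
# {implementation msPerCall ...}, empty if Tcllib json or the procs are absent.
proc time-json-impls {data times} {
    set results {}
    if {[catch {package require json}]} {
        return $results
    }
    foreach cmd {::json::LoadAccelerator ::json::SwitchTo} {
        if {[info commands $cmd] eq ""} {
            return $results
        }
    }
    foreach impl {critcl tcl} {
        # "critcl" only loads if the compiled tcllibc accelerator is installed.
        if {[catch {::json::LoadAccelerator $impl} loaded] || !$loaded} {
            continue
        }
        if {[catch {::json::SwitchTo $impl}]} {
            continue
        }
        set usPerCall [lindex [time {::json::json2dict $data} $times] 0]
        dict set results $impl [expr {round($usPerCall / 1000.0)}]
    }
    return $results
}
======

Called as, say, `time-json-impls [::fileutil::cat AllSets.json] 5` from a script that has loaded `fileutil`, it would report the mean milliseconds per `json2dict` call for `critcl` (only when `tcllibc` is available) and for `tcl`. It leaves the last implementation it switched to active.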

----

[dbohdan] 2018-03-31: It is worth noting that the [jimhttp] JSON parser is a lot slower in Tcl 8.6 than it is in [Jim Tcl]. The following benchmark shows the difference. To run the benchmark script I used a [http://kitcreator.rkeene.org/kits/building/39e6cc5c2814e7753a9149aa0c74fc1056cc4935/%|%Tcl 8.6.8 Tclkit] built with [KitCreator] and a [Jim Tcl] [https://github.com/msteveb/jimtcl/releases/tag/0.77%|%v0.77] binary built locally with GCC 5.4.0 with the default `CFLAGS` given by its `configure` script: `-g -O2 -fno-unwind-tables -fno-asynchronous-unwind-tables`.

======
$ du -h AllSets.json  # v3.14 Mar 7, 2018 -- much larger than v3.8.1 above
29M AllSets.json

$ cat bench.tcl
puts "$::tcl_platform(engine) [info patchlevel]"
set ch [open $argv]; set json [read $ch]; close $ch
source json.tcl
set i 0
puts [time {::json::parse $json; puts [incr i]} 5]

$ for bin in /tmp/benchkits/tclkit /tmp/benchkits/jimtcl/jimsh; \
do $bin bench.tcl AllSets.json; done
Tcl 8.6.8
1
2
3
4
5
148498381.0 microseconds per iteration
Jim 0.77
1
2
3
4
5
25020679 microseconds per iteration
======

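That is a roughly 6x difference (about 148.5 s versus 25.0 s per iteration). The same does not happen with [Tcllib JSON] hacked up to work in Jim Tcl: as the run below shows, Jim Tcl is then slightly slower than Tcl 8.6 (about 29.8 s versus 27.9 s per iteration).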

======
$ diff -r /usr/share/tcltk/tcllib1.17/json/json.tcl /tmp/tcllib-json/json.tcl
9d8
< package require Tcl 8.4
32c31
<       if {![package vsatisfies [package provide Tcl] 8.4]} {return 0}
---
>       return 0
diff -r /usr/share/tcltk/tcllib1.17/json/json_tcl.tcl /tmp/tcllib-json/json_tcl.tcl
12,14d11
< if {![package vsatisfies [package provide Tcl] 8.5]} {
<     package require dict
< }

$ cat bench2.tcl
puts "$::tcl_platform(engine) [info patchlevel]"
set ch [open $argv]; set json [read $ch]; close $ch
source /tmp/tcllib-json/json.tcl
set i 0
puts [time {::json::json2dict $json; puts [incr i]} 5]

$ for bin in /tmp/benchkits/tclkit /tmp/benchkits/jimtcl/jimsh; \
do $bin bench2.tcl AllSets.json; done
Tcl 8.6.8
1
2
3
4
5
27883525.0 microseconds per iteration
Jim 0.77
1
2
3
4
5
29755740 microseconds per iteration
======
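
The "6x" figure comes from dividing the per-iteration times that `time` reports. A small sketch of that arithmetic, with the numbers copied from the two runs above (`time-to-seconds` and `slowdown` are illustrative names, not part of any library):

======
# Illustrative helpers (not from any library): turn a [time] result such as
# "148498381.0 microseconds per iteration" into seconds per iteration, and
# compute how many times slower one measurement is than another.
proc time-to-seconds timeResult {
    return [expr {[lindex $timeResult 0] / 1e6}]
}

proc slowdown {slower faster} {
    return [expr {[time-to-seconds $slower] / [time-to-seconds $faster]}]
}

# Figures copied from the two runs above.
set jimhttpTcl {148498381.0 microseconds per iteration}
set jimhttpJim {25020679 microseconds per iteration}
set tcllibTcl  {27883525.0 microseconds per iteration}
set tcllibJim  {29755740 microseconds per iteration}

puts [format {jimhttp json.tcl: Tcl 8.6 is %.2fx slower than Jim Tcl} \
        [slowdown $jimhttpTcl $jimhttpJim]]
puts [format {Tcllib json:      Jim Tcl is %.2fx slower than Tcl 8.6} \
        [slowdown $tcllibJim $tcllibTcl]]
======

This prints ratios of 5.94 for jimhttp's parser and 1.07 for Tcllib JSON: a roughly 6x gap in one case and a difference of under 10% in the other.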

[dbohdan] 2019-10-28: The comparison between jimhttp's `json.tcl` in Jim Tcl and Tcl 8.6 above is '''outdated.''' UTF-8 builds of Jim Tcl's `master` branch are more than an order of magnitude slower than Tcl 8.6 on this benchmark. In return they have gained full Unicode support in `regexp` and, with it, the important ability to decode JSON with unescaped UTF-8 strings. For faster JSON parsing in UTF-8 Jim Tcl you can use SQLite with JSON1, jq, or the [jmsn] binary extension. Non-UTF-8 builds retain the old performance profile.

[dbohdan] 2019-12-08: The `--full` configuration of Jim Tcl 0.79 comes with a native `json::decode` command.
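
Tcl 8.6's `regexp` already works on characters rather than bytes. As a sketch of why that matters for a regexp-based JSON decoder, the hypothetical `next-string-token` below (illustrative only; it is not the code in jimhttp's `json.tcl`) finds a string token with `regexp -start ... -indices` and slices it out with `string range`. Both commands count characters, so an unescaped non-ASCII string such as `Café` comes out intact. Escape sequences are left undecoded.

======
# Hypothetical tokenizer step (not jimhttp's actual code): match the JSON
# string token that starts at character offset $pos in $json and return
# a list {contents nextPos}. Escape sequences are left undecoded.
proc next-string-token {json pos} {
    # \A anchors at the -start offset; ^ would not match there.
    set re {\A"((?:[^"\\]|\\.)*)"}
    if {![regexp -start $pos -indices $re $json whole contents]} {
        error "no string token at offset $pos"
    }
    # With -indices, regexp reports character offsets, the same units that
    # [string range] uses, so the token can be sliced out directly.
    lassign $contents from to
    return [list [string range $json $from $to] [expr {[lindex $whole 1] + 1}]]
}

# A JSON array holding two unescaped non-ASCII strings, built with \u escapes.
set json "\[\"Caf\u00e9\", \"\u65e5\u672c\"\]"
lassign [next-string-token $json 1] first
lassign [next-string-token $json 9] second
puts "[string length $first] [string length $second]" ;# prints "4 2"
======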

<<categories>>Data Serialization Format | Performance