I'm currently optimizing the new ATL-Tcl dialect, which uses the myoo OO extension, and right now I'm fighting to reach PYTHON speed for an EMPTY callback.
→ The CORE problem is Tcl_EvalObjv, which is TOO SLOW to beat python.
exec#7 -> '.../release/example/c/perfclient' '--send-nothing' '--sec' '10' '@' 'python3' '.../example/py/perfserver.py'
...:PerfClientExec }: start ------------------------
                                     : result      count / sec
...:statistics }: --send-nothing : 504821.1    5048489 / 10.000551
                                   ^^^^^^^^^^^^^^^^^^^^^
...:PerfClientExec }: end: ----------------------------------------

exec#7 -> '.../release/example/c/perfclient' '--send-nothing' '--sec' '10' '@' 'tclsh8.6' '.../example/atl/perfserver.atl'
...:PerfClientExec }: start ------------------------
                                     : result      count / sec
...:statistics }: --send-nothing : 453027.1    4530432 / 10.000355
                                   ^^^^^^^^^^^^^^^^^^^^^
...:PerfClientExec }: end: ----------------------------------------
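The per-second rates above are easier to reason about as a time budget per request. A small helper (hypothetical, not part of perfclient) recomputes the rate column as count / seconds and converts a rate into the average time per call:

```c
#include <assert.h>

/* Throughput as perfclient prints it: completed requests / elapsed seconds. */
static double rate_per_sec(long count, double seconds) {
    assert(seconds > 0.0);
    return (double)count / seconds;
}

/* Average wall-clock time per request, in microseconds. */
static double usec_per_call(double rate) {
    assert(rate > 0.0);
    return 1e6 / rate;
}

/* Extra nanoseconds per request that the slower server needs. */
static double extra_ns_per_call(double fast_rate, double slow_rate) {
    return (usec_per_call(slow_rate) - usec_per_call(fast_rate)) * 1000.0;
}
```

For the --send-nothing runs above this gives roughly 1.98 µs per request for python3 and 2.21 µs for the ATL server under tclsh8.6, so the ATL server needs about 226 ns more per empty callback.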
→ The CORE problem is the NRE (non-recursive engine), which eats the performance.
/*
 *----------------------------------------------------------------------
 *
 * Tcl_EvalObjv --
 *
 *      This function evaluates a Tcl command that has already been parsed
 *      into words, with one Tcl_Obj holding each word.
 *
 * Results:
 *      The return value is a standard Tcl completion code such as TCL_OK or
 *      TCL_ERROR. A result or error message is left in interp's result.
 *
 * Side effects:
 *      Always pushes a callback. Other side effects depend on the command.
 *
 *----------------------------------------------------------------------
 */

int
Tcl_EvalObjv(
    Tcl_Interp *interp,         /* Interpreter in which to evaluate the
                                 * command. Also used for error reporting. */
    int objc,                   /* Number of words in command. */
    Tcl_Obj *const objv[],      /* An array of pointers to objects that are
                                 * the words that make up the command. */
    int flags)                  /* Collection of OR-ed bits that control the
                                 * evaluation of the script. Only
                                 * TCL_EVAL_GLOBAL, TCL_EVAL_INVOKE and
                                 * TCL_EVAL_NOERR are currently supported. */
{
    int result;
    NRE_callback *rootPtr = TOP_CB(interp);

    result = TclNREvalObjv(interp, objc, objv, flags, NULL);
    return TclNRRunCallbacks(interp, result, rootPtr);
}
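Tcl_EvalObjv remembers the current top of the callback stack (TOP_CB), lets TclNREvalObjv schedule the work, and TclNRRunCallbacks then pops and runs callbacks until it is back at that root. The following is a toy model of that trampoline in plain C. Every name in it (ToyInterp, push_cb, run_callbacks, toy_eval) is hypothetical, and it is a simplified illustration, not Tcl's real NRE code. It shows why even an empty command body still pays at least one push and one pop per evaluation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy NRE-style callback stack. Hypothetical names, NOT Tcl's NRE_callback. */
typedef int (CbProc)(void *data, int result);

typedef struct Callback {
    CbProc *proc;
    void *data;
    struct Callback *next;      /* callback that was on top before this one */
} Callback;

#define POOL_SIZE 64

typedef struct {
    Callback *top;              /* plays the role of TOP_CB(interp) */
    Callback pool[POOL_SIZE];   /* fixed LIFO pool instead of malloc */
    int used;
    long pushes;                /* bookkeeping counter for the demo */
} ToyInterp;

static void push_cb(ToyInterp *ip, CbProc *proc, void *data) {
    assert(ip->used < POOL_SIZE);
    Callback *cb = &ip->pool[ip->used++];
    cb->proc = proc;
    cb->data = data;
    cb->next = ip->top;
    ip->top = cb;
    ip->pushes++;
}

/* Pop and run callbacks until we are back at 'root' (cf. TclNRRunCallbacks). */
static int run_callbacks(ToyInterp *ip, int result, Callback *root) {
    while (ip->top != root) {
        Callback cur = *ip->top;    /* copy first: cur.proc may reuse this slot */
        ip->top = cur.next;
        ip->used--;
        result = cur.proc(cur.data, result);
    }
    return result;
}

/* An empty command body, like the empty callback being benchmarked. */
static int empty_cmd(void *data, int result) {
    (void)data;
    return result;
}

/* Model of Tcl_EvalObjv: remember the root, schedule the command, run the trampoline. */
static int toy_eval(ToyInterp *ip) {
    Callback *root = ip->top;           /* cf. TOP_CB(interp) */
    push_cb(ip, empty_cmd, NULL);       /* cf. TclNREvalObjv scheduling work */
    return run_callbacks(ip, 0, root);  /* cf. TclNRRunCallbacks */
}

/* Evaluate n empty commands on a fresh toy interp; returns the number of pushes. */
static long count_pushes(int n) {
    ToyInterp ip = {0};
    for (int i = 0; i < n; i++) {
        int rc = toy_eval(&ip);
        assert(rc == 0 && ip.top == NULL);  /* stack unwound back to the root */
    }
    return ip.pushes;
}
```

count_pushes(1000) returns 1000: one push per evaluation even though empty_cmd does nothing. Real Tcl 8.6 may involve more than one callback per command; the model only shows the minimum round trip.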
→ Is it possible to get a Tcl_EvalObjvEx, i.e. a version stripped of all the NRE stuff?
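One direction for a stripped call path, sketched as a toy model under my own assumptions: resolve the callback command once and keep its function pointer, instead of resolving and dispatching through the full evaluator on every callback. All names below (ToyCmd, find_cmd, cache_cmd, lookups_for) are hypothetical, and the model only counts name lookups; it does not model the NRE itself.

```c
#include <assert.h>
#include <string.h>

/* Toy command table; hypothetical names, not the Tcl API. */
typedef int (ToyObjProc)(void *clientData, int objc, const char *const objv[]);

typedef struct {
    const char *name;
    ToyObjProc *proc;
    void *clientData;
} ToyCmd;

static long g_lookups;  /* number of by-name resolutions performed */

static const ToyCmd *find_cmd(const ToyCmd *tab, int n, const char *name) {
    g_lookups++;
    for (int i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Generic path: resolve objv[0] by name on every call. */
static int eval_by_name(const ToyCmd *tab, int n, int objc, const char *const objv[]) {
    const ToyCmd *c = find_cmd(tab, n, objv[0]);
    return c ? c->proc(c->clientData, objc, objv) : -1;
}

/* "Stripped" path: resolve once, then call the cached proc pointer directly. */
typedef struct { ToyObjProc *proc; void *clientData; } CachedCmd;

static int cache_cmd(const ToyCmd *tab, int n, const char *name, CachedCmd *out) {
    const ToyCmd *c = find_cmd(tab, n, name);
    if (c == NULL) return 0;
    out->proc = c->proc;
    out->clientData = c->clientData;
    return 1;
}

static int empty_callback(void *cd, int objc, const char *const objv[]) {
    (void)cd; (void)objc; (void)objv;
    return 0;   /* like an empty callback: does nothing, returns OK */
}

/* Run 'calls' invocations of the empty callback; returns the number of lookups. */
static long lookups_for(int calls, int use_cache) {
    static const ToyCmd tab[] = { { "other", empty_callback, NULL },
                                  { "cb", empty_callback, NULL } };
    const char *const objv[] = { "cb", "arg" };
    CachedCmd cached = { NULL, NULL };
    g_lookups = 0;
    if (use_cache && !cache_cmd(tab, 2, "cb", &cached)) return -1;
    for (int i = 0; i < calls; i++) {
        int rc = use_cache ? cached.proc(cached.clientData, 2, objv)
                           : eval_by_name(tab, 2, 2, objv);
        assert(rc == 0);
    }
    return g_lookups;
}
```

In the real Tcl C API the nearest public equivalent I know of is Tcl_GetCommandInfo(interp, name, &info) followed by info.objProc(info.objClientData, interp, objc, objv). As far as I know this bypasses command lookup and execution traces, goes stale if the command is deleted or redefined after caching, and for a Tcl proc the objProc still enters the NRE internally, so it is not a full Tcl_EvalObjvEx.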
More information from KCachegrind:
exec#5 -> 'NHI1_BUILD/x86_64-suse-linux-gnu/release/example/c/perfclient' '--all-performance' '@' 'NHI1_EXT/x86_64-suse-linux-gnu/release/bin/../atl/bin/atlsh' 'NHI1_HOME/example/atl/perfserver.atl'
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):PerfClientExec }: start ------------------------
                                                                                                          : result      count / sec
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --send-nothing : 471426.8 942995 / 2.000300
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --send : 291266.1 582627 / 2.000326
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --send-and-callback : 151889.9 303852 / 2.000476
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --send-and-wait : 54482.2 108965 / 2.000011
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --parent : 91.5 183 / 2.000180
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --child : 3339.9 6680 / 2.000032
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --bus : 41171.3 82343 / 2.000010
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):statistics }: --bfl : 41091.9 82199 / 2.000372
C> {perfclient :pid(39303):tid(0x7f2f2c02ebc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7f2f2c019c20):PerfClientExec }: end: ----------------------------------------
releasedev1usr@linux02:~/Project/NHI1/theLink/tests> Nhi1Exec perfclient.c --all-performance @ perfserver.py
exec#5 -> 'NHI1_BUILD/x86_64-suse-linux-gnu/release/example/c/perfclient' '--all-performance' '@' 'NHI1_EXT/x86_64-suse-linux-gnu/release/bin/python3' 'NHI1_HOME/example/py/perfserver.py'
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):PerfClientExec }: start ------------------------
                                                                                                          : result      count / sec
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --send-nothing : 503074.6 1006250 / 2.000200
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --send : 309142.1 618398 / 2.000368
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --send-and-callback : 156562.8 313139 / 2.000086
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --send-and-wait : 74886.3 149773 / 2.000004
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --parent : 25.5 51 / 2.000217
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --child : 3758.3 7517 / 2.000085
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --bus : 44940.4 89881 / 2.000004
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):statistics }: --bfl : 43755.8 87512 / 2.000011
C> {perfclient :pid(39555):tid(0x7faace432bc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7faace41dc20):PerfClientExec }: end: ----------------------------------------
releasedev1usr@linux02:~/Project/NHI1/theLink/tests> Nhi1Exec perfclient.c --all-performance @ perfserver.tcl
exec#5 -> 'NHI1_BUILD/x86_64-suse-linux-gnu/release/example/c/perfclient' '--all-performance' '@' 'NHI1_EXT/x86_64-suse-linux-gnu/release/bin/tclsh8.6' 'NHI1_HOME/example/tcl/perfserver.tcl'
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):PerfClientExec }: start ------------------------
                                                                                                          : result      count / sec
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --send-nothing : 411179.5 822486 / 2.000309
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --send : 232887.1 465810 / 2.000154
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --send-and-callback : 127970.0 255998 / 2.000453
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --send-and-wait : 64507.6 129016 / 2.000011
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --parent : 29.5 59 / 2.000217
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --child : 3344.7 6691 / 2.000498
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --bus : 33603.9 67218 / 2.000303
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):statistics }: --bfl : 33314.0 66628 / 2.000001
C> {perfclient :pid(39843):tid(0x7fbbffe0cbc0):L:dlv(0):ctxId( 0):rc(1):ctx(0x7fbbffdf7c20):PerfClientExec }: end: ----------------------------------------
As I thought, the CORE problem is Tcl 8.6: tclsh8.6 loses roughly 10%-25% of throughput against PYTHON in most of these tests.
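To check that estimate test by test, the rates can be tabulated and compared. A small sketch follows (the array and helper names are hypothetical; the values are copied from the python3 and tclsh8.6 --all-performance runs above). It computes how much throughput tclsh8.6 loses relative to python3.

```c
#include <assert.h>
#include <string.h>

/* Rates (requests/sec) copied from the --all-performance logs. */
static const char *const kTests[] = {
    "--send-nothing", "--send", "--send-and-callback", "--send-and-wait",
    "--parent", "--child", "--bus", "--bfl"
};
static const double kPython[] = {
    503074.6, 309142.1, 156562.8, 74886.3, 25.5, 3758.3, 44940.4, 43755.8
};
static const double kTclsh[] = {
    411179.5, 232887.1, 127970.0, 64507.6, 29.5, 3344.7, 33603.9, 33314.0
};
#define N_TESTS (sizeof kPython / sizeof kPython[0])

/* Percent of throughput lost relative to ref (negative means 'other' is faster). */
static double loss_pct(double ref, double other) {
    return 100.0 * (ref - other) / ref;
}

/* Loss of tclsh8.6 vs python3 for a named test, or -1000.0 if the name is unknown. */
static double loss_for(const char *name) {
    for (size_t i = 0; i < N_TESTS; i++)
        if (strcmp(kTests[i], name) == 0)
            return loss_pct(kPython[i], kTclsh[i]);
    return -1000.0;
}

/* Name of the test where tclsh8.6 loses the most against python3. */
static const char *worst_test(void) {
    size_t worst = 0;
    for (size_t i = 1; i < N_TESTS; i++)
        if (loss_pct(kPython[i], kTclsh[i]) > loss_pct(kPython[worst], kTclsh[worst]))
            worst = i;
    return kTests[worst];
}
```

With these numbers the loss is about 18% for --send-nothing, about 11% for --child and at most about 25% for --bus, while --parent is actually faster under tclsh8.6.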
Best regards.