A Tcl extension for direct interaction with an FPGA-compiled C function

Sometimes I put a real interest of mine on these pages, and this is the start of one of those occasions!

I've been working with the Xilinx tools (see Use of TCL in Xilinx Vivado 2019 and Bwise blocks connecting FPGA accelerated C functions) on the old dream of Silicon Compilation, in this case turning a (more or less) regular C function into an FPGA program. The Vivado_hls tools allow interaction with such an FPGA-compiled function through direct AXI bus communication, which gives fast and reliable communication between the CPU and the FPGA from C code.

Here's my first example of a C function turned FPGA, with an also C-based Tcl extension communicating with it on a Parallella board, based on the Zynq 7010 ARM processor + FPGA. This is the C function put into the FPGA fabric: [L1 ] (it's actually normal C, not C++; it is based on an example but quite different now, and not particularly suited for a practical purpose outside testing). The usage of our relatively small (compared with more current UltraScales, and bigger models like those on AWS F1 nodes) but not bad FPGA for this lookup table with in-FPGA initialization is like this (for the Vivado 2019.2 tools, because the parameters change over time):

The bus protocol and data preparation mess is in here: [L2 ]. The big advantage is that this can work cleanly and efficiently.

Finally, the Tcl extension to talk with the FPGA over this interface: [L3 ]. Two parameters are passed to the function added to the Tcl interpreter, one of which isn't used at the moment, and the result, which after the first call to the FPGA can be blazing fast, is returned as an integer as well. Actually the lookup is 16 bits in and out, so there's a typecast. The compile went like this:

   gcc -shared -o libfpgacom.so -DUSE_TCL_STUBS -I/usr/include/tcl8.6/ fpgacom.c uio_mult_test_csin3lookup_tcl.c -L/usr/lib/arm-linux-gnueabihf/ -ltclstub8.6 -fPIC

Of course the tcl-dev package must be installed for the library, and I haven't discussed making the FPGA ".bit" file or loading it. I've done a video on the main line of that process if you're interested: [L4 ]. Here are the test commands I just tried the extension out with:

 (Tcl) 1 % load libfpgacom.so
 (Tcl) 2 % fpga
 (Tcl) 3 % fpga 0 0
 -32767
 (Tcl) 4 % fpga 0 44
 -32767
 (Tcl) 5 % fpga 16384 0
 -23170
 (Tcl) 6 % expr 1.0 * [fpga 16384 0] / [fpga 0 44]
 0.7071138645588549
 (Tcl) 7 % expr sin(3.14156535 /4);
 0.7071019545317011
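
The 16-bit typecast mentioned above can be pictured with a small C sketch. It is not the code behind [L3 ]; the helper names index_to_u16 and s16_from_word are made up for illustration, and it assumes the result arrives in the low 16 bits of a 32-bit bus word in two's complement form.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 16 bits of the integer that
   comes from the Tcl side before it is handed to the FPGA. */
uint16_t index_to_u16(int value)
{
    return (uint16_t)((unsigned)value & 0xFFFFu);
}

/* Hypothetical helper: read the low 16 bits of a 32-bit bus word as a
   signed two's complement number, without relying on
   implementation-defined narrowing casts. */
int s16_from_word(uint32_t word)
{
    uint32_t low = word & 0xFFFFu;
    return (low & 0x8000u) ? (int)low - 65536 : (int)low;
}
```

With these assumptions a bus word of 0xA57E reads back as -23170, the value returned by "fpga 16384 0" in the session above.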

To test whether the FPGA function is accurate, we can use these procs, which can also be used with the proctoblock BWise function:

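 # Look up table entry i in the FPGA (the second argument is currently unused).
 proc runfpga {i} {
    return [fpga $i 0]
 }

 # Compute the same table entry in software, using Maxima for the math.
 proc runsimu {i} {
    return [expr int( \
       [domaxima "float( 32768.0*sin( \
          3.1415926535 *  ($i - 65536/2) /  65536 \
       ))"] \
    )]
 }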

The call time for the fpga routine is about 2.5 microseconds, measured in a million-iteration for loop; the for {} {} {} {} loop by itself took ~5 seconds on the CPU. I didn't time a pure C program using the same handshake. The domaxima proc uses an external Maxima (also running on the edge computer board) for the math operations.

The program as shown has a sine table of short integers with indices 0-65535 (2^16 entries) covering -pi/2 to +pi/2. A few examples:

 (Tcl) 14 % runsimu 100
 -32767
 (Tcl) 15 % runfpga 100
 -32767
 (Tcl) 16 % runsimu 32768
 0
 (Tcl) 17 % runfpga 32768
 0

It's also a good idea to check the FPGA-compiled results with a regular C compiler, for instance on the Zynq board, or in this case on an x86_64 machine with gcc 8.3.1, using the actual code used for the Xilinx tools, with a few added defines for data types that are not part of normal C, and an obvious "main()" calling the same function as is used for the silicon compile: [L5 ]. Testing this C version:

 $ ./lookupsim 65535
 -32768
 $ ./lookupsim 65534
 -32768
 $ ./lookupsim 65500
 32767
 $ ./lookupsim 16384
 -23170
 $ ./lookupsim 16383
 -23171
 $ ./lookupsim `echo "16384 32768 +p" | dc`
 23170
 $ ./lookupsim 0
 -32768
 $ ./lookupsim 500
 -32758

It is possible that the type casting and the exact interpretation of binary numbers differ between a regular C compiler and the Xilinx tools.
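
To make concrete how such conversions can disagree by one count, here is a small C sketch of three common ways to turn a scaled floating point value into an integer, plus clamping to the signed 16-bit range. The function names are invented for illustration, and nothing here establishes which of these behaviours gcc or the Xilinx flow actually applies to the table code.

```c
#include <stdint.h>

/* Truncate toward zero, which is what a plain C cast does. */
int to_int_trunc(double x)
{
    return (int)x;
}

/* Round toward minus infinity. */
int to_int_floor(double x)
{
    int t = (int)x;
    return (x < (double)t) ? t - 1 : t;
}

/* Round to nearest, with halves going away from zero. */
int to_int_nearest(double x)
{
    return (x >= 0.0) ? (int)(x + 0.5) : (int)(x - 0.5);
}

/* Clamp to the signed 16-bit range instead of wrapping around. */
int16_t saturate_s16(int v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}
```

For example, 32768*sin(-pi/4) is about -23170.47: truncation and round-to-nearest both give -23170, while floor gives -23171. A full-scale value of 32768 does not fit in 16 signed bits at all, so it either saturates to 32767 or wraps to -32768, depending on the conversion used.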