List computations in an FPGA, driven by Tcl

by Theo Verelst

In this age of fast computers, one might forget that fast computers and their programs also have limitations, and that those limitations are worth thinking about and working around. Tcl is a list processing language, regardless of how the result eventually ticks on a processor after interpretation or compilation. So it seemed like fun to do a little homework with list processing on an FPGA [1].

It is fashionable to add FPGA (Field Programmable Gate Array) hardware to computer farms to speed up computations, even with C-to-HDL tools such as those available from Xilinx and Nallatech. Here I do modest experiments with a Xilinx Spartan 3E FPGA (500-4) from the Xilinx Spartan 3E starter kit (easy to find on xilinx.com). The chip is fairly fast but not beastly, and moderately sized: not small, but certainly not large by modern standards. I use the ISE WebPACK (which includes Tcl scripting, though I don't use it here) to define the schematic design (with HDL parts) and to create the chip programming files.

The homework assignment was first to make one or a few of the built-in block memories (2 kilobytes each) in the Xilinx chip readable and writable over the serial port, which in this case runs at only 9600 baud. The logic in the FPGA runs at 50 MHz. Running it at double that frequency or more, using an internal PLL circuit, has at times also proven to work fine. The first trick is to take one memory and give it checksum-type processing, which adds up all its 2048 words in binary and gives back the result, with the adding running at 50 MHz. That part now works as described below. The design files are here:
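To make the checksum idea concrete, here is a small software model of it in Tcl. It is a sketch under assumptions, not the FPGA design. The procedure names checksum8 and passTimeMicroseconds are made up for this example. The 8-bit (modulo 256) result width is also assumed, because the hardware register width is not stated here. The second procedure shows what one word per clock cycle means at 50 MHz: a full pass over 2048 words takes 2048 cycles of 20 ns.

```tcl
# Hypothetical software model of the block memory checksum:
# add up all bytes of one memory. The modulo-256 result width is an
# assumption, not taken from the hardware design.
proc checksum8 {mem} {
    set sum 0
    foreach byte $mem {
        set sum [expr {($sum + $byte) & 0xFF}]
    }
    return $sum
}

# Time for one full pass if one word is added per clock cycle.
proc passTimeMicroseconds {words clockHz} {
    return [expr {$words * 1.0e6 / $clockHz}]
}

# Example: 2048 bytes, all zero except one byte of 0x2A at address 8
set ram [lrepeat 2048 0]
lset ram 8 0x2A
puts [format "checksum = 0x%02X" [checksum8 $ram]]                 ;# checksum = 0x2A
puts [format "one pass = %.2f us" [passTimeMicroseconds 2048 50e6]] ;# one pass = 40.96 us
```

A model like this is handy for comparing against what the FPGA sends back after a series of writes, as long as the real checksum width matches the assumption.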

   http://www.theover.org/Xilinx/Lists.zip

(the whole ISE 8.2 project, free for non-commercial use). The serial link is a self-adapted wireless microcontroller programmer from TI, which acts as a USB serial port, so it works on a notebook. The project can be (re-)built in a couple of minutes on a fairly fast Dell M90, with the project files on a normally fast file server over Gb Ethernet. As stated, the design is tested and driven with Tcl commands.

The main schematic diagram looks like this:

[Figure: main schematic diagram (checksum1.png)]

The design is tested over the serial connection with this script, adapted from another application [2]:

 ####################################################################################
 #                                     FM setup                                     #
 ####################################################################################
 ##
 ## By Ir. Theo Verelst, http://www.theover.org
 ##
 ##
 ####################################################################################

 # this shows the console window to see output and type Tcl/Tk commands at startup,
 # close it in the console with : console hide, or comment it out here by putting 
 # a hash in front of it.
 console show


 # on Unix, this requires an extra procedure file, and the COM port is named differently, too
 # See https://wiki.tcl-lang.org/786  and https://wiki.tcl-lang.org/447 , on Redhat e.g. : /dev/ttyS0
 # the unchanged FPGA program IO rate is: 115200

 # open the serial port
 # (fill in COM5 and 9600 as you need)
 set fh [open COM5: RDWR]
 fconfigure $fh -blocking 0 -mode 9600,n,8,1 -translation binary -buffering full

 # write a hex code to the serial port
 #
 proc w h {
    global fh
    foreach c [split $h] {
       puts -nonewline $fh [binary format H* $c]
    }
    flush $fh
 }

 # when reply bytes arrive, print each one in hex
 # and plot it as a small dot on the canvas
 fileevent $fh readable {
    global sync
    set w [read $fh]
    foreach c [split $w {}] {
       binary scan $c H* v
       puts " $v"
       # the same byte as a signed value (-128..127) sets the vertical position
       binary scan $c c t3
       set t1 [expr {256 - 2*$t3}]
       set t2 [expr {$t1 + 2}]
       set x [expr {$sync / 2}]
       # the dot colour cycles with the byte position modulo 4
       switch [expr {$sync % 4}] 0 {set col blue} 1 {set col red} 2 {set col yellow} 3 {set col green}
       .c create oval $x $t1 [expr {$x + 1}] $t2 -fill $col -outline $col -tag gr
       incr sync
    }
 }

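 # create the canvas and the byte counter before processing events,
 # so the reply handler can draw as soon as bytes come in
 canvas .c -bg grey90 ; pack .c -expand y -fill both
 set sync 0
 update
 set playnote 1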

 puts "Sending graph data."

 .c del gr ; set x 2 ; set sync 0
 #for {set h 0} {$h<256} {incr h} {w FF[format "%02x" $h]0000}
 # for {set i 0} {$i < 2048} {incr i} {set m($i) [expr $i%256]}
 # for {set i 0} {$i < 2048} {incr i} {w "07[format %02x $m($i)][format %04x $i][format %02x $m($i)][format %02x $m($i)]0000" }
 #for {set i 0} {$i < 9} {incr i} {w "01[format %02x $m($i)][format %04x $i]00000000" }

 puts "Done."

For the moment, ignore the comments and the graph drawing part. The main use is to examine the FPGA memories with the command 'w 0000HAHL00000000', where HA and HL are the high and low address bytes. The 16 hex digits form eight 2-digit hex numbers, which are sent as consecutive bytes over the serial port. This is taken literally from a test session:

 (Tcl) 133 % w 0000000800010000
 (Tcl) 134 %  00
  00
  00
  ff

It shows the memory contents at address 8, which are three zeros (one per memory), while the checksum for the given memory contents was FF (hex). The serial protocol is 2:1 asymmetrical: every 8 bytes sent produce 4 bytes in return, with no flow control. The hardware automatically sends back one byte for every two bytes received. It also keeps an internal modulo-8 byte counter, which has to be in sync with the host and stay that way. If necessary, the fast Xilinx USB programmer can reprogram the whole FPGA to its initial state within a second.
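The host side of this 2:1 protocol can be modelled with two small helpers. This is a sketch and not part of the original script. The names advanceCounter and groupReplies are hypothetical. I also assume the reply bytes are collected as two-digit hex strings, the same way the fileevent handler prints them. The first helper mirrors the FPGA's modulo-8 counter. The second groups loose reply bytes into complete 4-byte replies.

```tcl
# Hypothetical model of the FPGA's modulo-8 byte counter: a complete
# 8-byte command leaves it unchanged; any other count puts host and
# FPGA out of step.
proc advanceCounter {counter nbytes} {
    return [expr {($counter + $nbytes) % 8}]
}

# Group a flat list of reply bytes (hex strings) into complete 4-byte
# replies. Returns {tuples leftover}, where leftover holds the bytes
# of a reply that is not complete yet.
proc groupReplies {bytes} {
    set n [expr {[llength $bytes] / 4 * 4}]
    set tuples {}
    for {set i 0} {$i < $n} {incr i 4} {
        lappend tuples [lrange $bytes $i [expr {$i + 3}]]
    }
    return [list $tuples [lrange $bytes $n end]]
}

puts [advanceCounter 0 8]                ;# 0: still in sync
puts [advanceCounter 0 7]                ;# 7: one byte short, out of sync
puts [groupReplies {00 00 00 ff 00 00}]  ;# {{00 00 00 ff}} {00 00}
```

Keeping any leftover bytes separate, instead of printing them one by one, makes it easier to see when a reply straddles two reads from the port.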

To write to the memories, the command is 'w 0wD1HAHLD2D30000'. Here D1..D3 are the data bytes for memories 1..3, and HA and HL are the high and low address bytes (HA uses only 3 bits, enough for 2 kilobytes). The w digit holds the write bits: bit 0 for memory 1, bit 1 for memory 2 and bit 2 for memory 3, where 1 means write and 0 means read.
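Building these 16-digit strings by hand is error-prone. Below is a sketch of a helper that assembles them from a write mask, an address and the data bytes, following the byte layout 0w D1 HA HL D2 D3 00 00 described above. The name memCmd and its argument order are my own and hypothetical. A mask of 0 produces a read command.

```tcl
# Hypothetical helper: build the command string for the 'w' procedure.
# writeMask: bit 0 = memory 1, bit 1 = memory 2, bit 2 = memory 3 (1 = write)
# addr:      0..2047, split into HA (3 bits) and HL (8 bits)
proc memCmd {writeMask addr {d1 0} {d2 0} {d3 0}} {
    if {$addr < 0 || $addr > 2047} {
        error "address $addr outside 0..2047"
    }
    set ha [expr {($addr >> 8) & 0x07}]
    set hl [expr {$addr & 0xFF}]
    return [format "%02x%02x%02x%02x%02x%02x0000" \
        [expr {$writeMask & 0x07}] $d1 $ha $hl $d2 $d3]
}

puts [memCmd 0 8]            ;# 0000000800000000  read address 8
puts [memCmd 4 8 0 0 0xfe]   ;# 0400000800fe0000  write 0xfe to memory 3
```

On the console this would be used as 'w [memCmd 4 8 0 0 0xfe]'. The initialization loop further down could likewise send 'w [memCmd 4 $i 0 0 0xff]' for each address.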

 (Tcl) 134 % w 0400000800fe0000
 (Tcl) 135 %  00
  00
  00
  fe

The checksum immediately reflected the change in data. It is computed after the last byte of a command has been received, and the result comes back as the first byte of the next 4-byte reply. I had initialized the memory with 0xFF bytes using:

 for {set i 0} {$i < 2048} {incr i} {w "0400[format %04x $i]00[format %02x 255]0000" }
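Over the slow serial link, this loop takes a while. The following rough estimate assumes 8N1 framing (10 bit times per byte: start bit, 8 data bits, stop bit) and no gaps between bytes. The procedure name serialSeconds is hypothetical.

```tcl
# Rough serial transfer time, assuming 8N1 framing (10 bit times per
# byte) and back-to-back bytes without gaps.
proc serialSeconds {nbytes baud {bitsPerByte 10}} {
    return [expr {double($nbytes) * $bitsPerByte / $baud}]
}

# 2048 commands of 8 bytes each at 9600 baud
puts [format "%.1f s" [serialSeconds [expr {2048 * 8}] 9600]]   ;# 17.1 s
```

So filling one memory takes roughly 17 seconds from the host. By contrast, one on-chip pass over the same 2048 words at 50 MHz, with one word per clock, takes about 41 microseconds.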

The checksum computation over the 2048-byte memory appears to work correctly at 50 MHz. Next on the menu are lookups. Unlike in many computer architectures, the block memory can return data in one clock cycle under all circumstances. That means indirections can be done in fast loops without any cache planning. Hold on.
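To illustrate the kind of indirection meant here, this sketch models a linked list stored in a memory, where each entry holds the address of the next element. Several things are assumptions. The name chaseList is hypothetical. The value 2047 is chosen as an end-of-list marker. Each entry is assumed to be wide enough to hold an 11-bit address; with the 8-bit memories of this design, that would require something like two memories side by side. In hardware, each lindex step would be one block RAM read, i.e. one clock cycle.

```tcl
# Hypothetical model of list traversal by indirection in one memory:
# each entry holds the address of the next element; endMark (assumed
# 2047 here) terminates the list.
proc chaseList {mem start {endMark 2047}} {
    set path {}
    set addr $start
    set steps 0
    while {$addr != $endMark} {
        lappend path $addr
        set addr [lindex $mem $addr]   ;# one memory read per step
        if {[incr steps] > [llength $mem]} {
            error "cycle detected"
        }
    }
    return $path
}

# A small list 5 -> 100 -> 7 -> end in an otherwise "empty" memory
set mem [lrepeat 2048 2047]
lset mem 5 100
lset mem 100 7
lset mem 7 2047
puts [chaseList $mem 5]   ;# 5 100 7
```

The step limit guards against a corrupted memory that links back into itself. A hardware version would likely need a similar timeout.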


See also small formula rendering tests, Bwise, a serial port tcl script and a Xilinx demo board