CInvoke: Another Foreign Function Interface

MiR 2023-03-08 I played around with CInvoke, a very old implementation of a basic Foreign Funtion Interface from https://www.nongnu.org/cinvoke/ Finally, after adapting the code base of CInvoke a bit and implementing a basic TCL library, I came up with a proof of concept. I know, that there are two much more elaborate libs, ffidl and cffi, wich take care of ffi under TCL, so there actually is no need for another ffi library :-) but I looked for something not based on libffi with a smaller codebase to be easily compiled and the got carried away while coding.

The whole thing will be hosted on my github page [L1 ], I also upload the original CInvoke codebase, so you can easily see the changes I applied.

2023-03-17: Meanwhile some bugfixes (there are still some left, don't worry) and basic code cleaning took place. The actual repository seems to be stable, it compiles under linux_86_64 and win32 with gcc successfully.

Moreover I added a spin-off to the tcc4tcl-dev repo [L2 ], so it now can take tagged pointers from CInvoke. Therefor it is now possible, to build CStruct and CType in TCL, manipulate it and give it's pointer to a C-function to compile with tcc4tcl (or any other compiler) with just adding a simple #include and package require to the Tcl source

Some words on CInvoke

First, this beast is really old, so the plattfrom specific support is a bit dated. There is support for linux x64 (tested, works) and Linux x86 (didn't test), Win32 (had to patch this to work with gcc, works, Original version seems to be destined to MSVC) and there is some sparc code and ppc_osx, wich I didn't look at. Second, the implementation uses the stack pointer to post function arguments on the stack and newer versions of gcc complain about clobbering the stack register. The code still compiles, but be warned, other compiler might block this behaviour (TCC for example doesn't warn but will error out on this) So, while it was fun to code this little piece, I have no hope this will hold up to the future... and break when gcc decides to error out also on clobbering &esp.

I took some inspiration (and code) as well from ffidl as from cffi. The whole tcl-stubs implementation is taken from ffidl, the pointer-tagging is taken from cffi. The rest is implemented from scratch. Since I'm no professional coder I know, that the code could need a reasonable amount of refactoring and de-spagetthifiing... but anyway, limited on time resources I publish this anyway.

Introduction

Approaching the interface I decided to take a more TCL route and separated out the main funtionality into Commands. There are 5 main Commands:

CInvoke
CType
CData
CStruct
CCallback

Each of has an initial command to instance an ensaemble of named commands, wich will handle the further work. See examples beneath.

and there are some support commands

typedef
sizeof
PointerCast

CInvoke

implements the main interface to loading dynamic libraries, getting symbol adresses and invoke functions on it.

To open a dll, instance a CInvoke command:

CInvoke pathto/libname[info sharedlibextension] mylib

the resulting command mylib has two subcommands: getadr and function

mylib getadr SYMBOLNAME
mylib function SYMBOLNAME TCL-PROCNAME _returntype_ {LIST _paramtypes_}

Usage: Given a dll, that holds the symbol hellofunc

set hellodata [mylib getadr hellofunc]

will return a tagged pointer 0x0000012345^SYMBOL

The SYMBOL tag signals, that this is an external symbol. If you need to use this directly you will have to PointerCast this to a usable type. (See the chapter on pointers beneath)

Given a function in yourlib.dll/.so defined as int hellofunc(int a, int b); You will define a function as follows:

mylib function hellofunc Tcl_hellofunc int {int int}
#Now we can call 
set result [hellofunc 123 456]

...

CType

CType represents typed values in TCL space. These are useful, if we need a referenced value like int* We will use these also to instance CStruct (see beneath)

CType int myint 123
set myint 456
puts [myint get]
set myintptr [myint getptr]

myint set "1234"
myint get
myint setptr
myint getptr
myint getptr*

getptr and getptr* can be used for foreign functions, that use pointers as arguments. It is typesafe and can be used to retrieve values by pointer. How differ getptr and getptr*? Well, the value of a CType is a memoryadress, so getptr basically returns this adress. If the value is already a ptr type, getting it's adress returns a double ptr. Sometimes we need the value of the pointer, and that's where getptr* comes into play, it returns the ptr-value.

Samples for getptr: Given a dll holds a function int hellofunc(int* a, int* b); that returns values by pointer in *a and *b

CType int a 123
CType int b 456
mylib function hellofunc tcl_hellofunc int {ptr.int ptr.int} 
set ok [tcl_hellfunc [a getptr] [b getptr]]
puts "a: [a get] b: [b get]"

Sample for getptr* our lib now has a routine, that works on a struct with data. The ptr data needs to point to the bytearray, where the data is stored.

# define bytearray with 1024 bytes length
CDATA arrtype 1024

# declare struct
CStruct withdata {
    ptr data
    int size
}
# and define a custom type on it
typedef struct withdata

# instance the struct
CType withdatavar withdata

# and init the fields
set arrptr [arrtype getptr*]
withdatavar set data $arrptr
withdatavar set size 1024

#now we can give this to a foreign function
# using the struct type identifier as parametertype will garantee at least that the pointer refers to a valid struct
# and this will resolve the ptr as struct accordingly
ci function test19 test19 int withdata
set rv [test19 withdatavar]

# the according c code could look like this
typedef struct _withdata {
    char* mydata;
    int size;
} withdata;

DLLEXPORT int test19(withdata* wsb) {
    printf ("Test19: Size of withdata: %lu\n", sizeof(withdata));
    printf ("test19: size of data %dn", wsb->size);
    char* data;
    char* dest=(char*)wsb->mydata;
    for (int i=0; i<wsb->size; i++) {
        *(dest+i)=(wsb->size-i)%255;
    }
    return 0;
    
//
}

...

Basic types are listed beneath and you can even derive own types:

typedef int myint
typedef uint myuint

Syntax

typdef typename basetype (size signed)

Signed will invoke a cast to signed /unsigned values, where applicable. Size actually gets ignored :-)

typedef can also be used to typedef structs, see according chapter about CStruct

Arrays can be built with CTypes or as parts of CStructs

CType int(5) anarray {1 2 3 4 5}
anarray get 0
anarray set 0 1
anarray set {1 2 3 4 5}
anarray set 2 {3 4 5}

anarray length 5

...

Basic types

Hardcoded into the system are

funcdamental type from c 

char
short
int
long
longlong

uchar
ushort
uint
ulong
ulonglong

float
double

uint8_t
uint16_t
uint32_t
uint64_t
int8_t
int16_t
int32_t
int64_t


a generic ptr type
ptr

ptr representing a struct
struct

Stringtypes
string              tcl allocated utf8-string
utfstring           tcl allocated utf16-string
char*               sys allocated utf8-string

Datatypes
bytearray           ptr representing a bytearray (char)
bytevar             tcl-var to represent bytearray

Special Types
tclobj              a Tcl_Obj*, that will be passed as ptr
tclproc             a Tcl_Proc (for callbacks or callouts)

interp              the actual instance of Tcl_Interp*

Extended types
will be defined depending on System, under Win32 for example there are

_Bool
bool
dev_t
ino_t

off_t
size_t
time_t


BOOL
BOOLEAN
BYTE
CHAR
DWORD
DWORDLONG
DWORD_PTR
HALF_PTR
INT
INT_PTR
LONG
LONGLONG
LONG_PTR
LPARAM
LRESULT
SHORT
SIZE_T
SSIZE_T

UCHAR
UINT
UINT_PTR
ULONG
ULONGLONG
ULONG_PTR
USHORT
WORD
WPARAM

...

CDATA

CDATA is a specialized CType, that can handle bytearrays

Usage

CDATA mydata nbytes
mydata set "1234"
mydata get
mydata setptr
mydata getptr
mydata getptr*
mydata memcpy
mydata peek @
mydata poke @ value

...

CStruct

CStruct can be used to construct c-like structs, set/get named values and pass them to functions (pass-by-pointer only). Substructs and Unions can be used, but they connot be named structs. So naming follows an explicit scheme like sub.a sub.b etc

CStruct declares the definition, typedef prepares the type and CType instances a type of a struct.

Usage

CStruct structname {
   int a
   int b
   struct {
      int sub.a
      int sub.b
   }
}
typedef struct structname
CType mystructinstance structname

mystructinstance set a 1234
mystructinstance get a
mystructinstance setptr 0x000xxx^structname
mystructinstance getptr
mystructinstance memcpy << from // >> to (ptr)
mystructinstance info 
mystructinstance size
mystructinstance struct

mystructinstance info, size and struct give information on the underlying struct definition.

memcpy can be used to copy structs of the same type onto another, >> and << define the underlying direction to memcpy.

Sample:

typedef ptr CDATA
CStruct Tk_PhotoImageBlock { 
    CDATA pixelPtr
    int width
    int height
    int pitch
    int pixelSize
    int red
    int green
    int blue
    int reserved
}
typedef struct Tk_PhotoImageBlock

set Tk_FindPhoto        [cinv::stubsymbol tk stubs 64]; #Tk_FindPhoto
set Tk_PhotoGetImage    [cinv::stubsymbol tk stubs 146]; #Tk_PhotoGetImage

CType block Tk_PhotoImageBlock
ci function $Tk_PhotoGetImage Tk_PhotoGetImage int {ptr.pblockh Tk_PhotoImageBlock} 
ci function $Tk_FindPhoto Tk_FindPhoto ptr.pblockh {interp string}

# first we lookup an image by name; be sure to have defined this beforehand
set PBlockH [Tk_FindPhoto . $imgname]
Tk_PhotoGetImage $PBlockH [block getptr]

set nbytes [expr {[block get height]*[block get pitch]}]
CDATA bytes $nbytes
bytes memcpy << [block get pixelPtr]
CDATA bytes2 $nbytes
bytes2 setptr [block get pixelPtr] mem_none
if {[bytes get] ne [bytes2 get]} {
    puts "Error, got different data blocks"
}

CStrcut can also declare nested structs.

CStruct subs {
    char c
    int i
    long l
}
CStruct subs2 {
    char c2
    int i2
    long l2
    subs sub1
}

CStruct testsubs {
    int a
    subs sub1
    subs2 sub2
}

#unfolded this will be constructed as
CStruct testsubs {        
    int a 
    struct {
        char sub1.c 
        int sub1.i 
        long sub1.l
    } 
    struct {
        char sub2.c2 
        int sub2.i2 
        long sub2.l2 
        struct {
            char sub2.sub1.c 
            int sub2.sub1.i 
            long sub2.sub1.l
        }
    }
}

...

On Pointers and Scriptinglevel FFI

The cffi webpage has a good introduction into pointers as seen from a TCL point of view. Pointers in an FFI are the easiest way to crash your script, so be careful, there might be dragons, no there ARE dragons. I stole the whole tagged pointer concept and part of the code from cffi... so it basically works the same way.

Basic Pointer type is called ptr

mylib function hellofunc tcl_hellofunc int {ptr ptr}

The basic ptr type is convertible in both ways, you can give a ptr type to all pointers and pass a ptr type everywhere.

If you specify a special pointer, lets say to an int* this will get ptr.int

mylib function hellofunc tcl_hellofunc int {ptr.int ptr.int}

As soon as a ptr is specified, it will only accept ptr tagged with the same type.

So, given a function in your dynlib defined as char* hellofunc(int* a, int* b); will get

mylib function hellofunc tcl_hellofunc ptr.char* {ptr.int ptr.int}

To feed the pointers to the function we need a CType:

#define two int 
CType int a
CType int b
set a 123; set b 456;

#create function
mylib function hellofunc tcl_hellofunc int {ptr.int ptr.int}

#call it with the pointers
puts [tcl_hellofunc [a getptr] [b getptr]]

#hellofunc has the signature int hellofunc(int* a, int* b);

Basically, the tag can be any type predefined or user defined. You can even use arbitrary tags, if the pointer is opaque and doesn't need to be dereferenced on the script level

mylib function opaqueresultfunction tcl_opaqueresultfunction ptr.opaquetype {int int}
mylib function takeopaqueptrfunction tcl_takeopaqueptrfunction int {ptr.opaquetype}

set ptropaque [tcl_opaqueresultfunction 123 456]
tcl_takeopaqueptrfunction $ptropaque

...

Sample code: Create a TCL-Command from TCL via CInvoke

CInvoke "" mylib
set Tcl_CreateObjCommand    [cinv::stubsymbol tcl stubs 96]; #Tcl_CreateObjCommand

# setup an arbitrary clientdata structure to show how that works
CStruct clientdata {
    int a
    int b
}
typedef struct clientdata

# instance and init it
CType cd1 clientdata
puts "Setting values"
cd1 set a 10
cd1 set b 20

# this will be our Tcl_Command later
# it carries the typical signature ( ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj * CONST objv[])
proc tcl_command {cdata interp objc objv} {
    # get the clientdata 
    CType _cdata clientdata
    _cdata setptr $cdata mem_none
    puts "got [_cdata get a] and [_cdata get b] from clientdata"
    # takeover the objv arguments
    CType objarr tclobj 
    CType testest int
    objarr setptr $objv mem_none
    objarr length $objc
    puts "len: [objarr length]"
    # iterate over the arguments   
    for {set i 0} {$i<$objc} {incr i} {
        puts "Got arg $i [objarr get $i]"
    }
    incr i -1
    _cdata set a [expr [_cdata get a]+1]
    _cdata set b [expr [_cdata get b]-1]
    # cleanup
    return 0
}

#create a callback from tclproc with the classic tcl_objcommand signature
#( ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj * CONST objv[])
CCallback tclcommand tcl_command int {clientdata ptr int ptr}

# create tcl obj command with tclproc as callback and a cstruct as clientdata
mylib function $Tcl_CreateObjCommand Tcl_CreateObjCommand ptr {interp string tclproc clientdata ptr} 
Tcl_CreateObjCommand "." "test_tclcommand" [tclcommand getptr] [cd1 getptr] NULL

#run
test_tclcommand arg1 arg2 {l1 l2 l3}

...