[llvmtcl logo] **News** * [jdc] 29-jul-2013 Support for LLVM 3.3 * [jdc] 10-jan-2013 Support for LLVM 3.2, added support for `PassManagerBuilder`s * [jdc] 24-nov-2011 Support for LLVM 3.0 * [jdc] 15-apr-2011 Support for LLVM 2.9 * [jdc] 28-may-2010 Made namespace ensemble. * [jdc] 27-may-2010 Converted into a TEA package. * [jdc] 21-may-2010 To learn [LLVM] I made a wrapper for LLVM's C API. This wrapper is available at: http://github.com/jdc8/llvmtcl **Requirements** * Tcl 8.6 * LLVM 3.2, get it from http://llvm.org/releases/ **Building the wrapper** The package is a [TEA] package, build it like this: ======none $ cd llvmtcl $ export CC=g++ $ ./configure --with-tcl=... --with-llvm-config=... $ make $ make test $ make install ====== **Using the LLVM API** The package makes a `llvmtcl` ensemble command. The subcommands are the LLVM C API functions with `LLVM` trimmed from the front. Functions taking a pointer argument followed by an `unsigned` argument to specified the number of elements the pointer is pointing to are converted to ensemble commands where the pointer and the `unsigned` argument are replaced by a Tcl [list]. All supported types, enumerators and functions can be found in http://github.com/jdc8/llvmtcl/blob/master/llvmtcl-gen.inp An example using the package can be found at: http://github.com/jdc8/llvmtcl/blob/master/examples/test2.tcl. It first builds and verifies a LLVM module and function: ======tcl lappend auto_path . package require llvmtcl # Initialize the JIT llvmtcl LinkInJIT llvmtcl InitializeNativeTarget # Create a module and builder set m [llvmtcl ModuleCreateWithName "testmodule"] set bld [llvmtcl CreateBuilder] # Create a plus10 function, taking one argument and adding 6 and 4 to it set ft [llvmtcl FunctionType [llvmtcl Int32Type] [list [llvmtcl Int32Type]] 0] set plus10 [llvmtcl AddFunction $m "plus10" $ft] # Create constants set c6 [llvmtcl ConstInt [llvmtcl Int32Type] 6 0] set c4 [llvmtcl ConstInt [llvmtcl Int32Type] 4 0] # Create the basic blocks set entry [llvmtcl AppendBasicBlock $plus10 entry] # Put arguments on the stack to avoid having to write select and/or phi nodes llvmtcl PositionBuilderAtEnd $bld $entry set arg0_1 [llvmtcl GetParam $plus10 0] set arg0_2 [llvmtcl BuildAlloca $bld [llvmtcl Int32Type] arg0] set arg0_3 [llvmtcl BuildStore $bld $arg0_1 $arg0_2] # Do add 10 in two steps to see the optimizer @ work # Add 6 set arg0_4 [llvmtcl BuildLoad $bld $arg0_2 "arg0"] set add6 [llvmtcl BuildAdd $bld $arg0_4 $c6 "add6"] # Add 4 set add4 [llvmtcl BuildAdd $bld $add6 $c4 "add4"] # Set return llvmtcl BuildRet $bld $add4 # Show input puts "----- Input --------------------------------------------------" puts [llvmtcl DumpModule $m] # Verify the module lassign [llvmtcl VerifyModule $m LLVMReturnStatusAction] rt msg if {$rt} { error $msg } ====== This results in the following LLVM bit code: ======none define i32 @plus10(i32) { entry: %arg0 = alloca i32 ; [#uses=2] store i32 %0, i32* %arg0 %arg01 = load i32* %arg0 ; [#uses=1] %add6 = add i32 %arg01, 6 ; [#uses=1] %add4 = add i32 %add6, 4 ; [#uses=1] ret i32 %add4 } ====== Now execute it: ======tcl # Execute lassign [llvmtcl CreateJITCompilerForModule $m 0] rt EE msg set i [llvmtcl CreateGenericValueOfInt [llvmtcl Int32Type] 4 0] set res [llvmtcl RunFunction $EE $plus10 $i] puts "plus10(4) = [llvmtcl GenericValueToInt $res 0]\n" ====== Now optimize the LLVM module: ======tcl # Optimize llvmtcl Optimize $m $plus10 ====== Result of optimization: ======none define i32 @plus10(i32) readnone { entry: %add4 = add i32 %0, 10 ; [#uses=1] ret i32 %add4 } ====== **Transforming Tcl into LLVM bit code** The LLVM wrapper has limited support for converting Tcl into LLVM bit code and is based on the output of [tcl::unsupported::disassemble]. Current (stringent) limitation are: * all variables are 32 bit integers (no strings, floats, lists, arrays, dicts, ...) * all proc's return a single 32 bit integer * all proc's must be know at convert time ***Simple example*** Take this simple Tcl procedure as input: ======tcl proc test2 {a b c d e} { return [expr {4+$a+6}] } ====== The [tcl::unsupported::disassemble] output of this example looks like this: ======none ByteCode 0x0x9b4ffe8, refCt 1, epoch 4, interp 0x0x9a9f3b0 (epoch 4) Source "\n return [expr {4+$a+6}]\n" Cmds 2, src 28, inst 10, litObjs 2, aux 0, stkDepth 2, code/src 0.00 Proc 0x0x9b44b38, refCt 1, args 5, compiled locals 5 slot 0, scalar, arg, "a" slot 1, scalar, arg, "b" slot 2, scalar, arg, "c" slot 3, scalar, arg, "d" slot 4, scalar, arg, "e" Commands 2: 1: pc 0-8, src 5-26 2: pc 0-7, src 13-25 Command 1: "return [expr {4+$a+6}]" Command 2: "expr {4+$a+6}" (0) push1 0 # "4" (2) loadScalar1 %v0 # var "a" (4) add (5) push1 1 # "6" (7) add (8) done (9) done ====== Translating it to llvm with the `llvmtcl Tcl2LLVM` command results in: ======none define i32 @test2(i32, i32, i32, i32, i32) { block0: %5 = alloca [100 x i8*] ; <[100 x i8*]*> [#uses=10] %6 = alloca i32 ; [#uses=20] store i32 0, i32* %6 %7 = alloca i32 ; [#uses=2] store i32 %0, i32* %7 %8 = alloca i32 ; [#uses=1] store i32 %1, i32* %8 %9 = alloca i32 ; [#uses=1] store i32 %2, i32* %9 %10 = alloca i32 ; [#uses=1] store i32 %3, i32* %10 %11 = alloca i32 ; [#uses=1] store i32 %4, i32* %11 %push = alloca i32 ; [#uses=2] store i32 4, i32* %push %push1 = load i32* %6 ; [#uses=2] %push2 = getelementptr [100 x i8*]* %5, i32 0, i32 %push1 ; [#uses=1] %12 = bitcast i32* %push to i8* ; [#uses=1] store i8* %12, i8** %push2 %push3 = add i32 %push1, 1 ; [#uses=1] store i32 %push3, i32* %6 %13 = load i32* %7 ; [#uses=1] %push4 = alloca i32 ; [#uses=2] store i32 %13, i32* %push4 %push5 = load i32* %6 ; [#uses=2] %push6 = getelementptr [100 x i8*]* %5, i32 0, i32 %push5 ; [#uses=1] %14 = bitcast i32* %push4 to i8* ; [#uses=1] store i8* %14, i8** %push6 %push7 = add i32 %push5, 1 ; [#uses=1] store i32 %push7, i32* %6 %pop = load i32* %6 ; [#uses=1] %pop8 = add i32 %pop, -1 ; [#uses=2] store i32 %pop8, i32* %6 %pop9 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop8 ; [#uses=1] %pop10 = load i8** %pop9 ; [#uses=1] %pop11 = bitcast i8* %pop10 to i32* ; [#uses=1] %pop12 = load i32* %pop11 ; [#uses=1] %pop13 = load i32* %6 ; [#uses=1] %pop14 = add i32 %pop13, -1 ; [#uses=2] store i32 %pop14, i32* %6 %pop15 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop14 ; [#uses=1] %pop16 = load i8** %pop15 ; [#uses=1] %pop17 = bitcast i8* %pop16 to i32* ; [#uses=1] %pop18 = load i32* %pop17 ; [#uses=1] %15 = add i32 %pop18, %pop12 ; [#uses=1] %push19 = alloca i32 ; [#uses=2] store i32 %15, i32* %push19 %push20 = load i32* %6 ; [#uses=2] %push21 = getelementptr [100 x i8*]* %5, i32 0, i32 %push20 ; [#uses=1] %16 = bitcast i32* %push19 to i8* ; [#uses=1] store i8* %16, i8** %push21 %push22 = add i32 %push20, 1 ; [#uses=1] store i32 %push22, i32* %6 %push23 = alloca i32 ; [#uses=2] store i32 6, i32* %push23 %push24 = load i32* %6 ; [#uses=2] %push25 = getelementptr [100 x i8*]* %5, i32 0, i32 %push24 ; [#uses=1] %17 = bitcast i32* %push23 to i8* ; [#uses=1] store i8* %17, i8** %push25 %push26 = add i32 %push24, 1 ; [#uses=1] store i32 %push26, i32* %6 %pop27 = load i32* %6 ; [#uses=1] %pop28 = add i32 %pop27, -1 ; [#uses=2] store i32 %pop28, i32* %6 %pop29 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop28 ; [#uses=1] %pop30 = load i8** %pop29 ; [#uses=1] %pop31 = bitcast i8* %pop30 to i32* ; [#uses=1] %pop32 = load i32* %pop31 ; [#uses=1] %pop33 = load i32* %6 ; [#uses=1] %pop34 = add i32 %pop33, -1 ; [#uses=2] store i32 %pop34, i32* %6 %pop35 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop34 ; [#uses=1] %pop36 = load i8** %pop35 ; [#uses=1] %pop37 = bitcast i8* %pop36 to i32* ; [#uses=1] %pop38 = load i32* %pop37 ; [#uses=1] %18 = add i32 %pop38, %pop32 ; [#uses=1] %push39 = alloca i32 ; [#uses=2] store i32 %18, i32* %push39 %push40 = load i32* %6 ; [#uses=2] %push41 = getelementptr [100 x i8*]* %5, i32 0, i32 %push40 ; [#uses=1] %19 = bitcast i32* %push39 to i8* ; [#uses=1] store i8* %19, i8** %push41 %push42 = add i32 %push40, 1 ; [#uses=1] store i32 %push42, i32* %6 %top = load i32* %6 ; [#uses=1] %top43 = add i32 %top, -1 ; [#uses=1] %top44 = getelementptr [100 x i8*]* %5, i32 0, i32 %top43 ; [#uses=1] %top45 = load i8** %top44 ; [#uses=1] %top46 = bitcast i8* %top45 to i32* ; [#uses=1] %top47 = load i32* %top46 ; [#uses=1] ret i32 %top47 } ====== Note the 100 location stack being allocated at the beginning, the stack pushes and the stack pops. Running all this through the llvm optimizer results in: ======none define i32 @test2(i32, i32, i32, i32, i32) readonly { block0: %5 = add i32 %0, 10 ; [#uses=1] ret i32 %5 } ====== ***IIR Filter example*** Now consider an IIR filter implemented in Tcl: ======tcl proc low_pass {x x1 x2 y1 y2 C0 C1 C2 C3 C4} { return [expr {$x*$C0 + $x1*$C1 + $x2*$C2 + $y1*$C3 + $y2*$C4}] } ====== Converting and optimizing it with `llvmtcl` gives the following result: ======none define i32 @low_pass(i32, i32, i32, i32, i32, i32, i32, i32, i32, i32) readnone { block0: %10 = mul i32 %5, %0 ; [#uses=1] %11 = mul i32 %6, %1 ; [#uses=1] %12 = mul i32 %7, %2 ; [#uses=1] %13 = mul i32 %8, %3 ; [#uses=1] %14 = mul i32 %9, %4 ; [#uses=1] %15 = add i32 %11, %10 ; [#uses=1] %16 = add i32 %15, %12 ; [#uses=1] %17 = add i32 %16, %13 ; [#uses=1] %18 = add i32 %17, %14 ; [#uses=1] ret i32 %18 } ====== Now also convert a driver function: ======tcl proc filter { } { set y 0 set x1 0 set x2 0 set y1 0 set y2 0 for {set i 0} {$i < 1000} {incr i} { set y [low_pass $i $x1 $x2 $y1 $y2 1 3 -2 4 -5] # Messing with the result to stay within 32 bit if {$y > 1000 || $y < -1000} { set y 1 } else { set y1 $y } set y2 $y1 set y1 $y set x2 $x1 set x1 [expr {$i}] } return $y } ====== The result of `low_pass` is modified so the results stay well within 32 bit boundaries. The optimized result becomes: ======none define i32 @filter() readnone { bb.nph: %0 = alloca [100 x i8*], align 8 ; <[100 x i8*]*> [#uses=2] %push2 = getelementptr [100 x i8*]* %0, i64 0, i64 0 ; [#uses=1] %push410 = getelementptr [100 x i8*]* %0, i64 0, i64 1 ; [#uses=1] %push408474 = alloca i32, align 4 ; [#uses=2] store i32 1000, i32* %push408474 %1 = bitcast i32* %push408474 to i8* ; [#uses=1] store i8* %1, i8** %push410 %push424475 = alloca i32, align 4 ; [#uses=2] store i32 -1, i32* %push424475 %2 = bitcast i32* %push424475 to i8* ; [#uses=1] store i8* %2, i8** %push2 br label %block89 block89: ; preds = %block89, %bb.nph %.0461480 = phi i32 [ 0, %bb.nph ], [ %.0462, %block89 ] ; [#uses=1] %.0464479 = phi i32 [ 0, %bb.nph ], [ %.0465478, %block89 ] ; [#uses=1] %.0465478 = phi i32 [ 0, %bb.nph ], [ %indvar476, %block89 ] ; [#uses=2] %.0466477 = phi i32 [ 0, %bb.nph ], [ %storemerge472, %block89 ] ; [#uses=2] %indvar476 = phi i32 [ 0, %bb.nph ], [ %indvar.next, %block89 ] ; [#uses=4] %tmp721 = add i32 %indvar476, 1000 ; [#uses=1] %indvar.next = add i32 %indvar476, 1 ; [#uses=2] %tmp722 = shl i32 %.0466477, 2 ; [#uses=2] %tmp723 = add i32 %tmp721, %tmp722 ; [#uses=1] %tmp724 = mul i32 %.0465478, 3 ; [#uses=2] %tmp725 = add i32 %tmp723, %tmp724 ; [#uses=1] %tmp726 = mul i32 %.0461480, 5 ; [#uses=1] %tmp727 = shl i32 %.0464479, 1 ; [#uses=1] %tmp728 = add i32 %tmp726, %tmp727 ; [#uses=2] %tmp506.off = sub i32 %tmp725, %tmp728 ; [#uses=1] %not. = icmp ult i32 %tmp506.off, 2001 ; [#uses=2] %tmp706 = add i32 %indvar476, %tmp722 ; [#uses=1] %tmp708 = add i32 %tmp706, %tmp724 ; [#uses=1] %tmp506 = sub i32 %tmp708, %tmp728 ; [#uses=2] %storemerge472 = select i1 %not., i32 %tmp506, i32 1 ; [#uses=2] %.0462 = select i1 %not., i32 %tmp506, i32 %.0466477 ; [#uses=1] %exitcond = icmp eq i32 %indvar.next, 1000 ; [#uses=1] br i1 %exitcond, label %block256, label %block89 block256: ; preds = %block89 ret i32 %storemerge472 } ====== Some [time] data with same filter implemented in C added for reference: ======none tcl [filter]: 1757.0 microseconds per iteration llvm [filter]: 18.5 microseconds per iteration c [filter] : 10.8 microseconds per iteration ====== **Playing [Ffidl]** You can also use LLVM to invoke shared library functions: ======tcl # Example showing how to call library function lappend auto_path . package require llvmtcl # Initialize the JIT llvmtcl LinkInJIT llvmtcl InitializeNativeTarget # Create a module and builder set m [llvmtcl ModuleCreateWithName "testmodule"] set bld [llvmtcl CreateBuilder] # Create a sin function calling math lib's 'double sin(double)' set ft [llvmtcl FunctionType [llvmtcl DoubleType] [list [llvmtcl DoubleType]] 0] set wsin [llvmtcl AddFunction $m "wrapped_sin" $ft] set sin [llvmtcl AddFunction $m "sin" $ft] set entry [llvmtcl AppendBasicBlock $wsin entry] llvmtcl PositionBuilderAtEnd $bld $entry # Call a c function set rt [llvmtcl BuildCall $bld $sin [list [llvmtcl GetParam $wsin 0]] "call"] # Set return llvmtcl BuildRet $bld $rt # Create a function to calculate sqrt(a^2+b^2) set ft2 [llvmtcl FunctionType [llvmtcl DoubleType] [list [llvmtcl DoubleType] [llvmtcl DoubleType]] 0] set pyth [llvmtcl AddFunction $m "pyth" $ft2] set pow [llvmtcl AddFunction $m "pow" $ft2] set sqrt [llvmtcl AddFunction $m "sqrt" $ft] set entry [llvmtcl AppendBasicBlock $pyth entry] llvmtcl PositionBuilderAtEnd $bld $entry set a2 [llvmtcl BuildCall $bld $pow [list [llvmtcl GetParam $pyth 0] [llvmtcl ConstReal [llvmtcl DoubleType] 2]] "a2"] set b2 [llvmtcl BuildCall $bld $pow [list [llvmtcl GetParam $pyth 1] [llvmtcl ConstReal [llvmtcl DoubleType] 2]] "b2"] set c2 [llvmtcl BuildFAdd $bld $a2 $b2 "c2"] set c [llvmtcl BuildCall $bld $sqrt [list $c2] "c"] llvmtcl BuildRet $bld $c # Verify the module puts [llvmtcl DumpModule $m] lassign [llvmtcl VerifyModule $m LLVMReturnStatusAction] rt msg if {$rt} { error $msg } # Optimize llvmtcl Optimize $m [list $wsin $pyth] puts [llvmtcl DumpModule $m] # Execute 'wsin' lassign [llvmtcl CreateJITCompilerForModule $m 0] rt EE msg set i [llvmtcl CreateGenericValueOfFloat [llvmtcl DoubleType] 0.5] set res [llvmtcl RunFunction $EE $wsin $i] puts "sin(0.5) = [llvmtcl GenericValueToFloat [llvmtcl DoubleType] $res]" # Execute 'pyth' set a [llvmtcl CreateGenericValueOfFloat [llvmtcl DoubleType] 3] set b [llvmtcl CreateGenericValueOfFloat [llvmtcl DoubleType] 4] set res [llvmtcl RunFunction $EE $pyth [list $a $b]] puts "sqrt(pow(3,2)+pow(4,2)) = [llvmtcl GenericValueToFloat [llvmtcl DoubleType] $res]" # Cleanup llvmtcl DisposeModule $m ====== Looking at the optimized code, note how LLVM could avoid the call to `pow`: ======none define double @wrapped_sin(double) { entry: %call = tail call double @sin(double %0) ; [#uses=1] ret double %call } declare double @sin(double) define double @pyth(double, double) { entry: %pow2 = fmul double %0, %0 ; [#uses=1] %pow21 = fmul double %1, %1 ; [#uses=1] %c2 = fadd double %pow2, %pow21 ; [#uses=1] %c = tail call double @sqrt(double %c2) ; [#uses=1] ret double %c } ====== **Discussion** '''[FB] - 2010-05-27 04:29:35''' Impressive. You may want to look at the Hindley-Milner algorithm for type inference: http://www.codecommit.com/blog/scala/what-is-hindley-milner-and-why-is-it-cool Another option is trace tree compiling à la TraceMonkey, using the bytecode stream as input: http://www.ics.uci.edu/~franz/Site/pubs-pdf/ICS-TR-06-16.pdf ---- '''[schlenk] - 2010-05-27''' I didn't like all the namespace imports and the long names so i defined a little namespace ensemble like this, maybe you think its a good idea to use that too? ======tcl proc create_llvm_ensemble {} { set map {} foreach cmd [info commands ::llvmtcl::LLVM*] { lappend map [string range $cmd 15 end] $cmd } namespace ensemble create -command ::llvm -map [dict create {*}$map] } ====== Usage like this: ======tcl package require llvmtcl create_llvm_ensemble llvm LinkInJIT llvm InitializeNativeTarget ... ====== ---- '''[jdc] - 28-may-2010''' Good proposal. I converted the package to make the commands an ensemble. <>Package