[jdc] 21-may-2010

To learn [LLVM], I made a wrapper for LLVM's C API. This wrapper is available at: http://github.com/jdc8/llvmtcl

**Building the wrapper**

The wrapper uses LLVM's C API as declared in the header file `Core.h` in LLVM's `include/llvm-c` directory.

Requirements:

* [LLVM] 2.7
* [Tcl] 8.5 (most tests have been done with Tcl HEAD)

A `Makefile` is provided to build the extension:

1. Edit the `Makefile` to specify the paths to your Tcl and LLVM installations.
1. Run `make` to build the extension.
1. Add LLVM's `lib` directory to `LD_LIBRARY_PATH`.
1. Run `make test` to check that the extension is working.

**Using the LLVM API**

Building an LLVM module and function:

======
lappend auto_path .
package require llvmtcl
namespace import llvmtcl::*

# Initialize the JIT
LLVMLinkInJIT
LLVMInitializeNativeTarget

# Create a module and builder
set m [LLVMModuleCreateWithName "testmodule"]
set bld [LLVMCreateBuilder]

# Create a plus10 function, taking one argument and adding 6 and 4 to it
set ft [LLVMFunctionType [LLVMInt32Type] [list [LLVMInt32Type]] 0]
set plus10 [LLVMAddFunction $m "plus10" $ft]

# Create constants
set c6 [LLVMConstInt [LLVMInt32Type] 6 0]
set c4 [LLVMConstInt [LLVMInt32Type] 4 0]

# Create the entry basic block
set entry [LLVMAppendBasicBlock $plus10 entry]

# Store the argument in a stack slot to avoid having to write select and/or phi nodes
LLVMPositionBuilderAtEnd $bld $entry
set arg0_1 [LLVMGetParam $plus10 0]
set arg0_2 [LLVMBuildAlloca $bld [LLVMInt32Type] arg0]
set arg0_3 [LLVMBuildStore $bld $arg0_1 $arg0_2]

# Add 10 in two steps to watch the optimizer at work

# Add 6
set arg0_4 [LLVMBuildLoad $bld $arg0_2 "arg0"]
set add6 [LLVMBuildAdd $bld $arg0_4 $c6 "add6"]

# Add 4
set add4 [LLVMBuildAdd $bld $add6 $c4 "add4"]

# Set return
LLVMBuildRet $bld $add4

# Show input
puts "----- Input --------------------------------------------------"
puts [LLVMDumpModule $m]

# Verify the module
lassign [LLVMVerifyModule $m LLVMReturnStatusAction] rt msg
if {$rt} {
    error $msg
}
======

This results in
the following LLVM IR:

======
; ModuleID = 'testmodule'

define i32 @plus10(i32) {
entry:
  %arg0 = alloca i32 ; [#uses=2]
  store i32 %0, i32* %arg0
  %arg01 = load i32* %arg0 ; [#uses=1]
  %add6 = add i32 %arg01, 6 ; [#uses=1]
  %add4 = add i32 %add6, 4 ; [#uses=1]
  ret i32 %add4
}
======

Now execute it:

======
# Execute
lassign [LLVMCreateJITCompilerForModule $m 0] rt EE msg
set i [LLVMCreateGenericValueOfInt [LLVMInt32Type] 4 0]
set res [LLVMRunFunction $EE $plus10 $i]
puts "plus10(4) = [LLVMGenericValueToInt $res 0]\n"
======

Now optimize the LLVM module:

======
# Optimize
set td [LLVMCreateTargetData ""]
LLVMSetDataLayout $m [LLVMCopyStringRepOfTargetData $td]
LLVMOptimizeFunction $m $plus10 3 $td
LLVMOptimizeModule $m 3 0 1 1 1 0 $td
======

The result of the optimization:

======
; ModuleID = 'testmodule'
target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64"

define i32 @plus10(i32) readnone {
entry:
  %add4 = add i32 %0, 10 ; [#uses=1]
  ret i32 %add4
}
======

**Transforming Tcl into LLVM IR**

The wrapper has limited support for converting Tcl into LLVM IR, based on the output of [tcl::unsupported::disassemble]. The current (stringent) limitations are:

* all variables are 32-bit integers (no strings, floats, lists, arrays, dicts, ...)
* all procs return a single 32-bit integer
* all procs must be known at conversion time

Take this simple Tcl procedure as input:

======
proc test2 {a b c d e} {
    return [expr {4+$a+6}]
}
======

The [tcl::unsupported::disassemble] output of this example looks like this:

======
ByteCode 0x0x9b4ffe8, refCt 1, epoch 4, interp 0x0x9a9f3b0 (epoch 4)
  Source "\n    return [expr {4+$a+6}]\n"
  Cmds 2, src 28, inst 10, litObjs 2, aux 0, stkDepth 2, code/src 0.00
  Proc 0x0x9b44b38, refCt 1, args 5, compiled locals 5
      slot 0, scalar, arg, "a"
      slot 1, scalar, arg, "b"
      slot 2, scalar, arg, "c"
      slot 3, scalar, arg, "d"
      slot 4, scalar, arg, "e"
  Commands 2:
      1: pc 0-8, src 5-26
      2: pc 0-7, src 13-25
  Command 1: "return [expr {4+$a+6}]"
  Command 2: "expr {4+$a+6}"
    (0) push1 0         # "4"
    (2) loadScalar1 %v0         # var "a"
    (4) add
    (5) push1 1         # "6"
    (7) add
    (8) done
    (9) done
======

Translating it to LLVM IR with the `llvmtcl::Tcl2LLVM` command results in:

======
define i32 @test2(i32, i32, i32, i32, i32) {
block0:
  %5 = alloca [100 x i8*] ; <[100 x i8*]*> [#uses=10]
  %6 = alloca i32 ; [#uses=20]
  store i32 0, i32* %6
  %7 = alloca i32 ; [#uses=2]
  store i32 %0, i32* %7
  %8 = alloca i32 ; [#uses=1]
  store i32 %1, i32* %8
  %9 = alloca i32 ; [#uses=1]
  store i32 %2, i32* %9
  %10 = alloca i32 ; [#uses=1]
  store i32 %3, i32* %10
  %11 = alloca i32 ; [#uses=1]
  store i32 %4, i32* %11
  %push = alloca i32 ; [#uses=2]
  store i32 4, i32* %push
  %push1 = load i32* %6 ; [#uses=2]
  %push2 = getelementptr [100 x i8*]* %5, i32 0, i32 %push1 ; [#uses=1]
  %12 = bitcast i32* %push to i8* ; [#uses=1]
  store i8* %12, i8** %push2
  %push3 = add i32 %push1, 1 ; [#uses=1]
  store i32 %push3, i32* %6
  %13 = load i32* %7 ; [#uses=1]
  %push4 = alloca i32 ; [#uses=2]
  store i32 %13, i32* %push4
  %push5 = load i32* %6 ; [#uses=2]
  %push6 = getelementptr [100 x i8*]* %5, i32 0, i32 %push5 ; [#uses=1]
  %14 = bitcast i32* %push4 to i8* ; [#uses=1]
  store i8* %14, i8** %push6
  %push7 = add i32 %push5, 1 ; [#uses=1]
  store i32 %push7, i32* %6
  %pop = load i32* %6 ; [#uses=1]
  %pop8 = add i32 %pop, -1 ; [#uses=2]
  store i32 %pop8, i32* %6
  %pop9 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop8 ; [#uses=1]
  %pop10 = load i8** %pop9 ; [#uses=1]
  %pop11 = bitcast i8* %pop10 to i32* ; [#uses=1]
  %pop12 = load i32* %pop11 ; [#uses=1]
  %pop13 = load i32* %6 ; [#uses=1]
  %pop14 = add i32 %pop13, -1 ; [#uses=2]
  store i32 %pop14, i32* %6
  %pop15 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop14 ; [#uses=1]
  %pop16 = load i8** %pop15 ; [#uses=1]
  %pop17 = bitcast i8* %pop16 to i32* ; [#uses=1]
  %pop18 = load i32* %pop17 ; [#uses=1]
  %15 = add i32 %pop18, %pop12 ; [#uses=1]
  %push19 = alloca i32 ; [#uses=2]
  store i32 %15, i32* %push19
  %push20 = load i32* %6 ; [#uses=2]
  %push21 = getelementptr [100 x i8*]* %5, i32 0, i32 %push20 ; [#uses=1]
  %16 = bitcast i32* %push19 to i8* ; [#uses=1]
  store i8* %16, i8** %push21
  %push22 = add i32 %push20, 1 ; [#uses=1]
  store i32 %push22, i32* %6
  %push23 = alloca i32 ; [#uses=2]
  store i32 6, i32* %push23
  %push24 = load i32* %6 ; [#uses=2]
  %push25 = getelementptr [100 x i8*]* %5, i32 0, i32 %push24 ; [#uses=1]
  %17 = bitcast i32* %push23 to i8* ; [#uses=1]
  store i8* %17, i8** %push25
  %push26 = add i32 %push24, 1 ; [#uses=1]
  store i32 %push26, i32* %6
  %pop27 = load i32* %6 ; [#uses=1]
  %pop28 = add i32 %pop27, -1 ; [#uses=2]
  store i32 %pop28, i32* %6
  %pop29 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop28 ; [#uses=1]
  %pop30 = load i8** %pop29 ; [#uses=1]
  %pop31 = bitcast i8* %pop30 to i32* ; [#uses=1]
  %pop32 = load i32* %pop31 ; [#uses=1]
  %pop33 = load i32* %6 ; [#uses=1]
  %pop34 = add i32 %pop33, -1 ; [#uses=2]
  store i32 %pop34, i32* %6
  %pop35 = getelementptr [100 x i8*]* %5, i32 0, i32 %pop34 ; [#uses=1]
  %pop36 = load i8** %pop35 ; [#uses=1]
  %pop37 = bitcast i8* %pop36 to i32* ; [#uses=1]
  %pop38 = load i32* %pop37 ; [#uses=1]
  %18 = add i32 %pop38, %pop32 ; [#uses=1]
  %push39 = alloca i32 ; [#uses=2]
  store i32 %18, i32* %push39
  %push40 = load i32* %6 ; [#uses=2]
  %push41 = getelementptr [100 x i8*]* %5, i32 0, i32 %push40 ; [#uses=1]
  %19 = bitcast i32* %push39 to i8* ; [#uses=1]
  store i8* %19, i8** %push41
  %push42 = add i32 %push40, 1 ; [#uses=1]
  store i32 %push42, i32* %6
  %top = load i32* %6 ; [#uses=1]
  %top43 = add i32 %top, -1 ; [#uses=1]
  %top44 = getelementptr [100 x i8*]* %5, i32 0, i32 %top43 ; [#uses=1]
  %top45 = load i8** %top44 ; [#uses=1]
  %top46 = bitcast i8* %top45 to i32* ; [#uses=1]
  %top47 = load i32* %top46 ; [#uses=1]
  ret i32 %top47
}
======

Note the 100-slot stack (`%5`) allocated at the beginning, the stack index kept in `%6`, and the push and pop sequences that mirror the operand stack of Tcl's bytecode. Running all this through the LLVM optimizer results in:

======
define i32 @test2(i32, i32, i32, i32, i32) readonly {
block0:
  %5 = add i32 %0, 10 ; [#uses=1]
  ret i32 %5
}
======
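To make the stack discipline of the translation concrete, here is a small hypothetical sketch in plain Tcl (it is not part of llvmtcl and does not use LLVM) that symbolically executes the `test2` bytecode shown in the [tcl::unsupported::disassemble] listing. The proc name `symexec` and its value representation, a linear form `{constant {variable ...}}`, are assumptions made for illustration, and only `push1`, `loadScalar1`, `add` and `done` are handled. It pushes and pops an explicit operand stack much like the generated code does with its 100-slot array; merging constants in `add` only imitates the end result of the optimized code (`%0 + 10`), not how LLVM's passes actually work.

```tcl
# Hypothetical sketch: symbolic execution of a tiny subset of Tcl bytecode.
# instructions: list of {opcode ?operand?}
# literals:     literal table indexed by push1 operands
# locals:       compiled local names indexed by %vN slots
# Each stack entry is a linear form: {constant {variable ...}}
proc symexec {instructions literals locals} {
    set stack {}
    foreach instr $instructions {
        lassign $instr op operand
        switch -- $op {
            push1 {
                # Push a literal constant with no variables
                lappend stack [list [lindex $literals $operand] {}]
            }
            loadScalar1 {
                # Operand looks like %v0: strip "%v" to get the slot number
                set slot [string range $operand 2 end]
                lappend stack [list 0 [list [lindex $locals $slot]]]
            }
            add {
                # Pop two linear forms, push their sum
                lassign [lindex $stack end-1] c1 v1
                lassign [lindex $stack end] c2 v2
                set stack [lrange $stack 0 end-2]
                lappend stack [list [expr {$c1 + $c2}] [concat $v1 $v2]]
            }
            done {
                # Return the top of the stack rendered as "var+...+constant"
                lassign [lindex $stack end] c vars
                return [join [concat $vars $c] +]
            }
            default {
                error "unsupported instruction: $op"
            }
        }
    }
    error "missing done instruction"
}

# Bytecode of test2, transcribed from the disassembly listing
set code {{push1 0} {loadScalar1 %v0} add {push1 1} add done done}
puts [symexec $code {4 6} {a b c d e}]   ;# prints a+10
```

The result `a+10` corresponds to the optimized `%5 = add i32 %0, 10`, where `%0` is the first argument `a`.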