Playing Assembler

Gerald Lester - Actually this is a simulator; an Assembler would take the code and spit out a binary object file to run on the hardware. Of course a simulator is *much* harder to write than an Assembler, which makes this all the more impressive!
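To make the distinction concrete, here is a minimal sketch of what the "spit out a binary" step could look like for a handful of 8080 mnemonics. The names `encode` and `regcode` are hypothetical and not part of the code on this page. The bit patterns follow the standard 8080 encoding, in which registers are numbered B=0 ... A=7 and addresses are stored low byte first. Note that the real 8080 `ADD` takes only a source register, since the accumulator is implied. The sketch assumes Tcl 8.5 or later for `lassign`.

```tcl
# Register field codes of the 8080 instruction encoding (M = memory via HL).
array set regcode {B 0 C 1 D 2 E 3 H 4 L 5 M 6 A 7}

# encode (hypothetical helper): return one instruction's bytes
# as a list of two-digit hex strings.
proc encode {op args} {
    global regcode
    lassign $args x y
    switch -- $op {
        MOV { set bytes [list [expr {0x40 | ($regcode($x) << 3) | $regcode($y)}]] }
        MVI { set bytes [list [expr {0x06 | ($regcode($x) << 3)}] [expr {$y & 0xFF}]] }
        INR { set bytes [list [expr {0x04 | ($regcode($x) << 3)}]] }
        DCR { set bytes [list [expr {0x05 | ($regcode($x) << 3)}]] }
        ADD { set bytes [list [expr {0x80 | $regcode($x)}]] }
        JMP { set bytes [list 0xC3 [expr {$x & 0xFF}] [expr {($x >> 8) & 0xFF}]] }
        JNZ { set bytes [list 0xC2 [expr {$x & 0xFF}] [expr {($x >> 8) & 0xFF}]] }
        JZ  { set bytes [list 0xCA [expr {$x & 0xFF}] [expr {($x >> 8) & 0xFF}]] }
        default { error "unsupported mnemonic: $op" }
    }
    set out {}
    foreach b $bytes {lappend out [format %02X $b]}
    return $out
}
```

With this sketch, `encode MVI A 0` yields `3E 00`, `encode DCR C` yields `0D`, and `encode JNZ 105` yields `C2 69 00`. A real assembler would additionally resolve labels and write the bytes to an object file.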

Richard Suchenwirth - In the Tcl and other languages series, one of the oldest was still missing: Assembler (sort of a symbolic wrapper around pure machine code). Now here is the beginning for that... As this is a weekend fun project, I picked those parts of Intel 8080A/8085 Assembler (because I had a detailed reference handy) that are easily implemented and still somehow educational (or nostalgic ;-).

Of course this is no real assembler. The memory model is constant-size instructions (strings in array elements), which are implemented as Tcl procs. So an "assembler" program in this plaything will run even slower than in pure Tcl, and consume more memory - while normally you associate speed and conciseness with "real" assembler code. But it looks halfway like the real thing: you get sort of an assembly listing with symbol table, and can run it - I'd hardly start writing an assembler in C, but in Tcl it's fun for a sunny Sunday afternoon...

 namespace eval asm {
    proc asm body {
        variable mem
        catch {unset mem} ;# good for repeated sourcing
        foreach line [split $body \n] {
            foreach i {label op args} {set $i ""}
            regexp {([^;]*);} $line -> line ;# strip off comments
            regexp {^ *(([A-Z0-9]+):)? *([A-Z]*) +(.*)} [string toupper $line]\
                 ->  -   label           op       args
            puts label=$label,op=$op,args=$args
            if {$label!=""} {set sym($label) $PC}
            if {$op==""}     continue
            if {$op=="DB"}  {set mem($PC) [convertHex $args]; incr PC; continue}
            if {$op=="EQU"} {set sym($label) [convertHex $args]; continue}
            if {$op=="ORG"} {set PC [convertHex $args]; continue}
            regsub -all ", *" $args " " args ;# normalize commas
            set mem($PC) "$op $args"
            incr PC
        }
        substituteSymbols sym
        dump   sym
    }
    proc convertHex s {
        if [regexp {^([0-9A-F]+)H$} [string trim $s] -> s] {set s [expr 0x$s]}
        set s
    }
    proc substituteSymbols {_sym} {
        variable mem
        upvar $_sym sym
        foreach i [array names mem] {
            set tmp [lindex $mem($i) 0]
            foreach j [lrange $mem($i) 1 end] {
                if {[array names sym $j]==$j} {set j $sym($j)}
                lappend tmp $j
            }
            set mem($i) $tmp
        }
    }
    proc dump {_sym} {
        variable mem
        upvar $_sym sym
        foreach i [lsort -integer [array names mem]] {
            puts [format "%04d %s" $i $mem($i)]
        }
        foreach i [lsort [array names sym]] {
            puts [format "%-10s: %04x" $i $sym($i)]
        }
    }
    proc run {{pc 255}} {
        variable mem
        foreach i {A B C D E Z} {set ::$i 0}
        while {$pc>=0} {
            incr pc
            #puts "$mem($pc)\tA:$::A B:$::B C:$::C D:$::D E:$::E Z:$::Z"
            eval $mem($pc)
        }
    }
    #----------------- "machine opcodes" implemented as procs
    proc ADD  {reg reg2}  {set ::Z [incr ::$reg [set ::$reg2]]}
    proc ADI  {reg value} {set ::Z [incr ::$reg $value]}
    proc CALL {name}      {[string tolower $name] $::A}
    proc DCR  {reg}       {set ::Z [incr ::$reg -1]}
    proc INR  {reg}       {set ::Z [incr ::$reg]}
    proc JMP  where       {uplevel 1 set pc [expr $where-1]}
    proc JNZ  where       {if $::Z {uplevel 1 JMP $where}}
    proc JZ   where       {if !$::Z {uplevel 1 JMP $where}}
    proc MOV  {reg adr}   {variable mem; set ::$reg $mem($adr)}
    proc MVI  {reg value} {set ::$reg $value}
 }

#-- Now testing:
 asm::asm {
        org  100     ; the canonical start address in CP/M
        jmp  START   ; idiomatic: get over the initial variable(s)
 DONE:  equ  0       ; warm start in CP/M ;-)
 MAX:   equ  5
 INCR:  db   2       ; a variable (though we won't vary it)
 ;; here we go...
 START: mvi  c,MAX   ; set count limit
        mvi  a,0     ; initial value
        mov  b,INCR
 LOOP:  call puts    ; for now, fall back to Tcl for I/O
        inr  a
        add  a,b     ; just to make adding 1 more complicated
        dcr  c       ; counting down..
        jnz  LOOP    ; jump on non-zero to LOOP
        jmp  DONE    ; end of program
 }

The mov b,INCR part is an oversimplification. For a real 8080, one would have to say

 LXI H,INCR ; load double registers H+L with the address INCR
 MOV B,M    ; load byte to register B from the address pointed to in HL

Since the pseudo-register M can also be used for writing back, it cannot be implemented by simply copying the value. Rather, one could use read and write traces on variable M, causing it to load from, or store to, mem($HL). Maybe another weekend...
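The trace idea could look roughly like the following minimal sketch. It is not part of the simulator above: the proc names `loadM` and `storeM` are hypothetical, and `mem`, `HL` and `M` are plain globals here for brevity (the simulator keeps `mem` inside the `asm` namespace). It relies on the documented behaviour of `trace add variable`: a read trace runs before the value is returned, a write trace runs after the new value is stored, and traces on the variable are disabled while its own callback is running.

```tcl
# Sketch only: HL, M and mem are globals here (an assumption for brevity).
array set mem {101 2}
set HL 101
set M 0

# Read trace: refresh M from memory just before its value is returned.
proc loadM {name1 name2 op} {
    global mem HL M
    set M $mem($HL)
}
# Write trace: copy the value just written to M back into memory.
proc storeM {name1 name2 op} {
    global mem HL M
    set mem($HL) $M
}
trace add variable M read  loadM
trace add variable M write storeM

set B $M   ;# like MOV B,M: B now holds mem(101), i.e. 2
set M 7    ;# like MVI M,7: mem(101) is now 7
```

Because both callbacks consult `HL` at the moment of access, changing `HL` (as `LXI H,...` would) redirects later reads and writes of `M` without any further bookkeeping.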

Keith Vetter - How about a MIX simulator so I can get all of Knuth's algorithms working?

escargo 26 Mar 2004 - I have sometimes wondered if there should be an assembly language level interface to the Tcl byte code.

  • AK See TAL "Tcl Assembly Language".

JM 11 Nov 2012 - In order to actually run this example, I had to change two things:

  • I added the line:
 incr pc -1

to the top of the "run" proc, so you can start the execution like this:

 asm::run 100

instead of:

 asm::run 99

when the starting address is precisely 100d, and...

  • I added the line:
 set s [string trim $s]

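to the top of the "convertHex" proc, so that ORG's argument is processed properly when it is decimal rather than hex. (After comment stripping, the argument still carries trailing blanks, which the hex pattern alone does not remove.)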
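Put together, the two changes might look like the sketch below. To keep it self-contained it uses a separate, hypothetical namespace `demo` with a hand-filled `mem` array and only two opcodes, rather than the full `asm` namespace. `scan ... %x` replaces the original `expr 0x$s` for the hex conversion, and `run` takes a mandatory start address here; both are choices made for this sketch, not part of the changes described above.

```tcl
namespace eval demo {
    variable mem
    # Hand-assembled two-instruction program, for illustration only.
    array set mem {100 {MVI A 7} 101 {JMP 0}}

    proc convertHex s {
        set s [string trim $s]          ;# second change: trim before testing
        if {[regexp {^([0-9A-F]+)H$} $s -> hex]} {scan $hex %x s}
        return $s
    }
    proc run {pc} {
        variable mem
        incr pc -1                      ;# first change: undo the loop's first incr
        while {$pc >= 0} {
            incr pc
            eval $mem($pc)
        }
    }
    proc MVI {reg value} {set ::$reg $value}
    proc JMP where {uplevel 1 [list set pc [expr {$where - 1}]]}
}

demo::run 100   ;# executes mem(100) first, then stops at JMP 0
```

Here `demo::convertHex "100     "` returns `100` and `demo::convertHex "64H"` returns `100` as well, so a decimal or hex `ORG` argument ends up as the same clean array index.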

After running the example, the output is:

 0100 JMP 102
 0101 2
 0102 MVI C 5
 0103 MVI A 0
 0104 MOV B 101
 0105 CALL PUTS
 0106 INR A
 0107 ADD A B
 0108 DCR C
 0109 JNZ 105
 0110 JMP 0
 DONE      : 0000
 INCR      : 0065
 LOOP      : 0069
 MAX       : 0005
 START     : 0066
 (code) 1 % asm::run 100
 (code) 2 %

See Also: Playing PIC Simulation