Assembly code example

TV The C code from [Tcl in comparison]:

 int countdown(n)
 int n;
 {
    int i;
    for (i=n; i>0; i--) {
       printf("%d...\n", i);
    }
 }

written in the more traditional, non-ANSI (K&R) notation, which gives no argument checking across source files, can also be represented in assembly code. For the above, the GNU C compiler under Cygwin can generate the following assembly, which looks obfuscated because the handling of variable storage and the stack is hard to see through:

 LC0:
         .ascii "%d...\12\0"
 .globl _countdown
         .def    _countdown;     .scl    2;      .type   32;     .endef
 _countdown:
         pushl   %ebp
         movl    %esp, %ebp
         subl    $24, %esp
         movl    8(%ebp), %eax
         movl    %eax, -4(%ebp)
 L10:
         cmpl    $0, -4(%ebp)
         jg      L13
         jmp     L11
 L13:
         movl    -4(%ebp), %eax
         movl    %eax, 4(%esp)
         movl    $LC0, (%esp)
         call    _printf
         leal    -4(%ebp), %eax
         decl    (%eax)
         jmp     L10
 L11:
         leave
         ret
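
To make the listing above easier to follow, its control flow can be read back into C using goto and the same label names. This is only a sketch: the function name countdown_goto, the lines counter, and the return value are assumptions added for illustration and do not appear in the compiler output.

```c
#include <stdio.h>

/* Hand-written C mirror of the unoptimised listing (illustrative only). */
int countdown_goto(int n)
{
    int i = n;              /* movl 8(%ebp),%eax ; movl %eax,-4(%ebp)   */
    int lines = 0;          /* not in the listing; added for illustration */
L10:
    if (i > 0) goto L13;    /* cmpl $0,-4(%ebp) ; jg L13                */
    goto L11;               /* jmp L11                                  */
L13:
    printf("%d...\n", i);   /* args stored at 4(%esp) and (%esp), call _printf */
    lines++;
    i--;                    /* leal -4(%ebp),%eax ; decl (%eax)         */
    goto L10;               /* jmp L10                                  */
L11:
    return lines;           /* leave ; ret                              */
}
```

Note that the counter i lives in memory at -4(%ebp) throughout, so every test and decrement goes through the stack frame.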

Or, with the optimizer (-O) on, and with more verbose, annotated assembly output (as gcc produces with, for example, -fverbose-asm):

         .text
 LC0:
         .ascii "%d...\12\0"
 .globl _countdown
         .def    _countdown;     .scl    2;      .type   32;     .endef
 _countdown:
         pushl   %ebp
         movl    %esp, %ebp
         pushl   %ebx
         subl    $20, %esp
         movl    8(%ebp), %ebx    #  n,  i
         testl   %ebx, %ebx       #  i
         jle     L8
 L6:
         movl    %ebx, 4(%esp)    #  i
         movl    $LC0, (%esp)
         call    _printf
         decl    %ebx     #  i
         testl   %ebx, %ebx       #  i
         jg      L6
 L8:
         addl    $20, %esp
         popl    %ebx
         popl    %ebp
         ret
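
The shape of the optimised listing corresponds roughly to a guarded do-while loop in C: one test before the loop, then the test repeated at the bottom. The following is a sketch under that reading; the name countdown_opt, the lines counter, and the return value are assumptions added for illustration.

```c
#include <stdio.h>

/* Hand-written C mirror of the -O listing (illustrative only). */
int countdown_opt(int n)
{
    int i = n;                     /* movl 8(%ebp),%ebx : i is kept in ebx */
    int lines = 0;                 /* not in the listing; added for illustration */
    if (i > 0) {                   /* testl %ebx,%ebx ; jle L8             */
        do {                       /* L6:                                  */
            printf("%d...\n", i);  /* movl %ebx,4(%esp) ; call _printf     */
            lines++;
            i--;                   /* decl %ebx                            */
        } while (i > 0);           /* testl %ebx,%ebx ; jg L6              */
    }
    return lines;                  /* L8: restore stack and registers, ret */
}
```

Compared with the unoptimised version, the loop body contains one conditional jump per iteration instead of a conditional jump plus two unconditional ones.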

The compiler's optimisation is independent of another possible optimisation step in the assembler, which may arrange adjacent instructions so that they can execute in parallel.

Of course, keeping the counter and loop test variable in a register, which we could also request ourselves in the C variable declaration instead of leaving it to the optimizer, doesn't make all too much sense in light of the relatively time-consuming printf call.
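
Requesting a register in the declaration is done with the register storage class. A minimal sketch, assuming ANSI-style notation for brevity; the name countdown_reg and the return value are added for illustration, and the compiler is free to ignore the hint.

```c
#include <stdio.h>

/* Same loop as above, with an explicit register request for the counter. */
int countdown_reg(int n)
{
    register int i;        /* a hint only; the compiler may ignore it */
    int lines = 0;         /* added for illustration */

    for (i = n; i > 0; i--) {
        printf("%d...\n", i);
        lines++;
    }
    return lines;          /* number of lines printed */
}
```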

In Intel syntax (which gcc can emit, for example with -masm=intel), the last listing becomes:

         .text
 LC0:
         .ascii "%d...\12\0"
 .globl _countdown
         .def    _countdown;     .scl    2;      .type   32;     .endef
 _countdown:
         push    ebp
         mov     ebp, esp
         push    ebx
         sub     esp, 20
         mov     ebx, DWORD PTR [ebp+8]   #  i,  n
         test    ebx, ebx         #  i
         jle     L8
 L6:
         mov     DWORD PTR [esp+4], ebx   #  i
         mov     DWORD PTR [esp], OFFSET FLAT:LC0
         call    _printf
         dec     ebx      #  i
         test    ebx, ebx         #  i
         jg      L6
 L8:
         add     esp, 20
         pop     ebx
         pop     ebp
         ret

With extreme optimisation by the compiler, using -O4, the function may end up inlined, that is, expanded at the call site much like a macro, so that it is no longer called as a subroutine.
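
As a sketch of what that means at the source level: when the function definition is visible to the caller, the compiler may copy the loop into the caller instead of emitting a call. The inline keyword, the caller launch, and the return value below are assumptions added for illustration; whether inlining actually happens is up to the compiler and the optimisation level.

```c
#include <stdio.h>

/* Candidate for inlining: small, and defined in the same source file. */
static inline int countdown(int n)
{
    int i;
    for (i = n; i > 0; i--) {
        printf("%d...\n", i);
    }
    return n > 0 ? n : 0;  /* lines printed; added for illustration */
}

/* Hypothetical caller: with inlining, no "call _countdown" need appear
   here, only the loop itself. */
int launch(void)
{
    return countdown(3);
}
```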