$\begingroup$

A compiler I'm writing generates the following x86-64 assembly (AT&T syntax) for a recursive factorial function. I assemble and link it into an ELF executable using gcc. However, when I run it, the output is always a garbage number instead of the expected 120.

Source Code:

fn factorial(num: i32) -> i32 {
    if num == 0 {
        return 1;
    } else {
        return num * factorial(num - 1);
    }
}

fn main() {
    println(factorial(5));
}

Generated Assembly:

.globl factorial, main
.format_number:
        .string "%d\n"
factorial:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $4, %rsp           # allocating 4 bytes for parameter num
        movl  %edi, -4(%rbp)
        pushq %rbx
        pushq %r12
        pushq %r13
        pushq %r14
        pushq %r15
        movl  -4(%rbp), %ebx
        movl  $0, %r10d
        cmpl  %r10d, %ebx
        jne   .L0
        movl  $1, %ebx
        movl  %ebx, %eax
        jmp   .factorial_epilogue
        jmp   .L1
.L0:
        movl  -4(%rbp), %ebx
        andq  $-16, %rsp
        movl  -4(%rbp), %r10d
        movl  $1, %r11d
        subl  %r11d, %r10d
        movl  %r10d, %edi
        pushq %r10
        pushq %r11
        call  factorial
        popq  %r11
        popq  %r10
        movl  %eax, %r11d
        imull %ebx, %r11d
        movl  %r11d, %eax
        jmp   .factorial_epilogue
.L1:
.factorial_epilogue:
        popq  %r15
        popq  %r14
        popq  %r13
        popq  %r12
        popq  %rbx
        movq  %rbp, %rsp
        popq  %rbp
        ret
main:
        pushq %rbp
        movq  %rsp, %rbp
        subq  $0, %rsp
        pushq %rbx
        pushq %r12
        pushq %r13
        pushq %r14
        pushq %r15
        andq  $-16, %rsp
        movl  $5, %ebx
        movl  %ebx, %edi
        pushq %r10
        pushq %r11
        call  factorial
        popq  %r11
        popq  %r10
        movl  %eax, %r11d
        andq  $-16, %rsp
        movl  %r11d, %esi
        leaq  .format_number(%rip), %rax
        movq  %rax, %rdi
        xor   %eax, %eax
        call  printf@PLT
.main_epilogue:
        popq  %r15
        popq  %r14
        popq  %r13
        popq  %r12
        popq  %rbx
        movq  %rbp, %rsp
        popq  %rbp
        ret
.data

I thought this was a stack alignment issue during function calls, so I added andq $-16, %rsp before every function call to align the stack pointer to a 16-byte boundary. Now memcheck does not detect any errors, but the output is still a garbage number. The same assembly worked fine when all the numbers were 8-byte values, and I don't understand why it fails when the numbers are 4-byte values. Does anyone know how to find what's wrong in the generated assembly?

Edit: Currently, the code generator attempts to align the stack by emitting an andq before every function call. The function prologue pushes 5 * 8 bytes (the callee-saved registers) and also allocates exactly the amount of space required for local variables. For example, it allocates 40 + 4 bytes if the function has only an i32 parameter.

$\endgroup$

1 Answer

$\begingroup$

Your way of aligning the stack breaks the restoration of the saved registers: the andq changes rsp after the callee-saved registers have been pushed, so the pops in the epilogue can read from the wrong addresses. You should keep track of stack alignment in the compiler, and manage it with sub and add before and after every call. Currently, you don't reverse your change to rsp, nor have a simple way to
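
To make the effect concrete, here is a rough Python sketch that replays the rsp arithmetic of the posted factorial for a 4-byte and an 8-byte local. It is only a model, not part of the asker's compiler: the function name `simulate` is made up for illustration, offsets are taken relative to rbp, and it assumes rbp is 16-byte aligned after the prologue (which holds if the caller aligned rsp before the call).

```python
# Offsets are relative to %rbp after `movq %rsp, %rbp`.
# Assumption: %rbp is 16-byte aligned there, so aligning the offset
# is equivalent to aligning the real address.
CALLEE_SAVED = ["rbx", "r12", "r13", "r14", "r15"]


def simulate(local_bytes):
    """Return (saved, restored): the rbp-relative slot each register
    is pushed to in the prologue and popped from in the epilogue."""
    rsp = -local_bytes                    # subq $local_bytes, %rsp
    saved = {}
    for reg in CALLEE_SAVED:              # pushq %rbx ... pushq %r15
        rsp -= 8
        saved[reg] = rsp
    rsp &= -16                            # andq $-16, %rsp (rounds down)
    # pushq %r10 / pushq %r11 / call / popq %r11 / popq %r10
    # leave rsp where it was after the andq.
    restored = {}
    for reg in reversed(CALLEE_SAVED):    # popq %r15 ... popq %rbx
        restored[reg] = rsp
        rsp += 8
    return saved, restored


for size in (4, 8):
    saved, restored = simulate(size)
    print(size, "ok" if saved == restored else "mismatch", saved, restored)
```

With a 4-byte local the five pushes leave rsp at rbp-44, the andq moves it to rbp-48, and every pop then reads 4 bytes below the slot that was written, so rbx (which holds num across the recursive call) comes back corrupted. With an 8-byte local rsp is already at rbp-48, the andq is a no-op, and the pops line up, which matches the observation that the 8-byte version worked.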

$\endgroup$
  • 1
    $\begingroup$ sub and add around the call are not a conventional way to do it on x64 under Linux (nor Windows either, but random other OSes may have random conventions). The typical way to do this is to set rsp up correctly (including space to save caller-save registers and to pass arguments) in the prologue and then never change it in the body of the function. That works well with the table-based exception unwinding. $\endgroup$
    – user1030
    Commented Jul 19, 2023 at 17:38
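
The prologue-based approach from the comment can be sketched as a small compiler-side helper. This is a hedged illustration in Python, not the asker's code generator: `align_up`, `frame_layout`, and `spill_bytes` are hypothetical names, and it assumes the generator is changed to push the callee-saved registers right after `movq %rsp, %rbp` and only then emit a single `subq`.

```python
def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def frame_layout(local_bytes, saved_regs=5, spill_bytes=0):
    """Return (sub_amount, saved_area_offset).

    sub_amount        -- operand for the one `subq $N, %rsp` in the prologue
    saved_area_offset -- rbp-relative address of the last pushed register,
                         usable as `leaq OFFSET(%rbp), %rsp` before the pops

    Assumes %rbp is 16-byte aligned after `pushq %rbp; movq %rsp, %rbp`
    and that the callee-saved registers are pushed before the subq.
    """
    pushed = 8 * saved_regs                  # bytes taken by the pushq sequence
    needed = local_bytes + spill_bytes       # locals plus caller-saved spill area
    total = align_up(pushed + needed, 16)    # whole frame below %rbp
    return total - pushed, -pushed


# factorial: one i32 parameter, five callee-saved registers
print(frame_layout(4))                  # (8, -40)
# same function with 16 bytes reserved to save %r10/%r11 without pushq
print(frame_layout(4, spill_bytes=16))  # (24, -40)
```

Under these assumptions rsp stays 16-byte aligned for the whole function body, so no andq is needed before a call, and the epilogue can reset rsp with `leaq -40(%rbp), %rsp` before popping r15 through rbx.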
