
I wrote a small C program below:

#include <stdlib.h>
int sub(int x, int y){
  return 2*x+y;
}

int main(int argc, char ** argv){
  int a;
  a = atoi(argv[1]);
  return sub(argc,a);
}

I compiled it with gcc 5.4.0, targeting 32-bit x86, and got the following disassembly:

0804841b <main>:
 804841b: 8d 4c 24 04           lea    0x4(%esp),%ecx
 804841f: 83 e4 f0              and    $0xfffffff0,%esp
 8048422: ff 71 fc              pushl  -0x4(%ecx)
 8048425: 55                    push   %ebp
 8048426: 89 e5                 mov    %esp,%ebp
 8048428: 53                    push   %ebx
 8048429: 51                    push   %ecx
 804842a: 83 ec 10              sub    $0x10,%esp
 804842d: 89 cb                 mov    %ecx,%ebx
....

What are the first three instructions before push %ebp doing? I haven't seen those in older gcc compiled binaries.

2 Answers

19

What the Instructions Are Doing

What are the first three instructions before push %ebp doing?

Namely,

 804841b: 8d 4c 24 04           lea    0x4(%esp),%ecx      <-  1
 804841f: 83 e4 f0              and    $0xfffffff0,%esp    <-  2
 8048422: ff 71 fc              pushl  -0x4(%ecx)          <-  3

This is easy to see if gdb (or some other debugger) is used to step through the code.

  1. 804841b: 8d 4c 24 04 lea 0x4(%esp),%ecx

At this point in the process, the memory address in register $esp is 0xffffd13c, so 4(%esp) = $esp+4 = 0xffffd140:

>>> x/x $esp+4
0xffffd140: 0x01

This means that the lea instruction loads the effective address of 0x4(%esp), 0xffffd140, into $ecx.


  2. 804841f: 83 e4 f0 and $0xfffffff0,%esp

Next, the value in $esp, 0xffffd13c, is ANDed with 0xfffffff0:

0xffffd13c:            11111111111111111101000100111100
0xfffffff0:       AND  11111111111111111111111111110000
                  -------------------------------------
                       11111111111111111101000100110000

This results in the value 0xffffd130, which is stored in $esp. This is equivalent to

0xffffd13c - 0x0c = 0xffffd130.

This has the effect of creating 12 bytes of space on the process runtime stack. On a side note, the value -16 would be represented as 0xfffffff0, so we could think of

and $0xfffffff0,%esp

as

and $-16,%esp

This is done to keep the stack aligned to a 16-byte boundary, since the next instruction (see 3) decrements the stack pointer by 4 and then saves a value to the stack.


  3. 8048422: ff 71 fc pushl -0x4(%ecx)

As a result of lea 0x4(%esp),%ecx from earlier, the value in $ecx is equivalent to what had been $esp+4 (that is, 0xffffd140). As a result,

-0x4(%ecx) = 0xffffd140 - 4 = 0xffffd13c.

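This was the value of $esp at the beginning of main(), and the word stored at that address is the return address of main(). Because pushl takes a memory operand here, it is that word (not the address 0xffffd13c itself) that is pushed onto the process runtime stack.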


summary:

 lea    0x4(%esp),%ecx         // load 0xffffd140 into $ecx
 and    $0xfffffff0,%esp       // round $esp down to a multiple of 16 (here: subtract 0x0c, decimal 12)
 pushl  -0x4(%ecx)             // decrement $esp by 4, copy the word at 0xffffd13c (the return address) to the stack

The Purpose of these Instructions

What is the purpose of these instructions before the main preamble?

A clue about the purpose of these instructions is the fact that they are executed prior to the conventional function prologue:

8048425: 55                    push   %ebp
8048426: 89 e5                 mov    %esp,%ebp

According to the System V Application Binary Interface Intel386 Architecture Processor Supplement, Fourth Edition, after the execution of the function prologue, $ebp+4 is the location on the runtime stack of the return address.

[Figure: SYS V ABI i386 supplement C stack frame]

The value saved on the stack at $ebp+4 by the instruction

8048422: ff 71 fc pushl -0x4(%ecx)

is the word stored at address 0xffffd13c, namely 0xf7e12637. This is the address of offset 247 in __libc_start_main():

>>> x/x $ecx-4
0xffffd13c: 0xf7e12637
>>> x/x 0xf7e12637
0xf7e12637 <__libc_start_main+247>: 0x8310c483

This indicates that the return address of main() is in function __libc_start_main().

As for $ecx, this register holds the address of argc, the first argument of main():

>>> x/x $ecx
0xffffd140: 0x00000001

Note that since variable a is never used, the compiler optimizes out the call to atoi.

So to answer the question directly, the instructions in main() prior to the prologue save a pointer to the arguments of main() in $ecx (it points at argc), align the stack pointer to a 16-byte boundary, and copy the return address of main() onto the newly aligned stack.

The C Runtime Environment and Linux Process Anatomy

Naturally, the next question is "What is __libc_start_main?" According to Linux Standard Base PDA Specification 3.0RC1:

The __libc_start_main() function shall initialize the process, call the main function with appropriate arguments, and handle the return from main().

So where does __libc_start_main() come from? The short answer is that it is a function in the shared object /lib/i386-linux-gnu/libc-2.23.so which is dynamically linked into the executable ELF binary:

 $ ldd [binary_name]
    linux-gate.so.1 =>  (0xf7764000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7586000)
    /lib/ld-linux.so.2 (0x56640000)

In addition to __libc_start_main(), the function __gmon_start__, also part of process initialization, is dynamically linked to the executable ELF binary as well:

$ readelf --dyn-syms [binary_name]

Symbol table '.dynsym' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@GLIBC_2.4 (2)
     2: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (3)
     4: 0804851c     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used

Here is the complete picture, from Linux x86 Program Start Up or - How the heck do we get to main()? by Patrick Horgan:

[Figure: C process initialization call graph]

On a final note, if the return address of main(), 0xf7e12637, is examined more closely, we see that it lies outside both the text segment and the runtime stack. This address, inside __libc_start_main(), is located in the memory-mapped segment of virtual memory, as shown by this diagram from Gustavo Duarte's article Anatomy of a Program in Memory:

[Figure: Linux Process Layout in VM]

7

What does this do?

These three instructions move the stack frame of main, beginning with its return address, down to the next 16-byte-aligned address.

lea    0x4(%esp),%ecx    # save address of arguments
and    $0xfffffff0,%esp  # align stack
pushl  -0x4(%ecx)        # move return address
...                      # continue normal preamble

At the same time, the arguments to main (argc and argv) are not moved, so a pointer to them is saved in %ecx.

Recall the layout of the stack upon entering main:

%esp+8:  argv (a pointer to an array of pointers)
%esp+4:  argc (a 32-bit integer)
%esp+0:  return address (from call)

The arguments sit right above the return address, so %esp+4 is saved to %ecx before the stack pointer is adjusted. Next, %ecx also serves as our pointer to locate the original return address, -4(%ecx), which we push to our new stack frame.

After the rest of the preamble (push %ebp; mov %esp,%ebp), the stack will look like this:

%ecx+4:  argv pointer
%ecx+0:  argc
%ecx-4:  original return address
         ...
%esp+4:  copy of return address
%esp+0:  saved base pointer

In your code, you can also see that %ecx is pushed onto the stack (i.e. saved as a local variable) after the preamble. It is restored from there at the end of the function, which will look like this:

...
mov    -0x8(%ebp),%ecx   # load pointer to argc
leave                    # unwind stack frame, pop %ebp
lea    -0x4(%ecx),%esp   # restore original stack pointer
ret                      # jump out, using the original return address!

Why is all this done at all?

Modern processors prefer data aligned to 16-byte boundaries: some operations take a significant performance hit on unaligned data, and others (for example, aligned SSE moves such as movaps) fault outright.

Adjusting the main stack frame once allows the rest of the code to run without further adjustment as long as care is taken to always allocate stack in multiples of 16 bytes before a call. That is why you will often see something like this:

sub    $0xc,%esp    # pad stack by 12 bytes
push   %eax         # push 4-byte argument
call   puts

NB: The x86-64 ABI makes 16-byte stack alignment mandatory. Incidentally, this means that you will not find a frame adjustment on main in 64-bit code: the stack is already aligned.

3
  • Welcome! Based on what you have written so far, I look forward to reading your future posts.
    – julian
    Commented Aug 3, 2018 at 23:30
  • Thanks! I came here to look this up and afterwards felt that, while your answer is quite elaborate, it was missing a few details. Since as a new user I couldn't comment, I took a shot at my own. Hope you don't mind! :)
    – pesco
    Commented Aug 4, 2018 at 13:14
  • 1
    I like this better than @SYS_V's answer (no offense to SYS_V). I don't believe SYS_V's answer addresses "what is the purpose". It does a great job of explaining what the instructions do. The answer seems to be very very simple. Above and beyond the obvious alignment-optimization "At the same time, the arguments to main (argc and argv) are not moved, so a pointer to them is saved in %ecx." Beautiful. Thanks a ton.
    Commented Nov 6, 2018 at 22:02
