Bare-metal start-up code for Cortex M3 .bss region initialization

Question

I have developed, inspired by the example linked here, bare-metal start-up code for an ARM Cortex-M3. However, I have run into the following problem: suppose I declare an uninitialized global variable, say of type unsigned char, in main.c

#include ...
unsigned char var; 
...
int main()
{
 ...
}

This makes the .bss region on the STM32F103 start at _BSS_START = 0x20000000 and end at _BSS_END = 0x20000001. Now, the start-up code

    unsigned int * bss_start_p = &_BSS_START; 
    unsigned int * bss_end_p = &_BSS_END;

    while(bss_start_p != bss_end_p)
    {
        *bss_start_p = 0;
        bss_start_p++;
    }

tries to zero the whole .bss region. However, inside that while loop the pointer advances by 4 bytes, so after one step bss_start_p = 0x20000004. It will therefore never equal bss_end_p, which leads to an infinite loop.

Is there a standard solution to this? Am I supposed to somehow "force" the size of the .bss region to be a multiple of 4? Or should I use a pointer to unsigned char to walk through the .bss region? Perhaps something like:

    unsigned char * bss_start_p = (unsigned char *)(&_BSS_START); 
    unsigned char * bss_end_p = (unsigned char *)(&_BSS_END);

    while(bss_start_p != bss_end_p)
    {
        *bss_start_p = 0;
        bss_start_p++;
    }

use less than. bootstraps are written in assembly for a reason. first off now you have created a .data problem. its a chicken and egg thing to use/assume that C works you rely on .text, .bss and .data at a minimum, but you are writing C code that makes sure C code will work, using things in C code that requires a bootstrap possibly written in C code that relies on C working. — old_timer, Commented Aug 7, 2019 at 12:39
the code to copy .data over is very similar to .bss, but if you write it like the code above then you need .data copied over in order to copy .data over. — old_timer, Commented Aug 7, 2019 at 12:43

bitsmack · Accepted Answer · 2019-08-06 22:33:38Z

15

As you suspect, this is happening because the unsigned int data type is 4 bytes in size. Each *bss_start_p = 0; statement actually clears four bytes of the bss area.
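To make the stepping concrete, here is a small host-side model (not start-up code). Plain integers stand in for addresses, and the function name is made up for this illustration. It only reports whether a pointer advancing 4 bytes at a time ever lands exactly on the end address, which is the condition the != test depends on.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a word pointer stepping from `start` lands exactly on `end`
 * (so a != test would stop there), 0 if it steps over it. */
int word_loop_hits_end(uint32_t start, uint32_t end)
{
    assert(start <= end);
    uint32_t p = start;
    while (p < end) {
        /* what bss_start_p++ does to the address of an unsigned int pointer */
        p += (uint32_t)sizeof(uint32_t);
    }
    return p == end;
}
```

With the addresses from the question (0x20000000 and 0x20000001) the pointer goes straight to 0x20000004 and the function returns 0; with an end address of 0x20000004 it returns 1.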

The bss memory range needs to be aligned correctly. You could simply define _BSS_START and _BSS_END so that the total size is a multiple of four, but this is usually handled by allowing the linker script to define the start and stop locations.

As an example, here is the linker section in one of my projects:

.bss (NOLOAD) : ALIGN(4)
{
    __bss_start__ = .;
    *(.bss)
    . = ALIGN(4);
    __bss_end__ = .;
} >RAM

The ALIGN(4) statements take care of things.
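For readers who have not met ALIGN() before, the arithmetic is just "round up to the next multiple". The C sketch below shows that rounding on plain numbers; align_up is a hypothetical helper name for this illustration, not something the linker or any library provides.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
 * align must be a power of two, as it is for ALIGN(4). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1u)) == 0);
    return (addr + align - 1u) & ~(align - 1u);
}
```

Applied to the question's numbers, an end address of 0x20000001 becomes 0x20000004, so a word-sized loop ends exactly on the boundary.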

Also, you may wish to change

while(bss_start_p != bss_end_p)

to

while(bss_start_p < bss_end_p).

This won't prevent the problem (since you might be clearing 1-3 more bytes than you wish), but it could minimize the impact :)
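The "1-3 more bytes" can be checked on a host with a small model. This is only a sketch: a 16-byte array plays the role of RAM, 0xAA marks bytes that belong to something else, and memcpy stands in for a 32-bit store so the code stays well-defined C. The function name is an assumption for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clear `size` bytes at the start of a small buffer using 4-byte stores and
 * a < test, then count how many bytes past `size` were zeroed as well. */
unsigned overrun_bytes(unsigned size)
{
    uint8_t ram[16];
    const uint32_t zero = 0;
    assert(size <= 12);                 /* leave room for the overshoot */
    memset(ram, 0xAA, sizeof ram);      /* 0xAA = someone else's data */

    for (unsigned off = 0; off < size; off += 4) {
        memcpy(&ram[off], &zero, 4);    /* one word store */
    }

    unsigned clobbered = 0;
    for (unsigned i = size; i < sizeof ram; i++) {
        if (ram[i] == 0) {
            clobbered++;
        }
    }
    return clobbered;
}
```

A 1-byte region loses 3 neighbouring bytes, a 6-byte region loses 2, and a region whose size is a multiple of 4 loses none.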

answered Aug 6, 2019 at 22:33

bitsmack


@CMarius Upon reflection, I think your char pointer idea would work great, although it would require more cycles. But I'm not sure if there would be subsequent problems with the next memory area being unaligned, so I'm not going to mention it in my answer...
– bitsmack
Commented Aug 7, 2019 at 5:33
1

while(bss_start_p < bss_end_p - 1) followed by a byte-wise clearing of the remaining memory range would eliminate the last concern.
– glglgl
Commented Aug 8, 2019 at 6:24

old_timer · Accepted Answer · 2019-08-08 00:44:13Z

There are countless other sites and examples, many thousands if not tens of thousands. There are the well-known C libraries with linker scripts and bootstrap code, newlib and glibc in particular, but there are others you can find. Bootstrapping C with C makes no sense.

Your question has been answered: you are trying to do an exact compare on things that might not be exact. The region might not start on a known boundary or end on a known boundary. So you can do the less-than thing, but if the code didn't work with an exact comparison then you are zeroing past .bss into the next section, which may or may not cause bad things to happen. Simply replacing the test with a less-than isn't the solution.

So here goes; the TL;DR is fine. You don't bootstrap a language with that language. You can get away with it, sure, but you are playing with fire when you do that. If you are just learning how to do this you need to be on the side of caution, not relying on dumb luck or on facts you have not uncovered yet.

The linker script and the bootstrap code have a very intimate relationship. They are married, joined at the hip; you don't develop one without the other, as that leads to massive failure. Unfortunately the linker script format is defined by the linker and the assembly language by the assembler, so as you change toolchains expect to have to rewrite both. Why assembly language? It needs no bootstrap; compiled languages generally do. C does if you don't want to limit your use of the language. I'll start with something very simple that has minimal toolchain-specific requirements. The first rule: you don't assume .bss variables are zero. (Code is less readable if a variable is never initialized in that language, so try to avoid relying on it; it isn't true for local variables anyway, so you have to be on the ball as to when you use it. Folks shun globals anyway, so why are we talking about .bss and .data? Globals are good for this level of work, but that is another topic.) The other rule for the simple solution: don't initialize variables in the declaration, do it in the code. Yes, that burns more flash, but you generally have plenty, and not all variables are initialized with constants anyway, so some end up consuming instructions regardless.

You can tell from the Cortex-M design that they may have been thinking there would be no bootstrap code at all, so no .data or .bss support. Most folks that use globals can't live without them, so here goes:

I could make this more minimal, but here is a minimal functional example for all Cortex-Ms using the GNU toolchain. I don't remember which versions you can start with, 5.x.x or so up through the current 9.x.x. I switched linker scripts somewhere around 3.x.x or 4.x.x as I learned more and as GNU changed something that broke my first one.

bootstrap:

.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20000800
.word reset
.word done
.word done
.word done

.thumb_func
reset:
    bl centry
    b done

.thumb_func
done:   b .

.thumb_func
.globl bounce
bounce:
    bx lr

entry point into C code:

void bounce ( unsigned int );

unsigned int a;

int centry ( void )
{
    a = 7;
    bounce(a);
    return(0);
}

linker script.

MEMORY
{
    rom : ORIGIN = 0x00000000, LENGTH = 0x1000
    ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > rom
    .rodata : { *(.rodata*) } > rom
    .bss : { *(.bss*) } > ram
}

All of these could be smaller and still work, added some extra stuff here just to see it at work.

optimized build and link.

00000000 <_start>:
   0:   20001000
   4:   00000015
   8:   0000001b
   c:   0000001b
  10:   0000001b

00000014 <reset>:
  14:   f000 f804   bl  20 <centry>
  18:   e7ff        b.n 1a <done>

0000001a <done>:
  1a:   e7fe        b.n 1a <done>

0000001c <bounce>:
  1c:   4770        bx  lr
    ...

00000020 <centry>:
  20:   2207        movs    r2, #7
  22:   b510        push    {r4, lr}
  24:   4b04        ldr r3, [pc, #16]   ; (38 <centry+0x18>)
  26:   2007        movs    r0, #7
  28:   601a        str r2, [r3, #0]
  2a:   f7ff fff7   bl  1c <bounce>
  2e:   2000        movs    r0, #0
  30:   bc10        pop {r4}
  32:   bc02        pop {r1}
  34:   4708        bx  r1
  36:   46c0        nop         ; (mov r8, r8)
  38:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <a>:
20000000:   00000000    andeq   r0, r0, r0

For some vendors you want to use 0x08000000 or 0x01000000 or other similar addresses, as the flash is mapped there and mirrored to 0x00000000 in some boot modes. Some only have part of the flash mirrored at 0x00000000, so you want the vector table to point at the application flash space, not zero. Since it is vector-table based, it all works.

First note that the Cortex-Ms are Thumb-only machines, and for whatever reason they enforce a Thumb function address, meaning the lsbit is set (the address is odd). Know your tools: the .thumb_func directive tells the GNU assembler that the next label is a Thumb function address. Doing the +1 thing in the table by hand will lead to failure; don't be tempted to do it, do it right. There are other GNU assembler ways to declare a function; this is the minimal approach.

   4:   00000015
   8:   0000001b
   c:   0000001b
  10:   0000001b

It won't boot if you don't get the vector table right.
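To read those table entries: each one is the handler's address with bit 0 set. The C helper below is purely illustrative (the name is made up, and in a real build the assembler and linker produce these values for you via .thumb_func); it just reproduces the numbers in the listing from the label addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Value a vector-table entry holds for a Thumb handler located at code_addr:
 * the same address with bit 0 set. */
uint32_t thumb_vector(uint32_t code_addr)
{
    assert((code_addr & 1u) == 0);   /* Thumb code is halfword aligned */
    return code_addr | 1u;
}
```

reset sits at 0x14 and shows up in the table as 0x15; done sits at 0x1a and shows up as 0x1b.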

Arguably you only need the stack pointer vector (you can put anything in there if you wish to set the stack pointer yourself in code) and the reset vector. I put four exception vectors after it here for no particular reason. I usually put 16 but wanted to shorten this example.

So what is the minimum a C bootstrap needs to do?
1. set the stack pointer
2. zero .bss
3. copy .data
4. branch to or call the C entry point
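Steps 2 and 3 are easy to model on a host, just to see what they do to memory. To be clear, this is a simulation on ordinary arrays and not a suggestion to write the real bootstrap in C; the struct and function names are assumptions made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory image for the model: rom_data plays the flash copy of
 * .data, ram_data and ram_bss play the run-time locations in SRAM. */
struct image {
    const uint32_t *rom_data;   /* load address of .data */
    uint32_t *ram_data;         /* run address of .data */
    uint32_t data_words;
    uint32_t *ram_bss;
    uint32_t bss_words;
};

void model_startup(const struct image *img)
{
    assert(img != NULL);
    for (uint32_t i = 0; i < img->bss_words; i++) {
        img->ram_bss[i] = 0;                    /* step 2: zero .bss */
    }
    for (uint32_t i = 0; i < img->data_words; i++) {
        img->ram_data[i] = img->rom_data[i];    /* step 3: copy .data */
    }
    /* Steps 1 and 4 (stack pointer, jump to the C entry point) have no
     * host-side equivalent and are left out of this model. */
}
```

Both loops work in whole words, which is why the sizes handed to them need to be multiples of four.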

The C entry point is usually called main(), but some toolchains see main() and add extra garbage to your code. I intentionally use a different name. YMMV.

The copy of .data is not needed if everything is RAM based. On a Cortex-M microcontroller that is technically possible but unlikely, so the .data copy is needed... if there is any .data.

My first example, and a coding style, is to not rely on .data or .bss, as in this example. ARM took care of the stack pointer, so the only thing left is to call the entry point. I like to have it so the entry point can return; many folks argue you should never do that. In that case you could just do this:

.thumb_func
.global _start
_start:
stacktop: .word 0x20000800
.word centry
.word done
.word done
.word done

and not return from centry() and not have reset handler code.

00000020 <centry>:
  20:   2207        movs    r2, #7
  22:   b510        push    {r4, lr}
  24:   4b04        ldr r3, [pc, #16]   ; (38 <centry+0x18>)
  26:   2007        movs    r0, #7
  28:   601a        str r2, [r3, #0]
  2a:   f7ff fff7   bl  1c <bounce>
  2e:   2000        movs    r0, #0
  30:   bc10        pop {r4}
  32:   bc02        pop {r1}
  34:   4708        bx  r1
  36:   46c0        nop         ; (mov r8, r8)
  38:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <a>:
20000000:   00000000

the linker has put things where we asked. And overall we have a fully functional program.

So first work on the linker script:

MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > bob

    .rodata : { *(.rodata*) } > bob

   __data_rom_start__ = .;
   .data : {
    __data_start__ = .;
    *(.data*)
   } > ted AT > bob
   __data_end__ = .;
   __data_size__ = __data_end__ - __data_start__;

   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   } > ted
   __bss_end__ = .;
   __bss_size__ = __bss_end__ - __bss_start__;

}

Renaming rom and ram to bob and ted emphasizes that the names have no meaning; they only connect the dots for the linker between sections.

.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20000800
.word reset
.word done
.word done
.word done

.thumb_func
reset:
    bl centry
    b done

.thumb_func
done:   b .

.thumb_func
.globl bounce
bounce:
    bx lr

.align
.word __data_rom_start__
.word __data_start__
.word __data_end__
.word __data_size__

add some items so that we can see what the tools did

void bounce ( unsigned int );

unsigned int a;

unsigned int b=4;
unsigned char c=5;

int centry ( void )
{
    a = 7;
    bounce(a);
    return(0);
}

add some items to place in those sections. and get

Disassembly of section .text:

00000000 <_start>:
   0:   20000800    andcs   r0, r0, r0, lsl #16
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   0000001b    andeq   r0, r0, r11, lsl r0
   c:   0000001b    andeq   r0, r0, r11, lsl r0
  10:   0000001b    andeq   r0, r0, r11, lsl r0

00000014 <reset>:
  14:   f000 f80c   bl  30 <centry>
  18:   e7ff        b.n 1a <done>

0000001a <done>:
  1a:   e7fe        b.n 1a <done>

0000001c <bounce>:
  1c:   4770        bx  lr
  1e:   46c0        nop         ; (mov r8, r8)
  20:   0000004c    andeq   r0, r0, r12, asr #32
  24:   20000000    andcs   r0, r0, r0
  28:   20000008    andcs   r0, r0, r8
  2c:   00000008    andeq   r0, r0, r8

00000030 <centry>:
  30:   2207        movs    r2, #7
  32:   b510        push    {r4, lr}
  34:   4b04        ldr r3, [pc, #16]   ; (48 <centry+0x18>)
  36:   2007        movs    r0, #7
  38:   601a        str r2, [r3, #0]
  3a:   f7ff ffef   bl  1c <bounce>
  3e:   2000        movs    r0, #0
  40:   bc10        pop {r4}
  42:   bc02        pop {r1}
  44:   4708        bx  r1
  46:   46c0        nop         ; (mov r8, r8)
  48:   20000008    andcs   r0, r0, r8

Disassembly of section .data:

20000000 <c>:
20000000:   00000005    andeq   r0, r0, r5

20000004 <b>:
20000004:   00000004    andeq   r0, r0, r4

Disassembly of section .bss:

20000008 <a>:
20000008:   00000000    andeq   r0, r0, r0

Here is the stuff we are looking for in that experiment (note there was no reason to actually load or run any code... know your tools, learn them):

  1c:   4770        bx  lr
  1e:   46c0        nop         ; (mov r8, r8)
  20:   0000004c    andeq   r0, r0, r12, asr #32
  24:   20000000    andcs   r0, r0, r0
  28:   20000008    andcs   r0, r0, r8
  2c:   00000008    andeq   r0, r0, r8

So what we learned here is that the position of symbol assignments is very sensitive in GNU linker scripts. Note the position of __data_rom_start__ vs __data_start__, but why does __data_end__ work? I'll let you figure that out. You may already understand why one might not want to mess with linker scripts and just get on with simple programming...

Another thing we learned here is that the linker aligned __data_rom_start__ for us; we didn't need an ALIGN(4) in there. Should we assume that will always work?

Also note that it padded on the way out too: we have 5 bytes of .data but it padded it to 8. Without any ALIGN()s we can already do the copy using words. Based on what we see with this toolchain on my computer today, might that be true for the past and future? Who knows. Even with the ALIGNs you need to check periodically to confirm some new version didn't break things; they will do that from time to time.
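The 5-becomes-8 arithmetic can be reproduced with a few lines of C. This sketch only mirrors the layout seen in the listing (c at offset 0, b at offset 4); it does not claim to be how ld makes the decision, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

struct placement {
    uint32_t c_off;   /* offset of unsigned char c */
    uint32_t b_off;   /* offset of unsigned int b  */
    uint32_t end;     /* end of the section after padding */
};

/* Place a 1-byte object, then a 4-byte object at its natural alignment,
 * then round the end up to a multiple of pad (a power of two). */
struct placement place_c_then_b(uint32_t pad)
{
    struct placement p;
    assert(pad != 0 && (pad & (pad - 1u)) == 0);
    p.c_off = 0;
    p.b_off = (p.c_off + 1u + 3u) & ~3u;                 /* next multiple of 4 */
    p.end   = (p.b_off + 4u + pad - 1u) & ~(pad - 1u);
    return p;
}
```

With pad set to 4 this gives b at offset 4 and an end offset of 8, matching the 0x20000000, 0x20000004 and 0x20000008 values above.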

from that experiment lets move on to this just to be safe.

MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > bob

    .rodata : { *(.rodata*) } > bob

   . = ALIGN(4);
   __data_rom_start__ = .;
   .data : {
    __data_start__ = .;
    *(.data*)
   . = ALIGN(4);
   __data_end__ = .;
   } > ted AT > bob
   __data_size__ = __data_end__ - __data_start__;

   . = ALIGN(4);
   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   . = ALIGN(4);
   __bss_end__ = .;
   } > ted
   __bss_size__ = __bss_end__ - __bss_start__;

}

Moving the ends inside to be consistent with what other folks do. And that didn't change anything:

0000001c <bounce>:
  1c:   4770        bx  lr
  1e:   46c0        nop         ; (mov r8, r8)
  20:   0000004c    andeq   r0, r0, r12, asr #32
  24:   20000000    andcs   r0, r0, r0
  28:   20000008    andcs   r0, r0, r8
  2c:   00000008    andeq   r0, r0, r8

one more quick test:

.globl bounce
bounce:
    nop
    bx lr

giving

0000001c <bounce>:
  1c:   46c0        nop         ; (mov r8, r8)
  1e:   4770        bx  lr
  20:   0000004c    andeq   r0, r0, r12, asr #32
  24:   20000000    andcs   r0, r0, r0
  28:   20000008    andcs   r0, r0, r8
  2c:   00000008    andeq   r0, r0, r8

no need to pad between bounce and the .align

Ohh, right, I remember now why I don't put the __data_end__ and __bss_end__ symbols inside: because it DOESN'T WORK.

MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > bob

    .rodata : { *(.rodata*) } > bob

   . = ALIGN(4);
   __data_rom_start__ = .;
   .data : {
    __data_start__ = .;
    *(.data*)
   } > ted AT > bob
   . = ALIGN(4);
   __data_end__ = .;
   __data_size__ = __data_end__ - __data_start__;

   . = ALIGN(4);
   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   } > ted
   . = ALIGN(4);
   __bss_end__ = .;
   __bss_size__ = __bss_end__ - __bss_start__;

}

some simple, but very portable code to marry to this linker script

.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20000800
.word reset
.word done
.word done
.word done

.thumb_func
reset:

    ldr r0,blen
    cmp r0,#0
    beq bss_zero_done
    ldr r1,bstart
    mov r2,#0
bss_zero:
    stmia r1!,{r2}
    sub r0,#4
    bne bss_zero
bss_zero_done:

    ldr r0,dlen
    cmp r0,#0
    beq data_copy_done
    ldr r1,rstart
    ldr r2,dstart
data_copy:
    ldmia r1!,{r3}
    stmia r2!,{r3}
    sub r0,#4
    bne data_copy
data_copy_done:

    bl centry
    b done

.thumb_func
done:   b .

.thumb_func
.globl bounce
bounce:
    nop
    bx lr

.align
bstart: .word __bss_start__
blen:   .word __bss_size__
rstart: .word __data_rom_start__
dstart: .word __data_start__
dlen:   .word __data_size__

giving

Disassembly of section .text:

00000000 <_start>:
   0:   20000800    andcs   r0, r0, r0, lsl #16
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   0000003d    andeq   r0, r0, sp, lsr r0
   c:   0000003d    andeq   r0, r0, sp, lsr r0
  10:   0000003d    andeq   r0, r0, sp, lsr r0

00000014 <reset>:
  14:   480c        ldr r0, [pc, #48]   ; (48 <blen>)
  16:   2800        cmp r0, #0
  18:   d004        beq.n   24 <bss_zero_done>
  1a:   490a        ldr r1, [pc, #40]   ; (44 <bstart>)
  1c:   2200        movs    r2, #0

0000001e <bss_zero>:
  1e:   c104        stmia   r1!, {r2}
  20:   3804        subs    r0, #4
  22:   d1fc        bne.n   1e <bss_zero>

00000024 <bss_zero_done>:
  24:   480b        ldr r0, [pc, #44]   ; (54 <dlen>)
  26:   2800        cmp r0, #0
  28:   d005        beq.n   36 <data_copy_done>
  2a:   4908        ldr r1, [pc, #32]   ; (4c <rstart>)
  2c:   4a08        ldr r2, [pc, #32]   ; (50 <dstart>)

0000002e <data_copy>:
  2e:   c908        ldmia   r1!, {r3}
  30:   c208        stmia   r2!, {r3}
  32:   3804        subs    r0, #4
  34:   d1fb        bne.n   2e <data_copy>

00000036 <data_copy_done>:
  36:   f000 f80f   bl  58 <centry>
  3a:   e7ff        b.n 3c <done>

0000003c <done>:
  3c:   e7fe        b.n 3c <done>

0000003e <bounce>:
  3e:   46c0        nop         ; (mov r8, r8)
  40:   4770        bx  lr
  42:   46c0        nop         ; (mov r8, r8)

00000044 <bstart>:
  44:   20000008    andcs   r0, r0, r8

00000048 <blen>:
  48:   00000004    andeq   r0, r0, r4

0000004c <rstart>:
  4c:   00000074    andeq   r0, r0, r4, ror r0

00000050 <dstart>:
  50:   20000000    andcs   r0, r0, r0

00000054 <dlen>:
  54:   00000008    andeq   r0, r0, r8

00000058 <centry>:
  58:   2207        movs    r2, #7
  5a:   b510        push    {r4, lr}
  5c:   4b04        ldr r3, [pc, #16]   ; (70 <centry+0x18>)
  5e:   2007        movs    r0, #7
  60:   601a        str r2, [r3, #0]
  62:   f7ff ffec   bl  3e <bounce>
  66:   2000        movs    r0, #0
  68:   bc10        pop {r4}
  6a:   bc02        pop {r1}
  6c:   4708        bx  r1
  6e:   46c0        nop         ; (mov r8, r8)
  70:   20000008    andcs   r0, r0, r8

Disassembly of section .data:

20000000 <c>:
20000000:   00000005    andeq   r0, r0, r5

20000004 <b>:
20000004:   00000004    andeq   r0, r0, r4

Disassembly of section .bss:

20000008 <a>:
20000008:   00000000    andeq   r0, r0, r0

We can stop there or keep going. If we initialize in the same order as the linker script, it is okay if we run over into the next region, as we have not gotten to it yet. And stm/ldm only require word-aligned addresses, so you can change to:

    ldr r0,blen
    cmp r0,#0
    beq bss_zero_done
    ldr r1,bstart
    mov r2,#0
    mov r3,#0
    mov r4,#0
    mov r5,#0
bss_zero:
    stmia r1!,{r2,r3,r4,r5}
    sub r0,#16
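    bgt bss_zero
bss_zero_done:

with .bss first in the linker script. And yes, you want a signed branch here (bgt, not bhi): the count goes negative when the size is not a multiple of 16, and the loop must keep going only while bytes remain.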

Disassembly of section .text:

00000000 <_start>:
   0:   20000800    andcs   r0, r0, r0, lsl #16
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   00000043    andeq   r0, r0, r3, asr #32
   c:   00000043    andeq   r0, r0, r3, asr #32
  10:   00000043    andeq   r0, r0, r3, asr #32

00000014 <reset>:
  14:   480d        ldr r0, [pc, #52]   ; (4c <blen>)
  16:   2800        cmp r0, #0
  18:   d007        beq.n   2a <bss_zero_done>
  1a:   490b        ldr r1, [pc, #44]   ; (48 <bstart>)
  1c:   2200        movs    r2, #0
  1e:   2300        movs    r3, #0
  20:   2400        movs    r4, #0
  22:   2500        movs    r5, #0

00000024 <bss_zero>:
  24:   c13c        stmia   r1!, {r2, r3, r4, r5}
  26:   3804        subs    r0, #4
  28:   ddfc        ble.n   24 <bss_zero>

0000002a <bss_zero_done>:
  2a:   480b        ldr r0, [pc, #44]   ; (58 <dlen>)
  2c:   2800        cmp r0, #0
  2e:   d005        beq.n   3c <data_copy_done>
  30:   4907        ldr r1, [pc, #28]   ; (50 <rstart>)
  32:   4a08        ldr r2, [pc, #32]   ; (54 <dstart>)

00000034 <data_copy>:
  34:   c978        ldmia   r1!, {r3, r4, r5, r6}
  36:   c278        stmia   r2!, {r3, r4, r5, r6}
  38:   3810        subs    r0, #16
  3a:   ddfb        ble.n   34 <data_copy>

0000003c <data_copy_done>:
  3c:   f000 f80e   bl  5c <centry>
  40:   e7ff        b.n 42 <done>

00000042 <done>:
  42:   e7fe        b.n 42 <done>

00000044 <bounce>:
  44:   46c0        nop         ; (mov r8, r8)
  46:   4770        bx  lr

00000048 <bstart>:
  48:   20000000    andcs   r0, r0, r0

0000004c <blen>:
  4c:   00000004    andeq   r0, r0, r4

00000050 <rstart>:
  50:   20000004    andcs   r0, r0, r4

00000054 <dstart>:
  54:   20000004    andcs   r0, r0, r4

00000058 <dlen>:
  58:   00000008    andeq   r0, r0, r8

0000005c <centry>:
  5c:   2207        movs    r2, #7
  5e:   b510        push    {r4, lr}
  60:   4b04        ldr r3, [pc, #16]   ; (74 <centry+0x18>)
  62:   2007        movs    r0, #7
  64:   601a        str r2, [r3, #0]
  66:   f7ff ffed   bl  44 <bounce>
  6a:   2000        movs    r0, #0
  6c:   bc10        pop {r4}
  6e:   bc02        pop {r1}
  70:   4708        bx  r1
  72:   46c0        nop         ; (mov r8, r8)
  74:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <a>:
20000000:   00000000    andeq   r0, r0, r0

Disassembly of section .data:

20000004 <c>:
20000004:   00000005    andeq   r0, r0, r5

20000008 <b>:
20000008:   00000004    andeq   r0, r0, r4

Those loops will go faster. Now, I don't know if the AHB buses can be 64 bits wide or not, but for a full-sized ARM you would want to align these things on 64-bit boundaries. A four-register ldm/stm on a 32-bit boundary but not a 64-bit boundary becomes three separate bus transactions, whereas aligned on a 64-bit boundary it is a single transaction, saving several clocks per instruction.

Since we are doing bare metal and are wholly responsible for everything, we can put, say, .bss first, then .data, then the heap if we have one, and the stack grows from the top down. If we zero .bss and spill over some, that is fine so long as we start at the right place; we are not using that memory yet. Then we copy .data over and can spill into the heap, and that is fine too. Heap or not, there is plenty of room for the stack, so we are not stepping on anyone or anything (so long as we make sure of that in the linker script). If there is a concern, make the ALIGN()s bigger so that we are always within our own space for these fills.
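How much a fill spills over is simple to work out. The host-side C sketch below models a loop that stores a fixed number of bytes per pass and keeps going while the remaining count is still positive; the function names are assumptions for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes actually written by a loop that stores `stride` bytes per pass and
 * continues while the remaining byte count is still positive. */
uint32_t bytes_written(uint32_t size, uint32_t stride)
{
    assert(stride != 0);
    int64_t remaining = size;    /* signed: it may go below zero */
    uint32_t written = 0;
    while (remaining > 0) {
        written += stride;
        remaining -= stride;
    }
    return written;
}

/* How far past the end of the region the fill reaches. */
uint32_t spill(uint32_t size, uint32_t stride)
{
    return bytes_written(size, stride) - size;
}
```

A 4-byte .bss cleared 16 bytes at a time spills 12 bytes into whatever follows, which is harmless only if that memory is not in use yet. Padding the section to a multiple of the stride brings the spill to zero.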

So, my simple solution, take it or leave it. You are welcome to fix any bugs; I didn't run this on hardware nor on my simulator...

MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}

SECTIONS
{
    .text : { *(.text*) } > bob

    .rodata : { *(.rodata*) } > bob

   . = ALIGN(8);
   .bss  : {
   __bss_start__ = .;
   *(.bss*)
   } > ted
   . = ALIGN(4);
   __bss_end__ = .;
   __bss_size__ = __bss_end__ - __bss_start__;

   . = ALIGN(8);
   __data_rom_start__ = LOADADDR(.data);
   .data : {
    __data_start__ = .;
    *(.data*)
   } > ted AT > bob
   . = ALIGN(4);
   __data_end__ = .;
   __data_size__ = __data_end__ - __data_start__;

}



.thumb

.thumb_func
.global _start
_start:
stacktop: .word 0x20000800
.word reset
.word done
.word done
.word done

.thumb_func
reset:

    ldr r0,blen
    cmp r0,#0
    beq bss_zero_done
    ldr r1,bstart
    mov r2,#0
    mov r3,#0
    mov r4,#0
    mov r5,#0
bss_zero:
    stmia r1!,{r2,r3,r4,r5}
    sub r0,#16
    bgt bss_zero
bss_zero_done:

    ldr r0,dlen
    cmp r0,#0
    beq data_copy_done
    ldr r1,rstart
    ldr r2,dstart
data_copy:
    ldmia r1!,{r3,r4,r5,r6}
    stmia r2!,{r3,r4,r5,r6}
    sub r0,#16
    bgt data_copy
data_copy_done:

    bl centry
    b done

.thumb_func
done:   b .

.thumb_func
.globl bounce
bounce:
    nop
    bx lr

.align
bstart: .word __bss_start__
blen:   .word __bss_size__
rstart: .word __data_rom_start__
dstart: .word __data_start__
dlen:   .word __data_size__


void bounce ( unsigned int );

unsigned int a;

unsigned int b=4;
unsigned char c=5;

int centry ( void )
{
    a = 7;
    bounce(a);
    return(0);
}

arm-none-eabi-as --warn --fatal-warnings flash.s -o flash.o
arm-none-eabi-ld -o hello.elf -T flash.ld flash.o centry.o
arm-none-eabi-objdump -D hello.elf > hello.list
arm-none-eabi-objcopy hello.elf hello.bin -O binary

put it all together and you get:

Disassembly of section .text:

00000000 <_start>:
   0:   20000800    andcs   r0, r0, r0, lsl #16
   4:   00000015    andeq   r0, r0, r5, lsl r0
   8:   00000043    andeq   r0, r0, r3, asr #32
   c:   00000043    andeq   r0, r0, r3, asr #32
  10:   00000043    andeq   r0, r0, r3, asr #32

00000014 <reset>:
  14:   480d        ldr r0, [pc, #52]   ; (4c <blen>)
  16:   2800        cmp r0, #0
  18:   d007        beq.n   2a <bss_zero_done>
  1a:   490b        ldr r1, [pc, #44]   ; (48 <bstart>)
  1c:   2200        movs    r2, #0
  1e:   2300        movs    r3, #0
  20:   2400        movs    r4, #0
  22:   2500        movs    r5, #0

00000024 <bss_zero>:
  24:   c13c        stmia   r1!, {r2, r3, r4, r5}
  26:   3810        subs    r0, #16
  28:   ddfc        ble.n   24 <bss_zero>

0000002a <bss_zero_done>:
  2a:   480b        ldr r0, [pc, #44]   ; (58 <dlen>)
  2c:   2800        cmp r0, #0
  2e:   d005        beq.n   3c <data_copy_done>
  30:   4907        ldr r1, [pc, #28]   ; (50 <rstart>)
  32:   4a08        ldr r2, [pc, #32]   ; (54 <dstart>)

00000034 <data_copy>:
  34:   c978        ldmia   r1!, {r3, r4, r5, r6}
  36:   c278        stmia   r2!, {r3, r4, r5, r6}
  38:   3810        subs    r0, #16
  3a:   ddfb        ble.n   34 <data_copy>

0000003c <data_copy_done>:
  3c:   f000 f80e   bl  5c <centry>
  40:   e7ff        b.n 42 <done>

00000042 <done>:
  42:   e7fe        b.n 42 <done>

00000044 <bounce>:
  44:   46c0        nop         ; (mov r8, r8)
  46:   4770        bx  lr

00000048 <bstart>:
  48:   20000000    andcs   r0, r0, r0

0000004c <blen>:
  4c:   00000004    andeq   r0, r0, r4

00000050 <rstart>:
  50:   20000008    andcs   r0, r0, r8

00000054 <dstart>:
  54:   20000004    andcs   r0, r0, r4

00000058 <dlen>:
  58:   00000008    andeq   r0, r0, r8

0000005c <centry>:
  5c:   2207        movs    r2, #7
  5e:   b510        push    {r4, lr}
  60:   4b04        ldr r3, [pc, #16]   ; (74 <centry+0x18>)
  62:   2007        movs    r0, #7
  64:   601a        str r2, [r3, #0]
  66:   f7ff ffed   bl  44 <bounce>
  6a:   2000        movs    r0, #0
  6c:   bc10        pop {r4}
  6e:   bc02        pop {r1}
  70:   4708        bx  r1
  72:   46c0        nop         ; (mov r8, r8)
  74:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <a>:
20000000:   00000000    andeq   r0, r0, r0

Disassembly of section .data:

20000004 <c>:
20000004:   00000005    andeq   r0, r0, r5

20000008 <b>:
20000008:   00000004    andeq   r0, r0, r4

Note that this works with arm-none-eabi- and arm-linux-gnueabi and the other variants, as no gee-whiz stuff was used.

You will find when you look around that folks go crazy with gee-whiz stuff in their linker scripts, huge monstrous kitchen-sink things. Better to just know how to do it (or better, to master the tools so you can control what goes on) rather than rely on someone else's stuff and not know where it is going to break because you don't understand it and/or don't want to research it.

As a general rule, do not bootstrap a language with the same language (bootstrap in this sense meaning running code, not compiling a compiler with the same compiler). You want to use a simpler language with less of a bootstrap. That is why C is bootstrapped in assembly: assembly has no bootstrap requirements, you just start from the first instruction after reset. Java, sure, you might write the JVM in C and bootstrap that C with asm, then bootstrap the Java, if you will, with C, but you also execute the Java in C.

Because we control assumptions on these copy loops they are by definition tighter and cleaner than hand tuned memcpy/memset.

Note your other problem was this:

unsigned int * bss_start_p = &_BSS_START; 
unsigned int * bss_end_p = &_BSS_END;

If these are local, fine, no problem. If they are global, then you need .data initialized first for them to work, and if you try that trick to do .data then you will fail. Local variables, fine, that will work. If for some reason you decided to make them static locals (local globals, I like to call them), then you are back in trouble again. Every time you do an assignment in a declaration you should think about it: how is that implemented, and is it safe/sane? Every time you assume a variable is zero when uninitialized, same deal: a local variable is not assumed to be zero, a global is. If you never assume them to be zero then you never have to worry.

awesome, this is the second time I have exceeded the max character count in an answer.... — old_timer, Commented Aug 8, 2019 at 0:44
This question belongs on stackoverflow not electrical engineering. — old_timer, Commented Aug 8, 2019 at 0:44
Also relying on an external link in your question is not good form, if the link goes away before the question then the question might not make sense. — old_timer, Commented Aug 8, 2019 at 0:45
In this case your title and content is enough to know that you are trying to bootstrap C on a particular microcontroller and are wandering into .bss and .data initialization — old_timer, Commented Aug 8, 2019 at 0:45
but in this case have been mislead by an otherwise very informative website. — old_timer, Commented Aug 8, 2019 at 0:45

ilkkachu · Accepted Answer · 2019-08-07 09:43:31Z

4

The standard solution is memset():

#include <string.h>
memset(&_BSS_START, 0, (char *)&_BSS_END - (char *)&_BSS_START);

If you can't use the standard library, then you'll have to decide if it's ok in your case to round the size of the memory area up to 4 bytes and continue using unsigned int *; or if you need to be strict about it, in which case you'd need to use unsigned char *.

If you do round up the size, like in your first loop, then bss_start_p may indeed end up greater than bss_end_p, but that's easy to deal with by using a less-than comparison < instead of an inequality test.

Of course, you could also fill most of the memory area with 32-bit transfers, and only the last few bytes with 8-bit transfers, but that's more work for little gain, particularly here when it's only a piece of startup code.
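For completeness, the mixed approach might look roughly like this. It is a sketch under assumptions: the function name is invented, the start address is assumed to be 4-byte aligned (as a linker script would normally arrange), and memcpy stands in for a 32-bit store so the code is well-defined when tried on a host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Zero [start, end): whole words first, then the 0..3 leftover bytes. */
void zero_region(unsigned char *start, unsigned char *end)
{
    const uint32_t zero = 0;
    assert(start <= end);
    assert(((uintptr_t)start & 3u) == 0);   /* assumed word-aligned start */

    while ((size_t)(end - start) >= sizeof zero) {
        memcpy(start, &zero, sizeof zero);  /* 32-bit transfer */
        start += sizeof zero;
    }
    while (start < end) {
        *start++ = 0;                       /* 8-bit transfers for the tail */
    }
}
```

Unlike the plain word loop with a < test, this never writes past end, at the cost of a second loop.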

edited Aug 7, 2019 at 9:43

answered Aug 7, 2019 at 7:49

ilkkachu


1

Agree very much with the use of memset(). But alignment to 4 bytes is more or less a must. So why not do it?
– Codo
Commented Aug 7, 2019 at 11:00
3

in no way shape or form is the standard solution for the bootstrap to use memset, that is crazy.
– old_timer
Commented Aug 7, 2019 at 12:44
you dont use the same language to bootstrap that language
– old_timer
Commented Aug 7, 2019 at 12:47
2

the bootstrap code and linker script are very much married, you will find it common that the linker script aligns and sizes the .bss on at least a 4 byte boundary to improve the fill (in the bootstrap) by 4x over byte at a time instructions (assuming (minimum) 32 bit busses which is typical for arm but there are exceptions)
– old_timer
Commented Aug 7, 2019 at 12:51
3

@old_timer, the standard C function to set memory to a particular value is memset(), and C is what they seem to be programming in. The simple implementation of memset() is also pretty much just that loop, it's not like it depends on much else. Since that's a microcontroller, I also assume that there's no dynamic linking or such going on (and looking at the link, there isn't, it's just a call to main() after that zeroing loop), so the compiler should be capable of dropping memset() in there along with any other functions (or to implement it inline).
– ilkkachu
Commented Aug 7, 2019 at 12:59

Dave Tweed · Accepted Answer · 2019-08-08 11:37:28Z

4

Just change != to <. That's usually a better approach anyway, as it deals with problems like this.

edited Aug 8, 2019 at 11:37 by Dave Tweed

answered Aug 6, 2019 at 22:29 by Elliot Alderson
