14

Right now I am working with embedded systems and figuring out ways to implement strings on a microprocessor with no operating system. So far what I am doing is just using the idea of having NULL terminated character pointers and treating them as strings where the NULL signifies the end. I know that this is fairly common, but can you always count on this to be the case?

The reason I ask is that I was thinking about maybe using a real-time operating system at some point, and I'd like to reuse as much of my current code as possible. So for the various choices that are out there, can I pretty much expect the strings to work the same?

Let me be more specific about my case. I am implementing a system that takes and processes commands over a serial port. Can I keep my command-processing code the same and expect the string objects created on the RTOS (which contain the commands) all to be NULL terminated? Or would it differ depending on the OS?

Update

After being advised to take a look at that question, I have determined that it does not exactly answer what I am asking. That question asks whether a string's length should always be passed, which is entirely different from what I am asking, and although some of the answers contain useful information, they are not quite what I am looking for: they mostly give reasons for or against terminating a string with a null character. What I want to know is whether I can more or less expect the built-in strings of different platforms to be terminated with a null, without having to go out and try every single platform out there, if that makes sense.

  • I haven't used C in a long time, but I can't think of a time when I ran into an implementation that did not use NULL-terminated strings. It's part of standard C, if I remember correctly (like I said, it's been a while...) Commented Mar 21, 2017 at 13:25
  • I'm not a specialist in C, but as far as I know all strings in C are arrays of char, null-terminated. You can create your own string type though, but you'd have to implement all string manipulation functions by yourself.
    – Machado
    Commented Mar 21, 2017 at 13:26
  • Possible duplicate of Should functions of a C library always expect a string's length?
    – gnat
    Commented Mar 21, 2017 at 13:26
  • @MetalMikester You think that this information could be found in the standard C spec?
    – Snoop
    Commented Mar 21, 2017 at 13:39
  • @Snoopy Most likely, yes. But really, when talking about strings in C, they're just an array of characters that end with NULL and that's that, unless you use some kind of non-standard string library but that's not what we're talking about here anyway. I doubt you'll find a platform that doesn't respect that, especially with one of C's strengths being portability. Commented Mar 21, 2017 at 13:55

8 Answers

43

The things that are called "C strings" will be null-terminated on any platform. That's how the standard C library functions determine the end of a string.

Within the C language, there's nothing stopping you from having an array of characters that doesn't end in a null. However, you will have to use some other method to avoid running off the end of a string.

  • just to add on; usually you have an integer somewhere to keep track of the string length and then you end up with a custom data structure to do it right, something like the QString class in Qt
    – user7433
    Commented Mar 21, 2017 at 13:52
  • Case in point: I work with a C program that uses at least five different string formats: null-terminated char arrays, char arrays with the length encoded in the first byte (commonly known as "Pascal strings"), wchar_t-based versions of both of the above, and char arrays that combine both methods: length encoded in the first byte, and a null character terminating the string.
    – Mark
    Commented Mar 21, 2017 at 18:04
  • @Mark Interfacing with lots of 3rd party components/applications or a legacy code mess? Commented Mar 21, 2017 at 18:13
  • @DanNeely, all of the above. Pascal strings for interfacing with classic MacOS, C strings for internal use and Windows, wide strings for adding Unicode support, and bastard strings because someone tried to be clever and make a string that could interface with both MacOS and Windows at the same time.
    – Mark
    Commented Mar 21, 2017 at 21:49
  • @Mark ...and of course no one is willing to spend money to pay off the technical debt because classic MacOS is long dead, and the bastard strings are a double clusterfrak every time they need to be touched. My sympathies. Commented Mar 21, 2017 at 23:40
22

Determination of the terminating character is up to the compiler for literals and the implementation of the standard library for strings in general. It isn't determined by the operating system.

The convention of NUL termination goes back to pre-standard C, and in 30+ years, I can't say I've run into an environment that does anything else. This behavior was codified in C89 and continues to be part of the C language standard (the section numbers below refer to a draft of C99):

  • Section 6.4.5 sets the stage for NUL-terminated strings by requiring that a NUL be appended to string literals.
  • Section 7.1.1 brings that to the functions in the standard library by defining a string as "a contiguous sequence of characters terminated by and including the first null character."

There's no reason why someone couldn't write functions that handle strings terminated by some other character, but there's also no reason to buck the established standard in most cases unless your goal is giving programmers fits. :-)

  • One reason would be to avoid having to find the end of the same string over and over. Commented Mar 21, 2017 at 22:54
  • @PaŭloEbermann Right. At the expense of having to pass two values instead of one. Which is a bit irksome if you just pass a string literal as in printf("string: \"%s\"\n", "my cool string"). The only way around passing four parameters in this case (other than some kind of termination byte) would be to define a string to be something like std::string in C++, which has its own problems and limitations. Commented Mar 22, 2017 at 7:37
  • Section 6.4.5 does not require a string literal to be terminated with a null character. It explicitly notes "A character string literal need not be a string (see 7.1.1), because a null character may be embedded in it by a \0 escape sequence."
    – bzeaman
    Commented Jan 29, 2019 at 10:16
  • @bzeaman The footnote says you can construct a string literal that doesn't meet 7.1.1's definition of a string, but the sentence referring to it says compliant compilers NUL-terminate them no matter what: "In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals." Library functions using 7.1.1's definition stop at the first NUL they find and won't know or care that additional characters exist beyond it.
    – Blrfl
    Commented Jan 29, 2019 at 12:24
  • I stand corrected. I searched for various terms like 'null' but missed 6.4.5.5 mentioning the 'value zero'.
    – bzeaman
    Commented Jan 30, 2019 at 9:38
4

I am working with embedded systems ... with no operating system...I am...using the idea of having NULL terminated character pointers and treating them as strings where the NULL signifies the end. I know that this is fairly common, but can you always count on this to be the case?

There is no string data type in the C language, but there are string literals.

If you put a string literal in your program, it will usually be NUL terminated (but see the special case discussed in the comments below). That is to say, if you put "foobar" in a place where a const char * value is expected, the compiler will emit foobar⊘ to the const/code segment/section of your program, and the value of the expression will be a pointer to the address where it stored the f character. (Note: I am using ⊘ to signify the NUL byte.)

The only other sense in which the C language has strings is that the standard library has routines that operate on NUL-terminated character sequences. The standard does not require those routines in a bare-metal (freestanding) environment, so unless your toolchain ships a C library, you will have to port them yourself.

They're just code---no different from the code that you yourself write. If you don't break them when you port them, then they will do what they always do (e.g., stop on a NUL.)

  • Re: "If you put a string literal in your program, it will always be NUL terminated": Are you sure about that? I'm pretty sure that (e.g.) char foo[4] = "abcd"; is a valid way to create a non-null-terminated array of four characters.
    – ruakh
    Commented Mar 21, 2017 at 20:32
  • @ruakh, Oops! that's a case that I did not consider. I was thinking about a string literal that appears in a place where a char const * expression is expected. I forgot that C initializers can sometimes obey different rules. Commented Mar 21, 2017 at 20:47
  • @ruakh The string literal is NUL-terminated. The array is not.
    – jamesdlin
    Commented Mar 22, 2017 at 6:44
  • @ruakh you have a char[4]. That is not a string, but it was initialised from one
    – Caleth
    Commented Mar 22, 2017 at 9:33
  • @Caleth, "initialized from one" is not something that must happen at run time. If we add the keyword static to Ruakh's example, then the compiler may emit a non NUL terminated "abcd" to an initialized data segment so that the variable is initialized by the program loader. So, Ruakh was right: There is at least one case where the appearance of a string literal in a program does not require the compiler to emit a NUL-terminated string. (p.s., I actually compiled the example with gcc 5.4.0, and the compiler did not emit the NUL.) Commented Mar 22, 2017 at 13:17
2

As others have mentioned, null termination of strings is a convention of the C standard library. You can handle strings any way you wish if you're not going to use the standard library.

This is true under any operating system with a 'C' compiler, and you can also write 'C' programs that do not run under a true operating system, as you mention in your question. An example would be the controller for an ink jet printer I designed once. In embedded systems, the memory overhead of an operating system may not be necessary.

In memory-tight situations, I would look at the characteristics of my compiler vis-a-vis the instruction set of the processor, for example. In an application where strings are processed a lot, it might be desirable to use descriptors such as string length. I'm thinking of a case where the CPU is particularly efficient at working with short offsets and/or relative offsets with address registers.

So which is more important in your application: code size and efficiency, or compatibility with an OS or Library? Another consideration might be maintainability. The further you stray from convention, the harder it will be for someone else to maintain.

1

Others have addressed the issue that in C, strings are largely what you make of them. But there seems to be some confusion in your question w.r.t. the terminator itself, and from one perspective, this could be what someone in your position is worried about.

C strings are null-terminated. That is, they are terminated by the null character, NUL. They are not terminated by the null pointer NULL, which is a completely different kind of value with a completely different purpose.

NUL is guaranteed to have the integer value zero. Within the string, it will also have the size of the underlying character type, which will usually be 1.

NULL is not guaranteed to have an integer type at all. NULL is intended for use in a pointer context and is generally expected to have a pointer type, which shouldn't convert to a character or integer if your compiler is any good. While the definition of NULL involves the glyph 0, it is not guaranteed to actually have that value[1]. Unless your compiler implements the constant as a one-character #define (i.e. #define NULL 0; many don't, because NULL really shouldn't be meaningful in a non-pointer context), the expanded code is therefore not guaranteed to actually involve a zero value (even though it confusingly does involve a zero glyph).

If NULL is typed, it will also be unlikely to have a size of 1 (or another character size). This may conceivably cause additional problems, although actual character constants don't have character size either for the most part.

Now most people will see this and think, "null pointer as anything other than all-zero-bits? what nonsense", but assumptions like that are only safe on common platforms like x86. Since you've explicitly mentioned an interest in targeting other platforms, you need to take this issue into account, because you want your code to stay free of assumptions about the relationship between pointers and integers.

Therefore, while C strings are null-terminated, they aren't terminated by NULL, but by NUL (usually written '\0'). Code which explicitly uses NULL as a string terminator will work on platforms with a straightforward address structure, and will even compile with many compilers, but it's absolutely not correct C.


[1] The actual null pointer value is inserted by the compiler when it reads a null pointer constant (an integer constant expression with the value 0, such as the token 0) in a context where it would be converted to a pointer type. This is not a conversion from the integer value 0, and is not guaranteed to hold for anything other than such a constant, such as a dynamic value from a variable; the conversion is also not reversible, and a null pointer doesn't have to yield the value 0 when converted to an integer.

  • Great point. I've submitted an edit to help clear this up. Commented Mar 21, 2017 at 21:36
  • "NUL is guaranteed to have the integer value zero." --> C does not define NUL. Instead C defines that strings have a final null character, a byte with all bits set to 0. Commented Feb 26, 2020 at 20:35
1

I have been using strings in C: a sequence of characters ending in a null terminator is what C calls a string.

It won't cause any issues whether you use it on bare metal or under an operating system such as Windows, Linux, or an RTOS (FreeRTOS, OSE).

In the embedded world, null termination actually helps by marking where a character sequence ends as a string.

I've been using strings in C like that in many safety critical systems.

You might be wondering: what actually is a string in C?

There are C-style strings, which are arrays, and there are also string literals, such as "this". In reality, both of these are merely collections of characters sitting next to each other in memory.

Whenever you write a string enclosed in double quotes, C automatically creates an array of characters for us containing that string, terminated by the \0 character.

For example, you can declare and define an array of characters, and initialize it with a string constant:

char string[] = "Hello cruel world!";

Straightforward answer: you don't really need to worry about using null-terminated character sequences; this works independently of the platform.

  • Thanks, did not know that when declared with double quotes, a NUL is automatically appended.
    – Snoop
    Commented Mar 22, 2017 at 12:31
1

As others have said, null termination is pretty much universal for standard C. But (as others have also pointed out) not 100%. For (another) example, the VMS operating system typically used what it called "string descriptors" http://h41379.www4.hpe.com/commercial/c/docs/5492p012.html accessed in C by #include <descrip.h>

Application-level stuff can use null termination or not, however the developer sees fit. But low-level VMS stuff absolutely requires descriptors, which don't use null termination at all (see above link for details). This is largely so that all languages (C, assembly, etc) which directly use VMS internals can have a common interface with them.

So if you're anticipating any kind of similar situation, you might want to be somewhat more careful than "universal null termination" might suggest is necessary. I'd be more careful if I were doing what you're doing, but for my application-level stuff it's safe to assume null termination. I just wouldn't assume the same level of safety for you. Your code might well have to interface with assembly or other-language code at some future point, and that code may not conform to C's convention of null-terminated strings.

  • Today, 0 termination is actually quite unusual. C++ std::string doesn't, Java String doesn't, Objective-C NSString doesn't, Swift String doesn't - as a result, each language's library supports strings with NUL codes inside the string (which is impossible with C strings for obvious reasons).
    – gnasher729
    Commented Mar 22, 2017 at 21:56
  • @gnasher729 I changed "...pretty much universal" to "pretty much universal for standard C", which I hope removes any ambiguity and remains correct today (and which is what I meant, as per the OP's subject and question). Commented Mar 23, 2017 at 0:14
0

In my experience of embedded, safety-critical and real-time systems, it is not uncommon to use both the C and Pascal string conventions at once, i.e. to supply the string's length as the first character (which limits the length to 255) and to end the string with at least one 0x00 (NUL), which reduces the usable size to 254.

One reason for this is to know how much data to expect once the first byte has been received. Another is that, in such systems, dynamic buffer sizes are avoided where possible: allocating a fixed 256-byte buffer is faster and safer (no need to check whether malloc failed). Yet another is that the other systems you are communicating with may not be written in ANSI C.

In any embedded work it is important to establish and maintain an Interface Control Document (ICD) that defines all of your communications structures, including string formats, endianness, integer sizes, etc., as soon as possible (ideally before starting). It should be the holy book for you and the whole team when writing the system: if someone wishes to introduce a new structure or format, it must be documented there first and everybody who might be impacted informed, possibly with an option to veto the change.
