2

I was just starting out with C (K. N. King's C Programming) when I came across the following passage:

By default, floating constants are stored as double-precision numbers. In other words, when a C compiler finds the constant 57.0 in a program, it arranges for the number to be stored in memory in the same format as a double variable. This rule generally causes no problems, since double values are converted automatically to float when necessary.

Suppose I have the following statements:

 float x = 5.0;     // 1

 float y = 5.0f;    // 2

What does the passage mean in this example? What is the difference between statement 1 and 2 with regard to storage of values in bits?

In the first statement, is 5.0 first saved as a double and then allocated as a float to x?

  • 1
    Real compiler godbolt.org/z/eYfsEKf8P only cares about the variable type, a 5/5.0/5.0f constant produces the same code here. Text may be outdated/inaccurate/not to be taken literally. Consume with salt.
    – teapot418
    Commented Jul 8 at 12:24
  • 1
    If you assign it to a float variable immediately, the compiler will actually store it as a float in your executable, as an optimization. It's called "constant folding". The compiler knows the result of the conversion in advance so your program doesn't have to calculate it at runtime. Commented Jul 8 at 12:26
  • This is the "converted when necessary" part. To store a value into x, the compiler converts it to float.
    – BoP
    Commented Jul 8 at 12:34

2 Answers

8

The author’s assertion that “By default, floating constants are stored as double-precision numbers” likely arises from this paragraph in the C standard, C 2018 6.4.4.2 4:

An unsuffixed floating constant has type double. If suffixed by the letter f or F, it has type float. If suffixed by the letter l or L, it has type long double.

That paragraph makes it clear that a floating-point constant in source code is by default (meaning it does not have a suffix) interpreted as a double. But the author’s assertion that the value is “stored” is imprecise. The C standard tells us how to interpret source code, but it does not require that constants be stored. Even in the abstract machine model the C standard uses to specify C semantics, before optimization is considered, it is only specified that the values of variables are stored in the memory of the variables, not that the values of constants are stored.

Thus, I would expect the compiler to do its best job of converting an unsuffixed constant to double,[1] but I would not necessarily expect it to store it anywhere other than in its own memory while working with it and generating the program. It might end up storing it in the program’s data if needed, but it could generate it in instructions or fold it into other parts of an expression.
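The type of a constant can be checked directly, independent of whether anything gets stored. A minimal sketch, assuming a C11 compiler (for _Generic); TYPE_NAME and the function names are made up for this illustration and are not from the book or the standard:

```c
#include <stddef.h>

/* Hypothetical helper: yields the name of the type of an expression (C11 _Generic). */
#define TYPE_NAME(expr) _Generic((expr), \
    float: "float",                      \
    double: "double",                    \
    long double: "long double",          \
    default: "other")

/* Each function reports the type the compiler assigns to one constant. */
const char *type_of_unsuffixed(void) { return TYPE_NAME(5.0); }   /* no suffix */
const char *type_of_f_suffix(void)   { return TYPE_NAME(5.0f); }  /* f suffix  */
const char *type_of_l_suffix(void)   { return TYPE_NAME(5.0L); }  /* L suffix  */

/* sizeof looks only at the constant's type; no object needs to exist. */
size_t size_of_unsuffixed(void) { return sizeof 5.0; }
size_t size_of_f_suffix(void)   { return sizeof 5.0f; }
```

Calling these from main and printing the results should report double, float, and long double, with sizes matching sizeof(double) and sizeof(float) on your platform.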

This rule generally causes no problems, since double values are converted automatically to float when necessary.

I would phrase it as saying problems caused by this automatic conversion are rare. Stating that it “generally” causes no problems might cause a student to take that as a general rule, rather than being cautious of when problems can occur. In situations where floating-point constants are carefully engineered for a particular task, suffixes should be used to ensure the constant has exactly the desired value.

In your example with five, float x = 5.0; and float y = 5.0f; will produce the same value in x because five is representable in both float and double. However, consider this code:

#include <stdio.h>


int main(void)
{
    float x = 0x9.876548000000000000001p0;
    float y = 0x9.876548000000000000001p0f;
    printf("%a\n", x);
    printf("%a\n", y);
}

In my C implementation, x and y get different values, and this prints:

0x1.30eca8p+3
0x1.30ecaap+3

The reason is this:

  • In float x = 0x9.876548000000000000001p0;, 9.876548000000000000001₁₆ is converted to double. The final 1 bit is several bits below what is representable in a double, so it is rounded down, producing 9.876548₁₆. Then this double is converted to float for storing in x. The low bit of the 4 is the last bit that fits in a float, so the first bit of the 8 is the first bit that does not fit. This is halfway between two values representable in a float, 9.87654₁₆ and 9.87655₁₆. In case of a tie, the rule is to round to the even low bit, so the result of the conversion is 9.87654₁₆, and that is stored in x. Printing it produces 0x1.30eca8p+3, which is another representation of that number.

  • In float y = 0x9.876548000000000000001p0f;, 9.876548000000000000001₁₆ is converted to float. Again, the low bit of the 4 fits, so the part that does not fit is the 8000000000000001. Because of the 1, this is more than halfway from 9.87654₁₆ to 9.87655₁₆, so there is no tie, and rounding to the nearest value produces 9.87655₁₆, and that is what is stored in y. Printing it produces 0x1.30ecaap+3, another representation of the same value.
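The two rounding paths described above can also be reproduced at run time with the standard library. This is a sketch under the assumption that strtod and strtof round hexadecimal input correctly (which the standard requires when FLT_RADIX is a power of 2); the function names are hypothetical:

```c
#include <stdlib.h>

/* Path taken by an unsuffixed constant: round to double, then to float. */
float parse_via_double(const char *text)
{
    double as_double = strtod(text, NULL);  /* first rounding: to double */
    return (float)as_double;                /* second rounding: to float */
}

/* Path taken by an f-suffixed constant: a single rounding, straight to float. */
float parse_direct(const char *text)
{
    return strtof(text, NULL);
}
```

Passing "0x9.876548000000000000001p0" to both should give 0x1.30eca8p+3 from parse_via_double (the tie is rounded to even) and 0x1.30ecaap+3 from parse_direct, matching x and y in the program above.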

Footnote

[1] The C standard is lax about how floating-point constants are converted to floating-point values. C 2018 6.4.4.2 7 says translation-time conversion “should” match the conversion done by library functions such as strtod, and 7.22.1.3 9 says the result of strtod “should” be correctly rounded if the input does not have too many digits (at most DECIMAL_DIG digits) or, if it does, should equal the result of converting one of the two decimal numbers with DECIMAL_DIG digits that immediately bound the value. This is a legacy of the fact that converting values with exponents such as +300 or -300 from scratch nominally requires computing with hundreds of digits, which was considered too much of a burden for early compilers and computers. Modern algorithms have been devised for this, so the standard could require correct rounding in all cases.

0

There are some cases where the implicit conversion may have some unexpected performance impact. For example, if you have a simple function like:

float foo(float x){
    return 0.1 * x;
}

The compiler has to convert x from float to double, multiply by the double constant 0.1, and then convert the result back to float. That is three operations instead of the one single-precision multiplication you would get if you wrote:

return 0.1f * x;
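To make the hidden conversions visible, here is a sketch that writes them out by hand next to the two spellings. The function names are made up for this illustration, and whether the compiler really emits three instructions depends on the target and optimization level:

```c
/* The conversions implied by `0.1 * x`, written out explicitly. */
float scale_explicit(float x)
{
    double widened = (double)x;      /* 1. float -> double            */
    double product = 0.1 * widened;  /* 2. double-precision multiply  */
    return (float)product;           /* 3. double -> float            */
}

/* Same semantics as scale_explicit; the conversions are implicit. */
float scale_implicit(float x)
{
    return 0.1 * x;
}

/* Single-precision constant: one float multiplication, no conversions. */
float scale_single(float x)
{
    return 0.1f * x;
}
```

scale_explicit and scale_implicit should always return the same value. scale_single may differ from them in the last bit or so for some inputs, because 0.1f and 0.1 are different approximations of one tenth.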

If you perform only a few thousand floating-point operations per second, the impact is likely negligible, but in that case you could just as well use double variables everywhere (on modern hardware, there should be no performance difference between a single float and a single double multiplication or addition).

I suggest using float only if you need to store a lot of values, or need to process a lot of values in a short time, and you don't need the extra precision of double (what "a lot" means is very problem-specific).

N.B. gcc and clang have a -Wconversion flag which will warn you about many cases where an implicit conversion can lose precision.

float x = 5.0;  // no precision loss, no warning
float y = 0.1;  // warning: conversion from 'double' to 'float' changes value from '1.0000000000000001e-1' to '1.00000001e-1f' [-Wfloat-conversion] (gcc 14.1)
float z = 0.1f; // no conversion, no warning.
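The difference between the 5.0 and 0.1 cases is whether the double value survives a round trip through float. A rough sketch of that check follows; the function name is hypothetical, and this is not a description of how gcc implements the warning:

```c
#include <stdbool.h>

/* Returns true if converting value to float and back to double loses nothing. */
bool survives_float_round_trip(double value)
{
    float narrowed = (float)value;     /* the narrowing conversion under test */
    return (double)narrowed == value;  /* exact only if no bits were dropped  */
}
```

With IEEE 754 float and double, this returns true for 5.0 (exactly representable in both types) and false for 0.1 (the double closest to 0.1 has more significant bits than a float can hold).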
