Type-pun uint64_t as two uint32_t in C++20

Question

This code to read a uint64_t as two uint32_t is UB due to the strict aliasing rule:

uint64_t v;
uint32_t lower = reinterpret_cast<uint32_t*>(&v)[0];
uint32_t upper = reinterpret_cast<uint32_t*>(&v)[1];

Likewise, this code to write the upper and lower parts of a uint64_t is UB for the same reason:

uint64_t v;
uint32_t* lower = reinterpret_cast<uint32_t*>(&v);
uint32_t* upper = reinterpret_cast<uint32_t*>(&v) + 1;

*lower = 1;
*upper = 1;

How can one write this code in a safe and clean way in modern C++20, potentially using std::bit_cast?

How do you want to handle endianness? – Jarod42, Commented Nov 10, 2021 at 10:35

Arty · Accepted Answer · 2021-11-11 18:47:44Z

Using std::bit_cast:

#include <bit>
#include <array>
#include <cstdint>
#include <iostream>

int main() {
    uint64_t x = 0x12345678'87654321ULL;
    // Convert one u64 -> two u32
    auto v = std::bit_cast<std::array<uint32_t, 2>>(x);
    std::cout << std::hex << v[0] << " " << v[1] << std::endl;
    // Convert two u32 -> one u64
    auto y = std::bit_cast<uint64_t>(v);
    std::cout << std::hex << y << std::endl;
}

Output:

87654321 12345678
1234567887654321

std::bit_cast is available only from C++20 onward. Before C++20 you can implement an equivalent yourself with std::memcpy; the one limitation is that such an implementation cannot be constexpr, unlike the C++20 version:

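#include <cstring>
#include <type_traits>

template <class To, class From>
inline To bit_cast(From const & src) noexcept {
    //return std::bit_cast<To>(src);
    static_assert(sizeof(To) == sizeof(From),
        "Source and destination must have the same size");
    static_assert(std::is_trivially_copyable_v<To> &&
        std::is_trivially_copyable_v<From>,
        "Both types must be trivially copyable");
    static_assert(std::is_trivially_constructible_v<To>,
        "Destination type should be trivially constructible");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}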

For this specific case of integers, plain shift/OR arithmetic is a perfectly good way to convert one u64 to two u32 and back again. std::bit_cast is more generic, supporting any trivially copyable type, and on modern compilers with optimization enabled it should generate code as good as the bit arithmetic.

One extra benefit of bit arithmetic is that it is endianness-independent, unlike std::bit_cast.

#include <cstdint>
#include <iostream>

int main() {
    uint64_t x = 0x12345678'87654321ULL;
    // Convert one u64 -> two u32
    uint32_t lo = uint32_t(x), hi = uint32_t(x >> 32);
    std::cout << std::hex << lo << " " << hi << std::endl;
    // Convert two u32 -> one u64
    uint64_t y = (uint64_t(hi) << 32) | lo;
    std::cout << std::hex << y << std::endl;
}

Output:

87654321 12345678
1234567887654321

Notice! As @Jarod42 points out, the bit-shifting solution is not equivalent to the memcpy/bit_cast solution; whether they agree depends on endianness. On a little-endian CPU, memcpy/bit_cast puts the least significant half (lo) in array element v[0] and the most significant half (hi) in v[1], while on a big-endian CPU lo goes to v[1] and hi goes to v[0]. The bit-shifting solution is endianness-independent: on all systems it gives the most significant half (hi) as uint32_t(num_64 >> 32) and the least significant half (lo) as uint32_t(num_64).

You can manually implement std::bit_cast, but note that C++20 std::bit_cast is constexpr which isn't possible without some compiler magic. — Possseidon, Commented Nov 10, 2021 at 10:27
Possibly OT: For some reason, GCC added a single-instruction overhead with bit_cast: godbolt.org/z/seoM4za9d (Clang did not). — Daniel Langr, Commented Nov 10, 2021 at 10:31
@DanielLangr I think this overhead only happens if you do this inside separate function. When you inline this function or use its code as a part of bigger code then compiler should better optimize away extra instructions. But to me looks weird that compiler decided to use extra RAX in third function without using RDI directly as in first two functions. — Arty, Commented Nov 10, 2021 at 10:36
@DanielLangr If you don't use references for return values, everything is optimized into basically nothingness: godbolt.org/z/dMfxdah6G — Possseidon, Commented Nov 10, 2021 at 10:37
Notice that the solution with manual bitshift is not equivalent to memcpy/bitcast with endianess. — Jarod42, Commented Nov 10, 2021 at 13:10

KamilCuk · Accepted Answer · 2021-11-10 11:21:46Z

in a safe and clean way

Do not use reinterpret_cast. Do not depend on unclear code that depends on some specific compiler settings and fishy, uncertain behavior. Use exact arithmetic operations with well-known defined result. Classes and operator overloads are all there waiting for you. For example, some global functions:

#include <cstdint>

struct UpperUint64Ref {
   uint64_t &v;
   UpperUint64Ref(uint64_t &v) : v(v) {}
   UpperUint64Ref &operator=(uint32_t a) {
      v &= 0x00000000ffffffffull;   // keep the lower half
      v |= (uint64_t)a << 32;       // install the new upper half
      return *this;
   }
   operator uint32_t() const {
      return (uint32_t)(v >> 32);   // read back only the upper half
   }
};
struct LowerUint64Ref {
    uint64_t &v;
    LowerUint64Ref(uint64_t &v) : v(v) {}
    /* as above */
};
UpperUint64Ref upper(uint64_t& v) { return v; }
LowerUint64Ref lower(uint64_t& v) { return v; }

int main() {
   uint64_t v = 0;   // initialized: operator= reads v before writing it
   upper(v) = 1;
}
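
The /* as above */ placeholder in LowerUint64Ref could be filled in along the following lines. This is a sketch rather than the answer author's code; in particular, the read conversion to uint32_t is an assumption about the intended interface:

```cpp
#include <cstdint>

// Sketch of the lower-half proxy: same idea as UpperUint64Ref, mask flipped.
struct LowerUint64Ref {
    std::uint64_t &v;
    explicit LowerUint64Ref(std::uint64_t &v) : v(v) {}
    LowerUint64Ref &operator=(std::uint32_t a) {
        v &= 0xffffffff00000000ull;  // clear the low 32 bits
        v |= a;                      // install the new low half
        return *this;
    }
    // Assumed read access: yields only the low 32 bits.
    operator std::uint32_t() const { return static_cast<std::uint32_t>(v); }
};

inline LowerUint64Ref lower(std::uint64_t &v) { return LowerUint64Ref{v}; }
```

With this in place, lower(v) = 1; replaces the low half of v and leaves the high half untouched.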

Or interface object:

#include <cstdint>

struct Uint64Ref {
   uint64_t &v;
   Uint64Ref(uint64_t &v) : v(v) {}
   struct UpperReference {
       uint64_t &v;
       UpperReference(uint64_t &v) : v(v) {}
       UpperReference &operator=(uint32_t a) {
           v &= 0x00000000ffffffffull;
           v |= (uint64_t)a << 32u;
           return *this;   // required: flowing off the end of a non-void function is UB
       }
   };
   UpperReference upper() {
      return v;
   }
   struct LowerReference {
       uint64_t &v;
       LowerReference(uint64_t &v) : v(v) {}
   };
   LowerReference lower() { return v; }
};
int main() {
   uint64_t v = 0;   // initialized before the read-modify-write in operator=
   Uint64Ref r{v};
   r.upper() = 1;
}

Notice that this answer would be independent of endianess, contrary to memcpy/bitcast usage. Not sure what is expected though. — Jarod42, Commented Nov 10, 2021 at 10:37
Great answer. commandcenter.blogspot.com/2012/04/byte-order-fallacy.html?m=1 should be required reading for everyone. — n. m. could be an AI, Commented Nov 10, 2021 at 10:59
This would have been the best you could do before C++20. Now bit_cast (with std::endian) is as fast or faster and is defined behavior — Glenn Teitelbaum, Commented Sep 19, 2022 at 14:12

Quimby · Accepted Answer · 2021-11-10 10:30:26Z

Using std::memcpy

#include <cstdint>
#include <cstring>

void foo(uint64_t& v, uint32_t low_val, uint32_t high_val) {
    std::memcpy(reinterpret_cast<unsigned char*>(&v), &low_val,
                sizeof(low_val));
    std::memcpy(reinterpret_cast<unsigned char*>(&v) + sizeof(low_val),
                &high_val, sizeof(high_val));
}

int main() {
    uint64_t v = 0;
    foo(v, 1, 2);
}

With -O1, the compiler reduces foo to:

        mov     DWORD PTR [rdi], esi
        mov     DWORD PTR [rdi+4], edx
        ret

Meaning there are no extra copies made, std::memcpy just serves as a hint to the compiler.
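
The read direction from the question can be handled the same way. Below is a minimal sketch; Halves and read_halves are hypothetical names, and like foo above the result follows memory order, so which field holds the low half depends on the host's endianness:

```cpp
#include <cstdint>
#include <cstring>

struct Halves {
    std::uint32_t first;   // bytes 0..3 of the object representation
    std::uint32_t second;  // bytes 4..7 of the object representation
};

// Copies the two 4-byte halves of v out in memory order.
inline Halves read_halves(const std::uint64_t &v) {
    Halves h{};
    const auto *bytes = reinterpret_cast<const unsigned char *>(&v);
    std::memcpy(&h.first, bytes, sizeof h.first);
    std::memcpy(&h.second, bytes + sizeof h.first, sizeof h.second);
    return h;
}
```

On a little-endian host, first is the low half and second is the high half; on a big-endian host the roles are swapped.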

Glenn Teitelbaum · Accepted Answer · 2021-11-11 06:40:08Z

std::bit_cast alone is not enough, since results will vary with the endianness of the system.

Fortunately <bit> also contains std::endian.

Keeping in mind that optimizers generally compile-time resolve ifs that are always true or false, we can test endianness and act accordingly.

We only know beforehand how to handle big or little-endian. If it is not one of those, bit_cast results are not decodable.

Another factor that can spoil things is padding. Using bit_cast assumes 0 padding between array elements.

So we can check if there is no padding and the endianness is big or little to see if it is castable.

If it is not castable, we do a bunch of shifts as per the old method (this can be slow).
If the endianness is big, return the results of bit_cast.
If the endianness is little, reverse the order. This is not the same as C++23 std::byteswap, because we swap elements rather than bytes.

I arbitrarily decided that big-endian has the correct order with the high bits at x[0].

#include <bit>
#include <array>
#include <cstdint>
#include <climits>
#include <concepts>

template <std::integral F, std::integral T>
    requires (sizeof(F) >= sizeof(T))
constexpr auto split(F x) { 
    enum consts {
        FBITS=sizeof(F)*CHAR_BIT,
        TBITS=sizeof(T)*CHAR_BIT,
        ELEM=sizeof(F)/sizeof(T),
        BASE=FBITS-TBITS,
        MASK=~0ULL >> BASE
    };
    using split=std::array<T, ELEM>;
    const bool is_big=std::endian::native==std::endian::big;
    const bool is_little=std::endian::native==std::endian::little;
    const bool can_cast=((is_big || is_little)
        && (sizeof(F) == sizeof(split)));

    // All the following `if`s should be eliminated at compile time
    // since they are always true or always false
    if (!can_cast)
    {
        split ret;
        for (int e = 0; e < ELEM; ++e)
        {
            ret[e]=(x>>(BASE-e*TBITS)) & MASK;
        }
        return ret;
    }
    split tmp=std::bit_cast<split>(x);
    if (is_big)
    {
        return tmp;
    }
    split ret;
    for (int e=0; e < ELEM; ++e)
    {
        ret[e]=tmp[ELEM-(e+1)];
    }
    return ret;
}

auto tst(uint64_t x, int y)
{
    return split<decltype(x), uint32_t>(x)[y];
}

I believe this should be defined behavior.

EDIT: changed uint64 base to template parameter and minor edit tweaks

the question is if OP needs endian-aware result, because often tasks like that assume that you have use memory-order, not value-order, i.t for converting data to networked format — Swift - Friday Pie, Commented Nov 11, 2021 at 6:47
You can use std::endian to force to a wire format based on std::native and the goal. In every case I can think of, you need to know the machine endian to decide how to convert. — Glenn Teitelbaum, Commented Sep 19, 2022 at 14:11

KevinZ · Accepted Answer · 2021-11-14 02:57:45Z

Don't bother, because arithmetic is faster anyway:

uint64_t v;
uint32_t lower = v;
uint32_t upper = v >> 32;
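
Writing a half back is also plain arithmetic. The following is a sketch with hypothetical helper names with_lower and with_upper, which return a new value instead of writing through a pointer:

```cpp
#include <cstdint>

// Replace the low 32 bits of v, keeping the high 32 bits.
constexpr std::uint64_t with_lower(std::uint64_t v, std::uint32_t lo) {
    return (v & 0xffffffff00000000ull) | lo;
}

// Replace the high 32 bits of v, keeping the low 32 bits.
constexpr std::uint64_t with_upper(std::uint64_t v, std::uint32_t hi) {
    return (v & 0x00000000ffffffffull) | (std::uint64_t{hi} << 32);
}

static_assert(with_lower(0x12345678'87654321ull, 1) == 0x12345678'00000001ull);
static_assert(with_upper(0x12345678'87654321ull, 1) == 0x00000001'87654321ull);
```

Usage would look like v = with_upper(v, 1);, which is well-defined on any host because it only operates on the value, never on the object representation.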

Splitting in two, like converting a 64-bit to two 32-bit, is the same speed using bit_cast with std::endian or bit twiddling. Splitting in four, like converting a 64-bit to four 16-bit is faster using bit_cast with std::endian. – Glenn Teitelbaum, Commented Sep 19, 2022 at 14:03
@GlennTeitelbaum No, arithmetic (this includes bit twiddling) is typically faster than memory operations. Although, in this case, the compiler can probably see through the trivial memory operation and convert it to arithmetic. – KevinZ, Commented Sep 20, 2022 at 19:05
