initializing a timezone database and getting all timezone names (before main())

Question

I have written the below small program that tries to initialize a tzdb before the main() runs. So once the main function runs, it checks the two global variables (that are already initialized at that stage) to determine if an error has occurred during the initialization process.

This is done using three functions:

try_initialize_timezone_database
allocate_and_get_buffer_for_timezone_names
retrieve_all_timezone_names

and two constant global variables

is_tzdb_initialized
all_timezone_names

that I have defined.

I have also intentionally made the functions have internal linkage and the variables have external linkage since the functions will only be called once but the variables will be accessed throughout the program.

The four main goals here are:

to initialize the timezone database;
to get a span (the 2nd global variable) that gives a const view to the pmr::vector that contains all the names stored on its static buffer;
all the operations must be noexcept (and actually not throw!!);
and lastly, efficiency.

Here:

#include <chrono>
#include <stdexcept>
#include <expected>
#include <functional>
#include <vector>
#include <string_view>
#include <system_error>
#include <array>
#include <memory_resource>
#include <new>
#include <exception>
#include <span>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <cstddef>


namespace
{

[[ nodiscard ]] bool
try_initialize_timezone_database( ) noexcept
{
    bool is_initialized { };

    try
    {
        [[ maybe_unused ]] const auto& tzdb_list { std::chrono::get_tzdb_list( ) };
        is_initialized = true;
    }
    catch ( const std::runtime_error& )
    { }

    return is_initialized;
}

[[ nodiscard ]] std::expected<std::reference_wrapper<std::pmr::vector<std::string_view>>, std::errc>
allocate_and_get_buffer_for_timezone_names( const std::size_t timezone_names_count ) noexcept
{
    constexpr auto timezone_names_default_count { 1024uz };
    constexpr auto buffer_size_in_bytes { timezone_names_default_count * sizeof( std::string_view ) };

    static std::array<std::byte, buffer_size_in_bytes> buffer;
    static std::pmr::monotonic_buffer_resource buffer_resource { std::data( buffer ), std::size( buffer ) };
    static std::pmr::vector<std::string_view> names_buffer { &buffer_resource };

    try
    {
        names_buffer.reserve( timezone_names_count );
    }
    catch ( const std::bad_alloc& )
    {
        return std::unexpected { std::errc::not_enough_memory };
    }
    catch ( const std::length_error& )
    {
        return std::unexpected { std::errc::value_too_large };
    }
    catch ( const std::exception& )
    {
        return std::unexpected { std::errc::resource_unavailable_try_again };
    }

    return names_buffer;
}

[[ nodiscard ]] std::expected<std::span<const std::string_view>, std::errc>
retrieve_all_timezone_names( const bool is_tzdb_initialized ) noexcept
{
    if ( is_tzdb_initialized == false ) return std::unexpected { std::errc::state_not_recoverable };

    const auto timezone_names_count { std::chrono::get_tzdb( ).zones.size( ) +
                                      std::chrono::get_tzdb( ).links.size( ) };

    auto timezone_names_buffer { allocate_and_get_buffer_for_timezone_names( timezone_names_count ) };
    if ( timezone_names_buffer.has_value( ) == false ) return std::unexpected { timezone_names_buffer.error( ) };

    auto& names_buffer { timezone_names_buffer->get( ) };

    constexpr auto get_name { [ ]( const auto& tz ) noexcept { return tz.name( ); } };

    std::ranges::transform( std::chrono::get_tzdb( ).zones, std::back_inserter( names_buffer ), get_name );
    std::ranges::transform( std::chrono::get_tzdb( ).links, std::back_inserter( names_buffer ), get_name );

    return names_buffer;
}

}

const auto is_tzdb_initialized { try_initialize_timezone_database( ) };
const auto all_timezone_names { retrieve_all_timezone_names( is_tzdb_initialized ) };


int main( )
{
    if ( is_tzdb_initialized == false )
    {
        std::cout << "The timezone database could not be initialized\n";
        return 1;
    }

    if ( all_timezone_names.has_value( ) == false )
    {
        std::cout << "An error occurred during buffer allocation\n";
        return 1;
    }

    const auto timezone_names { *all_timezone_names };

    std::ranges::for_each( timezone_names, [ ]( const auto timezone_name ) { std::cout << timezone_name << '\n'; } );
}

Note 1: As can be seen, the std::expected in the return type of the function allocate_and_get_buffer_for_timezone_names stores the vector as a reference to the actual vector object that has static storage duration (due to obvious reasons). I made this decision to avoid losing the vector and causing UB. I'm really not sure if this makes the program correct.

Note 2: I have used a static std::pmr::vector and a static std::array (as the underlying buffer) to avoid a dynamic allocation if possible. The size of the aforementioned array is 1024 * 16 (16384 bytes, aka 4 memory pages). In the case the actuall count of the timezone names is greater than 1024 (currently 597 on my machine), the call to reserve should not use that buffer and instead allocate a bigger one (typically on the heap). This is my expectation.

Is this a satisfactory way of achieving the above goals? Is there any UB anywhere in the code? Or anything that can be improved?

What is the desired behavior, what should the user experience be, when DB init fails? That is, do we have a fallback position, or should we just give up? In which case, maybe a fatal exception would be more appropriate. — J_H, Commented Apr 17 at 20:41
@J_H No I don't think I need that. The program should print a message and exit gracefully. — digito_evo, Commented Apr 17 at 20:58

Toby Speight · Accepted Answer · 2024-04-21 08:18:32Z

The code looks correct and well presented. Most of my comments are matters of style.

There are a lot of headers included, in a strange order. It would be easier to comprehend in alphabetical order.

In try_initialize_timezone_database(), the is_initialized variable is unnecessary. It's simpler and more comprehensible to return the appropriate constant from each path:

    try
    {
        [[ maybe_unused ]] const auto& tzdb_list { std::chrono::get_tzdb_list( ) };
        return true;
    }
    catch ( const std::runtime_error& )
    {
        return false;
    }

In allocate_and_get_buffer_for_timezone_names(), the spacing of sizeof operator is strange, formatting it more like a function. I would suggest that sizeof (std::string_view) better conveys its nature.

It is easy to confirm that reducing timezone_names_default_count to a very small value causes no exceptions or Valgrind issues, proving that the fallback to default allocator works correctly. But do note that because vectors allocate contiguous memory, if the count exceeds the default, then the buffer is entirely wasted space, as the whole content will be allocated from the default allocator.

Passing a std::reference_wrapper around seems to be unnecessary given that the vector can be move-constructed in all cases that are not elided. This can be proved by forcing the allocator to fail when we attempt a second allocation (given the knowledge we have more than 512 names in our database):

    static std::pmr::monotonic_buffer_resource
        buffer_resource { std::data( buffer ), std::size( buffer ),
                          std::pmr::null_memory_resource()};

We do need to ensure that the last instance is static, so that the elements are not destructed during the span's lifetime. I.e. in retrieve_all_timezone_names() we'll need static auto timezone_names_buffer.

While we're looking at lifetimes, there's a small risk in using string views if the timezone database list is updated during program execution and std::chrono::reload_tzdb() is then called. I'd need to set up a controlled environment (probably in Docker) to test that, so I'm not going to bother. But it may be a genuine concern if used in a long-lived server.

Boolean values don't need to compared against false. Just use them directly or with the negation operator - e.g.

    if (!is_tzdb_initialized)

The way that we're inspecting and recreating std::expected looks clunky compared to using the more functional-style members it provides (such as transform and and_then). I would go for something like

[[nodiscard]] auto
populate_timezone_name_list(std::pmr::vector<std::string_view>&& names_buffer)
{
    auto inserter = std::back_inserter(names_buffer);
    std::ranges::transform(std::chrono::get_tzdb().zones, inserter,
                           &std::chrono::time_zone::name);
    std::ranges::transform(std::chrono::get_tzdb().links, inserter,
                           &std::chrono::time_zone_link::name);
    return names_buffer;
}

[[nodiscard]] std::expected<std::pmr::vector<std::string_view>, std::errc>
retrieve_all_timezone_names(const bool is_tzdb_initialized) noexcept
{
    if (!is_tzdb_initialized) {
        return std::unexpected { std::errc::state_not_recoverable };
    }

    auto const& tzdb = std::chrono::get_tzdb();
    auto const timezone_names_count { tzdb.zones.size() + tzdb.links.size() };

    return allocate_and_get_buffer_for_timezone_names(timezone_names_count)
        .transform(populate_timezone_name_list);
}

That said, in this case, I would simply let exceptions pass up to the top-level function and handle them there.

Instead of the get_name wrapper function, I'd just pass the appropriate accessors directly to std::ranges::transform() as the projection argument. But I would probably write a single inserter rather than creating two:

    auto inserter = std::back_inserter(names_buffer);
    std::ranges::transform(std::chrono::get_tzdb().zones, inserter,
                           &std::chrono::time_zone::name);
    std::ranges::transform(std::chrono::get_tzdb().links, inserter,
                           &std::chrono::time_zone_link::name);

When we have C++26, we might concatenate the two views first before populating the vector:

    auto all_names_view = std::views::concat({
            std::chrono::get_tzdb().zones | std::views::transform(&std::chrono::time_zone::name),
            std::chrono::get_tzdb().links | std::views::transform(&std::chrono::time_zone_link::name)}
        );
    names_buffer.assign(all_names_view.begin(), all_names_view.end());

We can do that before reserving space in the vector, so we can use the view's size rather than calculating the sum separately.

The demo program uses the wrong stream for error messages. Also prefer using EXIT_FAILURE macro from <cstdlib> for the return value.

Instead of for_each() with a lambda, we might find it simpler to copy the names to an output-stream iterator:

    std::ranges::copy(timezone_names,
                      std::ostream_iterator<std::string_view>(std::cout, "\n"));

Modified code

I think it's actually easier to use a single immediately-invoked anonymous function for this. There's then no risk of multiple invocations, and we don't need the separate try_initialize_timezone_database() test.

#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <expected>
#include <iterator>
#include <memory_resource>
#include <new>
#include <ranges>
#include <span>
#include <stdexcept>
#include <string_view>
#include <system_error>
#include <vector>

static auto const all_timezone_names =
    [] -> std::expected<std::span<const std::string_view>, std::errc>
    {
        constexpr auto timezone_names_default_count = 1024uz;
        try {
            auto const& tzdb = std::chrono::get_tzdb();
            auto const timezone_names_count = tzdb.zones.size() + tzdb.links.size();
            // In debug builds, we want to know if the provision is too small.
            // It's okay to not test this in release builds, as we'll fall back
            // to less-efficient default allocation.
            assert(timezone_names_count < timezone_names_default_count);

            constexpr auto buffer_size_in_bytes
                = timezone_names_default_count * sizeof (std::string_view);
            // the backing store needs to outlive this scope, so it and the allocator are static
            static auto buffer = std::array<std::byte, buffer_size_in_bytes>{};
            static auto buffer_resource
                = std::pmr::monotonic_buffer_resource{buffer.data(), buffer.size()};
            static auto names_buffer = std::pmr::vector<std::string_view>{&buffer_resource};

            auto inserter = std::back_inserter(names_buffer);
            names_buffer.reserve(timezone_names_count);
            std::ranges::transform(tzdb.zones, inserter, &std::chrono::time_zone::name);
            std::ranges::transform(tzdb.links, inserter, &std::chrono::time_zone_link::name);
            return names_buffer;
        } catch (const std::bad_alloc&) {
            return std::unexpected { std::errc::not_enough_memory };
        } catch (const std::length_error&) {
            return std::unexpected { std::errc::value_too_large };
        } catch (const std::runtime_error&) {
            return std::unexpected { std::errc::state_not_recoverable };
        } catch (const std::exception&) {
            return std::unexpected { std::errc::resource_unavailable_try_again };
        }
    }();


#include <iostream>

int main()
{
    if (!all_timezone_names) {
        std::cout << "An error occurred during buffer allocation: "
                  << std::make_error_condition(all_timezone_names.error()).message() << '\n';
        return EXIT_FAILURE;
    }

    const auto& timezone_names = all_timezone_names.value();

    std::cout << "There are "
              << timezone_names.size()
              << " timezones:\n";

    std::ranges::copy(timezone_names,
                      std::ostream_iterator<std::string_view>(std::cout, "\n"));
}

How do you (portably) know the target's page size? Or even whether it uses pages? I think it's reasonable to trust the platform's std::array implementation to perform the alignment if it's beneficial, without needing to tell it to do so. TBH, I don't see that aligning it would improve anything. — Toby Speight, Commented Apr 20 at 16:02
Sorry, it was late when I wrote that. A use-after-destroy that Valgrind can't catch. What we need is that the final destination passed to the span constructor is static. Which it was before my latest edit. I'll fix shortly. — Toby Speight, Commented Apr 21 at 8:06
I've also added a concern about possible invalidation of the string views if an updated tzdb is reloaded. — Toby Speight, Commented Apr 21 at 8:19
It should probably be noted for people looking for info on static initialization that while this pattern works for this specific program, it will not work in general. As soon as there are multiple threads, or multiple translation units/modules/whatever, or any other complication (like the time zone database being reloaded), all bets are off. — indi, Commented Apr 22 at 19:49

Stack Exchange Network

initializing a timezone database and getting all timezone names (before main())

1 Answer 1

Modified code

Not the answer you're looking for? Browse other questions tagged
c++
error-handling
memory-management
c++23
or ask your own question.

Hot Network Questions

initializing a timezone database and getting all timezone names (before main())

1 Answer 1

Modified code

Not the answer you're looking for? Browse other questions tagged c++error-handlingmemory-managementc++23 or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
c++
error-handling
memory-management
c++23
or ask your own question.