
It seems that Linux and FreeBSD (at least) differ in how time is presented under different locales (LC_TIME), most visibly, though not only, in how the hour of day is shown (12-hour am/pm style or 24-hour).

Linux (as seen on recent Fedora & Ubuntu):

env LC_TIME=C.UTF-8 TZ=UTC date --date 22:22:22 '+%H %T %p, %r, %+, %Ec' ; env LC_TIME=en_US.UTF-8 TZ=UTC date --date 22:22:22 '+%H %T %p, %r, %+, %Ec'
22 22:22:22 PM, 10:22:22 PM, %+, Fri May 17 22:22:22 2024
22 22:22:22 PM, 10:22:22 PM, %+, Fri 17 May 2024 10:22:22 PM UTC

FreeBSD:

env LC_TIME=C.UTF-8 TZ=UTC date -v 22H -v 22M -v 22S '+%H %T %p, %r, %+, %Ec' ; env LC_TIME=en_US.UTF-8 TZ=UTC date -v 22H -v 22M -v 22S '+%H %T %p, %r, %+, %Ec'
22 22:22:22 PM, 10:22:22 PM, Fri May 17 22:22:22 UTC 2024, Fri May 17 22:22:22 2024
22 22:22:22 PM, 10:22:22 PM, Fri May 17 22:22:22 UTC 2024, Fri May 17 22:22:22 2024

My questions are: Is there a standard that defines presentation for various locale specific settings (like LC_TIME)? If so, where is the standard documented?

1 Answer


The short answer: Yes. POSIX and Unicode CLDR.

The longer version:

The FreeBSD Developer Handbook, Chapter 4, "Localization and Internationalization - L10N and I18N", tells us that FreeBSD follows POSIX.1 Native Language Support (NLS). In practice Linux does the same, as we live in a Unix-like world. Not many systems are actually certified, but that is the gospel we all follow to some degree.

But if you check the FreeBSD man page for date, you see that it claims compatibility with IEEE Std 1003.2 ("POSIX.2"). It formats its output with strftime(3), whose man page says:

   %+    is  replaced by national representation of the date and time (the
         format is similar to that produced by date(1)).
   %E* %O*
         POSIX locale extensions.  The sequences %Ec %EC %Ex %EX  %Ey  %EY
         %Od  %Oe %OH %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy are supposed
         to provide alternate representations.

         Additionally %OB  implemented  to  represent  alternative  months
         names (used standalone, without day mentioned).
   %c    is replaced by national representation of time and date.        

On Linux it depends on your userland, but most likely you are using strftime from the GNU C Library. See the Linux man page for strftime(3), which references C11 and POSIX.1-2008:

   %+     The date and time in date(1) format. (TZ) (Not supported
          in glibc2.)

   %E     Modifier: use alternative ("era-based") format, see below.
          (SU)
   %c     The preferred date and time representation for the current
          locale.  (The specific format used in the current locale
          can be obtained by calling nl_langinfo(3) with D_T_FMT as
          an argument for the %c conversion specification, and with
          ERA_D_T_FMT for the %Ec conversion specification.)  (In
          the POSIX locale this is equivalent to %a %b %e %H:%M:%S
          %Y.)

    ...and that of the E modifier is to use a locale-
   dependent alternative representation.  The rules governing date
   representation with the E modifier can be obtained by supplying
   ERA as an argument to a nl_langinfo(3)

That is why you see a literal %+ on Linux: date there is most likely linked against glibc 2, which does not support it. The real question is then what happens with %Ec. The FreeBSD man page says "supposed to", which is a bit of a weasel phrase. Is it fully implemented, and how? It would be worth changing the locale settings on FreeBSD to see whether the output changes or not, or diving into the source of strftime.

If I interpret the code correctly, the E and O cases each simply increment a counter and do nothing else. I do not see the Ealternative count being used anywhere, although I would have expected it to be handled in case 'c'. I am certainly no C guru, but to me it seems that %Ec is handled exactly like %c, which the man page describes as the national representation.
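That reading of the code can be checked from the shell without touching the source. The following is only a sketch: compare_c_and_Ec is a made-up helper name, and the locale names passed to it are examples that may not be installed on your system (a missing locale typically falls back to C without any error, and will then report a match). It formats the current time once with both conversions and compares the two halves:

```shell
#!/bin/sh
# Hypothetical helper: report whether %Ec produces anything different
# from %c for each LC_TIME value given as an argument.
compare_c_and_Ec() {
    for loc in "$@"; do
        # One date call yields both fields, so the clock cannot tick
        # between them. LC_ALL is cleared because it would override LC_TIME.
        out=$(LC_ALL= LC_TIME="$loc" TZ=UTC date '+%c|%Ec')
        plain=${out%%|*}   # text before the first '|', i.e. %c
        era=${out#*|}      # text after the first '|', i.e. %Ec
        if [ "$plain" = "$era" ]; then
            echo "$loc: %Ec matches %c"
        else
            echo "$loc: %Ec differs from %c"
        fi
    done
}

# Example locales only; adjust to what `locale -a` lists on your system.
compare_c_and_Ec C en_US.UTF-8 ja_JP.UTF-8
```

If %Ec matches %c for every locale on one system but not on the other, that points at the strftime implementation; if both systems agree, the difference more likely sits in the locale data itself.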

I would then play around with the locale settings on FreeBSD to see whether the output changes or not. I suspect some of the base settings on your two systems are not aligned.
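A quick way to see what each system's locale data contains is to ask the locale(1) utility for the LC_TIME keywords behind %c, %Ec and %r, which are the same values nl_langinfo(3) returns for D_T_FMT, ERA_D_T_FMT and T_FMT_AMPM. This is a sketch under assumptions: show_time_formats is a made-up helper name, I assume locale(1) on both systems accepts these three keyword names, and en_US.UTF-8 is only an example of an installed locale:

```shell
#!/bin/sh
# Hypothetical helper: print the LC_TIME format strings for one locale.
show_time_formats() {
    # LC_ALL is cleared because it would override LC_TIME.
    # -k prints each value together with its keyword name.
    LC_ALL= LC_TIME="$1" locale -k d_t_fmt era_d_t_fmt t_fmt_ampm
}

echo "== C =="
show_time_formats C
echo "== en_US.UTF-8 =="
show_time_formats en_US.UTF-8
```

Running this on both Linux and FreeBSD and comparing the d_t_fmt lines shows directly whether the two locale databases define a 12-hour or a 24-hour format for the same locale name.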

The source of truth is then:

If we dive into the contents of POSIX.1-2017 (IEEE Std 1003.1-2017, a revision of IEEE Std 1003.1-2008), which is part of the Single UNIX Specification, we find:

The actual locale definitions on most operating systems are today governed by the Unicode Common Locale Data Repository (CLDR).

You can see an overview for FreeBSD in New approach to the FreeBSD locale database, which describes how the CLDR is imported. Take "new" in air quotes, as that article is from 2009.

What is then correct? Who is following the spec most stringently? And which version is followed?

And beware of LC_MESSAGES. I am unsure if that comes into play.

Giles has a couple of really good answers which are worth reading in this context:

As you are asking about standards and using C.UTF-8, you are probably a true believer in standards. But for other readers who might think that the abomination of a hack that was en_DK.UTF-8 was a good idea, please read my answer to Create a new locale on FreeBSD.

Personally I think the CLDR and Unicode versions should be noted in the man page for locale. But AFAIK you need to keep track of the release notes to find them. For FreeBSD 14 we can see that it was synced up with CLDR 41.0 and Unicode 14 by change e87ec409fa9b (look under "Userland Configuration Changes").

  • Nice answer. Ultimately I was trying to determine "who is right" here. Maybe both OSes are, based on their particular implementation and/or interpretation of the standard, but I was looking for sources of truth to help make that determination. You have provided that, so I will dig in now and see whether there are bugs here or not (or maybe opportunities to improve the clarity of the documentation).
    – Juan
    Commented May 17 at 16:56
