How to determine which character is used as decimal separator (radix point) or thousand separator, under current locale?

Question

Coming from this answer to Format currency in Bash, I am looking for ways to determine which characters are used as numeric separators.

There are a lot of issues regarding locales and number formatting, for example:

printf '%.5f\n' $(bc -l <<<'4*a(1)')
3.14159

LANG=de_DE printf '%.5f\n' $(bc -l <<<'4*a(1)')
bash: printf: 3.14159265358979323844: invalid number
3,00000

The bc calculator does not seem to handle the locale correctly...

In the mentioned answer, to find the decimal separator (or radix character), I used this:

int2amount() {
    local TIMEFORMAT=%U _decsep
    read _decsep < <(eval 'time true' 2>&1)
    _decsep=${_decsep//[0-9]}
    ...
}

This works fine:

pi() {
    local TIMEFORMAT=%U _decsep
    read _decsep < <(eval 'time true' 2>&1)
    _decsep=${_decsep//[0-9]}
    local pi=$(bc -l <<<'4*a(1)')
    printf '%.5f\n' ${pi/./$_decsep}
}

pi
3.14159
LANG=de_DE pi
3,14159

But the thousands separator is a lot easier to find:

printf -v ts "%'d" 1111 ; ts=${ts//1}

There is no fork, so the system footprint is very light.

So I could imagine, at the beginning of a source file, something like:

numericSeparators() {
    local TIMEFORMAT=%U
    read NUM_DEC_SEP < <(eval 'time true' 2>&1)
    NUM_DEC_SEP=${NUM_DEC_SEP//[0-9]}
    printf -v NUM_THO_SEP "%'d" 1111
    NUM_THO_SEP=${NUM_THO_SEP//1}
}
numericSeparators
declare -r NUM_THO_SEP NUM_DEC_SEP
...

But I think <(eval 'time true' 2>&1) is heavy for this goal. I'm looking for a lighter and/or cleaner way to determine them (ideally both the decimal and thousands separators).

Self-answer

Thanks to dan's answer, my functions become simpler and quicker!

Sample to correct/adapt bc's output:

pi() {
    local _decsep pi=$(bc -l <<<'4*a(1)')
    printf -v _decsep %.1f 1
    printf '%.5f\n' ${pi/./${_decsep:1:1}}
}

pi
3.14159
LANG=de_DE.UTF-8 pi
3,14159

A small function that will set two variables: NUM_THO_SEP for thousand separator and NUM_DEC_SEP for decimal separator:

numericSeparators() {
    local numtest
    printf -v numtest "%'.1f" 1111
    NUM_THO_SEP=${numtest:1:1}
    NUM_THO_SEP=${NUM_THO_SEP/1}
    NUM_DEC_SEP=${numtest: -2:1}
}
numericSeparators

for loctest in {C,en_US,fr_{CH,FR},de_{CH,DE},it_{CH,IT}}{,.UTF8}  ;do
    LANG=$loctest numericSeparators
    LANG=C printf 'LANG=%-12s thsnd=%-1s \e[2m(%q)\e[0m\e[45G radix=%q\n' \
        "$loctest" "$NUM_THO_SEP"{,} "$NUM_DEC_SEP"
done

LANG=C            thsnd=  ('')               radix=.
LANG=C.UTF8       thsnd=  ('')               radix=.
LANG=en_US        thsnd=, (\,)               radix=.
LANG=en_US.UTF8   thsnd=, (\,)               radix=.
LANG=fr_CH        thsnd=' (\')               radix=.
LANG=fr_CH.UTF8   thsnd=’ ($'\342\200\231')  radix=.
LANG=fr_FR        thsnd=�($'\240')          radix=\,
LANG=fr_FR.UTF8   thsnd=  ($'\342\200\257')  radix=\,
LANG=de_CH        thsnd=' (\')               radix=.
LANG=de_CH.UTF8   thsnd=’ ($'\342\200\231')  radix=.
LANG=de_DE        thsnd=. (.)                radix=\,
LANG=de_DE.UTF8   thsnd=. (.)                radix=\,
LANG=it_CH        thsnd=' (\')               radix=.
LANG=it_CH.UTF8   thsnd=’ ($'\342\200\231')  radix=.
LANG=it_IT        thsnd=. (.)                radix=\,
LANG=it_IT.UTF8   thsnd=. (.)                radix=\,

Note: As my terminal is UTF-8, it is unable to print the NO-BREAK SPACE that fr_FR emits as a single ISO-8859-1 byte ($'\240'). This is why it shows a replacement character: � instead.
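To make such a separator printable on a UTF-8 terminal, one option is to convert it before displaying it. The following is only a sketch: latin1_to_utf8 is a hypothetical helper name, it relies on the external iconv tool (so it does fork), and it assumes the non-UTF-8 locale really uses ISO-8859-1.

```bash
# Hypothetical helper: convert a string obtained under an ISO-8859-1
# locale to UTF-8 so that a UTF-8 terminal can display it.
# Assumption: the source charset is ISO-8859-1 (true for plain fr_FR
# on typical glibc systems, but not guaranteed everywhere).
latin1_to_utf8() {
    printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
}

sep=$'\240'                                   # NO-BREAK SPACE as one Latin-1 byte
printf 'thsnd=[%s]\n' "$(latin1_to_utf8 "$sep")"
```

The converted value is the two-byte UTF-8 sequence $'\302\240', which the terminal renders as an ordinary-looking space.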

@JamesBrown Oh yes! (forgot this!) ... But locale is not builtin, so system footprint won't be better... — F. Hauri - Give Up GitHub, Commented Jun 29, 2022 at 8:50
Try LANG=de_DE printf '%.5f\n' $(LANG=de_DE bc -l <<<'4*a(1)'), maybe? — Renaud Pacalet, Commented Jun 29, 2022 at 9:34
@RenaudPacalet On my system, this render: Error: bash: printf: 3.14159265358979323844: invalid number, then 3,00000. — F. Hauri - Give Up GitHub, Commented Jun 29, 2022 at 9:40
My guess is that bc ignores the locale while bash does not. — Renaud Pacalet, Commented Jun 29, 2022 at 9:41

dan · Accepted Answer · 2022-06-29 11:08:11Z

3

You can get the locale's radix character (decimal separator) with:

printf -v ds '%#.1f' 1
ds=${ds//[0-9]}

And the thousands grouping separator, with:

printf -v ts "%'d" 1111
ts=${ts//1}

Some locales (e.g. C) have no thousands separator, in which case $ts is empty. Conversely, if the radix character is not defined by the locale, POSIX (printf(3)) says it should default to a period (.). The # flag guarantees that it will be printed.
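As an illustration of how these two values can be used, here is a minimal sketch that goes in the opposite direction from the question: it turns a number written for the current locale into the plain form that bc expects. The function names get_separators and to_c_number are hypothetical, and the sketch assumes bash (for printf -v).

```bash
# Hypothetical helper: store the current locale's separators in the
# variables ds (radix character) and ts (thousands separator).
get_separators() {
    printf -v ds '%#.1f' 1      # e.g. "1.0" or "1,0"
    ds=${ds//[0-9]}             # drop the digits, keep the radix character
    printf -v ts "%'d" 1111     # e.g. "1,111", "1.111" or "1111"
    ts=${ts//1}                 # empty when the locale has no grouping
}

# Hypothetical helper: rewrite a locale-formatted number into the plain
# form (no grouping, "." as radix) that bc understands.
to_c_number() {
    local ds ts n=$1
    get_separators              # fills the local ds and ts (dynamic scoping)
    if [[ -n $ts ]]; then
        n=${n//"$ts"}           # remove every thousands separator first
    fi
    n=${n/"$ds"/.}              # then replace the radix character by "."
    printf '%s\n' "$n"
}
```

Removing the grouping character before replacing the radix character matters for locales such as de_DE, where the grouping character is itself a period. Under de_DE.UTF-8, assuming that locale is installed, to_c_number 1.234,5 should print 1234.5.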


Then, in one go: LANG=$loctest printf -v var %\'5.1f 1111;thsnd=${var:1:1} radix=${var: -2:1} ;thsnd=${thsnd/1};declare -p thsnd radix
– F. Hauri - Give Up GitHub
Commented Jun 29, 2022 at 11:47


RARE Kpop Manifesto · Accepted Answer · 2023-07-24 21:44:49Z

0

In the vast majority of cases, you don't even have to know which locale you're in to properly decode a value, regardless of your own locale's settings.

Because a number cannot have two radix points (RP), in any base, you can use gsub() or a similar quick counting tool to find out which of , and . occurs more than once.

If both do, the input is likely problematic to begin with.
If one of each exists, the rightmost one has to be the RP.
When there is only one separator and it is ambiguous, consider:

If the number of digits to the right of that character is anything other than exactly 3 (including 0), it cannot be the thousands separator.

More likely than not, a thousands separator is bookended by digits on both sides, whereas a radix point at the leading or trailing edge is not all that uncommon.

Only 4- to 6-digit numbers (counting the digits after the radix point, assuming it is still ambiguous) require extra context to decode properly.
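The rules above can be sketched in bash as follows. This is only one possible reading of the heuristic, not a reference implementation: guess_radix is a hypothetical name, it counts separators with parameter expansion instead of gsub(), and it only considers "," and "." as candidates.

```bash
# Hypothetical helper: guess the radix character of a number string.
# Prints ".", ",", "none", "ambiguous" or "invalid".
guess_radix() {
    local s=$1 commas dots sep before after
    commas=${s//[^,]}                # keep only the commas
    dots=${s//[^.]}                  # keep only the dots
    if (( ${#commas} > 1 && ${#dots} > 1 )); then
        echo invalid                 # both repeated: problematic input
    elif (( ${#commas} > 1 )); then
        echo .                       # repeated "," can only be grouping
    elif (( ${#dots} > 1 )); then
        echo ,                       # repeated "." can only be grouping
    elif (( ${#commas} == 1 && ${#dots} == 1 )); then
        # one of each: the rightmost one is the radix point
        if [[ $s == *,*.* ]]; then echo .; else echo ,; fi
    elif (( ${#commas} + ${#dots} == 0 )); then
        echo none                    # no separator at all
    else
        sep=$commas$dots             # the single separator present
        before=${s%%"$sep"*}; before=${before//[^0-9]}
        after=${s#*"$sep"};   after=${after//[^0-9]}
        # Grouping needs exactly 3 digits after and 1 to 3 digits before.
        if (( ${#after} != 3 || ${#before} == 0 || ${#before} > 3 )); then
            echo "$sep"
        else
            echo ambiguous           # the 4- to 6-digit case
        fi
    fi
}
```

With this sketch, guess_radix '1,234,567.89' prints "." and guess_radix '3,14159' prints ",", while guess_radix '1,234' prints "ambiguous" because it could be either 1234 or 1.234.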


You said: "In the vast majority of cases, you don't even have to know which locale setting". I agree, but I asked this because I encountered a case outside that vast majority: a tool (bc) that doesn't apply the locale correctly.
– F. Hauri - Give Up GitHub
Commented Apr 21 at 15:09
@F.Hauri-GiveUpGitHub : you clearly missed my point. i'm telling you to ignore whatever BS is coming out of shells or bc since the logic is very straight forward to deal with radix points regardless of your locale settings. I hardly trust shells to get things done, seeing that the POSIX group thinks heredocs are essential but refuse to also mandate herestrings, even though they use the exact same parsing mechanism.
– RARE Kpop Manifesto
Commented Apr 26 at 19:25
