Convert Persian and Arabic digits to English

Question

I use the following utility method to convert Persian and Arabic digits to English using regex:

convertNumbers2English: function (string) {
    return string.replace(/[٠١٢٣٤٥٦٧٨٩]/g, function (c) {
        return c.charCodeAt(0) - 1632;
    }).replace(/[۰۱۲۳۴۵۶۷۸۹]/g, function (c) {
        return c.charCodeAt(0) - 1776;
    });
}

See this post. – Mahozad, Commented Dec 24, 2021 at 15:04

Peter Taylor · Accepted Answer · 2017-06-27 14:24:42Z

Be nice to the maintenance programmer, even (especially?) if you expect it to be you. If you're mixing characters which are visually indistinguishable but don't need to be literal self-representations, you can use Unicode escapes and hexadecimal offsets as so:

convertNumbers2English: function (string) {
    return string.replace(/[\u0660-\u0669]/g, function (c) {
        return c.charCodeAt(0) - 0x0660;
    }).replace(/[\u06f0-\u06f9]/g, function (c) {
        return c.charCodeAt(0) - 0x06f0;
    });
}

Just that small change accomplishes the following:

I can easily see that I haven't missed any digits without having to count;
I can easily see that I haven't accidentally mixed digits from the two styles;
I can easily see that the offset subtracted is correct in each case;
I can easily see that the values returned by the anonymous functions are integers from 0 to 9 and not strings or codepoints corresponding to '0' to '9', which is useful if I'm not primarily a JS developer;

If I care about squeezing every last byte out of my JS, I can see a way to combine the two into one:

convertNumbers2English: function (string) {
    return string.replace(/[\u0660-\u0669\u06f0-\u06f9]/g, function (c) {
        return c.charCodeAt(0) & 0xf;
    });
}

The minimiser should take care of unescaping the Unicode escapes.

It might be slightly easier for me to find which characters they are, because I can look up the hex values in a Unicode character table.
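The masking variant works because 0x0660 and 0x06f0 both end in a zero hex digit, so the low four bits of each digit's code unit are its numeric value. As a minimal sketch, assuming a standalone function is wanted rather than an object method (the name `toEnglishDigits` is hypothetical and not from the original code), the check could look like this. The explicit `String(...)` call only makes visible the coercion that `replace` would otherwise perform on the returned number.

```javascript
// Hypothetical standalone wrapper around the combined-range approach.
function toEnglishDigits(input) {
    // Arabic-Indic digits: U+0660..U+0669.
    // Extended Arabic-Indic (Persian) digits: U+06F0..U+06F9.
    return input.replace(/[\u0660-\u0669\u06f0-\u06f9]/g, function (c) {
        // Both ranges start at a code point whose low nibble is 0,
        // so masking with 0xf leaves the digit value 0..9.
        return String(c.charCodeAt(0) & 0xf);
    });
}

console.log(toEnglishDigits("\u0661\u0662\u0663 abc \u06f4\u06f5\u06f6")); // prints "123 abc 456"
```

Characters outside the two ranges, including ASCII digits, pass through unchanged.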

Just out of curiosity, why do you say that the characters don't need to be literal self-representations? Wouldn't it be more meaningful to use the self-representation? — Cave Johnson, Commented Jun 27, 2017 at 19:26
@KodosJohnson, is ٩ the Persian one or the Arabic one? Is \u06f5 the Persian one or the Arabic one? I hope that answers your question. — Peter Taylor, Commented Jun 27, 2017 at 20:07

Tushar · Accepted Answer · 2017-06-27 13:10:02Z

You can use capture groups:

return string.replace(/([٠١٢٣٤٥٦٧٨٩])|([۰۱۲۳۴۵۶۷۸۹])/g, function(m, $1, $2) {
    return m.charCodeAt(0) - ($1 ? 1632 : 1776);
});

$1 is the character matched by [٠١٢٣٤٥٦٧٨٩] and $2 is the character matched by [۰۱۲۳۴۵۶۷۸۹]. For any given match only one of the two is defined, so the ternary operator picks the correct offset to subtract from the character code.

If the target environments support arrow functions, the code can be shortened to:

convertNumbers2English: str => str.replace(/([٠١٢٣٤٥٦٧٨٩])|([۰۱۲۳۴۵۶۷۸۹])/g, (m, $1, $2) => m.charCodeAt(0) - ($1 ? 1632 : 1776));
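To illustrate how the replacer callback receives the groups, here is a hedged sketch of the same capture-group technique written with escaped ranges instead of literal digits. The function name `convertWithGroups` and the parameter names `arabic` and `persian` are hypothetical. It relies on the fact that a group which did not take part in the match is passed as `undefined`.

```javascript
// Hypothetical helper: the capture-group technique with escaped ranges.
function convertWithGroups(str) {
    return str.replace(/([\u0660-\u0669])|([\u06f0-\u06f9])/g, function (match, arabic, persian) {
        // Exactly one group participates in each match; the other is undefined.
        // `persian` is named only for readability and is not used directly.
        var base = arabic !== undefined ? 0x0660 : 0x06f0;
        return String(match.charCodeAt(0) - base);
    });
}

console.log(convertWithGroups("\u0667 and \u06f7")); // prints "7 and 7"
```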

Mohsen Alyafei · Accepted Answer · 2020-07-30 18:52:59Z

If the string may contain both "Arabic" and "Persian" numbers then a one-line "replace" can do the job as follows.

The Arabic and Persian numbers are converted to English equivalents. Other text characters remain unchanged.

let Num = "۳٣۶٦۵any٥۵٤۶32٠۰";     // Output should be "33665any55463200"

Num = Num.replace(/[٠-٩]/g, d => "٠١٢٣٤٥٦٧٨٩".indexOf(d)).replace(/[۰-۹]/g, d => "۰۱۲۳۴۵۶۷۸۹".indexOf(d));

console.log(Num);
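The two chained lookups can also be folded into a single pass. The following is only a sketch of one possible variant, not the answer's own code: it assumes a hypothetical constant `DIGITS` holding the ten Arabic digits followed by the ten Persian digits, so that `indexOf(d) % 10` gives the numeric value for either script.

```javascript
// Hypothetical single-pass variant of the indexOf lookup.
// Arabic digits occupy indexes 0-9 and Persian digits indexes 10-19.
const DIGITS =
    "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669" +
    "\u06f0\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9";

function convertByLookup(text) {
    // The regex only matches characters present in DIGITS, so indexOf never returns -1 here.
    return text.replace(/[\u0660-\u0669\u06f0-\u06f9]/g, d => String(DIGITS.indexOf(d) % 10));
}

console.log(convertByLookup("\u06f3\u0663any\u0665\u06f5")); // prints "33any55"
```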
