The current version of UTF-16 is only capable of encoding 1,112,064 different numbers (code points), in the range 0x0–0x10FFFF.
Does the Unicode Consortium intend to make UTF-16 run out of characters, i.e. to assign a code point greater than 0x10FFFF?
If not, why would anyone write a UTF-8 parser that accepts 5- or 6-byte sequences? It would only add unnecessary instructions to the function.
Isn't 1,112,064 enough? Do we actually need MORE characters? I mean: how quickly are we running out?
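For what it's worth, the 1,112,064 figure is just the size of the code space minus the surrogate range that UTF-16 reserves for its own encoding mechanism. A quick sanity check:

```python
# Total Unicode code space: U+0000 through U+10FFFF.
total = 0x10FFFF + 1

# UTF-16 reserves U+D800-U+DFFF as surrogates; they are not
# valid code points, so they don't count as encodable values.
surrogates = 0xDFFF - 0xD800 + 1

print(total - surrogates)  # 1112064
```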
A utf8-loose parser can accept sequences up to 13 bytes long, enough for code points far beyond 0x10FFFF. This is not useless. Obviously such a parser doesn’t give a fart about UTF-16, which is a very unfortunate legacy we’d all like to forget, since it incorporates the worst disadvantages of both UTF-8 and UTF-32 without enjoying any of the advantages of either: UTF-16 is truly the worst of both worlds. But make no mistake: any strict UTF-8 parser must reject sequences over 4 bytes in encoded length. This is to kiss UTF-16’s sweet you-know-what.
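To see the strict side of that in practice, here's a minimal sketch (my own classification, not any particular library's) of how a strict parser decides sequence length from the lead byte: anything in 0xF8–0xFF would start a 5- or 6-byte sequence under the old scheme, and strict decoders such as Python's built-in one reject it outright.

```python
def strict_seq_len(lead: int) -> int:
    """Return the sequence length a strict (RFC 3629) UTF-8 parser
    allows for this lead byte, or raise ValueError."""
    if lead < 0x80:
        return 1            # ASCII, single byte
    if 0xC2 <= lead <= 0xDF:
        return 2            # 2-byte sequence (0xC0/0xC1 would be overlong)
    if 0xE0 <= lead <= 0xEF:
        return 3            # 3-byte sequence
    if 0xF0 <= lead <= 0xF4:
        return 4            # 4-byte sequence, capped so max code point is U+10FFFF
    raise ValueError(f"invalid lead byte: {lead:#04x}")

# 0xF8 would begin a 5-byte sequence; a strict parser refuses it.
try:
    strict_seq_len(0xF8)
except ValueError as e:
    print(e)

# Python's built-in decoder agrees:
try:
    b"\xf8\x88\x80\x80\x80".decode("utf-8")
except UnicodeDecodeError as e:
    print("rejected:", e.reason)
```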