
I am trying to convert a String object containing the text of an emoticon's Unicode escape into a String whose only content is that emoticon, e.g. converting "\u1F34E" to 🍎.

I attempted the following under the supposition that the string's escape sequence would be processed properly:

String str = "\u1F34E";
Console.WriteLine("'{0}' to '{1}'", str, str.ToCharArray()[0]);

Output:

'\u1F34E' to '\'

Outputting the string directly to a text file yields the same result, so it is not just the debugger I am using. I am unsure what to do. Any help would be greatly appreciated.

EDIT:

I realize my original question was not clear; my intent was to build a properly formatted UTF-16 string from a UTF-32 code point given as text, because an API I was sending this value to required that formatting. I have resolved the problem with the following:

String str = "1F34E"; //removed \u with prior parsing
int unicode_utf32 = int.Parse(stdemote.Unicode, System.Globalization.NumberStyles.HexNumber);
String unicode_utf16_str = Char.ConvertFromUtf32(unicode_utf32);
Console.WriteLine("'{0}' to '{1}'", str, unicode_utf16_str);
  • Note that the code shown in the post does not produce the output you claim it does. It is unclear whether you are asking how to represent non-basic-plane emoji characters in a C# string, whether you have a representation from elsewhere in a format similar to C# string literals that you want to convert to a C# string, or whether you simply don't know how non-basic-plane Unicode code points are represented in .NET strings. Commented Jan 23, 2021 at 1:27

1 Answer


This is not what it seems

string str = "\u1F34E";

.NET uses UTF-16 to encode its strings. Each 16-bit code unit (two bytes) can directly represent a code point in the range U+0000 to U+FFFF; code points above that range need two code units. Correspondingly, the \u escape sequence takes four hex digits and covers U+0000 to U+FFFF (16-bit), while the extended \U escape takes eight hex digits (32-bit) and can express any valid code point up to U+0010FFFF.
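
To see concretely why the original literal misbehaves (a minimal sketch, assuming a console application), note that \u consumes exactly four hex digits, so "\u1F34E" is the character U+1F34 followed by the letter 'E':

string s = "\u1F34E";
Console.WriteLine(s.Length);  // 2
Console.WriteLine((int)s[0]); // 7988 (0x1F34)
Console.WriteLine(s[1]);      // E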

The emoji 🍎 uses the high code point U+1F34E, so it needs to be encoded as a surrogate pair of two UTF-16 code units, "\uD83C\uDF4E", or written combined as "\U0001F34E" 1

Example

string str = "\uD83C\uDF4E";
// or
string str = "\U0001F34E"

If your goal is to separate actual text elements as opposed to characters, you could make use of StringInfo.GetTextElementEnumerator

// Requires: using System.Collections.Generic; and using System.Globalization;
public static IEnumerable<string> ToElements(string source)
{
   // Enumerates Unicode text elements (grapheme clusters) rather than UTF-16 chars
   var enumerator = StringInfo.GetTextElementEnumerator(source);
   while (enumerator.MoveNext())
      yield return enumerator.GetTextElement();
}
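
For example (a sketch using the helper above), the emoji comes back as a single text element even though the string's Length is 2:

foreach (var element in ToElements("\U0001F34E"))
    Console.WriteLine(element); // prints 🍎 once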

Note: my use of terminology might not be the most common or accurate; if you think it can be tightened up, feel free to edit.


1 Thanks to Mark Tolonen for pointing out that the Unicode escape sequence supports both 16-bit and 32-bit variants, \uXXXX and \UXXXXXXXX; more information can be found in Jon Skeet's article Strings in C# and .NET.

  • Thank you, this was precisely the problem I was having. I used the code in the above edited original post to solve the problem. Commented Jan 23, 2021 at 1:42
  • C# allows \U0001F34E escape codes so you don't have to manually build surrogates. Commented Jan 24, 2021 at 6:48
  • @MarkTolonen I am extremely grateful for this comment, I'll try to formulate an update soon
    – TheGeneral
    Commented Jan 24, 2021 at 6:52
  • 1
    No problem. Here's a reference: csharpindepth.com/articles/Strings Commented Jan 24, 2021 at 6:56
  • 1
    @MarkTolonen, once again thanks for your input, updated and attributed
    – TheGeneral
    Commented Jan 24, 2021 at 7:11
