I have this code to decode the \uXXXX Unicode escape sequences for an emoji:

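// Requires: using System.Globalization; using System.Text.RegularExpressions;
public string DecodeEncodedNonAsciiCharacters(string value)
{
    // Replace each literal \uXXXX escape with the UTF-16 code unit it encodes.
    return Regex.Replace(
        value,
        @"\\u(?<Value>[a-fA-F0-9]{4})",  // hex digits only: a-f, not a-z
        m => ((char)int.Parse(m.Groups["Value"].Value, NumberStyles.HexNumber)).ToString()
    );
}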
   

So I pass the result of

DecodeEncodedNonAsciiCharacters("\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F");

to Console.WriteLine(), which prints this emoji: 🏋🏿‍♂️. My question is: how can I turn this

"\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F"

into these code points?

U+1F3CB, U+1F3FF, U+200D, U+2642, U+FE0F

The code points above are from Emojipedia.org.

  • @"\\u(?<Value>[a-fA-F0-9]{4})" note f and F Commented Nov 27, 2020 at 6:55
  • Please help me with this one: I need to match the emoji against the JSON file from here: gist.github.com/oliveratgithub/…
    – Lee Rankin
    Commented Nov 27, 2020 at 7:08
  • That way I can get the name associated with that emoji. I know it can be done with Json.Net, but I don't know how to implement it.
    – Lee Rankin
    Commented Nov 27, 2020 at 7:09
  • It seems you want to combine two surrogate characters into one Utf-32. If it's your case, please, see my answer Commented Nov 27, 2020 at 9:33

1 Answer

It seems that you want to combine each surrogate pair (two UTF-16 code units) into a single UTF-32 code point:

\uD83C\uDFCB => \U0001F3CB
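
The arithmetic behind that mapping can be sketched as below. This is only an illustration: SurrogateMath and Combine are hypothetical names, not part of the question or of .NET, and the sketch assumes its inputs really are a valid high/low surrogate pair. It compares the hand-computed value with the built-in char.ConvertToUtf32.

```csharp
using System;

public static class SurrogateMath
{
    // Hypothetical helper: standard UTF-16 decoding formula.
    // Assumes high is in D800..DBFF and low is in DC00..DFFF (no validation here).
    public static int Combine(char high, char low) =>
        0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);

    public static void Main()
    {
        int manual = Combine('\uD83C', '\uDFCB');
        int builtIn = char.ConvertToUtf32('\uD83C', '\uDFCB');

        Console.WriteLine($"U+{manual:X4}");    // U+1F3CB
        Console.WriteLine(manual == builtIn);   // True
    }
}
```

For \uD83C\uDFCB this works out to 0x10000 + (0x3C << 10) + 0x3CB = 0x1F3CB.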

If that's the case, you can do it like this:

Code:

public static IEnumerable<int> CombineSurrogates(string value) {
  if (null == value)
    yield break; // or throw new ArgumentNullException(nameof(value));

  for (int i = 0; i < value.Length; ++i) {
    char current = value[i];
    char next = i < value.Length - 1 ? value[i + 1] : '\0';

    if (char.IsSurrogatePair(current, next)) {
      yield return (char.ConvertToUtf32(current, next));

      i += 1;
    }
    else
      yield return (int)current;
  }
}

public static string DecodeEncodedNonAsciiCharacters(string value) =>
  string.Join(" ", CombineSurrogates(value).Select(code => $"U+{code:X4}"));

Demo:

string data = "\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F";

// If you want the codes themselves, uncomment the line below
//int[] codes = CombineSurrogates(data).ToArray();

string result = DecodeEncodedNonAsciiCharacters(data);

Console.Write(result);

Outcome:

U+1F3CB U+1F3FF U+200D U+2642 U+FE0F
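
As a possible alternative, assuming you target .NET Core 3.0 or later, string.EnumerateRunes() already walks a string code point by code point, so the manual surrogate handling may not be needed. The sketch below is not part of the answer above; RuneDemo and ToCodePoints are hypothetical names. One difference to be aware of: EnumerateRunes yields U+FFFD for an unpaired surrogate, whereas CombineSurrogates returns the lone surrogate's own value.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RuneDemo
{
    // Hypothetical helper: formats every Unicode scalar value in the string as U+XXXX.
    public static string ToCodePoints(string value) =>
        string.Join(" ", value.EnumerateRunes().Select(rune => $"U+{rune.Value:X4}"));

    public static void Main()
    {
        string data = "\uD83C\uDFCB\uD83C\uDFFF\u200D\u2642\uFE0F";

        Console.WriteLine(ToCodePoints(data));
        // U+1F3CB U+1F3FF U+200D U+2642 U+FE0F
    }
}
```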
