1

I am converting a string to a byte array with ASCII encoding, using the code below.

string data = "<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>";
byte[] buffer = Encoding.ASCII.GetBytes(data);

The problem I am facing is that a "?" gets added to my string.

If I convert the byte array back to a string

var str = System.Text.Encoding.Default.GetString(buffer);

my string becomes

string str = "?<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>"

Does anyone know why a "?" is being added to my string, and how to remove it?

9
  • I could not reproduce this, but using mismatching encoding and decoding is wrong anyway (even if it had worked)
    – user555045
    Commented Feb 12, 2016 at 19:05
  • Is encoding="utf-8" not a hint you should use Encoding.UTF8?
    – leppie
    Commented Feb 12, 2016 at 19:05
  • encoding="utf-8" is just text inside the string. Even if I remove it, the behavior is the same. Commented Feb 12, 2016 at 19:08
  • @user1104946 it means that the receiving end will (or at least should) decode it as if it was utf-8. If it's not, well, that could be bad.
    – user555045
    Commented Feb 12, 2016 at 19:10
  • I just changed Encoding.ASCII.GetBytes(data) to Encoding.UTF8.GetBytes(data) and I am still facing the same issue. Commented Feb 12, 2016 at 19:12

3 Answers

5

It seems that you showed only simplified code. Am I right that you read the data from a file? If so, check for a BOM (byte order mark) at the beginning of the file. A BOM can appear at the start of UTF-8, UTF-16 and UTF-32 data to identify the encoding.
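To make the symptom concrete, here is a minimal sketch of how a leading BOM turns into "?" under ASCII, and one way to drop it. It assumes the BOM reached the string as the single character U+FEFF; the class and method names (BomDemo, StripBom, FirstAsciiByte) are hypothetical and not from the question.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Removes one leading U+FEFF, which is how a BOM looks once decoded into a string.
    // The char is compared directly, not through a culture-sensitive string comparison.
    public static string StripBom(string text)
    {
        return text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;
    }

    // Returns the first byte produced by ASCII-encoding the text.
    // U+FEFF is not an ASCII character, so the default encoder substitutes '?' (0x3F).
    public static byte FirstAsciiByte(string text)
    {
        return Encoding.ASCII.GetBytes(text)[0];
    }
}
```

With a string such as "\uFEFF<a/>", FirstAsciiByte yields the byte for '?', and after StripBom it yields the byte for '<'.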

2
  • Yes, I only showed simplified code. My string had a hidden character. I used a regular expression to remove it. Commented Feb 12, 2016 at 20:10
  • There are many ways to remove a BOM. If you are sure that the first significant char in your string is always '<', then you can use code like this: int index = data.IndexOf("<"); if (index > 0) data = data.Substring(index);
    – michael
    Commented Feb 12, 2016 at 20:21
0

There are several things wrong here. One is not showing the relevant code.

Nonetheless, if you use valid methods to read text from a UTF-8, UTF-32, etc. file, you won't have a BOM in your string, because the string holds the text and the BOM is not part of the text.
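As a rough illustration of that point, the sketch below writes UTF-8 bytes with a BOM to a temporary file and reads them back two ways. File.ReadAllText detects and skips the BOM, while decoding the raw bytes with Encoding.UTF8.GetString keeps it as U+FEFF. The class and method names (ReadDemo, RoundTrip) are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class ReadDemo
{
    // Writes the text as UTF-8 with a BOM, then reads it back via a text reader
    // and via a raw byte decode, returning both results for comparison.
    public static (string viaReader, string viaRawDecode) RoundTrip(string text)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        byte[] bytes = bom.Concat(Encoding.UTF8.GetBytes(text)).ToArray();
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, bytes);
            // Text-reading API: the BOM is used to detect the encoding and is not returned.
            string viaReader = File.ReadAllText(path);
            // Raw decode: the BOM bytes are decoded like any other character.
            string viaRawDecode = Encoding.UTF8.GetString(File.ReadAllBytes(path));
            return (viaReader, viaRawDecode);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the string in the question was produced by something like the second path, that would explain the hidden leading character.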

On the other hand, if you are reading an XML file, it is not a "text" file. You should use an XML reader, which takes care of using the encoding that is (most likely) indicated in the file.

And when you write an XML file (which I presume you'll be doing with the byte array), you should use an XML writer, which takes care of using the encoding you specify and writing it into the file.
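A minimal sketch of that approach, assuming the goal is a UTF-8 byte array without a BOM: XmlWriter produces the bytes and XDocument.Load parses them back. The names XmlBytesDemo, ToUtf8Bytes and FromBytes are hypothetical, and the usage below drops the ns0 prefix from the question's element because its namespace declaration is not shown.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlBytesDemo
{
    // Serializes the document to UTF-8 bytes; the writer emits the XML declaration
    // with the matching encoding name, and no BOM is written.
    public static byte[] ToUtf8Bytes(XDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false)
        };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    // Parses XML from bytes; the XML reader works out the encoding itself,
    // including when a BOM is present.
    public static XDocument FromBytes(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        {
            return XDocument.Load(stream);
        }
    }
}
```

For example, ToUtf8Bytes(new XDocument(new XElement("ReceivedPayment", new XAttribute("Amount", "1.01")))) gives bytes that start with '<', and FromBytes reads the Amount attribute back unchanged.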

Keep in mind, though, that converting from Unicode (for which UTF-8 is one encoding) to some other character set can silently corrupt your data: characters that are not in the target character set are replaced with a replacement character (typically '?').
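One way to surface this instead of letting it pass silently is to ask for an encoder that throws on unencodable characters. The sketch below uses Encoding.GetEncoding with EncoderFallback.ExceptionFallback; the class and method names (StrictAsciiDemo, TryEncodeAscii) are hypothetical.

```csharp
using System;
using System.Text;

public static class StrictAsciiDemo
{
    // ASCII encoding that throws instead of substituting '?' for characters it cannot represent.
    private static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

    // Returns true and the encoded bytes if every character is ASCII;
    // returns false (and an empty array) if any character would have been replaced.
    public static bool TryEncodeAscii(string text, out byte[] bytes)
    {
        try
        {
            bytes = StrictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = Array.Empty<byte>();
            return false;
        }
    }
}
```

A string with a leading U+FEFF makes TryEncodeAscii return false, whereas Encoding.ASCII.GetBytes would quietly emit a '?' in its place.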

-1

Here is my extension method:

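    using System;

    public static class StringExtensions
    {
        // Copies the string's raw UTF-16 code units into a byte array (2 bytes per char).
        public static byte[] ToByteArray(this string str)
        {
            var bytes = new byte[str.Length * sizeof(char)];
            Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
            return bytes;
        }
    }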
1
  • So is it normal that converting a string to a byte array adds a "?" to the string? Commented Feb 12, 2016 at 19:05
