String to Byte Array

Question

I am converting my string to a byte array with ASCII encoding, using the code below.

string data = "<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>";
byte[] buffer = Encoding.ASCII.GetBytes(data);

The problem I am facing is that it adds a "?" to my string.

Now, if I convert the byte array back to a string

var str = System.Text.Encoding.Default.GetString(buffer);

my string becomes

string str = "?<?xml version=\"1.0\" encoding=\"utf-8\"?><ns0:ReceivedPayment Amount=\"1.01\"/>";

Does anyone know why it adds "?" to my string and how to remove it?

I could not reproduce this, but using mismatching encoding and decoding is wrong anyway (even if it had worked) — user555045, Commented Feb 12, 2016 at 19:05
Is encoding="utf-8" not a hint you should use Encoding.UTF8? — leppie, Commented Feb 12, 2016 at 19:05
encoding="utf-8" is just inside the string. Even if I remove it, the behavior is the same. — user1104946, Commented Feb 12, 2016 at 19:08
@user1104946 it means that the receiving end will (or at least should) decode it as if it was utf-8. If it's not, well, that could be bad. — user555045, Commented Feb 12, 2016 at 19:10
I just changed Encoding.ASCII.GetBytes(data) to Encoding.UTF8.GetBytes(data) and I am still facing the same issue. — user1104946, Commented Feb 12, 2016 at 19:12

michael · Accepted Answer · 2016-02-12 19:38:59Z

5

It seems that you showed only simplified code. Am I right that you read the data from a file? If so, check for a BOM (byte order mark) at the beginning of the file. A BOM can appear at the start of files encoded as UTF-8, UTF-16 or UTF-32.
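A minimal sketch of such a check, assuming the file contents are already in a byte array. The class and method names (`BomCheck`, `StartsWithUtf8Bom`) are hypothetical, and only the UTF-8 BOM (bytes EF BB BF) is tested here:

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomCheck
{
    // Returns true when the buffer starts with the UTF-8 preamble EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
        return bytes.Length >= preamble.Length
            && bytes.Take(preamble.Length).SequenceEqual(preamble);
    }

    public static void Main()
    {
        byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };
        byte[] withoutBom = Encoding.UTF8.GetBytes("<a/>"); // GetBytes never adds a BOM

        Console.WriteLine(StartsWithUtf8Bom(withBom));    // True
        Console.WriteLine(StartsWithUtf8Bom(withoutBom)); // False
    }
}
```

The same pattern should work for UTF-16 or UTF-32 by comparing against `Encoding.Unicode.GetPreamble()` or `Encoding.UTF32.GetPreamble()`.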


Yes, I only showed simplified code. My string had a hidden character. I used a regular expression to remove the hidden character.
– user1104946
Commented Feb 12, 2016 at 20:10
There are many ways to remove a BOM. If you are sure that the first significant character in your string is always '<', then you can use code like this: int index = data.IndexOf('<'); if (index > 0) data = data.Substring(index);
– michael
Commented Feb 12, 2016 at 20:21
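As a sketch of one alternative, assuming the hidden character is the BOM decoded as U+FEFF (the thread does not confirm which character it was), the leading character can be dropped explicitly. `BomStrip` and `StripBom` are hypothetical names. The example also reproduces the "?" from the question: ASCII cannot represent U+FEFF, so the encoder substitutes 0x3F.

```csharp
using System;
using System.Text;

public static class BomStrip
{
    // Removes a leading U+FEFF (the BOM decoded as a character), if present.
    public static string StripBom(string text)
    {
        return text.Length > 0 && text[0] == '\uFEFF' ? text.Substring(1) : text;
    }

    public static void Main()
    {
        string data = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><a/>";

        // ASCII cannot represent U+FEFF, so the encoder substitutes '?' (0x3F).
        byte[] raw = Encoding.ASCII.GetBytes(data);
        Console.WriteLine((char)raw[0]); // ?

        byte[] clean = Encoding.ASCII.GetBytes(StripBom(data));
        Console.WriteLine((char)clean[0]); // <
    }
}
```

Unlike searching for '<', this leaves the string untouched when there is no BOM and does not depend on what the first real character is.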


Tom Blodget · Accepted Answer · 2016-02-13 02:10:44Z

There are several things wrong here. One is not showing the relevant code.

Nonetheless, if you use valid methods to read text from a UTF-8, UTF-32, etc. file, you won't have a BOM in your string, because the string will hold the text and the BOM is not part of the text.
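To illustrate the difference, here is a small sketch that writes a temporary UTF-8 file with a BOM and reads it back two ways. The class name `ReadDemo` and the sample content are made up for the example; the asker's actual reading code is not shown in the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadDemo
{
    // Writes a UTF-8 file that starts with a BOM and returns its path.
    public static string WriteSampleFile()
    {
        string path = Path.GetTempFileName();
        // new UTF8Encoding(true) emits the EF BB BF preamble when writing.
        File.WriteAllText(path, "<a/>", new UTF8Encoding(true));
        return path;
    }

    public static void Main()
    {
        string path = WriteSampleFile();
        try
        {
            // Text API: the BOM is consumed while decoding.
            string viaText = File.ReadAllText(path);
            // Raw bytes decoded by hand: the BOM survives as U+FEFF.
            string viaBytes = Encoding.UTF8.GetString(File.ReadAllBytes(path));

            Console.WriteLine(viaText.Length);  // 4
            Console.WriteLine(viaBytes.Length); // 5
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The extra character in the second string is the one that ASCII encoding later turns into "?".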

On the other hand, if you are reading an XML file, it is not a "text" file. You should use an XML reader. That would take care to use the encoding that is (most likely) indicated in the file.

And, when you write an XML file (which I presume you'll be doing with the byte array), you should use an XML writer. That would take care to use the encoding you specify and write it into the file.
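A rough sketch of that approach using `XDocument` and `XmlWriter`, going through a byte array as in the question. The names `XmlRoundTrip`, `ToBytes` and `FromBytes` are hypothetical, and the element is simplified to `ReceivedPayment` without the `ns0` prefix because the question does not show the namespace URI.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlRoundTrip
{
    // Serializes the document to bytes; the XML writer emits a declaration
    // that names the encoding given in the settings.
    public static byte[] ToBytes(XDocument doc, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    // Parses bytes; the XML reader works out the encoding from the BOM or declaration.
    public static XDocument FromBytes(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        {
            return XDocument.Load(stream);
        }
    }

    public static void Main()
    {
        var doc = new XDocument(
            new XElement("ReceivedPayment", new XAttribute("Amount", "1.01")));

        byte[] buffer = ToBytes(doc, new UTF8Encoding(false)); // UTF-8 without a BOM
        XDocument parsed = FromBytes(buffer);

        Console.WriteLine(parsed.Root.Attribute("Amount").Value); // 1.01
    }
}
```

Passing `new UTF8Encoding(false)` keeps the output free of a BOM, while `FromBytes` should accept input with or without one.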

Keep in mind, though, that conversion from Unicode (for which UTF-8 is one encoding) to some other character set can silently corrupt your data by substituting a replacement character (typically '?') for characters that are not in the target character set.
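A short sketch of that silent substitution with ASCII as the target, plus one way to make it fail loudly instead. `LossyDemo` and its method names are hypothetical; the strict variant relies on the `Encoding.GetEncoding` overload that accepts fallback objects.

```csharp
using System;
using System.Text;

public static class LossyDemo
{
    // Encodes to ASCII and decodes back; unrepresentable characters become '?'.
    public static string RoundTripAscii(string text)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));
    }

    // Same conversion, but throws instead of silently substituting '?'.
    public static byte[] ToAsciiStrict(string text)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        return strict.GetBytes(text);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripAscii("caf\u00E9")); // caf?

        try
        {
            ToAsciiStrict("caf\u00E9");
        }
        catch (EncoderFallbackException)
        {
            Console.WriteLine("not representable in ASCII");
        }
    }
}
```

Using the exception fallback during development can surface hidden characters early rather than leaving a stray "?" in the output.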

jhilden · Accepted Answer · 2016-02-12 19:03:47Z

-1

Here is my extension method:

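    using System;
    // Extension methods must be declared in a static class.
    public static class StringExtensions
    {
        // Copies the string's raw UTF-16 code units (2 bytes per char); no encoding conversion.
        public static byte[] ToByteArray(this string str)
        {
            var bytes = new byte[str.Length * sizeof(char)];
            Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
            return bytes;
        }
    }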


So is it normal that converting a string to a byte array adds a "?" to the string?
– user1104946
Commented Feb 12, 2016 at 19:05
