To my understanding, when I write a file using Notepad++, I can type the symbols ’ and & into a text file without a problem. Both are valid ASCII symbols, and they are not that exotic either: & = 38 decimal (26 hex), ’ = 44 decimal (2C hex).
I try to write both out with StreamWriter (.NET Core). I used various text encodings, but somehow it fails: depending on the encoding, one of them gets turned into \uXXXX. Is there an encoding type that works for both?
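To check the numbers above, here is a small sketch (not part of the original question) that prints each character's Unicode code point and the bytes .NET produces for it with Encoding.UTF8 and Encoding.ASCII. It suggests that ’ is U+2019 RIGHT SINGLE QUOTATION MARK, which takes three bytes in UTF-8, while 44/0x2C is the comma; Encoding.ASCII cannot represent ’ at all and substitutes '?' (0x3F).

```csharp
using System;
using System.Text;

// "\u2019" is the C# escape for ’ (RIGHT SINGLE QUOTATION MARK); using the escape
// keeps this sketch independent of the encoding the .cs file itself is saved in.
string[] samples = { "&", "\u2019", "," };

foreach (string s in samples)
{
    int codePoint = char.ConvertToUtf32(s, 0);                          // Unicode code point
    string utf8 = BitConverter.ToString(Encoding.UTF8.GetBytes(s));     // bytes written to a UTF-8 file
    string ascii = BitConverter.ToString(Encoding.ASCII.GetBytes(s));   // ASCII replaces unmappable chars with '?' (3F)
    Console.WriteLine($"{s}  U+{codePoint:X4}  UTF-8: {utf8}  ASCII: {ascii}");
}
```

The ’ row should report U+2019, UTF-8 bytes E2-80-99 and ASCII 3F, while the comma row reports 2C for both encodings.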
My code:
```csharp
filedata = "& Test ’ ";
// filedata = filedata.Replace("\\u0026", "&"); // extra, should not be needed for this test
// filedata = filedata.Replace("\\u2019", "’"); // extra, should not be needed for this test

// Write the updated content back to the file with exclusive access
using (var fileStream = new FileStream(filePath, FileMode.Truncate, FileAccess.Write, FileShare.None))
{
    // I tried various encodings here as well: Encoding.ASCII, Encoding.UTF8, etc.
    using (var writer = new StreamWriter(fileStream, new UTF8Encoding(true)))
    {
        await writer.WriteAsync(filedata);
    }
}
```
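For reproducing the problem, here is a trimmed-down, self-contained version of the write above. It is a sketch: the temp-file path and FileMode.Create are substitutions so it runs on its own. It writes the same string with new UTF8Encoding(true) and then dumps the raw bytes. If StreamWriter escaped anything, the dump would contain 5C-75 (the characters "\u"); instead ’ should appear as E2-80-99 after the EF-BB-BF byte order mark.

```csharp
using System;
using System.IO;
using System.Text;

// Assumption: a temp file stands in for the question's filePath.
string filePath = Path.Combine(Path.GetTempPath(), "streamwriter-test.txt");
// "\u2019" is a compile-time escape for ’: at run time the string holds one U+2019 char, not six characters.
string filedata = "& Test \u2019 ";

// FileMode.Create (instead of Truncate) so the sketch also works when the file does not exist yet.
using (var fileStream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None))
using (var writer = new StreamWriter(fileStream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true)))
{
    await writer.WriteAsync(filedata);
}

// Dump the raw bytes instead of trusting a text viewer.
byte[] bytes = File.ReadAllBytes(filePath);
Console.WriteLine(BitConverter.ToString(bytes));
```

This prints EF-BB-BF-26-20-54-65-73-74-20-E2-80-99-20: the BOM, then &, " Test ", ’ as three bytes, and the trailing space, with no backslash anywhere.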
The other application needs it; I think it's valid UTF-8 with BOM, which should not translate these symbols.
But maybe it's something else: Notepad++ shows me the \uXXXX in clear text when I write the file with C#,
while if I type the symbols in Notepad++ and open the file, I don't see the \uXXXX.
I trust Notepad++ a bit more.
Notably, over the wire it all goes fine. It's the file saving that is causing me a headache, literally.
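The commented-out Replace("\\u0026", ...) and Replace("\\u2019", ...) lines suggest the string may already contain literal \u0026 and \u2019 sequences before it ever reaches StreamWriter. One common source of exactly those sequences is System.Text.Json: its default encoder escapes HTML-sensitive characters such as & and every non-ASCII character. The sketch below assumes, without confirmation from the question, that filedata passed through JsonSerializer at some point. It shows the default output, the relaxed encoder (which, as its name says, is only appropriate when the JSON is not embedded in HTML), and that deserializing restores the original characters.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

string filedata = "& Test \u2019 ";   // the question's string; \u2019 is ’

// Default System.Text.Json settings: & becomes \u0026 and ’ becomes \u2019 in the JSON text.
string escaped = JsonSerializer.Serialize(filedata);
Console.WriteLine(escaped);

// UnsafeRelaxedJsonEscaping does not escape HTML-sensitive characters such as &.
var relaxed = new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
string relaxedJson = JsonSerializer.Serialize(filedata, relaxed);
Console.WriteLine(relaxedJson);

// Deserializing the escaped JSON gives back the original characters, not the escapes.
string roundTripped = JsonSerializer.Deserialize<string>(escaped)!;
Console.WriteLine(roundTripped == filedata);
```

If this is the cause, the usual fix is to write the deserialized string (or serialize with the relaxed encoder) rather than post-processing the escapes with Replace.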
Comments:

… Encoding at all? I've never personally seen \u escaping happen when doing simple file I/O unless I was using JSON-related APIs.

I can't reproduce any escaping of character 38 (&), see dotnetfiddle.net/ikBSxO. I suspect the escaping you are seeing occurs because your viewer is somehow escaping the character, because you are transmitting the file over HTML, which introduces the escaping, or because you are working on some unusual .NET platform or version. Please edit your question to share a minimal reproducible example.

’ most certainly is not a simple character; in UTF-8 it is 3 bytes: E2-80-99. 44/0x2C is ',', i.e. a comma. Also, StreamWriter doesn't do \u escaping; it just writes the bytes. I think that \u is coming from whatever you're using to view the data. Honestly: when inspecting payloads, you need to look at the bytes with a hex viewer; nothing else will be useful.
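Following the hex-viewer advice, here is a rough sketch of a dump routine for when no hex viewer is at hand. HexDump is a hypothetical helper written for this example, not a framework API, and the demo file path is made up. It prints each 16-byte row as an offset, the bytes in hex, and the printable ASCII characters, so no text viewer gets a chance to reinterpret anything.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo file; point this at the file the other application reads instead.
string path = Path.Combine(Path.GetTempPath(), "hexdump-demo.txt");
File.WriteAllText(path, "& Test \u2019 ", new UTF8Encoding(true));

Console.Write(HexDump(File.ReadAllBytes(path)));

// Hex-viewer style dump: offset, up to 16 bytes in hex, then the printable ASCII characters
// (anything outside 0x20..0x7E is shown as '.').
static string HexDump(byte[] data)
{
    var sb = new StringBuilder();
    for (int offset = 0; offset < data.Length; offset += 16)
    {
        int count = Math.Min(16, data.Length - offset);
        sb.Append($"{offset:X8}  ");
        for (int i = 0; i < 16; i++)
            sb.Append(i < count ? $"{data[offset + i]:X2} " : "   ");
        sb.Append(' ');
        for (int i = 0; i < count; i++)
        {
            byte b = data[offset + i];
            sb.Append(b >= 0x20 && b < 0x7F ? (char)b : '.');
        }
        sb.AppendLine();
    }
    return sb.ToString();
}
```

For the question's string the dump should start with the BOM (EF BB BF), then 26 for & and E2 80 99 for ’. A literal \u2019 written into the file would instead show up as 5C 75 32 30 31 39.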