UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

Question

I am creating an XML file in Python, and one field in the XML holds the contents of a text file. I do it like this:

import xml.etree.ElementTree as ET

f = open('myText.txt', "r")
data = f.read()
f.close()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml")

Then I get the UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script, but I still got the error. I also tried forcing the encoding of my variable with data.encode('utf-8'), but that failed too. I know this issue is very common, but none of the solutions from other questions worked for me.

UPDATE

Traceback: Using only the special comment on the first line of the script

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 151, in <module>
    tree.write("D:\\python\\lse\\xmls\\" + items[ctr][0] + ".xml")
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)

Traceback: Using .encode('utf-8')

Traceback (most recent call last):
  File "D:\Python\lse\createxml.py", line 148, in <module>
    field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)

With .decode('utf-8') the error went away and my XML file was created successfully. The problem now is that the XML is not viewable in my browser.

It would be useful to see the entire error message to see where it's coming from. In the meantime try using decode instead of encode. — Mark Ransom, Commented May 12, 2013 at 14:49
Updated, it successfully created my XML when I use decode, but the file is not viewable on my browser. — kagat-kagat, Commented May 12, 2013 at 15:00
Note that using # -*- coding: utf-8 -*- serves only to insert non ASCII characters in the python sources. It doesn't affect encoding/decoding of strings in any way. Also, if the file myText.txt isn't ASCII you should use codecs.open and provide the right encoding: codecs.open('myText.txt', 'r', 'utf-8'). — Bakuriu, Commented May 12, 2013 at 15:17
Additionally, you should add an encoding to tree.write if your text is not just ASCII (see also the docs) — Thomas Fenzl, Commented May 12, 2013 at 15:40
Might have been a non-breaking space. Just saying. Option + Space on Mac. 0xC2 0xA0 in UTF-8. — superlukas, Commented Mar 3, 2015 at 23:13
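To illustrate the last comment: a non-breaking space (U+00A0) encodes to the two bytes 0xC2 0xA0 in UTF-8, which matches the byte named in the error message. The sketch below checks that, and adds a small helper for locating the first offending byte in raw data. The helper name first_non_ascii and the sample bytes are made up for illustration and are not part of any library.

```python
# A non-breaking space is two bytes in UTF-8: 0xC2 0xA0.
nbsp_bytes = u"\u00a0".encode("utf-8")


def first_non_ascii(raw):
    """Return (offset, byte value) of the first byte >= 0x80, or None.

    Hypothetical helper: 'raw' is expected to be a byte string.
    """
    # bytearray() yields integers on both Python 2 and Python 3.
    for offset, value in enumerate(bytearray(raw)):
        if value >= 0x80:
            return offset, value
    return None


sample = b"ab" + nbsp_bytes + b"cd"  # made-up sample data
hit = first_non_ascii(sample)
```

The offset returned by such a helper can be compared with the "position" reported in the traceback to find which character in the input file triggers the error.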

uhbif19 · Accepted Answer · 2013-05-12 15:33:17Z

69

You need to decode the data from the input byte string into unicode before using it, to avoid encoding problems.

field.text = data.decode("utf8")
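As a minimal sketch of what the decode step does, assuming the input really is UTF-8 (the sample bytes below are invented for illustration): decoding turns a sequence of bytes into a sequence of characters, so multi-byte characters collapse to a single item.

```python
# Bytes as they might come back from reading a UTF-8 file (made-up sample).
raw = b"caf\xc3\xa9\xc2\xa0end"

# Decode bytes -> Unicode text, naming the encoding explicitly.
text = raw.decode("utf-8")

byte_count = len(raw)    # counts bytes
char_count = len(text)   # counts characters
```

Here "é" and the non-breaking space each occupy two bytes but count as one character after decoding, which is why the two lengths differ.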

answered May 12, 2013 at 15:33


the · Accepted Answer · 2019-12-12 11:33:58Z

12

I was running into a similar error in pywikipediabot. The .decode method is a step in the right direction but for me it didn't work without adding 'ignore':

ignore_encoding = lambda s: s.decode('utf8', 'ignore')

Ignoring encoding errors can lead to data loss or produce incorrect output. But if you just want to get it done and the details aren't very important this can be a good way to move faster.
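A short sketch of that trade-off, using made-up sample bytes that are valid Latin-1 but not valid UTF-8. It compares the 'ignore' and 'replace' error handlers with decoding under the encoding the data was actually written in (assumed here to be Latin-1).

```python
# 0xA3 is the pound sign in Latin-1, but an invalid byte sequence in UTF-8.
raw = b"price: \xa3 5"  # made-up sample data

ignored = raw.decode("utf-8", "ignore")    # silently drops the bad byte
replaced = raw.decode("utf-8", "replace")  # substitutes U+FFFD so the loss is visible
correct = raw.decode("latin-1")            # right answer if the data is Latin-1
```

With 'ignore' the pound sign disappears without a trace, while 'replace' at least leaves a marker where data was lost. Decoding with the correct encoding is the only variant that keeps the original character.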

edited Dec 12, 2019 at 11:33

answered Dec 25, 2013 at 3:32


11

Do note that ignoring encoding errors will potentially lose data, or produce incorrect output. – tripleee, Commented Feb 1, 2015 at 6:55


Alastair McCormack · Accepted Answer · 2016-05-08 07:08:02Z

Python 2

The error occurs because ElementTree does not expect to find non-ASCII byte strings in the XML tree when it writes it out. In Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, which is why the failure surfaces as a UnicodeDecodeError. Use Unicode strings for non-ASCII text instead. You can create one either with the u prefix on a literal, e.g. u'€', or by decoding a byte string with mystr.decode('utf-8'), using the appropriate encoding.
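The implicit step can be made visible by performing the ASCII decode explicitly. This is only a sketch with made-up sample bytes, but it raises the same kind of error as in the question's traceback and runs on both Python 2 and Python 3.

```python
raw = b"data \xc2\xa0"  # made-up bytes containing a UTF-8 non-breaking space

message = None
try:
    # Roughly what Python 2 does behind the scenes before encoding a byte string.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
```

The captured message names the ascii codec and the byte 0xc2, just like the original error.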

The best practice is to decode all text data as it's read, rather than decoding mid-program. The io module provides an open() function that decodes text data to Unicode strings as it's read.
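A small sketch of the difference, assuming a UTF-8 file. It writes a throwaway temporary file (the path and contents are invented for the example), then reads it back once as raw bytes and once through io.open() with an explicit encoding.

```python
import io
import os
import tempfile

# Create a small UTF-8 file to read back (made-up sample content).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"data\u20ac")

    # Binary mode: you get the encoded bytes and must decode them yourself.
    with io.open(path, "rb") as f:
        raw = f.read()

    # Text mode with an encoding: decoding happens as the file is read.
    with io.open(path, "r", encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)
```

Reading in text mode keeps the decoding at the edge of the program, so everything downstream only ever sees Unicode strings.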

ElementTree will be much happier with Unicode strings and will encode them correctly when you call the tree's write() method.

Also, for best compatibility and readability, make sure ElementTree encodes to UTF-8 during write() and adds the XML declaration.

Presuming your input file is UTF-8 encoded (0xC2 is a common UTF-8 lead byte), putting everything together and using the with statement, your code should look like:

import io
import xml.etree.ElementTree as ET

with io.open('myText.txt', "r", encoding='utf-8') as f:
    data = f.read()

root = ET.Element("add")
doc = ET.SubElement(root, "doc")

field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data

tree = ET.ElementTree(root)
tree.write("output.xml", encoding='utf-8', xml_declaration=True)

Output:

<?xml version='1.0' encoding='utf-8'?>
<add><doc><field name="text">data€</field></doc></add>
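One way to check the result without relying on a browser is to read the file back. The sketch below is an illustration under assumptions, not part of the original answer: it writes the same structure to a temporary file (made-up path and text), confirms the XML declaration is present in the raw bytes, and parses the file again to recover the text.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = u"data\u20ac"  # Unicode text containing a euro sign

# Throwaway temporary file, used only for this illustration.
fd, path = tempfile.mkstemp(suffix=".xml")
os.close(fd)
try:
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

    # Inspect the bytes that actually landed on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Parse the file again to confirm the text round-trips.
    parsed_text = ET.parse(path).getroot().find("doc/field").text
finally:
    os.remove(path)

has_declaration = raw.startswith(b"<?xml")
```

If the declaration is present and the parsed text matches what was written, the file is well-formed UTF-8 XML, and any remaining display problem is more likely on the viewer's side.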

Ankit Kumar Rathod · Accepted Answer · 2016-11-21 09:24:37Z

1

Try adding these two lines at the very start of your Python file:

#!/usr/bin/python
# encoding=utf8

answered Nov 21, 2016 at 9:24

