I am creating an XML file in Python, and one field in the XML holds the contents of a text file. I do it like this:
import xml.etree.ElementTree as ET
f = open('myText.txt', 'r')
data = f.read()
f.close()
root = ET.Element("add")
doc = ET.SubElement(root, "doc")
field = ET.SubElement(doc, "field")
field.set("name", "text")
field.text = data
tree = ET.ElementTree(root)
tree.write("output.xml")
And then I get a UnicodeDecodeError. I already tried putting the special comment # -*- coding: utf-8 -*- at the top of my script, but I still got the error. I also tried to enforce the encoding of my variable with data.encode('utf-8'), but still got the error. I know this issue is very common, but none of the solutions from other questions worked for me.
UPDATE
Traceback: Using only the special comment on the first line of the script
Traceback (most recent call last):
File "D:\Python\lse\createxml.py", line 151, in <module>
tree.write("D:\\python\\lse\\xmls\\" + items[ctr][0] + ".xml")
File "C:\Python27\lib\xml\etree\ElementTree.py", line 820, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 937, in _serialize_xml
write(_escape_cdata(text, encoding))
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1073, in _escape_cdata
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 243: ordinal not in range(128)
Traceback: Using .encode('utf-8')
Traceback (most recent call last):
File "D:\Python\lse\createxml.py", line 148, in <module>
field.text = data.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 227: ordinal not in range(128)
I used .decode('utf-8') and the error message didn't appear; my XML file was created successfully. But the problem is that the XML is not viewable in my browser.
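Both tracebacks are consistent with data being a byte string rather than Unicode text: in Python 2, calling .encode on a str first decodes it implicitly with the ASCII codec, and that hidden step is what fails on byte 0xc2. The sketch below reproduces that step explicitly so it behaves the same on Python 2 and 3. The sample bytes and the helper name try_ascii_decode are made up for illustration; they are not taken from the original script.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample data: UTF-8 bytes containing a non-breaking space,
# which is encoded as the two bytes 0xc2 0xa0.
raw = b"price:\xc2\xa0100"


def try_ascii_decode(data):
    """Return the error message from decoding bytes as ASCII, or None on success."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Decoding as ASCII fails on the first byte >= 0x80, like the implicit
# decode that Python 2 performs before .encode() on a byte string.
ascii_error = try_ascii_decode(raw)

# Decoding explicitly with the real encoding yields Unicode text.
text = raw.decode("utf-8")
```

This is why .decode('utf-8') made the error go away: it turns the bytes into text before ElementTree has to serialize them.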
Use decode instead of encode.
I tried .decode, but the file is not viewable in my browser.
# -*- coding: utf-8 -*- only declares the encoding of the script itself, so that you can put non-ASCII characters in the Python source. It doesn't affect the encoding or decoding of strings in any way. Also, if the file myText.txt isn't ASCII, you should use codecs.open and provide the right encoding: codecs.open('myText.txt', 'r', 'utf-8'). You should also pass an encoding to tree.write if your text is not just ASCII (see also the docs).
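Putting those suggestions together, a minimal sketch might look like the following. It assumes the text file is UTF-8 (the real encoding of myText.txt is not stated in the question), the function name build_xml and the temporary demo file are hypothetical, and whether this also fixes the browser display is not something the thread confirms.

```python
import codecs
import os
import tempfile
import xml.etree.ElementTree as ET


def build_xml(text_path, xml_path, source_encoding="utf-8"):
    """Read a text file as Unicode and write it into an XML field."""
    # codecs.open decodes while reading, so data is Unicode text, not bytes.
    # source_encoding is an assumption; use the file's actual encoding.
    with codecs.open(text_path, "r", source_encoding) as f:
        data = f.read()

    root = ET.Element("add")
    doc = ET.SubElement(root, "doc")
    field = ET.SubElement(doc, "field")
    field.set("name", "text")
    field.text = data

    # State the output encoding explicitly and emit an XML declaration
    # so that readers of the file know how to decode it.
    tree = ET.ElementTree(root)
    tree.write(xml_path, encoding="utf-8", xml_declaration=True)


# Demo with a throwaway input file containing a non-ASCII character.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "myText.txt")
xml_path = os.path.join(workdir, "output.xml")
with codecs.open(text_path, "w", "utf-8") as f:
    f.write(u"price:\u00a0100")

build_xml(text_path, xml_path)
```

On Python 3, the built-in open(text_path, encoding="utf-8") does the same job as codecs.open here.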