Python Writing Weird Unicode to CSV

Question

I'm attempting to extract article information using the Python newspaper3k package and then write it to a CSV file. While the info is downloaded correctly, I'm having issues with the output to CSV. I don't think I fully understand Unicode, despite my efforts to read about it.

from newspaper import Article, Source
import csv

first_article = Article(url="http://www.bloomberg.com/news/articles/2016-09-07/asian-stock-futures-deviate-as-s-p-500-ends-flat-crude-tops-46")

first_article.download()
if first_article.is_downloaded:
    first_article.parse()
    first_article.nlp()  # nlp is a method; it must be called to populate keywords and summary

article_array = []
collate = {}

collate['title'] = first_article.title
collate['content'] = first_article.text
collate['keywords'] = first_article.keywords
collate['url'] = first_article.url
collate['summary'] = first_article.summary
print(collate['content'])
article_array.append(collate)

keys = article_array[0].keys()
# The with block closes the file on exit, so no explicit close() is needed.
with open('bloombergtest.csv', 'w') as output_file:
    csv_writer = csv.DictWriter(output_file, keys)
    csv_writer.writeheader()
    csv_writer.writerows(article_array)

When I print collate['content'], which is first_article.text, the console outputs the article's content just fine. Everything shows up correctly, apostrophes and all. When I write to the CSV, however, the text in the content cell has odd characters in it. For example:

â€œAt the end of the day, Europeâ€™s economy isnâ€™t in great shape, inflation doesnâ€™t look exciting and there are a bunch of political risks to reckon with.

So far I have tried:

with open('bloombergtest.csv', 'w', encoding='utf-8') as output_file:

to no avail. I also tried utf-16 instead of utf-8, but that just resulted in the cells being written in an odd order: the cells weren't created correctly in the CSV, although the text itself looked correct. I've also tried calling .encode('utf-8') on various variables, but nothing has worked.

What's going on? Why would the console print the text correctly, while the CSV file has odd characters? How can I fix this?

Mark Tolonen · Accepted Answer · 2020-02-12 05:49:14Z

11

Add encoding='utf-8-sig' to open(). Excel requires the UTF-8-encoded BOM code point (Byte Order Mark, U+FEFF) signature to interpret a file as UTF-8; otherwise, it assumes the default localized encoding.
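A minimal sketch of that change, assuming a hypothetical sample row in place of the newspaper3k article (the file name bom_example.csv and the rows list are made up for illustration). It writes the CSV with utf-8-sig and then checks that the file really starts with the BOM bytes:

```python
import csv

# Hypothetical sample row standing in for the parsed article fields.
rows = [{
    "title": "Example",
    "content": "\u201cEurope\u2019s economy isn\u2019t in great shape\u201d",
}]

# 'utf-8-sig' writes the BOM (bytes EF BB BF) before the UTF-8 data.
# newline='' is what the csv module documentation recommends for files it writes.
with open("bom_example.csv", "w", encoding="utf-8-sig", newline="") as output_file:
    writer = csv.DictWriter(output_file, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Read the raw bytes back and confirm the file begins with the UTF-8 BOM.
with open("bom_example.csv", "rb") as f:
    raw = f.read()
has_bom = raw.startswith(b"\xef\xbb\xbf")
```

When reading such a file back in Python, opening it with encoding='utf-8-sig' strips the BOM again, so it does not end up in the first header name.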

edited Feb 12, 2020 at 5:49

answered Sep 10, 2016 at 15:05

Mark Tolonen

sirryankennedy · Accepted Answer · 2016-09-10 18:40:17Z

7

Changing with open('bloombergtest.csv', 'w', encoding='utf-8') as output_file: to with open('bloombergtest.csv', 'w', encoding='utf-8-sig') as output_file: worked, as recommended by Leon and Mark Tolonen.

answered Sep 10, 2016 at 18:40

sirryankennedy

Community · Accepted Answer · 2017-05-23 12:32:47Z

4

That's most probably a problem with the software that you use to open or print the CSV file: it doesn't "understand" that the CSV is encoded in UTF-8 and assumes ASCII, Latin-1 (ISO-8859-1), Windows-1252, or a similar single-byte encoding for it.

You can help that software recognize the CSV file's encoding by placing a BOM sequence at the beginning of your file (which, in general, is not recommended for UTF-8).
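To illustrate the misinterpretation, here is a small sketch that reproduces the odd characters from the question. It assumes the reading software decodes the bytes as Windows-1252 (cp1252), which matches the garbled output shown but is not confirmed by the question; the sample string is a shortened, hypothetical stand-in for the article text:

```python
# Curly quotes and apostrophes like those in the article text.
text = "\u201cEurope\u2019s economy isn\u2019t in great shape"

# The file on disk holds the UTF-8 bytes of the text.
utf8_bytes = text.encode("utf-8")

# A reader that assumes Windows-1252 turns each 3-byte UTF-8 sequence
# into three separate characters.
garbled = utf8_bytes.decode("cp1252")
# garbled is now: â€œEuropeâ€™s economy isnâ€™t in great shape

# The data itself is intact: reversing the wrong decoding recovers the text.
recovered = garbled.encode("cp1252").decode("utf-8")

# With the 'utf-8-sig' codec, the encoded bytes start with the BOM.
with_bom = text.encode("utf-8-sig")
starts_with_bom = with_bom.startswith(b"\xef\xbb\xbf")
```

This also explains why the console output looked fine: the string in memory was correct all along, and only the program opening the file guessed the wrong encoding.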

edited May 23, 2017 at 12:32

CommunityBot

answered Sep 10, 2016 at 6:59

Leon

1

I'm opening it in excel. Is there no way to write universal characters?
– sirryankennedy
Commented Sep 10, 2016 at 12:01
@sirryankennedy Have you tried writing UTF-8 with BOM (as shown in the linked answer)?
– Leon
Commented Sep 10, 2016 at 13:13
@sirryankennedy: there is no "universal" encoding. Even plain ASCII is not "universal". If you want to use a one-byte encoding, convert to one that contains your curly quotes, such as Windows-1252.
– Jongware
Commented Sep 10, 2016 at 15:27
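
The one-byte alternative mentioned in the last comment could look like the sketch below. It assumes the program opening the file defaults to Windows-1252; the file name cp1252_example.csv and the sample rows are hypothetical:

```python
import csv

# Hypothetical sample data containing curly quotes and an apostrophe.
rows = [
    ["title", "content"],
    ["Example", "\u201cEurope\u2019s economy\u201d"],
]

# cp1252 has single-byte codes for curly quotes. errors='replace' writes '?'
# for any character cp1252 cannot represent instead of raising an error.
with open("cp1252_example.csv", "w", encoding="cp1252",
          errors="replace", newline="") as f:
    csv.writer(f).writerows(rows)

# Each curly quote is now stored as a single byte (0x93, 0x92, 0x94).
with open("cp1252_example.csv", "rb") as f:
    raw = f.read()
```

The trade-off is that any character outside Windows-1252 is lost, so this only suits text known to fit that encoding.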