3

I want to read a JSON file containing Cyrillic symbols.

The Cyrillic symbols are stored as Unicode escape sequences of the form \uXXXX.

Python gives me '\\uXXXX' (with a doubled backslash) instead of the Cyrillic symbol.

For example, the string "\u0420\u0435\u0433\u0438\u043e\u043d" should become "Регион", but becomes "\\u0420\\u0435\\u0433\\u0438\\u043e\\u043d".

Calling encode() just makes the string look like u"..." or adds another backslash.

How do I convert "\u0420\u0435\u0433\u0438\u043e\u043d" to "Регион"?

  • Can you clarify? Those files are proper JSON. Commented Oct 15, 2016 at 3:34
  • JSON string: "\u0420\u0435\u0433\u0438\u043e\u043d"; desired: "Регион"; actual: "\\u0420\\u0435\\u0433\\u0438\\u043e\\u043d" Commented Oct 15, 2016 at 3:44
  • After you've deserialized the JSON? Commented Oct 15, 2016 at 3:47

2 Answers

7

If you want json to output a string that contains non-ASCII characters, you need to pass ensure_ascii=False and then encode the result manually afterward.
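As a minimal sketch of that suggestion (the sample dictionary and its "region" key are made up for illustration, and UTF-8 is assumed as the target encoding), the difference between the default and ensure_ascii=False looks roughly like this:

```python
import json

# Hypothetical sample data; the key name is only for illustration.
data = {"region": "\u0420\u0435\u0433\u0438\u043e\u043d"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(data)

# With ensure_ascii=False the Cyrillic characters are kept as they are.
readable = json.dumps(data, ensure_ascii=False)

# "Encode manually afterward": turn the text into bytes (UTF-8 assumed here).
encoded = readable.encode("utf-8")

print(escaped)
print(readable)
```

Both forms are valid JSON and load back to the same Python object; the option only changes how the text looks when it is written out.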

0

Just use the json module.

import json

s = "\u0420\u0435\u0433\u0438\u043e\u043d"

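# Write a JSON file. json.dump escapes non-ASCII characters by default.
with open('test.json', 'w', encoding='ascii') as f:
    json.dump(s, f)

# Read the raw file contents: the \uXXXX escapes are still there.
with open('test.json') as f:
    print(f.read())

# Read with the json module: the escapes are decoded back to Cyrillic.
with open('test.json', encoding='ascii') as f:
    data = json.load(f)
print(data)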

Output:

"\u0420\u0435\u0433\u0438\u043e\u043d"
Регион
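If the string you already hold contains literal backslashes (the "\\u0420..." case from the question), one possible approach is to treat it as the body of a JSON string and let json.loads decode it. This is only a sketch: it assumes the text contains no unescaped double quotes or stray backslashes, and the variable names are illustrative.

```python
import json

# A raw string: each \uXXXX here is six literal characters, not one symbol.
raw = r"\u0420\u0435\u0433\u0438\u043e\u043d"

# Wrap it in double quotes so it forms a JSON string literal, then parse it.
decoded = json.loads('"' + raw + '"')

print(len(raw))      # length of the escaped text
print(decoded)       # the decoded Cyrillic text
```

If the doubled backslashes only show up when you inspect the value with repr() or in the interactive prompt, the data may already be fine, and print() will show what the string actually contains.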
