I got this data returned from an API: b'\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e'. I know for sure that the data is in Russian. I am guessing these values are the Unicode representation of the Cyrillic letters?

The data returned was a byte array.

How can I convert that into a readable Cyrillic string? Basically, I need a way to convert data like this into human-readable text.

EDIT: Yes this is JSON data. Forgot to mention, sorry.

  • Most likely you have JSON data. Commented May 27, 2014 at 18:08
  • Oh yes, forgot to mention it is JSON data. Commented May 27, 2014 at 18:09

1 Answer


Chances are you have JSON data; JSON uses \uhhhh escape sequences to represent Unicode codepoints. Use the json.loads() function on unicode (decoded) data to produce a Python string:

import json

# data is the bytes object returned by the API; decode it to text, then parse the JSON
string = json.loads(data.decode('utf8'))

UTF-8 is the default JSON encoding; check your response headers (if you are using an HTTP-based API) to see if a different encoding was used.
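
If the API does declare a charset, one way to honour it is to read the charset parameter from the Content-Type header and fall back to UTF-8 when none is given. This is only a sketch: decode_json_body is a hypothetical helper name, and it assumes you already have the raw body bytes and the header value as a string from whatever HTTP client you use.

```python
import json
from email.message import Message


def decode_json_body(body: bytes, content_type: str = "application/json"):
    """Decode a JSON body using the charset named in Content-Type.

    Hypothetical helper: falls back to UTF-8 when no charset is declared.
    """
    # Let the standard library parse the header parameters for us.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or "utf-8"
    # Decode the bytes to text first, then parse the JSON text.
    return json.loads(body.decode(charset))


# No charset declared, so the body is treated as UTF-8.
name = decode_json_body(b'"\\u041a\\u0435\\u0439"')

# Charset declared in the header, so that encoding is used instead.
utf16_body = '"\u041a\u0435\u0439"'.encode("utf-16")
same_name = decode_json_body(utf16_body, "application/json; charset=utf-16")
```

Both calls should give the same three-letter string; only the byte-level encoding of the body differs.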

Demo:

>>> import json
>>> json.loads(b'"\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d\\u043f\\u0440\\u043e"'.decode('utf8'))
'Кейтлинпро'
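
To see that the \uhhhh sequences are nothing more than JSON's ASCII-safe spelling of the same characters, here is a small round trip. The "name" key and the sample object are made up for illustration; your API's structure will differ.

```python
import json

# Hypothetical response body: a JSON object whose value uses \uhhhh escapes.
raw = b'{"name": "\\u041a\\u0435\\u0439\\u0442\\u043b\\u0438\\u043d"}'

# Decode the bytes to text, then parse; the escapes become real characters.
record = json.loads(raw.decode("utf-8"))
print(record["name"])

# By default json.dumps() escapes non-ASCII characters again...
escaped = json.dumps(record)

# ...while ensure_ascii=False keeps the Cyrillic text readable.
readable = json.dumps(record, ensure_ascii=False)
print(readable)
```

So there is nothing special to do for non-ASCII characters: parsing the JSON handles the escapes. On Python 3.6 and later, json.loads() also accepts bytes directly, so the explicit .decode() step is optional there.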
  • Ahh, wonderful. I understand. I was getting a little freaked out, thinking there was some unique way to handle non-ASCII chars. Commented May 27, 2014 at 18:12
