0

I have issue with SQL Server and JSON :/

There are a lot of JSON strings coming from a remote database (which I only have read permission). The problem is that all of them are parsed into unicodes and I don't know how to turn them into UTF-8 !

Take a look at this JSON for example:

{
   "0":{
      "id":1,
      "name":"\\u062a\\u0646"
   },
   "1":{
      "id":2,
      "name":"\\u0628\\u0633\\u062a\\u0647 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "2":{
      "id":3,
      "name":"\\u0639\\u062f\\u062f 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "5":{
      "id":6,
      "name":"\\u0639\\u062f\\u062f 1000\\u062a\\u0627\\u06cc\\u06cc"
   },
   "6":{
      "id":7,
      "name":"\\u06a9\\u06cc\\u0644\\u0648\\u06af\\u0631\\u0645"
   }
}

As you can see field name is parsed into unicode. I did this to extract data into a table:

SELECT Units.*
FROM OPENJSON(@Json) AS i
CROSS APPLY OPENJSON(i.[value]) WITH (
   [Id] INT '$.id',
   [Name] NVARCHAR(MAX) '$.name'
) AS Units

It will then show something like this as result:

  • Id: 1, Name: \u062a\u0646
  • Id: 2, Name: \u0628\u0633\u062a\u0647 10\u062a\u0627\u06cc\u06cc
  • ...

But I need something like this as result:

  • Id: 1, Name: تن
  • Id: 2, Name: عدد 10تایی
  • ...
5
  • 3
    Why not ... FROM OPENJSON(REPLACE(@Json, N'\\', N'\')) AS i? More information here.
    – Zhorov
    Commented Aug 13, 2021 at 12:28
  • Thank you @Zhorov it solved my problem :)
    – Farhad Rad
    Commented Aug 13, 2021 at 12:31
  • Not sure if this can help, but prefixing with 'N' signifies string contains unicode. stackoverflow.com/questions/46842044/…
    – Phaelax z
    Commented Aug 13, 2021 at 12:32
  • What you posted are escape sequences, not Unicode strings. This page is a Unicode string. Your question is Unicode. nvarchar fields are Unicode. What is @json though and what does it contain? Why does json contain those escape sequences instead of actual Unicode characters like تن ? Commented Aug 13, 2021 at 12:39
  • BTW escape sequences have only a single backslash. If @json contains \\u0628 instead of \u0628 it's invalid JSON. Most likely the code that generated the string had a bug and doubled the backslashes Commented Aug 13, 2021 at 12:55

1 Answer 1

0

Some explanations from ECMA-404 standard: A JSON string is a sequence of Unicode code points wrapped with quotation marks (U+0022). ... If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point.

It seems, that the names are escaped, so a possible option is to replace the escaped \:

JSON:

DECLARE @json nvarchar(max) = N'{
   "0":{
      "id":1,
      "name":"\\u062a\\u0646"
   },
   "1":{
      "id":2,
      "name":"\\u0628\\u0633\\u062a\\u0647 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "2":{
      "id":3,
      "name":"\\u0639\\u062f\\u062f 10\\u062a\\u0627\\u06cc\\u06cc"
   },
   "5":{
      "id":6,
      "name":"\\u0639\\u062f\\u062f 1000\\u062a\\u0627\\u06cc\\u06cc"
   },
   "6":{
      "id":7,
      "name":"\\u06a9\\u06cc\\u0644\\u0648\\u06af\\u0631\\u0645"
   }
}'

Statement:

SELECT Units.*
FROM OPENJSON(REPLACE(@Json, N'\\u', N'\u')) AS i
CROSS APPLY OPENJSON(i.[value]) WITH (
   [Id] INT '$.id',
   [Name] NVARCHAR(MAX) '$.name'
) AS Units

Result:

Id  Name
1   تن
2   بسته 10تایی
3   عدد 10تایی
6   عدد 1000تایی
7   کیلوگرم
4
  • Even so, the string should contain a single backslash, not two. \\u0628 is invalid. The correct escape sequnce is \u0628 Commented Aug 13, 2021 at 12:53
  • @PanagiotisKanavos It will so in the end, but since it's a literal residing in a different context a double backslash needs to be used to effectively print one backslash. Nested strings are like nested loops: logical.
    – AmigoJack
    Commented Aug 13, 2021 at 14:20
  • @AmigoJack no it's not. That's not how escaping works. That's why removing the double backslashes works. Whatever code produced this string has a serious bug. Try any online validator with a string with single backslashes. You'll see it works just fine. In fact, there's no reason to use escape sequences at all. Entering عدد 10تایی in the attribute value works just fine Commented Aug 13, 2021 at 15:36
  • Now I understood Q and A: it's not about UTF-8 or Unicode or JSON - it's about the mysterious source that "has" double backslashes. But there's no indication it's actually crafted like this or just mis-consumed already. In most SQL dialects a backslash in string literals has a special meaning - that's why I would consider N'{ "\\u062a" ... }' to be the proper SQL for an outcome of proper JSON.
    – AmigoJack
    Commented Aug 13, 2021 at 16:00

Not the answer you're looking for? Browse other questions tagged or ask your own question.