I'm trying to write a fairly complicated SQL Query that produces JSON as the result. All is working great except for some hardcoded arrays I need to have deeper in the hierarchy that I must use UNION ALL to create. I've produced a query that shows my problem here (no data is required - I'm running this in Azure SQL Database):

SELECT
    'Hi' AS Greeting,
    (
        SELECT
            CASE WHEN DatePart(second, GetDate()) % 2 = 1 THEN
                'qwerty'
            ELSE
                'asdf'
            END AS Stuff
        FOR JSON PATH
    ) AS StuffArray,
    (
        CASE WHEN DatePart(second, GetDate()) % 2 = 1 THEN
        (
            SELECT 'qwerty' AS Stuff
            FOR JSON PATH
        )
        ELSE
        (
            SELECT 'asdf' AS Stuff
            FOR JSON PATH
        )
        END
    ) AS QuotedStuffArray,
    (
        CASE WHEN DatePart(second, GetDate()) % 2 = 1 THEN
        (
            SELECT * FROM
            (
                SELECT 'qwerty' AS Stuff
                UNION ALL
                SELECT 'zxcvb' AS Stuff
            ) AS SubSelect
            FOR JSON PATH
        )
        ELSE
        (
            SELECT 'asdf' AS Stuff
            FOR JSON PATH
        )
        END
    ) AS WhyItMatters,
    (
        SELECT * FROM
        (
            SELECT 'qwerty' AS Stuff
            UNION ALL
            SELECT 'zxcvb' AS Stuff
        ) AS SubSelect
        FOR JSON PATH
    ) AS ButThisIsFine
FOR JSON PATH

This outputs this JSON:

[
    {
        "Greeting": "Hi",
        "StuffArray": [
            {
                "Stuff": "qwerty"
            }
        ],
        "QuotedStuffArray": "[{\"Stuff\":\"qwerty\"}]",
        "WhyItMatters": "[{\"Stuff\":\"qwerty\"},{\"Stuff\":\"zxcvb\"}]",
        "ButThisIsFine": [
            {
                "Stuff": "qwerty"
            },
            {
                "Stuff": "zxcvb"
            }
        ]
    }
]

In this query, you'll see four different objects in the hierarchy beyond the base object: StuffArray, QuotedStuffArray, WhyItMatters, and ButThisIsFine. The StuffArray object is exactly what I want all of my objects to look like: pure JSON without anything escaped. However, as soon as I put my SELECT inside a CASE expression, the result comes back as a quoted, escaped string, as shown by the QuotedStuffArray object. For the first two objects, this is fine. But I sometimes need a conditional UNION of two hardcoded values, which forces me to put the SELECT inside the CASE, as shown by the WhyItMatters object. The ButThisIsFine object produces output formatted the way I want the WhyItMatters object to be formatted, but it drops the conditional UNION, which I need.

How can I get this last WhyItMatters object to produce pure JSON without escaped quotes just like the ButThisIsFine object while keeping in that conditional UNION statement?

3
  • 1
    My eyes are bleeding, but does adding JSON_QUERY() around the SELECT (rather than plain parentheses) achieve what you want? Commented Aug 29, 2017 at 14:10
  • @JeroenMostert Yes it does! Please provide that as an answer and I'll accept it. MUCH better than what I posted as a possible solution. Thanks!
    – Jaxidian
    Commented Aug 29, 2017 at 14:27
  • 1
    @Jaxidian But your solution might have better performance? Can you test that?
    – Ruslan K.
    Commented Aug 29, 2017 at 14:44

3 Answers

8

There's some fascinating behavior going on in the optimizer for this query, and I'm not sure if it's a bug. The following query will not add escaping:

SELECT
    'Hi' AS Greeting,
    (
        CASE WHEN 1 = 1 THEN (
            SELECT * FROM (
                SELECT 'qwerty' AS [Stuff]
                UNION ALL
                SELECT 'zxcvb' AS [Stuff]
            ) _
            FOR JSON PATH
        ) ELSE (
            SELECT 'asdf' AS [Stuff]
            FOR JSON PATH
        )
        END
    ) AS WhyItMatters
FOR JSON PATH

The CASE can be optimized away, and it is optimized away, and the end result is nicely nested JSON. But if we remove the ability to optimize things away, it degenerates into pasting in an escaped string:

SELECT
    'Hi' AS Greeting,
    (
        CASE WHEN RAND() = 1 THEN (
            SELECT * FROM (
                SELECT 'qwerty' AS [Stuff]
                UNION ALL
                SELECT 'zxcvb' AS [Stuff]
            ) _
            FOR JSON PATH
        ) ELSE (
            SELECT 'asdf' AS [Stuff]
            FOR JSON PATH
        )
        END
    ) AS WhyItMatters
FOR JSON PATH

It seems illogical that one query would result in processing typed JSON and the other would not, but there you go. JSON is not an actual type in T-SQL (unlike XML), so we can't CAST or CONVERT, but JSON_QUERY will do roughly the same thing:

SELECT
    'Hi' AS Greeting,
    JSON_QUERY(
        CASE WHEN RAND() = 1 THEN (
            SELECT * FROM (
                SELECT 'qwerty' AS [Stuff]
                UNION ALL
                SELECT 'zxcvb' AS [Stuff]
            ) _
            FOR JSON PATH
        ) ELSE (
            SELECT 'asdf' AS [Stuff]
            FOR JSON PATH
        )
        END
    ) AS WhyItMatters
FOR JSON PATH

Note that this also works if the argument is already JSON (in the constant case), so it's safe to add regardless.
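
As a minimal sketch of the difference (the variable and column names here are made up for illustration), the same JSON text can be emitted both as a plain string and through JSON_QUERY. The plain string should come out quoted and escaped, while the JSON_QUERY columns should be embedded as nested JSON, including the one whose argument is already a FOR JSON subquery:

```sql
-- @json is a hypothetical variable holding JSON text as an ordinary string.
DECLARE @json nvarchar(max) = N'[{"Stuff":"qwerty"}]';

SELECT
    -- Plain nvarchar value: FOR JSON treats it as text and escapes the quotes.
    @json AS Escaped,
    -- JSON_QUERY marks the value as a JSON fragment, so it is nested as-is.
    JSON_QUERY(@json) AS Nested,
    -- Wrapping a subquery that already produces JSON is harmless.
    JSON_QUERY((SELECT 'qwerty' AS [Stuff] FOR JSON PATH)) AS AlreadyJson
FOR JSON PATH
```

Note that JSON_QUERY returns NULL (or raises an error in strict mode) when its argument is not valid JSON, so this only makes sense for values that really are JSON text.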

4
  • Thanks for this answer - this is the solution I'm going with. I'm especially fascinated by your example where the optimizer in fact changes the format of the output just because a boolean logic optimization takes place. That certainly seems like a bug. Thanks a ton for pointing that out in addition to giving me the answer I was looking for!
    – Jaxidian
    Commented Aug 29, 2017 at 16:33
  • FYI, I have filed a Connect bug for this: connect.microsoft.com/SQLServer/feedback/details/3140057
    – Jaxidian
    Commented Sep 1, 2017 at 17:12
  • I don't think this is a bug: if the FOR JSON is at the top level of the expression then there is no escaping, otherwise there is. dbfiddle.uk/…. It has always been this way. The compiler cannot "see" through the CASE expression. In the first example, the CASE is optimized away completely, so the compiler is dealing with a top level FOR JSON. Commented Jun 28, 2022 at 9:14
  • @Charlieface: my contention is that whether or not an expression can be "optimized away completely" should not influence the actual semantics, because it's essentially unpredictable when this happens (or more accurately, there is no formal documentation on what is optimized and what isn't). There is also no real reason why the compiler "cannot see" through the expression, as in, producing JSON if every subexpression produces JSON. Of course whether this is "wrong" and there is (enough) value in changing the behavior is another matter. Commented Jun 28, 2022 at 12:10
3

I have tested the performance of both solutions.

First, via JSON_QUERY:

DECLARE @i int = 1;
SELECT
    JSON_QUERY(
        CASE WHEN @i = 1 THEN
        (
            SELECT * FROM
            (
                SELECT textCol AS Stuff FROM table1 WHERE id % 2 = 0
                UNION ALL
                SELECT textCol AS Stuff FROM table1 WHERE id % 2 <> 0
            ) AS SubSelect
            FOR JSON PATH
        )
        ELSE
        (
            SELECT textCol AS Stuff FROM table1
            FOR JSON PATH
        )
        END
    ) AS WhyItMatters
FOR JSON PATH

This gives me an average execution time of 91 ms.

Second, via a WHERE clause on each branch of the UNION:

DECLARE @i int = 1;
SELECT
    (SELECT * FROM
        (
            SELECT textCol AS Stuff FROM table1 WHERE id % 2 = 0 AND @i = 1
            UNION ALL
            SELECT textCol AS Stuff FROM table1 WHERE id % 2 <> 0 AND @i = 1
            UNION ALL
            SELECT textCol AS Stuff FROM table1 WHERE @i <> 1
        ) AS SubSelect
        FOR JSON PATH
    ) AS WhyItMatters
FOR JSON PATH

This gives me an average execution time of 45 ms.

table1 contains 12727 rows. The resulting JSON is about 1500000 characters long.
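
For anyone who wants to repeat the comparison, here is one possible timing harness. It is only a sketch and not necessarily how the numbers above were measured: the temp table #table1, its 100-character filler text, and the use of sys.all_objects to generate rows are all assumptions standing in for the real table1. Assigning the result to a variable keeps client transfer time out of the measurement.

```sql
-- Assumed stand-in for table1: 12727 rows of 100 characters each.
CREATE TABLE #table1 (id int NOT NULL PRIMARY KEY, textCol nvarchar(200) NOT NULL);

INSERT INTO #table1 (id, textCol)
SELECT TOP (12727)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
    REPLICATE(N'x', 100)
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

DECLARE @i int = 1;
DECLARE @json nvarchar(max);
DECLARE @start datetime2(7);

-- Variant 1: CASE wrapped in JSON_QUERY.
SET @start = SYSDATETIME();
SET @json = (
    SELECT
        JSON_QUERY(
            CASE WHEN @i = 1 THEN
            (
                SELECT * FROM
                (
                    SELECT textCol AS Stuff FROM #table1 WHERE id % 2 = 0
                    UNION ALL
                    SELECT textCol AS Stuff FROM #table1 WHERE id % 2 <> 0
                ) AS SubSelect
                FOR JSON PATH
            )
            ELSE
            (
                SELECT textCol AS Stuff FROM #table1
                FOR JSON PATH
            )
            END
        ) AS WhyItMatters
    FOR JSON PATH
);
SELECT 'JSON_QUERY' AS Variant,
       DATEDIFF(millisecond, @start, SYSDATETIME()) AS ElapsedMs,
       LEN(@json) AS JsonLength;

-- Variant 2: the condition pushed into a WHERE clause on every UNION branch.
SET @start = SYSDATETIME();
SET @json = (
    SELECT
        (SELECT * FROM
            (
                SELECT textCol AS Stuff FROM #table1 WHERE id % 2 = 0 AND @i = 1
                UNION ALL
                SELECT textCol AS Stuff FROM #table1 WHERE id % 2 <> 0 AND @i = 1
                UNION ALL
                SELECT textCol AS Stuff FROM #table1 WHERE @i <> 1
            ) AS SubSelect
            FOR JSON PATH
        ) AS WhyItMatters
    FOR JSON PATH
);
SELECT 'WHERE per branch' AS Variant,
       DATEDIFF(millisecond, @start, SYSDATETIME()) AS ElapsedMs,
       LEN(@json) AS JsonLength;

DROP TABLE #table1;
```

A single run is noisy, so it is worth running the batch several times and averaging the ElapsedMs values before drawing conclusions.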

1
  • 1
    This is good information, thank you! This would be a great optimization to consider for some. For my scenario, though, I was being literal where I said I was UNION'ing some hard-coded values. So I'm absolutely not concerned with optimizing hard-coded subselects that hit no tables - instead I'm optimizing for code maintainability. As such, I'd like to not have to duplicate (and maintain duplicates of) the boolean logic that is driving whether I want to UNION multiple results or not. But for somebody else, this performance may be critical!
    – Jaxidian
    Commented Aug 29, 2017 at 16:10
2

I have found one possible solution but I really don't like it. I'm posting what I have in hopes that somebody has a better solution.

Using a WHERE clause on every branch of my UNION, with either the condition from my CASE expression or its exact negation, prevents the "stringifying" of my results.

For example, this query:

SELECT
    'Hi' AS Greeting,
    (
        SELECT * FROM
        (
            SELECT 'asdf' AS Stuff WHERE DatePart(second, GetDate()) % 2 = 0
            UNION ALL
            SELECT 'qwerty' AS Stuff WHERE DatePart(second, GetDate()) % 2 = 1
            UNION ALL
            SELECT 'zxcvb' AS Stuff WHERE DatePart(second, GetDate()) % 2 = 1
        ) AS SubSelect
        FOR JSON PATH
    ) AS Try1
FOR JSON PATH

provides these results:

[
    {
        "Greeting": "Hi",
        "Try1": [
            {
                "Stuff": "qwerty"
            },
            {
                "Stuff": "zxcvb"
            }
        ]
    }
]

If nothing better can be found, I can move forward with this. But it seems like a hacky way to control the output.
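
One way to make this less repetitive, assuming the condition can be computed once before the query runs, is to evaluate it into a variable and reference that variable in each WHERE clause. This is only a sketch: @UseUnion is a made-up name, and it will not help when the condition depends on columns of an outer row.

```sql
-- Hypothetical flag holding the result of the condition, evaluated once.
DECLARE @UseUnion bit =
    CASE WHEN DATEPART(second, GETDATE()) % 2 = 1 THEN 1 ELSE 0 END;

SELECT
    'Hi' AS Greeting,
    (
        SELECT * FROM
        (
            -- Single-value branch, used when the flag is off.
            SELECT 'asdf' AS Stuff WHERE @UseUnion = 0
            UNION ALL
            -- Two-value branch, used when the flag is on.
            SELECT 'qwerty' AS Stuff WHERE @UseUnion = 1
            UNION ALL
            SELECT 'zxcvb' AS Stuff WHERE @UseUnion = 1
        ) AS SubSelect
        FOR JSON PATH
    ) AS Try1
FOR JSON PATH
```

The boolean logic now lives in one place, and because the FOR JSON subquery is still the top-level expression of the column, its output should be nested rather than escaped.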

1
  • 1
    I arrived at the same solution, and I have tested its performance: it runs about twice as fast as the solution via JSON_QUERY.
    – Ruslan K.
    Commented Aug 29, 2017 at 15:51
