Had a problem with a tool that was generating illegal JSON.
Some of the JSON strings contained raw characters in the range 00-1f, so I wanted to convert these characters into correctly escaped values (\u00xx) within the string.
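Before fixing anything, it helps to confirm that the file really is rejected. The sketch below assumes nothing beyond jq itself; `is_valid_json` is a hypothetical helper name. It relies on `jq empty`, which parses every input value and prints nothing, so only the exit status matters.

```shell
# is_valid_json is a hypothetical helper name. "jq empty" parses every
# value on stdin and prints nothing, so only the exit status matters:
# 0 if everything parsed, non-zero on a parse error.
is_valid_json() {
  jq empty >/dev/null 2>&1
}

# A raw 0x01 byte inside a string: jq refuses to parse it.
if printf '{ "data": "XX\001YY"}\n' | is_valid_json; then
  echo "valid"
else
  echo "invalid"
fi

# The same data with the byte written as the escape \u0001 parses fine.
if printf '{ "data": "XX\\u0001YY"}\n' | is_valid_json; then
  echo "valid"
else
  echo "invalid"
fi
```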
The best I have managed to do is:
cat test2.json | jq -aR . | sed -e 's/\\"/"/g' -e 's/^"\(.*\)"$/\1/' | jq
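Explanation:
jq -aR reads the data as raw input and converts each line into a JSON string (-R on its own works line by line; adding -s would read the whole input as a single string).
Because jq escapes control characters whenever it outputs a string, this converts them all into the correct form => \u00xx (-a additionally escapes any non-ASCII characters).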
sed -e 's/\\"/"/g' Looks for escaped quotes and removes the escaping backslash, turning \" back into ".
sed -e 's/^"\(.*\)"$/\1/' Removes the quotes that jq -R added at the beginning and end of each line.
jq just makes it pretty again at the end, and in doing so also checks that it is valid JSON.
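To see what each step contributes, the pipeline can be run one stage at a time on a single line of the test data. This is only an illustration; the variable names are made up, and the last stage uses `jq -c .` instead of plain `jq` so the result fits on one line.

```shell
# Stage 1: -R wraps the input line in a JSON string; the raw 0x01 byte
# becomes \u0001 and every " becomes \"
stage1=$(printf '{ "data": "XX\001YY"}\n' | jq -aR .)

# Stage 2: sed un-escapes the quotes, then strips the outer pair.
stage2=$(printf '%s\n' "$stage1" | sed -e 's/\\"/"/g' -e 's/^"\(.*\)"$/\1/')

# Stage 3: jq parses the result, which only succeeds if it is valid JSON.
stage3=$(printf '%s\n' "$stage2" | jq -c .)

printf 'stage1: %s\nstage2: %s\nstage3: %s\n' "$stage1" "$stage2" "$stage3"
```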
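A couple of issues I have spotted (luckily they don't affect me yet):
- Raw newlines ('\n') embedded in a string are not handled correctly: jq -R splits its input at every newline, so the newline is never escaped.
- Any escape sequences already in the input (e.g. \t or \\) end up double escaped, because sed only undoes the escaping of quotes.
- Probably other things I have not thought about.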
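The first two issues can be reproduced with a few lines. `fix_ctrl_chars` is a hypothetical wrapper around the same pipeline, with a final `jq -c .` (instead of plain `jq`) so the output stays on one line.

```shell
# The pipeline from this post, wrapped in a function for illustration.
fix_ctrl_chars() {
  jq -aR . | sed -e 's/\\"/"/g' -e 's/^"\(.*\)"$/\1/' | jq -c .
}

# Issue 1: a raw newline inside a string. jq -R splits the input at the
# newline, so the byte is never escaped and the final jq still rejects it.
if printf '{"data": "a\nb"}\n' | fix_ctrl_chars >/dev/null 2>&1; then
  echo "newline: accepted"
else
  echo "newline: still invalid"
fi

# Issue 2: an escape that was already valid (\t here) comes out as \\t,
# i.e. a literal backslash followed by t instead of a tab.
printf '{"data": "a\\tb"}\n' | fix_ctrl_chars
```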
Some test data can be generated with:
echo -e "{ \"data\": \"XX\001YY\"}" > test2.json
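`echo -e` relies on bash's builtin echo; under a plain POSIX sh such as dash, the -e is not recognised as an option and gets printed. A sketch of the same test data using printf, which interprets octal escapes like \001 in its format string in any POSIX shell, plus two extra files (the names are made up) that exercise the issues listed above:

```shell
# Same content as the echo -e version: \001 is the 0x01 byte.
printf '{ "data": "XX\001YY"}\n' > test2.json

# A raw newline inside a string (invalid JSON).
printf '{ "data": "XX\nYY"}\n' > test-newline.json

# An escape that is already valid: printf turns \\ into one backslash,
# so the file contains the two characters \t, which JSON reads as a tab.
printf '{ "data": "XX\\tYY"}\n' > test-escaped.json
```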
Then I tested with:
cat test2.json | jq -aR . | sed -e 's/\\"/"/g' -e 's/^"\(.*\)"$/\1/' | jq
Generates:
{
  "data": "XX\u0001YY"
}
Just noticed that this does not handle a newline ('\n', i.e. 0x0a) correctly when it is inside a string.
A raw 0x0a is invalid inside a JSON string, and it is in the 00-1f range that the tool could potentially produce.
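One possible way around both the newline and the double-escaping problems, sketched here under the assumption that the input is a single JSON document: slurp the whole file with jq -Rs, walk it character by character while tracking whether we are inside a string, escape only the control characters found inside strings, and then parse the result with fromjson. `escape_ctrl_in_strings` is a hypothetical name, and the character-by-character walk is written for clarity rather than speed on large files.

```shell
# escape_ctrl_in_strings is a hypothetical helper. It assumes stdin holds
# exactly one JSON document, since fromjson parses a single value.
escape_ctrl_in_strings() {
  jq -Rs '
    # -R -s: the whole input arrives as one string; explode turns it
    # into an array of codepoints that the reduce walks one at a time.
    explode
    | reduce .[] as $c ({out: [], instr: false, esc: false};
        if .instr and .esc then        # character after a backslash: keep it
          .out += [$c] | .esc = false
        elif .instr and $c == 92 then  # backslash starts an escape sequence
          .out += [$c] | .esc = true
        elif .instr and $c == 34 then  # unescaped quote closes the string
          .out += [$c] | .instr = false
        elif .instr and $c < 32 then   # raw control character: write a u00XX escape
          ($c % 16) as $lo
          | .out += (("\\u00" + ($c / 16 | floor | tostring)
                      + ("0123456789abcdef" | .[$lo:$lo + 1])) | explode)
        elif $c == 34 then             # quote outside a string opens one
          .out += [$c] | .instr = true
        else                           # everything else passes through
          .out += [$c]
        end)
    | .out | implode | fromjson'
}

# Usage: pretty-prints the repaired document.
# escape_ctrl_in_strings < test2.json
```

Control characters outside strings are deliberately left alone: tab, CR and LF are legal whitespace there, and anything else makes fromjson fail loudly instead of being silently rewritten. Escapes that were already valid (such as \t) are copied through untouched, so nothing gets double escaped.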