
Essentially I have a file.log as follows:

blah blah
blah blah
Hello world | {"foo": "bar"}
blah blah
Hello earth | {"foo1": "bar1"}

Now my goal is to write some shell commands to get the desired output, like this:

Hello world | "bar"
Hello earth | "bar1"

Currently this is what I have:

grep Hello file.log | awk -F "|" '{print $1, system("jq " $2)}'

However calling jq is giving me this error:

jq: error: syntax error, unexpected ':', expecting $end (Unix shell quoting issues?) at <top-level>, line 1:
bin:application   
jq: 1 compile error

I am thinking that it's because, inside system(), my $2 is stripped of all the quotation characters ("), so jq fails to recognize it as JSON. Any suggestions?


4 Answers


You have several problems here:

  • system() doesn't return something to print; it returns the exit status of the command you executed (0 if everything ran fine). You will see your JSON decoded data and then a line like Hello earth 0 (see the reproduction after this list)
  • the double quotes in your JSON string are swallowed by the shell. The resulting command you are executing is jq {foo: bar} (two arguments, with the JSON no longer quoted)
  • if $2 contains special characters like $, your shell will interpret them
  • even with proper quoting, jq is not called like that: it expects a filter as its first argument (say '.') and the JSON input to be read from a file or from standard input
  • building a command from the logs and executing it has huge security implications (what if $2 was ; rm -rf ~?). Better to avoid it if you can.
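
A minimal reproduction of the first two points, using one line from the sample log; the exact error text varies with the input and jq version, and the 3 is jq's compile-error exit status:

$ echo 'Hello world | {"foo": "bar"}' | awk -F "|" '{ print $1, system("jq " $2) }'
jq: error: syntax error, ... (Unix shell quoting issues?) at <top-level>, line 1:
{foo:
jq: 1 compile error
Hello world  3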

The security issue set aside, here is awk code that will work most of the time:

awk -F "|" '{ printf "%s", $1; system("echo \x27" $2 "\x27 | jq .")}'

What it does is send $2, enclosed in single quotes (\x27), to jq through stdin.
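
Run on the Hello lines (keeping the grep from the question in front), it produces something like the following, since the bare . filter pretty-prints its input:

$ grep Hello file.log | awk -F "|" '{ printf "%s", $1; system("echo \x27" $2 "\x27 | jq .")}'
Hello world {
  "foo": "bar"
}
Hello earth {
  "foo1": "bar1"
}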

Issues remain, though:

  • if $2 contains a single quote, it will break the whole command
  • if $2 begins with a dash (unlikely), it will be interpreted as an option to echo (we could use printf instead of echo)
  • the security issue already mentioned (e.g. if $2 contains ...'; rm -r ~; : ' ... anywhere in the string)

Now, better awk code:

awk -F "|" '{ printf "%s", $1; print $2 | "jq ."; close("jq ."); }'

Since $2 is now sent to the jq process through an awk pipe instead of through the shell, it is no longer interpreted by the shell, which solves all the issues above. The jq command must be closed (terminated) after each line, hence the call to close().
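
And if you want the exact output asked for in the question, here is a sketch along the same lines; it assumes the to_entries trick used in the other answers, and an awk whose fflush() works as in POSIX (most implementations):

grep Hello file.log |
awk -F "|" '{
    cmd = "jq \x27to_entries | .[0].value\x27"  # filter: first value of the object
    printf "%s| ", $1                           # re-emit the left part and separator
    fflush()                                    # flush so the prefix lands before jq output
    print $2 | cmd                              # feed the JSON to jq via an awk pipe
    close(cmd)                                  # terminate jq after each line
}'

This should print Hello world | "bar" and Hello earth | "bar1".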


Another solution, without awk, using only jq.

The trick is to use --raw-input, which reads each line of the file as a string.

So, for each line, test whether the symbol | is present; if it is, cut the string in two and parse the second part as a JSON string:

jq -j --raw-input '
    . as $line |
    if index("|") >= 0
    then
      [ .[:index("|")-1], .[index("|")+2:] ]
    else
      empty
    end |
    [ .[0], ( .[1] | fromjson | to_entries | .[0].value ) ] |
    .[0], " | \"", .[1], "\"\n"' /tmp/file.log

xhienne gave a good overview of the issues with the existing code, and a good alternative for what you want to accomplish.

The following is another alternative: Don't try to call jq from awk at all, but let the awk script create proper JSON output.

$ awk -F '|' 'BEGIN { print "[" } $2 != "" { if (t != "") print t ","; t = $2 } END { print t, "]" }' file | jq .
[
  {
    "foo": "bar"
  },
  {
    "foo1": "bar1"
  }
]

The awk code, by itself, will generate the following JSON array from the found JSON objects (given the example in the question):

[
 {"foo": "bar"},
 {"foo1": "bar1"} ]

This allows you to work more freely with jq without making your script too difficult to maintain and understand.

The juggling with the t variable in the script is just a way of making sure that we don't get a trailing comma after the last JSON object.
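
For example, to pull out just the first value of each collected object (note that this approach keeps only the JSON parts, not the Hello ... prefixes):

$ awk -F '|' 'BEGIN { print "[" } $2 != "" { if (t != "") print t ","; t = $2 } END { print t, "]" }' file | jq '.[] | to_entries | .[0].value'
"bar"
"bar1"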


TL;DR:

jq -r -R '
  select(contains(" | ")) |
  split(" | ") |
  .[0] as $text |
  (.[1] | fromjson | to_entries | .[0].value ) as $json_obj_value |
  "\($text) | \($json_obj_value)"
' yourlogfile.log

Complete answer

Most people don't realize quite how powerful jq is (though the same can be said about awk).

As Kusalananda thoughtfully pointed out in their answer, your best friend here is the -R flag, which reads the input line by line as JSON strings instead of parsing it as JSON. With that, we are free to handle the whole string within jq, without any need for awk at all.

Here is how the documentation describes it as of version 1.6:

--raw-input / -R:

Don't parse the input as JSON. Instead, each line of text is passed to the filter as a string. If combined with --slurp, then the entire input is passed to the filter as a single long string.
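
A quick way to see the flag in action (any non-JSON line will do):

$ echo 'not json at all' | jq -R .
"not json at all"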

For your desired output you will also need the -r flag, which makes it print bare strings instead of JSON strings in the terminal.

Again, from the docs:

--raw-output / -r:

With this option, if the filter's result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes. This can be useful for making jq filters talk to non-JSON-based systems.
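
And its counterpart in action, with and without the flag:

$ echo '{"foo": "bar"}' | jq .foo
"bar"
$ echo '{"foo": "bar"}' | jq -r .foo
bar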

So with that out of the way, there are a few ways to tackle this in jq.

Since EchoMike444 already answered with a more imperative approach, I tried a different one that is a little more pipeline-oriented.

jq -r -R '
  select(contains(" | ")) |
  split(" | ") |
  .[0] as $text |
  (.[1] | fromjson | to_entries | .[0].value ) as $json_obj_value |
  "\($text) | \($json_obj_value)"
' yourlogfile.log

Basically, we:

  1. Throw out any line without " | " in it
  2. Split each line into two parts
  3. Put the left part in a $text binding for legibility
  4. Parse the right part as JSON, get its first value, and put it in a $json_obj_value binding for legibility
  5. Print a string "$text | $json_obj_value" (\(foo) is how you do interpolation in jq; sample output below)
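
On the sample log this prints:

Hello world | bar
Hello earth | bar1

Note that string interpolation inserts the bare string value, so bar comes out without its JSON quotes; if you want the quoted values from the desired output in the question, you could interpolate \($json_obj_value | tojson) instead.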

If you want it as compact as possible, you can use

jq -Rr 'select(contains(" | "))|split(" | ")|"\(.[0]) | \(.[1]|fromjson|to_entries|.[0].value)"' yourlogfile.log

That will be smaller but also harder to read. Which is best will depend on taste and use case.

