61

I have the following string:

{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}

and I need to get the value of "scheme_version", which is 1234 in this example.

I have tried

grep -Eo "\"scheme_version\":(\w*)"

however it returns

"scheme_version":1234

How can I get only the value? I know I can add a sed call, but I would prefer to do it with a single grep.

3
  • I don't think it's possible with only 'grep'. A couple of years ago I did a lot with string manipulation, often piping greps to stuff like 'sed', or 'cut'. I'd suggest you study 'piping' and the 'cut' command. Commented Dec 22, 2011 at 11:01
  • 1
    I don't use grep very often, but perhaps you can use a look-behind expression, as outlined in the accepted answer in stackoverflow.com/questions/1247812/…. Commented Dec 22, 2011 at 11:02
  • 7
    Use jq
    – Mmmh mmh
    Commented Aug 20, 2017 at 14:26

7 Answers

72

You'll need to use a lookbehind assertion so that the key isn't included in the match (this requires a grep built with PCRE support, i.e. the -P option):

grep -Po '(?<=scheme_version":)[0-9]+'
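As a rough sketch, the command can be applied to the sample input from the question as shown here. The variable names json and version are only illustrative, and the sed fallback for builds of grep without -P is an assumption added for this example, not part of the answer itself.

```shell
# Sample document from the question (variable name is illustrative).
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# Probe whether this grep accepts -P; an unsupported -P makes grep exit non-zero.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  # PCRE available: the lookbehind requires the key before the digits
  # but keeps it out of the text printed by -o.
  version=$(printf '%s\n' "$json" | grep -Po '(?<=scheme_version":)[0-9]+')
else
  # No -P support: use sed with a capture group instead.
  version=$(printf '%s\n' "$json" | sed -n 's/.*"scheme_version":\([0-9][0-9]*\).*/\1/p')
fi

printf '%s\n' "$version"
```

The value "scheme_version" of the "_id" key is not matched, because it is followed by a quote and a comma rather than by a quote and a colon.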
7
  • 1
    Hmm I got grep: Support for the -P option is not compiled into this --disable-perl-regexp binary
    – lstipakov
    Commented Dec 22, 2011 at 11:27
  • 5
    @Stipa Without PCRE support you cannot do what you want with grep alone, as it has no way to print only a capture group, e.g. \1
    – SiegeX
    Commented Dec 22, 2011 at 11:33
  • Exactly what was asked; that "positive lookbehind" worked like a charm
    – greuze
    Commented Sep 5, 2016 at 7:52
  • Better than the accepted answer by a long shot for those of us lucky enough to have -P support already compiled in (or stubborn enough to rebuild grep...) :)
    – rinogo
    Commented Apr 19, 2017 at 20:39
  • 2
    When you have multiple named groups, each of them is output on a new line. Is there a way to print them on the same line? E.g. cat ~/mydoc | grep -Po '(?<=blah">)[^<]*|(?<=bleh"></span>)[^<]*' prints the captures on different lines.
    – asgs
    Commented Apr 28, 2017 at 11:59
61

This might work for you:

echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' |
sed -n 's/.*"scheme_version":\([^}]*\)}/\1/p'
1234

Sorry it's not grep, so disregard this solution if you like.

Or stick with grep and pipe its output through cut:

grep -Eo "\"scheme_version\":(\w*)"| cut -d: -f2
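A minimal sketch of the grep and cut pipeline on the sample input follows. It assumes a grep that supports -o (GNU and BSD grep do), and it swaps \w* for [0-9]+ because \w is a GNU extension in extended regular expressions. The variable names are illustrative.

```shell
# Sample document from the question (variable name is illustrative).
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# grep -o prints the whole match, i.e. "scheme_version":1234.
# cut then splits that match on ':' and keeps the second field, the value.
version=$(printf '%s\n' "$json" | grep -Eo '"scheme_version":[0-9]+' | cut -d: -f2)

printf '%s\n' "$version"
```

This only works as long as the value itself contains no colon, since cut would truncate it at the first one.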
2
  • It seems to be the best available option for me.
    – lstipakov
    Commented Dec 22, 2011 at 11:37
  • Hello, thank you for your answer. It works great for getting the "scheme_version" value, but it does not work for the "_id" value. This sed expression worked for me; I'm adding it to the answers: sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p' Commented Apr 30, 2020 at 14:10
49

I would recommend that you use jq for the job. jq is a command-line JSON processor.

$ cat tmp
{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}

$ cat tmp | jq .scheme_version
1234
35

As an alternative to the positive lookbehind method suggested by SiegeX, you can reset the match starting point to directly after scheme_version": with the \K escape sequence. E.g.,

$ grep -Po 'scheme_version":\K[0-9]+'

This restarts the matching process after having matched scheme_version":, and tends to perform far better than the positive lookbehind. Comparing the two on regex101 shows that the reset match start method takes 37 steps and 1 ms, while the positive lookbehind method takes 194 steps and 21 ms.

You can compare the performance yourself on regex101 and you can read more about resetting the match starting point in the PCRE documentation.
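Applied to the sample input from the question, the \K variant might be run as sketched here. This assumes a GNU grep built with PCRE support; the variable names are illustrative.

```shell
# Sample document from the question (variable name is illustrative).
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# \K drops everything matched so far from the reported match,
# so -o prints only the digits that follow the key.
version=$(printf '%s\n' "$json" | grep -Po 'scheme_version":\K[0-9]+')

printf '%s\n' "$version"
```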

9

To avoid relying on grep's PCRE feature (-P), which is available in GNU grep but not in the BSD version, another option is to use ripgrep, e.g.

$ rg -o 'scheme_version.?:(\d+)' -r '$1' <file.json 
1234

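The -r (--replace) option replaces each match with the given text; capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.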

Another example uses Python's json.tool module, which validates and pretty-prints the JSON before it is searched:

$ python -mjson.tool file.json | rg -o 'scheme_version[^\d]+(\d+)' -r '$1'
1234

Related: Can grep output only specified groupings that match?

2

You can do this:

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | awk -F ':' '{print $4}' | tr -d '}'
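The pipeline above splits the line on ':' so that the fourth field is 1234}, and tr then deletes the closing brace. As a sketch of the same idea in a single awk call, both ':' and '}' can be treated as field separators. This depends on the exact field order of the sample input and on no value containing a colon, so it is fragile for anything but this document. The variable names are illustrative.

```shell
# Sample document from the question (variable name is illustrative).
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# -F '[:}]' splits on either ':' or '}'. The fields are then:
#   $1 = {"_id"    $2 = "scheme_version","_rev"
#   $3 = "4-cad...","scheme_version"    $4 = 1234
version=$(printf '%s\n' "$json" | awk -F '[:}]' '{print $4}')

printf '%s\n' "$version"
```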
1
  • 4
    While this code block may answer the OP's question, this answer would be much more useful if you explain how this code is different from the code in the question, what you've changed, why you've changed it and why that solves the problem without introducing others.
    – davejal
    Commented Dec 7, 2015 at 14:08
-1

To improve on @potong's answer, which only works for "scheme_version", you can use this expression, which also handles quoted string values:

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_id":["]*\([^(",})]*\)[",}].*/\1/p'
scheme_version

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"_rev":["]*\([^(",})]*\)[",}].*/\1/p'
4-cad1842a7646b4497066e09c3788e724

$ echo '{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}' | sed -n 's/.*"scheme_version":["]*\([^(",})]*\)[",}].*/\1/p'
1234
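Since the three commands differ only in the key name, they could be wrapped in a small helper. The function name json_value is hypothetical, and the sketch assumes flat, single-line JSON and key names without characters that are special to sed; it is not a general JSON parser.

```shell
# Sample document from the question (variable name is illustrative).
json='{"_id":"scheme_version","_rev":"4-cad1842a7646b4497066e09c3788e724","scheme_version":1234}'

# json_value KEY  (hypothetical helper; reads one line of JSON on stdin)
# The pattern matches "KEY": followed by an optional opening quote,
# captures everything up to the next quote, comma, brace or parenthesis,
# and prints only that captured text.
json_value() {
  sed -n 's/.*"'"$1"'":["]*\([^(",})]*\)[",}].*/\1/p'
}

printf '%s\n' "$json" | json_value _id
printf '%s\n' "$json" | json_value _rev
printf '%s\n' "$json" | json_value scheme_version
```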
