
I have a file called random.html that contains the following line (among others):

blahblahblahblah random="whatever h45" blahblahblahblah

I want to extract only whatever. So far I have used the following:

egrep -o 'random="([a-z]*[A-Z]*[0-9]*[ ]*)+'

This gives me random="whatever h45

I can't start with just egrep -o '="([a-z]*[A-Z]*[0-9]*[ ]*)+' because this is not the only line in the file and other, unwanted lines would match too; the random keyword is needed to tell them apart. I tried chaining two egrep -o calls:

egrep -o 'random="([a-z]*[A-Z]*[0-9]*[ ]*)+' | egrep -o '="([a-z]*[A-Z]*[0-9]*[ ]*)+'

I expected this to display just ="whatever h45, but it doesn't work. Am I doing something wrong, or is this not allowed? I don't want to use anything fancy, and I don't want to use cut. This is supposed to be very "basic".

  • that's impossible to do with just grep Commented Feb 17, 2013 at 6:48

3 Answers


You can do this in bash alone as well:

# Print the first alphanumeric word of each random="..." value, one per line.
while read -r; do
    [[ $REPLY =~ random=\"([a-zA-Z0-9]+) ]] || continue
    echo "${BASH_REMATCH[1]}"
done < file.txt

If your version of grep supports Perl regexes (the -P option), you can use a lookbehind assertion to match only the text that follows random=".

grep -P -o '(?<=random=\")([a-zA-Z0-9]+)' file.txt
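
If grep -P is not available, roughly the same result can be had from two plain grep -o stages: the first keeps the random=" prefix so that unrelated lines are filtered out, and the second keeps only the trailing run of alphanumerics. This is a minimal sketch rather than part of the answer above; the function name first_word_after_random is made up for illustration, and -o is itself an extension (GNU and BSD grep provide it) rather than POSIX.

```shell
# Hypothetical helper (name is illustrative): read lines on stdin and print
# the first alphanumeric word of every random="..." value.
first_word_after_random() {
    # Stage 1: keep only 'random="' plus the alphanumerics that follow it.
    # Stage 2: keep only the alphanumeric run at the end of each stage-1 match.
    grep -o 'random="[a-zA-Z0-9]*' | grep -o '[a-zA-Z0-9]*$'
}

# Example call on the sample line from the question.
printf '%s\n' 'blahblahblahblah random="whatever h45" blahblahblahblah' |
    first_word_after_random
```

The second stage works because the double quote is not in the character class, so the only alphanumeric run that reaches the end of the line is the word after random=".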

You're just using the wrong tool; this is trivial in awk. There are various solutions; here's one:

$ cat file
blahblahblahblah random="whatever h45" blahblahblahblah

$ awk 'match($0,/random="([a-z]*[A-Z]*[0-9]*[ ]*)+/) { print substr($0,RSTART+8,RLENGTH-8) }' file
whatever h45

It wasn't clear from your question whether you wanted whatever, whatever h45, ="whatever h45, or some other part of the string printed, so I just picked the one I thought most likely. Whichever it is, it's trivial...

By the way, your regexp doesn't seem to make sense; I just copied it from your question to make it easy to compare what you had with the awk solution. If you tell us in words what it's meant to represent, we can write it correctly for you, but I think the most likely intent is a run of non-double-quote characters, e.g.:

$ awk 'match($0,/random="[^"]+/) { print substr($0,RSTART+8,RLENGTH-8) }' file
whatever h45
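
If only the first word (whatever) is wanted, one possible adjustment is to stop the match at the first space as well as at the closing quote. The sketch below assumes that reading of the question; the wrapper name first_word_awk is hypothetical. The offset 8 is the length of random=", as in the commands above.

```shell
# Hypothetical wrapper (name is illustrative): print the first word of each
# random="..." value, reading from the named files or from stdin.
first_word_awk() {
    # [^" ]+ stops at the first double quote or space after random="
    # RSTART + 8 skips the 8 characters of 'random="'.
    awk 'match($0, /random="[^" ]+/) { print substr($0, RSTART + 8, RLENGTH - 8) }' "$@"
}

# Example call on the sample line from the question.
printf '%s\n' 'blahblahblahblah random="whatever h45" blahblahblahblah' |
    first_word_awk
```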

Perl solution for completeness.

#% perl -n -e 'print $1, "\n" if m!random="(\S+)!' tt

gives

whatever
whatever

where tt is

#% cat tt

blahblahblahblah random="whatever h45" blahblahblahblah
blahblahblahblah random="whatever h45" blahblahblahblah
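
The \S+ above stops at the first whitespace, which is why only whatever is printed. For comparison, the same capture-group idea can be sketched with sed instead of Perl, here capturing the whole quoted value. This is an illustrative alternative, not the command from this answer, and the function name quoted_value_sed is made up.

```shell
# Hypothetical helper (name is illustrative): print the full text between the
# quotes of random="..." for every line that contains it.
quoted_value_sed() {
    # -n suppresses default output; the p flag prints only lines where the
    # substitution succeeded. \1 is the text captured between the quotes.
    sed -n 's/.*random="\([^"]*\)".*/\1/p' "$@"
}

# Example call on the sample line from the question.
printf '%s\n' 'blahblahblahblah random="whatever h45" blahblahblahblah' |
    quoted_value_sed
```

Because the leading .* is greedy, a line with several random="..." attributes yields only the last one.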
