I'm trying to familiarize myself a little with Perl to use for regular expression searches in Terminal (Mac). Now, I'm not really looking to learn Perl rigorously, just trying to find out how to do some simple regular expressions.

But I can't figure out how to do this in Terminal:

I'd like to be able to match expressions over several lines, and I'll take HTML tags as an example. PLEASE NOTE that the HTML tag is just an example of something to match, specifically something that spans multiple lines. Whether matching HTML with regular expressions is a good idea or not is not the issue. I just want to understand the syntax of matching with Perl on the command line!

Say I want to match the entire ul tag here:

<ul>
 <li>item 1</li>
 <li>item 2</li>
</ul>

I would like to:

  1. Be able to match this in a file and output the match to stdout (don't ask why, I just want to understand how it works :-))
  2. Be able to replace it with something else.

For matching, I found something like this (using 'start' and 'end' as an example from a simple text file I was testing with, but please give the example for the ul tag instead):

perl -wnE 'say $1 if /(start(.*?)end)/' test.txt 

This matches a part, but only on one line. Surprisingly, adding the s at the end didn't work to make it "dotall" or "single-line mode", it still just matched one line...

For replacing, I tried something like this:

perl -pe 's/start(.*?)end/replacement text/'s test.txt

This didn't work either...

  • On parsing HTML with regexes. Commented Apr 24, 2012 at 22:02
  • Ok, sorry, I shouldn't have named my question Matching html tags... That was unfortunate. What I'm really after is how to use perl to match on the command line, and the syntax for it, and also to get matching on several lines to work with the /s option... Commented Apr 24, 2012 at 22:29
  • Edited my question accordingly! Commented Apr 24, 2012 at 22:32
  • On parsing HTML with regexes, mark 2.
    – tchrist
    Commented Aug 15, 2012 at 1:28

1 Answer

Well, here's a Wikipedia page for matching or replacing with Perl one-liners. I tested the following in Cygwin:

Perl can behave like grep (-ne with print) or like sed (-pe with s///).

The /s modifier makes . match a newline.

-0777 makes Perl read the whole input as a single record, so the regular expression is applied to the whole thing instead of line by line.

\n can match a newline as well.

$ echo -e 'a\nb\nc\nd' | perl -0777 -pe 's/.*c//s'

d

user@comp ~
$ echo -e 'a\nb\nc\nd' | perl -pe 's/.*c//s'
a
b

d

Here is the other form, -ne with print $1:

user@comp ~
$ echo -e 'a\nb\nc\nd' | perl -ne 'print $1 if /(.*c)/s'
c
user@comp ~
$ echo -e 'a\nb\nc\nd' | perl -0777 -ne 'print $1 if /(.*c)/s'
a
b
c
user@comp ~
$

Also

$ echo xxx|perl -lne 'print ""'

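Perl's equivalent of \0 or &, i.e. the whole match, is $&. The examples here use $_ (the current line) instead, which is the same text whenever the pattern matches the entire line. To put text directly before and after it without a space, write ${_}: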

$ echo xxx|perl -lne 'print "a${_}${_}a"'
axxxxxxa

and

$ echo xxx|perl -lpe 's/.*/a${_}${_}a/'
axxxxxxa

Some further examples:

$ cat t.t
<ul>
 <li>item 1</li>
 <li>item 2</li>
</ul>

$ perl -0777 -ne 'print $1 if /\<ul\>(.*?)\<\/ul>/s' t.t

 <li>item 1</li>
 <li>item 2</li>

user@comp ~
$ perl -0777 -ne 'print $1 if /(.*)/s' t.t
<ul>
 <li>item 1</li>
 <li>item 2</li>
</ul>

user@comp ~
$

An example of a global match for the -ne form (change "if" to "while"):

$ echo -e 'bbb' | perl -0777 -ne 'print $1 while /(b)/sg'
bbb

For the -pe one, just add the g at the end (/sg or /gs, same thing):

$  echo -e 'aaa' | perl -0777 -pe 's/a/z/s'
zaa

user@comp ~
$  echo -e 'aaa' | perl -0777 -pe 's/a/z/sg'
zzz

Note: this question contrasts /s and -0777.

The print $1 examples above print only the captured text, not the whole line. This article, https://dzone.com/articles/perl-as-a-better-grep, has an example that prints each matching line, grep-style: perl -wln -e "/RE/ and print;" foo.txt

  • Perfect! Thank you! Just a minor question: why -0777? I thought the s option at the end was supposed to take care of making it match "DOTALL" and therefore include everything (possibly with a g option to take more than one match)? Commented Apr 25, 2012 at 6:04
  • I've just updated it with some global examples. There is a difference between making dot match a newline and having the regex apply to the whole input. Without -0777, the only newline that dot could ever see is the \n at the end of the current line; it can't see past that. Similarly, without -0777, the only newline that \n could match is that same one at the end of the line, because the regex is applied line by line. So you can have any combination of "dot matches newline" (or not) and -0777 (or not).
    – barlop
    Commented Apr 25, 2012 at 15:15
  • @AndersSvensson Yeah, I could see that. Before, it wasn't looking promising that you'd get an answer that actually addressed your question! Removing the HTML might've made the question more idiot-proof.
    – barlop
    Commented Apr 26, 2012 at 19:25
  • @AndersSvensson To make a further point: I recently asked a question similar to yours, about the difference between -0777 and /s (because I momentarily forgot!). Check the answer given by qtax, but also check my answer to my own question, as it gives examples demonstrating the difference between /s and -0777: stackoverflow.com/questions/26875838/…
    – barlop
    Commented Nov 12, 2014 at 2:19
  • I'm thinking the -ne print form isn't needed that much, since grep exists and people still use it. But the -pe form is well worth having as a replacement for sed.
    – barlop
    Commented May 4, 2020 at 15:05
