1

How can I delete a line containing a matching pattern and the following n lines using a tool supporting regular expressions?

Said differently, how can I write a regular expression matching a line containing a matching pattern and the following n lines, so that I can replace them with nothing?

For example, if I have a matching pattern bbbb and I also want to delete the 5 lines that follow it, for the input file:

aldjflajdkl
aaaabbbbaaaa
1l;adfjl
2aldfjl
3adlflkdas
4aldfjd
5aldfkld
6dlafjlkdas

The output would be:

aldjflajdkl
6dlafjlkdas

It probably simplifies things that, in my specific case, the matching pattern (bbbb) cannot appear in the following 5 lines.

A solution already exists for sed, but it relies only partially on regular expressions, and uses custom replacement commands which are not portable.

4
  • You wrote "I am aware of how you can do this in sed. How could I do the same with a tool supporting regular expressions?" <-- I would note that sed is a tool that supports regular expressions. e.g. sed "s/a/z/g" file.ext replaces 'a' with 'z', and does it with all occurrences of 'a'. The regex is in the find portion of that, that's where the 'a' is. Though sed can't see new lines in the find portion and some more advanced regex features are missing from it, so there are better tools than sed for regex support.
    – barlop
    Commented Apr 8, 2015 at 19:45
  • @barlop I see your point. I made a major reshuffling, let me know if this addresses your concerns.
    – Antonio
    Commented Apr 8, 2015 at 21:37
  • It's good, it was pretty good even before that correction too. IMO It's fine even if a question has one (easily made) mistake and a comment corrects it. So I upvoted your question even with that mistake that I corrected in comment. The reason I upvoted it was that it was very clear, the description and showing the input and the output you wanted.
    – barlop
    Commented Apr 8, 2015 at 22:21
  • I would note that sometimes if correcting a question, it can make a comment nonsensical.. But in this case your correction is fine 'cos my comment quoted you so there's no questions about what the comment is or was referring to.
    – barlop
    Commented Apr 8, 2015 at 22:24

2 Answers

3

A possible solution is:

.*<matching pattern>(.*\r?\n){<N+1>}

where N is the number of lines I want to remove after the line containing the pattern.

For the example given, this translates to:

.*bbbb(.*\r?\n){6}
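
To try the regex outside grepWin, here is a minimal sketch using Python's `re` module on the sample input from the question. Python is my choice for illustration only; it assumes the default behaviour where `.` does not match `\n`, and that every line, including the last one, ends with a line break.

```python
import re

# The sample input from the question, with LF line endings.
text = (
    "aldjflajdkl\n"
    "aaaabbbbaaaa\n"
    "1l;adfjl\n"
    "2aldfjl\n"
    "3adlflkdas\n"
    "4aldfjd\n"
    "5aldfkld\n"
    "6dlafjlkdas\n"
)

# Same regex as above: the rest of the matching line plus the 5 lines after it.
pattern = re.compile(r".*bbbb(.*\r?\n){6}")

# Replace every match with nothing.
result = pattern.sub("", text)
print(result, end="")
# prints:
# aldjflajdkl
# 6dlafjlkdas
```

If fewer than 5 lines follow the matching line, or the final line has no trailing line break, the `{6}` repetition cannot be satisfied and nothing is removed.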

This is how it looks in grepWin: [grepWin screenshot]

Side notes:

  • In the tab "The regex search string matches", the 5aldfkld line is also marked as matched; it is just out of view, which is why a scroll bar is visible on the right.
  • (grepWin specific) Because of a small bug, when applying this search to files, you'll see the count of Matches increase by 7 for each match! That's probably because the match counter counts how many lines are matched, and in this case the pattern covers 7 lines: the matched line, the following 5 lines and the line reached with the last line feed.
  • (sed specific) This regex does not work in sed, which lacks some regex features and, because it processes its input line by line, has no easy way to match or replace newlines.

The following explains how I got to the solution.

I started from:

.*bbbb.*\n.*\n.*\n.*\n.*\n.*\n

which would not work on my system. But the following would work:

.*bbbb.*\r\n.*\r\n.*\r\n.*\r\n.*\r\n.*\r\n

So, I am working on a CRLF system. However, this looks neither pretty nor portable.

I can make it a little bit more portable (and uglier :-) ) by doing:

.*bbbb.*\r?\n.*\r?\n.*\r?\n.*\r?\n.*\r?\n.*\r?\n

(The carriage return becomes optional). It still looks ugly, but I can collect the repetitive term:

.*bbbb(.*\r?\n){6}
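
The effect of making the carriage return optional can be checked with a small sketch in Python (my choice of language for illustration, not the tool used above). The helper name `build_regex` and the test lines are hypothetical; the helper just assembles `.*<matching pattern>(.*\r?\n){<N+1>}` for a given N.

```python
import re


def build_regex(pattern: str, n: int) -> re.Pattern:
    """Hypothetical helper: match a line containing `pattern` plus the n lines after it."""
    # n + 1 repetitions: one for the rest of the matching line, n for the following lines.
    return re.compile(r".*" + pattern + r"(.*\r?\n){" + str(n + 1) + r"}")


lines = ["keep", "xxbbbbxx", "1", "2", "3", "4", "5", "keep too"]
lf_text = "".join(line + "\n" for line in lines)      # Unix line endings
crlf_text = "".join(line + "\r\n" for line in lines)  # Windows line endings

rx = build_regex("bbbb", 5)
lf_result = rx.sub("", lf_text)
crlf_result = rx.sub("", crlf_text)

print(repr(lf_result))    # 'keep\nkeep too\n'
print(repr(crlf_result))  # 'keep\r\nkeep too\r\n'
```

Engines differ in how `.` treats `\r`. In Python's `re`, `.` also matches `\r`, so the plain `\n` version happens to work on CRLF text there too; the `\r?\n` form is the one that does not depend on that detail.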

This guide was very handy.

7
  • here is a similar one done with notepad++ though you make a good point that the one in the pic of mine won't remove the last line and would need another \r\n . Also it's good how you did \r? to make the \r optional. You should do a screenshot of your one in whatever editor or program you use for your regexes. i.sstatic.net/rfLHQ.png could do .*bbbb.*\r?\n(.*\r?\n){5} then it's 5 like 5 lines after, though yours is better, more compact while doing the same.
    – barlop
    Commented Apr 8, 2015 at 14:18
  • Can you include a screenshot from your favorite regex supporting program?
    – barlop
    Commented Apr 8, 2015 at 14:23
  • @barlop Yep, done.
    – Antonio
    Commented Apr 8, 2015 at 14:28
  • And i'm curious, i'm assuming you knew about repetition when you began answering and not just at the end, so why did you begin typing \n.*\n.*\n.* and then simplifying after? why not just straight away type something like (\n.*){5} ? or (\n.*){5} as soon as you got up to two of them. why keep going as far as \n.*\n.*\n.*\n.*\n.*\n and then do the repetition after? Like if you know the multiplication operator and you wanted five fives you wouldn't first write 5+5+5+5+5=5*5, you'd just write 5*5
    – barlop
    Commented Apr 8, 2015 at 14:30
  • @barlop I simply didn't know about repetition :) You know about my first approach to regular expressions, a few days ago.
    – Antonio
    Commented Apr 8, 2015 at 14:36
1

An awk solution:

awk '/bbbb/ {i=5; next} {if (i>0) i--; else print}'

When it detects the pattern you're looking for, it sets i (which is a countdown counter) to 5, and skips the rest of the processing (i.e., skips to the next line of input).  In particular, it does not print the line.  (Saying /bbbb/ {i=5+1} for the first part would be equivalent; choose one based on your style preference.)  Then, if the counter is positive, decrement it (subtract 1) so as to count the lines that are being deleted (skipped), and do not print; otherwise, print the line.
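
For readers who do not use awk, the same countdown logic can be sketched in Python. This is an illustrative translation, not part of the original answer; the function name `drop_match_and_following` and its parameters are hypothetical.

```python
import re


def drop_match_and_following(lines, pattern="bbbb", n=5):
    """Drop each line matching `pattern` and the n lines after it."""
    remaining = 0  # countdown of lines still to skip (awk's i)
    kept = []
    for line in lines:
        if re.search(pattern, line):
            remaining = n   # start the countdown ...
            continue        # ... and skip the matching line itself (awk's next)
        if remaining > 0:
            remaining -= 1  # one of the n following lines: skip it
        else:
            kept.append(line)
    return kept


sample = [
    "aldjflajdkl", "aaaabbbbaaaa", "1l;adfjl", "2aldfjl",
    "3adlflkdas", "4aldfjd", "5aldfkld", "6dlafjlkdas",
]
print(drop_match_and_following(sample))
# ['aldjflajdkl', '6dlafjlkdas']
```

Unlike the single multi-line regex, this line-by-line approach restarts the countdown whenever the pattern appears again inside the skipped lines, and it does not care about line endings or about fewer than n lines remaining.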

1
  • I think it would be worth posting your answer (also) here, although there they also want the option to keep the line with the matching pattern.
    – Antonio
    Commented Apr 9, 2015 at 7:45
