
I have a log file that is a list of repeating characteristics. For example:

## This is the pattern of lines
time
urgency
icon_path
summary
body
appname

## Below is what the log file would actually look like
12:30
critical

test notification
notification
notify-send
11:00
low

earlier notification
notification
notify-send
10:46
normal

hello
world
dunstify

I'm trying to find a way in bash to search for a block/cluster of lines that matches my search terms and then delete it. As you can see in the example above, some lines are empty and some are filled. The best "solution" I have found so far is sed '/12:30/,+5 d' or the slightly better sed '/12:30/,/notify-send/d'. The problem with the first is that it deletes every line containing the timestamp plus the five lines after it, so it removes every entry with that time, not just one. The problem with the second is that if two or more entries have the same time and appname, all of them get deleted.
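To make the problem concrete, here is a small sketch with two hypothetical entries that share the time 12:30 but have different summaries. It assumes GNU sed, since the ",+5" address form is a GNU extension; two_at_1230 is just a made-up helper name. Both range commands delete both entries:

```shell
#!/bin/sh
# Hypothetical log: two 6-line entries, both logged at 12:30.
two_at_1230() {
  printf '%s\n' \
    '12:30' 'critical' '' 'test notification'  'notification' 'notify-send' \
    '12:30' 'low'      '' 'other notification' 'notification' 'notify-send'
}

# Every line matching 12:30 starts a new 6-line range (GNU addr,+N form),
# so both entries are deleted and nothing is printed.
two_at_1230 | sed '/12:30/,+5 d'

# Each 12:30 opens a range that closes at the next notify-send line,
# so both entries are deleted here too.
two_at_1230 | sed '/12:30/,/notify-send/d'
```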

What I've been trying to get working, and failing spectacularly at, is something like: sed '/12:30\n^.*$\n^.*$\ntest notification\nnotification\nnotify-send/d' /tmp/notification_log. Note that the 2nd and 3rd lines can be anything (the urgency and icon_path lines respectively), which is why I used ^.*$ (to be frank, I'm not even sure that is the proper regex).

EDIT: If the command above worked, I would expect the output to be:

11:00
low

earlier notification
notification
notify-send
10:46
normal

hello
world
dunstify

The cluster that command was meant to match is:

12:30
*anything*
*anything*
test notification
notification
notify-send

Answer


It's actually not so hard, provided all clusters are M lines long, M is fixed, clusters don't overlap, and we don't need to search for the beginning of any cluster. In our case M is 6.

sed can match against multiple lines, but since it normally processes one line at a time, you need to explicitly append more lines to the pattern space. You do this with N, which appends a newline followed by the next input line:

sed 'N;N;N;N;N; /12:30\n.*\n.*\ntest notification\nnotification\nnotify-send/d'
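Here is a self-contained sketch of that command run against the sample log from the question. It assumes GNU sed, and the helper names sample_log and delete_cluster are made up for the demo. It prints the two entries that should survive:

```shell
#!/bin/sh
# The 18-line sample log from the question: three 6-line clusters.
sample_log() {
  printf '%s\n' \
    '12:30' 'critical' '' 'test notification'    'notification' 'notify-send' \
    '11:00' 'low'      '' 'earlier notification' 'notification' 'notify-send' \
    '10:46' 'normal'   '' 'hello'                'world'        'dunstify'
}

# N;N;N;N;N joins six input lines into one pattern space (five \n inside),
# so the regex sees a whole cluster at once; d drops a matching cluster.
delete_cluster() {
  sed 'N;N;N;N;N; /12:30\n.*\n.*\ntest notification\nnotification\nnotify-send/d'
}

sample_log | delete_cluster

# With GNU sed, N on the last line prints the leftover pattern space and exits.
printf '%s\n' 1 2 3 | sed 'N; s/\n/+/'   # prints "1+2" then "3"
```

The last line shows a caveat: if the log's line count is not a multiple of six, the final N has no next line to read, and GNU sed prints the leftover partial cluster unchanged and exits. Other sed implementations may quit without printing it.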

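The rest is your code without the ^ and $ anchors. The anchors are often associated with "the beginning of the line" and "the end of the line" respectively, but in sed they really mean "the beginning of the string" and "the end of the string", where the string is the whole pattern space. When sed processes one line at a time there is no difference. In our case the pattern space holds several lines, so we must remember that the anchors refer to the string. Putting them in the middle doesn't make sense. It's not that they would never match anything: in a basic regular expression (sed's default), ^ is special only at the start of the expression (or of a subexpression) and $ only at the end, so sed wouldn't interpret them as anchors in the first place; it would treat them as literal ^ and $ characters.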
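A quick way to see that ^ and $ in the middle of a basic regular expression are plain characters. This is a sketch assuming GNU sed in its default BRE mode; literal_demo is just a name for the demo:

```shell
#!/bin/sh
literal_demo() {
  # 'a^b' matches only the three characters a, ^, b in that order,
  # so "a^b" is printed and "ab" is not.
  printf '%s\n' 'a^b' 'ab' | sed -n '/a^b/p'
  # Likewise 'a$b' matches a literal dollar sign: "a$b" is printed.
  printf '%s\n' 'a$b' 'ab' | sed -n '/a$b/p'
}

literal_demo
```

With sed -E (extended regular expressions) ^ and $ are anchors wherever they appear, so this behavior is specific to the default BRE syntax.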

There's no need for "… of the line" anchors in the middle of a string. Every line but the last ends just before a newline character, and every line but the first begins just after one, so matching \n is enough.

Maybe you tried to use anchors to make sure .* (which is greedy and, in sed, can match newline characters) doesn't match more than one line. Even if ^ and $ acted as "… of the line" anchors, .* would still be greedy. Instead, consider this: the pattern space in sed never contains a newline character after the last line*. In our case we know there are at most six lines in the pattern space, hence at most five newlines, and we used \n exactly five times. So no .* can consume a newline, which guarantees each fragment of the regular expression can only match one particular line of the cluster.
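The newline-counting argument can be checked on a three-line pattern space. This is a sketch assuming GNU sed, where . also matches a newline inside the pattern space; greedy_demo is a made-up name. With only one \n in the regex, the greedy first group swallows a newline; with exactly two, each group is confined to one line:

```shell
#!/bin/sh
greedy_demo() {
  # Pattern space after N;N is "a\nb\nc" (two newlines).
  # One \n in the regex: the first greedy group takes "a\nb",
  # so the output is "[a" and "b|c]" on two lines.
  printf '%s\n' a b c | sed 'N;N; s/\(.*\)\n\(.*\)/[\1|\2]/'
  # Two \n in the regex for two newlines in the pattern space:
  # no group can contain a newline, so the output is "[a|b|c]".
  printf '%s\n' a b c | sed 'N;N; s/\(.*\)\n\(.*\)\n\(.*\)/[\1|\2|\3]/'
}

greedy_demo
```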

Still, anchors can help. The command above would also delete a cluster ending with, say, notify-send-whatever, because notify-send only needs to match the start of the sixth line; a trailing $ is the right way to prevent this. No time other than 12:30 contains the string 12:30, but a search for 2:30 would also match 12:30, so in general a leading ^ is useful too. The improved command:

sed 'N;N;N;N;N; /^12:30\n.*\n.*\ntest notification\nnotification\nnotify-send$/d'
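Here is a sketch of the difference the anchors make. It assumes GNU sed; notify-send-whatever is a made-up appname, and the function names are hypothetical:

```shell
#!/bin/sh
# A cluster whose appname merely starts with notify-send.
odd_cluster() {
  printf '%s\n' '12:30' 'low' '' 'test notification' 'notification' 'notify-send-whatever'
}

# Unanchored: notify-send matches a prefix of the sixth line,
# so this cluster is (wrongly) deleted and nothing is printed.
delete_unanchored() {
  sed 'N;N;N;N;N; /12:30\n.*\n.*\ntest notification\nnotification\nnotify-send/d'
}

# Anchored: $ forces the sixth line to be exactly notify-send and ^ forces
# the pattern space to start with 12:30, so this cluster is kept.
delete_anchored() {
  sed 'N;N;N;N;N; /^12:30\n.*\n.*\ntest notification\nnotification\nnotify-send$/d'
}

odd_cluster | delete_unanchored   # prints nothing
odd_cluster | delete_anchored     # prints all six lines
```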

* This doesn't mean there can never be a newline character at the end of the pattern space. A newline at the end means there is a line right after it: the last line, which is empty. There is no newline after that line, so "never a newline character after the last line" still holds.
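The footnote's case can be seen directly. This is a sketch assuming GNU sed: the input is the line "a" followed by an empty line, so after N the pattern space ends with a newline that is followed only by the empty last line:

```shell
#!/bin/sh
# Input: "a" then an empty line. After N the pattern space is "a\n".
# \n$ matches because that newline is the last character of the pattern
# space; only the (empty) last line follows it.
printf 'a\n\n' | sed 'N; s/\n$/<NL>/'   # prints "a<NL>"
```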

  • Thank you so much!!! That was wonderfully thorough!
    – Barbarossa
    Commented Jan 9, 2021 at 14:23
