
I'm trying to practice regex with sed on Linux. I've got these lines:

│      .--.     3..6 °C        │  _ /"".-.     6..8 °C        │  _ /"".-.     2..6 °C        │  _ /"".-.     0..4 °C        │
│  _ /"".-.     -2..+3 °C      │  _ /"".-.     1..5 °C        │   ,\_(   ).   1..4 °C        │   ,\_(   ).   -1..+2 °C      │
│     (   ).    -1..+1 °C      │     (   ).    -2..+2 °C      │     (   ).    -4..+2 °C      │     (   ).    -4..+2 °C      │

How can I extract only the numbers and their signs, every single one of them? I'm sloppy with groups and regexes, so any tip and explanation would help. I have this regex, which narrows the problem down, but I still can't extract every single match. I hope it helps.

sed -n -E "/[+-]?[0-9]+\.{2}[+-]?[0-9]+/p"

For the first line I want the output to be 3,6 6,8 2,6 0,4, and for the second and third lines:

-2,+3 1,5 1,4 -1,+2

-1,+1 -2,+2 -4,+2 -4,+2

  • You say you want to extract only numbers, yet your regex tries to match °C.
    – Quasímodo
    Commented Mar 22, 2020 at 15:05
  • Are you currently getting only the first match per line? If so, you need to add the 'g' flag to the substitution. And to get only the sign and the numbers, try enclosing those parts in parentheses (unescaped with -E, escaped without it), e.g. ([+-]?[0-9]+), to store those parts separately. Not sure if that's what you meant, though.
    – Zwiers
    Commented Mar 22, 2020 at 15:13
  • @gjzwiers yeah, that's what I meant but I couldn't get it to work. Commented Mar 22, 2020 at 15:17
  • Ok, just wanted to make sure :) Now, I think a simple alternative would be grep -Eo "[-+]*[0-9]+\.\.[-+]*[0-9]+", but since that does not produce the output formatted in the way you asked for, I just leave it as a comment.
    – Quasímodo
    Commented Mar 22, 2020 at 16:28
  • @Quasímodo oh, it's OK. I just need the format to be easy to manipulate; the one I posted was only a placeholder. Thank you. Too bad I can't mark it as an answer. Commented Mar 22, 2020 at 21:14

1 Answer


This might work for you (GNU sed):

sed -E 's/([+-]?[0-9]+)\.\.([+-]?[0-9]+)/\n\1,\2\n/g;s/^[^0-9+-].*$/ /mg;s/^ |\n| $//g' file

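Surround each valid string (a range such as -2..+3) with newlines, converting .. to , at the same time.

Replace every remaining piece of text, i.e. each piece that does not start with a digit or a sign, with a single space.

Remove the space at the front and at the end of the line, along with the introduced newlines.

N.B. The second substitution uses the m flag, so that ^ and $ also match at each embedded newline instead of only at the ends of the pattern space.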
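The same three substitutions can be spread out as a commented script, which may make each step easier to follow. This is only a sketch: the function name extract_ranges and the shortened sample line are assumptions for illustration, and it relies on GNU sed for \n in the replacement and for the m flag.

```shell
# extract_ranges is a hypothetical helper name; it reads lines on stdin.
extract_ranges() {
  sed -E '
    # 1. Wrap every range in newlines and turn ".." into ",".
    s/([+-]?[0-9]+)\.\.([+-]?[0-9]+)/\n\1,\2\n/g
    # 2. With the m flag, ^ and $ match at each embedded newline, so every
    #    piece that does not start with a digit or sign becomes one space.
    s/^[^0-9+-].*$/ /mg
    # 3. Drop the leading space, the trailing space and all newlines.
    s/^ |\n| $//g
  '
}

# Shortened copy of the second line from the question.
printf '%s\n' '│  _ /"".-.     -2..+3 °C      │  _ /"".-.     1..5 °C        │' | extract_ranges
# prints: -2,+3 1,5
```

Note that step 2 only looks at the first character of each piece, so leftover text that happens to begin with a digit or a sign would survive; that does not occur in the sample data.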

An alternative is to work through the line from beginning to end, removing non-valid characters:

sed -E 's/^/\n/;:a;s/\n([+-]?[0-9]+)\.\.([+-]?[0-9]+)/\1,\2 \n/;ta;s/\n./\n/;ta;s/ ?\n//' file
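Written out one command per line, the loop in that one-liner looks roughly like this. Again a sketch for GNU sed: extract_ranges_loop is a hypothetical name, and the sample is a shortened copy of the third line from the question.

```shell
# extract_ranges_loop is a hypothetical helper name; it reads lines on stdin.
extract_ranges_loop() {
  sed -E '
    # Put a newline at the start of the line; it acts as a moving marker.
    s/^/\n/
    :a
    # If a range follows the marker, rewrite it as "x,y " and move the
    # marker past it, then loop.
    s/\n([+-]?[0-9]+)\.\.([+-]?[0-9]+)/\1,\2 \n/
    ta
    # Otherwise delete the single character after the marker and loop.
    s/\n./\n/
    ta
    # Nothing is left after the marker: remove it and the trailing space.
    s/ ?\n//
  '
}

printf '%s\n' '│     (   ).    -1..+1 °C      │     (   ).    -2..+2 °C      │' | extract_ranges_loop
# prints: -1,+1 -2,+2
```

Because the marker only ever moves forward, each character of the line is inspected once, and a range is kept only when it starts exactly at the marker.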