How do I output only a capture group with sed

Question

I have an input file


Werkzeug==2.0.2 # https://github.com/pallets/werkzeug
ipdb==0.13.9  # https://github.com/gotcha/ipdb
psycopg2==2.9.1  # https://github.com/psycopg/psycopg2
watchgod==0.7  # https://github.com/samuelcolvin/watchgod

# Testing
# ------------------------------------------------------------------------------
mypy==0.910  # https://github.com/python/mypy
django-stubs==1.8.0  # https://github.com/typeddjango/django-stubs
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-sugar==0.9.4  # https://github.com/Frozenball/pytest-sugar
djangorestframework-stubs==1.4.0  # https://github.com/typeddjango/djangorestframework-stubs

# Documentation
# ------------------------------------------------------------------------------
sphinx==4.2.0  # https://github.com/sphinx-doc/sphinx
sphinx-autobuild==2021.3.14 # https://github.com/GaretJax/sphinx-autobuild

# Code quality
# ------------------------------------------------------------------------------
flake8==3.9.2  # https://github.com/PyCQA/flake8
flake8-isort==4.0.0  # https://github.com/gforcada/flake8-isort
coverage==6.0.2  # https://github.com/nedbat/coveragepy
black==21.9b0  # https://github.com/psf/black
pylint-django==2.4.4  # https://github.com/PyCQA/pylint-django
pylint-celery==0.3  # https://github.com/PyCQA/pylint-celery
pre-commit==2.15.0  # https://github.com/pre-commit/pre-commit

# Django
# ------------------------------------------------------------------------------
factory-boy==3.2.0  # https://github.com/FactoryBoy/factory_boy

django-debug-toolbar==3.2.2  # https://github.com/jazzband/django-debug-toolbar
django-extensions==3.1.3  # https://github.com/django-extensions/django-extensions
django-coverage-plugin==2.0.1  # https://github.com/nedbat/django_coverage_plugin
pytest-django==4.4.0  # https://github.com/pytest-dev/pytest-django

and I am trying to extract the parts before the # for every line beginning with pytest using this command

sed -nE "s/(^pytest.+)#/\1/p" ./requirements/local.txt

Expected output

pytest==6.2.5  
pytest-sugar==0.9.4  
pytest-django==4.4.0

Actual output

pytest==6.2.5   https://github.com/pytest-dev/pytest
pytest-sugar==0.9.4   https://github.com/Frozenball/pytest-sugar
pytest-django==4.4.0   https://github.com/pytest-dev/pytest-django

Any help getting the expected output?

These refs have not helped solve this particular problem

You're only matching up to the #. Nothing after it is part of the matched text and thus not changed and thus printed out... easy fix is to include everything after the # in your RE too. — Shawn, Commented Jan 14, 2022 at 10:17
Right! changing to sed -nE "s/(^pytest.+)#.*/\1/p" ./requirements/local.txt solved the problem. Thanks — Kwesi Smart, Commented Jan 14, 2022 at 10:22
How to extract text from a string using sed? and How to use sed to extract substring and probably more are quite helpful. — Wiktor Stribiżew, Commented Jan 14, 2022 at 10:30
Changing to sed -nE "s/(^pytest.+)#.*/\1/p" may have solved your problem for this particular input file, but that sed command will still have issues: when 1) there is no # character, 2) there are more than one # characters, in the line. — M. Nejat Aydin, Commented Jan 14, 2022 at 16:12
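A minimal sketch of the point made in the comments above, using one line from the question's file: `s///` replaces only the text the regex matched, so anything after the match stays in the output. The variable names are only for illustration.

```shell
# One line taken from the question's input file.
line='pytest==6.2.5  # https://github.com/pytest-dev/pytest'

# The regex ends at '#', so only "pytest==6.2.5  #" is replaced by group 1.
# The URL after the '#' was never part of the match and is left in place.
partial=$(printf '%s\n' "$line" | sed -nE 's/(^pytest.+)#/\1/p')

# Adding '.*' extends the match to the end of the line, so the whole
# line is replaced by group 1.
full=$(printf '%s\n' "$line" | sed -nE 's/(^pytest.+)#.*/\1/p')

# Brackets make the trailing blanks visible.
printf '[%s]\n' "$partial" "$full"
# [pytest==6.2.5   https://github.com/pytest-dev/pytest]
# [pytest==6.2.5  ]
```

The second result still carries the two blanks that preceded the `#`, because they are inside the capture group.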

Mihai · Accepted Answer · 2022-01-14 10:20:16Z

3

Your regex stops at the #, so the text after it is never matched and is left in the output. Extending the regex past the # should solve it:

$ sed -nE "s/(^pytest.+)#.*/\1/p" ./requirements/local.txt

answered Jan 14, 2022 at 10:20


1

Even though the OP's examples don't show it, a property file may or may not have a comment section starting with # on each line. This command assumes a # will always be there, so a line containing just pytest==124 won't be matched.
– anubhava
Commented Jan 14, 2022 at 10:42
1

N.B. This will also capture the white space before the #, perhaps sed -nE 's/(^pytest\S+)\s*#.*/\1/p' file?
– potong
Commented Jan 14, 2022 at 13:50
1

In addition to @anubhava's comment, your regex will capture part of the comment if there is more than one # character in the line.
– M. Nejat Aydin
Commented Jan 14, 2022 at 15:44
For a more generic approach, please check the other answers. Mine simply pointed out what he missed in his very particular case.
– Mihai
Commented Jan 16, 2022 at 21:47
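The edge cases raised in these comments can be reproduced with a few made-up lines. This is only a sketch: `pytest-foo` and `pytest-bar` are hypothetical entries, not part of the question's file.

```shell
# Sample input. The second and third lines are hypothetical:
# one has no comment, the other has two '#' characters.
input='pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-foo==1.0
pytest-bar==2.0  # first # second'

# The command from this answer, reading the sample from stdin.
out=$(printf '%s\n' "$input" | sed -nE 's/(^pytest.+)#.*/\1/p')

# Brackets make trailing blanks visible.
printf '%s\n' "$out" | sed 's/.*/[&]/'
# [pytest==6.2.5  ]
# [pytest-bar==2.0  # first ]
```

The line without a `#` is dropped because the substitution never succeeds on it, and the greedy `.+` runs up to the last `#`, so part of the comment ends up in the capture group.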


anubhava · Accepted Answer · 2022-01-14 10:30:47Z

3

Using sed:

sed -nE 's/^(pytest[^=]*=[^[:blank:]]*).*/\1/p' file

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

However a grep -o solution would be even simpler:

grep -o '^pytest[^=]*=[^[:blank:]]*' file

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

Explanation:

^pytest: Match pytest at the start
[^=]*: Match 0 or more of any character except =
=: Match a =
[^[:blank:]]*: Match 0 or more characters that are not a space or tab
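As a rough check of the pattern explained above, the sketch below runs both commands from this answer on a small sample read from stdin. The `pytest-foo` line is hypothetical and was added to show a line with no comment.

```shell
# Sample input. The pytest-foo line is hypothetical (no comment at all).
input='pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-foo==1.0
mypy==0.910  # https://github.com/python/mypy'

# sed: replace the whole line with the captured "name==version" part.
with_sed=$(printf '%s\n' "$input" | sed -nE 's/^(pytest[^=]*=[^[:blank:]]*).*/\1/p')

# grep -o: print only the part of the line that matched.
with_grep=$(printf '%s\n' "$input" | grep -o '^pytest[^=]*=[^[:blank:]]*')

printf '%s\n' "$with_sed"
# pytest==6.2.5
# pytest-foo==1.0
```

Both commands produce the same two lines here. Note that the pattern relies on a space or tab separating the version from the comment, since `[^[:blank:]]*` only stops at a blank.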

edited Jan 14, 2022 at 10:30

answered Jan 14, 2022 at 10:20


M. Nejat Aydin · Accepted Answer · 2022-01-14 15:48:09Z

2

A sed one-liner would be:

sed -e '/^pytest/!d' -e 's/[[:blank:]]*#.*//' file

The first expression deletes lines which don't begin with pytest. The second one deletes the comment portion (including blanks before the #), if any.
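To illustrate the two expressions described above, here is a small sketch run on made-up input. `pytest-foo` and `pytest-bar` are hypothetical lines covering the no-comment and multiple-`#` cases.

```shell
# Sample input. The pytest-foo and pytest-bar lines are hypothetical.
input='mypy==0.910  # https://github.com/python/mypy
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-foo==1.0
pytest-bar==2.0  # first # second'

# 1st expression: delete every line that does not start with "pytest".
# 2nd expression: strip from the blanks before the first '#' to end of line.
out=$(printf '%s\n' "$input" | sed -e '/^pytest/!d' -e 's/[[:blank:]]*#.*//')

printf '%s\n' "$out"
# pytest==6.2.5
# pytest-foo==1.0
# pytest-bar==2.0
```

Because the surviving lines are printed whether or not the substitution succeeds, a line without a comment is kept unchanged.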

edited Jan 14, 2022 at 15:48

answered Jan 14, 2022 at 11:59


RavinderSingh13 · Accepted Answer · 2022-01-14 10:24:21Z

1st solution: With awk, you could try the following. It uses awk's match function; it was written and tested in GNU awk and should work in any awk. Briefly: match applies the regex ^pytest[^ ]* to match from the leading pytest up to the first space, and substr prints the matched text.

awk 'match($0,/^pytest[^ ]*/){print substr($0,RSTART,RLENGTH)}' Input_file
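For readers unfamiliar with match, this small sketch prints the RSTART and RLENGTH values that awk sets for one line from the question's file, alongside the substring they select. The variable names are only for illustration.

```shell
# One line taken from the question's input file.
line='pytest-sugar==0.9.4  # https://github.com/Frozenball/pytest-sugar'

# match() returns non-zero on success and sets RSTART (1-based start
# position) and RLENGTH (length of the matched text).
info=$(printf '%s\n' "$line" |
  awk 'match($0, /^pytest[^ ]*/) { print RSTART, RLENGTH, substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$info"
# 1 19 pytest-sugar==0.9.4
```

Since `[^ ]*` stops only at a space, a tab between the version and the comment would be included in the match.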

2nd solution: Using GNU awk, try the following, which makes use of its RS variable.

awk -v RS='(^|\n)pytest[^ ]*' 'RT{sub(/^\n*/,"",RT);print RT}' Input_file

The fourth bird · Accepted Answer · 2022-01-14 11:12:42Z

1

As an alternative using awk, you might also set the field separator to # preceded by optional blanks, and print the first field if the line starts with pytest:

awk -F"[[:blank:]]*#" '/^pytest/ {print $1}' ./requirements/local.txt

Output

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

If the # is not always present, you could also make the match more specific to match the number, and then print the first field:

awk '/^pytest[^[:blank:]]*==[0-9]+(\.[0-9]+)*/ {print $1}' file

edited Jan 14, 2022 at 11:12

answered Jan 14, 2022 at 10:40


sseLtaH · Accepted Answer · 2022-01-14 13:37:51Z

1

Using sed

$ sed -n '/^pytest/s/#.*//p' input_file
pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

answered Jan 14, 2022 at 13:37

