I have an input file:


Werkzeug==2.0.2 # https://github.com/pallets/werkzeug
ipdb==0.13.9  # https://github.com/gotcha/ipdb
psycopg2==2.9.1  # https://github.com/psycopg/psycopg2
watchgod==0.7  # https://github.com/samuelcolvin/watchgod

# Testing
# ------------------------------------------------------------------------------
mypy==0.910  # https://github.com/python/mypy
django-stubs==1.8.0  # https://github.com/typeddjango/django-stubs
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-sugar==0.9.4  # https://github.com/Frozenball/pytest-sugar
djangorestframework-stubs==1.4.0  # https://github.com/typeddjango/djangorestframework-stubs

# Documentation
# ------------------------------------------------------------------------------
sphinx==4.2.0  # https://github.com/sphinx-doc/sphinx
sphinx-autobuild==2021.3.14 # https://github.com/GaretJax/sphinx-autobuild

# Code quality
# ------------------------------------------------------------------------------
flake8==3.9.2  # https://github.com/PyCQA/flake8
flake8-isort==4.0.0  # https://github.com/gforcada/flake8-isort
coverage==6.0.2  # https://github.com/nedbat/coveragepy
black==21.9b0  # https://github.com/psf/black
pylint-django==2.4.4  # https://github.com/PyCQA/pylint-django
pylint-celery==0.3  # https://github.com/PyCQA/pylint-celery
pre-commit==2.15.0  # https://github.com/pre-commit/pre-commit

# Django
# ------------------------------------------------------------------------------
factory-boy==3.2.0  # https://github.com/FactoryBoy/factory_boy

django-debug-toolbar==3.2.2  # https://github.com/jazzband/django-debug-toolbar
django-extensions==3.1.3  # https://github.com/django-extensions/django-extensions
django-coverage-plugin==2.0.1  # https://github.com/nedbat/django_coverage_plugin
pytest-django==4.4.0  # https://github.com/pytest-dev/pytest-django

I am trying to extract the part before the # on every line beginning with pytest, using this command:

sed -nE "s/(^pytest.+)#/\1/p" ./requirements/local.txt

Expected output

pytest==6.2.5  
pytest-sugar==0.9.4  
pytest-django==4.4.0  

Actual output

pytest==6.2.5   https://github.com/pytest-dev/pytest
pytest-sugar==0.9.4   https://github.com/Frozenball/pytest-sugar
pytest-django==4.4.0   https://github.com/pytest-dev/pytest-django

How can I get the expected output?

These references have not helped solve this particular problem.

  • You're only matching up to the #. Everything after it is outside the matched text, so it is not changed and is printed as-is. The easy fix is to include everything after the # in your RE too.
    – Shawn
    Commented Jan 14, 2022 at 10:17
  • Right! Changing to sed -nE "s/(^pytest.+)#.*/\1/p" ./requirements/local.txt solved the problem. Thanks! Commented Jan 14, 2022 at 10:22
  • "How to extract text from a string using sed?" and "How to use sed to extract substring", and probably more, are quite helpful. Commented Jan 14, 2022 at 10:30
  • Changing to sed -nE "s/(^pytest.+)#.*/\1/p" may have solved your problem for this particular input file, but that sed command will still have issues when a line 1) contains no # character or 2) contains more than one # character. Commented Jan 14, 2022 at 16:12
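Why the first command leaves the URL behind: s/// replaces only the text the regex matched, and anything outside the match is printed unchanged. A small sketch, assuming a POSIX shell and a sed that supports -E (GNU and BSD sed both do), runs both patterns on one line copied from the input file. The function names are made up for the illustration.

```shell
# Sample line copied from the requirements file in the question.
line='pytest==6.2.5  # https://github.com/pytest-dev/pytest'

# Original pattern: the match ends at "#", so " https://..." lies outside the
# match and is printed unchanged after the \1 replacement.
original_cmd() { printf '%s\n' "$line" | sed -nE 's/(^pytest.+)#/\1/p'; }

# Fixed pattern: ".*" extends the match to the end of the line, so the whole
# comment is replaced by \1 as well.
fixed_cmd() { printf '%s\n' "$line" | sed -nE 's/(^pytest.+)#.*/\1/p'; }

original_cmd   # prints "pytest==6.2.5   https://github.com/pytest-dev/pytest"
fixed_cmd      # prints "pytest==6.2.5  " (two trailing blanks remain)
```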

6 Answers

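You are missing a pattern for the text after the #, so the comment is never part of the match and is printed unchanged. Adding .* after the # should solve it: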

$ sed -nE "s/(^pytest.+)#.*/\1/p" ./requirements/local.txt
  • Even though the OP's example doesn't show it, a file like this may or may not have a comment starting with # on every line. This command assumes the # is always there, so a line containing just pytest==124 won't be matched.
    – anubhava
    Commented Jan 14, 2022 at 10:42
  • N.B. This will also capture the whitespace before the #; perhaps sed -nE 's/(^pytest\S+)\s*#.*/\1/p' file?
    – potong
    Commented Jan 14, 2022 at 13:50
  • In addition to @anubhava's comment, your regex will capture part of the comment if the line contains more than one # character. Commented Jan 14, 2022 at 15:44
  • For a more generic approach, please check the other answers. Mine simply pointed out what he missed in his very particular case.
    – Mihai
    Commented Jan 16, 2022 at 21:47
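The two caveats raised in these comments can be reproduced with a few made-up lines. In the sketch below, the pytest-mock and pytest-cov entries (and their versions) are hypothetical, chosen only to trigger each case: a line without # is silently dropped, and with two # characters the greedy .+ runs to the last one.

```shell
# Hypothetical input: only the first line comes from the question; the
# pytest-mock and pytest-cov lines are made up to trigger each caveat.
sample() {
  cat <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-mock==3.6.1
pytest-cov==3.0.0  # coverage # plugin
EOF
}

# The command from the answer. With -n, only lines where s/// succeeded are
# printed, so pytest-mock (no "#") disappears; on the pytest-cov line the
# greedy .+ stops at the LAST "#", keeping part of the comment.
sample | sed -nE 's/(^pytest.+)#.*/\1/p'
# prints "pytest==6.2.5  "
# and    "pytest-cov==3.0.0  # coverage "
```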

Using sed:

sed -nE 's/^(pytest[^=]*=[^[:blank:]]*).*/\1/p' file

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

However, a grep -o solution would be even simpler:

grep -o '^pytest[^=]*=[^[:blank:]]*' file

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

Explanation:

  • ^pytest: Match pytest at the start
  • [^=]*: Match 0 or more of any character except =
  • =: Match a =
  • [^[:blank:]]*: Match 0 or more non-blank characters (anything except a space or tab)
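Because this pattern never looks for #, it also handles lines with no comment or with several # characters. A quick check, where the pytest-mock and pytest-cov lines are made-up examples and the other two come from the question's file:

```shell
# Hypothetical input: the pytest-mock and pytest-cov lines are made up; the
# other two lines come from the question's file.
sample() {
  cat <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-mock==3.6.1
pytest-cov==3.0.0  # coverage # plugin
django-debug-toolbar==3.2.2  # https://github.com/jazzband/django-debug-toolbar
EOF
}

# -o prints only the matched part of each line: "pytest", anything up to the
# first "=", that "=", then the run of non-blank characters after it.
sample | grep -o '^pytest[^=]*=[^[:blank:]]*'
# pytest==6.2.5
# pytest-mock==3.6.1
# pytest-cov==3.0.0
```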

A sed one-liner would be:

sed -e '/^pytest/!d' -e 's/[[:blank:]]*#.*//' file

The first expression deletes lines which don't begin with pytest. The second one deletes the comment portion (including blanks before the #), if any.
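A quick check of this one-liner against made-up edge cases (the pytest-mock and pytest-cov lines are hypothetical). The second expression only removes something when a # is present, so uncommented lines pass through, and its leftmost match begins at the blanks before the first #, so a second # inside the comment does no harm.

```shell
# Hypothetical input: the pytest-mock and pytest-cov lines are made up;
# black==21.9b0 comes from the question and should be dropped.
sample() {
  cat <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-mock==3.6.1
pytest-cov==3.0.0  # coverage # plugin
black==21.9b0  # https://github.com/psf/black
EOF
}

# 1st expression: delete every line that does not start with "pytest".
# 2nd expression: the leftmost match of [[:blank:]]*# begins at the blanks
# before the FIRST "#", so the whole comment goes; lines without "#" are
# left untouched and still printed (no -n here).
sample | sed -e '/^pytest/!d' -e 's/[[:blank:]]*#.*//'
# pytest==6.2.5
# pytest-mock==3.6.1
# pytest-cov==3.0.0
```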

1st solution: With awk, you could use the match function. This was written and tested in GNU awk but should work in any awk. In short: match looks for the regex ^pytest[^ ]*, i.e. pytest at the start of the line up to the first space, and substr prints the matched text using the RSTART and RLENGTH values that match sets.

awk 'match($0,/^pytest[^ ]*/){print substr($0,RSTART,RLENGTH)}' Input_file

2nd solution: With GNU awk, you could instead make the pattern the record separator (RS) and print RT, the text that matched RS, after stripping any leading newlines:

awk -v RS='(^|\n)pytest[^ ]*' 'RT{sub(/^\n*/,"",RT);print RT}' Input_file
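One caveat with [^ ]* in the 1st solution: it stops only at a space, so if a tab separates the version from the comment, the tab and the # end up in the match. The sketch below uses a hypothetical tab-separated line; the [^ \t]* variant is my change, not part of the answer above, and assumes your awk accepts \t inside a bracket expression. The function names are made up for the illustration.

```shell
# Hypothetical input line with a TAB (not a space) before the comment.
sample() { printf 'pytest==6.2.5\t# https://github.com/pytest-dev/pytest\n'; }

# The answer's pattern: [^ ]* stops only at a space, so the TAB and the "#"
# are included in the match.
original_cmd() {
  sample | awk 'match($0,/^pytest[^ ]*/){print substr($0,RSTART,RLENGTH)}'
}

# Variant (not part of the answer): exclude TAB as well, assuming your awk
# accepts \t inside a bracket expression.
tab_aware_cmd() {
  sample | awk 'match($0,/^pytest[^ \t]*/){print substr($0,RSTART,RLENGTH)}'
}

original_cmd    # prints "pytest==6.2.5", a TAB, then "#"
tab_aware_cmd   # prints "pytest==6.2.5"
```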

As an alternative with awk, you could also set the field separator to # preceded by optional blanks, and print the first field if the line starts with pytest:

awk -F"[[:blank:]]*#" '/^pytest/ {print $1}' ./requirements/local.txt

Output

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

If the # is not always present, you could also make the match more specific to match the number, and then print the first field:

awk '/^pytest[^[:blank:]]*==[0-9]+(\.[0-9]+)*/ {print $1}' file
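For comparison, the -F"[[:blank:]]*#" version above already copes with both edge cases from the question's comments: a line without # is a single field, so $1 is the whole line, and with several # characters the line is split at the first one. A sketch with made-up lines (the pytest-mock and pytest-cov entries are hypothetical):

```shell
# Hypothetical input: pytest-mock and pytest-cov are made-up lines; the
# pytest-django line comes from the question.
sample() {
  cat <<'EOF'
pytest-mock==3.6.1
pytest-cov==3.0.0  # coverage # plugin
pytest-django==4.4.0  # https://github.com/pytest-dev/pytest-django
EOF
}

# FS is "optional blanks followed by #". The first separator ends $1, and a
# line with no "#" is a single field, so $1 is then the whole line.
sample | awk -F'[[:blank:]]*#' '/^pytest/ {print $1}'
# pytest-mock==3.6.1
# pytest-cov==3.0.0
# pytest-django==4.4.0
```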

Using sed:

$ sed -n '/^pytest/s/#.*//p' input_file
pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0
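Note that with -n and the p flag, a line is printed only when the substitution succeeded, so a pytest line without a # is dropped, and the blanks before the # are kept (they are invisible in the output above). A sketch contrasting it with a variant (my change, not the answer's) that strips the blanks and prints every pytest line; the pytest-mock line is hypothetical:

```shell
# Hypothetical input: the pytest-mock line is made up and has no comment.
sample() {
  cat <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-mock==3.6.1
EOF
}

# The answer's command: with -n, "p" prints a line only if s/// succeeded,
# so pytest-mock is dropped and the two blanks before "#" are kept.
sample | sed -n '/^pytest/s/#.*//p'
# prints "pytest==6.2.5  "

# Variant (not the answer's): strip the blanks too, then print every
# pytest line whether or not it had a comment.
sample | sed -n '/^pytest/{s/[[:blank:]]*#.*//;p;}'
# prints "pytest==6.2.5" and "pytest-mock==3.6.1"
```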
