How to delete the rest of each line after a certain pattern or a string in a file?

Question

Suppose I have a list of URLs in a text file:

google.com/funny
unix.stackexchange.com/questions
isuckatunix.com/ireallydo

I want to delete everything that comes after '.com'.

Expected Results:

google.com
unix.stackexchange.com
isuckatunix.com

I tried

sed 's/.com*//' file.txt

but it deleted .com as well.

Is there a specific reason for which you want to search for .com only instead of removing everything after and including the first / character? What if you had a URL like en.wikipedia.org/wiki/Ubuntu in your list? — Byte Commander, Commented Feb 1, 2016 at 18:55
@ByteCommander The answer below replaces doesn't remove. I am in need of removing everything after a pattern but keeping the pattern. — John, Commented Mar 14, 2021 at 7:59

Jeff Schaller · Accepted Answer · 2016-01-25 15:16:56Z

To explicitly delete everything that comes after ".com", just tweak your existing sed solution to replace ".com(anything)" with ".com":

sed 's/\.com.*/.com/' file.txt

I tweaked your regex to escape the first period; otherwise it would have matched something like "thisiscommon.com/something".

Note that you may want to further anchor the ".com" pattern with a trailing forward-slash so that you don't accidentally trim something like "sub.com.domain.com/foo":

sed 's/\.com\/.*/.com/' file.txt
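
To compare how these patterns behave, here is a minimal Python sketch using re.sub as a stand-in for sed's s command. The regexes are translated from sed's basic syntax to Python's, and the helper name strip_after_com is made up for illustration; it is not part of either answer.

```python
import re

def strip_after_com(line, pattern=r"\.com.*"):
    """Replace the first match of `pattern` with '.com', like sed's s/// without the g flag."""
    return re.sub(pattern, ".com", line, count=1)

# The asker's original pattern: "." matches any character and "m*" means
# zero or more "m", so the match is ".com" itself, which is then deleted.
original_attempt = re.sub(r".com*", "", "google.com/funny", count=1)

if __name__ == "__main__":
    print(original_attempt)                                              # google/funny
    print(strip_after_com("google.com/funny"))                           # google.com
    # Unescaped period: "scom" inside "thisiscommon" matches first.
    print(strip_after_com("thisiscommon.com/something", r".com.*"))      # thisi.com
    # Escaped period, but no trailing slash: cuts at the first ".com".
    print(strip_after_com("sub.com.domain.com/foo"))                     # sub.com
    # Anchored with a trailing slash: only ".com/" counts.
    print(strip_after_com("sub.com.domain.com/foo", r"\.com/.*"))        # sub.com.domain.com
```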

αғsнιη · Accepted Answer · 2023-02-10 04:27:15Z

You can use awk's field separator (-F) in the following way:

$ cat file
google.com/funny
unix.stackexchange.com/questions
isuckatunix.com/ireallydo

$ <file awk -F '\\.com' '{print $1".com"}'
google.com
unix.stackexchange.com
isuckatunix.com

Explanation:

NAME
       awk - pattern scanning and processing language


-F fs
       --field-separator fs
              Use fs for the input field separator (the value of the FS predefined variable).

Since you want to delete everything after .com, -F '\\.com' splits each line on .com, so print $1 outputs only the part before the first .com. Appending ".com" to $1 then puts .com back and gives the expected output.
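
The same split-and-reattach idea can be sketched in Python. This is an illustration only: keep_through_com is a hypothetical helper name, and re.split with maxsplit=1 is used as a rough stand-in for awk's field splitting.

```python
import re

def keep_through_com(line):
    """Return the text before the first literal '.com', with '.com' appended again."""
    # Split once on a literal ".com"; the first piece plays the role of awk's $1.
    first_field = re.split(r"\.com", line, maxsplit=1)[0]
    return first_field + ".com"

if __name__ == "__main__":
    for url in ["google.com/funny", "unix.stackexchange.com/questions"]:
        print(keep_through_com(url))
```

One consequence of this approach is that a line with no .com in it still gets .com appended, because the whole line becomes the first field.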

edited Feb 10, 2023 at 4:27

αғsнιη

answered Jan 25, 2016 at 12:52

Pandya

Why not just / as FS and take the first field?
– heemayl
Commented Jan 25, 2016 at 12:53
@heemayl unix.stackexchange.com/questions/257514/…
– Pandya
Commented Jan 25, 2016 at 12:57

Wildcard · Accepted Answer · 2016-01-25 19:15:12Z

The best tool for non-interactive in-place file editing is ex.

ex -sc '%s/\(\.com\).*/\1/ | x' file.txt

If you've used vi and if you've ever typed a command that begins with a colon : you've used an ex command. Of course many of the more advanced or "fancy" commands you can execute this way are Vim extensions (e.g. :bufdo) and are not defined in the POSIX specifications for ex, but those specifications allow for a truly astonishing degree of power and flexibility in non-visual text editing (whether interactive or automated).

The command above has several parts.

-s enables silent mode to prepare ex for batch use. (It suppresses output messages and the like.)

-c specifies the command to execute once the file (file.txt, in this case) is opened in a buffer.

% is an address specifier equivalent to 1,$—it means that the following command is applied to all lines of the buffer.

s is the substitute command that you are likely familiar with already. It is commonly used in vi and has essentially identical features to the s command of sed, though some of the advanced regex features may vary by implementation. In this case, everything from ".com" to the end of the line is replaced with just ".com".

The vertical bar separates sequential commands to be executed. In many (most) ex implementations you can also use an additional -c option, like so:

ex -sc '%s/\(\.com\).*/\1/' -c x file.txt

However, this is not required by POSIX.

The x command exits, after writing any changes to the file. Unlike wq which means "write and quit", x only writes to the file if the buffer has been edited. Thus if your file is unaltered, the timestamp will be preserved.
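
As a rough analogue of that "write only if the buffer changed" behaviour, here is a minimal Python sketch. It does not show how ex works internally; truncate_after_com is a hypothetical helper, and it reads the whole file into memory, which assumes the file is reasonably small.

```python
import re
from pathlib import Path

def truncate_after_com(path):
    """Cut each line after its first '.com'; return True only if the file was rewritten."""
    file = Path(path)
    original = file.read_text()
    # "." does not match a newline, so ".*" stops at the end of each line.
    edited = re.sub(r"(\.com).*", r"\1", original)
    if edited == original:
        return False  # nothing changed, so the file and its timestamp are left alone
    file.write_text(edited)
    return True
```

Calling it a second time on the same file returns False, since no line has anything left after .com.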

It doesn't edit in-place. At least, it doesn't any more than GNU sed's bogus -i does. It reads/writes to on-disk buffers. See for yourself with ex -r and the preserve command. — mikeserv, Commented Feb 1, 2016 at 16:59

Sergiy Kolodyazhnyy · Accepted Answer · 2016-01-25 19:57:14Z

A very quick, simple, and dirty Python way:

#!/usr/bin/env python3
import sys

# Print the part of each line that comes before the first "/"
with open(sys.argv[1]) as file:
    for line in file:
        print(line.rstrip("\n").split("/")[0])

Sample run

skolodya@ubuntu:$ chmod +x removeStrings.py
skolodya@ubuntu:$ ./removeStrings.py strings.txt
google.com
unix.stackexchange.com
isuckatunix.com

skolodya@ubuntu:$ cat strings.txt
google.com/funny
unix.stackexchange.com/questions
isuckatunix.com/ireallydo

answered Jan 25, 2016 at 19:57

Sergiy Kolodyazhnyy

It works, but it does not care about .com; it just removes everything starting with the first / in the line (which is, in my opinion, even the better approach!).
– Byte Commander
Commented Feb 1, 2016 at 18:50
@ByteCommander Exactly right! If the domain name ends in .net, the other approaches wouldn't delete the part that comes after the domain and extension, so it's safer to use / as the separator.
– Sergiy Kolodyazhnyy
Commented Feb 1, 2016 at 18:53

AdminBee · Accepted Answer · 2023-10-05 12:15:02Z

Delete everything after a string

sed 's/\.com.*/.com/'

example:

sed 's/\.com.*/.com/' filename >> filename

Delete everything before a certain word

sed 's/^.*can/can/' filename >> filename

edited Oct 5, 2023 at 12:15

AdminBee

answered Oct 4, 2023 at 21:12

bennie1

The basic solution in this answer duplicates another answer. You added some examples, but they are flawed: appending to filename while processing filename will cause the file to grow and grow until "no space left", unless filename is small enough that sed reads the whole of it in one chunk, and you should never count on that. This is why I voted down.
– Kamil Maciorowski
Commented Oct 15, 2023 at 9:00
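
One common way to avoid reading and writing the same file at once is to write the result to a temporary file and swap it into place afterwards. The following Python sketch illustrates that idea under some assumptions: rewrite_in_place is a hypothetical helper, the temporary file is created in the same directory as the original, and the original file's permissions and ownership are not carried over.

```python
import os
import re
import tempfile

def rewrite_in_place(path):
    """Truncate each line after '.com', writing via a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        for line in src:
            # ".*" stops before the trailing newline, so line endings are kept.
            tmp.write(re.sub(r"\.com.*", ".com", line, count=1))
    # Swap the finished copy over the original only after all input has been read.
    os.replace(tmp.name, path)
```

Because the original is never opened for writing while it is being read, the output cannot feed back into the input.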
