6

I finally figured out how to append text to the end of each line in a file:

perl -pe 's/$/addthis/' myfile.txt

However, as I'm trying to learn Perl for frequent regex use, I can't figure out why the following Perl command adds the text 'addthis' to both the end and the start of each line:

perl -pe 's/$/addthis/g' myfile.txt

I thought that '$' matched the end of a line no matter what modifier was used for the regex match, but I guess this is wrong?

  • Did you read the documentation for $?
    – ysth
    Commented Feb 18, 2013 at 16:39

3 Answers

11

Summary: For what you're doing, drop the /g so it only matches before the newline. The /g is telling it to match before the newline and at the end of the string (after the newline).

Without the /m modifier, $ will match either before a newline (if it occurs at the end of the string) or at the end of the string. For instance, with both "foo" and "foo\n", the $ would match after foo. With "foo\nbar", though, it would match after bar, because the embedded newline isn't at the end of the string.

With the /g modifier, you're getting all the places that $ would match -- so

s/$/X/g;

would take a line like "foo\n" and turn it into "fooX\nX".

Sidebar: The /m modifier will allow $ to match before newlines that occur before the end of the string, so that

s/$/X/mg;

would convert "foo\nbar\n" into "fooX\nbarX\nX".

5

As Jim Davis pointed out, $ matches at the end of the string or just before a trailing \n character (and, with the /m option, before any \n). (See the Regular Expressions section of the perlre Perldoc page.) Using the g modifier allowed it to continue matching and find both positions.

Multi-line Perl regular expressions (i.e., Perl regular expressions applied to strings with the newline character in them, even if it only occurs once at the end of the line) cause all sorts of complications that most Perl programmers have trouble handling.

  • If you're reading in a file one line at a time, always use chomp before doing ANYTHING with that line. This would have solved your issue when using the g modifier.

  • Further issues can happen if you're reading files on Linux/Mac which came from Windows. In that case, you will have both the \r and \n characters. As I found out recently in attempting to debug a program, the \r character isn't removed by chomp. I now make sure I always open my text files for reading like this:

open my $file_handle, "<:crlf", $file...

This will automatically replace each \r\n pair with just \n if this is in fact a Windows file on a Linux/Mac system. If this is a regular Linux/Mac text file, it will do nothing. The other obvious solution is not to use Windows (rim shot!).

Of course, in your case, using chomp first would have done the following:

$ cat file
line one
line two
line three
line four
$ perl -pe 'chomp;s/$/addthis::/g' file
line oneaddthis::line twoaddthis::line threeaddthis::line fouraddthis::

The chomp removed the \n, so now you don't see it when the lines print out. Hmm...

$ perl -pe 'chomp;s/$/addthis/g;$_ .= "\n"' file
line oneaddthis
line twoaddthis
line threeaddthis
line fouraddthis

That works! And your one-liner is only mildly incomprehensible.


The other thing is to take a more modern approach that Damian Conway recommends in Chapter 12 of his book Perl Best Practices:

Use \A and \z as string boundary anchors.

Even if you don’t adopt the previous practice of always using /m, using ^ and $ with their default meanings is a bad idea. Sure, you know what ^ and $ actually mean in a Perl regex1. But will those who read or maintain your code know? Or is it more likely that they will misinterpret those metacharacters in the ways described earlier? Perl provides markers that always—and unambiguously—mean “start of string” and “end of string”: \A and \z (capital A, but lowercase z). They mean “start/end of string” regardless of whether /m is active. They mean “start/end of string” regardless of what the reader thinks ^ and $ mean.

If you followed Conway's advice, and did this:

perl -pe 's/\z/addthis/mg' myfile.txt

You would see that your phrase addthis got added only to the end of each and every line:

$ cat file
line one
line two
line three
line four
$ perl -pe 's/\z/addthis/mg' file
line one
addthisline two
addthisline three
addthisline four
addthis

See how well that works? That addthis was added to the very end of each line! ...Right after the \n character on that line.

Enough fun and back to work. (Wait, it's President's Day. It's a paid holiday. No work today except of course all that stuff I promised to have done by Tuesday morning).

Hope this helped you understand how much fun regular expressions are and why so many people have decided to learn Python.


1. Know what ^ and $ really mean in Perl? Uh, yes of course I do. I've been programming in Perl for a few decades. Yup, I know all this stuff. (Note to self: $ apparently doesn't mean what I always thought it meant.)

1
  • Thank you for the informative and entertaining answer! I thought I knew what '$' meant... <humbly bows head> this young grasshopper has much to learn :)
    – drapkin11
    Commented Feb 21, 2013 at 0:18
0

A workaround:

perl -pe 's/\n/addthis\n/' myfile.txt

No need for the g modifier: the regex is applied line by line.
