5

I've got a bash script that cleans up the mail queue periodically. For Reasons, we've elected to delete any email to @mms.att.net and other email2SMS gateways that are over 9 hours in the queue and still not delivered.

Simplified, the script does this:

domains=`cat /etc/mail/email2textdomains.txt`
egrep $domains /var/log/maillog | .... other tasks

and the content of /etc/mail/email2textdomains.txt is exactly

"mms.att.net|vtxt.com|vtext.com|vzwpix.com"

So the egrep line should be this, which is exactly what I'd type at the command line.

egrep "mms.att.net|vtxt.com|vtext.com|vzwpix.com" file | ...

If I ran it without the quotes, like this, it would be a 5+ stage pipeline of commands, each reading stdin from the previous command's stdout. This is clearly not the search I want to do.

egrep  mms.att.net|vtxt.com|vtext.com|vzwpix.com  file | ...

However, when the script runs, the two double quotes are treated differently: they become part of the string, so we're essentially searching for

  • "mms.att.net
  • vtxt.com
  • vtext.com
  • vzwpix.com"

Clearly I've misunderstood how quoting works - the resolution was to change the included line to remove the double quotes, resulting in a line that should not work, but does.

I've tried testing by piping to od -a which does not show any non-printing characters.

Why does it work when the content of /etc/mail/email2textdomains.txt is exactly

mms.att.net|vtxt.com|vtext.com|vzwpix.com

when it should be a long pipeline to failure as written?

  • Aside - why are we fiddling the mailqueue directly? Because these ~4 domains all outsource their email to SMS gateway services to cloudfilter.net which has very low limits for messages/hour and old messages are better-off deleted if undelivered.
    – Criggie
    Commented Aug 11, 2023 at 4:04
  • 3
    Yes; since the file contains quotes, var=$(cat file) will put the quotes into var. The quotes are data, and not code. var="abc" puts only abc into var, because those quotes are shell syntax. Quotes that arise from variable substitution, like egrep $domains are also not syntax, but data. eval egrep $domains should do it, if you're confident that the file contains valid shell syntax with no injection pitfalls. The eval command will re-process the entire substitution and expansion as shell syntax and thus recognize the quotes.
    – Kaz
    Commented Aug 11, 2023 at 5:40
  • 2
    1. don't use egrep, use grep -E. 2. double-quote $domains on the grep -E command line. 3. better yet, there's no need for command substitution here: grep has a -f option for reading the pattern list from a file (e.g. grep -f /etc/mail/email2textdomains.txt /var/log/maillog). 4. since this is a list of domains with .s in them, it would probably be a good idea to use grep's -F option for fixed strings rather than regexes and -w for "whole-word" matches. Otherwise you get false positives like vtext.com matching domains like somevtextycomputer.net because vtextycom matches.
    – cas
    Commented Aug 11, 2023 at 12:18
  • 3
    I'm surprised nobody has yet linked to BashFAQ #50, which is directly on-point; the reasons people can't store arbitrary commands as strings are the same as the reasons quotes that are data (like ones that come from a variable) and quotes that are syntax (typed into your source code) can't substitute for each other, which is the heart of your problem here. You want that distinction to exist; it would be impossible to write secure code handling untrusted data in bash otherwise.
    Commented Aug 11, 2023 at 15:51

2 Answers

8

A great tool when trying to debug this sort of thing is set -x. Using that, we can see exactly what your commands are doing:

$ set -x
$ domains=$(cat domains.txt)
++ cat domains.txt
+ domains='"mms.att.net|vtxt.com|vtext.com|vzwpix.com"'

As you can see, $domains includes the quotes. So when you use it with grep, you get:

$ grep -E -- "$domains" file
+ grep --color -E -- '"mms.att.net|vtxt.com|vtext.com|vzwpix.com"' file

What you wanted to do is to use the quotes at the shell level, before the data are passed to the grep command, but since the quotes are part of the variable's data, they are treated just like any other character. The simplest solution is to remove the quotes from the file and then to just quote your variables, which is best practice anyway:

domains=$(tr -d \" < domains.txt) &&
grep -E -- "$domains" file
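
The same reasoning explains why the unquoted file content does not turn into a pipeline: the shell recognises operators such as | and quote characters when it parses the command line, before variables are expanded, so anything that arrives through an expansion is plain data. A minimal sketch, assuming the sample value from the question and a hypothetical helper named count_args:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

domains='mms.att.net|vtxt.com|vtext.com|vzwpix.com'

# The | characters come from an expansion, so they are data, not pipe
# operators: the helper receives the whole string as a single argument.
unquoted=$(count_args $domains)
quoted=$(count_args "$domains")

echo "unquoted: $unquoted, quoted: $quoted"
```

Both counts come out as 1 here only because the value contains no whitespace or glob characters. An unquoted expansion is still subject to word splitting and filename globbing, which is why quoting remains the safer habit.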

As an aside, using var=$(command) is preferred over using var=`command` because the former is clearer and allows more nesting, and egrep is deprecated in favor of grep -E.

Also beware that . is a regex operator that matches any single character, so grep mms.att.net actually finds the lines that contain mms followed by any single character followed by att followed by any single character followed by net. So for example, it would also match on a line containing hammstattinet.com.
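
A quick way to see the difference is to count matches with and without escaped dots. This is only a sketch; the two sample lines are made up and do not come from a real maillog:

```shell
# Two sample lines: one real gateway domain, one look-alike (made-up addresses).
sample=$(printf '%s\n' 'to=<5551234@mms.att.net>' 'to=<bob@hammstattinet.com>')

# Unescaped dots match any character, so both lines are counted.
loose=$(printf '%s\n' "$sample" | grep -c -- 'mms.att.net')

# Escaped dots match only a literal ".", so only the real domain is counted.
strict=$(printf '%s\n' "$sample" | grep -c -- 'mms\.att\.net')

echo "loose=$loose strict=$strict"
```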

So to build an Extended regular expression that matches on lines that contain any of those domains, you would not only have to remove the "s but also escape all the characters in domain names that also happen to be regex operators. For valid domain names, that should be limited to the . character.

Also beware that for an empty regex, the behaviour varies between grep implementations, but many of them would report all the lines, so you may want to treat it specially.

So:

regex=$(
  sed 's/"//g; # remove all "s like with tr
       s/\./\\./g; # substitute .s with \.s
      ' domains.txt
) && 
  [ -n "$regex" ] && # check it's not empty 
  grep -E -- "$regex" file

Alternatively, you could replace the |s with newlines and use the -F option of grep (formerly fgrep) to look for Fixed strings:

domains=$(<domains.txt tr -d '"' | tr '|' '\n') &&
  [ -n "$domains" ] &&
  grep -F -- "$domains" file
4

@Kaz should write up his comment so it can be the accepted answer.

If you wish to avoid the eval then I think you should rewrite your code to put in additional quotes. My overly simplistic rule is that every dollar sign should be inside double quotes unless you know better.

I would change /etc/mail/email2textdomains.txt to be one domain per line, to take advantage of the fact that grep allows a newline as a way to express alternatives i.e.

mms.att.net
vtxt.com
vtext.com
vzwpix.com

and say

domains="$(cat /etc/mail/email2textdomains.txt)"
grep -- "$domains" /var/log/maillog | .... other tasks

The quotes on the first line are there only to satisfy my rule; they are not needed. The -- is there to protect against a leading - inside the textdomains file. I use plain grep rather than egrep or grep -E to increase portability. In effect you are writing

grep -- "mms.att.net
vtxt.com
vtext.com
vzwpix.com" /var/log/maillog | .... other tasks
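
With one domain per line, the list can also be handed to grep directly via -f, and -F makes the dots literal. The following is a rough sketch rather than a drop-in replacement: the temporary files and their contents are invented stand-ins for /etc/mail/email2textdomains.txt and /var/log/maillog.

```shell
# Throwaway files stand in for the real list and log (contents are made up).
list=$(mktemp)
log=$(mktemp)

printf '%s\n' mms.att.net vtxt.com vtext.com vzwpix.com > "$list"
printf '%s\n' \
  'to=<5551234@vtext.com>, stat=Deferred' \
  'to=<bob@hammstattinet.com>, stat=Sent' \
  'to=<amy@example.org>, stat=Sent' > "$log"

# -f reads one pattern per line; -F treats each pattern as a fixed string.
matches=$(grep -F -f "$list" -- "$log")
echo "$matches"

rm -f -- "$list" "$log"
```

Only the vtext.com line should be printed. The look-alike hammstattinet.com line is skipped because, as fixed strings, the dots no longer act as wildcards.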
  • 2
    If you have one line per domain, you can simply use grep -f /etc/mail/email2textdomains.txt /var/log/maillog as well.
    – DonHolgo
    Commented Aug 11, 2023 at 10:40
  • 5
    @OlivierDulac it is POSIX: "The pattern_list's value shall consist of one or more patterns separated by <newline> characters"
    – muru
    Commented Aug 11, 2023 at 12:33
  • 3
    You should also use the -F option so these are matched as fixed strings rather than regular expressions.
    – Barmar
    Commented Aug 11, 2023 at 13:47
  • 1
    @OlivierDulac glad you have learnt something and your mind has recovered from being blown.
    – icarus
    Commented Aug 11, 2023 at 14:03
  • 1
    @Wastrel, as long as there are no glob characters, yes. Then you need set -f in addition to modifying IFS, and both are global settings, so annoying in larger scripts. Or you could just use readarray.
    – ilkkachu
    Commented Aug 14, 2023 at 6:48
