Compare the following commands:

cat f.txt | grep "someText" 
grep "someText" f.txt

Both seem to work. But the documentation for cat says that it outputs the contents of a file, not the file name, and as far as I can tell grep takes a file name, not file contents (correct me if I am wrong). So why does the first command work, given that it feeds grep the file contents rather than the file name?

Another question: given that both work, why would anyone use the first form instead of the second? Isn't the first one just redundant?

2 Answers

In your first example

cat f.txt | grep "someText" 

grep doesn't get a filename argument, only a string to search for. In that case grep reads the text to search from standard input. Here, standard input is piped in from the output of cat f.txt, which outputs the content of the file, not the filename.
Another way to make grep read from stdin is input redirection:

< f.txt grep "someText"

Using cat is quite often redundant on its own (independent of grep) and can be replaced by the input redirection as above. I would always use the second form in your example, unless you have to do some preprocessing on the input.
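A minimal sketch of that equivalence, assuming a throwaway file created with mktemp (the file contents and variable names are made up for illustration):

```shell
# Create a scratch file; the contents are illustrative only.
f=$(mktemp)
printf 'line with someText\nunrelated line\n' > "$f"

# 1. grep reads stdin, which is a pipe fed by cat.
via_pipe=$(cat "$f" | grep "someText")

# 2. grep reads stdin, which the shell has redirected from the file.
via_redirect=$(< "$f" grep "someText")

# 3. grep is given the file name and opens the file itself.
via_arg=$(grep "someText" "$f")

# All three variables should hold the same matching line.
printf '%s\n' "$via_pipe" "$via_redirect" "$via_arg"

rm -f "$f"
```

In the first two forms grep never learns the file name; it only sees a stream on standard input.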

  • I see. I think I should learn more about linux pipe online then, thank you
    – Kun
    Commented Apr 23, 2016 at 15:55
  • But doesn't grep only take filename as its argument?
    – Kun
    Commented Apr 23, 2016 at 15:59
  • @kun No, as I indicated, the first argument (if you don't specify any options) is the string to search for. Any further arguments are the file(s) to search through. Look at man grep and carefully study the SYNOPSIS in there.
    – Anthon
    Commented Apr 23, 2016 at 16:32
  • I think you didn't get my point. I am saying that grep's last operand should be a file name rather than the contents of the file, which are different, right? I completely understand its first operand is the pattern string.
    – Kun
    Commented Apr 23, 2016 at 17:23
  • @kun, the grep man page says grep [OPTIONS] PATTERN [FILE...] -- that means the only required argument is the pattern. You may provide one or more files, but the square brackets indicate the files are optional. The man page says (the 2nd sentence): "If no files are specified, or if the file “-” is given, grep searches standard input."
    Commented Apr 23, 2016 at 18:16

There are two main reasons for using cat as in your first example:

  1. As a place-holder for some other command or long and complicated pipeline of commands.

    e.g. if you're writing a script or one-liner to process a huge file, or data from a psql/mysql or wget or jq etc query, you might save (some of) the input to a sample file and use cat sample as input until you get your script or one-liner right. Then just replace the cat with the actual command or pipeline.

    Similarly, it's a useful place-holder if your purpose is to teach someone about pipes.

    (Many people call this a Useless Use of Cat or UUOC. That's largely because they're smug and like to use their allegedly superior knowledge to beat down novices rather than help them learn - the terrible crime of using cat rather than < is so important that it can't be ignored as if it were just a trivial detail. The world would end; it would be a catastrophe.)

  2. When you don't want a program to know the filename(s) of the input file(s). e.g. cat * | grep ... is different to grep ... *.

    This doesn't matter very often but when it does, it can matter a lot.

    For grep, you can suppress listing of the filenames with -h, but other programs have no such option - wc, for example, will always output the filenames and per-file counts, even if you don't want them.

    You can, of course, use something like wc * | tail -1 | awk '{print $1, $2, $3}', but that doesn't work if you're doing something like find . -type f -exec wc {} + and the list of filenames generated by find exceeds the system's maximum command-line length. In that case find runs wc more than once, so you get multiple 'total' lines (and there's no way of distinguishing between an actual total line and a file with the name 'total' in wc's output).

    find . -type f -exec cat {} + | wc produces just one line of output (the totals) no matter how many files find finds, whether that is none, one, or many.

    (wc really needs both --totals-only and --no-totals options.)
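The difference described in point 2 can be sketched as follows, assuming a scratch directory with two small files (the directory, file names, and contents are made up for illustration):

```shell
# Scratch directory with two small files; names and contents are illustrative.
dir=$(mktemp -d)
printf 'foo\nbar\n' > "$dir/a.txt"
printf 'foo\n' > "$dir/b.txt"

# grep is given the names, so with several files it prefixes each match.
with_names=$(cd "$dir" && grep foo a.txt b.txt)

# grep only sees one anonymous stream on stdin, so there are no prefixes.
without_names=$(cd "$dir" && cat a.txt b.txt | grep foo)

# wc given names: one line per file plus a 'total' line.
per_file=$(cd "$dir" && wc -l a.txt b.txt)

# wc reading the concatenated stream: a single count and nothing else.
total=$(find "$dir" -type f -exec cat {} + | wc -l)

printf '%s\n' "$with_names" "$without_names" "$per_file" "$total"

rm -rf "$dir"
```

Here with_names holds lines such as a.txt:foo, while without_names holds only the matched text; per_file has three lines, while total is a single number.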

  • Another practical use of using cat is to ensure that the program the data is being fed to can handle a pipe correctly. If you feed a file with shell redirection < the process will still be able to seek in it, which it won't be able to do if you pipe the output of cat. So cat can be used to check for compatibility when you might later want to put more complex processing in the pipeline.
    Commented Jan 21, 2017 at 7:07
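A rough way to see the redirection-versus-pipe difference mentioned in this comment is to check what kind of object standard input is. This is only a sketch: stdin_kind is a hypothetical helper, and it assumes /dev/stdin is available, as it is on Linux and most modern Unix systems.

```shell
# Hypothetical helper: report what kind of object stdin is.
# Assumes /dev/stdin exists (true on Linux and most modern Unix systems).
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe      # a FIFO: not seekable
    elif [ -f /dev/stdin ]; then
        echo file      # a regular file: seekable
    else
        echo other     # e.g. a terminal or device
    fi
}

f=$(mktemp)
echo data > "$f"

# With redirection the process gets the file itself on stdin.
from_redirect=$(stdin_kind < "$f")

# With cat the process gets a pipe on stdin.
from_cat=$(cat "$f" | stdin_kind)

echo "redirect: $from_redirect, cat: $from_cat"

rm -f "$f"
```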
