I am writing a small script to find and count every occurrence of "the" in numerous files and subdirectories, and then print each file's path along with its number of occurrences. But I don't know how to finish it.

    find . -name "*.txt" -type f -printf "%p\t" -exec grep -c "the" {} \;

ex.sh is the name of the program

txt is the file extension

the is the word whose occurrences should be counted

  • The correct output should be:

    ./ex.sh txt the
    
    ./etext00/00ws110.txt 42764
    ./etext00/1cahe10.txt 26692
    ./etext00/1vkip11.txt 21895
    ./etext00/2cahe10.txt 24604
    ./etext00/2yb4m10.txt 15476
    ./etext00/8rbaa10.txt 3131
    
  • What I get:

    ./etext00/00ws110.txt   35388
    ./etext00/1cahe10.txt   17905
    ./etext00/1vkip11.txt   14617
    ./etext00/2cahe10.txt   16971
    ./etext00/2yb4m10.txt   9938
    ./etext00/8rbaa10.txt   1839
    

I assume this is the number of lines containing "the", but some lines contain more than one "the".

2 Answers

Use grep -o the and count the number of lines that this generates:

find . -name "*.txt" -type f -printf "%p\t" \
    -exec sh -c 'grep -o "the" "$0" | wc -l' {} \; 

grep -o returns every match on every line, on separate lines (one match is returned per output line).
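As a small check of the difference, the sketch below runs both forms on a made-up two-line sample (the sample text and variable names are illustrative, not from the question). It assumes a grep that supports -o, such as GNU or BSD grep.

```shell
# Hypothetical sample: two lines, three occurrences of "the".
sample='the cat and the dog
the end'

# grep -c counts the lines that contain at least one match.
lines=$(printf '%s\n' "$sample" | grep -c 'the')

# grep -o prints each match on its own line, so wc -l counts matches.
matches=$(printf '%s\n' "$sample" | grep -o 'the' | wc -l)

# $((...)) strips any padding that some wc implementations add.
echo "lines=$lines matches=$((matches))"
```

The first sample line contains two matches, which is why the line count and the match count differ.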

You may additionally want to use -wi with grep to include The (case-insensitivity) and to exclude matches like the in theory (full word matching).
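To illustrate what -w and -i change, here is a minimal sketch on a made-up sample line (the text and variable names are illustrative only). It assumes a grep with the -o and -w options, which GNU and BSD grep provide.

```shell
# Hypothetical sample: "the" appears inside theory, then and other,
# once as a standalone word, and once capitalised as "The".
sample='The theory: then the other'

# Plain substring matching, case-sensitive.
plain=$(printf '%s\n' "$sample" | grep -o 'the' | wc -l)

# Whole words only (-w), ignoring case (-i).
words=$(printf '%s\n' "$sample" | grep -owi 'the' | wc -l)

echo "plain=$((plain)) words=$((words))"
```

The plain form counts the substrings in theory, then, the and other but misses The, while the -wi form counts only The and the. Which of the two matches the expected numbers in the question depends on how those numbers were produced.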

  • Also, grep is case-sensitive; you may want the -i flag as well to match "The", for instance. Commented Mar 22, 2018 at 11:48
  • @AndreiCioara Made a note about that.
    – Kusalananda
    Commented Mar 22, 2018 at 11:49

Since you're already using GNU extensions (-printf), with GNU awk, you could do:

find . -name '*.txt' -size +2c -readable -type f -exec gawk -v RS=the '
   ENDFILE {print FILENAME "\t" (FNR - ($0 != ""))}' {} +

That is, use "the" as the record separator and report the number of records after processing each file, but do not count the extra record that may (and generally will) occur after the last occurrence of "the".
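The record-counting idea can be tried on a single temporary file, as in the sketch below. The sample text and the tmp and count variables are made up for illustration, and the check is skipped when gawk is not installed.

```shell
# Hypothetical sample file: three occurrences of "the", ending in a newline.
tmp=$(mktemp)
printf 'the cat and the dog\nthe end\n' > "$tmp"

if command -v gawk >/dev/null 2>&1; then
    # Splitting on "the" yields four records here: an empty one before the
    # first match, two in between, and a trailing one after the last match.
    # The trailing record is non-empty, so one is subtracted from FNR.
    count=$(gawk -v RS=the 'ENDFILE { print FNR - ($0 != "") }' "$tmp")
    echo "count=$count"
fi

rm -f "$tmp"
```

For this sample the text after the last match is " end" plus a newline, so the subtraction applies and the reported count is the number of separators seen.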
