
I want a file content search command under Linux that:

  1. searches only files of specified types, e.g. md, txt, htm;
  2. searches recursively from a folder through its subfolders, e.g. from .;
  3. accepts a regexp as the search pattern, e.g. tomat.*es;
  4. outputs the text around the match;
  5. outputs in this format, with each file separated by a blank line:
file1
lineNr1:text1
lineNr2:text2

file2
lineNr1:text1
lineNr2:text2

  6. finally, the output needs to be visually clear, so on the terminal it should use a coloring scheme like grep's:

  • file in color_1, say purple
  • lineNr in color_2, say green
  • for the text output:
    • matched text in color_3, say red
    • the rest of the text in color_4, say white

Basically, grep does the job, but I want to change its OUTPUT format, which is:

file1:lineNr1:text1
file1:lineNr2:text2

file2:lineNr1:text1
file2:lineNr2:text2

I want to focus my attention on the search results. When searching a directory tree, the file path in front of every result makes the output harder to read, especially when a file has several matches. What I want is a clear per-file view of what I'm looking for. The more files, subfolders and matches there are, the more this focus matters.

As it stands, grep's output is lengthy and loses that focus. Maybe this ought to be requested as a new feature for the grep command.

I'm close to what I want.

Suppose test.txt contains these 2 sentences:

2023-09-25: after colon char does not output the sentence.
2023-09-25 outputs line as there is NO colon preceding match.

And you run this command:

grep -rwn --include=\*.{md,txt} -ie "output.*" --color=always | awk -F: '{if(f!=$1)print "\n"$1; f=$1; print $2 ":" $3;}'

In this example, the output for the 1st line stops at the ":", while the 2nd line is output in full (see the attached screenshot).

So this command does the job unless the matched line contains a colon ":". Losing the text around the match makes the search output less useful.

A more complex sample (I can't attach the txt file):

    utf-8 encoded

#        We're interested in searching on the word: tomato or tomate in french
In markdown file it can be put in bold using **tomatoes**
In a html file, content is full of tags, put a word in bold can be put in many way, such as <b>tomato</b>

Let's see what the search will return on these combinations:
1. At 6:45 will eat tomato soup.
2. Tomatoes were cooked for the soup recipe, but what time do we eat tomato soup? Isn't it six forty-five, aka 6:45?
3. Tomate en français
4. tomates: pluriel du mot tomate.
Could be tricky to restrict search only on bilingual TOMATO's variation, as for instance in automatically, there is auTOMATically.
Regular expression are of help.

Suppose the matches lie in 2 subfolders; this command makes the grouping clear:

grep -rn --include=\*.{md,txt} -iP "tomat[eo]s*" --color=always | awk -F: '{if(f!=$1)print "\n"$1; f=$1; print $2 ":" $3;}'

The output is attached (screenshot). HOWEVER, anything after a colon ":" is missing from the output. Replace the colon ":" with, say, ";" and you'll see the difference.

Compare this with the plain grep output (screenshot).

If you want to dump the search results into a plain text file, losing the color scheme means losing visual information. An HTML file with tags would preserve the coloring information, for example with output like this:

<div class="grep">
<p class="grep_file">file_1</p>
<span class="grep_line">lineNr1</span>:beginning of surrounding match<span class="grep_match">SEARCH_PATTERN</span>end of surrounding match<br>
<span class="grep_line">lineNr2</span>:beginning of surrounding match<span class="grep_match">SEARCH_PATTERN</span>end of surrounding match<br>
</div>

<div class="grep">
<p class="grep_file">file_2</p>
<span class="grep_line">lineNr1</span>:beginning of surrounding match<span class="grep_match">SEARCH_PATTERN</span>end of surrounding match<br>
<span class="grep_line">lineNr2</span>:beginning of surrounding match<span class="grep_match">SEARCH_PATTERN</span>end of surrounding match<br>
</div>

Styling those classes then gives you the color scheme.

I tried with grep and awk, but another combination might be a better way to do the job.

Thanks

  • I'm afraid this is very confusing. Please edit your question and show us a representative example of your input files, making sure it covers all the cases you need, and the output you need from that example. Also, please add your operating system so we know what tools you have available.
    – terdon
    Commented Sep 25, 2023 at 13:03
  • @terdon Edit done. It ought to be clear now.
    Commented Sep 25, 2023 at 15:19
  • So you just want the output of the grep without the awk? What are you trying to do with that awk command? Are you just trying to add an empty line between groups of grep results from different files? I know you cannot show color, but can you show at least the text of the output you want? And you still haven't told us what operating system you are using. The grep on Linux is not the same as the one on macOS which is not the same as the one on other UNIX systems.
    – terdon
    Commented Sep 25, 2023 at 15:42
  • Don't use grep to find files, there's a perfectly good tool for finding files with a perfectly obvious name. Keep your code simple, portable, etc. by finding files with find and then g/re/ping within those files using grep (or use awk instead of grep if you want to do something other than just g/re/p - Globally match a Regular Expression and Print the result).
    – Ed Morton
    Commented Sep 25, 2023 at 17:20
  • If you provide concise, testable sample input and expected output as text (not just images) then we can best help you. As of now, I could imagine several different things your question might be asking about.
    – Ed Morton
    Commented Sep 25, 2023 at 17:30

2 Answers

1

I think what you want is this:

$ grep -rwn --include=\*.{md,txt} -ie "output.*" --color=always | awk -F: '{if(f!=$1){print "\n"$1;}f=$1; $1=""; }1'

file.txt
 1 2023-09-25  after colon char does not output the sentence.
 2 2023-09-25 outputs line as there is NO colon preceding match.

file1.txt
 1 2023-09-25  after colon char does not output the sentence.
 2 2023-09-25 outputs line as there is NO colon preceding match.

Which looks like this:

screenshot of the result of the above command, showing colors

The problem with your approach was that you were using : as the field delimiter and then explicitly printing only fields 2 and 3. So when there were more : on the line, you missed the rest. What I am doing here is emptying the first field ($1="") and then printing the entire line with the trailing 1: in awk, the default action when a condition evaluates to true is to print the line, and 1 always evaluates to true.
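
To see the difference on a single line, here is a small sketch that feeds one made-up line in grep's file:lineNr:text format to both awk programs. It also shows a side effect worth knowing: assigning to $1 makes awk rebuild the record using the output field separator (a space by default), which is why the remaining colons appear as spaces in the output above.

```shell
# One line in grep's file:lineNr:text format; the text itself contains a colon.
line='test.txt:1:2023-09-25: after colon'

# Printing only fields 2 and 3 drops everything after the third colon.
printf '%s\n' "$line" | awk -F: '{ print $2 ":" $3 }'
# prints: 1:2023-09-25

# Emptying $1 keeps every field, but awk rejoins them with OFS (a space).
printf '%s\n' "$line" | awk -F: '{ $1 = ""; print }'
# prints: " 1 2023-09-25  after colon" (without the quotes)
```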

You can expand the awk code to this, for clarity:

awk -F: '
 {
   ## If this is a new file name, print the file name
   if ( f != $1 ){
     print "\n"$1
   }
   ## save the 1st field in the variable f
   f=$1
   ## clear the first field
   $1=""
   ## print the line
   print
}'

Important: this will still fail if the file name itself contains a :. You could have a file called file:weird.txt for example. Dealing with that is possible, but requires significantly more scripting so if that is a problem, please update your question to include more example file names or post a new question.
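
One way to sidestep the file:weird.txt problem, as a rough sketch rather than a full solution: GNU grep's -Z (--null) option prints a NUL byte after the file name instead of a colon, so the file name never has to be split on :. The function name grep_grouped is made up for this example, and it assumes GNU grep and file names without newlines.

```shell
# Sketch: group grep matches per file without parsing ':' in file names.
# Assumes GNU grep (-Z/--null) and no newlines in file names.
grep_grouped() {
    # $1 = pattern; remaining arguments = grep options, files or directories
    pattern=$1; shift
    grep -rinHZ -e "$pattern" "$@" |
    tr '\0' '\n' |                     # file name and match now alternate, one per line
    awk '
        NR % 2 { file = $0; next }     # odd lines: file name
        file != prev {                 # first match of a new file
            print sep file             # blank line before every file but the first
            sep = ORS
            prev = file
        }
        { print }                      # even lines: lineNr:text, colons intact
    '
}
```

For example, grep_grouped 'output' --include='*.md' --include='*.txt' . would search the current directory. Since only the NUL is used as a delimiter, colons in the matched text are left alone as well.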

  • Your answer is certainly of interest, but I can't upvote it. It's still weird not to have the colon ":" appear in the output, for instance when the match is a URL. I have updated the question.
    Commented Sep 25, 2023 at 20:52
  • @user2718593 that is perfectly possible, but we need an example input, and expected output. You can find the target files with find and use its -printf to do this cleanly. I'll try and post a solution tomorrow.
    – terdon
    Commented Sep 25, 2023 at 21:06
1

From this command you provided:

grep -rwn --include=\*.{md,txt} -ie "output.*" --color=always |
    awk -F: '{if(f!=$1)print "\n"$1; f=$1; print $2 ":" $3;}'

I THINK you're trying to find lines that contain a string matching output.* in upper or lower case in files that end in .md or .txt. That'd be:

find . -type f \( -name '*.md' -o -name '*.txt' \) -exec \
    grep -Hin 'output' \
{} +

You're then piping that output to awk to, again I THINK, change the output from this:

file1:lineNr1:text1
file1:lineNr2:text2
file2:lineNr1:text1
file2:lineNr2:text2

to this:

file1
lineNr1:text1
lineNr2:text2

file2
lineNr1:text1
lineNr2:text2

So, this is what you're asking for help to implement for printing to the screen:

$ grep -rwn --include=\*.{md,txt} -ie "output.*" --color=always |
    awk -F':' '{p=f; f=$1; sub(/[^:]+:/,"")} f!=p{print sep f; sep=ORS} 1'
test.txt
1:2023-09-25: after colon char does not output the sentence.
2:2023-09-25 outputs line as there is NO colon preceding match.


However, the ANSI escape sequences that color the results are already present in grep's output by the time awk reads it. If you then wanted HTML tags instead of escape sequences, you'd need to update the awk script to find those escape sequences in its input and convert them to HTML tags. That is backwards and fragile: if some of those escape sequences were present in the original input, there would be no way to distinguish them from the ones added by grep. It is simpler to run awk instead of grep on the original input files and have awk print whatever colorizing strings you want.

To just print uncolored text in whatever layout you like, you wouldn't pipe the output of find+grep to awk, you'd replace grep with awk, e.g.

find . -type f \( -name '*.md' -o -name '*.txt' \) -exec \
    awk '
        tolower($0) ~ /output/ {
            if ( !seen[FILENAME]++ ) {
                print ORS FILENAME
            }
            print
        }
    ' \
{} +

If you want color in the output, update the awk script to print the escape sequences or HTML tags or whatever you like for whatever color(s) you want whatever text to be, see https://unix.stackexchange.com/a/669122/133219 and https://stackoverflow.com/questions/64034385/using-awk-to-color-the-output-in-bash/64046525#64046525 for ways to do that for colors on your screen, and see https://stackoverflow.com/a/40722767/1745001 and https://stackoverflow.com/a/39193330/1745001 for ways to color HTML output.

Here's an example of using find+awk in a bash script to format the output as I think you want for printing to the screen:

$ cat tst.sh
#!/usr/bin/env bash
tput sc
trap 'tput rc; exit' EXIT

colors=( reset red green yellow blue purple )
for colorNr in "${!colors[@]}"; do
    fgColorMap+=( "${colors[colorNr]} $(tput setaf $colorNr)" )
done

find . -type f \( -name '*.md' -o -name '*.txt' \) -exec \
    awk -v fgColorMap="${fgColorMap[*]}" '
        BEGIN {
            OFS = ":"
            split(fgColorMap,tmp)
            for ( i=1; i in tmp; i+=2 ) {
                fg[tmp[i]] = tmp[i+1]
            }
        }

        match(tolower($0),/output.*/) {
            if ( !seen[FILENAME]++ ) {
                if ( found++ ) { print "" }
                print fg["purple"] FILENAME fg["reset"]
            }
            print fg["green"] FNR ":" fg["reset"]                  \
                  substr($0,1,RSTART-1)                           \
                  fg["red"] substr($0,RSTART,RLENGTH) fg["reset"] \
                  substr($0,RSTART+RLENGTH)
        }
        END { if ( found ) print "" }
    ' \
{} +

Here is the visible text output:

$ ./tst.sh
./test.txt
1:2023-09-25: after colon char does not output the sentence.
2:2023-09-25 outputs line as there is NO colon preceding match.

Here is the same but showing the color codes:

$ ./tst.sh | cat -A
^[7^[[35m./test.txt^[[30m$
^[[32m1:^[[30m2023-09-25: after colon char does not ^[[31moutput the sentence.^[[30m$
^[[32m2:^[[30m2023-09-25 ^[[31moutputs line as there is NO colon preceding match.^[[30m$
$
^[8$

and here is the colored output:

(screenshot of the colored terminal output)

To get HTML instead, just change the awk script to print whatever HTML you want. You didn't show any expected HTML output in your question so we can't help you to get what you want since you haven't shown us what you want, but there's lots of existing examples for you to work with (see the references I provided above) so you can always ask a new question later if you can't figure out how to do that.
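
As a starting point, here is a minimal sketch of that change. It assumes the div/p/span layout and class names (grep, grep_file, grep_line, grep_match) from the HTML example in the question; the function name grep_html and its arguments are made up for illustration. The pattern is matched against a lowercased copy of each line, so it should be written in lowercase, and because it is passed with awk -v, backslashes in it are subject to awk's escape processing.

```shell
# Sketch: emit the question's <div class="grep"> layout straight from awk.
# grep_html is a hypothetical helper; $1 = lowercase awk regex, $2 = start directory.
grep_html() {
    pattern=$1; dir=${2:-.}
    find "$dir" -type f \( -name '*.md' -o -name '*.txt' \) -exec \
        awk -v re="$pattern" '
            # Escape the characters that are special in HTML text.
            function esc(s) {
                gsub(/&/, "\\&amp;", s)
                gsub(/</, "\\&lt;", s)
                gsub(/>/, "\\&gt;", s)
                return s
            }
            match(tolower($0), re) {
                if ( FILENAME != prev ) {
                    if ( prev != "" ) print "</div>\n"   # close the previous file
                    print "<div class=\"grep\">"
                    print "<p class=\"grep_file\">" esc(FILENAME) "</p>"
                    prev = FILENAME
                }
                print "<span class=\"grep_line\">" FNR "</span>:" \
                      esc(substr($0, 1, RSTART - 1)) \
                      "<span class=\"grep_match\">" esc(substr($0, RSTART, RLENGTH)) "</span>" \
                      esc(substr($0, RSTART + RLENGTH)) "<br>"
            }
            END { if ( prev != "" ) print "</div>" }
        ' {} +
}
```

Something like grep_html 'tomat[eo]' . > result.html would then write the fragment to a file. Only the first match on each line is wrapped in grep_match, as in the terminal version above, and the CSS for the classes is left to you.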

  • Many thanks. I have modified the text and hope it's clearer.
    Commented Sep 27, 2023 at 10:29
  • I updated my answer to hopefully also make what I was saying clearer.
    – Ed Morton
    Commented Sep 27, 2023 at 11:35
