
EDIT: Generally speaking, I'm asking for an elegant way to duplicate an input stream, process it with 2 or more sets of commands, then merge the output of these commands. Emphasis on elegant.

Real-world use case: In a CI pipeline, a bash script runs git fetch --tags, which outputs a long list of tags. The tags all look like desktop-... or mobile-.... We don't really need to see tags from months or years ago, so I'd like to avoid printing the whole list and instead print just the last 10 desktop tags and the last 10 mobile tags. I can imagine a scenario with more than 2 platforms, so in the general case we would like to be able to print the last 10 tags for each of an arbitrary number of groups.

Rules:

  • Has to be a neat and elegant bash one-liner, not a script (use advanced bash features or more or less common helper utilities to reduce the amount of code)
  • Should not use temporary files or something else that requires cleanup
  • No piping to awk, perl or similar
  • No complicated bash logic (e.g. counter variables and a while read loop)
  • Has to work both in the terminal and when detached from a tty (like in the CI env)
  • Has to match at least 2 patterns, and print 10 last results for each
  • The results for each pattern should not be intermingled, but this requirement is loose, as long as the solution allows piping the complete output to sort -V as the last step
  • Bonus points if the solution can be extended to 3+ patterns

For simplicity of testing, I deleted all the local tags, dumped the output of git fetch --tags to a file called tags, and am just using cat tags during my attempts. To make it easier for you to test possible solutions, here's a simple script that generates realistic-looking mock input:

for ((i=0; i<100; ++i)); do echo "$((1+RANDOM/5000)).0.$((RANDOM/5000))"; done | sort -V | while read v; do [[ $((RANDOM%2)) == 1 ]] && echo -n "mobile-" || echo -n "desktop-"; echo $v; done >tags

Note: The solution shouldn't refer to the file directly. If that were allowed, you could just grep it two or more times to get 10 matches for each pattern. But in the use case I described, git fetch --tags runs only once (even if it ran more than once, the second run would output nothing, since the tags would already have been fetched by the first run). So the input may only be used once, like so: cat tags. This is intended to emulate the real-world use case described above.

My best attempt so far:

cat tags | tee >(grep '\<desktop-' | tail -n 10) | grep '\<mobile-' | tail -n 10

The problem with the "desktop" process substitution is that its output is lost. It inherits tee's stdout, which is the pipe into grep '\<mobile-', so the desktop lines get filtered out by that grep. All the examples of process substitution that I've seen redirect the output of such commands to a file. I feel like there's got to be some nice way to "merge" such output back into a single stream, but I haven't found one so far.
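A minimal sketch with made-up two-line input that makes the lost output visible; show_where_output_goes is a hypothetical name and the sed prefixes are only labels, not part of any solution. A >(...) process inherits tee's stdout, which inside a pipeline is the pipe into the next command, so its lines are fed to that next command rather than discarded.

```bash
#!/usr/bin/env bash
# Made-up input; the sed prefixes only label which path a line took.
show_where_output_goes() {
  printf '%s\n' desktop-1.0.0 mobile-1.0.0 |
    tee >(grep '\<desktop-' | sed 's/^/via >(...): /') |  # this subshell's stdout is the pipe below
    sed 's/^/via pipe: /'
}
show_where_output_goes
```

Among its three output lines (in no guaranteed order) is via pipe: via >(...): desktop-1.0.0, i.e. the desktop line produced by the process substitution travelled through the pipe. With grep '\<mobile-' in place of the last sed, as in the attempt above, that line is simply filtered out.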

Solutions that produce the desired output but don't meet my rules for being neat and elegant (both of them are a bit long):

  • Some typical imperative programming, the kind of code you would write in a "real" programming language. It's not in the spirit of UNIX shell scripting, which uses pipes and standard commands, each doing one job (and doing it well), to achieve the desired end result.
d=0; m=0; tac tags | while read -r l; do [[ $d -lt 10 && $l =~ ^desktop- ]] && { echo "$l"; let ++d; }; [[ $m -lt 10 && $l =~ ^mobile- ]] && { echo "$l"; let ++m; }; done | sort -V
  • Uses pipes and bash's process substitution, but uses a temporary file. "Proper" usage of temp files generally involves mktemp, cleaning up afterwards, and handling abnormal script termination (using trap to do cleanup on SIGHUP/SIGINT).
cat tags | tee >(grep '\<mobile-' | tail -n 10 >mobile-tags) | grep '\<desktop-' | tail -n 10 && cat mobile-tags && rm -f mobile-tags
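For comparison, a sketch of the temp-file route done along those lines (mktemp plus trap-based cleanup), assuming bash; last_each_tmp is a hypothetical name. It deliberately differs from the one-liner above: instead of writing the file from inside a >(...), which cat mobile-tags could read before that process substitution has finished, it stores the whole input once and then greps the file per pattern, which also extends to 3+ patterns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines for each "<pattern>-" prefix,
# reading stdin only once, via a temp file that is always cleaned up.
# Usage: last_each_tmp N pattern1 [pattern2 ...]
last_each_tmp() {
  local n=$1
  shift
  (   # subshell, so the traps below don't replace the caller's traps
    tmp=$(mktemp) || exit 1
    trap 'rm -f -- "$tmp"' EXIT      # cleanup whenever the subshell exits
    trap 'exit 1' HUP INT TERM       # turn these signals into a normal exit
    cat >"$tmp"                      # consume the input exactly once
    for p in "$@"; do
      grep -- "^$p-" "$tmp" | tail -n "$n"
    done
  )
}

printf '%s\n' desktop-1.0.0 mobile-1.0.0 desktop-1.1.0 | last_each_tmp 10 desktop mobile
```

It is still more code than the question's rules allow, which is rather the point: the cleanup is what makes temp files unattractive here.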
  • Are you trying to parse XML? If so, there are better tools like xmlstarlet or xmllint Commented Feb 21, 2023 at 11:55
  • Can't do much without a complete sample of input and desired output.
    – Nic3500
    Commented Feb 22, 2023 at 20:11
  • @GillesQuénot Not parsing XML, I already mentioned that my input is a list of git tags, which look like desktop-1.2.3 or mobile-4.5.6 Commented Feb 22, 2023 at 21:02
  • @Nic3500 If you insist, run this to get a sample input to work with: for ((i=19; i>0; --i)); do echo "desktop-$i.$((i/2)).0"; echo "mobile-$((i+5)).$((i/2+1)).0"; done >tags The following command produces the output I want to see - but only in an interactive shell, e.g. redirecting the output to a file will not produce the same result: cat tags | tee >(grep '\<desktop-' | tail -n 10 >/dev/tty) | grep '\<mobile-' | tail -n 10 Commented Feb 22, 2023 at 21:07
  • Do you "accept" the use of ; ? Then it becomes quite simple, grep pattern1 | sort | tail; grep pattern2 | sort | tail.
    – Nic3500
    Commented Feb 23, 2023 at 4:06

3 Answers


OK, got it this morning: plumbing.

The idea is to duplicate the output onto both STDERR and STDOUT with tee and process the two copies in parallel:

Here we go. It could be a one-liner as well, but it's split across lines for readability:

git fetch --tags | tee /dev/stderr 2> >(
grep -w mobile   | tail -n10) 1> >(
grep -w desktop  | tail -n10) 

Or, with the help of your brilliant comments (kind of brainstorming):

git fetch --tags | tee  /dev/fd/{10,11,12} 10> >(
grep -w desktop  | tail -n 3) 11> >(
grep -w mobile   | tail -n 3) 12> >(
grep -w foobar   | tail -n 3) 1>/dev/null

Could be simplified as:

git fetch --tags | tee >(
grep -w mobile   | tail -n3) >(
grep -w desktop  | tail -n3) >(
grep -w foobar   | tail -n3) > /dev/null
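The simplified form generalises to any number of patterns without writing each >(...) by hand. A minimal sketch, assuming bash and tags of the form platform-version; last_each is a hypothetical helper that recurses, handing the remaining patterns to another >(...) so the input is still read only once:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last N lines for each "<pattern>-" prefix.
# Usage: last_each N pattern1 [pattern2 ...]   (reads stdin once)
last_each() {
  local n=$1 p=$2
  shift 2
  if (( $# == 0 )); then
    # Last pattern: filter the stream directly.
    grep -- "^$p-" | tail -n "$n"
  else
    # Both >(...) are expanded before "> /dev/null" is applied, so they write
    # to this function's stdout; only tee's own stdout copy is discarded.
    # The second one recurses with the remaining patterns.
    tee >(grep -- "^$p-" | tail -n "$n") >(last_each "$n" "$@") > /dev/null
  fi
}

# Groups come from concurrent processes, so sort -V restores a stable order.
printf '%s\n' desktop-1.0.0 mobile-1.0.0 foobar-1.0.0 desktop-1.1.0 |
  last_each 10 desktop mobile foobar | sort -V
```

The final sort -V is what the question already allows as the last step; it also means the pipeline only ends once every >(...) has closed its output.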

Related: http://mywiki.wooledge.org/ProcessSubstitution

  • First of all, the code works. It's kinda brilliant, but I am still trying to understand it 100%. I understand that tee /dev/stderr duplicates the input so it comes out on both stdout and stderr of tee. You redirect each one into a subshell using I/O redirection and process substitution. So one subshell's stdin is tee's stdout, and the other's is tee's stderr. What I don't understand is where the subshells' stdout goes, and why. I don't fully understand the difference between 1> >(grep ...) and replacing it with | grep ... or | (grep ...)... Commented Feb 24, 2023 at 18:28
  • If I do use a |, then the output of the first subshell (the grep mobile one) disappears. This is what I was experiencing in my experiments, because I was using a pipe after the tee. Can you please explain what's going on here with the I/O redirections? Commented Feb 24, 2023 at 18:29
  • I also noticed that the 2>&1 at the end is not needed. An outer {} alone works, or | cat. With 2>/dev/null, both desktop and mobile still appear. With >outfile, both desktop and mobile results end up in the file. I am struggling to understand why this is so; logically it would seem that the grep mobile subshell would output to stderr. So far my guess is that in your tee command, the subshells get different streams as their stdin, but both inherit the same stdout (the parent's stdout). What I don't really understand is why it is so, e.g. which part of bash's manpage explains that? Commented Feb 24, 2023 at 18:35
  • So this general template seems to work, and its results don't change if the order of redirections is changed: cat tags | tee /dev/fd/{3,4,5,...} 3> >(filter1) 4> >(filter2) 5> >(filter3) ... >/dev/null Commented Feb 24, 2023 at 19:07
  • Added version without using /dev/fd/{10,11,12} Commented Feb 24, 2023 at 23:17
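A minimal sketch of the behaviour discussed in these comments, with made-up input and a hypothetical function name. Bash's manual states that redirections are processed in the order they appear, from left to right, and each >(...) in a redirection is started as that redirection is set up, inheriting whatever stdout the command has at that moment. Here that is still the outer stdout (the pipe into sort -V), not stderr, which is why discarding stderr around the whole command does not lose the mobile lines.

```bash
#!/usr/bin/env bash
# Hypothetical demo function, made-up input.
show_inherited_stdout() {
  printf '%s\n' desktop-1.0.0 mobile-1.0.0 desktop-1.1.0 |
    {
      # When 2> >(...) is set up, stdout is still the pipe into sort, so the
      # "mobile" grep inherits it; the same holds when 1> >(...) is set up
      # for the "desktop" grep (fd 1 is replaced only after it has started).
      tee /dev/stderr 2> >(grep -w mobile | tail -n 10) \
                      1> >(grep -w desktop | tail -n 10)
    } 2>/dev/null | sort -V      # discarding stderr loses nothing
}
show_inherited_stdout
```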

Using grep and the right switches:

$ grep -Ewm20 'desktop|mobile' tags | sort
desktop-10.5.0
desktop-11.5.0
desktop-12.6.0
desktop-13.6.0
desktop-14.7.0
desktop-15.7.0
desktop-16.8.0
desktop-17.8.0
desktop-18.9.0
desktop-19.9.0
mobile-15.6.0
mobile-16.6.0
mobile-17.7.0
mobile-18.7.0
mobile-19.8.0
mobile-20.8.0
mobile-21.9.0
mobile-22.9.0
mobile-23.10.0
mobile-24.10.0
  • This produces the expected result only for this artificial input, which has desktop and mobile tags alternating one by one. In real life it will not be like this. Modify the command that produces the artificial input and it all falls apart. E.g. for ((i=19; i>0; --i)); do echo "desktop-$i.$((i/2)).0"; echo "desktop-$((i+3)).$((i/2)).0"; echo "mobile-$((i+5)).$((i/2+1)).0"; done >tags . Your solution then outputs 14 desktop tags and 6 mobile ones. I want 10 of each. I don't think grep alone can do that. Commented Feb 23, 2023 at 22:11
  • Then your question is wrong; edit your post with the required real sample input Commented Feb 23, 2023 at 22:15
  • I've edited the post and included the updated script to generate the input data. I admit my initial mock data was not the best quality. As with any unit testing, you can't cover every possible input data. If the data used follows some patterns which don't actually match real-life data, an erroneous solution may pass the test. Hopefully my new set is better. Commented Feb 23, 2023 at 22:55
  • Thanks for introducing the -m option to grep, though - didn't know about this one. Commented Feb 23, 2023 at 22:57
  • Usually, the way to thank people for giving their time and tips is to upvote... Moreover, it was working with your original requirements Commented Feb 23, 2023 at 23:09

What about this one-liner?

d=( $(grep desktop tags) ) m=( $(grep mobile tags) ); printf '%s\n' "${d[@]:0:10}" "${m[@]:0:10}" | sort -rV
  • This looks like it can do the job. I had to slightly adjust the command to give the output I expected to see, like so: d=( $(grep desktop tags | tail -n 10) ) m=( $(grep mobile tags | tail -n 10) ); printf '%s\n' "${d[@]:0:10}" "${m[@]:0:10}" Commented Feb 23, 2023 at 23:33
  • It's nice how printf can be used to implement something like array.join('\n') of most programming languages. Didn't know about that. From help printf: "The format is re-used as necessary to consume all of the arguments." Commented Feb 23, 2023 at 23:34
  • I have to say the idea to use arrays didn't come to my mind. I think arrays have always been my weakest skill in bash. Every time that I know that I should use an array for some task, I have to look up the manpage to refresh my memory on how to declare and access them :) Commented Feb 23, 2023 at 23:36
  • So far your solution is the best, temporary variables are much better than temporary files. Still, in a perfect world, you have to make sure that the variable names are unique (to not overwrite something else), and then it's not a bad idea to do cleanup when done (unset the variables). Obviously, there's much less chance to cause issues than when using temporary files. Still, a perfect solution wouldn't involve temporary variables or files. I just keep thinking that there's gotta be a way to do it with some weird combination of pipes, IO redirections, process substitutions and whatnot. Commented Feb 23, 2023 at 23:40
  • You could try using plumbing with file descriptors. Now need to sleep(3600*8) Commented Feb 23, 2023 at 23:46
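A sketch adapting the temporary-variable idea so the input is read only once, with the variables scoped by local inside a function so nothing needs to be unset afterwards (assuming bash; last_each_var is a hypothetical name):

```bash
#!/usr/bin/env bash
# Hypothetical helper: last N lines per "<pattern>-" prefix, input read once.
# Usage: last_each_var N pattern1 [pattern2 ...]
last_each_var() {
  local n=$1 all p               # local: nothing to unset afterwards
  shift
  all=$(cat)                     # read stdin exactly once
  for p in "$@"; do
    grep -- "^$p-" <<<"$all" | tail -n "$n"
  done
}

printf '%s\n' desktop-1.0.0 mobile-1.0.0 desktop-1.1.0 | last_each_var 10 desktop mobile
```

This trades the file-descriptor plumbing for holding the whole tag list in memory, which should be unproblematic for a list of tags, and it prints the groups in the order the patterns are given.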
