I would like to concatenate (cat) files that share a name prefix (Orange, Apple) and a sample ID (S4, S5). Other sample IDs appear in the file names in the same way, for example _S6_trimmed_, _S8_trimmed_, _S9_trimmed_, _S10_trimmed_.
The file names have the following format:
Orange1_S4_trimmed_1.fastq
Orange1_S4_trimmed_2.fastq
Orange2_S4_trimmed_1.fastq
Orange2_S4_trimmed_2.fastq
Apple1_S4_trimmed_1.fastq
Apple1_S4_trimmed_2.fastq
Apple2_S4_trimmed_1.fastq
Apple2_S4_trimmed_2.fastq
Orange1_S5_trimmed_1.fastq
Orange1_S5_trimmed_2.fastq
Orange2_S5_trimmed_1.fastq
Orange2_S5_trimmed_2.fastq
Apple1_S5_trimmed_1.fastq
Apple1_S5_trimmed_2.fastq
Apple2_S5_trimmed_1.fastq
Apple2_S5_trimmed_2.fastq
This is what I want to do, repeated for several samples (S4, S5, ...):
cat Orange*_S4_trimmed_1.fastq >Orange_S4_trimmed_1.fastq
cat Orange*_S4_trimmed_2.fastq >Orange_S4_trimmed_2.fastq
cat Apple*_S4_trimmed_1.fastq >Apple_S4_trimmed_1.fastq
cat Apple*_S4_trimmed_2.fastq >Apple_S4_trimmed_2.fastq
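The four commands above follow one pattern, so they can be generated by nested loops. The following is only a minimal sketch under some assumptions: the function name merge_by_lists and the output directory argument are made up for illustration, the prefix and sample lists are hard-coded, and every input file is assumed to have at least one digit directly after the prefix.

```shell
#!/bin/bash
# Sketch: merge <prefix><N>_<sample>_trimmed_<read>.fastq files into
# <outdir>/<prefix>_<sample>_trimmed_<read>.fastq.
# merge_by_lists, the lists below and the output directory are assumptions
# for illustration; adjust them to the real data.
merge_by_lists() {
    local outdir="$1" prefix sample read
    mkdir -p "$outdir" || return 1
    for prefix in Orange Apple; do          # assumed name prefixes
        for sample in S4 S5; do             # extend with S6, S8, ... as needed
            for read in 1 2; do
                # [0-9]* requires a digit after the prefix, so an already
                # merged file such as Orange_S4_trimmed_1.fastq is not matched.
                set -- "$prefix"[0-9]*"_${sample}_trimmed_${read}.fastq"
                # An unmatched pattern stays literal; skip that combination.
                [ -e "$1" ] || continue
                cat -- "$@" > "$outdir/${prefix}_${sample}_trimmed_${read}.fastq"
            done
        done
    done
}
```

Called as `merge_by_lists merged` from the directory that holds the fastq files, this writes the results into a separate directory, so the merged files can never be picked up as input on a second run.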
Here is a script I wrote in bash:
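#!/bin/bash
filename="samples.txt"
# Each line of samples.txt is a glob such as Apple*_S4
while read -r sample; do
    echo "$sample"
    # Drop the * to get the output name, e.g. Apple*_S4 -> Apple_S4;
    # a glob in the redirection target would be an ambiguous redirect.
    out=${sample//\*/}
    # $sample is left unquoted on purpose so that the glob expands
    cat ${sample}_trimmed_1.fastq > "${out}_trimmed_1.fastq"
    cat ${sample}_trimmed_2.fastq > "${out}_trimmed_2.fastq"
done < "$filename"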
Here is the format of my samples.txt file:
samples.txt
Apple*_S4
Apple*_S5
Orange*_S4
Orange*_S5
Is there a better way to do this on a large group of files? Thanks for your help in advance.
A solution I'm currently working on, based on Bodo's comments:
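#!/bin/bash
for file in *1_*_trimmed_1.fastq; do
    echo "$file"
    # Name prefix without digits, e.g. Orange1 -> Orange
    subs=$(echo "$file" | cut -d_ -f1 | tr -d 0-9)
    echo "$subs"
    # Sample ID, e.g. S4
    sample=$(echo "$file" | cut -d_ -f2)
    echo "$sample"
    # [0-9]* requires a digit after the prefix, so the merged output file
    # is not matched as an input when the script is run again
    cat "${subs}"[0-9]*"_${sample}_trimmed_1.fastq" > "${subs}_${sample}_trimmed_1.fastq"
    cat "${subs}"[0-9]*"_${sample}_trimmed_2.fastq" > "${subs}_${sample}_trimmed_2.fastq"
done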
[...] 1 (and others with 2 and maybe more numbers) for the placeholder * as in Apple*_S4_trimmed_1.fastq? In this case my suggestion would be a loop over all files with 1 and construct the corresponding concatenation command.
[...] _S4_trimmed_ or _S5_trimmed_ or show a longer list of example file names. This might help to propose commands to construct the file names.
Better use $( ... ) instead of backticks for command substitution.
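The loop suggested in the comments can also discover the groups from the file names themselves, so that no list of prefixes or samples has to be maintained. The following is a rough sketch rather than a tested pipeline: the function name merge_discovered and the output directory are hypothetical, and it assumes that names look like <prefix><number>_S<number>_trimmed_<1 or 2>.fastq with no underscore or digit inside the prefix. It uses shell parameter expansion instead of cut and tr, so no command substitution is needed at all.

```shell
#!/bin/bash
# Sketch: append every input file to the merged file of its group.
# merge_discovered and the output directory are assumptions for illustration.
merge_discovered() {
    local outdir="$1" file name rest prefix
    # Plain mkdir fails if the directory already exists, so every run starts
    # with empty outputs and the appends below cannot duplicate reads.
    mkdir "$outdir" || return 1
    # The digit before _S keeps merged files (Orange_S4_...) out of the input.
    for file in *[0-9]_S[0-9]*_trimmed_[12].fastq; do
        [ -e "$file" ] || continue      # pattern matched nothing
        name=${file%%_*}                # Orange1_S4_trimmed_1.fastq -> Orange1
        rest=${file#*_}                 # -> S4_trimmed_1.fastq
        prefix=${name%%[0-9]*}          # Orange1 -> Orange
        cat -- "$file" >> "$outdir/${prefix}_${rest}"
    done
}
```

Because the glob results are sorted, the files of one group are appended in the same order for read 1 and read 2, which keeps paired files aligned. The sort is lexical, so Orange10 would come before Orange2; that only matters if the order inside the merged file is important.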