Compare one column in one file, against another column and if they match, append column

Question

I would like to compare one column from one file against a column in another file. Where the values match, I would like to add the first column of the first file as the new first column of the second file.

File1

File2

IID column1 column2 column3... column464
1 Value11 Value12 etc etc ... etc
7 Value71 Value72 etc etc ... etc
2 Value21 Value22 etc etc ... etc
6 Value61 Value62 etc etc ... etc
3 Value31 Value32 etc etc ... etc

Desired output

FID IID column1 column2 column3... column464
123 1 Value11 Value12 etc etc ... etc
456 2 Value21 Value22 etc etc ... etc
789 3 Value31 Value32 etc etc ... etc

Edited and updated now. I know it would be appropriate to use awk, but I really do not know where to start. — zerberus, Commented Feb 3, 2022 at 12:19
You can use the join command, but you need to sort the files first (with the sort command). — golder3, Commented Feb 3, 2022 at 12:21
Is it important that the resulting file starts with the FID column? Because you're comparing by the IID column... — golder3, Commented Feb 3, 2022 at 12:22
I have considered switching the FID and IID columns, but then there are another 462 columns, and writing them all out in the order I want may be tricky. — zerberus, Commented Feb 3, 2022 at 13:07

αғsнιη · Accepted Answer · 2022-02-03 12:50:06Z

Using awk:

awk 'NR==FNR { id[$2]=$1; next } ($1 in id){ print id[$1], $0}' file1 file2

In the first action block we read file1 into an associative array called id, where the keys are column $2 and the values are column $1. The condition NR==FNR is true only while awk is processing the first input file. NR (Number of Records) and FNR (File Number of Records) are awk built-in variables: both are incremented for every record read, but FNR is reset at the start of each subsequent input file, so the two are equal only during the first file.

Then, in the second action block, we check whether the first column $1 of file2 exists as a key in the id array; if it does, we print the corresponding value id[$1] followed by the entire line $0 of file2.
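As a quick check of the explanation above, here is a minimal sketch that runs the same awk command on small sample files. The contents of file1 are an assumption (a FID column followed by an IID column, inferred from the desired output), file2 is cut down to two value columns, and the temporary directory is only there to keep the example self-contained.

```shell
# Hypothetical sample data: file1 is assumed to hold "FID IID" pairs.
dir=$(mktemp -d)
printf '%s\n' 'FID IID' '123 1' '456 2' '789 3' > "$dir/file1"
printf '%s\n' 'IID column1 column2' \
  '1 Value11 Value12' '7 Value71 Value72' '2 Value21 Value22' \
  '6 Value61 Value62' '3 Value31 Value32' > "$dir/file2"

# First pass (NR==FNR): remember FID keyed by IID.
# Second pass: print only the file2 lines whose IID was seen, with the FID in front.
result=$(awk 'NR==FNR { id[$2]=$1; next } ($1 in id) { print id[$1], $0 }' \
  "$dir/file1" "$dir/file2")
printf '%s\n' "$result"

rm -rf "$dir"
```

With this data the header line is carried over too, because the string IID in file2's header matches the IID key stored from file1's header. Rows 7 and 6 are dropped since they have no entry in file1, and the output keeps file2's original line order.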

RudiC · Accepted Answer · 2022-02-03 12:40:43Z

For a limited number of fields, try

join -12 -21 -o1.1,0,2.2,2.3,2.4  <(sort -nk2,2 file1) <(sort -n file2) 2>/dev/null
FID IID column1 column2 column3...
123 1 Value11 Value12 etc
456 2 Value21 Value22 etc
789 3 Value31 Value32 etc

but with the number of fields you mention, the output format list might become somewhat cumbersome.
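One possible way around the long format list is to generate it from file2's field count. The following is only a sketch under assumptions: the sample data is hypothetical (file1 is assumed to hold FID and IID), and it swaps the numeric sort used above for a plain sort in the C locale so that join sees its inputs in the collation order it expects.

```shell
# Hypothetical sample data: file1 is assumed to hold "FID IID" pairs.
dir=$(mktemp -d)
printf '%s\n' 'FID IID' '123 1' '456 2' '789 3' > "$dir/file1"
printf '%s\n' 'IID column1 column2' \
  '1 Value11 Value12' '7 Value71 Value72' '2 Value21 Value22' > "$dir/file2"

# Count the fields of file2 from its header line.
nf=$(awk 'NR==1 { print NF; exit }' "$dir/file2")

# Build "1.1,0,2.2,...,2.nf": FID, the join field, then the rest of file2.
fmt=$(awk -v n="$nf" 'BEGIN { s = "1.1,0"; for (i = 2; i <= n; i++) s = s ",2." i; print s }')

# join expects both inputs sorted on the join field in the same collation.
LC_ALL=C sort -k2,2 "$dir/file1" > "$dir/file1.sorted"
LC_ALL=C sort -k1,1 "$dir/file2" > "$dir/file2.sorted"
joined=$(LC_ALL=C join -1 2 -2 1 -o "$fmt" "$dir/file1.sorted" "$dir/file2.sorted")
printf '%s\n' "$joined"

rm -rf "$dir"
```

Because letters sort after digits in the C locale, the header line ends up as the last line of the output rather than the first, and the data rows come out in lexical rather than numeric order of IID.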


There are a lot of columns, judging by the header column464, so I don't think this can easily be done with join: the columns being compared also differ between the two files, so either that has to be handled or the columns have to be swapped with another command afterwards.
– αғsнιη
Commented Feb 3, 2022 at 12:54
Interestingly, it can be done with the help of brace expansion: join -12 -21 -o1.1 -o2.{1..464} <(sort -nk2,2 file1) <(sort -n file2)
– αғsнιη
Commented Feb 3, 2022 at 13:00
Except that the -o2 expansion also needs to start from 1, as in: join -1 2 -2 1 -o1.1 -o2.{1..264} <(sort -nk2 file1) <(sort -n file2)
– golder3
Commented Feb 3, 2022 at 13:08
