
I would like to compare one column from one file against a column in another file. Where the values match, I would like to prepend the first column of the first file to the matching line of the second file, so that it becomes the new first column.

File1

FID IID
456 2
123 1
789 3
112 4

File2

IID column1 column2 column3... column464
1 Value11 Value12 etc etc ... etc
7 Value71 Value72 etc etc ... etc
2 Value21 Value22 etc etc ... etc
6 Value61 Value62 etc etc ... etc
3 Value31 Value32 etc etc ... etc

Desired output

FID IID column1 column2 column3... column464
123 1 Value11 Value12 etc etc ... etc
456 2 Value21 Value22 etc etc ... etc
789 3 Value31 Value32 etc etc ... etc
  • Edited and updated now. I know it would be appropriate to use awk, but I really do not know where to start.
    – zerberus
    Commented Feb 3, 2022 at 12:19
  • You can use the join command, but you need to sort the files first (with the sort command).
    – golder3
    Commented Feb 3, 2022 at 12:21
  • Is it important that the resulting file starts with the FID column? I ask because you're matching on the IID column...
    – golder3
    Commented Feb 3, 2022 at 12:22
  • I have considered switching the FID and IID columns, but then there are another 462 columns, and writing them all out in the order I want may be tricky.
    – zerberus
    Commented Feb 3, 2022 at 13:07

2 Answers


Using awk:

awk 'NR==FNR { id[$2]=$1; next } ($1 in id){ print id[$1], $0}' file1 file2

In the first action block we read file1 into an associative array called id, where the keys are column $2 (IID) and the values are column $1 (FID). The condition NR==FNR is true only while awk is processing the first input file: NR (Number of Records) and FNR (File Number of Records) are awk built-in variables that both increase with every record read, but FNR resets for each new input file while NR keeps counting. The next statement then skips the second block for the lines of file1.

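In the second action block, which therefore only runs for file2, we check whether the first column $1 exists as a key in the id array; if it does, we print the corresponding value id[$1] followed by the entire line $0 of file2. The header line is handled the same way, because the header of file1 stores id["IID"]="FID".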
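To see the two passes at work, here is a small self-contained sketch. It recreates the sample data from the question in a scratch directory, with file2 trimmed to two value columns; the tmpdir and result names are only illustrative and not part of the original command.

```shell
# Recreate trimmed-down copies of the question's sample files in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
FID IID
456 2
123 1
789 3
112 4
EOF
cat > "$tmpdir/file2" <<'EOF'
IID column1 column2
1 Value11 Value12
7 Value71 Value72
2 Value21 Value22
6 Value61 Value62
3 Value31 Value32
EOF

# First pass (NR==FNR): build the IID -> FID map from file1.
# Second pass: print only the file2 lines whose IID is in the map, prefixed with the FID.
result=$(awk 'NR==FNR { id[$2]=$1; next } ($1 in id) { print id[$1], $0 }' \
    "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"
```

The rows come out in the order they appear in file2, and IIDs found in only one of the files (4, 6 and 7 here) are dropped.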

0

For a limited number of fields, try

join -12 -21 -o1.1,0,2.2,2.3,2.4  <(sort -nk2,2 file1) <(sort -n file2) 2>/dev/null
FID IID column1 column2 column3...
123 1 Value11 Value12 etc
456 2 Value21 Value22 etc
789 3 Value31 Value32 etc

but with the number of fields you allude to, writing out the format list by hand becomes rather cumbersome.
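One way around typing the field list by hand is to generate it. The following is only a sketch under a few assumptions: the fields are blank-separated, the header lines are stripped before sorting and printed separately, and the files are sorted as plain text (which is what join expects), so rows come out in lexical rather than numeric IID order. The names tmpdir, ncols, fmt and joined are made up for the illustration, and file2 is again trimmed to two value columns.

```shell
# Scratch copies of the sample data (file2 trimmed to two value columns).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
FID IID
456 2
123 1
789 3
112 4
EOF
cat > "$tmpdir/file2" <<'EOF'
IID column1 column2
1 Value11 Value12
7 Value71 Value72
2 Value21 Value22
6 Value61 Value62
3 Value31 Value32
EOF

# Count the fields in file2's header, then build the list "1.1,0,2.2,...,2.N".
ncols=$(head -n 1 "$tmpdir/file2" | awk '{ print NF }')
fmt=$(awk -v n="$ncols" 'BEGIN { s = "1.1,0"; for (i = 2; i <= n; i++) s = s ",2." i; print s }')

# Strip the headers and sort both bodies as text on their join fields.
tail -n +2 "$tmpdir/file1" | LC_ALL=C sort -k2,2 > "$tmpdir/f1.sorted"
tail -n +2 "$tmpdir/file2" | LC_ALL=C sort -k1,1 > "$tmpdir/f2.sorted"

# Print the combined header first, then the joined rows.
joined=$(
    printf 'FID %s\n' "$(head -n 1 "$tmpdir/file2")"
    LC_ALL=C join -1 2 -2 1 -o "$fmt" "$tmpdir/f1.sorted" "$tmpdir/f2.sorted"
)
printf '%s\n' "$joined"
```

The same fmt construction should scale to the full file, since it only depends on the field count read from the header.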

  • There are a lot of columns, judging by the header column464, so I don't think this can easily be done with join: the join columns are also in different positions in the two files, so you would either have to list every output field or swap the columns with another command afterwards.
    Commented Feb 3, 2022 at 12:54
  • Interestingly, it can be done with the help of brace expansion: join -12 -21 -o1.1 -o2.{1..464} <(sort -nk2,2 file1) <(sort -n file2)
    Commented Feb 3, 2022 at 13:00
  • Except that the -o2 expansion needs to start from 1 as well, as in join -1 2 -2 1 -o1.1 -o2.{1..264} <(sort -nk2 file1) <(sort -n file2)
    – golder3
    Commented Feb 3, 2022 at 13:08
