rework based on new information
glenn jackman

Depending on the answer to my comment, this is a first attempt:

awk '
    NR == FNR {fn[$1] = $4 "_" $5; next} 
    {f = ($3 in fn) ? fn[$3] : "FALSE"; print $1, $2, $3, f, $4, $5}
' file2.txt file1.txt | column -t
Query       No.   Accession  Function_            Name    DB
EFX03602.1  1006  PHI:1006   HMG-CoA_Reductase    HMR1    Not_Available
EFX00827.1  101   PHI:101    Polyketide_synthase  ALB1    AAC39471
EFX01509.1  101   PHI:101    Polyketide_synthase  ALB1    AAC39471
EFX05810.1  1010  PHI:1010   FALSE                SID1    XM_385547
EFX00466.1  1026  PHI:1026   Phospholipase_C      bcplc1  AAB39564

With tab-separated files:

awk '
    BEGIN { FS = OFS = "\t" }
    NR == FNR {fn[$1] = $4; next} 
    {print $1, $2, $3, ($3 in fn ? fn[$3] : "FALSE"), $4, $5}
' file2.txt file1.txt 
Query   No. Accession   Function    Name    DB
EFX03602.1  1006    PHI:1006    HMG-CoA Reductase   HMR1    Not_Available
EFX00827.1  101 PHI:101 Polyketide synthase ALB1    AAC39471
EFX01509.1  101 PHI:101 Polyketide synthase ALB1    AAC39471
EFX05810.1  1010    PHI:1010    FALSE   SID1    XM_385547
EFX00466.1  1026    PHI:1026    Phospholipase C bcplc1  AAB39564
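
For anyone unfamiliar with the `NR == FNR` idiom used above, here is a minimal self-contained sketch of the tab-separated version. The sample rows are hypothetical: the accessions and names are borrowed from the output above, but columns 2 and 3 of file2.txt are placeholders (`x`, `y`) because the real contents of those columns are not shown here. The temporary directory and the `result` variable are also just for illustration.

```shell
#!/bin/sh
# Hypothetical sample data; columns 2 and 3 of file2.txt are placeholders.
tmp=$(mktemp -d)
printf 'PHI:101\tx\ty\tPolyketide synthase\n' > "$tmp/file2.txt"
printf 'EFX00827.1\t101\tPHI:101\tALB1\tAAC39471\n' > "$tmp/file1.txt"
printf 'EFX05810.1\t1010\tPHI:1010\tSID1\tXM_385547\n' >> "$tmp/file1.txt"

result=$(awk '
    BEGIN { FS = OFS = "\t" }
    # While reading the first file, the global record count (NR) equals
    # the per-file record count (FNR): store column 4 keyed by column 1.
    NR == FNR { fn[$1] = $4; next }
    # For the second file: look up column 3, falling back to "FALSE".
    { print $1, $2, $3, ($3 in fn ? fn[$3] : "FALSE"), $4, $5 }
' "$tmp/file2.txt" "$tmp/file1.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

One caveat with this idiom: if the first file is empty, `NR == FNR` is also true for every line of the second file, so nothing gets printed.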