4

I am aware of diff and of using loops, but I just can't seem to get what I need with diff. I'm basically looking to compare two files (file1.txt and file2.txt) and just get the output of what is missing between them.

Objective 1: Find what is missing in file2.txt from file1.txt

Objective 2: Find what is missing in either file. Some lines may exist in file2.txt that aren't in file1.txt. I'd like to know about them as well.

diff only tells me that the two files aren't the same, going line by line and comparing the differences. What I need is a program that goes through the whole file and doesn't care about line position. If a line containing '/bin/mount' is found on line 2 of file1.txt and on line 59 of file2.txt, then I don't need to know about it. I only want to know what isn't there at all. Can this be done?

3 Answers

7

If you don't care about the line order, sort the files first. To see what lines are missing in what file, use comm instead of diff:

comm <(sort file1) <(sort file2)
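
For readers who have not used comm before: its output has three columns, holding lines only in the first file, lines only in the second file, and lines in both. Below is a minimal Python sketch of that same split. The function name comm_columns and the sample lines are made up for illustration, and unlike comm it collapses duplicate lines because it works on sets.

```python
def comm_columns(lines1, lines2):
    """Return (only_in_1, only_in_2, in_both), each as a sorted list.

    Hypothetical helper for illustration; it treats each input as a set of
    lines, so line order and duplicates are ignored.
    """
    set1, set2 = set(lines1), set(lines2)
    return sorted(set1 - set2), sorted(set2 - set1), sorted(set1 & set2)


# Made-up sample data standing in for the contents of file1 and file2.
file1_lines = ["/bin/mount", "/bin/ls", "/usr/bin/vim"]
file2_lines = ["/bin/cat", "/bin/mount"]

only1, only2, both = comm_columns(file1_lines, file2_lines)
print("only in file1:", only1)
print("only in file2:", only2)
print("in both:", both)
```

The first two lists correspond to the two objectives in the question; the third is the part that can be ignored.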
3
  • What a simple and easy command. Never knew about it. But how can I grep out only the unique entries from whatever column I want?
    – unixpipe
    Commented Aug 31, 2014 at 19:38
  • @unixpipe: Have you read man comm?
    – choroba
    Commented Aug 31, 2014 at 19:39
  • I just did now, apologies. I am able to suppress either column. These are great answers. I wish I could accept both of your answers, because they are both right!
    – unixpipe
    Commented Aug 31, 2014 at 19:44
4

Objective 1: Find what is missing in file2.txt from file1.txt

With grep:

grep -xvFf file2.txt file1.txt

With comm:

comm -23 <(sort file1.txt) <(sort file2.txt)

With sort and uniq:

sort file2.txt file2.txt file1.txt | uniq -u

Objective 2: Find what is missing in either file. Some lines may exist in file2.txt that aren't in file1.txt. I'd like to know about them as well.

With grep:

grep -xvFf file1.txt file2.txt; grep -xvFf file2.txt file1.txt

With comm:

comm -3 <(sort file1.txt) <(sort file2.txt) | tr -d '\t'

With sort and uniq:

sort file1.txt file2.txt | uniq -u
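
One caveat with the sort and uniq variants: uniq -u drops every line that occurs more than once in the combined input, so a line repeated inside a single file goes unreported even when the other file lacks it. The following is a rough Python sketch of that difference, assuming made-up sample data; the function names uniq_u and missing_in_either are hypothetical.

```python
from collections import Counter


def uniq_u(lines1, lines2):
    """Mimic `sort file1 file2 | uniq -u`: keep lines seen exactly once overall."""
    counts = Counter(lines1) + Counter(lines2)
    return sorted(line for line, n in counts.items() if n == 1)


def missing_in_either(lines1, lines2):
    """Lines present in exactly one of the inputs, ignoring duplicates and order."""
    return sorted(set(lines1) ^ set(lines2))


# Made-up sample data: "/bin/ls" appears twice in the first input only.
file1_lines = ["/bin/ls", "/bin/ls", "/bin/mount"]
file2_lines = ["/bin/mount", "/bin/cat"]

print(uniq_u(file1_lines, file2_lines))
print(missing_in_either(file1_lines, file2_lines))
```

With this sample, the uniq -u emulation misses "/bin/ls" while the set-based version reports it. If the input files may contain repeated lines, deduplicating each file first (for example with sort -u) avoids the problem.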
1
  • 1
    You guys are the best
    – unixpipe
    Commented Aug 31, 2014 at 19:43
0

Here is a simple script that computes the similarity percentage between two files, based on the Levenshtein (edit) distance:

import numpy as np


def levenshtein(seq1, seq2):
    """Return the Levenshtein (edit) distance between two sequences."""
    size_x = len(seq1) + 1
    size_y = len(seq2) + 1
    # matrix[x, y] holds the distance between seq1[:x] and seq2[:y].
    matrix = np.zeros((size_x, size_y), dtype=int)
    matrix[:, 0] = np.arange(size_x)
    matrix[0, :] = np.arange(size_y)

    for x in range(1, size_x):
        for y in range(1, size_y):
            # Substitution costs nothing when the characters match.
            cost = 0 if seq1[x - 1] == seq2[y - 1] else 1
            matrix[x, y] = min(
                matrix[x - 1, y] + 1,         # deletion
                matrix[x - 1, y - 1] + cost,  # match or substitution
                matrix[x, y - 1] + 1,         # insertion
            )
    return int(matrix[size_x - 1, size_y - 1])


def read_compact(path):
    """Read a file and strip all newlines and spaces."""
    with open(path, 'r') as file:
        return file.read().replace('\n', '').replace(' ', '')


str1 = read_compact('original.txt')
str2 = read_compact('target.txt')
length = max(len(str1), len(str2))
# Two empty files are treated as identical to avoid dividing by zero.
distance_ratio = levenshtein(str1, str2) / length if length else 0.0
print(round(100 - distance_ratio * 100, 2), '% Similarity')

Before running it, create two files with content, "original.txt" and "target.txt", in the same directory as the script.
