I have a wordlist that contains usernames, phone numbers and emails. It was gathered from various sources, so the files vary in size. I need to filter out duplicate usernames, along with everything else on the same row.

So far I've done this with Notepad++, but it has two limitations: it will only filter one file at a time, and it can't handle files as large as 500 MB.

So if one file contains a particular entry (an email address, say), the same entry should not appear in any other file.

In short, I need the same result I get from Notepad++, but across multiple files that are larger than 500 MB.

Any tools or programs? Or any efficient Java or C# snippet?

  • I could give you a C# or even PowerShell snippet (though 500MB may cause you to run out of memory), but do you want to combine the files into one, or do you have some way to determine which file will keep the duplicate? There may also be an existing program. Once again, 500MB may cause issues.
    – Bob
    Commented May 19, 2012 at 7:28
  • @Bob - If it's possible to process a 100 MB file, I can split the bigger ones into 100 MB pieces and process those. The first file loaded is the master file, and the second file should be cleared of anything that appears in the first.
    – Jones
    Commented May 19, 2012 at 7:42

1 Answer


Here's a C# program that does.. something like what you asked. I'm actually not 100% sure what you want.

Usage is: program.exe "outputfolder" "file1.txt" "file2.txt" "file3.txt"

It writes a filtered copy of each listed file to the output folder, processing the files in the order specified. The username is taken to be everything before the first '-' on a line; if that username has been encountered in any earlier line or file, the line is skipped. It doesn't check the email or phone number in any way.

using System;
using System.Collections.Generic;
using System.IO;

namespace CreateUniqueFile
{
    class Program
    {
        static void Main(string[] args)
        {
            string fullpath;
            string outpath;
            List<string> files = new List<string>();

            // args[0] is the output folder; every later argument is an input file.
            for (int i = 1; i < args.Length; i++)
            {
                fullpath = Path.GetFullPath(args[i]);
                if (!File.Exists(fullpath))
                    Console.WriteLine("File not found: {0}", fullpath);
                else
                {
                    files.Add(fullpath);
                    Console.WriteLine(fullpath); // list the files in processing order
                }
            }

            if (files.Count == 0)
            {
                Console.WriteLine("No files provided!");
                return;
            }
            else
            {
                outpath = Path.GetFullPath(args[0]);
                Console.WriteLine("Output will go to folder: \"{0}\"", outpath);
                Console.WriteLine("Process files in the above order? (Y/N)");
                bool yes = false;
                while (!yes)
                {
                    switch (Console.ReadKey().Key)
                    {
                        case ConsoleKey.Y:
                            yes = true;
                            break;
                        case ConsoleKey.N:
                            return;
                    }
                }
            }

            if (!Directory.Exists(outpath))
                Directory.CreateDirectory(outpath);

            // Every username seen so far, across all files.
            HashSet<string> seennames = new HashSet<string>();

            string line, username;

            foreach (string path in files)
            {
                string writepath = outpath + '\\' + Path.GetFileName(path);
                if (File.Exists(writepath))
                {
                    writepath = outpath + '\\' + Path.GetFileNameWithoutExtension(path) + " (2)" + Path.GetExtension(path);
                    // Dodgy thing to increment the number, don't touch!
                    while (File.Exists(writepath))
                    {
                        writepath = writepath.Substring(0, writepath.LastIndexOf('(') + 1) +
                            (Int32.Parse(writepath.Substring(writepath.LastIndexOf('(') + 1, writepath.LastIndexOf(')') - writepath.LastIndexOf('(') - 1)) + 1) +
                            writepath.Substring(writepath.LastIndexOf(')'));
                    }
                }

                using (StreamWriter writer = new StreamWriter(writepath))
                {
                    using (StreamReader reader = new StreamReader(path))
                    {
                        // Stream line by line so the whole file is never held in memory.
                        while (!reader.EndOfStream)
                        {
                            line = reader.ReadLine();
                            username = line.Split('-')[0]; // text before the first '-'
                            if (!seennames.Contains(username))
                            {
                                writer.WriteLine(line);
                                seennames.Add(username);
                            }
                        }

                        Console.WriteLine("{0} processed, output to {1}", path, writepath);
                    }
                }
            }
        }
    }
}
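
Since the program above only looks at the username, here is a rough sketch of how the check could be extended to the phone number and email as well. It is not part of the original program: `DedupSketch` and `IsNewEntry` are made-up names, the sample lines are invented for illustration, and it assumes every field is separated by '-' (which breaks if a phone number or email itself contains a '-'). Treat it as a starting point rather than a finished solution.

```csharp
using System;
using System.Collections.Generic;

static class DedupSketch
{
    // Hypothetical helper: returns true only if none of the line's fields has
    // been seen in the same column before. Fields of kept lines are recorded;
    // fields of skipped lines are not.
    // Assumption: fields are separated by '-', as in the program above.
    public static bool IsNewEntry(string line, HashSet<string> seen)
    {
        string[] fields = line.Split('-');
        List<string> keys = new List<string>();

        for (int i = 0; i < fields.Length; i++)
        {
            string value = fields[i].Trim();
            if (value.Length == 0)
                continue;

            // Prefix with the column index so a username can never
            // collide with a phone number or an email.
            string key = i + ":" + value;
            if (seen.Contains(key))
                return false;
            keys.Add(key);
        }

        foreach (string key in keys)
            seen.Add(key);
        return true;
    }

    static void Main()
    {
        // Case-insensitive, so "ALICE" and "alice" count as the same username.
        HashSet<string> seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Invented sample data in an assumed username-phone-email layout.
        string[] lines =
        {
            "alice-555-a@example.com",
            "bob-555-b@example.com",
            "ALICE-777-c@example.com",
            "carol-888-d@example.com"
        };

        foreach (string line in lines)
            Console.WriteLine("{0}: {1}", line, IsNewEntry(line, seen) ? "keep" : "skip");
    }
}
```

With the sample lines, the first and last are kept; the second is skipped because of the repeated phone number and the third because of the repeated username. In the main program, the `seennames.Contains(username)` test would be replaced by a call like `IsNewEntry(line, seennames)`. Note that storing three keys per line instead of one increases memory use accordingly, which matters for inputs in the 500 MB range.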
