
I received a few text documents with thousands of words in them (each word is on its own line). I'm sure there are duplicate words, and I need to delete the duplicates and keep just one of each. I copied and pasted all those words into an MS Word document, and now I need to find the duplicates and delete the extra ones. Finding and deleting them one by one is boring and takes a lot of time, and some of them can escape my eyes. I need software or a method to do it inside MS Word all at once: something that searches all the words and gives me a result list, so I can tell it to keep a single copy of each word and delete the rest to clean up my list. I use MS Word 2019 on Windows 10 x64. Is there a macro or a simple way to fix this? I googled it and found an old macro, but it didn't work in MS Word 2019 and was also complicated. I'm looking for an easier way or a program with an easy UI to do it. Free or trial software would be appreciated.

2 Answers


If you have Excel, you could instead copy your list into a spreadsheet (if the words are on separate lines, each one should paste into its own cell/row in a single column). You can then use Excel's Remove Duplicates feature (on the Data tab).

  • Wonderful, it worked! Thank you "Tanya" for this simple, handy trick; Excel found about 300 duplicate words that needed to be deleted. Unfortunately I couldn't fix this with the other solution that "Xehei" described. He described it very well and in detail, but I'm not good at programming and this kind of material; I tried a few times but didn't get a result with PowerShell. This Excel option, though, is very easy and quick. Commented Jan 11, 2021 at 10:08

You can use PowerShell to do this. To open PowerShell, press Win+R, type PowerShell, and press Enter. The basic idea is to create an empty array first, then go through the words one by one and add each word to the array only if the array does not already contain it.

Since you said each word is on its own separate line, this is simple to achieve with the following code:

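[array]$words=get-content "path\to\file\files.txt"   # read the file; each line becomes one array element
$uniquewords=@()                                     # start with an empty array
foreach ($word in $words) {                          # go through the words one by one
    if ($uniquewords -notcontains $word) {$uniquewords += $word}   # keep only words not seen yet
}
$uniquewords | out-file "path\to\file\files.txt"     # overwrite the file with the unique words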
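For very long lists, a faster variant may help: PowerShell arrays have a fixed size, so each += builds a new array, and -notcontains scans the whole array for every word. Below is a minimal sketch of the same "keep the first occurrence" idea using a .NET HashSet instead of an array. The demo file words-demo.txt in the temp folder is only an assumption for illustration, so point $path at your own file. The ::new() syntax assumes PowerShell 5.0 or later (Windows 10 ships with 5.1).

```powershell
# Hypothetical demo file in the temp folder; replace with the path of your own list
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'words-demo.txt'
Set-Content -Path $path -Value @('apple', 'pear', 'Apple', 'plum', 'pear')

# A HashSet stores each word at most once; OrdinalIgnoreCase makes 'Apple' and 'apple'
# count as the same word, similar to the case-insensitive comparison -notcontains uses
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

# HashSet.Add returns $true only the first time a word is added, so Where-Object keeps
# the first occurrence of each word, drops later duplicates, and preserves the order
[array]$uniquewords = Get-Content $path | Where-Object { $seen.Add($_) }

# Overwrite the file with the de-duplicated list (apple, pear, plum)
$uniquewords | Set-Content -Path $path
```

The result is the same as the loop above, but each lookup in the HashSet is fast no matter how many words have already been kept.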

Update as per comment:

An array is a data structure that is designed to store a collection of items. The items can be the same type or different types.

(Microsoft Docs: Arrays)

An [Array] ([System.Array]) is a type of PowerShell object that holds a collection of items. Arrays can easily be traversed and manipulated with PowerShell commands.

Use [array] | get-member -static to list the static methods available on the [array] type.
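For instance, IndexOf is one of the static methods that command lists; static methods are called on the type itself with ::, and the array is passed in as an argument. A small sketch (the sample array is just for illustration):

```powershell
$array = @('one', 'two', 'three', 'four', 'five')

# Call the static method on the [array] type and pass the array to search
$position = [array]::IndexOf($array, 'three')   # zero-based position of 'three'
$position                                       # outputs 2
```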

To make a variable an [array], put [array] before the variable name when you assign it.
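A quick way to see what the [array] prefix changes is to check a variable's type with and without it. This matters for Get-Content: a file with only one line comes back as a plain string rather than an array, and [array] makes sure you always get an array. A minimal sketch:

```powershell
# Without a type prefix, a single value stays a plain string
$plain = 'one'
$plain.GetType().Name      # String

# With [array] in front, the same value is stored as a one-element array
[array]$list = 'one'
$list.GetType().Name       # Object[]
$list.Count                # 1
```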

In the first line of the script, get-content reads the file located at "path\to\file\files.txt" and assigns the result to a variable named words; the dollar sign $ indicates that the name following it is a variable. The variable is an [array] because of the [array] placed before it.

Get-Content returns each line as a separate string, so each line becomes an element of the $words [array].
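To see this for yourself, you can write a few lines to a throwaway file and read them back. The temp-folder file name lines-demo.txt is only an assumption for this demo.

```powershell
# Hypothetical throwaway file used only for this demonstration
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'lines-demo.txt'
Set-Content -Path $path -Value @('apple', 'pear', 'apple')

[array]$words = Get-Content $path
$words.Count     # 3, one element per line
$words[1]        # pear (positions start at 0)
```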

The second command of the script creates an empty [array] named uniquewords.

In the third line of the script, foreach ($word in $words) means "for each item in the array named words" (every item, one by one, in order); each time through the loop, the current item is stored in $word.

For example:

$array=@('one','two','three','four','five')

The line above creates an [array] named $array with five elements. Each word is an element, the elements are separated by commas, and they are [string]s because they are enclosed in quotes.

Try this command:

foreach ($arra in $array) {$arra}

This will output:

one
two
three
four
five

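In a foreach statement, the part in () names the collection to go through ($array) and the variable that holds the current item ($arra); the part in {} is a scriptblock (the commands to be executed for each item).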

The scriptblock of the foreach statement is:

 if ($uniquewords -notcontains $word) {$uniquewords += $word}

This is an if statement: the part in () is the condition, and the part in {} is the scriptblock that runs only when the condition is true.

-notcontains is an operator that returns True when the collection before it does not contain the value after it (exactly what its name says). += is an operator that adds the value after it to the variable before it; for an array, it appends the value as a new element.
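A few one-line checks show how these two operators behave. Note that -notcontains compares strings case-insensitively, so Apple and apple count as the same word; the case-sensitive form -cnotcontains treats them as different, in case your list needs that.

```powershell
$uniquewords = @('apple', 'pear')

$uniquewords -notcontains 'plum'     # True: 'plum' is not in the array yet
$uniquewords -notcontains 'Apple'    # False: the comparison ignores case
$uniquewords -cnotcontains 'Apple'   # True: the case-sensitive version sees a new word

$uniquewords += 'plum'               # append 'plum' as a new element
$uniquewords.Count                   # 3
```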

The if statement means if $uniquewords doesn't contain the word, add the word to $uniquewords.

The final line writes the contents of $uniquewords to the file, replacing what was in it.

The foreach statement ensures every word is processed.
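If you would rather keep the original list as a backup, the final line could write to a second file instead. The output name files-unique.txt in the temp folder is hypothetical. Out-File's default encoding differs between Windows PowerShell 5.1 (UTF-16) and PowerShell 7 (UTF-8 without a byte-order mark), so passing -Encoding explicitly keeps the output predictable; in 5.1, utf8 output includes a byte-order mark.

```powershell
# Stand-in for the $uniquewords array built by the loop in the script
$uniquewords = @('one', 'two', 'three')

# Hypothetical output path; the original input file is not touched
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'files-unique.txt'

# utf8 is a valid -Encoding value in both Windows PowerShell 5.1 and PowerShell 7
$uniquewords | out-file $outFile -Encoding utf8
```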

As for how to replace the path: replace "path\to\file\files.txt" with the full path of the file.

For example, if the file is named textfile.txt and stored on your Desktop, it is in %userprofile%\Desktop; if your username is Username, its full path is C:\Users\Username\Desktop\textfile.txt.

In cmd, you can use %userprofile%\desktop\textfile.txt to indicate the full path for any username.

In PowerShell, use this instead:

$Desktop=[Environment]::GetFolderPath('Desktop')
"${Desktop}\textfile.txt"
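Putting it together, a sketch that builds the Desktop path and plugs it into the script's first line might look like this. textfile.txt is the example file name from above, so substitute your own; as a side note, $env:USERPROFILE is PowerShell's equivalent of cmd's %userprofile%.

```powershell
# Ask for the current user's Desktop folder
$Desktop = [Environment]::GetFolderPath('Desktop')

# Combine the folder and the example file name into one full path
$path = [System.IO.Path]::Combine($Desktop, 'textfile.txt')
$path    # e.g. C:\Users\Username\Desktop\textfile.txt

# The path can then replace "path\to\file\files.txt", for example:
# [array]$words = get-content $path
```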

If you really can't follow any of this, no matter how simple it is, there is an easier way to get the path: find the file in File Explorer, click it (LMB) to select it, then hold Shift and right-click (RMB) it, scroll down to "Copy as path" in the context menu, and click it. The full path, enclosed in quotes, is copied to the clipboard.

Then replace "path\to\file\files.txt" (including its quotes) with the full path of the file.

For example, if the file is named textfile.txt and is stored in C:\somefolder\, use this:

[array]$words=get-content "C:\somefolder\textfile.txt"
......
$uniquewords | out-file "C:\somefolder\textfile.txt"

I am sorry I cannot make it any simpler...

  • I'm totally new to PowerShell and, I confess, I don't really understand what to do. I opened PowerShell, and I have a text file that I merged from three .txt files. I don't understand what you mean by "array" in this context. Sorry, I guess the guide is still complicated for me. Should I copy and paste the code into PowerShell? And how do I replace the path with my file name? Commented Jan 7, 2021 at 7:32
