I'm trying to create a batch file, a PowerShell script, or anything else a novice like myself could run easily to complete the following task. Any help would be greatly appreciated.
I have a few thousand PDFs in a folder that I'm trying to sort through. The problem is that the folder includes old and new revisions of the same PDF documents. I only want to keep the newest revision of each unique document. Revised versions are indicated by the addition of a letter at the end of the filename (A-Z). Here is a sample list.
670BA-11-001.pdf
670BA-11-001A.pdf
670BA-11-001B.pdf
670BA-12-001.pdf
670BA-15-030C.pdf
670BA-49-120AC.pdf
670BA-49-120AD.pdf
- All files start with "670BA".
- The following numbers change: 670BA-XX-XXX.pdf
- A file with no letter at the end of the filename indicates that it is the original revision.
- A file with a letter at the end of the filename indicates it is a revised version.
- Revisions go from A-Z and then AA-AZ, and so on and so forth.
Ideally I'd like the batch file to delete the older versions and leave the newest version of each unique document. In this case the output should look like:
670BA-11-001B.pdf
670BA-12-001.pdf
670BA-15-030C.pdf
670BA-49-120AD.pdf
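The naming rules listed above can be sketched as a small helper that splits a filename into its document number and its revision letters. This is only a sketch in Unix shell, not a Windows batch file; running it on Windows would need something like Git Bash or WSL, which is an assumption about your setup. `split_name` is a hypothetical name, and the pattern assumes every file looks like the samples (prefix, two number groups, optional capital letters, `.pdf`).

```shell
#!/usr/bin/env bash
# split_name is a hypothetical helper, not part of any existing tool.
# Assumes names like 670BA-11-001B.pdf: document number, optional letters, .pdf
split_name() {
  local stem rev doc
  case $1 in
    *-*-*.pdf) ;;          # looks like PREFIX-XX-XXX[letters].pdf
    *) return 1 ;;         # anything else is rejected
  esac
  stem=${1%.pdf}           # drop the extension
  rev=${stem##*[0-9]}      # what follows the last digit: the revision letters
  doc=${stem%"$rev"}       # the rest is the document number
  printf '%s %s\n' "$doc" "${rev:--}"   # "-" marks the original (no letters)
}
```

For example, `split_name 670BA-49-120AD.pdf` prints `670BA-49-120 AD`, and `split_name 670BA-12-001.pdf` prints `670BA-12-001 -`.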
I was provided the following code, however I believe it is written for Unix (again, forgive my lack of knowledge here). Would this work if I could convert it to a Windows command?
codes=`ls | sort | cut -d'-' -f2 | uniq`
for f in $codes; do old=`ls *-$f-* | head -n -1`; rm -vf $old; done
Here's what's going on;
ls | sort lists all the files in lexical order
cut -d'-' -f2 | uniq
splits the filenames on '-', grabs the 2 digit number from the middle, and gets rid of duplicates.
ls *-$f-* | head -n -1
lists all the files for a 2 digit code, except for the last one - which is the newest.
rm -vf $old
deletes those old files; the -v prints each file as it is removed, and the -f keeps it from failing if the list is empty.
SAMPLE RUN;
/tmp# touch 601R-11-001.pdf 601R-11-001B.pdf 601R-15-030C.pdf 601R-25-005E.pdf 601R-49-120AD.pdf 601R-11-001A.pdf 601R-12-001.pdf 601R-25-005D.pdf 601R-49-120AC.pdf
/tmp# codes=`ls | sort | cut -d'-' -f2 | uniq`
/tmp# echo $codes
11 12 15 25 49
/tmp# for f in $codes; do old=`ls *-$f-* | head -n -1`; rm -vf $old; done
removed '601R-11-001.pdf'
removed '601R-11-001A.pdf'
removed '601R-25-005D.pdf'
removed '601R-49-120AC.pdf'
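Note that the loop above groups files by the two-digit code only, so it would keep just one file per code if a code ever covers more than one document, and it relies on plain lexical order. A more cautious variant, sketched here under the assumption that names always look like the samples, groups by the full document number and treats a longer letter suffix as newer. It is a dry run that only prints the older revisions; `list_old_revisions` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Dry run: read filenames on stdin, print every file that is NOT the newest
# revision of its document. Nothing is deleted.
# list_old_revisions is a hypothetical name; names are assumed to look like
# 670BA-11-001B.pdf (document number, optional revision letters, .pdf).
list_old_revisions() {
  awk '
    /\.pdf$/ {
      stem = $0; sub(/\.pdf$/, "", stem)
      rev = stem; sub(/^.*[0-9]/, "", rev)               # trailing letters only
      doc = substr(stem, 1, length(stem) - length(rev))  # e.g. 670BA-11-001
      names[++n] = $0; docs[n] = doc
      # Newer: a longer suffix (AA after Z), or same length and later letter.
      if (!(doc in best) || length(rev) > length(newest[doc]) ||
          (length(rev) == length(newest[doc]) && rev > newest[doc])) {
        newest[doc] = rev; best[doc] = $0
      }
    }
    END { for (i = 1; i <= n; i++) if (best[docs[i]] != names[i]) print names[i] }
  '
}

# Example: review the printed list first, and only then delete anything.
# ls | list_old_revisions
```

Printing instead of deleting is deliberate: with a few thousand files it seems safer to check the list before removing anything.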
With 670B-11-001B.pdf and 670B-11-001AA.pdf, the one with just the B sorts improperly because the second letter isn't accounted for, so B comes after AA in those cases. There's probably a way to break down the characters and then sort, but I wanted to ask about sorting by Date Modified instead. Would that work?
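One way to get the order right without breaking the name into characters is to sort by filename length first and alphabetically second. The sketch below assumes the files being compared belong to the same document, so a longer name can only mean a longer revision suffix; `sort_by_length` is a hypothetical name. Sorting by Date Modified (for example with `ls -t`) could also work, but only if every file's timestamp still reflects when that revision was issued, which can't be confirmed from the filenames alone.

```shell
#!/usr/bin/env bash
# sort_by_length is a hypothetical helper: oldest revision first, newest last.
# Assumes all input names belong to ONE document (same number, different letters).
sort_by_length() {
  # Prefix each name with its length, sort by length then by name, drop the prefix.
  awk '{ print length($0), $0 }' | LC_ALL=C sort -k1,1n -k2 | cut -d' ' -f2-
}

printf '%s\n' 670B-11-001AA.pdf 670B-11-001B.pdf 670B-11-001.pdf | sort_by_length
# prints:
# 670B-11-001.pdf
# 670B-11-001B.pdf
# 670B-11-001AA.pdf
```

A plain `sort` on the same three names would put 670B-11-001AA.pdf before 670B-11-001B.pdf, which is the mis-ordering described above.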