
I have a directory which contains pairs of files. Unfortunately, the naming scheme of the files is a mess, so it is very difficult to match the two files of a pair by file name.

BUT: the two files of each pair were created at nearly the same time, "nearly" meaning less than 1 minute apart. Unrelated files have timestamps which differ by at least several hours.

How can I find each file pair?

I want to process them further. Therefore, an output format which is easy to parse and use in a bash script is preferred.

Example directory listing:

Mar 14  08:29   AAA_2018_03_20_33.xxx
Mar 14  08:30   BBB-xxx-20_4.pdf
May 3   08:32   AAA_2018_05_10_40.xxx
May 3   08:32   BBB-xxx-10_2.pdf
May 24  08:33   AAA_2018_05_30_44.xxx
May 24  08:33   BBB-xxx-30_5.pdf
Mar 23  08:44   AAA_2018_03_30_35.xxx
Mar 23  08:44   BBB-xxx-30_1.pdf
May 18  08:48   AAA_2018_05_25_43.xxx
May 18  08:48   BBB-xxx-25_7.pdf

I sorted them by time of day to highlight which files belong together. The filenames have also been censored slightly.

There might be errors which need to be dealt with: there could be single files (the partner is missing) or more than two files within the same time delta. In these cases I want to call a bash function to deal with the problem (log it, inform the user, etc.).

1 Answer


Rough sketch: for each AAA* file:

  • get the time stamp (stat is your friend)
  • compute the minimum timestamp for a matching BBB (same as the AAA, I think), and the maximum timestamp for BBB (AAA + some minutes)
  • use the two timestamps as a condition in a find: \( \! -newermt "$maxts" -a -newermt "$mints" \)
  • rename the found file (or create a link) to AAA-whatever-BBB-whatever.pdf (e.g. AAA_2018_03_20_33-BBB-xxx-20_4.pdf) so that you can later obtain the BBB name from the AAA name.
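
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: GNU stat, find and date are available, bash is 4.4 or newer (for mapfile -d ''), the files really start with AAA and BBB (the names in the question are censored), and the window is 60 seconds on either side of the AAA timestamp rather than only after it. The names window, handle_problem and find_pairs are made up for this sketch. Instead of renaming, it prints one tab-separated pair per line.

```shell
#!/usr/bin/env bash
# Sketch: pair each AAA* file with the BBB* file whose mtime lies
# within a window around it. Assumes GNU stat, find and date.

window=60   # seconds; assumption based on "less than 1 minute apart"

# Hypothetical handler for singles or groups of more than two files.
handle_problem() {
    printf 'PROBLEM\t%s\n' "$*" >&2
}

find_pairs() {
    local dir=$1 a ts mints maxts
    local -a matches
    for a in "$dir"/AAA*; do
        [ -e "$a" ] || continue                 # glob matched nothing
        ts=$(stat -c %Y -- "$a")                # mtime in epoch seconds
        mints=$(date -d "@$((ts - window))" '+%Y-%m-%d %H:%M:%S')
        maxts=$(date -d "@$((ts + window))" '+%Y-%m-%d %H:%M:%S')
        # Collect BBB files with mints < mtime <= maxts (NUL-separated).
        mapfile -d '' -t matches < <(
            find "$dir" -maxdepth 1 -type f -name 'BBB*' \
                 -newermt "$mints" \! -newermt "$maxts" -print0
        )
        if [ "${#matches[@]}" -eq 1 ]; then
            printf '%s\t%s\n' "$a" "${matches[0]}"
        else
            handle_problem "$a" "${matches[@]}"
        fi
    done
}
```

A caller could then read the pairs with something like `find_pairs /some/dir | while IFS=$'\t' read -r aaa bbb; do ...; done`. Note that this loop only starts from AAA files, so a BBB file without any AAA partner is not reported; that case would need a second pass in the other direction.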
