We are faced with a situation where data has been backed up to several external media, and we are undertaking an exercise to consolidate it. The data consists of binary files, audio, video, compressed archives, virtual machines, databases, etc.
Is it best practice to copy all the files to a single location before deduplicating the data, or is it normal to run the procedure across multiple media?
Is it better to run file-level or block-level deduplication? I am aware of the technical differences but am unclear on why you would choose one over the other. We are after accuracy rather than performance.
EDIT
When I say copy, I mean we would copy each source to a single drive or NAS, with each source represented by its own directory. All the data is currently stored on external hard drives. The objective is to deduplicate the data and end up with a single source of truth.
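To be concrete about what I mean by file-level deduplication: something along the lines of the sketch below, which hashes whole files and reports paths holding identical content (the function names and the root directory are just illustrative, not an existing tool we use):

```python
import hashlib
import os
from collections import defaultdict

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large media files do not exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Map each content hash to the list of paths that share that exact content."""
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[file_sha256(path)].append(path)
    # Keep only hashes seen more than once, i.e. actual duplicates.
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

This only catches byte-identical files; block-level deduplication, as I understand it, could also collapse shared regions inside otherwise different files (e.g. two VM images that diverge only slightly), which is part of what I am weighing up.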