geekhack
geekhack Community => Other Geeky Stuff => Topic started by: fohat.digs on Sun, 10 January 2016, 09:48:42
-
Does anyone know of a good free Windows utility that does the opposite of "Duplicate File Finder" as is often found in utility programs such as Glary Utilities?
It seems that finding duplicate files is commonplace but I need the opposite.
My problem is that I have several external hard drives with miscellaneous backup archives containing almost the same stuff but not exactly. These are drives with thousands of entries and thus finding and deleting individual duplicates is not really an option. I would like to bring them all up together (and be more scrupulous about it in the future) but I do not want to simply "synch" the drives - I need to look at the folders and decide which I want, and it is not necessarily going to be the most recent version. The issue is more important concerning folders with the same name than it is with individual files, because not all folders with the same names have equivalent content.
Thanks in advance for your help.
PS - if the only option is in Linux, I could do it in Ubuntu, but I would prefer to do it from within Windows if possible
-
wow this is seriously confusing geek stuff
-
Since there were no suggestions forthcoming, and it seems that there are no utilities to do what I want to do simply and cleanly, I will proceed to the painful brute force method.
I downloaded "Duplicate Finder" and selected a "master" disc that I will use as the standard. I set it to work, and it will probably take a full day to process because these are 2 former internal hard drives (1.5 TB and 2 TB) set in external drive docks and connected via USB 2.0. We are talking about 40K very large files in 8K folders here.
At the end, I will delete all the duplicates from the "number 2" drive and look over whatever is left before I decide what to do with it. Then I will do it all over again for the 3rd drive. Going forward, I will "leapfrog" and just copy the whole enchilada onto the (freshly formatted) next drive for each iteration.
-
You probably already checked into this, but have you looked at rsync? Not sure if it can do exactly what you want but it is pretty robust and already part of the OS.
-
Similarly, I was going to suggest FreeFileSync (http://www.freefilesync.org/). I find it quite flexible: you can compare without then synchronising, and if you just want to create a list and work on it manually rather than with the automatic options, you can export the file list as a CSV file too.
-
I find it quite flexible, and you can compare without then synchronising, and if you just wanted to create a list and work on it manually
Thanks, but I have that already. Unfortunately, with 40,000 files and 97% overlap, I can't really use any sort of list unless it is a list of only the 3% segregated out so that it is not lost within the 97%.
Everybody wants to give you the list the other way, but that is too overwhelming and daunting for me to sift through.
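For what it's worth, a list of "only the 3%" could be produced with a short script. The following is a minimal Python sketch, not a tested tool: it treats one drive as the master and lists every file on the other drive whose content does not appear anywhere on the master, whatever its name or folder. The function names (`file_hash`, `walk_files`, `unique_to_other`) are made up for illustration, and the choice of SHA-256 is an assumption; any hash would do.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_files(root):
    """Yield the full path of every file under root."""
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            yield os.path.join(folder, name)


def unique_to_other(master_root, other_root):
    """List files under other_root whose content is nowhere under master_root."""
    # Group master files by size so that only plausible matches get hashed.
    master_by_size = {}
    for path in walk_files(master_root):
        master_by_size.setdefault(os.path.getsize(path), []).append(path)

    master_hashes = {}  # size -> set of hashes, filled in only when needed
    unique = []
    for path in walk_files(other_root):
        size = os.path.getsize(path)
        if size not in master_by_size:
            unique.append(path)  # no master file has this size, so no match
            continue
        if size not in master_hashes:
            master_hashes[size] = {file_hash(p) for p in master_by_size[size]}
        if file_hash(path) not in master_hashes[size]:
            unique.append(path)
    return sorted(unique)
```

Something like `unique_to_other("E:\\", "F:\\")` (drive letters are placeholders) would return only the leftovers. Files whose size matches something on the master still have to be read in full, so over USB 2.0 this would not be quick either.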
-
In FreeFileSync I do a left-to-right mirror for backups, and the list it gives me contains only the files that are different, so those are the only files in the list I export. But I guess that only works if the names and the directory structure match exactly.
Still, even if that works, it's a pain, good luck with it :)
-
the list it gives me are only the files that are different, and that's the only files on the list I export. But I guess that only works if the names are exact and the directory structure too.
I have some instances that are really goofy and arcane, and I carelessly laid booby traps for myself.
For example each disc might have "\Music\Rock\Beatles\1966 Revolver\01 - Taxman.MP3"
But one would be the common 192 kbps CD rip and the other would be the preferable 320 kbps mono LP rip that I would want to keep.
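A case like this (same path on both discs, different file behind it) can at least be flagged automatically. Below is a rough Python sketch using the standard library's `filecmp.dircmp`; the function name `report_differences` is made up for illustration. It walks two folder trees in parallel and reports only what is not the same on both sides, so identical folders produce no output at all.

```python
import filecmp
import os


def report_differences(left, right, prefix=""):
    """Return (kind, relative_path) pairs for everything that differs."""
    comparison = filecmp.dircmp(left, right)
    found = []
    # Files or folders that exist on one side only.
    for name in comparison.left_only:
        found.append(("left only", os.path.join(prefix, name)))
    for name in comparison.right_only:
        found.append(("right only", os.path.join(prefix, name)))
    # Same name on both sides but not the same file. By default dircmp
    # treats files with equal size and modification time as identical
    # without reading them, and compares contents otherwise.
    for name in comparison.diff_files:
        found.append(("differs", os.path.join(prefix, name)))
    # Recurse into folders that exist on both sides.
    for name in comparison.subdirs:
        found.extend(
            report_differences(
                os.path.join(left, name),
                os.path.join(right, name),
                os.path.join(prefix, name),
            )
        )
    return found
```

With the two Taxman rips above, the result would include a `("differs", ...)` entry for that track, and the choice of which one to keep would still be made by hand.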
-
You could probably write a pretty simple batch script to do this for you with the aid of Grep for Windows (https://twitter.com/BroCaps).
-
Rsync should be able to copy both files into a directory even if they have the same name. If I'm not mistaken, by default it looks at file size and modification time. If either is different, the file gets copied.
-
That's assuming you'd rather have duplicates than lose a copy that you wanted to keep. If you want to cherry pick exact files then UsualSuspectXXX's suggestion would be better.
-
How sick would you like the automation? I would probably take a day and write a Python script to do everything for me:
(1) For each source (hard drive), build a list in which each entry contains the filename, MD5 (or another hash), size, and modification date (and any other criteria you would like to use).
(2) Use Python sets to find out which filenames exist across sources and make one list containing the files that exist in multiple places.
(3) For those files in that list, automatically copy all files to the new place with different filenames and append modification date to filename or number them (e.g., 2015-03-20 == #1, 2015-05-30 == #2 etc).
(4) Copy the rest over.
Or am I missing something?
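The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the names `build_index` and `plan_merge` are made up, the hash from step (1) is left out to keep it short (it could be added with `hashlib`), and nothing is actually copied. The script only produces a plan of which file would go where, so it can be looked over first.

```python
import os
from datetime import datetime


def build_index(root):
    """Step 1: map each path relative to root to (size, modification time)."""
    index = {}
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            full = os.path.join(folder, name)
            info = os.stat(full)
            index[os.path.relpath(full, root)] = (info.st_size, info.st_mtime)
    return index


def plan_merge(indexes):
    """Return (source_label, relative_path, target_path) for every file.

    indexes maps a label for each drive to the result of build_index().
    """
    # Step 2: relative paths that appear on more than one source.
    seen, shared = set(), set()
    for index in indexes.values():
        paths = set(index)
        shared |= seen & paths
        seen |= paths

    plan = []
    for label, index in sorted(indexes.items()):
        for rel, (_size, mtime) in sorted(index.items()):
            if rel in shared:
                # Step 3: keep every copy, tagged with its date and source.
                stem, ext = os.path.splitext(rel)
                stamp = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d")
                target = f"{stem} ({stamp}, {label}){ext}"
            else:
                # Step 4: files found on one source only keep their name.
                target = rel
            plan.append((label, rel, target))
    return plan
```

Calling `plan_merge({"drive1": build_index("E:\\"), "drive2": build_index("F:\\")})` (labels and drive letters are placeholders) gives a list that could be fed to `shutil.copy2` once it looks right. As written, every shared path is kept from every source, so with a large overlap it would make sense to skip copies whose size or hash match before copying anything.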
-
Did I say that this was over 1TB and 40K+ files?
Eventually I used "Duplicate Finder Free" and it ran for over 24 hours (although it reported running less than 2 hours). I then reconciled the different files onto 1 drive and formatted the other before I copied the "complete current" set onto it, too.
In the future I will keep changes and additions in a separate directory until I am ready to deal with them properly.