Author Topic: File Utility Software - Non-Duplicate Compare Finder  (Read 4667 times)

Offline fohat.digs

  • * Elevated Elder
  • Thread Starter
  • Posts: 6533
  • Location: 35°55'N, 83°53'W
  • weird funny old guy
File Utility Software - Non-Duplicate Compare Finder
« on: Sun, 10 January 2016, 09:48:42 »
Does anyone know of a good free Windows utility that does the opposite of "Duplicate File Finder" as is often found in utility programs such as Glary Utilities?

It seems that finding duplicate files is commonplace but I need the opposite.

My problem is that I have several external hard drives with miscellaneous backup archives containing almost the same stuff but not exactly. These are drives with thousands of entries, so finding and deleting individual duplicates is not really an option. I would like to bring them all up to date together (and be more scrupulous about it in the future), but I do not want to simply "sync" the drives - I need to look at the folders and decide which I want, and it is not necessarily going to be the most recent version. The issue matters more for folders with the same name than for individual files, because not all folders with the same name have equivalent content.

Thanks in advance for your help.

PS - if the only option is in Linux, I could do it in Ubuntu, but I would prefer to do it from within Windows if possible

"However, even though I was born in the Mesozoic, I do know what anyone who wants to reach out to young people should say: Billionaires took your money. They took your chance to buy a home. They took your chance at a good education. They stole your opportunities. Billionaires took the things you want in life. If you really want those things, you have to take them back.
That's the message. That's the whole message. Say that every day, not just to reach America's frustrated young white men, but people of every age, race, and gender.
Late-stage capitalism is a wealth-concentration engine, focused on vacuuming up every dollar and putting it in as few hands as possible. Republicans are helping that vacuum suck.
How does a tiny fraction of the population get away with this? They do it by dividing the other 99% of Americans against themselves."
- Marc Sumner 2025-05-30

Offline berserkfan

  • Posts: 2135
  • Location: Not CONUS Not CONUS Not CONUS Not CONUS
  • changing diapers is more fun than model f assembly
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #1 on: Sun, 10 January 2016, 10:00:56 »
wow this is seriously confusing geek stuff
Most of the modding can be done on your own once you break through the psychological barriers.

Offline fohat.digs

  • * Elevated Elder
  • Thread Starter
  • Posts: 6533
  • Location: 35°55'N, 83°53'W
  • weird funny old guy
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #2 on: Sun, 10 January 2016, 17:50:28 »
Since there were no suggestions forthcoming, and it seems that there are no utilities to do what I want to do simply and cleanly, I will proceed to the painful brute force method.

I downloaded "Duplicate Finder" and selected a "master" disc that I will use as the standard. I set it to work, and it will probably take a full day to process because these are 2 former internal hard drives (1.5T and 2T) set in external drive docks and connected via USB 2. We are talking about 40K very large files in 8K folders here.

At the end, I will delete all the duplicates from the "number 2" drive and look over whatever is left before I decide what to do with it. Then I will do it all over again for the 3rd drive. Going forward, I will "leapfrog" and just copy the whole enchilada onto the (freshly formatted) next drive for each iteration.

Offline inanis

  • Truly Literally The Cloud
  • * Destiny Supporter
  • Posts: 790
  • Location: Dark Places
    • SEALWoodworking
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #3 on: Sun, 10 January 2016, 18:04:42 »
You probably already checked into this, but have you looked at rsync?  Not sure if it can do exactly what you want but it is pretty robust and already part of the OS.
Some hearts are gallows, I'm not here for hangin' around

Offline RabRhee

  • Posts: 271
  • Location: Highlands, Scotland
  • Life is just a box of cherries.
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #4 on: Sun, 10 January 2016, 18:22:58 »
You probably already checked into this, but have you looked at rsync?  Not sure if it can do exactly what you want but it is pretty robust and already part of the OS.

Similarly, I was going to suggest FreeFileSync (http://www.freefilesync.org/). I find it quite flexible, and you can compare without then synchronising, and if you just wanted to create a list and work on it manually rather than with the automatic options, you can export the file list as a CSV file too.
-Life is good-          Crafting: |  KeychainsMore.   .Keychains | Crowdsource Key | Budget Keycap Board |

QFR Dvorak Greens | Neo 87 Dvorak Blues

Offline fohat.digs

  • * Elevated Elder
  • Thread Starter
  • Posts: 6533
  • Location: 35°55'N, 83°53'W
  • weird funny old guy
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #5 on: Sun, 10 January 2016, 19:02:37 »
I find it quite flexible, and you can compare without then synchronising, and if you just wanted to create a list and work on it manually

Thanks, but I have that already. Unfortunately, with 40,000 files and 97% overlap, I can't really use any sort of list unless it is a list of only the 3% segregated out so that it is not lost within the 97%.

Everybody wants to give you the list the other way, but that is too overwhelming and daunting for me to sift through.
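
For what it's worth, a list of only the leftover 3% could be produced with a short script. This is just a rough sketch, not a tested tool: it hashes every file on both drives, so on terabyte drives over USB 2 it would still take many hours. The function names (`file_digest`, `content_index`, `only_on_second`) are made up for illustration, and it treats two files as duplicates when their SHA-256 digests match, regardless of name or folder.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def content_index(root):
    """Map content digest -> list of paths (relative to root) with that content."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index.setdefault(file_digest(full), []).append(rel)
    return index


def only_on_second(master_root, second_root):
    """Relative paths under second_root whose content is absent from master_root."""
    master = content_index(master_root)
    second = content_index(second_root)
    unique = []
    for digest, paths in second.items():
        # Hypothetical rule: same digest anywhere on the master counts as a duplicate.
        if digest not in master:
            unique.extend(paths)
    return sorted(unique)
```

Calling `only_on_second("E:/", "F:/")` (drive letters are placeholders) would return just the files on the second drive that have no byte-identical copy on the master, which is the short list rather than the long one.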

Offline RabRhee

  • Posts: 271
  • Location: Highlands, Scotland
  • Life is just a box of cherries.
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #6 on: Sun, 10 January 2016, 19:21:26 »
I find it quite flexible, and you can compare without then synchronising, and if you just wanted to create a list and work on it manually

Thanks, but I have that already. Unfortunately, with 40,000 files and 97% overlap, I can't really use any sort of list unless it is a list of only the 3% segregated out so that it is not lost within the 97%.

Everybody wants to give you the list the other way, but that is too overwhelming and daunting for me to sift through.

In FreeFileSync I do a mirror left to right for backups, and the list it gives me contains only the files that are different, and those are the only files on the list I export. But I guess that only works if the names and the directory structure match exactly.

Still, even if that works, it's a pain, good luck with it :)

Offline fohat.digs

  • * Elevated Elder
  • Thread Starter
  • Posts: 6533
  • Location: 35°55'N, 83°53'W
  • weird funny old guy
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #7 on: Sun, 10 January 2016, 19:40:37 »

the list it gives me contains only the files that are different, and those are the only files on the list I export. But I guess that only works if the names and the directory structure match exactly.


I have some instances that are really goofy and arcane, and I carelessly laid booby traps for myself.

For example each disc might have "\Music\Rock\Beatles\1966 Revolver\01 - Taxman.MP3"

But one would be the common 192 kbps CD rip and the other would be the preferable 320 kbps mono LP rip that I would want to keep.
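
Booby traps of this kind (same path on both drives, different bytes) can at least be listed automatically, even if choosing the keeper stays a manual job. A minimal sketch, assuming both drives use the same folder layout; `relative_files` and `same_name_different_content` are hypothetical names, and the script only reports, it never deletes or copies anything:

```python
import filecmp
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def same_name_different_content(root_a, root_b):
    """List (relative path, size on A, size on B) for shared paths whose bytes differ."""
    shared = relative_files(root_a) & relative_files(root_b)
    conflicts = []
    for rel in sorted(shared):
        a = os.path.join(root_a, rel)
        b = os.path.join(root_b, rel)
        # shallow=False compares file contents instead of trusting
        # size and modification time alone.
        if not filecmp.cmp(a, b, shallow=False):
            conflicts.append((rel, os.path.getsize(a), os.path.getsize(b)))
    return conflicts
```

The two sizes in each result give a quick hint about which copy is the higher-bitrate rip, since the 320 kbps file should be noticeably larger than the 192 kbps one for the same track.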

Offline UsualSuspectXXX

  • Posts: 3461
  • Location: Persephone
  • (⌐■_■)⊃━☆゚.*・。゚
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #8 on: Sun, 10 January 2016, 19:48:49 »
You could probably write a pretty simple batch script to do this for you with the aid of Grep for Windows.

Offline smknjoe

  • Posts: 862
  • Location: Tejas
  • I like tactile, clicky, switches.
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #9 on: Sun, 10 January 2016, 20:02:42 »

the list it gives me contains only the files that are different, and those are the only files on the list I export. But I guess that only works if the names and the directory structure match exactly.


I have some instances that are really goofy and arcane, and I carelessly laid booby traps for myself.

For example each disc might have "\Music\Rock\Beatles\1966 Revolver\01 - Taxman.MP3"

But one would be the common 192 kbps CD rip and the other would be the preferable 320 kbps mono LP rip that I would want to keep.



Rsync should be able to keep both files even if they have the same name (with its --backup option the overwritten copy is kept under a suffix). If I'm not mistaken, by default it looks at file size and modification time. If either is different, the file gets copied.
SSKs for everyone!

Offline smknjoe

  • Posts: 862
  • Location: Tejas
  • I like tactile, clicky, switches.
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #10 on: Sun, 10 January 2016, 20:04:44 »
That's assuming you'd rather have duplicates than lose a copy that you wanted to keep. If you want to cherry pick exact files then UsualSuspectXXX's suggestion would be better.

Offline iLLucionist

  • * Elevated Elder
  • Posts: 2735
  • Location: Netherlands
  • Topre is Love.
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #11 on: Thu, 28 January 2016, 15:10:05 »
How sick would you like the automation? I would probably take a day and write a Python script to do everything for me:

(1) For each source (hard drive), build a list in which each entry contains the filename, md5 (or another hash), size, and modification date (and any other criteria you would like to use).

(2) Use Python sets to find out which filenames exist across the sources and make one list containing the files that exist in multiple places.

(3) For those files in that list, automatically copy all files to the new place with different filenames and append modification date to filename or number them (e.g., 2015-03-20 == #1, 2015-05-30 == #2 etc).

(4) Copy the rest over.

Or am I missing something?
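
Roughly, steps (1) to (4) could look like the sketch below. It is untested against real drives and makes a few assumptions of its own: `catalog` and `plan_merge` are made-up names, identical copies (same md5) are taken only once, and conflicting versions get the modification date plus a short hash prefix in the new filename so that two versions from the same day cannot collide. It only builds a copy plan and does not touch any files.

```python
import hashlib
import os
from datetime import datetime


def catalog(root):
    """Step 1: map relative path -> (md5, size, mtime) for one source."""
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            md5 = hashlib.md5()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    md5.update(chunk)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            entries[rel] = (md5.hexdigest(), st.st_size, st.st_mtime)
    return entries


def plan_merge(roots):
    """Steps 2-4: return a list of (source path, destination relative path)."""
    catalogs = {root: catalog(root) for root in roots}
    # Step 2: set union of every relative path seen on any source.
    all_names = set().union(*(set(c) for c in catalogs.values()))
    plan = []
    for rel in sorted(all_names):
        holders = [r for r in roots if rel in catalogs[r]]
        # Keep one source per distinct content hash.
        by_hash = {}
        for r in holders:
            by_hash.setdefault(catalogs[r][rel][0], r)
        if len(by_hash) == 1:
            # Step 4: unique file, or identical on every source that has it.
            plan.append((os.path.join(holders[0], rel), rel))
            continue
        # Step 3: same name, different content, so keep every version
        # under a distinguishing filename.
        stem, ext = os.path.splitext(rel)
        for digest, r in by_hash.items():
            mtime = catalogs[r][rel][2]
            stamp = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d")
            plan.append((os.path.join(r, rel), f"{stem} ({stamp}, {digest[:8]}){ext}"))
    return plan
```

The plan could be reviewed by hand first and then carried out with something like `shutil.copy2(src, os.path.join(target, dst))` for each pair, where `target` is the freshly formatted drive.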
MJT2 Browns o-rings - HHKB White - ES-87 Smoke White Clears - 87UB 55g

Offline fohat.digs

  • * Elevated Elder
  • Thread Starter
  • Posts: 6533
  • Location: 35°55'N, 83°53'W
  • weird funny old guy
Re: File Utility Software - Non-Duplicate Compare Finder
« Reply #12 on: Thu, 28 January 2016, 15:29:14 »
Did I say that this was over 1TB and 40K+ files?

Eventually I used "Duplicate Finder Free" and it ran for over 24 hours (although it reported running less than 2 hours). I then reconciled the different files onto 1 drive and formatted the other before I copied the "complete current" set onto it, too.

In the future I will keep changes and additions in a separate directory until I am ready to deal with them properly.