Finding and deleting duplicate files
Okay, so you have a huge pile of mp3s, somehow managed to copy them somewhere over and over, and now only want one copy of each? (Hey! I do this all the time, copying them from machine to machine!)
The best way to check that they are “identical” is with md5sum, which compares file contents rather than names. This is how I deal with my problem.
# this gives me a file called md5list with all the filenames and their md5sum
find . -type f ! -name md5list | while IFS= read -r file ; do md5sum "$file" >> md5list ; done
# this checks for files with duplicate md5sum (uniq -d prints each repeated value once)
awk '{print $1}' md5list | sort | uniq -d > duplist
# this outputs a list of files minus the first/top one so we are still left with one copy
for i in `cat duplist` ; do grep "^$i" md5list | sed "1d" | sed "s/^$i  //" >> rmlist ; done
# this moves them all to a dir called bin/ which you can remove later
mkdir -p bin
while IFS= read -r line ; do echo "removing $line" ; mv "$line" bin/ ; done < rmlist
# letting you know the above!
echo "check bin/ for any files you accidentally deleted"
You probably want to remove the files md5list, duplist and rmlist after you are done.
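If you would rather skip the intermediate files, the same idea can be sketched as a single pipeline. This is only a rough sketch, not a tested replacement for the steps above: the function name list_extra_copies is made up for illustration, it assumes GNU md5sum's default output (a 32-character checksum, two spaces, then the filename), and it assumes filenames contain no newlines or backslashes.

```shell
# list_extra_copies DIR (hypothetical helper name)
# Prints every file under DIR whose md5sum has already been seen,
# i.e. all copies except the first one in sorted order.
list_extra_copies() {
    find "$1" -type f -exec md5sum {} + \
        | LC_ALL=C sort \
        | awk 'seen[$1]++ { print substr($0, 35) }'   # filename starts at column 35
}
```

Something like `list_extra_copies . > rmlist` would then stand in for the md5list and duplist steps, and you can look through rmlist before moving anything.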

You might want to look at the hardlink tool by Jakub Jelinek (https://fedorahosted.org/hardlink/browser/hardlink.c). It replaces identical files with hard links to a single copy, so you don’t need to delete anything.
If, however, you do want to delete the duplicate entries, you could list them with something like:
find . -type f -links +1 -printf "%i %p\n" | sort -n
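That command only lists the hard-linked files, grouped by inode number. A possible next step, sketched here under some assumptions, is to keep the first path for each inode and print the rest as candidates for deletion. The function name list_extra_links is made up for illustration, -printf needs GNU find, and the sketch assumes everything under the directory sits on one filesystem (inode numbers are only unique per filesystem) and that filenames contain no newlines.

```shell
# list_extra_links DIR (hypothetical helper name)
# Prints every path under DIR that shares an inode with a path already
# seen, i.e. all hard links to a file except the first in sorted order.
list_extra_links() {
    find "$1" -type f -links +1 -printf '%i %p\n' \
        | LC_ALL=C sort -n \
        | awk 'seen[$1]++ { sub(/^[0-9]+ /, ""); print }'   # strip the inode column
}
```

Check the output by eye before feeding it to rm; a file with more than one link may have its other links outside the directory you searched, in which case nothing is printed for it.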
cheers