Hints on what FreeDup (seems to) lack

  1. Excess Mode
    Duff provides an "excess mode" that shows clusters of identical files where exactly one is missing. The intention is to remove all duplicates and keep the one that is not shown. The man page of duff suggests using:
    duff -er . | xargs rm
    In case you want to do the same with FreeDup, your line should read
    freedup -in . | awk '{if(NF!=0)print x;x=$0}' | xargs rm
    Please be aware that two such jobs running concurrently might together delete every copy of a file: the qsort() of the OS is not a stable sort, so it does not guarantee to keep two identical files in their original order, and each job may leave out a different file.
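    To see what the awk filter does before anything is piped into rm, it can be tried on sample input. The sketch below assumes, as the awk script implies, that each cluster of identical files is printed as consecutive lines followed by an empty line. The file names and the function names sample and all_but_last are made up for illustration.

```shell
# Hypothetical sample of cluster output: one file per line,
# clusters separated by an empty line (assumed format).
sample() {
    printf '%s\n' a/one b/one c/one '' a/two b/two ''
}

# The filter from the command above: print the previous line whenever
# the current line is non-empty, so the last file of a cluster is
# never printed.
all_but_last() {
    awk '{if(NF!=0)print x;x=$0}'
}

# Empty lines are dropped here only to make the result easier to read.
sample | all_but_last | grep -v '^$'
```

    This lists a/one, b/one and a/two, i.e. every file except the last one of each cluster. Note that xargs splits its input at white space, so file names containing blanks need extra care.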

  2. Convert Symbolic Links To Real Files
    Duff also provides a "reverse mode" that converts symbolically linked files back to plain files that are no longer linked to their original source. If this is generally desired, you may want to use:
    find test -type l -exec cp {} {}.tmp$$ \; -exec mv {}.tmp$$ {} \;

  3. Convert Symbolic Links To Hard Links
    If you wish to replace symbolic links with hard links, I can offer you two alternatives. One uses a shell script, soft2hard. The other uses my own C code for a command, symharden, which accepts a single symbolic link and replaces it with a hard link if possible. It fails with a non-zero return code in all other cases. Please be aware that the usual restrictions for hard links apply, i.e. no cross-device links. Use this line to do it for a full tree:
    find test -type l -exec symharden {} \;
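    For illustration, the idea can be sketched as a shell function. This is not the soft2hard script or the symharden source, only a minimal sketch under assumptions: the name harden_link is hypothetical, and readlink -f is assumed to be available (it is part of GNU coreutils).

```shell
# Hypothetical helper, NOT the author's soft2hard or symharden.
# Replaces the symbolic link $1 with a hard link to its target.
# Assumes readlink -f is available (GNU coreutils).
harden_link() {
    link=$1
    [ -L "$link" ] || return 1                 # not a symbolic link
    target=$(readlink -f "$link") || return 1
    [ -f "$target" ] || return 1               # dangling, or not a regular file
    # Create the hard link under a temporary name first, so the symbolic
    # link survives if ln fails (e.g. target on another device).
    ln "$target" "$link.tmp$$" || return 1
    mv -f "$link.tmp$$" "$link"
}

# Small demonstration in a scratch directory.
demo=$(mktemp -d)
echo data > "$demo/orig"
ln -s orig "$demo/alias"
harden_link "$demo/alias" && ls -li "$demo"
```

    After the call, both names should show the same inode number and a link count of 2. Like symharden, the function returns a non-zero code whenever the link cannot be replaced.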

  4. Hard-Linking directories
    Some operating systems support linking of directories on some file systems with the link command (this is not possible with ln). Since the testing environment does not provide such functionality, FreeDup has no option for it. On the other hand, it would probably not change the file system size significantly.

  5. Exclude Directories
    FreeDup does not really provide a way to exclude directories by full or partial path name. But since you may prepare any file list, you are able to do this as well. For me mostly two situations apply: You may easily think of other solutions following these patterns.
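    A minimal sketch of two ways to prepare such a file list with find. Both are assumptions for illustration and not necessarily the two situations meant above. The function names and the demo tree are hypothetical, and how the list is then handed to FreeDup is left out, since that depends on the options of your version.

```shell
# Build a small hypothetical tree to demonstrate on.
tree=$(mktemp -d)
mkdir -p "$tree/keep/.git" "$tree/cache/deep"
touch "$tree/keep/a" "$tree/keep/.git/d" "$tree/cache/b" "$tree/cache/deep/c"

# 1) Exclude one directory by its full path:
#    -prune stops find from descending into it.
list_without_path() {
    find "$1" -path "$2" -prune -o -type f -print
}

# 2) Exclude by partial path name:
#    filter the finished list with grep -v.
list_without_pattern() {
    find "$1" -type f -print | grep -v "$2"
}

list_without_path "$tree" "$tree/cache"
list_without_pattern "$tree" '/\.git/'
```

    The first call lists only the two files below keep, the second lists every file that is not inside a .git directory.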

  6. Directory Modification Times
    When linking files it is unavoidable to change the modification and the access time of the directory the target belongs to. Carsten Schmidt reported a case to me where this leads to additional (unwanted) actions. Use the -T option to make FreeDup keep the modification times of the directories. Since FreeDup avoids locking, there is a small chance for another process to modify the directory while the access date is (mis)corrected.
    NB: The full time stamp of the linked file is the one of the link source. After linking, any modification would affect both link source and target.
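    The effect of keeping the directory times can be imitated by hand with touch -r. This is only a sketch of the idea and not how FreeDup implements -T; the variable names and the scratch directory are hypothetical.

```shell
# Hypothetical scratch directory with one file to link.
dir=$(mktemp -d)
stamp=$(mktemp)
echo data > "$dir/source"

# Remember the time stamps of the directory in a reference file,
# create the link, then copy the remembered time stamps back.
touch -r "$dir" "$stamp"
ln "$dir/source" "$dir/target"     # this updates the mtime of $dir
touch -r "$stamp" "$dir"
```

    As with -T, nothing is locked here: if another process changes the directory between the two touch calls, its newer time stamp is overwritten as well.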

  7. Removing empty files or directories
    With Linux this simple command line would do it for directories:
    find ./ -type d -empty -print0 | xargs -0 rmdir
    and another one for empty files
    find ./ -type f -empty -print0 | xargs -0 rm
    Both lines cope with line feeds within file names, since they use zero-terminated strings (this hint is from the readme of dupmerge).
    With a non GNU-like OS this command line would do it for directories:
    find ./ -type d -size 0c -print | xargs rmdir
    and a similar one for empty files:
    find ./ -type f -size 0c -print | xargs rm

  8. Finding Linked files with Windows
    Kurt pointed out how hard it can be to find linked files within Windows (compare: junction from Sysinternals). He suggests this command using cygwin:
    find . -type f -noleaf -links +1 -printf "%n %i %f\t%h\n" | sort | less

Hints on (previous) strange behaviour of FreeDup

Quality Verification

Starting with version 1.0-5 the binary installation provides a file called verify. This is partially identical to the testing routines of the source package. Please be aware that you have to be root to be successful with all tasks, since some test files are given to other users (bin.bin, except for cygwin: ASPNET.SYSTEM). In version 1.0-5 there is also a good chance that the interactive test fails when the sorting order (which was not defined!) differs from the one on my development system.

Hash Evaluation

The versions before 1.0-4 do not support an internal hash sum calculation. External hash programs have the disadvantage of a speed loss, since the hash sums are calculated separately (use -# to switch hash sums off). The advantage is that you may use nearly every external program that generates hash sums. You have to set the paths at compile time or move/link the executables to where they are expected.

The use of external programs may cause strange effects if they do not work as expected. This was the reason why the cygwin versions before 1.0-3 all had no hash support. Version 1.0-3 does no strict testing, but checks that the output format matches the format that FreeDup needs. On my development system (SuSE Linux 10) the output of sha1sum freedup reads

284abef5f109e88d8e997a8756c6fe396dade795  freedup
while it reads under cygwin
284abef5f109e88d8e997a8756c6fe396dade795 *freedup

FreeDup expects a 40-byte hash sum for sha1sum and 32 alphanumeric bytes for md5sum. The use of the 16-byte output from sum is not really considered helpful, but is provided as a fallback. The two spaces after the hash code (for cygwin the second one is an asterisk) are checked to be there (details are in the definition around hashme[] within the source). If they are missing, it is assumed that the hash methods quite certainly provide misleading results. Therefore they are disabled automatically. Prior to version 1.1 you see output like this if the output format does not match:

$ freedup /home/peter/
md5: format does not match ('/' instead of ' ')
sum: format does not match ('i' instead of ' ')
No working hashmethod found.
--> Use of hash methods is disabled.
[...]
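
The format check described above can be sketched in shell. This is an assumption-laden illustration, not a translation of FreeDup's C code: the function name hash_format_ok is hypothetical, and the check only looks at the hash length and the two characters that follow it.

```shell
# Sketch of the format check (an assumption, not FreeDup's C code):
# $1 = first output line of the hash program, $2 = expected hash length.
# Accepts <hash><space><space or asterisk><file name>.
hash_format_ok() {
    printf '%s\n' "$1" | grep -Eq "^[[:alnum:]]{$2} [ *]"
}

for line in \
    "284abef5f109e88d8e997a8756c6fe396dade795  freedup" \
    "284abef5f109e88d8e997a8756c6fe396dade795 *freedup" \
    "this is not a hash line"
do
    if hash_format_ok "$line" 40; then
        echo "ok:       $line"
    else
        echo "disabled: $line"
    fi
done
```

The two sha1sum lines shown earlier pass the check, while the third line is rejected.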

Starting with version 1.0-4, FreeDup provides an internal hash method as default and fallback method, but allows a free choice between an internal and the external hash methods.