Monday, July 25, 2011

Collectfs - a trash collecting userspace file system for Linux

Collectfs is a FUSE userspace filesystem that provides add-on trash collection for a directory hierarchy. The purpose of collectfs is to protect a project hierarchy by providing a fairly universal no-clobber mechanism:
  • The history of changes is preserved.
  • Missteps with rm, mv, cat, etc. are not permanent.
  • It works seamlessly with standard development tools.
It is not intended as a replacement for revision control or backups. The intention is to protect you during the between-times, when you're not covered by these other tools.

Any file that would otherwise be destroyed by remove (unlink), move, link, symlink, or open-truncate is instead relocated to a trash directory (mount-point/.trash/). Collected files are date-time stamped so that edit history is preserved (a version number is appended if the same file is collected more than once in the same second).

Usage is straightforward. For example:
# Use collectfs to mount the real folder onto a mount point (any other folder) 
% collectfs myProject myWorkspace
# Now the mount point mirrors the original but with trash collection
% cd myWorkspace
% vi main.c
% indent main.c
% ls .trash
% diff .trash/main.c.2011-07-24.14:59:39 main.c
% mv .trash/main.c.2011-07-24.14:59:39 main.c
% ls .trash
# To unmount (stop using) the virtual filesystem...
% cd ..
% fusermount -u myWorkspace

It's easy to build from source. The source is available at:
Thanks to Thomas Spahni (vodoo), the openSUSE Build Service now has a collectfs package. (I also used Thomas's much more concise description of collectfs to rewrite the first paragraph of this blog post.)

I'm currently using collectfs to help write collectfs. It augments my development environment with a local-history feature similar to the one in Eclipse. But unlike Eclipse, KDE, or GNOME, the protection is implemented in the filesystem layer and applies to any tool I care to employ.

Thoughts on a deeper level... I don't think Eclipse, KDE, or GNOME should be in the business of collecting trash and providing undelete. The proper place to do this is the filesystem. But that would be a very radical departure for a UNIX filesystem, probably too radical to feature in mainstream UNIX filesystem development. Such features do not fit well with established practice, and they raise security and privacy concerns (deleted data coming back to haunt you). I do think that if you want to think beyond Linux/KDE/GNOME, concepts such as trash, undelete, and file versioning are all worth considering for a true desktop OS. I believe GNU Hurd was originally meant to have file versioning similar to VMS. I considered using FUSE to implement VMS-style file versioning:

xyz.txt;1 xyz.txt;2 xyz.txt;3 ... xyz.txt;N
open(xyz.txt) == open(xyz.txt;N)

but I never liked the clutter in my VMS directories, so I went for timestamped trash instead.
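For comparison, the VMS-style lookup described above, where a plain name resolves to its highest-numbered version, can also be sketched in C. This is hypothetical code for the alternative that collectfs did not adopt. `latest_version` and `versioned_name` are illustrative names, and creating a file is assumed to produce version N+1, as on VMS.

```c
/* Hypothetical sketch of the VMS-style alternative (not what collectfs does):
 * a plain name resolves to the highest "<name>;N" in the directory. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Highest N such that dir contains "<base>;N"; 0 if none, -1 on error. */
static long latest_version(const char *dir, const char *base)
{
    assert(dir != NULL && base != NULL);
    size_t len = strlen(base);
    long best = 0;
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* Only entries spelled exactly "<base>;<digits>" count. */
        if (strncmp(e->d_name, base, len) != 0 || e->d_name[len] != ';')
            continue;
        const char *digits = e->d_name + len + 1;
        if (!isdigit((unsigned char)digits[0]))
            continue;
        char *end;
        long v = strtol(digits, &end, 10);
        if (*end == '\0' && v > best)
            best = v;
    }
    closedir(d);
    return best;
}

/* Name a VMS-style open would use: the newest version when reading,
 * version N+1 when creating. Returns 0 on success, -1 otherwise. */
static int versioned_name(const char *dir, const char *base, int create,
                          char *out, size_t outlen)
{
    long v = latest_version(dir, base);
    if (v < 0 || (!create && v == 0))
        return -1;                          /* error, or nothing to open */
    int n = snprintf(out, outlen, "%s;%ld", base, create ? v + 1 : v);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

In a directory holding `xyz.txt;1` and `xyz.txt;2`, reading `xyz.txt` would resolve to `xyz.txt;2` and creating it would produce `xyz.txt;3`. Every save leaves another numbered entry in the listing, which is exactly the clutter the timestamped `.trash` directory keeps out of sight.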