Basically backup time is approaching, and instead of bluntly copying my home folder (including files that haven’t changed for years) to a timestamped folder on my USB drive, I’m thinking of going for an incremental backup solution this time. I’m using Arch Linux, so obviously I need something that works on Linux, but I actually want something that works on most common platforms (Windows + Mac + Linux), is free, and is commonly available or easy to install. Since I’ve been getting into Git lately, it’s only natural that the idea of using Git as a backup solution was the first to cross my mind. My idea is to initialise a (remote) Git repository on my USB drive, clone it (locally) in the root of my home folder, add files to it and push.
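Concretely, the setup I have in mind would look roughly like the sketch below (the mount point and repository name are made up; and since git clone refuses to run in a non-empty directory, git init plus git remote add amounts to the same thing as cloning):

    # On the USB drive: a bare repository to push backups to
    git init --bare /mnt/usb/home-backup.git

    # In the home folder: a normal repository that uses the drive as its remote
    cd ~
    git init
    git remote add origin /mnt/usb/home-backup.git
    git add .
    git commit -m "Initial backup"
    git push -u origin master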

Git is great because it fulfils all the requirements I’ve mentioned: it’s open source, will already be installed on any machine I’d use as my personal machine, and stores the latest version of each file plain and simple while keeping the history and previous versions in a smart compressed format. It lets me exclude folders and files with a .gitignore, lets me describe each update with a commit message, and even lets me look at what’s changed before composing that message (but also anytime later, as long as the history is preserved) so that I can come up with an accurate description. Plus I can selectively include or exclude files that aren’t covered by the .gitignore, and I can spread a new backup version over multiple commits, each with its own distinct description. In short, Git has some pretty awesome characteristics.
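To make that concrete, a typical backup run could look something like this (the ignore patterns and folder names are just examples):

    # things I never want in the backup (the .gitignore lives in the repo root, i.e. ~)
    printf '.cache/\nDownloads/\n*.tmp\n' >> ~/.gitignore

    # review what changed since the last backup before describing it
    git status
    git diff

    # spread the new backup over a few commits, each with its own description
    git add Documents/
    git commit -m "Backup: updated report drafts"
    git add Pictures/Holidays/
    git commit -m "Backup: holiday photos"
    git push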

That said, there are also some drawbacks. A quick DuckDuckGo search turned up the following ones, which don’t really apply in my case [serverfault]:

  • File permissions are not stored, except for the execute bit. Basically all my files are rw-r--r-- anyway, except for a few that the webserver user has to be able to write to. Those aren’t particularly important, I know which ones they are, and I’ll find out soon enough if I forget to set those permissions again (see the one-liner after this list), so that’s no big deal.
  • Delta compression (storing only the changed parts of a modified file) works well for text files but poorly for most binary files. However, storing some files as deltas, and not storing completely unmodified files again at all, is already an improvement over what I have now. And my backup drive is 3 TB and still about 90% free, while my laptop only has a 320 GB drive with less than 100 GB used, so it will be a long time before the backup drive fills up.
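For the record, re-granting the webserver its write access after a restore would just be a couple of commands along these lines (the path is made up, and I’m assuming the webserver runs as the http user, as it does by default on Arch):

    # give the webserver's group write access to the handful of files that need it
    sudo chown -R :http ~/public_html/uploads
    sudo chmod -R g+w ~/public_html/uploads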

There are however two issues that do bother me:

  • By default, the entire history is kept in the remote repository (on the USB drive) as well as in the local one (on my laptop), meaning it would quickly start taking up space on my laptop too. Is there some way to purge the history on the local end while keeping it on the remote end? (I’ve sketched below what I imagine this might look like.)
  • The repository’s metadata would be stored in ~/.git, in the same home folder as my global Git preferences (~/.gitconfig and the like) and whatever other Git repositories I have lying around. Will this cause conflicts or unexpected behaviour, and if so, is there any way to alleviate it? (Again, a possible workaround is sketched below.)
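For the first point, I imagine the answer involves shallow clones. Something like the following is the kind of thing I have in mind, although I haven’t verified that it fits my workflow (the path is made up, and as far as I know --depth is ignored for plain local paths, hence the file:// syntax):

    # keep only the most recent commit's worth of history on the laptop
    git clone --depth 1 file:///mnt/usb/home-backup.git

For the second point, if keeping the metadata in ~/.git does turn out to cause trouble, I suppose it could live somewhere else entirely, either via git init --separate-git-dir or via a bare repository plus an alias, roughly like this (the names are made up):

    # keep the repository's metadata out of ~/.git altogether
    git init --bare ~/.local/share/homebackup.git
    alias backupgit='git --git-dir="$HOME/.local/share/homebackup.git" --work-tree="$HOME"'

    # then use backupgit instead of git for all the backup commands
    backupgit add Documents/
    backupgit commit -m "Backup"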

I have to admit I haven’t looked at other options yet, but somehow the geek in me just wants to use Git for this, and to set it up without some third-party wrapper that hides the technical details from me. What do you people think? Good or bad idea?