Clean your Git history

Make your repositories cleaner and lighter

Posted on March 5, 2016

Clean your Git history

Why

When I started my .dotfiles repo it wasn’t very obvious to me how to keep my atom packages in it. As a consequence I just decided to push into my .dotfiles everything contained in the .atom directory under my user directory on my machine.

It was an easy and no brain solution that worked for a while but my repository and commits became hard to work with because of the numbers of files contained in every commit (+15000 for some commits). So I had to look for a solution to clean the repository retroactively.

Remove private things (password/keys) from a Git repository

This technique can also be used to remove something that you did push in a repository which shouldn’t be there. Passwords and API keys are good clients for those mistakes. Sometimes they are pushed inadvertently and of course you should remove them from your repository right away.

As explained here by SXX, a good practice is also changing these passwords and API keys. Even they won’t be visible in the future that doesn’t mean that they haven’t been seen while you pushed them.

The steps

Only remove from the history

This version supposes that you already have .gitignore the things that you want to remove from your repository. If you haven’t, just go to the next section.

git filter-branch --tree-filter 'rm -rf atom.symlink/packages' --prune-empty HEAD
git gc
git push origin master --force

Remove from the history and gitignore

Let’s suppose you have just pushed a config.js containing passwords and you want to remove it.

git filter-branch --tree-filter 'rm -rf config.js' --prune-empty HEAD

echo config.js >> .gitignore
git add .gitignore
git commit -m 'Remove config.js from Git history'

git gc
git push origin master --force

For the curious, what is git gc

If you are like me you are now wondering what is the git gc command. Is there a garbage collector in Git? Almost indeed ;)

git-gc - Cleanup unnecessary files and optimize the local repository. From Git documention

If you look at the description on the same site you will read that: “Runs a number of housekeeping tasks within the current repository, such as compressing file revisions (to reduce disk space and increase performance) and removing unreachable objects which may have been created from prior invocations of git add”.

I strongly encourage you to go to the git documention and read the rest of the description.