In this example, we'll see how we can use Git filter-branch to remove sensitive data from a file throughout the repository history.
For simplicity, we'll use a very small example repository. It contains a few files, one of which is .credentials, which holds a username and password. Start by cloning the repository and changing into its directory, as shown in the following commands:
$ git clone https://github.com/dvaske/remove-credentials.git
$ cd remove-credentials
Because the file has to be modified in every commit where it appears, we use the --tree-filter option of filter-branch. The .credentials file looks as follows:
username = foobar
password = verysecret
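We only need to remove the values after the equals sign on each line. We can use the following sed command to do this:
sed -i '' 's/^\(.*=\).*$/\1/'
Here \(.*=\) captures everything up to and including the equals sign, and \1 writes that part back without the rest of the line. The empty '' after -i is the BSD/macOS form; GNU sed takes -i without that argument.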
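To see what the expression does before letting it loose on history, it can be tried on a scratch copy of the data. The following is a minimal sketch: the temporary file and the variable names `sample` and `scrubbed` are made up for illustration, and `-i` is deliberately left out so that sed only prints the result and the same line works with both BSD and GNU sed.

```shell
# Hypothetical sample file mirroring the .credentials layout described in the text.
sample=$(mktemp)
printf 'username = foobar\npassword = verysecret\n' > "$sample"

# Without -i, sed writes the edited text to standard output and leaves the file alone.
# \(.*=\) captures everything through the equals sign; \1 keeps only that part.
scrubbed=$(sed 's/^\(.*=\).*$/\1/' "$sample")
printf '%s\n' "$scrubbed"

rm -f "$sample"
```

Each printed line should end at the equals sign, with the original values dropped.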
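We can now run filter-branch with the following command:
$ git filter-branch --prune-empty --tree-filter "test -f .credentials && sed -i '' 's/^\(.*=\).*$/\1/' .credentials || true" -- --all
If we look at the file now, the username and password are gone:
$ cat .credentials
username =
password =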
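The whole procedure can be rehearsed in a throwaway repository before touching a real one. The sketch below is only an illustration under stated assumptions: the repository, its two commits, the user identity, and the variable names are invented here and are not the dvaske/remove-credentials repository. It also swaps `sed -i ''` for a write-to-temporary-file-and-rename step so that it does not depend on which sed is installed, and it includes one common way of doing the cleanup (removing the backup references under refs/original/, expiring the reflog, and pruning unreachable objects).

```shell
set -e
# Newer Git versions print a notice and pause before filter-branch runs; this skips it.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical throwaway repository with two commits.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
printf 'username = foobar\npassword = verysecret\n' > .credentials
git add .credentials
git commit -q -m "Add credentials"
echo "hello" > README
git add README
git commit -q -m "Add README"

# Same guard as in the text; sed writes to a temporary file that then replaces the original.
filter="test -f .credentials && sed 's/^\(.*=\).*\$/\1/' .credentials > .credentials.tmp && mv .credentials.tmp .credentials || true"
git filter-branch --prune-empty --tree-filter "$filter" -- --all > /dev/null

# Cleanup: drop the backup refs, expire the reflog, and prune unreachable objects.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

# Count lines in the full history that still contain the secret.
leaks=$(git log --all -p | grep -c verysecret || true)
echo "occurrences of the secret in history: $leaks"
```

If the rewrite and cleanup worked, the final count is expected to be zero and the checked-out .credentials should contain only the two keys.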
After the rewrite, we still need to clean up by deleting the backup references that filter-branch leaves behind, expiring the reflog, and triggering garbage collection.
For each commit in the repository, Git checks out the contents of that commit and runs the tree-filter. If the filter fails, that is, exits with a non-zero exit code, filter-branch fails as well. It is therefore important to handle the cases where the tree-filter might fail. This is why the tree-filter used here first checks whether the .credentials file exists, runs the sed command if it does, and otherwise returns true so that filter-branch continues.
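The effect of the guard on the exit status can be observed without Git at all. This is a small sketch in an empty scratch directory (the directory and the variable names `unguarded` and `guarded` are hypothetical); `echo scrub` stands in for the sed command, which is never reached because the file is missing.

```shell
# Work in an empty scratch directory, so .credentials does not exist here.
scratch=$(mktemp -d)
cd "$scratch"

# Unguarded: test -f fails on the missing file, so the filter as a whole reports failure.
unguarded=0
sh -c "test -f .credentials && echo scrub" || unguarded=$?

# Guarded: the trailing || true turns that failure into a zero exit status.
guarded=0
sh -c "test -f .credentials && echo scrub || true" || guarded=$?

echo "unguarded exit status: $unguarded"
echo "guarded exit status: $guarded"
```

One consequence of this design is that `|| true` also hides a failing sed, not just a missing file. Writing the filter as `if test -f .credentials; then sed ...; fi` would still succeed when the file is absent, while letting a sed error stop filter-branch.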