How to Rewrite History (with Git)
Published 2018-11-29
It's never a great day when you need to rewrite git histories, but the good news is that it can be done.
In my case I had been linking to a domain that hosted related documentation and code, and that domain was unfortunately hijacked.
Since Googlebot crawls the commit history on GitHub, Gitea, and GitLab (and presumably every other git platform), the old links in my history were still affecting the SEO of the legitimate re-hosted content, even months later.
Zeroth: Backup
We're about to do some Hard-Core Henry type stuff here - so you besta back up!
mydate=$(date +%F-%H-%M-%S)
rsync -av ./my-project/ ./my-project.$mydate/
First: Search
git rev-list --all | xargs git grep -i 'SEARCH_TERM'
I say to search first because an exact match may well show up in different contexts. Perhaps example.com should be replaced with whatever.net, but the more specific example.com/code actually needs to be replaced with code.whatever.net, so taking a peek first will help us decide.
To search all of everything you can pipe git rev-list --all to git grep. This is done via xargs to handle long commit histories gracefully.
git rev-list --all | xargs git grep -i example.com
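To see what the history search buys you over a plain git grep, here's a small sketch in a throwaway repo. The repo, the README file, and the commit messages are made up for illustration; only the two search commands come from above.

```shell
# Throwaway repo (hypothetical): the old domain exists only in an older commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@whatever.net"
echo "docs at example.com" > README
git add README
git commit -q -m "add readme"
echo "docs at whatever.net" > README
git commit -q -am "fix link"

# A plain git grep only looks at the current files and finds nothing...
head_hits=$(git grep -i 'example\.com' | wc -l | tr -d ' ')

# ...but searching every commit still turns up the old blob.
history_hits=$(git rev-list --all | xargs git grep -i 'example\.com' | wc -l | tr -d ' ')

echo "working tree: $head_hits, history: $history_hits"
```

Each hit from the history search is prefixed with the commit hash it was found in, which is handy for deciding what needs replacing.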
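If you only need to search a subtree, pass -- path/to/thing to both commands. The commit has to come before the -- in the git grep call (anything after it is read as a path), so xargs -I{} is used to place it explicitly:
git rev-list --all -- lib/foo | xargs -I{} git grep -i example.com {} -- lib/foo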
Second: Replace
On macOS you can install bfg to make the replacing easier, but I'll cover normal git filter-branch as well.
BFG
brew install bfg
You'll need to compile a list of the things that need to be replaced, in order, with the most specific patterns first:
/tmp/replacements.txt:
example.com/code==>code.whatever.net
example.com==>whatever.net
example foo==>whatever foo
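Why the order matters is easiest to see with a quick experiment. This sketch uses sed rather than bfg, and the sample line is made up, but the effect of applying the general rule before the specific one is the same idea:

```shell
# Hypothetical sample line containing both forms of the old domain.
sample='see example.com/code and example.com for details'

# Specific rule first, general rule second (the order used in the list).
right=$(printf '%s\n' "$sample" |
  sed -e 's|example\.com/code|code.whatever.net|g' \
      -e 's|example\.com|whatever.net|g')

# General rule first: it rewrites the prefix, so the specific rule never matches.
wrong=$(printf '%s\n' "$sample" |
  sed -e 's|example\.com|whatever.net|g' \
      -e 's|example\.com/code|code.whatever.net|g')

echo "$right"   # see code.whatever.net and whatever.net for details
echo "$wrong"   # see whatever.net/code and whatever.net for details
```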
Do the dirty work:
bfg --private --replace-text /tmp/replacements.txt
Bonus: delete large files, if you need to
bfg --strip-blobs-bigger-than 10M
Prune the orphans (although they'll come back when mirroring the remote):
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git filter-branch
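git grep -l 'SEARCH_TERM' | xargs sed -i '' -e 's/SEARCH_TERM/REPLACE_WITH/g'
Meh, I don't want to spend a lot of time on this because bfg is just so much better. Note that on their own these commands only change the files in your current working tree; to rewrite every commit they have to run inside git filter-branch --tree-filter.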
Mac:
git grep -l 'example.com' | xargs sed -i '' -e 's/example\.com/whatever.net/g'
Linux:
git grep -l 'example.com' | xargs sed -i 's/example\.com/whatever.net/g'
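The sed commands above only touch the checked-out files. Here's a rough sketch of running the same kind of replacement over every commit with git filter-branch --tree-filter. The throwaway repo and its README are made up for illustration, and I've swapped git grep | xargs for find with sed -i.bak so that the same line works with both GNU and BSD sed and doesn't fail on commits with no matches. It rewrites every file in each commit, so treat it as a starting point rather than something to paste into a repo with binaries.

```shell
# Throwaway repo (hypothetical) with the old domain in two commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@whatever.net"
echo "docs at example.com" > README
git add README
git commit -q -m "add readme"
echo "mirror at example.com/code" >> README
git commit -q -am "add mirror"

# Newer git versions print a long warning and pause before running
# filter-branch; this variable skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# --tree-filter checks out each commit and runs the command inside it.
# The .bak files left by sed are deleted so they don't get committed.
git filter-branch -f --tree-filter '
  find . -type f -exec sed -i.bak "s/example\.com/whatever.net/g" {} +
  find . -type f -name "*.bak" -delete
' -- --all >/dev/null

# Check the rewritten branches only: filter-branch keeps the old commits
# under refs/original, which "git rev-list --all" would still include.
remaining=$(git rev-list --branches | xargs git grep -i 'example\.com' | wc -l | tr -d ' ')
echo "matches left in history: $remaining"
```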
Third: Reauthor
When you re-author you need to change the committer as well.
This can only be done with git filter-branch:
git filter-branch -f --env-filter '
  old_email="me@example.com"
  my_name="Jon Doe"
  my_email="jon@whatever.net"
  if [ "$GIT_COMMITTER_EMAIL" = "$old_email" ]; then
    export GIT_COMMITTER_NAME="$my_name"
    export GIT_COMMITTER_EMAIL="$my_email"
  fi
  if [ "$GIT_AUTHOR_EMAIL" = "$old_email" ]; then
    export GIT_AUTHOR_NAME="$my_name"
    export GIT_AUTHOR_EMAIL="$my_email"
  fi
' --tag-name-filter cat -- --branches --tags
The --tag-name-filter cat part makes this rewrite tags as well (no idea why this isn't the default). The -f is there because we've already made our own backup, and I don't want filter-branch to refuse to run just because its own backup from a previous run (under refs/original) already exists.
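Before and after re-authoring it can help to list which identities actually appear in the history, so you know what to put in old_email and can confirm nothing was missed. A minimal sketch, using a throwaway repo with two made-up commits so the output is predictable; in a real repo you'd only need the git log line:

```shell
# Throwaway repo (hypothetical) with one commit per identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name="Jon Doe" -c user.email="me@example.com" \
  commit -q --allow-empty -m "old identity"
git -c user.name="Jon Doe" -c user.email="jon@whatever.net" \
  commit -q --allow-empty -m "new identity"

# One line per distinct "author-email committer-email" pair across all refs.
identities=$(git log --all --format='%ae %ce' | sort -u)
echo "$identities"
```

Once the env-filter has run, the same listing should no longer show the old address in either column.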
Fourth: Push
Like a Jedi Master using the brute-force power of suggestion, you will now brute force overwrite your remote:
git push --force
This will royally screw up the next git pull of anyone who has cloned your repo up to this point.
You definitely don't want to do it if there are PRs on the way... or at least not without telling your collaborators.
High Five: Replace Repo
What we've done here today isn't sufficient if you've got passwords or secrets in the remote. You actually have to delete the remote and re-push.
The reason is that a push doesn't delete the newly orphaned commits; it only overwrites the references (HEAD, branches, and tags) that used to point at them.
The orphaned content is unlikely to show up in a Google search, as it won't be reachable, but if someone were to mirror the repo they'd get the orphans too.
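You can watch the same thing happen locally. In this sketch (throwaway repo, made-up config file and password) a commit is rewritten, and the old commit stays in the object store until the reflog is expired and gc is run. On a hosted remote you usually can't run that cleanup yourself, which is why deleting and re-pushing is the safe route.

```shell
# Throwaway repo (hypothetical) with a made-up secret in its first commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@whatever.net"
echo "password=hunter2" > config
git add config
git commit -q -m "add config"
old=$(git rev-parse HEAD)

# Rewrite the commit; the branch now points at a brand new commit.
echo "password=REDACTED" > config
git commit -q --amend -am "add config"

# The old commit is no longer on any branch, but it is still stored.
git cat-file -e "$old" && before="present"

# Same cleanup as in the BFG section: drop reflog entries, then prune.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then after="present"; else after="gone"; fi

echo "before gc: $before, after gc: $after"
```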
Thanks
- On BFG: https://medium.com/@rhoprhh/removing-keys-passwords-and-other-sensitive-data-from-old-github-commits-on-osx-2fb903604a56
- On rewriting emails: https://stackoverflow.com/questions/750172/how-to-change-the-author-and-committer-name-and-e-mail-of-multiple-commits-in-gi/9491696#9491696
- On git filter-branch + sed: https://blog.jasonmeridth.com/posts/use-git-grep-to-replace-strings-in-files-in-your-git-repository/
By AJ ONeal