How to Rewrite History (with Git)
Published 2018-11-29
It's never a great day when you need to rewrite git histories, but the good news is that it can be done.
In my case I had links to a domain, hosting related documentation and code, that was unfortunately hijacked.
Since Googlebot crawls the commit history on GitHub, Gitea, and GitLab (and presumably all other git platforms), those stale links were still playing a significant role in the SEO of the legitimate re-hosted content, even months later.
We're about to do some Hard-Core Henry type stuff here - so you besta back up!
mydate=$(date +%F-%H-%M-%S)
rsync -av ./my-project/ ./my-project.$mydate/
git rev-list --all | xargs git grep -i <<SEARCH_TERM>>
I say to search first because it's very likely that an exact match will show up in different contexts. Perhaps example.com should be replaced with whatever.net, but the more specific example.com/code needs to be replaced with code.whatever.net. Taking a peek first will help us decide.
To search all of everything you can pipe git rev-list --all to git grep. This is done via xargs to handle long commit histories gracefully.
git rev-list --all | xargs git grep -i example.com
If you only need to search a subtree, you can do that by passing -- path/to/thing, like so:
git rev-list --all -- lib/foo | xargs git grep -i example.com -- lib/foo
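To see why it's worth searching the whole history and not just the current checkout, here's a minimal sketch on a throwaway repo. The temp directory, file name, and commit messages are made up for the demo; only the search pipeline comes from the commands above.

```shell
# Hypothetical throwaway repo, used only for this demonstration.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.name "Demo"
git config user.email "demo@whatever.net"

# First commit links to the old domain, second commit fixes the link.
echo "see https://example.com/code for docs" > README
git add README
git commit -q -m "link to old domain"
echo "see https://code.whatever.net for docs" > README
git commit -q -am "link to new domain"

# Count files that match in the current checkout...
current_hits=$( { git grep -il 'example\.com' || true; } | wc -l | tr -d ' ')
# ...and files that match in any commit reachable from any ref.
history_hits=$( { git rev-list --all | xargs git grep -il 'example\.com' || true; } | wc -l | tr -d ' ')

echo "current checkout: $current_hits, whole history: $history_hits"
```

The checkout looks clean, but the old link is still sitting in the first commit, which is exactly what a crawler walking the commit history would find.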
On macOS you can install bfg to make the replacing easier, but I'll cover normal git filter-branch as well.
brew install bfg
You'll need to compile a list of things that need to be replaced, in order:
example.com/code==>code.whatever.net
example.com==>whatever.net
example foo==>whatever foo
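Here's a small sketch of why the order of that list matters. I'm assuming the rules are applied top to bottom, and I'm using sed as a stand-in for bfg purely to show the effect of ordering; the sample sentence and the temp file path are made up for the demo.

```shell
# Write the rules to a temp file (the bfg command in this post reads
# them from /tmp/replacements.txt; a temp path keeps the demo tidy).
replacements=$(mktemp)
cat > "$replacements" <<'EOF'
example.com/code==>code.whatever.net
example.com==>whatever.net
example foo==>whatever foo
EOF

sample='docs at example.com/code, home at example.com'

# Specific rule first: both links end up where intended.
specific_first=$(printf '%s\n' "$sample" \
  | sed -e 's|example\.com/code|code.whatever.net|g' \
        -e 's|example\.com|whatever.net|g')

# General rule first: the specific rule has nothing left to match.
general_first=$(printf '%s\n' "$sample" \
  | sed -e 's|example\.com|whatever.net|g' \
        -e 's|example\.com/code|code.whatever.net|g')

echo "specific first: $specific_first"
echo "general first:  $general_first"
```

With the general rule first, the code link turns into whatever.net/code, which is not where the code lives.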
dirty work cleanup:
bfg --private --replace-text /tmp/replacements.txt
Bonus: delete large files, if you need to
bfg --strip-blobs-bigger-than 10M
Prune the orphans (although they'll come back when mirroring the remote):
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git grep -l 'example.com' | xargs sed -i '' -e 's/SEARCH_TERM/REPLACE_WITH/g'
Meh, I don't want to spend a lot of time on this because bfg is just so much better.
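On macOS (BSD sed), -i takes a backup suffix, here an empty one:
git grep -l 'example.com' | xargs sed -i '' -e 's/example\.com/whatever.net/g'
On Linux (GNU sed), -i takes no separate argument:
git grep -l 'example.com' | xargs sed -i 's/example\.com/whatever.net/g'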
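Those sed commands only touch the files in the current checkout. One way to run the same replacement over every commit is git filter-branch with --tree-filter. This is a rough sketch on a throwaway repo, not a polished recipe: the repo, file name, and loop are my own illustration, it uses plain grep and sed -i.bak so it behaves the same with BSD and GNU sed, and it assumes file names without spaces.

```shell
# Hypothetical throwaway repo, used only for this demonstration.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git config user.name "Demo"
git config user.email "demo@whatever.net"

echo "see https://example.com for docs" > README
git add README
git commit -q -m "first"
echo "more text" >> README
git commit -q -am "second"

# Newer git versions print a long warning and pause; this skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# The tree filter runs once per commit, inside a checkout of that commit.
git filter-branch -f --tree-filter '
  for f in $(grep -rl "example\.com" . || true); do
    sed -i.bak -e "s/example\.com/whatever.net/g" "$f" && rm -f "$f.bak"
  done
' -- --all >/dev/null

# filter-branch keeps the old commits under refs/original, so only
# search what the branches can reach.
remaining=$( { git rev-list --branches | xargs git grep -il 'example\.com' || true; } | wc -l | tr -d ' ')
echo "files still matching in branch history: $remaining"
```

It's slow on big repos because every commit gets checked out, which is a big part of why bfg is nicer.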
When you re-author commits you need to change the committer as well.
This can only be done with git filter-branch:
git filter-branch -f --env-filter '
old_email="email@example.com"
my_name="Jon Doe"
my_email="firstname.lastname@example.org"
if [ "$GIT_COMMITTER_EMAIL" = "$old_email" ]; then
    export GIT_COMMITTER_NAME="$my_name"
    export GIT_COMMITTER_EMAIL="$my_email"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$old_email" ]
then
    export GIT_AUTHOR_NAME="$my_name"
    export GIT_AUTHOR_EMAIL="$my_email"
fi
' --tag-name-filter cat -- --branches --tags
The --tag-name-filter cat part also rewrites tags (no idea why this isn't the default).
The -f is there because we've already made a backup and I don't want to get the message about the local git backup that already exists.
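Before and after running the filter it helps to see which identities actually appear in the history. A minimal sketch, assuming a throwaway repo with one commit made under the old email; "Old Name" and the file are made up for the demo.

```shell
# Hypothetical throwaway repo with a single commit by the old identity.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
echo hi > file
git add file
GIT_AUTHOR_NAME="Old Name" GIT_AUTHOR_EMAIL="email@example.com" \
GIT_COMMITTER_NAME="Old Name" GIT_COMMITTER_EMAIL="email@example.com" \
  git commit -q -m "first"

# One line per distinct author / committer pair across all refs.
identities=$(git log --all --format='%an <%ae> / %cn <%ce>' | sort -u)
echo "$identities"
```

If the old email still shows up on either side of the slash after the rewrite, the filter missed something.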
Like a Jedi Master using the brute-force power of suggestion, you will now brute force overwrite your remote:
git push --force
This will royally screw up the next git pull of anyone who has cloned your repo up to this point.
You definitely don't want to do it if there are PRs on the way... or at least not without telling your collaborators.
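If you're wondering why the force is needed at all, here's a small sketch using a local bare repo as a pretend remote. The directory layout, the branch name main, and the commit messages are all made up for the demo.

```shell
# Hypothetical local "remote" plus a working clone, in a temp directory.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo"
git config user.email "demo@whatever.net"
git remote add origin "$work/remote.git"

echo one > file
git add file
git commit -q -m "first"
git push -q origin HEAD:refs/heads/main

# Rewrite history: same content, different commit.
git commit -q --amend -m "first (rewritten)"

# A normal push is refused because it is not a fast-forward.
if git push -q origin HEAD:refs/heads/main 2>/dev/null; then
  plain=accepted
else
  plain=rejected
fi

# The forced push replaces what the remote branch points to.
git push -q --force origin HEAD:refs/heads/main
remote_subject=$(git --git-dir="$work/remote.git" log -1 --format=%s main)

echo "plain push: $plain, remote now at: $remote_subject"
```

Anyone who cloned before the forced push still has the old commit, which is why their next pull gets ugly.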
High Five: Replace Repo
What we've done here today isn't sufficient if you've got passwords or secrets in the remote. You actually have to delete the remote and re-push.
The reason is that a push doesn't delete the newly orphaned commits; it only overwrites the prior references (HEAD, branches, and tags).
The orphans are unlikely to show up in a Google search, since they aren't reachable, but if someone were to mirror the repo they'd get the orphans also.
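You can watch the orphan survive a forced push with a local bare repo standing in for the remote. This is only a sketch: the paths, the branch name main, and the fake secret are made up, and a real hosting platform may store and clean up objects differently.

```shell
# Hypothetical local "remote" plus a working repo, in a temp directory.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Demo"
git config user.email "demo@whatever.net"

echo "password=hunter2" > secrets.txt
git add secrets.txt
git commit -q -m "oops"
old_commit=$(git rev-parse HEAD)
git push -q "$work/remote.git" HEAD:refs/heads/main

# Rewrite the commit without the secret and force it onto the remote.
echo "password=REMOVED" > secrets.txt
git commit -q -a --amend -m "no secrets here"
git push -q --force "$work/remote.git" HEAD:refs/heads/main

# The old commit object is still stored in the remote...
if git --git-dir="$work/remote.git" cat-file -e "$old_commit^{commit}" 2>/dev/null; then
  orphan=present
else
  orphan=missing
fi
# ...even though no ref can reach it any more.
reachable=$(git --git-dir="$work/remote.git" rev-list --all | grep -c "$old_commit" || true)

echo "old commit on remote: $orphan, reachable from refs: $reachable"
```

Anyone who knows (or guesses) the old commit hash can still read the secret, hence deleting the remote and re-pushing.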
- On BFG: https://medium.com/@rhoprhh/removing-keys-passwords-and-other-sensitive-data-from-old-github-commits-on-osx-2fb903604a56
- On rewriting emails: https://stackoverflow.com/questions/750172/how-to-change-the-author-and-committer-name-and-e-mail-of-multiple-commits-in-gi/9491696#9491696
- On git filter-branch + sed: https://blog.jasonmeridth.com/posts/use-git-grep-to-replace-strings-in-files-in-your-git-repository/
By AJ ONeal