It's never a great day when you need to rewrite git histories, but the good news is that it can be done.

In my case I was linking to a domain that hosted related documentation and code, and that domain was unfortunately hijacked.

Since Googlebot crawls the commit history on GitHub, Gitea, and GitLab (and presumably every other git platform), those old links were still playing a significant role in the SEO of the legitimate, re-hosted content, even months later.

Zeroth: Backup

We're about to do some Hard-Core Henry type stuff here - so you besta back up!

mydate=$(date +%F-%H-%M-%S)
rsync -av ./my-project/ ./my-project.$mydate/

First: Search

git rev-list --all | xargs git grep -i 'SEARCH_TERM'

I say to search first because it's very likely that an exact match will show up in different contexts. Perhaps example.com should be replaced with whatever.net, but the more specific example.com/code actually needs to be replaced with code.whatever.net - so taking a peek first will help us decide.

To search all of everything you can pipe git rev-list --all to git grep. This is done via xargs to handle long commit histories gracefully.

git rev-list --all | xargs git grep -i example.com

If you only need to search a subtree, you can do that by passing -- path/to/thing, like so:

git rev-list --all -- lib/foo | xargs git grep -i example.com -- lib/foo
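
Since the whole point of peeking first is to spot the different contexts, it can help to boil the hits down to a tally of distinct matches. Here's a rough sketch of that. tally_matches is just a name I made up, it needs a git new enough to support grep -o, and the example pattern assumes the URLs you care about don't contain whitespace or double quotes:

```shell
# tally_matches PATTERN
# Count each distinct match of PATTERN (a basic regex) across every commit.
# -h drops the "commit:path:" prefix, -o prints only the matched text.
tally_matches() {
  git rev-list --all |
    xargs git grep -h -o -i -e "$1" |
    sort | uniq -c | sort -rn
}
```

Run it from inside the repo, for example tally_matches 'example\.com[^[:space:]"]*', and the most common variants float to the top.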

Second: Replace

On macOS you can install bfg with Homebrew to make the replacing easier, but I'll cover plain git filter-branch as well.

BFG

brew install bfg

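You'll need to compile a list of the things to replace, in the order they should be applied (most specific first, so that example.com/code is handled before the broader example.com):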

/tmp/replacements.txt:

example.com/code==>code.whatever.net
example.com==>whatever.net
example foo==>whatever foo
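
To see why the order matters, here's a small sketch that mimics the first two replacements with plain sed. It's only an illustration: bfg isn't involved, the two function names and the sample string are made up, and I'm assuming the replacements get applied one after another, top to bottom.

```shell
# Specific rule first, then the general one (same order as the list above).
specific_first() {
  sed -e 's|example\.com/code|code.whatever.net|g' \
      -e 's|example\.com|whatever.net|g'
}

# General rule first: it eats "example.com" before the specific rule can match.
general_first() {
  sed -e 's|example\.com|whatever.net|g' \
      -e 's|example\.com/code|code.whatever.net|g'
}

sample='see https://example.com/code and https://example.com/docs'

printf '%s\n' "$sample" | specific_first
# → see https://code.whatever.net and https://whatever.net/docs

printf '%s\n' "$sample" | general_first
# → see https://whatever.net/code and https://whatever.net/docs
```

With the general rule first, the /code link never gets moved to the code. subdomain, which is exactly the mistake the ordering is there to prevent.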

Do the dirty work cleanup:

bfg --private --replace-text /tmp/replacements.txt

Bonus: delete large files, if you need to

bfg --strip-blobs-bigger-than 10M

Prune the orphans (although they'll come back when mirroring the remote):

git reflog expire --expire=now --all && git gc --prune=now --aggressive

git filter-branch

git grep -l 'SEARCH_TERM' | xargs sed -i '' -e 's/SEARCH_TERM/REPLACE_WITH/g'

Meh, I don't want to spend a lot of time on this because bfg is just so much better.

Mac:

git grep -l 'example.com' | xargs sed -i '' -e 's/example\.com/whatever.net/g'

Linux:

git grep -l 'example.com' | xargs sed -i 's/example\.com/whatever.net/g'
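
Worth spelling out: on their own, the sed one-liners above only change the files that are currently checked out. To hit every commit, the replacement has to run inside a filter. Here's a minimal sketch using --tree-filter, assuming GNU grep, xargs, and sed (so Linux; on Mac you'd need the sed -i '' form and possibly different grep and xargs flags). The function name is made up, and expect it to be slow on big repos, since every commit gets checked out:

```shell
# rewrite_all_commits: hypothetical wrapper, run from the top of the repo.
# For each commit, filter-branch checks the tree out into a temp dir,
# runs the quoted command there, and records the result as the new commit.
rewrite_all_commits() {
  git filter-branch -f --tree-filter '
    # -I skips binary files, -Z/-0 keep odd filenames intact,
    # -r stops xargs from running sed when nothing matched
    grep -rlIZF "example.com" . |
      xargs -0 -r sed -i "s/example\.com/whatever.net/g"
  ' --tag-name-filter cat -- --branches --tags
}
```

Call rewrite_all_commits from a clean working tree (filter-branch refuses to run with uncommitted changes), then re-run the search from the first step to confirm nothing is left.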

Third: Reauthor

When you re-author, you need to change the committer as well.

This can only be done with git filter-branch:

git filter-branch -f --env-filter '
old_email="me@example.com"
my_name="Jon Doe"
my_email="jon@whatever.net"

if [ "$GIT_COMMITTER_EMAIL" = "$old_email" ]; then
    export GIT_COMMITTER_NAME="$my_name"
    export GIT_COMMITTER_EMAIL="$my_email"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$old_email" ]
then
    export GIT_AUTHOR_NAME="$my_name"
    export GIT_AUTHOR_EMAIL="$my_email"
fi
' --tag-name-filter cat -- --branches --tags

The --tag-name-filter cat part makes this rewrite tags as well (no idea why this isn't the default). The -f is there because we've already made our own backup, and I don't want git to refuse to run just because its own backup from an earlier filter-branch already exists.
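
To check that it worked, you can tally the author and committer emails that are left. This is just a sketch with a made-up function name. It deliberately looks at --branches --tags rather than --all, because filter-branch keeps the pre-rewrite commits around under refs/original, and those would still show the old address:

```shell
# list_identities: count every distinct author and committer email
# reachable from local branches and tags.
list_identities() {
  git log --branches --tags --format='%ae%n%ce' | sort | uniq -c
}
```

If me@example.com no longer shows up in the output of list_identities, the rewrite caught everything.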

Fourth: Push

Like a Jedi Master using the brute-force power of suggestion, you will now brute force overwrite your remote:

git push --force

This will royally screw up the next git pull of anyone who has cloned your repo up to this point.

You definitely don't want to do it if there are PRs on the way... or at least not without telling your collaborators.
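
One thing to watch: depending on your push settings, a bare git push --force usually only updates the branch you're on, while the rewrite above touched every branch and tag. A minimal sketch for pushing all of them, assuming your remote is called origin (push_rewritten is a hypothetical helper, not anything built in):

```shell
# push_rewritten REMOTE
# Force-push every local branch, then every tag, to REMOTE.
push_rewritten() {
  git push --force --all "$1" &&
  git push --force --tags "$1"
}
```

Usage would be push_rewritten origin. The same warnings apply, only more so, since now every branch on the remote gets overwritten.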

High Five: Replace Repo

What we've done here today isn't sufficient if you've got passwords or secrets in the remote. You actually have to delete the remote and re-push.

The reason is that a push doesn't delete the newly orphaned commits; it only overwrites the prior references (HEAD, branches, and tags).

It's unlikely to show up in a Google search, as it won't be reachable, but if someone were to mirror the repo they'd get the orphans also.


By AJ ONeal
