Cleaning Up Git Repositories with BFG Repo-Cleaner

2022-03-11 · 1 min read

Git repositories grow over time. Accidental commits of large files, sensitive data, or build artifacts can bloat your repo, slow down clones, and create security risks. BFG Repo-Cleaner is a fast, purpose-built tool for fixing these mistakes.

Why Repository Cleanup Matters

As your project evolves, your Git history can accumulate large files, sensitive data, and build artifacts.

These issues make repositories slower to clone, harder to work with, and risky to share. The traditional tool, git filter-branch, is powerful but slow and error-prone. BFG Repo-Cleaner is built for this exact problem.

What is BFG Repo-Cleaner?

BFG Repo-Cleaner is an open-source tool by Roberto Tyley that removes unwanted data from Git history. Unlike git filter-branch, BFG targets a narrow set of cleanup tasks, which makes it much faster and simpler to use.

BFG rewrites the affected commits directly in your repository's object database, making it well suited to cleaning up years of accumulated clutter.

Prerequisites

BFG Repo-Cleaner is a Java application, so you'll need:

  1. Java 8 or later installed on your system
  2. Git (obviously)
  3. Write access to the repository you're cleaning

Check your Java version:

java -version
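
Java reports its version in two styles (the legacy "1.8.0_292" form and the newer "11.0.2" form), which makes a scripted check slightly fiddly. The helper below is a minimal sketch that extracts the major version from either style; the function name java_major_version and the REQUIRED_MAJOR variable are illustrative, not part of BFG or Java.

```shell
# Extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme and the modern "11.0.2" / "17" scheme.
java_major_version() {
  v=$1
  case $v in
    1.*) v=${v#1.} ;;             # legacy scheme: "1.8.0_292" -> "8.0_292"
  esac
  printf '%s\n' "${v%%[!0-9]*}"   # keep only the leading digits
}

# Hypothetical usage (java -version prints to stderr, hence 2>&1).
# Set REQUIRED_MAJOR to the minimum Java version listed in the prerequisites.
# ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
# [ "$(java_major_version "$ver")" -ge "$REQUIRED_MAJOR" ] && echo "Java is new enough"
```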

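Download the latest BFG JAR from the official site, or install via a package manager. A packaged install typically provides a bfg command, which you can use in place of java -jar bfg.jar in the commands that follow: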

# macOS
brew install bfg

# Linux (Ubuntu/Debian), if a bfg package is available in your configured repositories
sudo apt-get install bfg

# Or download directly, saving the file as bfg.jar to match the commands below
wget -O bfg.jar https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar

Cleaning Up Your Repository

Step 1: Clone as a Mirror

Always work on a mirror clone. This creates a full copy of the repository's Git database, including all branches and tags, without a working directory.

git clone --mirror git://example.com/some-big-repo.git

This creates some-big-repo.git (a bare repository). Never run BFG on your original repository—always use a mirror.
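
Before pointing BFG at a directory, it can be worth confirming that it really is a mirror clone. The sketch below relies on two things git clone --mirror sets up: a bare repository, and remote.origin.mirror set to true. The function name is_mirror_clone is illustrative, and the check assumes the remote is named origin.

```shell
# Succeeds only if the directory is a bare repository whose "origin" remote
# is configured as a mirror (both are set up by `git clone --mirror`).
is_mirror_clone() {
  [ "$(git -C "$1" rev-parse --is-bare-repository 2>/dev/null)" = "true" ] &&
  [ "$(git -C "$1" config --get remote.origin.mirror 2>/dev/null)" = "true" ]
}

# Hypothetical usage:
# is_mirror_clone some-big-repo.git || echo "not a mirror clone, stopping" >&2
```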

Step 2: Run BFG Repo-Cleaner

BFG has several cleanup modes. Here are the most common:

Remove Large Files

Strip files larger than 100MB:

java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git

Remove Files by Name Pattern

Remove all .DS_Store files:

java -jar bfg.jar --delete-files .DS_Store some-big-repo.git

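Remove multiple file-name patterns (--delete-files matches file names only, so a directory such as node_modules belongs under --delete-folders):

java -jar bfg.jar --delete-files '{*.log,*.tmp}' some-big-repo.git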

Remove Files by Folder

Delete every folder with a given name, wherever it appears in the tree:

java -jar bfg.jar --delete-folders build some-big-repo.git

Replace Sensitive Data

Find and replace text (like API keys):

java -jar bfg.jar --replace-text replacements.txt some-big-repo.git

Where replacements.txt contains:

secret-key==>***REDACTED***
password123==>***REDACTED***
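
If you have more than a handful of secrets, writing the rules by hand gets tedious. Here is a small sketch that turns a list of literal secrets into lines in the same secret==>placeholder format shown above. The function name make_replacements and the secrets.txt file are hypothetical.

```shell
# Turn literal secrets (one per line on stdin) into replacement rules.
# Blank lines are skipped; every secret maps to the same placeholder.
make_replacements() {
  placeholder=${1:-'***REDACTED***'}
  # The "|| [ -n ... ]" clause keeps a final line that lacks a trailing newline.
  while IFS= read -r secret || [ -n "$secret" ]; do
    [ -n "$secret" ] && printf '%s==>%s\n' "$secret" "$placeholder"
  done
  return 0
}

# Hypothetical usage:
# make_replacements < secrets.txt > replacements.txt
```

Keep the generated file out of the repository, since it lists the very secrets you are trying to remove.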

Step 3: Expire and Garbage Collect

BFG rewrites commits and updates your refs, but the old objects still sit in the object database. Expire the reflog and force garbage collection to remove them:

cd some-big-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive

This permanently deletes the unwanted data from your repository's object database.
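
To see how much space the cleanup reclaimed, you can compare the packed size before and after garbage collection. This is a rough sketch built on git count-objects -v, which reports size-pack in KiB; the helper names packed_size_kib and percent_saved are made up for illustration.

```shell
# Read `git count-objects -v` output on stdin and print the size-pack value (KiB).
packed_size_kib() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Print the whole-number percentage saved, given before and after sizes.
percent_saved() {
  [ "$1" -gt 0 ] || return 1
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Hypothetical usage inside the mirror clone:
# before=$(git count-objects -v | packed_size_kib)   # run before the gc step
# after=$(git count-objects -v | packed_size_kib)    # run after the gc step
# echo "Saved $(percent_saved "$before" "$after")%"
```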

Step 4: Push the Cleaned Repository

Once cleaned, push the rewritten history to your remote. Because this is a mirror clone, git push updates all refs on the remote:

git push

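Important: All developers must switch to the cleaned history. The safest route is to delete old local clones and clone again. Do not merge old branches or run git pull --rebase against the new history, since that can push the removed data straight back to the remote.

git clone git://example.com/some-big-repo.git

Or, in an existing clone with no unpushed work, fetch and reset to the cleaned remote:

git fetch origin
git reset --hard origin/main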
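
After a rewrite, commit IDs quoted in issue trackers or release notes no longer match. As far as I can tell, BFG writes an object-id-map.old-new.txt file into its report folder, with one "old-id new-id" pair per line. Treat both the file name and the layout as assumptions to check against your own run. Under that assumption, a lookup helper (new_id_for is a made-up name) could look like this:

```shell
# Look up the rewritten ID for an old object ID in a two-column map file.
# Assumed line format: "<old-id> <new-id>".
# Exits non-zero when the old ID is not listed.
new_id_for() {
  awk -v old="$2" '$1 == old { print $2; found = 1; exit } END { exit !found }' "$1"
}

# Hypothetical usage (the path and <old-commit-id> are placeholders):
# new_id_for path/to/object-id-map.old-new.txt <old-commit-id>
```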

Real-World Example: Removing Accidental Commits

Scenario: You accidentally committed a 500MB backup folder and your .env file with secrets.

# 1. Clone as mirror
git clone --mirror https://github.com/yourorg/repo.git repo.git

# 2. Remove the backup folder and env files
java -jar bfg.jar --delete-folders backups repo.git
java -jar bfg.jar --delete-files .env repo.git

# 3. Clean up object database
cd repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# 4. Push cleaned history
git push

# 5. All team members switch to the cleaned history with a fresh clone
git clone https://github.com/yourorg/repo.git

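Your repository is now cleaner and smaller, and the secrets are removed from its history, not just hidden. Anything that was ever pushed should still be treated as compromised, so rotate those credentials.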
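
One caveat with a scenario like this: BFG leaves the contents of the latest commit (HEAD) untouched by default, so a .env file that still exists there survives the cleanup. Delete it in an ordinary commit and push before making the mirror clone. The sketch below filters a file listing for a given file name so you can check; paths_named is an illustrative helper, not a Git or BFG command.

```shell
# Read file paths on stdin and print those whose final path component
# equals the given name (for example ".env").
paths_named() {
  awk -F/ -v name="$1" '$NF == name'
}

# Hypothetical usage inside the mirror clone; any output means the file
# is still present at HEAD:
# git ls-tree -r --name-only HEAD | paths_named .env
```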

Verification

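After cleanup, verify the repository size and confirm the large objects are gone:

# Check repository size
du -sh repo.git

# List the 10 largest objects still in history (run inside the repository)
cd repo.git
git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sort -k3 -n | tail -10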

Common Mistakes to Avoid

Don't run BFG on your working repository. Always use --mirror.

# ❌ WRONG
cd my-repo
java -jar bfg.jar --delete-files .env .

# ✅ RIGHT
git clone --mirror https://... my-repo.git
java -jar bfg.jar --delete-files .env my-repo.git

Don't forget garbage collection. Without it, data isn't actually deleted.

# ❌ Incomplete
java -jar bfg.jar --strip-blobs-bigger-than 100M repo.git

# ✅ Complete
java -jar bfg.jar --strip-blobs-bigger-than 100M repo.git
cd repo.git && git reflog expire --expire=now --all && git gc --prune=now --aggressive

Communicate with your team. Rewritten history requires everyone to update. Notify developers before pushing and provide instructions.

Advanced Usage

Protecting Files

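By default BFG protects the contents of your latest commit (HEAD): files present there are kept even if they match a deletion pattern. Use --protect-blobs-from to choose which refs are protected instead:

java -jar bfg.jar --delete-files '*.log' \
  --protect-blobs-from main \
  repo.git

This leaves the tip commit of main untouched, so any .log files present there stay in the repository, while matching files are removed from earlier commits. It does not exempt the whole branch.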

Dry Run

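BFG does not offer a --dry-run flag. Because you are working on a disposable mirror clone, the run itself serves as the preview: run BFG, read the summary it prints and the report folder it writes next to the repository, and push only once you are satisfied.

java -jar bfg.jar --strip-blobs-bigger-than 100M repo.git
ls repo.git.bfg-report/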
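
Independently of BFG, plain Git can list which blobs exceed a size threshold, which gives a rough preview of what a --strip-blobs-bigger-than run would target. The sketch below assumes size suffixes are 1024-based; size_to_bytes and blobs_over are illustrative names.

```shell
# Convert a size such as 100M or 512K into bytes (assumes 1024-based units).
size_to_bytes() {
  n=${1%[BKMGbkmg]}
  case $1 in
    *[Kk]) echo $(( n * 1024 )) ;;
    *[Mm]) echo $(( n * 1024 * 1024 )) ;;
    *[Gg]) echo $(( n * 1024 * 1024 * 1024 )) ;;
    *)     echo "$n" ;;
  esac
}

# Read "type id size path" lines on stdin and print "size path" for
# blobs strictly larger than the given number of bytes.
blobs_over() {
  awk -v limit="$1" '$1 == "blob" && $3 + 0 > limit + 0 { sub(/^[^ ]+ [^ ]+ /, ""); print }'
}

# Hypothetical usage inside the mirror clone:
# git rev-list --objects --all |
#   git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#   blobs_over "$(size_to_bytes 100M)"
```

The list is an upper bound: blobs that are still part of a protected commit are kept by BFG.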

When to Use BFG vs. git filter-branch

| Task | BFG | git filter-branch |
|------|-----|-------------------|
| Remove large files | ✅ Recommended | Slow, complex |
| Remove sensitive data | ✅ Recommended | Slow, error-prone |
| Complex filtering | ⚠️ Limited | ✅ Flexible |
| Speed | ✅ 10-720x faster (per the BFG project) | Slow |
| Ease of use | ✅ Simple | Steep learning curve |

For most routine cleanup tasks, BFG is the better choice.

Post-Cleanup Checklist

Wrapping Up

Git history cleanup is essential for maintaining healthy repositories. BFG Repo-Cleaner makes it fast, safe, and straightforward—whether you're removing gigabytes of build artifacts or purging accidentally-committed secrets.

Start small with a mirror clone and review the result before pushing. Once you're confident, push the cleaned history. Future clones will be faster, and your repository will be cleaner.

Pro tip: Add a .gitignore rule after cleanup to prevent re-committing the same files:

# .gitignore
*.log
node_modules/
.env
backups/
*.tmp

Commit this to keep the same files from slipping back in.
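
To confirm the rules actually match, git check-ignore can test a path against the repository's ignore rules. A minimal sketch, with is_ignored as an illustrative wrapper:

```shell
# Succeeds when Git would ignore the given path in the current repository.
# `git check-ignore -q` exits 0 for ignored paths and 1 for paths that are not ignored.
is_ignored() {
  git check-ignore -q -- "$1"
}

# Hypothetical usage from the root of a working clone:
# is_ignored .env && echo ".env is ignored"
```

Note that ignore rules only affect untracked files; a file that is already tracked stays tracked until it is removed from the index.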

admond tamang