How to Format a Large Ruby Codebase Overnight with Rubyfmt Automation
Maintaining consistent code style across a massive Ruby monorepo is one of the toughest challenges in large engineering organizations. When you're managing millions of lines of code spread across thousands of files, manual formatting becomes impossible. Rubyfmt, a fast and deliberately opinionated Ruby autoformatter, combined with smart automation strategies, can help you format an entire codebase in a single overnight batch run.
This guide walks you through the practical approach used by teams managing 25M+ line Ruby codebases, showing you how to set up automated formatting that respects your development workflow.
Why Formatting a Massive Codebase Matters
Consistent code style reduces:
- Cognitive load during code review (reviewers focus on logic, not spacing)
- Merge conflicts caused by formatting inconsistencies
- Time spent on formatting discussions in pull requests
- Onboarding friction for new team members
But applying formatting rules to a 25-million-line codebase manually would block your entire engineering team for days. Automation is the only practical solution.
Prerequisites for Large-Scale Rubyfmt Formatting
Before you start, ensure you have:
- Rubyfmt installed: Version 0.6+ for stability on large codebases
- Sufficient disk space: At least 2-3x your repository size for temp files
- A separate branch: Never format directly on main; use a dedicated formatting branch
- CI/CD pipeline access: To schedule the overnight job
- Version pinning: Ensure all developers and CI jobs run the same Rubyfmt version, since formatter output can differ between releases
Installation
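# Rubyfmt is distributed as a standalone binary rather than a Ruby gem.
# With Homebrew (macOS or Linux):
brew install rubyfmt
# Verify installation
rubyfmt --version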
Step 1: Create a Dedicated Formatting Branch
Start by creating an isolated branch for your formatting changes:
git checkout main
git pull origin main # Ensure you're up-to-date
git checkout -b format/rubyfmt-codebase-migration
This branch will contain only formatting changes, making the diff easy to review and reducing risk of accidental logic changes.
Step 2: Set Up Parallel Processing Configuration
For a 25M-line codebase, sequential formatting can take 12+ hours. Use parallel processing to distribute work across CPU cores:
#!/bin/bash
# format_codebase.sh
set -euo pipefail
echo "Starting parallel Rubyfmt formatting..."
start_time=$(date +%s)
# Find all Ruby files, excluding common non-essential directories.
# -print0 / -0 keep paths containing spaces intact; -n 100 hands each
# rubyfmt process a batch of files instead of spawning one process per file.
find . \
  -type f \
  -name '*.rb' \
  -not -path './node_modules/*' \
  -not -path './vendor/*' \
  -not -path './.git/*' \
  -not -path './tmp/*' \
  -not -path './log/*' \
  -print0 \
  | xargs -0 -P 8 -n 100 rubyfmt -i
end_time=$(date +%s)
elapsed=$((end_time - start_time))
echo "Formatting complete. Time elapsed: $((elapsed / 60)) minutes"
The -P 8 flag tells xargs to run 8 parallel processes, and -n 100 passes up to 100 files to each rubyfmt invocation. Adjust -P based on your CPU core count:
# For 16-core machine
xargs -0 -P 16 -n 100 rubyfmt -i
# For 4-core machine
xargs -0 -P 4 -n 100 rubyfmt -i
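Rather than hard-coding the worker count, the script can derive it from the machine it runs on. The following is a minimal sketch, not part of the original script: `detect_workers` is a hypothetical helper name, and the fallback of 4 workers is an arbitrary assumption for systems where the core count cannot be read.

```shell
# Prints the number of parallel workers to use: the online CPU count,
# optionally capped by a maximum passed as the first argument.
detect_workers() {
  local max=${1:-0} cores
  # getconf reports online cores on Linux and macOS; fall back to 4 if it fails
  cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
  if [ "$max" -gt 0 ] && [ "$cores" -gt "$max" ]; then
    cores=$max
  fi
  echo "$cores"
}

# Example: use every core, but never more than 16 workers
# xargs -P "$(detect_workers 16)" ...
```

Capping the count is useful on shared CI runners, where using every core can starve other jobs or exhaust memory.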
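Step 3: Pin Rubyfmt and Exclude Files for Your Team
Rubyfmt is deliberately not configurable: like gofmt, it applies a single style, so there is no settings file for line length, trailing commas, or indentation width. What you can control is which files it touches. To our knowledge, Rubyfmt honors a .rubyfmtignore file with gitignore-style patterns; verify this against the documentation for the version you install. The entries below are illustrative examples only:
# .rubyfmtignore
vendor/
db/schema.rb
Because there are no options to drift, pinning one Rubyfmt version everywhere is what ensures all team members and CI systems produce identical output.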
Step 4: Run the Overnight Formatting Job
Schedule this as a CI job that runs outside business hours:
# .github/workflows/format-codebase.yml (GitHub Actions example)
name: Overnight Codebase Formatting
on:
  schedule:
    - cron: '0 22 * * 5' # Every Friday at 10 PM UTC
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.2'
          bundler-cache: true
      - name: Install Rubyfmt
        # Assumes Homebrew is preinstalled at its default Linux prefix on the runner
        run: |
          eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
          brew install rubyfmt
          echo "$(brew --prefix)/bin" >> "$GITHUB_PATH"
      - name: Run parallel formatting
        run: |
          bash scripts/format_codebase.sh
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v4
        with:
          branch: format/rubyfmt-codebase-migration
          title: '[Automated] Rubyfmt codebase formatting'
          body: |
            This PR applies Rubyfmt formatting to the entire codebase.
            No logic changes; formatting only.
          commit-message: 'chore: apply rubyfmt formatting'
Step 5: Validate and Review the Changes
After the job completes:
- Check the diff size: a formatting-only PR will touch many files, but every change should be limited to layout and syntax style (such as whitespace, quoting, or parentheses)
git diff --stat main..format/rubyfmt-codebase-migration
- Spot-check for logic changes: sample the changed lines and confirm they differ only in formatting
git diff main..format/rubyfmt-codebase-migration | grep -E '^[+-]' | grep -Ev '^[+-]{3}' | head -20
- Run your test suite: ensure formatting didn't break anything
bundle exec rspec
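Before running the full test suite, a cheap first gate is to confirm that every reformatted file still parses. The sketch below assumes `ruby` is on the PATH; `syntax_check` is a hypothetical helper name. Note that `ruby -c` only checks syntax, so it complements the test suite rather than replacing it.

```shell
# Reads newline-separated paths on stdin and runs `ruby -c` on each one.
# Prints every path that fails to parse; returns non-zero if any file fails.
syntax_check() {
  local failed=0 file
  while IFS= read -r file; do
    [ -f "$file" ] || continue   # skip paths that no longer exist
    if ! ruby -c "$file" </dev/null >/dev/null 2>&1; then
      echo "SYNTAX ERROR: $file"
      failed=1
    fi
  done
  return "$failed"
}

# Example: check only the Ruby files changed on the formatting branch
# git diff --name-only main..format/rubyfmt-codebase-migration -- '*.rb' | syntax_check
```

Checking one file per `ruby -c` call is slow on a very large diff, but it keeps the failure report precise. The same loop can be fanned out with xargs if speed matters.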
Step 6: Merge and Enforce Going Forward
Once validated, merge the formatting PR:
git merge --no-ff format/rubyfmt-codebase-migration
Then enforce formatting in your CI pipeline:
# Add to your standard CI workflow
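- name: Check Rubyfmt compliance
  run: |
    # Stream paths through xargs: expanding $(find ...) inline can exceed
    # the shell's argument-length limit on a very large repository
    find . -type f -name '*.rb' -not -path './vendor/*' -print0 \
      | xargs -0 -n 100 rubyfmt --check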
This prevents future commits from violating formatting standards.
Common Pitfalls and Solutions
| Issue | Solution |
|-------|----------|
| Out of memory on large repos | Reduce parallel workers (-P 2 instead of -P 16) |
| Formatting takes >8 hours | Exclude more directories (logs, cache, temp files) |
| Team members get merge conflicts | Rebase branches immediately after merge |
| Rubyfmt crashes on DSL-heavy code | Update to latest version; report bugs to maintainers |
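When Rubyfmt fails on a handful of files, it helps to keep the batch running and collect the failures for follow-up instead of aborting the whole job. The following is a rough sketch under that assumption; `format_with_failure_log` is a hypothetical helper name, and it formats one file per invocation so that a crash can be attributed to a specific path.

```shell
# Formats each path read from stdin with `rubyfmt -i`.
# Paths that rubyfmt fails on are written to the log file given as $1.
# Returns non-zero if at least one file failed.
format_with_failure_log() {
  local log_file=$1 file
  : > "$log_file"                      # start with an empty log
  while IFS= read -r file; do
    if ! rubyfmt -i "$file" </dev/null >/dev/null 2>&1; then
      echo "$file" >> "$log_file"
    fi
  done
  [ ! -s "$log_file" ]                 # success only if nothing was logged
}

# Example: retry just the files that failed in the parallel run
# format_with_failure_log failed_files.txt < files_to_retry.txt
```

The resulting log doubles as a list of reproduction cases to attach to a bug report for the maintainers.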
Performance Benchmarks
Based on real-world formatting runs:
- 5M lines: ~45 minutes (8 cores)
- 15M lines: ~2.5 hours (8 cores)
- 25M lines: ~4-5 hours (8 cores)
Parallelization provides roughly 7-8x speedup over sequential processing on modern hardware.
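To turn these figures into an estimate for your own repository, it can help to convert them into throughput. The arithmetic below is a small sketch using only the numbers listed above; `lines_per_minute` and `estimate_minutes` are hypothetical helper names, and 4.5 hours is assumed as the midpoint of the 4-5 hour range.

```shell
# Integer throughput in lines per minute: $1 = total lines, $2 = minutes taken.
lines_per_minute() {
  echo $(( $1 / $2 ))
}

# Minutes needed to format $1 lines at $2 lines per minute, rounded up.
estimate_minutes() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

lines_per_minute 5000000 45     # 5M lines in ~45 minutes
lines_per_minute 15000000 150   # 15M lines in ~2.5 hours
lines_per_minute 25000000 270   # 25M lines in ~4.5 hours (assumed midpoint)
```

These work out to roughly 111,000, 100,000, and 92,600 lines per minute, so throughput falls off slightly as the codebase grows. Running `estimate_minutes` with your own line count and a conservative rate from this range gives a first guess at whether the job fits in one night.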
After the Initial Formatting
Set up pre-commit hooks to catch formatting issues early:
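#!/bin/bash
# .git/hooks/pre-commit
# Only check staged files that still exist (added, copied, or modified)
ruby_files=$(git diff --cached --name-only --diff-filter=ACM | grep '\.rb$' || true)
if [ -n "$ruby_files" ]; then
  # NUL-delimit the list so paths containing spaces survive
  if ! printf '%s\n' "$ruby_files" | tr '\n' '\0' | xargs -0 rubyfmt --check; then
    echo "Run 'rubyfmt -i' on modified Ruby files before committing"
    exit 1
  fi
fi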
Conclusion
Formatting a 25-million-line Ruby codebase overnight is achievable with proper automation. By using parallel processing, scheduling overnight CI jobs, and enforcing formatting in your pipeline, you can maintain consistent code style without disrupting your team's workflow. The initial effort pays dividends through reduced code review friction and cleaner diffs for years to come.