How to Format a Large Ruby Codebase Overnight with Rubyfmt Automation

Maintaining consistent code style across a massive Ruby monorepo is one of the toughest challenges in large engineering organizations. When you're managing millions of lines of code spread across thousands of files, manual formatting becomes impossible. Rubyfmt, a standalone Ruby autoformatter that ships as a compiled binary, combined with smart automation strategies can help you format an entire codebase in a single overnight batch run.

This guide walks you through the practical approach used by teams managing 25M+ line Ruby codebases, showing you how to set up automated formatting that respects your development workflow.

Why Formatting a Massive Codebase Matters

Consistent code style reduces:

  • Cognitive load during code review (reviewers focus on logic, not spacing)
  • Merge conflicts caused by formatting inconsistencies
  • Time spent on formatting discussions in pull requests
  • Onboarding friction for new team members

But applying formatting rules to a 25-million-line codebase manually would block your entire engineering team for days. Automation is the only practical solution.

Prerequisites for Large-Scale Rubyfmt Formatting

Before you start, ensure you have:

  • Rubyfmt installed: Version 0.6+ for stability on large codebases
  • Sufficient disk space: At least 2-3x your repository size for temp files
  • A separate branch: Never format directly on main; use a dedicated formatting branch
  • CI/CD pipeline access: To schedule the overnight job
  • Ruby version management: Ensure all developers use the same Ruby version as your formatter

Installation

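# The Rubyfmt README documents Homebrew and prebuilt release binaries as install methods.
# macOS: install via Homebrew. Linux: download a release binary and add it to your PATH.
brew install rubyfmt

# Verify installation
rubyfmt --version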

Step 1: Create a Dedicated Formatting Branch

Start by creating an isolated branch for your formatting changes:

git checkout main
git pull origin main  # Ensure you're up-to-date
git checkout -b format/rubyfmt-codebase-migration

This branch will contain only formatting changes, making the diff easy to review and reducing risk of accidental logic changes.

Step 2: Set Up Parallel Processing Configuration

For a 25M-line codebase, sequential formatting can take 12+ hours. Use parallel processing to distribute work across CPU cores:

#!/bin/bash
# format_codebase.sh

set -euo pipefail

echo "Starting parallel Rubyfmt formatting..."
start_time=$(date +%s)

# Find all Ruby files, excluding common non-essential directories.
# -print0 / -0 keep paths containing spaces or quotes intact.
find . \
  -type f \
  -name '*.rb' \
  -not -path './node_modules/*' \
  -not -path './vendor/*' \
  -not -path './.git/*' \
  -not -path './tmp/*' \
  -not -path './log/*' \
  -print0 \
  | xargs -0 -P 8 -I {} rubyfmt -i {}

end_time=$(date +%s)
elapsed=$((end_time - start_time))

echo "Formatting complete. Time elapsed: $((elapsed / 60)) minutes"

The -P 8 flag tells xargs to run 8 parallel processes. Adjust based on your CPU core count:

# For 16-core machine
xargs -P 16 -I {} rubyfmt -i {}

# For 4-core machine
xargs -P 4 -I {} rubyfmt -i {}
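
The same idea can be sketched in Ruby when you want batching and a list of failures rather than one process per file. This is a minimal sketch, not part of Rubyfmt: the helper names `batches` and `run_in_parallel` are hypothetical, the batch size of 200 is an arbitrary assumption, and the worker count defaults to the core count reported by `Etc.nprocessors`.

```ruby
require "etc"

# Hypothetical helper: group files so each formatter process handles
# many files instead of paying process start-up cost once per file.
def batches(files, batch_size)
  files.each_slice(batch_size).to_a
end

# Hypothetical helper: run `command` + batch for every batch using a fixed
# pool of worker threads. Returns the batches whose command exited non-zero
# (or could not be started).
def run_in_parallel(batch_list, command:, workers: Etc.nprocessors)
  queue = Queue.new
  batch_list.each { |batch| queue << batch }
  queue.close # once drained, pop returns nil and each worker loop ends

  failed = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (batch = queue.pop)
        failed << batch unless system(*command, *batch)
      end
    end
  end
  threads.each(&:join)
  Array.new(failed.size) { failed.pop }
end

# Example invocation (commented out so loading this file has no side effects):
#   files  = Dir.glob("**/*.rb").reject { |f| f.start_with?("vendor/", "tmp/") }
#   failed = run_in_parallel(batches(files, 200), command: ["rubyfmt", "-i", "--"])
#   abort "#{failed.size} batch(es) failed" unless failed.empty?
```

Collecting failed batches instead of aborting on the first error lets the overnight run finish and leaves a short list of files to inspect in the morning.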

Step 3: Configure Rubyfmt Options for Your Team

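Rubyfmt is deliberately configuration-free, in the spirit of gofmt: it has no line-length, trailing-comma, or indentation settings, and it does not read a .rubyfmt.yml file. What you can control is which files it touches. According to the project README, a .rubyfmtignore file in the repository root excludes paths using gitignore-style patterns, and the --header-opt-in and --header-opt-out flags restrict formatting based on a "# rubyfmt: true" or "# rubyfmt: false" magic comment at the top of each file. The paths below are illustrative:

# .rubyfmtignore
vendor/
tmp/
db/schema.rb

Because there are no style options to drift, pinning the same Rubyfmt version for developers and CI is what guarantees identical output everywhere.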

Step 4: Run the Overnight Formatting Job

Schedule this as a CI job that runs outside business hours:

# .github/workflows/format-codebase.yml (GitHub Actions example)
name: Overnight Codebase Formatting

on:
  schedule:
    - cron: '0 22 * * 5'  # Every Friday at 10 PM UTC

jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.2'
          bundler-cache: true
      
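      - name: Verify Rubyfmt is available
        # Assumption: the runner image already provides a rubyfmt release
        # binary on PATH (the README documents Homebrew and release binaries).
        run: rubyfmt --version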
      
      - name: Run parallel formatting
        run: |
          bash scripts/format_codebase.sh
      
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v4
        with:
          branch: format/rubyfmt-codebase-migration
          title: '[Automated] Rubyfmt codebase formatting'
          body: |
            This PR applies Rubyfmt formatting to the entire codebase.
            No logic changes—formatting only.
          commit-message: 'chore: apply rubyfmt formatting'

Step 5: Validate and Review the Changes

After the job completes:

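  1. Check the diff size: a formatting-only PR will touch many files, but every hunk should be a layout or style change (spacing, line breaks, indentation), never a change in behavior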

    git diff --stat main..format/rubyfmt-codebase-migration
    
  2. Verify no logic changes: ignore whitespace-only differences with -w and review a sample of what remains

    git diff -w main..format/rubyfmt-codebase-migration | grep -E '^[+-]' | grep -Ev '^[+-]{3}' | head -20
    
  3. Run your test suite: Ensure formatting didn't break anything

    bundle exec rspec
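
For a stricter check than eyeballing the diff, one option is to compare the parse trees of each file before and after formatting. The sketch below uses Ruby's standard-library Ripper parser. The helper names `strip_positions` and `same_structure?` are hypothetical, and this is only an approximation: comments do not appear in the tree, and a formatter that normalizes syntax (for example by adding parentheses to calls) will produce mismatches that need a human look rather than indicating a bug.

```ruby
require "ripper"

# Ripper scanner events look like [:@ident, "x", [line, col]].
# Drop the [line, col] pair so that pure layout changes compare equal.
def strip_positions(node)
  return node unless node.is_a?(Array)

  if node.first.is_a?(Symbol) && node.first.to_s.start_with?("@")
    node.first(2)
  else
    node.map { |child| strip_positions(child) }
  end
end

# True when both sources parse and yield the same tree once positions
# are ignored. Returns false if either source has a syntax error.
def same_structure?(before_src, after_src)
  before = Ripper.sexp(before_src)
  after = Ripper.sexp(after_src)
  return false if before.nil? || after.nil?

  strip_positions(before) == strip_positions(after)
end
```

In practice you would feed it the output of `git show main:path/to/file.rb` as the "before" source and the file on the formatting branch as the "after" source, then list every path for which it returns false.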
    

Step 6: Merge and Enforce Going Forward

Once validated, merge the formatting PR into main:

git checkout main
git merge --no-ff format/rubyfmt-codebase-migration

Then enforce formatting in your CI pipeline:

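# Add to your standard CI workflow
- name: Check Rubyfmt compliance
  run: |
    # Stream paths through xargs so a large repo does not exceed the argument-length limit
    find . -type f -name '*.rb' -not -path './vendor/*' -not -path './node_modules/*' -print0 | xargs -0 rubyfmt --check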

This prevents future commits from violating formatting standards.

Common Pitfalls and Solutions

| Issue | Solution |
|-------|----------|
| Out of memory on large repos | Reduce parallel workers (-P 2 instead of -P 16) |
| Formatting takes >8 hours | Exclude more directories (logs, cache, temp files) |
| Team members get merge conflicts | Rebase branches immediately after merge |
| Rubyfmt crashes on DSL-heavy code | Update to latest version; report bugs to maintainers |

Performance Benchmarks

Based on real-world formatting runs:

  • 5M lines: ~45 minutes (8 cores)
  • 15M lines: ~2.5 hours (8 cores)
  • 25M lines: ~4-5 hours (8 cores)

With 8 workers, parallelization provides roughly a 7-8x speedup over sequential processing on modern hardware, because each file is formatted independently of the others.
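
To turn the figures above into a rough budget for your own repository, you can scale from the slowest observed throughput. The sketch below is only arithmetic on the three data points listed: the 25M-line entry uses 270 minutes as an assumed midpoint of the 4-5 hour range, and the helper name `estimate_minutes` is hypothetical. Real runtimes depend on file sizes, disk speed, and core count.

```ruby
# Benchmark points from the list above (8 cores).
BENCHMARKS = [
  { lines: 5_000_000,  minutes: 45 },
  { lines: 15_000_000, minutes: 150 },  # ~2.5 hours
  { lines: 25_000_000, minutes: 270 }   # assumed midpoint of 4-5 hours
].freeze

# Scale each benchmark linearly to the target size and keep the most
# pessimistic result. Rational arithmetic avoids floating-point rounding
# surprises before the final ceil.
def estimate_minutes(lines, benchmarks = BENCHMARKS)
  benchmarks
    .map { |b| Rational(lines * b[:minutes], b[:lines]) }
    .max
    .ceil
end

puts estimate_minutes(10_000_000)  # pessimistic estimate for a 10M-line repo
```

Taking the worst case rather than the average builds in some margin, which matters when the job has to finish inside a fixed overnight window.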

After the Initial Formatting

Set up pre-commit hooks to catch formatting issues early:

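#!/bin/bash
# .git/hooks/pre-commit (the shebang must be the first line of the file)
# Only consider added, copied, or modified files; deleted files cannot be checked.
ruby_files=$(git diff --cached --name-only --diff-filter=ACM | grep '\.rb$')
if [ -n "$ruby_files" ]; then
  # Unquoted on purpose: one argument per path (assumes no spaces in paths)
  if ! rubyfmt --check $ruby_files; then
    echo "Run 'rubyfmt -i' on modified Ruby files before committing"
    exit 1
  fi
fi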
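
If the hook logic grows, the file-selection step is easier to test as a small Ruby function. The sketch below parses the tab-separated output of `git diff --cached --name-status` and keeps Ruby files that still exist after the commit. The helper name `formattable_paths` is hypothetical, and the sketch assumes paths contain no tabs or newlines.

```ruby
# Each input line looks like "M\tapp/a.rb", "D\tapp/b.rb",
# or "R100\told.rb\tnew.rb" for a rename (old path, then new path).
def formattable_paths(name_status_output)
  name_status_output.each_line.filter_map do |line|
    status, *paths = line.chomp.split("\t")
    next if status.nil? || status.start_with?("D") # blank line or deletion

    path = paths.last # for renames and copies, the new path is last
    path if path&.end_with?(".rb")
  end
end

# Example invocation (commented out so loading this file does not call git):
#   staged = formattable_paths(`git diff --cached --name-status`)
#   exit 1 unless staged.empty? || system("rubyfmt", "--check", "--", *staged)
```

Passing the paths to `system` as separate arguments sidesteps the word-splitting problems that shell variables have with unusual file names.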

Conclusion

Formatting a 25-million-line Ruby codebase overnight is achievable with proper automation. By using parallel processing, scheduling overnight CI jobs, and enforcing formatting in your pipeline, you can maintain consistent code style without disrupting your team's workflow. The initial effort pays dividends through reduced code review friction and cleaner diffs for years to come.
