How to Block AI Bot Spam in GitHub Repos Using Git Author Filters 2025
Prerequisites and What You'll Need
Required tools: Git CLI, GitHub CLI (gh), and repo admin access
Before diving in, make sure your environment matches what these steps assume. Missing any of these will cause individual steps to fail.
- [ ] Git 2.38 or newer (git --version to confirm)
- [ ] GitHub CLI 2.40 or newer (gh --version), authenticated with gh auth login
- [ ] Admin or maintainer role on the target repository (required for branch protection rules and interaction limits)
- [ ] jq 1.6 or newer installed for JSON parsing in bash scripts (jq --version)
- [ ] A GitHub Personal Access Token with repo scope if you're running API calls outside Actions
- [ ] Basic familiarity with GitHub Actions YAML syntax
Estimated time: 45 minutes for initial setup; 10 minutes for ongoing maintenance scripts.
Understanding the AI bot spam problem in open source issues and PRs
This isn't hypothetical anymore. When Archestra posted a GitHub issue with a $900 bounty for a new feature contribution, the issue attracted legitimate contributors at first — but AI bots quickly swarmed it, pushing the comment count to 253 total comments. The signal-to-noise ratio collapsed. Real contributors couldn't follow the conversation. Maintainers spent hours moderating instead of shipping code.
The pattern is consistent across repos: bots flood high-value issues with content that superficially mimics human engagement but contributes nothing actionable. GitHub's own metrics celebrated AI contribution volume without accounting for this quality collapse, which is exactly the wrong incentive.
Checklist: signs your repo is being targeted by AI-generated contributions
Watch for these indicators:
- [ ] Multiple comments containing phrases like "implementation plan", "I can help with this", or "Here's my approach" with no linked code
- [ ] Accounts created within the last 30-90 days with zero prior public commits
- [ ] Identical or near-identical comment phrasing across different accounts on the same issue
- [ ] PRs that reference an issue but contain no diff — only a description
- [ ] A sudden spike in issue comments within hours of a bounty being posted
- [ ] Contributor profiles with a generic bio, default avatar, and follower count of zero
Step 1 — Identify AI-Generated Commits and Contributors with Git Log
Before you can block anything, you need to know what you're dealing with. Git's built-in logging tools let you audit contributor history locally without any API calls, which makes this a fast first triage step.
Using git log --author to filter commits by suspected bot accounts
If you've noticed a suspicious username in your PR queue or commit history, filter their activity immediately:
# Filter commits from a specific suspected bot author
git log --author='bot-username' --oneline --since='2024-01-01'
# Use a regex pattern to catch variations (e.g., ai-helper-1, ai-helper-2)
git log --author='ai-helper' --oneline --all
# Filter by email domain if bots share a common provider
git log --author='@tempmail\.com' --oneline --since='2024-01-01'
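To see whether suspicious commits cluster around one email provider, the author list can be summarised by domain. The following is a minimal sketch, assuming author lines in "Name <email>" form; count_by_domain is a hypothetical helper name for illustration, not a Git command.

```shell
# count_by_domain: read "Name <email>" lines on stdin and print
# "<count> <domain>" pairs, most frequent domain first.
# Hypothetical helper for illustration; not part of Git.
count_by_domain() {
  sed -n 's/.*<[^@<>]*@\([^<>]*\)>.*/\1/p' \
    | tr '[:upper:]' '[:lower:]' \
    | sort \
    | uniq -c \
    | sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }'
}

# Typical use inside a repository:
#   git log --all --format='%an <%ae>' | count_by_domain
```

A domain that appears only on very recent commits is worth a closer look, though a shared provider on its own is weak evidence: many legitimate contributors use GitHub's noreply addresses.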
Spotting AI-generated commit messages: patterns and red flags
AI-generated commit messages often follow a suspiciously polished cadence, such as "feat: implement complete solution for issue #42 with full error handling" on a first commit from a brand-new account. Real first contributions are usually messier. Also watch for:
- Commit messages that perfectly echo the issue title verbatim
- Conventional commit format (feat:, fix:, chore:) from accounts with zero history (legitimate newcomers rarely know this convention on day one)
- Single-commit PRs that claim to "fully implement" a complex feature
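These red flags can be turned into a rough first-pass filter over commit subjects. The sketch below is an assumption-laden heuristic, not a detector: flag_subjects is a hypothetical helper, and its phrase list is made up for illustration and should be tuned per repository.

```shell
# flag_subjects: read commit subject lines on stdin and print those that
# use a Conventional Commits prefix AND claim a complete implementation.
# The phrase list is an assumption for illustration; tune it per repo.
flag_subjects() {
  grep -E '^(feat|fix|chore)(\([^)]*\))?!?: ' \
    | grep -Ei '(fully|complete(ly)?) (implement|solution)|implement(s|ed)? complete'
}

# Typical use inside a repository:
#   git log --no-merges --format='%s' --since='2024-01-01' | flag_subjects
```

Treat any hit as a prompt to look at the author's history, not as proof. Plenty of experienced contributors write tidy conventional commits.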
Auditing contributor history with git shortlog
Run this on your repo to surface anomalies in contribution distribution:
# Rank all contributors by commit count, excluding merge commits
git shortlog -sn --no-merges
# Narrow to a recent time window to catch new entrants
git shortlog -sn --no-merges --since='2024-01-01'
# Output with emails to cross-reference accounts
git shortlog -sne --no-merges --since='2024-01-01'
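A contributor with 1 commit to your repo but a pattern of showing up on multiple bounty issues is a strong signal. Cross-reference suspicious entries against their GitHub profiles with gh api /users/USERNAME, substituting the account's login (gh fills in {owner}, {repo}, and {branch} automatically, but not other placeholders).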
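To pull the one-commit contributors out of the shortlog output for follow-up, a small filter is enough. This is a sketch that assumes the default "count, tab, Name <email>" layout of git shortlog -sne; single_commit_authors is a hypothetical helper name.

```shell
# single_commit_authors: read `git shortlog -sne` output on stdin and print
# the "Name <email>" of every author whose commit count is exactly 1.
single_commit_authors() {
  sed -n 's/^[[:space:]]*1[[:space:]][[:space:]]*//p'
}

# Typical use inside a repository (HEAD is passed explicitly because
# git shortlog reads commits from stdin when it is not attached to a terminal):
#   git shortlog -sne --no-merges --since='2024-01-01' HEAD | single_commit_authors
```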
Note: git log --author uses regex matching, so --author='john' will match any author whose name or email contains "john". Wrap patterns in ^ and $ anchors for exact matches: --author='^John Doe <john@example.com>$'.
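Because the pattern is a regular expression, characters such as the dot in an email address also need escaping for a truly exact match. One way to build such a pattern is sketched below, assuming Git's default basic-regex matching; exact_author_pattern is a hypothetical helper, not a Git feature.

```shell
# exact_author_pattern NAME EMAIL
# Print an anchored basic-regex pattern that matches only "NAME <EMAIL>".
# Basic-regex metacharacters in the inputs are backslash-escaped first.
exact_author_pattern() {
  printf '%s <%s>\n' "$1" "$2" \
    | sed -e 's/[][\.*^$]/\\&/g' -e 's/^/^/' -e 's/$/$/'
}

# Typical use inside a repository:
#   git log --author="$(exact_author_pattern 'John Doe' 'john@example.com')" --oneline
```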
Step 2 — Set Up a CODEOWNERS File to Gate Merges
Identifying bots reactively wastes time. CODEOWNERS, combined with branch protection, lets you require approval from a trusted human before anything merges into a protected branch, which sharply reduces the risk of a bot-authored PR slipping through during a busy week.
Creating .github/CODEOWNERS to require human reviewer approval
Create this file at .github/CODEOWNERS:
# .github/CODEOWNERS
# All files require review from a core maintainer
* @your-org/core-maintainers
# Critical paths require a specific named maintainer
/.github/ @yourusername @trusted-colleague
src/core/ @yourusername
package.json @yourusername @trusted-colleague
Replace @your-org/core-maintainers with your actual GitHub org team slug. If you're a solo maintainer, use your username directly.
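A rule that lists a path but no owner is valid CODEOWNERS syntax, yet it leaves that path without a required reviewer, so it is worth checking for by accident. The sketch below assumes patterns contain no escaped spaces; codeowners_unowned is a hypothetical helper for illustration.

```shell
# codeowners_unowned FILE
# Print "line-number: pattern" for every rule that names no owner.
# Comment lines and blank lines are skipped; an inline "#" ends a rule.
codeowners_unowned() {
  awk '
    /^[ \t]*#/ { next }              # skip comment lines
    NF == 0    { next }              # skip blank lines
    {
      owners = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break         # inline comment: stop counting
        owners++
      }
      if (owners == 0) print NR ": " $1
    }
  ' "$1"
}

# Typical use: codeowners_unowned .github/CODEOWNERS
```

Empty output means every rule names at least one owner. It does not confirm that those owners exist or have write access.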
Enabling branch protection rules that enforce CODEOWNERS
You can configure branch protection via the GitHub API using gh:
# Set branch protection on 'main' requiring code owner review
gh api \
  --method PUT \
  /repos/{owner}/{repo}/branches/main/protection \
  --input - <<'EOF'
{
  "required_status_checks": null,
  "enforce_admins": true,
  "required_pull_request_reviews": {
    "require_code_owner_reviews": true,
    "required_approving_review_count": 1,
    "dismiss_stale_reviews": true
  },
  "restrictions": null
}
EOF
In the GitHub UI, navigate to Settings → Branches → Branch protection rules, add a rule for main, and check Require review from Code Owners.
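Note: Code owner review is enforced on pull requests. Requiring pull request reviews also blocks direct pushes to the protected branch for anyone who cannot bypass the rule, and "enforce_admins": true extends that to administrators. To narrow down who can push or merge at all, also enable Restrict who can push to matching branches.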
Step 3 — Add a GitHub Actions Workflow to Detect Bot-like PR Authors
CODEOWNERS stops unauthorized merges, but a human still has to triage and close bot PRs. This Actions workflow labels PRs from accounts that fail basic legitimacy heuristics and asks the author to verify, so suspicious PRs are easy to filter out of your review queue.
Writing a workflow that checks PR author account age and contribution history
Create .github/workflows/detect-bot-prs.yml:
name: Detect Bot-like PR Authors
on:
  pull_request_target:
    types: [opened, reopened]
jobs:
  check-author:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      issues: write
    steps:
      - name: Check PR author metadata
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const author = context.payload.pull_request.user.login;
            const prNumber = context.payload.pull_request.number;
            // Fetch author metadata from GitHub REST API
            const { data: user } = await github.rest.users.getByUsername({
              username: author
            });
            const accountCreatedAt = new Date(user.created_at);
            const accountAgeInDays = (Date.now() - accountCreatedAt) / (1000 * 60 * 60 * 24);
            const publicRepos = user.public_repos;
            const followers = user.followers;
            console.log(`Author: ${author}`);
            console.log(`Account age: ${Math.floor(accountAgeInDays)} days`);
            console.log(`Public repos: ${publicRepos}`);
            console.log(`Followers: ${followers}`);
            // Flag accounts newer than 30 days with fewer than 3 public repos
            const isSuspected = accountAgeInDays < 30 && publicRepos < 3;
            if (isSuspected) {
              // Apply the 'suspected-bot' label
              await github.rest.issues.addLabels({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: prNumber,
                labels: ['suspected-bot']
              });
              // Post a comment requesting human verification
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: prNumber,
                body: [
                  '👋 Hi @' + author + ',',
                  '',
                  'Our automated checks flagged this PR because your account was created recently and has limited public activity.',
                  'This is not an automatic rejection. To help us verify you are a human contributor, please:',
                  '',
                  '1. Link to a previous open source contribution you have made',
                  '2. Add a brief comment explaining your implementation approach',
                  '3. Confirm you wrote this code yourself and did not submit AI-generated output without review',
                  '',
                  'A maintainer will review shortly. Thank you for your patience.'
                ].join('\n')
              });
            }
Note: Use pull_request_target instead of pull_request so this workflow has write permissions to comment on PRs from forks. This is a known GitHub Actions security consideration: never run untrusted code from the PR in a pull_request_target workflow. Keep this workflow to API calls only.
Create the suspected-bot label first via gh label create 'suspected-bot' --color 'FF0000' --description 'Account age heuristic flagged this PR'.
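Before enabling the workflow, it can help to try the same thresholds against a few known accounts from your terminal. The sketch below mirrors the workflow's rule (under 30 days old and fewer than 3 public repos). account_age_days and is_suspected are hypothetical helper names, and the fromdateiso8601 filter in the usage comment is assumed to be available in gh's built-in jq.

```shell
# account_age_days CREATED_EPOCH NOW_EPOCH
# Print the whole number of days between two Unix timestamps.
account_age_days() {
  echo $(( ($2 - $1) / 86400 ))
}

# is_suspected AGE_DAYS PUBLIC_REPOS
# Succeed (exit 0) when the account is under 30 days old AND has fewer
# than 3 public repos, the same thresholds the workflow uses.
is_suspected() {
  [ "$1" -lt 30 ] && [ "$2" -lt 3 ]
}

# Typical use (LOGIN is the account to check; fromdateiso8601 is an assumption):
#   created=$(gh api "/users/$LOGIN" --jq '.created_at | fromdateiso8601')
#   repos=$(gh api "/users/$LOGIN" --jq '.public_repos')
#   age=$(account_age_days "$created" "$(date +%s)")
#   is_suspected "$age" "$repos" && echo "$LOGIN would be flagged"
```

Running this against a handful of real first-time contributors shows how many false positives the thresholds would produce for your project.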
Step 4 — Lock Issue Comments with Interaction Limits and Templates
Bounty issues are the primary target because they offer visible rewards. Locking down who can comment — and raising the bar for what a valid comment looks like — dramatically reduces bot noise before it starts.
Enabling temporary interaction limits on high-traffic bounty issues
# Set repository-level interaction limit to collaborators only for 24 hours
gh api \
  --method PUT \
  /repos/{owner}/{repo}/interaction-limits \
  -f limit='collaborators_only' \
  -f expiry='one_day'
# To lock a specific issue (blocks new comments from users without write access)
# Set ISSUE_NUMBER first; gh fills in {owner} and {repo} but not other placeholders
gh api \
  --method PUT \
  "/repos/{owner}/{repo}/issues/${ISSUE_NUMBER}/lock" \
  -f lock_reason='spam'
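A typo in the limit or expiry value only surfaces as an API error, so a small wrapper that validates the arguments and prints the command first can save a round trip. The sketch below uses the values listed in GitHub's REST documentation for interaction limits. valid_limit, valid_expiry, and limit_cmd are hypothetical helper names.

```shell
# valid_limit VALUE: succeed only for a documented interaction limit.
valid_limit() {
  case "$1" in
    existing_users|contributors_only|collaborators_only) return 0 ;;
    *) return 1 ;;
  esac
}

# valid_expiry VALUE: succeed only for a documented expiry.
valid_expiry() {
  case "$1" in
    one_day|three_days|one_week|one_month|six_months) return 0 ;;
    *) return 1 ;;
  esac
}

# limit_cmd LIMIT EXPIRY
# Dry run: print the gh command that would apply the limit, or fail
# with a message on stderr if either argument is not a documented value.
limit_cmd() {
  if ! valid_limit "$1" || ! valid_expiry "$2"; then
    echo "invalid limit or expiry" >&2
    return 1
  fi
  printf "gh api --method PUT '/repos/{owner}/{repo}/interaction-limits' -f limit=%s -f expiry=%s\n" "$1" "$2"
}
```

To undo either measure early, the same endpoints accept DELETE: gh api --method DELETE /repos/{owner}/{repo}/interaction-limits removes the repository limit, and a DELETE on the issue's /lock path unlocks it.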
Writing an ISSUE_TEMPLATE that discourages AI-generated responses
Create .github/ISSUE_TEMPLATE/bounty.md:
---
name: Bounty Contribution Proposal
about: Submit your proposal for a bounty issue
title: '[BOUNTY PROPOSAL] <brief description>'
labels: 'bounty-proposal'
assignees: ''
---
## Mandatory Pre-submission Checklist
Please check every box before submitting. Unchecked submissions will be closed without review.
- [ ] I have read **all existing comments** on the bounty issue before posting
- [ ] My proposal includes **working code or a testable prototype** (link or inline snippet required)
- [ ] I have **not copy-pasted AI output** without personally reviewing and testing it
- [ ] I am a **human contributor** making this submission on my own behalf
- [ ] My GitHub account has **prior public contributions** (link at least one)
## My Implementation Approach
<!-- Describe in 2-3 sentences what you will build and how. Vague plans will be closed. -->
## Proof of Work
<!-- Link to a branch, gist, or code snippet showing you've already started. Plans without code are not accepted. -->
## Prior Relevant Work
<!-- Link to a previous contribution (PR, commit, or published package) that demonstrates relevant experience. -->
The "proof of work" requirement alone tends to filter out a large share of AI bot proposals, which are characteristically long on plans and short on code.
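Since the template promises that unchecked submissions will be closed, it helps to be able to count unchecked boxes mechanically. This sketch assumes the standard "- [ ]" task-list syntax; unchecked_boxes is a hypothetical helper name.

```shell
# unchecked_boxes: read an issue body on stdin and print how many
# task-list items are still unchecked ("- [ ]" or "* [ ]").
unchecked_boxes() {
  # grep -c exits non-zero when the count is 0, so normalise the status
  grep -c '^[[:space:]]*[-*] \[ \]' || true
}

# Typical use (42 is a placeholder issue number):
#   gh issue view 42 --json body --jq '.body' | unchecked_boxes
```

A result of 0 only shows that the boxes were ticked. Whether the linked proof of work is real still needs a human look.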
Step 5 — Filter and Remove Existing AI Spam Comments via GitHub CLI
If your issue already has 100+ comments, manual deletion is untenable. This script automates the cleanup using gh api and jq to bulk-delete comments matching bot-signature phrases.
Scripting bulk deletion of comments matching bot-like patterns
#!/usr/bin/env bash
# cleanup-spam-comments.sh
# Usage: ./cleanup-spam-comments.sh OWNER REPO ISSUE_NUMBER
# Requires: gh CLI authenticated, jq installed
set -euo pipefail
OWNER="${1}"
REPO="${2}"
ISSUE_NUMBER="${3}"
# Patterns that indicate AI-generated spam comments
# Extend this array with patterns specific to your repo
SPAM_PATTERNS=(
  "implementation plan"
  "I can help with this"
  "Here's my approach"
  "I would like to work on this"
  "I can implement this"
  "Step 1:"
  "happy to take this on"
)
echo "Fetching comments for issue #${ISSUE_NUMBER} in ${OWNER}/${REPO}..."
# Fetch all comments, one JSON object per line (--paginate follows every page)
COMMENTS=$(gh api \
  --paginate \
  "/repos/${OWNER}/${REPO}/issues/${ISSUE_NUMBER}/comments" \
  --jq '.[] | {id: .id, user: .user.login, body: .body}')
DELETED=0
while IFS= read -r comment; do
  # Skip the empty line produced when the issue has no comments
  [ -n "${comment}" ] || continue
  COMMENT_ID=$(printf '%s\n' "${comment}" | jq -r '.id')
  COMMENT_USER=$(printf '%s\n' "${comment}" | jq -r '.user')
  COMMENT_BODY=$(printf '%s\n' "${comment}" | jq -r '.body')
  MATCH=false
  for pattern in "${SPAM_PATTERNS[@]}"; do
    # -F matches the pattern literally, -i ignores case; the here-string
    # avoids a pipeline that pipefail could misreport when grep -q exits early
    if grep -qiF -- "${pattern}" <<< "${COMMENT_BODY}"; then
      MATCH=true
      break
    fi
  done
  if [ "${MATCH}" = true ]; then
    echo "Deleting comment ${COMMENT_ID} from @${COMMENT_USER} (matched spam pattern)"
    gh api \
      --method DELETE \
      "/repos/${OWNER}/${REPO}/issues/comments/${COMMENT_ID}"
    DELETED=$((DELETED + 1))
    # Respect secondary rate limits: pause about one second between deletions
    sleep 1
  fi
done <<< "${COMMENTS}"
echo "Done. Deleted ${DELETED} spam comments."
Run it as:
chmod +x cleanup-spam-comments.sh
./cleanup-spam-comments.sh my-org my-repo 42
Note: Review the matched comments in a dry run first. Add echo "Would delete: ${COMMENT_ID}" before the DELETE call and comment out the gh api --method DELETE call until you're confident in your patterns. A false positive that deletes a legitimate comment is a worse outcome than a spam comment staying up temporarily.
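One way to rehearse the patterns without touching the API is to test them against saved comment text. The sketch below assumes the patterns are kept one per line in a plain text file. first_spam_match is a hypothetical helper, separate from the cleanup script.

```shell
# first_spam_match PATTERN_FILE
# Read a comment body on stdin and print the first pattern from
# PATTERN_FILE (one literal string per line) that occurs in it,
# ignoring case. Exit status is 1 when no pattern matches.
first_spam_match() {
  body=$(cat)
  while IFS= read -r pattern; do
    [ -n "$pattern" ] || continue          # ignore blank lines
    if printf '%s\n' "$body" | grep -qiF -- "$pattern"; then
      printf '%s\n' "$pattern"
      return 0
    fi
  done < "$1"
  return 1
}

# Typical use (patterns.txt and comment.txt are placeholder file names):
#   first_spam_match patterns.txt < comment.txt
```

Printing the pattern that fired makes it easier to spot over-broad entries, such as a phrase that also shows up in maintainers' own replies.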
Restoring issue signal-to-noise ratio after a spam wave
After bulk deletion, pin a summary comment from a maintainer account explaining what happened and what the current status is. This resets the conversation context for legitimate contributors who arrive after the cleanup.
Common Issues & Fixes
Error: HTTP 429 Too Many Requests when bulk-querying contributor data
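Cause: GitHub's REST API enforces a primary rate limit of 5,000 requests per hour for authenticated requests (1,000 per hour per repository for the GITHUB_TOKEN inside Actions) and signals exhaustion with a 403 or 429 response. Iterating over hundreds of contributor profiles in a tight loop exhausts this budget quickly.
Fix: Add exponential backoff between retries, and check your remaining budget before each batch, either from the X-RateLimit-Remaining response header or from the /rate_limit endpoint.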
# Check your current rate limit status
gh api /rate_limit --jq '.rate | {limit, remaining, reset}'
# In scripts, sleep when remaining drops below a threshold
REMAINING=$(gh api /rate_limit --jq '.rate.remaining')
if [ "${REMAINING}" -lt 100 ]; then
  RESET=$(gh api /rate_limit --jq '.rate.reset')
  SLEEP=$((RESET - $(date +%s) + 5))
  echo "Rate limit low. Sleeping ${SLEEP} seconds..."
  sleep "${SLEEP}"
fi
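The exponential backoff mentioned in the fix can be sketched as a generic retry wrapper. This is an illustration under stated assumptions, not GitHub's prescribed algorithm: backoff_delay and retry_with_backoff are hypothetical helpers, the BACKOFF_BASE variable is made up for the example, and the delay doubles from 1 second up to a 60 second cap.

```shell
# backoff_delay ATTEMPT [BASE] [MAX]
# Print the delay in seconds for a zero-based attempt number:
# BASE * 2^ATTEMPT, capped at MAX. Defaults: BASE=1, MAX=60.
backoff_delay() {
  attempt=$1
  delay=${2:-1}
  max=${3:-60}
  i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$max" ]; then delay=$max; fi
  echo "$delay"
}

# retry_with_backoff MAX_TRIES CMD [ARGS...]
# Run CMD until it succeeds or MAX_TRIES attempts have failed,
# sleeping backoff_delay seconds between attempts.
retry_with_backoff() {
  tries=$1
  shift
  n=0
  while :; do
    if "$@"; then return 0; fi
    n=$((n + 1))
    if [ "$n" -ge "$tries" ]; then return 1; fi
    sleep "$(backoff_delay "$((n - 1))" "${BACKOFF_BASE:-1}")"
  done
}

# Typical use (LOGIN is a placeholder):
#   retry_with_backoff 5 gh api "/users/$LOGIN" --jq '.created_at'
```

When a response carries a Retry-After header or the reset time is known, waiting for that instead of a computed delay is usually the safer choice.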
Error: CODEOWNERS file exists but review is not being required on fork PRs
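Cause: Branch protection applies to every PR that targets the protected branch, including PRs from forks, and it does not depend on which workflow trigger you use. When code owner review is not being requested, the usual causes are that Require review from Code Owners is not enabled on the rule, that the CODEOWNERS file is missing from the PR's base branch or contains invalid lines, or that a listed owner lacks write access to the repository.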
Fix: Confirm the protection rule is set via the API:
gh api /repos/{owner}/{repo}/branches/main/protection \
--jq '.required_pull_request_reviews.require_code_owner_reviews'
# Should return: true
If it returns false or null, re-apply the protection rule with the configuration from Step 2.
Error: Interaction limits not applying to existing collaborators or org members
Cause: Interaction limits exempt different groups depending on the limit you choose. existing_users restricts only accounts less than 24 hours old, contributors_only exempts anyone who has previously contributed to the default branch as well as collaborators, and collaborators_only exempts users with write access. Exempt accounts bypass the limit regardless of when it was set.
Fix: For issues already under active spam attack, lock the issue entirely (gh api --method PUT /repos/{owner}/{repo}/issues/ISSUE_NUMBER/lock -f lock_reason='spam', with ISSUE_NUMBER replaced by the issue's number) rather than relying on interaction limits. Interaction limits work best as a preventive measure before the spam wave hits.
| Symptom | Root Cause | Fix |
|---|---|---|
| 429 from GitHub API in Actions workflow | Rate limit exhausted by rapid contributor lookups | Add sleep between API calls; cache results in workflow artifacts |
| Code owner review not requested on fork PRs | Require review from Code Owners disabled, or CODEOWNERS missing or invalid on the base branch | Re-apply the Step 2 protection rule; confirm CODEOWNERS exists on the base branch and owners have write access |
| Interaction limits ignored by some users | User is an existing collaborator or org member | Lock the issue directly via the Issues lock API |
| gh api DELETE returns 404 on comment | Comment already deleted, or wrong comment ID | Add || true to suppress error; validate IDs with a list call first |
FAQ
Q: Can I use the --author Git flag to permanently block a contributor from pushing?
git log --author is a read-only filter — it has no effect on what gets pushed to your repository. It only changes what you see in local log output. To actually prevent a user from pushing, you need to either block them at the GitHub account level (Settings → Block a user), remove their collaborator access, or use branch protection rules to restrict direct pushes. For fork-based PRs, blocking the GitHub account is the most reliable mechanism and prevents them from interacting with your repo entirely.
Q: Does GitHub have a native AI bot detection feature built in?
As of mid-2025, GitHub does not have a native AI-generated content detection system for pull requests or issue comments. GitHub has added Copilot-authored attribution to commits made via Copilot, but this is opt-in metadata, not enforcement. The GitHub Community forum has active discussions on this gap. For coordinated spam, your best current options are the workflow-based heuristics in this guide, combined with manual review. Watch the GitHub Changelog for updates — this is a known pain point the platform team is aware of.
Q: How do I report a coordinated spam campaign to GitHub Trust & Safety?
Navigate to github.com/contact/report-abuse and select "Spam or misuse". Include the repository URL, a list of the offending account usernames, and a description of the pattern (e.g., coordinated bot accounts flooding a bounty issue). For large-scale campaigns affecting multiple repos, posting in the GitHub Community forum under the Trust & Safety category increases visibility. GitHub's T&S team can take action at the account and IP level, which goes beyond what repo-level tooling can achieve.