How to Set Up Airbyte Agents for Multi-Source Data Orchestration in 2025
Data engineers and backend developers increasingly face a complex problem: orchestrating data flows across multiple disconnected sources without context loss. Traditional ETL pipelines treat each source independently, creating gaps when you need intelligent decisions about which data to sync, when to sync it, and how to handle conflicts.
Airbyte Agents solve this by providing context-aware orchestration—agents that understand your entire data landscape and make intelligent routing decisions. This guide walks you through setting up Airbyte Agents for real multi-source scenarios.
Why Airbyte Agents Matter for Multi-Source Pipelines
Unlike standard Airbyte connectors that follow static configuration, Airbyte Agents are intelligent components that:
- Maintain cross-source context: Agents track state and relationships across all connected sources
- Make adaptive decisions: They determine optimal sync timing, batching, and retry strategies based on source health
- Handle dependencies: Automatically order syncs when downstream sources depend on upstream data completion
- Reduce manual intervention: Context awareness handles many edge cases that would otherwise require manual pipeline management
This is particularly valuable when integrating SaaS APIs (Salesforce, HubSpot), databases (PostgreSQL, MongoDB), and warehouses (Snowflake, BigQuery) simultaneously.
Prerequisites
Before setting up Airbyte Agents:
- Airbyte Cloud or self-hosted instance (v0.50+)
- At least 2 configured data sources (connectors already set up)
- Basic familiarity with JSON-based configuration
- Access to your data warehouse or destination
- API credentials for your sources (ready to authenticate)
Step 1: Enable Agent Mode in Your Airbyte Workspace
Agent functionality isn't enabled by default. Access your Airbyte workspace settings:
- Navigate to Settings → Advanced
- Toggle "Enable Experimental Agents" (available in Airbyte v0.50+)
- Confirm workspace reload
Once enabled, you'll see a new "Agents" tab in the left sidebar alongside Connections and Sources.
Step 2: Define Agent Context Schema
Agents need to understand relationships between your sources. Create a context schema that maps:
{
"sources": [
{
"id": "salesforce-crm",
"type": "salesforce",
"priority": "high",
"refresh_interval_hours": 4,
"depends_on": []
},
{
"id": "postgres-transactions",
"type": "postgres",
"priority": "critical",
"refresh_interval_hours": 1,
"depends_on": []
},
{
"id": "hubspot-contacts",
"type": "hubspot",
"priority": "medium",
"refresh_interval_hours": 6,
"depends_on": ["salesforce-crm"]
}
],
"join_conditions": [
{
"left_source": "salesforce-crm",
"right_source": "hubspot-contacts",
"left_key": "email",
"right_key": "email"
}
],
"conflict_resolution": "timestamp"
}
This configuration tells Airbyte Agents:
- HubSpot syncs only after Salesforce completes
- Records are joined on email fields
- When conflicts occur, the most recent timestamp wins
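The ordering the agent derives from `depends_on` amounts to a topological sort of the source graph. As a mental model (not Airbyte internals — the `sync_order` helper is purely illustrative), here is a minimal Python sketch using Kahn's algorithm on the three sources from the schema above:

```python
from collections import deque

def sync_order(sources):
    """Return source ids in an order that respects depends_on (Kahn's algorithm)."""
    deps = {s["id"]: set(s["depends_on"]) for s in sources}
    # dependents[x] = sources that must wait for x to finish
    dependents = {sid: [] for sid in deps}
    for sid, ds in deps.items():
        for d in ds:
            dependents[d].append(sid)
    ready = deque(sid for sid, ds in deps.items() if not ds)
    order = []
    while ready:
        sid = ready.popleft()
        order.append(sid)
        for dep in dependents[sid]:
            deps[dep].discard(sid)
            if not deps[dep]:
                ready.append(dep)
    if len(order) != len(deps):
        raise ValueError("Cycle detected in depends_on graph")
    return order

sources = [
    {"id": "salesforce-crm", "depends_on": []},
    {"id": "postgres-transactions", "depends_on": []},
    {"id": "hubspot-contacts", "depends_on": ["salesforce-crm"]},
]
print(sync_order(sources))
```

Note that a cycle (two sources depending on each other) makes ordering impossible, which is why the sketch raises rather than returning a partial order.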
Step 3: Create Agent Configurations
Navigate to Agents → Create New Agent:
- Name: `multi-source-daily-sync`
- Mode: Select "Multi-Source Orchestrator"
- Paste your context schema from Step 2
- Assign connectors: Select your Salesforce, PostgreSQL, and HubSpot connections
- Set failure behavior: Choose "pause-dependent-sources" (recommended for production)
The agent now understands your source topology and can optimize sync order.
Step 4: Configure Intelligence Rules
Intelligence rules allow agents to make decisions beyond static scheduling:
| Rule Type | Use Case | Example |
|-----------|----------|---------|
| Source Health Gating | Skip downstream syncs if upstream fails | If Salesforce sync <90% successful in last 24h, pause HubSpot |
| Volume-Based Throttling | Adjust sync timing based on data size | Trigger HubSpot sync only if Salesforce extracted >1000 records |
| Time-Window Optimization | Sync only during low-traffic periods | PostgreSQL syncs only between 2-4 AM UTC |
| Dedupe Detection | Prevent redundant syncs | Skip sync if previous one completed <30 min ago |
Add these rules in the Intelligence tab:
{
"rules": [
{
"name": "salesforce-health-gate",
"condition": "source_success_rate < 0.9",
"action": "skip_dependent_sources",
"lookback_hours": 24
},
{
"name": "postgres-volume-trigger",
"condition": "postgres_extracted_records > 1000",
"action": "trigger_hubspot_sync",
"cooldown_minutes": 30
}
]
}
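Conceptually, each rule is a predicate over recent sync metrics plus an action. A hedged sketch of how the health gate above might evaluate (the metric shape and the evaluator function are assumptions for illustration, not Airbyte internals):

```python
def evaluate_health_gate(metrics, threshold=0.9):
    """Mirror the salesforce-health-gate rule: skip dependent sources when
    the upstream success rate over the lookback window drops below threshold."""
    attempts = metrics["succeeded"] + metrics["failed"]
    rate = metrics["succeeded"] / attempts if attempts else 0.0
    return "skip_dependent_sources" if rate < threshold else "proceed"

# 24h lookback: 8 successful syncs, 2 failures -> 80% success rate, below 90%
print(evaluate_health_gate({"succeeded": 8, "failed": 2}))  # skip_dependent_sources
```

The zero-attempts case deliberately fails closed: with no recent evidence the upstream is healthy, dependents stay paused.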
Step 5: Set Up Monitoring and Error Handling
Agents generate detailed execution logs. Configure alerts:
- Go to Agents → Monitoring
- Enable "Context Loss Detection" - alerts when agent loses source state
- Set "Retry Policy" - exponential backoff starting at 30 seconds
- Configure "Webhook Notifications" for Slack integration (paste your Slack webhook URL)
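To make the retry policy concrete, exponential backoff starting at 30 seconds produces a delay schedule like the one below (the doubling factor and cap are illustrative assumptions; tune them to your sources):

```python
def backoff_schedule(base_seconds=30, retries=5, factor=2, cap=600):
    """Exponential backoff delays starting at base_seconds, capped at cap."""
    return [min(base_seconds * factor**i, cap) for i in range(retries)]

print(backoff_schedule())  # [30, 60, 120, 240, 480]
```

The cap keeps a long outage from pushing individual waits past ten minutes while still spacing retries out quickly.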
For production, integrate with your observability stack:
# Example: Ship agent logs to DataDog
export AIRBYTE_AGENT_LOG_DESTINATION="datadog"
export DATADOG_API_KEY="your-key-here"
export DATADOG_SITE="us3.datadoghq.com"
Step 6: Test Multi-Source Sync Behavior
Before running in production, validate agent behavior:
- Trigger manual sync: Click "Test Run" in agent details
- Verify sync order: Check that HubSpot waits for Salesforce completion
- Inspect logs: Review Execution Timeline to confirm context was maintained
- Check data quality: Validate joined records in your warehouse
Look for this pattern in logs:
[Agent] Starting sync for source: salesforce-crm (priority: high)
[Agent] Waiting for completion...
[Agent] Salesforce sync completed: 5,234 records extracted
[Agent] Dependency satisfied: hubspot-contacts cleared to sync
[Agent] Starting sync for source: hubspot-contacts (priority: medium)
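For the data-quality step, one simple signal is the share of HubSpot contacts whose join key matched a Salesforce record. A standalone sketch you could adapt to the results of a warehouse query (the function and sample inputs are illustrative, not part of Airbyte):

```python
def join_match_rate(left_keys, right_keys):
    """Fraction of right-side join keys (e.g. HubSpot emails) that found a
    matching left-side key (e.g. Salesforce emails), case-insensitively."""
    left = {k.strip().lower() for k in left_keys if k}
    right = [k.strip().lower() for k in right_keys if k]
    if not right:
        return 0.0
    return sum(1 for k in right if k in left) / len(right)

sf = ["a@x.com", "b@x.com", "c@x.com"]
hs = ["A@x.com", "d@x.com"]
print(join_match_rate(sf, hs))  # 0.5
```

A match rate well below expectations usually points at inconsistent key formatting (casing, whitespace) rather than genuinely missing records.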
Common Pitfalls and Solutions
Problem: Agent syncs all sources simultaneously instead of respecting dependencies.
- Solution: Verify the `depends_on` array in your context schema is properly formatted. Agents don't infer dependencies automatically.
Problem: "Context Loss" alerts trigger frequently.
- Solution: Increase agent timeout values in Advanced Settings. Default 5-minute timeout may be insufficient for slow sources like Salesforce.
Problem: Join conditions fail silently.
- Solution: Validate that join keys (email, ID, etc.) actually exist in both source schemas. Use `describe_table` queries to confirm field names.
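That check can also be done programmatically once you have column lists for each source (how you fetch them varies by connector; the helper and sample schemas below are illustrative):

```python
def missing_join_keys(join_conditions, schemas):
    """Report (source, key) pairs where a join key is absent from that
    source's column list."""
    problems = []
    for jc in join_conditions:
        if jc["left_key"] not in schemas.get(jc["left_source"], []):
            problems.append((jc["left_source"], jc["left_key"]))
        if jc["right_key"] not in schemas.get(jc["right_source"], []):
            problems.append((jc["right_source"], jc["right_key"]))
    return problems

schemas = {
    "salesforce-crm": ["id", "email", "updated_at"],
    "hubspot-contacts": ["contact_id", "email_address"],
}
joins = [{"left_source": "salesforce-crm", "right_source": "hubspot-contacts",
          "left_key": "email", "right_key": "email"}]
print(missing_join_keys(joins, schemas))  # [('hubspot-contacts', 'email')]
```

Here the check surfaces exactly the silent-failure case above: HubSpot exposes `email_address`, not `email`, so the join condition needs updating.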
Production Checklist
- [ ] Agent schema tested with at least 3 sources
- [ ] Webhook notifications configured and tested
- [ ] Failure scenarios (source down, network timeout) tested manually
- [ ] Rollback plan documented (how to revert to individual connections)
- [ ] Monitoring dashboards created for agent performance metrics
- [ ] Data quality validation queries written for joined datasets
- [ ] Team training completed on agent logs and troubleshooting
Next Steps
Once stable, expand your agent setup:
- Add more sources: Scale to 5+ sources; agents maintain performance
- Custom transformation rules: Add dbt-based transformations between sources
- Predictive scheduling: Use historical sync patterns for ML-optimized timing
- Cost optimization: Agent insights can identify which syncs could run less frequently
Airbyte Agents transform multi-source orchestration from a manual scripting problem into an intelligent, context-aware system. Start small with 2-3 sources, validate the dependency model, then expand confidently.