Schedule ETL Jobs — REST API
Schedule the data lakehouse pipeline with cron jobs that call the Zynexa REST API via `curl`, rather than running Docker Compose commands directly. This approach is useful when:
- You want to use the same API as the Zynexa UI
- You need toggle control (`enableOpenlineage`, `enableElementary`) per scheduled run
- The cron job runs on a different machine than the Docker host
Prerequisites
- `PIPELINE_ENABLED=true` in `.env.production`
- Zynexa container running and healthy
- Lakehouse DB running
- `curl` and `jq` installed on the cron host
For the recommended Docker Compose approach (no web app dependency), see Schedule ETL Jobs — Docker Compose.
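The tool prerequisites can be confirmed on the cron host with a quick check; this is just a sketch that verifies the binaries are on `PATH`, nothing more:

```shell
# Verify curl and jq are available on the cron host
MISSING=""
for cmd in curl jq; do
  command -v "$cmd" >/dev/null 2>&1 || MISSING="$MISSING $cmd"
done
if [ -z "$MISSING" ]; then
  echo "all prerequisites present"
else
  echo "missing:$MISSING"
fi
```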
Quick Setup
# Open crontab editor
crontab -e
# Add the daily pipeline schedule (runs at midnight)
0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1
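Before relying on the schedule, it can help to fire the same request once by hand and inspect the response. A sketch that degrades gracefully when the web app is down (the exact response fields depend on your API version, see the API reference):

```shell
# One-off manual trigger; prints the API response on success,
# or a hint if the web app is not reachable
RESPONSE=$(curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
  -H "Content-Type: application/json" \
  -d '{"pipeline": "full", "command": "daily"}')
if [ -n "$RESPONSE" ]; then
  echo "$RESPONSE"
else
  echo "No response - is the Zynexa container running on port 3000?"
fi
```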
Schedule Examples
Midnight Daily — Full Pipeline (Default)
0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1
Midnight Daily — Without Marquez Lineage
0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily", "enableOpenlineage": false}' \
>> /var/log/ctms-pipeline-api.log 2>&1
Midnight Daily — Minimal (No Marquez, No Elementary)
Fastest run — plain dbt only:
0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily", "enableOpenlineage": false, "enableElementary": false}' \
>> /var/log/ctms-pipeline-api.log 2>&1
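Instead of hand-editing the JSON for each toggle combination, the payload can be assembled from shell variables with `jq -n`. A minimal sketch; the variable names here are illustrative, not part of the API:

```shell
# Build the trigger payload from shell variables using jq
ENABLE_OL=false   # enableOpenlineage
ENABLE_EL=false   # enableElementary
PAYLOAD=$(jq -n --argjson ol "$ENABLE_OL" --argjson el "$ENABLE_EL" \
  '{pipeline: "full", command: "daily", enableOpenlineage: $ol, enableElementary: $el}')
echo "$PAYLOAD"
```

The resulting `$PAYLOAD` can then be passed to `curl` with `-d "$PAYLOAD"`.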
Ingester Every 4 Hours, dbt Once Daily
# Ingester every 4 hours
0 */4 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "ingester"}' \
>> /var/log/ctms-ingester-api.log 2>&1
# dbt once daily at 1 AM
0 1 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "dbt", "command": "daily"}' \
>> /var/log/ctms-dbt-api.log 2>&1
2 AM Weekdays Only
0 2 * * 1-5 curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1
Remote Server (by IP)
If the cron job runs on a different machine:
0 0 * * * curl -sf -X POST http://{IP_ADDRESS}:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1
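When the cron host and Docker host are different machines, a quick reachability check from the cron host can save debugging time. This sketch assumes only that the web app's port is open; `192.0.2.10` is a placeholder address:

```shell
# Check that the Zynexa web app port is reachable from this host
HOST=192.0.2.10   # placeholder; replace with your Docker host IP
if curl -s -o /dev/null --connect-timeout 5 "http://$HOST:3000"; then
  RESULT=reachable
else
  RESULT=unreachable
fi
echo "$HOST is $RESULT"
```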
Fire-and-Forget vs Poll-and-Wait
The examples above are fire-and-forget — they trigger the pipeline and return immediately (HTTP 202). The pipeline runs in the background inside the Zynexa container.
If you need the cron job to wait for completion and log the result:
#!/bin/bash
# /opt/ctms-deployment/scripts/trigger-and-wait.sh
BASE=http://localhost:3000
LOG=/var/log/ctms-pipeline-api.log

echo "[$(date)] Triggering full pipeline..." >> "$LOG"

# Trigger
RESPONSE=$(curl -sf -X POST "$BASE/api/data-analytics/pipeline/trigger" \
  -H "Content-Type: application/json" \
  -d '{"pipeline": "full", "command": "daily"}')
JOB_ID=$(echo "$RESPONSE" | jq -r '.jobId')

# Bail out if the trigger failed or returned no job ID
if [ -z "$JOB_ID" ] || [ "$JOB_ID" = "null" ]; then
  echo "[$(date)] Trigger failed: ${RESPONSE:-no response}" >> "$LOG"
  exit 1
fi
echo "[$(date)] Job started: $JOB_ID" >> "$LOG"

# Poll until done (timeout after 30 min)
TIMEOUT=1800
ELAPSED=0
while [ "$ELAPSED" -lt "$TIMEOUT" ]; do
  RESULT=$(curl -sf "$BASE/api/data-analytics/pipeline/trigger?jobId=$JOB_ID")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  # Ignore transient poll failures (empty STATUS); only react to a real terminal status
  if [ -n "$STATUS" ] && [ "$STATUS" != "null" ] && [ "$STATUS" != "running" ]; then
    EXIT_CODE=$(echo "$RESULT" | jq -r '.exitCode // 1')
    echo "[$(date)] Job $JOB_ID finished: status=$STATUS exitCode=$EXIT_CODE" >> "$LOG"
    exit "$EXIT_CODE"
  fi
  sleep 15
  ELAPSED=$((ELAPSED + 15))
done

echo "[$(date)] Job $JOB_ID timed out after ${TIMEOUT}s" >> "$LOG"
exit 1
Then in crontab:
0 0 * * * /opt/ctms-deployment/scripts/trigger-and-wait.sh
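The API's 409 conflict detection already rejects overlapping pipeline runs, but the wrapper script itself can stack up if a run outlasts the schedule interval. One way to serialize cron invocations is `flock` from util-linux; the lock file path below is arbitrary:

```
# Skip this invocation entirely if a previous one still holds the lock
0 0 * * * flock -n /tmp/ctms-pipeline.lock /opt/ctms-deployment/scripts/trigger-and-wait.sh
```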
Docker Compose vs REST API
| Feature | Docker Compose | REST API |
|---|---|---|
| Requires SSH to Docker host | Yes | No |
| Requires Zynexa running | No | Yes |
| Toggle control per run | No | Yes |
| Conflict detection (409) | No | Yes |
| Job tracking / polling | No | Yes |
| Recommended for production | Yes | Alternative |
Logging
# View latest API trigger output
tail -50 /var/log/ctms-pipeline-api.log
# Check for curl failures
grep -i 'error\|fail\|curl' /var/log/ctms-pipeline-api.log | tail -10
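The log files in these examples grow without bound. A minimal logrotate sketch, assuming logrotate is installed and the `/var/log/ctms-*-api.log` paths used above (e.g. saved as `/etc/logrotate.d/ctms-pipeline`):

```
/var/log/ctms-*-api.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}
```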
See Also
- Schedule ETL Jobs — Docker Compose — recommended approach using direct Docker Compose commands
- Trigger ETL Pipeline via REST API — full API reference with all payloads and toggle combinations
- Platform Runbook — Data Lakehouse Pipeline