Schedule ETL Jobs — REST API

Schedule the data lakehouse pipeline using crontab with curl calls to the Zynexa REST API instead of direct Docker Compose commands. This approach is useful when:

  • You want to use the same API as the Zynexa UI
  • You need toggle control (enableOpenlineage, enableElementary) per scheduled run
  • The cron job runs on a different machine than the Docker host

Prerequisites

  • PIPELINE_ENABLED=true in .env.production
  • Zynexa container running and healthy
  • Lakehouse DB running
  • curl and jq installed on the cron host
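The last prerequisite can be verified up front. A minimal preflight sketch for the cron host (the loop and messages here are illustrative, not part of the Zynexa tooling):

```shell
# Preflight: confirm curl and jq are available on the cron host
for tool in curl jq; do
  command -v "$tool" >/dev/null || { echo "missing required tool: $tool" >&2; exit 1; }
done
echo "cron host prerequisites ok"
```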

For the recommended Docker Compose approach (no web app dependency), see Schedule ETL Jobs — Docker Compose.


Quick Setup

# Open crontab editor
crontab -e

# Add the daily pipeline schedule (runs at midnight)
0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1

Schedule Examples

Midnight Daily — Full Pipeline (Default)

0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1

Midnight Daily — Without Marquez Lineage

0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily", "enableOpenlineage": false}' \
>> /var/log/ctms-pipeline-api.log 2>&1

Midnight Daily — Minimal (No Marquez, No Elementary)

Fastest run — plain dbt only:

0 0 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily", "enableOpenlineage": false, "enableElementary": false}' \
>> /var/log/ctms-pipeline-api.log 2>&1

Ingester Every 4 Hours, dbt Once Daily

# Ingester every 4 hours
0 */4 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "ingester"}' \
>> /var/log/ctms-ingester-api.log 2>&1

# dbt once daily at 1 AM
0 1 * * * curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "dbt", "command": "daily"}' \
>> /var/log/ctms-dbt-api.log 2>&1

2 AM Weekdays Only

0 2 * * 1-5 curl -sf -X POST http://localhost:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1

Remote Server (by IP)

If the cron job runs on a different machine:

0 0 * * * curl -sf -X POST http://{IP_ADDRESS}:3000/api/data-analytics/pipeline/trigger \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}' \
>> /var/log/ctms-pipeline-api.log 2>&1

Fire-and-Forget vs Poll-and-Wait

The examples above are fire-and-forget — they trigger the pipeline and return immediately (HTTP 202). The pipeline runs in the background inside the Zynexa container.
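The trigger response carries a jobId, which the polling script further down relies on. A quick sketch of extracting it with jq, using a sample payload rather than a live call (only the jobId field is assumed here; the full response may contain more):

```shell
# Sample trigger response (illustrative payload, not captured from a real run)
RESPONSE='{"jobId": "job-123"}'

# Extract the job id for later status polling
JOB_ID=$(echo "$RESPONSE" | jq -r '.jobId')
echo "$JOB_ID"   # prints: job-123
```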

If you need the cron job to wait for completion and log the result:

#!/bin/bash
# /opt/ctms-deployment/scripts/trigger-and-wait.sh
BASE=http://localhost:3000
LOG=/var/log/ctms-pipeline-api.log

echo "[$(date)] Triggering full pipeline..." >> "$LOG"

# Trigger
RESPONSE=$(curl -sf -X POST "$BASE/api/data-analytics/pipeline/trigger" \
-H "Content-Type: application/json" \
-d '{"pipeline": "full", "command": "daily"}')

JOB_ID=$(echo "$RESPONSE" | jq -r '.jobId // empty')
if [ -z "$JOB_ID" ]; then
  echo "[$(date)] Trigger failed: no jobId in response: $RESPONSE" >> "$LOG"
  exit 1
fi
echo "[$(date)] Job started: $JOB_ID" >> "$LOG"

# Poll until done (timeout after 30 min)
TIMEOUT=1800
ELAPSED=0
while [ "$ELAPSED" -lt "$TIMEOUT" ]; do
  RESULT=$(curl -sf "$BASE/api/data-analytics/pipeline/trigger?jobId=$JOB_ID")
  STATUS=$(echo "$RESULT" | jq -r '.status')

  if [ "$STATUS" != "running" ]; then
    # Fall back to exit code 1 if the API omits exitCode
    EXIT_CODE=$(echo "$RESULT" | jq -r '.exitCode // 1')
    echo "[$(date)] Job $JOB_ID finished: status=$STATUS exitCode=$EXIT_CODE" >> "$LOG"
    exit "$EXIT_CODE"
  fi

  sleep 15
  ELAPSED=$((ELAPSED + 15))
done

echo "[$(date)] Job $JOB_ID timed out after ${TIMEOUT}s" >> "$LOG"
exit 1

Then in crontab:

0 0 * * * /opt/ctms-deployment/scripts/trigger-and-wait.sh

Docker Compose vs REST API

| Feature                     | Docker Compose | REST API    |
|-----------------------------|----------------|-------------|
| Requires SSH to Docker host | Yes            | No          |
| Requires Zynexa running     | No             | Yes         |
| Toggle control per run      | No             | Yes         |
| Conflict detection (409)    | No             | Yes         |
| Job tracking / polling      | No             | Yes         |
| Recommended for production  | Yes            | Alternative |
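The conflict-detection row is worth handling explicitly in a cron context. A sketch that distinguishes an accepted trigger (202) from a run already in progress (409); the classify_trigger helper and its messages are this example's own, not part of the API:

```shell
# Classify the HTTP status returned by the trigger endpoint
classify_trigger() {
  case "$1" in
    202) echo "accepted" ;;
    409) echo "conflict: pipeline already running" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# -w '%{http_code}' prints only the status code; the body is discarded
HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
  http://localhost:3000/api/data-analytics/pipeline/trigger \
  -H "Content-Type: application/json" \
  -d '{"pipeline": "full", "command": "daily"}')
classify_trigger "$HTTP_CODE"
```

Exiting cleanly on 409 keeps overlapping schedules (e.g. the 4-hour ingester plus the daily full run) from logging spurious failures.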

Logging

# View latest API trigger output
tail -50 /var/log/ctms-pipeline-api.log

# Check for curl failures
grep -iE 'error|fail|curl' /var/log/ctms-pipeline-api.log | tail -10
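Cron appends to these logs indefinitely. A minimal logrotate sketch to cap their growth (the glob, rotation schedule, and retention count are assumptions; adjust to your environment and place it under /etc/logrotate.d/):

```
/var/log/ctms-*-api.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}
```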

See Also