Data Pipeline - DBT & Semantic Layer
This guide covers running DBT transformations (Bronze → Silver → Gold) and managing the Cube.js semantic layer cache.
Quick Start (Monorepo)
The DBT pipeline is part of the consolidated AI Analytics Pipeline monorepo:
# Clone the monorepo
git clone https://github.com/zynomilabs/ctms-data-pipeline-ai-analytics.git
cd ctms-data-pipeline-ai-analytics
# Copy environment file
make env-copy
# Install DBT dependencies
make setup-dbt
# Verify connection
make dbt-debug
# Run transformations
make dbt-run
Part 1: DBT Transformation
Prerequisites
- Python 3.10+
- PostgreSQL/Neon database credentials (same as ingester)
Configure Environment
The monorepo uses a unified .env.production at the root:
# Database Connection
DB_HOST=your-neon-host.aws.neon.tech
DB_PORT=5432
DB_USER=neondb_owner
DB_PASSWORD=your-password
DB_NAME=neondb
DB_SSLMODE=require
# DBT Settings
DBT_PROFILES_DIR=.
DBT_TARGET=dev
TABLE_PREFIX=tbl_mst_
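To run ad-hoc dbt or psql commands outside of make, the variables above can be exported into the current shell. A minimal sketch, assuming simple KEY=value lines with no quoting or spaces (the `load_env` helper is hypothetical, not part of the repo):

```shell
# Hypothetical helper: export every KEY=value pair from an env file
# into the current shell so dbt and psql can read them.
load_env() {
  set -a            # auto-export all variables defined while sourcing
  # shellcheck disable=SC1090
  source "$1"
  set +a
}

# Usage:
#   load_env .env.production
#   make dbt-debug
```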
Run Transformations
# Run all models
make dbt-run
# Run models + tests
make dbt-build
# Full daily pipeline with Elementary reports
make dbt-daily
Available Commands
| Command | Description |
|---|---|
| make dbt-debug | Verify database connection |
| make dbt-run | Run all DBT models |
| make dbt-test | Run all DBT tests |
| make dbt-build | Run models + tests |
| make dbt-seed | Load seed/reference data |
| make dbt-silver | Run Silver layer only |
| make dbt-gold | Run Gold layer only |
| make dbt-docs | Generate and serve docs |
| make dbt-daily | Full pipeline with reports |
Part 2: Semantic Layer Cache
After DBT refresh, the Cube.js semantic layer cache must be cleared to reflect new data.
Clear Cache (Local)
make cube-cache-bust-local
Clear Cache (Production)
make cube-cache-bust
This restarts the Cube.js server, clearing all cached queries.
Validation
Option 1: Query Database Directly
psql "postgresql://user:pass@host/db?sslmode=require" \
-c "SELECT COUNT(*) FROM gold.dim_study;"
Option 2: Use AI Chat
Query via the AI chat interface:
"How many studies are in the system?"
The response should reflect the row count after the latest DBT run.
Option 3: Cube.js Playground
Open http://localhost:4000 and run:
{"measures": ["Studies.count"]}
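The same query can be issued against the Cube.js REST API, which the Playground uses under the hood. A sketch assuming the default local port and no auth token (production deployments typically require an `Authorization` header):

```shell
# Query the Cube.js REST API directly (endpoint /cubejs-api/v1/load).
CUBE_URL="http://localhost:4000/cubejs-api/v1/load"
QUERY='{"measures":["Studies.count"]}'

# Percent-encode the JSON query string for use as a URL parameter.
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$QUERY")
echo "${CUBE_URL}?query=${ENCODED}"

# Fetch the result (uncomment when the Cube.js server is running):
# curl -s "${CUBE_URL}?query=${ENCODED}"
```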
Full Pipeline Workflow
Run the complete data pipeline using the monorepo:
cd ctms-data-pipeline-ai-analytics
# 1. Ingest data from Frappe API
make ingester-run
# 2. Transform with DBT
make dbt-run
# 3. Clear semantic layer cache (if Cube.js is running)
make cube-cache-bust-local
# Or run everything at once:
make pipeline
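For scheduled runs, the three steps above can be wrapped in a small fail-fast script. A sketch under the assumption that the monorepo Makefile targets are on PATH (the `run_stage` helper is hypothetical; it logs each stage and stops the sequence on the first failure):

```shell
#!/usr/bin/env bash
set -uo pipefail

# Hypothetical wrapper: run one pipeline stage, log it, report failure.
run_stage() {
  local name="$1"; shift
  echo "[pipeline] starting: $name"
  if "$@"; then
    echo "[pipeline] finished: $name"
  else
    echo "[pipeline] FAILED: $name" >&2
    return 1
  fi
}

# Nightly sequence (uncomment to use with the monorepo Makefile targets):
# run_stage "ingest"     make ingester-run          || exit 1
# run_stage "transform"  make dbt-run               || exit 1
# run_stage "cache-bust" make cube-cache-bust-local || exit 1
```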
Elementary Data Observability
The pipeline includes Elementary for data quality monitoring:
# Build Elementary tables
make dbt-elementary
# Generate HTML report
make dbt-report
# Open report in browser
make dbt-view-report
Troubleshooting
| Issue | Solution |
|---|---|
| Stale data in dashboards | Run make cube-cache-bust-local |
| DBT connection failed | Verify .env credentials with make dbt-debug |
| Models not found | Run make setup-dbt to install packages |
| Elementary report fails | Ensure Elementary package is installed |
See Also
- AI Analytics Pipeline — Full architecture overview
- Environment Variables — Configuration reference