
Data Pipeline - DBT & Semantic Layer

This guide covers running DBT transformations (Bronze → Silver → Gold) and managing the Cube.js semantic layer cache.


Quick Start (Monorepo)

The DBT pipeline is part of the consolidated AI Analytics Pipeline monorepo:

# Clone the monorepo
git clone https://github.com/zynomilabs/ctms-data-pipeline-ai-analytics.git
cd ctms-data-pipeline-ai-analytics

# Copy environment file
make env-copy

# Install DBT dependencies
make setup-dbt

# Verify connection
make dbt-debug

# Run transformations
make dbt-run

Part 1: DBT Transformation

Prerequisites

  • Python 3.10+
  • PostgreSQL/Neon database credentials (same as ingester)

Configure Environment

The monorepo uses a unified .env.production at the root:

# Database Connection
DB_HOST=your-neon-host.aws.neon.tech
DB_PORT=5432
DB_USER=neondb_owner
DB_PASSWORD=your-password
DB_NAME=neondb
DB_SSLMODE=require

# DBT Settings
DBT_PROFILES_DIR=.
DBT_TARGET=dev
TABLE_PREFIX=tbl_mst_
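Before running DBT it can help to fail fast if any of these variables is missing. A minimal sketch — require_vars is a hypothetical helper, not part of the repo's Makefile:

```shell
# Sketch: verify the variables above are set before invoking DBT.
# require_vars is a hypothetical helper, not part of the repo.
require_vars() {
  missing=0
  for v in "$@"; do
    # POSIX-portable indirect expansion of the variable named in $v
    val=$(eval "printf '%s' \"\${$v}\"")
    if [ -z "$val" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (after loading .env.production into the shell):
#   require_vars DB_HOST DB_PORT DB_USER DB_PASSWORD DB_NAME DB_SSLMODE
```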

Run Transformations

# Run all models
make dbt-run

# Run models + tests
make dbt-build

# Full daily pipeline with Elementary reports
make dbt-daily

Available Commands

Command             Description
make dbt-debug      Verify database connection
make dbt-run        Run all DBT models
make dbt-test       Run all DBT tests
make dbt-build      Run models + tests
make dbt-seed       Load seed/reference data
make dbt-silver     Run Silver layer only
make dbt-gold       Run Gold layer only
make dbt-docs       Generate and serve docs
make dbt-daily      Full pipeline with reports
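The layer-specific targets presumably map onto dbt's node-selection syntax. Assuming the Silver and Gold models live under models/silver and models/gold (an assumption about the project layout, not confirmed here), direct CLI equivalents would look like:

```shell
# Hypothetical direct equivalent of make dbt-silver / make dbt-gold,
# assuming models are organized under models/silver and models/gold.
dbt_layer() {
  layer=$1
  dbt run --select "$layer"   # dbt resolves directory/tag selectors
}

# Usage:
#   dbt_layer silver
#   dbt_layer gold
```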

Part 2: Semantic Layer Cache

After a DBT refresh, the Cube.js semantic-layer cache must be cleared so queries reflect the new data.

Clear Cache (Local)

make cube-cache-bust-local

Clear Cache (Production)

make cube-cache-bust

This restarts the Cube.js server, clearing all cached queries.
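The exact mechanism behind the Makefile target isn't shown here. If Cube.js runs under Docker Compose with a service named cube (both assumptions), a restart-based cache bust could look like:

```shell
# Sketch only: assumes a Docker Compose service named "cube".
cube_cache_bust_local() {
  docker compose restart cube
}
```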


Validation

Option 1: Query Database Directly

psql "postgresql://user:pass@host/db?sslmode=require" \
  -c "SELECT COUNT(*) FROM gold.dim_study;"
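To spot-check several Gold tables at once, the single query above can be looped. count_gold_tables is a hypothetical helper, and any table name other than gold.dim_study is an illustrative guess, not confirmed schema:

```shell
# Hypothetical helper: row counts for a list of tables over one DSN.
count_gold_tables() {
  dsn=$1; shift
  for tbl in "$@"; do
    # -t -A: tuples only, unaligned output (one line per table)
    psql "$dsn" -t -A -c "SELECT '$tbl' || ': ' || COUNT(*) FROM $tbl;"
  done
}

# Usage:
#   count_gold_tables "postgresql://user:pass@host/db?sslmode=require" \
#     gold.dim_study
```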

Option 2: Use AI Chat

Query via the AI chat interface:

"How many studies are in the system?"

The response should reflect the latest study count.

Option 3: Cube.js Playground

Open http://localhost:4000 and run:

{"measures": ["Studies.count"]}

Full Pipeline Workflow

Run the complete data pipeline using the monorepo:

cd ctms-data-pipeline-ai-analytics

# 1. Ingest data from Frappe API
make ingester-run

# 2. Transform with DBT
make dbt-run

# 3. Clear semantic layer cache (if Cube.dev is running)
make cube-cache-bust-local

# Or run everything at once:
make pipeline
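A hand-rolled equivalent of make pipeline (a sketch — the real target may add flags or extra stages) simply chains the three steps and stops at the first failure:

```shell
# Run the three stages in order; && stops the chain on the first failure.
run_pipeline() {
  make ingester-run && \
    make dbt-run && \
    make cube-cache-bust-local
}
```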

Elementary Data Observability

The pipeline includes Elementary for data quality monitoring:

# Build Elementary tables
make dbt-elementary

# Generate HTML report
make dbt-report

# Open report in browser
make dbt-view-report
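The report targets most likely wrap Elementary's edr CLI (an assumption based on the target names); a direct invocation would look roughly like:

```shell
# Sketch: generate Elementary's HTML report directly via the edr CLI.
elementary_report() {
  edr report   # builds an HTML report from Elementary's DBT tables
}
```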

Troubleshooting

Issue                       Solution
Stale data in dashboards    Run make cube-cache-bust-local
DBT connection failed       Verify .env.production credentials with make dbt-debug
Models not found            Run make setup-dbt to install packages
Elementary report fails     Ensure the Elementary package is installed

See Also