Data Pipeline - DBT & Semantic Layer

This guide covers running DBT transformations (Bronze → Silver → Gold) and managing the Cube.js semantic layer cache.

Quick Commands

For the most common dbt commands (daily, full-refresh, cache clear, cron), see the Platform Runbook → Data Lakehouse Pipeline. This page covers advanced dbt subcommands, Elementary reports, and validation.


Quick Start

The DBT pipeline runs as a Docker container via the lakehouse profile:

# Define the compose command
DC="docker compose --env-file .env.production --profile lakehouse"

# Start the lakehouse database
$DC up -d lakehouse-db

# Install DBT dependencies
$DC run --rm lakehouse-dbt deps

# Verify connection
$DC run --rm lakehouse-dbt debug

# Run transformations
$DC run --rm lakehouse-dbt run

Part 1: DBT Transformation

Prerequisites

  • Python 3.10+
  • PostgreSQL/Neon database credentials (same as ingester)

Configure Environment

The monorepo uses a unified .env.production at the root:

# Database Connection
DB_HOST=your-neon-host.aws.neon.tech
DB_PORT=5432
DB_USER=neondb_owner
DB_PASSWORD=your-password
DB_NAME=neondb
DB_SSLMODE=require

# DBT Settings
DBT_PROFILES_DIR=.
DBT_TARGET=dev
TABLE_PREFIX=tbl_mst_
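Before starting containers, it can help to confirm the variables above actually made it into the shell. A minimal POSIX-sh check (variable names taken from the block above; `require_vars` is a helper defined here, not part of the toolchain):

```shell
# require_vars: print "OK" when every named variable is set and non-empty,
# otherwise list what is missing and return non-zero.
require_vars() {
  missing=""
  for var in "$@"; do
    eval "val=\${$var:-}"
    [ -z "$val" ] && missing="$missing $var"
  done
  if [ -n "$missing" ]; then
    echo "Missing:$missing"
    return 1
  fi
  echo "OK"
}

# Example with placeholder values; in practice these come from .env.production.
export DB_HOST=example.neon.tech DB_PORT=5432 DB_USER=owner \
       DB_PASSWORD=secret DB_NAME=neondb DB_SSLMODE=require
require_vars DB_HOST DB_PORT DB_USER DB_PASSWORD DB_NAME DB_SSLMODE
```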

Run Transformations

DC="docker compose --env-file .env.production --profile lakehouse"

# Run all models
$DC run --rm lakehouse-dbt run

# Run models + tests
$DC run --rm lakehouse-dbt build

# Full daily pipeline with Elementary reports
$DC run --rm lakehouse-dbt daily

Available Commands

| Command | Description |
| --- | --- |
| `$DC run --rm lakehouse-dbt debug` | Verify database connection |
| `$DC run --rm lakehouse-dbt run` | Run all DBT models |
| `$DC run --rm lakehouse-dbt test` | Run all DBT tests |
| `$DC run --rm lakehouse-dbt build` | Run models + tests |
| `$DC run --rm lakehouse-dbt seed` | Load seed/reference data |
| `$DC run --rm lakehouse-dbt run --select silver` | Run Silver layer only |
| `$DC run --rm lakehouse-dbt run --select gold` | Run Gold layer only |
| `$DC run --rm lakehouse-dbt docs-generate` | Generate docs |
| `$DC run --rm lakehouse-dbt daily` | Full pipeline with reports |

Note: DC is defined as docker compose --env-file .env.production --profile lakehouse
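The table above shows that dbt's node-selection syntax is passed through the wrapper, so graph operators also work. A sketch (guarded so it degrades gracefully where Docker is unavailable; `dim_study` refers to the Gold model queried in the Validation section):

```shell
# DC as defined above.
DC="docker compose --env-file .env.production --profile lakehouse"

if command -v docker >/dev/null 2>&1; then
  # "+model" selects the model plus everything upstream of it;
  # "model+" would select the model plus everything downstream.
  $DC run --rm lakehouse-dbt run --select +dim_study ||
    echo "dbt run failed; is the lakehouse profile up?" >&2
else
  echo "docker not found; skipping"
fi
```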


Part 2: Semantic Layer Cache

After DBT refresh, the Cube.js semantic layer cache must be cleared to reflect new data.

Clear Cache (Local)

docker compose --env-file .env.production --profile analytics restart cube

Clear Cache (Production)

DC="docker compose -f docker-compose.yml -f docker-compose.prod.yml --env-file .env.production"
$DC --profile analytics restart cube

This restarts the Cube.js server, clearing all cached queries.
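To confirm the restart completed, you can probe Cube's health endpoint (this assumes the default port 4000 and Cube's documented `/readyz` route):

```shell
CUBE_READY_URL="http://localhost:4000/readyz"

if ! command -v curl >/dev/null 2>&1; then
  echo "curl not found; skipping"
elif curl -fs --max-time 5 "$CUBE_READY_URL" >/dev/null 2>&1; then
  echo "cube is ready"
else
  echo "cube not reachable yet (still restarting, or the analytics profile is down)"
fi
```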


Validation

Option 1: Query Database Directly

psql "postgresql://user:pass@host/db?sslmode=require" \
-c "SELECT COUNT(*) FROM gold.dim_study;"
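The same check can be scripted against the connection variables from `.env.production` (a sketch; add more `schema.table` names to the loop as needed):

```shell
# Build the connection string from the env vars used elsewhere in this guide.
CONN="postgresql://${DB_USER:-}:${DB_PASSWORD:-}@${DB_HOST:-}:${DB_PORT:-5432}/${DB_NAME:-}?sslmode=${DB_SSLMODE:-require}"

if command -v psql >/dev/null 2>&1; then
  for tbl in gold.dim_study; do
    printf '%s: ' "$tbl"
    psql "$CONN" -tAc "SELECT COUNT(*) FROM $tbl;" ||
      echo "query failed (check credentials / network)" >&2
  done
else
  echo "psql not found; skipping"
fi
```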

Option 2: Use AI Chat

Query via the AI chat interface:

"How many studies are in the system?"

Expected response should reflect the latest data count.

Option 3: Cube.js Playground

Open http://localhost:4000 and run:

{"measures": ["Studies.count"]}
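The Playground query maps directly onto Cube's REST API. The token below is a placeholder; in dev mode Cube typically accepts any value unless authentication is configured:

```shell
QUERY='{"measures":["Studies.count"]}'

if command -v curl >/dev/null 2>&1; then
  # GET /cubejs-api/v1/load with the JSON query URL-encoded.
  curl -fsG "http://localhost:4000/cubejs-api/v1/load" \
    -H "Authorization: ${CUBEJS_API_TOKEN:-dev-token}" \
    --data-urlencode "query=$QUERY" ||
    echo "cube API not reachable; is the analytics profile running?" >&2
else
  echo "curl not found; skipping"
fi
```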

Full Pipeline Workflow

Run the complete data pipeline:

DC="docker compose --env-file .env.production --profile lakehouse"

# 1. Ingest data from Frappe API
$DC run --rm lakehouse-ingester

# 2. Transform with DBT
$DC run --rm lakehouse-dbt run

# 3. Clear semantic layer cache (if Cube.dev is running)
docker compose --env-file .env.production --profile analytics restart cube
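The three steps above can be wrapped into one fail-fast script. This is a sketch using the service names from this guide, guarded so it skips cleanly where Docker or the compose project is unavailable:

```shell
DC="docker compose --env-file .env.production --profile lakehouse"

if ! command -v docker >/dev/null 2>&1; then
  echo "docker not found; skipping"
elif ! docker compose --env-file .env.production config >/dev/null 2>&1; then
  echo "compose project not available here; skipping"
else
  { $DC run --rm lakehouse-ingester &&   # 1. ingest from Frappe API
    $DC run --rm lakehouse-dbt run &&    # 2. transform with DBT
    docker compose --env-file .env.production --profile analytics restart cube && \
    echo "pipeline complete"; } ||       # 3. clear semantic layer cache
    echo "pipeline failed; rerun the failing step by hand" >&2
fi
```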

Elementary Data Observability

The pipeline includes Elementary for data quality monitoring:

DC="docker compose --env-file .env.production --profile lakehouse"

# Build Elementary tables
$DC run --rm lakehouse-dbt run --select elementary

# Generate HTML report
$DC run --rm lakehouse-dbt elementary-report

Troubleshooting

| Issue | Solution |
| --- | --- |
| Stale data in dashboards | Run `docker compose --env-file .env.production --profile analytics restart cube` |
| DBT connection failed | Verify `.env` credentials with `$DC run --rm lakehouse-dbt debug` |
| Models not found | Run `$DC run --rm lakehouse-dbt deps` to install packages |
| Elementary report fails | Ensure the Elementary package is installed |
| OpenLineage events not in Marquez | Verify `OPENLINEAGE_URL` is set and `marquez-api` is running |
| `dbt-ol: command not found` | Ensure `openlineage-dbt` is in `requirements.txt` and rebuild the image |

OpenLineage (Data Lineage)

The dbt pipeline uses OpenLineage via the dbt-ol CLI wrapper to emit lineage events to Marquez. This provides visual lineage graphs showing how data flows through the medallion architecture.

How It Works

  • dbt-ol is a drop-in replacement for dbt installed via the openlineage-dbt Python package
  • After each dbt-ol run or dbt-ol build, lineage events are emitted to the Marquez API
  • Marquez stores the lineage metadata and provides a Web UI for exploration
  • If OPENLINEAGE_URL is not set, dbt-ol behaves identically to standard dbt
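For example, a one-off run that emits lineage to an explicit Marquez endpoint (the values mirror the defaults shown in this section; guarded so it skips where Docker is unavailable):

```shell
DC="docker compose --env-file .env.production --profile lakehouse"

if command -v docker >/dev/null 2>&1; then
  # Override the lineage target and namespace for this run only.
  $DC run --rm \
    -e OPENLINEAGE_URL=http://marquez-api:5000 \
    -e OPENLINEAGE_NAMESPACE=ctms-lakehouse \
    lakehouse-dbt run ||
    echo "run failed; check that marquez-api is up" >&2
else
  echo "docker not found; skipping"
fi
```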

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `OPENLINEAGE_URL` | `http://marquez-api:5000` | Marquez API endpoint |
| `OPENLINEAGE_NAMESPACE` | `ctms-lakehouse` | Namespace for lineage events |

Viewing Lineage

After a dbt run, open the Marquez Web UI:

# Marquez Web UI
open http://localhost:3300

Navigate to the ctms-lakehouse namespace to explore:

  • Jobs — Each dbt model appears as a job
  • Datasets — Each table/view appears as a dataset
  • Lineage graph — Visual DAG showing source → staging → dimensions/facts → semantic

Feature Toggles

Both OpenLineage and Elementary can be individually enabled or disabled:

| Variable | Default | Description |
| --- | --- | --- |
| `ENABLE_OPENLINEAGE` | `true` (when `OPENLINEAGE_URL` is set) | Use the `dbt-ol` wrapper for lineage emission |
| `ENABLE_ELEMENTARY` | `true` | Run Elementary report generation after dbt builds |

# Disable OpenLineage for a single run
docker compose --env-file .env.production --profile lakehouse run --rm -e ENABLE_OPENLINEAGE=false lakehouse-dbt daily

# Disable Elementary for a single run
docker compose --env-file .env.production --profile lakehouse run --rm -e ENABLE_ELEMENTARY=false lakehouse-dbt daily

These toggles can also be passed through the Zynexa REST API when triggering pipelines. See Environment Variables for details.

Validate Lineage

# Check that lineage events are being received
curl -s http://localhost:5000/api/v1/namespaces/ctms-lakehouse/jobs | jq '.jobs | length'

# Should return the number of dbt models that have been run

See Also