DBT Transformation Pipeline

The dbt (data build tool) project transforms raw clinical trial data through a Medallion Architecture into analytics-ready datasets and CDISC-compliant domains.

Enterprise Feature

The dbt pipeline and semantic views are available in Enterprise editions.

Overview

DBT Pipeline Architecture

Medallion Architecture

Layer	Schema	Purpose	Models
Bronze	`bronze`	Raw 1:1 source copy	`stg_*` staging models
Silver	`silver`	Cleaned, validated, enriched	`int_*` intermediate models
Gold	`gold`	Analytics-ready facts and dimensions	`dim_*`, `fact_*` models
Semantic	`gold`	Pre-joined business views	`sem_*` models
CDISC	`cdisc`	Regulatory-compliant domains	`cdisc_*` models

Bronze (Raw)  →  Silver (Clean)  →  Gold (Analytics)  →  Semantic (BI)
   stg_*           int_*            dim_*/fact_*          sem_*
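The prefix convention in the table can be expressed as a simple lookup. The sketch below is a hypothetical helper, not part of the project, that infers the intended schema from a model name. In dbt, the actual target schema comes from the `schema` config in `dbt_project.yml` or in the model itself, so treat this only as a naming check.

```python
# Hypothetical naming check derived from the Medallion Architecture table.
# The real target schema is set by dbt config, not by this helper.
PREFIX_TO_SCHEMA = {
    "stg_": "bronze",   # staging models
    "int_": "silver",   # intermediate models
    "dim_": "gold",     # dimensions
    "fact_": "gold",    # facts
    "sem_": "gold",     # semantic views share the gold schema
    "cdisc_": "cdisc",  # SDTM domains
}


def schema_for_model(model_name: str) -> str:
    """Return the schema a model is expected to land in, based on its prefix."""
    for prefix, schema in PREFIX_TO_SCHEMA.items():
        if model_name.startswith(prefix):
            return schema
    raise ValueError(f"{model_name!r} does not use a known layer prefix")
```

A CI step could call this on every model name to flag models that break the convention. Names such as `stg_patient` and `int_patient` are hypothetical examples.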

Key Models

Dimension Tables (`dim_*`)

Model	Description
`dim_study`	Study master data
`dim_site`	Site information
`dim_subject`	Subject demographics
`dim_patient`	Patient records
`dim_date`	Date dimension
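The sketch below illustrates what a date dimension typically holds by generating one row per calendar day. The column names (`date_key`, `is_weekend`, and so on) are assumptions and not the actual columns of `dim_date`. Inside dbt, this kind of table is more commonly built in SQL, for example with the `dbt_utils.date_spine` macro.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Generate one row per calendar day from start to end (inclusive).

    Column names are hypothetical; they only illustrate typical date attributes.
    """
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                "date_key": int(current.strftime("%Y%m%d")),  # surrogate key, e.g. 20240131
                "full_date": current.isoformat(),
                "year": current.year,
                "quarter": (current.month - 1) // 3 + 1,
                "month": current.month,
                "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
                "is_weekend": current.isoweekday() >= 6,
            }
        )
        current += timedelta(days=1)
    return rows
```

For example, `build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))` yields 366 rows because 2024 is a leap year.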

Fact Tables (`fact_*`)

Model	Description
`fact_enrollment`	Enrollment events
`fact_adverse_event`	Adverse events
`fact_visit`	Study visits
`fact_vital_sign`	Vital measurements
`fact_lab_result`	Lab test results

Semantic Views (`sem_*`)

Pre-joined, business-friendly views for analytics:

Model	Description
`sem_clinical_summary`	One row per subject with all key data
`sem_adverse_events`	AE metrics aggregated by subject
`sem_enrollment_metrics`	Monthly enrollment with cumulative totals
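The "monthly enrollment with cumulative totals" logic of `sem_enrollment_metrics` comes down to a count per month followed by a running sum. Here is a minimal Python sketch that assumes the input is a plain list of enrollment dates. The real model reads from `fact_enrollment`, whose columns are not documented on this page.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def monthly_enrollment(enrollment_dates: list[date]) -> list[tuple[str, int, int]]:
    """Return (month, new_enrollments, cumulative_enrollments), sorted by month."""
    per_month = Counter(d.strftime("%Y-%m") for d in enrollment_dates)
    months = sorted(per_month)                 # "YYYY-MM" strings sort chronologically
    counts = [per_month[m] for m in months]
    cumulative = list(accumulate(counts))      # running total across months
    return list(zip(months, counts, cumulative))
```

This sketch skips months with no enrollments. A SQL implementation would usually join against `dim_date` to fill those gaps and use a window function such as `SUM(...) OVER (ORDER BY month)` for the running total.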

CDISC Domains

Regulatory-compliant domains following CDISC SDTM standards:

Domain	Description
`cdisc_dm`	Demographics
`cdisc_ae`	Adverse Events
`cdisc_vs`	Vital Signs
`cdisc_lb`	Laboratory Results
`cdisc_cm`	Concomitant Medications
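To show the shape of an SDTM domain, the sketch below maps a subject record to a handful of DM variables: `STUDYID`, `DOMAIN`, `USUBJID`, `SUBJID`, `SITEID`, `SEX` and `COUNTRY`. This is a hypothetical, partial mapping; the real `cdisc_dm` model will carry more variables. Building `USUBJID` as `STUDYID-SITEID-SUBJID` is a common sponsor convention assumed here, not something this page specifies.

```python
from dataclasses import asdict, dataclass


@dataclass
class DemographicsRow:
    """A subset of SDTM DM variables (hypothetical; not the full domain)."""
    STUDYID: str
    DOMAIN: str
    USUBJID: str
    SUBJID: str
    SITEID: str
    SEX: str
    COUNTRY: str  # SDTM expects an ISO 3166-1 alpha-3 code, e.g. "USA"


# Source values assumed to be free text; map to SDTM SEX terminology (M, F, U).
SEX_CODES = {"male": "M", "female": "F"}


def to_dm(study_id: str, site_id: str, subject_id: str, sex: str, country: str) -> dict:
    """Map one subject to a DM-like row."""
    usubjid = f"{study_id}-{site_id}-{subject_id}"  # assumed USUBJID convention
    row = DemographicsRow(
        STUDYID=study_id,
        DOMAIN="DM",
        USUBJID=usubjid,
        SUBJID=subject_id,
        SITEID=site_id,
        SEX=SEX_CODES.get(sex.strip().lower(), "U"),  # anything else becomes U (unknown)
        COUNTRY=country,
    )
    return asdict(row)
```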

Exposures

dbt exposures document how models are consumed by downstream systems:

Exposure	Consumer	Models
Cube Semantic Layer	Cube.dev	All `dim_*`, `fact_*`, and `sem_*` models
CDISC Export	Regulatory submissions	All `cdisc_*` models
BI Dashboards	Metabase, Superset	`sem_*` views
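Exposures are declared in YAML and appear in dbt's `target/manifest.json` under the `exposures` key. Each exposure has a `depends_on.nodes` list of unique IDs. The Python sketch below lists which models each exposure consumes. It assumes a manifest written by a recent dbt version, for example after `dbt parse` or `dbt docs generate`. The exposure and project names used in testing are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest that dbt writes to its target directory."""
    return json.loads(Path(path).read_text())


def exposure_dependencies(manifest: dict) -> dict[str, list[str]]:
    """Map each exposure name to the sorted model names it depends on."""
    result = {}
    for exposure in manifest.get("exposures", {}).values():
        nodes = exposure.get("depends_on", {}).get("nodes", [])
        # Unique IDs look like "model.<project>.<model_name>"; keep the part after the project.
        result[exposure["name"]] = sorted(
            node.split(".", 2)[2] for node in nodes if node.startswith("model.")
        )
    return result
```

Calling `exposure_dependencies(load_manifest())` after a dbt run gives a quick check that, for instance, the CDISC export really depends only on `cdisc_*` models.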

Quick Commands

# Run full pipeline
dbt run

# Run by layer
dbt run --select staging
dbt run --select dimensions
dbt run --select facts
dbt run --select tag:cdisc

# Run tests
dbt test

# Generate docs
dbt docs generate && dbt docs serve

Data Lineage (OpenLineage + Marquez)

The dbt pipeline emits OpenLineage events through the `dbt-ol` CLI wrapper. Marquez captures these events and provides:

Model-level dependency tracking — which dbt models read from and write to which tables
Cross-layer lineage — visibility into data flow through bronze → silver → gold layers
Run history — historical record of every dbt execution with timing and status
Impact analysis — understand downstream effects of schema changes
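At its core, impact analysis is a graph traversal: start from a changed model and follow edges downstream. The sketch below is a minimal breadth-first traversal over a parent-to-children mapping. dbt's `manifest.json` exposes a mapping of the same shape as `child_map`, keyed by unique ID. The edges used for testing here are hypothetical.

```python
from collections import deque


def downstream_of(node: str, child_map: dict[str, list[str]]) -> list[str]:
    """Return every node reachable downstream of `node`, in breadth-first order."""
    seen = {node}          # guards against revisiting nodes (and against cycles)
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in child_map.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Take hypothetical edges `dim_study → fact_enrollment, fact_visit`, `fact_enrollment → sem_enrollment_metrics` and `fact_visit → sem_clinical_summary`. A change to `dim_study` then reports all four downstream models, nearest first.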

Architecture

dbt-ol build
    │
    ├── Runs dbt normally (models, tests)
    │
    └── POST /api/v1/lineage → Marquez API
                                    │
                                    └── Stores in lakehouse-db (marquez database)
                                            │
                                            └── Marquez Web UI (http://localhost:3300)
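`dbt-ol` builds and sends these events for you. The sketch below only shows roughly what a minimal OpenLineage `RunEvent` posted to Marquez looks like. Several values are placeholders:

- the producer URI
- the spec version in `schemaURL`
- the dataset namespace (real integrations typically derive it from the warehouse connection)

The API base URL is assumed to be whatever `OPENLINEAGE_URL` points to.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone
from typing import Optional

# Placeholders (assumptions): adjust to your deployment and OpenLineage spec version.
PRODUCER = "https://example.com/ctms-lakehouse/lineage-sketch"
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
JOB_NAMESPACE = "ctms-lakehouse"
DATASET_NAMESPACE = "postgres://warehouse-host:5432"  # hypothetical warehouse URI


def build_run_event(event_type: str, job_name: str, inputs: list[str],
                    outputs: list[str], run_id: Optional[str] = None) -> dict:
    """Assemble a minimal OpenLineage RunEvent for one dbt model run."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": JOB_NAMESPACE, "name": job_name},
        "inputs": [{"namespace": DATASET_NAMESPACE, "name": n} for n in inputs],
        "outputs": [{"namespace": DATASET_NAMESPACE, "name": n} for n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def post_event(event: dict, api_base: str) -> int:
    """POST the event to Marquez's lineage endpoint and return the HTTP status."""
    request = urllib.request.Request(
        f"{api_base.rstrip('/')}/api/v1/lineage",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A run normally sends a START event before the model executes. Afterwards it sends a COMPLETE or FAIL event carrying the same `runId`, so Marquez can pair them into one run record.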

Lineage Visualization

The Marquez Web UI at http://localhost:3300 shows:

View	Description
Namespace	`ctms-lakehouse` — groups all dbt lineage events
Jobs	Each dbt model (e.g., `dim_study`, `fact_enrollment`)
Datasets	Each table/view (e.g., `gold.dim_study`, `silver.stg_bronze__patient`)
Lineage Graph	Visual DAG showing source → staging → dimensions/facts → semantic
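Marquez also exposes this graph through its REST API. The sketch below builds URLs for listing jobs in the `ctms-lakehouse` namespace and for fetching the lineage graph around a node. The `nodeId` format (`job:<namespace>:<name>` or `dataset:<namespace>:<name>`) and the `depth` parameter follow Marquez's lineage endpoint; verify them against the version you deploy. The API base URL is an assumption, since this page only documents the web UI port.

```python
import json
import urllib.parse
import urllib.request

NAMESPACE = "ctms-lakehouse"


def jobs_url(api_base: str, namespace: str = NAMESPACE) -> str:
    """URL listing all jobs (dbt models) recorded in a namespace."""
    return f"{api_base.rstrip('/')}/api/v1/namespaces/{urllib.parse.quote(namespace, safe='')}/jobs"


def lineage_url(api_base: str, kind: str, name: str,
                namespace: str = NAMESPACE, depth: int = 20) -> str:
    """URL for the lineage graph around a job or dataset node (kind: "job" or "dataset")."""
    node_id = f"{kind}:{namespace}:{name}"
    query = urllib.parse.urlencode({"nodeId": node_id, "depth": depth})
    return f"{api_base.rstrip('/')}/api/v1/lineage?{query}"


def fetch_json(url: str) -> dict:
    """GET a Marquez API URL and decode the JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Dataset nodes usually live in a warehouse-derived namespace rather than `ctms-lakehouse`, so pass `namespace` explicitly when querying datasets.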

Feature Toggles

OpenLineage and Elementary are independently toggleable via environment variables:

Variable	Default	Description
`ENABLE_OPENLINEAGE`	`true` (when `OPENLINEAGE_URL` is set)	Use the `dbt-ol` wrapper to emit lineage; when `false`, plain `dbt` is used
`ENABLE_ELEMENTARY`	`true`	Run Elementary report generation after dbt builds

These can be overridden per run with `docker compose run -e` flags, through the Makefile, or via the Zynexa pipeline trigger REST API. See Environment Variables for details on the REST API toggles.
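The toggle semantics in the table can be summarized as a small decision function. This is a hypothetical sketch, not the project's actual entrypoint. Two parts are assumptions: the set of accepted truthy strings, and the use of `edr report` (Elementary's CLI) as the report step.

```python
from collections.abc import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean toggle, falling back to the default when unset or empty."""
    value = env.get(name, "").strip()
    if not value:
        return default
    return value.lower() in TRUTHY


def plan_commands(env: Mapping[str, str], dbt_args: list[str]) -> list[list[str]]:
    """Decide which commands a pipeline run would execute under the toggles."""
    # ENABLE_OPENLINEAGE defaults to true only when OPENLINEAGE_URL is set.
    use_openlineage = env_flag(env, "ENABLE_OPENLINEAGE", bool(env.get("OPENLINEAGE_URL")))
    run_elementary = env_flag(env, "ENABLE_ELEMENTARY", True)

    commands = [["dbt-ol" if use_openlineage else "dbt", *dbt_args]]
    if run_elementary:
        commands.append(["edr", "report"])  # assumed Elementary CLI invocation
    return commands
```

For a real run, call `plan_commands(os.environ, ["build"])`. In this sketch, an explicit `ENABLE_OPENLINEAGE=true` without `OPENLINEAGE_URL` still selects `dbt-ol`. The table does not say what the actual pipeline does in that case.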
