dbt Transformation Pipeline
The dbt (data build tool) project transforms raw clinical trial data through a Medallion Architecture into analytics-ready datasets and CDISC-compliant domains.
The dbt pipeline and semantic views are available in Enterprise editions.
Overview

Medallion Architecture
| Layer | Schema | Purpose | Models |
|---|---|---|---|
| Bronze | bronze | Raw 1:1 source copy | stg_* staging models |
| Silver | silver | Cleaned, validated, enriched | int_* intermediate models |
| Gold | gold | Analytics-ready facts & dimensions | dim_*, fact_* models |
| Semantic | gold | Pre-joined business views | sem_* models |
| CDISC | cdisc | Regulatory-compliant domains | cdisc_* models |
Bronze (Raw, stg_*) → Silver (Clean, int_*) → Gold (Analytics, dim_*/fact_*) → Semantic (BI, sem_*)
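Because the layer is encoded in the model prefix, a script can derive a model's layer and target schema from its name alone. This is a minimal bash sketch, assuming every model follows the prefixes in the Medallion Architecture table; layer_for_model is a hypothetical helper, not part of the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<layer> <schema>" for a model name, following
# the prefix convention in the Medallion Architecture table.
layer_for_model() {
  case "$1" in
    stg_*)        echo "bronze bronze" ;;
    int_*)        echo "silver silver" ;;
    dim_*|fact_*) echo "gold gold" ;;
    sem_*)        echo "semantic gold" ;;  # semantic views share the gold schema
    cdisc_*)      echo "cdisc cdisc" ;;
    *)            echo "unknown unknown"; return 1 ;;
  esac
}

# Example: show where some of the documented models land.
for model in dim_study fact_enrollment sem_clinical_summary cdisc_dm; do
  printf '%-22s %s\n' "$model" "$(layer_for_model "$model")"
done
```

Returning a non-zero status for unknown prefixes lets a CI check flag models that break the naming convention.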
Key Models
Dimension Tables (dim_*)
| Model | Description |
|---|---|
| dim_study | Study master data |
| dim_site | Site information |
| dim_subject | Subject demographics |
| dim_patient | Patient records |
| dim_date | Date dimension |
Fact Tables (fact_*)
| Model | Description |
|---|---|
| fact_enrollment | Enrollment events |
| fact_adverse_event | Adverse events |
| fact_visit | Study visits |
| fact_vital_sign | Vital sign measurements |
| fact_lab_result | Lab test results |
Semantic Views (sem_*)
Pre-joined, business-friendly views for analytics:
| Model | Description |
|---|---|
| sem_clinical_summary | One row per subject with all key data |
| sem_adverse_events | Adverse event (AE) metrics aggregated by subject |
| sem_enrollment_metrics | Monthly enrollment with cumulative totals |
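The cumulative totals in sem_enrollment_metrics are a running sum over months. The model presumably computes this in SQL; this bash/awk sketch only illustrates the arithmetic, on hypothetical CSV input whose column names (month, new_enrollments) are made up and whose rows are assumed to be sorted by month.

```bash
#!/usr/bin/env bash
# Turn "month,new_enrollments" rows (sorted by month) into
# "month,new_enrollments,cumulative_enrollments".
cumulative_enrollment() {
  awk -F, 'BEGIN { OFS = "," }
           NR == 1 { print $1, $2, "cumulative_enrollments"; next }  # header row
           { total += $2; print $1, $2, total }                       # running sum
          '
}
```

For example, printf 'month,new_enrollments\n2024-01,5\n2024-02,3\n2024-03,7\n' | cumulative_enrollment prints the header followed by 2024-01,5,5 then 2024-02,3,8 then 2024-03,7,15. Sorting matters: a running sum over unsorted months produces meaningless totals.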
CDISC Domains
Regulatory-compliant domains following CDISC SDTM standards:
| Domain | Description |
|---|---|
| cdisc_dm | Demographics |
| cdisc_ae | Adverse Events |
| cdisc_vs | Vital Signs |
| cdisc_lb | Laboratory Results |
| cdisc_cm | Concomitant Medications |
Exposures
dbt exposures document how downstream systems consume the models:
| Exposure | Consumer | Models |
|---|---|---|
| Cube Semantic Layer | Cube.dev | All dim_*, fact_*, sem_* |
| CDISC Export | Regulatory submissions | All cdisc_* |
| BI Dashboards | Metabase, Superset | sem_* views |
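Because exposures are nodes in the dbt graph, dbt's exposure: selector method can list or rebuild only what one consumer depends on. This is a sketch assuming the exposures are declared in the project's YAML; the wrapper function names and the exposure names cube_semantic_layer and cdisc_export are hypothetical, so check the real names with list_exposures first.

```bash
#!/usr/bin/env bash
# List the exposures declared in the project.
list_exposures() {
  dbt ls --resource-type exposure --output name
}

# List every model upstream of one exposure; the leading "+" selects
# all ancestors of the exposure node.
models_for_exposure() {
  local exposure="$1"   # e.g. cube_semantic_layer (hypothetical name)
  dbt ls --resource-type model --select "+exposure:${exposure}" --output name
}

# Build and test only the models that feed one exposure.
build_for_exposure() {
  dbt build --select "+exposure:$1"
}
```

For example, build_for_exposure cdisc_export would rebuild and test the cdisc_* models and their ancestors before a regulatory export, without touching models that only feed the BI dashboards.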
Quick Commands
```bash
# Run full pipeline
dbt run

# Run by layer
dbt run --select staging
dbt run --select dimensions
dbt run --select facts
dbt run --select tag:cdisc

# Run tests
dbt test

# Generate docs
dbt docs generate && dbt docs serve
```
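Plain dbt run already orders models by their dependencies, so the per-layer commands are mostly useful when you want to see exactly which layer breaks. The following bash sketch runs the layers in sequence and stops at the first failure. The function name is hypothetical, and the default selector list contains only the selectors shown above, which may not cover int_* or sem_* models; pass the project's own selectors for those layers as arguments.

```bash
#!/usr/bin/env bash
# Run dbt one selector at a time, stopping at the first failing layer so
# downstream layers are not rebuilt on top of a broken one.
run_layers() {
  local selectors=("$@")
  if [ "${#selectors[@]}" -eq 0 ]; then
    selectors=(staging dimensions facts tag:cdisc)   # selectors from Quick Commands
  fi
  local selector
  for selector in "${selectors[@]}"; do
    echo "==> dbt run --select ${selector}"
    if ! dbt run --select "$selector"; then
      echo "layer '${selector}' failed; stopping" >&2
      return 1
    fi
  done
}
```

For example, run_layers staging facts runs only those two layers, in that order.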
Data Lineage (OpenLineage + Marquez)
The dbt pipeline emits OpenLineage events through the dbt-ol CLI wrapper. Marquez captures these events and provides:
- Model-level dependency tracking: which dbt models read from and write to which tables
- Cross-layer lineage: visibility into data flow through the bronze → silver → gold layers
- Run history: a historical record of every dbt execution with timing and status
- Impact analysis: the downstream effects of schema changes
Architecture
```text
dbt-ol build
  │
  ├── Runs dbt normally (models, tests)
  │
  └── POST /api/v1/lineage → Marquez API
        │
        └── Stores in lakehouse-db (marquez database)
              │
              └── Marquez Web UI (http://localhost:3300)
```
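To check the POST /api/v1/lineage step in isolation before a full build, you can send one hand-built event. This is a minimal bash sketch, not part of the project: the function names, the ctms-lakehouse-smoke namespace, the producer URI, and the assumption that OPENLINEAGE_URL holds the Marquez API base URL are all illustrative. The payload follows the OpenLineage RunEvent shape (eventType, eventTime, run.runId, job, producer, schemaURL).

```bash
#!/usr/bin/env bash
# Build a minimal OpenLineage RunEvent: the same kind of payload dbt-ol
# sends to Marquez for each model run, minus inputs/outputs.
lineage_event_json() {
  local namespace="$1" job="$2" run_id="$3" event_type="${4:-START}"
  local now
  now="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '{"eventType":"%s","eventTime":"%s","run":{"runId":"%s"},"job":{"namespace":"%s","name":"%s"},"producer":"%s","schemaURL":"%s"}\n' \
    "$event_type" "$now" "$run_id" "$namespace" "$job" \
    "https://example.com/ctms-lakehouse/smoke-test" \
    "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
}

# POST one event. Assumes OPENLINEAGE_URL is the Marquez API base URL.
post_lineage_event() {
  : "${OPENLINEAGE_URL:?set OPENLINEAGE_URL to the Marquez API base URL}"
  curl -fsS -X POST "${OPENLINEAGE_URL%/}/api/v1/lineage" \
    -H 'Content-Type: application/json' \
    -d "$(lineage_event_json "$@")"
}
```

For example, post_lineage_event ctms-lakehouse-smoke lineage_smoke_test d46e465b-d358-4d32-83d4-df660ff614dd should create a job under a separate namespace in the Web UI, keeping test events out of ctms-lakehouse. Send a second event with the same run ID and COMPLETE as the fourth argument to close the run.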
Lineage Visualization
The Marquez Web UI at http://localhost:3300 shows:
| View | Description |
|---|---|
| Namespace | ctms-lakehouse (groups all dbt lineage events) |
| Jobs | Each dbt model (e.g., dim_study, fact_enrollment) |
| Datasets | Each table/view (e.g., gold.dim_study, silver.stg_bronze__patient) |
| Lineage Graph | Visual DAG showing source → staging → dimensions/facts → semantic |
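The same information is available from the Marquez REST API, which is handy for scripted impact checks. This is a minimal bash sketch using Marquez's GET /api/v1/namespaces/{namespace}/jobs and GET /api/v1/lineage endpoints. It assumes OPENLINEAGE_URL points at the Marquez API base URL. The function names and the MARQUEZ_NAMESPACE variable are hypothetical, and job names must match what the Jobs view shows, since dbt-ol may qualify them beyond the bare model name.

```bash
#!/usr/bin/env bash
# Read lineage back out of Marquez over its REST API.
MARQUEZ_NAMESPACE="${MARQUEZ_NAMESPACE:-ctms-lakehouse}"

marquez_api() {
  : "${OPENLINEAGE_URL:?set OPENLINEAGE_URL to the Marquez API base URL}"
  curl -fsS "${OPENLINEAGE_URL%/}/api/v1$1"
}

# All jobs (dbt models) recorded in the namespace, as JSON.
list_jobs() {
  marquez_api "/namespaces/${MARQUEZ_NAMESPACE}/jobs"
}

# Upstream and downstream graph around one job, as JSON; a smaller
# depth limits how far the graph extends from the job.
job_lineage() {
  local job="$1" depth="${2:-20}"
  marquez_api "/lineage?nodeId=job:${MARQUEZ_NAMESPACE}:${job}&depth=${depth}"
}
```

For example, job_lineage dim_study 2 returns the part of the graph near dim_study, which you can filter with any JSON tool to list affected datasets before changing a schema.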
Feature Toggles
OpenLineage and Elementary can be toggled independently through environment variables:
| Variable | Default | Description |
|---|---|---|
| ENABLE_OPENLINEAGE | true (when OPENLINEAGE_URL is set) | Use the dbt-ol wrapper to emit lineage; when false, plain dbt is used |
| ENABLE_ELEMENTARY | true | Run Elementary report generation after dbt builds |
Either variable can be overridden per run with docker compose run -e flags, the Makefile, or the Zynexa pipeline trigger REST API. See Environment Variables for details on the REST API toggles.
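The toggle rules in the table can be expressed as a small wrapper. This is a bash sketch under stated assumptions. The function names are hypothetical, and only the lowercase string false disables a feature. ENABLE_OPENLINEAGE without an OPENLINEAGE_URL falls back to plain dbt, because there is nowhere to send events. Elementary's edr report command stands in for whatever report step the project actually runs, and it runs even when the build fails, so test failures still reach the report.

```bash
#!/usr/bin/env bash
# Pick the dbt entry point: dbt-ol only when lineage is not explicitly
# disabled AND a lineage endpoint is configured (assumed fallback rule).
dbt_runner() {
  if [ "${ENABLE_OPENLINEAGE:-true}" != "false" ] && [ -n "${OPENLINEAGE_URL:-}" ]; then
    echo "dbt-ol"
  else
    echo "dbt"
  fi
}

# ENABLE_ELEMENTARY defaults to true; only an explicit "false" disables it.
elementary_enabled() {
  [ "${ENABLE_ELEMENTARY:-true}" != "false" ]
}

# Build with the chosen runner, then optionally generate the Elementary
# report; return the build's exit status either way.
run_pipeline() {
  local runner status
  runner="$(dbt_runner)"
  "$runner" build
  status=$?
  if elementary_enabled; then
    edr report
  fi
  return "$status"
}
```

For example, ENABLE_ELEMENTARY=false OPENLINEAGE_URL=http://marquez:5000 run_pipeline (the URL is a hypothetical example) would run dbt-ol build and skip the report.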