
DBT Transformation Pipeline

The dbt (data build tool) project transforms raw clinical trial data through a Medallion Architecture into analytics-ready datasets and CDISC-compliant domains.

Enterprise Feature

The dbt pipeline and semantic views are available in Enterprise editions only.


Overview

[Figure: DBT Pipeline Architecture]


Medallion Architecture

| Layer | Schema | Purpose | Models |
|-------|--------|---------|--------|
| Bronze | bronze | Raw 1:1 source copy | stg_* staging models |
| Silver | silver | Cleaned, validated, enriched | int_* intermediate models |
| Gold | gold | Analytics-ready facts & dimensions | dim_*, fact_* models |
| Semantic | gold | Pre-joined business views | sem_* models |
| CDISC | cdisc | Regulatory-compliant domains | cdisc_* models |

Bronze (Raw)  →  Silver (Clean)  →  Gold (Analytics)  →  Semantic (BI)
stg_*            int_*              dim_*/fact_*         sem_*
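
The prefix convention above can be checked mechanically. The following is a minimal sketch, not part of the project: the function name layer_for_model is hypothetical, and it only encodes the prefix-to-layer mapping listed in the table.

```shell
# Map a dbt model name to its Medallion layer using the naming prefixes
# from the table above. layer_for_model is a hypothetical helper name.
layer_for_model() {
  case "$1" in
    stg_*)        echo "bronze" ;;
    int_*)        echo "silver" ;;
    dim_*|fact_*) echo "gold" ;;
    sem_*)        echo "semantic" ;;
    cdisc_*)      echo "cdisc" ;;
    *)            echo "unknown" ;;
  esac
}

# Usage: print the layer for a few of the models named on this page.
for model in dim_study fact_enrollment sem_clinical_summary cdisc_dm; do
  echo "$model -> $(layer_for_model "$model")"
done
```

A check like this could run in CI to flag models whose names do not match any documented prefix (they map to "unknown").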

Key Models

Dimension Tables (dim_*)

| Model | Description |
|-------|-------------|
| dim_study | Study master data |
| dim_site | Site information |
| dim_subject | Subject demographics |
| dim_patient | Patient records |
| dim_date | Date dimension |

Fact Tables (fact_*)

| Model | Description |
|-------|-------------|
| fact_enrollment | Enrollment events |
| fact_adverse_event | Adverse events |
| fact_visit | Study visits |
| fact_vital_sign | Vital measurements |
| fact_lab_result | Lab test results |

Semantic Views (sem_*)

Pre-joined, business-friendly views for analytics:

| Model | Description |
|-------|-------------|
| sem_clinical_summary | One row per subject with all key data |
| sem_adverse_events | AE metrics aggregated by subject |
| sem_enrollment_metrics | Monthly enrollment with cumulative totals |

CDISC Domains

Regulatory-compliant domains following CDISC SDTM standards:

| Domain | Description |
|--------|-------------|
| cdisc_dm | Demographics |
| cdisc_ae | Adverse Events |
| cdisc_vs | Vital Signs |
| cdisc_lb | Laboratory Results |
| cdisc_cm | Concomitant Medications |

Exposures

dbt exposures document how models are consumed by downstream systems:

| Exposure | Consumer | Models |
|----------|----------|--------|
| Cube Semantic Layer | Cube.dev | All dim_*, fact_*, sem_* |
| CDISC Export | Regulatory submissions | All cdisc_* |
| BI Dashboards | Metabase, Superset | sem_* views |

Quick Commands

# Run full pipeline
dbt run

# Run by layer
dbt run --select staging
dbt run --select dimensions
dbt run --select facts
dbt run --select tag:cdisc

# Run tests
dbt test

# Generate docs
dbt docs generate && dbt docs serve
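
The per-layer commands above can be chained into one ordered run. The following is a rough sketch under stated assumptions: the run_layers function and the DRY_RUN variable are hypothetical, the selectors are the ones listed above, and it uses dbt build (which runs models together with their tests) instead of separate dbt run and dbt test calls.

```shell
# Selectors copied from the commands above, in pipeline order.
LAYER_SELECTORS="staging dimensions facts tag:cdisc"

# Print (DRY_RUN=1, the default here) or execute one "dbt build" per layer,
# stopping at the first layer that fails. DRY_RUN is a hypothetical switch.
run_layers() {
  for selector in $LAYER_SELECTORS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "dbt build --select $selector"
    else
      dbt build --select "$selector" || return 1
    fi
  done
}

# Usage: show what would be executed without calling dbt.
run_layers
```

Setting DRY_RUN=0 before calling run_layers would execute the commands for real; building layer by layer makes it easier to see which layer broke.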

Data Lineage (OpenLineage + Marquez)

The dbt pipeline emits OpenLineage events via the dbt-ol CLI wrapper. These events are captured by Marquez, providing:

  • Model-level dependency tracking — which dbt models read from and write to which tables
  • Cross-layer lineage — visibility into data flow through bronze → silver → gold layers
  • Run history — historical record of every dbt execution with timing and status
  • Impact analysis — understand downstream effects of schema changes

Architecture

dbt-ol build
├── Runs dbt normally (models, tests)
└── POST /api/v1/lineage → Marquez API
    └── Stores in lakehouse-db (marquez database)
        └── Marquez Web UI (http://localhost:3300)
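
To make the POST step concrete, here is an illustrative sketch of what a minimal OpenLineage run event for one model might look like. Several things are assumptions: the API base URL (this page only documents the Web UI port), the function name lineage_event_json, the producer value, and the sample run ID and timestamp. In practice dbt-ol builds and sends these events itself, with far more detail.

```shell
# Assumed Marquez API base URL; this page only states the Web UI address.
MARQUEZ_API_URL="${MARQUEZ_API_URL:-http://localhost:5000}"

# Print a minimal OpenLineage START event for one dbt model.
# $1 = job (model) name, $2 = run ID (a UUID), $3 = ISO-8601 event time.
# lineage_event_json is a hypothetical helper, not part of dbt-ol.
lineage_event_json() {
  cat <<EOF
{
  "eventType": "START",
  "eventTime": "$3",
  "run": { "runId": "$2" },
  "job": { "namespace": "ctms-lakehouse", "name": "$1" },
  "producer": "https://example.com/manual-lineage-sketch",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
}
EOF
}

# Usage: print the event for dim_study (sample run ID and timestamp).
lineage_event_json dim_study "0176a8c2-fe01-7439-87e6-56a1a1b4029f" "2024-01-01T00:00:00Z"

# To send it manually, pipe the JSON to the endpoint from the diagram:
#   lineage_event_json ... | curl -s -X POST "$MARQUEZ_API_URL/api/v1/lineage" \
#     -H 'Content-Type: application/json' -d @-
```

Posting a hand-built event like this can be a quick way to confirm that Marquez is reachable before debugging the dbt-ol wrapper itself.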

Lineage Visualization

The Marquez Web UI at http://localhost:3300 shows:

| View | Description |
|------|-------------|
| Namespace | ctms-lakehouse (groups all dbt lineage events) |
| Jobs | Each dbt model (e.g., dim_study, fact_enrollment) |
| Datasets | Each table/view (e.g., gold.dim_study, silver.stg_bronze__patient) |
| Lineage Graph | Visual DAG showing source → staging → dimensions/facts → semantic |
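
The same information can also be read from the Marquez REST API rather than the Web UI. The sketch below only builds the request URLs; the API base URL is an assumption (this page documents the Web UI port only), and jobs_url and job_lineage_url are hypothetical helper names.

```shell
# Assumed Marquez API base URL; adjust to your deployment.
MARQUEZ_API_URL="${MARQUEZ_API_URL:-http://localhost:5000}"
NAMESPACE="ctms-lakehouse"

# URL listing all jobs (dbt models) recorded in the namespace.
jobs_url() {
  echo "$MARQUEZ_API_URL/api/v1/namespaces/$NAMESPACE/jobs"
}

# URL returning the lineage graph around one job. $1 = job name.
job_lineage_url() {
  echo "$MARQUEZ_API_URL/api/v1/lineage?nodeId=job:$NAMESPACE:$1"
}

# Usage: print the URLs; fetch them with e.g. curl -s "$(jobs_url)".
jobs_url
job_lineage_url dim_study
```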

Feature Toggles

OpenLineage and Elementary are independently toggleable via environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| ENABLE_OPENLINEAGE | true (when OPENLINEAGE_URL is set) | Use the dbt-ol wrapper for lineage emission; when false, plain dbt is used |
| ENABLE_ELEMENTARY | true | Run Elementary report generation after dbt builds |

These can be overridden per-run via docker compose run -e flags, the Makefile, or the Zynexa pipeline trigger REST API. See Environment Variables for REST API toggle details.
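
The toggle behaviour described above can be sketched as follows. This is one possible reading of the table, not the pipeline's actual entrypoint: dbt_command and elementary_enabled are hypothetical names, and the use of edr report for Elementary report generation is an assumption about how the report step is invoked.

```shell
# Choose the CLI: dbt-ol when lineage is enabled, plain dbt otherwise.
# ENABLE_OPENLINEAGE defaults to true only when OPENLINEAGE_URL is set.
# The body runs in a subshell so helper variables do not leak.
dbt_command() (
  if [ -n "${OPENLINEAGE_URL:-}" ]; then default="true"; else default="false"; fi
  if [ "${ENABLE_OPENLINEAGE:-$default}" = "true" ]; then
    echo "dbt-ol"
  else
    echo "dbt"
  fi
)

# Succeeds when Elementary reporting should run (default: true).
elementary_enabled() {
  [ "${ENABLE_ELEMENTARY:-true}" = "true" ]
}

# Usage: show the resolved plan without running anything.
echo "build command: $(dbt_command) build"
if elementary_enabled; then echo "then: edr report"; fi
```

A per-run override would then look something like docker compose run -e ENABLE_OPENLINEAGE=false <service>, where <service> stands for whichever compose service runs dbt in your setup.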