# Data Ingestion Pipeline

Enterprise-grade data pipeline for extracting clinical trial data from Zynomi CTMS APIs and loading it into the Analytics Lakehouse.
## Enterprise Feature

The data ingestion and analytics pipeline is available in Enterprise editions.
## Overview

The ingestion layer uses dlt (data load tool) to extract data from 30+ CTMS API endpoints and load it into cloud or on-premise data warehouses.

## Architecture
| Layer | Purpose | Output |
|---|---|---|
| Source | CTMS REST APIs | JSON responses |
| Extract | Async concurrent API calls | Raw data streams |
| Load | Schema detection, incremental merge | Bronze tables |
```text
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   CTMS APIs     │────▶│     dlthub      │────▶│ Data Warehouse  │
│  30+ endpoints  │     │    Pipeline     │     │ (Bronze Layer)  │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
```
## Supported Data Sources
| Category | Endpoints | Description |
|---|---|---|
| Clinical | Study, Subject, Consent | Core trial data |
| Medical | Vitals, Physical Exam, Family History | Patient health data |
| Safety | Adverse Events, Concomitant Medications | Safety monitoring |
| Visits | Patient Encounters, Lab Results | Visit-level data |
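A catalog like the one above is easy to keep as a small registry from which request URLs are derived. The base URL and endpoint paths below are illustrative assumptions, not the actual CTMS routes:

```python
from urllib.parse import urljoin

# Illustrative registry grouped by the categories above; real CTMS
# paths and the real base URL will differ.
BASE_URL = "https://ctms.example.com/api/v1/"

ENDPOINTS = {
    "clinical": ["study", "subject", "consent"],
    "medical": ["vitals", "physical-exam", "family-history"],
    "safety": ["adverse-events", "concomitant-medications"],
    "visits": ["patient-encounters", "lab-results"],
}


def endpoint_urls(category: str) -> list[str]:
    """Build the full request URL for every endpoint in a category."""
    return [urljoin(BASE_URL, path) for path in ENDPOINTS[category]]
```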
## Multi-Destination Support
The pipeline supports multiple target destinations:
| Destination | Use Case |
|---|---|
| MotherDuck | Cloud data warehouse (DuckDB-as-a-Service) |
| PostgreSQL | On-premise RDBMS |
| Snowflake | Enterprise cloud DWH |
| DuckDB | Local development |
## Pipeline Features
| Feature | Description |
|---|---|
| Incremental Loading | Merge mode with change detection |
| Auto Schema Evolution | Automatic DDL handling |
| Async Extraction | Concurrent API calls |
| Retry Logic | Exponential backoff on failures |
| Data Lineage | Full audit trail in bronze layer |
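The retry behavior above can be sketched as a generic wrapper. This is a minimal illustration of exponential backoff with jitter, not the pipeline's actual retry code; the attempt count and delays are arbitrary example values:

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on exception with exponential backoff.

    The delay doubles on each failed attempt (0.5s, 1s, 2s, ...) plus
    random jitter to avoid thundering-herd retries against the API.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Injecting `sleep` as a parameter keeps the wrapper trivially testable: tests can record the computed delays instead of actually waiting.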