
Data Ingestion Pipeline

Enterprise-grade data pipeline for extracting clinical trial data from Zynomi CTMS APIs and loading into the Analytics Lakehouse.

Enterprise Feature

The data ingestion and analytics pipelines are available in Enterprise editions.


Overview

The ingestion layer uses dlthub (data load tool) to extract data from 30+ CTMS API endpoints and load it into cloud or on-premise data warehouses.
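
To illustrate how the pieces fit together, here is a minimal sketch of a dlthub pipeline for a single endpoint. The base URL, endpoint path, and `study_id` primary key are placeholders, not the actual Enterprise configuration.

```python
import dlt
import requests

# Placeholder base URL -- the real CTMS host and authentication are
# supplied by the Enterprise deployment configuration.
CTMS_BASE_URL = "https://ctms.example.com/api/v1"


@dlt.resource(name="study", write_disposition="merge", primary_key="study_id")
def study():
    """Fetch the Study endpoint and yield its JSON records."""
    response = requests.get(f"{CTMS_BASE_URL}/study", timeout=30)
    response.raise_for_status()
    yield response.json()


# Load into a local DuckDB file; other environments point the same
# pipeline at MotherDuck, PostgreSQL, or Snowflake instead.
pipeline = dlt.pipeline(
    pipeline_name="ctms_ingestion",
    destination="duckdb",
    dataset_name="bronze",
)

if __name__ == "__main__":
    print(pipeline.run(study()))
```

The `write_disposition="merge"` setting is what enables the incremental, change-aware loads described under Pipeline Features below.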

(Figure: Data Ingestion Architecture)


Architecture

| Layer   | Purpose                              | Output           |
| ------- | ------------------------------------ | ---------------- |
| Source  | CTMS REST APIs                       | JSON responses   |
| Extract | Async concurrent API calls           | Raw data streams |
| Load    | Schema detection, incremental merge  | Bronze tables    |
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   CTMS APIs     │────▶│     dlthub      │────▶│ Data Warehouse  │
│  30+ endpoints  │     │    Pipeline     │     │ (Bronze Layer)  │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
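
The Extract layer's concurrent API calls can be pictured with a sketch like the following. It uses `httpx` and `asyncio.gather` as stand-ins for the actual HTTP client, and the endpoint names are an illustrative subset of the 30+ real endpoints.

```python
import asyncio
import httpx

CTMS_BASE_URL = "https://ctms.example.com/api/v1"  # placeholder
ENDPOINTS = ["study", "subject", "consent", "vitals", "adverse-events"]  # illustrative subset


async def fetch(client: httpx.AsyncClient, endpoint: str) -> tuple[str, list]:
    """Call one endpoint and return (endpoint, raw JSON records)."""
    response = await client.get(f"{CTMS_BASE_URL}/{endpoint}")
    response.raise_for_status()
    return endpoint, response.json()


async def extract_all() -> dict:
    """Extract layer: issue all API calls concurrently and collect raw streams."""
    async with httpx.AsyncClient(timeout=30) as client:
        results = await asyncio.gather(*(fetch(client, ep) for ep in ENDPOINTS))
    return dict(results)


if __name__ == "__main__":
    raw = asyncio.run(extract_all())
    print({name: len(records) for name, records in raw.items()})
```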

Supported Data Sources

| Category | Endpoints                                | Description         |
| -------- | ---------------------------------------- | ------------------- |
| Clinical | Study, Subject, Consent                  | Core trial data     |
| Medical  | Vitals, Physical Exam, Family History    | Patient health data |
| Safety   | Adverse Events, Concomitant Medications  | Safety monitoring   |
| Visits   | Patient Encounters, Lab Results          | Visit-level data    |
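
One way to map these endpoint categories to bronze tables is to generate one dlt resource per endpoint, as sketched below. The endpoint paths, table names, and the shared `id` primary key are illustrative assumptions, not the actual API contract.

```python
import dlt
import requests

CTMS_BASE_URL = "https://ctms.example.com/api/v1"  # placeholder

# Illustrative table-name -> API-path mapping; the real paths and keys
# depend on the CTMS API version.
ENDPOINTS = {
    "study": "study",
    "subject": "subject",
    "consent": "consent",
    "vitals": "vitals",
    "adverse_events": "adverse-events",
    "lab_results": "lab-results",
}


@dlt.source(name="ctms")
def ctms_source():
    """Return one resource per endpoint; each becomes its own bronze table."""

    def make_resource(table_name: str, path: str):
        @dlt.resource(name=table_name, write_disposition="merge", primary_key="id")
        def endpoint_resource():
            response = requests.get(f"{CTMS_BASE_URL}/{path}", timeout=30)
            response.raise_for_status()
            yield response.json()

        return endpoint_resource

    return [make_resource(name, path) for name, path in ENDPOINTS.items()]
```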

Multi-Destination Support

The pipeline supports multiple target destinations:

| Destination | Use Case                                    |
| ----------- | ------------------------------------------- |
| MotherDuck  | Cloud data warehouse (DuckDB-as-a-Service)  |
| PostgreSQL  | On-premise RDBMS                            |
| Snowflake   | Enterprise cloud DWH                        |
| DuckDB      | Local development                           |
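
In dlt, switching targets is a configuration change rather than a code change. The sketch below uses dlt's built-in destination identifiers; credentials are assumed to come from `secrets.toml` or environment variables and are not shown.

```python
import dlt


def make_pipeline(destination: str) -> dlt.Pipeline:
    """Build the same ingestion pipeline against any supported destination."""
    return dlt.pipeline(
        pipeline_name="ctms_ingestion",
        destination=destination,  # "motherduck", "postgres", "snowflake", or "duckdb"
        dataset_name="bronze",
    )


# Local development against DuckDB; production runs swap the string.
pipeline = make_pipeline("duckdb")
```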

Pipeline Features

| Feature               | Description                        |
| --------------------- | ---------------------------------- |
| Incremental Loading   | Merge mode with change detection   |
| Auto Schema Evolution | Automatic DDL handling             |
| Async Extraction      | Concurrent API calls               |
| Retry Logic           | Exponential backoff on failures    |
| Data Lineage          | Full audit trail in bronze layer   |
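
Incremental loading and retry logic map onto standard dlt building blocks, roughly as sketched below. The `updated_at` cursor field, the `updated_since` query parameter, and the `subject_id` key are assumptions about the CTMS API; `dlt.sources.helpers.requests` is dlt's requests wrapper with built-in retries and backoff.

```python
import dlt
from dlt.sources.helpers import requests  # drop-in requests client with retry/backoff

CTMS_BASE_URL = "https://ctms.example.com/api/v1"  # placeholder


@dlt.resource(write_disposition="merge", primary_key="subject_id")
def subject(
    updated_at=dlt.sources.incremental("updated_at", initial_value="1970-01-01T00:00:00Z"),
):
    """Incremental merge: request only rows changed since the last successful run."""
    response = requests.get(
        f"{CTMS_BASE_URL}/subject",
        params={"updated_since": updated_at.last_value},
        timeout=30,
    )
    response.raise_for_status()
    yield response.json()
```

On each run dlt persists the highest `updated_at` value it has seen and resumes from there, which is what the merge write disposition relies on for change detection.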