
Data Ingestion Pipeline

Enterprise-grade data pipeline for extracting clinical trial data from Zynomi CTMS APIs and loading into the Analytics Lakehouse.

Enterprise Feature

The data ingestion and analytics pipelines are available in Enterprise editions.


Overview

The ingestion layer uses dlthub (data load tool) to extract data from 30+ CTMS API endpoints and load it into cloud or on-premise data warehouses.
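
To illustrate how the pieces fit together, here is a minimal sketch of a dlthub pipeline for a single endpoint. The base URL, endpoint path, and `study_id` primary key are placeholders, not the actual Enterprise configuration.

```python
import dlt
import requests

# Placeholder base URL -- the real CTMS host and authentication are
# supplied by the Enterprise deployment configuration.
CTMS_BASE_URL = "https://ctms.example.com/api/v1"


@dlt.resource(name="study", write_disposition="merge", primary_key="study_id")
def study():
    """Fetch the Study endpoint and yield its JSON records."""
    response = requests.get(f"{CTMS_BASE_URL}/study", timeout=30)
    response.raise_for_status()
    yield response.json()


# Load into a local DuckDB file; other environments point the same
# pipeline at MotherDuck, PostgreSQL, or Snowflake instead.
pipeline = dlt.pipeline(
    pipeline_name="ctms_ingestion",
    destination="duckdb",
    dataset_name="bronze",
)

if __name__ == "__main__":
    print(pipeline.run(study()))
```

The `write_disposition="merge"` setting is what enables the incremental, change-aware loads described under Pipeline Features below.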

(Figure: Data Ingestion Architecture)


Architecture

| Layer   | Purpose                              | Output           |
| ------- | ------------------------------------ | ---------------- |
| Source  | CTMS REST APIs                       | JSON responses   |
| Extract | Async concurrent API calls           | Raw data streams |
| Load    | Schema detection, incremental merge  | Bronze tables    |
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   CTMS APIs     │────▶│     dlthub      │────▶│ Data Warehouse  │
│  30+ endpoints  │     │    Pipeline     │     │ (Bronze Layer)  │
│                 │     │                 │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
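
The Extract layer's concurrent API calls can be pictured with a sketch like the following. It uses `httpx` and `asyncio.gather` as stand-ins for the actual HTTP client, and the endpoint names are an illustrative subset of the 30+ real endpoints.

```python
import asyncio
import httpx

CTMS_BASE_URL = "https://ctms.example.com/api/v1"  # placeholder
ENDPOINTS = ["study", "subject", "consent", "vitals", "adverse-events"]  # illustrative subset


async def fetch(client: httpx.AsyncClient, endpoint: str) -> tuple[str, list]:
    """Call one endpoint and return (endpoint, raw JSON records)."""
    response = await client.get(f"{CTMS_BASE_URL}/{endpoint}")
    response.raise_for_status()
    return endpoint, response.json()


async def extract_all() -> dict:
    """Extract layer: issue all API calls concurrently and collect raw streams."""
    async with httpx.AsyncClient(timeout=30) as client:
        results = await asyncio.gather(*(fetch(client, ep) for ep in ENDPOINTS))
    return dict(results)


if __name__ == "__main__":
    raw = asyncio.run(extract_all())
    print({name: len(records) for name, records in raw.items()})
```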

Supported Data Sources

| Category | Endpoints                                | Description         |
| -------- | ---------------------------------------- | ------------------- |
| Clinical | Study, Subject, Consent                  | Core trial data     |
| Medical  | Vitals, Physical Exam, Family History    | Patient health data |
| Safety   | Adverse Events, Concomitant Medications  | Safety monitoring   |
| Visits   | Patient Encounters, Lab Results          | Visit-level data    |
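
One way to map these endpoint categories to bronze tables is to generate one dlt resource per endpoint, as sketched below. The endpoint paths, table names, and the shared `id` primary key are illustrative assumptions, not the actual API contract.

```python
import dlt
import requests

CTMS_BASE_URL = "https://ctms.example.com/api/v1"  # placeholder

# Illustrative table-name -> API-path mapping; the real paths and keys
# depend on the CTMS API version.
ENDPOINTS = {
    "study": "study",
    "subject": "subject",
    "consent": "consent",
    "vitals": "vitals",
    "adverse_events": "adverse-events",
    "lab_results": "lab-results",
}


@dlt.source(name="ctms")
def ctms_source():
    """Return one resource per endpoint; each becomes its own bronze table."""

    def make_resource(table_name: str, path: str):
        @dlt.resource(name=table_name, write_disposition="merge", primary_key="id")
        def endpoint_resource():
            response = requests.get(f"{CTMS_BASE_URL}/{path}", timeout=30)
            response.raise_for_status()
            yield response.json()

        return endpoint_resource

    return [make_resource(name, path) for name, path in ENDPOINTS.items()]
```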

Multi-Destination Support

The pipeline supports multiple target destinations:

| Destination | Use Case                                    |
| ----------- | ------------------------------------------- |
| MotherDuck  | Cloud data warehouse (DuckDB-as-a-Service)  |
| PostgreSQL  | On-premise RDBMS                            |
| Snowflake   | Enterprise cloud DWH                        |
| DuckDB      | Local development                           |
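
In dlt, switching targets is a configuration change rather than a code change. The sketch below uses dlt's built-in destination identifiers; credentials are assumed to come from `secrets.toml` or environment variables and are not shown.

```python
import dlt


def make_pipeline(destination: str) -> dlt.Pipeline:
    """Build the same ingestion pipeline against any supported destination."""
    return dlt.pipeline(
        pipeline_name="ctms_ingestion",
        destination=destination,  # "motherduck", "postgres", "snowflake", or "duckdb"
        dataset_name="bronze",
    )


# Local development against DuckDB; production runs swap the string.
pipeline = make_pipeline("duckdb")
```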

Pipeline Features

| Feature               | Description                        |
| --------------------- | ---------------------------------- |
| Incremental Loading   | Merge mode with change detection   |
| Auto Schema Evolution | Automatic DDL handling             |
| Async Extraction      | Concurrent API calls               |
| Retry Logic           | Exponential backoff on failures    |
| Data Lineage          | Full audit trail in bronze layer   |
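
Incremental loading and retry logic map onto standard dlt building blocks, roughly as sketched below. The `updated_at` cursor field, the `updated_since` query parameter, and the `subject_id` key are assumptions about the CTMS API; `dlt.sources.helpers.requests` is dlt's requests wrapper with built-in retries and backoff.

```python
import dlt
from dlt.sources.helpers import requests  # drop-in requests client with retry/backoff

CTMS_BASE_URL = "https://ctms.example.com/api/v1"  # placeholder


@dlt.resource(write_disposition="merge", primary_key="subject_id")
def subject(
    updated_at=dlt.sources.incremental("updated_at", initial_value="1970-01-01T00:00:00Z"),
):
    """Incremental merge: request only rows changed since the last successful run."""
    response = requests.get(
        f"{CTMS_BASE_URL}/subject",
        params={"updated_since": updated_at.last_value},
        timeout=30,
    )
    response.raise_for_status()
    yield response.json()
```

On each run dlt persists the highest `updated_at` value it has seen and resumes from there, which is what the merge write disposition relies on for change detection.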