Pipeline catalog
Register pipelines, datasets, and connections in a central metadata store. Query specs, owners, schedules, and environment tags from a versioned catalog API.
DataXPipe combines code generation with a live metadata layer so your team never loses track of what runs, where data flows, or whether quality gates passed.
Capture source-to-target edges as pipelines are registered. Trace upstream and downstream dependencies for any dataset to understand blast radius before you change a transform.
Attach SQL and script-based checks to pipeline runs. Store pass/fail results with row counts and sample rows so stakeholders can verify data health after every execution.
Validate YAML or JSON specs against a JSON Schema, then generate deployable Airflow DAGs, SQL transforms, test scripts, and metadata bundles.
Every pipeline run records status, timing, row counts, and linked check results. Prometheus metrics and structured logging integrate with your existing monitoring stack.
Organizations get isolated API keys, plan-based limits, and role-aware permissions. Platform and admin roles control production deployments and sensitive operations.
Register pipeline metadata through the REST API: sources, targets, connections, schedules, and ownership. The catalog persists specs so that downstream tools, dashboards, and operators all query a single source of truth.
List pipelines, fetch full specs, and attach run events as jobs complete. RBAC ensures only platform and admin roles can deploy to production environments.
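As a rough illustration of registering a pipeline over REST, the sketch below builds a registration request without sending it. The base URL, the `/api/v1/pipelines` path, the `X-API-Key` header, and every field name in the spec are assumptions made for this example, not DataXPipe's documented API; only the `orders_sync` pipeline name comes from the quickstart.

```python
import json
import urllib.request

# Assumed values for illustration only: the real base URL, path,
# header name, and spec fields may differ.
BASE_URL = "http://localhost:8000"


def build_register_request(spec: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that registers a pipeline spec."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/pipelines",  # hypothetical endpoint
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
    )


# Hypothetical spec covering the metadata the catalog is described as storing.
spec = {
    "name": "orders_sync",
    "owner": "data-platform",
    "schedule": "0 * * * *",
    "environment": "dev",
    "source": {"connection": "orders_db", "dataset": "orders"},
    "target": {"connection": "warehouse", "dataset": "analytics.orders"},
}

request = build_register_request(spec, api_key="example-key")
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes the payload easy to inspect or test before it reaches a live catalog.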
Lineage edges are captured when pipelines are registered, linking sources to targets through transforms. Query upstream and downstream dependencies for any dataset to assess impact before schema or logic changes.
Lineage metadata travels with generated artifacts so Airflow DAGs and check scripts stay aligned with the catalog.
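To make the upstream and downstream queries concrete, here is a minimal sketch of how dependencies could be resolved from source-to-target edges with a breadth-first traversal. The dataset names and the in-memory edge list are invented for illustration; the actual catalog stores and queries lineage through its own API.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Index (source, target) edges in both directions."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(start, adjacency):
    """Return every dataset reachable from `start` by following `adjacency`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical lineage edges, as if captured when pipelines were registered.
edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "analytics.orders"),
    ("staging.orders", "analytics.revenue"),
    ("raw.customers", "analytics.revenue"),
]
upstream, downstream = build_graph(edges)

# Datasets affected by a change to staging.orders.
impacted = reachable("staging.orders", downstream)
# Datasets that analytics.revenue depends on.
dependencies = reachable("analytics.revenue", upstream)
```

Here `impacted` contains the two analytics datasets, which is the set to review before changing the `staging.orders` transform.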
Define SQL and script-based checks in your pipeline spec. Generated test scripts execute against registered connections and post pass/fail results to the catalog, linked to the run that triggered them.
Review row counts, sample rows, and failure details in run history, so data engineers and stakeholders can confirm data health after every execution.
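The following sketch shows one way a SQL check could produce a pass/fail result with a row count and sample rows. It assumes a convention where the check query selects violating rows, so zero rows means the check passes. The `run_check` helper, the result field names, and the in-memory SQLite table are hypothetical stand-ins for a registered connection and the catalog's result format.

```python
import sqlite3


def run_check(conn, name, sql, sample_limit=5):
    """Run a check whose SQL selects violating rows; zero rows means pass."""
    rows = conn.execute(sql).fetchall()
    return {
        "check": name,
        "passed": len(rows) == 0,
        "failing_row_count": len(rows),
        "sample_rows": rows[:sample_limit],
    }


# In-memory SQLite stands in for a registered connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, -5.0), (3, 7.5)],
)

result = run_check(
    conn,
    "amount_non_negative",
    "SELECT id, amount FROM orders WHERE amount < 0",
)
```

In this example the check fails with one violating row, and that row is kept as a sample so the failure can be inspected in run history.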
Spin up the API locally, generate the example orders_sync pipeline, and register it in under five minutes.