DataXPipe

Built for observable data pipelines

DataXPipe combines code generation with a live metadata layer so your team never loses track of what runs, where data flows, or whether quality gates passed.

Pipeline catalog

Register pipelines, datasets, and connections in a central metadata store. Query specs, owners, schedules, and environment tags from a versioned catalog API.

End-to-end lineage

Capture source-to-target edges as pipelines are registered. Trace upstream and downstream dependencies for any dataset to understand blast radius before you change a transform.

Quality checks

Attach SQL and script-based checks to pipeline runs. Store pass/fail results with row counts and sample rows so stakeholders can verify data health after every execution.

Spec-driven generation

Validate YAML or JSON specs against a JSON Schema, then generate deploy-ready Airflow DAGs, SQL transforms, test scripts, and metadata bundles.

Run history & observability

Every pipeline run records status, timing, row counts, and linked check results. Prometheus metrics and structured logging integrate with your existing monitoring stack.

Multi-tenant & RBAC

Organizations get isolated API keys, plan-based limits, and role-aware permissions. Platform and admin roles control production deployments and sensitive operations.

Central pipeline catalog

Register pipeline metadata through the REST API: sources, targets, connections, schedules, and ownership. The catalog persists specs so downstream tools, dashboards, and operators always query a single source of truth.

List pipelines, fetch full specs, and attach run events as jobs complete. RBAC ensures only platform and admin roles can deploy to production environments.

  • REST API at /api/v1/pipelines
  • Dataset and connection registry
  • Environment-aware deployment controls
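
As a rough illustration of registration, the sketch below builds (but does not send) a POST request to the /api/v1/pipelines endpoint. Only that path comes from this page; the host, the X-API-Key header, and every field name in the payload are assumptions made for the example and may not match the actual API.

```python
import json
import urllib.request

# Assumptions for illustration: the host, the header name, and the payload
# field names are hypothetical. Only /api/v1/pipelines comes from this page.
BASE_URL = "http://localhost:8000"
API_KEY = "example-key"


def build_register_request(spec: dict) -> urllib.request.Request:
    """Build, without sending, a POST request that registers a pipeline."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/pipelines",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )


# Hypothetical spec for the orders_sync example pipeline.
spec = {
    "name": "orders_sync",
    "owner": "data-platform",
    "schedule": "0 * * * *",
    "environment": "dev",
    "sources": ["raw.orders"],
    "targets": ["analytics.orders"],
}

request = build_register_request(spec)
# Passing `request` to urllib.request.urlopen would send it to a running API.
```

Keeping request construction separate from sending makes the payload easy to inspect or test before it reaches a live catalog.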

Dataset lineage graph

Lineage edges are captured when pipelines are registered, linking sources to targets through transforms. Query upstream and downstream dependencies for any dataset to assess impact before schema or logic changes.

Lineage metadata travels with generated artifacts so Airflow DAGs and check scripts stay aligned with the catalog.

  • Graph edges stored per organization
  • Query by dataset ID
  • Embedded in generated metadata bundles
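
To make the upstream and downstream queries concrete, here is a minimal in-memory sketch of the idea. It assumes lineage edges can be represented as (source, target) pairs of dataset IDs; the dataset names and the function names are invented for the example and do not describe how the catalog stores or serves its graph.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Index (source, target) edges in both directions."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(adjacency, start):
    """Breadth-first walk returning every dataset reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


# Hypothetical edges, as they might be captured at registration time.
edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "analytics.orders"),
    ("staging.orders", "analytics.revenue"),
    ("raw.customers", "analytics.revenue"),
]
upstream, downstream = build_graph(edges)

impacted = reachable(downstream, "raw.orders")        # blast radius of a change
inputs = reachable(upstream, "analytics.revenue")     # everything it depends on
```

Here `impacted` contains staging.orders, analytics.orders, and analytics.revenue, which is the set to review before changing raw.orders.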

Automated quality checks

Define SQL and script-based checks in your pipeline spec. Generated test scripts execute against registered connections and post pass/fail results to the catalog, linked to the run that triggered them.

Review row counts, sample rows, and failure details in run history, so data engineers and stakeholders can confirm data health after every execution.

  • Check results tied to run IDs
  • SQL execution against connections
  • Pass/fail status with detailed payloads
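
The sketch below shows one way a SQL check could produce a pass/fail result tied to a run ID. It is an illustration under stated assumptions: an in-memory SQLite database stands in for a registered connection, the check convention (a query that returns violating rows) is assumed, and the result field names are hypothetical rather than the catalog's actual payload.

```python
import sqlite3


def run_sql_check(conn, run_id, name, sql, sample_limit=5):
    """Run a query that selects violating rows; zero rows means the check passes."""
    rows = conn.execute(sql).fetchall()
    return {
        "run_id": run_id,                 # links the result to a pipeline run
        "check": name,
        "status": "pass" if not rows else "fail",
        "row_count": len(rows),           # number of violating rows
        "sample_rows": [list(row) for row in rows[:sample_limit]],
    }


# In-memory SQLite stands in for a registered connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, -5.0), (3, 7.5)],
)

result = run_sql_check(
    conn,
    run_id="run-001",
    name="no_negative_amounts",
    sql="SELECT id, amount FROM orders WHERE amount < 0",
)
```

With this sample data the check fails with one violating row, and `sample_rows` carries that row so a reviewer can see what went wrong without rerunning the query.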

See it in your environment

Spin up the API locally, generate the example orders_sync pipeline, and register it in under five minutes.