Release Notes — v1.11

Release date: April 2026

This release adds:

  • Connections — declare Fabric lakehouses by name, with context variables, for cross-lakehouse data pipelines
  • Sub-pipelines (with: block) — CTE-style intermediate transformations
  • Generated sources (date_sequence, int_sequence) — create data without external tables
  • Warp schema contracts with schema drift handling — target-boundary validation and evolution control


Connections

Declare a Fabric lakehouse once by name and reference it from sources, targets, and exports. This eliminates hardcoded abfss:// paths and workspace GUIDs.

connections:
  raw_lh:
    type: onelake
    workspace: "${fabric.workspace_id}"
    lakehouse: "${fabric.lakehouse_id}"
    default_schema: raw

sources:
  orders:
    connection: raw_lh
    table: raw_orders

target:
  connection: raw_lh
  table: fact_orders
  schema: curated

Key features:

  • Cross-lakehouse reads and writes — reference different workspaces and lakehouses by name within the same thread
  • Fabric context variables — ${fabric.workspace_id}, ${fabric.lakehouse_id}, and ${fabric.workspace_name} auto-resolve from the active Spark session at execution time
  • Schema overrides — default_schema on the connection applies to all tables; schema: on a source or target overrides it for that specific table
  • Cascade — connections defined at loom or weave level are inherited by all threads; thread-level connections override by name
  • Export support — exports reference connections for secondary output destinations
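
As an illustrative sketch of a cross-lakehouse thread (the curated_lh connection and its workspace and lakehouse values below are hypothetical), read from one lakehouse and write to another by referencing each connection by name:

connections:
  raw_lh:
    type: onelake
    workspace: "${fabric.workspace_id}"
    lakehouse: "${fabric.lakehouse_id}"
    default_schema: raw
  curated_lh:
    type: onelake
    workspace: "${fabric.workspace_id}"   # hypothetical: could point at another workspace
    lakehouse: "curated"                  # hypothetical lakehouse
    default_schema: curated

sources:
  orders:
    connection: raw_lh
    table: raw_orders

target:
  connection: curated_lh
  table: fact_orders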

See the Connections guide for full documentation.


Sub-Pipelines (with: block)

Define named sub-pipelines that run before the main thread steps. Each sub-pipeline reads from a source or another sub-pipeline, applies its own step chain, and produces a named DataFrame available to downstream steps.

sources:
  orders:
    path: Tables/stg_orders
  customers:
    path: Tables/dim_customer

with:
  active_customers:
    from: customers
    steps:
      - filter:
          expr: "is_active = true"

steps:
  - join:
      source: active_customers
      on: [{left: customer_id, right: customer_id}]
      type: inner

Key features:

  • CTE-like composition — named intermediate results without writing to storage
  • Chaining — sub-pipelines can reference earlier sub-pipelines by name
  • Full step support — each sub-pipeline accepts any step type (filter, select, join, rename, etc.)
  • Explain visibility — sub-pipelines appear in plan output with row counts and step counts
  • Telemetry — each sub-pipeline emits its own span under the thread span
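
Chaining can be sketched by pointing one sub-pipeline's from: at another. The second sub-pipeline and its signup_date filter below are illustrative, not from the release:

with:
  active_customers:
    from: customers
    steps:
      - filter:
          expr: "is_active = true"
  recent_active_customers:
    from: active_customers        # references the sub-pipeline above
    steps:
      - filter:
          expr: "signup_date >= '2024-01-01'"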

See the Sub-Pipelines guide for full documentation.


Generated Sources

Two new source types produce single-column DataFrames from a range definition — no external table required.

date_sequence

Generates one row per calendar interval over a date range.

sources:
  calendar:
    type: date_sequence
    start: "2024-01-01"
    end: "2024-12-31"
    column: date
    step: day

  • step — day (default), week, month, or year
  • Output column is DateType
  • Both start and end are inclusive
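
A minimal Python sketch of the documented semantics (inclusive bounds, calendar steps) — illustrative only, not the library's implementation, which produces a DataFrame with a DateType column:

```python
from datetime import date, timedelta

def date_sequence(start: date, end: date, step: str = "day"):
    """Yield one date per interval from start to end, both inclusive.
    Note: month/year stepping assumes start's day-of-month exists in
    every generated month (e.g. day 1)."""
    current = start
    while current <= end:
        yield current
        if step == "day":
            current += timedelta(days=1)
        elif step == "week":
            current += timedelta(weeks=1)
        elif step in ("month", "year"):
            months = 12 if step == "year" else 1
            total = current.year * 12 + (current.month - 1) + months
            current = current.replace(year=total // 12, month=total % 12 + 1)
        else:
            raise ValueError(f"unknown step: {step}")
```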

int_sequence

Generates one row per integer over a numeric range.

sources:
  ids:
    type: int_sequence
    start: 1
    end: 10000
    column: id
    step: 1

  • step — any positive integer (default 1)
  • Output column is LongType
  • Both start and end are inclusive
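
The inclusive-end semantics differ from Python's range, which excludes its stop value. A sketch of the equivalent (illustrative only — the actual source yields a LongType column):

```python
def int_sequence(start: int, end: int, step: int = 1) -> list[int]:
    """One value per step from start to end, both inclusive."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(start, end + 1, step))
```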

Generated sources are mutually exclusive with alias, path, connection, lookup, options, and dedup.


Warp Schema Contracts

Warps are typed configuration files (.warp) that declare the intended shape of a target table — columns, types, nullability, and keys. They enable contract validation, pre-initialization, stub columns, and documentation independent of pipeline execution.

# dim_customer.warp
config_version: "1.0"
columns:
  - name: customer_sk
    type: bigint
    nullable: false
  - name: customer_id
    type: string
    nullable: false
  - name: customer_name
    type: string
  - name: email
    type: string
    default: "unknown"
keys:
  surrogate: customer_sk
  business: [customer_id]

Warp enforcement

Validates pipeline output against the declared contract.

target:
  alias: gold.dim_customer
  warp: dim_customer
  warp_enforcement: enforce

Three modes:

  • warn (default) — log findings, continue execution
  • enforce — fail the thread on any violation
  • off — skip validation

Validation checks for missing columns, type mismatches, and nullability violations. Findings include column-level detail in both log messages and the WarpEnforcementError payload.

Warp-only columns

Columns declared in the warp but absent from pipeline output are appended with their declared default value (or null). This enables stub columns for downstream consumers without requiring the pipeline to produce them.
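
A sketch of this fill behavior, with plain dicts standing in for DataFrame rows (illustrative only, not the library's implementation):

```python
def apply_warp_only_columns(rows, warp_columns):
    """Append columns declared in the warp but absent from the output,
    filled with the declared default (or None)."""
    out = []
    for row in rows:
        filled = dict(row)
        for col in warp_columns:
            filled.setdefault(col["name"], col.get("default"))
        out.append(filled)
    return out
```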

Pre-initialization

Set warp_init: true to create the Delta table from the effective warp before the first pipeline run. Solves the seed-row chicken-and-egg problem for dimension targets.
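
In config terms, a minimal sketch using the keys documented above:

target:
  alias: gold.dim_customer
  warp: dim_customer
  warp_init: true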

Auto-generation

Set warp_mode: auto to write or update the .warp file after each successful run. The generated file carries auto_generated: true as a marker.

See the Warps guide for full documentation.


Schema Drift Handling

Configurable behavior for extra columns that reach the target boundary after upstream source changes. Operates with or without a warp.

target:
  alias: gold.dim_customer
  schema_drift: strict
  on_drift: warn

Drift modes

  • lenient (default) — extra columns pass through
  • strict — extra columns handled per on_drift
  • adaptive — extra columns pass through and extend auto-generated warps

on_drift severity (strict mode)

  • error — thread fails with SchemaDriftError
  • warn (default) — extra columns dropped with a warning
  • ignore — extra columns dropped silently

Drift baseline

When a warp exists, the warp column list is the baseline. Without a warp, the existing Delta table schema is used. On first write with no table, drift detection is skipped.
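
The baseline selection and set-difference check can be sketched as follows (illustrative only):

```python
def detect_drift(output_columns, warp_columns=None, table_columns=None):
    """Return extra columns at the target boundary.
    Baseline is the warp column list when a warp exists, else the
    existing Delta table schema; with neither (first write), skip."""
    baseline = warp_columns if warp_columns is not None else table_columns
    if baseline is None:
        return []
    allowed = set(baseline)
    return [c for c in output_columns if c not in allowed]
```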

Adaptive + auto

When schema_drift: adaptive and warp_mode: auto are both active, new pipeline columns pass through to Delta and are added to the auto-generated warp with discovered: true.
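
An illustrative target configuration combining the two settings:

target:
  alias: gold.dim_customer
  warp: dim_customer
  warp_mode: auto
  schema_drift: adaptive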

Cascade

All three settings (schema_drift, on_drift, warp_enforcement) cascade through defaults.target.* at loom, weave, and thread levels.
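
For example, a loom- or weave-level defaults block might pin strict behavior for every thread (values illustrative):

defaults:
  target:
    schema_drift: strict
    on_drift: error
    warp_enforcement: enforce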

Observability

Drift events are captured regardless of mode:

  • Telemetry — drift_detected, drift_columns, drift_mode, drift_action_taken on ThreadTelemetry
  • Notebook output — drift report and warp findings sections in _repr_html_()

See the Schema Drift guide for full documentation.


Other Improvements

  • JSON Schema and LLM discoverability — all Pydantic models export JSON Schema with Field(description=...) for LLM tooling. Schema files at docs/schema/.
  • New error types — WarpEnforcementError and SchemaDriftError carry structured payloads (findings, drift_report) for programmatic error handling.
  • Plan mode warp summary — plan/explain output shows warp configuration per thread when non-default settings are present.