Exports (Secondary Outputs)

Exports let a thread write its output data to additional locations beyond the primary Delta target. This avoids duplicating thread pipelines for common patterns like Parquet archiving, CSV extracts for non-Spark systems, or compliance copies.

Overview

Exports write the same DataFrame that goes to the primary target, after mapping and with audit columns applied. They are declared under the exports: key at the thread, weave, or loom level, and they run sequentially once the primary write, watermark persistence, and post-write assertions have completed.

Configuration

exports:
  - name: parquet_archive
    description: "Daily archive for external consumers"
    type: parquet
    path: /lakehouse/archive/${thread.name}/${run.timestamp}/
    partition_by: [region, date]
    on_failure: warn
    options:
      compression: snappy

  - name: compliance_copy
    type: delta
    alias: compliance.orders_archive
    on_failure: abort

Fields

| Field | Required | Default | Description |
|---|---|---|---|
| name | Yes | | Unique identifier (valid Python identifier). |
| type | Yes | | Output format: delta, parquet, csv, json, orc. |
| path | Conditional | | OneLake path. Required when alias is not set. Supports variables. |
| alias | Conditional | | Metastore alias. Delta type only, mutually exclusive with path. |
| description | No | | Human-readable label shown in explain() output. |
| mode | No | overwrite | Write mode. Only overwrite in v1. |
| partition_by | No | | Partition columns, independent of primary target. |
| on_failure | No | warn | abort fails the thread; warn logs and continues. |
| enabled | No | true | Set to false to suppress an inherited export. |
| options | No | | Format-specific Spark DataFrameWriter options. |
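
The field rules above can be read as a small set of checks on a fully resolved export entry. The sketch below is illustrative only: validate_export and VALID_TYPES are hypothetical names, not part of the documented API, and the real validation may differ in details such as error wording.

```python
# Hypothetical helper: checks one resolved export entry against the field table.
VALID_TYPES = {"delta", "parquet", "csv", "json", "orc"}


def validate_export(cfg: dict) -> list:
    """Return a list of problems found in one export entry (empty if valid)."""
    errors = []

    # name must be usable as a Python identifier
    name = cfg.get("name")
    if not isinstance(name, str) or not name.isidentifier():
        errors.append("name must be a valid Python identifier")

    export_type = cfg.get("type")
    if export_type not in VALID_TYPES:
        errors.append(f"type must be one of {sorted(VALID_TYPES)}")

    # path and alias: exactly one must be set
    has_path = bool(cfg.get("path"))
    has_alias = bool(cfg.get("alias"))
    if has_path and has_alias:
        errors.append("path and alias are mutually exclusive")
    elif not has_path and not has_alias:
        errors.append("either path or alias is required")

    # alias is only meaningful for Delta exports
    if has_alias and export_type != "delta":
        errors.append("alias is only valid for type delta")

    # defaults from the table: mode=overwrite, on_failure=warn
    if cfg.get("mode", "overwrite") != "overwrite":
        errors.append("only mode 'overwrite' is supported in v1")
    if cfg.get("on_failure", "warn") not in {"warn", "abort"}:
        errors.append("on_failure must be 'warn' or 'abort'")

    return errors
```

An entry that only suppresses an inherited export (name plus enabled: false) would not pass these checks, so a check like this would presumably apply after inherited exports have been resolved.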

Format Notes

  • Delta: use alias for a metastore-registered table or path for a direct OneLake path. Exactly one must be set. The table is created automatically on first write.
  • Parquet: use options.compression for compression codec (snappy, gzip, etc.).
  • CSV: common options include header, delimiter, quote, escape.
  • JSON: writes one JSON object per line by default.
  • ORC: Hive-compatible columnar format.

Complex types (arrays, maps, structs) may not be supported by flat-file formats (CSV, JSON). If a format cannot serialize the schema, the export fails and follows the on_failure behavior.

Dynamic Path Naming

Export paths support context variables for archive-style outputs with unique-per-run directories:

  • ${run.timestamp}: ISO 8601 UTC timestamp of execution start
  • ${run.id}: UUID4, unique per execution
  • ${thread.name}, ${weave.name}, ${loom.name}: config hierarchy names

path: /archive/${loom.name}/${thread.name}/${run.timestamp}/

These variables are resolved at execution time, not at config load time.
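
To make execution-time resolution concrete, here is a minimal sketch of how the ${...} placeholders could be substituted once a run starts. The function names build_context and resolve_path are hypothetical, and the exact timestamp format is an assumption (the source only says ISO 8601 UTC).

```python
import re
import uuid
from datetime import datetime, timezone

# Matches placeholders such as ${run.timestamp} or ${thread.name}
_VAR = re.compile(r"\$\{([a-z_]+\.[a-z_]+)\}")


def build_context(thread: str, weave: str, loom: str) -> dict:
    """Build the variable context when execution starts (hypothetical helper)."""
    return {
        # Assumption: isoformat() output; the real format may differ
        "run.timestamp": datetime.now(timezone.utc).isoformat(),
        "run.id": str(uuid.uuid4()),
        "thread.name": thread,
        "weave.name": weave,
        "loom.name": loom,
    }


def resolve_path(template: str, context: dict) -> str:
    """Replace every ${var} in the template, failing on unknown variables."""
    def substitute(match):
        key = match.group(1)
        if key not in context:
            raise KeyError(f"unknown path variable: ${{{key}}}")
        return context[key]

    return _VAR.sub(substitute, template)
```

Because the context is built per run rather than at config load, two runs of the same thread resolve the same template to different directories.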

Cascade Inheritance

Exports cascade additively through loom → weave → thread:

  • Each level can define exports; all are collected by name.
  • Same-named exports at a lower level override the higher-level definition.
  • enabled: false at any level suppresses an inherited export.

# loom defaults: apply to all threads
defaults:
  exports:
    - name: audit_archive
      type: parquet
      path: /archive/${thread.name}/${run.timestamp}/

# thread level: suppress the inherited archive
exports:
  - name: audit_archive
    enabled: false
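
The cascade rules can be sketched as a merge keyed by export name. This is an illustration under stated assumptions, not the actual resolver: resolve_exports is a hypothetical name, and the sketch assumes a same-named lower-level entry replaces the inherited definition wholesale rather than merging field by field.

```python
def resolve_exports(loom: list, weave: list, thread: list) -> list:
    """Collect exports from loom -> weave -> thread and drop disabled ones."""
    merged = {}
    for level in (loom, weave, thread):
        for export in level:
            # Assumption: a lower-level entry with the same name replaces
            # the inherited definition entirely.
            merged[export["name"]] = export
    # enabled defaults to true; enabled: false suppresses the export
    return [e for e in merged.values() if e.get("enabled", True)]
```

With the configuration above, the loom-level audit_archive is collected first, then replaced by the thread-level entry with enabled: false, so the thread ends up with no audit_archive export.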

Error Handling

| on_failure | Behavior |
|---|---|
| warn (default) | Log warning, record error in telemetry, continue. Thread succeeds. |
| abort | Record error, raise ExportError. Thread status: failure. Remaining exports skipped. |

Exports only run after a successful primary write. If the primary write fails, no exports execute.
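
The warn and abort behaviors can be sketched as a sequential loop over the resolved exports. ExportError and ExportResult are names taken from this page, but their fields, the run_exports function, and the write callback are assumptions made for illustration.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("exports")


class ExportError(Exception):
    """Raised when an export with on_failure: abort fails (shape assumed)."""

    def __init__(self, message: str, results: list):
        super().__init__(message)
        self.results = results  # results recorded up to and including the failure


@dataclass
class ExportResult:
    name: str
    status: str  # "success" or "failure"
    error: Optional[str] = None


def run_exports(exports: list, write: Callable[[dict], None]) -> list:
    """Run exports in order; intended to be called only after the primary write succeeded."""
    results = []
    for export in exports:
        try:
            write(export)
            results.append(ExportResult(export["name"], "success"))
        except Exception as exc:
            results.append(ExportResult(export["name"], "failure", str(exc)))
            if export.get("on_failure", "warn") == "abort":
                # abort: stop here, remaining exports are skipped
                raise ExportError(
                    f"export {export['name']!r} failed: {exc}", results
                ) from exc
            # warn: log and move on to the next export
            logger.warning("export %s failed: %s", export["name"], exc)
    return results
```

Attaching the partial results to the exception is one way to keep telemetry for the exports that ran before the abort; the actual mechanism is not described in the source.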

Observability

  • explain(): lists exports with name, type, and target path before execution.
  • summary(): shows per-export results with row count and status.
  • Flow SVG: export nodes appear as destinations fanning out from the preparation stage.
  • Sankey waterfall: export bands shown alongside the target band.
  • Telemetry: each export produces an ExportResult on ThreadTelemetry.