# Weave YAML Schema
A weave groups a collection of threads into a dependency graph. It defines which threads run, their execution order, optional shared defaults, and runtime settings.
## Top-level keys

| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `config_version` | string | yes | -- | Schema version identifier (e.g. `"1.0"`) |
| `name` | string | no | `""` | Human-readable weave name |
| `threads` | list[ThreadEntry or string] | yes | -- | Thread references. Strings are shorthand for `{ name: "<value>" }` (local name references). Use `ref` for external file references. |
| `lookups` | dict[string, Lookup] | no | null | Named lookup sources shared across threads in this weave |
| `column_sets` | dict[string, ColumnSet] | no | null | Named column sets for bulk renames from external dictionaries, shared across threads in this weave |
| `variables` | dict[string, VariableSpec] | no | null | Weave-scoped typed variables, settable by hook steps |
| `pre_steps` | list[HookStep] | no | null | Hook steps executed before any thread runs |
| `post_steps` | list[HookStep] | no | null | Hook steps executed after all threads complete |
| `defaults` | dict[string, any] | no | null | Default values cascaded into every thread in this weave. `audit_columns` and `exports` use additive merge (see the Exports guide). |
| `params` | dict[string, ParamSpec] | no | null | Typed parameter declarations scoped to this weave |
| `execution` | ExecutionConfig | no | null | Runtime settings (logging, tracing) cascaded to threads |
| `naming` | NamingConfig | no | null | Naming normalization cascaded to threads |
| `audit_templates` | dict[string, AuditTemplate] | no | null | Named audit column templates cascaded to all threads in this weave. Thread-level definitions override weave-level definitions with the same name. A flat dict[string, string] shorthand is also accepted. See the Audit Templates guide. |
| `connections` | dict[string, OneLakeConnection] | no | null | Named connection definitions cascaded to threads. Thread-level connections with the same name override weave-level ones. See the Connections guide. |
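Only `config_version` and `threads` are required, so a minimal weave can be as small as the sketch below (the thread path is illustrative):

```yaml
config_version: "1.0"
threads:
  - ref: staging/load_orders.thread
```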
## threads (ThreadEntry)
Each entry in the threads list is either a plain string (shorthand) or a
ThreadEntry object. Use ref to reference an external .thread file, or
name for inline thread definitions within the weave.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `ref` | string | no | null | Path to an external .thread file, relative to the project root. Mutually exclusive with inline `name`. |
| `name` | string | no | `""` | Thread name. Required for inline definitions; derived from the filename stem when using `ref`. |
| `as` | string | no | null | Alias overriding the thread's effective name. Required when the same `ref` appears multiple times. See the Thread Templates guide. |
| `params` | dict[string, any] | no | null | Key-value pairs injected into the thread's `${param.x}` namespace for this entry. See the Thread Templates guide. |
| `dependencies` | list[string] | no | null | Explicit upstream thread names. Merged with dependencies auto-inferred from source/target path matching. |
| `condition` | ConditionSpec | no | null | Conditional execution gate |
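The `as` and `params` fields together allow one `ref` to be instantiated several times. A sketch (the template path and parameter names are illustrative):

```yaml
threads:
  # The same template thread is referenced twice; `as` gives each
  # instance a distinct name, and `params` fills its ${param.x} namespace.
  - ref: templates/load_table.thread
    as: load_orders
    params:
      table: orders
  - ref: templates/load_table.thread
    as: load_customers
    params:
      table: customers
```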
### threads.condition (ConditionSpec)

| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `when` | string | yes | -- | Condition expression. Supports `${param.x}` references, `table_exists()`, `table_empty()`, `row_count()`, and boolean operators. |
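For example, gating a thread on the functions listed above (the boolean operator spelling here is a sketch, not a definitive reference):

```yaml
threads:
  - ref: curated/refresh_snapshot.thread
    condition:
      # Run only when the upstream table exists and is non-empty
      # (operator spelling is illustrative).
      when: "table_exists('curated.order_summary') and not table_empty('curated.order_summary')"
```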
## lookups
Named lookup data sources shared across threads. Each key is a lookup name
that threads can reference via source.lookup instead of declaring a direct
source type.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `source` | Source | yes | -- | Source definition for the lookup data |
| `materialize` | bool | no | false | Pre-read and cache/broadcast the data before thread execution |
| `strategy` | string | no | "cache" | Materialization strategy: "broadcast" (Spark broadcast join hints) or "cache" (DataFrame caching). Only meaningful when `materialize` is true. |
| `key` | list[string] | no | null | Column(s) used for matching (join key). When set together with `values`, only these columns plus the `values` columns are retained. |
| `values` | list[string] | no | null | Payload column(s) to retrieve. Requires `key` to be set. Must not overlap with `key`. |
| `filter` | string | no | null | SQL WHERE expression applied to the source before projection |
| `unique_key` | bool | no | false | Validate that the `key` columns form a unique key after filtering and projection |
| `on_failure` | string | no | "abort" | Behavior when the `unique_key` check finds duplicates: "abort" or "warn". Only valid when `unique_key` is true. |
```yaml
lookups:
  dim_product:
    source:
      type: delta
      alias: silver.dim_product
    materialize: true
    strategy: cache
    key: [product_id]
    values: [product_name, category]
    unique_key: true
  exchange_rates:
    source:
      type: delta
      alias: ref.exchange_rates
    filter: "effective_date = current_date()"
    materialize: true
    strategy: broadcast
```
Threads reference lookups via the lookup field on a source:
```yaml
# In a thread's sources block
sources:
  products:
    lookup: dim_product   # resolved from the weave's lookups map
```
## column_sets
Named column sets define externally-sourced column mappings for bulk
rename operations. Each key is a column set name that rename steps can
reference via column_set: <name>.
Column sets can be sourced from a Delta table, a YAML file, or a
runtime parameter. A column set must have exactly one of source or
param — not both.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `source` | ColumnSetSource | no | null | External source for the column mapping (Delta or YAML) |
| `param` | string | no | null | Runtime parameter name containing a dict[str, str] mapping. Mutually exclusive with `source`. |
| `on_unmapped` | string | no | "pass_through" | Behavior when a DataFrame column has no rename entry: "pass_through" or "error" |
| `on_extra` | string | no | "ignore" | Behavior when a mapping key references a column not in the DataFrame: "ignore", "warn", or "error" |
| `on_failure` | string | no | "abort" | Behavior when the source cannot be read or returns no rows: "abort", "warn", or "skip" |
### column_sets.source (ColumnSetSource)
A delta source can be addressed two ways: by registered table alias
(default Spark catalog) or by named connection plus table (resolves
through the lakehouse referenced by a connections: entry on the loom
or weave). The two modes are mutually exclusive.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `type` | string | yes | -- | Source type: "delta" or "yaml" |
| `alias` | string | cond. | null | Lakehouse table alias. One of `alias` or `connection` is required when type="delta". Mutually exclusive with `connection`. |
| `path` | string | cond. | null | File path. Required when type="yaml". |
| `connection` | string | cond. | null | Reference to a named connection defined at the loom or weave level. Only valid for type="delta". Mutually exclusive with `alias`. |
| `schema` | string | no | null | Schema name within the connection's lakehouse. Only valid alongside `connection`. |
| `table` | string | cond. | null | Table name within the connection's lakehouse. Required when `connection` is set. |
| `from_column` | string | no | "source_name" | Column containing the original (raw) column names |
| `to_column` | string | no | "target_name" | Column containing the target (friendly) column names |
| `filter` | string | no | null | SQL WHERE expression applied before collecting mappings |
```yaml
# Connection-backed Delta source — reads from a non-default lakehouse
# referenced by a loom-level connections: entry.
connections:
  ref:
    type: onelake
    workspace: ${param.workspace_id}
    lakehouse: ${param.ref_lakehouse_id}

column_sets:
  sap_dictionary:
    source:
      type: delta
      connection: ref
      schema: dictionary
      table: sap_column_renames
      from_column: raw_name
      to_column: friendly_name
      filter: "system = 'SAP'"
    on_unmapped: pass_through
    on_extra: ignore
    on_failure: abort

  # Alias-backed Delta source — resolves against the active Spark catalog
  # (the attached notebook lakehouse on Fabric).
  legacy_dictionary:
    source:
      type: delta
      alias: ref.column_dictionary

  hr_mappings:
    source:
      type: yaml
      path: mappings/hr_columns.yaml

  runtime_overrides:
    param: column_overrides
```
Rename steps reference column sets by name. A minimal sketch follows (the surrounding step shape belongs to the thread schema; only the `column_set` reference is defined here):
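```yaml
# Inside a thread definition (the step layout is hypothetical;
# column_set refers to a name from the weave's column_sets map).
steps:
  - type: rename
    column_set: sap_dictionary
```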
## pre_steps / post_steps (HookStep)
Hook steps that run before or after thread execution within a weave.
pre_steps run before any thread starts; post_steps run after all
threads complete.
Hook steps form a discriminated union with three member types, identified by
the `type` field.
### Common fields
All hook step types share these fields:
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `type` | string | yes | -- | Step type: "quality_gate", "sql_statement", or "log_message" |
| `name` | string | no | null | Optional name for telemetry and diagnostics |
| `on_failure` | string | no | phase default | Failure behavior: "abort" or "warn". Defaults to "abort" for pre_steps and "warn" for post_steps. |
### quality_gate
Runs a predefined quality check against a data source or target.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `check` | string | yes | -- | Check type: "source_freshness", "row_count_delta", "row_count", "table_exists", "expression" |
| `source` | string | cond. | null | Table alias. Required for source_freshness and table_exists. |
| `max_age` | string | cond. | null | Duration string (e.g. "24h"). Required for source_freshness. |
| `target` | string | cond. | null | Table alias. Required for row_count_delta and row_count. |
| `max_decrease_pct` | float | no | null | Maximum allowed decrease percentage for row_count_delta |
| `max_increase_pct` | float | no | null | Maximum allowed increase percentage for row_count_delta |
| `min_delta` | int | no | null | Minimum absolute row change for row_count_delta |
| `max_delta` | int | no | null | Maximum absolute row change for row_count_delta |
| `min_count` | int | cond. | null | Minimum row count for row_count. At least one of min_count or max_count is required. |
| `max_count` | int | cond. | null | Maximum row count for row_count |
| `sql` | string | cond. | null | Spark SQL boolean expression for the expression check. Required for expression. |
| `message` | string | no | null | Failure message for expression check diagnostics |
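For example, a row_count_delta gate bounding run-over-run change (the alias and thresholds are illustrative):

```yaml
post_steps:
  - type: quality_gate
    name: bound_order_delta
    check: row_count_delta
    target: silver.fact_orders
    max_decrease_pct: 5.0    # fail if the table shrank by more than 5%
    max_increase_pct: 50.0   # fail if it grew by more than 50%
```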
### sql_statement
Executes an arbitrary Spark SQL statement. Optionally captures the scalar result into a weave-scoped variable.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `sql` | string | yes | -- | Spark SQL statement to execute |
| `set_var` | string | no | null | Variable name to capture the scalar result into. The variable must be declared in `variables`. |
### log_message
Emits a log message. Supports ${var.name} placeholders resolved from
weave variables.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `message` | string | yes | -- | Message template. Supports `${var.name}` placeholders. |
| `level` | string | no | "info" | Log level: "info", "warn", "error" |
```yaml
pre_steps:
  - type: quality_gate
    name: check_source_freshness
    check: source_freshness
    source: raw.orders
    max_age: "24h"
  - type: sql_statement
    name: capture_baseline
    sql: "SELECT count(*) FROM silver.fact_orders"
    set_var: baseline_count

post_steps:
  - type: quality_gate
    name: check_row_count
    check: row_count
    target: silver.fact_orders
    min_count: 1
  - type: log_message
    message: "Pipeline complete. Baseline was ${var.baseline_count} rows."
    level: info
```
## variables (VariableSpec)
Typed scalar variables scoped to the weave. Variables can be set by
sql_statement hook steps via set_var and referenced in downstream
config as ${var.name}.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `type` | string | yes | -- | Scalar type: "string", "int", "long", "float", "double", "boolean", "timestamp", "date" |
| `default` | string \| int \| float \| bool | no | null | Default value used when no hook step sets the variable |
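For example, declaring the counter that the `capture_baseline` hook step shown earlier populates via `set_var`:

```yaml
variables:
  baseline_count:
    type: int
    default: 0   # used when no hook step sets the variable
```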
## execution (ExecutionConfig)
Cascades to all threads within the weave. Thread-level settings override these.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `log_level` | string | no | "standard" | Logging verbosity: "minimal", "standard", "verbose", "debug" |
| `trace` | bool | no | true | Collect execution spans for telemetry |
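For example:

```yaml
execution:
  log_level: verbose   # one of minimal, standard, verbose, debug
  trace: true          # collect execution spans for telemetry
```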
## naming (NamingConfig)
Cascades to all threads within the weave. Thread-level and target-level naming settings override these.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `columns` | NamingPattern | no | null | Column naming pattern |
| `tables` | NamingPattern | no | null | Table naming pattern |
| `exclude` | list[string] | no | [] | Names or glob patterns excluded from normalization |
| `on_collision` | string | no | "error" | Behavior when normalization produces duplicate names: "error" or "suffix" (appends _2, _3, etc.) |
| `reserved_words` | ReservedWordConfig | no | null | Reserved word protection for column and table names |
Supported patterns: `snake_case`, `camelCase`, `PascalCase`,
`UPPER_SNAKE_CASE`, `Title_Snake_Case`, `Title Case`, `lowercase`,
`UPPERCASE`, `kebab-case`, `none`.

**kebab-case requires backtick-quoting.** Column names containing hyphens
must be backtick-quoted in Spark SQL expressions (e.g. `` `my-column` ``).
A validation warning is emitted when kebab-case is selected.
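A sketch combining these fields (the exclude pattern is illustrative):

```yaml
naming:
  columns: snake_case
  tables: snake_case
  exclude: ["Legacy_*"]   # glob pattern left untouched by normalization
  on_collision: suffix    # duplicate results become name_2, name_3, ...
```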
### naming.reserved_words (ReservedWordConfig)
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
| `strategy` | string | no | "quote" | Collision strategy (see below) |
| `prefix` | string | no | "_" | Prefix used by the prefix strategy |
| `suffix` | string | no | "_col" | Suffix used by the suffix strategy |
| `rename_map` | dict | cond. | null | Map used by the rename strategy |
| `fallback` | string | no | "quote" | Fallback strategy for words not covered by rename_map |
| `preset` | string \| list | no | null | Reserved word list preset(s) |
| `extend` | list[string] | no | [] | Extra reserved words to add |
| `exclude` | list[string] | no | [] | Words to remove from the check |
Strategies: quote, prefix, suffix, error, rename,
revert, drop. The rename strategy requires rename_map
(non-empty dict). The drop strategy is not valid for table
names — neither as a direct strategy nor as a fallback.
See Configuration Keys for full
strategy descriptions and YAML examples.
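A minimal sketch of the rename strategy, assuming `rename_map` maps each reserved word to its replacement name:

```yaml
naming:
  reserved_words:
    strategy: rename
    rename_map:
      order: sales_order   # reserved word -> replacement (assumed shape)
    fallback: quote        # words not in rename_map are quoted instead
```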
## Complete example
```yaml
config_version: "1.0"
name: sales_pipeline

lookups:
  dim_product:
    source:
      type: delta
      alias: silver.dim_product
    materialize: true
    strategy: cache
    key: [product_id]
    values: [product_name, category]
    unique_key: true

column_sets:
  sap_dictionary:
    source:
      type: delta
      alias: ref.column_dictionary
      from_column: raw_name
      to_column: friendly_name

variables:
  baseline_count:
    type: int
    default: 0

pre_steps:
  - type: quality_gate
    name: check_orders_freshness
    check: source_freshness
    source: raw.orders
    max_age: "24h"
  - type: sql_statement
    name: capture_baseline
    sql: "SELECT count(*) FROM curated.order_summary"
    set_var: baseline_count

threads:
  - ref: staging/load_orders.thread
  - ref: staging/load_customers.thread
  - ref: curated/build_order_summary.thread
    dependencies: [load_orders, load_customers]
  - ref: curated/refresh_snapshot.thread
    dependencies: [build_order_summary]
    condition:
      when: "table_exists('curated.order_summary')"

post_steps:
  - type: quality_gate
    name: verify_row_count
    check: row_count
    target: curated.order_summary
    min_count: 1
  - type: log_message
    message: "Pipeline complete. Baseline was ${var.baseline_count} rows."
    level: info

defaults:
  write:
    mode: merge
  exports:
    - name: audit_archive
      type: parquet
      path: /archive/${thread.name}/${run.timestamp}/

params:
  run_date:
    name: run_date
    type: date
    required: true
    description: "Pipeline execution date"

execution:
  log_level: verbose
  trace: true

naming:
  columns: snake_case
  on_collision: suffix
  reserved_words:
    preset: [ansi, dax]
    strategy: prefix
    prefix: "_"
```
## Shorthand thread syntax
Thread entries can be plain strings for inline definitions that need no overrides:
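```yaml
threads:
  - load_orders     # plain strings name inline threads
  - load_customers
```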
This is equivalent to:
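```yaml
threads:
  - name: load_orders
  - name: load_customers
```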
For external file references, use ref explicitly:
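```yaml
threads:
  - ref: staging/load_orders.thread
```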