
Configuration Keys

This page provides a consolidated reference of every configuration field across all weevr models. Fields are grouped by the model they belong to.

For YAML syntax and complete examples, see the schema reference pages: Thread, Weave, and Loom.

Diagram: relationships between the Thread model and Source, Target, WriteConfig, LoadConfig, KeyConfig, ValidationRule, and FailureConfig, grouped into Data Flow and Behavior.

Thread

The top-level configuration unit for a single data pipeline.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `config_version` | `str` | required | Schema version identifier |
| `name` | `str` | `""` | Thread name |
| `sources` | `dict[str, Source]` | required | Named data sources |
| `steps` | `list[Step]` | `[]` | Ordered transformation pipeline |
| `target` | `Target` | required | Output destination |
| `write` | `WriteConfig` | `None` | Write mode and merge settings |
| `keys` | `KeyConfig` | `None` | Key management configuration |
| `validations` | `list[ValidationRule]` | `None` | Pre-write data quality rules |
| `assertions` | `list[Assertion]` | `None` | Post-execution assertions |
| `load` | `LoadConfig` | `None` | Incremental load settings |
| `tags` | `list[str]` | `None` | Free-form tags |
| `params` | `dict[str, ParamSpec]` | `None` | Parameter declarations |
| `defaults` | `dict[str, Any]` | `None` | Inherited defaults |
| `failure` | `FailureConfig` | `None` | Failure handling policy |
| `execution` | `ExecutionConfig` | `None` | Runtime settings |
| `cache` | `bool` | `None` | Cache DataFrame before writing |

Source

A data source declaration referenced by alias within a thread.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `type` | `str` | required | Source type: `delta`, `csv`, `json`, `parquet`, `excel` |
| `alias` | `str` | `None` | Table alias (required for `delta`) |
| `path` | `str` | `None` | File path (required for file-based types) |
| `options` | `dict[str, Any]` | `{}` | Spark reader options |
| `dedup` | `DedupConfig` | `None` | Post-read deduplication |
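The alias and path requirements above can be checked before any data is read. The following is a minimal sketch, assuming sources arrive as plain dictionaries parsed from YAML and that every non-`delta` type counts as file-based; `validate_source` is a hypothetical helper, not part of the weevr API.

```python
# Hypothetical sketch: validate_source is illustrative, not part of weevr.
FILE_TYPES = {"csv", "json", "parquet", "excel"}  # assumed file-based types
SUPPORTED_TYPES = FILE_TYPES | {"delta"}


def validate_source(name: str, source: dict) -> list[str]:
    """Return the problems found in one source declaration (empty if valid)."""
    errors = []
    source_type = source.get("type")
    if source_type not in SUPPORTED_TYPES:
        errors.append(f"{name}: unsupported type {source_type!r}")
    elif source_type == "delta" and not source.get("alias"):
        errors.append(f"{name}: 'alias' is required for delta sources")
    elif source_type in FILE_TYPES and not source.get("path"):
        errors.append(f"{name}: 'path' is required for {source_type} sources")
    return errors


print(validate_source("orders", {"type": "delta", "alias": "bronze.orders"}))  # []
print(validate_source("rates", {"type": "csv"}))
```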

DedupConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `keys` | `list[str]` | required | Deduplication grouping columns |
| `order_by` | `str` | `None` | Sort expression for row selection |
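Deduplication groups rows by `keys` and uses `order_by` to decide which row survives. This sketch works on in-memory rows and assumes the first row per key after sorting is kept, mirroring the common Spark pattern `row_number() OVER (PARTITION BY keys ORDER BY ...) = 1`. A Python key function stands in for the `order_by` sort expression string; `dedup` is a hypothetical name.

```python
# Hypothetical sketch of post-read deduplication on in-memory rows. Assumes the
# first row per key after sorting is kept; a Python key function stands in for
# the order_by sort expression.
from typing import Any, Callable, Optional

Row = dict[str, Any]


def dedup(rows: list[Row], keys: list[str],
          order_by: Optional[Callable[[Row], Any]] = None) -> list[Row]:
    # sorted() is stable, so ties keep their input order.
    ordered = sorted(rows, key=order_by) if order_by else list(rows)
    seen: set[tuple] = set()
    kept: list[Row] = []
    for row in ordered:
        group = tuple(row[k] for k in keys)
        if group not in seen:  # first row in sort order wins
            seen.add(group)
            kept.append(row)
    return kept


rows = [{"id": 1, "version": 1}, {"id": 1, "version": 3}, {"id": 2, "version": 2}]
latest = dedup(rows, ["id"], order_by=lambda r: -r["version"])  # newest version first
print(latest)  # [{'id': 1, 'version': 3}, {'id': 2, 'version': 2}]
```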

Target

Write destination with column mapping and partitioning.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `alias` | `str` | `None` | Lakehouse table alias |
| `path` | `str` | `None` | File path for file-based targets |
| `mapping_mode` | `"auto" \| "explicit"` | `"auto"` | Column mapping strategy |
| `columns` | `dict[str, ColumnMapping]` | `None` | Per-column mapping |
| `partition_by` | `list[str]` | `None` | Partition columns |
| `audit_template` | `str` | `None` | Audit column template name |
| `naming` | `NamingConfig` | `None` | Naming normalization |

ColumnMapping

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `expr` | `SparkExpr` | `None` | Expression for column value (mutually exclusive with `drop`) |
| `type` | `str` | `None` | Target data type for casting |
| `default` | `Any` | `None` | Default when source is null |
| `drop` | `bool` | `False` | Drop column from output (mutually exclusive with `expr`) |
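Two of the `ColumnMapping` rules, the `expr`/`drop` exclusivity and null replacement via `default`, can be illustrated without Spark. This is a minimal sketch on dictionary rows; `check_mapping` and `apply_mappings` are hypothetical helpers, and evaluating `expr` or casting to `type` is deliberately left out.

```python
# Hypothetical sketch of two ColumnMapping rules on in-memory rows. Evaluating
# `expr` and casting to `type` require Spark and are not shown.
from typing import Any


def check_mapping(column: str, mapping: dict[str, Any]) -> None:
    """Reject mappings that set both expr and drop."""
    if mapping.get("expr") is not None and mapping.get("drop", False):
        raise ValueError(f"{column}: 'expr' and 'drop' are mutually exclusive")


def apply_mappings(row: dict[str, Any], columns: dict[str, dict[str, Any]]) -> dict[str, Any]:
    out = dict(row)
    for column, mapping in columns.items():
        check_mapping(column, mapping)
        if mapping.get("drop", False):
            out.pop(column, None)  # remove the column from the output
        elif out.get(column) is None and mapping.get("default") is not None:
            out[column] = mapping["default"]  # fill nulls with the default
    return out


mappings = {"region": {"default": "UNKNOWN"}, "tmp_flag": {"drop": True}}
print(apply_mappings({"id": 1, "region": None, "tmp_flag": "x"}, mappings))
# {'id': 1, 'region': 'UNKNOWN'}
```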

WriteConfig

Write mode and merge behavior for the target.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `mode` | `"overwrite" \| "append" \| "merge"` | `"overwrite"` | Write strategy |
| `match_keys` | `list[str]` | `None` | Merge match keys (required when `mode` is `merge`) |
| `on_match` | `"update" \| "ignore"` | `"update"` | Action when a source row matches a target row |
| `on_no_match_target` | `"insert" \| "ignore"` | `"insert"` | Action for source rows with no matching target row |
| `on_no_match_source` | `"delete" \| "soft_delete" \| "ignore"` | `"ignore"` | Action for target rows with no matching source row |
| `soft_delete_column` | `str` | `None` | Column that receives the soft-delete flag (required for `soft_delete`) |
| `soft_delete_value` | `str` | `"true"` | Value written to the soft-delete column |
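The three merge actions combine as follows. This is a minimal in-memory sketch of the semantics described in the table, not weevr's implementation; against a Delta table the same logic would typically be expressed as a `MERGE` statement. The function `merge` and its keyword names simply mirror the config fields and are assumptions.

```python
# Hypothetical sketch of WriteConfig merge semantics on in-memory rows.
# Not weevr's implementation; against a Delta table this maps to a MERGE.
from typing import Any, Optional

Row = dict[str, Any]


def merge(target: list[Row], source: list[Row], match_keys: list[str],
          on_match: str = "update", on_no_match_target: str = "insert",
          on_no_match_source: str = "ignore",
          soft_delete_column: Optional[str] = None,
          soft_delete_value: str = "true") -> list[Row]:
    def key(row: Row) -> tuple:
        return tuple(row[k] for k in match_keys)

    source_by_key = {key(r): r for r in source}
    result: list[Row] = []
    for row in target:
        match = source_by_key.get(key(row))
        if match is not None:
            # Matched: take source values ("update") or keep the target row ("ignore").
            result.append({**row, **match} if on_match == "update" else row)
        elif on_no_match_source == "delete":
            continue  # target row absent from source: remove it
        elif on_no_match_source == "soft_delete":
            if soft_delete_column is None:
                raise ValueError("soft_delete requires soft_delete_column")
            result.append({**row, soft_delete_column: soft_delete_value})
        else:
            result.append(row)  # "ignore": keep the target row unchanged
    if on_no_match_target == "insert":
        target_keys = {key(r) for r in target}
        result.extend(r for r in source if key(r) not in target_keys)
    return result


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 1, "v": "A"}, {"id": 3, "v": "C"}]
print(merge(target, source, ["id"], on_no_match_source="soft_delete",
            soft_delete_column="is_deleted"))
```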

LoadConfig

Incremental load mode and watermark tracking.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `mode` | `"full" \| "incremental_watermark" \| "incremental_parameter" \| "cdc"` | `"full"` | Load strategy |
| `watermark_column` | `str` | `None` | Column for watermark comparison |
| `watermark_type` | `"timestamp" \| "date" \| "int" \| "long"` | `None` | Watermark column data type |
| `watermark_inclusive` | `bool` | `False` | Include rows equal to last watermark |
| `watermark_store` | `WatermarkStoreConfig` | `None` | Watermark persistence backend |
| `cdc` | `CdcConfig` | `None` | CDC configuration |
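In `incremental_watermark` mode, each run loads only rows past the last stored watermark, and `watermark_inclusive` decides whether rows equal to it are re-read. This sketch assumes the next watermark is the maximum value among the loaded rows and leaves reading and persisting it (`watermark_store`) out of scope; `select_increment` is a hypothetical name.

```python
# Hypothetical sketch of incremental_watermark filtering; reading and
# persisting the watermark (watermark_store) is out of scope here.
from datetime import date
from typing import Any, Optional


def select_increment(rows: list[dict[str, Any]], watermark_column: str,
                     last_watermark: Optional[Any], inclusive: bool = False):
    """Return (rows to load, watermark to persist after a successful run)."""
    if last_watermark is None:
        selected = list(rows)  # no stored watermark yet: load everything
    elif inclusive:
        selected = [r for r in rows if r[watermark_column] >= last_watermark]
    else:
        selected = [r for r in rows if r[watermark_column] > last_watermark]
    new_watermark = max((r[watermark_column] for r in selected), default=last_watermark)
    return selected, new_watermark


rows = [{"id": i, "updated": date(2024, 1, i)} for i in (1, 2, 3)]
batch, wm = select_increment(rows, "updated", date(2024, 1, 2))
print([r["id"] for r in batch], wm)  # [3] 2024-01-03
```

Inclusive comparison re-reads rows that share the last watermark value, so it is typically paired with an idempotent write such as `merge`.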

WatermarkStoreConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `type` | `"table_properties" \| "metadata_table"` | `"table_properties"` | Storage backend |
| `table_path` | `str` | `None` | Path for metadata table (required for `metadata_table`) |

CdcConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `preset` | `"delta_cdf"` | `None` | CDC preset (mutually exclusive with `operation_column`) |
| `operation_column` | `str` | `None` | Operation type column (mutually exclusive with `preset`) |
| `insert_value` | `str` | `None` | Value indicating insert |
| `update_value` | `str` | `None` | Value indicating update |
| `delete_value` | `str` | `None` | Value indicating delete |
| `on_delete` | `"hard_delete" \| "soft_delete"` | `"hard_delete"` | Delete handling strategy |
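A CDC configuration either names a preset or spells out the operation column and its values. The sketch below shows one way to resolve both forms and route rows by operation. The expansion of `delta_cdf` is an assumption based on the `_change_type` column that Delta Lake's Change Data Feed emits (`insert`, `update_preimage`, `update_postimage`, `delete`); `resolve_cdc` and `split_changes` are hypothetical helpers.

```python
# Hypothetical sketch of CdcConfig handling; not weevr's implementation.
from typing import Any

# Assumed expansion of the delta_cdf preset, based on the _change_type column
# produced by Delta Lake's Change Data Feed (update_preimage rows are skipped).
DELTA_CDF = {
    "operation_column": "_change_type",
    "insert_value": "insert",
    "update_value": "update_postimage",
    "delete_value": "delete",
}


def resolve_cdc(cfg: dict[str, Any]) -> dict[str, Any]:
    preset, op_column = cfg.get("preset"), cfg.get("operation_column")
    if preset and op_column:
        raise ValueError("preset and operation_column are mutually exclusive")
    if preset == "delta_cdf":
        resolved = dict(DELTA_CDF)
    elif preset is not None:
        raise ValueError(f"unknown CDC preset {preset!r}")
    elif op_column:
        resolved = {k: cfg.get(k) for k in DELTA_CDF}
    else:
        # Assumption: one of the two must be set for CDC to work at all.
        raise ValueError("set either preset or operation_column")
    resolved["on_delete"] = cfg.get("on_delete", "hard_delete")
    return resolved


def split_changes(rows: list[dict[str, Any]], cdc: dict[str, Any]) -> dict[str, list]:
    """Group rows into inserts, updates, and deletes; unknown operations are dropped."""
    groups: dict[str, list] = {"insert": [], "update": [], "delete": []}
    for row in rows:
        op = row.get(cdc["operation_column"])
        for kind, bucket in groups.items():
            value = cdc.get(f"{kind}_value")
            if value is not None and op == value:
                bucket.append(row)
                break
    return groups


cdc = resolve_cdc({"preset": "delta_cdf"})
changes = [{"id": 1, "_change_type": "insert"}, {"id": 2, "_change_type": "update_preimage"},
           {"id": 2, "_change_type": "update_postimage"}, {"id": 3, "_change_type": "delete"}]
print({kind: [r["id"] for r in rows] for kind, rows in split_changes(changes, cdc).items()})
```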

KeyConfig

Business key, surrogate key, and change detection settings.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `business_key` | `list[str]` | `None` | Natural business key columns |
| `surrogate_key` | `SurrogateKeyConfig` | `None` | Surrogate key generation |
| `change_detection` | `ChangeDetectionConfig` | `None` | Change detection hash |

SurrogateKeyConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | required | Output column name |
| `algorithm` | `"sha256" \| "md5"` | `"sha256"` | Hash algorithm |

ChangeDetectionConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | required | Output column name |
| `columns` | `list[str]` | required | Columns included in the hash |
| `algorithm` | `"md5" \| "sha256"` | `"md5"` | Hash algorithm |
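Both the surrogate key and the change detection column are hashes over a list of column values. This sketch assumes the surrogate key hashes the `business_key` columns (the table does not say which columns it uses) and picks an explicit separator and null marker so that `("ab", "c")` and `("a", "bc")`, or `None` and `""`, hash differently. `hash_columns` is hypothetical, and weevr's actual hash inputs may differ.

```python
# Hypothetical sketch: the column order, separator, and null marker used here
# are assumptions and may not match the hashes weevr actually produces.
import hashlib
from typing import Any

SEPARATOR = "\x1f"  # ASCII unit separator, unlikely to appear in data
NULL_MARKER = "\\N"  # distinguishes None from the empty string


def hash_columns(row: dict[str, Any], columns: list[str], algorithm: str) -> str:
    """Hash the listed column values in order with hashlib (e.g. "sha256", "md5")."""
    parts = [NULL_MARKER if row[c] is None else str(row[c]) for c in columns]
    return hashlib.new(algorithm, SEPARATOR.join(parts).encode("utf-8")).hexdigest()


row = {"customer_id": 42, "country": "NL", "name": "Ada", "email": None}
surrogate = hash_columns(row, ["customer_id", "country"], "sha256")  # from business_key
change_hash = hash_columns(row, ["name", "email"], "md5")  # from change_detection.columns
print(len(surrogate), len(change_hash))  # 64 32
```

A change to `email` alters the change detection hash but leaves the surrogate key stable, which is what lets the two columns serve different purposes.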

ValidationRule

Pre-write data quality rules.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | required | Rule name |
| `rule` | `SparkExpr` | required | Spark SQL boolean expression |
| `severity` | `"info" \| "warn" \| "error" \| "fatal"` | `"error"` | Failure severity |

Assertion

Post-execution assertions on the target dataset.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `type` | `"row_count" \| "column_not_null" \| "unique" \| "expression"` | required | Assertion type |
| `severity` | `"info" \| "warn" \| "error" \| "fatal"` | `"warn"` | Failure severity |
| `columns` | `list[str]` | `None` | Columns for `column_not_null` and `unique` |
| `min` | `int` | `None` | Minimum for `row_count` |
| `max` | `int` | `None` | Maximum for `row_count` |
| `expression` | `SparkExpr` | `None` | Expression for `expression` type |
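Three of the four assertion types can be evaluated on plain rows. This sketch returns a failure message or `None` and leaves severity handling to the caller; it treats the `columns` of a `unique` assertion as one composite key, which is an assumption, and it does not cover `expression`, which needs Spark. `check_assertion` is a hypothetical name.

```python
# Hypothetical sketch of row_count, column_not_null, and unique assertions on
# in-memory rows; expression assertions need Spark and are not covered.
from typing import Any, Optional


def check_assertion(rows: list[dict[str, Any]], spec: dict[str, Any]) -> Optional[str]:
    """Return a failure message, or None if the assertion passes."""
    kind, columns = spec["type"], spec.get("columns") or []
    if kind == "row_count":
        n = len(rows)
        lo, hi = spec.get("min"), spec.get("max")
        if (lo is not None and n < lo) or (hi is not None and n > hi):
            return f"row_count {n} outside [{lo}, {hi}]"
    elif kind == "column_not_null":
        for c in columns:
            nulls = sum(1 for r in rows if r.get(c) is None)
            if nulls:
                return f"{c} has {nulls} null value(s)"
    elif kind == "unique":
        # Assumption: the listed columns form one composite key.
        keys = [tuple(r.get(c) for c in columns) for r in rows]
        if len(keys) != len(set(keys)):
            return f"duplicate values in {columns}"
    else:
        raise NotImplementedError(f"{kind} is not covered by this sketch")
    return None


rows = [{"id": 1, "email": "a"}, {"id": 2, "email": None}, {"id": 2, "email": "c"}]
print(check_assertion(rows, {"type": "unique", "columns": ["id"]}))
```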

ExecutionConfig

Runtime settings that cascade through loom/weave/thread.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `log_level` | `"minimal" \| "standard" \| "verbose" \| "debug"` | `"standard"` | Logging verbosity |
| `trace` | `bool` | `True` | Collect execution spans |
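Cascading means a setting declared on a loom applies to its weaves and threads unless a more specific level overrides it. This sketch assumes the most specific level wins and that a missing or `None` field falls through to the broader level; `resolve_execution` is a hypothetical helper.

```python
# Hypothetical sketch: assumes the most specific level wins and that unset
# (missing or None) fields fall through to the broader level.
from typing import Any, Optional

DEFAULTS = {"log_level": "standard", "trace": True}  # defaults from the table


def resolve_execution(*levels: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Merge settings ordered from broadest (loom) to most specific (thread)."""
    effective = dict(DEFAULTS)
    for level in levels:
        for field, value in (level or {}).items():
            if value is not None:  # False is a real value; None means "unset"
                effective[field] = value
    return effective


loom, weave, thread = {"log_level": "verbose"}, None, {"trace": False}
print(resolve_execution(loom, weave, thread))  # {'log_level': 'verbose', 'trace': False}
```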

FailureConfig

Per-thread failure handling policy.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `on_failure` | `"abort_weave" \| "skip_downstream" \| "continue"` | `"abort_weave"` | Failure policy |
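The three policies differ in what happens to the rest of the weave after a thread fails. This sketch assumes `abort_weave` stops all remaining threads, `skip_downstream` skips only threads that depend on the failed one (directly or transitively), and `continue` lets everything else run. The `run_weave` function, its status strings, and the dependency map are hypothetical, not weevr's scheduler.

```python
# Hypothetical sketch of on_failure policies; not weevr's scheduler.
# Assumes `order` is already topologically sorted and `deps` maps each
# thread to the threads it depends on.
from typing import Callable


def run_weave(order: list[str], deps: dict[str, set[str]],
              run: Callable[[str], bool], policy: dict[str, str]) -> dict[str, str]:
    status: dict[str, str] = {}
    blocked: set[str] = set()  # threads whose dependents must be skipped
    for i, name in enumerate(order):
        if deps.get(name, set()) & blocked:
            status[name] = "skipped"
            blocked.add(name)  # propagate to transitive dependents
            continue
        if run(name):
            status[name] = "succeeded"
            continue
        status[name] = "failed"
        action = policy.get(name, "abort_weave")  # table default
        if action == "abort_weave":
            for remaining in order[i + 1:]:
                status[remaining] = "aborted"
            break
        if action == "skip_downstream":
            blocked.add(name)
        # "continue": dependents still run
    return status


deps = {"b": {"a"}, "c": {"b"}}
print(run_weave(["a", "b", "c", "d"], deps, lambda n: n != "a", {"a": "skip_downstream"}))
# {'a': 'failed', 'b': 'skipped', 'c': 'skipped', 'd': 'succeeded'}
```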

ParamSpec

Typed parameter declaration.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | required | Parameter name |
| `type` | `str` | required | Data type: `string`, `int`, `float`, `bool`, `date`, `timestamp`, `list[string]` |
| `required` | `bool` | `True` | Whether the parameter must be supplied |
| `default` | `Any` | `None` | Default value |
| `description` | `str` | `""` | Human-readable description |
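Resolving parameters means checking that required ones are present, falling back to defaults, and coercing values to the declared type. This sketch assumes a declared default satisfies a required parameter, parses dates and timestamps as ISO-8601 strings, and splits comma-separated strings for `list[string]`; `CASTS` and `resolve_params` are hypothetical, and weevr's own coercion rules may differ.

```python
# Hypothetical sketch of parameter resolution; weevr's actual coercion
# rules may differ. Assumes a default satisfies a required parameter.
from datetime import date, datetime
from typing import Any

CASTS = {
    "string": str,
    "int": int,
    "float": float,
    "bool": lambda v: v if isinstance(v, bool) else str(v).strip().lower() in {"true", "1"},
    "date": lambda v: v if isinstance(v, date) else date.fromisoformat(v),
    "timestamp": lambda v: v if isinstance(v, datetime) else datetime.fromisoformat(v),
    "list[string]": lambda v: ([str(x) for x in v] if isinstance(v, list)
                               else [s.strip() for s in str(v).split(",")]),
}


def resolve_params(specs: dict[str, dict[str, Any]], supplied: dict[str, Any]) -> dict[str, Any]:
    resolved: dict[str, Any] = {}
    for name, spec in specs.items():
        if name in supplied:
            value = supplied[name]
        elif spec.get("default") is not None:
            value = spec["default"]
        elif spec.get("required", True):
            raise ValueError(f"missing required parameter {name!r}")
        else:
            resolved[name] = None  # optional and unset
            continue
        resolved[name] = CASTS[spec["type"]](value)  # coerce to the declared type
    return resolved


specs = {
    "run_date": {"type": "date"},
    "batch": {"type": "int", "default": 1},
    "regions": {"type": "list[string]", "required": False},
}
print(resolve_params(specs, {"run_date": "2024-05-01", "regions": "eu, us"}))
```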

NamingConfig

Column and table naming normalization. Cascades through loom/weave/thread/target.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `columns` | `NamingPattern` | `None` | Column naming pattern |
| `tables` | `NamingPattern` | `None` | Table naming pattern |
| `exclude` | `list[str]` | `[]` | Names or patterns excluded from normalization |

Supported `NamingPattern` values: `snake_case`, `camelCase`, `PascalCase`, `UPPER_SNAKE_CASE`, `Title_Snake_Case`, `Title Case`, `lowercase`, `UPPERCASE`, `none`.
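Normalization splits a name into words and re-joins them in the target pattern. This sketch is one possible interpretation: it splits on spaces, underscores, hyphens, and case boundaries (so `customerID` becomes `customer` and `ID`), treats `lowercase` and `UPPERCASE` as simple case changes of the whole name, and matches `exclude` entries as shell-style wildcards with `fnmatch`. `split_words` and `normalize` are hypothetical, and weevr's exact rules may differ.

```python
# Hypothetical sketch of naming normalization; word splitting, the
# lowercase/UPPERCASE behavior, and wildcard excludes are assumptions.
import re
from fnmatch import fnmatchcase
from typing import Iterable


def split_words(name: str) -> list[str]:
    # Break lower->upper boundaries ("orderDate") and acronym->word
    # boundaries ("HTTPServer"), then split on spaces, underscores, hyphens.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    spaced = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", spaced)
    return [w for w in re.split(r"[\s_\-]+", spaced) if w]


def normalize(name: str, pattern: str, exclude: Iterable[str] = ()) -> str:
    if pattern == "none" or any(fnmatchcase(name, p) for p in exclude):
        return name
    if pattern == "lowercase":
        return name.lower()
    if pattern == "UPPERCASE":
        return name.upper()
    words = [w.lower() for w in split_words(name)]
    if not words:
        return name
    caps = [w.capitalize() for w in words]
    formats = {
        "snake_case": "_".join(words),
        "UPPER_SNAKE_CASE": "_".join(words).upper(),
        "camelCase": words[0] + "".join(caps[1:]),
        "PascalCase": "".join(caps),
        "Title_Snake_Case": "_".join(caps),
        "Title Case": " ".join(caps),
    }
    if pattern not in formats:
        raise ValueError(f"unknown naming pattern {pattern!r}")
    return formats[pattern]


print(normalize("customerID", "snake_case"))  # customer_id
print(normalize("_etl_loaded_at", "PascalCase", exclude=["_etl_*"]))  # _etl_loaded_at
```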