
Model API

The weevr.model package contains all Pydantic domain models that represent the configuration structure: threads, weaves, looms, pipeline steps, sources, targets, hooks, lookups, exports, variables, failure configuration, and supporting types.

weevr.model

Domain object model for weevr configuration.

HookStep = Annotated[Annotated[QualityGateStep, Tag('quality_gate')] | Annotated[SqlStatementStep, Tag('sql_statement')] | Annotated[LogMessageStep, Tag('log_message')], Discriminator(_hook_step_discriminator)] module-attribute

Discriminated union of all hook step types.

Dispatches on the type field: quality_gate, sql_statement, or log_message.

Step = Annotated[Annotated[FilterStep, Tag('filter')] | Annotated[DeriveStep, Tag('derive')] | Annotated[JoinStep, Tag('join')] | Annotated[SelectStep, Tag('select')] | Annotated[DropStep, Tag('drop')] | Annotated[RenameStep, Tag('rename')] | Annotated[CastStep, Tag('cast')] | Annotated[DedupStep, Tag('dedup')] | Annotated[SortStep, Tag('sort')] | Annotated[UnionStep, Tag('union')] | Annotated[AggregateStep, Tag('aggregate')] | Annotated[WindowStep, Tag('window')] | Annotated[PivotStep, Tag('pivot')] | Annotated[UnpivotStep, Tag('unpivot')] | Annotated[CaseWhenStep, Tag('case_when')] | Annotated[FillNullStep, Tag('fill_null')] | Annotated[CoalesceStep, Tag('coalesce')] | Annotated[StringOpsStep, Tag('string_ops')] | Annotated[DateOpsStep, Tag('date_ops')] | Annotated[ConcatStep, Tag('concat')] | Annotated[MapStep, Tag('map')] | Annotated[FormatStep, Tag('format')] | Annotated[ResolveStep, Tag('resolve')], Discriminator(_step_discriminator)] module-attribute

Discriminated union of all pipeline step types.

Accepts a dict with a single key matching the step type name: for example, {"filter": {"expr": "amount > 0"}} parses as a FilterStep.
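The single-key dict form implies a discriminator that reads the dict's lone key as the tag. A plain-Python sketch of that dispatch (illustrative only; the real _step_discriminator may differ):

```python
def step_discriminator(value):
    """Return the union tag for a pipeline step (illustrative sketch).

    Dict inputs use their single key as the tag; anything else falls
    back to a hypothetical attribute lookup for model instances.
    """
    if isinstance(value, dict):
        if len(value) == 1:
            return next(iter(value))
        return None  # ambiguous dict: let validation report the error
    return getattr(value, "type", None)  # assumed fallback for instances

# The dict form from the docstring dispatches to the "filter" tag:
step_discriminator({"filter": {"expr": "amount > 0"}})
```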

SparkExpr = NewType('SparkExpr', str) module-attribute

A Spark SQL expression string.

Used to annotate fields that hold Spark SQL expressions (e.g., filter predicates, derived column expressions). At runtime these are plain strings; the NewType signals intent and allows downstream type-checkers to distinguish expressions from arbitrary strings.
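Because NewType adds no runtime wrapper, the value stays a plain str:

```python
from typing import NewType

SparkExpr = NewType("SparkExpr", str)

predicate = SparkExpr("amount > 0")
# At runtime this is just a str; only type checkers see a distinct SparkExpr.
assert isinstance(predicate, str)
assert predicate.upper() == "AMOUNT > 0"
```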

__all__ = ['OneLakeConnection', 'Thread', 'Weave', 'Loom', 'FailureConfig', 'ThreadEntry', 'ConditionSpec', 'WeaveEntry', 'Source', 'DedupConfig', 'Target', 'ColumnMapping', 'AuditTemplate', 'Step', 'FilterStep', 'DeriveStep', 'JoinStep', 'SelectStep', 'DropStep', 'RenameStep', 'CastStep', 'DedupStep', 'SortStep', 'UnionStep', 'AggregateStep', 'WindowStep', 'PivotStep', 'UnpivotStep', 'CaseWhenStep', 'FillNullStep', 'CoalesceStep', 'StringOpsStep', 'DateOpsStep', 'ConcatStep', 'ConcatParams', 'MapStep', 'MapParams', 'FormatStep', 'FormatSpec', 'FormatParams', 'ResolveStep', 'ResolveParams', 'ResolveBatchItem', 'EffectiveConfig', 'CurrentConfig', 'KeyConfig', 'SurrogateKeyConfig', 'ChangeDetectionConfig', 'WriteConfig', 'ValidationRule', 'Assertion', 'LoadConfig', 'ParamSpec', 'ParamsConfig', 'Export', 'ExecutionConfig', 'LogLevel', 'SparkExpr', 'HookStep', 'QualityGateStep', 'SqlStatementStep', 'LogMessageStep', 'Lookup', 'ColumnSet', 'ColumnSetSource', 'ReservedWordConfig', 'ReservedWordPreset', 'NamingConfig', 'NamingPattern', 'VariableSpec', 'DimensionSurrogateKeyConfig', 'SubPipeline'] module-attribute

AuditTemplate

Bases: FrozenBase

A named set of audit columns applied during data shaping.

Attributes:

Name Type Description
columns dict[str, str]

Mapping of column names to Spark SQL expressions. Each entry defines a column that will be appended to the output dataset.

Example

>>> template = AuditTemplate(
...     columns={
...         "created_at": "current_timestamp()",
...         "created_by": "current_user()",
...     }
... )

ColumnSet

Bases: FrozenBase

A named column set that defines an external column mapping.

Column sets describe where to find a mapping of incoming column names to outgoing column names. The mapping data can come from a Delta table, a YAML file, or a runtime notebook parameter. Exactly one of source or param must be provided.

Attributes:

Name Type Description
source ColumnSetSource | None

Source definition pointing to the mapping data.

param str | None

Name of a runtime notebook parameter that supplies the mapping at execution time.

on_unmapped Literal['pass_through', 'error']

Behaviour when an input column has no mapping entry. "pass_through" keeps the column unchanged; "error" aborts.

on_extra Literal['ignore', 'warn', 'error']

Behaviour when the mapping contains entries for columns not present in the input. "ignore" silently skips; "warn" logs a warning; "error" aborts.

on_failure Literal['abort', 'warn', 'skip']

Behaviour when the column set cannot be resolved (e.g. source unavailable). "abort" raises; "warn" logs and continues; "skip" silently skips the rename step.
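The on_unmapped and on_extra policies can be sketched in plain Python (apply_column_set is a hypothetical helper, not part of weevr):

```python
def apply_column_set(columns, mapping, on_unmapped="pass_through", on_extra="ignore"):
    """Rename columns per a mapping, honouring the policies described above."""
    renamed = []
    for col in columns:
        if col in mapping:
            renamed.append(mapping[col])
        elif on_unmapped == "pass_through":
            renamed.append(col)  # keep the column unchanged
        else:
            raise ValueError(f"no mapping entry for column: {col}")
    extra = set(mapping) - set(columns)
    if extra and on_extra == "error":
        raise ValueError(f"mapping references columns not in input: {sorted(extra)}")
    return renamed
```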

ColumnSetSource

Bases: FrozenBase

Source definition for a named column set.

Describes where the column mapping data lives — either a Delta table or a YAML file. Delta sources may reference a registered table alias or a named connection plus a table name; YAML sources reference a file path relative to the project root.

Attributes:

Name Type Description
type Literal['delta', 'yaml']

Source kind — "delta" or "yaml".

alias str | None

Registered table alias. One of alias or connection is required when type="delta"; mutually exclusive with connection.

path str | None

File path to the YAML mapping file. Required for type="yaml".

connection str | None

Reference to a named connection defined at the loom or weave level. When set, the column set is read from schema.table within the connection's lakehouse. Only valid for type="delta" and mutually exclusive with alias.

schema_override str | None

Schema name within the connection's lakehouse. Only valid alongside connection. Accepts the YAML key schema.

table str | None

Table name within the connection's lakehouse. Required when connection is set; not valid otherwise.

from_column str

Column name in the source that holds the incoming column names. Defaults to "source_name".

to_column str

Column name in the source that holds the outgoing (renamed) column names. Defaults to "target_name".

filter str | None

SQL WHERE expression applied when reading a Delta source.

ReservedWordConfig

Bases: FrozenBase

Configuration for handling reserved word collisions in column names.

When a column or table name matches a reserved word, the engine applies the configured strategy to resolve the collision.

Strategies:

  • "quote" — keep the name as-is; rely on backtick-quoting in SQL.
  • "prefix" — prepend prefix to colliding names.
  • "suffix" — append suffix to colliding names.
  • "error" — raise a ConfigError listing all colliding names.
  • "rename" — apply explicit rename_map; unmapped collisions fall through to fallback strategy.
  • "revert" — discard the rename for collisions, keeping the pre-normalization name.
  • "drop" — remove colliding columns from the output (columns only; not valid for table names).

The preset field selects one or more built-in word lists. When omitted, the ANSI SQL list is used as the default. Specifying any preset replaces that default — to include ANSI words alongside other presets, list "ansi" explicitly. A single string is accepted as shorthand for a one-element list.

The extend and exclude lists compose on top of the resolved preset union, adding or removing individual words.
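The preset/extend/exclude composition described above might look like this (the preset contents here are tiny placeholders, not the real word lists):

```python
# Placeholder word lists; the real presets are far larger.
PRESETS = {
    "ansi": {"select", "from", "order"},
    "tsql": {"top", "merge"},
}

def effective_reserved_words(preset=None, extend=(), exclude=()):
    """Resolve the effective reserved word set per the rules above."""
    if preset is None:
        preset = ["ansi"]       # omitted -> ANSI default
    elif isinstance(preset, str):
        preset = [preset]       # single-string shorthand
    words = set().union(*(PRESETS[p] for p in preset))
    return (words | set(extend)) - set(exclude)
```

Note that naming any preset drops the ANSI default, matching the replacement semantics described above.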

Attributes:

Name Type Description
strategy Literal['prefix', 'quote', 'error', 'suffix', 'rename', 'revert', 'drop']

How to handle reserved word collisions.

prefix str

String prepended to colliding column names when strategy="prefix".

suffix str

String appended to colliding column names when strategy="suffix".

rename_map dict[str, str] | None

Explicit mapping of reserved words to replacement names. Required when strategy="rename".

fallback Literal['prefix', 'quote', 'error', 'suffix', 'revert', 'drop'] | None

Fallback strategy for unmapped collisions when strategy="rename". Cannot be "rename".

preset list[ReservedWordPreset] | None

Built-in word list presets to activate. None (default) uses the ANSI SQL list. Accepts a single string or a list.

extend list[str]

Additional words to treat as reserved beyond the preset.

exclude list[str]

Words to remove from the reserved word check.

ReservedWordPreset

Bases: StrEnum

Built-in reserved word list presets.

Each preset represents a self-contained set of reserved words for a specific query language or engine context. Presets can be combined via list composition; the effective word set is the union.

Attributes:

Name Type Description
ANSI

ANSI SQL reserved keywords (~80 words).

DAX

DAX reserved words for Power BI semantic models.

M

M language (Power Query) reserved words.

POWERBI

Convenience alias expanding to DAX + M union.

TSQL

T-SQL reserved keywords for Fabric SQL endpoints.

OneLakeConnection

Bases: FrozenBase

A OneLake connection declaration.

Identifies the Fabric workspace and lakehouse that a source or target resolves against at execution time.

DimensionSurrogateKeyConfig

Bases: FrozenBase

Surrogate key generation configuration for a dimension target.

Unlike the keys.py SurrogateKeyConfig, this variant includes a columns field that specifies which source columns are hashed to produce the SK.

Attributes:

Name Type Description
name str

Output column name for the generated surrogate key.

algorithm _ALGORITHM_LITERAL

Hash algorithm to use.

columns list[str]

Source columns to hash into the surrogate key value.

output Literal['native', 'string']

Controls the output type for integer-returning algorithms (xxhash64, crc32, murmur3). native preserves the algorithm's return type; string casts to StringType.
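Conceptually, hashing source columns into a surrogate key can be sketched with hashlib (the engine uses Spark functions; this is only an illustration, and the "||" separator is an assumption):

```python
import hashlib

def surrogate_key(row, columns, algorithm="sha256"):
    """Hash the named columns of a row into a hex surrogate key (sketch)."""
    payload = "||".join(str(row[c]) for c in columns)  # assumed separator
    return hashlib.new(algorithm, payload.encode("utf-8")).hexdigest()

row = {"customer_id": 42, "region": "EU"}
sk = surrogate_key(row, ["customer_id", "region"])
```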

ExecutionConfig

Bases: FrozenBase

Runtime execution settings that cascade through loom/weave/thread.

Attributes:

Name Type Description
log_level LogLevel

Logging verbosity for execution output.

trace bool

Whether to collect execution spans for telemetry.

LogLevel

Bases: StrEnum

Configurable log level for weevr execution.

Controls the verbosity of structured logging output during pipeline execution. Maps to Python logging levels internally.

Export

Bases: FrozenBase

A named secondary output destination for thread data.

Exports write the same post-mapping, audit-injected DataFrame as the primary target to additional locations in configurable formats.

Attributes:

Name Type Description
name str

Unique identifier within the resolved thread.

description str | None

Optional human-readable label shown in explain() output.

type ExportFormat

Output format (delta, parquet, csv, json, orc).

path str | None

OneLake path for the export. Supports context variables.

alias str | None

Metastore alias (delta type only, mutually exclusive with path).

connection str | None

Named connection reference (delta type only). Mutually exclusive with path and alias. Requires table.

schema_override str | None

Schema override within the connection's lakehouse. Aliased as schema in YAML configs.

table str | None

Table name within the connection's lakehouse. Required when connection is set; invalid without it.

mode Literal['overwrite']

Write mode. Only "overwrite" in v1.

partition_by list[str] | None

Partition columns, independent of primary target.

on_failure Literal['abort', 'warn']

Behavior on write error — "abort" fails the thread, "warn" logs and continues.

enabled bool

Set to False to suppress an inherited export.

options dict[str, str] | None

Format-specific Spark DataFrameWriter options.

FailureConfig

Bases: FrozenBase

Per-thread failure handling policy.

Controls what happens to remaining threads in a weave when this thread fails.

Attributes:

Name Type Description
on_failure Literal['skip_downstream', 'continue', 'abort_weave']

One of "abort_weave" (default), "skip_downstream", or "continue". See SPEC §14.2 for semantics.

LogMessageStep

Bases: FrozenBase

A log message emitted as a hook step.

Attributes:

Name Type Description
type Literal['log_message']

Step type discriminator, always "log_message".

name str | None

Optional name for telemetry span naming.

on_failure Literal['abort', 'warn'] | None

Failure behaviour. None means the executor applies the phase-specific default.

message str

Message template to log. Supports ${var.name} placeholders.

level Literal['info', 'warn', 'error']

Log level for the message.

QualityGateStep

Bases: FrozenBase

A quality gate check executed as a hook step.

The check field determines which check-specific fields are required.

Attributes:

Name Type Description
type Literal['quality_gate']

Step type discriminator, always "quality_gate".

name str | None

Optional name for telemetry span naming.

on_failure Literal['abort', 'warn'] | None

Failure behaviour. None means the executor applies the phase-specific default (pre=abort, post=warn).

check Literal['source_freshness', 'row_count_delta', 'row_count', 'table_exists', 'expression']

Which quality gate check to perform.

source str | None

Table alias for source_freshness and table_exists checks.

max_age str | None

Duration string for source_freshness (e.g. "24h").

target str | None

Table alias for row_count_delta and row_count checks.

max_decrease_pct float | None

Max allowed decrease percentage for row_count_delta.

max_increase_pct float | None

Max allowed increase percentage for row_count_delta.

min_delta int | None

Minimum absolute row change for row_count_delta.

max_delta int | None

Maximum absolute row change for row_count_delta.

min_count int | None

Minimum row count for row_count.

max_count int | None

Maximum row count for row_count.

sql str | None

Spark SQL boolean expression for expression check.

message str | None

Failure message for expression check diagnostics.

SqlStatementStep

Bases: FrozenBase

An arbitrary SQL statement executed as a hook step.

Optionally captures the scalar result into a weave-scoped variable via set_var.

Attributes:

Name Type Description
type Literal['sql_statement']

Step type discriminator, always "sql_statement".

name str | None

Optional name for telemetry span naming.

on_failure Literal['abort', 'warn'] | None

Failure behaviour. None means the executor applies the phase-specific default.

sql str

Spark SQL statement to execute.

set_var str | None

Optional variable name to capture the scalar result into.

ChangeDetectionConfig

Bases: FrozenBase

Configuration for change detection hash generation.

Attributes:

Name Type Description
name str

Output column name for the generated hash.

columns list[str]

Columns to include in the change detection hash.

algorithm Literal['xxhash64', 'sha1', 'sha256', 'sha384', 'sha512', 'md5', 'crc32', 'murmur3']

Hash algorithm to use. crc32 has a small 32-bit output space with higher collision risk. murmur3 (Spark's hash()) may produce different results across Spark major versions. Neither is recommended for high-cardinality change detection use cases.

output Literal['native', 'string']

Controls the output type for integer-returning algorithms (xxhash64, crc32, murmur3). native preserves the algorithm's return type; string casts to StringType. Has no practical effect on sha*/md5 which already return hex strings.

KeyConfig

Bases: FrozenBase

Key management configuration for a thread.

SurrogateKeyConfig

Bases: FrozenBase

Configuration for surrogate key generation.

Attributes:

Name Type Description
name str

Output column name for the generated surrogate key.

algorithm Literal['xxhash64', 'sha1', 'sha256', 'sha384', 'sha512', 'md5', 'crc32', 'murmur3']

Hash algorithm to use. crc32 has a small 32-bit output space with higher collision risk. murmur3 (Spark's hash()) may produce different results across Spark major versions. Neither is recommended for high-cardinality surrogate key use cases.

output Literal['native', 'string']

Controls the output type for integer-returning algorithms (xxhash64, crc32, murmur3). native preserves the algorithm's return type; string casts to StringType. Has no practical effect on sha*/md5 which already return hex strings.

LoadConfig

Bases: FrozenBase

Incremental load mode and watermark parameters.

Cross-field validation:

  • mode == "incremental_watermark" requires watermark_column to be set.
  • mode == "cdc" requires cdc config to be set.
  • mode == "cdc" with cdc.preset == "delta_cdf" rejects watermark_column (CDF uses commit-version tracking automatically).
  • mode == "cdc" with cdc.operation_column (generic CDC) may compose with watermark_column to narrow the read window for append-only CDC history tables; watermark_type is required whenever watermark_column is set in cdc mode.
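A plain-Python sketch of those rules (hypothetical function with a simplified dict-shaped cdc config):

```python
def validate_load(mode, watermark_column=None, watermark_type=None, cdc=None):
    """Check the LoadConfig cross-field rules described above (sketch)."""
    if mode == "incremental_watermark" and watermark_column is None:
        raise ValueError("incremental_watermark requires watermark_column")
    if mode == "cdc":
        if cdc is None:
            raise ValueError("cdc mode requires cdc config")
        if cdc.get("preset") == "delta_cdf" and watermark_column is not None:
            raise ValueError("delta_cdf tracks commit versions; remove watermark_column")
        if watermark_column is not None and watermark_type is None:
            raise ValueError("watermark_type is required with watermark_column in cdc mode")
```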

Lookup

Bases: FrozenBase

A weave-level named data definition that can be referenced by threads.

Lookups define small reference datasets shared across threads in a weave. When materialize is true, the data is read once and cached (or broadcast) before threads execute.

Narrow lookup fields (key, values, filter) control projection and filtering applied during materialization or on-demand reads. When set, only the declared key and value columns are retained, reducing memory and improving join performance.

Attributes:

Name Type Description
source Source

Source definition for the lookup data.

materialize bool

Whether to pre-read and cache/broadcast the data before thread execution.

strategy Literal['broadcast', 'cache']

Materialization strategy. Only meaningful when materialize is true. "broadcast" uses Spark broadcast join hints; "cache" uses DataFrame caching.

key list[str] | None

Column(s) used for matching (join key). Kept in the cached projection alongside values.

values list[str] | None

Payload column(s) to retrieve. When set, only key + values columns are retained in the cached DataFrame.

filter str | None

SQL WHERE expression applied to the source before projection.

unique_key bool

When true, validate that key columns form a unique key after filtering and projection.

on_failure Literal['abort', 'warn']

Behavior when unique_key check finds duplicates. Only meaningful when unique_key is true.

Loom

Bases: FrozenBase

A deployment unit containing weave references with optional shared defaults.

WeaveEntry

Bases: FrozenBase

A weave reference within a loom, with optional condition.

Attributes:

Name Type Description
name str

Weave name. Required for inline definitions; derived from filename stem for external references.

ref str | None

Path to an external .weave file, relative to project root. Mutually exclusive with inline definition (name + body).

condition ConditionSpec | None

Optional condition for conditional execution.

NamingConfig

Bases: FrozenBase

Configuration for naming normalization.

Attributes:

Name Type Description
columns NamingPattern | None

Pattern to apply to column names. None means inherit from parent.

tables NamingPattern | None

Pattern to apply to table names. None means inherit from parent.

exclude list[str]

Glob patterns or explicit names to exclude from column normalization.

on_collision Literal['suffix', 'error']

Behaviour when two columns normalise to the same name. "suffix" appends a numeric suffix to the later column; "error" aborts with a validation error.

reserved_words ReservedWordConfig | None

Optional configuration for handling SQL reserved word collisions in column names. None means no reserved word handling.

NamingPattern

Bases: StrEnum

Column and table naming patterns.

Attributes:

Name Type Description
SNAKE_CASE

http_status

CAMEL_CASE

httpStatus

PASCAL_CASE

HttpStatus

UPPER_SNAKE_CASE

HTTP_STATUS

TITLE_SNAKE_CASE

Http_Status

TITLE_CASE

Http Status

LOWERCASE

httpstatus

UPPERCASE

HTTPSTATUS

KEBAB_CASE

http-status

NONE

Opt-out sentinel — no normalization applied.
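The SNAKE_CASE pattern, for example, might be implemented along these lines (an illustrative sketch, not weevr's actual normalizer):

```python
import re

def to_snake_case(name: str) -> str:
    """Normalize a column name to snake_case (sketch)."""
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camelCase humps
    s = re.sub(r"[-\s]+", "_", s)                      # kebab/spaces -> underscore
    return s.lower()
```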

ParamsConfig

Bases: BaseModel

Parameter file schema (flat key-value structure).

Uses a mutable BaseModel rather than FrozenBase because it is a validation schema for param files, not a domain object. It relies on extra="allow" for arbitrary param keys and on model_dump(exclude_unset=True) within the config pipeline.

ParamSpec

Bases: FrozenBase

Typed parameter specification for config-level param declarations.

AggregateStep

Bases: FrozenBase

Pipeline step: aggregate rows with optional grouping.

CaseWhenStep

Bases: FrozenBase

Pipeline step: conditional column values.

CastStep

Bases: FrozenBase

Pipeline step: cast column types.

CoalesceStep

Bases: FrozenBase

Pipeline step: coalesce columns.

ConcatParams

Bases: FrozenBase

Parameters for the concat step.

Concatenates multiple columns into a single string column with configurable null handling, separator, and trimming.

ConcatStep

Bases: FrozenBase

Pipeline step: concatenate columns into a string.

CurrentConfig

Bases: FrozenBase

Current-flag sub-mode for SCD2 narrowing.

When the lookup dimension uses a boolean or coded column to mark the active record, column identifies that column and value is the active marker (defaults to True).

DateOpsStep

Bases: FrozenBase

Pipeline step: apply expression template to date/timestamp columns.

DedupStep

Bases: FrozenBase

Pipeline step: deduplicate rows.

DeriveStep

Bases: FrozenBase

Pipeline step: derive one or more new columns from Spark SQL expressions.

DropStep

Bases: FrozenBase

Pipeline step: drop columns.

EffectiveConfig

Bases: FrozenBase

SCD2 narrowing configuration for resolve step.

Supports two mutually exclusive sub-modes:

  • Date range — half-open interval [from, to) checked against a fact date_column.
  • Current flag — filter lookup rows where a column equals a specific value (string sugar or dict form with custom value).

FillNullStep

Bases: FrozenBase

Pipeline step: fill null values with defaults.

FilterStep

Bases: FrozenBase

Pipeline step: filter rows using a Spark SQL expression.

FormatParams

Bases: FrozenBase

Parameters for the format step.

Maps target column names to format specifications. Multiple columns can be formatted in a single step.

FormatSpec

Bases: FrozenBase

Per-column format specification.

Exactly one of pattern, number, or date must be set. source defaults to the target column name when omitted.

FormatStep

Bases: FrozenBase

Pipeline step: format columns using pattern, number, or date rules.

JoinStep

Bases: FrozenBase

Pipeline step: join with another source.

MapParams

Bases: FrozenBase

Parameters for the map step.

Maps discrete values in a column to new values using a lookup dict. Null handling and unmapped value behavior are independently configurable.

MapStep

Bases: FrozenBase

Pipeline step: map discrete values.

PivotStep

Bases: FrozenBase

Pipeline step: pivot rows to columns.

RenameStep

Bases: FrozenBase

Pipeline step: rename columns.

ResolveBatchItem

Bases: FrozenBase

Per-FK configuration within a batch resolve step.

Every item requires name, lookup, and match. All other fields are optional overrides merged with shared defaults at runtime.

ResolveParams

Bases: FrozenBase

Parameters for the resolve step.

Encapsulates FK resolution: BK completeness check, multi-column equi-join against a named lookup, sentinel assignment for invalid and unknown BKs, optional SCD2 narrowing, include columns, and batch mode for multi-FK fact tables.

In single mode, name, lookup, match, and pk are required. In batch mode, batch contains the per-FK specs and the outer-level fields serve as shared defaults.

resolve_batch_items()

Merge shared defaults into each batch item.

Returns a list of fully-resolved ResolveBatchItem objects where item-level values override the shared defaults from this ResolveParams instance. Raises ValueError if a required field (pk) is missing after merge.
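The merge behaviour can be sketched as a dict overlay (hypothetical shapes; the real method operates on the Pydantic models):

```python
def merge_batch_item(shared: dict, item: dict) -> dict:
    """Overlay item-level values onto shared defaults; set item values win."""
    merged = {**shared, **{k: v for k, v in item.items() if v is not None}}
    if merged.get("pk") is None:
        raise ValueError(f"resolve batch item {item.get('name')!r} is missing pk")
    return merged

shared = {"pk": "customer_sk", "on_failure": "abort"}
item = {"name": "customer_fk", "lookup": "dim_customer", "on_failure": None}
merged = merge_batch_item(shared, item)
# item's unset (None) on_failure does not override the shared default
```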

ResolveStep

Bases: FrozenBase

Pipeline step: resolve foreign keys via lookup join.

SelectStep

Bases: FrozenBase

Pipeline step: select a subset of columns.

SortStep

Bases: FrozenBase

Pipeline step: sort rows.

StringOpsStep

Bases: FrozenBase

Pipeline step: apply expression template to string columns.

UnionStep

Bases: FrozenBase

Pipeline step: union with other sources.

UnpivotStep

Bases: FrozenBase

Pipeline step: unpivot columns to rows.

WindowStep

Bases: FrozenBase

Pipeline step: apply window functions.

DedupConfig

Bases: FrozenBase

Deduplication configuration applied immediately after reading a source.

Source

Bases: FrozenBase

A data source declaration.

A source is either a direct data reference (with type), a connection-based reference (with connection + table), a lookup reference (with lookup), or a generated sequence (with type set to date_sequence or int_sequence). lookup is mutually exclusive with all other resolution modes.

Cross-field validation rules:

  • If lookup is set: type must not be set.
  • If connection is set: table must be set; alias must not be set; type must be None or "delta"; connection sources resolve via the named connection rather than an inline alias.
  • table without connection is rejected.
  • connection and alias are mutually exclusive.
  • If neither lookup nor connection is set: type is required.
  • type == "delta" without connection requires alias to be set.
  • type in file types (csv, json, parquet, excel) requires path to be set.
  • Generated types (date_sequence, int_sequence) require column, start, and end; alias, path, dedup, connection, and non-empty options are rejected.
  • date_sequence step must be one of day, week, month, year, or omitted.
  • int_sequence step must be a positive integer or omitted.

SubPipeline

Bases: FrozenBase

A named sub-pipeline that produces an intermediate DataFrame.

Sub-pipelines are declared under the with: block of a thread and are referenced by name in subsequent join or union steps.

ColumnMapping

Bases: FrozenBase

Mapping specification for a single target column.

Cross-field validation:

  • expr and drop=True are mutually exclusive.

Target

Bases: FrozenBase

Write destination with column mapping and partitioning configuration.

Thread

Bases: FrozenBase

Complete domain model for a thread configuration.

A thread is the smallest unit of work: one or more sources, a sequence of transformation steps, and a single target.

Assertion

Bases: FrozenBase

A post-execution assertion on the target dataset.

Supports built-in types (row_count, column_not_null, unique, expression) and the FK-specific fk_sentinel_rate type for checking sentinel value rates across resolved FK columns.

ValidationRule

Bases: FrozenBase

A pre-write data quality rule evaluated as a Spark SQL expression.

VariableSpec

Bases: FrozenBase

Declaration of a weave-scoped variable.

Variables are typed scalar values that can be set by hook steps via set_var and referenced in downstream config as ${var.name}.

Attributes:

Name Type Description
type Literal['string', 'int', 'long', 'float', 'double', 'boolean', 'timestamp', 'date']

Scalar type of the variable value.

default str | int | float | bool | None

Optional default value used when no hook sets the variable.

ConditionSpec

Bases: FrozenBase

A condition expression for conditional execution.

Attributes:

Name Type Description
when str

Condition expression string. Supports parameter references (${param.x}), built-in checks (table_exists(), table_empty(), row_count()), and simple boolean operators.

ThreadEntry

Bases: FrozenBase

A thread reference within a weave, with optional per-thread overrides.

Attributes:

Name Type Description
name str

Thread name. Required for inline definitions; derived from filename stem for external references.

ref str | None

Path to an external .thread file, relative to project root. Mutually exclusive with inline definition (name + body).

as_ str | None

Alias that overrides the thread's effective name for all downstream consumers (planner, executor, telemetry, watermark, display). Required when the same ref appears multiple times in a weave.

params dict[str, Any] | None

Key-value dict injected into the thread's parameter resolution context under the param namespace, enabling ${param.x} references in the thread config.
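The ${param.x} reference style can be illustrated with a simple substitution (a hypothetical helper, not weevr's actual resolver):

```python
import re

def substitute_params(text: str, params: dict) -> str:
    """Replace ${param.x} placeholders with values from the params dict."""
    def repl(match):
        key = match.group(1)
        if key not in params:
            raise KeyError(f"unknown parameter: {key}")
        return str(params[key])
    return re.sub(r"\$\{param\.([A-Za-z_]\w*)\}", repl, text)

substitute_params("region = '${param.region}'", {"region": "EU"})
```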

dependencies list[str] | None

Explicit upstream thread names. When set, these are merged with any auto-inferred dependencies from source/target path matching.

condition ConditionSpec | None

Optional condition for conditional execution. When set, the thread is only executed if the condition evaluates to True.

Weave

Bases: FrozenBase

A collection of thread references with optional shared defaults.

WriteConfig

Bases: FrozenBase

Write mode and merge parameters for a thread target.

Cross-field validation:

  • mode == "merge" requires match_keys to be set.
  • on_no_match_source == "soft_delete" requires soft_delete_column to be set.