Skip to content

Model API

The weevr.model package contains all Pydantic domain models that represent the configuration structure -- threads, weaves, looms, pipeline steps, sources, targets, and supporting types.

weevr.model

Domain object model for weevr configuration.

HookStep = Annotated[Annotated[QualityGateStep, Tag('quality_gate')] | Annotated[SqlStatementStep, Tag('sql_statement')] | Annotated[LogMessageStep, Tag('log_message')], Discriminator(_hook_step_discriminator)] module-attribute

Discriminated union of all hook step types.

Dispatches on the type field: quality_gate, sql_statement, or log_message.

Step = Annotated[Annotated[FilterStep, Tag('filter')] | Annotated[DeriveStep, Tag('derive')] | Annotated[JoinStep, Tag('join')] | Annotated[SelectStep, Tag('select')] | Annotated[DropStep, Tag('drop')] | Annotated[RenameStep, Tag('rename')] | Annotated[CastStep, Tag('cast')] | Annotated[DedupStep, Tag('dedup')] | Annotated[SortStep, Tag('sort')] | Annotated[UnionStep, Tag('union')] | Annotated[AggregateStep, Tag('aggregate')] | Annotated[WindowStep, Tag('window')] | Annotated[PivotStep, Tag('pivot')] | Annotated[UnpivotStep, Tag('unpivot')] | Annotated[CaseWhenStep, Tag('case_when')] | Annotated[FillNullStep, Tag('fill_null')] | Annotated[CoalesceStep, Tag('coalesce')] | Annotated[StringOpsStep, Tag('string_ops')] | Annotated[DateOpsStep, Tag('date_ops')], Discriminator(_step_discriminator)] module-attribute

Discriminated union of all pipeline step types.

Accepts a dict with a single key matching the step type name: {"filter": {"expr": "amount > 0"}}FilterStep

SparkExpr = NewType('SparkExpr', str) module-attribute

A Spark SQL expression string.

Used to annotate fields that hold Spark SQL expressions (e.g., filter predicates, derived column expressions). At runtime these are plain strings; the NewType signals intent and allows downstream type-checkers to distinguish expressions from arbitrary strings.

__all__ = ['Thread', 'Weave', 'Loom', 'FailureConfig', 'ThreadEntry', 'ConditionSpec', 'WeaveEntry', 'Source', 'DedupConfig', 'Target', 'ColumnMapping', 'Step', 'FilterStep', 'DeriveStep', 'JoinStep', 'SelectStep', 'DropStep', 'RenameStep', 'CastStep', 'DedupStep', 'SortStep', 'UnionStep', 'AggregateStep', 'WindowStep', 'PivotStep', 'UnpivotStep', 'CaseWhenStep', 'FillNullStep', 'CoalesceStep', 'StringOpsStep', 'DateOpsStep', 'KeyConfig', 'SurrogateKeyConfig', 'ChangeDetectionConfig', 'WriteConfig', 'ValidationRule', 'Assertion', 'LoadConfig', 'ParamSpec', 'ParamsConfig', 'ExecutionConfig', 'LogLevel', 'SparkExpr', 'HookStep', 'QualityGateStep', 'SqlStatementStep', 'LogMessageStep', 'Lookup', 'VariableSpec'] module-attribute

ExecutionConfig

Bases: FrozenBase

Runtime execution settings that cascade through loom/weave/thread.

Attributes:

Name Type Description
log_level LogLevel

Logging verbosity for execution output.

trace bool

Whether to collect execution spans for telemetry.

LogLevel

Bases: StrEnum

Configurable log level for weevr execution.

Controls the verbosity of structured logging output during pipeline execution. Maps to Python logging levels internally.

FailureConfig

Bases: FrozenBase

Per-thread failure handling policy.

Controls what happens to remaining threads in a weave when this thread fails.

Attributes:

Name Type Description
on_failure Literal['skip_downstream', 'continue', 'abort_weave']

One of "abort_weave" (default), "skip_downstream", or "continue". See SPEC §14.2 for semantics.

LogMessageStep

Bases: FrozenBase

A log message emitted as a hook step.

Attributes:

Name Type Description
type Literal['log_message']

Step type discriminator, always "log_message".

name str | None

Optional name for telemetry span naming.

on_failure Literal['abort', 'warn'] | None

Failure behaviour. None means the executor applies the phase-specific default.

message str

Message template to log. Supports ${var.name} placeholders.

level Literal['info', 'warn', 'error']

Log level for the message.

QualityGateStep

Bases: FrozenBase

A quality gate check executed as a hook step.

The check field determines which check-specific fields are required.

Attributes:

Name Type Description
type Literal['quality_gate']

Step type discriminator, always "quality_gate".

name str | None

Optional name for telemetry span naming.

on_failure Literal['abort', 'warn'] | None

Failure behaviour. None means the executor applies the phase-specific default (pre=abort, post=warn).

check Literal['source_freshness', 'row_count_delta', 'row_count', 'table_exists', 'expression']

Which quality gate check to perform.

source str | None

Table alias for source_freshness and table_exists checks.

max_age str | None

Duration string for source_freshness (e.g. "24h").

target str | None

Table alias for row_count_delta and row_count checks.

max_decrease_pct float | None

Max allowed decrease percentage for row_count_delta.

max_increase_pct float | None

Max allowed increase percentage for row_count_delta.

min_delta int | None

Minimum absolute row change for row_count_delta.

max_delta int | None

Maximum absolute row change for row_count_delta.

min_count int | None

Minimum row count for row_count.

max_count int | None

Maximum row count for row_count.

sql str | None

Spark SQL boolean expression for expression check.

message str | None

Failure message for expression check diagnostics.

SqlStatementStep

Bases: FrozenBase

An arbitrary SQL statement executed as a hook step.

Optionally captures the scalar result into a weave-scoped variable via set_var.

Attributes:

Name Type Description
type Literal['sql_statement']

Step type discriminator, always "sql_statement".

name str | None

Optional name for telemetry span naming.

on_failure Literal['abort', 'warn'] | None

Failure behaviour. None means the executor applies the phase-specific default.

sql str

Spark SQL statement to execute.

set_var str | None

Optional variable name to capture the scalar result into.

ChangeDetectionConfig

Bases: FrozenBase

Configuration for change detection hash generation.

KeyConfig

Bases: FrozenBase

Key management configuration for a thread.

SurrogateKeyConfig

Bases: FrozenBase

Configuration for surrogate key generation.

LoadConfig

Bases: FrozenBase

Incremental load mode and watermark parameters.

Cross-field validation: - mode == "incremental_watermark" requires watermark_column to be set. - mode == "cdc" requires cdc config to be set. - mode == "cdc" must not have watermark_column set (CDF version tracking is automatic).

Lookup

Bases: FrozenBase

A weave-level named data definition that can be referenced by threads.

Lookups define small reference datasets shared across threads in a weave. When materialize is true, the data is read once and cached (or broadcast) before threads execute.

Attributes:

Name Type Description
source Source

Source definition for the lookup data.

materialize bool

Whether to pre-read and cache/broadcast the data before thread execution.

strategy Literal['broadcast', 'cache']

Materialization strategy. Only meaningful when materialize is true. "broadcast" uses Spark broadcast join hints; "cache" uses DataFrame caching.

Loom

Bases: FrozenBase

A deployment unit containing weave references with optional shared defaults.

WeaveEntry

Bases: FrozenBase

A weave reference within a loom, with optional condition.

Attributes:

Name Type Description
name str

Weave name. Required for inline definitions; derived from filename stem for external references.

ref str | None

Path to an external .weave file, relative to project root. Mutually exclusive with inline definition (name + body).

condition ConditionSpec | None

Optional condition for conditional execution.

ParamsConfig

Bases: BaseModel

Parameter file schema (flat key-value structure).

Uses a mutable BaseModel rather than FrozenBase because it is a validation schema for param files, not a domain object. It relies on extra="allow" for arbitrary param keys and on model_dump(exclude_unset=True) within the config pipeline.

ParamSpec

Bases: FrozenBase

Typed parameter specification for config-level param declarations.

AggregateStep

Bases: FrozenBase

Pipeline step: aggregate rows with optional grouping.

CaseWhenStep

Bases: FrozenBase

Pipeline step: conditional column values.

CastStep

Bases: FrozenBase

Pipeline step: cast column types.

CoalesceStep

Bases: FrozenBase

Pipeline step: coalesce columns.

DateOpsStep

Bases: FrozenBase

Pipeline step: apply expression template to date/timestamp columns.

DedupStep

Bases: FrozenBase

Pipeline step: deduplicate rows.

DeriveStep

Bases: FrozenBase

Pipeline step: derive one or more new columns from Spark SQL expressions.

DropStep

Bases: FrozenBase

Pipeline step: drop columns.

FillNullStep

Bases: FrozenBase

Pipeline step: fill null values with defaults.

FilterStep

Bases: FrozenBase

Pipeline step: filter rows using a Spark SQL expression.

JoinStep

Bases: FrozenBase

Pipeline step: join with another source.

PivotStep

Bases: FrozenBase

Pipeline step: pivot rows to columns.

RenameStep

Bases: FrozenBase

Pipeline step: rename columns.

SelectStep

Bases: FrozenBase

Pipeline step: select a subset of columns.

SortStep

Bases: FrozenBase

Pipeline step: sort rows.

StringOpsStep

Bases: FrozenBase

Pipeline step: apply expression template to string columns.

UnionStep

Bases: FrozenBase

Pipeline step: union with other sources.

UnpivotStep

Bases: FrozenBase

Pipeline step: unpivot columns to rows.

WindowStep

Bases: FrozenBase

Pipeline step: apply window functions.

DedupConfig

Bases: FrozenBase

Deduplication configuration applied immediately after reading a source.

Source

Bases: FrozenBase

A data source declaration.

A source is either a direct data reference (with type) or a lookup reference (with lookup). These are mutually exclusive.

Cross-field validation rules: - If lookup is set: type must not be set (lookup references are resolved at execution time from weave-level lookup definitions). - If lookup is not set: type is required. - type == "delta" requires alias to be set. - type in file types (csv, json, parquet, excel) requires path to be set.

ColumnMapping

Bases: FrozenBase

Mapping specification for a single target column.

Cross-field validation: - expr and drop=True are mutually exclusive.

Target

Bases: FrozenBase

Write destination with column mapping and partitioning configuration.

Thread

Bases: FrozenBase

Complete domain model for a thread configuration.

A thread is the smallest unit of work: one or more sources, a sequence of transformation steps, and a single target.

Assertion

Bases: FrozenBase

A post-execution assertion on the target dataset.

Attributes:

Name Type Description
type Literal['row_count', 'column_not_null', 'unique', 'expression']

Assertion kind — built-in or expression-based.

severity Literal['info', 'warn', 'error', 'fatal']

Severity level when the assertion fails.

columns list[str] | None

Columns involved, for column_not_null and unique types.

min int | None

Minimum bound for row_count assertions.

max int | None

Maximum bound for row_count assertions.

expression SparkExpr | None

Spark SQL expression for expression-type assertions.

ValidationRule

Bases: FrozenBase

A pre-write data quality rule evaluated as a Spark SQL expression.

VariableSpec

Bases: FrozenBase

Declaration of a weave-scoped variable.

Variables are typed scalar values that can be set by hook steps via set_var and referenced in downstream config as ${var.name}.

Attributes:

Name Type Description
type Literal['string', 'int', 'long', 'float', 'double', 'boolean', 'timestamp', 'date']

Scalar type of the variable value.

default str | int | float | bool | None

Optional default value used when no hook sets the variable.

ConditionSpec

Bases: FrozenBase

A condition expression for conditional execution.

Attributes:

Name Type Description
when str

Condition expression string. Supports parameter references (${param.x}), built-in checks (table_exists(), table_empty(), row_count()), and simple boolean operators.

ThreadEntry

Bases: FrozenBase

A thread reference within a weave, with optional per-thread overrides.

Attributes:

Name Type Description
name str

Thread name. Required for inline definitions; derived from filename stem for external references.

ref str | None

Path to an external .thread file, relative to project root. Mutually exclusive with inline definition (name + body).

dependencies list[str] | None

Explicit upstream thread names. When set, these are merged with any auto-inferred dependencies from source/target path matching.

condition ConditionSpec | None

Optional condition for conditional execution. When set, the thread is only executed if the condition evaluates to True.

Weave

Bases: FrozenBase

A collection of thread references with optional shared defaults.

WriteConfig

Bases: FrozenBase

Write mode and merge parameters for a thread target.

Cross-field validation: - mode == "merge" requires match_keys to be set. - on_no_match_source == "soft_delete" requires soft_delete_column to be set.