Model API¶
The weevr.model package contains all Pydantic domain models that represent the configuration structure: threads, weaves, looms, pipeline steps, sources, targets, and supporting types.
weevr.model¶
Domain object model for weevr configuration.
HookStep = Annotated[Annotated[QualityGateStep, Tag('quality_gate')] | Annotated[SqlStatementStep, Tag('sql_statement')] | Annotated[LogMessageStep, Tag('log_message')], Discriminator(_hook_step_discriminator)] module-attribute¶
Discriminated union of all hook step types.
Dispatches on the type field: quality_gate, sql_statement, or log_message.
Step = Annotated[Annotated[FilterStep, Tag('filter')] | Annotated[DeriveStep, Tag('derive')] | Annotated[JoinStep, Tag('join')] | Annotated[SelectStep, Tag('select')] | Annotated[DropStep, Tag('drop')] | Annotated[RenameStep, Tag('rename')] | Annotated[CastStep, Tag('cast')] | Annotated[DedupStep, Tag('dedup')] | Annotated[SortStep, Tag('sort')] | Annotated[UnionStep, Tag('union')] | Annotated[AggregateStep, Tag('aggregate')] | Annotated[WindowStep, Tag('window')] | Annotated[PivotStep, Tag('pivot')] | Annotated[UnpivotStep, Tag('unpivot')] | Annotated[CaseWhenStep, Tag('case_when')] | Annotated[FillNullStep, Tag('fill_null')] | Annotated[CoalesceStep, Tag('coalesce')] | Annotated[StringOpsStep, Tag('string_ops')] | Annotated[DateOpsStep, Tag('date_ops')], Discriminator(_step_discriminator)] module-attribute¶
Discriminated union of all pipeline step types.
Accepts a dict with a single key matching the step type name: {"filter": {"expr": "amount > 0"}} → FilterStep
SparkExpr = NewType('SparkExpr', str) module-attribute¶
A Spark SQL expression string.
Used to annotate fields that hold Spark SQL expressions (e.g., filter predicates, derived column expressions). At runtime these are plain strings; the NewType signals intent and allows downstream type-checkers to distinguish expressions from arbitrary strings.
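A short illustration of the NewType behaviour described above (the predicate string here is made up):

```python
from typing import NewType

SparkExpr = NewType("SparkExpr", str)

# NewType adds no runtime wrapper: the value is still a plain str,
# but a static type checker can flag a bare str where SparkExpr is expected.
predicate = SparkExpr("amount > 0 AND region = 'EU'")
assert isinstance(predicate, str)
```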
__all__ = ['Thread', 'Weave', 'Loom', 'FailureConfig', 'ThreadEntry', 'ConditionSpec', 'WeaveEntry', 'Source', 'DedupConfig', 'Target', 'ColumnMapping', 'Step', 'FilterStep', 'DeriveStep', 'JoinStep', 'SelectStep', 'DropStep', 'RenameStep', 'CastStep', 'DedupStep', 'SortStep', 'UnionStep', 'AggregateStep', 'WindowStep', 'PivotStep', 'UnpivotStep', 'CaseWhenStep', 'FillNullStep', 'CoalesceStep', 'StringOpsStep', 'DateOpsStep', 'KeyConfig', 'SurrogateKeyConfig', 'ChangeDetectionConfig', 'WriteConfig', 'ValidationRule', 'Assertion', 'LoadConfig', 'ParamSpec', 'ParamsConfig', 'ExecutionConfig', 'LogLevel', 'SparkExpr', 'HookStep', 'QualityGateStep', 'SqlStatementStep', 'LogMessageStep', 'Lookup', 'VariableSpec'] module-attribute¶
ExecutionConfig¶
Bases: FrozenBase
Runtime execution settings that cascade through loom/weave/thread.
Attributes:
| Name | Type | Description |
|---|---|---|
| log_level | LogLevel | Logging verbosity for execution output. |
| trace | bool | Whether to collect execution spans for telemetry. |
LogLevel¶
Bases: StrEnum
Configurable log level for weevr execution.
Controls the verbosity of structured logging output during pipeline execution. Maps to Python logging levels internally.
FailureConfig¶
Bases: FrozenBase
Per-thread failure handling policy.
Controls what happens to remaining threads in a weave when this thread fails.
Attributes:
| Name | Type | Description |
|---|---|---|
| on_failure | Literal['skip_downstream', 'continue', 'abort_weave'] | One of skip_downstream, continue, or abort_weave. |
LogMessageStep¶
Bases: FrozenBase
A log message emitted as a hook step.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | Literal['log_message'] | Step type discriminator, always log_message. |
| name | str \| None | Optional name for telemetry span naming. |
| on_failure | Literal['abort', 'warn'] \| None | Failure behaviour. |
| message | str | Message template to log. Supports ${var.name} variable references. |
| level | Literal['info', 'warn', 'error'] | Log level for the message. |
QualityGateStep¶
Bases: FrozenBase
A quality gate check executed as a hook step.
The check field determines which check-specific fields are required.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | Literal['quality_gate'] | Step type discriminator, always quality_gate. |
| name | str \| None | Optional name for telemetry span naming. |
| on_failure | Literal['abort', 'warn'] \| None | Failure behaviour. |
| check | Literal['source_freshness', 'row_count_delta', 'row_count', 'table_exists', 'expression'] | Which quality gate check to perform. |
| source | str \| None | Table alias for the source_freshness check. |
| max_age | str \| None | Duration string for the source_freshness check. |
| target | str \| None | Table alias for the row_count_delta, row_count, and table_exists checks. |
| max_decrease_pct | float \| None | Max allowed decrease percentage for the row_count_delta check. |
| max_increase_pct | float \| None | Max allowed increase percentage for the row_count_delta check. |
| min_delta | int \| None | Minimum absolute row change for the row_count_delta check. |
| max_delta | int \| None | Maximum absolute row change for the row_count_delta check. |
| min_count | int \| None | Minimum row count for the row_count check. |
| max_count | int \| None | Maximum row count for the row_count check. |
| sql | str \| None | Spark SQL boolean expression for the expression check. |
| message | str \| None | Failure message for the expression check. |
SqlStatementStep¶
Bases: FrozenBase
An arbitrary SQL statement executed as a hook step.
Optionally captures the scalar result into a weave-scoped variable via set_var.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | Literal['sql_statement'] | Step type discriminator, always sql_statement. |
| name | str \| None | Optional name for telemetry span naming. |
| on_failure | Literal['abort', 'warn'] \| None | Failure behaviour. |
| sql | str | Spark SQL statement to execute. |
| set_var | str \| None | Optional variable name to capture the scalar result into. |
ChangeDetectionConfig¶
Bases: FrozenBase
Configuration for change detection hash generation.
KeyConfig¶
Bases: FrozenBase
Key management configuration for a thread.
SurrogateKeyConfig¶
Bases: FrozenBase
Configuration for surrogate key generation.
LoadConfig¶
Bases: FrozenBase
Incremental load mode and watermark parameters.
Cross-field validation:
- mode == "incremental_watermark" requires watermark_column to be set.
- mode == "cdc" requires cdc config to be set.
- mode == "cdc" must not have watermark_column set (CDF version tracking is automatic).
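The three rules above are a typical use of a Pydantic after-validator. The sketch below is an assumption-laden reduction, not weevr's code: the "full" mode name, the CdcConfig fields, and the error messages are all invented for illustration.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError, model_validator


class CdcConfig(BaseModel):
    # Placeholder; the real CDC options are not shown in this reference.
    enabled: bool = True


class LoadConfig(BaseModel):
    mode: Literal["full", "incremental_watermark", "cdc"] = "full"  # names assumed
    watermark_column: Optional[str] = None
    cdc: Optional[CdcConfig] = None

    @model_validator(mode="after")
    def _check_mode(self) -> "LoadConfig":
        # Enforce the three cross-field rules listed above.
        if self.mode == "incremental_watermark" and self.watermark_column is None:
            raise ValueError("incremental_watermark requires watermark_column")
        if self.mode == "cdc":
            if self.cdc is None:
                raise ValueError("cdc mode requires cdc config")
            if self.watermark_column is not None:
                raise ValueError("cdc must not set watermark_column (CDF tracking is automatic)")
        return self


# A valid incremental config passes; an incomplete cdc config raises ValidationError.
LoadConfig(mode="incremental_watermark", watermark_column="updated_at")
try:
    LoadConfig(mode="cdc")
except ValidationError:
    pass
```

Raising ValueError inside the validator surfaces as a normal ValidationError to callers, so config files fail at parse time rather than mid-run.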
Lookup¶
Bases: FrozenBase
A weave-level named data definition that can be referenced by threads.
Lookups define small reference datasets shared across threads in a weave. When materialize is true, the data is read once and cached (or broadcast) before threads execute.
Attributes:
| Name | Type | Description |
|---|---|---|
| source | Source | Source definition for the lookup data. |
| materialize | bool | Whether to pre-read and cache/broadcast the data before thread execution. |
| strategy | Literal['broadcast', 'cache'] | Materialization strategy. Only meaningful when materialize is true. |
Loom¶
Bases: FrozenBase
A deployment unit containing weave references with optional shared defaults.
WeaveEntry¶
Bases: FrozenBase
A weave reference within a loom, with optional condition.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Weave name. Required for inline definitions; derived from filename stem for external references. |
| ref | str \| None | Path to an external weave file. |
| condition | ConditionSpec \| None | Optional condition for conditional execution. |
ParamsConfig¶
Bases: BaseModel
Parameter file schema (flat key-value structure).
Uses a mutable BaseModel rather than FrozenBase because it is a validation schema for param files, not a domain object. It relies on extra="allow" for arbitrary param keys and on model_dump(exclude_unset=True) within the config pipeline.
ParamSpec¶
Bases: FrozenBase
Typed parameter specification for config-level param declarations.
AggregateStep¶
Bases: FrozenBase
Pipeline step: aggregate rows with optional grouping.
CaseWhenStep¶
Bases: FrozenBase
Pipeline step: conditional column values.
CastStep¶
Bases: FrozenBase
Pipeline step: cast column types.
CoalesceStep¶
Bases: FrozenBase
Pipeline step: coalesce columns.
DateOpsStep¶
Bases: FrozenBase
Pipeline step: apply expression template to date/timestamp columns.
DedupStep¶
Bases: FrozenBase
Pipeline step: deduplicate rows.
DeriveStep¶
Bases: FrozenBase
Pipeline step: derive one or more new columns from Spark SQL expressions.
DropStep¶
Bases: FrozenBase
Pipeline step: drop columns.
FillNullStep¶
Bases: FrozenBase
Pipeline step: fill null values with defaults.
FilterStep¶
Bases: FrozenBase
Pipeline step: filter rows using a Spark SQL expression.
JoinStep¶
Bases: FrozenBase
Pipeline step: join with another source.
PivotStep¶
Bases: FrozenBase
Pipeline step: pivot rows to columns.
RenameStep¶
Bases: FrozenBase
Pipeline step: rename columns.
SelectStep¶
Bases: FrozenBase
Pipeline step: select a subset of columns.
SortStep¶
Bases: FrozenBase
Pipeline step: sort rows.
StringOpsStep¶
Bases: FrozenBase
Pipeline step: apply expression template to string columns.
UnionStep¶
Bases: FrozenBase
Pipeline step: union with other sources.
UnpivotStep¶
Bases: FrozenBase
Pipeline step: unpivot columns to rows.
WindowStep¶
Bases: FrozenBase
Pipeline step: apply window functions.
DedupConfig¶
Bases: FrozenBase
Deduplication configuration applied immediately after reading a source.
Source¶
Bases: FrozenBase
A data source declaration.
A source is either a direct data reference (with type) or a lookup reference (with lookup). These are mutually exclusive.
Cross-field validation rules:
- If lookup is set: type must not be set (lookup references are resolved at execution time from weave-level lookup definitions).
- If lookup is not set: type is required.
- type == "delta" requires alias to be set.
- type in file types (csv, json, parquet, excel) requires path to be set.
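The mutual-exclusivity rules above can be sketched as one after-validator. This is an illustrative reduction: the Source model here carries only the four fields the rules mention, and the error messages are invented.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError, model_validator

FILE_TYPES = {"csv", "json", "parquet", "excel"}


class Source(BaseModel):
    # Field set reduced to what the validation rules mention.
    type: Optional[Literal["delta", "csv", "json", "parquet", "excel"]] = None
    lookup: Optional[str] = None
    alias: Optional[str] = None
    path: Optional[str] = None

    @model_validator(mode="after")
    def _check(self) -> "Source":
        if self.lookup is not None:
            # Lookup references resolve at execution time; a direct type conflicts.
            if self.type is not None:
                raise ValueError("lookup and type are mutually exclusive")
            return self
        if self.type is None:
            raise ValueError("type is required when lookup is not set")
        if self.type == "delta" and self.alias is None:
            raise ValueError("delta sources require alias")
        if self.type in FILE_TYPES and self.path is None:
            raise ValueError(f"{self.type} sources require path")
        return self


Source(lookup="country_codes")               # lookup reference
Source(type="delta", alias="orders")         # direct delta source
Source(type="csv", path="/data/orders.csv")  # file source
```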
ColumnMapping¶
Bases: FrozenBase
Mapping specification for a single target column.
Cross-field validation:
- expr and drop=True are mutually exclusive.
Target¶
Bases: FrozenBase
Write destination with column mapping and partitioning configuration.
Thread¶
Bases: FrozenBase
Complete domain model for a thread configuration.
A thread is the smallest unit of work: one or more sources, a sequence of transformation steps, and a single target.
Assertion¶
Bases: FrozenBase
A post-execution assertion on the target dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | Literal['row_count', 'column_not_null', 'unique', 'expression'] | Assertion kind: built-in or expression-based. |
| severity | Literal['info', 'warn', 'error', 'fatal'] | Severity level when the assertion fails. |
| columns | list[str] \| None | Columns involved, for column_not_null and unique types. |
| min | int \| None | Minimum bound for row_count assertions. |
| max | int \| None | Maximum bound for row_count assertions. |
| expression | SparkExpr \| None | Spark SQL expression for expression-type assertions. |
ValidationRule¶
Bases: FrozenBase
A pre-write data quality rule evaluated as a Spark SQL expression.
VariableSpec¶
Bases: FrozenBase
Declaration of a weave-scoped variable.
Variables are typed scalar values that can be set by hook steps via set_var and referenced in downstream config as ${var.name}.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | Literal['string', 'int', 'long', 'float', 'double', 'boolean', 'timestamp', 'date'] | Scalar type of the variable value. |
| default | str \| int \| float \| bool \| None | Optional default value used when no hook sets the variable. |
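An illustrative resolver for the ${var.name} reference syntax described above. This is a sketch of the substitution idea only; weevr's actual interpolation engine, error handling, and type coercion are not shown in this reference.

```python
import re

# Matches ${var.<identifier>} and captures the variable name.
_VAR_REF = re.compile(r"\$\{var\.([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_vars(text: str, variables: dict[str, object]) -> str:
    """Replace ${var.name} references with the variable's value as a string."""

    def _sub(m: re.Match) -> str:
        name = m.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return str(variables[name])

    return _VAR_REF.sub(_sub, text)


sql = "DELETE FROM sales WHERE load_date < '${var.cutoff}'"
assert substitute_vars(sql, {"cutoff": "2024-01-01"}) == (
    "DELETE FROM sales WHERE load_date < '2024-01-01'"
)
```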
ConditionSpec¶
Bases: FrozenBase
A condition expression for conditional execution.
Attributes:
| Name | Type | Description |
|---|---|---|
| when | str | Condition expression string. Supports parameter references. |
ThreadEntry¶
Bases: FrozenBase
A thread reference within a weave, with optional per-thread overrides.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Thread name. Required for inline definitions; derived from filename stem for external references. |
| ref | str \| None | Path to an external thread file. |
| dependencies | list[str] \| None | Explicit upstream thread names. When set, these are merged with any auto-inferred dependencies from source/target path matching. |
| condition | ConditionSpec \| None | Optional condition for conditional execution. When set, the thread is only executed if the condition evaluates to True. |
Weave¶
Bases: FrozenBase
A collection of thread references with optional shared defaults.
WriteConfig¶
Bases: FrozenBase
Write mode and merge parameters for a thread target.
Cross-field validation:
- mode == "merge" requires match_keys to be set.
- on_no_match_source == "soft_delete" requires soft_delete_column to be set.